These notes attempt to explain how to use the ASN.1 infrastructure to
add new ASN.1 types.  ASN.1 is complicated and easy to get wrong, so
it is best to verify your results against another tool (such as asn1c)
if at all possible.  These notes are up to date as of 2012-02-13.

If you are trying to debug a problem that shows up in the ASN.1
encoder or decoder, skip to the last section.


General
-------

For the moment, a developer must hand-translate the ASN.1 module into
macro invocations that generate data structures used by the encoder
and decoder.  Ideally we would have a tool to compile an ASN.1 module
(and probably some additional information about C identifier mappings)
and generate the macro invocations.

Currently the ASN.1 infrastructure is not visible to applications or
plugins.  For plugin modules shipped as part of the krb5 tree, the
types can be added to asn1_k_encode.c and exported from libkrb5.
Plugin modules built separately from the krb5 tree must use another
tool (such as asn1c) for now if they need to do ASN.1 encoding or
decoding.


Tags
----

Before you start writing macro invocations, it is important to
understand a little bit about ASN.1 tags.  You will most commonly see
tag notation in a sequence definition, like:

  TypeName ::= SEQUENCE {
    field-name [0] IMPLICIT OCTET STRING OPTIONAL
  }

Contrary to intuition, the tag notation "[0] IMPLICIT" is not a
property of the sequence field; instead, it specifies a type that
wraps the type to the right (OCTET STRING).  The right way to think
about the above definition is:

  TypeName is defined as a sequence type
    which has an optional field named field-name
      whose type is a tagged type
        the tag's class is context-specific (by default)
        the tag's number is 0
        it is an implicit tag
        the tagged type wraps OCTET STRING

The other place you are likely to see tag notation is in a type
assignment like:

  AS-REQ ::= [APPLICATION 10] KDC-REQ

This example defines AS-REQ to be a tagged type whose class is
application, whose tag number is 10, and whose base type is KDC-REQ.
The tag may be implicit or explicit depending on the module's tag
environment, which we will get to in a moment.

Tags can have one of four classes: universal, application, private,
and context-specific.  Universal tags are used for built-in ASN.1
types.  Application and context-specific tags are the most common to
see in ASN.1 modules; private is rarely used.  If no tag class is
specified, the default is context-specific.
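
As a rough illustration of how the class and tag number end up on the
wire, the sketch below computes the single DER identifier octet for
tag numbers below 31.  The enum, macro, and function names are made up
for this example and are not part of the krb5 infrastructure:

```c
/* Tag classes occupy the top two bits of a DER identifier octet. */
enum tag_class {
    CLASS_UNIVERSAL = 0x00,
    CLASS_APPLICATION = 0x40,
    CLASS_CONTEXT_SPECIFIC = 0x80,
    CLASS_PRIVATE = 0xC0
};

/* Bit 0x20 of the identifier octet marks a constructed encoding. */
#define FORM_PRIMITIVE 0x00
#define FORM_CONSTRUCTED 0x20

/*
 * Return the identifier octet for a tag number below 31, or -1 if the
 * tag number would need the multi-octet form (not handled here).
 */
static int
identifier_octet(enum tag_class cls, int form, unsigned int tagnum)
{
    if (tagnum >= 31)
        return -1;
    return (int)cls | form | (int)tagnum;
}
```

With these definitions, an [APPLICATION 10] tag on a constructed
encoding gives 0x6A, and a constructed context-specific [0] tag gives
0xA0.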

Tags can be explicit or implicit, and the distinction is important to
the wire encoding.  If a tag's closing bracket is followed by the word
IMPLICIT or EXPLICIT, then it is clear which kind of tag it is, but
usually there will be no such annotation.  If not, the default depends
on the header of the ASN.1 module.  Look at the top of the module for
the word DEFINITIONS.  It may be followed by one of three phrases:

* EXPLICIT TAGS -- in this case, tags default to explicit
* IMPLICIT TAGS -- in this case, tags default to implicit (usually)
* AUTOMATIC TAGS -- tags default to implicit (usually) and are also
  automatically added to sequence fields (usually)

If none of those phrases appear, the default is explicit tags.

Even if a module defaults to implicit tags, a tag defaults to explicit
if its base type is a choice type or ANY type (or the information
object equivalent of an ANY type).
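
To make the difference in wire encoding concrete, here is a minimal
sketch of both kinds of tagging applied to an already-encoded value.
The function names are hypothetical, and only short-form lengths
(under 128 bytes) and single-octet tags are handled:

```c
#include <stddef.h>
#include <string.h>

/*
 * Explicit tag: a new identifier octet and length are placed in front
 * of the complete inner encoding.  Returns the number of bytes written
 * to out, or 0 on error.
 */
static size_t
wrap_explicit(unsigned char tag_octet, const unsigned char *inner,
              size_t inner_len, unsigned char *out, size_t out_size)
{
    if (inner_len >= 128 || out_size < inner_len + 2)
        return 0;
    out[0] = tag_octet;
    out[1] = (unsigned char)inner_len;
    memcpy(out + 2, inner, inner_len);
    return inner_len + 2;
}

/*
 * Implicit tag: the class and tag number replace those of the inner
 * identifier octet, keeping its primitive/constructed bit (0x20).  The
 * length and contents are unchanged.
 */
static size_t
retag_implicit(unsigned char class_and_number, const unsigned char *inner,
               size_t inner_len, unsigned char *out, size_t out_size)
{
    if (inner_len < 2 || out_size < inner_len)
        return 0;
    memcpy(out, inner, inner_len);
    out[0] = (unsigned char)(class_and_number | (inner[0] & 0x20));
    return inner_len;
}
```

For the OCTET STRING "hi" (04 02 68 69), an explicit [0] tag produces
A0 04 04 02 68 69, while an implicit [0] tag produces 80 02 68 69.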

If the module's default is AUTOMATIC TAGS, sequence and set fields
should have ascending context-specific tags wrapped around the field
types, starting from 0, unless one of the fields of the sequence or
set is already a tagged type.  See ITU X.680 section 24.2 for details,
particularly if COMPONENTS OF is used in the sequence definition.
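
The rule as summarized above can be sketched as a small helper.  This
is only a model of the summary in these notes (it ignores COMPONENTS
OF and other details of X.680), and the function name is invented for
the example:

```c
#include <stddef.h>

/*
 * already_tagged[i] is nonzero when field i carries a tag in the
 * module text.  If no field is tagged, field i gets context-specific
 * tag number i and the function returns 1.  Otherwise tags_out is left
 * untouched and the function returns 0.
 */
static int
assign_automatic_tags(const int *already_tagged, size_t nfields,
                      int *tags_out)
{
    size_t i;

    for (i = 0; i < nfields; i++) {
        if (already_tagged[i])
            return 0;
    }
    for (i = 0; i < nfields; i++)
        tags_out[i] = (int)i;
    return 1;
}
```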


Basic types
-----------

In our infrastructure, a type descriptor specifies a mapping between
an ASN.1 type and a C type.  The first step is to ensure that type
descriptors are defined for the basic types used by your ASN.1 module,
as mapped to the C types used in your structures, in asn1_k_encode.c.
If not, you will need to create them.  For a BOOLEAN or INTEGER ASN.1
type, you will use one of these macros:

  DEFBOOLTYPE(descname, ctype)
  DEFINTTYPE(descname, ctype)
  DEFUINTTYPE(descname, ctype)

where "descname" is an identifier you make up and "ctype" is the
integer type of the C object you want to map the ASN.1 value to.  For
integers, use DEFINTTYPE if the C type is a signed integer type and
DEFUINTTYPE if it is an unsigned type.  (For booleans, the distinction
is unimportant since all integer types can hold the values 0 and 1.)
We don't generally define integer mappings for every typedef name of
an integer type.  For example, we use the type descriptor int32, which
maps an ASN.1 INTEGER to a krb5_int32, for krb5_enctype values.
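
The signedness of the C type matters because DER encodes INTEGER
values in minimal two's-complement form.  The following sketch (not
the encoder used by the infrastructure; the function name is
hypothetical) produces the contents octets of an INTEGER and shows why
the same 32 bits encode differently when read as signed or unsigned:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write the DER contents octets (no tag or length) of val into out as
 * big-endian minimal two's complement.  Returns the octet count (1-8).
 */
static size_t
der_int_contents(int64_t val, unsigned char out[8])
{
    unsigned char tmp[8];
    size_t i, start = 0;
    uint64_t u = (uint64_t)val;

    for (i = 0; i < 8; i++)
        tmp[i] = (unsigned char)(u >> (8 * (7 - i)));
    /* Drop leading octets that only repeat the sign bit. */
    while (start < 7 &&
           ((tmp[start] == 0x00 && (tmp[start + 1] & 0x80) == 0) ||
            (tmp[start] == 0xFF && (tmp[start + 1] & 0x80) != 0)))
        start++;
    for (i = start; i < 8; i++)
        out[i - start] = tmp[i];
    return 8 - start;
}
```

The bit pattern 0xFFFFFFFF encodes as the single octet FF when treated
as a signed 32-bit value (-1), but as the five octets 00 FF FF FF FF
when treated as an unsigned 32-bit value (4294967295).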

String types are a little more complicated.  Our practice is to store
strings in a krb5_data structure (rather than a zero-terminated C
string), so our infrastructure currently assumes that all strings are
represented as "counted types", meaning the C representation is a
combination of a pointer and an integer type.  So, first you must
declare a counted type descriptor (we will describe those in more
detail later) with something like:

  DEFCOUNTEDSTRINGTYPE(generalstring, char *, unsigned int,
                       k5_asn1_encode_bytestring, k5_asn1_decode_bytestring,
                       ASN1_GENERALSTRING);

The first parameter is an identifier you make up.  The second and
third parameters are the C types of the pointer and integer holding
the string; for a krb5_data object, those should be the types in the
example.  The pointer type must be char * or unsigned char *.  The
fourth and fifth parameters reference primitive encoder and decoder
functions; these should almost always be the ones in the example,
unless the ASN.1 type is BIT STRING.  The sixth parameter is the
universal tag number of the ASN.1 type, as defined in krbasn1.h.

Once you have defined the counted type, you can define a normal type
descriptor to wrap it in a krb5_data structure with something like:

  DEFCOUNTEDTYPE(gstring_data, krb5_data, data, length, generalstring);
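
To show what the pointer-plus-length representation buys, here is a
simplified sketch of encoding a counted string.  It is not the
infrastructure's encoder: struct counted_bytes is a hypothetical
stand-in for krb5_data, the function name is made up, and only
short-form lengths are handled:

```c
#include <stddef.h>
#include <string.h>

#define TAG_GENERALSTRING 27    /* universal tag number of GeneralString */

/* Hypothetical stand-in for a pointer-plus-length string holder. */
struct counted_bytes {
    unsigned int length;
    char *data;
};

/*
 * Encode s as a primitive universal string type (tag number below 31)
 * with a short-form length.  Returns the number of bytes written, or 0
 * if the input is too long or out is too small.
 */
static size_t
encode_counted_string(unsigned char tagnum, const struct counted_bytes *s,
                      unsigned char *out, size_t out_size)
{
    if (s->length >= 128 || out_size < (size_t)s->length + 2)
        return 0;
    out[0] = tagnum;            /* universal class, primitive form */
    out[1] = (unsigned char)s->length;
    if (s->length > 0)
        memcpy(out + 2, s->data, s->length);
    return (size_t)s->length + 2;
}
```

Because the length is carried separately, a value containing an
embedded zero byte (such as 'a', 0, 'b') is encoded in full rather
than being cut off at the zero.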


Sequences
---------

In our infrastructure, we model ASN.1 sequences using an array of
normal type descriptors.  Each type descriptor is applied in turn to
the C object to generate (or consume) an encoding of an ASN.1 value.

Of course, each value needs to be stored in a different place within
the C object, or they would just overwrite each other.  To address
this, you must create an offset type wrapper for each sequence field:

  DEFOFFSETTYPE(descname, structuretype, fieldname, basedesc)

where "descname" is an identifier you make up, "structuretype" and
"fieldname" are used to compute the offset and type-check the
structure field, and "basedesc" is the type descriptor of the ASN.1
object to be stored at that offset.
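
The idea behind offset wrappers can be modeled in a few lines of
ordinary C.  This is a toy model only: struct sample, struct
field_desc, and collect_fields are invented for the example and are
much simpler than the real type descriptors:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C structure holding two sequence fields. */
struct sample {
    int32_t first;
    int32_t second;
};

/* A toy field descriptor: where the value lives in the structure. */
struct field_desc {
    size_t offset;
};

static const struct field_desc sample_fields[] = {
    { offsetof(struct sample, first) },
    { offsetof(struct sample, second) },
};

/* Apply each descriptor in turn, reading an int32_t at its offset. */
static void
collect_fields(const void *obj, const struct field_desc *fields,
               size_t nfields, int32_t *out)
{
    size_t i;

    for (i = 0; i < nfields; i++) {
        const unsigned char *p = (const unsigned char *)obj +
            fields[i].offset;

        memcpy(&out[i], p, sizeof(int32_t));
    }
}
```

Each descriptor reads from its own offset, so the two fields do not
collide even though the same base pointer is passed for both.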

If your C structure contains a pointer to another C object, you will
need to first define a pointer wrapper, which is very simple:

  DEFPTRTYPE(descname, basedesc)

Then wrap the defined pointer type in an offset type as described
above.  Once a pointer descriptor is defined for a base descriptor, it
can be reused many times, so pointer descriptors are usually defined
right after the types they wrap.  When decoding, pointer wrappers
allocate a block of memory equal to the size of the C type
corresponding to the base type, and store its address in the pointer.
(For offset types,
the corresponding C type is the structure type inside which the offset
is computed.)  It is okay for several fields of a sequence to
reference the same pointer field within a structure, as long as the
pointer types all wrap base types with the same corresponding C type.

If the sequence field has a context tag attached to its type, you will
also need to create a tag wrapper for it:

  DEFCTAGGEDTYPE(descname, tagnum, basedesc)
  DEFCTAGGEDTYPE_IMPLICIT(descname, tagnum, basedesc)

Use the first macro for explicit context tags and the second for
implicit context tags.  "tagnum" is the number of the context-specific
tag, and "basedesc" is the name you chose for the offset type above.

You don't actually need to separately write out DEFOFFSETTYPE and
DEFCTAGGEDTYPE for each field.  The combination of offset and context
tag is so common that we have a macro to combine them:

  DEFFIELD(descname, structuretype, fieldname, tagnum, basedesc)
  DEFFIELD_IMPLICIT(descname, structuretype, fieldname, tagnum, basedesc)

Once you have defined tag and offset wrappers for each sequence field,
combine them together in an array and use the DEFSEQTYPE macro to
define the sequence type descriptor:

  static const struct atype_info *my_sequence_fields[] = {
      &k5_atype_my_sequence_0, &k5_atype_my_sequence_1,
  };
  DEFSEQTYPE(my_sequence, structuretype, my_sequence_fields)

Each field name must be prefixed by "&k5_atype_" to get a pointer to
the actual variable used to hold the type descriptor.

ASN.1 sequence types may or may not be defined to be extensible, and
may group extensions together in blocks which must appear together.
Our model does not distinguish these cases.  Our decoder treats all
sequence types as extensible.  Extension blocks must be modeled by
making all of the extension fields optional, and the decoder will not
enforce that they appear together.

If your ASN.1 sequence contains optional fields, keep reading.


Optional sequence fields
------------------------

ASN.1 sequence fields can be annotated with OPTIONAL or, less
commonly, with DEFAULT followed by a value.  (Be aware that if a
DEFAULT value is specified for a sequence field, DER mandates that
fields with that value not be encoded within the sequence.  Most
standards in the Kerberos ecosystem avoid the use of DEFAULT for this
reason.)
Although optionality is a property of sequence or set fields, not
types, we still model optional sequence fields using type wrappers.
Optional type wrappers must only be used as members of a sequence,
although they can be nested in offset or pointer wrappers first.

The simplest way to represent an optional value in a C structure is
with a pointer which takes the value NULL if the field is not present.
In this case, you can just use DEFOPTIONALZEROTYPE to wrap the pointer
type:

  DEFPTRTYPE(ptr_basetype, basetype);
  DEFOPTIONALZEROTYPE(opt_ptr_basetype, ptr_basetype);

and then use opt_ptr_basetype in the DEFFIELD invocation for the
sequence field.  DEFOPTIONALZEROTYPE can also be used for integer
types, if it is okay for the value 0 to represent that the
corresponding ASN.1 value is omitted.  Optional-zero wrappers, like
pointer wrappers, are usually defined just after the types they wrap.

For null-terminated sequences, you can use a wrapper like this:

  DEFOPTIONALEMPTYTYPE(opt_seqof_basetype, seqof_basetype)

to omit the sequence if it is either NULL or of zero length.

A more general way to wrap optional types is:

  DEFOPTIONALTYPE(descname, predicatefn, initfn, basedesc);

where "predicatefn" has the signature "int (*fn)(const void *p)" and
is used by the encoder to test whether the ASN.1 value is present in
the C object.  "initfn" has the signature "void (*fn)(void *p)" and is
used by the decoder to initialize the C object field if the
corresponding ASN.1 value is omitted in the wire encoding.  "initfn"
can be NULL, in which case the C object will simply be left alone.
All C objects are initialized to zero-filled memory when they are
allocated by the decoder.

An optional string type, represented in a krb5_data structure, can be
wrapped using the nonempty_data function already defined in
asn1_k_encode.c, like so:

  DEFOPTIONALTYPE(opt_ostring_data, nonempty_data, NULL, ostring_data);
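
For a sense of what the two callbacks look like, here is a
self-contained sketch using the signatures given above.  Everything in
it is hypothetical: struct counted_bytes stands in for krb5_data,
nonempty_bytes only approximates what a predicate such as
nonempty_data might check, and handle_absent models the decoder
behavior described in the text:

```c
#include <stddef.h>

/* Hypothetical stand-in for a pointer-plus-length string holder. */
struct counted_bytes {
    unsigned int length;
    char *data;
};

/* Predicate: report whether the value is present and should be
 * encoded. */
static int
nonempty_bytes(const void *p)
{
    const struct counted_bytes *val = p;

    return val->data != NULL && val->length != 0;
}

/* Initializer: give an integer field a non-zero "absent" marker. */
static void
init_absent_int(void *p)
{
    *(int *)p = -1;
}

/* Model of the decoder when the optional value is missing from the
 * wire: call the initializer if there is one, else leave the field
 * alone. */
static void
handle_absent(void (*initfn)(void *p), void *field)
{
    if (initfn != NULL)
        initfn(field);
}
```

An initializer like init_absent_int is useful when zero is a
meaningful value for the field and so cannot double as the "omitted"
marker.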


Sequence-of types
-----------------

ASN.1 sequence-of types can be represented as C types in two ways.
The simplest is to use an array of pointers terminated in a null
pointer.  A descriptor for a sequence-of represented this way is
defined in three steps:

  DEFPTRTYPE(ptr_basetype, basetype);
  DEFNULLTERMSEQOFTYPE(seqof_basetype, ptr_basetype);
  DEFPTRTYPE(ptr_seqof_basetype, seqof_basetype);

If the C type corresponding to basetype is "ctype", then the C type
corresponding to ptr_seqof_basetype will be "ctype **".  The middle
type sort of corresponds to "ctype *", but not exactly, as it
describes an object of variable size.

You can also use DEFNONEMPTYNULLTERMSEQOFTYPE in the second step.  In
this case, the encoder will throw an error if the sequence is empty.
For historical reasons, the decoder will *not* throw an error if the
sequence is empty, so the calling code must check before assuming a
first element is present.
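
A caller-side check for a null-terminated list might look like the
following sketch.  struct item and count_items are placeholders for
whatever element type and helper your code uses:

```c
#include <stddef.h>

/* Placeholder element type for the example. */
struct item {
    int value;
};

/*
 * Count the elements of a null-terminated array of pointers.  A NULL
 * list and a list containing only the terminator both count as empty.
 */
static size_t
count_items(struct item *const *list)
{
    size_t n = 0;

    if (list == NULL)
        return 0;
    while (list[n] != NULL)
        n++;
    return n;
}
```

Calling code would test count_items(list) > 0 (or list != NULL &&
list[0] != NULL) before touching list[0].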

The other way of representing sequence-of types is through a
combination of pointer and count.  This pattern is most often used for
compactness when the base type is an integer type.  A descriptor for a
sequence-of represented this way is defined using a counted type
descriptor:

  DEFCOUNTEDSEQOFTYPE(descname, lentype, basedesc)

where "lentype" is the C type of the length and "basedesc" is a
pointer wrapper for the sequence element type (*not* the element type
itself).  For example, an array of 32-bit signed integers is defined
as:

  DEFINTTYPE(int32, krb5_int32);
  DEFPTRTYPE(int32_ptr, int32);
  DEFCOUNTEDSEQOFTYPE(cseqof_int32, krb5_int32, int32_ptr);

To use a counted sequence-of type in a sequence, use DEFCOUNTEDTYPE:

  DEFCOUNTEDTYPE(descname, structuretype, ptrfield, lenfield, cdesc)

where "structuretype", "ptrfield", and "lenfield" are used to compute
the field offsets and type-check the structure fields, and "cdesc" is
the name of the counted type descriptor.

The combination of DEFCOUNTEDTYPE and DEFCTAGGEDTYPE can be
abbreviated using DEFCNFIELD:

  DEFCNFIELD(descname, structuretype, ptrfield, lenfield, tagnum, cdesc)


Tag wrappers
------------

We've previously covered DEFCTAGGEDTYPE and DEFCTAGGEDTYPE_IMPLICIT,
which are used to define context-specific tag wrappers.  There are
two other macros for creating tag wrappers.  The first is:

  DEFAPPTAGGEDTYPE(descname, tagnum, basedesc)

Use this macro to model an "[APPLICATION tagnum]" tag wrapper in an
ASN.1 module.

There is also a general tag wrapper macro:

  DEFTAGGEDTYPE(descname, class, construction, tag, implicit, basedesc)

where "class" is one of UNIVERSAL, APPLICATION, CONTEXT_SPECIFIC, or
PRIVATE, "construction" is one of PRIMITIVE or CONSTRUCTED, "tag" is
the tag number, "implicit" is 1 for an implicit tag and 0 for an
explicit tag, and "basedesc" is the wrapped type.  Note that
primitive vs. constructed is not a concept within the abstract ASN.1
type model, but is instead a concept used in DER.  In general, all
explicit tags should be constructed (but see the section on "Dirty
tricks" below).  The construction parameter is ignored for implicit
tags.


Choice types
------------

ASN.1 CHOICE types are represented in C using a signed integer
distinguisher and a union.  Modeling a choice type happens in three
steps:

1. Define type descriptors for each alternative of the choice,
typically using DEFCTAGGEDTYPE to create a tag wrapper for an existing
type.  There is no need to create offset type wrappers, as union
fields always have an offset of 0.  For example:

  DEFCTAGGEDTYPE(my_choice_0, 0, firstbasedesc);
  DEFCTAGGEDTYPE(my_choice_1, 1, secondbasedesc);

2. Assemble them into an array, similar to how you would for a
sequence, and use DEFCHOICETYPE to create a counted type descriptor:

  static const struct atype_info *my_choice_alternatives[] = {
      &k5_atype_my_choice_0, &k5_atype_my_choice_1
  };
  DEFCHOICETYPE(my_choice, union my_choice_choices, enum my_choice_selector,
                my_choice_alternatives);

The second and third parameters to DEFCHOICETYPE are the C types of
the union and distinguisher fields.

3. Wrap the counted type descriptor in a type descriptor for the
structure containing the distinguisher and union:

  DEFCOUNTEDTYPE_SIGNED(descname, structuretype, u, choice, my_choice);

The third and fourth parameters to DEFCOUNTEDTYPE_SIGNED are the field
names of the union and distinguisher fields within structuretype.

ASN.1 choice types may be defined to be extensible, or may not be.
Our model does not distinguish between the two cases.  Our decoder
treats all choice types as extensible.

Our encoder will throw an error if the distinguisher is not within the
range of valid offsets of the alternatives array.  Our decoder will
set the distinguisher to -1 if the tag of the ASN.1 value is not
matched by any of the alternatives, and will leave the union
zero-filled in that case.
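
The C layout and the unmatched-tag behavior can be sketched as
follows.  This is a toy model rather than the real decoder: the
structure, the enumerator names, and select_alternative are all
invented for the example, and the alternatives are assumed to carry
context tags 0 and 1:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The -1 enumerator is what the decoder reports for an unknown tag. */
enum my_choice_selector {
    MY_CHOICE_UNKNOWN = -1,
    MY_CHOICE_NUMBER = 0,
    MY_CHOICE_FLAG = 1
};

union my_choice_choices {
    int32_t number;
    int flag;
};

struct my_choice_holder {
    enum my_choice_selector choice;
    union my_choice_choices u;
};

/* Context tag number of each alternative, indexed by selector value. */
static const unsigned int alternative_tags[] = { 0, 1 };

/* Zero-fill the holder, then set the selector to the index of the
 * alternative matching tagnum, or leave it at -1 if none matches. */
static void
select_alternative(unsigned int tagnum, struct my_choice_holder *out)
{
    size_t i, n = sizeof(alternative_tags) / sizeof(alternative_tags[0]);

    memset(out, 0, sizeof(*out));
    out->choice = MY_CHOICE_UNKNOWN;
    for (i = 0; i < n; i++) {
        if (alternative_tags[i] == tagnum) {
            out->choice = (enum my_choice_selector)i;
            return;
        }
    }
}
```

Code consuming a decoded choice should therefore handle a selector of
-1 before reading any member of the union.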


Counted type descriptors
------------------------

Several times in earlier sections we've referred to the notion of
"counted type descriptors" without defining what they are.  Counted
type descriptors live in a separate namespace from normal type
descriptors, and specify a mapping between an ASN.1 type and two C
objects, one of them having integer type.  There are four kinds of
counted type descriptors, defined using the following macros:

  DEFCOUNTEDSTRINGTYPE(descname, ptrtype, lentype, encfn, decfn, tagnum)
  DEFCOUNTEDDERTYPE(descname, ptrtype, lentype)
  DEFCOUNTEDSEQOFTYPE(descname, lentype, baseptrdesc)
  DEFCHOICETYPE(descname, uniontype, distinguishertype, fields)

DEFCOUNTEDDERTYPE is described in the "Dirty tricks" section below.
The other three kinds of counted types have been covered previously.

Counted types are always used by wrapping them in a normal type
descriptor with one of these macros:

  DEFCOUNTEDTYPE(descname, structuretype, datafield, countfield, cdesc)
  DEFCOUNTEDTYPE_SIGNED(descname, structuretype, datafield, countfield, cdesc)

These macros are similar in concept to an offset type, only with two
offsets.  Use DEFCOUNTEDTYPE if the count field is unsigned,
DEFCOUNTEDTYPE_SIGNED if it is signed.


Defining encoder and decoder functions
--------------------------------------

After you have created a type descriptor for your types, you need to
create encoder or decoder functions for the ones you want calling code
to be able to process.  Do this with one of the following macros:

  MAKE_ENCODER(funcname, desc)
  MAKE_DECODER(funcname, desc)
  MAKE_CODEC(typename, desc)

MAKE_ENCODER and MAKE_DECODER allow you to choose function names.
MAKE_CODEC defines encoder and decoder functions with the names
"encode_typename" and "decode_typename".

If you are defining functions for a null-terminated sequence, use the
descriptor created with DEFNULLTERMSEQOFTYPE or
DEFNONEMPTYNULLTERMSEQOFTYPE, rather than the pointer to it.  This is
because encoder and decoder functions implicitly traffic in pointers
to the C object being encoded or decoded.

Encoder and decoder functions must be prototyped separately, either in
k5-int.h or in a subsidiary header included by it.  Encoder functions
have the prototype:

  krb5_error_code encode_typename(const ctype *rep, krb5_data **code_out);

where "ctype" is the C type corresponding to desc.  Decoder functions
have the prototype:

  krb5_error_code decode_typename(const krb5_data *code, ctype **rep_out);

Decoder functions allocate a container for the C type of the object
being decoded and return a pointer to it in *rep_out.


Writing test cases
------------------

New ASN.1 types in libkrb5 will typically only be accepted with test
cases.  Our current test framework lives in src/tests/asn.1.  Adding
new types to this framework involves the following steps:

1. Define an initializer for a sample value of the type in ktest.c,
named ktest_make_sample_typename().  Also define a contents-destructor
for it, named ktest_empty_typename().  Prototype these functions in
ktest.h.

2. Define an equality test for the type in ktest_equal.c.  Prototype
this in ktest_equal.h.  (This step is not necessary if the type has no
decoder.)

3. Add a test case to krb5_encode_test.c, following the examples of
existing test cases there.  Update reference_encode.out and
trval_reference.out to contain the output generated by your test case.

4. Add a test case to krb5_decode_test.c, following the examples of
existing test cases there, and using the output generated by your
encode test.

5. Add a test case to krb5_decode_leak.c, following the examples of
existing test cases there.

Following these steps will not ensure the correctness of your
translation of the ASN.1 module to macro invocations; it only lets us
detect unintentional changes to the encodings after they are defined.
To ensure that your translations are correct, you should extend
tests/asn.1/make-vectors.c and use "make test-vectors" to create
vectors using asn1c.


Dirty tricks
------------

In rare cases you may want to represent the raw DER encoding of a
value in the C structure.  If so, you can use DEFCOUNTEDDERTYPE (or
more likely, the existing der_data type descriptor).  The encoder and
decoder will throw errors if the wire encoding doesn't have a valid
outermost tag, so be sure to use valid DER encodings in your test
cases (see ktest_make_sample_algorithm_identifier for an example).

Conversely, the ASN.1 module may define an OCTET STRING wrapper around
a DER encoding which you want to represent as the decoded value.  (The
existing example of this is in PKINIT hash agility, where the
PartyUInfo and PartyVInfo fields of OtherInfo are defined as octet
strings which contain the DER encodings of KRB5PrincipalName values.)
In this case you can use a DEFTAGGEDTYPE wrapper like so:

  DEFTAGGEDTYPE(descname, UNIVERSAL, PRIMITIVE, ASN1_OCTETSTRING, 0,
                basedesc)


Limitations
-----------

We cannot currently encode or decode SET or SET OF types.

We cannot model self-referential types (like "MATHSET ::= SET OF
MATHSET").

If a sequence uses an optional field that is a choice field (without
a context tag wrapper), or an optional field that uses a stored DER
encoding (again, without a context tag wrapper), our decoder may
assign a value to the choice or stored-DER field when the correct
behavior is to skip that field and assign the value to a subsequent
field.  It should be very rare for ASN.1 modules to use choice or open
types this way.

For historical interoperability reasons, our decoder accepts the
indefinite length form for constructed tags, which is allowed by BER
but not DER.  We still require the primitive forms of basic scalar
types, however, so we do not accept all BER encodings of ASN.1 values.
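
For reference, the three length forms can be told apart from the first
length octet.  The sketch below is a standalone illustration, not the
decoder's actual code, and parse_length is a made-up name:

```c
#include <stddef.h>

/*
 * Parse the length octets at buf.  Returns the number of octets
 * consumed, or 0 on error.  For the definite forms, *len_out receives
 * the contents length.  For the indefinite form (allowed by BER but
 * not DER), *indefinite is set to 1 and *len_out is 0.
 */
static size_t
parse_length(const unsigned char *buf, size_t buflen, size_t *len_out,
             int *indefinite)
{
    size_t i, n, len = 0;

    *indefinite = 0;
    *len_out = 0;
    if (buflen < 1)
        return 0;
    if ((buf[0] & 0x80) == 0) {     /* short form: one octet */
        *len_out = buf[0];
        return 1;
    }
    n = buf[0] & 0x7F;
    if (n == 0) {                   /* indefinite form: 0x80 */
        *indefinite = 1;
        return 1;
    }
    if (n > sizeof(size_t) || buflen < 1 + n)
        return 0;
    for (i = 0; i < n; i++)         /* long form: n length octets */
        len = (len << 8) | buf[1 + i];
    *len_out = len;
    return 1 + n;
}
```

In the indefinite form, the contents run until an end-of-contents
marker (two zero octets) instead of for a stated number of bytes.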


Debugging
---------

If you are looking at a stack trace with a bunch of ASN.1 encoder or
decoder calls at the top, here are some notes that might help with
debugging:

1. You may have noticed that the entry point into the encoder is
defined by a macro like MAKE_CODEC.  Don't worry about this; those
macros just define thin wrappers around k5_asn1_full_encode and
k5_asn1_full_decode.  If you are stepping through code and hit a
wrapper function, just enter "step" to get into the actual encoder or
decoder function.

2. If you are in the encoder, look for stack frames in
encode_sequence(), and print the value of i within those stack frames.
You should be able to subtract 1 from those values and match them up
with the sequence field offsets in asn1_k_encode.c for the type being
encoded.  For example, if an as-req is being encoded and the i values
(starting with the one closest to encode_krb5_as_req) are 4, 2, and 2,
you could match those up as follows:

* as_req_encode wraps untagged_as_req, whose field at offset 3 is the
  descriptor for kdc_req_4, which wraps kdc_req_body.

* kdc_req_body is a function wrapper around kdc_req_hack, whose field
  at offset 1 is the descriptor for req_body_1, which wraps
  opt_principal.

* opt_principal wraps principal, which wraps principal_data, whose
  field at offset 1 is the descriptor for princname_1.

* princname_1 is a sequence of general strings represented in the data
  and length fields of the krb5_principal_data structure.

So the problem would likely be in the data components of the client
principal in the kdc_req structure.

3. If you are in the decoder, look for stack frames in
decode_sequence(), and again print the values of i.  You can match
these up just as above, except without subtracting 1 from the i
values.