Home

01 Jun 2014

ASN.1 and DER

ASN.1

ASN.1 stands for abstract syntax notation one. It is a way of describing rules for representing information between different systems in a networked environment. It creates consistency and compatibility between varying systems by defining an abstract syntax for the information being recorded that does not limit the way it can be encoded. In order to send information over a network, it needs to be converted into a stream of bits. Data stored as ASN.1 consist of sequences of bytes. There are many different encoding rules for ASN.1. Some groups of rules are “Basic Encoding Rules (BER)”, “Distinguished Encoding Rules (DER)”, and “XML Encoding Rules (XER)”.

DER

DER is a subset of BER that creates one consistent method of representing any value as an octet string. (It also states that for simple string types, the primitive method depicting a definite-length must be used whereas for structured types, the constructed method shall be used, but more on this later)

The DER format consists of storing information in triplets in the following order: a type tag, length of data, value of data. As for the type tag, simple types include integers and strings whereas structured types, such as SEQUENCE, act like a container and consist of components within the type. Types also belong to a class. For example, there are “Universal” types that are defined by X.208 that are the same for all applications. Some examples of universal types, listed in HEX are:

INTEGER – 02

NULL – 05

OBJECT IDENTIFIER – 06

UTF8String – 12

IA5String – 16

Class types can be universal, application, context-specific, and private. “Application” types are defined per application. Lets create an example of a user information structure that we will encode using ASN.1. Conceptually, ASN.1 follows this kind of notation:

World-Schema DEFINITIONS AUTOMATIC TAGS ::= 
BEGIN
  UserInfo ::= SEQUENCE {
     userID INTEGER,
     token UTF8String
  }
END

::= is the ASN.1 assignment operator.

Given the notation above, we can now write an example instance of the user information object:

myUser UserInfo ::= 
{  
    userID 111,
    token "qwerty"
}

That same information when encoded in an XER version of XML would look like this:

<?xml version="1.0" encoding="UTF-8"?>
<UserInfo>
  <userID>111</userID>
  <token>qwerty</token>
</UserInfo>

Types

In the XML format, the information is human-readable. In DER, it would instead use the triplet scheme that contains the type, length and value. So for our example where the user ID is 111, 111 is 6F in HEX so the output would look like this: 02 01 6F

The 02, that’s the INTEGER type, then the length of the integer (1), then the value of 6F (111 in decimal).

Types can be represented as either primitive or constructed. Simple types where a definite length of data is known ahed of time will use the PRIMITIVE method of representation. Structured types use the CONSTRUCTED method which can include an end-of-content octet (two octets of 00 00) when the length is unknown. For primitive methods, the class and tag are stored together in the same octet. Tag numbers between 0 and 30 store the class in bits 8 and 7. Bit 6 determines if the primitive or constructed method is being used; 0 for primitive and 1 for constructed. Here it is a 0 to declare the primitive method is being used. Bits 5 to 1 will store the tag number.

NOTE: For primitive methods when tag numbers are 31 or greater, two or more octets are used such that the first octet is the same as already described except that the 5-1 bits are all set to 1. Then the second and onward octets give the tag number in base 128 where the most significant digit is first. Bit 8 of each of the octets are set to 1 except the last octet where it is set to 0 to denote the end of the octets.

For constructed definite-length types: Bits 8 and 7 are 00 to depict universal. Bit 6 is 1 indicating the encoding is constructed. Bits 5-1 represent the tag number.

For constructed types of an indefinite length, start with the same structure as above, followed by a length octet of 80, then the contents, and ending with 00 00 to denote the end of the octets.

For tagging, we can declare a new tag that is not from the universal class of tags, in other words, a context-specific class. This can still be used to tell the parser what type of value it is but also uniquely tags or names the value. In the XML format, for example, each entity is named using tags such as

<userID></userID>

In ASN.1 it is not, which is why the non-universal tags may need to be defined and used. This is especially true when there are duplicates of the same type in a structure that are optional. If a structure contained two optional integers and only one integer was present, the parser would not know which of the optional variables that integer was for. By defending a context-specific tag for each type, we can distinguish between multiple entries of the same data type.

When the length of the data type is known by both sides of the communication, we can drop the tag type and just pass the encoding for the specific tag. This creates two types of tagging methods, implicit and explicit. Implicit tagging tags or distinguishes the value but the data type is implicit because it is assumed both sides know the size. Explicit tagging also tags or distinguishes the value but also explicitly sets a type for the data. Explicit tagging therefor requires two additional bytes for each tag. So to reduce overhead where both sides know what the custom size should be, then we can omit the data type and save space by using implicit tagging.

Encoding an Object

So back to our example, lets encode our object which as you recall includes a userID integer of 111 and a token string of “qwerty”. Lets start with the tags. Bits 8 and 7 will be set to 10 to depict that context-specific (user-defined) tagging will be used. Bit 6 will be set to 0 to indicate that the type is primitive. 5-1 will include the implicit tags we will use. The first tag will be 0 for the userID integer tag and the second will be 1 for the token entry, so

1000 0000 = 80 HEX is the first tag.

1000 0001 = 81 HEX is the second tag.

Now that we have this information, we can fill out the full ASN.1 representation. Given our information we will have this format in ASN.1 notation:

UserInfo SEQUENCE: tag = [UNIVERSAL 16] constructed; length = 11
  userID INTEGER: tag = [0] primitive; length = 1 111
  token UTF8String: tag = [1] primitive; length = 6 0x717765727479

Lets convert the entire structure, in order, into HEX bytes. First comes the sequence tag, followed by the length:

30 – SEQUENCE type tag

0B – length of 11

Next we have our implicit tag for the integer value:

80 – first tag for userID

It’s length is 1 and the actual data is 111:

01 – the length of 1

6F – the value 111

Then we have our second implicit tag for the token, a length of 6, and the data:

81 – the second implicit tag for the token entry

06 – a length of 6

717765727479 – encoded string data

Putting that together in HEX octets makes 300B8001 6F810671 77657274 79

This can now be sent across the network as a stream of bits.

More Information

If you would like to test out your own objects and conversions, there is a great online tool here

A good ASN.1 C Compiler library that you can use in your applications can be found here.

For more information about ASN.1, check out these references:

A Layman’s Guide to a Subset of ASN.1, BER, and DER

Computer Networks and Open Systems