codebook doc -> CVS

author Monty <xiphmont@xiph.org>

Tue, 16 Jul 2002 09:24:55 +0000 (09:24 +0000)

committer Monty <xiphmont@xiph.org>

Tue, 16 Jul 2002 09:24:55 +0000 (09:24 +0000)
author Monty <xiphmont@xiph.org>
Tue, 16 Jul 2002 09:24:55 +0000 (09:24 +0000)
committer Monty <xiphmont@xiph.org>
Tue, 16 Jul 2002 09:24:55 +0000 (09:24 +0000)
diff --git a/doc/vorbis-spec-codebook.html b/doc/vorbis-spec-codebook.html

new file mode 100644 (file)

index 0000000..ad69f57
--- /dev/null
+++ b/doc/vorbis-spec-codebook.html
@@ -0,0 +1,399 @@
+<HTML><HEAD><TITLE>xiph.org: Ogg Vorbis documentation</TITLE>
+<BODY bgcolor="#ffffff" text="#202020" link="#006666" vlink="#000000">
+<nobr><img src="white-ogg.png"><img src="vorbisword2.png"></nobr><p>
+
+<h1><font color=#000070>
+Ogg Vorbis I format specification: probability model and codebooks
+</font></h1>
+
+<em>Last update to this document: July 16, 2002</em><br>
+
+<h1>Overview</h1>
+
+Unlike practically every other mainstream audio codec, Vorbis has no
+statically configured probability model, instead packing all entropy
+decoding configuration, VQ and Huffman, into the bitstream itself in
+the third header, the codec setup header.  This packed configuration
+consists of multiple 'codebooks', each containing a specific
+Huffman-equivalent representation for decoding compressed codewords as
+well as an optional lookup table of output vector values to which a
+decoded Huffman value is applied as an offset, generating the final
+decoded output corresponding to a given compressed codeword.
+
+<h2>bitwise operation</h2>
+
+The codebook mechanism is built on top of the <a
+href="vorbis-spec-bitpack.html">Vorbis bitpacker</a>; both the
+codebooks themselves and the codewords they decode are unrolled from a
+packet as a series of arbitrary-width values read from the stream
+according to the <a href="vorbis-spec-bitpack.html">Vorbis bitpacking
+convention</a>.
+
+<h1>Packed Codebook Format</h1>
+
+For purposes of the below examples, we assume that the storage
+system's native byte width is eight bits.  This is not universally
+true; see <a href="vorbis-spec-bitpack.html">the Vorbis bitpacking
+convention</a> document for discussion relating to non-eight-bit
+bytes.<p>
+
+<h2>codebook decode</h2>
+
+A codebook begins with a 24 bit sync pattern, 0x564342:
+
+<pre>
+byte 0: [ 0 1 0 0 0 0 1 0 ] (0x42)
+byte 1: [ 0 1 0 0 0 0 1 1 ] (0x43)
+byte 2: [ 0 1 0 1 0 1 1 0 ] (0x56)
+</pre>
+
+16 bit <tt>[codebook_dimensions]</tt> and 24 bit <tt>[codebook_entries]</tt> fields:
+
+<pre>
+
+byte 3: [ X X X X X X X X ] 
+byte 4: [ X X X X X X X X ] [codebook_dimensions] (16 bit unsigned)
+
+byte 5: [ X X X X X X X X ] 
+byte 6: [ X X X X X X X X ] 
+byte 7: [ X X X X X X X X ] [codebook_entries] (24 bit unsigned)
+
+</pre>
+
+Next is the <tt>[ordered]</tt> bit flag:
+
+<pre>
+
+byte 8: [               X ] [ordered] (1 bit)
+
+</pre>
+
+We now read the list of codeword lengths; each entry (numbering a
+total of <tt>[codebook_entries]</tt>) is assigned a codeword length.  However,
+decode of lengths is according to whether the <tt>[ordered]</tt> flag is
+set or unset.
+
+<ul>
+
+  <li>If the <tt>[ordered]</tt> flag is unset, the codeword list is not
+  length ordered and the decoder needs to read each codeword length
+  one-by-one.<p> The decoder first reads one additional bit flag, the
+  <tt>[sparse]</tt> flag.  This flag determines whether or not the
+  codebook contains unused entries that are not to be included in the
+  codeword decode tree:<p>
+
+<pre>
+byte 8: [             X 1 ] [sparse] flag (1 bit)
+</pre>
+
+  The decoder now performs for each of the <tt>[codebook_entries]</tt> code book entries:
+
+<pre>
+  
+  1) if([sparse] is set){
+
+         2) [flag] = read one bit;
+         3) if([flag] is set){
+
+              4) [length] = read a five bit unsigned integer;
+              5) codeword length for this entry is [length]+1;
+
+            } else {
+
+              6) this entry is unused.  mark it as such.
+
+            }
+
+     } else the sparse flag is not set {
+
+        7) [length] = read a five bit unsigned integer;
+        8) the codeword length for this entry is [length]+1;
+        
+     }
+
+</pre>
+
+  <li>If the <tt>[ordered]</tt> flag is set, the codeword list for this
+  codebook is encoded in ascending length order.  Rather than reading
+  a length for every codeword, the encoder reads the number of
+  codewords per length.  That is, beginning at entry zero:
+
+<pre>
+  1) [current_entry] = 0;
+  2) [current_length] = read a five bit unsigned integer and add one;
+  3) [number] = read <a
+  href="helper.html#ilog">ilog</a>([codebook_entries] - [current_entry]) bits as an unsigned integer
+  4) set the entries [current_entry] through [current_entry]+[n]-1 
+     inclusive of the [codebook_codeword_lengths] array to [current_length]
+  5) set [current_entry] to [number] + [current_entry]
+  6) increment [current_length]
+  7) if [current_entry] is greater than [codebook_entries] ERROR CONDITION; the decoder will
+     not be able to read this stream.
+  8) if [current_entry] is less than [codebook_entries], repeat process starting at 3)
+  9) done.
+</pre>
+
+</ul>
+
+After all codeword lengths have been decoded, the decoder reads the
+vector lookup table.  Vorbis I supports three lookup types:
+<ol><li>No lookup
+<li>Implicitly populated value mapping (lattice VQ)
+<li>Explicitly populated value mapping (tessellated or 'foam' VQ)
+</ol>
+
+The lookup table type is read as a four bit unsigned integer:
+<pre>
+  1) [codebook_lookup_type] = read four bits as an unsigned integer
+</pre>
+
+Codebook decode precedes according to <tt>[codebook_lookup_type]</tt>:
+<ul>
+<li> Lookup type zero indicates no lookup to be read.  Proceed past
+lookup decode.  
+
+<li> Lookup types one and two are similar, differing only in the
+number of lookup values to be read.  Lookup type one reads a list of
+values that are permuted in a set pattern to build a list of vectors,
+each vector of order <tt>[codebook_dimensions]</tt> scalars.  Lookup
+type two builds the same vector list, but reads each scalar for each
+vector explicitly, rather than building vectors from a smaller list of
+possible scalar values.  Lookup decode proceeds as follows:
+
+<pre>
+  1) [codebook_minimum_value] = <a href="helper.html#float32_unpack">float32_unpack</a>( read 32 bits as an unsigned integer) 
+  2) [codebook_delta_value] = <a href="helper.html#float32_unpack">float32_unpack</a>( read 32 bits as an unsigned integer) 
+  3) [codebook_value_bits] = read 4 bits as an unsigned integer and add one
+  4) [codebook_sequence_p] = read 1 bit as a boolean flag
+
+  if ( [codebook_lookup_type] is 1 ) {
+   
+     5) [codebook_lookup_values] = <a href="helper.html#lookup1_values">lookup1_values</a>( <tt>[codebook_entries]</tt>, <tt>[codebook_dimensions]</tt> )
+
+  } else {
+
+     6) [codebook_lookup_values] = <tt>[codebook_entries]</tt> * <tt>[codebook_dimensions]</tt>
+
+  }
+
+  7) read a total of [codebook_lookup_values] unsigned integers of [codebook_value_bits] each; 
+     store these in order in the array [codebook_multiplicands]
+
+</pre>
+<li>A <tt>[codebook_lookup_type]</tt> of greater than two is reserved and indicates
+a stream that's not decodable by the specification in this document.
+
+</ul>
+
+An 'end of packet' during any read operation in the above steps is
+considered an error condition rendering the stream undecodable.<p>
+
+<h2>Huffman decision tree representation</h2>
+
+The <tt>[codebook_codeword_lengths]</tt> array and
+<tt>[codebook_entries]</tt> value uniquely define the Huffman decision
+tree used for entropy decoding.<p>
+
+Briefly, each used codebook entry (recall that length-unordered
+codebooks support unused codeword entries) is assigned, in order, the
+lowest valued unused binary Huffman codeword possible.  Assume the
+following codeword length list:<p>
+
+<pre>
+entry 0: length 2
+entry 1: length 4
+entry 2: length 4
+entry 3: length 4
+entry 4: length 4
+entry 5: length 2
+entry 6: length 3
+entry 7: length 3
+</pre>
+
+Assigning codewords in order (lowest possible value of the appropriate
+length to highest) results in the following codeword list:<p>
+
+<pre>
+entry 0: length 2 codeword 00
+entry 1: length 4 codeword 0100
+entry 2: length 4 codeword 0101
+entry 3: length 4 codeword 0110
+entry 4: length 4 codeword 0111
+entry 5: length 2 codeword 10
+entry 6: length 3 codeword 110
+entry 7: length 3 codeword 111
+</pre>
+
+<em>note that unlike most binary numerical values in this document, we
+intend the above codewords to be read and used bit by bit from left to
+right, thus the codeword '001' is the bit string 'zero, zero, one'.
+When determining 'lowest possible value' in the assignment definition
+above, the leftmost bit is the MSb.</em><p>
+
+It is clear that the codeword length list represents a Huffman
+decision tree with the entry numbers equivalent to the leaves numbered
+left-to-right:<p>
+
+<img src="hufftree.png"><p>
+
+As we assign codewords in order, we see that each choice constructs a
+new leaf in the leftmost possible position.<p>
+
+Note that it's possible to underspecify or overspecify a Huffman tree
+via the length list.  In the above example, if codeword seven were
+eliminated, it's clear that the tree is unfinished:<p>
+
+<img src="hufftree-under.png"><p>
+
+Similarly, in the original codebook, it's clear that the tree is fully
+populated and a ninth codeword is impossible.  Both underspecified and
+overspecified trees are an error condition rendering the stream
+undecodable.<p>
+
+Codebook entries marked 'unused' are simply skipped in the assigning
+process.  They have no codeword and do not appear in the decision
+tree, thus it's impossible for any bit pattern read from the stream to
+decode to that entry number.<p>
+
+<h2>VQ lookup table vector representation</h2>
+
+Decoding the VQ lookup table vectors relies on the following values:
+<ul>
+<li> the <tt>[codebook_multiplicands]</tt> array
+<li> <tt>[codebook_minimum_value]</tt>
+<li> <tt>[codebook_delta_value]</tt>
+<li> <tt>[codebook_sequence_p]</tt>
+<li> <tt>[codebook_lookup_type]</tt>
+<li> <tt>[codebook_entries]</tt>
+<li> <tt>[codebook_dimensions]</tt>
+<li> <tt>[codebook_lookup_values]</tt>
+</tt>
+</ul>
+
+Decoding a specific vector in the vector lookup table proceeds
+according to <tt>[codebook_lookup_type]</tt>.<p>
+
+<h3>Vector value decode: Lookup type 1</h3>
+
+Lookup type one specifies a lattice VQ lookup table built
+algorithmically from a list of scalar values.  The scalar values of a
+specific vector entry are calculated as follows, assuming
+<tt>[lookup_offset]</tt> specifies the vector to be
+calculated:<p>
+
+<pre>
+  1) [last] = zero;
+  2) [index_divisor] = one;
+  3) iterate [codebook_dimensions] times, once for each scalar value in the vector {
+       
+       4) [multiplicand_offset] = ( [lookup_offset] divided by [index_divisor] using integer 
+          division ) integer modulo [codebook_lookup_values]
+
+       5) set this iteration's scalar value = 
+            ( [codebook_multiplicands] array element number [multiplicand_offset] ) *
+            [codebook_delta_value] + [codebook_minimum_value] + [last];
+
+       6) if ( [codebook_sequence_p] is set ) then set [last] = this iteration's scalar value
+
+       7) [index_divisor] = [index_divisor] * [codebook_lookup_values]
+
+     }
+ 
+  8) vector calculation completed.
+</pre>
+
+<h3>Vector value decode: Lookup type 2</h3>
+
+Lookup type two specifies a VQ lookup table in which each scalar in
+each vector is explicitly set by the <tt>[codebook_multiplicands]</tt>
+array in a one-to-one mapping.  The scalar values of a specific vector
+entry in the lookup table are calculated as follows, assuming
+<tt>[lookup_offset]</tt> specifies the vector to be calculated:<p>
+
+<pre>
+  1) [last] = zero;
+  2) [multiplicand_offset] = [lookup_offset] * [codebook_dimensions]
+  3) iterate [codebook_dimensions] times, once for each scalar value in the vector {
+
+       4) set this iteration's scalar value = 
+            ( [codebook_multiplicands] array element number [multiplicand_offset] ) *
+            [codebook_delta_value] + [codebook_minimum_value] + [last];
+
+       5) if ( [codebook_sequence_p] is set ) then set [last] = this iteration's scalar value
+
+       6) increment [multiplicand_offset]
+
+     }
+ 
+  7) vector calculation completed.
+</pre>
+
+<h1>Use of the codebook abstraction</h1>
+
+The decoder uses the codebook abstraction much as it does the
+bit-unpacking convention; a specific codebook reads a
+codeword from the bitstream, decoding it into an entry number, and then
+returns that entry number to the decoder (when used in a scalar
+entropy coding context), or uses that entry number as an offset into
+the VQ lookup table, returning a vector of values (when used in a context
+desiring a VQ value). Scalar or VQ context is always explicit; any call
+to the codebook mechanism requests either a scalar entry number or a
+lookup vector.<p>
+
+Note that VQ lookup type zero indicates that there is no lookup table;
+requesting decode using a codebook of lookup type 0 in any context
+expecting a vector return value (even in a case where a vector of
+dimension one) is forbidden.  If decoder setup or decode requests such
+an action, that is an error condition rendering the packet
+undecodable.<p>
+
+Using a codebook to read from the packet bitstream consists first of
+reading and decoding the next codeword in the bitstream. The decoder
+reads bits until the accumulated bits match a codeword in the
+codebook.  This process can be though of as logically walking the
+Huffman decode tree by reading one bit at a time from the bitstream,
+and using the bit as a decision boolean to take the 0 branch (left in
+the above examples) or the 1 branch (right in the above examples).
+Walking the tree finishes when the decode process hits a leaf in the
+decision tree; the result is the entry number corresponding to that
+leaf.  Reading past the end of a packet propagates the 'end-of-stream'
+condition to the decoder.<p>
+
+When used in a scalar context, the resulting codeword entry is the
+desired return value.<p>
+
+When used in a VQ context, the codeword entry number is used as an
+offset into the VQ lookup table.  The value returned to the decoder is
+the vector of scalars corresponding to this offset.<p>
+
+<hr>
+<a href="http://www.xiph.org/">
+<img src="white-xifish.png" align=left border=0>
+</a>
+<font size=-2 color=#505050>
+
+Ogg is a <a href="http://www.xiph.org">Xiph.org Foundation</a> effort
+to protect essential tenets of Internet multimedia from corporate
+hostage-taking; Open Source is the net's greatest tool to keep
+everyone honest. See <a href="http://www.xiph.org/about.html">About
+the Xiph.org Foundation</a> for details.
+<p>
+
+Ogg Vorbis is the first Ogg audio CODEC.  Anyone may freely use and
+distribute the Ogg and Vorbis specification, whether in a private,
+public or corporate capacity.  However, the Xiph.org Foundation and
+the Ogg project (xiph.org) reserve the right to set the Ogg Vorbis
+specification and certify specification compliance.<p>
+
+Xiph.org's Vorbis software CODEC implementation is distributed under a
+BSD-like license.  This does not restrict third parties from
+distributing independent implementations of Vorbis software under
+other licenses.<p>
+
+Ogg, Vorbis, Xiph.org Foundation and their logos are trademarks (tm)
+of the <a href="http://www.xiph.org/">Xiph.org Foundation</a>.  These
+pages are copyright (C) 1994-2002 Xiph.org Foundation. All rights
+reserved.<p>
+
+</body>
+
author	Monty <xiphmont@xiph.org>
	Tue, 16 Jul 2002 09:24:55 +0000 (09:24 +0000)
committer	Monty <xiphmont@xiph.org>
	Tue, 16 Jul 2002 09:24:55 +0000 (09:24 +0000)