doc/xml/08-residue.xml

   1 <?xml version="1.0" standalone="no"?>
   2 <!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
   3                 "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
   4
   5 ]>
   6
   7 <section id="vorbis-spec-residue">
   8 <sectioninfo>
   9  <releaseinfo>
  10   $Id$
  11  </releaseinfo>
  12 </sectioninfo>
  13 <title>Residue setup and decode</title>
  14
  15
  16 <section>
  17 <title>Overview</title>
  18
  19 <para>
  20 A residue vector represents the fine detail of the audio spectrum of
  21 one channel in an audio frame after the encoder subtracts the floor
  22 curve and performs any channel coupling.  A residue vector may
  23 represent spectral lines, spectral magnitude, spectral phase or
  24 hybrids as mixed by channel coupling.  The exact semantic content of
  25 the vector does not matter to the residue abstraction.</para>
  26
  27 <para>
  28 Whatever the exact qualities, the Vorbis residue abstraction codes the
  29 residue vectors into the bitstream packet, and then reconstructs the
  30 vectors during decode.  Vorbis makes use of three different encoding
  31 variants (numbered 0, 1 and 2) of the same basic vector encoding
  32 abstraction.</para>
  33
  34 </section>
  35
  36 <section>
  37 <title>Residue format</title>
  38
  39 <para>
  40 Residue format partitions each vector in the vector bundle into chunks,
  41 classifies each chunk, encodes the chunk classifications and finally
  42 encodes the chunks themselves using the the specific VQ arrangement
  43 defined for each selected classification.
  44 The exact interleaving and partitioning vary by residue encoding number,
  45 however the high-level process used to classify and encode the residue
  46 vector is the same in all three variants.</para>
  47
  48 <para>
  49 A set of coded residue vectors are all of the same length.  High level
  50 coding structure, ignoring for the moment exactly how a partition is
  51 encoded and simply trusting that it is, is as follows:</para>
  52
  53 <para>
  54 <itemizedlist>
  55 <listitem><para>Each vector is partitioned into multiple equal sized chunks
  56 according to configuration specified.  If we have a vector size of
  57 <emphasis>n</emphasis>, a partition size <emphasis>residue_partition_size</emphasis>, and a total
  58 of <emphasis>ch</emphasis> residue vectors, the total number of partitioned chunks
  59 coded is <emphasis>n</emphasis>/<emphasis>residue_partition_size</emphasis>*<emphasis>ch</emphasis>.  It is
  60 important to note that the integer division truncates.  In the below
  61 example, we assume an example <emphasis>residue_partition_size</emphasis> of 8.</para></listitem>
  62
  63 <listitem><para>Each partition in each vector has a classification number that
  64 specifies which of multiple configured VQ codebook setups are used to
  65 decode that partition.  The classification numbers of each partition
  66 can be thought of as forming a vector in their own right, as in the
  67 illustration below.  Just as the residue vectors are coded in grouped
  68 partitions to increase encoding efficiency, the classification vector
  69 is also partitioned into chunks.  The integer elements of each scalar
  70 in a classification chunk are built into a single scalar that
  71 represents the classification numbers in that chunk.  In the below
  72 example, the classification codeword encodes two classification
  73 numbers.</para></listitem>
  74
  75 <listitem><para>The values in a residue vector may be encoded monolithically in a
  76 single pass through the residue vector, but more often efficient
  77 codebook design dictates that each vector is encoded as the additive
  78 sum of several passes through the residue vector using more than one
  79 VQ codebook.  Thus, each residue value potentially accumulates values
  80 from multiple decode passes.  The classification value associated with
  81 a partition is the same in each pass, thus the classification codeword
  82 is coded only in the first pass.</para></listitem>
  83
  84 </itemizedlist>
  85 </para>
  86
  87 <mediaobject>
  88 <imageobject>
  89  <imagedata fileref="residue-pack.png" format="PNG"/>
  90 </imageobject>
  91 <textobject>
  92  <phrase>[illustration of residue vector format]</phrase>
  93 </textobject>
  94 </mediaobject>
  95
  96 </section>
  97
  98 <section><title>residue 0</title>
  99
 100 <para>
 101 Residue 0 and 1 differ only in the way the values within a residue
 102 partition are interleaved during partition encoding (visually treated
 103 as a black box--or cyan box or brown box--in the above figure).</para>
 104
 105 <para>
 106 Residue encoding 0 interleaves VQ encoding according to the
 107 dimension of the codebook used to encode a partition in a specific
 108 pass.  The dimension of the codebook need not be the same in multiple
 109 passes, however the partition size must be an even multiple of the
 110 codebook dimension.</para>
 111
 112 <para>
 113 As an example, assume a partition vector of size eight, to be encoded
 114 by residue 0 using codebook sizes of 8, 4, 2 and 1:</para>
 115
 116 <programlisting>
 117
 118             original residue vector: [ 0 1 2 3 4 5 6 7 ]
 119
 120 codebook dimensions = 8  encoded as: [ 0 1 2 3 4 5 6 7 ]
 121
 122 codebook dimensions = 4  encoded as: [ 0 2 4 6 ], [ 1 3 5 7 ]
 123
 124 codebook dimensions = 2  encoded as: [ 0 4 ], [ 1 5 ], [ 2 6 ], [ 3 7 ]
 125
 126 codebook dimensions = 1  encoded as: [ 0 ], [ 1 ], [ 2 ], [ 3 ], [ 4 ], [ 5 ], [ 6 ], [ 7 ]
 127
 128 </programlisting>
 129
 130 <para>
 131 It is worth mentioning at this point that no configurable value in the
 132 residue coding setup is restricted to a power of two.</para>
 133
 134 </section>
 135
 136 <section><title>residue 1</title>
 137
 138 <para>
 139 Residue 1 does not interleave VQ encoding.  It represents partition
 140 vector scalars in order.  As with residue 0, however, partition length
 141 must be an integer multiple of the codebook dimension, although
 142 dimension may vary from pass to pass.</para>
 143
 144 <para>
 145 As an example, assume a partition vector of size eight, to be encoded
 146 by residue 0 using codebook sizes of 8, 4, 2 and 1:</para>
 147
 148 <programlisting>
 149
 150             original residue vector: [ 0 1 2 3 4 5 6 7 ]
 151
 152 codebook dimensions = 8  encoded as: [ 0 1 2 3 4 5 6 7 ]
 153
 154 codebook dimensions = 4  encoded as: [ 0 1 2 3 ], [ 4 5 6 7 ]
 155
 156 codebook dimensions = 2  encoded as: [ 0 1 ], [ 2 3 ], [ 4 5 ], [ 6 7 ]
 157
 158 codebook dimensions = 1  encoded as: [ 0 ], [ 1 ], [ 2 ], [ 3 ], [ 4 ], [ 5 ], [ 6 ], [ 7 ]
 159
 160 </programlisting>
 161
 162 </section>
 163
 164 <section><title>residue 2</title>
 165
 166 <para>
 167 Residue type two can be thought of as a variant of residue type 1.
 168 Rather than encoding multiple passed-in vectors as in residue type 1,
 169 the <emphasis>ch</emphasis> passed in vectors of length <emphasis>n</emphasis> are first
 170 interleaved and flattened into a single vector of length
 171 <emphasis>ch</emphasis>*<emphasis>n</emphasis>.  Encoding then proceeds as in type 1. Decoding is
 172 as in type 1 with decode interleave reversed. If operating on a single
 173 vector to begin with, residue type 1 and type 2 are equivalent.</para>
 174
 175 <mediaobject>
 176 <imageobject>
 177  <imagedata fileref="residue2.png" format="PNG"/>
 178 </imageobject>
 179 <textobject>
 180  <phrase>[illustration of residue type 2]</phrase>
 181 </textobject>
 182 </mediaobject>
 183
 184 </section>
 185
 186 <section>
 187 <title>Residue decode</title>
 188
 189 <section><title>header decode</title>
 190
 191 <para>
 192 Header decode for all three residue types is identical.</para>
 193 <programlisting>
 194   1) [residue_begin] = read 24 bits as unsigned integer
 195   2) [residue_end] = read 24 bits as unsigned integer
 196   3) [residue_partition_size] = read 24 bits as unsigned integer and add one
 197   4) [residue_classifications] = read 6 bits as unsigned integer and add one
 198   5) [residue_classbook] = read 8 bits as unsigned integer
 199 </programlisting>
 200
 201 <para>
 202 <varname>[residue_begin]</varname> and
 203 <varname>[residue_end]</varname> select the specific sub-portion of
 204 each vector that is actually coded; it implements akin to a bandpass
 205 where, for coding purposes, the vector effectively begins at element
 206 <varname>[residue_begin]</varname> and ends at
 207 <varname>[residue_end]</varname>.  Preceding and following values in
 208 the unpacked vectors are zeroed.  Note that for residue type 2, these
 209 values as well as <varname>[residue_partition_size]</varname>apply to
 210 the interleaved vector, not the individual vectors before interleave.
 211 <varname>[residue_partition_size]</varname> is as explained above,
 212 <varname>[residue_classifications]</varname> is the number of possible
 213 classification to which a partition can belong and
 214 <varname>[residue_classbook]</varname> is the codebook number used to
 215 code classification codewords.  The number of dimensions in book
 216 <varname>[residue_classbook]</varname> determines how many
 217 classification values are grouped into a single classification
 218 codeword.  Note that the number of entries and dimensions in book
 219 <varname>[residue_classbook]</varname>, along with
 220 <varname>[residue_classifications]</varname>, overdetermines to
 221 possible number of classification codewords.  If
 222 <varname>[residue_classifications]</varname>^<varname>[residue_classbook]</varname>.dimensions
 223 exceeds <varname>[residue_classbook]</varname>.entries, the
 224 bitstream should be regarded to be undecodable. </para>
 225
 226 <para>
 227 Next we read a bitmap pattern that specifies which partition classes
 228 code values in which passes.</para>
 229
 230 <programlisting>
 231   1) iterate [i] over the range 0 ... [residue_classifications]-1 {
 232
 233        2) [high_bits] = 0
 234        3) [low_bits] = read 3 bits as unsigned integer
 235        4) [bitflag] = read one bit as boolean
 236        5) if ( [bitflag] is set ) then [high_bits] = read five bits as unsigned integer
 237        6) vector [residue_cascade] element [i] = [high_bits] * 8 + [low_bits]
 238      }
 239   7) done
 240 </programlisting>
 241
 242 <para>
 243 Finally, we read in a list of book numbers, each corresponding to
 244 specific bit set in the cascade bitmap.  We loop over the possible
 245 codebook classifications and the maximum possible number of encoding
 246 stages (8 in Vorbis I, as constrained by the elements of the cascade
 247 bitmap being eight bits):</para>
 248
 249 <programlisting>
 250   1) iterate [i] over the range 0 ... [residue_classifications]-1 {
 251
 252        2) iterate [j] over the range 0 ... 7 {
 253
 254             3) if ( vector [residue_cascade] element [i] bit [j] is set ) {
 255
 256                  4) array [residue_books] element [i][j] = read 8 bits as unsigned integer
 257
 258                } else {
 259
 260                  5) array [residue_books] element [i][j] = unused
 261
 262                }
 263           }
 264       }
 265
 266   6) done
 267 </programlisting>
 268
 269 <para>
 270 An end-of-packet condition at any point in header decode renders the
 271 stream undecodable.  In addition, any codebook number greater than the
 272 maximum numbered codebook set up in this stream also renders the
 273 stream undecodable.</para>
 274
 275 </section>
 276
 277 <section><title>packet decode</title>
 278
 279 <para>
 280 Format 0 and 1 packet decode is identical except for specific
 281 partition interleave.  Format 2 packet decode can be built out of the
 282 format 1 decode process.  Thus we describe first the decode
 283 infrastructure identical to all three formats.</para>
 284
 285 <para>
 286 In addition to configuration information, the residue decode process
 287 is passed the number of vectors in the submap bundle and a vector of
 288 flags indicating if any of the vectors are not to be decoded.  If the
 289 passed in number of vectors is 3 and vector number 1 is marked 'do not
 290 decode', decode skips vector 1 during the decode loop.  However, even
 291 'do not decode' vectors are allocated and zeroed.</para>
 292
 293 <para>
 294 Depending on the values of <varname>[residue_begin]</varname> and
 295 <varname>[residue_end]</varname>, it is obvious that the encoded
 296 portion of a residue vector may be the entire possible residue vector
 297 or some other strict subset of the actual residue vector size with
 298 zero padding at either uncoded end.  However, it is also possible to
 299 set <varname>[residue_begin]</varname> and
 300 <varname>[residue_end]</varname> to specify a range partially or
 301 wholly beyond the maximum vector size.  Before beginning residue
 302 decode, limit <varname>[residue_begin]</varname> and
 303 <varname>[residue_end]</varname> to the maximum possible vector size
 304 as follows.  We assume that the number of vectors being encoded,
 305 <varname>[ch]</varname> is provided by the higher level decoding
 306 process.</para>
 307
 308 <programlisting>
 309   1) [actual_size] = current blocksize/2;
 310   2) if residue encoding is format 2
 311        3) [actual_size] = [actual_size] * [ch];
 312   4) [limit_residue_begin] = maximum of ([residue_begin],[actual_size]);
 313   5) [limit_residue_end] = maximum of ([residue_end],[actual_size]);
 314 </programlisting>
 315
 316 <para>
 317 The following convenience values are conceptually useful to clarifying
 318 the decode process:</para>
 319
 320 <programlisting>
 321   1) [classwords_per_codeword] = [codebook_dimensions] value of codebook [residue_classbook]
 322   2) [n_to_read] = [limit_residue_end] - [limit_residue_begin]
 323   3) [partitions_to_read] = [n_to_read] / [residue_partition_size]
 324 </programlisting>
 325
 326 <para>
 327 Packet decode proceeds as follows, matching the description offered earlier in the document. </para>
 328 <programlisting>
 329   1) allocate and zero all vectors that will be returned.
 330   2) if ([n_to_read] is zero), stop; there is no residue to decode.
 331   3) iterate [pass] over the range 0 ... 7 {
 332
 333        4) [partition_count] = 0
 334
 335        5) while [partition_count] is less than [partitions_to_read]
 336
 337             6) if ([pass] is zero) {
 338
 339                  7) iterate [j] over the range 0 .. [ch]-1 {
 340
 341                       8) if vector [j] is not marked 'do not decode' {
 342
 343                            9) [temp] = read from packet using codebook [residue_classbook] in scalar context
 344                           10) iterate [i] descending over the range [classwords_per_codeword]-1 ... 0 {
 345
 346                                11) array [classifications] element [j],([i]+[partition_count]) =
 347                                    [temp] integer modulo [residue_classifications]
 348                                12) [temp] = [temp] / [residue_classifications] using integer division
 349
 350                               }
 351
 352                          }
 353
 354                     }
 355
 356                }
 357
 358            13) iterate [i] over the range 0 .. ([classwords_per_codeword] - 1) while [partition_count]
 359                is also less than [partitions_to_read] {
 360
 361                  14) iterate [j] over the range 0 .. [ch]-1 {
 362
 363                       15) if vector [j] is not marked 'do not decode' {
 364
 365                            16) [vqclass] = array [classifications] element [j],[partition_count]
 366                            17) [vqbook] = array [residue_books] element [vqclass],[pass]
 367                            18) if ([vqbook] is not 'unused') {
 368
 369                                 19) decode partition into output vector number [j], starting at scalar
 370                                     offset [limit_residue_begin]+[partition_count]*[residue_partition_size] using
 371                                     codebook number [vqbook] in VQ context
 372                           }
 373                      }
 374
 375                  20) increment [partition_count] by one
 376
 377                }
 378           }
 379      }
 380
 381  21) done
 382
 383 </programlisting>
 384
 385 <para>
 386 An end-of-packet condition during packet decode is to be considered a
 387 nominal occurrence.  Decode returns the result of vector decode up to
 388 that point.</para>
 389
 390 </section>
 391
 392 <section><title>format 0 specifics</title>
 393
 394 <para>
 395 Format zero decodes partitions exactly as described earlier in the
 396 'Residue Format: residue 0' section.  The following pseudocode
 397 presents the same algorithm. Assume:</para>
 398
 399 <para>
 400 <itemizedlist>
 401 <listitem><simpara> <varname>[n]</varname> is the value in <varname>[residue_partition_size]</varname></simpara></listitem>
 402 <listitem><simpara><varname>[v]</varname> is the residue vector</simpara></listitem>
 403 <listitem><simpara><varname>[offset]</varname> is the beginning read offset in [v]</simpara></listitem>
 404 </itemizedlist>
 405 </para>
 406
 407 <programlisting>
 408  1) [step] = [n] / [codebook_dimensions]
 409  2) iterate [i] over the range 0 ... [step]-1 {
 410
 411       3) vector [entry_temp] = read vector from packet using current codebook in VQ context
 412       4) iterate [j] over the range 0 ... [codebook_dimensions]-1 {
 413
 414            5) vector [v] element ([offset]+[i]+[j]*[step]) =
 415                 vector [v] element ([offset]+[i]+[j]*[step]) +
 416                 vector [entry_temp] element [j]
 417
 418          }
 419
 420     }
 421
 422   6) done
 423
 424 </programlisting>
 425
 426 </section>
 427
 428 <section><title>format 1 specifics</title>
 429
 430 <para>
 431 Format 1 decodes partitions exactly as described earlier in the
 432 'Residue Format: residue 1' section.  The following pseudocode
 433 presents the same algorithm. Assume:</para>
 434
 435 <para>
 436 <itemizedlist>
 437 <listitem><simpara> <varname>[n]</varname> is the value in
 438 <varname>[residue_partition_size]</varname></simpara></listitem>
 439 <listitem><simpara><varname>[v]</varname> is the residue vector</simpara></listitem>
 440 <listitem><simpara><varname>[offset]</varname> is the beginning read offset in [v]</simpara></listitem>
 441 </itemizedlist>
 442 </para>
 443
 444 <programlisting>
 445  1) [i] = 0
 446  2) vector [entry_temp] = read vector from packet using current codebook in VQ context
 447  3) iterate [j] over the range 0 ... [codebook_dimensions]-1 {
 448
 449       4) vector [v] element ([offset]+[i]) =
 450           vector [v] element ([offset]+[i]) +
 451           vector [entry_temp] element [j]
 452       5) increment [i]
 453
 454     }
 455
 456   6) if ( [i] is less than [n] ) continue at step 2
 457   7) done
 458 </programlisting>
 459
 460 </section>
 461
 462 <section><title>format 2 specifics</title>
 463
 464 <para>
 465 Format 2 is reducible to format 1.  It may be implemented as an additional step prior to and an additional post-decode step after a normal format 1 decode.
 466 </para>
 467
 468 <para>
 469 Format 2 handles 'do not decode' vectors differently than residue 0 or
 470 1; if all vectors are marked 'do not decode', no decode occurrs.
 471 However, if at least one vector is to be decoded, all the vectors are
 472 decoded.  We then request normal format 1 to decode a single vector
 473 representing all output channels, rather than a vector for each
 474 channel.  After decode, deinterleave the vector into independent vectors, one for each output channel.  That is:</para>
 475
 476 <orderedlist>
 477  <listitem><simpara>If all vectors 0 through <emphasis>ch</emphasis>-1 are marked 'do not decode', allocate and clear a single vector <varname>[v]</varname>of length <emphasis>ch*n</emphasis> and skip step 2 below; proceed directly to the post-decode step.</simpara></listitem>
 478  <listitem><simpara>Rather than performing format 1 decode to produce <emphasis>ch</emphasis> vectors of length <emphasis>n</emphasis> each, call format 1 decode to produce a single vector <varname>[v]</varname> of length <emphasis>ch*n</emphasis>. </simpara></listitem>
 479  <listitem><para>Post decode: Deinterleave the single vector <varname>[v]</varname> returned by format 1 decode as described above into <emphasis>ch</emphasis> independent vectors, one for each outputchannel, according to:
 480   <programlisting>
 481   1) iterate [i] over the range 0 ... [n]-1 {
 482
 483        2) iterate [j] over the range 0 ... [ch]-1 {
 484
 485             3) output vector number [j] element [i] = vector [v] element ([i] * [ch] + [j])
 486
 487           }
 488      }
 489
 490   4) done
 491   </programlisting>
 492  </para></listitem>
 493 </orderedlist>
 494
 495 </section>
 496
 497 </section>
 498
 499 </section>
 500