doc/08-residue.tex

   1 % -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
   2 %!TEX root = Vorbis_I_spec.tex
   3 \section{Residue setup and decode} \label{vorbis:spec:residue}
   4
   5 \subsection{Overview}
   6
   7 A residue vector represents the fine detail of the audio spectrum of
   8 one channel in an audio frame after the encoder subtracts the floor
   9 curve and performs any channel coupling.  A residue vector may
  10 represent spectral lines, spectral magnitude, spectral phase or
  11 hybrids as mixed by channel coupling.  The exact semantic content of
  12 the vector does not matter to the residue abstraction.
  13
  14 Whatever the exact qualities, the Vorbis residue abstraction codes the
  15 residue vectors into the bitstream packet, and then reconstructs the
  16 vectors during decode.  Vorbis makes use of three different encoding
  17 variants (numbered 0, 1 and 2) of the same basic vector encoding
  18 abstraction.
  19
  20
  21
  22 \subsection{Residue format}
  23
  24 Residue format partitions each vector in the vector bundle into chunks,
  25 classifies each chunk, encodes the chunk classifications and finally
  26 encodes the chunks themselves using the the specific VQ arrangement
  27 defined for each selected classification.
  28 The exact interleaving and partitioning vary by residue encoding number,
  29 however the high-level process used to classify and encode the residue
  30 vector is the same in all three variants.
  31
  32 A set of coded residue vectors are all of the same length.  High level
  33 coding structure, ignoring for the moment exactly how a partition is
  34 encoded and simply trusting that it is, is as follows:
  35
  36 \begin{itemize}
  37 \item Each vector is partitioned into multiple equal sized chunks
  38 according to configuration specified.  If we have a vector size of
  39 \emph{n}, a partition size \emph{residue\_partition\_size}, and a total
  40 of \emph{ch} residue vectors, the total number of partitioned chunks
  41 coded is \emph{n}/\emph{residue\_partition\_size}*\emph{ch}.  It is
  42 important to note that the integer division truncates.  In the below
  43 example, we assume an example \emph{residue\_partition\_size} of 8.
  44
  45 \item Each partition in each vector has a classification number that
  46 specifies which of multiple configured VQ codebook setups are used to
  47 decode that partition.  The classification numbers of each partition
  48 can be thought of as forming a vector in their own right, as in the
  49 illustration below.  Just as the residue vectors are coded in grouped
  50 partitions to increase encoding efficiency, the classification vector
  51 is also partitioned into chunks.  The integer elements of each scalar
  52 in a classification chunk are built into a single scalar that
  53 represents the classification numbers in that chunk.  In the below
  54 example, the classification codeword encodes two classification
  55 numbers.
  56
  57 \item The values in a residue vector may be encoded monolithically in a
  58 single pass through the residue vector, but more often efficient
  59 codebook design dictates that each vector is encoded as the additive
  60 sum of several passes through the residue vector using more than one
  61 VQ codebook.  Thus, each residue value potentially accumulates values
  62 from multiple decode passes.  The classification value associated with
  63 a partition is the same in each pass, thus the classification codeword
  64 is coded only in the first pass.
  65
  66 \end{itemize}
  67
  68
  69 \begin{center}
  70 \includegraphics[width=\textwidth]{residue-pack}
  71 \captionof{figure}{illustration of residue vector format}
  72 \end{center}
  73
  74
  75
  76 \subsection{residue 0}
  77
  78 Residue 0 and 1 differ only in the way the values within a residue
  79 partition are interleaved during partition encoding (visually treated
  80 as a black box--or cyan box or brown box--in the above figure).
  81
  82 Residue encoding 0 interleaves VQ encoding according to the
  83 dimension of the codebook used to encode a partition in a specific
  84 pass.  The dimension of the codebook need not be the same in multiple
  85 passes, however the partition size must be an even multiple of the
  86 codebook dimension.
  87
  88 As an example, assume a partition vector of size eight, to be encoded
  89 by residue 0 using codebook sizes of 8, 4, 2 and 1:
  90
  91 \begin{programlisting}
  92
  93             original residue vector: [ 0 1 2 3 4 5 6 7 ]
  94
  95 codebook dimensions = 8  encoded as: [ 0 1 2 3 4 5 6 7 ]
  96
  97 codebook dimensions = 4  encoded as: [ 0 2 4 6 ], [ 1 3 5 7 ]
  98
  99 codebook dimensions = 2  encoded as: [ 0 4 ], [ 1 5 ], [ 2 6 ], [ 3 7 ]
 100
 101 codebook dimensions = 1  encoded as: [ 0 ], [ 1 ], [ 2 ], [ 3 ], [ 4 ], [ 5 ], [ 6 ], [ 7 ]
 102
 103 \end{programlisting}
 104
 105 It is worth mentioning at this point that no configurable value in the
 106 residue coding setup is restricted to a power of two.
 107
 108
 109
 110 \subsection{residue 1}
 111
 112 Residue 1 does not interleave VQ encoding.  It represents partition
 113 vector scalars in order.  As with residue 0, however, partition length
 114 must be an integer multiple of the codebook dimension, although
 115 dimension may vary from pass to pass.
 116
 117 As an example, assume a partition vector of size eight, to be encoded
 118 by residue 0 using codebook sizes of 8, 4, 2 and 1:
 119
 120 \begin{programlisting}
 121
 122             original residue vector: [ 0 1 2 3 4 5 6 7 ]
 123
 124 codebook dimensions = 8  encoded as: [ 0 1 2 3 4 5 6 7 ]
 125
 126 codebook dimensions = 4  encoded as: [ 0 1 2 3 ], [ 4 5 6 7 ]
 127
 128 codebook dimensions = 2  encoded as: [ 0 1 ], [ 2 3 ], [ 4 5 ], [ 6 7 ]
 129
 130 codebook dimensions = 1  encoded as: [ 0 ], [ 1 ], [ 2 ], [ 3 ], [ 4 ], [ 5 ], [ 6 ], [ 7 ]
 131
 132 \end{programlisting}
 133
 134
 135
 136 \subsection{residue 2}
 137
 138 Residue type two can be thought of as a variant of residue type 1.
 139 Rather than encoding multiple passed-in vectors as in residue type 1,
 140 the \emph{ch} passed in vectors of length \emph{n} are first
 141 interleaved and flattened into a single vector of length
 142 \emph{ch}*\emph{n}.  Encoding then proceeds as in type 1. Decoding is
 143 as in type 1 with decode interleave reversed. If operating on a single
 144 vector to begin with, residue type 1 and type 2 are equivalent.
 145
 146 \begin{center}
 147 \includegraphics[width=\textwidth]{residue2}
 148 \captionof{figure}{illustration of residue type 2}
 149 \end{center}
 150
 151
 152 \subsection{Residue decode}
 153
 154 \subsubsection{header decode}
 155
 156 Header decode for all three residue types is identical.
 157 \begin{programlisting}
 158   1) [residue\_begin] = read 24 bits as unsigned integer
 159   2) [residue\_end] = read 24 bits as unsigned integer
 160   3) [residue\_partition\_size] = read 24 bits as unsigned integer and add one
 161   4) [residue\_classifications] = read 6 bits as unsigned integer and add one
 162   5) [residue\_classbook] = read 8 bits as unsigned integer
 163 \end{programlisting}
 164
 165 \varname{[residue\_begin]} and
 166 \varname{[residue\_end]} select the specific sub-portion of
 167 each vector that is actually coded; it implements akin to a bandpass
 168 where, for coding purposes, the vector effectively begins at element
 169 \varname{[residue\_begin]} and ends at
 170 \varname{[residue\_end]}.  Preceding and following values in
 171 the unpacked vectors are zeroed.  Note that for residue type 2, these
 172 values as well as \varname{[residue\_partition\_size]}apply to
 173 the interleaved vector, not the individual vectors before interleave.
 174 \varname{[residue\_partition\_size]} is as explained above,
 175 \varname{[residue\_classifications]} is the number of possible
 176 classification to which a partition can belong and
 177 \varname{[residue\_classbook]} is the codebook number used to
 178 code classification codewords.  The number of dimensions in book
 179 \varname{[residue\_classbook]} determines how many
 180 classification values are grouped into a single classification
 181 codeword.  Note that the number of entries and dimensions in book
 182 \varname{[residue\_classbook]}, along with
 183 \varname{[residue\_classifications]}, overdetermines to
 184 possible number of classification codewords.
 185 If \varname{[residue\_classifications]}\^{}\varname{[residue\_classbook]}.dimensions
 186 exceeds \varname{[residue\_classbook]}.entries, the
 187 bitstream should be regarded to be undecodable.
 188
 189 Next we read a bitmap pattern that specifies which partition classes
 190 code values in which passes.
 191
 192 \begin{programlisting}
 193   1) iterate [i] over the range 0 ... [residue\_classifications]-1 {
 194
 195        2) [high\_bits] = 0
 196        3) [low\_bits] = read 3 bits as unsigned integer
 197        4) [bitflag] = read one bit as boolean
 198        5) if ( [bitflag] is set ) then [high\_bits] = read five bits as unsigned integer
 199        6) vector [residue\_cascade] element [i] = [high\_bits] * 8 + [low\_bits]
 200      }
 201   7) done
 202 \end{programlisting}
 203
 204 Finally, we read in a list of book numbers, each corresponding to
 205 specific bit set in the cascade bitmap.  We loop over the possible
 206 codebook classifications and the maximum possible number of encoding
 207 stages (8 in Vorbis I, as constrained by the elements of the cascade
 208 bitmap being eight bits):
 209
 210 \begin{programlisting}
 211   1) iterate [i] over the range 0 ... [residue\_classifications]-1 {
 212
 213        2) iterate [j] over the range 0 ... 7 {
 214
 215             3) if ( vector [residue\_cascade] element [i] bit [j] is set ) {
 216
 217                  4) array [residue\_books] element [i][j] = read 8 bits as unsigned integer
 218
 219                } else {
 220
 221                  5) array [residue\_books] element [i][j] = unused
 222
 223                }
 224           }
 225       }
 226
 227   6) done
 228 \end{programlisting}
 229
 230 An end-of-packet condition at any point in header decode renders the
 231 stream undecodable.  In addition, any codebook number greater than the
 232 maximum numbered codebook set up in this stream also renders the
 233 stream undecodable. All codebooks in array [residue\_books] are
 234 required to have a value mapping.  The presence of codebook in array
 235 [residue\_books] without a value mapping (maptype equals zero) renders
 236 the stream undecodable.
 237
 238
 239
 240 \subsubsection{packet decode}
 241
 242 Format 0 and 1 packet decode is identical except for specific
 243 partition interleave.  Format 2 packet decode can be built out of the
 244 format 1 decode process.  Thus we describe first the decode
 245 infrastructure identical to all three formats.
 246
 247 In addition to configuration information, the residue decode process
 248 is passed the number of vectors in the submap bundle and a vector of
 249 flags indicating if any of the vectors are not to be decoded.  If the
 250 passed in number of vectors is 3 and vector number 1 is marked 'do not
 251 decode', decode skips vector 1 during the decode loop.  However, even
 252 'do not decode' vectors are allocated and zeroed.
 253
 254 Depending on the values of \varname{[residue\_begin]} and
 255 \varname{[residue\_end]}, it is obvious that the encoded
 256 portion of a residue vector may be the entire possible residue vector
 257 or some other strict subset of the actual residue vector size with
 258 zero padding at either uncoded end.  However, it is also possible to
 259 set \varname{[residue\_begin]} and
 260 \varname{[residue\_end]} to specify a range partially or
 261 wholly beyond the maximum vector size.  Before beginning residue
 262 decode, limit \varname{[residue\_begin]} and
 263 \varname{[residue\_end]} to the maximum possible vector size
 264 as follows.  We assume that the number of vectors being encoded,
 265 \varname{[ch]} is provided by the higher level decoding
 266 process.
 267
 268 \begin{programlisting}
 269   1) [actual\_size] = current blocksize/2;
 270   2) if residue encoding is format 2
 271        3) [actual\_size] = [actual\_size] * [ch];
 272   4) [limit\_residue\_begin] = minimum of ([residue\_begin],[actual\_size]);
 273   5) [limit\_residue\_end] = minimum of ([residue\_end],[actual\_size]);
 274 \end{programlisting}
 275
 276 The following convenience values are conceptually useful to clarifying
 277 the decode process:
 278
 279 \begin{programlisting}
 280   1) [classwords\_per\_codeword] = [codebook\_dimensions] value of codebook [residue\_classbook]
 281   2) [n\_to\_read] = [limit\_residue\_end] - [limit\_residue\_begin]
 282   3) [partitions\_to\_read] = [n\_to\_read] / [residue\_partition\_size]
 283 \end{programlisting}
 284
 285 Packet decode proceeds as follows, matching the description offered earlier in the document.
 286 \begin{programlisting}
 287   1) allocate and zero all vectors that will be returned.
 288   2) if ([n\_to\_read] is zero), stop; there is no residue to decode.
 289   3) iterate [pass] over the range 0 ... 7 {
 290
 291        4) [partition\_count] = 0
 292
 293        5) while [partition\_count] is less than [partitions\_to\_read]
 294
 295             6) if ([pass] is zero) {
 296
 297                  7) iterate [j] over the range 0 .. [ch]-1 {
 298
 299                       8) if vector [j] is not marked 'do not decode' {
 300
 301                            9) [temp] = read from packet using codebook [residue\_classbook] in scalar context
 302                           10) iterate [i] descending over the range [classwords\_per\_codeword]-1 ... 0 {
 303
 304                                11) array [classifications] element [j],([i]+[partition\_count]) =
 305                                    [temp] integer modulo [residue\_classifications]
 306                                12) [temp] = [temp] / [residue\_classifications] using integer division
 307
 308                               }
 309
 310                          }
 311
 312                     }
 313
 314                }
 315
 316            13) iterate [i] over the range 0 .. ([classwords\_per\_codeword] - 1) while [partition\_count]
 317                is also less than [partitions\_to\_read] {
 318
 319                  14) iterate [j] over the range 0 .. [ch]-1 {
 320
 321                       15) if vector [j] is not marked 'do not decode' {
 322
 323                            16) [vqclass] = array [classifications] element [j],[partition\_count]
 324                            17) [vqbook] = array [residue\_books] element [vqclass],[pass]
 325                            18) if ([vqbook] is not 'unused') {
 326
 327                                 19) decode partition into output vector number [j], starting at scalar
 328                                     offset [limit\_residue\_begin]+[partition\_count]*[residue\_partition\_size] using
 329                                     codebook number [vqbook] in VQ context
 330                           }
 331                      }
 332
 333                  20) increment [partition\_count] by one
 334
 335                }
 336           }
 337      }
 338
 339  21) done
 340
 341 \end{programlisting}
 342
 343 An end-of-packet condition during packet decode is to be considered a
 344 nominal occurrence.  Decode returns the result of vector decode up to
 345 that point.
 346
 347
 348
 349 \subsubsection{format 0 specifics}
 350
 351 Format zero decodes partitions exactly as described earlier in the
 352 'Residue Format: residue 0' section.  The following pseudocode
 353 presents the same algorithm. Assume:
 354
 355 \begin{itemize}
 356 \item  \varname{[n]} is the value in \varname{[residue\_partition\_size]}
 357 \item \varname{[v]} is the residue vector
 358 \item \varname{[offset]} is the beginning read offset in [v]
 359 \end{itemize}
 360
 361
 362 \begin{programlisting}
 363  1) [step] = [n] / [codebook\_dimensions]
 364  2) iterate [i] over the range 0 ... [step]-1 {
 365
 366       3) vector [entry\_temp] = read vector from packet using current codebook in VQ context
 367       4) iterate [j] over the range 0 ... [codebook\_dimensions]-1 {
 368
 369            5) vector [v] element ([offset]+[i]+[j]*[step]) =
 370                 vector [v] element ([offset]+[i]+[j]*[step]) +
 371                 vector [entry\_temp] element [j]
 372
 373          }
 374
 375     }
 376
 377   6) done
 378
 379 \end{programlisting}
 380
 381
 382
 383 \subsubsection{format 1 specifics}
 384
 385 Format 1 decodes partitions exactly as described earlier in the
 386 'Residue Format: residue 1' section.  The following pseudocode
 387 presents the same algorithm. Assume:
 388
 389 \begin{itemize}
 390 \item  \varname{[n]} is the value in
 391 \varname{[residue\_partition\_size]}
 392 \item \varname{[v]} is the residue vector
 393 \item \varname{[offset]} is the beginning read offset in [v]
 394 \end{itemize}
 395
 396
 397 \begin{programlisting}
 398  1) [i] = 0
 399  2) vector [entry\_temp] = read vector from packet using current codebook in VQ context
 400  3) iterate [j] over the range 0 ... [codebook\_dimensions]-1 {
 401
 402       4) vector [v] element ([offset]+[i]) =
 403           vector [v] element ([offset]+[i]) +
 404           vector [entry\_temp] element [j]
 405       5) increment [i]
 406
 407     }
 408
 409   6) if ( [i] is less than [n] ) continue at step 2
 410   7) done
 411 \end{programlisting}
 412
 413
 414
 415 \subsubsection{format 2 specifics}
 416
 417 Format 2 is reducible to format 1.  It may be implemented as an additional step prior to and an additional post-decode step after a normal format 1 decode.
 418
 419
 420 Format 2 handles 'do not decode' vectors differently than residue 0 or
 421 1; if all vectors are marked 'do not decode', no decode occurrs.
 422 However, if at least one vector is to be decoded, all the vectors are
 423 decoded.  We then request normal format 1 to decode a single vector
 424 representing all output channels, rather than a vector for each
 425 channel.  After decode, deinterleave the vector into independent vectors, one for each output channel.  That is:
 426
 427 \begin{enumerate}
 428  \item If all vectors 0 through \emph{ch}-1 are marked 'do not decode', allocate and clear a single vector \varname{[v]}of length \emph{ch*n} and skip step 2 below; proceed directly to the post-decode step.
 429  \item Rather than performing format 1 decode to produce \emph{ch} vectors of length \emph{n} each, call format 1 decode to produce a single vector \varname{[v]} of length \emph{ch*n}.
 430  \item Post decode: Deinterleave the single vector \varname{[v]} returned by format 1 decode as described above into \emph{ch} independent vectors, one for each outputchannel, according to:
 431   \begin{programlisting}
 432   1) iterate [i] over the range 0 ... [n]-1 {
 433
 434        2) iterate [j] over the range 0 ... [ch]-1 {
 435
 436             3) output vector number [j] element [i] = vector [v] element ([i] * [ch] + [j])
 437
 438           }
 439      }
 440
 441   4) done
 442   \end{programlisting}
 443
 444 \end{enumerate}
 445
 446
 447
 448
 449
 450
 451