doc/vp3-format.txt

   1 VP3 Bitstream Format and Decoding Process
   2 by Mike Melanson (mike at multimedia.cx)
   3 v0.5: December 8, 2004
   4
   5
   6 [December 8, 2004: Note that this document is not complete and likely
   7 will never be completed. However, it helped form the basis of Theora I
   8 specification available at
   9   http://www.theora.org/doc/Theora_I_spec.pdf ]
  10
  11
  12 Contents
  13 --------
  14  * Introduction
  15  * Underlying Coding Concepts
  16  * VP3 Coding Overview
  17  * VP3 Chunk Format
  18  * Decoding The Frame Header
  19  * Initializing The Quantization Matrices
  20  * Hilbert Coding Pattern
  21  * Unpacking The Block Coding Information
  22  * Unpacking The Macroblock Coding Mode Information
  23  * Unpacking The Macroblock Motion Vectors
  24  * Unpacking The DCT Coefficients
  25  * Reversing The DC Prediction
  26  * Reconstructing The Frame
  27  * Theora Specification
  28  * Appendix A: Quantization Matrices And Scale Factors
  29  * Appendix B: Macroblock Coding Mode Alphabets
  30  * Appendix C: DCT Coefficient VLC Tables
  31  * Appendix D: The VP3 IDCT
  32  * Acknowledgements
  33  * References
  34  * Changelog
  35
  36
  37 Introduction
  38 ------------
  39 A company named On2 (http://www.on2.com) created a video codec named
  40 VP3. Eventually, they decided to open source it. Like any body of code
  41 that was produced on a deadline, the source code was not particularly
  42 clean or well-documented. This makes it difficult to understand the
  43 fundamental operation of the codec.
  44
  45 This document describes the VP3 bitstream format and decoding process at
  46 a higher level than source code.
  47
  48
  49 Underlying Coding Concepts
  50 --------------------------
  51 In order to understand the VP3 coding method it is necessary to
  52 understand the individual steps in the process. Like many multimedia
  53 compression algorithms VP3 does not consist of a single coding method.
  54 Rather, it uses a chain of methods to achieve compression.
  55
  56 If you are acquainted with the MPEG video clique then many of VP3's
  57 coding concepts should look familiar as well. What follows is a list of
  58 the coding methods used in VP3 and a brief description of each.
  59
  60 * Discrete Cosine Transform (DCT): This is a magical mathematical
  61 function that takes a group of numbers and turns it into another group
  62 of numbers. The transformed group of numbers exhibits some curious
  63 properties. Notably, larger numbers are concentrated in certain areas of
  64 the transformed group.
  65
  66 A video codec like VP3 often operates on 8x8 blocks of numbers. When
  67 these 8x8 blocks are transformed using a DCT the larger numbers occur
  68 mostly in the up and left areas of the block with the largest number
  69 occurring as the first in the block (up-left corner). This number is
  70 called the DC coefficient. The other 63 numbers are called the AC
  71 coefficients.
  72
  73 The DCT and its opposite operation, the inverse DCT, require a lot of
  74 multiplications. Much research and experimentation is focused of
  75 optimizing this phase of the coding/decoding process.
  76
  77 * Quantization: This coding step tosses out information by essentially
  78 dividing a number to be coded by a factor and throwing away the
  79 remainder. The inverse process (dequantization) involves multiplying by
  80 the same factor to obtain a number that is close enough to the original.
  81
  82 * Run Length Encoding (RLE): The concept behind RLE is to shorten runs
  83 of numbers that are the same. For example, the string "88888" is encoded
  84 as (5, 8), indicating a run of 5 '8' numbers. In VP3 (and MPEG/JPEG),
  85 RLE is used to record the number of zero-value coefficients that occur
  86 before a non-zero coefficient. For example:
  87
  88   0 0 0 0 5 0 2 0 0 0 9
  89
  90 is encoded as:
  91
  92   (4, 5), (1, 2), (3, 9)
  93
  94 This indicates that a run of 4 zeroes is followed by a coefficient of 5;
  95 then a run of 1 zero is followed by 2; then a run of 3 zeroes is
  96 followed by 9.
  97
  98 * Zigzag Ordering: After transforming and quantizing a block of samples,
  99 the samples are not in an optimal order for run length encoding. Zigzag
 100 ordering rearranges the samples to put more zeros between non-zero
 101 samples.
 102
 103 * Differential (or Delta) Pulse Code Modulation (DPCM): 1 + 1 = 2. Got
 104 that? Seriously, that is what DPCM means. Rather than encoding absolute
 105 values, encode the differences between successive values. For example:
 106
 107   82 84 81 80 86 88 85
 108
 109 Can be delta-encoded as:
 110
 111   82 +2 -3 -1 +6 +2 -3
 112
 113 Most of the numbers turn into smaller numbers which require less
 114 information to encode.
 115
 116 * Motion Compensation: Simply, this coding method specifies that a block
 117 from a certain position in the previous frame is to be copied into a new
 118 position in the current frame. This technique is often combined with DCT
 119 and DPCM coding, as well as fractional pixel motion.
 120
 121 * Entropy Coding (a.k.a. Huffman Coding): This is the process of coding
 122 frequently occurring symbols with fewer bits than symbols that are not
 123 likely to occur as frequently.
 124
 125 * Variable Length Run Length Booleans: An initial Boolean bit is
 126 extracted from the bitstream. A variable length code (VLC) is extracted
 127 from the bitstream and converted to a count. This count indicates that
 128 the next (count) elements are to be set to the Boolean value.
 129 Afterwards, the Boolean value is toggled, the next VLC is extracted and
 130 converted to a count, and the process continues until all elements are
 131 set to either 0 or 1.
 132
 133 * YUV Colorspace: Like many modern video codecs, VP3 operates on a YUV
 134 colorspace rather than a RGB colorspace. Specifically, VP3 uses YUV
 135 4:2:0, alias YUV420P, YV12. Note: Throughout the course of this
 136 document, the U and V planes (a.k.a., Cb and Cr planes) will be
 137 collectively referred to as C planes (color or chrominance planes).
 138
 139 * Frame Types: VP3 has intra-coded frames, a.k.a. intraframes, I-frames,
 140 or keyframes. VP3 happens to call these golden frames. VP3 has
 141 interframes, a.k.a. predicted frames or P-frames. These frames can use
 142 information from either the previous interframe or from the previous
 143 golden frame.
 144
 145
 146 VP3 Overview
 147 ------------
 148 The first thing to understand about the VP3 coding method is that it
 149 encodes all 3 planes upside down. That is, the data is encoded from
 150 bottom-to-top rather than top-to-bottom as is done with many video
 151 codecs.
 152
 153 VP3 codes a video frame by first breaking each of the 3 planes (Y, U,
 154 and V) into a series of 8x8 blocks called fragments. VP3 also has a
 155 notion of superblocks. Superblocks encapsulate 16 fragments arranged in
 156 a 4x4 matrix. Each plane has its own set of superblocks. Further, VP3
 157 also uses the notion of macroblocks which is the same as that found in
 158 JPEG/MPEG. One macroblock encompasses 4 blocks from the Y plane arranged
 159 in a 2x2 matrix, 1 block from the U plane, and 1 block from the V plane.
 160 While a fragment or a superblock applies to 1 and only 1 plane, a
 161 macroblock extends over all 3 planes.
 162
 163 VP3 compresses golden frames by transforming each fragment with a
 164 discrete cosine transform. Each transformed sample is then quantized and
 165 the DC coefficient is reduced via DPCM using a combination of DC
 166 coefficients from surrounding fragments as predictors. Then, each
 167 fragment's DC coefficient is entropy-coded in the output bitstream,
 168 followed by each fragment's first AC coefficient, then each second AC
 169 coefficient, and so on.
 170
 171 An interframe, naturally, is more complicated. While there is only one
 172 coding mode available for a golden frame (intra coding), there are 8
 173 coding modes that the VP3 coder can choose from for interframe
 174 macroblocks. Intra coding as seen in the keyframe is still available.
 175 The rest of the modes involve encoding a fragment diff, either from the
 176 previous frame or the golden frame, from the same coordinate or from the
 177 same coordinate plus a motion vector. All of the macroblock coding modes
 178 and motion vectors are encoded in an interframe bitstream.
 179
 180
 181 VP3 Chunk Format
 182 ----------------
 183 The high-level format of a compressed VP3 frame is laid out as:
 184
 185  * chunk header
 186  * block coding information
 187  * macroblock coding mode information
 188  * motion vectors
 189  * DC coefficients
 190  * 1st AC coefficients
 191  * 2nd AC coefficients
 192  * ...
 193  * 63rd AC coefficients
 194
 195
 196 Decoding The Frame Header
 197 -------------------------
 198 The chunk header always contains at least 1 byte which has the following
 199 format:
 200
 201   bit 7: 0 = golden frame, 1 = interframe
 202   bit 6: unused
 203   bits 5-0: Quality index (0..63)
 204
 205 Further, if the frame is a golden frame, there are 2 more bytes in the
 206 header:
 207
 208   byte 0: version byte 0
 209   byte 1:
 210     bits 7-3: VP3 version number (stored)
 211     bit 2:    key frame coding method (0 = DCT key frame, only type
 212               supported)
 213     bits 1-0: unused, spare bits
 214
 215 All frame headers are encoded with a quality index. This 6-bit value is
 216 used to index into 2 dequantizer scaling tables, 1 for DC values and 1
 217 for AC values. Each of the 3 dequantization tables is modified per these
 218 scaling values.
 219
 220
 221 Initializing The Quantization Matrices
 222 --------------------------------------
 223 VP3 has three static matrices for quantizing and dequantizing fragments.
 224 One matrix is for quantizing golden frame Y fragments, one matrix is for
 225 quantizing golden frame C fragments, and one matrix is for quantizing both
 226 golden frame and interframe Y or C fragments. While these matrices are
 227 static, they are adjusted according to quality index coded in the header.
 228
 229 The quality index is an index into 2 64-element tables:
 230 dc_scale_factor[] and ac_scale_factor[]. Each quantization factor from
 231 each of the three quantization matrices is adjusted by the appropriate
 232 scale factor according to this formula:
 233
 234                 base quantizer * scale factor
 235   quantizer  =  -----------------------------
 236                             100
 237
 238     where scale factor =
 239         dc_scale_factor[quality_index] for DC dequantizer
 240         ac_scale_factor[quality_index] for AC dequantizer
 241
 242 The quantization matrices need to be recalculated at the beginning of a
 243 frame decode if the current frame's quality index is different from the
 244 previous frame's quality index.
 245
 246 See Appendix A for the complete VP3 quantization matrices and scale factor
 247 tables.
 248
 249 As an example, this is the base quantization matrix for golden frame Y
 250 fragments:
 251
 252     16  11  10  16  24  40  51  61
 253     12  12  14  19  26  58  60  55
 254     14  13  16  24  40  57  69  56
 255     14  17  22  29  51  87  80  62
 256     18  22  37  58  68 109 103  77
 257     24  35  55  64  81 104 113  92
 258     49  64  78  87 103 121 120 101
 259     72  92  95  98 112 100 103  99
 260
 261 If a particular coded frame specifies a quality index of 54. Element 54
 262 of the dc_scale_factor table is 20, thus:
 263
 264                                16 * 20
 265   DC coefficient quantizer  =  -------  =  3
 266                                  100
 267
 268 Element 54 of the ac_scale_factor table is 24. The AC coefficient
 269 quantizers are each scaled using this factor, e.g.:
 270
 271     11 * 24
 272     -------   =  2
 273       100
 274
 275     100 * 24
 276     --------  =  24
 277       100
 278
 279 [not complete; still need to explain how these quantizers are saturated
 280 and scaled with respect to the DCT process]
 281
 282
 283 Hilbert Coding Pattern
 284 ----------------------
 285 VP3 uses a Hilbert pattern to code fragments within a superblock. A
 286 Hilbert pattern is a recursive pattern that can grow quite complicated.
 287 The coding pattern that VP3 uses is restricted to this pattern subset,
 288 where each fragment in a superblock is represented by a 'X':
 289
 290       X -> X    X -> X
 291            |    ^
 292            v    |
 293       X <- X    X <- X
 294       |              ^
 295       v              |
 296       X    X -> X    X
 297       |    ^    |    ^
 298       v    |    v    |
 299       X -> X    X -> X
 300
 301 As an example of this pattern, consider a plane that is 256 samples wide
 302 and 64 samples high. Each fragment row will be 32 fragments wide. The
 303 first superblock in the plane will be comprised of these 16 fragments:
 304
 305    0   1   2   3  ...  31
 306   32  33  34  35  ...  63
 307   64  65  66  67  ...  95
 308   96  97  98  99  ... 127
 309
 310 The order in which these 16 fragments are coded is:
 311
 312    0 |  0  1  14 15
 313   32 |  3  2  13 12
 314   64 |  4  7   8 11
 315   96 |  5  6   9 10
 316
 317 All of the image coding information, including the block coding status
 318 and modes, the motion vectors, and the DCT coefficients, are all coded
 319 and decoded using this pattern. Thus, it is rather critical to have the
 320 pattern and all of its corner cases handled correctly. In the above
 321 example, if the bottom row and left column were not present due to the
 322 superblock being in a corner, the pattern proceeds as if the missing
 323 fragments were present, but the missing fragments are omitted in the
 324 final coding list. The coding order would be:
 325
 326   0, 1, 2, 3, 4, 7, 8, 13, 14
 327
 328
 329 Unpacking The Block Coding Information
 330 --------------------------------------
 331 After unpacking the frame header, the decoder unpacks the block coding
 332 information. The only information determined in this phase is whether a
 333 particular superblock and its fragments are coded in the current frame
 334 or unchanged from the previous frame. The actual coding method is
 335 determined in the next phase.
 336
 337 If the frame is a golden frame then every superblock, macroblock, and
 338 fragment is marked as coded.
 339
 340 If the frame is an interframe, then the block coding information must be
 341 decoded. This is the phase where a decoder will build a list of coded
 342 fragments for which coding mode, motion vector, and DCT coefficient data
 343 must be decoded.
 344
 345 First, a list of partially-coded superblocks is unpacked from the
 346 stream. This list is coded as a series of variable-length run length
 347 codes (VLRLC). First, the code is initialized by reading the next bit in
 348 the stream. Then, while there are still superblocks remaining in the
 349 list, fetch a VLC from the stream according to this table:
 350
 351   Codeword                Run Length
 352   0                       1
 353   10x                     2-3
 354   110x                    4-5
 355   1110xx                  6-9
 356   11110xxx                10-17
 357   111110xxxx              18-33
 358   111111xxxxxxxxxxxx      34-4129
 359
 360 For example, a VLC of 1101 represents a run length of 5. If the VLRLC
 361 was initialized to 1, then the next 5 superblocks would be set to 1,
 362 indicating that they are partially coded in the current frame. Then the
 363 bit value is toggled to 0, another VLC is fetched from the stream and
 364 the process continues until each superblock has been marked either
 365 partially coded (1) or not (0).
 366
 367 If any of the superblocks were marked as not partially coded in the
 368 previous step, then a list of fully-coded superblocks is unpacked next
 369 using the same VLRLC as the list of partially-coded superblocks.
 370 Initialize the VLRLC with the next bit in the stream. For each
 371 superblock that was not marked as partially coded, mark it with either a
 372 0 or 1 according to the current VLRLC. By the end of this step, each
 373 superblock will be marked as either not coded, partially coded, or fully
 374 coded.
 375
 376 Let's work through an example with an image frame that is 256x64 pixels.
 377 This means that the Y plane contains 4x2 superblocks and each of the C
 378 planes contains 2 superblocks each. The superblocks are numbered as
 379 follows:
 380
 381    Y:  0  1  2  3     U:  8  9
 382        4  5  6  7     V: 10 11
 383
 384 This is the state of the bitstream:
 385
 386   1100011001101
 387
 388 Which is interpreted as:
 389
 390  initial   2 1's   1 0   4 1's  5 0's
 391     1       100     0     1100   1101
 392
 393 Superblocks 0-1 and 3-6 are marked as partially coded. Since there were
 394 blocks that were not marked, proceed to unpack the list of fully-coded
 395 superblocks. This is the state of the bitstream:
 396
 397   1101101
 398
 399 Which is interpreted as:
 400
 401  initial  3 1's  3 0's
 402     1      101    100
 403
 404 Superblocks 2, 7, and 8 are marked as fully coded while superblocks 9,
 405 10, and 11 are marked as not coded.
 406
 407 If any of the superblocks were marked as partially coded, the next data
 408 in the bitstream will define which fragments inside each partially-coded
 409 superblock are coded. This is the first place where the Hilbert pattern
 410 comes into play.
 411
 412 For each partially-coded superblock, iterate through each fragment
 413 according to the Hilbert pattern. Use the VLRLC method, only with a
 414 different table, to determine which fragments are coded. The VLRLC table
 415 for fragment coding runs is:
 416
 417   Codeword                Run Length
 418   0x                      1-2
 419   10x                     3-4
 420   110x                    5-6
 421   1110xx                  7-10
 422   11110xx                 11-14
 423   11111xxxx               15-30
 424
 425 Continuing with the contrived example, superblocks 0 and 1 are both
 426 partially coded. This is the state of the bitstream:
 427
 428   0011001111010001111010...(not complete)
 429
 430 Which is interpreted as:
 431  initial  2 0's  3 1's  13 0's   1 1   13 0's
 432     0      01     100   1111010   00   1111010 ...
 433
 434 This indicates that fragments 2-4 in superblock 0 are coded, while
 435 fragments 0, 1, and 5-15 are not. Note that the run of 12 0's cascades
 436 over into the next fragment, indicating that fragment 0 of superblock 1
 437 is not coded. Fragment 1 of superblock 1 is coded, while the rest of the
 438 superblock's fragments are not coded. The example ends there (a real
 439 bitstream should have enough data to describe all of the partially-coded
 440 superblocks). Superblock 2 is fully coded which means all 16 fragments
 441 are coded. Thus, superblocks 0-2 have the following coded fragments:
 442
 443   0 |  x  x  x  x    x  x  x  x    0  1 14 15
 444  32 |  3  2  x  x    x  2  x  x    3  2 13 12
 445  64 |  4  x  x  x    x  x  x  x    4  7  8 11
 446  96 |  x  x  x  x    x  x  x  x    5  6  9 10
 447
 448 This is a good place to generate the list of coded fragment numbers for
 449 this frame. In this case, the list will begin as:
 450
 451   33 32 64 37 8 9 41 40 72 104 105 73 ...
 452
 453 and so on through the remaining 8 fragments of superblock 2 and onto the
 454 fragments for the remaining superblocks that are either fully or
 455 partially coded.
 456
 457
 458 Unpacking The Macroblock Coding Mode Information
 459 ------------------------------------------------
 460 After unpacking the block coding information, the decoder unpacks the
 461 macroblock coding mode information. This process is simple when
 462 decoding a golden frame-- since the only possible decoding mode is INTRA,
 463 no macroblock coding mode information is transmitted. However, in an
 464 interframe, each coded macroblock is encoded with one of 8 methods:
 465
 466 0, INTER_NO_MV:
 467   current fragment =
 468     (fragment from previous frame @ same coordinates) +
 469     (DCT-encoded residual)
 470
 471 1, INTRA:
 472   current fragment = DCT-encoded block, just like in a golden frame
 473
 474 2, INTER_PLUS_MV:
 475   current fragment =
 476     (fragment from previous frame @ (same coords + motion vector)) +
 477     (DCT-encoded residual)
 478
 479 3, INTER_LAST_MV:
 480   same as INTER_PLUS_MV but using the last motion vector decoded from
 481   the bitstream
 482
 483 4, INTER_PRIOR_LAST;
 484   same as INTER_PLUS_MV but using the second-to-last motion vector
 485   decoded from the bitstream
 486
 487 5, USING_GOLDEN:
 488   same as INTER_NO_MV but referencing the golden frame instead of
 489   previous interframe
 490
 491 6, GOLDEN_MV:
 492   same as INTER_PLUS_MV but referencing the golden frame instead of
 493   previous interframe
 494
 495 7, INTER_FOURMV:
 496   same as INTER_PLUS_MV except that each of the 4 Y fragments gets its
 497   own motion vector, and the U and V fragments share the same motion
 498   vector which is the average of the 4 Y fragment vectors
 499
 500 The MB coding mode information is encoded using one of 8 alphabets. The
 501 first 3 bits of the MB coding mode stream indicate which of the 8
 502 alphabets, 0..7, to use to decode the MB coding information in this frame.
 503 The reason for the different alphabets is to minimize the number of bits
 504 needed to encode this section of information. Each alphabet arranges the
 505 coding modes in a different order, indexing the 8 modes into 8 index
 506 slots. Index 0 is encoded with 1 bit (0), index 1 is encoded with 2 bits
 507 (10), index 2 is encoded with 3 bits (110), and so on up to indices 6 and
 508 7 which are encoded with 6 bits each (1111110 and 1111111, respectively):
 509
 510   index   encoding
 511   -----   --------
 512     0      0
 513     1      10
 514     2      110
 515     3      1110
 516     4      11110
 517     5      111110
 518     6      1111110
 519     7      1111111
 520
 521 For example, the coding modes are arranged in alphabet 1 as follows:
 522
 523   index   coding mode
 524   -----   -----------
 525     0     MODE_INTER_LAST_MV
 526     1     MODE_INTER_PRIOR_LAST
 527     2     MODE_INTER_PLUS_MV
 528     3     MODE_INTER_NO_MV
 529     4     MODE_INTRA
 530     5     MODE_USING_GOLDEN,
 531     6     MODE_GOLDEN_MV
 532     7     MODE_INTER_FOURMV
 533
 534 This alphabet arrangement is designed for frames in which motion vectors
 535 based off of the previous interframe dominate.
 536
 537 When unpacking MB coding mode information for a frame, the decoder first
 538 reads 3 bits from the stream to determine the alphabet. In this example,
 539 the 3 bits would be 001 to indicate alphabet 1. Consider this contrived
 540 bitstream following the alphabet number:
 541
 542   1010000011000011111110...
 543
 544 The bits are read as follows:
 545
 546          10 10 0 0 0 0 110 0 0 0 1111111 0
 547   index:  1  1 0 0 0 0   2 0 0 0       7 0
 548
 549 This arrangement of indices translates to this series of coding modes:
 550
 551   index   coding mode
 552   -----   -----------
 553     1     MODE_INTER_PRIOR_LAST
 554     1     MODE_INTER_PRIOR_LAST
 555     0     MODE_INTER_LAST_MV
 556     0     MODE_INTER_LAST_MV
 557     0     MODE_INTER_LAST_MV
 558     0     MODE_INTER_LAST_MV
 559     2     MODE_INTER_PLUS_MV
 560     0     MODE_INTER_LAST_MV
 561     0     MODE_INTER_LAST_MV
 562     0     MODE_INTER_LAST_MV
 563     7     MODE_INTER_FOURMV
 564     0     MODE_INTER_LAST_MV
 565
 566 There are 6 pre-defined alphabets. Consult Appendix B for the complete
 567 alphabets. What happens if none of the 6 pre-defined alphabets fit? The
 568 VP3 encoder can choose to use alphabet 0 which indicates a custom
 569 alphabet. The 3-bit coding mode numbers for each index, 0..7, are stored
 570 after the alphabet number in the bitstream. For example, the sequence:
 571
 572   000 111 110 101 100 011 010 001 000
 573
 574 would indicate coding alphabet 0 (custom alphabet), index 0 corresponds to
 575 coding mode 7 (INTER_FOURMV), index 1 corresponds to coding mode 6
 576 (GOLDEN_MV), and so on down to index 7 which would correspond to coding
 577 mode 0 (INTER_NO_MV).
 578
 579 There is one more possible alphabet: Alphabet 7. This alphabet is
 580 reserved for when there is such a mixture of coding modes used in a frame
 581 that using any variable-length coding mode would result in more bits than
 582 a fixed-length representation. When alphabet 7 is specified, the decoder
 583 reads 3 bits at a time from the bitstream, and uses those directly as the
 584 macroblock coding modes.
 585
 586 To recap, this is the general algorithm for decoding macroblock coding
 587 mode information:
 588
 589   if (golden frame)
 590     all frames are intracoded, there is no MB coding mode information
 591   else
 592     read 3 bits from bitstream to determine alphabet
 593     if alphabet = 0
 594       this is a custom alphabet, populate index table with 8 3-bit coding
 595         modes read from bitstream
 596     foreach coded macroblock, unpack a coding mode:
 597       if alphabet = 7
 598         read 3 bits from the bitstream as the coding mode for the
 599           macroblock
 600       else
 601         read a VLC from the bitstream
 602         use the decoded VLC value to index into the coding mode alphabet
 603           selected for this frame and assign the indexed coding mode to
 604           this macroblock
 605
 606
 607 Unpacking The Macroblock Motion Vectors
 608 ---------------------------------------
 609 After unpacking the macroblock coding mode information, the decoder
 610 unpacks the macroblock motion vectors. This phase essentially assigns a
 611 motion vector to each of the 6 constituent fragments of any coded
 612 macroblock that requires motion vectors.
 613
 614 If the frame is a golden frame then there is no motion compensation and
 615 no motion vectors are encoded in the bitstream.
 616
 617 If the frame is an interframe, the next bit is read from the bitstream
 618 to determine the vector entropy coding method used. If the coding method
 619 is zero then all of the vectors will be unpacked using a VLC method. If
 620 the coding method is 1 then all of the vectors will be unpacked using a
 621 fixed length method.
 622
 623 The VLC unpacking method reads 3 bits from the bitstream. These 3 bits
 624 comprise a number ranging from 0..7 which indicate the next action:
 625
 626 0, MV component = 0
 627 1, MV component = 1
 628 2, MV component = -1
 629 3, MV component = 2, read next bit for sign
 630 4, MV component = 3, read next bit for sign
 631 5, MV component = 4 + (read next 2 bits), read next bit for sign
 632    range: (4..7, -4..-7)
 633 6, MV component = 8 + (read next 3 bits), read next bit for sign
 634    range: (8..15, -8..-15)
 635 7, MV component = 16 + (read next 4 bits), read next bit for sign
 636    range: (16..31, -16..-31)
 637
 638 The fixed length vector unpacking method simply reads the next 5 bits
 639 from the bitstream, reads the next bit for sign, and calls the whole
 640 thing a motion vector component. This gives a range of (-31..31), which
 641 is the same range as the VLC method.
 642
 643 For example, consider the following contrived motion vector bitstream:
 644
 645   000001011011111000...
 646
 647 The stream is read as:
 648
 649   0  (000  010)  (110 111 1  100 0)
 650
 651 The first bit indicates the entropy method which, in this example, is
 652 variable length as opposed to fixed length. The next 3 bits are 0 which
 653 indicate a X MV component of 0. The next 3 bits are 2 which indicate a Y
 654 MV component of -1. The first motion vector encoded in this stream is
 655 (0, -1). The next 3 bits are 6 which indicate 8 + next 3 bits (7) with
 656 another bit indicating sign (1 in this case, which is negative). Thus,
 657 the X MV component is -15. The next 3 bits are 4 which indicate a Y MV
 658 component of 3 with one more bit for the sign (0 is positive). So the
 659 second motion vector encoded in this stream is (-15, 3).
 660
 661 As an example of the fixed-length entropy method, consider the following
 662 contrived bitstream:
 663
 664   1010101101010...
 665
 666 The stream is read as:
 667
 668   1  01010 1  10101 0
 669
 670 The first bit indicates the fixed length entropy method. The first 5 bits
 671 are 10 followed by a negative sign bit. The next 5 bits are 21 followed by
 672 a positive sign bit. The first motion vector in this stream is (-10, 21).
 673
 674 During this phase of the decoding process, it is traditional to assign all
 675 motion vectors for all coded macroblocks that require them, whether they
 676 are unpacked from the motion vector bitstream or copied from previous
 677 coded macroblocks. It is necessary to track the motion vectors for both
 678 the previous macroblock as well as the next-to-last (prior) macroblock.
 679 The general algorithm for this phase is as follows:
 680
 681   foreach coded macroblock
 682     last MV = 0
 683     prior last MV = 0
 684     if coding mode = MODE_INTER_PLUS_MV or MODE_GOLDEN_MV
 685       read current MV pair from the bitstream and set all fragment motion
 686         vectors to that pair
 687       prior last MV = last MV
 688       last MV = current MV
 689
 690     if coding mode = MODE_INTER_FOURMV
 691       read MV for first Y fragment in macroblock
 692       read MV for second Y fragment in macroblock
 693       read MV for third Y fragment in macroblock
 694       read MV for fourth Y fragment in macroblock
 695       set U & V fragment motion vectors to average of 4 Y vectors,
 696         calculated as follows:
 697         if sum of all 4 X motion components is positive, the X
 698         motion component for the U & V fragments is (sum + 2) / 4,
 699         otherwise, it is (sum - 2) / 4; repeat the same process for the
 700         Y components
 701       prior last MV = last MV
 702       last MV = MV for fourth Y fragment from this macroblock
 703
 704     if coding mode = MODE_INTER_LAST_MV
 705       motion vectors for this macroblock are the same as last MV; note
 706         that in this case, the last MV remains the last MV and the prior
 707         last MV remains the prior last MV
 708
 709     if coding mode = MODE_INTER_PRIOR_LAST
 710       motion vectors for this macroblock are the same as prior last MV
 711       prior last MV = last MV
 712       last MV = current MV (effectively, swap last and prior last vectors)
 713
 714
 715 Unpacking The DCT Coefficients
 716 ------------------------------
 717 After unpacking the macroblock motion vectors, the decoder unpacks the
 718 fragment DCT coefficient data. Each coded fragment has 64 DCT
 719 coefficients. Some of the coefficients will be non-zero. Many of the
 720 coefficients will, or should be 0 as this is where the coding method
 721 derives much of its compression.
 722
 723 During this phase, the decoder will be unpacking DCT coefficients, zero
 724 runs, and end-of-block (EOB) codes. The decoder unpacks the the DC
 725 coefficients for all fragments, then all of the first AC coefficients,
 726 and so on until all of the 64 DCT coefficients are unpacked from the
 727 bitstream.
 728
 729 To obtain the DCT coefficients, the decoder unpacks a series of VLCs
 730 from the bitstream which turn into a series of tokens ranging from
 731 0..31. Each of these tokens specifies which action to take next. VP3
 732 defines 80 different 32-element histograms for VLC decoding:
 733
 734   16 histograms for DC token decoding
 735   16 histograms for group 1 AC token decoding
 736   16 histograms for group 2 AC token decoding
 737   16 histograms for group 3 AC token decoding
 738   16 histograms for group 4 AC token decoding
 739
 740 The decoder fetches 4 bits from the bitstream that will be used to
 741 select a DC histogram and 4 bits that will be used to select 4 AC
 742 histograms, one for each AC group.
 743
 744 The meaning of each of the 32 possible tokens follows. 'EB' stands for
 745 extra bits read from bitstream directly after the VLC token:
 746
 747 0, DCT_EOB_TOKEN
 748 set the current block to EOB, meaning that the block is marked as being
 749 fully unpacked
 750
 751 1, DCT_EOB_PAIR_TOKEN
 752 set the next 2 blocks to EOB
 753
 754 2. DCT_EOB_TRIPLE_TOKEN
 755 set the next 3 blocks to EOB
 756
 757 3, DCT_REPEAT_RUN_TOKEN
 758 set the next (2 EBs + 4) blocks to EOB
 759
 760 4, DCT_REPEAT_RUN2_TOKEN
 761 set the next (3 EBs + 8) blocks to EOB
 762
 763 5, DCT_REPEAT_RUN3_TOKEN
 764 set the next (4 EBs + 16) blocks to EOB
 765
 766 6, DCT_REPEAT_RUN4_TOKEN
 767 set the next (12 EBs) blocks to EOB
 768
 769 7, DCT_SHORT_ZRL_TOKEN
 770 skip (3 EBs + 1) positions in the output matrix
 771
 772 8, DCT_ZRL_TOKEN
 773 skip (6 EBs + 1) positions in the output matrix
 774
 775 9, ONE_TOKEN
 776 output 1 as coefficient
 777
 778 10, MINUS_ONE_TOKEN
 779 output -1 as coefficient
 780
 781 11, TWO_TOKEN
 782 output 2 as coefficient
 783
 784 12, MINUS_TWO_TOKEN
 785 output -2 as coefficient
 786
 787 13, 14, 15, 16, LOW_VAL_TOKENS
 788 next EB determines coefficient sign; coeff = DCT_VAL_CAT2_MIN (3) +
 789 (token - 13) (this gives a range of +/- 3..6)
 790
 791 17, DCT_VAL_CATEGORY3
 792 next EB determines coefficient sign; coeff = DCT_VAL_CAT3_MIN (7) + next
 793 EB (this gives a range of +/- 7..8)
 794
 795 18, DCT_VAL_CATEGORY4
 796 next EB determines coefficient sign; coeff = DCT_VAL_CAT4_MIN (9) + next
 797 2 EBs (this gives a range of +/- 9..12)
 798
 799 19, DCT_VAL_CATEGORY5
 800 next EB determines coefficient sign; coeff = DCT_VAL_CAT5_MIN (13) +
 801 next 3 EBs (this gives a range of +/- 13..20)
 802
 803 20, DCT_VAL_CATEGORY6
 804 next EB determines coefficient sign; coeff = DCT_VAL_CAT6_MIN (21) +
 805 next 4 EBs (this gives a range of +/- 21..36)
 806
 807 21, DCT_VAL_CATEGORY7
 808 next EB determines coefficient sign; coeff = DCT_VAL_CAT7_MIN (37) +
 809 next 5 EBs (this gives a range of +/- 37..68)
 810
 811 22, DCT_VAL_CATEGORY8
 812 next EB determines coefficient sign; coeff = DCT_VAL_CAT8_MIN (69) +
 813 next 9 EBs (this gives a range of +/- 69..580)
 814
 815 23, 24, 25, 26, 27, DCT_RUN_CATEGORY1
 816 coefficient of +/- 1 preceded by a number of 0s; next EB determines sign
 817 of coefficient; skip (token - 22) 0s in the output matrix before
 818 placing the final coefficient (this gives a range of 1..5 0s)
 819
 820 28, DCT_RUN_CATEGORY1B
 821 coefficient of +/- 1 preceded by a number of 0s; next EB determines sign
 822 of coefficient; skip (next 2 EBs + 6) 0s in the output matrix before
 823 placing the final coefficient (this gives a range of 6..9 0s)
 824
 825 29, DCT_RUN_CATEGORY1C
 826 coefficient of +/- 1 preceded by a number of 0s; next EB determines sign
 827 of coefficient; skip (next 3 EBs + 10) 0s in the output matrix before
 828 placing the final coefficient (this gives a range of 10..17 0s)
 829
 830 30, DCT_RUN_CATEGORY2
 831 coefficient of +/- 2..3 preceded by a single zero; next EB determines
 832 sign of coefficient; coefficient = (next EB + 2)
 833
 834 31, DCT_RUN_CATEGORY2B (not specifically named in VP3 source)
 835 coefficient of +/- 2..3 preceded by 2 or 3 0s; next EB determines
 836 sign of coefficient; coefficient = (next EB + 2); skip (next EB + 2) 0s
 837 before placing coefficient in output matrix
 838
 839 Note: EOB runs can, and often do, cross threshold stages and plane
 840 boundaries. For example, a decoder may have decoded all of the AC #2
 841 coefficients for all fragments and still have an EOB run of 2. That
 842 means that during the AC #3 decode process, the first 2 coded fragments
 843 that are not already EOB will be set to EOB.
 844
 845 Let's work through a highly contrived example to illustrate the
 846 coefficient decoding process.
 847
 848
 849
 850 [not finished]
 851
 852
 853
 854
 855 When the decoder is finished unpacking the DCT coefficients, the entire
 856 encoded VP3 frame bitstream should be consumed.
 857
 858
 859 Reversing The DC Prediction
 860 ---------------------------
 861 Now that all of the DCT coefficient data has been unpacked, the DC
 862 coefficients need to be fully reconstructed before the IDCT can be
 863 performed.
 864
 865 VP3 uses a somewhat involved process for DC prediction which uses up to
 866 four DC coefficients from surrounding fragments. For each fragment to be
 867 transformed with the IDCT, the DC coefficient is predicted from weighted
 868 sum of the DC coefficients in the left (l), up-left (ul), up (u), and
 869 up-right (ur) fragments, if they are coded (not unchanged from the
 870 previous frame) in a compatible frame (current, previous, or golden).
 871
 872 In a golden frame, the prediction is quite straightforward since all
 873 fragments will be coded. A fragment's DC prediction will fall into 1 of
 874 5 groups:
 875
 876      abbbbbbbbb
 877      cdddddddde
 878      cdddddddde
 879      cdddddddde
 880      cdddddddde
 881
 882 * Group a is the top left corner fragment. There is nothing to predict
 883 from. This DC coefficient has a lot of energy and requires many bits to
 884 code.
 885
 886 * Group b is the remainder of the top row of fragments. These fragments
 887 can only predict from the left fragment.
 888
 889 * Group c is the left column of fragments, not including the top left
 890 fragment. These fragments have the top and top-right fragments from
 891 which to predict.
 892
 893 * Group d is the main body of fragments. These fragments have access to
 894 all 4 predictors.
 895
 896 * Group e is the right column of fragments, not including the top right
 897 fragment. These fragments can predict from the left, up-left and up
 898 fragments.
 899
 900 The process of reversing prediction for interframes grows more complex.
 901 First, the decoder must evaluate which candidate fragments (l, ul, u, or
 902 ur) are available for as predictors. Then, it can only use fragments
 903 that are coded within the same frame (current, previous, or golden).
 904 Further, there are auxiliary predictors for each frame type that are
 905 initialized to 0 at the start of each video frame decode operation. The
 906 decoder falls back on these auxiliary predictors when it can not find
 907 any valid candidate predictors for the current fragment.
 908
 909 To work through some examples, consider the following notation, e.g.:
 910
 911   ul-C = up-left fragment, coded in the current frame
 912    u-P = up fragment, coded as a motion residual from the previous frame
 913   ur-C = up-right fragment, coded in the current frame
 914    l-G = left fragment, coded as a motion residual from the golden frame
 915    x-P = current fragment where DC prediction is being performed, coded
 916          as a motion residual from the previous frame
 917
 918 This is a simple case:
 919
 920    ul-C   u-C  ur-C
 921     l-C   x-C
 922
 923 The current fragment predicts from all four of the candidate fragments
 924 since they are coded in the same frame.
 925
 926    ul-P   u-C  ur-C
 927     l-P   x-P
 928
 929 The current fragment predicts from the left and up-left fragments.
 930
 931    ul-C   u-P  ur-G
 932     l-P   x-G
 933
 934 The current fragment predicts from the up-right fragment.
 935
 936    ul-C   u-C  ur-C
 937     l-C   x-G
 938
 939 The current fragment does not predict from any of the candidate
 940 fragments since the current fragment is a motion residual from the
 941 golden frame. Rather, add the auxiliary golden frame predictor to the
 942 current fragment's DC coefficient. Save the new DC coefficient as the
 943 new golden frame auxiliary DC predictor.
 944
 945 If the decoder only finds one valid candidate predictor, then it is used
 946 by itself. When the decoder finds multiple valid candidate fragments
 947 from which to predict DC, it applies a weighting function to the
 948 surrounding fragments' DC coefficients. The following table presents all
 949 16 possible combinations of available/not available predictors and what
 950 to do in each case:
 951
 952     ul   u  ur   l
 953     --  --  --  --
 954      0   0   0   0    no predictors available:
 955                         use the last predictor saved for the frame type
 956                         (either intra, inter, or golden)
 957
 958      0   0   0   1    left predictor available:
 959                         pred = l.dc
 960
 961      0   0   1   0    up-right predictor available:
 962                         pred = ur.dc
 963
 964      0   0   1   1    up-right, left predictors available:
 965                         pred = (53 * ur.dc) + (75 * l.dc)
 966                                --------------------------
 967                                          128
 968
 969      0   1   0   0    up predictor available:
 970                         pred = u.dc
 971
 972      0   1   0   1    up, left predictors available:
 973                         pred = (u.dc + l.dc)
 974                                -------------
 975                                      2
 976
 977      0   1   1   0    up, up-right predictors available:
 978                         discard up-right predictor
 979                         pred = u.dc
 980
 981      0   1   1   1    up, up-right, left predictors available:
 982                         discard up predictor
 983                         pred = (53 * ur.dc) + (75 * l.dc)
 984                                --------------------------
 985                                          128
 986
 987      1   0   0   0    up-left predictor available:
 988                         pred = ul.dc
 989
 990      1   0   0   1    up-left, left predictors available:
 991                         discard up-left predictor
 992                         pred = l.dc
 993
 994      1   0   1   0    up-left, up-right predictors available:
 995                         pred = (ul.dc + ur.dc)
 996                                ---------------
 997                                       2
 998
 999      1   0   1   1    up-left, up-right, left predictors available:
1000                         discard up-left predictor
1001                         pred = (53 * ur.dc) + (75 * l.dc)
1002                                --------------------------
1003                                          128
1004
1005      1   1   0   0    up-left, up predictors available:
1006                         discard up-left
1007                         pred = u.dc
1008
1009      1   1   0   1    up-left, up, left predictors available:
1010                         pred = (-26 * ul.dc + 29 * u.dc + 29 * l.dc)
1011                                -------------------------------------
1012                                                  32
1013
1014      1   1   1   0    up-left, up, up-right predictors available:
1015                         pred = (3 * ul.dc + 10 * u.dc + 3 * ur.dc)
1016                                -----------------------------------
1017                                                 16
1018
1019      1   1   1   1    all 4 predictors available:
1020                         discard up-right predictor
1021                         pred = (-26 * ul.dc + 29 * u.dc + 29 * l.dc)
1022                                -------------------------------------
1023                                                  32
1024
1025 Note that this final prediction case ([ul u l]) risks outranging. The
1026 difference of the predicted DC is checked against u.dc, l.dc, and ul.dc,
1027 in that order, and if the difference is greater than 128 in any case,
1028 the predictor is assigned as that DC coefficient. In pseudocode:
1029
1030   if (ABSOLUTE_VALUE(pred - u.dc) > 128)
1031     pref = u.dc
1032   else if (ABSOLUTE_VALUE(pred - l.dc) > 128)
1033     pref = l.dc
1034   else if (ABSOLUTE_VALUE(pred - ul.dc) > 128)
1035     pref = ul.dc
1036
1037 The predicted value is, at long last, added to the fragment's decoded DC
1038 coefficient. Finally, the new DC coefficient is saved as the frame
1039 type's auxiliary predictor. For example, if this fragment is coded as a
1040 motion residual from the previous frame, save the fragment's DC
1041 coefficient as the previous frame auxiliary predictor.
1042
1043
1044 [still need to mention precise rounding considerations, a.k.a, the
1045 HIGHTBITDUPPED() macro]
1046
1047
1048
1049 Reconstructing The Frame
1050 ------------------------
1051 rough outline:
1052   - foreach fragment:
1053     - if motion vector
1054       - copy motion fragment from appropriate frame into current frame
1055         (don't forget to account for unrestricted motion vectors)
1056     - dequantize fragment coefficients
1057     - run coefficients through inverse DCT
1058     - if INTRA coded fragment
1059       - output transformed coefficients
1060     - else
1061       - apply transformed residual to motion fragment
1062
1063 [not finished]
1064
1065
1066 Theora Specification
1067 --------------------
1068 The Theora project leverages the VP3 codec into a new video coding
1069 system. The algorithm and bitstream format are the same as VP3 with a
1070 few minor differences:
1071
1072 1) The frame orientation is reversed-- VP3 is coded from bottom to top
1073 while Theora video is coded from top to bottom.
1074 [nope-- only true in the first few alpha releases; final Theora spec will
1075 be upside-down, the same as VP3]
1076
1077 2) Variable histograms-- VP3 uses a hardcoded set of histograms for DCT
1078 coefficient coding (described in section "Unpacking The DCT
1079 Coefficients"). Theora packs the histogram information in the header of
1080 the transport format (which is meant to be Ogg, but can probably be
1081 coerced into a variety of other multimedia container formats).
1082
1083 3) Variable quantization-- As with the histograms, Theora codes the
1084 quantization tables and quality thresholds (described in section
1085 "Initializing The Quantization Matrices") into the header.
1086
1087 4) [special VLRLC case for encoding unusually large runs of blocks;
1088 necessary for HD resolutions]
1089
1090 [still need coding format of histogram and quantizer information]
1091
1092
1093 Appendix A: VP31 Quantization Matrices And Scale Factors
1094 --------------------------------------------------------
1095 The following quantization matrices and scale factor tables are hardcoded
1096 into the VP31 coding standard. These tables can vary according to the
1097 setup information transported along with a Theora file.
1098
1099 Base quantization matrix for golden frame Y fragments (note that this
1100 is the same as JPEG):
1101
1102     16  11  10  16  24  40  51  61
1103     12  12  14  19  26  58  60  55
1104     14  13  16  24  40  57  69  56
1105     14  17  22  29  51  87  80  62
1106     18  22  37  58  68 109 103  77
1107     24  35  55  64  81 104 113  92
1108     49  64  78  87 103 121 120 101
1109     72  92  95  98 112 100 103  99
1110
1111
1112 Base quantization matrix for golden frame C fragments (note that this
1113 is the same as JPEG):
1114
1115     17  18  24  47  99  99  99  99
1116     18  21  26  66  99  99  99  99
1117     24  26  56  99  99  99  99  99
1118     47  66  99  99  99  99  99  99
1119     99  99  99  99  99  99  99  99
1120     99  99  99  99  99  99  99  99
1121     99  99  99  99  99  99  99  99
1122     99  99  99  99  99  99  99  99
1123
1124
1125 Base quantization matrix for interframe Y and C fragments:
1126
1127     16  16  16  20  24  28  32  40
1128     16  16  20  24  28  32  40  48
1129     16  20  24  28  32  40  48  64
1130     20  24  28  32  40  48  64  64
1131     24  28  32  40  48  64  64  64
1132     28  32  40  48  64  64  64  96
1133     32  40  48  64  64  64  96 128
1134     40  48  64  64  64  96 128 128
1135
1136
1137 DC coefficient scale factor table:
1138
1139    220 200 190 180 170 170 160 160
1140    150 150 140 140 130 130 120 120
1141    110 110 100 100  90  90  90  80
1142     80  80  70  70  70  60  60  60
1143     60  50  50  50  50  40  40  40
1144     40  40  30  30  30  30  30  30
1145     30  20  20  20  20  20  20  20
1146     20  10  10  10  10  10  10  10
1147
1148
1149 AC coefficient scale factor table:
1150
1151    500 450 400 370 340 310 285 265
1152    245 225 210 195 185 180 170 160
1153    150 145 135 130 125 115 110 107
1154    100  96  93  89  85  82  75  74
1155     70  68  64  60  57  56  52  50
1156     49  45  44  43  40  38  37  35
1157     33  32  30  29  28  25  24  22
1158     21  19  18  17  15  13  12  10
1159
1160
1161 Appendix B: Macroblock Coding Mode Alphabets
1162 --------------------------------------------
1163 These are the 6 pre-defined alphabets used to decode macroblock coding
1164 mode information:
1165
1166 Alphabet 1:
1167   index   coding mode
1168   -----   -----------
1169     0     MODE_INTER_LAST_MV
1170     1     MODE_INTER_PRIOR_LAST
1171     2     MODE_INTER_PLUS_MV
1172     3     MODE_INTER_NO_MV
1173     4     MODE_INTRA
1174     5     MODE_USING_GOLDEN,
1175     6     MODE_GOLDEN_MV
1176     7     MODE_INTER_FOURMV
1177
1178 Alphabet 2:
1179   index   coding mode
1180   -----   -----------
1181     0     MODE_INTER_LAST_MV
1182     1     MODE_INTER_PRIOR_LAST
1183     2     MODE_INTER_NO_MV
1184     3     MODE_INTER_PLUS_MV
1185     4     MODE_INTRA
1186     5     MODE_USING_GOLDEN
1187     6     MODE_GOLDEN_MV
1188     7     MODE_INTER_FOURMV
1189
1190 Alphabet 3:
1191   index   coding mode
1192   -----   -----------
1193     0     MODE_INTER_LAST_MV
1194     1     MODE_INTER_PLUS_MV
1195     2     MODE_INTER_PRIOR_LAST
1196     3     MODE_INTER_NO_MV
1197     4     MODE_INTRA
1198     5     MODE_USING_GOLDEN
1199     6     MODE_GOLDEN_MV
1200     7     MODE_INTER_FOURMV
1201
1202 Alphabet 4:
1203   index   coding mode
1204   -----   -----------
1205     0     MODE_INTER_LAST_MV
1206     1     MODE_INTER_PLUS_MV
1207     2     MODE_INTER_NO_MV
1208     3     MODE_INTER_PRIOR_LAST
1209     4     MODE_INTRA
1210     5     MODE_USING_GOLDEN
1211     6     MODE_GOLDEN_MV
1212     7     MODE_INTER_FOURMV
1213
1214 Alphabet 5:
1215   index   coding mode
1216   -----   -----------
1217     0     MODE_INTER_NO_MV
1218     1     MODE_INTER_LAST_MV
1219     2     MODE_INTER_PRIOR_LAST
1220     3     MODE_INTER_PLUS_MV
1221     4     MODE_INTRA
1222     5     MODE_USING_GOLDEN
1223     6     MODE_GOLDEN_MV
1224     7     MODE_INTER_FOURMV
1225
1226 Alphabet 6:
1227   index   coding mode
1228   -----   -----------
1229     0     MODE_INTER_NO_MV
1230     1     MODE_USING_GOLDEN
1231     2     MODE_INTER_LAST_MV
1232     3     MODE_INTER_PRIOR_LAST
1233     4     MODE_INTER_PLUS_MV
1234     5     MODE_INTRA
1235     6     MODE_GOLDEN_MV
1236     7     MODE_INTER_FOURMV
1237
1238
1239 Appendix C: DCT Coefficient VLC Tables
1240 --------------------------------------
1241 - VP31 tables are hardcoded
1242 - Theora tables are transported with video stream
1243
1244 [not finished]
1245
1246
1247 Appendix D: The VP3 IDCT
1248 ------------------------
1249
1250 [not finished]
1251
1252
1253 Acknowledgements
1254 ----------------
1255 Thanks to Michael Niedermayer (michaelni at gmx dot at) for peer review,
1256 corrections, and recommendations for improvement.
1257
1258 Dan Miller (dan at on2 dot com) for clarifications on pieces of the
1259 format.
1260
1261 Timothy B. Terriberry (tterribe at vt dot edu) for clarification about the
1262 differences between VP3 and Theora, detailed explanation of motion
1263 vector mechanics.
1264
1265
1266 References
1267 ----------
1268 Tables necessary for decoding VP3:
1269 http://mplayerhq.hu/cgi-bin/cvsweb.cgi/~checkout~/ffmpeg/libavcodec/vp3data.h?content-type=text/x-cvsweb-markup&cvsroot=FFMpeg
1270
1271 Official VP3 site:
1272 http://www.vp3.com/
1273
1274 Theora, based on VP3:
1275 http://www.theora.org/
1276
1277 On2, creators of the VP3 format:
1278 http://www.on2.com/
1279
1280
1281 ChangeLog
1282 ---------
1283 v0.5: December 8, 2004
1284 - reworked section "Reversing The DC Prediction" to include a tabular
1285 representation of all 16 prediction modes
1286
1287 v0.4: March 2, 2004
1288 - renamed and expanded section "Initializing The Quantization Matrices"
1289 - outlined section "Reconstructing The Frame"
1290 - moved Theora Differences Appendix to its own section entitled "Theora
1291 Specification"
1292 - added Appendix: Quantization Matrices And Scale Factors
1293 - added Appendix: DCT Coefficient VLC Tables
1294
1295 v0.3: February 29, 2004
1296 - expanded section "Unpacking The Macroblock Coding Mode Information"
1297 - expanded section "Unpacking The Macroblock Motion Vectors"
1298 - added Appendix: Macroblock Coding Mode Alphabets
1299
1300 v0.2: October 9, 2003
1301 - expanded section "Reversing the DC Prediction"
1302 - added Appendix: Theora Differences
1303
1304 v0.1: June 17, 2003
1305 - initial release, nowhere near complete