doc/framing.txt

   1 ********************************************************************
   2 *                                                                  *
   3 * THIS FILE IS PART OF THE Ogg Vorbis SOFTWARE CODEC SOURCE CODE.  *
   4 * USE, DISTRIBUTION AND REPRODUCTION OF THIS SOURCE IS GOVERNED BY *
   5 * THE GNU PUBLIC LICENSE 2, WHICH IS INCLUDED WITH THIS SOURCE.    *
   6 * PLEASE READ THESE TERMS DISTRIBUTING.                            *
   7 *                                                                  *
   8 * THE OggSQUISH SOURCE CODE IS (C) COPYRIGHT 1994-1999             *
   9 * by 1999 Monty <monty@xiph.org> and The XIPHOPHORUS Company       *
  10 * http://www.xiph.org/                                             *
  11 *                                                                  *
  12 ********************************************************************
  13
  14  function: discussion of Vorbis framing
  15  author: Monty <xiphmont@mit.edu>, <monty@xiph.org>
  16  modifications by: Monty
  17  last modification date: Jun 30 1999
  18
  19 ********************************************************************
  20
  21 Vorbis encodes short-time blocks of PCM data into raw packets of
  22 bit-packed data.  These raw packets may be used directly by transport
  23 mechanisms that provide their own framing and packet-seperation
  24 mechanisms (such as UDP datagrams).  For stream based storage (such as
  25 files) and transport (such as TCP streams or pipes), Vorbis also
  26 specifies an additional layer of bitstream structure to provide
  27 framing/sync, sync recapture after error, landmarks during seeking,
  28 and enough information to properly seperate data back into packets at
  29 the original packet boundaries without relying on decoding to find
  30 packet boundaries.
  31
  32 Design constraints:
  33
  34 1) True streaming; we must not need to seek to build a 100% complete
  35    bitstream.
  36
  37 2) Use no more than approximately 1-2% of bitstream bandwidth for packet
  38    boundary marking, high-level framing, sync and seeking.
  39
  40 3) Specification of absolute position within the original sample stream.
  41
  42 4) Simple mechanism to ease limited editing, such as a simplified
  43    concatenation mechanism.
  44
  45 5) Detection of corruption and recapture after error.
  46
  47 A vorbis stream is structured by dividing packets into segments of up
  48 to 255 bytes and then wrapping a group of contiguous packet segments
  49 into a variable length page preceeded by a page header.  Both the
  50 header size and page size are variable; the page header contains
  51 sizing information and checksum data to determine header/page size and
  52 data integrity.
  53
  54 The bitstream is captured (or recaptured) by looking for the beginning
  55 of a page, specifically the capture pattern.  Once the capture pattern
  56 is found, the decoder verifies page sync and integrity by computing
  57 and comparing the checksum. At that point, the decoder can extract the
  58 packetss themselves.
  59
  60 **** Packet segmentation
  61
  62 Packets are logically divided into multiple segments before encoding
  63 into a page. Note that the segmentation and fragmentation process is a
  64 logical one; it's used to compute page header values and the original
  65 page data need not be disturbed, even when a packet spans page
  66 boundaries.
  67
  68 The raw packet is logically divided into [n] 255 byte segments and a
  69 last fractional segment of < 255 bytes.  A packet size may well
  70 consist only of the trailing fractional segment, and a fractional
  71 segment may be zero length.  These values, called "lacing values" are
  72 then saved and placed into the header segment table.
  73
  74 An example should make the basic concept clear:
  75
  76 raw packet:
  77   ___________________________________________
  78  |______________packet data__________________| 753 bytes
  79
  80 lacing values for page header segment table: 255,255,243
  81
  82 We simply add the lacing values for the total size; the last lacing
  83 value for a packet is always the value that is less than 255. Note
  84 that this encoding both avoids imposing a maximum packet size as well
  85 as imposing minimum overhead on small packets (as opposed to, eg,
  86 simply using two bytes at the head of every packet and having a max
  87 packet size of 32k.  Small packets (<255, the typical case) are
  88 penalized with twice the segmentation overhead). Using the lacing
  89 values as suggested, small packets see the minimum possible
  90 byte-aligned overheade (1 byte) and large packets, over 512 bytes or
  91 so, see a fairly constant ~.5% overhead on encoding space.
  92
  93 Clarification of boundary cases:
  94
  95 Note that a lacing value of 255 implies that a second lacing value
  96 follows in the packet, and a value of < 255 marks the end of the
  97 packet after that many additional bytes.  A packet of 255 bytes (or a
  98 multiple of 255 bytes) is terminated by a lacing value of 0:
  99
 100 raw packet:
 101   _______________________________
 102  |________packet data____________|          255 bytes
 103
 104 lacing values: 255, 0
 105
 106 Note also that a 'nil' (zero length) packet is not an error; it
 107 consists of nothing more than a lacing value of zero in the header.
 108
 109 **** Packets spanning pages:
 110
 111 Packets are not resticted to beginning and ending within a page,
 112 although individual segments are, by definition, required to do so.
 113 Packets are not restricted to a maximum size, although excessively
 114 large packets in the data stream are discouraged; the Vorbis
 115 specification strongly recommend nominal page size of approximately
 116 4-8kB (large packets are forseen as being useful for initialization
 117 data at the beginning of a logical bitstream).
 118
 119 After segmenting a packet, the encoder may decide not to place all the
 120 resulting segments into the current page; to do so, the encoder places
 121 the lacing values of the segments it wishes to belong to the current
 122 page into the current segment table, then finishes the page.  The next
 123 page is begun with the first value in the segment table belonging to
 124 the next packet segment, thus continuing the packet (data in the
 125 packet body must also correspond properly to the lacing values in the
 126 spanned pages. The segment data in the first packet corresponding to the
 127 lacing values of the first page belong in that page; packet segments listed in the segment table of the following page must begin the page body of the subsequent page).
 128
 129 The last mechanic to spanning a page boundary is to set the header
 130 flag in the new page to indicate that the first lacing value in the
 131 segment table continues rather than begins a packet; a header flag of
 132 0x02 is used to indicate a continued packet.  Although mandatory, it
 133 is not actually algorithmically necessary; one could inspect the
 134 preceeding segment table to determine if the packet is new or
 135 continued.  Adding the information to the packet_header flag allows a
 136 simpler design (with no overhead) that needs only inspect the current
 137 page header after frame capture.  This also allows faster error
 138 recovery in the event that the packet originates in a corrupt
 139 preceeding page, implying that the previous page's segment table
 140 cannot be trusted.
 141
 142 Note that a packet can span an arbitrary number of pages; the above
 143 spanning process is repeated for each spanned page boundary.  Also a
 144 'zero termination' on a packet size that is an even multiple of 255
 145 must appear even if the lacing value appears in the next page as a
 146 zero-length continuation of the current packet.  The header flag
 147 should be set to 0x02 to indicate that the packet spanned, even though
 148 the span is a nil case as far as data is concerned.
 149
 150 The encoding looks odd, but is properly optimized for speed and the
 151 expected case of the majority of packets being between 50 and 200
 152 bytes (note that it is designed such that packets of wildly different
 153 sizes can be handled within the model; placing packet size
 154 restrictions on the encoder would have only slightly simplified design
 155 in page generation and increased overall encoder complexity).
 156
 157 The main point behind tracking individual packets (and packet
 158 segments) is to allow more flexible encoding tricks that requiring
 159 explicit knowledge of packet size. An example is simple bandwidth
 160 limiting, implemented by simply truncating packets in the nominal case
 161 (the packet is arranged so that the least sensitive portion of the
 162 data comes last).
 163
 164 **** Page header
 165
 166 The headering mechanism is designed to avoid copying and re-assembly
 167 of the packet data; the header can be generated directly from incoming
 168 packet data.  The encoder buffers packet data until it finishes a
 169 complete page at which point it writes the header followed by the
 170 buffered packet segments.
 171
 172 capture_pattern:
 173
 174  A header begins with a capture pattern that simplifies identifying
 175  pages; once the decoder has found the capture pattern it can do a more
 176  intensive job of verifying that it has in fact found a page boundary
 177  (as opposed to an inadvertant coincidence in the byte stream).
 178
 179  byte value
 180   0  0x4f 'O'
 181   1  0x67 'g'
 182   2  0x67 'g'
 183   3  0x53 'S'
 184
 185 stream_structure_version:
 186
 187  The capture pattern is followed by the stream structure revision:
 188
 189   4  0x00
 190
 191
 192 header_type_flag:
 193
 194  The header type flag identifies this page's context in the bitstream:
 195
 196   5  0x00 (beginning of bitstream)
 197      0x01 (bitstream continued, fresh packet)
 198      0x02 (bitstream continued, continued packet)
 199
 200 PCM absolute position
 201
 202  (This is packed in the same way the rest of Vorbis packet data is
 203  packed; LSb of LSB first.  Note that the 'position' data specifies a
 204  sample number (eg, in a CD quality sample is four octets, 16 bits for
 205  left and 16 bits for right).  The position specified is the total
 206  samples encoded after including all packets begun in this page.  The
 207  rationale here is that the position specified in the frame header of
 208  the last page tells how long the PCM data coded by the bitstream is.
 209
 210   6  0xXX LSB
 211   7  0xXX
 212   8  0xXX
 213   9  0xXX
 214  10  0xXX
 215  11  0xXX
 216  12  0xXX
 217  13  0xXX MSB
 218
 219 stream serial number:
 220
 221  Vorbis allows for seperate logical bitstreams to be mixed at page
 222  granularity in a physical bitstream.  The most common case would be
 223  sequential arrangement, but it is possible to interleave pages for
 224  two seperate bitstreams to be decoded concurrently.  Right now, the
 225  standard code doesn't use the serial number (sets it to zero), but it
 226  will eventually.  Each logical stream must have a unique serial
 227  number within a physical stream:
 228
 229  14  0xXX LSB
 230  15  0xXX
 231  16  0xXX
 232  17  0xXX MSB
 233
 234 page sequence no
 235
 236  Page counter; lets us know if a page is lost (useful where packets
 237  span page boundaries).
 238
 239  18  0xXX LSB
 240  19  0xXX
 241  20  0xXX
 242  21  0xXX MSB
 243
 244 page checksum
 245
 246  32 bit CRC value (direct algorithm, initial val and final XOR = 0,
 247  generator polynomial=0x04c11db7).  The value is computed over the
 248  entire header (with the CRC field in the header set to zero) and then
 249  continued over the page.  The CRC field is then filled with the
 250  computed value.
 251
 252  (A thorough discussion of CRC algorithms can be found in "A Painless
 253  Guide to CRC Error Detection Algorithms" by Ross Williams
 254  (ross@guest.adelaide.edu.au).  The document is available from
 255  ftp://ftp.adelaide.edu.au/pub/rocksoft)
 256
 257  22  0xXX LSB
 258  23  0xXX
 259  24  0xXX
 260  25  0xXX MSB
 261
 262 page_segments
 263
 264  The number of segment entries to appear in the segment table. The
 265  maximum number of 255 segments (255 bytes each) sets the maximum
 266  possible physical page size at 65307 bytes or just under 64kB (thus
 267  we know that a header corrupted so as destroy sizing/alignment
 268  information will not cause a runaway bitstream.  We'll read in the
 269  page according to the corrupted size information that's guaranteed to
 270  be a reasonable size regardless, notice the checksum mismatch, drop
 271  sync and then look for recapture).
 272
 273  26 0x00-0xff (0-255)
 274
 275 segment_table (containing packet lacing values)
 276
 277  The lacing values for each packet segment physically appearing in
 278  this page are listed in contiguous order.
 279
 280  27 0x00-0xff (0-255)
 281  [...]
 282  n  0x00-0xff (0-255, n=page_segments+26)
 283
 284 Total page size is calculated directly from the known header size and
 285 lacing values in the segment table. Packet data segments follow
 286 immediately after the header.
 287
 288 Page headers typically impose a flat .25-.5% space overhead assuming
 289 nominal ~8k page sizes.  The segmentation table needed for exact
 290 packet recovery in the streaming layer adds approximately .5-1%
 291 nominal assuming expected encoder behavior in the 44.1kHz, 128kbps
 292 stereo encodings.
 293