<section id="vorbis-spec-intro">
<sectioninfo>
<releaseinfo>
- $Id: 01-introduction.xml,v 1.3 2002/10/14 23:47:54 giles Exp $
+ $Id: 01-introduction.xml,v 1.4 2002/10/21 23:36:59 giles Exp $
<emphasis>Last update to this document: July 18, 2002</emphasis>
</releaseinfo>
</sectioninfo>
<para>
This document provides a high level description of the Vorbis codec's
construction. A bit-by-bit specification appears beginning in <xref
-linkend="vorbis-spec-codec"/>. The other reference documents assumes
-a high-level understanding of the Vorbis decode process, which is
-provided in this document.</para>
+linkend="vorbis-spec-codec"/>. The later sections assume a high-level
+understanding of the Vorbis decode process, which is
+provided here.</para>
<section>
<title>Application</title>
Vorbis is a general purpose perceptual audio CODEC intended to allow
maximum encoder flexibility, thus allowing it to scale competitively
over an exceptionally wide range of bitrates. At the high
-quality/bitrate end of the scale (CD or DAT rate stereo, 16/24 bits),
+quality/bitrate end of the scale (CD or DAT rate stereo, 16/24 bits)
it is in the same league as MPEG-2 and MPC. Similarly, the 1.0
-encoder can encode high-quality CD and DAT rate stereo at below 48kpbs
+encoder can encode high-quality CD and DAT rate stereo at below 48kbps
without resampling to a lower rate. Vorbis is also intended for
lower and higher sample rates (from 8kHz telephony to 192kHz digital
masters) and a range of channel representations (monaural,
computationally simpler than mp3, although it does require more
working memory as Vorbis has no static probability model; the vector
codebooks used in the first stage of decoding from the bitstream are
-packed, in their entirety, into the Vorbis bitstream headers. In
+packed in their entirety into the Vorbis bitstream headers. In
packed form, these codebooks occupy only a few kilobytes; the extent
to which they are pre-decoded into a cache is the dominant factor in
decoder memory usage.
Vorbis provides none of its own framing, synchronization or protection
against errors; it is solely a method of accepting input audio,
dividing it into individual frames and compressing these frames into
-raw, unformatted 'packets'. The decoder then accepts these raw
+raw, unformatted 'packets'. The decoder then accepts these raw
packets in sequence, decodes them, synthesizes audio frames from
them, and reassembles the frames into a facsimile of the original
-audio stream. Vorbis is a free-form VBR codec and packets have no
+audio stream. Vorbis is a free-form variable bit rate (VBR) codec and packets have no
minimum size, maximum size, or fixed/expected size. Packets
are designed so that they may be truncated (or padded) and remain
decodable; this is not to be considered an error condition and is used
requirement or fundamental assumption in the Vorbis design.</para>
<para>
-<ulink url="vorbis-ogg.html">The specifications for embedding Vorbis into
-an Ogg transport stream is in a separate document.</ulink></para>
+The specification for embedding Vorbis into
+an Ogg transport stream is in <xref linkend="vorbis-over-ogg"/>.
+</para>
+
</section>
<section>
<title>Codec Setup and Probability Model</title>
<para>
-Vorbis's heritage is as a research CODEC and its current design
+Vorbis' heritage is as a research CODEC and its current design
reflects a desire to allow multiple decades of continuous encoder
improvement before running out of room within the codec specification.
-For these reasons, configurable aspects codec setup intentionally
+For these reasons, configurable aspects of codec setup intentionally
lean toward the extreme of forward adaptive.</para>
<para>
parameters (often several hundred fields). This makes it impossible,
as it would be with MPEG audio layers, to embed a simple frame type
flag in each audio packet, or begin decode at any frame in the stream
-without having previously fetched the codec setup header. [Note:
-Vorbis *can* initiate decode at any arbitrary packet within a
+without having previously fetched the codec setup header.
+</para>
+
+<note><para>
+Vorbis <emphasis>can</emphasis> initiate decode at any arbitrary packet within a
bitstream so long as the codec has been initialized/setup with the
-setup headers].</para>
+setup headers.</para></note>
<para>
Thus, Vorbis headers are both required for decode to begin and
current design trends (and also points out limitations in some
existing software/interface designs, such as Windows' ACM codec
framework). However, we find that it does not fundamentally limit
-Vorbis's suitable application space.</para>
+Vorbis' suitable application space.</para>
</section>
<para>
The comment header includes user text comments ("tags") and a vendor
string for the application/library that produced the bitstream. The
-encoding of the comment header is described within this document; the
-proper use of the comment fields is described in
+encoding and proper use of the comment header are described in
<xref linkend="vorbis-spec-comment"/>.</para>
</section>
<listitem><simpara>compute dot product of floor and residue, producing audio spectrum vector</simpara></listitem>
<listitem><simpara>inverse monolithic transform of audio spectrum vector, always an MDCT in Vorbis I</simpara></listitem>
<listitem><simpara>overlap/add left-hand output of transform with right-hand output of previous frame</simpara></listitem>
-<listitem><simpara>store right hand-data from transform of current frame for future lapping.</simpara></listitem>
+<listitem><simpara>store right-hand data from transform of current frame for future lapping</simpara></listitem>
<listitem><simpara>if not first frame, return results of overlap/add as audio result of current frame</simpara></listitem>
</orderedlist>
</para>
<para>
Vorbis I uses four packet types. The first three packet types mark each
-of the three Vorbis headers described above. The fourth packet type
-marks an audio packet. All others packet types are reserved; packets
-marked with a reserved flag type should be ignored.</para>
+of the three Vorbis headers described above. The fourth packet type
+marks an audio packet. All other packet types are reserved; packets
+marked with a reserved type should be ignored.</para>
<para>
Following the three header packets, all packets in a Vorbis I stream
must be modified for seamless lapping as above. It is possible to
correctly infer window shape to be applied to the current window from
knowing the sizes of the current, previous and next window. It is
-legal for a decoder to use this method; However, in the case of a long
+legal for a decoder to use this method. However, in the case of a long
window (short windows require no modification), Vorbis also codes two
flag bits to specify pre- and post- window shape. Although not
strictly necessary for function, this minor redundancy allows a packet
audio</ulink></citetitle>, by T. Sporer, K. Brandenburg and B. Edler. Vorbis windows
all use the slope function
<inlineequation>
+ <!-- this equation is incorrect -->
<alt>y=sin(2PI*sin^2(x/n))</alt>
<inlinemediaobject>
<textobject>
<para>
A detailed discussion of stereo in the Vorbis codec can be found in
-the document <ulink url="stereo.html">_Stereo Channel Coupling in the
-Vorbis CODEC_</ulink>. Vorbis is not limited to only stereo coupling, but
+the document <ulink url="stereo.html"><citetitle>Stereo Channel Coupling in the
+Vorbis CODEC</citetitle></ulink>. Vorbis is not limited to only stereo coupling, but
the stereo document also gives a good overview of the generic coupling
mechanism.</para>
angle) back to Cartesian representation.</para>
<para>
-After decoupling, in order, each pair of vectors on the coupling list
-in, the resulting residue vector represents the fine spectral detail
+After decoupling, in order, each pair of vectors on the coupling list,
+the resulting residue vectors represent the fine spectral detail
of each output channel.</para>
</section>
<para>
The audio spectrum is converted back into time domain PCM audio via an
inverse Modified Discrete Cosine Transform (MDCT). A detailed
-description of the MDCT is available in the paper
-<citetitle pubwork="article"><ulink
-url="http://www.iocon.com/resource/docs/ps/eusipco_corrected.ps">
-The use of multirate filter banks for coding of high quality digital
-audio</ulink></citetitle>, by T. Sporer, K. Brandenburg and B. Edler.</para>
+description of the MDCT is available in the paper <ulink
+url="http://www.iocon.com/resource/docs/ps/eusipco_corrected.ps"><citetitle pubwork="article">The use of multirate filter banks for coding of high quality digital
+audio</citetitle></ulink>, by T. Sporer, K. Brandenburg and B. Edler.</para>
<para>
Note that the PCM produced directly from the MDCT is not yet finished
<section id="vorbis-spec-bitpacking">
<sectioninfo>
<releaseinfo>
- $Id: 02-bitpacking.xml,v 1.3 2002/10/14 18:20:30 giles Exp $
+ $Id: 02-bitpacking.xml,v 1.4 2002/10/21 23:37:00 giles Exp $
<emphasis>Last update to this document: July 14, 2002</emphasis>
</releaseinfo>
</sectioninfo>
<para>
The most ubiquitous architectures today consider a 'byte' to be an
octet (eight bits) and a word to be a group of two, four or eight
-bytes (16,32 or 64 bits). Note however that the Vorbis bitpacking
+bytes (16, 32 or 64 bits). Note however that the Vorbis bitpacking
convention is still well defined for any native byte size; Vorbis uses
the native bit-width of a given storage system. This document assumes
that a byte is one octet for purposes of example.</para>
The Vorbis codec has need to code arbitrary bit-width integers, from
zero to 32 bits wide, into packets. These integer fields are not
aligned to the boundaries of the byte representation; the next field
-is written at the bit position that the previous field ends.</para>
+is written at the bit position at which the previous field ends.</para>
<para>
-The encoder logically packs integers by writing the LSb of an binary
+The encoder logically packs integers by writing the LSb of a binary
integer to the logical bitstream first, followed by next least
significant bit, etc, until the requested number of bits have been
coded. When packing the bits into bytes, the encoder begins by
<section id="vorbis-spec-codebook">
<sectioninfo>
<releaseinfo>
- $Id: 03-codebook.xml,v 1.2 2002/10/14 23:47:54 giles Exp $
+ $Id: 03-codebook.xml,v 1.3 2002/10/21 23:37:00 giles Exp $
<emphasis>Last update to this document: August 8, 2002</emphasis>
</releaseinfo>
</sectioninfo>
<title>Packed Codebook Format</title>
<para>
-For purposes of the below examples, we assume that the storage
+For purposes of the examples below, we assume that the storage
system's native byte width is eight bits. This is not universally
true; see <xref linkend="vorbis-spec-bitpacking"/> for discussion
relating to non-eight-bit bytes.</para>
<para>
The decoder now performs for each of the <varname>[codebook_entries]</varname>
- code book entries:
+ codebook entries:
<screen>
<screen>
1) [current_entry] = 0;
2) [current_length] = read a five bit unsigned integer and add 1;
- 3) [number] = read <link linkend="vorbis-spec-ilog">ilog</link>([codebook_entries] -
-[current_entry])
-bits as an unsigned integer
+ 3) [number] = read <link linkend="vorbis-spec-ilog">ilog</link>([codebook_entries] - [current_entry]) bits as an unsigned integer
4) set the entries [current_entry] through [current_entry]+[number]-1, inclusive,
- of the [codebook_codeword_lengths] array to [current_length]
+ of the [codebook_codeword_lengths] array to [current_length]
5) set [current_entry] to [number] + [current_entry]
6) increment [current_length] by 1
- 7) if [current_entry] is greater than [codebook_entries] ERROR CONDITION; the decoder will
- not be able to read this stream.
+ 7) if [current_entry] is greater than [codebook_entries] ERROR CONDITION;
+ the decoder will not be able to read this stream.
8) if [current_entry] is less than [codebook_entries], repeat process starting at 3)
9) done.
</screen></para>
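<para>
The ordered length decode above can be sketched in Python as follows.
This is an illustrative sketch, not reference code: the
<varname>BitReader</varname> class and the function names are
hypothetical helpers introduced here. The reader assumes eight-bit bytes
and the LSb-first packing described in
<xref linkend="vorbis-spec-bitpacking"/>. The overrun check of step 7 is
performed before the array is written, so the sketch never stores past
the end of the array.</para>

```python
def ilog(x):
    """Number of bits needed to hold x; ilog(0) is 0, negatives give 0."""
    return x.bit_length() if x > 0 else 0


class BitReader:
    """Hypothetical LSb-first bit reader over a bytes object (8-bit bytes)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # absolute bit position within data

    def read(self, nbits):
        """Read nbits as an unsigned integer, least significant bit first."""
        value = 0
        for i in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (self.pos & 7)) & 1
            value |= bit << i
            self.pos += 1
        return value


def decode_ordered_lengths(reader, codebook_entries):
    """Fill the codeword length array following steps 1 to 9 above."""
    lengths = [0] * codebook_entries
    current_entry = 0                                   # step 1
    current_length = reader.read(5) + 1                 # step 2
    while True:
        # step 3: how many consecutive entries share current_length
        number = reader.read(ilog(codebook_entries - current_entry))
        # step 7, checked early so the write below stays in bounds
        if current_entry + number > codebook_entries:
            raise ValueError("codeword length run overruns the codebook")
        for i in range(current_entry, current_entry + number):  # step 4
            lengths[i] = current_length
        current_entry += number                         # step 5
        current_length += 1                             # step 6
        if current_entry >= codebook_entries:           # steps 8 and 9
            return lengths
```

<para>
For example, the two bytes 0x40 0x02 with four codebook entries encode
two entries of length 1 followed by two entries of length 2.</para>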
7) read a total of [codebook_lookup_values] unsigned integers of [codebook_value_bits] each;
store these in order in the array [codebook_multiplicands]
-
</screen></para>
</listitem><listitem>
<para>A <varname>[codebook_lookup_type]</varname> of greater than two is reserved
-and indicates
-a stream that's not decodable by the specification in this
+and indicates a stream that is not decodable by the specification in this
document.</para>
</listitem>
</itemizedlist>
<para>
Lookup type one specifies a lattice VQ lookup table built
-algorithmically from a list of scalar values. Calculate [unpack] the
+algorithmically from a list of scalar values. Calculate (unpack) the
final values of a codebook entry vector from the entries in
<varname>[codebook_multiplicands]</varname> as follows (<varname>[value_vector]</varname>
-is the output vector representing the vctor of values for entry number
+is the output vector representing the vector of values for entry number
<varname>[lookup_offset]</varname> in this codebook):
<screen>
array in a one-to-one mapping. Calculate (unpack) the
final values of a codebook entry vector from the entries in
<varname>[codebook_multiplicands]</varname> as follows (<varname>[value_vector]</varname>
-is the output vector representing the vctor of values for entry number
+is the output vector representing the vector of values for entry number
<varname>[lookup_offset]</varname> in this codebook):
<screen>
<section id="vorbis-spec-codec">
<sectioninfo>
<releaseinfo>
- $Id: 04-codec.xml,v 1.3 2002/10/18 00:43:00 giles Exp $
+ $Id: 04-codec.xml,v 1.4 2002/10/21 23:37:00 giles Exp $
<emphasis>Last update to this document: October 15, 2002</emphasis>
</releaseinfo>
</sectioninfo>
<orderedlist>
<listitem><simpara><varname>[vorbis_mapping_count]</varname> = read 6 bits as unsigned integer and add one</simpara></listitem>
- <listitem><simpara>For each <varname>[i]</varname>
-of <varname>[vorbis_mapping_count]</varname> mapping numbers:</simpara>
+ <listitem><para>For each <varname>[i]</varname> of <varname>[vorbis_mapping_count]</varname> mapping numbers:
<orderedlist>
<listitem><simpara>read the mapping type: 16 bits as unsigned integer. There's no reason to save the mapping type in Vorbis I.</simpara></listitem>
<listitem><simpara>If the mapping type is nonzero, the stream is undecodable</simpara></listitem>
<listitem><para>If the mapping type is zero:
<orderedlist>
- <listitem><simpara>read 1 bit as a boolean flag</simpara>
+ <listitem><para>read 1 bit as a boolean flag
<orderedlist>
<listitem><simpara>if set, <varname>[vorbis_mapping_submaps]</varname> = read 4 bits as unsigned integer and add one</simpara></listitem>
<listitem><simpara>if unset, <varname>[vorbis_mapping_submaps]</varname> = 1</simpara></listitem>
</orderedlist>
+ </para>
</listitem>
- <listitem><simpara>read 1 bit as a boolean flag</simpara>
+ <listitem><para>read 1 bit as a boolean flag
<orderedlist>
<listitem><para>if set, square polar channel mapping is in use:
<orderedlist>
<listitem><simpara>vector <varname>[vorbis_mapping_angle]</varname> element <varname>[j]</varname>= read <link linkend="vorbis-spec-ilog">ilog</link>(<varname>[audio_channels]</varname> - 1) bits as unsigned integer</simpara></listitem>
<listitem><simpara>the numbers read in the above two steps are channel numbers representing the channel to treat as magnitude and the channel to treat as angle, respectively. If for any coupling step the angle channel number equals the magnitude channel number, the magnitude channel number is greater than <varname>[audio_channels]</varname>-1, or the angle channel is greater than <varname>[audio_channels]</varname>-1, the stream is undecodable.</simpara></listitem>
</orderedlist>
- </para></listitem>
+ </para>
+ </listitem>
</orderedlist>
- </para></listitem>
+ </para>
+ </listitem>
<listitem><simpara>if unset, <varname>[vorbis_mapping_coupling_steps]</varname> = 0</simpara></listitem>
</orderedlist>
+ </para>
</listitem>
<listitem><simpara>read 2 bits (reserved field); if the value is nonzero, the stream is undecodable</simpara></listitem>
<listitem><simpara>if <varname>[vorbis_mapping_submaps]</varname> is greater than one, we read channel multiplex settings. For each <varname>[j]</varname> of <varname>[audio_channels]</varname> channels:</simpara>
</orderedlist></para>
</listitem>
</orderedlist>
- </listitem>
+ </para></listitem>
</orderedlist>
</section>
<para>
Following the three header packets, all packets in a Vorbis I stream
are audio. The first step of audio packet decode is to read and
-verify the packet type; <emphasis>a non-audio packet when audio is expected
+verify the packet type. <emphasis>A non-audio packet when audio is expected
indicates stream corruption or a non-compliant stream. The decoder
must ignore the packet and not attempt decoding it to audio</emphasis>.
</para>
</orderedlist>
<para>
-Vorbis windows all use the slope function y=sin(0.5 * π * sin^2((x+.5)/n * π)),
+Vorbis windows all use the slope function y=sin(0.5 * \pi * sin^2((x+.5)/n * \pi)),
where n is window size and x ranges 0...n-1, but dissimilar
lapping requirements can affect overall shape. Window generation
proceeds as follows:</para>
</listitem>
<listitem><simpara> window from range 0 ... <varname>[left_window_start]</varname>-1 inclusive is zero</simpara></listitem>
<listitem><simpara> for <varname>[i]</varname> in range <varname>[left_window_start]</varname> ...
-<varname>[left_window_end]</varname>-1, window(<varname>[i]</varname>) = sin(.5 * π * sin^2( (<varname>[i]</varname>-<varname>[left_window_start]</varname>+.5) / <varname>[left_n]</varname> * .5 * π) )</simpara></listitem>
+<varname>[left_window_end]</varname>-1, window(<varname>[i]</varname>) = sin(.5 * \pi * sin^2( (<varname>[i]</varname>-<varname>[left_window_start]</varname>+.5) / <varname>[left_n]</varname> * .5 * \pi) )</simpara></listitem>
<listitem><simpara> window from range <varname>[left_window_end]</varname> ... <varname>[right_window_start]</varname>-1
-inclusive is one</simpara></listitem><listitem><simpara> for <varname>[i]</varname> in range <varname>[right_window_start]</varname> ... <varname>[right_window_end]</varname>-1, window(<varname>[i]</varname>) = sin(.5 * π * sin^2( (<varname>[i]</varname>-<varname>[right_window_start]</varname>+.5) / <varname>[right_n]</varname> * .5 * π/2. + .5 * π) )</simpara></listitem>
+inclusive is one</simpara></listitem><listitem><simpara> for <varname>[i]</varname> in range <varname>[right_window_start]</varname> ... <varname>[right_window_end]</varname>-1, window(<varname>[i]</varname>) = sin(.5 * \pi * sin^2( (<varname>[i]</varname>-<varname>[right_window_start]</varname>+.5) / <varname>[right_n]</varname> * .5 * \pi/2. + .5 * \pi) )</simpara></listitem>
<listitem><simpara> window from range <varname>[right_window_start]</varname> ... <varname>[n]</varname>-1 is
zero</simpara></listitem>
</orderedlist>
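<para>
As a rough illustration, the basic slope function quoted above can be
evaluated directly. The sketch below covers only the simple case of a
full window of size n between equal-sized neighbours; it does not
implement the hybrid long/short shapes produced by the steps above, and
the function name <varname>vorbis_slope</varname> is a hypothetical one
chosen for this example.</para>

```python
import math


def vorbis_slope(n):
    """Evaluate y = sin(0.5 * pi * sin^2((x + 0.5) / n * pi)) for x in 0..n-1."""
    return [
        math.sin(0.5 * math.pi * math.sin((x + 0.5) / n * math.pi) ** 2)
        for x in range(n)
    ]
```

<para>
A window computed this way is symmetric, and the squares of two samples
half a window apart sum to one. That power-complementary property is
what lets overlapped windows of equal size reconstruct the signal.</para>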
floor (vector <varname>[vorbis_floor_types]</varname> element
<varname>[floor_number]</varname>) is zero then decode the floor for
channel <varname>[i]</varname> according to the
-<xref linkend="vorbis-spec-floor0-decode"/>floor 0 decode
-algorithm</simpara></listitem>
+<xref linkend="vorbis-spec-floor0-decode"/></simpara></listitem>
<listitem><simpara>if the type of this floor
is one then decode the floor for channel <varname>[i]</varname> according
-to the <xref linkend="vorbis-spec-floor1-decode"/>floor 1 decode
-algorithm</simpara></listitem>
+to the <xref linkend="vorbis-spec-floor1-decode"/></simpara></listitem>
<listitem><simpara>save the needed decoded floor information for channel for later synthesis</simpara></listitem>
<listitem><simpara>if the decoded floor returned 'unused', set vector <varname>[no_residue]</varname> element
<varname>[i]</varname> to true, else set vector <varname>[no_residue]</varname> element <varname>[i]</varname> to
<para>
For each channel, multiply each element of the floor curve by each
element of that channel's residue vector. The result is the dot
-product the floor and residue vectors for each channel; the produced
+product of the floor and residue vectors for each channel; the produced
vectors are the length <varname>[n]</varname>/2 audio spectrum for each
channel.</para>
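<para>
A minimal sketch of this step, assuming the floor curve and the residue
vector for one channel are already available as equal-length Python
lists. The function name <varname>apply_floor</varname> is hypothetical.
The operation shown is an element-by-element product that yields a
vector, which is how the paragraph above describes it.</para>

```python
def apply_floor(floor_curve, residue):
    """Multiply floor and residue element by element for one channel."""
    if len(floor_curve) != len(residue):
        raise ValueError("floor and residue vectors must have equal length")
    return [f * r for f, r in zip(floor_curve, residue)]
```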
Convert the audio spectrum vector of each channel back into time
domain PCM audio via an inverse Modified Discrete Cosine Transform
(MDCT). A detailed description of the MDCT is available in the paper
-<citetitle pubwork="article"><ulink
-url="http://www.iocon.com/resource/docs/ps/eusipco_corrected.ps">The
+<ulink url="http://www.iocon.com/resource/docs/ps/eusipco_corrected.ps"><citetitle pubwork="article">The
use of multirate filter banks for coding of high quality digital
-audio</ulink></citetitle>, by T. Sporer, K. Brandenburg and B. Edler. The window
-function used for the MDCT is the window determined earlier.</para>
+audio</citetitle></ulink>, by T. Sporer, K. Brandenburg and B. Edler. The window
+function used for the MDCT is the function described earlier.</para>
</section>
Windowed MDCT output is overlapped and added with the right hand data
of the previous window such that the 3/4 point of the previous window
is aligned with the 1/4 point of the current window (as illustrated in
-<xref linkend="vorbis-spec-window"/>. The overlapped portion
+<xref linkend="vorbis-spec-window"/>). The overlapped portion
produced from overlapping the previous and current frame data is
finished data to be returned by the decoder. This data spans from the
center of the previous window to the center of the current window. In
the case of same-sized windows, the amount of data to return is
one-half block consisting of and only of the overlapped portions. When
-overlapping a short and long window, much of the returned range is not
+overlapping a short and long window, much of the returned range does not
actually overlap. This does not damage transform orthogonality. Pay
attention however to returning the correct data range; the amount of
data to be returned is:
-<programlisting>window_blocksize(previous_window)/4+window_blocksize(current_window)/4</programlisting>
+<programlisting>
+window_blocksize(previous_window)/4+window_blocksize(current_window)/4
+</programlisting>
from the center (element windowsize/2) of the previous window to the
center (element windowsize/2-1, inclusive) of the current window.</para>
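<para>
The return-size rule can be checked with a short worked example. The
block sizes 256 and 2048 used below are illustrative assumptions, not
values taken from this section, and the function name
<varname>samples_returned</varname> is hypothetical.</para>

```python
def samples_returned(previous_blocksize, current_blocksize):
    """Samples finished by overlapping the previous and current windows."""
    return previous_blocksize // 4 + current_blocksize // 4


# Same-sized windows return one-half block.
same = samples_returned(2048, 2048)   # 1024
# A short window followed by a long window returns 64 + 512 samples.
mixed = samples_returned(256, 2048)   # 576
```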
</listitem>
</varlistentry><varlistentry>
<term>six channels</term><listitem>
- <simpara>the stream is 5,1 surround. channel order: front left, front
+ <simpara>the stream is 5.1 surround. channel order: front left, front
center, front right, rear left, rear right, LFE</simpara>
</listitem>
</varlistentry><varlistentry>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[
+%isogrk3;
<!ENTITY intro SYSTEM "01-introduction.xml">
<!ENTITY bitpacking SYSTEM "02-bitpacking.xml">
<!ENTITY codebook SYSTEM "03-codebook.xml">
<!ENTITY oggvorbis SYSTEM "a1-encapsulation_ogg.xml">
<!ENTITY rtpvorbis SYSTEM "a2-encapsulation_rtp.xml">
<!ENTITY footer SYSTEM "footer.xml">
-<!ENTITY draft-rtp SYSTEM "../draft-moffitt-vorbis-rtp-00.txt">
+<!NOTATION TEXT SYSTEM "text/plain">
+<!ENTITY draft-rtp SYSTEM "../draft-moffitt-vorbis-rtp-00.txt" NDATA TEXT>
]>
<article>