% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
\section{Embedding Vorbis into an Ogg stream} \label{vorbis:over:ogg}

This document describes using Ogg logical and physical transport
streams to encapsulate Vorbis compressed audio packet data into file
form.

The \xref{vorbis:spec:intro} provides an overview of the construction
of Vorbis audio packets.

The \href{oggstream.html}{Ogg
bitstream overview} and \href{framing.html}{Ogg logical
bitstream and framing spec} provide detailed descriptions of Ogg
transport streams. This specification document assumes a working
knowledge of the concepts covered in these named background
documents. Please read them first.

\subsubsection{Restrictions}

The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
streams use Ogg transport streams in degenerate, unmultiplexed
form only. That is:

\begin{itemize}
\item A meta-headerless Ogg file encapsulates the Vorbis I packets.
\item The Ogg stream may be chained, i.e., contain multiple contiguous
  logical streams (links).
\item The Ogg stream must be unmultiplexed (only one stream, a Vorbis
  audio stream, per link).
\end{itemize}
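
The two structural requirements above (chaining is allowed, multiplexing is not) can be checked from page headers alone. The following Python sketch is illustrative rather than normative: it assumes the pages have already been parsed into hypothetical \texttt{(serial, bos, eos)} tuples in physical order, and it accepts a stream only if at most one logical stream is open at any time and every link is closed by an `end of stream' page. The function name is likewise hypothetical.

\begin{verbatim}
def is_unmultiplexed_chain(pages):
    """Return True if the pages form a chain of non-overlapping links.

    `pages` is a list of (serial, bos, eos) tuples in physical order, where
    bos/eos are the 'beginning of stream'/'end of stream' page flags.
    """
    open_serial = None  # serial number of the link currently being read
    for serial, bos, eos in pages:
        if bos:
            if open_serial is not None:
                return False  # a second stream began before the first ended
            open_serial = serial
        elif serial != open_serial:
            return False  # page belongs to a stream other than the open link
        if eos:
            open_serial = None  # link finished; a new link may now begin
    return open_serial is None  # every link must be closed by an EOS page


chained = [(1, True, False), (1, False, True), (2, True, False), (2, False, True)]
muxed = [(1, True, False), (2, True, False), (1, False, True), (2, False, True)]
print(is_unmultiplexed_chain(chained), is_unmultiplexed_chain(muxed))  # True False
\end{verbatim}

A real demuxer would take these values from each page's serial number field and \texttt{header\_type} flags.
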

This does not mean that Vorbis cannot currently be multiplexed with
other media types into a multi-stream Ogg file. At the time this
document was written, Ogg was becoming a popular container for
low-bitrate movies consisting of DivX video and Vorbis audio.
However, a `Vorbis I audio file' is taken to imply Vorbis audio
existing alone within a degenerate Ogg stream. A compliant `Vorbis
audio player' is not required to implement Ogg support beyond the
specific support of Vorbis within a degenerate Ogg stream (naturally,
application authors are encouraged to support full multiplexed Ogg
handling).

\subsubsection{MIME type}

The MIME type of Ogg files depends on the context. Specifically,
complex multimedia content and applications should use
\literal{application/ogg}, visual media should use \literal{video/ogg},
and audio-only content should use \literal{audio/ogg}. Vorbis data
encapsulated in Ogg may appear under any of these types. RTP-encapsulated
Vorbis should use \literal{audio/vorbis} together with
\literal{audio/vorbis-config}.

\subsection{Encapsulation}

Ogg encapsulation of a Vorbis packet stream is straightforward.

\begin{itemize}
\item The first Vorbis packet (the identification header), which
  uniquely identifies a stream as Vorbis audio, is placed alone in the
  first page of the logical Ogg stream. This results in a first Ogg
  page of exactly 58 bytes at the very beginning of the logical stream:
  a 27-byte page header, a one-byte segment table, and the 30-byte
  identification header.
\item This first page is marked `beginning of stream' in the page flags.
\item The second and third Vorbis packets (comment and setup
  headers) may span one or more pages beginning on the second page of
  the logical stream. However many pages they span, the third header
  packet finishes the page on which it ends. The next (first audio)
  packet must begin on a fresh page.
\item The granule position of these first pages containing only headers
  is zero.
\item The first audio packet of the logical stream begins a fresh Ogg page.
\item Packets are placed into Ogg pages in order until the end of stream.
\item The last page is marked `end of stream' in the page flags.
\item Vorbis packets may span page boundaries.
\end{itemize}
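
The Python sketch below builds a first page with the standard \texttt{struct} module so that the 58-byte size can be checked byte by byte. The helper names are hypothetical, the bitrate fields are left at zero, the blocksize exponents 8 and 11 (256 and 2048 samples) are arbitrary example values, and the CRC field is left as zero, so a real muxer would still have to compute the Ogg checksum before writing the page.

\begin{verbatim}
import struct

OGG_CONTINUED = 0x01  # header_type flag: page starts with a continued packet
OGG_BOS = 0x02        # header_type flag: beginning of stream
OGG_EOS = 0x04        # header_type flag: end of stream


def vorbis_id_header(channels, rate, bs0_exp=8, bs1_exp=11):
    """Pack a 30-byte Vorbis identification header (bitrate hints left at 0)."""
    body = struct.pack(
        "<B6sIBIiiiB",
        1, b"vorbis",              # packet type 1 + codec identifier
        0, channels, rate,         # vorbis_version, audio_channels, sample rate
        0, 0, 0,                   # bitrate_maximum, _nominal, _minimum (unset)
        bs0_exp | (bs1_exp << 4),  # blocksize_0 in low nibble, blocksize_1 high
    )
    return body + b"\x01"          # framing flag


def single_packet_page(packet, serial, seqno, granule, flags):
    """Place one short packet alone on an Ogg page (CRC field left as zero)."""
    if len(packet) >= 255:
        raise ValueError("sketch only handles packets needing one lacing value")
    header = struct.pack("<4sBBqIIIB", b"OggS", 0, flags, granule,
                         serial, seqno, 0, 1)  # 27 bytes, page_segments = 1
    return header + bytes([len(packet)]) + packet


first = single_packet_page(vorbis_id_header(2, 44100), serial=0x1234,
                           seqno=0, granule=0, flags=OGG_BOS)
print(len(first))  # 58 = 27-byte header + 1 lacing byte + 30-byte packet
\end{verbatim}
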

The granule position of pages containing Vorbis audio is in units
of PCM audio samples (per channel; a stereo stream's granule position
does not increment at twice the speed of a mono stream).
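
As a worked example, assume a 44.1\,kHz stream: a granule position of 88200 marks two seconds of audio whether the stream is mono or stereo, and only the number of interleaved sample values differs. The helper names in this Python sketch are hypothetical.

\begin{verbatim}
def granule_to_seconds(granule, sample_rate):
    # The granule position counts samples per channel, so the channel count
    # does not enter into the conversion to time.
    return granule / sample_rate


def interleaved_sample_count(granule, channels):
    # Number of individual sample values across all channels up to `granule`.
    return granule * channels


print(granule_to_seconds(88200, 44100))    # 2.0 for mono and stereo alike
print(interleaved_sample_count(88200, 2))  # 176400 values in a stereo stream
\end{verbatim}
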

The granule position of a page represents the end PCM sample
position of the last packet \emph{completed} on that
page. The `last PCM sample' is the last complete sample returned by
decode, not an internal sample awaiting lapping with a
subsequent block. A page that is entirely spanned by a single
packet (one that completes on a subsequent page) has no granule
position, and its granule position field is set to $-1$.
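
Whether a page carries a granule position can be decided from its segment table: in Ogg framing, a lacing value of 255 means the packet continues in the next segment, while any smaller value (including 0) ends a packet. The Python sketch below uses hypothetical helper names and assumes the encoder supplies the end position of the last packet completed on the page.

\begin{verbatim}
def page_completes_packet(lacing_values):
    """A packet ends on a page iff some lacing value in its segment table is < 255."""
    return any(v < 255 for v in lacing_values)


def required_granule(lacing_values, end_sample_of_last_completed):
    """Granule position a muxer should write for a page (sketch).

    `end_sample_of_last_completed` is the end PCM position of the last packet
    that completes on this page; it is ignored when no packet completes here.
    """
    if page_completes_packet(lacing_values):
        return end_sample_of_last_completed
    return -1  # page lies entirely inside one packet: no granule position


print(required_granule([255] * 255, 4096))     # -1
print(required_granule([255, 255, 10], 4096))  # 4096
\end{verbatim}
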

Note that the last decoded (fully lapped) PCM sample from a packet
is not necessarily the middle sample of that block. For example, if the
current Vorbis packet encodes a ``long block'' and the next Vorbis
packet encodes a ``short block'', the last decodable sample from the
current packet will be at position
$3 \cdot \mathit{long\_block\_length}/4 - \mathit{short\_block\_length}/4$,
measured from the start of the current block.
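
The formula generalizes to any pair of block sizes, assuming the standard Vorbis window geometry: the overlap between the current and the next block is centered at three quarters of the current window and is half as wide as the smaller of the two blocks. The Python sketch below (function name hypothetical) evaluates it; a 2048-sample long block followed by a 256-sample short block gives 1472, while two long blocks give 1024, exactly the middle.

\begin{verbatim}
def last_finished_sample(cur_blocksize, next_blocksize):
    """Offset, from the start of the current block's window, at which the
    current packet's fully lapped output ends.

    The overlap with the next block is centered at 3/4 of the current window
    and spans half the smaller of the two block sizes.
    """
    return 3 * cur_blocksize // 4 - min(cur_blocksize, next_blocksize) // 4


print(last_finished_sample(2048, 256))   # 1472: long block followed by short
print(last_finished_sample(2048, 2048))  # 1024: the middle of the block
\end{verbatim}
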

The granule (PCM) position of the first audio page need not indicate
that the stream started at position zero. The granule position
belongs to the last packet completed on the page, and a valid granule
position must be non-negative; however, subtracting the number of
samples returned by the packets completed on that page yields the PCM
position of the beginning of audio, which by this inference may be
positive or negative.

A positive starting value simply indicates that this stream begins at
some positive time offset, potentially within a larger
program. This is a common case when connecting to the middle
of a broadcast stream.

A negative value indicates that
output samples preceding time zero should be discarded during
decoding; this technique is used to allow sample-granularity
editing of the stream start time of already-encoded Vorbis
streams. The number of samples to be discarded must not exceed
the overlap-add span of the first two audio packets.
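
The inference works backwards from the first audio page: subtracting from its granule position the number of samples that the packets completed on that page would return gives the starting PCM position. The Python sketch below assumes the decoder already knows each packet's block size (in a real decoder this follows from the mode number at the start of each audio packet together with the setup header); the function name is hypothetical. A positive result is a start offset, and a negative result is the number of leading samples to discard.

\begin{verbatim}
def inferred_start_position(first_audio_granule, blocksizes):
    """PCM position of the first returned sample, inferred from the first audio page.

    `blocksizes` lists the window sizes of the audio packets completed on that
    page, in order. The first audio packet returns no samples; each later packet
    returns previous_blocksize/4 + current_blocksize/4 samples.
    """
    returned = sum(prev // 4 + cur // 4
                   for prev, cur in zip(blocksizes, blocksizes[1:]))
    return first_audio_granule - returned


start = inferred_start_position(200, [256, 256, 256])
print(start)  # -56: discard the first 56 decoded samples
\end{verbatim}
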

In both cases, where the initial audio PCM starting offset is
nonzero, the second finished audio packet must flush the page on
which it appears, and the third packet must begin a fresh page.
This ensures that the decoder can always perform PCM position
adjustments before it needs to return any PCM data from synthesis,
resulting in correct positioning information without any additional
seeking logic.

Failure to do so should, at worst, cause a
decoder implementation to return incorrect positioning information
for seeking operations at the very beginning of the stream.
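
One way to check the page-flush rule is sketched below in Python. It reads `the second finished audio packet' as the second audio packet of the logical stream, takes as input a hypothetical per-page summary (the number of audio packets completed on each audio page, and whether that page's `continued packet' flag is set), and is not taken from any reference implementation.

\begin{verbatim}
def second_packet_ends_page(pages):
    """Check the page-flush rule for streams with a nonzero starting offset.

    `pages` lists the audio pages in order as (packets_completed, continued)
    tuples, where `continued` is the Ogg header_type 0x01 flag, i.e. the page
    begins with the tail of a packet started on the previous page.
    """
    completed = 0
    for i, (count, _) in enumerate(pages):
        completed += count
        if completed >= 2:
            # The second packet must be the last one completed on this page...
            if completed != 2:
                return False
            # ...and no third packet may have started on this page.
            return i + 1 == len(pages) or not pages[i + 1][1]
    return False  # fewer than two audio packets in the stream


print(second_packet_ends_page([(1, False), (1, False), (6, False)]))  # True
print(second_packet_ends_page([(3, False)]))                          # False
\end{verbatim}
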

A granule position on the final page of a stream that indicates
less audio data than the final packet would normally return is used to
end the stream on other than an even frame boundary. The difference
between the actual data available from decoding and the declared amount
indicates how many trailing samples to discard from the decoding
process.
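
The trailing-sample count is a subtraction. The Python sketch below assumes the decoder tracks the last granule position other than $-1$ seen before the final page and knows how many samples each packet completed on the final page would normally return; the function name is hypothetical.

\begin{verbatim}
def trailing_samples_to_discard(prev_granule, returned_per_packet,
                                final_granule):
    """Samples to drop from the end of decoding (sketch).

    `prev_granule` is the last granule position other than -1 seen before the
    final page; `returned_per_packet` holds the number of samples each packet
    completed on the final page would normally return.
    """
    decoded_end = prev_granule + sum(returned_per_packet)
    # A final granule at or beyond the decoded end means nothing is trimmed.
    return max(0, decoded_end - final_granule)


print(trailing_samples_to_discard(10000, [512, 512], 10700))  # 324
\end{verbatim}
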