+This document describes some things to know about the Ogg format, as well
+as implementation details in GStreamer.
+
+INTRODUCTION
+============
+
+ogg and the granulepos
+----------------------
+
+An ogg stream contains pages with a serial number and a granulepos.
+The granulepos is a 64 bit signed integer. It is a value that in some way
+represents a time since the start of the stream.
+The interpretation as such is however both codec-specific and
+stream-specific.
+
+ogg has no notion of time: it only knows about bytes and granulepos values
+on pages.
+
+The granule position is just a number; the only guarantee for a valid ogg
+stream is that within a logical stream, this number never decreases.
+
+While logically a granulepos value can be constructed for every ogg packet,
+the page is marked with only one granulepos value: the granulepos of the
+last packet to end on that page.
+
+theora and the granulepos
+-------------------------
+
+The granulepos in theora is an encoding of the frame number of the last
+key frame ("i frame"), and the number of frames since the last key frame
+("p frame"). The granulepos is constructed as the sum of the first number,
+shifted to the left for granuleshift bits, and the second number:
+granulepos = pframe << granuleshift + iframe
+
+(This means that given a framenumber or a timestamp, one cannot generate
+ the one and only granulepos for that page; several granulepos possibilities
+ correspond to this frame number. You also need the last keyframe, as well
+ as the granuleshift.
+ However, given a granulepos, the theora codec can still map that to a
+ unique timestamp and frame number for that theora stream)
+
+ Note: currently theora stores the "presentation time" as the granulepos;
+ ie. a first data page with one packet contains one video frame and
+ will be marked with 0/0. Changing that to be 1/0 (so that it
+ represents the number of decodable frames up to that point, like
+ for Vorbis) is being discussed.
+
+vorbis and granulepos
+---------------------
+
+In Vorbis, the granulepos represents the number of samples that can be
+decoded from all packets up to that point.
+
+In GStreamer, the vorbisenc elements produces a stream where:
+- OFFSET is the byte offset this buffer is at; ie a running count of the
+ number of bytes produced before
+- OFFSET_END is the granulepos of the produced vorbis buffer
+- TIMESTAMP is the timestamp matching the begin of the buffer
+- DURATION is set such that TIMESTAMP + DURATION is the correct
+in a raw vorbis stream we use the granulepos as the offset field.
+
+Ogg media mapping
+-----------------
+
+Ogg defines a mapping for each media type that it embeds.
+
+For Vorbis:
+
+ - 3 header pages, with granulepos 0.
+ - 1 page with 1 packet header identification
+ - N pages with 2 packets comments and codebooks
+ - granulepos is samplenumber of next page
+ - one packet can contain a variable number of samples but one frame
+ that should be handed to the vorbis decoder.
+
+For Theora
+
+ - 3 header pages, with granulepos 0.
+ - 1 page with 1 packet header identification
+ - N pages with 2 packets comments and codebooks
+ - granulepos is framenumber of last packet in page, where framenumber
+ is a combination of keyframe number and p frames since keyframe.
+ - one packet contains 1 frame
+
+
+
+
+DEMUXING
+========
+
ogg demuxer
-----------
1) the streaming mode.
- In this mode, the ogg demuxer receives buffers in the _chain() function which
- are then simply submited to the ogg sync layer. Pages are then processed when the
- sync layer detects them, pads are created for new chains and packets are sent to
- the peer elements of the pads.
+In this mode, the ogg demuxer receives buffers in the _chain() function which
+are then simply submited to the ogg sync layer. Pages are then processed when
+the sync layer detects them, pads are created for new chains and packets are
+sent to the peer elements of the pads.
- In this mode, no seeking is possible. This is the typical case when the stream is
- read from a network source.
+In this mode, no seeking is possible. This is the typical case when the
+stream is read from a network source.
- In this mode, no setup is done at startup, the pages are just read and decoded.
- A new logical chain is detected when one of the pages has the BOS flag set. At this
- point the existing pads are removed and new pads are created for all the logical
- streams in this new chain.
+In this mode, no setup is done at startup, the pages are just read and decoded.
+A new logical chain is detected when one of the pages has the BOS flag set. At
+this point the existing pads are removed and new pads are created for all the
+logical streams in this new chain.
2) the random access mode.
- In this mode, the ogg file is first scanned to detect the position and length of
- all chains. This scanning is performed using a recursive binary search algorithm
- that is explained below.
+ In this mode, the ogg file is first scanned to detect the position and length
+of all chains. This scanning is performed using a recursive binary search
+algorithm that is explained below.
find_chains(start, end)
{
| 111 | 222 |
BOS EOS
+What can an ogg demuxer do?
+---------------------------
+An ogg demuxer can read pages and get the granulepos from them.
+It can ask the decoder elements to convert a granulepos to time.
-ogg and the granulepos
-----------------------
-
-an ogg streams contains pages with a serial number and a granule pos. The granulepos
-is a number that is codec specific and denotes the 'position' of the last sample in
-the last packet in that page.
-
-ogg has therefore no notion about time, it only knows about bytes and granule positions.
-
-The granule position is just a number, it can contain gaps or can just be any random
-number.
-
-
-theora and the granulepos
--------------------------
-
-the granulepos in theora consists of the framenumber of the last keyframe shifted some
-amount of bits plus the number of p/b-frames.
-
-This means that given a framenumber or a timestamp one cannot generate the granulepos
-for that frame. eg frame 10 could have several valid granulepos values depending on if
-the last keyframe was on frame 5 or 0. Given a granulepos we can, however, create a
-unique correct timestamp and a framenumber.
+An ogg demuxer can also get the granulepos of the first and the last page of a
+stream to get the start and end timestamp of that stream.
+It can also get the length in bytes of the stream
+(when the peer is seekable, that is).
-in a raw theroa stream we use the granulepos as the offset field.
+An ogg demuxer is therefore basically able to seek to any byte position and
+timestamp.
-The granulepos of an ogg page is the framenumber of the last frame in the page.
-
-
-vorbis and granulepos
----------------------
-
-the granulepos in vorbis happens to be the same as the sample counter. conversion to and
-from granulepos is therefore easy.
-
-in a raw vorbis stream we use the granulepos as the offset field.
-
-The granulepos of an ogg page is the sample number of the next page in the ogg stream.
-
-
-What can ogg do?
-----------------
-
-An ogg demuxer can read pages and get the granuleposition from it. It can ask the decoder
-elements to convert a granulepos to time.
-
-An ogg demuxer can also get the granulepos of the first and the last page of a stream to
-get the start and end timestamp of that stream. It can also get the length in bytes of
-the stream (when the peer is seekable, that is).
-
-An ogg demuxer is therefore basically able to seek to any byte position and timestamp.
-When asked to seek to a given granulepos, the ogg demuxer should always convert the
-value to a timestamp using the peer decoder element conversion function. It can then
-binary search the file to eventually end up on the page with the given granule pos or
-a granulepos with the same timestamp.
+When asked to seek to a given granulepos, the ogg demuxer should always convert
+the value to a timestamp using the peer decoder element conversion function. It
+can then binary search the file to eventually end up on the page with the given
+granule pos or a granulepos with the same timestamp.
Seeking in ogg currently
------------------------
-When seeking in an ogg, the decoders can choose to forward the seek event as a
+When seeking in an ogg, the decoders can choose to forward the seek event as a
granulepos or a timestamp to the ogg demuxer.
In the case of a granulepos, the ogg demuxer will seek back to the beginning of
the stream and skip pages until it finds one with the requested timestamp.
In the case of a timestamp, the ogg demuxer also seeks back to the beginning of
-the stream. For each page it reads, it asks the decoder element to convert the
-granulepos back to a timestamp. The ogg demuxer keeps on skipping pages until the
-page has a timestamp bigger or equal to the requested one.
+the stream. For each page it reads, it asks the decoder element to convert the
+granulepos back to a timestamp. The ogg demuxer keeps on skipping pages until
+the page has a timestamp bigger or equal to the requested one.
-It is therefore important that the decoder elements in vorbis can convert a granulepos
-into a timestamp or never seek on timestamp on the oggdemuxer.
+It is therefore important that the decoder elements in vorbis can convert a
+granulepos into a timestamp or never seek on timestamp on the oggdemuxer.
-The default format on the oggdemuxer source pads is currently defined as a the
-granulepos of the packets, it is also the value of the OFFSET field in the GstBuffer.
+The default format on the oggdemuxer source pads is currently defined as a the
+granulepos of the packets, it is also the value of the OFFSET field in the
+GstBuffer.
+MUXING
+======
Oggmux
------
+The ogg muxer's job is to output complete Ogg pages such that the absolute
+time represented by the valid (ie, not -1) granulepos values on those pages
+never decreases. This has to be true for all logical streams in the group at
+the same time.
+
+To achieve this, encoders are required to pass along the exact time that the
+granulepos represents for each ogg packet that it pushes to the ogg muxer.
+This is ESSENTIAL: without this exact time representation of the granulepos,
+the muxer can not produce valid streams.
+
+The ogg muxer has a packet queue per sink pad. From this queue a page can
+be flushed when:
+ - total byte size of queued packets exceeds a given value
+ - total time duration of queued packets exceeds a given value
+ - total byte size of queued packets exceeds maximum Ogg page size
+ - eos of the pad
+ - encoder sent a command to flush out an ogg page after this new packet
+ (in 0.8, through a flush event; in 0.10, with a GstOggBuffer)
+ - muxer wants a flush to happen (so it can output pages)
+
+The ogg muxer also has a page queue per sink pad. This queue collects
+Ogg pages from the corresponding packet queue. Each page is also marked
+with the timestamp that the granulepos in the header represents.
+
+A page can be flushed from this collection of page queues when:
+- ideally, every page queue has at least one page with a valid granulepos
+ -> choose the page, from all queues, with the lowest timestamp value
+- if not, muxer can wait if the following limits aren't reached:
+ - total byte size of any page queue exceeds a limit
+ - total time duration of any page queue exceeds a limit
+- if this limit is reached, then:
+ - request a page flush from packet queue to page queue for each queue
+ that does not have pages
+ - now take the page from all queues with the lowest timestamp value
+ - make sure all later-coming data is marked as old, either to be still
+ output (but producing an invalid stream, though it can be fixed later)
+ or dropped (which means it's gone forever)
+
The oggmuxer uses the offset fields to fill in the granulepos in the pages.
+GStreamer implementation details
+--------------------------------
+As said before, the basic rule is that the ogg muxer needs an exact time
+representation for each granulepos. This needs to be provided by the encoder.
+
+Potential problems are:
+ - initial offsets for a raw stream need to be preserved somehow. Example:
+ if the first audio sample has time 0.5, the granulepos in the vorbis encoder
+ needs to be adjusted to take this into account.
+ - initial offsets may need be on rate boundaries. Example:
+ if the framerate is 5 fps, and the first video frame has time 0.1 s, the
+ granulepos cannot correctly represent this timestamp.
+ This can be handled out-of-band (initial offset in another muxing format,
+ skeleton track with initial offsets, ...)
+
+Given that the basic rule for muxing is that the muxer needs an exact timestamp
+matching the granulepos, we need some way of communicating this time value
+from encoders to the Ogg muxer. So we need a mechanism to communicate
+a granulepos and its time representation for each GstBuffer.
+
+(This is an instance of a more generic problem - having a way to attach
+ more fields to a GstBuffer)
+
+Possible ways:
+- setting TIMESTAMP to this value: bad - this value represents the end time
+ of the buffer, and thus conflicts with GStreamer's idea of what TIMESTAMP
+ is. This would cause problems muxing the encoded stream in other muxing
+ formats, or for streaming. Note that this is what was done in GStreamer 0.8
+- setting DURATION to GP_TIME - TIMESTAMP: bad - this breaks the concept of
+ duration for this frame. Take the video example above; each buffer would
+ have a correct timestamp, but always a 0.1 s duration as opposed to the
+ correct 0.2 s duration
+- subclassing GstBuffer: clean, but requires a common header used between
+ ogg muxer and all encoders that can be muxed into ogg. Also, what if
+ a format can be muxed into more than one container, and they each have
+ their own "extra" info to communicate ?
+- adding key/value pairs to GstBuffer: clean, but requires changes to
+ core. Also, the overhead of allocating e.g. a GstStructure for *each* buffer
+ may be expensive.
+- "cheating":
+ - abuse OFFSET to store the timestamp matching this granulepos
+ - abuse OFFSET_END to store the granulepos value
+ The drawback here is that before, it made sense to use OFFSET and OFFSET_END
+ to store a byte count. Given that this is not used for anything critical
+ (you can't store a raw theora or vorbis stream in a file anyway),
+ this is what's being done for now.
+
+In practice
+-----------
+- all encoders of formats that can be muxed into Ogg produce a stream where:
+ - OFFSET is abused to be the granulepos of the encoded theora buffer
+ - OFFSET_END is abused to be the timestamp corresponding exactly to the
+ granulepos
+ - TIMESTAMP is the timestamp matching the begin of the buffer
+ - DURATION is the length in time of the buffer
+
+- initial delays should be handled in the GStreamer encoders by mangling
+ the granulepos of the encoded packet to take the delay into account as
+ best as possible and store that in OFFSET;
+ this then brings TIMESTAMP + DURATION to within less
+ than a frame period of the granulepos's time representation
+ The ogg muxer will then create new ogg packets with this OFFSET as
+ the granulepos. So in effect, the granulepos produced by the encoders
+ does not get used directly.
TODO
----
-
-- use the OFFSET field in the GstBuffer to store/read the granulepos as
- opposed to the OFFSET_END field.
-
-
-Ogg media mapping
------------------
-
-Ogg defines a mapping for each media type that it embeds.
-
-For Vorbis:
-
- - 3 header pages, with granulepos 0.
- - 1 page with 1 packet header identification
- - N pages with 2 packets comments and codebooks
- - granulepos is samplenumber of next page
- - one packet can contain a variable number of samples but one frame
- that should be handed to the vorbis decoder.
-
-For Theora
-
- - 3 header pages, with granulepos 0.
- - 1 page with 1 packet header identification
- - N pages with 2 packets comments and codebooks
- - granulepos is framenumber of last packet in page, where framenumber
- is a combination of keyframe number and p frames since keyframe.
- - one packet contains 1 frame
-
-
-
+- decide on a proper mechanism for communicating extra per-buffer fields