--- /dev/null
+<?xml version='1.0'?>
+<!DOCTYPE rfc SYSTEM 'rfc2629.dtd'>
+
+<rfc ipr="full2026" docName="draft-kerr-avt-vorbis-rtp-04">
+
+<front>
+<title>RTP Payload Format for Vorbis Encoded Audio</title>
+
+<author initials="P" surname="Kerr" fullname="Phil Kerr">
+<organization>Xiph.Org</organization>
+<address>
+<email>phil@plus24.com</email>
+<uri>http://www.xiph.org/</uri>
+</address>
+</author>
+
+<date day="01" month="December" year="2004" />
+
+<area>General</area>
+<workgroup>AVT Working Group</workgroup>
+<keyword>I-D</keyword>
+
+<keyword>Internet-Draft</keyword>
+<keyword>Vorbis</keyword>
+<keyword>RTP</keyword>
+<abstract>
+<t>This document describes an RTP payload format for transporting
+ Vorbis encoded audio. It details the RTP encapsulation mechanism
+ for raw Vorbis data and details the delivery mechanisms for the
+ decoder probability model, referred to as a codebook, metadata
+ and other setup information.</t>
+</abstract>
+
+<note title="Xiph.Org Note">
+<t>
+ This is a working copy of the Vorbis I-D, used for discussion on the xiph-rtp discussion list. It will be subject
+ to frequent revisions before it is submitted to the IETF.
+</t>
+</note>
+
+<note title="Editor's Note">
+<t>
+ All references to RFC XXXX are to be replaced by references to the RFC number of this memo, when published.
+</t>
+</note>
+</front>
+
+<middle>
+
+<section anchor="Introduction" title="Introduction">
+<t>
+ Vorbis is a general purpose perceptual audio codec intended to allow
+ maximum encoder flexibility, thus allowing it to scale competitively
+ over an exceptionally wide range of bitrates. At the high
+ quality/bitrate end of the scale (CD or DAT rate stereo,
+ 16/24 bits), it is in the same league as MPEG-2 and MPC. Similarly,
+ the 1.0 encoder can encode high-quality CD and DAT rate stereo at
+ below 48k bits/sec without resampling to a lower rate. Vorbis is
+ also intended for lower and higher sample rates (from 8kHz
+ telephony to 192kHz digital masters) and a range of channel
+ representations (monaural, polyphonic, stereo, quadraphonic, 5.1,
+ ambisonic, or up to 255 discrete channels).
+
+ Vorbis encoded audio is generally encapsulated within an Ogg format
+ bitstream <xref target="rfc3533"></xref>, which provides framing and synchronization. For the
+ purposes of RTP transport, this layer is unnecessary, and so raw
+ Vorbis packets are used in the payload.
+</t>
+<section anchor="Terminology" title="Terminology">
+<t>
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in RFC 2119 <xref target="rfc2119"></xref>.
+</t>
+
+</section>
+</section>
+
+<section anchor="Payload Format" title="Payload Format">
+<t>
+ For RTP based transportation of Vorbis encoded audio, the standard
+ RTP header is followed by an 8 bit payload header, then the payload
+ data. The payload header signals whether the payload carries
+ fragmented Vorbis data and gives the number of whole Vorbis
+ packets it contains. The payload data contains the raw Vorbis
+ bitstream information.
+</t>
+<section anchor="RTP Header" title="RTP Header">
+
+<artwork><![CDATA[
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                           timestamp                           |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |           synchronization source (SSRC) identifier            |
+   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+   |            contributing source (CSRC) identifiers             |
+   |                              ...                              |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+]]></artwork>
+<t>
+ The RTP header begins with an octet of fields (V, P, X, and CC) to
+ support specialized RTP uses (see <xref target="avt-rtp-new-11"></xref> and <xref target="avt-profile-new-12"></xref> for details). For
+ Vorbis RTP, the following values are used.
+<t>
+</t>
+ Version (V): 2 bits</t><t>
+ This field identifies the version of RTP. The version
+ used by this specification is two (2).
+<t>
+</t>
+ Padding (P): 1 bit</t><t>
+ Padding MAY be used with this payload format according to
+ section 5.1 of <xref target="avt-rtp-new-11"></xref>.
+<t>
+</t>
+ Extension (X): 1 bit</t><t>
+ Always set to 0, as this payload format does not use RTP header
+ extensions.
+<t>
+</t>
+ CSRC count (CC): 4 bits</t><t>
+ The CSRC count is used in accordance with <xref target="rfc1889"></xref>.
+<t>
+</t>
+ Marker (M): 1 bit</t><t>
+ Set to zero. Audio silence suppression is not used. This conforms
+ to section 4.1 of <xref target="vorbis-spec-ref"></xref>.
+<t>
+</t>
+ Payload Type (PT): 7 bits</t><t>
+ An RTP profile for a class of applications is expected to assign
+ a payload type for this format, or a dynamically allocated
+ payload type SHOULD be chosen which designates the payload as
+ Vorbis.
+<t>
+</t>
+ Sequence number: 16 bits</t><t>
+ The sequence number increments by one for each RTP data packet
+ sent, and may be used by the receiver to detect packet loss and
+ to restore packet sequence. This field is detailed further in
+ <xref target="rfc1889"></xref>.
+<t>
+</t>
+ Timestamp: 32 bits</t><t>
+ A timestamp representing the sampling time of the first sample of
+ the first Vorbis packet in the RTP packet. The clock frequency
+ MUST be set to the sample rate of the encoded audio data and is
+ conveyed out-of-band as an SDP attribute.
+<t>
+</t>
+ SSRC/CSRC identifiers: </t><t>
+ These two fields, 32 bits each with one SSRC field and a maximum
+ of 16 CSRC fields, are as defined in <xref target="rfc1889"></xref>.
+</t>
+</section>
+
+<section anchor="Payload Header" title="Payload Header">
+<t>
+ After the RTP Header section the next octet is the Payload Header.
+ This octet is split into a number of bitfields detailing the format
+ of the following Payload Data packets.
+</t>
+<artwork><![CDATA[
+     0   1   2   3   4   5   6   7
+   +---+---+---+---+---+---+---+---+
+   | C | F | R |   # of packets    |
+   +---+---+---+---+---+---+---+---+
+]]></artwork>
+<t>
+ Continuation (C): 1 bit</t><t>
+ Set to one if this is a continuation of a fragmented packet.
+</t>
+<t>
+ Fragmented (F): 1 bit</t><t>
+ Set to one if the payload contains complete packets or if it
+ contains the last fragment of a fragmented packet.
+</t>
+<t>
+ Reserved (R): 1 bit</t><t>
+ Reserved, MUST be set to zero by senders, and ignored by
+ receivers.
+</t>
+<t>
+ The last 5 bits are the number of complete packets in this payload.
+ As a 5 bit field holds values from 0 to 31, this provides for a
+ maximum of 31 Vorbis packets in the payload. If C is set to one,
+ this number MUST be 0.
+</t>
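+<t>
+ As a rough illustration of the bit layout above, the following
+ Python sketch packs and parses the payload header octet. It is not
+ part of this specification: the names PayloadHeader,
+ build_payload_header and parse_payload_header are hypothetical, and
+ the sketch assumes that bit 0 of the diagram is the most significant
+ bit of the octet.
+</t>
+<artwork><![CDATA[
```python
from typing import NamedTuple


class PayloadHeader(NamedTuple):
    """Decoded fields of the one-octet Vorbis RTP payload header."""
    continuation: bool  # C bit
    fragmented: bool    # F bit
    count: int          # number of complete Vorbis packets (low 5 bits)


def build_payload_header(continuation: bool, fragmented: bool,
                         count: int) -> int:
    """Pack C, F and the packet count into one octet; R is sent as zero."""
    if not 0 <= count <= 0x1F:
        raise ValueError("packet count must fit in 5 bits")
    if continuation and count != 0:
        raise ValueError("packet count must be 0 when C is set")
    return (continuation << 7) | (fragmented << 6) | count


def parse_payload_header(octet: int) -> PayloadHeader:
    """Split the octet into its fields, ignoring the reserved R bit."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("payload header is a single octet")
    header = PayloadHeader(bool(octet & 0x80), bool(octet & 0x40),
                           octet & 0x1F)
    if header.continuation and header.count != 0:
        raise ValueError("packet count must be 0 when C is set")
    return header
```
+]]></artwork>
+<t>
+ Under these assumptions, a payload carrying two complete packets
+ with F set and C clear has the header octet 0x42.
+</t>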
+</section>
+
+<section anchor="Payload Data" title="Payload Data">
+<t>
+ Vorbis packets are unbounded in length currently. At some future
+ point there will likely be a practical limit placed on packet
+ length.
+</t>
+<t>
+ Typical Vorbis packet sizes are from very small (2-3 bytes) to
+ quite large (8-12 kilobytes). The reference implementation <xref target="libvorbis"></xref>
+ typically produces packets less than ~800 bytes, except for the
+ header packets which are ~4-12 kilobytes.
+</t>
+<t>
+ Within an RTP context the maximum Vorbis packet size SHOULD be kept
+ below the MTU size, typically 1500 octets, including the RTP and
+ payload headers, to avoid fragmentation. For the delivery of Vorbis
+ audio using RTP the maximum size of the header block is limited to
+ 64K.
+</t>
+<t>
+ If the payload contains a single Vorbis packet or a Vorbis packet
+ fragment, the Vorbis packet data follows the payload header.
+</t>
+<t>
+ For payloads which consist of multiple Vorbis packets, payload data
+ consists of one octet representing the packet length followed by
+ the packet data for each of the Vorbis packets in the payload.
+</t>
+<t>
+ The Vorbis packet length field is the length of the Vorbis data
+ block minus one octet.
+</t>
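+<t>
+ The length-prefixed layout for payloads holding several Vorbis
+ packets might be sketched in Python as follows. This is an
+ illustration only: pack_payload_data and unpack_payload_data are
+ hypothetical names, and the sketch assumes each packet is between 1
+ and 256 octets long so that its length minus one fits in the single
+ length octet.
+</t>
+<artwork><![CDATA[
```python
from typing import List, Sequence


def pack_payload_data(packets: Sequence[bytes]) -> bytes:
    """Concatenate packets, each preceded by one octet holding length - 1."""
    out = bytearray()
    for packet in packets:
        if not 1 <= len(packet) <= 256:
            raise ValueError("packet length must be 1..256 octets")
        out.append(len(packet) - 1)
        out += packet
    return bytes(out)


def unpack_payload_data(data: bytes, count: int) -> List[bytes]:
    """Recover `count` packets from length-prefixed payload data."""
    packets = []
    pos = 0
    for _ in range(count):
        if pos >= len(data):
            raise ValueError("payload data is truncated")
        length = data[pos] + 1  # the field stores the length minus one
        pos += 1
        chunk = data[pos:pos + length]
        if len(chunk) != length:
            raise ValueError("payload data is truncated")
        packets.append(bytes(chunk))
        pos += length
    return packets
```
+]]></artwork>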
+<t>
+ The payload packing of the Vorbis data packets SHOULD follow the
+ guidelines set out in section 4.4 of <xref target="avt-profile-new-12"></xref>, where the oldest packet
+ occurs immediately after the RTP packet header.
+</t>
+<t>
+ Channel mapping of the audio is in accordance with ITU-R
+ BS.775-1.
+</t>
+</section>
+
+
+<section anchor="Example RTP Packet" title="Example RTP Packet">
+<t>
+ Here is an example RTP packet containing two Vorbis packets.
+
+ RTP Packet Header:
+
+<artwork><![CDATA[
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | 2 |0|0|   0   |0|     PT      |       sequence number         |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |               timestamp (in sample rate units)                |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |           synchronization source (SSRC) identifier            |
+   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+   |            contributing source (CSRC) identifiers             |
+   |                              ...                              |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+]]></artwork>
+
+ Payload Data:
+
+<artwork><![CDATA[
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |0|1|0| # pks: 2|      len      |        vorbis data ...        |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                       ...vorbis data...                       |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |      ...      |      len      |  next vorbis packet data...   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+]]></artwork>
+
+</t>
+</section>
+
+
+</section>
+
+<section anchor="Frame Packetizing" title="Frame Packetizing">
+<t>
+ Each RTP packet contains either one complete Vorbis packet, one
+ Vorbis packet fragment, or an integer number of complete Vorbis
+ packets (up to a maximum of 31 packets, since the number of packets
+ is carried in a 5 bit field).
+</t>
+<t>
+ Any Vorbis packet that is larger than 256 octets and less than the
+ path-MTU MUST be placed in an RTP packet by itself.
+</t>
+<t>
+ Any Vorbis packet that is 256 octets or less SHOULD be bundled in the
+ RTP packet with as many Vorbis packets as will fit, up to a maximum
+ of 31.
+</t>
+<t>
+ If a Vorbis packet will not fit within the network MTU, it SHOULD be
+ fragmented. A fragmented packet has a zero in the last five bits
+ of the payload header. Each fragment after the first will also set
+ the Continuation (C) bit to one in the payload header. The RTP packet
+ containing the last fragment of the Vorbis packet will have the
+ Fragmented (F) bit set to one. To maintain the correct sequence
+ for fragmented packet reception, the timestamp field of every
+ fragment MUST be the same as that of the first fragment sent, with
+ the sequence number incremented as normal for the subsequent RTP
+ packets. Path MTU is detailed in <xref target="rfc1063"></xref> and <xref target="rfc1981"></xref>.
+</t>
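+<t>
+ The fragmentation rules above could be sketched in Python as
+ follows. The function name fragment_packet and the max_chunk
+ parameter (the room left for Vorbis data once the RTP and payload
+ headers are accounted for) are hypothetical, and the sketch assumes
+ the same MSB-first bit layout for the payload header octet. It
+ returns only the header octet, sequence number, timestamp and data
+ for each fragment, leaving all other framing aside.
+</t>
+<artwork><![CDATA[
```python
from typing import List, Tuple

C_BIT = 0x80  # Continuation: set on every fragment after the first
F_BIT = 0x40  # Fragmented: set on the last fragment


def fragment_packet(packet: bytes, max_chunk: int, first_seq: int,
                    timestamp: int) -> List[Tuple[int, int, int, bytes]]:
    """Split one Vorbis packet into (header, seq, timestamp, data) tuples."""
    if max_chunk < 1:
        raise ValueError("max_chunk must be at least 1")
    chunks = [packet[i:i + max_chunk]
              for i in range(0, len(packet), max_chunk)]
    if len(chunks) < 2:
        raise ValueError("packet fits in one RTP packet; do not fragment")
    fragments = []
    for index, chunk in enumerate(chunks):
        header = 0  # packet count field stays 0 for every fragment
        if index > 0:
            header |= C_BIT
        if index == len(chunks) - 1:
            header |= F_BIT
        seq = (first_seq + index) & 0xFFFF  # 16 bit sequence number wraps
        fragments.append((header, seq, timestamp, chunk))
    return fragments
```
+]]></artwork>
+<t>
+ The timestamp is copied unchanged into every tuple, while the
+ sequence number advances by one per fragment.
+</t>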
+
+<section anchor="Example Fragmented Vorbis Packet" title="Example Fragmented Vorbis Packet">
+<t>
+ Here is an example fragmented Vorbis packet split over three RTP
+ packets.
+</t>
+<artwork><![CDATA[
+
+ Packet 1:
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |V=2|P|X|  CC   |M|     PT      |             1000              |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                             xxxxx                             |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |           synchronization source (SSRC) identifier            |
+   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+   |            contributing source (CSRC) identifiers             |
+   |                              ...                              |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |0|0|0|        0|      len      |        vorbis data ..         |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                        ..vorbis data..                        |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+]]></artwork>
+
+<t>
+ In this packet the initial sequence number is 1000 and the
+ timestamp is xxxxx. The number of packets field is set to 0.
+</t>
+
+<artwork><![CDATA[
+
+ Packet 2:
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |V=2|P|X|  CC   |M|     PT      |             1001              |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                             xxxxx                             |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |           synchronization source (SSRC) identifier            |
+   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+   |            contributing source (CSRC) identifiers             |
+   |                              ...                              |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |1|0|0|        0|      len      |        vorbis data ...        |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                        ..vorbis data..                        |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+]]></artwork>
+
+<t>
+ The C bit is set to 1 and the number of packets field is set to 0.
+ For large Vorbis packets there can be several payload packets of
+ this type. The maximum packet size SHOULD be no greater
+ than the path MTU, including all RTP and payload headers. The
+ sequence number has been incremented by one but the timestamp field
+ remains the same as the initial packet.
+</t>
+
+<artwork><![CDATA[
+
+ Packet 3:
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |V=2|P|X|  CC   |M|     PT      |             1002              |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                             xxxxx                             |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |           synchronization source (SSRC) identifier            |
+   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+   |            contributing source (CSRC) identifiers             |
+   |                              ...                              |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |1|1|0|        0|      len      |        vorbis data ..         |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                        ..vorbis data..                        |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+]]></artwork>
+
+<t>
+ This is the last Vorbis fragment packet. The C and F bits are
+ set and the packet count remains set to 0. As in the previous
+ packets the timestamp remains set to the first packet in the
+ sequence and the sequence number has been incremented.
+</t>
+</section>
+
+<section anchor="Packet Loss" title="Packet Loss">
+<t>
+ As there is no error correction within the Vorbis stream, packet
+ loss will result in a loss of signal. Packet loss is more of an
+ issue for fragmented Vorbis packets, as the client has to
+ cope with the handling of the C and F flags. Taking the
+ fragmented Vorbis packet example above, if the first packet is
+ lost the client SHOULD detect that the next packet has the packet
+ count field set to 0 and the C bit set, and MUST drop it. The
+ next packet, which is the final fragment, SHOULD be dropped
+ in the same manner, or buffered. Feedback reports on lost and
+ dropped packets MUST be sent back via RTCP.
+</t>
+</section>
+</section>
+
+
+<section anchor="Configuration Headers" title="Configuration Headers">
+<t>
+ To decode a Vorbis stream three configuration header blocks are
+ needed. The first header indicates the sample rate and bitrates, the
+ number of channels and the version of the Vorbis encoder used.
+ The second header contains the decoder's probability model, or
+ codebooks, and the third header details stream metadata.
+</t>
+<t>
+ As the RTP stream may change certain configuration data mid-session,
+ there are two different methods for delivering this configuration
+ data to a client: RTCP, which is detailed below, and SDP, which is
+ detailed in section 5. SDP delivery is used to set up an initial
+ state for the client application and RTCP is used to change state
+ during the session. The changes may be due to different metadata
+ or codebooks as well as different bitrates of the stream.
+</t>
+<t>
+ Unlike other mainstream audio codecs, Vorbis has no statically
+ configured probability model. Instead it packs all entropy decoding
+ configuration, VQ and Huffman models into a self-contained codebook.
+ This codebook block also requires additional identification
+ information detailing the number of audio channels, bit rates and
+ other information used to initialise the Vorbis stream.
+</t>
+
+
+<section anchor="RTCP Based Header Transmission" title="RTCP Based Header Transmission">
+<t>
+ The three header data blocks are sent out-of-band as an APP defined
+ RTCP message with the 4 octet name field set to VORB.
+</t>
+<t>
+ Synchronizing the configuration headers to the RTP stream is
+ critical. A 32 bit timestamp field is used to indicate the
+ timepoint when a VORB header MUST be applied to the RTP stream.
+ VORB RTCP packets SHOULD be sent just ahead of the change in the
+ RTP stream. As the loss of a VORB RTCP packet will mean that
+ the RTP stream fails to decode properly, the frequency of their
+ periodic retransmission SHOULD be high enough to minimize the
+ stream disturbance whilst remaining under the RTCP bandwidth
+ allocation.
+</t>
+
+
+<artwork><![CDATA[
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |V=2|P| subtype |   PT=APP=204  |             Length            |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                           SSRC/CSRC                           |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                             VORB                              |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |               Timestamp (in sample rate units)                |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                        Vorbis Version                         |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                       Audio Sample Rate                       |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                        Bitrate Maximum                        |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                        Bitrate Nominal                        |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                        Bitrate Minimum                        |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | bsz 0 | bsz 1 |      Num Audio Channels       |c|m|o|x|x|x|x|x|
+   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+   |        Codebook length        |       Codebook checksum       |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   ..                           Codebook                           |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   ..                          URI string                          |
+   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+   |                     Vendor string length                      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                        Vendor string                         ..
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                   User comments list length                   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   ..              User comment length / User comment              |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+]]></artwork>
+
+<t>
+ The first Vorbis config header defines the Vorbis stream
+ attributes. The Vorbis version MUST be set to zero to comply with
+ this document. The fields Sample Rate, Bitrate Maximum/Nominal/
+ Minimum and Num Audio Channels are set in accordance with <xref target="vorbis-spec-ref"></xref> with
+ the bsz fields above referring to the blocksize parameters. The
+ framing bit is not used for RTP transportation and so applications
+ constructing Vorbis files MUST take care to set this if required.
+</t>
+<t>
+ The next 8 bits are used to indicate the presence of the two
+ other Vorbis stream config headers and the size overflow header.
+</t>
+<t>
+ The c flag indicates the presence of a codebook header block, the
+ m flag indicates the presence of a comment metadata block. The o
+ flag indicates if the size of either of the c and m headers would
+ make the VORB packet greater than that allowed for a RTCP message.
+</t>
+<t>
+ The remaining five bits, indicated with an x, are reserved/unused
+ and MUST be set to 0 for this version of the document.
+</t>
+<t>
+ If the c flag is set then the next header block will contain the
+ codebook configuration data.
+</t>
+<t>
+ The configuration information detailed above MUST be completely
+ intact, as a client cannot decode a stream with an incomplete
+ or corrupted codebook set.
+</t>
+<t>
+ A 16 bit codebook length field and a 16 bit 1's complement checksum
+ of the codebook precedes the codebook datablock. The length field
+ allows for codebooks to be up to 64K in size. The checksum is used
+ to detect a corrupted codebook.
+</t>
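+<t>
+ This document does not spell out how the 16 bit 1's complement
+ checksum is computed. The Python sketch below assumes the
+ conventional Internet checksum procedure: the data is summed as
+ big-endian 16 bit words with end-around carry, an odd trailing
+ octet is padded with zero, and the result is complemented. Both
+ that choice and the name ones_complement_checksum16 are
+ assumptions made for illustration.
+</t>
+<artwork><![CDATA[
```python
def ones_complement_checksum16(data: bytes) -> int:
    """Return the 16 bit one's complement of the one's complement sum."""
    if len(data) % 2:
        data += b"\x00"  # assumed: pad an odd trailing octet with zero
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16 bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def codebook_is_intact(codebook: bytes, checksum: int) -> bool:
    """Compare a received checksum with one computed over the codebook."""
    return ones_complement_checksum16(codebook) == checksum
```
+]]></artwork>
+<t>
+ A receiver following this sketch would recompute the checksum over
+ the received codebook and treat any mismatch as corruption.
+</t>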
+<t>
+ If a checksum failure is detected then a new config header file
+ SHOULD be obtained from SDP, if the codebook has not changed since
+ the session has started. If no SDP value is set and no other method
+ for obtaining the config headers exists then this is considered to
+ be a failure and SHOULD be reported to the client application.
+</t>
+<t>
+ If the m flag is set then the next header block will contain the
+ comment metadata, such as artist name, track title and so on. These
+ metadata messages are not intended to be fully descriptive but to
+ offer basic track/song information. This message MUST be sent at
+ the start of the stream, together with the setup and codebook
+ headers, even if it contains no information. During a session the
+ metadata associated with the stream may change from that specified
+ at the start, e.g. a live concert broadcast changing acts/scenes, so
+ clients MUST have the ability to receive m header blocks. Details
+ on the format of the comments can be found in the Vorbis
+ documentation <xref target="v-comment"></xref>.
+</t>
+<t>
+ The format for the data takes the form of a 32 bit codec vendor's
+ name length field followed by the name encoded in UTF-8. The next
+ field denotes the number of user comments and then the user comments
+ length and text field pairs, up to the number indicated by the user
+ comment list length.
+</t>
+<t>
+ If the o, overflow, bit is set then the URI of a whole header block
+ is specified in an overflow URI field, which is a null-terminated
+ UTF-8 string. The header file specified at the URI MUST NOT have
+ the overflow flag set, otherwise a loop condition will occur.
+</t>
+</section>
+
+<section anchor="Codebook Caching" title="Codebook Caching">
+<t>
+ Codebook caching allows clients that have previously connected to a
+ stream to re-use the codebooks and thus begin the playback of the
+ session faster. When a client receives a codebook it may store
+ it, together with the MD5 key, locally and can compare the MD5 key
+ of locally cached codebooks with the key it receives via SDP, which
+ is detailed in the next section.
+</t>
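+<t>
+ A client-side cache along these lines might look like the Python
+ sketch below. It is illustrative only: the class name CodebookCache
+ is hypothetical, and the sketch assumes that the MD5 key is the
+ hexadecimal MD5 digest of the raw codebook octets, which this
+ document does not define.
+</t>
+<artwork><![CDATA[
```python
import hashlib
from typing import Dict, Optional


class CodebookCache:
    """In-memory map from MD5 key to previously received codebook data."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def add(self, codebook: bytes) -> str:
        """Store a codebook and return the key it is filed under."""
        key = hashlib.md5(codebook).hexdigest()  # assumed key derivation
        self._store[key] = codebook
        return key

    def lookup(self, md5key: str) -> Optional[bytes]:
        """Return the cached codebook for a key from SDP, or None."""
        return self._store.get(md5key.lower())
```
+]]></artwork>
+<t>
+ On a cache miss the client would fall back to fetching the
+ configuration headers as described in the next section.
+</t>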
+</section>
+</section>
+
+<section anchor="Session Description for Vorbis RTP Streams" title="Session Description for Vorbis RTP Streams">
+<t>
+ Session description information concerning the Vorbis stream
+ SHOULD be provided if possible and MUST be in accordance with <xref target="rfc2327"></xref>.
+ The SDP information is split into two sections, a mandatory
+ section detailing the RTP stream and an optional section used to
+ convey information needed for codebook caching.
+</t>
+<t>
+ Below is an outline of the mandatory SDP attributes.
+</t>
+<t>
+ c=IN IP4/6
+ m=audio RTP/AVP 98
+ a=rtpmap:98 vorbis/
+ a=fmtp:98 header=
+ a=fmtp:98 md5key=
+</t>
+<t>
+ The port value is specified by the server application bound to
+ the address specified in the c attribute. The bitrate value
+ specified in the a attribute MUST match the Vorbis sample rate
+ value.
+</t>
+<t>
+ The Vorbis codebook specified in the header attribute MUST contain
+ all of the configuration data. If the codebook MD5 attribute,
+ md5key, is set, the key is compared to a locally held cache. If
+ it is found, the associated local codebook is used; if not, the
+ client MUST use the configuration headers specified with the
+ header attribute.
+</t>
+
+<section anchor="SDP Based Config Header Transmission" title="SDP Based Config Header Transmission">
+<t>
+ The optional SDP attributes are used to convey details of the
+ Vorbis stream which are required for codebook caching. If the
+ following attributes are set they take precedence over values
+ specified in the u attribute detailed above. The maximum size
+ of the mandatory and optional SDP attributes MUST be less than
+ 1K in size to conform to section 4.1 of <xref target="rfc2327"></xref>.
+</t>
+<t>
+ a=fmtp:98 bitrate_min=
+ a=fmtp:98 bitrate_norm=
+ a=fmtp:98 bitrate_max=
+ a=fmtp:98 bsz0=
+ a=fmtp:98 bsz1=
+ a=fmtp:98 channels=
+ a=fmtp:98 meta_vendor=
+</t>
+<t>
+ The metadata attribute, meta_vendor, provides the bare minimum
+ information required for decoding but does not convey any
+ meaningful stream metadata information. As outlined in the Vorbis
+ comment field and header specification documentation, <xref target="v-comment"></xref>, a number
+ of predefined field names are available which SHOULD be used. An
+ example would be:
+</t>
+<t>
+ a=fmtp:98 meta_vendor=Xiph.Org libVorbis I 20020717
+ a=fmtp:98 meta_artist=Honest Bob and the Factory-to-Dealer-Incentives
+ a=fmtp:98 meta_title=I'm Still Around
+ a=fmtp:98 meta_tracknumber=5
+</t>
+</section>
+</section>
+
+<section anchor="IANA Considerations" title="IANA Considerations">
+<t>
+ MIME media type name: audio
+</t>
+<t>
+ MIME subtype: vorbis
+</t>
+<t>
+ Required Parameters:</t><t>
+ header indicates the URI of the decoding codebook.
+ md5key indicates the MD5 key of the codebooks.
+</t>
+<t>
+ Optional Parameters: </t><t>
+ bitrate_min, bitrate_norm and bitrate_max indicate the
+ minimum, nominal and maximum bitrates. bsz0 and bsz1
+ indicate the blocksize values. channels indicates the
+ number of audio channels in the stream. meta_vendor
+ indicates the encoding codec vendor.
+</t>
+<t>
+ Encoding considerations:</t><t>
+ This type is only defined for transfer via RTP as specified
+ in RFC XXXX.
+</t>
+<t>
+ Security Considerations:</t><t>
+ See Section 6 of RFC 3047.
+</t>
+<t>
+ Interoperability considerations: none
+</t>
+<t>
+
+ Published specification:
+ See the Vorbis documentation <xref target="vorbis-spec-ref"></xref> for details.
+</t>
+<t>
+ Applications which use this media type:
+ Audio streaming and conferencing tools
+</t>
+<t>
+ Additional information: none
+</t>
+<t>
+ Person &amp; email address to contact for further information:</t><t>
+ Phil Kerr
+ phil@plus24.com
+</t>
+<t>
+ Intended usage: COMMON
+</t>
+<t>
+ Author/Change controller:</t><t>
+ Author: Phil Kerr
+ Change controller: IETF AVT Working Group
+</t>
+</section>
+
+<section anchor="Congestion Control" title="Congestion Control">
+<t>
+ Vorbis clients SHOULD send regular receiver reports detailing
+ congestion. A mechanism for dynamically downgrading the stream,
+ known as bitrate peeling, will allow for a graceful backing off
+ of the stream bitrate. This feature is not available at present
+ so an alternative would be to redirect the client to a lower
+ bitrate stream if one is available.
+</t>
+</section>
+
+<section anchor="Security Considerations" title="Security Considerations">
+<t>
+ RTP packets using this payload format are subject to the security
+ considerations discussed in the RTP specification <xref target="rfc1889"></xref>. This implies
+ that the confidentiality of the media stream is achieved by using
+ encryption. Because the data compression used with this payload
+ format is applied end-to-end, encryption may be performed on the
+ compressed data. Where the size of a data block is set, care MUST
+ be taken to prevent buffer overflows in the client applications.
+</t>
+</section>
+
+<section anchor="Acknowledgments" title="Acknowledgments">
+<t>
+ This document is a continuation of draft-moffitt-vorbis-rtp-00.txt.
+ The MIME type section is a continuation of
+ draft-short-avt-rtp-vorbis-mime-00.txt.
+</t>
+<t>
+ Thanks to the AVT and Ogg Vorbis communities and Xiph.Org, including
+ Steve Casner, Ramon Garcia, Pascal Hennequin, Ralph Jiles,
+ Tor-Einar Jarnbjo, Colin Law, John Lazzaro, Jack Moffitt,
+ Colin Perkins, Barry Short, Mike Smith, Magnus Westerlund.
+</t>
+</section>
+
+</middle>
+
+<back>
+<references>
+
+<reference anchor="rfc3533">
+<front>
+<title>The Ogg Encapsulation Format Version 0</title>
+<author initials="S." surname="Pfeiffer" fullname="Silvia Pfeiffer"></author>
+</front>
+<seriesInfo name="RFC" value="3533" />
+</reference>
+
+<reference anchor="rfc2119">
+<front>
+<title>Key words for use in RFCs to Indicate Requirement Levels </title>
+<author initials="S." surname="Bradner" fullname="Scott Bradner"></author>
+</front>
+<seriesInfo name="RFC" value="2119" />
+</reference>
+
+<reference anchor="rfc1889">
+<front>
+<title>RTP: A Transport Protocol for Real-Time Applications</title>
+<author initials="H." surname="Schulzrinne, et al." fullname="Henning Schulzrinne, et al."></author>
+</front>
+<seriesInfo name="RFC" value="1889" />
+</reference>
+
+<reference anchor="avt-rtp-new-11">
+<front>
+<title>RTP: A transport protocol for real-time applications. Work in progress, draft-ietf-avt-rtp-new-11.txt</title>
+</front>
+</reference>
+
+<reference anchor="avt-profile-new-12">
+<front>
+<title>RTP Profile for Audio and Video Conferences with Minimal Control. Work in progress, draft-ietf-avt-profile-new-12.txt</title>
+</front>
+</reference>
+
+<reference anchor="vorbis-spec-ref">
+<front>
+<title>Ogg Vorbis I spec: Codec setup and packet decode. http://www.xiph.org/ogg/vorbis/doc/vorbis-spec-ref.html</title>
+</front>
+</reference>
+
+<reference anchor="v-comment">
+<front>
+<title>Ogg Vorbis I spec: Comment field and header specification. http://www.xiph.org/ogg/vorbis/doc/v-comment.html</title>
+</front>
+</reference>
+
+<reference anchor="rfc2327">
+<front>
+<title>SDP: Session Description Protocol</title>
+<author initials="M." surname="Handley" fullname="Mark Handley"></author>
+<author initials="V." surname="Jacobson" fullname="Van Jacobson"></author>
+</front>
+<seriesInfo name="RFC" value="2327" />
+</reference>
+
+<reference anchor="rfc1063">
+<front>
+<title>Path MTU Discovery</title>
+<author initials="J." surname="Mogul et al." fullname="J. Mogul et al."></author>
+</front>
+<seriesInfo name="RFC" value="1063" />
+</reference>
+
+<reference anchor="rfc1981">
+<front>
+<title>Path MTU Discovery for IP version 6</title>
+<author initials="J." surname="McCann et al." fullname="J. McCann et al."></author>
+</front>
+<seriesInfo name="RFC" value="1981" />
+</reference>
+
+<reference anchor="libvorbis">
+<front>
+<title>libvorbis: Available from the Xiph website, http://www.xiph.org</title>
+</front>
+</reference>
+
+</references>
+</back>
+</rfc>