3 Network Working Group Phil Kerr
4 Internet-Draft Ogg Vorbis Community
5 October 27, 2003 OpenDrama
Expires: April 27, 2004
9 RTP Payload Format for Vorbis Encoded Audio
11 <draft-kerr-avt-vorbis-rtp-03.txt>
15 This document is an Internet-Draft and is in full conformance
16 with all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."
29 The list of current Internet-Drafts can be accessed at
30 http://www.ietf.org/ietf/1id-abstracts.txt
32 The list of Internet-Draft Shadow Directories can be accessed at
33 http://www.ietf.org/shadow.html.
37 Copyright (C) The Internet Society (2003). All Rights Reserved.
This document describes an RTP payload format for transporting
Vorbis encoded audio. It details the RTP encapsulation mechanism
for raw Vorbis data and the delivery mechanisms for the decoder
probability model (referred to as a codebook), metadata, and other
setup information.
47 [Note to RFC Editor: All references to RFC XXXX are to be replaced
48 by references to the RFC number of this memo, when published.]
Kerr Expires April 27, 2004 [Page 1]
59 Internet Draft draft-kerr-avt-vorbis-rtp-03.txt October 27, 2003
1. Introduction
1.1 Terminology
2. Payload Format
2.1 RTP Header
2.2 Payload Header
2.3 Payload Data
2.4 Example RTP Packet
3. Frame Packetizing
3.1 Example Fragmented Vorbis Packet
3.2 Packet Loss
4. Configuration Headers
4.1 RTCP Based Config Header Transmission
4.2 Codebook Caching
5. Session Description
5.1 SDP Based Config Header Transmission
6. IANA Considerations
7. Congestion Control
8. Security Considerations
9. Acknowledgements
10. Normative References
10.1 Informative References
11. Full Copyright Statement
11.1 IPR Statement
12. Authors Address
1 Introduction
92 Vorbis is a general purpose perceptual audio codec intended to allow
93 maximum encoder flexibility, thus allowing it to scale competitively
94 over an exceptionally wide range of bitrates. At the high
95 quality/bitrate end of the scale (CD or DAT rate stereo,
96 16/24 bits), it is in the same league as MPEG-2 and MPC. Similarly,
97 the 1.0 encoder can encode high-quality CD and DAT rate stereo at
98 below 48k bits/sec without resampling to a lower rate. Vorbis is
99 also intended for lower and higher sample rates (from 8kHz
100 telephony to 192kHz digital masters) and a range of channel
101 representations (monaural, polyphonic, stereo, quadraphonic, 5.1,
102 ambisonic, or up to 255 discrete channels).
104 Vorbis encoded audio is generally encapsulated within an Ogg format
105 bitstream [1], which provides framing and synchronization. For the
106 purposes of RTP transport, this layer is unnecessary, and so raw
107 Vorbis packets are used in the payload.
1.1 Terminology
111 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
112 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
113 document are to be interpreted as described in RFC 2119 [2].
2 Payload Format
For RTP based transportation of Vorbis encoded audio, the standard
RTP header is followed by an 8 bit payload header, then the payload
data. The payload header signals whether the payload contains
fragmented Vorbis data and/or the number of whole Vorbis data
frames it carries. The payload data contains the raw Vorbis
bitstream information.
2.1 RTP Header
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
147 The RTP header begins with an octet of fields (V, P, X, and CC) to
148 support specialized RTP uses (see [4] and [5] for details). For
149 Vorbis RTP, the following values are used.
Version (V): 2 bits
This field identifies the version of RTP. The version
used by this specification is two (2).
Padding (P): 1 bit
Padding MAY be used with this payload format according to [3].
Extension (X): 1 bit
Always set to 0, as audio silence suppression is not used by
Vorbis.
163 CSRC count (CC): 4 bits
164 The CSRC count is used in accordance with [3].
Marker (M): 1 bit
Set to zero. Audio silence suppression is not used. This conforms
to section 4.1 of [5].
Payload Type (PT): 7 bits
An RTP profile for a class of applications is expected to assign
a payload type for this format, or a dynamically allocated
payload type SHOULD be chosen which designates the payload as
Vorbis.
186 Sequence number: 16 bits
187 The sequence number increments by one for each RTP data packet
188 sent, and may be used by the receiver to detect packet loss and
189 to restore packet sequence. This field is detailed further in
Timestamp: 32 bits
A timestamp representing the sampling time of the first sample of
the first Vorbis packet in the RTP packet. The clock frequency
MUST be set to the sample rate of the encoded audio data and is
conveyed out-of-band as an SDP attribute.
198 SSRC/CSRC identifiers:
199 These two fields, 32 bits each with one SSRC field and a maximum
200 of 16 CSRC fields, are as defined in [3].
2.2 Payload Header
205 After the RTP Header section the next octet is the Payload Header.
206 This octet is split into a number of bitfields detailing the format
207 of the following Payload Data packets.
      0   1   2   3   4   5   6   7
    +---+---+---+---+---+---+---+---+
    | C | F | R |   # of packets    |
    +---+---+---+---+---+---+---+---+
214 Continuation (C): 1 bit
215 Set to one if this is a continuation of a fragmented packet.
217 Fragmented (F): 1 bit
218 Set to one if the payload contains complete packets or if it
219 contains the last fragment of a fragmented packet.
Reserved (R): 1 bit
Reserved, MUST be set to zero by senders and ignored by
receivers.
The last 5 bits are the number of complete packets in this payload.
A 5 bit field holds values from 0 to 31, so a payload can carry at
most 31 complete Vorbis packets. If C is set to one, this number
MUST be 0.
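The payload header layout can be illustrated with a minimal Python
sketch. It assumes that C is the most significant bit of the octet
(network bit order, as drawn in the diagram). The function names are
hypothetical and not part of this specification.

```python
def pack_payload_header(continuation: bool, fragmented: bool, num_packets: int) -> int:
    """Return the payload header octet: C | F | R | 5 bit packet count."""
    if not 0 <= num_packets <= 0x1F:
        raise ValueError("packet count must fit in 5 bits (0-31)")
    if continuation and num_packets != 0:
        raise ValueError("packet count MUST be 0 when C is set")
    # The reserved bit (0x20) is always left at zero by senders.
    return (int(continuation) << 7) | (int(fragmented) << 6) | num_packets


def parse_payload_header(octet: int) -> tuple[bool, bool, int]:
    """Split the octet into (C, F, packet count); the reserved bit is ignored."""
    return bool(octet & 0x80), bool(octet & 0x40), octet & 0x1F
```

For example, a payload carrying two complete Vorbis packets has C=0,
F=1 and a count of 2, which packs to the octet 0x42.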
2.3 Payload Data
Vorbis packets are currently unbounded in length. At some future
point there will likely be a practical limit placed on packet
size.
241 quite large (8-12 kilobytes). The reference implementation [11]
242 typically produces packets less than ~800 bytes, except for the
243 header packets which are ~4-12 kilobytes.
Within an RTP context the maximum Vorbis packet size SHOULD be kept
below the MTU size, typically 1500 octets, including the RTP and
payload headers, to avoid fragmentation. For the delivery of Vorbis
audio using RTP the maximum size of the header block is limited to
64K.
250 If the payload contains a single Vorbis packet or a Vorbis packet
251 fragment, the Vorbis packet data follows the payload header.
253 For payloads which consist of multiple Vorbis packets, payload data
254 consists of one octet representing the packet length followed by
255 the packet data for each of the Vorbis packets in the payload.
257 The Vorbis packet length field is the length of the Vorbis data
258 block minus one octet.
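As an illustration of the length encoding, the following Python
sketch packs and unpacks a payload holding several complete Vorbis
packets. It assumes the header bit order drawn in section 2.2 and
that the F bit is set for complete packets. Because each length octet
stores the packet length minus one, a packet can be 1 to 256 octets
long. The function names are hypothetical.

```python
def build_multi_packet_payload(packets: list[bytes]) -> bytes:
    """Payload header octet followed by (length - 1, data) for each packet."""
    if not 2 <= len(packets) <= 31:
        raise ValueError("expected between 2 and 31 complete packets")
    out = bytearray([0x40 | len(packets)])  # C=0, F=1, R=0, packet count
    for packet in packets:
        if not 1 <= len(packet) <= 256:
            raise ValueError("bundled packets must be 1 to 256 octets long")
        out.append(len(packet) - 1)  # length field is the data length minus one
        out += packet
    return bytes(out)


def split_multi_packet_payload(payload: bytes) -> list[bytes]:
    """Inverse of build_multi_packet_payload."""
    count = payload[0] & 0x1F
    pos = 1
    packets = []
    for _ in range(count):
        if pos >= len(payload):
            raise ValueError("payload truncated before length octet")
        length = payload[pos] + 1
        pos += 1
        chunk = payload[pos:pos + length]
        if len(chunk) != length:
            raise ValueError("payload truncated inside packet data")
        packets.append(chunk)
        pos += length
    return packets
```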
260 The payload packing of the Vorbis data packets SHOULD follow the
261 guidelines set-out in section 4.4 of [5] where the oldest packet
262 occurs immediately after the RTP packet header.
264 Channel mapping of the audio is in accordance with BS. 775-1
268 2.4 Example RTP Packet
270 Here is an example RTP packet containing two Vorbis packets.
275 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
276 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
277 | 2 |0|0| 0 |0| PT | sequence number |
278 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
279 | timestamp (in sample rate units) |
280 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
281 | synchronisation source (SSRC) identifier |
282 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
283 | contributing source (CSRC) identifiers |
285 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
295 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
296 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
297 |0|1|0| # pks: 2| len | vorbis data ... |
298 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
299 | ...vorbis data... |
300 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
301 | ... | len | next vorbis packet data... |
302 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3 Frame Packetizing
Each RTP packet contains either one complete Vorbis packet, one
Vorbis packet fragment, or an integer number of complete Vorbis
packets (up to a maximum of 31 packets, since the number of packets
is carried in a 5 bit field).
Any Vorbis packet that is larger than 256 octets and smaller than
the path MTU MUST be placed in an RTP packet by itself.
Any Vorbis packet that is 256 octets or less SHOULD be bundled in
the RTP packet with as many Vorbis packets as will fit, up to the
maximum allowed by the packet count field.
If a Vorbis packet will not fit within the network MTU, it SHOULD be
fragmented. A fragmented packet has a zero in the last five bits
of the payload header. Each fragment after the first also sets
the Continuation (C) bit to one in the payload header. The RTP packet
containing the last fragment of the Vorbis packet has the
Fragmented (F) bit set to one. To maintain the correct sequence
for fragmented packet reception, the timestamp field of every
fragment MUST be the same as that of the first fragment sent, with
the sequence number incremented as normal for the subsequent RTP
packets. Path MTU is detailed in [9] and [10].
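The fragmentation rule can be sketched in Python as follows. This is
an illustration under stated assumptions, not a reference
implementation: max_fragment is a hypothetical parameter giving the
space left for Vorbis data once the RTP and payload headers are
accounted for, and the fragment data is placed directly after the
payload header octet, as described in section 2.3. The RTP layer,
which is not shown, would give every fragment the same timestamp and
consecutive sequence numbers.

```python
def fragment_vorbis_packet(packet: bytes, max_fragment: int) -> list[bytes]:
    """Split one oversized Vorbis packet into RTP payloads (header octet + data)."""
    if max_fragment < 1:
        raise ValueError("max_fragment must be positive")
    chunks = [packet[i:i + max_fragment] for i in range(0, len(packet), max_fragment)]
    if len(chunks) < 2:
        raise ValueError("packet fits in a single payload; no fragmentation needed")
    payloads = []
    for index, chunk in enumerate(chunks):
        continuation = index > 0             # C: set on every fragment after the first
        last = index == len(chunks) - 1      # F: set on the final fragment only
        header = (continuation << 7) | (last << 6)  # packet count stays 0
        payloads.append(bytes([header]) + chunk)
    return payloads
```

A 2500 octet packet split with max_fragment set to 1000 yields three
payloads whose header octets are 0x00, 0x80 and 0xC0.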
333 3.1 Example Fragmented Vorbis Packet
Here is an example fragmented Vorbis packet split over three RTP
packets.
353 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
354 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
355 |V=2|P|X| CC |M| PT | 1000 |
356 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
358 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
359 | synchronization source (SSRC) identifier |
360 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
361 | contributing source (CSRC) identifiers |
363 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
364 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
365 |0|0|0| 0| len | vorbis data .. |
366 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
368 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
370 In this packet the initial sequence number is 1000 and the
371 timestamp is xxxxx. The number of packets field is set to 0.
376 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
377 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
378 |V=2|P|X| CC |M| PT | 1001 |
379 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
381 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
382 | synchronization source (SSRC) identifier |
383 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
384 | contributing source (CSRC) identifiers |
386 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
387 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
388 |1|0|0| 0| len | vorbis data ... |
389 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
391 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The C bit is set to 1 and the number of packets field is set to 0.
For large Vorbis packets there can be several payload packets of
this type. The maximum packet size SHOULD be no greater than the
path MTU, including all RTP and payload headers. The sequence
number has been incremented by one but the timestamp field remains
the same as in the initial packet.
412 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
413 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
414 |V=2|P|X| CC |M| PT | 1002 |
415 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
417 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
418 | synchronization source (SSRC) identifier |
419 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
420 | contributing source (CSRC) identifiers |
422 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
423 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
424 |1|1|0| 0| len | vorbis data .. |
425 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
427 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
429 This is the last Vorbis fragment packet. The C and F bits are
430 set and the packet count remains set to 0. As in the previous
431 packets the timestamp remains set to the first packet in the
432 sequence and the sequence number has been incremented.
3.2 Packet Loss
As there is no error correction within the Vorbis stream, packet
loss will result in a loss of signal. Packet loss is more of an
issue for fragmented Vorbis packets, as the client has to handle
the C and F flags. Taking the fragmented Vorbis packet example
above, if the first packet is lost the client SHOULD detect that
the next packet has the packet count field set to 0 and the C bit
set, and MUST drop it. The next packet, which is the final
fragment, SHOULD be dropped in the same manner, or buffered.
Feedback reports on lost and dropped packets MUST be sent back via
RTCP.
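One possible way for a receiver to apply this rule is sketched below
in Python. The class is hypothetical and simplified: it handles only
fragment payloads, drops orphaned fragments rather than buffering
them, and uses the RTP sequence number to detect a missing fragment.
RTCP reporting is out of scope for the sketch.

```python
from typing import Optional


class FragmentReassembler:
    """Rebuilds fragmented Vorbis packets; orphaned fragments are dropped."""

    def __init__(self) -> None:
        self._parts: Optional[list] = None  # fragment data collected so far
        self._next_seq = 0                  # sequence number expected next

    def push(self, seq: int, payload: bytes) -> Optional[bytes]:
        """Feed one RTP payload; return a complete Vorbis packet or None."""
        if not payload:
            return None
        header, data = payload[0], payload[1:]
        continuation = bool(header & 0x80)
        last = bool(header & 0x40)
        if header & 0x1F:
            # Complete packets are handled elsewhere; abandon any partial packet.
            self._parts = None
            return None
        if not continuation:
            # First fragment of a new packet.
            self._parts = [data]
            self._next_seq = (seq + 1) & 0xFFFF
            return None
        if self._parts is None or seq != self._next_seq:
            # The start or an earlier fragment was lost: drop this one too.
            self._parts = None
            return None
        self._parts.append(data)
        self._next_seq = (seq + 1) & 0xFFFF
        if last:
            packet = b"".join(self._parts)
            self._parts = None
            return packet
        return None
```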
449 4 Configuration Headers
To decode a Vorbis stream, three configuration header blocks are
needed. The first header indicates the sample rate and bitrates,
the number of channels, and the version of the Vorbis encoder used.
The second header contains the decoder's probability model, or
codebooks, and the third header details stream metadata.
467 As the RTP stream may change certain configuration data mid-session
468 there are two different methods for delivering this configuration
469 data to a client, RTCP which is detailed below and SDP which is
470 detailed in section 5. SDP delivery is used to set-up an initial
471 state for the client application and RTCP is used to change state
472 during the session. The changes may be due to different metadata
473 or codebooks as well as different bitrates of the stream.
Unlike other mainstream audio codecs, Vorbis has no statically
configured probability model. Instead, it packs all entropy decoding
configuration, VQ and Huffman models into a self-contained codebook.
This codebook block also requires additional identification
information detailing the number of audio channels, bitrates and
other information used to initialise the Vorbis stream.
483 4.1 RTCP Based Header Transmission
485 The three header data blocks are sent out-of-band as an APP defined
486 RTCP message with the 4 octet name field set to VORB.
Synchronizing the configuration headers to the RTP stream is
critical. A 32 bit timestamp field is used to indicate the
timepoint when a VORB header MUST be applied to the RTP stream.
VORB RTCP packets SHOULD be sent just ahead of the change in the
RTP stream. As the loss of a VORB RTCP packet means the RTP
stream will fail to decode properly, the frequency of periodic
retransmission SHOULD be high enough to minimize the stream
disturbance whilst remaining under the RTCP bandwidth limit.
499 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
500 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
501 |V=2|P| subtype | PT=APP=204 | Length |
502 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
504 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
506 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
507 | Timestamp (in sample rate units) |
508 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
510 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
511 | Audio Sample Rate |
512 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
514 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
516 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
524 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
525 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
527 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
528 | bsz 0 | bsz 1 | Num Audio Channels |c|m|o|x|x|x|x|x|
529 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
530 | Codebook length | Codebook checksum |
531 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
533 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
535 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
536 | Vendor string length |
537 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
539 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
540 | User comments list length |
541 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
542 .. User comment length / User comment |
543 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
546 The first Vorbis config header defines the Vorbis stream
547 attributes. The Vorbis version MUST be set to zero to comply with
548 this document. The fields Sample Rate, Bitrate Maximum/Nominal/
549 Minimum and Num Audio Channels are set in accordance with [6] with
550 the bsz fields above referring to the blocksize parameters. The
551 framing bit is not used for RTP transportation and so applications
552 constructing Vorbis files MUST take care to set this if required.
554 The next 8 bits are used to indicate the presence of the two
555 other Vorbis stream config headers and the size overflow header.
557 The c flag indicates the presence of a codebook header block, the
558 m flag indicates the presence of a comment metadata block. The o
559 flag indicates if the size of either of the c and m headers would
560 make the VORB packet greater than that allowed for a RTCP message.
562 The remaining five bits, indicated with an x, are reserved/unused
563 and MUST be set to 0 for this version of the document.
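The flag octet can be decoded as in the Python sketch below. It
assumes the c flag is the most significant bit, as drawn in the
diagram. Rejecting an octet with reserved bits set is a choice made
for this sketch; the text only requires senders to set them to 0.
The function name is hypothetical.

```python
def parse_vorb_flags(octet: int) -> dict[str, bool]:
    """Decode the c, m and o flags from the 8 bit flag field."""
    if octet & 0x1F:
        raise ValueError("reserved bits MUST be 0 in this version")
    return {
        "codebook": bool(octet & 0x80),  # c: codebook header block present
        "metadata": bool(octet & 0x40),  # m: comment metadata block present
        "overflow": bool(octet & 0x20),  # o: headers exceed the RTCP size limit
    }
```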
565 If the c flag is set then the next header block will contain the
566 codebook configuration data.
568 The configuration information detailed above MUST be completely
569 intact, as a client can not decode a stream with an incomplete
570 or corrupted codebook set.
572 A 16 bit codebook length field and a 16 bit 1's complement checksum
573 of the codebook precedes the codebook datablock. The length field
574 allows for codebooks to be up to 64K in size. The checksum is used
575 to detect a corrupted codebook.
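This document does not spell out the checksum algorithm beyond
"16 bit 1's complement". The Python sketch below assumes the
conventional Internet checksum construction: the data is summed as
big-endian 16 bit words with end-around carry, padded with a zero
octet if its length is odd, and the result is complemented. Both
function names are hypothetical.

```python
def ones_complement_checksum16(data: bytes) -> int:
    """16 bit one's complement checksum over big-endian 16 bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def codebook_is_intact(codebook: bytes, received_checksum: int) -> bool:
    """Compare the checksum carried in the header with one computed locally."""
    return ones_complement_checksum16(codebook) == received_checksum
```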
582 If a checksum failure is detected then a new config header file
583 SHOULD be obtained from SDP, if the codebook has not changed since
584 the session has started. If no SDP value is set and no other method
585 for obtaining the config headers exists then this is considered to
586 be a failure and SHOULD be reported to the client application.
588 If the m flag is set then the next header block will contain the
589 comment metadata, such as artist name, track title and so on. These
590 metadata messages are not intended to be fully descriptive but to
591 offer basic track/song information. This message MUST be sent at
592 the start of the stream, together with the setup and codebook
593 headers, even if it contains no information. During a session the
594 metadata associated with the stream may change from that specified
595 at the start, e.g. a live concert broadcast changing acts/scenes, so
clients MUST have the ability to receive m header blocks. Details
on the format of the comments can be found in the Vorbis
documentation [7].
The data takes the form of a 32 bit codec vendor name length
field followed by the name encoded in UTF-8. The next field
denotes the number of user comments, followed by the user comment
length and text field pairs, up to the number indicated by the
user comments list length field.
606 If the o, overflow, bit is set then the URI of a whole header block
607 is specified in an overflow URI field, which is a null terminated
608 UTF-8 string. The header file specified at the URI MUST NOT have
609 the overflow flag set, otherwise a loop condition will occur.
4.2 Codebook Caching
614 Codebook caching allows clients that have previously connected to a
615 stream to re-use the codebooks and thus begin the playback of the
616 session faster. When a client receives a codebook it may store
617 it, together with the MD5 key, locally and can compare the MD5 key
618 of locally cached codebooks with the key it receives via SDP, which
619 is detailed in section 5.
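A client-side cache along these lines could look like the Python
sketch below. It assumes that the MD5 key is the hexadecimal MD5
digest of the raw codebook octets, which this document does not
state explicitly. The class and method names are hypothetical.

```python
import hashlib
from typing import Optional


class CodebookCache:
    """Stores codebooks keyed by their MD5 digest (hex encoding assumed)."""

    def __init__(self) -> None:
        self._store: dict = {}

    def add(self, codebook: bytes) -> str:
        """Cache a received codebook and return its MD5 key."""
        key = hashlib.md5(codebook).hexdigest()
        self._store[key] = codebook
        return key

    def lookup(self, md5key: str) -> Optional[bytes]:
        """Return the cached codebook for an md5key taken from SDP, or None."""
        return self._store.get(md5key.strip().lower())
```

On a cache miss the client would fall back to fetching the
configuration headers by another means, such as the SDP header
attribute.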
622 5 Session Description for Vorbis RTP Streams
624 Session description information concerning the Vorbis stream
625 SHOULD be provided if possible and MUST be in accordance with [8].
626 The SDP information is split into two sections, a mandatory
627 section detailing the RTP stream and an optional section used to
628 convey information needed for codebook caching.
640 Below is an outline of the mandatory SDP attributes.
642 c=IN IP4/6 <Vorbis stream>
643 m=audio <port> RTP/AVP 98
644 a=rtpmap:98 vorbis/<sample rate>
645 a=fmtp:98 header=<URI of Vorbis codebooks>
646 a=fmtp:98 md5key=<MD5 key of codebook>
The port value is specified by the server application bound to
the address specified in the c attribute. The rate value
specified in the a=rtpmap attribute MUST match the Vorbis sample
rate.
The Vorbis codebook specified in the header attribute MUST contain
all of the configuration data. If the codebook MD5 attribute,
md5key, is set, the key is compared against the locally held cache.
If a match is found, the associated local codebook is used; if not,
the client MUST use the configuration headers specified with the
header attribute.
660 5.1 SDP Based Config Header Transmission
The optional SDP attributes are used to convey details of the
Vorbis stream which are required for codebook caching. If the
following attributes are set they take precedence over values
specified in the header attribute detailed above. The maximum size
of the mandatory and optional SDP attributes MUST be less than
1K to conform to section 4.1 of [8].
669 a=fmtp:98 bitrate_min=<Bitrate Minimum>
a=fmtp:98 bitrate_norm=<Bitrate Nominal>
671 a=fmtp:98 bitrate_max=<Bitrate Maximum>
672 a=fmtp:98 bsz0=<Block Size 0>
673 a=fmtp:98 bsz1=<Block Size 1>
674 a=fmtp:98 channels=<Num Audio Channels>
675 a=fmtp:98 meta_vendor=<Vendor Name>
678 The metadata attribute, meta_vendor, provides the bare minimum
679 information required for decoding but does not convey any
680 meaningful stream metadata information. As outlined in the Vorbis
681 comment field and header specification documentation, [7], a number
of predefined field names are available which SHOULD be used. An
example follows:
693 a=fmtp:98 meta_vendor=Xiph.Org libVorbis I 20020717
694 a=fmtp:98 meta_artist=Honest Bob and the Factory-to-Dealer-Incentives
695 a=fmtp:98 meta_title=I'm Still Around
696 a=fmtp:98 meta_tracknumber=5
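The attribute lines of section 5 can be assembled and checked
against the size limit with a Python sketch such as the one below.
It is only an illustration: it emits the Vorbis-related lines rather
than a complete SDP description, assumes an IPv4 address, and treats
"1K" as 1024 octets. The function name and its parameters are
hypothetical.

```python
from typing import Optional


def build_vorbis_sdp(address: str, port: int, sample_rate: int, header_uri: str,
                     md5key: Optional[str] = None,
                     optional: Optional[dict] = None,
                     payload_type: int = 98) -> str:
    """Return the Vorbis SDP lines, refusing to exceed the 1K limit."""
    lines = [
        f"c=IN IP4 {address}",
        f"m=audio {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} vorbis/{sample_rate}",
        f"a=fmtp:{payload_type} header={header_uri}",
    ]
    if md5key:
        lines.append(f"a=fmtp:{payload_type} md5key={md5key}")
    # Optional attributes such as bitrate_min, bsz0, channels or meta_title.
    for name, value in (optional or {}).items():
        lines.append(f"a=fmtp:{payload_type} {name}={value}")
    sdp = "\r\n".join(lines) + "\r\n"
    if len(sdp.encode("utf-8")) >= 1024:
        raise ValueError("SDP attributes MUST be less than 1K in size")
    return sdp
```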
699 6 IANA Considerations
701 MIME media type name: audio
706 header indicates the URI of the decoding codebook.
707 md5key indicates the MD5 key of the codebooks.
710 bitrate_min, bitrate_norm and bitrate_max indicate the
711 minimum, nominal and maximum bitrates. bsz0 and bsz1
712 indicate the blocksize values. channels indicates the
713 number of audio channels in the stream. meta_vendor
714 indicates the encoding codec vendor.
716 Encoding considerations:
717 This type is only defined for transfer via RTP as specified
720 Security Considerations:
721 See Section 6 of RFC 3047.
723 Interoperability considerations: none
725 Published specification:
See the Vorbis documentation [6] for details.
728 Applications which use this media type:
729 Audio streaming and conferencing tools
731 Additional information: none
733 Person & email address to contact for further information:
735 philkerr@elec.gla.ac.uk/phil@plus24.com
737 Intended usage: COMMON
739 Author/Change controller:
741 Change controller: IETF AVT Working Group
7 Congestion Control
745 Vorbis clients SHOULD send regular receiver reports detailing
746 congestion. A mechanism for dynamically downgrading the stream,
747 known as bitrate peeling, will allow for a graceful backing off
748 of the stream bitrate. This feature is not available at present
749 so an alternative would be to redirect the client to a lower
750 bitrate stream if one is available.
757 8 Security Considerations
759 RTP packets using this payload format are subject to the security
760 considerations discussed in the RTP specification [3]. This implies
761 that the confidentiality of the media stream is achieved by using
762 encryption. Because the data compression used with this payload
763 format is applied end-to-end, encryption may be performed on the
764 compressed data. Where the size of a data block is set care MUST
765 be taken to prevent buffer overflows in the client applications.
9 Acknowledgements
770 This document is a continuation of draft-moffitt-vorbis-rtp-00.txt.
771 The MIME type section is a continuation of draft-short-avt-rtp-
774 Thanks to the AVT, Ogg Vorbis Communities / Xiph.org including
775 Steve Casner, Ramon Garcia, Pascal Hennequin, Ralph Jiles,
776 Tor-Einar Jarnbjo, Colin Law, John Lazzaro, Jack Moffitt,
777 Colin Perkins, Barry Short, Mike Smith, Magnus Westerlund.
780 10 Normative References
782 1. The Ogg Encapsulation Format Version 0 (RFC 3533), S. Pfeiffer.
784 2. Key words for use in RFCs to Indicate Requirement Levels
785 (RFC 2119), S. Bradner.
787 3. RTP: A Transport Protocol for Real-Time Applications (RFC 1889),
790 4. RTP: A transport protocol for real-time applications. Work
791 in progress, draft-ietf-avt-rtp-new-11.txt.
793 5. RTP Profile for Audio and Video Conferences with Minimal Control.
794 Work in progress, draft-ietf-avt-profile-new-12.txt.
796 6. Ogg Vorbis I spec: Codec setup and packet decode.
797 http://www.xiph.org/ogg/vorbis/doc/vorbis-spec-ref.html
799 7. Ogg Vorbis I spec: Comment field and header specification.
800 http://www.xiph.org/ogg/vorbis/doc/v-comment.html
802 8. SDP: Session Description Protocol (RFC 2327), Handley, M. and
9. Path MTU Discovery (RFC 1191), Mogul & Deering
807 10. Path MTU Discovery for IP version 6 (RFC 1981), McCann, J. et al.
818 10.1 Informative References
820 11. libvorbis: Available from the Xiph website, http://www.xiph.org
823 11 Full Copyright Statement
825 Copyright (C) The Internet Society (2003). All Rights Reserved.
827 This document and translations of it may be copied and furnished to
828 others, and derivative works that comment on or otherwise explain it
829 or assist in its implementation may be prepared, copied, published
830 and distributed, in whole or in part, without restriction of any
831 kind, provided that the above copyright notice and this paragraph are
832 included on all such copies and derivative works. However, this
833 document itself may not be modified in any way, such as by removing
834 the copyright notice or references to the Internet Society or other
835 Internet organizations, except as needed for the purpose of
836 developing Internet standards in which case the procedures for
837 copyrights defined in the Internet Standards process must be
838 followed, or as required to translate it into languages other than
841 The limited permissions granted above are perpetual and will not be
842 revoked by the Internet Society or its successors or assigns.
844 This document and the information contained herein is provided on an
845 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
846 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
847 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
848 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
849 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
11.1 IPR Statement
"The IETF takes no position regarding the validity or scope of
any intellectual property or other rights that might be claimed
to pertain to the implementation or use of the technology
described in this document or the extent to which any license
under such rights might or might not be available; neither does
it represent that it has made any effort to identify any such
rights. Information on the IETF's procedures with respect to
rights in standards-track and standards-related documentation
can be found in BCP-11. Copies of claims of rights made
available for publication and any assurances of licenses to
be made available, or the result of an attempt made
to obtain a general license or permission for the use of such
proprietary rights by implementors or users of this
specification can be obtained from the IETF Secretariat."
855 Further IPR details on the Vorbis bitstream may be found on the
856 Xiph website: http://www.xiph.org
12 Authors Address
863 Centre for Music Technology
864 University of Glasgow
867 Phone: +44 141 330 5740
868 Email: philkerr@elec.gla.ac.uk
871 WWW: http://www.xiph.org/