From a91804520c0e6ede6aafe141ce8d278e9fc527a5 Mon Sep 17 00:00:00 2001 From: Ralph Giles Date: Mon, 21 Oct 2002 23:37:00 +0000 Subject: [PATCH] Typo and grammar corrections found in proofing. svn path=/trunk/vorbis/; revision=4030 --- doc/xml/01-introduction.xml | 69 +++++++++++++++++++++++---------------------- doc/xml/02-bitpacking.xml | 8 +++--- doc/xml/03-codebook.xml | 26 ++++++++--------- doc/xml/04-codec.xml | 52 ++++++++++++++++++---------------- doc/xml/Vorbis_I_spec.xml | 4 ++- 5 files changed, 81 insertions(+), 78 deletions(-) diff --git a/doc/xml/01-introduction.xml b/doc/xml/01-introduction.xml index fd13c51..c5f94b9 100644 --- a/doc/xml/01-introduction.xml +++ b/doc/xml/01-introduction.xml @@ -1,7 +1,7 @@
- $Id: 01-introduction.xml,v 1.3 2002/10/14 23:47:54 giles Exp $ + $Id: 01-introduction.xml,v 1.4 2002/10/21 23:36:59 giles Exp $ Last update to this document: July 18, 2002 @@ -13,9 +13,9 @@ This document provides a high level description of the Vorbis codec's construction. A bit-by-bit specification appears beginning in . The other reference documents assumes -a high-level understanding of the Vorbis decode process, which is -provided in this document. +linkend="vorbis-spec-codec"/>. The later sections assume a high-level +understanding of the Vorbis decode process, which is +provided here.
Application @@ -23,9 +23,9 @@ provided in this document. Vorbis is a general purpose perceptual audio CODEC intended to allow maximum encoder flexibility, thus allowing it to scale competitively over an exceptionally wide range of bitrates. At the high -quality/bitrate end of the scale (CD or DAT rate stereo, 16/24 bits), +quality/bitrate end of the scale (CD or DAT rate stereo, 16/24 bits) it is in the same league as MPEG-2 and MPC. Similarly, the 1.0 -encoder can encode high-quality CD and DAT rate stereo at below 48kpbs +encoder can encode high-quality CD and DAT rate stereo at below 48kbps without resampling to a lower rate. Vorbis is also intended for lower and higher sample rates (from 8kHz telephony to 192kHz digital masters) and a range of channel representations (monaural, @@ -54,7 +54,7 @@ encoder and simple, low-complexity decoder. Vorbis decode is computationally simpler than mp3, although it does require more working memory as Vorbis has no static probability model; the vector codebooks used in the first stage of decoding from the bitstream are -packed, in their entirety, into the Vorbis bitstream headers. In +packed in their entirety into the Vorbis bitstream headers. In packed form, these codebooks occupy only a few kilobytes; the extent to which they are pre-decoded into a cache is the dominant factor in decoder memory usage. @@ -64,10 +64,10 @@ decoder memory usage. Vorbis provides none of its own framing, synchronization or protection against errors; it is solely a method of accepting input audio, dividing it into individual frames and compressing these frames into -raw, unformatted 'packets'. The decoder then accepts these raw +raw, unformatted 'packets'. The decoder then accepts these raw packets in sequence, decodes them, synthesizes audio frames from them, and reassembles the frames into a facsimile of the original -audio stream. Vorbis is a free-form VBR codec and packets have no +audio stream. 
Vorbis is a free-form variable bit rate (VBR) codec and packets have no minimum size, maximum size, or fixed/expected size. Packets are designed so that they may be truncated (or padded) and remain decodable; this is not to be considered an error condition and is used @@ -85,18 +85,20 @@ embedded in an Ogg stream specifically, although this is by no means a requirement or fundamental assumption in the Vorbis design. -The specifications for embedding Vorbis into -an Ogg transport stream is in a separate document. +The specification for embedding Vorbis into +an Ogg transport stream is in a . + +
Codec Setup and Probability Model -Vorbis's heritage is as a research CODEC and its current design +Vorbis' heritage is as a research CODEC and its current design reflects a desire to allow multiple decades of continuous encoder improvement before running out of room within the codec specification. -For these reasons, configurable aspects codec setup intentionally +For these reasons, configurable aspects of codec setup intentionally lean toward the extreme of forward adaptive. @@ -107,10 +109,13 @@ packed into the bitstream header along with extensive CODEC setup parameters (often several hundred fields). This makes it impossible, as it would be with MPEG audio layers, to embed a simple frame type flag in each audio packet, or begin decode at any frame in the stream -without having previously fetched the codec setup header. [Note: -Vorbis *can* initiate decode at any arbitrary packet within a +without having previously fetched the codec setup header. + + + +Vorbis can initiate decode at any arbitrary packet within a bitstream so long as the codec has been initialized/setup with the -setup headers]. +setup headers. Thus, Vorbis headers are both required for decode to begin and @@ -125,7 +130,7 @@ causes some amount of complaint among engineers as this runs against current design trends (and also points out limitations in some existing software/interface designs, such as Windows' ACM codec framework). However, we find that it does not fundamentally limit -Vorbis's suitable application space. +Vorbis' suitable application space.
@@ -338,8 +343,7 @@ sample rate and number of channels. The comment header includes user text comments ("tags") and a vendor string for the application/library that produced the bitstream. The -encoding of the comment header is described within this document; the -proper use of the comment fields is described in +encoding and proper use of the comment header are described in .
@@ -368,7 +372,7 @@ fundamentally the same. compute dot product of floor and residue, producing audio spectrum vector inverse monolithic transform of audio spectrum vector, always an MDCT in Vorbis I overlap/add left-hand output of transform with right-hand output of previous frame -store right hand-data from transform of current frame for future lapping. +store right-hand data from transform of current frame for future lapping if not first frame, return results of overlap/add as audio result of current frame @@ -388,9 +392,9 @@ specification, it need not be a literal semantic implementation. Vorbis I uses four packet types. The first three packet types mark each -of the three Vorbis headers described above. The fourth packet type -marks an audio packet. All others packet types are reserved; packets -marked with a reserved flag type should be ignored. +of the three Vorbis headers described above. The fourth packet type +marks an audio packet. All other packet types are reserved; packets +marked with a reserved type should be ignored. Following the three header packets, all packets in a Vorbis I stream @@ -457,7 +461,7 @@ In the unequal-sized window case, the window shape of the long window must be modified for seamless lapping as above. It is possible to correctly infer window shape to be applied to the current window from knowing the sizes of the current, previous and next window. It is -legal for a decoder to use this method; However, in the case of a long +legal for a decoder to use this method. However, in the case of a long window (short windows require no modification), Vorbis also codes two flag bits to specify pre- and post-window shape. Although not strictly necessary for function, this minor redundancy allows a packet @@ -475,6 +479,7 @@ The use of multirate filter banks for coding of high quality digital audio, by T. Sporer, K. Brandenburg and B. Edler.
Vorbis windows all use the slope function + y=sin(0.5 * π * sin^2((x+.5)/n * π)) @@ -515,8 +520,8 @@ are coded individually in channel order. A detailed discussion of stereo in the Vorbis codec can be found in -the document _Stereo Channel Coupling in the -Vorbis CODEC_. Vorbis is not limited to only stereo coupling, but +the document Stereo Channel Coupling in the +Vorbis CODEC. Vorbis is not limited to only stereo coupling, but the stereo document also gives a good overview of the generic coupling mechanism. @@ -529,8 +534,8 @@ polar representation (where one vector is magnitude and the second angle) back to Cartesian representation. -After decoupling, in order, each pair of vectors on the coupling list -in, the resulting residue vector represents the fine spectral detail +After decoupling each pair of vectors on the coupling list, in order, +the resulting residue vectors represent the fine spectral detail of each output channel. @@ -590,11 +595,9 @@ representation. The audio spectrum is converted back into time domain PCM audio via an inverse Modified Discrete Cosine Transform (MDCT). A detailed -description of the MDCT is available in the paper - -The use of multirate filter banks for coding of high quality digital -audio, by T. Sporer, K. Brandenburg and B. Edler. +description of the MDCT is available in the paper The use of multirate filter banks for coding of high quality digital +audio, by T. Sporer, K. Brandenburg and B. Edler. Note that the PCM produced directly from the MDCT is not yet finished diff --git a/doc/xml/02-bitpacking.xml b/doc/xml/02-bitpacking.xml index 2493817..2a65128 100644 --- a/doc/xml/02-bitpacking.xml +++ b/doc/xml/02-bitpacking.xml @@ -1,7 +1,7 @@
- $Id: 02-bitpacking.xml,v 1.3 2002/10/14 18:20:30 giles Exp $ + $Id: 02-bitpacking.xml,v 1.4 2002/10/21 23:37:00 giles Exp $ Last update to this document: July 14, 2002 @@ -40,7 +40,7 @@ size that is a grouped multiple of this smallest size. The most ubiquitous architectures today consider a 'byte' to be an octet (eight bits) and a word to be a group of two, four or eight -bytes (16,32 or 64 bits). Note however that the Vorbis bitpacking +bytes (16, 32 or 64 bits). Note however that the Vorbis bitpacking convention is still well defined for any native byte size; Vorbis uses the native bit-width of a given storage system. This document assumes that a byte is one octet for purposes of example. @@ -83,10 +83,10 @@ through byte n. The Vorbis codec needs to code arbitrary bit-width integers, from zero to 32 bits wide, into packets. These integer fields are not aligned to the boundaries of the byte representation; the next field -is written at the bit position that the previous field ends. +is written at the bit position at which the previous field ends. -The encoder logically packs integers by writing the LSb of an binary +The encoder logically packs integers by writing the LSb of a binary integer to the logical bitstream first, followed by the next least significant bit, etc., until the requested number of bits has been coded. When packing the bits into bytes, the encoder begins by diff --git a/doc/xml/03-codebook.xml b/doc/xml/03-codebook.xml index 43b3da0..c14e1a0 100644 --- a/doc/xml/03-codebook.xml +++ b/doc/xml/03-codebook.xml @@ -1,7 +1,7 @@
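The LSb-first packing convention described in the bitpacking hunk can be sketched as a pair of C helpers. This is an illustration under stated assumptions, not libvorbis's API. `bitcursor`, `write_bits` and `read_bits` are hypothetical names. Bytes are assumed to be octets. The caller supplies a zero-initialized buffer large enough for every bit written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor over a caller-owned, zero-initialized byte buffer. */
typedef struct {
    uint8_t *buf;
    size_t   bitpos;   /* next bit position, counted from bit 0 of byte 0 */
} bitcursor;

/* Write the low `bits` bits of `value`, least significant bit first,
   filling each byte from its LSb upward. */
static void write_bits(bitcursor *c, uint32_t value, int bits) {
    assert(bits >= 0 && bits <= 32);
    for (int i = 0; i < bits; i++) {
        uint32_t bit = (value >> i) & 1u;
        size_t byte = c->bitpos / 8;
        int shift = (int)(c->bitpos % 8);
        c->buf[byte] |= (uint8_t)(bit << shift);
        c->bitpos++;
    }
}

/* Read `bits` bits back in the same order and reassemble them LSb first. */
static uint32_t read_bits(bitcursor *c, int bits) {
    assert(bits >= 0 && bits <= 32);
    uint32_t value = 0;
    for (int i = 0; i < bits; i++) {
        size_t byte = c->bitpos / 8;
        int shift = (int)(c->bitpos % 8);
        uint32_t bit = (c->buf[byte] >> shift) & 1u;
        value |= bit << i;
        c->bitpos++;
    }
    return value;
}
```

Writing 12 in 4 bits, 7 in 3 bits and 17 in 7 bits fills byte 0 completely (0xFC). The two remaining set bits land at the low end of byte 1 (0x08), so a field simply continues at the bit where the previous one ended.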
- $Id: 03-codebook.xml,v 1.2 2002/10/14 23:47:54 giles Exp $ + $Id: 03-codebook.xml,v 1.3 2002/10/21 23:37:00 giles Exp $ Last update to this document: August 8, 2002 @@ -35,7 +35,7 @@ stream according to . Packed Codebook Format -For purposes of the below examples, we assume that the storage +For purposes of the examples below, we assume that the storage system's native byte width is eight bits. This is not universally true; see for discussion relating to non-eight-bit bytes. @@ -98,7 +98,7 @@ byte 8: [ X 1 ] [sparse] flag (1 bit) The decoder now performs for each of the [codebook_entries] - code book entries: + codebook entries: @@ -134,15 +134,13 @@ byte 8: [ X 1 ] [sparse] flag (1 bit) 1) [current_entry] = 0; 2) [current_length] = read a five bit unsigned integer and add 1; - 3) [number] = read ilog([codebook_entries] - -[current_entry]) -bits as an unsigned integer + 3) [number] = read ilog([codebook_entries] - [current_entry]) bits as an unsigned integer 4) set the entries [current_entry] through [current_entry]+[number]-1, inclusive, - of the [codebook_codeword_lengths] array to [current_length] + of the [codebook_codeword_lengths] array to [current_length] 5) set [current_entry] to [number] + [current_entry] 6) increment [current_length] by 1 - 7) if [current_entry] is greater than [codebook_entries] ERROR CONDITION; the decoder will - not be able to read this stream. + 7) if [current_entry] is greater than [codebook_entries] ERROR CONDITION; + the decoder will not be able to read this stream. 8) if [current_entry] is less than [codebook_entries], repeat process starting at 3) 9) done. @@ -202,12 +200,10 @@ possible scalar values. 
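Steps 1) through 9) of the ordered codeword-length decode translate fairly directly into C. The sketch below assumes the codebook's ordered flag was set. It carries its own minimal LSb-first reader and uses hypothetical names (`bitreader`, `read_bits`, `decode_ordered_lengths`). Here `ilog` returns the number of bits needed to represent a non-negative value, so ilog(0) = 0 and ilog(8) = 4. The sketch rejects a run that would overshoot [codebook_entries] before writing anything. That check is equivalent to the step 7) error condition, but it also keeps the loop from writing past the end of the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal LSb-first reader over a packet buffer. */
typedef struct {
    const uint8_t *buf;
    size_t size;     /* bytes in buf */
    size_t bitpos;   /* next bit to read */
    int eop;         /* set once a read runs past the end of the packet */
} bitreader;

static uint32_t read_bits(bitreader *r, int bits) {
    assert(bits >= 0 && bits <= 32);
    uint32_t v = 0;
    for (int i = 0; i < bits; i++, r->bitpos++) {
        size_t byte = r->bitpos / 8;
        if (byte >= r->size) { r->eop = 1; return 0; }
        v |= (uint32_t)((r->buf[byte] >> (r->bitpos % 8)) & 1u) << i;
    }
    return v;
}

/* ilog: number of bits needed to hold x (ilog(0) == 0, ilog(8) == 4). */
static int ilog(uint32_t x) {
    int n = 0;
    while (x) { n++; x >>= 1; }
    return n;
}

/* Ordered codeword-length decode, steps 1) to 9).
   Returns 0 on success, -1 for a malformed or truncated stream. */
static int decode_ordered_lengths(bitreader *br, uint32_t entries,
                                  uint8_t *lengths) {
    uint32_t current_entry = 0;                           /* step 1 */
    uint32_t current_length = read_bits(br, 5) + 1;       /* step 2 */
    while (current_entry < entries) {                     /* step 8 */
        uint32_t number =
            read_bits(br, ilog(entries - current_entry)); /* step 3 */
        if (br->eop || number > entries - current_entry)  /* step 7 */
            return -1;
        for (uint32_t i = 0; i < number; i++)             /* step 4 */
            lengths[current_entry + i] = (uint8_t)current_length;
        current_entry += number;                          /* step 5 */
        current_length++;                                 /* step 6 */
    }
    return br->eop ? -1 : 0;                              /* step 9 */
}
```

For example, the two bytes 0x61 0x0A encode an initial length of 2 followed by runs of 3 and 5. For an 8-entry codebook, that yields three codewords of length 2 and five of length 3.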
Lookup decode proceeds as follows: 7) read a total of [codebook_lookup_values] unsigned integers of [codebook_value_bits] each; store these in order in the array [codebook_multiplicands] - A [codebook_lookup_type] of greater than two is reserved -and indicates -a stream that's not decodable by the specification in this +and indicates a stream that is not decodable by the specification in this document. @@ -340,10 +336,10 @@ decode in a VQ context. Lookup type one specifies a lattice VQ lookup table built -algorithmically from a list of scalar values. Calculate [unpack] the +algorithmically from a list of scalar values. Calculate (unpack) the final values of a codebook entry vector from the entries in [codebook_multiplicands] as follows ([value_vector] -is the output vector representing the vctor of values for entry number +is the output vector representing the vector of values for entry number [lookup_offset] in this codebook): @@ -377,7 +373,7 @@ each vector is explicitly set by the [codebook_multiplicands] array in a one-to-one mapping. Calculate (unpack) the final values of a codebook entry vector from the entries in [codebook_multiplicands] as follows ([value_vector] -is the output vector representing the vctor of values for entry number +is the output vector representing the vector of values for entry number [lookup_offset] in this codebook): diff --git a/doc/xml/04-codec.xml b/doc/xml/04-codec.xml index 5fa0fc4..9fc1f5c 100644 --- a/doc/xml/04-codec.xml +++ b/doc/xml/04-codec.xml @@ -1,7 +1,7 @@
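The lattice (type one) lookup above builds each vector from a small table of scalars. The following C sketch follows the Vorbis I specification's lookup type 1 procedure as I understand it; this patch does not reproduce the step list itself, so treat the sketch as illustrative. The function names are hypothetical. The sketch assumes [codebook_minimum_value] and [codebook_delta_value] have already been unpacked to floats. `lookup1_values` computes the greatest r with r^[codebook_dimensions] ≤ [codebook_entries], which is how the specification sizes the type 1 multiplicand table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: greatest r such that r^dims <= entries. */
static uint32_t lookup1_values(uint32_t entries, uint32_t dims) {
    assert(dims > 0);
    uint32_t r = 0;
    for (;;) {
        uint64_t p = 1;   /* (r + 1)^dims, stopping early once it exceeds entries */
        for (uint32_t d = 0; d < dims && p <= entries; d++)
            p *= (uint64_t)r + 1;
        if (p > entries)
            return r;
        r++;
    }
}

/* Hypothetical helper: unpack entry `lookup_offset` of a type 1 codebook.
   Each dimension picks a multiplicand by treating lookup_offset as a
   number written in base lookup_values. */
static void lookup1_unpack(const uint32_t *multiplicands, uint32_t lookup_values,
                           uint32_t dims, uint32_t lookup_offset,
                           float minimum, float delta, int sequence_p,
                           float *value_vector) {
    float last = 0.0f;
    uint32_t index_divisor = 1;
    for (uint32_t i = 0; i < dims; i++) {
        uint32_t off = (lookup_offset / index_divisor) % lookup_values;
        value_vector[i] = multiplicands[off] * delta + minimum + last;
        if (sequence_p)           /* sequence_p: values accumulate */
            last = value_vector[i];
        index_divisor *= lookup_values;
    }
}
```

With multiplicands {0, 1, 2}, two dimensions, a minimum of 0 and a delta of 1, entry 5 unpacks to {2, 1}. With sequence_p set, the same entry unpacks to {2, 3}.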
- $Id: 04-codec.xml,v 1.3 2002/10/18 00:43:00 giles Exp $ + $Id: 04-codec.xml,v 1.4 2002/10/21 23:37:00 giles Exp $ Last update to this document: October 15, 2002 @@ -207,20 +207,20 @@ uses a single mapping type (0), with implicit PCM channel mappings. [vorbis_mapping_count] = read 6 bits as unsigned integer and add one - For each [i] -of [vorbis_mapping_count] mapping numbers: + For each [i] of [vorbis_mapping_count] mapping numbers: read the mapping type: 16 bits as unsigned integer. There's no reason to save the mapping type in Vorbis I. If the mapping type is nonzero, the stream is undecodable If the mapping type is zero: - read 1 bit as a boolean flag + read 1 bit as a boolean flag if set, [vorbis_mapping_submaps] = read 4 bits as unsigned integer and add one if unset, [vorbis_mapping_submaps] = 1 + - read 1 bit as a boolean flag + read 1 bit as a boolean flag if set, square polar channel mapping is in use: @@ -231,11 +231,14 @@ of [vorbis_mapping_count] mapping numbers: vector [vorbis_mapping_angle] element [j]= read ilog([audio_channels] - 1) bits as unsigned integer the numbers read in the above two steps are channel numbers representing the channel to treat as magnitude and the channel to treat as angle, respectively. If for any coupling step the angle channel number equals the magnitude channel number, the magnitude channel number is greater than [audio_channels]-1, or the angle channel is greater than [audio_channels]-1, the stream is undecodable. - + + - + + if unset, [vorbis_mapping_coupling_steps] = 0 + read 2 bits (reserved field); if the value is nonzero, the stream is undecodable if [vorbis_mapping_submaps] is greater than one, we read channel multiplex settings. For each [j] of [audio_channels] channels: @@ -257,7 +260,7 @@ of [vorbis_mapping_count] mapping numbers: - +
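The coupling-step rules in the mapping decode above, and the magnitude/angle decoupling they feed, can be sketched in C. Both helper names are hypothetical. The validity test restates the conditions given above: the two channels must differ, and neither may exceed [audio_channels]-1. The decoupling body follows the square-polar inverse used by the Vorbis I specification and libvorbis, as I understand it. The sketch handles a single coupling step, and the step order across the coupling list should be taken from the full specification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: validity rule for one coupling step
   (distinct channels, both below [audio_channels]). */
static int coupling_step_valid(unsigned magnitude, unsigned angle,
                               unsigned audio_channels) {
    return magnitude != angle && magnitude < audio_channels &&
           angle < audio_channels;
}

/* Hypothetical helper: invert square-polar coupling in place for one step.
   mag/ang are the residue vectors of the magnitude and angle channels. */
static void inverse_couple(float *mag, float *ang, size_t n) {
    assert(mag != ang);   /* a step must name two different channels */
    for (size_t j = 0; j < n; j++) {
        float m = mag[j], a = ang[j];
        if (m > 0) {
            if (a > 0) { mag[j] = m; ang[j] = m - a; }
            else       { ang[j] = m; mag[j] = m + a; }
        } else {
            if (a > 0) { mag[j] = m; ang[j] = m + a; }
            else       { ang[j] = m; mag[j] = m - a; }
        }
    }
}
```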
@@ -299,7 +302,7 @@ After reading mode descriptions, setup header decode is complete. Following the three header packets, all packets in a Vorbis I stream are audio. The first step of audio packet decode is to read and -verify the packet type; a non-audio packet when audio is expected +verify the packet type. A non-audio packet when audio is expected indicates stream corruption or a non-compliant stream. The decoder must ignore the packet and not attempt decoding it to audio. @@ -341,7 +344,7 @@ illustration of overlapping dissimilar -Vorbis windows all use the slope function y=sin(0.5 * π * sin^2((x+.5)/n * π)), +Vorbis windows all use the slope function y=sin(0.5 * \pi * sin^2((x+.5)/n * \pi)), where n is window size and x ranges 0...n-1, but dissimilar lapping requirements can affect overall shape. Window generation proceeds as follows: @@ -382,9 +385,9 @@ set) then window from range 0 ... [left_window_start]-1 inclusive is zero for [i] in range [left_window_start] ... -[left_window_end]-1, window([i]) = sin(.5 * π * sin^2( ([i]-[left_window_start]+.5) / [left_n] * .5 * π) ) +[left_window_end]-1, window([i]) = sin(.5 * \pi * sin^2( ([i]-[left_window_start]+.5) / [left_n] * .5 * \pi) ) window from range [left_window_end] ... [right_window_start]-1 -inclusive is one for [i] in range [right_window_start] ... [right_window_end]-1, window([i]) = sin(.5 * π * sin^2( ([i]-[right_window_start]+.5) / [right_n] * .5 * π/2. + .5 * π) ) +inclusive is one for [i] in range [right_window_start] ... [right_window_end]-1, window([i]) = sin(.5 * \pi * sin^2( ([i]-[right_window_start]+.5) / [right_n] * .5 * \pi + .5 * \pi) ) window from range [right_window_end] ... 
[n]-1 is zero @@ -420,12 +423,10 @@ For each floor [i] of [audio_channels] floor (vector [vorbis_floor_types] element [floor_number]) is zero then decode the floor for channel [i] according to the -floor 0 decode -algorithm + if the type of this floor is one then decode the floor for channel [i] according -to the floor 1 decode -algorithm +to the save the needed decoded floor information for channel [i] for later synthesis if the decoded floor returned 'unused', set vector [no_residue] element [i] to true, else set vector [no_residue] element [i] to @@ -571,7 +572,7 @@ length for floor computation is [n]/2.
For each channel, multiply each element of the floor curve by the corresponding element of that channel's residue vector. The result is the dot -product the floor and residue vectors for each channel; the produced +product of the floor and residue vectors for each channel; the produced vectors are the length [n]/2 audio spectrum for each channel. @@ -607,11 +608,10 @@ representation. Convert the audio spectrum vector of each channel back into time domain PCM audio via an inverse Modified Discrete Cosine Transform (MDCT). A detailed description of the MDCT is available in the paper -The +The use of multirate filter banks for coding of high quality digital -audio, by T. Sporer, K. Brandenburg and B. Edler. The window -function used for the MDCT is the window determined earlier. +audio, by T. Sporer, K. Brandenburg and B. Edler. The window +function used for the MDCT is the function described earlier.
@@ -621,18 +621,20 @@ function used for the MDCT is the window determined earlier.
Windowed MDCT output is overlapped and added with the right-hand data of the previous window such that the 3/4 point of the previous window is aligned with the 1/4 point of the current window (as illustrated in -. The overlapped portion +). The overlapped portion produced from overlapping the previous and current frame data is finished data to be returned by the decoder. This data spans from the center of the previous window to the center of the current window. In the case of same-sized windows, the amount of data to return is one-half block consisting only of the overlapped portions. When -overlapping a short and long window, much of the returned range is not +overlapping a short and long window, much of the returned range does not actually overlap. This does not damage transform orthogonality. Pay attention however to returning the correct data range; the amount of data to be returned is: -window_blocksize(previous_window)/4+window_blocksize(current_window)/4 + +window_blocksize(previous_window)/4+window_blocksize(current_window)/4 + from the center (element windowsize/2) of the previous window to the center (element windowsize/2-1, inclusive) of the current window. @@ -679,7 +681,7 @@ front center, front right, rear left, rear right six channels - the stream is 5,1 surround. channel order: front left, front + the stream is 5.1 surround. channel order: front left, front center, front right, rear left, rear right, LFE diff --git a/doc/xml/Vorbis_I_spec.xml b/doc/xml/Vorbis_I_spec.xml index 8354c07..ff93846 100644 --- a/doc/xml/Vorbis_I_spec.xml +++ b/doc/xml/Vorbis_I_spec.xml @@ -2,6 +2,7 @@ @@ -15,7 +16,8 @@ - + + ]>
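The overlap/add rule in the 04-codec hunk (align the previous window's 3/4 point with the current window's 1/4 point, then return the span from the previous center to the current center) can be sketched in C. The function names are hypothetical. `prev` and `cur` are assumed to be the complete, already windowed inverse-MDCT outputs of the two blocks. Where only one block has data, its samples pass through unchanged; the windows built for unequal lapping are zero or one in those regions.

```c
#include <assert.h>
#include <stddef.h>

/* Samples returned for one frame: blocksize(prev)/4 + blocksize(cur)/4. */
static size_t returned_samples(size_t prev_n, size_t cur_n) {
    return prev_n / 4 + cur_n / 4;
}

/* Hypothetical helper: overlap/add from the center of prev (element prev_n/2)
   through the center of cur (element cur_n/2 - 1, inclusive).
   out must hold returned_samples(prev_n, cur_n) floats. */
static size_t overlap_add(const float *prev, size_t prev_n,
                          const float *cur, size_t cur_n, float *out) {
    assert(prev_n % 4 == 0 && cur_n % 4 == 0);
    size_t count = returned_samples(prev_n, cur_n);
    for (size_t k = 0; k < count; k++) {
        long p = (long)(prev_n / 2 + k);                            /* index into prev */
        long c = (long)k + (long)(cur_n / 4) - (long)(prev_n / 4);  /* index into cur */
        float acc = 0.0f;
        if (p < (long)prev_n)           acc += prev[p];
        if (c >= 0 && c < (long)cur_n)  acc += cur[c];
        out[k] = acc;
    }
    return count;
}
```

With same-sized windows, the loop reduces to adding the right half of `prev` to the left half of `cur`, element by element. With a long block followed by a short one, it returns 2048/4 + 256/4 = 576 samples, most of which come from the previous block alone.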
-- 2.7.4