MIME types in GStreamer

A MIME type is a combination of two short strings: the content type and the
content subtype. Content types are broad categories used for describing
almost all types of files: video, audio, text, and application are common
content types. The subtype narrows the content type down to a more
specific type description, for example 'application/ogg', 'audio/raw',
'video/mpeg', or 'text/plain'.
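
The type/subtype split described above can be sketched in plain C. This is
only an illustration and not GStreamer code: the function name mime_split and
its buffer-based interface are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits "type/subtype" into two caller-provided buffers.
 * Returns 1 on success, 0 if there is no '/', if either half is empty,
 * or if a half does not fit into its buffer.
 * Hypothetical helper, not part of any GStreamer API. */
int mime_split(const char *mime,
               char *type, size_t type_len,
               char *subtype, size_t subtype_len)
{
    const char *slash;
    size_t n_type, n_sub;

    assert(mime != NULL && type != NULL && subtype != NULL);

    slash = strchr(mime, '/');
    if (slash == NULL)
        return 0;

    n_type = (size_t)(slash - mime);
    n_sub = strlen(slash + 1);
    if (n_type == 0 || n_sub == 0 ||
        n_type >= type_len || n_sub >= subtype_len)
        return 0;

    memcpy(type, mime, n_type);
    type[n_type] = '\0';
    memcpy(subtype, slash + 1, n_sub + 1);  /* copies the terminating NUL too */
    return 1;
}
```

Called with "video/mpeg" and two sufficiently large buffers, the function
stores "video" and "mpeg"; a string without a slash is rejected.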

So the content type and subtype make up a pair that describes the type of
information contained in a file. In multimedia processing, MIME types are used
to describe the type of information carried by a media stream. In GStreamer, we
use MIME types in the same way, to identify the types of information that are
allowed to pass between GStreamer elements. The MIME type is part of a GstCaps
object that describes a media stream. Besides a MIME type, a GstCaps object also
contains a name and some stream properties (GstProps, which hold combinations of

An example of a MIME type is 'video/mpeg'. A corresponding GstCaps could be
created like this:

  GstCaps *caps = gst_caps_new("video_mpeg_type",
                               "video/mpeg",
                               gst_props_new("width",  GST_PROPS_INT(384),
                                             "height", GST_PROPS_INT(288),
                                             NULL));

or, using the convenience macro:

  GstCaps *caps = GST_CAPS_NEW("video_mpeg_type",
                               "video/mpeg",
                               "width",  GST_PROPS_INT(384),
                               "height", GST_PROPS_INT(288));

MIME types and their corresponding properties are of major importance in
GStreamer for uniquely identifying media streams.

Official MIME media types are assigned by the IANA. Current assignments are at
http://www.iana.org/assignments/media-types/.

Some streams may have MIME types or GstCaps that do not fully describe the
stream. In most cases, this is not a problem. For example, if a stream
contains Ogg/Vorbis data (which is of type 'application/ogg'), we don't need to
know the samplerate of the raw audio stream, since we can't play the encoded
audio anyway. The samplerate is, however, important for raw audio, so a decoder
would need to retrieve the samplerate from the Ogg/Vorbis stream headers (the
headers are part of the bytestream) in order to pass it on in the GstCaps that
belongs to the decoded audio (which becomes a type like 'audio/raw'). However,
other plugins might want to know such properties, even for compressed streams.
One such example is an AVI muxer, which does want to know the samplerate of an
audio stream, even when it is compressed.

Another problem is that many media types can be defined in multiple ways. For
example, MJPEG video can be defined as 'video/jpeg', 'video/mjpeg',
'image/jpeg', 'video/x-msvideo' with a compression of (fourcc) MJPG, etc.
None of these is really official, since there isn't an official MIME type
for encoded MJPEG video.
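
To make the aliasing problem concrete, here is a small sketch that folds the
aliases named above into a single name. It uses 'video/x-jpeg', the type this
document lists for Motion-JPEG among the video codecs. The function name and
the alias table are assumptions for illustration. Mapping 'image/jpeg' this
way is only sensible when the stream is already known to be MJPEG video.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Aliases taken from the paragraph above; the table is illustrative only. */
static const char *const mjpeg_aliases[] = {
    "video/jpeg", "video/mjpeg", "image/jpeg"
};

/* Returns "video/x-jpeg" for a known MJPEG alias, otherwise the input
 * string unchanged. Hypothetical helper, not a GStreamer function. */
const char *canonical_mime(const char *mime)
{
    size_t i;

    assert(mime != NULL);
    for (i = 0; i < sizeof mjpeg_aliases / sizeof mjpeg_aliases[0]; i++) {
        if (strcmp(mime, mjpeg_aliases[i]) == 0)
            return "video/x-jpeg";
    }
    return mime;
}
```

The 'video/x-msvideo' plus fourcc MJPG case is left out on purpose, since it
needs more than the MIME string to be recognized.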

The main focus of this document is to propose a standardized set of MIME types
and properties that will be used by the GStreamer plugins.

Different types of streams
==========================

There are several types of media streams. The most important distinction is
between container formats, audio codecs and video codecs. Container formats are
bytestreams that contain one or more substreams, and don't provide any
direct media data themselves. Examples are Quicktime, AVI or MPEG System Stream.
They mostly consist of a set of headers that define the media streams that are
packed inside the container, along with the media data itself.

Video codecs and audio codecs describe encoded audio or video data. Examples are
MPEG-1 video, DivX video, MPEG-1 layer 3 (MP3) audio or Ogg/Vorbis audio.
Actually, Ogg is a container format too (for Vorbis audio), but the two are
usually used in conjunction with each other.

Finally, there are the somewhat obvious (but not commonly encountered as files)

1 - AVI (Microsoft RIFF/AVI)
    MIME type: video/x-msvideo

    MIME type: video/quicktime

    MIME type: video/mpeg
    Properties: 'systemstream' = TRUE (BOOLEAN)
    Parser: mp1videoparse

    MIME type: video/x-ms-asf

    MIME type: audio/x-wav

    MIME type: video/x-pn-realvideo
    Properties: 'systemstream' = TRUE (BOOLEAN)

7 - DV (Digital Video)
    MIME type: video/x-dv
    Properties: 'systemstream' = TRUE (BOOLEAN)

    MIME type: application/ogg

    MIME type: video/x-mkv

10 - Shockwave (Macromedia)
    MIME type: application/x-shockwave-flash

    MIME type: audio/x-au

    MIME type: audio/x-mod
    Parser: modplug, mikmod

    MIME type: video/x-fli

    MIME type: application/x-ape

    MIME type: audio/x-aiff

    MIME type: audio/x-sid

Please note that we try to keep these MIME types as similar as possible to the
MIME types used as standards in Gnome (Gnome-VFS/Nautilus) and KDE

Also, there is a very thin line between audio codecs and audio containers
(take mp3 vs. sid, etc.). This is decided on a per-case basis right now and
needs to be documented further.
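
Three MIME types in this document ('video/mpeg', 'video/x-pn-realvideo' and
'video/x-dv') are used both for a container and for a video codec, and the
'systemstream' property tells them apart. A minimal sketch of that decision
follows. The struct and function are hypothetical stand-ins, not GstCaps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a caps description. */
struct stream_desc {
    const char *mime;
    int has_systemstream;  /* 1 if the 'systemstream' property is present */
    int systemstream;      /* its value: 1 = TRUE, 0 = FALSE */
};

/* Returns 1 for a container (system) stream, 0 for an elementary video
 * stream, and -1 if the description does not allow a decision. */
int classify_stream(const struct stream_desc *d)
{
    static const char *const dual_use[] = {
        "video/mpeg", "video/x-pn-realvideo", "video/x-dv"
    };
    size_t i;

    assert(d != NULL && d->mime != NULL);
    for (i = 0; i < sizeof dual_use / sizeof dual_use[0]; i++) {
        if (strcmp(d->mime, dual_use[i]) == 0) {
            if (!d->has_systemstream)
                return -1;          /* property missing: cannot tell */
            return d->systemstream ? 1 : 0;
        }
    }
    return -1;                      /* MIME type does not use the property */
}
```

Returning -1 for a missing property is a design choice of this sketch. The
document itself does not say what a plugin should assume in that case.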

For convenience, the fourcc codes used in the AVI container format will be
listed along with the MIME type and optional properties.
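
As an illustration of how fourcc codes relate to the MIME types in this
section, the sketch below packs four characters into a 32-bit code and looks
up a few of the fourccs listed in this document. The macro MAKE_FOURCC, the
table and the function name are assumptions made for this example. The byte
order (first character in the least significant byte) matches reading the
four bytes from a RIFF/AVI file as a little-endian integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Packs four characters into a 32-bit code, first character in the
 * least significant byte. Illustrative macro, defined locally. */
#define MAKE_FOURCC(a, b, c, d) \
    ((uint32_t)(uint8_t)(a) | ((uint32_t)(uint8_t)(b) << 8) | \
     ((uint32_t)(uint8_t)(c) << 16) | ((uint32_t)(uint8_t)(d) << 24))

struct fourcc_entry {
    uint32_t fourcc;
    const char *mime;
};

/* A small subset of the fourcc/MIME pairs listed in this document. */
static const struct fourcc_entry fourcc_table[] = {
    { MAKE_FOURCC('D', 'I', 'V', '3'), "video/x-divx" },
    { MAKE_FOURCC('D', 'X', '5', '0'), "video/x-divx" },
    { MAKE_FOURCC('M', 'P', '4', '2'), "video/x-msmpeg" },
    { MAKE_FOURCC('M', 'J', 'P', 'G'), "video/x-jpeg" },
    { MAKE_FOURCC('H', '2', '6', '3'), "video/x-h263" },
    { MAKE_FOURCC('X', 'V', 'I', 'D'), "video/x-xvid" },
};

/* Returns the MIME type for a fourcc, or NULL if it is not in the table. */
const char *mime_for_fourcc(uint32_t fourcc)
{
    size_t i;

    for (i = 0; i < sizeof fourcc_table / sizeof fourcc_table[0]; i++) {
        if (fourcc_table[i].fourcc == fourcc)
            return fourcc_table[i].mime;
    }
    return NULL;
}
```

Note that the lookup is case-sensitive, which is why this document lists
variants such as 'xvid' and 'XVID' separately.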

Optional properties for all video formats are the following:

    width        = 1 - MAXINT (INT)
    height       = 1 - MAXINT (INT)
    pixel_width  = 1 - MAXINT (INT, with pixel_height forms aspect ratio)
    pixel_height = 1 - MAXINT (INT, with pixel_width forms aspect ratio)
    framerate    = 0 - MAXFLOAT (FLOAT)
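
A sketch of how the four size properties might combine into a display aspect
ratio. It assumes that pixel_width:pixel_height is the shape of a single
pixel, which is one reading of "forms aspect ratio" above. The function names
are made up for this example.

```c
#include <assert.h>

/* Greatest common divisor, used to reduce the ratio. */
unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Computes the reduced display aspect ratio num:den as
 * (width * pixel_width) : (height * pixel_height).
 * All inputs must be >= 1, as required by the property ranges. */
void display_aspect(unsigned long width, unsigned long height,
                    unsigned long pixel_width, unsigned long pixel_height,
                    unsigned long *num, unsigned long *den)
{
    unsigned long n, d, g;

    assert(width >= 1 && height >= 1);
    assert(pixel_width >= 1 && pixel_height >= 1);
    assert(num != 0 && den != 0);

    n = width * pixel_width;
    d = height * pixel_height;
    g = gcd_ul(n, d);
    *num = n / g;
    *den = d / g;
}
```

For the 384x288 example used earlier with square pixels (1:1) this gives 4:3.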

1 - MPEG-1, -2 and -4 video (ISO/LA MPEG)
    MIME type: video/mpeg
    Properties: systemstream = FALSE (BOOLEAN)
                mpegversion = 1/2/4 (INT)
    Known fourccs: MPEG, MPGI
    Encoder: mpeg1enc, mpeg2enc
    Decoder: mpeg1dec, mpeg2dec, mpeg2subt

2 - DivX 3.x, 4.x and 5.x video (divx.com)
    MIME type: video/x-divx
    Optional properties: divxversion = 3/4/5 (INT)
    Known fourccs: DIV3, DIV4, DIV5, DIVX, DX50, divx
    Decoder: dvdreadsrc, dvdnavsrc

3 - Microsoft MPEG 4.1, 4.2 and 4.3
    MIME type: video/x-msmpeg
    Optional properties: msmpegversion = 41/42/43 (INT)
    Known fourccs: MPG4, MP42, MP43
    Encoder: ffenc_msmpeg4, ffenc_msmpeg4v1, ffenc_msmpeg4v2
    Decoder: ffdec_msmpeg4, ffdec_msmpeg4v1, ffdec_msmpeg4v2

4 - Motion-JPEG (official and extended)
    MIME type: video/x-jpeg
    Known fourccs: MJPG (YUY2 MJPEG), JPEG (any), PIXL (Pinnacle/Miro), VIXL

5 - Sorensen (Quicktime - SVQ1/SVQ3)
    MIME type: video/x-svq
    Properties: svqversion = 1/3 (INT)

6 - H263 and related codecs
    MIME type: video/x-h263
    Known fourccs: H263, i263, M263, x263, VDOW, VIVO

    MIME type: video/x-pn-realvideo
    Properties: systemstream = FALSE (BOOLEAN)
    Known fourccs: RV10, RV20, RV30

8 - Digital Video (DV)
    MIME type: video/x-dv
    Properties: systemstream = FALSE (BOOLEAN)
    Known fourccs: DVSD, dvsd

9 - Windows Media Video 1 and 2 (WMV)
    MIME type: video/x-wmv
    Properties: wmvversion = 1/2 (INT)

    MIME type: video/x-xvid
    Known fourccs: xvid, XVID

    MIME type: video/x-3ivx
    Known fourccs: 3IV0, 3IV1, 3IV2

12 - Ogg/Tarkin (Xiph)
    MIME type: video/x-tarkin

    MIME type: video/x-vp3

14 - Ogg/Theora (Xiph, VP3-like)
    MIME type: video/x-theora

    MIME type: video/x-huffyuv

16 - FF Video 1 (FFMPEG)
    MIME type: video/x-ffv
    Properties: ffvversion = 1 (INT)

    MIME type: video/x-h264

    MIME type: video/x-indeo
    Properties: indeoversion = 3 (INT)

19 - Portable Network Graphics (PNG)
    MIME type: video/x-png

TODO: subsampling information for YUV?

TODO: colorspace identifications for MJPEG? How?

TODO: how to distinguish MJPEG-A/B (Quicktime) and lossless JPEG?

TODO: divx4/divx5/xvid/3ivx/mpeg-4 - how to make them overlap? (all
      ISO MPEG-4 compatible)

For convenience, the two-byte hexcodes (as used for identification in AVI
files) are also given.

Preface - (optional) properties for all audio formats:
    'rate'     = X (int) <- sampling rate
    'channels' = X (int) <- number of audio channels

1 - Raw Audio (integer format)
    mimetype: audio/x-raw-int
    properties: 'width'      = X (INT) <- memory bits per sample
                'depth'      = X (INT) <- used bits per sample
                'signed'     = X (BOOLEAN)
                'endianness' = 1234/4321 (INT)

2 - Raw Audio (floating point format)
    mimetype: audio/x-raw-float
    properties: 'depth'         = X (INT) <- 32=float, 64=double
                'endianness'    = 1234/4321 (INT) <- use G_BIG/LITTLE_ENDIAN!
                'buffer-frames' = (INT)
    With regard to the signal: 0.0 represents no signal, +/- 1.0 is 0 dB.

    mimetype: audio/x-alaw

    mimetype: audio/x-mulaw

5 - MPEG-1 layer 1/2/3 audio
    properties: 'mpegversion' = 1 (INT)
                'layer'       = 1/2/3 (INT)

    mimetype: audio/x-vorbis

7 - Windows Media Audio 1 and 2 (WMA)
    mimetype: audio/x-wma
    properties: 'wmaversion' = 1/2 (INT)

    mimetype: audio/x-ac3

9 - FLAC (Free Lossless Audio Codec)
    mimetype: audio/x-flac

10 - MACE 3/6 (Quicktime audio)
    mimetype: audio/x-mace
    properties: 'maceversion' = 3/6 (INT)

    properties: 'mpegversion' = 4 (INT)

12 - (IMA) ADPCM (Quicktime/WAV/Microsoft/4XM)
    mimetype: audio/x-adpcm
    properties: 'layout' = "quicktime"/"wav"/"microsoft"/"4xm" (STRING)
    Note: the difference between each of these is the number of
          samples packed together per channel. For WAV, for
          example, each sample is 4 bit, and 8 samples are packed
          together per channel in the bytestream. For the others,
          refer to technical documentation.
          We probably want to distinguish these differently, but
          I don't know how, yet.

13 - RealAudio (Real)
    mimetype: audio/x-pn-realaudio
    properties: 'bitrate' = 14400/28800 (INT)

For convenience, the two-byte hexcodes (as used for identification in AVI files)

Properties for all audio formats include the following:

    rate     = 1 - MAXINT (INT, sampling rate)
    channels = 1 - MAXINT (INT, number of audio channels)

    MIME type: audio/x-alaw

    MIME type: audio/x-mulaw

3 - MPEG-1 layer 1/2/3 audio
    MIME type: audio/mpeg
    Properties: mpegversion = 1 (INT)

    MIME type: audio/x-vorbis

5 - Windows Media Audio 1 and 2 (WMA)
    MIME type: audio/x-wma
    Properties: wmaversion = 1/2 (INT)

    MIME type: audio/x-ac3

7 - FLAC (Free Lossless Audio Codec)
    MIME type: audio/x-flac

8 - MACE 3/6 (Quicktime audio)
    MIME type: audio/x-mace
    Properties: maceversion = 3/6 (INT)

    MIME type: audio/mpeg
    Properties: mpegversion = 4 (INT)

10 - (IMA) ADPCM (Quicktime/WAV/Microsoft/4XM)
    MIME type: audio/x-adpcm
    Properties: layout = "quicktime"/"wav"/"microsoft"/"4xm" (STRING)

    Note: The difference between each of these four ADPCM layouts is the
    number of samples packed together per channel. For WAV, for example, each
    sample is 4 bit, and 8 samples are packed together per channel in the
    bytestream. For the others, refer to technical documentation. We
    probably want to distinguish these differently, but I don't know how,
    yet.

11 - RealAudio (Real)
    MIME type: audio/x-pn-realaudio
    Properties: bitrate = 14400/28800 (INT)

    MIME type: audio/x-dv

    MIME type: audio/x-gsm

    MIME type: audio/x-speex

TODO: adpcm/dv needs confirmation from someone with knowledge...

Raw formats contain unencoded, raw media information. These are rather rare from
an end user point of view since raw media files have historically been
prohibitively large, hence the multitude of encoding formats.

Raw video formats require the following common properties, in addition to
format-specific properties:

    width  = 1 - MAXINT (INT)
    height = 1 - MAXINT (INT)

1 - Raw Video (YUV/YCbCr)
    MIME type: video/x-raw-yuv
    Properties: 'format' = 'XXXX' (fourcc)
    Known fourccs: YUY2, I420, Y41P, YVYU, UYVY, etc.

Some raw video formats have implicit alignment rules. We should discuss this
more. Also, some formats have multiple fourccs (e.g. IYUV/I420 or
YUY2/YUYV). For each of these, we only use one (e.g. I420 and YUY2).

Currently recognized formats:

YUY2: packed, Y-U-Y-V order, U/V hor 2x subsampled (YUV-4:2:2, 16 bpp)
YVYU: packed, Y-V-Y-U order, U/V hor 2x subsampled (YUV-4:2:2, 16 bpp)
UYVY: packed, U-Y-V-Y order, U/V hor 2x subsampled (YUV-4:2:2, 16 bpp)
Y41P: packed, UYVYUYVYYYYY order, U/V hor 4x subsampled (YUV-4:1:1, 12 bpp)
IUY2: packed, U-Y-V order, not subsampled (YUV-1:1:1, 24 bpp)

Y42B: planar, Y-U-V order, U/V hor 2x subsampled (YUV-4:2:2, 16 bpp)
YV12: planar, Y-V-U order, U/V hor+ver 2x subsampled (YUV-4:2:0, 12 bpp)
I420: planar, Y-U-V order, U/V hor+ver 2x subsampled (YUV-4:2:0, 12 bpp)
Y41B: planar, Y-U-V order, U/V hor 4x subsampled (YUV-4:1:1, 12 bpp)
YUV9: planar, Y-U-V order, U/V hor+ver 4x subsampled (YUV-4:1:0, 9 bpp)
YVU9: planar, Y-V-U order, U/V hor+ver 4x subsampled (YUV-4:1:0, 9 bpp)

Y800: one-plane (Y-only, YUV-4:0:0, 8 bpp)

See http://www.fourcc.org/ for more information.

Note: YUV-4:4:4 (both planar and packed, in multiple orders) are missing.
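
The bpp figures in the format list translate directly into a frame size. The
sketch below does that calculation. It deliberately ignores the implicit
alignment rules mentioned above, so real buffers may be larger, and integer
division truncates for dimensions that don't divide evenly. The table and
function name are assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct yuv_format {
    const char *fourcc;
    unsigned bpp;          /* average bits per pixel */
};

/* bpp values copied from the list of recognized formats above. */
static const struct yuv_format yuv_formats[] = {
    { "YUY2", 16 }, { "YVYU", 16 }, { "UYVY", 16 }, { "Y41P", 12 },
    { "IUY2", 24 }, { "Y42B", 16 }, { "YV12", 12 }, { "I420", 12 },
    { "Y41B", 12 }, { "YUV9", 9 },  { "YVU9", 9 },  { "Y800", 8 },
};

/* Returns the unpadded frame size in bytes, or 0 for an unknown fourcc.
 * No stride or plane alignment is applied (simplifying assumption). */
size_t yuv_frame_size(const char *fourcc, size_t width, size_t height)
{
    size_t i;

    assert(fourcc != NULL);
    for (i = 0; i < sizeof yuv_formats / sizeof yuv_formats[0]; i++) {
        if (strcmp(fourcc, yuv_formats[i].fourcc) == 0)
            return width * height * yuv_formats[i].bpp / 8;
    }
    return 0;
}
```

For a 384x288 frame this yields 165888 bytes for I420 and 221184 bytes for
YUY2.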

    MIME type: video/x-raw-rgb
    Properties: endianness = 1234/4321 (INT) <- use G_LITTLE/BIG_ENDIAN
                depth      = 15/16/24 (INT, color depth)
                bpp        = 16/24/32 (INT, bits used to store each pixel)
                red_mask   = bitmask (0x..) (INT)
                green_mask = bitmask (0x..) (INT)
                blue_mask  = bitmask (0x..) (INT)

24 and 32 bit RGB should always be specified as big endian, since any little
endian format can be transformed into big endian by rearranging the color
masks. 15 and 16 bit formats should generally have the same byte order as

Color masks are interpreted by loading 'bpp' number of bits using the given
'endianness', and masking and shifting by each color mask. Loading a 24-bit
value cannot be done directly, but one can perform an equivalent operation.
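
The load/mask/shift procedure described above can be sketched as follows. The
byte-wise loop is one possible "equivalent operation" for 24-bit values. The
names CAPS_LITTLE_ENDIAN, CAPS_BIG_ENDIAN, load_pixel and extract_component
are made up for this example. Only the values 1234 and 4321 come from this
document.

```c
#include <assert.h>
#include <stdint.h>

#define CAPS_LITTLE_ENDIAN 1234   /* same value as G_LITTLE_ENDIAN */
#define CAPS_BIG_ENDIAN    4321   /* same value as G_BIG_ENDIAN */

/* Loads 'bpp' bits (16, 24 or 32) from memory using the given byte order.
 * Reading byte by byte also covers the 24-bit case. */
uint32_t load_pixel(const uint8_t *mem, int bpp, int endianness)
{
    int nbytes = bpp / 8;
    int i;
    uint32_t v = 0;

    assert(mem != 0);
    assert(bpp == 16 || bpp == 24 || bpp == 32);
    assert(endianness == CAPS_LITTLE_ENDIAN || endianness == CAPS_BIG_ENDIAN);

    for (i = 0; i < nbytes; i++) {
        if (endianness == CAPS_BIG_ENDIAN)
            v = (v << 8) | mem[i];              /* first byte is most significant */
        else
            v |= (uint32_t)mem[i] << (8 * i);   /* first byte is least significant */
    }
    return v;
}

/* Masks the pixel and shifts the result down so that the component
 * starts at bit 0. The mask must be non-zero. */
uint32_t extract_component(uint32_t pixel, uint32_t mask)
{
    assert(mask != 0);
    pixel &= mask;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        pixel >>= 1;
    }
    return pixel;
}
```

For a 15-bit pixel stored as xRRRRRGG GGGBBBBB in big-endian order, the masks
that follow from that layout are 0x7c00 (red), 0x03e0 (green) and 0x001f
(blue). The same masks apply to the byte-swapped little-endian layout once
the value has been loaded with endianness 1234.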

- memory: RRRRRRRR GGGGGGGG BBBBBBBB RRRRRRRR GGGGGGGG ...
  endianness = 4321 (G_BIG_ENDIAN)
  green_mask = 0x00ff00

- memory: xRRRRRGG GGGBBBBB xRRRRRGG GGGBBBBB xRRRRRGG ...
  endianness = 4321 (G_BIG_ENDIAN)

- memory: GGGBBBBB xRRRRRGG GGGBBBBB xRRRRRGG GGGBBBBB ...
  endianness = 1234 (G_LITTLE_ENDIAN)

The raw audio formats require the following common properties, in addition to
format-specific properties:

    rate          = 1 - MAXINT (INT, sampling rate)
    channels      = 1 - MAXINT (INT, number of audio channels)
    buffer-frames = 1 - MAXINT (INT, number of frames per buffer)
    endianness    = 1234/4321 (INT) <- use G_BIG/LITTLE_ENDIAN
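
The values 1234 and 4321 are the ones GLib uses for G_LITTLE_ENDIAN and
G_BIG_ENDIAN. Without GLib at hand, the host byte order can be probed at
run time as in this sketch. The function name is an assumption made for the
example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1234 on a little-endian host and 4321 on a big-endian host. */
int host_endianness(void)
{
    const uint16_t probe = 0x0102;
    uint8_t first;

    /* Look at the first byte of the value as it is stored in memory. */
    memcpy(&first, &probe, 1);
    assert(first == 0x01 || first == 0x02);
    return first == 0x02 ? 1234 : 4321;
}
```

In real GStreamer code the G_BYTE_ORDER macro from GLib would normally be
used instead of a run-time check.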

3 - Raw audio (integer format)
    MIME type: audio/x-raw-int
    Properties: width  = 8/16/32 (INT, bits used to store each sample)
                depth  = 8 - 32 (INT, bits actually used per sample)
                signed = TRUE/FALSE (BOOLEAN)

4 - Raw audio (floating point format)
    MIME type: audio/x-raw-float
    Properties: width = 32/64 (INT)
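
Taken together, 'buffer-frames', 'channels' and 'width' determine how many
bytes a raw audio buffer occupies. The sketch below assumes that one frame
holds one sample for every channel. This document does not spell that out, so
treat it as an assumption, along with the function name.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of one raw audio buffer.
 * width is the storage size of one sample in bits (8/16/32 for
 * audio/x-raw-int, 32/64 for audio/x-raw-float).
 * Assumes one frame = one sample per channel. */
size_t raw_audio_buffer_size(size_t buffer_frames, size_t channels,
                             unsigned width)
{
    assert(buffer_frames >= 1 && channels >= 1);
    assert(width != 0 && width % 8 == 0);
    return buffer_frames * channels * (width / 8);
}
```

A buffer of 1024 frames of 16-bit stereo audio would then take 4096 bytes.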

A short note on what plugins should do. Above, I've stated that audio
properties like 'channels' and 'rate' or video properties like 'width' and
'height' are all optional. This doesn't mean you can simply omit them and
everything will still work!

An example is the best way to explain all this. AVI needs the width, height,
rate and channels for the AVI header. So if these properties are missing, the
avimux element cannot properly create the AVI header. On the other hand, MPEG
doesn't have such properties in its header, so the mpegdemux element would need
to parse the separate streams in order to find them out. We don't want that
either, because a plugin only does one job. So normally, mpegdemux and avimux
wouldn't allow transcoding. To solve this problem, there are stream parser
elements (such as mpegaudioparse, ac3parse and mpeg1videoparse).

The conclusion to draw from this: a plugin provides the information that it can
obtain as part of its own job. If it can't provide information that other
elements still need, a stream parser has to be used, or written if one doesn't
already exist.
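
The avimux example can be pictured as a simple completeness check on the
properties a muxer receives. This sketch is purely illustrative: the struct,
its "0 means not provided" convention and the function name are assumptions,
and it is not how avimux is implemented.

```c
#include <assert.h>

/* Hypothetical summary of what upstream provided; 0 means "not provided". */
struct stream_props {
    int is_video;          /* 1 for a video stream, 0 for an audio stream */
    int width, height;     /* video only */
    int rate, channels;    /* audio only */
};

/* Returns 1 if every property needed for an AVI-style header is known,
 * 0 if something is missing and a stream parser would be needed upstream. */
int avi_header_props_complete(const struct stream_props *p)
{
    assert(p != 0);
    if (p->is_video)
        return p->width > 0 && p->height > 0;
    return p->rate > 0 && p->channels > 0;
}
```

When the check fails, the fix in a pipeline is to insert a parser element in
front of the muxer so that the missing properties get filled in, rather than
to teach the demuxer to parse its substreams.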

On properties that can be described by one of these (properties such as 'width',
'height', 'fps', etc.): they're forbidden and should be handled using filtered

Status of this document
=======================

Not all plugins strictly follow these guidelines yet, but these are the official
types. Plugins not following these specs either use extensions that should be
documented, or are buggy (and should be fixed).

Blame Ronald Bultje <rbultje@ronald.bitfreak.net> aka BBB for any mistakes in