## Problems this proposal attempts to solve

- Duplication of pipeline code for gstreamer-based applications
  wishing to encode and/or mux streams, leading to subtle differences
  and inconsistencies across those applications.

- No unified system for describing encoding targets for applications
  in a user-friendly way.

- No unified system for creating encoding targets for applications,
  resulting in duplication of code across all applications,
  differences and inconsistencies that come with that duplication, and
  applications hardcoding element names and settings, resulting in poor
  portability.
## Proposed solutions
1. Convenience encoding element

Create a convenience `GstBin` for encoding and muxing several streams,
hereafter called 'EncodeBin'.

This element will only contain a single property, which is a profile.

2. Define an encoding profile system

3. Encoding profile helper library

Create a helper library to:

- create EncodeBin instances based on profiles, and

- help applications to create/load/save/browse those profiles.
## EncodeBin

EncodeBin is a `GstBin` subclass.

It implements the `GstTagSetter` interface, by which it will proxy the
tags to the muxer.

Only two introspectable properties (i.e. usable without extra API):

- A `GstEncodingProfile`
- The name of the profile to use
When a profile is selected, encodebin will:

- Add REQUEST sinkpads for all the `GstStreamProfile`
- Create the muxer and expose the source pad

Whenever a request pad is created, encodebin will:

- Create the chain of elements for that pad
- Ghost the sink pad
- Return that ghost pad
This allows reducing the code to the minimum for applications wishing to
encode a source for a given profile:

    encbin = gst_element_factory_make ("encodebin", NULL);
    g_object_set (encbin, "profile", "N900/H264 HQ", NULL);
    gst_element_link (encbin, filesink);

    vsrcpad = gst_element_get_static_pad (source, "src1");
    vsinkpad = gst_element_get_request_pad (encbin, "video_%u");
    gst_pad_link (vsrcpad, vsinkpad);
### Explanation of the various stages in EncodeBin

This describes the various stages which can happen in order to end up
with a multiplexed stream that can then be stored or streamed.
The streams fed to EncodeBin can be of various types:

- Video
  - Uncompressed (but maybe subsampled)
  - Compressed
- Audio
  - Uncompressed (audio/x-raw)
  - Compressed
- Timed text
- Private streams
#### Steps involved for raw video encoding

1) Transform raw video feed (optional)
Here we modify the various fundamental properties of a raw video stream
to be compatible with the intersection of:

- The encoder `GstCaps`, and
- The specified "Stream Restriction" of the profile/target

The fundamental properties that can be modified are:

- width/height: This is done with a video scaler. The DAR (Display
  Aspect Ratio) MUST be respected. If needed, black borders can be
  added to comply with the target DAR.
- framerate
- format/colorspace/depth: All of this is done with a colorspace
  converter
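The width/height step above amounts to scale-to-fit arithmetic. The
following self-contained sketch (plain C; the function and struct names
are hypothetical illustrations, not part of the proposed API) shows how
a scaled size and the black borders could be computed while respecting
the source DAR:

```c
#include <assert.h>

/* Illustrative only: given a source resolution and a target resolution
 * imposed by the profile's stream restriction, compute the scaled size
 * that preserves the source aspect ratio, plus the black borders
 * (letterbox/pillarbox) needed to fill the target frame. */
typedef struct { int w, h, border_x, border_y; } ScaledSize;

static ScaledSize
fit_preserving_dar (int src_w, int src_h, int dst_w, int dst_h)
{
  ScaledSize r;
  /* Compare src_w/src_h with dst_w/dst_h by cross-multiplication to
   * stay in integer arithmetic. */
  if ((long) src_w * dst_h > (long) dst_w * src_h) {
    /* Source is wider than target: full width, letterbox top/bottom. */
    r.w = dst_w;
    r.h = (int) ((long) src_h * dst_w / src_w);
  } else {
    /* Source is taller (or equal): full height, pillarbox left/right. */
    r.h = dst_h;
    r.w = (int) ((long) src_w * dst_h / src_h);
  }
  r.border_x = (dst_w - r.w) / 2;
  r.border_y = (dst_h - r.h) / 2;
  return r;
}
```

For example, a 4:3 source of 640x480 fitted into an 800x480 target frame
keeps its 640x480 size and gets 80-pixel black bars on the left and
right.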
2) Actual encoding (optional for raw streams)

An encoder (with some optional settings) is used.

3) Muxing

A muxer (with some optional settings) is used.

4) Outgoing encoded and muxed stream
#### Steps involved for raw audio encoding

This is roughly the same as for raw video, except for (1).

1) Transform raw audio feed (optional)

We modify the various fundamental properties of a raw audio stream to be
compatible with the intersection of:

- The encoder `GstCaps`, and
- The specified "Stream Restriction" of the profile/target

The fundamental properties that can be modified are:

- Number of channels
- Type of raw audio (integer or floating point)
- Depth (number of bits required to encode one sample)
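As a trivial illustration of the channel adaptation, the sketch below
(hypothetical helper name, not proposed API) clamps a source channel
count into a restriction range such as the `channels=[1,2]` restriction
used later in this document:

```c
#include <assert.h>

/* Illustrative only: pick the closest channel count allowed by a
 * [min,max] restriction range. */
static int
clamp_channels (int src_channels, int min_channels, int max_channels)
{
  if (src_channels < min_channels)
    return min_channels;   /* would require upmixing */
  if (src_channels > max_channels)
    return max_channels;   /* would require downmixing */
  return src_channels;     /* already compatible, pass through */
}
```

A 5.1 (6-channel) source fed to a profile restricted to `channels=[1,2]`
would thus be downmixed to stereo.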
#### Steps involved for encoded audio/video streams

Steps (1) and (2) are replaced by a parser if a parser is available for
the given format.
#### Steps involved for other streams

Other streams will just be forwarded as-is to the muxer, provided the
muxer accepts the stream type.
## Encoding Profile System

This work is based on:

- The existing [GstPreset][gst-preset] system for elements

- The gnome-media [GConf audio profile system][gconf-audio-profile]

- The investigation done into device profiles by Arista and
  Transmageddon: [Research on a Device Profile API][device-profile-api]
  and [Research on defining presets usage][preset-usage].
### Terminology
- Encoding Target Category

  A Target Category is a classification of devices/systems/use-cases for
  encoding.

  Such a classification is required in order for:

  - Applications with a very specific use-case to limit the number of
    profiles they can offer the user. A screencasting application has no
    use for the online services targets, for example.
  - Offering the user some initial classification in the case of a more
    generic encoding application (like a video editor or a transcoder).

  Ex: Consumer devices, Online services, Intermediate Editing Format,
  Screencast, Capture, Computer
- Encoding Profile Target

  A Profile Target describes a specific entity for which we wish to
  encode. A Profile Target must belong to at least one Target Category.
  It will define at least one Encoding Profile.

  Examples (with category): Nokia N900 (Consumer device), Sony
  PlayStation 3 (Consumer device), Youtube (Online service), DNxHD
  (Intermediate editing format), HuffYUV (Screencast), Theora (Computer)
- Encoding Profile

  A specific combination of muxer, encoders and presets.

  Examples: Nokia N900/H264 HQ, Ipod/High Quality, DVD/Pal,
  Youtube/High Quality, HTML5/Low Bandwidth, DNxHD
### 2.3 Encoding Profile

An encoding profile requires the following information:
- Name: This string is not translatable and must be unique. A
  recommendation to guarantee uniqueness of the naming could be:
- Description: This is a translatable string describing the profile.
- Muxing format: This is a string containing the GStreamer media-type
  of the container format.
- Muxing preset: This is an optional string describing the preset(s) to
  use on the muxer.
- Multipass setting: This is a boolean describing whether the profile
  requires several passes.
- List of Stream Profiles
#### 2.3.1 Stream Profiles
A Stream Profile consists of:

- Type: The type of stream profile (audio, video, text, private-data)
- Encoding Format: This is a string containing the GStreamer media-type
  of the encoding format to be used. If encoding is not to be applied,
  the raw audio media type will be used.
- Encoding preset: This is an optional string describing the preset(s)
  to use on the encoder.
- Restriction: This is an optional `GstCaps` containing the restriction
  of the stream that can be fed to the encoder. This will generally
  contain restrictions in video width/height/framerate or audio depth.
- Presence: This is an integer specifying how many streams can be used
  in the containing profile. 0 means that any number of streams can be
  used.
- Pass: This is an integer which is only meaningful if the multipass
  flag has been set in the profile. If it has been set it indicates
  which pass this Stream Profile corresponds to.
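To make the field lists above concrete, here is an illustrative plain-C
sketch of the two records, together with the one consistency rule they
imply (the pass field is only meaningful when the profile is multipass).
All names are hypothetical; the actual proposed API is in the
accompanying gstprofile.h.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative mirror of the fields listed above, NOT the proposed API. */
typedef enum { STREAM_AUDIO, STREAM_VIDEO, STREAM_TEXT, STREAM_PRIVATE } StreamType;

typedef struct {
  StreamType type;
  const char *format;      /* GStreamer media-type, e.g. "video/x-h264" */
  const char *preset;      /* optional encoder preset(s), may be NULL */
  const char *restriction; /* optional caps string, may be NULL */
  int presence;            /* 0 = any number of such streams allowed */
  int pass;                /* only meaningful when multipass is set */
} StreamProfile;

typedef struct {
  const char *name;        /* unique, not translatable */
  const char *description; /* translatable */
  const char *format;      /* container media-type */
  const char *preset;      /* optional muxer preset(s) */
  int multipass;           /* boolean */
  const StreamProfile *streams;
  size_t n_streams;
} EncodingProfile;

/* A profile is inconsistent if a stream sets a pass number while the
 * profile is not marked multipass. */
static int
encoding_profile_is_consistent (const EncodingProfile *p)
{
  size_t i;
  for (i = 0; i < p->n_streams; i++)
    if (!p->multipass && p->streams[i].pass != 0)
      return 0;
  return 1;
}
```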
### 2.4 Example profile

The representation used here is XML, only as an example. No decision is
made as to which formatting to use for storing targets and profiles.
    <gst-encoding-target>
      <name>Nokia N900</name>
      <category>Consumer Device</category>
      <profiles>
        <profile>Nokia N900/H264 HQ</profile>
        <profile>Nokia N900/MP3</profile>
        <profile>Nokia N900/AAC</profile>
      </profiles>
    </gst-encoding-target>

    <gst-encoding-profile>
      <name>Nokia N900/H264 HQ</name>
      <description>
        High Quality H264/AAC for the Nokia N900
      </description>
      <format>video/quicktime,variant=iso</format>
      <streams>
        <stream-profile>
          <type>audio</type>
          <format>audio/mpeg,mpegversion=4</format>
          <preset>Quality High/Main</preset>
          <restriction>audio/x-raw,channels=[1,2]</restriction>
          <presence>1</presence>
        </stream-profile>
        <stream-profile>
          <type>video</type>
          <format>video/x-h264</format>
          <preset>Profile Baseline/Quality High</preset>
          <restriction>
            video/x-raw,width=[16, 800],\
            height=[16, 480],framerate=[1/1, 30000/1001]
          </restriction>
          <presence>1</presence>
        </stream-profile>
      </streams>
    </gst-encoding-profile>
A proposed C API is contained in the gstprofile.h file in this
directory.
### Modifications required in the existing GstPreset system

#### Temporary presets

Currently a preset needs to be saved on disk in order to be used.

This makes it impossible to have temporary presets (that exist only
during the lifetime of a process), which might be required in the new
proposed profile system.
#### Categorisation of presets

Currently presets are just aliases of a group of property/value pairs,
without any meaning or explanation as to how they exclude each other.

Take for example the H264 encoder. It can have presets for:

- passes (1, 2 or 3 passes)
- profiles (Baseline, Main, ...)
- quality (Low, Medium, High)

In order to programmatically know which presets exclude each other, we
here propose the categorisation of these presets.
This can be done in one of two ways:

1. In the name, by making the name be `[<category>:]<name>`. This would
   give for example: "Quality:High", "Profile:Baseline"
2. By adding a new `_meta` key. This would give for example:
   `_meta/category:quality`
#### Aggregation of presets

There can be more than one choice of presets to be done for an element
(quality, profile, pass).

This means that one cannot currently describe the full configuration of
an element with a single string but with many.

The proposal here is to extend the `GstPreset` API to be able to set all
presets using one string and a well-known separator ('/').

This change only requires changes in the core preset handling code.

This would allow doing the following:
`gst_preset_load_preset (h264enc, "pass:1/profile:baseline/quality:high")`
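A sketch of how such an aggregated preset string could be decomposed,
using the '/' separator between presets and the `[<category>:]<name>`
convention from the previous section (plain C, hypothetical names,
independent of the real `GstPreset` code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PRESETS 8

/* Illustrative only: one "category:name" reference per preset. */
typedef struct { char category[32]; char name[32]; } PresetRef;

/* Split "category:name/category:name/..." into individual preset
 * references; an entry without ':' has an empty category. Returns the
 * number of presets found. */
static int
parse_preset_string (const char *str, PresetRef out[MAX_PRESETS])
{
  char buf[256];
  char *tok;
  int n = 0;

  strncpy (buf, str, sizeof (buf) - 1);
  buf[sizeof (buf) - 1] = '\0';

  for (tok = strtok (buf, "/"); tok != NULL && n < MAX_PRESETS;
       tok = strtok (NULL, "/")) {
    const char *colon = strchr (tok, ':');
    if (colon) {
      /* explicit "[<category>:]<name>" form */
      snprintf (out[n].category, sizeof (out[n].category), "%.*s",
          (int) (colon - tok), tok);
      snprintf (out[n].name, sizeof (out[n].name), "%s", colon + 1);
    } else {
      /* category omitted */
      out[n].category[0] = '\0';
      snprintf (out[n].name, sizeof (out[n].name), "%s", tok);
    }
    n++;
  }
  return n;
}
```

Parsing "pass:1/profile:baseline/quality:high" with this sketch yields
three (category, name) pairs, one per exclusive preset category.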
### Points to be determined

This document hasn't yet determined how to solve the following problems:

#### Storage of profiles

One proposal for storage would be to use a system-wide directory (like
$prefix/share/gstreamer-0.10/profiles) and store XML files for every
individual profile.

Users could then add their own profiles in ~/.gstreamer-0.10/profiles

This poses some limitations as to what to do if some applications want
to have some profiles limited to their own usage.
## Helper library for profiles

These helper methods could also be added to existing libraries (like
`GstPreset`, `GstPbUtils`, ...).

The various APIs proposed are in the accompanying gstprofile.h file.

### Getting user-readable names for formats

This is already provided by `GstPbUtils`.
### Hierarchy of profiles

The goal is for applications to be able to present to the user a list of
combo-boxes for choosing their output profile:

    [ Category ]         # optional, depends on the application
    [ Device/Site/.. ]   # optional, depends on the application
    [ Profile ]

Convenience methods are offered to easily get lists of categories,
devices, and profiles.
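The kind of convenience lookup involved can be illustrated by filtering
a flat table of (category, target, profile) entries; all names below are
illustrative, not the proposed helper API:

```c
#include <assert.h>
#include <string.h>

/* Illustrative only: a flat registry of profiles with their target and
 * category, as the helper library might expose. */
typedef struct { const char *category, *target, *profile; } ProfileEntry;

static const ProfileEntry registry[] = {
  { "Consumer device", "Nokia N900", "Nokia N900/H264 HQ" },
  { "Consumer device", "Nokia N900", "Nokia N900/AAC" },
  { "Online service",  "Youtube",    "Youtube/High Quality" },
};

/* Collect the distinct targets belonging to one category, e.g. to fill
 * the second combo-box once a category is chosen. Returns the count. */
static int
targets_for_category (const char *category, const char *out[], int max)
{
  int i, n = 0;
  for (i = 0; i < (int) (sizeof (registry) / sizeof (registry[0])); i++) {
    int j, seen = 0;
    if (strcmp (registry[i].category, category) != 0)
      continue;
    for (j = 0; j < n; j++)
      if (strcmp (out[j], registry[i].target) == 0)
        seen = 1;
    if (!seen && n < max)
      out[n++] = registry[i].target;
  }
  return n;
}
```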
### Creating Profiles

The goal is for applications to be able to easily create profiles.

The application needs a fast/efficient way to:

- select a container format and see all compatible streams that can be
  used with it.
- select a codec format and see which container formats can be used
  with it.

The remaining parts concern the restrictions to encoder input.
### Ensuring availability of plugins for Profiles

When an application wishes to use a Profile, it should be able to query
whether it has all the needed plugins to use it.

This part will use `GstPbUtils` to query, and if needed install, the
missing plugins through the distribution's plugin installer.
## Use-cases researched

This is a list of various use-cases where encoding/muxing is being used.

### Transcoding

The goal is to convert any input file for a target use with as little
loss of quality as possible. A specific variant of this is transmuxing
(see below).

Example applications: Arista, Transmageddon
### Rendering timelines

The incoming streams are a collection of various segments that need to
be rendered. Those segments can vary in nature (i.e. the video
width/height can change). This requires the use of identity with the
single-segment property activated to transform the incoming collection
of segments to a single continuous segment.

Example applications: PiTiVi, Jokosher
### Encoding of live sources

The major risk to take into account is the encoder not encoding the
incoming stream fast enough. This is outside of the scope of encodebin,
and should be solved by using queues between the sources and encodebin,
as well as by implementing QoS in encoders and sources (the encoders
emitting QoS events, and the upstream elements adapting themselves
accordingly).

Example applications: camerabin, cheese
### Screencasting applications

This is similar to encoding of live sources. The difference is that,
due to the nature of the source (size and amount/frequency of updates),
one might want to do the encoding in two parts:

- The actual live capture is encoded with an 'almost-lossless' codec
  (such as huffyuv)
- Once the capture is done, the file created in the first step is then
  rendered to the desired target format.

Fixing sources to only emit region updates and having encoders capable
of encoding those streams would remove the need for the first step, but
is outside of the scope of encodebin.

Example applications: Istanbul, gnome-shell, recordmydesktop
### Live transcoding

This is the case of an incoming live stream which will be
broadcast/transmitted live. One issue to take into account is to
reduce the encoding latency to a minimum. This should mostly be done by
picking low-latency encoders.

Example applications: Rygel, Coherence
### Transmuxing

Given a certain file, the aim is to remux the contents WITHOUT decoding
into either a different container format or the same container format.
Remuxing into the same container format is useful when the file was not
created properly (for example, the index is missing). Whenever
available, parsers should be applied on the encoded streams to validate
and/or fix the streams before muxing them.

Metadata from the original file must be kept in the newly created file.

Example applications: Arista, Transmageddon
### Loss-less cutting

Given a certain file, the aim is to extract a certain part of the file
without going through the process of decoding and re-encoding that file.
This is similar to the transmuxing use-case.

Example applications: PiTiVi, Transmageddon, Arista, ...
### Multi-pass encoding

Some encoders allow doing a multi-pass encoding. The initial pass(es)
are only used to collect encoding estimates and are not actually muxed
and output. The final pass uses the previously collected information,
and the result is then muxed and output.
### Archiving and intermediary format

The requirement is to have lossless encoding of the streams.
### CD ripping

Example applications: Sound-juicer
### DVD ripping

Example application: Thoggen
### Research links

Some of these are still active documents, some others are not.
[gst-preset]: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPreset.html
[gconf-audio-profile]: http://www.gnome.org/~bmsmith/gconf-docs/C/gnome-media.html
[device-profile-api]: http://gstreamer.freedesktop.org/wiki/DeviceProfile (FIXME: wiki is gone)
[preset-usage]: http://gstreamer.freedesktop.org/wiki/PresetDesign (FIXME: wiki is gone)