This document describes the events and objects involved in stream
selection in GStreamer pipelines, elements and applications.
12 Update to reflect design changes
19 This new API is intended to address the use cases described in
22 1) As a user/app I want an overview and control of the media streams
23 that can be configured within a pipeline for processing, even
24 when some streams are mutually exclusive or logical constructs only.
2) The user/app can entirely disable streams it's not interested
in so they don't occupy memory or processing power: they are discarded
as early as possible in the pipeline. The user/app can also
(re-)enable them at a later time.
3) If the set of possible stream configurations is changing,
the user/app should be aware of the pending change and
be able to make configuration choices for the new set of streams,
as well as possibly still reconfiguring the old set.
36 4) Elements that have some other internal mechanism for triggering
37 stream selections (DVD, or maybe some scripted playback
38 playlist) should be able to trigger 'selection' of some particular
5) Indicate known relationships between streams, for example that
two separate video feeds represent the two views of a stereoscopic
presentation, or that certain streams are mutually exclusive.
45 > Note: the streams that are "available" are not automatically
46 > the ones active, or present in the pipeline as pads. Think HLS/DASH
51 1) Playing an MPEG-TS multi-program stream, we want to tell the
52 app that there are multiple programs that could be extracted
53 from the incoming feed. Further, we want to provide a mechanism
54 for the app to select which program(s) to decode, and once
55 that is known to further tell the app which elementary streams
56 are then available within those program(s) so the app/user can
57 choose which audio track(s) to decode and/or use.
2) A new PMT arrives for an MPEG-TS stream, due to a codec or
channel change. The pipeline will need to reconfigure to
play the desired streams from the new program. Equally, there
may be multiple seconds of content buffered from the old
program, and it should still be possible to switch (for example)
subtitle tracks responsively in the data that is draining out, as
well as to select which subtitle track to play from the new feed.
This same scenario applies when doing a gapless transition to a
new source file/URL, except that the element providing
the list of streams likely also changes as a new demuxer is installed.
70 3) When playing a multi-angle DVD, the DVD Virtual Machine needs to
71 extract 1 angle from the data for presentation. It can publish
72 the available angles as logical streams, even though only one
75 4) When playing a DVD, the user can make stream selections from the
76 DVD menu to choose audio or sub-picture tracks, or the DVD VM
77 can trigger automatic selections. In addition, the player UI
78 should be able to show which audio/subtitle tracks are available
79 and allow direct selection in a GUI the same as for normal
80 files with subtitle tracks in them.
5) Playing a SCHC (3DTV) feed, where one view is MPEG-2 and the other
is H.264: the two should be combined for 3D presentation, or one
stream left undecoded when displaying in 2D.
(bug https://bugzilla.gnome.org/show_bug.cgi?id=719333)
87 FIXME - need some use cases indicating what alternate streams in
88 HLS might require - what are the possibilities?
92 Stream selection in GStreamer is implemented in several parts:
93 1) Objects describing streams : `GstStream`
94 2) Objects describing a collection of streams : `GstStreamCollection`
95 3) Events from the app allowing selection and activation of some streams:
96 `GST_EVENT_SELECT_STREAMS`
97 4) Messages informing the user/application about the available
98 streams and current status: `GST_MESSAGE_STREAM_COLLECTION` and
99 `GST_MESSAGE_STREAMS_SELECTED`
109 gst_stream_get_*(...)
110 gst_stream_set_*(...)
111 gst_event_set_stream(...)
112 gst_event_parse_stream(...)
`GstStream` objects are high-level convenience objects containing
116 information regarding a possible data stream that can be exposed by
119 They are mostly the aggregation of information present in other
120 GStreamer components (`STREAM_START`, `CAPS`, `TAGS` events) but are not
121 tied to the presence of a `GstPad`, and for some use-cases provide
122 information that the existing components don't provide.
124 The various properties of a `GstStream` object are:
125 - stream_id (from the `STREAM_START` event)
126 - flags (from the `STREAM_START` event)
129 - type (high-level type of stream: Audio, Video, Container,...)
131 `GstStream` objects can be subclassed so that they can be re-used by
132 elements already using the notion of stream (which is common for
133 example in demuxers).
135 Elements that create `GstStream` should also set it on the
136 `GST_EVENT_STREAM_START` event of the relevant pad. This helps
137 downstream elements to have all information in one location.
139 ## Exposing collections of streams
146 gst_stream_collection_new(...)
147 gst_stream_collection_add_stream(...)
148 gst_stream_collection_get_size(...)
149 gst_stream_collection_get_stream(...)
150 GST_MESSAGE_STREAM_COLLECTION
151 gst_message_new_stream_collection(...)
152 gst_message_parse_stream_collection(...)
153 GST_EVENT_STREAM_COLLECTION
154 gst_event_new_stream_collection(...)
155 gst_event_parse_stream_collection(...)
158 Elements that create new streams (such as demuxers) or can create
159 new streams (like the HLS/DASH alternative streams) can list the
160 streams they can make available with the `GstStreamCollection` object.
Other elements that might generate `GstStreamCollections` are the
DVD-VM, which handles internal switching of tracks, or parsebin and
decodebin3 when they aggregate and present multiple internal stream
sources as a single configurable collection.
167 The `GstStreamCollection` object is a flat listing of `GstStream` objects.
169 The various properties of a `GstStreamCollection` are:
171 - the identifier of the collection (unique name)
172 - Generated from the 'upstream stream id' (or stream ids, plural)
173 - the list of `GstStreams` in the collection.
174 - (Not implemented) : Flags -
175 For now, the only flag is `INFORMATIONAL` - used by container parsers to
176 publish information about detected streams without allowing selection of
178 - (Not implemented yet) : The relationship between the various streams
179 This specifies which streams are exclusive (can not be selected at the
180 same time), are related (such as `LINKED_VIEW` or `ENHANCEMENT`), or need to
181 be selected together.
An element will inform outside components about that collection via:

* a `GST_MESSAGE_STREAM_COLLECTION` message on the bus.
* a `GST_EVENT_STREAM_COLLECTION` event on each source pad.
188 Applications and container bin elements can listen and collect the
189 various stream collections to know the full range of streams
190 available within a bin/pipeline.
192 Once posted on the bus, a `GstStreamCollection` is immutable. It is
193 updated by subsequent messages with a matching identifier.
195 If the element that provided the collection goes away, there is no way
196 to know that the streams are no longer valid (without having the
197 user/app track that element). The exception to that is if the bin
198 containing that element (such as parsebin or decodebin3) informs that
199 the next collection is a replacement of the former one.
201 The mutual exclusion and relationship lists use stream-ids
202 rather than `GstStream` references in order to avoid circular
203 referencing problems.
205 ### Usage from elements
207 When a demuxer knows the list of streams it can expose, it
208 creates a new `GstStream` for each stream it can provide with the
209 appropriate information (stream id, flag, tags, caps, ...).
The demuxer then creates a `GstStreamCollection` object in which it
puts the list of `GstStream` objects it can expose. That collection is
then both posted on the bus (via a `GST_MESSAGE_STREAM_COLLECTION`) and
sent on each pad (via a `GST_EVENT_STREAM_COLLECTION`).
216 That new collection must be posted on the bus *before* the changes
217 are made available. i.e. before pads corresponding to that selection
220 In order to be backwards-compatible and support elements that don't
221 create streams/collection yet, the new 'parsebin' element used by
222 decodebin3 will automatically create those if not provided.
224 ### Usage from application
226 Applications can know what streams are available by listening to the
227 `GST_MESSAGE_STREAM_COLLECTION` messages posted on the bus.
229 The application can list the available streams per-type (such as all
230 the audio streams, or all the video streams) by iterating the
231 streams available in the collection by `GST_STREAM_TYPE`.
The application can also use this stream information to
decide which streams should be activated or not (see the stream
selection event below).
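
The per-type listing described above can be sketched without linking against GStreamer. This is a minimal standalone model, not the library API: `ModelStream`, the `MODEL_TYPE_*` values and `model_streams_of_type()` are hypothetical names invented for illustration. In a real application the loop would walk the collection with `gst_stream_collection_get_size()` and `gst_stream_collection_get_stream()`, and read each stream's type with `gst_stream_get_stream_type()`.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for GstStreamType: a bit mask (hypothetical names and values,
 * not the GStreamer definitions). */
enum {
  MODEL_TYPE_AUDIO = 1 << 0,
  MODEL_TYPE_VIDEO = 1 << 1,
  MODEL_TYPE_TEXT  = 1 << 2
};

/* Stand-in for the fields of a GstStream that matter here. */
typedef struct {
  const char *stream_id;
  unsigned int type;            /* OR-ed MODEL_TYPE_* bits */
} ModelStream;

/* Store pointers to the streams whose type intersects `mask` into `out`
 * (capacity `out_len`), in collection order. Returns how many were stored. */
static size_t
model_streams_of_type (const ModelStream *streams, size_t n_streams,
    unsigned int mask, const ModelStream **out, size_t out_len)
{
  size_t i, found = 0;

  assert (streams != NULL && out != NULL);

  for (i = 0; i < n_streams && found < out_len; i++) {
    if (streams[i].type & mask)
      out[found++] = &streams[i];
  }
  return found;
}
```

Because the type is modelled as a bit mask, one call can list a single type (`MODEL_TYPE_AUDIO`) or several at once (`MODEL_TYPE_AUDIO | MODEL_TYPE_TEXT`).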
237 ### Backwards compatibility
239 Not all demuxers will create the various `GstStream` and
240 `GstStreamCollection` objects. In order to remain backwards
241 compatible, a parent bin (parsebin in decodebin3) will create the
242 `GstStream` and `GstStreamCollection` based on the pads being
243 added/removed from an element.
245 This allows providing stream listing/selection for any demuxer-like
246 element even if it doesn't implement the `GstStreamCollection` usage.
248 ## Stream selection event
254 GST_EVENT_SELECT_STREAMS
255 gst_event_new_select_streams(...)
256 gst_event_parse_select_streams(...)
259 Stream selection events are generated by the application and sent into the
260 pipeline to configure the streams.
263 * List of `GstStreams` to activate - a subset of the `GstStreamCollection`
264 * (Not implemented) - List of `GstStreams` to be kept discarded - a
265 subset of streams for which hot-swapping will not be desired,
266 allowing elements (such as decodebin3, demuxers, ...) to not parse or
267 buffer those streams at all.
269 ### Usage from application
271 There are two use-cases where an application needs to specify in a
272 generic fashion which streams it wants in output:
274 1) When there are several present streams of which it only wants a
275 subset (such as one audio, one video and one subtitle
276 stream). Those streams are demuxed and present in the pipeline.
2) When the stream the user wants requires some element to undertake
some action to expose that stream in the pipeline (such as
DASH/HLS alternative streams).
281 From the point of view of the application, those two use-cases are
282 treated identically. The streams are all available through the
283 `GstStreamCollection` posted on the bus, and it will select a subset.
The application can select the streams it wants by creating a
`GST_EVENT_SELECT_STREAMS` event with the list of stream-ids of the
streams it wants. That event is then sent on the pipeline,
eventually traveling all the way upstream from each sink.
290 In some cases, selecting one stream may trigger the availability of
291 other dependent streams, resulting in new `GstStreamCollection`
292 messages. This can happen in the case where choosing a different DVB
293 channel would create a new single-program collection.
295 ### Usage in elements
297 Elements that receive the `GST_EVENT_SELECT_STREAMS` event and that
298 can activate/deactivate streams need to look at the list of
299 stream-id contained in the event and decide if they need to do some
302 In the standard demuxer case (demuxing and exposing all streams),
303 there is nothing to do by default.
305 In decodebin3, activating or deactivating streams is taken care of by
306 linking only the streams present in the event to decoders and output
309 In the case of elements that can expose alternate streams that are
310 not present in the pipeline as pads, they will take the appropriate
311 action to add/remove those streams.
313 Containers that receive the event should pass it to any elements
314 with no downstream peers, so that streams can be configured during
315 pre-roll before a pipeline is completely linked down to sinks.
317 ## decodebin3 usage and example
319 This is an example of how decodebin3 works by using the
320 above-mentioned objects/events/messages.
322 For clarity/completeness, we will consider a MPEG-TS stream that has
323 multiple audio streams. Furthermore that stream might have changes
324 at some point (switching video codec, or adding/removing audio
327 ### Initial differences
decodebin3 is different, compared to decodebin2, in the sense that, by
default:

* It will only expose as output ghost source pads one stream of each
  type (one audio, one video, ...).
* It will only decode the exposed streams.
335 The multiqueue element is still used and takes in all elementary
336 (non-decoded) streams. If parsers are needed/present they are placed
337 before the multiqueue. This is needed in order for multiqueue to
338 work only with packetized and properly timestamped streams.
340 Note that the whole typefinding of streams, and optional depayloading,
341 demuxing and parsing are done in a new 'parsebin' element.
343 Just like the current implementation, demuxers will expose all
344 streams present within a program as source pads. They will connect
345 to parsers and multiqueue.
347 Initial setup. 1 video stream, 2 audio streams.

      +---------------------+
      | ---------           | +-------------+
      | | demux |--[parser]-+-| multiqueue  |--[videodec]---[
    ]-+-|       |--[parser]-+-|             |
      | |       |--[parser]-+-|             |--[audiodec]---[
      | ---------           | +-------------+
      +---------------------+

360 ### GstStreamCollection
When parsing the initial PAT/PMT, the demuxer will:

1) Create the various `GstStream` objects for each stream.
2) Create the `GstStreamCollection` for that initial PMT.
3) Post the `GST_MESSAGE_STREAM_COLLECTION`. Decodebin will intercept that
   message and know what the demuxer will be exposing.
4) Create the various pads and send the corresponding
   `STREAM_START` event (with the same stream-id as the corresponding
   `GstStream` objects), `CAPS` event, and `TAGS` event.
371 * parsebin will add all relevant parsers and expose those streams.
373 * Decodebin will be able to correlate, based on `STREAM_START` event
374 stream-id, what pad corresponds to which stream. It links each stream
375 from parsebin to multiqueue.
377 * Decodebin knows all the streams that will be available. Since by
378 default it is configured to only expose a stream of each type, it
379 will pick a stream of each for which it will complete the
380 auto-plugging (finding a decoder and then exposing that stream as a
> Note: If the demuxer doesn't create/post the `GstStreamCollection`,
> parsebin will create it itself, as explained in the "Backwards
> compatibility" section.
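
The default choice of "one stream of each type" can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions: `PickStream`, the `PICK_TYPE_*` values and `pick_default_selection()` are hypothetical names, and picking the *first* stream of each type in collection order is an assumption made for illustration, since the text above only says that decodebin picks "a" stream of each type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the stream types (not GStreamer values). */
enum {
  PICK_TYPE_AUDIO = 1 << 0,
  PICK_TYPE_VIDEO = 1 << 1,
  PICK_TYPE_TEXT  = 1 << 2
};

typedef struct {
  const char *stream_id;
  unsigned int type;            /* exactly one PICK_TYPE_* bit in this sketch */
} PickStream;

/* Write into `out` (capacity: n_streams entries) the id of the first stream
 * of each type, in collection order. Returns the number of ids written. */
static size_t
pick_default_selection (const PickStream *streams, size_t n_streams,
    const char **out)
{
  unsigned int seen = 0;
  size_t i, n_out = 0;

  assert (streams != NULL && out != NULL);

  for (i = 0; i < n_streams; i++) {
    if (seen & streams[i].type)
      continue;                 /* a stream of this type is already chosen */
    seen |= streams[i].type;
    out[n_out++] = streams[i].stream_id;
  }
  return n_out;
}
```

With one video stream followed by two audio streams, as in the setup above, the sketch selects the video stream and the first audio stream.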
387 ### Changing the active selection from the application
389 The user wants to change the audio track. The application received
390 the `GST_MESSAGE_STREAM_COLLECTION` containing the list of available
391 streams. For clarity, we will assume those stream-ids are
392 "video-main", "audio-english" and "audio-french".
The user prefers to use the French soundtrack (which the application
can identify from the language tag contained in the `GstStream` objects).

The application will create and send a `GST_EVENT_SELECT_STREAMS` event
containing the list of streams: "video-main", "audio-french".

That event gets sent on the pipeline; the sinks send it upstream and
it eventually reaches decodebin.
* The currently active selection ("video-main", "audio-english")
* The available stream collection ("video-main", "audio-english",
  "audio-french")
* The list of streams in the event ("video-main", "audio-french")

Decodebin determines that no change is required for "video-main",
but sees that it needs to deactivate "audio-english" and activate
"audio-french".
413 It unlinks the multiqueue source pad connected to the audiodec. Then
414 it queries audiodec, using the `GST_QUERY_ACCEPT_CAPS`, whether it can
415 accept as-is the caps from the "audio-french" stream.
416 1) If it does, the multiqueue source pad corresponding to
417 "audio-french" is linked to the decoder.
418 2) If it does not, the existing audio decoder is removed,
419 a new decoder is selected (like during initial
420 auto-plugging), and replaces the old audio decoder element.
422 The newly selected stream gets decoded and output through the same
423 pad as the previous audio stream.
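
The comparison between the active selection and the requested one is a pair of set differences over stream-ids. The sketch below models it in standalone C; `id_in_list()` and `ids_only_in()` are hypothetical helpers, not decodebin3 code. In a real element the requested ids would come from `gst_event_parse_select_streams()`. Calling `ids_only_in (active, requested)` yields the streams to deactivate, and swapping the arguments yields the streams to activate; ids present in both lists need no change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `id` appears in the `n`-entry list `ids`. */
static bool
id_in_list (const char *id, const char *const *ids, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++) {
    if (strcmp (ids[i], id) == 0)
      return true;
  }
  return false;
}

/* Store in `out` (capacity: n_a entries) the ids of `a` that are absent
 * from `b`, keeping their order. Returns the number of ids stored. */
static size_t
ids_only_in (const char *const *a, size_t n_a,
    const char *const *b, size_t n_b, const char **out)
{
  size_t i, n_out = 0;

  assert (out != NULL);

  for (i = 0; i < n_a; i++) {
    if (!id_in_list (a[i], b, n_b))
      out[n_out++] = a[i];
  }
  return n_out;
}
```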
The default behaviour would be to only expose one stream of each
type. But nothing prevents decodebin from outputting more or fewer
streams of each type if the `GST_EVENT_SELECT_STREAMS` event specifies
that. This allows covering more use cases than simple playback, for
example:

* Wanting just a video stream or just an audio stream
* Wanting all decoded streams
* Wanting all audio streams
436 ### Changes coming from upstream
438 At some point in time, a PMT change happens. Let's assume a change
439 in video-codec and/or PID.
441 The demuxer creates a new `GstStream` for the changed/new stream,
442 creates a new GstStreamCollection for the updated PMT and posts it.
444 Decodebin sees the new `GstStreamCollection` message.
446 The demuxer (and parsebin) then adds and removes pads.
447 1) decodebin will match the new pads to `GstStream` in the "new"
448 `GstStreamCollection` the same way it did for the initial pads in
450 2) decodebin will see whether the new stream can re-use a multiqueue
451 slot used by a stream of the same type no longer present (it
452 compares the old collection to the new collection).
453 In this case, decodebin sees that the new video stream can re-use
454 the same slot as the previous video stream.
3) If the new stream is going to be active by default (in this case
it is, because we are replacing the only video stream, which was
active), it will check whether the caps are compatible with the
existing videodec (in the same way it was done for the audio
decoder switch in the "Changing the active selection from the
application" section).
461 Eventually, the stream that switched will be decoded and output
462 through the same pad as the previous video stream in a gapless fashion.
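
The slot re-use check in step 2 can be modelled as: find a stream in the old collection that has the same type as the incoming stream and is no longer listed in the new collection. The standalone C sketch below illustrates that comparison only; `SlotStream`, `slot_collection_has()` and `slot_find_reusable()` are hypothetical names, the stream-ids used with it are made up, and the sketch does not track slots that another incoming stream has already claimed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the stream types (not GStreamer values). */
enum { SLOT_TYPE_AUDIO = 1, SLOT_TYPE_VIDEO = 2 };

typedef struct {
  const char *stream_id;
  int type;
} SlotStream;

/* True if collection `c` lists a stream with the given id. */
static bool
slot_collection_has (const SlotStream *c, size_t n, const char *stream_id)
{
  size_t i;

  for (i = 0; i < n; i++) {
    if (strcmp (c[i].stream_id, stream_id) == 0)
      return true;
  }
  return false;
}

/* Return the index in `old_c` of a stream whose slot `incoming` may take
 * over (same type, and gone from `new_c`), or -1 if there is none. */
static int
slot_find_reusable (const SlotStream *old_c, size_t n_old,
    const SlotStream *new_c, size_t n_new, const SlotStream *incoming)
{
  size_t i;

  assert (incoming != NULL);

  for (i = 0; i < n_old; i++) {
    if (old_c[i].type == incoming->type
        && !slot_collection_has (new_c, n_new, old_c[i].stream_id))
      return (int) i;
  }
  return -1;
}
```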
468 There is a main (multi-bitrate or not) stream with audio and
469 video interleaved in MPEG-TS. The manifest also indicates the
470 presence of alternate language audio-only streams.
471 HLS would expose one collection containing:
472 1) The main A+V CONTAINER stream (MPEG-TS), initially active,
473 downloaded and exposed as a pad
474 2) The alternate A-only streams, initially inactive and not exposed as pads
475 the tsdemux element connected to the first stream will also expose
476 a collection containing

    [ Collection 1 ]         [ Collection 2 ]
    [ (hlsdemux)   ]         [ (tsdemux)    ]
    [ upstream:nil ]    /----[ upstream:main]
    [ "main" (A+V) ]<-/      [ "video" (V)  ]   viddec1 : "video"
    [ "fre" (A)    ]         [ "eng" (A)    ]   auddec1 : "eng"

The user might want to use the Korean audio track instead of the
494 => SELECT_STREAMS ("video", "kor")
497 1) decodebin3 receives and sends the event further upstream
498 2) tsdemux sees that "video" is part of its current upstream,
499 so adds the corresponding stream-id ("main") to the event
500 and sends it upstream ("main", "video", "kor")
501 3) hlsdemux receives the event
502 => It activates "kor" in addition to "main"
503 4) The event travels back to decodebin3 which will remember the
504 requested selection. If "kor" is already present it will switch
505 the "eng" stream from the audio decoder to the "kor" stream.
506 If it appears a bit later, it will wait until that "kor" stream
507 is available before switching
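
Step 2 above, where tsdemux adds its upstream stream-id to the event, can be sketched as a small list operation in standalone C. `sel_contains()` and `sel_add_upstream()` are hypothetical helpers written for this illustration, not tsdemux code. The upstream id is prepended so that the result matches the order shown in the example ("main", "video", "kor"); whether the order matters to real elements is not stated here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `id` appears in the `n`-entry list `ids`. */
static bool
sel_contains (const char *const *ids, size_t n, const char *id)
{
  size_t i;

  for (i = 0; i < n; i++) {
    if (strcmp (ids[i], id) == 0)
      return true;
  }
  return false;
}

/* If any requested id belongs to this element's own collection (`own_ids`),
 * prepend `upstream_id` to `requested` unless it is already listed.
 * `requested` must have room for n_req + 1 entries.
 * Returns the new number of requested ids. */
static size_t
sel_add_upstream (const char **requested, size_t n_req,
    const char *const *own_ids, size_t n_own, const char *upstream_id)
{
  size_t i;

  assert (requested != NULL && upstream_id != NULL);

  if (sel_contains (requested, n_req, upstream_id))
    return n_req;               /* already requested */

  for (i = 0; i < n_req; i++) {
    if (sel_contains (own_ids, n_own, requested[i])) {
      /* Shift the existing ids up by one and put the upstream id first. */
      memmove (&requested[1], &requested[0], n_req * sizeof requested[0]);
      requested[0] = upstream_id;
      return n_req + 1;
    }
  }
  return n_req;                 /* none of our streams was requested */
}
```

With `own_ids` set to ("video", "eng") and a request for ("video", "kor"), the call returns 3 and the list becomes ("main", "video", "kor"). A request that names none of the element's streams is left untouched.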
509 #### multi-program MPEG-TS
511 Assuming the case of a MPEG-TS stream which contains multiple
There would be three "levels" of collection:

1) The collection of programs present in the stream
2) The collection of elementary streams present in a stream
3) The collection of streams decodebin can expose
518 Initially tsdemux exposes the first program present (default)

    [ Collection 1 ]         [ Collection 2     ]        [ Collection 3    ]
    [ (tsdemux)    ]         [ (tsdemux)        ]        [ (decodebin)     ]
    [ id:Programs  ]<-\      [ id:BBC1          ]<-\     [ id:BBC1-decoded ]
    [ upstream:nil ]   \-----[ upstream:Programs]   \----[ upstream:BBC1   ]
    [ "BBC1" (C)   ]         [ id:"bbcvideo"(V) ]        [ id:"bbcvideo"(V)]
    [ "ITV"  (C)   ]         [ id:"bbcaudio"(A) ]        [ id:"bbcaudio"(A)]
    [ "NBC"  (C)   ]         [                  ]        [                 ]

At some point the user wants to switch to ITV (of which we do not
know the topology at this point in time). A `SELECT_STREAMS` event
is sent with "ITV" in it and the pointer to Collection 1.
534 1) The event travels up the pipeline until tsdemux receives it
535 and begins the switch.
536 2) tsdemux publishes a new 'Collection 2a/ITV' and marks 'Collection 2/BBC'
538 2a) App may send a `SELECT_STREAMS` event configuring which demuxer output
539 streams should be selected (parsed)
540 3) tsdemux adds/removes pads as needed (flushing pads as it removes them?)
541 4) Decodebin feeds new pad streams through existing parsers/decoders as
542 needed. As data from the new collection arrives out each decoder,
543 decodebin sends new `GstStreamCollection` messages to the app so it
544 can know that the new streams are now switchable at that level.
545 4a) As new `GstStreamCollections` are published, the app may override
546 the default decodebin stream selection to expose more/fewer streams.
547 The default is to decode and output 1 stream of each type.

    [ Collection 1 ]         [ Collection 4     ]        [ Collection 5    ]
    [ (tsdemux)    ]         [ (tsdemux)        ]        [ (decodebin)     ]
    [ id:Programs  ]<-\      [ id:ITV           ]<-\     [ id:ITV-decoded  ]
    [ upstream:nil ]   \-----[ upstream:Programs]   \----[ upstream:ITV    ]
    [ "BBC1" (C)   ]         [ id:"itvvideo"(V) ]        [ id:"itvvideo"(V)]
    [ "ITV"  (C)   ]         [ id:"itvaudio"(A) ]        [ id:"itvaudio"(A)]
    [ "NBC"  (C)   ]         [                  ]        [                 ]

564 - Add missing implementation
566 - Add flags to `GstStreamCollection`
568 - Add mutual-exclusion and relationship API to `GstStreamCollection`
570 - Add helper API to figure out whether a collection is a replacement
571 of another or a completely new one. This will require a more generic
572 system to know whether a certain stream-id is a replacement of
- Is a `FLUSHING` flag for stream-selection required or not? This would
  make the handler of the `SELECT_STREAMS` event send `FLUSH START/STOP`
  before switching to the other streams. This is tricky when dealing
  with situations where we keep some streams and only switch some
  others. Do we flush all streams? Do we only flush the new streams,
  potentially resulting in a delay before the switch is complete?
  Furthermore, due to efficient buffering in decodebin3, the switching
  time has been minimized extensively, to the point where flushing might
  not bring a noticeable improvement.
- Store the stream collection in bins/pipelines? A Bin/Pipeline could
  store all active collections internally, so that they could be queried
  later on. This could be useful to then get, on any pipeline, at any
  point in time, the full list of collections available without having
  to listen to all COLLECTION messages on the bus. This would require
  fixing the "is a collection a replacement or not" issue first.
- When switching to new collections, should decodebin3 make any effort
  to *map* corresponding streams from the old to the new PMT, that is,
  try to stick to the *English* language audio track, for example?
  Alternatively, rely on the app to do such smarts with stream-select