From aff7ad108082f465b0bf8eb447f2a845a8370027 Mon Sep 17 00:00:00 2001 From: =?utf8?q?Tim-Philipp=20M=C3=BCller?= Date: Thu, 8 Dec 2016 22:58:08 +0000 Subject: [PATCH] design: move over design docs from gst-plugins-base Or most of them anyway (excl. draft-hw-acceleration and draft-va which didn't seem particularly pertinent). --- markdown/design/audiosinks.md | 129 +++ markdown/design/decodebin.md | 264 ++++++ markdown/design/encoding.md | 469 +++++++++++ markdown/design/interlaced-video.md | 102 +++ markdown/design/keyframe-force.md | 97 +++ markdown/design/mediatype-audio-raw.md | 68 ++ markdown/design/mediatype-text-raw.md | 22 + markdown/design/mediatype-video-raw.md | 1240 +++++++++++++++++++++++++++++ markdown/design/orc-integration.md | 159 ++++ markdown/design/playbin.md | 66 ++ markdown/design/stereo-multiview-video.md | 320 ++++++++ markdown/design/subtitle-overlays.md | 527 ++++++++++++ sitemap.txt | 12 + 13 files changed, 3475 insertions(+) create mode 100644 markdown/design/audiosinks.md create mode 100644 markdown/design/decodebin.md create mode 100644 markdown/design/encoding.md create mode 100644 markdown/design/interlaced-video.md create mode 100644 markdown/design/keyframe-force.md create mode 100644 markdown/design/mediatype-audio-raw.md create mode 100644 markdown/design/mediatype-text-raw.md create mode 100644 markdown/design/mediatype-video-raw.md create mode 100644 markdown/design/orc-integration.md create mode 100644 markdown/design/playbin.md create mode 100644 markdown/design/stereo-multiview-video.md create mode 100644 markdown/design/subtitle-overlays.md diff --git a/markdown/design/audiosinks.md b/markdown/design/audiosinks.md new file mode 100644 index 0000000..440f3e8 --- /dev/null +++ b/markdown/design/audiosinks.md @@ -0,0 +1,129 @@ +## Audiosink design + +### Requirements + + - must operate chain based. Most simple playback pipelines will push + audio from the decoders into the audio sink. + + - must operate getrange based Most professional audio applications + will operate in a mode where the audio sink pulls samples from the + pipeline. This is typically done in a callback from the audiosink + requesting N samples. The callback is either scheduled from a thread + or from an interrupt from the audio hardware device. + + - Exact sample accurate clocks. the audiosink must be able to provide + a clock that is sample accurate even if samples are dropped or when + discontinuities are found in the stream. + + - Exact timing of playback. The audiosink must be able to play samples + at their exact times. + + - use DMA access when possible. When the hardware can do DMA we should + use it. This should also work over bufferpools to avoid data copying + to/from kernel space. + +### Design + +The design is based on a set of base classes and the concept of a +ringbuffer of samples. + + +-----------+ - provide preroll, rendering, timing + + basesink + - caps nego + +-----+-----+ + | + +-----V----------+ - manages ringbuffer + + audiobasesink + - manages scheduling (push/pull) + +-----+----------+ - manages clock/query/seek + | - manages scheduling of samples in the ringbuffer + | - manages caps parsing + | + +-----V------+ - default ringbuffer implementation with a GThread + + audiosink + - subclasses provide open/read/close methods + +------------+ + +The ringbuffer is a contiguous piece of memory divided into segtotal +pieces of segments. Each segment has segsize bytes. 
+ + play position + v + +---+---+---+-------------------------------------+----------+ + + 0 | 1 | 2 | .... | segtotal | + +---+---+---+-------------------------------------+----------+ + <---> + segsize bytes = N samples * bytes_per_sample. + +The ringbuffer has a play position, which is expressed in segments. The +play position is where the device is currently reading samples from the +buffer. + +The ringbuffer can be put to the PLAYING or STOPPED state. + +In the STOPPED state no samples are played to the device and the play +pointer does not advance. + +In the PLAYING state samples are written to the device and the +ringbuffer should call a configurable callback after each segment is +written to the device. In this state the play pointer is advanced after +each segment is written. + +A write operation to the ringbuffer will put new samples in the +ringbuffer. If there is not enough space in the ringbuffer, the write +operation will block. The playback of the buffer never stops, even if +the buffer is empty. When the buffer is empty, silence is played by the +device. + +The ringbuffer is implemented with lockfree atomic operations, +especially on the reading side so that low-latency operations are +possible. + +Whenever new samples are to be put into the ringbuffer, the position of +the read pointer is taken. The required write position is taken and the +diff is made between the required and actual position. If the difference +is \<0, the sample is too late. If the difference is bigger than +segtotal, the writing part has to wait for the play pointer to advance. + +### Scheduling + +#### chain based mode + +In chain based mode, bytes are written into the ringbuffer. This +operation will eventually block when the ringbuffer is filled. + +When no samples arrive in time, the ringbuffer will play silence. Each +buffer that arrives will be placed into the ringbuffer at the correct +times. This means that dropping samples or inserting silence is done +automatically and very accurate and independend of the play pointer. + +In this mode, the ringbuffer is usually kept as full as possible. When +using a small buffer (small segsize and segtotal), the latency for audio +to start from the sink to when it is played can be kept low but at least +one context switch has to be made between read and write. + +#### getrange based mode + +In getrange based mode, the audiobasesink will use the callback +function of the ringbuffer to get a segsize samples from the peer +element. These samples will then be placed in the ringbuffer at the +next play position. It is assumed that the getrange function returns +fast enough to fill the ringbuffer before the play pointer reaches +the write pointer. + +In this mode, the ringbuffer is usually kept as empty as possible. +There is no context switch needed between the elements that create +the samples and the actual writing of the samples to the device. + +#### DMA mode + +Elements that can do DMA based access to the audio device have to +subclass from the GstAudioBaseSink class and wrap the DMA ringbuffer +in a subclass of GstRingBuffer. + +The ringbuffer subclass should trigger a callback after writing or +playing each sample to the device. This callback can be triggered +from a thread or from a signal from the audio device. + +### Clocks + +The GstAudioBaseSink class will use the ringbuffer to act as a clock +provider. It can do this by using the play pointer and the delay to +calculate the clock time. 
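+
+As a rough illustration, the sketch below shows how such a clock time
+could be derived. It is a minimal example in C, assuming a hypothetical
+helper that is given the number of segments already played, the segment
+size in samples, the sample rate and the current device delay in
+samples; these parameters are illustrative and not the exact
+GstAudioRingBuffer API.
+
+    #include <gst/gst.h>
+
+    /* Sketch: derive a clock time from ringbuffer progress.
+     * All parameters are illustrative, not actual ringbuffer fields. */
+    static GstClockTime
+    ring_buffer_clock_time (guint64 segments_played,
+        guint samples_per_segment, gint rate, guint64 delay_samples)
+    {
+      guint64 samples = segments_played * samples_per_segment;
+
+      /* samples still queued in the device have not been heard yet */
+      if (samples > delay_samples)
+        samples -= delay_samples;
+      else
+        samples = 0;
+
+      /* convert the sample count to nanoseconds without overflow */
+      return gst_util_uint64_scale (samples, GST_SECOND, rate);
+    }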
diff --git a/markdown/design/decodebin.md b/markdown/design/decodebin.md new file mode 100644 index 0000000..42efe75 --- /dev/null +++ b/markdown/design/decodebin.md @@ -0,0 +1,264 @@ +# Decodebin design + +## GstDecodeBin + +### Description + + - Autoplug and decode to raw media + + - Input: single pad with ANY caps + + - Output: Dynamic pads + +### Contents + + - a GstTypeFindElement connected to the single sink pad + + - optionally a demuxer/parser + + - optionally one or more DecodeGroup + +### Autoplugging + +The goal is to reach 'target' caps (by default raw media). + +This is done by using the GstCaps of a source pad and finding the +available demuxers/decoders GstElement that can be linked to that pad. + +The process starts with the source pad of typefind and stops when no +more non-target caps are left. It is commonly done while pre-rolling, +but can also happen whenever a new pad appears on any element. + +Once a target caps has been found, that pad is ghosted and the +'pad-added' signal is emitted. + +If no compatible elements can be found for a GstCaps, the pad is ghosted +and the 'unknown-type' signal is emitted. + +### Assisted auto-plugging + +When starting the auto-plugging process for a given GstCaps, two signals +are emitted in the following way in order to allow the application/user +to assist or fine-tune the process. + + - **'autoplug-continue'**: + + gboolean user_function (GstElement * decodebin, GstPad *pad, GstCaps * caps) + + This signal is fired at the very beginning with the source pad GstCaps. If + the callback returns TRUE, the process continues normally. If the + callback returns FALSE, then the GstCaps are considered as a target caps + and the autoplugging process stops. + + - **'autoplug-factories'**: + + GValueArray user_function (GstElement* decodebin, GstPad* pad, GstCaps* caps); + + Get a list of elementfactories for @pad with @caps. This function is + used to instruct decodebin2 of the elements it should try to + autoplug. The default behaviour when this function is not overriden + is to get all elements that can handle @caps from the registry + sorted by rank. + + - **'autoplug-select'**: + + gint user_function (GstElement* decodebin, GstPad* pad, GstCaps*caps, GValueArray* factories); + + This signal is fired once autoplugging has got a list of compatible + GstElementFactory. The signal is emitted with the GstCaps of the + source pad and a pointer on the GValueArray of compatible factories. + + The callback should return the index of the elementfactory in + @factories that should be tried next. + + If the callback returns -1, the autoplugging process will stop as if + no compatible factories were found. + +The default implementation of this function will try to autoplug the +first factory of the list. + +### Target Caps + +The target caps are a read/write GObject property of decodebin. + +By default the target caps are: + + - Raw audio: audio/x-raw + + - Raw video: video/x-raw + + - Raw text: text/x-raw, format={utf8,pango-markup} + +### Media chain/group handling + +When autoplugging, all streams coming out of a demuxer will be grouped +in a DecodeGroup. + +All new source pads created on that demuxer after it has emitted the +'no-more-pads' signal will be put in another DecodeGroup. + +Only one decodegroup can be active at any given time. If a new +decodegroup is created while another one exists, that decodegroup will +be set as blocking until the existing one has drained. 
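+
+Returning to the assisted auto-plugging signals above, the sketch below
+shows one possible use of 'autoplug-continue': it stops the autoplugging
+process at H.264 so the application can handle that format itself. This
+is only an example; the choice of "video/x-h264" as the stop format is
+arbitrary.
+
+    /* Return FALSE to treat these caps as target caps (stop decoding),
+     * TRUE to let decodebin continue autoplugging. */
+    static gboolean
+    on_autoplug_continue (GstElement * decodebin, GstPad * pad,
+        GstCaps * caps, gpointer user_data)
+    {
+      GstCaps *stop_caps = gst_caps_from_string ("video/x-h264");
+      gboolean cont = !gst_caps_can_intersect (caps, stop_caps);
+
+      gst_caps_unref (stop_caps);
+      return cont;
+    }
+
+    g_signal_connect (decodebin, "autoplug-continue",
+        G_CALLBACK (on_autoplug_continue), NULL);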
+ +## DecodeGroup + +### Description + +Streams belonging to the same group/chain of a media file. + +### Contents + +The DecodeGroup contains: + + - a GstMultiQueue to which all streams of a the media group are connected. + + - the eventual decoders which are autoplugged in order to produce the + requested target pads. + +### Proper group draining + +The DecodeGroup takes care that all the streams in the group are +completely drained (EOS has come through all source ghost pads). + +### Pre-roll and block + +The DecodeGroup has a global blocking feature. If enabled, all the +ghosted source pads for that group will be blocked. + +A method is available to unblock all blocked pads for that group. + +## GstMultiQueue + +Multiple input-output data queue. + +`multiqueue` achieves the same functionality as `queue`, with a +few differences: + + - Multiple streams handling. + + The element handles queueing data on more than one stream at once. + To achieve such a feature it has request sink pads (sink\_%u) and + 'sometimes' src pads (src\_%u). + + When requesting a given sinkpad, the associated srcpad for that + stream will be created. Ex: requesting sink\_1 will generate src\_1. + + - Non-starvation on multiple streams. + + If more than one stream is used with the element, the streams' + queues will be dynamically grown (up to a limit), in order to ensure + that no stream is risking data starvation. This guarantees that at + any given time there are at least N bytes queued and available for + each individual stream. + + If an EOS event comes through a srcpad, the associated queue should + be considered as 'not-empty' in the queue-size-growing algorithm. + + - Non-linked srcpads graceful handling. + + A GstTask is started for all srcpads when going to + GST\_STATE\_PAUSED. + + The task are blocking against a GCondition which will be fired in + two different cases: + + - When the associated queue has received a buffer. + + - When the associated queue was previously declared as 'not-linked' + and the first buffer of the queue is scheduled to be pushed + synchronously in relation to the order in which it arrived globally + in the element (see 'Synchronous data pushing' below). + + When woken up by the GCondition, the GstTask will try to push the + next GstBuffer/GstEvent on the queue. If pushing the + GstBuffer/GstEvent returns GST\_FLOW\_NOT\_LINKED, then the + associated queue is marked as 'not-linked'. If pushing the + GstBuffer/GstEvent succeeded the queue will no longer be marked as + 'not-linked'. + + If pushing on all srcpads returns GstFlowReturn different from + GST\_FLOW\_OK, then all the srcpads' tasks are stopped and + subsequent pushes on sinkpads will return GST\_FLOW\_NOT\_LINKED. + + - Synchronous data pushing for non-linked pads. + + In order to better support dynamic switching between streams, the + multiqueue (unlike the current GStreamer queue) continues to push + buffers on non-linked pads rather than shutting down. + + In addition, to prevent a non-linked stream from very quickly + consuming all available buffers and thus 'racing ahead' of the other + streams, the element must ensure that buffers and inlined events for + a non-linked stream are pushed in the same order as they were + received, relative to the other streams controlled by the element. + This means that a buffer cannot be pushed to a non-linked pad any + sooner than buffers in any other stream which were received before + it. + +## Parsers, decoders and auto-plugging + +This section has DRAFT status. 
+
+Some media formats come in different "flavours" or "stream formats".
+These formats differ in the way the setup data and media data are
+signalled and/or packaged. An example of this is H.264 video, where
+there is a bytestream format (with codec setup data signalled inline and
+units prefixed by a sync code and packet length information) and a "raw"
+format where codec setup data is signalled out of band (via the caps)
+and the chunking is implicit in the way the buffers were muxed into a
+container, to mention just two of the possible variants.
+
+Especially on embedded platforms it is common that decoders can only
+handle one particular stream format, and not all of them.
+
+Where there are multiple stream formats, parsers are usually expected to
+be able to convert between the different formats. This will, if
+implemented correctly, work as expected in a static pipeline such as
+
+    ... ! parser ! decoder ! sink
+
+where the parser can query the decoder's capabilities even before
+processing the first piece of data, and configure itself to convert
+accordingly, if conversion is needed at all.
+
+In an auto-plugging context this is not so straightforward, though,
+because elements are plugged incrementally, and only after the previous
+element has processed some data and decided what exactly it will output
+(unless the template caps are completely fixed, in which case
+auto-plugging can continue right away; that is not always the case here
+though, see below). A parser will thus have to decide on *some* output
+format so auto-plugging can continue. It doesn't know anything about the
+available decoders and their capabilities though, so it's possible that
+it will choose a format that is not supported by any of the available
+decoders, or by the preferred decoder.
+
+If the parser had sufficiently concise but fixed source pad template
+caps, decodebin could continue to plug a decoder right away, allowing
+the parser to configure itself in the same way as it would with a static
+pipeline. This is not an option, unfortunately, because often the parser
+needs to process some data to determine e.g. the format's profile or
+other stream properties (resolution, sample rate, channel configuration,
+etc.), and there may be different decoders for different profiles (e.g.
+a DSP codec for baseline profile and a software fallback for main/high
+profile; or a DSP codec only supporting certain resolutions, with a
+software fallback for unusual resolutions). So if decodebin just plugged
+the highest-ranking decoder, that decoder might not be able to handle
+the actual stream later on, which would yield an error (a data flow
+error at that point, which would be hard to intercept and avoid in
+decodebin). In other words, we can't solve this issue by plugging a
+decoder right away together with the parser.
+
+So decodebin needs to communicate to the parser the set of available
+decoder caps (which would contain the relevant capabilities/restrictions
+such as supported profiles, resolutions, etc.), after the usual
+"autoplug-\*" signal filtering/sorting of course.
+
+This is done by plugging a capsfilter element right after the parser,
+and constructing a set of filter caps from the list of available
+decoders (one appends at the end just the name(s) of the caps structures
+from the parser pad template caps to function as an 'ANY other' caps
+equivalent).
+This let the parser negotiate to a supported stream format in the same +way as with the static pipeline mentioned above, but of course incur +some overhead through the additional capsfilter element. + diff --git a/markdown/design/encoding.md b/markdown/design/encoding.md new file mode 100644 index 0000000..5b5ef4a --- /dev/null +++ b/markdown/design/encoding.md @@ -0,0 +1,469 @@ +## Encoding and Muxing + +## Problems this proposal attempts to solve + + - Duplication of pipeline code for gstreamer-based applications + wishing to encode and or mux streams, leading to subtle differences + and inconsistencies across those applications. + + - No unified system for describing encoding targets for applications + in a user-friendly way. + + - No unified system for creating encoding targets for applications, + resulting in duplication of code across all applications, + differences and inconsistencies that come with that duplication, and + applications hardcoding element names and settings resulting in poor + portability. + +## Goals + +1. Convenience encoding element + + Create a convenience GstBin for encoding and muxing several streams, + hereafter called 'EncodeBin'. + + This element will only contain one single property, which is a profile. + +2. Define a encoding profile system + +3. Encoding profile helper library + +Create a helper library to: + + - create EncodeBin instances based on profiles, and + + - help applications to create/load/save/browse those profiles. + +## EncodeBin + +### Proposed API + +EncodeBin is a GstBin subclass. + +It implements the GstTagSetter interface, by which it will proxy the +calls to the muxer. + +Only two introspectable property (i.e. usable without extra API): + - A GstEncodingProfile + - The name of the profile to use + +When a profile is selected, encodebin will: + + - Add REQUEST sinkpads for all the GstStreamProfile + - Create the muxer and expose the source pad + +Whenever a request pad is created, encodebin will: + + - Create the chain of elements for that pad + - Ghost the sink pad + - Return that ghost pad + +This allows reducing the code to the minimum for applications wishing to +encode a source for a given profile: + + encbin = gst_element_factory_make ("encodebin, NULL); + g_object_set (encbin, "profile", "N900/H264 HQ", NULL); + gst_element_link (encbin, filesink); + + vsrcpad = gst_element_get_src_pad (source, "src1"); + vsinkpad = gst_element_get_request\_pad (encbin, "video\_%u"); + gst_pad_link (vsrcpad, vsinkpad); + +### Explanation of the Various stages in EncodeBin + +This describes the various stages which can happen in order to end up +with a multiplexed stream that can then be stored or streamed. + +#### Incoming streams + +The streams fed to EncodeBin can be of various types: + + - Video + - Uncompressed (but maybe subsampled) + - Compressed + - Audio + - Uncompressed (audio/x-raw) + - Compressed + - Timed text + - Private streams + +#### Steps involved for raw video encoding + +0) Incoming Stream + +1) Transform raw video feed (optional) + +Here we modify the various fundamental properties of a raw video stream +to be compatible with the intersection of: \* The encoder GstCaps and \* +The specified "Stream Restriction" of the profile/target + +The fundamental properties that can be modified are: \* width/height +This is done with a video scaler. The DAR (Display Aspect Ratio) MUST be +respected. If needed, black borders can be added to comply with the +target DAR. 
\* framerate \* format/colorspace/depth All of this is done +with a colorspace converter + +2) Actual encoding (optional for raw streams) + +An encoder (with some optional settings) is used. + +3) Muxing + +A muxer (with some optional settings) is used. + +4) Outgoing encoded and muxed stream + +#### Steps involved for raw audio encoding + +This is roughly the same as for raw video, expect for (1) + +1) Transform raw audo feed (optional) + +We modify the various fundamental properties of a raw audio stream to be +compatible with the intersection of: \* The encoder GstCaps and \* The +specified "Stream Restriction" of the profile/target + +The fundamental properties that can be modifier are: \* Number of +channels \* Type of raw audio (integer or floating point) \* Depth +(number of bits required to encode one sample) + +#### Steps involved for encoded audio/video streams + +Steps (1) and (2) are replaced by a parser if a parser is available for +the given format. + +#### Steps involved for other streams + +Other streams will just be forwarded as-is to the muxer, provided the +muxer accepts the stream type. + +## Encoding Profile System + +This work is based on: + + - The existing [GstPreset API documentation][gst-preset] system for elements + + - The gnome-media [GConf audio profile system][gconf-audio-profile] + + - The investigation done into device profiles by Arista and + Transmageddon: [Research on a Device Profile API][device-profile-api], + and [Research on defining presets usage][preset-usage]. + +### Terminology + + - Encoding Target Category A Target Category is a classification of + devices/systems/use-cases for encoding. + +Such a classification is required in order for: \* Applications with a +very-specific use-case to limit the number of profiles they can offer +the user. A screencasting application has no use with the online +services targets for example. \* Offering the user some initial +classification in the case of a more generic encoding application (like +a video editor or a transcoder). + +Ex: Consumer devices Online service Intermediate Editing Format +Screencast Capture Computer + + - Encoding Profile Target A Profile Target describes a specific entity + for which we wish to encode. A Profile Target must belong to at + least one Target Category. It will define at least one Encoding + Profile. + + Examples (with category): Nokia N900 (Consumer device) Sony PlayStation 3 + (Consumer device) Youtube (Online service) DNxHD (Intermediate editing + format) HuffYUV (Screencast) Theora (Computer) + + - Encoding Profile A specific combination of muxer, encoders, presets + and limitations. + + Examples: Nokia N900/H264 HQ, Ipod/High Quality, DVD/Pal, + Youtube/High Quality HTML5/Low Bandwith, DNxHD + +### Encoding Profile + +An encoding profile requires the following information: + + - Name This string is not translatable and must be unique. A + recommendation to guarantee uniqueness of the naming could be: + / + - Description This is a translatable string describing the profile + - Muxing format This is a string containing the GStreamer media-type + of the container format. + - Muxing preset This is an optional string describing the preset(s) to + use on the muxer. + - Multipass setting This is a boolean describing whether the profile + requires several passes. 
+ - List of Stream Profile + +2.3.1 Stream Profiles + +A Stream Profile consists of: + + - Type The type of stream profile (audio, video, text, private-data) + - Encoding Format This is a string containing the GStreamer media-type + of the encoding format to be used. If encoding is not to be applied, + the raw audio media type will be used. + - Encoding preset This is an optional string describing the preset(s) + to use on the encoder. + - Restriction This is an optional GstCaps containing the restriction + of the stream that can be fed to the encoder. This will generally + containing restrictions in video width/heigh/framerate or audio + depth. + - presence This is an integer specifying how many streams can be used + in the containing profile. 0 means that any number of streams can be + used. + - pass This is an integer which is only meaningful if the multipass + flag has been set in the profile. If it has been set it indicates + which pass this Stream Profile corresponds to. + +### 2.4 Example profile + +The representation used here is XML only as an example. No decision is +made as to which formatting to use for storing targets and profiles. + + + Nokia N900 + Consumer Device + + Nokia N900/H264 HQ + Nokia N900/MP3 + Nokia N900/AAC + + + + + Nokia N900/H264 HQ + + High Quality H264/AAC for the Nokia N900 + + video/quicktime,variant=iso + + + audio + audio/mpeg,mpegversion=4 + Quality High/Main + audio/x-raw,channels=[1,2] + 1 + + + video + video/x-h264 + Profile Baseline/Quality High + + video/x-raw,width=[16, 800],\ + height=[16, 480],framerate=[1/1, 30000/1001] + + 1 + + + + +### API + +A proposed C API is contained in the gstprofile.h file in this +directory. + +### Modifications required in the existing GstPreset system + +#### Temporary preset. + +Currently a preset needs to be saved on disk in order to be used. + +This makes it impossible to have temporary presets (that exist only +during the lifetime of a process), which might be required in the new +proposed profile system + +#### Categorisation of presets. + +Currently presets are just aliases of a group of property/value without +any meanings or explanation as to how they exclude each other. + +Take for example the H264 encoder. It can have presets for: \* passes +(1,2 or 3 passes) \* profiles (Baseline, Main, ...) \* quality (Low, +medium, High) + +In order to programmatically know which presets exclude each other, we +here propose the categorisation of these presets. + +This can be done in one of two ways 1. in the name (by making the name +be \[:\]) This would give for example: "Quality:High", +"Profile:Baseline" 2. by adding a new \_meta key This would give for +example: \_meta/category:quality + +#### Aggregation of presets. + +There can be more than one choice of presets to be done for an element +(quality, profile, pass). + +This means that one can not currently describe the full configuration of +an element with a single string but with many. + +The proposal here is to extend the GstPreset API to be able to set all +presets using one string and a well-known separator ('/'). + +This change only requires changes in the core preset handling code. 
+ +This would allow doing the following: gst\_preset\_load\_preset +(h264enc, "pass:1/profile:baseline/quality:high"); + +### Points to be determined + +This document hasn't determined yet how to solve the following problems: + +#### Storage of profiles + +One proposal for storage would be to use a system wide directory (like +$prefix/share/gstreamer-0.10/profiles) and store XML files for every +individual profiles. + +Users could then add their own profiles in ~/.gstreamer-0.10/profiles + +This poses some limitations as to what to do if some applications want +to have some profiles limited to their own usage. + +## Helper library for profiles + +These helper methods could also be added to existing libraries (like +GstPreset, GstPbUtils, ..). + +The various API proposed are in the accompanying gstprofile.h file. + +### Getting user-readable names for formats + +This is already provided by GstPbUtils. + +### Hierarchy of profiles + +The goal is for applications to be able to present to the user a list of +combo-boxes for choosing their output profile: + +\[ Category \] \# optional, depends on the application \[ Device/Site/.. +\] \# optional, depends on the application \[ Profile \] + +Convenience methods are offered to easily get lists of categories, +devices, and profiles. + +### Creating Profiles + +The goal is for applications to be able to easily create profiles. + +The applications needs to be able to have a fast/efficient way to: \* +select a container format and see all compatible streams he can use with +it. \* select a codec format and see which container formats he can use +with it. + +The remaining parts concern the restrictions to encoder input. + +### Ensuring availability of plugins for Profiles + +When an application wishes to use a Profile, it should be able to query +whether it has all the needed plugins to use it. + +This part will use GstPbUtils to query, and if needed install the +missing plugins through the installed distribution plugin installer. + +## Use-cases researched + +This is a list of various use-cases where encoding/muxing is being used. + +### Transcoding + +The goal is to convert with as minimal loss of quality any input file +for a target use. A specific variant of this is transmuxing (see below). + +Example applications: Arista, Transmageddon + +### Rendering timelines + +The incoming streams are a collection of various segments that need to +be rendered. Those segments can vary in nature (i.e. the video +width/height can change). This requires the use of identiy with the +single-segment property activated to transform the incoming collection +of segments to a single continuous segment. + +Example applications: PiTiVi, Jokosher + +### Encoding of live sources + +The major risk to take into account is the encoder not encoding the +incoming stream fast enough. This is outside of the scope of encodebin, +and should be solved by using queues between the sources and encodebin, +as well as implementing QoS in encoders and sources (the encoders +emitting QoS events, and the upstream elements adapting themselves +accordingly). + +Example applications: camerabin, cheese + +### Screencasting applications + +This is similar to encoding of live sources. 
The difference being that +due to the nature of the source (size and amount/frequency of updates) +one might want to do the encoding in two parts: \* The actual live +capture is encoded with a 'almost-lossless' codec (such as huffyuv) \* +Once the capture is done, the file created in the first step is then +rendered to the desired target format. + +Fixing sources to only emit region-updates and having encoders capable +of encoding those streams would fix the need for the first step but is +outside of the scope of encodebin. + +Example applications: Istanbul, gnome-shell, recordmydesktop + +### Live transcoding + +This is the case of an incoming live stream which will be +broadcasted/transmitted live. One issue to take into account is to +reduce the encoding latency to a minimum. This should mostly be done by +picking low-latency encoders. + +Example applications: Rygel, Coherence + +### Transmuxing + +Given a certain file, the aim is to remux the contents WITHOUT decoding +into either a different container format or the same container format. +Remuxing into the same container format is useful when the file was not +created properly (for example, the index is missing). Whenever +available, parsers should be applied on the encoded streams to validate +and/or fix the streams before muxing them. + +Metadata from the original file must be kept in the newly created file. + +Example applications: Arista, Transmaggedon + +### Loss-less cutting + +Given a certain file, the aim is to extract a certain part of the file +without going through the process of decoding and re-encoding that file. +This is similar to the transmuxing use-case. + +Example applications: PiTiVi, Transmageddon, Arista, ... + +### Multi-pass encoding + +Some encoders allow doing a multi-pass encoding. The initial pass(es) +are only used to collect encoding estimates and are not actually muxed +and outputted. The final pass uses previously collected information, and +the output is then muxed and outputted. + +### Archiving and intermediary format + +The requirement is to have lossless + +### CD ripping + +Example applications: Sound-juicer + +### DVD ripping + +Example application: Thoggen + +### Research links + +Some of these are still active documents, some other not + +[gst-preset]: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPreset.html +[gconf-audio-profile]: http://www.gnome.org/~bmsmith/gconf-docs/C/gnome-media.html +[device-profile-api]: http://gstreamer.freedesktop.org/wiki/DeviceProfile (FIXME: wiki is gone) +[preset-usage]: http://gstreamer.freedesktop.org/wiki/PresetDesign (FIXME: wiki is gone) + diff --git a/markdown/design/interlaced-video.md b/markdown/design/interlaced-video.md new file mode 100644 index 0000000..7abeaf8 --- /dev/null +++ b/markdown/design/interlaced-video.md @@ -0,0 +1,102 @@ +# Interlaced Video + +Video buffers have a number of states identifiable through a combination +of caps and buffer flags. + +Possible states: +- Progressive +- Interlaced + - Plain + - One field + - Two fields + - Three fields - this should be a progressive buffer with a repeated 'first' + field that can be used for telecine pulldown + - Telecine + - One field + - Two fields + - Progressive + - Interlaced (a.k.a. 
'mixed'; the fields are from different frames) + - Three fields - this should be a progressive buffer with a repeated 'first' + field that can be used for telecine pulldown + +Note: It can be seen that the difference between the plain interlaced +and telecine states is that in the telecine state, buffers containing +two fields may be progressive. + +Tools for identification: + - GstVideoInfo + - GstVideoInterlaceMode - enum `GST_VIDEO_INTERLACE_MODE_...` + - PROGRESSIVE + - INTERLEAVED + - MIXED + - Buffers flags - `GST_VIDEO_BUFFER_FLAG_...` + - TFF + - RFF + - ONEFIELD + - INTERLACED + +## Identification of Buffer States + +Note that flags are not necessarily interpreted in the same way for all +different states nor are they necessarily required nor make sense in all +cases. + +### Progressive + +If the interlace mode in the video info corresponding to a buffer is +**"progressive"**, then the buffer is progressive. + +### Plain Interlaced + +If the video info interlace mode is **"interleaved"**, then the buffer is +plain interlaced. + +`GST_VIDEO_BUFFER_FLAG_TFF` indicates whether the top or bottom field +is to be displayed first. The timestamp on the buffer corresponds to the +first field. + +`GST_VIDEO_BUFFER_FLAG_RFF` indicates that the first field (indicated +by the TFF flag) should be repeated. This is generally only used for +telecine purposes but as the telecine state was added long after the +interlaced state was added and defined, this flag remains valid for +plain interlaced buffers. + +`GST_VIDEO_BUFFER_FLAG_ONEFIELD` means that only the field indicated +through the TFF flag is to be used. The other field should be ignored. + +### Telecine + +If video info interlace mode is **"mixed"** then the buffers are in some +form of telecine state. + +The `TFF` and `ONEFIELD` flags have the same semantics as for the plain +interlaced state. + +`GST_VIDEO_BUFFER_FLAG_RFF` in the telecine state indicates that the +buffer contains only repeated fields that are present in other buffers +and are as such unneeded. For example, in a sequence of three telecined +frames, we might have: + + AtAb AtBb BtBb + +In this situation, we only need the first and third buffers as the +second buffer contains fields present in the first and third. + +Note that the following state can have its second buffer identified +using the `ONEFIELD` flag (and `TFF` not set): + + AtAb AtBb BtCb + +The telecine state requires one additional flag to be able to identify +progressive buffers. + +The presence of the `GST_VIDEO_BUFFER_FLAG_INTERLACED` means that the +buffer is an 'interlaced' or 'mixed' buffer that contains two fields +that, when combined with fields from adjacent buffers, allow +reconstruction of progressive frames. The absence of the flag implies +the buffer containing two fields is a progressive frame. + +For example in the following sequence, the third buffer would be mixed +(yes, it is a strange pattern, but it can happen): + + AtAb AtBb BtCb CtDb DtDb diff --git a/markdown/design/keyframe-force.md b/markdown/design/keyframe-force.md new file mode 100644 index 0000000..84be492 --- /dev/null +++ b/markdown/design/keyframe-force.md @@ -0,0 +1,97 @@ +# Forcing keyframes + +Consider the following use case: + +We have a pipeline that performs video and audio capture from a live +source, compresses and muxes the streams and writes the resulting data +into a file. 
+
+Inside the uncompressed video data we have a specific pattern inserted
+at specific moments that should trigger a switch to a new file, meaning
+that we close the existing file we are writing to and start writing to
+a new file.
+
+We want the new file to start with a keyframe so that one can start
+decoding the file immediately.
+
+## Components
+
+1) We need an element that is able to detect the pattern in the video
+   stream.
+
+2) We need to inform the video encoder that it should start encoding a
+   keyframe starting from exactly the frame with the pattern.
+
+3) We need to inform the muxer that it should flush out any pending
+   data and start a new file with the keyframe as its first video
+   frame.
+
+4) We need to inform the sink element that it should start writing to
+   the next file. This requires application interaction to instruct the
+   sink of the new filename. The application should also be free to
+   ignore the boundary and continue to write to the existing file. The
+   application will typically use an event pad probe to detect the
+   custom event.
+
+## Implementation
+
+### Downstream
+
+The implementation would consist of generating a `GST_EVENT_CUSTOM_DOWNSTREAM`
+event that marks the keyframe boundary. This event is inserted into the
+pipeline by the application upon a certain trigger. In the above use case
+this trigger would be given by the element that detects the pattern, in
+the form of an element message.
+
+The custom event would travel further downstream to instruct encoder,
+muxer and sink about the upcoming switch.
+
+The information passed in the event consists of:
+
+**GstForceKeyUnit**
+
+  - **"timestamp"** (`G_TYPE_UINT64`): the timestamp of the buffer that
+    triggered the event.
+
+  - **"stream-time"** (`G_TYPE_UINT64`): the stream position that
+    triggered the event.
+
+  - **"running-time"** (`G_TYPE_UINT64`): the running time of the stream
+    when the event was triggered.
+
+  - **"all-headers"** (`G_TYPE_BOOLEAN`): send all headers, including
+    those in the caps or those sent at the start of the stream.
+
+  - **...**: optional other data fields.
+
+Note that this event is purely informational: no element is required to
+perform an action, but it should forward the event downstream, just like
+any other event it does not handle.
+
+Elements understanding the event should behave as follows:
+
+1) The video encoder receives the event before the next frame. Upon
+   reception of the event it schedules the next frame to be encoded as
+   a keyframe. Before pushing out the encoded keyframe it must push the
+   GstForceKeyUnit event downstream.
+
+2) The muxer receives the GstForceKeyUnit event and flushes out its
+   current state, preparing to produce data that can be used as a
+   keyunit. Before pushing out the new data it pushes the
+   GstForceKeyUnit event downstream.
+
+3) The application receives the GstForceKeyUnit event in a pad probe on
+   the sink's sink pad and reconfigures the sink to make it perform new
+   actions after receiving the next buffer.
+
+### Upstream
+
+When using RTP, packets can get lost and receivers can be added at any
+time; in either case a receiver may request a new keyframe.
+
+A downstream element sends an upstream "GstForceKeyUnit" event up the
+pipeline.
+
+When an element produces some kind of key unit on its output, but has no
+such concept on its input (like an encoder that takes raw frames), it
+consumes the event (doesn't pass it upstream) and instead sends a
+downstream GstForceKeyUnit event and a new keyframe.
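+
+As a rough illustration of the downstream case, the sketch below builds
+and pushes the custom event described above. The values passed in are
+illustrative; applications may also use the
+gst_video_event_new_downstream_force_key_unit() helper from libgstvideo
+where available.
+
+    /* Sketch: construct and push the custom GstForceKeyUnit event. */
+    static gboolean
+    send_force_key_unit (GstPad * srcpad, GstClockTime timestamp,
+        GstClockTime stream_time, GstClockTime running_time)
+    {
+      GstStructure *s;
+      GstEvent *event;
+
+      s = gst_structure_new ("GstForceKeyUnit",
+          "timestamp", G_TYPE_UINT64, timestamp,
+          "stream-time", G_TYPE_UINT64, stream_time,
+          "running-time", G_TYPE_UINT64, running_time,
+          "all-headers", G_TYPE_BOOLEAN, TRUE, NULL);
+
+      event = gst_event_new_custom (GST_EVENT_CUSTOM_DOWNSTREAM, s);
+      return gst_pad_push_event (srcpad, event);
+    }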
diff --git a/markdown/design/mediatype-audio-raw.md b/markdown/design/mediatype-audio-raw.md new file mode 100644 index 0000000..f65c1d0 --- /dev/null +++ b/markdown/design/mediatype-audio-raw.md @@ -0,0 +1,68 @@ +# Raw Audio Media Types + +**audio/x-raw** + + - **format**, G\_TYPE\_STRING, mandatory The format of the audio samples, see + the Formats section for a list of valid sample formats. + + - **rate**, G\_TYPE\_INT, mandatory The samplerate of the audio + + - **channels**, G\_TYPE\_INT, mandatory The number of channels + + - **channel-mask**, GST\_TYPE\_BITMASK, mandatory for more than 2 channels + Bitmask of channel positions present. May be omitted for mono and + stereo. May be set to 0 to denote that the channels are unpositioned. + + - **layout**, G\_TYPE\_STRING, mandatory The layout of channels within a + buffer. Possible values are "interleaved" (for LRLRLRLR) and + "non-interleaved" (LLLLRRRR) + +Use `GstAudioInfo` and related helper API to create and parse raw audio caps. + +## Metadata + + - `GstAudioDownmixMeta`: A matrix for downmixing multichannel audio to a + lower numer of channels. + +## Formats + +The following values can be used for the format string property. + + - "S8" 8-bit signed PCM audio + - "U8" 8-bit unsigned PCM audio + + - "S16LE" 16-bit signed PCM audio + - "S16BE" 16-bit signed PCM audio + - "U16LE" 16-bit unsigned PCM audio + - "U16BE" 16-bit unsigned PCM audio + + - "S24\_32LE" 24-bit signed PCM audio packed into 32-bit + - "S24\_32BE" 24-bit signed PCM audio packed into 32-bit + - "U24\_32LE" 24-bit unsigned PCM audio packed into 32-bit + - "U24\_32BE" 24-bit unsigned PCM audio packed into 32-bit + + - "S32LE" 32-bit signed PCM audio + - "S32BE" 32-bit signed PCM audio + - "U32LE" 32-bit unsigned PCM audio + - "U32BE" 32-bit unsigned PCM audio + + - "S24LE" 24-bit signed PCM audio + - "S24BE" 24-bit signed PCM audio + - "U24LE" 24-bit unsigned PCM audio + - "U24BE" 24-bit unsigned PCM audio + + - "S20LE" 20-bit signed PCM audio + - "S20BE" 20-bit signed PCM audio + - "U20LE" 20-bit unsigned PCM audio + - "U20BE" 20-bit unsigned PCM audio + + - "S18LE" 18-bit signed PCM audio + - "S18BE" 18-bit signed PCM audio + - "U18LE" 18-bit unsigned PCM audio + - "U18BE" 18-bit unsigned PCM audio + + - "F32LE" 32-bit floating-point audio + - "F32BE" 32-bit floating-point audio + - "F64LE" 64-bit floating-point audio + - "F64BE" 64-bit floating-point audio + diff --git a/markdown/design/mediatype-text-raw.md b/markdown/design/mediatype-text-raw.md new file mode 100644 index 0000000..103ed1d --- /dev/null +++ b/markdown/design/mediatype-text-raw.md @@ -0,0 +1,22 @@ +# Raw Text Media Types + +**text/x-raw** + + - **format**, G\_TYPE\_STRING, mandatory The format of the text, see the + Formats section for a list of valid format strings. + +## Metadata + +There are no common metas for this raw format yet. + +## Formats + + - "utf8": plain timed utf8 text (formerly text/plain) + Parsed timed text in utf8 format. + + - "pango-markup": plain timed utf8 text with pango markup + (formerly text/x-pango-markup). Same as "utf8", but text embedded in an + XML-style markup language for size, colour, emphasis, etc. 
+ See [Pango Markup Format][pango-markup] + +[pango-markup]: http://developer.gnome.org/pango/stable/PangoMarkupFormat.html diff --git a/markdown/design/mediatype-video-raw.md b/markdown/design/mediatype-video-raw.md new file mode 100644 index 0000000..26409e6 --- /dev/null +++ b/markdown/design/mediatype-video-raw.md @@ -0,0 +1,1240 @@ +# Raw Video Media Types + +**video/x-raw** + + - **width**, G\_TYPE\_INT, mandatory The width of the image in pixels. + + - **height**, G\_TYPE\_INT, mandatory The height of the image in pixels + + - **framerate**, GST\_TYPE\_FRACTION, default 0/1 The framerate of the video + 0/1 for variable framerate + + - **max-framerate**, GST\_TYPE\_FRACTION, default as framerate For variable + framerates this would be the maximum framerate that is expected. This + value is only valid when the framerate is 0/1 + + - **views**, G\_TYPE\_INT, default 1 The number of views for multiview video. + Each buffer contains multiple GstVideoMeta buffers that describe each + view. use the frame id to get access to the different views. + + - **interlace-mode**, G\_TYPE\_STRING, default progressive The interlace mode. + The following values are possible: + + - *"progressive"*: all frames are progressive + + - *"interleaved"*: 2 fields are interleaved in one video frame. Extra buffer + flags describe the field order. + + - *"mixed"*: progressive and interleaved frames, extra buffer flags + describe the frame and fields. + + - *"fields"*: 2 fields are stored in one buffer, use the frame ID + to get access to the required field. For multiview (the + 'views' property > 1) the fields of view N can be found at + frame ID (N * 2) and (N * 2) + 1. + Each view has only half the amount of lines as noted in the + height property, pads specifying the "fields" property + must be prepared for this. This mode requires multiple + GstVideoMeta metadata to describe the fields. + + - **chroma-site**, G\_TYPE\_STRING, default UNKNOWN The chroma siting of the + video frames. + + - *"jpeg"*: `GST_VIDEO_CHROMA_SITE_JPEG` + - *"mpeg2"*: `GST_VIDEO_CHROMA_SITE_MPEG2` + - *"dv"*: `GST_VIDEO_CHROMA_SITE_DV` + + - **colorimetry**, G\_TYPE\_STRING, default UNKNOWN The colorimetry of the + video frames predefined colorimetry is given with the following values: + + - *"bt601"* + - *"bt709"* + - *"smpte240m"* + + - **pixel-aspect-ratio**, GST\_TYPE\_FRACTION, default 1/1 The pixel aspect + ration of the video + + - **format**, G\_TYPE\_STRING, mandatory The format of the video, see the + Formats section for a list of valid format strings. + +## Metadata + + - `GstVideoMeta` contains the description of one video field or frame. It + has stride support and support for having multiple memory regions per frame. + Multiple GstVideoMeta can be added to a buffer and can be identified with a + unique id. This id can be used to select fields in interlaced formats or + views in multiview formats. + + - `GstVideoCropMeta` contains the cropping region of the video. 
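+
+As a rough illustration, the sketch below builds a video/x-raw caps
+structure with some of the fields described above using the GstVideoInfo
+helper API. The concrete format, size and framerate values are only
+examples.
+
+    #include <gst/video/video.h>
+
+    GstVideoInfo info;
+    GstCaps *caps;
+
+    gst_video_info_init (&info);
+    gst_video_info_set_format (&info, GST_VIDEO_FORMAT_I420, 320, 240);
+    info.fps_n = 30;
+    info.fps_d = 1;
+    info.interlace_mode = GST_VIDEO_INTERLACE_MODE_PROGRESSIVE;
+
+    caps = gst_video_info_to_caps (&info);
+    /* caps is now roughly:
+     *   video/x-raw, format=I420, width=320, height=240,
+     *   framerate=30/1, interlace-mode=progressive, ...
+     */
+    gst_caps_unref (caps);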
+ +## Formats + +- **"I420"** planar 4:2:0 YUV + + Component 0: Y + depth: 8 + pstride: 1 + default offset: 0 + default rstride: RU4 (width) + default size: rstride (component0) * RU2 (height) + + Component 1: U + depth: 8 + pstride: 1 + default offset: size (component0) + default rstride: RU4 (RU2 (width) / 2) + default size: rstride (component1) * RU2 (height) / 2 + + Component 2: V + depth 8 + pstride: 1 + default offset: offset (component1) + size (component1) + default rstride: RU4 (RU2 (width) / 2) + default size: rstride (component2) * RU2 (height) / 2 + + Image + default size: size (component0) + + size (component1) + + size (component2) + +- **"YV12"** planar 4:2:0 YUV + + Same as I420 but with U and V planes swapped + + Component 0: Y + depth: 8 + pstride: 1 + default offset: 0 + default rstride: RU4 (width) + default size: rstride (component0) * RU2 (height) + + Component 1: U + depth 8 + pstride: 1 + default offset: offset (component2) + size (component2) + default rstride: RU4 (RU2 (width) / 2) + default size: rstride (component1) * RU2 (height) / 2 + + Component 2: V + depth: 8 + pstride: 1 + default offset: size (component0) + default rstride: RU4 (RU2 (width) / 2) + default size: rstride (component2) * RU2 (height) / 2 + + Image + default size: size (component0) + + size (component1) + + size (component2) + +- **"YUY2"** packed 4:2:2 YUV + + +--+--+--+--+ +--+--+--+--+ + |Y0|U0|Y1|V0| |Y2|U2|Y3|V2| ... + +--+--+--+--+ +--+--+--+--+ + + Component 0: Y + depth: 8 + pstride: 2 + offset: 0 + + Component 1: U + depth: 8 + offset: 1 + pstride: 4 + + Component 2: V + depth 8 + offset: 3 + pstride: 4 + + Image + default rstride: RU4 (width * 2) + default size: rstride (image) * height + +- **"YVYU"** packed 4:2:2 YUV + + Same as "YUY2" but with U and V planes swapped + + +--+--+--+--+ +--+--+--+--+ + |Y0|V0|Y1|U0| |Y2|V2|Y3|U2| ... + +--+--+--+--+ +--+--+--+--+ + + Component 0: Y + depth: 8 + pstride: 2 + offset: 0 + + Component 1: U + depth: 8 + pstride: 4 + offset: 3 + + Component 2: V + depth 8 + pstride: 4 + offset: 1 + + Image + default rstride: RU4 (width * 2) + default size: rstride (image) * height + +- **"UYVY"** packed 4:2:2 YUV + + +--+--+--+--+ +--+--+--+--+ + |U0|Y0|V0|Y1| |U2|Y2|V2|Y3| ... + +--+--+--+--+ +--+--+--+--+ + + Component 0: Y + depth: 8 + pstride: 2 + offset: 1 + + Component 1: U + depth: 8 + pstride: 4 + offset: 0 + + Component 2: V + depth 8 + pstride: 4 + offset: 2 + + Image + default rstride: RU4 (width * 2) + default size: rstride (image) * height + +- **"AYUV"** packed 4:4:4 YUV with alpha channel + + +--+--+--+--+ +--+--+--+--+ + |A0|Y0|U0|V0| |A1|Y1|U1|V1| ... + +--+--+--+--+ +--+--+--+--+ + + Component 0: Y + depth: 8 + pstride: 4 + offset: 1 + + Component 1: U + depth: 8 + pstride: 4 + offset: 2 + + Component 2: V + depth 8 + pstride: 4 + offset: 3 + + Component 3: A + depth 8 + pstride: 4 + offset: 0 + + Image + default rstride: width * 4 + default size: rstride (image) * height + +- **"RGBx"** sparse rgb packed into 32 bit, space last + + +--+--+--+--+ +--+--+--+--+ + |R0|G0|B0|X | |R1|G1|B1|X | ... + +--+--+--+--+ +--+--+--+--+ + + Component 0: R + depth: 8 + pstride: 4 + offset: 0 + + Component 1: G + depth: 8 + pstride: 4 + offset: 1 + + Component 2: B + depth 8 + pstride: 4 + offset: 2 + + Image + default rstride: width * 4 + default size: rstride (image) * height + +- **"BGRx"** sparse reverse rgb packed into 32 bit, space last + + +--+--+--+--+ +--+--+--+--+ + |B0|G0|R0|X | |B1|G1|R1|X | ... 
+ +--+--+--+--+ +--+--+--+--+ + + Component 0: R + depth: 8 + pstride: 4 + offset: 2 + + Component 1: G + depth: 8 + pstride: 4 + offset: 1 + + Component 2: B + depth 8 + pstride: 4 + offset: 0 + + Image + default rstride: width * 4 + default size: rstride (image) * height + +- **"xRGB"** sparse rgb packed into 32 bit, space first + + +--+--+--+--+ +--+--+--+--+ + |X |R0|G0|B0| |X |R1|G1|B1| ... + +--+--+--+--+ +--+--+--+--+ + + Component 0: R + depth: 8 + pstride: 4 + offset: 1 + + Component 1: G + depth: 8 + pstride: 4 + offset: 2 + + Component 2: B + depth 8 + pstride: 4 + offset: 3 + + Image + default rstride: width * 4 + default size: rstride (image) * height + +- **"xBGR"** sparse reverse rgb packed into 32 bit, space first + + +--+--+--+--+ +--+--+--+--+ + |X |B0|G0|R0| |X |B1|G1|R1| ... + +--+--+--+--+ +--+--+--+--+ + + Component 0: R + depth: 8 + pstride: 4 + offset: 3 + + Component 1: G + depth: 8 + pstride: 4 + offset: 2 + + Component 2: B + depth 8 + pstride: 4 + offset: 1 + + Image + default rstride: width * 4 + default size: rstride (image) * height + +- **"RGBA"** rgb with alpha channel last + + +--+--+--+--+ +--+--+--+--+ + |R0|G0|B0|A0| |R1|G1|B1|A1| ... + +--+--+--+--+ +--+--+--+--+ + + Component 0: R + depth: 8 + pstride: 4 + offset: 0 + + Component 1: G + depth: 8 + pstride: 4 + offset: 1 + + Component 2: B + depth 8 + pstride: 4 + offset: 2 + + Component 3: A + depth 8 + pstride: 4 + offset: 3 + + Image + default rstride: width * 4 + default size: rstride (image) * height + +- **"BGRA"** reverse rgb with alpha channel last + + +--+--+--+--+ +--+--+--+--+ + |B0|G0|R0|A0| |B1|G1|R1|A1| ... + +--+--+--+--+ +--+--+--+--+ + + Component 0: R + depth: 8 + pstride: 4 + offset: 2 + + Component 1: G + depth: 8 + pstride: 4 + offset: 1 + + Component 2: B + depth 8 + pstride: 4 + offset: 0 + + Component 3: A + depth 8 + pstride: 4 + offset: 3 + + Image + default rstride: width * 4 + default size: rstride (image) * height + +- **"ARGB"** rgb with alpha channel first + + +--+--+--+--+ +--+--+--+--+ + |A0|R0|G0|B0| |A1|R1|G1|B1| ... + +--+--+--+--+ +--+--+--+--+ + + Component 0: R + depth: 8 + pstride: 4 + offset: 1 + + Component 1: G + depth: 8 + pstride: 4 + offset: 2 + + Component 2: B + depth 8 + pstride: 4 + offset: 3 + + Component 3: A + depth 8 + pstride: 4 + offset: 0 + + Image + default rstride: width * 4 + default size: rstride (image) * height + +- **"ABGR"** reverse rgb with alpha channel first + + +--+--+--+--+ +--+--+--+--+ + |A0|R0|G0|B0| |A1|R1|G1|B1| ... + +--+--+--+--+ +--+--+--+--+ + + Component 0: R + depth: 8 + pstride: 4 + offset: 1 + + Component 1: G + depth: 8 + pstride: 4 + offset: 2 + + Component 2: B + depth 8 + pstride: 4 + offset: 3 + + Component 3: A + depth 8 + pstride: 4 + offset: 0 + + Image + default rstride: width * 4 + default size: rstride (image) * height + +- **"RGB"** rgb + + +--+--+--+ +--+--+--+ + |R0|G0|B0| |R1|G1|B1| ... + +--+--+--+ +--+--+--+ + + Component 0: R + depth: 8 + pstride: 3 + offset: 0 + + Component 1: G + depth: 8 + pstride: 3 + offset: 1 + + Component 2: B + depth 8 + pstride: 3 + offset: 2 + + Image + default rstride: RU4 (width * 3) + default size: rstride (image) * height + +- **"BGR"** reverse rgb + + +--+--+--+ +--+--+--+ + |B0|G0|R0| |B1|G1|R1| ... 
+ +--+--+--+ +--+--+--+ + + Component 0: R + depth: 8 + pstride: 3 + offset: 2 + + Component 1: G + depth: 8 + pstride: 3 + offset: 1 + + Component 2: B + depth 8 + pstride: 3 + offset: 0 + + Image + default rstride: RU4 (width * 3) + default size: rstride (image) * height + +- **"Y41B"** planar 4:1:1 YUV + + Component 0: Y + depth: 8 + pstride: 1 + default offset: 0 + default rstride: RU4 (width) + default size: rstride (component0) * height + + Component 1: U + depth 8 + pstride: 1 + default offset: size (component0) + default rstride: RU16 (width) / 4 + default size: rstride (component1) * height + + Component 2: V + depth: 8 + pstride: 1 + default offset: offset (component1) + size (component1) + default rstride: RU16 (width) / 4 + default size: rstride (component2) * height + + Image + default size: size (component0) + + size (component1) + + size (component2) + +- **"Y42B"** planar 4:2:2 YUV + + Component 0: Y + depth: 8 + pstride: 1 + default offset: 0 + default rstride: RU4 (width) + default size: rstride (component0) * height + + Component 1: U + depth 8 + pstride: 1 + default offset: size (component0) + default rstride: RU8 (width) / 2 + default size: rstride (component1) * height + + Component 2: V + depth: 8 + pstride: 1 + default offset: offset (component1) + size (component1) + default rstride: RU8 (width) / 2 + default size: rstride (component2) * height + + Image + default size: size (component0) + + size (component1) + + size (component2) + +- **"Y444"** planar 4:4:4 YUV + + Component 0: Y + depth: 8 + pstride: 1 + default offset: 0 + default rstride: RU4 (width) + default size: rstride (component0) * height + + Component 1: U + depth 8 + pstride: 1 + default offset: size (component0) + default rstride: RU4 (width) + default size: rstride (component1) * height + + Component 2: V + depth: 8 + pstride: 1 + default offset: offset (component1) + size (component1) + default rstride: RU4 (width) + default size: rstride (component2) * height + + Image + default size: size (component0) + + size (component1) + + size (component2) + +- **"v210"** packed 4:2:2 10-bit YUV, complex format + + Component 0: Y + depth: 10 + + Component 1: U + depth 10 + + Component 2: V + depth: 10 + + Image + default rstride: RU48 (width) * 128 + default size: rstride (image) * height + +- **"v216"** packed 4:2:2 16-bit YUV, Y0-U0-Y1-V1 order + + +--+--+--+--+ +--+--+--+--+ + |U0|Y0|V0|Y1| |U1|Y2|V1|Y3| ... 
+ +--+--+--+--+ +--+--+--+--+ + + Component 0: Y + depth: 16 LE + pstride: 4 + offset: 2 + + Component 1: U + depth 16 LE + pstride: 8 + offset: 0 + + Component 2: V + depth: 16 LE + pstride: 8 + offset: 4 + + Image + default rstride: RU8 (width * 2) + default size: rstride (image) * height + +- **"NV12"** planar 4:2:0 YUV with interleaved UV plane + + Component 0: Y + depth: 8 + pstride: 1 + default offset: 0 + default rstride: RU4 (width) + default size: rstride (component0) * RU2 (height) + + Component 1: U + depth 8 + pstride: 2 + default offset: size (component0) + default rstride: RU4 (width) + + Component 2: V + depth: 8 + pstride: 2 + default offset: offset (component1) + 1 + default rstride: RU4 (width) + + Image + default size: RU4 (width) * RU2 (height) * 3 / 2 + +- **"NV21"** planar 4:2:0 YUV with interleaved VU plane + + Component 0: Y + depth: 8 + pstride: 1 + default offset: 0 + default rstride: RU4 (width) + default size: rstride (component0) * RU2 (height) + + Component 1: U + depth 8 + pstride: 2 + default offset: offset (component1) + 1 + default rstride: RU4 (width) + + Component 2: V + depth: 8 + pstride: 2 + default offset: size (component0) + default rstride: RU4 (width) + + Image + default size: RU4 (width) * RU2 (height) * 3 / 2 + +- **"GRAY8"** 8-bit grayscale "Y800" same as "GRAY8" + + Component 0: Y + depth: 8 + offset: 0 + pstride: 1 + default rstride: RU4 (width) + default size: rstride (component0) * height + + Image + default size: size (component0) + +- **"GRAY16\_BE"** 16-bit grayscale, most significant byte first + + Component 0: Y + depth: 16 + offset: 0 + pstride: 2 + default rstride: RU4 (width * 2) + default size: rstride (component0) * height + + Image + default size: size (component0) + +- **"GRAY16\_LE"** 16-bit grayscale, least significant byte first +- **"Y16"** same as "GRAY16\_LE" + + Component 0: Y + depth: 16 LE + offset: 0 + pstride: 2 + default rstride: RU4 (width * 2) + default size: rstride (component0) * height + + Image + default size: size (component0) + +- **"v308"** packed 4:4:4 YUV + + +--+--+--+ +--+--+--+ + |Y0|U0|V0| |Y1|U1|V1| ... + +--+--+--+ +--+--+--+ + + Component 0: Y + depth: 8 + pstride: 3 + offset: 0 + + Component 1: U + depth 8 + pstride: 3 + offset: 1 + + Component 2: V + depth: 8 + pstride: 3 + offset: 2 + + Image + default rstride: RU4 (width * 3) + default size: rstride (image) * height + +- **"IYU2"** packed 4:4:4 YUV, U-Y-V order + + +--+--+--+ +--+--+--+ + |U0|Y0|V0| |U1|Y1|V1| ... + +--+--+--+ +--+--+--+ + + Component 0: Y + depth: 8 + pstride: 3 + offset: 1 + + Component 1: U + depth 8 + pstride: 3 + offset: 0 + + Component 2: V + depth: 8 + pstride: 3 + offset: 2 + + Image + default rstride: RU4 (width * 3) + default size: rstride (image) * height + +- **"RGB16"** rgb 5-6-5 bits per component + + +--+--+--+ +--+--+--+ + |R0|G0|B0| |R1|G1|B1| ... + +--+--+--+ +--+--+--+ + + Component 0: R + depth: 5 + pstride: 2 + + Component 1: G + depth 6 + pstride: 2 + + Component 2: B + depth: 5 + pstride: 2 + + Image + default rstride: RU4 (width * 2) + default size: rstride (image) * height + +- **"BGR16"** reverse rgb 5-6-5 bits per component + + +--+--+--+ +--+--+--+ + |B0|G0|R0| |B1|G1|R1| ... 
+ +--+--+--+ +--+--+--+ + + Component 0: R + depth: 5 + pstride: 2 + + Component 1: G + depth 6 + pstride: 2 + + Component 2: B + depth: 5 + pstride: 2 + + Image + default rstride: RU4 (width * 2) + default size: rstride (image) * height + +- **"RGB15"** rgb 5-5-5 bits per component + + +--+--+--+ +--+--+--+ + |R0|G0|B0| |R1|G1|B1| ... + +--+--+--+ +--+--+--+ + + Component 0: R + depth: 5 + pstride: 2 + + Component 1: G + depth 5 + pstride: 2 + + Component 2: B + depth: 5 + pstride: 2 + + Image + default rstride: RU4 (width * 2) + default size: rstride (image) * height + +- **"BGR15"** reverse rgb 5-5-5 bits per component + + +--+--+--+ +--+--+--+ + |B0|G0|R0| |B1|G1|R1| ... + +--+--+--+ +--+--+--+ + + Component 0: R + depth: 5 + pstride: 2 + + Component 1: G + depth 5 + pstride: 2 + + Component 2: B + depth: 5 + pstride: 2 + + Image + default rstride: RU4 (width * 2) + default size: rstride (image) * height + +- **"UYVP"** packed 10-bit 4:2:2 YUV (U0-Y0-V0-Y1 U2-Y2-V2-Y3 U4 ...) + + Component 0: Y + depth: 10 + + Component 1: U + depth 10 + + Component 2: V + depth: 10 + + Image + default rstride: RU4 (width * 2 * 5) + default size: rstride (image) * height + +- **"A420"** planar 4:4:2:0 AYUV + + Component 0: Y + depth: 8 + pstride: 1 + default offset: 0 + default rstride: RU4 (width) + default size: rstride (component0) * RU2 (height) + + Component 1: U + depth 8 + pstride: 1 + default offset: size (component0) + default rstride: RU4 (RU2 (width) / 2) + default size: rstride (component1) * (RU2 (height) / 2) + + Component 2: V + depth: 8 + pstride: 1 + default offset: size (component0) + size (component1) + default rstride: RU4 (RU2 (width) / 2) + default size: rstride (component2) * (RU2 (height) / 2) + + Component 3: A + depth: 8 + pstride: 1 + default offset: size (component0) + size (component1) + + size (component2) + default rstride: RU4 (width) + default size: rstride (component3) * RU2 (height) + + Image + default size: size (component0) + + size (component1) + + size (component2) + + size (component3) + +- **"RGB8P"** 8-bit paletted RGB + + Component 0: INDEX + depth: 8 + pstride: 1 + default offset: 0 + default rstride: RU4 (width) + default size: rstride (component0) * height + + Component 1: PALETTE + depth 32 + pstride: 4 + default offset: size (component0) + rstride: 4 + size: 256 * 4 + + Image + default size: size (component0) + size (component1) + +- **"YUV9"** planar 4:1:0 YUV + + Component 0: Y + depth: 8 + pstride: 1 + default offset: 0 + default rstride: RU4 (width) + default size: rstride (component0) * height + + Component 1: U + depth 8 + pstride: 1 + default offset: size (component0) + default rstride: RU4 (RU4 (width) / 4) + default size: rstride (component1) * (RU4 (height) / 4) + + Component 2: V + depth: 8 + pstride: 1 + default offset: offset (component1) + size (component1) + default rstride: RU4 (RU4 (width) / 4) + default size: rstride (component2) * (RU4 (height) / 4) + + Image + default size: size (component0) + + size (component1) + + size (component2) + +- **"YVU9"** planar 4:1:0 YUV (like YUV9 but UV planes swapped) + + Component 0: Y + depth: 8 + pstride: 1 + default offset: 0 + default rstride: RU4 (width) + default size: rstride (component0) * height + + Component 1: U + depth 8 + pstride: 1 + default offset: offset (component2) + size (component2) + default rstride: RU4 (RU4 (width) / 4) + default size: rstride (component1) * (RU4 (height) / 4) + + Component 2: V + depth: 8 + pstride: 1 + default offset: size (component0) + default rstride: RU4 
(RU4 (width) / 4) + default size: rstride (component2) * (RU4 (height) / 4) + + Image + default size: size (component0) + + size (component1) + + size (component2) + +- **"IYU1"** packed 4:1:1 YUV (Cb-Y0-Y1-Cr-Y2-Y3 ...) + + +--+--+--+ +--+--+--+ + |B0|G0|R0| |B1|G1|R1| ... + +--+--+--+ +--+--+--+ + + Component 0: Y + depth: 8 + offset: 1 + pstride: 2 + + Component 1: U + depth 5 + offset: 0 + pstride: 2 + + Component 2: V + depth: 5 + offset: 4 + pstride: 2 + + Image + default rstride: RU4 (RU4 (width) + RU4 (width) / 2) + default size: rstride (image) * height + +- **"ARGB64"** rgb with alpha channel first, 16 bits per channel + + +--+--+--+--+ +--+--+--+--+ + |A0|R0|G0|B0| |A1|R1|G1|B1| ... + +--+--+--+--+ +--+--+--+--+ + + Component 0: R + depth: 16 LE + pstride: 8 + offset: 2 + + Component 1: G + depth 16 LE + pstride: 8 + offset: 4 + + Component 2: B + depth: 16 LE + pstride: 8 + offset: 6 + + Component 3: A + depth: 16 LE + pstride: 8 + offset: 0 + + Image + default rstride: width * 8 + default size: rstride (image) * height + +- **"AYUV64"** packed 4:4:4 YUV with alpha channel, 16 bits per channel (A0-Y0-U0-V0 ...) + + +--+--+--+--+ +--+--+--+--+ + |A0|Y0|U0|V0| |A1|Y1|U1|V1| ... + +--+--+--+--+ +--+--+--+--+ + + Component 0: Y + depth: 16 LE + pstride: 8 + offset: 2 + + Component 1: U + depth 16 LE + pstride: 8 + offset: 4 + + Component 2: V + depth: 16 LE + pstride: 8 + offset: 6 + + Component 3: A + depth: 16 LE + pstride: 8 + offset: 0 + + Image + default rstride: width * 8 + default size: rstride (image) * height + +- **"r210"** packed 4:4:4 RGB, 10 bits per channel + + +--+--+--+ +--+--+--+ + |R0|G0|B0| |R1|G1|B1| ... + +--+--+--+ +--+--+--+ + + Component 0: R + depth: 10 + pstride: 4 + + Component 1: G + depth 10 + pstride: 4 + + Component 2: B + depth: 10 + pstride: 4 + + Image + default rstride: width * 4 + default size: rstride (image) * height + +- **"I420\_10LE"** planar 4:2:0 YUV, 10 bits per channel LE + + Component 0: Y + depth: 10 LE + pstride: 2 + default offset: 0 + default rstride: RU4 (width * 2) + default size: rstride (component0) * RU2 (height) + + Component 1: U + depth: 10 LE + pstride: 2 + default offset: size (component0) + default rstride: RU4 (width) + default size: rstride (component1) * RU2 (height) / 2 + + Component 2: V + depth 10 LE + pstride: 2 + default offset: offset (component1) + size (component1) + default rstride: RU4 (width) + default size: rstride (component2) * RU2 (height) / 2 + + Image + default size: size (component0) + + size (component1) + + size (component2) + +- **"I420\_10BE"** planar 4:2:0 YUV, 10 bits per channel BE + + Component 0: Y + depth: 10 BE + pstride: 2 + default offset: 0 + default rstride: RU4 (width * 2) + default size: rstride (component0) * RU2 (height) + + Component 1: U + depth: 10 BE + pstride: 2 + default offset: size (component0) + default rstride: RU4 (width) + default size: rstride (component1) * RU2 (height) / 2 + + Component 2: V + depth 10 BE + pstride: 2 + default offset: offset (component1) + size (component1) + default rstride: RU4 (width) + default size: rstride (component2) * RU2 (height) / 2 + + Image + default size: size (component0) + + size (component1) + + size (component2) + +- **"I422\_10LE"** planar 4:2:2 YUV, 10 bits per channel LE + + Component 0: Y + depth: 10 LE + pstride: 2 + default offset: 0 + default rstride: RU4 (width * 2) + default size: rstride (component0) * RU2 (height) + + Component 1: U + depth: 10 LE + pstride: 2 + default offset: size (component0) + default rstride: RU4 
(width) + default size: rstride (component1) * RU2 (height) + + Component 2: V + depth 10 LE + pstride: 2 + default offset: offset (component1) + size (component1) + default rstride: RU4 (width) + default size: rstride (component2) * RU2 (height) + + Image + default size: size (component0) + + size (component1) + + size (component2) + +- **"I422\_10BE"** planar 4:2:2 YUV, 10 bits per channel BE + + Component 0: Y + depth: 10 BE + pstride: 2 + default offset: 0 + default rstride: RU4 (width * 2) + default size: rstride (component0) * RU2 (height) + + Component 1: U + depth: 10 BE + pstride: 2 + default offset: size (component0) + default rstride: RU4 (width) + default size: rstride (component1) * RU2 (height) + + Component 2: V + depth 10 BE + pstride: 2 + default offset: offset (component1) + size (component1) + default rstride: RU4 (width) + default size: rstride (component2) * RU2 (height) + + Image + default size: size (component0) + + size (component1) + + size (component2) + +- **"Y444\_10BE"** planar 4:4:4 YUV, 10 bits per channel +- **"Y444\_10LE"** planar 4:4:4 YUV, 10 bits per channel + +- **"GBR"** planar 4:4:4 RGB, 8 bits per channel +- **"GBR\_10BE"** planar 4:4:4 RGB, 10 bits per channel +- **"GBR\_10LE"** planar 4:4:4 RGB, 10 bits per channel + +- **"NV16"** planar 4:2:2 YUV with interleaved UV plane +- **"NV61"** planar 4:2:2 YUV with interleaved VU plane +- **"NV24"** planar 4:4:4 YUV with interleaved UV plane + +- **"NV12\_64Z32"** planar 4:2:0 YUV with interleaved UV plane in 64x32 tiles zigzag + + Component 0: Y + depth: 8 + pstride: 1 + default offset: 0 + default rstride: RU128 (width) + default size: rstride (component0) * RU32 (height) + + Component 1: U + depth 8 + pstride: 2 + default offset: size (component0) + default rstride: (y_tiles << 16) | x_tiles + default x_tiles: RU128 (width) >> tile_width + default y_tiles: RU32 (height) >> tile_height + + Component 2: V + depth: 8 + pstride: 2 + default offset: offset (component1) + 1 + default rstride: (y_tiles << 16) | x_tiles + default x_tiles: RU128 (width) >> tile_width + default y_tiles: RU64 (height) >> (tile_height + 1) + + Image + default size: RU128 (width) * (RU32 (height) + RU64 (height) / 2) + tile mode: ZFLIPZ_2X2 + tile width: 6 + tile height: 5 diff --git a/markdown/design/orc-integration.md b/markdown/design/orc-integration.md new file mode 100644 index 0000000..d569a2e --- /dev/null +++ b/markdown/design/orc-integration.md @@ -0,0 +1,159 @@ +# Orc Integration + +## About Orc + +Orc code can be in one of two forms: in .orc files that is converted by +orcc to C code that calls liborc functions, or C code that calls liborc +to create complex operations at runtime. The former is mostly for +functions with predetermined functionality. The latter is for +functionality that is determined at runtime, where writing .orc +functions for all combinations would be prohibitive. Orc also has a fast +memcpy and memset which are useful independently. + +## Fast memcpy() + +\*\*\* This part is not integrated yet. \*\*\* + +Orc has built-in functions `orc_memcpy()` and `orc_memset()` that work +like `memcpy()` and `memset()`. These are meant for large copies only. A +reasonable cutoff for using `orc_memcpy()` instead of `memcpy()` is if the +number of bytes is generally greater than 100. **DO NOT** use `orc_memcpy()` +if the typical is size is less than 20 bytes, especially if the size is +known at compile time, as these cases are inlined by the compiler. 
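+
+As a hedged illustration of that guideline (not part of the original text;
+`<orc/orc.h>` is assumed here to be the umbrella header declaring
+`orc_memcpy()`), a caller might pick between the two like this:
+
+    #include <glib.h>
+    #include <string.h>
+    #include <orc/orc.h>
+
+    static void
+    copy_video_line (guint8 * dst, const guint8 * src, int stride)
+    {
+      if (stride > 100)
+        orc_memcpy (dst, src, stride);  /* large copy: worth dispatching to Orc */
+      else
+        memcpy (dst, src, stride);      /* small copy: the compiler inlines this */
+    }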
+
+(Example: sys/ximage/ximagesink.c)
+
+Add $(ORC\_CFLAGS) to libgstximagesink\_la\_CFLAGS and $(ORC\_LIBS) to
+libgstximagesink\_la\_LIBADD. Then, in the source file, add:
+
+    #ifdef HAVE_ORC
+    #include <orc/orc.h>
+    #else
+    #define orc_memcpy(a,b,c) memcpy(a,b,c)
+    #endif
+
+Then switch relevant uses of memcpy() to orc\_memcpy().
+
+The above example works whether or not Orc is enabled at compile time.
+
+## Normal Usage
+
+The following lines are added near the top of Makefile.am for plugins
+that use Orc code in .orc files (this is for the volume plugin):
+
+    ORC_BASE=volume
+    include $(top_srcdir)/common/orc.mk
+
+Also add the generated source file to the plugin build:
+
+    nodist_libgstvolume_la_SOURCES = $(ORC_SOURCES)
+
+And of course, add $(ORC\_CFLAGS) to libgstvolume\_la\_CFLAGS, and
+$(ORC\_LIBS) to libgstvolume\_la\_LIBADD.
+
+The value assigned to ORC\_BASE does not need to be related to the name
+of the plugin.
+
+## Advanced Usage
+
+The Holy Grail of Orc usage is to programmatically generate Orc code at
+runtime, have liborc compile it into binary code at runtime, and then
+execute this code. Currently, the best example of this is in
+Schroedinger. An example of how this would be used is audioconvert:
+given an input format, channel position manipulation, dithering and
+quantizing configuration, and output format, an Orc code generator would
+create an OrcProgram, add the appropriate instructions to do each step
+based on the configuration, and then compile the program. Successfully
+compiling the program would return a function pointer that can be called
+to perform the operation.
+
+This sort of advanced usage requires structural changes to current
+plugins (e.g., audioconvert) and will probably be developed
+incrementally. Moreover, if such code is intended to be used without Orc
+as a strict build/runtime requirement, two codepaths would need to be
+developed and tested. For this reason, until GStreamer requires Orc, I
+think it's a good idea to restrict such advanced usage to the cog plugin
+in -bad, which requires Orc.
+
+## Build Process
+
+The goal of the build process is to make Orc non-essential for most
+developers and users. This is not to say you shouldn't have Orc
+installed -- without it, you will get slow backup C code, just that
+people compiling GStreamer are not forced to switch from Liboil to Orc
+immediately.
+
+With Orc installed, the build process will use the Orc Compiler (orcc)
+to convert each .orc file into a temporary C source (tmp-orc.c) and a
+temporary header file (${name}orc.h if constructed from ${base}.orc).
+The C source file is compiled and linked to the plugin, and the header
+file is included by other source files in the plugin.
+
+If 'make orc-update' is run in the source directory, the files tmp-orc.c
+and ${base}orc.h are copied to ${base}orc-dist.c and ${base}orc-dist.h
+respectively. The -dist.\[ch\] files are automatically disted via
+orc.mk. The -dist.\[ch\] files should be checked in to git whenever the
+.orc source is changed and checked in. Example workflow:
+
+    edit .orc file
+    ... make, test, etc.
+    make orc-update
+    git add volume.orc volumeorc-dist.c volumeorc-dist.h
+    git commit
+
+At 'make dist' time, all of the .orc files are compiled, and then copied
+to their -dist.\[ch\] counterparts, and then the -dist.\[ch\] files are
+added to the dist directory.
+
+Without Orc installed (or --disable-orc given to configure), the
+-dist.\[ch\] files are copied to tmp-orc.c and ${name}orc.h.
When +compiled Orc disabled, DISABLE\_ORC is defined in config.h, and the C +backup code is compiled. This backup code is pure C, and does not +include orc headers or require linking against liborc. + +The common/orc.mk build method is limited by the inflexibility of +automake. The file tmp-orc.c must be a fixed filename, using ORC\_NAME +to generate the filename does not work because it conflicts with +automake's dependency generation. Building multiple .orc files is not +possible due to this restriction. + +## Testing + +If you create another .orc file, please add it to tests/orc/Makefile.am. +This causes automatic test code to be generated and run during 'make +check'. Each function in the .orc file is tested by comparing the +results of executing the run-time compiled code and the C backup +function. + +## Orc Limitations + +### audioconvert + +Orc doesn't have a mechanism for generating random numbers, which +prevents its use as-is for dithering. One way around this is to generate +suitable dithering values in one pass, then use those values in a second +Orc-based pass. + +Orc doesn't handle 64-bit float, for no good reason. + +Irrespective of Orc handling 64-bit float, it would be useful to have a +direct 32-bit float to 16-bit integer conversion. + +audioconvert is a good candidate for programmatically generated Orc code. + +audioconvert enumerates functions in terms of big-endian vs. +little-endian. Orc's functions are "native" and "swapped". +Programmatically generating code removes the need to worry about this. + +Orc doesn't handle 24-bit samples. Fixing this is not a priority (for ds). + +### videoscale + +Orc doesn't handle horizontal resampling yet. The plan is to add special +sampling opcodes, for nearest, bilinear, and cubic interpolation. + +### videotestsrc + +Lots of code in videotestsrc needs to be rewritten to be SIMD (and Orc) +friendly, e.g., stuff that uses `oil_splat_u8()`. + +A fast low-quality random number generator in Orc would be useful here. + +### volume + +Many of the comments on audioconvert apply here as well. + +There are a bunch of FIXMEs in here that are due to misapplied patches. diff --git a/markdown/design/playbin.md b/markdown/design/playbin.md new file mode 100644 index 0000000..ed6babf --- /dev/null +++ b/markdown/design/playbin.md @@ -0,0 +1,66 @@ +# playbin + +The purpose of this element is to decode and render the media contained +in a given generic uri. The element extends GstPipeline and is typically +used in playback situations. + +Required features: + + - accept and play any valid uri. This includes + - rendering video/audio + - overlaying subtitles on the video + - optionally read external subtitle files + - allow for hardware (non raw) sinks + - selection of audio/video/subtitle streams based on language. + - perform network buffering/incremental download + - gapless playback + - support for visualisations with configurable sizes + - ability to reject files that are too big, or of a format that would + require too much CPU/memory usage. + - be very efficient with adding elements such as converters to reduce + the amount of negotiation that has to happen. + - handle chained oggs. This includes having support for dynamic pad + add and remove from a demuxer. 
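+
+As a quick illustration of the element in use (not part of the original
+design text; the file URI below is only a placeholder), a minimal
+application-side sketch could look like this:
+
+    /* Minimal playbin usage sketch; error handling omitted for brevity. */
+    #include <gst/gst.h>
+
+    int
+    main (int argc, char **argv)
+    {
+      GstElement *playbin;
+      GstBus *bus;
+      GstMessage *msg;
+
+      gst_init (&argc, &argv);
+
+      playbin = gst_element_factory_make ("playbin", NULL);
+      g_object_set (playbin, "uri", "file:///path/to/media.ogv", NULL);
+
+      gst_element_set_state (playbin, GST_STATE_PLAYING);
+
+      /* block until playback finishes or an error is posted on the bus */
+      bus = gst_element_get_bus (playbin);
+      msg = gst_bus_timed_pop_filtered (bus, GST_CLOCK_TIME_NONE,
+          GST_MESSAGE_ERROR | GST_MESSAGE_EOS);
+
+      if (msg)
+        gst_message_unref (msg);
+      gst_object_unref (bus);
+      gst_element_set_state (playbin, GST_STATE_NULL);
+      gst_object_unref (playbin);
+
+      return 0;
+    }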
+ +## Components + +### decodebin + + - performs the autoplugging of demuxers/decoders + - emits signals when for steering the autoplugging + - to decide if a non-raw media format is acceptable as output + - to sort the possible decoders for a non-raw format + - see also decodebin2 design doc + +### uridecodebin + + - combination of a source to handle the given uri, an optional + queueing element and one or more decodebin2 elements to decode the + non-raw streams. + +### playsink + + - handles display of audio/video/text. + - has request audio/video/text input pad. There is only one sinkpad + per type. The requested pads define the configuration of the + internal pipeline. + - allows for setting audio/video sinks or does automatic + sink selection. + - allows for configuration of visualisation element. + - allows for enable/disable of visualisation, audio and video. + +### playbin + + - combination of one or more uridecodebin elements to read the uri and + subtitle uri. + - support for queuing new media to support gapless playback. + - handles stream selection. + - uses playsink to display. + - selection of sinks and configuration of uridecodebin with raw + output formats. + +## Gapless playback feature + +playbin has an "about-to-finish" signal. The application should +configure a new uri (and optional suburi) in the callback. When the +current media finishes, this new media will be played next. diff --git a/markdown/design/stereo-multiview-video.md b/markdown/design/stereo-multiview-video.md new file mode 100644 index 0000000..1bda44c --- /dev/null +++ b/markdown/design/stereo-multiview-video.md @@ -0,0 +1,320 @@ +# Stereoscopic & Multiview Video Handling + +There are two cases to handle: + + - Encoded video output from a demuxer to parser / decoder or from encoders + into a muxer. + + - Raw video buffers + +The design below is somewhat based on the proposals from +[bug 611157](https://bugzilla.gnome.org/show_bug.cgi?id=611157) + +Multiview is used as a generic term to refer to handling both +stereo content (left and right eye only) as well as extensions for videos +containing multiple independent viewpoints. + +## Encoded Signalling + +This is regarding the signalling in caps and buffers from demuxers to +parsers (sometimes) or out from encoders. + +For backward compatibility with existing codecs many transports of +stereoscopic 3D content use normal 2D video with 2 views packed spatially +in some way, and put extra new descriptions in the container/mux. + +Info in the demuxer seems to apply to stereo encodings only. For all +MVC methods I know, the multiview encoding is in the video bitstream itself +and therefore already available to decoders. Only stereo systems have been retro-fitted +into the demuxer. + +Also, sometimes extension descriptions are in the codec (e.g. H.264 SEI FPA packets) +and it would be useful to be able to put the info onto caps and buffers from the +parser without decoding. + +To handle both cases, we need to be able to output the required details on +encoded video for decoders to apply onto the raw video buffers they decode. + +*If there ever is a need to transport multiview info for encoded data the +same system below for raw video or some variation should work* + +### Encoded Video: Properties that need to be encoded into caps + +1. 
multiview-mode (called "Channel Layout" in bug 611157) + * Whether a stream is mono, for a single eye, stereo, mixed-mono-stereo + (switches between mono and stereo - mp4 can do this) + * Uses a buffer flag to mark individual buffers as mono or "not mono" + (single|stereo|multiview) for mixed scenarios. The alternative (not + proposed) is for the demuxer to switch caps for each mono to not-mono + change, and not used a 'mixed' caps variant at all. + * _single_ refers to a stream of buffers that only contain 1 view. + It is different from mono in that the stream is a marked left or right + eye stream for later combining in a mixer or when displaying. + * _multiple_ marks a stream with multiple independent views encoded. + It is included in this list for completeness. As noted above, there's + currently no scenario that requires marking encoded buffers as MVC. + +2. Frame-packing arrangements / view sequence orderings + * Possible frame packings: side-by-side, side-by-side-quincunx, + column-interleaved, row-interleaved, top-bottom, checker-board + * bug 611157 - sreerenj added side-by-side-full and top-bottom-full but + I think that's covered by suitably adjusting pixel-aspect-ratio. If + not, they can be added later. + * _top-bottom_, _side-by-side_, _column-interleaved_, _row-interleaved_ are as the names suggest. + * _checker-board_, samples are left/right pixels in a chess grid +-+-+-/-+-+-+ + * _side-by-side-quincunx_. Side By Side packing, but quincunx sampling - + 1 pixel offset of each eye needs to be accounted when upscaling or displaying + * there may be other packings (future expansion) + * Possible view sequence orderings: frame-by-frame, frame-primary-secondary-tracks, sequential-row-interleaved + * _frame-by-frame_, each buffer is left, then right view etc + * _frame-primary-secondary-tracks_ - the file has 2 video tracks (primary and secondary), one is left eye, one is right. + Demuxer info indicates which one is which. + Handling this means marking each stream as all-left and all-right views, decoding separately, and combining automatically (inserting a mixer/combiner in playbin) + -> *Leave this for future expansion* + * _sequential-row-interleaved_ Mentioned by sreerenj in bug patches, I can't find a mention of such a thing. Maybe it's in MPEG-2 + -> *Leave this for future expansion / deletion* + +3. view encoding order + * Describes how to decide which piece of each frame corresponds to left or right eye + * Possible orderings left, right, left-then-right, right-then-left + - Need to figure out how we find the correct frame in the demuxer to start decoding when seeking in frame-sequential streams + - Need a buffer flag for marking the first buffer of a group. + +4. "Frame layout flags" + * flags for view specific interpretation + * horizontal-flip-left, horizontal-flip-right, vertical-flip-left, vertical-flip-right + Indicates that one or more views has been encoded in a flipped orientation, usually due to camera with mirror or displays with mirrors. + * This should be an actual flags field. Registered GLib flags types aren't generally well supported in our caps - the type might not be loaded/registered yet when parsing a caps string, so they can't be used in caps templates in the registry. 
+ * It might be better just to use a hex value / integer + +## Buffer representation for raw video + + - Transported as normal video buffers with extra metadata + - The caps define the overall buffer width/height, with helper functions to + extract the individual views for packed formats + - pixel-aspect-ratio adjusted if needed to double the overall width/height + - video sinks that don't know about multiview extensions yet will show the + packed view as-is. For frame-sequence outputs, things might look weird, but + just adding multiview-mode to the sink caps can disallow those transports. + - _row-interleaved_ packing is actually just side-by-side memory layout with + half frame width, twice the height, so can be handled by adjusting the + overall caps and strides + - Other exotic layouts need new pixel formats defined (checker-board, + column-interleaved, side-by-side-quincunx) + - _Frame-by-frame_ - one view per buffer, but with alternating metas marking + which buffer is which left/right/other view and using a new buffer flag as + described above to mark the start of a group of corresponding frames. + - New video caps addition as for encoded buffers + +### Proposed Caps fields + +Combining the requirements above and collapsing the combinations into mnemonics: + +* multiview-mode = + mono | left | right | sbs | sbs-quin | col | row | topbot | checkers | + frame-by-frame | mixed-sbs | mixed-sbs-quin | mixed-col | mixed-row | + mixed-topbot | mixed-checkers | mixed-frame-by-frame | multiview-frames mixed-multiview-frames + +* multiview-flags = + + 0x0000 none + + 0x0001 right-view-first + + 0x0002 left-h-flipped + + 0x0004 left-v-flipped + + 0x0008 right-h-flipped + + 0x0010 right-v-flipped + +### Proposed new buffer flags + +Add two new `GST_VIDEO_BUFFER_*` flags in video-frame.h and make it clear that +those flags can apply to encoded video buffers too. wtay says that's currently +the case anyway, but the documentation should say it. + + - **`GST_VIDEO_BUFFER_FLAG_MULTIPLE_VIEW`** - Marks a buffer as representing + non-mono content, although it may be a single (left or right) eye view. + + - **`GST_VIDEO_BUFFER_FLAG_FIRST_IN_BUNDLE`** - for frame-sequential methods of + transport, mark the "first" of a left/right/other group of frames + +### A new GstMultiviewMeta + +This provides a place to describe all provided views in a buffer / stream, +and through Meta negotiation to inform decoders about which views to decode if +not all are wanted. + +* Logical labels/names and mapping to GstVideoMeta numbers +* Standard view labels LEFT/RIGHT, and non-standard ones (strings) + + GST_VIDEO_MULTIVIEW_VIEW_LEFT = 1 + GST_VIDEO_MULTIVIEW_VIEW_RIGHT = 2 + + struct GstVideoMultiviewViewInfo { + guint view_label; + guint meta_id; // id of the GstVideoMeta for this view + + padding; + } + + struct GstVideoMultiviewMeta { + guint n_views; + GstVideoMultiviewViewInfo *view_info; + } + +The meta is optional, and probably only useful later for MVC + + +## Outputting stereo content + +The initial implementation for output will be stereo content in glimagesink + +### Output Considerations with OpenGL + + - If we have support for stereo GL buffer formats, we can output separate + left/right eye images and let the hardware take care of display. + + - Otherwise, glimagesink needs to render one window with left/right in a + suitable frame packing and that will only show correctly in fullscreen on a + device set for the right 3D packing -> requires app intervention to set the + video mode. 
+ + - Which could be done manually on the TV, or with HDMI 1.4 by setting the + right video mode for the screen to inform the TV or third option, we support + rendering to two separate overlay areas on the screen - one for left eye, + one for right which can be supported using the 'splitter' element and two + output sinks or, better, add a 2nd window overlay for split stereo output + + - Intel hardware doesn't do stereo GL buffers - only nvidia and AMD, so + initial implementation won't include that + +## Other elements for handling multiview content + + - videooverlay interface extensions + - __Q__: Should this be a new interface? + - Element message to communicate the presence of stereoscopic information to the app + - App needs to be able to override the input interpretation - ie, set multiview-mode and multiview-flags + - Most videos I've seen are side-by-side or top-bottom with no frame-packing metadata + - New API for the app to set rendering options for stereo/multiview content + - This might be best implemented as a **multiview GstContext**, so that + the pipeline can share app preferences for content interpretation and downmixing + to mono for output, or in the sink and have those down as far upstream/downstream as possible. + + - Converter element + - convert different view layouts + - Render to anaglyphs of different types (magenta/green, red/blue, etc) and output as mono + + - Mixer element + - take 2 video streams and output as stereo + - later take n video streams + - share code with the converter, it just takes input from n pads instead of one. + + - Splitter element + - Output one pad per view + +### Implementing MVC handling in decoders / parsers (and encoders) + +Things to do to implement MVC handling + +1. Parsing SEI in h264parse and setting caps (patches available in + bugzilla for parsing, see below) +2. Integrate gstreamer-vaapi MVC support with this proposal +3. Help with [libav MVC implementation](https://wiki.libav.org/Blueprint/MVC) +4. generating SEI in H.264 encoder +5. Support for MPEG2 MVC extensions + +## Relevant bugs + + - [bug 685215](https://bugzilla.gnome.org/show_bug.cgi?id=685215) - codecparser h264: Add initial MVC parser + - [bug 696135](https://bugzilla.gnome.org/show_bug.cgi?id=696135) - h264parse: Add mvc stream parsing support + - [bug 732267](https://bugzilla.gnome.org/show_bug.cgi?id=732267) - h264parse: extract base stream from MVC or SVC encoded streams + +## Other Information + +[Matroska 3D support notes](http://www.matroska.org/technical/specs/notes.html#3D) + +## Open Questions + +### Background + +### Representation for GstGL + +When uploading raw video frames to GL textures, the goal is to implement: + +Split packed frames into separate GL textures when uploading, and +attach multiple GstGLMemory's to the GstBuffer. The multiview-mode and +multiview-flags fields in the caps should change to reflect the conversion +from one incoming GstMemory to multiple GstGLMemory, and change the +width/height in the output info as needed. + +This is (currently) targetted as 2 render passes - upload as normal +to a single stereo-packed RGBA texture, and then unpack into 2 +smaller textures, output with GST_VIDEO_MULTIVIEW_MODE_SEPARATED, as +2 GstGLMemory attached to one buffer. We can optimise the upload later +to go directly to 2 textures for common input formats. 
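+
+A minimal sketch (not from the original text) of how a renderer could pick up
+the views once they are attached as two GstGLMemory objects on one buffer; the
+"memory 0 = left, memory 1 = right" ordering is an assumption here and would
+really be derived from the multiview-flags field:
+
+    #include <gst/gl/gl.h>
+
+    static void
+    print_view_textures (GstBuffer * buf)
+    {
+      guint i, n = gst_buffer_n_memory (buf);
+
+      for (i = 0; i < n; i++) {
+        GstMemory *mem = gst_buffer_peek_memory (buf, i);
+
+        if (gst_is_gl_memory (mem)) {
+          guint tex = gst_gl_memory_get_texture_id ((GstGLMemory *) mem);
+
+          /* view 0 would be the left eye, view 1 the right eye (assumption) */
+          g_print ("view %u -> GL texture %u\n", i, tex);
+        }
+      }
+    }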
+ +Separat output textures have a few advantages: + + - Filter elements can more easily apply filters in several passes to each + texture without fundamental changes to our filters to avoid mixing pixels + from separate views. + + - Centralises the sampling of input video frame packings in the upload code, + which makes adding new packings in the future easier. + + - Sampling multiple textures to generate various output frame-packings + for display is conceptually simpler than converting from any input packing + to any output packing. + + - In implementations that support quad buffers, having separate textures + makes it trivial to do GL_LEFT/GL_RIGHT output + +For either option, we'll need new glsink output API to pass more +information to applications about multiple views for the draw signal/callback. + +I don't know if it's desirable to support *both* methods of representing +views. If so, that should be signalled in the caps too. That could be a +new multiview-mode for passing views in separate GstMemory objects +attached to a GstBuffer, which would not be GL specific. + +### Overriding frame packing interpretation + +Most sample videos available are frame packed, with no metadata +to say so. How should we override that interpretation? + + - Simple answer: Use capssetter + new properties on playbin to + override the multiview fields. *Basically implemented in playbin, using* + *a pad probe. Needs more work for completeness* + +### Adding extra GstVideoMeta to buffers + +There should be one GstVideoMeta for the entire video frame in packed +layouts, and one GstVideoMeta per GstGLMemory when views are attached +to a GstBuffer separately. This should be done by the buffer pool, +which knows from the caps. + +### videooverlay interface extensions + +GstVideoOverlay needs: + +- A way to announce the presence of multiview content when it is + detected/signalled in a stream. +- A way to tell applications which output methods are supported/available +- A way to tell the sink which output method it should use +- Possibly a way to tell the sink to override the input frame + interpretation / caps - depends on the answer to the question + above about how to model overriding input interpretation. + +### What's implemented + +- Caps handling +- gst-plugins-base libsgstvideo pieces +- playbin caps overriding +- conversion elements - glstereomix, gl3dconvert (needs a rename), + glstereosplit. + +### Possible future enhancements + +- Make GLupload split to separate textures at upload time? + - Needs new API to extract multiple textures from the upload. Currently only outputs 1 result RGBA texture. +- Make GLdownload able to take 2 input textures, pack them and colorconvert / download as needed. + - current done by packing then downloading which isn't OK overhead for RGBA download +- Think about how we integrate GLstereo - do we need to do anything special, + or can the app just render to stereo/quad buffers if they're available? diff --git a/markdown/design/subtitle-overlays.md b/markdown/design/subtitle-overlays.md new file mode 100644 index 0000000..ebd0d83 --- /dev/null +++ b/markdown/design/subtitle-overlays.md @@ -0,0 +1,527 @@ +# Subtitle overlays, hardware-accelerated decoding and playbin + +This document describes some of the considerations and requirements that +led to the current `GstVideoOverlayCompositionMeta` API which allows +attaching of subtitle bitmaps or logos to video buffers. + +## Background + +Subtitles can be muxed in containers or come from an external source. 
+ +Subtitles come in many shapes and colours. Usually they are either +text-based (incl. 'pango markup'), or bitmap-based (e.g. DVD subtitles +and the most common form of DVB subs). Bitmap based subtitles are +usually compressed in some way, like some form of run-length encoding. + +Subtitles are currently decoded and rendered in subtitle-format-specific +overlay elements. These elements have two sink pads (one for raw video +and one for the subtitle format in question) and one raw video source +pad. + +They will take care of synchronising the two input streams, and of +decoding and rendering the subtitles on top of the raw video stream. + +Digression: one could theoretically have dedicated decoder/render +elements that output an AYUV or ARGB image, and then let a videomixer +element do the actual overlaying, but this is not very efficient, +because it requires us to allocate and blend whole pictures (1920x1080 +AYUV = 8MB, 1280x720 AYUV = 3.6MB, 720x576 AYUV = 1.6MB) even if the +overlay region is only a small rectangle at the bottom. This wastes +memory and CPU. We could do something better by introducing a new format +that only encodes the region(s) of interest, but we don't have such a +format yet, and are not necessarily keen to rewrite this part of the +logic in playbin at this point - and we can't change existing elements' +behaviour, so would need to introduce new elements for this. + +Playbin supports outputting compressed formats, i.e. it does not force +decoding to a raw format, but is happy to output to a non-raw format as +long as the sink supports that as well. + +In case of certain hardware-accelerated decoding APIs, we will make use +of that functionality. However, the decoder will not output a raw video +format then, but some kind of hardware/API-specific format (in the caps) +and the buffers will reference hardware/API-specific objects that the +hardware/API-specific sink will know how to handle. + +## The Problem + +In the case of such hardware-accelerated decoding, the decoder will not +output raw pixels that can easily be manipulated. Instead, it will +output hardware/API-specific objects that can later be used to render a +frame using the same API. + +Even if we could transform such a buffer into raw pixels, we most likely +would want to avoid that, in order to avoid the need to map the data +back into system memory (and then later back to the GPU). It's much +better to upload the much smaller encoded data to the GPU/DSP and then +leave it there until rendered. + +Before `GstVideoOverlayComposition` playbin only supported subtitles on +top of raw decoded video. It would try to find a suitable overlay element +from the plugin registry based on the input subtitle caps and the rank. +(It is assumed that we will be able to convert any raw video format into +any format required by the overlay using a converter such as videoconvert.) + +It would not render subtitles if the video sent to the sink is not raw +YUV or RGB or if conversions had been disabled by setting the +native-video flag on playbin. + +Subtitle rendering is considered an important feature. Enabling +hardware-accelerated decoding by default should not lead to a major +feature regression in this area. + +This means that we need to support subtitle rendering on top of non-raw +video. + +## Possible Solutions + +The goal is to keep knowledge of the subtitle format within the +format-specific GStreamer plugins, and knowledge of any specific video +acceleration API to the GStreamer plugins implementing that API. 
We do +not want to make the pango/dvbsuboverlay/dvdspu/kate plugins link to +libva/libvdpau/etc. and we do not want to make the vaapi/vdpau plugins +link to all of libpango/libkate/libass etc. + +Multiple possible solutions come to mind: + +1) backend-specific overlay elements + + e.g. vaapitextoverlay, vdpautextoverlay, vaapidvdspu, vdpaudvdspu, + vaapidvbsuboverlay, vdpaudvbsuboverlay, etc. + + This assumes the overlay can be done directly on the + backend-specific object passed around. + + The main drawback with this solution is that it leads to a lot of + code duplication and may also lead to uncertainty about distributing + certain duplicated pieces of code. The code duplication is pretty + much unavoidable, since making textoverlay, dvbsuboverlay, dvdspu, + kate, assrender, etc. available in form of base classes to derive + from is not really an option. Similarly, one would not really want + the vaapi/vdpau plugin to depend on a bunch of other libraries such + as libpango, libkate, libtiger, libass, etc. + + One could add some new kind of overlay plugin feature though in + combination with a generic base class of some sort, but in order to + accommodate all the different cases and formats one would end up + with quite convoluted/tricky API. + + (Of course there could also be a GstFancyVideoBuffer that provides + an abstraction for such video accelerated objects and that could + provide an API to add overlays to it in a generic way, but in the + end this is just a less generic variant of (c), and it is not clear + that there are real benefits to a specialised solution vs. a more + generic one). + +2) convert backend-specific object to raw pixels and then overlay + + Even where possible technically, this is most likely very + inefficient. + +3) attach the overlay data to the backend-specific video frame buffers + in a generic way and do the actual overlaying/blitting later in + backend-specific code such as the video sink (or an accelerated + encoder/transcoder) + + In this case, the actual overlay rendering (i.e. the actual text + rendering or decoding DVD/DVB data into pixels) is done in the + subtitle-format-specific GStreamer plugin. All knowledge about the + subtitle format is contained in the overlay plugin then, and all + knowledge about the video backend in the video backend specific + plugin. + + The main question then is how to get the overlay pixels (and we will + only deal with pixels here) from the overlay element to the video + sink. + + This could be done in multiple ways: One could send custom events + downstream with the overlay data, or one could attach the overlay + data directly to the video buffers in some way. + + Sending inline events has the advantage that is is fairly + transparent to any elements between the overlay element and the + video sink: if an effects plugin creates a new video buffer for the + output, nothing special needs to be done to maintain the subtitle + overlay information, since the overlay data is not attached to the + buffer. However, it slightly complicates things at the sink, since + it would also need to look for the new event in question instead of + just processing everything in its buffer render function. + + If one attaches the overlay data to the buffer directly, any element + between overlay and video sink that creates a new video buffer would + need to be aware of the overlay data attached to it and copy it over + to the newly-created buffer. + + One would have to do implement a special kind of new query (e.g. 
+ FEATURE query) that is not passed on automatically by + gst\_pad\_query\_default() in order to make sure that all elements + downstream will handle the attached overlay data. (This is only a + problem if we want to also attach overlay data to raw video pixel + buffers; for new non-raw types we can just make it mandatory and + assume support and be done with it; for existing non-raw types + nothing changes anyway if subtitles don't work) (we need to maintain + backwards compatibility for existing raw video pipelines like e.g.: + ..decoder \! suboverlay \! encoder..) + + Even though slightly more work, attaching the overlay information to + buffers seems more intuitive than sending it interleaved as events. + And buffers stored or passed around (e.g. via the "last-buffer" + property in the sink when doing screenshots via playbin) always + contain all the information needed. + +4) create a video/x-raw-\*-delta format and use a backend-specific + videomixer + + This possibility was hinted at already in the digression in section + 1. It would satisfy the goal of keeping subtitle format knowledge in + the subtitle plugins and video backend knowledge in the video + backend plugin. It would also add a concept that might be generally + useful (think ximagesrc capture with xdamage). However, it would + require adding foorender variants of all the existing overlay + elements, and changing playbin to that new design, which is somewhat + intrusive. And given the general nature of such a new format/API, we + would need to take a lot of care to be able to accommodate all + possible use cases when designing the API, which makes it + considerably more ambitious. Lastly, we would need to write + videomixer variants for the various accelerated video backends as + well. + +Overall (c) appears to be the most promising solution. It is the least +intrusive and should be fairly straight-forward to implement with +reasonable effort, requiring only small changes to existing elements and +requiring no new elements. + +Doing the final overlaying in the sink as opposed to a videomixer or +overlay in the middle of the pipeline has other advantages: + + - if video frames need to be dropped, e.g. for QoS reasons, we could + also skip the actual subtitle overlaying and possibly the + decoding/rendering as well, if the implementation and API allows for + that to be delayed. + + - the sink often knows the actual size of the window/surface/screen + the output video is rendered to. This *may* make it possible to + render the overlay image in a higher resolution than the input + video, solving a long standing issue with pixelated subtitles on top + of low-resolution videos that are then scaled up in the sink. This + would require for the rendering to be delayed of course instead of + just attaching an AYUV/ARGB/RGBA blog of pixels to the video buffer + in the overlay, but that could all be supported. + + - if the video backend / sink has support for high-quality text + rendering (clutter?) we could just pass the text or pango markup to + the sink and let it do the rest (this is unlikely to be supported in + the general case - text and glyph rendering is hard; also, we don't + really want to make up our own text markup system, and pango markup + is probably too limited for complex karaoke stuff). + +## API needed + +1) Representation of subtitle overlays to be rendered + + We need to pass the overlay pixels from the overlay element to the + sink somehow. 
Whatever the exact mechanism, let's assume we pass a + refcounted GstVideoOverlayComposition struct or object. + + A composition is made up of one or more overlays/rectangles. + + In the simplest case an overlay rectangle is just a blob of + RGBA/ABGR \[FIXME?\] or AYUV pixels with positioning info and other + metadata, and there is only one rectangle to render. + + We're keeping the naming generic ("OverlayFoo" rather than + "SubtitleFoo") here, since this might also be handy for other use + cases such as e.g. logo overlays or so. It is not designed for + full-fledged video stream mixing + though. + + // Note: don't mind the exact implementation details, they'll be hidden + + // FIXME: might be confusing in 0.11 though since GstXOverlay was + // renamed to GstVideoOverlay in 0.11, but not much we can do, + // maybe we can rename GstVideoOverlay to something better + + struct GstVideoOverlayComposition + { + guint num_rectangles; + GstVideoOverlayRectangle ** rectangles; + + /* lowest rectangle sequence number still used by the upstream + * overlay element. This way a renderer maintaining some kind of + * rectangles <-> surface cache can know when to free cached + * surfaces/rectangles. */ + guint min_seq_num_used; + + /* sequence number for the composition (same series as rectangles) */ + guint seq_num; + } + + struct GstVideoOverlayRectangle + { + /* Position on video frame and dimension of output rectangle in + * output frame terms (already adjusted for the PAR of the output + * frame). x/y can be negative (overlay will be clipped then) */ + gint x, y; + guint render_width, render_height; + + /* Dimensions of overlay pixels */ + guint width, height, stride; + + /* This is the PAR of the overlay pixels */ + guint par_n, par_d; + + /* Format of pixels, GST_VIDEO_FORMAT_ARGB on big-endian systems, + * and BGRA on little-endian systems (i.e. pixels are treated as + * 32-bit values and alpha is always in the most-significant byte, + * and blue is in the least-significant byte). + * + * FIXME: does anyone actually use AYUV in practice? (we do + * in our utility function to blend on top of raw video) + * What about AYUV and endianness? Do we always have [A][Y][U][V] + * in memory? */ + /* FIXME: maybe use our own enum? */ + GstVideoFormat format; + + /* Refcounted blob of memory, no caps or timestamps */ + GstBuffer *pixels; + + // FIXME: how to express source like text or pango markup? + // (just add source type enum + source buffer with data) + // + // FOR 0.10: always send pixel blobs, but attach source data in + // addition (reason: if downstream changes, we can't renegotiate + // that properly, if we just do a query of supported formats from + // the start). Sink will just ignore pixels and use pango markup + // from source data if it supports that. + // + // FOR 0.11: overlay should query formats (pango markup, pixels) + // supported by downstream and then only send that. We can + // renegotiate via the reconfigure event. + // + + /* sequence number: useful for backends/renderers/sinks that want + * to maintain a cache of rectangles <-> surfaces. The value of + * the min_seq_num_used in the composition tells the renderer which + * rectangles have expired. 
*/ + guint seq_num; + + /* FIXME: we also need a (private) way to cache converted/scaled + * pixel blobs */ + } + + (a1) Overlay consumer + API: + + How would this work in a video sink that supports scaling of textures: + + gst_foo_sink_render () { + /* assume only one for now */ + if video_buffer has composition: + composition = video_buffer.get_composition() + + for each rectangle in composition: + if rectangle.source_data_type == PANGO_MARKUP + actor = text_from_pango_markup (rectangle.get_source_data()) + else + pixels = rectangle.get_pixels_unscaled (FORMAT_RGBA, ...) + actor = texture_from_rgba (pixels, ...) + + .. position + scale on top of video surface ... + } + + (a2) Overlay producer + API: + + e.g. logo or subpicture overlay: got pixels, stuff into rectangle: + + if (logoverlay->cached_composition == NULL) { + comp = composition_new (); + + rect = rectangle_new (format, pixels_buf, + width, height, stride, par_n, par_d, + x, y, render_width, render_height); + + /* composition adds its own ref for the rectangle */ + composition_add_rectangle (comp, rect); + rectangle_unref (rect); + + /* buffer adds its own ref for the composition */ + video_buffer_attach_composition (comp); + + /* we take ownership of the composition and save it for later */ + logoverlay->cached_composition = comp; + } else { + video_buffer_attach_composition (logoverlay->cached_composition); + } + + FIXME: also add some API to modify render position/dimensions of a + rectangle (probably requires creation of new rectangle, unless we + handle writability like with other mini objects). + +2) Fallback overlay rendering/blitting on top of raw video + + Eventually we want to use this overlay mechanism not only for + hardware-accelerated video, but also for plain old raw video, either + at the sink or in the overlay element directly. + + Apart from the advantages listed earlier in section 3, this allows + us to consolidate a lot of overlaying/blitting code that is + currently repeated in every single overlay element in one location. + This makes it considerably easier to support a whole range of raw + video formats out of the box, add SIMD-optimised rendering using + ORC, or handle corner cases correctly. + + (Note: side-effect of overlaying raw video at the video sink is that + if e.g. a screnshotter gets the last buffer via the last-buffer + property of basesink, it would get an image without the subtitles on + top. This could probably be fixed by re-implementing the property in + GstVideoSink though. Playbin2 could handle this internally as well). + + void + gst_video_overlay_composition_blend (GstVideoOverlayComposition * comp + GstBuffer * video_buf) + { + guint n; + + g_return_if_fail (gst_buffer_is_writable (video_buf)); + g_return_if_fail (GST_BUFFER_CAPS (video_buf) != NULL); + + ... parse video_buffer caps into BlendVideoFormatInfo ... + + for each rectangle in the composition: { + + if (gst_video_format_is_yuv (video_buf_format)) { + overlay_format = FORMAT_AYUV; + } else if (gst_video_format_is_rgb (video_buf_format)) { + overlay_format = FORMAT_ARGB; + } else { + /* FIXME: grayscale? */ + return; + } + + /* this will scale and convert AYUV<->ARGB if needed */ + pixels = rectangle_get_pixels_scaled (rectangle, overlay_format); + + ... clip output rectangle ... 
+ + __do_blend (video_buf_format, video_buf->data, + overlay_format, pixels->data, + x, y, width, height, stride); + + gst_buffer_unref (pixels); + } + } + +3) Flatten all rectangles in a composition + + We cannot assume that the video backend API can handle any number of + rectangle overlays, it's possible that it only supports one single + overlay, in which case we need to squash all rectangles into one. + + However, we'll just declare this a corner case for now, and + implement it only if someone actually needs it. It's easy to add + later API-wise. Might be a bit tricky if we have rectangles with + different PARs/formats (e.g. subs and a logo), though we could + probably always just use the code from (b) with a fully transparent + video buffer to create a flattened overlay buffer. + +4) query support for the new video composition mechanism + + This is handled via GstMeta and an ALLOCATION query - we can simply + query whether downstream supports the GstVideoOverlayComposition meta. + + There appears to be no issue with downstream possibly not being + linked yet at the time when an overlay would want to do such a + query, but we would just have to default to something and update + ourselves later on a reconfigure event then. + +Other considerations: + + - renderers (overlays or sinks) may be able to handle only ARGB or + only AYUV (for most graphics/hw-API it's likely ARGB of some sort, + while our blending utility functions will likely want the same + colour space as the underlying raw video format, which is usually + YUV of some sort). We need to convert where required, and should + cache the conversion. + + - renderers may or may not be able to scale the overlay. We need to do + the scaling internally if not (simple case: just horizontal scaling + to adjust for PAR differences; complex case: both horizontal and + vertical scaling, e.g. if subs come from a different source than the + video or the video has been rescaled or cropped between overlay + element and sink). + + - renderers may be able to generate (possibly scaled) pixels on demand + from the original data (e.g. a string or RLE-encoded data). We will + ignore this for now, since this functionality can still be added + later via API additions. The most interesting case would be to pass + a pango markup string, since e.g. clutter can handle that natively. + + - renderers may be able to write data directly on top of the video + pixels (instead of creating an intermediary buffer with the overlay + which is then blended on top of the actual video frame), e.g. + dvdspu, dvbsuboverlay + +However, in the interest of simplicity, we should probably ignore the +fact that some elements can blend their overlays directly on top of the +video (decoding/uncompressing them on the fly), even more so as it's not +obvious that it's actually faster to decode the same overlay 70-90 times +(say) (ie. ca. 3 seconds of video frames) and then blend it 70-90 times +instead of decoding it once into a temporary buffer and then blending it +directly from there, possibly SIMD-accelerated. Also, this is only +relevant if the video is raw video and not some hardware-acceleration +backend object. + +And ultimately it is the overlay element that decides whether to do the +overlay right there and then or have the sink do it (if supported). It +could decide to keep doing the overlay itself for raw video and only use +our new API for non-raw video. 
+ + - renderers may want to make sure they only upload the overlay pixels + once per rectangle if that rectangle recurs in subsequent frames (as + part of the same composition or a different composition), as is + likely. This caching of e.g. surfaces needs to be done renderer-side + and can be accomplished based on the sequence numbers. The + composition contains the lowest sequence number still in use + upstream (an overlay element may want to cache created + compositions+rectangles as well after all to re-use them for + multiple frames), based on that the renderer can expire cached + objects. The caching needs to be done renderer-side because + attaching renderer-specific objects to the rectangles won't work + well given the refcounted nature of rectangles and compositions, + making it unpredictable when a rectangle or composition will be + freed or from which thread context it will be freed. The + renderer-specific objects are likely bound to other types of + renderer-specific contexts, and need to be managed in connection + with those. + + - composition/rectangles should internally provide a certain degree of + thread-safety. Multiple elements (sinks, overlay element) might + access or use the same objects from multiple threads at the same + time, and it is expected that elements will keep a ref to + compositions and rectangles they push downstream for a while, e.g. + until the current subtitle composition expires. + +## Future considerations + + - alternatives: there may be multiple versions/variants of the same + subtitle stream. On DVDs, there may be a 4:3 version and a 16:9 + version of the same subtitles. We could attach both variants and let + the renderer pick the best one for the situation (currently we just + use the 16:9 version). With totem, it's ultimately totem that adds + the 'black bars' at the top/bottom, so totem also knows if it's got + a 4:3 display and can/wants to fit 4:3 subs (which may render on top + of the bars) or not, for example. + +## Misc. FIXMEs + +TEST: should these look (roughly) alike (note text distortion) - needs +fixing in textoverlay + + gst-launch-1.0 \ + videotestsrc ! video/x-raw,width=640,height=480,pixel-aspect-ratio=1/1 \ + ! textoverlay text=Hello font-desc=72 ! xvimagesink \ + videotestsrc ! video/x-raw,width=320,height=480,pixel-aspect-ratio=2/1 \ + ! textoverlay text=Hello font-desc=72 ! xvimagesink \ + videotestsrc ! video/x-raw,width=640,height=240,pixel-aspect-ratio=1/2 \ + ! textoverlay text=Hello font-desc=72 ! 
xvimagesink diff --git a/sitemap.txt b/sitemap.txt index 6166b17..27cccab 100644 --- a/sitemap.txt +++ b/sitemap.txt @@ -141,6 +141,7 @@ index.md design/MT-refcounting.md design/TODO.md design/activation.md + design/audiosinks.md design/buffer.md design/buffering.md design/bufferpool.md @@ -149,10 +150,12 @@ index.md design/context.md design/controller.md design/conventions.md + design/decodebin.md design/dynamic.md design/element-sink.md design/element-source.md design/element-transform.md + design/encoding.md design/events.md design/framestep.md design/gstbin.md @@ -162,8 +165,13 @@ index.md design/gstobject.md design/gstpipeline.md design/draft-klass.md + design/interlaced-video.md + design/keyframe-force.md design/latency.md design/live-source.md + design/mediatype-audio-raw.md + design/mediatype-text-raw.md + design/mediatype-video-raw.md design/memory.md design/messages.md design/meta.md @@ -171,7 +179,9 @@ index.md design/miniobject.md design/missing-plugins.md design/negotiation.md + design/orc-integration.md design/overview.md + design/playbin.md design/preroll.md design/probes.md design/progress.md @@ -186,9 +196,11 @@ index.md design/sparsestreams.md design/standards.md design/states.md + design/stereo-multiview-video.md design/stream-selection.md design/stream-status.md design/streams.md + design/subtitle-overlays.md design/synchronisation.md design/draft-tagreading.md design/toc.md -- 2.7.4