From: Tim-Philipp Müller Date: Thu, 8 Dec 2016 22:59:58 +0000 (+0000) Subject: docs: design: move most design docs to gst-docs module X-Git-Tag: 1.19.3~511^2~2495 X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=46138b1b1dc3bc9bfd4ce9102deddbf7f46de223;p=platform%2Fupstream%2Fgstreamer.git docs: design: move most design docs to gst-docs module --- diff --git a/docs/design/Makefile.am b/docs/design/Makefile.am index 94dece6..bd55852 100644 --- a/docs/design/Makefile.am +++ b/docs/design/Makefile.am @@ -2,16 +2,5 @@ SUBDIRS = EXTRA_DIST = \ - design-audiosinks.txt \ - design-decodebin.txt \ - design-encoding.txt \ - design-orc-integration.txt \ draft-hw-acceleration.txt \ - draft-keyframe-force.txt \ - draft-subtitle-overlays.txt\ - draft-va.txt \ - part-interlaced-video.txt \ - part-mediatype-audio-raw.txt\ - part-mediatype-text-raw.txt\ - part-mediatype-video-raw.txt\ - part-playbin.txt + draft-va.txt diff --git a/docs/design/design-audiosinks.txt b/docs/design/design-audiosinks.txt deleted file mode 100644 index e2dafad..0000000 --- a/docs/design/design-audiosinks.txt +++ /dev/null @@ -1,138 +0,0 @@ -Audiosink design ----------------- - -Requirements: - - - must operate chain based. - Most simple playback pipelines will push audio from the decoders - into the audio sink. - - - must operate getrange based - Most professional audio applications will operate in a mode where - the audio sink pulls samples from the pipeline. This is typically - done in a callback from the audiosink requesting N samples. The - callback is either scheduled from a thread or from an interrupt - from the audio hardware device. - - - Exact sample accurate clocks. - the audiosink must be able to provide a clock that is sample - accurate even if samples are dropped or when discontinuities are - found in the stream. - - - Exact timing of playback. - The audiosink must be able to play samples at their exact times. - - - use DMA access when possible. - When the hardware can do DMA we should use it. This should also - work over bufferpools to avoid data copying to/from kernel space. - - -Design: - - The design is based on a set of base classes and the concept of a - ringbuffer of samples. - - +-----------+ - provide preroll, rendering, timing - + basesink + - caps nego - +-----+-----+ - | - +-----V----------+ - manages ringbuffer - + audiobasesink + - manages scheduling (push/pull) - +-----+----------+ - manages clock/query/seek - | - manages scheduling of samples in the ringbuffer - | - manages caps parsing - | - +-----V------+ - default ringbuffer implementation with a GThread - + audiosink + - subclasses provide open/read/close methods - +------------+ - - The ringbuffer is a contiguous piece of memory divided into segtotal - pieces of segments. Each segment has segsize bytes. - - play position - v - +---+---+---+-------------------------------------+----------+ - + 0 | 1 | 2 | .... | segtotal | - +---+---+---+-------------------------------------+----------+ - <---> - segsize bytes = N samples * bytes_per_sample. - - - The ringbuffer has a play position, which is expressed in - segments. The play position is where the device is currently reading - samples from the buffer. - - The ringbuffer can be put to the PLAYING or STOPPED state. - - In the STOPPED state no samples are played to the device and the play - pointer does not advance. - - In the PLAYING state samples are written to the device and the ringbuffer - should call a configurable callback after each segment is written to the - device. 
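 For illustration, a minimal sketch of what the default ringbuffer thread
 conceptually does while in the PLAYING state (hypothetical types and names,
 not the actual GstRingBuffer/GstAudioSink code; assume 0 == silence here):

    #include <string.h>

    typedef struct {
      unsigned char *memory;      /* segtotal * segsize bytes */
      unsigned int   segsize;     /* bytes per segment */
      unsigned int   segtotal;    /* number of segments */
      unsigned int   playseg;     /* play position, in segments */
      int            playing;     /* PLAYING (1) or STOPPED (0) */
      void         (*callback) (void *user_data);  /* per-segment callback */
      void          *user_data;
      void         (*device_write) (const unsigned char *data, unsigned int len);
    } RingBuffer;

    static void
    ring_buffer_thread (RingBuffer * rb)
    {
      while (rb->playing) {
        unsigned char *seg = rb->memory + rb->playseg * rb->segsize;

        rb->device_write (seg, rb->segsize); /* blocks until the device took it */
        memset (seg, 0, rb->segsize);        /* consumed data becomes silence */

        rb->playseg = (rb->playseg + 1) % rb->segtotal;  /* advance play pointer */

        if (rb->callback)
          rb->callback (rb->user_data);      /* configurable per-segment callback */
      }
    }
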
In this state the play pointer is advanced after each segment is - written. - - A write operation to the ringbuffer will put new samples in the ringbuffer. - If there is not enough space in the ringbuffer, the write operation will - block. The playback of the buffer never stops, even if the buffer is - empty. When the buffer is empty, silence is played by the device. - - The ringbuffer is implemented with lockfree atomic operations, especially - on the reading side so that low-latency operations are possible. - - Whenever new samples are to be put into the ringbuffer, the position of the - read pointer is taken. The required write position is taken and the diff - is made between the required and actual position. If the difference is <0, - the sample is too late. If the difference is bigger than segtotal, the - writing part has to wait for the play pointer to advance. - - -Scheduling: - - - chain based mode: - - In chain based mode, bytes are written into the ringbuffer. This operation - will eventually block when the ringbuffer is filled. - - When no samples arrive in time, the ringbuffer will play silence. Each - buffer that arrives will be placed into the ringbuffer at the correct - times. This means that dropping samples or inserting silence is done - automatically and very accurate and independend of the play pointer. - - In this mode, the ringbuffer is usually kept as full as possible. When - using a small buffer (small segsize and segtotal), the latency for audio - to start from the sink to when it is played can be kept low but at least - one context switch has to be made between read and write. - - - getrange based mode - - In getrange based mode, the audiobasesink will use the callback function - of the ringbuffer to get a segsize samples from the peer element. These - samples will then be placed in the ringbuffer at the next play position. - It is assumed that the getrange function returns fast enough to fill the - ringbuffer before the play pointer reaches the write pointer. - - In this mode, the ringbuffer is usually kept as empty as possible. There - is no context switch needed between the elements that create the samples - and the actual writing of the samples to the device. - - -DMA mode: - - - Elements that can do DMA based access to the audio device have to subclass - from the GstAudioBaseSink class and wrap the DMA ringbuffer in a subclass - of GstRingBuffer. - - The ringbuffer subclass should trigger a callback after writing or playing - each sample to the device. This callback can be triggered from a thread or - from a signal from the audio device. - - -Clocks: - - The GstAudioBaseSink class will use the ringbuffer to act as a clock provider. - It can do this by using the play pointer and the delay to calculate the - clock time. - - - diff --git a/docs/design/design-decodebin.txt b/docs/design/design-decodebin.txt deleted file mode 100644 index ec8df9a..0000000 --- a/docs/design/design-decodebin.txt +++ /dev/null @@ -1,274 +0,0 @@ -Decodebin design - -GstDecodeBin ------------- - -Description: - - Autoplug and decode to raw media - - Input : single pad with ANY caps Output : Dynamic pads - -* Contents - - _ a GstTypeFindElement connected to the single sink pad - - _ optionally a demuxer/parser - - _ optionally one or more DecodeGroup - -* Autoplugging - - The goal is to reach 'target' caps (by default raw media). - - This is done by using the GstCaps of a source pad and finding the available -demuxers/decoders GstElement that can be linked to that pad. 
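 As an illustration, that factory lookup can be sketched with the core
 registry helpers (a simplified sketch only; the real decodebin code does
 additional filtering and also honours the 'autoplug-*' signals described
 below):

    #include <gst/gst.h>

    static GList *
    find_compatible_factories (GstCaps * caps)
    {
      GList *all, *compat;

      /* all demuxers/decoders/parsers in the registry with at least
       * marginal rank */
      all = gst_element_factory_list_get_elements (
          GST_ELEMENT_FACTORY_TYPE_DECODABLE, GST_RANK_MARGINAL);

      /* keep only factories whose sink pad templates can accept @caps */
      compat = gst_element_factory_list_filter (all, caps, GST_PAD_SINK, FALSE);
      gst_plugin_feature_list_free (all);

      /* highest-ranked factories first */
      compat = g_list_sort (compat, gst_plugin_feature_rank_compare_func);

      /* caller frees with gst_plugin_feature_list_free() */
      return compat;
    }
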
- - The process starts with the source pad of typefind and stops when no more -non-target caps are left. It is commonly done while pre-rolling, but can also -happen whenever a new pad appears on any element. - - Once a target caps has been found, that pad is ghosted and the -'pad-added' signal is emitted. - - If no compatible elements can be found for a GstCaps, the pad is ghosted and -the 'unknown-type' signal is emitted. - - -* Assisted auto-plugging - - When starting the auto-plugging process for a given GstCaps, two signals are -emitted in the following way in order to allow the application/user to assist or -fine-tune the process. - - _ 'autoplug-continue' : - - gboolean user_function (GstElement * decodebin, GstPad *pad, GstCaps * caps) - - This signal is fired at the very beginning with the source pad GstCaps. If - the callback returns TRUE, the process continues normally. If the callback - returns FALSE, then the GstCaps are considered as a target caps and the - autoplugging process stops. - - - 'autoplug-factories' : - - GValueArray user_function (GstElement* decodebin, GstPad* pad, - GstCaps* caps); - - Get a list of elementfactories for @pad with @caps. This function is used to - instruct decodebin2 of the elements it should try to autoplug. The default - behaviour when this function is not overriden is to get all elements that - can handle @caps from the registry sorted by rank. - - - 'autoplug-select' : - - gint user_function (GstElement* decodebin, GstPad* pad, GstCaps* caps, - GValueArray* factories); - - This signal is fired once autoplugging has got a list of compatible - GstElementFactory. The signal is emitted with the GstCaps of the source pad - and a pointer on the GValueArray of compatible factories. - - The callback should return the index of the elementfactory in @factories - that should be tried next. - - If the callback returns -1, the autoplugging process will stop as if no - compatible factories were found. - - The default implementation of this function will try to autoplug the first - factory of the list. - -* Target Caps - - The target caps are a read/write GObject property of decodebin. - - By default the target caps are: - - _ Raw audio : audio/x-raw - - _ and raw video : video/x-raw - - _ and Text : text/plain, text/x-pango-markup - - -* media chain/group handling - - When autoplugging, all streams coming out of a demuxer will be grouped in a -DecodeGroup. - - All new source pads created on that demuxer after it has emitted the -'no-more-pads' signal will be put in another DecodeGroup. - - Only one decodegroup can be active at any given time. If a new decodegroup is -created while another one exists, that decodegroup will be set as blocking until -the existing one has drained. - - - -DecodeGroup ------------ - -Description: - - Streams belonging to the same group/chain of a media file. - -* Contents - - The DecodeGroup contains: - - _ a GstMultiQueue to which all streams of a the media group are connected. - - _ the eventual decoders which are autoplugged in order to produce the - requested target pads. - -* Proper group draining - - The DecodeGroup takes care that all the streams in the group are completely -drained (EOS has come through all source ghost pads). - -* Pre-roll and block - - The DecodeGroup has a global blocking feature. If enabled, all the ghosted -source pads for that group will be blocked. - - A method is available to unblock all blocked pads for that group. 
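 As a sketch of how such blocking could be implemented with blocking pad
 probes (shown with the GStreamer 1.0 probe API purely for illustration;
 the actual decodebin implementation differs in detail):

    static GstPadProbeReturn
    group_block_cb (GstPad * pad, GstPadProbeInfo * info, gpointer user_data)
    {
      /* hold dataflow on this pad until the probe is removed */
      return GST_PAD_PROBE_OK;
    }

    /* block a ghosted source pad of the group */
    static gulong
    group_block_pad (GstPad * ghostpad)
    {
      return gst_pad_add_probe (ghostpad,
          GST_PAD_PROBE_TYPE_BLOCK_DOWNSTREAM, group_block_cb, NULL, NULL);
    }

    /* unblock it again once the previous group has drained */
    static void
    group_unblock_pad (GstPad * ghostpad, gulong probe_id)
    {
      gst_pad_remove_probe (ghostpad, probe_id);
    }
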
- - - -GstMultiQueue -------------- - -Description: - - Multiple input-output data queue - - The GstMultiQueue achieves the same functionality as GstQueue, with a few -differences: - - * Multiple streams handling. - - The element handles queueing data on more than one stream at once. To - achieve such a feature it has request sink pads (sink_%u) and 'sometimes' src - pads (src_%u). - - When requesting a given sinkpad, the associated srcpad for that stream will - be created. Ex: requesting sink_1 will generate src_1. - - - * Non-starvation on multiple streams. - - If more than one stream is used with the element, the streams' queues will - be dynamically grown (up to a limit), in order to ensure that no stream is - risking data starvation. This guarantees that at any given time there are at - least N bytes queued and available for each individual stream. - - If an EOS event comes through a srcpad, the associated queue should be - considered as 'not-empty' in the queue-size-growing algorithm. - - - * Non-linked srcpads graceful handling. - - A GstTask is started for all srcpads when going to GST_STATE_PAUSED. - - The task are blocking against a GCondition which will be fired in two - different cases: - - _ When the associated queue has received a buffer. - - _ When the associated queue was previously declared as 'not-linked' and the - first buffer of the queue is scheduled to be pushed synchronously in - relation to the order in which it arrived globally in the element (see - 'Synchronous data pushing' below). - - When woken up by the GCondition, the GstTask will try to push the next - GstBuffer/GstEvent on the queue. If pushing the GstBuffer/GstEvent returns - GST_FLOW_NOT_LINKED, then the associated queue is marked as 'not-linked'. If - pushing the GstBuffer/GstEvent succeeded the queue will no longer be marked as - 'not-linked'. - - If pushing on all srcpads returns GstFlowReturn different from GST_FLOW_OK, - then all the srcpads' tasks are stopped and subsequent pushes on sinkpads will - return GST_FLOW_NOT_LINKED. - - * Synchronous data pushing for non-linked pads. - - In order to better support dynamic switching between streams, the multiqueue - (unlike the current GStreamer queue) continues to push buffers on non-linked - pads rather than shutting down. - - In addition, to prevent a non-linked stream from very quickly consuming all - available buffers and thus 'racing ahead' of the other streams, the element - must ensure that buffers and inlined events for a non-linked stream are pushed - in the same order as they were received, relative to the other streams - controlled by the element. This means that a buffer cannot be pushed to a - non-linked pad any sooner than buffers in any other stream which were received - before it. - - -===================================== - Parsers, decoders and auto-plugging -===================================== - -This section has DRAFT status. - -Some media formats come in different "flavours" or "stream formats". These -formats differ in the way the setup data and media data is signalled and/or -packaged. An example for this is H.264 video, where there is a bytestream -format (with codec setup data signalled inline and units prefixed by a sync -code and packet length information) and a "raw" format where codec setup -data is signalled out of band (via the caps) and the chunking is implicit -in the way the buffers were muxed into a container, to mention just two of -the possible variants. 
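For example, the two H.264 variants mentioned above would typically be
described by caps along these lines (field values shown for illustration
only):

    video/x-h264, stream-format=(string)byte-stream, alignment=(string)au

    video/x-h264, stream-format=(string)avc, alignment=(string)au,
        codec_data=(buffer)...

where the first form carries the codec setup data inline in the bytestream
and the second signals it out of band via the codec_data field.
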
- -Especially on embedded platforms it is common that decoders can only -handle one particular stream format, and not all of them. - -Where there are multiple stream formats, parsers are usually expected -to be able to convert between the different formats. This will, if -implemented correctly, work as expected in a static pipeline such as - - ... ! parser ! decoder ! sink - -where the parser can query the decoder's capabilities even before -processing the first piece of data, and configure itself to convert -accordingly, if conversion is needed at all. - -In an auto-plugging context this is not so straight-forward though, -because elements are plugged incrementally and not before the previous -element has processes some data and decided what it will output exactly -(unless the template caps are completely fixed, then it can continue -right away, this is not always the case here though, see below). A -parser will thus have to decide on *some* output format so auto-plugging -can continue. It doesn't know anything about the available decoders and -their capabilities though, so it's possible that it will choose a format -that is not supported by any of the available decoders, or by the preferred -decoder. - -If the parser had sufficiently concise but fixed source pad template caps, -decodebin could continue to plug a decoder right away, allowing the -parser to configure itself in the same way as it would with a static -pipeline. This is not an option, unfortunately, because often the -parser needs to process some data to determine e.g. the format's profile or -other stream properties (resolution, sample rate, channel configuration, etc.), -and there may be different decoders for different profiles (e.g. DSP codec -for baseline profile, and software fallback for main/high profile; or a DSP -codec only supporting certain resolutions, with a software fallback for -unusual resolutions). So if decodebin just plugged the most highest-ranking -decoder, that decoder might not be be able to handle the actual stream later -on, which would yield an error (this is a data flow error then which would -be hard to intercept and avoid in decodebin). In other words, we can't solve -this issue by plugging a decoder right away with the parser. - -So decodebin needs to communicate to the parser the set of available decoder -caps (which would contain the relevant capabilities/restrictions such as -supported profiles, resolutions, etc.), after the usual "autoplug-*" signal -filtering/sorting of course. - -This is done by plugging a capsfilter element right after the parser, and -constructing set of filter caps from the list of available decoders (one -appends at the end just the name(s) of the caps structures from the parser -pad template caps to function as an 'ANY other' caps equivalent). This let -the parser negotiate to a supported stream format in the same way as with -the static pipeline mentioned above, but of course incur some overhead -through the additional capsfilter element. - diff --git a/docs/design/design-encoding.txt b/docs/design/design-encoding.txt deleted file mode 100644 index b28de89..0000000 --- a/docs/design/design-encoding.txt +++ /dev/null @@ -1,571 +0,0 @@ -Encoding and Muxing -------------------- - -Summary -------- - A. Problems - B. Goals - 1. EncodeBin - 2. Encoding Profile System - 3. Helper Library for Profiles - I. Use-cases researched - - -A. 
Problems this proposal attempts to solve -------------------------------------------- - -* Duplication of pipeline code for gstreamer-based applications - wishing to encode and or mux streams, leading to subtle differences - and inconsistencies across those applications. - -* No unified system for describing encoding targets for applications - in a user-friendly way. - -* No unified system for creating encoding targets for applications, - resulting in duplication of code across all applications, - differences and inconsistencies that come with that duplication, - and applications hardcoding element names and settings resulting in - poor portability. - - - -B. Goals --------- - -1. Convenience encoding element - - Create a convenience GstBin for encoding and muxing several streams, - hereafter called 'EncodeBin'. - - This element will only contain one single property, which is a - profile. - -2. Define a encoding profile system - -2. Encoding profile helper library - - Create a helper library to: - * create EncodeBin instances based on profiles, and - * help applications to create/load/save/browse those profiles. - - - - -1. EncodeBin ------------- - -1.1 Proposed API ----------------- - - EncodeBin is a GstBin subclass. - - It implements the GstTagSetter interface, by which it will proxy the - calls to the muxer. - - Only two introspectable property (i.e. usable without extra API): - * A GstEncodingProfile* - * The name of the profile to use - - When a profile is selected, encodebin will: - * Add REQUEST sinkpads for all the GstStreamProfile - * Create the muxer and expose the source pad - - Whenever a request pad is created, encodebin will: - * Create the chain of elements for that pad - * Ghost the sink pad - * Return that ghost pad - - This allows reducing the code to the minimum for applications - wishing to encode a source for a given profile: - - ... - - encbin = gst_element_factory_make("encodebin, NULL); - g_object_set (encbin, "profile", "N900/H264 HQ", NULL); - gst_element_link (encbin, filesink); - - ... - - vsrcpad = gst_element_get_src_pad(source, "src1"); - vsinkpad = gst_element_get_request_pad (encbin, "video_%u"); - gst_pad_link(vsrcpad, vsinkpad); - - ... - - -1.2 Explanation of the Various stages in EncodeBin --------------------------------------------------- - - This describes the various stages which can happen in order to end - up with a multiplexed stream that can then be stored or streamed. - -1.2.1 Incoming streams - - The streams fed to EncodeBin can be of various types: - - * Video - * Uncompressed (but maybe subsampled) - * Compressed - * Audio - * Uncompressed (audio/x-raw) - * Compressed - * Timed text - * Private streams - - -1.2.2 Steps involved for raw video encoding - -(0) Incoming Stream - -(1) Transform raw video feed (optional) - - Here we modify the various fundamental properties of a raw video - stream to be compatible with the intersection of: - * The encoder GstCaps and - * The specified "Stream Restriction" of the profile/target - - The fundamental properties that can be modified are: - * width/height - This is done with a video scaler. - The DAR (Display Aspect Ratio) MUST be respected. - If needed, black borders can be added to comply with the target DAR. - * framerate - * format/colorspace/depth - All of this is done with a colorspace converter - -(2) Actual encoding (optional for raw streams) - - An encoder (with some optional settings) is used. - -(3) Muxing - - A muxer (with some optional settings) is used. 
- -(4) Outgoing encoded and muxed stream - - -1.2.3 Steps involved for raw audio encoding - - This is roughly the same as for raw video, expect for (1) - -(1) Transform raw audo feed (optional) - - We modify the various fundamental properties of a raw audio stream to - be compatible with the intersection of: - * The encoder GstCaps and - * The specified "Stream Restriction" of the profile/target - - The fundamental properties that can be modifier are: - * Number of channels - * Type of raw audio (integer or floating point) - * Depth (number of bits required to encode one sample) - - -1.2.4 Steps involved for encoded audio/video streams - - Steps (1) and (2) are replaced by a parser if a parser is available - for the given format. - - -1.2.5 Steps involved for other streams - - Other streams will just be forwarded as-is to the muxer, provided the - muxer accepts the stream type. - - - - -2. Encoding Profile System --------------------------- - - This work is based on: - * The existing GstPreset system for elements [0] - * The gnome-media GConf audio profile system [1] - * The investigation done into device profiles by Arista and - Transmageddon [2 and 3] - -2.2 Terminology ---------------- - -* Encoding Target Category - A Target Category is a classification of devices/systems/use-cases - for encoding. - - Such a classification is required in order for: - * Applications with a very-specific use-case to limit the number of - profiles they can offer the user. A screencasting application has - no use with the online services targets for example. - * Offering the user some initial classification in the case of a - more generic encoding application (like a video editor or a - transcoder). - - Ex: - Consumer devices - Online service - Intermediate Editing Format - Screencast - Capture - Computer - -* Encoding Profile Target - A Profile Target describes a specific entity for which we wish to - encode. - A Profile Target must belong to at least one Target Category. - It will define at least one Encoding Profile. - - Ex (with category): - Nokia N900 (Consumer device) - Sony PlayStation 3 (Consumer device) - Youtube (Online service) - DNxHD (Intermediate editing format) - HuffYUV (Screencast) - Theora (Computer) - -* Encoding Profile - A specific combination of muxer, encoders, presets and limitations. - - Ex: - Nokia N900/H264 HQ - Ipod/High Quality - DVD/Pal - Youtube/High Quality - HTML5/Low Bandwith - DNxHD - -2.3 Encoding Profile --------------------- - -An encoding profile requires the following information: - - * Name - This string is not translatable and must be unique. - A recommendation to guarantee uniqueness of the naming could be: - / - * Description - This is a translatable string describing the profile - * Muxing format - This is a string containing the GStreamer media-type of the - container format. - * Muxing preset - This is an optional string describing the preset(s) to use on the - muxer. - * Multipass setting - This is a boolean describing whether the profile requires several - passes. - * List of Stream Profile - -2.3.1 Stream Profiles - -A Stream Profile consists of: - - * Type - The type of stream profile (audio, video, text, private-data) - * Encoding Format - This is a string containing the GStreamer media-type of the encoding - format to be used. If encoding is not to be applied, the raw audio - media type will be used. - * Encoding preset - This is an optional string describing the preset(s) to use on the - encoder. 
- * Restriction - This is an optional GstCaps containing the restriction of the - stream that can be fed to the encoder. - This will generally containing restrictions in video - width/heigh/framerate or audio depth. - * presence - This is an integer specifying how many streams can be used in the - containing profile. 0 means that any number of streams can be - used. - * pass - This is an integer which is only meaningful if the multipass flag - has been set in the profile. If it has been set it indicates which - pass this Stream Profile corresponds to. - -2.4 Example profile -------------------- - -The representation used here is XML only as an example. No decision is -made as to which formatting to use for storing targets and profiles. - - - Nokia N900 - Consumer Device - - Nokia N900/H264 HQ - Nokia N900/MP3 - Nokia N900/AAC - - - - - Nokia N900/H264 HQ - - High Quality H264/AAC for the Nokia N900 - - video/quicktime,variant=iso - - - audio - audio/mpeg,mpegversion=4 - Quality High/Main - audio/x-raw,channels=[1,2] - 1 - - - video - video/x-h264 - Profile Baseline/Quality High - - video/x-raw,width=[16, 800],\ - height=[16, 480],framerate=[1/1, 30000/1001] - - 1 - - - - - -2.5 API -------- - A proposed C API is contained in the gstprofile.h file in this directory. - - -2.6 Modifications required in the existing GstPreset system ------------------------------------------------------------ - -2.6.1. Temporary preset. - - Currently a preset needs to be saved on disk in order to be - used. - - This makes it impossible to have temporary presets (that exist only - during the lifetime of a process), which might be required in the - new proposed profile system - -2.6.2 Categorisation of presets. - - Currently presets are just aliases of a group of property/value - without any meanings or explanation as to how they exclude each - other. - - Take for example the H264 encoder. It can have presets for: - * passes (1,2 or 3 passes) - * profiles (Baseline, Main, ...) - * quality (Low, medium, High) - - In order to programmatically know which presets exclude each other, - we here propose the categorisation of these presets. - - This can be done in one of two ways - 1. in the name (by making the name be [:]) - This would give for example: "Quality:High", "Profile:Baseline" - 2. by adding a new _meta key - This would give for example: _meta/category:quality - -2.6.3 Aggregation of presets. - - There can be more than one choice of presets to be done for an - element (quality, profile, pass). - - This means that one can not currently describe the full - configuration of an element with a single string but with many. - - The proposal here is to extend the GstPreset API to be able to set - all presets using one string and a well-known separator ('/'). - - This change only requires changes in the core preset handling code. - - This would allow doing the following: - gst_preset_load_preset (h264enc, - "pass:1/profile:baseline/quality:high"); - -2.7 Points to be determined ---------------------------- - - This document hasn't determined yet how to solve the following - problems: - -2.7.1 Storage of profiles - - One proposal for storage would be to use a system wide directory - (like $prefix/share/gstreamer-0.10/profiles) and store XML files for - every individual profiles. - - Users could then add their own profiles in ~/.gstreamer-0.10/profiles - - This poses some limitations as to what to do if some applications - want to have some profiles limited to their own usage. - - -3. 
Helper library for profiles ------------------------------- - - These helper methods could also be added to existing libraries (like - GstPreset, GstPbUtils, ..). - - The various API proposed are in the accompanying gstprofile.h file. - -3.1 Getting user-readable names for formats - - This is already provided by GstPbUtils. - -3.2 Hierarchy of profiles - - The goal is for applications to be able to present to the user a list - of combo-boxes for choosing their output profile: - - [ Category ] # optional, depends on the application - [ Device/Site/.. ] # optional, depends on the application - [ Profile ] - - Convenience methods are offered to easily get lists of categories, - devices, and profiles. - -3.3 Creating Profiles - - The goal is for applications to be able to easily create profiles. - - The applications needs to be able to have a fast/efficient way to: - * select a container format and see all compatible streams he can use - with it. - * select a codec format and see which container formats he can use - with it. - - The remaining parts concern the restrictions to encoder - input. - -3.4 Ensuring availability of plugins for Profiles - - When an application wishes to use a Profile, it should be able to - query whether it has all the needed plugins to use it. - - This part will use GstPbUtils to query, and if needed install the - missing plugins through the installed distribution plugin installer. - - -I. Use-cases researched ------------------------ - - This is a list of various use-cases where encoding/muxing is being - used. - -* Transcoding - - The goal is to convert with as minimal loss of quality any input - file for a target use. - A specific variant of this is transmuxing (see below). - - Example applications: Arista, Transmageddon - -* Rendering timelines - - The incoming streams are a collection of various segments that need - to be rendered. - Those segments can vary in nature (i.e. the video width/height can - change). - This requires the use of identiy with the single-segment property - activated to transform the incoming collection of segments to a - single continuous segment. - - Example applications: PiTiVi, Jokosher - -* Encoding of live sources - - The major risk to take into account is the encoder not encoding the - incoming stream fast enough. This is outside of the scope of - encodebin, and should be solved by using queues between the sources - and encodebin, as well as implementing QoS in encoders and sources - (the encoders emitting QoS events, and the upstream elements - adapting themselves accordingly). - - Example applications: camerabin, cheese - -* Screencasting applications - - This is similar to encoding of live sources. - The difference being that due to the nature of the source (size and - amount/frequency of updates) one might want to do the encoding in - two parts: - * The actual live capture is encoded with a 'almost-lossless' codec - (such as huffyuv) - * Once the capture is done, the file created in the first step is - then rendered to the desired target format. - - Fixing sources to only emit region-updates and having encoders - capable of encoding those streams would fix the need for the first - step but is outside of the scope of encodebin. - - Example applications: Istanbul, gnome-shell, recordmydesktop - -* Live transcoding - - This is the case of an incoming live stream which will be - broadcasted/transmitted live. - One issue to take into account is to reduce the encoding latency to - a minimum. 
This should mostly be done by picking low-latency - encoders. - - Example applications: Rygel, Coherence - -* Transmuxing - - Given a certain file, the aim is to remux the contents WITHOUT - decoding into either a different container format or the same - container format. - Remuxing into the same container format is useful when the file was - not created properly (for example, the index is missing). - Whenever available, parsers should be applied on the encoded streams - to validate and/or fix the streams before muxing them. - - Metadata from the original file must be kept in the newly created - file. - - Example applications: Arista, Transmaggedon - -* Loss-less cutting - - Given a certain file, the aim is to extract a certain part of the - file without going through the process of decoding and re-encoding - that file. - This is similar to the transmuxing use-case. - - Example applications: PiTiVi, Transmageddon, Arista, ... - -* Multi-pass encoding - - Some encoders allow doing a multi-pass encoding. - The initial pass(es) are only used to collect encoding estimates and - are not actually muxed and outputted. - The final pass uses previously collected information, and the output - is then muxed and outputted. - -* Archiving and intermediary format - - The requirement is to have lossless - -* CD ripping - - Example applications: Sound-juicer - -* DVD ripping - - Example application: Thoggen - - - -* Research links - - Some of these are still active documents, some other not - -[0] GstPreset API documentation - http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPreset.html - -[1] gnome-media GConf profiles - http://www.gnome.org/~bmsmith/gconf-docs/C/gnome-media.html - -[2] Research on a Device Profile API - http://gstreamer.freedesktop.org/wiki/DeviceProfile - -[3] Research on defining presets usage - http://gstreamer.freedesktop.org/wiki/PresetDesign - diff --git a/docs/design/design-orc-integration.txt b/docs/design/design-orc-integration.txt deleted file mode 100644 index a6a401d..0000000 --- a/docs/design/design-orc-integration.txt +++ /dev/null @@ -1,204 +0,0 @@ - -Orc Integration -=============== - -Sections --------- - - - About Orc - - Fast memcpy() - - Normal Usage - - Build Process - - Testing - - Orc Limitations - - -About Orc ---------- - -Orc code can be in one of two forms: in .orc files that is converted -by orcc to C code that calls liborc functions, or C code that calls -liborc to create complex operations at runtime. The former is mostly -for functions with predetermined functionality. The latter is for -functionality that is determined at runtime, where writing .orc -functions for all combinations would be prohibitive. Orc also has -a fast memcpy and memset which are useful independently. - - -Fast memcpy() -------------- - -*** This part is not integrated yet. *** - -Orc has built-in functions orc_memcpy() and orc_memset() that work -like memcpy() and memset(). These are meant for large copies only. -A reasonable cutoff for using orc_memcpy() instead of memcpy() is -if the number of bytes is generally greater than 100. DO NOT use -orc_memcpy() if the typical is size is less than 20 bytes, especially -if the size is known at compile time, as these cases are inlined by -the compiler. - -(Example: sys/ximage/ximagesink.c) - -Add $(ORC_CFLAGS) to libgstximagesink_la_CFLAGS and $(ORC_LIBS) to -libgstximagesink_la_LIBADD. 
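For illustration, the relevant Makefile.am fragment could then look roughly
like this (the exact set of other flags and libraries is just a sketch):

    libgstximagesink_la_CFLAGS = $(GST_PLUGINS_BASE_CFLAGS) $(GST_CFLAGS) \
        $(X_CFLAGS) $(ORC_CFLAGS)
    libgstximagesink_la_LIBADD = $(GST_PLUGINS_BASE_LIBS) $(GST_LIBS) \
        $(X_LIBS) $(ORC_LIBS)
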
Then, in the source file, add: - - #ifdef HAVE_ORC - #include - #else - #define orc_memcpy(a,b,c) memcpy(a,b,c) - #endif - -Then switch relevant uses of memcpy() to orc_memcpy(). - -The above example works whether or not Orc is enabled at compile -time. - - -Normal Usage ------------- - -The following lines are added near the top of Makefile.am for plugins -that use Orc code in .orc files (this is for the volume plugin): - - ORC_BASE=volume - include $(top_srcdir)/common/orc.mk - -Also add the generated source file to the plugin build: - - nodist_libgstvolume_la_SOURCES = $(ORC_SOURCES) - -And of course, add $(ORC_CFLAGS) to libgstvolume_la_CFLAGS, and -$(ORC_LIBS) to libgstvolume_la_LIBADD. - -The value assigned to ORC_BASE does not need to be related to -the name of the plugin. - - -Advanced Usage --------------- - -The Holy Grail of Orc usage is to programmatically generate Orc code -at runtime, have liborc compile it into binary code at runtime, and -then execute this code. Currently, the best example of this is in -Schroedinger. An example of how this would be used is audioconvert: -given an input format, channel position manipulation, dithering and -quantizing configuration, and output format, a Orc code generator -would create an OrcProgram, add the appropriate instructions to do -each step based on the configuration, and then compile the program. -Successfully compiling the program would return a function pointer -that can be called to perform the operation. - -This sort of advanced usage requires structural changes to current -plugins (e.g., audioconvert) and will probably be developed -incrementally. Moreover, if such code is intended to be used without -Orc as strict build/runtime requirement, two codepaths would need to -be developed and tested. For this reason, until GStreamer requires -Orc, I think it's a good idea to restrict such advanced usage to the -cog plugin in -bad, which requires Orc. - - -Build Process -------------- - -The goal of the build process is to make Orc non-essential for most -developers and users. This is not to say you shouldn't have Orc -installed -- without it, you will get slow backup C code, just that -people compiling GStreamer are not forced to switch from Liboil to -Orc immediately. - -With Orc installed, the build process will use the Orc Compiler (orcc) -to convert each .orc file into a temporary C source (tmp-orc.c) and a -temporary header file (${name}orc.h if constructed from ${base}.orc). -The C source file is compiled and linked to the plugin, and the header -file is included by other source files in the plugin. - -If 'make orc-update' is run in the source directory, the files -tmp-orc.c and ${base}orc.h are copied to ${base}orc-dist.c and -${base}orc-dist.h respectively. The -dist.[ch] files are automatically -disted via orc.mk. The -dist.[ch] files should be checked in to -git whenever the .orc source is changed and checked in. Example -workflow: - - edit .orc file - ... make, test, etc. - make orc-update - git add volume.orc volumeorc-dist.c volumeorc-dist.h - git commit - -At 'make dist' time, all of the .orc files are compiled, and then -copied to their -dist.[ch] counterparts, and then the -dist.[ch] -files are added to the dist directory. - -Without Orc installed (or --disable-orc given to configure), the --dist.[ch] files are copied to tmp-orc.c and ${name}orc.h. When -compiled Orc disabled, DISABLE_ORC is defined in config.h, and -the C backup code is compiled. 
This backup code is pure C, and -does not include orc headers or require linking against liborc. - -The common/orc.mk build method is limited by the inflexibility of -automake. The file tmp-orc.c must be a fixed filename, using ORC_NAME -to generate the filename does not work because it conflicts with -automake's dependency generation. Building multiple .orc files -is not possible due to this restriction. - - -Testing -------- - -If you create another .orc file, please add it to -tests/orc/Makefile.am. This causes automatic test code to be -generated and run during 'make check'. Each function in the .orc -file is tested by comparing the results of executing the run-time -compiled code and the C backup function. - - -Orc Limitations ---------------- - -audioconvert - - Orc doesn't have a mechanism for generating random numbers, which - prevents its use as-is for dithering. One way around this is to - generate suitable dithering values in one pass, then use those - values in a second Orc-based pass. - - Orc doesn't handle 64-bit float, for no good reason. - - Irrespective of Orc handling 64-bit float, it would be useful to - have a direct 32-bit float to 16-bit integer conversion. - - audioconvert is a good candidate for programmatically generated - Orc code. - - audioconvert enumerates functions in terms of big-endian vs. - little-endian. Orc's functions are "native" and "swapped". - Programmatically generating code removes the need to worry about - this. - - Orc doesn't handle 24-bit samples. Fixing this is not a priority - (for ds). - -videoscale - - Orc doesn't handle horizontal resampling yet. The plan is to add - special sampling opcodes, for nearest, bilinear, and cubic - interpolation. - -videotestsrc - - Lots of code in videotestsrc needs to be rewritten to be SIMD - (and Orc) friendly, e.g., stuff that uses oil_splat_u8(). - - A fast low-quality random number generator in Orc would be useful - here. - -volume - - Many of the comments on audioconvert apply here as well. - - There are a bunch of FIXMEs in here that are due to misapplied - patches. - - - diff --git a/docs/design/draft-keyframe-force.txt b/docs/design/draft-keyframe-force.txt deleted file mode 100644 index 14945f0..0000000 --- a/docs/design/draft-keyframe-force.txt +++ /dev/null @@ -1,91 +0,0 @@ -Forcing keyframes ------------------ - -Consider the following use case: - - We have a pipeline that performs video and audio capture from a live source, - compresses and muxes the streams and writes the resulting data into a file. - - Inside the uncompressed video data we have a specific pattern inserted at - specific moments that should trigger a switch to a new file, meaning, we close - the existing file we are writing to and start writing to a new file. - - We want the new file to start with a keyframe so that one can start decoding - the file immediately. - -Components: - - 1) We need an element that is able to detect the pattern in the video stream. - - 2) We need to inform the video encoder that it should start encoding a keyframe - starting from exactly the frame with the pattern. - - 3) We need to inform the demuxer that it should flush out any pending data and - start creating the start of a new file with the keyframe as a first video - frame. - - 4) We need to inform the sink element that it should start writing to the next - file. This requires application interaction to instruct the sink of the new - filename. The application should also be free to ignore the boundary and - continue to write to the existing file. 
The application will typically use - an event pad probe to detect the custom event. - -Implementation: - - The implementation would consist of generating a GST_EVENT_CUSTOM_DOWNSTREAM - event that marks the keyframe boundary. This event is inserted into the - pipeline by the application upon a certain trigger. In the above use case this - trigger would be given by the element that detects the pattern, in the form of - an element message. - - The custom event would travel further downstream to instruct encoder, muxer and - sink about the possible switch. - - The information passed in the event consists of: - - name: GstForceKeyUnit - (G_TYPE_UINT64)"timestamp" : the timestamp of the buffer that - triggered the event. - (G_TYPE_UINT64)"stream-time" : the stream position that triggered the - event. - (G_TYPE_UINT64)"running-time" : the running time of the stream when the - event was triggered. - (G_TYPE_BOOLEAN)"all-headers" : Send all headers, including those in - the caps or those sent at the start of - the stream. - - .... : optional other data fields. - - Note that this event is purely informational, no element is required to - perform an action but it should forward the event downstream, just like any - other event it does not handle. - - Elements understanding the event should behave as follows: - - 1) The video encoder receives the event before the next frame. Upon reception - of the event it schedules to encode the next frame as a keyframe. - Before pushing out the encoded keyframe it must push the GstForceKeyUnit - event downstream. - - 2) The muxer receives the GstForceKeyUnit event and flushes out its current state, - preparing to produce data that can be used as a keyunit. Before pushing out - the new data it pushes the GstForceKeyUnit event downstream. - - 3) The application receives the GstForceKeyUnit on a sink padprobe of the sink - and reconfigures the sink to make it perform new actions after receiving - the next buffer. - - -Upstream --------- - -When using RTP packets can get lost or receivers can be added at any time, -they may request a new key frame. - -An downstream element sends an upstream "GstForceKeyUnit" event up the -pipeline. - -When an element produces some kind of key unit in output, but has -no such concept in its input (like an encoder that takes raw frames), -it consumes the event (doesn't pass it upstream), and instead sends -a downstream GstForceKeyUnit event and a new keyframe. diff --git a/docs/design/draft-subtitle-overlays.txt b/docs/design/draft-subtitle-overlays.txt deleted file mode 100644 index 87f2c2c..0000000 --- a/docs/design/draft-subtitle-overlays.txt +++ /dev/null @@ -1,546 +0,0 @@ -=============================================================== - Subtitle overlays, hardware-accelerated decoding and playbin -=============================================================== - -Status: EARLY DRAFT / BRAINSTORMING - - === 1. Background === - -Subtitles can be muxed in containers or come from an external source. - -Subtitles come in many shapes and colours. Usually they are either -text-based (incl. 'pango markup'), or bitmap-based (e.g. DVD subtitles -and the most common form of DVB subs). Bitmap based subtitles are -usually compressed in some way, like some form of run-length encoding. - -Subtitles are currently decoded and rendered in subtitle-format-specific -overlay elements. These elements have two sink pads (one for raw video -and one for the subtitle format in question) and one raw video source pad. 
- -They will take care of synchronising the two input streams, and of -decoding and rendering the subtitles on top of the raw video stream. - -Digression: one could theoretically have dedicated decoder/render elements -that output an AYUV or ARGB image, and then let a videomixer element do -the actual overlaying, but this is not very efficient, because it requires -us to allocate and blend whole pictures (1920x1080 AYUV = 8MB, -1280x720 AYUV = 3.6MB, 720x576 AYUV = 1.6MB) even if the overlay region -is only a small rectangle at the bottom. This wastes memory and CPU. -We could do something better by introducing a new format that only -encodes the region(s) of interest, but we don't have such a format yet, and -are not necessarily keen to rewrite this part of the logic in playbin -at this point - and we can't change existing elements' behaviour, so would -need to introduce new elements for this. - -Playbin2 supports outputting compressed formats, i.e. it does not -force decoding to a raw format, but is happy to output to a non-raw -format as long as the sink supports that as well. - -In case of certain hardware-accelerated decoding APIs, we will make use -of that functionality. However, the decoder will not output a raw video -format then, but some kind of hardware/API-specific format (in the caps) -and the buffers will reference hardware/API-specific objects that -the hardware/API-specific sink will know how to handle. - - - === 2. The Problem === - -In the case of such hardware-accelerated decoding, the decoder will not -output raw pixels that can easily be manipulated. Instead, it will -output hardware/API-specific objects that can later be used to render -a frame using the same API. - -Even if we could transform such a buffer into raw pixels, we most -likely would want to avoid that, in order to avoid the need to -map the data back into system memory (and then later back to the GPU). -It's much better to upload the much smaller encoded data to the GPU/DSP -and then leave it there until rendered. - -Currently playbin only supports subtitles on top of raw decoded video. -It will try to find a suitable overlay element from the plugin registry -based on the input subtitle caps and the rank. (It is assumed that we -will be able to convert any raw video format into any format required -by the overlay using a converter such as videoconvert.) - -It will not render subtitles if the video sent to the sink is not -raw YUV or RGB or if conversions have been disabled by setting the -native-video flag on playbin. - -Subtitle rendering is considered an important feature. Enabling -hardware-accelerated decoding by default should not lead to a major -feature regression in this area. - -This means that we need to support subtitle rendering on top of -non-raw video. - - - === 3. Possible Solutions === - -The goal is to keep knowledge of the subtitle format within the -format-specific GStreamer plugins, and knowledge of any specific -video acceleration API to the GStreamer plugins implementing -that API. We do not want to make the pango/dvbsuboverlay/dvdspu/kate -plugins link to libva/libvdpau/etc. and we do not want to make -the vaapi/vdpau plugins link to all of libpango/libkate/libass etc. - - -Multiple possible solutions come to mind: - - (a) backend-specific overlay elements - - e.g. vaapitextoverlay, vdpautextoverlay, vaapidvdspu, vdpaudvdspu, - vaapidvbsuboverlay, vdpaudvbsuboverlay, etc. - - This assumes the overlay can be done directly on the backend-specific - object passed around. 
- - The main drawback with this solution is that it leads to a lot of - code duplication and may also lead to uncertainty about distributing - certain duplicated pieces of code. The code duplication is pretty - much unavoidable, since making textoverlay, dvbsuboverlay, dvdspu, - kate, assrender, etc. available in form of base classes to derive - from is not really an option. Similarly, one would not really want - the vaapi/vdpau plugin to depend on a bunch of other libraries - such as libpango, libkate, libtiger, libass, etc. - - One could add some new kind of overlay plugin feature though in - combination with a generic base class of some sort, but in order - to accommodate all the different cases and formats one would end - up with quite convoluted/tricky API. - - (Of course there could also be a GstFancyVideoBuffer that provides - an abstraction for such video accelerated objects and that could - provide an API to add overlays to it in a generic way, but in the - end this is just a less generic variant of (c), and it is not clear - that there are real benefits to a specialised solution vs. a more - generic one). - - - (b) convert backend-specific object to raw pixels and then overlay - - Even where possible technically, this is most likely very - inefficient. - - - (c) attach the overlay data to the backend-specific video frame buffers - in a generic way and do the actual overlaying/blitting later in - backend-specific code such as the video sink (or an accelerated - encoder/transcoder) - - In this case, the actual overlay rendering (i.e. the actual text - rendering or decoding DVD/DVB data into pixels) is done in the - subtitle-format-specific GStreamer plugin. All knowledge about - the subtitle format is contained in the overlay plugin then, - and all knowledge about the video backend in the video backend - specific plugin. - - The main question then is how to get the overlay pixels (and - we will only deal with pixels here) from the overlay element - to the video sink. - - This could be done in multiple ways: One could send custom - events downstream with the overlay data, or one could attach - the overlay data directly to the video buffers in some way. - - Sending inline events has the advantage that is is fairly - transparent to any elements between the overlay element and - the video sink: if an effects plugin creates a new video - buffer for the output, nothing special needs to be done to - maintain the subtitle overlay information, since the overlay - data is not attached to the buffer. However, it slightly - complicates things at the sink, since it would also need to - look for the new event in question instead of just processing - everything in its buffer render function. - - If one attaches the overlay data to the buffer directly, any - element between overlay and video sink that creates a new - video buffer would need to be aware of the overlay data - attached to it and copy it over to the newly-created buffer. - - One would have to do implement a special kind of new query - (e.g. FEATURE query) that is not passed on automatically by - gst_pad_query_default() in order to make sure that all elements - downstream will handle the attached overlay data. 
(This is only - a problem if we want to also attach overlay data to raw video - pixel buffers; for new non-raw types we can just make it - mandatory and assume support and be done with it; for existing - non-raw types nothing changes anyway if subtitles don't work) - (we need to maintain backwards compatibility for existing raw - video pipelines like e.g.: ..decoder ! suboverlay ! encoder..) - - Even though slightly more work, attaching the overlay information - to buffers seems more intuitive than sending it interleaved as - events. And buffers stored or passed around (e.g. via the - "last-buffer" property in the sink when doing screenshots via - playbin) always contain all the information needed. - - - (d) create a video/x-raw-*-delta format and use a backend-specific videomixer - - This possibility was hinted at already in the digression in - section 1. It would satisfy the goal of keeping subtitle format - knowledge in the subtitle plugins and video backend knowledge - in the video backend plugin. It would also add a concept that - might be generally useful (think ximagesrc capture with xdamage). - However, it would require adding foorender variants of all the - existing overlay elements, and changing playbin to that new - design, which is somewhat intrusive. And given the general - nature of such a new format/API, we would need to take a lot - of care to be able to accommodate all possible use cases when - designing the API, which makes it considerably more ambitious. - Lastly, we would need to write videomixer variants for the - various accelerated video backends as well. - - -Overall (c) appears to be the most promising solution. It is the least -intrusive and should be fairly straight-forward to implement with -reasonable effort, requiring only small changes to existing elements -and requiring no new elements. - -Doing the final overlaying in the sink as opposed to a videomixer -or overlay in the middle of the pipeline has other advantages: - - - if video frames need to be dropped, e.g. for QoS reasons, - we could also skip the actual subtitle overlaying and - possibly the decoding/rendering as well, if the - implementation and API allows for that to be delayed. - - - the sink often knows the actual size of the window/surface/screen - the output video is rendered to. This *may* make it possible to - render the overlay image in a higher resolution than the input - video, solving a long standing issue with pixelated subtitles on - top of low-resolution videos that are then scaled up in the sink. - This would require for the rendering to be delayed of course instead - of just attaching an AYUV/ARGB/RGBA blog of pixels to the video buffer - in the overlay, but that could all be supported. - - - if the video backend / sink has support for high-quality text - rendering (clutter?) we could just pass the text or pango markup - to the sink and let it do the rest (this is unlikely to be - supported in the general case - text and glyph rendering is - hard; also, we don't really want to make up our own text markup - system, and pango markup is probably too limited for complex - karaoke stuff). - - - === 4. API needed === - - (a) Representation of subtitle overlays to be rendered - - We need to pass the overlay pixels from the overlay element to the - sink somehow. Whatever the exact mechanism, let's assume we pass - a refcounted GstVideoOverlayComposition struct or object. - - A composition is made up of one or more overlays/rectangles. 
- - In the simplest case an overlay rectangle is just a blob of - RGBA/ABGR [FIXME?] or AYUV pixels with positioning info and other - metadata, and there is only one rectangle to render. - - We're keeping the naming generic ("OverlayFoo" rather than - "SubtitleFoo") here, since this might also be handy for - other use cases such as e.g. logo overlays or so. It is not - designed for full-fledged video stream mixing though. - - // Note: don't mind the exact implementation details, they'll be hidden - - // FIXME: might be confusing in 0.11 though since GstXOverlay was - // renamed to GstVideoOverlay in 0.11, but not much we can do, - // maybe we can rename GstVideoOverlay to something better - - struct GstVideoOverlayComposition - { - guint num_rectangles; - GstVideoOverlayRectangle ** rectangles; - - /* lowest rectangle sequence number still used by the upstream - * overlay element. This way a renderer maintaining some kind of - * rectangles <-> surface cache can know when to free cached - * surfaces/rectangles. */ - guint min_seq_num_used; - - /* sequence number for the composition (same series as rectangles) */ - guint seq_num; - } - - struct GstVideoOverlayRectangle - { - /* Position on video frame and dimension of output rectangle in - * output frame terms (already adjusted for the PAR of the output - * frame). x/y can be negative (overlay will be clipped then) */ - gint x, y; - guint render_width, render_height; - - /* Dimensions of overlay pixels */ - guint width, height, stride; - - /* This is the PAR of the overlay pixels */ - guint par_n, par_d; - - /* Format of pixels, GST_VIDEO_FORMAT_ARGB on big-endian systems, - * and BGRA on little-endian systems (i.e. pixels are treated as - * 32-bit values and alpha is always in the most-significant byte, - * and blue is in the least-significant byte). - * - * FIXME: does anyone actually use AYUV in practice? (we do - * in our utility function to blend on top of raw video) - * What about AYUV and endianness? Do we always have [A][Y][U][V] - * in memory? */ - /* FIXME: maybe use our own enum? */ - GstVideoFormat format; - - /* Refcounted blob of memory, no caps or timestamps */ - GstBuffer *pixels; - - // FIXME: how to express source like text or pango markup? - // (just add source type enum + source buffer with data) - // - // FOR 0.10: always send pixel blobs, but attach source data in - // addition (reason: if downstream changes, we can't renegotiate - // that properly, if we just do a query of supported formats from - // the start). Sink will just ignore pixels and use pango markup - // from source data if it supports that. - // - // FOR 0.11: overlay should query formats (pango markup, pixels) - // supported by downstream and then only send that. We can - // renegotiate via the reconfigure event. - // - - /* sequence number: useful for backends/renderers/sinks that want - * to maintain a cache of rectangles <-> surfaces. The value of - * the min_seq_num_used in the composition tells the renderer which - * rectangles have expired. 
*/ - guint seq_num; - - /* FIXME: we also need a (private) way to cache converted/scaled - * pixel blobs */ - } - - (a1) Overlay consumer API: - - How would this work in a video sink that supports scaling of textures: - - gst_foo_sink_render () { - /* assume only one for now */ - if video_buffer has composition: - composition = video_buffer.get_composition() - - for each rectangle in composition: - if rectangle.source_data_type == PANGO_MARKUP - actor = text_from_pango_markup (rectangle.get_source_data()) - else - pixels = rectangle.get_pixels_unscaled (FORMAT_RGBA, ...) - actor = texture_from_rgba (pixels, ...) - - .. position + scale on top of video surface ... - } - - (a2) Overlay producer API: - - e.g. logo or subpicture overlay: got pixels, stuff into rectangle: - - if (logoverlay->cached_composition == NULL) { - comp = composition_new (); - - rect = rectangle_new (format, pixels_buf, - width, height, stride, par_n, par_d, - x, y, render_width, render_height); - - /* composition adds its own ref for the rectangle */ - composition_add_rectangle (comp, rect); - rectangle_unref (rect); - - /* buffer adds its own ref for the composition */ - video_buffer_attach_composition (comp); - - /* we take ownership of the composition and save it for later */ - logoverlay->cached_composition = comp; - } else { - video_buffer_attach_composition (logoverlay->cached_composition); - } - - FIXME: also add some API to modify render position/dimensions of - a rectangle (probably requires creation of new rectangle, unless - we handle writability like with other mini objects). - - (b) Fallback overlay rendering/blitting on top of raw video - - Eventually we want to use this overlay mechanism not only for - hardware-accelerated video, but also for plain old raw video, - either at the sink or in the overlay element directly. - - Apart from the advantages listed earlier in section 3, this - allows us to consolidate a lot of overlaying/blitting code that - is currently repeated in every single overlay element in one - location. This makes it considerably easier to support a whole - range of raw video formats out of the box, add SIMD-optimised - rendering using ORC, or handle corner cases correctly. - - (Note: side-effect of overlaying raw video at the video sink is - that if e.g. a screnshotter gets the last buffer via the last-buffer - property of basesink, it would get an image without the subtitles - on top. This could probably be fixed by re-implementing the - property in GstVideoSink though. Playbin2 could handle this - internally as well). - - void - gst_video_overlay_composition_blend (GstVideoOverlayComposition * comp - GstBuffer * video_buf) - { - guint n; - - g_return_if_fail (gst_buffer_is_writable (video_buf)); - g_return_if_fail (GST_BUFFER_CAPS (video_buf) != NULL); - - ... parse video_buffer caps into BlendVideoFormatInfo ... - - for each rectangle in the composition: { - - if (gst_video_format_is_yuv (video_buf_format)) { - overlay_format = FORMAT_AYUV; - } else if (gst_video_format_is_rgb (video_buf_format)) { - overlay_format = FORMAT_ARGB; - } else { - /* FIXME: grayscale? */ - return; - } - - /* this will scale and convert AYUV<->ARGB if needed */ - pixels = rectangle_get_pixels_scaled (rectangle, overlay_format); - - ... clip output rectangle ... 
- - __do_blend (video_buf_format, video_buf->data, - overlay_format, pixels->data, - x, y, width, height, stride); - - gst_buffer_unref (pixels); - } - } - - - (c) Flatten all rectangles in a composition - - We cannot assume that the video backend API can handle any - number of rectangle overlays, it's possible that it only - supports one single overlay, in which case we need to squash - all rectangles into one. - - However, we'll just declare this a corner case for now, and - implement it only if someone actually needs it. It's easy - to add later API-wise. Might be a bit tricky if we have - rectangles with different PARs/formats (e.g. subs and a logo), - though we could probably always just use the code from (b) - with a fully transparent video buffer to create a flattened - overlay buffer. - - (d) core API: new FEATURE query - - For 0.10 we need to add a FEATURE query, so the overlay element - can query whether the sink downstream and all elements between - the overlay element and the sink support the new overlay API. - Elements in between need to support it because the render - positions and dimensions need to be updated if the video is - cropped or rescaled, for example. - - In order to ensure that all elements support the new API, - we need to drop the query in the pad default query handler - (so it only succeeds if all elements handle it explicitly). - - Might want two variants of the feature query - one where - all elements in the chain need to support it explicitly - and one where it's enough if some element downstream - supports it. - - In 0.11 this could probably be handled via GstMeta and - ALLOCATION queries (and/or we could simply require - elements to be aware of this API from the start). - - There appears to be no issue with downstream possibly - not being linked yet at the time when an overlay would - want to do such a query. - - -Other considerations: - - - renderers (overlays or sinks) may be able to handle only ARGB or only AYUV - (for most graphics/hw-API it's likely ARGB of some sort, while our - blending utility functions will likely want the same colour space as - the underlying raw video format, which is usually YUV of some sort). - We need to convert where required, and should cache the conversion. - - - renderers may or may not be able to scale the overlay. We need to - do the scaling internally if not (simple case: just horizontal scaling - to adjust for PAR differences; complex case: both horizontal and vertical - scaling, e.g. if subs come from a different source than the video or the - video has been rescaled or cropped between overlay element and sink). - - - renderers may be able to generate (possibly scaled) pixels on demand - from the original data (e.g. a string or RLE-encoded data). We will - ignore this for now, since this functionality can still be added later - via API additions. The most interesting case would be to pass a pango - markup string, since e.g. clutter can handle that natively. - - - renderers may be able to write data directly on top of the video pixels - (instead of creating an intermediary buffer with the overlay which is - then blended on top of the actual video frame), e.g. dvdspu, dvbsuboverlay - - However, in the interest of simplicity, we should probably ignore the - fact that some elements can blend their overlays directly on top of the - video (decoding/uncompressing them on the fly), even more so as it's - not obvious that it's actually faster to decode the same overlay - 70-90 times (say) (ie. ca. 
3 seconds of video frames) and then blend - it 70-90 times instead of decoding it once into a temporary buffer - and then blending it directly from there, possibly SIMD-accelerated. - Also, this is only relevant if the video is raw video and not some - hardware-acceleration backend object. - - And ultimately it is the overlay element that decides whether to do - the overlay right there and then or have the sink do it (if supported). - It could decide to keep doing the overlay itself for raw video and - only use our new API for non-raw video. - - - renderers may want to make sure they only upload the overlay pixels once - per rectangle if that rectangle recurs in subsequent frames (as part of - the same composition or a different composition), as is likely. This caching - of e.g. surfaces needs to be done renderer-side and can be accomplished - based on the sequence numbers. The composition contains the lowest - sequence number still in use upstream (an overlay element may want to - cache created compositions+rectangles as well after all to re-use them - for multiple frames), based on that the renderer can expire cached - objects. The caching needs to be done renderer-side because attaching - renderer-specific objects to the rectangles won't work well given the - refcounted nature of rectangles and compositions, making it unpredictable - when a rectangle or composition will be freed or from which thread - context it will be freed. The renderer-specific objects are likely bound - to other types of renderer-specific contexts, and need to be managed - in connection with those. - - - composition/rectangles should internally provide a certain degree of - thread-safety. Multiple elements (sinks, overlay element) might access - or use the same objects from multiple threads at the same time, and it - is expected that elements will keep a ref to compositions and rectangles - they push downstream for a while, e.g. until the current subtitle - composition expires. - - === 5. Future considerations === - - - alternatives: there may be multiple versions/variants of the same subtitle - stream. On DVDs, there may be a 4:3 version and a 16:9 version of the same - subtitles. We could attach both variants and let the renderer pick the best - one for the situation (currently we just use the 16:9 version). With totem, - it's ultimately totem that adds the 'black bars' at the top/bottom, so totem - also knows if it's got a 4:3 display and can/wants to fit 4:3 subs (which - may render on top of the bars) or not, for example. - - === 6. Misc. FIXMEs === - -TEST: should these look (roughly) alike (note text distortion) - needs fixing in textoverlay - -gst-launch-0.10 \ - videotestsrc ! video/x-raw,width=640,height=480,pixel-aspect-ratio=1/1 ! textoverlay text=Hello font-desc=72 ! xvimagesink \ - videotestsrc ! video/x-raw,width=320,height=480,pixel-aspect-ratio=2/1 ! textoverlay text=Hello font-desc=72 ! xvimagesink \ - videotestsrc ! video/x-raw,width=640,height=240,pixel-aspect-ratio=1/2 ! textoverlay text=Hello font-desc=72 ! xvimagesink - - ~~~ THE END ~~~ - diff --git a/docs/design/part-interlaced-video.txt b/docs/design/part-interlaced-video.txt deleted file mode 100644 index 4ac678e..0000000 --- a/docs/design/part-interlaced-video.txt +++ /dev/null @@ -1,107 +0,0 @@ -Interlaced Video -================ - -Video buffers have a number of states identifiable through a combination of caps -and buffer flags. 
- -Possible states: -- Progressive -- Interlaced - - Plain - - One field - - Two fields - - Three fields - this should be a progressive buffer with a repeated 'first' - field that can be used for telecine pulldown - - Telecine - - One field - - Two fields - - Progressive - - Interlaced (a.k.a. 'mixed'; the fields are from different frames) - - Three fields - this should be a progressive buffer with a repeated 'first' - field that can be used for telecine pulldown - -Note: It can be seen that the difference between the plain interlaced and -telecine states is that in the telecine state, buffers containing two fields may -be progressive. - -Tools for identification: -- GstVideoInfo - - GstVideoInterlaceMode - enum - GST_VIDEO_INTERLACE_MODE_... - - PROGRESSIVE - - INTERLEAVED - - MIXED -- Buffers flags - GST_VIDEO_BUFFER_FLAG_... - - TFF - - RFF - - ONEFIELD - - INTERLACED - - -Identification of Buffer States -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Note that flags are not necessarily interpreted in the same way for all -different states nor are they necessarily required nor make sense in all cases. - - -Progressive -........... - -If the interlace mode in the video info corresponding to a buffer is -"progressive", then the buffer is progressive. - - -Plain Interlaced -................ - -If the video info interlace mode is "interleaved", then the buffer is plain -interlaced. - -GST_VIDEO_BUFFER_FLAG_TFF indicates whether the top or bottom field is to be -displayed first. The timestamp on the buffer corresponds to the first field. - -GST_VIDEO_BUFFER_FLAG_RFF indicates that the first field (indicated by the TFF flag) -should be repeated. This is generally only used for telecine purposes but as the -telecine state was added long after the interlaced state was added and defined, -this flag remains valid for plain interlaced buffers. - -GST_VIDEO_BUFFER_FLAG_ONEFIELD means that only the field indicated through the TFF -flag is to be used. The other field should be ignored. - - -Telecine -........ - -If video info interlace mode is "mixed" then the buffers are in some form of -telecine state. - -The TFF and ONEFIELD flags have the same semantics as for the plain interlaced -state. - -GST_VIDEO_BUFFER_FLAG_RFF in the telecine state indicates that the buffer contains -only repeated fields that are present in other buffers and are as such -unneeded. For example, in a sequence of three telecined frames, we might have: - -AtAb AtBb BtBb - -In this situation, we only need the first and third buffers as the second -buffer contains fields present in the first and third. - -Note that the following state can have its second buffer identified using the -ONEFIELD flag (and TFF not set): - -AtAb AtBb BtCb - -The telecine state requires one additional flag to be able to identify -progressive buffers. - -The presence of the GST_VIDEO_BUFFER_FLAG_INTERLACED means that the buffer is an -'interlaced' or 'mixed' buffer that contains two fields that, when combined -with fields from adjacent buffers, allow reconstruction of progressive frames. -The absence of the flag implies the buffer containing two fields is a -progressive frame. 
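   Putting the identification rules above together, here is a minimal sketch in C
   of how an element might classify an incoming buffer. It assumes the stream's
   GstVideoInfo has already been filled from the negotiated caps; the helper
   function itself is illustrative only, not existing API:

     #include <gst/video/video.h>

     /* Illustrative only: classify a buffer according to the rules above */
     static const gchar *
     classify_buffer (const GstVideoInfo * info, GstBuffer * buf)
     {
       gboolean interlaced =
           GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_INTERLACED);
       gboolean rff = GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_RFF);
       gboolean onefield =
           GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_ONEFIELD);

       switch (GST_VIDEO_INFO_INTERLACE_MODE (info)) {
         case GST_VIDEO_INTERLACE_MODE_PROGRESSIVE:
           return "progressive";
         case GST_VIDEO_INTERLACE_MODE_INTERLEAVED:
           /* plain interlaced: TFF/RFF/ONEFIELD apply as described above */
           return onefield ? "interlaced (one field)" : "interlaced (two fields)";
         case GST_VIDEO_INTERLACE_MODE_MIXED:
           /* telecine: RFF marks buffers that only repeat fields found in
            * other buffers; INTERLACED distinguishes mixed from progressive */
           if (rff)
             return "telecine, only repeated fields (droppable)";
           if (onefield)
             return "telecine, one field";
           return interlaced ? "telecine, mixed fields" : "telecine, progressive frame";
         default:
           return "unknown";
       }
     }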
- -For example in the following sequence, the third buffer would be mixed (yes, it -is a strange pattern, but it can happen): - -AtAb AtBb BtCb CtDb DtDb diff --git a/docs/design/part-mediatype-audio-raw.txt b/docs/design/part-mediatype-audio-raw.txt deleted file mode 100644 index 503ef63..0000000 --- a/docs/design/part-mediatype-audio-raw.txt +++ /dev/null @@ -1,76 +0,0 @@ -Media Types ----------- - - audio/x-raw - - format, G_TYPE_STRING, mandatory - The format of the audio samples, see the Formats section for a list - of valid sample formats. - - rate, G_TYPE_INT, mandatory - The samplerate of the audio - - channels, G_TYPE_INT, mandatory - The number of channels - - channel-mask, GST_TYPE_BITMASK, mandatory for more than 2 channels - Bitmask of channel positions present. May be omitted for mono and - stereo. May be set to 0 to denote that the channels are unpositioned. - - layout, G_TYPE_STRING, mandatory - The layout of channels within a buffer. Possible values are - "interleaved" (for LRLRLRLR) and "non-interleaved" (LLLLRRRR) - -Use GstAudioInfo and related helper API to create and parse raw audio caps. - - -Metadata -------- - - "GstAudioDownmixMeta" - A matrix for downmixing multichannel audio to a lower number of channels. - - -Formats ------- - - The following values can be used for the format string property. - - "S8" 8-bit signed PCM audio - "U8" 8-bit unsigned PCM audio - - "S16LE" 16-bit signed PCM audio - "S16BE" 16-bit signed PCM audio - "U16LE" 16-bit unsigned PCM audio - "U16BE" 16-bit unsigned PCM audio - - "S24_32LE" 24-bit signed PCM audio packed into 32-bit - "S24_32BE" 24-bit signed PCM audio packed into 32-bit - "U24_32LE" 24-bit unsigned PCM audio packed into 32-bit - "U24_32BE" 24-bit unsigned PCM audio packed into 32-bit - - "S32LE" 32-bit signed PCM audio - "S32BE" 32-bit signed PCM audio - "U32LE" 32-bit unsigned PCM audio - "U32BE" 32-bit unsigned PCM audio - - "S24LE" 24-bit signed PCM audio - "S24BE" 24-bit signed PCM audio - "U24LE" 24-bit unsigned PCM audio - "U24BE" 24-bit unsigned PCM audio - - "S20LE" 20-bit signed PCM audio - "S20BE" 20-bit signed PCM audio - "U20LE" 20-bit unsigned PCM audio - "U20BE" 20-bit unsigned PCM audio - - "S18LE" 18-bit signed PCM audio - "S18BE" 18-bit signed PCM audio - "U18LE" 18-bit unsigned PCM audio - "U18BE" 18-bit unsigned PCM audio - - "F32LE" 32-bit floating-point audio - "F32BE" 32-bit floating-point audio - "F64LE" 64-bit floating-point audio - "F64BE" 64-bit floating-point audio - diff --git a/docs/design/part-mediatype-text-raw.txt b/docs/design/part-mediatype-text-raw.txt deleted file mode 100644 index 82fbdd5..0000000 --- a/docs/design/part-mediatype-text-raw.txt +++ /dev/null @@ -1,28 +0,0 @@ -Media Types ----------- - - text/x-raw - - format, G_TYPE_STRING, mandatory - The format of the text, see the Formats section for a list of valid format - strings. - -Metadata -------- - - There are no common metas for this raw format yet. - -Formats ------- - - "utf8" plain timed utf8 text (formerly text/plain) - - Parsed timed text in utf8 format. - - "pango-markup" plain timed utf8 text with pango markup (formerly text/x-pango-markup) - - Same as "utf8", but text embedded in an XML-style markup language for - size, colour, emphasis, etc.
- - See http://developer.gnome.org/pango/stable/PangoMarkupFormat.html - diff --git a/docs/design/part-mediatype-video-raw.txt b/docs/design/part-mediatype-video-raw.txt deleted file mode 100644 index c7fc837..0000000 --- a/docs/design/part-mediatype-video-raw.txt +++ /dev/null @@ -1,1258 +0,0 @@ -Media Types ----------- - - video/x-raw - - width, G_TYPE_INT, mandatory - The width of the image in pixels. - - height, G_TYPE_INT, mandatory - The height of the image in pixels. - - framerate, GST_TYPE_FRACTION, default 0/1 - The framerate of the video, 0/1 for variable framerate - - max-framerate, GST_TYPE_FRACTION, default as framerate - For variable framerates this would be the maximum framerate that - is expected. This value is only valid when the framerate is 0/1 - - views, G_TYPE_INT, default 1 - The number of views for multiview video. Each buffer contains - multiple GstVideoMeta buffers that describe each view. Use the frame id to - get access to the different views. - - interlace-mode, G_TYPE_STRING, default progressive - The interlace mode. The following values are possible: - - "progressive" : all frames are progressive - "interleaved" : 2 fields are interleaved in one video frame. Extra buffer - flags describe the field order. - "mixed" : progressive and interleaved frames, extra buffer flags describe - the frame and fields. - "fields" : 2 fields are stored in one buffer, use the frame ID - to get access to the required field. For multiview (the - 'views' property > 1) the fields of view N can be found at - frame ID (N * 2) and (N * 2) + 1. - Each view has only half the amount of lines as noted in the - height property, pads specifying the "fields" property - must be prepared for this. This mode requires multiple - GstVideoMeta metadata to describe the fields. - - chroma-site, G_TYPE_STRING, default UNKNOWN - The chroma siting of the video frames. - - "jpeg" : GST_VIDEO_CHROMA_SITE_JPEG - "mpeg2": GST_VIDEO_CHROMA_SITE_MPEG2 - "dv" : GST_VIDEO_CHROMA_SITE_DV - - colorimetry, G_TYPE_STRING, default UNKNOWN - The colorimetry of the video frames. Predefined colorimetry is given with - the following values: - - "bt601" - "bt709" - "smpte240m" - - pixel-aspect-ratio, GST_TYPE_FRACTION, default 1/1 - The pixel aspect ratio of the video - - format, G_TYPE_STRING, mandatory - The format of the video, see the Formats section for a list of valid format - strings. - -Metadata -------- - - "GstVideoMeta" - contains the description of one video field or frame. It has - stride support and support for having multiple memory regions per frame. - - Multiple GstVideoMeta can be added to a buffer and can be identified with a - unique id. This id can be used to select fields in interlaced formats or - views in multiview formats. - - "GstVideoCropMeta" - Contains the cropping region of the video.
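 Analogous to GstAudioInfo for raw audio caps, GstVideoInfo and its helper API
 are the natural way to create and parse these caps rather than constructing
 the fields by hand. A minimal sketch in C (error handling mostly omitted):

   #include <gst/video/video.h>

   GstVideoInfo info;
   GstCaps *caps;

   /* build video/x-raw caps for 640x480 I420 at 30/1 fps */
   gst_video_info_init (&info);
   gst_video_info_set_format (&info, GST_VIDEO_FORMAT_I420, 640, 480);
   info.fps_n = 30;
   info.fps_d = 1;
   caps = gst_video_info_to_caps (&info);

   /* ... and parse caps back into a GstVideoInfo ... */
   if (gst_video_info_from_caps (&info, caps)) {
     g_print ("format %s, %dx%d\n",
         gst_video_format_to_string (GST_VIDEO_INFO_FORMAT (&info)),
         GST_VIDEO_INFO_WIDTH (&info), GST_VIDEO_INFO_HEIGHT (&info));
   }
   gst_caps_unref (caps);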
- - -Formats -------- - - "I420" planar 4:2:0 YUV - - Component 0: Y - depth: 8 - pstride: 1 - default offset: 0 - default rstride: RU4 (width) - default size: rstride (component0) * RU2 (height) - - Component 1: U - depth: 8 - pstride: 1 - default offset: size (component0) - default rstride: RU4 (RU2 (width) / 2) - default size: rstride (component1) * RU2 (height) / 2 - - Component 2: V - depth 8 - pstride: 1 - default offset: offset (component1) + size (component1) - default rstride: RU4 (RU2 (width) / 2) - default size: rstride (component2) * RU2 (height) / 2 - - Image - default size: size (component0) + - size (component1) + - size (component2) - - "YV12" planar 4:2:0 YUV - - Same as I420 but with U and V planes swapped - - Component 0: Y - depth: 8 - pstride: 1 - default offset: 0 - default rstride: RU4 (width) - default size: rstride (component0) * RU2 (height) - - Component 1: U - depth 8 - pstride: 1 - default offset: offset (component2) + size (component2) - default rstride: RU4 (RU2 (width) / 2) - default size: rstride (component1) * RU2 (height) / 2 - - Component 2: V - depth: 8 - pstride: 1 - default offset: size (component0) - default rstride: RU4 (RU2 (width) / 2) - default size: rstride (component2) * RU2 (height) / 2 - - Image - default size: size (component0) + - size (component1) + - size (component2) - - "YUY2" packed 4:2:2 YUV - - +--+--+--+--+ +--+--+--+--+ - |Y0|U0|Y1|V0| |Y2|U2|Y3|V2| ... - +--+--+--+--+ +--+--+--+--+ - - Component 0: Y - depth: 8 - pstride: 2 - offset: 0 - - Component 1: U - depth: 8 - offset: 1 - pstride: 4 - - Component 2: V - depth 8 - offset: 3 - pstride: 4 - - Image - default rstride: RU4 (width * 2) - default size: rstride (image) * height - - - "YVYU" packed 4:2:2 YUV - - Same as "YUY2" but with U and V planes swapped - - +--+--+--+--+ +--+--+--+--+ - |Y0|V0|Y1|U0| |Y2|V2|Y3|U2| ... - +--+--+--+--+ +--+--+--+--+ - - Component 0: Y - depth: 8 - pstride: 2 - offset: 0 - - Component 1: U - depth: 8 - pstride: 4 - offset: 3 - - Component 2: V - depth 8 - pstride: 4 - offset: 1 - - Image - default rstride: RU4 (width * 2) - default size: rstride (image) * height - - - "UYVY" packed 4:2:2 YUV - - +--+--+--+--+ +--+--+--+--+ - |U0|Y0|V0|Y1| |U2|Y2|V2|Y3| ... - +--+--+--+--+ +--+--+--+--+ - - Component 0: Y - depth: 8 - pstride: 2 - offset: 1 - - Component 1: U - depth: 8 - pstride: 4 - offset: 0 - - Component 2: V - depth 8 - pstride: 4 - offset: 2 - - Image - default rstride: RU4 (width * 2) - default size: rstride (image) * height - - - "AYUV" packed 4:4:4 YUV with alpha channel - - +--+--+--+--+ +--+--+--+--+ - |A0|Y0|U0|V0| |A1|Y1|U1|V1| ... - +--+--+--+--+ +--+--+--+--+ - - Component 0: Y - depth: 8 - pstride: 4 - offset: 1 - - Component 1: U - depth: 8 - pstride: 4 - offset: 2 - - Component 2: V - depth 8 - pstride: 4 - offset: 3 - - Component 3: A - depth 8 - pstride: 4 - offset: 0 - - Image - default rstride: width * 4 - default size: rstride (image) * height - - - "RGBx" sparse rgb packed into 32 bit, space last - - +--+--+--+--+ +--+--+--+--+ - |R0|G0|B0|X | |R1|G1|B1|X | ... - +--+--+--+--+ +--+--+--+--+ - - Component 0: R - depth: 8 - pstride: 4 - offset: 0 - - Component 1: G - depth: 8 - pstride: 4 - offset: 1 - - Component 2: B - depth 8 - pstride: 4 - offset: 2 - - Image - default rstride: width * 4 - default size: rstride (image) * height - - "BGRx" sparse reverse rgb packed into 32 bit, space last - - +--+--+--+--+ +--+--+--+--+ - |B0|G0|R0|X | |B1|G1|R1|X | ... 
- +--+--+--+--+ +--+--+--+--+ - - Component 0: R - depth: 8 - pstride: 4 - offset: 2 - - Component 1: G - depth: 8 - pstride: 4 - offset: 1 - - Component 2: B - depth 8 - pstride: 4 - offset: 0 - - Image - default rstride: width * 4 - default size: rstride (image) * height - - "xRGB" sparse rgb packed into 32 bit, space first - - +--+--+--+--+ +--+--+--+--+ - |X |R0|G0|B0| |X |R1|G1|B1| ... - +--+--+--+--+ +--+--+--+--+ - - Component 0: R - depth: 8 - pstride: 4 - offset: 1 - - Component 1: G - depth: 8 - pstride: 4 - offset: 2 - - Component 2: B - depth 8 - pstride: 4 - offset: 3 - - Image - default rstride: width * 4 - default size: rstride (image) * height - - "xBGR" sparse reverse rgb packed into 32 bit, space first - - +--+--+--+--+ +--+--+--+--+ - |X |B0|G0|R0| |X |B1|G1|R1| ... - +--+--+--+--+ +--+--+--+--+ - - Component 0: R - depth: 8 - pstride: 4 - offset: 3 - - Component 1: G - depth: 8 - pstride: 4 - offset: 2 - - Component 2: B - depth 8 - pstride: 4 - offset: 1 - - Image - default rstride: width * 4 - default size: rstride (image) * height - - "RGBA" rgb with alpha channel last - - +--+--+--+--+ +--+--+--+--+ - |R0|G0|B0|A0| |R1|G1|B1|A1| ... - +--+--+--+--+ +--+--+--+--+ - - Component 0: R - depth: 8 - pstride: 4 - offset: 0 - - Component 1: G - depth: 8 - pstride: 4 - offset: 1 - - Component 2: B - depth 8 - pstride: 4 - offset: 2 - - Component 3: A - depth 8 - pstride: 4 - offset: 3 - - Image - default rstride: width * 4 - default size: rstride (image) * height - - "BGRA" reverse rgb with alpha channel last - - +--+--+--+--+ +--+--+--+--+ - |B0|G0|R0|A0| |B1|G1|R1|A1| ... - +--+--+--+--+ +--+--+--+--+ - - Component 0: R - depth: 8 - pstride: 4 - offset: 2 - - Component 1: G - depth: 8 - pstride: 4 - offset: 1 - - Component 2: B - depth 8 - pstride: 4 - offset: 0 - - Component 3: A - depth 8 - pstride: 4 - offset: 3 - - Image - default rstride: width * 4 - default size: rstride (image) * height - - "ARGB" rgb with alpha channel first - - +--+--+--+--+ +--+--+--+--+ - |A0|R0|G0|B0| |A1|R1|G1|B1| ... - +--+--+--+--+ +--+--+--+--+ - - Component 0: R - depth: 8 - pstride: 4 - offset: 1 - - Component 1: G - depth: 8 - pstride: 4 - offset: 2 - - Component 2: B - depth 8 - pstride: 4 - offset: 3 - - Component 3: A - depth 8 - pstride: 4 - offset: 0 - - Image - default rstride: width * 4 - default size: rstride (image) * height - - "ABGR" reverse rgb with alpha channel first - - +--+--+--+--+ +--+--+--+--+ - |A0|R0|G0|B0| |A1|R1|G1|B1| ... - +--+--+--+--+ +--+--+--+--+ - - Component 0: R - depth: 8 - pstride: 4 - offset: 1 - - Component 1: G - depth: 8 - pstride: 4 - offset: 2 - - Component 2: B - depth 8 - pstride: 4 - offset: 3 - - Component 3: A - depth 8 - pstride: 4 - offset: 0 - - Image - default rstride: width * 4 - default size: rstride (image) * height - - "RGB" rgb - - +--+--+--+ +--+--+--+ - |R0|G0|B0| |R1|G1|B1| ... - +--+--+--+ +--+--+--+ - - Component 0: R - depth: 8 - pstride: 3 - offset: 0 - - Component 1: G - depth: 8 - pstride: 3 - offset: 1 - - Component 2: B - depth 8 - pstride: 3 - offset: 2 - - Image - default rstride: RU4 (width * 3) - default size: rstride (image) * height - - "BGR" reverse rgb - - +--+--+--+ +--+--+--+ - |B0|G0|R0| |B1|G1|R1| ... 
- +--+--+--+ +--+--+--+ - - Component 0: R - depth: 8 - pstride: 3 - offset: 2 - - Component 1: G - depth: 8 - pstride: 3 - offset: 1 - - Component 2: B - depth 8 - pstride: 3 - offset: 0 - - Image - default rstride: RU4 (width * 3) - default size: rstride (image) * height - - "Y41B" planar 4:1:1 YUV - - Component 0: Y - depth: 8 - pstride: 1 - default offset: 0 - default rstride: RU4 (width) - default size: rstride (component0) * height - - Component 1: U - depth 8 - pstride: 1 - default offset: size (component0) - default rstride: RU16 (width) / 4 - default size: rstride (component1) * height - - Component 2: V - depth: 8 - pstride: 1 - default offset: offset (component1) + size (component1) - default rstride: RU16 (width) / 4 - default size: rstride (component2) * height - - Image - default size: size (component0) + - size (component1) + - size (component2) - - "Y42B" planar 4:2:2 YUV - - Component 0: Y - depth: 8 - pstride: 1 - default offset: 0 - default rstride: RU4 (width) - default size: rstride (component0) * height - - Component 1: U - depth 8 - pstride: 1 - default offset: size (component0) - default rstride: RU8 (width) / 2 - default size: rstride (component1) * height - - Component 2: V - depth: 8 - pstride: 1 - default offset: offset (component1) + size (component1) - default rstride: RU8 (width) / 2 - default size: rstride (component2) * height - - Image - default size: size (component0) + - size (component1) + - size (component2) - - "Y444" planar 4:4:4 YUV - - Component 0: Y - depth: 8 - pstride: 1 - default offset: 0 - default rstride: RU4 (width) - default size: rstride (component0) * height - - Component 1: U - depth 8 - pstride: 1 - default offset: size (component0) - default rstride: RU4 (width) - default size: rstride (component1) * height - - Component 2: V - depth: 8 - pstride: 1 - default offset: offset (component1) + size (component1) - default rstride: RU4 (width) - default size: rstride (component2) * height - - Image - default size: size (component0) + - size (component1) + - size (component2) - - "v210" packed 4:2:2 10-bit YUV, complex format - - Component 0: Y - depth: 10 - - Component 1: U - depth 10 - - Component 2: V - depth: 10 - - Image - default rstride: RU48 (width) * 128 - default size: rstride (image) * height - - - "v216" packed 4:2:2 16-bit YUV, Y0-U0-Y1-V1 order - - +--+--+--+--+ +--+--+--+--+ - |U0|Y0|V0|Y1| |U1|Y2|V1|Y3| ... 
- +--+--+--+--+ +--+--+--+--+ - - Component 0: Y - depth: 16 LE - pstride: 4 - offset: 2 - - Component 1: U - depth 16 LE - pstride: 8 - offset: 0 - - Component 2: V - depth: 16 LE - pstride: 8 - offset: 4 - - Image - default rstride: RU8 (width * 2) - default size: rstride (image) * height - - "NV12" planar 4:2:0 YUV with interleaved UV plane - - Component 0: Y - depth: 8 - pstride: 1 - default offset: 0 - default rstride: RU4 (width) - default size: rstride (component0) * RU2 (height) - - Component 1: U - depth 8 - pstride: 2 - default offset: size (component0) - default rstride: RU4 (width) - - Component 2: V - depth: 8 - pstride: 2 - default offset: offset (component1) + 1 - default rstride: RU4 (width) - - Image - default size: RU4 (width) * RU2 (height) * 3 / 2 - - - "NV21" planar 4:2:0 YUV with interleaved VU plane - - Component 0: Y - depth: 8 - pstride: 1 - default offset: 0 - default rstride: RU4 (width) - default size: rstride (component0) * RU2 (height) - - Component 1: U - depth 8 - pstride: 2 - default offset: offset (component1) + 1 - default rstride: RU4 (width) - - Component 2: V - depth: 8 - pstride: 2 - default offset: size (component0) - default rstride: RU4 (width) - - Image - default size: RU4 (width) * RU2 (height) * 3 / 2 - - "GRAY8" 8-bit grayscale - "Y800" same as "GRAY8" - - Component 0: Y - depth: 8 - offset: 0 - pstride: 1 - default rstride: RU4 (width) - default size: rstride (component0) * height - - Image - default size: size (component0) - - "GRAY16_BE" 16-bit grayscale, most significant byte first - - Component 0: Y - depth: 16 - offset: 0 - pstride: 2 - default rstride: RU4 (width * 2) - default size: rstride (component0) * height - - Image - default size: size (component0) - - "GRAY16_LE" 16-bit grayscale, least significant byte first - "Y16" same as "GRAY16_LE" - - Component 0: Y - depth: 16 LE - offset: 0 - pstride: 2 - default rstride: RU4 (width * 2) - default size: rstride (component0) * height - - Image - default size: size (component0) - - "v308" packed 4:4:4 YUV - - +--+--+--+ +--+--+--+ - |Y0|U0|V0| |Y1|U1|V1| ... - +--+--+--+ +--+--+--+ - - Component 0: Y - depth: 8 - pstride: 3 - offset: 0 - - Component 1: U - depth 8 - pstride: 3 - offset: 1 - - Component 2: V - depth: 8 - pstride: 3 - offset: 2 - - Image - default rstride: RU4 (width * 3) - default size: rstride (image) * height - - "IYU2" packed 4:4:4 YUV, U-Y-V order - - +--+--+--+ +--+--+--+ - |U0|Y0|V0| |U1|Y1|V1| ... - +--+--+--+ +--+--+--+ - - Component 0: Y - depth: 8 - pstride: 3 - offset: 1 - - Component 1: U - depth 8 - pstride: 3 - offset: 0 - - Component 2: V - depth: 8 - pstride: 3 - offset: 2 - - Image - default rstride: RU4 (width * 3) - default size: rstride (image) * height - - "RGB16" rgb 5-6-5 bits per component - - +--+--+--+ +--+--+--+ - |R0|G0|B0| |R1|G1|B1| ... - +--+--+--+ +--+--+--+ - - Component 0: R - depth: 5 - pstride: 2 - - Component 1: G - depth 6 - pstride: 2 - - Component 2: B - depth: 5 - pstride: 2 - - Image - default rstride: RU4 (width * 2) - default size: rstride (image) * height - - "BGR16" reverse rgb 5-6-5 bits per component - - +--+--+--+ +--+--+--+ - |B0|G0|R0| |B1|G1|R1| ... - +--+--+--+ +--+--+--+ - - Component 0: R - depth: 5 - pstride: 2 - - Component 1: G - depth 6 - pstride: 2 - - Component 2: B - depth: 5 - pstride: 2 - - Image - default rstride: RU4 (width * 2) - default size: rstride (image) * height - - "RGB15" rgb 5-5-5 bits per component - - +--+--+--+ +--+--+--+ - |R0|G0|B0| |R1|G1|B1| ... 
- +--+--+--+ +--+--+--+ - - Component 0: R - depth: 5 - pstride: 2 - - Component 1: G - depth 5 - pstride: 2 - - Component 2: B - depth: 5 - pstride: 2 - - Image - default rstride: RU4 (width * 2) - default size: rstride (image) * height - - "BGR15" reverse rgb 5-5-5 bits per component - - +--+--+--+ +--+--+--+ - |B0|G0|R0| |B1|G1|R1| ... - +--+--+--+ +--+--+--+ - - Component 0: R - depth: 5 - pstride: 2 - - Component 1: G - depth 5 - pstride: 2 - - Component 2: B - depth: 5 - pstride: 2 - - Image - default rstride: RU4 (width * 2) - default size: rstride (image) * height - - "UYVP" packed 10-bit 4:2:2 YUV (U0-Y0-V0-Y1 U2-Y2-V2-Y3 U4 ...) - - Component 0: Y - depth: 10 - - Component 1: U - depth 10 - - Component 2: V - depth: 10 - - Image - default rstride: RU4 (width * 2 * 5) - default size: rstride (image) * height - - "A420" planar 4:4:2:0 AYUV - - Component 0: Y - depth: 8 - pstride: 1 - default offset: 0 - default rstride: RU4 (width) - default size: rstride (component0) * RU2 (height) - - Component 1: U - depth 8 - pstride: 1 - default offset: size (component0) - default rstride: RU4 (RU2 (width) / 2) - default size: rstride (component1) * (RU2 (height) / 2) - - Component 2: V - depth: 8 - pstride: 1 - default offset: size (component0) + size (component1) - default rstride: RU4 (RU2 (width) / 2) - default size: rstride (component2) * (RU2 (height) / 2) - - Component 3: A - depth: 8 - pstride: 1 - default offset: size (component0) + size (component1) + - size (component2) - default rstride: RU4 (width) - default size: rstride (component3) * RU2 (height) - - Image - default size: size (component0) + - size (component1) + - size (component2) + - size (component3) - - "RGB8P" 8-bit paletted RGB - - Component 0: INDEX - depth: 8 - pstride: 1 - default offset: 0 - default rstride: RU4 (width) - default size: rstride (component0) * height - - Component 1: PALETTE - depth 32 - pstride: 4 - default offset: size (component0) - rstride: 4 - size: 256 * 4 - - Image - default size: size (component0) + size (component1) - - "YUV9" planar 4:1:0 YUV - - Component 0: Y - depth: 8 - pstride: 1 - default offset: 0 - default rstride: RU4 (width) - default size: rstride (component0) * height - - Component 1: U - depth 8 - pstride: 1 - default offset: size (component0) - default rstride: RU4 (RU4 (width) / 4) - default size: rstride (component1) * (RU4 (height) / 4) - - Component 2: V - depth: 8 - pstride: 1 - default offset: offset (component1) + size (component1) - default rstride: RU4 (RU4 (width) / 4) - default size: rstride (component2) * (RU4 (height) / 4) - - Image - default size: size (component0) + - size (component1) + - size (component2) - - "YVU9" planar 4:1:0 YUV (like YUV9 but UV planes swapped) - - Component 0: Y - depth: 8 - pstride: 1 - default offset: 0 - default rstride: RU4 (width) - default size: rstride (component0) * height - - Component 1: U - depth 8 - pstride: 1 - default offset: offset (component2) + size (component2) - default rstride: RU4 (RU4 (width) / 4) - default size: rstride (component1) * (RU4 (height) / 4) - - Component 2: V - depth: 8 - pstride: 1 - default offset: size (component0) - default rstride: RU4 (RU4 (width) / 4) - default size: rstride (component2) * (RU4 (height) / 4) - - Image - default size: size (component0) + - size (component1) + - size (component2) - - "IYU1" packed 4:1:1 YUV (Cb-Y0-Y1-Cr-Y2-Y3 ...) - - +--+--+--+ +--+--+--+ - |B0|G0|R0| |B1|G1|R1| ... 
- +--+--+--+ +--+--+--+ - - Component 0: Y - depth: 8 - offset: 1 - pstride: 2 - - Component 1: U - depth 5 - offset: 0 - pstride: 2 - - Component 2: V - depth: 5 - offset: 4 - pstride: 2 - - Image - default rstride: RU4 (RU4 (width) + RU4 (width) / 2) - default size: rstride (image) * height - - "ARGB64" rgb with alpha channel first, 16 bits per channel - - +--+--+--+--+ +--+--+--+--+ - |A0|R0|G0|B0| |A1|R1|G1|B1| ... - +--+--+--+--+ +--+--+--+--+ - - Component 0: R - depth: 16 LE - pstride: 8 - offset: 2 - - Component 1: G - depth 16 LE - pstride: 8 - offset: 4 - - Component 2: B - depth: 16 LE - pstride: 8 - offset: 6 - - Component 3: A - depth: 16 LE - pstride: 8 - offset: 0 - - Image - default rstride: width * 8 - default size: rstride (image) * height - - "AYUV64" packed 4:4:4 YUV with alpha channel, 16 bits per channel (A0-Y0-U0-V0 ...) - - +--+--+--+--+ +--+--+--+--+ - |A0|Y0|U0|V0| |A1|Y1|U1|V1| ... - +--+--+--+--+ +--+--+--+--+ - - Component 0: Y - depth: 16 LE - pstride: 8 - offset: 2 - - Component 1: U - depth 16 LE - pstride: 8 - offset: 4 - - Component 2: V - depth: 16 LE - pstride: 8 - offset: 6 - - Component 3: A - depth: 16 LE - pstride: 8 - offset: 0 - - Image - default rstride: width * 8 - default size: rstride (image) * height - - "r210" packed 4:4:4 RGB, 10 bits per channel - - +--+--+--+ +--+--+--+ - |R0|G0|B0| |R1|G1|B1| ... - +--+--+--+ +--+--+--+ - - Component 0: R - depth: 10 - pstride: 4 - - Component 1: G - depth 10 - pstride: 4 - - Component 2: B - depth: 10 - pstride: 4 - - Image - default rstride: width * 4 - default size: rstride (image) * height - - "I420_10LE" planar 4:2:0 YUV, 10 bits per channel LE - - Component 0: Y - depth: 10 LE - pstride: 2 - default offset: 0 - default rstride: RU4 (width * 2) - default size: rstride (component0) * RU2 (height) - - Component 1: U - depth: 10 LE - pstride: 2 - default offset: size (component0) - default rstride: RU4 (width) - default size: rstride (component1) * RU2 (height) / 2 - - Component 2: V - depth 10 LE - pstride: 2 - default offset: offset (component1) + size (component1) - default rstride: RU4 (width) - default size: rstride (component2) * RU2 (height) / 2 - - Image - default size: size (component0) + - size (component1) + - size (component2) - - "I420_10BE" planar 4:2:0 YUV, 10 bits per channel BE - - Component 0: Y - depth: 10 BE - pstride: 2 - default offset: 0 - default rstride: RU4 (width * 2) - default size: rstride (component0) * RU2 (height) - - Component 1: U - depth: 10 BE - pstride: 2 - default offset: size (component0) - default rstride: RU4 (width) - default size: rstride (component1) * RU2 (height) / 2 - - Component 2: V - depth 10 BE - pstride: 2 - default offset: offset (component1) + size (component1) - default rstride: RU4 (width) - default size: rstride (component2) * RU2 (height) / 2 - - Image - default size: size (component0) + - size (component1) + - size (component2) - - "I422_10LE" planar 4:2:2 YUV, 10 bits per channel LE - - Component 0: Y - depth: 10 LE - pstride: 2 - default offset: 0 - default rstride: RU4 (width * 2) - default size: rstride (component0) * RU2 (height) - - Component 1: U - depth: 10 LE - pstride: 2 - default offset: size (component0) - default rstride: RU4 (width) - default size: rstride (component1) * RU2 (height) - - Component 2: V - depth 10 LE - pstride: 2 - default offset: offset (component1) + size (component1) - default rstride: RU4 (width) - default size: rstride (component2) * RU2 (height) - - Image - default size: size (component0) + - size (component1) 
+ - size (component2) - - "I422_10BE" planar 4:2:2 YUV, 10 bits per channel BE - - Component 0: Y - depth: 10 BE - pstride: 2 - default offset: 0 - default rstride: RU4 (width * 2) - default size: rstride (component0) * RU2 (height) - - Component 1: U - depth: 10 BE - pstride: 2 - default offset: size (component0) - default rstride: RU4 (width) - default size: rstride (component1) * RU2 (height) - - Component 2: V - depth 10 BE - pstride: 2 - default offset: offset (component1) + size (component1) - default rstride: RU4 (width) - default size: rstride (component2) * RU2 (height) - - Image - default size: size (component0) + - size (component1) + - size (component2) - - "Y444_10BE" planar 4:4:4 YUV, 10 bits per channel - "Y444_10LE" planar 4:4:4 YUV, 10 bits per channel - - "GBR" planar 4:4:4 RGB, 8 bits per channel - "GBR_10BE" planar 4:4:4 RGB, 10 bits per channel - "GBR_10LE" planar 4:4:4 RGB, 10 bits per channel - - "NV16" planar 4:2:2 YUV with interleaved UV plane - "NV61" planar 4:2:2 YUV with interleaved VU plane - "NV24" planar 4:4:4 YUV with interleaved UV plane - - - "NV12_64Z32" planar 4:2:0 YUV with interleaved UV plane in 64x32 tiles zigzag - - Component 0: Y - depth: 8 - pstride: 1 - default offset: 0 - default rstride: RU128 (width) - default size: rstride (component0) * RU32 (height) - - Component 1: U - depth 8 - pstride: 2 - default offset: size (component0) - default rstride: (y_tiles << 16) | x_tiles - default x_tiles: RU128 (width) >> tile_width - default y_tiles: RU32 (height) >> tile_height - - Component 2: V - depth: 8 - pstride: 2 - default offset: offset (component1) + 1 - default rstride: (y_tiles << 16) | x_tiles - default x_tiles: RU128 (width) >> tile_width - default y_tiles: RU64 (height) >> (tile_height + 1) - - Image - default size: RU128 (width) * (RU32 (height) + RU64 (height) / 2) - tile mode: ZFLIPZ_2X2 - tile width: 6 - tile height: 5 - diff --git a/docs/design/part-playbin.txt b/docs/design/part-playbin.txt deleted file mode 100644 index 232ac0c..0000000 --- a/docs/design/part-playbin.txt +++ /dev/null @@ -1,69 +0,0 @@ -playbin --------- - -The purpose of this element is to decode and render the media contained in a -given generic uri. The element extends GstPipeline and is typically used in -playback situations. - -Required features: - - - accept and play any valid uri. This includes - - rendering video/audio - - overlaying subtitles on the video - - optionally read external subtitle files - - allow for hardware (non raw) sinks - - selection of audio/video/subtitle streams based on language. - - perform network buffering/incremental download - - gapless playback - - support for visualisations with configurable sizes - - ability to reject files that are too big, or of a format that would require - too much CPU/memory usage. - - be very efficient with adding elements such as converters to reduce the - amount of negotiation that has to happen. - - handle chained oggs. This includes having support for dynamic pad add and - remove from a demuxer. - -Components ----------- - -* decodebin2 - - - performs the autoplugging of demuxers/decoders - - emits signals when for steering the autoplugging - - to decide if a non-raw media format is acceptable as output - - to sort the possible decoders for a non-raw format - - see also decodebin2 design doc - -* uridecodebin - - - combination of a source to handle the given uri, an optional queueing element - and one or more decodebin2 elements to decode the non-raw streams. 
- -* playsink - - - handles display of audio/video/text. - - has request audio/video/text input pad. There is only one sinkpad per type. - The requested pads define the configuration of the internal pipeline. - - allows for setting audio/video sinks or does automatic sink selection. - - allows for configuration of visualisation element. - - allows for enable/disable of visualisation, audio and video. - -* playbin - - - combination of one or more uridecodebin elements to read the uri and subtitle - uri. - - support for queuing new media to support gapless playback. - - handles stream selection. - - uses playsink to display. - - selection of sinks and configuration of uridecodebin with raw output formats. - - -Gapless playback ----------------- - -playbin has an "about-to-finish" signal. The application should configure a new -uri (and optional suburi) in the callback. When the current media finishes, this -new media will be played next. - - - diff --git a/docs/design/part-stereo-multiview-video.markdown b/docs/design/part-stereo-multiview-video.markdown deleted file mode 100644 index 838e6d8..0000000 --- a/docs/design/part-stereo-multiview-video.markdown +++ /dev/null @@ -1,278 +0,0 @@ -Design for Stereoscopic & Multiview Video Handling -================================================== - -There are two cases to handle: - -* Encoded video output from a demuxer to parser / decoder or from encoders into a muxer. -* Raw video buffers - -The design below is somewhat based on the proposals from -[bug 611157](https://bugzilla.gnome.org/show_bug.cgi?id=611157) - -Multiview is used as a generic term to refer to handling both -stereo content (left and right eye only) as well as extensions for videos -containing multiple independent viewpoints. - -Encoded Signalling ------------------- -This is regarding the signalling in caps and buffers from demuxers to -parsers (sometimes) or out from encoders. - -For backward compatibility with existing codecs many transports of -stereoscopic 3D content use normal 2D video with 2 views packed spatially -in some way, and put extra new descriptions in the container/mux. - -Info in the demuxer seems to apply to stereo encodings only. For all -MVC methods I know, the multiview encoding is in the video bitstream itself -and therefore already available to decoders. Only stereo systems have been retro-fitted -into the demuxer. - -Also, sometimes extension descriptions are in the codec (e.g. H.264 SEI FPA packets) -and it would be useful to be able to put the info onto caps and buffers from the -parser without decoding. - -To handle both cases, we need to be able to output the required details on -encoded video for decoders to apply onto the raw video buffers they decode. - -*If there ever is a need to transport multiview info for encoded data the -same system below for raw video or some variation should work* - -### Encoded Video: Properties that need to be encoded into caps -1. multiview-mode (called "Channel Layout" in bug 611157) - * Whether a stream is mono, for a single eye, stereo, mixed-mono-stereo - (switches between mono and stereo - mp4 can do this) - * Uses a buffer flag to mark individual buffers as mono or "not mono" - (single|stereo|multiview) for mixed scenarios. The alternative (not - proposed) is for the demuxer to switch caps for each mono to not-mono - change, and not used a 'mixed' caps variant at all. - * _single_ refers to a stream of buffers that only contain 1 view. 
- It is different from mono in that the stream is a marked left or right - eye stream for later combining in a mixer or when displaying. - * _multiple_ marks a stream with multiple independent views encoded. - It is included in this list for completeness. As noted above, there's - currently no scenario that requires marking encoded buffers as MVC. -2. Frame-packing arrangements / view sequence orderings - * Possible frame packings: side-by-side, side-by-side-quincunx, - column-interleaved, row-interleaved, top-bottom, checker-board - * bug 611157 - sreerenj added side-by-side-full and top-bottom-full but - I think that's covered by suitably adjusting pixel-aspect-ratio. If - not, they can be added later. - * _top-bottom_, _side-by-side_, _column-interleaved_, _row-interleaved_ are as the names suggest. - * _checker-board_, samples are left/right pixels in a chess grid +-+-+-/-+-+-+ - * _side-by-side-quincunx_. Side By Side packing, but quincunx sampling - - 1 pixel offset of each eye needs to be accounted when upscaling or displaying - * there may be other packings (future expansion) - * Possible view sequence orderings: frame-by-frame, frame-primary-secondary-tracks, sequential-row-interleaved - * _frame-by-frame_, each buffer is left, then right view etc - * _frame-primary-secondary-tracks_ - the file has 2 video tracks (primary and secondary), one is left eye, one is right. - Demuxer info indicates which one is which. - Handling this means marking each stream as all-left and all-right views, decoding separately, and combining automatically (inserting a mixer/combiner in playbin) - -> *Leave this for future expansion* - * _sequential-row-interleaved_ Mentioned by sreerenj in bug patches, I can't find a mention of such a thing. Maybe it's in MPEG-2 - -> *Leave this for future expansion / deletion* -3. view encoding order - * Describes how to decide which piece of each frame corresponds to left or right eye - * Possible orderings left, right, left-then-right, right-then-left - - Need to figure out how we find the correct frame in the demuxer to start decoding when seeking in frame-sequential streams - - Need a buffer flag for marking the first buffer of a group. -4. "Frame layout flags" - * flags for view specific interpretation - * horizontal-flip-left, horizontal-flip-right, vertical-flip-left, vertical-flip-right - Indicates that one or more views has been encoded in a flipped orientation, usually due to camera with mirror or displays with mirrors. - * This should be an actual flags field. Registered GLib flags types aren't generally well supported in our caps - the type might not be loaded/registered yet when parsing a caps string, so they can't be used in caps templates in the registry. - * It might be better just to use a hex value / integer - -Buffer representation for raw video ------------------------------------ -* Transported as normal video buffers with extra metadata -* The caps define the overall buffer width/height, with helper functions to - extract the individual views for packed formats -* pixel-aspect-ratio adjusted if needed to double the overall width/height -* video sinks that don't know about multiview extensions yet will show the packed view as-is - For frame-sequence outputs, things might look weird, but just adding multiview-mode to the sink caps - can disallow those transports. 
-* _row-interleaved_ packing is actually just side-by-side memory layout with half frame width, twice - the height, so can be handled by adjusting the overall caps and strides -* Other exotic layouts need new pixel formats defined (checker-board, column-interleaved, side-by-side-quincunx) -* _Frame-by-frame_ - one view per buffer, but with alternating metas marking which buffer is which left/right/other view and using a new buffer flag as described above - to mark the start of a group of corresponding frames. -* New video caps addition as for encoded buffers - -### Proposed Caps fields -Combining the requirements above and collapsing the combinations into mnemonics: - -* multiview-mode = - mono | left | right | sbs | sbs-quin | col | row | topbot | checkers | - frame-by-frame | mixed-sbs | mixed-sbs-quin | mixed-col | mixed-row | - mixed-topbot | mixed-checkers | mixed-frame-by-frame | multiview-frames mixed-multiview-frames -* multiview-flags = - + 0x0000 none - + 0x0001 right-view-first - + 0x0002 left-h-flipped - + 0x0004 left-v-flipped - + 0x0008 right-h-flipped - + 0x0010 right-v-flipped - -### Proposed new buffer flags -Add two new GST_VIDEO_BUFFER flags in video-frame.h and make it clear that those -flags can apply to encoded video buffers too. wtay says that's currently the -case anyway, but the documentation should say it. - -**GST_VIDEO_BUFFER_FLAG_MULTIPLE_VIEW** - Marks a buffer as representing non-mono content, although it may be a single (left or right) eye view. -**GST_VIDEO_BUFFER_FLAG_FIRST_IN_BUNDLE** - for frame-sequential methods of transport, mark the "first" of a left/right/other group of frames - -### A new GstMultiviewMeta -This provides a place to describe all provided views in a buffer / stream, -and through Meta negotiation to inform decoders about which views to decode if -not all are wanted. - -* Logical labels/names and mapping to GstVideoMeta numbers -* Standard view labels LEFT/RIGHT, and non-standard ones (strings) - - GST_VIDEO_MULTIVIEW_VIEW_LEFT = 1 - GST_VIDEO_MULTIVIEW_VIEW_RIGHT = 2 - - struct GstVideoMultiviewViewInfo { - guint view_label; - guint meta_id; // id of the GstVideoMeta for this view - - padding; - } - - struct GstVideoMultiviewMeta { - guint n_views; - GstVideoMultiviewViewInfo *view_info; - } - -The meta is optional, and probably only useful later for MVC - - -Outputting stereo content -------------------------- -The initial implementation for output will be stereo content in glimagesink - -### Output Considerations with OpenGL -* If we have support for stereo GL buffer formats, we can output separate left/right eye images and let the hardware take care of display. -* Otherwise, glimagesink needs to render one window with left/right in a suitable frame packing - and that will only show correctly in fullscreen on a device set for the right 3D packing -> requires app intervention to set the video mode. -* Which could be done manually on the TV, or with HDMI 1.4 by setting the right video mode for the screen to inform the TV or third option, we - support rendering to two separate overlay areas on the screen - one for left eye, one for right which can be supported using the 'splitter' element and 2 output sinks or, better, add a 2nd window overlay for split stereo output -* Intel hardware doesn't do stereo GL buffers - only nvidia and AMD, so initial implementation won't include that - -## Other elements for handling multiview content -* videooverlay interface extensions - * __Q__: Should this be a new interface? 
- * Element message to communicate the presence of stereoscopic information to the app - * App needs to be able to override the input interpretation - i.e., set multiview-mode and multiview-flags - * Most videos I've seen are side-by-side or top-bottom with no frame-packing metadata - * New API for the app to set rendering options for stereo/multiview content - * This might be best implemented as a **multiview GstContext**, so that - the pipeline can share app preferences for content interpretation and downmixing - to mono for output, or in the sink and have those down as far upstream/downstream as possible. -* Converter element - * convert different view layouts - * Render to anaglyphs of different types (magenta/green, red/blue, etc) and output as mono -* Mixer element - * take 2 video streams and output as stereo - * later take n video streams - * share code with the converter, it just takes input from n pads instead of one. -* Splitter element - * Output one pad per view - -### Implementing MVC handling in decoders / parsers (and encoders) -Things to do to implement MVC handling - -1. Parsing SEI in h264parse and setting caps (patches available in - bugzilla for parsing, see below) -2. Integrate gstreamer-vaapi MVC support with this proposal -3. Help with [libav MVC implementation](https://wiki.libav.org/Blueprint/MVC) -4. Generating SEI in H.264 encoder -5. Support for MPEG2 MVC extensions - -## Relevant bugs -[bug 685215](https://bugzilla.gnome.org/show_bug.cgi?id=685215) - codecparser h264: Add initial MVC parser -[bug 696135](https://bugzilla.gnome.org/show_bug.cgi?id=696135) - h264parse: Add mvc stream parsing support -[bug 732267](https://bugzilla.gnome.org/show_bug.cgi?id=732267) - h264parse: extract base stream from MVC or SVC encoded streams - -## Other Information -[Matroska 3D support notes](http://www.matroska.org/technical/specs/notes.html#3D) - -## Open Questions - -### Background - -### Representation for GstGL -When uploading raw video frames to GL textures, the goal is to implement: - -2. Split packed frames into separate GL textures when uploading, and -attach multiple GstGLMemory's to the GstBuffer. The multiview-mode and -multiview-flags fields in the caps should change to reflect the conversion -from one incoming GstMemory to multiple GstGLMemory, and change the -width/height in the output info as needed. - -This is (currently) targeted as 2 render passes - upload as normal -to a single stereo-packed RGBA texture, and then unpack into 2 -smaller textures, output with GST_VIDEO_MULTIVIEW_MODE_SEPARATED, as -2 GstGLMemory attached to one buffer. We can optimise the upload later -to go directly to 2 textures for common input formats. - -Separate output textures have a few advantages: - -* Filter elements can more easily apply filters in several passes to each -texture without fundamental changes to our filters to avoid mixing pixels -from separate views. -* Centralises the sampling of input video frame packings in the upload code, -which makes adding new packings in the future easier. -* Sampling multiple textures to generate various output frame-packings -for display is conceptually simpler than converting from any input packing -to any output packing. -* In implementations that support quad buffers, having separate textures -makes it trivial to do GL_LEFT/GL_RIGHT output - -For either option, we'll need new glsink output API to pass more -information to applications about multiple views for the draw signal/callback.
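As a rough illustration of the caps change described above (the "separated"
multiview-mode mnemonic and the GLMemory caps feature notation are assumptions
here, not settled API), unpacking a side-by-side frame might look like:

    /* packed stereo frame as negotiated upstream of the GL upload: */
    video/x-raw, format=RGBA, width=1920, height=1080,
        multiview-mode=sbs, multiview-flags=0x0000

    /* after unpacking into two GstGLMemory per buffer, one texture per
     * view, with the width halved accordingly: */
    video/x-raw(memory:GLMemory), format=RGBA, width=960, height=1080,
        multiview-mode=separated, multiview-flags=0x0000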
- -I don't know if it's desirable to support *both* methods of representing -views. If so, that should be signalled in the caps too. That could be a -new multiview-mode for passing views in separate GstMemory objects -attached to a GstBuffer, which would not be GL specific. - -### Overriding frame packing interpretation -Most sample videos available are frame packed, with no metadata -to say so. How should we override that interpretation? - -* Simple answer: Use capssetter + new properties on playbin to - override the multiview fields - *Basically implemented in playbin, using a pad probe. Needs more work for completeness* (a usage sketch follows at the end of this document) - -### Adding extra GstVideoMeta to buffers -There should be one GstVideoMeta for the entire video frame in packed -layouts, and one GstVideoMeta per GstGLMemory when views are attached -to a GstBuffer separately. This should be done by the buffer pool, -which can tell from the caps. - -### videooverlay interface extensions -GstVideoOverlay needs: - -* A way to announce the presence of multiview content when it is - detected/signalled in a stream. -* A way to tell applications which output methods are supported/available -* A way to tell the sink which output method it should use -* Possibly a way to tell the sink to override the input frame - interpretation / caps - depends on the answer to the question - above about how to model overriding input interpretation. - -### What's implemented -* Caps handling -* gst-plugins-base libgstvideo pieces -* playbin caps overriding -* conversion elements - glstereomix, gl3dconvert (needs a rename), - glstereosplit. - -### Possible future enhancements -* Make GLupload split to separate textures at upload time? - * Needs new API to extract multiple textures from the upload. Currently only outputs 1 result RGBA texture. -* Make GLdownload able to take 2 input textures, pack them and colorconvert / download as needed. - - currently done by packing then downloading, which adds unacceptable overhead for RGBA download -* Think about how we integrate GLstereo - do we need to do anything special, - or can the app just render to stereo/quad buffers if they're available?
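### Appendix: overriding frame packing from the command line (sketch)

As referenced from "Overriding frame packing interpretation" above, playbin
exposes properties to override the multiview fields of the incoming video.
Treat the property and value names below as assumptions (they follow the
names used in current releases, not anything fixed by this design); forcing a
side-by-side interpretation of an unmarked file would then look roughly like:

    gst-launch-1.0 playbin uri=file:///path/to/movie.mkv \
        video-multiview-mode=side-by-side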