BufferPools
-----------

This document proposes a mechanism to build pools of reusable buffers. The
proposal should improve performance and help to implement zero-copy use cases.

Last edited: 2009-09-01 Stefan Kost


Current Behaviour
-----------------

Elements either create their own buffers or request buffers from downstream
via pad_alloc. There is hardly any reuse of buffers; instead they are usually
disposed of after being rendered.


Problems
--------

- hardware-based elements like to reuse buffers, as they e.g.
  - mlock them (dsp)
  - establish an index<->address relation (v4l2)
- not reusing buffers has overhead and makes run-time behaviour
  non-deterministic:
  - malloc (which usually becomes an mmap for bigger buffers and thus a
    syscall) and free (which can trigger consolidation of free lists in the
    allocator)
  - shm alloc/attach, detach/free (xvideo)
- some use cases cause memcpys:
  - not having the right number of buffers (e.g. too few buffers in v4l2src)
  - receiving buffers of the wrong type (e.g. plain buffers in xvimagesink)
  - receiving buffers with the wrong alignment (dsp)
- some use cases cause unneeded cache flushes when buffers are passed between
  user space and kernel space


What is needed
--------------

Elements that sink raw data buffers of usually constant size would like to
maintain a bufferpool. These could be sinks or encoders. We need mechanisms to
select and dynamically update:

- the bufferpool owners in a pipeline
- the bufferpool sizes
- the queued buffer sizes, alignments and flags


Proposal
--------

Querying the bufferpool size and buffer alignments can work similarly to
latency queries (gst/gstbin.c:{gst_bin_query,bin_query_latency_fold}).
Aggregation is quite straightforward: the number of buffers is summed up and
for the alignment we take the MAX value.

Bins need to track which elements have been selected as bufferpool owners and
update the selection if those are removed (FIXME: in which states?).
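The aggregation rule from the Proposal (sum the buffer counts, take the MAX
alignment) can be sketched as a plain C fold. This is a minimal sketch, not
GStreamer API: PoolRequirements and the pool_requirements_* functions are
hypothetical names, and a real implementation would run inside the bin's query
handler, much like bin_query_latency_fold() does for latency.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-element reply to a bufferpool query (not a GStreamer type). */
typedef struct {
  unsigned int n_buffers;   /* buffers the element wants to keep queued */
  size_t alignment;         /* required start alignment in bytes, 0 = none */
} PoolRequirements;

/* Fold one reply into the running aggregate: counts are summed,
 * for the alignment the MAX value is kept. */
static void
pool_requirements_fold (PoolRequirements *acc, const PoolRequirements *reply)
{
  assert (acc != NULL && reply != NULL);
  acc->n_buffers += reply->n_buffers;
  if (reply->alignment > acc->alignment)
    acc->alignment = reply->alignment;
}

/* Aggregate the replies of all elements that answered the query in a bin. */
static PoolRequirements
pool_requirements_aggregate (const PoolRequirements *replies, size_t n_replies)
{
  PoolRequirements acc = { 0, 0 };

  for (size_t i = 0; i < n_replies; i++)
    pool_requirements_fold (&acc, &replies[i]);
  return acc;
}
```

For the tee scenario below, v4l2src, xvimagesink and enc each report 1
buffer, so the fold yields 3 buffers; the alignment values one would pass in
are purely illustrative.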
Bins would also need to track whether elements that replied to the query are
removed and then update the bufferpool configuration (event). Likewise, the
addition of new elements needs to be handled (query, and if the configuration
has changed, update it with an event).

Bufferpool owners need to handle caps changes to keep the queued buffers valid
for the negotiated format.

The bufferpool could be a helper GObject (like GstAdapter). It would manage a
collection of GstBuffers and track for each buffer whether it is in use or
available. The bufferpool in gst-plugins-good/sys/v4l2/gstv4l2bufferpool might
be a starting point.


Scenarios
---------

v4l2src ! xvimagesink
~~~~~~~~~~~~~~~~~~~~~
- v4l2src would report 1 buffer (do we still want the queue-size property?)
- xvimagesink would report 1 buffer

v4l2src ! tee name=t ! queue ! xvimagesink t. ! queue ! enc ! mux ! filesink
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- v4l2src would report 1 buffer
- xvimagesink would report 1 buffer
- enc would report 1 buffer

filesrc ! demux ! queue ! dec ! xvimagesink
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- dec would report 1 buffer
- xvimagesink would report 1 buffer


Issues
------

Does it make sense to also have pools for sources, or should they always use
buffers from a downstream element?

Do we need to add +1 to the aggregated buffer count when allocating, so that
one buffer can be floating? E.g. can we push buffers quickly enough to have
v4l2src ! xvimagesink working with 2 buffers? What about
v4l2src ! queue ! xvimagesink?
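To make the +1 question concrete, the helper object from the Proposal (a
collection of buffers, each tracked as in use or available) can be sketched in
plain C. This is a hypothetical, single-threaded sketch: BufferPool and the
buffer_pool_* functions are not existing GStreamer API, a real helper would be
a GObject managing GstBuffers, and acquire would wait for a release instead of
returning -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size pool: one contiguous allocation split into
 * n_buffers slots, plus an in-use flag per slot. */
typedef struct {
  unsigned char *storage;
  bool *in_use;             /* true = handed out to an element */
  size_t n_buffers;
  size_t buffer_size;
} BufferPool;

static void
buffer_pool_clear (BufferPool *pool)
{
  free (pool->storage);
  free (pool->in_use);
  pool->storage = NULL;
  pool->in_use = NULL;
  pool->n_buffers = 0;
}

static bool
buffer_pool_init (BufferPool *pool, size_t n_buffers, size_t buffer_size)
{
  pool->storage = NULL;
  pool->in_use = NULL;
  pool->n_buffers = n_buffers;
  pool->buffer_size = buffer_size;
  /* reject empty pools and guard the size multiplication against overflow */
  if (n_buffers == 0 || buffer_size == 0 || n_buffers > SIZE_MAX / buffer_size)
    return false;
  pool->storage = malloc (n_buffers * buffer_size);
  pool->in_use = calloc (n_buffers, sizeof *pool->in_use);  /* all available */
  if (pool->storage == NULL || pool->in_use == NULL) {
    buffer_pool_clear (pool);
    return false;
  }
  return true;
}

/* Return the index of a free buffer and mark it in use, or -1 if every
 * buffer is still held downstream (a real pool would wait here). */
static int
buffer_pool_acquire (BufferPool *pool)
{
  for (size_t i = 0; i < pool->n_buffers; i++) {
    if (!pool->in_use[i]) {
      pool->in_use[i] = true;
      return (int) i;
    }
  }
  return -1;
}

/* Give a buffer back to the pool, e.g. after the sink has rendered it. */
static void
buffer_pool_release (BufferPool *pool, int index)
{
  assert (index >= 0 && (size_t) index < pool->n_buffers);
  assert (pool->in_use[index]);
  pool->in_use[index] = false;
}

static unsigned char *
buffer_pool_get_data (BufferPool *pool, int index)
{
  assert (index >= 0 && (size_t) index < pool->n_buffers);
  return pool->storage + (size_t) index * pool->buffer_size;
}
```

With a single buffer, the producer cannot acquire a new one while the sink
still holds the previous frame, so the two elements run in lock step; a second
buffer lets the producer fill the next frame while the sink renders the
current one. Note that malloc() only guarantees fundamental alignment; a pool
honouring a hardware alignment would need e.g. posix_memalign().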
More buffer attributes are needed to reduce the overhead even further:

- padding: when buffers are used by hardware, one might need to pad the
  buffer at the end to a specific alignment
- mlock: hardware that uses DMA needs the buffer memory to be locked; if a
  buffer is already memory-locked, it can be used as is by other
  hardware-based elements
- cache flushes: hardware-based elements usually need to flush CPU caches
  when sending results, as DMA-based memory writes do not update values that
  may be cached on the CPU. If no element next in the pipeline actually reads
  from this memory area, the flushes could be avoided. Examples of such
  elements are other hardware-based elements and elements with ANY caps (tee,
  queue, capsfilter), which pass buffers on without reading their contents.
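The padding rule and the cache-flush shortcut above can be sketched as two
small C helpers. This is a hypothetical sketch under stated assumptions: the
MemoryAccess classification and both function names are invented for
illustration, padded_size() assumes a power-of-two alignment, and the flush
check only walks a linear chain (a tee would need each branch checked).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round a buffer size up to the next multiple of a power-of-two alignment,
 * i.e. the size including the padding at the end of the buffer. */
static size_t
padded_size (size_t size, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (size + align - 1) & ~(align - 1);
}

/* Hypothetical classification of how a downstream element uses memory. */
typedef enum {
  ACCESS_PASSTHROUGH,  /* forwards buffers untouched, e.g. tee, queue, capsfilter */
  ACCESS_HARDWARE,     /* consumes the memory via DMA, e.g. another DSP element */
  ACCESS_CPU           /* reads the memory with the CPU */
} MemoryAccess;

/* A flush is only needed if the first element that actually consumes the
 * buffer reads it with the CPU; pass-through elements are skipped. */
static bool
needs_cache_flush (const MemoryAccess *downstream, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    if (downstream[i] == ACCESS_PASSTHROUGH)
      continue;
    return downstream[i] == ACCESS_CPU;
  }
  return false;  /* nobody reads the data */
}
```

For example, a 100-byte buffer with a 32-byte alignment requirement is padded
to 128 bytes, and a hardware element feeding a queue into another hardware
element could skip the flush.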