docs/random/ensonic/draft-bufferpools.txt

BufferPools
-----------

This document proposes a mechanism to build pools of reusable buffers. The
proposal should improve performance and help to implement zero-copy use cases.

Last edited: 2009-09-01 Stefan Kost


Current Behaviour
-----------------

Elements either create their own buffers or request buffers from downstream
via pad_alloc. There is hardly any reuse of buffers; instead they are usually
disposed of after being rendered.


Problems
--------

  - hardware-based elements like to reuse buffers, as they e.g.
    - mlock them (dsp)
    - establish an index<->address relation (v4l2)
  - not reusing buffers has overhead and makes run-time behaviour
    non-deterministic:
    - malloc (which usually becomes an mmap for bigger buffers and thus a
      syscall) and free (can trigger compression of freelists in the allocator)
    - shm alloc/attach, detach/free (xvideo)
  - some use cases cause memcpys:
    - not having the right number of buffers (e.g. too few buffers in v4l2src)
    - receiving buffers of the wrong type (e.g. plain buffers in xvimagesink)
    - receiving buffers with the wrong alignment (dsp)
  - some use cases cause unneeded cache flushes when buffers are passed between
    user and kernel space


What is needed
--------------

Elements that sink raw data buffers of usually constant size would like to
maintain a bufferpool. These could be sinks or encoders. We need mechanisms to
select and dynamically update:

  - the bufferpool owners in a pipeline
  - the bufferpool sizes
  - the queued buffer sizes, alignments and flags


Proposal
--------

Querying the bufferpool size and buffer alignments can work similarly to
latency queries (gst/gstbin.c:{gst_bin_query,bin_query_latency_fold}).
Aggregation is quite straightforward: number-of-buffers is summed up and for
alignment we gather the MAX value.
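The aggregation rule can be sketched in a few lines of plain C. This is only
an illustration of "sum the counts, take the MAX alignment"; the names
PoolRequirement, pool_requirement_fold and pool_requirement_aggregate are
hypothetical and not part of any GStreamer API, and a real implementation
would fold GstQuery replies inside the bin instead of walking an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reply of one element to a bufferpool query. */
typedef struct {
  unsigned int num_buffers; /* buffers this element wants in the pool */
  size_t alignment;         /* required alignment in bytes, 0 = none */
} PoolRequirement;

/* Fold one reply into the running aggregate: buffer counts add up,
 * alignment keeps the largest (strictest) value seen so far. */
void
pool_requirement_fold (PoolRequirement *acc, const PoolRequirement *reply)
{
  assert (acc != NULL && reply != NULL);

  acc->num_buffers += reply->num_buffers;
  if (reply->alignment > acc->alignment)
    acc->alignment = reply->alignment;
}

/* Aggregate the replies of n elements, similar to what a bin would do. */
PoolRequirement
pool_requirement_aggregate (const PoolRequirement *replies, size_t n)
{
  PoolRequirement acc = { 0, 0 };
  size_t i;

  for (i = 0; i < n; i++)
    pool_requirement_fold (&acc, &replies[i]);

  return acc;
}
```

Taking the MAX only satisfies every element if all alignments are powers of
two, because then the largest value is a multiple of all the others. The
sketch assumes that this holds.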

Bins need to track which elements have been selected as bufferpool owners and
update the selection if those are removed (FIXME: in which states?).

Bins would also need to track whether elements that replied to the query are
removed and update the bufferpool configuration (event). Likewise, the addition
of new elements needs to be handled (query and, if the configuration changed,
update with an event).

Bufferpool owners need to handle caps changes to keep the queued buffers valid
for the negotiated format.

The bufferpool could be a helper GObject (like we use GstAdapter). It would
manage a collection of GstBuffers. For each buffer it tracks whether it is in
use or available. The bufferpool in gst-plugins-good/sys/v4l2/gstv4l2bufferpool
might be a starting point.
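The in-use/available bookkeeping of such a helper can be sketched as follows.
This is a minimal plain C model, not the proposed GObject: SimplePool and the
simple_pool_* functions are hypothetical names, raw malloc'ed blocks stand in
for GstBuffers, the pool has a fixed maximum size, and there is no locking or
refcounting, all of which a real helper would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_MAX_BUFFERS 8      /* arbitrary limit for this sketch */

typedef struct {
  void *data[POOL_MAX_BUFFERS];  /* preallocated memory blocks */
  bool in_use[POOL_MAX_BUFFERS]; /* true while a block is handed out */
  size_t n_buffers;              /* number of valid entries in data[] */
  size_t buffer_size;            /* size of each block in bytes */
} SimplePool;

/* Allocate n blocks of the given size up front. On failure the blocks
 * allocated so far stay tracked, so simple_pool_clear() can free them. */
bool
simple_pool_init (SimplePool *pool, size_t n, size_t size)
{
  size_t i;

  assert (pool != NULL && n <= POOL_MAX_BUFFERS && size > 0);

  pool->n_buffers = 0;
  pool->buffer_size = size;
  for (i = 0; i < n; i++) {
    pool->data[i] = malloc (size);
    if (pool->data[i] == NULL)
      return false;
    pool->in_use[i] = false;
    pool->n_buffers++;
  }
  return true;
}

/* Hand out the first available block, or NULL if all are in use. */
void *
simple_pool_acquire (SimplePool *pool)
{
  size_t i;

  for (i = 0; i < pool->n_buffers; i++) {
    if (!pool->in_use[i]) {
      pool->in_use[i] = true;
      return pool->data[i];
    }
  }
  return NULL;
}

/* Mark a block as available again. Returns false if the block does not
 * belong to this pool or was not handed out. */
bool
simple_pool_release (SimplePool *pool, void *block)
{
  size_t i;

  for (i = 0; i < pool->n_buffers; i++) {
    if (pool->data[i] == block && pool->in_use[i]) {
      pool->in_use[i] = false;
      return true;
    }
  }
  return false;
}

/* Free all blocks; the pool must be initialized again before reuse. */
void
simple_pool_clear (SimplePool *pool)
{
  size_t i;

  for (i = 0; i < pool->n_buffers; i++)
    free (pool->data[i]);
  pool->n_buffers = 0;
}
```

The point of the sketch is that acquire and release never call malloc or
free, so the per-buffer cost in the streaming path becomes a flag update.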


Scenarios
---------

v4l2src ! xvimagesink
~~~~~~~~~~~~~~~~~~~~~
- v4l2src would report 1 buffer (do we still want the queue-size property?)
- xvimagesink would report 1 buffer

v4l2src ! tee name=t ! queue ! xvimagesink t. ! queue ! enc ! mux ! filesink
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- v4l2src would report 1 buffer
- xvimagesink would report 1 buffer
- enc would report 1 buffer

filesrc ! demux ! queue ! dec ! xvimagesink
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- dec would report 1 buffer
- xvimagesink would report 1 buffer


Issues
------

Does it make sense to also have pools for sources, or should they always use
buffers from a downstream element?

Do we need to add +1 to the aggregated buffer count when allocating, so that
one buffer can be floating? E.g. can we push buffers quickly enough to have
v4l2src ! xvimagesink working with 2 buffers? What about
v4l2src ! queue ! xvimagesink?

More attributes on buffers are needed to reduce the overhead even further:

  - padding: when using buffers on hardware one might need to pad the buffer at
    the end to a specific alignment
  - mlock: hardware that uses DMA needs buffers to be memory locked; if a
    buffer is already memory locked, it can be used by other hardware-based
    elements as is
  - cache flushes: hardware-based elements usually need to flush CPU caches
    when sending results, as DMA-based memory writes do not update values that
    may be cached on the CPU. If no element further down the pipeline actually
    reads from this memory area, we could avoid the flushes. All other hardware
    elements and elements with any caps (tee, queue, capsfilter) are examples
    of such elements.
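For the padding attribute, the size a pool would have to allocate can be
derived from the payload size and the requested alignment. The helper below is
a sketch only; padded_size is a hypothetical name, and it assumes that the
alignment is a power of two and that size + alignment does not overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of alignment.
 * Assumption: alignment is a non-zero power of two. */
size_t
padded_size (size_t size, size_t alignment)
{
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);

  return (size + alignment - 1) & ~(alignment - 1);
}
```

With a 16 byte alignment a 100 byte payload needs a 112 byte buffer, while a
128 byte payload already fits without extra padding.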