subprojects/gst-docs/markdown/additional/design/dmabuf.md

   1 # DMA buffers
   2
   3 This document describes the GStreamer caps negotiation of DMA buffers on
   4 Linux-like platforms.
   5
   6 DMA buffer sharing is an efficient way to share buffers/memory between
   7 different Linux kernel drivers, such as codecs, 3D, display and cameras.
   8 For example, a decoder may want its output to be directly shared with the
   9 display server for rendering without a copy.
  10
  11 Any device driver that takes part in DMA buffer sharing can do so as either
  12 the *exporter* or the *importer* of buffers.
  13
  14 This kind of buffer/memory is usually stored in non-system memory (maybe in
  15 the device's local memory or somewhere else not directly accessible by the
  16 CPU), so mapping it for CPU access may impose a big overhead and perform
  17 poorly, or may even be impossible.
  18
  19 DMA buffers are exposed to user-space as *file descriptors*, which allows
  20 them to be passed between processes.
  21
  22
  23 # DRM PRIME buffers
  24
  25 PRIME is the cross-device buffer sharing framework in the DRM kernel
  26 subsystem. PRIME buffers are the DMA buffers normally used in GStreamer,
  27 and they might contain video frames.
  28
  29 PRIME buffers require some metadata to describe how to interpret them,
  30 such as a set of file descriptors (for example, one per plane), color
  31 definition in fourcc, and DRM-modifiers. If the frame is going to be mapped
  32 into system memory, padding, strides, offsets, etc. are also needed.
  33
  34
  35 ## File descriptor
  36
  37 Each file descriptor represents a chunk of a frame, usually a plane. For
  38 example, when a DMA buffer contains NV12 format data, it might be
  39 composed of 2 planes: one for its Y component and the other for both UV
  40 components. Then, the hardware may use two detached memory chunks, one per
  41 plane, exposed as two file descriptors. Otherwise, if the hardware uses only
  42 one contiguous memory chunk for all the planes, the DMA buffer should have
  43 just one file descriptor.
  44
  45
  46 ## DRM fourcc
  47
  48 As with common fourcc usage, a DRM-fourcc describes the underlying format
  49 of the video frame, such as `DRM_FORMAT_YVU420` or `DRM_FORMAT_NV12`. All
  50 of them have the prefix `DRM_FORMAT_`. Please refer to `drm_fourcc.h` in
  51 the kernel for a full list. This list of fourcc formats maps to GStreamer
  52 video formats.
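
As an illustration of what a fourcc value is, the following minimal sketch
packs four ASCII characters into a 32-bit code, least significant byte first,
in the same way as the `fourcc_code()` macro in `drm_fourcc.h`. The helper
names `make_fourcc()` and `fourcc_to_string()` are hypothetical and are not
part of the kernel or of GStreamer.

```c
#include <stdint.h>

/* Hypothetical helper: pack four ASCII characters into a 32-bit fourcc,
 * first character in the least significant byte. */
static uint32_t
make_fourcc (char a, char b, char c, char d)
{
  return (uint32_t) (unsigned char) a
      | ((uint32_t) (unsigned char) b << 8)
      | ((uint32_t) (unsigned char) c << 16)
      | ((uint32_t) (unsigned char) d << 24);
}

/* Hypothetical helper: unpack a fourcc into a NUL-terminated string.
 * The caller provides room for four characters plus the terminator. */
static void
fourcc_to_string (uint32_t fourcc, char out[5])
{
  for (int i = 0; i < 4; i++)
    out[i] = (char) ((fourcc >> (8 * i)) & 0xff);
  out[4] = '\0';
}
```

With these helpers, `make_fourcc ('N', 'V', '1', '2')` yields `0x3231564e`,
and converting that value back gives the string `NV12`.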
  53
  54
  55 ## DRM modifier
  56
  57 A DRM-modifier describes the translation mechanism between pixels and
  58 memory samples, and the actual memory storage of the buffer. The most
  59 straightforward modifier is LINEAR, where each pixel has contiguous storage
  60 and pixel location in memory can be easily calculated with the stride. This
  61 is considered the baseline interchange format, and most convenient for CPU
  62 access. Nonetheless, modern hardware employs more sophisticated memory
  63 access mechanisms, such as tiling and possibly compression. For example,
  64 the TILED modifier describes memory storage where pixels are stored in 4x4
  65 blocks arranged in row-major ordering: the first tile in
  66 memory stores pixels (0,0) to (3,3) inclusive, the second tile in
  67 memory stores pixels (4,0) to (7,3) inclusive, and so on.
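
To make the difference between the two layouts concrete, here is a small
sketch that computes the position of pixel (x, y) in a linear layout and in
the 4x4 tiled layout of the example above. It counts positions in pixels,
not bytes, and assumes that the width is a multiple of 4 and that pixels
inside a tile are also stored in row-major order (the text above only
specifies the ordering of the tiles). Real hardware layouts may differ; the
function names are hypothetical.

```c
#include <stddef.h>

#define TILE_W 4
#define TILE_H 4

/* Position of pixel (x, y) in a linear layout; stride is in pixels. */
static size_t
linear_index (size_t x, size_t y, size_t stride)
{
  return y * stride + x;
}

/* Position of pixel (x, y) in the illustrative 4x4 tiled layout.
 * Assumptions: width is a multiple of TILE_W, tiles are stored in
 * row-major order, and so are the pixels inside each tile. */
static size_t
tiled_index (size_t x, size_t y, size_t width)
{
  size_t tiles_per_row = width / TILE_W;
  size_t tile = (y / TILE_H) * tiles_per_row + (x / TILE_W);
  size_t within = (y % TILE_H) * TILE_W + (x % TILE_W);

  return tile * (TILE_W * TILE_H) + within;
}
```

For an 8 pixel wide image, pixel (4,0) is at position 4 in the linear layout
but at position 16 in the tiled one, since it is the first pixel of the
second tile.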
  68
  69 A DRM-modifier is a 64-bit value, written as sixteen hexadecimal digits,
  70 that represents these memory layouts. For example, `0x0000000000000000`
  71 means linear, `0x0100000000000001` means Intel's X tile mode, etc. Please
  72 refer to `drm_fourcc.h` in the kernel for a full list.
  73
  74 Except for the linear modifier, the most significant 8 bits represent the
  75 vendor ID and the other 56 bits describe the memory layout, which may be
  76 hardware dependent. Users should be careful when interpreting non-linear
  77 memory by themselves.
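
The split between vendor ID and layout bits can be sketched as follows. The
function names are hypothetical; the bit positions follow the description
above (vendor ID in the most significant 8 bits, layout in the remaining 56
bits).

```c
#include <stdint.h>

/* Hypothetical helper: vendor ID, stored in the top 8 bits. */
static uint8_t
modifier_vendor (uint64_t modifier)
{
  return (uint8_t) (modifier >> 56);
}

/* Hypothetical helper: vendor-specific layout code, the low 56 bits. */
static uint64_t
modifier_layout (uint64_t modifier)
{
  return modifier & 0x00ffffffffffffffULL;
}

/* The linear modifier is the all-zero value. */
static int
modifier_is_linear (uint64_t modifier)
{
  return modifier == 0;
}
```

For `0x0100000000000001`, this gives vendor ID `0x01` and layout code `1`.
How the layout code is to be interpreted remains up to the vendor.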
  78
  79 Please bear in mind that, even for the linear modifier, as the access to
  80 DMA memory's content is through `map()` / `unmap()` functions, its
  81 read/write performance may be poor, because of its cache type and
  82 coherence assurance. So, most of the time, it's advised to avoid that
  83 code path for uploading or downloading frame data.
  84
  85
  86 ## Meta Data
  87
  88 The meta data contains information about how to interpret the memory
  89 holding the video frame, either when the frame is mapped and its DRM
  90 modifier is linear, or by another API that imports those DMA buffers.
  91
  92
  93 # DMABufs in GStreamer
  94
  95
  96 ## Representation
  97
  98 In GStreamer, a full DMA buffer-based video frame is mapped to a
  99 `GstBuffer`, and each file descriptor used to describe the whole frame is
 100 held by a `GstMemory` mini-object. A derived class of `GstDmaBufAllocator`
 101 would be implemented as the memory allocator for every wrapped API
 102 *exporting* DMA buffers to user-space.
 103
 104
 105 ## DRM format caps field
 106
 107 The *GstCapsFeatures* *memory:DMABuf* is usually used to negotiate DMA
 108 buffers. It is recommended to allow DMAbuf to flow without the
 109 *GstCapsFeatures* *memory:DMABuf* if the DRM-modifier is linear.
 110
 111 In order to negotiate *memory:DMABuf* thoroughly, it's also required
 112 to match the DRM-modifiers between upstream and downstream. Otherwise video
 113 sinks might end up rendering wrong frames, assuming linear access.
 114
 115 Because DRM-fourcc and DRM-modifier are both necessary to render
 116 DMABuf-backed frames, we now consider both as a pair and combine them to
 117 assure uniqueness. In caps, we use a *:* to link them together and write them
 118 in the form *FORMAT:MODIFIER*, which represents a totally new single video
 119 format. For example, `NV12:0x0100000000000002` is a new video format
 120 combining the video format NV12 and the modifier `0x0100000000000002`. It's
 121 not NV12 and it's not a subset of it either. A modifier must always be
 122 present, except when the modifier is linear, in which case it must not be
 123 included: `NV12:0x0000000000000000` is invalid, it must be `drm-format=NV12`.
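
The rules above can be sketched as a small stand-alone parser for the
*FORMAT:MODIFIER* string. This is only an illustration and not the GStreamer
implementation: the `DrmFormat` type and `parse_drm_format()` are
hypothetical names, and the sketch assumes that a modifier is always written
as `0x` followed by exactly sixteen hexadecimal digits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
typedef struct
{
  char format[16];              /* format name, for example "NV12" */
  uint64_t modifier;            /* 0 means linear */
} DrmFormat;

/* Parse "FORMAT" or "FORMAT:0x<16 hex digits>".
 * Returns 1 on success and 0 if the string is invalid. */
static int
parse_drm_format (const char *str, DrmFormat * out)
{
  const char *colon;
  size_t name_len;

  assert (str != NULL && out != NULL);

  colon = strchr (str, ':');
  name_len = colon ? (size_t) (colon - str) : strlen (str);
  if (name_len == 0 || name_len >= sizeof (out->format))
    return 0;

  memcpy (out->format, str, name_len);
  out->format[name_len] = '\0';
  out->modifier = 0;            /* no modifier part means linear */

  if (colon == NULL)
    return 1;

  /* Assumption: "0x" followed by exactly sixteen hexadecimal digits. */
  if (strncmp (colon + 1, "0x", 2) != 0 || strlen (colon + 3) != 16)
    return 0;
  if (strspn (colon + 3, "0123456789abcdefABCDEF") != 16)
    return 0;

  out->modifier = strtoull (colon + 3, NULL, 16);

  /* An explicit linear modifier is invalid; it must be left out. */
  return out->modifier != 0;
}
```

With this sketch, `NV12:0x0100000000000002` and plain `NV12` are accepted
(the latter with a modifier of 0), while `NV12:0x0000000000000000` is
rejected.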
 124
 125 Please note that this form of video format only appears within
 126 *memory:DMABuf* feature. It must not appear in any other video caps
 127 feature.
 128
 129 Unlike other types of video buffers, DMABuf frames might not be mappable and
 130 their internal format is opaque to the user. So, unless the modifier is
 131 linear (0x0000000000000000) or some other well known tiled format such as
 132 NV12_4L4, NV12_16L16, NV12_64Z32, NV12_16L32S, etc. (which are defined in
 133 video-format.h), we always use `GST_VIDEO_FORMAT_ENCODED` in the
 134 `GstVideoFormat` enum to represent the video format.
 135
 136 In order not to confuse this new format with the common video format, **in**
 137 the *memory:DMABuf* feature, the *drm-format* field in caps replaces the
 138 traditional *format* field.
 139
 140 So a DMABuf-backed video caps may look like:
 141
 142 ```
 143      video/x-raw(memory:DMABuf), \
 144                 drm-format=(string)NV12:0x0100000000000001, \
 145                 width=(int)1920, \
 146                 height=(int)1080, \
 147                 interlace-mode=(string)progressive, \
 148                 multiview-mode=(string)mono, \
 149                 multiview-flags=(GstVideoMultiviewFlagsSet)0:ffffffff:/right-view-first/left-flipped/left-flopped/right-flipped/right-flopped/half-aspect/mixed-mono, \
 150                 pixel-aspect-ratio=(fraction)1/1, \
 151                 framerate=(fraction)24/1, \
 152                 colorimetry=(string)bt709
 153 ```
 154
 155 And when we call a video info API such as `gst_video_info_from_caps()` with
 156 these caps, it should return a video format of `GST_VIDEO_FORMAT_ENCODED`,
 157 leaving the other fields unchanged, as with normal video caps.
 158
 159 In addition, a new structure
 160
 161 ```
 162 struct GstDrmVideoInfo
 163 {
 164   GstVideoInfo vinfo;
 165   guint32 drm_fourcc;
 166   guint64 drm_modifier;
 167 };
 168 ```
 169
 170 is introduced to represent more info of DMA video caps. Users should use
 171 the DMABuf related API, such as `gst_drm_video_info_from_caps()`, to recognize
 172 the video format and parse the DMA info from caps.
 173
 174
 175 ## Meta data
 176
 177 Besides the *file descriptors*, there may be a `GstVideoMeta` attached
 178 to each `GstBuffer` to describe more information such as the width, height,
 179 pitches, strides and plane offsets for that DMA buffer (please note that
 180 the mandatory width and height information appears both in "caps" and here,
 181 and they should always be equal). This kind of information can only be
 182 obtained through each module's API, such as the
 183 `VkImageDrmFormatModifierExplicitCreateInfoEXT` structure in Vulkan, and
 184 `vaExportSurfaceHandle()` in VA-API. The information should be translated
 185 into `GstVideoMeta`'s fields when the DMA buffer is created and
 186 exported. This meta data is useful when another module wants to import the
 187 DMA buffers.
 188
 189 For example, we may create a `GstBuffer` using VA-API's
 190 `vaExportSurfaceHandle()`, and set each field of `GstVideoMeta` with
 191 information from `VADRMPRIMESurfaceDescriptor`. Later, a downstream Vulkan
 192 element imports these DMA buffers with `VkImageDrmFormatModifierExplicitCreateInfoEXT`,
 193 translating fields from the buffer's `GstVideoMeta` into the
 194 `VkSubresourceLayout` parameter.
 195
 196 In short, the `GstVideoMeta` contains the common extra video information
 197 about the DMA buffer, which can be interpreted by each module.
 198
 199 Information in `GstVideoMeta` depends on the hardware context and
 200 settings. Its values, such as stride and pitch, may differ from the standard
 201 video format because of the hardware's requirements. For example, if a DMA
 202 buffer represents a compressed video in memory, its pitch and stride may be
 203 smaller than the standard linear one because of the compression. Please
 204 keep in mind that users should not use this meta data to interpret and
 205 access the DMA buffer, **unless the modifier is linear**.
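
For the linear case, the offsets and strides are enough to locate a sample
in mapped memory. The following sketch shows this for an 8-bit NV12 frame
held in a single memory chunk. The `Nv12Layout` type and both functions are
hypothetical; the type only mirrors the per-plane offset and stride values
that `GstVideoMeta` carries, and the sketch does not apply to any non-linear
modifier.

```c
#include <stddef.h>

/* Hypothetical type: per-plane byte offsets and strides of a linear,
 * 8-bit NV12 frame (plane 0 is Y, plane 1 is interleaved UV). */
typedef struct
{
  size_t offset[2];
  size_t stride[2];
} Nv12Layout;

/* Byte position of the Y sample of pixel (x, y). */
static size_t
nv12_luma_pos (const Nv12Layout * l, size_t x, size_t y)
{
  return l->offset[0] + y * l->stride[0] + x;
}

/* Byte position of the U sample shared by the 2x2 block containing
 * pixel (x, y); the matching V sample is the next byte. */
static size_t
nv12_chroma_pos (const Nv12Layout * l, size_t x, size_t y)
{
  return l->offset[1] + (y / 2) * l->stride[1] + (x / 2) * 2;
}
```

Note that the stride comes from the meta data and not from the width: a
1920 pixel wide frame exported with a stride of 2048 bytes has its second
row of Y samples starting at byte 2048, not at byte 1920.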
 206
 207
 208 # Negotiation of DMA buffer
 209
 210 If two elements of different modules (for example, VA-API decoder to
 211 Wayland sink) want to transfer dmabufs, the negotiation should ensure a
 212 common *drm-format* (FORMAT:MODIFIER). As we have already illustrated how
 213 to represent both of them in caps, the negotiation here in fact needs
 214 no special operation except finding the intersection.
 215
 216
 217 ## Static Template Caps
 218
 219 If an element can list all the DRM fourcc/modifier combinations at
 220 registration time, the `gst-inspect` result should look like:
 221
 222 ```
 223 SRC template: 'src'
 224     Availability: Always
 225       Capabilities:
 226         video/x-raw(memory:DMABuf)
 227           width:  [ 16, 16384 ]
 228           height: [ 16, 16384 ]
 229           drm-format: { (string)NV12:0x0100000000000001, \
 230                         (string)I420, (string)YV12, \
 231                         (string)YUY2:0x0100000000000002, \
 232                         (string)P010_10LE:0x0100000000000002, \
 233                         (string)BGRA:0x0100000000000002, \
 234                         (string)RGBA:0x0100000000000002, \
 235                         (string)BGR10A2_LE:0x0100000000000002, \
 236                         (string)VUYA:0x0100000000000002 }
 237 ```
 238
 239 But because sometimes it is impossible to enumerate and list all
 240 drm_fourcc/modifier combinations in static templates (for example, we may
 241 need a runtime context which is not available at registration time to detect
 242 the real modifiers a HW can support), we can leave the *drm-format* field
 243 absent to mean the superset of all formats.
 244
 245
 246 ## Renegotiation
 247
 248 Sometimes, a renegotiation may happen if the downstream element is not
 249 satisfied with the caps set by the upstream element. For example, a sink
 250 element may not know the preferred DRM fourcc/modifier until the real
 251 render target window is realized. Then, it will send a "reconfigure" event
 252 to the upstream element to request a renegotiation. In this round of
 253 negotiation, the downstream element will provide a more precise *drm-format* list.
 254
 255
 256 ## Example
 257
 258 Consider the pipeline of:
 259
 260 ```
 261 vapostproc ! video/x-raw(memory:DMABuf) ! glupload
 262 ```
 263
 264 where both `vapostproc` and `glupload` work on the same GPU. (The DMABuf
 265 caps filter is just for illustration; it doesn't need to be specified, since
 266 DMA negotiation is well supported.)
 267
 268 The VA-API based `vapostproc` element can detect the modifiers at
 269 element registration time, so its src template should be:
 270
 271 ```
 272 SRC template: 'src'
 273     Availability: Always
 274       Capabilities:
 275         video/x-raw(memory:DMABuf)
 276           width:  [ 16, 16384 ]
 277           height: [ 16, 16384 ]
 278           drm-format: { (string)NV12:0x0100000000000001, \
 279                         (string)NV12, (string)I420, (string)YV12, \
 280                         (string)BGRA:0x0100000000000002 }
 281 ```
 282
 283 In contrast, `glupload` needs the runtime EGL context to check the DRM
 284 fourcc and modifiers, so it can just leave the *drm-format* field absent in
 285 its sink template:
 286
 287 ```
 288 SINK template: 'sink'
 289     Availability: Always
 290       Capabilities:
 291         video/x-raw(memory:DMABuf)
 292           width:  [ 1, 2147483647 ]
 293           height: [ 1, 2147483647 ]
 294 ```
 295
 296 At runtime, when `vapostproc` wants to decide its src caps, it first
 297 queries the downstream `glupload` element about all possible DMA caps. The
 298 `glupload` element should answer that query based on the GL/EGL query
 299 result, such as:
 300
 301 ```
 302 drm-format: { (string)NV12:0x0100000000000001, (string)BGRA }
 303 ```
 304
 305 So, the intersection with `vapostproc`'s src caps will be
 306 `NV12:0x0100000000000001`. It will then be sent downstream (to `glupload`)
 307 by a CAPS event. The `vapostproc` element may also query the allocation
 308 after that CAPS event, but downstream `glupload` will not provide a DMA
 309 buffer pool because the EGL API is mostly for DMAbuf importing. Then
 310 `vapostproc` will create its own DMA pool; the buffers created from that
 311 new pool should conform to the *drm-format*, as described in this document,
 312 `NV12:0x0100000000000001`. Also, the downstream `glupload` should make sure
 313 that it can import other DMA buffers which are not created in a pool it
 314 provided, as long as they conform to *drm-format*
 315 `NV12:0x0100000000000001`.
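
Because each *FORMAT:MODIFIER* string is a single opaque format, the
intersection in this example comes down to exact string matching between
the two *drm-format* lists. The stand-alone sketch below illustrates that
idea only; in GStreamer the work is done by the regular caps intersection,
and `intersect_drm_formats()` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store in out the entries of a that also appear in
 * b, using exact string comparison and keeping the order of a. At most
 * out_cap entries are stored. Returns the number of entries stored. */
static size_t
intersect_drm_formats (const char *const *a, size_t n_a,
    const char *const *b, size_t n_b, const char **out, size_t out_cap)
{
  size_t n = 0;

  for (size_t i = 0; i < n_a && n < out_cap; i++) {
    for (size_t j = 0; j < n_b; j++) {
      if (strcmp (a[i], b[j]) == 0) {
        out[n++] = a[i];
        break;
      }
    }
  }

  return n;
}
```

Applied to the `vapostproc` src template list and the list answered by
`glupload` above, the only common entry is `NV12:0x0100000000000001`. The
linear `BGRA` offered by `glupload` does not match
`BGRA:0x0100000000000002`, since they are different formats.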
 316
 317 Then, when `vapostproc` handles each frame, it creates GPU surfaces with
 318 *drm-format* `NV12:0x0100000000000001`. Each surface is also exported as a
 319 set of file descriptors, each one wrapped in a `GstMemory` allocated by a
 320 subclass of `GstDmaBufAllocator`. All the `GstMemory` are appended to a
 321 `GstBuffer`. There may be some extra information about the pitch, stride
 322 and plane offset when we export the surface; we also need to translate it
 323 into a `GstVideoMeta` and attach it to the `GstBuffer`.
 324
 325 Later, when `glupload` receives a `GstBuffer`, it can use those file
 326 descriptors with *drm-format* `NV12:0x0100000000000001` to import an
 327 EGLImage. If the `GstVideoMeta` exists, these extra parameters should also
 328 be provided to the importing API.