Documentation/contiguous-memory.txt

   1                                                              -*- org -*-
   2
   3 * Contiguous Memory Allocator
   4
   5    The Contiguous Memory Allocator (CMA) is a framework that allows
   6    setting up a machine-specific configuration for physically-contiguous
   7    memory management. Memory for devices is then allocated according
   8    to that configuration.
   9
  10    The main role of the framework is not to allocate memory, but to
  11    parse and manage memory configurations, as well as to act as an
  12    intermediary between device drivers and pluggable allocators.  It is
  13    thus not tied to any memory allocation method or strategy.
  14
  15 ** Why is it needed?
  16
  17     Various devices on embedded systems have no scatter-gather and/or
  18     IO map support and as such require contiguous blocks of memory to
  19     operate.  They include devices such as cameras, hardware video
  20     decoders and encoders, etc.
  21
  22     Such devices often require big memory buffers (a full HD frame is,
  23     for instance, more than 2 megapixels large, i.e. more than 6 MB
  24     of memory), which makes mechanisms such as kmalloc() ineffective.
  25
  26     Some embedded devices impose additional requirements on the
  27     buffers, e.g. they can operate only on buffers allocated in a
  28     particular location or memory bank (if the system has more than
  29     one memory bank) or buffers aligned to a particular memory boundary.
  30
  31     Development of embedded devices has seen a big rise recently
  32     (especially in the V4L area) and many such drivers include their
  33     own memory allocation code. Most of them use bootmem-based methods.
  34     The CMA framework is an attempt to unify contiguous memory
  35     allocation mechanisms and provide a simple API for device drivers,
  36     while staying as customisable and modular as possible.
  37
  38 ** Design
  39
  40     The main design goal for the CMA was to provide a customisable and
  41     modular framework, which could be configured to suit the needs of
  42     individual systems.  The configuration specifies a list of memory
  43     regions, which are then assigned to devices.  Memory regions can
  44     be shared among many device drivers or assigned exclusively to
  45     one.  This has been achieved in the following ways:
  46
  47     1. The core of the CMA does not handle allocation of memory and
  48        management of free space.  Dedicated allocators are used for
  49        that purpose.
  50
  51        This way, if the provided solution does not match demands
  52        imposed on a given system, one can develop a new algorithm and
  53        easily plug it into the CMA framework.
  54
  55        The presented solution includes an implementation of a best-fit
  56        algorithm.
  57
  58     2. When requesting memory, devices have to introduce themselves.
  59        This way CMA knows who the memory is allocated for.  This
  60        allows the system architect to specify which memory regions
  61        each device should use.
  62
  63     3. Memory regions are grouped in various "types".  When a device
  64        requests a chunk of memory, it can specify what type of memory
  65        it needs.  If no type is specified, "common" is assumed.
  66
  67        This makes it possible to configure the system in such a way
  68        that a single device may get memory from different memory
  69        regions, depending on the "type" of memory it requested.  For
  70        example, a video codec driver might want to allocate some
  71        buffers from the first memory bank and others from the
  72        second to get the highest possible memory throughput.
  73
  74     4. For greater flexibility and extensibility, the framework allows
  75        device drivers to register private regions of reserved memory
  76        which then may be used only by them.
  77
  78        As a result, even if a driver does not use the rest of the CMA
  79        interface, it can still use CMA allocators and other
  80        mechanisms.
  81
  82        4a. Early in the boot process, device drivers can also ask the
  83            CMA framework to reserve a region of memory for them,
  84            which will then be used as a private region.
  85
  86            This way, drivers do not need to call bootmem, memblock
  87            or a similar early allocator directly; they merely register
  88            an early region and the framework handles the rest,
  89            including choosing the right early allocator.
  90
  91     5. CMA allows run-time configuration of the memory regions from
  92        which it allocates chunks of memory.  The set of memory
  93        regions is given on the command line so it can be easily
  94        changed without the need to recompile the kernel.
  95
  96        Each region has its own size, alignment requirement, start
  97        address (the physical address where it should be placed) and
  98        allocator algorithm.
  99
 100        This means that different algorithms can run at the same
 101        time if different devices on the platform have distinct
 102        memory usage characteristics and different algorithms suit
 103        them best.
 104
 105 ** Use cases
 106
 107     Let's analyse some imaginary system that uses the CMA to see how
 108     the framework can be used and configured.
 109
 110
 111     We have a platform with a hardware video decoder and a camera each
 112     needing 20 MiB of memory in the worst case.  Our system is written
 113     in such a way though that the two devices are never used at the
 114     same time and memory for them may be shared.  In such a system the
 115     following configuration would be used in the platform
 116     initialisation code:
 117
 118         static struct cma_region regions[] = {
 119                 { .name = "region", .size = 20 << 20 },
 120                 { }
 121         };
 122         static const char map[] __initconst = "video,camera=region";
 123
 124         cma_set_defaults(regions, map);
 125
 126     The regions array defines a single 20-MiB region named "region".
 127     The map says that drivers named "video" and "camera" are to be
 128     granted memory from the previously defined region.
 129
 130     A shorter map can be used as well:
 131
 132         static const char map[] __initconst = "*=region";
 133
 134     The asterisk ("*") matches all devices, so all devices will use
 135     the region named "region".
 136
 137     We can see that, because the devices share the same memory region,
 138     we save 20 MiB compared to the situation where each of the devices
 139     reserves 20 MiB of memory for itself.
 140
 141
 142     Now, let's say that we also have many other smaller devices and we
 143     want them to share some smaller pool of memory, for instance 5
 144     MiB.  This can be achieved in the following way:
 145
 146         static struct cma_region regions[] = {
 147                 { .name = "region", .size = 20 << 20 },
 148                 { .name = "common", .size =  5 << 20 },
 149                 { }
 150         };
 151         static const char map[] __initconst =
 152                 "video,camera=region;*=common";
 153
 154         cma_set_defaults(regions, map);
 155
 156     This instructs CMA to reserve two regions and let video and camera
 157     use region "region" whereas all other devices should use region
 158     "common".
 159
 160
 161     Later on, after some development, the system can run the video
 162     decoder and the camera at the same time.  The 20 MiB region is
 163     no longer enough for the two to share.  A quick fix can be made to
 164     grant each of those devices separate regions:
 165
 166         static struct cma_region regions[] = {
 167                 { .name = "v", .size = 20 << 20 },
 168                 { .name = "c", .size = 20 << 20 },
 169                 { .name = "common", .size =  5 << 20 },
 170                 { }
 171         };
 172         static const char map[] __initconst = "video=v;camera=c;*=common";
 173
 174         cma_set_defaults(regions, map);
 175
 176     This solution also shows how with CMA you can assign private pools
 177     of memory to each device if that is required.
 178
 179     Allocation mechanisms can be replaced dynamically in a similar
 180     manner as well. Let's say that during testing, it has been
 181     discovered that, for a given shared region of 40 MiB,
 182     fragmentation has become a problem.  It has been observed that,
 183     after some time, it becomes impossible to allocate buffers of the
 184     required sizes. So to satisfy our requirements, we would have to
 185     reserve a larger shared region beforehand.
 186
 187     But fortunately, you have also managed to develop a new allocation
 188     algorithm -- Neat Allocation Algorithm or "na" for short -- which
 189     satisfies the needs of both devices even on a 30 MiB region.  The
 190     configuration can then be quickly changed to:
 191
 192         static struct cma_region regions[] = {
 193                 { .name = "region", .size = 30 << 20, .alloc_name = "na" },
 194                 { .name = "common", .size =  5 << 20 },
 195                 { }
 196         };
 197         static const char map[] __initconst = "video,camera=region;*=common";
 198
 199         cma_set_defaults(regions, map);
 200
 201     This shows how you can develop your own allocation algorithms if
 202     the ones provided with CMA do not suit your needs and easily
 203     replace them, without the need to modify the CMA core or even
 204     recompile the kernel.
 205
 206 ** Technical Details
 207
 208 *** The attributes
 209
 210     As shown above, CMA is configured by two attributes: a list of
 211     regions and a map.  The first one specifies regions that are to be
 212     reserved for CMA.  The second one specifies which regions each
 213     device is assigned to.
 214
 215 **** Regions
 216
 217      Regions is a list of regions terminated by a region with size
 218      equal to zero.  The following fields may be set:
 219
 220      - size       -- size of the region (required, must not be zero)
 221      - alignment  -- alignment of the region; must be power of two or
 222                      zero (optional)
 223      - start      -- where the region has to start (optional)
 224      - alloc_name -- the name of allocator to use (optional)
 225      - alloc      -- allocator to use (optional; alloc_name is
 226                      probably what you want instead)
 227
 228      size, alignment and start are specified in bytes.  size will be
 229      aligned up to PAGE_SIZE.  If alignment is less than PAGE_SIZE
 230      it will be set to PAGE_SIZE.  start will be aligned to
 231      alignment.
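
          The rounding rules above can be sketched in plain C.  This is a
          userspace illustration only, not the CMA implementation: the
          EXAMPLE_PAGE_SIZE value, struct region_geom and region_normalise()
          are names made up for this example, and rounding start upwards is
          an assumption since the text does not say which direction is used.

              ```c
              #define EXAMPLE_PAGE_SIZE 4096UL  /* assumed page size */

              /* Round x up to a multiple of a; a must be a power of two. */
              #define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

              struct region_geom {    /* hypothetical, not struct cma_region */
                      unsigned long size;
                      unsigned long alignment;
                      unsigned long start;
              };

              /* Returns 0 on success, -1 if a field breaks the rules above. */
              static int region_normalise(struct region_geom *r)
              {
                      if (r->size == 0)
                              return -1;  /* size is required */
                      if (r->alignment & (r->alignment - 1))
                              return -1;  /* neither zero nor a power of two */

                      r->size = ALIGN_UP(r->size, EXAMPLE_PAGE_SIZE);
                      if (r->alignment < EXAMPLE_PAGE_SIZE)
                              r->alignment = EXAMPLE_PAGE_SIZE;
                      r->start = ALIGN_UP(r->start, r->alignment);
                      return 0;
              }
              ```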
 232
 233      If command line parameter support is enabled, this attribute can
 234      also be overridden by a command line "cma" parameter.  When given
 235      on the command line its format is as follows:
 236
 237          regions-attr  ::= [ regions [ ';' ] ]
 238          regions       ::= region [ ';' regions ]
 239
 240          region        ::= REG-NAME
 241                              '=' size
 242                            [ '@' start ]
 243                            [ '/' alignment ]
 244                            [ ':' ALLOC-NAME ]
 245
 246          size          ::= MEMSIZE   // size of the region
 247          start         ::= MEMSIZE   // desired start address of
 248                                      // the region
 249          alignment     ::= MEMSIZE   // alignment of the start
 250                                      // address of the region
 251
 252      REG-NAME specifies the name of the region.  All regions given
 253      via the regions attribute need to have a name.  Moreover, all
 254      regions need to have a unique name.  If two regions have the same
 255      name it is unspecified which will be used when requesting to
 256      allocate memory from a region with the given name.
 257
 258      ALLOC-NAME specifies the name of the allocator to be used with the
 259      region.  If no allocator name is provided, the "default"
 260      allocator will be used with the region.  The "default" allocator
 261      is, of course, the first allocator that has been registered. ;)
 262
 263      size, start and alignment are specified in bytes with suffixes
 264      that memparse() accepts.  If start is given, the region will be
 265      reserved at the given starting address (or as close to it as
 266      possible).  If alignment is specified, the region will be aligned
 267      to the given value.
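
          A parser for a single region production from the grammar above
          might look like the sketch below.  It is a userspace
          approximation, not the kernel's parser: struct region_spec,
          parse_memsize(), copy_token() and parse_region() are hypothetical
          names, and parse_memsize() only imitates the K/M/G suffix
          handling of memparse() on top of strtoull().

              ```c
              #include <stdlib.h>
              #include <string.h>

              struct region_spec {    /* hypothetical holder for parsed fields */
                      char name[32];
                      char alloc_name[32];  /* empty: use the default allocator */
                      unsigned long long size, start, alignment;
              };

              /* Stand-in for memparse(): a number with an optional K/M/G suffix. */
              static unsigned long long parse_memsize(const char *s, char **end)
              {
                      unsigned long long v = strtoull(s, end, 0);

                      switch (**end) {
                      case 'G': case 'g': v <<= 30; (*end)++; break;
                      case 'M': case 'm': v <<= 20; (*end)++; break;
                      case 'K': case 'k': v <<= 10; (*end)++; break;
                      }
                      return v;
              }

              /* Copy a non-empty token ending at any stop character. */
              static const char *copy_token(char *dst, size_t n, const char *s,
                                            const char *stop)
              {
                      size_t len = strcspn(s, stop);

                      if (len == 0 || len >= n)
                              return NULL;
                      memcpy(dst, s, len);
                      dst[len] = '\0';
                      return s + len;
              }

              /* Parse one region; returns the position after it or NULL. */
              static const char *parse_region(const char *s, struct region_spec *r)
              {
                      char *end;

                      memset(r, 0, sizeof(*r));
                      s = copy_token(r->name, sizeof(r->name), s, "=;");
                      if (!s || *s != '=')
                              return NULL;
                      r->size = parse_memsize(s + 1, &end);
                      if (end == s + 1 || r->size == 0)
                              return NULL;  /* size is required and non-zero */
                      s = end;
                      if (*s == '@') {
                              r->start = parse_memsize(s + 1, &end);
                              s = end;
                      }
                      if (*s == '/') {
                              r->alignment = parse_memsize(s + 1, &end);
                              s = end;
                      }
                      if (*s == ':') {
                              s = copy_token(r->alloc_name, sizeof(r->alloc_name),
                                             s + 1, ";");
                              if (!s)
                                      return NULL;
                      }
                      return (*s == ';' || *s == '\0') ? s : NULL;
              }
              ```

          Calling parse_region() again on the character after a returned
          ';' walks the remaining regions of the list.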
 268
 269 **** Map
 270
 271      The format of the "map" attribute is as follows:
 272
 273          map-attr      ::= [ rules [ ';' ] ]
 274          rules         ::= rule [ ';' rules ]
 275          rule          ::= patterns '=' regions
 276
 277          patterns      ::= pattern [ ',' patterns ]
 278
 279          regions       ::= REG-NAME [ ',' regions ]
 280                        // list of regions to try to allocate memory
 281                        // from
 282
 283          pattern       ::= dev-pattern [ '/' TYPE-NAME ] | '/' TYPE-NAME
 284                        // pattern the request must match for the rule
 285                        // to apply; the first rule that matches is
 286                        // applied; if the dev-pattern part is omitted,
 287                        // the value used in the previous pattern is
 288                        // assumed.
 289
 290          dev-pattern   ::= PATTERN
 291                        // pattern that device name must match for the
 292                        // rule to apply; may contain question marks,
 293                        // which match any character, and end with an
 294                        // asterisk, which matches the rest of the string
 295                        // (including nothing).
 296
 297      The map is a sequence of rules specifying which regions a given
 298      (device, type) pair uses.  The first rule that matches is applied.
 299
 300      For a rule to match, its pattern must match the (dev, type) pair.
 301      A pattern consists of the parts before and after the slash.  The
 302      first part must match the device name and the second the type.
 303
 304      If the first part is empty, the device name is assumed to match
 305      iff it matched in the previous pattern.  If the second part is
 306      omitted it will match any type of memory requested by the device.
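
          The matching rules can be sketched as follows.  This is an
          illustration in userspace C rather than the CMA code:
          dev_pattern_matches() and pattern_matches() are hypothetical
          helpers, the 64-byte buffer is arbitrary, and matching the type
          part with the same wildcard rules as the device part is an
          assumption based on patterns such as "foo/*" used elsewhere in
          this document.

              ```c
              #include <stdbool.h>
              #include <string.h>

              /*
               * '?' matches any single character and a trailing '*' matches
               * the rest of the name (including nothing).
               */
              static bool dev_pattern_matches(const char *pat, const char *name)
              {
                      for (; *pat; pat++, name++) {
                              if (*pat == '*' && pat[1] == '\0')
                                      return true;
                              if (*name == '\0')
                                      return false;
                              if (*pat != '?' && *pat != *name)
                                      return false;
                      }
                      return *name == '\0';
              }

              /*
               * Match "dev-pattern[/TYPE-NAME]" against a (dev, type) pair.
               * prev_dev_matched is the device result of the previous
               * pattern; it is reused when the device part is empty.
               */
              static bool pattern_matches(const char *pat, const char *dev,
                                          const char *type, bool prev_dev_matched)
              {
                      const char *slash = strchr(pat, '/');
                      size_t len = slash ? (size_t)(slash - pat) : strlen(pat);
                      char devpat[64];
                      bool dev_ok;

                      if (len >= sizeof(devpat))
                              return false;
                      memcpy(devpat, pat, len);
                      devpat[len] = '\0';

                      dev_ok = len ? dev_pattern_matches(devpat, dev)
                                   : prev_dev_matched;
                      if (!dev_ok)
                              return false;
                      if (!slash)
                              return true;      /* no type part: any type */
                      if (!type)
                              type = "common";  /* NULL means "common" */
                      return dev_pattern_matches(slash + 1, type);
              }
              ```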
 307
 308      If SysFS support is enabled, this attribute is accessible via
 309      SysFS and can be changed at run-time by writing to
 310      /sys/kernel/mm/contiguous/map.
 311
 312      If command line parameter support is enabled, this attribute can
 313      also be overridden by a command line "cma.map" parameter.
 314
 315 **** Examples
 316
 317      Some examples (whitespace added for better readability):
 318
 319          cma = r1 = 64M       // 64M region
 320                     @512M       // starting at address 512M
 321                                 // (or at least as near as possible)
 322                     /1M         // make sure it's aligned to 1M
 323                     :foo(bar);  // uses allocator "foo" with "bar"
 324                                 // as parameters for it
 325                r2 = 64M       // 64M region
 326                     /1M;        // make sure it's aligned to 1M
 327                                 // uses the first available allocator
 328                r3 = 64M       // 64M region
 329                     @512M       // starting at address 512M
 330                     :foo;       // uses allocator "foo" with no parameters
 331
 332          cma_map = foo = r1;
 333                        // device foo with type == NULL uses region r1
 334
 335                    foo/quaz = r2;  // OR:
 336                    /quaz = r2;
 337                        // device foo with type == "quaz" uses region r2
 338
 339          cma_map = foo/quaz = r1;
 340                        // device foo with type == "quaz" uses region r1
 341
 342                    foo/* = r2;     // OR:
 343                    /* = r2;
 344                        // device foo with any other type uses region r2
 345
 346                    bar = r1,r2;
 347                        // device bar uses region r1 or r2
 348
 349                    baz?/a , baz?/b = r3;
 350                        // devices named baz? where ? is any character
 351                        // with type being "a" or "b" use r3
 352
 353 *** The device and types of memory
 354
 355     The name of the device is taken from the device structure.  It is
 356     not possible to use CMA if a driver does not register a device
 357     (although this can be overcome by providing a fake device
 358     structure with at least the name set).
 359
 360     The type of memory is an optional argument provided by the device
 361     whenever it requests a memory chunk.  In many cases this can be
 362     ignored but sometimes it may be required for some devices.
 363
 364     For instance, let's say that there are two memory banks and for
 365     performance reasons a device uses buffers in both of them.
 366     The platform defines memory types "a" and "b" for regions in the
 367     two banks.  The device driver would then use those two types to
 368     request memory chunks from different banks.  CMA attributes could
 369     look as follows:
 370
 371          static struct cma_region regions[] = {
 372                  { .name = "a", .size = 32 << 20 },
 373                  { .name = "b", .size = 32 << 20, .start = 512 << 20 },
 374                  { }
 375          };
 376          static const char map[] __initconst = "foo/a=a;foo/b=b;*=a,b";
 377
 378     And whenever the driver allocated memory it would specify the
 379     type of memory:
 380
 381         buffer1 = cma_alloc(dev, "a", 1 << 20, 0);
 382         buffer2 = cma_alloc(dev, "b", 1 << 20, 0);
 383
 384     If the driver should also try the other bank when the dedicated
 385     one is full, the map attribute could be changed to:
 386
 387          static const char map[] __initconst = "foo/a=a,b;foo/b=b,a;*=a,b";
 388
 389     On the other hand, if the same driver was used on a system with
 390     only one bank, the configuration could be changed just to:
 391
 392          static struct cma_region regions[] = {
 393                  { .name = "r", .size = 64 << 20 },
 394                  { }
 395          };
 396          static const char map[] __initconst = "*=r";
 397
 398     without the need to change the driver at all.
 399
 400 *** Device API
 401
 402     There are three basic calls provided by the CMA framework to
 403     devices.  To allocate a chunk of memory, the cma_alloc() function
 404     is used:
 405
 406         dma_addr_t cma_alloc(const struct device *dev, const char *type,
 407                              size_t size, dma_addr_t alignment);
 408
 409     If required, the device may specify an alignment in bytes that the
 410     chunk needs to satisfy.  It has to be a power of two or zero.  The
 411     chunks are always aligned at least to a page.
 412
 413     The type specifies the type of memory as described in the
 414     previous subsection.  If the device driver does not care about the
 415     memory type it can safely pass NULL as the type, which is the same
 416     as passing "common".
 417
 418     The basic usage of the function is simply:
 419
 420         addr = cma_alloc(dev, NULL, size, 0);
 421
 422     The function returns the bus address of the allocated chunk or a
 423     value that evaluates to true if checked with IS_ERR_VALUE(), so the
 424     correct way to check for errors is:
 425
 426         unsigned long addr = cma_alloc(dev, NULL, size, 0);
 427         if (IS_ERR_VALUE(addr))
 428                 /* Error */
 429                 return (int)addr;
 430         /* Allocated */
 431
 432     (Make sure to include <linux/err.h> which contains the definition
 433     of the IS_ERR_VALUE() macro.)
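
         To make the error convention concrete, here is a userspace model
         of the check.  IS_ERR_VALUE() and MAX_ERRNO are re-defined locally
         to mimic <linux/err.h>, which treats the top 4095 values of an
         unsigned long as negative errno codes; fake_alloc() and
         get_buffer() are hypothetical stand-ins and the returned address
         is made up.

             ```c
             #include <errno.h>

             /* Local model of the kernel macros, for illustration only. */
             #define MAX_ERRNO 4095
             #define IS_ERR_VALUE(x) \
                     ((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

             /* Stand-in for cma_alloc(): fails for requests above 1 MiB. */
             static unsigned long fake_alloc(unsigned long size)
             {
                     if (size > (1UL << 20))
                             return (unsigned long)-ENOMEM;
                     return 0x40000000UL;  /* made-up bus address */
             }

             /* Returns 0 and stores the address, or a negative errno. */
             static int get_buffer(unsigned long size, unsigned long *out)
             {
                     unsigned long addr = fake_alloc(size);

                     if (IS_ERR_VALUE(addr))
                             return (int)(long)addr;
                     *out = addr;
                     return 0;
             }
             ```

         Because the error code travels in the same value as the address,
         a plain comparison against zero is not a valid failure test.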
 434
 435
 436     An allocated chunk is freed via the cma_free() function:
 437
 438         int cma_free(dma_addr_t addr);
 439
 440     It takes the bus address of the chunk as an argument and frees it.
 441
 442
 443     The last function is cma_info(), which returns information
 444     about regions assigned to a given (dev, type) pair.  Its syntax is:
 445
 446         int cma_info(struct cma_info *info,
 447                      const struct device *dev,
 448                      const char *type);
 449
 450     On successful exit it fills the info structure with the lower and
 451     upper bound of the regions, the total size and the number of regions
 452     assigned to the given (dev, type) pair.
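
         What such a summary involves can be sketched as below.  The
         structure layouts are assumptions made for the example (struct
         demo_info is not the real struct cma_info), as is treating the
         upper bound as the first address past the highest region.

             ```c
             #include <stddef.h>

             struct demo_region {    /* hypothetical region description */
                     unsigned long start, size;
             };

             struct demo_info {      /* hypothetical summary structure */
                     unsigned long lower_bound, upper_bound, total_size;
                     unsigned int count;
             };

             /* Summarise n regions; returns -1 if there are none. */
             static int demo_info_fill(struct demo_info *info,
                                       const struct demo_region *regs, size_t n)
             {
                     size_t i;

                     if (n == 0)
                             return -1;
                     info->lower_bound = regs[0].start;
                     info->upper_bound = regs[0].start + regs[0].size;
                     info->total_size = 0;
                     info->count = 0;
                     for (i = 0; i < n; i++) {
                             unsigned long end = regs[i].start + regs[i].size;

                             if (regs[i].start < info->lower_bound)
                                     info->lower_bound = regs[i].start;
                             if (end > info->upper_bound)
                                     info->upper_bound = end;
                             info->total_size += regs[i].size;
                             info->count++;
                     }
                     return 0;
             }
             ```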
 453
 454 **** Dynamic and private regions
 455
 456      In the basic setup, regions are provided and initialised by
 457      platform initialisation code (which usually uses
 458      cma_set_defaults() for that purpose).
 459
 460      It is, however, possible to create and add regions dynamically
 461      using the cma_region_register() function.
 462
 463          int cma_region_register(struct cma_region *reg);
 464
 465      The region does not have to have a name.  If it does not, it
 466      cannot be accessed via the standard mapping (the one provided with
 467      the map attribute).  Such regions are private and to allocate a
 468      chunk from them, one needs to call:
 469
 470          dma_addr_t cma_alloc_from_region(struct cma_region *reg,
 471                                           size_t size, dma_addr_t alignment);
 472
 473      It is just like cma_alloc() except one specifies what region to
 474      allocate memory from.  The region must have been registered.
 475
 476 **** Allocating from region specified by name
 477
 478      If a driver prefers to allocate from a region or list of regions
 479      whose names it knows, it can use a different call similar to the
 480      previous one:
 481
 482          dma_addr_t cma_alloc_from(const char *regions,
 483                                    size_t size, dma_addr_t alignment);
 484
 485      The first argument is a comma-separated list of regions the
 486      driver desires CMA to try and allocate from.  The list is
 487      terminated by a NUL byte or a semicolon.
 488
 489      Similarly, there is a call for requesting information about named
 490      regions:
 491
 492          int cma_info_about(struct cma_info *info, const char *regions);
 493
 494      Generally, there should be no need to use those interfaces but
 495      they are provided nevertheless.
 496
 497 **** Registering early regions
 498
 499      An early region is a region that is managed by CMA early during
 500      the boot process.  It is the platform's responsibility to reserve
 501      memory for early regions.  Later on, when CMA initialises, early
 502      regions with reserved memory are registered as normal regions.
 503      Registering an early region may be a way for a device to request
 504      a private pool of memory without worrying about actually
 505      reserving the memory:
 506
 507          int cma_early_region_register(struct cma_region *reg);
 508
 509      This needs to be done quite early in the boot process, before the
 510      platform traverses the cma_early_regions list to reserve memory.
 511
 512      When the boot process ends, the device driver may check whether the
 513      region was reserved (by checking the reg->reserved flag) and if so,
 514      whether it was successfully registered as a normal region (by
 515      checking the reg->registered flag).  If that is the case, the device
 516      driver can use normal API calls to use the region.
 517
 518 *** Allocator operations
 519
 520     Creating an allocator for CMA needs four functions to be
 521     implemented.
 522
 523
 524     The first two are used to initialise an allocator for a given region
 525     and clean up afterwards:
 526
 527         int  cma_foo_init(struct cma_region *reg);
 528         void cma_foo_cleanup(struct cma_region *reg);
 529
 530     The first is called when the allocator is attached to a region.
 531     When the function is called, the cma_region structure is fully
 532     initialised (i.e. starting address and size have correct values).
 533     As a matter of fact, the allocator should never modify the
 534     cma_region structure other than the private_data field, which it
 535     may use to point to its private data.
 536
 537     The second call cleans up and frees all resources the allocator
 538     has allocated for the region.  The function can assume that all
 539     chunks allocated from this region have been freed and thus the
 540     whole region is free.
 541
 542
 543     The two other calls are used for allocating and freeing chunks.
 544     They are:
 545
 546         struct cma_chunk *cma_foo_alloc(struct cma_region *reg,
 547                                         size_t size, dma_addr_t alignment);
 548         void cma_foo_free(struct cma_chunk *chunk);
 549
 550     As the names imply, the first allocates a chunk of memory and the
 551     other frees it.  They also manage the cma_chunk object
 552     representing the chunk in physical memory.
 553
 554     Both functions can assume that they are the only thread
 555     accessing the region.  Therefore, the allocator does not need to
 556     worry about concurrency.  Moreover, all arguments are guaranteed to
 557     be valid (i.e. a page-aligned size and a power-of-two alignment no
 558     lower than the page size).
 559
 560
 561     When the allocator is ready, all that is left is to register it by
 562     calling the cma_allocator_register() function:
 563
 564         int cma_allocator_register(struct cma_allocator *alloc);
 565
 566     The argument is a structure with pointers to the above functions
 567     and the allocator's name.  The whole call may look something like
 568     this:
 569
 570         static struct cma_allocator alloc = {
 571                 .name    = "foo",
 572                 .init    = cma_foo_init,
 573                 .cleanup = cma_foo_cleanup,
 574                 .alloc   = cma_foo_alloc,
 575                 .free    = cma_foo_free,
 576         };
 577         return cma_allocator_register(&alloc);
 578
 579     The name ("foo") will be used when this particular allocator is
 580     requested as the allocator for a given region.
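
         As an illustration of the four operations, the sketch below
         implements a deliberately trivial bump allocator in userspace C.
         It is neither the best-fit allocator shipped with CMA nor kernel
         code: struct model_region and struct model_chunk are simplified
         stand-ins for cma_region and cma_chunk, and all bump_*() names
         are hypothetical.

             ```c
             #include <stdlib.h>

             struct model_region {   /* stand-in for struct cma_region */
                     unsigned long start, size;
                     void *private_data;
             };

             struct model_chunk {    /* stand-in for struct cma_chunk */
                     struct model_region *reg;
                     unsigned long start, size;
             };

             struct bump_state {
                     unsigned long next;  /* first unused address */
                     unsigned int live;   /* chunks not yet freed */
             };

             static int bump_init(struct model_region *reg)
             {
                     struct bump_state *st = malloc(sizeof(*st));

                     if (!st)
                             return -1;
                     st->next = reg->start;
                     st->live = 0;
                     reg->private_data = st;  /* the only field touched */
                     return 0;
             }

             static void bump_cleanup(struct model_region *reg)
             {
                     free(reg->private_data);
                     reg->private_data = NULL;
             }

             /* alignment is assumed to be a non-zero power of two. */
             static struct model_chunk *bump_alloc(struct model_region *reg,
                                                   unsigned long size,
                                                   unsigned long alignment)
             {
                     struct bump_state *st = reg->private_data;
                     unsigned long start =
                             (st->next + alignment - 1) & ~(alignment - 1);
                     struct model_chunk *chunk;

                     if (start < st->next ||
                         start - reg->start > reg->size ||
                         size > reg->size - (start - reg->start))
                             return NULL;  /* does not fit */
                     chunk = malloc(sizeof(*chunk));
                     if (!chunk)
                             return NULL;
                     chunk->reg = reg;
                     chunk->start = start;
                     chunk->size = size;
                     st->next = start + size;
                     st->live++;
                     return chunk;
             }

             static void bump_free(struct model_chunk *chunk)
             {
                     struct bump_state *st = chunk->reg->private_data;

                     /* Space is reclaimed once every chunk is freed. */
                     if (--st->live == 0)
                             st->next = chunk->reg->start;
                     free(chunk);
             }
             ```

         No locking appears in the sketch because, as stated above, the
         framework is the one that serialises access to a region.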
 581
 582 *** Integration with platform
 583
 584     There is one function that needs to be called from platform
 585     initialisation code.  That is the cma_early_regions_reserve()
 586     function:
 587
 588         void cma_early_regions_reserve(int (*reserve)(struct cma_region *reg));
 589
 590     It traverses the list of all early regions provided by the platform
 591     and registered by drivers and reserves memory for them.  The only
 592     argument is a callback function used to reserve the region.
 593     Passing NULL as the argument is the same as passing the
 594     cma_early_region_reserve() function, which uses bootmem and
 595     memblock for allocating.
 596
 597     Alternatively, platform code could traverse the cma_early_regions
 598     list by itself but this should never be necessary.
 599
 600
 601     The platform also has a way of providing default attributes for
 602     CMA; the cma_set_defaults() function is used for that purpose:
 603
 604         int cma_set_defaults(struct cma_region *regions, const char *map);
 605
 606     It needs to be called after early params have been parsed but
 607     prior to reserving regions.  It lets one specify the list of
 608     regions defined by the platform and the map attribute.  The map may
 609     point to a string in __initdata.  See above in this document for
 610     example usage of this function.
 611
 612 ** Future work
 613
 614     In the future, implementation of mechanisms that would allow the
 615     free space inside the regions to be used as page cache, filesystem
 616     buffers or swap devices is planned.  With such mechanisms, the
 617     memory would not be wasted when not used.
 618
 619     Because all allocations and freeing of chunks pass through the CMA
 620     framework, it can track which parts of the reserved memory are
 621     free and which parts are allocated.  Tracking the unused memory
 622     would let CMA use it for other purposes such as page cache, I/O
 623     buffers, swap, etc.