docs/programmer_reference/am_conf_select.html

   1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
   2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   3 <html xmlns="http://www.w3.org/1999/xhtml">
   4   <head>
   5     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
   6     <title>Selecting an access method</title>
   7     <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
   8     <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
   9     <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
  10     <link rel="up" href="am_conf.html" title="Chapter 2.  Access Method Configuration" />
  11     <link rel="prev" href="am_conf.html" title="Chapter 2.  Access Method Configuration" />
  12     <link rel="next" href="am_conf_logrec.html" title="Logical record numbers" />
  13   </head>
  14   <body>
  15     <div xmlns="" class="navheader">
  16       <div class="libver">
  17         <p>Library Version 11.2.5.3</p>
  18       </div>
  19       <table width="100%" summary="Navigation header">
  20         <tr>
  21           <th colspan="3" align="center">Selecting an access method</th>
  22         </tr>
  23         <tr>
  24           <td width="20%" align="left"><a accesskey="p" href="am_conf.html">Prev</a> </td>
  25           <th width="60%" align="center">Chapter 2.
  26                 Access Method Configuration
  27         </th>
  28           <td width="20%" align="right"> <a accesskey="n" href="am_conf_logrec.html">Next</a></td>
  29         </tr>
  30       </table>
  31       <hr />
  32     </div>
  33     <div class="sect1" lang="en" xml:lang="en">
  34       <div class="titlepage">
  35         <div>
  36           <div>
  37             <h2 class="title" style="clear: both"><a id="am_conf_select"></a>Selecting an access method</h2>
  38           </div>
  39         </div>
  40       </div>
  41       <div class="toc">
  42         <dl>
  43           <dt>
  44             <span class="sect2">
  45               <a href="am_conf_select.html#idm1772384">Btree or Heap?</a>
  46             </span>
  47           </dt>
  48           <dt>
  49             <span class="sect2">
  50               <a href="am_conf_select.html#idm2622384">Hash or Btree?</a>
  51             </span>
  52           </dt>
  53           <dt>
  54             <span class="sect2">
  55               <a href="am_conf_select.html#idm1789184">Queue or Recno?</a>
  56             </span>
  57           </dt>
  58         </dl>
  59       </div>
  60       <p>
  61         The Berkeley DB access method implementation unavoidably interacts with
  62         each application's data set, locking requirements and data access
  63         patterns.  For this reason, one access method may result in
  64         dramatically better performance for an application than another one.
  65         Applications whose data could be stored using more than one access
  66         method may want to benchmark their performance using the different
  67         candidates.
  68     </p>
  69       <p>
  70         One of the strengths of Berkeley DB is that it provides multiple access
  71         methods with nearly identical interfaces to the different access
  72         methods.  This means that it is simple to modify an application to use
  73         a different access method.  Applications can easily benchmark the
  74         different Berkeley DB access methods against each other for their
  75         particular data set and access pattern.
  76     </p>
  77       <p>
  78         Most applications choose between using the Btree or Heap access
  79         methods, between Btree or Hash, or between Queue and Recno, because
  80         each of these pairs offer similar functionality.
  81     </p>
  82       <div class="sect2" lang="en" xml:lang="en">
  83         <div class="titlepage">
  84           <div>
  85             <div>
  86               <h3 class="title"><a id="idm1772384"></a>Btree or Heap?</h3>
  87             </div>
  88           </div>
  89         </div>
  90         <p>
  91             Most applications use Btree because it performs well for most
  92             general-purpose database workloads.  But there are
  93             circumstances where Heap is the better choice.  This section
  94             describes the differences between the two access methods so
  95             that you can better understand when Heap might be the superior
  96             choice for your application.
  97         </p>
  98         <p>
  99             Before continuing, it is helpful to have a high level
 100             understanding of the operating differences between Btree and
 101             Heap.
 102         </p>
 103         <div class="sect3" lang="en" xml:lang="en">
 104           <div class="titlepage">
 105             <div>
 106               <div>
 107                 <h4 class="title"><a id="idm1549752"></a>Disk Space Usage</h4>
 108               </div>
 109             </div>
 110           </div>
 111           <p>
 112                 The Heap access method was developed for use in systems
 113                 with constrained disk space (such as an embedded system).
 114                 Because of the way it reuses page space, for some workloads
 115                 it can be much better than Btree on disk space usage
 116                 because it will not grow the on-disk database file as fast
 117                 as Btree.  Of course, this assumes that your application is
 118                 characterized by a roughly equal number of record creations
 119                 and deletions.
 120             </p>
 121           <p>
 122                 Also, Heap can actively control the space used by the
 123                 database with the use of the <a href="../api_reference/C/dbset_heapsize.html" class="olink">DB-&gt;set_heapsize()</a> method. When
 124                 the limit specified by that method is reached, no
 125                 additional pages will be allocated and existing pages will
 126                 be aggressively searched for free space.  Also records in
 127                 the heap can be split to fill space on two or more pages.
 128             </p>
 129         </div>
 130         <div class="sect3" lang="en" xml:lang="en">
 131           <div class="titlepage">
 132             <div>
 133               <div>
 134                 <h4 class="title"><a id="idm1572504"></a>Record Access</h4>
 135               </div>
 136             </div>
 137           </div>
 138           <p>
 139                 Btree and Heap are fundamentally different because of the
 140                 way that you access records in them. In Btree, you access a
 141                 record by using the record's key. This lookup occurs fairly
 142                 quickly because Btree places records in the database
 143                 according to a pre-defined sorting order.  Your application
 144                 is responsible for constructing the key, which means that
 145                 it is relatively easy for your application to know what key
 146                 is in use by any given record.
 147             </p>
 148           <p>
 149                 Conversely, Heap accesses records based on their offset
 150                 location within the database. You retrieve a record in a
 151                 Heap database using the record's Record ID (RID), which is
 152                 created when the record is added to the database.  The RID
 153                 is created for you; you cannot specify this yourself.
 154                 Because the RID is created for you, your application does
 155                 not have control over the key value.  For this reason,
 156                 retrieval operations for a Heap database are usually
 157                 performed using secondary databases. You can then use this
 158                 secondary index to retrieve records stored in your Heap
 159                 database.
 160             </p>
 161           <p>
 162                 Note that an application's data access requirements
 163                 grow complex, Btree databases also frequently
 164                 require secondary databases. So at a certain level
 165                 of complexity you will be using secondary databases
 166                 regardless of the access method that you choose.
 167             </p>
 168           <p>
 169                 Secondary databases are described in
 170                 <a class="xref" href="am_second.html" title="Secondary indexes">Secondary indexes</a>.
 171             </p>
 172         </div>
 173         <div class="sect3" lang="en" xml:lang="en">
 174           <div class="titlepage">
 175             <div>
 176               <div>
 177                 <h4 class="title"><a id="idm2352312"></a>Record Creation/Deletion</h4>
 178               </div>
 179             </div>
 180           </div>
 181           <p>
 182                 When Btree creates a new record, it places the record on
 183                 the database page that is appropriate for the sorting order
 184                 in use by the database. If Btree can not find a page to put
 185                 the new record on, it locates a page that is in the proper
 186                 location for the new record, splits it so that the existing
 187                 records are divided between the two pages, and then adds
 188                 the new record to the appropriate page.
 189             </p>
 190           <p>
 191                 On deletion, Btree simply removes the deleted record from
 192                 whatever page it is stored on. This leaves some amount of
 193                 unused space ("holes") on the page. Only new records that
 194                 sort to this page can fill that space. However, once a page
 195                 is completely empty, it can be reused to hold records with
 196                 a different sort value.
 197             </p>
 198           <p>
 199                 In order to reclaim unused disk space, you must run the
 200                 <a href="../api_reference/C/dbcompact.html" class="olink">DB-&gt;compact()</a> method, which attempts to fill holes in
 201                 existing pages by moving records from other pages. If it is
 202                 successful in moving enough records, it might be left with
 203                 entire pages that have no data on them. In this event, the
 204                 unused pages might be removed from the database (depending
 205                 on the flags that you provide to <a href="../api_reference/C/dbcompact.html" class="olink">DB-&gt;compact()</a>), which causes
 206                 the database file to be reduced in size.
 207             </p>
 208           <p>
 209                 Both tree searching and page compaction are relatively
 210                 expensive operations. Heap avoids these operations, and so
 211                 is able to perform better under some circumstances.
 212             </p>
 213           <p>
 214                 Heap does not care about record order.  When a record is
 215                 created in a Heap database, it is placed on the first page
 216                 that has space to store the record. No sorting is involved,
 217                 and so the overhead from record sorting is removed.
 218             </p>
 219           <p>
 220                 On deletion, both Btree and Heap free space within a page
 221                 when a record is deleted.  However, unlike Btree, Heap has
 222                 no compaction operation, nor does it have to wait for a
 223                 record with the proper sort order to fill a hole on a page.
 224                 Instead, Heap simply reuses empty page space whenever any
 225                 record is added that will fit into the space.
 226             </p>
 227         </div>
 228         <div class="sect3" lang="en" xml:lang="en">
 229           <div class="titlepage">
 230             <div>
 231               <div>
 232                 <h4 class="title"><a id="idm1962568"></a>Cursor Operations</h4>
 233               </div>
 234             </div>
 235           </div>
 236           <p>
 237                 When considering Heap, be aware that this access method
 238                 does not support the full range of cursor operations that
 239                 Btree does.
 240             </p>
 241           <div class="itemizedlist">
 242             <ul type="disc">
 243               <li>
 244                 <p>
 245                         On sequential cursor scans of the database, the
 246                         retrieval order of the records is not predictable
 247                         for Heap because the records are not sorted. Btree,
 248                         of course, sorts its records so the retrieval order
 249                         is predictable.
 250                     </p>
 251               </li>
 252               <li>
 253                 <p>
 254                         When using a Heap database, you cannot create new
 255                         records using a cursor.  Also, this means
 256                         that Heap does not support the <a href="../api_reference/C/dbcput.html" class="olink">DBC-&gt;put()</a>
 257                         <code class="literal">DB_AFTER</code> and
 258                         <code class="literal">DB_BEFORE</code> flags.
 259                         You can, however, update existing records using a
 260                         cursor.
 261                     </p>
 262               </li>
 263               <li>
 264                 <p>
 265                         For concurrent applications, iterating through the records in a
 266                         Heap database is not recommended due to performance
 267                         considerations. This is because there is a good
 268                         chance that there are a lot of empty pages in the
 269                         database if you have a concurrent application.
 270                     </p>
 271                 <p>
 272                         For a Heap database, entire regions are locked when
 273                         a lock is acquired for a database page. If there is
 274                         then contention for that region, and a new database
 275                         page needs to be added, then Berkeley DB simply
 276                         creates a whole new region.  The locked region is
 277                         then padded with empty pages in order to reach the
 278                         new region.
 279                     </p>
 280                 <p>
 281                         The result is that if the last used page in a
 282                         region is 10, and a new region is created at page
 283                         100, then there are empty pages from 11 to 99. If
 284                         you are iterating with a cursor, then all those
 285                         empty pages must be examined by the cursor before
 286                         it can reach the data at page 100.
 287                     </p>
 288               </li>
 289             </ul>
 290           </div>
 291         </div>
 292         <div class="sect3" lang="en" xml:lang="en">
 293           <div class="titlepage">
 294             <div>
 295               <div>
 296                 <h4 class="title"><a id="idm2036040"></a>Which Access Method Should You Use?</h4>
 297               </div>
 298             </div>
 299           </div>
 300           <p>
 301                 Ultimately, you can only determine which access method is
 302                 superior for your application through performance testing
 303                 using both access methods. To be effective, this
 304                 performance testing must use a production-equivalent
 305                 workload.
 306             </p>
 307           <p>
 308                 That said, there are a few times when you absolutely must
 309                 use Btree:
 310             </p>
 311           <div class="itemizedlist">
 312             <ul type="disc">
 313               <li>
 314                 <p>
 315                         If you want to use bulk put and get operations.
 316                     </p>
 317               </li>
 318               <li>
 319                 <p>
 320                         If having your database clustered on sort order is
 321                         important to you.
 322                     </p>
 323               </li>
 324               <li>
 325                 <p>
 326                         If you want to be able to create records using
 327                         cursors.
 328                     </p>
 329               </li>
 330               <li>
 331                 <p>
 332                         If you have multiple threads/processes
 333                         simultaneously creating new records, and you want
 334                         to be able to efficiently iterate over those
 335                         records using a cursor.
 336                     </p>
 337               </li>
 338             </ul>
 339           </div>
 340           <p>
 341                 But beyond those limitations, there are some application
 342                 characteristics that should cause you to suspect that Heap
 343                 will work better for your application than Btree. They are:
 344             </p>
 345           <div class="itemizedlist">
 346             <ul type="disc">
 347               <li>
 348                 <p>
 349                         Your application will run in an environment with
 350                         constrained resources and you want to set a hard
 351                         limit on the size of the database file.
 352                     </p>
 353               </li>
 354               <li>
 355                 <p>
 356                         You want to limit the disk space growth of your
 357                         database file, and your application performs a
 358                         roughly equivalent number of record creations and
 359                         deletions.
 360                     </p>
 361               </li>
 362               <li>
 363                 <p>
 364                         Inserts into a Btree require sorting the new record
 365                         onto its proper page. This operation can require
 366                         multiple page reads. A Heap database can simply
 367                         reuse whatever empty page space it can find in the
 368                         cache. Insert-intensive applications will typically
 369                         find that Heap is much more efficient than Btree,
 370                         especially as the size of the database increases.
 371                     </p>
 372               </li>
 373             </ul>
 374           </div>
 375         </div>
 376       </div>
 377       <div class="sect2" lang="en" xml:lang="en">
 378         <div class="titlepage">
 379           <div>
 380             <div>
 381               <h3 class="title"><a id="idm2622384"></a>Hash or Btree?</h3>
 382             </div>
 383           </div>
 384         </div>
 385         <p>
 386             The Hash and Btree access methods should be used when logical
 387             record numbers are not the primary key used for data access.
 388             (If logical record numbers are a secondary key used for data
 389             access, the Btree access method is a possible choice, as it
 390             supports simultaneous access by a key and a record number.)
 391         </p>
 392         <p>
 393             Keys in Btrees are stored in sorted order and the relationship
 394             between them is defined by that sort order.  For this reason,
 395             the Btree access method should be used when there is any
 396             locality of reference among keys.  Locality of reference means
 397             that accessing one particular key in the Btree implies that the
 398             application is more likely to access keys near to the key being
 399             accessed, where "near" is defined by the sort order.  For
 400             example, if keys are timestamps, and it is likely that a
 401             request for an 8AM timestamp will be followed by a request for
 402             a 9AM timestamp, the Btree access method is generally the right
 403             choice.  Or, for example, if the keys are names, and the
 404             application will want to review all entries with the same last
 405             name, the Btree access method is again a good choice.
 406         </p>
 407         <p>
 408             There is little difference in performance between the Hash and
 409             Btree access methods on small data sets, where all, or most of,
 410             the data set fits into the cache.  However, when a data set is
 411             large enough that significant numbers of data pages no longer
 412             fit into the cache, then the Btree locality of reference
 413             described previously becomes important for performance reasons.
 414             For example, there is no locality of reference for the Hash
 415             access method, and so key "AAAAA" is as likely to be stored on
 416             the same database page with key "ZZZZZ" as with key "AAAAB".
 417             In the Btree access method, because items are sorted, key
 418             "AAAAA" is far more likely to be near key "AAAAB" than key
 419             "ZZZZZ".  So, if the application exhibits locality of reference
 420             in its data requests, then the Btree page read into the cache
 421             to satisfy a request for key "AAAAA" is much more likely to be
 422             useful to satisfy subsequent requests from the application than
 423             the Hash page read into the cache to satisfy the same request.
 424             This means that for applications with locality of reference,
 425             the cache is generally much more effective for the Btree access
 426             method than the Hash access method, and the Btree access method
 427             will make many fewer I/O calls.
 428         </p>
 429         <p>
 430             However, when a data set becomes even larger, the Hash access
 431             method can outperform the Btree access method.  The reason for
 432             this is that Btrees contain more metadata pages than Hash
 433             databases.  The data set can grow so large that metadata pages
 434             begin to dominate the cache for the Btree access method.  If
 435             this happens, the Btree can be forced to do an I/O for each
 436             data request because the probability that any particular data
 437             page is already in the cache becomes quite small.  Because the
 438             Hash access method has fewer metadata pages, its cache stays
 439             "hotter" longer in the presence of large data sets.  In
 440             addition, once the data set is so large that both the Btree and
 441             Hash access methods are almost certainly doing an I/O for each
 442             random data request, the fact that Hash does not have to walk
 443             several internal pages as part of a key search becomes a
 444             performance advantage for the Hash access method as well.
 445         </p>
 446         <p>
 447             Application data access patterns strongly affect all of these
 448             behaviors, for example, accessing the data by walking a cursor
 449             through the database will greatly mitigate the large data set
 450             behavior describe above because each I/O into the cache will
 451             satisfy a fairly large number of subsequent data
 452             requests.
 453         </p>
 454         <p>
 455             In the absence of information on application data and data
 456             access patterns, for small data sets either the Btree or Hash
 457             access methods will suffice.  For data sets larger than the
 458             cache, we normally recommend using the Btree access method.  If
 459             you have truly large data, then the Hash access method may be a
 460             better choice.  The <a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility is a useful tool for monitoring
 461             how well your cache is performing.
 462         </p>
 463       </div>
 464       <div class="sect2" lang="en" xml:lang="en">
 465         <div class="titlepage">
 466           <div>
 467             <div>
 468               <h3 class="title"><a id="idm1789184"></a>Queue or Recno?</h3>
 469             </div>
 470           </div>
 471         </div>
 472         <p>
 473             The Queue or Recno access methods should be used when logical
 474             record numbers are the primary key used for data access.  The
 475             advantage of the Queue access method is that it performs record
 476             level locking and for this reason supports significantly higher
 477             levels of concurrency than the Recno access method.  The
 478             advantage of the Recno access method is that it supports a
 479             number of additional features beyond those supported by the
 480             Queue access method, such as variable-length records and
 481             support for backing flat-text files.
 482         </p>
 483         <p>
 484             Logical record numbers can be mutable or fixed: mutable, where
 485             logical record numbers can change as records are deleted or
 486             inserted, and fixed, where record numbers never change
 487             regardless of the database operation.  It is possible to store
 488             and retrieve records based on logical record numbers in the
 489             Btree access method.  However, those record numbers are always
 490             mutable, and as records are deleted or inserted, the logical
 491             record number for other records in the database will change.
 492             The Queue access method always runs in fixed mode, and logical
 493             record numbers never change regardless of the database
 494             operation. The Recno access method can be configured to run in
 495             either mutable or fixed mode.
 496         </p>
 497         <p>
 498             In addition, the Recno access method provides support for
 499             databases whose permanent storage is a flat text file and the
 500             database is used as a fast, temporary storage area while the
 501             data is being read or modified.
 502         </p>
 503       </div>
 504     </div>
 505     <div class="navfooter">
 506       <hr />
 507       <table width="100%" summary="Navigation footer">
 508         <tr>
 509           <td width="40%" align="left"><a accesskey="p" href="am_conf.html">Prev</a> </td>
 510           <td width="20%" align="center">
 511             <a accesskey="u" href="am_conf.html">Up</a>
 512           </td>
 513           <td width="40%" align="right"> <a accesskey="n" href="am_conf_logrec.html">Next</a></td>
 514         </tr>
 515         <tr>
 516           <td width="40%" align="left" valign="top">Chapter 2.
 517                 Access Method Configuration
 518          </td>
 519           <td width="20%" align="center">
 520             <a accesskey="h" href="index.html">Home</a>
 521           </td>
 522           <td width="40%" align="right" valign="top"> Logical record numbers</td>
 523         </tr>
 524       </table>
 525     </div>
 526   </body>
 527 </html>