doc/persistency-concept.txt

Persistent Storage System
~~~~~~~~~~~~~~~~~~~~~~~~~


-------
Purpose
-------

The Storage System is responsible for persistently storing and
loading Patches and Atoms for the package manager. Other object types
might be added in the future.

----------
Operations
----------

The required operations are:

- read all objects of a given type
- read an object identified by a given attribute
- change the status of an object
- save a new object
- delete an object (only for cleanup after an update, so it does not
  need to be fast)
- a query interface similar to what rpm offers
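
The operations listed above could be captured in a single interface.
The following is a minimal sketch in Python, not the ZYPP API:
StoredObject, PersistentStorage, InMemoryStorage and all method names
are assumptions made for illustration, and the dictionary-backed store
only stands in for a real backend.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoredObject:
    """Hypothetical stand-in for a Patch or an Atom."""
    kind: str                 # object type, e.g. "patch" or "atom"
    name: str                 # assumed identifying attribute
    status: str = "new"       # the only attribute that changes after saving
    attributes: Dict[str, str] = field(default_factory=dict)

    def get(self, attribute: str) -> Optional[str]:
        if attribute in ("name", "status"):
            return getattr(self, attribute)
        return self.attributes.get(attribute)


class PersistentStorage(ABC):
    """One method per required operation; all names are illustrative."""

    @abstractmethod
    def read_all(self, kind: str) -> List[StoredObject]: ...

    @abstractmethod
    def read(self, kind: str, attribute: str, value: str) -> Optional[StoredObject]: ...

    @abstractmethod
    def change_status(self, kind: str, name: str, status: str) -> None: ...

    @abstractmethod
    def save(self, obj: StoredObject) -> None: ...

    @abstractmethod
    def delete(self, kind: str, name: str) -> None: ...

    @abstractmethod
    def query(self, kind: str, **conditions: str) -> List[StoredObject]: ...


class InMemoryStorage(PersistentStorage):
    """Dictionary-backed implementation, only to show the semantics."""

    def __init__(self) -> None:
        self._objects: Dict[tuple, StoredObject] = {}

    def read_all(self, kind):
        return [o for o in self._objects.values() if o.kind == kind]

    def read(self, kind, attribute, value):
        return next((o for o in self.read_all(kind) if o.get(attribute) == value), None)

    def change_status(self, kind, name, status):
        self._objects[(kind, name)].status = status

    def save(self, obj):
        self._objects[(obj.kind, obj.name)] = obj

    def delete(self, kind, name):
        del self._objects[(kind, name)]

    def query(self, kind, **conditions):
        # All conditions must match (logical AND).
        return [o for o in self.read_all(kind)
                if all(o.get(a) == v for a, v in conditions.items())]
```

Note that there is a dedicated change_status() but no general
"modify" method, which mirrors the short list of operations.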


-----------
Constraints
-----------

In order of relevance:

- It will be part of the ZYPP library, and as such needs to be
  reentrant and thread safe.

- There is only one attribute in an object that might need to be
  modified, the status. Thus, no universal "modify object" operation is
  needed.

- The most time-critical operation is "read all objects", since it
  delays the startup of the YaST package manager, which already takes
  an uncomfortably long time.

- It should try to cut down on memory usage, since it will be in use at
  install time, at least when updating.

- We'll probably get something on the order of 100 to 1000 of these
  objects.

- It should be possible to use different backends for the actual
  low-level storage, e.g. Berkeley DB, PostgreSQL, MySQL, or a flat file.


------------
Architecture
------------

Here's a layer diagram of the main architecture. The numbers in
parentheses indicate APIs::

 +--------------------+--------------+
 | caller, e.g. package manager      |
 +--------(1)---------+              +
 | query interface    |              |
 +-------------------(2)-------------+
 | Persistent Storage Core           |
 +--------(3)---------+------(4)-----+
 | Backend plugin     | Type plugin  |
 +--------------------+--------------+
 | Backend            | Parser       |
 | (e.g. Berkeley DB) |              |
 +--------------------+              +
 | Filesystem         |              |
 +--------------------+--------------+


APIs:

1) Query API

2) Core API

3) Backend Plugin API

4) Type Plugin API


These components are described below. The APIs themselves will be
specified in a later, more detailed follow-up.


Query Interface
===============

This implements the rpm-like query operation on top of the simpler,
more general search operations of the Core API.
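
As an illustration of building a richer query out of simple searches,
here is a small Python sketch. It assumes a Core that offers only an
exact-match find() and a list of all names; CoreStub, find(),
all_names() and query() are hypothetical names, and the glob matching
on names is merely one plausible reading of "rpm-like".

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Set


class CoreStub:
    """Hypothetical stand-in for the Core API: only simple searches."""

    def __init__(self, records: Dict[str, Dict[str, str]]) -> None:
        self._records = records  # object name -> attributes

    def all_names(self) -> Set[str]:
        return set(self._records)

    def find(self, attribute: str, value: str) -> Set[str]:
        """Names of all objects whose attribute equals value exactly."""
        return {name for name, attrs in self._records.items()
                if attrs.get(attribute) == value}


def query(core: CoreStub, name_glob: Optional[str] = None,
          **conditions: str) -> List[str]:
    """Combine simple searches: every condition must hold (AND)."""
    matches = core.all_names()
    for attribute, value in conditions.items():
        matches &= core.find(attribute, value)
    if name_glob is not None:
        matches = {name for name in matches if fnmatchcase(name, name_glob)}
    return sorted(matches)
```

A call such as query(core, "fix-*", status="installed") then needs
nothing from the Core beyond the two simple operations.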


Persistent Storage Core
=======================

The Core is the main point of contact for the layers above. It breaks
each request down into operations for the simpler modules in the
layer below.
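
The delegation described here might look roughly like the following
Python sketch. It is an assumption about how the pieces fit together,
not the actual design: Core, DictBackend, NameTypePlugin and their
methods are invented for illustration, and the stored objects are
plain dictionaries with a single "name" field.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple
from xml.sax.saxutils import quoteattr

Keys = Dict[str, str]


class DictBackend:
    """Stand-in backend plugin: stores (XML string, keys) records."""

    def __init__(self) -> None:
        self._records: List[Tuple[str, Keys]] = []

    def insert(self, xml: str, keys: Keys) -> int:
        self._records.append((xml, keys))
        return len(self._records) - 1  # the handle is the record index

    def find(self, attribute: str, value: str) -> Optional[int]:
        for handle, (_xml, keys) in enumerate(self._records):
            if keys.get(attribute) == value:
                return handle
        return None

    def read(self, handle: int) -> str:
        return self._records[handle][0]


class NameTypePlugin:
    """Stand-in type plugin for objects of the form {"name": ...}."""

    def to_xml(self, obj: dict) -> str:
        return "<object name=%s/>" % quoteattr(obj["name"])

    def from_xml(self, xml: str) -> dict:
        return {"name": ET.fromstring(xml).get("name")}

    def extract_keys(self, xml: str) -> Keys:
        return {"name": ET.fromstring(xml).get("name")}


class Core:
    """Splits each request into type-plugin and backend-plugin calls."""

    def __init__(self, backend: DictBackend, types: NameTypePlugin) -> None:
        self._backend = backend
        self._types = types

    def save(self, obj: dict) -> int:
        xml = self._types.to_xml(obj)
        return self._backend.insert(xml, self._types.extract_keys(xml))

    def load(self, attribute: str, value: str) -> Optional[dict]:
        handle = self._backend.find(attribute, value)
        if handle is None:
            return None
        return self._types.from_xml(self._backend.read(handle))
```

In this arrangement the Core never looks inside the XML and the
backend never sees an object, which is the separation the layer
diagram suggests.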


Backend Plugin
==============

The backend plugin handles data objects in a form that is suitable
for the backend. A data record is represented by:

- an XML string
- a set of keys, each of which is a pair (attribute name, value)

The following operations are available:

- create the storage database
- find a record by a given attribute and return a handle to it
- insert a new record and return a handle to it

These operations affect the record that is referenced by a handle:

- read the record
- update the status
- remove (delete) the record
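
The record layout and the handle-based operations above can be
sketched as follows. This is an in-memory illustration in Python under
stated assumptions: Record, MemoryBackend and the method names are
hypothetical, handles are plain integers, and the status is kept next
to the record rather than in a separate file.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Record:
    xml: str               # the object's data as an XML string
    keys: Dict[str, str]   # (attribute name, value) pairs used for lookup
    status: str = "new"    # assumed default status


class MemoryBackend:
    """Illustrative backend plugin keeping all records in a dictionary."""

    def __init__(self) -> None:
        self.create()

    def create(self) -> None:
        """Create (or reset) the storage database."""
        self._records: Dict[int, Record] = {}
        self._next_handle = 0

    def insert(self, xml: str, keys: Dict[str, str]) -> int:
        handle = self._next_handle
        self._next_handle += 1          # handles are never reused
        self._records[handle] = Record(xml, dict(keys))
        return handle

    def find(self, attribute: str, value: str) -> Optional[int]:
        for handle, record in self._records.items():
            if record.keys.get(attribute) == value:
                return handle
        return None

    # The remaining operations act on the record behind a handle.

    def read(self, handle: int) -> Record:
        return self._records[handle]

    def update_status(self, handle: int, status: str) -> None:
        self._records[handle].status = status

    def remove(self, handle: int) -> None:
        del self._records[handle]
```

A real backend would perform the same steps against its own storage,
for example a database cursor or a file offset in place of the integer
handle.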


Type Plugins
============

These are responsible for dealing with the specifics of the type of
objects that are stored. This includes:

- knowing the path names of the database
- knowing the attributes and keys
- converting the internal representation of a data object to XML and back
- extracting the key values from an XML string
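
A type plugin with these four responsibilities could be sketched like
this. The Patch fields, the XML layout, the file name "patches.db"
and the choice of key attributes are all assumptions made for the
example; the real Patch type and its schema are not described here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict


@dataclass
class Patch:
    """Hypothetical internal representation of a patch."""
    name: str
    category: str
    summary: str


class PatchTypePlugin:
    database_path = "patches.db"           # hypothetical file name
    key_attributes = ("name", "category")  # assumed key fields

    def to_xml(self, patch: Patch) -> str:
        element = ET.Element("patch", name=patch.name, category=patch.category)
        ET.SubElement(element, "summary").text = patch.summary
        return ET.tostring(element, encoding="unicode")

    def from_xml(self, xml: str) -> Patch:
        element = ET.fromstring(xml)
        return Patch(
            name=element.get("name"),
            category=element.get("category"),
            summary=element.findtext("summary", default=""),
        )

    def extract_keys(self, xml: str) -> Dict[str, str]:
        """Pull the key values out of an XML string for the backend."""
        element = ET.fromstring(xml)
        return {attr: element.get(attr) for attr in self.key_attributes}
```

Because extract_keys() works on the XML string alone, the backend can
be handed its keys without ever building a Patch object.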


Parser
======

The parser creates simple struct-like objects from the XML string.
It is derived from the already implemented XMLNodeIterator class.
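
To show what "simple struct-like objects" could mean in practice, the
sketch below iterates over the elements of an XML string and yields
one small data object per element. It uses Python's ElementTree and
does not reflect the interface of XMLNodeIterator; ParsedNode and
iter_nodes() are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class ParsedNode:
    """Plain data holder: no behaviour, just the parsed fields."""
    tag: str
    attributes: Dict[str, str] = field(default_factory=dict)
    text: str = ""


def iter_nodes(xml: str, tag: str) -> Iterator[ParsedNode]:
    """Yield one ParsedNode per <tag> element, parsing incrementally."""
    source = io.BytesIO(xml.encode("utf-8"))
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == tag:
            yield ParsedNode(
                tag=element.tag,
                attributes=dict(element.attrib),
                text=(element.text or "").strip(),
            )
            element.clear()  # drop the element's content once it is copied
```

Yielding nodes one at a time keeps only a single element's data alive
in the caller, which fits the memory constraint for install time.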


--------
Backends
--------

This is an overview of backends that look promising for handling the
low-level storage.

Plain Files
===========

Basic idea: Store the data for each object as an XML element in a flat
file, store the status separately, and add indices for fast access.

For each type there is

- a master file, which contains:

  - the pathnames of the other files
  - storage options

- one or more data files, which contain all the data as XML, except
  the status

- a binary status file, in which each byte represents the status of
  one object. The association between status byte and object is
  contained in an index file.

- one or more index files (one for each data field that needs to be a
  key). The index could either be an ordered flat file to mmap, or
  something like a tree or hash table. (I'm still looking for a suitable
  library.)

Advantage: More direct control over everything, can be tailored to our
needs, probably the most efficient way to do it.
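
The status file is the part of this layout that changes after an
object has been saved, so here is a minimal sketch of it in Python.
Several things are assumptions: the class name StatusFile, the numeric
status codes, and the fact that the index (object name to byte offset)
is held in memory instead of being written to an index file.

```python
import os
import tempfile
from typing import Dict, Iterable

STATUS_NEW = 0        # hypothetical status codes
STATUS_INSTALLED = 1


class StatusFile:
    """One status byte per object; the index maps name -> byte offset."""

    def __init__(self, path: str, names: Iterable[str]) -> None:
        self._path = path
        self._index: Dict[str, int] = {n: slot for slot, n in enumerate(names)}
        with open(path, "wb") as f:
            f.write(bytes(len(self._index)))  # all statuses start at 0

    def get(self, name: str) -> int:
        with open(self._path, "rb") as f:
            f.seek(self._index[name])
            return f.read(1)[0]

    def set(self, name: str, status: int) -> None:
        # Overwrite a single byte in place; the XML data is not touched.
        with open(self._path, "r+b") as f:
            f.seek(self._index[name])
            f.write(bytes([status]))


with tempfile.TemporaryDirectory() as directory:
    status = StatusFile(os.path.join(directory, "patch.status"),
                        ["fix-1", "fix-2", "fix-3"])
    status.set("fix-2", STATUS_INSTALLED)
    print(status.get("fix-1"), status.get("fix-2"))
```

Changing a status rewrites exactly one byte, which is why keeping the
status out of the XML data files makes the "change status" operation
cheap.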


Berkeley DB
===========

Berkeley DB is a small non-relational embedded database library that
provides compact transparent data storage with a wealth of related
services. Data is managed as (key, value) tuples, and indexing is
performed for the keys.

Basic idea: Store the data for each object as an XML string in a
Berkeley DB. Fast access and everything else is managed by the database.
Berkeley DB is already used within rpm, so it's no additional
dependency. The library is 1-2 MB (depending on whether you use the C++
binding or only C).

Another option: Instead of saving the XML string, serialize the
ParserData structure and save it directly. This is faster, but harder
to inspect when debugging.
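
The (key, value) idea can be tried out with any key/value store. The
sketch below uses Python's standard dbm.dumb module purely as a
stand-in, since it offers a comparable mapping from keys to values; it
is not Berkeley DB and says nothing about Berkeley DB's API or
performance. KeyValueStore and its methods are hypothetical names, and
the object name is assumed to be the key.

```python
import dbm.dumb
import os
import tempfile
from typing import List, Optional


class KeyValueStore:
    """Stores each object's XML string under its name."""

    def __init__(self, path: str) -> None:
        self._db = dbm.dumb.open(path, "c")  # "c": create if missing

    def save(self, name: str, xml: str) -> None:
        self._db[name.encode("utf-8")] = xml.encode("utf-8")

    def load(self, name: str) -> Optional[str]:
        key = name.encode("utf-8")
        return self._db[key].decode("utf-8") if key in self._db else None

    def delete(self, name: str) -> None:
        del self._db[name.encode("utf-8")]

    def names(self) -> List[str]:
        return sorted(key.decode("utf-8") for key in self._db.keys())

    def close(self) -> None:
        self._db.close()


with tempfile.TemporaryDirectory() as directory:
    store = KeyValueStore(os.path.join(directory, "patches"))
    store.save("fix-1", "<patch name='fix-1'/>")
    print(store.load("fix-1"))
    store.close()
```

Storing the XML text keeps the values readable with ordinary tools;
storing a serialized structure instead would trade that away for
speed, as noted above.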

Advantage: Everything's there.

Disadvantage: Berkeley DB has had some issues in the past, and we
would depend on it.

Side note: xmldb cannot be used since it needs a lot of special
libraries with specific versions. For this reason, it hasn't made it
into SUSE Linux 10.0.


SQLite
======

SQLite is an embedded SQL engine that contains both server and client
and has no multi-user access support. It is very small (200-300 KByte).

Basic idea: Store the data as an XML string in a record. For each key
data field, add another attribute (column).
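
A record layout along these lines can be sketched with Python's
built-in sqlite3 module. The table name, the choice of name, category
and status as key fields, and the helper functions are assumptions
for illustration only.

```python
import sqlite3
from typing import List

KEY_COLUMNS = ("name", "category", "status")  # hypothetical key data fields


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    connection = sqlite3.connect(path)
    connection.execute(
        """CREATE TABLE IF NOT EXISTS patches (
               name     TEXT PRIMARY KEY,
               category TEXT NOT NULL,
               status   TEXT NOT NULL DEFAULT 'new',
               xml      TEXT NOT NULL
           )"""
    )
    connection.execute(
        "CREATE INDEX IF NOT EXISTS patches_by_category ON patches (category)"
    )
    return connection


def save(connection: sqlite3.Connection, name: str, category: str, xml: str) -> None:
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "INSERT INTO patches (name, category, xml) VALUES (?, ?, ?)",
            (name, category, xml),
        )


def set_status(connection: sqlite3.Connection, name: str, status: str) -> None:
    with connection:
        connection.execute(
            "UPDATE patches SET status = ? WHERE name = ?", (status, name)
        )


def find_xml(connection: sqlite3.Connection, column: str, value: str) -> List[str]:
    # Column names cannot be bound as parameters, so check them explicitly.
    if column not in KEY_COLUMNS:
        raise ValueError("not a key column: %r" % column)
    rows = connection.execute(
        "SELECT xml FROM patches WHERE %s = ? ORDER BY name" % column, (value,)
    )
    return [row[0] for row in rows]
```

Only the key fields are visible to SQL; everything else stays inside
the XML string, so adding a new key later means adding a column and
filling it from the stored XML.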

Disadvantage: We don't really need the relational aspects of SQLite,
and we don't have many people on the team with knowledge of this
topic. Not much experience with it. Additional library in the install
system. Probably the worst performance (but SQLite claims to be fast
for SQL).

Advantage: Maximum flexibility, especially in queries. Easy changes in
the structure.
 233 Advantage: Maximum flexibility, especially in queries. Easy changes in
 234 the structure.