doc/persistency-concept.txt

Persistent Storage System
~~~~~~~~~~~~~~~~~~~~~~~~~


-------
Purpose
-------

The Storage System is responsible for persistently storing and
loading Patches, Atoms and similar objects for the package manager.
Other object kinds might be added in the future.

----------
Operations
----------

Required operations are:

- read all objects of a given kind
- read the objects identified by a list of
  tuples (attribute name, match criterion, value)
- change the status of an object
- save a new object
- delete an object (only needed for cleanup after an update; does not
  need to be fast)
- a query interface similar to what rpm offers
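
A minimal C++17 sketch of the tuple-based read operation follows. It assumes that an object can be viewed as a map from attribute names to values, and that all tuples of a query must match (logical AND). The names `Attributes`, `Criterion`, `MatchTuple` and `readMatching`, as well as the three match criteria, are hypothetical. This document does not define them.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical view of a stored object: attribute name -> value.
using Attributes = std::map<std::string, std::string>;

// Assumed set of match criteria; the document does not fix them yet.
enum class Criterion { Equal, Prefix, Substring };

// One (attribute name, match criterion, value) tuple.
struct MatchTuple {
    std::string attribute;
    Criterion criterion;
    std::string value;
};

// True if the object satisfies the tuple; a missing attribute never matches.
bool matches(const Attributes& object, const MatchTuple& t) {
    const auto it = object.find(t.attribute);
    if (it == object.end()) return false;
    const std::string& actual = it->second;
    switch (t.criterion) {
        case Criterion::Equal:
            return actual == t.value;
        case Criterion::Prefix:
            return actual.compare(0, t.value.size(), t.value) == 0;
        case Criterion::Substring:
            return actual.find(t.value) != std::string::npos;
    }
    return false;
}

// Returns the objects that satisfy all tuples of the query (logical AND).
std::vector<Attributes> readMatching(const std::vector<Attributes>& all,
                                     const std::vector<MatchTuple>& query) {
    std::vector<Attributes> result;
    for (const auto& object : all) {
        if (std::all_of(query.begin(), query.end(),
                        [&](const MatchTuple& t) { return matches(object, t); }))
            result.push_back(object);
    }
    return result;
}
```

For example, the query `{{"name", Criterion::Prefix, "glib"}}` would select both `glibc` and `glib2`. An empty query selects every object, which corresponds to "read all objects of a given kind".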


-----------
Constraints
-----------

In order of relevance:

- It will be part of the ZYPP library, and as such needs to be
  reentrant and thread safe.

- The status is the only attribute of an object that might need to be
  modified. Thus, no universal "modify object" operation is needed.

- The most time-critical operation is "read all objects", since it
  delays the startup of the YaST package manager, which already takes
  uncomfortably long.

- It should keep memory usage low, since it will be in use at install
  time, at least when updating.

- We'll probably have on the order of 1000-5000 of these objects.

- It must be possible to use different backends for the actual
  low-level storage, e.g. Berkeley DB, PostgreSQL, MySQL or flat files.
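
The following C++ sketch shows one way to meet the first two constraints together. It assumes that a single mutex per store is acceptable. All public operations take the same lock, and existing objects are never changed except through a dedicated status setter. `StatusOnlyStore` and the `Status` values are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical status values; the real set belongs to the package manager.
enum class Status : std::uint8_t { Uninstalled = 0, Installed = 1, ToInstall = 2 };

// All public operations take the same lock, so the store can be shared
// between threads. Saved data is never changed afterwards; setStatus() is
// the only mutating operation on existing objects.
class StatusOnlyStore {
public:
    // Saves a new object; the returned index serves as its handle.
    std::size_t save(std::string xml) {
        std::lock_guard<std::mutex> lock(mutex_);
        xml_.push_back(std::move(xml));
        status_.push_back(Status::Uninstalled);
        return xml_.size() - 1;
    }

    void setStatus(std::size_t handle, Status s) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(handle < status_.size());
        status_[handle] = s;
    }

    Status status(std::size_t handle) const {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(handle < status_.size());
        return status_[handle];
    }

    // Returns a copy, so no reference into shared state escapes the lock.
    std::string xml(std::size_t handle) const {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(handle < xml_.size());
        return xml_[handle];
    }

private:
    mutable std::mutex mutex_;      // guards both vectors below
    std::vector<std::string> xml_;  // object data, never modified after save()
    std::vector<Status> status_;    // one byte per object
};
```

Storing the status as one byte per object also helps with the memory constraint. For the expected 1000-5000 objects, the status array stays at or below 5 KB. A real implementation might prefer a reader/writer lock, so that concurrent reads are not serialized.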


------------
Architecture
------------

The following layer diagram shows the main architecture. The numbers in
parentheses mark the APIs between the layers::

  +--------------------+--------------+
  | caller, e.g. package manager      |
  +--------(1)---------+              +
  | query interface    |              |
  +-------------------(2)-------------+
  | Persistent Storage Core           |
  +--------(3)---------+------(4)-----+
  | Backend plugin     | Kind plugin  |
  +--------------------+--------------+
  | Backend            | Parser       |
  | (e.g. Berkeley DB) |              |
  +--------------------+              +
  | Filesystem         |              |
  +--------------------+--------------+


APIs:

1) Query API

2) Core API

3) Backend Plugin API

4) Kind Plugin API


These components are described below; the APIs will be specified in
more detail in a later follow-up.


Query Interface
===============

This layer implements the rpm-like query operations on top of the more
general and simpler search operations of the Core API.

FIXME: Insert list of operations.


Persistent Storage Core
=======================

The Core is the main point of contact for the layers above. It breaks
down the complete functionality into the simpler operations provided
by the modules in the layer below.

FIXME: Insert list of operations.


Backend Plugin
==============

The backend plugin handles data objects in a form that is suitable
for the backend, i.e. a data record is represented by:

- an XML string
- a set of keys, which are pairs (attribute name, value)

The following operations are available:

- create the storage database
- find records by attributes and return a list of handles
- insert a new record and return a handle to it

These operations affect the record that is referenced by a handle:

- read the record
- update its status
- remove (delete) it
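
The backend operations listed above could map onto an abstract C++17 interface like the following. This is only a sketch. `BackendPlugin`, `MemoryBackend`, `Record` and `Handle` are hypothetical names, and keeping the status inside the record is an assumption. The in-memory implementation only illustrates the contract: its `find()` is a linear scan, not an index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Handle = std::size_t;
using Keys = std::vector<std::pair<std::string, std::string>>;  // (attribute name, value)

// A data record as the backend sees it (status stored here by assumption).
struct Record {
    std::string xml;
    Keys keys;
    std::uint8_t status = 0;
};

// Hypothetical abstract Backend Plugin API mirroring the listed operations.
class BackendPlugin {
public:
    virtual ~BackendPlugin() = default;
    virtual void create() = 0;
    virtual std::vector<Handle> find(const std::string& attribute,
                                     const std::string& value) const = 0;
    virtual Handle insert(Record record) = 0;
    virtual std::optional<Record> read(Handle h) const = 0;
    virtual bool updateStatus(Handle h, std::uint8_t status) = 0;
    virtual bool remove(Handle h) = 0;
};

// Trivial in-memory backend, e.g. for unit tests of the layers above.
class MemoryBackend : public BackendPlugin {
public:
    // Creates (here: resets) the storage database.
    void create() override { records_.clear(); next_ = 0; }

    // Linear scan over all records; a real backend would use its indexes.
    std::vector<Handle> find(const std::string& attribute,
                             const std::string& value) const override {
        std::vector<Handle> result;
        for (const auto& [handle, record] : records_) {
            for (const auto& [name, v] : record.keys) {
                if (name == attribute && v == value) {
                    result.push_back(handle);
                    break;  // report each record at most once
                }
            }
        }
        return result;
    }

    Handle insert(Record record) override {
        records_.emplace(next_, std::move(record));
        return next_++;
    }

    std::optional<Record> read(Handle h) const override {
        const auto it = records_.find(h);
        if (it == records_.end()) return std::nullopt;
        return it->second;
    }

    bool updateStatus(Handle h, std::uint8_t status) override {
        const auto it = records_.find(h);
        if (it == records_.end()) return false;
        it->second.status = status;
        return true;
    }

    bool remove(Handle h) override { return records_.erase(h) > 0; }

private:
    std::map<Handle, Record> records_;
    Handle next_ = 0;
};
```

A plugin for Berkeley DB or plain files would implement the same interface, so the Core could switch backends without changes.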


Kind plugins
============

These are responsible for dealing with the specifics of the kind of
objects that are stored. This includes:

- knowing about the path names of the database
- knowing about the attributes and keys
- converting the internal representation of a data object to XML and back
- extracting the key values from an XML string
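
The kind-specific conversions can be illustrated with a sketch of a plugin for a hypothetical `Patch` kind. The field names, the XML layout and the choice of key attributes are all assumptions. The hand-written `elementText()` lookup handles only flat elements without attributes, nesting or CDATA. A real plugin would use the parser described below.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical internal representation of one object kind.
struct Patch {
    std::string name;
    std::string version;
    std::string summary;
};

// (attribute name, value) pairs handed to the backend for indexing.
using Keys = std::vector<std::pair<std::string, std::string>>;

// Escapes the five XML special characters.
std::string xmlEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default: out += c;
        }
    }
    return out;
}

// Reverses xmlEscape(); other entities are copied through unchanged.
std::string xmlUnescape(const std::string& s) {
    static const std::pair<std::string, char> entities[] = {
        {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'}, {"&quot;", '"'}, {"&apos;", '\''}};
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        bool replaced = false;
        if (s[i] == '&') {
            for (const auto& [entity, ch] : entities) {
                if (s.compare(i, entity.size(), entity) == 0) {
                    out += ch;
                    i += entity.size();
                    replaced = true;
                    break;
                }
            }
        }
        if (!replaced) out += s[i++];
    }
    return out;
}

// Returns the unescaped text of the first <tag>...</tag>, or "" if absent.
// Naive on purpose: flat elements only, no attributes, nesting or CDATA.
std::string elementText(const std::string& xml, const std::string& tag) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::size_t begin = xml.find(open);
    if (begin == std::string::npos) return "";
    begin += open.size();
    const std::size_t end = xml.find(close, begin);
    if (end == std::string::npos) return "";
    return xmlUnescape(xml.substr(begin, end - begin));
}

// Hypothetical kind plugin for patches.
class PatchKindPlugin {
public:
    std::string toXml(const Patch& p) const {
        return "<patch><name>" + xmlEscape(p.name) + "</name><version>" +
               xmlEscape(p.version) + "</version><summary>" + xmlEscape(p.summary) +
               "</summary></patch>";
    }

    Patch fromXml(const std::string& xml) const {
        return {elementText(xml, "name"), elementText(xml, "version"),
                elementText(xml, "summary")};
    }

    // Which fields are keys is an assumption; the summary is not indexed.
    Keys keys(const std::string& xml) const {
        return {{"name", elementText(xml, "name")},
                {"version", elementText(xml, "version")}};
    }
};
```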


Parser
======

The backend stores only XML strings as contents, plus indexes for fast
access. We therefore need a parser that creates simple structure-like
objects from an XML string. It is derived from the already implemented
XMLNodeIterator class.


--------
Backends
--------

This is an overview of backends that look promising for handling the
low-level storage.

Plain Files
===========

Basic idea: Store the data for each object as an XML element in a flat
file, store the status separately and add indexes for fast access.

For each kind, there are:

- a master file, which contains:

  - the path names of the other files
  - storage options

- one or more data files, which contain all the data as XML, except
  the status

- a binary status file, in which each byte represents the status of
  one object. The association between status byte and object is
  contained in an index file.

- one or more index files (for all data fields that need to be a
  key). An index could be either an ordered flat file to mmap, or
  something like a tree or hash table (I'm still looking for a suitable
  library).
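
The status file and an ordered index could be sketched in C++ as follows. The sketch keeps both in memory (`std::vector`) instead of mapping real files with `mmap`. It also assumes that an object's number is its position in the status file. `IndexEntry`, `SortedIndex` and `StatusBytes` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// One index entry: key value -> object number. The object number is also
// the position of the object's byte in the status file (assumption).
struct IndexEntry {
    std::string key;
    std::uint32_t object;
};

// In-memory stand-in for an ordered index file that would be mmap'ed.
class SortedIndex {
public:
    explicit SortedIndex(std::vector<IndexEntry> entries) : entries_(std::move(entries)) {
        std::sort(entries_.begin(), entries_.end(),
                  [](const IndexEntry& a, const IndexEntry& b) {
                      return a.key != b.key ? a.key < b.key : a.object < b.object;
                  });
    }

    // Binary search for the first entry with this key, then collect all
    // entries with the same key: O(log n + number of matches).
    std::vector<std::uint32_t> lookup(const std::string& key) const {
        auto it = std::lower_bound(
            entries_.begin(), entries_.end(), key,
            [](const IndexEntry& e, const std::string& k) { return e.key < k; });
        std::vector<std::uint32_t> result;
        for (; it != entries_.end() && it->key == key; ++it) result.push_back(it->object);
        return result;
    }

private:
    std::vector<IndexEntry> entries_;
};

// In-memory stand-in for the binary status file: byte i is the status of object i.
class StatusBytes {
public:
    explicit StatusBytes(std::size_t objects) : bytes_(objects, 0) {}
    std::uint8_t get(std::uint32_t object) const { return bytes_.at(object); }
    void set(std::uint32_t object, std::uint8_t status) { bytes_.at(object) = status; }
    std::size_t sizeInBytes() const { return bytes_.size(); }

private:
    std::vector<std::uint8_t> bytes_;
};
```

Because the index is sorted, a lookup is a binary search followed by a short scan over the matching entries. Changing an object's status rewrites a single byte and leaves the XML data files untouched.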

Advantage: More direct control over everything, can be tailored to our
needs, probably the most efficient way to do it.

Disadvantage: Needs a lot of effort to get right and to achieve good
performance.


Berkeley DB
===========

Berkeley DB is a small non-relational embedded database library that
provides compact, transparent data storage with a wealth of related
services. Data is managed as (key, value) pairs, and the keys are
indexed.

Basic idea: Store the data for each object as an XML string in a
Berkeley DB. Access is fast, and everything is managed by the
database. Berkeley DB is already used by rpm, so it adds no new
dependency. The library is 1-2 MB (depending on whether you use the
C++ binding or only the C one).

Another option: Instead of saving the XML string, serialize the
ParserData structure and save it directly. This is faster, but more
difficult to inspect for debugging.

Advantage: Everything is there for queries with a single index.

Disadvantage: Not so good with multiple indexes; supporting them needs
some effort.
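
The following C++17 sketch shows why multiple indexes need extra effort. It models a key/value store with one primary table (record id to XML string) and one secondary table per indexed attribute, using `std::map` and `std::multimap` as stand-ins for database tables. It does not use the Berkeley DB API, and `MultiIndexStore` is a hypothetical name. Every insert and delete has to update all secondary tables consistently. Berkeley DB also offers its own secondary-index support (`DB->associate` in its documentation), which may take over part of this bookkeeping.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

class MultiIndexStore {
public:
    using Keys = std::map<std::string, std::string>;  // attribute -> value

    explicit MultiIndexStore(const std::vector<std::string>& indexedAttributes) {
        for (const auto& attr : indexedAttributes) secondary_.emplace(attr, Index{});
    }

    // Stores the record and adds it to every matching secondary index.
    std::uint32_t put(const std::string& xml, const Keys& keys) {
        const std::uint32_t id = nextId_++;
        primary_.emplace(id, Record{xml, keys});
        for (const auto& [attr, value] : keys) {
            const auto idx = secondary_.find(attr);
            if (idx != secondary_.end()) idx->second.emplace(value, id);
        }
        return id;
    }

    // Removes the record and every secondary entry that points at it.
    void erase(std::uint32_t id) {
        const auto rec = primary_.find(id);
        if (rec == primary_.end()) return;
        for (const auto& [attr, value] : rec->second.keys) {
            const auto idx = secondary_.find(attr);
            if (idx == secondary_.end()) continue;
            auto range = idx->second.equal_range(value);
            for (auto it = range.first; it != range.second;) {
                if (it->second == id) it = idx->second.erase(it);
                else ++it;
            }
        }
        primary_.erase(rec);
    }

    // Looks up XML strings through a secondary index.
    std::vector<std::string> get(const std::string& attr, const std::string& value) const {
        std::vector<std::string> result;
        const auto idx = secondary_.find(attr);
        if (idx == secondary_.end()) return result;  // attribute is not indexed
        const auto range = idx->second.equal_range(value);
        for (auto it = range.first; it != range.second; ++it)
            result.push_back(primary_.at(it->second).xml);
        return result;
    }

private:
    using Index = std::multimap<std::string, std::uint32_t>;  // value -> record id
    struct Record {
        std::string xml;
        Keys keys;
    };
    std::map<std::uint32_t, Record> primary_;
    std::map<std::string, Index> secondary_;  // one index per attribute
    std::uint32_t nextId_ = 0;
};
```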

Side note: xmldb cannot be used, since it needs a lot of special
libraries in specific versions. For this reason, it did not make it
into SUSE Linux 10.0.


SQLite
======

SQLite is a very small (200-300 KB) embedded SQL engine that contains
both server and client, and has no multi-user access support.

Basic idea: Store the data as an XML string in a record. For each key
data field, add another attribute (column).
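
A possible table layout for this approach is generated by the small C++ helper below. It creates one row per object holding the XML string, plus one indexed column per key field. The `id` and `status` columns and the helper `schemaFor` are assumptions for illustration. Identifiers are not quoted, so only trusted field names should be passed in.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds the SQL schema for one object kind: an id, a status column, the
// XML string and one extra column per key field, each with its own index.
std::string schemaFor(const std::string& kind, const std::vector<std::string>& keyFields) {
    std::string sql =
        "CREATE TABLE " + kind + " (id INTEGER PRIMARY KEY, status INTEGER, xml TEXT";
    for (const auto& field : keyFields) sql += ", " + field + " TEXT";
    sql += ");\n";
    for (const auto& field : keyFields)
        sql += "CREATE INDEX " + kind + "_" + field + " ON " + kind + " (" + field + ");\n";
    return sql;
}
```

With this layout, a lookup by key would be a plain `SELECT xml FROM patch WHERE name = ?`, and a status change would touch only the `status` column.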

Disadvantage: We don't really need the relational aspects of SQLite,
and not many people in the team know this topic well. We have little
experience with it. It is an additional library in the install system.
It probably has the worst performance of these options (although
SQLite claims to be fast for SQL).

Advantage: Maximum flexibility, especially in queries. Easy changes in
the structure.