1 Persistent Storage System
2 ~~~~~~~~~~~~~~~~~~~~~~~~~

The Storage System is responsible for persistently storing and
loading Patches, Atoms and similar objects for the package manager.
Other object kinds might be added in the future.

Required operations are:

- read all objects of a given kind
- read in objects identified by a list of
  tuples (attribute name, match criteria, value)
- delete an object (only for cleanup after update, does not need to
- a query interface similar to what rpm offers
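
The required operations could be expressed as an abstract interface along the following lines. This is a minimal sketch, not the ZYPP API: the class and method names, the `StoredObject` record, the `add` helper and the set of match criteria (`==`, `!=`, `prefix`) are all hypothetical, since the criteria are not defined here. The in-memory implementation exists only to make the example runnable.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical match criteria; the real set is not defined in this document.
MATCHERS: Dict[str, Callable[[str, str], bool]] = {
    "==": lambda actual, wanted: actual == wanted,
    "!=": lambda actual, wanted: actual != wanted,
    "prefix": lambda actual, wanted: actual.startswith(wanted),
}

# One selection tuple: (attribute name, match criteria, value).
Criterion = Tuple[str, str, str]


@dataclass
class StoredObject:
    kind: str                   # e.g. "patch" or "atom"
    attributes: Dict[str, str]


class PersistentStorage(ABC):
    @abstractmethod
    def read_all(self, kind: str) -> List[StoredObject]:
        """Return every stored object of the given kind."""

    @abstractmethod
    def read_matching(self, kind: str,
                      criteria: List[Criterion]) -> List[StoredObject]:
        """Return objects of the given kind that satisfy all criteria."""

    @abstractmethod
    def delete(self, obj: StoredObject) -> None:
        """Remove an object (only needed for cleanup after an update)."""


class InMemoryStorage(PersistentStorage):
    """Toy implementation that keeps everything in a list."""

    def __init__(self) -> None:
        self._objects: List[StoredObject] = []

    def add(self, obj: StoredObject) -> None:
        self._objects.append(obj)

    def read_all(self, kind: str) -> List[StoredObject]:
        return [o for o in self._objects if o.kind == kind]

    def read_matching(self, kind: str,
                      criteria: List[Criterion]) -> List[StoredObject]:
        def matches(obj: StoredObject) -> bool:
            # An object lacking the attribute never matches.
            return all(name in obj.attributes
                       and MATCHERS[op](obj.attributes[name], value)
                       for name, op, value in criteria)
        return [o for o in self.read_all(kind) if matches(o)]

    def delete(self, obj: StoredObject) -> None:
        self._objects.remove(obj)
```

A selection such as `[("name", "prefix", "glibc"), ("arch", "==", "i586")]` then reads as "all objects whose name starts with glibc and whose arch is i586".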

In order of relevance:

- It will be part of the ZYPP library, and as such needs to be
  reentrant and thread-safe.

- There is only one attribute in an object that might need to be
  modified: the status. Thus, no universal "modify object" operation is

- The most time-critical operation is "read all objects", since it
  delays the startup of the YaST package manager, which already takes
  an uncomfortably long time.

- It should keep memory usage low, since it will be in use at
  install time, at least when updating.

- We will probably have on the order of 1000-5000 of these objects.

- It must be possible to use different backends for the actual
  low-level storage, e.g. Berkeley DB, PostgreSQL, MySQL or a flat file.
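
Two of these requirements, thread safety and the status being the only mutable attribute, suggest keeping the object data immutable and putting the status into a separate, lock-protected table. The sketch below is one possible way to do that; the integer object IDs, the integer status codes and the names `ObjectRecord` and `StatusTable` are assumptions made for the example. All state lives in instances rather than in globals, which keeps the code reentrant.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class ObjectRecord:
    """Object data that is not reassigned after it has been stored.

    frozen=True only prevents rebinding the fields; the attributes
    mapping should itself be treated as read-only by callers.
    """
    object_id: int
    kind: str
    attributes: Mapping[str, str]


class StatusTable:
    """The status is the only mutable attribute, so it gets its own store.

    Every access is guarded by a per-instance lock, so one table can be
    shared between threads.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._status: Dict[int, int] = {}

    def get(self, object_id: int, default: int = 0) -> int:
        with self._lock:
            return self._status.get(object_id, default)

    def set(self, object_id: int, status: int) -> None:
        with self._lock:
            self._status[object_id] = status
```

With this split, "modify status" is the only write path after an object has been stored, which matches the requirement that no general "modify object" operation is needed.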

Here is a layer diagram of the main architecture. The parenthesized
numbers indicate APIs::

    +--------------------+--------------+
    |    caller, e.g. package manager   |
    +--------(1)---------+              +
    +-------------------(2)-------------+
    |      Persistent Storage Core      |
    +--------(3)---------+------(4)-----+
    |   Backend plugin   | Kind plugin  |
    +--------------------+--------------+
    |  e.g. Berkeley DB  |              |
    +--------------------+              +
    +--------------------+--------------+


These components are described below; the APIs will be specified in a
later, more detailed follow-up.

This implements the rpm-like query operation by combining the more
general, simpler search operations of the Core API.

FIXME: Insert list of operations.
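
Until the operation list is filled in, the idea can be illustrated with a hypothetical example: an rpm-style "query all packages matching a glob" request is turned into a Core prefix search on the literal part of the pattern, followed by a glob filter on the much smaller result. `ToyCore`, `RpmLikeQuery` and the `name` attribute are assumptions for this sketch, and the Core is assumed to offer only `==` and `prefix` matching.

```python
import fnmatch
from typing import Dict, List, Tuple

Criterion = Tuple[str, str, str]  # (attribute name, match criteria, value)


class ToyCore:
    """Stand-in for the Core API: only exact and prefix matches."""

    def __init__(self, objects: List[Dict[str, str]]) -> None:
        self._objects = objects

    def find(self, criteria: List[Criterion]) -> List[Dict[str, str]]:
        def ok(obj: Dict[str, str], name: str, op: str, value: str) -> bool:
            actual = obj.get(name)
            if actual is None:
                return False
            if op == "==":
                return actual == value
            if op == "prefix":
                return actual.startswith(value)
            raise ValueError(f"unsupported match criteria: {op}")
        return [o for o in self._objects
                if all(ok(o, *c) for c in criteria)]


class RpmLikeQuery:
    """Builds rpm-style queries out of the simpler Core searches."""

    def __init__(self, core: ToyCore) -> None:
        self.core = core

    def query_name(self, name: str) -> List[Dict[str, str]]:
        # Like "rpm -q NAME": a single exact-match search.
        return self.core.find([("name", "==", name)])

    def query_all(self, pattern: str = "*") -> List[Dict[str, str]]:
        # Like "rpm -qa PATTERN": narrow the search with the literal
        # prefix of the glob, then apply the full glob to the candidates.
        prefix = ""
        for ch in pattern:
            if ch in "*?[":
                break
            prefix += ch
        candidates = self.core.find([("name", "prefix", prefix)])
        return [o for o in candidates
                if fnmatch.fnmatchcase(o["name"], pattern)]
```

The point of the split is that the Core only needs a handful of general search primitives, while pattern syntax and other rpm conventions stay in the query layer.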

Persistent Storage Core
=======================

The Core is the main point of contact for the layers above. It breaks
the complete functionality down into simpler operations for the modules
in the lower layer.

FIXME: Insert list of operations.
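
As a hypothetical illustration of that decomposition, writing an object could mean asking the kind plugin for the XML string and the key values and handing both to the backend plugin, while reading reverses the steps. The class and method names, the flat-dict object model and the attribute-only XML are assumptions; both plugins are deliberately minimal stubs.

```python
import itertools
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Keys = List[Tuple[str, str]]  # (attribute name, value) pairs


class DictBackend:
    """Backend plugin stub: keeps (XML contents, keys) per handle."""

    def __init__(self) -> None:
        self._records: Dict[int, Tuple[str, Keys]] = {}
        self._ids = itertools.count()

    def insert(self, contents: str, keys: Keys) -> int:
        handle = next(self._ids)
        self._records[handle] = (contents, keys)
        return handle

    def find(self, key: Tuple[str, str]) -> List[int]:
        return [h for h, (_, keys) in self._records.items() if key in keys]

    def handles(self) -> List[int]:
        return list(self._records)

    def contents(self, handle: int) -> str:
        return self._records[handle][0]


class AttributeKind:
    """Kind plugin stub: an object is a flat dict stored as XML attributes."""

    def __init__(self, tag: str, key_names: List[str]) -> None:
        self.tag, self.key_names = tag, key_names

    def to_xml(self, obj: Dict[str, str]) -> str:
        return ET.tostring(ET.Element(self.tag, obj), encoding="unicode")

    def from_xml(self, xml: str) -> Dict[str, str]:
        return dict(ET.fromstring(xml).attrib)

    def keys(self, obj: Dict[str, str]) -> Keys:
        return [(n, obj[n]) for n in self.key_names if n in obj]


class Core:
    """Breaks each high-level operation into kind and backend steps."""

    def __init__(self, backend: DictBackend, kind: AttributeKind) -> None:
        self.backend, self.kind = backend, kind

    def write(self, obj: Dict[str, str]) -> int:
        # Kind plugin: object -> XML and key values; backend: store both.
        return self.backend.insert(self.kind.to_xml(obj), self.kind.keys(obj))

    def read_all(self) -> List[Dict[str, str]]:
        # Backend: fetch every record; kind plugin: XML -> object.
        return [self.kind.from_xml(self.backend.contents(h))
                for h in self.backend.handles()]

    def read_by_key(self, name: str, value: str) -> List[Dict[str, str]]:
        # Backend: key lookup; kind plugin: XML -> object.
        return [self.kind.from_xml(self.backend.contents(h))
                for h in self.backend.find((name, value))]
```

Because the Core only talks to the two plugin interfaces, replacing `DictBackend` with a Berkeley DB or flat-file backend would not change the Core code.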

The backend plugin works with data objects in a form that is suitable
for the backend, i.e. a data record is represented by:

- a set of keys, which are

  - pairs (attribute name, value)

The following operations are available:

- create the storage database
- find records by attributes and return a list of handles
- insert a new record and return a handle to it

These operations affect the record that is referenced by a handle:

- remove it (delete it)
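
The record model and the listed operations can be sketched as follows. This is an in-memory stand-in written for illustration, not a real backend: `Record`, `MemoryBackend`, the integer handles and the `get` accessor (the list of per-handle operations is incomplete here) are all assumptions. A real plugin would map the same calls onto Berkeley DB, SQL tables or flat files.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]  # (attribute name, value)


@dataclass
class Record:
    contents: str                          # the XML string
    keys: Set[Key] = field(default_factory=set)


class MemoryBackend:
    """In-memory stand-in for a backend plugin with a key index."""

    def __init__(self) -> None:
        self._records: Dict[int, Record] = {}
        self._index: Dict[Key, Set[int]] = {}
        self._ids = itertools.count(1)

    def create(self) -> None:
        """Create (here: reset) the storage database."""
        self._records.clear()
        self._index.clear()

    def insert(self, record: Record) -> int:
        """Store a record, index its keys and return its handle."""
        handle = next(self._ids)
        self._records[handle] = record
        for key in record.keys:
            self._index.setdefault(key, set()).add(handle)
        return handle

    def find(self, keys: List[Key]) -> List[int]:
        """Handles of the records that carry all of the given keys."""
        if not keys:
            return sorted(self._records)
        sets = [self._index.get(k, set()) for k in keys]
        return sorted(set.intersection(*sets))

    def get(self, handle: int) -> Record:
        return self._records[handle]

    def remove(self, handle: int) -> None:
        """Delete the record and drop it from the key index."""
        record = self._records.pop(handle)
        for key in record.keys:
            self._index[key].discard(handle)
```

Looking up several keys intersects the per-key handle sets, which is the kind of multi-index access that some candidate backends support less well than others.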

These are responsible for dealing with the specifics of the kind of
objects that are stored. This includes:

- knowing the path names of the database files
- knowing the attributes and keys
- converting the internal representation of a data object to XML and back
- extracting the key values from an XML string
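
A kind plugin for patches might look roughly like this. The `Patch` fields, the XML layout, the `patches.db` file name and the choice of `name` and `version` as keys are invented for the example, since the attributes of each kind are not specified here. Python's standard `xml.etree.ElementTree` stands in for whatever XML library the C++ implementation uses.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Patch:
    name: str
    version: str
    summary: str


class PatchKindPlugin:
    """Hypothetical kind plugin for patches."""

    db_path = "patches.db"                 # hypothetical file name
    fields = ("name", "version", "summary")
    key_fields = ("name", "version")       # fields that are indexed

    def to_xml(self, patch: Patch) -> str:
        # One child element per field; ElementTree escapes <, > and &.
        root = ET.Element("patch")
        for name in self.fields:
            ET.SubElement(root, name).text = getattr(patch, name)
        return ET.tostring(root, encoding="unicode")

    def from_xml(self, xml: str) -> Patch:
        root = ET.fromstring(xml)
        return Patch(*(root.findtext(name, "") for name in self.fields))

    def extract_keys(self, xml: str) -> List[Tuple[str, str]]:
        # The backend only sees these (attribute name, value) pairs.
        root = ET.fromstring(xml)
        return [(name, root.findtext(name, "")) for name in self.key_fields]
```

Keeping this knowledge in the kind plugin means the backend never has to understand the XML: it stores the string as opaque contents and indexes the extracted pairs.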

The backend stores only XML strings as contents, plus indexes for fast
access. For this, we need a parser that creates simple structure-like
objects from an XML string. It is derived from the already implemented
XMLNodeIterator class.
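
The parser itself is derived from XMLNodeIterator; as a language-neutral illustration of the same idea, the Python sketch below streams over an XML document and yields one flat, structure-like record per object element, discarding each element once it has been converted. The layout of this `ParserData` class and the `iter_objects` function are assumptions and need not match the real ParserData structure.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import BinaryIO, Dict, Iterator


@dataclass
class ParserData:
    """Flat, structure-like view of one object element (hypothetical layout)."""
    tag: str
    attributes: Dict[str, str]
    fields: Dict[str, str] = field(default_factory=dict)


def iter_objects(source: BinaryIO, tag: str) -> Iterator[ParserData]:
    """Stream over the XML input and yield one ParserData per <tag> element.

    Each element is cleared once it has been converted, which releases
    its children and text while the rest of the input is parsed.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != tag:
            continue
        data = ParserData(tag=elem.tag, attributes=dict(elem.attrib))
        for child in elem:
            # Simple text-only child elements become plain fields.
            data.fields[child.tag] = (child.text or "").strip()
        elem.clear()
        yield data
```

Streaming like this fits the memory requirement better than building a full document tree first.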

This is an overview of backends that look promising for handling the

Basic idea: Store the data for each object as an XML element in a flat
file, store the status separately, and add indexes for fast access.

For each kind there is

- a master file, which contains:

  - the pathnames of the other files

- one or more data files, which contain all the data as XML, except

- the status for each object as a binary file, each byte representing
  the status of one object. The association between status byte and
  object is stored in an index file.

- one or more index files (for all data fields that need to be a
  key). The index could be either an ordered flat file to mmap or
  something like a tree or hash table. (I'm still looking for a suitable

Advantage: More direct control over everything; it can be tailored to
our needs and is probably the most efficient way to do it.

Disadvantage: Needs a lot of effort to get right and to achieve good
performance.
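
Two pieces of this scheme, the one-byte-per-object status file and an ordered flat index file searched through `mmap`, could look roughly like the sketch below. The fixed 32-byte key width, the little-endian 4-byte slot number and the function names are assumptions made for the example; the index format is still open.

```python
import mmap
import struct
from typing import Dict, Optional

KEY_SIZE = 32                                   # assumed fixed key width
RECORD = struct.Struct(f"<{KEY_SIZE}sI")        # key bytes + status slot


def write_index(path: str, slots: Dict[str, int]) -> None:
    """Write an ordered flat index: fixed-size records sorted by key."""
    with open(path, "wb") as f:
        for key in sorted(slots):
            raw = key.encode("utf-8")
            if len(raw) > KEY_SIZE:
                raise ValueError(f"key too long: {key!r}")
            f.write(RECORD.pack(raw, slots[key]))   # pads key with NULs


def lookup(path: str, key: str) -> Optional[int]:
    """Binary-search the mmap'ed index; return the status slot or None."""
    wanted = key.encode("utf-8").ljust(KEY_SIZE, b"\0")
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        lo, hi = 0, len(m) // RECORD.size
        while lo < hi:
            mid = (lo + hi) // 2
            stored, slot = RECORD.unpack_from(m, mid * RECORD.size)
            if stored == wanted:
                return slot
            if stored < wanted:
                lo = mid + 1
            else:
                hi = mid
    return None


def read_status(path: str, slot: int) -> int:
    """The status file holds one byte per object."""
    with open(path, "rb") as f:
        f.seek(slot)
        return f.read(1)[0]


def write_status(path: str, slot: int, status: int) -> None:
    """Update one status byte in place; the XML data files stay untouched."""
    with open(path, "r+b") as f:
        f.seek(slot)
        f.write(bytes([status]))
```

Since UTF-8 preserves code point order, sorting the string keys also sorts the stored byte records, so the binary search stays valid. An empty index file cannot be mapped, so a real implementation would need to special-case it.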

Berkeley DB is a small, non-relational, embedded database library that
provides compact, transparent data storage with a wealth of related
services. Data is managed as (key, value) tuples, and the keys are
indexed.

Basic idea: Store the data for each object as an XML string in a
Berkeley DB database. Access is fast, and everything is managed by the
database. Berkeley DB is already used by rpm, so it adds no new
dependency. The library is 1-2 MB (depending on whether you use the C++

Another option: Instead of saving the XML string, serialize the
ParserData structure and save it directly. This is faster, but harder
to inspect when debugging.
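
The two variants can be contrasted with Python's `dbm.dumb` module, used here purely as a portable stand-in for the (key, value) model; it is not Berkeley DB, and its API differs from the Berkeley DB C/C++ API. The `kind:name` key scheme, the `ParserData` fields and the use of `pickle` as the binary serializer are assumptions for the example.

```python
import dataclasses
import dbm.dumb
import pickle


@dataclasses.dataclass
class ParserData:
    """Hypothetical stand-in for the real ParserData structure."""
    name: str
    version: str


# Variant 1: the value is the XML string itself (readable when debugging).
def store_xml(db, key: str, xml: str) -> None:
    db[key.encode("utf-8")] = xml.encode("utf-8")


def load_xml(db, key: str) -> str:
    return db[key.encode("utf-8")].decode("utf-8")


# Variant 2: the value is the serialized structure. Loading skips the
# XML parse, but the stored bytes are no longer human-readable.
def store_serialized(db, key: str, data: ParserData) -> None:
    db[key.encode("utf-8")] = pickle.dumps(dataclasses.asdict(data))


def load_serialized(db, key: str) -> ParserData:
    return ParserData(**pickle.loads(db[key.encode("utf-8")]))


def open_store(path: str):
    # "c": open for reading and writing, creating the files if needed.
    return dbm.dumb.open(path, "c")
```

Either way the database sees only opaque byte strings under a single key, which is why additional lookup keys have to be maintained separately.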

Advantage: Everything needed for queries on a single index is already
there.

Disadvantage: Not so good with multiple indexes. This needs some

Side note: xmldb cannot be used, since it needs a lot of special
libraries with specific versions. For this reason, it has not made it
onto the SUSE

SQLite is a very small (200-300 KB) embedded SQL engine that contains
both server and client and has no multi-user access support.

Basic idea: Store the data as an XML string in a record. For each key
data field, add another column.
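
The layout can be illustrated with Python's built-in `sqlite3` module: each row holds the XML string plus one column per key field, and an ordinary SQL index speeds up lookups on `name`. The table name, the column names and the choice of key fields are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List

KEY_FIELDS = ("name", "version")   # hypothetical key fields for one kind


def create_table(conn: sqlite3.Connection) -> None:
    # One row per object: the XML string plus one column per key field.
    conn.execute("CREATE TABLE IF NOT EXISTS patch ("
                 " id INTEGER PRIMARY KEY,"
                 " name TEXT NOT NULL,"
                 " version TEXT NOT NULL,"
                 " xml TEXT NOT NULL)")
    conn.execute("CREATE INDEX IF NOT EXISTS patch_name ON patch(name)")


def insert(conn: sqlite3.Connection, xml: str) -> int:
    # Copy the key values out of the XML into their own columns.
    root = ET.fromstring(xml)
    keys = [root.findtext(f, "") for f in KEY_FIELDS]
    cur = conn.execute(
        "INSERT INTO patch (name, version, xml) VALUES (?, ?, ?)",
        (*keys, xml))
    return cur.lastrowid


def find_by_name(conn: sqlite3.Connection, name: str) -> List[str]:
    rows = conn.execute(
        "SELECT xml FROM patch WHERE name = ? ORDER BY id", (name,))
    return [xml for (xml,) in rows]
```

Additional indexes are one `CREATE INDEX` statement each, which is where the flexibility in queries comes from.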

Disadvantage: We don't really need the relational aspects of SQLite,
and few people on the team have knowledge of or experience with this
topic. It is an additional library in the installation system. It
probably has the worst performance of these options (although SQLite
claims to be fast for SQL).

Advantage: Maximum flexibility, especially in queries. Easy changes in