From 5efd3768a18bbe4466bf72a36dd75cab61befbf4 Mon Sep 17 00:00:00 2001
From: Michael Radziej
Date: Mon, 10 Oct 2005 13:48:15 +0000
Subject: [PATCH] Concept for persistent storage for libzypp. This is the initial (old) version.

---
 doc/persistency-concept.txt | 234 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 234 insertions(+)
 create mode 100644 doc/persistency-concept.txt

diff --git a/doc/persistency-concept.txt b/doc/persistency-concept.txt
new file mode 100644
index 0000000..23aae1c
--- /dev/null
+++ b/doc/persistency-concept.txt
@@ -0,0 +1,234 @@
+Persistent Storage System
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+
+-------
+Purpose
+-------
+
+The Storage System is responsible for persistently storing and
+loading Patches and Atoms for the package manager. Other object types
+might be added in the future.
+
+----------
+Operations
+----------
+
+Required operations are:
+
+- read all objects of a given type
+- read in an object identified by a given attribute
+- change status
+- save a new object
+- delete an object (only for cleanup after update, does not need to
+  be fast)
+- a query interface similar to what rpm offers
+
+
+
+
+-----------
+Constraints
+-----------
+
+In order of relevance:
+
+- It will be part of the ZYPP library, and as such needs to be
+  reentrant and thread-safe.
+
+- There is only one attribute in an object that might need to be
+  modified: the status. Thus, no universal "modify object" operation is
+  needed.
+
+- The most time-critical operation is "read all objects", since it
+  delays the startup of the YaST package manager, which already takes
+  an uncomfortably long time.
+
+- It should try to cut down on memory usage, since it will be in use at
+  install time, at least when updating.
+
+- We'll probably get something on the order of 100-1000 of these objects.
+
+- It should be possible to use different backends for the actual
+  low-level storage, e.g. Berkeley DB, PostgreSQL, MySQL, or a flat file.
+
+
+------------
+Architecture
+------------
+
+Here's a layer diagram of the main architecture. The parenthesized
+numbers indicate APIs::
+    +--------------------+--------------+
+    | caller, e.g. package manager      |
+    +--------(1)---------+              +
+    | query interface    |              |
+    +-------------------(2)-------------+
+    | Persistent Storage Core           |
+    +--------(3)---------+------(4)-----+
+    | Backend plugin     | Type plugin  |
+    +--------------------+--------------+
+    | Backend            | Parser       |
+    | (e.g. Berkeley DB) |              |
+    +--------------------+              +
+    | Filesystem         |              |
+    +--------------------+--------------+
+
+
+APIs:
+
+1) Query API
+
+2) Core API
+
+3) Backend Plugin API
+
+4) Type Plugin API
+
+
+These components are described below; the APIs will be specified in a
+later, more detailed follow-up.
+
+
+
+Query Interface
+===============
+
+This implements the rpm-like query operation by using general and
+simpler search operations within the Core API.
+
+
+Persistent Storage Core
+=======================
+
+The Core is the "main contact" for the layers above and breaks the
+complete functionality down into tasks for the simpler modules in the
+lower layers.
+
+
+Backend Plugin
+==============
+
+The backend plugin considers data objects in a form that is suitable
+for the backend, i.e. a data record is represented by:
+
+- an XML string
+- a set of keys, which are
+  - pairs (attribute name, value)
+
+The following operations are available:
+
+- create the storage database
+- find a record by a given attribute and return a handle
+- insert a new record and return a handle to it
+
+These operations affect the record that is referenced by a handle:
+
+- read the record
+- update the status
+- remove it (delete it)
+
+
+Type plugins
+============
+
+These are responsible for dealing with the specifics of the type of
+objects that are stored.
+This includes:
+
+- knowing about the path names of the database
+- knowing about the attributes and keys
+- converting the internal representation of a data object to XML and back
+- extracting the key values from an XML string
+
+
+Parser
+======
+
+The parser creates simple structure-like objects from the XML string.
+It is derived from the already implemented XMLNodeIterator class.
+
+
+--------
+Backends
+--------
+
+This is an overview of backends that look promising for handling the
+low-level storage.
+
+Plain Files
+===========
+
+Basic idea: Store the data for each object as an XML element in a flat
+file, store the status separately and add indices for fast access.
+
+For each type there is
+
+- a master file, which contains:
+
+  - the pathnames of the other files
+  - storage options
+
+- one or more data files, which contain all the data as XML, except
+  the status
+
+- a binary status file, in which each byte represents the status of
+  one object. The association between status byte and object is kept
+  in an index file.
+
+- one or more index files (for all data fields that need to be a
+  key). The index could either be an ordered flat file to mmap, or
+  something like a tree or hash table. (I'm still looking for a suitable
+  library.)
+
+Advantage: More direct control over everything, can be tailored to our
+needs, probably the most efficient way to do it.
+
+
+Berkeley DB
+===========
+
+Berkeley DB is a small non-relational embedded database library that
+provides compact transparent data storage with a wealth of related
+services. Data is managed as (key,value) tuples, and indexing is
+performed for the keys.
+
+Basic idea: Store the data for each object as an XML string in a
+Berkeley DB database. Fast access, and everything is managed by the
+database. Berkeley DB is already used within rpm, so it's no additional
+dependency. The library is 1-2 MB (depending on whether you use the C++
+binding or only C).
+
+Another option: Instead of saving the XML string, serialize the
+ParserData structure and save this one directly. Faster, but more
+difficult to access for debugging.
+
+Advantage: Everything's there.
+
+Disadvantage: Berkeley DB had some issues in the past. We'll
+depend on it.
+
+Side note: xmldb cannot be used since it needs a lot of special libraries with
+specific versions. For this reason, it did not make it into SUSE
+Linux 10.0.
+
+
+
+
+SQLite
+======
+
+SQLite is an embedded SQL engine that contains both server and
+client, and has no multi-user access support. It is very
+small (200-300 KByte).
+
+Basic idea: Store the data as an XML string in a record. For each key
+data field, add another attribute.
+
+Disadvantage: We don't really need the relational aspects of SQLite,
+and we don't have many people on the team with knowledge of this
+topic. Not much experience with it. Additional library in the install
+system. Probably the worst performance (but SQLite claims to be fast
+for an SQL engine).
+
+Advantage: Maximum flexibility, especially in queries. Easy changes in
+the structure.
-- 
2.7.4
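
The operations listed in the "Backend Plugin" section of the patch can be illustrated with a minimal in-memory sketch. It is not part of the patch and not an API the concept defines: the names Handle, Keys, Record and InMemoryBackend are hypothetical, the status is assumed to be a string, and the locking that the thread-safety constraint would require is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// All names below are hypothetical; the concept does not define them.
typedef std::size_t Handle;
typedef std::map<std::string, std::string> Keys;  // attribute name -> value

struct Record {
    std::string xml;     // object data as an XML string
    Keys keys;           // indexed (attribute name, value) pairs
    std::string status;  // assumed type; the only field changed after insert
};

// Constructing the object stands in for "create the storage database".
class InMemoryBackend {
public:
    InMemoryBackend() : next_(0) {}

    // Insert a new record and return a handle to it.
    Handle insert(const std::string& xml, const Keys& keys,
                  const std::string& status) {
        Handle h = next_++;
        Record rec = {xml, keys, status};
        records_.insert(std::make_pair(h, rec));
        return h;
    }

    // Find the first record whose key `attr` equals `value`.
    // Returns false and leaves `out` untouched if there is none.
    bool find(const std::string& attr, const std::string& value,
              Handle& out) const {
        std::map<Handle, Record>::const_iterator it;
        for (it = records_.begin(); it != records_.end(); ++it) {
            Keys::const_iterator k = it->second.keys.find(attr);
            if (k != it->second.keys.end() && k->second == value) {
                out = it->first;
                return true;
            }
        }
        return false;
    }

    // Read the record behind a handle; a null pointer if it does not exist.
    const Record* read(Handle h) const {
        std::map<Handle, Record>::const_iterator it = records_.find(h);
        return it == records_.end() ? 0 : &it->second;
    }

    // Update the status; returns false for an unknown handle.
    bool update_status(Handle h, const std::string& status) {
        std::map<Handle, Record>::iterator it = records_.find(h);
        if (it == records_.end()) return false;
        it->second.status = status;
        return true;
    }

    // Remove the record; returns false for an unknown handle.
    bool remove(Handle h) { return records_.erase(h) > 0; }

private:
    std::map<Handle, Record> records_;
    Handle next_;
};
```

A caller would insert a record with its XML string and keys, look it up again with find(), and then use the returned handle for read(), update_status() and remove(). The linear scan in find() is only acceptable because the concept expects 100-1000 objects; a real backend would index the keys.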