Concept for persistent storage for libzypp.

author Michael Radziej <mir@suse.de>

Mon, 10 Oct 2005 13:48:15 +0000 (13:48 +0000)

committer Michael Radziej <mir@suse.de>

Mon, 10 Oct 2005 13:48:15 +0000 (13:48 +0000)
diff --git a/doc/persistency-concept.txt b/doc/persistency-concept.txt

new file mode 100644

index 0000000..23aae1c
--- /dev/null
+++ b/doc/persistency-concept.txt
@@ -0,0 +1,234 @@
+Persistent Storage System
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+
+-------
+Purpose
+-------
+
+The Storage System is responsible for persistently storing and
+loading Patches and Atoms for the package manager. Other object types
+might be added in the future.
+
+----------
+Operations
+----------
+
+Required operations are:
+
+- read all objects of a given type
+- read in an object identified by a given attribute
+- change the status of an object
+- save a new object
+- delete an object (only for cleanup after update, does not need to
+  be fast)
+- a query interface similar to what rpm offers
+
+
+
+
+-----------
+Constraints
+-----------
+
+In order of relevance:
+
+- It will be part of the ZYPP library, and as such needs to be
+  reentrant and thread safe.
+
+- There is only one attribute in an object that might need to be
+  modified, the status. Thus, no universal "modify object" operation is
+  needed.
+
+- The most time-critical operation is "read all objects", since it
+  delays the startup of the YaST package manager, which already takes
+  an uncomfortably long time.
+
+- It should try to cut down on memory usage, since it will be in use at
+  install time, at least when updating. 
+
+- We expect on the order of 100 to 1000 of these objects.
+
+- It should be possible to use different backends for the actual low
+  level storage, e.g. Berkeley DB, PostgreSQL, MySQL, or a flat file.
+
+
+------------
+Architecture
+------------
+
+Here's a layer diagram of the main architecture. The numbers in
+parentheses indicate APIs::
+
+ +--------------------+--------------+
+ | caller, e.g. package manager      |
+ +--------(1)---------+              +
+ | query interface    |              |
+ +-------------------(2)-------------+
+ | Persistent Storage Core           |
+ +--------(3)---------+------(4)-----+
+ | Backend plugin     | Type plugin  |
+ +--------------------+--------------+
+ | Backend            | Parser       |
+ | (e.g. Berkeley DB) |              |
+ +--------------------+              +
+ | Filesystem         |              |
+ +--------------------+--------------+
+
+
+
+APIs:
+
+1) Query API
+
+2) Core API
+
+3) Backend Plugin API
+
+4) Type Plugin API
+
+
+These components are described below; the APIs themselves will be
+specified in a later, more detailed follow-up.
+
+
+
+Query Interface
+===============
+
+This implements the rpm-like query operation by using general and
+simpler search operations within the Core API.
+
+
+Persistent Storage Core
+=======================
+
+The Core is the "main contact" for the layers above. It breaks the
+complete functionality down into simpler operations for the modules in
+the lower layers.
+
+
+Backend Plugin
+==============
+
+The backend plugin handles data objects in a form that suits the
+backend. A data record is represented by:
+
+- an XML string
+- a set of keys, each of which is a pair (attribute name, value)
+
+The following operations are available:
+
+- create the storage database
+- find a record by a given attribute and return a handle
+- insert a new record and return a handle to it
+
+These operations affect the record that is referenced by a handle:
+
+- read the record
+- update the status
+- remove it (delete it)
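
The operations listed above could be sketched as an abstract C++ interface with a trivial in-memory backend behind it. This is only an illustration under stated assumptions: `BackendPlugin`, `MemoryBackend`, `Record`, `Handle`, `Key` and `kNoHandle` are hypothetical names, and the concept defers the real Backend Plugin API to a later follow-up.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// All names here are hypothetical; the concept does not fix the API yet.
typedef std::size_t Handle;
const Handle kNoHandle = static_cast<Handle>(-1);   // "no such record" marker
typedef std::pair<std::string, std::string> Key;    // (attribute name, value)

struct Record {
    std::string xml;         // the object, serialized as an XML string
    std::vector<Key> keys;   // attributes the backend can search by
    unsigned char status;    // the only field that changes after insertion
    Record() : status(0) {}
};

// The six operations from the list above, one virtual function each.
class BackendPlugin {
public:
    virtual ~BackendPlugin() {}
    virtual void create() = 0;
    virtual Handle find(const std::string& attr,
                        const std::string& value) const = 0;
    virtual Handle insert(const Record& rec) = 0;
    virtual const Record* read(Handle h) const = 0;
    virtual bool updateStatus(Handle h, unsigned char status) = 0;
    virtual bool remove(Handle h) = 0;
};

// Stand-in backend that keeps everything in a std::map (nothing persists).
class MemoryBackend : public BackendPlugin {
public:
    MemoryBackend() : next_(0) {}

    void create() { records_.clear(); next_ = 0; }

    // Linear scan over all records; returns the first match or kNoHandle.
    Handle find(const std::string& attr, const std::string& value) const {
        std::map<Handle, Record>::const_iterator it;
        for (it = records_.begin(); it != records_.end(); ++it) {
            const std::vector<Key>& keys = it->second.keys;
            for (std::size_t i = 0; i < keys.size(); ++i)
                if (keys[i].first == attr && keys[i].second == value)
                    return it->first;
        }
        return kNoHandle;
    }

    Handle insert(const Record& rec) { records_[next_] = rec; return next_++; }

    // Returns a null pointer when the handle is unknown.
    const Record* read(Handle h) const {
        std::map<Handle, Record>::const_iterator it = records_.find(h);
        return it == records_.end() ? 0 : &it->second;
    }

    bool updateStatus(Handle h, unsigned char status) {
        std::map<Handle, Record>::iterator it = records_.find(h);
        if (it == records_.end()) return false;
        it->second.status = status;
        return true;
    }

    bool remove(Handle h) { return records_.erase(h) == 1; }

private:
    std::map<Handle, Record> records_;
    Handle next_;
};
```

A real backend (flat files, Berkeley DB, SQLite) would implement the same six functions on top of its own storage; the Core would only ever see handles and records.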
+
+
+Type plugins
+============
+
+These are responsible for dealing with the specifics of the type of
+objects that are stored. This includes:
+
+- knowing about the path names of the database
+- knowing about the attributes and keys
+- converting the internal representation of a data object to XML and back
+- extracting the key values from an XML string
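
As a rough illustration of the last two responsibilities, a type plugin for patches might look like the sketch below. Everything in it is an assumption made for the example: the `Patch` struct and its `id` and `category` fields, the XML layout, and the helper names. The attribute lookup is a naive string search that ignores XML escaping; the concept foresees a parser derived from XMLNodeIterator for that job.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> Key;  // (attribute name, value)

// Hypothetical internal representation of a stored object.
struct Patch {
    std::string id;
    std::string category;
};

// Convert the internal representation to an XML string (no escaping).
std::string toXml(const Patch& p) {
    return "<patch id=\"" + p.id + "\" category=\"" + p.category + "\"/>";
}

// Naive lookup of name="value" inside the XML string.
// Returns an empty string when the attribute is missing.
std::string attributeValue(const std::string& xml, const std::string& name) {
    const std::string needle = " " + name + "=\"";
    std::string::size_type start = xml.find(needle);
    if (start == std::string::npos) return "";
    start += needle.size();
    const std::string::size_type end = xml.find('"', start);
    if (end == std::string::npos) return "";
    return xml.substr(start, end - start);
}

// Convert the XML string back to the internal representation.
Patch fromXml(const std::string& xml) {
    Patch p;
    p.id = attributeValue(xml, "id");
    p.category = attributeValue(xml, "category");
    return p;
}

// Extract the key values the backend should index; here only "id".
std::vector<Key> extractKeys(const std::string& xml) {
    std::vector<Key> keys;
    keys.push_back(Key("id", attributeValue(xml, "id")));
    return keys;
}
```

The point of the split is that the backend never needs to understand the XML: it stores the string and indexes whatever `extractKeys` hands it.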
+
+
+Parser
+======
+
+The parser creates simple struct-like objects from the XML string.
+It is derived from the already implemented XMLNodeIterator class.
+
+
+--------
+Backends
+--------
+
+This is an overview of backends that look promising for handling the
+low level storage.
+
+Plain Files
+===========
+
+Basic idea: Store the data for each object as an XML element in a flat
+file, store the status separately and add indices for fast access.
+
+For each type there are:
+
+- a master file, which contains:
+
+  - the pathnames of the other files
+  - storage options
+
+- one or more data files, which contain all the data as XML, except
+  the status
+
+- the status for each object as a binary file, each byte representing
+  the status for one object. The association between status byte and 
+  object is contained in an index file.
+
+- one or more index files (for all data fields that need to be a
+  key). The index could either be an ordered flat file to mmap, or
+  something like a tree or hash table. (I'm still looking for a suitable
+  library).
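
The status file described above, with one byte per object and an index that maps each object to its byte, could be accessed roughly as follows. This is a sketch, not the planned implementation: `StatusIndex`, `writeStatus`, `readStatus` and `statusOf` are hypothetical names, and the index is a plain in-memory map rather than an index file. The functions take any `std::iostream`, so an `std::fstream` opened in binary read/write mode would be used for the real file, while an `std::stringstream` can stand in for it in quick experiments.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <map>
#include <sstream>   // std::stringstream can stand in for the status file
#include <string>

// Hypothetical index: object id -> byte offset in the status file.
typedef std::map<std::string, std::size_t> StatusIndex;

// Overwrite the single status byte at the given slot, in place.
bool writeStatus(std::iostream& f, std::size_t slot, unsigned char status) {
    f.clear();  // drop error flags left over from earlier calls
    f.seekp(static_cast<std::streamoff>(slot), std::ios::beg);
    f.put(static_cast<char>(status));
    f.flush();
    return f.good();
}

// Read the status byte at the given slot; -1 if the slot cannot be read.
int readStatus(std::iostream& f, std::size_t slot) {
    f.clear();
    f.seekg(static_cast<std::streamoff>(slot), std::ios::beg);
    const int c = f.get();
    return f ? c : -1;
}

// Look the object up in the index first; -1 if the object is unknown.
int statusOf(std::iostream& f, const StatusIndex& index,
             const std::string& id) {
    StatusIndex::const_iterator it = index.find(id);
    return it == index.end() ? -1 : readStatus(f, it->second);
}
```

Because a status change touches exactly one byte, the XML data files never have to be rewritten, which fits the constraint that the status is the only attribute that is ever modified.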
+
+Advantage: More direct control over everything, can be tailored to our
+needs, probably the most efficient way to do it.
+
+
+Berkeley DB
+===========
+
+Berkeley DB is a small non-relational embedded database library that
+provides compact transparent data storage with a wealth of related
+services. Data is managed as (key,value) tuples, and indexing is
+performed for the keys.
+
+Basic idea: Store the data for each object as an XML string in a
+Berkeley DB. Fast access and everything else is managed by the db.
+Berkeley DB is already used within rpm, so it's no additional
+dependency. The library is 1-2MB (depending on whether you use the C++
+binding or only the C one).
+
+Another option: Instead of saving the XML string, serialize the
+ParserData structure and save this one directly. Faster, but more
+difficult to access for debugging.
+
+Advantage: Everything's there. 
+
+Disadvantage: Berkeley DB has had some issues in the past, and we
+would depend on it.
+
+Side note: xmldb cannot be used since it needs a lot of special
+libraries with specific versions. For this reason, it did not make it
+into SUSE Linux 10.0.
+
+
+
+
+SQLite
+======
+
+SQLite is an embedded SQL engine that contains both server and
+client, and has no multi-user access support. It is very small
+(200-300 KByte).
+
+Basic idea: Store the data as an XML string in a record. For each key
+data field, add a separate column to the record.
+
+Disadvantage: We don't really need the relational aspects of SQLite,
+and we don't have many people in the team with knowledge of or
+experience with it. Additional library in the install system. Probably
+worst performance (although SQLite claims to be fast for an SQL engine).
+
+Advantage: Maximum flexibility, especially in queries. Easy changes in
+the structure.