--- /dev/null
+
+CACHE WRITE API DESIGN
+dmacvicar@suse.de
+ma@suse.de
+
+Background and problems
+=======================
+
+The biggest problem when designing an API that metadata parsers
+will use to fill a database is that each format delivers its
+data in a different order.
+
+YUM is flawed in that it mixes descriptions and user data with
+basic solver data (NVRAD [1]). It is, however, easy to write to
+a store.
+
+SUSETags, on the other hand, keeps primary data and user data
+in different files.
+
+If we consider a simple API where we have data objects to pass
+to the store:
+
+ .------------------.
+ | package |
+ +------------------+
+ | NVRAD |
+ +------------------+
+ | summary |
+ +------------------+
+ | other data |
+ +------------------+
+ | description |
+ '------------------'
+
+The SQL schema behind the API has a resolvable table, which stores the
+NVRAD, and a package_data table, which stores the remaining data:
+
+|-----------------| |-----------------|
+| resolvable | | package_data |
+|-----------------| |-----------------|
+| id | NVRAD | | package_id |
+|-----------------| |-----------------|
+ | other data |
+ |-----------------|
+
+Inserting a package means inserting a resolvable entry, getting a new id
+for it, and then inserting a new package entry with package_id set to that id.
+
+YUM can insert both blocks while parsing, because all the data is available
+at the same time. SUSETags can't, so it has to cache the NVRAD and the id in
+order to insert the second block when it becomes available from the
+translations file.
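+
+The two-step insertion and the id caching that SUSETags needs can be
+sketched as follows. This is only an illustration: Store, its in-memory
+vectors and parseInTwoPasses() are hypothetical stand-ins for the real
+SQL tables and parser, and the package names are made up.
+

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-ins for the two SQL tables.
struct ResolvableRow  { int id; std::string nvrad; };
struct PackageDataRow { int package_id; std::string other_data; };

struct Store {
    std::vector<ResolvableRow>  resolvable;
    std::vector<PackageDataRow> package_data;
    int next_id = 1;

    // Step 1: insert the resolvable entry and hand back its new id.
    int insertResolvable(const std::string& nvrad) {
        resolvable.push_back({next_id, nvrad});
        return next_id++;
    }

    // Step 2: insert the package entry, keyed by the resolvable id.
    void insertPackageData(int package_id, const std::string& other_data) {
        package_data.push_back({package_id, other_data});
    }
};

// SUSETags-style usage: the first pass yields only NVRADs, so the parser
// keeps an NVRAD -> id map until the second file supplies the rest.
void parseInTwoPasses(Store& store) {
    std::map<std::string, int> ids;
    ids["foo-1.0-1.i586"]   = store.insertResolvable("foo-1.0-1.i586");
    ids["bar-2.0-1.noarch"] = store.insertResolvable("bar-2.0-1.noarch");

    // Second pass: the remaining data may arrive in a different order.
    store.insertPackageData(ids.at("bar-2.0-1.noarch"), "description of bar");
    store.insertPackageData(ids.at("foo-1.0-1.i586"), "description of foo");
}
```

+
+A YUM-style parser would call both steps back to back and would not need
+the map at all, which is exactly the asymmetry described above.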
+
+This makes the design of the data structures highly dependent on how the
+metadata is read.
+
+We are also looking for a solution that works well in 99% of the cases
+while leaving flexibility for the remaining 1%.
+
+Requirements
+============
+
+- Allow parsers to enter metadata as they get it.
+- Be reasonably fast.
+
+The first requirement rules out fixed data transport objects for
+inserting data.
+
+If we choose a data object with certain fields, we favour the design
+of one particular metadata format.
+
+Proposed Solution
+=================
+
+- a basic resolvable NVRAD data object
+- a dynamic fields object:
+
+ .------------------------.
+ | package_data |
+ +------------------+-----+
+ | summary | [ ] |
+ | description | [ ] |
+ | group | [ ] |
+ | packager | [ ] |
+ | license | [ ] |
+ '------------------+-----'
+
+Every time a package object is inserted in the cache, the resolvable
+entry is inserted and the kind-specific data for the resolvable is
+created in an empty state. The id of the resolvable is returned.
+
+The parser can then write the data by passing a structure like the one
+described above, where the first column names the field and the second
+marks the fields to update. A SQL UPDATE statement is generated from this
+data object, so adding the fields for the first time is no different from
+updating them.
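+
+Generating the UPDATE from the active fields could look roughly like the
+sketch below. The FieldValues map, the buildUpdateSql() helper and the use
+of '?' placeholders are assumptions made for illustration; column and
+table names follow the diagrams above.
+

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical dynamic-fields object: a field is "active" if it is present.
using FieldValues = std::map<std::string, std::string>;

// Builds a parameterized UPDATE that touches only the active fields.
// Returns an empty string when there is nothing to update.
std::string buildUpdateSql(const FieldValues& fields) {
    if (fields.empty()) return "";

    std::string sql = "UPDATE package_data SET ";
    bool first = true;
    for (const auto& entry : fields) {
        if (!first) sql += ", ";
        sql += entry.first + " = ?";   // value is bound later, not inlined
        first = false;
    }
    sql += " WHERE package_id = ?";
    return sql;
}
```

+
+Because the package_data row already exists in an empty state, the same
+statement serves both the first write and any later update of a field.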
+
+This presents one problem. As the SQL is generated from the active fields
+of the data object, we can't precompile those update statements in advance.
+This is easily solved. We can assume that a metadata parser which inserts a
+given combination of fields for many packages will use the same combination
+for all packages in most cases. We can precompile the statement for a
+combination of fields and cache it in a precompiled statement pool. When
+another data block is inserted, we look up whether a precompiled statement
+for its combination exists and use it. The only case that does not benefit
+is a parser that keeps updating with different combinations, and even that
+one hits the cache once all field combinations have been seen. So the
+problem has an easy solution.
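+
+A minimal sketch of such a pool, keyed by a bitmask of the active fields.
+The Statement struct is a hypothetical placeholder for whatever compiled
+statement handle the database library provides, and StatementPool and
+compiledCount() are names invented for this example.
+

```cpp
#include <cassert>
#include <map>
#include <string>

enum Field {
    FIELD_DESCRIPTION = 1 << 0,
    FIELD_SUMMARY     = 1 << 1,
    FIELD_GROUP       = 1 << 2
};

// Placeholder for a compiled statement handle (hypothetical).
struct Statement { std::string sql; };

class StatementPool {
public:
    // Returns the statement for this field combination, compiling it
    // only the first time the combination is seen.
    const Statement& get(int mask) {
        auto it = pool_.find(mask);
        if (it == pool_.end()) {
            ++compiled_;
            it = pool_.emplace(mask, Statement{buildSql(mask)}).first;
        }
        return it->second;
    }

    // Number of statements compiled so far (cache misses).
    int compiledCount() const { return compiled_; }

private:
    static std::string buildSql(int mask) {
        std::string assignments;
        auto add = [&](int bit, const char* column) {
            if (mask & bit) {
                if (!assignments.empty()) assignments += ", ";
                assignments += std::string(column) + " = ?";
            }
        };
        add(FIELD_DESCRIPTION, "description");
        add(FIELD_SUMMARY, "summary");
        add(FIELD_GROUP, "group");
        return "UPDATE package_data SET " + assignments +
               " WHERE package_id = ?";
    }

    std::map<int, Statement> pool_;
    int compiled_ = 0;
};
```

+
+With a bitmask as the key, the order in which the parser sets the fields
+does not matter, and n fields can never produce more than 2^n - 1 distinct
+statements.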
+
+Implementation
+==============
+
+The implementation of the data objects is not defined yet. Several alternatives
+come to mind:
+
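+struct PackageData
+{
+  // Bit flags, so that several fields can be combined in fields_mask.
+  enum Fields {
+    FIELD_DESCRIPTION = 1 << 0,
+    FIELD_SUMMARY     = 1 << 1,
+    FIELD_GROUP       = 1 << 2,
+    FIELD_SIZE        = 1 << 3,
+  };
+
+  string description;
+  string summary;
+  string group;
+  int size;
+
+  int fields_mask; // OR of the Fields that are set
+  // or set<Fields> fields
+};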
+
+// Alternative: every field carries its own "is set" flag.
+struct PackageData
+{
+  pair<bool, string> description;
+  pair<bool, string> summary;
+  pair<bool, int> size;
+};
+
+In this case, we would need to know the types of the data
+before writing it.
+
+We are investigating the use of boost::any [2] to see whether
+it is possible to make the API even easier to use.
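+
+To give an idea of what a type-erased field container might look like,
+here is a sketch that uses C++17 std::any as a stand-in for boost::any.
+DynamicFields and fieldOr() are hypothetical names, and nothing here is
+a decided part of the design.
+

```cpp
#include <any>
#include <cassert>
#include <map>
#include <string>

// Hypothetical container: field name -> value of any type.
using DynamicFields = std::map<std::string, std::any>;

// Returns the field as T, or `fallback` if the field is absent
// or holds a value of a different type.
template <typename T>
T fieldOr(const DynamicFields& fields, const std::string& name, T fallback) {
    auto it = fields.find(name);
    if (it == fields.end()) return fallback;
    const T* value = std::any_cast<T>(&it->second);
    return value ? *value : fallback;
}
```

+
+The parser can then store strings and integers side by side without a
+fixed struct, but whoever reads a field back still has to name its type.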
+
+
+ [1]: Name Version Release Arch Deps
+ [2]: http://www.boost.org/doc/html/any.html
\ No newline at end of file