--- /dev/null
+I'll use this doc to describe how I think metadata should work from the
+perspective of the application developer and end user, and from that
+extrapolate what we need to provide that.
+
+RATIONALE
+---------
+One of the strong points of GStreamer is that it abstracts library dependencies
+away. A user is free to choose whatever plug-ins he has, and a developer
+can code to the general API that GStreamer provides without having to deal
+with the underlying codecs.
+
+It is important that GStreamer also handles metadata well and efficiently,
+since more often than not the same libraries are needed to do this. So
+to avoid applications depending on these libs just to do the metadata,
+we should make sure GStreamer provides a reasonable and fast abstraction
+for this as well.
+
+GOALS
+-----
+- quickly read and write content metadata
+- quickly read stream metadata
+- cache both kinds of data transparently
+- (possibly) provide bins that do this
+- provide a simple API to do this
+
+DEFINITION OF TERMS
+-------------------
+The user or developer using GStreamer is interested in all information that
+describes the stream. The library handles these two types differently
+however, so I will use the following terms to describe this :
+
+- content metadata
+ every kind of information that is tied to the "concept" of the stream,
+ and not tied to the actual encoding or representation of the stream.
+ - it can be altered without transcoding the stream
+ - it would stay the same for different encodings of the file
+ - describes properties of the information encoded into the stream
+ - examples:
+ - artist, title, author
+ - year, track order, album
+ - comments
+
+- stream metadata
+ every kind of information that is tied to the "codec" or "representation"
+ used.
+ - cannot be altered without transcoding
+ - is the set of parameters the stream has been encoded with
+ - describes properties of the stream itself
+ - examples:
+ - samplerate, bit depth/width, channels
+ - bitrate, encoding mode (e.g. joint stereo)
+ - video size, frames per second, colorspace used
+ - length in time
+
+
+EXAMPLE PIPELINES
+-----------------
+reading content metadata : filesrc ! id3v1
+ - would read metadata from file
+ - id3v1 immediately causes filesrc to seek until it has found
+ - the (first) metadata
+ - that there is no metadata present
+
+resetting and writing content metadata :
+ filesrc ! id3v1 reset=true artist="Arid" ! filesink
+
+ - effect: clear the current tag and reset it to only have Arid as artist
+ - id3v1 seeks to the right location, clears the tag, and writes the new one
+ - filesrc might not be necessary here
+ - this probably only works when doing an in-place edit
+
+COST
+----
+Querying metadata can be expensive.
+Any application querying for metadata should take this into account and
+make sure that it doesn't block the app unnecessarily while the querying
+happens.
+
+In most cases, querying content data should be fast since it doesn't involve
+decoding
+
+Technical data could be harder and thus might be better done only when needed.
+
+CACHE
+-----
+Getting metadata can be an expensive operation. It makes sense to cache
+the metadata queried on-disk to provide rapid access to this data.
+It is important however that this is done transparently - the system should
+be able to keep working without it, or keep working when you delete this cache.
+
+The API would provide a function like
+ gst_metadata_content_read_cached (location)
+or even
+ gst_metadata_read_cached (location, GST_METADATA_CONTENT, GST_METADATA_READ_CACHED)
+to try and get the cached metadata.
+
+- check if the file is cached in the metadata cache
+ - if no, then read the metadata and store it in the cache
+ - if yes, then check the file against it's timestamp (or (part of) md5sum ?)
+ - if it was changed, force a new read and store it in the cache
+ - if it wasn't changed, just return the cached metadata
+
+For optimizations, it might also make sense to do
+ GList * gst_metadata_read_many (GList *locations, ...)
+which would allow the back-end to implement this more efficiently.
+Suppose an application loads a playlist, for example, then this playlist
+could be handed to this function, and a GList of metadata types could
+be returned.
+
+Possible implementations :
+- one large XML file : would end up being too heavy
+- one XML file per dir on system : good compromise; would still make sense
+ to keep this in memory instead of reading and writing it all the time
+ Also, odds are good that users mostly use files from same dir in one app
+ (but not necessarily)
+
+Possible extra niceties :
+- matching of moved files, and a similar move of metadata (through user-space
+ tool ?)
+
+!!! For speed reasons, it might make sense to somehow keep the cache in memory
+instead of reparsing the same cache file each time.
+
+!!! For disk space reasons, it might make sense to have a system cache.
+Not sure if the complexity added is worth it though.
+
+!!! For disk space reasons, we might want to add an upper limit on the size of
+the cache. For that we might need a timestamp on last retrieval of metadata,
+so that we can drop the old ones.
+
+The cache should use standard glibc.
+FIXME: is it worth it to use gnome-vfs for this ?
+
+STANDARDIZATION OF METADATA
+---------------------------
+Different file formats have different "tags". It is not always possible
+to map metadata to tags. Some level of agreement on metadata names is also
+required.
+
+For technical metadata, the names or properties should be fairly standard.
+We also use the same names as used for properties and capabilities in
+GStreamer.
+
+This means we use
+ - encoded audio
+ - "bitrate" (which is bits per second - use the most correct one,
+ ie. average bitrate for VBR for example)
+
+ - raw audio
+ - "samplerate" - sampling frequency
+ - "channels"
+ - "bitwidth" - how wide is the audio in bits
+
+ - encoded video
+ - "bitrate"
+
+ - raw video
+ (FIXME: I don't know enough about video, are these correct)
+ - "width"
+ - "height"
+ - "colordepth"
+ - "colorspace"
+ - "fps"
+ - "aspectratio"
+
+We must find a way to avoid collision. A system stream can contain both
+audio and video (-> bitrate) or multiple audio or video streams. One way
+to do this might be to make a metadata set for a stream a GList of metadata
+for elementary streams.
+
+For content metadata, the standards are less clear.
+Some nice ones to standardize on might be
+ - artist
+ - title
+ - author
+ - year
+ - genre (touchy though)
+ - RMS, inpoint, outpoint (calculated through some formula, used for mixing)
+
+TESTING
+-------
+It is important to write a decent testsuite for this and do speed comparisons
+between the library used and the GStreamer implementation.
+
+
+API
+---
+struct GstMetadata
+{
+ gchar *location;
+ GstMetadataType type;
+
+ GList *streams;
+ GHashtable *values;
+};
+
+(streams would be a GList of (again) GstMetadata's.
+ "location" would then be reused to indicate an identifier in the stream.
+ FIXME: is that evil ?)
+
+GstMetadataType - technical, content
+GstMetadataReadType - cached, raw
+
+GstMetadata *
+gst_metadata_read (const char *location,
+ GstMetadataType type,
+ GstMetadataReadType read_type);
+GstMetadata *
+gst_metadata_read_props (const char *location,
+ GList *names,
+ GstMetadataType type,
+ GstMetadataReadType read_type);
+GstMetadata *
+gst_metadata_read_cached (const char *location,
+ GstMetadataType type,
+ GstMetadataReadType read_type);
+
+GstMetadata *
+gst_metadata_read_props_cached (...)
+
+GList *
+gst_metadata_read_cached_many (GList *locations,
+ GstMetadataType type,
+ GstMetadataReadType read_type);
+
+GList *
+gst_metadata_read_props_cached_many (GList *locations,
+ GList *names,
+ GstMetadataType type,
+ GstMetadataReadType read_type);
+
+GList *
+gst_metadata_content_write (const char *location,
+ GstMetadata *metadata);
+
+
+SOME USEFUL RESOURCES
+---------------------
+http://www.chin.gc.ca/English/Standards/metadata_multimedia.html
+- describes multimedia data for images
+ distinction between content (descriptive), technical and
+ administrative metadata