docs/manual/libxml++_without_code.xml

   1 <?xml version="1.0"?>
   2 <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
   3   "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
   4   <!ENTITY date "February 2002">
   5   <!ENTITY url_examples_base "http://git.gnome.org/browse/libxml++/tree/examples/">
   6 ]>
   7
   8 <book id="index" lang="en">
   9   <bookinfo>
  10     <title>libxml++ - An XML Parser for C++</title>
  11     <author>
  12       <firstname>Murray</firstname>
  13       <surname>Cumming</surname>
  14       <affiliation>
  15               <address><email>murrayc@murrayc.com</email></address>
  16       </affiliation>
  17     </author>
  18     <date>12th September 2004</date>
  19     <abstract>
  20       <para>This is an introduction to libxml2's C++ binding, with simple examples.</para>
  21     </abstract>
  22   </bookinfo>
  23   <chapter id="chapter-introduction">
  24     <title>libxml++</title>
  25     <para>
  26       libxml++ is a C++ API for the popular <ulink url="http://www.xmlsoft.org">libxml2</ulink> XML parser, written in C.
  27       libxml2 is famous for its high performance and compliance with standard specifications, but its C API is quite difficult to use, even for common tasks.
  28     </para>
  29
  30     <para>
  31       libxml++ presents a simple C++-like API that can achieve common tasks with less code.
  32       Unlike some other C++ parsers, it takes advantage of standard C++ features
  33       such as namespaces, STL containers or runtime type identification, and it does not try
  34       to conform to standard API specifications meant for Java. Therefore libxml++ requires
  35       a fairly modern C++ compiler such as g++ 4.9 or g++ 5. libxml++ 2.39.1 and later require
  36       a C++11-compliant compiler.
  37     </para>
  38
  39     <para>But libxml++ was created mainly to fill the need for an API-stable and ABI-stable C++ XML parser which could be used as a shared library dependency by C++ applications that are distributed widely in binary form. That means that installed applications will not break when new versions of libxml++ are installed on a user's computer. Gradual improvement of the libxml++ API is still possible via non-breaking API additions, and new independent versions of the ABI that can be installed in parallel with older versions. These are the general techniques and principles followed by the <ulink
  40 url="http://www.gnome.org">GNOME</ulink> project, of which libxml++ is a part.</para>
  41
  42     <sect1>
  43     <title>Installation</title>
  44     <para>libxml++ is packaged by major Linux and *BSD distributions and can be installed from source on Linux and Windows, using any modern compiler, such as g++, Sun Forte, or MSVC++.</para>
  45     <para>For instance, to install libxml++ and its documentation on Debian, use apt-get or Synaptic like so:
  46     <programlisting>
  47     # apt-get install libxml++3.0-dev libxml++3.0-doc
  48     </programlisting>
  49     </para>
  50     <para>To check that you have the libxml++ development packages installed, and that your environment is working properly, try <command>pkg-config libxml++-3.0 --modversion</command>.</para>
  51     <para>Links for downloading and more documentation can be found at <ulink
  52 url="http://libxmlplusplus.sourceforge.net">libxmlplusplus.sourceforge.net</ulink>.
  53     libxml++ is licensed under the LGPL, which allows its use via dynamic linking in both open-source and closed-source software. The underlying libxml2 library uses the even more permissive MIT licence.</para>
  54     </sect1>
  55
  56     <sect1>
  57     <title>UTF-8 and Glib::ustring</title>
  58     <para>The libxml++ API accepts and returns strings in the UTF-8 Unicode encoding, which can represent text in all known languages. UTF-8 was chosen because, of the encodings with this capability, it is the most widely accepted. UTF-8 is a multi-byte encoding, meaning that some characters use more than one byte. For compatibility, however, 7-bit ASCII strings are unchanged when encoded as UTF-8, and UTF-8 strings contain no embedded null bytes that would cause old code to misjudge the length of the string. For these reasons, you can store a UTF-8 string in a std::string object. However, the std::string API operates on that string in terms of bytes rather than characters.</para>
  59     <para>Because Standard C++ has no string class that can fully handle UTF-8, libxml++ uses the Glib::ustring class from the glibmm library. Glib::ustring has almost exactly the same API as std::string, but methods such as length() and operator[] deal with whole UTF-8 characters rather than raw bytes.</para>
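    <para>The difference between bytes and characters can be seen with a short sketch that uses only the standard library. The <function>utf8_length()</function> helper is hypothetical, not part of libxml++ or glibmm, and it assumes that its input is already valid UTF-8:
    <programlisting><![CDATA[
#include <cstddef>
#include <string>

// Hypothetical helper: count the characters (code points) in a UTF-8 string.
// Every code point starts with exactly one byte that is NOT a continuation
// byte (continuation bytes have the bit pattern 10xxxxxx).
// No validation is attempted; the input is assumed to be valid UTF-8.
std::size_t utf8_length(const std::string& text)
{
  std::size_t count = 0;
  for (unsigned char byte : text)
  {
    if ((byte & 0xC0) != 0x80)
      ++count;
  }
  return count;
}
]]></programlisting>
    </para>
    <para>For the word "café" stored as the bytes <literal>"caf\xC3\xA9"</literal>, <literal>std::string::size()</literal> reports 5 because it counts bytes, while the helper reports 4, which is the character-based view that Glib::ustring is described as providing.</para>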
  60     <para>There are implicit conversions between std::string and Glib::ustring, so you can use std::string wherever you see a Glib::ustring in the API, if you really don't care about any locale other than English. However, that is unlikely in today's connected world.</para>
  61     <para>glibmm also provides a useful API for converting between encodings and locales.</para>
  62     </sect1>
  63
  64     <sect1>
  65     <title>Compilation and Linking</title>
  66     <para>To use libxml++ in your application, you must tell the compiler where to find the include headers and where to find the libxml++ library. libxml++ provides a pkg-config .pc file to make this easy. For instance, the following command will provide the necessary compiler options:
  67     <command>pkg-config libxml++-3.0 --cflags --libs</command>
  68     </para>
  69     <para>When using autoconf and automake, this is even easier with the PKG_CHECK_MODULES macro in your configure.ac file. For instance:
  70     <programlisting>
  71     PKG_CHECK_MODULES(SOMEAPP, libxml++-3.0 >= 3.0.0)
  72     AC_SUBST(SOMEAPP_CFLAGS)
  73     AC_SUBST(SOMEAPP_LIBS)
  74     </programlisting>
  75     </para>
  76     </sect1>
  77
  78     </chapter>
  79
  80   <chapter id="chapter-parsers">
  81     <title>Parsers</title>
  82     <para>Like the underlying libxml2 library, libxml++ offers three parsers, depending on your needs: the DOM, SAX, and TextReader parsers. Their relative advantages and behaviour are explained here.</para>
  83     <para>All of the parsers can parse XML documents directly from a file on disk, a string, or a C++ std::istream. Although the libxml++ API uses only Glib::ustring, and therefore the UTF-8 encoding, libxml++ can parse documents in any encoding, converting to UTF-8 automatically. This conversion does not lose any information because UTF-8 can represent every Unicode character.</para>
  84     <para>Remember that white space is usually significant in XML documents, so the parsers might provide unexpected text nodes that contain only spaces and new lines. The parser does not know whether you care about these text nodes, but your application may choose to ignore them.</para>
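    <para>As a minimal sketch of how an application might notice such text nodes, the following function counts the white-space-only text children of the root element. It uses the DOM parser described in the next section and assumes the libxml++ 3.0 API; the function name <function>count_blank_text_children()</function> is made up for this illustration:
    <programlisting><![CDATA[
#include <libxml++/libxml++.h>
#include <cstddef>

// Hypothetical helper: count the direct children of the root element that
// are text nodes containing nothing but white space.
std::size_t count_blank_text_children(const Glib::ustring& xml)
{
  xmlpp::DomParser parser;
  parser.parse_memory(xml);

  std::size_t blank = 0;
  xmlpp::Element* root = parser.get_document()->get_root_node();
  for (xmlpp::Node* child : root->get_children())
  {
    // dynamic_cast yields nullptr for anything that is not a TextNode.
    auto* text = dynamic_cast<xmlpp::TextNode*>(child);
    if (text && text->is_white_space())
      ++blank;
  }
  return blank;
}
]]></programlisting>
    </para>
    <para>With an indented document that has two <literal>&lt;item/&gt;</literal> children, one per line, the root ends up with three such text nodes: one before, one between, and one after the elements. An application that cares only about elements can skip them with the same check.</para>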
  85
  86     <sect1>
  87       <title>DOM Parser</title>
  88       <para>The DOM (Document Object Model) parser parses the whole document at once and stores the structure in memory, available via <methodname>DomParser::get_document()</methodname>. With methods such as <methodname>Document::get_root_node()</methodname> and <methodname>Node::get_children()</methodname>, you may then navigate into the hierarchy of XML nodes without restriction, jumping forwards or backwards in the document based on the information that you encounter. Because the whole document is held in memory, the DOM parser uses a relatively large amount of memory.</para>
  89       <para>You should use C++ RTTI (via <literal>dynamic_cast&lt;&gt;</literal>) to identify the specific node type and to perform actions which are not possible with all node types. For instance, only <classname>Element</classname>s have attributes. Here is the inheritance hierarchy of node types:</para>
  90
  91       <para>
  92       <itemizedlist>
  93       <listitem><para>xmlpp::Node
  94         <itemizedlist>
  95           <listitem><para>xmlpp::Attribute
  96           <itemizedlist>
  97             <listitem><para>xmlpp::AttributeDeclaration</para></listitem>
  98             <listitem><para>xmlpp::AttributeNode</para></listitem>
  99           </itemizedlist>
 100           </para></listitem>
 101           <listitem><para>xmlpp::ContentNode
 102           <itemizedlist>
 103             <listitem><para>xmlpp::CdataNode</para></listitem>
 104             <listitem><para>xmlpp::CommentNode</para></listitem>
 105             <listitem><para>xmlpp::EntityDeclaration</para></listitem>
 106             <listitem><para>xmlpp::ProcessingInstructionNode</para></listitem>
 107             <listitem><para>xmlpp::TextNode</para></listitem>
 108           </itemizedlist>
 109           </para></listitem>
 110           <listitem><para>xmlpp::Element</para></listitem>
 111           <listitem><para>xmlpp::EntityReference</para></listitem>
 112           <listitem><para>xmlpp::XIncludeEnd</para></listitem>
 113           <listitem><para>xmlpp::XIncludeStart</para></listitem>
 114         </itemizedlist>
 115         </para></listitem>
 116
 117       </itemizedlist>
 118     </para>
 119
 120     <para>All <classname>Node</classname>s created by the DOM parser are leaves
 121       in the node type tree. For instance, the DOM parser can create
 122       <classname>TextNode</classname>s and <classname>Element</classname>s, but it
 123       does not create objects whose exact type is <classname>ContentNode</classname>
 124       or <classname>Node</classname>.
 125     </para>
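    <para>The use of <literal>dynamic_cast&lt;&gt;</literal> on parsed nodes might look like the following sketch, which assumes the libxml++ 3.0 API. The function <function>child_element_names()</function> is a hypothetical name chosen for this illustration:
    <programlisting><![CDATA[
#include <libxml++/libxml++.h>
#include <vector>

// Hypothetical helper: parse a document held in memory and return the names
// of the root element's direct child elements, in document order.
std::vector<Glib::ustring> child_element_names(const Glib::ustring& xml)
{
  xmlpp::DomParser parser;
  parser.parse_memory(xml);

  std::vector<Glib::ustring> names;
  xmlpp::Element* root = parser.get_document()->get_root_node();
  for (xmlpp::Node* child : root->get_children())
  {
    // Only Element nodes are kept. For TextNode, CommentNode and the other
    // node types the cast returns nullptr and the child is skipped.
    if (auto* element = dynamic_cast<xmlpp::Element*>(child))
      names.push_back(element->get_name());
  }
  return names;
}
]]></programlisting>
    </para>
    <para>The names are copied into the returned vector, so the result stays usable after the parser, and the document it owns, have gone out of scope.</para>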
 126     <para>Although you may obtain pointers to the <classname>Node</classname>s, these <classname>Node</classname>s are always owned by their parent <classname>Node</classname>. In most cases that means that the <classname>Node</classname> will exist, and your pointer will be valid, as long as the <classname>Document</classname> instance exists.</para>
 127     <para>There are also several methods which can create new child <classname>Node</classname>s. By using these, and one of the <methodname>Document::write_*()</methodname> methods, you can use libxml++ to build a new XML document.</para>
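    <para>Building a small document could be sketched as follows. The sketch assumes the libxml++ 3.0 method names (for instance <methodname>Element::add_child_element()</methodname>), and the element names and content are placeholder values:
    <programlisting><![CDATA[
#include <libxml++/libxml++.h>

// Hypothetical helper: build
//   <library><book id="1">Example title</book></library>
// in memory and return it serialised as a string.
Glib::ustring build_sample_document()
{
  xmlpp::Document document;

  // The returned pointers refer to nodes owned by the document,
  // so they must not be deleted by the caller.
  xmlpp::Element* root = document.create_root_node("library");
  xmlpp::Element* book = root->add_child_element("book");
  book->set_attribute("id", "1");
  book->add_child_text("Example title");

  return document.write_to_string();
}
]]></programlisting>
    </para>
    <para>The returned string also contains the XML declaration. The other <methodname>Document::write_*()</methodname> methods can be used instead when the output should go to a file or a stream.</para>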
 128
 129 <sect2>
 130 <title>Example</title>
 131 <para>This example looks in the document for expected elements and then examines them. All these examples are included in the libxml++ source distribution.</para>
 132 <para><ulink url="&url_examples_base;dom_parser">Source Code</ulink></para>
 133 </sect2>
 134
 135
 136     </sect1>
 137
 138
 139     <sect1>
 140       <title>SAX Parser</title>
 141       <para>The SAX (Simple API for XML) parser presents each node of the XML document in sequence. So when you process one node, you must have already stored information about any relevant previous nodes, and you have no information at that time about subsequent nodes. The SAX parser uses less memory than the DOM parser and it is a suitable abstraction for documents that can be processed sequentially rather than as a whole.</para>
 142
 143       <para>By using the <literal>parse_chunk()</literal> method instead of <literal>parse()</literal>, you can even parse parts of the XML document before you have received the whole document.</para>
 144
 145       <para>As shown in the example, you should derive your own class from SaxParser and override some of the virtual methods. These &quot;handler&quot; methods will be called while the document is parsed.</para>
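      <para>A minimal sketch of such a derived class, assuming the libxml++ 3.0 handler signatures, is shown here. The class <classname>ElementNameCollector</classname> and the function <function>collect_names_in_chunks()</function> are hypothetical names, and the document is fed in two pieces with <literal>parse_chunk()</literal> to imitate data that arrives incrementally:
      <programlisting><![CDATA[
#include <libxml++/libxml++.h>
#include <vector>

// Hypothetical handler class: records the name of every start tag.
class ElementNameCollector : public xmlpp::SaxParser
{
public:
  std::vector<Glib::ustring> names;

protected:
  // Called by the parser each time an opening tag has been read.
  void on_start_element(const Glib::ustring& name,
                        const AttributeList& /* attributes */) override
  {
    names.push_back(name);
  }
};

// Feed a document to the parser in two chunks. The split deliberately
// falls in the middle of a closing tag.
std::vector<Glib::ustring> collect_names_in_chunks()
{
  ElementNameCollector parser;
  parser.parse_chunk("<order><item>pen</it");
  parser.parse_chunk("em><item>ink</item></order>");
  parser.finish_chunk_parsing();
  return parser.names;
}
]]></programlisting>
      </para>
      <para>Only one handler is overridden here. Handlers that are not overridden keep their default behaviour, so a derived class needs to implement only the events it cares about.</para>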
 146
 147 <sect2>
 148 <title>Example</title>
 149 <para>This example shows how the handler methods are called during parsing.</para>
 150 <para><ulink url="&url_examples_base;sax_parser">Source Code</ulink></para>
 151 </sect2>
 152
 153     </sect1>
 154
 155     <sect1>
 156       <title>TextReader Parser</title>
 157       <para>Like the SAX parser, the TextReader parser is suitable for sequential parsing, but instead of implementing handlers for specific parts of the document, it allows you to detect the current node type, process the node accordingly, and skip forward in the document as much as necessary. Unlike the DOM parser, you may not move backwards in the XML document. And unlike the SAX parser, you need not waste time processing nodes that do not interest you.</para>
 158       <para>All methods are on the single parser instance, but their result depends on the current context. For instance, use <literal>read()</literal> to move to the next node, <literal>move_to_first_attribute()</literal> and <literal>move_to_next_attribute()</literal> to step through the attributes of the current element, and <literal>move_to_element()</literal> to return from an attribute to the element that contains it. These methods return false when there is no such node to move to. At each position, use methods such as <literal>get_name()</literal> and <literal>get_value()</literal> to examine the current element or attribute.</para>
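      <para>The following sketch walks a document held in memory and lists every attribute it finds. It assumes the libxml++ 3.0 API, including the <classname>TextReader</classname> constructor that reads from a memory buffer, and the function name <function>list_attributes()</function> is hypothetical:
      <programlisting><![CDATA[
#include <libxml++/libxml++.h>
#include <libxml++/parsers/textreader.h>
#include <string>
#include <vector>

// Hypothetical helper: return one "element@attribute=value" entry for each
// attribute in the document. The buffer behind `xml` must stay alive while
// the reader is in use, which holds here because it is a parameter.
std::vector<Glib::ustring> list_attributes(const std::string& xml)
{
  xmlpp::TextReader reader(
    reinterpret_cast<const unsigned char*>(xml.data()),
    static_cast<xmlpp::TextReader::size_type>(xml.size()));

  std::vector<Glib::ustring> result;
  while (reader.read())                  // advance to the next node
  {
    if (reader.get_attribute_count() <= 0)
      continue;                          // nothing to list on this node

    const Glib::ustring element = reader.get_name();
    reader.move_to_first_attribute();
    do
    {
      result.push_back(element + "@" + reader.get_name() + "=" + reader.get_value());
    } while (reader.move_to_next_attribute());
    reader.move_to_element();            // back to the owning element
  }
  return result;
}
]]></programlisting>
      </para>
      <para>While the reader is positioned on an attribute, <literal>get_name()</literal> and <literal>get_value()</literal> describe that attribute rather than the element, which is why the element name is saved before the attributes are visited.</para>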
 159
 160 <sect2>
 161 <title>Example</title>
 162 <para>This example examines each node in turn, then moves to the next node.</para>
 163 <para><ulink url="&url_examples_base;textreader">Source Code</ulink></para>
 164 </sect2>
 165
 166
 167     </sect1>
 168
 169
 170   </chapter>
 171
 172
 173 </book>