6 Stefan Behnel, Holger Joukl
8 lxml supports an alternative API similar to the Amara_ bindery or
9 gnosis.xml.objectify_ through a custom Element implementation. The main idea
10 is to hide the usage of XML behind normal Python objects, sometimes referred
11 to as data-binding. It allows you to use XML as if you were dealing with a
12 normal Python object hierarchy.
Accessing the children of an XML element uses object attribute access. If
there are multiple children with the same name, slicing and indexing can be
used. Python data types are extracted from XML content automatically and made
available to the normal Python operators.
21 1 The lxml.objectify API
22 1.1 Creating objectify trees
23 1.2 Element access through object attributes
24 1.3 Tree generation with the E-factory
25 1.4 Namespace handling
29 4.1 Recursive tree dump
30 4.2 Recursive string representation of elements
31 5 How data types are matched
33 5.2 XML Schema datatype annotation
34 5.3 The DataElement factory
35 5.4 Defining additional data classes
36 5.5 Advanced element class lookup
37 6 What is different from lxml.etree?
39 .. _Amara: http://uche.ogbuji.net/tech/4suite/amara/
40 .. _gnosis.xml.objectify: http://gnosis.cx/download/
41 .. _`benchmark page`: performance.html#lxml-objectify
44 To set up and use ``objectify``, you need both the ``lxml.etree``
45 module and ``lxml.objectify``:
49 >>> from lxml import etree
50 >>> from lxml import objectify
The objectify API is very different from the ElementTree API. If you use
it, do not mix it with other element implementations (such as trees
parsed with ``lxml.etree``), to avoid non-obvious side effects.
57 The `benchmark page`_ has some hints on performance optimisation of
58 code using lxml.objectify.
To make the doctests in this document look a little nicer, we also use the ``lxml.usedoctest`` module:
65 >>> import lxml.usedoctest
67 Imported from within a doctest, this relieves us from caring about the exact
68 formatting of XML output.
>>> try: from StringIO import StringIO
... except ImportError:
...     from io import BytesIO # Python 3
...     def StringIO(s):
...         if isinstance(s, str): s = s.encode('UTF-8')
...         return BytesIO(s)

>>> import sys
>>> from lxml import etree as _etree
>>> if sys.version_info[0] >= 3:
...     class etree_mock(object):
...         def __getattr__(self, name): return getattr(_etree, name)
...         def tostring(self, *args, **kwargs):
...             s = _etree.tostring(*args, **kwargs)
...             if isinstance(s, bytes) and bytes([10]) in s: s = s.decode("utf-8") # LF
...             if s[-1] == '\n': s = s[:-1]
...             return s
... else:
...     class etree_mock(object):
...         def __getattr__(self, name): return getattr(_etree, name)
...         def tostring(self, *args, **kwargs):
...             s = _etree.tostring(*args, **kwargs)
...             if s[-1] == '\n': s = s[:-1]
...             return s

>>> etree = etree_mock()
99 The lxml.objectify API
100 ======================
102 In ``lxml.objectify``, element trees provide an API that models the behaviour
103 of normal Python object trees as closely as possible.
106 Creating objectify trees
107 ------------------------
As with ``lxml.etree``, you can create an ``objectify`` tree either by
parsing an XML document or by building one from scratch. To parse a
document, just use the ``parse()`` or ``fromstring()`` functions of
the module:
114 .. sourcecode:: pycon
116 >>> fileobject = StringIO('<test/>')
118 >>> tree = objectify.parse(fileobject)
>>> print(isinstance(tree.getroot(), objectify.ObjectifiedElement))
True

>>> root = objectify.fromstring('<test/>')
>>> print(isinstance(root, objectify.ObjectifiedElement))
True
126 To build a new tree in memory, ``objectify`` replicates the standard
127 factory function ``Element()`` from ``lxml.etree``:
129 .. sourcecode:: pycon
131 >>> obj_el = objectify.Element("new")
>>> print(isinstance(obj_el, objectify.ObjectifiedElement))
True
135 After creating such an Element, you can use the `usual API`_ of
136 lxml.etree to add SubElements to the tree:
138 .. sourcecode:: pycon
140 >>> child = etree.SubElement(obj_el, "newchild", attr="value")
142 .. _`usual API`: tutorial.html#the-element-class
144 New subelements will automatically inherit the objectify behaviour
145 from their tree. However, all independent elements that you create
146 through the ``Element()`` factory of lxml.etree (instead of objectify)
147 will not support the ``objectify`` API by themselves:
149 .. sourcecode:: pycon
151 >>> subel = etree.SubElement(obj_el, "sub")
>>> print(isinstance(subel, objectify.ObjectifiedElement))
True

>>> independent_el = etree.Element("new")
>>> print(isinstance(independent_el, objectify.ObjectifiedElement))
False
160 Element access through object attributes
161 ----------------------------------------
163 The main idea behind the ``objectify`` API is to hide XML element access
164 behind the usual object attribute access pattern. Asking an element for an
165 attribute will return the sequence of children with corresponding tag names:
167 .. sourcecode:: pycon
169 >>> root = objectify.Element("root")
170 >>> b = etree.SubElement(root, "b")
>>> print(root.b[0].tag)
b
>>> root.index(root.b[0])
0

>>> b = etree.SubElement(root, "b")
>>> print(root.b[0].tag)
b
>>> print(root.b[1].tag)
b
>>> root.index(root.b[1])
1
183 For convenience, you can omit the index '0' to access the first child:
185 .. sourcecode:: pycon
>>> print(root.b.tag)
b
>>> root.index(root.b)
0
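The lookup rules above can be mimicked in a few lines of plain Python. The following is only a stdlib sketch built on ``xml.etree.ElementTree``, not how lxml.objectify is implemented, and ``BoundElement`` and ``BoundSiblings`` are hypothetical names. Attribute access returns the sequence of same-named children, and any attribute that the sequence does not define itself is forwarded to its first entry, which is what makes the omitted ``[0]`` work:

```python
import xml.etree.ElementTree as ET


class BoundElement:
    """Hypothetical toy wrapper: child elements become object attributes."""

    def __init__(self, element):
        self._el = element

    @property
    def tag(self):
        return self._el.tag

    @property
    def text(self):
        return self._el.text

    def __getattr__(self, name):
        # Only reached for names that are not regular attributes.
        if name.startswith("_"):
            raise AttributeError(name)
        matches = [child for child in self._el if child.tag == name]
        if not matches:
            raise AttributeError("no such child: %s" % name)
        return BoundSiblings(matches)


class BoundSiblings:
    """All children sharing one tag: indexable, and acts like the first one."""

    def __init__(self, elements):
        self._elements = elements

    def __len__(self):
        return len(self._elements)

    def __getitem__(self, index):
        return BoundElement(self._elements[index])

    def __getattr__(self, name):
        # Omitting the index is shorthand for [0].
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(BoundElement(self._elements[0]), name)


root = BoundElement(ET.fromstring("<root><b>1</b><b>2</b></root>"))
print(root.b.tag, root.b[1].text, len(root.b))  # prints: b 2 2
```

Because ``BoundElement`` defines ``tag`` and ``text`` itself, a child element named ``text`` would be shadowed by the property, which is the same kind of name clash that objectify has to work around.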
193 Iteration and slicing also obey the requested tag:
195 .. sourcecode:: pycon
197 >>> x1 = etree.SubElement(root, "x")
198 >>> x2 = etree.SubElement(root, "x")
199 >>> x3 = etree.SubElement(root, "x")
>>> [ el.tag for el in root.x ]
['x', 'x', 'x']

>>> [ el.tag for el in root.x[1:3] ]
['x', 'x']

>>> [ el.tag for el in root.x[-1:] ]
['x']

>>> del root.x[1:2]
>>> [ el.tag for el in root.x ]
['x', 'x']
214 If you want to iterate over all children or need to provide a specific
215 namespace for the tag, use the ``iterchildren()`` method. Like the other
216 methods for iteration, it supports an optional tag keyword argument:
218 .. sourcecode:: pycon
>>> [ el.tag for el in root.iterchildren() ]
['b', 'b', 'x', 'x']

>>> [ el.tag for el in root.iterchildren(tag='b') ]
['b', 'b']

>>> [ el.tag for el in root.b ]
['b', 'b']
229 XML attributes are accessed as in the normal ElementTree API:
231 .. sourcecode:: pycon
233 >>> c = etree.SubElement(root, "c", myattr="someval")
>>> print(root.c.get("myattr"))
someval

>>> root.c.set("c", "oh-oh")
>>> print(root.c.get("c"))
oh-oh
241 In addition to the normal ElementTree API for appending elements to trees,
242 subtrees can also be added by assigning them to object attributes. In this
243 case, the subtree is automatically deep copied and the tag name of its root is
244 updated to match the attribute name:
246 .. sourcecode:: pycon
248 >>> el = objectify.Element("yet_another_child")
249 >>> root.new_child = el
>>> print(root.new_child.tag)
new_child
>>> print(el.tag)
yet_another_child

>>> root.y = [ objectify.Element("y"), objectify.Element("y") ]
>>> [ el.tag for el in root.y ]
['y', 'y']
259 The latter is a short form for operations on the full slice:
261 .. sourcecode:: pycon
263 >>> root.y[:] = [ objectify.Element("y") ]
>>> [ el.tag for el in root.y ]
['y']
267 You can also replace children that way:
269 .. sourcecode:: pycon
271 >>> child1 = etree.SubElement(root, "child")
272 >>> child2 = etree.SubElement(root, "child")
273 >>> child3 = etree.SubElement(root, "child")
275 >>> el = objectify.Element("new_child")
276 >>> subel = etree.SubElement(el, "sub")

>>> root.child = el
>>> print(root.child.sub.tag)
sub

>>> root.child[2] = el
>>> print(root.child[2].sub.tag)
sub
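The assignment semantics can also be sketched with the standard library. ``assign_children()`` below is a hypothetical helper, not part of lxml: it deep-copies the assigned subtree and renames its root. For a single element it replaces only the first same-named child (the ``root.child = el`` example still leaves ``root.child[2]`` in place, which suggests this), and for a list it replaces all same-named children, like the full-slice form:

```python
import copy
import xml.etree.ElementTree as ET


def assign_children(parent, name, value):
    """Hypothetical stand-in for ``setattr(parent, name, value)`` in objectify."""
    old = [child for child in parent if child.tag == name]
    if isinstance(value, (list, tuple)):
        # A list acts on the full slice: all same-named children are replaced.
        replaced, new_items = old, list(value)
    else:
        # A single element replaces only the first same-named child.
        replaced, new_items = old[:1], [value]
    # New children go where the first replaced child was, else at the end.
    position = list(parent).index(replaced[0]) if replaced else len(parent)
    for child in replaced:
        parent.remove(child)
    for offset, item in enumerate(new_items):
        new_child = copy.deepcopy(item)  # the caller's element stays untouched
        new_child.tag = name             # the root tag follows the attribute name
        parent.insert(position + offset, new_child)


root = ET.Element("root")
original = ET.Element("yet_another_child")
assign_children(root, "new_child", original)
print(root[0].tag, original.tag)  # prints: new_child yet_another_child
```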
286 Note that special care must be taken when changing the tag name of an element:
288 .. sourcecode:: pycon
>>> print(root.b.tag)
b
>>> root.b.tag = "notB"
>>> root.b
Traceback (most recent call last):
  ...
AttributeError: no such child: b
>>> print(root.notB.tag)
notB
301 Tree generation with the E-factory
302 ----------------------------------
304 To simplify the generation of trees even further, you can use the E-factory:
306 .. sourcecode:: pycon
>>> E = objectify.E
>>> root = E.root(
...   E.a(5),
...   E.b(6.1),
...   E.c(True),
...   E.d("how", tell="me")
... )

>>> print(etree.tostring(root, pretty_print=True))
<root xmlns:py="http://codespeak.net/lxml/objectify/pytype">
  <a py:pytype="int">5</a>
  <b py:pytype="float">6.1</b>
  <c py:pytype="bool">true</c>
  <d py:pytype="str" tell="me">how</d>
</root>
This also lets you define a vocabulary of tag factories up front and build documents from it:
326 .. sourcecode:: pycon
>>> ROOT = objectify.E.root
>>> TITLE = objectify.E.title
>>> HOWMANY = getattr(objectify.E, "how-many")

>>> root = ROOT(
...   TITLE("The title"),
...   HOWMANY(5)
... )

>>> print(etree.tostring(root, pretty_print=True))
<root xmlns:py="http://codespeak.net/lxml/objectify/pytype">
  <title py:pytype="str">The title</title>
  <how-many py:pytype="int">5</how-many>
</root>
``objectify.E`` is an instance of ``objectify.ElementMaker``. By default, it
creates pytype-annotated Elements without a namespace. You can switch off the
pytype annotation by passing ``False`` to the ``annotate`` keyword argument of
the constructor. You can also pass a default namespace and an ``nsmap``:
348 .. sourcecode:: pycon
350 >>> myE = objectify.ElementMaker(annotate=False,
351 ... namespace="http://my/ns", nsmap={None : "http://my/ns"})
353 >>> root = myE.root( myE.someint(2) )
>>> print(etree.tostring(root, pretty_print=True))
<root xmlns="http://my/ns">
  <someint>2</someint>
</root>
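To show that there is no magic behind tag-named factory methods, here is a hypothetical ``ToyElementMaker`` built on ``xml.etree.ElementTree``. It is a sketch, not lxml's ``ElementMaker``: any attribute name (including ``getattr(E, "how-many")``) becomes a factory, element arguments are appended as children, and other values become the text plus an optional ``py:pytype`` attribute named after the Python type:

```python
import xml.etree.ElementTree as ET

PYTYPE_NS = "http://codespeak.net/lxml/objectify/pytype"


class ToyElementMaker:
    """Hypothetical mini E-factory: ``E.tag(*children_or_value, **attrib)``."""

    def __init__(self, annotate=True):
        self.annotate = annotate

    def __getattr__(self, tag):
        if tag.startswith("__"):  # leave Python's special lookups alone
            raise AttributeError(tag)

        def factory(*content, **attrib):
            element = ET.Element(tag, {k: str(v) for k, v in attrib.items()})
            for item in content:
                if isinstance(item, ET.Element):
                    element.append(item)
                    continue
                # Booleans are written as 'true'/'false', as in the output above.
                text = str(item).lower() if isinstance(item, bool) else str(item)
                element.text = text
                if self.annotate:
                    element.set("{%s}pytype" % PYTYPE_NS, type(item).__name__)
            return element

        return factory


E = ToyElementMaker()
root = E.root(E.a(5), E.b(6.1), E.c(True), E.d("how", tell="me"))
ET.register_namespace("py", PYTYPE_NS)
print(ET.tostring(root, encoding="unicode"))
```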
Namespace handling
------------------

During tag lookups, namespaces are handled mostly behind the scenes.
If you access a child of an Element without specifying a namespace,
the lookup will use the namespace of the parent:
368 .. sourcecode:: pycon
370 >>> root = objectify.Element("{http://ns/}root")
371 >>> b = etree.SubElement(root, "{http://ns/}b")
372 >>> c = etree.SubElement(root, "{http://other/}c")
>>> print(root.b.tag)
{http://ns/}b
377 Note that the ``SubElement()`` factory of ``lxml.etree`` does not
378 inherit any namespaces when creating a new subelement. Element
379 creation must be explicit about the namespace, and is simplified
380 through the E-factory as described above.
382 Lookups, however, inherit namespaces implicitly:
384 .. sourcecode:: pycon
>>> print(root.b.tag)
{http://ns/}b

>>> print(root.c)
Traceback (most recent call last):
  ...
AttributeError: no such child: {http://ns/}c
To access an element in a different namespace from its parent, you can
use ``getattr()``:
397 .. sourcecode:: pycon
399 >>> c = getattr(root, "{http://other/}c")
403 For convenience, there is also a quick way through item access:
405 .. sourcecode:: pycon
407 >>> c = root["{http://other/}c"]
411 The same approach must be used to access children with tag names that are not
412 valid Python identifiers:
414 .. sourcecode:: pycon
416 >>> el = etree.SubElement(root, "{http://ns/}tag-name")
>>> print(root["tag-name"].tag)
{http://ns/}tag-name
420 >>> new_el = objectify.Element("{http://ns/}new-element")
421 >>> el = etree.SubElement(new_el, "{http://ns/}child")
422 >>> el = etree.SubElement(new_el, "{http://ns/}child")
423 >>> el = etree.SubElement(new_el, "{http://ns/}child")
425 >>> root["tag-name"] = [ new_el, new_el ]
>>> print(len(root["tag-name"]))
2
>>> print(root["tag-name"].tag)
{http://ns/}tag-name

>>> print(len(root["tag-name"].child))
3
>>> print(root["tag-name"].child.tag)
{http://ns/}child
>>> print(root["tag-name"][1].child.tag)
{http://ns/}child
438 or for names that have a special meaning in lxml.objectify:
440 .. sourcecode:: pycon
442 >>> root = objectify.XML("<root><text>TEXT</text></root>")
444 >>> print(root.text.text)
445 Traceback (most recent call last):
447 AttributeError: 'NoneType' object has no attribute 'text'
>>> print(root["text"].text)
TEXT
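The lookup rule can be expressed in plain ElementTree terms as well. ``lookup_child()`` below is a hypothetical helper, not an lxml function: an unqualified name is qualified with the parent's namespace, while a name that already carries ``{namespace}`` is used as given. This is why the ``getattr()`` and item-access forms are needed for children in other namespaces or with names that are not Python identifiers:

```python
import xml.etree.ElementTree as ET


def namespace_of(tag):
    """Return the namespace URI of a '{uri}local' tag, or '' if it has none."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def lookup_child(parent, name):
    """Hypothetical helper: first child matching ``name``, inheriting the namespace."""
    if not name.startswith("{"):
        ns = namespace_of(parent.tag)
        if ns:
            name = "{%s}%s" % (ns, name)
    for child in parent:
        if child.tag == name:
            return child
    raise AttributeError("no such child: %s" % name)


root = ET.Element("{http://ns/}root")
ET.SubElement(root, "{http://ns/}b")
ET.SubElement(root, "{http://other/}c")
ET.SubElement(root, "{http://ns/}tag-name")

print(lookup_child(root, "b").tag)                 # prints: {http://ns/}b
print(lookup_child(root, "{http://other/}c").tag)  # prints: {http://other/}c
```

Calling ``lookup_child(root, "c")`` raises ``AttributeError: no such child: {http://ns/}c``, mirroring the objectify traceback shown earlier.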
When dealing with XML documents from different sources, you will often
require them to follow a common schema. In lxml.objectify, this
directly translates to enforcing a specific object tree, i.e. expected
object attributes are guaranteed to be present and to have the expected
type. This can easily be achieved through XML Schema validation at
parse time. Also see the `documentation on validation`_ on this topic.
464 .. _`documentation on validation`: validation.html
466 First of all, we need a parser that knows our schema, so let's say we
467 parse the schema from a file-like object (or file or filename):
469 .. sourcecode:: pycon
>>> f = StringIO('''\
...   <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
...     <xsd:element name="a" type="AType"/>
...     <xsd:complexType name="AType">
...       <xsd:sequence>
...         <xsd:element name="b" type="xsd:string" />
...       </xsd:sequence>
...     </xsd:complexType>
...   </xsd:schema>
... ''')
>>> schema = etree.XMLSchema(file=f)
When creating the validating parser, we must make sure it `returns
objectify trees`_. This is best done with the ``makeparser()``
function:
487 .. sourcecode:: pycon
489 >>> parser = objectify.makeparser(schema = schema)
.. _`returns objectify trees`: #advanced-element-class-lookup
493 Now we can use it to parse a valid document:
495 .. sourcecode:: pycon
497 >>> xml = "<a><b>test</b></a>"
>>> a = objectify.fromstring(xml, parser)
>>> print(a.b)
test
503 Or an invalid document:
505 .. sourcecode:: pycon
507 >>> xml = "<a><b>test</b><c/></a>"
508 >>> a = objectify.fromstring(xml, parser)
509 Traceback (most recent call last):
510 lxml.etree.XMLSyntaxError: Element 'c': This element is not expected.
512 Note that the same works for parse-time DTD validation, except that
513 DTDs do not support any data types by design.
519 For both convenience and speed, objectify supports its own path language,
520 represented by the ``ObjectPath`` class:
522 .. sourcecode:: pycon
524 >>> root = objectify.Element("{http://ns/}root")
525 >>> b1 = etree.SubElement(root, "{http://ns/}b")
526 >>> c = etree.SubElement(b1, "{http://ns/}c")
527 >>> b2 = etree.SubElement(root, "{http://ns/}b")
528 >>> d = etree.SubElement(root, "{http://other/}d")
530 >>> path = objectify.ObjectPath("root.b.c")
>>> path.hasattr(root)
True
>>> print(path.find(root).tag)
{http://ns/}c

>>> find = objectify.ObjectPath("root.b.c")
>>> print(find(root).tag)
{http://ns/}c

>>> find = objectify.ObjectPath("root.{http://other/}d")
>>> print(find(root).tag)
{http://other/}d
546 >>> find = objectify.ObjectPath("root.{not}there")
547 >>> print(find(root).tag)
548 Traceback (most recent call last):
550 AttributeError: no such child: {not}there
552 >>> find = objectify.ObjectPath("{not}there")
553 >>> print(find(root).tag)
554 Traceback (most recent call last):
556 ValueError: root element does not match: need {not}there, got {http://ns/}root
558 >>> find = objectify.ObjectPath("root.b[1]")
>>> print(find(root).tag)
{http://ns/}b

>>> find = objectify.ObjectPath("root.{http://ns/}b[1]")
>>> print(find(root).tag)
{http://ns/}b
566 Apart from strings, ObjectPath also accepts lists of path segments:
568 .. sourcecode:: pycon
570 >>> find = objectify.ObjectPath(['root', 'b', 'c'])
>>> print(find(root).tag)
{http://ns/}c

>>> find = objectify.ObjectPath(['root', '{http://ns/}b[1]'])
>>> print(find(root).tag)
{http://ns/}b
578 You can also use relative paths starting with a '.' to ignore the actual root
579 element and only inherit its namespace:
581 .. sourcecode:: pycon
583 >>> find = objectify.ObjectPath(".b[1]")
>>> print(find(root).tag)
{http://ns/}b

>>> find = objectify.ObjectPath(['', 'b[1]'])
>>> print(find(root).tag)
{http://ns/}b
591 >>> find = objectify.ObjectPath(".unknown[1]")
592 >>> print(find(root).tag)
593 Traceback (most recent call last):
595 AttributeError: no such child: {http://ns/}unknown
597 >>> find = objectify.ObjectPath(".{http://other/}unknown[1]")
598 >>> print(find(root).tag)
599 Traceback (most recent call last):
601 AttributeError: no such child: {http://other/}unknown
603 For convenience, a single dot represents the empty ObjectPath (identity):
605 .. sourcecode:: pycon
607 >>> find = objectify.ObjectPath(".")
>>> print(find(root).tag)
{http://ns/}root
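As a rough model of how such paths resolve, the following stdlib sketch walks an ``xml.etree.ElementTree`` tree. ``find_path()`` and its helpers are hypothetical and much simpler than ``ObjectPath``, which lives inside lxml; they only cover the rules shown above: dotted segments, optional ``{namespace}`` parts, ``[index]`` suffixes, namespace inheritance, a leading dot for relative paths, and lists of segments:

```python
import re
import xml.etree.ElementTree as ET

# One segment: optional '{namespace}', a local name, optional '[index]'.
SEGMENT = re.compile(r"^(\{[^}]*\})?([^\[\]{}]*)(?:\[(\d+)\])?$")


def split_path(path):
    """Split a dotted path, ignoring dots inside '{namespace}' parts."""
    segments, current, in_namespace = [], "", False
    for char in path:
        if char == "{":
            in_namespace = True
        elif char == "}":
            in_namespace = False
        if char == "." and not in_namespace:
            segments.append(current)
            current = ""
        else:
            current += char
    segments.append(current)
    return segments


def namespace_of(tag):
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def parse_segment(segment, inherited_ns):
    """Return (qualified tag, index); unqualified names inherit inherited_ns."""
    match = SEGMENT.match(segment)
    if match is None:
        raise ValueError("invalid path segment: %r" % segment)
    ns, local, index = match.groups()
    if ns is None and inherited_ns:
        ns = "{%s}" % inherited_ns
    return (ns or "") + local, int(index or 0)


def find_path(root, path):
    """Hypothetical ObjectPath-like lookup; a sketch, not lxml's code."""
    segments = split_path(path) if isinstance(path, str) else list(path)
    first, rest = segments[0], segments[1:]
    if first:  # absolute path: the first segment must name the root itself
        tag, _ = parse_segment(first, namespace_of(root.tag))
        if tag != root.tag:
            raise ValueError("root element does not match: need %s, got %s"
                             % (tag, root.tag))
    element = root
    for segment in rest:
        if not segment:  # the empty tail of "." (the identity path)
            continue
        tag, index = parse_segment(segment, namespace_of(element.tag))
        matches = [child for child in element if child.tag == tag]
        if index >= len(matches):
            raise AttributeError("no such child: %s" % tag)
        element = matches[index]
    return element


root = ET.Element("{http://ns/}root")
b1 = ET.SubElement(root, "{http://ns/}b")
c = ET.SubElement(b1, "{http://ns/}c")
b2 = ET.SubElement(root, "{http://ns/}b")
d = ET.SubElement(root, "{http://other/}d")
print(find_path(root, "root.b.c").tag)  # prints: {http://ns/}c
```

The custom ``split_path()`` is needed because namespace URIs such as ``http://www.w3.org/...`` contain dots themselves.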
611 ObjectPath objects can be used to manipulate trees:
613 .. sourcecode:: pycon
615 >>> root = objectify.Element("{http://ns/}root")
617 >>> path = objectify.ObjectPath(".some.child.{http://other/}unknown")
>>> path.hasattr(root)
False
>>> path(root)
Traceback (most recent call last):
  ...
AttributeError: no such child: {http://ns/}some

>>> path.setattr(root, "my value") # creates children as necessary
>>> path.hasattr(root)
True
>>> print(path.find(root).text)
my value
>>> print(root.some.child["{http://other/}unknown"].text)
my value

>>> print(len( path.find(root) ))
1
>>> path.addattr(root, "my new value")
>>> print(len( path.find(root) ))
2
638 >>> [ el.text for el in path.find(root) ]
639 ['my value', 'my new value']
641 As with attribute assignment, ``setattr()`` accepts lists:
643 .. sourcecode:: pycon
645 >>> path.setattr(root, ["v1", "v2", "v3"])
>>> [ el.text for el in path.find(root) ]
['v1', 'v2', 'v3']
Note, however, that indexing is only supported in this context if the children
exist. Indexing non-existent children does not extend or create a list of
such children; it raises an exception instead:
654 .. sourcecode:: pycon
656 >>> path = objectify.ObjectPath(".{non}existing[1]")
657 >>> path.setattr(root, "my value")
658 Traceback (most recent call last):
660 TypeError: creating indexed path attributes is not supported
662 It is worth noting that ObjectPath does not depend on the ``objectify`` module
663 or the ObjectifiedElement implementation. It can also be used in combination
664 with Elements from the normal lxml.etree API.
The objectify module knows about Python data types and tries its best to let
element content behave like them. For example, data elements support the
normal operators of their Python types:
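>>> root = objectify.fromstring(
...   "<root><a>5</a><b>11</b><c>true</c><d>hoi</d></root>")
>>> root.a + root.b
16
>>> root.a += root.b
>>> print(root.a)
16

>>> root.a = 2
>>> print(root.a + 2)
4
>>> print(1 + root.a)
3

>>> print(root.c)
True
>>> root.c = False
>>> if not root.c:
...     print("false!")
false!

>>> print(root.d + " test !")
hoi test !
>>> root.d = "%s - %s"
>>> print(root.d % (1234, 12345))
1234 - 12345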
However, data elements continue to provide the objectify API. This means that
sequence operations such as ``len()``, slicing and indexing (e.g. of strings)
cannot behave as they would for the corresponding Python types. Like all other
tree elements, they show the normal slicing behaviour of objectify elements:
708 .. sourcecode:: pycon
710 >>> root = objectify.fromstring("<root><a>test</a><b>toast</b></root>")
>>> print(root.a + ' me') # behaves like a string, right?
test me
>>> len(root.a) # but there's only one 'a' element!
1
>>> [ a.tag for a in root.a ]
['a']
>>> print(root.a[0].tag)
a

>>> print(root.a)
test
>>> [ str(a) for a in root.a[:1] ]
['test']
If you need to run sequence operations on data types, you must ask the API for
the *real* Python value. The string value is always available through the
normal ElementTree ``.text`` attribute. Additionally, all data classes
provide a ``.pyval`` attribute that returns the value as a plain Python type:
730 .. sourcecode:: pycon
732 >>> root = objectify.fromstring("<root><a>test</a><b>5</b></root>")
743 Note, however, that both attributes are read-only in objectify. If you want
744 to change values, just assign them directly to the attribute:
746 .. sourcecode:: pycon
748 >>> root.a.text = "25"
749 Traceback (most recent call last):
751 TypeError: attribute 'text' of 'StringElement' objects is not writable
753 >>> root.a.pyval = 25
754 Traceback (most recent call last):
756 TypeError: attribute 'pyval' of 'StringElement' objects is not writable

>>> root.a = 25
>>> print(root.a)
25
>>> print(root.a.pyval)
25
764 In other words, ``objectify`` data elements behave like immutable Python
765 types. You can replace them, but not modify them.
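Read-only value objects are a common Python pattern for this kind of behaviour. The frozen dataclass below is a hypothetical illustration, not an objectify class: ``text`` cannot be reassigned and ``pyval`` is a computed property without a setter, so the only way to get a different value is to create a new object, much like assigning a new value to ``root.a``:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ToyIntValue:
    """Hypothetical immutable data value; not part of lxml.objectify."""

    text: str

    @property
    def pyval(self):
        # Derived on demand from the stored string; there is no setter.
        return int(self.text)

    def __add__(self, other):
        other_value = other.pyval if isinstance(other, ToyIntValue) else other
        return self.pyval + other_value

    __radd__ = __add__


value = ToyIntValue("5")
print(value + 2, 1 + value)  # prints: 7 6

try:
    value.text = "25"
except FrozenInstanceError:
    print("text is read-only")

value = ToyIntValue("25")  # replace the object instead of modifying it
print(value.pyval)  # prints: 25
```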
Recursive tree dump
-------------------

To see the data types that are currently used, you can call the module-level
``dump()`` function that returns a recursive string representation for
elements:
>>> root = objectify.fromstring("""
... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...   <a attr1="foo" attr2="bar">1</a>
...   <a>1.2</a>
...   <b>1</b>
...   <b>true</b>
...   <c>what?</c>
...   <d xsi:nil="true"/>
... </root>
... """)

>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    a = 1 [IntElement]
      * attr1 = 'foo'
      * attr2 = 'bar'
    a = 1.2 [FloatElement]
    b = 1 [IntElement]
    b = True [BoolElement]
    c = 'what?' [StringElement]
    d = None [NoneElement]
      * xsi:nil = 'true'
800 You can freely switch between different types for the same child:
802 .. sourcecode:: pycon
>>> root = objectify.fromstring("<root><a>5</a></root>")
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    a = 5 [IntElement]

>>> root.a = 'nice string!'
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    a = 'nice string!' [StringElement]
      * py:pytype = 'str'

>>> root.a = True
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    a = True [BoolElement]
      * py:pytype = 'bool'

>>> root.a = [1, 2, 3]
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    a = 1 [IntElement]
      * py:pytype = 'int'
    a = 2 [IntElement]
      * py:pytype = 'int'
    a = 3 [IntElement]
      * py:pytype = 'int'

>>> root.a = (1, 2, 3)
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    a = 1 [IntElement]
      * py:pytype = 'int'
    a = 2 [IntElement]
      * py:pytype = 'int'
    a = 3 [IntElement]
      * py:pytype = 'int'
842 Recursive string representation of elements
843 -------------------------------------------
845 Normally, elements use the standard string representation for str() that is
846 provided by lxml.etree. You can enable a pretty-print representation for
847 objectify elements like this:
849 .. sourcecode:: pycon
851 >>> objectify.enable_recursive_str()
>>> root = objectify.fromstring("""
... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...   <a attr1="foo" attr2="bar">1</a>
...   <a>1.2</a>
...   <b>1</b>
...   <b>true</b>
...   <c>what?</c>
...   <d xsi:nil="true"/>
... </root>
... """)

>>> print(str(root))
root = None [ObjectifiedElement]
    a = 1 [IntElement]
      * attr1 = 'foo'
      * attr2 = 'bar'
    a = 1.2 [FloatElement]
    b = 1 [IntElement]
    b = True [BoolElement]
    c = 'what?' [StringElement]
    d = None [NoneElement]
      * xsi:nil = 'true'
876 This behaviour can be switched off in the same way:
878 .. sourcecode:: pycon
880 >>> objectify.enable_recursive_str(False)
883 How data types are matched
884 ==========================
Objectify uses two different types of Elements. Structural Elements (or tree
Elements) represent the object tree structure. Data Elements represent the
data containers at the leaves. You can explicitly create tree Elements with
the ``objectify.Element()`` factory and data Elements with the
``objectify.DataElement()`` factory.
892 When Element objects are created, lxml.objectify must determine which
893 implementation class to use for them. This is relatively easy for tree
894 Elements and less so for data Elements. The algorithm is as follows:
1. If an element has children, use the default tree class.

2. If an element is defined as xsi:nil, use the NoneElement class.

3. If a "Python type hint" attribute is given, use this to determine the element
   class (see below).

4. If an XML Schema xsi:type hint is given, use this to determine the element
   class (see below).

5. Try to determine the element class from the text content type by trial and
   error.

6. If the element is a root node, use the default tree class.

7. Otherwise, use the default class for empty data classes.
913 You can change the default classes for tree Elements and empty data Elements
914 at setup time. The ``ObjectifyElementClassLookup()`` call accepts two keyword
915 arguments, ``tree_class`` and ``empty_data_class``, that determine the Element
916 classes used in these cases. By default, ``tree_class`` is a class called
917 ``ObjectifiedElement`` and ``empty_data_class`` is a ``StringElement``.
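The seven rules can be traced with a small stdlib sketch. ``element_class()`` below is hypothetical and returns class *names* instead of classes; its type tables are deliberately tiny, the int, float, bool, str order used for guessing is an assumption, it assumes the ``xsd`` prefix for ``xsi:type`` values, and it simply falls through when a hint is unknown. It approximates rather than reproduces lxml's lookup:

```python
import xml.etree.ElementTree as ET

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"
PYTYPE = "{http://codespeak.net/lxml/objectify/pytype}pytype"

# Deliberately tiny, hypothetical lookup tables.
PYTYPE_CLASSES = {"int": "IntElement", "float": "FloatElement",
                  "bool": "BoolElement", "str": "StringElement",
                  "NoneType": "NoneElement"}
XSI_CLASSES = {"xsd:int": "IntElement", "xsd:integer": "IntElement",
               "xsd:double": "FloatElement", "xsd:boolean": "BoolElement",
               "xsd:string": "StringElement"}


def guess_from_text(text):
    """Rule 5: try conversions in a fixed (assumed) order; first success wins."""
    for class_name, convert in (("IntElement", int), ("FloatElement", float)):
        try:
            convert(text)
            return class_name
        except ValueError:
            pass
    if text in ("true", "false"):
        return "BoolElement"
    return "StringElement"


def element_class(el, is_root=False, tree_class="ObjectifiedElement",
                  empty_data_class="StringElement"):
    """Return the class name that rules 1-7 would pick (a sketch)."""
    if len(el):                                   # 1. has children
        return tree_class
    if el.get(XSI + "nil") == "true":             # 2. xsi:nil
        return "NoneElement"
    if el.get(PYTYPE) in PYTYPE_CLASSES:          # 3. py:pytype hint
        return PYTYPE_CLASSES[el.get(PYTYPE)]
    if el.get(XSI + "type") in XSI_CLASSES:       # 4. xsi:type hint
        return XSI_CLASSES[el.get(XSI + "type")]
    if el.text:                                   # 5. guess from the text
        return guess_from_text(el.text)
    if is_root:                                   # 6. empty root node
        return tree_class
    return empty_data_class                       # 7. empty data element


for xml in ("<a>1</a>", "<a>1.2</a>", "<b>true</b>", "<c>what?</c>", "<e/>"):
    print(xml, element_class(ET.fromstring(xml)))
```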
The "type hint" mechanism uses an XML attribute defined as
``lxml.objectify.PYTYPE_ATTRIBUTE``. It may contain any of the following
string values: int, long, float, str, unicode, NoneType:
927 .. sourcecode:: pycon
929 >>> print(objectify.PYTYPE_ATTRIBUTE)
930 {http://codespeak.net/lxml/objectify/pytype}pytype
931 >>> ns, name = objectify.PYTYPE_ATTRIBUTE[1:].split('}')
933 >>> root = objectify.fromstring("""\
934 ... <root xmlns:py='%s'>
935 ... <a py:pytype='str'>5</a>
936 ... <b py:pytype='int'>5</b>
...   <c py:pytype='NoneType' />
... </root>
... """ % ns)

>>> print(root.a + 10)
510
>>> print(root.b + 10)
15
>>> print(root.c)
None
948 Note that you can change the name and namespace used for this
949 attribute through the ``set_pytype_attribute_tag(tag)`` module
950 function, in case your application ever needs to. There is also a
951 utility function ``annotate()`` that recursively generates this
952 attribute for the elements of a tree:
954 .. sourcecode:: pycon
956 >>> root = objectify.fromstring("<root><a>test</a><b>5</b></root>")
957 >>> print(objectify.dump(root))
958 root = None [ObjectifiedElement]
    a = 'test' [StringElement]
    b = 5 [IntElement]

>>> objectify.annotate(root)

>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    a = 'test' [StringElement]
      * py:pytype = 'str'
    b = 5 [IntElement]
      * py:pytype = 'int'
972 XML Schema datatype annotation
973 ------------------------------
A second way of specifying data type information uses XML Schema types as
element annotations. Objectify knows those that can be mapped to normal
Python types:
979 .. sourcecode:: pycon
981 >>> root = objectify.fromstring('''\
982 ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
983 ... xmlns:xsd="http://www.w3.org/2001/XMLSchema">
984 ... <d xsi:type="xsd:double">5</d>
985 ... <i xsi:type="xsd:int" >5</i>
...   <s xsi:type="xsd:string">5</s>
... </root>
... ''')
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    d = 5.0 [FloatElement]
      * xsi:type = 'xsd:double'
    i = 5 [IntElement]
      * xsi:type = 'xsd:int'
    s = '5' [StringElement]
      * xsi:type = 'xsd:string'
998 Again, there is a utility function ``xsiannotate()`` that recursively
999 generates the "xsi:type" attribute for the elements of a tree:
1001 .. sourcecode:: pycon
1003 >>> root = objectify.fromstring('''\
... <root><a>test</a><b>5</b><c>true</c></root>
... ''')
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    a = 'test' [StringElement]
    b = 5 [IntElement]
    c = True [BoolElement]

>>> objectify.xsiannotate(root)

>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    a = 'test' [StringElement]
      * xsi:type = 'xsd:string'
    b = 5 [IntElement]
      * xsi:type = 'xsd:integer'
    c = True [BoolElement]
      * xsi:type = 'xsd:boolean'
Note, however, that ``xsiannotate()`` will always use the first XML Schema
datatype that is defined for any given Python type; see also
`Defining additional data classes`_.
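The effect of "first type wins" can be seen with a hypothetical registry in which one Python type is associated with several XML Schema types. This is an illustration only; the table below is made up and much smaller than objectify's own. Reading maps every listed schema type to its Python type, but writing always picks the first entry, so in this toy an ``xsd:int`` annotation comes back as ``xsd:integer`` after a round trip:

```python
# Hypothetical registry: Python type name -> XML Schema types, in
# registration order. Only the first entry is used when annotating.
XSD_TYPES = {
    "int": ["integer", "int", "short"],
    "float": ["double", "float"],
    "bool": ["boolean"],
    "str": ["string", "normalizedString"],
}

# Reading is many-to-one: every registered schema type maps back.
PYTHON_TYPE_FOR = {xsd: py for py, names in XSD_TYPES.items() for xsd in names}


def xsi_type_for(value, prefix="xsd"):
    """Annotation chosen for a Python value: always the first registered type."""
    return "%s:%s" % (prefix, XSD_TYPES[type(value).__name__][0])


def round_trip(xsi_type):
    """Read an 'xsd:...' annotation, then write it back out."""
    local = xsi_type.split(":", 1)[1]
    python_type = PYTHON_TYPE_FOR[local]
    sample = {"int": 0, "float": 0.0, "bool": False, "str": ""}[python_type]
    return xsi_type_for(sample)


print(round_trip("xsd:int"))  # prints: xsd:integer
```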
1027 The utility function ``deannotate()`` can be used to get rid of 'py:pytype'
1028 and/or 'xsi:type' information:
1030 .. sourcecode:: pycon
1032 >>> root = objectify.fromstring('''\
1033 ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
1034 ... xmlns:xsd="http://www.w3.org/2001/XMLSchema">
1035 ... <d xsi:type="xsd:double">5</d>
1036 ... <i xsi:type="xsd:int" >5</i>
...   <s xsi:type="xsd:string">5</s>
... </root>''')
>>> objectify.annotate(root)
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    d = 5.0 [FloatElement]
      * xsi:type = 'xsd:double'
      * py:pytype = 'float'
    i = 5 [IntElement]
      * xsi:type = 'xsd:int'
      * py:pytype = 'int'
    s = '5' [StringElement]
      * xsi:type = 'xsd:string'
      * py:pytype = 'str'
>>> objectify.deannotate(root)
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    d = 5.0 [FloatElement]
    i = 5 [IntElement]
    s = '5' [StringElement]
You can control which type attributes should be de-annotated with the keyword
arguments 'pytype' (default: True) and 'xsi' (default: True).
``deannotate()`` can also remove 'xsi:nil' attributes by setting 'xsi_nil=True'
(default: False):
1063 .. sourcecode:: pycon
1065 >>> root = objectify.fromstring('''\
1066 ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
1067 ... xmlns:xsd="http://www.w3.org/2001/XMLSchema">
1068 ... <d xsi:type="xsd:double">5</d>
1069 ... <i xsi:type="xsd:int" >5</i>
1070 ... <s xsi:type="xsd:string">5</s>
...   <n xsi:nil="true"/>
... </root>''')
>>> objectify.annotate(root)
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    d = 5.0 [FloatElement]
      * xsi:type = 'xsd:double'
      * py:pytype = 'float'
    i = 5 [IntElement]
      * xsi:type = 'xsd:int'
      * py:pytype = 'int'
    s = '5' [StringElement]
      * xsi:type = 'xsd:string'
      * py:pytype = 'str'
    n = None [NoneElement]
      * xsi:nil = 'true'
      * py:pytype = 'NoneType'
>>> objectify.deannotate(root, xsi_nil=True)
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    d = 5.0 [FloatElement]
    i = 5 [IntElement]
    s = '5' [StringElement]
    n = u'' [StringElement]
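The attribute clean-up itself is simple tree surgery. Here is a hypothetical stdlib equivalent, ``strip_annotations()``, that uses the same keyword defaults as described above (``pytype=True``, ``xsi=True``, ``xsi_nil=False``). It is a sketch, not lxml's ``deannotate()``: it only removes attributes, since plain ElementTree has no element classes that would need to be re-evaluated afterwards:

```python
import xml.etree.ElementTree as ET

PYTYPE_ATTR = "{http://codespeak.net/lxml/objectify/pytype}pytype"
XSI_TYPE_ATTR = "{http://www.w3.org/2001/XMLSchema-instance}type"
XSI_NIL_ATTR = "{http://www.w3.org/2001/XMLSchema-instance}nil"


def strip_annotations(root, pytype=True, xsi=True, xsi_nil=False):
    """Hypothetical deannotate() look-alike working on an ElementTree tree."""
    doomed = [name for name, wanted in ((PYTYPE_ATTR, pytype),
                                        (XSI_TYPE_ATTR, xsi),
                                        (XSI_NIL_ATTR, xsi_nil)) if wanted]
    for element in root.iter():        # root and all of its descendants
        for name in doomed:
            element.attrib.pop(name, None)
    return root


doc = ET.fromstring(
    '<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' xmlns:py="http://codespeak.net/lxml/objectify/pytype">'
    '<d xsi:type="xsd:double" py:pytype="float">5</d>'
    '<n xsi:nil="true" py:pytype="NoneType"/>'
    '</root>')
strip_annotations(doc)
print(doc.find("d").attrib, doc.find("n").attrib)
```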
1096 The DataElement factory
1097 -----------------------
For convenience, the ``DataElement()`` factory creates an Element with a
Python value in one step. You can pass the required Python type name or the
XSI type name:
1105 >>> root = objectify.Element("root")
1106 >>> root.x = objectify.DataElement(5, _pytype="int")
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    x = 5 [IntElement]
      * py:pytype = 'int'

>>> root.x = objectify.DataElement(5, _pytype="str", myattr="someval")
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    x = '5' [StringElement]
      * py:pytype = 'str'
      * myattr = 'someval'

>>> root.x = objectify.DataElement(5, _xsi="integer")
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    x = 5 [IntElement]
      * py:pytype = 'int'
      * xsi:type = 'xsd:integer'
XML Schema types reside in the XML Schema namespace, so ``DataElement()``
tries to correctly prefix the xsi:type attribute value for you:
1129 .. sourcecode:: pycon
1131 >>> root = objectify.Element("root")
1132 >>> root.s = objectify.DataElement(5, _xsi="string")
1134 >>> objectify.deannotate(root, xsi=False)
1135 >>> print(etree.tostring(root, pretty_print=True))
1136 <root xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <s xsi:type="xsd:string">5</s>
</root>
1140 ``DataElement()`` uses a default nsmap to set these prefixes:
1142 .. sourcecode:: pycon
1144 >>> el = objectify.DataElement('5', _xsi='string')
1145 >>> namespaces = list(el.nsmap.items())
1146 >>> namespaces.sort()
1147 >>> for prefix, namespace in namespaces:
1148 ... print("%s - %s" % (prefix, namespace))
1149 py - http://codespeak.net/lxml/objectify/pytype
1150 xsd - http://www.w3.org/2001/XMLSchema
1151 xsi - http://www.w3.org/2001/XMLSchema-instance
>>> print(el.get("{http://www.w3.org/2001/XMLSchema-instance}type"))
xsd:string
1156 While you can set custom namespace prefixes, it is necessary to provide valid
1157 namespace information if you choose to do so:
1159 .. sourcecode:: pycon
1161 >>> el = objectify.DataElement('5', _xsi='foo:string',
1162 ... nsmap={'foo': 'http://www.w3.org/2001/XMLSchema'})
1163 >>> namespaces = list(el.nsmap.items())
1164 >>> namespaces.sort()
1165 >>> for prefix, namespace in namespaces:
1166 ... print("%s - %s" % (prefix, namespace))
1167 foo - http://www.w3.org/2001/XMLSchema
1168 py - http://codespeak.net/lxml/objectify/pytype
1169 xsi - http://www.w3.org/2001/XMLSchema-instance
>>> print(el.get("{http://www.w3.org/2001/XMLSchema-instance}type"))
foo:string
1174 Note how lxml chose a default prefix for the XML Schema Instance
1175 namespace. We can override it as in the following example:
1177 .. sourcecode:: pycon
1179 >>> el = objectify.DataElement('5', _xsi='foo:string',
1180 ... nsmap={'foo': 'http://www.w3.org/2001/XMLSchema',
1181 ... 'myxsi': 'http://www.w3.org/2001/XMLSchema-instance'})
1182 >>> namespaces = list(el.nsmap.items())
1183 >>> namespaces.sort()
1184 >>> for prefix, namespace in namespaces:
1185 ... print("%s - %s" % (prefix, namespace))
1186 foo - http://www.w3.org/2001/XMLSchema
1187 myxsi - http://www.w3.org/2001/XMLSchema-instance
1188 py - http://codespeak.net/lxml/objectify/pytype
>>> print(el.get("{http://www.w3.org/2001/XMLSchema-instance}type"))
foo:string
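The prefixing rule can be approximated as a small pure function. ``qualify_xsi_type()`` is hypothetical and only models the decisions described here: an explicit prefix must be bound to the XML Schema namespace, and an unprefixed type name gets whichever prefix ``nsmap`` already uses for that namespace, falling back to ``xsd``. Raising ``ValueError`` for an unbound prefix is this sketch's own choice, not documented lxml behaviour:

```python
XSD_NS = "http://www.w3.org/2001/XMLSchema"


def qualify_xsi_type(xsi, nsmap=None):
    """Return (prefixed xsi:type value, nsmap to declare); hypothetical sketch."""
    nsmap = dict(nsmap or {})
    if ":" in xsi:
        prefix = xsi.split(":", 1)[0]
        if nsmap.get(prefix) != XSD_NS:
            raise ValueError("prefix %r is not bound to %s" % (prefix, XSD_NS))
        return xsi, nsmap
    for prefix, uri in nsmap.items():
        if prefix and uri == XSD_NS:  # reuse an existing, non-default prefix
            return "%s:%s" % (prefix, xsi), nsmap
    # Conventional fallback; note that this sketch overwrites any other
    # binding of 'xsd' in the given nsmap.
    nsmap["xsd"] = XSD_NS
    return "xsd:" + xsi, nsmap


print(qualify_xsi_type("string")[0])                       # prints: xsd:string
print(qualify_xsi_type("foo:string", {"foo": XSD_NS})[0])  # prints: foo:string
```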
1193 Care must be taken if different namespace prefixes have been used for the same
1194 namespace. Namespace information gets merged to avoid duplicate definitions
1195 when adding a new sub-element to a tree, but this mechanism does not adapt the
1196 prefixes of attribute values:
1198 .. sourcecode:: pycon
1200 >>> root = objectify.fromstring("""<root xmlns:schema="http://www.w3.org/2001/XMLSchema"/>""")
1201 >>> print(etree.tostring(root, pretty_print=True))
1202 <root xmlns:schema="http://www.w3.org/2001/XMLSchema"/>
1204 >>> s = objectify.DataElement("17", _xsi="string")
1205 >>> print(etree.tostring(s, pretty_print=True))
1206 <value xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="xsd:string">17</value>

>>> root.s = s
>>> print(etree.tostring(root, pretty_print=True))
<root xmlns:schema="http://www.w3.org/2001/XMLSchema">
  <s xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="xsd:string">17</s>
</root>
It is your responsibility to fix the prefixes of attribute values if you
choose to deviate from the standard prefixes. A convenient way to do this for
``xsi:type`` attributes is to use the ``xsiannotate()`` utility:

.. sourcecode:: pycon

    >>> objectify.xsiannotate(root)
    >>> print(etree.tostring(root, pretty_print=True))
    <root xmlns:schema="http://www.w3.org/2001/XMLSchema">
      <s xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="schema:string">17</s>
    </root>

In general, you should avoid using different prefixes for one and the same
namespace when building up an objectify tree.


Defining additional data classes
--------------------------------

You can plug additional data classes into objectify that will be used in
exactly the same way as the predefined types. Data classes can either inherit
from ``ObjectifiedDataElement`` directly or from one of the specialised
classes like ``NumberElement`` or ``BoolElement``. The numeric types require
an initial call to the NumberElement method ``self._setValueParser(function)``
to set their type conversion function (string -> numeric Python type). This
call should be placed in the element's ``_init()`` method.

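
As an illustration, the following minimal sketch defines a numeric data class for hexadecimal literals. The names ``HexElement`` and ``check_hex`` and the type name ``"hexint"`` are hypothetical. The parser is installed in ``_init()`` as described above. The type is registered before the built-in ``int`` type and unregistered again at the end:

.. sourcecode:: python

    from lxml import objectify


    class HexElement(objectify.NumberElement):
        """Hypothetical data class for hexadecimal literals such as '0x1F'."""

        def _init(self):
            # Called by lxml when the element proxy is created: install the
            # string -> int conversion used by .pyval and by arithmetic.
            self._setValueParser(lambda text: int(text, 16))


    def check_hex(text):
        # Hypothetical type check: raise ValueError unless the text is '0x...'.
        if not text.lower().startswith("0x"):
            raise ValueError("not a hexadecimal literal")
        int(text, 16)  # also raises ValueError for malformed digits


    hex_type = objectify.PyType("hexint", check_hex, HexElement)
    hex_type.register(before=["int"])  # consult this check before 'int'

    root = objectify.fromstring("<root><mask>0x1F</mask><count>7</count></root>")
    mask_class = type(root.mask)
    mask_value = root.mask.pyval    # parsed by the hex parser
    mask_plus_one = root.mask + 1   # arithmetic uses the same parser
    count_class = type(root.count)  # decimal text still maps to the int type

    hex_type.unregister()  # restore the default set of type checks
    print(mask_class.__name__, mask_value, mask_plus_one, count_class.__name__)
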
The registration of data classes uses the ``PyType`` class:

.. sourcecode:: pycon

    >>> class ChristmasDate(objectify.ObjectifiedDataElement):
    ...     def call_santa(self):
    ...         print("Ho ho ho!")

    >>> def checkChristmasDate(date_string):
    ...     if not date_string.startswith('24.12.'):
    ...         raise ValueError # or TypeError

    >>> xmas_type = objectify.PyType('date', checkChristmasDate, ChristmasDate)

The PyType constructor takes a string type name, an (optional) callable type
check and the custom data class. If a type check is provided, it must accept a
string as argument and raise ValueError or TypeError if it cannot handle the
string value.

PyTypes are used if an element carries a ``py:pytype`` attribute denoting its
data type or, in the absence of such an attribute, if the given type check
callable does not raise a ValueError/TypeError exception when applied to the
element text.

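
The following sketch illustrates both rules. It registers a hypothetical ``Celsius`` data class without a type check, so this class can only be selected through a ``py:pytype`` attribute. An element with the same text but no such attribute falls back to the built-in type checks:

.. sourcecode:: python

    from lxml import objectify

    PYTYPE_NS = "http://codespeak.net/lxml/objectify/pytype"


    class Celsius(objectify.ObjectifiedDataElement):
        """Hypothetical data class selected only via py:pytype="celsius"."""

        def describe(self):
            return "%s degrees Celsius" % self.text


    celsius_type = objectify.PyType("celsius", None, Celsius)  # no type check
    celsius_type.register()

    root = objectify.fromstring(
        '<root xmlns:py="%s"><t py:pytype="celsius">21</t><u>21</u></root>'
        % PYTYPE_NS)
    hinted = type(root.t)           # chosen by the py:pytype attribute
    description = root.t.describe()
    guessed = type(root.u)          # no attribute: the 'int' type check wins

    celsius_type.unregister()
    print(hinted.__name__, description, guessed.__name__)
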
If you want, you can also register this type under an XML Schema type name:

.. sourcecode:: pycon

    >>> xmas_type.xmlSchemaTypes = ("date",)

XML Schema types will be considered if the element has an ``xsi:type``
attribute that specifies its data type. The line above binds the XSD type
``date`` to the newly defined Python type. Note that this must be done before
the next step, which is to register the type. Then you can use it:

.. sourcecode:: pycon

    >>> xmas_type.register()

    >>> root = objectify.fromstring(
    ...     "<root><a>24.12.2000</a><b>12.24.2000</b></root>")
    >>> root.a.call_santa()
    Ho ho ho!
    >>> root.b.call_santa()
    Traceback (most recent call last):
      ...
    AttributeError: no such child: call_santa

If you need to specify dependencies between the type check functions, you can
pass a sequence of type names through the ``before`` and ``after`` keyword
arguments of the ``register()`` method. The PyType will then try to register
itself before or after the respective types, as long as they are currently
registered. Note that this only impacts the currently registered types at the
time of registration. Types that are registered later on will not care about
the dependencies of already registered types.

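
The following minimal sketch shows the ordering effect. It uses a hypothetical ``Port`` data class whose type check also accepts plain integers. When registered ``after`` the built-in ``int`` type, the class is never reached for numeric text. When registered ``before`` it, the class takes precedence:

.. sourcecode:: python

    from lxml import objectify


    class Port(objectify.ObjectifiedDataElement):
        """Hypothetical data class for TCP port numbers."""


    def check_port(text):
        # int() raises ValueError for non-numeric text; also reject values
        # outside the port range.
        if not 0 <= int(text) <= 65535:
            raise ValueError("not a valid port number")


    port_type = objectify.PyType("port", check_port, Port)
    xml = "<config><p>8080</p></config>"  # made-up sample document

    port_type.register(after=["int"])   # the 'int' check is consulted first
    late = type(objectify.fromstring(xml).p).__name__
    port_type.unregister()

    port_type.register(before=["int"])  # the 'port' check is consulted first
    early = type(objectify.fromstring(xml).p).__name__
    port_type.unregister()

    print(late, early)
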
If you provide XML Schema type information, this will override the type check
function defined above:

.. sourcecode:: pycon

    >>> root = objectify.fromstring('''\
    ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    ...   <a xsi:type="date">12.24.2000</a>
    ... </root>
    ... ''')
    >>> root.a.call_santa()
    Ho ho ho!

To unregister a type, call its ``unregister()`` method:

.. sourcecode:: pycon

    >>> root.a.call_santa()
    Ho ho ho!
    >>> xmas_type.unregister()
    >>> root.a.call_santa()
    Traceback (most recent call last):
      ...
    AttributeError: no such child: call_santa

Be aware, though, that this does not immediately apply to elements to which
there already is a Python reference. Their Python class will only be changed
after all references are gone and the Python object is garbage collected.

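
The following sketch shows this behaviour with a hypothetical ``Flag`` data class. It assumes CPython, where dropping the last reference frees the element proxy immediately. On other interpreters the class switch may happen later:

.. sourcecode:: python

    from lxml import objectify


    class Flag(objectify.ObjectifiedDataElement):
        """Hypothetical data class for the values 'red' and 'green'."""


    def check_flag(text):
        # Hypothetical type check: only accept two colour names.
        if text not in ("red", "green"):
            raise ValueError(text)


    flag_type = objectify.PyType("flag", check_flag, Flag)
    flag_type.register()

    root = objectify.fromstring("<root><f>red</f></root>")
    kept = root.f                     # hold a Python reference to the proxy
    flag_type.unregister()

    before_gc = type(kept).__name__   # the existing proxy keeps its class
    del kept                          # last reference gone (CPython frees it)
    after_gc = type(root.f).__name__  # a new proxy is created and looked up
    print(before_gc, after_gc)
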

Advanced element class lookup
-----------------------------

In some cases, the normal data class setup is not enough. Being based
on ``lxml.etree``, however, ``lxml.objectify`` supports very
fine-grained control over the Element classes used in a tree. All you
have to do is configure a different `class lookup`_ mechanism (or
write one yourself).

.. _`class lookup`: element_classes.html

The first step for the setup is to create a new parser that builds
objectify documents. The objectify API is meant for data-centric XML
(as opposed to document XML with mixed content). Therefore, we
configure the parser to remove whitespace-only text that merely
separates elements (such as indentation) from the parsed document.
Note that this alters the document infoset, so if you consider the
removed spaces as data in your specific use case, you should go with a
normal parser and just set the element class lookup. Most applications,
however, will work fine with the following setup:

.. sourcecode:: pycon

    >>> parser = objectify.makeparser(remove_blank_text=True)

What this does internally is:

.. sourcecode:: pycon

    >>> parser = etree.XMLParser(remove_blank_text=True)

    >>> lookup = objectify.ObjectifyElementClassLookup()
    >>> parser.set_element_class_lookup(lookup)

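
To see the effect of ``remove_blank_text``, the following sketch parses the same indented document twice. It first uses the objectify parser, then a plain ``XMLParser`` that has only the objectify lookup set. The sample XML is made up for illustration:

.. sourcecode:: python

    from lxml import etree, objectify

    xml = "<root>\n  <a>1</a>\n</root>"  # made-up indented sample document

    # Objectify parser: ignorable indentation between elements is dropped.
    parser = objectify.makeparser(remove_blank_text=True)
    compact = objectify.fromstring(xml, parser)

    # Plain parser with the objectify lookup: the whitespace is kept as text.
    plain = etree.XMLParser()
    plain.set_element_class_lookup(objectify.ObjectifyElementClassLookup())
    verbose = etree.fromstring(xml, plain)

    print(etree.tostring(compact))        # serialised without indentation
    print(repr(verbose.text))             # indentation before <a> survives
    print(compact.a + 1, verbose.a + 1)   # both still expose <a> as a number

In both cases the objectify API works the same way. Only the whitespace text nodes differ.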
If you want to change the lookup scheme, say, to get additional
support for `namespace specific classes`_, you can register the
objectify lookup as a fallback of the namespace lookup. In this case,
however, you have to take care that the namespace classes inherit from
``objectify.ObjectifiedElement``, not only from the normal
``lxml.etree.ElementBase``, so that they support the ``objectify``
API. The above setup code then becomes:

.. sourcecode:: pycon

    >>> lookup = etree.ElementNamespaceClassLookup(
    ...                   objectify.ObjectifyElementClassLookup() )
    >>> parser.set_element_class_lookup(lookup)

.. _`namespace specific classes`: element_classes.html#namespace-class-lookup

See the documentation on `class lookup`_ schemes for more information.

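
As a minimal sketch of this setup, the following example registers a hypothetical ``Order`` class for ``order`` elements in a made-up namespace. Because ``Order`` inherits from ``objectify.ObjectifiedElement``, its methods can use objectify attribute access. All other elements are still handled by the objectify fallback lookup:

.. sourcecode:: python

    from lxml import etree, objectify

    NS = "http://example.com/hypothetical-orders"  # made-up namespace URI


    class Order(objectify.ObjectifiedElement):
        """Hypothetical namespace class for {NS}order elements."""

        def total(self):
            # self.item finds the first <item> child in the parent's
            # namespace; iterating it yields all <item> siblings.
            return sum(item.pyval for item in self.item)


    lookup = etree.ElementNamespaceClassLookup(
        objectify.ObjectifyElementClassLookup())
    lookup.get_namespace(NS)["order"] = Order

    parser = etree.XMLParser(remove_blank_text=True)
    parser.set_element_class_lookup(lookup)

    root = etree.fromstring(
        '<order xmlns="%s"><item>2</item><item>3</item></order>' % NS, parser)
    print(type(root).__name__, type(root.item).__name__, root.total())
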

What is different from lxml.etree?
==================================

Such a different Element API obviously implies some side effects on the normal
behaviour of the rest of the API.

* ``len(<element>)`` returns the sibling count (the number of siblings with the
  same tag, including the element itself), not the number of children of
  ``<element>``. You can retrieve the number of children with the
  ``countchildren()`` method.

* Iteration over elements does not yield the children, but the element itself
  and its siblings with the same tag. You can access all children with the
  ``iterchildren()`` method on elements or retrieve a list by calling the
  ``getchildren()`` method.

* The find, findall and findtext methods require a different implementation
  based on ETXPath. In ``lxml.etree``, they use a Python implementation based
  on the original iteration scheme. This has the disadvantage that they may
  not be 100% backwards compatible, and the additional advantage that they now
  support any XPath expression.

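
The following short sketch illustrates the first two differences above on a made-up document:

.. sourcecode:: python

    from lxml import objectify

    # Made-up sample document with two <a> children and one <b> child.
    root = objectify.fromstring("<root><a>1</a><a>2</a><b>3</b></root>")

    print(len(root))             # 1: the root element has no siblings
    print(len(root.a))           # 2: <a> plus its siblings with the same tag
    print(root.countchildren())  # 3: all children of <root>
    print([el.tag for el in root.a])               # ['a', 'a']
    print([el.tag for el in root.iterchildren()])  # ['a', 'a', 'b']
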