
   1 ==============
   2 lxml.objectify
   3 ==============
   4
   5 :Authors:
   6   Stefan Behnel, Holger Joukl
   7
   8 lxml supports an alternative API similar to the Amara_ bindery or
   9 gnosis.xml.objectify_ through a custom Element implementation.  The main idea
  10 is to hide the usage of XML behind normal Python objects, sometimes referred
  11 to as data-binding.  It allows you to use XML as if you were dealing with a
  12 normal Python object hierarchy.
  13
Children of an XML element are accessed through normal object attribute
access.  If there are multiple children with the same name, slicing and
indexing can be used.  Python data types are extracted from XML content
automatically and made available to the normal Python operators.
  18
  19 .. contents::
  20 ..
  21    1  The lxml.objectify API
  22      1.1  Creating objectify trees
  23      1.2  Element access through object attributes
  24      1.3  Tree generation with the E-factory
  25      1.4  Namespace handling
  26    2  Asserting a Schema
  27    3  ObjectPath
  28    4  Python data types
  29      4.1  Recursive tree dump
  30      4.2  Recursive string representation of elements
  31    5  How data types are matched
  32      5.1  Type annotations
  33      5.2  XML Schema datatype annotation
  34      5.3  The DataElement factory
  35      5.4  Defining additional data classes
  36      5.5  Advanced element class lookup
  37    6  What is different from lxml.etree?
  38
  39 .. _Amara: http://uche.ogbuji.net/tech/4suite/amara/
  40 .. _gnosis.xml.objectify: http://gnosis.cx/download/
  41 .. _`benchmark page`: performance.html#lxml-objectify
  42
  43
  44 To set up and use ``objectify``, you need both the ``lxml.etree``
  45 module and ``lxml.objectify``:
  46
  47 .. sourcecode:: pycon
  48
  49     >>> from lxml import etree
  50     >>> from lxml import objectify
  51
  52 The objectify API is very different from the ElementTree API.  If it
  53 is used, it should not be mixed with other element implementations
  54 (such as trees parsed with ``lxml.etree``), to avoid non-obvious
  55 behaviour.
  56
  57 The `benchmark page`_ has some hints on performance optimisation of
  58 code using lxml.objectify.
  59
  60 To make the doctests in this document look a little nicer, we also use
  61 this:
  62
  63 .. sourcecode:: pycon
  64
  65     >>> import lxml.usedoctest
  66
  67 Imported from within a doctest, this relieves us from caring about the exact
  68 formatting of XML output.
  69
  70 ..
  71     >>> try: from StringIO import StringIO
  72     ... except ImportError:
  73     ...     from io import BytesIO # Python 3
  74     ...     def StringIO(s):
  75     ...         if isinstance(s, str): s = s.encode('UTF-8')
  76     ...         return BytesIO(s)
  77
  78 ..
  79   >>> import sys
  80   >>> from lxml import etree as _etree
  81   >>> if sys.version_info[0] >= 3:
  82   ...   class etree_mock(object):
  83   ...     def __getattr__(self, name): return getattr(_etree, name)
  84   ...     def tostring(self, *args, **kwargs):
  85   ...       s = _etree.tostring(*args, **kwargs)
  86   ...       if isinstance(s, bytes) and bytes([10]) in s: s = s.decode("utf-8") # CR
  87   ...       if s[-1] == '\n': s = s[:-1]
  88   ...       return s
  89   ... else:
  90   ...   class etree_mock(object):
  91   ...     def __getattr__(self, name): return getattr(_etree, name)
  92   ...     def tostring(self, *args, **kwargs):
  93   ...       s = _etree.tostring(*args, **kwargs)
  94   ...       if s[-1] == '\n': s = s[:-1]
  95   ...       return s
  96   >>> etree = etree_mock()
  97
  98
  99 The lxml.objectify API
 100 ======================
 101
 102 In ``lxml.objectify``, element trees provide an API that models the behaviour
 103 of normal Python object trees as closely as possible.
 104
 105
 106 Creating objectify trees
 107 ------------------------
 108
 109 As with ``lxml.etree``, you can either create an ``objectify`` tree by
 110 parsing an XML document or by building one from scratch.  To parse a
 111 document, just use the ``parse()`` or ``fromstring()`` functions of
 112 the module:
 113
 114 .. sourcecode:: pycon
 115
 116     >>> fileobject = StringIO('<test/>')
 117
 118     >>> tree = objectify.parse(fileobject)
 119     >>> print(isinstance(tree.getroot(), objectify.ObjectifiedElement))
 120     True
 121
 122     >>> root = objectify.fromstring('<test/>')
 123     >>> print(isinstance(root, objectify.ObjectifiedElement))
 124     True
 125
 126 To build a new tree in memory, ``objectify`` replicates the standard
 127 factory function ``Element()`` from ``lxml.etree``:
 128
 129 .. sourcecode:: pycon
 130
 131     >>> obj_el = objectify.Element("new")
 132     >>> print(isinstance(obj_el, objectify.ObjectifiedElement))
 133     True
 134
 135 After creating such an Element, you can use the `usual API`_ of
 136 lxml.etree to add SubElements to the tree:
 137
 138 .. sourcecode:: pycon
 139
 140     >>> child = etree.SubElement(obj_el, "newchild", attr="value")
 141
 142 .. _`usual API`: tutorial.html#the-element-class
 143
 144 New subelements will automatically inherit the objectify behaviour
 145 from their tree.  However, all independent elements that you create
 146 through the ``Element()`` factory of lxml.etree (instead of objectify)
 147 will not support the ``objectify`` API by themselves:
 148
 149 .. sourcecode:: pycon
 150
 151     >>> subel = etree.SubElement(obj_el, "sub")
 152     >>> print(isinstance(subel, objectify.ObjectifiedElement))
 153     True
 154
 155     >>> independent_el = etree.Element("new")
 156     >>> print(isinstance(independent_el, objectify.ObjectifiedElement))
 157     False
 158
 159
 160 Element access through object attributes
 161 ----------------------------------------
 162
 163 The main idea behind the ``objectify`` API is to hide XML element access
 164 behind the usual object attribute access pattern.  Asking an element for an
 165 attribute will return the sequence of children with corresponding tag names:
 166
 167 .. sourcecode:: pycon
 168
 169     >>> root = objectify.Element("root")
 170     >>> b = etree.SubElement(root, "b")
 171     >>> print(root.b[0].tag)
 172     b
 173     >>> root.index(root.b[0])
 174     0
 175     >>> b = etree.SubElement(root, "b")
 176     >>> print(root.b[0].tag)
 177     b
 178     >>> print(root.b[1].tag)
 179     b
 180     >>> root.index(root.b[1])
 181     1
 182
 183 For convenience, you can omit the index '0' to access the first child:
 184
 185 .. sourcecode:: pycon
 186
 187     >>> print(root.b.tag)
 188     b
 189     >>> root.index(root.b)
 190     0
 191     >>> del root.b
 192
 193 Iteration and slicing also obey the requested tag:
 194
 195 .. sourcecode:: pycon
 196
 197     >>> x1 = etree.SubElement(root, "x")
 198     >>> x2 = etree.SubElement(root, "x")
 199     >>> x3 = etree.SubElement(root, "x")
 200
 201     >>> [ el.tag for el in root.x ]
 202     ['x', 'x', 'x']
 203
 204     >>> [ el.tag for el in root.x[1:3] ]
 205     ['x', 'x']
 206
 207     >>> [ el.tag for el in root.x[-1:] ]
 208     ['x']
 209
 210     >>> del root.x[1:2]
 211     >>> [ el.tag for el in root.x ]
 212     ['x', 'x']
 213
 214 If you want to iterate over all children or need to provide a specific
 215 namespace for the tag, use the ``iterchildren()`` method.  Like the other
 216 methods for iteration, it supports an optional tag keyword argument:
 217
 218 .. sourcecode:: pycon
 219
 220     >>> [ el.tag for el in root.iterchildren() ]
 221     ['b', 'x', 'x']
 222
 223     >>> [ el.tag for el in root.iterchildren(tag='b') ]
 224     ['b']
 225
 226     >>> [ el.tag for el in root.b ]
 227     ['b']
 228
 229 XML attributes are accessed as in the normal ElementTree API:
 230
 231 .. sourcecode:: pycon
 232
 233     >>> c = etree.SubElement(root, "c", myattr="someval")
 234     >>> print(root.c.get("myattr"))
 235     someval
 236
 237     >>> root.c.set("c", "oh-oh")
 238     >>> print(root.c.get("c"))
 239     oh-oh
 240
 241 In addition to the normal ElementTree API for appending elements to trees,
 242 subtrees can also be added by assigning them to object attributes.  In this
 243 case, the subtree is automatically deep copied and the tag name of its root is
 244 updated to match the attribute name:
 245
 246 .. sourcecode:: pycon
 247
 248     >>> el = objectify.Element("yet_another_child")
 249     >>> root.new_child = el
 250     >>> print(root.new_child.tag)
 251     new_child
 252     >>> print(el.tag)
 253     yet_another_child
 254
 255     >>> root.y = [ objectify.Element("y"), objectify.Element("y") ]
 256     >>> [ el.tag for el in root.y ]
 257     ['y', 'y']
 258
 259 The latter is a short form for operations on the full slice:
 260
 261 .. sourcecode:: pycon
 262
 263     >>> root.y[:] = [ objectify.Element("y") ]
 264     >>> [ el.tag for el in root.y ]
 265     ['y']
 266
 267 You can also replace children that way:
 268
 269 .. sourcecode:: pycon
 270
 271     >>> child1 = etree.SubElement(root, "child")
 272     >>> child2 = etree.SubElement(root, "child")
 273     >>> child3 = etree.SubElement(root, "child")
 274
 275     >>> el = objectify.Element("new_child")
 276     >>> subel = etree.SubElement(el, "sub")
 277
 278     >>> root.child = el
 279     >>> print(root.child.sub.tag)
 280     sub
 281
 282     >>> root.child[2] = el
 283     >>> print(root.child[2].sub.tag)
 284     sub
 285
 286 Note that special care must be taken when changing the tag name of an element:
 287
 288 .. sourcecode:: pycon
 289
 290     >>> print(root.b.tag)
 291     b
 292     >>> root.b.tag = "notB"
 293     >>> root.b
 294     Traceback (most recent call last):
 295       ...
 296     AttributeError: no such child: b
 297     >>> print(root.notB.tag)
 298     notB
 299
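Taken together, the access patterns in this section might be used as in the
following sketch.  The ``order`` document and its ``item`` and ``note`` tags
are made up for illustration and are not part of lxml:

.. sourcecode:: python

    from lxml import objectify

    # Hypothetical document; the tag names are illustrative only.
    root = objectify.fromstring(
        "<order><item>apple</item><item>pear</item><note>rush</note></order>")

    first_item = root.item                 # shorthand for root.item[0]
    item_count = len(root.item)            # number of <item> siblings
    item_texts = [item.text for item in root.item]
    child_tags = [child.tag for child in root.iterchildren()]

    # Assigning a plain value replaces the existing <note> child.
    root.note = "done"

Note that ``len(root.item)`` counts the sibling elements named ``item``,
not the children of the first ``item`` element.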
 300
 301 Tree generation with the E-factory
 302 ----------------------------------
 303
 304 To simplify the generation of trees even further, you can use the E-factory:
 305
 306 .. sourcecode:: pycon
 307
 308     >>> E = objectify.E
 309     >>> root = E.root(
 310     ...   E.a(5),
 311     ...   E.b(6.1),
 312     ...   E.c(True),
 313     ...   E.d("how", tell="me")
 314     ... )
 315
 316     >>> print(etree.tostring(root, pretty_print=True))
 317     <root xmlns:py="http://codespeak.net/lxml/objectify/pytype">
 318       <a py:pytype="int">5</a>
 319       <b py:pytype="float">6.1</b>
 320       <c py:pytype="bool">true</c>
 321       <d py:pytype="str" tell="me">how</d>
 322     </root>
 323
 324 This allows you to write up a specific language in tags:
 325
 326 .. sourcecode:: pycon
 327
 328     >>> ROOT = objectify.E.root
 329     >>> TITLE = objectify.E.title
 330     >>> HOWMANY = getattr(objectify.E, "how-many")
 331
 332     >>> root = ROOT(
 333     ...   TITLE("The title"),
 334     ...   HOWMANY(5)
 335     ... )
 336
 337     >>> print(etree.tostring(root, pretty_print=True))
 338     <root xmlns:py="http://codespeak.net/lxml/objectify/pytype">
 339       <title py:pytype="str">The title</title>
 340       <how-many py:pytype="int">5</how-many>
 341     </root>
 342
 343 ``objectify.E`` is an instance of ``objectify.ElementMaker``.  By default, it
 344 creates pytype annotated Elements without a namespace.  You can switch off the
 345 pytype annotation by passing False to the ``annotate`` keyword argument of the
 346 constructor.  You can also pass a default namespace and an ``nsmap``:
 347
 348 .. sourcecode:: pycon
 349
 350     >>> myE = objectify.ElementMaker(annotate=False,
 351     ...           namespace="http://my/ns", nsmap={None : "http://my/ns"})
 352
 353     >>> root = myE.root( myE.someint(2) )
 354
 355     >>> print(etree.tostring(root, pretty_print=True))
 356     <root xmlns="http://my/ns">
 357       <someint>2</someint>
 358     </root>
 359
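As a rough sketch of how an unannotated tree behaves once it is built, the
following example creates a small configuration document and reads the values
back.  The tag names ``config``, ``host``, ``port`` and ``debug`` are
assumptions chosen for this example:

.. sourcecode:: python

    from lxml import etree, objectify

    # An ElementMaker without pytype annotations.  The tag names used
    # below are hypothetical and only serve as an example.
    E = objectify.ElementMaker(annotate=False)

    config = E.config(
        E.host("localhost"),
        E.port(8080),
        E.debug(True),
    )

    xml = etree.tostring(config)    # serialised as a byte string

    # Without annotations, the types are guessed from the text content
    # when the children are accessed.
    next_port = config.port + 1
    debug_enabled = bool(config.debug)

Leaving out the annotations keeps the serialised XML free of ``py:pytype``
attributes, at the cost of relying on type guessing when the data is read.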
 360
 361 Namespace handling
 362 ------------------
 363
 364 During tag lookups, namespaces are handled mostly behind the scenes.
 365 If you access a child of an Element without specifying a namespace,
 366 the lookup will use the namespace of the parent:
 367
 368 .. sourcecode:: pycon
 369
 370     >>> root = objectify.Element("{http://ns/}root")
 371     >>> b = etree.SubElement(root, "{http://ns/}b")
 372     >>> c = etree.SubElement(root, "{http://other/}c")
 373
 374     >>> print(root.b.tag)
 375     {http://ns/}b
 376
 377 Note that the ``SubElement()`` factory of ``lxml.etree`` does not
 378 inherit any namespaces when creating a new subelement.  Element
 379 creation must be explicit about the namespace, and is simplified
 380 through the E-factory as described above.
 381
 382 Lookups, however, inherit namespaces implicitly:
 383
 384 .. sourcecode:: pycon
 385
 386     >>> print(root.b.tag)
 387     {http://ns/}b
 388
 389     >>> print(root.c)
 390     Traceback (most recent call last):
 391         ...
 392     AttributeError: no such child: {http://ns/}c
 393
 394 To access an element in a different namespace than its parent, you can
 395 use ``getattr()``:
 396
 397 .. sourcecode:: pycon
 398
 399     >>> c = getattr(root, "{http://other/}c")
 400     >>> print(c.tag)
 401     {http://other/}c
 402
 403 For convenience, there is also a quick way through item access:
 404
 405 .. sourcecode:: pycon
 406
 407     >>> c = root["{http://other/}c"]
 408     >>> print(c.tag)
 409     {http://other/}c
 410
 411 The same approach must be used to access children with tag names that are not
 412 valid Python identifiers:
 413
 414 .. sourcecode:: pycon
 415
 416     >>> el = etree.SubElement(root, "{http://ns/}tag-name")
 417     >>> print(root["tag-name"].tag)
 418     {http://ns/}tag-name
 419
 420     >>> new_el = objectify.Element("{http://ns/}new-element")
 421     >>> el = etree.SubElement(new_el, "{http://ns/}child")
 422     >>> el = etree.SubElement(new_el, "{http://ns/}child")
 423     >>> el = etree.SubElement(new_el, "{http://ns/}child")
 424
 425     >>> root["tag-name"] = [ new_el, new_el ]
 426     >>> print(len(root["tag-name"]))
 427     2
 428     >>> print(root["tag-name"].tag)
 429     {http://ns/}tag-name
 430
 431     >>> print(len(root["tag-name"].child))
 432     3
 433     >>> print(root["tag-name"].child.tag)
 434     {http://ns/}child
 435     >>> print(root["tag-name"][1].child.tag)
 436     {http://ns/}child
 437
The same applies to names that have a special meaning in lxml.objectify:
 439
 440 .. sourcecode:: pycon
 441
 442     >>> root = objectify.XML("<root><text>TEXT</text></root>")
 443
 444     >>> print(root.text.text)
 445     Traceback (most recent call last):
 446       ...
 447     AttributeError: 'NoneType' object has no attribute 'text'
 448
 449     >>> print(root["text"].text)
 450     TEXT
 451
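The lookup rules of this section can be summarised in a short sketch.  The
document below is a made-up example that mixes two namespaces and a
hyphenated tag name:

.. sourcecode:: python

    from lxml import objectify

    # Hypothetical document mixing two namespaces and a hyphenated tag.
    root = objectify.fromstring(
        '<root xmlns="http://ns/" xmlns:o="http://other/">'
        '<b>1</b><o:c>2</o:c><tag-name>3</tag-name>'
        '</root>')

    b = root.b                        # inherits the namespace of <root>
    c = root["{http://other/}c"]      # explicit namespace via item access
    hyphenated = root["tag-name"]     # not a valid Python identifier

    # root.c would look for {http://ns/}c, which does not exist here,
    # so getattr() falls back to the given default value.
    missing = getattr(root, "c", None)

Passing a default to ``getattr()`` is plain Python behaviour.  It works here
because a failed child lookup raises an ``AttributeError``.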
 452
 453 Asserting a Schema
 454 ==================
 455
 456 When dealing with XML documents from different sources, you will often
 457 require them to follow a common schema.  In lxml.objectify, this
 458 directly translates to enforcing a specific object tree, i.e. expected
 459 object attributes are ensured to be there and to have the expected
 460 type.  This can easily be achieved through XML Schema validation at
 461 parse time.  Also see the `documentation on validation`_ on this
 462 topic.
 463
 464 .. _`documentation on validation`: validation.html
 465
 466 First of all, we need a parser that knows our schema, so let's say we
 467 parse the schema from a file-like object (or file or filename):
 468
 469 .. sourcecode:: pycon
 470
 471     >>> f = StringIO('''\
 472     ...   <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 473     ...     <xsd:element name="a" type="AType"/>
 474     ...     <xsd:complexType name="AType">
 475     ...       <xsd:sequence>
 476     ...         <xsd:element name="b" type="xsd:string" />
 477     ...       </xsd:sequence>
 478     ...     </xsd:complexType>
 479     ...   </xsd:schema>
 480     ... ''')
 481     >>> schema = etree.XMLSchema(file=f)
 482
 483 When creating the validating parser, we must make sure it `returns
 484 objectify trees`_.  This is best done with the ``makeparser()``
 485 function:
 486
 487 .. sourcecode:: pycon
 488
 489     >>> parser = objectify.makeparser(schema = schema)
 490
.. _`returns objectify trees`: #advanced-element-class-lookup
 492
 493 Now we can use it to parse a valid document:
 494
 495 .. sourcecode:: pycon
 496
 497     >>> xml = "<a><b>test</b></a>"
 498     >>> a = objectify.fromstring(xml, parser)
 499
 500     >>> print(a.b)
 501     test
 502
 503 Or an invalid document:
 504
 505 .. sourcecode:: pycon
 506
 507     >>> xml = "<a><b>test</b><c/></a>"
 508     >>> a = objectify.fromstring(xml, parser)
 509     Traceback (most recent call last):
 510     lxml.etree.XMLSyntaxError: Element 'c': This element is not expected.
 511
 512 Note that the same works for parse-time DTD validation, except that
 513 DTDs do not support any data types by design.
 514
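In application code, the validation error is usually caught rather than
shown.  The following is a minimal sketch of that pattern, using the schema
from above.  The helper name ``load_or_none`` is hypothetical:

.. sourcecode:: python

    from io import BytesIO
    from lxml import etree, objectify

    SCHEMA = b"""\
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="a" type="AType"/>
      <xsd:complexType name="AType">
        <xsd:sequence>
          <xsd:element name="b" type="xsd:string" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:schema>
    """

    schema = etree.XMLSchema(file=BytesIO(SCHEMA))
    parser = objectify.makeparser(schema=schema)

    def load_or_none(xml):
        """Hypothetical helper: return the objectified root, or None
        if the document does not validate against the schema."""
        try:
            return objectify.fromstring(xml, parser)
        except etree.XMLSyntaxError:
            return None

    valid = load_or_none("<a><b>test</b></a>")
    invalid = load_or_none("<a><b>test</b><c/></a>")

Returning ``None`` is only one possible design.  Re-raising an
application-specific exception would preserve the error message.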
 515
 516 ObjectPath
 517 ==========
 518
 519 For both convenience and speed, objectify supports its own path language,
 520 represented by the ``ObjectPath`` class:
 521
 522 .. sourcecode:: pycon
 523
 524     >>> root = objectify.Element("{http://ns/}root")
 525     >>> b1 = etree.SubElement(root, "{http://ns/}b")
 526     >>> c  = etree.SubElement(b1,   "{http://ns/}c")
 527     >>> b2 = etree.SubElement(root, "{http://ns/}b")
 528     >>> d  = etree.SubElement(root, "{http://other/}d")
 529
 530     >>> path = objectify.ObjectPath("root.b.c")
 531     >>> print(path)
 532     root.b.c
 533     >>> path.hasattr(root)
 534     True
 535     >>> print(path.find(root).tag)
 536     {http://ns/}c
 537
 538     >>> find = objectify.ObjectPath("root.b.c")
 539     >>> print(find(root).tag)
 540     {http://ns/}c
 541
 542     >>> find = objectify.ObjectPath("root.{http://other/}d")
 543     >>> print(find(root).tag)
 544     {http://other/}d
 545
 546     >>> find = objectify.ObjectPath("root.{not}there")
 547     >>> print(find(root).tag)
 548     Traceback (most recent call last):
 549       ...
 550     AttributeError: no such child: {not}there
 551
 552     >>> find = objectify.ObjectPath("{not}there")
 553     >>> print(find(root).tag)
 554     Traceback (most recent call last):
 555       ...
 556     ValueError: root element does not match: need {not}there, got {http://ns/}root
 557
 558     >>> find = objectify.ObjectPath("root.b[1]")
 559     >>> print(find(root).tag)
 560     {http://ns/}b
 561
 562     >>> find = objectify.ObjectPath("root.{http://ns/}b[1]")
 563     >>> print(find(root).tag)
 564     {http://ns/}b
 565
 566 Apart from strings, ObjectPath also accepts lists of path segments:
 567
 568 .. sourcecode:: pycon
 569
 570     >>> find = objectify.ObjectPath(['root', 'b', 'c'])
 571     >>> print(find(root).tag)
 572     {http://ns/}c
 573
 574     >>> find = objectify.ObjectPath(['root', '{http://ns/}b[1]'])
 575     >>> print(find(root).tag)
 576     {http://ns/}b
 577
 578 You can also use relative paths starting with a '.' to ignore the actual root
 579 element and only inherit its namespace:
 580
 581 .. sourcecode:: pycon
 582
 583     >>> find = objectify.ObjectPath(".b[1]")
 584     >>> print(find(root).tag)
 585     {http://ns/}b
 586
 587     >>> find = objectify.ObjectPath(['', 'b[1]'])
 588     >>> print(find(root).tag)
 589     {http://ns/}b
 590
 591     >>> find = objectify.ObjectPath(".unknown[1]")
 592     >>> print(find(root).tag)
 593     Traceback (most recent call last):
 594       ...
 595     AttributeError: no such child: {http://ns/}unknown
 596
 597     >>> find = objectify.ObjectPath(".{http://other/}unknown[1]")
 598     >>> print(find(root).tag)
 599     Traceback (most recent call last):
 600       ...
 601     AttributeError: no such child: {http://other/}unknown
 602
 603 For convenience, a single dot represents the empty ObjectPath (identity):
 604
 605 .. sourcecode:: pycon
 606
 607     >>> find = objectify.ObjectPath(".")
 608     >>> print(find(root).tag)
 609     {http://ns/}root
 610
 611 ObjectPath objects can be used to manipulate trees:
 612
 613 .. sourcecode:: pycon
 614
 615     >>> root = objectify.Element("{http://ns/}root")
 616
 617     >>> path = objectify.ObjectPath(".some.child.{http://other/}unknown")
 618     >>> path.hasattr(root)
 619     False
 620     >>> path.find(root)
 621     Traceback (most recent call last):
 622       ...
 623     AttributeError: no such child: {http://ns/}some
 624
 625     >>> path.setattr(root, "my value") # creates children as necessary
 626     >>> path.hasattr(root)
 627     True
 628     >>> print(path.find(root).text)
 629     my value
 630     >>> print(root.some.child["{http://other/}unknown"].text)
 631     my value
 632
 633     >>> print(len( path.find(root) ))
 634     1
 635     >>> path.addattr(root, "my new value")
 636     >>> print(len( path.find(root) ))
 637     2
 638     >>> [ el.text for el in path.find(root) ]
 639     ['my value', 'my new value']
 640
 641 As with attribute assignment, ``setattr()`` accepts lists:
 642
 643 .. sourcecode:: pycon
 644
 645     >>> path.setattr(root, ["v1", "v2", "v3"])
 646     >>> [ el.text for el in path.find(root) ]
 647     ['v1', 'v2', 'v3']
 648
 649
Note, however, that indexing is only supported in this context if the children
exist.  Indexing non-existent children will not extend or create a list of
such children, but will raise an exception instead:
 653
 654 .. sourcecode:: pycon
 655
 656     >>> path = objectify.ObjectPath(".{non}existing[1]")
 657     >>> path.setattr(root, "my value")
 658     Traceback (most recent call last):
 659       ...
 660     TypeError: creating indexed path attributes is not supported
 661
 662 It is worth noting that ObjectPath does not depend on the ``objectify`` module
 663 or the ObjectifiedElement implementation.  It can also be used in combination
 664 with Elements from the normal lxml.etree API.
 665
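A typical use of ``ObjectPath`` is to read optional values and to create them
when they are missing.  The sketch below assumes that calling a path object
with a second argument returns that argument as a default when the path does
not resolve.  The ``server``, ``port`` and ``host`` tags are invented for the
example:

.. sourcecode:: python

    from lxml import objectify

    # Hypothetical document; the tag names are illustrative only.
    root = objectify.fromstring(
        "<root><server><port>8080</port></server></root>")

    port_path = objectify.ObjectPath(".server.port")
    host_path = objectify.ObjectPath(".server.host")

    port = port_path(root).pyval

    # The second argument is used as the default for an unresolved path.
    host_before = host_path(root, None)

    if not host_path.hasattr(root):
        host_path.setattr(root, "localhost")   # creates <host> in <server>
    host_after = host_path(root).text

Since path objects are independent of any particular tree, they can be
created once and applied to many documents.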
 666
 667 Python data types
 668 =================
 669
 670 The objectify module knows about Python data types and tries its best to let
 671 element content behave like them.  For example, they support the normal math
 672 operators:
 673
 674 .. sourcecode:: pycon
 675
 676     >>> root = objectify.fromstring(
 677     ...             "<root><a>5</a><b>11</b><c>true</c><d>hoi</d></root>")
 678     >>> root.a + root.b
 679     16
 680     >>> root.a += root.b
 681     >>> print(root.a)
 682     16
 683
 684     >>> root.a = 2
 685     >>> print(root.a + 2)
 686     4
 687     >>> print(1 + root.a)
 688     3
 689
 690     >>> print(root.c)
 691     True
 692     >>> root.c = False
 693     >>> if not root.c:
 694     ...     print("false!")
 695     false!
 696
 697     >>> print(root.d + " test !")
 698     hoi test !
 699     >>> root.d = "%s - %s"
 700     >>> print(root.d % (1234, 12345))
 701     1234 - 12345
 702
However, data elements continue to provide the objectify API.  This means that
sequence operations such as ``len()``, slicing and indexing (e.g. of strings)
cannot behave as they do for the Python types.  Like all other tree elements,
data elements show the normal slicing behaviour of objectify elements:
 707
 708 .. sourcecode:: pycon
 709
 710     >>> root = objectify.fromstring("<root><a>test</a><b>toast</b></root>")
 711     >>> print(root.a + ' me') # behaves like a string, right?
 712     test me
 713     >>> len(root.a) # but there's only one 'a' element!
 714     1
 715     >>> [ a.tag for a in root.a ]
 716     ['a']
 717     >>> print(root.a[0].tag)
 718     a
 719
 720     >>> print(root.a)
 721     test
 722     >>> [ str(a) for a in root.a[:1] ]
 723     ['test']
 724
If you need to run sequence operations on data types, you must ask the API for
the *real* Python value.  The string value is always available through the
normal ElementTree ``.text`` attribute.  Additionally, all data classes
provide a ``.pyval`` attribute that returns the value as a plain Python type:
 729
 730 .. sourcecode:: pycon
 731
 732     >>> root = objectify.fromstring("<root><a>test</a><b>5</b></root>")
 733     >>> root.a.text
 734     'test'
 735     >>> root.a.pyval
 736     'test'
 737
 738     >>> root.b.text
 739     '5'
 740     >>> root.b.pyval
 741     5
 742
Note, however, that both attributes are read-only in objectify.  If you want
to change a value, assign it directly to the attribute of the parent element:
 745
 746 .. sourcecode:: pycon
 747
 748     >>> root.a.text  = "25"
 749     Traceback (most recent call last):
 750       ...
 751     TypeError: attribute 'text' of 'StringElement' objects is not writable
 752
 753     >>> root.a.pyval = 25
 754     Traceback (most recent call last):
 755       ...
 756     TypeError: attribute 'pyval' of 'StringElement' objects is not writable
 757
 758     >>> root.a = 25
 759     >>> print(root.a)
 760     25
 761     >>> print(root.a.pyval)
 762     25
 763
 764 In other words, ``objectify`` data elements behave like immutable Python
 765 types.  You can replace them, but not modify them.
 766
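The difference between the element API and the plain Python value can be
sketched as follows, reusing the small document from above:

.. sourcecode:: python

    from lxml import objectify

    root = objectify.fromstring("<root><a>test</a><b>5</b></root>")

    element_count = len(root.a)       # counts <a> siblings, not characters
    char_count = len(root.a.text)     # length of the string value
    prefix = root.a.pyval[:2]         # slicing the plain Python string
    doubled = root.b.pyval * 2        # plain int arithmetic

    # "Changing" a value means replacing the child on its parent.
    root.b = root.b.pyval + 1

After the last line, ``root.b`` refers to a new data element that holds the
incremented value.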
 767
 768 Recursive tree dump
 769 -------------------
 770
 771 To see the data types that are currently used, you can call the module level
 772 ``dump()`` function that returns a recursive string representation for
 773 elements:
 774
 775 .. sourcecode:: pycon
 776
 777     >>> root = objectify.fromstring("""
 778     ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 779     ...   <a attr1="foo" attr2="bar">1</a>
 780     ...   <a>1.2</a>
 781     ...   <b>1</b>
 782     ...   <b>true</b>
 783     ...   <c>what?</c>
 784     ...   <d xsi:nil="true"/>
 785     ... </root>
 786     ... """)
 787
 788     >>> print(objectify.dump(root))
 789     root = None [ObjectifiedElement]
 790         a = 1 [IntElement]
 791           * attr1 = 'foo'
 792           * attr2 = 'bar'
 793         a = 1.2 [FloatElement]
 794         b = 1 [IntElement]
 795         b = True [BoolElement]
 796         c = 'what?' [StringElement]
 797         d = None [NoneElement]
 798           * xsi:nil = 'true'
 799
 800 You can freely switch between different types for the same child:
 801
 802 .. sourcecode:: pycon
 803
 804     >>> root = objectify.fromstring("<root><a>5</a></root>")
 805     >>> print(objectify.dump(root))
 806     root = None [ObjectifiedElement]
 807         a = 5 [IntElement]
 808
 809     >>> root.a = 'nice string!'
 810     >>> print(objectify.dump(root))
 811     root = None [ObjectifiedElement]
 812         a = 'nice string!' [StringElement]
 813           * py:pytype = 'str'
 814
 815     >>> root.a = True
 816     >>> print(objectify.dump(root))
 817     root = None [ObjectifiedElement]
 818         a = True [BoolElement]
 819           * py:pytype = 'bool'
 820
 821     >>> root.a = [1, 2, 3]
 822     >>> print(objectify.dump(root))
 823     root = None [ObjectifiedElement]
 824         a = 1 [IntElement]
 825           * py:pytype = 'int'
 826         a = 2 [IntElement]
 827           * py:pytype = 'int'
 828         a = 3 [IntElement]
 829           * py:pytype = 'int'
 830
 831     >>> root.a = (1, 2, 3)
 832     >>> print(objectify.dump(root))
 833     root = None [ObjectifiedElement]
 834         a = 1 [IntElement]
 835           * py:pytype = 'int'
 836         a = 2 [IntElement]
 837           * py:pytype = 'int'
 838         a = 3 [IntElement]
 839           * py:pytype = 'int'
 840
 841
 842 Recursive string representation of elements
 843 -------------------------------------------
 844
 845 Normally, elements use the standard string representation for str() that is
 846 provided by lxml.etree.  You can enable a pretty-print representation for
 847 objectify elements like this:
 848
 849 .. sourcecode:: pycon
 850
 851     >>> objectify.enable_recursive_str()
 852
 853     >>> root = objectify.fromstring("""
 854     ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 855     ...   <a attr1="foo" attr2="bar">1</a>
 856     ...   <a>1.2</a>
 857     ...   <b>1</b>
 858     ...   <b>true</b>
 859     ...   <c>what?</c>
 860     ...   <d xsi:nil="true"/>
 861     ... </root>
 862     ... """)
 863
 864     >>> print(str(root))
 865     root = None [ObjectifiedElement]
 866         a = 1 [IntElement]
 867           * attr1 = 'foo'
 868           * attr2 = 'bar'
 869         a = 1.2 [FloatElement]
 870         b = 1 [IntElement]
 871         b = True [BoolElement]
 872         c = 'what?' [StringElement]
 873         d = None [NoneElement]
 874           * xsi:nil = 'true'
 875
 876 This behaviour can be switched off in the same way:
 877
 878 .. sourcecode:: pycon
 879
 880     >>> objectify.enable_recursive_str(False)
 881
 882
 883 How data types are matched
 884 ==========================
 885
Objectify uses two different types of Elements.  Structural Elements (or tree
Elements) represent the object tree structure.  Data Elements represent the
data containers at the leaves.  You can explicitly create tree Elements with
the ``objectify.Element()`` factory and data Elements with the
``objectify.DataElement()`` factory.
 891
 892 When Element objects are created, lxml.objectify must determine which
 893 implementation class to use for them.  This is relatively easy for tree
 894 Elements and less so for data Elements.  The algorithm is as follows:
 895
 896 1. If an element has children, use the default tree class.
 897
 898 2. If an element is defined as xsi:nil, use the NoneElement class.
 899
 900 3. If a "Python type hint" attribute is given, use this to determine the element
 901    class, see below.
 902
 903 4. If an XML Schema xsi:type hint is given, use this to determine the element
 904    class, see below.
 905
 906 5. Try to determine the element class from the text content type by trial and
 907    error.
 908
 909 6. If the element is a root node then use the default tree class.
 910
7. Otherwise, use the default class for empty data Elements.
 912
 913 You can change the default classes for tree Elements and empty data Elements
 914 at setup time.  The ``ObjectifyElementClassLookup()`` call accepts two keyword
 915 arguments, ``tree_class`` and ``empty_data_class``, that determine the Element
 916 classes used in these cases.  By default, ``tree_class`` is a class called
 917 ``ObjectifiedElement`` and ``empty_data_class`` is a ``StringElement``.
 918
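The lookup steps above can be observed on a small document in which each
child triggers a different rule.  This is only a sketch: the tag names are
invented, and each one names the rule it is meant to exercise:

.. sourcecode:: python

    from lxml import objectify

    # Hypothetical document: one child per lookup rule.
    root = objectify.fromstring('''\
    <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xmlns:xsd="http://www.w3.org/2001/XMLSchema"
          xmlns:py="http://codespeak.net/lxml/objectify/pytype">
      <tree><leaf>1</leaf></tree>
      <nil xsi:nil="true"/>
      <hinted py:pytype="str">5</hinted>
      <typed xsi:type="xsd:double">5</typed>
      <guessed>5</guessed>
      <empty/>
    </root>
    ''')

    # Map each child tag to the name of the class chosen for it.
    kinds = {child.tag: type(child).__name__
             for child in root.iterchildren()}

    expected = {
        "tree": "ObjectifiedElement",   # has children
        "nil": "NoneElement",           # xsi:nil
        "hinted": "StringElement",      # Python type hint
        "typed": "FloatElement",        # xsi:type hint
        "guessed": "IntElement",        # guessed from the text content
        "empty": "StringElement",       # default empty data class
    }

With the default lookup configuration, ``kinds`` should equal ``expected``.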
 919
 920 Type annotations
 921 ----------------
 922
The "type hint" mechanism uses an XML attribute defined as
``lxml.objectify.PYTYPE_ATTRIBUTE``.  It may contain any of the following
string values: int, long, float, bool, str, unicode, NoneType:
 926
 927 .. sourcecode:: pycon
 928
 929     >>> print(objectify.PYTYPE_ATTRIBUTE)
 930     {http://codespeak.net/lxml/objectify/pytype}pytype
 931     >>> ns, name = objectify.PYTYPE_ATTRIBUTE[1:].split('}')
 932
 933     >>> root = objectify.fromstring("""\
 934     ... <root xmlns:py='%s'>
 935     ...   <a py:pytype='str'>5</a>
 936     ...   <b py:pytype='int'>5</b>
 937     ...   <c py:pytype='NoneType' />
 938     ... </root>
 939     ... """ % ns)
 940
 941     >>> print(root.a + 10)
 942     510
 943     >>> print(root.b + 10)
 944     15
 945     >>> print(root.c)
 946     None
 947
 948 Note that you can change the name and namespace used for this
 949 attribute through the ``set_pytype_attribute_tag(tag)`` module
 950 function, in case your application ever needs to.  There is also a
 951 utility function ``annotate()`` that recursively generates this
 952 attribute for the elements of a tree:
 953
 954 .. sourcecode:: pycon
 955
 956     >>> root = objectify.fromstring("<root><a>test</a><b>5</b></root>")
 957     >>> print(objectify.dump(root))
 958     root = None [ObjectifiedElement]
 959         a = 'test' [StringElement]
 960         b = 5 [IntElement]
 961
 962     >>> objectify.annotate(root)
 963
 964     >>> print(objectify.dump(root))
 965     root = None [ObjectifiedElement]
 966         a = 'test' [StringElement]
 967           * py:pytype = 'str'
 968         b = 5 [IntElement]
 969           * py:pytype = 'int'
 970
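The following sketch shows ``set_pytype_attribute_tag()`` together with
``annotate()``.  The attribute name ``{http://example.org/types}kind`` is a
made-up value for illustration.  Since the setting is global to the module,
the example restores the default afterwards by calling the function without
an argument.

```python
from lxml import objectify

# Hypothetical attribute name, chosen only for this example.
CUSTOM_TAG = "{http://example.org/types}kind"

objectify.set_pytype_attribute_tag(CUSTOM_TAG)
try:
    root = objectify.fromstring("<root><a>test</a><b>5</b></root>")
    objectify.annotate(root)
    # The type names now live in the custom attribute.
    kinds = [child.get(CUSTOM_TAG) for child in root.iterchildren()]
finally:
    # Calling the function without arguments restores the default tag.
    objectify.set_pytype_attribute_tag()

restored_tag = objectify.PYTYPE_ATTRIBUTE
```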
 971
 972 XML Schema datatype annotation
 973 ------------------------------
 974
 975 A second way of specifying data type information uses XML Schema types as
 976 element annotations.  Objectify knows those that can be mapped to normal
 977 Python types:
 978
 979 .. sourcecode:: pycon
 980
 981     >>> root = objectify.fromstring('''\
 982     ...    <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 983     ...          xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 984     ...      <d xsi:type="xsd:double">5</d>
 985     ...      <i xsi:type="xsd:int"   >5</i>
 986     ...      <s xsi:type="xsd:string">5</s>
 987     ...    </root>
 988     ...    ''')
 989     >>> print(objectify.dump(root))
 990     root = None [ObjectifiedElement]
 991         d = 5.0 [FloatElement]
 992           * xsi:type = 'xsd:double'
 993         i = 5 [IntElement]
 994           * xsi:type = 'xsd:int'
 995         s = '5' [StringElement]
 996           * xsi:type = 'xsd:string'
 997
 998 Again, there is a utility function ``xsiannotate()`` that recursively
 999 generates the "xsi:type" attribute for the elements of a tree:
1000
1001 .. sourcecode:: pycon
1002
1003     >>> root = objectify.fromstring('''\
1004     ...    <root><a>test</a><b>5</b><c>true</c></root>
1005     ...    ''')
1006     >>> print(objectify.dump(root))
1007     root = None [ObjectifiedElement]
1008         a = 'test' [StringElement]
1009         b = 5 [IntElement]
1010         c = True [BoolElement]
1011
1012     >>> objectify.xsiannotate(root)
1013
1014     >>> print(objectify.dump(root))
1015     root = None [ObjectifiedElement]
1016         a = 'test' [StringElement]
1017           * xsi:type = 'xsd:string'
1018         b = 5 [IntElement]
1019           * xsi:type = 'xsd:integer'
1020         c = True [BoolElement]
1021           * xsi:type = 'xsd:boolean'
1022
Note, however, that ``xsiannotate()`` will always use the first XML Schema
datatype that is defined for any given Python type.  See also
`Defining additional data classes`_.
1026
1027 The utility function ``deannotate()`` can be used to get rid of 'py:pytype'
1028 and/or 'xsi:type' information:
1029
1030 .. sourcecode:: pycon
1031
1032     >>> root = objectify.fromstring('''\
1033     ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
1034     ...       xmlns:xsd="http://www.w3.org/2001/XMLSchema">
1035     ...   <d xsi:type="xsd:double">5</d>
1036     ...   <i xsi:type="xsd:int"   >5</i>
1037     ...   <s xsi:type="xsd:string">5</s>
1038     ... </root>''')
1039     >>> objectify.annotate(root)
1040     >>> print(objectify.dump(root))
1041     root = None [ObjectifiedElement]
1042         d = 5.0 [FloatElement]
1043           * xsi:type = 'xsd:double'
1044           * py:pytype = 'float'
1045         i = 5 [IntElement]
1046           * xsi:type = 'xsd:int'
1047           * py:pytype = 'int'
1048         s = '5' [StringElement]
1049           * xsi:type = 'xsd:string'
1050           * py:pytype = 'str'
1051     >>> objectify.deannotate(root)
1052     >>> print(objectify.dump(root))
1053     root = None [ObjectifiedElement]
1054         d = 5 [IntElement]
1055         i = 5 [IntElement]
1056         s = 5 [IntElement]
1057
You can control which type attributes are removed with the keyword arguments
``pytype`` (default: True) and ``xsi`` (default: True).  ``deannotate()`` can
also remove 'xsi:nil' attributes if you pass ``xsi_nil=True`` (default:
False):
1062
1063 .. sourcecode:: pycon
1064
1065     >>> root = objectify.fromstring('''\
1066     ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
1067     ...       xmlns:xsd="http://www.w3.org/2001/XMLSchema">
1068     ...   <d xsi:type="xsd:double">5</d>
1069     ...   <i xsi:type="xsd:int"   >5</i>
1070     ...   <s xsi:type="xsd:string">5</s>
1071     ...   <n xsi:nil="true"/>
1072     ... </root>''')
1073     >>> objectify.annotate(root)
1074     >>> print(objectify.dump(root))
1075     root = None [ObjectifiedElement]
1076         d = 5.0 [FloatElement]
1077           * xsi:type = 'xsd:double'
1078           * py:pytype = 'float'
1079         i = 5 [IntElement]
1080           * xsi:type = 'xsd:int'
1081           * py:pytype = 'int'
1082         s = '5' [StringElement]
1083           * xsi:type = 'xsd:string'
1084           * py:pytype = 'str'
1085         n = None [NoneElement]
1086           * xsi:nil = 'true'
1087           * py:pytype = 'NoneType'
1088     >>> objectify.deannotate(root, xsi_nil=True)
1089     >>> print(objectify.dump(root))
1090     root = None [ObjectifiedElement]
1091         d = 5 [IntElement]
1092         i = 5 [IntElement]
1093         s = 5 [IntElement]
1094         n = u'' [StringElement]
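
The ``pytype`` and ``xsi`` switches can also be used on their own.  As a small
sketch, the following removes only the 'py:pytype' attribute and keeps the
'xsi:type' information, so the element keeps its float type.  The constant
``XSI_TYPE`` is just a local helper name.

```python
from lxml import objectify

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

root = objectify.fromstring(
    '<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<d xsi:type="xsd:double">5</d>'
    '</root>')

objectify.annotate(root)               # adds py:pytype next to xsi:type
pytype_before = root.d.get(objectify.PYTYPE_ATTRIBUTE)

objectify.deannotate(root, xsi=False)  # strip py:pytype, keep xsi:type
pytype_after = root.d.get(objectify.PYTYPE_ATTRIBUTE)
xsi_after = root.d.get(XSI_TYPE)
value_after = root.d.pyval
```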
1095
1096 The DataElement factory
1097 -----------------------
1098
1099 For convenience, the ``DataElement()`` factory creates an Element with a
1100 Python value in one step.  You can pass the required Python type name or the
1101 XSI type name:
1102
1103 .. sourcecode:: pycon
1104
1105     >>> root = objectify.Element("root")
1106     >>> root.x = objectify.DataElement(5, _pytype="int")
1107     >>> print(objectify.dump(root))
1108     root = None [ObjectifiedElement]
1109         x = 5 [IntElement]
1110           * py:pytype = 'int'
1111
1112     >>> root.x = objectify.DataElement(5, _pytype="str", myattr="someval")
1113     >>> print(objectify.dump(root))
1114     root = None [ObjectifiedElement]
1115         x = '5' [StringElement]
1116           * py:pytype = 'str'
1117           * myattr = 'someval'
1118
1119     >>> root.x = objectify.DataElement(5, _xsi="integer")
1120     >>> print(objectify.dump(root))
1121     root = None [ObjectifiedElement]
1122         x = 5 [IntElement]
1123           * py:pytype = 'int'
1124           * xsi:type = 'xsd:integer'
1125
XML Schema types reside in the XML Schema namespace, so ``DataElement()``
tries to correctly prefix the xsi:type attribute value for you:
1128
1129 .. sourcecode:: pycon
1130
1131     >>> root = objectify.Element("root")
1132     >>> root.s = objectify.DataElement(5, _xsi="string")
1133
1134     >>> objectify.deannotate(root, xsi=False)
1135     >>> print(etree.tostring(root, pretty_print=True))
1136     <root xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
1137       <s xsi:type="xsd:string">5</s>
1138     </root>
1139
1140 ``DataElement()`` uses a default nsmap to set these prefixes:
1141
1142 .. sourcecode:: pycon
1143
1144     >>> el = objectify.DataElement('5', _xsi='string')
1145     >>> namespaces = list(el.nsmap.items())
1146     >>> namespaces.sort()
1147     >>> for prefix, namespace in namespaces:
1148     ...     print("%s - %s" % (prefix, namespace))
1149     py - http://codespeak.net/lxml/objectify/pytype
1150     xsd - http://www.w3.org/2001/XMLSchema
1151     xsi - http://www.w3.org/2001/XMLSchema-instance
1152
1153     >>> print(el.get("{http://www.w3.org/2001/XMLSchema-instance}type"))
1154     xsd:string
1155
1156 While you can set custom namespace prefixes, it is necessary to provide valid
1157 namespace information if you choose to do so:
1158
1159 .. sourcecode:: pycon
1160
1161     >>> el = objectify.DataElement('5', _xsi='foo:string',
1162     ...          nsmap={'foo': 'http://www.w3.org/2001/XMLSchema'})
1163     >>> namespaces = list(el.nsmap.items())
1164     >>> namespaces.sort()
1165     >>> for prefix, namespace in namespaces:
1166     ...     print("%s - %s" % (prefix, namespace))
1167     foo - http://www.w3.org/2001/XMLSchema
1168     py - http://codespeak.net/lxml/objectify/pytype
1169     xsi - http://www.w3.org/2001/XMLSchema-instance
1170
1171     >>> print(el.get("{http://www.w3.org/2001/XMLSchema-instance}type"))
1172     foo:string
1173
1174 Note how lxml chose a default prefix for the XML Schema Instance
1175 namespace.  We can override it as in the following example:
1176
1177 .. sourcecode:: pycon
1178
1179     >>> el = objectify.DataElement('5', _xsi='foo:string',
1180     ...          nsmap={'foo': 'http://www.w3.org/2001/XMLSchema',
1181     ...                 'myxsi': 'http://www.w3.org/2001/XMLSchema-instance'})
1182     >>> namespaces = list(el.nsmap.items())
1183     >>> namespaces.sort()
1184     >>> for prefix, namespace in namespaces:
1185     ...     print("%s - %s" % (prefix, namespace))
1186     foo - http://www.w3.org/2001/XMLSchema
1187     myxsi - http://www.w3.org/2001/XMLSchema-instance
1188     py - http://codespeak.net/lxml/objectify/pytype
1189
1190     >>> print(el.get("{http://www.w3.org/2001/XMLSchema-instance}type"))
1191     foo:string
1192
1193 Care must be taken if different namespace prefixes have been used for the same
1194 namespace.  Namespace information gets merged to avoid duplicate definitions
1195 when adding a new sub-element to a tree, but this mechanism does not adapt the
1196 prefixes of attribute values:
1197
1198 .. sourcecode:: pycon
1199
1200     >>> root = objectify.fromstring("""<root xmlns:schema="http://www.w3.org/2001/XMLSchema"/>""")
1201     >>> print(etree.tostring(root, pretty_print=True))
1202     <root xmlns:schema="http://www.w3.org/2001/XMLSchema"/>
1203
1204     >>> s = objectify.DataElement("17", _xsi="string")
1205     >>> print(etree.tostring(s, pretty_print=True))
1206     <value xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="xsd:string">17</value>
1207
1208     >>> root.s = s
1209     >>> print(etree.tostring(root, pretty_print=True))
1210     <root xmlns:schema="http://www.w3.org/2001/XMLSchema">
1211       <s xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="xsd:string">17</s>
1212     </root>
1213
1214 It is your responsibility to fix the prefixes of attribute values if you
1215 choose to deviate from the standard prefixes.  A convenient way to do this for
1216 xsi:type attributes is to use the ``xsiannotate()`` utility:
1217
1218 .. sourcecode:: pycon
1219
1220     >>> objectify.xsiannotate(root)
1221     >>> print(etree.tostring(root, pretty_print=True))
1222     <root xmlns:schema="http://www.w3.org/2001/XMLSchema">
1223       <s xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="schema:string">17</s>
1224     </root>
1225
In general, you should avoid using different prefixes for the same namespace
when building up an objectify tree.
1228
1229
1230 Defining additional data classes
1231 --------------------------------
1232
1233 You can plug additional data classes into objectify that will be used in
1234 exactly the same way as the predefined types.  Data classes can either inherit
1235 from ``ObjectifiedDataElement`` directly or from one of the specialised
1236 classes like ``NumberElement`` or ``BoolElement``.  The numeric types require
1237 an initial call to the NumberElement method ``self._setValueParser(function)``
1238 to set their type conversion function (string -> numeric Python type).  This
1239 call should be placed into the element ``_init()`` method.
1240
1241 The registration of data classes uses the ``PyType`` class:
1242
1243 .. sourcecode:: pycon
1244
1245     >>> class ChristmasDate(objectify.ObjectifiedDataElement):
1246     ...     def call_santa(self):
1247     ...         print("Ho ho ho!")
1248
1249     >>> def checkChristmasDate(date_string):
1250     ...     if not date_string.startswith('24.12.'):
1251     ...         raise ValueError # or TypeError
1252
1253     >>> xmas_type = objectify.PyType('date', checkChristmasDate, ChristmasDate)
1254
1255 The PyType constructor takes a string type name, an (optional) callable type
1256 check and the custom data class.  If a type check is provided it must accept a
1257 string as argument and raise ValueError or TypeError if it cannot handle the
1258 string value.
1259
PyTypes are used if an element carries a ``py:pytype`` attribute denoting its
data type or, in the absence of such an attribute, if the given type check
callable does not raise a ValueError/TypeError exception when applied to the
element text.
1264
1265 If you want, you can also register this type under an XML Schema type name:
1266
1267 .. sourcecode:: pycon
1268
1269     >>> xmas_type.xmlSchemaTypes = ("date",)
1270
1271 XML Schema types will be considered if the element has an ``xsi:type``
1272 attribute that specifies its data type.  The line above binds the XSD type
1273 ``date`` to the newly defined Python type.  Note that this must be done before
1274 the next step, which is to register the type.  Then you can use it:
1275
1276 .. sourcecode:: pycon
1277
1278     >>> xmas_type.register()
1279
1280     >>> root = objectify.fromstring(
1281     ...             "<root><a>24.12.2000</a><b>12.24.2000</b></root>")
1282     >>> root.a.call_santa()
1283     Ho ho ho!
1284     >>> root.b.call_santa()
1285     Traceback (most recent call last):
1286       ...
1287     AttributeError: no such child: call_santa
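
For numeric data classes, the ``_setValueParser()`` call mentioned at the
start of this section could look like the following sketch.  ``HexElement``,
``parse_hex`` and the type name ``hex`` are hypothetical names invented for
this example.  The same function serves as both the type check and the value
parser here, which is a convenience and not a requirement.

```python
from lxml import objectify


def parse_hex(text):
    """Parse a '0x...' literal; raise ValueError for anything else."""
    if not text.lower().startswith("0x"):
        raise ValueError("not a hexadecimal literal: %r" % (text,))
    return int(text, 16)


class HexElement(objectify.NumberElement):
    """Hypothetical numeric data class for hexadecimal content."""

    def _init(self):
        # tell NumberElement how to convert the element text to a number
        self._setValueParser(parse_hex)


hex_type = objectify.PyType("hex", parse_hex, HexElement)
hex_type.register()
try:
    root = objectify.fromstring("<root><h>0xff</h><n>12</n></root>")
    hex_class = type(root.h).__name__
    hex_value = root.h.pyval
    hex_sum = root.h + 1            # normal Python operators work
    int_class = type(root.n).__name__
finally:
    hex_type.unregister()           # keep the global registry clean
```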
1288
1289 If you need to specify dependencies between the type check functions, you can
1290 pass a sequence of type names through the ``before`` and ``after`` keyword
1291 arguments of the ``register()`` method.  The PyType will then try to register
1292 itself before or after the respective types, as long as they are currently
registered.  Note that this only affects the ordering relative to the types
that are registered at the time of the call.  Types that are registered later
do not take the dependencies of already registered types into account.
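
The effect of ``before`` can be sketched with a type check that overlaps with
the predefined ``int`` check.  ``EvenElement`` and ``check_even`` are
hypothetical names; without ``before``, the predefined ``int`` type would
match first and the custom class would not be used for untyped elements.

```python
from lxml import objectify


class EvenElement(objectify.IntElement):
    """Hypothetical data class for even integers."""

    def half(self):
        return self.pyval // 2


def check_even(text):
    # int() raises ValueError for non-integer text
    if int(text) % 2 != 0:
        raise ValueError("odd number")


even_type = objectify.PyType("even", check_even, EvenElement)
# Pass a sequence of type names, not a plain string.
even_type.register(before=["int"])
try:
    root = objectify.fromstring("<root><a>4</a><b>5</b></root>")
    even_class = type(root.a).__name__
    odd_class = type(root.b).__name__
    half_value = root.a.half()
finally:
    even_type.unregister()
```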
1296
1297 If you provide XML Schema type information, this will override the type check
1298 function defined above:
1299
1300 .. sourcecode:: pycon
1301
1302     >>> root = objectify.fromstring('''\
1303     ...    <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
1304     ...      <a xsi:type="date">12.24.2000</a>
1305     ...    </root>
1306     ...    ''')
1307     >>> print(root.a)
1308     12.24.2000
1309     >>> root.a.call_santa()
1310     Ho ho ho!
1311
1312 To unregister a type, call its ``unregister()`` method:
1313
1314 .. sourcecode:: pycon
1315
1316     >>> root.a.call_santa()
1317     Ho ho ho!
1318     >>> xmas_type.unregister()
1319     >>> root.a.call_santa()
1320     Traceback (most recent call last):
1321       ...
1322     AttributeError: no such child: call_santa
1323
1324 Be aware, though, that this does not immediately apply to elements to which
1325 there already is a Python reference.  Their Python class will only be changed
1326 after all references are gone and the Python object is garbage collected.
1327
1328
1329 Advanced element class lookup
1330 -----------------------------
1331
1332 In some cases, the normal data class setup is not enough.  Being based
1333 on ``lxml.etree``, however, ``lxml.objectify`` supports very
1334 fine-grained control over the Element classes used in a tree.  All you
1335 have to do is configure a different `class lookup`_ mechanism (or
1336 write one yourself).
1337
1338 .. _`class lookup`: element_classes.html
1339
1340 The first step for the setup is to create a new parser that builds
1341 objectify documents.  The objectify API is meant for data-centric XML
1342 (as opposed to document XML with mixed content).  Therefore, we
1343 configure the parser to let it remove whitespace-only text from the
1344 parsed document if it is not enclosed by an XML element.  Note that
1345 this alters the document infoset, so if you consider the removed
1346 spaces as data in your specific use case, you should go with a normal
1347 parser and just set the element class lookup.  Most applications,
1348 however, will work fine with the following setup:
1349
1350 .. sourcecode:: pycon
1351
1352     >>> parser = objectify.makeparser(remove_blank_text=True)
1353
Internally, this is equivalent to:
1355
1356 .. sourcecode:: pycon
1357
1358     >>> parser = etree.XMLParser(remove_blank_text=True)
1359
1360     >>> lookup = objectify.ObjectifyElementClassLookup()
1361     >>> parser.set_element_class_lookup(lookup)
1362
1363 If you want to change the lookup scheme, say, to get additional
1364 support for `namespace specific classes`_, you can register the
1365 objectify lookup as a fallback of the namespace lookup.  In this case,
1366 however, you have to take care that the namespace classes inherit from
1367 ``objectify.ObjectifiedElement``, not only from the normal
1368 ``lxml.etree.ElementBase``, so that they support the ``objectify``
1369 API.  The above setup code then becomes:
1370
1371 .. sourcecode:: pycon
1372
1373     >>> lookup = etree.ElementNamespaceClassLookup(
1374     ...                   objectify.ObjectifyElementClassLookup() )
1375     >>> parser.set_element_class_lookup(lookup)
1376
1377 .. _`namespace specific classes`: element_classes.html#namespace-class-lookup
1378
1379 See the documentation on `class lookup`_ schemes for more information.
1380
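Put together, a namespace lookup with an objectify fallback might be used as
in the following sketch.  The namespace URI, the ``InvoiceElement`` class and
its ``total()`` method are assumptions made up for this example.  Note that
the class inherits from ``objectify.ObjectifiedElement``, as required above.

```python
from lxml import etree, objectify

BILLING_NS = "http://example.org/billing"  # hypothetical namespace


class InvoiceElement(objectify.ObjectifiedElement):
    """Hypothetical namespace class that still supports the objectify API."""

    def total(self):
        # self.item uses objectify attribute access; iterating over it
        # yields all <item> siblings
        return sum(item.pyval for item in self.item)


lookup = etree.ElementNamespaceClassLookup(
    objectify.ObjectifyElementClassLookup())
parser = etree.XMLParser(remove_blank_text=True)
parser.set_element_class_lookup(lookup)

# register the class for the <invoice> tag in the namespace
lookup.get_namespace(BILLING_NS)["invoice"] = InvoiceElement

root = objectify.fromstring(
    '<invoice xmlns="%s"><item>2</item><item>3</item></invoice>' % BILLING_NS,
    parser)

root_class = type(root).__name__
item_class = type(root.item).__name__  # resolved by the objectify fallback
invoice_total = root.total()
```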
1381
1382 What is different from lxml.etree?
1383 ==================================
1384
Because the objectify Element API differs so much from the ``lxml.etree`` API,
some operations that both APIs share behave differently:
1387
* ``len(<element>)`` returns the number of siblings with the same tag
  (including the element itself), not the number of children of <element>.
  You can retrieve the number of children with the ``countchildren()`` method.
1391
1392 * Iteration over elements does not yield the children, but the siblings.  You
1393   can access all children with the ``iterchildren()`` method on elements or
1394   retrieve a list by calling the ``getchildren()`` method.
1395
1396 * The find, findall and findtext methods require a different implementation
1397   based on ETXPath.  In ``lxml.etree``, they use a Python implementation based
1398   on the original iteration scheme.  This has the disadvantage that they may
1399   not be 100% backwards compatible, and the additional advantage that they now
1400   support any XPath expression.
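
The first two differences can be illustrated with a short sketch that uses a
made-up three-child document:

```python
from lxml import objectify

root = objectify.fromstring("<root><a>1</a><a>2</a><b>3</b></root>")

root_len = len(root)                # the root has no siblings
sibling_count = len(root.a)         # both <a> elements
child_count = root.countchildren()  # all three children of <root>

# Iterating over an element walks its same-named siblings ...
sibling_values = [el.pyval for el in root.a]
# ... while iterchildren() walks the children.
child_tags = [el.tag for el in root.iterchildren()]
```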