Doc/library/xml.etree.elementtree.rst

   1 :mod:`xml.etree.ElementTree` --- The ElementTree XML API
   2 ========================================================
   3
   4 .. module:: xml.etree.ElementTree
   5    :synopsis: Implementation of the ElementTree API.
   6 .. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>
   7
   8
   9 .. versionadded:: 2.5
  10
  11 **Source code:** :source:`Lib/xml/etree/ElementTree.py`
  12
  13 --------------
  14
  15 The :class:`Element` type is a flexible container object, designed to store
  16 hierarchical data structures in memory.  The type can be described as a cross
  17 between a list and a dictionary.
  18
  19 Each element has a number of properties associated with it:
  20
  21 * a tag which is a string identifying what kind of data this element represents
  22   (the element type, in other words).
  23
  24 * a number of attributes, stored in a Python dictionary.
  25
  26 * a text string.
  27
  28 * an optional tail string.
  29
  30 * a number of child elements, stored in a Python sequence.
  31
  32 To create an element instance, use the :class:`Element` constructor or the
  33 :func:`SubElement` factory function.
  34
  35 The :class:`ElementTree` class can be used to wrap an element structure, and
  36 convert it from and to XML.
  37
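As a minimal sketch of the properties listed above, the following builds a
small structure by hand.  The tag names ``page`` and ``title`` and the
attribute ``lang`` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Build <page lang="en"><title>Example</title>trailing</page> by hand.
page = ET.Element("page", {"lang": "en"})   # tag plus attribute dictionary
title = ET.SubElement(page, "title")         # created and appended to page
title.text = "Example"                       # text between <title> and </title>
title.tail = "trailing"                      # text after </title>

tree = ET.ElementTree(page)                  # wrapper used for reading and writing XML
xml_bytes = ET.tostring(page)                # encoded XML, US-ASCII by default
```

Here ``len(page)`` is 1, because the element behaves like a sequence of its
children, while ``page.attrib`` holds the attribute dictionary.
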
  38 A C implementation of this API is available as :mod:`xml.etree.cElementTree`.
  39
  40 See http://effbot.org/zone/element-index.htm for tutorials and links to other
  41 docs.  Fredrik Lundh's page also hosts the development version of
  42 :mod:`xml.etree.ElementTree`.
  43
  44 .. versionchanged:: 2.7
  45    The ElementTree API is updated to 1.3.  For more information, see
  46    `Introducing ElementTree 1.3
  47    <http://effbot.org/zone/elementtree-13-intro.htm>`_.
  48
  49
  50 .. _elementtree-functions:
  51
  52 Functions
  53 ---------
  54
  55
  56 .. function:: Comment(text=None)
  57
  58    Comment element factory.  This factory function creates a special element
  59    that will be serialized as an XML comment by the standard serializer.  The
  60    comment string can be either a bytestring or a Unicode string.  *text* is a
  61    string containing the comment string.  Returns an element instance
  62    representing a comment.
  63
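A short sketch of how a comment element might be used; the comment text and
the tag names are placeholders.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
root.append(ET.Comment("generated file"))   # special element, not a normal tag
ET.SubElement(root, "item")

serialized = ET.tostring(root).decode("ascii")
```

The serialized result contains ``<!--generated file-->`` in front of the
``item`` element.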
  64
  65 .. function:: dump(elem)
  66
  67    Writes an element tree or element structure to ``sys.stdout``.  This function
  68    should be used for debugging only.
  69
  70    The exact output format is implementation dependent.  In this version, it's
  71    written as an ordinary XML file.
  72
  73    *elem* is an element tree or an individual element.
  74
  75
  76 .. function:: fromstring(text)
  77
  78    Parses an XML section from a string constant.  Same as :func:`XML`.  *text*
  79    is a string containing XML data.  Returns an :class:`Element` instance.
  80
  81
  82 .. function:: fromstringlist(sequence, parser=None)
  83
  84    Parses an XML document from a sequence of string fragments.  *sequence* is a
  85    list or other sequence containing XML data fragments.  *parser* is an
  86    optional parser instance.  If not given, the standard :class:`XMLParser`
  87    parser is used.  Returns an :class:`Element` instance.
  88
  89    .. versionadded:: 2.7
  90
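The following sketch contrasts :func:`fromstring` and :func:`fromstringlist`.
The fragment boundaries are chosen arbitrarily to show that they do not have
to line up with tag boundaries.

```python
import xml.etree.ElementTree as ET

# The fragments split the first <item> tag in the middle on purpose.
fragments = ["<root><it", "em>first</item>", "<item>second</item></root>"]
root = ET.fromstringlist(fragments)

# Parsing the joined text in one call gives an equivalent element.
same = ET.fromstring("".join(fragments))
texts = [item.text for item in root.findall("item")]
```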
  91
  92 .. function:: iselement(element)
  93
  94    Checks if an object appears to be a valid element object.  *element* is an
  95    element instance.  Returns a true value if this is an element object.
  96
  97
  98 .. function:: iterparse(source, events=None, parser=None)
  99
 100    Parses an XML section into an element tree incrementally, and reports what's
 101    going on to the user.  *source* is a filename or file object containing XML
 102    data.  *events* is a list of events to report back.  If omitted, only "end"
 103    events are reported.  *parser* is an optional parser instance.  If not
 104    given, the standard :class:`XMLParser` parser is used.  Returns an
 105    :term:`iterator` providing ``(event, elem)`` pairs.
 106
 107    .. note::
 108
 109       :func:`iterparse` only guarantees that it has seen the ">"
 110       character of a starting tag when it emits a "start" event, so the
 111       attributes are defined, but the contents of the text and tail attributes
 112       are undefined at that point.  The same applies to the element children;
 113       they may or may not be present.
 114
 115       If you need a fully populated element, look for "end" events instead.
 116
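A minimal sketch of incremental parsing, assuming the XML data comes from an
in-memory file object rather than a file on disk.  It records every event and
reads ``elem.text`` only on "end" events, as the note above recommends.

```python
import io
import xml.etree.ElementTree as ET

source = io.BytesIO(b"<root><item>a</item><item>b</item></root>")

seen = []         # every (event, tag) pair, in the order reported
item_texts = []   # text collected once each <item> is complete
for event, elem in ET.iterparse(source, events=("start", "end")):
    seen.append((event, elem.tag))
    if event == "end" and elem.tag == "item":
        item_texts.append(elem.text)
```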
 117
 118 .. function:: parse(source, parser=None)
 119
 120    Parses an XML section into an element tree.  *source* is a filename or file
 121    object containing XML data.  *parser* is an optional parser instance.  If
 122    not given, the standard :class:`XMLParser` parser is used.  Returns an
 123    :class:`ElementTree` instance.
 124
 125
 126 .. function:: ProcessingInstruction(target, text=None)
 127
 128    PI element factory.  This factory function creates a special element that
 129    will be serialized as an XML processing instruction.  *target* is a string
 130    containing the PI target.  *text* is a string containing the PI contents, if
 131    given.  Returns an element instance, representing a processing instruction.
 132
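As an illustration, a processing instruction can be appended like any other
element.  The target ``xml-stylesheet`` and the ``href`` value are example
values only.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
pi = ET.ProcessingInstruction("xml-stylesheet", 'href="style.css"')
root.append(pi)                              # serialized as <?target text?>

serialized = ET.tostring(root).decode("ascii")
```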
 133
 134 .. function:: register_namespace(prefix, uri)
 135
 136    Registers a namespace prefix.  The registry is global, and any existing
 137    mapping for either the given prefix or the namespace URI will be removed.
 138    *prefix* is a namespace prefix.  *uri* is a namespace URI.  Tags and
 139    attributes in this namespace will be serialized with the given prefix, if at
 140    all possible.
 141
 142    .. versionadded:: 2.7
 143
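A sketch of the effect on serialization.  The namespace URI and the prefix
``inv`` are hypothetical, and because the registry is global the registration
affects all later serialization in the same process.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/ns/inventory"       # hypothetical namespace URI
ET.register_namespace("inv", NS)             # global registration

# Qualified names are written in {uri}local form.
root = ET.Element("{%s}stock" % NS)
ET.SubElement(root, "{%s}item" % NS, {"sku": "42"})

serialized = ET.tostring(root).decode("ascii")
```

Without the registration, the serializer would pick a generated prefix
instead of ``inv``.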
 144
 145 .. function:: SubElement(parent, tag, attrib={}, **extra)
 146
 147    Subelement factory.  This function creates an element instance, and appends
 148    it to an existing element.
 149
 150    The element name, attribute names, and attribute values can be either
 151    bytestrings or Unicode strings.  *parent* is the parent element.  *tag* is
 152    the subelement name.  *attrib* is an optional dictionary, containing element
 153    attributes.  *extra* contains additional attributes, given as keyword
 154    arguments.  Returns an element instance.
 155
 156
 157 .. function:: tostring(element, encoding="us-ascii", method="xml")
 158
 159    Generates a string representation of an XML element, including all
 160    subelements.  *element* is an :class:`Element` instance.  *encoding* [1]_ is
 161    the output encoding (default is US-ASCII).  *method* is either ``"xml"``,
 162    ``"html"`` or ``"text"`` (default is ``"xml"``).  Returns an encoded string
 163    containing the XML data.
 164
 165
 166 .. function:: tostringlist(element, encoding="us-ascii", method="xml")
 167
 168    Generates a string representation of an XML element, including all
 169    subelements.  *element* is an :class:`Element` instance.  *encoding* [1]_ is
 170    the output encoding (default is US-ASCII).   *method* is either ``"xml"``,
 171    ``"html"`` or ``"text"`` (default is ``"xml"``).  Returns a list of encoded
 172    strings containing the XML data.  It does not guarantee any specific
 173    sequence, except that ``"".join(tostringlist(element)) ==
 174    tostring(element)``.
 175
 176    .. versionadded:: 2.7
 177
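The sketch below compares the ``"xml"`` and ``"text"`` methods and checks the
joining property of :func:`tostringlist`.  The separator is written as
``b""``, which is the same object type as ``""`` on Python 2.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>Hello <b>world</b>!</p>")

as_xml = ET.tostring(p)                      # markup included
as_text = ET.tostring(p, method="text")      # text content only
pieces = ET.tostringlist(p)                  # same data as a list of fragments
rejoined = b"".join(pieces)
```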
 178
 179 .. function:: XML(text, parser=None)
 180
 181    Parses an XML section from a string constant.  This function can be used to
 182    embed "XML literals" in Python code.  *text* is a string containing XML
 183    data.  *parser* is an optional parser instance.  If not given, the standard
 184    :class:`XMLParser` parser is used.  Returns an :class:`Element` instance.
 185
 186
 187 .. function:: XMLID(text, parser=None)
 188
 189    Parses an XML section from a string constant, and also returns a dictionary
 190    which maps from element IDs to elements.  *text* is a string containing XML
 191    data.  *parser* is an optional parser instance.  If not given, the standard
 192    :class:`XMLParser` parser is used.  Returns a tuple containing an
 193    :class:`Element` instance and a dictionary.
 194
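A small sketch of the returned tuple, assuming the IDs are given in plain
``id`` attributes.  The tag names and ID values are placeholders.

```python
import xml.etree.ElementTree as ET

root, ids = ET.XMLID('<root><a id="first"/><b id="second"><c/></b></root>')

# ids maps each id attribute value to the element that carries it.
second = ids["second"]
```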
 195
 196 .. _elementtree-element-objects:
 197
 198 Element Objects
 199 ---------------
 200
 201
 202 .. class:: Element(tag, attrib={}, **extra)
 203
 204    Element class.  This class defines the Element interface, and provides a
 205    reference implementation of this interface.
 206
 207    The element name, attribute names, and attribute values can be either
 208    bytestrings or Unicode strings.  *tag* is the element name.  *attrib* is
 209    an optional dictionary, containing element attributes.  *extra* contains
 210    additional attributes, given as keyword arguments.
 211
 212
 213    .. attribute:: tag
 214
 215       A string identifying what kind of data this element represents (the
 216       element type, in other words).
 217
 218
 219    .. attribute:: text
 220
 221       The *text* attribute can be used to hold additional data associated with
 222       the element.  As the name implies this attribute is usually a string but
 223       may be any application-specific object.  If the element is created from
 224       an XML file the attribute will contain any text found between the element
 225       tags.
 226
 227
 228    .. attribute:: tail
 229
 230       The *tail* attribute can be used to hold additional data associated with
 231       the element.  This attribute is usually a string but may be any
 232       application-specific object.  If the element is created from an XML file
 233       the attribute will contain any text found after the element's end tag and
 234       before the next tag.
 235
 236
 237    .. attribute:: attrib
 238
 239       A dictionary containing the element's attributes.  Note that while the
 240       *attrib* value is always a real mutable Python dictionary, an ElementTree
 241       implementation may choose to use another internal representation, and
 242       create the dictionary only if someone asks for it.  To take advantage of
 243       such implementations, use the dictionary methods below whenever possible.
 244
 245    The following dictionary-like methods work on the element attributes.
 246
 247
 248    .. method:: clear()
 249
 250       Resets an element.  This function removes all subelements, clears all
 251       attributes, and sets the text and tail attributes to None.
 252
 253
 254    .. method:: get(key, default=None)
 255
 256       Gets the element attribute named *key*.
 257
 258       Returns the attribute value, or *default* if the attribute was not found.
 259
 260
 261    .. method:: items()
 262
 263       Returns the element attributes as a sequence of (name, value) pairs.  The
 264       attributes are returned in an arbitrary order.
 265
 266
 267    .. method:: keys()
 268
 269       Returns the element's attribute names as a list.  The names are returned
 270       in an arbitrary order.
 271
 272
 273    .. method:: set(key, value)
 274
 275       Set the attribute *key* on the element to *value*.
 276
 277    The following methods work on the element's children (subelements).
 278
 279
 280    .. method:: append(subelement)
 281
 282       Adds the element *subelement* to the end of this element's internal list
 283       of subelements.
 284
 285
 286    .. method:: extend(subelements)
 287
 288       Appends *subelements* from a sequence object with zero or more elements.
 289       Raises :exc:`AssertionError` if a subelement is not a valid object.
 290
 291       .. versionadded:: 2.7
 292
 293
 294    .. method:: find(match)
 295
 296       Finds the first subelement matching *match*.  *match* may be a tag name
 297       or path.  Returns an element instance or ``None``.
 298
 299
 300    .. method:: findall(match)
 301
 302       Finds all matching subelements, by tag name or path.  Returns a list
 303       containing all matching elements in document order.
 304
 305
 306    .. method:: findtext(match, default=None)
 307
 308       Finds text for the first subelement matching *match*.  *match* may be
 309       a tag name or path.  Returns the text content of the first matching
 310       element, or *default* if no element was found.  Note that if the matching
 311       element has no text content an empty string is returned.
 312
 313
 314    .. method:: getchildren()
 315
 316       .. deprecated:: 2.7
 317          Use ``list(elem)`` or iteration.
 318
 319
 320    .. method:: getiterator(tag=None)
 321
 322       .. deprecated:: 2.7
 323          Use method :meth:`Element.iter` instead.
 324
 325
 326    .. method:: insert(index, element)
 327
 328       Inserts a subelement at the given position in this element.
 329
 330
 331    .. method:: iter(tag=None)
 332
 333       Creates a tree :term:`iterator` with the current element as the root.
 334       The iterator iterates over this element and all elements below it, in
 335       document (depth first) order.  If *tag* is not ``None`` or ``'*'``, only
 336       elements whose tag equals *tag* are returned from the iterator.  If the
 337       tree structure is modified during iteration, the result is undefined.
 338
 339       .. versionadded:: 2.7
 340
 341
 342    .. method:: iterfind(match)
 343
 344       Finds all matching subelements, by tag name or path.  Returns an iterable
 345       yielding all matching elements in document order.
 346
 347       .. versionadded:: 2.7
 348
 349
 350    .. method:: itertext()
 351
 352       Creates a text iterator.  The iterator loops over this element and all
 353       subelements, in document order, and returns all inner text.
 354
 355       .. versionadded:: 2.7
 356
 357
 358    .. method:: makeelement(tag, attrib)
 359
 360       Creates a new element object of the same type as this element.  Do not
 361       call this method; use the :func:`SubElement` factory function instead.
 362
 363
 364    .. method:: remove(subelement)
 365
 366       Removes *subelement* from the element.  Unlike the find\* methods this
 367       method compares elements based on the instance identity, not on tag value
 368       or contents.
 369
 370    :class:`Element` objects also support the following sequence type methods
 371    for working with subelements: :meth:`__delitem__`, :meth:`__getitem__`,
 372    :meth:`__setitem__`, :meth:`__len__`.
 373
 374    Caution: Elements with no subelements will test as ``False``.  This behavior
 375    will change in future versions.  Use an explicit ``len(elem)`` or ``elem is
 376    None`` test instead. ::
 377
 378      element = root.find('foo')
 379
 380      if not element:  # careful!
 381          print "element not found, or element has no subelements"
 382
 383      if element is None:
 384          print "element not found"
 385
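The following sketch pulls several :class:`Element` methods together on a
small made-up document.  The tag and attribute names are placeholders.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<list kind="todo">'
    '<item done="yes">write</item>'
    '<item>review</item>'
    '<note/>'
    '</list>'
)

# Attribute access through the dictionary-like methods.
kind = root.get("kind")                      # existing attribute
owner = root.get("owner", "nobody")          # default for an absent attribute
root.set("owner", "alice")

# Searching the direct children by tag name.
first_item = root.find("item")
item_texts = [item.text for item in root.findall("item")]
note_text = root.findtext("note")            # found, but has no text content
absent_text = root.findtext("missing", "n/a")

# Walking the whole subtree in document order.
all_tags = [elem.tag for elem in root.iter()]
all_text = "".join(root.itertext())

# Explicit tests instead of relying on the truth value of an element.
note = root.find("note")
note_found = note is not None
note_has_children = len(note) > 0

# Sequence behaviour and identity-based removal.
count_before = len(root)
root.remove(note)
tags_after = [child.tag for child in root]
```

Note that ``note_text`` is an empty string rather than ``None``, while
``absent_text`` falls back to the supplied default.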
 386
 387 .. _elementtree-elementtree-objects:
 388
 389 ElementTree Objects
 390 -------------------
 391
 392
 393 .. class:: ElementTree(element=None, file=None)
 394
 395    ElementTree wrapper class.  This class represents an entire element
 396    hierarchy, and adds some extra support for serialization to and from
 397    standard XML.
 398
 399    *element* is the root element.  The tree is initialized with the contents
 400    of the XML *file* if given.
 401
 402
 403    .. method:: _setroot(element)
 404
 405       Replaces the root element for this tree.  This discards the current
 406       contents of the tree, and replaces it with the given element.  Use with
 407       care.  *element* is an element instance.
 408
 409
 410    .. method:: find(match)
 411
 412       Finds the first toplevel element matching *match*.  *match* may be a tag
 413       name or path.  Same as ``getroot().find(match)``.  Returns the first matching
 414       element, or ``None`` if no element was found.
 415
 416
 417    .. method:: findall(match)
 418
 419       Finds all matching subelements, by tag name or path.  Same as
 420       ``getroot().findall(match)``.  *match* may be a tag name or path.  Returns a
 421       list containing all matching elements, in document order.
 422
 423
 424    .. method:: findtext(match, default=None)
 425
 426       Finds the element text for the first toplevel element with the given tag.
 427       Same as ``getroot().findtext(match)``.  *match* may be a tag name or path.
 428       *default* is the value to return if the element was not found.  Returns
 429       the text content of the first matching element, or the default value if no
 430       element was found.  Note that if the element is found, but has no text
 431       content, this method returns an empty string.
 432
 433
 434    .. method:: getiterator(tag=None)
 435
 436       .. deprecated:: 2.7
 437          Use method :meth:`ElementTree.iter` instead.
 438
 439
 440    .. method:: getroot()
 441
 442       Returns the root element for this tree.
 443
 444
 445    .. method:: iter(tag=None)
 446
 447       Creates and returns a tree iterator for the root element.  The iterator
 448       loops over all elements in this tree, in document order.  *tag* is the tag
 449       to look for (default is to return all elements).
 450
 451
 452    .. method:: iterfind(match)
 453
 454       Finds all matching subelements, by tag name or path.  Same as
 455       ``getroot().iterfind(match)``.  Returns an iterable yielding all matching
 456       elements in document order.
 457
 458       .. versionadded:: 2.7
 459
 460
 461    .. method:: parse(source, parser=None)
 462
 463       Loads an external XML section into this element tree.  *source* is a file
 464       name or file object.  *parser* is an optional parser instance.  If not
 465       given, the standard :class:`XMLParser` parser is used.  Returns the section
 466       root element.
 467
 468
 469    .. method:: write(file, encoding="us-ascii", xml_declaration=None, method="xml")
 470
 471       Writes the element tree to a file, as XML.  *file* is a file name, or a
 472       file object opened for writing.  *encoding* [1]_ is the output encoding
 473       (default is US-ASCII).  *xml_declaration* controls if an XML declaration
 474       should be added to the file.  Use ``False`` for never, ``True`` for always,
 475       ``None`` for only if not US-ASCII or UTF-8 (default is ``None``).  *method*
 476       is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).  This
 477       method writes to *file* and returns ``None``.
 478
 479 This is the XML file that is going to be manipulated::
 480
 481     <html>
 482         <head>
 483             <title>Example page</title>
 484         </head>
 485         <body>
 486             <p>Moved to <a href="http://example.org/">example.org</a>
 487             or <a href="http://example.com/">example.com</a>.</p>
 488         </body>
 489     </html>
 490
 491 Example of changing the attribute "target" of every link in the first paragraph::
 492
 493     >>> from xml.etree.ElementTree import ElementTree
 494     >>> tree = ElementTree()
 495     >>> tree.parse("index.xhtml")
 496     <Element 'html' at 0xb77e6fac>
 497     >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
 498     >>> p
 499     <Element 'p' at 0xb77ec26c>
 500     >>> links = list(p.iter("a"))   # Returns list of all links
 501     >>> links
 502     [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
 503     >>> for i in links:             # Iterates through all found links
 504     ...     i.attrib["target"] = "blank"
 505     >>> tree.write("output.xhtml")
 506
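The same kind of change can be sketched without touching the file system,
assuming the document is available as bytes in memory.  The file objects
below stand in for ``index.xhtml`` and ``output.xhtml``, and the explicit
UTF-8 encoding and XML declaration are choices made for this illustration.

```python
import io
import xml.etree.ElementTree as ET

document = (
    b"<html><head><title>Example page</title></head>"
    b"<body><p>Moved to <a href='http://example.org/'>example.org</a> "
    b"or <a href='http://example.com/'>example.com</a>.</p></body></html>"
)

tree = ET.ElementTree()
root = tree.parse(io.BytesIO(document))      # parse() returns the root element

paragraph = tree.find("body/p")              # first <p> inside <body>
for link in paragraph.iter("a"):
    link.set("target", "blank")

out = io.BytesIO()
tree.write(out, encoding="utf-8", xml_declaration=True)
result = out.getvalue()                      # serialized bytes, declaration first
```
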
 507 .. _elementtree-qname-objects:
 508
 509 QName Objects
 510 -------------
 511
 512
 513 .. class:: QName(text_or_uri, tag=None)
 514
 515    QName wrapper.  This can be used to wrap a QName attribute value, in order
 516    to get proper namespace handling on output.  *text_or_uri* is a string
 517    containing the QName value, in the form {uri}local, or, if the tag argument
 518    is given, the URI part of a QName.  If *tag* is given, the first argument is
 519    interpreted as a URI, and this argument is interpreted as a local name.
 520    :class:`QName` instances are opaque.
 521
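A minimal sketch of wrapping an attribute value in a :class:`QName`, so that
the serializer emits a prefixed name together with a matching namespace
declaration.  The URI and the names ``value``, ``type`` and ``kind`` are
hypothetical.

```python
import xml.etree.ElementTree as ET

TYPES = "http://example.org/types"           # hypothetical namespace URI
qname = ET.QName(TYPES, "kind")              # URI part plus local name
as_text = str(qname)                         # the {uri}local form

value = ET.Element("value")
value.set("type", qname)                     # the attribute value is a QName
serialized = ET.tostring(value).decode("ascii")
```

The prefix used in the output is chosen by the serializer unless one was
registered with :func:`register_namespace`.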
 522
 523 .. _elementtree-treebuilder-objects:
 524
 525 TreeBuilder Objects
 526 -------------------
 527
 528
 529 .. class:: TreeBuilder(element_factory=None)
 530
 531    Generic element structure builder.  This builder converts a sequence of
 532    start, data, and end method calls to a well-formed element structure.  You
 533    can use this class to build an element structure using a custom XML parser,
 534    or a parser for some other XML-like format.  The *element_factory* is called
 535    to create new :class:`Element` instances when given.
 536
 537
 538    .. method:: close()
 539
 540       Flushes the builder buffers, and returns the toplevel document
 541       element.  Returns an :class:`Element` instance.
 542
 543
 544    .. method:: data(data)
 545
 546       Adds text to the current element.  *data* is a string.  This should be
 547       either a bytestring, or a Unicode string.
 548
 549
 550    .. method:: end(tag)
 551
 552       Closes the current element.  *tag* is the element name.  Returns the
 553       closed element.
 554
 555
 556    .. method:: start(tag, attrs)
 557
 558       Opens a new element.  *tag* is the element name.  *attrs* is a dictionary
 559       containing element attributes.  Returns the opened element.
 560
 561
 562    In addition, a custom :class:`TreeBuilder` object can provide the
 563    following method:
 564
 565    .. method:: doctype(name, pubid, system)
 566
 567       Handles a doctype declaration.  *name* is the doctype name.  *pubid* is
 568       the public identifier.  *system* is the system identifier.  This method
 569       does not exist on the default :class:`TreeBuilder` class.
 570
 571       .. versionadded:: 2.7
 572
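The call sequence described above can be sketched by driving a
:class:`TreeBuilder` by hand, which is roughly what a parser does on its
behalf.  The tag names and the attribute are placeholders.

```python
import xml.etree.ElementTree as ET

builder = ET.TreeBuilder()
builder.start("root", {})                    # <root>
builder.data("hello")                        # character data inside <root>
builder.start("child", {"key": "value"})     # <child key="value">
builder.end("child")                         # </child>
builder.end("root")                          # </root>
root = builder.close()                       # the toplevel element
```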
 573
 574 .. _elementtree-xmlparser-objects:
 575
 576 XMLParser Objects
 577 -----------------
 578
 579
 580 .. class:: XMLParser(html=0, target=None, encoding=None)
 581
 582    :class:`Element` structure builder for XML source data, based on the expat
 583    parser.  *html* is a flag for predefining HTML entities.  It is not supported by
 584    the current implementation.  *target* is the target object.  If omitted, the
 585    builder uses an instance of the standard :class:`TreeBuilder` class.  *encoding* [1]_
 586    is optional.  If given, the value overrides the encoding specified in the
 587    XML file.
 588
 589
 590    .. method:: close()
 591
 592       Finishes feeding data to the parser.  Returns an element structure.
 593
 594
 595    .. method:: doctype(name, pubid, system)
 596
 597       .. deprecated:: 2.7
 598          Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
 599          target.
 600
 601
 602    .. method:: feed(data)
 603
 604       Feeds data to the parser.  *data* is encoded data.
 605
 606 :meth:`XMLParser.feed` calls *target*\'s :meth:`start` method
 607 for each opening tag, its :meth:`end` method for each closing tag,
 608 and its :meth:`data` method for character data.  :meth:`XMLParser.close`
 609 calls *target*\'s :meth:`close` method.
 610 :class:`XMLParser` is therefore not limited to building a tree structure.
 611 This is an example of computing the maximum depth of an XML file::
 612
 613     >>> from xml.etree.ElementTree import XMLParser
 614     >>> class MaxDepth:                     # The target object of the parser
 615     ...     maxDepth = 0
 616     ...     depth = 0
 617     ...     def start(self, tag, attrib):   # Called for each opening tag.
 618     ...         self.depth += 1
 619     ...         if self.depth > self.maxDepth:
 620     ...             self.maxDepth = self.depth
 621     ...     def end(self, tag):             # Called for each closing tag.
 622     ...         self.depth -= 1
 623     ...     def data(self, data):
 624     ...         pass            # We do not need to do anything with data.
 625     ...     def close(self):    # Called when all data has been parsed.
 626     ...         return self.maxDepth
 627     ...
 628     >>> target = MaxDepth()
 629     >>> parser = XMLParser(target=target)
 630     >>> exampleXml = """
 631     ... <a>
 632     ...   <b>
 633     ...   </b>
 634     ...   <b>
 635     ...     <c>
 636     ...       <d>
 637     ...       </d>
 638     ...     </c>
 639     ...   </b>
 640     ... </a>"""
 641     >>> parser.feed(exampleXml)
 642     >>> parser.close()
 643     4
 644
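Another sketch of the same target protocol, this time feeding the data in
several chunks.  ``TagCounter`` is a hypothetical target class written for
this illustration, and the chunk boundaries are arbitrary.

```python
from xml.etree.ElementTree import XMLParser

class TagCounter(object):
    """Hypothetical target that counts how often each tag is opened."""

    def __init__(self):
        self.counts = {}

    def start(self, tag, attrib):            # called for each opening tag
        self.counts[tag] = self.counts.get(tag, 0) + 1

    def end(self, tag):                      # called for each closing tag
        pass

    def data(self, data):                    # called for character data
        pass

    def close(self):                         # result of parser.close()
        return self.counts

parser = XMLParser(target=TagCounter())
# feed() may be called repeatedly; chunks need not end on tag boundaries.
for chunk in ("<a><b/>", "<b><c", "/></b></a>"):
    parser.feed(chunk)
counts = parser.close()
```

As in the example above, the value returned by :meth:`XMLParser.close` is
whatever the target's :meth:`close` method returns.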
 645
 646 .. rubric:: Footnotes
 647
 648 .. [#] The encoding string included in XML output should conform to the
 649    appropriate standards.  For example, "UTF-8" is valid, but "UTF8" is
 650    not.  See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
 651    and http://www.iana.org/assignments/character-sets.