:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

**Source code:** :source:`Lib/xml/etree/ElementTree.py`

The :class:`Element` type is a flexible container object, designed to store
hierarchical data structures in memory. The type can be described as a cross
between a list and a dictionary.

Each element has a number of properties associated with it:

* a tag, which is a string identifying what kind of data this element represents
  (the element type, in other words).

* a number of attributes, stored in a Python dictionary.

* a text string.

* an optional tail string.

* a number of child elements, stored in a Python sequence.

To create an element instance, use the :class:`Element` constructor or the
:func:`SubElement` factory function.

The :class:`ElementTree` class can be used to wrap an element structure, and
convert it from and to XML.

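As a minimal sketch of the properties listed above, the following builds a two-element tree and serializes it. The tag names, the ``id`` attribute, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Build a small tree: <catalog><book id="b1">Dune</book></catalog>
root = ET.Element("catalog")
book = ET.SubElement(root, "book", {"id": "b1"})
book.text = "Dune"   # text between the element's start tag and end tag
book.tail = "\n"     # text that follows the element's end tag

# Wrap the root so the structure can be written out or re-read as XML.
tree = ET.ElementTree(root)
xml_bytes = ET.tostring(tree.getroot())
```

Note that the tail string of ``book`` is serialized after its end tag, inside the parent element.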
A C implementation of this API is available as :mod:`xml.etree.cElementTree`.

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs. Fredrik Lundh's page is also the location of the development version of
:mod:`xml.etree.ElementTree`.

.. versionchanged:: 2.7
   The ElementTree API is updated to 1.3. For more information, see
   `Introducing ElementTree 1.3
   <http://effbot.org/zone/elementtree-13-intro.htm>`_.

.. _elementtree-functions:

Functions
---------

.. function:: Comment(text=None)

   Comment element factory. This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer.
   *text* is the comment string, which can be either a bytestring or a
   Unicode string. Returns an element instance representing a comment.

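A short sketch of how a comment element might be added to a tree and serialized. The element names and the comment text are made up for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.Element("config")
comment = ET.Comment("generated file, do not edit")
root.append(comment)
ET.SubElement(root, "option", name="debug")

# The comment is serialized inline, in order with the other children.
xml_bytes = ET.tostring(root)
```

The comment element's ``tag`` is the :func:`Comment` factory itself, which is one way to recognize comments while walking a tree.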
.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``. This function
   should be used for debugging only.

   The exact output format is implementation dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: fromstring(text)

   Parses an XML section from a string constant. Same as :func:`XML`. *text*
   is a string containing XML data. Returns an :class:`Element` instance.

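For instance, a document held in a string can be parsed and queried as sketched below. The ``order`` document is a made-up sample.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<order id="42"><item>tea</item><item>milk</item></order>')

order_id = root.get("id")                              # attribute lookup
items = [item.text for item in root.findall("item")]   # direct children only
```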
.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments. *sequence* is a
   list or other sequence containing XML data fragments. *parser* is an
   optional parser instance. If not given, the standard :class:`XMLParser`
   parser is used. Returns an :class:`Element` instance.

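The fragments do not need to align with element boundaries. In this sketch (sample data invented for illustration) one fragment even ends in the middle of an attribute value.

```python
import xml.etree.ElementTree as ET

# Fragments as they might arrive from a chunked read.
fragments = ['<log><entry level="in', 'fo">started</entry>', '</log>']
root = ET.fromstringlist(fragments)

entry = root.find("entry")
```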
.. function:: iselement(element)

   Checks if an object appears to be a valid element object. *element* is an
   element instance. Returns a true value if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or file object containing XML
   data. *events* is a list of events to report back. If omitted, only "end"
   events are reported. *parser* is an optional parser instance. If not
   given, the standard :class:`XMLParser` parser is used. Returns an
   :term:`iterator` providing ``(event, elem)`` pairs.

   :func:`iterparse` only guarantees that it has seen the ">"
   character of a starting tag when it emits a "start" event, so the
   attributes are defined, but the contents of the text and tail attributes
   are undefined at that point. The same applies to the element children;
   they may or may not be present.

   If you need a fully populated element, look for "end" events instead.

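The difference between "start" and "end" events can be sketched as follows. An in-memory ``io.BytesIO`` object stands in for a real file, and the sample document is invented for illustration.

```python
import io
import xml.etree.ElementTree as ET

source = io.BytesIO(b"<root><a>1</a><b>2</b></root>")

seen = []
for event, elem in ET.iterparse(source, events=("start", "end")):
    if event == "start":
        # Only the tag and attributes are reliable at this point.
        seen.append(("start", elem.tag))
    else:
        # At "end" the element is complete, so its text is safe to read.
        seen.append(("end", elem.tag, elem.text))
```

Reading ``elem.text`` only in the "end" branch follows the guarantee described above.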
.. function:: parse(source, parser=None)

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If
   not given, the standard :class:`XMLParser` parser is used. Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory. This factory function creates a special element that
   will be serialized as an XML processing instruction. *target* is a string
   containing the PI target. *text* is a string containing the PI contents, if
   given. Returns an element instance, representing a processing instruction.

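A small sketch of creating and serializing a processing instruction. The stylesheet target and contents are sample values, not something this module requires.

```python
import xml.etree.ElementTree as ET

pi = ET.ProcessingInstruction("xml-stylesheet",
                              'type="text/xsl" href="style.xsl"')

# The target and the contents are joined with a single space on output.
pi_bytes = ET.tostring(pi)
```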
.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix. The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix. *uri* is a namespace URI. Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 2.7

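The effect on serialization can be sketched like this. The ``ex`` prefix and the ``http://example.org/ns`` URI are placeholders chosen for the example.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/ns"          # hypothetical namespace URI
ET.register_namespace("ex", NS)       # global: affects all later output

root = ET.Element("{%s}item" % NS)
ET.SubElement(root, "{%s}part" % NS)

xml_bytes = ET.tostring(root)
```

Without the registration the serializer would fall back to a generated prefix for the same URI.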
.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory. This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *parent* is the parent element. *tag* is
   the subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword
   arguments. Returns an element instance.

.. function:: tostring(element, encoding="us-ascii", method="xml")

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). *method* is either ``"xml"``,
   ``"html"`` or ``"text"`` (default is ``"xml"``). Returns an encoded string
   containing the XML data.


.. function:: tostringlist(element, encoding="us-ascii", method="xml")

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). *method* is either ``"xml"``,
   ``"html"`` or ``"text"`` (default is ``"xml"``). Returns a list of encoded
   strings containing the XML data. It does not guarantee any specific
   sequence, except that ``"".join(tostringlist(element)) ==
   tostring(element)``.

   .. versionadded:: 2.7

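The relationship between the two serializers, and the effect of *method*, can be sketched as follows. The tiny ``a``/``b`` tree is illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.Element("a")
ET.SubElement(root, "b").text = "hi"

whole = ET.tostring(root)                     # one encoded string
parts = ET.tostringlist(root)                 # the same data in pieces
text_only = ET.tostring(root, method="text")  # markup dropped, text kept
```

How the data is split across ``parts`` is unspecified; only the joined result is guaranteed.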
.. function:: XML(text, parser=None)

   Parses an XML section from a string constant. This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element IDs to elements. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns a tuple containing an
   :class:`Element` instance and a dictionary.

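A brief sketch of the ``(element, dictionary)`` pair returned by :func:`XMLID`. The ``people`` document is sample data; elements are indexed by their ``id`` attribute.

```python
import xml.etree.ElementTree as ET

root, ids = ET.XMLID(
    '<people><person id="p1">Ada</person><person id="p2">Alan</person></people>'
)

ada = ids["p1"]   # direct lookup instead of searching the tree
```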
.. _elementtree-element-objects:

Element Objects
---------------

.. class:: Element(tag, attrib={}, **extra)

   Element class. This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *tag* is the element name. *attrib* is
   an optional dictionary, containing element attributes. *extra* contains
   additional attributes, given as keyword arguments.


   .. attribute:: tag

      A string identifying what kind of data this element represents (the
      element type, in other words).


   .. attribute:: text

      The *text* attribute can be used to hold additional data associated with
      the element. As the name implies this attribute is usually a string but
      may be any application-specific object. If the element is created from
      an XML file the attribute will contain any text found between the
      element's start tag and its first child element or end tag.


   .. attribute:: tail

      The *tail* attribute can be used to hold additional data associated with
      the element. This attribute is usually a string but may be any
      application-specific object. If the element is created from an XML file
      the attribute will contain any text found after the element's end tag and
      before the next tag.


   .. attribute:: attrib

      A dictionary containing the element's attributes. Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and
      create the dictionary only if someone asks for it. To take advantage of
      such implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


   .. method:: clear()

      Resets an element. This function removes all subelements, clears all
      attributes, and sets the text and tail attributes to ``None``.


   .. method:: get(key, default=None)

      Gets the element attribute named *key*.

      Returns the attribute value, or *default* if the attribute was not found.


   .. method:: items()

      Returns the element attributes as a sequence of (name, value) pairs. The
      attributes are returned in an arbitrary order.


   .. method:: keys()

      Returns the element's attribute names as a list. The names are returned
      in an arbitrary order.


   .. method:: set(key, value)

      Set the attribute *key* on the element to *value*.

   The following methods work on the element's children (subelements).


   .. method:: append(subelement)

      Adds the element *subelement* to the end of this element's internal list
      of subelements.


   .. method:: extend(subelements)

      Appends *subelements* from a sequence object with zero or more elements.
      Raises :exc:`AssertionError` if a subelement is not a valid object.

      .. versionadded:: 2.7


   .. method:: find(match)

      Finds the first subelement matching *match*. *match* may be a tag name
      or path. Returns an element instance or ``None``.


   .. method:: findall(match)

      Finds all matching subelements, by tag name or path. Returns a list
      containing all matching elements in document order.


   .. method:: findtext(match, default=None)

      Finds text for the first subelement matching *match*. *match* may be
      a tag name or path. Returns the text content of the first matching
      element, or *default* if no element was found. Note that if the matching
      element has no text content an empty string is returned.

   .. method:: getchildren()

      .. deprecated:: 2.7
         Use ``list(elem)`` or iteration.


   .. method:: getiterator(tag=None)

      .. deprecated:: 2.7
         Use method :meth:`Element.iter` instead.


   .. method:: insert(index, element)

      Inserts a subelement at the given position in this element.


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order. If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator. If the
      tree structure is modified during iteration, the result is undefined.

      .. versionadded:: 2.7


   .. method:: iterfind(match)

      Finds all matching subelements, by tag name or path. Returns an iterable
      yielding all matching elements in document order.

      .. versionadded:: 2.7


   .. method:: itertext()

      Creates a text iterator. The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 2.7


   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element. Do not
      call this method; use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element. Unlike the find\* methods this
      method compares elements based on the instance identity, not on tag value
      or contents.

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`__delitem__`, :meth:`__getitem__`,
   :meth:`__setitem__`, :meth:`__len__`.

   Caution: Elements with no subelements will test as ``False``. This behavior
   will change in future versions. Use a specific ``len(elem)`` or ``elem is
   None`` test instead. ::

     element = root.find('foo')

     if not element:  # careful!
         print("element not found, or element has no subelements")

     if element is None:
         print("element not found")

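The attribute and child methods of :class:`Element`, together with the explicit tests recommended in the caution, can be sketched as follows. All tag, attribute, and variable names here are invented for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root", {"version": "1"})
root.set("lang", "en")
attr_names = sorted(root.keys())        # key order is arbitrary, so sort it
fallback = root.get("missing", "n/a")   # default is used for absent attributes

first = ET.SubElement(root, "item")
first.text = "one"
second = ET.Element("item")
second.text = "two"
root.insert(0, second)                  # children are now: two, one
texts = list(root.itertext())

root.remove(first)                      # matched by identity, not by tag
remaining = [child.text for child in root.iter("item")]

# Truth-testing cannot tell "missing" from "present but childless",
# so use the explicit tests instead.
leaf = root.find("item")                # found, but has no subelements
absent = root.find("nothing")           # no such child
leaf_found = leaf is not None
leaf_child_count = len(leaf)
absent_found = absent is not None
```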
.. _elementtree-elementtree-objects:

ElementTree Objects
-------------------

.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class. This class represents an entire element
   hierarchy, and adds some extra support for serialization to and from
   standard XML.

   *element* is the root element. The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree. This discards the current
      contents of the tree, and replaces it with the given element. Use with
      care. *element* is an element instance.


   .. method:: find(match)

      Finds the first toplevel element matching *match*. *match* may be a tag
      name or path. Same as ``getroot().find(match)``. Returns the first
      matching element, or ``None`` if no element was found.


   .. method:: findall(match)

      Finds all matching subelements, by tag name or path. Same as
      ``getroot().findall(match)``. *match* may be a tag name or path. Returns
      a list containing all matching elements, in document order.


   .. method:: findtext(match, default=None)

      Finds the element text for the first toplevel element with given tag.
      Same as ``getroot().findtext(match)``. *match* may be a tag name or path.
      *default* is the value to return if the element was not found. Returns
      the text content of the first matching element, or the default value if
      no element was found. Note that if the element is found, but has no text
      content, this method returns an empty string.

   .. method:: getiterator(tag=None)

      .. deprecated:: 2.7
         Use method :meth:`ElementTree.iter` instead.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element. The iterator
      loops over all elements in this tree, in document order. *tag* is the tag
      to look for (default is to return all elements).


   .. method:: iterfind(match)

      Finds all matching subelements, by tag name or path. Same as
      ``getroot().iterfind(match)``. Returns an iterable yielding all matching
      elements in document order.

      .. versionadded:: 2.7


   .. method:: parse(source, parser=None)

      Loads an external XML section into this element tree. *source* is a file
      name or file object. *parser* is an optional parser instance. If not
      given, the standard :class:`XMLParser` parser is used. Returns the section
      root element.


   .. method:: write(file, encoding="us-ascii", xml_declaration=None, method="xml")

      Writes the element tree to a file, as XML. *file* is a file name, or a
      file object opened for writing. *encoding* [1]_ is the output encoding
      (default is US-ASCII). *xml_declaration* controls if an XML declaration
      should be added to the file. Use ``False`` for never, ``True`` for always,
      ``None`` for only if not US-ASCII or UTF-8 (default is ``None``). *method*
      is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).

This is the XML file that is going to be manipulated::

    <html>
      <head><title>Example page</title></head>
      <body><p>Moved to <a href="http://example.org/">example.org</a>
      or <a href="http://example.com/">example.com</a>.</p></body>
    </html>

Example of changing the attribute "target" of every link in the first paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")

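The same parse, modify, and write cycle can be sketched without touching the file system by using in-memory file objects. The ``io.BytesIO`` buffers and the shortened sample document are assumptions made for this example.

```python
import io
import xml.etree.ElementTree as ET

source = io.BytesIO(
    b'<html><body><p>Moved to <a href="http://example.org/">example.org</a>'
    b'</p></body></html>'
)
tree = ET.ElementTree(file=source)      # parse while constructing the tree

for link in tree.getroot().iter("a"):
    link.set("target", "blank")

out = io.BytesIO()
# Ask for an explicit declaration; by default UTF-8 output has none.
tree.write(out, encoding="utf-8", xml_declaration=True)
data = out.getvalue()
```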
.. _elementtree-qname-objects:

QName Objects
-------------

.. class:: QName(text_or_uri, tag=None)

   QName wrapper. This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output. *text_or_uri* is a string
   containing the QName value, in the form ``{uri}local``, or, if the *tag*
   argument is given, the URI part of a QName. If *tag* is given, the first
   argument is interpreted as a URI, and this argument is interpreted as a
   local name. :class:`QName` instances are opaque.

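A sketch of wrapping an attribute value in :class:`QName` so that the serializer emits a prefix and a matching namespace declaration. The URI and the names are placeholders.

```python
import xml.etree.ElementTree as ET

TYPES = "http://example.org/types"      # hypothetical namespace URI

qname = ET.QName(TYPES, "widget")       # URI plus local name
same = ET.QName("{%s}widget" % TYPES)   # equivalent single-argument form

item = ET.Element("item")
item.set("type", qname)                 # value is a QName, not a plain string

xml_bytes = ET.tostring(item)
```

The prefix used in the output is generated by the serializer unless one was registered for the URI.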
.. _elementtree-treebuilder-objects:

TreeBuilder Objects
-------------------

.. class:: TreeBuilder(element_factory=None)

   Generic element structure builder. This builder converts a sequence of
   start, data, and end method calls to a well-formed element structure. You
   can use this class to build an element structure using a custom XML parser,
   or a parser for some other XML-like format. The *element_factory*, when
   given, is called to create new :class:`Element` instances.


   .. method:: close()

      Flushes the builder buffers, and returns the toplevel document
      element. Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element. *data* is a string. This should be
      either a bytestring, or a Unicode string.


   .. method:: end(tag)

      Closes the current element. *tag* is the element name. Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element. *tag* is the element name. *attrs* is a dictionary
      containing element attributes. Returns the opened element.


   In addition, a custom :class:`TreeBuilder` object can provide the
   following method:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration. *name* is the doctype name. *pubid* is
      the public identifier. *system* is the system identifier. This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 2.7

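The builder can also be driven by hand, which is roughly what a parser does on its behalf. This sketch uses made-up tag names and text.

```python
import xml.etree.ElementTree as ET

builder = ET.TreeBuilder()
builder.start("greeting", {"lang": "en"})
builder.data("hello, ")                 # becomes the text of <greeting>
builder.start("name", {})
builder.data("world")                   # becomes the text of <name>
builder.end("name")
builder.end("greeting")

root = builder.close()                  # the toplevel element
```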
.. _elementtree-xmlparser-objects:

XMLParser Objects
-----------------

.. class:: XMLParser(html=0, target=None, encoding=None)

   :class:`Element` structure builder for XML source data, based on the expat
   parser. *html* is a flag for predefined HTML entities; it is not supported
   by the current implementation. *target* is the target object. If omitted,
   the builder uses an instance of the standard :class:`TreeBuilder` class.
   *encoding* [1]_ is optional. If given, the value overrides the encoding
   specified in the XML file.


   .. method:: close()

      Finishes feeding data to the parser. Returns an element structure.


   .. method:: doctype(name, pubid, system)

      .. deprecated:: 2.7
         Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
         target.


   .. method:: feed(data)

      Feeds data to the parser. *data* is encoded data.

   :meth:`XMLParser.feed` calls *target*\'s :meth:`start` method
   for each opening tag, its :meth:`end` method for each closing tag,
   and its :meth:`data` method for character data. :meth:`XMLParser.close`
   calls *target*\'s :meth:`close` method.
   :class:`XMLParser` can be used not only for building a tree structure.
   This is an example of counting the maximum depth of an XML file::

      >>> from xml.etree.ElementTree import XMLParser
      >>> class MaxDepth:                     # The target object of the parser
      ...     maxDepth = 0
      ...     depth = 0
      ...     def start(self, tag, attrib):   # Called for each opening tag.
      ...         self.depth += 1
      ...         if self.depth > self.maxDepth:
      ...             self.maxDepth = self.depth
      ...     def end(self, tag):             # Called for each closing tag.
      ...         self.depth -= 1
      ...     def data(self, data):
      ...         pass            # We do not need to do anything with data.
      ...     def close(self):    # Called when all data has been parsed.
      ...         return self.maxDepth
      ...
      >>> target = MaxDepth()
      >>> parser = XMLParser(target=target)
      >>> # Illustrative input: a small document nested four levels deep.
      >>> exampleXml = "<a><b></b><b><c><d></d></c></b></a>"
      >>> parser.feed(exampleXml)
      >>> parser.close()
      4

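As a further sketch of a custom target, the following collects tag names while the input is fed in arbitrary chunks. The ``TagCollector`` class and the sample chunks are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

class TagCollector(object):
    """Hypothetical target that records each opening tag in order."""

    def __init__(self):
        self.tags = []

    def start(self, tag, attrib):   # called for each opening tag
        self.tags.append(tag)

    def end(self, tag):             # called for each closing tag
        pass

    def data(self, data):           # called for character data
        pass

    def close(self):                # result of XMLParser.close()
        return self.tags

parser = ET.XMLParser(target=TagCollector())
# Chunk boundaries may fall anywhere, even inside a tag name.
for chunk in ("<doc><sec", "tion/><note>x</no", "te></doc>"):
    parser.feed(chunk)
tags = parser.close()
```

Because the target never builds elements, memory use stays independent of document size.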
.. rubric:: Footnotes

.. [1] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets.