14 * Crash when merging text nodes in ``element.remove()``.
16 * Crash in sax/target parser when reporting empty doctype.
31 * Crash when building an nsmap (Element property) with empty
34 * Crash due to race condition when errors (or user messages) occur
35 during threaded XSLT processing.
37 * XSLT stylesheet compilation could ignore compilation errors.
49 * ``lxml.html.tostring()`` gained new serialisation options
50 ``with_tail`` and ``doctype``.
55 * Fixed a crash when using ``iterparse()`` for HTML parsing and
56 requesting start events.
58 * Fixed parsing of more selectors in cssselect. Whitespace before
59 pseudo-elements and pseudo-classes is significant as it is a
60 descendant combinator.
61 "E :pseudo" should parse the same as "E \*:pseudo", not "E:pseudo".
64 * lxml.html.diff no longer raises an exception when hitting
65 'img' tags without 'src' attribute.
77 * ``lxml.objectify.deannotate()`` has a new boolean option
78 ``cleanup_namespaces`` to remove the objectify namespace
79 declarations (and generally clean up the namespace declarations)
80 after removing the type annotations.
82 * ``lxml.objectify`` gained its own ``SubElement()`` function as a
83 copy of ``etree.SubElement`` to avoid an otherwise redundant import
84 of ``lxml.etree`` on the user side.
89 * Fixed the "descendant" bug in cssselect a second time (after a first
90 fix in lxml 2.3.1). The previous change resulted in a serious
91 performance regression for the XPath based evaluation of the
92 translated expression. Note that this breaks the usage of some of
93 the generated XPath expressions as XSLT location paths that
94 previously worked in 2.3.1.
96 * Fixed parsing of some selectors in cssselect. Whitespace after combinators
97 ">", "+" and "~" is now correctly ignored. Previously it was parsed as
98 a descendant combinator. For example, "div> .foo" was parsed the same as
99 "div>* .foo" instead of "div>.foo". Patch by Simon Sapin.
111 * New option ``kill_tags`` in ``lxml.html.clean`` to remove specific
112 tags and their content (i.e. their whole subtree).
114 * ``pi.get()`` and ``pi.attrib`` on processing instructions to parse
115 pseudo-attributes from the text content of processing instructions.
117 * ``lxml.get_include()`` returns a list of include paths that can be
118 used to compile external C code against lxml.etree. This is
119 specifically required for statically linked lxml builds when code
120 needs to compile against the exact same header file versions as lxml
123 * ``Resolver.resolve_file()`` takes an additional option
124 ``close_file`` that configures whether the file(-like) object will be
125 closed after reading. By default, the file will be closed,
126 as the user is not expected to keep a reference to it.
131 * HTML cleaning didn't remove 'data:' links.
133 * The html5lib parser integration now uses the 'official'
134 implementation in html5lib itself, which makes it work with newer
135 releases of the library.
137 * In ``lxml.sax``, ``endElementNS()`` could incorrectly reject a plain
138 tag name when the corresponding start event inferred the same plain
139 tag name to be in the default namespace.
141 * When an open file-like object is passed into ``parse()`` or
142 ``iterparse()``, the parser will no longer close it after use. This
143 reverts a change in lxml 2.3 where all files would be closed. It is
144 the user's responsibility to properly close the file(-like) object,
147 * Assertion error in lxml.html.cleaner when discarding top-level elements.
149 * In lxml.cssselect, use the xpath 'A//B' (short for
150 'A/descendant-or-self::node()/B') instead of 'A/descendant::B' for
151 the css descendant selector ('A B'). This makes a few edge cases
152 like ``"div *:last-child"`` consistent with the selector behavior in
153 WebKit and Firefox, and makes more css expressions valid location
154 paths (for use in xsl:template match).
156 * In lxml.html, non-selected ``<option>`` tags no longer show up in the
157 collected form values.
159 * Adding/removing ``<option>`` values to/from a multiple select form
160 field properly selects them and unselects them.
165 * Static builds can specify the download directory with the
166 ``--download-dir`` option.
175 * When looking for children, ``lxml.objectify`` takes '{}tag' as
176 meaning an empty namespace, as opposed to the parent namespace.
181 * When finished reading from a file-like object, the parser
182 immediately calls its ``.close()`` method.
184 * When finished parsing, ``iterparse()`` immediately closes the input
187 * Work-around for libxml2 bug that can leave the HTML parser in a
188 non-functional state after parsing a severely broken document (fixed
191 * ``marque`` tag in HTML cleanup code is correctly named ``marquee``.
196 * Some public functions in the Cython-level C-API have more explicit
200 2.3beta1 (2010-09-06)
201 =====================
209 * Crash in newer libxml2 versions when moving elements between
210 documents that had attributes on replaced XInclude nodes.
212 * ``XMLID()`` function was missing the optional ``parser`` and
213 ``base_url`` parameters.
215 * Searching for wildcard tags in ``iterparse()`` was broken in Py3.
217 * ``lxml.html.open_in_browser()`` didn't work in Python 3 due to the
218 use of os.tempnam. It now takes an optional 'encoding' parameter.
224 2.3alpha2 (2010-07-24)
225 ======================
233 * Crash in XSLT when generating text-only result documents with a
234 stylesheet created in a different thread.
239 * ``repr()`` of Element objects shows the hex ID with leading 0x
240 (following ElementTree 1.3).
243 2.3alpha1 (2010-06-19)
244 ======================
249 * Keyword argument ``namespaces`` in ``lxml.cssselect.CSSSelector()``
250 to pass a prefix-to-namespace mapping for the selector.
252 * New function ``lxml.etree.register_namespace(prefix, uri)`` that
253 globally registers a namespace prefix for a namespace that newly
254 created Elements in that namespace will use automatically. Follows
257 * Support 'unicode' string name as encoding parameter in
258 ``tostring()``, following ElementTree 1.3.
260 * Support 'c14n' serialisation method in ``ElementTree.write()`` and
261 ``tostring()``, following ElementTree 1.3.
263 * The ElementPath expression syntax (``el.find*()``) was extended to
264 match the upcoming ElementTree 1.3 that will ship in the standard
265 library of Python 3.2/2.7. This includes extended support for
266 predicates as well as namespace prefixes (as known from XPath).
268 * During regular XPath evaluation, various EXSLT functions are
269 available within their namespace when using libxslt 1.1.26 or later.
271 * Support passing a readily configured logger instance into
272 ``PyErrorLog``, instead of a logger name.
274 * On serialisation, the new ``doctype`` parameter can be used to
275 override the DOCTYPE (internal subset) of the document.
277 * New parameter ``output_parent`` to ``XSLTExtension.apply_templates()``
278 to append the resulting content directly to an output element.
280 * ``XSLTExtension.process_children()`` to process the content of the
281 XSLT extension element itself.
283 * ISO-Schematron support based on the de-facto Schematron reference
284 'skeleton implementation'.
286 * XSLT objects now take XPath object as ``__call__`` stylesheet
289 * Enable path caching in ElementPath (``el.find*()``) to avoid parsing
292 * Setting the value of a namespaced attribute always uses a prefixed
293 namespace instead of the default namespace even if both declare the
294 same namespace URI. This avoids serialisation problems when an
295 attribute from a default namespace is set on an element from a
298 * XSLT extension elements: support for XSLT context nodes other than
299 elements: document root, comments, processing instructions.
301 * Support for strings (in addition to Elements) in node-sets returned
302 by extension functions.
304 * Forms that lack an ``action`` attribute default to the base URL of
305 the document on submit.
307 * XPath attribute result strings have an ``attrname`` property.
309 * Namespace URIs get validated against RFC 3986 at the API level
310 (required by the XML namespace specification).
312 * Target parsers show their target object in the ``.target`` property
313 (compatible with ElementTree).
318 * API is hardened against invalid proxy instances to prevent crashes
319 due to incorrectly instantiated Element instances.
321 * Prevent crash when instantiating ``CommentBase`` and friends.
323 * Export ElementTree compatible XML parser class as
324 ``XMLTreeBuilder``, as it is called in ET 1.2.
326 * ObjectifiedDataElements in lxml.objectify were not hashable. They
327 now use the hash value of the underlying Python value (string,
328 number, etc.) to which they compare equal.
330 * Parsing broken fragments in lxml.html could fail if the fragment
331 contained an orphaned closing '</div>' tag.
333 * Using XSLT extension elements around the root of the output document
336 * ``lxml.cssselect`` did not distinguish between ``x[attr="val"]`` and
337 ``x [attr="val"]`` (with a space). The latter now matches the
338 attribute independent of the element.
340 * Rewriting multiple links inside of HTML text content could end up
341 replacing unrelated content as replacements could impact the
342 reported position of subsequent matches. Modifications are now
343 simplified by letting the ``iterlinks()`` generator in ``lxml.html``
344 return links in reversed order if they appear inside the same text
345 node. Thus, replacements and link-internal modifications no longer
346 change the position of links reported afterwards.
348 * The ``.value`` attribute of ``textarea`` elements in lxml.html did
349 not represent the complete raw value (including child tags etc.). It
350 now serialises the complete content on read and replaces the
351 complete content by a string on write.
353 * Target parser didn't call ``.close()`` on the target object if
354 parsing failed. Now it is guaranteed that ``.close()`` will be
355 called after parsing, regardless of the outcome.
360 * Official support for Python 3.1.2 and later.
362 * Static MS Windows builds can now download their dependencies
365 * ``Element.attrib`` no longer uses a cyclic reference back to its
366 Element object. It therefore no longer requires the garbage
367 collector to clean up.
369 * Static builds include libiconv, in addition to libxml2 and libxslt.
378 * Crash in newer libxml2 versions when moving elements between
379 documents that had attributes on replaced XInclude nodes.
381 * Import fix for urljoin in Python 3.1+.
390 * Crash in XSLT when generating text-only result documents with a
391 stylesheet created in a different thread.
400 * Fixed several Python 3 regressions by building with Cython 0.11.3.
409 * Support for running XSLT extension elements on the input root node
410 (e.g. in a template matching on "/").
415 * Crash in XPath evaluation when reading smart strings from a document
416 other than the original context document.
418 * Support recent versions of html5lib by not requiring its
419 ``XHTMLParser`` in ``htmlparser.py`` anymore.
421 * Manually instantiating the custom element classes in
422 ``lxml.objectify`` could crash.
424 * Invalid XML text characters were not rejected by the API when they
425 appeared in unicode strings directly after non-ASCII characters.
427 * lxml.html.open_http_urllib() did not work in Python 3.
429 * The functions ``strip_tags()`` and ``strip_elements()`` in
430 ``lxml.etree`` did not remove all occurrences of a tag in all cases.
432 * Crash in XSLT extension elements when the XSLT context node is not
442 * Static build of libxml2/libxslt was broken.
454 * The ``resolve_entities`` option did not work in the incremental feed
457 * Looking up and deleting attributes without a namespace could hit a
458 namespaced attribute of the same name instead.
460 * Late errors during calls to ``SubElement()`` (e.g. attribute related
461 ones) could leave a partially initialised element in the tree.
463 * Modifying trees that contain parsed entity references could result
466 * ``ObjectifiedElement.__setattr__`` created an empty-string child element when the
467 attribute value was rejected as a non-unicode/non-ascii string
469 * Syntax errors in ``lxml.cssselect`` could result in misleading error
472 * Invalid syntax in CSS expressions could lead to an infinite loop in
473 the parser of ``lxml.cssselect``.
475 * CSS special character escapes were not properly handled in
478 * CSS Unicode escapes were not properly decoded in ``lxml.cssselect``.
480 * Select options in HTML forms that had no explicit ``value``
481 attribute were not handled correctly. The HTML standard dictates
482 that their value is defined by their text content. This is now
483 supported by lxml.html.
485 * XPath raised a TypeError when finding CDATA sections. This is now
488 * Calling ``help(lxml.objectify)`` didn't work at the prompt.
490 * The ``ElementMaker`` in lxml.objectify no longer defines the default
491 namespaces when annotation is disabled.
493 * Feed parser failed to honour the 'recover' option on parse errors.
495 * Diverting the error logging to Python's logging system was broken.
507 * New helper functions ``strip_attributes()``, ``strip_elements()``,
508 ``strip_tags()`` in lxml.etree to remove attributes/subtrees/tags
514 * Namespace cleanup on subtree insertions could result in missing
515 namespace declarations (and potentially crashes) if the element
516 defining a namespace was deleted and the namespace was not used by
517 the top element of the inserted subtree but only in deeper subtrees.
519 * Raising an exception from a parser target callback didn't always
520 terminate the parser.
522 * Only {true, false, 1, 0} are accepted as the lexical representation for
523 BoolElement ({True, False, T, F, t, f} not any more), restoring lxml <= 2.0
536 * Injecting default attributes into a document during XML Schema
537 validation (also at parse time).
539 * Pass ``huge_tree`` parser option to disable parser security
540 restrictions imposed by libxml2 2.7.
545 * The script for statically building libxml2 and libxslt didn't work
548 * ``XMLSchema()`` also passes invalid schema documents on to libxml2
549 for parsing (which could lead to a crash before release 2.6.24).
561 * Support for ``standalone`` flag in XML declaration through
562 ``tree.docinfo.standalone`` and by passing ``standalone=True/False``
568 * Crash when parsing an XML Schema with external imports from a
572 2.2beta4 (2009-02-27)
573 =====================
578 * Support strings and instantiable Element classes as child arguments
579 to the constructor of custom Element classes.
581 * GZip compression support for serialisation to files and file-like
587 * Deep-copying an ElementTree copied neither its sibling PIs and
588 comments nor its internal/external DTD subsets.
590 * Soupparser failed on broken attributes without values.
592 * Crash in XSLT when overwriting an already defined attribute using
595 * Crash bug in exception handling code under Python 3. This was due
596 to a problem in Cython, not lxml itself.
598 * ``lxml.html.FormElement._name()`` failed for non top-level forms.
600 * ``TAG`` special attribute in constructor of custom Element classes
601 was evaluated incorrectly.
606 * Official support for Python 3.0.1.
608 * ``Element.findtext()`` now returns an empty string instead of None
609 for Elements without text content.
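The ``findtext()`` change separates "element present but without text" from "element missing". A minimal sketch with an illustrative document:

```python
from lxml import etree

root = etree.fromstring("<root><empty/><full>data</full></root>")

empty_text = root.findtext("empty")                   # element exists, no text
full_text = root.findtext("full")
missing = root.findtext("missing")                    # no such element
missing_default = root.findtext("missing", default="n/a")
```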
612 2.2beta3 (2009-02-17)
613 =====================
618 * ``XSLT.strparam()`` class method to wrap quoted string parameters
619 that require escaping.
624 * Memory leak in XPath evaluators.
626 * Crash when parsing indented XML in one thread and merging it with
627 other documents parsed in another thread.
629 * Setting the ``base`` attribute in ``lxml.objectify`` from a unicode
632 * Fixes for changes in Python 3.0.1.
634 * Minor fixes for Python 3.
639 * The global error log (which is copied into the exception log) is now
640 local to a thread, which fixes some race conditions.
642 * More robust error handling on serialisation.
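Plain Python strings passed as XSLT parameters are evaluated as XPath expressions, so literal text must be quoted; ``XSLT.strparam()`` handles that quoting, even when the value contains both quote characters. The stylesheet and the parameter name ``greeting`` below are illustrative assumptions:

```python
from lxml import etree

# Illustrative stylesheet that copies one string parameter into the output.
stylesheet = etree.XML(
    '<xsl:stylesheet version="1.0"'
    ' xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    '<xsl:param name="greeting"/>'
    '<xsl:template match="/">'
    '<out><xsl:value-of select="$greeting"/></out>'
    '</xsl:template>'
    '</xsl:stylesheet>')
transform = etree.XSLT(stylesheet)

# strparam() marks the value as a literal string rather than an XPath expression.
value = "it's \"quoted\""
result = transform(etree.XML("<doc/>"), greeting=etree.XSLT.strparam(value))
output_text = result.getroot().text
```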
645 2.2beta2 (2009-01-25)
646 =====================
651 * Potential memory leak on exception handling. This was due to a
652 problem in Cython, not lxml itself.
654 * ``iter_links`` (and related link-rewriting functions) in
655 ``lxml.html`` would interpret CSS like ``url("link")`` incorrectly
656 (treating the quotation marks as part of the link).
658 * Failing import on systems that have an ``io`` module.
667 * Potential memory leak on exception handling. This was due to a
668 problem in Cython, not lxml itself.
670 * Failing import on systems that have an ``io`` module.
673 2.2beta1 (2008-12-12)
674 =====================
679 * Allow ``lxml.html.diff.htmldiff`` to accept Element objects, not
685 * Crash when using an XPath evaluator in multiple threads.
687 * Fixed missing whitespace before ``Link:...`` in ``lxml.html.diff``.
692 * Export ``lxml.html.parse``.
701 * Crash when using an XPath evaluator in multiple threads.
710 * Crash when using an XPath evaluator in multiple threads.
713 2.2alpha1 (2008-11-23)
714 ======================
719 * Support for XSLT result tree fragments in XPath/XSLT extension
722 * QName objects have new properties ``namespace`` and ``localname``.
724 * New options for exclusive C14N and C14N without comments.
726 * Instantiating a custom Element class creates a new Element.
731 * XSLT didn't inherit the parse options of the input document.
733 * 0-bytes could slip through the API when used inside of Unicode
736 * With ``lxml.html.clean.autolink``, links with balanced parentheses
737 that end in a parenthesis will be linked in their entirety (typical
738 with Wikipedia links).
753 * Ref-count leaks when lxml enters a try-except statement while an
754 outside exception lives in sys.exc_*(). This was due to a problem in
755 Cython, not lxml itself.
757 * Parser Unicode decoding errors could get swallowed by other
760 * Name/import errors in some Python modules.
762 * Internal DTD subsets that did not specify a system or public ID were
763 not serialised and did not appear in the docinfo property of
766 * Fix a pre-Py3k warning when parsing from a gzip file in Py2.6.
768 * Test suite fixes for libxml2 2.7.
770 * Resolver.resolve_string() did not work for non-ASCII byte strings.
772 * Resolver.resolve_file() was broken.
774 * Overriding the parser encoding didn't work for many encodings.
786 * Ref-count leaks when lxml enters a try-except statement while an
787 outside exception lives in sys.exc_*(). This was due to a problem in
788 Cython, not lxml itself.
797 * lxml.etree now tries to find the absolute path name of files when
798 parsing from a file-like object. This helps custom resolvers when
799 resolving relative URLs, as libxml2 can prepend them with the path
800 of the source document.
805 * Memory problem when passing documents between threads.
807 * Target parser did not honour the ``recover`` option and raised an
808 exception instead of calling ``.close()`` on the target.
820 * Memory problem when passing documents between threads.
822 * Target parser did not honour the ``recover`` option and raised an
823 exception instead of calling ``.close()`` on the target.
835 * Crash when parsing XSLT stylesheets in a thread and using them in
838 * Encoding problem when including text with ElementInclude under
851 * ``lxml.html.rewrite_links()`` strips links to work around documents
852 with whitespace in URL attributes.
857 * Crash when parsing XSLT stylesheets in a thread and using them in
860 * CSS selector parser dropped remaining expression after a function
873 * Smart strings can be switched off in XPath (``smart_strings``
876 * ``lxml.html.rewrite_links()`` strips links to work around documents
877 with whitespace in URL attributes.
882 * Custom resolvers were not used for XMLSchema includes/imports and
885 * CSS selector parser dropped remaining expression after a function
891 * ``objectify.enableRecursiveStr()`` was removed, use
892 ``objectify.enable_recursive_str()`` instead.
894 * Speed-up when running XSLTs on documents from other threads
903 * Pickling ``ElementTree`` objects in lxml.objectify.
908 * Descending dot-separated classes in CSS selectors were not resolved
911 * ``ElementTree.parse()`` didn't handle target parser result.
913 * Potential threading problem in XInclude.
915 * Crash in Element class lookup classes when the __init__() method of
916 the super class is not called from Python subclasses.
921 * Non-ASCII characters in attribute values are no longer escaped on
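Two small API points from the entries above, sketched with illustrative inputs: the ``namespace`` and ``localname`` properties of ``QName``, and switching off smart strings in XPath with ``smart_strings=False`` so that plain ``str`` objects are returned.

```python
from lxml import etree

# QName splits a {namespace}localname tag into its parts.
qname = etree.QName("{urn:example}item")
ns_uri, local = qname.namespace, qname.localname

root = etree.fromstring('<root><item id="a"/></root>')
smart = root.xpath("//@id")[0]                          # smart string (default)
plain = root.xpath("//@id", smart_strings=False)[0]     # plain str
```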
925 2.1beta3 (2008-06-19)
926 =====================
931 * Major overhaul of ``tools/xpathgrep.py`` script.
933 * Pickling ``ElementTree`` objects in lxml.objectify.
935 * Support for parsing from file-like objects that return unicode
938 * New function ``etree.cleanup_namespaces(el)`` that removes unused
939 namespace declarations from a (sub)tree (experimental).
941 * XSLT results support the buffer protocol in Python 3.
943 * Polymorphic functions in ``lxml.html`` that accept either a tree or
944 a parsable string will return either a UTF-8 encoded byte string, a
945 unicode string or a tree, based on the type of the input.
946 Previously, the result was always a byte string or a tree.
948 * Support for Python 2.6 and 3.0 beta.
950 * File name handling now uses a heuristic to convert between byte
951 strings (usually filenames) and unicode strings (usually URLs).
953 * Parsing from a plain file object frees the GIL under Python 2.x.
955 * Running ``iterparse()`` on a plain file (or filename) frees the GIL
956 on reading under Python 2.x.
958 * Conversion functions ``html_to_xhtml()`` and ``xhtml_to_html()`` in
959 lxml.html (experimental).
961 * Most features in lxml.html work for XHTML namespaced tag names
967 * ``ElementTree.parse()`` didn't handle target parser result.
969 * Crash in Element class lookup classes when the __init__() method of
970 the super class is not called from Python subclasses.
972 * A number of problems related to unicode/byte string conversion of
973 filenames and error messages were fixed.
975 * Building on MacOS-X now passes the "flat_namespace" option to the C
976 compiler, which reportedly prevents build quirks and crashes on this
979 * Windows build was broken.
981 * Rare crash when serialising to a file object with certain encodings.
986 * Non-ASCII characters in attribute values are no longer escaped on
989 * Passing non-ASCII byte strings or invalid unicode strings as .tag,
990 namespaces, etc. will result in a ValueError instead of an
991 AssertionError (just like the tag well-formedness check).
993 * Up to several times faster attribute access (i.e. tree traversal) in
1006 * Incorrect evaluation of ``el.find("tag[child]")``.
1008 * Windows build was broken.
1010 * Moving a subtree from a document created in one thread into a
1011 document of another thread could crash when the rest of the source
1012 document is deleted while the subtree is still in use.
1014 * Rare crash when serialising to a file object with certain encodings.
1019 * lxml should now build without problems on MacOS-X.
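``cleanup_namespaces()`` removes declarations that nothing in the subtree uses and keeps the ones that are still referenced. A minimal sketch with made-up namespace URIs:

```python
from lxml import etree

root = etree.fromstring(
    '<root xmlns:unused="urn:unused" xmlns:used="urn:used"><used:child/></root>')
etree.cleanup_namespaces(root)   # drops xmlns:unused, keeps xmlns:used
result = etree.tostring(root)
```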
1022 2.1beta2 (2008-05-02)
1023 =====================
1028 * All parse functions in lxml.html take a ``parser`` keyword argument.
1030 * lxml.html has a new parser class ``XHTMLParser`` and a module
1031 attribute ``xhtml_parser`` that provide XML parsers that are
1032 pre-configured for the lxml.html package.
1037 * Moving a subtree from a document created in one thread into a
1038 document of another thread could crash when the rest of the source
1039 document is deleted while the subtree is still in use.
1041 * Passing an nsmap when creating an Element will no longer strip
1042 redundantly defined namespace URIs. This prevented the definition
1043 of more than one prefix for a namespace on the same Element.
1048 * If the default namespace is redundantly defined with a prefix on the
1049 same Element, the prefix will now be preferred for subelements and
1050 attributes. This allows users to work around a problem in libxml2
1051 where attributes from the default namespace could serialise without
1052 a prefix even when they appear on an Element with a different
1053 namespace (i.e. they would end up in the wrong namespace).
1065 * Resolving to a filename in custom resolvers didn't work.
1067 * lxml did not honour libxslt's second error state "STOPPED", which
1068 let some XSLT errors pass silently.
1070 * Memory leak in Schematron with libxml2 >= 2.6.31.
1076 2.1beta1 (2008-04-15)
1077 =====================
1082 * Error logging in Schematron (requires libxml2 2.6.32 or later).
1084 * Parser option ``strip_cdata`` for normalising or keeping CDATA
1085 sections. Defaults to ``True`` as before, thus replacing CDATA
1086 sections by their text content.
1088 * ``CDATA()`` factory to wrap string content as CDATA section.
1093 * Resolving to a filename in custom resolvers didn't work.
1095 * lxml did not honour libxslt's second error state "STOPPED", which
1096 let some XSLT errors pass silently.
1098 * Memory leak in Schematron with libxml2 >= 2.6.31.
1100 * lxml.etree accepted non well-formed namespace prefix names.
1105 * Major cleanup in internal ``moveNodeToDocument()`` function, which
1106 takes care of namespace cleanup when moving elements between
1107 different namespace contexts.
1109 * New Elements created through the ``makeelement()`` method of an HTML
1110 parser or through lxml.html now end up in a new HTML document
1111 (doctype HTML 4.01 Transitional) instead of a generic XML document.
1112 This mostly impacts the serialisation and the availability of a DTD
1125 * Hanging thread in conjunction with GTK threading.
1127 * Crash bug in iterparse when moving elements into other documents.
1129 * HTML elements' ``.cssselect()`` method was broken.
1131 * ``ElementTree.find*()`` didn't accept QName objects.
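The ``strip_cdata`` parser option and the ``CDATA()`` factory from the 2.1beta1 entries, sketched with illustrative content:

```python
from lxml import etree

xml = "<r><![CDATA[a < b]]></r>"

# Default (strip_cdata=True): the CDATA section becomes plain, escaped text.
stripped = etree.tostring(etree.fromstring(xml))
# strip_cdata=False keeps the CDATA section through a parse/serialise cycle.
kept = etree.tostring(etree.fromstring(xml, etree.XMLParser(strip_cdata=False)))

# CDATA() wraps new text content so that it is serialised as a CDATA section.
code = etree.Element("code")
code.text = etree.CDATA("if (x < y) {}")
wrapped = etree.tostring(code)
```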
1137 2.1alpha1 (2008-03-27)
1138 ======================
1143 * New event types 'comment' and 'pi' in ``iterparse()``.
1145 * ``XSLTAccessControl`` instances have a property ``options`` that
1146 returns a dict of access configuration options.
1148 * Constant instances ``DENY_ALL`` and ``DENY_WRITE`` on
1149 ``XSLTAccessControl`` class.
1151 * Extension elements for XSLT (experimental!)
1153 * ``Element.base`` property returns the xml:base or HTML base URL of
1156 * ``docinfo.URL`` property is writable.
1161 * Default encoding for plain text serialisation was different from
1162 that of XML serialisation (UTF-8 instead of ASCII).
1167 * Minor API speed-ups.
1169 * The benchmark suite now uses tail text in the trees, which makes the
1170 absolute numbers incomparable to previous results.
1172 * Generating the HTML documentation now requires Pygments_, which is
1173 used to enable syntax highlighting for the doctest examples.
1175 .. _Pygments: http://pygments.org/
1177 Most long-time deprecated functions and methods were removed:
1179 - ``etree.clearErrorLog()``, use ``etree.clear_error_log()``
1181 - ``etree.useGlobalPythonLog()``, use
1182 ``etree.use_global_python_log()``
1184 - ``etree.ElementClassLookup.setFallback()``, use
1185 ``etree.ElementClassLookup.set_fallback()``
1187 - ``etree.getDefaultParser()``, use ``etree.get_default_parser()``
1189 - ``etree.setDefaultParser()``, use ``etree.set_default_parser()``
1191 - ``etree.setElementClassLookup()``, use
1192 ``etree.set_element_class_lookup()``
1194 Note that ``parser.setElementClassLookup()`` has not been removed
1195 yet, although ``parser.set_element_class_lookup()`` should be used
1198 - ``xpath_evaluator.registerNamespace()``, use
1199 ``xpath_evaluator.register_namespace()``
1201 - ``xpath_evaluator.registerNamespaces()``, use
1202 ``xpath_evaluator.register_namespaces()``
1204 - ``objectify.setPytypeAttributeTag``, use
1205 ``objectify.set_pytype_attribute_tag``
1207 - ``objectify.setDefaultParser()``, use
1208 ``objectify.set_default_parser()``
1217 * soupparser.parse() allows passing keyword arguments on to
1220 * ``fromstring()`` method in ``lxml.html.soupparser``.
1225 * ``lxml.html.diff`` didn't treat empty tags properly (e.g.,
1228 * Handle entity replacements correctly in target parser.
1230 * Crash when using ``iterparse()`` with XML Schema validation.
1232 * The BeautifulSoup parser (soupparser.py) did not replace entities,
1233 which made them turn up in text content.
1235 * Attribute assignment of custom PyTypes in objectify could fail to
1236 correctly serialise the value to a string.
1241 * ``lxml.html.ElementSoup`` was replaced by a new module
1242 ``lxml.html.soupparser`` with a more consistent API. The old module
1243 remains for compatibility with ElementTree's own ElementSoup module.
1245 * Setting the XSLT_CONFIG and XML2_CONFIG environment variables at
1246 build time will let setup.py pick up the ``xml2-config`` and
1247 ``xslt-config`` scripts from the supplied path name.
1249 * Passing ``--with-xml2-config=/path/to/xml2-config`` to setup.py will
1250 override the ``xml2-config`` script that is used to determine the C
1251 compiler options. The same applies for the ``--with-xslt-config``
1261 * Support passing ``base_url`` to file parser functions to override
1262 the filename of the file(-like) object.
1267 * The prefix for objectify's pytype namespace was missing from the set
1268 of default prefixes.
1270 * Memory leak in Schematron (fixed only for libxml2 2.6.31+).
1272 * Error type names in RelaxNG were reported incorrectly.
1274 * Slice deletion bug fixed in objectify.
1279 * Enabled doctests for some Python modules (especially ``lxml.html``).
1281 * Add a ``method`` argument to ``lxml.html.tostring()``
1282 (``method="xml"`` for XHTML output).
1284 * Make it clearer that methods like ``lxml.html.fromstring()`` take a
1285 ``base_url`` argument.
1294 * Child iteration in ``lxml.pyclasslookup``.
1296 * Loads of new docstrings reflect the signature of functions and
1297 methods to make them visible in API docs and ``help()``.
1302 * The module ``lxml.html.builder`` was duplicated as
1303 ``lxml.htmlbuilder``.
1305 * Form elements would return None for ``form.fields.keys()`` if there
1306 was an unnamed input field. Now unnamed input fields are completely
1309 * Setting an element slice in objectify could insert slice-overlapping
1310 elements at the wrong position.
1315 * The generated API documentation was cleaned up and stripped of
1316 non-public classes etc.
1318 * The previously public module ``lxml.html.setmixin`` was renamed to
1319 ``lxml.html._setmixin`` as it is not an official part of lxml. If
1320 you want to use it, feel free to copy it over to your own source
1323 * Passing ``--with-xslt-config=/path/to/xslt-config`` to setup.py will
1324 override the ``xslt-config`` script that is used to determine the C
1334 * Passing the ``unicode`` type as ``encoding`` to ``tostring()`` will
1335 serialise to unicode. The ``tounicode()`` function is now
1338 * ``XMLSchema()`` and ``RelaxNG()`` can parse from StringIO.
1340 * ``makeparser()`` function in ``lxml.objectify`` to create a new
1341 parser with the usual objectify setup.
1343 * Plain ASCII XPath string results are no longer forced into unicode
1344 objects as in 2.0beta1, but are returned as plain strings as before.
1346 * All XPath string results are 'smart' objects that have a
1347 ``getparent()`` method to retrieve their parent Element.
1349 * ``with_tail`` option in serialiser functions.
1351 * More accurate exception messages in validator creation.
1353 * Parse-time XML schema validation (``schema`` parser keyword).
1355 * XPath string results of the ``text()`` function and attribute
1356 selection make their Element container accessible through a
1357 ``getparent()`` method. As a side-effect, they are now always
1358 unicode objects (even ASCII strings).
1360 * ``XSLT`` objects are usable in any thread - at the cost of a deep
1361 copy if they were not created in that thread.
1363 * Invalid entity names and character references will be rejected by
1364 the ``Entity()`` factory.
1366 * ``entity.text`` returns the textual representation of the entity,
1369 * New properties ``position`` and ``code`` on ParseError exception (as
1372 * Rich comparison of ``element.attrib`` proxies.
1374 * ElementTree compatible TreeBuilder class.
1376 * Use default prefixes for some common XML namespaces.
1378 * ``lxml.html.clean.Cleaner`` now allows for a ``host_whitelist``, and
1379 two overridable methods: ``allow_embedded_url(el, url)`` and the
1380 more general ``allow_element(el)``.
1382 * Extended slicing of Elements as in ``element[1:-1:2]``, both in
1383 etree and in objectify
1385 * Resolvers can now provide a ``base_url`` keyword argument when
1386 resolving a document as string data.
1388 * When using ``lxml.doctestcompare`` you can give the doctest option
1389 ``NOPARSE_MARKUP`` (like ``# doctest: +NOPARSE_MARKUP``) to suppress
1390 the special checking for one test.
1392 * Separate ``feed_error_log`` property for the feed parser interface.
1393 The normal parser interface and ``iterparse`` continue to use
1396 * The normal parsers and the feed parser interface are now separated
1397 and can be used concurrently on the same parser instance.
1399 * ``fromstringlist()`` and ``tostringlist()`` functions as in
1402 * ``iterparse()`` accepts an ``html`` boolean keyword argument for
1403 parsing with the HTML parser (note that this interface may be
1406 * Parsers accept an ``encoding`` keyword argument that overrides the encoding
1407 of the parsed documents.
1409 * New C-API function ``hasChild()`` to test for children
1411 * ``annotate()`` function in objectify can annotate with Python types and XSI
1412 types in one step. Accompanied by ``xsiannotate()`` and ``pyannotate()``.
1414 * ``ET.write()``, ``tostring()`` and ``tounicode()`` now accept a keyword
1415 argument ``method`` that can be one of 'xml' (or None), 'html' or 'text' to
1416 serialise as XML, HTML or plain text content.
1418 * ``iterfind()`` method on Elements returns an iterator equivalent to
1421 * ``itertext()`` method on Elements
1423 * Setting a QName object as value of the .text property or as an attribute
1424 will resolve its prefix in the respective context
1426 * ElementTree-like parser target interface as described in
1427 http://effbot.org/elementtree/elementtree-xmlparser.htm
1429 * ElementTree-like feed parser interface on XMLParser and HTMLParser
1430 (``feed()`` and ``close()`` methods)
1432 * Reimplemented ``objectify.E`` for better performance and improved
1433 integration with objectify. Provides extended type support based on
1436 * XSLT objects now support deep copying
1438 * New ``makeSubElement()`` C-API function that allows creating a new
1439 subelement straight with text, tail and attributes.
1441 * XPath extension functions can now access the current context node
1442 (``context.context_node``) and use a context dictionary
1443 (``context.eval_context``) from the context provided in their first
1446 * HTML tag soup parser based on BeautifulSoup in ``lxml.html.ElementSoup``
1448 * New module ``lxml.doctestcompare`` by Ian Bicking for writing simplified
1449 doctests based on XML/HTML output. Use by importing ``lxml.usedoctest`` or
1450 ``lxml.html.usedoctest`` from within a doctest.
1452 * New module ``lxml.cssselect`` by Ian Bicking for selecting Elements with CSS
1455 * New package ``lxml.html`` written by Ian Bicking for advanced HTML
1458 * Namespace class setup is now local to the ``ElementNamespaceClassLookup``
1459 instance and no longer global.
1461 * Schematron validation (incomplete in libxml2)
1463 * Additional ``stringify`` argument to ``objectify.PyType()`` takes a
1464 conversion function to strings to support setting text values from arbitrary
1467 * Entity support through an ``Entity`` factory and element classes. XML
1468 parsers now have a ``resolve_entities`` keyword argument that can be set to
1469 False to keep entities in the document.
1471 * ``column`` field on error log entries to accompany the ``line`` field
1473 * Error specific messages in XPath parsing and evaluation
1474 NOTE: for evaluation errors, you will now get an XPathEvalError instead of
1475 an XPathSyntaxError. To catch both, use ``except XPathError``.
1477 * The regular expression functions in XPath now support passing a node-set
1480 * Extended type annotation in objectify: new ``xsiannotate()`` function
1482 * EXSLT RegExp support in standard XPath (not only XSLT)
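Several of the entries above follow ElementTree 1.3 APIs: the parser target interface, the ``feed()``/``close()`` parser interface, ``iterfind()``, ``itertext()`` and ``method="text"`` serialisation. The minimal sketch below exercises these calls with Python's standard-library ``xml.etree.ElementTree``, which provides the same interfaces, so it runs without lxml. It assumes, without testing it here, that ``lxml.etree`` behaves the same way for these particular calls. ``TagCounter`` is a hypothetical target class written for illustration.

```python
import xml.etree.ElementTree as ET


class TagCounter:
    """Hypothetical parser target that counts start tags instead of building a tree."""

    def __init__(self):
        self.counts = {}

    def start(self, tag, attrib):
        self.counts[tag] = self.counts.get(tag, 0) + 1

    def end(self, tag):
        pass

    def data(self, text):
        pass

    def close(self):
        # The value returned here becomes the result of parser.close().
        return self.counts


# Feed parser interface: input may arrive in arbitrary chunks,
# even split in the middle of a tag.
parser = ET.XMLParser(target=TagCounter())
for chunk in ("<root><item>a</it", "em><item>b</item>", "</root>"):
    parser.feed(chunk)
tag_counts = parser.close()  # {'root': 1, 'item': 2}

# iterfind(), itertext() and method="text" on an ordinary tree.
root = ET.fromstring("<root><item>a</item><item>b<sub>c</sub></item></root>")
items = list(root.iterfind("item"))  # the two <item> children
all_text = "".join(root.itertext())  # 'abc'
text_only = ET.tostring(root, method="text", encoding="unicode")  # 'abc'
```

With a target object, the parser produces no tree. Whatever ``close()`` returns is the parse result, which makes the interface useful for streaming statistics or custom builders.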
1487 * Missing import in ``lxml.html.clean``.
1489 * Some Python 2.4-isms prevented lxml from building/running under
1492 * XPath on ElementTrees could crash when selecting the virtual root
1493 node of the ElementTree.
1495 * Compilation ``--without-threading`` was buggy in alpha5/6.
1497 * Memory leak in the ``parse()`` function.
1499 * Minor bugs in XSLT error message formatting.
1501 * Result document memory leak in target parser.
1503 * Target parser failed to report comments.
1505 * In the ``lxml.html`` ``iter_links`` method, links in ``<object>``
1506 tags weren't recognized. (Note: plugin-specific link parameters
1507 still aren't recognized.) Also, the ``<embed>`` tag, though not
1508 standard, is now included in ``lxml.html.defs.special_inline_tags``.
1510 * Using custom resolvers on XSLT stylesheets parsed from a string
1511 could request ill-formed URLs.
1513 * With ``lxml.doctestcompare`` if you do ``<tag xmlns="...">`` in your
1514 output, it will then be namespace-neutral (before the ellipsis was
1515 treated as a real namespace).
1517 * AttributeError in feed parser on parse errors
1519 * XML feed parser setup problem
1521 * Type annotation for unicode strings in ``DataElement()``
1523 * lxml failed to serialise namespace declarations of elements other than the
1526 * Race condition in XSLT where the resolver context leaked between concurrent
1529 * lxml.etree did not check tag/attribute names
1531 * The XML parser did not report undefined entities as error
1533 * The text in exceptions raised by XML parsers, validators and XPath
1534 evaluators now reports the first error that occurred instead of the last
1536 * Passing '' as XPath namespace prefix did not raise an error
1538 * Thread safety in XPath evaluators
1543 * Exceptions carry only the part of the error log that is related to
1544 the operation that caused the error.
1546 * ``XMLSchema()`` and ``RelaxNG()`` now enforce passing the source
1547 file/filename through the ``file`` keyword argument.
1549 * The test suite now skips most doctests under Python 2.3.
1551 * ``make clean`` no longer removes the .c files (use ``make
1552 realclean`` instead)
1554 * Minor performance tweaks for Element instantiation and subelement
1557 * Various places in the XPath, XSLT and iteration APIs now require
1558 keyword-only arguments.
1560 * The argument order in ``element.itersiblings()`` was changed to
1561 match the order used in all other iteration methods. The second
1562 argument ('preceding') is now a keyword-only argument.
1564 * The ``getiterator()`` method on Elements and ElementTrees was
1565 reverted to return an iterator as it did in lxml 1.x. The ET API
1566 specification allows it to return either a sequence or an iterator,
1567 and it traditionally returned a sequence in ET and an iterator in
1568 lxml. However, it is now deprecated in favour of the ``iter()``
1569 method, which should be used in new code wherever possible.
1571 * The 'pretty printed' serialisation of ElementTree objects now
1572 inserts newlines at the root level between processing instructions,
1573 comments and the root tag.
1575 * A 'pretty printed' serialisation is now terminated with a newline.
1577 * Second argument to ``lxml.etree.Extension()`` helper is no longer
1578 required, third argument is now a keyword-only argument ``ns``.
1580 * ``lxml.html.tostring`` takes an ``encoding`` argument.
1582 * The module source files were renamed to "lxml.*.pyx", such as
1583 "lxml.etree.pyx". This was changed for consistency with the way
1584 Pyrex commonly handles package imports. The main effect is that
1585 classes now know about their fully qualified class name, including
1586 the package name of their module.
1588 * Keyword-only arguments in some API functions, especially in the
1589 parsers and serialisers.
1591 * Tag name validation in lxml.etree (and lxml.html) now distinguishes
1592 between HTML tags and XML tags based on the parser that was used to
1593 parse or create them. HTML tags no longer reject any non-ASCII
1594 characters in tag names but only spaces and the special characters
1597 * lxml.etree now emits a warning if you use XPath with libxml2 2.6.27
1598 (which can crash on certain XPath errors)
1600 * Type annotation in objectify now preserves the already annotated type by
1601 default to avoid losing type information that is already there.
1603 * ``element.getiterator()`` returns a list, use ``element.iter()`` to retrieve
1604 an iterator (ElementTree 1.3 compatible behaviour)
1606 * objectify.PyType for None is now called "NoneType"
1608 * ``el.getiterator()`` renamed to ``el.iter()``, following ElementTree 1.3 -
1609 original name is still available as alias
1611 * In the public C-API, ``findOrBuildNodeNs()`` was replaced by the more
1612 generic ``findOrBuildNodeNsPrefix``
1614 * Major refactoring in XPath/XSLT extension function code
1616 * Network access in parsers disabled by default
1625 * Backported decref crash fix from 2.0
1627 * Well hidden free-while-in-use crash bug in ObjectPath
1632 * The test suites now run ``gc.collect()`` in the ``tearDown()``
1633 methods. While this makes them take a lot longer to run, it also
1634 makes it easier to link a specific test to garbage collection
1635 problems that would otherwise appear in later tests.
1647 * lxml.etree could crash when adding more than 10000 namespaces to a
1650 * lxml failed to serialise namespace declarations of elements other
1651 than the root node of a tree
1660 * The ``ElementMaker`` in ``lxml.builder`` now accepts the keyword arguments
1661 ``namespace`` and ``nsmap`` to set a namespace and nsmap for the Elements it
1664 * The ``docinfo`` on ElementTree objects has new properties ``internalDTD``
1665 and ``externalDTD`` that return a DTD object for the internal or external
1666 subset of the document respectively.
1668 * Serialising an ElementTree now includes any internal DTD subsets that are
1669 part of the document, as well as comments and PIs that are siblings of the
1675 * Parsing with the ``no_network`` option could fail
1680 * lxml now raises a TagNameWarning about tag names containing ':' instead of
1681 an Error as 1.3.3 did. The reason is that a number of projects currently
1682 misuse the previous lack of tag name validation to generate namespace
1683 prefixes without declaring namespaces. Apart from the danger of generating
1684 broken XML this way, it also breaks most of the namespace-aware tools in
1685 XML, including XPath, XSLT and validation. lxml 1.3.x will continue to
1686 support this bug with a Warning, while lxml 2.0 will be strict about
1687 well-formed tag names (not only regarding ':').
1689 * Serialising an Element no longer includes its comment and PI siblings (only
1690 ElementTree serialisation includes them).
1699 * ElementTree compatible parser ``ETCompatXMLParser`` strips processing
1700 instructions and comments while parsing XML
1702 * Parsers now support stripping PIs (keyword argument 'remove_pis')
1704 * ``etree.fromstring()`` now supports parsing both HTML and XML, depending on
1705 the parser you pass.
1707 * Support ``base_url`` keyword argument in ``HTML()`` and ``XML()``
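``ETCompatXMLParser`` mimics ElementTree's parsing behaviour, which drops comments and processing instructions. As a reference point, the sketch below shows that default in the standard library's ``xml.etree.ElementTree`` rather than in lxml itself. It then shows how the standard library keeps comments and PIs when asked; the ``insert_comments``/``insert_pis`` options assume Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

SOURCE = "<root><!-- note --><?pi data?><a/></root>"

# Default ElementTree parsing: comments and PIs are silently dropped.
stripped = ET.fromstring(SOURCE)
stripped_tags = [child.tag for child in stripped]  # ['a']

# Asking the tree builder to keep them (Python 3.8+ keyword arguments).
builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
kept = ET.fromstring(SOURCE, parser=ET.XMLParser(target=builder))
kept_count = len(kept)  # comment, PI and <a/>
```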
1712 * Parsing from Python Unicode strings failed on some platforms
1714 * ``Element()`` did not raise an exception on tag names containing ':'
1716 * ``Element.getiterator(tag)`` did not accept ``Comment`` and
1717 ``ProcessingInstruction`` as tags. It also accepts ``Element`` now.
1729 * "deallocating None" crash bug
1738 * objectify.DataElement now supports setting values from existing data
1739 elements (not just plain Python types) and reuses defined namespaces etc.
1741 * E-factory support for lxml.objectify (``objectify.E``)
1746 * Better way to prevent crashes in Element proxy cleanup code
1748 * objectify.DataElement didn't set up None value correctly
1750 * objectify.DataElement didn't check the value against the provided type hints
1752 * Reference-counting bug in ``Element.attrib.pop()``
1761 * Module ``lxml.pyclasslookup`` implements an Element class lookup
1762 scheme that can access the entire tree in read-only mode to help determine
1763 a suitable Element class
1765 * Parsers take a ``remove_comments`` keyword argument that skips over comments
1767 * ``parse()`` function in ``objectify``, corresponding to ``XML()`` etc.
1769 * ``Element.addnext(el)`` and ``Element.addprevious(el)`` methods to support
1770 adding processing instructions and comments around the root node
1772 * ``Element.attrib`` was missing ``clear()`` and ``pop()`` methods
1774 * Extended type annotation in objectify: cleaner annotation namespace setup
1775 plus new ``deannotate()`` function
1777 * Support for custom Element class instantiation in lxml.sax: passing a
1778 ``makeelement`` function to the ElementTreeContentHandler will reuse the
1779 lookup context of that function
1781 * '.' represents empty ObjectPath (identity)
1783 * ``Element.values()`` to accompany the existing ``.keys()`` and ``.items()``
1785 * ``collectAttributes()`` C-function to build a list of attribute
1786 keys/values/items for a libxml2 node
1788 * ``DTD`` validator class (like ``RelaxNG`` and ``XMLSchema``)
1790 * HTML generator helpers by Fredrik Lundh in ``lxml.htmlbuilder``
1792 * ``ElementMaker`` XML generator by Fredrik Lundh in ``lxml.builder.E``
1794 * Support for pickling ``objectify.ObjectifiedElement`` objects to XML
1796 * ``update()`` method on Element.attrib
1798 * Optimised replacement for libxml2's xmlReconciliateNs(). This gives lxml
1799 better handling of namespaces when moving elements between documents.
1804 * Removing Elements from a tree could make them lose their namespace
1807 * ``ElementInclude`` didn't honour base URL of original document
1809 * Replacing the children slice of an Element would cut off the tails of the
1812 * ``Element.getiterator(tag)`` did not accept ``Comment`` and
1813 ``ProcessingInstruction`` as tags
1815 * API functions now check incoming strings for XML conformity. Zero bytes or
1816 low ASCII characters are no longer accepted (AssertionError).
1818 * XSLT parsing failed to pass resolver context on to imported documents
1820 * passing '' as namespace prefix in nsmap could be passed through to libxml2
1822 * Objectify couldn't handle prefixed XSD type names in ``xsi:type``
1824 * More ET compatible behaviour when writing out XML declarations or not
1826 * More robust error handling in ``iterparse()``
1828 * Documents lost their top-level PIs and comments on serialisation
1830 * lxml.sax failed on comments and PIs. Comments are now properly ignored and
1833 * Possible memory leaks in namespace handling when moving elements between
1839 * major restructuring in the documentation
1848 * Build fixes for MS compiler
1850 * Item assignments to special names like ``element["text"]`` failed
1852 * Renamed ObjectifiedDataElement.__setText() to _setText() to make it easier
1855 * The pattern for attribute names in ObjectPath was too restrictive
1864 * Rich comparison of QName objects
1866 * Support for regular expressions in benchmark selection
1868 * get/set emulation (not .attrib!) for attributes on processing instructions
1870 * ElementInclude Python module for ElementTree compatible XInclude processing
1871 that honours custom resolvers registered with the source document
1873 * ElementTree.parser property holds the parser used to parse the document
1875 * setup.py has been refactored for greater readability and flexibility
1877 * --rpath flag to setup.py to induce automatic linking-in of dynamic library
1878 runtime search paths has been renamed to --auto-rpath. This makes it
1879 possible to pass an --rpath directly to distutils; previously this was being
1885 * Element instantiation now uses locks to prevent race conditions with threads
1887 * ElementTree.write() did not raise an exception when the file was not writable
1889 * Error handling could crash under Python <= 2.4.1 - fixed by disabling thread
1890 support in these environments
1892 * Element.find*() did not accept QName objects as path
1897 * code cleanup: redundant _NodeBase super class merged into _Element class
1898 Note: although the impact should be zero in most cases, this change breaks
1899 the compatibility of the public C-API
1908 * Data elements in objectify support repr(), which is now used by dump()
1910 * Source distribution now ships with a patched Pyrex
1912 * New C-API function makeElement() to create new elements with text,
1913 tail, attributes and namespaces
1915 * Reuse original parser flags for XInclude
1917 * Simplified support for handling XSLT processing instructions
1922 * Parser resources were not freed before the next parser run
1924 * Open files and XML strings returned by Python resolvers were not
1927 * Crash in the IDDict returned by XMLDTDID
1929 * Copying Comments and ProcessingInstructions failed
1931 * Memory leak for external URLs in _XSLTProcessingInstruction.parseXSL()
1933 * Memory leak when garbage collecting tailed root elements
1935 * HTML script/style content was not propagated to .text
1937 * Show text xincluded between text nodes correctly in .text and .tail
1939 * 'integer * objectify.StringElement' operation was not supported
1948 * XSLT profiling support (``profile_run`` keyword)
1950 * countchildren() method on objectify.ObjectifiedElement
1952 * Support custom elements for tree nodes in lxml.objectify
1957 * lxml.objectify failed to support long data values (e.g., "123L")
1959 * Error messages from XSLT did not reach ``XSLT.error_log``
1961 * Factories objectify.Element() and objectify.DataElement() were missing
1962 ``attrib`` and ``nsmap`` keyword arguments
1964 * Changing the default parser in lxml.objectify did not update the factories
1965 Element() and DataElement()
1967 * Let lxml.objectify.Element() always generate tree elements (not data
1970 * Build under Windows failed ('\0' bug in patched Pyrex version)
1979 * Comments and processing instructions return '<!-- comment -->' and
1980 '<?pi-target content?>' for repr()
1982 * Parsers are now the preferred (and default) place where element class lookup
1983 schemes should be registered. Namespace lookup is no longer supported by
1986 * Support for Python 2.5 beta
1988 * Unlock the GIL for deep copying documents and for XPath()
1990 * New ``compact`` keyword argument for parsing read-only documents
1992 * Support for parser options in iterparse()
1994 * The ``namespace`` axis is supported in XPath and returns (prefix, URI)
1997 * The XPath expression "/" now returns an empty list instead of raising an
2000 * XML-Object API on top of lxml (lxml.objectify)
2002 * Customizable Element class lookup:
2004 * different pre-implemented lookup mechanisms
2006 * support for externally provided lookup functions
2008 * Support for processing instructions (ET-like, not compatible)
2010 * Public C-level API for independent extension modules
2012 * Module level ``iterwalk()`` function as 'iterparse' for trees
2014 * Module level ``iterparse()`` function similar to ElementTree (see
2015 documentation for differences)
2017 * Element.nsmap property returns a mapping of all namespace prefixes known at
2018 the Element to their namespace URI
2020 * Reentrant threading support in RelaxNG, XMLSchema and XSLT
2022 * Threading support in parsers and serializers:
2024 * All in-memory operations (tostring, parse(StringIO), etc.) free the GIL
2026 * File operations (on file names) free the GIL
2028 * Reading from file-like objects frees the GIL and reacquires it for reading
2030 * Serialisation to file-like objects is single-threaded (high lock overhead)
2032 * Element iteration over XPath axes:
2034 * Element.iterdescendants() iterates over the descendants of an element
2036 * Element.iterancestors() iterates over the ancestors of an element (from
2039 * Element.itersiblings() iterates over either the following or preceding
2040 siblings of an element
2042 * Element.iterchildren() iterates over the children of an element in either
2045 * All iterators support the ``tag`` keyword argument to restrict the
2048 * Element.getnext() and Element.getprevious() return the direct siblings of an
2054 * filenames with local 8-bit encoding were not supported
2056 * 1.1beta did not compile under Python 2.3
2058 * ignore unknown 'pyval' attribute values in objectify
2060 * objectify.ObjectifiedElement.addattr() failed to accept Elements and Lists
2062 * objectify.ObjectPath.setattr() failed to accept Elements and Lists
2064 * XPathSyntaxError now inherits from XPathError
2066 * Threading race conditions in RelaxNG and XMLSchema
2068 * Crash when mixing elements from XSLT results into other trees, concurrent
2069 XSLT is only allowed when the stylesheet was parsed in the main thread
2071 * The EXSLT ``regexp:match`` function now works as defined (except for some
2072 differences in the regular expression syntax)
2074 * Setting element.text to '' made it read back as None instead of the empty string
2076 * ``iterparse()`` could crash on long XML files
2078 * Creating documents no longer copies the parser for later URL resolving. For
2079 performance reasons, only a reference is kept. Resolver updates on the
2080 parser will now be reflected by documents that were parsed before the
2081 change. Although this should rarely become visible, it is a behavioral
2091 * List-like ``Element.extend()`` method
2096 * Crash in tail handling in ``Element.replace()``
2105 * Element.replace(old, new) method to replace a subelement by another one
2110 * Crash when mixing elements from XSLT results into other trees
2112 * Copying/deepcopying did not work for ElementTree objects
2114 * Setting an attribute to a non-string value did not raise an exception
2116 * Element.remove() deleted the tail text from the removed Element
2125 * Support for setting a custom default Element class as opposed to namespace
2126 specific classes (which still override the default class)
2131 * Rare exceptions in Python list functions were not handled
2133 * Parsing accepted unicode strings with XML encoding declaration in certain
2136 * Parsing 8-bit encoded strings from StringIO objects raised an exception
2138 * Module function ``initThread()`` was removed - useless (and never worked)
2140 * XSLT and parser exception messages include the error line number
2149 * Repeated calls to Element.attrib now efficiently return the same instance
2154 * Document deallocation could crash in certain garbage collection scenarios
2156 * Extension function calls in XSLT variable declarations could break the
2157 stylesheet and crash on repeated calls
2159 * Deep copying Elements could lose namespaces declared in parents
2161 * Deep copying Elements did not copy tail
2163 * Parsing file(-like) objects failed to load external entities
2165 * Parsing 8-bit strings from file(-like) objects raised an exception
2167 * xsl:include failed when the stylesheet was parsed from a file-like object
2169 * lxml.sax.ElementTreeProducer did not call startDocument() / endDocument()
2171 * MSVC compiler complained about long strings (supports only 2048 bytes)
2180 * Element.getiterator() and the findall() methods support finding arbitrary
2181 elements from a namespace (pattern ``{namespace}*``)
2183 * Another speedup in tree iteration code
2185 * General speedup of Python Element object creation and deallocation
2187 * Writing C14N no longer serializes in memory (reduced memory footprint)
2189 * PyErrorLog for error logging through the Python ``logging`` module
2191 * ``Element.getroottree()`` returns an ElementTree for the root node of the
2192 document that contains the element.
2194 * ElementTree.getpath(element) returns a simple, absolute XPath expression to
2195 find the element in the tree structure
2197 * Error logs have a ``last_error`` attribute for convenience
2199 * Comment texts can be changed through the API
2201 * Formatted output via ``pretty_print`` keyword in serialization functions
2203 * XSLT can block access to file system and network via ``XSLTAccessControl``
2205 * ElementTree.write() no longer serializes in memory (reduced memory
2208 * Speedup of Element.findall(tag) and Element.getiterator(tag)
2210 * Support for writing the XML representation of Elements and ElementTrees to
2211 Python unicode strings via ``etree.tounicode()``
2213 * Support for writing XSLT results to Python unicode strings via ``unicode()``
2215 * Parsing a unicode string no longer copies the string (reduced memory
2218 * Parsing file-like objects reads chunks rather than the whole file (reduced
2221 * Parsing StringIO objects from the start avoids copying the string (reduced
2224 * Read-only 'docinfo' attribute in ElementTree class holds DOCTYPE
2225 information, original encoding and XML version as seen by the parser
2227 * etree module can be compiled without libxslt by commenting out the line
2228 ``include "xslt.pxi"`` near the end of the etree.pyx source file
2230 * Better error messages in parser exceptions
2232 * Error reporting also works in XSLT
2234 * Support for custom document loaders (URI resolvers) in parsers and XSLT,
2235 resolvers are registered at parser level
2237 * Implementation of exslt:regexp for XSLT based on the Python 're' module,
2238 enabled by default, can be switched off with 'regexp=False' keyword argument
2240 * Support for exslt extensions (libexslt) and libxslt extra functions
2241 (node-set, document, write, output)
2243 * Substantial speedup in XPath.evaluate()
2245 * HTMLParser for parsing (broken) HTML
2247 * XMLDTDID function parses XML into tuple (root node, ID dict) based on xml:id
2248 implementation of libxml2 (as opposed to ET compatible XMLID)
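Two of the entries above also exist in the standard library's ElementTree. The ``{namespace}*`` search pattern is supported there since Python 3.8. Serialising to a Python unicode string is done with ``encoding="unicode"`` rather than a separate ``tounicode()`` function. The minimal stdlib-only sketch below illustrates what these entries describe. The ``urn:example:a`` namespace is made up for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<root xmlns:a="urn:example:a"><a:x/><a:y/><z/></root>'
)

# '{namespace}*' selects every child element in that namespace
# (standard-library support for this pattern arrived in Python 3.8).
in_ns = [el.tag for el in root.findall("{urn:example:a}*")]
# ['{urn:example:a}x', '{urn:example:a}y']

# Serialise to a Python (unicode) str instead of bytes.
z_text = ET.tostring(root[2], encoding="unicode")  # '<z />'
```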
2253 * Memory leak in Element.__setitem__
2255 * Memory leak in Element.attrib.items() and Element.attrib.values()
2257 * Memory leak in XPath extension functions
2259 * Memory leak in unicode related setup code
2261 * Element now raises ValueError on empty tag names
2263 * Namespace fixing after moving elements between documents could fail if the
2264 source document was freed too early
2266 * Setting namespace-less tag names on namespaced elements ('{ns}t' -> 't')
2267 didn't reset the namespace
2269 * Unknown constants from newer libxml2 versions could raise exceptions in the
2272 * lxml.etree compiles much faster
2274 * On libxml2 <= 2.6.22, parsing strings with encoding declaration could fail
2277 * Document reference in ElementTree objects was not updated when the root
2278 element was moved to a different document
2280 * Running absolute XPath expressions on an Element now evaluates against the
2283 * Evaluating absolute XPath expressions (``/*``) on an ElementTree could fail
2285 * Crashes when calling XSLT, RelaxNG, etc. with uninitialized ElementTree
2288 * Removed public function ``initThreadLogging()``, replaced by more general
2289 ``initThread()`` which fixes a number of setup problems in threads
2291 * Memory leak when using iconv encoders in tostring/write
2293 * Deep copying Elements and ElementTrees maintains the document information
2295 * Serialization functions raise LookupError for unknown encodings
2297 * Memory deallocation crash resulting from deep copying elements
2299 * Some ElementTree methods could crash if the root node was not initialized
2300 (neither file nor element passed to the constructor)
2302 * Element/SubElement failed to set attribute namespaces from passed ``attrib``
2305 * ``tostring()`` adds an XML declaration for non-ASCII encodings
2307 * ``tostring()`` failed to serialize encodings that contain 0-bytes
2309 * ElementTree.xpath() and XPathDocumentEvaluator were not using the
2310 ElementTree root node as reference point
2312 * Calling ``document('')`` in XSLT failed to return the stylesheet
2321 * Speedup for Element.makeelement(): the new element reuses the original
2322 libxml2 document instead of creating a new empty one
2324 * Speedup for reversed() iteration over element children (Py2.4+ only)
2326 * ElementTree compatible QName class
2328 * RelaxNG and XMLSchema accept any Element, not only ElementTrees
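An ElementTree-compatible ``QName`` combines a namespace URI and a local name into the ``{uri}local`` notation used for tags. The brief sketch below uses the standard library's ``QName``. The namespace URI is an arbitrary example. The ``ns0`` prefix in the output comes from the standard library's serialiser, and lxml may choose prefixes differently.

```python
import xml.etree.ElementTree as ET

q = ET.QName("urn:example", "item")
# .text holds the combined '{namespace}localname' form.
clark_name = q.text  # '{urn:example}item'

# A QName can be used where a tag is expected.
el = ET.Element(q)
xml_text = ET.tostring(el, encoding="unicode")
# '<ns0:item xmlns:ns0="urn:example" />'
```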
2333 * str(xslt_result) was broken for XSLT output other than UTF-8
2335 * Memory leak if write_c14n fails to write the file after conversion
2337 * Crash in XMLSchema and RelaxNG when passing non-schema documents
2339 * Memory leak in RelaxNG() when RelaxNGParseError is raised
2347 * lxml.sax.ElementTreeContentHandler checks closing elements and raises
2348 SaxError on mismatch
2350 * lxml.sax.ElementTreeContentHandler supports namespace-less SAX events
2351 (startElement, endElement) and defaults to empty attributes (keyword
2354 * Speedup for repeatedly accessing element tag names
2356 * Minor API performance improvements
2361 * Memory deallocation bug when using XSLT output method "html"
2363 * sax.py was handling UTF-8 encoded tag names where it shouldn't
2365 * lxml.tests package will no longer be installed (is still in source tar)
2373 * Error logging API for libxml2 error messages
2375 * Various performance improvements
2377 * Benchmark script for lxml, ElementTree and cElementTree
2379 * Support for registering extension functions through new FunctionNamespace
2380 class (see doc/extensions.txt)
2382 * ETXPath class for XPath expressions in ElementTree notation ('//{ns}tag')
2384 * Support for variables in XPath expressions (also in XPath class)
2386 * XPath class for compiled XPath expressions
2388 * XMLID module level function (ElementTree compatible)
2390 * XMLParser API for customized libxml2 parser configuration
2392 * Support for custom Element classes through new Namespace API (see
2393 doc/namespace_extensions.txt)
2395 * Common exception base class LxmlError for module exceptions
2397 * real iterator support in iter(Element), Element.getiterator()
2399 * XSLT objects are callable, result trees support str()
2401 * Added MANIFEST.in for easier creation of RPM files.
2403 * 'getparent' method on elements allows navigation to an element's
2406 * Python core compatible SAX tree builder and SAX event generator. See
2407 doc/sax.txt for more information.
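The ElementTree-compatible ``XMLID()`` function returns the parsed root element together with a dictionary that maps each ``id`` attribute value to its element. The sketch below uses the standard library's version. It keys off plain ``id`` attributes, unlike the ``xml:id``-based ``XMLDTDID()`` described in a later release's entries. Whether lxml's ``XMLID()`` matches this exactly is an assumption.

```python
import xml.etree.ElementTree as ET

root, ids = ET.XMLID(
    '<root><a id="first"/><b id="second">text</b><c/></root>'
)
# Only elements that carry an 'id' attribute end up in the mapping.
id_list = sorted(ids)              # ['first', 'second']
second_text = ids["second"].text   # 'text'
```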
2412 * Segfaults and memory leaks in various API functions of Element
2414 * Segfault in XSLT.tostring()
2416 * ElementTree objects no longer interfere; Elements can be the root of
2417 several ElementTrees at the same time
2419 * document('') works in XSLT documents read from files (in-memory documents
2420 cannot support this due to libxslt deficiencies)
2428 * Support for copy.deepcopy() on elements. copy.copy() works also, but
2429 does the same thing, and does *not* create a shallow copy, as that
2430 makes no sense in the context of libxml2 trees. This means a
2431 potential incompatibility with ElementTree, but there's more chance
2432 that it works than if copy.copy() isn't supported at all.
2434 * Increased compatibility with (c)ElementTree; .parse() on ElementTree is
2435 supported and parsing of gzipped XML files works.
2437 * implemented index() on elements, allowing one to find the index of a
2443 * Use xslt-config instead of xml2-config to find out libxml2
2444 directories to take into account a case where libxslt is installed
2445 in a different directory than libxml2.
* Eliminated a crash condition in iteration when text nodes are changed.
* Passing None to tostring() no longer results in a segfault; it raises
  an AssertionError instead.
* Some test fixes for Windows.
* Raise XMLSyntaxError and XPathSyntaxError instead of plain Python
  syntax errors. This should be less confusing.
* Fixed error with uncaught exception in Pyrex code.
* Calling lxml.etree.fromstring('') raises XMLSyntaxError instead of
  segfaulting.
* has_key() works on attrib. 'in' tests also work correctly on attrib.
* INSTALL.txt listed 2.2.16 instead of 2.6.16 as the supported
  libxml2 version; fixed.
* Passing a UTF-8 encoded string to the XML() function would fail;
  fixed.
* Parameters (XPath expressions) can be passed to XSLT using keyword
  arguments.
* Simple XInclude support. Calling the xinclude() method on a tree
  will process any XInclude statements in the document.
* XMLSchema support. Use the XMLSchema class or the convenience
  xmlschema() method on a tree to do XML Schema (XSD) validation.
* Added convenience xslt() method on tree. This is less efficient
  than the XSLT object, but makes it easier to write quick code.
* Added convenience relaxng() method on tree. This is less efficient
  than the RelaxNG object, but makes it easier to write quick code.
* Make it possible to use XPathEvaluator with elements as well. The
  XPathEvaluator in this case will retain the element so multiple
  XPath queries can be made against one element efficiently. This
  replaces the second argument to the .evaluate() method that existed
  previously.
* Allow registerNamespace() to be called on an XPathEvaluator, after
  creation, to add additional namespaces. Also allow registerNamespaces(),
  which does the same for a namespace dictionary.
* Add 'prefix' attribute to element to be able to read prefix information.
  This is entirely read-only.
* It is possible to supply an extra nsmap keyword parameter to
  the Element() and SubElement() constructors, which supplies a
  prefix to namespace URI mapping. This will create namespace
  prefix declarations on these elements and these prefixes will show up
  in XML serialization.
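A minimal sketch of the nsmap keyword, the read-only prefix attribute, an element-bound XPathEvaluator with a namespace registered after creation, and XSLT parameters passed as keyword arguments. It targets a current lxml release, where the method listed above as registerNamespace() is spelled register_namespace(); the namespace URI and stylesheet are hypothetical examples.

```python
from lxml import etree

NS = "http://example.com/ns"  # hypothetical namespace URI

# nsmap declares the "ex" prefix on the new element; it is used on serialization.
root = etree.Element("{%s}root" % NS, nsmap={"ex": NS})
child = etree.SubElement(root, "{%s}item" % NS)
child.text = "hello"

# .prefix is read-only and reports the prefix in effect for the element.
prefix = child.prefix

# An XPathEvaluator bound to an element; a namespace is added after creation.
evaluator = etree.XPathEvaluator(root)
evaluator.register_namespace("e", NS)
texts = evaluator("e:item/text()")

# XSLT parameters are XPath expressions passed as keyword arguments,
# so a string value needs its own inner quotes.
transform = etree.XSLT(etree.XML(
    '<xsl:stylesheet version="1.0"'
    ' xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    '<xsl:param name="greeting"/>'
    '<xsl:template match="/">'
    '<out><xsl:value-of select="$greeting"/></out>'
    '</xsl:template>'
    '</xsl:stylesheet>'))
result = transform(etree.ElementTree(root), greeting="'hi there'")
```

Passing `greeting="hi there"` without the inner quotes would be evaluated as an XPath expression rather than a literal string.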
* Killed yet another memory management related bug: trees created
  using newDoc would not get a libxml2-level dictionary, which caused
  problems when deallocating these documents later if they contained a
  node that came from a document with a dictionary.
* Moving namespaced elements between documents was problematic as
  references to the original document would remain. This has been fixed
  by applying xmlReconciliateNs() after each move operation.
* Passing None to dump() no longer segfaults.
* tostring() works properly for non-root elements as well.
* Cleaned out the tostring() method so it should handle encoding
  correctly.
* Cleaned out the ElementTree.write() method so it should handle encoding
  correctly. Writing directly to a file should also be faster, as there is no
  need to go through a Python string in that case. Made sure the test cases
  test both serializing to StringIO as well as serializing to a real file.
* Changed setup.py so that library_dirs is also guessed. This should
  help with compilation on the Mac OS X platform, where otherwise the
  wrong library (shipping with the OS) could be picked up.
* Tweaked setup.py so that it picks up the version from version.txt.
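The fixes above (moving namespaced elements between documents, tostring() on non-root elements, and encoding handling in tostring() and ElementTree.write()) can be exercised with a short sketch against a current lxml release. The namespace URI is a hypothetical example, and io.BytesIO stands in for a real file.

```python
import io

from lxml import etree

# Hypothetical namespace URI, for illustration only.
src = etree.XML('<a xmlns:p="urn:example:p"><p:b>caf\u00e9</p:b></a>')
dst = etree.XML('<c/>')

# Moving a namespaced element into another document: lxml fixes up the
# namespace references so the moved subtree stays self-consistent.
moved = src[0]
dst.append(moved)

# tostring() on a non-root element serializes just that subtree.
fragment = etree.tostring(moved)

# Requesting an encoding yields bytes in that encoding plus an XML declaration.
latin1 = etree.tostring(dst, encoding="iso-8859-1")

# ElementTree.write() can target any binary file-like object.
buf = io.BytesIO()
etree.ElementTree(dst).write(buf, encoding="utf-8")
```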
* Do the right thing when handling namespaced attributes.
* Fixed a bug where tostring() moved nodes into new documents. tostring()
  had very nasty side-effects before this fix, sorry!
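A small sketch of the behaviour these two fixes restore, written against a current lxml release: a namespaced attribute set in Clark notation picks up the declared prefix, and serializing a sub-element leaves it attached to its parent. The namespace URI is a hypothetical example.

```python
from lxml import etree

NS = "urn:example:attrs"  # hypothetical namespace URI
root = etree.Element("root", nsmap={"x": NS})
child = etree.SubElement(root, "child")

# Namespaced attributes use Clark notation: {uri}local-name.
child.set("{%s}kind" % NS, "demo")

# Serializing a sub-element must not detach it from its document.
before = child.getparent()
serialised = etree.tostring(child)
after = child.getparent()
```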
* Python 2.2 compatibility fixes.
* Unicode fixes in Element() and Comment() as well as XML(); Unicode
  input wasn't being properly UTF-8 encoded.
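A sketch of Unicode input round-tripping through XML(), Comment() and element text, assuming a current lxml release on Python 3, where str is already Unicode (unlike the Python 2 era of this entry). The sample strings are arbitrary.

```python
from lxml import etree

# Unicode (str) input passed to XML(), Comment() and element text.
root = etree.XML("<greeting>gr\u00fc\u00df dich</greeting>")
root.append(etree.Comment(" caf\u00e9 "))
note = etree.SubElement(root, "note")
note.text = "na\u00efve"

# Serializing as UTF-8 should encode every non-ASCII character correctly.
data = etree.tostring(root, encoding="utf-8")
```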
Initial public release.