From: Hyunjee Kim Date: Thu, 31 Jan 2019 01:39:57 +0000 (+0900) Subject: Imported Upstream version 3.3.3 X-Git-Tag: upstream/4.3.0~29 X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=7b9b19a3b7de01c320ec45b57f428b09c1e52621;p=platform%2Fupstream%2Fpython-lxml.git Imported Upstream version 3.3.3 Change-Id: Ib6a5c6417557352a0ba94148493d91c1c1640fe0 Signed-off-by: Hyunjee Kim --- diff --git a/CHANGES.txt b/CHANGES.txt index 281c017d..95e4111d 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -2,6 +2,22 @@ lxml changelog ============== +3.3.3 (2014-03-04) +================== + +Bugs fixed +---------- + +* Crash when using Element subtypes with ``__slots__``. + +Other changes +------------- + +* The internal classes ``_LogEntry`` and ``_Attrib`` can no longer be + subclassed from Python code. + + + 3.3.2 (2014-02-26) ================== @@ -10,7 +26,7 @@ Bugs fixed * The properties ``resolvers`` and ``version``, as well as the methods ``set_element_class_lookup()`` and ``makeelement()``, were lost from - ``iterparse`` objects. + ``iterparse`` objects in 3.3.0. * LP#1222132: instances of ``XMLSchema``, ``Schematron`` and ``RelaxNG`` did not clear their local ``error_log`` before running a validation. @@ -18,8 +34,8 @@ Bugs fixed * LP#1238500: lxml.doctestcompare mixed up "expected" and "actual" in attribute values. -* Some file I/O tests were failing in MS-Windows due to incorrect temp file - usage. Initial patch by Gabi Davar. +* Some file I/O tests were failing in MS-Windows due to non-portable temp + file usage. Initial patch by Gabi Davar. * LP#910014: duplicate IDs in a document were not reported by DTD validation. diff --git a/PKG-INFO b/PKG-INFO index 0ca9a9fb..b1065aad 100644 --- a/PKG-INFO +++ b/PKG-INFO @@ -1,12 +1,12 @@ Metadata-Version: 1.1 Name: lxml -Version: 3.3.2 +Version: 3.3.3 Summary: Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API. 
Home-page: http://lxml.de/ Author: lxml dev team Author-email: lxml-dev@lxml.de License: UNKNOWN -Download-URL: http://pypi.python.org/packages/source/l/lxml/lxml-3.3.2.tar.gz +Download-URL: http://pypi.python.org/packages/source/l/lxml/lxml-3.3.3.tar.gz Description: lxml is a Pythonic, mature binding for the libxml2 and libxslt libraries. It provides safe and convenient access to these libraries using the ElementTree API. @@ -37,33 +37,20 @@ Description: lxml is a Pythonic, mature binding for the libxml2 and libxslt libr as soon as a maintenance branch has been established. Note that this requires Cython to be installed at an appropriate version for the build. - 3.3.2 (2014-02-26) + 3.3.3 (2014-03-04) ================== Bugs fixed ---------- - * The properties ``resolvers`` and ``version``, as well as the methods - ``set_element_class_lookup()`` and ``makeelement()``, were lost from - ``iterparse`` objects. + * Crash when using Element subtypes with ``__slots__``. - * LP#1222132: instances of ``XMLSchema``, ``Schematron`` and ``RelaxNG`` - did not clear their local ``error_log`` before running a validation. + Other changes + ------------- - * LP#1238500: lxml.doctestcompare mixed up "expected" and "actual" in - attribute values. + * The internal classes ``_LogEntry`` and ``_Attrib`` can no longer be + subclassed from Python code. - * Some file I/O tests were failing in MS-Windows due to incorrect temp file - usage. Initial patch by Gabi Davar. - - * LP#910014: duplicate IDs in a document were not reported by DTD validation. - - * LP#1185332: ``tostring(method="html")`` did not use HTML serialisation - semantics for trailing tail text. Initial patch by Sylvain Viollon. - - * LP#1281139: ``.attrib`` value of Comments lost its mutation methods - in 3.3.0. Even though it is empty and immutable, it should still - provide the same interface as that returned for Elements. 
diff --git a/doc/FAQ.txt b/doc/FAQ.txt index 454b1dc1..2b4c9ef3 100644 --- a/doc/FAQ.txt +++ b/doc/FAQ.txt @@ -48,12 +48,12 @@ ElementTree_. 6 Parsing and Serialisation 6.1 Why doesn't the ``pretty_print`` option reformat my XML output? 6.2 Why can't lxml parse my XML from unicode strings? - 6.3 What is the difference between str(xslt(doc)) and xslt(doc).write() ? - 6.4 Why can't I just delete parents or clear the root node in iterparse()? - 6.5 How do I output null characters in XML text? - 6.6 Is lxml vulnerable to XML bombs? - 6.7 How do I configure lxml safely as a web-service endpoint? - 6.8 Can lxml parse from file objects opened in unicode mode? + 6.3 Can lxml parse from file objects opened in unicode mode? + 6.4 What is the difference between str(xslt(doc)) and xslt(doc).write() ? + 6.5 Why can't I just delete parents or clear the root node in iterparse()? + 6.6 How do I output null characters in XML text? + 6.7 Is lxml vulnerable to XML bombs? + 6.8 How do I configure lxml safely as a web-service endpoint? 7 XPath and Document Traversal 7.1 What are the ``findall()`` and ``xpath()`` methods on Element(Tree)? 7.2 Why doesn't ``findall()`` support full XPath expressions? @@ -862,13 +862,26 @@ lxml can add fresh whitespace to the XML tree to indent it. Note that the ``remove_blank_text`` option also uses a heuristic if it has no definite knowledge about the document's ignorable whitespace. It will keep blank text nodes that appear after non-blank text nodes -at the same level. This is to prevent document-style XML from -breaking. +at the same level. This is to prevent document-style XML from losing +content. -If you want to be sure all blank text is removed, you have to use -either a DTD to tell the parser which whitespace it can safely ignore, -or remove the ignorable whitespace manually after parsing, e.g.
by -setting all tail text to None: +The HTMLParser has this structural knowledge built-in, which means that +most whitespace that appears between tags in HTML documents will *not* +be removed by this option, except in places where it is truly ignorable, +e.g. in the page header, between table structure tags, etc. Therefore, +it is also safe to use this option with the HTMLParser, as it will keep +content like the following intact (i.e. it will not remove the space +that separates the two words): + +.. sourcecode:: html + +

some text

+ +If you want to be sure all blank text is removed from an XML document +(or just more blank text than the parser does by itself), you have to +use either a DTD to tell the parser which whitespace it can safely +ignore, or remove the ignorable whitespace manually after parsing, +e.g. by setting all tail text to None: .. sourcecode:: python @@ -921,6 +934,30 @@ valid encoding. .. _`XML specification`: http://www.w3.org/TR/REC-xml/ +Can lxml parse from file objects opened in unicode/text mode? +------------------------------------------------------------- + +Technically, yes. However, you likely do not want to do that, because +it is extremely inefficient. The text encoding that libxml2 uses +internally is UTF-8, so parsing from a Unicode file means that Python +first reads a chunk of data from the file, then decodes it into a new +buffer, and then copies it into a new unicode string object, just to +let libxml2 make yet another copy while encoding it down into UTF-8 +in order to parse it. It's clear that this involves a lot more +recoding and copying than when parsing straight from the bytes that +the file contains. + +If you really know the encoding better than the parser (e.g. when +parsing HTML that lacks a content declaration), then instead of passing +an encoding parameter into the file object when opening it, create a +new instance of an XMLParser or HTMLParser and pass the encoding into +its constructor. Afterwards, use that parser for parsing, e.g. by +passing it into the ``etree.parse(file, parser)`` function. Remember +to open the file in binary mode (mode="rb"), or, if possible, prefer +passing the file path directly into ``parse()`` instead of an opened +Python file object. + + What is the difference between str(xslt(doc)) and xslt(doc).write() ? --------------------------------------------------------------------- @@ -1050,27 +1087,6 @@ API for lxml that applies certain counter measures internally. .. 
_defusedxml: https://bitbucket.org/tiran/defusedxml -Can lxml parse from file objects opened in unicode/text mode? -------------------------------------------------------------- - -Technically, yes. However, you likely do not want to do that, because -it is extremely inefficient. The text encoding that libxml2 uses -internally is UTF-8, so parsing from a Unicode file means that Python -first reads a chunk of data from the file, then decodes it into a new -buffer, and then copies it into a new unicode string object, just to -let libxml2 make yet another copy while encoding it down into UTF-8 -in order to parse it. It's clear that this involves a lot more -recoding and copying than when parsing straight from the bytes that -the file contains. - -If you really know the encoding better than the parser (e.g. when -parsing HTML that lacks a content declaration), then instead of passing -an encoding parameter into the file object when opening it, create a -new instance of an XMLParser or HTMLParser and pass the encoding into -its constructor. Afterwards, use that parser for parsing, e.g. by -passing it into the ``etree.parse(file, parser)`` function. - - XPath and Document Traversal ============================ diff --git a/doc/html/FAQ.html b/doc/html/FAQ.html index 0d5ccbdb..a93deac0 100644 --- a/doc/html/FAQ.html +++ b/doc/html/FAQ.html @@ -10,7 +10,7 @@
-

lxml FAQ - Frequently Asked Questions

+

lxml FAQ - Frequently Asked Questions

Frequently asked questions on lxml. See also the notes on compatibility to ElementTree.

@@ -59,12 +59,12 @@
  • Parsing and Serialisation
  • XPath and Document Traversal
      @@ -702,12 +702,22 @@ lxml can add fresh whitespace to the XML tree to indent it.

      Note that the remove_blank_text option also uses a heuristic if it has no definite knowledge about the document's ignorable whitespace. It will keep blank text nodes that appear after non-blank text nodes -at the same level. This is to prevent document-style XML from -breaking.

      -

      If you want to be sure all blank text is removed, you have to use -either a DTD to tell the parser which whitespace it can safely ignore, -or remove the ignorable whitespace manually after parsing, e.g. by -setting all tail text to None:

      +at the same level. This is to prevent document-style XML from losing +content.

      +

      The HTMLParser has this structural knowledge built-in, which means that +most whitespace that appears between tags in HTML documents will not +be removed by this option, except in places where it is truly ignorable, +e.g. in the page header, between table structure tags, etc. Therefore, +it is also safe to use this option with the HTMLParser, as it will keep +content like the following intact (i.e. it will not remove the space +that separates the two words):

      +
      <p><b>some</b> <em>text</em></p>
      +
      +

      If you want to be sure all blank text is removed from an XML document +(or just more blank text than the parser does by itself), you have to +use either a DTD to tell the parser which whitespace it can safely +ignore, or remove the ignorable whitespace manually after parsing, +e.g. by setting all tail text to None:

      for element in root.iter():
           element.tail = None
       
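A minimal sketch of the two approaches described above, assuming lxml is installed. It contrasts the parser's remove_blank_text heuristic with stripping every tail after parsing. The helper names and the two sample documents are made up for illustration.

```python
from lxml import etree

# Hypothetical sample inputs.
DATA_XML = b"<root>\n  <a>x</a>\n  <b>y</b>\n</root>"    # indentation only
MIXED_XML = b"<p>Hello <b>some</b> <em>text</em></p>"    # document-style text


def parse_without_blanks(data):
    """Parse with the remove_blank_text heuristic enabled."""
    parser = etree.XMLParser(remove_blank_text=True)
    return etree.fromstring(data, parser)


def strip_all_tails(root):
    """Manually drop every tail text after parsing.

    This removes *all* tails, including meaningful spaces in mixed content.
    """
    for element in root.iter():
        element.tail = None
    return root


# Data-style XML: the indentation whitespace is dropped by the parser.
print(etree.tostring(parse_without_blanks(DATA_XML)))

# Mixed content: the blank tail of <b> follows the non-blank text "Hello "
# at the same level, so the heuristic keeps it.
print(etree.tostring(parse_without_blanks(MIXED_XML)))

# Stripping all tails manually also removes the space between the words.
print(etree.tostring(strip_all_tails(etree.fromstring(MIXED_XML))))
```

Note that the manual loop only clears tails. Whitespace in an element's leading .text (e.g. directly after the opening root tag) is left untouched, and meaningful spaces in mixed content are removed along with the indentation.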
      @@ -747,6 +757,27 @@ Python interpreters. Don't do it.

      broken. lxml will not parse them. You must provide parsable data in a valid encoding.

  • +
    +

    Can lxml parse from file objects opened in unicode/text mode?

    +

    Technically, yes. However, you likely do not want to do that, because +it is extremely inefficient. The text encoding that libxml2 uses +internally is UTF-8, so parsing from a Unicode file means that Python +first reads a chunk of data from the file, then decodes it into a new +buffer, and then copies it into a new unicode string object, just to +let libxml2 make yet another copy while encoding it down into UTF-8 +in order to parse it. It's clear that this involves a lot more +recoding and copying than when parsing straight from the bytes that +the file contains.

    +

    If you really know the encoding better than the parser (e.g. when +parsing HTML that lacks a content declaration), then instead of passing +an encoding parameter into the file object when opening it, create a +new instance of an XMLParser or HTMLParser and pass the encoding into +its constructor. Afterwards, use that parser for parsing, e.g. by +passing it into the etree.parse(file, parser) function. Remember +to open the file in binary mode (mode="rb"), or, if possible, prefer +passing the file path directly into parse() instead of an opened +Python file object.
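The advice above can be sketched as follows, assuming lxml is installed. The helper name parse_with_known_encoding and the Latin-1 sample file are hypothetical. The encoding goes into the parser's constructor, and the source is either a path or a file object opened with mode "rb", never a text-mode file.

```python
import os
import tempfile

from lxml import etree


def parse_with_known_encoding(source, encoding, html=False):
    """Parse `source` (a file path or a binary file object) with a parser
    that is told the document encoding up front."""
    if html:
        parser = etree.HTMLParser(encoding=encoding)
    else:
        parser = etree.XMLParser(encoding=encoding)
    return etree.parse(source, parser)


# Hypothetical input: Latin-1 encoded HTML without a content declaration.
data = "<html><body><p>café</p></body></html>".encode("iso-8859-1")
fd, path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "wb") as f:
    f.write(data)

try:
    # Preferred: pass the path and let libxml2 read the raw bytes itself.
    tree = parse_with_known_encoding(path, "iso-8859-1", html=True)
    print(tree.getroot().find(".//p").text)

    # Also acceptable: a file object opened in binary mode ("rb").
    with open(path, "rb") as f:
        tree2 = parse_with_known_encoding(f, "iso-8859-1", html=True)
    print(tree2.getroot().find(".//p").text)
finally:
    os.remove(path)
```

Both calls avoid the decode/re-encode round trip through a Python unicode string, because libxml2 receives the original bytes and decodes them once using the encoding given to the parser.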

    +

    What is the difference between str(xslt(doc)) and xslt(doc).write() ?

    The str() implementation of the XSLTResultTree class (a subclass of the @@ -852,24 +883,6 @@ safely expose their values to the evaluation engine.

    The defusedxml package comes with an example setup and a wrapper API for lxml that applies certain counter measures internally.

    -
    -

    Can lxml parse from file objects opened in unicode/text mode?

    -

    Technically, yes. However, you likely do not want to do that, because -it is extremely inefficient. The text encoding that libxml2 uses -internally is UTF-8, so parsing from a Unicode file means that Python -first reads a chunk of data from the file, then decodes it into a new -buffer, and then copies it into a new unicode string object, just to -let libxml2 make yet another copy while encoding it down into UTF-8 -in order to parse it. It's clear that this involves a lot more -recoding and copying than when parsing straight from the bytes that -the file contains.

    -

    If you really know the encoding better than the parser (e.g. when -parsing HTML that lacks a content declaration), then instead of passing -an encoding parameter into the file object when opening it, create a -new instance of an XMLParser or HTMLParser and pass the encoding into -its constructor. Afterwards, use that parser for parsing, e.g. by -passing it into the etree.parse(file, parser) function.

    -

    XPath and Document Traversal

    @@ -922,7 +935,7 @@ map it to your namespace. See also the question above.

    diff --git a/doc/html/api.html b/doc/html/api.html index 00b9dcf9..1cc9f2f2 100644 --- a/doc/html/api.html +++ b/doc/html/api.html @@ -8,7 +8,7 @@
    -

    APIs specific to lxml.etree

    +

    APIs specific to lxml.etree

    lxml.etree tries to follow established APIs wherever possible. Sometimes, however, the need to expose a feature in an easy way led to the invention of a @@ -450,7 +450,7 @@ example:

    diff --git a/doc/html/api/abc.ABCMeta-class.html b/doc/html/api/abc.ABCMeta-class.html index 185c6a8a..4dad4448 100644 --- a/doc/html/api/abc.ABCMeta-class.html +++ b/doc/html/api/abc.ABCMeta-class.html @@ -426,7 +426,7 @@ even via super()).