From: Hyunjee Kim
Date: Thu, 31 Jan 2019 01:39:57 +0000 (+0900)
Subject: Imported Upstream version 3.3.3
X-Git-Tag: upstream/4.3.0~29
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=7b9b19a3b7de01c320ec45b57f428b09c1e52621;p=platform%2Fupstream%2Fpython-lxml.git
Imported Upstream version 3.3.3
Change-Id: Ib6a5c6417557352a0ba94148493d91c1c1640fe0
Signed-off-by: Hyunjee Kim
---
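Note on the FAQ entry that this commit moves to section 6.3: its advice
(open the file in binary mode and hand the known encoding to the parser
rather than to the file object) can be sketched roughly as follows. This
is a minimal illustration and not part of the imported sources. The sample
bytes, the ``iso-8859-1`` encoding and the variable names are assumptions,
and the ``ImportError`` fallback to the standard library's ElementTree is
only there so that the sketch also runs where lxml is not installed.

```python
import io

try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib ElementTree, whose XMLParser
    # also accepts an ``encoding`` override.
    import xml.etree.ElementTree as etree

# Hypothetical input: Latin-1 bytes without an XML encoding declaration.
raw = "<root>caf\u00e9</root>".encode("iso-8859-1")

# Pass the known encoding to the parser instead of decoding the stream
# in Python first.
parser = etree.XMLParser(encoding="iso-8859-1")

# io.BytesIO stands in for a file opened with mode="rb".
with io.BytesIO(raw) as binary_file:
    tree = etree.parse(binary_file, parser)

text = tree.getroot().text
print(text)
```

With a real file, ``open(path, "rb")`` would take the place of the
``io.BytesIO`` object. Passing the path itself to ``parse()`` is the
alternative that the FAQ text prefers.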
diff --git a/CHANGES.txt b/CHANGES.txt
index 281c017d..95e4111d 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -2,6 +2,22 @@
lxml changelog
==============
+3.3.3 (2014-03-04)
+==================
+
+Bugs fixed
+----------
+
+* Crash when using Element subtypes with ``__slots__``.
+
+Other changes
+-------------
+
+* The internal classes ``_LogEntry`` and ``_Attrib`` can no longer be
+ subclassed from Python code.
+
+
+
3.3.2 (2014-02-26)
==================
@@ -10,7 +26,7 @@ Bugs fixed
* The properties ``resolvers`` and ``version``, as well as the methods
``set_element_class_lookup()`` and ``makeelement()``, were lost from
- ``iterparse`` objects.
+ ``iterparse`` objects in 3.3.0.
* LP#1222132: instances of ``XMLSchema``, ``Schematron`` and ``RelaxNG``
did not clear their local ``error_log`` before running a validation.
@@ -18,8 +34,8 @@ Bugs fixed
* LP#1238500: lxml.doctestcompare mixed up "expected" and "actual" in
attribute values.
-* Some file I/O tests were failing in MS-Windows due to incorrect temp file
- usage. Initial patch by Gabi Davar.
+* Some file I/O tests were failing in MS-Windows due to non-portable temp
+ file usage. Initial patch by Gabi Davar.
* LP#910014: duplicate IDs in a document were not reported by DTD validation.
diff --git a/PKG-INFO b/PKG-INFO
index 0ca9a9fb..b1065aad 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,12 +1,12 @@
Metadata-Version: 1.1
Name: lxml
-Version: 3.3.2
+Version: 3.3.3
Summary: Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.
Home-page: http://lxml.de/
Author: lxml dev team
Author-email: lxml-dev@lxml.de
License: UNKNOWN
-Download-URL: http://pypi.python.org/packages/source/l/lxml/lxml-3.3.2.tar.gz
+Download-URL: http://pypi.python.org/packages/source/l/lxml/lxml-3.3.3.tar.gz
Description: lxml is a Pythonic, mature binding for the libxml2 and libxslt libraries. It
provides safe and convenient access to these libraries using the ElementTree
API.
@@ -37,33 +37,20 @@ Description: lxml is a Pythonic, mature binding for the libxml2 and libxslt libr
as soon as a maintenance branch has been established. Note that this
requires Cython to be installed at an appropriate version for the build.
- 3.3.2 (2014-02-26)
+ 3.3.3 (2014-03-04)
==================
Bugs fixed
----------
- * The properties ``resolvers`` and ``version``, as well as the methods
- ``set_element_class_lookup()`` and ``makeelement()``, were lost from
- ``iterparse`` objects.
+ * Crash when using Element subtypes with ``__slots__``.
- * LP#1222132: instances of ``XMLSchema``, ``Schematron`` and ``RelaxNG``
- did not clear their local ``error_log`` before running a validation.
+ Other changes
+ -------------
- * LP#1238500: lxml.doctestcompare mixed up "expected" and "actual" in
- attribute values.
+ * The internal classes ``_LogEntry`` and ``_Attrib`` can no longer be
+ subclassed from Python code.
- * Some file I/O tests were failing in MS-Windows due to incorrect temp file
- usage. Initial patch by Gabi Davar.
-
- * LP#910014: duplicate IDs in a document were not reported by DTD validation.
-
- * LP#1185332: ``tostring(method="html")`` did not use HTML serialisation
- semantics for trailing tail text. Initial patch by Sylvain Viollon.
-
- * LP#1281139: ``.attrib`` value of Comments lost its mutation methods
- in 3.3.0. Even though it is empty and immutable, it should still
- provide the same interface as that returned for Elements.
diff --git a/doc/FAQ.txt b/doc/FAQ.txt
index 454b1dc1..2b4c9ef3 100644
--- a/doc/FAQ.txt
+++ b/doc/FAQ.txt
@@ -48,12 +48,12 @@ ElementTree_.
6 Parsing and Serialisation
6.1 Why doesn't the ``pretty_print`` option reformat my XML output?
6.2 Why can't lxml parse my XML from unicode strings?
- 6.3 What is the difference between str(xslt(doc)) and xslt(doc).write() ?
- 6.4 Why can't I just delete parents or clear the root node in iterparse()?
- 6.5 How do I output null characters in XML text?
- 6.6 Is lxml vulnerable to XML bombs?
- 6.7 How do I configure lxml safely as a web-service endpoint?
- 6.8 Can lxml parse from file objects opened in unicode mode?
+ 6.3 Can lxml parse from file objects opened in unicode mode?
+ 6.4 What is the difference between str(xslt(doc)) and xslt(doc).write() ?
+ 6.5 Why can't I just delete parents or clear the root node in iterparse()?
+ 6.6 How do I output null characters in XML text?
+ 6.7 Is lxml vulnerable to XML bombs?
+ 6.8 How do I configure lxml safely as a web-service endpoint?
7 XPath and Document Traversal
7.1 What are the ``findall()`` and ``xpath()`` methods on Element(Tree)?
7.2 Why doesn't ``findall()`` support full XPath expressions?
@@ -862,13 +862,26 @@ lxml can add fresh whitespace to the XML tree to indent it.
Note that the ``remove_blank_text`` option also uses a heuristic if it
has no definite knowledge about the document's ignorable whitespace.
It will keep blank text nodes that appear after non-blank text nodes
-at the same level. This is to prevent document-style XML from
-breaking.
+at the same level. This is to prevent document-style XML from losing
+content.
-If you want to be sure all blank text is removed, you have to use
-either a DTD to tell the parser which whitespace it can safely ignore,
-or remove the ignorable whitespace manually after parsing, e.g. by
-setting all tail text to None:
+The HTMLParser has this structural knowledge built-in, which means that
+most whitespace that appears between tags in HTML documents will *not*
+be removed by this option, except in places where it is truly ignorable,
+e.g. in the page header, between table structure tags, etc. Therefore,
+it is also safe to use this option with the HTMLParser, as it will keep
+content like the following intact (i.e. it will not remove the space
+that separates the two words):
+
+.. sourcecode:: html
+
+ <p><b>some</b> <em>text</em></p>
+
+If you want to be sure all blank text is removed from an XML document
+(or just more blank text than the parser does by itself), you have to
+use either a DTD to tell the parser which whitespace it can safely
+ignore, or remove the ignorable whitespace manually after parsing,
+e.g. by setting all tail text to None:
.. sourcecode:: python
@@ -921,6 +934,30 @@ valid encoding.
.. _`XML specification`: http://www.w3.org/TR/REC-xml/
+Can lxml parse from file objects opened in unicode/text mode?
+-------------------------------------------------------------
+
+Technically, yes. However, you likely do not want to do that, because
+it is extremely inefficient. The text encoding that libxml2 uses
+internally is UTF-8, so parsing from a Unicode file means that Python
+first reads a chunk of data from the file, then decodes it into a new
+buffer, and then copies it into a new unicode string object, just to
+let libxml2 make yet another copy while encoding it down into UTF-8
+in order to parse it. It's clear that this involves a lot more
+recoding and copying than when parsing straight from the bytes that
+the file contains.
+
+If you really know the encoding better than the parser (e.g. when
+parsing HTML that lacks a content declaration), then instead of passing
+an encoding parameter into the file object when opening it, create a
+new instance of an XMLParser or HTMLParser and pass the encoding into
+its constructor. Afterwards, use that parser for parsing, e.g. by
+passing it into the ``etree.parse(file, parser)`` function. Remember
+to open the file in binary mode (mode="rb"), or, if possible, prefer
+passing the file path directly into ``parse()`` instead of an opened
+Python file object.
+
+
What is the difference between str(xslt(doc)) and xslt(doc).write() ?
---------------------------------------------------------------------
@@ -1050,27 +1087,6 @@ API for lxml that applies certain counter measures internally.
.. _defusedxml: https://bitbucket.org/tiran/defusedxml
-Can lxml parse from file objects opened in unicode/text mode?
--------------------------------------------------------------
-
-Technically, yes. However, you likely do not want to do that, because
-it is extremely inefficient. The text encoding that libxml2 uses
-internally is UTF-8, so parsing from a Unicode file means that Python
-first reads a chunk of data from the file, then decodes it into a new
-buffer, and then copies it into a new unicode string object, just to
-let libxml2 make yet another copy while encoding it down into UTF-8
-in order to parse it. It's clear that this involves a lot more
-recoding and copying than when parsing straight from the bytes that
-the file contains.
-
-If you really know the encoding better than the parser (e.g. when
-parsing HTML that lacks a content declaration), then instead of passing
-an encoding parameter into the file object when opening it, create a
-new instance of an XMLParser or HTMLParser and pass the encoding into
-its constructor. Afterwards, use that parser for parsing, e.g. by
-passing it into the ``etree.parse(file, parser)`` function.
-
-
XPath and Document Traversal
============================
diff --git a/doc/html/FAQ.html b/doc/html/FAQ.html
index 0d5ccbdb..a93deac0 100644
--- a/doc/html/FAQ.html
+++ b/doc/html/FAQ.html
@@ -10,7 +10,7 @@
-
lxml FAQ - Frequently Asked Questions
+
lxml FAQ - Frequently Asked Questions
Frequently asked questions on lxml. See also the notes on compatibility to
ElementTree.
@@ -59,12 +59,12 @@
Parsing and Serialisation
XPath and Document Traversal
@@ -702,12 +702,22 @@ lxml can add fresh whitespace to the XML tree to indent it.
Note that the remove_blank_text option also uses a heuristic if it
has no definite knowledge about the document's ignorable whitespace.
It will keep blank text nodes that appear after non-blank text nodes
-at the same level. This is to prevent document-style XML from
-breaking.
-If you want to be sure all blank text is removed, you have to use
-either a DTD to tell the parser which whitespace it can safely ignore,
-or remove the ignorable whitespace manually after parsing, e.g. by
-setting all tail text to None:
+at the same level. This is to prevent document-style XML from losing
+content.
+The HTMLParser has this structural knowledge built-in, which means that
+most whitespace that appears between tags in HTML documents will not
+be removed by this option, except in places where it is truly ignorable,
+e.g. in the page header, between table structure tags, etc. Therefore,
+it is also safe to use this option with the HTMLParser, as it will keep
+content like the following intact (i.e. it will not remove the space
+that separates the two words):
+<p><b>some</b> <em>text</em></p>
+
+If you want to be sure all blank text is removed from an XML document
+(or just more blank text than the parser does by itself), you have to
+use either a DTD to tell the parser which whitespace it can safely
+ignore, or remove the ignorable whitespace manually after parsing,
+e.g. by setting all tail text to None:
for element in root.iter():
element.tail = None
@@ -747,6 +757,27 @@ Python interpreters. Don't do it.
broken. lxml will not parse them. You must provide parsable data in a
valid encoding.
+
+Can lxml parse from file objects opened in unicode/text mode?
+Technically, yes. However, you likely do not want to do that, because
+it is extremely inefficient. The text encoding that libxml2 uses
+internally is UTF-8, so parsing from a Unicode file means that Python
+first reads a chunk of data from the file, then decodes it into a new
+buffer, and then copies it into a new unicode string object, just to
+let libxml2 make yet another copy while encoding it down into UTF-8
+in order to parse it. It's clear that this involves a lot more
+recoding and copying than when parsing straight from the bytes that
+the file contains.
+If you really know the encoding better than the parser (e.g. when
+parsing HTML that lacks a content declaration), then instead of passing
+an encoding parameter into the file object when opening it, create a
+new instance of an XMLParser or HTMLParser and pass the encoding into
+its constructor. Afterwards, use that parser for parsing, e.g. by
+passing it into the etree.parse(file, parser) function. Remember
+to open the file in binary mode (mode="rb"), or, if possible, prefer
+passing the file path directly into parse() instead of an opened
+Python file object.
+
What is the difference between str(xslt(doc)) and xslt(doc).write() ?
The str() implementation of the XSLTResultTree class (a subclass of the
@@ -852,24 +883,6 @@ safely expose their values to the evaluation engine.
The defusedxml package comes with an example setup and a wrapper
API for lxml that applies certain counter measures internally.
-
-Can lxml parse from file objects opened in unicode/text mode?
-Technically, yes. However, you likely do not want to do that, because
-it is extremely inefficient. The text encoding that libxml2 uses
-internally is UTF-8, so parsing from a Unicode file means that Python
-first reads a chunk of data from the file, then decodes it into a new
-buffer, and then copies it into a new unicode string object, just to
-let libxml2 make yet another copy while encoding it down into UTF-8
-in order to parse it. It's clear that this involves a lot more
-recoding and copying than when parsing straight from the bytes that
-the file contains.
-If you really know the encoding better than the parser (e.g. when
-parsing HTML that lacks a content declaration), then instead of passing
-an encoding parameter into the file object when opening it, create a
-new instance of an XMLParser or HTMLParser and pass the encoding into
-its constructor. Afterwards, use that parser for parsing, e.g. by
-passing it into the etree.parse(file, parser) function.
-
XPath and Document Traversal
@@ -922,7 +935,7 @@ map it to your namespace. See also the question above.
diff --git a/doc/html/api.html b/doc/html/api.html
index 00b9dcf9..1cc9f2f2 100644
--- a/doc/html/api.html
+++ b/doc/html/api.html
@@ -8,7 +8,7 @@
-
APIs specific to lxml.etree
+
APIs specific to lxml.etree
lxml.etree tries to follow established APIs wherever possible. Sometimes,
however, the need to expose a feature in an easy way led to the invention of a
@@ -450,7 +450,7 @@ example:
diff --git a/doc/html/api/abc.ABCMeta-class.html b/doc/html/api/abc.ABCMeta-class.html
index 185c6a8a..4dad4448 100644
--- a/doc/html/api/abc.ABCMeta-class.html
+++ b/doc/html/api/abc.ABCMeta-class.html
@@ -426,7 +426,7 @@ even via super()).