From: Cory Benfield <lukasaoz@gmail.com>
Date: Fri, 10 Aug 2012 13:47:13 +0000 (+0100)
Subject: Document encodings and RFC compliance.
X-Git-Tag: v0.13.7~10^2
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=7a9419ce356d0be0383ec83b14e66658ea7f96be;p=services%2Fpython-requests.git

Document encodings and RFC compliance.
---

diff --git a/docs/user/advanced.rst b/docs/user/advanced.rst
index adda9c7..0ac450d 100644
--- a/docs/user/advanced.rst
+++ b/docs/user/advanced.rst
@@ -343,6 +343,31 @@ To use HTTP Basic Auth with your proxy, use the `http://user:password@host/` syn
         "http": "http://user:pass@10.10.1.10:3128/",
     }
 
+Compliance
+----------
+
+Requests is intended to be compliant with all relevant specifications and
+RFCs where that compliance will not cause difficulties for users. This
+attention to the specification can lead to some behaviour that may seem
+unusual to those not familiar with the relevant specification.
+
+Encodings
+^^^^^^^^^
+
+When you receive a response, Requests makes a guess at the encoding to use for
+decoding the response when you access the ``Response.text`` property. Requests
+will first check for an encoding in the HTTP header, and if none is present,
+will use `chardet <http://pypi.python.org/pypi/chardet>`_ to attempt to guess
+the encoding.
+
+The only time Requests will not guess the encoding is if no explicit charset
+is present in the HTTP headers **and** the ``Content-Type`` header contains
+``text``. In this situation,
+`RFC 2616 <http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1>`_
+specifies that the default charset must be ``ISO-8859-1``. Requests follows
+the specification in this case. If you require a different encoding, you can
+manually set the ``Response.encoding`` property, or use the raw
+``Response.content``.
 
 HTTP Verbs
 ----------
diff --git a/docs/user/quickstart.rst b/docs/user/quickstart.rst
index 9b0399d..71ffea0 100644
--- a/docs/user/quickstart.rst
+++ b/docs/user/quickstart.rst
@@ -86,12 +86,22 @@ again::
 Requests will automatically decode content from the server. Most unicode
 charsets are seamlessly decoded.
 
-When you make a request, ``r.encoding`` is set, based on the HTTP headers.
-Requests will use that encoding when you access ``r.text``.  If ``r.encoding``
-is ``None``, Requests will make an extremely educated guess of the encoding
-of the response body. You can manually set ``r.encoding`` to any encoding
-you'd like, and that charset will be used.
-
+When you make a request, Requests makes educated guesses about the encoding of
+the response based on the HTTP headers. The text encoding guessed by Requests
+is used when you access ``r.text``. You can find out what encoding Requests is
+using, and change it, using the ``r.encoding`` property::
+
+    >>> r.encoding
+    'utf-8'
+    >>> r.encoding = 'ISO-8859-1'
+
+If you change the encoding, Requests will use the new value of ``r.encoding``
+whenever you access ``r.text``.
+
+Requests will also use custom encodings in the event that you need them. If
+you have created your own encoding and registered it with the ``codecs``
+module, you can simply use the codec name as the value of ``r.encoding`` and
+Requests will handle the decoding for you.
 
 Binary Response Content
 -----------------------