From 22e7a9e4a8c167c5b4da85966b7ecf2eef5af5a9 Mon Sep 17 00:00:00 2001
From: Stefan Behnel
Date: Sun, 2 Jun 2013 15:48:01 +0200
Subject: [PATCH] add explicit section on Cython's Python string types

---
 docs/src/tutorial/strings.rst | 45 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/docs/src/tutorial/strings.rst b/docs/src/tutorial/strings.rst
index cea7d76..0e6ef81 100644
--- a/docs/src/tutorial/strings.rst
+++ b/docs/src/tutorial/strings.rst
@@ -3,14 +3,43 @@
 Unicode and passing strings
 ===========================
 
-Similar to the string semantics in Python 3, Cython also strictly
-separates byte strings and unicode strings. Above all, this means
-that by default there is no automatic conversion between byte strings
-and unicode strings (except for what Python 2 does in string operations).
-All encoding and decoding must pass through an explicit encoding/decoding
-step. For simple cases, the module-level ``c_string_type`` and
-``c_string_encoding`` directives can be used to implicitly insert these
-encoding/decoding steps to ease conversion between Python and C strings.
+Similar to the string semantics in Python 3, Cython strictly separates
+byte strings and unicode strings. Above all, this means that by default
+there is no automatic conversion between byte strings and unicode strings
+(except for what Python 2 does in string operations). All encoding and
+decoding must pass through an explicit encoding/decoding step. To ease
+conversion between Python and C strings in simple cases, the module-level
+``c_string_type`` and ``c_string_encoding`` directives can be used to
+implicitly insert these encoding/decoding steps.
+
+
+Python string types in Cython code
+----------------------------------
+
+Cython supports three Python string types: :type:`bytes`, :type:`str`
+and :type:`unicode`. The :type:`str` type is special in that it is the
+byte string in Python 2 and the Unicode string in Python 3 (for Cython
+code compiled with language level 2, i.e. the default). Thus, in Python
+2, both :type:`bytes` and :type:`str` represent the byte string type,
+whereas in Python 3, :type:`str` and :type:`unicode` represent the Python
+Unicode string type. The switch is made at C compile time; the Python
+version that is used to run Cython is not relevant.
+
+When compiling Cython code with language level 3, the :type:`str` type
+is identified with exactly the Unicode string type at Cython compile time,
+i.e. it does not identify with :type:`bytes` when running in Python 2.
+
+Note that the :type:`str` type is not compatible with the :type:`unicode`
+type in Python 2, i.e. you cannot assign a Unicode string to a variable
+or argument that is typed :type:`str`. The attempt will result in either
+a compile time error (if detectable) or a ``TypeError`` exception at
+runtime. You should therefore be careful when you statically type a
+string variable in code that must be compatible with Python 2, as this
+Python version allows a mix of byte strings and unicode strings for data
+and users normally expect code to be able to work with both. Code that
+only targets Python 3 can safely type variables and arguments as either
+:type:`bytes` or :type:`unicode`.
+
+
 General notes about C strings
 -----------------------------
-- 
2.7.4
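The first paragraph of the patch states that byte strings and unicode strings never convert into each other automatically and that every conversion needs an explicit encoding or decoding step. Because the patch describes this as matching Python 3 semantics, the rule can be sketched in plain Python 3 code, which Cython also compiles. The helpers `to_unicode`, `to_bytes` and `mixing_fails` are hypothetical names chosen for illustration, and UTF-8 as the default encoding is an assumption rather than something the patch prescribes.

```python
# Minimal sketch, assuming Python 3 string semantics (which the patch says
# Cython follows): bytes and str never convert into each other implicitly.
# The helper names below are hypothetical, chosen for illustration only.


def to_unicode(data, encoding="utf-8"):
    """Explicitly decode a byte string into a Unicode string."""
    return data.decode(encoding)


def to_bytes(text, encoding="utf-8"):
    """Explicitly encode a Unicode string into a byte string."""
    return text.encode(encoding)


def mixing_fails(data, text):
    """Return True if concatenating a byte string and a Unicode string
    raises TypeError, i.e. no automatic conversion takes place."""
    try:
        data + text
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    raw = b"caf\xc3\xa9"                # UTF-8 encoded bytes of "café"
    text = to_unicode(raw)              # explicit decoding step
    print(text == "caf\u00e9")          # True
    print(to_bytes(text) == raw)        # True (explicit encoding step)
    print(mixing_fails(raw, text))      # True
```

For code that must also run under Python 2, the patch cautions against statically typing an argument such as `text` as `str`, since a Unicode value would then be rejected.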