From 015793d0719ad3d04267ad68c8f524b12a2264cc Mon Sep 17 00:00:00 2001
From: Stefan Behnel
Date: Mon, 31 Dec 2012 16:10:38 +0100
Subject: [PATCH] add warning to string tutorial that users should avoid char* in favour of Python objects unless they know what they are doing

---
 docs/src/tutorial/strings.rst | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/docs/src/tutorial/strings.rst b/docs/src/tutorial/strings.rst
index a95cd28..a146c21 100644
--- a/docs/src/tutorial/strings.rst
+++ b/docs/src/tutorial/strings.rst
@@ -10,7 +10,38 @@ strings (except for what Python 2 does in string operations). All
 encoding and decoding must pass through an explicit encoding/decoding
 step.
 
-It is, however, very easy to pass byte strings between C code and Python.
+General notes about C strings
+-----------------------------
+
+In many use cases, C strings (a.k.a. character pointers) are slow
+and cumbersome. For one, they usually require manual memory
+management in one way or another, which makes it more likely to
+introduce bugs into your code.
+
+Then, Python string objects cache their length, so requesting it
+(e.g. to validate the bounds of index access or when concatenating
+two strings into one) is an efficient constant time operation.
+In contrast, calling :c:func:`strlen()` to get this information
+from a C string takes linear time, which makes many operations on
+C strings rather costly.
+
+Regarding text processing, Python has built-in support for Unicode,
+which C lacks completely. If you are dealing with Unicode text,
+you are usually better off using Python Unicode string objects than
+trying to work with encoded data in C strings. Cython makes this
+quite easy and efficient.
+
+Generally speaking: unless you know what you are doing, avoid
+using C strings where possible and use Python string objects instead.
+The obvious exception to this is when passing them back and forth
+from and to external C code. Also, C++ strings remember their length
+as well, so they can provide a suitable alternative to Python bytes
+objects in some cases.
+
+Passing byte strings
+--------------------
+
+It is very easy to pass byte strings between C code and Python.
 When receiving a byte string from a C library, you can let Cython
 convert it into a Python byte string by simply assigning it to a
 Python variable::
@@ -18,6 +49,10 @@ Python variable::
     cdef char* c_string = c_call_returning_a_c_string()
     cdef bytes py_string = c_string
 
+A type cast to ``object`` or ``bytes`` will do the same thing::
+
+    py_string = <bytes> c_string
+
 This creates a Python byte string object that holds a copy of the
 original C string. It can be safely passed around in Python code, and
 will be garbage collected when the last reference to it goes out of
-- 
2.7.4
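
The copy semantics described in the "Passing byte strings" hunk can be approximated without a Cython build. The sketch below is only an analogy and is not part of the patch: it uses the standard library's ``ctypes.create_string_buffer`` as a hypothetical stand-in for a buffer owned by a C library, and the variable names are illustrative. It suggests that the resulting Python byte string is an independent copy with a cached length, and that turning it into text still takes an explicit decoding step.

```python
import ctypes

# Assumption: this mutable, NUL-terminated char array stands in for memory
# owned by a C library (the patch's c_call_returning_a_c_string() result).
c_buffer = ctypes.create_string_buffer(b"hello")

# Reading .value copies the bytes up to the first NUL into a new Python
# bytes object, comparable to `cdef bytes py_string = c_string` in Cython.
py_string = c_buffer.value

# Overwrite the C-side buffer in place; the Python copy is not affected.
c_buffer.value = b"HELLO"

# len() on a bytes object reads a stored length instead of scanning for NUL.
length = len(py_string)

# Bytes never become text implicitly; decoding is an explicit step.
text = py_string.decode("ascii")
```

Because ``py_string`` owns its own copy of the data, it stays valid after the C-side buffer is changed or released, which is the property the tutorial text relies on.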