How to build lxml from source ============================= To build lxml from source, you need libxml2 and libxslt properly installed, *including the header files*. These are likely shipped in separate ``-dev`` or ``-devel`` packages like ``libxml2-dev``, which you must install before trying to build lxml. The build process also requires setuptools_. The lxml source distribution comes with a script called ``ez_setup.py`` that can be used to install them. .. _setuptools: http://peak.telecommunity.com/DevCenter/setuptools .. contents:: .. 1 Cython 2 Github, git and hg 3 Setuptools 4 Running the tests and reporting errors 5 Building an egg 6 Building lxml on MacOS-X 7 Static linking on Windows 8 Building Debian packages from SVN sources Cython ------ .. _EasyInstall: http://peak.telecommunity.com/DevCenter/EasyInstall .. _Cython: http://www.cython.org The lxml.etree and lxml.objectify modules are written in Cython_. Since we distribute the Cython-generated .c files with lxml releases, however, you do not need Cython to build lxml from the normal release sources. We even encourage you to *not install Cython* for a normal release build, as the generated C code can vary quite heavily between Cython versions, which may or may not generate correct code for lxml. The pre-generated release sources were tested and therefore are known to work. So, if you want a reliable build of lxml, we suggest to a) use a source release of lxml and b) disable or uninstall Cython for the build. *Only* if you are interested in building lxml from a checkout of the developer sources (e.g. to test a bug fix that has not been release yet) or if you want to be an lxml developer, then you do need a working Cython installation. You can use EasyInstall_ to install it:: easy_install "Cython>=0.14.1" lxml currently requires Cython 0.14.1, later release versions should work as well. Github, git and hg ------------------- The lxml package is developed in a repository on Github_ using Mercurial_ and the `hg-git`_ plugin. You can retrieve the current developer version using:: hg clone git://github.com/lxml/lxml.git lxml This will create a directory ``lxml`` and download the source into it, including the complete development history. Don't be afraid, the download is fairly quick. You can also browse the `lxml repository`_ through the web. .. _Github: https://github.com/lxml/ .. _Mercurial: http://mercurial.selenic.com/ .. _`hg-git`: http://hg-git.github.com/ .. _`lxml repository`: https://github.com/lxml/lxml .. _`source tar-ball`: https://github.com/lxml/lxml/tarball/master Setuptools ---------- Usually, building lxml is done through setuptools. Clone the source repository as described above (or download the `source tar-ball`_ and unpack it) and then type:: python setup.py build or:: python setup.py bdist_egg If you want to test lxml from the source directory, it is better to build it in-place like this:: python setup.py build_ext -i or, in Unix-like environments:: make If you get errors about missing header files (e.g. ``Python.h`` or ``libxml/xmlversion.h``) then you need to make sure the development packages of Python, libxml2 and libxslt are properly installed. On Linux distributions, they are usually called something like ``libxml2-dev`` or ``libxslt-devel``. If these packages were installed in non-standard places, try passing the following option to setup.py to make sure the right config is found:: python setup.py build --with-xslt-config=/path/to/xslt-config If this doesn't help, you may have to add the location of the header files to the include path like:: python setup.py build_ext -i -I /usr/include/libxml2 where the file is in ``/usr/include/libxml2/libxml/xmlversion.h`` To use lxml.etree in-place, you can place lxml's ``src`` directory on your Python module search path (PYTHONPATH) and then import ``lxml.etree`` to play with it:: # cd lxml # PYTHONPATH=src python Python 2.5.1 Type "help", "copyright", "credits" or "license" for more information. >>> from lxml import etree >>> To make sure everything gets recompiled cleanly after changes, you can run ``make clean`` or delete the file ``src/lxml/etree.c``. Distutils do not automatically pick up changes that affect files other than the main file ``src/lxml/etree.pyx``. Running the tests and reporting errors -------------------------------------- The source distribution (tgz) and the source repository contain a test suite for lxml. You can run it from the top-level directory:: python test.py Note that the test script only tests the in-place build (see distutils building above), as it searches the ``src`` directory. You can use the following one-step command to trigger an in-place build and test it:: make test This also runs the ElementTree and cElementTree compatibility tests. To call them separately, make sure you have lxml on your PYTHONPATH first, then run:: python selftest.py and:: python selftest2.py If the tests give failures, errors, or worse, segmentation faults, we'd really like to know. Please contact us on the `mailing list`_, and please specify the version of lxml, libxml2, libxslt and Python you were using, as well as your operating system type (Linux, Windows, MacOs, ...). .. _`mailing list`: http://lxml.de/mailinglist/ Building an egg --------------- This is the procedure to make an lxml egg for your platform (assuming that you have setuptools_ installed): * Download the lxml-x.y.tar.gz release. This contains the pregenerated C so that you can be sure you build exactly from the release sources. Unpack them and cd into the resulting directory. * python setup.py build * If you're on a unixy platform, cd into ``build/lib.your.platform`` and strip any ``.so`` file you find there. This reduces the size of the egg considerably. * ``python setup.py bdist_egg`` This will put the egg into the ``dist`` directory. Building lxml on MacOS-X ------------------------ Apple regularly ships new system releases with horribly outdated system libraries. This is specifically the case for libxml2 and libxslt, where the system provided versions are too old to build lxml. While the Unix environment in MacOS-X makes it relatively easy to install Unix/Linux style package management tools and new software, it actually seems to be hard to get libraries set up for exclusive usage that MacOS-X ships in an older version. Alternative distributions (like macports) install their libraries in addition to the system libraries, but the compiler and the runtime loader on MacOS still sees the system libraries before the new libraries. This can lead to undebuggable crashes where the newer library seems to be loaded but the older system library is used. Apple discourages static building against libraries, which would help working around this problem. Apple does not ship static library binaries with its system and several package management systems follow this decision. Therefore, building static binaries requires building the dependencies first. The ``setup.py`` script does this automatically when you call it like this:: python setup.py build --static-deps This will download and build the latest versions of libxml2 and libxslt from the official FTP download site. If you want to use specific versions, or want to prevent any online access, you can download both ``tar.gz`` release files yourself, place them into a subdirectory ``libs`` in the lxml distribution, and call ``setup.py`` with the desired target versions like this:: python setup.py build --static-deps \ --libxml2-version=2.7.3 \ --libxslt-version=1.1.24 \ sudo python setup.py install Instead of ``build``, you can use any target, like ``bdist_egg`` if you want to use setuptools to build an installable egg. Note that this also works with EasyInstall_. Since you can't pass command line options in this case, you have to use an environment variable instead:: STATIC_DEPS=true easy_install lxml Some machines may require an additional run with "sudo" to install the package into the Python package directory:: STATIC_DEPS=true sudo easy_install lxml The ``STATICBUILD`` environment variable is handled equivalently to the ``STATIC_DEPS`` variable, but is used by some other extension packages, too. Static linking on Windows ------------------------- Most operating systems have proper package management that makes installing current versions of libxml2 and libxslt easy. The most famous exception is Microsoft Windows, which entirely lacks these capabilities. It can therefore be interesting to statically link the external libraries into lxml.etree to avoid having to install them separately. Download lxml and all required libraries to the same directory. The iconv, libxml2, libxslt, and zlib libraries are all available from the ftp site ftp://ftp.zlatkovic.com/pub/libxml/. Your directory should now have the following files in it (although most likely different versions):: iconv-1.9.1.win32.zip libxml2-2.6.23.win32.zip libxslt-1.1.15.win32.zip lxml-1.0.0.tgz zlib-1.2.3.win32.zip Now extract each of those files in the *same* directory. This should give you something like this:: iconv-1.9.1.win32/ iconv-1.9.1.win32.zip libxml2-2.6.23.win32/ libxml2-2.6.23.win32.zip libxslt-1.1.15.win32/ libxslt-1.1.15.win32.zip lxml-1.0.0/ lxml-1.0.0.tgz zlib-1.2.3.win32/ zlib-1.2.3.win32.zip Go to the lxml directory and edit the file ``setup.py``. There should be a section near the top that looks like this:: STATIC_INCLUDE_DIRS = [] STATIC_LIBRARY_DIRS = [] STATIC_CFLAGS = [] Change this section to something like this, but take care to use the correct version numbers:: STATIC_INCLUDE_DIRS = [ "..\\libxml2-2.6.23.win32\\include", "..\\libxslt-1.1.15.win32\\include", "..\\zlib-1.2.3.win32\\include", "..\\iconv-1.9.1.win32\\include" ] STATIC_LIBRARY_DIRS = [ "..\\libxml2-2.6.23.win32\\lib", "..\\libxslt-1.1.15.win32\\lib", "..\\zlib-1.2.3.win32\\lib", "..\\iconv-1.9.1.win32\\lib" ] STATIC_CFLAGS = [] Add any CFLAGS you might consider useful to the third list. Now you should be able to pass the ``--static`` option to setup.py and everything should work well. Try calling:: python setup.py bdist_wininst --static This will create a windows installer in the ``pkg`` directory. Building Debian packages from SVN sources ----------------------------------------- `Andreas Pakulat`_ proposed the following approach. .. _`Andreas Pakulat`: http://thread.gmane.org/gmane.comp.python.lxml.devel/1239/focus=1249 * ``apt-get source lxml`` * remove the unpacked directory * tar.gz the lxml SVN version and replace the orig.tar.gz that lies in the directory * check md5sum of created tar.gz file and place new sum and size in dsc file * do ``dpkg-source -x lxml-[VERSION].dsc`` and cd into the newly created directory * run ``dch -i`` and add a comment like "use trunk version", this will increase the debian version number so apt/dpkg won't get confused * run ``dpkg-buildpackage -rfakeroot -us -uc`` to build the package In case ``dpkg-buildpackage`` tells you that some dependecies are missing, you can either install them manually or run ``apt-get build-dep lxml``. That will give you .deb packages in the parent directory which can be installed using ``dpkg -i``.