docs/INTERNALS

   1 Table of Contents
   2 =================
   3
   4  - [Intro](#intro)
   5  - [git](#git)
   6  - [Portability](#Portability)
   7  - [Windows vs Unix](#winvsunix)
   8  - [Library](#Library)
   9    - [`Curl_connect`](#Curl_connect)
  10    - [`Curl_do`](#Curl_do)
  11    - [`Curl_readwrite`](#Curl_readwrite)
  12    - [`Curl_done`](#Curl_done)
  13    - [`Curl_disconnect`](#Curl_disconnect)
  14  - [HTTP(S)](#http)
  15  - [FTP](#ftp)
  16    - [Kerberos](#kerberos)
  17  - [TELNET](#telnet)
  18  - [FILE](#file)
  19  - [SMB](#smb)
  20  - [LDAP](#ldap)
  21  - [E-mail](#email)
  22  - [General](#general)
  23  - [Persistent Connections](#persistent)
  24  - [multi interface/non-blocking](#multi)
  25  - [SSL libraries](#ssl)
  26  - [Library Symbols](#symbols)
  27  - [Return Codes and Informationals](#returncodes)
   28  - [API/ABI](#abi)
  29  - [Client](#client)
  30  - [Memory Debugging](#memorydebug)
  31  - [Test Suite](#test)
  32  - [Asynchronous name resolves](#asyncdns)
  33    - [c-ares](#cares)
  34  - [`curl_off_t`](#curl_off_t)
  35  - [curlx](#curlx)
  36  - [Content Encoding](#contentencoding)
  37  - [hostip.c explained](#hostip)
  38  - [Track Down Memory Leaks](#memoryleak)
  39  - [`multi_socket`](#multi_socket)
  40  - [Structs in libcurl](#structs)
  41
  42 <a name="intro"></a>
  43 curl internals
  44 ==============
  45
  46  This project is split in two. The library and the client. The client part
  47  uses the library, but the library is designed to allow other applications to
  48  use it.
  49
  50  The largest amount of code and complexity is in the library part.
  51
  52
  53 <a name="git"></a>
  54 git
  55 ===
  56
  57  All changes to the sources are committed to the git repository as soon as
  58  they're somewhat verified to work. Changes shall be committed as independently
  59  as possible so that individual changes can be easier spotted and tracked
  60  afterwards.
  61
  62  Tagging shall be used extensively, and by the time we release new archives we
  63  should tag the sources with a name similar to the released version number.
  64
  65 <a name="Portability"></a>
  66 Portability
  67 ===========
  68
   69  We write curl and libcurl to compile with C89 compilers on 32-bit and
   70  larger machines. Most of libcurl assumes more or less POSIX compliance,
   71  but that's not a requirement.
  72
  73  We write libcurl to build and work with lots of third party tools, and we
  74  want it to remain functional and buildable with these and later versions
   75  (older versions may still work, but that is not what we work hard to maintain):
  76
  77 Dependencies
  78 ------------
  79
  80  - OpenSSL      0.9.7
  81  - GnuTLS       1.2
  82  - zlib         1.1.4
  83  - libssh2      0.16
  84  - c-ares       1.6.0
  85  - libidn       0.4.1
  86  - cyassl       2.0.0
  87  - openldap     2.0
  88  - MIT Kerberos 1.2.4
  89  - GSKit        V5R3M0
  90  - NSS          3.14.x
  91  - axTLS        1.2.7
  92  - PolarSSL     1.3.0
  93  - Heimdal      ?
  94  - nghttp2      1.0.0
  95
  96 Operating Systems
  97 -----------------
  98
  99  On systems where configure runs, we aim at working on them all - if they have
 100  a suitable C compiler. On systems that don't run configure, we strive to keep
 101  curl running fine on:
 102
 103  - Windows      98
 104  - AS/400       V5R3M0
 105  - Symbian      9.1
 106  - Windows CE   ?
 107  - TPF          ?
 108
 109 Build tools
 110 -----------
 111
 112  When writing code (mostly for generating stuff included in release tarballs)
 113  we use a few "build tools" and we make sure that we remain functional with
 114  these versions:
 115
 116  - GNU Libtool  1.4.2
 117  - GNU Autoconf 2.57
 118  - GNU Automake 1.7
 119  - GNU M4       1.4
 120  - perl         5.004
 121  - roffit       0.5
 122  - groff        ? (any version that supports "groff -Tps -man [in] [out]")
 123  - ps2pdf (gs)  ?
 124
 125 <a name="winvsunix"></a>
 126 Windows vs Unix
 127 ===============
 128
 129  There are a few differences in how to program curl the unix way compared to
 130  the Windows way. The four perhaps most notable details are:
 131
 132  1. Different function names for socket operations.
 133
 134    In curl, this is solved with defines and macros, so that the source looks
 135    the same at all places except for the header file that defines them. The
 136    macros in use are sclose(), sread() and swrite().
 137
 138  2. Windows requires a couple of init calls for the socket stuff.
 139
  140    That's taken care of by the `curl_global_init()` call, but if other
  141    libraries in the application also initialize the socket layer, the
  142    application may have reason to alter that behaviour.
 143
 144  3. The file descriptors for network communication and file operations are
 145     not easily interchangeable as in unix.
 146
 147    We avoid this by not trying any funny tricks on file descriptors.
 148
 149  4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
 150     destroying binary data, although you do want that conversion if it is
 151     text coming through... (sigh)
 152
  153    We set stdout to binary mode under Windows.
 154
  155  Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All
  156  conditionals that deal with features *should* instead be in the format
  157  `#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts,
  158  we maintain a `config-win32.h` file in the lib directory that is supposed to
  159  look exactly like the `curl_config.h` file that configure would have
  160  generated on a Windows machine!
 161
 162  Generally speaking: always remember that this will be compiled on dozens of
 163  operating systems. Don't walk on the edge.
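
 As a rough illustration of point 1 above, the sketch below keeps the
 platform difference inside one block of macro definitions so that calling
 code contains no OS conditionals. The `my_` names and the `write_all()`
 helper are hypothetical and chosen for this example only; they are not
 libcurl's actual definitions of sclose(), sread() and swrite().

```c
#include <stddef.h>

/* The only place that knows which platform we are on. */
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET my_socket_t;
#define my_sclose(s)       closesocket(s)
#define my_sread(s, b, n)  recv((s), (char *)(b), (int)(n), 0)
#define my_swrite(s, b, n) send((s), (const char *)(b), (int)(n), 0)
#else
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int my_socket_t;
#define my_sclose(s)       close(s)
#define my_sread(s, b, n)  recv((s), (b), (n), 0)
#define my_swrite(s, b, n) send((s), (b), (n), 0)
#endif

/* Portable caller: writes the whole buffer, retrying on short writes.
   Returns 0 on success and -1 on error. No OS conditionals needed here. */
static int write_all(my_socket_t s, const char *buf, size_t len)
{
  while(len > 0) {
    long n = (long)my_swrite(s, buf, len);
    if(n <= 0)
      return -1;
    buf += n;
    len -= (size_t)n;
  }
  return 0;
}
```

 Code that calls `write_all()` and `my_sclose()` then reads the same on every
 platform, which is the property the macros are meant to provide.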
 164
 165 <a name="Library"></a>
 166 Library
 167 =======
 168
 169  (See `LIBCURL-STRUCTS` for a separate document describing all major internal
 170  structs and their purposes.)
 171
 172  There are plenty of entry points to the library, namely each publicly defined
 173  function that libcurl offers to applications. All of those functions are
 174  rather small and easy-to-follow. All the ones prefixed with `curl_easy` are
 175  put in the lib/easy.c file.
 176
  177  `curl_global_init()` and `curl_global_cleanup()` should be called by the
 178  application to initialize and clean up global stuff in the library. As of
 179  today, it can handle the global SSL initing if SSL is enabled and it can init
 180  the socket layer on windows machines. libcurl itself has no "global" scope.
 181
 182  All printf()-style functions use the supplied clones in lib/mprintf.c. This
 183  makes sure we stay absolutely platform independent.
 184
  185  [`curl_easy_init()`][2] allocates an internal struct and makes some
 186  initializations.  The returned handle does not reveal internals. This is the
 187  'SessionHandle' struct which works as an "anchor" struct for all `curl_easy`
 188  functions. All connections performed will get connect-specific data allocated
 189  that should be used for things related to particular connections/requests.
 190
 191  [`curl_easy_setopt()`][1] takes three arguments, where the option stuff must
 192  be passed in pairs: the parameter-ID and the parameter-value. The list of
 193  options is documented in the man page. This function mainly sets things in
 194  the 'SessionHandle' struct.
 195
 196  `curl_easy_perform()` is just a wrapper function that makes use of the multi
 197  API.  It basically calls `curl_multi_init()`, `curl_multi_add_handle()`,
 198  `curl_multi_wait()`, and `curl_multi_perform()` until the transfer is done
 199  and then returns.
 200
 201  Some of the most important key functions in url.c are called from multi.c
 202  when certain key steps are to be made in the transfer operation.
 203
 204 <a name="Curl_connect"></a>
 205 Curl_connect()
 206 --------------
 207
  208    Analyzes the URL, separates the different components and connects to the
 209    remote host. This may involve using a proxy and/or using SSL. The
 210    `Curl_resolv()` function in lib/hostip.c is used for looking up host names
 211    (it does then use the proper underlying method, which may vary between
 212    platforms and builds).
 213
 214    When `Curl_connect` is done, we are connected to the remote site. Then it
 215    is time to tell the server to get a document/file. `Curl_do()` arranges
 216    this.
 217
 218    This function makes sure there's an allocated and initiated 'connectdata'
 219    struct that is used for this particular connection only (although there may
 220    be several requests performed on the same connect). A bunch of things are
 221    inited/inherited from the SessionHandle struct.
 222
 223 <a name="Curl_do"></a>
 224 Curl_do()
 225 ---------
 226
 227    `Curl_do()` makes sure the proper protocol-specific function is called. The
 228    functions are named after the protocols they handle.
 229
 230    The protocol-specific functions of course deal with protocol-specific
 231    negotiations and setup. They have access to the `Curl_sendf()` (from
 232    lib/sendf.c) function to send printf-style formatted data to the remote
 233    host and when they're ready to make the actual file transfer they call the
  234    `Curl_setup_transfer()` function (in lib/transfer.c) to set up the
  235    transfer and return.
 236
 237    If this DO function fails and the connection is being re-used, libcurl will
 238    then close this connection, setup a new connection and re-issue the DO
 239    request on that. This is because there is no way to be perfectly sure that
 240    we have discovered a dead connection before the DO function and thus we
 241    might wrongly be re-using a connection that was closed by the remote peer.
 242
 243    Some time during the DO function, the `Curl_setup_transfer()` function must
 244    be called with some basic info about the upcoming transfer: what socket(s)
 245    to read/write and the expected file transfer sizes (if known).
 246
 247 <a name="Curl_readwrite"></a>
 248 Curl_readwrite()
 249 ----------------
 250
 251    Called during the transfer of the actual protocol payload.
 252
 253    During transfer, the progress functions in lib/progress.c are called at a
 254    frequent interval (or at the user's choice, a specified callback might get
 255    called). The speedcheck functions in lib/speedcheck.c are also used to
 256    verify that the transfer is as fast as required.
 257
 258 <a name="Curl_done"></a>
 259 Curl_done()
 260 -----------
 261
 262    Called after a transfer is done. This function takes care of everything
 263    that has to be done after a transfer. This function attempts to leave
 264    matters in a state so that `Curl_do()` should be possible to call again on
 265    the same connection (in a persistent connection case). It might also soon
 266    be closed with `Curl_disconnect()`.
 267
 268 <a name="Curl_disconnect"></a>
 269 Curl_disconnect()
 270 -----------------
 271
 272    When doing normal connections and transfers, no one ever tries to close any
 273    connections so this is not normally called when `curl_easy_perform()` is
 274    used. This function is only used when we are certain that no more transfers
  275    are going to be made on the connection. A connection can also be closed
  276    by force, or this function can be called to make sure that libcurl
  277    doesn't keep too many connections alive at the same time.
 278
 279    This function cleans up all resources that are associated with a single
 280    connection.
 281
 282 <a name="http"></a>
 283 HTTP(S)
 284 =======
 285
 286  HTTP offers a lot and is the protocol in curl that uses the most lines of
 287  code. There is a special file (lib/formdata.c) that offers all the multipart
 288  post functions.
 289
  290  base64-functions for user+password stuff (and more) are in (lib/base64.c) and
 291  all functions for parsing and sending cookies are found in (lib/cookie.c).
 292
  293  HTTPS uses almost exactly the same procedure as HTTP, with only two
 294  exceptions: the connect procedure is different and the function used to read
 295  or write from the socket is different, although the latter fact is hidden in
 296  the source by the use of `Curl_read()` for reading and `Curl_write()` for
 297  writing data to the remote server.
 298
  299  `http_chunks.c` contains functions that understand HTTP 1.1 chunked transfer
 300  encoding.
 301
 302  An interesting detail with the HTTP(S) request, is the `Curl_add_buffer()`
 303  series of functions we use. They append data to one single buffer, and when
 304  the building is done the entire request is sent off in one single write. This
 305  is done this way to overcome problems with flawed firewalls and lame servers.
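
 The "build everything in one buffer, then send it in one write" idea can be
 sketched as below. This is a minimal model under stated assumptions, not the
 `Curl_add_buffer()` implementation: `struct req_buffer` and `req_addf()` are
 hypothetical names, and the growth policy (doubling) is just one reasonable
 choice.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical growable request buffer; initialize with {NULL, 0, 0}. */
struct req_buffer {
  char *data;   /* allocated storage, always null-terminated once used */
  size_t used;  /* bytes of request text stored so far */
  size_t size;  /* bytes allocated */
};

/* Append printf-style formatted text to the buffer.
   Returns 0 on success and -1 on formatting or allocation failure. */
static int req_addf(struct req_buffer *b, const char *fmt, ...)
{
  va_list ap;
  int need;

  /* First pass: find out how many bytes the formatted text needs. */
  va_start(ap, fmt);
  need = vsnprintf(NULL, 0, fmt, ap);
  va_end(ap);
  if(need < 0)
    return -1;

  /* Grow the buffer if the new text (plus terminator) does not fit. */
  if(b->used + (size_t)need + 1 > b->size) {
    size_t newsize = (b->used + (size_t)need + 1) * 2;
    char *p = realloc(b->data, newsize);
    if(!p)
      return -1;
    b->data = p;
    b->size = newsize;
  }

  /* Second pass: format directly into the tail of the buffer. */
  va_start(ap, fmt);
  vsnprintf(b->data + b->used, b->size - b->used, fmt, ap);
  va_end(ap);
  b->used += (size_t)need;
  return 0;
}
```

 A caller would append the request line and each header with `req_addf()`,
 then hand `data` and `used` to a single write call and free `data`
 afterwards.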
 306
 307 <a name="ftp"></a>
 308 FTP
 309 ===
 310
 311  The `Curl_if2ip()` function can be used for getting the IP number of a
 312  specified network interface, and it resides in lib/if2ip.c.
 313
 314  `Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
 315  was made a separate function to prevent us programmers from forgetting that
 316  they must be CRLF terminated. They must also be sent in one single write() to
 317  make firewalls and similar happy.
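
 To show why a dedicated function helps, here is a small sketch that formats
 a command and always appends the CRLF terminator, so the caller cannot
 forget it and can pass the result to one single write. `ftp_format_cmd()` is
 a hypothetical helper written for this example; it is not the
 `Curl_ftpsendf()` source.

```c
#include <stdarg.h>
#include <stdio.h>

/* Format an FTP command into 'out' and terminate it with CRLF.
   Returns the total length including CRLF, or -1 if the command
   could not be formatted or does not fit in 'outlen' bytes. */
static int ftp_format_cmd(char *out, size_t outlen, const char *fmt, ...)
{
  va_list ap;
  int n;

  va_start(ap, fmt);
  n = vsnprintf(out, outlen, fmt, ap);
  va_end(ap);

  /* Need room for the text, "\r\n" and the terminating null byte. */
  if(n < 0 || (size_t)n + 3 > outlen)
    return -1;

  out[n] = '\r';
  out[n + 1] = '\n';
  out[n + 2] = '\0';
  return n + 2;
}
```

 The returned length is what the caller would pass, together with the
 buffer, to a single write on the control connection.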
 318
 319 <a name="kerberos"></a>
 320 Kerberos
 321 --------
 322
 323  Kerberos support is mainly in lib/krb5.c and lib/security.c but also
 324  `curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and
 325  `socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.
 326
 327 <a name="telnet"></a>
 328 TELNET
 329 ======
 330
 331  Telnet is implemented in lib/telnet.c.
 332
 333 <a name="file"></a>
 334 FILE
 335 ====
 336
 337  The file:// protocol is dealt with in lib/file.c.
 338
 339 <a name="smb"></a>
 340 SMB
 341 ===
 342
 343  The smb:// protocol is dealt with in lib/smb.c.
 344
 345 <a name="ldap"></a>
 346 LDAP
 347 ====
 348
 349  Everything LDAP is in lib/ldap.c and lib/openldap.c
 350
 351 <a name="email"></a>
 352 E-mail
 353 ======
 354
 355  The e-mail related source code is in lib/imap.c, lib/pop3.c and lib/smtp.c.
 356
 357 <a name="general"></a>
 358 General
 359 =======
 360
 361  URL encoding and decoding, called escaping and unescaping in the source code,
 362  is found in lib/escape.c.
 363
 364  While transferring data in Transfer() a few functions might get used.
 365  `curl_getdate()` in lib/parsedate.c is for HTTP date comparisons (and more).
 366
 367  lib/getenv.c offers `curl_getenv()` which is for reading environment
 368  variables in a neat platform independent way. That's used in the client, but
 369  also in lib/url.c when checking the proxy environment variables. Note that
 370  contrary to the normal unix getenv(), this returns an allocated buffer that
 371  must be free()ed after use.
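
 The ownership rule described above (the caller gets its own allocated copy
 and must release it) can be sketched like this. `getenv_copy()` is a
 hypothetical stand-in written for illustration; it is not the
 `curl_getenv()` implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of an environment variable's value, or
   NULL if the variable is unset or memory runs out. Unlike getenv(), the
   result stays valid if the environment changes later, and the caller
   owns it and must free() it. */
static char *getenv_copy(const char *name)
{
  const char *value = getenv(name);
  char *copy;
  size_t len;

  if(!value)
    return NULL;

  len = strlen(value) + 1;  /* include the terminating null byte */
  copy = malloc(len);
  if(copy)
    memcpy(copy, value, len);
  return copy;
}
```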
 372
  373  lib/netrc.c holds the .netrc parser.
 374
 375  lib/timeval.c features replacement functions for systems that don't have
 376  gettimeofday() and a few support functions for timeval conversions.
 377
 378  A function named `curl_version()` that returns the full curl version string
 379  is found in lib/version.c.
 380
 381 <a name="persistent"></a>
 382 Persistent Connections
 383 ======================
 384
 385  The persistent connection support in libcurl requires some considerations on
 386  how to do things inside of the library.
 387
 388  - The 'SessionHandle' struct returned in the [`curl_easy_init()`][2] call
 389    must never hold connection-oriented data. It is meant to hold the root data
 390    as well as all the options etc that the library-user may choose.
 391
 392  - The 'SessionHandle' struct holds the "connection cache" (an array of
 393    pointers to 'connectdata' structs).
 394
 395  - This enables the 'curl handle' to be reused on subsequent transfers.
 396
 397  - When libcurl is told to perform a transfer, it first checks for an already
 398    existing connection in the cache that we can use. Otherwise it creates a
  399    new one and adds that to the cache. If the cache is already full when a
  400    new connection is added, it will first close the oldest unused one.
 401
 402  - When the transfer operation is complete, the connection is left
 403    open. Particular options may tell libcurl not to, and protocols may signal
 404    closure on connections and then they won't be kept open of course.
 405
 406  - When `curl_easy_cleanup()` is called, we close all still opened connections,
 407    unless of course the multi interface "owns" the connections.
 408
 409  The curl handle must be re-used in order for the persistent connections to
 410  work.
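
 The lookup-then-evict behaviour in the list above can be modelled with a
 tiny fixed-size cache. Everything here is an assumption made for the sketch:
 the struct layout, the cache size of 2, matching on host name only, and the
 logical clock used to find the oldest unused entry. The real cache holds
 'connectdata' structs and matches on much more than the host.

```c
#include <string.h>

#define CACHE_SIZE 2  /* deliberately tiny so eviction is easy to see */

struct cached_conn {
  char host[64];
  int open;                 /* slot holds a live connection */
  int in_use;               /* a transfer currently owns it */
  unsigned long last_used;  /* logical timestamp of last use */
};

struct conn_cache {
  struct cached_conn slots[CACHE_SIZE];
  unsigned long clock;      /* increases on every get/release */
  int closed_count;         /* connections closed to make room */
};

/* Find an idle connection to 'host', or create one, closing the oldest
   unused connection if the cache is full. Returns NULL if every slot is
   busy with a transfer. */
static struct cached_conn *cache_get(struct conn_cache *c, const char *host)
{
  struct cached_conn *victim = NULL;
  int i;

  /* 1. Prefer re-using an existing idle connection to the same host. */
  for(i = 0; i < CACHE_SIZE; i++) {
    struct cached_conn *s = &c->slots[i];
    if(s->open && !s->in_use && !strcmp(s->host, host)) {
      s->in_use = 1;
      s->last_used = ++c->clock;
      return s;
    }
  }

  /* 2. Otherwise take a free slot, or the oldest unused connection. */
  for(i = 0; i < CACHE_SIZE; i++) {
    struct cached_conn *s = &c->slots[i];
    if(!s->open) {
      victim = s;
      break;
    }
    if(!s->in_use && (!victim || s->last_used < victim->last_used))
      victim = s;
  }
  if(!victim)
    return NULL;
  if(victim->open)
    c->closed_count++;  /* the old connection would be closed here */

  strncpy(victim->host, host, sizeof(victim->host) - 1);
  victim->host[sizeof(victim->host) - 1] = '\0';
  victim->open = 1;
  victim->in_use = 1;
  victim->last_used = ++c->clock;
  return victim;
}

/* Transfer finished: leave the connection open for later re-use. */
static void cache_release(struct conn_cache *c, struct cached_conn *s)
{
  s->in_use = 0;
  s->last_used = ++c->clock;
}
```

 Because the cache lives in the handle, a second transfer only finds the
 idle connection if it goes through the same handle, which is why the handle
 must be re-used.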
 411
 412 <a name="multi"></a>
 413 multi interface/non-blocking
 414 ============================
 415
 416  The multi interface is a non-blocking interface to the library. To make that
  417  interface work as well as possible, no low-level functions within libcurl
 418  must be written to work in a blocking manner. (There are still a few spots
 419  violating this rule.)
 420
 421  One of the primary reasons we introduced c-ares support was to allow the name
 422  resolve phase to be perfectly non-blocking as well.
 423
 424  The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust
 425  the code to allow non-blocking operations even on multi-stage command-
 426  response protocols. They are built around state machines that return when
 427  they would otherwise block waiting for data.  The DICT, LDAP and TELNET
  428  protocols are crappy examples and they are subject to rewrite in the future
 429  to better fit the libcurl protocol family.
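
 A state machine of the kind mentioned above might look like the following
 sketch of an FTP-style login. It is an illustration only: the state names,
 `struct login_machine`, `login_step()` and the credentials are hypothetical,
 and only the reply codes (220, 331, 230) come from the FTP protocol. The key
 property is that the step function returns immediately when no reply has
 arrived instead of waiting for one.

```c
#include <stddef.h>

enum login_state {
  ST_WAIT_GREETING,    /* expecting the 220 greeting */
  ST_WAIT_USER_REPLY,  /* USER sent, expecting 331 or 230 */
  ST_WAIT_PASS_REPLY,  /* PASS sent, expecting 230 */
  ST_DONE,
  ST_FAILED
};

struct login_machine {
  enum login_state state;
};

/* Advance the machine by at most one step.
   'reply' is the numeric reply code that has arrived, or 0 if no complete
   reply is available yet. '*to_send' is set to the next command to send,
   or NULL. Returns 1 when the login is finished (done or failed) and 0
   when the caller should go back to waiting for socket activity. */
static int login_step(struct login_machine *m, int reply,
                      const char **to_send)
{
  *to_send = NULL;

  if(reply == 0)
    return 0;  /* would block: return instead of waiting */

  switch(m->state) {
  case ST_WAIT_GREETING:
    if(reply == 220) {
      *to_send = "USER anonymous\r\n";
      m->state = ST_WAIT_USER_REPLY;
    }
    else
      m->state = ST_FAILED;
    break;
  case ST_WAIT_USER_REPLY:
    if(reply == 331) {
      *to_send = "PASS curl@example.com\r\n";
      m->state = ST_WAIT_PASS_REPLY;
    }
    else if(reply == 230)
      m->state = ST_DONE;  /* no password needed */
    else
      m->state = ST_FAILED;
    break;
  case ST_WAIT_PASS_REPLY:
    m->state = (reply == 230) ? ST_DONE : ST_FAILED;
    break;
  default:
    break;
  }
  return m->state == ST_DONE || m->state == ST_FAILED;
}
```

 The caller invokes `login_step()` each time the socket becomes readable and
 is free to service other transfers in between.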
 430
 431 <a name="ssl"></a>
 432 SSL libraries
 433 =============
 434
 435  Originally libcurl supported SSLeay for SSL/TLS transports, but that was then
 436  extended to its successor OpenSSL but has since also been extended to several
 437  other SSL/TLS libraries and we expect and hope to further extend the support
 438  in future libcurl versions.
 439
 440  To deal with this internally in the best way possible, we have a generic SSL
 441  function API as provided by the vtls/vtls.[ch] system, and they are the only
 442  SSL functions we must use from within libcurl. vtls is then crafted to use
 443  the appropriate lower-level function calls to whatever SSL library that is in
 444  use. For example vtls/openssl.[ch] for the OpenSSL library.
 445
 446 <a name="symbols"></a>
 447 Library Symbols
 448 ===============
 449
 450  All symbols used internally in libcurl must use a `Curl_` prefix if they're
 451  used in more than a single file. Single-file symbols must be made static.
 452  Public ("exported") symbols must use a `curl_` prefix. (There are exceptions,
 453  but they are to be changed to follow this pattern in future versions.) Public
 454  API functions are marked with `CURL_EXTERN` in the public header files so
 455  that all others can be hidden on platforms where this is possible.
 456
 457 <a name="returncodes"></a>
 458 Return Codes and Informationals
 459 ===============================
 460
 461  I've made things simple. Almost every function in libcurl returns a CURLcode,
 462  that must be `CURLE_OK` if everything is OK or otherwise a suitable error
 463  code as the curl/curl.h include file defines. The very spot that detects an
 464  error must use the `Curl_failf()` function to set the human-readable error
 465  description.
 466
 467  In aiding the user to understand what's happening and to debug curl usage, we
 468  must supply a fair amount of informational messages by using the
 469  `Curl_infof()` function. Those messages are only displayed when the user
 470  explicitly asks for them. They are best used when revealing information that
 471  isn't otherwise obvious.
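
 The convention described in this section (return a code, set the error text
 at the spot that detects the problem, emit informationals only on request)
 can be sketched as follows. All names prefixed with `sk_` or `SK_` are
 hypothetical and exist only in this example; the real error codes are the
 `CURLE_*` values from curl/curl.h.

```c
#include <stdarg.h>
#include <stdio.h>

typedef enum {
  SK_OK = 0,
  SK_COULDNT_RESOLVE,
  SK_COULDNT_CONNECT
} sk_code;

struct sk_handle {
  char errbuf[256];  /* human-readable description of the last error */
  int verbose;       /* non-zero if the user asked for informationals */
};

/* Store a printf-style error description in the handle. */
static void sk_failf(struct sk_handle *h, const char *fmt, ...)
{
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(h->errbuf, sizeof(h->errbuf), fmt, ap);
  va_end(ap);
}

/* Print an informational message, but only in verbose mode. */
static void sk_infof(struct sk_handle *h, const char *fmt, ...)
{
  va_list ap;
  if(!h->verbose)
    return;
  va_start(ap, fmt);
  vfprintf(stderr, fmt, ap);
  va_end(ap);
}

/* The spot that detects the error also describes it. */
static sk_code sk_resolve(struct sk_handle *h, const char *host)
{
  if(!host || !*host) {
    sk_failf(h, "Couldn't resolve host '%s'", host ? host : "");
    return SK_COULDNT_RESOLVE;
  }
  sk_infof(h, "Resolved %s\n", host);
  return SK_OK;
}

/* Callers just propagate the code; the message is already set. */
static sk_code sk_connect(struct sk_handle *h, const char *host)
{
  sk_code rc = sk_resolve(h, host);
  if(rc != SK_OK)
    return rc;
  sk_infof(h, "Connected to %s\n", host);
  return SK_OK;
}
```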
 472
 473 <a name="abi"></a>
 474 API/ABI
 475 =======
 476
 477  We make an effort to not export or show internals or how internals work, as
 478  that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
 479  for our promise to users.
 480
 481 <a name="client"></a>
 482 Client
 483 ======
 484
 485  main() resides in `src/tool_main.c`.
 486
 487  `src/tool_hugehelp.c` is automatically generated by the mkhelp.pl perl script
 488  to display the complete "manual" and the src/tool_urlglob.c file holds the
 489  functions used for the URL-"globbing" support. Globbing in the sense that the
 490  {} and [] expansion stuff is there.
 491
 492  The client mostly messes around to setup its 'config' struct properly, then
 493  it calls the `curl_easy_*()` functions of the library and when it gets back
 494  control after the `curl_easy_perform()` it cleans up the library, checks
 495  status and exits.
 496
 497  When the operation is done, the ourWriteOut() function in src/writeout.c may
 498  be called to report about the operation. That function is using the
 499  `curl_easy_getinfo()` function to extract useful information from the curl
 500  session.
 501
 502  It may loop and do all this several times if many URLs were specified on the
 503  command line or config file.
 504
 505 <a name="memorydebug"></a>
 506 Memory Debugging
 507 ================
 508
 509  The file lib/memdebug.c contains debug-versions of a few functions. Functions
 510  such as malloc, free, fopen, fclose, etc that somehow deal with resources
 511  that might give us problems if we "leak" them. The functions in the memdebug
 512  system do nothing fancy, they do their normal function and then log
 513  information about what they just did. The logged data can then be analyzed
  514  after a complete session.
 515
 516  memanalyze.pl is the perl script present in tests/ that analyzes a log file
 517  generated by the memory tracking system. It detects if resources are
 518  allocated but never freed and other kinds of errors related to resource
 519  management.
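
 The "do the normal thing, then log it" approach can be sketched with two
 wrappers and a pair of macros that record the call site. This is a
 simplified model: the `dbg_` and `sk_` names are hypothetical, and the log
 line format is only illustrative and should not be taken as the exact format
 that memanalyze.pl parses.

```c
#include <stdio.h>
#include <stdlib.h>

static FILE *memlog;  /* log destination; NULL means logging is off */

/* malloc() that also logs size, result and call site. */
static void *dbg_malloc(size_t size, int line, const char *source)
{
  void *p = malloc(size);
  if(memlog)
    fprintf(memlog, "MEM %s:%d malloc(%lu) = %p\n",
            source, line, (unsigned long)size, p);
  return p;
}

/* free() that also logs the pointer and call site. */
static void dbg_free(void *p, int line, const char *source)
{
  if(memlog)
    fprintf(memlog, "MEM %s:%d free(%p)\n", source, line, p);
  free(p);
}

/* Callers use these macros so file and line are filled in for them. */
#define sk_malloc(size) dbg_malloc((size), __LINE__, __FILE__)
#define sk_free(ptr)    dbg_free((ptr), __LINE__, __FILE__)
```

 A post-run analysis then only has to pair each logged allocation with a
 matching free; any allocation left unpaired is a candidate leak.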
 520
 521  Internally, definition of preprocessor symbol DEBUGBUILD restricts code which
 522  is only compiled for debug enabled builds. And symbol CURLDEBUG is used to
 523  differentiate code which is _only_ used for memory tracking/debugging.
 524
 525  Use -DCURLDEBUG when compiling to enable memory debugging, this is also
 526  switched on by running configure with --enable-curldebug. Use -DDEBUGBUILD
 527  when compiling to enable a debug build or run configure with --enable-debug.
 528
 529  curl --version will list 'Debug' feature for debug enabled builds, and
 530  will list 'TrackMemory' feature for curl debug memory tracking capable
 531  builds. These features are independent and can be controlled when running
 532  the configure script. When --enable-debug is given both features will be
 533  enabled, unless some restriction prevents memory tracking from being used.
 534
 535 <a name="test"></a>
 536 Test Suite
 537 ==========
 538
 539  The test suite is placed in its own subdirectory directly off the root in the
 540  curl archive tree, and it contains a bunch of scripts and a lot of test case
 541  data.
 542
 543  The main test script is runtests.pl that will invoke test servers like
 544  httpserver.pl and ftpserver.pl before all the test cases are performed. The
 545  test suite currently only runs on unix-like platforms.
 546
 547  You'll find a description of the test suite in the tests/README file, and the
 548  test case data files in the tests/FILEFORMAT file.
 549
 550  The test suite automatically detects if curl was built with the memory
 551  debugging enabled, and if it was it will detect memory leaks, too.
 552
 553 <a name="asyncdns"></a>
 554 Asynchronous name resolves
 555 ==========================
 556
 557  libcurl can be built to do name resolves asynchronously, using either the
 558  normal resolver in a threaded manner or by using c-ares.
 559
 560 <a name="cares"></a>
 561 [c-ares][3]
 562 ------
 563
  564 ### Build libcurl to use c-ares
 565
 566 1. ./configure --enable-ares=/path/to/ares/install
 567 2. make
 568
 569 ### c-ares on win32
 570
 571  First I compiled c-ares. I changed the default C runtime library to be the
 572  single-threaded rather than the multi-threaded (this seems to be required to
 573  prevent linking errors later on). Then I simply build the areslib project
 574  (the other projects adig/ahost seem to fail under MSVC).
 575
 576  Next was libcurl. I opened lib/config-win32.h and I added a:
 577  `#define USE_ARES 1`
 578
 579  Next thing I did was I added the path for the ares includes to the include
 580  path, and the libares.lib to the libraries.
 581
 582  Lastly, I also changed libcurl to be single-threaded rather than
 583  multi-threaded, again this was to prevent some duplicate symbol errors. I'm
 584  not sure why I needed to change everything to single-threaded, but when I
 585  didn't I got redefinition errors for several CRT functions (malloc, stricmp,
 586  etc.)
 587
 588 <a name="curl_off_t"></a>
 589 `curl_off_t`
 590 ==========
 591
 592  curl_off_t is a data type provided by the external libcurl include
 593  headers. It is the type meant to be used for the [`curl_easy_setopt()`][1]
 594  options that end with LARGE. The type is 64bit large on most modern
 595  platforms.
 596
 597 curlx
 598 =====
 599
 600  The libcurl source code offers a few functions by source only. They are not
 601  part of the official libcurl API, but the source files might be useful for
 602  others so apps can optionally compile/build with these sources to gain
 603  additional functions.
 604
 605  We provide them through a single header file for easy access for apps:
 606  "curlx.h"
 607
 608 `curlx_strtoofft()`
 609 -------------------
 610    A macro that converts a string containing a number to a curl_off_t number.
 611    This might use the curlx_strtoll() function which is provided as source
 612    code in strtoofft.c. Note that the function is only provided if no
  613    strtoll() (or equivalent) function exists on your platform. If curl_off_t
 614    is only a 32 bit number on your platform, this macro uses strtol().
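
   As a sketch of what such a conversion involves on a platform with a
   64 bit offset type and strtoll(), consider the helper below. The names
   `sk_off_t` and `sk_parse_offset()` are hypothetical stand-ins, and the
   strict validation (rejecting trailing characters and out-of-range values)
   is an addition made for this example rather than something the macro
   itself is documented to do.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef int64_t sk_off_t;  /* stand-in for a 64 bit curl_off_t */

/* Parse a base-10 number into *out.
   Returns 0 on success, -1 if the string is empty, has trailing
   characters, or the value does not fit. */
static int sk_parse_offset(const char *str, sk_off_t *out)
{
  char *end;
  long long value;

  errno = 0;
  value = strtoll(str, &end, 10);
  if(end == str || *end != '\0' || errno == ERANGE)
    return -1;

  *out = (sk_off_t)value;
  return 0;
}
```

   The point of going through a 64 bit type is that sizes above 2 GB, such
   as 5000000000, survive the conversion intact.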
 615
 616 `curlx_tvnow()`
 617 ---------------
 618    returns a struct timeval for the current time.
 619
 620 `curlx_tvdiff()`
 621 --------------
 622    returns the difference between two timeval structs, in number of
 623    milliseconds.
 624
 625 `curlx_tvdiff_secs()`
 626 ---------------------
 627    returns the same as curlx_tvdiff but with full usec resolution (as a
 628    double)
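
   The arithmetic behind the two difference functions can be sketched as
   follows. The `sk_` functions are hypothetical re-implementations written
   for illustration and assume the first argument is the newer time; they
   are not copied from the curlx sources.

```c
#include <sys/time.h>

/* Difference in whole milliseconds between two timevals. */
static long sk_tvdiff_ms(struct timeval newer, struct timeval older)
{
  return (long)(newer.tv_sec - older.tv_sec) * 1000 +
         (long)(newer.tv_usec - older.tv_usec) / 1000;
}

/* Same difference in seconds, keeping microsecond resolution. */
static double sk_tvdiff_secs(struct timeval newer, struct timeval older)
{
  return (double)(newer.tv_sec - older.tv_sec) +
         (double)(newer.tv_usec - older.tv_usec) / 1000000.0;
}
```

   Note that the microsecond part of the difference may be negative (for
   example 12.1s minus 10.9s), which is why both fields are subtracted
   separately before being combined.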
 629
 630 Future
 631 ------
 632
 633  Several functions will be removed from the public curl_ name space in a
 634  future libcurl release. They will then only become available as curlx_
 635  functions instead. To make the transition easier, we already today provide
 636  these functions with the curlx_ prefix to allow sources to get built properly
 637  with the new function names. The functions this concerns are:
 638
 639  - `curlx_getenv`
 640  - `curlx_strequal`
 641  - `curlx_strnequal`
 642  - `curlx_mvsnprintf`
 643  - `curlx_msnprintf`
 644  - `curlx_maprintf`
 645  - `curlx_mvaprintf`
 646  - `curlx_msprintf`
 647  - `curlx_mprintf`
 648  - `curlx_mfprintf`
 649  - `curlx_mvsprintf`
 650  - `curlx_mvprintf`
 651  - `curlx_mvfprintf`
 652
 653 <a name="contentencoding"></a>
 654 Content Encoding
 655 ================
 656
 657 ## About content encodings
 658
 659  [HTTP/1.1][4] specifies that a client may request that a server encode its
 660  response. This is usually used to compress a response using one of a set of
 661  commonly available compression techniques. These schemes are 'deflate' (the
  662  zlib algorithm), 'gzip' and 'compress'. A client requests that the server
 663  perform an encoding by including an Accept-Encoding header in the request
 664  document. The value of the header should be one of the recognized tokens
 665  'deflate', ... (there's a way to register new schemes/tokens, see sec 3.5 of
 666  the spec). A server MAY honor the client's encoding request. When a response
 667  is encoded, the server includes a Content-Encoding header in the
 668  response. The value of the Content-Encoding header indicates which scheme was
 669  used to encode the data.
 670
 671  A client may tell a server that it can understand several different encoding
 672  schemes. In this case the server may choose any one of those and use it to
 673  encode the response (indicating which one using the Content-Encoding header).
 674  It's also possible for a client to attach priorities to different schemes so
 675  that the server knows which it prefers. See sec 14.3 of RFC 2616 for more
 676  information on the Accept-Encoding header.
 677
 678 ## Supported content encodings
 679
 680  The 'deflate' and 'gzip' content encoding are supported by libcurl. Both
 681  regular and chunked transfers work fine.  The zlib library is required for
 682  this feature.
 683
 684 ## The libcurl interface
 685
 686  To cause libcurl to request a content encoding use:
 687
 688   [`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string)
 689
 690  where string is the intended value of the Accept-Encoding header.
 691
 692  Currently, libcurl only understands how to process responses that use the
 693  "deflate" or "gzip" Content-Encoding, so the only values for
 694  [`CURLOPT_ACCEPT_ENCODING`][5] that will work (besides "identity," which does
  695  nothing) are "deflate" and "gzip". If a response is encoded using the
  696  "compress" or other methods, libcurl will return an error indicating that the
 697  response could not be decoded.  If <string> is NULL no Accept-Encoding header
 698  is generated.  If <string> is a zero-length string, then an Accept-Encoding
 699  header containing all supported encodings will be generated.
 700
 701  The [`CURLOPT_ACCEPT_ENCODING`][5] must be set to any non-NULL value for
 702  content to be automatically decoded.  If it is not set and the server still
 703  sends encoded content (despite not having been asked), the data is returned
 704  in its raw form and the Content-Encoding type is not checked.
 705
 706 ## The curl interface
 707
 708  Use the [--compressed][6] option with curl to cause it to ask servers to
 709  compress responses using any format supported by curl.
 710
 711 <a name="hostip"></a>
 712 hostip.c explained
 713 ==================
 714
 715  The main compile-time defines to keep in mind when reading the host*.c source
  716  files are these:
 717
 718 ## `CURLRES_IPV6`
 719
 720  this host has getaddrinfo() and family, and thus we use that. The host may
 721  not be able to resolve IPv6, but we don't really have to take that into
 722  account. Hosts that aren't IPv6-enabled have CURLRES_IPV4 defined.
 723
 724 ## `CURLRES_ARES`
 725
 726  is defined if libcurl is built to use c-ares for asynchronous name
 727  resolves. This can be Windows or *nix.
 728
 729 ## `CURLRES_THREADED`
 730
 731  is defined if libcurl is built to use threading for asynchronous name
 732  resolves. The name resolve will be done in a new thread, and the supported
 733  asynch API will be the same as for ares-builds. This is the default under
 734  (native) Windows.
 735
 736  If any of the two previous are defined, `CURLRES_ASYNCH` is defined too. If
 737  libcurl is not built to use an asynchronous resolver, `CURLRES_SYNCH` is
 738  defined.
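
 How the `CURLRES_*` symbols relate to each other can be sketched with a few
 preprocessor conditionals. The `SK_*` input macros are hypothetical
 placeholders for whatever the build configuration provides; the actual
 derivation lives in the real headers and may differ in detail.

```c
/* Hypothetical build-configuration inputs; define to experiment. */
/* #define SK_USE_ARES 1 */
/* #define SK_USE_THREADS 1 */
/* #define SK_ENABLE_IPV6 1 */

/* Exactly one resolver flavour is selected. */
#if defined(SK_USE_ARES)
#  define CURLRES_ARES
#  define CURLRES_ASYNCH
#elif defined(SK_USE_THREADS)
#  define CURLRES_THREADED
#  define CURLRES_ASYNCH
#else
#  define CURLRES_SYNCH
#endif

/* Address family support is chosen independently of the flavour. */
#ifdef SK_ENABLE_IPV6
#  define CURLRES_IPV6
#else
#  define CURLRES_IPV4
#endif

/* Report which resolver flavour this translation unit was built with. */
static const char *resolver_kind(void)
{
#if defined(CURLRES_ARES)
  return "asynchronous (c-ares)";
#elif defined(CURLRES_THREADED)
  return "asynchronous (threaded)";
#else
  return "synchronous";
#endif
}

/* Report which address family define is active. */
static const char *resolver_family(void)
{
#ifdef CURLRES_IPV6
  return "IPv6";
#else
  return "IPv4";
#endif
}
```

 With none of the input macros defined, the sketch selects the synchronous
 IPv4 resolver.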
 739
 740 ## host*.c sources
 741
  742  The host*.c source files are split up like this:
 743
 744  - hostip.c      - method-independent resolver functions and utility functions
 745  - hostasyn.c    - functions for asynchronous name resolves
 746  - hostsyn.c     - functions for synchronous name resolves
 747  - asyn-ares.c   - functions for asynchronous name resolves using c-ares
 748  - asyn-thread.c - functions for asynchronous name resolves using threads
 749  - hostip4.c     - IPv4 specific functions
 750  - hostip6.c     - IPv6 specific functions
 751
 752  The hostip.h is the single united header file for all this. It defines the
 753  `CURLRES_*` defines based on the config*.h and curl_setup.h defines.
 754
<a name="memoryleak"></a>
Track Down Memory Leaks
=======================

## Single-threaded

  Please note that this memory leak system is not adjusted to work in more
  than one thread. If you want/need to use it in a multi-threaded app, please
  adjust accordingly.


## Build

  Rebuild libcurl with -DCURLDEBUG (usually, rerunning configure with
  --enable-debug fixes this). Run 'make clean' first, then 'make' so that all
  files actually are rebuilt properly. It will also make sense to build
  libcurl with the debug option (usually -g to the compiler) so that debugging
  it will be easier if you actually do find a leak in the library.

  This will create a library that has memory debugging enabled.

## Modify Your Application

  Add a line in your application code:

    curl_memdebug("dump");

  This will make the malloc debug system output a full trace of all resource
  using functions to the given file name. Make sure you rebuild your program
  and that you link with the same libcurl you built for this purpose as
  described above.

## Run Your Application

  Run your program as usual. Watch the specified memory trace file grow.

  Make your program exit and use the proper libcurl cleanup functions etc., so
  that all non-leaks are returned/freed properly.

## Analyze the Flow

  Use the tests/memanalyze.pl perl script to analyze the dump file:

    tests/memanalyze.pl dump

  This now outputs a report on which resources were allocated but never
  freed etc. This report is very fine for posting to the list!

  If this doesn't produce any output, no leak was detected in libcurl. Then
  the leak is most likely to be in your code.
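
  To illustrate the general idea behind such a trace (log every allocation
  and every free, then look for allocations without a matching free), here is
  a minimal sketch. It is not libcurl's memory debug code: the `demo_*` names,
  the fixed-size table and the log line format are all assumptions made up
  for this example.

```c
#include <stdio.h>
#include <stdlib.h>

#define DEMO_MAX_TRACKED 64

/* Pointers that were allocated but not yet freed. A NULL slot is unused.
   Allocations beyond DEMO_MAX_TRACKED are simply not tracked. */
static void *demo_live[DEMO_MAX_TRACKED];

/* malloc() wrapper: remembers the pointer and logs one trace line. */
static void *demo_malloc(size_t size, FILE *log)
{
  void *p = malloc(size);
  int i;
  if(!p)
    return NULL;
  for(i = 0; i < DEMO_MAX_TRACKED; i++) {
    if(!demo_live[i]) {
      demo_live[i] = p;
      break;
    }
  }
  if(log)
    fprintf(log, "alloc %lu bytes at %p\n", (unsigned long)size, p);
  return p;
}

/* free() wrapper: forgets the pointer and logs one trace line. */
static void demo_free(void *p, FILE *log)
{
  int i;
  if(!p)
    return;
  for(i = 0; i < DEMO_MAX_TRACKED; i++) {
    if(demo_live[i] == p) {
      demo_live[i] = NULL;
      break;
    }
  }
  if(log)
    fprintf(log, "free %p\n", p);
  free(p);
}

/* Number of tracked allocations that were never freed. */
static int demo_outstanding(void)
{
  int i;
  int count = 0;
  for(i = 0; i < DEMO_MAX_TRACKED; i++) {
    if(demo_live[i])
      count++;
  }
  return count;
}
```

  A program using these wrappers could call `demo_outstanding()` right before
  it exits; a non-zero result corresponds to the "allocated but never freed"
  entries that an analysis script would report.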

<a name="multi_socket"></a>
`multi_socket`
==============

 Implementation of the `curl_multi_socket` API

  The main ideas of this API are simply:

   1 - The application can use whatever event system it likes as it gets info
       from libcurl about what file descriptors libcurl waits for what action
       on. (The previous API returns `fd_sets` which is very select()-centric.)

   2 - When the application discovers action on a single socket, it calls
       libcurl and informs it that there was action on this particular socket
       and libcurl can then act on that socket/transfer only and not care
       about any other transfers. (The previous API always had to scan through
       all the existing transfers.)

  The idea is that [`curl_multi_socket_action()`][7] calls a given callback
  with information about what socket to wait for what action on, and the
  callback only gets called if the status of that socket has changed.

  We also added a timer callback that makes libcurl call the application when
  the timeout value changes, and you set that with [`curl_multi_setopt()`][9]
  and the [`CURLMOPT_TIMERFUNCTION`][10] option. To get this to work,
  internally, there's an added struct in each easy handle in which we store
  an "expire time" (if any). The structs are then "splay sorted" so that we
  can add and remove times from the linked list and yet somewhat swiftly
  figure out both how much time there is until the next nearest timer expires
  and which timer (handle) we should take care of now. Of course, the upside
  of all this is that we get a [`curl_multi_timeout()`][8] that should also
  work with old-style applications that use [`curl_multi_perform()`][11].
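
  The "how long until the nearest timer expires" question can be sketched as
  follows. Note that this example deliberately uses a plain linear scan over
  an array instead of a splay tree, so it only shows what is computed, not how
  libcurl computes it. The `demo_*` names, the millisecond units and the use
  of -1 for "no timer set" are assumptions of this sketch.

```c
#include <stddef.h>

struct demo_handle {
  long expire_ms;   /* absolute expire time in ms; 0 means no timer set */
};

/* Returns the number of milliseconds from now_ms until the nearest expire
   time among the n handles, 0 if a timer has already expired, or -1 if no
   handle has a timer set. */
static long demo_next_timeout(const struct demo_handle *h, size_t n,
                              long now_ms)
{
  long best = -1;
  size_t i;
  for(i = 0; i < n; i++) {
    long left;
    if(!h[i].expire_ms)
      continue;               /* this handle has no timer */
    left = h[i].expire_ms - now_ms;
    if(left < 0)
      left = 0;               /* already expired: act right away */
    if(best < 0 || left < best)
      best = left;
  }
  return best;
}
```

  A sorted structure such as a splay tree gives the same answer without
  visiting every handle, which matters when there are many transfers.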

  We created an internal "socket to easy handles" hash table that, given
  a socket (file descriptor), returns the easy handle that waits for action on
  that socket.  This hash is made using the already existing hash code
  (previously only used for the DNS cache).
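
  A minimal sketch of such a socket-to-handle lookup table is shown below. It
  is a simple chained hash table written for this example and not libcurl's
  hash code; the `demo_*` names, the bucket count and the `void *` handle type
  are assumptions.

```c
#include <stdlib.h>

#define DEMO_BUCKETS 31

struct demo_sock_entry {
  int sockfd;                     /* key: the socket descriptor */
  void *easy;                     /* value: the handle waiting on it */
  struct demo_sock_entry *next;   /* next entry in the same bucket */
};

struct demo_sockhash {
  struct demo_sock_entry *bucket[DEMO_BUCKETS];
};

static unsigned int demo_bucket_of(int sockfd)
{
  return (unsigned int)sockfd % DEMO_BUCKETS;
}

/* Maps sockfd to easy. Returns 0 on success, -1 if out of memory. */
static int demo_sockhash_add(struct demo_sockhash *h, int sockfd, void *easy)
{
  unsigned int idx = demo_bucket_of(sockfd);
  struct demo_sock_entry *e = malloc(sizeof(*e));
  if(!e)
    return -1;
  e->sockfd = sockfd;
  e->easy = easy;
  e->next = h->bucket[idx];
  h->bucket[idx] = e;
  return 0;
}

/* Returns the handle registered for sockfd, or NULL if there is none. */
static void *demo_sockhash_find(const struct demo_sockhash *h, int sockfd)
{
  const struct demo_sock_entry *e = h->bucket[demo_bucket_of(sockfd)];
  for(; e; e = e->next) {
    if(e->sockfd == sockfd)
      return e->easy;
  }
  return NULL;
}

/* Removes the entry for sockfd, if any. */
static void demo_sockhash_remove(struct demo_sockhash *h, int sockfd)
{
  struct demo_sock_entry **pp = &h->bucket[demo_bucket_of(sockfd)];
  while(*pp) {
    if((*pp)->sockfd == sockfd) {
      struct demo_sock_entry *gone = *pp;
      *pp = gone->next;
      free(gone);
      return;
    }
    pp = &(*pp)->next;
  }
}
```

  When the application reports activity on a socket, a lookup like
  `demo_sockhash_find()` is what lets the library go straight to the affected
  transfer instead of scanning all of them.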

  To make libcurl able to report plain sockets in the socket callback, we had
  to re-organize the internals of the [`curl_multi_fdset()`][12] etc so that
  the conversion from sockets to `fd_sets` for that function is only done in
  the last step before the data is returned. I also had to extend c-ares to
  get a function that can return plain sockets, as that library too returned
  only `fd_sets` and that is no longer good enough. The changes done to c-ares
  are available in c-ares 1.3.1 and later.

<a name="structs"></a>
Structs in libcurl
==================

This section should cover 7.32.0 pretty accurately, but will make sense even
for older and later versions as things don't change drastically that often.

## SessionHandle

  The SessionHandle struct is the one returned to the outside in the
  external API as a "CURL *". This is usually known as an easy handle in API
  documentation and examples.

  Information and state that is related to the actual connection is in the
  'connectdata' struct. When a transfer is about to be made, libcurl will
  either create a new connection or re-use an existing one. The particular
  connectdata that is used by this handle is pointed out by
  SessionHandle->easy_conn.

  Data and information regarding this particular single transfer are put in
  the SingleRequest sub-struct.

  When the SessionHandle struct is added to a multi handle, as it must be in
  order to do any transfer, the ->multi member will point to the `Curl_multi`
  struct it belongs to. The ->prev and ->next members will then be used by the
  multi code to keep a linked list of SessionHandle structs that are added to
  that same multi handle. libcurl always uses multi so ->multi *will* point to
  a `Curl_multi` when a transfer is in progress.

  ->mstate is the multi state of this particular SessionHandle. When
  `multi_runsingle()` is called, it will act on this handle according to which
  state it is in. The mstate is also what tells which sockets to return for a
  specific SessionHandle when [`curl_multi_fdset()`][12] is called etc.

  The libcurl source code generally uses the name 'data' for the variable that
  points to the SessionHandle.

  When doing multiplexed HTTP/2 transfers, each SessionHandle is associated
  with an individual stream, sharing the same connectdata struct. Multiplexing
  makes it even more important to keep things associated with the right thing!

## connectdata

  A general idea in libcurl is to keep connections around in a connection
  "cache" after they have been used in case they will be used again, and then
  re-use an existing one instead of creating a new one, as that gives a
  significant performance boost.

  Each 'connectdata' identifies a single physical connection to a server. If
  the connection can't be kept alive, the connection will be closed after use
  and then this struct can be removed from the cache and freed.

  Thus, the same SessionHandle can be used multiple times and each time select
  another connectdata struct to use for the connection. Keep this in mind, as
  it is then important to consider if options or choices are based on the
  connection or the SessionHandle.
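
  The re-use idea can be sketched like this. The example is a strongly
  simplified stand-in and not libcurl's connection cache: the `demo_*` names,
  matching on host name and port only, the tiny fixed-size cache and the
  choice to free a released connection when the cache is full are all
  assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_CACHE_SIZE 4

struct demo_conn {
  char host[64];
  int port;
  int in_use;   /* non-zero while a transfer uses this connection */
};

struct demo_conncache {
  struct demo_conn *conn[DEMO_CACHE_SIZE];   /* NULL means empty slot */
};

/* Returns an idle cached connection to host:port if there is one, otherwise
   a newly created one. *reused tells which case it was. Returns NULL if the
   host name is too long or memory runs out. */
static struct demo_conn *demo_get_conn(struct demo_conncache *c,
                                       const char *host, int port,
                                       int *reused)
{
  struct demo_conn *n;
  int i;
  for(i = 0; i < DEMO_CACHE_SIZE; i++) {
    struct demo_conn *k = c->conn[i];
    if(k && !k->in_use && k->port == port && !strcmp(k->host, host)) {
      k->in_use = 1;
      *reused = 1;
      return k;
    }
  }
  if(strlen(host) >= sizeof(n->host))
    return NULL;
  n = calloc(1, sizeof(*n));
  if(!n)
    return NULL;
  strcpy(n->host, host);
  n->port = port;
  n->in_use = 1;
  *reused = 0;
  return n;
}

/* Called when a transfer is done with the connection: keep it for later
   re-use if there is room in the cache, otherwise free it. */
static void demo_release_conn(struct demo_conncache *c, struct demo_conn *k)
{
  int i;
  k->in_use = 0;
  for(i = 0; i < DEMO_CACHE_SIZE; i++) {
    if(c->conn[i] == k)
      return;                 /* already in the cache */
  }
  for(i = 0; i < DEMO_CACHE_SIZE; i++) {
    if(!c->conn[i]) {
      c->conn[i] = k;
      return;
    }
  }
  free(k);                    /* cache full: "close" the connection */
}
```

  Because the connection outlives the transfer that created it, anything
  stored in it must make sense for whichever handle picks it up next.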

  Functions in libcurl will assume that connectdata->data points to the
  SessionHandle that uses this connection (for the moment).

  As a special complexity, some protocols supported by libcurl require a
  special disconnect procedure that is more than just shutting down the
  socket. It can involve sending one or more commands to the server before
  doing so. Since connections are kept in the connection cache after use, the
  original SessionHandle may no longer be around when the time comes to shut
  down a particular connection. For this purpose, libcurl holds a special
  dummy `closure_handle` SessionHandle in the `Curl_multi` struct to use when
  needed.

  FTP uses two TCP connections for a typical transfer but it keeps both in
  this single struct and thus can be considered a single connection for most
  internal concerns.

  The libcurl source code generally uses the name 'conn' for the variable that
  points to the connectdata.

## Curl_multi

  Internally, the easy interface is implemented as a wrapper around multi
  interface functions. This makes everything multi interface.

  `Curl_multi` is the multi handle struct exposed as "CURLM *" in external APIs.

  This struct holds a list of SessionHandle structs that have been added to
  this handle with [`curl_multi_add_handle()`][13]. The start of the list is
  ->easyp and ->num_easy is a counter of added SessionHandles.
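
  The list handling described above can be sketched with two small structs.
  The member names `multi`, `prev`, `next`, `easyp` and `num_easy` follow the
  text; everything else (the `demo_*` names, the return codes and inserting at
  the head of the list) is an assumption of this example rather than what
  libcurl does.

```c
#include <stddef.h>

struct demo_multi;

struct demo_easy {
  struct demo_multi *multi;   /* set while added to a multi handle */
  struct demo_easy *prev;
  struct demo_easy *next;
};

struct demo_multi {
  struct demo_easy *easyp;    /* start of the list of added handles */
  int num_easy;               /* number of added handles */
};

/* Adds e to m. Returns 0 on success, -1 if e already belongs to a multi. */
static int demo_multi_add(struct demo_multi *m, struct demo_easy *e)
{
  if(e->multi)
    return -1;
  e->multi = m;
  e->prev = NULL;
  e->next = m->easyp;
  if(m->easyp)
    m->easyp->prev = e;
  m->easyp = e;
  m->num_easy++;
  return 0;
}

/* Removes e from m. Returns 0 on success, -1 if e is not part of m. */
static int demo_multi_remove(struct demo_multi *m, struct demo_easy *e)
{
  if(e->multi != m)
    return -1;
  if(e->prev)
    e->prev->next = e->next;
  else
    m->easyp = e->next;
  if(e->next)
    e->next->prev = e->prev;
  e->multi = NULL;
  e->prev = NULL;
  e->next = NULL;
  m->num_easy--;
  return 0;
}
```

  Keeping the back pointer in each handle is what makes it cheap to check
  whether a handle is already part of a multi handle.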

  ->msglist is a linked list of messages to send back when
  [`curl_multi_info_read()`][14] is called. Basically a node is added to that
  list when an individual SessionHandle's transfer has completed.

  ->hostcache points to the name cache. It is a hash table for looking up name
  to IP. The nodes have a limited lifetime in there and this cache is meant
  to reduce the time for when the same name is wanted within a short period of
  time.

  ->timetree points to a tree of SessionHandles, sorted by the remaining time
  until each should be checked - normally some sort of timeout. Each
  SessionHandle has one node in the tree.

  ->sockhash is a hash table to allow fast lookups from a socket descriptor to
  the SessionHandle that uses that descriptor. This is necessary for the
  `multi_socket` API.

  ->conn_cache points to the connection cache. It keeps track of all
  connections that are kept after use. The cache has a maximum size.

  ->closure_handle is described in the 'connectdata' section.

  The libcurl source code generally uses the name 'multi' for the variable that
  points to the `Curl_multi` struct.

## Curl_handler

  Each unique protocol that is supported by libcurl needs to provide at least
  one `Curl_handler` struct. It defines what the protocol is called and what
  functions the main code should call to deal with protocol specific issues.
  In general, there's a source file named [protocol].c in which there's a
  "struct `Curl_handler` `Curl_handler_[protocol]`" declared. In url.c there's
  then the main array with all individual `Curl_handler` structs pointed to
  from a single array which is scanned through when a URL is given to libcurl
  to work with.

  ->scheme is the URL scheme name, usually spelled out in uppercase. That's
  "HTTP" or "FTP" etc. SSL versions of the protocol need their own
  `Curl_handler` setup, so HTTPS is separate from HTTP.
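
  The "array of handlers scanned by scheme" idea can be sketched as below.
  The struct here has only two members and all `demo_*` names are made up for
  this example; the real `Curl_handler` struct has many more members, and the
  case-insensitive comparison is an assumption of this sketch.

```c
#include <stddef.h>
#include <ctype.h>

struct demo_handler {
  const char *scheme;   /* URL scheme name, in uppercase */
  int defport;          /* default port number */
};

/* One handler per protocol; the SSL version gets its own handler. */
static const struct demo_handler demo_handler_http = { "HTTP", 80 };
static const struct demo_handler demo_handler_https = { "HTTPS", 443 };
static const struct demo_handler demo_handler_ftp = { "FTP", 21 };

/* The single array that points to all handlers, NULL terminated. */
static const struct demo_handler * const demo_protocols[] = {
  &demo_handler_http,
  &demo_handler_https,
  &demo_handler_ftp,
  NULL
};

/* Returns non-zero if a and b are equal when ignoring ASCII case. */
static int demo_equal_nocase(const char *a, const char *b)
{
  while(*a && *b) {
    if(tolower((unsigned char)*a) != tolower((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b;   /* both strings must end at the same place */
}

/* Scans the array for the handler of the given scheme, NULL if unknown. */
static const struct demo_handler *demo_find_handler(const char *scheme)
{
  size_t i;
  for(i = 0; demo_protocols[i]; i++) {
    if(demo_equal_nocase(demo_protocols[i]->scheme, scheme))
      return demo_protocols[i];
  }
  return NULL;
}
```

  With this layout, supporting another protocol means adding one more struct
  and one more pointer in the array; the lookup code stays untouched.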

  ->setup_connection is called to allow the protocol code to allocate protocol
  specific data that then gets associated with that SessionHandle for the rest
  of this transfer. It gets freed again at the end of the transfer. It will be
  called before the 'connectdata' for the transfer has been selected/created.
  Most protocols will allocate their private 'struct [PROTOCOL]' here and
  assign SessionHandle->req.protop to point to it.

  ->connect_it allows a protocol to do some specific actions after the TCP
  connect is done, that can still be considered part of the connection phase.

  Some protocols will alter the connectdata->recv[] and connectdata->send[]
  function pointers in this function.

  ->connecting is similarly a function that keeps getting called as long as the
  protocol considers itself still in the connecting phase.

  ->do_it is the function called to issue the transfer request. What we call
  the DO action internally. If the DO is not enough and things need to be kept
  getting done for the entire DO sequence to complete, ->doing is then usually
  also provided. Each protocol that needs to do multiple commands or similar
  for do/doing needs to implement its own state machine (see SCP, SFTP,
  FTP). Some protocols (only FTP and only due to historical reasons) have a
  separate piece of the DO state called `DO_MORE`.

  ->doing keeps getting called while issuing the transfer request command(s).

  ->done gets called when the transfer is complete and DONE. That's after the
  main data has been transferred.

  ->do_more gets called during the `DO_MORE` state. The FTP protocol uses this
  state when setting up the second connection.
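
  The relationship between ->do_it, ->doing and ->done can be sketched with a
  tiny driver. Everything here is hypothetical: the `demo_*` names, the
  function signatures and the made-up protocol that needs a fixed number of
  commands. The sketch also loops until DO is complete, whereas a real
  non-blocking state machine would return to its caller between the calls.

```c
#include <stddef.h>

struct demo_transfer {
  int steps_left;    /* commands still to send before DO is complete */
  int done_called;   /* set once the done function has run */
};

struct demo_ops {
  int (*do_it)(struct demo_transfer *t, int *done);
  int (*doing)(struct demo_transfer *t, int *done);   /* may be NULL */
  int (*done)(struct demo_transfer *t);
};

/* Sends one command and reports whether the DO sequence is now complete.
   The made-up protocol uses the same logic for both do_it and doing. */
static int demo_send_command(struct demo_transfer *t, int *done)
{
  t->steps_left--;
  *done = (t->steps_left == 0);
  return 0;
}

static int demo_done(struct demo_transfer *t)
{
  t->done_called = 1;
  return 0;
}

static const struct demo_ops demo_proto = {
  demo_send_command,   /* do_it */
  demo_send_command,   /* doing */
  demo_done            /* done */
};

/* Drives one transfer through DO, DOING and DONE. Returns the number of
   times ->doing had to be called, or -1 on error. */
static int demo_run(const struct demo_ops *ops, struct demo_transfer *t)
{
  int done = 0;
  int calls = 0;
  if(ops->do_it(t, &done))
    return -1;
  while(!done) {
    if(!ops->doing)
      return -1;   /* DO did not finish and there is no doing function */
    if(ops->doing(t, &done))
      return -1;
    calls++;
  }
  /* the actual data transfer would take place here */
  if(ops->done(t))
    return -1;
  return calls;
}
```

  A protocol whose request fits in a single step never needs ->doing, which
  is why the sketch allows that pointer to be NULL.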

  ->`proto_getsock`
  ->`doing_getsock`
  ->`domore_getsock`
  ->`perform_getsock`
  Functions that return socket information. Which socket(s) to wait for which
  action(s) during the particular multi state.

  ->disconnect is called immediately before the TCP connection is shut down.

  ->readwrite gets called during transfer to allow the protocol to do extra
  reads/writes.

  ->defport is the default TCP or UDP port this protocol uses.

  ->protocol is one or more bits in the `CURLPROTO_*` set. The SSL versions
  have their "base" protocol set and then the SSL variation. Like
  "HTTP|HTTPS".

  ->flags is a bitmask with additional information about the protocol that will
  make it get treated differently by the generic engine:

  - `PROTOPT_SSL` - will make it connect and negotiate SSL

  - `PROTOPT_DUAL` - this protocol uses two connections

  - `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing the
    connection. This flag is no longer used by code, yet still set for a bunch
    of protocol handlers.

  - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to
    limit which "direction" of socket actions that the main engine will
    concern itself with.

  - `PROTOPT_NONETWORK` - a protocol that doesn't use network (read file:)

  - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a default
    one unless one is provided

  - `PROTOPT_NOURLQUERY` - this protocol can't handle a query part on the URL
    (?foo=bar)
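
  How generic code can branch on such a bitmask is sketched below. The
  `DEMO_PROTOPT_*` constants mirror the names in the list above, but their
  bit values and the two helper functions are assumptions made up for this
  example.

```c
/* Illustrative bit values only; see the lead-in above. */
enum {
  DEMO_PROTOPT_SSL        = 1 << 0,
  DEMO_PROTOPT_DUAL       = 1 << 1,
  DEMO_PROTOPT_CLOSEACTION = 1 << 2,
  DEMO_PROTOPT_DIRLOCK    = 1 << 3,
  DEMO_PROTOPT_NONETWORK  = 1 << 4,
  DEMO_PROTOPT_NEEDSPWD   = 1 << 5,
  DEMO_PROTOPT_NOURLQUERY = 1 << 6
};

/* Returns 1 if the protocol wants an SSL negotiation, otherwise 0. */
static int demo_uses_ssl(unsigned int flags)
{
  return (flags & DEMO_PROTOPT_SSL) != 0;
}

/* Returns how many network connections a transfer with these flags uses:
   none for a non-network protocol, two for a "dual" one, otherwise one. */
static int demo_connection_count(unsigned int flags)
{
  if(flags & DEMO_PROTOPT_NONETWORK)
    return 0;
  return (flags & DEMO_PROTOPT_DUAL) ? 2 : 1;
}
```

  A hypothetical handler that sets `DEMO_PROTOPT_SSL | DEMO_PROTOPT_DUAL`
  would thus negotiate SSL and count as using two connections, without the
  generic code knowing which protocol it is dealing with.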

## conncache

  Is a hash table with connections for later re-use. Each SessionHandle has
  a pointer to its connection cache. Each multi handle sets up a connection
  cache that all added SessionHandles share by default.

## Curl_share

  The libcurl share API allocates a `Curl_share` struct, exposed to the
  external API as "CURLSH *".

  The idea is that the struct can have its own set of caches and pools, and
  then by providing this struct in the `CURLOPT_SHARE` option, those
  specific SessionHandles will use the caches/pools that this share handle
  holds.

  Then individual SessionHandle structs can be made to share specific things
  that they otherwise wouldn't, such as cookies.

  The `Curl_share` struct can currently hold cookies, DNS cache and the SSL
  session cache.

## CookieInfo

  This is the main cookie struct. It holds all known cookies and related
  information. Each SessionHandle has its own private CookieInfo even when
  they are added to a multi handle. They can be made to share cookies by using
  the share API.


[1]: http://curl.haxx.se/libcurl/c/curl_easy_setopt.html
[2]: http://curl.haxx.se/libcurl/c/curl_easy_init.html
[3]: http://c-ares.haxx.se/
[4]: https://tools.ietf.org/html/rfc7230 "RFC 7230"
[5]: http://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
[6]: http://curl.haxx.se/docs/manpage.html#--compressed
[7]: http://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
[8]: http://curl.haxx.se/libcurl/c/curl_multi_timeout.html
[9]: http://curl.haxx.se/libcurl/c/curl_multi_setopt.html
[10]: http://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
[11]: http://curl.haxx.se/libcurl/c/curl_multi_perform.html
[12]: http://curl.haxx.se/libcurl/c/curl_multi_fdset.html
[13]: http://curl.haxx.se/libcurl/c/curl_multi_add_handle.html
[14]: http://curl.haxx.se/libcurl/c/curl_multi_info_read.html