doc/text/libarchive-formats.5.txt

   1 LIBARCHIVE-FORMATS(5)       BSD File Formats Manual      LIBARCHIVE-FORMATS(5)
   2
   3 NAME
   4      libarchive-formats — archive formats supported by the libarchive library
   5
   6 DESCRIPTION
   7      The libarchive(3) library reads and writes a variety of streaming archive
   8      formats.  Generally speaking, all of these archive formats consist of a
   9      series of “entries”.  Each entry stores a single file system object, such
  10      as a file, directory, or symbolic link.
  11
  12      The following provides a brief description of each format supported by
  13      libarchive, with some information about recognized extensions or limita‐
  14      tions of the current library support.  Note that just because a format is
  15      supported by libarchive does not imply that a program that uses
  16      libarchive will support that format.  Applications that use libarchive
  17      specify which formats they wish to support, though many programs do use
  18      libarchive convenience functions to enable all supported formats.
  19
  20    Tar Formats
  21      The libarchive(3) library can read most tar archives.  It can write
  22      POSIX-standard “ustar” and “pax interchange” formats as well as v7 tar
  23      format and a subset of the legacy GNU tar format.
  24
  25      All tar formats store each entry in one or more 512-byte records.  The
  26      first record is used for file metadata, including filename, timestamp,
  27      and mode information, and the file data is stored in subsequent records.
  28      Later variants have extended this by either appropriating undefined areas
  29      of the header record, extending the header to multiple records, or by
  30      storing special entries that modify the interpretation of subsequent
  31      entries.
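
The record layout described above can be checked with a short Python sketch. It uses the standard-library tarfile module rather than libarchive, and the helper names records_for_entry and member_offsets are illustrative only. It assumes each entry occupies one header record plus its data rounded up to whole 512-byte records.

```python
import io
import tarfile

BLOCK = 512  # size of one tar record in bytes


def records_for_entry(data_size: int) -> int:
    """One header record plus enough 512-byte records to hold the data."""
    return 1 + (data_size + BLOCK - 1) // BLOCK


def member_offsets(sizes):
    """Write entries of the given sizes to an in-memory ustar archive
    and return the byte offset of each entry's header record."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        for i, size in enumerate(sizes):
            info = tarfile.TarInfo(name=f"file{i}")
            info.size = size
            tf.addfile(info, io.BytesIO(b"x" * size))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tf:
        return [member.offset for member in tf.getmembers()]
```

Each header offset returned by member_offsets is a multiple of 512, and consecutive offsets differ by records_for_entry(size) * 512.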
  32
  33      gnutar  The libarchive(3) library can read most GNU-format tar archives.
  34              It currently supports the most popular GNU extensions, including
  35              modern long filename and linkname support, as well as atime and
  36              ctime data.  The libarchive library does not support multi-volume
  37              archives, nor the old GNU long filename format.  It can read GNU
  38              sparse file entries, including the new POSIX-based formats.
  39
  40              The libarchive(3) library can write GNU tar format, including
  41              long filename and linkname support, as well as atime and ctime
  42              data.
  43
  44      pax     The libarchive(3) library can read and write POSIX-compliant pax
  45              interchange format archives.  Pax interchange format archives are
  46              an extension of the older ustar format that adds a separate entry
  47              with additional attributes stored as key/value pairs immediately
  48              before each regular entry.  The presence of these additional
  49              entries is the only difference between pax interchange format and
  50              the older ustar format.  The extended attributes are of unlimited
  51              length and are stored as UTF-8 Unicode strings.  Keywords defined
  52              in the standard are in all lowercase; vendors are allowed to
  53              define custom keys by preceding them with the vendor name in all
  54              uppercase.  When writing pax archives, libarchive uses many of
  55              the SCHILY keys defined by Joerg Schilling's “star” archiver and
  56              a few LIBARCHIVE keys.  The libarchive library can read most of
  57              the SCHILY keys and most of the GNU keys introduced by GNU tar.
  58              It silently ignores any keywords that it does not understand.
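
The key/value records can be illustrated with a small Python sketch. It assumes the record syntax from the POSIX pax specification, which this page does not spell out: each record is "LENGTH KEY=VALUE" followed by a newline, where LENGTH is the decimal byte count of the whole record including the length digits themselves. The function names and the VENDOR.example key are hypothetical, and this is not libarchive's own code.

```python
def pax_record(key: str, value: str) -> bytes:
    """Encode one pax extended-header record as 'LEN key=value\\n'."""
    payload = f" {key}={value}\n".encode("utf-8")
    length = len(payload)
    # The length field counts its own digits, so iterate to a fixed point.
    while True:
        total = len(payload) + len(str(length))
        if total == length:
            break
        length = total
    return str(length).encode("ascii") + payload


def parse_pax_records(data: bytes) -> dict:
    """Decode a sequence of pax records into a key -> value mapping."""
    records = {}
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space].decode("ascii"))
        body = data[space + 1 : pos + length - 1]  # drop the trailing newline
        key, _, value = body.partition(b"=")
        records[key.decode("utf-8")] = value.decode("utf-8")
        pos += length
    return records


def is_vendor_key(key: str) -> bool:
    """Vendor keys carry an all-uppercase vendor prefix before the first dot."""
    vendor, dot, _ = key.partition(".")
    return bool(dot) and vendor.isupper()
```

A reader that does not understand a key can skip it safely, because the length prefix says where the next record begins.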
  59
  60              The pax interchange format converts filenames to Unicode and
  61              stores them using the UTF-8 encoding.  Prior to libarchive 3.0,
  62              libarchive erroneously assumed that the system wide-character
  63              routines natively supported Unicode.  This caused it to mis-han‐
  64              dle non-ASCII filenames on systems that did not satisfy this
  65              assumption.
  66
  67      restricted pax
  68              The libarchive library can also write pax archives in which it
  69              attempts to suppress the extended attributes entry whenever pos‐
  70              sible.  The result will be identical to a ustar archive unless
  71              the extended attributes entry is required to store a long file
  72              name, long linkname, extended ACL, file flags, or if any of the
  73              standard ustar data (user name, group name, UID, GID, etc.) cannot
  74              be fully represented in the ustar header.  In all cases, the
  75              result can be dearchived by any program that can read POSIX-com‐
  76              pliant pax interchange format archives.  Programs that correctly
  77              read ustar format (see below) will also be able to read this for‐
  78              mat; any extended attributes will be extracted as separate files
  79              stored in PaxHeader directories.
  80
  81      ustar   The libarchive library can both read and write this format.  This
  82              format has the following limitations:
  83              ·   Device major and minor numbers are limited to 21 bits.  Nodes
  84                  with larger numbers will not be added to the archive.
  85              ·   Path names in the archive are limited to 255 bytes.  (Shorter
  86                  if there is no / character in exactly the right place.)
  87              ·   Symbolic links and hard links are stored in the archive with
  88                  the name of the referenced file.  This name is limited to 100
  89                  bytes.
  90              ·   Extended attributes, file flags, and other extended security
  91                  information cannot be stored.
  92              ·   Archive entries are limited to 8 gigabytes in size.
  93              Note that the pax interchange format has none of these restric‐
  94              tions.  The ustar format is old and widely supported.  It is rec‐
  95              ommended when compatibility is the primary concern.
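
The numeric limits above follow from octal field widths, and the path limit from how a long name is split across two header fields. The Python sketch below assumes the field widths from the POSIX ustar header layout, which this page does not list: a 100-byte name field, a 155-byte prefix field, 11 usable octal digits for size, and 7 for device numbers. The function names are illustrative, not libarchive API.

```python
NAME_MAX = 100    # assumed width of the ustar "name" field
PREFIX_MAX = 155  # assumed width of the ustar "prefix" field


def max_octal_value(digits: int) -> int:
    """Largest value that fits in the given number of octal digits."""
    return 8 ** digits - 1


def split_ustar_path(path: str):
    """Return (prefix, name) as bytes, or None if the path cannot be stored.

    A long path fits only if some "/" leaves at most 155 bytes before it
    and at most 100 bytes after it.
    """
    raw = path.encode("utf-8")
    if len(raw) <= NAME_MAX:
        return b"", raw
    first = max(1, len(raw) - NAME_MAX - 1)   # earliest usable split point
    last = min(len(raw) - 2, PREFIX_MAX)      # latest usable split point
    for i in range(first, last + 1):
        if raw[i:i + 1] == b"/":
            return raw[:i], raw[i + 1:]
    return None
```

With these widths, 11 octal digits give 2^33 bytes (8 gigabytes) and 7 octal digits give 21 bits, matching the limits listed above.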
  96
  97      v7      The libarchive library can read and write the legacy v7 tar for‐
  98              mat.  This format has the following limitations:
  99              ·   Only regular files, directories, and symbolic links can be
  100                  archived.  Block and character device nodes, FIFOs, and sock‐
  101                  ets cannot be archived.
  102              ·   Path names in the archive are limited to 100 bytes.
  103              ·   Symbolic links and hard links are stored in the archive with
  104                  the name of the referenced file.  This name is limited to 100
  105                  bytes.
  106              ·   User and group information are stored as numeric IDs; there
  107                  is no provision for storing user or group names.
  108              ·   Extended attributes, file flags, and other extended security
  109                  information cannot be stored.
  110              ·   Archive entries are limited to 8 gigabytes in size.
 111              Generally, users should prefer the ustar format for portability
 112              as the v7 tar format is both less useful and less portable.
 113
  114      The libarchive library also reads a variety of commonly used extensions
 115      to the basic tar format.  These extensions are recognized automatically
 116      whenever they appear.
 117
 118      Numeric extensions.
 119              The POSIX standards require fixed-length numeric fields to be
 120              written with some character position reserved for terminators.
 121              Libarchive allows these fields to be written without terminator
 122              characters.  This extends the allowable range; in particular,
 123              ustar archives with this extension can support entries up to 64
 124              gigabytes in size.  Libarchive also recognizes base-256 values in
 125              most numeric fields.  This essentially removes all limitations on
 126              file size, modification time, and device numbers.
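
A rough Python sketch of how such a field might be decoded follows. It assumes a 12-byte size field and the common base-256 convention in which the high bit of the first byte marks a binary big-endian value; neither detail is stated on this page, negative base-256 values are not handled, and decode_tar_number is a hypothetical name rather than a libarchive function.

```python
def decode_tar_number(field: bytes) -> int:
    """Decode a tar numeric field written in octal or base-256."""
    if field[0] & 0x80:
        # Base-256: clear the marker bit, read the rest as big-endian binary.
        return int.from_bytes(bytes([field[0] & 0x7F]) + field[1:], "big")
    # Octal: ignore padding and any NUL or space terminators.
    text = field.strip(b" \0").decode("ascii")
    return int(text, 8) if text else 0
```

With a terminator, a 12-byte field holds 11 octal digits (up to 8 gigabytes); without one it holds 12 digits (up to 64 gigabytes); in base-256 the same 12 bytes hold values far beyond either.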
 127
  128      Solaris extensions.
 129              Libarchive recognizes ACL and extended attribute records written
 130              by Solaris tar.
 131
 132      The first tar program appeared in Seventh Edition Unix in 1979.  The
 133      first official standard for the tar file format was the “ustar” (Unix
 134      Standard Tar) format defined by POSIX in 1988.  POSIX.1-2001 extended the
 135      ustar format to create the “pax interchange” format.
 136
  137    Cpio Formats
 138      The libarchive library can read a number of common cpio variants and can
 139      write “odc” and “newc” format archives.  A cpio archive stores each entry
 140      as a fixed-size header followed by a variable-length filename and vari‐
 141      able-length data.  Unlike the tar format, the cpio format does only mini‐
 142      mal padding of the header or file data.  There are several cpio variants,
 143      which differ primarily in how they store the initial header: some store
 144      the values as octal or hexadecimal numbers in ASCII, others as binary
 145      values of varying byte order and length.
 146
  147      binary  The libarchive library transparently reads both big-endian and
 148              little-endian variants of the original binary cpio format.  This
 149              format used 32-bit binary values for file size and mtime, and
 150              16-bit binary values for the other fields.
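
One way a reader could tell the two byte orders apart is to inspect the leading 16-bit magic number. The Python sketch below assumes the magic value is octal 070707, as documented in cpio(5) rather than on this page; binary_cpio_byte_order is an illustrative name and this is not libarchive's detection code.

```python
import struct

BINARY_CPIO_MAGIC = 0o070707  # assumed magic number, stored as a 16-bit value


def binary_cpio_byte_order(header: bytes):
    """Return "little", "big", or None based on the first two header bytes."""
    if len(header) < 2:
        return None
    if struct.unpack_from("<H", header)[0] == BINARY_CPIO_MAGIC:
        return "little"
    if struct.unpack_from(">H", header)[0] == BINARY_CPIO_MAGIC:
        return "big"
    return None
```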
 151
  152      odc     The libarchive library can both read and write this POSIX-stan‐
  153              dard format, which is officially known as the “cpio interchange
 154              format” or the “octet-oriented cpio archive format” and sometimes
 155              unofficially referred to as the “old character format”.  This
 156              format stores the header contents as octal values in ASCII.  It
 157              is standard, portable, and immune from byte-order confusion.
  158              File sizes and mtime are limited to 33 bits (8GB file size);
  159              other fields are limited to 18 bits.
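
The 33-bit and 18-bit limits correspond to 11-digit and 6-digit octal fields. The Python sketch below builds and parses such a header, assuming the field order and widths given in cpio(5) (a 76-byte header beginning with the ASCII magic "070707"); the names are illustrative and the code is independent of libarchive.

```python
# (field name, width in octal digits); layout assumed from cpio(5)
ODC_FIELDS = [
    ("magic", 6), ("dev", 6), ("ino", 6), ("mode", 6), ("uid", 6),
    ("gid", 6), ("nlink", 6), ("rdev", 6), ("mtime", 11),
    ("namesize", 6), ("filesize", 11),
]
ODC_HEADER_SIZE = sum(width for _, width in ODC_FIELDS)


def build_odc_header(**values) -> bytes:
    """Format the given values as zero-padded ASCII octal fields."""
    values = {"magic": 0o070707, **values}
    out = b""
    for name, width in ODC_FIELDS:
        value = values.get(name, 0)
        if value >= 8 ** width:
            raise OverflowError(f"{name} does not fit in {width} octal digits")
        out += f"{value:0{width}o}".encode("ascii")
    return out


def parse_odc_header(header: bytes) -> dict:
    """Split an odc header into integer fields."""
    if len(header) < ODC_HEADER_SIZE or header[:6] != b"070707":
        raise ValueError("not an odc cpio header")
    fields, pos = {}, 0
    for name, width in ODC_FIELDS:
        fields[name] = int(header[pos:pos + width].decode("ascii"), 8)
        pos += width
    return fields
```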
 160
  161      SVR4/newc
 162              The libarchive library can read both CRC and non-CRC variants of
 163              this format.  The SVR4 format uses eight-digit hexadecimal values
 164              for all header fields.  This limits file size to 4GB, and also
 165              limits the mtime and other fields to 32 bits.  The SVR4 format
 166              can optionally include a CRC of the file contents, although
 167              libarchive does not currently verify this CRC.
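
For comparison, a header with eight-digit hexadecimal fields can be parsed as in the following Python sketch. The field order and the magic strings "070701" (no CRC) and "070702" (CRC) are taken from cpio(5) and are assumptions as far as this page is concerned; parse_newc_header is a hypothetical helper, not a libarchive function.

```python
# Field order assumed from cpio(5); each field is 8 ASCII hex digits.
NEWC_FIELDS = [
    "ino", "mode", "uid", "gid", "nlink", "mtime", "filesize",
    "devmajor", "devminor", "rdevmajor", "rdevminor", "namesize", "check",
]
NEWC_HEADER_SIZE = 6 + 8 * len(NEWC_FIELDS)


def parse_newc_header(header: bytes) -> dict:
    """Split an SVR4/newc header into integer fields."""
    magic = header[:6]
    if len(header) < NEWC_HEADER_SIZE or magic not in (b"070701", b"070702"):
        raise ValueError("not an SVR4/newc cpio header")
    fields = {"has_crc": magic == b"070702"}
    for i, name in enumerate(NEWC_FIELDS):
        start = 6 + 8 * i
        fields[name] = int(header[start:start + 8].decode("ascii"), 16)
    return fields
```

Because every field is eight hex digits, no value can exceed 0xFFFFFFFF, which is where the 4GB file size limit comes from.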
 168
 169      Cpio first appeared in PWB/UNIX 1.0, which was released within AT&T in
 170      1977.  PWB/UNIX 1.0 formed the basis of System III Unix, released outside
 171      of AT&T in 1981.  This makes cpio older than tar, although cpio was not
 172      included in Version 7 AT&T Unix.  As a result, the tar command became
 173      much better known in universities and research groups that used Version
  174      7.  The combination of the find and cpio utilities provided very precise
 175      control over file selection.  Unfortunately, the format has many limita‐
 176      tions that make it unsuitable for widespread use.  Only the POSIX format
 177      permits files over 4GB, and its 18-bit limit for most other fields makes
 178      it unsuitable for modern systems.  In addition, cpio formats only store
 179      numeric UID/GID values (not usernames and group names), which can make it
 180      very difficult to correctly transfer archives across systems with dissim‐
 181      ilar user numbering.
 182
  183    Shar Formats
 184      A “shell archive” is a shell script that, when executed on a POSIX-com‐
 185      pliant system, will recreate a collection of file system objects.  The
 186      libarchive library can write two different kinds of shar archives:
 187
  188      shar    The traditional shar format uses a limited set of POSIX commands,
 189              including echo(1), mkdir(1), and sed(1).  It is suitable for
 190              portably archiving small collections of plain text files.  How‐
 191              ever, it is not generally well-suited for large archives (many
 192              implementations of sh(1) have limits on the size of a script) nor
 193              should it be used with non-text files.
 194
  195      shardump
 196              This format is similar to shar but encodes files using
 197              uuencode(1) so that the result will be a plain text file regard‐
 198              less of the file contents.  It also includes additional shell
 199              commands that attempt to reproduce as many file attributes as
 200              possible, including owner, mode, and flags.  The additional com‐
 201              mands used to restore file attributes make shardump archives less
 202              portable than plain shar archives.
 203
  204    ISO9660 format
 205      Libarchive can read and extract from files containing ISO9660-compliant
 206      CDROM images.  In many cases, this can remove the need to burn a physical
  207      CDROM just to read the files contained in an ISO9660 image.  It
 208      also avoids security and complexity issues that come with virtual mounts
  209      and loopback devices.  Libarchive supports the most common Rock Ridge
  210      extensions and has partial support for Joliet extensions.  If both exten‐
  211      sions are present, the Joliet extensions will be used and the Rock Ridge
  212      extensions will be ignored.  In particular, this can create problems with
  213      hardlinks and symlinks, which are supported by Rock Ridge but not by
 214      Joliet.
 215
 216      Libarchive reads ISO9660 images using a streaming strategy.  This allows
 217      it to read compressed images directly (decompressing on the fly) and
 218      allows it to read images directly from network sockets, pipes, and other
 219      non-seekable data sources.  This strategy works well for optimized
 220      ISO9660 images created by many popular programs.  Such programs collect
 221      all directory information at the beginning of the ISO9660 image so it can
 222      be read from a physical disk with a minimum of seeking.  However, not all
 223      ISO9660 images can be read in this fashion.
 224
 225      Libarchive can also write ISO9660 images.  Such images are fully opti‐
 226      mized with the directory information preceding all file data.  This is
 227      done by storing all file data to a temporary file while collecting direc‐
 228      tory information in memory.  When the image is finished, libarchive
 229      writes out the directory structure followed by the file data.  The loca‐
 230      tion used for the temporary file can be changed by the usual environment
 231      variables.
 232
  233    Zip format
 234      Libarchive can read and write zip format archives that have uncompressed
 235      entries and entries compressed with the “deflate” algorithm.  Other zip
 236      compression algorithms are not supported.  It can extract jar archives,
 237      archives that use Zip64 extensions and self-extracting zip archives.
 238      Libarchive can use either of two different strategies for reading Zip ar‐
 239      chives: a streaming strategy which is fast and can handle extremely large
 240      archives, and a seeking strategy which can correctly process self-
 241      extracting Zip archives and archives with deleted members or other in-
 242      place modifications.
 243
 244      The streaming reader processes Zip archives as they are read.  It can
 245      read archives of arbitrary size from tape or network sockets, and can
 246      decode Zip archives that have been separately compressed or encoded.
 247      However, self-extracting Zip archives and archives with certain types of
 248      modifications cannot be correctly handled.  Such archives require that
 249      the reader first process the Central Directory, which is ordinarily
 250      located at the end of a Zip archive and is thus inaccessible to the
  251      streaming reader.  If the program using libarchive has enabled seek sup‐
  252      port, then libarchive will use this to process the Central Directory
  253      first.
 254
 255      In particular, the seeking reader must be used to correctly handle self-
 256      extracting archives.  Such archives consist of a program followed by a
 257      regular Zip archive.  The streaming reader cannot parse the initial pro‐
 258      gram portion, but the seeking reader starts by reading the Central Direc‐
 259      tory from the end of the archive.  Similarly, Zip archives that have been
 260      modified in-place can have deleted entries or other garbage data that can
 261      only be accurately detected by first reading the Central Directory.
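
The difference between the two strategies can be seen in a small Python sketch that works on raw bytes. It assumes the End of Central Directory record layout from the Zip specification (signature "PK\x05\x06" followed by fixed-size little-endian fields), ignores Zip64 and archive comments, and uses the standard-library zipfile module only to produce test data; the helper names are illustrative and unrelated to libarchive's API.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # End of Central Directory marker


def make_zip(members: dict) -> bytes:
    """Build a small in-memory Zip archive from a name -> bytes mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def locate_central_directory(data: bytes) -> dict:
    """Find the Central Directory by searching backwards from the end."""
    eocd = data.rfind(EOCD_SIGNATURE)
    if eocd < 0:
        raise ValueError("no End of Central Directory record found")
    (_sig, _disk, _cd_disk, _on_disk, total, cd_size, cd_offset,
     _comment_len) = struct.unpack_from("<4sHHHHIIH", data, eocd)
    actual_offset = eocd - cd_size  # the directory sits just before the record
    return {
        "entries": total,
        "recorded_offset": cd_offset,
        "actual_offset": actual_offset,
        # Bytes prepended to the archive, such as a self-extractor program.
        "prefix_length": actual_offset - cd_offset,
    }
```

For an archive with a program prepended, prefix_length equals the size of that program. A reader that starts at byte 0 sees the program rather than a Zip entry, while a reader that starts from the end can still find every entry.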
 262
  263    Archive (library) file format
 264      The Unix archive format (commonly created by the ar(1) archiver) is a
 265      general-purpose format which is used almost exclusively for object files
 266      to be read by the link editor ld(1).  The ar format has never been stan‐
 267      dardised.  There are two common variants: the GNU format derived from
 268      SVR4, and the BSD format, which first appeared in 4.4BSD.  The two differ
 269      primarily in their handling of filenames longer than 15 characters: the
 270      GNU/SVR4 variant writes a filename table at the beginning of the archive;
 271      the BSD format stores each long filename in an extension area adjacent to
 272      the entry.  Libarchive can read both extensions, including archives that
 273      may include both types of long filenames.  Programs using libarchive can
  274      write GNU/SVR4 format if they provide an entry called // containing a
 275      filename table to be written into the archive before any of the entries.
 276      Any entries whose names are not in the filename table will be written
 277      using BSD-style long filenames.  This can cause problems for programs
 278      such as GNU ld that do not support the BSD-style long filenames.
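
The two long-filename schemes can be sketched in Python as follows. The encodings are taken from ar(5) and are assumptions as far as this page is concerned: BSD writes "#1/N" in the name field and stores the N-byte name at the start of the entry data, while GNU/SVR4 writes "/OFFSET" pointing into the // table, whose names end with "/" and a newline. resolve_ar_name is a hypothetical helper, not a libarchive function.

```python
def resolve_ar_name(name_field: bytes, data: bytes, gnu_table: bytes = b""):
    """Return (filename, file_data) for one ar entry.

    name_field is the 16-byte name from the entry header, data is the
    entry body, and gnu_table is the body of the "//" entry, if any.
    """
    field = name_field.rstrip(b" ")
    if field.startswith(b"#1/"):
        # BSD style: the name occupies the first N bytes of the body.
        n = int(field[3:].decode("ascii"))
        return data[:n].rstrip(b"\0").decode("utf-8"), data[n:]
    if field.startswith(b"/") and field[1:].isdigit():
        # GNU/SVR4 style: the name lives in the filename table.
        offset = int(field[1:].decode("ascii"))
        end = gnu_table.index(b"/\n", offset)
        return gnu_table[offset:end].decode("utf-8"), data
    # Short name stored inline; GNU/SVR4 terminates it with "/".
    return field.rstrip(b"/").decode("utf-8"), data
```

A reader built this way handles an archive that mixes both styles, since the name field of each entry shows which scheme applies.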
 279
  280    mtree
 281      Libarchive can read and write files in mtree(5) format.  This format is
 282      not a true archive format, but rather a textual description of a file
 283      hierarchy in which each line specifies the name of a file and provides
 284      specific metadata about that file.  Libarchive can read all of the key‐
 285      words supported by both the NetBSD and FreeBSD versions of mtree(8),
 286      although many of the keywords cannot currently be stored in an
 287      archive_entry object.  When writing, libarchive supports use of the
 288      archive_write_set_options(3) interface to specify which keywords should
 289      be included in the output.  If libarchive was compiled with access to
 290      suitable cryptographic libraries (such as the OpenSSL libraries), it can
  291      compute hash entries such as sha512 or md5 from file data being written
 292      to the mtree writer.
 293
 294      When reading an mtree file, libarchive will locate the corresponding
  295      files on disk using the contents keyword if present or the regular file‐
  296      name.  If it can locate and open the file on disk, it will use that to
 297      fill in any metadata that is missing from the mtree file and will read
 298      the file contents and return those to the program using libarchive.  If
 299      it cannot locate and open the file on disk, libarchive will return an
 300      error for any attempt to read the entry body.
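
The lookup rule described above can be sketched in Python. The sketch assumes the simplest form of an mtree line, a path followed by whitespace-separated keyword=value words, and ignores escapes, continuation lines, and /set directives; both function names are illustrative and not part of libarchive.

```python
def parse_mtree_line(line: str):
    """Split one mtree line into its path and a keyword -> value mapping."""
    words = line.split()
    path, keywords = words[0], {}
    for word in words[1:]:
        key, _, value = word.partition("=")
        keywords[key] = value
    return path, keywords


def disk_source(path: str, keywords: dict) -> str:
    """Pick the on-disk file to read: the contents keyword, else the name."""
    return keywords.get("contents", path)
```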
 301
  302    7-Zip
 303      Libarchive can read and write 7-Zip format archives.  TODO: Need more
 304      information
 305
  306    CAB
  307      Libarchive can read Microsoft Cabinet (“CAB”) format archives.  TODO:
 308      Need more information.
 309
  310    LHA
 311      TODO: Information about libarchive's LHA support
 312
  313    RAR
 314      Libarchive has limited support for reading RAR format archives.  Cur‐
 315      rently, libarchive can read RARv3 format archives which have been either
 316      created uncompressed, or compressed using any of the compression methods
 317      supported by the RARv3 format.  Libarchive can also read self-extracting
 318      RAR archives.
 319
  320    Warc
 321      Libarchive can read and write “web archives”.  TODO: Need more informa‐
 322      tion
 323
  324    XAR
 325      Libarchive can read and write the XAR format used by many Apple tools.
 326      TODO: Need more information
 327
  328 SEE ALSO
 329      ar(1), cpio(1), mkisofs(1), shar(1), tar(1), zip(1), zlib(3), cpio(5),
 330      mtree(5), tar(5)
 331
 332 BSD                            December 27, 2016                           BSD