man/man1/dos2unix.txt

   1 NAME
   2     dos2unix - DOS/Mac to Unix and vice versa text file format converter
   3
   4 SYNOPSIS
   5         dos2unix [options] [FILE ...] [-n INFILE OUTFILE ...]
   6         unix2dos [options] [FILE ...] [-n INFILE OUTFILE ...]
   7
   8 DESCRIPTION
   9     The Dos2unix package includes utilities "dos2unix" and "unix2dos" to
  10     convert plain text files in DOS or Mac format to Unix format and vice
  11     versa.
  12
  13     In DOS/Windows text files a line break, also known as newline, is a
  14     combination of two characters: a Carriage Return (CR) followed by a Line
  15     Feed (LF). In Unix text files a line break is a single character: the
  16     Line Feed (LF). In Mac text files, prior to Mac OS X, a line break was a
  17     single Carriage Return (CR) character. Nowadays Mac OS uses Unix style
  18     (LF) line breaks.
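
         The three conventions differ only in the bytes that end each line. A
         minimal sketch, assuming a POSIX shell with printf(1), od(1) and tr(1);
         the variable names are illustrative and dos2unix itself is not invoked:

```shell
# Hex-dump the one-character line "a" terminated in each convention.
dos_hex=$(printf 'a\r\n' | od -An -tx1 | tr -d ' \n')   # CR LF
unix_hex=$(printf 'a\n'  | od -An -tx1 | tr -d ' \n')   # LF
mac_hex=$(printf 'a\r'   | od -An -tx1 | tr -d ' \n')   # CR
echo "DOS:  $dos_hex"
echo "Unix: $unix_hex"
echo "Mac:  $mac_hex"
```

         In the output, 0d is the Carriage Return and 0a is the Line Feed.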
  19
  20     Besides line breaks, Dos2unix can also convert the encoding of files. A
  21     few DOS code pages can be converted to Unix Latin-1, and Windows Unicode
  22     (UTF-16) files can be converted to Unix Unicode (UTF-8) files.
  23
  24     Binary files are automatically skipped, unless conversion is forced.
  25
  26     Non-regular files, such as directories and FIFOs, are automatically
  27     skipped.
  28
  29     Symbolic links and their targets are by default kept untouched. Symbolic
  30     links can optionally be replaced, or the output can be written to the
  31     symbolic link target. Writing to a symbolic link target is not supported
  32     on Windows.
  33
  34     Dos2unix was modelled after dos2unix under SunOS/Solaris. There is one
  35     important difference from the original SunOS/Solaris version: this
  36     version converts in place by default (old file mode), while the
  37     original SunOS/Solaris version only supports paired conversion (new file
  38     mode). See also options "-o" and "-n".
  39
  40 OPTIONS
  41     --  Treat all following options as file names. Use this option if you
  42         want to convert files whose names start with a dash. For instance to
  43         convert a file named "-foo", you can use this command:
  44
  45             dos2unix -- -foo
  46
  47         Or in new file mode:
  48
  49             dos2unix -n -- -foo out.txt
  50
  51     -ascii
  52         Convert only line breaks. This is the default conversion mode.
  53
  54     -iso
  55         Conversion between DOS and ISO-8859-1 character set. See also
  56         section CONVERSION MODES.
  57
  58     -1252
  59         Use Windows code page 1252 (Western European).
  60
  61     -437
  62         Use DOS code page 437 (US). This is the default code page used for
  63         ISO conversion.
  64
  65     -850
  66         Use DOS code page 850 (Western European).
  67
  68     -860
  69         Use DOS code page 860 (Portuguese).
  70
  71     -863
  72         Use DOS code page 863 (French Canadian).
  73
  74     -865
  75         Use DOS code page 865 (Nordic).
  76
  77     -7  Convert 8 bit characters to 7 bit space.
  78
  79     -b, --keep-bom
  80         Keep Byte Order Mark (BOM). When the input file has a BOM, write a
  81         BOM in the output file. This is the default behavior when converting
  82         to DOS line breaks. See also option "-r".
  83
  84     -c, --convmode CONVMODE
  85         Set conversion mode, where CONVMODE is one of *ascii*, *7bit*,
  86         *iso*, or *mac*, with ascii being the default.
  87
  88     -f, --force
  89         Force conversion of binary files.
  90
  91     -h, --help
  92         Display help and exit.
  93
  94     -k, --keepdate
  95         Keep the date stamp of the output file the same as the input file's.
  96
  97     -L, --license
  98         Display program's license.
  99
 100     -l, --newline
 101         Add additional newline.
 102
 103         dos2unix: Only DOS line breaks are changed to two Unix line breaks.
 104         In Mac mode only Mac line breaks are changed to two Unix line
 105         breaks.
 106
 107         unix2dos: Only Unix line breaks are changed to two DOS line breaks.
 108         In Mac mode Unix line breaks are changed to two Mac line breaks.
 109
 110     -m, --add-bom
 111         Write a Byte Order Mark (BOM) in the output file. By default a
 112         UTF-8 BOM is written.
 113
 114         When the input file is UTF-16, and the option "-u" is used, a
 115         UTF-16 BOM will be written.
 116
 117         Never use this option when the output encoding is other than UTF-8
 118         or UTF-16. See also section UNICODE.
 119
 120     -n, --newfile INFILE OUTFILE ...
 121         New file mode. Convert file INFILE and write output to file OUTFILE.
 122         File names must be given in pairs and wildcard names should *not* be
 123         used or you *will* lose your files.
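
             The loss happens because the shell expands the wildcard before
             dos2unix runs, so every second matching file is taken as an
             OUTFILE. A minimal sketch of the pairing, assuming a POSIX shell;
             the temporary directory and file names are made up and dos2unix
             is not invoked:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch a.txt b.txt c.txt d.txt

# "dos2unix -n *.txt" would receive these arguments after expansion.
set -- *.txt
overwritten=""
while [ "$#" -ge 2 ]; do
    echo "INFILE=$1  OUTFILE=$2 (would be overwritten)"
    overwritten="$overwritten $2"
    shift 2
done
```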
 124
 125         The person who starts the conversion in new file (paired) mode will
 126         be the owner of the converted file. The read/write permissions of
 127         the new file will be the permissions of the original file minus the
 128         umask(1) of the person who runs the conversion.
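
             As a worked example of that arithmetic, assume (hypothetically)
             an original file with mode 0666 and a umask of 022. The bits set
             in the umask are cleared from the original mode:

```shell
orig_mode=0666   # assumed permissions of INFILE
mask=0022        # assumed umask of the user running the conversion
# Clear every permission bit that is set in the umask.
new_mode=$(printf '%04o' $(( $orig_mode & ~$mask )))
echo "$new_mode"
```

             Under these assumptions the new file gets mode 0644.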
 129
 130     -o, --oldfile FILE ...
 131         Old file mode. Convert file FILE and overwrite it with the output. The
 132         program runs in this mode by default. Wildcard names may be used.
 133
 134         In old file (in-place) mode the converted file gets the same owner,
 135         group, and read/write permissions as the original file, even when
 136         the file is converted by another user who has write permissions on
 137         the file (e.g. user root). The conversion will be aborted when it is
 138         not possible to preserve the original values. Change of owner could
 139         mean that the original owner is not able to read the file any more.
 140         Change of group could be a security risk: the file could be made
 141         readable for persons for whom it is not intended. Preservation of
 142         owner, group, and read/write permissions is only supported on Unix.
 143
 144     -q, --quiet
 145         Quiet mode. Suppress all warnings and messages. The return value is
 146         zero, except when wrong command-line options are used.
 147
 148     -r, --remove-bom
 149         Remove Byte Order Mark (BOM). Do not write a BOM in the output file.
 150         This is the default behavior when converting to Unix line breaks.
 151         See also option "-b".
 152
 153     -s, --safe
 154         Skip binary files (default).
 155
 156     -u, --keep-utf16
 157         Keep the original UTF-16 encoding of the input file. The output file
 158         will be written in the same UTF-16 encoding, little or big endian,
 159         as the input file. This prevents transformation to UTF-8. A UTF-16
 160         BOM will be written accordingly. This option can be disabled with
 161         the "-ascii" option.
 162
 163     -ul, --assume-utf16le
 164         Assume that the input file format is UTF-16LE.
 165
 166         When there is a Byte Order Mark in the input file, the BOM has
 167         priority over this option.
 168
 169         If you make a wrong assumption (the input file is not in UTF-16LE
 170         format) and the conversion succeeds, you will get a UTF-8 output
 171         file with wrong text. You can undo the wrong conversion with
 172         iconv(1) by converting the UTF-8 output file back to UTF-16LE. This
 173         will bring back the original file.
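
             A minimal sketch of that round trip, assuming a POSIX shell and
             an iconv(1) that knows the names UTF-8 and UTF-16LE. Here iconv
             stands in for the mistaken "dos2unix -ul" run, and the file names
             and sample content are made up:

```shell
tmpdir=$(mktemp -d)
printf 'abcd' > "$tmpdir/original.txt"    # plain ASCII, not UTF-16LE

# Stand-in for the wrong conversion: decode the bytes as UTF-16LE.
iconv -f UTF-16LE -t UTF-8 "$tmpdir/original.txt" > "$tmpdir/wrong.txt"

# Undo it: encode the UTF-8 text back to UTF-16LE.
iconv -f UTF-8 -t UTF-16LE "$tmpdir/wrong.txt" > "$tmpdir/restored.txt"

if cmp -s "$tmpdir/original.txt" "$tmpdir/restored.txt"; then
    roundtrip=ok
else
    roundtrip=differs
fi
echo "$roundtrip"
```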
 174
 175         The assumption of UTF-16LE works as a *conversion mode*. By
 176         switching to the default *ascii* mode the UTF-16LE assumption is
 177         turned off.
 178
 179     -ub, --assume-utf16be
 180         Assume that the input file format is UTF-16BE.
 181
 182         This option works the same as option "-ul".
 183
 184     -v, --verbose
 185         Display verbose messages. Extra information is displayed about Byte
 186         Order Marks and the number of converted line breaks.
 187
 188     -F, --follow-symlink
 189         Follow symbolic links and convert the targets.
 190
 191     -R, --replace-symlink
 192         Replace symbolic links with converted files (original target files
 193         remain unchanged).
 194
 195     -S, --skip-symlink
 196         Keep symbolic links and targets unchanged (default).
 197
 198     -V, --version
 199         Display version information and exit.
 200
 201 MAC MODE
 202     In normal mode line breaks are converted from DOS to Unix and vice
 203     versa. Mac line breaks are not converted.
 204
 205     In Mac mode line breaks are converted from Mac to Unix and vice versa.
 206     DOS line breaks are not changed.
 207
 208     To run in Mac mode use the command-line option "-c mac" or use the
 209     commands "mac2unix" or "unix2mac".
 210
 211 CONVERSION MODES
 212     ascii
 213         In mode "ascii" only line breaks are converted. This is the default
 214         conversion mode.
 215
 216         Although the name of this mode is ASCII, which is a 7 bit standard,
 217         the actual mode is 8 bit. Always use this mode when converting
 218         Unicode UTF-8 files.
 219
 220     7bit
 221         In this mode all 8 bit non-ASCII characters (with values from 128 to
 222         255) are converted to a 7 bit space.
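
             The mapping can be approximated with tr(1). This only illustrates
             the character replacement; it is not how dos2unix is implemented
             and it leaves line breaks alone. The sample text is an
             assumption:

```shell
# "caf" followed by byte 0xE9 (e-acute in Latin-1), then a newline.
# Every byte from 128 to 255 is replaced by a space.
seven_bit=$(printf 'caf\351\n' | LC_ALL=C tr '\200-\377' ' ')
echo "[$seven_bit]"
```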
 223
 224     iso Characters are converted between a DOS character set (code page) and
 225         ISO character set ISO-8859-1 (Latin-1) on Unix. DOS characters
 226         without ISO-8859-1 equivalent, for which conversion is not possible,
 227         are converted to a dot. The same applies to ISO-8859-1 characters
 228         without a DOS counterpart.
 229
 230         When only option "-iso" is used, dos2unix will try to determine the
 231         active code page. When this is not possible, dos2unix will use
 232         default code page CP437, which is mainly used in the USA. To force a
 233         specific code page use options -437 (US), -850 (Western European),
 234         -860 (Portuguese), -863 (French Canadian), or -865 (Nordic). Windows
 235         code page CP1252 (Western European) is also supported with option
 236         -1252. For other code pages use dos2unix in combination with
 237         iconv(1). Iconv can convert between a long list of character
 238         encodings.
 239
 240         Never use ISO conversion on Unicode text files. It will corrupt
 241         UTF-8 encoded files.
 242
 243         Some examples:
 244
 245         Convert from DOS default code page to Unix Latin-1
 246
 247             dos2unix -iso -n in.txt out.txt
 248
 249         Convert from DOS CP850 to Unix Latin-1
 250
 251             dos2unix -850 -n in.txt out.txt
 252
 253         Convert from Windows CP1252 to Unix Latin-1
 254
 255             dos2unix -1252 -n in.txt out.txt
 256
 257         Convert from Windows CP1252 to Unix UTF-8 (Unicode)
 258
 259             iconv -f CP1252 -t UTF-8 in.txt | dos2unix > out.txt
 260
 261         Convert from Unix Latin-1 to DOS default code page
 262
 263             unix2dos -iso -n in.txt out.txt
 264
 265         Convert from Unix Latin-1 to DOS CP850
 266
 267             unix2dos -850 -n in.txt out.txt
 268
 269         Convert from Unix Latin-1 to Windows CP1252
 270
 271             unix2dos -1252 -n in.txt out.txt
 272
 273         Convert from Unix UTF-8 (Unicode) to Windows CP1252
 274
 275             unix2dos < in.txt | iconv -f UTF-8 -t CP1252 > out.txt
 276
 277         See also <http://czyborra.com/charsets/codepages.html> and
 278         <http://czyborra.com/charsets/iso8859.html>.
 279
 280 UNICODE
 281   Encodings
 282     There exist different Unicode encodings. On Unix and Linux, Unicode files
 283     are typically encoded in UTF-8. On Windows, Unicode text files
 284     can be encoded in UTF-8, UTF-16, or UTF-16 big endian, but are mostly
 285     encoded in UTF-16 format.
 286
 287   Conversion
 288     Unicode text files can have DOS, Unix or Mac line breaks, like regular
 289     text files.
 290
 291     All versions of dos2unix and unix2dos can convert UTF-8 encoded files,
 292     because UTF-8 was designed for backward compatibility with ASCII.
 293
 294     Dos2unix and unix2dos with Unicode UTF-16 support can read little and
 295     big endian UTF-16 encoded text files. To see if dos2unix was built with
 296     UTF-16 support type "dos2unix -V".
 297
 298     UTF-16 encoded files are by default converted to UTF-8. On Unix/Linux it
 299     is required that the locale character encoding is set to UTF-8. Use the
 300     locale(1) command to find out what the locale character encoding is.
 301     UTF-8 formatted text files are well supported on both Windows and
 302     Unix/Linux.
 303
 304     UTF-16 and UTF-8 encoding are fully compatible; no text will be
 305     lost in the conversion. When a UTF-16 to UTF-8 conversion error occurs,
 306     for instance when the UTF-16 input file contains an error, the file will
 307     be skipped.
 308
 309     When option "-u" is used, the output file will be written in the same
 310     UTF-16 encoding as the input file. Option "-u" prevents conversion to
 311     UTF-8.
 312
 313     Dos2unix and unix2dos have no option to convert UTF-8 files to UTF-16.
 314
 315     ISO and 7-bit mode conversion do not work on UTF-16 files.
 316
 317   Byte Order Mark
 318     On Windows Unicode text files typically have a Byte Order Mark (BOM),
 319     because many Windows programs (including Notepad) add BOMs by default.
 320     See also <http://en.wikipedia.org/wiki/Byte_order_mark>.
 321
 322     On Unix Unicode files typically don't have a BOM. It is assumed that
 323     text files are encoded in the locale character encoding.
 324
 325     Dos2unix can only detect if a file is in UTF-16 format if the file has a
 326     BOM. When a UTF-16 file doesn't have a BOM, dos2unix will see the file
 327     as a binary file.
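
         The BOM is a fixed byte sequence at the start of the file: EF BB BF
         for UTF-8, FF FE for UTF-16 little endian, and FE FF for UTF-16 big
         endian. A minimal sketch of such a check, assuming a POSIX shell with
         od(1); the function name bom_of is hypothetical, this is not
         dos2unix's own detection code, and UTF-32 is ignored:

```shell
bom_of() {
    # Print the encoding implied by a leading BOM, or "none".
    magic=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    case "$magic" in
        efbbbf) echo "UTF-8" ;;
        fffe*)  echo "UTF-16LE" ;;
        feff*)  echo "UTF-16BE" ;;
        *)      echo "none" ;;
    esac
}

tmpdir=$(mktemp -d)
printf '\377\376a\000' > "$tmpdir/le.txt"     # BOM + "a" in UTF-16LE
printf 'plain\n'       > "$tmpdir/plain.txt"  # no BOM
bom_of "$tmpdir/le.txt"
bom_of "$tmpdir/plain.txt"
```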
 328
 329     Use option "-ul" or "-ub" to convert a UTF-16 file without a BOM.
 330
 331     By default, dos2unix writes no BOM in the output file. With option "-b"
 332     dos2unix writes a BOM when the input file has a BOM.
 333
 334     By default, unix2dos writes a BOM in the output file when the input file
 335     has a BOM. Use option "-r" to remove the BOM.
 336
 337     Dos2unix and unix2dos always write a BOM when option "-m" is used.
 338
 339   Unicode examples
 340     Convert from Windows UTF-16 (with BOM) to Unix UTF-8
 341
 342         dos2unix -n in.txt out.txt
 343
 344     Convert from Windows UTF-16LE (without BOM) to Unix UTF-8
 345
 346         dos2unix -ul -n in.txt out.txt
 347
 348     Convert from Unix UTF-8 to Windows UTF-8 with BOM
 349
 350         unix2dos -m -n in.txt out.txt
 351
 352     Convert from Unix UTF-8 to Windows UTF-16
 353
 354         unix2dos < in.txt | iconv -f UTF-8 -t UTF-16 > out.txt
 355
 356 EXAMPLES
 357     Read input from 'stdin' and write output to 'stdout'.
 358
 359         dos2unix
 360         dos2unix -l -c mac
 361
 362     Convert and replace a.txt. Convert and replace b.txt.
 363
 364         dos2unix a.txt b.txt
 365         dos2unix -o a.txt b.txt
 366
 367     Convert and replace a.txt in ascii conversion mode.
 368
 369         dos2unix a.txt
 370
 371     Convert and replace a.txt in ascii conversion mode. Convert and replace
 372     b.txt in 7bit conversion mode.
 373
 374         dos2unix a.txt -c 7bit b.txt
 375         dos2unix -c ascii a.txt -c 7bit b.txt
 376         dos2unix -ascii a.txt -7 b.txt
 377
 378     Convert a.txt from Mac to Unix format.
 379
 380         dos2unix -c mac a.txt
 381         mac2unix a.txt
 382
 383     Convert a.txt from Unix to Mac format.
 384
 385         unix2dos -c mac a.txt
 386         unix2mac a.txt
 387
 388     Convert and replace a.txt while keeping original date stamp.
 389
 390         dos2unix -k a.txt
 391         dos2unix -k -o a.txt
 392
 393     Convert a.txt and write to e.txt.
 394
 395         dos2unix -n a.txt e.txt
 396
 397     Convert a.txt and write to e.txt, keeping the date stamp of e.txt the
 398     same as that of a.txt.
 399
 400         dos2unix -k -n a.txt e.txt
 401
 402     Convert and replace a.txt. Convert b.txt and write to e.txt.
 403
 404         dos2unix a.txt -n b.txt e.txt
 405         dos2unix -o a.txt -n b.txt e.txt
 406
 407     Convert c.txt and write to e.txt. Convert and replace a.txt. Convert and
 408     replace b.txt. Convert d.txt and write to f.txt.
 409
 410         dos2unix -n c.txt e.txt -o a.txt b.txt -n d.txt f.txt
 411
 412 RECURSIVE CONVERSION
 413     Use dos2unix in combination with the find(1) and xargs(1) commands to
 414     recursively convert text files in a directory tree structure. For
 415     instance to convert all .txt files in the directory tree under the
 416     current directory type:
 417
 418         find . -name '*.txt' | xargs dos2unix
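
         The pattern given to -name should be quoted so that find(1), not the
         shell, expands it. A minimal sketch of the difference, assuming a
         POSIX shell; the directory layout is made up and dos2unix is not
         invoked:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
mkdir sub
touch top.txt sub/deep.txt

# Unquoted: the shell rewrites the pattern to "top.txt" before find runs,
# so the file in the subdirectory is missed.
unquoted=$(find . -name *.txt | sort | tr '\n' ' ')
# Quoted: find receives the pattern itself and matches at every depth.
quoted=$(find . -name '*.txt' | sort | tr '\n' ' ')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

         Plain xargs also splits file names that contain spaces or newlines.
         Letting find start the command avoids that:

             find . -name '*.txt' -exec dos2unix {} +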
 419
 420 LOCALIZATION
 421     LANG
 422         The primary language is selected with the environment variable LANG.
 423         The LANG variable consists of several parts. The first part is the
 424         language code in lowercase letters. The second is optional and is
 425         the country code in capital letters, preceded by an underscore.
 426         There is also an optional third part: character encoding, preceded
 427         by a dot. A few examples for POSIX standard type shells:
 428
 429             export LANG=nl               Dutch
 430             export LANG=nl_NL            Dutch, The Netherlands
 431             export LANG=nl_BE            Dutch, Belgium
 432             export LANG=es_ES            Spanish, Spain
 433             export LANG=es_MX            Spanish, Mexico
 434             export LANG=en_US.iso88591   English, USA, Latin-1 encoding
 435             export LANG=en_GB.UTF-8      English, UK, UTF-8 encoding
 436
 437         For a complete list of language and country codes see the gettext
 438         manual:
 439         <http://www.gnu.org/software/gettext/manual/gettext.html#Language-Codes>
 441
 442         On Unix systems you can use the command locale(1) to get
 443         locale-specific information.
 444
 445     LANGUAGE
 446         With the LANGUAGE environment variable you can specify a priority
 447         list of languages, separated by colons. Dos2unix gives preference to
 448         LANGUAGE over LANG. For instance, first Dutch and then German:
 449         "LANGUAGE=nl:de". You have to first enable localization, by setting
 450         LANG (or LC_ALL) to a value other than "C", before you can use a
 451         language priority list through the LANGUAGE variable. See also the
 452         gettext manual:
 453         <http://www.gnu.org/software/gettext/manual/gettext.html#The-LANGUAGE-variable>
 455
 456         If you select a language which is not available you will get the
 457         standard English messages.
 458
 459     DOS2UNIX_LOCALEDIR
 460         With the environment variable DOS2UNIX_LOCALEDIR the LOCALEDIR set
 461         during compilation can be overridden. LOCALEDIR is used to find the
 462         language files. The GNU default value is "/usr/local/share/locale".
 463         Option --version will display the LOCALEDIR that is used.
 464
 465         Example (POSIX shell):
 466
 467             export DOS2UNIX_LOCALEDIR=$HOME/share/locale
 468
 469 RETURN VALUE
 470     On success, zero is returned. When a system error occurs the last system
 471     error will be returned. For other errors 1 is returned.
 472
 473     The return value is always zero in quiet mode, except when wrong
 474     command-line options are used.
 475
 476 STANDARDS
 477     <http://en.wikipedia.org/wiki/Text_file>
 478
 479     <http://en.wikipedia.org/wiki/Carriage_return>
 480
 481     <http://en.wikipedia.org/wiki/Newline>
 482
 483     <http://en.wikipedia.org/wiki/Unicode>
 484
 485 AUTHORS
 486     Benjamin Lin - <blin@socs.uts.edu.au>, Bernd Johannes Wuebben (mac2unix
 487     mode) - <wuebben@kde.org>, Christian Wurll (add extra newline) -
 488     <wurll@ira.uka.de>, Erwin Waterlander - <waterlan@xs4all.nl>
 489     (Maintainer)
 490
 491     Project page: <http://waterlan.home.xs4all.nl/dos2unix.html>
 492
 493     SourceForge page: <http://sourceforge.net/projects/dos2unix/>
 494
 495 SEE ALSO
 496     file(1) find(1) iconv(1) locale(1) xargs(1)
 497