man/man1/dos2unix.pod

   1 /*
   2 #  Copyright and License
   3 #
   4 #   Copyright (C) 2009-2014 Erwin Waterlander
   5 #   All rights reserved.
   6 #
   7 #   Redistribution and use in source and binary forms, with or without
   8 #   modification, are permitted provided that the following conditions
   9 #   are met:
  10 #   1. Redistributions of source code must retain the above copyright
  11 #      notice, this list of conditions and the following disclaimer.
  12 #   2. Redistributions in binary form must reproduce the above copyright
  13 #      notice in the documentation and/or other materials provided with
  14 #      the distribution.
  15 #
  16 #   THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY
  17 #   EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  18 #   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
  19 #   PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR BE LIABLE
  20 #   FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  21 #   CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
  22 #   OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
  23 #   BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
  24 #   WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
  25 #   OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
  26 #   IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  27 #
  28 #   Description
  29 #
  30 #       To learn what TOP LEVEL section to use in manual pages,
  31 #       see POSIX/Susv standard and "Utility Description Defaults" at
  32 #       http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap01.html#tag_01_11
  33 #
   34 #       This is a manual page in Perl POD format. Read more at
  35 #       http://perldoc.perl.org/perlpod.html or run command:
  36 #
  37 #           perldoc perlpod | less
  38 #
  39 #       To check the syntax:
  40 #
  41 #           podchecker *.pod
  42 #
  43 #       Create manual page with command:
  44 #
  45 #           pod2man PAGE.N.pod > PAGE.N
  46 */
  47
  48 =pod
  49
  50 =encoding UTF-8
  51
  52 =head1 NAME
  53
  54 dos2unix - DOS/Mac to Unix and vice versa text file format converter
  55
  56 =head1 SYNOPSIS
  57
  58     dos2unix [options] [FILE ...] [-n INFILE OUTFILE ...]
  59     unix2dos [options] [FILE ...] [-n INFILE OUTFILE ...]
  60
  61 =head1 DESCRIPTION
  62
  63 The Dos2unix package includes utilities C<dos2unix> and C<unix2dos> to convert
  64 plain text files in DOS or Mac format to Unix format and vice versa.
  65
  66 In DOS/Windows text files a line break, also known as newline, is a combination
  67 of two characters: a Carriage Return (CR) followed by a Line Feed (LF). In Unix
  68 text files a line break is a single character: the Line Feed (LF). In Mac text
   69 files, prior to Mac OS X, a line break was a single Carriage Return (CR)
  70 character. Nowadays Mac OS uses Unix style (LF) line breaks.
  71
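As a rough illustration of the three conventions described above, the
following Python sketch counts each kind of line break in a byte string. The
function name C<count_line_breaks> is made up for this example; it is not part
of dos2unix and says nothing about how dos2unix is implemented.

```python
import re


def count_line_breaks(data: bytes) -> dict:
    """Count DOS (CR LF), Unix (lone LF), and Mac (lone CR) line breaks."""
    dos = len(re.findall(rb"\r\n", data))
    # A Unix line break is an LF that is not preceded by a CR.
    unix = len(re.findall(rb"(?<!\r)\n", data))
    # A pre-OS X Mac line break is a CR that is not followed by an LF.
    mac = len(re.findall(rb"\r(?!\n)", data))
    return {"dos": dos, "unix": unix, "mac": mac}


print(count_line_breaks(b"one\r\ntwo\nthree\rfour\r\n"))
# {'dos': 2, 'unix': 1, 'mac': 1}
```
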
  72 Besides line breaks Dos2unix can also convert the encoding of files. A few
  73 DOS code pages can be converted to Unix Latin-1. And Windows Unicode (UTF-16)
  74 files can be converted to Unix Unicode (UTF-8) files.
  75
  76 Binary files are automatically skipped, unless conversion is forced.
  77
  78 Non-regular files, such as directories and FIFOs, are automatically skipped.
  79
  80 Symbolic links and their targets are by default kept untouched.  Symbolic links
  81 can optionally be replaced, or the output can be written to the symbolic link
  82 target.  Writing to a symbolic link target is not supported on Windows.
  83
   84 Dos2unix was modelled after dos2unix under SunOS/Solaris.  There is one
   85 important difference from the original SunOS/Solaris version: this version
   86 converts in place by default (old file mode), while the original
   87 SunOS/Solaris version only supports paired conversion (new file mode). See also
   88 options C<-o> and C<-n>.
  89
  90 =head1 OPTIONS
  91
  92 =over 4
  93
  94 =item B<-->
  95
  96 Treat all following options as file names. Use this option if you want to
  97 convert files whose names start with a dash. For instance to convert
  98 a file named "-foo", you can use this command:
  99
 100     dos2unix -- -foo
 101
 102 Or in new file mode:
 103
 104     dos2unix -n -- -foo out.txt
 105
 106 =item B<-ascii>
 107
 108 Convert only line breaks. This is the default conversion mode.
 109
 110 =item B<-iso>
 111
 112 Conversion between DOS and ISO-8859-1 character set. See also section
 113 CONVERSION MODES.
 114
 115 =item B<-1252>
 116
 117 Use Windows code page 1252 (Western European).
 118
 119 =item B<-437>
 120
 121 Use DOS code page 437 (US). This is the default code page used for ISO conversion.
 122
 123 =item B<-850>
 124
 125 Use DOS code page 850 (Western European).
 126
 127 =item B<-860>
 128
 129 Use DOS code page 860 (Portuguese).
 130
 131 =item B<-863>
 132
 133 Use DOS code page 863 (French Canadian).
 134
 135 =item B<-865>
 136
 137 Use DOS code page 865 (Nordic).
 138
 139 =item B<-7>
 140
  141 Convert 8 bit characters to a 7 bit space.
 142
 143 =item B<-b, --keep-bom>
 144
 145 Keep Byte Order Mark (BOM). When the input file has a BOM, write a BOM in
 146 the output file. This is the default behavior when converting to DOS line
 147 breaks. See also option C<-r>.
 148
 149 =item B<-c, --convmode CONVMODE>
 150
 151 Set conversion mode. Where CONVMODE is one of:
 152 I<ascii>, I<7bit>, I<iso>, I<mac>
 153 with ascii being the default.
 154
 155 =item B<-f, --force>
 156
 157 Force conversion of binary files.
 158
 159 =item B<-h, --help>
 160
 161 Display help and exit.
 162
 163 =item B<-k, --keepdate>
 164
  165 Keep the date stamp of the output file the same as that of the input file.
 166
 167 =item B<-L, --license>
 168
 169 Display program's license.
 170
 171 =item B<-l, --newline>
 172
  173 Add an additional newline.
 174
 175 B<dos2unix>: Only DOS line breaks are changed to two Unix line breaks.
 176 In Mac mode only Mac line breaks are changed to two Unix
 177 line breaks.
 178
 179 B<unix2dos>: Only Unix line breaks are changed to two DOS line breaks.
 180 In Mac mode Unix line breaks are changed to two Mac line breaks.
 181
 182 =item B<-m, --add-bom>
 183
  184 Write a Byte Order Mark (BOM) in the output file. By default a UTF-8 BOM
  185 is written.
  186
  187 When the input file is UTF-16, and the option C<-u> is used, a UTF-16
  188 BOM will be written.
 189
 190 Never use this option when the output encoding is other than UTF-8 or UTF-16.
 191 See also section UNICODE.
 192
 193
 194 =item B<-n, --newfile INFILE OUTFILE ...>
 195
 196 New file mode. Convert file INFILE and write output to file OUTFILE.
 197 File names must be given in pairs and wildcard names should I<not> be
 198 used or you I<will> lose your files.
 199
 200 The person who starts the conversion in new file (paired) mode will be the owner
 201 of the converted file. The read/write permissions of the new file will be the
 202 permissions of the original file minus the umask(1) of the person who runs the
 203 conversion.
 204
 205 =item B<-o, --oldfile FILE ...>
 206
  207 Old file mode. Convert file FILE and overwrite it with the output. The program
 208 defaults to run in this mode. Wildcard names may be used.
 209
  210 In old file (in-place) mode the converted file gets the same owner, group, and
  211 read/write permissions as the original file. This also holds when the file is
  212 converted by another user who has write permission on the file (e.g. user root).
  213 The conversion will be aborted when it is not possible to preserve the original
  214 values.  Change of owner could mean that the original owner is not able to read
  215 the file any more. Change of group could be a security risk: the file could be
  216 made readable for persons for whom it is not intended.  Preservation of owner,
  217 group, and read/write permissions is only supported on Unix.
 218
 219 =item B<-q, --quiet>
 220
  221 Quiet mode. Suppress all warnings and messages. The return value is zero,
  222 except when wrong command-line options are used.
 223
 224 =item B<-r, --remove-bom>
 225
 226 Remove Byte Order Mark (BOM). Do not write a BOM in the output file.
 227 This is the default behavior when converting to Unix line breaks.
 228 See also option C<-b>.
 229
 230 =item B<-s, --safe>
 231
 232 Skip binary files (default).
 233
 234 =item B<-u, --keep-utf16>
 235
 236 Keep the original UTF-16 encoding of the input file. The output file will be
 237 written in the same UTF-16 encoding, little or big endian, as the input file.
  238 This prevents transformation to UTF-8. A UTF-16 BOM will be written
 239 accordingly. This option can be disabled with the C<-ascii> option.
 240
 241 =item B<-ul, --assume-utf16le>
 242
 243 Assume that the input file format is UTF-16LE.
 244
 245 When there is a Byte Order Mark in the input file the BOM has priority over
 246 this option.
 247
  248 If you make a wrong assumption (the input file is not in UTF-16LE format) and
  249 the conversion succeeds, you will get a UTF-8 output file with wrong text.
  250 You can undo the wrong conversion with iconv(1) by converting the UTF-8 output
  251 file back to UTF-16LE. This will bring back the original file.
 252
 253 The assumption of UTF-16LE works as a I<conversion mode>. By switching to the default
 254 I<ascii> mode the UTF-16LE assumption is turned off.
 255
 256 =item B<-ub, --assume-utf16be>
 257
 258 Assume that the input file format is UTF-16BE.
 259
 260 This option works the same as option C<-ul>.
 261
 262 =item B<-v, --verbose>
 263
 264 Display verbose messages. Extra information is displayed about Byte Order Marks
  265 and the number of converted line breaks.
 266
 267 =item B<-F, --follow-symlink>
 268
 269 Follow symbolic links and convert the targets.
 270
 271 =item B<-R, --replace-symlink>
 272
 273 Replace symbolic links with converted files
 274 (original target files remain unchanged).
 275
 276 =item B<-S, --skip-symlink>
 277
 278 Keep symbolic links and targets unchanged (default).
 279
 280 =item B<-V, --version>
 281
 282 Display version information and exit.
 283
 284 =back
 285
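For new file mode (option C<-n>), the permission rule described above, original
permissions minus the umask, amounts to clearing every permission bit that is
set in the umask. A minimal Python sketch of that arithmetic follows. The
function name C<new_file_mode> is invented for this illustration and is not
part of dos2unix.

```python
def new_file_mode(original_mode: int, umask: int) -> int:
    """Return the original permission bits with every umask bit cleared."""
    # Keep only the nine rwx permission bits, then drop those the umask forbids.
    return original_mode & ~umask & 0o777


# A file with mode 664 converted by a user whose umask is 022:
print(oct(new_file_mode(0o664, 0o022)))  # 0o644
```
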
 286 =head1 MAC MODE
 287
 288 In normal mode line breaks are converted from DOS to Unix and vice versa.
 289 Mac line breaks are not converted.
 290
 291 In Mac mode line breaks are converted from Mac to Unix and vice versa. DOS
 292 line breaks are not changed.
 293
 294 To run in Mac mode use the command-line option C<-c mac> or use the
 295 commands C<mac2unix> or C<unix2mac>.
 296
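The difference between normal mode and Mac mode can be sketched in Python as
follows. This is an illustration of the rules stated above, not the dos2unix
source code. The names C<to_unix> and C<from_unix> are hypothetical, and the
sketch assumes that a "Unix line break" means an LF that is not already part of
a CR LF pair.

```python
import re

LONE_LF = rb"(?<!\r)\n"  # Unix line break: LF not preceded by CR
LONE_CR = rb"\r(?!\n)"   # Mac line break: CR not followed by LF


def to_unix(data: bytes, mac_mode: bool = False) -> bytes:
    """Sketch of dos2unix (normal mode) and mac2unix (Mac mode)."""
    if mac_mode:
        return re.sub(LONE_CR, b"\n", data)  # Mac -> Unix, CR LF is kept
    return data.replace(b"\r\n", b"\n")      # DOS -> Unix, lone CR is kept


def from_unix(data: bytes, mac_mode: bool = False) -> bytes:
    """Sketch of unix2dos (normal mode) and unix2mac (Mac mode)."""
    if mac_mode:
        return re.sub(LONE_LF, b"\r", data)  # Unix -> Mac, CR LF is kept
    return re.sub(LONE_LF, b"\r\n", data)    # Unix -> DOS, lone CR is kept


mixed = b"dos\r\nmac\runix\n"
print(to_unix(mixed))                 # b'dos\nmac\runix\n'
print(to_unix(mixed, mac_mode=True))  # b'dos\r\nmac\nunix\n'
```
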
 297 =head1 CONVERSION MODES
 298
 299 =over 4
 300
 301 =item B<ascii>
 302
 303 In mode C<ascii> only line breaks are converted. This is the default conversion
 304 mode.
 305
 306 Although the name of this mode is ASCII, which is a 7 bit standard, the
  307 actual mode is 8 bit. Always use this mode when converting Unicode UTF-8
 308 files.
 309
 310 =item B<7bit>
 311
 312 In this mode all 8 bit non-ASCII characters (with values from 128 to 255)
 313 are converted to a 7 bit space.
 314
 315 =item B<iso>
 316
 317 Characters are converted between a DOS character set (code page) and ISO
 318 character set ISO-8859-1 (Latin-1) on Unix. DOS characters without ISO-8859-1
 319 equivalent, for which conversion is not possible, are converted to a dot. The
  320 same applies to ISO-8859-1 characters without a DOS counterpart.
 321
 322 When only option C<-iso> is used dos2unix will try to determine the active code
 323 page. When this is not possible dos2unix will use default code page CP437,
 324 which is mainly used in the USA.  To force a specific code page use options
 325 C<-437> (US), C<-850> (Western European), C<-860> (Portuguese), C<-863> (French
 326 Canadian), or C<-865> (Nordic).  Windows code page CP1252 (Western European) is
 327 also supported with option C<-1252>. For other code pages use dos2unix in
 328 combination with iconv(1).  Iconv can convert between a long list of character
 329 encodings.
 330
 331 Never use ISO conversion on Unicode text files. It will corrupt UTF-8 encoded files.
 332
 333 Some examples:
 334
 335 Convert from DOS default code page to Unix Latin-1
 336
 337     dos2unix -iso -n in.txt out.txt
 338
 339 Convert from DOS CP850 to Unix Latin-1
 340
 341     dos2unix -850 -n in.txt out.txt
 342
 343 Convert from Windows CP1252 to Unix Latin-1
 344
 345     dos2unix -1252 -n in.txt out.txt
 346
 347 Convert from Windows CP1252 to Unix UTF-8 (Unicode)
 348
 349     iconv -f CP1252 -t UTF-8 in.txt | dos2unix > out.txt
 350
 351 Convert from Unix Latin-1 to DOS default code page
 352
 353     unix2dos -iso -n in.txt out.txt
 354
 355 Convert from Unix Latin-1 to DOS CP850
 356
 357     unix2dos -850 -n in.txt out.txt
 358
 359 Convert from Unix Latin-1 to Windows CP1252
 360
 361     unix2dos -1252 -n in.txt out.txt
 362
 363 Convert from Unix UTF-8 (Unicode) to Windows CP1252
 364
 365     unix2dos < in.txt | iconv -f UTF-8 -t CP1252 > out.txt
 366
 367 See also L<http://czyborra.com/charsets/codepages.html>
 368 and L<http://czyborra.com/charsets/iso8859.html>.
 369
 370 =back
 371
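The character handling of the C<7bit> and C<iso> modes can be approximated in
Python as shown below. This is only a sketch: it uses Python's own codec
tables, which may not match the dos2unix conversion tables byte for byte, and
it leaves line breaks alone. The function names are invented for this example.

```python
def to_7bit(data: bytes) -> bytes:
    """Replace every byte with a value from 128 to 255 with an ASCII space."""
    return bytes(0x20 if b >= 0x80 else b for b in data)


def dos_to_latin1(data: bytes, codepage: str = "cp437") -> bytes:
    """Map DOS code page bytes to ISO-8859-1; unmappable characters become '.'."""
    out = bytearray()
    # Bytes that the code page leaves undefined decode to U+FFFD, which has
    # no Latin-1 equivalent and therefore also ends up as a dot.
    for ch in data.decode(codepage, errors="replace"):
        try:
            out += ch.encode("latin-1")
        except UnicodeEncodeError:
            out += b"."
    return bytes(out)


# In CP437, 0x82 is e-acute (Latin-1 0xE9) and 0xB0 is a shade block
# character that has no Latin-1 equivalent.
print(dos_to_latin1(b"caf\x82 \xb0"))  # b'caf\xe9 .'
print(to_7bit(b"caf\x82"))             # b'caf '
```
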
 372 =head1 UNICODE
 373
 374 =head2 Encodings
 375
 376 There exist different Unicode encodings. On Unix and Linux Unicode files are
 377 typically encoded in UTF-8 encoding. On Windows Unicode text files can be
 378 encoded in UTF-8, UTF-16, or UTF-16 big endian, but are mostly encoded in
 379 UTF-16 format.
 380
 381 =head2 Conversion
 382
 383 Unicode text files can have DOS, Unix or Mac line breaks, like regular text
 384 files.
 385
 386 All versions of dos2unix and unix2dos can convert UTF-8 encoded files, because
 387 UTF-8 was designed for backward compatibility with ASCII.
 388
  389 Dos2unix and unix2dos with Unicode UTF-16 support can read little and big
 390 endian UTF-16 encoded text files. To see if dos2unix was built with UTF-16
 391 support type C<dos2unix -V>.
 392
 393 UTF-16 encoded files are by default converted to UTF-8. On Unix/Linux it is
 394 required that the locale character encoding is set to UTF-8. Use the locale(1)
 395 command to find out what the locale character encoding is. UTF-8 formatted
 396 text files are well supported on both Windows and Unix/Linux.
 397
  398 UTF-16 and UTF-8 encoding are fully compatible; no text will be lost in
  399 the conversion. When a UTF-16 to UTF-8 conversion error occurs, for instance
  400 when the UTF-16 input file contains an error, the file will be skipped.
 401
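The default handling of UTF-16 input, conversion to UTF-8 with the file skipped
on a conversion error, can be sketched in Python. The function name
C<utf16_to_utf8_unix> is hypothetical, the sketch assumes the input starts with
a BOM, and it does not claim to reproduce how dos2unix performs the conversion
internally.

```python
import codecs
from typing import Optional


def utf16_to_utf8_unix(data: bytes) -> Optional[bytes]:
    """Convert BOM-prefixed UTF-16 text with DOS line breaks to UTF-8 with LF.

    Returns None when the input is not valid UTF-16, mirroring the
    "file will be skipped" behaviour described above.
    """
    try:
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        text = data.decode("utf-16")
    except UnicodeDecodeError:
        return None
    return text.replace("\r\n", "\n").encode("utf-8")


sample = codecs.BOM_UTF16_LE + "a\r\nb\r\n".encode("utf-16-le")
print(utf16_to_utf8_unix(sample))  # b'a\nb\n'
```
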
 402 When option C<-u> is used, the output file will be written in the same UTF-16
 403 encoding as the input file. Option C<-u> prevents conversion to UTF-8.
 404
 405 Dos2unix and unix2dos have no option to convert UTF-8 files to UTF-16.
 406
 407 ISO and 7-bit mode conversion do not work on UTF-16 files.
 408
 409 =head2 Byte Order Mark
 410
 411 On Windows Unicode text files typically have a Byte Order Mark (BOM), because
 412 many Windows programs (including Notepad) add BOMs by default. See also
 413 L<http://en.wikipedia.org/wiki/Byte_order_mark>.
 414
 415 On Unix Unicode files typically don't have a BOM. It is assumed that text files
 416 are encoded in the locale character encoding.
 417
 418 Dos2unix can only detect if a file is in UTF-16 format if the file has a BOM.
  419 When a UTF-16 file doesn't have a BOM, dos2unix will see the file as a binary
 420 file.
 421
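BOM-based detection of the kind described above comes down to comparing the
first bytes of the file with the known BOM byte sequences. The Python sketch
below shows the idea using the constants from the standard C<codecs> module.
The function name C<detect_bom> is made up for this example.

```python
import codecs
from typing import Optional


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding announced by a leading BOM, or None if there is none."""
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "UTF-8"
    # UTF-32 is outside the scope of this sketch; a UTF-32LE BOM also starts
    # with FF FE and would be reported here as UTF-16LE.
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "UTF-16LE"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "UTF-16BE"
    return None


print(detect_bom(b"\xff\xfeh\x00i\x00"))  # UTF-16LE
print(detect_bom(b"hi"))                  # None
```
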
  422 Use option C<-ul> or C<-ub> to convert a UTF-16 file without BOM.
 423
  424 By default Dos2unix writes no BOM in the output file. With option C<-b>
  425 Dos2unix writes a BOM when the input file has a BOM.
  426
  427 By default Unix2dos writes a BOM in the output file when the input file has a
  428 BOM. Use option C<-r> to remove the BOM.
  429
  430 Dos2unix and unix2dos always write a BOM when option C<-m> is used.
 431
 432 =head2 Unicode examples
 433
 434 Convert from Windows UTF-16 (with BOM) to Unix UTF-8
 435
 436     dos2unix -n in.txt out.txt
 437
 438 Convert from Windows UTF-16LE (without BOM) to Unix UTF-8
 439
 440     dos2unix -ul -n in.txt out.txt
 441
 442 Convert from Unix UTF-8 to Windows UTF-8 with BOM
 443
 444     unix2dos -m -n in.txt out.txt
 445
 446 Convert from Unix UTF-8 to Windows UTF-16
 447
 448     unix2dos < in.txt | iconv -f UTF-8 -t UTF-16 > out.txt
 449
 450 =head1 EXAMPLES
 451
 452 Read input from 'stdin' and write output to 'stdout'.
 453
 454     dos2unix
 455     dos2unix -l -c mac
 456
 457 Convert and replace a.txt. Convert and replace b.txt.
 458
 459     dos2unix a.txt b.txt
 460     dos2unix -o a.txt b.txt
 461
 462 Convert and replace a.txt in ascii conversion mode.
 463
 464     dos2unix a.txt
 465
 466 Convert and replace a.txt in ascii conversion mode.
 467 Convert and replace b.txt in 7bit conversion mode.
 468
 469     dos2unix a.txt -c 7bit b.txt
 470     dos2unix -c ascii a.txt -c 7bit b.txt
 471     dos2unix -ascii a.txt -7 b.txt
 472
 473 Convert a.txt from Mac to Unix format.
 474
 475     dos2unix -c mac a.txt
 476     mac2unix a.txt
 477
 478 Convert a.txt from Unix to Mac format.
 479
 480     unix2dos -c mac a.txt
 481     unix2mac a.txt
 482
 483 Convert and replace a.txt while keeping original date stamp.
 484
 485     dos2unix -k a.txt
 486     dos2unix -k -o a.txt
 487
 488 Convert a.txt and write to e.txt.
 489
 490     dos2unix -n a.txt e.txt
 491
 492 Convert a.txt and write to e.txt, keep date stamp of e.txt same as a.txt.
 493
 494     dos2unix -k -n a.txt e.txt
 495
 496 Convert and replace a.txt. Convert b.txt and write to e.txt.
 497
 498     dos2unix a.txt -n b.txt e.txt
 499     dos2unix -o a.txt -n b.txt e.txt
 500
 501 Convert c.txt and write to e.txt. Convert and replace a.txt.
 502 Convert and replace b.txt. Convert d.txt and write to f.txt.
 503
 504     dos2unix -n c.txt e.txt -o a.txt b.txt -n d.txt f.txt
 505
 506 =head1 RECURSIVE CONVERSION
 507
 508 Use dos2unix in combination with the find(1) and xargs(1) commands to
 509 recursively convert text files in a directory tree structure. For instance to
 510 convert all .txt files in the directory tree under the current directory type:
 511
  512     find . -name '*.txt' | xargs dos2unix
 513
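The same file selection can be done from a script. The Python sketch below
collects the .txt files under a directory and builds a dos2unix argument list
without running it. The function name C<build_dos2unix_command> is
hypothetical. Only the C<--> separator documented in section OPTIONS is used.

```python
from pathlib import Path


def build_dos2unix_command(root: str) -> list:
    """Build an argument list that converts every regular *.txt file below root."""
    files = sorted(str(p) for p in Path(root).rglob("*.txt") if p.is_file())
    # "--" ends option parsing, so file names starting with a dash are safe.
    return ["dos2unix", "--", *files]
```

Passing the returned list to C<subprocess.run> would start the conversion.
Because no shell is involved, file names containing spaces are passed intact.
Check that at least one file was found before running the command, since
dos2unix without file arguments reads from stdin.
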
 514 =head1 LOCALIZATION
 515
 516 =over 4
 517
 518 =item B<LANG>
 519
  520 The primary language is selected with the environment variable LANG. The LANG
  521 variable consists of several parts. The first part is the language code in
  522 lowercase letters. The second is optional and is the country code in capital
  523 letters, preceded by an underscore. There is also an optional third part:
  524 character encoding, preceded by a dot. A few examples for POSIX standard type
  525 shells:
 526
 527     export LANG=nl               Dutch
 528     export LANG=nl_NL            Dutch, The Netherlands
 529     export LANG=nl_BE            Dutch, Belgium
 530     export LANG=es_ES            Spanish, Spain
 531     export LANG=es_MX            Spanish, Mexico
 532     export LANG=en_US.iso88591   English, USA, Latin-1 encoding
 533     export LANG=en_GB.UTF-8      English, UK, UTF-8 encoding
 534
 535 For a complete list of language and country codes see the gettext manual:
 536 L<http://www.gnu.org/software/gettext/manual/gettext.html#Language-Codes>
 537
  538 On Unix systems you can use the command locale(1) to get locale specific
 539 information.
 540
 541 =item B<LANGUAGE>
 542
 543 With the LANGUAGE environment variable you can specify a priority list of
 544 languages, separated by colons. Dos2unix gives preference to LANGUAGE over LANG.
 545 For instance, first Dutch and then German: C<LANGUAGE=nl:de>. You have to first
 546 enable localization, by setting LANG (or LC_ALL) to a value other than
 547 "C", before you can use a language priority list through the LANGUAGE
 548 variable. See also the gettext manual:
 549 L<http://www.gnu.org/software/gettext/manual/gettext.html#The-LANGUAGE-variable>
 550
 551 If you select a language which is not available you will get the
 552 standard English messages.
 553
 554
 555 =item B<DOS2UNIX_LOCALEDIR>
 556
 557 With the environment variable DOS2UNIX_LOCALEDIR the LOCALEDIR set
 558 during compilation can be overruled. LOCALEDIR is used to find the
 559 language files. The GNU default value is C</usr/local/share/locale>.
 560 Option B<--version> will display the LOCALEDIR that is used.
 561
 562 Example (POSIX shell):
 563
 564     export DOS2UNIX_LOCALEDIR=$HOME/share/locale
 565
 566 =back
 567
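The three-part structure of the LANG value described above (language code,
optional country code, optional character encoding) can be taken apart with a
short Python sketch. The names C<Locale> and C<parse_lang> are invented for
this illustration. Locale modifiers such as C<@euro> and the special value "C"
are not handled.

```python
import re
from typing import NamedTuple, Optional


class Locale(NamedTuple):
    language: str
    country: Optional[str]
    encoding: Optional[str]


def parse_lang(value: str) -> Optional[Locale]:
    """Split a LANG value such as 'en_GB.UTF-8' into its parts."""
    # language in lowercase, optional _COUNTRY in capitals, optional .encoding
    m = re.fullmatch(r"([a-z]+)(?:_([A-Z]+))?(?:\.(.+))?", value)
    if m is None:
        return None
    return Locale(m.group(1), m.group(2), m.group(3))


print(parse_lang("en_GB.UTF-8"))
# Locale(language='en', country='GB', encoding='UTF-8')
```
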
 568
 569 =head1 RETURN VALUE
 570
 571 On success, zero is returned.  When a system error occurs the last system error will be
 572 returned. For other errors 1 is returned.
 573
 574 The return value is always zero in quiet mode, except when wrong command-line options
 575 are used.
 576
 577 =head1 STANDARDS
 578
 579 L<http://en.wikipedia.org/wiki/Text_file>
 580
 581 L<http://en.wikipedia.org/wiki/Carriage_return>
 582
 583 L<http://en.wikipedia.org/wiki/Newline>
 584
 585 L<http://en.wikipedia.org/wiki/Unicode>
 586
 587 =head1 AUTHORS
 588
  589 Benjamin Lin - <blin@socs.uts.edu.au>,
 590 Bernd Johannes Wuebben (mac2unix mode) - <wuebben@kde.org>,
 591 Christian Wurll (add extra newline) - <wurll@ira.uka.de>,
 592 Erwin Waterlander - <waterlan@xs4all.nl> (Maintainer)
 593
 594 Project page: L<http://waterlan.home.xs4all.nl/dos2unix.html>
 595
 596 SourceForge page: L<http://sourceforge.net/projects/dos2unix/>
 597
 598 =head1 SEE ALSO
 599
 600 file(1)
 601 find(1)
 602 iconv(1)
 603 locale(1)
 604 xargs(1)
 605
 606 =cut