NAME
    dos2unix - DOS/Mac to Unix and vice versa text file format converter

SYNOPSIS
    dos2unix [options] [FILE ...] [-n INFILE OUTFILE ...]
    unix2dos [options] [FILE ...] [-n INFILE OUTFILE ...]

DESCRIPTION
    The Dos2unix package includes utilities "dos2unix" and "unix2dos" to
    convert plain text files in DOS or Mac format to Unix format and vice
    versa.

    In DOS/Windows text files a line break, also known as newline, is a
    combination of two characters: a Carriage Return (CR) followed by a Line
    Feed (LF). In Unix text files a line break is a single character: the
    Line Feed (LF). In Mac text files, prior to Mac OS X, a line break was
    a single Carriage Return (CR) character. Nowadays Mac OS uses Unix style
    (LF) line breaks.
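
    The three conventions differ only in the bytes that end each line. The
    following is a minimal sketch that uses standard tools (printf, wc, od)
    rather than dos2unix itself; the file names under the temporary
    directory are illustrative.

```shell
# Write the same two-line text with each line-break convention.
tmpdir=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$tmpdir/dos.txt"    # CR LF after each line
printf 'one\ntwo\n'     > "$tmpdir/unix.txt"   # LF after each line
printf 'one\rtwo\r'     > "$tmpdir/mac.txt"    # CR after each line

# The DOS file is one byte longer per line than the other two.
dos_size=$(wc -c < "$tmpdir/dos.txt" | tr -d ' ')
unix_size=$(wc -c < "$tmpdir/unix.txt" | tr -d ' ')
mac_size=$(wc -c < "$tmpdir/mac.txt" | tr -d ' ')
echo "dos=$dos_size unix=$unix_size mac=$mac_size"

# od -c prints CR as \r and LF as \n, which makes the difference visible.
od -c "$tmpdir/dos.txt"
```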

    Besides line breaks, Dos2unix can also convert the encoding of files. A
    few DOS code pages can be converted to Unix Latin-1, and Windows Unicode
    (UTF-16) files can be converted to Unix Unicode (UTF-8) files.

    Binary files are automatically skipped, unless conversion is forced.

    Non-regular files, such as directories and FIFOs, are automatically
    skipped.

    Symbolic links and their targets are by default kept untouched. Symbolic
    links can optionally be replaced, or the output can be written to the
    symbolic link target. Writing to a symbolic link target is not supported
    on Windows.

    Dos2unix was modelled after dos2unix under SunOS/Solaris. There is one
    important difference from the original SunOS/Solaris version: this
    version does in-place conversion by default (old file mode), while the
    original SunOS/Solaris version only supports paired conversion (new file
    mode). See also options "-o" and "-n".

OPTIONS
    --  Treat all following options as file names. Use this option if you
        want to convert files whose names start with a dash. For instance,
        to convert a file named "-foo", you can use this command:

            dos2unix -- -foo

        Or in new file mode:

            dos2unix -n -- -foo out.txt

    -ascii
        Convert only line breaks. This is the default conversion mode.

    -iso
        Conversion between DOS and ISO-8859-1 character set. See also
        section CONVERSION MODES.

    -1252
        Use Windows code page 1252 (Western European).

    -437
        Use DOS code page 437 (US). This is the default code page used for
        ISO conversion.

    -850
        Use DOS code page 850 (Western European).

    -860
        Use DOS code page 860 (Portuguese).

    -863
        Use DOS code page 863 (French Canadian).

    -865
        Use DOS code page 865 (Nordic).

    -7  Convert 8 bit characters to a 7 bit space.

    -b, --keep-bom
        Keep Byte Order Mark (BOM). When the input file has a BOM, write a
        BOM in the output file. This is the default behavior when converting
        to DOS line breaks. See also option "-r".

    -c, --convmode CONVMODE
        Set conversion mode. CONVMODE is one of: *ascii*, *7bit*, *iso*,
        *mac*, with ascii being the default.

    -f, --force
        Force conversion of binary files.

    -gb, --gb18030
        On Windows UTF-16 files are by default converted to UTF-8,
        regardless of the locale setting. Use this option to convert UTF-16
        files to GB18030. This option is only available on Windows. See also
        section GB18030.

    -h, --help
        Display help and exit.

    -i[FLAGS], --info[=FLAGS] FILE ...
        Display file information. No conversion is done.

        The following information is printed, in this order: number of DOS
        line breaks, number of Unix line breaks, number of Mac line breaks,
        byte order mark, text or binary, file name.

        Example output:

             6       0       0  no_bom    text    dos.txt
             0       6       0  no_bom    text    unix.txt
             0       0       6  no_bom    text    mac.txt
             6       6       6  no_bom    text    mixed.txt
            50       0       0  UTF-16LE  text    utf16le.txt
             0      50       0  no_bom    text    utf8unix.txt
            50       0       0  UTF-8     text    utf8dos.txt
             2     418     219  no_bom    binary  dos2unix.exe

        Optionally, extra flags can be set to change the output. One or more
        flags can be added.

        d   Print number of DOS line breaks.

        u   Print number of Unix line breaks.

        m   Print number of Mac line breaks.

        b   Print the byte order mark.

        t   Print if file is text or binary.

        c   Print only the files that would be converted.

            With the "c" flag dos2unix will print only the names of files
            that contain DOS line breaks, and unix2dos will print only the
            names of files that contain Unix line breaks.

        Examples:

        Show information for all *.txt files:

            dos2unix -i *.txt

        Show only the number of DOS line breaks and Unix line breaks:

            dos2unix -idu *.txt

        Show only the byte order mark:

            dos2unix --info=b *.txt

        List the files that have DOS line breaks:

            dos2unix -ic *.txt

        List the files that have Unix line breaks:

            unix2dos -ic *.txt
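
        To make the three counters concrete, the sketch below tallies line
        breaks the way the description reads: a CR followed by an LF is a
        DOS break, a lone LF is a Unix break, and a lone CR is a Mac break.
        It is an illustration built on od(1) and awk(1), not the logic
        dos2unix actually uses, and the function name count_breaks is made
        up for this example.

```shell
# Print "<dos> <unix> <mac>" line-break counts for the file named in $1.
count_breaks() {
    # Dump the file as one hex byte per line, then scan the byte stream.
    od -A n -v -t x1 "$1" | tr -s ' ' '\n' | awk '
        $0 == ""                   { next }                       # skip empty lines
        prev == "0d" && $0 == "0a" { ndos++; prev = ""; next }    # CR LF pair
        prev == "0d"               { nmac++ }                     # CR not followed by LF
        $0 == "0a"                 { nunix++ }                    # LF not preceded by CR
                                   { prev = $0 }
        END {
            if (prev == "0d") nmac++                              # file ends with a CR
            printf "%d %d %d\n", ndos, nunix, nmac
        }'
}

# One DOS break, one Unix break, two Mac breaks.
sample=$(mktemp)
printf 'a\r\nb\nc\rd\r' > "$sample"
count_breaks "$sample"    # prints: 1 1 2
```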

    -k, --keepdate
        Keep the date stamp of the output file the same as that of the
        input file.

    -L, --license
        Display program's license.

    -l, --newline
        Add additional newline.

        dos2unix: Only DOS line breaks are changed to two Unix line breaks.
        In Mac mode only Mac line breaks are changed to two Unix line
        breaks.

        unix2dos: Only Unix line breaks are changed to two DOS line breaks.
        In Mac mode Unix line breaks are changed to two Mac line breaks.

    -m, --add-bom
        Write a Byte Order Mark (BOM) in the output file. By default a
        UTF-8 BOM is written.

        When the input file is UTF-16, and the option "-u" is used, a
        UTF-16 BOM will be written.

        Never use this option when the output encoding is other than UTF-8
        or UTF-16. See also section UNICODE.

    -n, --newfile INFILE OUTFILE ...
        New file mode. Convert file INFILE and write output to file OUTFILE.
        File names must be given in pairs and wildcard names should *not* be
        used or you *will* lose your files.

        The person who starts the conversion in new file (paired) mode will
        be the owner of the converted file. The read/write permissions of
        the new file will be the permissions of the original file minus the
        umask(1) of the person who runs the conversion.
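
        As a worked example of "permissions minus the umask": the bits set
        in the umask are cleared from the original mode. The helper name
        new_file_mode is hypothetical, and the sketch only reproduces the
        arithmetic; it does not call dos2unix.

```shell
# Print the mode a new output file would get: the original mode with the
# bits that are set in the umask cleared. Both arguments are octal digits.
new_file_mode() {
    printf '%03o\n' $(( 0$1 & ~0$2 ))
}

new_file_mode 664 027    # prints 640
new_file_mode 666 022    # prints 644
```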

    -o, --oldfile FILE ...
        Old file mode. Convert file FILE and overwrite it with the output.
        The program runs in this mode by default. Wildcard names may be
        used.

        In old file (in-place) mode the converted file gets the same owner,
        group, and read/write permissions as the original file. This also
        holds when the file is converted by another user who has write
        permission on the file (e.g. user root). The conversion will be
        aborted when it is not possible to preserve the original values.
        Change of owner could mean that the original owner is not able to
        read the file any more. Change of group could be a security risk:
        the file could be made readable for persons for whom it is not
        intended. Preservation of owner, group, and read/write permissions
        is only supported on Unix.

    -q, --quiet
        Quiet mode. Suppress all warnings and messages. The return value is
        zero, except when wrong command-line options are used.

    -r, --remove-bom
        Remove Byte Order Mark (BOM). Do not write a BOM in the output file.
        This is the default behavior when converting to Unix line breaks.
        See also option "-b".

    -s, --safe
        Skip binary files (default).

    -u, --keep-utf16
        Keep the original UTF-16 encoding of the input file. The output file
        will be written in the same UTF-16 encoding, little or big endian,
        as the input file. This prevents transformation to UTF-8. A UTF-16
        BOM will be written accordingly. This option can be disabled with
        the "-ascii" option.

    -ul, --assume-utf16le
        Assume that the input file format is UTF-16LE.

        When there is a Byte Order Mark in the input file the BOM has
        priority over this option.

        When you made a wrong assumption (the input file was not in UTF-16LE
        format) and the conversion succeeded, you will get a UTF-8 output
        file with wrong text. You can undo the wrong conversion with
        iconv(1) by converting the UTF-8 output file back to UTF-16LE. This
        will bring back the original file.

        The assumption of UTF-16LE works as a *conversion mode*. By
        switching to the default *ascii* mode the UTF-16LE assumption is
        turned off.
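
        The undo step can be tried without dos2unix, because the wrong
        decoding is an ordinary UTF-16LE to UTF-8 conversion. The sketch
        below uses iconv(1) for both directions. The file names are
        illustrative, and the first iconv call only stands in for what a
        mistaken "-ul" conversion would do to the text.

```shell
tmpdir=$(mktemp -d)

# A plain ASCII file that is NOT UTF-16LE.
printf 'abcd' > "$tmpdir/orig.txt"

# Stand-in for the mistaken conversion: decode the bytes as UTF-16LE
# and write UTF-8. The result is valid UTF-8, but the text is wrong.
iconv -f UTF-16LE -t UTF-8 "$tmpdir/orig.txt" > "$tmpdir/wrong.txt"

# Undo: encode the UTF-8 text back to UTF-16LE to recover the bytes.
iconv -f UTF-8 -t UTF-16LE "$tmpdir/wrong.txt" > "$tmpdir/restored.txt"

cmp "$tmpdir/orig.txt" "$tmpdir/restored.txt" && echo "original restored"
```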

    -ub, --assume-utf16be
        Assume that the input file format is UTF-16BE.

        This option works the same as option "-ul".

    -v, --verbose
        Display verbose messages. Extra information is displayed about Byte
        Order Marks and the number of converted line breaks.

    -F, --follow-symlink
        Follow symbolic links and convert the targets.

    -R, --replace-symlink
        Replace symbolic links with converted files (original target files
        remain unchanged).

    -S, --skip-symlink
        Keep symbolic links and targets unchanged (default).

    -V, --version
        Display version information and exit.

MAC MODE
    In normal mode line breaks are converted from DOS to Unix and vice
    versa. Mac line breaks are not converted.

    In Mac mode line breaks are converted from Mac to Unix and vice versa.
    DOS line breaks are not changed.

    To run in Mac mode use the command-line option "-c mac" or use the
    commands "mac2unix" or "unix2mac".
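
    The normal-mode rule can be approximated with sed(1) to see which bytes
    are affected. This is a sketch, not how dos2unix is implemented; the
    function names are invented for the example, and the input is assumed
    to end with a line break.

```shell
cr=$(printf '\r')    # a literal Carriage Return for use in sed patterns

# DOS to Unix, normal mode: remove a CR only when it ends a line,
# that is, when it directly precedes the LF. A lone (Mac) CR is kept.
dos_to_unix_sketch() { sed "s/$cr\$//"; }

# Unix to DOS, normal mode: end every line with exactly one CR LF.
# Lines that already end in CR LF come out unchanged.
unix_to_dos_sketch() { sed "s/$cr\$//; s/\$/$cr/"; }

# The CR between "b" and "c" is a Mac line break and survives.
printf 'a\r\nb\rc\n' | dos_to_unix_sketch | od -c
```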

CONVERSION MODES
    ascii
        In mode "ascii" only line breaks are converted. This is the default
        conversion mode.

        Although the name of this mode is ASCII, which is a 7 bit standard,
        the actual mode is 8 bit. Always use this mode when converting
        Unicode UTF-8 files.

    7bit
        In this mode all 8 bit non-ASCII characters (with values from 128 to
        255) are converted to a 7 bit space.
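
        The effect of 7bit mode on the text bytes can be imitated with
        tr(1). This sketch leaves line breaks alone and is not the dos2unix
        implementation; the function name is hypothetical.

```shell
# Replace every byte in the range 128-255 (octal 200-377) with a space.
# LC_ALL=C makes tr treat the input as single bytes.
to_7bit_sketch() { LC_ALL=C tr '\200-\377' '[ *]'; }

# 'caf\351' is "cafe" with an e-acute in Latin-1; the byte with octal
# value 351 is replaced by a space.
printf 'caf\351\n' | to_7bit_sketch | od -c
```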

    iso Characters are converted between a DOS character set (code page) and
        ISO character set ISO-8859-1 (Latin-1) on Unix. DOS characters
        without an ISO-8859-1 equivalent, for which conversion is not
        possible, are converted to a dot. The same applies to ISO-8859-1
        characters without a DOS counterpart.

        When only option "-iso" is used dos2unix will try to determine the
        active code page. When this is not possible dos2unix will use
        default code page CP437, which is mainly used in the USA. To force a
        specific code page use options -437 (US), -850 (Western European),
        -860 (Portuguese), -863 (French Canadian), or -865 (Nordic). Windows
        code page CP1252 (Western European) is also supported with option
        -1252. For other code pages use dos2unix in combination with
        iconv(1). Iconv can convert between a long list of character
        encodings.

        Never use ISO conversion on Unicode text files. It will corrupt
        UTF-8 encoded files.

        Some examples:

        Convert from DOS default code page to Unix Latin-1:

            dos2unix -iso -n in.txt out.txt

        Convert from DOS CP850 to Unix Latin-1:

            dos2unix -850 -n in.txt out.txt

        Convert from Windows CP1252 to Unix Latin-1:

            dos2unix -1252 -n in.txt out.txt

        Convert from Windows CP1252 to Unix UTF-8 (Unicode):

            iconv -f CP1252 -t UTF-8 in.txt | dos2unix > out.txt

        Convert from Unix Latin-1 to DOS default code page:

            unix2dos -iso -n in.txt out.txt

        Convert from Unix Latin-1 to DOS CP850:

            unix2dos -850 -n in.txt out.txt

        Convert from Unix Latin-1 to Windows CP1252:

            unix2dos -1252 -n in.txt out.txt

        Convert from Unix UTF-8 (Unicode) to Windows CP1252:

            unix2dos < in.txt | iconv -f UTF-8 -t CP1252 > out.txt

        See also <http://czyborra.com/charsets/codepages.html> and
        <http://czyborra.com/charsets/iso8859.html>.

UNICODE
  Encodings
    There exist different Unicode encodings. On Unix and Linux Unicode files
    are typically encoded in UTF-8 encoding. On Windows Unicode text files
    can be encoded in UTF-8, UTF-16, or UTF-16 big endian, but are mostly
    encoded in UTF-16 format.

  Conversion
    Unicode text files can have DOS, Unix or Mac line breaks, like regular
    text files.

    All versions of dos2unix and unix2dos can convert UTF-8 encoded files,
    because UTF-8 was designed for backward compatibility with ASCII.

    Dos2unix and unix2dos with Unicode UTF-16 support can read little and
    big endian UTF-16 encoded text files. To see if dos2unix was built with
    UTF-16 support type "dos2unix -V".

    On Unix/Linux UTF-16 encoded files are converted to the locale character
    encoding. Use the locale(1) command to find out what the locale
    character encoding is. When conversion is not possible a conversion
    error will occur and the file will be skipped.

    On Windows UTF-16 files are by default converted to UTF-8. UTF-8
    formatted text files are well supported on both Windows and Unix/Linux.

    UTF-16 and UTF-8 encoding are fully compatible; no text will be lost
    in the conversion. When a UTF-16 to UTF-8 conversion error occurs,
    for instance when the UTF-16 input file contains an error, the file will
    be skipped.

    When option "-u" is used, the output file will be written in the same
    UTF-16 encoding as the input file. Option "-u" prevents conversion to
    UTF-8.

    Dos2unix and unix2dos have no option to convert UTF-8 files to UTF-16.

    ISO and 7-bit mode conversion do not work on UTF-16 files.

  Byte Order Mark
    On Windows Unicode text files typically have a Byte Order Mark (BOM),
    because many Windows programs (including Notepad) add BOMs by default.
    See also <http://en.wikipedia.org/wiki/Byte_order_mark>.

    On Unix Unicode files typically don't have a BOM. It is assumed that
    text files are encoded in the locale character encoding.

    Dos2unix can only detect if a file is in UTF-16 format if the file has a
    BOM. When a UTF-16 file doesn't have a BOM, dos2unix will see the file
    as a binary file.

    Use option "-ul" or "-ub" to convert a UTF-16 file without BOM.

    By default, Dos2unix writes no BOM in the output file. With option "-b"
    Dos2unix writes a BOM when the input file has a BOM.

    By default, Unix2dos writes a BOM in the output file when the input file
    has a BOM. Use option "-r" to remove the BOM.

    Dos2unix and unix2dos always write a BOM when option "-m" is used.
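
    A BOM is just a fixed byte sequence at the start of the file, so its
    presence can be checked with od(1). The sketch below is an illustration
    and not dos2unix code: the function name detect_bom is invented, the
    labels imitate the ones shown by option "-i", and the byte values are
    the standard BOM encodings (EF BB BF for UTF-8, FF FE for UTF-16LE,
    FE FF for UTF-16BE, 84 31 95 33 for GB18030).

```shell
# Print which Byte Order Mark, if any, the file named in $1 starts with.
detect_bom() {
    # First four bytes as a compact hex string, e.g. "efbbbf68".
    hex=$(od -A n -t x1 -N 4 "$1" | tr -d ' \n')
    case $hex in
        efbbbf*)  echo UTF-8 ;;
        84319533) echo GB18030 ;;
        fffe*)    echo UTF-16LE ;;
        feff*)    echo UTF-16BE ;;
        *)        echo no_bom ;;
    esac
}

bomfile=$(mktemp)
printf '\357\273\277hi\n' > "$bomfile"    # UTF-8 BOM followed by "hi"
detect_bom "$bomfile"                     # prints: UTF-8
```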

  Unicode examples
    Convert from Windows UTF-16 (with BOM) to Unix UTF-8:

        dos2unix -n in.txt out.txt

    Convert from Windows UTF-16LE (without BOM) to Unix UTF-8:

        dos2unix -ul -n in.txt out.txt

    Convert from Unix UTF-8 to Windows UTF-8 with BOM:

        unix2dos -m -n in.txt out.txt

    Convert from Unix UTF-8 to Windows UTF-16:

        unix2dos < in.txt | iconv -f UTF-8 -t UTF-16 > out.txt

GB18030
    GB18030 is a Chinese government standard. A mandatory subset of the
    GB18030 standard is officially required for all software products sold
    in China. See also <http://en.wikipedia.org/wiki/GB_18030>.

    GB18030 is fully compatible with Unicode, and can be considered a
    Unicode transformation format. Like UTF-8, GB18030 is compatible with
    ASCII. GB18030 is also compatible with Windows code page 936, also known
    as GBK.

    On Unix/Linux UTF-16 files are converted to GB18030 when the locale
    encoding is set to GB18030. Note that this will only work if the
    location is set to China. For example, in a British English locale
    setting "en_GB.GB18030" conversion of UTF-16 to GB18030 will not work,
    but in a Chinese "zh_CN.GB18030" locale setting it will work.

    On Windows you need to use option "-gb" to convert UTF-16 files to
    GB18030.

    GB18030 encoded files can have a Byte Order Mark, like Unicode files.

EXAMPLES
    Read input from 'stdin' and write output to 'stdout':

        dos2unix

    Convert and replace a.txt. Convert and replace b.txt:

        dos2unix a.txt b.txt
        dos2unix -o a.txt b.txt

    Convert and replace a.txt in ascii conversion mode:

        dos2unix a.txt

    Convert and replace a.txt in ascii conversion mode, convert and replace
    b.txt in 7bit conversion mode:

        dos2unix a.txt -c 7bit b.txt
        dos2unix -c ascii a.txt -c 7bit b.txt
        dos2unix -ascii a.txt -7 b.txt

    Convert a.txt from Mac to Unix format:

        dos2unix -c mac a.txt
        mac2unix a.txt

    Convert a.txt from Unix to Mac format:

        unix2dos -c mac a.txt
        unix2mac a.txt

    Convert and replace a.txt while keeping original date stamp:

        dos2unix -k a.txt
        dos2unix -k -o a.txt

    Convert a.txt and write to e.txt:

        dos2unix -n a.txt e.txt

    Convert a.txt and write to e.txt, keep date stamp of e.txt same as
    a.txt:

        dos2unix -k -n a.txt e.txt

    Convert and replace a.txt, convert b.txt and write to e.txt:

        dos2unix a.txt -n b.txt e.txt
        dos2unix -o a.txt -n b.txt e.txt

    Convert c.txt and write to e.txt, convert and replace a.txt, convert and
    replace b.txt, convert d.txt and write to f.txt:

        dos2unix -n c.txt e.txt -o a.txt b.txt -n d.txt f.txt
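
    Because wildcards must not be used with "-n", converting many files to
    new files needs an explicit loop that builds each INFILE OUTFILE pair.
    The following is one possible sketch; the function name convert_to_dir
    and the directory names in the usage comment are placeholders.

```shell
# Convert every *.txt file in directory $1 and write the results to
# directory $2 under the same names. The originals are left untouched.
convert_to_dir() {
    for infile in "$1"/*.txt; do
        [ -e "$infile" ] || continue    # the pattern matched nothing
        dos2unix -n "$infile" "$2/${infile##*/}"
    done
}

# Usage (directory names are placeholders):
#   convert_to_dir ./incoming ./converted
```

    Writing to a separate directory keeps the loop safe to run more than
    once, since the output files never match the input pattern.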
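
RECURSIVE CONVERSION
    Use dos2unix in combination with the find(1) and xargs(1) commands to
    recursively convert text files in a directory tree structure. For
    instance, to convert all .txt files in the directory tree under the
    current directory, type:

        find . -name '*.txt' -print0 | xargs -0 dos2unix

    The pattern is quoted so that the shell passes it to find(1) unexpanded.
    The -print0 and -0 options, where find(1) and xargs(1) support them,
    keep file names that contain spaces or quotes intact.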

LOCALIZATION
    LANG
        The primary language is selected with the environment variable LANG.
        The LANG variable consists of several parts. The first part is the
        language code in lowercase letters. The second is optional and is
        the country code in capital letters, preceded by an underscore.
        There is also an optional third part: the character encoding,
        preceded by a dot. A few examples for POSIX standard type shells:

            export LANG=nl_NL            Dutch, The Netherlands
            export LANG=nl_BE            Dutch, Belgium
            export LANG=es_ES            Spanish, Spain
            export LANG=es_MX            Spanish, Mexico
            export LANG=en_US.iso88591   English, USA, Latin-1 encoding
            export LANG=en_GB.UTF-8      English, UK, UTF-8 encoding

        For a complete list of language and country codes see the gettext
        manual:
        <http://www.gnu.org/software/gettext/manual/html_node/Usual-Language

        On Unix systems you can use the command locale(1) to get locale
        specific information.

    LANGUAGE
        With the LANGUAGE environment variable you can specify a priority
        list of languages, separated by colons. Dos2unix gives preference to
        LANGUAGE over LANG. For instance, first Dutch and then German:
        "LANGUAGE=nl:de". You first have to enable localization, by setting
        LANG (or LC_ALL) to a value other than "C", before you can use a
        language priority list through the LANGUAGE variable. See also the
        gettext manual:
        <http://www.gnu.org/software/gettext/manual/html_node/The-LANGUAGE-v

        If you select a language which is not available you will get the
        standard English messages.

    DOS2UNIX_LOCALEDIR
        With the environment variable DOS2UNIX_LOCALEDIR the LOCALEDIR set
        during compilation can be overruled. LOCALEDIR is used to find the
        language files. The GNU default value is "/usr/local/share/locale".
        Option --version will display the LOCALEDIR that is used.

        Example (POSIX shell):

            export DOS2UNIX_LOCALEDIR=$HOME/share/locale

RETURN VALUE
    On success, zero is returned. When a system error occurs the last system
    error will be returned. For other errors 1 is returned.

    The return value is always zero in quiet mode, except when wrong
    command-line options are used.
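
    In a script the return value can be checked directly. The wrapper below
    is a sketch with an invented name (convert_or_report); it assumes
    in-place conversion of a single file.

```shell
# Convert one file in place and report failures on stderr.
# Do not add -q here if the exit status matters: in quiet mode the
# return value is zero for everything except wrong options.
convert_or_report() {
    if dos2unix "$1"; then
        echo "converted: $1"
    else
        status=$?
        echo "dos2unix failed with status $status: $1" >&2
        return "$status"
    fi
}
```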

STANDARDS
    <http://en.wikipedia.org/wiki/Text_file>

    <http://en.wikipedia.org/wiki/Carriage_return>

    <http://en.wikipedia.org/wiki/Newline>

    <http://en.wikipedia.org/wiki/Unicode>

AUTHORS
    Benjamin Lin - <blin@socs.uts.edu.au>, Bernd Johannes Wuebben (mac2unix
    mode) - <wuebben@kde.org>, Christian Wurll (add extra newline) -
    <wurll@ira.uka.de>, Erwin Waterlander - <waterlan@xs4all.nl>

    Project page: <http://waterlan.home.xs4all.nl/dos2unix.html>

    SourceForge page: <http://sourceforge.net/projects/dos2unix/>

SEE ALSO
    file(1) find(1) iconv(1) locale(1) xargs(1)