NAME
dos2unix - DOS/Mac to Unix and vice versa text file format converter

SYNOPSIS
    dos2unix [options] [FILE ...] [-n INFILE OUTFILE ...]
    unix2dos [options] [FILE ...] [-n INFILE OUTFILE ...]

DESCRIPTION
The Dos2unix package includes the utilities "dos2unix" and "unix2dos" to
convert plain text files in DOS or Mac format to Unix format and vice
versa.

In DOS/Windows text files a line break, also known as a newline, is a
combination of two characters: a Carriage Return (CR) followed by a Line
Feed (LF). In Unix text files a line break is a single character: the
Line Feed (LF). In Mac text files, prior to Mac OS X, a line break was a
single Carriage Return (CR) character. Nowadays Mac OS uses Unix style
(LF) line breaks.
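
As a small illustration of the three conventions, the following POSIX shell sketch writes the same two lines with each style of line break and counts the CR and LF bytes. The file names and the helper count_byte are hypothetical, and only standard tools (printf, tr, wc) are used, not dos2unix itself.

```shell
# Write the same two-line text with each line-break convention.
# The file names are made up for this illustration.
tmp=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$tmp/dos.txt"    # CR LF (DOS/Windows)
printf 'one\ntwo\n'     > "$tmp/unix.txt"   # LF    (Unix, modern Mac OS)
printf 'one\rtwo\r'     > "$tmp/mac.txt"    # CR    (Mac before Mac OS X)

# Count the occurrences of one byte ($1, a tr escape) in a file ($2).
count_byte() {
    n=$(tr -cd "$1" < "$2" | wc -c)
    echo $((n))    # arithmetic expansion strips any padding from wc
}

for f in dos unix mac; do
    echo "$f.txt: CR=$(count_byte '\r' "$tmp/$f.txt") LF=$(count_byte '\n' "$tmp/$f.txt")"
done
```

The DOS file should report two CRs and two LFs, the Unix file only LFs, and the old Mac file only CRs.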

Besides line breaks, Dos2unix can also convert the encoding of files. A
few DOS code pages can be converted to Unix Latin-1, and Windows Unicode
(UTF-16) files can be converted to Unix Unicode (UTF-8) files.

Binary files are automatically skipped, unless conversion is forced.

Non-regular files, such as directories and FIFOs, are automatically
skipped.

Symbolic links and their targets are by default kept untouched. Symbolic
links can optionally be replaced, or the output can be written to the
symbolic link target. Writing to a symbolic link target is not supported
on Windows.

Dos2unix was modelled after dos2unix under SunOS/Solaris. There is one
important difference from the original SunOS/Solaris version: this
version does in-place conversion by default (old file mode), while the
original SunOS/Solaris version only supports paired conversion (new file
mode). See also options "-o" and "-n".

OPTIONS
--  Treat all following options as file names. Use this option if you
    want to convert files whose names start with a dash. For instance, to
    convert a file named "-foo", you can use this command:

        dos2unix -- -foo

    Or in new file mode:

        dos2unix -n -- -foo out.txt

-ascii
    Convert only line breaks. This is the default conversion mode.

-iso
    Conversion between DOS and ISO-8859-1 character set. See also
    section CONVERSION MODES.

-1252
    Use Windows code page 1252 (Western European).

-437
    Use DOS code page 437 (US). This is the default code page used for
    ISO conversion.

-850
    Use DOS code page 850 (Western European).

-860
    Use DOS code page 860 (Portuguese).

-863
    Use DOS code page 863 (French Canadian).

-865
    Use DOS code page 865 (Nordic).

-7  Convert 8 bit characters to 7 bit space.

-b, --keep-bom
    Keep Byte Order Mark (BOM). When the input file has a BOM, write a
    BOM in the output file. This is the default behavior when converting
    to DOS line breaks. See also option "-r".

-c, --convmode CONVMODE
    Set conversion mode, where CONVMODE is one of: *ascii*, *7bit*,
    *iso*, *mac*, with ascii being the default.

-f, --force
    Force conversion of binary files.

-h, --help
    Display help and exit.

-i[FLAGS], --info[=FLAGS] FILE ...
    Display file information. No conversion is done.

    The following information is printed, in this order: number of DOS
    line breaks, number of Unix line breaks, number of Mac line breaks,
    byte order mark, text or binary, file name.

    Example output:

         6       0       0  no_bom    text    dos.txt
         0       6       0  no_bom    text    unix.txt
         0       0       6  no_bom    text    mac.txt
         6       6       6  no_bom    text    mixed.txt
        50       0       0  UTF-16LE  text    utf16le.txt
         0      50       0  no_bom    text    utf8unix.txt
        50       0       0  UTF-8     text    utf8dos.txt
         2     418     219  no_bom    binary  dos2unix.exe

    Optionally, extra flags can be set to change the output. One or more
    flags can be added.

    d   Print number of DOS line breaks.

    u   Print number of Unix line breaks.

    m   Print number of Mac line breaks.

    b   Print the byte order mark.

    t   Print if file is text or binary.

    c   Print only the files that would be converted.

        With the "c" flag dos2unix will print only the names of files
        that contain DOS line breaks, and unix2dos will print only the
        names of files that contain Unix line breaks.

    Examples:

    Show information for all *.txt files:

        dos2unix -i *.txt

    Show only the number of DOS line breaks and Unix line breaks:

        dos2unix -idu *.txt

    Show only the byte order mark:

        dos2unix --info=b *.txt

    List the files that have DOS line breaks:

        dos2unix -ic *.txt

    List the files that have Unix line breaks:

        unix2dos -ic *.txt
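
The default six-column output is easy to post-process. The sketch below is one possible way to list text files that mix several kinds of line break. It runs on a few sample lines copied from the example output rather than on a live dos2unix call, and the function names info_sample and mixed_files are hypothetical.

```shell
# Sample lines copied from the example output; in practice the input
# would come from a command such as: dos2unix -i *.txt
info_sample() {
cat <<'EOF'
     6       0       0  no_bom    text    dos.txt
     0       6       0  no_bom    text    unix.txt
     6       6       6  no_bom    text    mixed.txt
     2     418     219  no_bom    binary  dos2unix.exe
EOF
}

# Fields: 1=DOS breaks, 2=Unix breaks, 3=Mac breaks, 4=BOM, 5=type, 6=name.
# Print the names of text files that contain more than one kind of break.
# Assumes file names without whitespace, since awk splits on blanks.
mixed_files() {
    awk '$5 == "text" && (($1 > 0) + ($2 > 0) + ($3 > 0)) > 1 { print $6 }'
}

info_sample | mixed_files
```

For the sample lines only mixed.txt is selected; dos2unix.exe also has mixed counts but is skipped because it is reported as binary.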

-k, --keepdate
    Keep the date stamp of the output file the same as the input file.

-L, --license
    Display the program's license.

-l, --newline
    Add an additional newline.

    dos2unix: Only DOS line breaks are changed to two Unix line breaks.
    In Mac mode only Mac line breaks are changed to two Unix line
    breaks.

    unix2dos: Only Unix line breaks are changed to two DOS line breaks.
    In Mac mode Unix line breaks are changed to two Mac line breaks.
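
As a rough illustration of the dos2unix case in normal mode, the awk filter below doubles only the line breaks that were DOS line breaks and leaves Unix line breaks alone. It is a stand-in written for this explanation (the name add_newline_sketch is hypothetical), not how dos2unix is implemented.

```shell
# Rough stand-in for "dos2unix -l" in normal mode, written in awk:
# a DOS line break (CR LF) becomes two LFs, a Unix line break is kept.
add_newline_sketch() {
    awk '{
        if (sub(/\r$/, ""))   # the line ended in CR LF
            print $0 "\n"     # line + LF, plus the LF that print adds
        else
            print             # Unix line break: unchanged
    }'
}

# od -c makes the resulting line breaks visible.
printf 'dos line\r\nunix line\n' | add_newline_sketch | od -c
```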

-m, --add-bom
    Write a Byte Order Mark (BOM) in the output file. By default a
    UTF-8 BOM is written.

    When the input file is UTF-16 and the option "-u" is used, a
    UTF-16 BOM will be written.

    Never use this option when the output encoding is other than UTF-8
    or UTF-16. See also section UNICODE.

-n, --newfile INFILE OUTFILE ...
    New file mode. Convert file INFILE and write output to file OUTFILE.
    File names must be given in pairs and wildcard names should *not* be
    used or you *will* lose your files.

    The person who starts the conversion in new file (paired) mode will
    be the owner of the converted file. The read/write permissions of
    the new file will be the permissions of the original file minus the
    umask(1) of the person who runs the conversion.
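
"Minus the umask" means that every permission bit set in the umask is cleared in the new file. A worked example in shell arithmetic, with a hypothetical input file mode and umask:

```shell
# Permission bits of the new file = mode of INFILE with the umask bits
# cleared. Both values below are hypothetical examples.
infile_mode=0664    # rw-rw-r--
user_umask=0027     # clears group write and all access for others

# 0664 & ~0027 = 0640 (rw-r-----)
printf 'outfile mode: %o\n' $(( $infile_mode & ~$user_umask ))
```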

-o, --oldfile FILE ...
    Old file mode. Convert file FILE and overwrite it with the output. The
    program runs in this mode by default. Wildcard names may be used.

    In old file (in-place) mode the converted file gets the same owner,
    group, and read/write permissions as the original file. This also
    holds when the file is converted by another user who has write
    permission on the file (e.g. user root). The conversion is aborted
    when it is not possible to preserve the original values. A change of
    owner could mean that the original owner can no longer read the file.
    A change of group could be a security risk, because the file could
    become readable for persons for whom it is not intended. Preservation
    of owner, group, and read/write permissions is only supported on Unix.

-q, --quiet
    Quiet mode. Suppress all warnings and messages. The return value is
    zero, except when wrong command-line options are used.

-r, --remove-bom
    Remove Byte Order Mark (BOM). Do not write a BOM in the output file.
    This is the default behavior when converting to Unix line breaks.
    See also option "-b".

-s, --safe
    Skip binary files (default).

-u, --keep-utf16
    Keep the original UTF-16 encoding of the input file. The output file
    will be written in the same UTF-16 encoding, little or big endian,
    as the input file. This prevents transformation to UTF-8. A UTF-16
    BOM will be written accordingly. This option can be disabled with
    the "-ascii" option.

-ul, --assume-utf16le
    Assume that the input file format is UTF-16LE.

    When there is a Byte Order Mark in the input file, the BOM has
    priority over this option.

    If the assumption was wrong (the input file was not in UTF-16LE
    format) and the conversion succeeded, you will get a UTF-8 output
    file with wrong text. You can undo the wrong conversion with
    iconv(1) by converting the UTF-8 output file back to UTF-16LE. This
    will bring back the original file.

    The assumption of UTF-16LE works as a *conversion mode*. By
    switching to the default *ascii* mode the UTF-16LE assumption is
    turned off.
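
The undo step with iconv(1) can be sketched as follows. To keep the example independent of dos2unix, the wrong conversion is simulated with iconv as well; the file names are hypothetical, and the sample text has no line breaks, so the line break conversion that dos2unix would also perform is left out.

```shell
# Simulate the mistake without dos2unix: an ASCII file is decoded as if
# it were UTF-16LE, giving valid but wrong UTF-8 text.
tmp=$(mktemp -d)
printf 'abcd' > "$tmp/original.txt"
iconv -f UTF-16LE -t UTF-8 "$tmp/original.txt" > "$tmp/wrong.txt"

# Undo: convert the wrong UTF-8 output back to UTF-16LE.
iconv -f UTF-8 -t UTF-16LE "$tmp/wrong.txt" > "$tmp/restored.txt"

# cmp -s is silent and succeeds only if both files are byte-identical.
cmp -s "$tmp/original.txt" "$tmp/restored.txt" && echo "original restored"
```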

-ub, --assume-utf16be
    Assume that the input file format is UTF-16BE.

    This option works the same as option "-ul".

-v, --verbose
    Display verbose messages. Extra information is displayed about Byte
    Order Marks and the number of converted line breaks.

-F, --follow-symlink
    Follow symbolic links and convert the targets.

-R, --replace-symlink
    Replace symbolic links with converted files (original target files
    remain unchanged).

-S, --skip-symlink
    Keep symbolic links and targets unchanged (default).

-V, --version
    Display version information and exit.

MAC MODE
In normal mode line breaks are converted from DOS to Unix and vice
versa. Mac line breaks are not converted.

In Mac mode line breaks are converted from Mac to Unix and vice versa.
DOS line breaks are not changed.

To run in Mac mode use the command-line option "-c mac" or use the
commands "mac2unix" or "unix2mac".

CONVERSION MODES
ascii
    In mode "ascii" only line breaks are converted. This is the default
    conversion mode.

    Although the name of this mode is ASCII, which is a 7 bit standard,
    the actual mode is 8 bit. Always use this mode when converting
    Unicode UTF-8 files.

7bit
    In this mode all 8 bit non-ASCII characters (with values from 128 to
    255) are converted to a 7 bit space.

iso Characters are converted between a DOS character set (code page) and
    ISO character set ISO-8859-1 (Latin-1) on Unix. DOS characters
    without an ISO-8859-1 equivalent, for which conversion is not
    possible, are converted to a dot. The same applies to ISO-8859-1
    characters without a DOS counterpart.

    When only option "-iso" is used dos2unix will try to determine the
    active code page. When this is not possible dos2unix will use the
    default code page CP437, which is mainly used in the USA. To force a
    specific code page use options -437 (US), -850 (Western European),
    -860 (Portuguese), -863 (French Canadian), or -865 (Nordic). Windows
    code page CP1252 (Western European) is also supported with option
    -1252. For other code pages use dos2unix in combination with
    iconv(1). Iconv can convert between a long list of character
    encodings.

    Never use ISO conversion on Unicode text files. It will corrupt
    UTF-8 encoded files.

    Some examples:

    Convert from DOS default code page to Unix Latin-1

        dos2unix -iso -n in.txt out.txt

    Convert from DOS CP850 to Unix Latin-1

        dos2unix -850 -n in.txt out.txt

    Convert from Windows CP1252 to Unix Latin-1

        dos2unix -1252 -n in.txt out.txt

    Convert from Windows CP1252 to Unix UTF-8 (Unicode)

        iconv -f CP1252 -t UTF-8 in.txt | dos2unix > out.txt

    Convert from Unix Latin-1 to DOS default code page

        unix2dos -iso -n in.txt out.txt

    Convert from Unix Latin-1 to DOS CP850

        unix2dos -850 -n in.txt out.txt

    Convert from Unix Latin-1 to Windows CP1252

        unix2dos -1252 -n in.txt out.txt

    Convert from Unix UTF-8 (Unicode) to Windows CP1252

        unix2dos < in.txt | iconv -f UTF-8 -t CP1252 > out.txt

    See also <http://czyborra.com/charsets/codepages.html> and
    <http://czyborra.com/charsets/iso8859.html>.
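
The effect of the 7bit mode on character data can be approximated with tr(1). This is only a sketch of the byte mapping described for that mode, under the assumption of single-byte input; it does not convert line breaks, and the function name seven_bit_sketch is hypothetical.

```shell
# Approximation of the 7bit mode with tr(1): every byte with a value
# from 128 to 255 (octal 200 to 377) is replaced by a space.
# LC_ALL=C makes tr work on single bytes. Line breaks are not touched.
seven_bit_sketch() {
    LC_ALL=C tr '\200-\377' ' '
}

# "caf" followed by byte 0xE9 (e-acute in Latin-1) and a line break.
printf 'caf\351\n' | seven_bit_sketch | od -c
```

A multi-byte UTF-8 character would be turned into several spaces by this mapping, one reason to keep the default ascii mode for UTF-8 files.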

UNICODE
Different Unicode encodings exist. On Unix and Linux, Unicode files
are typically encoded in UTF-8. On Windows, Unicode text files
can be encoded in UTF-8, UTF-16, or UTF-16 big endian, but are mostly
encoded in UTF-16 format.

Unicode text files can have DOS, Unix or Mac line breaks, like regular
text files.

All versions of dos2unix and unix2dos can convert UTF-8 encoded files,
because UTF-8 was designed for backward compatibility with ASCII.

Dos2unix and unix2dos with Unicode UTF-16 support can read little and
big endian UTF-16 encoded text files. To see if dos2unix was built with
UTF-16 support type "dos2unix -V".

UTF-16 encoded files are by default converted to UTF-8. On Unix/Linux it
is required that the locale character encoding is set to UTF-8. Use the
locale(1) command to find out what the locale character encoding is.
UTF-8 formatted text files are well supported on both Windows and
Unix/Linux.

The UTF-16 and UTF-8 encodings are fully compatible; no text will be
lost in the conversion. When a UTF-16 to UTF-8 conversion error occurs,
for instance when the UTF-16 input file contains an error, the file will
be skipped.

When option "-u" is used, the output file will be written in the same
UTF-16 encoding as the input file. Option "-u" prevents conversion to
UTF-8.

Dos2unix and unix2dos have no option to convert UTF-8 files to UTF-16.

ISO and 7-bit mode conversion do not work on UTF-16 files.

On Windows, Unicode text files typically have a Byte Order Mark (BOM),
because many Windows programs (including Notepad) add BOMs by default.
See also <http://en.wikipedia.org/wiki/Byte_order_mark>.

On Unix, Unicode files typically don't have a BOM. It is assumed that
text files are encoded in the locale character encoding.

Dos2unix can only detect if a file is in UTF-16 format if the file has a
BOM. When a UTF-16 file doesn't have a BOM, dos2unix will see the file
as a binary file.

Use option "-ul" or "-ub" to convert a UTF-16 file without BOM.
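
To check by hand whether a file starts with a BOM, its first bytes can be inspected with od(1). The sketch below is a simplified illustration and not the detection code of dos2unix; the function name bom_type and the file names are hypothetical, and the labels are borrowed from the byte order mark column of the "-i" output.

```shell
# Report the BOM of a file by looking at its first bytes.
bom_type() {
    # od prints up to three leading bytes as hex; tr drops the blanks.
    sig=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    case $sig in
        efbbbf) echo UTF-8 ;;       # EF BB BF
        fffe*)  echo UTF-16LE ;;    # FF FE
        feff*)  echo UTF-16BE ;;    # FE FF
        *)      echo no_bom ;;
    esac
}

tmp=$(mktemp -d)
printf '\377\376h\000i\000' > "$tmp/utf16le.txt"   # BOM + "hi" in UTF-16LE
printf 'hi\n' > "$tmp/plain.txt"                   # no BOM

bom_type "$tmp/utf16le.txt"
bom_type "$tmp/plain.txt"
```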

By default, Dos2unix writes no BOM in the output file. With option "-b"
Dos2unix writes a BOM when the input file has a BOM.

By default, Unix2dos writes a BOM in the output file when the input file
has a BOM. Use option "-r" to remove the BOM.

Dos2unix and unix2dos always write a BOM when option "-m" is used.

Convert from Windows UTF-16 (with BOM) to Unix UTF-8

    dos2unix -n in.txt out.txt

Convert from Windows UTF-16LE (without BOM) to Unix UTF-8

    dos2unix -ul -n in.txt out.txt

Convert from Unix UTF-8 to Windows UTF-8 with BOM

    unix2dos -m -n in.txt out.txt

Convert from Unix UTF-8 to Windows UTF-16

    unix2dos < in.txt | iconv -f UTF-8 -t UTF-16 > out.txt

EXAMPLES
Read input from 'stdin' and write output to 'stdout'.

    dos2unix
    dos2unix -l -c mac

Convert and replace a.txt. Convert and replace b.txt.

    dos2unix a.txt b.txt
    dos2unix -o a.txt b.txt

Convert and replace a.txt in ascii conversion mode.

    dos2unix a.txt

Convert and replace a.txt in ascii conversion mode. Convert and replace
b.txt in 7bit conversion mode.

    dos2unix a.txt -c 7bit b.txt
    dos2unix -c ascii a.txt -c 7bit b.txt
    dos2unix -ascii a.txt -7 b.txt

Convert a.txt from Mac to Unix format.

    dos2unix -c mac a.txt
    mac2unix a.txt

Convert a.txt from Unix to Mac format.

    unix2dos -c mac a.txt
    unix2mac a.txt

Convert and replace a.txt while keeping the original date stamp.

    dos2unix -k a.txt
    dos2unix -k -o a.txt

Convert a.txt and write to e.txt.

    dos2unix -n a.txt e.txt

Convert a.txt and write to e.txt, keeping the date stamp of e.txt the
same as a.txt.

    dos2unix -k -n a.txt e.txt

Convert and replace a.txt. Convert b.txt and write to e.txt.

    dos2unix a.txt -n b.txt e.txt
    dos2unix -o a.txt -n b.txt e.txt

Convert c.txt and write to e.txt. Convert and replace a.txt. Convert and
replace b.txt. Convert d.txt and write to f.txt.

    dos2unix -n c.txt e.txt -o a.txt b.txt -n d.txt f.txt

RECURSIVE CONVERSION
Use dos2unix in combination with the find(1) and xargs(1) commands to
recursively convert text files in a directory tree structure. For
instance, to convert all .txt files in the directory tree under the
current directory type:

    find . -name '*.txt' | xargs dos2unix
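
File names that contain spaces or other special characters can break a find and xargs pipeline, because xargs splits its input on whitespace. One possible alternative, sketched here on a hypothetical directory tree, is to let find run the command itself with "-exec ... {} +". The conversion step is guarded so that the sketch also runs where dos2unix is not installed.

```shell
# Build a small hypothetical tree with an awkward file name.
tree=$(mktemp -d)
mkdir -p "$tree/sub dir"
printf 'a\r\n' > "$tree/a.txt"
printf 'b\r\n' > "$tree/sub dir/b c.txt"
printf 'x\r\n' > "$tree/skip.dat"

# Show which files the pattern selects. Quoting '*.txt' keeps the shell
# from expanding it before find sees it.
find "$tree" -type f -name '*.txt' -exec ls -l {} +

# Convert them in place; find passes each name as a separate argument.
if command -v dos2unix >/dev/null 2>&1; then
    find "$tree" -type f -name '*.txt' -exec dos2unix {} +
fi
```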

LOCALIZATION
LANG
    The primary language is selected with the environment variable LANG.
    The LANG variable consists of several parts. The first part is
    the language code in lowercase letters. The second is optional and is
    the country code in capital letters, preceded by an underscore.
    There is also an optional third part: the character encoding,
    preceded by a dot. A few examples for POSIX standard type shells:

        export LANG=nl               Dutch
        export LANG=nl_NL            Dutch, The Netherlands
        export LANG=nl_BE            Dutch, Belgium
        export LANG=es_ES            Spanish, Spain
        export LANG=es_MX            Spanish, Mexico
        export LANG=en_US.iso88591   English, USA, Latin-1 encoding
        export LANG=en_GB.UTF-8      English, UK, UTF-8 encoding

    For a complete list of language and country codes see the gettext
    manual:
    <http://www.gnu.org/software/gettext/manual/gettext.html#Language-Co

    On Unix systems you can use the command locale(1) to get locale
    specific information.
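
The three parts of a LANG value can be separated with POSIX parameter expansion. The sketch below only illustrates the format described above; the variable names are hypothetical and the value is just an example.

```shell
# Split a LANG value into language, country, and encoding.
lang_value=en_GB.UTF-8

base=${lang_value%%.*}          # en_GB (encoding removed)
language=${base%%_*}            # en
case $base in
    *_*) country=${base#*_} ;;  # GB
    *)   country= ;;            # no country part given
esac
case $lang_value in
    *.*) encoding=${lang_value#*.} ;;   # UTF-8
    *)   encoding= ;;                   # no encoding part given
esac

echo "language=$language country=$country encoding=$encoding"
```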

LANGUAGE
    With the LANGUAGE environment variable you can specify a priority
    list of languages, separated by colons. Dos2unix gives preference to
    LANGUAGE over LANG. For instance, first Dutch and then German:
    "LANGUAGE=nl:de". You have to first enable localization, by setting
    LANG (or LC_ALL) to a value other than "C", before you can use a
    language priority list through the LANGUAGE variable. See also the
    gettext manual:
    <http://www.gnu.org/software/gettext/manual/gettext.html#The-LANGUAG

    If you select a language which is not available you will get the
    standard English messages.

DOS2UNIX_LOCALEDIR
    With the environment variable DOS2UNIX_LOCALEDIR the LOCALEDIR set
    during compilation can be overruled. LOCALEDIR is used to find the
    language files. The GNU default value is "/usr/local/share/locale".
    Option --version will display the LOCALEDIR that is used.

    Example (POSIX shell):

        export DOS2UNIX_LOCALEDIR=$HOME/share/locale

RETURN VALUE
On success, zero is returned. When a system error occurs, the last
system error is returned. For other errors, 1 is returned.

The return value is always zero in quiet mode, except when wrong
command-line options are used.

STANDARDS
<http://en.wikipedia.org/wiki/Text_file>

<http://en.wikipedia.org/wiki/Carriage_return>

<http://en.wikipedia.org/wiki/Newline>

<http://en.wikipedia.org/wiki/Unicode>

AUTHORS
Benjamin Lin - <blin@socs.uts.edu.au>, Bernd Johannes Wuebben (mac2unix
mode) - <wuebben@kde.org>, Christian Wurll (add extra newline) -
<wurll@ira.uka.de>, Erwin Waterlander - <waterlan@xs4all.nl>

Project page: <http://waterlan.home.xs4all.nl/dos2unix.html>

SourceForge page: <http://sourceforge.net/projects/dos2unix/>

SEE ALSO
file(1) find(1) iconv(1) locale(1) xargs(1)