NAME

dos2unix - DOS/Mac to Unix and vice versa text file format converter

SYNOPSIS

    dos2unix [options] [FILE ...] [-n INFILE OUTFILE ...]
    unix2dos [options] [FILE ...] [-n INFILE OUTFILE ...]

DESCRIPTION

The Dos2unix package includes utilities "dos2unix" and "unix2dos" to
convert plain text files in DOS or Mac format to Unix format and vice
versa.

In DOS/Windows text files a line break, also known as newline, is a
combination of two characters: a Carriage Return (CR) followed by a Line
Feed (LF). In Unix text files a line break is a single character: the
Line Feed (LF). In Mac text files, prior to Mac OS X, a line break was a
single Carriage Return (CR) character. Nowadays Mac OS uses Unix style
(LF) line breaks.
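
As a quick illustration of the three conventions, the sketch below writes
the same two-line text with each kind of line break and dumps the bytes.
It uses only printf(1), mktemp(1) and od(1); the scratch directory and
the file names are placeholders chosen for this example, not anything
dos2unix requires.

```shell
# Scratch directory for the three sample files (names are illustrative).
dir=$(mktemp -d)

printf 'one\r\ntwo\r\n' > "$dir/dos.txt"    # DOS/Windows: CR LF
printf 'one\ntwo\n'     > "$dir/unix.txt"   # Unix: LF
printf 'one\rtwo\r'     > "$dir/mac.txt"    # Mac OS before OS X: CR

# od -c prints every byte; CR shows up as \r and LF as \n.
od -c "$dir/dos.txt"
od -c "$dir/unix.txt"
od -c "$dir/mac.txt"
```

The DOS file is two bytes longer than the other two because each of its
two line breaks takes two characters.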

Besides line breaks Dos2unix can also convert the encoding of files. A
few DOS code pages can be converted to Unix Latin-1, and Windows Unicode
(UTF-16) files can be converted to Unix Unicode (UTF-8) files.

Binary files are automatically skipped, unless conversion is forced.

Non-regular files, such as directories and FIFOs, are automatically
skipped.

Symbolic links and their targets are by default kept untouched. Symbolic
links can optionally be replaced, or the output can be written to the
symbolic link target. Writing to a symbolic link target is not supported
on Windows.

Dos2unix was modelled after dos2unix under SunOS/Solaris. There is one
important difference from the original SunOS/Solaris version: this
version does in-place conversion by default (old file mode), while the
original SunOS/Solaris version only supports paired conversion (new file
mode). See also options "-o" and "-n".

OPTIONS

--
    Treat all following options as file names. Use this option if you
    want to convert files whose names start with a dash. For instance,
    to convert a file named "-foo", you can use this command:

        dos2unix -- -foo

    Or in new file mode:

        dos2unix -n -- -foo out.txt

-ascii
    Convert only line breaks. This is the default conversion mode.

-iso
    Conversion between DOS and ISO-8859-1 character set. See also
    section CONVERSION MODES.

-1252
    Use Windows code page 1252 (Western European).

-437
    Use DOS code page 437 (US). This is the default code page used for
    ISO conversion.

-850
    Use DOS code page 850 (Western European).

-860
    Use DOS code page 860 (Portuguese).

-863
    Use DOS code page 863 (French Canadian).

-865
    Use DOS code page 865 (Nordic).

-7
    Convert 8 bit characters to 7 bit space.

-b, --keep-bom
    Keep Byte Order Mark (BOM). When the input file has a BOM, write a
    BOM in the output file. This is the default behavior when converting
    to DOS line breaks. See also option "-r".

-c, --convmode CONVMODE
    Set conversion mode. CONVMODE is one of: *ascii*, *7bit*, *iso*,
    *mac*, with ascii being the default.

-f, --force
    Force conversion of binary files.

-h, --help
    Display help and exit.

-k, --keepdate
    Keep the date stamp of the output file the same as the input file.

-L, --license
    Display program's license.

-l, --newline
    Add additional newline.

    dos2unix: Only DOS line breaks are changed to two Unix line breaks.
    In Mac mode only Mac line breaks are changed to two Unix line
    breaks.

    unix2dos: Only Unix line breaks are changed to two DOS line breaks.
    In Mac mode Unix line breaks are changed to two Mac line breaks.

-m, --add-bom
    Write a Byte Order Mark (BOM) in the output file. By default a
    UTF-8 BOM is written.

    When the input file is UTF-16, and the option "-u" is used, a
    UTF-16 BOM will be written.

    Never use this option when the output encoding is other than UTF-8
    or UTF-16. See also section UNICODE.

-n, --newfile INFILE OUTFILE ...
    New file mode. Convert file INFILE and write output to file OUTFILE.
    File names must be given in pairs and wildcard names should *not* be
    used or you *will* lose your files.

    The person who starts the conversion in new file (paired) mode will
    be the owner of the converted file. The read/write permissions of
    the new file will be the permissions of the original file minus the
    umask(1) of the person who runs the conversion.
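
One way to read "the permissions of the original file minus the umask"
is as a bit mask: every permission bit that is set in the umask is
cleared from the original mode. The sketch below computes that with
POSIX shell arithmetic. The function name masked_mode is made up for
this example and is not part of dos2unix.

```shell
# masked_mode ORIGINAL_MODE UMASK
# Both arguments are octal strings with a leading 0, e.g. 0666 and 0022.
masked_mode() {
    # The leading 0 makes the shell read the constants as octal;
    # "& ~" clears every bit that is set in the umask.
    printf '%04o\n' "$(( $1 & ~$2 ))"
}

masked_mode 0666 0022   # prints 0644
masked_mode 0664 0027   # prints 0640
```

With the common umask 022, a world-writable original (0666) therefore
yields a new file that only its owner can write.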

-o, --oldfile FILE ...
    Old file mode. Convert file FILE and overwrite it with the output.
    The program runs in this mode by default. Wildcard names may be
    used.

    In old file (in-place) mode the converted file gets the same owner,
    group, and read/write permissions as the original file. This also
    holds when the file is converted by another user who has write
    permission on the file (e.g. user root). The conversion is aborted
    when it is not possible to preserve the original values. A change of
    owner could mean that the original owner is no longer able to read
    the file. A change of group could be a security risk, because the
    file could become readable by persons for whom it is not intended.
    Preservation of owner, group, and read/write permissions is only
    supported on Unix.

-q, --quiet
    Quiet mode. Suppress all warnings and messages. The return value is
    zero, except when wrong command-line options are used.

-r, --remove-bom
    Remove Byte Order Mark (BOM). Do not write a BOM in the output file.
    This is the default behavior when converting to Unix line breaks.
    See also option "-b".

-s, --safe
    Skip binary files (default).

-u, --keep-utf16
    Keep the original UTF-16 encoding of the input file. The output file
    will be written in the same UTF-16 encoding, little or big endian,
    as the input file. This prevents transformation to UTF-8. A UTF-16
    BOM will be written accordingly. This option can be disabled with
    the "-ascii" option.

-ul, --assume-utf16le
    Assume that the input file format is UTF-16LE.

    When there is a Byte Order Mark in the input file, the BOM has
    priority over this option.

    If the assumption was wrong (the input file was not in UTF-16LE
    format) and the conversion succeeded, you will get a UTF-8 output
    file with wrong text. You can undo the wrong conversion with
    iconv(1) by converting the UTF-8 output file back to UTF-16LE. This
    will bring back the original file.
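
The undo step described above can be tried without dos2unix, because
the re-encoding itself is a plain UTF-16LE to UTF-8 transformation. The
sketch below assumes an iconv(1) that accepts the encoding names UTF-8
and UTF-16LE. The sample bytes and variable names are invented for
illustration, and line-break conversion is left out.

```shell
# Four ASCII bytes that are NOT UTF-16LE text.
original='abcd'

# Wrongly treat them as UTF-16LE: the byte pairs "ab" and "cd" decode
# as two unrelated characters, which are then written out as UTF-8.
wrong=$(printf '%s' "$original" | iconv -f UTF-16LE -t UTF-8)

# Converting the UTF-8 result back to UTF-16LE restores the bytes.
restored=$(printf '%s' "$wrong" | iconv -f UTF-8 -t UTF-16LE)

printf '%s\n' "$restored"
```

For a real file the reverse step would look like
"iconv -f UTF-8 -t UTF-16LE out.txt > restored.txt", where both file
names are placeholders.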

    The assumption of UTF-16LE works as a *conversion mode*. By
    switching to the default *ascii* mode the UTF-16LE assumption is
    turned off.

-ub, --assume-utf16be
    Assume that the input file format is UTF-16BE.

    This option works the same as option "-ul".

-v, --verbose
    Display verbose messages. Extra information is displayed about Byte
    Order Marks and the number of converted line breaks.

-F, --follow-symlink
    Follow symbolic links and convert the targets.

-R, --replace-symlink
    Replace symbolic links with converted files (original target files
    remain unchanged).

-S, --skip-symlink
    Keep symbolic links and targets unchanged (default).

-V, --version
    Display version information and exit.

MAC MODE

In normal mode line breaks are converted from DOS to Unix and vice
versa. Mac line breaks are not converted.

In Mac mode line breaks are converted from Mac to Unix and vice versa.
DOS line breaks are not changed.

To run in Mac mode use the command-line option "-c mac" or use the
commands "mac2unix" or "unix2mac".

CONVERSION MODES

ascii
    In mode "ascii" only line breaks are converted. This is the default
    conversion mode.

    Although the name of this mode is ASCII, which is a 7 bit standard,
    the actual mode is 8 bit. Always use this mode when converting
    Unicode UTF-8 files.

7bit
    In this mode all 8 bit non-ASCII characters (with values from 128 to
    255) are converted to a 7 bit space.

iso
    Characters are converted between a DOS character set (code page) and
    ISO character set ISO-8859-1 (Latin-1) on Unix. DOS characters
    without ISO-8859-1 equivalent, for which conversion is not possible,
    are converted to a dot. The same applies to ISO-8859-1 characters
    without a DOS counterpart.

    When only option "-iso" is used dos2unix will try to determine the
    active code page. When this is not possible dos2unix will use the
    default code page CP437, which is mainly used in the USA. To force a
    specific code page use options -437 (US), -850 (Western European),
    -860 (Portuguese), -863 (French Canadian), or -865 (Nordic). Windows
    code page CP1252 (Western European) is also supported with option
    -1252. For other code pages use dos2unix in combination with
    iconv(1). Iconv can convert between a long list of character
    encodings.

    Never use ISO conversion on Unicode text files. It will corrupt
    UTF-8 encoded files.

    Some examples:

    Convert from DOS default code page to Unix Latin-1

        dos2unix -iso -n in.txt out.txt

    Convert from DOS CP850 to Unix Latin-1

        dos2unix -850 -n in.txt out.txt

    Convert from Windows CP1252 to Unix Latin-1

        dos2unix -1252 -n in.txt out.txt

    Convert from Windows CP1252 to Unix UTF-8 (Unicode)

        iconv -f CP1252 -t UTF-8 in.txt | dos2unix > out.txt

    Convert from Unix Latin-1 to DOS default code page

        unix2dos -iso -n in.txt out.txt

    Convert from Unix Latin-1 to DOS CP850

        unix2dos -850 -n in.txt out.txt

    Convert from Unix Latin-1 to Windows CP1252

        unix2dos -1252 -n in.txt out.txt

    Convert from Unix UTF-8 (Unicode) to Windows CP1252

        unix2dos < in.txt | iconv -f UTF-8 -t CP1252 > out.txt

    See also <http://czyborra.com/charsets/codepages.html> and
    <http://czyborra.com/charsets/iso8859.html>.
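
The character mapping of the 7bit mode can be approximated with tr(1),
which may help when checking what the mode would do to a file. This is
only a sketch of the byte replacement described above, not how dos2unix
implements it, and it does not touch line breaks. The function name
to_7bit is hypothetical.

```shell
# Replace every byte in the range 128-255 (octal 200-377) by a space.
# LC_ALL=C makes tr work on single bytes rather than multibyte characters.
to_7bit() {
    LC_ALL=C tr '\200-\377' ' '
}

# "caf" followed by byte 0xE9 (e-acute in Latin-1); od -c shows the
# result byte by byte.
printf 'caf\351\n' | to_7bit | od -c
```

Plain ASCII input passes through unchanged, so the filter is safe to run
on files that are already 7 bit.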

UNICODE

Encodings

There exist different Unicode encodings. On Unix and Linux Unicode files
are typically encoded in UTF-8. On Windows Unicode text files can be
encoded in UTF-8, UTF-16, or UTF-16 big endian, but are mostly encoded
in UTF-16 format.

Conversion

Unicode text files can have DOS, Unix or Mac line breaks, like regular
text files.

All versions of dos2unix and unix2dos can convert UTF-8 encoded files,
because UTF-8 was designed for backward compatibility with ASCII.

Dos2unix and unix2dos with Unicode UTF-16 support can read little and
big endian UTF-16 encoded text files. To see if dos2unix was built with
UTF-16 support type "dos2unix -V".

UTF-16 encoded files are by default converted to UTF-8. On Unix/Linux it
is required that the locale character encoding is set to UTF-8. Use the
locale(1) command to find out what the locale character encoding is.
UTF-8 formatted text files are well supported on both Windows and
Unix/Linux.

UTF-16 and UTF-8 encoding are fully compatible, so no text is lost in
the conversion. When a UTF-16 to UTF-8 conversion error occurs, for
instance when the UTF-16 input file contains an error, the file will
be skipped.

When option "-u" is used, the output file will be written in the same
UTF-16 encoding as the input file. Option "-u" prevents conversion to
UTF-8.

Dos2unix and unix2dos have no option to convert UTF-8 files to UTF-16.

ISO and 7-bit mode conversion do not work on UTF-16 files.

Byte Order Mark

On Windows Unicode text files typically have a Byte Order Mark (BOM),
because many Windows programs (including Notepad) add BOMs by default.
See also <http://en.wikipedia.org/wiki/Byte_order_mark>.

On Unix Unicode files typically don't have a BOM. It is assumed that
text files are encoded in the locale character encoding.

Dos2unix can only detect if a file is in UTF-16 format if the file has a
BOM. When a UTF-16 file doesn't have a BOM, dos2unix will see the file
as a binary file.

Use option "-ul" or "-ub" to convert a UTF-16 file without BOM.
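
Because detection depends on the BOM, it can be useful to look at the
first bytes of a file before choosing between the default behavior and
"-ul" or "-ub". The following sketch uses od(1) to do that. detect_bom
is a helper written for this example, not a dos2unix feature, and it
only knows the UTF-8 and UTF-16 signatures (EF BB BF, FF FE, FE FF).

```shell
# detect_bom FILE
# Print the encoding suggested by the file's Byte Order Mark, or "none".
detect_bom() {
    # First three bytes as lowercase hex digits, without separators.
    sig=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    case "$sig" in
        efbbbf*) echo "UTF-8" ;;
        fffe*)   echo "UTF-16LE" ;;
        feff*)   echo "UTF-16BE" ;;
        *)       echo "none" ;;
    esac
}
```

A result of "none" for a file known to be UTF-16 is the case where the
byte order has to be supplied by hand with "-ul" or "-ub".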

Dos2unix writes no BOM in the output file by default. With option "-b"
Dos2unix writes a BOM when the input file has a BOM.

Unix2dos writes a BOM in the output file by default when the input file
has a BOM. Use option "-r" to remove the BOM.

Dos2unix and unix2dos always write a BOM when option "-m" is used.

Unicode examples

Convert from Windows UTF-16 (with BOM) to Unix UTF-8

    dos2unix -n in.txt out.txt

Convert from Windows UTF-16LE (without BOM) to Unix UTF-8

    dos2unix -ul -n in.txt out.txt

Convert from Unix UTF-8 to Windows UTF-8 with BOM

    unix2dos -m -n in.txt out.txt

Convert from Unix UTF-8 to Windows UTF-16

    unix2dos < in.txt | iconv -f UTF-8 -t UTF-16 > out.txt

EXAMPLES

Read input from 'stdin' and write output to 'stdout'.

    dos2unix
    dos2unix -l -c mac

Convert and replace a.txt. Convert and replace b.txt.

    dos2unix a.txt b.txt
    dos2unix -o a.txt b.txt

Convert and replace a.txt in ascii conversion mode.

    dos2unix a.txt

Convert and replace a.txt in ascii conversion mode. Convert and replace
b.txt in 7bit conversion mode.

    dos2unix a.txt -c 7bit b.txt
    dos2unix -c ascii a.txt -c 7bit b.txt
    dos2unix -ascii a.txt -7 b.txt

Convert a.txt from Mac to Unix format.

    dos2unix -c mac a.txt
    mac2unix a.txt

Convert a.txt from Unix to Mac format.

    unix2dos -c mac a.txt
    unix2mac a.txt

Convert and replace a.txt while keeping original date stamp.

    dos2unix -k a.txt
    dos2unix -k -o a.txt

Convert a.txt and write to e.txt.

    dos2unix -n a.txt e.txt

Convert a.txt and write to e.txt, keeping the date stamp of e.txt the
same as a.txt.

    dos2unix -k -n a.txt e.txt

Convert and replace a.txt. Convert b.txt and write to e.txt.

    dos2unix a.txt -n b.txt e.txt
    dos2unix -o a.txt -n b.txt e.txt

Convert c.txt and write to e.txt. Convert and replace a.txt. Convert and
replace b.txt. Convert d.txt and write to f.txt.

    dos2unix -n c.txt e.txt -o a.txt b.txt -n d.txt f.txt

RECURSIVE CONVERSION

Use dos2unix in combination with the find(1) and xargs(1) commands to
recursively convert text files in a directory tree structure. For
instance, to convert all .txt files in the directory tree under the
current directory type:

    find . -name '*.txt' | xargs dos2unix
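
File names that contain spaces or newlines can trip up a plain find and
xargs pipeline. A variant that avoids the pipe altogether is the
"-exec ... {} +" form of find(1), sketched below. The function
convert_tree and its optional second argument (a stand-in command for
dry runs) are inventions of this example.

```shell
# convert_tree [DIR] [COMMAND]
# Run COMMAND (default: dos2unix) on every regular *.txt file below DIR.
convert_tree() {
    dir=${1:-.}
    cmd=${2:-dos2unix}
    # The quotes keep the shell from expanding *.txt itself, and
    # "{} +" hands many file names to each invocation of the command.
    find "$dir" -type f -name '*.txt' -exec "$cmd" {} +
}

# Dry run: list what would be converted, without changing any file.
convert_tree . echo
```

Calling convert_tree with only a directory runs the real conversion in
place, so a dry run with echo first is a cheap safeguard.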

LOCALIZATION

LANG
    The primary language is selected with the environment variable LANG.
    The LANG variable consists of several parts. The first part is the
    language code in lowercase letters. The second part is optional and
    is the country code in capital letters, preceded by an underscore.
    There is also an optional third part: the character encoding,
    preceded by a dot. A few examples for POSIX standard type shells:

        export LANG=nl_NL            Dutch, The Netherlands
        export LANG=nl_BE            Dutch, Belgium
        export LANG=es_ES            Spanish, Spain
        export LANG=es_MX            Spanish, Mexico
        export LANG=en_US.iso88591   English, USA, Latin-1 encoding
        export LANG=en_GB.UTF-8      English, UK, UTF-8 encoding
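
The three parts of a LANG value can be pulled apart with POSIX parameter
expansion, which makes the structure described above concrete.
parse_lang is a hypothetical helper for this example; it ignores the
optional @modifier that some locale names carry.

```shell
# parse_lang VALUE
# Split a value such as en_GB.UTF-8 into language, country and encoding.
parse_lang() {
    value=$1
    country=
    encoding=
    # Third part: everything after the first dot, if there is one.
    case "$value" in
        *.*) encoding=${value#*.}; value=${value%%.*} ;;
    esac
    # Second part: everything after the underscore, if there is one.
    case "$value" in
        *_*) country=${value#*_}; value=${value%%_*} ;;
    esac
    echo "language=$value country=$country encoding=$encoding"
}

parse_lang en_GB.UTF-8
parse_lang nl_NL
```

Parts that are absent from the value are reported as empty strings.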

    For a complete list of language and country codes see the gettext
    manual:
    <http://www.gnu.org/software/gettext/manual/gettext.html#Language-Co

    On Unix systems you can use the locale(1) command to get
    locale-specific information.

LANGUAGE
    With the LANGUAGE environment variable you can specify a priority
    list of languages, separated by colons. Dos2unix gives preference to
    LANGUAGE over LANG. For instance, first Dutch and then German:
    "LANGUAGE=nl:de". You have to enable localization first, by setting
    LANG (or LC_ALL) to a value other than "C", before you can use a
    language priority list through the LANGUAGE variable. See also the
    gettext manual:
    <http://www.gnu.org/software/gettext/manual/gettext.html#The-LANGUAG

    If you select a language which is not available you will get the
    standard English messages.

DOS2UNIX_LOCALEDIR
    With the environment variable DOS2UNIX_LOCALEDIR the LOCALEDIR set
    during compilation can be overridden. LOCALEDIR is used to find the
    language files. The GNU default value is "/usr/local/share/locale".
    Option --version will display the LOCALEDIR that is used.

    Example (POSIX shell):

        export DOS2UNIX_LOCALEDIR=$HOME/share/locale

RETURN VALUE

On success, zero is returned. When a system error occurs the last system
error will be returned. For other errors 1 is returned.

The return value is always zero in quiet mode, except when wrong
command-line options are used.

STANDARDS

<http://en.wikipedia.org/wiki/Text_file>

<http://en.wikipedia.org/wiki/Carriage_return>

<http://en.wikipedia.org/wiki/Newline>

<http://en.wikipedia.org/wiki/Unicode>

AUTHORS

Benjamin Lin - <blin@socs.uts.edu.au>, Bernd Johannes Wuebben (mac2unix
mode) - <wuebben@kde.org>, Christian Wurll (add extra newline) -
<wurll@ira.uka.de>, Erwin Waterlander - <waterlan@xs4all.nl>

Project page: <http://waterlan.home.xs4all.nl/dos2unix.html>

SourceForge page: <http://sourceforge.net/projects/dos2unix/>

SEE ALSO

file(1) find(1) iconv(1) locale(1) xargs(1)