1 This is wdiff.info, produced by makeinfo version 4.13 from wdiff.texi.
3 INFO-DIR-SECTION Text creation and manipulation
5 * Word differences: (wdiff). GNU wdiff and diff related tools.
8 INFO-DIR-SECTION Individual utilities
10 * wdiff: (wdiff)wdiff invocation. Word difference finder.
11 * wdiff2: (wdiff)wdiff invocation. Word difference finder.
12 * mdiff: (wdiff)mdiff invocation. Line cluster finder.
13 * unify: (wdiff)unify invocation. Diff format converter.
This file documents the `wdiff' command, which compares two files,
finding which words have been deleted from or added to the first in
order to obtain the second. It also documents some other `diff' related tools.
20 Copyright (C) 1992, 1994, 1997, 1998, 1999, 2010, 2011, 2012, 2013
21 Free Software Foundation, Inc.
23 Permission is granted to make and distribute verbatim copies of this
24 manual provided the copyright notice and this permission notice are
25 preserved on all copies.
27 Permission is granted to copy and distribute modified versions of
28 this manual under the conditions for verbatim copying, provided that
29 the entire resulting derived work is distributed under the terms of a
30 permission notice identical to this one.
32 Permission is granted to copy and distribute translations of this
33 manual into another language, under the above conditions for modified
34 versions, except that this permission notice may be stated in a
35 translation approved by the Foundation.
38 File: wdiff.info, Node: Top, Next: Overview, Up: (dir)
43 These Info pages document `wdiff', a word difference finder, together
44 with some other `diff' related tools. Note that some tools documented
45 here are considered experimental and may not be part of every `wdiff'
46 installation. *Note Experimental::.
48 This is for release 1.2.1.
53 * wdiff:: The word difference finder
54 * mdiff:: The line cluster finder
55 * unify:: The diff format converter
56 * Compatibility:: How `mdiff' differs
57 * Experimental:: About experimental programs
60 --- The Detailed Node Listing ---
62 The word difference finder
64 * wdiff invocation:: Invoking `wdiff'
65 * wdiff Examples:: Actual examples of `wdiff' usage
67 The multi-difference finder
69 * mdiff invocation:: Invoking `mdiff'
70 * Efficiency:: Resource considerations and efficiency
72 The diff format converter
74 * unify invocation:: Invoking `unify'
78 * diff Compatibility:: Differences with `diff'
79 * wdiff Compatibility:: Differences with `wdiff'
83 * Experimental History:: History of the experimental programs
86 File: wdiff.info, Node: Overview, Next: wdiff, Prev: Top, Up: Top
`wdiff' is a front end to `diff' for comparing files on a word-by-word
basis. It works by creating two temporary files, one word per line, and
then executing `diff' on these files. It collects the `diff' output and
uses it to produce a nicer display of word differences between the
two files.
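The approach just described can be approximated with standard tools.
The following is only a sketch under stated assumptions, not the actual
`wdiff' implementation: the function names `words_per_line' and
`word_diff' are hypothetical, and the result is raw `diff' output
rather than the nicer display `wdiff' produces.

```shell
# Sketch only: approximates the "one word per line, then diff" idea.
# Function names are hypothetical, not part of wdiff.

# Print the words of FILE, one per line, splitting on whitespace.
words_per_line () {
  tr -s '[:space:]' '\n' < "$1" | sed '/^$/d'
}

# Run `diff' on the word lists of OLD_FILE and NEW_FILE.
# The exit status is the one returned by `diff'.
word_diff () {
  old_words=$(mktemp) && new_words=$(mktemp) || return 2
  words_per_line "$1" > "$old_words"
  words_per_line "$2" > "$new_words"
  diff "$old_words" "$new_words"
  status=$?
  rm -f "$old_words" "$new_words"
  return $status
}
```

Because each word sits on its own line, refilling a paragraph does not
change the word lists, so only genuine word changes are reported.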
97 `mdiff' studies one or more input files altogether, and discovers
98 blocks of items which repeat at more than one place. Items may be
99 lines, words, or units defined by user. When in word mode, `mdiff'
100 compares two files, finding which words have been deleted or added to
101 the first in order to create the second, which is useful when two texts
102 differ only by a few words and paragraphs have been refilled. The
103 program has many output formats and interacts well with terminals and
104 pagers (notably with `less').
`unify' is able to convert context diffs to unidiff format, or the
other way around. Some people just prefer one format and despise the
other; it is a religious issue. This program brings peace back to
`wdiff2' is intended as a replacement for `wdiff'. It aims at
supporting the same set of options, but uses `mdiff' instead of `diff'
115 `wdiff', `mdiff' and `wdiff2' were written by François Pinard, while
116 `unify' has been contributed by Wayne Davison. Please report bugs to
117 <wdiff-bugs@gnu.org>. Include the version number, which you can find
by running the program with `--version'. Please include in your
message sufficient input to reproduce what you got, the output you
expected instead, and a careful explanation of the nature of the
124 File: wdiff.info, Node: wdiff, Next: mdiff, Prev: Overview, Up: Top
2 The word difference finder
****************************
There are actually two programs for comparing files on a word-by-word
basis. `wdiff' is a front end to `diff' as found in the GNU diffutils
package. It is quite mature. Its planned successor, `wdiff2', is a
front end to `mdiff', and as experimental as `mdiff' itself. *Note
135 A word is anything between whitespace. This is useful for comparing two
136 texts in which a few words have been changed and for which paragraphs
141 * wdiff invocation:: Invoking `wdiff'
142 * wdiff Examples:: Actual examples of `wdiff' usage
145 File: wdiff.info, Node: wdiff invocation, Next: wdiff Examples, Up: wdiff
150 The programs `wdiff' and `wdiff2' aim at providing the same set of
151 command line options. They are described below. *Note wdiff
152 Compatibility::, for a list of differences.
wdiff OPTION ... OLD_FILE NEW_FILE
wdiff OPTION ... -d [DIFF_FILE]
`wdiff' compares files OLD_FILE and NEW_FILE and produces an
annotated copy of NEW_FILE on standard output. The empty string or the
string `-' denotes standard input, but standard input cannot be used
twice in the same invocation. The complete path of a file should be
given; a directory name is not accepted. `wdiff' exits with a
status of 0 if no differences were found, a status of 1 if any
differences were found, or a status of 2 for any error.
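A script can branch on these exit statuses. The helper below is a
minimal sketch; the name `describe_wdiff_status' is hypothetical and
only the three statuses documented above are given a meaning.

```shell
# Sketch only: translate a wdiff exit status into a short message.
# The function name is hypothetical.
describe_wdiff_status () {
  case "$1" in
    0) echo "no differences" ;;
    1) echo "differences found" ;;
    2) echo "error" ;;
    *) echo "undocumented status $1" ;;
  esac
}

# Possible usage, assuming wdiff is installed:
#   wdiff OLD_FILE NEW_FILE > /dev/null
#   describe_wdiff_status $?
```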
In this documentation, "deleted text" refers to text in OLD_FILE
which is not in NEW_FILE, while "inserted text" refers to text in
NEW_FILE which is not in OLD_FILE.
169 `wdiff' supports the following command line options:
173 Print an informative help message describing the options.
177 Print the version number of `wdiff' on the standard error output.
Avoid producing deleted words on the output. If neither `-1' nor
`-2' is selected, the original right margin may be exceeded for
some lines.
Avoid producing inserted words on the output. When this flag is
given, the whitespace in the output is taken from OLD_FILE instead
of NEW_FILE. If neither `-1' nor `-2' is selected, the original
right margin may be exceeded for some lines.
194 Avoid producing common words on the output. When this option is
195 not selected, common words and whitespace are taken from NEW_FILE,
196 unless option `-2' is given, in which case common words and
197 whitespace are rather taken from OLD_FILE. When selected,
198 differences are separated from one another by lines of dashes.
199 Moreover, if this option is selected at the same time as `-1' or
200 `-2', then none of the output will have any emphasis, i.e. no bold
201 or underlining. Finally, if this option is not selected, but both
202 `-1' and `-2' are, then sections of common words between
203 differences are segregated by lines of dashes.
207 Do not consider case difference while comparing words. Each lower
208 case letter is seen as identical to its upper case equivalent for
209 the purpose of deciding if two words are the same.
213 On completion, for each file, the total number of words, the
214 number of common words between the files, the number of words
215 deleted or inserted and the number of words that have changed is
216 output. (A changed word is one that has been replaced or is part
217 of a replacement.) Except for the total number of words, all of
218 the numbers are followed by a percentage relative to the total
219 number of words in the file.
Some actions which previous versions of `wdiff' took automatically
are now put under the control of this option. When it is given, a
pager is interposed whenever the `wdiff' output is directed to the
user's terminal. Without this option, no pager is called; the user
is then responsible for explicitly piping `wdiff' output into a
pager, if required.
230 The pager is selected by the value of the PAGER environment
231 variable when `wdiff' is run. If PAGER is not defined at run
232 time, then a default pager, selected at installation time, will be
233 used instead. A defined but empty value of PAGER means no pager
236 When a pager is interposed through the use of this option, one of
237 the options `-l' or `-t' is also selected, depending on whether
238 the string `less' appears in the pager's name or not.
It is often useful to define `wdiff' as an alias for `wdiff -a'.
However, this _hides_ the normal `wdiff' behaviour. The default
behaviour may be restored simply by piping the output from `wdiff'
through `cat'. This dissociates the output from the user's
terminal.
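The pager selection rules given for this option can be sketched as a
small shell function. This is an illustration, not code taken from
`wdiff': the name `pager_mode' is hypothetical, and `more' merely
stands in for whatever default pager was chosen at installation time.

```shell
# Sketch only: decide which emphasis option accompanies the pager.
# Prints "-l" for a pager whose name contains "less", "-t" for any
# other pager, and "none" when PAGER is defined but empty.
pager_mode () {
  # ${PAGER-more} substitutes the default only when PAGER is unset,
  # so a defined but empty value is preserved.  "more" is an assumed
  # stand-in for the installation-time default.
  _pager=${PAGER-more}
  case "$_pager" in
    "")     echo "none" ;;
    *less*) echo "-l" ;;
    *)      echo "-t" ;;
  esac
}
```

Note the use of `${PAGER-more}' rather than `${PAGER:-more}': the
latter would also replace an empty value, losing the distinction
between an unset and an empty PAGER.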
Use over-striking to emphasize parts of the output. Each
character of the deleted text is underlined by writing an
underscore `_' first, then a backspace and then the letter to be
underlined. Each character of the inserted text is emboldened by
writing it twice, with a backspace in between. This option is not
selected by default.
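The two over-striking conventions can be reproduced by hand, which
helps when checking what a pager or printer will receive. This is a
sketch; the function names are hypothetical and are not part of
`wdiff'.

```shell
# Sketch only: build over-struck text the way it is described above.
bs=$(printf '\b')    # a literal backspace character

# Underline: underscore, backspace, then the character itself.
overstrike_underline () {
  printf '%s' "$1" | sed "s/./_${bs}&/g"
}

# Bold: the character, a backspace, then the character again.
overstrike_bold () {
  printf '%s' "$1" | sed "s/./&${bs}&/g"
}
```

Piping the result through `od -c' or `cat -v' makes the backspaces
visible.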
257 Use over-striking to emphasize parts of output. This option works
258 as option `-p', but also over-strikes whitespace associated with
259 inserted text. `less' shows such whitespace using reverse video.
260 This option is not selected by default. However, it is
261 automatically turned on whenever `wdiff' launches the pager
262 `less'. See option `-a'.
264 This option is commonly used in conjunction with `less':
wdiff -l OLD_FILE NEW_FILE | less
270 Force the production of `termcap' strings for emphasising parts of
271 output, even if the standard output is not associated with a
272 terminal. The `TERM' environment variable must contain the name
273 of a valid `termcap' entry. If the terminal description permits,
274 underlining is used for marking deleted text, while bold or
275 reverse video is used for marking inserted text. This option is
276 not selected by default. However, it is automatically turned on
277 whenever `wdiff' launches a pager, and it is known that the pager
278 is _not_ `less'. See option `-a'.
280 This option is commonly used when `wdiff' output is not redirected,
281 but sent directly to the user terminal, as in:
wdiff -t OLD_FILE NEW_FILE
285 A common kludge uses `wdiff' together with the pager `more', as in:
wdiff -t OLD_FILE NEW_FILE | more
289 However, some versions of `more' use `termcap' emphasis for their
290 own purposes, so strange interactions are possible.
292 `--start-delete ARGUMENT'
294 Use ARGUMENT as the "start delete" string. This string will be
295 output prior to any sequence of deleted text, to mark where it
296 starts. By default, no start delete string is used unless there
297 is no other means of distinguishing where such text starts; in
298 this case the default start delete string is `[-'.
300 `--end-delete ARGUMENT'
302 Use ARGUMENT as the "end delete" string. This string will be
303 output after any sequence of deleted text, to mark where it ends.
304 By default, no end delete string is used unless there is no other
305 means of distinguishing where such text ends; in this case the
306 default end delete string is `-]'.
308 `--start-insert ARGUMENT'
310 Use ARGUMENT as the "start insert" string. This string will be
311 output prior to any sequence of inserted text, to mark where it
312 starts. By default, no start insert string is used unless there
313 is no other means of distinguishing where such text starts; in
314 this case the default start insert string is `{+'.
316 `--end-insert ARGUMENT'
318 Use ARGUMENT as the "end insert" string. This string will be
319 output after any sequence of inserted text, to mark where it ends.
320 By default, no end insert string is used unless there is no other
321 means of distinguishing where such text ends; in this case the
322 default end insert string is `+}'.
326 Avoid spanning the end of line while showing deleted or inserted
327 text. Any single fragment of deleted or inserted text spanning
328 many lines will be considered as being made up of many smaller
329 fragments not containing a newline. So deleted text, for example,
330 will have an end delete string at the end of each line, just
331 before the new line, and a start delete string at the beginning of
332 the next line. A long paragraph of inserted text will have each
333 line bracketed between start insert and end insert strings. This
334 behaviour is not selected by default.
Use a single unified diff as input. If no input file is specified,
standard input is used instead. This can be used to post-process
diffs generated from other applications, like version control
345 Note that options `-p', `-t', and `-[wxyz]' are not mutually
346 exclusive. If you use a combination of them, you will merely
347 accumulate the effect of each. Option `-l' is a variant of option `-p'.
350 File: wdiff.info, Node: wdiff Examples, Prev: wdiff invocation, Up: wdiff
2.2 Actual examples of `wdiff' usage
====================================
This section presents a few examples of usage, most of which were
contributed by `wdiff' users.
358 * Change bars example.
360 * This example comes from a discussion with Joe Wells
The following command produces a copy of NEW_FILE, shifted
right one space to accommodate change bars since the last
revision, ignoring those changes coming only from paragraph
refilling. Any line with new or changed text will get a `|'
in column 1. However, deleted text is neither shown nor marked.
wdiff -1n OLD_FILE NEW_FILE |
sed -e 's/^/ /;/{+/s/^ /|/;s/{+//g;s/+}//g'
372 Here is how it works. Word differences are found, paying
373 attention only to additions, as requested by option `-1'.
374 For bigger changes which span line boundaries, the insert
375 bracket strings are repeated on each output line, as
376 requested by option `-n'. This output is then reformatted
377 with a `sed' script which shifts the text right two columns,
378 turns the initial space into a bar only if there is some new
379 text on that line, then removes all insert bracket strings.
384 * This example has been provided by Steve Fisk
385 <fisk@polar.bowdoin.edu>.
387 The following uses LaTeX to put deleted text in boxes, and
388 new text in double boxes:
wdiff -w "\fbox{" -x "}" -y "\fbox{\fbox{" -z "}}" ...
397 * This example comes from Paul Fox <pgf@cayman.com>.
399 Using `wdiff', with some `troff'-specific delimiters gives
400 _much_ better output. The delimiters I used:
wdiff -w'\s-5' -x'\s0' -y'\fB' -z'\fP' ...
404 This makes the pointsize of deletions 5 points smaller than
405 normal, and emboldens insertions. Fantastic!
wdiff -w'\fI' -x'\fP' -y'\fB' -z'\fP'
411 since that's more like the defaults you use for terminals or
412 printers, but since I actually use italics for emphasis in my
413 documents, I thought the point size thing was clearer.
415 I tried it on code, and it works surprisingly well there,
418 * Marty Leisner <leisner@eso.mc.xerox.com> says:
420 In the previous example, you had smaller text being taken out
421 and bold face inserted. I had smaller text being taken out
422 and larger text being inserted, I'm using bold face for other
423 things, so this is more clear.
wdiff -w '\s-3' -x'\s0' -y'\s+3' -z'\s0'
428 * Colored output example.
430 * This example comes from Martin von Gagern
431 <Martin.vGagern@gmx.net>.
433 If you like colored output, and your terminal supports ANSI
434 escape sequences, you can use this invocation:
-w $'\033[30;41m' -x $'\033[0m' \
-y $'\033[30;42m' -z $'\033[0m' \
441 This will print deleted text black on red, and inserted text
442 black on green, assuming that your normal terminal colors are
443 white on black. Of course you can choose different colors if
446 The `$'...'' notation is supported by GNU bash, and maybe
447 other shells as well. If your shell doesn't support it, you
448 might need some more tricks to generate these escape
449 sequences as command line arguments.
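For shells without the `$'...'' notation, one possible trick is to let
`printf' produce the escape character. The helper name `ansi' below is
hypothetical; the sketch relies only on `printf' interpreting the octal
escape `\033' in its format string.

```shell
# Sketch only: emit the ANSI sequence ESC [ CODES m.
# For instance "30;41" selects black on red and "0" resets.
ansi () {
  printf '\033[%sm' "$1"
}

# Possible usage, assuming wdiff is installed:
#   wdiff -w "$(ansi '30;41')" -x "$(ansi 0)" \
#         -y "$(ansi '30;42')" -z "$(ansi 0)" OLD_FILE NEW_FILE
```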
453 On a related note, GNU Emacs users might notice that the interactive
454 function `compare-windows' ignores changes in whitespace, if it is
455 given a numeric argument. If the variable `compare-ignore-case' is
456 non-`nil', it ignores differences in case as well. So, in a way, this
457 offers a kind of incremental version of `wdiff'.
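To see what the `sed' script of the change bars example does without
running `wdiff', it can be fed text which already carries the default
insert markers `{+' and `+}'. The wrapper name `change_bars' is
hypothetical, the script is copied as it is printed in that example,
and the sample input is made up for illustration.

```shell
# Sketch only: the sed script from the change bars example, wrapped in
# a function so it can be tried on hand-written input.
change_bars () {
  sed -e 's/^/ /;/{+/s/^ /|/;s/{+//g;s/+}//g'
}

# Simulated `wdiff -1n' output: one unchanged line, one with new text.
printf 'same line\nsome {+new+} words\n' | change_bars
```

With this input, the first line comes out shifted right by one space,
while the second comes out as `|some new words'.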
460 File: wdiff.info, Node: mdiff, Next: unify, Prev: wdiff, Up: Top
3 The multi-difference finder
*****************************
The name `mdiff' stands for _multi-_`diff', and has the purpose of
encompassing the functionality of a few other `diff'-type programs.
The prefix _multi-_ also reflects the fact that the program is often
able to study more than two input files at once.
The theory of operation is simple. The program splits all input
files into a sequence of items, which may be lines or words. `mdiff' is
then said to operate either in "line mode" or in "word mode". It then
tries to find sequences of items which are repeated in the input files.
Such common sequences are called "clusters" of items, and each
occurrence of a repetition is called a cluster "member". What remains,
once all cluster members are conceptually removed from all input files,
is a set of "differences". The role of `mdiff' is to conveniently list
either cluster members or differences.
When input files are very similar, it is likely that clusters will
encompass many items (lines or words) and differences will be small.
So, most listing options inhibit the printing of cluster members.
However, one may ask for the few beginning or ending items of cluster
members to be printed nevertheless, as a way to provide a kind of
feedback or "context" of the difference; those context items are
sometimes said to be at the "horizon" of the difference. In merged
listings, cluster members may simply not be printed, except perhaps
for a few context items at the beginning of the member (just after a
difference), and a few context items at the end of the member (just
before a difference).
When cluster members are short, or if you prefer, when the
differences are not far away from each other, it is quite possible that
the required context items cover the full extent of the cluster
members, which are then no longer inhibited. A run of differences
intermixed with such non-suppressed members is called a "hunk". Some
reports produced by `mdiff' are shown as a list of hunks, and it is to
be understood that common items are elided between hunks. However,
each hunk in itself has no item missing, and each item of the hunk is
analysed as pertaining either to only one of the input files or to
many of them. Each hunk is preceded by a header, which explains the
line position of all input files prior to the hunk itself. By
comparing a hunk header with the previous hunk header, the user can
have a hint about how much printing was spared.
When two input files are quite similar, clusters are usually
presented in the same order in all files. If a cluster member A in the
first file corresponds to a cluster member A in the second file, it is
likely that another cluster member B which appears _after_ A in the
first file will correspond to a cluster member B in the second file
which appears _after_ A as well. So, in many cases, while producing a
merged listing of files, cluster members may be made to naturally
correspond to one another. However, this is not always true, in
particular when the second file has been produced from the first by
moving a big chunk of code away from its original position. In such
cases, we say that members have "crossed". When members are crossed
and `mdiff' has to make a merged listing, it selects one cluster member
as being _naturally_ associated with its counterpart (either the pair
of A's or the pair of B's) and then considers the other cluster as
being part of a difference. The crossed nature of the member may still
be analysed and reported, or it may be ignored.
The standard `diff' program is meant for when there are exactly two
input files, for which crossed members should be ignored. The `mdiff'
output format has been designed in such a way that it should resemble
`diff' output for this precise case. However, `diff' formats are not
sufficient for representing all cases which `mdiff' may address, and
this part of the design is not mature yet. That is why `mdiff', in its
current state, still experiments with output formats, which are
subject to change.
When the input files are rather different, merged listings are
neither very significant nor useful, and may even be rather
confusing. The best approach in such cases is to use `mdiff' for
making an annotated relisting of all input files, in which cluster
members are properly identified and referred to one another.
Read summary: 137 files, 41975 lines
Work summary: 439 clusters, 1608 members, 8837 duplicate lines
The summary lines, triggered by the `-s' option, say that about 8837
non-ignorable lines could be removed out of the 41975 which have been
read, by using functions, `#include', `#define', or similar devices.
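As a quick check of the proportion involved, the two figures from the
summary can be turned into a percentage with shell arithmetic. The
function name `duplicate_percent' is hypothetical, and integer
division truncates the result.

```shell
# Sketch only: percentage of duplicate lines, truncated to an integer.
# Arguments: DUPLICATE_LINES TOTAL_LINES
duplicate_percent () {
  echo $(( $1 * 100 / $2 ))
}

duplicate_percent 8837 41975    # prints 21
```

So roughly one line in five of that project was a candidate for
factoring out.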
If one manages to execute `mdiff' within GNU Emacs so the output
described above is collected into the `*compilation*' buffer, the
command `C-x `' (`M-x next-error') will proceed to the next cluster
member in the other window, and similarly for other compilation mode
commands. This is a useful way of handling `mdiff' output.
Each line in the hunk, after the header, comes from the compared
files, but is shifted right so the first column (or the first few
columns) of each line gives information about where the line is coming
from. A space indicates a line which is common to all files. In case
there are only two input files, a minus sign indicates a line from the
first file and a plus sign a line from the second file. Otherwise, a
letter from `a' to `z', or more than one letter if there are more than
26 files, indicates to which file the line pertains. If a line or a
block of lines pertains to many files but not to all of them, the first
column holds a vertical bar, and the line or block of lines is
bracketed between `@/' and `@\' lines, which act as a kind of comment
within the hunk. The initial bracket lists all file letters that are
related to
I initially wrote `mdiff' specifically to help clean up a C++
project which was a bit large, and in which many big monolithic classes
were derived from each other, most probably by rough copying followed
by local modifications. I intended to fragment most common clusters and
segregate the parts into virtual methods in outer classes, and override
these methods, as appropriate, with less common variants within inner
classes. `mdiff' was good at pointing me to exactly where I should
look. Of course, it never did the cleanup for me, but it helped with
the research about what should be done. Reusing `mdiff' over the
half-cleaned project gave me a more fine-grained analysis of what was
left
580 * mdiff invocation:: Invoking `mdiff'
581 * Efficiency:: Resource considerations and efficiency
584 File: wdiff.info, Node: mdiff invocation, Next: Efficiency, Up: mdiff
589 The format for running the `mdiff' program is:
mdiff OPTION ... FILE ...
`mdiff' reads all input FILEs and produces its results on standard
output. Optionally, standard error might receive a progress report or
597 `wdiff' compares files OLD_FILE and NEW_FILE and produces an
598 annotated copy of NEW_FILE on standard output. The empty string or the
599 string `-' denotes standard input, but standard input cannot be used
600 twice in the same invocation. The complete path of a file should be
601 given, a directory name is not accepted. `wdiff' will exit with a
602 status of 0 if no differences were found, a status of 1 if any
603 differences were found, or a status of 2 for any error.
In this documentation, "deleted text" refers to text in OLD_FILE
which is not in NEW_FILE, while "inserted text" refers to text in
NEW_FILE which is not in OLD_FILE.
609 `mdiff' supports the following command line options:
612 Merely prints the version numbers on standard output, and exits
613 without doing anything else.
Merely prints a page of help on standard output, and exits without
doing anything else.
621 Specifies the minimum number of non-ignorable lines which are
622 required for two runs of lines to compare as equal. No cluster
623 member may ever have less than NUMBER lines. By default, clusters
624 have 4 lines or more.
Avoid producing deleted words on the output. If neither `-1' nor
`-2' is selected, the original right margin may be exceeded for
some lines.
Avoid producing inserted words on the output. When this flag is
given, the whitespace in the output is taken from OLD_FILE instead
of NEW_FILE. If neither `-1' nor `-2' is selected, the original
right margin may be exceeded for some lines.
641 Avoid producing common words on the output. When this option is
642 not selected, common words and whitespace are taken from NEW_FILE,
643 unless option `-2' is given, in which case common words and
644 whitespace are rather taken from OLD_FILE. When selected,
645 differences are separated from one another by lines of dashes.
646 Moreover, if this option is selected at the same time as `-1' or
647 `-2', then none of the output will have any emphasis, i.e. no bold
648 or underlining. Finally, if this option is not selected, but both
649 `-1' and `-2' are, then sections of common words between
650 differences are segregated by lines of dashes.
654 Do not consider case difference while comparing words. Each lower
655 case letter is seen as identical to its upper case equivalent for
656 the purpose of deciding if two words are the same.
660 Some initiatives which were previously automatically taken in
661 previous versions of `wdiff' are now put under the control of this
662 option. By using it, a pager is interposed whenever the `wdiff'
663 output is directed to the user's terminal. Without this option,
664 no pager will be called, the user is then responsible for
665 explicitly piping `wdiff' output into a pager, if required.
667 The pager is selected by the value of the PAGER environment
668 variable when `wdiff' is run. If PAGER is not defined at run
669 time, then a default pager, selected at installation time, will be
670 used instead. A defined but empty value of PAGER means no pager
673 When a pager is interposed through the use of this option, one of
674 the options `-l' or `-t' is also selected, depending on whether
675 the string `less' appears in the pager's name or not.
677 It is often useful to define `wdiff' as an alias for `wdiff -a'.
678 However, this _hides_ the normal `wdiff' behaviour. The default
679 behaviour may be restored simply by piping the output from `wdiff'
680 through `cat'. This dissociates the output from the user's
685 Use over-striking to emphasize parts of the output. Each
686 character of the deleted text is underlined by writing an
687 underscore `_' first, then a backspace and then the letter to be
688 underlined. Each character of the inserted text is emboldened by
689 writing it twice, with a backspace in between. This option is not
694 Use over-striking to emphasize parts of output. This option works
695 as option `-p', but also over-strikes whitespace associated with
696 inserted text. `less' shows such whitespace using reverse video.
697 This option is not selected by default. However, it is
698 automatically turned on whenever `wdiff' launches the pager
699 `less'. See option `-a'.
701 This option is commonly used in conjunction with `less':
wdiff -l OLD_FILE NEW_FILE | less
707 Force the production of `termcap' strings for emphasising parts of
708 output, even if the standard output is not associated with a
709 terminal. The `TERM' environment variable must contain the name
710 of a valid `termcap' entry. If the terminal description permits,
711 underlining is used for marking deleted text, while bold or
712 reverse video is used for marking inserted text. This option is
713 not selected by default. However, it is automatically turned on
714 whenever `wdiff' launches a pager, and it is known that the pager
715 is _not_ `less'. See option `-a'.
717 This option is commonly used when `wdiff' output is not redirected,
718 but sent directly to the user terminal, as in:
wdiff -t OLD_FILE NEW_FILE
722 A common kludge uses `wdiff' together with the pager `more', as in:
wdiff -t OLD_FILE NEW_FILE | more
726 However, some versions of `more' use `termcap' emphasis for their
727 own purposes, so strange interactions are possible.
729 `--start-delete ARGUMENT'
731 Use ARGUMENT as the "start delete" string. This string will be
732 output prior to any sequence of deleted text, to mark where it
733 starts. By default, no start delete string is used unless there
734 is no other means of distinguishing where such text starts; in
735 this case the default start delete string is `[-'.
737 `--end-delete ARGUMENT'
739 Use ARGUMENT as the "end delete" string. This string will be
740 output after any sequence of deleted text, to mark where it ends.
741 By default, no end delete string is used unless there is no other
742 means of distinguishing where such text ends; in this case the
743 default end delete string is `-]'.
745 `--start-insert ARGUMENT'
747 Use ARGUMENT as the "start insert" string. This string will be
748 output prior to any sequence of inserted text, to mark where it
749 starts. By default, no start insert string is used unless there
750 is no other means of distinguishing where such text starts; in
751 this case the default start insert string is `{+'.
753 `--end-insert ARGUMENT'
755 Use ARGUMENT as the "end insert" string. This string will be
756 output after any sequence of inserted text, to mark where it ends.
757 By default, no end insert string is used unless there is no other
758 means of distinguishing where such text ends; in this case the
759 default end insert string is `+}'.
763 Avoid spanning the end of line while showing deleted or inserted
764 text. Any single fragment of deleted or inserted text spanning
765 many lines will be considered as being made up of many smaller
766 fragments not containing a newline. So deleted text, for example,
767 will have an end delete string at the end of each line, just
768 before the new line, and a start delete string at the beginning of
769 the next line. A long paragraph of inserted text will have each
770 line bracketed between start insert and end insert strings. This
771 behaviour is not selected by default.
774 Some choices are hard-wired into the program, but might well become
775 options in later releases. For example:
777 * No cluster may span a file boundary, that is, start near the end
778 of one input file and continue at the beginning of the next file.
780 * A cluster may have many members from the same file.
782 * White space is ignored between the beginning of a line and the
783 first non-white character.
785 * White space is significant when embedded in a line, or when ending
788 * Lines having no significant part (only white lines for now) are
789 "ignorable". Such ignorable lines are logically considered as not
790 being part of the input files for the sake of comparisons.
792 * Comments from the C language are not especially ignored. Unless
793 ignored for other reasons (being white lines), they are indeed
* No cluster member may ever directly start or end with ignorable
lines. However, ignorable lines may still be embedded within a
800 * In the generated output, clusters containing the biggest number of
801 ignorable lines are output first, while smaller clusters appear
802 last. All lines pertaining to a single cluster are output
803 together. Within a cluster, members are listed in the order of
804 the initial reading of input files.
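The white space rules in the list above can be sketched as follows. This is a rough Python illustration under stated assumptions, not the program's code: `significant_lines' is a hypothetical name, lines are assumed to arrive without their trailing newline, and "white" is taken to mean spaces and tabs only.

```python
def significant_lines(lines):
    """Return (line_number, comparison_text) pairs for non-ignorable lines.

    Leading white space is dropped, embedded and trailing white space is
    kept, and white lines are skipped as ignorable.
    """
    result = []
    for number, line in enumerate(lines, start=1):
        if line.strip(" \t") == "":
            continue  # ignorable: the line has no significant part
        result.append((number, line.lstrip(" \t")))
    return result


source = ["  int x;", "", "\t", "int  x; "]
print(significant_lines(source))
```

Lines 2 and 3 of the sample are white and drop out, while the doubled space and the trailing space on line 4 survive because embedded and trailing white space is significant.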
807 Note that options `-p', `-t', and `-[wxyz]' are not mutually
808 exclusive. If you use a combination of them, you will merely
809 accumulate the effect of each. Option `-l' is a variant of option `-p'.
812 File: wdiff.info, Node: Efficiency, Prev: mdiff invocation, Up: mdiff
814 3.2 Resource considerations and efficiency
815 ==========================================
818 `mdiff' can easily handle medium-sized projects. For a 32-bit
819 architecture, the memory requirements may be computed like this:
825 * 4 bytes per cluster
827 * 8 bytes per cluster member
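As a worked example of the two per-cluster figures just listed, the following Python sketch applies them to the statistics quoted in this section for the example run (439 clusters, 1608 members). It covers only these two terms, so it is a lower bound on the bookkeeping memory and not a full estimate; the function name is hypothetical.

```python
BYTES_PER_CLUSTER = 4         # figure quoted in the list above
BYTES_PER_CLUSTER_MEMBER = 8  # figure quoted in the list above


def cluster_bookkeeping_bytes(clusters, members):
    """Memory for cluster bookkeeping only; other terms are not counted."""
    return BYTES_PER_CLUSTER * clusters + BYTES_PER_CLUSTER_MEMBER * members


# 4 * 439 + 8 * 1608 = 1756 + 12864
print(cluster_bookkeeping_bytes(439, 1608))  # prints 14620
```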
830 To evaluate the speed, consider the example shown above (*note
831 mdiff::), which yields these statistics:
833 Read summary: 137 files, 41975 lines
834 Work summary: 439 clusters, 1608 members ...
836 Once the input files are in the memory cache, and with the output
837 redirected to `/dev/null', the processing takes 3 seconds of real time on an
838 Intel 486/100, which looks good. I was indeed afraid of some
839 hidden O(N^2) behaviour(1), even if the program is mostly
840 O(N*log(N)). Maybe one will discover or construct cases putting
841 `mdiff' on its knees. So far, `mdiff' seemingly behaves well for
842 the little problems given to it. If we devise and generate a more
843 traditional `diff'-like output, in which all input files are
844 relisted, this will add some time to the processing, but it will
845 be only linear with regard to the total length of input files.
847 There is a clever optimized sorting algorithm for _all_ substrings
848 of a file, which might be generalised to handle words or lines for
849 `mdiff'. But since the program is already faster than we initially
850 expected, there is no emergency to resort to using such an
853 Trading complexity for clarity
854 When lines repeat a lot, there are surprisingly many ways to
855 relate blocks of lines, and reporting them _all_ can make very
856 hairy listings. Any choice about reporting similarities, or not,
857 is somewhat arbitrary, but we ought to make some of such choices
858 for the program to be practical. Some of these choices are
861 If all members of a given cluster A are proper subsets of all
862 members of another given cluster B, then cluster A is wholly
863 forgotten. However, let's presume for example that there are more
864 members in A than in B. Then, some members of A necessarily
865 appear unrelated to any member of B. In such case, it has been
866 decided more useful to report _all_ occurrences of A members, even
867 those embedded within occurrences of B members. When only
868 interested in members B, annotations pertaining to A may be
869 perceived as clutter. However, when interested in members of A,
870 getting all of them is probably the most useful choice.
872 It sometimes happens that members of the very same cluster overlap.
873 In the string `a a a', there are two overlapping members for the
874 cluster represented by the string `a a', one from the first two
875 `a', another from the last two `a'. In such cases, one member of
876 such an overlap is automatically chopped so the overlap does not
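The `a a a' case can be reproduced with a short Python sketch. Which member gets chopped, and by how much, is not stated here, so trimming the start of the later member is purely an assumption of this illustration; both function names are hypothetical.

```python
def find_members(tokens, pattern):
    """Start positions of every occurrence of pattern, overlaps included."""
    size = len(pattern)
    return [i for i in range(len(tokens) - size + 1)
            if tokens[i:i + size] == pattern]


def chop_overlaps(starts, length):
    """Turn start positions into (start, end) spans that never overlap.

    Assumption: a member overlapping the previous one loses its beginning.
    """
    spans = []
    previous_end = 0
    for start in sorted(starts):
        end = start + length
        start = max(start, previous_end)
        if start < end:
            spans.append((start, end))
        previous_end = end
    return spans


starts = find_members("a a a".split(), ["a", "a"])
print(starts)                    # prints [0, 1]
print(chop_overlaps(starts, 2))  # prints [(0, 2), (2, 3)]
```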
879 White lines and items containing only delimiters are the possible
880 source of a lot of complexity, if these are fully taken as
881 significant. Since this does not add much to clarity, they are
882 better ignored, usually, through using `--ignore-blank-lines'
883 (`-B') or `--ignore-delimiters' (`-j'). Increasing the value of
884 `--minimum-size=ITEMS' (`-J ITEMS') option also cuts complexity
885 in favor of clarity, yet some small matches may then go unnoticed.
886 Exactly how to best adjust the ITEMS value is left for the user to
890 ---------- Footnotes ----------
892 (1) N is the total number of lines.
895 File: wdiff.info, Node: unify, Next: Compatibility, Prev: mdiff, Up: Top
897 4 The diff format converter
898 ***************************
900 The program `unify' has the purpose of manipulating context diffs and
901 unified context diffs. `unify' will accept either a regular context
902 diff (old- or new-style) or a unified context diff as input, and
903 generate either a unified diff or a new-style context diff as output.
905 Various other options allow you to echo the non-diff (comment) lines
906 to stderr, modify the diff by removing the comment lines, and/or tweak
907 the diff into a format that is good for releasing patches.
909 I think most people prefer unified context diffs in general. But
910 some of us just have trouble reading unidiffs, unless they get very
911 simple. Usual context diffs show how the code was _before_, and then,
912 how the code is _after_. Some people just prefer understanding twice
913 thoroughly, than once fuzzily. The tool is useful for those who handle
914 a lot of diffs from various sources, and want them in a uniform format.
918 * unify invocation:: Invoking `unify'
921 File: wdiff.info, Node: unify invocation, Up: unify
926 The format for running the `unify' program is:
928 unify OPTION ... [FILE]
930 The program reads the diff to convert from FILE, or if the source
931 file is not mentioned, it will be read from the standard input. The
932 default is to output the diff in the opposite style of whatever was
933 input, that is, regular context diffs will become unified context diffs,
934 and unified context diffs will become regular context diffs, but this
935 can be overridden by options.
937 `unify' supports the following command line options:
940 Merely prints the version numbers on standard output, and exits
941 without doing anything else.
944 Merely prints a page of help on standard output, and exits without
949 Forces context diff output.
953 Echoes non-diff (comment) lines to stderr. If a comment line is
954 being stripped via the `-p' option, it is echoed with a preceding
955 `!!! '. If all comments are being stripped (via the `-s' option),
956 no special designation is given.
960 Is used to force a context diff to be interpreted as being of the
961 old-style even if it has the extra trailing asterisks that
962 normally mark the new-style. This is only needed if `unify' fails
963 to work with your version of `diff'.
967 Turns on patch-output mode. This will do two things:
969 1. Transform a header like:
971 *** orig/file Sat May 5 02:59:37 1990
972 --- ./file Sat May 5 03:00:08 1990
974 into a line of `Index: file' -- we choose the shorter name
975 and strip a leading `./' sequence if present.
977 2. Strip lines that begin with `Only in ', `Common subdir',
978 `Binary files' or `diff -'.
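The two patch-output transformations above can be sketched in Python as follows. This is an illustration only, not the code of `unify': the names `index_line' and `keep_line' are hypothetical, file names are assumed to contain no white space, and the `./' prefix is stripped after the shorter name is chosen, which is an assumption about ordering.

```python
STRIPPED_PREFIXES = ("Only in ", "Common subdir", "Binary files", "diff -")


def index_line(old_header, new_header):
    """Build an `Index:' line from the two header lines of a context diff."""
    old_name = old_header.split()[1]
    new_name = new_header.split()[1]
    name = min(old_name, new_name, key=len)  # choose the shorter name
    if name.startswith("./"):
        name = name[2:]                      # strip a leading `./'
    return "Index: " + name


def keep_line(line):
    """False for the lines that patch-output mode strips."""
    return not line.startswith(STRIPPED_PREFIXES)


print(index_line("*** orig/file Sat May 5 02:59:37 1990",
                 "--- ./file Sat May 5 03:00:08 1990"))  # prints Index: file
```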
986 Strips non-diff lines (comments).
990 Forces unified diff output.
993 Is the same as `-up'.
997 Will use a `=' prefix in a unified diff for lines that are common
998 to both files instead of using a leading space. Though this is
999 harder to read, it is less likely to be mangled by
1000 trailing-space-stripping sites when posted to Usenet.
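The effect of the `=' prefix on a unified diff can be illustrated with a small Python sketch. The function name is hypothetical, and the sample hunk is made up for the illustration.

```python
def use_equals_prefix(diff_lines):
    """Mark lines common to both files with `=' instead of a leading space."""
    return ["=" + line[1:] if line.startswith(" ") else line
            for line in diff_lines]


hunk = ["@@ -1,3 +1,3 @@", " unchanged", "-old text", "+new text", " "]
for line in use_equals_prefix(hunk):
    print(line)
```

The last sample line is a common line that is empty in the file, so it consists of the leading space alone; with the `=' prefix it no longer looks like trailing white space that a mail or news gateway might strip.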
1004 File: wdiff.info, Node: Compatibility, Next: Experimental, Prev: unify, Up: Top
1006 5 How `mdiff' differs
1007 *********************
1009 The GNU project already has a `diff' program which is part of the GNU
1010 diffutils package. There also are various non-GNU `diff' programs
1011 provided by various vendors.
1013 There is also the well-established `wdiff' which uses `diff' under
1014 the hood. It differs slightly from `wdiff2', its intended `mdiff'-based
1017 The following sections compare `mdiff' specifications with both GNU
1018 `diff' and with `wdiff'.
1022 * diff Compatibility:: Differences with `diff'
1023 * wdiff Compatibility:: Differences with `wdiff'
1026 File: wdiff.info, Node: diff Compatibility, Next: wdiff Compatibility, Up: Compatibility
1028 5.1 Differences with `diff'
1029 ===========================
1031 GNU `diff' is a program which has matured over a long time, and whose
1032 algorithms are based on computer science literature. It is a fast
1033 program. By comparison, `mdiff' is not more than a program kludged up
1034 rapidly to satisfy a few precise needs. It only tries not being
1037 Most `diff' options are accepted by `mdiff' under the same short and
1038 long option names, and `mdiff' is able to produce similar output, making
1039 it easier to learn and less surprising to users. Yet, some
1040 differences exist in option decoding and output format. Since `diff'
1041 and `mdiff' use different matching algorithms, it is very likely that
1042 the differences will not be analyzed in exactly the same way.
1044 * A few `diff' options, which either accept no argument or require a
1045 mandatory one, are implemented in `mdiff' as options accepting an
1046 optional argument. This may yield some surprises, for example,
1047 `-c4bir' would be accepted by `diff' and rejected by `mdiff', yet
1048 it may be rewritten as `-birc4' for both. See below.
1050 * Options `-c' and `-u' in `diff' ask for regular context and
1051 unified context output, respectively, without specifying the number
1052 of lines in the context. `diff' has `-C NUMBER' and `-U NUMBER'
1053 options for asking for regular or unified context diffs with
1054 NUMBER context lines. If `-c4' asks for four lines of context,
1055 the `4' is not really an argument of `-c', and this is really
1056 interpreted as `-c -4', where `-NUMBER' is meant to be a
1057 deprecated option for choosing the number of context lines, an option
1058 which `mdiff' does not implement. In `mdiff', `-c' and `-u' are
1059 really two options which are allowed to receive an optional
1060 argument, so the number of lines may, or may not be given, at the
1061 choice of the user. In `mdiff', options `-C' and `-U' are
1062 completely equivalent to `-c' and `-u', and are provided only for
1063 the sake of compatibility.
1065 * Option `-v' in `diff' means `--version', while it means
1066 `--verbose' in `mdiff'. There is no short form for `--version' in
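The `-c4bir' versus `-birc4' case above follows from how an option with an optional argument consumes the rest of its cluster. The Python sketch below assumes the usual GNU getopt convention for optional arguments and assumes the argument must be a number; `parse_cluster' is a hypothetical name, not part of `mdiff'.

```python
def parse_cluster(cluster, optional_arg_options="cu"):
    """Parse the letters following `-' in one clustered short option.

    An option taking an optional argument swallows the rest of the
    cluster as that argument, which must then be a number.
    """
    parsed = []
    for position, letter in enumerate(cluster):
        if letter in optional_arg_options:
            argument = cluster[position + 1:]
            if argument and not argument.isdigit():
                raise ValueError("invalid argument %r for -%s" % (argument, letter))
            parsed.append((letter, argument or None))
            break  # nothing is left after the optional argument
        parsed.append((letter, None))
    return parsed


print(parse_cluster("birc4"))
# prints [('b', None), ('i', None), ('r', None), ('c', '4')]
```

With this reading, `parse_cluster("c4bir")' raises an error, because `4bir' is taken as the argument of `-c', whereas `birc4' leaves only `4' for it.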
1071 File: wdiff.info, Node: wdiff Compatibility, Prev: diff Compatibility, Up: Compatibility
1073 5.2 Differences with `wdiff'
1074 ============================
1076 Even if `mdiff' is meant to fully support `wdiff', options have been
1077 shuffled around so `mdiff' could better merge both `diff' and `wdiff'
1078 options in a common scheme. `diff' habits were almost always favored
1079 in this option reorganisation.
1081 `wdiff2' is now a mere front-end to `mdiff' that only rewrites the
1082 options. The following notes apply.
1084 * Some options are just transmitted unchanged, these are `-1', `-2',
1087 * Option `-c' also gets turned into `-i', to be compatible with
1088 `wdiff' versions up to `0.4'.
1090 * Simple option `-a' in `wdiff' becomes `-A' in `mdiff', `-l'
1091 becomes `-k', `-n' becomes `-m', `-p' becomes `-o', `-s' becomes
1092 `-v' and `-t' becomes `-z'.
1094 * Options introducing strings, which are `-w', `-x', `-y' and `-z'
1095 in `wdiff', respectively become `-Y', `-Z', `-Q' and `-R' in
1098 * Options `-C', `-h' and `-v' are processed directly by `wdiff' and
1099 are not transmitted to `mdiff'.
1101 * Further, the `-C' option of `wdiff' has no equivalent in `mdiff'.
1103 * A new option `-q' inhibits the message which explains how `mdiff'
1104 might have been directly called.
1106 * The option `--diff-input' (`-d') from `wdiff' isn't supported by
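The option rewriting described in the notes above can be summarised as a lookup table. The Python sketch below is an illustration of those notes only, not the actual `wdiff2' code: all names are hypothetical, only the options spelled out in the notes are included, and the list of options passed unchanged is left incomplete on purpose.

```python
SIMPLE_OPTIONS = {"-a": "-A", "-l": "-k", "-n": "-m", "-p": "-o",
                  "-s": "-v", "-t": "-z", "-c": "-i"}
STRING_OPTIONS = {"-w": "-Y", "-x": "-Z", "-y": "-Q", "-z": "-R"}
HANDLED_BY_FRONT_END = {"-C", "-h", "-v"}  # never transmitted to mdiff


def rewrite_options(arguments):
    """Translate wdiff-style short options into their mdiff spelling.

    Anything not in the tables (such as -1, -2, or an option's string
    argument) is transmitted unchanged.  Limitation of this sketch: a
    string argument that itself looks like an option would be rewritten.
    """
    rewritten = []
    for argument in arguments:
        if argument in HANDLED_BY_FRONT_END:
            continue
        rewritten.append(SIMPLE_OPTIONS.get(
            argument, STRING_OPTIONS.get(argument, argument)))
    return rewritten


print(rewrite_options(["-s", "-w", "<", "-x", ">", "-1", "-v"]))
# prints ['-v', '-Y', '<', '-Z', '>', '-1']
```

Each argument is looked up exactly once, so `-t' becoming `-z' is never confused with the string option `-z' becoming `-R'.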
1111 File: wdiff.info, Node: Experimental, Prev: Compatibility, Up: Top
1113 6 Experimental programs
1114 ***********************
1116 The GNU wdiff source package contains sources for a number of tools
1117 besides `wdiff' itself. These are considered experimental: they might
1118 work for you, but they might just as well fail. The following programs
1119 are considered experimental:
1127 Building these applications can be configured at build time by
1128 passing `--with-experimental' to the `configure' script.
1130 _For this build, they have been enabled._ If you encounter a bug in
1131 an experimental program, the maintainers would still like to learn
1132 about it, but there is a greater chance that they decide not to fix
1133 such issues unless you provide a patch as well.
1137 * Experimental History:: History of the experimental programs
1140 File: wdiff.info, Node: Experimental History, Up: Experimental
1142 6.1 History of the Experimental programs
1143 ========================================
1145 Many users suggested features, which in turn invited the
1146 integration of `wdiff' into GNU diffutils. Collaboration proved to be
1147 rather difficult. After a few years, the `wdiff' author finally gave in
1148 and created `mdiff' as a way to break out of the situation and to
1149 become able to proceed with users' suggestions.
1151 Before `mdiff' and the new `wdiff2' based on it were officially
1152 released, the original author resigned maintainership. The new
1153 maintainers had little experience with the code, and therefore decided
1154 to mark it experimental. That way, the code wouldn't be lost, but it
1155 would be clear that it wasn't as tested as the good old `wdiff' command.