   1 This is wdiff.info, produced by makeinfo version 4.13 from wdiff.texi.
   2
   3 INFO-DIR-SECTION Text creation and manipulation
   4 START-INFO-DIR-ENTRY
   5 * Word differences: (wdiff).    GNU wdiff and diff related tools.
   6 END-INFO-DIR-ENTRY
   7
   8 INFO-DIR-SECTION Individual utilities
   9 START-INFO-DIR-ENTRY
  10 * wdiff: (wdiff)wdiff invocation.               Word difference finder.
  11 * wdiff2: (wdiff)wdiff invocation.              Word difference finder.
  12 * mdiff: (wdiff)mdiff invocation.               Line cluster finder.
  13 * unify: (wdiff)unify invocation.               Diff format converter.
  14 END-INFO-DIR-ENTRY
  15
  16    This file documents the `wdiff' command, which compares two files,
  17 finding which words have been deleted or added to the first for getting
  18 the second.  It also documents some other `diff' related tools.
  19
  20    Copyright (C) 1992, 1994, 1997, 1998, 1999, 2010, 2011, 2012, 2013
  21 Free Software Foundation, Inc.
  22
  23    Permission is granted to make and distribute verbatim copies of this
  24 manual provided the copyright notice and this permission notice are
  25 preserved on all copies.
  26
  27    Permission is granted to copy and distribute modified versions of
  28 this manual under the conditions for verbatim copying, provided that
  29 the entire resulting derived work is distributed under the terms of a
  30 permission notice identical to this one.
  31
  32    Permission is granted to copy and distribute translations of this
  33 manual into another language, under the above conditions for modified
  34 versions, except that this permission notice may be stated in a
  35 translation approved by the Foundation.
  36
  37 \1f
  38 File: wdiff.info,  Node: Top,  Next: Overview,  Up: (dir)
  39
  40 GNU wdiff
  41 *********
  42
  43 These Info pages document `wdiff', a word difference finder, together
  44 with some other `diff' related tools.  Note that some tools documented
  45 here are considered experimental and may not be part of every `wdiff'
  46 installation.  *Note Experimental::.
  47
  48 This is for release 1.2.1.
  49
  50 * Menu:
  51
  52 * Overview::                    Overview
  53 * wdiff::                       The word difference finder
  54 * mdiff::                       The line cluster finder
  55 * unify::                       The diff format converter
  56 * Compatibility::               How `mdiff' differs
  57 * Experimental::                About experimental programs
  58
  59
  60 --- The Detailed Node Listing ---
  61
  62 The word difference finder
  63
  64 * wdiff invocation::            Invoking `wdiff'
  65 * wdiff Examples::              Actual examples of `wdiff' usage
  66
  67 The multi-difference finder
  68
  69 * mdiff invocation::            Invoking `mdiff'
  70 * Efficiency::                  Resource considerations and efficiency
  71
  72 The diff format converter
  73
  74 * unify invocation::            Invoking `unify'
  75
  76 How `mdiff' differs
  77
  78 * diff Compatibility::          Differences with `diff'
  79 * wdiff Compatibility::         Differences with `wdiff'
  80
  81 Experimental programs
  82
  83 * Experimental History::        History of the experimental programs
  84
  85 \1f
  86 File: wdiff.info,  Node: Overview,  Next: wdiff,  Prev: Top,  Up: Top
  87
  88 1 Overview
  89 **********
  90
  91 `wdiff' is a front end to `diff' for comparing files on a word-by-word
  92 basis.  It works by creating two temporary files, one word per line, and
  93 then executing `diff' on these files.  It collects the `diff' output and
  94 uses it to produce a nicer display of word differences between the
  95 original files.
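
   As a rough illustration of the approach just described, the following
Python sketch splits both texts into words and compares the two word
lists.  It is not the actual `wdiff' implementation: the standard
`difflib' module stands in for `diff', whitespace is collapsed to single
spaces, and the function name `word_diff' is made up for this example.
The `[-' ... `-]' and `{+' ... `+}' markers are the default delimiters
documented for the `-w', `-x', `-y' and `-z' options.

```python
import difflib


def word_diff(old_text, new_text):
    """Return the new words annotated with [-deleted-] and {+inserted+} marks."""
    old_words = old_text.split()  # a word is anything between whitespace
    new_words = new_text.split()
    # Assumption: difflib replaces the external diff run on one-word-per-line files.
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append(" ".join(new_words[j1:j2]))
            continue
        if i1 < i2:  # words found only in the old text
            pieces.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if j1 < j2:  # words found only in the new text
            pieces.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(pieces)


print(word_diff("the quick brown fox", "the slow brown fox"))
# prints: the [-quick-] {+slow+} brown fox
```

   Unlike this sketch, `wdiff' itself keeps the original whitespace of
the files in its output.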
  96
  97    `mdiff' studies one or more input files altogether, and discovers
  98 blocks of items which repeat at more than one place.  Items may be
  99 lines, words, or units defined by user.  When in word mode, `mdiff'
 100 compares two files, finding which words have been deleted or added to
 101 the first in order to create the second, which is useful when two texts
 102 differ only by a few words and paragraphs have been refilled.  The
 103 program has many output formats and interacts well with terminals and
 104 pagers (notably with `less').
 105
 106    `unify' is able to convert context diffs to unidiff format, or the
 107 other way around.  Some people just prefer one format and despise the
 108 other; it is a religious issue.  This program brings peace back to
 109 Earth.
 110
 111    `wdiff2' is intended as a replacement for `wdiff'.  It aims at
 112 supporting the same set of options, but uses `mdiff' instead of `diff'
 113 as its backend.
 114
 115    `wdiff', `mdiff' and `wdiff2' were written by François Pinard, while
 116 `unify' has been contributed by Wayne Davison.  Please report bugs to
 117 <wdiff-bugs@gnu.org>.  Include the version number, which you can find
 118 by running the program with `--version'.  Please include in your
 119 message sufficient input to reproduce what you got, the output you
 120 indeed expected, and careful explanations about the nature of the
 121 problem.
 122
 123 \1f
 124 File: wdiff.info,  Node: wdiff,  Next: mdiff,  Prev: Overview,  Up: Top
 125
 126 2 The word difference finder
 127 ****************************
 128
 129 There are actually two programs for comparing files on a word per word
 130 basis.  `wdiff' is a front-end to `diff' as found in the GNU diffutils
 131 package. It is quite mature.  Its planned successor, `wdiff2', is a
 132 front end to `mdiff', and as experimental as `mdiff' itself.  *Note
 133 Experimental::.
 134
 135 A word is anything between whitespace.  This is useful for comparing two
 136 texts in which a few words have been changed and for which paragraphs
 137 have been refilled.
 138
 139 * Menu:
 140
 141 * wdiff invocation::            Invoking `wdiff'
 142 * wdiff Examples::              Actual examples of `wdiff' usage
 143
 144 \1f
 145 File: wdiff.info,  Node: wdiff invocation,  Next: wdiff Examples,  Up: wdiff
 146
 147 2.1 Invoking `wdiff'
 148 ====================
 149
 150 The programs `wdiff' and `wdiff2' aim at providing the same set of
 151 command line options. They are described below.  *Note wdiff
 152 Compatibility::, for a list of differences.
 153
 154      wdiff OPTION ... OLD_FILE NEW_FILE
 155      wdiff OPTION ... -d [DIFF_FILE]
 156
 157    `wdiff' compares files OLD_FILE and NEW_FILE and produces an
 158 annotated copy of NEW_FILE on standard output.  The empty string or the
 159 string `-' denotes standard input, but standard input cannot be used
 160 twice in the same invocation.  The complete path of a file should be
 161 given; a directory name is not accepted.  `wdiff' will exit with a
 162 status of 0 if no differences were found, a status of 1 if any
 163 differences were found, or a status of 2 for any error.
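
   A script that calls `wdiff' can tell these three outcomes apart by
looking at the exit status.  The Python sketch below shows one way to
do so; the helper names `describe_status' and `compare' are invented for
this example, and `compare' assumes that a `wdiff' executable can be
found on the search path.

```python
import subprocess

# Meanings of the exit statuses, as documented above.
STATUS_MEANING = {0: "no differences", 1: "differences found", 2: "error"}


def describe_status(returncode):
    """Map a wdiff exit status to its documented meaning."""
    return STATUS_MEANING.get(returncode, "unexpected status")


def compare(old_file, new_file):
    """Run wdiff (assumed to be on PATH); return (meaning, annotated output)."""
    result = subprocess.run(
        ["wdiff", old_file, new_file], capture_output=True, text=True
    )
    return describe_status(result.returncode), result.stdout
```

   Note that a status of 1 is not a failure here, so callers should not
treat every nonzero status as an error.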
 164
 165    In this documentation, "deleted text" refers to text in OLD_FILE
 166 which is not in NEW_FILE, while "inserted text" refers to text in
 167 NEW_FILE which is not in OLD_FILE.
 168
 169    `wdiff' supports the following command line options:
 170
 171 `--help'
 172 `-h'
 173      Print an informative help message describing the options.
 174
 175 `--version'
 176 `-v'
 177      Print the version number of `wdiff' on the standard error output.
 178
 179 `--no-deleted'
 180 `-1'
 181      Avoid producing deleted words on the output.  If neither `-1' nor
 182      `-2' is selected, the original right margin may be exceeded for
 183      some lines.
 184
 185 `--no-inserted'
 186 `-2'
 187      Avoid producing inserted words on the output.  When this flag is
 188      given, the whitespace in the output is taken from OLD_FILE instead
 189      of NEW_FILE.  If neither `-1' nor `-2' is selected, the original
 190      right margin may be exceeded for some lines.
 191
 192 `--no-common'
 193 `-3'
 194      Avoid producing common words on the output.  When this option is
 195      not selected, common words and whitespace are taken from NEW_FILE,
 196      unless option `-2' is given, in which case common words and
 197      whitespace are rather taken from OLD_FILE.  When selected,
 198      differences are separated from one another by lines of dashes.
 199      Moreover, if this option is selected at the same time as `-1' or
 200      `-2', then none of the output will have any emphasis, i.e. no bold
 201      or underlining.  Finally, if this option is not selected, but both
 202      `-1' and `-2' are, then sections of common words between
 203      differences are segregated by lines of dashes.
 204
 205 `--ignore-case'
 206 `-i'
 207      Do not consider case difference while comparing words.  Each lower
 208      case letter is seen as identical to its upper case equivalent for
 209      the purpose of deciding if two words are the same.
 210
 211 `--statistics'
 212 `-s'
 213      On completion, for each file, the total number of words, the
 214      number of common words between the files, the number of words
 215      deleted or inserted and the number of words that have changed is
 216      output.  (A changed word is one that has been replaced or is part
 217      of a replacement.)  Except for the total number of words, all of
 218      the numbers are followed by a percentage relative to the total
 219      number of words in the file.
 220
 221 `--auto-pager'
 222 `-a'
 223      Some initiatives which were taken automatically in
 224      previous versions of `wdiff' are now put under the control of this
 225      option.  By using it, a pager is interposed whenever the `wdiff'
 226      output is directed to the user's terminal.  Without this option,
 227      no pager will be called; the user is then responsible for
 228      explicitly piping `wdiff' output into a pager, if required.
 229
 230      The pager is selected by the value of the PAGER environment
 231      variable when `wdiff' is run.  If PAGER is not defined at run
 232      time, then a default pager, selected at installation time, will be
 233      used instead.  A defined but empty value of PAGER means no pager
 234      at all.
 235
 236      When a pager is interposed through the use of this option, one of
 237      the options `-l' or `-t' is also selected, depending on whether
 238      the string `less' appears in the pager's name or not.
 239
 240      It is often useful to define `wdiff' as an alias for `wdiff -a'.
 241      However, this _hides_ the normal `wdiff' behaviour.  The default
 242      behaviour may be restored simply by piping the output from `wdiff'
 243      through `cat'.  This dissociates the output from the user's
 244      terminal.
 245
 246 `--printer'
 247 `-p'
 248      Use over-striking to emphasize parts of the output.  Each
 249      character of the deleted text is underlined by writing an
 250      underscore `_' first, then a backspace and then the letter to be
 251      underlined.  Each character of the inserted text is emboldened by
 252      writing it twice, with a backspace in between.  This option is not
 253      selected by default.
 254
 255 `--less-mode'
 256 `-l'
 257      Use over-striking to emphasize parts of output.  This option works
 258      as option `-p', but also over-strikes whitespace associated with
 259      inserted text.  `less' shows such whitespace using reverse video.
 260      This option is not selected by default.  However, it is
 261      automatically turned on whenever `wdiff' launches the pager
 262      `less'.  See option `-a'.
 263
 264      This option is commonly used in conjunction with `less':
 265
 266           wdiff -l OLD_FILE NEW_FILE | less
 267
 268 `--terminal'
 269 `-t'
 270      Force the production of `termcap' strings for emphasising parts of
 271      output, even if the standard output is not associated with a
 272      terminal.  The `TERM' environment variable must contain the name
 273      of a valid `termcap' entry.  If the terminal description permits,
 274      underlining is used for marking deleted text, while bold or
 275      reverse video is used for marking inserted text.  This option is
 276      not selected by default.  However, it is automatically turned on
 277      whenever `wdiff' launches a pager, and it is known that the pager
 278      is _not_ `less'.  See option `-a'.
 279
 280      This option is commonly used when `wdiff' output is not redirected,
 281      but sent directly to the user terminal, as in:
 282
 283           wdiff -t OLD_FILE NEW_FILE
 284
 285      A common kludge uses `wdiff' together with the pager `more', as in:
 286
 287           wdiff -t OLD_FILE NEW_FILE | more
 288
 289      However, some versions of `more' use `termcap' emphasis for their
 290      own purposes, so strange interactions are possible.
 291
 292 `--start-delete ARGUMENT'
 293 `-w ARGUMENT'
 294      Use ARGUMENT as the "start delete" string.  This string will be
 295      output prior to any sequence of deleted text, to mark where it
 296      starts.  By default, no start delete string is used unless there
 297      is no other means of distinguishing where such text starts; in
 298      this case the default start delete string is `[-'.
 299
 300 `--end-delete ARGUMENT'
 301 `-x ARGUMENT'
 302      Use ARGUMENT as the "end delete" string.  This string will be
 303      output after any sequence of deleted text, to mark where it ends.
 304      By default, no end delete string is used unless there is no other
 305      means of distinguishing where such text ends; in this case the
 306      default end delete string is `-]'.
 307
 308 `--start-insert ARGUMENT'
 309 `-y ARGUMENT'
 310      Use ARGUMENT as the "start insert" string.  This string will be
 311      output prior to any sequence of inserted text, to mark where it
 312      starts.  By default, no start insert string is used unless there
 313      is no other means of distinguishing where such text starts; in
 314      this case the default start insert string is `{+'.
 315
 316 `--end-insert ARGUMENT'
 317 `-z ARGUMENT'
 318      Use ARGUMENT as the "end insert" string.  This string will be
 319      output after any sequence of inserted text, to mark where it ends.
 320      By default, no end insert string is used unless there is no other
 321      means of distinguishing where such text ends; in this case the
 322      default end insert string is `+}'.
 323
 324 `--avoid-wraps'
 325 `-n'
 326      Avoid spanning the end of line while showing deleted or inserted
 327      text.  Any single fragment of deleted or inserted text spanning
 328      many lines will be considered as being made up of many smaller
 329      fragments not containing a newline.  So deleted text, for example,
 330      will have an end delete string at the end of each line, just
 331      before the new line, and a start delete string at the beginning of
 332      the next line.  A long paragraph of inserted text will have each
 333      line bracketed between start insert and end insert strings.  This
 334      behaviour is not selected by default.
 335
 336 `--diff-input'
 337 `-d'
 338      Use single unified diff as input. If no input file is specified,
 339      standard input is used instead. This can be used to post-process
 340      diffs generated from other applications, like version control
 341      systems:
 342
 343           svn diff | wdiff -d
 344
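   The figures reported by the `--statistics' option can be pictured
with the following Python sketch.  It is only an approximation written
for this manual: `difflib' stands in for `diff', the names
`word_statistics' and `percent' are invented, and the exact rounding
used by `wdiff' is not reproduced.

```python
import difflib


def word_statistics(old_text, new_text):
    """Count common, deleted, inserted and changed words for both texts."""
    old_words, new_words = old_text.split(), new_text.split()
    counts = {
        "old": {"total": len(old_words), "common": 0, "deleted": 0, "changed": 0},
        "new": {"total": len(new_words), "common": 0, "inserted": 0, "changed": 0},
    }
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            counts["old"]["common"] += i2 - i1
            counts["new"]["common"] += j2 - j1
        elif tag == "delete":
            counts["old"]["deleted"] += i2 - i1
        elif tag == "insert":
            counts["new"]["inserted"] += j2 - j1
        else:  # "replace": the words on both sides are part of a replacement
            counts["old"]["changed"] += i2 - i1
            counts["new"]["changed"] += j2 - j1
    return counts


def percent(part, total):
    """Percentage of part relative to the total number of words in a file."""
    return round(100 * part / total) if total else 0
```

   For the texts `the quick brown fox jumps' and `the slow brown fox',
this sketch counts 3 common words, 1 deleted word and 1 changed word out
of 5 for the first text, and 3 common words and 1 changed word out of 4
for the second.
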
 345    Note that options `-p', `-t', and `-[wxyz]' are not mutually
 346 exclusive.  If you use a combination of them, you will merely
 347 accumulate the effect of each.  Option `-l' is a variant of option `-p'.
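
   The following Python sketch shows how over-striking, bracket strings
and line splitting can accumulate on one fragment of deleted or
inserted text.  It only mimics the behaviour described for `-p',
`-[wxyz]' and `-n'; the function names are invented, and leaving
whitespace without over-striking is an assumption drawn from the
description of `-l'.

```python
def overstrike(text, inserted):
    """Printer-style emphasis: bold for inserted text, underline for deleted."""
    out = []
    for ch in text:
        if ch.isspace():
            out.append(ch)              # assumption: whitespace is left alone
        elif inserted:
            out.append(ch + "\b" + ch)  # character, backspace, same character
        else:
            out.append("_\b" + ch)      # underscore, backspace, character
    return "".join(out)


def mark(text, inserted, printer=False, brackets=None, avoid_wraps=False):
    """Emphasize one fragment of deleted or inserted text.

    brackets is a (start, end) pair standing in for -w/-x or -y/-z.
    """
    start, end = brackets or ("", "")
    # With avoid_wraps, every line of the fragment gets its own brackets.
    lines = text.split("\n") if avoid_wraps else [text]
    marked = []
    for line in lines:
        body = overstrike(line, inserted) if printer else line
        marked.append(start + body + end)
    return "\n".join(marked)


print(mark("new text\nmore", inserted=True, brackets=("{+", "+}"), avoid_wraps=True))
# prints two lines: {+new text+} and {+more+}
```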
 348
 349 \1f
 350 File: wdiff.info,  Node: wdiff Examples,  Prev: wdiff invocation,  Up: wdiff
 351
 352 2.2 Actual examples of `wdiff' usage
 353 ====================================
 354
 355 This section presents a few examples of usage, most of which have been
 356 contributed by `wdiff' users.
 357
 358    * Change bars example.
 359
 360         * This example comes from a discussion with Joe Wells
 361           <jbw@cs.bu.edu>.
 362
 363           The following command produces a copy of NEW_FILE, shifted
 364           right one space to accommodate change bars since the last
 365           revision, ignoring those changes coming only from paragraph
 366           refilling.  Any line with new or changed text will get a `|'
 367           in column 1.  However, deleted text is not shown nor marked.
 368
 369                wdiff -1n OLD_FILE NEW_FILE |
 370                  sed -e 's/^/  /;/{+/s/^ /|/;s/{+//g;s/+}//g'
 371
 372           Here is how it works.  Word differences are found, paying
 373           attention only to additions, as requested by option `-1'.
 374           For bigger changes which span line boundaries, the insert
 375           bracket strings are repeated on each output line, as
 376           requested by option `-n'.  This output is then reformatted
 377           with a `sed' script which shifts the text right two columns,
 378           turns the initial space into a bar only if there is some new
 379           text on that line, then removes all insert bracket strings.
 380
 381
 382    * LaTeX example.
 383
 384         * This example has been provided by Steve Fisk
 385           <fisk@polar.bowdoin.edu>.
 386
 387           The following uses LaTeX to put deleted text in boxes, and
 388           new text in double boxes:
 389
 390                wdiff -w "\fbox{" -x "}" -y "\fbox{\fbox{" -z "}}" ...
 391
 392           works nicely.
 393
 394
 395    * `troff' example.
 396
 397         * This example comes from Paul Fox <pgf@cayman.com>.
 398
 399           Using `wdiff', with some `troff'-specific delimiters gives
 400           _much_ better output.  The delimiters I used:
 401
 402                wdiff -w'\s-5' -x'\s0' -y'\fB' -z'\fP' ...
 403
 404           This makes the pointsize of deletions 5 points smaller than
 405           normal, and emboldens insertions.  Fantastic!
 406
 407           I experimented with:
 408
 409                wdiff -w'\fI' -x'\fP' -y'\fB' -z'\fP'
 410
 411           since that's more like the defaults you use for terminals or
 412           printers, but since I actually use italics for emphasis in my
 413           documents, I thought the point size thing was clearer.
 414
 415           I tried it on code, and it works surprisingly well there,
 416           too...
 417
 418         * Marty Leisner <leisner@eso.mc.xerox.com> says:
 419
 420           In the previous example, you had smaller text being taken out
 421           and bold face inserted.  I had smaller text being taken out
 422           and larger text being inserted, I'm using bold face for other
 423           things, so this is more clear.
 424
 425                wdiff -w '\s-3' -x'\s0' -y'\s+3' -z'\s0'
 426
 427
 428    * Colored output example.
 429
 430         * This example comes from Martin von Gagern
 431           <Martin.vGagern@gmx.net>.
 432
 433           If you like colored output, and your terminal supports ANSI
 434           escape sequences, you can use this invocation:
 435
 436                wdiff -n \
 437                  -w $'\033[30;41m' -x $'\033[0m' \
 438                  -y $'\033[30;42m' -z $'\033[0m' \
 439                  ... | less -R
 440
 441           This will print deleted text black on red, and inserted text
 442           black on green, assuming that your normal terminal colors are
 443           white on black.  Of course you can choose different colors if
 444           you prefer.
 445
 446           The `$'...'' notation is supported by GNU bash, and maybe
 447           other shells as well. If your shell doesn't support it, you
 448           might need some more tricks to generate these escape
 449           sequences as command line arguments.
 450
 451
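   When the shell at hand lacks the `$'...'' notation used in the
colored output example, one possible trick is to build the escape
sequences in a small script instead.  The Python sketch below assembles
the same arguments; the function names are invented for this example,
and `run_colored_wdiff' assumes that `wdiff' is installed and can be
found on the search path.

```python
import subprocess

ESC = "\033"          # the escape character, written $'\033' in bash
RESET = ESC + "[0m"   # back to the normal terminal colors


def colored_wdiff_command(old_file, new_file):
    """Build the argument list of the colored output example."""
    return [
        "wdiff", "-n",
        "-w", ESC + "[30;41m", "-x", RESET,  # deleted text: black on red
        "-y", ESC + "[30;42m", "-z", RESET,  # inserted text: black on green
        old_file, new_file,
    ]


def run_colored_wdiff(old_file, new_file):
    """Run the command and return its output (assumes wdiff is on PATH)."""
    result = subprocess.run(
        colored_wdiff_command(old_file, new_file),
        capture_output=True, text=True,
    )
    return result.stdout
```

   The returned text still contains the raw escape sequences, so it can
be written to a terminal or piped into `less -R' as in the example.
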
 452
 453    On a related note, GNU Emacs users might notice that the interactive
 454 function `compare-windows' ignores changes in whitespace, if it is
 455 given a numeric argument.  If the variable `compare-ignore-case' is
 456 non-`nil', it ignores differences in case as well.  So, in a way, this
 457 offers a kind of incremental version of `wdiff'.
 458
 459 \1f
 460 File: wdiff.info,  Node: mdiff,  Next: unify,  Prev: wdiff,  Up: Top
 461
 462 3 The multi-difference finder
 463 *****************************
 464
 465 The name `mdiff' stands for _multi-_`diff'; the program is meant to
 466 encompass the functionality of a few other `diff'-type programs.
 467 The prefix _multi-_ also reflects the fact that the program is often
 468 able to study more than two input files at once.
 469
 470    The theory of operation is simple.  The program splits all input
 471 files into a sequence of items, which may be lines or words.  `mdiff' is
 472 then said to operate either in "line mode" or in "word mode".  It then
 473 tries to find sequences of items which are repeated in the input files.
 474 Such common sequences are called "clusters" of items, and each
 475 occurrence of a repetition is called a cluster "member".  What remains,
 476 once all cluster members are conceptually removed from all input files,
 477 is a set of "differences".  The role of `mdiff' is to conveniently list
 478 either cluster members or differences.
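
   The notions of cluster and member can be illustrated by the Python
sketch below, which works in line mode.  It is a deliberately naive
stand-in, not the algorithm used by `mdiff': it only looks at runs of
exactly `threshold' consecutive lines and does not extend them, and the
name `find_clusters' is invented for this example.  The default of 4
lines mirrors the default documented for the `--threshold' option.

```python
from collections import defaultdict


def find_clusters(files, threshold=4):
    """Group every run of `threshold` consecutive lines by its content.

    files maps a file name to its list of lines.  The result maps each
    repeated run (a tuple of lines) to its members, given as
    (file name, starting index) pairs.
    """
    members = defaultdict(list)
    for name, lines in files.items():
        for start in range(len(lines) - threshold + 1):
            run = tuple(lines[start:start + threshold])
            members[run].append((name, start))
    # A run seen only once is not a cluster; it belongs to a difference.
    return {run: where for run, where in members.items() if len(where) > 1}


files = {"a": ["x", "1", "2", "y"], "b": ["1", "2", "z"]}
print(find_clusters(files, threshold=2))
# prints: {('1', '2'): [('a', 1), ('b', 0)]}
```

   In this small example, the lines `x' and `y' of the first file and the
line `z' of the second are what remains once the cluster members are
removed, that is, the differences.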
 479
 480    When input files are very similar, it is likely that clusters will
 481 encompass many items (lines or words) and differences will be small.
 482 So, most listing options inhibit the printing of cluster members.
 483 However, one may ask for the few beginning or ending items of cluster
 484 members to be printed nevertheless, as a way to provide a kind of
 485 feedback or "context" of the difference; those context items are
 486 sometimes said to be at the "horizon" of the difference.  In merged
 487 listings, cluster members may just not be printed, except maybe for a
 488 few context items at the beginning of the member (just after a
 489 difference), and a few context items at the end of the member (just
 490 before a difference).
 491
 492    When cluster members are short, or if you prefer, when the
 493 differences are not far away from each other, it is quite possible that
 494 the required context items often cover the full extent of the cluster
 495 members, which then are not inhibited anymore when this happens.  A run
 496 of differences intermixed with such non-suppressed members is called a
 497 "hunk".  Some reports produced by `mdiff' are shown as a list of
 498 hunks, and it is to be understood that common items are elided between
 499 hunks.  However, each hunk in itself has no item missing, and each item
 500 of the hunk is analysed as pertaining either to only one of the input
 501 files or to many of them.  Each hunk is preceded by a header, which
 502 explains the line position of all input files prior to the hunk itself.
 503 By comparing a hunk header with the previous hunk header, the user can
 504 have a hint about how much printing was spared.
 505
 506    When two input files are quite similar, clusters are usually
 507 presented in the same order in all files.  If a cluster member A in the
 508 first file corresponds to a cluster member A in the second file, it is
 509 likely that another cluster member B which appears _after_ A in the
 510 first file will correspond to a cluster member B in the second file
 511 which appears _after_ A as well.  So, in many cases, while producing
 512 merged listing of files, cluster members may be made to naturally
 513 correspond to one another.  However, this is not always true, in
 514 particular when the second file has been produced from the first by
 515 moving a big chunk of code away from its original position.  In such
 516 cases, we say that members have "crossed".  When members are crossed
 517 and `mdiff' has to make a merged listing, it selects one cluster member
 518 as being _naturally_ associated with its correspondent (either the pair
 519 of A's or the pair of B's) and then considers the other cluster as being
 520 part of a difference.  The crossed nature of the member may still be
 521 analysed and reported, or it may be ignored.
 522
 523    The standard `diff' program is meant for when there are exactly two
 524 input files, for which crossed members should be ignored.  `mdiff'
 525 output format has been designed in such a way that it should resemble
 526 `diff' output for this precise case.  However, `diff' formats are not
 527 sufficient for representing all cases which `mdiff' may address, and
 528 this is not mature yet.  That is why `mdiff', in its current state,
 529 still experiments with output formats, which are subject to change.
 530
 531    When the input files are not very similar, or rather different,
 532 merged listings are not very significant nor useful, and may even be
 533 rather confusing.  The best approach in such cases is to use `mdiff' for
 534 making an annotated relisting of all input files, in which cluster
 535 members are properly identified and referred to one another.
 536
 537    Statistics.
 538
 539      Read summary: 137 files, 41975 lines
 540      Work summary: 439 clusters, 1608 members, 8837 duplicate lines
 541
 542 The summary lines, triggered by the `-s' option, say that about 8837
 543 non-ignorable lines could be removed out of the 41975 that were read,
 544 by using functions, `#include', `#define', or similar devices.
 545
 546    If one manages to execute `mdiff' within GNU Emacs so the output
 547 described above is collected into the `*compilation*' buffer, the
 548 command `C-`' (`M-x next-error') will proceed to the next cluster
 549 member in the other window, and similarly for other compilation mode
 550 commands.  This is a useful way for handling `mdiff' output.
 551
 552    Each line in the hunk, after the header, comes from the compared
 553 files, but is shifted right so the first column (or the first few
 554 columns) of each line gives information about where the line is coming
 555 from.  A space indicates a line which is common to all files.  In case
 556 there are only two input files, a minus sign indicates a line from the
 557 first file and a plus sign a line from the second file.  Else, a letter
 558 from `a' to `z', or more than one letter if there are more than 26
 559 files, indicates to which file the line pertains.  If a line or a block
 560 of lines pertains to many files but not to all of them, the first column
 561 holds a vertical bar, and the line or block of lines is bracketed
 562 between `@/' and `@\' lines, which are kind of comments within the
 563 hunk.  The initial bracket lists all file letters that are related to
 564 the incoming line.
 565
 566    I initially wrote `mdiff' specifically to help cleaning a C++
 567 project which was a bit large, and in which many big monolithic classes
 568 were derived from each other most probably by rough copying followed by
 569 local modifications.  I intended to fragment most common clusters and
 570 segregate the parts into virtual methods in outer classes, and override
 571 these methods, as appropriate, with less common variants within inner
 572 classes.  `mdiff' was good at pointing me to exactly where I should
 573 look.  Of course, it never did the cleanup for me, but it helped
 574 doing the research about what should be done.  Reusing `mdiff' over the
 575 half-cleaned project gave me a finer-grained analysis of what was left
 576 to consider.
 577
 578 * Menu:
 579
 580 * mdiff invocation::            Invoking `mdiff'
 581 * Efficiency::                  Resource considerations and efficiency
 582
 583 \1f
 584 File: wdiff.info,  Node: mdiff invocation,  Next: Efficiency,  Up: mdiff
 585
 586 3.1 Invoking `mdiff'
 587 ====================
 588
 589 The format for running the `mdiff' program is:
 590
 591      mdiff OPTION ... FILE ...
 592
 593    `mdiff' reads all input FILEs and produces its results on standard
 594 output.  Optionally, standard error might receive a progress report or
 595 a few statistics.
 596
 597    `wdiff' compares files OLD_FILE and NEW_FILE and produces an
 598 annotated copy of NEW_FILE on standard output.  The empty string or the
 599 string `-' denotes standard input, but standard input cannot be used
 600 twice in the same invocation.  The complete path of a file should be
 601 given; a directory name is not accepted.  `wdiff' will exit with a
 602 status of 0 if no differences were found, a status of 1 if any
 603 differences were found, or a status of 2 for any error.
 604
 605    In this documentation, "deleted text" refers to text in OLD_FILE
 606 which is not in NEW_FILE, while "inserted text" refers to text in
 607 NEW_FILE which is not in OLD_FILE.
 608
 609    `mdiff' supports the following command line options:
 610
 611 `--version'
 612      Merely prints the version numbers on standard output, and exits
 613      without doing anything else.
 614
 615 `--help'
 616      Merely prints a page of help on standard output, and exits without
 617      doing anything else.
 618
 619 `--threshold=NUMBER'
 620 `-t NUMBER'
 621      Specifies the minimum number of non-ignorable lines which are
 622      required for two runs of lines to compare as equal.  No cluster
 623      member may ever have less than NUMBER lines.  By default, clusters
 624      have 4 lines or more.
 625
 626 `--no-deleted'
 627 `-1'
 628      Avoid producing deleted words on the output.  If neither `-1' nor
 629      `-2' is selected, the original right margin may be exceeded for
 630      some lines.
 631
 632 `--no-inserted'
 633 `-2'
 634      Avoid producing inserted words on the output.  When this flag is
 635      given, the whitespace in the output is taken from OLD_FILE instead
 636      of NEW_FILE.  If neither `-1' nor `-2' is selected, the original
 637      right margin may be exceeded for some lines.
 638
 639 `--no-common'
 640 `-3'
 641      Avoid producing common words on the output.  When this option is
 642      not selected, common words and whitespace are taken from NEW_FILE,
 643      unless option `-2' is given, in which case common words and
 644      whitespace are rather taken from OLD_FILE.  When selected,
 645      differences are separated from one another by lines of dashes.
 646      Moreover, if this option is selected at the same time as `-1' or
 647      `-2', then none of the output will have any emphasis, i.e. no bold
 648      or underlining.  Finally, if this option is not selected, but both
 649      `-1' and `-2' are, then sections of common words between
 650      differences are segregated by lines of dashes.
 651
 652 `--ignore-case'
 653 `-i'
 654      Do not consider case difference while comparing words.  Each lower
 655      case letter is seen as identical to its upper case equivalent for
 656      the purpose of deciding if two words are the same.
 657
 658 `--auto-pager'
 659 `-A'
 660      Some initiatives which were taken automatically in
 661      previous versions of `wdiff' are now put under the control of this
 662      option.  By using it, a pager is interposed whenever the `wdiff'
 663      output is directed to the user's terminal.  Without this option,
 664      no pager will be called; the user is then responsible for
 665      explicitly piping `wdiff' output into a pager, if required.
 666
 667      The pager is selected by the value of the PAGER environment
 668      variable when `wdiff' is run.  If PAGER is not defined at run
 669      time, then a default pager, selected at installation time, will be
 670      used instead.  A defined but empty value of PAGER means no pager
 671      at all.
 672
 673      When a pager is interposed through the use of this option, one of
 674      the options `-l' or `-t' is also selected, depending on whether
 675      the string `less' appears in the pager's name or not.
 676
 677      It is often useful to define `wdiff' as an alias for `wdiff -a'.
 678      However, this _hides_ the normal `wdiff' behaviour.  The default
 679      behaviour may be restored simply by piping the output from `wdiff'
 680      through `cat'.  This dissociates the output from the user's
 681      terminal.
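
The pager rules described for this option can be sketched in shell.
This is only an illustration of the documented behaviour, not the
program's own code: the function names `choose_pager' and
`emphasis_option', and the variable DEFAULT_PAGER standing in for the
pager selected at installation time, are assumptions made for the
example.

```shell
# Sketch only; choose_pager, emphasis_option and DEFAULT_PAGER are
# hypothetical names, not part of wdiff.
DEFAULT_PAGER=${DEFAULT_PAGER:-more}   # stand-in for the install-time default

choose_pager() {
    if [ -z "${PAGER+set}" ]; then
        printf '%s\n' "$DEFAULT_PAGER"   # PAGER is not defined at all
    else
        printf '%s\n' "$PAGER"           # defined; an empty value means no pager
    fi
}

emphasis_option() {
    case $1 in
        '')     ;;                       # no pager, so no option is implied
        *less*) printf '%s\n' '-l' ;;    # the string "less" is in the pager's name
        *)      printf '%s\n' '-t' ;;    # any other pager
    esac
}

pager=$(choose_pager)
echo "pager: ${pager:-none}, implied option: $(emphasis_option "$pager")"
```

Note that the test on PAGER distinguishes an undefined variable from a
defined but empty one, since the two cases are documented to behave
differently.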
 682
 683 `--printer'
 684 `-p'
 685      Use over-striking to emphasize parts of the output.  Each
 686      character of the deleted text is underlined by writing an
 687      underscore `_' first, then a backspace and then the letter to be
 688      underlined.  Each character of the inserted text is emboldened by
 689      writing it twice, with a backspace in between.  This option is not
 690      selected by default.
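
The byte sequences described above can be reproduced with a short
shell sketch.  The helper names `underline_text' and `embolden_text'
are made up for this illustration, and the loop assumes plain ASCII
text; it shows the over-striking convention only, not how the program
generates it.

```shell
# Sketch only; underline_text and embolden_text are hypothetical helpers.

underline_text() {
    text=$1
    while [ -n "$text" ]; do
        rest=${text#?}            # everything after the first character
        char=${text%"$rest"}      # the first character itself
        printf '_\b%s' "$char"    # underscore, backspace, then the character
        text=$rest
    done
}

embolden_text() {
    text=$1
    while [ -n "$text" ]; do
        rest=${text#?}
        char=${text%"$rest"}
        printf '%s\b%s' "$char" "$char"   # character, backspace, same character
        text=$rest
    done
}

underline_text deleted
printf ' '
embolden_text inserted
printf '\n'
```

Piping the result through `od -c' shows the raw underscores and
backspaces, which is a convenient way to check what a printer or pager
actually receives.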
 691
 692 `--less-mode'
 693 `-l'
 694      Use over-striking to emphasize parts of output.  This option works
 695      as option `-p', but also over-strikes whitespace associated with
 696      inserted text.  `less' shows such whitespace using reverse video.
 697      This option is not selected by default.  However, it is
 698      automatically turned on whenever `wdiff' launches the pager
 699      `less'.  See option `-a'.
 700
 701      This option is commonly used in conjunction with `less':
 702
 703           wdiff -l OLD_FILE NEW_FILE | less
 704
 705 `--terminal'
 706 `-t'
 707      Force the production of `termcap' strings for emphasising parts of
 708      output, even if the standard output is not associated with a
 709      terminal.  The `TERM' environment variable must contain the name
 710      of a valid `termcap' entry.  If the terminal description permits,
 711      underlining is used for marking deleted text, while bold or
 712      reverse video is used for marking inserted text.  This option is
 713      not selected by default.  However, it is automatically turned on
 714      whenever `wdiff' launches a pager, and it is known that the pager
 715      is _not_ `less'.  See option `-a'.
 716
 717      This option is commonly used when `wdiff' output is not redirected,
 718      but sent directly to the user terminal, as in:
 719
 720           wdiff -t OLD_FILE NEW_FILE
 721
 722      A common kludge uses `wdiff' together with the pager `more', as in:
 723
 724           wdiff -t OLD_FILE NEW_FILE | more
 725
 726      However, some versions of `more' use `termcap' emphasis for their
 727      own purposes, so strange interactions are possible.
 728
 729 `--start-delete ARGUMENT'
 730 `-w ARGUMENT'
 731      Use ARGUMENT as the "start delete" string.  This string will be
 732      output prior to any sequence of deleted text, to mark where it
 733      starts.  By default, no start delete string is used unless there
 734      is no other means of distinguishing where such text starts; in
 735      this case the default start delete string is `[-'.
 736
 737 `--end-delete ARGUMENT'
 738 `-x ARGUMENT'
 739      Use ARGUMENT as the "end delete" string.  This string will be
 740      output after any sequence of deleted text, to mark where it ends.
 741      By default, no end delete string is used unless there is no other
 742      means of distinguishing where such text ends; in this case the
 743      default end delete string is `-]'.
 744
 745 `--start-insert ARGUMENT'
 746 `-y ARGUMENT'
 747      Use ARGUMENT as the "start insert" string.  This string will be
 748      output prior to any sequence of inserted text, to mark where it
 749      starts.  By default, no start insert string is used unless there
 750      is no other means of distinguishing where such text starts; in
 751      this case the default start insert string is `{+'.
 752
 753 `--end-insert ARGUMENT'
 754 `-z ARGUMENT'
 755      Use ARGUMENT as the "end insert" string.  This string will be
 756      output after any sequence of inserted text, to mark where it ends.
 757      By default, no end insert string is used unless there is no other
 758      means of distinguishing where such text ends; in this case the
 759      default end insert string is `+}'.
 760
 761 `--avoid-wraps'
 762 `-n'
 763      Avoid spanning the end of line while showing deleted or inserted
 764      text.  Any single fragment of deleted or inserted text spanning
 765      many lines will be considered as being made up of many smaller
 766      fragments not containing a newline.  So deleted text, for example,
 767      will have an end delete string at the end of each line, just
 768      before the new line, and a start delete string at the beginning of
 769      the next line.  A long paragraph of inserted text will have each
 770      line bracketed between start insert and end insert strings.  This
 771      behaviour is not selected by default.
 772
 773
 774    Some choices are hard-wired into the program, but might well become
 775 options in later releases.  For example:
 776
 777    * No cluster may span a file boundary, that is, start near the end
 778      of one input file and continue at the beginning of the next file.
 779
 780    * A cluster may have many members from the same file.
 781
 782    * White space is ignored between the beginning of a line and the
 783      first non-white character.
 784
 785    * White space is significant when embedded in a line, or when ending
 786      a line.
 787
 788    * Lines having no significant part (only white lines for now) are
 789      "ignorable".  Such ignorable lines are logically considered as not
 790      being part of the input files for the sake of comparisons.
 791
 792    * Comments from the C language are not especially ignored.  Unless
 793      ignored for other reasons (being white lines), they are indeed
 794      significant lines.
 795
 796    * No cluster member may ever directly start nor end with ignorable
 797      lines.  However, ignorable lines may still be embedded within a
 798      cluster member.
 799
 800    * In the generated output, clusters containing the biggest number of
 801      non-ignorable lines are output first, while smaller clusters appear
 802      last.  All lines pertaining to a single cluster are output
 803      together.  Within a cluster, members are listed in the order of
 804      the initial reading of input files.
 805
 806
 807    Note that options `-p', `-t', and `-[wxyz]' are not mutually
 808 exclusive.  If you use a combination of them, you will merely
 809 accumulate the effect of each.  Option `-l' is a variant of option `-p'.
 810
 811 \1f
 812 File: wdiff.info,  Node: Efficiency,  Prev: mdiff invocation,  Up: mdiff
 813
 814 3.2 Resource considerations and efficiency
 815 ==========================================
 816
 817 Memory consumption
 818      `mdiff' can easily handle medium-sized projects.  For a 32-bit
 819      architecture, the memory requirements may be computed like this:
 820
 821         * 8 bytes per file
 822
 823         * 8 bytes per line
 824
 825         * 4 bytes per cluster
 826
 827         * 8 bytes per cluster member
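
As a worked example, the figures from the statistics quoted under
"Time consumption" in this section (137 files, 41975 lines, 439
clusters, 1608 members) can be plugged into the list above.  This is
plain shell arithmetic on the listed per-item costs only; whether
other allocations, such as the file contents themselves, come on top
of this is not stated here.

```shell
# Per-item costs from the list above, applied to the example statistics.
files=137 lines=41975 clusters=439 members=1608
total=$(( 8 * files + 8 * lines + 4 * clusters + 8 * members ))
echo "$total bytes"    # 351516 bytes, roughly 343 KiB
```

The per-line term dominates, so the estimate grows essentially with
the total number of input lines.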
 828
 829 Time consumption
 830      To evaluate the speed, consider the example shown above (*note
 831      mdiff::), and yielding these statistics:
 832
 833           Read summary: 137 files, 41975 lines
 834           Work summary: 439 clusters, 1608 members ...
 835
 836      With the files already in the memory cache and the output
 837      redirected to `/dev/null', the processing takes 3 seconds of real
 838      time on an Intel 486/100, which looks good.  I was indeed afraid of
 839      some hidden O(N^2) behaviour(1), even if the program is mostly
 840      O(N*log(N)).  Maybe one will discover or construct cases bringing
 841      `mdiff' to its knees.  So far, `mdiff' seemingly behaves well for
 842      the little problems given to it.  If we devise and generate a more
 843      traditional `diff'-like output, in which all input files are
 844      relisted, this will add some time to the processing, but it will
 845      be only linear with regard to the total length of input files.
 846
 847      There is a clever optimized sorting algorithm for _all_ substrings
 848      of a file, which might be generalised to handle words or lines for
 849      `mdiff'.  But since the program is already faster than we initially
 850      expected, there is no urgent need to resort to using such an
 851      algorithm.
 852
 853 Trading complexity for clarity
 854      When lines repeat a lot, there are surprisingly many ways to
 855      relate blocks of lines, and reporting them _all_ can make very
 856      hairy listings.  Any choice about reporting similarities, or not,
 857      is somewhat arbitrary, but we ought to make some of such choices
 858      for the program to be practical.  Some of these choices are
 859      detailed here.
 860
 861      If all members of a given cluster A are proper subsets of all
 862      members of another given cluster B, then cluster A is wholly
 863      forgotten.  However, let's presume for example that there are more
 864      members in A than in B.  Then, some members of A necessarily
 865      appear unrelated to any member of B.  In such case, it has been
 866      decided more useful to report _all_ occurrences of A members, even
 867      those embedded within occurrences of B members.  When only
 868      interested in members B, annotations pertaining to A may be
 869      perceived as clutter.  However, when interested in members of A,
 870      getting all of them is probably the most useful choice.
 871
 872      It sometimes happens that members of the very same cluster overlap.
 873      In the string `a a a', there are two overlapping members for the
 874      cluster represented by the string `a a', one from the first two
 875      `a', another from the last two `a'.  In such cases, one member of
 876      such an overlap is automatically chopped so the overlap does not
 877      occur.
 878
 879      White lines and items containing only delimiters are a possible
 880      source of a lot of complexity, if these are fully taken as
 881      significant.  Since this does not add much to clarity, they are
 882      usually better ignored, through using `--ignore-blank-lines'
 883      (`-B') or `--ignore-delimiters' (`-j').  Increasing the value of
 884      the `--minimum-size=ITEMS' (`-J ITEMS') option also cuts complexity
 885      in favor of clarity, yet some small matches may then go unnoticed.
 886      Exactly how to best adjust the ITEMS value is left for the user to
 887      decide.
 888
 889
 890    ---------- Footnotes ----------
 891
 892    (1) N is the total number of lines.
 893
 894 \1f
 895 File: wdiff.info,  Node: unify,  Next: Compatibility,  Prev: mdiff,  Up: Top
 896
 897 4 The diff format converter
 898 ***************************
 899
 900 The program `unify' has the purpose of manipulating context diffs and
 901 unified context diffs.  `unify' will accept either a regular context
 902 diff (old- or new-style) or a unified context diff as input, and
 903 generate either a unified diff or a new-style context diff as output.
 904
 905    Various other options allow you to echo the non-diff (comment) lines
 906 to stderr, modify the diff by removing the comment lines, and/or tweak
 907 the diff into a format that is good for releasing patches.
 908
 909    I think most people prefer unified context diffs in general.  But
 910 some of us just have trouble reading unidiffs, unless they get very
 911 simple.  Usual context diffs show how the code was _before_, and then,
 912 how the code is _after_.  Some people just prefer to understand twice
 913 thoroughly rather than once fuzzily.  The tool is useful for those who
 914 handle many diffs from various sources, and want them in a uniform format.
 915
 916 * Menu:
 917
 918 * unify invocation::            Invoking `unify'
 919
 920 \1f
 921 File: wdiff.info,  Node: unify invocation,  Up: unify
 922
 923 4.1 Invoking `unify'
 924 ====================
 925
 926 The format for running the `unify' program is:
 927
 928      unify OPTION ... [FILE]
 929
 930    The program reads the diff to convert from FILE, or if the source
 931 file is not mentioned, it will be read from the standard input.  The
 932 default is to output the diff in the opposite style of whatever was
 933 input, that is, regular context diffs will become unified context diffs,
 934 and unified context diffs will become regular context diffs, but this
 935 can be overridden by options.
 936
 937    `unify' supports the following command line options:
 938
 939 `--version'
 940      Merely prints the version numbers on standard output, and exits
 941      without doing anything else.
 942
 943 `--help'
 944      Merely prints a page of help on standard output, and exits without
 945      doing anything else.
 946
 947 `--context-diffs'
 948 `-c'
 949      Forces context diff output.
 950
 951 `--echo-comments'
 952 `-e'
 953      Echoes non-diff (comment) lines to stderr.  If a comment line is
 954      being stripped via the `-p' option, it is echoed with a preceding
 955      `!!! '.  If all comments are being stripped (via the `-s' option),
 956      no special designation is given.
 957
 958 `--old-diffs'
 959 `-o'
 960      Is used to force a context diff to be interpreted as being of the
 961      old-style even if it has the extra trailing asterisks that
 962      normally mark the new-style.  This is only needed if `unify' fails
 963      to work with your version of `diff'.
 964
 965 `--patch-format'
 966 `-p'
 967      Turns on patch-output mode.  This will do two things:
 968
 969        1. Transform a header like:
 970
 971                *** orig/file    Sat May  5 02:59:37 1990
 972                --- ./file       Sat May  5 03:00:08 1990
 973
 974           into a line of `Index: file' -- we choose the shorter name
 975           and strip a leading `./' sequence if present.
 976
 977        2. Strip lines that begin with `Only in ', `Common subdir',
 978           `Binary files' or `diff -'.
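
The header transformation in step 1 can be sketched in shell.  The
function name `index_line' is hypothetical, and two details are
assumptions of this sketch rather than documented behaviour: the
shorter name is chosen before the leading `./' is stripped, and the
old name wins when both names have the same length.

```shell
# Sketch only; index_line is a hypothetical helper, not part of unify.
index_line() {
    old=$1 new=$2
    # Choose the shorter of the two names (ties go to the old name here).
    if [ "${#new}" -lt "${#old}" ]; then name=$new; else name=$old; fi
    # Strip one leading "./" sequence if present.
    printf 'Index: %s\n' "${name#./}"
}

index_line orig/file ./file    # prints "Index: file"
```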
 979
 980
 981 `-P'
 982      Is the same as `-p'.
 983
 984 `--strip-comments'
 985 `-s'
 986      Strips non-diff lines (comments).
 987
 988 `--unidiffs'
 989 `-u'
 990      Forces unified diff output.
 991
 992 `-U'
 993      Is the same as `-up'.
 994
 995 `--use-equals'
 996 `-='
 997      Will use a `=' prefix in a unified diff for lines that are common
 998      to both files instead of using a leading space.  Though this is
 999      harder to read, it is less likely to be mangled by
1000      trailing-space-stripping sites when posted to Usenet.
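
The effect of this option on the common lines can be imitated with
`sed' on an existing unified diff.  This is only a sketch of the
resulting format, with a made-up function name `use_equals'; it is not
how `unify' implements the option.

```shell
# Sketch only; use_equals is a hypothetical helper.
# Common lines of a unified diff start with a space; replace that space
# with "=".  Lines starting with "-", "+" or "@@" are left alone.
use_equals() {
    sed 's/^ /=/'
}

printf ' common line\n-old line\n+new line\n' | use_equals
```

Since only a leading space is rewritten, a common line whose leading
space has already been stripped in transit is not repaired by this
sketch.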
1001
1002
1003 \1f
1004 File: wdiff.info,  Node: Compatibility,  Next: Experimental,  Prev: unify,  Up: Top
1005
1006 5 How `mdiff' differs
1007 *********************
1008
1009 The GNU project already has a `diff' program which is part of the GNU
1010 diffutils package.  There also are various non-GNU `diff' programs
1011 provided by various constructors.
1012
1013    There is also the well-established `wdiff' which uses `diff' under
1014 the hood. It differs slightly from `wdiff2', its intended `mdiff'-based
1015 successor.
1016
1017    The following sections compare `mdiff' specifications with both GNU
1018 `diff' and with `wdiff'.
1019
1020 * Menu:
1021
1022 * diff Compatibility::          Differences with `diff'
1023 * wdiff Compatibility::         Differences with `wdiff'
1024
1025 \1f
1026 File: wdiff.info,  Node: diff Compatibility,  Next: wdiff Compatibility,  Up: Compatibility
1027
1028 5.1 Differences with `diff'
1029 ===========================
1030
1031 GNU `diff' is a program which has matured over a long time, and whose
1032 algorithms are based on computer science literature.  It is a fast
1033 program.  By comparison, `mdiff' is no more than a program kludged up
1034 rapidly to satisfy a few precise needs.  It only tries not to be
1035 inordinately slow.
1036
1037    Most `diff' options are accepted by `mdiff' under the same short and
1038 long option names, and `mdiff' is able to produce similar output, which
1039 makes it easier to learn and less surprising to users.  Yet, some
1040 differences exist in option decoding and output format.  Since `diff'
1041 and `mdiff' use different matching algorithms, it is very likely that
1042 the differences will not be analyzed identically.
1043
1044    * A few `diff' options, which either accept no argument or require a
1045      mandatory one, are implemented in `mdiff' as options accepting an
1046      optional argument.  This may yield some surprises, for example,
1047      `-c4bir' would be accepted by `diff' and rejected by `mdiff', yet
1048      it may be rewritten `-birc4' for both.  See below.
1049
1050    * Options `-c' and `-u' in `diff' ask for regular context and
1051      unified context output, respectively, without specifying the number
1052      of lines in the context.  `diff' has `-C NUMBER' and `-U NUMBER'
1053      options for asking for regular or unified context diffs with
1054      NUMBER context lines.  If `-c4' asks for four lines of context,
1055      the `4' is not really an argument of `-c', and this is really
1056      interpreted as `-c -4', where `-NUMBER' is meant to be a
1057      deprecated option for choosing the number of context lines, option
1058      which `mdiff' does not implement.  In `mdiff', `-c' and `-u' are
1059      really two options which are allowed to receive an optional
1060      argument, so the number of lines may, or may not be given, at the
1061      choice of the user.  In `mdiff', options `-C' and `-U' are
1062      completely equivalent to `-c' and `-u', and are provided only for
1063      the sake of compatibility.
1064
1065    * Option `-v' in `diff' means `--version', while it means
1066      `--verbose' in `mdiff'.  There is no short form for `--version' in
1067      `mdiff'.
1068
1069
1070 \1f
1071 File: wdiff.info,  Node: wdiff Compatibility,  Prev: diff Compatibility,  Up: Compatibility
1072
1073 5.2 Differences with `wdiff'
1074 ============================
1075
1076 Even if `mdiff' is meant to fully support `wdiff', options have been
1077 shuffled around so `mdiff' could better merge both `diff' and `wdiff'
1078 options in a common scheme.  `diff' habits were almost always favored
1079 in this option reorganisation.
1080
1081    `wdiff2' is now a mere front-end to `mdiff' that only rewrites the
1082 options.  The following notes apply.
1083
1084    * Some options are just transmitted unchanged, these are `-1', `-2',
1085      `-3' and `-i'.
1086
1087    * Option `-c' also gets turned into `-i', to be compatible with
1088      `wdiff' versions up to `0.4'.
1089
1090    * Simple option `-a' in `wdiff' becomes `-A' in `mdiff', `-l'
1091      becomes `-k', `-n' becomes `-m', `-p' becomes `-o', `-s' becomes
1092      `-v' and `-t' becomes `-z'.
1093
1094    * Options introducing strings, which are `-w', `-x', `-y' and `-z'
1095      in `wdiff', respectively become `-Y', `-Z', `-Q' and `-R' in
1096      `mdiff'.
1097
1098    * Options `-C', `-h' and `-v' are processed directly by `wdiff' and
1099      are not transmitted to `mdiff'.
1100
1101    * Further, the `-C' option of `wdiff' has no equivalent in `mdiff'.
1102
1103    * A new option `-q' inhibits the message which explains how `mdiff'
1104      might have been directly called.
1105
1106    * The option `--diff-input' (`-d') from `wdiff' isn't supported by
1107      `wdiff2' (yet).
1108
1109
1110 \1f
1111 File: wdiff.info,  Node: Experimental,  Prev: Compatibility,  Up: Top
1112
1113 6 Experimental programs
1114 ***********************
1115
1116 The GNU wdiff source package contains sources for a number of tools
1117 besides `wdiff' itself.  These are considered experimental: they might
1118 work for you, but they might just as well fail.  The following programs
1119 are considered experimental:
1120
1121    * `mdiff'
1122
1123    * `wdiff2'
1124
1125    * `unify'
1126
1127    Building these applications can be configured at build time by
1128 passing `--enable-experimental' to the `configure' script.
1129
1130    _For this build, they have been enabled._ If you encounter a bug in
1131 an experimental program, the maintainers would still like to learn
1132 about it, but there is a greater chance that they decide not to fix
1133 such issues unless you provide a patch as well.
1134
1135 * Menu:
1136
1137 * Experimental History::   History of the experimental programs
1138
1139 \1f
1140 File: wdiff.info,  Node: Experimental History,  Up: Experimental
1141
1142 6.1 History of the Experimental programs
1143 ========================================
1144
1145 Many users suggested features, which in turn called for the
1146 integration of `wdiff' into GNU diffutils.  Collaboration proved to be
1147 rather difficult.  After a few years, the `wdiff' author finally gave in
1148 and created `mdiff' as a way to break out of the situation and to
1149 become able to proceed with users' suggestions.
1150
1151    Before `mdiff' and the new `wdiff2' based on it were officially
1152 released, the original author resigned maintainership.  The new
1153 maintainers had little experience with the code, and therefore decided
1154 to mark it experimental. That way, the code wouldn't be lost, but it
1155 would be clear that it wasn't as tested as the good old `wdiff' command.
1156
1157
1158 \1f
1159 Tag Table:
1160 Node: Top\7f1559
1161 Node: Overview\7f2888
1162 Node: wdiff\7f4586
1163 Node: wdiff invocation\7f5303
1164 Node: wdiff Examples\7f14037
1165 Node: mdiff\7f18109
1166 Node: mdiff invocation\7f24944
1167 Node: Efficiency\7f34596
1168 Ref: Efficiency-Footnote-1\7f38298
1169 Node: unify\7f38338
1170 Node: unify invocation\7f39435
1171 Node: Compatibility\7f41790
1172 Node: diff Compatibility\7f42468
1173 Node: wdiff Compatibility\7f44725
1174 Node: Experimental\7f46128
1175 Node: Experimental History\7f46987
1176 \1f
1177 End Tag Table
1178
1179 \1f
1180 Local Variables:
1181 coding: utf-8
1182 End: