doc/find-maint.texi

   1 \input texinfo @c -*-texinfo-*-
   2 @c %**start of header
   3 @setfilename find-maint.info
   4 @settitle Maintaining Findutils
   5 @c For double-sided printing, uncomment:
   6 @c @setchapternewpage odd
   7 @c %**end of header
   8
   9 @include versionmaint.texi
  10
  11 @iftex
  12 @finalout
  13 @end iftex
  14
  15 @dircategory GNU organization
  16 @direntry
  17 * Maintaining Findutils: (find-maint).        Maintaining GNU findutils
  18 @end direntry
  19
  20 @copying
  21 This manual explains how GNU findutils is maintained, how changes should
  22 be made and tested, and what resources exist to help developers.
  23
  24 This is edition @value{EDITION}, for findutils version @value{VERSION}.
  25
  26 Copyright @copyright{} 2007, 2008, 2010, 2011 Free Software Foundation,
  27 Inc.
  28
  29 Permission is granted to copy, distribute and/or modify this document
  30 under the terms of the GNU Free Documentation License, Version 1.3
  31 or any later version published by the Free Software Foundation;
  32 with no Invariant Sections, with no
  33 Front-Cover Texts, and with no Back-Cover Texts.
  34 A copy of the license is included in the section entitled ``GNU
  35 Free Documentation License''.
  36 @end copying
  37
  38 @titlepage
  39 @title Maintaining Findutils
  40 @subtitle Edition @value{EDITION}, for GNU findutils version @value{VERSION}
  41 @subtitle @value{UPDATED}
  42 @author by James Youngman
  43
  44 @page
  45 @vskip 0pt plus 1filll
  46 @insertcopying
  47 @end titlepage
  48
  49 @contents
  50
  51 @ifnottex
  52 @node Top, Introduction, (dir), (dir)
  53 @top Maintaining GNU Findutils
  54
  55 @insertcopying
  56 @end ifnottex
  57
  58 @menu
  59 * Introduction::
  60 * Maintaining GNU Programs::
  61 * Design Issues::
  62 * Coding Conventions::
  63 * Tools::
  64 * Using the GNU Portability Library::
  65 * Documentation::
  66 * Testing::
  67 * Bugs::
  68 * Distributions::
  69 * Internationalisation::
  70 * Security::
  71 * Making Releases::
  72 * GNU Free Documentation License::
  73 @end menu
  74
  75
  76
  77
  78
  79 @node Introduction
  80 @chapter Introduction
  81
  82 This document explains how to contribute to and maintain GNU
  83 Findutils.  It concentrates on developer-specific issues.  For
  84 information about how to use the software please refer to
  85 @ref{Introduction, ,Introduction,find,The Findutils manual}.
  86
  87 This manual aims to be useful without necessarily being verbose.  It's
  88 also a recent document, so there will be many areas in which
  89 improvements can be made.  If you find that the document misses out
  90 important information or any part of the document is so terse as to
  91 be unhelpful, please ask for help on the @email{bug-findutils@@gnu.org}
  92 mailing list.  We'll try to improve this document too.
  93
  94
  95 @node Maintaining GNU Programs
  96 @chapter Maintaining GNU Programs
  97
  98 GNU Findutils is part of the GNU Project and so there are a number of
  99 documents which set out standards for the maintenance of GNU
 100 software.
 101
 102 @table @file
 103 @item standards.texi
 104 GNU Project Coding Standards.  All changes to findutils should comply
 105 with these standards.  In some areas we go somewhat beyond the
 106 requirements of the standards, but these cases are explained in this
 107 manual.
 108 @item maintain.texi
 109 Information for Maintainers of GNU Software.  This document provides
 110 guidance for GNU maintainers.  Everybody with commit access should
 111 read this document.   Everybody else is welcome to do so too, of
 112 course.
 113 @end table
 114
 115
 116
 117 @node Design Issues
 118 @chapter Design Issues
 119
 120 The findutils package is installed on a great many systems, usually as a
 121 fundamental component.  The programs in the package are often used in
 122 order to successfully boot or fix the system.
 123
 124 This fact means that for findutils we bear in mind considerations that
 125 may not apply so much to other packages.  For example, the fact
 126 that findutils is often a base component motivates us to
 127 @itemize
 128 @item Limit dependencies on libraries
 129 @item Avoid dependencies on other large packages (for example, interpreters)
 130 @item Be conservative when making changes to the 'stable' release branch
 131 @end itemize
 132
 133 All those considerations come before functionality.  Functional
 134 enhancements are still made to findutils, but these are almost
 135 exclusively introduced in the 'development' release branch, to allow
 136 extensive testing and proving.
 137
 138 Sometimes it is useful to have a priority list to provide guidance
 139 when making design trade-offs.   For findutils, that priority list is:
 140
 141 @enumerate
 142 @item Correctness
 143 @item Standards compliance
 144 @item Security
 145 @item Backward compatibility
 146 @item Performance
 147 @item Functionality
 148 @end enumerate
 149
 150 For example, we support the @code{-exec} action because POSIX
 151 compliance requires this, even though there are security problems with
 152 it and we would otherwise prefer people to use @code{-execdir}.  There
 153 are also cases where some performance is sacrificed in the name of
 154 security.  For example, the sanity checks that @code{find} performs
 155 while traversing a directory tree may slow it down.   We adopt
 156 functional changes, and functional changes are allowed to make
 157 @code{find} slower, but only if there is no detectable impact on users
 158 who don't use the feature.
 159
 160 Backward-incompatible changes do get made in order to comply with
 161 standards (for example the behaviour of @code{-perm -...} changed in
 162 order to comply with POSIX).  However, they don't get made in order to
 163 provide better ease of use; for example the semantics of @code{-size
 164 -2G} are almost always unexpected by users, but we retain the current
 165 behaviour because of backward compatibility and for its similarity to
 166 the block-rounding behaviour of @code{-size -30}.  We might introduce
 167 a change which does not have the unfortunate rounding behaviour, but
 168 we would choose another syntax (for example @code{-size '<2G'}) for
 169 this.
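
The rounding behaviour is easier to see with the arithmetic written
out.  The following is only a sketch, not code from findutils: the
name @code{size_less_than} and its parameters are invented for
illustration.  It assumes, as the @code{find} documentation describes,
that a file's size is rounded up to a whole number of units before the
comparison is made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not taken from findutils: models `-size -N<unit>'.
   The size is rounded up to whole units before being compared with N.  */
static bool
size_less_than (uint64_t file_bytes, uint64_t unit_bytes, uint64_t n)
{
  /* A zero unit would be a bug in the caller, not a user error.  */
  assert (unit_bytes > 0);

  /* Round up without risking overflow for very large sizes.  */
  uint64_t units = file_bytes / unit_bytes + (file_bytes % unit_bytes != 0);
  return units < n;
}
```

With a unit of 1 GiB and N equal to 2, a file of exactly 1 GiB counts
as one unit and matches, but a file just one byte larger counts as two
units and does not match, even though it is far smaller than 2 GiB.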
 170
 171 In a general sense, we try to do test-driven development of the
 172 findutils code; that is, we try to implement test cases for new
 173 features and bug fixes before modifying the code to make the test
 174 pass.  Some features of the code are tested well, but the test
 175 coverage for other features is less good.  If you are about to modify
 176 the code for a predicate and aren't sure about the test coverage, use
 177 @code{grep} on the test directories and measure the coverage with
 178 @code{lcov} or another test coverage tool.
 179
 180 You should be able to use the @code{coverage} Makefile target (it's
 181 defined in @file{maint.mk}) to generate a test coverage report for
 182 findutils.  Due to limitations in @code{lcov}, this only works if
 183 your build directory is the same as the source directory (that is,
 184 you're not using a VPATH build configuration).
 185
 186 Lastly, we try not to depend on having a ``working system''.  The
 187 findutils suite is used for diagnosis of problems, and this applies
 188 especially to @code{find}.  We should ensure that @code{find} still
 189 works on relatively broken systems, for example systems with damaged
 190 @file{/etc/passwd} or @file{/etc/fstab} files.  Another interesting
 191 example is the case where a system is a client of one or more
 192 unresponsive NFS servers.  On such a system, if you try to stat all
 193 mount points, your program will hang indefinitely, waiting for the
 194 remote NFS server to respond.
 195
 196 Another interesting but unusual case is broken NFS servers and corrupt
 197 filesystems; sometimes they return `impossible' file modes.  It's
 198 important that @code{find} does not entirely fail when encountering such a
 199 file.
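
One way to tolerate an `impossible' mode is to give every classifier a
fallback result instead of assuming that one of the known file types
must match.  The sketch below illustrates the idea; the function
@code{type_letter} and the letters it returns are assumptions made for
this example, not the code used by @code{find}.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: map the file type in ST to a one-letter code.
   A mode that matches no known type yields '?' rather than a crash,
   since such a mode reflects the state of the file system and is not
   a bug in the program.  */
static char
type_letter (const struct stat *st)
{
  /* A NULL argument, by contrast, would be a bug in the caller.  */
  assert (st != NULL);

  mode_t mode = st->st_mode;
  if (S_ISREG (mode))  return 'f';
  if (S_ISDIR (mode))  return 'd';
  if (S_ISLNK (mode))  return 'l';
  if (S_ISCHR (mode))  return 'c';
  if (S_ISBLK (mode))  return 'b';
  if (S_ISFIFO (mode)) return 'p';
  if (S_ISSOCK (mode)) return 's';
  return '?';			/* Unknown or corrupt type bits.  */
}
```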
 200
 201
 202 @node Coding Conventions
 203 @chapter Coding Conventions
 204
 205 Coding style documents which set out to establish a uniform look and
 206 feel to source code have worthy goals, for example greater ease of
 207 maintenance and readability.  However, I do not believe that in
 208 general coding style guide authors can envisage every situation, and
 209 it is always possible that it might on occasion be necessary to break
 210 the letter of the style guide in order to honour its spirit, or to
 211 better achieve the style guide's goals.
 212
 213 I've certainly seen many style guides outside the free software world
 214 which make bald statements such as ``functions shall have exactly one
 215 return statement''.  The desire to ensure consistency and obviousness
 216 of control flow is laudable, but it is all too common for such bald
 217 requirements to be followed unthinkingly.  Certainly I've seen such
 218 coding standards result in unmaintainable code with terrible
 219 infelicities such as functions containing @code{if} statements nested
 220 nine levels deep.  I suppose such coding standards don't survive in
 221 free software projects because they tend to drive away potential
 222 contributors or tend to generate heated discussions on mailing lists.
 223 Equally, a nine-level-deep function in a free software program would
 224 quickly get refactored, assuming it is obvious what the function is
 225 supposed to do...
 226
 227 Be that as it may, the approach I will take for this document is to
 228 explain some idioms and practices in use in the findutils source code,
 229 and leave it up to the reader's engineering judgement to decide which
 230 considerations apply to the code they are working on, and whether or
 231 not there is sufficient reason to ignore the guidance in current
 232 circumstances.
 233
 234
 235 @menu
 236 * Make the Compiler Find the Bugs::
 237 * Factor Out Repeated Code::
 238 * The File System Is Being Modified::
 239 * Don't Trust the File System Contents::
 240 * Debugging is For Users Too::
 241 @end menu
 242
 243 @node    Make the Compiler Find the Bugs
 244 @section Make the Compiler Find the Bugs
 245
 246 Finding bugs is tedious.  If you have a filesystem containing two
 247 million files, and a find command line should print one million of
 248 them, but in fact it misses out 1%, you can tell the program is
 249 printing the wrong result only if you know the right answer for that
 250 filesystem at that time.  If you don't know this, you may just not
 251 find out about that bug.  For this reason it is important to have a
 252 comprehensive test suite.
 253
 254 The test suite is of course not the only way to find the bugs.  The
 255 findutils source code makes liberal use of the @code{assert} macro.  While on
 256 the one hand these assertions might be a performance drain, the performance
 257 impact of most of these is negligible compared to the time taken to
 258 fetch even one sector from a disk drive.
 259
 260 Assertions should not be used to check the results of operations which
 261 may be affected by the program's external environment.  For example,
 262 never assert that a file could be opened successfully.  Errors
 263 relating to problems with the program's execution environment should
 264 be diagnosed with a user-oriented error message.  An assertion failure
 265 should always denote a bug in the program.
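
The distinction can be illustrated with a small sketch.  The function
@code{count_lines} below is hypothetical and exists only for this
example.  Its assertions guard against mistakes by the caller, which
would be bugs, while a failure to open or read the file is reported to
the user as an ordinary error.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the lines in FILENAME.
   Returns 0 and stores the count in *RESULT on success; returns -1
   after printing a diagnostic if the file cannot be read.  */
static int
count_lines (const char *filename, unsigned long *result)
{
  /* Passing NULL would be a bug in the caller, so assert.  */
  assert (filename != NULL);
  assert (result != NULL);

  FILE *fp = fopen (filename, "r");
  if (fp == NULL)
    {
      /* The environment let us down: tell the user, don't assert.  */
      fprintf (stderr, "cannot open %s: %s\n", filename, strerror (errno));
      return -1;
    }

  unsigned long lines = 0;
  int c;
  while ((c = getc (fp)) != EOF)
    if (c == '\n')
      lines++;

  int failed = ferror (fp);
  if (fclose (fp) != 0)
    failed = 1;
  if (failed)
    {
      fprintf (stderr, "error reading %s\n", filename);
      return -1;
    }

  *result = lines;
  return 0;
}
```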
 266
 267 Don't use @code{assert} to catch not-fully-implemented features of your
 268 code.  Finish the implementation, disable the code, or leave the
 269 unfinished version on a local branch.
 270
 271 Several programs in the findutils suite perform self-checks.  See for
 272 example the function @code{pred_sanity_check} in @file{find/pred.c}.
 273 This is generally desirable.
 274
 275 There are also a number of small ways in which we can help the
 276 compiler to find the bugs for us.
 277
 278 @subsection Constants in Equality Testing
 279
 280 It's a common error to write @code{=} when @code{==} is meant.
 281 Sometimes this happens in new code and is simply due to finger
 282 trouble.  Sometimes it is the result of the inadvertent deletion of a
 283 character.  In any case, there is a subset of cases where we can
 284 persuade the compiler to generate an error message when we make this
 285 mistake; this is where the equality test is with a constant.
 286
 287 This is an example of a vulnerable piece of code.
 288
 289 @example
 290 if (x == 2)
 291  ...
 292 @end example
 293
 294 A simple typo converts the above into
 295
 296 @example
 297 if (x = 2)
 298  ...
 299 @end example
 300
 301 We've introduced a bug; the condition is always true, and the value of
 302 @code{x} has been changed.  However, a simple change to our practice
 303 would have made us immune to this problem:
 304
 305 @example
 306 if (2 == x)
 307  ...
 308 @end example
 309
 310 Now the typo @code{if (2 = x)} is rejected by the compiler.  The Emacs keystroke @kbd{M-t} can usually be used to swap the operands.
 311
 312
 313 @subsection Spelling of ASCII NUL
 314
 315 Strings in C are just sequences of characters terminated by a NUL.
 316 The ASCII NUL character has the numerical value zero.  It is normally
 317 represented in C code as @samp{\0}.  Here is a typical piece of C
 318 code:
 319
 320 @example
 321 *p = '\0';
 322 @end example
 323
 324 Consider what happens if there is an unfortunate typo:
 325
 326 @example
 327 *p = '0';
 328 @end example
 329
 330 We have changed the meaning of our program and the compiler cannot
 331 diagnose this as an error.  Our string is no longer terminated.  Bad
 332 things will probably happen.  It would be better if the compiler could
 333 help us diagnose this problem.
 334
 335 In C, the type of @code{'\0'} is in fact int, not char.  This provides
 336 us with a simple way to avoid this error.  The constant @code{0} has
 337 the same value and type as the constant @code{'\0'}.  However, it is
 338 not as vulnerable to typos.    For this reason I normally prefer to
 339 use this code:
 340
 341 @example
 342 *p = 0;
 343 @end example
 344
 345
 346 @node    Factor Out Repeated Code
 347 @section Factor Out Repeated Code
 348
 349 Repeated code imposes a greater maintenance burden and increases the
 350 exposure to bugs.  For example, if you discover that something you
 351 want to implement has some similarity with an existing piece of code,
 352 don't cut and paste it.  Instead, factor the code out.  The risk of
 353 cutting and pasting the code, particularly if you do this several
 354 times, is that you end up with several copies of the same code.
 355
 356 If the original code had a bug, you now have N places where this needs
 357 to be fixed.  It's all too easy to miss some out when trying to fix the
 358 bug.  Equally, it's quite possible that when pasting the code into
 359 some function, the pasted code was not quite adapted correctly to its
 360 new environment.  To pick a contrived example, perhaps it modifies a
 361 global variable which that code shouldn't be touching in its new
 362 home.  Worse, perhaps it makes some unstated assumption about the
 363 nature of the input arguments which is in fact not true for the
 364 context of the now duplicated code.
 365
 366 A good example of the use of refactoring in findutils is the
 367 @code{collect_arg} function in @file{find/parser.c}.  A less clear-cut
 368 but larger example is the factoring out of code which would otherwise
 369 have been duplicated between @file{find/find.c} and
 370 @file{find/ftsfind.c}.
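
To give a flavour of this kind of factoring, here is a sketch of a
helper that many option parsers could share instead of each repeating
the same ``is there another argument?'' check.  It is written in the
spirit of @code{collect_arg} but is not copied from
@file{find/parser.c}; the name @code{take_arg} and its exact interface
are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: fetch the argument at ARGV[*ARG_PTR], if there
   is one, and advance *ARG_PTR past it.  On failure *COLLECTED is set
   to NULL and *ARG_PTR is left unchanged.  */
static bool
take_arg (char **argv, int *arg_ptr, const char **collected)
{
  assert (arg_ptr != NULL);
  assert (collected != NULL);

  if (argv == NULL || argv[*arg_ptr] == NULL)
    {
      *collected = NULL;
      return false;		/* Ran off the end of the command line.  */
    }

  *collected = argv[*arg_ptr];
  (*arg_ptr)++;
  return true;
}
```

Each parser function then needs only one call and one check, and a fix
to the end-of-arguments handling is made in a single place.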
 371
 372 The findutils test suite is comprehensive enough that refactoring code
 373 should not generally be a daunting prospect from a testing point of
 374 view.  Nevertheless there are some areas which are only
 375 lightly-tested:
 376
 377 @enumerate
 378 @item Tests on the ages of files
 379 @item Code which deals with the values returned by operating system calls (for example handling of ENOENT)
 380 @item Code dealing with OS limits (for example, limits on path length
 381 or exec arguments)
 382 @item Code relating to features not all systems have (for example
 383 Solaris Doors)
 384 @end enumerate
 385
 386 Please exercise caution when working in those areas.
 387
 388
 389 @node    Debugging is For Users Too
 390 @section Debugging is For Users Too
 391
 392 Debug and diagnostic code is often used to verify that a program is
 393 working in the way its author thinks it should be.  But users are
 394 often uncertain about what a program is doing, too.  Exposing a
 395 little more diagnostic information to them can help.  Much of the diagnostic
 396 code in @code{find}, for example, is controlled by the @samp{-D} flag,
 397 as opposed to C preprocessor directives.
 398
 399 Making diagnostic messages available to users means that the
 400 phrasing of the diagnostic messages becomes important, too.
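
The difference between run-time and compile-time control can be
sketched as follows.  This is an illustration only: the category
names, @code{enable_debug} and @code{debug_msg} are hypothetical and
do not describe how @code{find} implements @samp{-D}.  The point is
that the messages are compiled in unconditionally and selected by a
mask which the user sets from the command line.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical debug categories; the names are illustrative only.  */
enum debug_flags
{
  DEBUG_NONE = 0,
  DEBUG_STAT = 1 << 0,
  DEBUG_TREE = 1 << 1
};

static unsigned int debug_mask = DEBUG_NONE;

/* Enable one category by name, as an option parser might do.
   Returns 0 on success, -1 for an unknown name.  */
static int
enable_debug (const char *name)
{
  assert (name != NULL);
  if (strcmp (name, "stat") == 0)
    debug_mask |= DEBUG_STAT;
  else if (strcmp (name, "tree") == 0)
    debug_mask |= DEBUG_TREE;
  else
    return -1;
  return 0;
}

/* Emit a diagnostic only if its category was enabled at run time.
   Returns the number of messages written (0 or 1).  */
static int
debug_msg (unsigned int category, const char *fmt, ...)
{
  if ((debug_mask & category) == 0)
    return 0;

  va_list ap;
  va_start (ap, fmt);
  vfprintf (stderr, fmt, ap);
  va_end (ap);
  return 1;
}
```

Because nothing is hidden behind @code{#ifdef}, a user can turn the
diagnostics on in an ordinary installed binary without rebuilding it.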
 401
 402
 403 @node    Don't Trust the File System Contents
 404 @section Don't Trust the File System Contents
 405
 406 People use @code{find} to search in directories created by other
 407 people.  Sometimes they do this to check for suspicious activity (for
 408 example to look for new setuid binaries).  This means that it would be
 409 bad if @code{find} were vulnerable to, say, a security problem
 410 exploitable by constructing a specially-crafted filename.  The same
 411 consideration would apply to @code{locate} and @code{updatedb}.
 412
 413 Henry Spencer said this well in his fifth commandment:
 414 @quotation
 415 Thou shalt check the array bounds of all strings (indeed, all arrays),
 416 for surely where thou typest @samp{foo} someone someday shall type
 417 @samp{supercalifragilisticexpialidocious}.
 418 @end quotation
 419
 420 Symbolic links can often be a problem.  If @code{find} calls
 421 @code{lstat} on something and discovers that it is a directory, it's
 422 normal for @code{find} to recurse into it.  Even if the @code{chdir}
 423 system call is used immediately, there is still a window of
 424 opportunity between the @code{lstat} and the @code{chdir} in which a
 425 malicious person could rename the directory and substitute a symbolic
 426 link to some other directory.
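
A common defence is to check, after the @code{chdir}, that the
directory we arrived in is the same object that @code{lstat} examined.
The sketch below shows that idea under simplifying assumptions; the
function @code{safe_chdir} is hypothetical and is not the findutils
implementation, which has more cases to consider.  Note that the check
detects a substitution after the fact rather than preventing it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: change into DIR only if it is a real directory
   (not a symbolic link) and is still the same object once we are
   inside it.  Returns 0 on success, or -1 with errno set.  */
static int
safe_chdir (const char *dir)
{
  struct stat before, after;
  assert (dir != NULL);

  if (lstat (dir, &before) != 0)
    return -1;
  if (!S_ISDIR (before.st_mode))
    {
      errno = ENOTDIR;		/* Also covers DIR being a symlink.  */
      return -1;
    }

  /* Remember where we are so that we can retreat.  */
  int saved = open (".", O_RDONLY);
  if (saved < 0)
    return -1;

  int result = -1;
  int err = 0;
  if (chdir (dir) != 0 || stat (".", &after) != 0)
    err = errno;
  else if (before.st_dev != after.st_dev || before.st_ino != after.st_ino)
    err = ELOOP;		/* Something was swapped in under us.  */
  else
    result = 0;

  /* On failure go back; a real program might treat a failed
     fchdir as fatal rather than merely reporting it.  */
  if (result != 0 && fchdir (saved) != 0)
    err = errno;
  close (saved);

  if (result != 0)
    errno = err;
  return result;
}
```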
 427
 428 @node    The File System Is Being Modified
 429 @section The File System Is Being Modified
 430
 431 The filesystem gets modified while you are traversing it.  For
 432 example, it's normal for files to get deleted while @code{find} is
 433 traversing a directory.  Issuing an error message seems helpful when a
 434 file is deleted from the one directory you are interested in, but if
 435 @code{find} is searching 15000 directories, such a message becomes
 436 less helpful.
 437
 438 Bear in mind also that it is possible that the directory @code{find} is
 439 currently searching could be moved to another point in the filesystem,
 440 and that the directory in which @code{find} was started could be
 441 deleted.
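
In code, this usually means treating @code{ENOENT} from a call such as
@code{lstat} as an expected outcome rather than a surprise.  The
following sketch separates ``the file vanished'' from other failures
so that the caller can decide how loudly to report each.  The names
@code{examine_entry} and @code{quiet_if_vanished} are invented for
this example, and whether to stay quiet is a policy choice which the
sketch leaves to the caller.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

enum visit_result { VISIT_OK, VISIT_VANISHED, VISIT_ERROR };

/* Hypothetical helper: examine PATH, which was seen in a directory
   listing a moment ago.  If it has since been deleted, report
   VISIT_VANISHED, printing a message only if the caller wants one.  */
static enum visit_result
examine_entry (const char *path, struct stat *st, bool quiet_if_vanished)
{
  assert (path != NULL);
  assert (st != NULL);

  if (lstat (path, st) == 0)
    return VISIT_OK;

  if (errno == ENOENT)
    {
      if (!quiet_if_vanished)
	fprintf (stderr, "%s: file vanished during traversal\n", path);
      return VISIT_VANISHED;
    }

  /* Any other failure is always worth reporting.  */
  fprintf (stderr, "%s: %s\n", path, strerror (errno));
  return VISIT_ERROR;
}
```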
 442
 443 Henry Spencer's sixth commandment is also apposite here:
 444 @quotation
 445 If a function be advertised to return an error code in the event of
 446 difficulties, thou shalt check for that code, yea, even though the
 447 checks triple the size of thy code and produce aches in thy typing
 448 fingers, for if thou thinkest ``it cannot happen to me'', the gods
 449 shall surely punish thee for thy arrogance.
 450 @end quotation
 451
 452 There are a lot of files out there.  They come in all dates and
 453 sizes.  There is a condition out there in the real world to exercise
 454 every bit of the code base.  So we try to test that code base before
 455 someone falls over a bug.
 456
 457
 458 @node Tools
 459 @chapter Tools
 460 Most of the tools required to build findutils are mentioned in the
 461 file @file{README-hacking}.  We also use some other tools:
 462
 463 @table @asis
 464 @item System call traces
 465 Much of the execution time of find is spent waiting for filesystem
 466 operations.  A system call trace (for example, that provided by
 467 @code{strace}) shows what system calls are being made.   Using this
 468 information we can work to remove unnecessary file system operations.
 469
 470 @item Valgrind
 471 Valgrind is a tool which dynamically verifies the memory accesses a
 472 program makes to ensure that they are valid (for example, that the
 473 behaviour of the program does not in any way depend on the contents of
 474 uninitialized memory).
 475
 476 @item DejaGnu
 477 DejaGnu is the test framework used to run the findutils test suite
 478 (the @code{runtest} program is part of DejaGnu).  It would be ideal if
 479 everybody building @code{findutils} also ran the test suite, but many
 480 people don't have DejaGnu installed.  When changes are made to
 481 findutils, DejaGnu is invoked a lot. @xref{Testing}, for more
 482 information.
 483 @end table
 484
 485 @node Using the GNU Portability Library
 486 @chapter Using the GNU Portability Library
 487 The Gnulib library (@url{http://www.gnu.org/software/gnulib/}) makes a
 488 variety of systems look more like a GNU/Linux system and also applies
 489 a bunch of automatic bug fixes and workarounds.  Some of these
 490 apply to GNU/Linux systems too.  For example, the Gnulib regex
 491 implementation is used when we determine that we are building on a
 492 GNU libc system with a bug in the regex implementation.
 493
 494
 495 @section How and Why we Import the Gnulib Code
 496 Gnulib does not have a release process which results in a source
 497 tarball you can download.  Instead, the code is simply made available
 498 via git.  The code is also available via @code{git-cvspserver}, but
 499 we decided not to use this, since @code{import-gnulib.sh} depended on
 500 @code{cvs update -D}, which at the time @code{git-cvspserver} did not
 501 support.
 502
 503 GNU projects vary in how they interact with Gnulib.  Many import a
 504 selection of code from Gnulib into the working directory and then
 505 check the updated files into the repository for their project.  The
 506 coreutils project does this, for example.
 507
 508 At the last maintainer changeover for findutils (2003) it turned out
 509 that there was a lot of material in findutils in common with Gnulib,
 510 but it had not been updated in a long time.  It was difficult to
 511 figure out which source files were intended to track external sources
 512 and which were intended to contain incompatible changes, or diverge
 513 for other reasons.
 514
 515 To reduce this uncertainty, I decided to treat Gnulib much like
 516 Automake.  Files supplied by Automake are simply absent from the
 517 findutils source tree.  When Automake is run with @code{automake
 518 --add-missing --copy}, it adds in all the files it thinks should be
 519 there which aren't there already.
 520
 521 An analogous approach is taken with Gnulib.  The Gnulib code is
 522 imported from the git repository for Gnulib with a findutils helper
 523 script, @code{import-gnulib.sh}.  That script fetches a copy of the
 524 Gnulib code into the subdirectory @file{gnulib-git} and then runs
 525 @code{gnulib-tool}.  The @code{gnulib-tool} program copies the
 526 required parts of Gnulib into the findutils source tree in the
 527 subdirectory @file{gnulib}.  This process gives us the property that
 528 the code in @file{gnulib} and @file{gnulib-git} is not included in the
 529 findutils git tree.   Both directories are listed in @file{.gitignore}
 530 and so git ignores them.
 531
 532 Findutils does not use all the Gnulib code.  The modules we need are
 533 listed in the file @file{import-gnulib.config}.  The same file also
 534 indicates the version of Gnulib that we want to use.  Since Gnulib has
 535 no actual release process, we just use a date.  Both
 536 @file{import-gnulib.sh} and @file{import-gnulib.config} are in the
 537 findutils git repository.
 538
 539 The upshot of all this is that we can use the findutils git repository
 540 to track which version of Gnulib every findutils release uses.  That
 541 information is also provided when the user invokes a findutils program
 542 with the @samp{--version} option.
 543
 544 A small number of files are installed by automake and will therefore
 545 vary according to which version of automake was used to generate a
 546 release.  This includes for example boiler-plate GNU files such as
 547 @file{ABOUT-NLS}, @file{INSTALL} and @file{COPYING}.
 548
 549
 550 @section How We Fix Gnulib Bugs
 551 If we always import the Gnulib code directly from the git
 552 repository in this way, it is impossible to maintain a locally
 553 different copy of Gnulib.  This is often a benefit in that accidental
 554 version skew is prevented.
 555
 556 However, sometimes we want deliberate version skew in order to use a
 557 findutils-specific patched version of a Gnulib file, for example
 558 because we fixed a bug.
 559
 560 Gnulib is used by quite a number of GNU projects, and this means that
 561 it gets plenty of testing.  Therefore there are relatively few bugs in
 562 the Gnulib code, but it does happen from time to time.
 563
 564 However, since there is no waiting around for a Gnulib source release
 565 tarball, Gnulib bugs are generally fixed quickly.  Here is an outline
 566 of the way we would contribute a fix to Gnulib (assuming you know it
 567 is not already fixed in the current Gnulib git tree):
 568
 569 @table @asis
 570 @item Check that you have already completed a copyright assignment for Gnulib
 571 @item Begin with a vanilla git tree
 572 Download the Findutils source code from git (or use the tree you have
 573 already)
 574 @item Check out a copy of the Gnulib source
 575 Check out a copy of the Gnulib source tree.  The
 576 @code{import-gnulib.sh} script may have generated a shallow git clone
 577 as opposed to a normal, full clone in the directory @file{gnulib-git}.
 578 This means that you may not be able to clone the repository that
 579 @code{import-gnulib.sh} generates.  However, you can make a normal
 580 (full) clone with @code{git clone
 581 git://git.savannah.gnu.org/gnulib.git}.  Do this somewhere
 582 outside the findutils source tree.
 583 @item Import Gnulib from your local copy
 584 The @code{import-gnulib.sh} tool has a @samp{-d} option which you can
 585 use to import the code from a local copy of Gnulib.
 586 @item Build findutils
 587 Build findutils and run the test suite, which should pass.  In our
 588 example we assume you have just noticed a bug in Gnulib, not that
 589 recent Gnulib changes broke the findutils regression tests.
 590 @item Write a test case
 591 If in fact Gnulib did break the findutils regression tests, you can probably
 592 skip this step, since you already have a test case demonstrating the problem.
 593 Otherwise, write a findutils test case for the bug and/or a Gnulib test case.
 594 @item Fix the Gnulib bug
 595 Make sure your editor follows symbolic links so that your changes to
 596 @file{gnulib/...} actually affect the files in the git working
 597 directory you checked out earlier.   Observe that your test now passes.
 598 @item Prepare a Gnulib patch
 599 Use @code{git format-patch} to prepare the patch.  Follow the normal
 600 usage for checkin comments (take a look at the output of @code{git
 601 log}).  Check that the patch conforms with the GNU coding standards,
 602 and email it to the Gnulib mailing list.
 603 @item Wait for the patch to be applied
 604 Once your bug fix has been applied, you can update your local directory
 605 from git, re-import the code into Findutils (still using the @code{-d}
 606 option), and re-run the tests.  This verifies that the fix the Gnulib
 607 team made actually fixes your problem.
 608 @item Reimport the Gnulib code
 609 Update the findutils file @file{import-gnulib.config} to specify a git
 610 commit which is at or after the point at which the bug fix was committed to
 611 Gnulib.  You can find the current commit ID with @code{git rev-parse HEAD}.  Finally,
 612 re-import the Gnulib code directly from git by using
 613 @samp{import-gnulib.sh} without the @samp{-d} option, and run the
 614 tests again.  This verifies that there was no remaining local change
 615 that we were relying on to fix the bug.   Make sure you checked
 616 everything in by running @code{git status}.
 617 @end table
 618
 619 There is an alternative to the method above; it is possible to store
 620 local diffs to be patched into gnulib beneath the
 621 @file{gnulib-local} directory.  Normally however, there is no need for this,
 622 since gnulib updates are very prompt.
 623
 624
 625 @node Documentation
 626 @chapter Documentation
 627
 628 The findutils git tree includes several different types of
 629 documentation.
 630
 631 @section User Documentation
 632 User-oriented documentation is provided as manual pages and in
 633 Texinfo.  See
 634 @ref{Introduction,,Introduction,find,The Findutils manual}.
 635
 636 Please make sure both sets of documentation are updated if you make a
 637 change to the code.  The GNU coding standards do not normally call for
 638 maintaining manual pages on the grounds of effort duplication.
 639 However, the manual page format is more convenient for quick
 640 reference, and so it's worth maintaining both types of documentation.
 641 The manual pages are normally rather more terse than the
 642 Texinfo documentation; they are suitable for reference
 643 use, but the Texinfo manual should also include introductory and
 644 tutorial material.
 645
 646
 647 @section Build Guidance
 648
 649 @table @file
 650 @item ABOUT-NLS
 651 Describes the Free Translation Project, the translation status of
 652 various GNU projects, and how to participate by translating an
 653 application.
 654 @item AUTHORS
 655 Lists the authors of findutils.
 656 @item COPYING
 657 The copyright license covering findutils; currently, the GNU GPL,
 658 version 3.
 659 @item INSTALL
 660 Generic installation instructions for installing GNU programs.
 661 @item README
 662 Information about how to compile findutils in particular.
 663 @item README-alpha
 664 A README file which is included with testing releases of findutils.
 665 @item README-hacking
 666 Describes how to build findutils from the code in git.
 667 @item THANKS
 668 Thanks to people who contributed to findutils.  Generally, if
 669 someone's contribution was significant enough to need a copyright
 670 assignment, their name should go in here.
 671 @item TODO
 672 Mainly obsolete.
 673 @end table
 674
 675
 676 @section Release Information
 677 @table @file
 678 @item NEWS
 679 Enumerates the user-visible changes in each release.  Typical changes
 680 are fixed bugs, functionality changes and documentation changes.
 681 Include the date when a release is made.
 682 @item ChangeLog
 683 This file enumerates all changes to the findutils source code (with
 684 the possible exception of @file{.cvsignore} and @file{.gitignore}
 685 changes).  The level of detail used for this file should be sufficient
 686 to answer the questions ``what changed?'' and ``why was it changed?''.
 687 If a change fixes a bug, always give the bug reference number in both
 688 the @file{ChangeLog} and @file{NEWS} files and of course also in the
 689 checkin message.  In general, it should be possible to enumerate all
 690 material changes to a function by searching for its name in
 691 @file{ChangeLog}.  Mention when each release is made.
 692 @end table
 693
 694 @node Testing
 695 @chapter Testing
 696 This chapter will explain the general procedures for adding tests to
 697 the test suite, and the functions defined in the findutils-specific
 698 DejaGnu configuration.  Where appropriate, references will be made to
 699 the DejaGnu documentation.
 700
 701 @node Bugs
 702 @chapter Bugs
 703
 704 Bugs are logged in the Savannah bug tracker
 705 @url{http://savannah.gnu.org/bugs/?group=findutils}.  The tracker
 706 offers several fields but their use is largely obvious.  The
 707 life-cycle of a bug is like this:
 708
 709
 710 @table @asis
 711 @item Open
 712 Someone, usually a maintainer, a distribution maintainer or a user,
 713 creates a bug by filling in the form.   They fill in field values as
 714 they see fit.  This will generate an email to
 715 @email{bug-findutils@@gnu.org}.
 716
 717 @item Triage
 718 The bug hangs around with @samp{Status=None} until someone begins to
  719 work on it.  At that point they set the @samp{Assigned to} field and will
 720 sometimes set the status to @samp{In Progress}, especially if the bug
 721 will take a while to fix.
 722
 723 @item Non-bugs
 724 Quite a lot of reports are not actually bugs; for these the usual
 725 procedure is to explain why the problem is not a bug, set the status
 726 to @samp{Invalid} and close the bug.   Make sure you set the
 727 @samp{Assigned to} field to yourself before closing the bug.
 728
 729 @item Fixing
 730 When you commit a bug fix into git (or in the case of a contributed
 731 patch, commit the change), mark the bug as @samp{Fixed}.  Make sure
 732 you include a new test case where this is relevant.  If you can figure
 733 out which releases are affected, please also set the @samp{Release}
 734 field to the earliest release which is affected by the bug.
 735 Indicate which source branch the fix is included in (for example,
 736 4.2.x or 4.3.x).  Don't close the bug yet.
 737
 738 @item Release
 739 When a release is made which includes the bug fix, make sure the bug
  740 is listed in the @file{NEWS} file.  Once the release is made, fill in the
 741 @samp{Fixed Release} field and close the bug.
 742 @end table
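
The check in the @samp{Release} step can be sketched in shell.  This
is only an illustration: the helper name @code{news_mentions_bug} is
hypothetical, and it assumes that @file{NEWS} refers to a Savannah bug
as @samp{#} followed by the bug number, which you should confirm
against the actual file before relying on it.

@verbatim
```shell
#! /bin/sh
# news_mentions_bug NUMBER [FILE]
# Hypothetical helper (not part of findutils): succeed if FILE
# (default: ./NEWS) contains "#NUMBER" that is not immediately
# followed by another digit, so that bug 123 does not match "#1234".
# ASSUMPTION: bugs are written as "#NUMBER" in NEWS.
news_mentions_bug() {
    grep -q -E -e "#$1([^0-9]|\$)" -- "${2:-NEWS}"
}
```
@end verbatim

A maintainer could run this for each bug number about to be closed and
investigate any for which it fails.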
 743
 744
 745 @node Distributions
 746 @chapter Distributions
 747 Almost all GNU/Linux distributions include findutils, but only some of
 748 them have a package maintainer who is a member of the mailing list.
 749 Distributions don't often feed back patches to the
 750 @email{bug-findutils@@gnu.org} list, but on the other hand many of
 751 their patches relate only to standards for file locations and so
  752 forth, and are therefore distribution-specific.  On an irregular basis
 753 I check the current patches being used by one or two distributions,
 754 but the total number of GNU/Linux distributions is large enough that
 755 we could not hope to cover them all.
 756
 757 Often, bugs are raised against a distribution's bug tracker instead of
 758 GNU's.    Periodically (about every six months) I take a look at some
 759 of the more accessible bug trackers to indicate which bugs have been
 760 fixed upstream.
 761
 762 Many distributions include both findutils and the slocate package,
 763 which provides a replacement @code{locate}.
 764
 765
 766 @node Internationalisation
 767 @chapter Internationalisation
  768 Translation is essentially automated from the maintainer's point of
  769 view.  The Translation Project (TP) mails the maintainer when a new
  770 @file{.po} file is available, and we just download it and check it
  771 into the git repository.  For more information, please see
  772 @url{http://translationproject.org/domain/findutils.html}.
 773
 774
 775 @node Security
 776 @chapter Security
 777
 778 See @ref{Security Considerations, ,Security Considerations,find,The
 779 Findutils manual}, for a full description of the findutils approach to
 780 security considerations and discussion of particular tools.
 781
 782 If someone reports a security bug publicly, we should fix this as
 783 rapidly as possible.  If necessary, this can mean issuing a fixed
 784 release containing just the one bug fix.  We try to avoid issuing
 785 releases which include both significant security fixes and functional
 786 changes.
 787
 788 Where someone reports a security problem privately, we generally try
 789 to construct and test a patch without pushing the intermediate code to
 790 the public repository.
 791
 792 Once everything has been tested, this allows us to make a release and
 793 push the patch.  The advantage of doing things this way is that we
 794 avoid situations where people watching for git commits can figure out
 795 and exploit a security problem before a fixed release is available.
 796
 797 It's important that security problems be fixed promptly, but don't
 798 rush so much that things go wrong.  Make sure the new release really
 799 fixes the problem.  It's usually best not to include functional
 800 changes in your security-fix release.
 801
 802 If the security problem is serious, send an alert to
 803 @email{vendor-sec@@lst.de}.  The members of the list include most
 804 GNU/Linux distributions.  The point of doing this is to allow them to
 805 prepare to release your security fix to their customers, once the fix
  806 becomes available.  Here is an example alert:
 807
 808 @smallexample
 809 GNU findutils heap buffer overrun (potential privilege escalation)
 810
 811
 812
 813 I. BACKGROUND
 814 =============
 815
 816 GNU findutils is a set of programs which search for files on Unix-like
 817 systems.  It is maintained by the GNU Project of the Free Software
 818 Foundation.  For more information, see
 819 @url{http://www.gnu.org/software/findutils}.
 820
 821
 822 II. DESCRIPTION
 823 ===============
 824
 825 When GNU locate reads filenames from an old-format locate database,
 826 they are read into a fixed-length buffer allocated on the heap.
 827 Filenames longer than the 1026-byte buffer can cause a buffer overrun.
 828 The overrunning data can be chosen by any person able to control the
  829 names of files created on the local system.  This will normally
 830 include all local users, but in many cases also remote users (for
 831 example in the case of FTP servers allowing uploads).
 832
 833 III. ANALYSIS
 834 =============
 835
 836 Findutils supports three different formats of locate database, its
 837 native format "LOCATE02", the slocate variant of LOCATE02, and a
 838 traditional ("old") format that locate uses on other Unix systems.
 839
 840 When locate reads filenames from a LOCATE02 database (the default
 841 format), the buffer into which data is read is automatically extended
 842 to accommodate the length of the filenames.
 843
 844 This automatic buffer extension does not happen for old-format
 845 databases.  Instead a 1026-byte buffer is used.  When a longer
 846 pathname appears in the locate database, the end of this buffer is
 847 overrun.  The buffer is allocated on the heap (not the stack).
 848
 849 If the locate database is in the default LOCATE02 format, the locate
 850 program does perform automatic buffer extension, and the program is
 851 not vulnerable to this problem.  The software used to build the
 852 old-format locate database is not itself vulnerable to the same
 853 attack.
 854
 855 Most installations of GNU findutils do not use the old database
 856 format, and so will not be vulnerable.
 857
 858
 859 IV. DETECTION
 860 =============
 861
 862 Software
 863 --------
 864 All existing releases of findutils are affected.
 865
 866
 867 Installations
 868 -------------
 869
 870 To discover the longest path name on a given system, you can use the
 871 following command (requires GNU findutils and GNU coreutils):
 872
 873 @verbatim
 874 find / -print0 | tr -c '\0' 'x' | tr '\0' '\n' | wc -L
 875 @end verbatim
 876
 877 V. EXAMPLE
 878 ==========
 879
 880 This section includes a shell script which determines which of a list
 881 of locate binaries is vulnerable to the problem.  The shell script has
 882 been tested only on glibc based systems having a mktemp binary.
 883
 884 NOTE: This script deliberately overruns the buffer in order to
 885 determine if a binary is affected.  Therefore running it on your
 886 system may have undesirable effects.  We recommend that you read the
 887 script before running it.
 888
 889 @verbatim
 890 #! /bin/sh
 891 set +m
 892 if vanilla_db="$(mktemp nicedb.XXXXXX)" ; then
 893     if updatedb --prunepaths="" --old-format --localpaths="/tmp" \
  894         --output="${vanilla_db}" ; then
  895         true
  896     else
  897         rm -f "${vanilla_db}"
  898         vanilla_db=""
  899         echo "Failed to create old-format locate database; skipping the sanity checks" >&2
  900     fi
  901 fi
  902
  903 make_overrun_db() {
  904     # Start with a valid database
  905     cat "${vanilla_db}"
  906     # Make the final entry really long
  907     dd if=/dev/zero  bs=1 count=1500 2>/dev/null | tr '\000' 'x'
  908 }
  909
  910
  911
  912 ulimit -c 0
  913
  914 usage() { echo "usage: $0 binary [binary...]" >&2; exit $1; }
  915 [ $# -eq 0 ] && usage 1
  916
  917 bad=""
  918 good=""
  919 ugly=""
  920 if dbfile="$(mktemp nasty.XXXXXX)"
  921 then
  922     make_overrun_db > "$dbfile"
  923     for locate ; do
  924       ver="$locate = $("$locate"  --version | head -1)"
  925       if [ -z "$vanilla_db" ] || "$locate" -d "$vanilla_db" "" >/dev/null ; then
  926           "$locate" -d "$dbfile" "" >/dev/null
  927           if [ $? -gt 128 ] ; then
  928               bad="$bad
  929 vulnerable: $ver"
  930           else
  931               good="$good
  932 good: $ver"
  933           fi
  934        else
  935           # the regular locate failed
  936           ugly="$ugly
  937 buggy, may or may not be vulnerable: $ver"
  938        fi
  939     done
  940     rm -f "${dbfile}" "${vanilla_db}"
 941     # good: unaffected.  bad: affected (vulnerable).
 942     # ugly: doesn't even work for a normal old-format database.
 943     echo "$good"
 944     echo "$bad"
 945     echo "$ugly"
 946 else
 947   exit 1
 948 fi
 949 @end verbatim
 950
 951
 952
 953
 954 VI. VENDOR RESPONSE
 955 ===================
 956
 957 The GNU project discovered the problem while 'locate' was being worked
 958 on; this is the first public announcement of the problem.
 959
  960 The GNU findutils maintainer has issued a patch as part of this
 961 announcement.  The patch appears below.
 962
 963 A source release of findutils-4.2.31 will be issued on 2007-05-30.
 964 That release will of course include the patch.  The patch will be
 965 committed to the public CVS repository at the same time.  Public
 966 announcements of the release, including a description of the bug, will
 967 be made at the same time as the release.
 968
 969 A release of findutils-4.3.x will follow and will also include the
 970 patch.
 971
 972
 973 VII. PATCH
 974 ==========
 975
 976 This patch should apply to findutils-4.2.23 and later.
 977 Findutils-4.2.23 was released almost two years ago.
 978 @verbatim
 979 Index: locate/locate.c
 980 ===================================================================
 981 RCS file: /cvsroot/findutils/findutils/locate/locate.c,v
 982 retrieving revision 1.58.2.2
 983 diff -u -p -r1.58.2.2 locate.c
 984 --- locate/locate.c     22 Apr 2007 16:57:42 -0000      1.58.2.2
 985 +++ locate/locate.c     28 May 2007 10:18:16 -0000
  986 @@ -124,9 +124,9 @@ extern int errno;
  987
  988  #include "locatedb.h"
  989  #include <getline.h>
  990 -#include "../gnulib/lib/xalloc.h"
  991 -#include "../gnulib/lib/error.h"
  992 -#include "../gnulib/lib/human.h"
  993 +#include "xalloc.h"
  994 +#include "error.h"
  995 +#include "human.h"
  996  #include "dirname.h"
  997  #include "closeout.h"
  998  #include "nextelem.h"
  999 @@ -468,10 +468,36 @@ visit_justprint_unquoted(struct process_
 1000    return VISIT_CONTINUE;
 1001  }
 1002
 1003 +static void
 1004 +toolong (struct process_data *procdata)
 1005 +{
 1006 +  error (EXIT_FAILURE, 0,
 1007 +        _("locate database %s contains a "
 1008 +          "filename longer than locate can handle"),
 1009 +        procdata->dbfile);
 1010 +}
 1011 +
 1012 +static void
 1013 +extend (struct process_data *procdata, size_t siz1, size_t siz2)
 1014 +{
 1015 +  /* Figure out if the addition operation is safe before performing it. */
 1016 +  if (SIZE_MAX - siz1 < siz2)
 1017 +    {
 1018 +      toolong (procdata);
 1019 +    }
 1020 +  else if (procdata->pathsize < (siz1+siz2))
 1021 +    {
 1022 +      procdata->pathsize = siz1+siz2;
 1023 +      procdata->original_filename = x2nrealloc (procdata->original_filename,
 1024 +                                               &procdata->pathsize,
 1025 +                                               1);
 1026 +    }
 1027 +}
 1028 +
 1029  static int
 1030  visit_old_format(struct process_data *procdata, void *context)
 1031  {
 1032 -  register char *s;
 1033 +  register size_t i;
 1034    (void) context;
 1035
 1036    /* Get the offset in the path where this path info starts.  */
 1037 @@ -479,20 +505,35 @@ visit_old_format(struct process_data *pr
 1038      procdata->count += getw (procdata->fp) - LOCATEDB_OLD_OFFSET;
 1039    else
 1040      procdata->count += procdata->c - LOCATEDB_OLD_OFFSET;
 1041 +  assert(procdata->count > 0);
 1042
 1043 -  /* Overlay the old path with the remainder of the new.  */
 1044 -  for (s = procdata->original_filename + procdata->count;
 1045 +  /* Overlay the old path with the remainder of the new.  Read
 1046 +   * more data until we get to the next filename.
 1047 +   */
 1048 +  for (i=procdata->count;
 1049         (procdata->c = getc (procdata->fp)) > LOCATEDB_OLD_ESCAPE;)
 1050 -    if (procdata->c < 0200)
 1051 -      *s++ = procdata->c;              /* An ordinary character.  */
 1052 -    else
 1053 -      {
 1054 -       /* Bigram markers have the high bit set. */
 1055 -       procdata->c &= 0177;
 1056 -       *s++ = procdata->bigram1[procdata->c];
 1057 -       *s++ = procdata->bigram2[procdata->c];
 1058 -      }
 1059 -  *s-- = '\0';
 1060 +    {
 1061 +      if (procdata->c < 0200)
 1062 +       {
 1063 +         /* An ordinary character. */
 1064 +         extend (procdata, i, 1u);
 1065 +         procdata->original_filename[i++] = procdata->c;
 1066 +       }
 1067 +      else
 1068 +       {
 1069 +         /* Bigram markers have the high bit set. */
 1070 +         extend (procdata, i, 2u);
 1071 +         procdata->c &= 0177;
 1072 +         procdata->original_filename[i++] = procdata->bigram1[procdata->c];
 1073 +         procdata->original_filename[i++] = procdata->bigram2[procdata->c];
 1074 +       }
 1075 +    }
1076 +
1077 +  /* Consider the case where we executed the loop body zero times; we
1078 +   * still need space for the terminating null byte.
1079 +   */
1080 +  extend (procdata, i, 1u);
1081 +  procdata->original_filename[i] = 0;
1082
1083    procdata->munged_filename = procdata->original_filename;
1084 @end verbatim
1085
1086
1087 VIII. THANKS
1088 ============
1089
1090 Thanks to Rob Holland <rob@@inversepath.com> and Tavis Ormandy.
1091
1092
 1093 IX. CVE INFORMATION
 1094 ===================
1095
1096 No CVE candidate number has yet been assigned for this vulnerability.
1097 If someone provides one, I will include it in the public announcement
1098 and change logs.
1099 @end smallexample
1100
1101 The original announcement above was sent out with a cleartext PGP
1102 signature, of course, but that has been omitted from the example.
1103
1104 Once a fixed release is available, announce the new release using the
1105 normal channels.  Any CVE number assigned for the problem should be
1106 included in the @file{ChangeLog} and @file{NEWS} entries. See
1107 @url{http://cve.mitre.org/} for an explanation of CVE numbers.
1108
1109
1110
1111 @node Making Releases
1112 @chapter Making Releases
 1113 This chapter will explain how to make a findutils release.  For the
 1114 time being, here is a terse description of the main steps:
1115
1116 @enumerate
1117 @item Commit changes; make sure your working directory has no
1118 uncommitted changes.
1119 @item Test; make sure that all changes you have made have tests, and
1120 that the tests pass.  Verify this with @code{make distcheck}.
 1121 @item Bugs; make sure all Savannah bug entries fixed in this release
 1122 are marked as @samp{Fixed}.
 1123 @item NEWS; make sure that the @file{NEWS} and @file{configure.in} files
 1124 are updated with the new release number (and checked in).
1125 @item Build the release tarball; do this with @code{make distcheck}.
1126 Copy the tarball somewhere safe.
 1127 @item Tag the release; findutils releases are tagged like this, for
 1128 example: @samp{v4.5.5}.  Previously a different format was in use:
 1129 @samp{FINDUTILS_4_3_8-1}.  You can create a tag with a command like this:
 1130 @code{git tag -s -m "Findutils release v4.5.7" v4.5.7}.
1131 @item Prepare the upload and upload it.
1132 @xref{Automated FTP Uploads, ,Automated FTP
1133 Uploads, maintain, Information for Maintainers of GNU Software},
1134 for detailed upload instructions.
1135 @item Make a release announcement; include an extract from the NEWS
1136 file which explains what's changed.  Announcements for test releases
1137 should just go to @email{bug-findutils@@gnu.org}.  Announcements for
1138 stable releases should go to @email{info-gnu@@gnu.org} as well.
1139 @item Bump the release numbers in git; edit the @file{configure.in}
1140 and @file{NEWS} files to advance the release numbers.   For example,
1141 if you have just released @samp{4.6.2}, bump the release number to
1142 @samp{4.6.3-git}.  The point of the @samp{-git} suffix here is that a
1143 findutils binary built from git will bear a release number indicating
1144 it's not built from the ``official'' source release.
1145 @item Close bugs; any bugs recorded on Savannah which were fixed in this
1146 release should now be marked as closed.   Update the @samp{Fixed
1147 Release} field of these bugs appropriately and make sure the
1148 @samp{Assigned to} field is populated.
1149 @end enumerate
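
The naming conventions used in the tagging and version-bumping steps
above can be sketched as two small shell functions.  Both
@code{release_tag} and @code{next_dev_version} are hypothetical names
introduced for illustration only; they are not part of findutils, and
the sketch assumes a dot-separated version whose last component is a
plain decimal number.

@verbatim
```shell
#! /bin/sh
# Hypothetical helpers, not part of findutils.

# release_tag VERSION
# Print the git tag name for VERSION using the current "v" prefix
# convention, for example 4.5.7 -> v4.5.7.
release_tag() {
    printf 'v%s\n' "$1"
}

# next_dev_version VERSION
# Print the version string to record after releasing VERSION: the
# last numeric component is incremented and "-git" is appended,
# for example 4.6.2 -> 4.6.3-git.
# ASSUMPTION: VERSION contains at least one dot and its final
# component is a decimal number without leading zeros.
next_dev_version() {
    prefix=${1%.*}     # everything before the last dot
    last=${1##*.}      # the final component
    printf '%s.%s-git\n' "$prefix" "$((last + 1))"
}
```
@end verbatim

With these, the tag for a release could be created with
@code{git tag -s -m "Findutils release $(release_tag 4.5.7)" "$(release_tag 4.5.7)"},
and the value printed by @code{next_dev_version} is what would then be
entered by hand in @file{configure.in} and @file{NEWS}.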
1150
1151
1152 @node GNU Free Documentation License
1153 @appendix GNU Free Documentation License
1154 @include fdl.texi
1155
1156 @bye
1157
1158 @comment texi related words used by Emacs' spell checker ispell.el
1159
1160 @comment LocalWords: texinfo setfilename settitle setchapternewpage
1161 @comment LocalWords: iftex finalout ifinfo DIR titlepage vskip pt
1162 @comment LocalWords: filll dir samp dfn noindent xref pxref
1163 @comment LocalWords: var deffn texi deffnx itemx emph asis
1164 @comment LocalWords: findex smallexample subsubsection cindex
1165 @comment LocalWords: dircategory direntry itemize
1166
1167 @comment other words used by Emacs' spell checker ispell.el
1168 @comment LocalWords: README fred updatedb xargs Plett Rendell akefile
1169 @comment LocalWords: args grep Filesystems fo foo fOo wildcards iname
1170 @comment LocalWords: ipath regex iregex expr fubar regexps
1171 @comment LocalWords: metacharacters macs sr sc inode lname ilname
1172 @comment LocalWords: sysdep noleaf ls inum xdev filesystems usr atime
1173 @comment LocalWords: ctime mtime amin cmin mmin al daystart Sladkey rm
1174 @comment LocalWords: anewer cnewer bckw rf xtype uname gname uid gid
1175 @comment LocalWords: nouser nogroup chown chgrp perm ch maxdepth
1176 @comment LocalWords: mindepth cpio src CD AFS statted stat fstype ufs
1177 @comment LocalWords: nfs tmp mfs printf fprint dils rw djm Nov lwall
1178 @comment LocalWords: POSIXLY fls fprintf strftime locale's EDT GMT AP
1179 @comment LocalWords: EST diff perl backquotes sprintf Falstad Oct cron
1180 @comment LocalWords: eg vmunix mkdir afs allexec allwrite ARG bigram
1181 @comment LocalWords: bigrams cd chmod comp crc CVS dbfile eof
1182 @comment LocalWords: fileserver filesystem fn frcode Ghazi Hnewc iXX
1183 @comment LocalWords: joeuser Kaveh localpaths localuser LOGNAME
1184 @comment LocalWords: Meyering mv netpaths netuser nonblank nonblanks
1185 @comment LocalWords: ois ok Pinard printindex proc procs prunefs
1186 @comment LocalWords: prunepaths pwd RFS rmadillo rmdir rsh sbins str
1187 @comment LocalWords: su Timar ubins ug unstripped vf VM Weitzel
1188 @comment LocalWords: wildcard zlogout basename execdir wholename iwholename
1189 @comment LocalWords: timestamp timestamps Solaris FreeBSD OpenBSD POSIX