From 689298ab0fd25c5441b219c82438c478355f1bb4 Mon Sep 17 00:00:00 2001 From: Anas Nashif Date: Wed, 7 Nov 2012 09:30:22 -0800 Subject: [PATCH] Imported Upstream version 1.40 --- CHANGES | 67 +++ CONTRIBUTORS | 12 + INSTALL | 25 ++ Makefile | 53 +++ README | 76 ++++ TODO | 5 + fdupes.1 | 105 +++++ fdupes.c | 905 ++++++++++++++++++++++++++++++++++++++++ md5/README | 4 + md5/md5.c | 392 +++++++++++++++++ md5/md5.h | 98 +++++ testdir/nine_upsidedown | 1 + testdir/recursed_a/five | 1 + testdir/recursed_a/one | 1 + testdir/recursed_a/two | 1 + testdir/recursed_b/four | 1 + testdir/recursed_b/one | 1 + testdir/recursed_b/three | 1 + testdir/recursed_b/two_plus_one | 1 + testdir/symlink_dir | 1 + testdir/symlink_two | 1 + testdir/twice_one | 1 + testdir/two | 1 + testdir/with spaces a | 1 + testdir/with spaces b | 1 + testdir/zero_a | 0 testdir/zero_b | 0 27 files changed, 1756 insertions(+) create mode 100644 CHANGES create mode 100644 CONTRIBUTORS create mode 100644 INSTALL create mode 100644 Makefile create mode 100644 README create mode 100644 TODO create mode 100644 fdupes.1 create mode 100644 fdupes.c create mode 100644 md5/README create mode 100644 md5/md5.c create mode 100644 md5/md5.h create mode 100644 testdir/nine_upsidedown create mode 100644 testdir/recursed_a/five create mode 100644 testdir/recursed_a/one create mode 100644 testdir/recursed_a/two create mode 100644 testdir/recursed_b/four create mode 100644 testdir/recursed_b/one create mode 100644 testdir/recursed_b/three create mode 100644 testdir/recursed_b/two_plus_one create mode 120000 testdir/symlink_dir create mode 120000 testdir/symlink_two create mode 100644 testdir/twice_one create mode 100644 testdir/two create mode 100644 testdir/with spaces a create mode 100644 testdir/with spaces b create mode 100644 testdir/zero_a create mode 100644 testdir/zero_b diff --git a/CHANGES b/CHANGES new file mode 100644 index 0000000..0fb421e --- /dev/null +++ b/CHANGES @@ -0,0 +1,67 @@ +The following list, organized by fdupes version, documents changes +to fdupes. Every item on the list includes, inside square brackets, +a list of indentifiers referring to the people who contributed +that particular item. When more than one person is listed the person +who contributed the patch or idea appears first, followed by +those who've otherwise worked on that item. For a list of +contributors names and identifiers please see the CONTRIBUTORS file. + + +Changes from 1.31 to 1.40 + +- Added option to omit the first file in each group + of matches. [LM, AL] +- Added escaping of filenames containing spaces when + sameline option is specified. [AL] +- Changed version indicator format from "fdupes version X.Y" + to the simpler "fdupes X.Y". [AL] +- Changed ordering of options appearing in the help + text (--help), manpage, and README file. [AL] + +Changes from 1.30 to 1.31 + +- Added interactive option to preserve all files during + delete procedure (something similar was already in + place, but now it's official). [AL] +- Updated delete procedure prompt format. [AL] +- Cosmetic code changes. [AL] + +Changes from 1.20 to 1.30 + +- Added size option to display size of duplicates. [LB, AL] +- Added missing typecast for proper compilation under g++. [LB] +- Better handling of errors occurring during retrieval + of a file's signature. [KK, AL] +- No longer displays an error message when specified + directories contain no files. [AL] +- Added red-black tree structure (experimental compile-time + option, disabled by default). [AL] + +Changes from 1.12 to 1.20 + +- Fixed bug where program would crash when files being + scanned were named pipes or sockets. [FD] +- Fix against security risk resulting from the use of a + temporary file to store md5sum output. [FD, AL] +- Using an external md5sum program is now optional. Started + using L. Peter Deutsh's MD5 library instead. [FD, AL] +- Added hardlinks option to distinguish between hard links + and actual duplicate files. [FD, AL] +- Added noempty option to exclude zero-length files + from consideration [AL] + +Changes from 1.11 to 1.12 + +- Improved handling of extremely long input on preserve + prompt (delete option). [SSD, AL] + +Changes from 1.1 to 1.11 + +- Started checking file sizes before signatures + for better performance. [AB, AL] +- Added fdupes manpage. [AB, AL] + +Changes from 1.0 to 1.1 + +- Added delete option for semi-automatic deletion + of duplicate files. [AL] diff --git a/CONTRIBUTORS b/CONTRIBUTORS new file mode 100644 index 0000000..14b2458 --- /dev/null +++ b/CONTRIBUTORS @@ -0,0 +1,12 @@ +The following people have contributed in some way to the development +of fdupes. Please see the CHANGES file for detailed information +on their contributions. Names are listed in alphabetical order. + + [AB] Adrian Bridgett + [AL] Adrian Lopez + [FD] Frank DENIS aka Jedi/Sector One aka DJ Chrysalis + [KK] Kresimir Kukulj + [LB] Laurent Bonnaud + [LM] Luca Montecchiani +[SSD] Steven S. Dick + diff --git a/INSTALL b/INSTALL new file mode 100644 index 0000000..c55c06a --- /dev/null +++ b/INSTALL @@ -0,0 +1,25 @@ +Installing fdupes +-------------------------------------------------------------------- +To install the program, issue the following commands: + +make fdupes +su root +make install + +This will install the program in /usr/local/bin. You may change this +to a different location by editing the Makefile. Please refer to +the Makefile for an explanation of compile-time options. If you're +having trouble compiling, please take a look at the Makefile. + +UPGRADING NOTE: When upgrading from a version prior to 1.2, it should +be noted that the default installation directory for the fdupes man +page has changed from "/usr/man" to "/usr/local/man". If installing +to the default location you should delete the old man page before +proceeding. This file would be named "/usr/man/man1/fdupes.1". + +A test directory is included so that you may familiarise yourself +with the way fdupes operates. You may test the program before +installing it by issuing a command such as "./fdupes testdir" +or "./fdupes -r testdir", just to name a couple of examples. Refer +to the documentation for information on valid options. + diff --git a/Makefile b/Makefile new file mode 100644 index 0000000..090ad1d --- /dev/null +++ b/Makefile @@ -0,0 +1,53 @@ +# +# INSTALLDIR indicates directory where program is to be installed. +# Suggested values are "/usr/local/bin" or "/usr/bin". +# +INSTALLDIR = /usr/local/bin + +# +# MANPAGEDIR indicates directory where the fdupes man page is to be +# installed. Suggested values are "/usr/local/man" or "/usr/man". +# +MANPAGEDIR = /usr/local/man + +# +# VERSION determines the program's version number. +# +VERSION = "1.40" + +# +# To use the md5sum program for calculating signatures (instead of the +# built in MD5 message digest routines) uncomment the following +# line (try this if you're having trouble with built in code). +# +#EXTERNAL_MD5 = -DEXTERNAL_MD5=\"md5sum\" + +# +# This version of fdupes can use a red-black tree structure to +# store file information. This is disabled by default, as it +# hasn't been optimized or verified correct. If you wish to +# enable this untested option, uncomment the following line. +# +#EXPERIMENTAL_RBTREE = -DEXPERIMENTAL_RBTREE + +##################################################################### +# no need to modify anything beyond this point # +##################################################################### + +fdupes: fdupes.c md5/md5.c + gcc fdupes.c md5/md5.c -Wall -o fdupes -DVERSION=\"$(VERSION)\" $(EXTERNAL_MD5) $(EXPERIMENTAL_RBTREE) + +install: fdupes + cp fdupes $(INSTALLDIR) + cp fdupes.1 $(MANPAGEDIR)/man1 + +tarball: clean + tar --directory=.. -c -z -v -f ../fdupes-$(VERSION).tar.gz fdupes-$(VERSION) + +clean: + rm -f *.o + rm -f fdupes + rm -f *~ + +love: + @echo You\'re not my type. Go find a human partner. diff --git a/README b/README new file mode 100644 index 0000000..e541e47 --- /dev/null +++ b/README @@ -0,0 +1,76 @@ +Introduction +-------------------------------------------------------------------- +FDUPES is a program for identifying duplicate files residing +within specified directories. + + +Usage +-------------------------------------------------------------------- +Usage: fdupes [options] DIRECTORY... + + -r --recurse include files residing in subdirectories + -s --symlinks follow symlinks + -H --hardlinks normally, when two or more files point to the same + disk area they are treated as non-duplicates; this + option will change this behavior + -n --noempty exclude zero-length files from consideration + -f --omitfirst omit the first file in each set of matches + -1 --sameline list each set of matches on a single line + -S --size show size of duplicate files + -q --quiet hide progress indicator + -d --delete prompt user for files to preserve and delete all + others; important: under particular circumstances, + data may be lost when using this option together + with -s or --symlinks, or when specifying a + particular directory more than once; refer to the + fdupes documentation for additional information + -v --version display fdupes version + -h --help display this help message + +Unless -1 or --sameline is specified, duplicate files are listed +together in groups, each file displayed on a separate line. The +groups are then separated from each other by blank lines. + +When -1 or --sameline is specified, spaces and backslash characters (\) +appearing in a filename are preceded by a backslash character. For +instance, "with spaces" becomes "with\ spaces". + +When using -d or --delete, care should be taken to insure against +accidental data loss. While no information will be immediately +lost, using this option together with -s or --symlink can lead +to confusing information being presented to the user when prompted +for files to preserve. Specifically, a user could accidentally +preserve a symlink while deleting the file it points to. A similar +problem arises when specifying a particular directory more than +once. All files within that directory will be listed as their own +duplicates, leading to data loss should a user preserve a file +without its "duplicate" (the file itself!). + + +Contact Information for Adrian Lopez +-------------------------------------------------------------------- +email: adrian2@caribe.net + + +Legal Information +-------------------------------------------------------------------- +FDUPES Copyright (c) 1999 Adrian Lopez + +Permission is hereby granted, free of charge, to any person +obtaining a copy of this software and associated documentation files +(the "Software"), to deal in the Software without restriction, +including without limitation the rights to use, copy, modify, merge, +publish, distribute, sublicense, and/or sell copies of the Software, +and to permit persons to whom the Software is furnished to do so, +subject to the following conditions: + +The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS +OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. +IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY +CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, +TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE +SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. diff --git a/TODO b/TODO new file mode 100644 index 0000000..c363f98 --- /dev/null +++ b/TODO @@ -0,0 +1,5 @@ +- Add option to highlight or identify symlinked files. +- Verify correctness of red-black tree algorithm. Optimize. +- Fix problem where MD5 collisions will result in one of the + files not being registered (causing it to be ignored). + diff --git a/fdupes.1 b/fdupes.1 new file mode 100644 index 0000000..f4b4ca3 --- /dev/null +++ b/fdupes.1 @@ -0,0 +1,105 @@ +.TH FDUPES 1 +.\" NAME should be all caps, SECTION should be 1-8, maybe w/ subsection +.\" other parms are allowed: see man(7), man(1) +.SH NAME +fdupes \- finds duplicate files in a given set of directories +.SH SYNOPSIS +.B fdupes +[ +.I options +] +.I DIRECTORY +\|.\|.\|. + +.SH "DESCRIPTION" +Searches the given path for duplicate files. Such files are found by +comparing file sizes and MD5 signatures, followed by a +byte-by-byte comparison. + +.SH OPTIONS +.TP +.B -r --recurse +include files residing in subdirectories +.TP +.B -s --symlinks +follow symlinked directories +.TP +.B -H --hardlinks +normally, when two or more files point to the same disk area they are +treated as non-duplicates; this option will change this behavior +.TP +.B -n --noempty +exclude zero-length files from consideration +.TP +.B -f --omitfirst +omit the first file in each set of matches +.TP +.B -1 --sameline +list each set of matches on a single line +.TP +.B -S --size +show size of duplicate files +.TP +.B -q --quiet +hide progress indicator +.TP +.B -d --delete +prompt user for files to preserve, deleting all others (see +.B CAVEATS +below) +.TP +.B -v --version +display fdupes version +.TP +.B -h --help +displays help +.SH "SEE ALSO" +.\" Always quote multiple words for .SH +.BR md5sum (1) +.SH NOTES +Unless +.B -1 +or +.B --sameline +is specified, duplicate files are listed +together in groups, each file displayed on a separate line. The +groups are then separated from each other by blank lines. + +When +.B -1 +or +.B --sameline +is specified, spaces and backslash characters (\fB\e\fP) appearing +in a filename are preceded by a backslash character. +.SH CAVEATS +If fdupes returns with an error message such as +.B fdupes: error invoking md5sum +it means the program has been compiled to use an external +program to calculate MD5 signatures (otherwise, fdupes uses +interal routines for this purpose), and an error has occurred +while attempting to execute it. If this is the case, the +specified program should be properly installed prior +to running fdupes. + +When using +.B \-d +or +.BR \-\-delete , +care should be taken to insure against +accidental data loss. + +When used together with options +.B \-s +or +.BR \-\-symlink , +a user could accidentally +preserve a symlink while deleting the file it points to. + +Furthermore, when specifying a particular directory more than +once, all files within that directory will be listed as their +own duplicates, leading to data loss should a user preserve a +file without its "duplicate" (the file itself!). + +.SH AUTHOR +Adrian Lopez + diff --git a/fdupes.c b/fdupes.c new file mode 100644 index 0000000..b52f723 --- /dev/null +++ b/fdupes.c @@ -0,0 +1,905 @@ +/* FDUPES Copyright (c) 1999 Adrian Lopez + + Permission is hereby granted, free of charge, to any person + obtaining a copy of this software and associated documentation files + (the "Software"), to deal in the Software without restriction, + including without limitation the rights to use, copy, modify, merge, + publish, distribute, sublicense, and/or sell copies of the Software, + and to permit persons to whom the Software is furnished to do so, + subject to the following conditions: + + The above copyright notice and this permission notice shall be + included in all copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifndef EXTERNAL_MD5 +#include "md5/md5.h" +#endif + +#define ISFLAG(a,b) ((a & b) == b) +#define SETFLAG(a,b) (a |= b) + +#define F_RECURSE 0x001 +#define F_HIDEPROGRESS 0x002 +#define F_DSAMELINE 0x004 +#define F_FOLLOWLINKS 0x008 +#define F_DELETEFILES 0x010 +#define F_EXCLUDEEMPTY 0x020 +#define F_CONSIDERHARDLINKS 0x040 +#define F_SHOWSIZE 0x080 +#define F_OMITFIRST 0x100 + +char *program_name; + +unsigned long flags = 0; + +#define CHUNK_SIZE 8192 + +#define INPUT_SIZE 256 + +typedef struct _file { + char *d_name; + off_t size; + char *crcsignature; + ino_t inode; + int hasdupes; /* true only if file is first on duplicate chain */ + struct _file *duplicates; + struct _file *next; +} file_t; + +typedef struct _filetree { + file_t *file; +#ifdef EXPERIMENTAL_RBTREE + unsigned char color; + struct _filetree *parent; +#endif + struct _filetree *left; + struct _filetree *right; +} filetree_t; + +#ifdef EXPERIMENTAL_RBTREE +#define COLOR_RED 0 +#define COLOR_BLACK 1 +#endif + +void errormsg(char *message, ...) +{ + va_list ap; + + va_start(ap, message); + + fprintf(stderr, "\r%40s\r%s: ", "", program_name); + vfprintf(stderr, message, ap); +} + +void escapefilename(char *escape_list, char **filename_ptr) +{ + int x; + int tx; + char *tmp; + char *filename; + + filename = *filename_ptr; + + tmp = (char*) malloc(strlen(filename) * 2 + 1); + if (tmp == NULL) { + errormsg("out of memory!\n"); + exit(1); + } + + for (x = 0, tx = 0; x < strlen(filename); x++) { + if (strchr(escape_list, filename[x]) != NULL) tmp[tx++] = '\\'; + tmp[tx++] = filename[x]; + } + + tmp[tx] = '\0'; + + if (x != tx) { + *filename_ptr = realloc(*filename_ptr, strlen(tmp) + 1); + if (*filename_ptr == NULL) { + errormsg("out of memory!\n"); + exit(1); + } + strcpy(*filename_ptr, tmp); + } +} + +off_t filesize(char *filename) { + struct stat s; + + if (stat(filename, &s) != 0) return -1; + + return s.st_size; +} + +ino_t getinode(char *filename) { + struct stat s; + + if (stat(filename, &s) != 0) return 0; + + return s.st_ino; +} + +int grokdir(char *dir, file_t **filelistp) +{ + DIR *cd; + file_t *newfile; + struct dirent *dirinfo; + int lastchar; + int filecount = 0; + struct stat info; + struct stat linfo; + static int progress = 0; + static char indicator[] = "-\\|/"; + + cd = opendir(dir); + + if (!cd) { + errormsg("could not chdir to %s\n", dir); + return 0; + } + + while ((dirinfo = readdir(cd)) != NULL) { + if (strcmp(dirinfo->d_name, ".") && strcmp(dirinfo->d_name, "..")) { + if (!ISFLAG(flags, F_HIDEPROGRESS)) { + fprintf(stderr, "\rBuilding file list %c ", indicator[progress]); + progress = (progress + 1) % 4; + } + + newfile = (file_t*) malloc(sizeof(file_t)); + + if (!newfile) { + errormsg("out of memory!\n"); + closedir(cd); + exit(1); + } else newfile->next = *filelistp; + + newfile->inode = 0; + newfile->crcsignature = NULL; + newfile->duplicates = NULL; + newfile->hasdupes = 0; + + newfile->d_name = (char*)malloc(strlen(dir)+strlen(dirinfo->d_name)+2); + + if (!newfile->d_name) { + errormsg("out of memory!\n"); + free(newfile); + closedir(cd); + exit(1); + } + + strcpy(newfile->d_name, dir); + lastchar = strlen(dir) - 1; + if (lastchar >= 0 && dir[lastchar] != '/') + strcat(newfile->d_name, "/"); + strcat(newfile->d_name, dirinfo->d_name); + + if (filesize(newfile->d_name) == 0 && ISFLAG(flags, F_EXCLUDEEMPTY)) { + free(newfile->d_name); + free(newfile); + continue; + } + + if (stat(newfile->d_name, &info) == -1) { + free(newfile->d_name); + free(newfile); + continue; + } + + if (lstat(newfile->d_name, &linfo) == -1) { + free(newfile->d_name); + free(newfile); + continue; + } + + if (S_ISDIR(info.st_mode)) { + if (ISFLAG(flags, F_RECURSE) && (ISFLAG(flags, F_FOLLOWLINKS) || !S_ISLNK(linfo.st_mode))) + filecount += grokdir(newfile->d_name, filelistp); + free(newfile->d_name); + free(newfile); + } else { + if (S_ISREG(linfo.st_mode) || (S_ISLNK(linfo.st_mode) && ISFLAG(flags, F_FOLLOWLINKS))) { + *filelistp = newfile; + filecount++; + } else { + free(newfile->d_name); + free(newfile); + } + } + } + } + + closedir(cd); + + return filecount; +} + +#ifndef EXTERNAL_MD5 + +/* If EXTERNAL_MD5 is not defined, use L. Peter Deutsch's MD5 library. + */ +char *getcrcsignature(char *filename) +{ + int x; + off_t fsize; + off_t toread; + md5_state_t state; + md5_byte_t digest[16]; + static md5_byte_t chunk[CHUNK_SIZE]; + static char signature[16*2 + 1]; + char *sigp; + FILE *file; + + md5_init(&state); + + fsize = filesize(filename); + + file = fopen(filename, "rb"); + if (file == NULL) { + errormsg("error opening file %s\n", filename); + return NULL; + } + + while (fsize > 0) { + toread = (fsize % CHUNK_SIZE) ? (fsize % CHUNK_SIZE) : CHUNK_SIZE; + if (fread(chunk, toread, 1, file) != 1) { + errormsg("error reading from file %s\n", filename); + return NULL; + } + md5_append(&state, chunk, toread); + fsize -= toread; + } + + md5_finish(&state, digest); + + sigp = signature; + + for (x = 0; x < 16; x++) { + sprintf(sigp, "%02x", digest[x]); + sigp = strchr(sigp, '\0'); + } + + fclose(file); + + return signature; +} + +#endif /* [#ifndef EXTERNAL_MD5] */ + +#ifdef EXTERNAL_MD5 + +/* If EXTERNAL_MD5 is defined, use md5sum program to calculate signatures. + */ +char *getcrcsignature(char *filename) +{ + static char signature[256]; + char *command; + char *separator; + FILE *result; + + command = (char*) malloc(strlen(filename)+strlen(EXTERNAL_MD5)+2); + if (command == NULL) { + errormsg("out of memory\n"); + exit(1); + } + + sprintf(command, "%s %s", EXTERNAL_MD5, filename); + + result = popen(command, "r"); + if (result == NULL) { + errormsg("error invoking %s\n", EXTERNAL_MD5); + exit(1); + } + + free(command); + + if (fgets(signature, 256, result) == NULL) { + errormsg("error generating signature for %s\n", filename); + return NULL; + } + separator = strchr(signature, ' '); + if (separator) *separator = '\0'; + + pclose(result); + + return signature; +} + +#endif /* [#ifdef EXTERNAL_MD5] */ + +void purgetree(filetree_t *checktree) +{ + if (checktree->left != NULL) purgetree(checktree->left); + + if (checktree->right != NULL) purgetree(checktree->right); + + free(checktree); +} + +#ifdef EXPERIMENTAL_RBTREE +/* Use a red-black tree structure to store file information. + */ + +void rotate_left(filetree_t **root, filetree_t *node) +{ + filetree_t *subject; + + subject = node->right; + node->right = subject->left; + + if (subject->left != NULL) subject->left->parent = node; + subject->parent = node->parent; + + if (node->parent == NULL) { + *root = subject; + } else { + if (node == node->parent->left) + node->parent->left = subject; + else + node->parent->right = subject; + } + + subject->left = node; + node->parent = subject; +} + +void rotate_right(filetree_t **root, filetree_t *node) +{ + filetree_t *subject; + + subject = node->left; + node->left = subject->right; + + if (subject->right != NULL) subject->right->parent = node; + subject->parent = node->parent; + + if (node->parent == NULL) { + *root = subject; + } else { + if (node == node->parent->left) + node->parent->left = subject; + else + node->parent->right = subject; + } + + subject->right = node; + node->parent = subject; +} + +#define TREE_LEFT -1 +#define TREE_RIGHT 1 +#define TREE_ROOT 0 + +void registerfile(filetree_t **root, filetree_t *parent, int loc, file_t *file) +{ + filetree_t *node; + filetree_t *uncle; + + file->size = filesize(file->d_name); + file->inode = getinode(file->d_name); + + node = (filetree_t*) malloc(sizeof(filetree_t)); + if (node == NULL) { + errormsg("out of memory!\n"); + exit(1); + } + + node->file = file; + node->left = NULL; + node->right = NULL; + node->parent = parent; + node->color = COLOR_RED; + + if (loc == TREE_ROOT) + *root = node; + else if (loc == TREE_LEFT) + parent->left = node; + else + parent->right = node; + + while (node != *root && node->parent->color == COLOR_RED) { + if (node->parent->parent == NULL) return; + + if (node->parent == node->parent->parent->left) { + uncle = node->parent->parent->right; + if (uncle == NULL) return; + + if (uncle->color == COLOR_RED) { + node->parent->color = COLOR_BLACK; + uncle->color = COLOR_BLACK; + node->parent->parent->color = COLOR_RED; + node = node->parent->parent; + } else { + if (node == node->parent->right) { + node = node->parent; + rotate_left(root, node); + } + node->parent->color = COLOR_BLACK; + node->parent->parent->color = COLOR_RED; + rotate_right(root, node->parent->parent); + } + } else { + uncle = node->parent->parent->left; + if (uncle == NULL) return; + + if (uncle->color == COLOR_RED) { + node->parent->color = COLOR_BLACK; + uncle->color = COLOR_BLACK; + node->parent->parent->color = COLOR_RED; + node = node->parent->parent; + } else { + if (node == node->parent->right) { + node = node->parent; + rotate_left(root, node); + } + node->parent->color = COLOR_BLACK; + node->parent->parent->color = COLOR_RED; + rotate_right(root, node->parent->parent); + } + } + } + + (*root)->color = COLOR_BLACK; +} + +#endif /* [#ifdef EXPERIMENTAL_RBTREE] */ + +#ifndef EXPERIMENTAL_RBTREE + +int registerfile(filetree_t **branch, file_t *file) +{ + file->size = filesize(file->d_name); + file->inode = getinode(file->d_name); + + *branch = (filetree_t*) malloc(sizeof(filetree_t)); + if (*branch == NULL) { + errormsg("out of memory!\n"); + exit(1); + } + + (*branch)->file = file; + (*branch)->left = NULL; + (*branch)->right = NULL; + + return 1; +} + +#endif /* [#ifndef EXPERIMENTAL_RBTREE] */ + +file_t *checkmatch(filetree_t **root, filetree_t *checktree, file_t *file) +{ + int cmpresult; + char *crcsignature; + off_t fsize; + + /* If inodes are equal one of the files is a hard link, which + is usually not accidental. We don't want to flag them as + duplicates, unless the user specifies otherwise. */ + + if (!ISFLAG(flags, F_CONSIDERHARDLINKS) && getinode(file->d_name) == + checktree->file->inode) return NULL; + + fsize = filesize(file->d_name); + + if (fsize < checktree->file->size) + cmpresult = -1; + else + if (fsize > checktree->file->size) cmpresult = 1; + else { + if (checktree->file->crcsignature == NULL) { + crcsignature = getcrcsignature(checktree->file->d_name); + if (crcsignature == NULL) return NULL; + + checktree->file->crcsignature = (char*) malloc(strlen(crcsignature)+1); + if (checktree->file->crcsignature == NULL) { + errormsg("out of memory\n"); + exit(1); + } + strcpy(checktree->file->crcsignature, crcsignature); + } + + if (file->crcsignature == NULL) { + crcsignature = getcrcsignature(file->d_name); + if (crcsignature == NULL) return NULL; + + file->crcsignature = (char*) malloc(strlen(crcsignature)+1); + if (file->crcsignature == NULL) { + errormsg("out of memory\n"); + exit(1); + } + strcpy(file->crcsignature, crcsignature); + } + + cmpresult = strcmp(file->crcsignature, checktree->file->crcsignature); + } + + if (cmpresult < 0) { + if (checktree->left != NULL) { + return checkmatch(root, checktree->left, file); + } else { +#ifndef EXPERIMENTAL_RBTREE + registerfile(&(checktree->left), file); +#else + registerfile(root, checktree, TREE_LEFT, file); +#endif + return NULL; + } + } else if (cmpresult > 0) { + if (checktree->right != NULL) { + return checkmatch(root, checktree->right, file); + } else { +#ifndef EXPERIMENTAL_RBTREE + registerfile(&(checktree->right), file); +#else + registerfile(root, checktree, TREE_RIGHT, file); +#endif + return NULL; + } + } else return checktree->file; +} + +/* Do a bit-for-bit comparison in case two different files produce the + same signature. Unlikely, but better safe than sorry. */ + +int confirmmatch(FILE *file1, FILE *file2) +{ + unsigned char c1; + unsigned char c2; + size_t r1; + size_t r2; + + fseek(file1, 0, SEEK_SET); + fseek(file2, 0, SEEK_SET); + + do { + r1 = fread(&c1, sizeof(c1), 1, file1); + r2 = fread(&c2, sizeof(c2), 1, file2); + + if (c1 != c2) return 0; /* file contents are different */ + } while (r1 && r2); + + if (r1 != r2) return 0; /* file lengths are different */ + + return 1; +} + +void printmatches(file_t *files) +{ + file_t *tmpfile; + + while (files != NULL) { + if (files->hasdupes) { + if (!ISFLAG(flags, F_OMITFIRST)) { + if (ISFLAG(flags, F_SHOWSIZE)) printf("%ld byte%seach:\n", files->size, + (files->size != 1) ? "s " : " "); + if (ISFLAG(flags, F_DSAMELINE)) escapefilename("\\ ", &files->d_name); + printf("%s%c", files->d_name, ISFLAG(flags, F_DSAMELINE)?' ':'\n'); + } + tmpfile = files->duplicates; + while (tmpfile != NULL) { + if (ISFLAG(flags, F_DSAMELINE)) escapefilename("\\ ", &tmpfile->d_name); + printf("%s%c", tmpfile->d_name, ISFLAG(flags, F_DSAMELINE)?' ':'\n'); + tmpfile = tmpfile->duplicates; + } + printf("\n"); + + } + + files = files->next; + } +} + +void autodelete(file_t *files) +{ + int counter; + int groups = 0; + int curgroup = 0; + file_t *tmpfile; + file_t *curfile; + file_t **dupelist; + int *preserve; + char *preservestr; + char *token; + char *tstr; + int number; + int sum; + int max = 0; + int x; + int i; + + curfile = files; + + while (curfile) { + if (curfile->hasdupes) { + counter = 1; + groups++; + + tmpfile = curfile->duplicates; + while (tmpfile) { + counter++; + tmpfile = tmpfile->duplicates; + } + + if (counter > max) max = counter; + } + + curfile = curfile->next; + } + + max++; + + dupelist = (file_t**) malloc(sizeof(file_t*) * max); + preserve = (int*) malloc(sizeof(int) * max); + preservestr = (char*) malloc(INPUT_SIZE); + + if (!dupelist || !preserve || !preservestr) { + errormsg("out of memory\n"); + exit(1); + } + + while (files) { + if (files->hasdupes) { + curgroup++; + counter = 1; + dupelist[counter] = files; + + printf("[%d] %s\n", counter, files->d_name); + + tmpfile = files->duplicates; + + while (tmpfile) { + dupelist[++counter] = tmpfile; + printf("[%d] %s\n", counter, tmpfile->d_name); + tmpfile = tmpfile->duplicates; + } + + printf("\n"); + + do { + printf("Set %d of %d, preserve files [1 - %d, all]", + curgroup, groups, counter); + if (ISFLAG(flags, F_SHOWSIZE)) printf(" (%ld byte%seach)", files->size, + (files->size != 1) ? "s " : " "); + printf(": "); + fflush(stdout); + + fgets(preservestr, INPUT_SIZE, stdin); + + i = strlen(preservestr) - 1; + + while (preservestr[i]!='\n'){ /* tail of buffer must be a newline */ + tstr = (char*) + realloc(preservestr, strlen(preservestr) + 1 + INPUT_SIZE); + if (!tstr) { /* couldn't allocate memory, treat as fatal */ + errormsg("out of memory!\n"); + exit(1); + } + + preservestr = tstr; + if (!fgets(preservestr + i + 1, INPUT_SIZE, stdin)) + break; /* stop if fgets fails -- possible EOF? */ + i = strlen(preservestr)-1; + } + + for (x = 1; x <= counter; x++) preserve[x] = 0; + + token = strtok(preservestr, " ,\n"); + + while (token != NULL) { + if (strcasecmp(token, "all") == 0) + for (x = 0; x <= counter; x++) preserve[x] = 1; + + number = 0; + sscanf(token, "%d", &number); + if (number > 0 && number <= counter) preserve[number] = 1; + + token = strtok(NULL, " ,\n"); + } + + for (sum = 0, x = 1; x <= counter; x++) sum += preserve[x]; + } while (sum < 1); /* make sure we've preserved at least one file */ + + printf("\n"); + + for (x = 1; x <= counter; x++) { + if (preserve[x]) + printf(" [+] %s\n", dupelist[x]->d_name); + else { + printf(" [-] %s\n", dupelist[x]->d_name); + remove(dupelist[x]->d_name); + } + } + printf("\n"); + } + + files = files->next; + } + + free(dupelist); + free(preserve); + free(preservestr); +} + +void help_text() +{ + printf("Usage: fdupes [options] DIRECTORY...\n\n"); + + printf(" -r --recurse \tinclude files residing in subdirectories\n"); + printf(" -s --symlinks \tfollow symlinks\n"); + printf(" -H --hardlinks \tnormally, when two or more files point to the same\n"); + printf(" \tdisk area they are treated as non-duplicates; this\n"); + printf(" \toption will change this behavior\n"); + printf(" -n --noempty \texclude zero-length files from consideration\n"); + printf(" -f --omitfirst \tomit the first file in each set of matches\n"); + printf(" -1 --sameline \tlist each set of matches on a single line\n"); + printf(" -S --size \tshow size of duplicate files\n"); + printf(" -q --quiet \thide progress indicator\n"); + printf(" -d --delete \tprompt user for files to preserve and delete all\n"); + printf(" \tothers; important: under particular circumstances,\n"); + printf(" \tdata may be lost when using this option together\n"); + printf(" \twith -s or --symlinks, or when specifying a\n"); + printf(" \tparticular directory more than once; refer to the\n"); + printf(" \tfdupes documentation for additional information\n"); + printf(" -v --version \tdisplay fdupes version\n"); + printf(" -h --help \tdisplay this help message\n\n"); +} + +int main(int argc, char **argv) { + int x; + int opt; + FILE *file1; + FILE *file2; + file_t *files = NULL; + file_t *curfile; + file_t *match = NULL; + filetree_t *checktree = NULL; + int filecount = 0; + int progress = 0; + + static struct option long_options[] = + { + { "omitfirst", 0, 0, 'f' }, + { "recurse", 0, 0, 'r' }, + { "quiet", 0, 0, 'q' }, + { "sameline", 0, 0, '1' }, + { "size", 0, 0, 'S' }, + { "symlinks", 0, 0, 's' }, + { "hardlinks", 0, 0, 'H' }, + { "noempty", 0, 0, 'n' }, + { "delete", 0, 0, 'd' }, + { "version", 0, 0, 'v' }, + { "help", 0, 0, 'h' }, + { 0, 0, 0, 0 } + }; + + program_name = argv[0]; + + while ((opt = getopt_long(argc, argv, "frq1SsHndvh", long_options, NULL)) != EOF) { + switch (opt) { + case 'f': + SETFLAG(flags, F_OMITFIRST); + break; + case 'r': + SETFLAG(flags, F_RECURSE); + break; + case 'q': + SETFLAG(flags, F_HIDEPROGRESS); + break; + case '1': + SETFLAG(flags, F_DSAMELINE); + break; + case 'S': + SETFLAG(flags, F_SHOWSIZE); + break; + case 's': + SETFLAG(flags, F_FOLLOWLINKS); + break; + case 'H': + SETFLAG(flags, F_CONSIDERHARDLINKS); + break; + case 'n': + SETFLAG(flags, F_EXCLUDEEMPTY); + break; + case 'd': + SETFLAG(flags, F_DELETEFILES); + break; + case 'v': + printf("fdupes %s\n", VERSION); + exit(0); + case 'h': + help_text(); + exit(1); + default: + fprintf(stderr, "Try `fdupes --help' for more information\n"); + exit(1); + } + } + + if (optind >= argc) { + errormsg("no directories specified\n"); + exit(1); + } + + for (x = optind; x < argc; x++) filecount += grokdir(argv[x], &files); + + if (!files) exit(0); + + curfile = files; + + while (curfile) { + if (!checktree) +#ifndef EXPERIMENTAL_RBTREE + registerfile(&checktree, curfile); +#else + registerfile(&checktree, NULL, TREE_ROOT, curfile); +#endif + else + match = checkmatch(&checktree, checktree, curfile); + + if (match != NULL) { + file1 = fopen(curfile->d_name, "rb"); + if (!file1) { + curfile = curfile->next; + continue; + } + + file2 = fopen(match->d_name, "rb"); + if (!file2) { + fclose(file1); + curfile = curfile->next; + continue; + } + + if (confirmmatch(file1, file2)) { + match->hasdupes = 1; + curfile->duplicates = match->duplicates; + match->duplicates = curfile; + } + + fclose(file1); + fclose(file2); + } + + curfile = curfile->next; + + if (!ISFLAG(flags, F_HIDEPROGRESS)) { + fprintf(stderr, "\rProgress [%d/%d] %d%% ", progress, filecount, + (int)((float) progress / (float) filecount * 100.0)); + progress++; + } + } + + if (!ISFLAG(flags, F_HIDEPROGRESS)) fprintf(stderr, "\r%40s\r", " "); + + if (ISFLAG(flags, F_DELETEFILES)) autodelete(files); + else printmatches(files); + + while (files) { + curfile = files->next; + free(files->d_name); + free(files); + files = curfile; + } + + purgetree(checktree); + + return 0; +} diff --git a/md5/README b/md5/README new file mode 100644 index 0000000..84a7f14 --- /dev/null +++ b/md5/README @@ -0,0 +1,4 @@ +The MD5 library code residing within this directory was written +by L. Peter Deutsch. Although distributed here with fdupes, the +license for his MD5 library is different from the fdupes license. +Please look md5.c or md5.h for licensing information. diff --git a/md5/md5.c b/md5/md5.c new file mode 100644 index 0000000..d6f78b8 --- /dev/null +++ b/md5/md5.c @@ -0,0 +1,392 @@ +/* + Copyright (C) 1999 Aladdin Enterprises. All rights reserved. + + This software is provided 'as-is', without any express or implied + warranty. In no event will the authors be held liable for any damages + arising from the use of this software. + + Permission is granted to anyone to use this software for any purpose, + including commercial applications, and to alter it and redistribute it + freely, subject to the following restrictions: + + 1. The origin of this software must not be misrepresented; you must not + claim that you wrote the original software. If you use this software + in a product, an acknowledgment in the product documentation would be + appreciated but is not required. + 2. Altered source versions must be plainly marked as such, and must not be + misrepresented as being the original software. + 3. This notice may not be removed or altered from any source distribution. + + L. Peter Deutsch + ghost@aladdin.com + + */ +/*$Id: md5.c $ */ +/* + Independent implementation of MD5 (RFC 1321). + + This code implements the MD5 Algorithm defined in RFC 1321. + It is derived directly from the text of the RFC and not from the + reference implementation. + + The original and principal author of md5.c is L. Peter Deutsch + . Other authors are noted in the change history + that follows (in reverse chronological order): + + 1999-11-04 lpd Edited comments slightly for automatic TOC extraction. + 1999-10-18 lpd Fixed typo in header comment (ansi2knr rather than md5). + 1999-05-03 lpd Original version. + */ + +#include "md5.h" + +#ifdef TEST +/* + * Compile with -DTEST to create a self-contained executable test program. + * The test program should print out the same values as given in section + * A.5 of RFC 1321, reproduced below. + */ +#include +main() +{ + static const char *const test[7] = { + "", /*d41d8cd98f00b204e9800998ecf8427e*/ + "a", /*0cc175b9c0f1b6a831c399e269772661*/ + "abc", /*900150983cd24fb0d6963f7d28e17f72*/ + "message digest", /*f96b697d7cb7938d525a2f31aaf161d0*/ + "abcdefghijklmnopqrstuvwxyz", /*c3fcd3d76192e4007dfb496cca67e13b*/ + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789", + /*d174ab98d277d9f5a5611c2c9f419d9f*/ + "12345678901234567890123456789012345678901234567890123456789012345678901234567890" /*57edf4a22be3c955ac49da2e2107b67a*/ + }; + int i; + + for (i = 0; i < 7; ++i) { + md5_state_t state; + md5_byte_t digest[16]; + int di; + + md5_init(&state); + md5_append(&state, (const md5_byte_t *)test[i], strlen(test[i])); + md5_finish(&state, digest); + printf("MD5 (\"%s\") = ", test[i]); + for (di = 0; di < 16; ++di) + printf("%02x", digest[di]); + printf("\n"); + } + return 0; +} +#endif /* TEST */ + + +/* + * For reference, here is the program that computed the T values. + */ +#if 0 +#include +main() +{ + int i; + for (i = 1; i <= 64; ++i) { + unsigned long v = (unsigned long)(4294967296.0 * fabs(sin((double)i))); + printf("#define T%d 0x%08lx\n", i, v); + } + return 0; +} +#endif +/* + * End of T computation program. + */ +#define T1 0xd76aa478 +#define T2 0xe8c7b756 +#define T3 0x242070db +#define T4 0xc1bdceee +#define T5 0xf57c0faf +#define T6 0x4787c62a +#define T7 0xa8304613 +#define T8 0xfd469501 +#define T9 0x698098d8 +#define T10 0x8b44f7af +#define T11 0xffff5bb1 +#define T12 0x895cd7be +#define T13 0x6b901122 +#define T14 0xfd987193 +#define T15 0xa679438e +#define T16 0x49b40821 +#define T17 0xf61e2562 +#define T18 0xc040b340 +#define T19 0x265e5a51 +#define T20 0xe9b6c7aa +#define T21 0xd62f105d +#define T22 0x02441453 +#define T23 0xd8a1e681 +#define T24 0xe7d3fbc8 +#define T25 0x21e1cde6 +#define T26 0xc33707d6 +#define T27 0xf4d50d87 +#define T28 0x455a14ed +#define T29 0xa9e3e905 +#define T30 0xfcefa3f8 +#define T31 0x676f02d9 +#define T32 0x8d2a4c8a +#define T33 0xfffa3942 +#define T34 0x8771f681 +#define T35 0x6d9d6122 +#define T36 0xfde5380c +#define T37 0xa4beea44 +#define T38 0x4bdecfa9 +#define T39 0xf6bb4b60 +#define T40 0xbebfbc70 +#define T41 0x289b7ec6 +#define T42 0xeaa127fa +#define T43 0xd4ef3085 +#define T44 0x04881d05 +#define T45 0xd9d4d039 +#define T46 0xe6db99e5 +#define T47 0x1fa27cf8 +#define T48 0xc4ac5665 +#define T49 0xf4292244 +#define T50 0x432aff97 +#define T51 0xab9423a7 +#define T52 0xfc93a039 +#define T53 0x655b59c3 +#define T54 0x8f0ccc92 +#define T55 0xffeff47d +#define T56 0x85845dd1 +#define T57 0x6fa87e4f +#define T58 0xfe2ce6e0 +#define T59 0xa3014314 +#define T60 0x4e0811a1 +#define T61 0xf7537e82 +#define T62 0xbd3af235 +#define T63 0x2ad7d2bb +#define T64 0xeb86d391 + +static void +md5_process(md5_state_t *pms, const md5_byte_t *data /*[64]*/) +{ + md5_word_t + a = pms->abcd[0], b = pms->abcd[1], + c = pms->abcd[2], d = pms->abcd[3]; + md5_word_t t; + +#ifndef ARCH_IS_BIG_ENDIAN +# define ARCH_IS_BIG_ENDIAN 1 /* slower, default implementation */ +#endif +#if ARCH_IS_BIG_ENDIAN + + /* + * On big-endian machines, we must arrange the bytes in the right + * order. (This also works on machines of unknown byte order.) + */ + md5_word_t X[16]; + const md5_byte_t *xp = data; + int i; + + for (i = 0; i < 16; ++i, xp += 4) + X[i] = xp[0] + (xp[1] << 8) + (xp[2] << 16) + (xp[3] << 24); + +#else /* !ARCH_IS_BIG_ENDIAN */ + + /* + * On little-endian machines, we can process properly aligned data + * without copying it. + */ + md5_word_t xbuf[16]; + const md5_word_t *X; + + if (!((data - (const md5_byte_t *)0) & 3)) { + /* data are properly aligned */ + X = (const md5_word_t *)data; + } else { + /* not aligned */ + memcpy(xbuf, data, 64); + X = xbuf; + } +#endif + +#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32 - (n)))) + + /* Round 1. */ + /* Let [abcd k s i] denote the operation + a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). */ +#define F(x, y, z) (((x) & (y)) | (~(x) & (z))) +#define SET(a, b, c, d, k, s, Ti)\ + t = a + F(b,c,d) + X[k] + Ti;\ + a = ROTATE_LEFT(t, s) + b + /* Do the following 16 operations. */ + SET(a, b, c, d, 0, 7, T1); + SET(d, a, b, c, 1, 12, T2); + SET(c, d, a, b, 2, 17, T3); + SET(b, c, d, a, 3, 22, T4); + SET(a, b, c, d, 4, 7, T5); + SET(d, a, b, c, 5, 12, T6); + SET(c, d, a, b, 6, 17, T7); + SET(b, c, d, a, 7, 22, T8); + SET(a, b, c, d, 8, 7, T9); + SET(d, a, b, c, 9, 12, T10); + SET(c, d, a, b, 10, 17, T11); + SET(b, c, d, a, 11, 22, T12); + SET(a, b, c, d, 12, 7, T13); + SET(d, a, b, c, 13, 12, T14); + SET(c, d, a, b, 14, 17, T15); + SET(b, c, d, a, 15, 22, T16); +#undef SET + + /* Round 2. */ + /* Let [abcd k s i] denote the operation + a = b + ((a + G(b,c,d) + X[k] + T[i]) <<< s). */ +#define G(x, y, z) (((x) & (z)) | ((y) & ~(z))) +#define SET(a, b, c, d, k, s, Ti)\ + t = a + G(b,c,d) + X[k] + Ti;\ + a = ROTATE_LEFT(t, s) + b + /* Do the following 16 operations. */ + SET(a, b, c, d, 1, 5, T17); + SET(d, a, b, c, 6, 9, T18); + SET(c, d, a, b, 11, 14, T19); + SET(b, c, d, a, 0, 20, T20); + SET(a, b, c, d, 5, 5, T21); + SET(d, a, b, c, 10, 9, T22); + SET(c, d, a, b, 15, 14, T23); + SET(b, c, d, a, 4, 20, T24); + SET(a, b, c, d, 9, 5, T25); + SET(d, a, b, c, 14, 9, T26); + SET(c, d, a, b, 3, 14, T27); + SET(b, c, d, a, 8, 20, T28); + SET(a, b, c, d, 13, 5, T29); + SET(d, a, b, c, 2, 9, T30); + SET(c, d, a, b, 7, 14, T31); + SET(b, c, d, a, 12, 20, T32); +#undef SET + + /* Round 3. */ + /* Let [abcd k s t] denote the operation + a = b + ((a + H(b,c,d) + X[k] + T[i]) <<< s). */ +#define H(x, y, z) ((x) ^ (y) ^ (z)) +#define SET(a, b, c, d, k, s, Ti)\ + t = a + H(b,c,d) + X[k] + Ti;\ + a = ROTATE_LEFT(t, s) + b + /* Do the following 16 operations. */ + SET(a, b, c, d, 5, 4, T33); + SET(d, a, b, c, 8, 11, T34); + SET(c, d, a, b, 11, 16, T35); + SET(b, c, d, a, 14, 23, T36); + SET(a, b, c, d, 1, 4, T37); + SET(d, a, b, c, 4, 11, T38); + SET(c, d, a, b, 7, 16, T39); + SET(b, c, d, a, 10, 23, T40); + SET(a, b, c, d, 13, 4, T41); + SET(d, a, b, c, 0, 11, T42); + SET(c, d, a, b, 3, 16, T43); + SET(b, c, d, a, 6, 23, T44); + SET(a, b, c, d, 9, 4, T45); + SET(d, a, b, c, 12, 11, T46); + SET(c, d, a, b, 15, 16, T47); + SET(b, c, d, a, 2, 23, T48); +#undef SET + + /* Round 4. */ + /* Let [abcd k s t] denote the operation + a = b + ((a + I(b,c,d) + X[k] + T[i]) <<< s). */ +#define I(x, y, z) ((y) ^ ((x) | ~(z))) +#define SET(a, b, c, d, k, s, Ti)\ + t = a + I(b,c,d) + X[k] + Ti;\ + a = ROTATE_LEFT(t, s) + b + /* Do the following 16 operations. */ + SET(a, b, c, d, 0, 6, T49); + SET(d, a, b, c, 7, 10, T50); + SET(c, d, a, b, 14, 15, T51); + SET(b, c, d, a, 5, 21, T52); + SET(a, b, c, d, 12, 6, T53); + SET(d, a, b, c, 3, 10, T54); + SET(c, d, a, b, 10, 15, T55); + SET(b, c, d, a, 1, 21, T56); + SET(a, b, c, d, 8, 6, T57); + SET(d, a, b, c, 15, 10, T58); + SET(c, d, a, b, 6, 15, T59); + SET(b, c, d, a, 13, 21, T60); + SET(a, b, c, d, 4, 6, T61); + SET(d, a, b, c, 11, 10, T62); + SET(c, d, a, b, 2, 15, T63); + SET(b, c, d, a, 9, 21, T64); +#undef SET + + /* Then perform the following additions. (That is increment each + of the four registers by the value it had before this block + was started.) */ + pms->abcd[0] += a; + pms->abcd[1] += b; + pms->abcd[2] += c; + pms->abcd[3] += d; +} + +void +md5_init(md5_state_t *pms) +{ + pms->count[0] = pms->count[1] = 0; + pms->abcd[0] = 0x67452301; + pms->abcd[1] = 0xefcdab89; + pms->abcd[2] = 0x98badcfe; + pms->abcd[3] = 0x10325476; +} + +void +md5_append(md5_state_t *pms, const md5_byte_t *data, int nbytes) +{ + const md5_byte_t *p = data; + int left = nbytes; + int offset = (pms->count[0] >> 3) & 63; + md5_word_t nbits = (md5_word_t)(nbytes << 3); + + if (nbytes <= 0) + return; + + /* Update the message length. */ + pms->count[1] += nbytes >> 29; + pms->count[0] += nbits; + if (pms->count[0] < nbits) + pms->count[1]++; + + /* Process an initial partial block. */ + if (offset) { + int copy = (offset + nbytes > 64 ? 64 - offset : nbytes); + + memcpy(pms->buf + offset, p, copy); + if (offset + copy < 64) + return; + p += copy; + left -= copy; + md5_process(pms, pms->buf); + } + + /* Process full blocks. */ + for (; left >= 64; p += 64, left -= 64) + md5_process(pms, p); + + /* Process a final partial block. */ + if (left) + memcpy(pms->buf, p, left); +} + +void +md5_finish(md5_state_t *pms, md5_byte_t digest[16]) +{ + static const md5_byte_t pad[64] = { + 0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 + }; + md5_byte_t data[8]; + int i; + + /* Save the length before padding. */ + for (i = 0; i < 8; ++i) + data[i] = (md5_byte_t)(pms->count[i >> 2] >> ((i & 3) << 3)); + /* Pad to 56 bytes mod 64. */ + md5_append(pms, pad, ((55 - (pms->count[0] >> 3)) & 63) + 1); + /* Append the length. */ + md5_append(pms, data, 8); + for (i = 0; i < 16; ++i) + digest[i] = (md5_byte_t)(pms->abcd[i >> 2] >> ((i & 3) << 3)); +} diff --git a/md5/md5.h b/md5/md5.h new file mode 100644 index 0000000..6614ef4 --- /dev/null +++ b/md5/md5.h @@ -0,0 +1,98 @@ +/* + Copyright (C) 1999 Aladdin Enterprises. All rights reserved. + + This software is provided 'as-is', without any express or implied + warranty. In no event will the authors be held liable for any damages + arising from the use of this software. + + Permission is granted to anyone to use this software for any purpose, + including commercial applications, and to alter it and redistribute it + freely, subject to the following restrictions: + + 1. The origin of this software must not be misrepresented; you must not + claim that you wrote the original software. If you use this software + in a product, an acknowledgment in the product documentation would be + appreciated but is not required. + 2. Altered source versions must be plainly marked as such, and must not be + misrepresented as being the original software. + 3. This notice may not be removed or altered from any source distribution. + + L. Peter Deutsch + ghost@aladdin.com + + */ +/*$Id: md5.h $ */ +/* + Independent implementation of MD5 (RFC 1321). + + This code implements the MD5 Algorithm defined in RFC 1321. + It is derived directly from the text of the RFC and not from the + reference implementation. + + The original and principal author of md5.h is L. Peter Deutsch + . Other authors are noted in the change history + that follows (in reverse chronological order): + + 1999-11-04 lpd Edited comments slightly for automatic TOC extraction. + 1999-10-18 lpd Fixed typo in header comment (ansi2knr rather than md5); + added conditionalization for C++ compilation from Martin + Purschke . + 1999-05-03 lpd Original version. + */ + +#ifndef md5_INCLUDED +# define md5_INCLUDED + +/* + * This code has some adaptations for the Ghostscript environment, but it + * will compile and run correctly in any environment with 8-bit chars and + * 32-bit ints. Specifically, it assumes that if the following are + * defined, they have the same meaning as in Ghostscript: P1, P2, P3, + * ARCH_IS_BIG_ENDIAN. + */ + +typedef unsigned char md5_byte_t; /* 8-bit byte */ +typedef unsigned int md5_word_t; /* 32-bit word */ + +/* Define the state of the MD5 Algorithm. */ +typedef struct md5_state_s { + md5_word_t count[2]; /* message length in bits, lsw first */ + md5_word_t abcd[4]; /* digest buffer */ + md5_byte_t buf[64]; /* accumulate block */ +} md5_state_t; + +#ifdef __cplusplus +extern "C" +{ +#endif + +/* Initialize the algorithm. */ +#ifdef P1 +void md5_init(P1(md5_state_t *pms)); +#else +void md5_init(md5_state_t *pms); +#endif + +/* Append a string to the message. */ +#ifdef P3 +void md5_append(P3(md5_state_t *pms, const md5_byte_t *data, int nbytes)); +#else +void md5_append(md5_state_t *pms, const md5_byte_t *data, int nbytes); +#endif + +/* Finish the message and return the digest. */ +#ifdef P2 +void md5_finish(P2(md5_state_t *pms, md5_byte_t digest[16])); +#else +void md5_finish(md5_state_t *pms, md5_byte_t digest[16]); +#endif + +#ifdef __cplusplus +} /* end extern "C" */ +#endif + +#endif /* md5_INCLUDED */ + + + + diff --git a/testdir/nine_upsidedown b/testdir/nine_upsidedown new file mode 100644 index 0000000..ffe2fce --- /dev/null +++ b/testdir/nine_upsidedown @@ -0,0 +1 @@ +six diff --git a/testdir/recursed_a/five b/testdir/recursed_a/five new file mode 100644 index 0000000..54f9d6d --- /dev/null +++ b/testdir/recursed_a/five @@ -0,0 +1 @@ +five diff --git a/testdir/recursed_a/one b/testdir/recursed_a/one new file mode 100644 index 0000000..5626abf --- /dev/null +++ b/testdir/recursed_a/one @@ -0,0 +1 @@ +one diff --git a/testdir/recursed_a/two b/testdir/recursed_a/two new file mode 100644 index 0000000..f719efd --- /dev/null +++ b/testdir/recursed_a/two @@ -0,0 +1 @@ +two diff --git a/testdir/recursed_b/four b/testdir/recursed_b/four new file mode 100644 index 0000000..8510665 --- /dev/null +++ b/testdir/recursed_b/four @@ -0,0 +1 @@ +four diff --git a/testdir/recursed_b/one b/testdir/recursed_b/one new file mode 100644 index 0000000..5626abf --- /dev/null +++ b/testdir/recursed_b/one @@ -0,0 +1 @@ +one diff --git a/testdir/recursed_b/three b/testdir/recursed_b/three new file mode 100644 index 0000000..2bdf67a --- /dev/null +++ b/testdir/recursed_b/three @@ -0,0 +1 @@ +three diff --git a/testdir/recursed_b/two_plus_one b/testdir/recursed_b/two_plus_one new file mode 100644 index 0000000..2bdf67a --- /dev/null +++ b/testdir/recursed_b/two_plus_one @@ -0,0 +1 @@ +three diff --git a/testdir/symlink_dir b/testdir/symlink_dir new file mode 120000 index 0000000..895d79b --- /dev/null +++ b/testdir/symlink_dir @@ -0,0 +1 @@ +recursed_a \ No newline at end of file diff --git a/testdir/symlink_two b/testdir/symlink_two new file mode 120000 index 0000000..64c5e58 --- /dev/null +++ b/testdir/symlink_two @@ -0,0 +1 @@ +two \ No newline at end of file diff --git a/testdir/twice_one b/testdir/twice_one new file mode 100644 index 0000000..f719efd --- /dev/null +++ b/testdir/twice_one @@ -0,0 +1 @@ +two diff --git a/testdir/two b/testdir/two new file mode 100644 index 0000000..f719efd --- /dev/null +++ b/testdir/two @@ -0,0 +1 @@ +two diff --git a/testdir/with spaces a b/testdir/with spaces a new file mode 100644 index 0000000..9502d6c --- /dev/null +++ b/testdir/with spaces a @@ -0,0 +1 @@ +with spaces diff --git a/testdir/with spaces b b/testdir/with spaces b new file mode 100644 index 0000000..9502d6c --- /dev/null +++ b/testdir/with spaces b @@ -0,0 +1 @@ +with spaces diff --git a/testdir/zero_a b/testdir/zero_a new file mode 100644 index 0000000..e69de29 diff --git a/testdir/zero_b b/testdir/zero_b new file mode 100644 index 0000000..e69de29 -- 2.7.4