pristine-tar - regenerate pristine tarballs
B<pristine-tar> [-vdk] gendelta I<tarball> I<delta>
B<pristine-tar> [-vdk] gentar I<delta> I<tarball>
B<pristine-tar> [-vdk] [-m message] commit I<tarball> [I<upstream>]
B<pristine-tar> [-vdk] checkout I<tarball>
B<pristine-tar> [-vdk] list
pristine-tar can regenerate an exact copy of a pristine upstream tarball
using only a small binary I<delta> file and the contents of the tarball,
which are typically kept in an I<upstream> branch in version control.
The I<delta> file is designed to be checked into version control alongside
the I<upstream> branch, thus allowing Debian packages to be built entirely
using sources in version control, without the need to keep copies of
pristine-tar supports compressed tarballs, calling out to pristine-gz(1),
pristine-bz2(1), and pristine-xz(1) to produce the pristine gzip, bzip2,
=item pristine-tar gendelta I<tarball> I<delta>
This takes the specified upstream I<tarball> and generates a small binary
delta file that can later be used by B<pristine-tar gentar> to recreate the
If the delta filename is "-", it is written to standard output.
=item pristine-tar gentar I<delta> I<tarball>
This takes the specified I<delta> file and the files in the current
directory, which must have identical content to those in the upstream
tarball, and uses them to regenerate the pristine upstream I<tarball>.
If the delta filename is "-", it is read from standard input.
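Both B<gendelta> and B<gentar> give the delta filename "-" a special meaning. The following minimal Perl sketch illustrates that convention; open_delta is a hypothetical helper written for this example, not a function from pristine-tar.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of pristine-tar): open a delta for
# reading ("<") or writing (">"), treating "-" as standard input or
# standard output, the way gentar and gendelta interpret it.
sub open_delta {
    my ($name, $mode) = @_;
    die "mode must be '<' or '>'\n" unless $mode eq "<" || $mode eq ">";
    my $fh;
    if ($name eq "-") {
        $fh = $mode eq "<" ? \*STDIN : \*STDOUT;
    }
    else {
        open($fh, $mode, $name) or die "$name: $!";
    }
    binmode($fh);    # delta files are binary data
    return $fh;
}
```

Writing to standard output is what lets B<pristine-tar commit> capture a delta without an intermediate file: it runs gendelta with "-" as the delta filename and reads the result through a pipe.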
=item pristine-tar commit I<tarball> [I<upstream>]
B<pristine-tar commit> generates a pristine-tar delta file for the specified
I<tarball>, and commits it to version control. The B<pristine-tar checkout>
command can later be used to recreate the original tarball based only
on the information stored in version control.
The I<upstream> parameter specifies the tag or branch that contains the
same content that is present in the tarball. This defaults to
"refs/heads/upstream", or if there's no such branch, any
branch matching "upstream". The name of the tree it points to will be
recorded for later use by B<pristine-tar checkout>. Note that the content
does not need to be 100% identical to the content of the tarball, but
if it is not, additional space will be used in the delta file.
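The lookup of I<upstream> can be sketched as follows. This is a simplified Perl helper, pick_upstream_ref (hypothetical), modeled on the rule above: given the lines printed by "git show-ref" for the name, it prefers an exact match on the name or on refs/heads/I<name>, and otherwise accepts the result only when exactly one ref matches.

```perl
use strict;
use warnings;

# Hypothetical helper modeled on the upstream lookup described above.
# @reflines are lines of the form "<sha> <refname>" as printed by
# "git show-ref <name>". Returns the sha of the chosen ref, or dies.
sub pick_upstream_ref {
    my ($upstream, @reflines) = @_;
    die "failed to find ref for \"$upstream\"\n" unless @reflines;

    # An exact match on the name, or on the local branch, wins.
    foreach my $line (@reflines) {
        my ($sha, $ref) = $line =~ /^([0-9a-f]+)\s+(\S+)$/i or next;
        return $sha if $ref eq $upstream || $ref eq "refs/heads/$upstream";
    }

    # Otherwise the name must be unambiguous.
    if (@reflines == 1) {
        my ($sha) = $reflines[0] =~ /^([0-9a-f]+)\s/i;
        return $sha;
    }
    die "more than one ref matches \"$upstream\"\n";
}
```

The returned id is a commit id; as noted above, pristine-tar then records the id of the tree that commit points to, which is more robust against later changes to the commit.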
The delta files are stored in a branch named "pristine-tar", with filenames
formed by appending ".delta" to the name of the input tarball. This
branch is created or updated as needed to add each new delta.
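The mapping between a tarball and its entries on the pristine-tar branch can be shown directly. In this Perl sketch the helper names are hypothetical; it assumes, as the commit code does, that a ".id" entry holding the recorded upstream tree id is stored next to each delta, and that B<pristine-tar list> reports the ".delta" entries with the suffix removed.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helpers (not part of pristine-tar) showing how a tarball
# maps to entries on the pristine-tar branch, and back again.

# Only the basename is used, so ../hello-1.0.tar.gz and hello-1.0.tar.gz
# share the same entries. The .id entry records the upstream tree id.
sub entries_for_tarball {
    my $name = basename(shift);
    return ("$name.delta", "$name.id");
}

# Given file names on the branch (for example from
# "git ls-tree --name-only pristine-tar"), keep only the deltas and
# strip the suffix, giving the tarball names that can be checked out.
sub tarballs_from_entries {
    my @tarballs;
    foreach my $entry (@_) {
        (my $name = $entry) =~ s/\n\z//;
        push @tarballs, $name if $name =~ s/\.delta\z//;
    }
    return @tarballs;
}

my ($delta, $id) = entries_for_tarball("../hello-1.0.tar.gz");
print "$delta\n$id\n";
print "$_\n" foreach tarballs_from_entries($delta, $id);
```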
=item pristine-tar checkout I<tarball>
This regenerates a copy of the specified I<tarball> using information
previously saved in version control by B<pristine-tar commit>.
=item pristine-tar list
This lists tarballs that pristine-tar is able to check out from version
Verbose mode: show each command that is run.
Don't clean up the temporary directory on exit.
=item --message=message
Use this option to specify a custom commit message for B<pristine-tar commit>.
Suppose you maintain the hello package, in a git repository. You have
just created a tarball of the release, I<hello-1.0.tar.gz>, which you
will upload to a "forge" site.
You want to ensure that, if the "forge" loses the tarball, you can always
recreate exactly that same tarball. And you'd prefer not to keep copies
of tarballs for every release, as that could use a lot of disk space
when hello gets the background mp3s and user-contributed levels you
are planning for version 2.0.
The solution is to use pristine-tar to commit a delta file that efficiently
stores enough information to reproduce the tarball later.
 pristine-tar commit ../hello-1.0.tar.gz 1.0
Remember to tell git to push both the pristine-tar branch and your tag:
 git push --all
 git push --tags
Now it is a year later. The worst has come to pass; the "forge" lost
all its data, you deleted the tarballs to make room for bug report emails,
and you want to regenerate them. Happily, the git repository is still
 git clone git://github.com/joeyh/hello.git
 pristine-tar checkout ../hello-1.0.tar.gz
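If you also kept a checksum of the original release, you can confirm that the regenerated file matches it byte for byte. This is a minimal Perl sketch using the core Digest::SHA module; sha256_of is a hypothetical helper, and the expected checksum in the usage comment is a placeholder you would supply yourself.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical helper: hex SHA-256 digest of a file, read in binary mode.
sub sha256_of {
    my $file = shift;
    return Digest::SHA->new(256)->addfile($file, "b")->hexdigest;
}

# Usage sketch: compare a regenerated tarball against a recorded digest.
# my $expected = "...";   # the checksum you kept for hello-1.0.tar.gz
# die "mismatch\n" unless sha256_of("../hello-1.0.tar.gz") eq $expected;
```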
Only tarballs, gzipped tarballs, bzip2ed tarballs, and xzed tarballs
are currently supported.
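The three supported compressed formats can be told apart by their leading magic bytes. The sketch below illustrates that idea in Perl; it is an assumption-level illustration, not the is_gz/is_bz2/is_xz checks from Pristine::Tar::Formats that pristine-tar actually calls, and detect_compression is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper, not the Pristine::Tar::Formats implementation:
# guess the compression of a file from its first bytes.
# Returns "gz", "bz2" or "xz", or undef for anything else (for example
# an uncompressed tar file, or an unsupported compressor).
sub detect_compression {
    my $file = shift;
    open(my $fh, "<:raw", $file) or die "$file: $!";
    my $got = read($fh, my $magic, 6);
    close($fh);
    die "$file: read failed: $!\n" unless defined $got;
    return "gz"  if $magic =~ /^\x1f\x8b/;       # gzip magic
    return "bz2" if $magic =~ /^BZh[1-9]/;      # "BZh" + block size digit
    return "xz"  if $magic eq "\xfd7zXZ\x00";   # xz stream header magic
    return;
}
```

An undef result corresponds to the plain, uncompressed tarball case in the list above, or to a compressor pristine-tar does not handle.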
Currently only the git revision control system is supported by the
"checkout" and "commit" commands. It's ok if the working copy
is not clean or has uncommitted changes, or has changes staged in the
index; none of that will be touched by "checkout" or "commit".
Specifies a location to place temporary files, other than the default.
Joey Hess <joeyh@debian.org>
Licensed under the GPL, version 2 or above.
use Pristine::Tar::Delta;
use Pristine::Tar::Formats;
use Cwd qw{getcwd abs_path};
# Force locale to C since tar may output utf-8 filenames differently
# depending on the locale.
# Don't let environment change tar's behavior.
delete $ENV{TAR_OPTIONS};
# Ask tar to please be compatible with version 1.26.
# In version 1.27, it changed some fields used in longlink entries.
$ENV{PRISTINE_TAR_COMPAT}=1;
# The following two assignments are potentially munged during the
# build process to hold the values of TAR_PROGRAM and XDELTA_PROGRAM
# parameters as given to Makefile.PL.
my $tar_program = "tar";
my $xdelta_program = "xdelta";
gentar => [\&gentar, 2],
gendelta => [\&gendelta, 2],
commit => [\&commit],
checkout => [\&checkout, 1],
co => [\&checkout, 1],
"m|message=s" => \$message,
print STDERR "Usage: pristine-tar [-vdk] gendelta tarball delta\n";
print STDERR "       pristine-tar [-vdk] gentar delta tarball\n";
print STDERR "       pristine-tar [-vdk] [-m message] commit tarball [upstream]\n";
print STDERR "       pristine-tar [-vdk] checkout tarball\n";
print STDERR "       pristine-tar [-vdk] list\n";
sub unquote_filename {
my $filename = shift;
# Undo tar's --quoting-style=escape in a single pass, so that an escaped
# backslash followed by a letter (for example "\\n") is not decoded twice.
# \v is a vertical tab, i.e. \x0b.
my %unescape=(a => "\a", b => "\b", f => "\f", n => "\n",
r => "\r", t => "\t", v => "\x0b", "\\" => "\\");
$filename =~ s/\\([abfnrtv\\])/$unescape{$1}/g;
my $recreatetarball_tempdir;
sub recreatetarball {
my $manifestfile=shift;
my $tempdir=tempdir();
open (IN, "<", $manifestfile) || die "$manifestfile: $!";
link($manifestfile, "$tempdir/manifest") || die "link $tempdir/manifest: $!";
# The manifest and source should have the same filenames,
# but the manifest probably has all the files under a common
# subdirectory. Check if it does.
foreach my $file (@manifest) {
#debug("file: $file");
if ($file=~m!^(/?[^/]+)(/|$)!) {
if (length $subdir && $subdir ne $1) {
debug("found file not in subdir $subdir: $file");
elsif (! length $subdir) {
debug("set subdir to $subdir");
debug("found file not in subdir: $file");
if (length $subdir) {
debug("subdir is $subdir");
doit("mkdir", "$tempdir/workdir");
if (! $options{clobber_source}) {
doit("cp", "-a", $source, "$tempdir/workdir$subdir");
doit("mv", $source, "$tempdir/workdir$subdir");
# It's important that this create an identical tarball each time
# for a given set of input files. So don't include file metadata
# in the tarball, since it can easily vary.
foreach my $file (@manifest) {
my $unquoted_file = unquote_filename($file);
if (-l "$tempdir/workdir/$unquoted_file") {
# Can't set timestamp of a symlink, so
# replace the symlink with an empty file.
unlink("$tempdir/workdir/$unquoted_file") || die "unlink: $!";
open(OUT, ">", "$tempdir/workdir/$unquoted_file") || die "open: $!";
elsif (! -e "$tempdir/workdir/$unquoted_file") {
debug("$file is listed in the manifest but may not be present in the source directory");
if ($options{create_missing}) {
# Avoid tar failing on the nonexistent item by
# creating a dummy directory.
debug("creating missing $unquoted_file");
mkpath "$tempdir/workdir/$unquoted_file";
if (-d "$tempdir/workdir/$unquoted_file" && (-u _ || -g _ || -k _)) {
# tar behaves weirdly for some special modes
# and ignores --mode, so clear them.
debug("chmod $file");
chmod(0755, "$tempdir/workdir/$unquoted_file") ||
# Set file times only after modification of the directory content is
foreach my $file (@manifest) {
my $unquoted_file = unquote_filename($file);
if (-e "$tempdir/workdir/$unquoted_file") {
utime(0, 0, "$tempdir/workdir/$unquoted_file") || die "utime: $file: $!";
# If some files couldn't be matched up with the manifest,
# it's possible they do exist, but just with names that make sense
# to tar, but not to this program. Work around this and make sure
# such files have their metadata tweaked, by doing a full sweep of
debug("doing full tree sweep to catch missing files");
unlink($_) || die "unlink: $!";
open(OUT, ">", $_) || die "open: $!";
if (-d $_ && (-u _ || -g _ || -k _)) {
}, "$tempdir/workdir");
utime(0, 0, $_) || die "utime: $_: $!";
}, "$tempdir/workdir");
$recreatetarball_tempdir=$tempdir;
return recreatetarball_helper(%options);
sub recreatetarball_helper {
my $tempdir=$recreatetarball_tempdir;
my $ret="$tempdir/recreatetarball";
my @cmd=($tar_program, "cf", $ret, "--owner", 0, "--group", 0,
"--numeric-owner", "-C", "$tempdir/workdir",
"--no-recursion", "--mode", "0644",
"--files-from", "$tempdir/manifest");
if (exists $options{tar_format}) {
push @cmd, ("-H", $options{tar_format});
sub recreatetarball_longlink_100 {
# For a long time, Debian's tar had a patch that made it output
# larger tar files if a filename was exactly 100 bytes. Now that
# Debian's tar has been fixed, in order to recreate the tarball
# created by that version of tar, we rely on an environment
# variable to turn back on the old behavior.
# This variable is currently only available in Debian's tar,
# so users of non-Debian tar who want to recreate tarballs from
# deltas created using the old version of Debian's tar are SOL.
$ENV{TAR_LONGLINK_100}=1;
my $ret=recreatetarball_helper();
delete $ENV{TAR_LONGLINK_100};
my $delta=Pristine::Tar::Delta::read(Tarball => $deltafile);
Pristine::Tar::Delta::assert($delta, type => "tar", maxversion => 2,
minversion => 2, fields => [qw{manifest delta}]);
my $out=(defined $delta->{wrapper}
? tempdir()."/".basename($tarball).".tmp"
push @try, sub { recreatetarball($delta->{manifest}, getcwd,
clobber_source => 0, %opts) };
push @try, \&recreatetarball_longlink_100;
push @try, sub { recreatetarball($delta->{manifest}, getcwd,
clobber_source => 0, tar_format => "gnu", %opts) };
push @try, sub { recreatetarball($delta->{manifest}, getcwd,
clobber_source => 0, tar_format => "posix", %opts) };
foreach my $variant (@try) {
my $recreatetarball=$variant->();
my $ret=try_doit($xdelta_program, "patch", $delta->{delta}, $recreatetarball, $out);
error "Failed to reproduce original tarball. Please file a bug report.";
if (defined $delta->{wrapper}) {
my $delta_wrapper=Pristine::Tar::Delta::read(Tarball => $delta->{wrapper});
if (grep { $_ eq $delta_wrapper->{type} } qw{gz bz2 xz}) {
doit("pristine-".$delta_wrapper->{type},
($verbose ? "-v" : "--no-verbose"),
($debug ? "-d" : "--no-debug"),
($keep ? "-k" : "--no-keep"),
"gen".$delta_wrapper->{type},
$delta->{wrapper}, $out);
doit("mv", "-f", $out.".".$delta_wrapper->{type}, $tarball);
error "unknown wrapper file type: ".
$delta_wrapper->{type};
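# List-form pipe open: no shell, so unusual tarball names are passed intact,
# and the manifest is produced by the same tar used elsewhere.
open(IN, "-|", $tar_program, "--quoting-style=escape", "-tf", $tarball) || die "tar tf: $!";
open(OUT, ">", $manifest) || die "$manifest: $!";
# ./ or / in the manifest just confuses tar
print OUT "$_\n" if length $_;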
my $tempdir=tempdir();
# Check to see if it's compressed, and get uncompressed tarball.
my $compression=undef;
if (is_gz($tarball)) {
open(IN, "-|", "zcat", $tarball) || die "zcat: $!";
open(OUT, ">", "$tempdir/origtarball") || die "$tempdir/origtarball: $!";
print OUT $_ while <IN>;
close IN || die "zcat: $!";
close OUT || die "$tempdir/origtarball: $!";
elsif (is_bz2($tarball)) {
open(IN, "-|", "bzcat", $tarball) || die "bzcat: $!";
open(OUT, ">", "$tempdir/origtarball") || die "$tempdir/origtarball: $!";
print OUT $_ while <IN>;
close IN || die "bzcat: $!";
close OUT || die "$tempdir/origtarball: $!";
elsif (is_xz($tarball)) {
open(IN, "-|", "xzcat", $tarball) || die "xzcat: $!";
open(OUT, ">", "$tempdir/origtarball") || die "$tempdir/origtarball: $!";
print OUT $_ while <IN>;
close IN || die "xzcat: $!";
close OUT || die "$tempdir/origtarball: $!";
# Generate a wrapper file to recreate the compressed file.
if (defined $compression) {
$delta{wrapper}="$tempdir/wrapper";
doit("pristine-$compression",
($verbose ? "-v" : "--no-verbose"),
($debug ? "-d" : "--no-debug"),
($keep ? "-k" : "--no-keep"),
"gendelta", $tarball, $delta{wrapper});
$tarball="$tempdir/origtarball";
$delta{manifest}="$tempdir/manifest";
genmanifest($tarball, $delta{manifest});
if (! exists $opts{recreatetarball}) {
my $sourcedir="$tempdir/tmp";
doit("mkdir", $sourcedir);
doit($tar_program, "xf", File::Spec->rel2abs($tarball), "-C", $sourcedir);
# if all files were in a subdir, use the subdir as the sourcedir
my @out=grep { $_ ne "$sourcedir/.." && $_ ne "$sourcedir/." }
(glob("$sourcedir/*"), glob("$sourcedir/.*"));
if ($#out == 0 && -d $out[0]) {
$recreatetarball=recreatetarball("$tempdir/manifest", $sourcedir, clobber_source => 1);
$recreatetarball=$opts{recreatetarball};
$delta{delta}="$tempdir/delta";
# List form: no shell, so paths containing spaces are passed intact.
my $ret=system($xdelta_program, "delta", "-0", "--pristine", $recreatetarball, $tarball, $delta{delta}) >> 8;
# xdelta exits 1 on success if there were differences
if ($ret != 1 && $ret != 0) {
error "xdelta failed with return code $ret";
if (-s $delta{delta} >= -s $tarball) {
print STDERR "error: excessively large binary delta for $tarball\n";
if (! defined $compression) {
print STDERR "(Probably the tarball is compressed with an unsupported form of compression.)\n";
print STDERR "(Please consider filing a bug report.)\n";
Pristine::Tar::Delta::write(Tarball => $deltafile, {
(exists $ENV{GIT_DIR} && length $ENV{GIT_DIR})) {
error("cannot determine type of vcs used for the current directory");
if (defined $upstream && $upstream =~ /[A-Za-z0-9]{40}/) {
if (! defined $upstream) {
$upstream='upstream';
my @reflines=map { chomp; $_ } `git show-ref \Q$upstream\E`;
error "failed to find ref using: git show-ref $upstream";
# if one line's ref matches exactly, use it
foreach my $line (@reflines) {
my ($b)=$line=~/^[A-Za-z0-9]+\s(.*)/;
if ($b eq $upstream || $b eq "refs/heads/$upstream") {
($id)=$line=~/^([A-Za-z0-9]+)\s/;
if (@reflines == 1) {
($id)=$reflines[0]=~/^([A-Za-z0-9]+)\s/;
error "more than one ref matches \"$upstream\":\n".
join("\n", @reflines);
# We have an id that is probably a commit. Let's get to the
# id of the actual tree instead. This makes us more robust
# against any later changes to the commit.
my $treeid=`git rev-parse '$id^{tree}'`;
$id = $treeid if length $treeid;
doit("git archive --format=tar \Q$id\E | (cd '$dest' && tar x)");
die "unsupported vcs $vcs";
# Looks for a branch with the given name. If a local branch exists,
# returns it. Otherwise, looks for a remote branch, and if exactly
# one exists, returns that. If there's no such branch at all, returns
# undef. Finally, if there are multiple remote branches and no
# local branch, fails with an error.
my @reflines=split(/\n/, `git show-ref \Q$branch\E`);
my @remotes=grep { ! m/ refs\/heads\/\Q$branch\E$/ } @reflines;
if ($#reflines != $#remotes) {
if (@reflines == 0) {
elsif (@remotes == 1) {
my ($remote_branch)=$remotes[0]=~/^[A-Za-z0-9]+\s(.*)/;
return $remote_branch;
error "There's no local $branch branch. Several remote $branch branches exist.\n".
"Run \"git branch --track $branch <remote>\" to create a local $branch branch\n".
join("\n", @remotes);
my $branch="pristine-tar";
my $deltafile=basename($tarball).".delta";
my $idfile=basename($tarball).".id";
my $b=git_findbranch($branch);
error "no $branch branch found, use \"pristine-tar commit\" first";
elsif ($b eq $branch) {
$branch="refs/heads/$branch";
$delta=`git show $branch:\Q$deltafile\E`;
error "git show $branch:$deltafile failed";
if (! length $delta) {
error "git show $branch:$deltafile returned no content";
$id=`git show $branch:\Q$idfile\E`;
error "git show $branch:$idfile failed";
error "git show $branch:$idfile returned no id";
die "unsupported vcs $vcs";
return ($delta, $id);
my $branch="pristine-tar";
my $deltafile=basename($tarball).".delta";
my $idfile=basename($tarball).".id";
my $commit_message=defined $message ? $message :
"pristine-tar data for ".basename($tarball);
my $tempdir=tempdir();
open(OUT, ">", "$tempdir/$deltafile") || die "$tempdir/$deltafile: $!";
open(OUT, ">", "$tempdir/$idfile") || die "$tempdir/$idfile: $!";
# Commit the delta to a branch in git without affecting the
# index, and without touching the working tree. Aka deep
$ENV{GIT_INDEX_FILE}="$tempdir/index";
$ENV{GIT_WORK_TREE}="$tempdir";
if (! exists $ENV{GIT_DIR} || ! length $ENV{GIT_DIR}) {
$ENV{GIT_DIR}=getcwd."/.git";
$ENV{GIT_DIR}=abs_path($ENV{GIT_DIR});
chdir($tempdir) || die "chdir: $!";
# If there's no local branch, branch from a remote branch
# if one exists. If there's no remote branch either, the
# code below will create the local branch.
my $b=git_findbranch($branch);
if (defined $b && $b ne $branch) {
doit("git branch --track \Q$branch\E \Q$b\E");
my $branch_exists=(system("git show-ref --quiet --verify refs/heads/$branch") == 0);
if ($branch_exists) {
doit("git ls-tree -r --full-name $branch | git update-index --index-info");
doit("git", "update-index", "--add", $deltafile, $idfile);
my $sha=`git write-tree`;
error("git write-tree failed");
error("git write-tree did not return a sha");
my $pid = open(COMMIT, "|-");
my $commitopts=$branch_exists ? "-p $branch" : "";
my $commitsha=`git commit-tree $sha $commitopts`;
error("git commit-tree failed");
$commitsha=~s/\n//sg;
if (! length $commitsha) {
error("git commit-tree did not return a sha");
doit("git", "update-ref", "refs/heads/$branch", $commitsha);
print COMMIT $commit_message."\n";
close COMMIT || error("git commit-tree failed");
message("committed $deltafile to branch $branch");
die "unsupported vcs $vcs";
my $upstream=shift; # optional
if (! defined $tarball || @_) {
my $tempdir=tempdir();
my ($sourcedir, $id)=export($upstream);
genmanifest($tarball, "$tempdir/manifest");
my $recreatetarball=recreatetarball("$tempdir/manifest", $sourcedir,
clobber_source => 1, create_missing => 1);
my $pid = open(GENDELTA, "-|");
gendelta($tarball, "-", recreatetarball => $recreatetarball);
my $delta=<GENDELTA>;
close GENDELTA || error "failed to generate delta";
commitdelta($delta, $id, $tarball);
my ($delta, $id)=checkoutdelta($tarball);
my ($sourcedir, undef)=export($id);
my $pid = open(GENTAR, "|-");
$tarball=abs_path($tarball);
chdir($sourcedir) || die "chdir $sourcedir: $!";
gentar("-", $tarball, clobber_source => 1, create_missing => 1);
close GENTAR || error "failed to generate tarball";
message("successfully generated $tarball");
my $branch="pristine-tar";
my $b=git_findbranch($branch);
open(LIST, "-|", "git", "ls-tree", "--name-only", $b) || die "git ls-tree: $!";
next unless s/\.delta$//;
die "unsupported vcs $vcs";