5 pristine-tar - regenerate pristine tarballs
9 B<pristine-tar> [-vdk] gendelta I<tarball> I<delta>
11 B<pristine-tar> [-vdk] gentar I<delta> I<tarball>
13 B<pristine-tar> [-vdk] [-m message] commit I<tarball> [I<upstream>]
15 B<pristine-tar> [-vdk] checkout I<tarball>
17 B<pristine-tar> [-vdk] list
21 pristine-tar can regenerate an exact copy of a pristine upstream tarball
22 using only a small binary I<delta> file and the contents of the tarball,
23 which are typically kept in an I<upstream> branch in version control.
25 The I<delta> file is designed to be checked into version control along-side
26 the I<upstream> branch, thus allowing Debian packages to be built entirely
27 using sources in version control, without the need to keep copies of
30 pristine-tar supports compressed tarballs, calling out to pristine-gz(1),
31 pristine-bz2(1), and pristine-xz(1) to produce the pristine gzip, bzip2,
38 =item pristine-tar gendelta I<tarball> I<delta>
40 This takes the specified upstream I<tarball>, and generates a small binary
41 delta file that can later be used by pristine-tar gentar to recreate the
44 If the delta filename is "-", it is written to standard output.
46 =item pristine-tar gentar I<delta> I<tarball>
48 This takes the specified I<delta> file, and the files in the current
49 directory, which must have identical content to those in the upstream
50 tarball, and uses these to regenerate the pristine upstream I<tarball>.
52 If the delta filename is "-", it is read from standard input.
54 =item pristine-tar commit I<tarball> [I<upstream>]
56 B<pristine-tar commit> generates a pristine-tar delta file for the specified
57 I<tarball>, and commits it to version control. The B<pristine-tar checkout>
58 command can later be used to recreate the original tarball based only
59 on the information stored in version control.
61 The I<upstream> parameter specifies the tag or branch that contains the
62 same content that is present in the tarball. This defaults to
63 "refs/heads/upstream", or if there's no such branch, any
64 branch matching "upstream". The name of the tree it points to will be
65 recorded for later use by B<pristine-tar checkout>. Note that the content
66 does not need to be 100% identical to the content of the tarball, but
67 if it is not, additional space will be used in the delta file.
69 The delta files are stored in a branch named "pristine-tar", with filenames
70 corresponding to the input tarball, with ".delta" appended. This
71 branch is created or updated as needed to add each new delta.
73 =item pristine-tar checkout I<tarball>
75 This regenerates a copy of the specified I<tarball> using information
76 previously saved in version control by B<pristine-tar commit>.
78 =item pristine-tar list
80 This lists tarballs that pristine-tar is able to checkout from version
93 Verbose mode, show each command that is run.
105 Don't clean up the temporary directory on exit.
109 =item --message=message
111 Use this option to specify a custom commit message to pristine-tar commit.
117 Suppose you maintain the hello package, in a git repository. You have
118 just created a tarball of the release, I<hello-1.0.tar.gz>, which you
119 will upload to a "forge" site.
121 You want to ensure that, if the "forge" loses the tarball, you can always
122 recreate exactly that same tarball. And you'd prefer not to keep copies
123 of tarballs for every release, as that could use a lot of disk space
124 when hello gets the background mp3s and user-contributed levels you
125 are planning for version 2.0.
127 The solution is to use pristine-tar to commit a delta file that efficiently
128 stores enough information to reproduce the tarball later.
132 pristine-tar commit ../hello-1.0.tar.gz 1.0
134 Remember to tell git to push both the pristine-tar branch, and your tag:
136 git push --all --tags
138 Now it is a year later. The worst has come to pass; the "forge" lost
139 all its data, you deleted the tarballs to make room for bug report emails,
140 and you want to regenerate them. Happily, the git repository is still
143 git clone git://github.com/joeyh/hello.git
145 pristine-tar checkout ../hello-1.0.tar.gz
149 Only tarballs, gzipped tarballs, bzip2ed tarballs, and xzed tarballs
150 are currently supported.
152 Currently only the git revision control system is supported by the
153 "checkout" and "commit" commands. It's ok if the working copy
154 is not clean or has uncommitted changes, or has changes staged in the
155 index; none of that will be touched by "checkout" or "commit".
163 Specifies a location to place temporary files, other than the default.
169 Joey Hess <joeyh@debian.org>
171 Licensed under the GPL, version 2 or above.
178 use Pristine::Tar::Delta;
179 use Pristine::Tar::Formats;
182 use Cwd qw{getcwd abs_path};
184 # Force locale to C since tar may output utf-8 filenames differently
185 # depending on the locale.
188 # Don't let environment change tar's behavior.
189 delete $ENV{TAR_OPTIONS};
192 # The following two assignments are potentially munged during the
193 # build process to hold the values of TAR_PROGRAM and XDELTA_PROGRAM
194 # parameters as given to Makefile.PL.
195 my $tar_program = "tar";
196 my $xdelta_program = "xdelta";
203 gentar => [\&gentar, 2],
204 gendelta => [\&gendelta, 2],
205 commit => [\&commit],
207 checkout => [\&checkout, 1],
208 co => [\&checkout, 1],
212 "m|message=s" => \$message,
217 print STDERR "Usage: pristine-tar [-vdk] gendelta tarball delta\n";
218 print STDERR " pristine-tar [-vdk] gentar delta tarball\n";
219 print STDERR " pristine-tar [-vdk] [-m message] commit tarball [upstream]\n";
220 print STDERR " pristine-tar [-vdk] checkout tarball\n";
221 print STDERR " pristine-tar list\n";
225 sub unquote_filename {
226 my $filename = shift;
228 $filename =~ s/\\a/\a/g;
229 $filename =~ s/\\b/\b/g;
230 $filename =~ s/\\f/\f/g;
231 $filename =~ s/\\n/\n/g;
232 $filename =~ s/\\r/\r/g;
233 $filename =~ s/\\t/\t/g;
234 $filename =~ s/\\v/\x11/g;
235 $filename =~ s/\\\\/\\/g;
240 my $recreatetarball_tempdir;
241 sub recreatetarball {
242 my $manifestfile=shift;
246 my $tempdir=tempdir();
249 open (IN, "<", $manifestfile) || die "$manifestfile: $!";
255 link($manifestfile, "$tempdir/manifest") || die "link $tempdir/manifest: $!";
257 # The manifest and source should have the same filenames,
258 # but the manifest probably has all the files under a common
259 # subdirectory. Check if it does.
261 foreach my $file (@manifest) {
262 #debug("file: $file");
263 if ($file=~m!^(/?[^/]+)(/|$)!) {
264 if (length $subdir && $subdir ne $1) {
265 debug("found file not in subdir $subdir: $file");
269 elsif (! length $subdir) {
271 debug("set subdir to $subdir");
275 debug("found file not in subdir: $file");
281 if (length $subdir) {
282 debug("subdir is $subdir");
283 doit("mkdir", "$tempdir/workdir");
287 if (! $options{clobber_source}) {
288 doit("cp", "-a", $source, "$tempdir/workdir$subdir");
291 doit("mv", $source, "$tempdir/workdir$subdir");
294 # It's important that this create an identical tarball each time
295 # for a given set of input files. So don't include file metadata
296 # in the tarball, since it can easily vary.
298 foreach my $file (@manifest) {
299 my $unquoted_file = unquote_filename($file);
301 if (-l "$tempdir/workdir/$unquoted_file") {
302 # Can't set timestamp of a symlink, so
303 # replace the symlink with an empty file.
304 unlink("$tempdir/workdir/$unquoted_file") || die "unlink: $!";
305 open(OUT, ">", "$tempdir/workdir/$unquoted_file") || die "open: $!";
308 elsif (! -e "$tempdir/workdir/$unquoted_file") {
309 debug("$file is listed in the manifest but may not be present in the source directory");
312 if ($options{create_missing}) {
313 # Avoid tar failing on the nonexistent item by
314 # creating a dummy directory.
315 debug("creating missing $unquoted_file");
316 mkpath "$tempdir/workdir/$unquoted_file";
320 if (-d "$tempdir/workdir/$unquoted_file" && (-u _ || -g _ || -k _)) {
321 # tar behaves weirdly for some special modes
322 # and ignores --mode, so clear them.
323 debug("chmod $file");
324 chmod(0755, "$tempdir/workdir/$unquoted_file") ||
329 # Set file times only after modifying of the directory content is
331 foreach my $file (@manifest) {
332 my $unquoted_file = unquote_filename($file);
333 if (-e "$tempdir/workdir/$unquoted_file") {
334 utime(0, 0, "$tempdir/workdir/$unquoted_file") || die "utime: $file: $!";
338 # If some files couldn't be matched up with the manifest,
339 # it's possible they do exist, but just with names that make sense
340 # to tar, but not to this program. Work around this and make sure
341 # such files have their metadata tweaked, by doing a full sweep of
344 debug("doing full tree sweep to catch missing files");
348 unlink($_) || die "unlink: $!";
349 open(OUT, ">", $_) || die "open: $!";
352 if (-d $_ && (-u _ || -g _ || -k _)) {
356 }, "$tempdir/workdir");
358 utime(0, 0, $_) || die "utime: $_: $!";
359 }, "$tempdir/workdir");
362 delete $ENV{TAR_LONGLINK_100};
363 $recreatetarball_tempdir=$tempdir;
364 return recreatetarball_helper(%options);
367 sub recreatetarball_helper {
369 my $tempdir=$recreatetarball_tempdir;
371 my $ret="$tempdir/recreatetarball";
372 my @cmd=($tar_program, "cf", $ret, "--owner", 0, "--group", 0,
373 "--numeric-owner", "-C", "$tempdir/workdir",
374 "--no-recursion", "--mode", "0644",
375 "--files-from", "$tempdir/manifest");
376 if (exists $options{tar_format}) {
377 push @cmd, ("-H", $options{tar_format});
385 sub recreatetarball_longlink_100 {
386 # For a long time, Debian's tar had a patch that made it output
387 # larger tar files if a filename was exactly 100 bytes. Now that
388 # Debian's tar has been fixed, in order to recreate the tarball
389 # created by that version of tar, we reply on on an environment
390 # variable to turn back on the old behavior.
392 # This variable is currently only available in Debian's tar,
393 # so users of non-debian tar who want to recreate tarballs from
394 # deltas created using the old version of Debian's tar are SOL.
396 $ENV{TAR_LONGLINK_100}=1;
397 my $ret=recreatetarball_helper();
398 delete $ENV{TAR_LONGLINK_100};
407 my $delta=Pristine::Tar::Delta::read(Tarball => $deltafile);
408 Pristine::Tar::Delta::assert($delta, type => "tar", maxversion => 2,
409 minversion => 2, fields => [qw{manifest delta}]);
411 my $out=(defined $delta->{wrapper}
412 ? tempdir()."/".basename($tarball).".tmp"
416 push @try, sub { recreatetarball($delta->{manifest}, getcwd,
417 clobber_source => 0, %opts) };
418 push @try, \&recreatetarball_longlink_100;
419 push @try, sub { recreatetarball($delta->{manifest}, getcwd,
420 clobber_source => 0, tar_format => "gnu", %opts) };
421 push @try, sub { recreatetarball($delta->{manifest}, getcwd,
422 clobber_source => 0, tar_format => "posix", %opts) };
425 foreach my $variant (@try) {
426 my $recreatetarball=$variant->();
427 my $ret=try_doit($xdelta_program, "patch", $delta->{delta}, $recreatetarball, $out);
434 error "Failed to reproduce original tarball. Please file a bug report.";
437 if (defined $delta->{wrapper}) {
438 my $delta_wrapper=Pristine::Tar::Delta::read(Tarball => $delta->{wrapper});
439 if (grep { $_ eq $delta_wrapper->{type} } qw{gz bz2 xz}) {
440 doit("pristine-".$delta_wrapper->{type},
441 ($verbose ? "-v" : "--no-verbose"),
442 ($debug ? "-d" : "--no-debug"),
443 ($keep ? "-k" : "--no-keep"),
444 "gen".$delta_wrapper->{type},
445 $delta->{wrapper}, $out);
446 doit("mv", "-f", $out.".".$delta_wrapper->{type}, $tarball);
449 error "unknown wrapper file type: ".
450 $delta_wrapper->{type};
459 open(IN, "tar --quoting-style=escape -tf $tarball |") || die "tar tf: $!";
460 open(OUT, ">", $manifest) || die "$!";
463 # ./ or / in the manifest just confuses tar
465 print OUT "$_\n" if length $_;
476 my $tempdir=tempdir();
479 # Check to see if it's compressed, and get uncompressed tarball.
480 my $compression=undef;
481 if (is_gz($tarball)) {
483 open(IN, "-|", "zcat", $tarball) || die "zcat: $!";
484 open(OUT, ">", "$tempdir/origtarball") || die "$tempdir/origtarball: $!";
485 print OUT $_ while <IN>;
486 close IN || die "zcat: $!";
487 close OUT || die "$tempdir/origtarball: $!";
489 elsif (is_bz2($tarball)) {
491 open(IN, "-|", "bzcat", $tarball) || die "bzcat: $!";
492 open(OUT, ">", "$tempdir/origtarball") || die "$tempdir/origtarball: $!";
493 print OUT $_ while <IN>;
494 close IN || die "bzcat: $!";
495 close OUT || die "$tempdir/origtarball: $!";
497 elsif (is_xz($tarball)) {
499 open(IN, "-|", "xzcat", $tarball) || die "xzcat: $!";
500 open(OUT, ">", "$tempdir/origtarball") || die "$tempdir/origtarball: $!";
501 print OUT $_ while <IN>;
502 close IN || die "xzcat: $!";
503 close OUT || die "$tempdir/origtarball: $!";
507 # Generate a wrapper file to recreate the compressed file.
508 if (defined $compression) {
509 $delta{wrapper}="$tempdir/wrapper";
510 doit("pristine-$compression",
511 ($verbose ? "-v" : "--no-verbose"),
512 ($debug ? "-d" : "--no-debug"),
513 ($keep ? "-k" : "--no-keep"),
514 "gendelta", $tarball, $delta{wrapper});
515 $tarball="$tempdir/origtarball";
518 $delta{manifest}="$tempdir/manifest";
519 genmanifest($tarball, $delta{manifest});
522 if (! exists $opts{recreatetarball}) {
523 my $sourcedir="$tempdir/tmp";
524 doit("mkdir", $sourcedir);
525 doit($tar_program, "xf", File::Spec->rel2abs($tarball), "-C", $sourcedir);
526 # if all files were in a subdir, use the subdir as the sourcedir
527 my @out=grep { $_ ne "$sourcedir/.." && $_ ne "$sourcedir/." }
528 (glob("$sourcedir/*"), glob("$sourcedir/.*"));
529 if ($#out == 0 && -d $out[0]) {
532 $recreatetarball=recreatetarball("$tempdir/manifest", $sourcedir, clobber_source => 1);
535 $recreatetarball=$opts{recreatetarball};
538 $delta{delta}="$tempdir/delta";
539 my $ret=system("$xdelta_program delta -0 --pristine $recreatetarball $tarball $delta{delta}") >> 8;
540 # xdelta exits 1 on success if there were differences
541 if ($ret != 1 && $ret != 0) {
542 error "xdelta failed with return code $ret";
545 if (-s $delta{delta} >= -s $tarball) {
546 print STDERR "error: excessively large binary delta for $tarball\n";
547 if (! defined $compression) {
548 print STDERR "(Probably the tarball is compressed with an unsupported form of compression.)\n";
551 print STDERR "(Please consider filing a bug report.)\n";
556 Pristine::Tar::Delta::write(Tarball => $deltafile, {
565 (exists $ENV{GIT_DIR} && length $ENV{GIT_DIR})) {
569 error("cannot determine type of vcs used for the current directory");
581 if (defined $upstream && $upstream =~ /[A-Za-z0-9]{40}/) {
585 if (! defined $upstream) {
586 $upstream='upstream';
589 my @reflines=map { chomp; $_ } `git show-ref \Q$upstream\E`;
591 error "failed to find ref using: git show-ref $upstream";
594 # if one line's ref matches exactly, use it
595 foreach my $line (@reflines) {
596 my ($b)=$line=~/^[A-Za-z0-9]+\s(.*)/;
597 if ($b eq $upstream || $b eq "refs/heads/$upstream") {
598 ($id)=$line=~/^([A-Za-z0-9]+)\s/;
604 if (@reflines == 1) {
605 ($id)=$reflines[0]=~/^([A-Za-z0-9]+)\s/;
608 error "more than one ref matches \"$upstream\":\n".
609 join("\n", @reflines);
614 # We have an id that is probably a commit. Let's get to the
615 # id of the actual tree instead. This makes us more robust
616 # against any later changes to the commit.
617 my $treeid=`git rev-parse '$id^{tree}'`;
619 $id = $treeid if length $treeid;
621 doit("git archive --format=tar \Q$id\E | (cd '$dest' && tar x)");
624 die "unsupported vcs $vcs";
631 # Looks for a branch with the given name. If a local branch exists,
632 # returns it. Otherwise, looks for a remote branch, and if exactly
633 # one exists, returns that. If there's no such branch at all, returns
634 # undef. Finally, if there are multiple remote branches and no
635 # local branch, fails with an error.
638 my @reflines=split(/\n/, `git show-ref \Q$branch\E`);
639 my @remotes=grep { ! m/ refs\/heads\/\Q$branch\E$/ } @reflines;
640 if ($#reflines != $#remotes) {
644 if (@reflines == 0) {
647 elsif (@remotes == 1) {
648 my ($remote_branch)=$remotes[0]=~/^[A-Za-z0-9]+\s(.*)/;
649 return $remote_branch;
652 error "There's no local $branch branch. Several remote $branch branches exist.\n".
653 "Run \"git branch --track $branch <remote>\" to create a local $branch branch\n".
654 join("\n", @remotes);
662 my $branch="pristine-tar";
663 my $deltafile=basename($tarball).".delta";
664 my $idfile=basename($tarball).".id";
670 my $b=git_findbranch($branch);
672 error "no $branch branch found, use \"pristine-tar commit\" first";
674 elsif ($b eq $branch) {
675 $branch="refs/heads/$branch";
682 $delta=`git show $branch:\Q$deltafile\E`;
684 error "git show $branch:$deltafile failed";
686 if (! length $delta) {
687 error "git show $branch:$deltafile returned no content";
689 $id=`git show $branch:\Q$idfile\E`;
691 error "git show $branch:$idfile failed";
695 error "git show $branch:$idfile returned no id";
699 die "unsupported vcs $vcs";
702 return ($delta, $id);
710 my $branch="pristine-tar";
711 my $deltafile=basename($tarball).".delta";
712 my $idfile=basename($tarball).".id";
713 my $commit_message=defined $message ? $message :
714 "pristine-tar data for ".basename($tarball);
718 my $tempdir=tempdir();
719 open(OUT, ">$tempdir/$deltafile") || die "$tempdir/$deltafile: $!";
722 open(OUT, ">$tempdir/$idfile") || die "$tempdir/$idfile: $!";
726 # Commit the delta to a branch in git without affecting the
727 # index, and without touching the working tree. Aka deep
729 $ENV{GIT_INDEX_FILE}="$tempdir/index";
730 if (! exists $ENV{GIT_DIR} || ! length $ENV{GIT_DIR}) {
731 $ENV{GIT_DIR}=getcwd."/.git";
734 $ENV{GIT_DIR}=abs_path($ENV{GIT_DIR});
736 chdir($tempdir) || die "chdir: $!";
738 # If there's no local branch, branch from a remote branch
739 # if one exists. If there's no remote branch either, the
740 # code below will create the local branch.
741 my $b=git_findbranch($branch);
742 if (defined $b && $b ne $branch) {
743 doit("git branch --track \Q$branch\E \Q$b\E");
746 my $branch_exists=(system("git show-ref --quiet --verify refs/heads/$branch") == 0);
747 if ($branch_exists) {
748 doit("git ls-tree -r --full-name $branch | git update-index --index-info");
750 doit("git", "update-index", "--add", $deltafile, $idfile);
751 my $sha=`git write-tree`;
753 error("git write-tree failed");
757 error("git write-tree did not return a sha");
759 my $pid = open(COMMIT, "|-");
762 my $commitopts=$branch_exists ? "-p $branch" : "";
763 my $commitsha=`git commit-tree $sha $commitopts`;
765 error("git commit-tree failed");
767 $commitsha=~s/\n//sg;
768 if (! length $commitsha) {
769 error("git commit-tree did not return a sha");
771 doit("git", "update-ref", "refs/heads/$branch", $commitsha);
776 print COMMIT $commit_message."\n";
777 close COMMIT || error("git commit-tree failed");
780 message("committed $deltafile to branch $branch");
783 die "unsupported vcs $vcs";
789 my $upstream=shift; # optional
791 if (! defined $tarball || @_) {
795 my $tempdir=tempdir();
796 my ($sourcedir, $id)=export($upstream);
797 genmanifest($tarball, "$tempdir/manifest");
798 my $recreatetarball=recreatetarball("$tempdir/manifest", $sourcedir,
799 clobber_source => 1, create_missing => 1);
800 my $pid = open(GENDELTA, "-|");
803 gendelta($tarball, "-", recreatetarball => $recreatetarball);
807 my $delta=<GENDELTA>;
808 close GENDELTA || error "failed to generate delta";
809 commitdelta($delta, $id, $tarball);
815 my ($delta, $id)=checkoutdelta($tarball);
816 my ($sourcedir, undef)=export($id);
817 my $pid = open(GENTAR, "|-");
820 $tarball=abs_path($tarball);
821 chdir($sourcedir) || die "chdir $sourcedir: $!";
822 gentar("-", $tarball, clobber_source => 1, create_missing => 1);
826 close GENTAR || error "failed to generate tarball";
828 message("successfully generated $tarball");
832 my $branch="pristine-tar";
835 my $b=git_findbranch($branch);
837 open (LIST, "git ls-tree $b --name-only |");
840 next unless s/\.delta$//;
846 die "unsupported vcs $vcs";