review.tizen.org Git - platform/upstream/snappy.git/log

Update CI configurations.

Bump GCC and Clang on Travis and remove Visual Studio 2015 from AppVeyor.

Ensure DecompressAllTags starts on a 32-byte boundary + 16 bytes.

First of all, I'm sorry about this ugly hack. I hope the following long
explanation is enough to justify it.

We have observed that, in some conditions, the results for dataset number 10
(pb) in the zippy benchmark can show a >20% regression on Skylake CPUs.

In order to diagnose this, we profiled the benchmark looking at hot functions
(99% of the time is spent on DecompressAllTags), then looked at the generated
code to see if there was any difference. In order to discard a minor difference
we observed in register allocation we replaced zippy.cc with a pre-built assembly
file so it was the same in both variants, and we still were able to reproduce the
regression.

After discarding a regression caused by the compiler, we digged a bit further
and noticed that the alignment of the function in the final binary was
different. Both were aligned to a 16-byte boundary, but the slower one was also
(by chance) aligned to a 32-byte boundary. A regression caused by alignment
differences would explain why I could reproduce it consistently on the same CitC
client, but not others: slight differences in the sources can cause the resulting
binary to have different layout.

Here are some detailed benchmark results before/after the fix. Note how fixing
the alignment makes the difference between baseline and experiment go away, but
regular 32-byte alignment puts both variants in the same ballpark as the
original regression:

Original (note BM_UCord_10 and BM_UDataBuffer_10 around the -24% line):

  BASELINE
  BM_UCord/10                    2938           2932          24194 3.767GB/s  pb
  BM_UDataBuffer/10              3008           3004          23316 3.677GB/s  pb

  EXPERIMENT
  BM_UCord/10                    3797           3789          18512 2.915GB/s  pb
  BM_UDataBuffer/10              4024           4016          17543 2.750GB/s  pb

Aligning DecompressAllTags to a 32-byte boundary:

  BASELINE
  BM_UCord/10                    3872           3862          18035 2.860GB/s  pb
  BM_UDataBuffer/10              4010           3998          17591 2.763GB/s  pb

  EXPERIMENT
  BM_UCord/10                    3884           3876          18126 2.850GB/s  pb
  BM_UDataBuffer/10              4037           4027          17199 2.743GB/s  pb

Aligning DecompressAllTags to a 32-byte boundary + 16 bytes (this patch):

  BASELINE
  BM_UCord/10                    3103           3095          22642 3.569GB/s  pb
  BM_UDataBuffer/10              3186           3177          21947 3.476GB/s  pb

  EXPERIMENT
  BM_UCord/10                    3104           3095          22632 3.569GB/s  pb
  BM_UDataBuffer/10              3167           3159          22076 3.496GB/s  pb

This change forces the "good" alignment for DecompressAllTags which, if
anything, should make benchmark results more stable (and maybe we'll improve
some unlucky application!).

Fix an incorrect analysis / comment in the "pattern doubling" code.

This should have a miniscule positive effect on performance; the
main idea of the CL is just to fix the incorrect comment.

Fix Travis CI configuration for OSX.

Rework a very hot, very sensitive part of snappy to reduce the number of
instructions, the number of dynamic branches, and avoid a particular
loop structure than LLVM has a very hard time optimizing for this
particular case.

The code being changed is part of the hottest path for snappy
decompression. In the benchmarks for decompressing protocol buffers,
this has proven to be amazingly sensitive to the slightest changes in
code layout. For example, previously we added '.p2align 5' assembly
directive to the code. This essentially padded the loop out from the
function. Merely by doing this we saw significant performance
improvements.

As a consequence, several of the compiler's typically reasonable
optimizations can have surprising bad impacts. Loop unrolling is a
primary culprit, but in the next LLVM release we are seeing an issue due
to loop rotation. While some of the problems caused by the newly
triggered loop rotation in LLVM can be mitigated with ongoing work on
LLVM's code layout optimizations (specifically, loop header cloning),
that is a fairly long term project. And even minor fluctuations in how
that subsequent optimization is performed may prevent gaining the
performance back.

For now, we need some way to unblock the next LLVM release which
contains a generic improvement to the LLVM loop optimizer that enables
loop rotation in more places, but uncovers this sensitivity and weakness
in a particular case.

This CL restructures the loop to have a simpler structure. Specifically,
we eagerly test what the terminal condition will be and provide two
versions of the copy loop that use a single loop predicate.

The comments in the source code and benchmarks indicate that only one of
these two cases is actually hot: we expect to generally have enough slop
in the buffer. That in turn allows us to generate a much simpler branch
and loop structure for the hot path (especially for the protocol buffer
decompression benchmark).

However, structuring even this simple loop in a way that doesn't trigger
some other performance bubble (often a more severe one) is quite
challenging. We have to carefully manage the variables used in the loop
and the addressing pattern. We should teach LLVM how to do this
reliably, but that too is a *much* more significant undertaking and is
extremely rare to have this degree of importance. The desired structure
of the loop, as shown with IACA's analysis for the broadwell
micro-architecture (HSW and SKX are similar):

| Num Of |                    Ports pressure in cycles                     |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
---------------------------------------------------------------------------------
|   1    |           |     | 1.0   1.0 |           |     |     |     |     |    | mov rcx, qword ptr [rdi+rdx*1-0x8]
|   2^   |           |     |           | 0.4       | 1.0 |     |     | 0.6 |    | mov qword ptr [rdi], rcx
|   1    |           |     |           | 1.0   1.0 |     |     |     |     |    | mov rcx, qword ptr [rdi+rdx*1]
|   2^   |           |     | 0.3       |           | 1.0 |     |     | 0.7 |    | mov qword ptr [rdi+0x8], rcx
|   1    | 0.5       |     |           |           |     | 0.5 |     |     |    | add rdi, 0x10
|   1    | 0.2       |     |           |           |     |     | 0.8 |     |    | cmp rdi, rax
|   0F   |           |     |           |           |     |     |     |     |    | jb 0xffffffffffffffe9

Specifically, the arrangement of addressing modes for the stores such
that micro-op fusion (indicated by the `^` on the `2` micro-op count) is
important to achieve good throughput for this loop.

The other thing necessary to make this change effective is to remove our
previous hack using `.p2align 5` to pad out the main decompression loop,
and to forcibly disable loop unrolling for critical loops. Because this
change simplifies the loop structure, more unrolling opportunities show
up. Also, the next LLVM release's generic loop optimization improvements
allow unrolling in more places, requiring still more disabling of
unrolling in this change.  Perhaps most surprising of these is that we
must disable loop unrolling in the *slow* path. While unrolling there
seems pointless, it should also be harmless.  This cold code is laid out
very far away from all of the hot code. All the samples shown in a
profile of the benchmark occur before this loop in the function. And
yet, if the loop gets unrolled (which seems to only happen reliably with
the next LLVM release) we see a nearly 20% regression in decompressing
protocol buffers!

With the current release of LLVM, we still observe some regression from
this source change, but it is fairly small (5% on decompressing protocol
buffers, less elsewhere). And with the next LLVM release it drops to
under 1% even in that case. Meanwhile, without this change, the next
release of LLVM will regress decompressing protocol buffers by more than
10%.

Fix generated version number in open source release.

Lands GitHub PR #61. The patch was also independently contributed by
Martin Gieseking <martin.gieseking@uos.de>.

Tag open source release 1.1.7.

Add a loop alignment directive to work around a performance regression.

We found LLVM upstream change at rL310792 degraded zippy benchmark by
~3%. Performance analysis showed the regression was caused by some
side-effect. The incidental loop alignment change (from 32 bytes to 16
bytes) led to increase of branch miss prediction and caused the
regression. The regression was reproducible on several intel
micro-architectures, like sandybridge, haswell and skylake. Sadly we
still don't have good understanding about the internal of intel branch
predictor and cannot explain how the branch miss prediction increases
when the loop alignment changes, so we cannot make a real fix here. The
workaround solution in the patch is to add a directive, align the hot
loop to 32 bytes, which can restore the performance. This is in order to
unblock the flip of default compiler to LLVM.

Add GNUInstallDirs to CMake configuration.

This is modeled after https://github.com/google/googletest/pull/1160.
The immediate benefit is fixing the library install paths on 64-bit
Linux distributions, which tend to support running 32-bit and 64-bit
code side by side by installing 32-bit libraries in /usr/lib and 64-bit
libraries in /usr/lib64.

Use 64-bit optimized code path for ARM64.

This is inspired by https://github.com/google/snappy/pull/22.

Benchmark results with the change, Pixel C with Android N2G48B

Benchmark            Time(ns)    CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0             119544     119253       1501 818.9MB/s  html
BM_UFlat/1            1223950    1208588        163 554.0MB/s  urls
BM_UFlat/2              16081      15962      11527 7.2GB/s  jpg
BM_UFlat/3                356        352     416666 540.6MB/s  jpg_200
BM_UFlat/4              25010      24860       7683 3.8GB/s  pdf
BM_UFlat/5             484832     481572        407 811.1MB/s  html4
BM_UFlat/6             408410     408713        482 354.9MB/s  txt1
BM_UFlat/7             361714     361663        553 330.1MB/s  txt2
BM_UFlat/8            1090582    1087912        182 374.1MB/s  txt3
BM_UFlat/9            1503127    1503759        133 305.6MB/s  txt4
BM_UFlat/10            114183     114285       1715 989.6MB/s  pb
BM_UFlat/11            406714     407331        491 431.5MB/s  gaviota
BM_UIOVec/0            370397     369888        538 264.0MB/s  html
BM_UIOVec/1           3207510    3190000        100 209.9MB/s  urls
BM_UIOVec/2             16589      16573      11223 6.9GB/s  jpg
BM_UIOVec/3              1052       1052     165289 181.2MB/s  jpg_200
BM_UIOVec/4             49151      49184       3985 1.9GB/s  pdf
BM_UValidate/0          68115      68095       2893 1.4GB/s  html
BM_UValidate/1         792652     792000        250 845.4MB/s  urls
BM_UValidate/2            334        334     487804 343.1GB/s  jpg
BM_UValidate/3            235        235     666666 809.9MB/s  jpg_200
BM_UValidate/4           6126       6130      32626 15.6GB/s  pdf
BM_ZFlat/0             292697     290560        678 336.1MB/s  html (22.31 %)
BM_ZFlat/1            4062080    4050000        100 165.3MB/s  urls (47.78 %)
BM_ZFlat/2              29225      29274       6422 3.9GB/s  jpg (99.95 %)
BM_ZFlat/3               1099       1098     163934 173.7MB/s  jpg_200 (73.00 %)
BM_ZFlat/4              44117      44233       4205 2.2GB/s  pdf (83.30 %)
BM_ZFlat/5            1158058    1157894        171 337.4MB/s  html4 (22.52 %)
BM_ZFlat/6            1102983    1093922        181 132.6MB/s  txt1 (57.88 %)
BM_ZFlat/7             974142     975490        204 122.4MB/s  txt2 (61.91 %)
BM_ZFlat/8            2984670    2990000        100 136.1MB/s  txt3 (54.99 %)
BM_ZFlat/9            4100130    4090000        100 112.4MB/s  txt4 (66.26 %)
BM_ZFlat/10            276236     275139        716 411.0MB/s  pb (19.68 %)
BM_ZFlat/11            760091     759541        262 231.4MB/s  gaviota (37.72 %)

Baseline benchmark results, Pixel C with Android N2G48B

Benchmark            Time(ns)    CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0             148957     147565       1335 661.8MB/s  html
BM_UFlat/1            1527257    1500000        132 446.4MB/s  urls
BM_UFlat/2              19589      19397       8764 5.9GB/s  jpg
BM_UFlat/3                425        418     408163 455.3MB/s  jpg_200
BM_UFlat/4              30096      29552       6497 3.2GB/s  pdf
BM_UFlat/5             595933     594594        333 657.0MB/s  html4
BM_UFlat/6             516315     514360        383 282.0MB/s  txt1
BM_UFlat/7             454653     453514        441 263.2MB/s  txt2
BM_UFlat/8            1382687    1361111        144 299.0MB/s  txt3
BM_UFlat/9            1967590    1904761        105 241.3MB/s  txt4
BM_UFlat/10            148271     144560       1342 782.3MB/s  pb
BM_UFlat/11            523997     510471        382 344.4MB/s  gaviota
BM_UIOVec/0            478443     465227        417 209.9MB/s  html
BM_UIOVec/1           4172860    4060000        100 164.9MB/s  urls
BM_UIOVec/2             21470      20975       7342 5.5GB/s  jpg
BM_UIOVec/3              1357       1330      75187 143.4MB/s  jpg_200
BM_UIOVec/4             63143      61365       3031 1.6GB/s  pdf
BM_UValidate/0          86910      85125       2279 1.1GB/s  html
BM_UValidate/1        1022256    1000000        195 669.6MB/s  urls
BM_UValidate/2            420        417     400000 274.6GB/s  jpg
BM_UValidate/3            311        302     571428 630.0MB/s  jpg_200
BM_UValidate/4           7778       7584      25445 12.6GB/s  pdf
BM_ZFlat/0             469209     457547        424 213.4MB/s  html (22.31 %)
BM_ZFlat/1            5633510    5460000        100 122.6MB/s  urls (47.78 %)
BM_ZFlat/2              37896      36693       4524 3.1GB/s  jpg (99.95 %)
BM_ZFlat/3               1485       1441     123456 132.3MB/s  jpg_200 (73.00 %)
BM_ZFlat/4              74870      72775       2652 1.3GB/s  pdf (83.30 %)
BM_ZFlat/5            1857321    1785714        112 218.8MB/s  html4 (22.52 %)
BM_ZFlat/6            1538723    1492307        130 97.2MB/s  txt1 (57.88 %)
BM_ZFlat/7            1338236    1310810        148 91.1MB/s  txt2 (61.91 %)
BM_ZFlat/8            4050820    4040000        100 100.7MB/s  txt3 (54.99 %)
BM_ZFlat/9            5234940    5230000        100 87.9MB/s  txt4 (66.26 %)
BM_ZFlat/10            400309     400000        495 282.7MB/s  pb (19.68 %)
BM_ZFlat/11           1063042    1058510        188 166.1MB/s  gaviota (37.72 %)

Add unistd.h checks back to the CMake build.

getpagesize(), as well as its POSIX.2001 replacement
sysconf(_SC_PAGESIZE), is defined in <unistd.h>. On Linux and OS X,
including <sys/mman.h> is sufficient to get a definition for
getpagesize(). However, this is not true for the Android NDK. This CL
brings back the HAVE_UNISTD_H definition and its associated header
check.

This also adds a HAVE_FUNC_SYSCONF definition, which checks for the
presence of sysconf(). The definition can be used later to replace
getpagesize() with sysconf().

Replace getpagesize() with sysconf(_SC_PAGESIZE).

getpagesize() has been removed from POSIX.1-2001. Its recommended
replacement is sysconf(_SC_PAGESIZE).

Add guidelines for opensource contributions.

The guidelines follow the instructions at
https://opensource.google.com/docs/releasing/preparing/#CONTRIBUTING

Use _BitScanForward and _BitScanReverse on MSVC.

Based on https://github.com/google/snappy/pull/30

Add SNAPPY_ prefix to PREDICT_{TRUE,FALSE} macros.

Redo CMake configuration.

The style was changed to match the official manual [1], the install
configuration was simplified and now matches the official packaging
guide [2], and the config files use the CMake-specific variable syntax
${VAR} instead of the autoconf-compatible syntax @VAR@, as documented in
[3]. The public header files are declared as such (for CMake 3.3+), and
the generated headers are included in the library target definition.

The tests are only built if SNAPPY_BUILD_TESTS (default ON) is true, so
zippy can be easily used in projects that add_subdirectory() its source
code directly, instead of using find_package().

[1] https://cmake.org/cmake/help/git-master/manual/cmake-language.7.html
[2] https://cmake.org/cmake/help/git-master/manual/cmake-packages.7.html
[3] https://cmake.org/cmake/help/git-master/command/configure_file.html

Small improvements to open source CI configuration.

This CL fixes 64-bit Windows testing (), makes it possible to view the
test output in the Travis / AppVeyor CI console while the test is
running, and takes advantage of the new support for the .appveyor.yml
file name to make the CI configuration less obtrusive.

Support both static and shared library CMake builds.

This can be used to fix https://github.com/Homebrew/homebrew-core/issues/15722.

Inline DISALLOW_COPY_AND_ASSIGN.

snappy-stubs-public.h defined the DISALLOW_COPY_AND_ASSIGN macro, so the
definition propagated to all translation units that included the open
source headers. The macro is now inlined, thus avoiding polluting the
macro environment of snappy users.

snappy: Remove autoconf build configuration.

Clean up CMake header and type checks.

Unused macros: HAVE_DLFCN_H, HAVE_INTTYPES_H, HAVE_MEMORY_H,
HAVE_STDLIB_H, HAVE_STRINGS_H, HAVE_STRING_H, HAVE_SYS_BYTESWAP_H,
HAVE_SYS_STAT_H, HAVE_SYS_TYPES_H, HAVE_UNISTD_H.

Used but never set macros: HAVE_LIBLZF, HAVE_LIBQUICKLZ. These only gate
conditional includes. The code that takes advantage of them was removed.

Unused types: ssize_t.

The testing code uses HAVE_FUNC_MMAP, which was not wired in the CMake
build, causing a whole test to be skipped.

zippy: Re-release snappy 1.1.5 as 1.1.6.

The migration from autotools to CMake in 1.1.5 wasn't as smooth as
intended. The SONAME / SOVERSION were broken in both build systems,
causing breakages in systems that upgraded from snappy 1.1.4 to 1.1.5,
as reported in https://github.com/Homebrew/homebrew-core/issues/15274
and https://github.com/google/snappy/pull/45.

Tag open source release 1.1.5.

Set minimum CMake version to 3.1.

The project only needs CMake 3.1 features, and some Travis CI bots have
CMake 3.2.2. Therefore, requiring CMake 3.4 is inconvenient.

Update Travis CI config, add AppVeyor for Windows CI coverage.

Explicitly copy internal::wordmask to the stack array to work around a compiler
optimization with LLVM that converts const stack arrays to global arrays.  This
is a temporary change and should be reverted when https://reviews.llvm.org/D30759
is fixed.

With PIE, accessing stack arrays is more efficient than global arrays and
wordmask was moved to the stack due to that.  However, the LLVM compiler
automatically converts stack arrays, detected as constant, to global arrays
and this transformation hurts PIE performance with LLVM.

We are working to fix this in the LLVM compiler, via
https://reviews.llvm.org/D30759, to not do this conversion in PIE mode.  Until
this patch is finished, please consider this source change as a temporary
work around to keep this array on the stack.  This source change is important
to allow some projects to flip the default compiler from GCC to LLVM for
optimized builds.

This change works for the following reason.  The LLVM compiler does not convert
non-const stack arrays to global arrays and explicitly copying the elements is
enough to make the compiler assume that this is a non-const array.

With GCC, this change does not affect code-gen in any significant way.  The
array initialization code is slightly different as it copies the constants
directly to the stack.

With LLVM, this keeps the array on the stack.

No change in performance with GCC (within noise range). With LLVM, ~0.7%
improvement in optimized mode (no FDO) and ~1.75% improvement in FDO
mode.

Remove benchmarking support for fastlz.

Use 64 bit little endian on ppc64le.

This has tangible performance benefits.

This lands https://github.com/google/snappy/pull/27

Improve the SSE2 macro check on Windows.

This lands https://github.com/google/snappy/pull/37

Check for the existence of sys/uio.h in autoconf build.

This lands https://github.com/google/snappy/pull/32

Remove quicklz and lzf support in benchmarks.

Provide a CMakeLists.txt.

This lands https://github.com/google/snappy/pull/29

Clean up unused function warnings in snappy.

Remove "using namespace std;" from zippy-stubs-internal.h.

This makes it easier to build zippy, as some compiles require a warning
suppression to accept "using namespace std".

Add Travis CI configuration to snappy and fix the make build.

The make build in the open source version uses autoconf, which is set up
to expect a project that follows the gnu standard.

Rename README to README.md. It already in markdown, we might as well let github know so that it renders nicely.

Delete UnalignedCopy64 from snappy-stubs since the version in snappy.cc is more robust and possibly faster (assuming the compiler knows how to best copy 8 bytes between locations in memory the fastest way possible - a rather safe bet).

Add std:: prefix to STL non-type names.

In order to disable global using declarations, this CL qualifies
stl names with the std namespace.

Make UnalignedCopy64 not exhibit undefined behavior when src and dst overlap.

name         old speed      new speed      delta
BM_UFlat/0   3.09GB/s ± 3%  3.07GB/s ± 2%  -0.78%  (p=0.009 n=19+19)
BM_UFlat/1   1.63GB/s ± 2%  1.62GB/s ± 2%    ~     (p=0.099 n=19+20)
BM_UFlat/2   19.7GB/s ±19%  20.7GB/s ±11%    ~     (p=0.054 n=20+19)
BM_UFlat/3   1.61GB/s ± 2%  1.60GB/s ± 1%  -0.48%  (p=0.049 n=20+17)
BM_UFlat/4   15.8GB/s ± 7%  15.6GB/s ±10%    ~     (p=0.234 n=20+20)
BM_UFlat/5   2.47GB/s ± 1%  2.46GB/s ± 2%    ~     (p=0.608 n=19+19)
BM_UFlat/6   1.07GB/s ± 2%  1.07GB/s ± 1%    ~     (p=0.128 n=20+19)
BM_UFlat/7   1.01GB/s ± 1%  1.00GB/s ± 2%    ~     (p=0.656 n=15+19)
BM_UFlat/8   1.13GB/s ± 1%  1.13GB/s ± 1%    ~     (p=0.532 n=18+19)
BM_UFlat/9    918MB/s ± 1%   916MB/s ± 1%    ~     (p=0.443 n=19+18)
BM_UFlat/10  3.90GB/s ± 1%  3.90GB/s ± 1%    ~     (p=0.895 n=20+19)
BM_UFlat/11  1.30GB/s ± 1%  1.29GB/s ± 2%    ~     (p=0.156 n=19+19)
BM_UFlat/12  2.35GB/s ± 2%  2.34GB/s ± 1%    ~     (p=0.349 n=19+17)
BM_UFlat/13  2.07GB/s ± 1%  2.06GB/s ± 2%    ~     (p=0.475 n=18+19)
BM_UFlat/14  2.23GB/s ± 1%  2.23GB/s ± 1%    ~     (p=0.983 n=19+19)
BM_UFlat/15  1.55GB/s ± 1%  1.55GB/s ± 1%    ~     (p=0.314 n=19+19)
BM_UFlat/16  1.26GB/s ± 1%  1.26GB/s ± 1%    ~     (p=0.907 n=15+18)
BM_UFlat/17  2.32GB/s ± 1%  2.32GB/s ± 1%    ~     (p=0.604 n=18+19)
BM_UFlat/18  1.61GB/s ± 1%  1.61GB/s ± 1%    ~     (p=0.212 n=18+19)
BM_UFlat/19  1.78GB/s ± 1%  1.78GB/s ± 2%    ~     (p=0.350 n=19+19)
BM_UFlat/20  1.89GB/s ± 1%  1.90GB/s ± 2%    ~     (p=0.092 n=19+19)

Also tested the current version against UNALIGNED_STORE64(dst, UNALIGNED_LOAD64(src)), there is no difference (old is memcpy, new is UNALIGNED*):

name         old speed      new speed      delta
BM_UFlat/0   3.14GB/s ± 1%  3.16GB/s ± 2%    ~     (p=0.156 n=19+19)
BM_UFlat/1   1.62GB/s ± 1%  1.61GB/s ± 2%    ~     (p=0.102 n=19+20)
BM_UFlat/2   18.8GB/s ±17%  19.1GB/s ±11%    ~     (p=0.390 n=20+16)
BM_UFlat/3   1.59GB/s ± 1%  1.58GB/s ± 1%  -1.06%  (p=0.000 n=18+18)
BM_UFlat/4   15.8GB/s ± 6%  15.6GB/s ± 7%    ~     (p=0.184 n=19+20)
BM_UFlat/5   2.46GB/s ± 1%  2.44GB/s ± 1%  -0.95%  (p=0.000 n=19+18)
BM_UFlat/6   1.08GB/s ± 1%  1.06GB/s ± 1%  -1.17%  (p=0.000 n=19+18)
BM_UFlat/7   1.00GB/s ± 1%  0.99GB/s ± 1%  -1.16%  (p=0.000 n=19+18)
BM_UFlat/8   1.14GB/s ± 2%  1.12GB/s ± 1%  -1.12%  (p=0.000 n=19+18)
BM_UFlat/9    921MB/s ± 1%   914MB/s ± 1%  -0.84%  (p=0.000 n=20+17)
BM_UFlat/10  3.94GB/s ± 2%  3.92GB/s ± 1%    ~     (p=0.058 n=19+17)
BM_UFlat/11  1.29GB/s ± 1%  1.28GB/s ± 1%  -0.77%  (p=0.001 n=19+17)
BM_UFlat/12  2.34GB/s ± 1%  2.31GB/s ± 1%  -1.10%  (p=0.000 n=18+18)
BM_UFlat/13  2.06GB/s ± 1%  2.05GB/s ± 1%  -0.73%  (p=0.001 n=19+18)
BM_UFlat/14  2.22GB/s ± 1%  2.20GB/s ± 1%  -0.73%  (p=0.000 n=18+18)
BM_UFlat/15  1.55GB/s ± 1%  1.53GB/s ± 1%  -1.07%  (p=0.000 n=19+18)
BM_UFlat/16  1.26GB/s ± 1%  1.25GB/s ± 1%  -0.79%  (p=0.000 n=18+18)
BM_UFlat/17  2.31GB/s ± 1%  2.29GB/s ± 1%  -0.98%  (p=0.000 n=20+18)
BM_UFlat/18  1.61GB/s ± 1%  1.60GB/s ± 2%  -0.71%  (p=0.001 n=20+19)
BM_UFlat/19  1.77GB/s ± 1%  1.76GB/s ± 1%  -0.61%  (p=0.007 n=19+18)
BM_UFlat/20  1.89GB/s ± 1%  1.88GB/s ± 1%  -0.75%  (p=0.000 n=20+18)

Add compression size reporting hooks.

Also, force inlining util::compression::Sample().

The inlining change is necessary. Without it even with FDO+LIPO the call
doesn't get inlined and uses 4 registers to construct parameters (which
won't be used in the common case). In some of the more compute-bound
tests that causes extra spills and significant overhead (even if
call is sufficiently long).
For example, with inlining:
BM_UFlat/0 32.7µs ± 1% 33.1µs ± 1% +1.41%
without:
BM_UFlat/0 32.7µs ± 1% 37.7µs ± 1% +15.29%

Use #ifdef __SSE2__ for the emmintrin.h include, otherwise snappy.cc does not compile with -march=prescott.

1.1.4 release.

Improve zippy decompression speed.

The CL contains the following optimizations:

1) rewrite IncrementalCopy routine: single routine that splits the code into sections based on typical probabilities observed across a variety of inputs and helps reduce branch mispredictions both for FDO and non-FDO builds. IncrementalCopy is an adaptive routine that selects the best strategy based on input.
2) introduce UnalignedCopy128 that copies 128 bits per cycle using SSE2.
3) add branch hint for the main decoding loop. The non-literal case is taken more often in benchmarks. I expect this to be a noop in production with FDO. Note that this became apparent after step 1 above.
4) use the new IncrementalCopy in ZippyScatteredWriter.

I test two archs: x86_haswell and ppc_power8.

For x86_haswell I use FDO. For ppc_power8 I do not use FDO.

x86_haswell + FDO

name                   old speed      new speed      delta
BM_UCord/0             1.97GB/s ± 1%  3.19GB/s ± 1%  +62.08%  (p=0.000 n=19+18)
BM_UCord/1             1.28GB/s ± 1%  1.51GB/s ± 1%  +18.14%  (p=0.000 n=19+18)
BM_UCord/2             15.6GB/s ± 9%  15.5GB/s ± 7%     ~     (p=0.620 n=20+20)
BM_UCord/3              811MB/s ± 1%   808MB/s ± 1%   -0.38%  (p=0.009 n=17+18)
BM_UCord/4             12.4GB/s ± 4%  12.7GB/s ± 8%   +2.70%  (p=0.002 n=17+20)
BM_UCord/5             1.77GB/s ± 0%  2.33GB/s ± 1%  +31.37%  (p=0.000 n=18+18)
BM_UCord/6              900MB/s ± 1%  1006MB/s ± 1%  +11.71%  (p=0.000 n=18+17)
BM_UCord/7              858MB/s ± 1%   938MB/s ± 2%   +9.36%  (p=0.000 n=19+16)
BM_UCord/8              921MB/s ± 1%   985MB/s ±21%   +6.94%  (p=0.028 n=19+20)
BM_UCord/9              824MB/s ± 1%   800MB/s ±20%     ~     (p=0.113 n=19+20)
BM_UCord/10            2.60GB/s ± 1%  3.67GB/s ±21%  +41.31%  (p=0.000 n=19+20)
BM_UCord/11            1.07GB/s ± 1%  1.21GB/s ± 1%  +13.17%  (p=0.000 n=16+16)
BM_UCord/12            1.84GB/s ± 8%  2.18GB/s ± 1%  +18.44%  (p=0.000 n=16+19)
BM_UCord/13            1.83GB/s ±18%  1.89GB/s ± 1%   +3.14%  (p=0.000 n=17+19)
BM_UCord/14            1.96GB/s ± 2%  1.97GB/s ± 1%   +0.55%  (p=0.000 n=16+17)
BM_UCord/15            1.30GB/s ±20%  1.43GB/s ± 1%   +9.85%  (p=0.000 n=20+20)
BM_UCord/16             658MB/s ±20%   705MB/s ± 1%   +7.22%  (p=0.000 n=20+19)
BM_UCord/17            1.96GB/s ± 2%  2.15GB/s ± 1%   +9.73%  (p=0.000 n=16+19)
BM_UCord/18             555MB/s ± 1%   833MB/s ± 1%  +50.11%  (p=0.000 n=18+19)
BM_UCord/19            1.57GB/s ± 1%  1.75GB/s ± 1%  +11.34%  (p=0.000 n=20+20)
BM_UCord/20            1.72GB/s ± 2%  1.70GB/s ± 2%   -1.01%  (p=0.001 n=20+20)
BM_UCordStringSink/0   2.88GB/s ± 1%  3.15GB/s ± 1%   +9.56%  (p=0.000 n=17+20)
BM_UCordStringSink/1   1.50GB/s ± 1%  1.52GB/s ± 1%   +1.96%  (p=0.000 n=19+20)
BM_UCordStringSink/2   14.5GB/s ±10%  14.6GB/s ±10%     ~     (p=0.542 n=20+20)
BM_UCordStringSink/3   1.06GB/s ± 1%  1.08GB/s ± 1%   +1.77%  (p=0.000 n=18+20)
BM_UCordStringSink/4   12.6GB/s ± 7%  13.2GB/s ± 4%   +4.63%  (p=0.000 n=20+20)
BM_UCordStringSink/5   2.29GB/s ± 1%  2.36GB/s ± 1%   +3.05%  (p=0.000 n=19+20)
BM_UCordStringSink/6   1.01GB/s ± 2%  1.01GB/s ± 0%     ~     (p=0.055 n=20+18)
BM_UCordStringSink/7    945MB/s ± 1%   939MB/s ± 1%   -0.60%  (p=0.000 n=19+20)
BM_UCordStringSink/8   1.06GB/s ± 1%  1.07GB/s ± 1%   +0.62%  (p=0.000 n=18+20)
BM_UCordStringSink/9    866MB/s ± 1%   864MB/s ± 1%     ~     (p=0.107 n=19+20)
BM_UCordStringSink/10  3.64GB/s ± 2%  3.98GB/s ± 1%   +9.32%  (p=0.000 n=19+20)
BM_UCordStringSink/11  1.22GB/s ± 1%  1.22GB/s ± 1%   +0.61%  (p=0.001 n=19+20)
BM_UCordStringSink/12  2.23GB/s ± 1%  2.23GB/s ± 1%     ~     (p=0.692 n=19+20)
BM_UCordStringSink/13  1.96GB/s ± 1%  1.94GB/s ± 1%   -0.82%  (p=0.000 n=17+18)
BM_UCordStringSink/14  2.09GB/s ± 2%  2.08GB/s ± 1%     ~     (p=0.147 n=20+18)
BM_UCordStringSink/15  1.47GB/s ± 1%  1.45GB/s ± 1%   -0.88%  (p=0.000 n=20+19)
BM_UCordStringSink/16   908MB/s ± 1%   917MB/s ± 1%   +0.97%  (p=0.000 n=19+19)
BM_UCordStringSink/17  2.11GB/s ± 1%  2.20GB/s ± 1%   +4.35%  (p=0.000 n=18+20)
BM_UCordStringSink/18   804MB/s ± 2%  1106MB/s ± 1%  +37.52%  (p=0.000 n=20+20)
BM_UCordStringSink/19  1.67GB/s ± 1%  1.72GB/s ± 0%   +2.81%  (p=0.000 n=18+20)
BM_UCordStringSink/20  1.77GB/s ± 3%  1.77GB/s ± 3%     ~     (p=0.815 n=20+20)

ppc_power8

name                   old speed      new speed      delta
BM_UCord/0              918MB/s ± 6%  1262MB/s ± 0%   +37.56%  (p=0.000 n=17+16)
BM_UCord/1              671MB/s ±13%   879MB/s ± 2%   +30.99%  (p=0.000 n=18+16)
BM_UCord/2             12.6GB/s ± 8%  12.6GB/s ± 5%      ~     (p=0.452 n=17+19)
BM_UCord/3              285MB/s ±10%   284MB/s ± 4%    -0.50%  (p=0.021 n=19+17)
BM_UCord/4             5.21GB/s ±12%  6.59GB/s ± 1%   +26.37%  (p=0.000 n=17+16)
BM_UCord/5              913MB/s ± 4%  1253MB/s ± 1%   +37.27%  (p=0.000 n=16+17)
BM_UCord/6              461MB/s ±13%   547MB/s ± 1%   +18.67%  (p=0.000 n=18+16)
BM_UCord/7              455MB/s ± 2%   524MB/s ± 3%   +15.28%  (p=0.000 n=16+18)
BM_UCord/8              489MB/s ± 2%   584MB/s ± 2%   +19.47%  (p=0.000 n=17+17)
BM_UCord/9              410MB/s ±33%   490MB/s ± 1%   +19.64%  (p=0.000 n=17+18)
BM_UCord/10            1.10GB/s ± 3%  1.55GB/s ± 2%   +41.21%  (p=0.000 n=16+16)
BM_UCord/11             494MB/s ± 1%   558MB/s ± 1%   +12.92%  (p=0.000 n=17+18)
BM_UCord/12             608MB/s ± 3%   793MB/s ± 1%   +30.45%  (p=0.000 n=17+16)
BM_UCord/13             545MB/s ±18%   721MB/s ± 2%   +32.22%  (p=0.000 n=19+17)
BM_UCord/14             594MB/s ± 4%   748MB/s ± 3%   +25.99%  (p=0.000 n=17+17)
BM_UCord/15             628MB/s ± 1%   822MB/s ± 3%   +30.94%  (p=0.000 n=18+16)
BM_UCord/16             277MB/s ± 2%   280MB/s ±15%    +0.86%  (p=0.001 n=17+17)
BM_UCord/17             864MB/s ± 1%  1001MB/s ± 3%   +15.96%  (p=0.000 n=17+17)
BM_UCord/18             121MB/s ± 2%   284MB/s ± 4%  +134.08%  (p=0.000 n=17+18)
BM_UCord/19             594MB/s ± 0%   713MB/s ± 2%   +19.93%  (p=0.000 n=16+17)
BM_UCord/20             553MB/s ±10%   662MB/s ± 5%   +19.74%  (p=0.000 n=16+18)
BM_UCordStringSink/0   1.37GB/s ± 4%  1.48GB/s ± 2%    +8.51%  (p=0.000 n=16+16)
BM_UCordStringSink/1    969MB/s ± 1%   990MB/s ± 1%    +2.16%  (p=0.000 n=16+18)
BM_UCordStringSink/2   13.1GB/s ±11%  13.0GB/s ±14%      ~     (p=0.858 n=17+18)
BM_UCordStringSink/3    411MB/s ± 1%   415MB/s ± 1%    +0.93%  (p=0.000 n=16+17)
BM_UCordStringSink/4   6.81GB/s ± 8%  7.29GB/s ± 5%    +7.12%  (p=0.000 n=16+19)
BM_UCordStringSink/5   1.35GB/s ± 5%  1.45GB/s ±13%    +8.00%  (p=0.000 n=16+17)
BM_UCordStringSink/6    653MB/s ± 8%   653MB/s ± 3%    -0.12%  (p=0.007 n=17+19)
BM_UCordStringSink/7    618MB/s ±13%   597MB/s ±18%    -3.45%  (p=0.001 n=18+18)
BM_UCordStringSink/8    702MB/s ± 5%   702MB/s ± 1%    -0.10%  (p=0.012 n=17+16)
BM_UCordStringSink/9    590MB/s ± 2%   564MB/s ±13%    -4.46%  (p=0.000 n=16+17)
BM_UCordStringSink/10  1.63GB/s ± 2%  1.76GB/s ± 4%    +8.28%  (p=0.000 n=17+16)
BM_UCordStringSink/11   630MB/s ±14%   684MB/s ±15%    +8.51%  (p=0.000 n=19+17)
BM_UCordStringSink/12   858MB/s ±12%   903MB/s ± 9%    +5.17%  (p=0.000 n=19+17)
BM_UCordStringSink/13   806MB/s ±22%   879MB/s ± 1%    +8.98%  (p=0.000 n=19+19)
BM_UCordStringSink/14   854MB/s ±13%   901MB/s ± 5%    +5.60%  (p=0.000 n=19+17)
BM_UCordStringSink/15   930MB/s ± 2%   964MB/s ± 3%    +3.59%  (p=0.000 n=16+16)
BM_UCordStringSink/16   363MB/s ±10%   356MB/s ± 6%      ~     (p=0.050 n=20+19)
BM_UCordStringSink/17   976MB/s ±12%  1078MB/s ± 1%   +10.52%  (p=0.000 n=20+17)
BM_UCordStringSink/18   227MB/s ± 1%   355MB/s ± 3%   +56.45%  (p=0.000 n=16+17)
BM_UCordStringSink/19   751MB/s ± 4%   808MB/s ± 4%    +7.70%  (p=0.000 n=18+17)
BM_UCordStringSink/20   761MB/s ± 8%   786MB/s ± 4%    +3.23%  (p=0.000 n=18+17)

adds std:: to stl types (#061)

Re-work fast path for handling copies in zippy decompression.

This is a performance-tuning change that shouldn't change the behavior
of the library.

This adds some complexity but the performance gain might make that
worthwhile: With FDO on perflab/haswell, a 4.0% gain (geometric mean).

SAMPLE (before)

Benchmark         Time(ns)    CPU(ns) Iterations
------------------------------------------------
BM_UFlat/0           36638      36552     100000 2.6GB/s  html
BM_UFlat/1          457153     455895       9173 1.4GB/s  urls
BM_UFlat/2            5850       5837     685481 19.6GB/s  jpg
BM_UFlat/3             122        122   34551988 1.5GB/s  jpg_200
BM_UFlat/4            6797       6781     620811 14.1GB/s  pdf
BM_UFlat/5          179485     179037      23471 2.1GB/s  html4
BM_UFlat/6          142734     142384      29525 1018.7MB/s  txt1
BM_UFlat/7          125233     124924      33709 955.6MB/s  txt2
BM_UFlat/8          382548     381533      10000 1066.7MB/s  txt3
BM_UFlat/9          525614     524297       8018 876.5MB/s  txt4
BM_UFlat/10          34946      34868     100000 3.2GB/s  pb
BM_UFlat/11         149548     149208      28063 1.2GB/s  gaviota
BM_UFlat/12          10684      10663     392580 2.1GB/s  cp
BM_UFlat/13           5494       5484     766584 1.9GB/s  c
BM_UFlat/14           1691       1688    2488784 2.1GB/s  lsp
BM_UFlat/15         676443     674726       6129 1.4GB/s  xls
BM_UFlat/16            156        156   26656909 1.2GB/s  xls_200
BM_UFlat/17         239911     239297      17558 2.0GB/s  bin
BM_UFlat/18            182        182   23072932 1047.9MB/s  bin_200
BM_UFlat/19          21544      21499     194484 1.7GB/s  sum
BM_UFlat/20           2236       2232    1877810 1.8GB/s  man
BM_UFlatSink/0       42266      42179      99732 2.3GB/s  html
BM_UFlatSink/1      461810     460633       9055 1.4GB/s  urls
BM_UFlatSink/2        5816       5804     632829 19.8GB/s  jpg
BM_UFlatSink/3         124        123   34351698 1.5GB/s  jpg_200
BM_UFlatSink/4        7173       7157     609929 13.3GB/s  pdf
BM_UFlatSink/5      184795     184302      22660 2.1GB/s  html4
BM_UFlatSink/6      143552     143223      29272 1012.7MB/s  txt1
BM_UFlatSink/7      127160     126890      33178 940.8MB/s  txt2
BM_UFlatSink/8      382219     381313      10000 1067.3MB/s  txt3
BM_UFlatSink/9      528042     526713       7988 872.5MB/s  txt4
BM_UFlatSink/10      41389      41305     100000 2.7GB/s  pb
BM_UFlatSink/11     147215     146877      28854 1.2GB/s  gaviota
BM_UFlatSink/12      12008      11984     348139 1.9GB/s  cp
BM_UFlatSink/13       5444       5433     775084 1.9GB/s  c
BM_UFlatSink/14       1647       1644    2552119 2.1GB/s  lsp
BM_UFlatSink/15     665011     663424       6320 1.4GB/s  xls
BM_UFlatSink/16        153        153   27571837 1.2GB/s  xls_200
BM_UFlatSink/17     239735     239169      17411 2.0GB/s  bin
BM_UFlatSink/18        183        182   23005573 1046.8MB/s  bin_200
BM_UFlatSink/19      22544      22498     187705 1.6GB/s  sum
BM_UFlatSink/20       2190       2186    1917894 1.8GB/s  man

SAMPLE (after)

Benchmark         Time(ns)    CPU(ns) Iterations
------------------------------------------------
BM_UFlat/0           33940      33889     100000 2.8GB/s  html
BM_UFlat/1          440728     439944       9586 1.5GB/s  urls
BM_UFlat/2            5652       5641     744776 20.3GB/s  jpg
BM_UFlat/3             123        123   34647884 1.5GB/s  jpg_200
BM_UFlat/4            6628       6615     631892 14.4GB/s  pdf
BM_UFlat/5          169523     169227      24197 2.3GB/s  html4
BM_UFlat/6          144139     143892      29232 1008.0MB/s  txt1
BM_UFlat/7          127148     126915      33144 940.6MB/s  txt2
BM_UFlat/8          380267     379233      10000 1073.2MB/s  txt3
BM_UFlat/9          529495     528194       7957 870.0MB/s  txt4
BM_UFlat/10          31844      31784     100000 3.5GB/s  pb
BM_UFlat/11         146822     146476      28737 1.2GB/s  gaviota
BM_UFlat/12          10784      10762     392176 2.1GB/s  cp
BM_UFlat/13           5528       5518     760934 1.9GB/s  c
BM_UFlat/14           1721       1719    2449291 2.0GB/s  lsp
BM_UFlat/15         673304     671774       6255 1.4GB/s  xls
BM_UFlat/16            155        155   27092003 1.2GB/s  xls_200
BM_UFlat/17         230424     229902      18285 2.1GB/s  bin
BM_UFlat/18            185        184   22818199 1033.9MB/s  bin_200
BM_UFlat/19          21035      20996     200765 1.7GB/s  sum
BM_UFlat/20           2242       2238    1864380 1.8GB/s  man
BM_UFlatSink/0       33487      33405     100000 2.9GB/s  html
BM_UFlatSink/1      431108     430226       9764 1.5GB/s  urls
BM_UFlatSink/2        5927       5916     648112 19.4GB/s  jpg
BM_UFlatSink/3         123        122   34704423 1.5GB/s  jpg_200
BM_UFlatSink/4        6472       6461     653462 14.8GB/s  pdf
BM_UFlatSink/5      164309     163988      25567 2.3GB/s  html4
BM_UFlatSink/6      138274     138020      30311 1050.9MB/s  txt1
BM_UFlatSink/7      120844     120637      34708 989.6MB/s  txt2
BM_UFlatSink/8      371046     370366      10000 1098.9MB/s  txt3
BM_UFlatSink/9      510021     508982       8269 902.9MB/s  txt4
BM_UFlatSink/10      30889      30844     100000 3.6GB/s  pb
BM_UFlatSink/11     140752     140521      29903 1.2GB/s  gaviota
BM_UFlatSink/12      10162      10146     413600 2.3GB/s  cp
BM_UFlatSink/13       5264       5256     762398 2.0GB/s  c
BM_UFlatSink/14       1622       1619    2606069 2.1GB/s  lsp
BM_UFlatSink/15     646897     645756       6512 1.5GB/s  xls
BM_UFlatSink/16        150        150   28223595 1.2GB/s  xls_200
BM_UFlatSink/17     226096     225650      18629 2.1GB/s  bin
BM_UFlatSink/18        185        184   22907935 1035.3MB/s  bin_200
BM_UFlatSink/19      21369      21335     198881 1.7GB/s  sum
BM_UFlatSink/20       2139       2136    1953637 1.8GB/s  man

Speed up Zippy decompression in PIE mode by removing the penalty for
global array access.

With PIE, accessing global arrays needs two instructions whereas it can be
done with a single instruction without PIE.  See []
For example, without PIE the access looks like:
mov    0x400780(,%rdi,4),%eax  // One instruction to access arr[i]

and with PIE the access looks like:
lea    0x149(%rip),%rax        # 400780 <_ZL3arr>
mov    (%rax,%rdi,4),%eax

This causes a slow down in zippy as it has two global arrays, wordmask and
char_table.  There is no equivalent PC-relative insn. with PIE to do this in
one instruction.

The slow down can be seen as an increase in dynamic instruction count and
cycles with a similar IPC.  We have seen this affect REDACTED recently and this
is causing a ~1% perf. slow down.

One of the mitigation techniques for small arrays is to move it onto the stack,
use the stack pointer to make the access a single instruction.  The downside to
this is the extra instructions at function call to mov the array onto the stack
which is why we want to do this only for small arrays.  I tried moving
wordmask onto the stack since it is a small array. The performance numbers look
good overall. There is an improvement in the dynamic instruction count for
almost all BM_UFlat benchmarks.  BM_UFlat/2 and BM_UFlat/3 are pretty noisy.
The only case where there is a regression is BM_UFlat/10.  Here, the instruction
count does go down but the IPC also goes down affecting performance. This also
looks noisy but I do see a small IPC drop with this change.  Otherwise, the
numbers look good and consistent.  I measured this on a perflab ivybridge
machine multiple times.  Numbers are given below.  For Improv. (improvements),
positive is good.

Binaries built as: blaze build -c opt --dynamic_mode=off

Benchmark Base CPU(ns) Opt CPU(ns) Improv. Base Cycles Opt Cycles Improv. Base Insns Opt Insns Improv.

BM_UFlat/1 541711 537052 0.86% 46068129918 45442732684 1.36% 85113352848 83917656016 1.40%
BM_UFlat/2 6228 6388 -2.57% 582789808 583267855 -0.08% 1261517746 1261116553 0.03%
BM_UFlat/3 159 120 24.53% 61538641 58783800 4.48% 90008672 90980060 -1.08%
BM_UFlat/4 7878 7787 1.16% 710491888 703718556 0.95% 1914898283 1525060250 20.36%
BM_UFlat/5 208854 207673 0.57% 17640846255 17609530720 0.18% 36546983483 36008920788 1.47%
BM_UFlat/6 172595 167225 3.11% 14642082831 14232371166 2.80% 33647820489 33056659600 1.76%
BM_UFlat/7 152364 147901 2.93% 12904338645 12635220582 2.09% 28958390984 28457982504 1.73%
BM_UFlat/8 463764 448244 3.35% 39423576973 37917435891 3.82% 88350964483 86800265943 1.76%
BM_UFlat/9 639517 621811 2.77% 54275945823 52555988926 3.17% 119503172410 117432599704 1.73%
BM_UFlat/10 41929 42358 -1.02% 3593125535 3647231492 -1.51% 8559206066 8446526639 1.32%
BM_UFlat/11 174754 173936 0.47% 14885371426 14749410955 0.91% 36693421142 35987215897 1.92%
BM_UFlat/12 13388 13257 0.98% 1192648670 1179645044 1.09% 3506482177 3454962579 1.47%
BM_UFlat/13 6801 6588 3.13% 627960003 608367286 3.12% 1847877894 1818368400 1.60%
BM_UFlat/14 2057 1989 3.31% 229005588 217393157 5.07% 609686274 599419511 1.68%
BM_UFlat/15 831618 799881 3.82% 70440388955 67911853013 3.59% 167178603105 164653652416 1.51%
BM_UFlat/16 199 199 0.00% 70109081 68747579 1.94% 106263639 105569531 0.65%
BM_UFlat/17 279031 273890 1.84% 23361373312 23294246637 0.29% 40474834585 39981682217 1.22%
BM_UFlat/18 233 199 14.59% 74530664 67841101 8.98% 94305848 92271053 2.16%
BM_UFlat/19 26743 25309 5.36% 2327215133 2206712016 5.18% 6024314357 5935228694 1.48%
BM_UFlat/20 2731 2625 3.88% 282018757 276772813 1.86% 768382519 758277029 1.32%

Is this a reasonable work-around for the problem?  Do you need more performance
measurements?  haih@ is evaluating this change for [] and I will update those
numbers once we have it.

Tested:
   Performance with zippy_unittest.

Re-work fast path that emits copies in zippy compression.

The primary motivation for the change is that FindMatchLength is
likely to discover a difference in the first 8 bytes it compares.
If that occurs then we know the length of the match is less than 12,
because FindMatchLength is invoked after a 4-byte match is found.
When emitting a copy, it is useful to know that the length is less
than 12 because the two-byte variant of an emitted copy requires that.

This is a performance-tuning change that should not affect the
library's behavior.

With FDO on perflab/Haswell the geometric mean for ZFlat/* went from
47,290ns to 45,741ns, an improvement of 3.4%.

SAMPLE (before)

BM_ZFlat/0      102824     102650      40691 951.4MB/s  html (22.31 %)
BM_ZFlat/1     1293512    1290442       3225 518.9MB/s  urls (47.78 %)
BM_ZFlat/2       10373      10353     417959 11.1GB/s  jpg (99.95 %)
BM_ZFlat/3         268        268   15745324 712.4MB/s  jpg_200 (73.00 %)
BM_ZFlat/4       12137      12113     342462 7.9GB/s  pdf (83.30 %)
BM_ZFlat/5      430672     429720       9724 909.0MB/s  html4 (22.52 %)
BM_ZFlat/6      420541     419636       9833 345.6MB/s  txt1 (57.88 %)
BM_ZFlat/7      373829     373158      10000 319.9MB/s  txt2 (61.91 %)
BM_ZFlat/8     1119014    1116604       3755 364.5MB/s  txt3 (54.99 %)
BM_ZFlat/9     1544203    1540657       2748 298.3MB/s  txt4 (66.26 %)
BM_ZFlat/10      91041      90866      46002 1.2GB/s  pb (19.68 %)
BM_ZFlat/11     332766     331990      10000 529.5MB/s  gaviota (37.72 %)
BM_ZFlat/12      39960      39886     100000 588.3MB/s  cp (48.12 %)
BM_ZFlat/13      14493      14465     287181 735.1MB/s  c (42.47 %)
BM_ZFlat/14       4447       4440     947927 799.3MB/s  lsp (48.37 %)
BM_ZFlat/15    1316362    1313350       3196 747.7MB/s  xls (41.23 %)
BM_ZFlat/16        312        311   10000000 613.0MB/s  xls_200 (78.00 %)
BM_ZFlat/17     388471     387502      10000 1.2GB/s  bin (18.11 %)
BM_ZFlat/18         65         64   64838208 2.9GB/s  bin_200 (7.50 %)
BM_ZFlat/19      65900      65787      63099 554.3MB/s  sum (48.96 %)
BM_ZFlat/20       6188       6177     681951 652.6MB/s  man (59.21 %)

SAMPLE (after)

Benchmark     Time(ns)    CPU(ns) Iterations
--------------------------------------------
BM_ZFlat/0       99259      99044      42428 986.0MB/s  html (22.31 %)
BM_ZFlat/1     1257039    1255276       3341 533.4MB/s  urls (47.78 %)
BM_ZFlat/2       10044      10030     405781 11.4GB/s  jpg (99.95 %)
BM_ZFlat/3         268        267   15732282 713.3MB/s  jpg_200 (73.00 %)
BM_ZFlat/4       11675      11657     358629 8.2GB/s  pdf (83.30 %)
BM_ZFlat/5      420951     419818       9739 930.5MB/s  html4 (22.52 %)
BM_ZFlat/6      415460     414632      10000 349.8MB/s  txt1 (57.88 %)
BM_ZFlat/7      367191     366436      10000 325.8MB/s  txt2 (61.91 %)
BM_ZFlat/8     1098345    1096036       3819 371.3MB/s  txt3 (54.99 %)
BM_ZFlat/9     1508701    1505306       2758 305.3MB/s  txt4 (66.26 %)
BM_ZFlat/10      87195      87031      47289 1.3GB/s  pb (19.68 %)
BM_ZFlat/11     322338     321637      10000 546.5MB/s  gaviota (37.72 %)
BM_ZFlat/12      36739      36668     100000 639.9MB/s  cp (48.12 %)
BM_ZFlat/13      13646      13618     304009 780.9MB/s  c (42.47 %)
BM_ZFlat/14       4249       4240     992456 837.0MB/s  lsp (48.37 %)
BM_ZFlat/15    1262925    1260012       3314 779.4MB/s  xls (41.23 %)
BM_ZFlat/16        308        308   10000000 619.8MB/s  xls_200 (78.00 %)
BM_ZFlat/17     379750     378944      10000 1.3GB/s  bin (18.11 %)
BM_ZFlat/18         62         62   67443280 3.0GB/s  bin_200 (7.50 %)
BM_ZFlat/19      61706      61587      67645 592.1MB/s  sum (48.96 %)
BM_ZFlat/20       5968       5958     698974 676.6MB/s  man (59.21 %)

Speed up the EmitLiteral fast path, +1.62% for ZFlat benchmarks.

This is inspired by the Go version in
//third_party/golang/snappy/encode_amd64.s (emitLiteralFastPath)

        Benchmark         Base:Reference   (1)
--------------------------------------------------
(BM_ZFlat_0 1/cputime_ns)        9.669e-06  +1.65%
(BM_ZFlat_1 1/cputime_ns)        7.643e-07  +2.53%
(BM_ZFlat_10 1/cputime_ns)       1.107e-05  -0.97%
(BM_ZFlat_11 1/cputime_ns)       3.002e-06  +0.71%
(BM_ZFlat_12 1/cputime_ns)       2.338e-05  +7.22%
(BM_ZFlat_13 1/cputime_ns)       6.386e-05  +9.18%
(BM_ZFlat_14 1/cputime_ns)       0.0002256  -0.05%
(BM_ZFlat_15 1/cputime_ns)       7.608e-07  -1.29%
(BM_ZFlat_16 1/cputime_ns)        0.003236  -1.28%
(BM_ZFlat_17 1/cputime_ns)        2.58e-06  +0.52%
(BM_ZFlat_18 1/cputime_ns)         0.01538  +0.00%
(BM_ZFlat_19 1/cputime_ns)       1.436e-05  +6.21%
(BM_ZFlat_2 1/cputime_ns)        0.0001044  +4.99%
(BM_ZFlat_20 1/cputime_ns)       0.0001608  -0.18%
(BM_ZFlat_3 1/cputime_ns)         0.003745  +0.38%
(BM_ZFlat_4 1/cputime_ns)        8.144e-05  +6.21%
(BM_ZFlat_5 1/cputime_ns)        2.328e-06  -1.60%
(BM_ZFlat_6 1/cputime_ns)        2.391e-06  +0.06%
(BM_ZFlat_7 1/cputime_ns)         2.68e-06  -0.61%
(BM_ZFlat_8 1/cputime_ns)        8.852e-07  +0.19%
(BM_ZFlat_9 1/cputime_ns)        6.441e-07  +1.06%

geometric mean                              +1.62%

Speed up zippy decompression by removing some zero-extensions.

This is a performance tuning change that should not affect
correctness.  On perflab with FDO on Haswell the performance gain is
21,776ns before vs 21,255ns after, about 2.4%.  (Using geometric means.)

SAMPLE PERFORMANCE with FDO on HASWELL (NEW)

Benchmark         Time(ns)    CPU(ns) Iterations
------------------------------------------------
BM_UFlat/0           37366      37279     100000 2.6GB/s  html
BM_UFlat/1          471153     470204       8975 1.4GB/s  urls
BM_UFlat/2            6116       6105     639496 18.8GB/s  jpg
BM_UFlat/3             123        123   34709908 1.5GB/s  jpg_200
BM_UFlat/4            6724       6714     623318 14.2GB/s  pdf
BM_UFlat/5          183122     182722      23138 2.1GB/s  html4
BM_UFlat/6          144981     144689      29384 1002.5MB/s  txt1
BM_UFlat/7          125939     125691      33423 949.8MB/s  txt2
BM_UFlat/8          383101     382241      10000 1064.7MB/s  txt3
BM_UFlat/9          527824     526606       7958 872.6MB/s  txt4
BM_UFlat/10          34849      34790     100000 3.2GB/s  pb
BM_UFlat/11         150213     149937      28131 1.1GB/s  gaviota
BM_UFlat/12          10850      10830     393231 2.1GB/s  cp
BM_UFlat/13           5532       5523     735739 1.9GB/s  c
BM_UFlat/14           1698       1695    2478035 2.0GB/s  lsp
BM_UFlat/15         678396     676917       6200 1.4GB/s  xls
BM_UFlat/16            155        155   26909789 1.2GB/s  xls_200
BM_UFlat/17         241235     240698      17416 2.0GB/s  bin
BM_UFlat/18            183        183   23000841 1043.5MB/s  bin_200
BM_UFlat/19          21461      21424     193275 1.7GB/s  sum
BM_UFlat/20           2232       2228    1887191 1.8GB/s  man
BM_UFlatSink/0       42272      42199      98528 2.3GB/s  html
BM_UFlatSink/1      460814     459898       9092 1.4GB/s  urls
BM_UFlatSink/2        5558       5547     768629 20.7GB/s  jpg
BM_UFlatSink/3         124        123   33629141 1.5GB/s  jpg_200
BM_UFlatSink/4        6634       6621     629989 14.4GB/s  pdf
BM_UFlatSink/5      182883     182491      23030 2.1GB/s  html4
BM_UFlatSink/6      143269     142964      29410 1014.5MB/s  txt1
BM_UFlatSink/7      127041     126809      33136 941.4MB/s  txt2
BM_UFlatSink/8      384367     383577      10000 1061.0MB/s  txt3
BM_UFlatSink/9      529979     528890       7898 868.9MB/s  txt4
BM_UFlatSink/10      41154      41075     100000 2.7GB/s  pb
BM_UFlatSink/11     146446     146155      28742 1.2GB/s  gaviota
BM_UFlatSink/12      11939      11918     352663 1.9GB/s  cp
BM_UFlatSink/13       5430       5421     770451 1.9GB/s  c
BM_UFlatSink/14       1665       1662    2538921 2.1GB/s  lsp
BM_UFlatSink/15     666840     665617       6309 1.4GB/s  xls
BM_UFlatSink/16        152        152   27639460 1.2GB/s  xls_200
BM_UFlatSink/17     240076     239573      17643 2.0GB/s  bin
BM_UFlatSink/18        183        182   23128210 1046.0MB/s  bin_200
BM_UFlatSink/19      22570      22528     185839 1.6GB/s  sum
BM_UFlatSink/20       2183       2180    1899526 1.8GB/s  man

SAMPLE PERFORMANCE with FDO on HASWELL (OLD)

Benchmark         Time(ns)    CPU(ns) Iterations
------------------------------------------------
BM_UFlat/0           37041      36990     100000 2.6GB/s  html
BM_UFlat/1          471384     470574       8930 1.4GB/s  urls
BM_UFlat/2            5997       5986     722354 19.2GB/s  jpg
BM_UFlat/3             124        123   34964717 1.5GB/s  jpg_200
BM_UFlat/4            6850       6838     621414 13.9GB/s  pdf
BM_UFlat/5          182578     182271      23001 2.1GB/s  html4
BM_UFlat/6          148338     147989      28132 980.1MB/s  txt1
BM_UFlat/7          130682     130471      32347 915.0MB/s  txt2
BM_UFlat/8          397420     396553      10000 1026.3MB/s  txt3
BM_UFlat/9          550126     548872       7736 837.2MB/s  txt4
BM_UFlat/10          35013      34958     100000 3.2GB/s  pb
BM_UFlat/11         152270     151889      27508 1.1GB/s  gaviota
BM_UFlat/12          11117      11096     379059 2.1GB/s  cp
BM_UFlat/13           5812       5801     725240 1.8GB/s  c
BM_UFlat/14           1780       1777    2383982 2.0GB/s  lsp
BM_UFlat/15         707871     706139       5946 1.4GB/s  xls
BM_UFlat/16            157        157   26889747 1.2GB/s  xls_200
BM_UFlat/17         239160     238556      17512 2.0GB/s  bin
BM_UFlat/18            181        180   23326040 1057.5MB/s  bin_200
BM_UFlat/19          22706      22656     186285 1.6GB/s  sum
BM_UFlat/20           2319       2315    1813186 1.7GB/s  man
BM_UFlatSink/0       42657      42574      99000 2.2GB/s  html
BM_UFlatSink/1      466316     465262       9036 1.4GB/s  urls
BM_UFlatSink/2        6873       6859     648525 16.7GB/s  jpg
BM_UFlatSink/3         124        124   34434643 1.5GB/s  jpg_200
BM_UFlatSink/4        6804       6790     624282 14.0GB/s  pdf
BM_UFlatSink/5      185468     185062      22746 2.1GB/s  html4
BM_UFlatSink/6      148511     148209      28284 978.6MB/s  txt1
BM_UFlatSink/7      130865     130607      32144 914.0MB/s  txt2
BM_UFlatSink/8      393931     392983      10000 1035.6MB/s  txt3
BM_UFlatSink/9      545548     544275       7740 844.3MB/s  txt4
BM_UFlatSink/10      41659      41584     100000 2.7GB/s  pb
BM_UFlatSink/11     152062     151721      27854 1.1GB/s  gaviota
BM_UFlatSink/12      11987      11968     350909 1.9GB/s  cp
BM_UFlatSink/13       5652       5641     743280 1.8GB/s  c
BM_UFlatSink/14       1728       1725    2446140 2.0GB/s  lsp
BM_UFlatSink/15     687879     686231       6138 1.4GB/s  xls
BM_UFlatSink/16        155        155   27254484 1.2GB/s  xls_200
BM_UFlatSink/17     240689     240083      17450 2.0GB/s  bin
BM_UFlatSink/18        183        182   22932858 1046.8MB/s  bin_200
BM_UFlatSink/19      22718      22674     185207 1.6GB/s  sum
BM_UFlatSink/20       2272       2268    1851664 1.7GB/s  man

Avoid calling memset when resizing the buffer.

This buffer will be initialized and then trimmed down to size during the
compression phase.

Merge pull request #6 from deviance/provide-pkg-config-data

Provide pkg-config data

Add #ifdef to guard against macro redefinition if this is included in another
Google project that also defines this.

Merge pull request #13 from huachaohuang/patch-1

Allow to compile in nested packages.

Make heuristic match skipping more aggressive.

This causes compression to be much faster on incompressible inputs
(such as the jpeg and pdf tests), and is neutral or even positive on the other
tests. The test set shows only microscopic density regressions; I attempted to
construct a worst-case test set containing ~1500 different cases of mixed
plaintext + /dev/urandom, and even those seemed to be only 0.38 percentage
points less dense on average (the single worst case was 87.8% -> 89.0%), which
we can live with given that this is already an edge case.

The original idea is by Klaus Post; I only tweaked the implementation.
Ironically, the new implementation is almost more in line with the
comment that was there, so I've left that largely alone, albeit
with a small modification.

Microbenchmark results (opt mode, 64-bit, static linking):

Ivy Bridge:

Benchmark                 Base (ns)  New (ns)                                Improvement
----------------------------------------------------------------------------------------
BM_ZFlat/0                   120284    115480  847.0MB/s  html (22.31 %)        +4.2%
BM_ZFlat/1                  1527911   1522242  440.7MB/s  urls (47.78 %)        +0.4%
BM_ZFlat/2                    17591     10582  10.9GB/s  jpg (99.95 %)         +66.2%
BM_ZFlat/3                      323       322  593.3MB/s  jpg_200 (73.00 %)     +0.3%
BM_ZFlat/4                    53691     14063  6.8GB/s  pdf (83.30 %)         +281.8%
BM_ZFlat/5                   495442    492347  794.8MB/s  html4 (22.52 %)       +0.6%
BM_ZFlat/6                   473523    473622  306.7MB/s  txt1 (57.88 %)        -0.0%
BM_ZFlat/7                   421406    420120  284.5MB/s  txt2 (61.91 %)        +0.3%
BM_ZFlat/8                  1265632   1270538  320.8MB/s  txt3 (54.99 %)        -0.4%
BM_ZFlat/9                  1742688   1737894  264.8MB/s  txt4 (66.26 %)        +0.3%
BM_ZFlat/10                  107950    103404  1095.1MB/s  pb (19.68 %)         +4.4%
BM_ZFlat/11                  372660    371818  473.5MB/s  gaviota (37.72 %)     +0.2%
BM_ZFlat/12                   53239     49528  474.4MB/s  cp (48.12 %)          +7.5%
BM_ZFlat/13                   18940     17349  613.9MB/s  c (42.47 %)           +9.2%
BM_ZFlat/14                    5155      5075  700.3MB/s  lsp (48.37 %)         +1.6%
BM_ZFlat/15                 1474757   1474471  667.2MB/s  xls (41.23 %)         +0.0%
BM_ZFlat/16                     363       362  528.0MB/s  xls_200 (78.00 %)     +0.3%
BM_ZFlat/17                  453849    456931  1073.2MB/s  bin (18.11 %)        -0.7%
BM_ZFlat/18                      90        87  2.1GB/s  bin_200 (7.50 %)        +3.4%
BM_ZFlat/19                   82163     80498  453.7MB/s  sum (48.96 %)         +2.1%
BM_ZFlat/20                    7174      7124  566.7MB/s  man (59.21 %)         +0.7%
Sum of all benchmarks       8694831   8623857                                   +0.8%

Sandy Bridge:

Benchmark                 Base (ns)  New (ns)                                Improvement
----------------------------------------------------------------------------------------
BM_ZFlat/0                   117426    112649  868.2MB/s  html (22.31 %)        +4.2%
BM_ZFlat/1                  1517095   1498522  447.5MB/s  urls (47.78 %)        +1.2%
BM_ZFlat/2                    18601     10649  10.8GB/s  jpg (99.95 %)         +74.7%
BM_ZFlat/3                      359       356  536.0MB/s  jpg_200 (73.00 %)     +0.8%
BM_ZFlat/4                    60249     13832  6.9GB/s  pdf (83.30 %)         +335.6%
BM_ZFlat/5                   481246    475571  822.7MB/s  html4 (22.52 %)       +1.2%
BM_ZFlat/6                   460541    455693  318.8MB/s  txt1 (57.88 %)        +1.1%
BM_ZFlat/7                   407751    404147  295.8MB/s  txt2 (61.91 %)        +0.9%
BM_ZFlat/8                  1228255   1222519  333.4MB/s  txt3 (54.99 %)        +0.5%
BM_ZFlat/9                  1678299   1666379  276.2MB/s  txt4 (66.26 %)        +0.7%
BM_ZFlat/10                  106499    101715  1113.4MB/s  pb (19.68 %)         +4.7%
BM_ZFlat/11                  361913    360222  488.7MB/s  gaviota (37.72 %)     +0.5%
BM_ZFlat/12                   53137     49618  473.6MB/s  cp (48.12 %)          +7.1%
BM_ZFlat/13                   18801     17812  597.8MB/s  c (42.47 %)           +5.6%
BM_ZFlat/14                    5394      5383  660.2MB/s  lsp (48.37 %)         +0.2%
BM_ZFlat/15                 1435411   1432870  686.4MB/s  xls (41.23 %)         +0.2%
BM_ZFlat/16                     389       395  483.3MB/s  xls_200 (78.00 %)     -1.5%
BM_ZFlat/17                  447255    445510  1100.4MB/s  bin (18.11 %)        +0.4%
BM_ZFlat/18                      86        86  2.2GB/s  bin_200 (7.50 %)        +0.0%
BM_ZFlat/19                   82555     79512  459.3MB/s  sum (48.96 %)         +3.8%
BM_ZFlat/20                    7527      7553  534.5MB/s  man (59.21 %)         -0.3%
Sum of all benchmarks       8488789   8360993                                   +1.5%

Haswell:

Benchmark                 Base (ns)  New (ns)                                Improvement
----------------------------------------------------------------------------------------
BM_ZFlat/0                   107512    105621  925.6MB/s  html (22.31 %)        +1.8%
BM_ZFlat/1                  1344306   1332479  503.1MB/s  urls (47.78 %)        +0.9%
BM_ZFlat/2                    14752      9471  12.1GB/s  jpg (99.95 %)         +55.8%
BM_ZFlat/3                      287       275  694.0MB/s  jpg_200 (73.00 %)     +4.4%
BM_ZFlat/4                    48810     12263  7.8GB/s  pdf (83.30 %)         +298.0%
BM_ZFlat/5                   443013    442064  884.6MB/s  html4 (22.52 %)       +0.2%
BM_ZFlat/6                   429239    432124  336.0MB/s  txt1 (57.88 %)        -0.7%
BM_ZFlat/7                   381765    383681  311.5MB/s  txt2 (61.91 %)        -0.5%
BM_ZFlat/8                  1136667   1154304  353.0MB/s  txt3 (54.99 %)        -1.5%
BM_ZFlat/9                  1579925   1592431  288.9MB/s  txt4 (66.26 %)        -0.8%
BM_ZFlat/10                   98345     92411  1.2GB/s  pb (19.68 %)            +6.4%
BM_ZFlat/11                  340397    340466  516.8MB/s  gaviota (37.72 %)     -0.0%
BM_ZFlat/12                   47076     43536  539.5MB/s  cp (48.12 %)          +8.1%
BM_ZFlat/13                   16680     15637  680.8MB/s  c (42.47 %)           +6.7%
BM_ZFlat/14                    4616      4539  782.6MB/s  lsp (48.37 %)         +1.7%
BM_ZFlat/15                 1331231   1334094  736.9MB/s  xls (41.23 %)         -0.2%
BM_ZFlat/16                     326       322  593.5MB/s  xls_200 (78.00 %)     +1.2%
BM_ZFlat/17                  404383    400326  1.2GB/s  bin (18.11 %)           +1.0%
BM_ZFlat/18                      69        69  2.7GB/s  bin_200 (7.50 %)        +0.0%
BM_ZFlat/19                   74771     71348  511.7MB/s  sum (48.96 %)         +4.8%
BM_ZFlat/20                    6461      6383  632.2MB/s  man (59.21 %)         +1.2%
Sum of all benchmarks       7810631   7773844                                   +0.5%

I've done a quick test that there are no performance regressions on external
GCC (4.9.2, Debian, Haswell, 64-bit), too.

Default to glibtoolize instead of libtoolize if it exists,
and also make it customizable through the environment variable
$LIBTOOLIZE.

Fixes autogen.sh issues on OS X, which ships its own
(incompatible) libtoolize.

R=jeff

Work around an issue where some compilers interpret <:: as a trigraph.
Also correct the namespace name.

Unbreak the open-source build for ARM due to missing ATTRIBUTE_PACKED
declaration.

Fix an issue where the ByteSource path (used for parsing std::string)
would incorrectly accept some invalid varints that the other path would not,
causing potential CHECK-failures if the unit test were run with
--write_uncompressed and a corrupted input file.

Found by the afl fuzzer.

Make UNALIGNED_LOAD16/32 on ARMv7 go through an explicitly unaligned struct,
to avoid the compiler coalescing multiple loads into a single load instruction
(which only work for aligned accesses).

A typical example where GCC would coalesce:

  uint8* p = ...;
  uint32 a = UNALIGNED_LOAD32(p);
  uint32 b = UNALIGNED_LOAD32(p + 4);
  uint32 c = a | b;

Allow to compile in nested packages.

Update URLs in the Snappy README to reflect the move to GitHub.

A=sesse
R=sanjay

Move the logic from ComputeTable into the unit test, which means it's run
automatically together with the other tests, and also removes the stray
function ComputeTable() (which was never referenced by anything else
in the open-source version, causing compiler warnings for some)
out of the core library.

Fixes public issue 96.

A=sesse
R=sanjay

Fix signed-vs.-unsigned comparison warnings.

These were found by compiling Chromium's external copy of this code with MSVC
with warning C4018 enabled.

A=pkasting
R=sanjay

Provide pkg-config data

Release Snappy 1.1.3; getting the new Uncompress variant in a release is nice,
and it's also good to finally get an official release out after the migration
to GitHub.

The GitHub releases are basically done by tagging a commit and then uploading
the .tar.gz file generated by make dist as a binary asset; GitHub will add
all files on the tagged commit on top of the tarball and recompress, but since
we don't have any nodist_* files in configure.ac, this works fine for us.
(As far as I can see, this behavior of GitHub--uncompressing the .tar.gz,
and the behavior of silently ignoring files in it that are also in the git
repository--is undocumented, but also seems to be used in some official
screenshots, so I guess we can rely on it.)

A=sesse
R=jeff

Initialized members of SnappyArrayWriter and SnappyDecompressionValidator.
These members were almost surely initialized before use by other member
functions, but Coverity was warning about this. Eliminating these warnings
minimizes clutter in that report and the likelihood of overlooking a real bug.

A=cmumford
R=jeff

Add support for Uncompress(source, sink). Various changes to allow
Uncompress(source, sink) to get the same performance as the different
variants of Uncompress to Cord/DataBuffer/String/FlatBuffer.

Changes to efficiently support Uncompress(source, sink)
--------

a) For strings - we add support to StringByteSink to do GetAppendBuffer so we
   can write to it without copying.
b) For flat array buffers, we do GetAppendBuffer and see if we can get a full buffer.

With the above changes we get performance with ByteSource/ByteSink
that is very close to directly using flat arrays and strings.

We add various benchmark cases to demonstrate that.

Orthogonal change
------------------

Add support for TryFastAppend() for SnappyScatteredWriter.

Benchmark results are below

CPU: Intel Core2 dL1:32KB dL2:4096KB
Benchmark              Time(ns)    CPU(ns) Iterations
-----------------------------------------------------
BM_UFlat/0               109065     108996       6410 896.0MB/s  html
BM_UFlat/1              1012175    1012343        691 661.4MB/s  urls
BM_UFlat/2                26775      26771      26149 4.4GB/s  jpg
BM_UFlat/3                48947      48940      14363 1.8GB/s  pdf
BM_UFlat/4               441029     440835       1589 886.1MB/s  html4
BM_UFlat/5                39861      39880      17823 588.3MB/s  cp
BM_UFlat/6                18315      18300      38126 581.1MB/s  c
BM_UFlat/7                 5254       5254     100000 675.4MB/s  lsp
BM_UFlat/8              1568060    1567376        447 626.6MB/s  xls
BM_UFlat/9               337512     337734       2073 429.5MB/s  txt1
BM_UFlat/10              287269     287054       2434 415.9MB/s  txt2
BM_UFlat/11              890098     890219        787 457.2MB/s  txt3
BM_UFlat/12             1186593    1186863        590 387.2MB/s  txt4
BM_UFlat/13              573927     573318       1000 853.7MB/s  bin
BM_UFlat/14               64250      64294      10000 567.2MB/s  sum
BM_UFlat/15                7301       7300      96153 552.2MB/s  man
BM_UFlat/16              109617     109636       6375 1031.5MB/s  pb
BM_UFlat/17              364438     364497       1921 482.3MB/s  gaviota
BM_UFlatSink/0           108518     108465       6450 900.4MB/s  html
BM_UFlatSink/1           991952     991997        705 675.0MB/s  urls
BM_UFlatSink/2            26815      26798      26065 4.4GB/s  jpg
BM_UFlatSink/3            49127      49122      14255 1.8GB/s  pdf
BM_UFlatSink/4           436674     436731       1604 894.4MB/s  html4
BM_UFlatSink/5            39738      39733      17345 590.5MB/s  cp
BM_UFlatSink/6            18413      18416      37962 577.4MB/s  c
BM_UFlatSink/7             5677       5676     100000 625.2MB/s  lsp
BM_UFlatSink/8          1552175    1551026        451 633.2MB/s  xls
BM_UFlatSink/9           338526     338489       2065 428.5MB/s  txt1
BM_UFlatSink/10          289387     289307       2420 412.6MB/s  txt2
BM_UFlatSink/11          893803     893706        783 455.4MB/s  txt3
BM_UFlatSink/12         1195919    1195459        586 384.4MB/s  txt4
BM_UFlatSink/13          559637     559779       1000 874.3MB/s  bin
BM_UFlatSink/14           65073      65094      10000 560.2MB/s  sum
BM_UFlatSink/15            7618       7614      92823 529.5MB/s  man
BM_UFlatSink/16          110085     110121       6352 1027.0MB/s  pb
BM_UFlatSink/17          369196     368915       1896 476.5MB/s  gaviota
BM_UValidate/0            46954      46957      14899 2.0GB/s  html
BM_UValidate/1           500621     500868       1000 1.3GB/s  urls
BM_UValidate/2              283        283    2481447 417.2GB/s  jpg
BM_UValidate/3            16230      16228      43137 5.4GB/s  pdf
BM_UValidate/4           189129     189193       3701 2.0GB/s  html4

A=uday
R=sanjay

Changes to eliminate compiler warnings on MSVC

This code was not compiling under Visual Studio 2013 with warnings being treated
as errors. Specifically:

1. Changed int -> size_t to eliminate signed/unsigned mismatch warning.
2. Added some missing return values to functions.
3. Inserting character instead of integer literals into strings to avoid type
conversions.

A=cmumford
R=jeff

Fixed unit tests to compile under MSVC.

1. Including config.h in test.
2. Including windows.h before zippy-test.h.
3. Removed definition of WIN32_LEAN_AND_MEAN. This caused problems in
   build environments that define WIN32_LEAN_AND_MEAN as our
   definition didn't check for prior existence. This constant is old
   and no longer needed anyhow.
4. Disable MSVC warning 4722 since ~LogMessageCrash() never returns.

A=cmumford
R=jeff

Change a few branch annotations that profiling found to be wrong.
Overall performance is neutral or slightly positive.

Westmere (64-bit, opt):

Benchmark               Base (ns)  New (ns)                                Improvement
--------------------------------------------------------------------------------------
BM_UFlat/0                  73798     71464  1.3GB/s  html                    +3.3%
BM_UFlat/1                 715223    704318  953.5MB/s  urls                  +1.5%
BM_UFlat/2                   8137      8871  13.0GB/s  jpg                    -8.3%
BM_UFlat/3                    200       204  935.5MB/s  jpg_200               -2.0%
BM_UFlat/4                  21627     21281  4.5GB/s  pdf                     +1.6%
BM_UFlat/5                 302806    290350  1.3GB/s  html4                   +4.3%
BM_UFlat/6                 218920    219017  664.1MB/s  txt1                  -0.0%
BM_UFlat/7                 190437    191212  626.1MB/s  txt2                  -0.4%
BM_UFlat/8                 584192    580484  703.4MB/s  txt3                  +0.6%
BM_UFlat/9                 776537    779055  591.6MB/s  txt4                  -0.3%
BM_UFlat/10                 76056     72606  1.5GB/s  pb                      +4.8%
BM_UFlat/11                235962    239043  737.4MB/s  gaviota               -1.3%
BM_UFlat/12                 28049     28000  840.1MB/s  cp                    +0.2%
BM_UFlat/13                 12225     12021  886.9MB/s  c                     +1.7%
BM_UFlat/14                  3362      3544  1004.0MB/s  lsp                  -5.1%
BM_UFlat/15                937015    939206  1048.9MB/s  xls                  -0.2%
BM_UFlat/16                   236       233  823.1MB/s  xls_200               +1.3%
BM_UFlat/17                373170    361947  1.3GB/s  bin                     +3.1%
BM_UFlat/18                   264       264  725.5MB/s  bin_200               +0.0%
BM_UFlat/19                 42834     43577  839.2MB/s  sum                   -1.7%
BM_UFlat/20                  4770      4736  853.6MB/s  man                   +0.7%
BM_UValidate/0              39671     39944  2.4GB/s  html                    -0.7%
BM_UValidate/1             443391    443391  1.5GB/s  urls                    +0.0%
BM_UValidate/2                163       163  703.3GB/s  jpg                   +0.0%
BM_UValidate/3                113       112  1.7GB/s  jpg_200                 +0.9%
BM_UValidate/4               7555      7608  12.6GB/s  pdf                    -0.7%
BM_ZFlat/0                 157616    157568  621.5MB/s  html (22.31 %)        +0.0%
BM_ZFlat/1                1997290   2014486  333.4MB/s  urls (47.77 %)        -0.9%
BM_ZFlat/2                  23035     22237  5.2GB/s  jpg (99.95 %)           +3.6%
BM_ZFlat/3                    539       540  354.5MB/s  jpg_200 (73.00 %)     -0.2%
BM_ZFlat/4                  80709     81369  1.2GB/s  pdf (81.85 %)           -0.8%
BM_ZFlat/5                 639059    639220  613.0MB/s  html4 (22.51 %)       -0.0%
BM_ZFlat/6                 577203    583370  249.3MB/s  txt1 (57.87 %)        -1.1%
BM_ZFlat/7                 510887    516094  232.0MB/s  txt2 (61.93 %)        -1.0%
BM_ZFlat/8                1535843   1556973  262.2MB/s  txt3 (54.92 %)        -1.4%
BM_ZFlat/9                2070068   2102380  219.3MB/s  txt4 (66.22 %)        -1.5%
BM_ZFlat/10                152396    152148  745.5MB/s  pb (19.64 %)          +0.2%
BM_ZFlat/11                447367    445859  395.4MB/s  gaviota (37.72 %)     +0.3%
BM_ZFlat/12                 76375     76797  306.3MB/s  cp (48.12 %)          -0.5%
BM_ZFlat/13                 31518     31987  333.3MB/s  c (42.40 %)           -1.5%
BM_ZFlat/14                 10598     10827  328.6MB/s  lsp (48.37 %)         -2.1%
BM_ZFlat/15               1782243   1802728  546.5MB/s  xls (41.23 %)         -1.1%
BM_ZFlat/16                   526       539  355.0MB/s  xls_200 (78.00 %)     -2.4%
BM_ZFlat/17                598141    597311  822.1MB/s  bin (18.11 %)         +0.1%
BM_ZFlat/18                   121       120  1.6GB/s  bin_200 (7.50 %)        +0.8%
BM_ZFlat/19                109981    112173  326.0MB/s  sum (48.96 %)         -2.0%
BM_ZFlat/20                 14355     14575  277.4MB/s  man (59.36 %)         -1.5%
Sum of all benchmarks    33882722  33879325                                   +0.0%

Sandy Bridge (64-bit, opt):

Benchmark               Base (ns)  New (ns)                                Improvement
--------------------------------------------------------------------------------------
BM_UFlat/0                  43764     41600  2.3GB/s  html                    +5.2%
BM_UFlat/1                 517990    507058  1.3GB/s  urls                    +2.2%
BM_UFlat/2                   6625      5529  20.8GB/s  jpg                   +19.8%
BM_UFlat/3                    154       155  1.2GB/s  jpg_200                 -0.6%
BM_UFlat/4                  12795     11747  8.1GB/s  pdf                     +8.9%
BM_UFlat/5                 200335    193413  2.0GB/s  html4                   +3.6%
BM_UFlat/6                 156574    156426  929.2MB/s  txt1                  +0.1%
BM_UFlat/7                 137574    137464  870.4MB/s  txt2                  +0.1%
BM_UFlat/8                 422551    421603  967.4MB/s  txt3                  +0.2%
BM_UFlat/9                 577749    578985  795.6MB/s  txt4                  -0.2%
BM_UFlat/10                 42329     39362  2.8GB/s  pb                      +7.5%
BM_UFlat/11                170615    169751  1037.9MB/s  gaviota              +0.5%
BM_UFlat/12                 12800     12719  1.8GB/s  cp                      +0.6%
BM_UFlat/13                  6585      6579  1.6GB/s  c                       +0.1%
BM_UFlat/14                  2066      2044  1.7GB/s  lsp                     +1.1%
BM_UFlat/15                750861    746911  1.3GB/s  xls                     +0.5%
BM_UFlat/16                   188       192  996.0MB/s  xls_200               -2.1%
BM_UFlat/17                271622    264333  1.8GB/s  bin                     +2.8%
BM_UFlat/18                   208       207  923.6MB/s  bin_200               +0.5%
BM_UFlat/19                 24667     24845  1.4GB/s  sum                     -0.7%
BM_UFlat/20                  2663      2662  1.5GB/s  man                     +0.0%
BM_ZFlat/0                 115173    115624  846.5MB/s  html (22.31 %)        -0.4%
BM_ZFlat/1                1530331   1537769  436.5MB/s  urls (47.77 %)        -0.5%
BM_ZFlat/2                  17503     17013  6.8GB/s  jpg (99.95 %)           +2.9%
BM_ZFlat/3                    385       385  496.3MB/s  jpg_200 (73.00 %)     +0.0%
BM_ZFlat/4                  61753     61540  1.6GB/s  pdf (81.85 %)           +0.3%
BM_ZFlat/5                 484806    483356  810.1MB/s  html4 (22.51 %)       +0.3%
BM_ZFlat/6                 464143    467609  310.9MB/s  txt1 (57.87 %)        -0.7%
BM_ZFlat/7                 410315    413319  289.5MB/s  txt2 (61.93 %)        -0.7%
BM_ZFlat/8                1244082   1249381  326.5MB/s  txt3 (54.92 %)        -0.4%
BM_ZFlat/9                1696914   1709685  269.4MB/s  txt4 (66.22 %)        -0.7%
BM_ZFlat/10                104148    103372  1096.7MB/s  pb (19.64 %)         +0.8%
BM_ZFlat/11                363522    359722  489.8MB/s  gaviota (37.72 %)     +1.1%
BM_ZFlat/12                 47021     50095  469.3MB/s  cp (48.12 %)          -6.1%
BM_ZFlat/13                 16888     16985  627.4MB/s  c (42.40 %)           -0.6%
BM_ZFlat/14                  5496      5469  650.3MB/s  lsp (48.37 %)         +0.5%
BM_ZFlat/15               1460713   1448760  679.5MB/s  xls (41.23 %)         +0.8%
BM_ZFlat/16                   387       393  486.8MB/s  xls_200 (78.00 %)     -1.5%
BM_ZFlat/17                457654    451462  1086.6MB/s  bin (18.11 %)        +1.4%
BM_ZFlat/18                    97        87  2.1GB/s  bin_200 (7.50 %)       +11.5%
BM_ZFlat/19                 77904     80924  451.7MB/s  sum (48.96 %)         -3.7%
BM_ZFlat/20                  7648      7663  527.1MB/s  man (59.36 %)         -0.2%
Sum of all benchmarks    25493635  25482069                                   +0.0%

A=dehao
R=sesse

Sync with various Google-internal changes.

Should not mean much for the open-source version.

Change some internal path names.

This is mostly to sync up with some changes from Google's internal
repositories; it does not affect the open-source distribution in itself.

Release Snappy 1.1.2.

R=jeff

git-svn-id: https://snappy.googlecode.com/svn/trunk@84 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Fix public issue 82: Stop distributing benchmark data files that have
unclear or unsuitable licensing.

In general, we replace the files we can with liberally licensed data,
and remove all the others (in particular all the parts of the Canterbury
corpus that are not clearly in the public domain). The replacements
do not always have the exact same characteristics as the original ones,
but they are more than good enough to be useful for benchmarking.

git-svn-id: https://snappy.googlecode.com/svn/trunk@83 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Add support for padding in the Snappy framed format.

This is specifically motivated by DICOM's demands that embedded data
must be of an even number of bytes, but could in principle be used for
any sort of padding/alignment needed.

R=sanjay

git-svn-id: https://snappy.googlecode.com/svn/trunk@82 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Release Snappy 1.1.1.

R=jeff

git-svn-id: https://snappy.googlecode.com/svn/trunk@81 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Add autoconf tests for size_t and ssize_t. Sort-of resolves public issue 79;
it would solve the problem if MSVC typically used autoconf. However, it gives
a natural place (config.h) to put the typedef even for MSVC.

R=jsbell

git-svn-id: https://snappy.googlecode.com/svn/trunk@80 03e5f5b5-db94-4691-08a0-1a8bf15f6143

When we compare the number of bytes produced with the offset for a
backreference, make the signedness of the bytes produced clear,
by sticking it into a size_t. This avoids a signed/unsigned compare
warning from MSVC (public issue 71), and also is slightly clearer.

Since the line is now so long the explanatory comment about the -1u
trick has to go somewhere else anyway, I used the opportunity to
explain it in slightly more detail.

This is a purely stylistic change; the emitted assembler from GCC
is identical.

R=jeff

git-svn-id: https://snappy.googlecode.com/svn/trunk@79 03e5f5b5-db94-4691-08a0-1a8bf15f6143

In the fast path for decompressing literals, instead of checking
whether there's 16 bytes free and then checking right afterwards
(when having subtracted the literal size) that there are now
5 bytes free, just check once for 21 bytes. This skips a compare
and a branch; although it is easily predictable, it is still
a few cycles on a fast path that we would like to get rid of.

Benchmarking this yields very confusing results. On open-source
GCC 4.8.1 on Haswell, we get exactly the expected results; the
benchmarks where we hit the fast path for literals (in particular
the two HTML benchmarks and the protobuf benchmark) give very nice
speedups, and the others are not really affected.

However, benchmarks with Google's GCC branch on other hardware
is much less clear. It seems that we have a weak loss in some cases
(and the win for the “typical” win cases are not nearly as clear),
but that it depends on microarchitecture and plain luck in how we run
the benchmark. Looking at the generated assembler, it seems that
the removal of the if causes other large-scale changes in how the
function is laid out, which makes it likely that this is just bad luck.

Thus, we should keep this change, even though its exact current impact is
unclear; it's a sensible change per se, and dropping it on the basis of
microoptimization for a given compiler (or even branch of a compiler)
would seem like a bad strategy in the long run.

Microbenchmark results (all in 64-bit, opt mode):

  Nehalem, Google GCC:

  Benchmark                Base (ns)  New (ns)                       Improvement
  ------------------------------------------------------------------------------
  BM_UFlat/0                   76747     75591  1.3GB/s  html           +1.5%
  BM_UFlat/1                  765756    757040  886.3MB/s  urls         +1.2%
  BM_UFlat/2                   10867     10893  10.9GB/s  jpg           -0.2%
  BM_UFlat/3                     124       131  1.4GB/s  jpg_200        -5.3%
  BM_UFlat/4                   31663     31596  2.8GB/s  pdf            +0.2%
  BM_UFlat/5                  314162    308176  1.2GB/s  html4          +1.9%
  BM_UFlat/6                   29668     29746  790.6MB/s  cp           -0.3%
  BM_UFlat/7                   12958     13386  796.4MB/s  c            -3.2%
  BM_UFlat/8                    3596      3682  966.0MB/s  lsp          -2.3%
  BM_UFlat/9                 1019193   1033493  953.3MB/s  xls          -1.4%
  BM_UFlat/10                    239       247  775.3MB/s  xls_200      -3.2%
  BM_UFlat/11                 236411    240271  606.9MB/s  txt1         -1.6%
  BM_UFlat/12                 206639    209768  571.2MB/s  txt2         -1.5%
  BM_UFlat/13                 627803    635722  641.4MB/s  txt3         -1.2%
  BM_UFlat/14                 845932    857816  538.2MB/s  txt4         -1.4%
  BM_UFlat/15                 402107    391670  1.2GB/s  bin            +2.7%
  BM_UFlat/16                    283       279  683.6MB/s  bin_200      +1.4%
  BM_UFlat/17                  46070     46815  781.5MB/s  sum          -1.6%
  BM_UFlat/18                   5053      5163  782.0MB/s  man          -2.1%
  BM_UFlat/19                  79721     76581  1.4GB/s  pb             +4.1%
  BM_UFlat/20                 251158    252330  697.5MB/s  gaviota      -0.5%
  Sum of all benchmarks      4966150   4980396                          -0.3%

  Sandy Bridge, Google GCC:

  Benchmark                Base (ns)  New (ns)                       Improvement
  ------------------------------------------------------------------------------
  BM_UFlat/0                   42850     42182  2.3GB/s  html           +1.6%
  BM_UFlat/1                  525660    515816  1.3GB/s  urls           +1.9%
  BM_UFlat/2                    7173      7283  16.3GB/s  jpg           -1.5%
  BM_UFlat/3                      92        91  2.1GB/s  jpg_200        +1.1%
  BM_UFlat/4                   15147     14872  5.9GB/s  pdf            +1.8%
  BM_UFlat/5                  199936    192116  2.0GB/s  html4          +4.1%
  BM_UFlat/6                   12796     12443  1.8GB/s  cp             +2.8%
  BM_UFlat/7                    6588      6400  1.6GB/s  c              +2.9%
  BM_UFlat/8                    2010      1951  1.8GB/s  lsp            +3.0%
  BM_UFlat/9                  761124    763049  1.3GB/s  xls            -0.3%
  BM_UFlat/10                    186       189  1016.1MB/s  xls_200     -1.6%
  BM_UFlat/11                 159354    158460  918.6MB/s  txt1         +0.6%
  BM_UFlat/12                 139732    139950  856.1MB/s  txt2         -0.2%
  BM_UFlat/13                 429917    425027  961.7MB/s  txt3         +1.2%
  BM_UFlat/14                 585255    587324  785.8MB/s  txt4         -0.4%
  BM_UFlat/15                 276186    266173  1.8GB/s  bin            +3.8%
  BM_UFlat/16                    205       207  925.5MB/s  bin_200      -1.0%
  BM_UFlat/17                  24925     24935  1.4GB/s  sum            -0.0%
  BM_UFlat/18                   2632      2576  1.5GB/s  man            +2.2%
  BM_UFlat/19                  40546     39108  2.8GB/s  pb             +3.7%
  BM_UFlat/20                 175803    168209  1048.9MB/s  gaviota     +4.5%
  Sum of all benchmarks      3408117   3368361                          +1.2%

  Haswell, upstream GCC 4.8.1:

  Benchmark                Base (ns)  New (ns)                       Improvement
  ------------------------------------------------------------------------------
  BM_UFlat/0                   46308     40641  2.3GB/s  html          +13.9%
  BM_UFlat/1                  513385    514706  1.3GB/s  urls           -0.3%
  BM_UFlat/2                    6197      6151  19.2GB/s  jpg           +0.7%
  BM_UFlat/3                      61        61  3.0GB/s  jpg_200        +0.0%
  BM_UFlat/4                   13551     13429  6.5GB/s  pdf            +0.9%
  BM_UFlat/5                  198317    190243  2.0GB/s  html4          +4.2%
  BM_UFlat/6                   14768     12560  1.8GB/s  cp            +17.6%
  BM_UFlat/7                    6453      6447  1.6GB/s  c              +0.1%
  BM_UFlat/8                    1991      1980  1.8GB/s  lsp            +0.6%
  BM_UFlat/9                  766947    770424  1.2GB/s  xls            -0.5%
  BM_UFlat/10                    170       169  1.1GB/s  xls_200        +0.6%
  BM_UFlat/11                 164350    163554  888.7MB/s  txt1         +0.5%
  BM_UFlat/12                 145444    143830  832.1MB/s  txt2         +1.1%
  BM_UFlat/13                 437849    438413  929.2MB/s  txt3         -0.1%
  BM_UFlat/14                 603587    605309  759.8MB/s  txt4         -0.3%
  BM_UFlat/15                 249799    248067  1.9GB/s  bin            +0.7%
  BM_UFlat/16                    191       188  1011.4MB/s  bin_200     +1.6%
  BM_UFlat/17                  26064     24778  1.4GB/s  sum            +5.2%
  BM_UFlat/18                   2620      2601  1.5GB/s  man            +0.7%
  BM_UFlat/19                  44551     37373  3.0GB/s  pb            +19.2%
  BM_UFlat/20                 165408    164584  1.0GB/s  gaviota        +0.5%
  Sum of all benchmarks      3408011   3385508                          +0.7%

git-svn-id: https://snappy.googlecode.com/svn/trunk@78 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Make the two IncrementalCopy* functions take in an ssize_t instead of a len,
in order to avoid having to do 32-to-64-bit signed conversions on a hot path
during decompression. (Also fixes some MSVC warnings, mentioned in public
issue 75, but more of those remain.) They cannot be size_t because we expect
them to go negative and test for that.

This saves a few movzwl instructions, yielding ~2% speedup in decompression.

Sandy Bridge:

Benchmark                          Base (ns)  New (ns)                                Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0                             48009     41283  2.3GB/s  html                   +16.3%
BM_UFlat/1                            531274    513419  1.3GB/s  urls                    +3.5%
BM_UFlat/2                              7378      7062  16.8GB/s  jpg                    +4.5%
BM_UFlat/3                                92        92  2.0GB/s  jpg_200                 +0.0%
BM_UFlat/4                             15057     14974  5.9GB/s  pdf                     +0.6%
BM_UFlat/5                            204323    193140  2.0GB/s  html4                   +5.8%
BM_UFlat/6                             13282     12611  1.8GB/s  cp                      +5.3%
BM_UFlat/7                              6511      6504  1.6GB/s  c                       +0.1%
BM_UFlat/8                              2014      2030  1.7GB/s  lsp                     -0.8%
BM_UFlat/9                            775909    768336  1.3GB/s  xls                     +1.0%
BM_UFlat/10                              182       184  1043.2MB/s  xls_200              -1.1%
BM_UFlat/11                           167352    161630  901.2MB/s  txt1                  +3.5%
BM_UFlat/12                           147393    142246  842.8MB/s  txt2                  +3.6%
BM_UFlat/13                           449960    432853  944.4MB/s  txt3                  +4.0%
BM_UFlat/14                           620497    594845  775.9MB/s  txt4                  +4.3%
BM_UFlat/15                           265610    267356  1.8GB/s  bin                     -0.7%
BM_UFlat/16                              206       205  932.7MB/s  bin_200               +0.5%
BM_UFlat/17                            25561     24730  1.4GB/s  sum                     +3.4%
BM_UFlat/18                             2620      2644  1.5GB/s  man                     -0.9%
BM_UFlat/19                            45766     38589  2.9GB/s  pb                     +18.6%
BM_UFlat/20                           171107    169832  1039.5MB/s  gaviota              +0.8%
Sum of all benchmarks                3500103   3394565                                   +3.1%

Westmere:

Benchmark                          Base (ns)  New (ns)                                Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0                             72624     71526  1.3GB/s  html                    +1.5%
BM_UFlat/1                            735821    722917  930.8MB/s  urls                  +1.8%
BM_UFlat/2                             10450     10172  11.7GB/s  jpg                    +2.7%
BM_UFlat/3                               117       117  1.6GB/s  jpg_200                 +0.0%
BM_UFlat/4                             29817     29648  3.0GB/s  pdf                     +0.6%
BM_UFlat/5                            297126    293073  1.3GB/s  html4                   +1.4%
BM_UFlat/6                             28252     27994  842.0MB/s  cp                    +0.9%
BM_UFlat/7                             12672     12391  862.1MB/s  c                     +2.3%
BM_UFlat/8                              3507      3425  1040.9MB/s  lsp                  +2.4%
BM_UFlat/9                           1004268    969395  1018.0MB/s  xls                  +3.6%
BM_UFlat/10                              233       227  844.8MB/s  xls_200               +2.6%
BM_UFlat/11                           230054    224981  647.8MB/s  txt1                  +2.3%
BM_UFlat/12                           201229    196447  610.5MB/s  txt2                  +2.4%
BM_UFlat/13                           609547    596761  685.3MB/s  txt3                  +2.1%
BM_UFlat/14                           824362    804821  573.8MB/s  txt4                  +2.4%
BM_UFlat/15                           371095    374899  1.3GB/s  bin                     -1.0%
BM_UFlat/16                              267       267  717.8MB/s  bin_200               +0.0%
BM_UFlat/17                            44623     43828  835.9MB/s  sum                   +1.8%
BM_UFlat/18                             5077      4815  841.0MB/s  man                   +5.4%
BM_UFlat/19                            74964     73210  1.5GB/s  pb                      +2.4%
BM_UFlat/20                           237987    236745  746.0MB/s  gaviota               +0.5%
Sum of all benchmarks                4794092   4697659                                   +2.1%

Istanbul:

Benchmark                          Base (ns)  New (ns)                                Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0                             98614     96376  1020.4MB/s  html                 +2.3%
BM_UFlat/1                            963740    953241  707.2MB/s  urls                  +1.1%
BM_UFlat/2                             25042     24769  4.8GB/s  jpg                     +1.1%
BM_UFlat/3                               180       180  1065.6MB/s  jpg_200              +0.0%
BM_UFlat/4                             45942     45403  1.9GB/s  pdf                     +1.2%
BM_UFlat/5                            400135    390226  1008.2MB/s  html4                +2.5%
BM_UFlat/6                             37768     37392  631.9MB/s  cp                    +1.0%
BM_UFlat/7                             18585     18200  588.2MB/s  c                     +2.1%
BM_UFlat/8                              5751      5690  627.7MB/s  lsp                   +1.1%
BM_UFlat/9                           1543154   1542209  641.4MB/s  xls                   +0.1%
BM_UFlat/10                              381       388  494.6MB/s  xls_200               -1.8%
BM_UFlat/11                           339715    331973  440.1MB/s  txt1                  +2.3%
BM_UFlat/12                           294807    289418  415.4MB/s  txt2                  +1.9%
BM_UFlat/13                           906160    884094  463.3MB/s  txt3                  +2.5%
BM_UFlat/14                          1224221   1198435  386.1MB/s  txt4                  +2.2%
BM_UFlat/15                           516277    502923  979.5MB/s  bin                   +2.7%
BM_UFlat/16                              405       402  477.2MB/s  bin_200               +0.7%
BM_UFlat/17                            61640     60621  605.6MB/s  sum                   +1.7%
BM_UFlat/18                             7326      7383  549.5MB/s  man                   -0.8%
BM_UFlat/19                            94720     92653  1.2GB/s  pb                      +2.2%
BM_UFlat/20                           360435    346687  510.6MB/s  gaviota               +4.0%
Sum of all benchmarks                6944998   6828663                                   +1.7%

git-svn-id: https://snappy.googlecode.com/svn/trunk@77 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Add support for uncompressing to iovecs (scatter I/O).
Windows does not have struct iovec defined anywhere,
so we define our own version that's equal to what UNIX
typically has.

The bulk of this patch was contributed by Mohit Aron.

R=jeff

git-svn-id: https://snappy.googlecode.com/svn/trunk@76 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Some code reorganization needed for an internal change.

R=fikes

git-svn-id: https://snappy.googlecode.com/svn/trunk@75 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Supports truncated test data in zippy benchmark.

R=sesse

git-svn-id: https://snappy.googlecode.com/svn/trunk@74 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Release Snappy 1.1.0.

R=sanjay

git-svn-id: https://snappy.googlecode.com/svn/trunk@73 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Make ./snappy_unittest pass without "srcdir" being defined.

Previously, snappy_unittests would read from an absolute path /testdata/..;
convert it to use a relative path instead.

Patch from Marc-Antonie Ruel.

R=maruel

git-svn-id: https://snappy.googlecode.com/svn/trunk@72 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Increase the Zippy block size from 32 kB to 64 kB, winning ~3% density
while being effectively performance neutral.

The longer story about density is that we win 3-6% density on the benchmarks
where this has any effect at all; many of the benchmarks (cp, c, lsp, man)
are smaller than 32 kB and thus will have no effect. Binary data also seems
to win little or nothing; of course, the already-compressed data wins nothing.
The protobuf benchmark wins as much as ~18% depending on architecture,
but I wouldn't be too sure that this is representative of protobuf data in
general.

As of performance, we lose a tiny amount since we get more tags (e.g., a long
literal might be broken up into literal-copy-literal), but we win it back with
less clearing of the hash table, and more opportunities to skip incompressible
data (e.g. in the jpg benchmark). Decompression seems to get ever so slightly
slower, again due to more tags. The total net change is about as close to zero
as we can get, so the end effect seems to be simply more density and no
real performance change.

The comment about not changing kBlockSize, scary as it is, is not really
relevant, since we're never going to have a block-level decompressor without
explicitly marked blocks. Replace it with something more appropriate.

This affects the framing format, but it's okay to change it since it basically
has no users yet.

Density (note that cp, c, lsp and man are all smaller than 32 kB):

   Benchmark         Description   Base (%)  New (%)  Improvement
   --------------------------------------------------------------
   ZFlat/0           html            22.57    22.31     +5.6%
   ZFlat/1           urls            50.89    47.77     +6.5%
   ZFlat/2           jpg             99.88    99.87     +0.0%
   ZFlat/3           pdf             82.13    82.07     +0.1%
   ZFlat/4           html4           23.55    22.51     +4.6%
   ZFlat/5           cp              48.12    48.12     +0.0%
   ZFlat/6           c               42.40    42.40     +0.0%
   ZFlat/7           lsp             48.37    48.37     +0.0%
   ZFlat/8           xls             41.34    41.23     +0.3%
   ZFlat/9           txt1            59.81    57.87     +3.4%
   ZFlat/10          txt2            64.07    61.93     +3.5%
   ZFlat/11          txt3            57.11    54.92     +4.0%
   ZFlat/12          txt4            68.35    66.22     +3.2%
   ZFlat/13          bin             18.21    18.11     +0.6%
   ZFlat/14          sum             51.88    48.96     +6.0%
   ZFlat/15          man             59.36    59.36     +0.0%
   ZFlat/16          pb              23.15    19.64    +17.9%
   ZFlat/17          gaviota         38.27    37.72     +1.5%
   Geometric mean                    45.51    44.15     +3.1%

Microbenchmarks (64-bit, opt):

Westmere 2.8 GHz:

   Benchmark                          Base (ns)  New (ns)                                Improvement
   -------------------------------------------------------------------------------------------------
   BM_UFlat/0                             75342     75027  1.3GB/s  html                    +0.4%
   BM_UFlat/1                            723767    744269  899.6MB/s  urls                  -2.8%
   BM_UFlat/2                             10072     10072  11.7GB/s  jpg                    +0.0%
   BM_UFlat/3                             30747     30388  2.9GB/s  pdf                     +1.2%
   BM_UFlat/4                            307353    306063  1.2GB/s  html4                   +0.4%
   BM_UFlat/5                             28593     28743  816.3MB/s  cp                    -0.5%
   BM_UFlat/6                             12958     12998  818.1MB/s  c                     -0.3%
   BM_UFlat/7                              3700      3792  935.8MB/s  lsp                   -2.4%
   BM_UFlat/8                            999685    999905  982.1MB/s  xls                   -0.0%
   BM_UFlat/9                            232954    230079  630.4MB/s  txt1                  +1.2%
   BM_UFlat/10                           200785    201468  592.6MB/s  txt2                  -0.3%
   BM_UFlat/11                           617267    610968  666.1MB/s  txt3                  +1.0%
   BM_UFlat/12                           821595    822475  558.7MB/s  txt4                  -0.1%
   BM_UFlat/13                           377097    377632  1.3GB/s  bin                     -0.1%
   BM_UFlat/14                            45476     45260  805.8MB/s  sum                   +0.5%
   BM_UFlat/15                             4985      5003  805.7MB/s  man                   -0.4%
   BM_UFlat/16                            80813     77494  1.4GB/s  pb                      +4.3%
   BM_UFlat/17                           251792    241553  727.7MB/s  gaviota               +4.2%
   BM_UValidate/0                         40343     40354  2.4GB/s  html                    -0.0%
   BM_UValidate/1                        426890    451574  1.4GB/s  urls                    -5.5%
   BM_UValidate/2                           187       179  661.9GB/s  jpg                   +4.5%
   BM_UValidate/3                         13783     13827  6.4GB/s  pdf                     -0.3%
   BM_UValidate/4                        162393    163335  2.3GB/s  html4                   -0.6%
   BM_UDataBuffer/0                       93756     93302  1046.7MB/s  html                 +0.5%
   BM_UDataBuffer/1                      886714    916292  730.7MB/s  urls                  -3.2%
   BM_UDataBuffer/2                       15861     16401  7.2GB/s  jpg                     -3.3%
   BM_UDataBuffer/3                       38934     39224  2.2GB/s  pdf                     -0.7%
   BM_UDataBuffer/4                      381008    379428  1029.5MB/s  html4                +0.4%
   BM_UCord/0                             92528     91098  1072.0MB/s  html                 +1.6%
   BM_UCord/1                            858421    885287  756.3MB/s  urls                  -3.0%
   BM_UCord/2                             13140     13464  8.8GB/s  jpg                     -2.4%
   BM_UCord/3                             39012     37773  2.3GB/s  pdf                     +3.3%
   BM_UCord/4                            376869    371267  1052.1MB/s  html4                +1.5%
   BM_UCordString/0                       75810     75303  1.3GB/s  html                    +0.7%
   BM_UCordString/1                      735290    753841  888.2MB/s  urls                  -2.5%
   BM_UCordString/2                       11945     13113  9.0GB/s  jpg                     -8.9%
   BM_UCordString/3                       33901     32562  2.7GB/s  pdf                     +4.1%
   BM_UCordString/4                      310985    309390  1.2GB/s  html4                   +0.5%
   BM_UCordValidate/0                     40952     40450  2.4GB/s  html                    +1.2%
   BM_UCordValidate/1                    433842    456531  1.4GB/s  urls                    -5.0%
   BM_UCordValidate/2                      1179      1173  100.8GB/s  jpg                   +0.5%
   BM_UCordValidate/3                     14481     14392  6.1GB/s  pdf                     +0.6%
   BM_UCordValidate/4                    164364    164151  2.3GB/s  html4                   +0.1%
   BM_ZFlat/0                            160610    156601  623.6MB/s  html (22.31 %)        +2.6%
   BM_ZFlat/1                           1995238   1993582  335.9MB/s  urls (47.77 %)        +0.1%
   BM_ZFlat/2                             30133     24983  4.7GB/s  jpg (99.87 %)          +20.6%
   BM_ZFlat/3                             74453     73128  1.2GB/s  pdf (82.07 %)           +1.8%
   BM_ZFlat/4                            647674    633729  616.4MB/s  html4 (22.51 %)       +2.2%
   BM_ZFlat/5                             76259     76090  308.4MB/s  cp (48.12 %)          +0.2%
   BM_ZFlat/6                             31106     31084  342.1MB/s  c (42.40 %)           +0.1%
   BM_ZFlat/7                             10507     10443  339.8MB/s  lsp (48.37 %)         +0.6%
   BM_ZFlat/8                           1811047   1793325  547.6MB/s  xls (41.23 %)         +1.0%
   BM_ZFlat/9                            597903    581793  249.3MB/s  txt1 (57.87 %)        +2.8%
   BM_ZFlat/10                           525320    514522  232.0MB/s  txt2 (61.93 %)        +2.1%
   BM_ZFlat/11                          1596591   1551636  262.3MB/s  txt3 (54.92 %)        +2.9%
   BM_ZFlat/12                          2134523   2094033  219.5MB/s  txt4 (66.22 %)        +1.9%
   BM_ZFlat/13                           593024    587869  832.6MB/s  bin (18.11 %)         +0.9%
   BM_ZFlat/14                           114746    110666  329.5MB/s  sum (48.96 %)         +3.7%
   BM_ZFlat/15                            14376     14485  278.3MB/s  man (59.36 %)         -0.8%
   BM_ZFlat/16                           167908    150070  753.6MB/s  pb (19.64 %)         +11.9%
   BM_ZFlat/17                           460228    442253  397.5MB/s  gaviota (37.72 %)     +4.1%
   BM_ZCord/0                            164896    160241  609.4MB/s  html                  +2.9%
   BM_ZCord/1                           2070239   2043492  327.7MB/s  urls                  +1.3%
   BM_ZCord/2                             54402     47002  2.5GB/s  jpg                    +15.7%
   BM_ZCord/3                             85871     83832  1073.1MB/s  pdf                  +2.4%
   BM_ZCord/4                            664078    648825  602.0MB/s  html4                 +2.4%
   BM_ZDataBuffer/0                      174874    172549  566.0MB/s  html                  +1.3%
   BM_ZDataBuffer/1                     2134410   2139173  313.0MB/s  urls                  -0.2%
   BM_ZDataBuffer/2                       71911     69551  1.7GB/s  jpg                     +3.4%
   BM_ZDataBuffer/3                       98236     99727  902.1MB/s  pdf                   -1.5%
   BM_ZDataBuffer/4                      710776    699104  558.8MB/s  html4                 +1.7%
   Sum of all benchmarks               27358908  27200688                                   +0.6%

Sandy Bridge 2.6 GHz:

   Benchmark                          Base (ns)  New (ns)                                Improvement
   -------------------------------------------------------------------------------------------------
   BM_UFlat/0                             49356     49018  1.9GB/s  html                    +0.7%
   BM_UFlat/1                            516764    531955  1.2GB/s  urls                    -2.9%
   BM_UFlat/2                              6982      7304  16.2GB/s  jpg                    -4.4%
   BM_UFlat/3                             15285     15598  5.6GB/s  pdf                     -2.0%
   BM_UFlat/4                            206557    206669  1.8GB/s  html4                   -0.1%
   BM_UFlat/5                             13681     13567  1.7GB/s  cp                      +0.8%
   BM_UFlat/6                              6571      6592  1.6GB/s  c                       -0.3%
   BM_UFlat/7                              2008      1994  1.7GB/s  lsp                     +0.7%
   BM_UFlat/8                            775700    773286  1.2GB/s  xls                     +0.3%
   BM_UFlat/9                            165578    164480  881.8MB/s  txt1                  +0.7%
   BM_UFlat/10                           143707    144139  828.2MB/s  txt2                  -0.3%
   BM_UFlat/11                           443026    436281  932.8MB/s  txt3                  +1.5%
   BM_UFlat/12                           603129    595856  771.2MB/s  txt4                  +1.2%
   BM_UFlat/13                           271682    270450  1.8GB/s  bin                     +0.5%
   BM_UFlat/14                            26200     25666  1.4GB/s  sum                     +2.1%
   BM_UFlat/15                             2620      2608  1.5GB/s  man                     +0.5%
   BM_UFlat/16                            48908     47756  2.3GB/s  pb                      +2.4%
   BM_UFlat/17                           174638    170346  1031.9MB/s  gaviota              +2.5%
   BM_UValidate/0                         31922     31898  3.0GB/s  html                    +0.1%
   BM_UValidate/1                        341265    363554  1.8GB/s  urls                    -6.1%
   BM_UValidate/2                           160       151  782.8GB/s  jpg                   +6.0%
   BM_UValidate/3                         10402     10380  8.5GB/s  pdf                     +0.2%
   BM_UValidate/4                        129490    130587  2.9GB/s  html4                   -0.8%
   BM_UDataBuffer/0                       59383     58736  1.6GB/s  html                    +1.1%
   BM_UDataBuffer/1                      619222    637786  1049.8MB/s  urls                 -2.9%
   BM_UDataBuffer/2                       10775     11941  9.9GB/s  jpg                     -9.8%
   BM_UDataBuffer/3                       18002     17930  4.9GB/s  pdf                     +0.4%
   BM_UDataBuffer/4                      259182    259306  1.5GB/s  html4                   -0.0%
   BM_UCord/0                             59379     57814  1.6GB/s  html                    +2.7%
   BM_UCord/1                            598456    615162  1088.4MB/s  urls                 -2.7%
   BM_UCord/2                              8519      8628  13.7GB/s  jpg                    -1.3%
   BM_UCord/3                             18123     17537  5.0GB/s  pdf                     +3.3%
   BM_UCord/4                            252375    252331  1.5GB/s  html4                   +0.0%
   BM_UCordString/0                       49494     49790  1.9GB/s  html                    -0.6%
   BM_UCordString/1                      524659    541803  1.2GB/s  urls                    -3.2%
   BM_UCordString/2                        8206      8354  14.2GB/s  jpg                    -1.8%
   BM_UCordString/3                       17235     16537  5.3GB/s  pdf                     +4.2%
   BM_UCordString/4                      210188    211072  1.8GB/s  html4                   -0.4%
   BM_UCordValidate/0                     31956     31587  3.0GB/s  html                    +1.2%
   BM_UCordValidate/1                    340828    362141  1.8GB/s  urls                    -5.9%
   BM_UCordValidate/2                       783       744  158.9GB/s  jpg                   +5.2%
   BM_UCordValidate/3                     10543     10462  8.4GB/s  pdf                     +0.8%
   BM_UCordValidate/4                    130150    129789  2.9GB/s  html4                   +0.3%
   BM_ZFlat/0                            113873    111200  878.2MB/s  html (22.31 %)        +2.4%
   BM_ZFlat/1                           1473023   1489858  449.4MB/s  urls (47.77 %)        -1.1%
   BM_ZFlat/2                             23569     19486  6.1GB/s  jpg (99.87 %)          +21.0%
   BM_ZFlat/3                             49178     48046  1.8GB/s  pdf (82.07 %)           +2.4%
   BM_ZFlat/4                            475063    469394  832.2MB/s  html4 (22.51 %)       +1.2%
   BM_ZFlat/5                             46910     46816  501.2MB/s  cp (48.12 %)          +0.2%
   BM_ZFlat/6                             16883     16916  628.6MB/s  c (42.40 %)           -0.2%
   BM_ZFlat/7                              5381      5447  651.5MB/s  lsp (48.37 %)         -1.2%
   BM_ZFlat/8                           1466870   1473861  666.3MB/s  xls (41.23 %)         -0.5%
   BM_ZFlat/9                            468006    464101  312.5MB/s  txt1 (57.87 %)        +0.8%
   BM_ZFlat/10                           408157    408957  291.9MB/s  txt2 (61.93 %)        -0.2%
   BM_ZFlat/11                          1253348   1232910  330.1MB/s  txt3 (54.92 %)        +1.7%
   BM_ZFlat/12                          1702373   1702977  269.8MB/s  txt4 (66.22 %)        -0.0%
   BM_ZFlat/13                           439792    438557  1116.0MB/s  bin (18.11 %)        +0.3%
   BM_ZFlat/14                            80766     78851  462.5MB/s  sum (48.96 %)         +2.4%
   BM_ZFlat/15                             7420      7542  534.5MB/s  man (59.36 %)         -1.6%
   BM_ZFlat/16                           112043    100126  1.1GB/s  pb (19.64 %)           +11.9%
   BM_ZFlat/17                           368877    357703  491.4MB/s  gaviota (37.72 %)     +3.1%
   BM_ZCord/0                            116402    113564  859.9MB/s  html                  +2.5%
   BM_ZCord/1                           1507156   1519911  440.5MB/s  urls                  -0.8%
   BM_ZCord/2                             39860     33686  3.5GB/s  jpg                    +18.3%
   BM_ZCord/3                             56211     54694  1.6GB/s  pdf                     +2.8%
   BM_ZCord/4                            485594    479212  815.1MB/s  html4                 +1.3%
   BM_ZDataBuffer/0                      123185    121572  803.3MB/s  html                  +1.3%
   BM_ZDataBuffer/1                     1569111   1589380  421.3MB/s  urls                  -1.3%
   BM_ZDataBuffer/2                       53143     49556  2.4GB/s  jpg                     +7.2%
   BM_ZDataBuffer/3                       65725     66826  1.3GB/s  pdf                     -1.6%
   BM_ZDataBuffer/4                      517871    514750  758.9MB/s  html4                 +0.6%
   Sum of all benchmarks               20258879  20315484                                   -0.3%

AMD Instanbul 2.4 GHz:

   Benchmark                          Base (ns)  New (ns)                                Improvement
   -------------------------------------------------------------------------------------------------
   BM_UFlat/0                             97120     96585  1011.1MB/s  html                 +0.6%
   BM_UFlat/1                            917473    948016  706.3MB/s  urls                  -3.2%
   BM_UFlat/2                             21496     23938  4.9GB/s  jpg                    -10.2%
   BM_UFlat/3                             44751     45639  1.9GB/s  pdf                     -1.9%
   BM_UFlat/4                            391950    391413  998.0MB/s  html4                 +0.1%
   BM_UFlat/5                             37366     37201  630.7MB/s  cp                    +0.4%
   BM_UFlat/6                             18350     18318  580.5MB/s  c                     +0.2%
   BM_UFlat/7                              5672      5661  626.9MB/s  lsp                   +0.2%
   BM_UFlat/8                           1533390   1529441  642.1MB/s  xls                   +0.3%
   BM_UFlat/9                            335477    336553  431.0MB/s  txt1                  -0.3%
   BM_UFlat/10                           285140    292080  408.7MB/s  txt2                  -2.4%
   BM_UFlat/11                           888507    894758  454.9MB/s  txt3                  -0.7%
   BM_UFlat/12                          1187643   1210928  379.5MB/s  txt4                  -1.9%
   BM_UFlat/13                           493717    507447  964.5MB/s  bin                   -2.7%
   BM_UFlat/14                            61740     60870  599.1MB/s  sum                   +1.4%
   BM_UFlat/15                             7211      7187  560.9MB/s  man                   +0.3%
   BM_UFlat/16                            97435     93100  1.2GB/s  pb                      +4.7%
   BM_UFlat/17                           362662    356395  493.2MB/s  gaviota               +1.8%
   BM_UValidate/0                         47475     47118  2.0GB/s  html                    +0.8%
   BM_UValidate/1                        501304    529741  1.2GB/s  urls                    -5.4%
   BM_UValidate/2                           276       243  486.2GB/s  jpg                  +13.6%
   BM_UValidate/3                         16361     16261  5.4GB/s  pdf                     +0.6%
   BM_UValidate/4                        190741    190353  2.0GB/s  html4                   +0.2%
   BM_UDataBuffer/0                      111080    109771  889.6MB/s  html                  +1.2%
   BM_UDataBuffer/1                     1051035   1085999  616.5MB/s  urls                  -3.2%
   BM_UDataBuffer/2                       25801     25463  4.6GB/s  jpg                     +1.3%
   BM_UDataBuffer/3                       50493     49946  1.8GB/s  pdf                     +1.1%
   BM_UDataBuffer/4                      447258    444138  879.5MB/s  html4                 +0.7%
   BM_UCord/0                            109350    107909  905.0MB/s  html                  +1.3%
   BM_UCord/1                           1023396   1054964  634.7MB/s  urls                  -3.0%
   BM_UCord/2                             25292     24371  4.9GB/s  jpg                     +3.8%
   BM_UCord/3                             48955     49736  1.8GB/s  pdf                     -1.6%
   BM_UCord/4                            440452    437331  893.2MB/s  html4                 +0.7%
   BM_UCordString/0                       98511     98031  996.2MB/s  html                  +0.5%
   BM_UCordString/1                      933230    963495  694.9MB/s  urls                  -3.1%
   BM_UCordString/2                       23311     24076  4.9GB/s  jpg                     -3.2%
   BM_UCordString/3                       45568     46196  1.9GB/s  pdf                     -1.4%
   BM_UCordString/4                      397791    396934  984.1MB/s  html4                 +0.2%
   BM_UCordValidate/0                     47537     46921  2.0GB/s  html                    +1.3%
   BM_UCordValidate/1                    505071    532716  1.2GB/s  urls                    -5.2%
   BM_UCordValidate/2                      1663      1621  72.9GB/s  jpg                    +2.6%
   BM_UCordValidate/3                     16890     16926  5.2GB/s  pdf                     -0.2%
   BM_UCordValidate/4                    192365    191984  2.0GB/s  html4                   +0.2%
   BM_ZFlat/0                            184708    179103  545.3MB/s  html (22.31 %)        +3.1%
   BM_ZFlat/1                           2293864   2302950  290.7MB/s  urls (47.77 %)        -0.4%
   BM_ZFlat/2                             52852     47618  2.5GB/s  jpg (99.87 %)          +11.0%
   BM_ZFlat/3                            100766     96179  935.3MB/s  pdf (82.07 %)         +4.8%
   BM_ZFlat/4                            741220    727977  536.6MB/s  html4 (22.51 %)       +1.8%
   BM_ZFlat/5                             85402     85418  274.7MB/s  cp (48.12 %)          -0.0%
   BM_ZFlat/6                             36558     36494  291.4MB/s  c (42.40 %)           +0.2%
   BM_ZFlat/7                             12706     12507  283.7MB/s  lsp (48.37 %)         +1.6%
   BM_ZFlat/8                           2336823   2335688  420.5MB/s  xls (41.23 %)         +0.0%
   BM_ZFlat/9                            701804    681153  212.9MB/s  txt1 (57.87 %)        +3.0%
   BM_ZFlat/10                           606700    597194  199.9MB/s  txt2 (61.93 %)        +1.6%
   BM_ZFlat/11                          1852283   1803238  225.7MB/s  txt3 (54.92 %)        +2.7%
   BM_ZFlat/12                          2475527   2443354  188.1MB/s  txt4 (66.22 %)        +1.3%
   BM_ZFlat/13                           694497    696654  702.6MB/s  bin (18.11 %)         -0.3%
   BM_ZFlat/14                           136929    129855  280.8MB/s  sum (48.96 %)         +5.4%
   BM_ZFlat/15                            17172     17124  235.4MB/s  man (59.36 %)         +0.3%
   BM_ZFlat/16                           190364    171763  658.4MB/s  pb (19.64 %)         +10.8%
   BM_ZFlat/17                           567285    555190  316.6MB/s  gaviota (37.72 %)     +2.2%
   BM_ZCord/0                            193490    187031  522.1MB/s  html                  +3.5%
   BM_ZCord/1                           2427537   2415315  277.2MB/s  urls                  +0.5%
   BM_ZCord/2                             85378     81412  1.5GB/s  jpg                     +4.9%
   BM_ZCord/3                            121898    119419  753.3MB/s  pdf                   +2.1%
   BM_ZCord/4                            779564    762961  512.0MB/s  html4                 +2.2%
   BM_ZDataBuffer/0                      213820    207272  471.1MB/s  html                  +3.2%
   BM_ZDataBuffer/1                     2589010   2586495  258.9MB/s  urls                  +0.1%
   BM_ZDataBuffer/2                      121871    118885  1018.4MB/s  jpg                  +2.5%
   BM_ZDataBuffer/3                      145382    145986  616.2MB/s  pdf                   -0.4%
   BM_ZDataBuffer/4                      868117    852754  458.1MB/s  html4                 +1.8%
   Sum of all benchmarks               33771833  33744763                                   +0.1%

git-svn-id: https://snappy.googlecode.com/svn/trunk@71 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Adjust the Snappy open-source distribution for the changes in Google's
internal file API.

R=sanjay

git-svn-id: https://snappy.googlecode.com/svn/trunk@70 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Change a few ORs to additions where they don't matter. This helps the compiler
use the LEA instruction more efficiently, since e.g. a + (b << 2) can be encoded
as one instruction. Even more importantly, it can constant-fold the
COPY_* enums together with the shifted negative constants, which also saves
some instructions. (We don't need it for LITERAL, since it happens to be 0.)

I am unsure why the compiler couldn't do this itself, but the theory is that
it cannot prove that len-1 and len-4 cannot underflow/wrap, and thus can't
do the optimization safely.

The gains are small but measurable; 0.5-1.0% over the BM_Z* benchmarks
(measured on Westmere, Sandy Bridge and Istanbul).

R=sanjay

git-svn-id: https://snappy.googlecode.com/svn/trunk@69 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Stop giving -Werror to automake, due to an incompatibility between current
versions of libtool and automake on non-GNU platforms (e.g. Mac OS X).

R=sanjay

git-svn-id: https://snappy.googlecode.com/svn/trunk@68 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Fix public issue 66: Document GetUncompressedLength better, in particular that
it leaves the source in a state that's not appropriate for RawUncompress.

R=sanjay

git-svn-id: https://snappy.googlecode.com/svn/trunk@67 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Fix public issue 64: Check for <sys/time.h> at configure time,
since MSVC seemingly does not have it.

R=sanjay

git-svn-id: https://snappy.googlecode.com/svn/trunk@66 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Handle the case where gettimeofday() goes backwards or returns the same value
twice; it could cause division by zero in the unit test framework.
(We already had one fix for this in place, but it was incomplete.)

This could in theory happen on any system, since there are few guarantees
about gettimeofday(), but seems to only happen in practice on GNU/Hurd, where
gettimeofday() is cached and only updated ever so often.

R=sanjay

git-svn-id: https://snappy.googlecode.com/svn/trunk@65 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Mark ARMv4 as not supporting unaligned accesses (not just ARMv5 and ARMv6);
apparently Debian still targets these by default, giving us segfaults on
armel.

R=sanjay

git-svn-id: https://snappy.googlecode.com/svn/trunk@64 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Fix public bug #62: Remove an extraneous comma at the end of an enum list,
causing compile errors when embedded in Mozilla on OpenBSD.

R=sanjay

git-svn-id: https://snappy.googlecode.com/svn/trunk@63 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Snappy library no longer depends on iostream.

Achieved by moving logging macro definitions to a test-only
header file, and by changing non-test code to use assert,
fprintf, and abort instead of LOG/CHECK macros.

R=sesse

git-svn-id: https://snappy.googlecode.com/svn/trunk@62 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Release Snappy 1.0.5.

R=sanjay

git-svn-id: https://snappy.googlecode.com/svn/trunk@61 03e5f5b5-db94-4691-08a0-1a8bf15f6143

For 32-bit platforms, do not try to accelerate multiple neighboring
32-bit loads with a 64-bit load during compression (it's not a win).

The main target for this optimization is ARM, but 32-bit x86 gets
a small gain, too, although there is noise in the microbenchmarks.
It's a no-op for 64-bit x86. It does not affect decompression.

Microbenchmark results on a Cortex-A9 1GHz, using g++ 4.6.2 (from
Ubuntu/Linaro), -O2 -DNDEBUG -Wa,-march=armv7a -mtune=cortex-a9
-mthumb-interwork, minimum 1000 iterations:

  Benchmark            Time(ns)    CPU(ns) Iterations
  ---------------------------------------------------
  BM_ZFlat/0            1158277    1160000       1000 84.2MB/s  html (23.57 %)    [ +4.3%]
  BM_ZFlat/1           14861782   14860000       1000 45.1MB/s  urls (50.89 %)    [ +1.1%]
  BM_ZFlat/2             393595     390000       1000 310.5MB/s  jpg (99.88 %)    [ +0.0%]
  BM_ZFlat/3             650583     650000       1000 138.4MB/s  pdf (82.13 %)    [ +3.1%]
  BM_ZFlat/4            4661480    4660000       1000 83.8MB/s  html4 (23.55 %)   [ +4.3%]
  BM_ZFlat/5             491973     490000       1000 47.9MB/s  cp (48.12 %)      [ +2.0%]
  BM_ZFlat/6             193575     192678       1038 55.2MB/s  c (42.40 %)       [ +9.0%]
  BM_ZFlat/7              62343      62754       3187 56.5MB/s  lsp (48.37 %)     [ +2.6%]
  BM_ZFlat/8           17708468   17710000       1000 55.5MB/s  xls (41.34 %)     [ -0.3%]
  BM_ZFlat/9            3755345    3760000       1000 38.6MB/s  txt1 (59.81 %)    [ +8.2%]
  BM_ZFlat/10           3324217    3320000       1000 36.0MB/s  txt2 (64.07 %)    [ +4.2%]
  BM_ZFlat/11          10139932   10140000       1000 40.1MB/s  txt3 (57.11 %)    [ +6.4%]
  BM_ZFlat/12          13532109   13530000       1000 34.0MB/s  txt4 (68.35 %)    [ +5.0%]
  BM_ZFlat/13           4690847    4690000       1000 104.4MB/s  bin (18.21 %)    [ +4.1%]
  BM_ZFlat/14            830682     830000       1000 43.9MB/s  sum (51.88 %)     [ +1.2%]
  BM_ZFlat/15             84784      85011       2235 47.4MB/s  man (59.36 %)     [ +1.1%]
  BM_ZFlat/16           1293254    1290000       1000 87.7MB/s  pb (23.15 %)      [ +2.3%]
  BM_ZFlat/17           2775155    2780000       1000 63.2MB/s  gaviota (38.27 %) [+12.2%]

Core i7 in 32-bit mode (only one run and 100 iterations, though, so noisy):

  Benchmark            Time(ns)    CPU(ns) Iterations
  ---------------------------------------------------
  BM_ZFlat/0             227582     223464       3043 437.0MB/s  html (23.57 %)    [ +7.4%]
  BM_ZFlat/1            2982430    2918455        233 229.4MB/s  urls (50.89 %)    [ +2.9%]
  BM_ZFlat/2              46967      46658      15217 2.5GB/s  jpg (99.88 %)       [ +0.0%]
  BM_ZFlat/3             115298     114864       5833 783.2MB/s  pdf (82.13 %)     [ +1.5%]
  BM_ZFlat/4             913440     899743        778 434.2MB/s  html4 (23.55 %)   [ +0.3%]
  BM_ZFlat/5             110302     108571       7000 216.1MB/s  cp (48.12 %)      [ +0.0%]
  BM_ZFlat/6              44409      43372      15909 245.2MB/s  c (42.40 %)       [ +0.8%]
  BM_ZFlat/7              15713      15643      46667 226.9MB/s  lsp (48.37 %)     [ +2.7%]
  BM_ZFlat/8            2625539    2602230        269 377.4MB/s  xls (41.34 %)     [ +1.4%]
  BM_ZFlat/9             808884     811429        875 178.8MB/s  txt1 (59.81 %)    [ -3.9%]
  BM_ZFlat/10            709532     700000       1000 170.5MB/s  txt2 (64.07 %)    [ +0.0%]
  BM_ZFlat/11           2177682    2162162        333 188.2MB/s  txt3 (57.11 %)    [ -1.4%]
  BM_ZFlat/12           2849640    2840000        250 161.8MB/s  txt4 (68.35 %)    [ -1.4%]
  BM_ZFlat/13            849760     835476        778 585.8MB/s  bin (18.21 %)     [ +1.2%]
  BM_ZFlat/14            165940     164571       4375 221.6MB/s  sum (51.88 %)     [ +1.4%]
  BM_ZFlat/15             20939      20571      35000 196.0MB/s  man (59.36 %)     [ +2.1%]
  BM_ZFlat/16            239209     236544       2917 478.1MB/s  pb (23.15 %)      [ +4.2%]
  BM_ZFlat/17            616206     610000       1000 288.2MB/s  gaviota (38.27 %) [ -1.6%]

R=sanjay

git-svn-id: https://snappy.googlecode.com/svn/trunk@60 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Enable the use of unaligned loads and stores for ARM-based architectures
where they are available (ARMv7 and higher). This gives a significant
speed boost on ARM, both for compression and decompression.
It should not affect x86 at all.

There are more changes possible to speed up ARM, but it might not be
that easy to do without hurting x86 or making the code uglier.
Also, we de not try to use NEON yet.

Microbenchmark results on a Cortex-A9 1GHz, using g++ 4.6.2 (from Ubuntu/Linaro),
-O2 -DNDEBUG -Wa,-march=armv7a -mtune=cortex-a9 -mthumb-interwork:

Benchmark            Time(ns)    CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0             524806     529100        378 184.6MB/s  html            [+33.6%]
BM_UFlat/1            5139790    5200000        100 128.8MB/s  urls            [+28.8%]
BM_UFlat/2              86540      84166       1901 1.4GB/s  jpg               [ +0.6%]
BM_UFlat/3             215351     210176        904 428.0MB/s  pdf             [+29.8%]
BM_UFlat/4            2144490    2100000        100 186.0MB/s  html4           [+33.3%]
BM_UFlat/5             194482     190000       1000 123.5MB/s  cp              [+36.2%]
BM_UFlat/6              91843      90175       2107 117.9MB/s  c               [+38.6%]
BM_UFlat/7              28535      28426       6684 124.8MB/s  lsp             [+34.7%]
BM_UFlat/8            9206600    9200000        100 106.7MB/s  xls             [+42.4%]
BM_UFlat/9            1865273    1886792        106 76.9MB/s  txt1             [+32.5%]
BM_UFlat/10           1576809    1587301        126 75.2MB/s  txt2             [+32.3%]
BM_UFlat/11           4968450    4900000        100 83.1MB/s  txt3             [+32.7%]
BM_UFlat/12           6673970    6700000        100 68.6MB/s  txt4             [+32.8%]
BM_UFlat/13           2391470    2400000        100 203.9MB/s  bin             [+29.2%]
BM_UFlat/14            334601     344827        522 105.8MB/s  sum             [+30.6%]
BM_UFlat/15             37404      38080       5252 105.9MB/s  man             [+33.8%]
BM_UFlat/16            535470     540540        370 209.2MB/s  pb              [+31.2%]
BM_UFlat/17           1875245    1886792        106 93.2MB/s  gaviota          [+37.8%]
BM_UValidate/0         178425     179533       1114 543.9MB/s  html            [ +2.7%]
BM_UValidate/1        2100450    2000000        100 334.8MB/s  urls            [ +5.0%]
BM_UValidate/2           1039       1044     172413 113.3GB/s  jpg             [ +3.4%]
BM_UValidate/3          59423      59470       3363 1.5GB/s  pdf               [ +7.8%]
BM_UValidate/4         760716     766283        261 509.8MB/s  html4           [ +6.5%]
BM_ZFlat/0            1204632    1204819        166 81.1MB/s  html (23.57 %)   [+32.8%]
BM_ZFlat/1           15656190   15600000        100 42.9MB/s  urls (50.89 %)   [+27.6%]
BM_ZFlat/2             403336     410677        487 294.8MB/s  jpg (99.88 %)   [+16.5%]
BM_ZFlat/3             664073     671140        298 134.0MB/s  pdf (82.13 %)   [+28.4%]
BM_ZFlat/4            4961940    4900000        100 79.7MB/s  html4 (23.55 %)  [+30.6%]
BM_ZFlat/5             500664     501253        399 46.8MB/s  cp (48.12 %)     [+33.4%]
BM_ZFlat/6             217276     215982        926 49.2MB/s  c (42.40 %)      [+25.0%]
BM_ZFlat/7              64122      65487       3054 54.2MB/s  lsp (48.37 %)    [+36.1%]
BM_ZFlat/8           18045730   18000000        100 54.6MB/s  xls (41.34 %)    [+34.4%]
BM_ZFlat/9            4051530    4000000        100 36.3MB/s  txt1 (59.81 %)   [+25.0%]
BM_ZFlat/10           3451800    3500000        100 34.1MB/s  txt2 (64.07 %)   [+25.7%]
BM_ZFlat/11          11052340   11100000        100 36.7MB/s  txt3 (57.11 %)   [+24.3%]
BM_ZFlat/12          14538690   14600000        100 31.5MB/s  txt4 (68.35 %)   [+24.7%]
BM_ZFlat/13           5041850    5000000        100 97.9MB/s  bin (18.21 %)    [+32.0%]
BM_ZFlat/14            908840     909090        220 40.1MB/s  sum (51.88 %)    [+22.2%]
BM_ZFlat/15             86921      86206       1972 46.8MB/s  man (59.36 %)    [+42.2%]
BM_ZFlat/16           1312315    1315789        152 86.0MB/s  pb (23.15 %)     [+34.5%]
BM_ZFlat/17           3173120    3200000        100 54.9MB/s  gaviota (38.27%) [+28.1%]

The move from 64-bit to 32-bit operations for the copies also affected 32-bit x86;
positive on the decompression side, and slightly negative on the compression side
(unless that is noise; I only ran once):

Benchmark              Time(ns)    CPU(ns) Iterations
-----------------------------------------------------
BM_UFlat/0                86279      86140       7778 1.1GB/s  html             [ +7.5%]
BM_UFlat/1               839265     822622        778 813.9MB/s  urls           [ +9.4%]
BM_UFlat/2                 9180       9143      87500 12.9GB/s  jpg             [ +1.2%]
BM_UFlat/3                35080      35000      20000 2.5GB/s  pdf              [+10.1%]
BM_UFlat/4               350318     345000       2000 1.1GB/s  html4            [ +7.0%]
BM_UFlat/5                33808      33472      21212 701.0MB/s  cp             [ +9.0%]
BM_UFlat/6                15201      15214      46667 698.9MB/s  c              [+14.9%]
BM_UFlat/7                 4652       4651     159091 762.9MB/s  lsp            [ +7.5%]
BM_UFlat/8              1285551    1282528        538 765.7MB/s  xls            [+10.7%]
BM_UFlat/9               282510     281690       2414 514.9MB/s  txt1           [+13.6%]
BM_UFlat/10              243494     239286       2800 498.9MB/s  txt2           [+14.4%]
BM_UFlat/11              743625     740000       1000 550.0MB/s  txt3           [+14.3%]
BM_UFlat/12              999441     989717        778 464.3MB/s  txt4           [+16.1%]
BM_UFlat/13              412402     410076       1707 1.2GB/s  bin              [ +7.3%]
BM_UFlat/14               54876      54000      10000 675.3MB/s  sum            [+13.0%]
BM_UFlat/15                6146       6100     100000 660.8MB/s  man            [+14.8%]
BM_UFlat/16               90496      90286       8750 1.2GB/s  pb               [ +4.0%]
BM_UFlat/17              292650     292000       2500 602.0MB/s  gaviota        [+18.1%]
BM_UValidate/0            49620      49699      14286 1.9GB/s  html             [ +0.0%]
BM_UValidate/1           501371     500000       1000 1.3GB/s  urls             [ +0.0%]
BM_UValidate/2              232        227    3043478 521.5GB/s  jpg            [ +1.3%]
BM_UValidate/3            17250      17143      43750 5.1GB/s  pdf              [ -1.3%]
BM_UValidate/4           198643     200000       3500 1.9GB/s  html4            [ -0.9%]
BM_ZFlat/0               227128     229415       3182 425.7MB/s  html (23.57 %) [ -1.4%]
BM_ZFlat/1              2970089    2960000        250 226.2MB/s  urls (50.89 %) [ -1.9%]
BM_ZFlat/2                45683      44999      15556 2.6GB/s  jpg (99.88 %)    [ +2.2%]
BM_ZFlat/3               114661     113136       6364 795.1MB/s  pdf (82.13 %)  [ -1.5%]
BM_ZFlat/4               919702     914286        875 427.2MB/s  html4 (23.55%) [ -1.3%]
BM_ZFlat/5               108189     108422       6364 216.4MB/s  cp (48.12 %)   [ -1.2%]
BM_ZFlat/6                44525      44000      15909 241.7MB/s  c (42.40 %)    [ -2.9%]
BM_ZFlat/7                15973      15857      46667 223.8MB/s  lsp (48.37 %)  [ +0.0%]
BM_ZFlat/8              2677888    2639405        269 372.1MB/s  xls (41.34 %)  [ -1.4%]
BM_ZFlat/9               800715     780000       1000 186.0MB/s  txt1 (59.81 %) [ -0.4%]
BM_ZFlat/10              700089     700000       1000 170.5MB/s  txt2 (64.07 %) [ -2.9%]
BM_ZFlat/11             2159356    2138365        318 190.3MB/s  txt3 (57.11 %) [ -0.3%]
BM_ZFlat/12             2796143    2779923        259 165.3MB/s  txt4 (68.35 %) [ -1.4%]
BM_ZFlat/13              856458     835476        778 585.8MB/s  bin (18.21 %)  [ -0.1%]
BM_ZFlat/14              166908     166857       4375 218.6MB/s  sum (51.88 %)  [ -1.4%]
BM_ZFlat/15               21181      20857      35000 193.3MB/s  man (59.36 %)  [ -0.8%]
BM_ZFlat/16              244009     239973       2917 471.3MB/s  pb (23.15 %)   [ -1.4%]
BM_ZFlat/17              596362     590000       1000 297.9MB/s  gaviota (38.27%) [ +0.0%]

R=sanjay

git-svn-id: https://snappy.googlecode.com/svn/trunk@59 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Lower the size allocated in the "corrupted input" unit test from 256 MB
to 2 MB. This fixes issues with running the unit test on platforms with
little RAM (e.g. some ARM boards).

Also, reactivate the 2 MB test for 64-bit platforms; there's no good
reason why it shouldn't be.

R=sanjay

git-svn-id: https://snappy.googlecode.com/svn/trunk@58 03e5f5b5-db94-4691-08a0-1a8bf15f6143

Minor refactoring to accomodate changes in Google's internal code tree.

git-svn-id: https://snappy.googlecode.com/svn/trunk@57 03e5f5b5-db94-4691-08a0-1a8bf15f6143