costan [Fri, 9 Mar 2018 15:42:50 +0000 (07:42 -0800)]
Update CI configurations.
Bump GCC and Clang on Travis and remove Visual Studio 2015 from AppVeyor.
jgorbe [Sat, 3 Feb 2018 02:38:30 +0000 (18:38 -0800)]
Ensure DecompressAllTags starts on a 32-byte boundary + 16 bytes.
First of all, I'm sorry about this ugly hack. I hope the following long
explanation is enough to justify it.
We have observed that, in some conditions, the results for dataset number 10
(pb) in the zippy benchmark can show a >20% regression on Skylake CPUs.
In order to diagnose this, we profiled the benchmark looking at hot functions
(99% of the time is spent on DecompressAllTags), then looked at the generated
code to see if there was any difference. To rule out a minor difference
we observed in register allocation, we replaced zippy.cc with a pre-built assembly
file so it was the same in both variants, and we were still able to reproduce the
regression.
After ruling out a regression caused by the compiler, we dug a bit further
and noticed that the alignment of the function in the final binary was
different. Both were aligned to a 16-byte boundary, but the slower one was also
(by chance) aligned to a 32-byte boundary. A regression caused by alignment
differences would explain why I could reproduce it consistently on the same CitC
client, but not others: slight differences in the sources can cause the resulting
binary to have different layout.
Here are some detailed benchmark results before/after the fix. Note how fixing
the alignment makes the difference between baseline and experiment go away, but
regular 32-byte alignment puts both variants in the same ballpark as the
original regression:
Original (note BM_UCord_10 and BM_UDataBuffer_10 around the -24% line):
BASELINE
BM_UCord/10 2938 2932 24194 3.767GB/s pb
BM_UDataBuffer/10 3008 3004 23316 3.677GB/s pb
EXPERIMENT
BM_UCord/10 3797 3789 18512 2.915GB/s pb
BM_UDataBuffer/10 4024 4016 17543 2.750GB/s pb
Aligning DecompressAllTags to a 32-byte boundary:
BASELINE
BM_UCord/10 3872 3862 18035 2.860GB/s pb
BM_UDataBuffer/10 4010 3998 17591 2.763GB/s pb
EXPERIMENT
BM_UCord/10 3884 3876 18126 2.850GB/s pb
BM_UDataBuffer/10 4037 4027 17199 2.743GB/s pb
Aligning DecompressAllTags to a 32-byte boundary + 16 bytes (this patch):
BASELINE
BM_UCord/10 3103 3095 22642 3.569GB/s pb
BM_UDataBuffer/10 3186 3177 21947 3.476GB/s pb
EXPERIMENT
BM_UCord/10 3104 3095 22632 3.569GB/s pb
BM_UDataBuffer/10 3167 3159 22076 3.496GB/s pb
This change forces the "good" alignment for DecompressAllTags which, if
anything, should make benchmark results more stable (and maybe we'll improve
some unlucky application!).
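To make the two alignment classes discussed in this entry concrete, here is a
minimal sketch. The helper names are hypothetical and not part of the library,
and the addresses in the note below are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of snappy) that classify a code address by
// its offset within a 32-byte block.

// True when the address sits exactly on a 32-byte boundary. In the benchmarks
// above this was the slow placement for DecompressAllTags.
constexpr bool IsOn32ByteBoundary(std::uintptr_t addr) {
  return addr % 32 == 0;
}

// True when the address is 16-byte aligned but not 32-byte aligned, i.e.
// "32-byte boundary + 16". This is the placement the patch forces.
constexpr bool IsOn32ByteBoundaryPlus16(std::uintptr_t addr) {
  return addr % 32 == 16;
}
```

For example, a made-up address such as 0x401010 falls in the second class,
while 0x401020 falls in the first.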
scrubbed [Tue, 16 Jan 2018 21:39:18 +0000 (13:39 -0800)]
Fix an incorrect analysis / comment in the "pattern doubling" code.
This should have a minuscule positive effect on performance; the
main idea of the CL is just to fix the incorrect comment.
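For readers unfamiliar with the term, "pattern doubling" can be sketched
roughly as follows. This is a generic illustration under stated assumptions,
not the code this CL touches: the function name and signature are
hypothetical, and the caller is assumed to provide a buffer with `offset`
readable bytes before `dst` and `len` writable bytes at `dst`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch of pattern doubling. The `offset` bytes immediately
// before `dst` hold a pattern; this writes `len` bytes at `dst` by repeating
// that pattern.
inline void RepeatPattern(char* dst, std::size_t offset, std::size_t len) {
  assert(offset > 0);
  const char* src = dst - offset;
  // Number of already-periodic bytes in [src, dst + written).
  std::size_t available = offset;
  std::size_t written = 0;
  while (written < len) {
    // The source range ends at or before the destination, so the two ranges
    // never overlap and memcpy is well defined.
    const std::size_t n = std::min(available, len - written);
    std::memcpy(dst + written, src, n);
    written += n;
    available += n;  // Doubles whenever a full chunk was copied.
  }
}
```

Each full iteration doubles the amount of pattern available to copy from, so
a short pattern reaches a comfortable copy width in a few steps.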
costan [Thu, 4 Jan 2018 22:26:40 +0000 (14:26 -0800)]
Fix Travis CI configuration for OSX.
chandlerc [Fri, 22 Dec 2017 04:51:07 +0000 (20:51 -0800)]
Rework a very hot, very sensitive part of snappy to reduce the number of
instructions and the number of dynamic branches, and to avoid a particular
loop structure that LLVM has a very hard time optimizing for this
particular case.
The code being changed is part of the hottest path for snappy
decompression. In the benchmarks for decompressing protocol buffers,
this has proven to be amazingly sensitive to the slightest changes in
code layout. For example, previously we added '.p2align 5' assembly
directive to the code. This essentially padded the loop out from the
function. Merely by doing this we saw significant performance
improvements.
As a consequence, several of the compiler's typically reasonable
optimizations can have surprisingly bad impacts. Loop unrolling is a
primary culprit, but in the next LLVM release we are seeing an issue due
to loop rotation. While some of the problems caused by the newly
triggered loop rotation in LLVM can be mitigated with ongoing work on
LLVM's code layout optimizations (specifically, loop header cloning),
that is a fairly long term project. And even minor fluctuations in how
that subsequent optimization is performed may prevent gaining the
performance back.
For now, we need some way to unblock the next LLVM release which
contains a generic improvement to the LLVM loop optimizer that enables
loop rotation in more places, but uncovers this sensitivity and weakness
in a particular case.
This CL restructures the loop to have a simpler structure. Specifically,
we eagerly test what the terminal condition will be and provide two
versions of the copy loop that use a single loop predicate.
The comments in the source code and benchmarks indicate that only one of
these two cases is actually hot: we expect to generally have enough slop
in the buffer. That in turn allows us to generate a much simpler branch
and loop structure for the hot path (especially for the protocol buffer
decompression benchmark).
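The restructuring described above can be sketched as follows. This is an
illustration of the idea only, not the code in this CL: the names `Copy8` and
`CopyToLimit` are hypothetical, and the sketch assumes the source does not
overlap the output and remains readable up to the next multiple of 16 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies 8 bytes through a temporary so the compiler is free to emit a single
// 8-byte load and store.
inline void Copy8(const char* src, char* dst) {
  char tmp[8];
  std::memcpy(tmp, src, 8);
  std::memcpy(dst, tmp, 8);
}

// Hypothetical sketch. Copies from src into [dst, dst_limit) and returns
// dst_limit. buf_limit is the end of the writable output buffer.
inline char* CopyToLimit(const char* src, char* dst, char* dst_limit,
                         char* buf_limit) {
  assert(dst <= dst_limit && dst_limit <= buf_limit);
  // Test the terminal condition once, up front.
  if (buf_limit - dst_limit >= 16) {
    // Hot path: there is enough slop to overshoot dst_limit by up to 15
    // bytes, so the loop needs only a single predicate.
    while (dst < dst_limit) {
      Copy8(src, dst);
      Copy8(src + 8, dst + 8);
      src += 16;
      dst += 16;
    }
    return dst_limit;
  }
  // Cold path: no slop, so copy byte by byte, again with a single predicate.
  while (dst < dst_limit) *dst++ = *src++;
  return dst_limit;
}
```

The point of the split is that neither loop has to re-check the buffer bound
on every iteration; the check happens once, before either loop is entered.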
However, structuring even this simple loop in a way that doesn't trigger
some other performance bubble (often a more severe one) is quite
challenging. We have to carefully manage the variables used in the loop
and the addressing pattern. We should teach LLVM how to do this
reliably, but that too is a *much* more significant undertaking, and it is
extremely rare for code to have this degree of importance. The desired
structure of the loop is shown below, with IACA's analysis for the Broadwell
micro-architecture (HSW and SKX are similar):
| Num Of |                    Ports pressure in cycles                     |    |
|  Uops  |  0 - DV   |  1  |   2 - D   |   3 - D   |  4  |  5  |  6  |  7  |    |
---------------------------------------------------------------------------------
|   1    |           |     | 1.0   1.0 |           |     |     |     |     |    | mov rcx, qword ptr [rdi+rdx*1-0x8]
|   2^   |           |     |           | 0.4       | 1.0 |     |     | 0.6 |    | mov qword ptr [rdi], rcx
|   1    |           |     |           | 1.0   1.0 |     |     |     |     |    | mov rcx, qword ptr [rdi+rdx*1]
|   2^   |           |     | 0.3       |           | 1.0 |     |     | 0.7 |    | mov qword ptr [rdi+0x8], rcx
|   1    | 0.5       |     |           |           |     | 0.5 |     |     |    | add rdi, 0x10
|   1    | 0.2       |     |           |           |     |     | 0.8 |     |    | cmp rdi, rax
|   0F   |           |     |           |           |     |     |     |     |    | jb 0xffffffffffffffe9
Specifically, arranging the addressing modes for the stores such that
micro-op fusion occurs (indicated by the `^` on the `2` micro-op count) is
important to achieve good throughput for this loop.
The other thing necessary to make this change effective is to remove our
previous hack using `.p2align 5` to pad out the main decompression loop,
and to forcibly disable loop unrolling for critical loops. Because this
change simplifies the loop structure, more unrolling opportunities show
up. Also, the next LLVM release's generic loop optimization improvements
allow unrolling in more places, requiring still more disabling of
unrolling in this change. Perhaps most surprising of these is that we
must disable loop unrolling in the *slow* path. While unrolling there
seems pointless, it should also be harmless. This cold code is laid out
very far away from all of the hot code. All the samples shown in a
profile of the benchmark occur before this loop in the function. And
yet, if the loop gets unrolled (which seems to only happen reliably with
the next LLVM release) we see a nearly 20% regression in decompressing
protocol buffers!
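As an illustration of what "forcibly disabling loop unrolling" can look like
in source, here is a minimal sketch. The function is hypothetical and is not
one of the loops changed in this CL; the pragmas are compiler-specific, and
other compilers simply see a plain loop.

```cpp
#include <cstddef>

// Hypothetical sketch of a loop that asks the compiler not to unroll it.
inline void CopyBytesNoUnroll(const char* src, char* dst, std::size_t n) {
#if defined(__clang__)
#pragma clang loop unroll(disable)
#elif defined(__GNUC__) && __GNUC__ >= 8
#pragma GCC unroll 1
#endif
  for (std::size_t i = 0; i < n; ++i) {
    dst[i] = src[i];
  }
}
```

The pragma has to sit directly in front of the loop it applies to, which is
why the preprocessor conditionals are placed immediately before the `for`.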
With the current release of LLVM, we still observe some regression from
this source change, but it is fairly small (5% on decompressing protocol
buffers, less elsewhere). And with the next LLVM release it drops to
under 1% even in that case. Meanwhile, without this change, the next
release of LLVM will regress decompressing protocol buffers by more than
10%.
costan [Wed, 20 Dec 2017 21:08:59 +0000 (13:08 -0800)]
Fix generated version number in open source release.
Lands GitHub PR #61. The patch was also independently contributed by
Martin Gieseking <martin.gieseking@uos.de>.
costan [Thu, 24 Aug 2017 19:35:03 +0000 (12:35 -0700)]
Tag open source release 1.1.7.
wmi [Mon, 21 Aug 2017 22:33:39 +0000 (15:33 -0700)]
Add a loop alignment directive to work around a performance regression.
We found that the upstream LLVM change at rL310792 degraded the zippy
benchmark by ~3%. Performance analysis showed that the regression was
caused by a side effect: an incidental loop alignment change (from 32 bytes
to 16 bytes) increased branch mispredictions. The regression was
reproducible on several Intel micro-architectures, such as Sandy Bridge,
Haswell and Skylake. Sadly, we still don't have a good understanding of the
internals of the Intel branch predictor and cannot explain how branch
mispredictions increase when the loop alignment changes, so we cannot make
a real fix here. The workaround in this patch is to add a directive that
aligns the hot loop to 32 bytes, which restores the performance. This is in
order to unblock the flip of the default compiler to LLVM.
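A loop alignment directive of this kind can be sketched as below. The function
is hypothetical and only illustrates the mechanism; the compiler is not
obliged to keep the directive directly in front of the loop, so any effect
should be confirmed by inspecting the generated code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: sums bytes in a loop whose entry is padded towards a
// 32-byte boundary on x86-64 Clang. Elsewhere it compiles to a plain loop.
inline std::uint32_t SumBytes(const unsigned char* data, std::size_t n) {
  std::uint32_t sum = 0;
#if defined(__clang__) && defined(__x86_64__)
  // 2^5 = 32: ask the assembler to pad with NOPs up to a 32-byte boundary.
  asm volatile(".p2align 5");
#endif
  for (std::size_t i = 0; i < n; ++i) {
    sum += data[i];
  }
  return sum;
}
```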
costan [Thu, 17 Aug 2017 01:46:27 +0000 (18:46 -0700)]
Add GNUInstallDirs to CMake configuration.
This is modeled after https://github.com/google/googletest/pull/1160.
The immediate benefit is fixing the library install paths on 64-bit
Linux distributions, which tend to support running 32-bit and 64-bit
code side by side by installing 32-bit libraries in /usr/lib and 64-bit
libraries in /usr/lib64.
costan [Wed, 16 Aug 2017 19:38:06 +0000 (12:38 -0700)]
Use 64-bit optimized code path for ARM64.
This is inspired by https://github.com/google/snappy/pull/22.
Benchmark results with the change, Pixel C with Android N2G48B
Benchmark Time(ns) CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0 119544 119253 1501 818.9MB/s html
BM_UFlat/1 1223950 1208588 163 554.0MB/s urls
BM_UFlat/2 16081 15962 11527 7.2GB/s jpg
BM_UFlat/3 356 352 416666 540.6MB/s jpg_200
BM_UFlat/4 25010 24860 7683 3.8GB/s pdf
BM_UFlat/5 484832 481572 407 811.1MB/s html4
BM_UFlat/6 408410 408713 482 354.9MB/s txt1
BM_UFlat/7 361714 361663 553 330.1MB/s txt2
BM_UFlat/8 1090582 1087912 182 374.1MB/s txt3
BM_UFlat/9 1503127 1503759 133 305.6MB/s txt4
BM_UFlat/10 114183 114285 1715 989.6MB/s pb
BM_UFlat/11 406714 407331 491 431.5MB/s gaviota
BM_UIOVec/0 370397 369888 538 264.0MB/s html
BM_UIOVec/1 3207510 3190000 100 209.9MB/s urls
BM_UIOVec/2 16589 16573 11223 6.9GB/s jpg
BM_UIOVec/3 1052 1052 165289 181.2MB/s jpg_200
BM_UIOVec/4 49151 49184 3985 1.9GB/s pdf
BM_UValidate/0 68115 68095 2893 1.4GB/s html
BM_UValidate/1 792652 792000 250 845.4MB/s urls
BM_UValidate/2 334 334 487804 343.1GB/s jpg
BM_UValidate/3 235 235 666666 809.9MB/s jpg_200
BM_UValidate/4 6126 6130 32626 15.6GB/s pdf
BM_ZFlat/0 292697 290560 678 336.1MB/s html (22.31 %)
BM_ZFlat/1 4062080 4050000 100 165.3MB/s urls (47.78 %)
BM_ZFlat/2 29225 29274 6422 3.9GB/s jpg (99.95 %)
BM_ZFlat/3 1099 1098 163934 173.7MB/s jpg_200 (73.00 %)
BM_ZFlat/4 44117 44233 4205 2.2GB/s pdf (83.30 %)
BM_ZFlat/5 1158058 1157894 171 337.4MB/s html4 (22.52 %)
BM_ZFlat/6 1102983 1093922 181 132.6MB/s txt1 (57.88 %)
BM_ZFlat/7 974142 975490 204 122.4MB/s txt2 (61.91 %)
BM_ZFlat/8 2984670 2990000 100 136.1MB/s txt3 (54.99 %)
BM_ZFlat/9 4100130 4090000 100 112.4MB/s txt4 (66.26 %)
BM_ZFlat/10 276236 275139 716 411.0MB/s pb (19.68 %)
BM_ZFlat/11 760091 759541 262 231.4MB/s gaviota (37.72 %)
Baseline benchmark results, Pixel C with Android N2G48B
Benchmark Time(ns) CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0 148957 147565 1335 661.8MB/s html
BM_UFlat/1 1527257 1500000 132 446.4MB/s urls
BM_UFlat/2 19589 19397 8764 5.9GB/s jpg
BM_UFlat/3 425 418 408163 455.3MB/s jpg_200
BM_UFlat/4 30096 29552 6497 3.2GB/s pdf
BM_UFlat/5 595933 594594 333 657.0MB/s html4
BM_UFlat/6 516315 514360 383 282.0MB/s txt1
BM_UFlat/7 454653 453514 441 263.2MB/s txt2
BM_UFlat/8 1382687 1361111 144 299.0MB/s txt3
BM_UFlat/9 1967590 1904761 105 241.3MB/s txt4
BM_UFlat/10 148271 144560 1342 782.3MB/s pb
BM_UFlat/11 523997 510471 382 344.4MB/s gaviota
BM_UIOVec/0 478443 465227 417 209.9MB/s html
BM_UIOVec/1 4172860 4060000 100 164.9MB/s urls
BM_UIOVec/2 21470 20975 7342 5.5GB/s jpg
BM_UIOVec/3 1357 1330 75187 143.4MB/s jpg_200
BM_UIOVec/4 63143 61365 3031 1.6GB/s pdf
BM_UValidate/0 86910 85125 2279 1.1GB/s html
BM_UValidate/1 1022256 1000000 195 669.6MB/s urls
BM_UValidate/2 420 417 400000 274.6GB/s jpg
BM_UValidate/3 311 302 571428 630.0MB/s jpg_200
BM_UValidate/4 7778 7584 25445 12.6GB/s pdf
BM_ZFlat/0 469209 457547 424 213.4MB/s html (22.31 %)
BM_ZFlat/1 5633510 5460000 100 122.6MB/s urls (47.78 %)
BM_ZFlat/2 37896 36693 4524 3.1GB/s jpg (99.95 %)
BM_ZFlat/3 1485 1441 123456 132.3MB/s jpg_200 (73.00 %)
BM_ZFlat/4 74870 72775 2652 1.3GB/s pdf (83.30 %)
BM_ZFlat/5 1857321 1785714 112 218.8MB/s html4 (22.52 %)
BM_ZFlat/6 1538723 1492307 130 97.2MB/s txt1 (57.88 %)
BM_ZFlat/7 1338236 1310810 148 91.1MB/s txt2 (61.91 %)
BM_ZFlat/8 4050820 4040000 100 100.7MB/s txt3 (54.99 %)
BM_ZFlat/9 5234940 5230000 100 87.9MB/s txt4 (66.26 %)
BM_ZFlat/10 400309 400000 495 282.7MB/s pb (19.68 %)
BM_ZFlat/11 1063042 1058510 188 166.1MB/s gaviota (37.72 %)
costan [Wed, 2 Aug 2017 16:43:03 +0000 (09:43 -0700)]
Add unistd.h checks back to the CMake build.
getpagesize(), as well as its POSIX.2001 replacement
sysconf(_SC_PAGESIZE), is defined in <unistd.h>. On Linux and OS X,
including <sys/mman.h> is sufficient to get a definition for
getpagesize(). However, this is not true for the Android NDK. This CL
brings back the HAVE_UNISTD_H definition and its associated header
check.
This also adds a HAVE_FUNC_SYSCONF definition, which checks for the
presence of sysconf(). The definition can be used later to replace
getpagesize() with sysconf().
costan [Tue, 1 Aug 2017 20:18:55 +0000 (13:18 -0700)]
Replace getpagesize() with sysconf(_SC_PAGESIZE).
getpagesize() has been removed from POSIX.1-2001. Its recommended
replacement is sysconf(_SC_PAGESIZE).
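A minimal sketch of the replacement call on a POSIX system follows. The helper
name is hypothetical, and the 4096-byte fallback for a failed query is an
assumption made for this example only.

```cpp
#include <unistd.h>

#include <cstddef>

// Hypothetical helper: queries the page size through sysconf() rather than
// getpagesize(). sysconf() returns -1 on failure, in which case this sketch
// falls back to an assumed 4096 bytes.
inline std::size_t PageSizeSketch() {
  const long result = sysconf(_SC_PAGESIZE);
  return result > 0 ? static_cast<std::size_t>(result) : 4096;
}
```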
costan [Tue, 1 Aug 2017 18:36:14 +0000 (11:36 -0700)]
Add guidelines for opensource contributions.
The guidelines follow the instructions at
https://opensource.google.com/docs/releasing/preparing/#CONTRIBUTING
costan [Tue, 1 Aug 2017 17:01:27 +0000 (10:01 -0700)]
Use _BitScanForward and _BitScanReverse on MSVC.
Based on https://github.com/google/snappy/pull/30
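For illustration, bit-scan wrappers that pick the MSVC intrinsics or the GCC
and Clang builtins might look like the sketch below. The wrapper names are
hypothetical and not taken from the library; both require a non-zero argument.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Index of the lowest set bit of n. Requires n != 0.
inline int LowestSetBit(std::uint32_t n) {
  assert(n != 0);
#if defined(_MSC_VER)
  unsigned long index;
  _BitScanForward(&index, n);
  return static_cast<int>(index);
#elif defined(__GNUC__)
  return __builtin_ctz(n);
#else
  int index = 0;
  while ((n & 1u) == 0) {
    n >>= 1;
    ++index;
  }
  return index;
#endif
}

// Index of the highest set bit of n, i.e. floor(log2(n)). Requires n != 0.
inline int HighestSetBit(std::uint32_t n) {
  assert(n != 0);
#if defined(_MSC_VER)
  unsigned long index;
  _BitScanReverse(&index, n);
  return static_cast<int>(index);
#elif defined(__GNUC__)
  return 31 - __builtin_clz(n);
#else
  int index = 0;
  while (n >>= 1) ++index;
  return index;
#endif
}
```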
jueminyang [Fri, 28 Jul 2017 21:31:04 +0000 (14:31 -0700)]
Add SNAPPY_ prefix to PREDICT_{TRUE,FALSE} macros.
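As a rough sketch of what prefixed branch-hint macros can look like, assuming
a GCC-compatible `__builtin_expect` and a no-op fallback elsewhere (the
definitions here are illustrative and the library's own may differ):

```cpp
#include <cstddef>

// Illustrative definitions. The SNAPPY_ prefix keeps the macros from
// colliding with PREDICT_TRUE / PREDICT_FALSE defined by other code.
#if defined(__GNUC__)
#define SNAPPY_PREDICT_FALSE(x) (__builtin_expect(!!(x), 0))
#define SNAPPY_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
#else
#define SNAPPY_PREDICT_FALSE(x) (x)
#define SNAPPY_PREDICT_TRUE(x) (x)
#endif

// Hypothetical user of the hint: counts non-zero bytes, telling the compiler
// that non-zero is the expected case.
inline std::size_t CountNonZero(const unsigned char* data, std::size_t n) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (SNAPPY_PREDICT_TRUE(data[i] != 0)) ++count;
  }
  return count;
}
```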
costan [Fri, 28 Jul 2017 16:55:21 +0000 (09:55 -0700)]
Redo CMake configuration.
The style was changed to match the official manual [1], the install
configuration was simplified and now matches the official packaging
guide [2], and the config files use the CMake-specific variable syntax
${VAR} instead of the autoconf-compatible syntax @VAR@, as documented in
[3]. The public header files are declared as such (for CMake 3.3+), and
the generated headers are included in the library target definition.
The tests are only built if SNAPPY_BUILD_TESTS (default ON) is true, so
zippy can be easily used in projects that add_subdirectory() its source
code directly, instead of using find_package().
[1] https://cmake.org/cmake/help/git-master/manual/cmake-language.7.html
[2] https://cmake.org/cmake/help/git-master/manual/cmake-packages.7.html
[3] https://cmake.org/cmake/help/git-master/command/configure_file.html
costan [Thu, 27 Jul 2017 23:34:19 +0000 (16:34 -0700)]
Small improvements to open source CI configuration.
This CL fixes 64-bit Windows testing, makes it possible to view the
test output in the Travis / AppVeyor CI console while the test is
running, and takes advantage of the new support for the .appveyor.yml
file name to make the CI configuration less obtrusive.
costan [Thu, 27 Jul 2017 23:31:00 +0000 (16:31 -0700)]
Support both static and shared library CMake builds.
This can be used to fix https://github.com/Homebrew/homebrew-core/issues/15722.
costan [Wed, 26 Jul 2017 17:08:17 +0000 (10:08 -0700)]
Inline DISALLOW_COPY_AND_ASSIGN.
snappy-stubs-public.h defined the DISALLOW_COPY_AND_ASSIGN macro, so the
definition propagated to all translation units that included the open
source headers. The macro is now inlined, thus avoiding polluting the
macro environment of snappy users.
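Spelling out the copy operations in place, instead of going through a macro,
might look like the sketch below. The class name is hypothetical, and the use
of C++11 `= delete` is an assumption of this example; code that must build as
C++03 would instead declare the two members private and leave them undefined.

```cpp
#include <type_traits>

// Hypothetical class with copying disabled inline. No macro is defined, so
// nothing leaks into translation units that include the header.
class SinkSketch {
 public:
  SinkSketch() {}

  // No copying.
  SinkSketch(const SinkSketch&) = delete;
  SinkSketch& operator=(const SinkSketch&) = delete;
};

static_assert(!std::is_copy_constructible<SinkSketch>::value,
              "SinkSketch must not be copy constructible");
```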
costan [Tue, 25 Jul 2017 21:47:14 +0000 (14:47 -0700)]
snappy: Remove autoconf build configuration.
costan [Tue, 25 Jul 2017 21:45:04 +0000 (14:45 -0700)]
Clean up CMake header and type checks.
Unused macros: HAVE_DLFCN_H, HAVE_INTTYPES_H, HAVE_MEMORY_H,
HAVE_STDLIB_H, HAVE_STRINGS_H, HAVE_STRING_H, HAVE_SYS_BYTESWAP_H,
HAVE_SYS_STAT_H, HAVE_SYS_TYPES_H, HAVE_UNISTD_H.
Used but never set macros: HAVE_LIBLZF, HAVE_LIBQUICKLZ. These only gate
conditional includes. The code that takes advantage of them was removed.
Unused types: ssize_t.
The testing code uses HAVE_FUNC_MMAP, which was not wired in the CMake
build, causing a whole test to be skipped.
costan [Thu, 13 Jul 2017 10:13:54 +0000 (03:13 -0700)]
zippy: Re-release snappy 1.1.5 as 1.1.6.
The migration from autotools to CMake in 1.1.5 wasn't as smooth as
intended. The SONAME / SOVERSION were broken in both build systems,
causing breakages in systems that upgraded from snappy 1.1.4 to 1.1.5,
as reported in https://github.com/Homebrew/homebrew-core/issues/15274
and https://github.com/google/snappy/pull/45.
costan [Wed, 28 Jun 2017 21:23:16 +0000 (14:23 -0700)]
Tag open source release 1.1.5.
costan [Tue, 27 Jun 2017 19:23:33 +0000 (12:23 -0700)]
Set minimum CMake version to 3.1.
The project only needs CMake 3.1 features, and some Travis CI bots have
CMake 3.2.2. Therefore, requiring CMake 3.4 is inconvenient.
costan [Tue, 27 Jun 2017 18:11:16 +0000 (11:11 -0700)]
Update Travis CI config, add AppVeyor for Windows CI coverage.
tmsriram [Thu, 15 Jun 2017 21:24:18 +0000 (14:24 -0700)]
Explicitly copy internal::wordmask to the stack array to work around a compiler
optimization with LLVM that converts const stack arrays to global arrays. This
is a temporary change and should be reverted when https://reviews.llvm.org/D30759
is fixed.
With PIE, accessing stack arrays is more efficient than global arrays and
wordmask was moved to the stack due to that. However, the LLVM compiler
automatically converts stack arrays, detected as constant, to global arrays
and this transformation hurts PIE performance with LLVM.
We are working to fix this in the LLVM compiler, via
https://reviews.llvm.org/D30759, to not do this conversion in PIE mode. Until
this patch is finished, please consider this source change as a temporary
workaround to keep this array on the stack. This source change is important
to allow some projects to flip the default compiler from GCC to LLVM for
optimized builds.
This change works for the following reason. The LLVM compiler does not convert
non-const stack arrays to global arrays and explicitly copying the elements is
enough to make the compiler assume that this is a non-const array.
With GCC, this change does not affect code-gen in any significant way. The
array initialization code is slightly different as it copies the constants
directly to the stack.
With LLVM, this keeps the array on the stack.
No change in performance with GCC (within noise range). With LLVM, ~0.7%
improvement in optimized mode (no FDO) and ~1.75% improvement in FDO
mode.
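The workaround described above can be sketched as follows. The table and
function names are hypothetical stand-ins, and whether a given compiler
version actually keeps the array on the stack is not guaranteed by this
sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a global lookup table of byte masks.
static const std::uint32_t kWordMaskSketch[] = {0u, 0xffu, 0xffffu, 0xffffffu,
                                                0xffffffffu};

// Keeps the low `n` bytes (0 <= n <= 4) of `value`.
inline std::uint32_t KeepLowBytes(std::uint32_t value, int n) {
  assert(n >= 0 && n <= 4);
  // Non-const stack array filled by an explicit copy, so the compiler treats
  // it as a mutable local rather than a constant it may turn into a global.
  std::uint32_t mask[sizeof(kWordMaskSketch) / sizeof(std::uint32_t)];
  std::memmove(mask, kWordMaskSketch, sizeof(mask));
  return value & mask[n];
}
```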
ysaed [Tue, 13 Jun 2017 21:35:23 +0000 (14:35 -0700)]
Remove benchmarking support for fastlz.
alkis [Tue, 6 Jun 2017 08:05:05 +0000 (01:05 -0700)]
Use 64 bit little endian on ppc64le.
This has tangible performance benefits.
This lands https://github.com/google/snappy/pull/27
alkis [Fri, 2 Jun 2017 17:21:49 +0000 (10:21 -0700)]
Improve the SSE2 macro check on Windows.
This lands https://github.com/google/snappy/pull/37
alkis [Fri, 2 Jun 2017 14:51:56 +0000 (07:51 -0700)]
Check for the existence of sys/uio.h in autoconf build.
This lands https://github.com/google/snappy/pull/32
jyrki [Tue, 9 May 2017 14:41:36 +0000 (07:41 -0700)]
Remove quicklz and lzf support in benchmarks.
vrabaud [Wed, 26 Apr 2017 11:24:58 +0000 (04:24 -0700)]
Provide a CMakeLists.txt.
This lands https://github.com/google/snappy/pull/29
costan [Fri, 17 Mar 2017 20:43:18 +0000 (13:43 -0700)]
Clean up unused function warnings in snappy.
costan [Mon, 13 Mar 2017 19:46:43 +0000 (12:46 -0700)]
Remove "using namespace std;" from zippy-stubs-internal.h.
This makes it easier to build zippy, as some compilers require a warning
suppression to accept "using namespace std".
costan [Fri, 10 Mar 2017 20:27:00 +0000 (12:27 -0800)]
Add Travis CI configuration to snappy and fix the make build.
The make build in the open source version uses autoconf, which is set up
to expect a project that follows the GNU standard.
alkis [Fri, 3 Mar 2017 20:13:38 +0000 (12:13 -0800)]
Rename README to README.md. It is already in Markdown, so we might as well let GitHub know so that it renders nicely.
alkis [Wed, 22 Feb 2017 18:26:14 +0000 (10:26 -0800)]
Delete UnalignedCopy64 from snappy-stubs since the version in snappy.cc is more robust and possibly faster (assuming the compiler knows how to best copy 8 bytes between locations in memory the fastest way possible - a rather safe bet).
scrubbed [Wed, 15 Feb 2017 23:06:43 +0000 (15:06 -0800)]
Add std:: prefix to STL non-type names.
In order to disable global using declarations, this CL qualifies
stl names with the std namespace.
alkis [Tue, 14 Feb 2017 20:36:05 +0000 (12:36 -0800)]
Make UnalignedCopy64 not exhibit undefined behavior when src and dst overlap.
name old speed new speed delta
BM_UFlat/0 3.09GB/s ± 3% 3.07GB/s ± 2% -0.78% (p=0.009 n=19+19)
BM_UFlat/1 1.63GB/s ± 2% 1.62GB/s ± 2% ~ (p=0.099 n=19+20)
BM_UFlat/2 19.7GB/s ±19% 20.7GB/s ±11% ~ (p=0.054 n=20+19)
BM_UFlat/3 1.61GB/s ± 2% 1.60GB/s ± 1% -0.48% (p=0.049 n=20+17)
BM_UFlat/4 15.8GB/s ± 7% 15.6GB/s ±10% ~ (p=0.234 n=20+20)
BM_UFlat/5 2.47GB/s ± 1% 2.46GB/s ± 2% ~ (p=0.608 n=19+19)
BM_UFlat/6 1.07GB/s ± 2% 1.07GB/s ± 1% ~ (p=0.128 n=20+19)
BM_UFlat/7 1.01GB/s ± 1% 1.00GB/s ± 2% ~ (p=0.656 n=15+19)
BM_UFlat/8 1.13GB/s ± 1% 1.13GB/s ± 1% ~ (p=0.532 n=18+19)
BM_UFlat/9 918MB/s ± 1% 916MB/s ± 1% ~ (p=0.443 n=19+18)
BM_UFlat/10 3.90GB/s ± 1% 3.90GB/s ± 1% ~ (p=0.895 n=20+19)
BM_UFlat/11 1.30GB/s ± 1% 1.29GB/s ± 2% ~ (p=0.156 n=19+19)
BM_UFlat/12 2.35GB/s ± 2% 2.34GB/s ± 1% ~ (p=0.349 n=19+17)
BM_UFlat/13 2.07GB/s ± 1% 2.06GB/s ± 2% ~ (p=0.475 n=18+19)
BM_UFlat/14 2.23GB/s ± 1% 2.23GB/s ± 1% ~ (p=0.983 n=19+19)
BM_UFlat/15 1.55GB/s ± 1% 1.55GB/s ± 1% ~ (p=0.314 n=19+19)
BM_UFlat/16 1.26GB/s ± 1% 1.26GB/s ± 1% ~ (p=0.907 n=15+18)
BM_UFlat/17 2.32GB/s ± 1% 2.32GB/s ± 1% ~ (p=0.604 n=18+19)
BM_UFlat/18 1.61GB/s ± 1% 1.61GB/s ± 1% ~ (p=0.212 n=18+19)
BM_UFlat/19 1.78GB/s ± 1% 1.78GB/s ± 2% ~ (p=0.350 n=19+19)
BM_UFlat/20 1.89GB/s ± 1% 1.90GB/s ± 2% ~ (p=0.092 n=19+19)
Also tested the current version against UNALIGNED_STORE64(dst, UNALIGNED_LOAD64(src)), there is no difference (old is memcpy, new is UNALIGNED*):
name old speed new speed delta
BM_UFlat/0 3.14GB/s ± 1% 3.16GB/s ± 2% ~ (p=0.156 n=19+19)
BM_UFlat/1 1.62GB/s ± 1% 1.61GB/s ± 2% ~ (p=0.102 n=19+20)
BM_UFlat/2 18.8GB/s ±17% 19.1GB/s ±11% ~ (p=0.390 n=20+16)
BM_UFlat/3 1.59GB/s ± 1% 1.58GB/s ± 1% -1.06% (p=0.000 n=18+18)
BM_UFlat/4 15.8GB/s ± 6% 15.6GB/s ± 7% ~ (p=0.184 n=19+20)
BM_UFlat/5 2.46GB/s ± 1% 2.44GB/s ± 1% -0.95% (p=0.000 n=19+18)
BM_UFlat/6 1.08GB/s ± 1% 1.06GB/s ± 1% -1.17% (p=0.000 n=19+18)
BM_UFlat/7 1.00GB/s ± 1% 0.99GB/s ± 1% -1.16% (p=0.000 n=19+18)
BM_UFlat/8 1.14GB/s ± 2% 1.12GB/s ± 1% -1.12% (p=0.000 n=19+18)
BM_UFlat/9 921MB/s ± 1% 914MB/s ± 1% -0.84% (p=0.000 n=20+17)
BM_UFlat/10 3.94GB/s ± 2% 3.92GB/s ± 1% ~ (p=0.058 n=19+17)
BM_UFlat/11 1.29GB/s ± 1% 1.28GB/s ± 1% -0.77% (p=0.001 n=19+17)
BM_UFlat/12 2.34GB/s ± 1% 2.31GB/s ± 1% -1.10% (p=0.000 n=18+18)
BM_UFlat/13 2.06GB/s ± 1% 2.05GB/s ± 1% -0.73% (p=0.001 n=19+18)
BM_UFlat/14 2.22GB/s ± 1% 2.20GB/s ± 1% -0.73% (p=0.000 n=18+18)
BM_UFlat/15 1.55GB/s ± 1% 1.53GB/s ± 1% -1.07% (p=0.000 n=19+18)
BM_UFlat/16 1.26GB/s ± 1% 1.25GB/s ± 1% -0.79% (p=0.000 n=18+18)
BM_UFlat/17 2.31GB/s ± 1% 2.29GB/s ± 1% -0.98% (p=0.000 n=20+18)
BM_UFlat/18 1.61GB/s ± 1% 1.60GB/s ± 2% -0.71% (p=0.001 n=20+19)
BM_UFlat/19 1.77GB/s ± 1% 1.76GB/s ± 1% -0.61% (p=0.007 n=19+18)
BM_UFlat/20 1.89GB/s ± 1% 1.88GB/s ± 1% -0.75% (p=0.000 n=20+18)
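An 8-byte copy that stays well defined when source and destination overlap
can be sketched with two memcpy calls through a temporary, which corresponds
to the "memcpy" variant in the tables above. The function name is
hypothetical; this is an illustration, not necessarily the exact code that
landed.

```cpp
#include <cstring>

// Hypothetical sketch. The bytes are staged in a temporary, so neither memcpy
// call ever sees overlapping arguments, even when src and dst overlap.
inline void UnalignedCopy64Sketch(const void* src, void* dst) {
  char tmp[8];
  std::memcpy(tmp, src, 8);
  std::memcpy(dst, tmp, 8);
}
```

Compilers commonly lower this to a single 8-byte load and store, which is
consistent with the near-zero deltas reported above.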
skanev [Wed, 1 Feb 2017 16:34:26 +0000 (08:34 -0800)]
Add compression size reporting hooks.
Also, force inlining util::compression::Sample().
The inlining change is necessary. Without it, even with FDO+LIPO, the call
doesn't get inlined and uses 4 registers to construct parameters (which
won't be used in the common case). In some of the more compute-bound
tests, that causes extra spills and significant overhead (even if the
call is sufficiently long).
For example, with inlining:
BM_UFlat/0 32.7µs ± 1% 33.1µs ± 1% +1.41%
without:
BM_UFlat/0 32.7µs ± 1% 37.7µs ± 1% +15.29%
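Forcing a small hook to be inlined can be sketched as below. The macro, the
struct, and the function are all hypothetical and only illustrate the
mechanism; they are not the reporting hook added by this change.

```cpp
#include <cstddef>

// Hypothetical force-inline macro covering MSVC and GCC-compatible compilers.
#if defined(_MSC_VER)
#define SKETCH_ALWAYS_INLINE __forceinline
#elif defined(__GNUC__)
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_ALWAYS_INLINE inline
#endif

// Hypothetical accumulator for reported sizes.
struct SizeStatsSketch {
  std::size_t samples;
  std::size_t uncompressed_bytes;
  std::size_t compressed_bytes;
};

// With the early-out inlined into the caller, the common case (no stats
// object installed) costs one branch and no argument setup for a call.
SKETCH_ALWAYS_INLINE void ReportSizesSketch(SizeStatsSketch* stats,
                                            std::size_t uncompressed,
                                            std::size_t compressed) {
  if (stats == nullptr) return;
  stats->samples += 1;
  stats->uncompressed_bytes += uncompressed;
  stats->compressed_bytes += compressed;
}
```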
alkis [Mon, 30 Jan 2017 20:57:14 +0000 (12:57 -0800)]
Use #ifdef __SSE2__ for the emmintrin.h include, otherwise snappy.cc does not compile with -march=prescott.
Alkis Evlogimenos [Fri, 27 Jan 2017 08:12:04 +0000 (09:12 +0100)]
1.1.4 release.
Alkis Evlogimenos [Fri, 27 Jan 2017 08:10:36 +0000 (09:10 +0100)]
Improve zippy decompression speed.
The CL contains the following optimizations:
1) rewrite IncrementalCopy routine: single routine that splits the code into sections based on typical probabilities observed across a variety of inputs and helps reduce branch mispredictions both for FDO and non-FDO builds. IncrementalCopy is an adaptive routine that selects the best strategy based on input.
2) introduce UnalignedCopy128 that copies 128 bits per cycle using SSE2.
3) add branch hint for the main decoding loop. The non-literal case is taken more often in benchmarks. I expect this to be a noop in production with FDO. Note that this became apparent after step 1 above.
4) use the new IncrementalCopy in ZippyScatteredWriter.
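Item 2 above can be sketched roughly as follows, with the SSE2 header and
intrinsics guarded by `__SSE2__` and a portable fallback otherwise. The
function name and the fallback are assumptions of this example, not
necessarily what the CL contains.

```cpp
#include <cstring>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

// Hypothetical sketch of a 16-byte copy that tolerates unaligned pointers.
inline void UnalignedCopy128Sketch(const void* src, void* dst) {
#ifdef __SSE2__
  // One unaligned 128-bit load followed by one unaligned 128-bit store.
  const __m128i x = _mm_loadu_si128(static_cast<const __m128i*>(src));
  _mm_storeu_si128(static_cast<__m128i*>(dst), x);
#else
  // Portable fallback: stage the bytes in a temporary.
  char tmp[16];
  std::memcpy(tmp, src, 16);
  std::memcpy(dst, tmp, 16);
#endif
}
```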
I tested two architectures: x86_haswell and ppc_power8.
For x86_haswell I used FDO. For ppc_power8 I did not use FDO.
x86_haswell + FDO
name old speed new speed delta
BM_UCord/0 1.97GB/s ± 1% 3.19GB/s ± 1% +62.08% (p=0.000 n=19+18)
BM_UCord/1 1.28GB/s ± 1% 1.51GB/s ± 1% +18.14% (p=0.000 n=19+18)
BM_UCord/2 15.6GB/s ± 9% 15.5GB/s ± 7% ~ (p=0.620 n=20+20)
BM_UCord/3 811MB/s ± 1% 808MB/s ± 1% -0.38% (p=0.009 n=17+18)
BM_UCord/4 12.4GB/s ± 4% 12.7GB/s ± 8% +2.70% (p=0.002 n=17+20)
BM_UCord/5 1.77GB/s ± 0% 2.33GB/s ± 1% +31.37% (p=0.000 n=18+18)
BM_UCord/6 900MB/s ± 1% 1006MB/s ± 1% +11.71% (p=0.000 n=18+17)
BM_UCord/7 858MB/s ± 1% 938MB/s ± 2% +9.36% (p=0.000 n=19+16)
BM_UCord/8 921MB/s ± 1% 985MB/s ±21% +6.94% (p=0.028 n=19+20)
BM_UCord/9 824MB/s ± 1% 800MB/s ±20% ~ (p=0.113 n=19+20)
BM_UCord/10 2.60GB/s ± 1% 3.67GB/s ±21% +41.31% (p=0.000 n=19+20)
BM_UCord/11 1.07GB/s ± 1% 1.21GB/s ± 1% +13.17% (p=0.000 n=16+16)
BM_UCord/12 1.84GB/s ± 8% 2.18GB/s ± 1% +18.44% (p=0.000 n=16+19)
BM_UCord/13 1.83GB/s ±18% 1.89GB/s ± 1% +3.14% (p=0.000 n=17+19)
BM_UCord/14 1.96GB/s ± 2% 1.97GB/s ± 1% +0.55% (p=0.000 n=16+17)
BM_UCord/15 1.30GB/s ±20% 1.43GB/s ± 1% +9.85% (p=0.000 n=20+20)
BM_UCord/16 658MB/s ±20% 705MB/s ± 1% +7.22% (p=0.000 n=20+19)
BM_UCord/17 1.96GB/s ± 2% 2.15GB/s ± 1% +9.73% (p=0.000 n=16+19)
BM_UCord/18 555MB/s ± 1% 833MB/s ± 1% +50.11% (p=0.000 n=18+19)
BM_UCord/19 1.57GB/s ± 1% 1.75GB/s ± 1% +11.34% (p=0.000 n=20+20)
BM_UCord/20 1.72GB/s ± 2% 1.70GB/s ± 2% -1.01% (p=0.001 n=20+20)
BM_UCordStringSink/0 2.88GB/s ± 1% 3.15GB/s ± 1% +9.56% (p=0.000 n=17+20)
BM_UCordStringSink/1 1.50GB/s ± 1% 1.52GB/s ± 1% +1.96% (p=0.000 n=19+20)
BM_UCordStringSink/2 14.5GB/s ±10% 14.6GB/s ±10% ~ (p=0.542 n=20+20)
BM_UCordStringSink/3 1.06GB/s ± 1% 1.08GB/s ± 1% +1.77% (p=0.000 n=18+20)
BM_UCordStringSink/4 12.6GB/s ± 7% 13.2GB/s ± 4% +4.63% (p=0.000 n=20+20)
BM_UCordStringSink/5 2.29GB/s ± 1% 2.36GB/s ± 1% +3.05% (p=0.000 n=19+20)
BM_UCordStringSink/6 1.01GB/s ± 2% 1.01GB/s ± 0% ~ (p=0.055 n=20+18)
BM_UCordStringSink/7 945MB/s ± 1% 939MB/s ± 1% -0.60% (p=0.000 n=19+20)
BM_UCordStringSink/8 1.06GB/s ± 1% 1.07GB/s ± 1% +0.62% (p=0.000 n=18+20)
BM_UCordStringSink/9 866MB/s ± 1% 864MB/s ± 1% ~ (p=0.107 n=19+20)
BM_UCordStringSink/10 3.64GB/s ± 2% 3.98GB/s ± 1% +9.32% (p=0.000 n=19+20)
BM_UCordStringSink/11 1.22GB/s ± 1% 1.22GB/s ± 1% +0.61% (p=0.001 n=19+20)
BM_UCordStringSink/12 2.23GB/s ± 1% 2.23GB/s ± 1% ~ (p=0.692 n=19+20)
BM_UCordStringSink/13 1.96GB/s ± 1% 1.94GB/s ± 1% -0.82% (p=0.000 n=17+18)
BM_UCordStringSink/14 2.09GB/s ± 2% 2.08GB/s ± 1% ~ (p=0.147 n=20+18)
BM_UCordStringSink/15 1.47GB/s ± 1% 1.45GB/s ± 1% -0.88% (p=0.000 n=20+19)
BM_UCordStringSink/16 908MB/s ± 1% 917MB/s ± 1% +0.97% (p=0.000 n=19+19)
BM_UCordStringSink/17 2.11GB/s ± 1% 2.20GB/s ± 1% +4.35% (p=0.000 n=18+20)
BM_UCordStringSink/18 804MB/s ± 2% 1106MB/s ± 1% +37.52% (p=0.000 n=20+20)
BM_UCordStringSink/19 1.67GB/s ± 1% 1.72GB/s ± 0% +2.81% (p=0.000 n=18+20)
BM_UCordStringSink/20 1.77GB/s ± 3% 1.77GB/s ± 3% ~ (p=0.815 n=20+20)
ppc_power8
name old speed new speed delta
BM_UCord/0 918MB/s ± 6% 1262MB/s ± 0% +37.56% (p=0.000 n=17+16)
BM_UCord/1 671MB/s ±13% 879MB/s ± 2% +30.99% (p=0.000 n=18+16)
BM_UCord/2 12.6GB/s ± 8% 12.6GB/s ± 5% ~ (p=0.452 n=17+19)
BM_UCord/3 285MB/s ±10% 284MB/s ± 4% -0.50% (p=0.021 n=19+17)
BM_UCord/4 5.21GB/s ±12% 6.59GB/s ± 1% +26.37% (p=0.000 n=17+16)
BM_UCord/5 913MB/s ± 4% 1253MB/s ± 1% +37.27% (p=0.000 n=16+17)
BM_UCord/6 461MB/s ±13% 547MB/s ± 1% +18.67% (p=0.000 n=18+16)
BM_UCord/7 455MB/s ± 2% 524MB/s ± 3% +15.28% (p=0.000 n=16+18)
BM_UCord/8 489MB/s ± 2% 584MB/s ± 2% +19.47% (p=0.000 n=17+17)
BM_UCord/9 410MB/s ±33% 490MB/s ± 1% +19.64% (p=0.000 n=17+18)
BM_UCord/10 1.10GB/s ± 3% 1.55GB/s ± 2% +41.21% (p=0.000 n=16+16)
BM_UCord/11 494MB/s ± 1% 558MB/s ± 1% +12.92% (p=0.000 n=17+18)
BM_UCord/12 608MB/s ± 3% 793MB/s ± 1% +30.45% (p=0.000 n=17+16)
BM_UCord/13 545MB/s ±18% 721MB/s ± 2% +32.22% (p=0.000 n=19+17)
BM_UCord/14 594MB/s ± 4% 748MB/s ± 3% +25.99% (p=0.000 n=17+17)
BM_UCord/15 628MB/s ± 1% 822MB/s ± 3% +30.94% (p=0.000 n=18+16)
BM_UCord/16 277MB/s ± 2% 280MB/s ±15% +0.86% (p=0.001 n=17+17)
BM_UCord/17 864MB/s ± 1% 1001MB/s ± 3% +15.96% (p=0.000 n=17+17)
BM_UCord/18 121MB/s ± 2% 284MB/s ± 4% +134.08% (p=0.000 n=17+18)
BM_UCord/19 594MB/s ± 0% 713MB/s ± 2% +19.93% (p=0.000 n=16+17)
BM_UCord/20 553MB/s ±10% 662MB/s ± 5% +19.74% (p=0.000 n=16+18)
BM_UCordStringSink/0 1.37GB/s ± 4% 1.48GB/s ± 2% +8.51% (p=0.000 n=16+16)
BM_UCordStringSink/1 969MB/s ± 1% 990MB/s ± 1% +2.16% (p=0.000 n=16+18)
BM_UCordStringSink/2 13.1GB/s ±11% 13.0GB/s ±14% ~ (p=0.858 n=17+18)
BM_UCordStringSink/3 411MB/s ± 1% 415MB/s ± 1% +0.93% (p=0.000 n=16+17)
BM_UCordStringSink/4 6.81GB/s ± 8% 7.29GB/s ± 5% +7.12% (p=0.000 n=16+19)
BM_UCordStringSink/5 1.35GB/s ± 5% 1.45GB/s ±13% +8.00% (p=0.000 n=16+17)
BM_UCordStringSink/6 653MB/s ± 8% 653MB/s ± 3% -0.12% (p=0.007 n=17+19)
BM_UCordStringSink/7 618MB/s ±13% 597MB/s ±18% -3.45% (p=0.001 n=18+18)
BM_UCordStringSink/8 702MB/s ± 5% 702MB/s ± 1% -0.10% (p=0.012 n=17+16)
BM_UCordStringSink/9 590MB/s ± 2% 564MB/s ±13% -4.46% (p=0.000 n=16+17)
BM_UCordStringSink/10 1.63GB/s ± 2% 1.76GB/s ± 4% +8.28% (p=0.000 n=17+16)
BM_UCordStringSink/11 630MB/s ±14% 684MB/s ±15% +8.51% (p=0.000 n=19+17)
BM_UCordStringSink/12 858MB/s ±12% 903MB/s ± 9% +5.17% (p=0.000 n=19+17)
BM_UCordStringSink/13 806MB/s ±22% 879MB/s ± 1% +8.98% (p=0.000 n=19+19)
BM_UCordStringSink/14 854MB/s ±13% 901MB/s ± 5% +5.60% (p=0.000 n=19+17)
BM_UCordStringSink/15 930MB/s ± 2% 964MB/s ± 3% +3.59% (p=0.000 n=16+16)
BM_UCordStringSink/16 363MB/s ±10% 356MB/s ± 6% ~ (p=0.050 n=20+19)
BM_UCordStringSink/17 976MB/s ±12% 1078MB/s ± 1% +10.52% (p=0.000 n=20+17)
BM_UCordStringSink/18 227MB/s ± 1% 355MB/s ± 3% +56.45% (p=0.000 n=16+17)
BM_UCordStringSink/19 751MB/s ± 4% 808MB/s ± 4% +7.70% (p=0.000 n=18+17)
BM_UCordStringSink/20 761MB/s ± 8% 786MB/s ± 4% +3.23% (p=0.000 n=18+17)
Behzad Nouri [Mon, 28 Nov 2016 16:49:41 +0000 (08:49 -0800)]
adds std:: to stl types (#061)
Geoff Pike [Wed, 6 Jul 2016 15:24:48 +0000 (08:24 -0700)]
Re-work fast path for handling copies in zippy decompression.
This is a performance-tuning change that shouldn't change the behavior
of the library.
This adds some complexity but the performance gain might make that
worthwhile: With FDO on perflab/haswell, a 4.0% gain (geometric mean).
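For reference, a geometric mean over per-benchmark speed ratios can be
computed as in the sketch below. The helper is hypothetical and is not the
tooling used to produce the figure above; it assumes a non-empty list of
positive ratios (new speed divided by old speed).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: geometric mean of positive ratios, computed in log
// space to avoid overflow from multiplying many values together.
inline double GeometricMean(const std::vector<double>& ratios) {
  assert(!ratios.empty());
  double log_sum = 0.0;
  for (double r : ratios) {
    assert(r > 0.0);
    log_sum += std::log(r);
  }
  return std::exp(log_sum / static_cast<double>(ratios.size()));
}
```

Under this convention, a 4.0% gain corresponds to a geometric mean ratio of
about 1.04.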
SAMPLE (before)
Benchmark Time(ns) CPU(ns) Iterations
------------------------------------------------
BM_UFlat/0 36638 36552 100000 2.6GB/s html
BM_UFlat/1 457153 455895 9173 1.4GB/s urls
BM_UFlat/2 5850 5837 685481 19.6GB/s jpg
BM_UFlat/3 122 122 34551988 1.5GB/s jpg_200
BM_UFlat/4 6797 6781 620811 14.1GB/s pdf
BM_UFlat/5 179485 179037 23471 2.1GB/s html4
BM_UFlat/6 142734 142384 29525 1018.7MB/s txt1
BM_UFlat/7 125233 124924 33709 955.6MB/s txt2
BM_UFlat/8 382548 381533 10000 1066.7MB/s txt3
BM_UFlat/9 525614 524297 8018 876.5MB/s txt4
BM_UFlat/10 34946 34868 100000 3.2GB/s pb
BM_UFlat/11 149548 149208 28063 1.2GB/s gaviota
BM_UFlat/12 10684 10663 392580 2.1GB/s cp
BM_UFlat/13 5494 5484 766584 1.9GB/s c
BM_UFlat/14 1691 1688 2488784 2.1GB/s lsp
BM_UFlat/15 676443 674726 6129 1.4GB/s xls
BM_UFlat/16 156 156 26656909 1.2GB/s xls_200
BM_UFlat/17 239911 239297 17558 2.0GB/s bin
BM_UFlat/18 182 182 23072932 1047.9MB/s bin_200
BM_UFlat/19 21544 21499 194484 1.7GB/s sum
BM_UFlat/20 2236 2232 1877810 1.8GB/s man
BM_UFlatSink/0 42266 42179 99732 2.3GB/s html
BM_UFlatSink/1 461810 460633 9055 1.4GB/s urls
BM_UFlatSink/2 5816 5804 632829 19.8GB/s jpg
BM_UFlatSink/3 124 123 34351698 1.5GB/s jpg_200
BM_UFlatSink/4 7173 7157 609929 13.3GB/s pdf
BM_UFlatSink/5 184795 184302 22660 2.1GB/s html4
BM_UFlatSink/6 143552 143223 29272 1012.7MB/s txt1
BM_UFlatSink/7 127160 126890 33178 940.8MB/s txt2
BM_UFlatSink/8 382219 381313 10000 1067.3MB/s txt3
BM_UFlatSink/9 528042 526713 7988 872.5MB/s txt4
BM_UFlatSink/10 41389 41305 100000 2.7GB/s pb
BM_UFlatSink/11 147215 146877 28854 1.2GB/s gaviota
BM_UFlatSink/12 12008 11984 348139 1.9GB/s cp
BM_UFlatSink/13 5444 5433 775084 1.9GB/s c
BM_UFlatSink/14 1647 1644 2552119 2.1GB/s lsp
BM_UFlatSink/15 665011 663424 6320 1.4GB/s xls
BM_UFlatSink/16 153 153 27571837 1.2GB/s xls_200
BM_UFlatSink/17 239735 239169 17411 2.0GB/s bin
BM_UFlatSink/18 183 182 23005573 1046.8MB/s bin_200
BM_UFlatSink/19 22544 22498 187705 1.6GB/s sum
BM_UFlatSink/20 2190 2186 1917894 1.8GB/s man
SAMPLE (after)
Benchmark Time(ns) CPU(ns) Iterations
------------------------------------------------
BM_UFlat/0 33940 33889 100000 2.8GB/s html
BM_UFlat/1 440728 439944 9586 1.5GB/s urls
BM_UFlat/2 5652 5641 744776 20.3GB/s jpg
BM_UFlat/3 123 123 34647884 1.5GB/s jpg_200
BM_UFlat/4 6628 6615 631892 14.4GB/s pdf
BM_UFlat/5 169523 169227 24197 2.3GB/s html4
BM_UFlat/6 144139 143892 29232 1008.0MB/s txt1
BM_UFlat/7 127148 126915 33144 940.6MB/s txt2
BM_UFlat/8 380267 379233 10000 1073.2MB/s txt3
BM_UFlat/9 529495 528194 7957 870.0MB/s txt4
BM_UFlat/10 31844 31784 100000 3.5GB/s pb
BM_UFlat/11 146822 146476 28737 1.2GB/s gaviota
BM_UFlat/12 10784 10762 392176 2.1GB/s cp
BM_UFlat/13 5528 5518 760934 1.9GB/s c
BM_UFlat/14 1721 1719 2449291 2.0GB/s lsp
BM_UFlat/15 673304 671774 6255 1.4GB/s xls
BM_UFlat/16 155 155 27092003 1.2GB/s xls_200
BM_UFlat/17 230424 229902 18285 2.1GB/s bin
BM_UFlat/18 185 184 22818199 1033.9MB/s bin_200
BM_UFlat/19 21035 20996 200765 1.7GB/s sum
BM_UFlat/20 2242 2238 1864380 1.8GB/s man
BM_UFlatSink/0 33487 33405 100000 2.9GB/s html
BM_UFlatSink/1 431108 430226 9764 1.5GB/s urls
BM_UFlatSink/2 5927 5916 648112 19.4GB/s jpg
BM_UFlatSink/3 123 122 34704423 1.5GB/s jpg_200
BM_UFlatSink/4 6472 6461 653462 14.8GB/s pdf
BM_UFlatSink/5 164309 163988 25567 2.3GB/s html4
BM_UFlatSink/6 138274 138020 30311 1050.9MB/s txt1
BM_UFlatSink/7 120844 120637 34708 989.6MB/s txt2
BM_UFlatSink/8 371046 370366 10000 1098.9MB/s txt3
BM_UFlatSink/9 510021 508982 8269 902.9MB/s txt4
BM_UFlatSink/10 30889 30844 100000 3.6GB/s pb
BM_UFlatSink/11 140752 140521 29903 1.2GB/s gaviota
BM_UFlatSink/12 10162 10146 413600 2.3GB/s cp
BM_UFlatSink/13 5264 5256 762398 2.0GB/s c
BM_UFlatSink/14 1622 1619 2606069 2.1GB/s lsp
BM_UFlatSink/15 646897 645756 6512 1.5GB/s xls
BM_UFlatSink/16 150 150 28223595 1.2GB/s xls_200
BM_UFlatSink/17 226096 225650 18629 2.1GB/s bin
BM_UFlatSink/18 185 184 22907935 1035.3MB/s bin_200
BM_UFlatSink/19 21369 21335 198881 1.7GB/s sum
BM_UFlatSink/20 2139 2136 1953637 1.8GB/s man
Sriraman Tallam [Wed, 29 Jun 2016 17:08:46 +0000 (10:08 -0700)]
Speed up Zippy decompression in PIE mode by removing the penalty for
global array access.
With PIE, accessing global arrays needs two instructions whereas it can be
done with a single instruction without PIE. See []
For example, without PIE the access looks like:
mov 0x400780(,%rdi,4),%eax // One instruction to access arr[i]
and with PIE the access looks like:
lea 0x149(%rip),%rax # 400780 <_ZL3arr>
mov (%rax,%rdi,4),%eax
This causes a slowdown in zippy as it has two global arrays, wordmask and
char_table. With PIE there is no equivalent PC-relative instruction that does
this in one instruction.
The slowdown can be seen as an increase in dynamic instruction count and
cycles with a similar IPC. We have seen this affect REDACTED recently, where it
is causing a ~1% performance slowdown.
One mitigation technique for small arrays is to move the array onto the stack
and use the stack pointer to make the access a single instruction. The downside
is the extra instructions on each function call to copy the array onto the stack,
which is why we want to do this only for small arrays. I tried moving
wordmask onto the stack since it is a small array. The performance numbers look
good overall. There is an improvement in the dynamic instruction count for
almost all BM_UFlat benchmarks. BM_UFlat/2 and BM_UFlat/3 are pretty noisy.
The only case where there is a regression is BM_UFlat/10. Here, the instruction
count does go down but the IPC also goes down affecting performance. This also
looks noisy but I do see a small IPC drop with this change. Otherwise, the
numbers look good and consistent. I measured this on a perflab ivybridge
machine multiple times. Numbers are given below. For Improv. (improvements),
positive is good.
Binaries built as: blaze build -c opt --dynamic_mode=off
Benchmark Base CPU(ns) Opt CPU(ns) Improv. Base Cycles Opt Cycles Improv. Base Insns Opt Insns Improv.
BM_UFlat/1 541711 537052 0.86% 46068129918 45442732684 1.36% 85113352848 83917656016 1.40%
BM_UFlat/2 6228 6388 -2.57% 582789808 583267855 -0.08% 1261517746 1261116553 0.03%
BM_UFlat/3 159 120 24.53% 61538641 58783800 4.48% 90008672 90980060 -1.08%
BM_UFlat/4 7878 7787 1.16% 710491888 703718556 0.95% 1914898283 1525060250 20.36%
BM_UFlat/5 208854 207673 0.57% 17640846255 17609530720 0.18% 36546983483 36008920788 1.47%
BM_UFlat/6 172595 167225 3.11% 14642082831 14232371166 2.80% 33647820489 33056659600 1.76%
BM_UFlat/7 152364 147901 2.93% 12904338645 12635220582 2.09% 28958390984 28457982504 1.73%
BM_UFlat/8 463764 448244 3.35% 39423576973 37917435891 3.82% 88350964483 86800265943 1.76%
BM_UFlat/9 639517 621811 2.77% 54275945823 52555988926 3.17% 119503172410 117432599704 1.73%
BM_UFlat/10 41929 42358 -1.02% 3593125535 3647231492 -1.51% 8559206066 8446526639 1.32%
BM_UFlat/11 174754 173936 0.47% 14885371426 14749410955 0.91% 36693421142 35987215897 1.92%
BM_UFlat/12 13388 13257 0.98% 1192648670 1179645044 1.09% 3506482177 3454962579 1.47%
BM_UFlat/13 6801 6588 3.13% 627960003 608367286 3.12% 1847877894 1818368400 1.60%
BM_UFlat/14 2057 1989 3.31% 229005588 217393157 5.07% 609686274 599419511 1.68%
BM_UFlat/15 831618 799881 3.82% 70440388955 67911853013 3.59% 167178603105 164653652416 1.51%
BM_UFlat/16 199 199 0.00% 70109081 68747579 1.94% 106263639 105569531 0.65%
BM_UFlat/17 279031 273890 1.84% 23361373312 23294246637 0.29% 40474834585 39981682217 1.22%
BM_UFlat/18 233 199 14.59% 74530664 67841101 8.98% 94305848 92271053 2.16%
BM_UFlat/19 26743 25309 5.36% 2327215133 2206712016 5.18% 6024314357 5935228694 1.48%
BM_UFlat/20 2731 2625 3.88% 282018757 276772813 1.86% 768382519 758277029 1.32%
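The stack-copy mitigation described in this commit can be sketched roughly as
follows. This is only an illustration, not the code from the change: the table
contents and the names kWordMask, MaskGlobal and SumMaskedStack are
hypothetical, and whether the compiler really emits a single stack-relative
load depends on the compiler and its flags.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// File-scope table, similar in spirit to the wordmask array named above.
// The values are illustrative: masks that keep the low 0..4 bytes of a word.
static const uint32_t kWordMask[] = {
    0u, 0xffu, 0xffffu, 0xffffffu, 0xffffffffu};

// Baseline: index the global table directly. In a PIE build the compiler
// typically has to form the table address first and then index it.
uint32_t MaskGlobal(uint32_t word, size_t num_bytes) {
  return word & kWordMask[num_bytes];
}

// Mitigation sketch: copy the small table into a local array once per call,
// then index the local copy inside the hot loop. Every lengths[i] must be
// in the range 0..4.
uint32_t SumMaskedStack(const uint32_t* words, const uint8_t* lengths,
                        size_t n) {
  uint32_t local_mask[5];
  std::memcpy(local_mask, kWordMask, sizeof(local_mask));
  uint32_t sum = 0;
  for (size_t i = 0; i < n; ++i) {
    sum += words[i] & local_mask[lengths[i]];
  }
  return sum;
}
```

The copy costs a few instructions per call, so it only pays off when the table
is small and the function does enough lookups to amortize it.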
Is this a reasonable work-around for the problem? Do you need more performance
measurements? haih@ is evaluating this change for [] and I will update those
numbers once we have it.
Tested:
Performance with zippy_unittest.
Geoff Pike [Tue, 28 Jun 2016 18:53:11 +0000 (11:53 -0700)]
Re-work fast path that emits copies in zippy compression.
The primary motivation for the change is that FindMatchLength is
likely to discover a difference in the first 8 bytes it compares.
If that occurs then we know the length of the match is less than 12,
because FindMatchLength is invoked after a 4-byte match is found.
When emitting a copy, it is useful to know that the length is less
than 12 because the two-byte variant of an emitted copy requires that.
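To illustrate why the "less than 12" fact matters: a 4-byte match followed by
a difference within the next 8 bytes gives a length of at most 4 + 7 = 11. The
sketch below shows how a copy emitter might pick between the two-byte and
three-byte encodings. It assumes the tag layout from the published format
description (two-byte form: lengths 4..11 and offsets below 2048). The function
name EmitCopySketch is hypothetical and this is not the library's emitter.

```cpp
#include <cstddef>
#include <cstdint>

// Writes one copy element for 4 <= len <= 64 and offset < 65536 into `out`
// and returns the number of bytes written. Hypothetical helper.
size_t EmitCopySketch(uint8_t* out, size_t offset, size_t len) {
  if (len < 12 && offset < 2048) {
    // Two-byte form: tag bits 01, 3 bits of (len - 4), 11 bits of offset.
    out[0] = static_cast<uint8_t>(0x01 | ((len - 4) << 2) |
                                  ((offset >> 8) << 5));
    out[1] = static_cast<uint8_t>(offset & 0xff);
    return 2;
  }
  // Three-byte form: tag bits 10, 6 bits of (len - 1), then the offset as a
  // 16-bit little-endian integer.
  out[0] = static_cast<uint8_t>(0x02 | ((len - 1) << 2));
  out[1] = static_cast<uint8_t>(offset & 0xff);
  out[2] = static_cast<uint8_t>(offset >> 8);
  return 3;
}
```

If the caller already knows that len < 12, the length half of the first test is
settled before the emitter runs, which is the kind of shortcut the reworked
fast path can exploit.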
This is a performance-tuning change that should not affect the
library's behavior.
With FDO on perflab/Haswell the geometric mean for ZFlat/* went from
47,290ns to 45,741ns, an improvement of 3.4%.
SAMPLE (before)
BM_ZFlat/0 102824 102650 40691 951.4MB/s html (22.31 %)
BM_ZFlat/1 1293512 1290442 3225 518.9MB/s urls (47.78 %)
BM_ZFlat/2 10373 10353 417959 11.1GB/s jpg (99.95 %)
BM_ZFlat/3 268 268 15745324 712.4MB/s jpg_200 (73.00 %)
BM_ZFlat/4 12137 12113 342462 7.9GB/s pdf (83.30 %)
BM_ZFlat/5 430672 429720 9724 909.0MB/s html4 (22.52 %)
BM_ZFlat/6 420541 419636 9833 345.6MB/s txt1 (57.88 %)
BM_ZFlat/7 373829 373158 10000 319.9MB/s txt2 (61.91 %)
BM_ZFlat/8 1119014 1116604 3755 364.5MB/s txt3 (54.99 %)
BM_ZFlat/9 1544203 1540657 2748 298.3MB/s txt4 (66.26 %)
BM_ZFlat/10 91041 90866 46002 1.2GB/s pb (19.68 %)
BM_ZFlat/11 332766 331990 10000 529.5MB/s gaviota (37.72 %)
BM_ZFlat/12 39960 39886 100000 588.3MB/s cp (48.12 %)
BM_ZFlat/13 14493 14465 287181 735.1MB/s c (42.47 %)
BM_ZFlat/14 4447 4440 947927 799.3MB/s lsp (48.37 %)
BM_ZFlat/15 1316362 1313350 3196 747.7MB/s xls (41.23 %)
BM_ZFlat/16 312 311 10000000 613.0MB/s xls_200 (78.00 %)
BM_ZFlat/17 388471 387502 10000 1.2GB/s bin (18.11 %)
BM_ZFlat/18 65 64 64838208 2.9GB/s bin_200 (7.50 %)
BM_ZFlat/19 65900 65787 63099 554.3MB/s sum (48.96 %)
BM_ZFlat/20 6188 6177 681951 652.6MB/s man (59.21 %)
SAMPLE (after)
Benchmark Time(ns) CPU(ns) Iterations
--------------------------------------------
BM_ZFlat/0 99259 99044 42428 986.0MB/s html (22.31 %)
BM_ZFlat/1 1257039 1255276 3341 533.4MB/s urls (47.78 %)
BM_ZFlat/2 10044 10030 405781 11.4GB/s jpg (99.95 %)
BM_ZFlat/3 268 267 15732282 713.3MB/s jpg_200 (73.00 %)
BM_ZFlat/4 11675 11657 358629 8.2GB/s pdf (83.30 %)
BM_ZFlat/5 420951 419818 9739 930.5MB/s html4 (22.52 %)
BM_ZFlat/6 415460 414632 10000 349.8MB/s txt1 (57.88 %)
BM_ZFlat/7 367191 366436 10000 325.8MB/s txt2 (61.91 %)
BM_ZFlat/8 1098345 1096036 3819 371.3MB/s txt3 (54.99 %)
BM_ZFlat/9 1508701 1505306 2758 305.3MB/s txt4 (66.26 %)
BM_ZFlat/10 87195 87031 47289 1.3GB/s pb (19.68 %)
BM_ZFlat/11 322338 321637 10000 546.5MB/s gaviota (37.72 %)
BM_ZFlat/12 36739 36668 100000 639.9MB/s cp (48.12 %)
BM_ZFlat/13 13646 13618 304009 780.9MB/s c (42.47 %)
BM_ZFlat/14 4249 4240 992456 837.0MB/s lsp (48.37 %)
BM_ZFlat/15 1262925 1260012 3314 779.4MB/s xls (41.23 %)
BM_ZFlat/16 308 308 10000000 619.8MB/s xls_200 (78.00 %)
BM_ZFlat/17 379750 378944 10000 1.3GB/s bin (18.11 %)
BM_ZFlat/18 62 62 67443280 3.0GB/s bin_200 (7.50 %)
BM_ZFlat/19 61706 61587 67645 592.1MB/s sum (48.96 %)
BM_ZFlat/20 5968 5958 698974 676.6MB/s man (59.21 %)
ckennelly [Mon, 27 Jun 2016 12:01:31 +0000 (05:01 -0700)]
Speed up the EmitLiteral fast path, +1.62% for ZFlat benchmarks.
This is inspired by the Go version in
//third_party/golang/snappy/encode_amd64.s (emitLiteralFastPath)
Benchmark Base:Reference (1)
--------------------------------------------------
(BM_ZFlat_0 1/cputime_ns) 9.669e-06 +1.65%
(BM_ZFlat_1 1/cputime_ns) 7.643e-07 +2.53%
(BM_ZFlat_10 1/cputime_ns) 1.107e-05 -0.97%
(BM_ZFlat_11 1/cputime_ns) 3.002e-06 +0.71%
(BM_ZFlat_12 1/cputime_ns) 2.338e-05 +7.22%
(BM_ZFlat_13 1/cputime_ns) 6.386e-05 +9.18%
(BM_ZFlat_14 1/cputime_ns) 0.0002256 -0.05%
(BM_ZFlat_15 1/cputime_ns) 7.608e-07 -1.29%
(BM_ZFlat_16 1/cputime_ns) 0.003236 -1.28%
(BM_ZFlat_17 1/cputime_ns) 2.58e-06 +0.52%
(BM_ZFlat_18 1/cputime_ns) 0.01538 +0.00%
(BM_ZFlat_19 1/cputime_ns) 1.436e-05 +6.21%
(BM_ZFlat_2 1/cputime_ns) 0.0001044 +4.99%
(BM_ZFlat_20 1/cputime_ns) 0.0001608 -0.18%
(BM_ZFlat_3 1/cputime_ns) 0.003745 +0.38%
(BM_ZFlat_4 1/cputime_ns) 8.144e-05 +6.21%
(BM_ZFlat_5 1/cputime_ns) 2.328e-06 -1.60%
(BM_ZFlat_6 1/cputime_ns) 2.391e-06 +0.06%
(BM_ZFlat_7 1/cputime_ns) 2.68e-06 -0.61%
(BM_ZFlat_8 1/cputime_ns) 8.852e-07 +0.19%
(BM_ZFlat_9 1/cputime_ns) 6.441e-07 +1.06%
geometric mean +1.62%
Geoff Pike [Fri, 24 Jun 2016 18:56:17 +0000 (11:56 -0700)]
Speed up zippy decompression by removing some zero-extensions.
This is a performance tuning change that should not affect
correctness. On perflab with FDO on Haswell the performance gain is
21,776ns before vs 21,255ns after, about 2.4%. (Using geometric means.)
SAMPLE PERFORMANCE with FDO on HASWELL (NEW)
Benchmark Time(ns) CPU(ns) Iterations
------------------------------------------------
BM_UFlat/0 37366 37279 100000 2.6GB/s html
BM_UFlat/1 471153 470204 8975 1.4GB/s urls
BM_UFlat/2 6116 6105 639496 18.8GB/s jpg
BM_UFlat/3 123 123 34709908 1.5GB/s jpg_200
BM_UFlat/4 6724 6714 623318 14.2GB/s pdf
BM_UFlat/5 183122 182722 23138 2.1GB/s html4
BM_UFlat/6 144981 144689 29384 1002.5MB/s txt1
BM_UFlat/7 125939 125691 33423 949.8MB/s txt2
BM_UFlat/8 383101 382241 10000 1064.7MB/s txt3
BM_UFlat/9 527824 526606 7958 872.6MB/s txt4
BM_UFlat/10 34849 34790 100000 3.2GB/s pb
BM_UFlat/11 150213 149937 28131 1.1GB/s gaviota
BM_UFlat/12 10850 10830 393231 2.1GB/s cp
BM_UFlat/13 5532 5523 735739 1.9GB/s c
BM_UFlat/14 1698 1695 2478035 2.0GB/s lsp
BM_UFlat/15 678396 676917 6200 1.4GB/s xls
BM_UFlat/16 155 155 26909789 1.2GB/s xls_200
BM_UFlat/17 241235 240698 17416 2.0GB/s bin
BM_UFlat/18 183 183 23000841 1043.5MB/s bin_200
BM_UFlat/19 21461 21424 193275 1.7GB/s sum
BM_UFlat/20 2232 2228 1887191 1.8GB/s man
BM_UFlatSink/0 42272 42199 98528 2.3GB/s html
BM_UFlatSink/1 460814 459898 9092 1.4GB/s urls
BM_UFlatSink/2 5558 5547 768629 20.7GB/s jpg
BM_UFlatSink/3 124 123 33629141 1.5GB/s jpg_200
BM_UFlatSink/4 6634 6621 629989 14.4GB/s pdf
BM_UFlatSink/5 182883 182491 23030 2.1GB/s html4
BM_UFlatSink/6 143269 142964 29410 1014.5MB/s txt1
BM_UFlatSink/7 127041 126809 33136 941.4MB/s txt2
BM_UFlatSink/8 384367 383577 10000 1061.0MB/s txt3
BM_UFlatSink/9 529979 528890 7898 868.9MB/s txt4
BM_UFlatSink/10 41154 41075 100000 2.7GB/s pb
BM_UFlatSink/11 146446 146155 28742 1.2GB/s gaviota
BM_UFlatSink/12 11939 11918 352663 1.9GB/s cp
BM_UFlatSink/13 5430 5421 770451 1.9GB/s c
BM_UFlatSink/14 1665 1662 2538921 2.1GB/s lsp
BM_UFlatSink/15 666840 665617 6309 1.4GB/s xls
BM_UFlatSink/16 152 152 27639460 1.2GB/s xls_200
BM_UFlatSink/17 240076 239573 17643 2.0GB/s bin
BM_UFlatSink/18 183 182 23128210 1046.0MB/s bin_200
BM_UFlatSink/19 22570 22528 185839 1.6GB/s sum
BM_UFlatSink/20 2183 2180 1899526 1.8GB/s man
SAMPLE PERFORMANCE with FDO on HASWELL (OLD)
Benchmark Time(ns) CPU(ns) Iterations
------------------------------------------------
BM_UFlat/0 37041 36990 100000 2.6GB/s html
BM_UFlat/1 471384 470574 8930 1.4GB/s urls
BM_UFlat/2 5997 5986 722354 19.2GB/s jpg
BM_UFlat/3 124 123 34964717 1.5GB/s jpg_200
BM_UFlat/4 6850 6838 621414 13.9GB/s pdf
BM_UFlat/5 182578 182271 23001 2.1GB/s html4
BM_UFlat/6 148338 147989 28132 980.1MB/s txt1
BM_UFlat/7 130682 130471 32347 915.0MB/s txt2
BM_UFlat/8 397420 396553 10000 1026.3MB/s txt3
BM_UFlat/9 550126 548872 7736 837.2MB/s txt4
BM_UFlat/10 35013 34958 100000 3.2GB/s pb
BM_UFlat/11 152270 151889 27508 1.1GB/s gaviota
BM_UFlat/12 11117 11096 379059 2.1GB/s cp
BM_UFlat/13 5812 5801 725240 1.8GB/s c
BM_UFlat/14 1780 1777 2383982 2.0GB/s lsp
BM_UFlat/15 707871 706139 5946 1.4GB/s xls
BM_UFlat/16 157 157 26889747 1.2GB/s xls_200
BM_UFlat/17 239160 238556 17512 2.0GB/s bin
BM_UFlat/18 181 180 23326040 1057.5MB/s bin_200
BM_UFlat/19 22706 22656 186285 1.6GB/s sum
BM_UFlat/20 2319 2315 1813186 1.7GB/s man
BM_UFlatSink/0 42657 42574 99000 2.2GB/s html
BM_UFlatSink/1 466316 465262 9036 1.4GB/s urls
BM_UFlatSink/2 6873 6859 648525 16.7GB/s jpg
BM_UFlatSink/3 124 124 34434643 1.5GB/s jpg_200
BM_UFlatSink/4 6804 6790 624282 14.0GB/s pdf
BM_UFlatSink/5 185468 185062 22746 2.1GB/s html4
BM_UFlatSink/6 148511 148209 28284 978.6MB/s txt1
BM_UFlatSink/7 130865 130607 32144 914.0MB/s txt2
BM_UFlatSink/8 393931 392983 10000 1035.6MB/s txt3
BM_UFlatSink/9 545548 544275 7740 844.3MB/s txt4
BM_UFlatSink/10 41659 41584 100000 2.7GB/s pb
BM_UFlatSink/11 152062 151721 27854 1.1GB/s gaviota
BM_UFlatSink/12 11987 11968 350909 1.9GB/s cp
BM_UFlatSink/13 5652 5641 743280 1.8GB/s c
BM_UFlatSink/14 1728 1725 2446140 2.0GB/s lsp
BM_UFlatSink/15 687879 686231 6138 1.4GB/s xls
BM_UFlatSink/16 155 155 27254484 1.2GB/s xls_200
BM_UFlatSink/17 240689 240083 17450 2.0GB/s bin
BM_UFlatSink/18 183 182 22932858 1046.8MB/s bin_200
BM_UFlatSink/19 22718 22674 185207 1.6GB/s sum
BM_UFlatSink/20 2272 2268 1851664 1.7GB/s man
ckennelly [Thu, 26 May 2016 21:51:33 +0000 (14:51 -0700)]
Avoid calling memset when resizing the buffer.
This buffer will be initialized and then trimmed down to size during the
compression phase.
Steinar H. Gunderson [Mon, 23 May 2016 09:16:01 +0000 (11:16 +0200)]
Merge pull request #6 from deviance/provide-pkg-config-data
Provide pkg-config data
Peter Kasting [Mon, 16 May 2016 21:30:12 +0000 (14:30 -0700)]
Add #ifdef to guard against macro redefinition if this is included in another
Google project that also defines this.
Steinar H. Gunderson [Fri, 20 May 2016 09:27:09 +0000 (11:27 +0200)]
Merge pull request #13 from huachaohuang/patch-1
Allow to compile in nested packages.
Steinar H. Gunderson [Tue, 5 Apr 2016 09:50:26 +0000 (11:50 +0200)]
Make heuristic match skipping more aggressive.
This causes compression to be much faster on incompressible inputs
(such as the jpeg and pdf tests), and is neutral or even positive on the other
tests. The test set shows only microscopic density regressions; I attempted to
construct a worst-case test set containing ~1500 different cases of mixed
plaintext + /dev/urandom, and even those seemed to be only 0.38 percentage
points less dense on average (the single worst case was 87.8% -> 89.0%), which
we can live with given that this is already an edge case.
The original idea is by Klaus Post; I only tweaked the implementation.
Ironically, the new implementation is almost more in line with the
comment that was there, so I've left that largely alone, albeit
with a small modification.
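To give a feel for what "more aggressive" skipping means on incompressible
input, the sketch below counts hash-table probes over a run of bytes in which
no match is ever found, under two stride rules. Both rules are assumptions made
for illustration (the commit message does not spell them out), and the function
names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed "gentle" rule: the stride grows by one byte for every 32 failed
// probes. Returns the number of probes needed to walk n unmatched bytes.
size_t ProbesLinearSkip(size_t n) {
  size_t probes = 0;
  uint32_t skip = 32;
  for (size_t pos = 0; pos < n;) {
    const uint32_t stride = skip++ >> 5;
    pos += stride;
    ++probes;
  }
  return probes;
}

// Assumed "aggressive" rule: skip grows by the stride itself on every failed
// probe, so the stride grows roughly geometrically with the bytes skipped.
size_t ProbesAggressiveSkip(size_t n) {
  size_t probes = 0;
  uint32_t skip = 32;
  for (size_t pos = 0; pos < n;) {
    const uint32_t stride = skip >> 5;
    skip += stride;
    pos += stride;
    ++probes;
  }
  return probes;
}
```

Both rules probe every byte for the first 32 bytes, so short inputs behave the
same. Over a full 64 KiB of unmatched data the second rule needs far fewer
probes, in line with the large jpg and pdf gains reported below, while the
density cost comes from matches that fall inside the skipped bytes.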
Microbenchmark results (opt mode, 64-bit, static linking):
Ivy Bridge:
Benchmark Base (ns) New (ns) Improvement
----------------------------------------------------------------------------------------
BM_ZFlat/0 120284 115480 847.0MB/s html (22.31 %) +4.2%
BM_ZFlat/1 1527911 1522242 440.7MB/s urls (47.78 %) +0.4%
BM_ZFlat/2 17591 10582 10.9GB/s jpg (99.95 %) +66.2%
BM_ZFlat/3 323 322 593.3MB/s jpg_200 (73.00 %) +0.3%
BM_ZFlat/4 53691 14063 6.8GB/s pdf (83.30 %) +281.8%
BM_ZFlat/5 495442 492347 794.8MB/s html4 (22.52 %) +0.6%
BM_ZFlat/6 473523 473622 306.7MB/s txt1 (57.88 %) -0.0%
BM_ZFlat/7 421406 420120 284.5MB/s txt2 (61.91 %) +0.3%
BM_ZFlat/8 1265632 1270538 320.8MB/s txt3 (54.99 %) -0.4%
BM_ZFlat/9 1742688 1737894 264.8MB/s txt4 (66.26 %) +0.3%
BM_ZFlat/10 107950 103404 1095.1MB/s pb (19.68 %) +4.4%
BM_ZFlat/11 372660 371818 473.5MB/s gaviota (37.72 %) +0.2%
BM_ZFlat/12 53239 49528 474.4MB/s cp (48.12 %) +7.5%
BM_ZFlat/13 18940 17349 613.9MB/s c (42.47 %) +9.2%
BM_ZFlat/14 5155 5075 700.3MB/s lsp (48.37 %) +1.6%
BM_ZFlat/15 1474757 1474471 667.2MB/s xls (41.23 %) +0.0%
BM_ZFlat/16 363 362 528.0MB/s xls_200 (78.00 %) +0.3%
BM_ZFlat/17 453849 456931 1073.2MB/s bin (18.11 %) -0.7%
BM_ZFlat/18 90 87 2.1GB/s bin_200 (7.50 %) +3.4%
BM_ZFlat/19 82163 80498 453.7MB/s sum (48.96 %) +2.1%
BM_ZFlat/20 7174 7124 566.7MB/s man (59.21 %) +0.7%
Sum of all benchmarks 8694831 8623857 +0.8%
Sandy Bridge:
Benchmark Base (ns) New (ns) Improvement
----------------------------------------------------------------------------------------
BM_ZFlat/0 117426 112649 868.2MB/s html (22.31 %) +4.2%
BM_ZFlat/1 1517095 1498522 447.5MB/s urls (47.78 %) +1.2%
BM_ZFlat/2 18601 10649 10.8GB/s jpg (99.95 %) +74.7%
BM_ZFlat/3 359 356 536.0MB/s jpg_200 (73.00 %) +0.8%
BM_ZFlat/4 60249 13832 6.9GB/s pdf (83.30 %) +335.6%
BM_ZFlat/5 481246 475571 822.7MB/s html4 (22.52 %) +1.2%
BM_ZFlat/6 460541 455693 318.8MB/s txt1 (57.88 %) +1.1%
BM_ZFlat/7 407751 404147 295.8MB/s txt2 (61.91 %) +0.9%
BM_ZFlat/8 1228255 1222519 333.4MB/s txt3 (54.99 %) +0.5%
BM_ZFlat/9 1678299 1666379 276.2MB/s txt4 (66.26 %) +0.7%
BM_ZFlat/10 106499 101715 1113.4MB/s pb (19.68 %) +4.7%
BM_ZFlat/11 361913 360222 488.7MB/s gaviota (37.72 %) +0.5%
BM_ZFlat/12 53137 49618 473.6MB/s cp (48.12 %) +7.1%
BM_ZFlat/13 18801 17812 597.8MB/s c (42.47 %) +5.6%
BM_ZFlat/14 5394 5383 660.2MB/s lsp (48.37 %) +0.2%
BM_ZFlat/15 1435411 1432870 686.4MB/s xls (41.23 %) +0.2%
BM_ZFlat/16 389 395 483.3MB/s xls_200 (78.00 %) -1.5%
BM_ZFlat/17 447255 445510 1100.4MB/s bin (18.11 %) +0.4%
BM_ZFlat/18 86 86 2.2GB/s bin_200 (7.50 %) +0.0%
BM_ZFlat/19 82555 79512 459.3MB/s sum (48.96 %) +3.8%
BM_ZFlat/20 7527 7553 534.5MB/s man (59.21 %) -0.3%
Sum of all benchmarks 8488789 8360993 +1.5%
Haswell:
Benchmark Base (ns) New (ns) Improvement
----------------------------------------------------------------------------------------
BM_ZFlat/0 107512 105621 925.6MB/s html (22.31 %) +1.8%
BM_ZFlat/1 1344306 1332479 503.1MB/s urls (47.78 %) +0.9%
BM_ZFlat/2 14752 9471 12.1GB/s jpg (99.95 %) +55.8%
BM_ZFlat/3 287 275 694.0MB/s jpg_200 (73.00 %) +4.4%
BM_ZFlat/4 48810 12263 7.8GB/s pdf (83.30 %) +298.0%
BM_ZFlat/5 443013 442064 884.6MB/s html4 (22.52 %) +0.2%
BM_ZFlat/6 429239 432124 336.0MB/s txt1 (57.88 %) -0.7%
BM_ZFlat/7 381765 383681 311.5MB/s txt2 (61.91 %) -0.5%
BM_ZFlat/8 1136667 1154304 353.0MB/s txt3 (54.99 %) -1.5%
BM_ZFlat/9 1579925 1592431 288.9MB/s txt4 (66.26 %) -0.8%
BM_ZFlat/10 98345 92411 1.2GB/s pb (19.68 %) +6.4%
BM_ZFlat/11 340397 340466 516.8MB/s gaviota (37.72 %) -0.0%
BM_ZFlat/12 47076 43536 539.5MB/s cp (48.12 %) +8.1%
BM_ZFlat/13 16680 15637 680.8MB/s c (42.47 %) +6.7%
BM_ZFlat/14 4616 4539 782.6MB/s lsp (48.37 %) +1.7%
BM_ZFlat/15 1331231 1334094 736.9MB/s xls (41.23 %) -0.2%
BM_ZFlat/16 326 322 593.5MB/s xls_200 (78.00 %) +1.2%
BM_ZFlat/17 404383 400326 1.2GB/s bin (18.11 %) +1.0%
BM_ZFlat/18 69 69 2.7GB/s bin_200 (7.50 %) +0.0%
BM_ZFlat/19 74771 71348 511.7MB/s sum (48.96 %) +4.8%
BM_ZFlat/20 6461 6383 632.2MB/s man (59.21 %) +1.2%
Sum of all benchmarks 7810631 7773844 +0.5%
I've done a quick test that there are no performance regressions on external
GCC (4.9.2, Debian, Haswell, 64-bit), too.
Steinar H. Gunderson [Thu, 10 Mar 2016 17:37:05 +0000 (18:37 +0100)]
Default to glibtoolize instead of libtoolize if it exists,
and also make it customizable through the environment variable
$LIBTOOLIZE.
Fixes autogen.sh issues on OS X, which ships its own
(incompatible) libtoolize.
R=jeff
Steinar H. Gunderson [Fri, 8 Jan 2016 14:05:44 +0000 (15:05 +0100)]
Work around an issue where some compilers interpret <:: as a digraph.
Also correct the namespace name.
Steinar H. Gunderson [Fri, 8 Jan 2016 10:40:06 +0000 (11:40 +0100)]
Unbreak the open-source build for ARM due to missing ATTRIBUTE_PACKED
declaration.
Steinar H. Gunderson [Mon, 4 Jan 2016 11:52:15 +0000 (12:52 +0100)]
Fix an issue where the ByteSource path (used for parsing std::string)
would incorrectly accept some invalid varints that the other path would not,
causing potential CHECK-failures if the unit test were run with
--write_uncompressed and a corrupted input file.
Found by the afl fuzzer.
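For context, a strict 32-bit varint parser that rejects the kinds of invalid
input mentioned above might look like the sketch below. It is not the
library's parser: the name ParseVarint32Strict and its exact rejection rules
(truncation, more than five bytes, bits beyond 32) are assumptions chosen for
illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Parses a little-endian base-128 varint from data[0..size) into *value.
// Returns false for truncated input, for encodings longer than five bytes,
// and for a fifth byte that carries bits beyond the 32nd.
bool ParseVarint32Strict(const uint8_t* data, size_t size, uint32_t* value) {
  uint32_t result = 0;
  for (size_t i = 0; i < 5; ++i) {
    if (i >= size) return false;  // Ran out of input mid-varint.
    const uint32_t byte = data[i];
    const uint32_t payload = byte & 0x7f;
    if (i == 4 && payload > 0x0f) return false;  // Would overflow 32 bits.
    result |= payload << (7 * i);
    if ((byte & 0x80) == 0) {
      *value = result;
      return true;
    }
  }
  return false;  // Continuation bit still set on the fifth byte.
}
```

Having every input path share one such routine is one way to avoid the
situation described here, where two paths disagree about which encodings are
valid.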
Steinar H. Gunderson [Mon, 4 Jan 2016 11:51:31 +0000 (12:51 +0100)]
Make UNALIGNED_LOAD16/32 on ARMv7 go through an explicitly unaligned struct,
to avoid the compiler coalescing multiple loads into a single load instruction
(which only works for aligned accesses).
A typical example where GCC would coalesce:
uint8* p = ...;
uint32 a = UNALIGNED_LOAD32(p);
uint32 b = UNALIGNED_LOAD32(p + 4);
uint32 c = a | b;
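A minimal sketch of the "explicitly unaligned struct" idea, assuming a GCC or
Clang toolchain (the packed and may_alias attributes are compiler extensions).
The names Unaligned32Sketch, UnalignedLoad32Sketch and UnalignedLoad32Portable
are hypothetical and are not the macros from the library. A memcpy-based
variant, which is a different technique, is included for comparison.

```cpp
#include <cstdint>
#include <cstring>

// A packed struct has alignment 1, so the compiler may not assume that the
// address is 4-byte aligned when it reads `value`. may_alias lets the struct
// be used to read arbitrary bytes without violating aliasing rules.
struct __attribute__((packed, may_alias)) Unaligned32Sketch {
  uint32_t value;
};

// Loads 32 bits from a possibly unaligned address through the packed struct.
inline uint32_t UnalignedLoad32Sketch(const void* p) {
  return static_cast<const Unaligned32Sketch*>(p)->value;
}

// Portable alternative: memcpy into a local, which compilers usually turn
// into whatever load sequence is legal for the target.
inline uint32_t UnalignedLoad32Portable(const void* p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}
```

With either helper, two adjacent loads such as the ones in the example above
are expressed as accesses with no alignment guarantee, so the compiler has no
licence to merge them into an instruction that requires alignment.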
Huachao Huang [Wed, 28 Oct 2015 09:18:23 +0000 (17:18 +0800)]
Allow to compile in nested packages.
Steinar H. Gunderson [Wed, 26 Aug 2015 15:50:48 +0000 (17:50 +0200)]
Update URLs in the Snappy README to reflect the move to GitHub.
A=sesse
R=sanjay
Steinar H. Gunderson [Wed, 19 Aug 2015 09:37:51 +0000 (11:37 +0200)]
Move the logic from ComputeTable into the unit test, which means it's run
automatically together with the other tests, and also removes the stray
function ComputeTable() (which was never referenced by anything else
in the open-source version, causing compiler warnings for some)
out of the core library.
Fixes public issue 96.
A=sesse
R=sanjay
Steinar H. Gunderson [Mon, 3 Aug 2015 11:17:04 +0000 (13:17 +0200)]
Fix signed-vs.-unsigned comparison warnings.
These were found by compiling Chromium's external copy of this code with MSVC
with warning C4018 enabled.
A=pkasting
R=sanjay
Aleksandr Makarov [Fri, 31 Jul 2015 13:25:17 +0000 (16:25 +0300)]
Provide pkg-config data
Steinar H. Gunderson [Tue, 7 Jul 2015 08:45:04 +0000 (10:45 +0200)]
Release Snappy 1.1.3; getting the new Uncompress variant in a release is nice,
and it's also good to finally get an official release out after the migration
to GitHub.
The GitHub releases are basically done by tagging a commit and then uploading
the .tar.gz file generated by make dist as a binary asset; GitHub will add
all files on the tagged commit on top of the tarball and recompress, but since
we don't have any nodist_* files in configure.ac, this works fine for us.
(As far as I can see, this behavior of GitHub--uncompressing the .tar.gz,
and the behavior of silently ignoring files in it that are also in the git
repository--is undocumented, but also seems to be used in some official
screenshots, so I guess we can rely on it.)
A=sesse
R=jeff
Steinar H. Gunderson [Mon, 22 Jun 2015 14:10:47 +0000 (16:10 +0200)]
Initialized members of SnappyArrayWriter and SnappyDecompressionValidator.
These members were almost surely initialized before use by other member
functions, but Coverity was warning about this. Eliminating these warnings
minimizes clutter in that report and the likelihood of overlooking a real bug.
A=cmumford
R=jeff
Steinar H. Gunderson [Mon, 22 Jun 2015 14:03:28 +0000 (16:03 +0200)]
Add support for Uncompress(source, sink). Various changes to allow
Uncompress(source, sink) to get the same performance as the different
variants of Uncompress to Cord/DataBuffer/String/FlatBuffer.
Changes to efficiently support Uncompress(source, sink)
--------
a) For strings - we add support to StringByteSink to do GetAppendBuffer so we
can write to it without copying.
b) For flat array buffers, we do GetAppendBuffer and see if we can get a full buffer.
With the above changes we get performance with ByteSource/ByteSink
that is very close to directly using flat arrays and strings.
We add various benchmark cases to demonstrate that.
Orthogonal change
------------------
Add support for TryFastAppend() for SnappyScatteredWriter.
Benchmark results are below
CPU: Intel Core2 dL1:32KB dL2:4096KB
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------
BM_UFlat/0 109065 108996 6410 896.0MB/s html
BM_UFlat/1 1012175 1012343 691 661.4MB/s urls
BM_UFlat/2 26775 26771 26149 4.4GB/s jpg
BM_UFlat/3 48947 48940 14363 1.8GB/s pdf
BM_UFlat/4 441029 440835 1589 886.1MB/s html4
BM_UFlat/5 39861 39880 17823 588.3MB/s cp
BM_UFlat/6 18315 18300 38126 581.1MB/s c
BM_UFlat/7 5254 5254 100000 675.4MB/s lsp
BM_UFlat/8 1568060 1567376 447 626.6MB/s xls
BM_UFlat/9 337512 337734 2073 429.5MB/s txt1
BM_UFlat/10 287269 287054 2434 415.9MB/s txt2
BM_UFlat/11 890098 890219 787 457.2MB/s txt3
BM_UFlat/12 1186593 1186863 590 387.2MB/s txt4
BM_UFlat/13 573927 573318 1000 853.7MB/s bin
BM_UFlat/14 64250 64294 10000 567.2MB/s sum
BM_UFlat/15 7301 7300 96153 552.2MB/s man
BM_UFlat/16 109617 109636 6375 1031.5MB/s pb
BM_UFlat/17 364438 364497 1921 482.3MB/s gaviota
BM_UFlatSink/0 108518 108465 6450 900.4MB/s html
BM_UFlatSink/1 991952 991997 705 675.0MB/s urls
BM_UFlatSink/2 26815 26798 26065 4.4GB/s jpg
BM_UFlatSink/3 49127 49122 14255 1.8GB/s pdf
BM_UFlatSink/4 436674 436731 1604 894.4MB/s html4
BM_UFlatSink/5 39738 39733 17345 590.5MB/s cp
BM_UFlatSink/6 18413 18416 37962 577.4MB/s c
BM_UFlatSink/7 5677 5676 100000 625.2MB/s lsp
BM_UFlatSink/8 1552175 1551026 451 633.2MB/s xls
BM_UFlatSink/9 338526 338489 2065 428.5MB/s txt1
BM_UFlatSink/10 289387 289307 2420 412.6MB/s txt2
BM_UFlatSink/11 893803 893706 783 455.4MB/s txt3
BM_UFlatSink/12 1195919 1195459 586 384.4MB/s txt4
BM_UFlatSink/13 559637 559779 1000 874.3MB/s bin
BM_UFlatSink/14 65073 65094 10000 560.2MB/s sum
BM_UFlatSink/15 7618 7614 92823 529.5MB/s man
BM_UFlatSink/16 110085 110121 6352 1027.0MB/s pb
BM_UFlatSink/17 369196 368915 1896 476.5MB/s gaviota
BM_UValidate/0 46954 46957 14899 2.0GB/s html
BM_UValidate/1 500621 500868 1000 1.3GB/s urls
BM_UValidate/2 283 283 2481447 417.2GB/s jpg
BM_UValidate/3 16230 16228 43137 5.4GB/s pdf
BM_UValidate/4 189129 189193 3701 2.0GB/s html4
A=uday
R=sanjay
Steinar H. Gunderson [Mon, 22 Jun 2015 13:48:29 +0000 (15:48 +0200)]
Changes to eliminate compiler warnings on MSVC
This code was not compiling under Visual Studio 2013 with warnings being treated
as errors. Specifically:
1. Changed int -> size_t to eliminate signed/unsigned mismatch warning.
2. Added some missing return values to functions.
3. Inserting character instead of integer literals into strings to avoid type
conversions.
A=cmumford
R=jeff
Steinar H. Gunderson [Mon, 22 Jun 2015 13:45:11 +0000 (15:45 +0200)]
Fixed unit tests to compile under MSVC.
1. Including config.h in test.
2. Including windows.h before zippy-test.h.
3. Removed definition of WIN32_LEAN_AND_MEAN. This caused problems in
build environments that define WIN32_LEAN_AND_MEAN as our
definition didn't check for prior existence. This constant is old
and no longer needed anyhow.
4. Disable MSVC warning 4722 since ~LogMessageCrash() never returns.
A=cmumford
R=jeff
Steinar H. Gunderson [Mon, 22 Jun 2015 13:41:30 +0000 (15:41 +0200)]
Change a few branch annotations that profiling found to be wrong.
Overall performance is neutral or slightly positive.
Westmere (64-bit, opt):
Benchmark Base (ns) New (ns) Improvement
--------------------------------------------------------------------------------------
BM_UFlat/0 73798 71464 1.3GB/s html +3.3%
BM_UFlat/1 715223 704318 953.5MB/s urls +1.5%
BM_UFlat/2 8137 8871 13.0GB/s jpg -8.3%
BM_UFlat/3 200 204 935.5MB/s jpg_200 -2.0%
BM_UFlat/4 21627 21281 4.5GB/s pdf +1.6%
BM_UFlat/5 302806 290350 1.3GB/s html4 +4.3%
BM_UFlat/6 218920 219017 664.1MB/s txt1 -0.0%
BM_UFlat/7 190437 191212 626.1MB/s txt2 -0.4%
BM_UFlat/8 584192 580484 703.4MB/s txt3 +0.6%
BM_UFlat/9 776537 779055 591.6MB/s txt4 -0.3%
BM_UFlat/10 76056 72606 1.5GB/s pb +4.8%
BM_UFlat/11 235962 239043 737.4MB/s gaviota -1.3%
BM_UFlat/12 28049 28000 840.1MB/s cp +0.2%
BM_UFlat/13 12225 12021 886.9MB/s c +1.7%
BM_UFlat/14 3362 3544 1004.0MB/s lsp -5.1%
BM_UFlat/15 937015 939206 1048.9MB/s xls -0.2%
BM_UFlat/16 236 233 823.1MB/s xls_200 +1.3%
BM_UFlat/17 373170 361947 1.3GB/s bin +3.1%
BM_UFlat/18 264 264 725.5MB/s bin_200 +0.0%
BM_UFlat/19 42834 43577 839.2MB/s sum -1.7%
BM_UFlat/20 4770 4736 853.6MB/s man +0.7%
BM_UValidate/0 39671 39944 2.4GB/s html -0.7%
BM_UValidate/1 443391 443391 1.5GB/s urls +0.0%
BM_UValidate/2 163 163 703.3GB/s jpg +0.0%
BM_UValidate/3 113 112 1.7GB/s jpg_200 +0.9%
BM_UValidate/4 7555 7608 12.6GB/s pdf -0.7%
BM_ZFlat/0 157616 157568 621.5MB/s html (22.31 %) +0.0%
BM_ZFlat/1 1997290 2014486 333.4MB/s urls (47.77 %) -0.9%
BM_ZFlat/2 23035 22237 5.2GB/s jpg (99.95 %) +3.6%
BM_ZFlat/3 539 540 354.5MB/s jpg_200 (73.00 %) -0.2%
BM_ZFlat/4 80709 81369 1.2GB/s pdf (81.85 %) -0.8%
BM_ZFlat/5 639059 639220 613.0MB/s html4 (22.51 %) -0.0%
BM_ZFlat/6 577203 583370 249.3MB/s txt1 (57.87 %) -1.1%
BM_ZFlat/7 510887 516094 232.0MB/s txt2 (61.93 %) -1.0%
BM_ZFlat/8 1535843 1556973 262.2MB/s txt3 (54.92 %) -1.4%
BM_ZFlat/9 2070068 2102380 219.3MB/s txt4 (66.22 %) -1.5%
BM_ZFlat/10 152396 152148 745.5MB/s pb (19.64 %) +0.2%
BM_ZFlat/11 447367 445859 395.4MB/s gaviota (37.72 %) +0.3%
BM_ZFlat/12 76375 76797 306.3MB/s cp (48.12 %) -0.5%
BM_ZFlat/13 31518 31987 333.3MB/s c (42.40 %) -1.5%
BM_ZFlat/14 10598 10827 328.6MB/s lsp (48.37 %) -2.1%
BM_ZFlat/15 1782243 1802728 546.5MB/s xls (41.23 %) -1.1%
BM_ZFlat/16 526 539 355.0MB/s xls_200 (78.00 %) -2.4%
BM_ZFlat/17 598141 597311 822.1MB/s bin (18.11 %) +0.1%
BM_ZFlat/18 121 120 1.6GB/s bin_200 (7.50 %) +0.8%
BM_ZFlat/19 109981 112173 326.0MB/s sum (48.96 %) -2.0%
BM_ZFlat/20 14355 14575 277.4MB/s man (59.36 %) -1.5%
Sum of all benchmarks 33882722 33879325 +0.0%
Sandy Bridge (64-bit, opt):
Benchmark Base (ns) New (ns) Improvement
--------------------------------------------------------------------------------------
BM_UFlat/0 43764 41600 2.3GB/s html +5.2%
BM_UFlat/1 517990 507058 1.3GB/s urls +2.2%
BM_UFlat/2 6625 5529 20.8GB/s jpg +19.8%
BM_UFlat/3 154 155 1.2GB/s jpg_200 -0.6%
BM_UFlat/4 12795 11747 8.1GB/s pdf +8.9%
BM_UFlat/5 200335 193413 2.0GB/s html4 +3.6%
BM_UFlat/6 156574 156426 929.2MB/s txt1 +0.1%
BM_UFlat/7 137574 137464 870.4MB/s txt2 +0.1%
BM_UFlat/8 422551 421603 967.4MB/s txt3 +0.2%
BM_UFlat/9 577749 578985 795.6MB/s txt4 -0.2%
BM_UFlat/10 42329 39362 2.8GB/s pb +7.5%
BM_UFlat/11 170615 169751 1037.9MB/s gaviota +0.5%
BM_UFlat/12 12800 12719 1.8GB/s cp +0.6%
BM_UFlat/13 6585 6579 1.6GB/s c +0.1%
BM_UFlat/14 2066 2044 1.7GB/s lsp +1.1%
BM_UFlat/15 750861 746911 1.3GB/s xls +0.5%
BM_UFlat/16 188 192 996.0MB/s xls_200 -2.1%
BM_UFlat/17 271622 264333 1.8GB/s bin +2.8%
BM_UFlat/18 208 207 923.6MB/s bin_200 +0.5%
BM_UFlat/19 24667 24845 1.4GB/s sum -0.7%
BM_UFlat/20 2663 2662 1.5GB/s man +0.0%
BM_ZFlat/0 115173 115624 846.5MB/s html (22.31 %) -0.4%
BM_ZFlat/1 1530331 1537769 436.5MB/s urls (47.77 %) -0.5%
BM_ZFlat/2 17503 17013 6.8GB/s jpg (99.95 %) +2.9%
BM_ZFlat/3 385 385 496.3MB/s jpg_200 (73.00 %) +0.0%
BM_ZFlat/4 61753 61540 1.6GB/s pdf (81.85 %) +0.3%
BM_ZFlat/5 484806 483356 810.1MB/s html4 (22.51 %) +0.3%
BM_ZFlat/6 464143 467609 310.9MB/s txt1 (57.87 %) -0.7%
BM_ZFlat/7 410315 413319 289.5MB/s txt2 (61.93 %) -0.7%
BM_ZFlat/8 1244082 1249381 326.5MB/s txt3 (54.92 %) -0.4%
BM_ZFlat/9 1696914 1709685 269.4MB/s txt4 (66.22 %) -0.7%
BM_ZFlat/10 104148 103372 1096.7MB/s pb (19.64 %) +0.8%
BM_ZFlat/11 363522 359722 489.8MB/s gaviota (37.72 %) +1.1%
BM_ZFlat/12 47021 50095 469.3MB/s cp (48.12 %) -6.1%
BM_ZFlat/13 16888 16985 627.4MB/s c (42.40 %) -0.6%
BM_ZFlat/14 5496 5469 650.3MB/s lsp (48.37 %) +0.5%
BM_ZFlat/15 1460713 1448760 679.5MB/s xls (41.23 %) +0.8%
BM_ZFlat/16 387 393 486.8MB/s xls_200 (78.00 %) -1.5%
BM_ZFlat/17 457654 451462 1086.6MB/s bin (18.11 %) +1.4%
BM_ZFlat/18 97 87 2.1GB/s bin_200 (7.50 %) +11.5%
BM_ZFlat/19 77904 80924 451.7MB/s sum (48.96 %) -3.7%
BM_ZFlat/20 7648 7663 527.1MB/s man (59.36 %) -0.2%
Sum of all benchmarks 25493635 25482069 +0.0%
A=dehao
R=sesse
Steinar H. Gunderson [Mon, 22 Jun 2015 14:07:58 +0000 (16:07 +0200)]
Sync with various Google-internal changes.
Should not mean much for the open-source version.
Steinar H. Gunderson [Mon, 22 Jun 2015 13:39:08 +0000 (15:39 +0200)]
Change some internal path names.
This is mostly to sync up with some changes from Google's internal
repositories; it does not affect the open-source distribution in itself.
snappy.mirrorbot@gmail.com [Fri, 28 Feb 2014 11:18:07 +0000 (11:18 +0000)]
Release Snappy 1.1.2.
R=jeff
git-svn-id: https://snappy.googlecode.com/svn/trunk@84 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Wed, 19 Feb 2014 10:31:49 +0000 (10:31 +0000)]
Fix public issue 82: Stop distributing benchmark data files that have
unclear or unsuitable licensing.
In general, we replace the files we can with liberally licensed data,
and remove all the others (in particular all the parts of the Canterbury
corpus that are not clearly in the public domain). The replacements
do not always have the exact same characteristics as the original ones,
but they are more than good enough to be useful for benchmarking.
git-svn-id: https://snappy.googlecode.com/svn/trunk@83 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Fri, 25 Oct 2013 13:31:27 +0000 (13:31 +0000)]
Add support for padding in the Snappy framed format.
This is specifically motivated by DICOM's demands that embedded data
must be of an even number of bytes, but could in principle be used for
any sort of padding/alignment needed.
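As a rough illustration of how a writer might use such padding, the sketch below appends one padding chunk so that a stream's length becomes a multiple of a given alignment. It assumes the framing format's chunk layout of a one-byte chunk type followed by a three-byte little-endian length, with 0xfe as the padding chunk type; PadStreamTo and the constant names are hypothetical and not part of the Snappy API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed values from the framing format description: padding chunks use
// type 0xfe, and every chunk starts with a 4-byte header (type + 24-bit length).
constexpr uint8_t kPaddingChunkType = 0xfe;
constexpr size_t kChunkHeaderSize = 4;

// Hypothetical helper: appends a single padding chunk so that stream->size()
// becomes a multiple of `alignment`. Does nothing if it already is.
void PadStreamTo(std::vector<uint8_t>* stream, size_t alignment) {
  assert(alignment > 0);
  const size_t remainder = stream->size() % alignment;
  if (remainder == 0) return;

  // A chunk cannot be shorter than its header, so grow the padding by whole
  // multiples of the alignment until the header fits.
  size_t needed = alignment - remainder;
  while (needed < kChunkHeaderSize) needed += alignment;

  const size_t payload = needed - kChunkHeaderSize;
  assert(payload < (size_t{1} << 24));  // length field is 24 bits

  stream->push_back(kPaddingChunkType);
  stream->push_back(static_cast<uint8_t>(payload & 0xff));
  stream->push_back(static_cast<uint8_t>((payload >> 8) & 0xff));
  stream->push_back(static_cast<uint8_t>((payload >> 16) & 0xff));
  stream->insert(stream->end(), payload, uint8_t{0});  // padding bytes are zero
}
```

For the DICOM case (even length), a 7-byte stream would grow by a 5-byte chunk: the 4-byte header plus one zero byte.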
R=sanjay
git-svn-id: https://snappy.googlecode.com/svn/trunk@82 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Tue, 15 Oct 2013 15:21:31 +0000 (15:21 +0000)]
Release Snappy 1.1.1.
R=jeff
git-svn-id: https://snappy.googlecode.com/svn/trunk@81 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Tue, 13 Aug 2013 12:55:00 +0000 (12:55 +0000)]
Add autoconf tests for size_t and ssize_t. Sort-of resolves public issue 79;
it would solve the problem if MSVC typically used autoconf. However, it gives
a natural place (config.h) to put the typedef even for MSVC.
R=jsbell
git-svn-id: https://snappy.googlecode.com/svn/trunk@80 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Mon, 29 Jul 2013 11:06:44 +0000 (11:06 +0000)]
When we compare the number of bytes produced with the offset for a
backreference, make the signedness of the bytes produced clear,
by sticking it into a size_t. This avoids a signed/unsigned compare
warning from MSVC (public issue 71), and also is slightly clearer.
Since the line is now so long the explanatory comment about the -1u
trick has to go somewhere else anyway, I used the opportunity to
explain it in slightly more detail.
This is a purely stylistic change; the emitted assembler from GCC
is identical.
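The -1u trick mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: CopyOffsetIsValid is a hypothetical name, and the sketch assumes a 32-bit unsigned int. The idea is that subtracting 1u from an unsigned offset of zero wraps around to the largest unsigned value, so one comparison rejects both a zero offset and an offset that reaches back past the bytes produced so far.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if a backreference `offset` bytes back is
// valid when `produced` bytes have been written so far.
//
// offset - 1u is computed in unsigned arithmetic:
//   offset == 0        -> wraps to 0xFFFFFFFF, so the comparison fails
//   offset >  produced -> offset - 1 >= produced, so the comparison fails
//   otherwise          -> produced > offset - 1, so the comparison succeeds
// Keeping `produced` in a size_t makes both operands unsigned, which avoids a
// signed/unsigned comparison warning.
bool CopyOffsetIsValid(size_t produced, uint32_t offset) {
  return produced > offset - 1u;
}
```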
R=jeff
git-svn-id: https://snappy.googlecode.com/svn/trunk@79 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Sun, 30 Jun 2013 19:24:03 +0000 (19:24 +0000)]
In the fast path for decompressing literals, instead of checking
whether there's 16 bytes free and then checking right afterwards
(when having subtracted the literal size) that there are now
5 bytes free, just check once for 21 bytes. This skips a compare
and a branch; although it is easily predictable, it is still
a few cycles on a fast path that we would like to get rid of.
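A minimal sketch of the merged check, under assumptions: the function and constant names are hypothetical, and "bytes free" is modeled as a single avail count. Because the fast path only handles literals of at most 16 bytes, requiring 21 bytes up front guarantees that at least 5 remain after the literal is consumed, so the second compare and branch become unnecessary.

```cpp
#include <cassert>
#include <cstddef>

constexpr size_t kFastCopyBytes = 16;  // bytes the fast path copies at once
constexpr size_t kMaxTagBytes = 5;     // room needed afterwards for the next tag

// Old shape (hypothetical): check for 16 bytes, then re-check for 5 bytes
// after subtracting the literal length.
bool FastPathTwoChecks(size_t avail, size_t literal_len) {
  if (literal_len > kFastCopyBytes || avail < kFastCopyBytes) return false;
  return avail - literal_len >= kMaxTagBytes;
}

// New shape (hypothetical): a single check for 16 + 5 = 21 bytes.
bool FastPathOneCheck(size_t avail, size_t literal_len) {
  return literal_len <= kFastCopyBytes &&
         avail >= kFastCopyBytes + kMaxTagBytes;
}

// Verifies over a small range that whenever the single check passes, the
// two-step check would have passed as well.
bool OneCheckImpliesTwoChecks() {
  for (size_t avail = 0; avail <= 64; ++avail) {
    for (size_t len = 0; len <= 32; ++len) {
      if (FastPathOneCheck(avail, len) && !FastPathTwoChecks(avail, len)) {
        return false;
      }
    }
  }
  return true;
}
```

The single check is slightly stricter: with 20 bytes available and a 4-byte literal, the two-step version would take the fast path while the merged one falls back to the slow path.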
Benchmarking this yields very confusing results. On open-source
GCC 4.8.1 on Haswell, we get exactly the expected results; the
benchmarks where we hit the fast path for literals (in particular
the two HTML benchmarks and the protobuf benchmark) give very nice
speedups, and the others are not really affected.
However, benchmarks with Google's GCC branch on other hardware
are much less clear. It seems that we have a weak loss in some cases
(and the wins for the “typical” win cases are not nearly as clear),
but that it depends on microarchitecture and plain luck in how we run
the benchmark. Looking at the generated assembler, it seems that
the removal of the if causes other large-scale changes in how the
function is laid out, which makes it likely that this is just bad luck.
Thus, we should keep this change, even though its exact current impact is
unclear; it's a sensible change per se, and dropping it on the basis of
microoptimization for a given compiler (or even branch of a compiler)
would seem like a bad strategy in the long run.
Microbenchmark results (all in 64-bit, opt mode):
Nehalem, Google GCC:
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------------------
BM_UFlat/0 76747 75591 1.3GB/s html +1.5%
BM_UFlat/1 765756 757040 886.3MB/s urls +1.2%
BM_UFlat/2 10867 10893 10.9GB/s jpg -0.2%
BM_UFlat/3 124 131 1.4GB/s jpg_200 -5.3%
BM_UFlat/4 31663 31596 2.8GB/s pdf +0.2%
BM_UFlat/5 314162 308176 1.2GB/s html4 +1.9%
BM_UFlat/6 29668 29746 790.6MB/s cp -0.3%
BM_UFlat/7 12958 13386 796.4MB/s c -3.2%
BM_UFlat/8 3596 3682 966.0MB/s lsp -2.3%
BM_UFlat/9 1019193 1033493 953.3MB/s xls -1.4%
BM_UFlat/10 239 247 775.3MB/s xls_200 -3.2%
BM_UFlat/11 236411 240271 606.9MB/s txt1 -1.6%
BM_UFlat/12 206639 209768 571.2MB/s txt2 -1.5%
BM_UFlat/13 627803 635722 641.4MB/s txt3 -1.2%
BM_UFlat/14 845932 857816 538.2MB/s txt4 -1.4%
BM_UFlat/15 402107 391670 1.2GB/s bin +2.7%
BM_UFlat/16 283 279 683.6MB/s bin_200 +1.4%
BM_UFlat/17 46070 46815 781.5MB/s sum -1.6%
BM_UFlat/18 5053 5163 782.0MB/s man -2.1%
BM_UFlat/19 79721 76581 1.4GB/s pb +4.1%
BM_UFlat/20 251158 252330 697.5MB/s gaviota -0.5%
Sum of all benchmarks 4966150 4980396 -0.3%
Sandy Bridge, Google GCC:
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------------------
BM_UFlat/0 42850 42182 2.3GB/s html +1.6%
BM_UFlat/1 525660 515816 1.3GB/s urls +1.9%
BM_UFlat/2 7173 7283 16.3GB/s jpg -1.5%
BM_UFlat/3 92 91 2.1GB/s jpg_200 +1.1%
BM_UFlat/4 15147 14872 5.9GB/s pdf +1.8%
BM_UFlat/5 199936 192116 2.0GB/s html4 +4.1%
BM_UFlat/6 12796 12443 1.8GB/s cp +2.8%
BM_UFlat/7 6588 6400 1.6GB/s c +2.9%
BM_UFlat/8 2010 1951 1.8GB/s lsp +3.0%
BM_UFlat/9 761124 763049 1.3GB/s xls -0.3%
BM_UFlat/10 186 189 1016.1MB/s xls_200 -1.6%
BM_UFlat/11 159354 158460 918.6MB/s txt1 +0.6%
BM_UFlat/12 139732 139950 856.1MB/s txt2 -0.2%
BM_UFlat/13 429917 425027 961.7MB/s txt3 +1.2%
BM_UFlat/14 585255 587324 785.8MB/s txt4 -0.4%
BM_UFlat/15 276186 266173 1.8GB/s bin +3.8%
BM_UFlat/16 205 207 925.5MB/s bin_200 -1.0%
BM_UFlat/17 24925 24935 1.4GB/s sum -0.0%
BM_UFlat/18 2632 2576 1.5GB/s man +2.2%
BM_UFlat/19 40546 39108 2.8GB/s pb +3.7%
BM_UFlat/20 175803 168209 1048.9MB/s gaviota +4.5%
Sum of all benchmarks 3408117 3368361 +1.2%
Haswell, upstream GCC 4.8.1:
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------------------
BM_UFlat/0 46308 40641 2.3GB/s html +13.9%
BM_UFlat/1 513385 514706 1.3GB/s urls -0.3%
BM_UFlat/2 6197 6151 19.2GB/s jpg +0.7%
BM_UFlat/3 61 61 3.0GB/s jpg_200 +0.0%
BM_UFlat/4 13551 13429 6.5GB/s pdf +0.9%
BM_UFlat/5 198317 190243 2.0GB/s html4 +4.2%
BM_UFlat/6 14768 12560 1.8GB/s cp +17.6%
BM_UFlat/7 6453 6447 1.6GB/s c +0.1%
BM_UFlat/8 1991 1980 1.8GB/s lsp +0.6%
BM_UFlat/9 766947 770424 1.2GB/s xls -0.5%
BM_UFlat/10 170 169 1.1GB/s xls_200 +0.6%
BM_UFlat/11 164350 163554 888.7MB/s txt1 +0.5%
BM_UFlat/12 145444 143830 832.1MB/s txt2 +1.1%
BM_UFlat/13 437849 438413 929.2MB/s txt3 -0.1%
BM_UFlat/14 603587 605309 759.8MB/s txt4 -0.3%
BM_UFlat/15 249799 248067 1.9GB/s bin +0.7%
BM_UFlat/16 191 188 1011.4MB/s bin_200 +1.6%
BM_UFlat/17 26064 24778 1.4GB/s sum +5.2%
BM_UFlat/18 2620 2601 1.5GB/s man +0.7%
BM_UFlat/19 44551 37373 3.0GB/s pb +19.2%
BM_UFlat/20 165408 164584 1.0GB/s gaviota +0.5%
Sum of all benchmarks 3408011 3385508 +0.7%
git-svn-id: https://snappy.googlecode.com/svn/trunk@78 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Fri, 14 Jun 2013 21:42:26 +0000 (21:42 +0000)]
Make the two IncrementalCopy* functions take in an ssize_t instead of a len,
in order to avoid having to do 32-to-64-bit signed conversions on a hot path
during decompression. (Also fixes some MSVC warnings, mentioned in public
issue 75, but more of those remain.) They cannot be size_t because we expect
them to go negative and test for that.
This saves a few movzwl instructions, yielding ~2% speedup in decompression.
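To show why the length has to be signed, here is a simplified sketch of a copy loop whose counter is allowed to go negative. It is not the actual IncrementalCopy* code: CopyInEightByteSteps is a hypothetical name, the sketch ignores overlapping source and destination, and it assumes a POSIX system where <sys/types.h> provides ssize_t.

```cpp
#include <cassert>
#include <cstring>
#include <sys/types.h>  // ssize_t (POSIX)

// Hypothetical helper: copies at least `len` bytes from src to op in 8-byte
// steps. It may write up to 7 bytes past `len`, so both buffers must have
// that much slack. Returns the final counter, which is zero or negative.
//
// With an unsigned size_t, `len -= 8` would wrap to a huge value instead of
// going negative, and the `len > 0` test would never terminate the loop.
ssize_t CopyInEightByteSteps(const char* src, char* op, ssize_t len) {
  assert(src != nullptr && op != nullptr);
  while (len > 0) {
    std::memcpy(op, src, 8);
    src += 8;
    op += 8;
    len -= 8;  // deliberately allowed to drop below zero
  }
  return len;
}
```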
Sandy Bridge:
Benchmark Base (ns) New (ns) Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0 48009 41283 2.3GB/s html +16.3%
BM_UFlat/1 531274 513419 1.3GB/s urls +3.5%
BM_UFlat/2 7378 7062 16.8GB/s jpg +4.5%
BM_UFlat/3 92 92 2.0GB/s jpg_200 +0.0%
BM_UFlat/4 15057 14974 5.9GB/s pdf +0.6%
BM_UFlat/5 204323 193140 2.0GB/s html4 +5.8%
BM_UFlat/6 13282 12611 1.8GB/s cp +5.3%
BM_UFlat/7 6511 6504 1.6GB/s c +0.1%
BM_UFlat/8 2014 2030 1.7GB/s lsp -0.8%
BM_UFlat/9 775909 768336 1.3GB/s xls +1.0%
BM_UFlat/10 182 184 1043.2MB/s xls_200 -1.1%
BM_UFlat/11 167352 161630 901.2MB/s txt1 +3.5%
BM_UFlat/12 147393 142246 842.8MB/s txt2 +3.6%
BM_UFlat/13 449960 432853 944.4MB/s txt3 +4.0%
BM_UFlat/14 620497 594845 775.9MB/s txt4 +4.3%
BM_UFlat/15 265610 267356 1.8GB/s bin -0.7%
BM_UFlat/16 206 205 932.7MB/s bin_200 +0.5%
BM_UFlat/17 25561 24730 1.4GB/s sum +3.4%
BM_UFlat/18 2620 2644 1.5GB/s man -0.9%
BM_UFlat/19 45766 38589 2.9GB/s pb +18.6%
BM_UFlat/20 171107 169832 1039.5MB/s gaviota +0.8%
Sum of all benchmarks 3500103 3394565 +3.1%
Westmere:
Benchmark Base (ns) New (ns) Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0 72624 71526 1.3GB/s html +1.5%
BM_UFlat/1 735821 722917 930.8MB/s urls +1.8%
BM_UFlat/2 10450 10172 11.7GB/s jpg +2.7%
BM_UFlat/3 117 117 1.6GB/s jpg_200 +0.0%
BM_UFlat/4 29817 29648 3.0GB/s pdf +0.6%
BM_UFlat/5 297126 293073 1.3GB/s html4 +1.4%
BM_UFlat/6 28252 27994 842.0MB/s cp +0.9%
BM_UFlat/7 12672 12391 862.1MB/s c +2.3%
BM_UFlat/8 3507 3425 1040.9MB/s lsp +2.4%
BM_UFlat/9 1004268 969395 1018.0MB/s xls +3.6%
BM_UFlat/10 233 227 844.8MB/s xls_200 +2.6%
BM_UFlat/11 230054 224981 647.8MB/s txt1 +2.3%
BM_UFlat/12 201229 196447 610.5MB/s txt2 +2.4%
BM_UFlat/13 609547 596761 685.3MB/s txt3 +2.1%
BM_UFlat/14 824362 804821 573.8MB/s txt4 +2.4%
BM_UFlat/15 371095 374899 1.3GB/s bin -1.0%
BM_UFlat/16 267 267 717.8MB/s bin_200 +0.0%
BM_UFlat/17 44623 43828 835.9MB/s sum +1.8%
BM_UFlat/18 5077 4815 841.0MB/s man +5.4%
BM_UFlat/19 74964 73210 1.5GB/s pb +2.4%
BM_UFlat/20 237987 236745 746.0MB/s gaviota +0.5%
Sum of all benchmarks 4794092 4697659 +2.1%
Istanbul:
Benchmark Base (ns) New (ns) Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0 98614 96376 1020.4MB/s html +2.3%
BM_UFlat/1 963740 953241 707.2MB/s urls +1.1%
BM_UFlat/2 25042 24769 4.8GB/s jpg +1.1%
BM_UFlat/3 180 180 1065.6MB/s jpg_200 +0.0%
BM_UFlat/4 45942 45403 1.9GB/s pdf +1.2%
BM_UFlat/5 400135 390226 1008.2MB/s html4 +2.5%
BM_UFlat/6 37768 37392 631.9MB/s cp +1.0%
BM_UFlat/7 18585 18200 588.2MB/s c +2.1%
BM_UFlat/8 5751 5690 627.7MB/s lsp +1.1%
BM_UFlat/9 1543154 1542209 641.4MB/s xls +0.1%
BM_UFlat/10 381 388 494.6MB/s xls_200 -1.8%
BM_UFlat/11 339715 331973 440.1MB/s txt1 +2.3%
BM_UFlat/12 294807 289418 415.4MB/s txt2 +1.9%
BM_UFlat/13 906160 884094 463.3MB/s txt3 +2.5%
BM_UFlat/14 1224221 1198435 386.1MB/s txt4 +2.2%
BM_UFlat/15 516277 502923 979.5MB/s bin +2.7%
BM_UFlat/16 405 402 477.2MB/s bin_200 +0.7%
BM_UFlat/17 61640 60621 605.6MB/s sum +1.7%
BM_UFlat/18 7326 7383 549.5MB/s man -0.8%
BM_UFlat/19 94720 92653 1.2GB/s pb +2.2%
BM_UFlat/20 360435 346687 510.6MB/s gaviota +4.0%
Sum of all benchmarks 6944998 6828663 +1.7%
git-svn-id: https://snappy.googlecode.com/svn/trunk@77 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Thu, 13 Jun 2013 16:19:52 +0000 (16:19 +0000)]
Add support for uncompressing to iovecs (scatter I/O).
Windows does not have struct iovec defined anywhere,
so we define our own version that's equal to what UNIX
typically has.
The bulk of this patch was contributed by Mohit Aron.
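For readers unfamiliar with scatter I/O, the following sketch shows the general shape of the idea. The struct mirrors the two fields that struct iovec typically has on UNIX (iov_base and iov_len); IoVecSketch and ScatterToIovecs are hypothetical names used only for illustration, and the function scatters already-uncompressed bytes rather than performing any decompression.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for struct iovec on platforms that lack it: a base pointer plus
// the number of bytes available at that pointer.
struct IoVecSketch {
  void* iov_base;
  size_t iov_len;
};

// Hypothetical helper: spreads `len` bytes from `data` across the given
// buffers in order. Returns false if the buffers are too small in total.
bool ScatterToIovecs(const char* data, size_t len,
                     const IoVecSketch* iov, size_t iov_count) {
  assert(iov != nullptr || iov_count == 0);
  for (size_t i = 0; i < iov_count && len > 0; ++i) {
    const size_t n = len < iov[i].iov_len ? len : iov[i].iov_len;
    std::memcpy(iov[i].iov_base, data, n);
    data += n;
    len -= n;
  }
  return len == 0;
}
```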
R=jeff
git-svn-id: https://snappy.googlecode.com/svn/trunk@76 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Wed, 12 Jun 2013 19:51:15 +0000 (19:51 +0000)]
Some code reorganization needed for an internal change.
R=fikes
git-svn-id: https://snappy.googlecode.com/svn/trunk@75 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Tue, 9 Apr 2013 15:33:30 +0000 (15:33 +0000)]
Supports truncated test data in zippy benchmark.
R=sesse
git-svn-id: https://snappy.googlecode.com/svn/trunk@74 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Tue, 5 Feb 2013 14:36:15 +0000 (14:36 +0000)]
Release Snappy 1.1.0.
R=sanjay
git-svn-id: https://snappy.googlecode.com/svn/trunk@73 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Tue, 5 Feb 2013 14:30:05 +0000 (14:30 +0000)]
Make ./snappy_unittest pass without "srcdir" being defined.
Previously, snappy_unittest would read from an absolute path /testdata/..;
convert it to use a relative path instead.
Patch from Marc-Antoine Ruel.
R=maruel
git-svn-id: https://snappy.googlecode.com/svn/trunk@72 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Fri, 18 Jan 2013 12:16:36 +0000 (12:16 +0000)]
Increase the Zippy block size from 32 kB to 64 kB, winning ~3% density
while being effectively performance neutral.
The longer story about density is that we win 3-6% density on the benchmarks
where this has any effect at all; many of the benchmarks (cp, c, lsp, man)
are smaller than 32 kB and thus will have no effect. Binary data also seems
to win little or nothing; of course, the already-compressed data wins nothing.
The protobuf benchmark wins as much as ~18% depending on architecture,
but I wouldn't be too sure that this is representative of protobuf data in
general.
As for performance, we lose a tiny amount since we get more tags (e.g., a long
literal might be broken up into literal-copy-literal), but we win it back with
less clearing of the hash table, and more opportunities to skip incompressible
data (e.g. in the jpg benchmark). Decompression seems to get ever so slightly
slower, again due to more tags. The total net change is about as close to zero
as we can get, so the end effect seems to be simply more density and no
real performance change.
The comment about not changing kBlockSize, scary as it is, is not really
relevant, since we're never going to have a block-level decompressor without
explicitly marked blocks. Replace it with something more appropriate.
This affects the framing format, but it's okay to change it since it basically
has no users yet.
Density (note that cp, c, lsp and man are all smaller than 32 kB):
Benchmark Description Base (%) New (%) Improvement
--------------------------------------------------------------
ZFlat/0 html 23.57 22.31 +5.6%
ZFlat/1 urls 50.89 47.77 +6.5%
ZFlat/2 jpg 99.88 99.87 +0.0%
ZFlat/3 pdf 82.13 82.07 +0.1%
ZFlat/4 html4 23.55 22.51 +4.6%
ZFlat/5 cp 48.12 48.12 +0.0%
ZFlat/6 c 42.40 42.40 +0.0%
ZFlat/7 lsp 48.37 48.37 +0.0%
ZFlat/8 xls 41.34 41.23 +0.3%
ZFlat/9 txt1 59.81 57.87 +3.4%
ZFlat/10 txt2 64.07 61.93 +3.5%
ZFlat/11 txt3 57.11 54.92 +4.0%
ZFlat/12 txt4 68.35 66.22 +3.2%
ZFlat/13 bin 18.21 18.11 +0.6%
ZFlat/14 sum 51.88 48.96 +6.0%
ZFlat/15 man 59.36 59.36 +0.0%
ZFlat/16 pb 23.15 19.64 +17.9%
ZFlat/17 gaviota 38.27 37.72 +1.5%
Geometric mean 45.51 44.15 +3.1%
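The commit message does not say how the Improvement column is computed, but the rows are consistent with base / new - 1 (for example, urls: 50.89 / 47.77 - 1 is about +6.5%). The helper below is a hypothetical sketch of that inferred formula; it applies equally to the timing tables, where a smaller number of nanoseconds is also better.

```cpp
#include <cassert>

// Hypothetical helper: relative improvement in percent for a metric where
// smaller is better (compressed size in % or time in ns). Positive means the
// new value is better than the base. Inferred from the tables, not taken
// from the benchmark tooling.
double ImprovementPercent(double base, double updated) {
  assert(updated > 0);
  return (base / updated - 1.0) * 100.0;
}
```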
Microbenchmarks (64-bit, opt):
Westmere 2.8 GHz:
Benchmark Base (ns) New (ns) Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0 75342 75027 1.3GB/s html +0.4%
BM_UFlat/1 723767 744269 899.6MB/s urls -2.8%
BM_UFlat/2 10072 10072 11.7GB/s jpg +0.0%
BM_UFlat/3 30747 30388 2.9GB/s pdf +1.2%
BM_UFlat/4 307353 306063 1.2GB/s html4 +0.4%
BM_UFlat/5 28593 28743 816.3MB/s cp -0.5%
BM_UFlat/6 12958 12998 818.1MB/s c -0.3%
BM_UFlat/7 3700 3792 935.8MB/s lsp -2.4%
BM_UFlat/8 999685 999905 982.1MB/s xls -0.0%
BM_UFlat/9 232954 230079 630.4MB/s txt1 +1.2%
BM_UFlat/10 200785 201468 592.6MB/s txt2 -0.3%
BM_UFlat/11 617267 610968 666.1MB/s txt3 +1.0%
BM_UFlat/12 821595 822475 558.7MB/s txt4 -0.1%
BM_UFlat/13 377097 377632 1.3GB/s bin -0.1%
BM_UFlat/14 45476 45260 805.8MB/s sum +0.5%
BM_UFlat/15 4985 5003 805.7MB/s man -0.4%
BM_UFlat/16 80813 77494 1.4GB/s pb +4.3%
BM_UFlat/17 251792 241553 727.7MB/s gaviota +4.2%
BM_UValidate/0 40343 40354 2.4GB/s html -0.0%
BM_UValidate/1 426890 451574 1.4GB/s urls -5.5%
BM_UValidate/2 187 179 661.9GB/s jpg +4.5%
BM_UValidate/3 13783 13827 6.4GB/s pdf -0.3%
BM_UValidate/4 162393 163335 2.3GB/s html4 -0.6%
BM_UDataBuffer/0 93756 93302 1046.7MB/s html +0.5%
BM_UDataBuffer/1 886714 916292 730.7MB/s urls -3.2%
BM_UDataBuffer/2 15861 16401 7.2GB/s jpg -3.3%
BM_UDataBuffer/3 38934 39224 2.2GB/s pdf -0.7%
BM_UDataBuffer/4 381008 379428 1029.5MB/s html4 +0.4%
BM_UCord/0 92528 91098 1072.0MB/s html +1.6%
BM_UCord/1 858421 885287 756.3MB/s urls -3.0%
BM_UCord/2 13140 13464 8.8GB/s jpg -2.4%
BM_UCord/3 39012 37773 2.3GB/s pdf +3.3%
BM_UCord/4 376869 371267 1052.1MB/s html4 +1.5%
BM_UCordString/0 75810 75303 1.3GB/s html +0.7%
BM_UCordString/1 735290 753841 888.2MB/s urls -2.5%
BM_UCordString/2 11945 13113 9.0GB/s jpg -8.9%
BM_UCordString/3 33901 32562 2.7GB/s pdf +4.1%
BM_UCordString/4 310985 309390 1.2GB/s html4 +0.5%
BM_UCordValidate/0 40952 40450 2.4GB/s html +1.2%
BM_UCordValidate/1 433842 456531 1.4GB/s urls -5.0%
BM_UCordValidate/2 1179 1173 100.8GB/s jpg +0.5%
BM_UCordValidate/3 14481 14392 6.1GB/s pdf +0.6%
BM_UCordValidate/4 164364 164151 2.3GB/s html4 +0.1%
BM_ZFlat/0 160610 156601 623.6MB/s html (22.31 %) +2.6%
BM_ZFlat/1 1995238 1993582 335.9MB/s urls (47.77 %) +0.1%
BM_ZFlat/2 30133 24983 4.7GB/s jpg (99.87 %) +20.6%
BM_ZFlat/3 74453 73128 1.2GB/s pdf (82.07 %) +1.8%
BM_ZFlat/4 647674 633729 616.4MB/s html4 (22.51 %) +2.2%
BM_ZFlat/5 76259 76090 308.4MB/s cp (48.12 %) +0.2%
BM_ZFlat/6 31106 31084 342.1MB/s c (42.40 %) +0.1%
BM_ZFlat/7 10507 10443 339.8MB/s lsp (48.37 %) +0.6%
BM_ZFlat/8 1811047 1793325 547.6MB/s xls (41.23 %) +1.0%
BM_ZFlat/9 597903 581793 249.3MB/s txt1 (57.87 %) +2.8%
BM_ZFlat/10 525320 514522 232.0MB/s txt2 (61.93 %) +2.1%
BM_ZFlat/11 1596591 1551636 262.3MB/s txt3 (54.92 %) +2.9%
BM_ZFlat/12 2134523 2094033 219.5MB/s txt4 (66.22 %) +1.9%
BM_ZFlat/13 593024 587869 832.6MB/s bin (18.11 %) +0.9%
BM_ZFlat/14 114746 110666 329.5MB/s sum (48.96 %) +3.7%
BM_ZFlat/15 14376 14485 278.3MB/s man (59.36 %) -0.8%
BM_ZFlat/16 167908 150070 753.6MB/s pb (19.64 %) +11.9%
BM_ZFlat/17 460228 442253 397.5MB/s gaviota (37.72 %) +4.1%
BM_ZCord/0 164896 160241 609.4MB/s html +2.9%
BM_ZCord/1 2070239 2043492 327.7MB/s urls +1.3%
BM_ZCord/2 54402 47002 2.5GB/s jpg +15.7%
BM_ZCord/3 85871 83832 1073.1MB/s pdf +2.4%
BM_ZCord/4 664078 648825 602.0MB/s html4 +2.4%
BM_ZDataBuffer/0 174874 172549 566.0MB/s html +1.3%
BM_ZDataBuffer/1 2134410 2139173 313.0MB/s urls -0.2%
BM_ZDataBuffer/2 71911 69551 1.7GB/s jpg +3.4%
BM_ZDataBuffer/3 98236 99727 902.1MB/s pdf -1.5%
BM_ZDataBuffer/4 710776 699104 558.8MB/s html4 +1.7%
Sum of all benchmarks 27358908 27200688 +0.6%
Sandy Bridge 2.6 GHz:
Benchmark Base (ns) New (ns) Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0 49356 49018 1.9GB/s html +0.7%
BM_UFlat/1 516764 531955 1.2GB/s urls -2.9%
BM_UFlat/2 6982 7304 16.2GB/s jpg -4.4%
BM_UFlat/3 15285 15598 5.6GB/s pdf -2.0%
BM_UFlat/4 206557 206669 1.8GB/s html4 -0.1%
BM_UFlat/5 13681 13567 1.7GB/s cp +0.8%
BM_UFlat/6 6571 6592 1.6GB/s c -0.3%
BM_UFlat/7 2008 1994 1.7GB/s lsp +0.7%
BM_UFlat/8 775700 773286 1.2GB/s xls +0.3%
BM_UFlat/9 165578 164480 881.8MB/s txt1 +0.7%
BM_UFlat/10 143707 144139 828.2MB/s txt2 -0.3%
BM_UFlat/11 443026 436281 932.8MB/s txt3 +1.5%
BM_UFlat/12 603129 595856 771.2MB/s txt4 +1.2%
BM_UFlat/13 271682 270450 1.8GB/s bin +0.5%
BM_UFlat/14 26200 25666 1.4GB/s sum +2.1%
BM_UFlat/15 2620 2608 1.5GB/s man +0.5%
BM_UFlat/16 48908 47756 2.3GB/s pb +2.4%
BM_UFlat/17 174638 170346 1031.9MB/s gaviota +2.5%
BM_UValidate/0 31922 31898 3.0GB/s html +0.1%
BM_UValidate/1 341265 363554 1.8GB/s urls -6.1%
BM_UValidate/2 160 151 782.8GB/s jpg +6.0%
BM_UValidate/3 10402 10380 8.5GB/s pdf +0.2%
BM_UValidate/4 129490 130587 2.9GB/s html4 -0.8%
BM_UDataBuffer/0 59383 58736 1.6GB/s html +1.1%
BM_UDataBuffer/1 619222 637786 1049.8MB/s urls -2.9%
BM_UDataBuffer/2 10775 11941 9.9GB/s jpg -9.8%
BM_UDataBuffer/3 18002 17930 4.9GB/s pdf +0.4%
BM_UDataBuffer/4 259182 259306 1.5GB/s html4 -0.0%
BM_UCord/0 59379 57814 1.6GB/s html +2.7%
BM_UCord/1 598456 615162 1088.4MB/s urls -2.7%
BM_UCord/2 8519 8628 13.7GB/s jpg -1.3%
BM_UCord/3 18123 17537 5.0GB/s pdf +3.3%
BM_UCord/4 252375 252331 1.5GB/s html4 +0.0%
BM_UCordString/0 49494 49790 1.9GB/s html -0.6%
BM_UCordString/1 524659 541803 1.2GB/s urls -3.2%
BM_UCordString/2 8206 8354 14.2GB/s jpg -1.8%
BM_UCordString/3 17235 16537 5.3GB/s pdf +4.2%
BM_UCordString/4 210188 211072 1.8GB/s html4 -0.4%
BM_UCordValidate/0 31956 31587 3.0GB/s html +1.2%
BM_UCordValidate/1 340828 362141 1.8GB/s urls -5.9%
BM_UCordValidate/2 783 744 158.9GB/s jpg +5.2%
BM_UCordValidate/3 10543 10462 8.4GB/s pdf +0.8%
BM_UCordValidate/4 130150 129789 2.9GB/s html4 +0.3%
BM_ZFlat/0 113873 111200 878.2MB/s html (22.31 %) +2.4%
BM_ZFlat/1 1473023 1489858 449.4MB/s urls (47.77 %) -1.1%
BM_ZFlat/2 23569 19486 6.1GB/s jpg (99.87 %) +21.0%
BM_ZFlat/3 49178 48046 1.8GB/s pdf (82.07 %) +2.4%
BM_ZFlat/4 475063 469394 832.2MB/s html4 (22.51 %) +1.2%
BM_ZFlat/5 46910 46816 501.2MB/s cp (48.12 %) +0.2%
BM_ZFlat/6 16883 16916 628.6MB/s c (42.40 %) -0.2%
BM_ZFlat/7 5381 5447 651.5MB/s lsp (48.37 %) -1.2%
BM_ZFlat/8 1466870 1473861 666.3MB/s xls (41.23 %) -0.5%
BM_ZFlat/9 468006 464101 312.5MB/s txt1 (57.87 %) +0.8%
BM_ZFlat/10 408157 408957 291.9MB/s txt2 (61.93 %) -0.2%
BM_ZFlat/11 1253348 1232910 330.1MB/s txt3 (54.92 %) +1.7%
BM_ZFlat/12 1702373 1702977 269.8MB/s txt4 (66.22 %) -0.0%
BM_ZFlat/13 439792 438557 1116.0MB/s bin (18.11 %) +0.3%
BM_ZFlat/14 80766 78851 462.5MB/s sum (48.96 %) +2.4%
BM_ZFlat/15 7420 7542 534.5MB/s man (59.36 %) -1.6%
BM_ZFlat/16 112043 100126 1.1GB/s pb (19.64 %) +11.9%
BM_ZFlat/17 368877 357703 491.4MB/s gaviota (37.72 %) +3.1%
BM_ZCord/0 116402 113564 859.9MB/s html +2.5%
BM_ZCord/1 1507156 1519911 440.5MB/s urls -0.8%
BM_ZCord/2 39860 33686 3.5GB/s jpg +18.3%
BM_ZCord/3 56211 54694 1.6GB/s pdf +2.8%
BM_ZCord/4 485594 479212 815.1MB/s html4 +1.3%
BM_ZDataBuffer/0 123185 121572 803.3MB/s html +1.3%
BM_ZDataBuffer/1 1569111 1589380 421.3MB/s urls -1.3%
BM_ZDataBuffer/2 53143 49556 2.4GB/s jpg +7.2%
BM_ZDataBuffer/3 65725 66826 1.3GB/s pdf -1.6%
BM_ZDataBuffer/4 517871 514750 758.9MB/s html4 +0.6%
Sum of all benchmarks 20258879 20315484 -0.3%
AMD Istanbul 2.4 GHz:
Benchmark Base (ns) New (ns) Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0 97120 96585 1011.1MB/s html +0.6%
BM_UFlat/1 917473 948016 706.3MB/s urls -3.2%
BM_UFlat/2 21496 23938 4.9GB/s jpg -10.2%
BM_UFlat/3 44751 45639 1.9GB/s pdf -1.9%
BM_UFlat/4 391950 391413 998.0MB/s html4 +0.1%
BM_UFlat/5 37366 37201 630.7MB/s cp +0.4%
BM_UFlat/6 18350 18318 580.5MB/s c +0.2%
BM_UFlat/7 5672 5661 626.9MB/s lsp +0.2%
BM_UFlat/8 1533390 1529441 642.1MB/s xls +0.3%
BM_UFlat/9 335477 336553 431.0MB/s txt1 -0.3%
BM_UFlat/10 285140 292080 408.7MB/s txt2 -2.4%
BM_UFlat/11 888507 894758 454.9MB/s txt3 -0.7%
BM_UFlat/12 1187643 1210928 379.5MB/s txt4 -1.9%
BM_UFlat/13 493717 507447 964.5MB/s bin -2.7%
BM_UFlat/14 61740 60870 599.1MB/s sum +1.4%
BM_UFlat/15 7211 7187 560.9MB/s man +0.3%
BM_UFlat/16 97435 93100 1.2GB/s pb +4.7%
BM_UFlat/17 362662 356395 493.2MB/s gaviota +1.8%
BM_UValidate/0 47475 47118 2.0GB/s html +0.8%
BM_UValidate/1 501304 529741 1.2GB/s urls -5.4%
BM_UValidate/2 276 243 486.2GB/s jpg +13.6%
BM_UValidate/3 16361 16261 5.4GB/s pdf +0.6%
BM_UValidate/4 190741 190353 2.0GB/s html4 +0.2%
BM_UDataBuffer/0 111080 109771 889.6MB/s html +1.2%
BM_UDataBuffer/1 1051035 1085999 616.5MB/s urls -3.2%
BM_UDataBuffer/2 25801 25463 4.6GB/s jpg +1.3%
BM_UDataBuffer/3 50493 49946 1.8GB/s pdf +1.1%
BM_UDataBuffer/4 447258 444138 879.5MB/s html4 +0.7%
BM_UCord/0 109350 107909 905.0MB/s html +1.3%
BM_UCord/1 1023396 1054964 634.7MB/s urls -3.0%
BM_UCord/2 25292 24371 4.9GB/s jpg +3.8%
BM_UCord/3 48955 49736 1.8GB/s pdf -1.6%
BM_UCord/4 440452 437331 893.2MB/s html4 +0.7%
BM_UCordString/0 98511 98031 996.2MB/s html +0.5%
BM_UCordString/1 933230 963495 694.9MB/s urls -3.1%
BM_UCordString/2 23311 24076 4.9GB/s jpg -3.2%
BM_UCordString/3 45568 46196 1.9GB/s pdf -1.4%
BM_UCordString/4 397791 396934 984.1MB/s html4 +0.2%
BM_UCordValidate/0 47537 46921 2.0GB/s html +1.3%
BM_UCordValidate/1 505071 532716 1.2GB/s urls -5.2%
BM_UCordValidate/2 1663 1621 72.9GB/s jpg +2.6%
BM_UCordValidate/3 16890 16926 5.2GB/s pdf -0.2%
BM_UCordValidate/4 192365 191984 2.0GB/s html4 +0.2%
BM_ZFlat/0 184708 179103 545.3MB/s html (22.31 %) +3.1%
BM_ZFlat/1 2293864 2302950 290.7MB/s urls (47.77 %) -0.4%
BM_ZFlat/2 52852 47618 2.5GB/s jpg (99.87 %) +11.0%
BM_ZFlat/3 100766 96179 935.3MB/s pdf (82.07 %) +4.8%
BM_ZFlat/4 741220 727977 536.6MB/s html4 (22.51 %) +1.8%
BM_ZFlat/5 85402 85418 274.7MB/s cp (48.12 %) -0.0%
BM_ZFlat/6 36558 36494 291.4MB/s c (42.40 %) +0.2%
BM_ZFlat/7 12706 12507 283.7MB/s lsp (48.37 %) +1.6%
BM_ZFlat/8 2336823 2335688 420.5MB/s xls (41.23 %) +0.0%
BM_ZFlat/9 701804 681153 212.9MB/s txt1 (57.87 %) +3.0%
BM_ZFlat/10 606700 597194 199.9MB/s txt2 (61.93 %) +1.6%
BM_ZFlat/11 1852283 1803238 225.7MB/s txt3 (54.92 %) +2.7%
BM_ZFlat/12 2475527 2443354 188.1MB/s txt4 (66.22 %) +1.3%
BM_ZFlat/13 694497 696654 702.6MB/s bin (18.11 %) -0.3%
BM_ZFlat/14 136929 129855 280.8MB/s sum (48.96 %) +5.4%
BM_ZFlat/15 17172 17124 235.4MB/s man (59.36 %) +0.3%
BM_ZFlat/16 190364 171763 658.4MB/s pb (19.64 %) +10.8%
BM_ZFlat/17 567285 555190 316.6MB/s gaviota (37.72 %) +2.2%
BM_ZCord/0 193490 187031 522.1MB/s html +3.5%
BM_ZCord/1 2427537 2415315 277.2MB/s urls +0.5%
BM_ZCord/2 85378 81412 1.5GB/s jpg +4.9%
BM_ZCord/3 121898 119419 753.3MB/s pdf +2.1%
BM_ZCord/4 779564 762961 512.0MB/s html4 +2.2%
BM_ZDataBuffer/0 213820 207272 471.1MB/s html +3.2%
BM_ZDataBuffer/1 2589010 2586495 258.9MB/s urls +0.1%
BM_ZDataBuffer/2 121871 118885 1018.4MB/s jpg +2.5%
BM_ZDataBuffer/3 145382 145986 616.2MB/s pdf -0.4%
BM_ZDataBuffer/4 868117 852754 458.1MB/s html4 +1.8%
Sum of all benchmarks 33771833 33744763 +0.1%
git-svn-id: https://snappy.googlecode.com/svn/trunk@71 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Sun, 6 Jan 2013 19:21:26 +0000 (19:21 +0000)]
Adjust the Snappy open-source distribution for the changes in Google's
internal file API.
R=sanjay
git-svn-id: https://snappy.googlecode.com/svn/trunk@70 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Fri, 4 Jan 2013 11:54:20 +0000 (11:54 +0000)]
Change a few ORs to additions where they don't matter. This helps the compiler
use the LEA instruction more efficiently, since e.g. a + (b << 2) can be encoded
as one instruction. Even more importantly, it can constant-fold the
COPY_* enums together with the shifted negative constants, which also saves
some instructions. (We don't need it for LITERAL, since it happens to be 0.)
I am unsure why the compiler couldn't do this itself, but the theory is that
it cannot prove that len-1 and len-4 cannot underflow/wrap, and thus can't
do the optimization safely.
The gains are small but measurable; 0.5-1.0% over the BM_Z* benchmarks
(measured on Westmere, Sandy Bridge and Istanbul).
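The OR-to-addition change can be illustrated with a sketch of a one-byte copy tag. This assumes the usual Snappy tag layout (tag type in the low two bits, len-4 in the next three, the high offset bits above that) and enum values LITERAL = 0 and COPY_1_BYTE_OFFSET = 1; the function names are hypothetical. Since the bit fields never overlap, OR and addition give the same byte, but the additive form lets the compiler fold the constants: COPY_1_BYTE_OFFSET + ((len - 4) << 2) becomes (len << 2) - 15.

```cpp
#include <cassert>
#include <cstdint>

// Assumed tag-type values, as in the Snappy format description.
enum { LITERAL = 0, COPY_1_BYTE_OFFSET = 1 };

// Tag byte built with ORs (the older style).
inline uint8_t TagWithOr(uint32_t len, uint32_t offset) {
  return static_cast<uint8_t>(COPY_1_BYTE_OFFSET | ((len - 4) << 2) |
                              ((offset >> 8) << 5));
}

// Tag byte built with additions; the constants can be folded into one
// (len << 2) - 15 term, and a + (b << 2) fits a single LEA on x86.
inline uint8_t TagWithAdd(uint32_t len, uint32_t offset) {
  return static_cast<uint8_t>(COPY_1_BYTE_OFFSET + ((len - 4) << 2) +
                              ((offset >> 8) << 5));
}

// Checks every valid input for this tag type: 4 <= len < 12, offset < 2048.
bool AddMatchesOrForAllTags() {
  for (uint32_t len = 4; len < 12; ++len) {
    for (uint32_t offset = 0; offset < 2048; ++offset) {
      if (TagWithOr(len, offset) != TagWithAdd(len, offset)) return false;
    }
  }
  return true;
}
```

The equivalence only holds while len and offset stay in range, which matches the theory above that the compiler cannot prove this on its own.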
R=sanjay
git-svn-id: https://snappy.googlecode.com/svn/trunk@69 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Mon, 8 Oct 2012 11:37:16 +0000 (11:37 +0000)]
Stop giving -Werror to automake, due to an incompatibility between current
versions of libtool and automake on non-GNU platforms (e.g. Mac OS X).
R=sanjay
git-svn-id: https://snappy.googlecode.com/svn/trunk@68 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Fri, 17 Aug 2012 13:54:47 +0000 (13:54 +0000)]
Fix public issue 66: Document GetUncompressedLength better, in particular that
it leaves the source in a state that's not appropriate for RawUncompress.
R=sanjay
git-svn-id: https://snappy.googlecode.com/svn/trunk@67 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Tue, 31 Jul 2012 11:44:44 +0000 (11:44 +0000)]
Fix public issue 64: Check for <sys/time.h> at configure time,
since MSVC seemingly does not have it.
R=sanjay
git-svn-id: https://snappy.googlecode.com/svn/trunk@66 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Wed, 4 Jul 2012 09:34:48 +0000 (09:34 +0000)]
Handle the case where gettimeofday() goes backwards or returns the same value
twice; it could cause division by zero in the unit test framework.
(We already had one fix for this in place, but it was incomplete.)
This could in theory happen on any system, since there are few guarantees
about gettimeofday(), but seems to only happen in practice on GNU/Hurd, where
gettimeofday() is cached and only updated every so often.
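One way to guard against this, sketched under assumptions: the function name and the choice to clamp to one microsecond are hypothetical and not necessarily what the unit test framework does. The point is only that an elapsed time of zero or less must never reach the division.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: throughput in bytes per second from two timestamps in
// microseconds. If the clock returned the same value twice or went backwards,
// the elapsed time is clamped to 1 us so the division is always well defined.
double BytesPerSecond(int64_t bytes, int64_t start_usec, int64_t stop_usec) {
  int64_t elapsed_usec = stop_usec - start_usec;
  if (elapsed_usec <= 0) {
    elapsed_usec = 1;  // clock stalled or ran backwards
  }
  return static_cast<double>(bytes) * 1e6 / static_cast<double>(elapsed_usec);
}
```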
R=sanjay
git-svn-id: https://snappy.googlecode.com/svn/trunk@65 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Wed, 4 Jul 2012 09:28:33 +0000 (09:28 +0000)]
Mark ARMv4 as not supporting unaligned accesses (not just ARMv5 and ARMv6);
apparently Debian still targets these by default, giving us segfaults on
armel.
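For targets where unaligned accesses are not known to be safe, the usual portable fallback is to go through memcpy. The sketch below assumes that approach; the names SketchUnalignedLoad32 and SketchUnalignedStore32 are hypothetical and are not Snappy's own identifiers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical fallback helpers (names are made up for this sketch).
// memcpy has no alignment requirement, so these are safe at any address;
// on targets that allow unaligned access, compilers commonly lower the
// copy to a single load or store instruction.
inline uint32_t SketchUnalignedLoad32(const void* p) {
  assert(p != nullptr);
  uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

inline void SketchUnalignedStore32(void* p, uint32_t v) {
  assert(p != nullptr);
  std::memcpy(p, &v, sizeof(v));
}
```

Dereferencing a misaligned uint32_t pointer directly is what can fault on older ARM cores, which is why such targets need to take a path like this one.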
R=sanjay
git-svn-id: https://snappy.googlecode.com/svn/trunk@64 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Tue, 22 May 2012 09:46:05 +0000 (09:46 +0000)]
Fix public bug #62: Remove an extraneous comma at the end of an enum list,
causing compile errors when embedded in Mozilla on OpenBSD.
R=sanjay
git-svn-id: https://snappy.googlecode.com/svn/trunk@63 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Tue, 22 May 2012 09:32:50 +0000 (09:32 +0000)]
Snappy library no longer depends on iostream.
Achieved by moving logging macro definitions to a test-only
header file, and by changing non-test code to use assert,
fprintf, and abort instead of LOG/CHECK macros.
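A minimal sketch of the pattern, assuming a macro of roughly this shape: SKETCH_CHECK and SketchCheckedDivide are hypothetical names used only for illustration, not the identifiers in the Snappy sources. The point is that failure reporting needs only <cstdio> and <cstdlib>, so library code no longer pulls in <iostream>.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical always-on check: report with fprintf, terminate with abort.
#define SKETCH_CHECK(cond)                                        \
  do {                                                            \
    if (!(cond)) {                                                \
      std::fprintf(stderr, "%s:%d: check failed: %s\n", __FILE__, \
                   __LINE__, #cond);                              \
      std::abort();                                               \
    }                                                             \
  } while (0)

// Example caller. assert covers a debug-only invariant, while
// SKETCH_CHECK stays active even when NDEBUG is defined.
inline int SketchCheckedDivide(int num, int den) {
  SKETCH_CHECK(den != 0);
  const int q = num / den;
  assert(q * den + num % den == num);
  return q;
}
```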
R=sesse
git-svn-id: https://snappy.googlecode.com/svn/trunk@62 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Fri, 24 Feb 2012 15:46:37 +0000 (15:46 +0000)]
Release Snappy 1.0.5.
R=sanjay
git-svn-id: https://snappy.googlecode.com/svn/trunk@61 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Thu, 23 Feb 2012 17:00:36 +0000 (17:00 +0000)]
For 32-bit platforms, do not try to accelerate multiple neighboring
32-bit loads with a 64-bit load during compression (it's not a win).
The main target for this optimization is ARM, but 32-bit x86 gets
a small gain, too, although there is noise in the microbenchmarks.
It's a no-op for 64-bit x86. It does not affect decompression.
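To illustrate the technique being disabled here, the sketch below shows how one 64-bit load can stand in for several overlapping 32-bit loads by shifting. All names are hypothetical, and the loads are written byte by byte so the example is endian-independent; real code would use native loads. The likely reason it does not pay off on 32-bit targets is that a 64-bit value spans two registers there, but that reading is an assumption, not something the commit states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: little-endian loads assembled from bytes.
inline uint64_t SketchLoadLE64(const unsigned char* p) {
  uint64_t v = 0;
  for (int i = 7; i >= 0; --i) v = (v << 8) | p[i];
  return v;
}

inline uint32_t SketchLoadLE32(const unsigned char* p) {
  uint32_t v = 0;
  for (int i = 3; i >= 0; --i) v = (v << 8) | p[i];
  return v;
}

// "Wide" strategy: given an 8-byte window loaded once, extract the
// 32-bit value that starts at byte `offset` (0..4) by shifting.
inline uint32_t SketchWordFromWindow(uint64_t window, int offset) {
  assert(offset >= 0 && offset <= 4);
  return static_cast<uint32_t>(window >> (8 * offset));
}
```

The "narrow" strategy is simply to call SketchLoadLE32 at each offset; both give the same values, so the choice is purely a matter of which is cheaper on the target.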
Microbenchmark results on a Cortex-A9 1GHz, using g++ 4.6.2 (from
Ubuntu/Linaro), -O2 -DNDEBUG -Wa,-march=armv7a -mtune=cortex-a9
-mthumb-interwork, minimum 1000 iterations:
Benchmark Time(ns) CPU(ns) Iterations
---------------------------------------------------
BM_ZFlat/0 1158277 1160000 1000 84.2MB/s html (23.57 %) [ +4.3%]
BM_ZFlat/1 14861782 14860000 1000 45.1MB/s urls (50.89 %) [ +1.1%]
BM_ZFlat/2 393595 390000 1000 310.5MB/s jpg (99.88 %) [ +0.0%]
BM_ZFlat/3 650583 650000 1000 138.4MB/s pdf (82.13 %) [ +3.1%]
BM_ZFlat/4 4661480 4660000 1000 83.8MB/s html4 (23.55 %) [ +4.3%]
BM_ZFlat/5 491973 490000 1000 47.9MB/s cp (48.12 %) [ +2.0%]
BM_ZFlat/6 193575 192678 1038 55.2MB/s c (42.40 %) [ +9.0%]
BM_ZFlat/7 62343 62754 3187 56.5MB/s lsp (48.37 %) [ +2.6%]
BM_ZFlat/8 17708468 17710000 1000 55.5MB/s xls (41.34 %) [ -0.3%]
BM_ZFlat/9 3755345 3760000 1000 38.6MB/s txt1 (59.81 %) [ +8.2%]
BM_ZFlat/10 3324217 3320000 1000 36.0MB/s txt2 (64.07 %) [ +4.2%]
BM_ZFlat/11 10139932 10140000 1000 40.1MB/s txt3 (57.11 %) [ +6.4%]
BM_ZFlat/12 13532109 13530000 1000 34.0MB/s txt4 (68.35 %) [ +5.0%]
BM_ZFlat/13 4690847 4690000 1000 104.4MB/s bin (18.21 %) [ +4.1%]
BM_ZFlat/14 830682 830000 1000 43.9MB/s sum (51.88 %) [ +1.2%]
BM_ZFlat/15 84784 85011 2235 47.4MB/s man (59.36 %) [ +1.1%]
BM_ZFlat/16 1293254 1290000 1000 87.7MB/s pb (23.15 %) [ +2.3%]
BM_ZFlat/17 2775155 2780000 1000 63.2MB/s gaviota (38.27 %) [+12.2%]
Core i7 in 32-bit mode (only one run and 100 iterations, though, so noisy):
Benchmark Time(ns) CPU(ns) Iterations
---------------------------------------------------
BM_ZFlat/0 227582 223464 3043 437.0MB/s html (23.57 %) [ +7.4%]
BM_ZFlat/1 2982430 2918455 233 229.4MB/s urls (50.89 %) [ +2.9%]
BM_ZFlat/2 46967 46658 15217 2.5GB/s jpg (99.88 %) [ +0.0%]
BM_ZFlat/3 115298 114864 5833 783.2MB/s pdf (82.13 %) [ +1.5%]
BM_ZFlat/4 913440 899743 778 434.2MB/s html4 (23.55 %) [ +0.3%]
BM_ZFlat/5 110302 108571 7000 216.1MB/s cp (48.12 %) [ +0.0%]
BM_ZFlat/6 44409 43372 15909 245.2MB/s c (42.40 %) [ +0.8%]
BM_ZFlat/7 15713 15643 46667 226.9MB/s lsp (48.37 %) [ +2.7%]
BM_ZFlat/8 2625539 2602230 269 377.4MB/s xls (41.34 %) [ +1.4%]
BM_ZFlat/9 808884 811429 875 178.8MB/s txt1 (59.81 %) [ -3.9%]
BM_ZFlat/10 709532 700000 1000 170.5MB/s txt2 (64.07 %) [ +0.0%]
BM_ZFlat/11 2177682 2162162 333 188.2MB/s txt3 (57.11 %) [ -1.4%]
BM_ZFlat/12 2849640 2840000 250 161.8MB/s txt4 (68.35 %) [ -1.4%]
BM_ZFlat/13 849760 835476 778 585.8MB/s bin (18.21 %) [ +1.2%]
BM_ZFlat/14 165940 164571 4375 221.6MB/s sum (51.88 %) [ +1.4%]
BM_ZFlat/15 20939 20571 35000 196.0MB/s man (59.36 %) [ +2.1%]
BM_ZFlat/16 239209 236544 2917 478.1MB/s pb (23.15 %) [ +4.2%]
BM_ZFlat/17 616206 610000 1000 288.2MB/s gaviota (38.27 %) [ -1.6%]
R=sanjay
git-svn-id: https://snappy.googlecode.com/svn/trunk@60 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Tue, 21 Feb 2012 17:02:17 +0000 (17:02 +0000)]
Enable the use of unaligned loads and stores for ARM-based architectures
where they are available (ARMv7 and higher). This gives a significant
speed boost on ARM, both for compression and decompression.
It should not affect x86 at all.
There are more changes possible to speed up ARM, but it might not be
that easy to do without hurting x86 or making the code uglier.
Also, we do not try to use NEON yet.
Microbenchmark results on a Cortex-A9 1GHz, using g++ 4.6.2 (from Ubuntu/Linaro),
-O2 -DNDEBUG -Wa,-march=armv7a -mtune=cortex-a9 -mthumb-interwork:
Benchmark Time(ns) CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0 524806 529100 378 184.6MB/s html [+33.6%]
BM_UFlat/1 5139790 5200000 100 128.8MB/s urls [+28.8%]
BM_UFlat/2 86540 84166 1901 1.4GB/s jpg [ +0.6%]
BM_UFlat/3 215351 210176 904 428.0MB/s pdf [+29.8%]
BM_UFlat/4 2144490 2100000 100 186.0MB/s html4 [+33.3%]
BM_UFlat/5 194482 190000 1000 123.5MB/s cp [+36.2%]
BM_UFlat/6 91843 90175 2107 117.9MB/s c [+38.6%]
BM_UFlat/7 28535 28426 6684 124.8MB/s lsp [+34.7%]
BM_UFlat/8 9206600 9200000 100 106.7MB/s xls [+42.4%]
BM_UFlat/9 1865273 1886792 106 76.9MB/s txt1 [+32.5%]
BM_UFlat/10 1576809 1587301 126 75.2MB/s txt2 [+32.3%]
BM_UFlat/11 4968450 4900000 100 83.1MB/s txt3 [+32.7%]
BM_UFlat/12 6673970 6700000 100 68.6MB/s txt4 [+32.8%]
BM_UFlat/13 2391470 2400000 100 203.9MB/s bin [+29.2%]
BM_UFlat/14 334601 344827 522 105.8MB/s sum [+30.6%]
BM_UFlat/15 37404 38080 5252 105.9MB/s man [+33.8%]
BM_UFlat/16 535470 540540 370 209.2MB/s pb [+31.2%]
BM_UFlat/17 1875245 1886792 106 93.2MB/s gaviota [+37.8%]
BM_UValidate/0 178425 179533 1114 543.9MB/s html [ +2.7%]
BM_UValidate/1 2100450 2000000 100 334.8MB/s urls [ +5.0%]
BM_UValidate/2 1039 1044 172413 113.3GB/s jpg [ +3.4%]
BM_UValidate/3 59423 59470 3363 1.5GB/s pdf [ +7.8%]
BM_UValidate/4 760716 766283 261 509.8MB/s html4 [ +6.5%]
BM_ZFlat/0 1204632 1204819 166 81.1MB/s html (23.57 %) [+32.8%]
BM_ZFlat/1 15656190 15600000 100 42.9MB/s urls (50.89 %) [+27.6%]
BM_ZFlat/2 403336 410677 487 294.8MB/s jpg (99.88 %) [+16.5%]
BM_ZFlat/3 664073 671140 298 134.0MB/s pdf (82.13 %) [+28.4%]
BM_ZFlat/4 4961940 4900000 100 79.7MB/s html4 (23.55 %) [+30.6%]
BM_ZFlat/5 500664 501253 399 46.8MB/s cp (48.12 %) [+33.4%]
BM_ZFlat/6 217276 215982 926 49.2MB/s c (42.40 %) [+25.0%]
BM_ZFlat/7 64122 65487 3054 54.2MB/s lsp (48.37 %) [+36.1%]
BM_ZFlat/8 18045730 18000000 100 54.6MB/s xls (41.34 %) [+34.4%]
BM_ZFlat/9 4051530 4000000 100 36.3MB/s txt1 (59.81 %) [+25.0%]
BM_ZFlat/10 3451800 3500000 100 34.1MB/s txt2 (64.07 %) [+25.7%]
BM_ZFlat/11 11052340 11100000 100 36.7MB/s txt3 (57.11 %) [+24.3%]
BM_ZFlat/12 14538690 14600000 100 31.5MB/s txt4 (68.35 %) [+24.7%]
BM_ZFlat/13 5041850 5000000 100 97.9MB/s bin (18.21 %) [+32.0%]
BM_ZFlat/14 908840 909090 220 40.1MB/s sum (51.88 %) [+22.2%]
BM_ZFlat/15 86921 86206 1972 46.8MB/s man (59.36 %) [+42.2%]
BM_ZFlat/16 1312315 1315789 152 86.0MB/s pb (23.15 %) [+34.5%]
BM_ZFlat/17 3173120 3200000 100 54.9MB/s gaviota (38.27%) [+28.1%]
The move from 64-bit to 32-bit operations for the copies also affected 32-bit x86;
positive on the decompression side, and slightly negative on the compression side
(unless that is noise; I only ran once):
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------
BM_UFlat/0 86279 86140 7778 1.1GB/s html [ +7.5%]
BM_UFlat/1 839265 822622 778 813.9MB/s urls [ +9.4%]
BM_UFlat/2 9180 9143 87500 12.9GB/s jpg [ +1.2%]
BM_UFlat/3 35080 35000 20000 2.5GB/s pdf [+10.1%]
BM_UFlat/4 350318 345000 2000 1.1GB/s html4 [ +7.0%]
BM_UFlat/5 33808 33472 21212 701.0MB/s cp [ +9.0%]
BM_UFlat/6 15201 15214 46667 698.9MB/s c [+14.9%]
BM_UFlat/7 4652 4651 159091 762.9MB/s lsp [ +7.5%]
BM_UFlat/8 1285551 1282528 538 765.7MB/s xls [+10.7%]
BM_UFlat/9 282510 281690 2414 514.9MB/s txt1 [+13.6%]
BM_UFlat/10 243494 239286 2800 498.9MB/s txt2 [+14.4%]
BM_UFlat/11 743625 740000 1000 550.0MB/s txt3 [+14.3%]
BM_UFlat/12 999441 989717 778 464.3MB/s txt4 [+16.1%]
BM_UFlat/13 412402 410076 1707 1.2GB/s bin [ +7.3%]
BM_UFlat/14 54876 54000 10000 675.3MB/s sum [+13.0%]
BM_UFlat/15 6146 6100 100000 660.8MB/s man [+14.8%]
BM_UFlat/16 90496 90286 8750 1.2GB/s pb [ +4.0%]
BM_UFlat/17 292650 292000 2500 602.0MB/s gaviota [+18.1%]
BM_UValidate/0 49620 49699 14286 1.9GB/s html [ +0.0%]
BM_UValidate/1 501371 500000 1000 1.3GB/s urls [ +0.0%]
BM_UValidate/2 232 227 3043478 521.5GB/s jpg [ +1.3%]
BM_UValidate/3 17250 17143 43750 5.1GB/s pdf [ -1.3%]
BM_UValidate/4 198643 200000 3500 1.9GB/s html4 [ -0.9%]
BM_ZFlat/0 227128 229415 3182 425.7MB/s html (23.57 %) [ -1.4%]
BM_ZFlat/1 2970089 2960000 250 226.2MB/s urls (50.89 %) [ -1.9%]
BM_ZFlat/2 45683 44999 15556 2.6GB/s jpg (99.88 %) [ +2.2%]
BM_ZFlat/3 114661 113136 6364 795.1MB/s pdf (82.13 %) [ -1.5%]
BM_ZFlat/4 919702 914286 875 427.2MB/s html4 (23.55%) [ -1.3%]
BM_ZFlat/5 108189 108422 6364 216.4MB/s cp (48.12 %) [ -1.2%]
BM_ZFlat/6 44525 44000 15909 241.7MB/s c (42.40 %) [ -2.9%]
BM_ZFlat/7 15973 15857 46667 223.8MB/s lsp (48.37 %) [ +0.0%]
BM_ZFlat/8 2677888 2639405 269 372.1MB/s xls (41.34 %) [ -1.4%]
BM_ZFlat/9 800715 780000 1000 186.0MB/s txt1 (59.81 %) [ -0.4%]
BM_ZFlat/10 700089 700000 1000 170.5MB/s txt2 (64.07 %) [ -2.9%]
BM_ZFlat/11 2159356 2138365 318 190.3MB/s txt3 (57.11 %) [ -0.3%]
BM_ZFlat/12 2796143 2779923 259 165.3MB/s txt4 (68.35 %) [ -1.4%]
BM_ZFlat/13 856458 835476 778 585.8MB/s bin (18.21 %) [ -0.1%]
BM_ZFlat/14 166908 166857 4375 218.6MB/s sum (51.88 %) [ -1.4%]
BM_ZFlat/15 21181 20857 35000 193.3MB/s man (59.36 %) [ -0.8%]
BM_ZFlat/16 244009 239973 2917 471.3MB/s pb (23.15 %) [ -1.4%]
BM_ZFlat/17 596362 590000 1000 297.9MB/s gaviota (38.27%) [ +0.0%]
R=sanjay
git-svn-id: https://snappy.googlecode.com/svn/trunk@59 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Sat, 11 Feb 2012 22:11:22 +0000 (22:11 +0000)]
Lower the size allocated in the "corrupted input" unit test from 256 MB
to 2 MB. This fixes issues with running the unit test on platforms with
little RAM (e.g. some ARM boards).
Also, reactivate the 2 MB test for 64-bit platforms; there's no good
reason why it shouldn't run there.
R=sanjay
git-svn-id: https://snappy.googlecode.com/svn/trunk@58 03e5f5b5-db94-4691-08a0-1a8bf15f6143
snappy.mirrorbot@gmail.com [Sun, 8 Jan 2012 17:55:48 +0000 (17:55 +0000)]
Minor refactoring to accommodate changes in Google's internal code tree.
git-svn-id: https://snappy.googlecode.com/svn/trunk@57 03e5f5b5-db94-4691-08a0-1a8bf15f6143