tsan: optimize memory access functions
author Dmitry Vyukov <dvyukov@google.com>
Fri, 30 May 2014 13:36:29 +0000 (13:36 +0000)
committer Dmitry Vyukov <dvyukov@google.com>
Fri, 30 May 2014 13:36:29 +0000 (13:36 +0000)
commit afdcc96d9f0a3325a70d8433cd103efdb56340a9
tree 6546e8cdc55478ed9e00735e6c726705c8d886a0
parent a2332425c4bd5581c6a25ba8a1c0b0a25865f269
The optimization is two-fold.
First, the algorithm now uses SSE instructions to
check all 4 shadow slots at once, which makes
processing faster.
Second, if the shadow already contains the same access, we do not
store the event in the trace. This increases the effective
trace size: tsan can remember up to 10x more
previous memory accesses.
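
The two ideas above can be illustrated with a minimal sketch. It is not the code from this commit: the ShadowCell layout, the exact-equality notion of "same access", the slot replacement policy, and the names ContainsSameSSE and RecordAccess are all assumptions made for illustration. The real runtime packs several fields into each shadow word and compares them with masks. The sketch requires an x86 target with SSE2.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86/x86-64 only)
#include <cstdint>
#include <vector>

// Hypothetical layout: one shadow cell holds 4 slots of 64 bits each.
struct alignas(16) ShadowCell {
  uint64_t slot[4];
};

// Returns true if any of the 4 slots equals `cur`.
// Two 128-bit loads cover all 4 slots, so there is no per-slot loop.
inline bool ContainsSameSSE(const ShadowCell& cell, uint64_t cur) {
  const __m128i needle = _mm_set1_epi64x(static_cast<long long>(cur));
  const __m128i lo =
      _mm_load_si128(reinterpret_cast<const __m128i*>(&cell.slot[0]));
  const __m128i hi =
      _mm_load_si128(reinterpret_cast<const __m128i*>(&cell.slot[2]));
  // SSE2 has no 64-bit equality compare, so compare 32-bit lanes and
  // require both halves of a 64-bit slot to match.
  const int m_lo = _mm_movemask_epi8(_mm_cmpeq_epi32(lo, needle));
  const int m_hi = _mm_movemask_epi8(_mm_cmpeq_epi32(hi, needle));
  auto any64 = [](int m) {
    return (m & 0x00FF) == 0x00FF || (m & 0xFF00) == 0xFF00;
  };
  return any64(m_lo) || any64(m_hi);
}

// Returns true if the access was appended to the trace.
// An access already present in the shadow is not traced again.
inline bool RecordAccess(ShadowCell& cell, uint64_t cur,
                         std::vector<uint64_t>& trace) {
  if (ContainsSameSSE(cell, cur)) return false;  // same access: skip trace
  trace.push_back(cur);
  cell.slot[trace.size() % 4] = cur;  // hypothetical replacement policy
  return true;
}
```

Calling RecordAccess twice with the same value on the same cell appends to the trace only once, which is the effect that stretches the trace.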

Performance impact:
Before:
[       OK ] DISABLED_BENCH.Mop8Read (2461 ms)
[       OK ] DISABLED_BENCH.Mop8Write (1836 ms)
After:
[       OK ] DISABLED_BENCH.Mop8Read (1204 ms)
[       OK ] DISABLED_BENCH.Mop8Write (976 ms)
But these benchmarks measure only the fast path.
On large real applications the speedup is ~20%.

Trace size impact:
On app1:
Memory accesses                   :       1163265870
  Including same                  :        791312905 (68%)
On app2:
Memory accesses                   :        166875345
  Including same                  :        150449689 (90%)
With 90% of events filtered out, the trace is effectively 10x larger.
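
The arithmetic behind that figure can be sketched as follows: if `same` out of `total` accesses are not written to the trace, a fixed-size trace covers total / (total - same) times more accesses. EffectiveTraceFactor is a hypothetical helper for illustration, not part of the runtime.

```cpp
#include <cstdint>

// Hypothetical helper: how many times more memory accesses a fixed-size
// trace covers when `same` of `total` accesses are filtered out.
// Assumes same < total.
inline double EffectiveTraceFactor(uint64_t total, uint64_t same) {
  return static_cast<double>(total) / static_cast<double>(total - same);
}
```

With the counts above, app1 (1163265870 accesses, 791312905 same) gives about 3.1x and app2 (166875345 accesses, 150449689 same) gives about 10.2x.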

llvm-svn: 209897
compiler-rt/lib/sanitizer_common/tests/sanitizer_deadlock_detector_test.cc
compiler-rt/lib/tsan/check_analyze.sh
compiler-rt/lib/tsan/rtl/Makefile.old
compiler-rt/lib/tsan/rtl/tsan_defs.h
compiler-rt/lib/tsan/rtl/tsan_rtl.cc
compiler-rt/lib/tsan/rtl/tsan_rtl.h
compiler-rt/lib/tsan/rtl/tsan_stat.cc
compiler-rt/lib/tsan/rtl/tsan_stat.h
compiler-rt/lib/tsan/rtl/tsan_update_shadow_word_inl.h
compiler-rt/lib/tsan/tests/unit/tsan_mman_test.cc