[SpecialCaseList] Remove TrigramIndex
authorEllis Hoag <ellis.sparky.hoag@gmail.com>
Mon, 19 Jun 2023 17:39:13 +0000 (10:39 -0700)
committerEllis Hoag <ellis.sparky.hoag@gmail.com>
Mon, 19 Jun 2023 17:41:45 +0000 (10:41 -0700)
commit13906755427611fe2d1b1bf915e87b97ddffd236
tree9432b915511fad8eee3274b0ca86fca414dab18a
parentd5659808b2fabadd2fff4f19d7aed2e2ff3435f5
[SpecialCaseList] Remove TrigramIndex

`TrigramIndex` was added back in https://reviews.llvm.org/D27188 as an optimization to make `SpecialCaseList::match()` faster. I've found that `TrigramIndex` actually makes the function slower and it has no functional use, so we can remove it.

I grabbed the list of queries passed to `SpecialCaseList::match()` on a random very large file (`AArch64ISelLowering.cpp`) and measured the runtime to call `match()` on all of them with [this line](https://github.com/llvm/llvm-project/blob/8e1f820bb4eadf5c0704818f6063e0db1006e32d/llvm/lib/Support/SpecialCaseList.cpp#L64) disabled and then enabled.

```
$ hyperfine --warmup 3 'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=1 build/unittests/Support/SupportTests' 'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=0 build/unittests/Support/SupportTests'
Benchmark 1: GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=1 build/unittests/Support/SupportTests
  Time (mean ± σ):     575.9 ms ±  20.3 ms    [User: 573.1 ms, System: 2.7 ms]
  Range (min … max):   555.5 ms … 620.0 ms    10 runs

Benchmark 2: GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=0 build/unittests/Support/SupportTests
  Time (mean ± σ):     283.4 ms ±   6.7 ms    [User: 280.3 ms, System: 3.0 ms]
  Range (min … max):   277.0 ms … 294.9 ms    10 runs

Summary
  'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=0 build/unittests/Support/SupportTests' ran
    2.03 ± 0.09 times faster than 'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=1 build/unittests/Support/SupportTests'
```

Using `perf` I found that most of the runtime in `TrigramIndex::isDefinitelyOut()` comes from a division operation that seems to come from `std::unordered_map`: https://github.com/llvm/llvm-project/blob/8e1f820bb4eadf5c0704818f6063e0db1006e32d/llvm/include/llvm/Support/TrigramIndex.h#L62

Removing `TrigramIndex` will make it easier to potentially switch to using `GlobPattern` instead of a full regex for `SpecialCaseList`. See discussion in https://reviews.llvm.org/D152762 for details.

Reviewed By: MaskRay, #sanitizers, vitalybuka

Differential Revision: https://reviews.llvm.org/D153171
llvm/include/llvm/Support/SpecialCaseList.h
llvm/include/llvm/Support/TrigramIndex.h [deleted file]
llvm/lib/Support/CMakeLists.txt
llvm/lib/Support/SpecialCaseList.cpp
llvm/lib/Support/TrigramIndex.cpp [deleted file]
llvm/unittests/Support/CMakeLists.txt
llvm/unittests/Support/TrigramIndexTest.cpp [deleted file]
llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn
llvm/utils/gn/secondary/llvm/unittests/Support/BUILD.gn