[Support] Add llvm::xxh3_64bits
ld.lld SHF_MERGE|SHF_STRINGS duplicate elimination is computation heavy
and utilitizes llvm::xxHash64, a simplified version of XXH64.
Externally many sources confirm that a new variant XXH3 is much faster.
I have picked a few hash implementations and computed the
proportion of time spent on hashing in the overall link time (a debug
build of clang 16 on a machine using AMD Zen 2 architecture):
* llvm::xxHash64: 3.63%
* official XXH64 (`#define XXH_VECTOR XXH_SCALAR`): 3.53%
* official XXH3_64bits (`#define XXH_VECTOR XXH_SCALAR`): 1.21%
* official XXH3_64bits (default, essentially `XXH_SSE2`): 1.22%
* this patch llvm::xxh3_64bits: 1.19%
The remaining part of lld remains unchanged. Consequently, a lower ratio
indicates that hashing is faster. Therefore, it is evident that XXH3 from xxhash
is significantly faster than both the official version and our llvm::xxHash64.
(
string length: count
1-3: 393434
4-8: 2084056
9-16: 2846249
17-128: 5598928
129-240: 1317989
241-: 328058
)
This patch adds heavily simplified https://github.com/Cyan4973/xxHash,
taking account of many simplification ideas from Devin Hussey's xxhash-clean.
Important x86-64 optimization ideas:
* Make XXH3_len_129to240_64b and XXH3_hashLong_64b noinline
* Unroll XXH3_len_17to128_64b
* __restrict does not affect Clang code generation
Beside SHF_MERGE|SHF_STRINGS duplicate elimination, llvm/ADT/StringMap.h
StringMapImpl::LookupBucketFor and a few places in lld can potentially be
accelerated by switching to llvm::xxh3_64bits.
Link: https://github.com/llvm/llvm-project/issues/63750
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D154812