[libc] Improve memcmp latency and codegen
author Guillaume Chatelet <gchatelet@google.com>
Mon, 12 Jun 2023 13:30:50 +0000 (13:30 +0000)
committer Guillaume Chatelet <gchatelet@google.com>
Mon, 12 Jun 2023 13:47:16 +0000 (13:47 +0000)
commit 5e32765c15ab8df3d2635a2bb5078c5b1d5714d5
tree ded5fbb7c27ef58a8cdb8d7b8424add7fa9d6ac2
parent aa28875a745d4d70f491b59e1b50f592994c923b

This is based on ideas from @nafi to:
 - use a branchless version of 'cmp' for 'uint32_t',
 - completely resolve the lexicographic comparison through vector
   operations when wide types are available. We also get rid of byte
   reloads and serializing '__builtin_ctzll'.
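
As an illustration of the first idea, here is a minimal sketch of a branchless
4-byte comparison. It is not the code from this patch: the helper names
'load_be32_sketch', 'cmp_u32_sketch' and 'cmp4_sketch' are hypothetical, and
the '(a > b) - (a < b)' idiom is just one common way to get a three-way
result without a branch. The key point is that loading both words in
big-endian order makes a single integer comparison agree with byte-wise
lexicographic order, so no byte needs to be reloaded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: assemble 4 bytes into a big-endian uint32_t.
// Written byte by byte so it is independent of host endianness; optimizing
// compilers typically turn this into one load plus a byte swap.
static inline uint32_t load_be32_sketch(const void *p) {
  const unsigned char *b = static_cast<const unsigned char *>(p);
  return (static_cast<uint32_t>(b[0]) << 24) |
         (static_cast<uint32_t>(b[1]) << 16) |
         (static_cast<uint32_t>(b[2]) << 8) |
         static_cast<uint32_t>(b[3]);
}

// Hypothetical helper: three-way compare with no conditional branch in the
// source. Each comparison yields 0 or 1, so the result is -1, 0 or 1.
static inline int cmp_u32_sketch(uint32_t a, uint32_t b) {
  return static_cast<int>(a > b) - static_cast<int>(a < b);
}

// memcmp-like comparison of exactly 4 bytes.
static inline int cmp4_sketch(const void *p1, const void *p2) {
  return cmp_u32_sketch(load_be32_sketch(p1), load_be32_sketch(p2));
}

// Small usage example; the sign of the result matches what memcmp returns.
static inline void cmp4_sketch_demo() {
  assert(cmp4_sketch("abcd", "abce") < 0);
  assert(cmp4_sketch("abcd", "abcd") == 0);
  assert(cmp4_sketch("abce", "abcd") > 0);
}
```

Whether the compiler actually emits branch-free instructions for this depends
on the target and optimization level, so the generated code is worth checking.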

I did not include the suggestion to replace comparisons of 'uint16_t'
with two 'uint8_t' as it did not seem to help the codegen. This can
be revisited in subsequent patches.

The code has been rewritten to reduce nested function calls, making the
job of the inliner easier and preventing harmful code duplication.

Reviewed By: nafi3000

Differential Revision: https://reviews.llvm.org/D148717
16 files changed:
libc/src/__support/macros/properties/architectures.h
libc/src/string/CMakeLists.txt
libc/src/string/memory_utils/CMakeLists.txt
libc/src/string/memory_utils/aarch64/memcmp_implementations.h
libc/src/string/memory_utils/bcmp_implementations.h
libc/src/string/memory_utils/memcmp_implementations.h
libc/src/string/memory_utils/memmove_implementations.h
libc/src/string/memory_utils/memset_implementations.h
libc/src/string/memory_utils/op_aarch64.h
libc/src/string/memory_utils/op_generic.h
libc/src/string/memory_utils/op_riscv.h [new file with mode: 0644]
libc/src/string/memory_utils/op_x86.h
libc/src/string/memory_utils/utils.h
libc/src/string/memory_utils/x86_64/memcmp_implementations.h
libc/test/src/string/memory_utils/op_tests.cpp
utils/bazel/llvm-project-overlay/libc/BUILD.bazel