[LoongArch] Align functions and loops better according to uarch
author WANG Xuerui <git@xen0n.name>
Wed, 19 Jul 2023 01:20:53 +0000 (09:20 +0800)
committer Weining Lu <luweining@loongson.cn>
Wed, 19 Jul 2023 03:25:57 +0000 (11:25 +0800)
commit f27017a06313455fa346f0aae95a27100202c728
tree 55e6075e60ebb3efce1e0204ecda6c95c6a963b2
parent 998f3e71c8e98cd6c3d6110fb7ca6e1700592e75

The LA464 micro-architecture is very sensitive to alignment of hot code,
with performance variation of up to ~12% in the go1 benchmark suite of
the Go language (as observed by me during my work on the Go loong64
port).
[[ https://go.dev/cl/479816 | Manual alignment of certain loops ]] and [[ https://go.dev/cl/479817 | automatic alignment of loop heads ]]
help a lot there, by reducing much of the random variation and
generally increasing performance, so we naturally want to do the same
here.

Practically, LA464 is the only LoongArch micro-architecture in wide use,
and we are currently supporting just that. The first "4" in "LA464"
stands for "4-issue"; in particular, its instruction fetch and decode
stages are 4-wide. With 4-byte instructions that is 16 bytes per fetch,
so functions and branch targets should preferably be aligned to at
least 16 bytes for best throughput.
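
To make the 16-byte figure concrete, here is a minimal standalone sketch of
the arithmetic. It assumes a simplified front end that fetches one naturally
aligned 16-byte window per cycle; that model, and the names `InsnBytes`,
`FetchWidth` and `usefulInsnsInFirstFetch`, are illustrative assumptions and
not taken from LA464 documentation or from this patch.

```cpp
#include <cassert>
#include <cstdint>

// LoongArch instructions are a fixed 4 bytes wide.
constexpr std::uint64_t InsnBytes = 4;
// "4-wide" fetch/decode, as described in the commit message.
constexpr std::uint64_t FetchWidth = 4;
// ASSUMPTION: each fetch reads one naturally aligned window of this size.
constexpr std::uint64_t FetchBytes = InsnBytes * FetchWidth; // 16

// Hypothetical helper: how many useful instructions the first fetch
// delivers after a branch to Target, under the assumed fetch model.
std::uint64_t usefulInsnsInFirstFetch(std::uint64_t Target) {
  assert(Target % InsnBytes == 0 && "instructions are 4-byte aligned");
  std::uint64_t Offset = Target % FetchBytes; // bytes wasted before Target
  return (FetchBytes - Offset) / InsnBytes;
}
```

Under this model a 16-byte-aligned target fills all four slots of its first
fetch, while a target at offset 12 within the window fills only one.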

The Loongson team has benchmarked various combinations of function,
loop, and branch target alignments with GCC.
[[ https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619980.html | The results ]]
show that "16-byte label alignment together with 32-byte function
alignment gives best results in terms of SPEC score". A "label" in GCC
means a branch target; while we don't currently align branch targets,
we do align loops, so in this patch we default to 32-byte function
alignment and 16-byte loop alignment.
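
As a rough illustration of what these defaults cost in code size, the
following standalone sketch rounds an address up to each alignment and counts
the 4-byte padding slots needed. The constants mirror the values stated
above; the helper names `alignUp` and `paddingSlots` are hypothetical and the
snippet does not use any LLVM API (in LLVM itself, such preferences are
normally registered via `TargetLoweringBase::setPrefFunctionAlignment` and
`setPrefLoopAlignment`).

```cpp
#include <cassert>
#include <cstdint>

// Defaults described in this commit message.
constexpr std::uint64_t PrefFunctionAlign = 32;
constexpr std::uint64_t PrefLoopAlign = 16;

// Hypothetical helper: round Addr up to a power-of-two Alignment.
std::uint64_t alignUp(std::uint64_t Addr, std::uint64_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0 &&
         "alignment must be a power of two");
  return (Addr + Alignment - 1) & ~(Alignment - 1);
}

// Hypothetical helper: number of 4-byte instruction slots of padding
// needed before Addr reaches the requested alignment.
std::uint64_t paddingSlots(std::uint64_t Addr, std::uint64_t Alignment) {
  return (alignUp(Addr, Alignment) - Addr) / 4;
}
```

For example, a loop head that would otherwise land at 0x1004 needs three
padding slots to reach 0x1010, and a function that would start at 0x1024
moves to 0x1040 at a cost of seven slots.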

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D148622
llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
llvm/lib/Target/LoongArch/LoongArchSubtarget.cpp
llvm/lib/Target/LoongArch/LoongArchSubtarget.h
llvm/test/CodeGen/LoongArch/atomicrmw-uinc-udec-wrap.ll
llvm/test/CodeGen/LoongArch/ir-instruction/atomicrmw-fp.ll
llvm/test/CodeGen/LoongArch/ir-instruction/br.ll
llvm/test/CodeGen/LoongArch/preferred-alignments.ll [new file with mode: 0644]