RegAllocFast: Rewrite and improve
author Matt Arsenault <Matthew.Arsenault@amd.com>
Mon, 14 Sep 2020 16:48:12 +0000 (12:48 -0400)
committer Matt Arsenault <Matthew.Arsenault@amd.com>
Fri, 18 Sep 2020 18:05:18 +0000 (14:05 -0400)
commit c8757ff3aa7dd7a25a6343f6ef74a70c7be04325
tree de28f99cbc472fdec5ab282baf47485ac5e43db5
parent 870fd53e4f6357946f4bad0b861c510cd107420c

This rewrites big parts of the fast register allocator. The basic
strategy of doing block-local allocation hasn't changed but I tweaked
several details:

- Track register state on register units instead of physical
  registers. This simplifies and speeds up handling of register
  aliases.
- Process basic blocks in reverse order: when walking backwards, a
  definition is known to end a register's live range. (When walking
  forward, by contrast, a use may or may not be a kill, so we need
  heuristics.)
- Check register mask operands (calls) instead of conservatively
  assuming everything is clobbered.
- Enhance heuristics to detect killing uses: if a register has only a
  small number of defs/uses, check whether they are all in the same
  basic block; if so, the last one is a killing use.
- Enhance the heuristic for copy-coalescing through hinting: we check
  the first k defs of a register for COPYs rather than relying on
  there being just a single definition.

When testing this on the full llvm test-suite including SPEC
externals I measured:

- Average 5.1% reduction in code size for X86 and 4.9% reduction in
  code size on AArch64 (ranging between 0% and 20% depending on the
  test).
- 0.5% faster compile time (some analysis suggests the pass is slightly
  slower than before, but we more than make up for it because later
  passes are faster with the reduced instruction count).
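
The reverse-order walk described above can be illustrated with a toy
model. This is only a sketch and is not taken from RegAllocFast.cpp:
ToyInst and findKillsBackward are hypothetical names, and real machine
instructions carry far more state. It shows why scanning bottom-up makes
kills easy to find: the first time a register is seen from below is its
last use, and a definition ends the live range.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy instruction: at most one register defined and any
// number of registers used. Def == 0 means "defines nothing".
struct ToyInst {
  unsigned Def;
  std::vector<unsigned> Uses;
};

// Walk the block bottom-up. For each instruction, return the used
// registers for which that instruction is the last use (a kill).
// LiveOut lists the registers still needed after the block.
std::vector<std::set<unsigned>>
findKillsBackward(const std::vector<ToyInst> &Block,
                  const std::set<unsigned> &LiveOut) {
  std::vector<std::set<unsigned>> Kills(Block.size());
  std::set<unsigned> Live = LiveOut;
  for (std::size_t I = Block.size(); I-- > 0;) {
    const ToyInst &MI = Block[I];
    // Scanning backwards, a definition ends the register's live range.
    if (MI.Def != 0)
      Live.erase(MI.Def);
    for (unsigned Use : MI.Uses) {
      assert(Use != 0 && "register 0 is reserved for 'no def'");
      // Not live below this point, so this use is the last one.
      if (Live.insert(Use).second)
        Kills[I].insert(Use);
    }
  }
  return Kills;
}
```

For a block that defines register 1, then defines 2 from 1, then defines
3 from 1 and 2, the sketch reports both kills on the last instruction and
none on the earlier ones, without any forward-looking heuristic.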

Also adds a few testcases that were broken without this patch, in
particular bug 47278.
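
The register-unit tracking mentioned in the list of changes can also be
sketched in isolation. Again, this is an illustration under assumptions,
not the patch's code: ToyRegState, makeToyState and the register names
LO, HI, WIDE and OTHER are all invented here. The idea it demonstrates is
that when state is kept per unit, two registers alias exactly when they
share a unit, so no separate alias walk is needed.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical register file: every register is described by the list
// of register units it covers.
struct ToyRegState {
  std::map<std::string, std::vector<unsigned>> UnitsOf;
  std::vector<bool> UnitBusy; // one flag per register unit

  // A register is free only if none of its units is in use.
  bool isFree(const std::string &Reg) const {
    for (unsigned U : UnitsOf.at(Reg))
      if (UnitBusy[U])
        return false;
    return true;
  }

  // Assigning a register marks all of its units busy, which implicitly
  // blocks every overlapping register as well.
  void assign(const std::string &Reg) {
    assert(isFree(Reg) && "register overlaps one already in use");
    for (unsigned U : UnitsOf.at(Reg))
      UnitBusy[U] = true;
  }

  void release(const std::string &Reg) {
    for (unsigned U : UnitsOf.at(Reg))
      UnitBusy[U] = false;
  }
};

// Invented layout: WIDE overlaps both LO and HI; OTHER overlaps nothing.
ToyRegState makeToyState() {
  ToyRegState S;
  S.UnitsOf = {{"LO", {0}}, {"HI", {1}}, {"WIDE", {0, 1}}, {"OTHER", {2}}};
  S.UnitBusy.assign(3, false);
  return S;
}
```

After assigning LO in this model, WIDE is no longer free while HI and
OTHER still are; releasing LO makes WIDE available again.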

Patch mostly by Matthias Braun
184 files changed:
llvm/lib/CodeGen/RegAllocFast.cpp
llvm/test/CodeGen/AArch64/GlobalISel/darwin-tls-call-clobber.ll
llvm/test/CodeGen/AArch64/arm64-fast-isel-br.ll
llvm/test/CodeGen/AArch64/arm64-fast-isel-call.ll
llvm/test/CodeGen/AArch64/arm64-fast-isel-conversion-fallback.ll
llvm/test/CodeGen/AArch64/arm64-fast-isel-conversion.ll
llvm/test/CodeGen/AArch64/arm64-vcvt_f.ll
llvm/test/CodeGen/AArch64/arm64_32-fastisel.ll
llvm/test/CodeGen/AArch64/arm64_32-null.ll
llvm/test/CodeGen/AArch64/br-cond-not-merge.ll
llvm/test/CodeGen/AArch64/cmpxchg-O0.ll
llvm/test/CodeGen/AArch64/combine-loads.ll
llvm/test/CodeGen/AArch64/fast-isel-cmpxchg.ll
llvm/test/CodeGen/AArch64/popcount.ll
llvm/test/CodeGen/AArch64/swift-return.ll
llvm/test/CodeGen/AArch64/swifterror.ll
llvm/test/CodeGen/AArch64/unwind-preserved-from-mir.mir
llvm/test/CodeGen/AArch64/unwind-preserved.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/inline-asm.ll
llvm/test/CodeGen/AMDGPU/control-flow-fastregalloc.ll
llvm/test/CodeGen/AMDGPU/fast-ra-kills-vcc.mir [new file with mode: 0644]
llvm/test/CodeGen/AMDGPU/fastregalloc-illegal-subreg-physreg.mir [new file with mode: 0644]
llvm/test/CodeGen/AMDGPU/fastregalloc-self-loop-heuristic.mir
llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll
llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll
llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll
llvm/test/CodeGen/AMDGPU/reserve-vgpr-for-sgpr-spill.ll
llvm/test/CodeGen/AMDGPU/spill-agpr.mir
llvm/test/CodeGen/AMDGPU/spill-m0.ll
llvm/test/CodeGen/AMDGPU/spill192.mir
llvm/test/CodeGen/AMDGPU/unexpected-reg-unit-state.mir [new file with mode: 0644]
llvm/test/CodeGen/AMDGPU/wwm-reserved.ll
llvm/test/CodeGen/ARM/2010-08-04-StackVariable.ll
llvm/test/CodeGen/ARM/Windows/alloca.ll
llvm/test/CodeGen/ARM/cmpxchg-O0-be.ll
llvm/test/CodeGen/ARM/cmpxchg-O0.ll
llvm/test/CodeGen/ARM/crash-greedy-v6.ll
llvm/test/CodeGen/ARM/debug-info-blocks.ll
llvm/test/CodeGen/ARM/fast-isel-call.ll
llvm/test/CodeGen/ARM/fast-isel-intrinsic.ll
llvm/test/CodeGen/ARM/fast-isel-ldr-str-thumb-neg-index.ll
llvm/test/CodeGen/ARM/fast-isel-select.ll
llvm/test/CodeGen/ARM/fast-isel-vararg.ll
llvm/test/CodeGen/ARM/ldrd.ll
llvm/test/CodeGen/ARM/legalize-bitcast.ll
llvm/test/CodeGen/ARM/stack-guard-reassign.ll
llvm/test/CodeGen/ARM/swifterror.ll
llvm/test/CodeGen/ARM/thumb-big-stack.ll
llvm/test/CodeGen/Hexagon/vect/vect-load-v4i16.ll
llvm/test/CodeGen/Mips/Fast-ISel/callabi.ll
llvm/test/CodeGen/Mips/Fast-ISel/memtest1.ll
llvm/test/CodeGen/Mips/Fast-ISel/pr40325.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/add.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/add_vec.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/aggregate_struct_return.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/bitreverse.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/bitwise.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/branch.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/brindirect.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/bswap.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/call.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/ctlz.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/ctpop.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/cttz.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/dyn_stackalloc.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/fcmp.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/float_constants.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/fptosi_and_fptoui.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/global_address.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/global_address_pic.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/icmp.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/jump_table_and_brjt.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/load_4_unaligned.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/load_split_because_of_memsize_or_align.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/long_ambiguous_chain_s32.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/long_ambiguous_chain_s64.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/mul.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/mul_vec.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/phi.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/rem_and_div.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/select.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/sitofp_and_uitofp.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/store_4_unaligned.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/store_split_because_of_memsize_or_align.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/sub.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/sub_vec.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/test_TypeInfoforMF.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/var_arg.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/zextLoad_and_sextLoad.ll
llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/zext_and_sext.ll
llvm/test/CodeGen/Mips/atomic-min-max.ll
llvm/test/CodeGen/Mips/atomic.ll
llvm/test/CodeGen/Mips/atomic64.ll
llvm/test/CodeGen/Mips/atomicCmpSwapPW.ll
llvm/test/CodeGen/Mips/copy-fp64.ll
llvm/test/CodeGen/Mips/implicit-sret.ll
llvm/test/CodeGen/Mips/micromips-eva.mir
llvm/test/CodeGen/Mips/msa/ldr_str.ll
llvm/test/CodeGen/PowerPC/addegluecrash.ll
llvm/test/CodeGen/PowerPC/aggressive-anti-dep-breaker-subreg.ll
llvm/test/CodeGen/PowerPC/aix-overflow-toc.py
llvm/test/CodeGen/PowerPC/anon_aggr.ll
llvm/test/CodeGen/PowerPC/builtins-ppc-p10vsx.ll
llvm/test/CodeGen/PowerPC/elf-common.ll
llvm/test/CodeGen/PowerPC/fast-isel-pcrel.ll
llvm/test/CodeGen/PowerPC/fp-int128-fp-combine.ll
llvm/test/CodeGen/PowerPC/fp-strict-fcmp-noopt.ll
llvm/test/CodeGen/PowerPC/fp64-to-int16.ll
llvm/test/CodeGen/PowerPC/p9-vinsert-vextract.ll
llvm/test/CodeGen/PowerPC/popcount.ll
llvm/test/CodeGen/PowerPC/spill-nor0.ll
llvm/test/CodeGen/PowerPC/spill-nor0.mir [new file with mode: 0644]
llvm/test/CodeGen/PowerPC/stack-guard-reassign.ll
llvm/test/CodeGen/PowerPC/vsx-args.ll
llvm/test/CodeGen/PowerPC/vsx.ll
llvm/test/CodeGen/SPARC/fp16-promote.ll
llvm/test/CodeGen/SystemZ/swift-return.ll
llvm/test/CodeGen/SystemZ/swifterror.ll
llvm/test/CodeGen/Thumb2/LowOverheadLoops/branch-targets.ll
llvm/test/CodeGen/Thumb2/high-reg-spill.mir
llvm/test/CodeGen/Thumb2/mve-vector-spill.ll
llvm/test/CodeGen/X86/2009-04-14-IllegalRegs.ll
llvm/test/CodeGen/X86/2010-06-28-FastAllocTiedOperand.ll
llvm/test/CodeGen/X86/2013-10-14-FastISel-incorrect-vreg.ll
llvm/test/CodeGen/X86/atomic-monotonic.ll
llvm/test/CodeGen/X86/atomic-unordered.ll
llvm/test/CodeGen/X86/atomic32.ll
llvm/test/CodeGen/X86/atomic64.ll
llvm/test/CodeGen/X86/atomic6432.ll
llvm/test/CodeGen/X86/avx-load-store.ll
llvm/test/CodeGen/X86/avx512-mask-zext-bugfix.ll
llvm/test/CodeGen/X86/bug47278-eflags-error.mir [new file with mode: 0644]
llvm/test/CodeGen/X86/bug47278.mir [new file with mode: 0644]
llvm/test/CodeGen/X86/crash-O0.ll
llvm/test/CodeGen/X86/extend-set-cc-uses-dbg.ll
llvm/test/CodeGen/X86/fast-isel-cmp-branch.ll
llvm/test/CodeGen/X86/fast-isel-nontemporal.ll
llvm/test/CodeGen/X86/fast-isel-select-sse.ll
llvm/test/CodeGen/X86/fast-isel-select.ll
llvm/test/CodeGen/X86/fast-isel-x86-64.ll
llvm/test/CodeGen/X86/mixed-ptr-sizes-i686.ll
llvm/test/CodeGen/X86/mixed-ptr-sizes.ll
llvm/test/CodeGen/X86/phys-reg-local-regalloc.ll
llvm/test/CodeGen/X86/pr11415.ll
llvm/test/CodeGen/X86/pr1489.ll
llvm/test/CodeGen/X86/pr27591.ll
llvm/test/CodeGen/X86/pr30430.ll
llvm/test/CodeGen/X86/pr30813.ll
llvm/test/CodeGen/X86/pr32241.ll
llvm/test/CodeGen/X86/pr32284.ll
llvm/test/CodeGen/X86/pr32340.ll
llvm/test/CodeGen/X86/pr32345.ll
llvm/test/CodeGen/X86/pr32451.ll
llvm/test/CodeGen/X86/pr32484.ll
llvm/test/CodeGen/X86/pr34592.ll
llvm/test/CodeGen/X86/pr34653.ll
llvm/test/CodeGen/X86/pr39733.ll
llvm/test/CodeGen/X86/pr42452.ll
llvm/test/CodeGen/X86/pr44749.ll
llvm/test/CodeGen/X86/pr47000.ll
llvm/test/CodeGen/X86/regalloc-fast-missing-live-out-spill.mir
llvm/test/CodeGen/X86/stack-protector-msvc.ll
llvm/test/CodeGen/X86/stack-protector-strong-macho-win32-xor.ll
llvm/test/CodeGen/X86/swift-return.ll
llvm/test/CodeGen/X86/swifterror.ll
llvm/test/CodeGen/X86/volatile.ll
llvm/test/CodeGen/X86/win64_eh.ll
llvm/test/CodeGen/X86/x86-32-intrcc.ll
llvm/test/CodeGen/X86/x86-64-intrcc.ll
llvm/test/DebugInfo/AArch64/frameindices.ll
llvm/test/DebugInfo/AArch64/prologue_end.ll
llvm/test/DebugInfo/ARM/prologue_end.ll
llvm/test/DebugInfo/Mips/delay-slot.ll
llvm/test/DebugInfo/Mips/prologue_end.ll
llvm/test/DebugInfo/X86/dbg-declare-arg.ll
llvm/test/DebugInfo/X86/fission-ranges.ll
llvm/test/DebugInfo/X86/op_deref.ll
llvm/test/DebugInfo/X86/parameters.ll
llvm/test/DebugInfo/X86/pieces-1.ll
llvm/test/DebugInfo/X86/prologue-stack.ll
llvm/test/DebugInfo/X86/reference-argument.ll
llvm/test/DebugInfo/X86/spill-indirect-nrvo.ll
llvm/test/DebugInfo/X86/sret.ll
llvm/test/DebugInfo/X86/subreg.ll