AMDGPU: Use GlobalPriority for largest register tuples
author Matt Arsenault <Matthew.Arsenault@amd.com>
Sat, 23 Jul 2022 16:32:05 +0000 (12:32 -0400)
committer Matt Arsenault <Matthew.Arsenault@amd.com>
Thu, 15 Sep 2022 15:45:02 +0000 (11:45 -0400)
commit 69153d6c0a3f9110bc455b1cca28a5a71e2ac933
tree 85772419e50c6f657fb0a670cdbe5d12a0f67417
parent 3afd351b5fd9006932857a6daf42cbd1c79c4a22

Only do this for 16- and 32-register tuples, although we might want to
extend it to 8-register tuples.

It's incredibly expensive to spill these, and doing so severely
interferes with the ability to allocate anything else in the function.
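
To illustrate why the ordering matters, here is a minimal Python sketch of a
toy allocation queue. It is not LLVM's code: LiveRange, uses_global_priority
and allocation_order are hypothetical names, and the three-part priority key
is a simplification assumed for illustration. The only detail taken from this
commit is that 16- and 32-register tuples opt in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveRange:
    name: str
    num_regs: int   # tuple width in 32-bit registers
    size: int       # rough length of the live range
    is_local: bool  # True if the range stays inside one basic block


def uses_global_priority(num_regs: int) -> bool:
    # From the commit message: only 16- and 32-register tuples opt in.
    return num_regs in (16, 32)


def priority(lr: LiveRange, wide_tuples_global: bool) -> tuple:
    # Assumed toy key: (global tier, tuple width, range size).
    # A range lands in the global tier if it spans blocks, or if its
    # register class is flagged and the flag is honored.
    forced = wide_tuples_global and uses_global_priority(lr.num_regs)
    tier = 1 if (forced or not lr.is_local) else 0
    return (tier, lr.num_regs, lr.size)


def allocation_order(ranges, wide_tuples_global=True):
    # Highest priority is assigned first.
    ranked = sorted(ranges,
                    key=lambda lr: priority(lr, wide_tuples_global),
                    reverse=True)
    return [lr.name for lr in ranked]


example = [
    LiveRange("v1_global", 1, 40, False),
    LiveRange("v32_local", 32, 4, True),
    LiveRange("v8_local", 8, 4, True),
]
print(allocation_order(example, wide_tuples_global=True))
print(allocation_order(example, wide_tuples_global=False))
```

In this toy model, honoring the flag moves the short-lived 32-register tuple
ahead of the long single-register range, so the wide tuple is placed while
contiguous registers are still free instead of being spilled later.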

The lit tests show mostly sizeable improvements, with a handful of tiny
regressions in tests that use large vectors.
llvm/lib/Target/AMDGPU/SIRegisterInfo.td
llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement-stack-lower.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.large.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll
llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.ll
llvm/test/CodeGen/AMDGPU/load-constant-i16.ll
llvm/test/CodeGen/AMDGPU/mfma-no-register-aliasing.ll
llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll