AMDGPU: Reserve v32 if we may need to copy between AGPRs on gfx908
author Matt Arsenault <Matthew.Arsenault@amd.com>
Wed, 15 Dec 2021 02:56:48 +0000 (21:56 -0500)
committer Matt Arsenault <Matthew.Arsenault@amd.com>
Tue, 8 Feb 2022 16:14:52 +0000 (11:14 -0500)
commit 8b2ca766f0e58a2a094a4dffbf5b035d584ef475
tree 2fd78a9992ee519e7c06d7d45b3f0d824ec0c3d3
parent a7f60bfdf663e1e4092c885139623ea682c73823

We need to guarantee cheap copies between AGPRs, and unfortunately
gfx908 cannot directly do this. Theoretically we could set the
scavenger up with an emergency spill slot, but it also feels
unreasonable to pay that cost for what was assumed to be a simple and
cheap copy. Pick a register that doesn't conflict with any ABI
registers.
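
To make the copy sequence concrete, here is a minimal sketch of the expansion
described above. It is a toy model, not the code in SIInstrInfo.cpp: the
function name expandAGPRCopy, the constant ReservedCopyVGPR and the flag
HasDirectAGPRCopy are all hypothetical, and the real lowering may handle
more cases than this.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model; not LLVM's actual copy lowering.
// Index of the VGPR this change reserves as the copy intermediate.
constexpr unsigned ReservedCopyVGPR = 32;

// Returns the assembly needed to copy AGPR `Src` into AGPR `Dst`.
// `HasDirectAGPRCopy` (assumed flag name) is false for gfx908, which has
// no AGPR-to-AGPR move, and true for gfx90a.
std::vector<std::string> expandAGPRCopy(unsigned Dst, unsigned Src,
                                        bool HasDirectAGPRCopy) {
  assert(Dst < 256 && Src < 256 && "AGPR index out of range");
  const std::string D = "a" + std::to_string(Dst);
  const std::string S = "a" + std::to_string(Src);

  if (HasDirectAGPRCopy)
    return {"v_accvgpr_mov_b32 " + D + ", " + S};

  // gfx908: bounce the value through the reserved VGPR.
  const std::string Tmp = "v" + std::to_string(ReservedCopyVGPR);
  return {"v_accvgpr_read_b32 " + Tmp + ", " + S,
          "v_accvgpr_write_b32 " + D + ", " + Tmp};
}
```

In this model, expandAGPRCopy(0, 1, false) yields a read of a1 into v32
followed by a write of v32 into a0, so no scavenging or spill slot is needed
at the point of the copy.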

This does not address the same issue when copying from SGPR to AGPR
for gfx90a (this coincidentally fixes it for gfx908), but that's less
interesting since the register allocator shouldn't be proactively
introducing such copies.

One edge case I'm worried about is respecting the VGPR budget implied
by amdgpu-waves-per-eu. If the theoretical upper bound of a function
is 32 VGPRs, this will force the actual count to be 33.
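
The arithmetic behind that edge case can be sketched as follows, assuming the
reported VGPR count is the highest register index in use plus one. The helper
vgprCountWithReserved is hypothetical and only illustrates the off-by-one; it
is not how the backend computes register usage.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper; illustrates the budget arithmetic only.
// `Budget` is the number of VGPRs the function may allocate (v0..v(Budget-1)).
// `ReservedIndex` is the index of the always-reserved VGPR (32 for v32).
unsigned vgprCountWithReserved(unsigned Budget, unsigned ReservedIndex) {
  assert(ReservedIndex < 256 && "VGPR index out of range");
  // Assumption: the count covers every index up to the highest one touched,
  // so a reserved register above the budget raises the count.
  return std::max(Budget, ReservedIndex + 1);
}
```

With a budget of 32 (v0 through v31) and v32 reserved, this gives 33; with a
budget of 64 the reservation fits inside the budget and the count stays 64.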

This is also broken if inline assembly uses/defs something in v32. The
coalescer will eliminate the intermediate vreg between the def and
use, and the introduced copy will clobber the user value.

(cherry picked from commit 3335784ac2d587ff4eac04586e189532ae8b2607)
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
llvm/test/CodeGen/AMDGPU/accvgpr-copy.mir
llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll [new file with mode: 0644]
llvm/test/CodeGen/AMDGPU/agpr-copy-no-vgprs.mir
llvm/test/CodeGen/AMDGPU/agpr-copy-sgpr-no-vgprs.mir
llvm/test/CodeGen/AMDGPU/alloc-aligned-tuples-gfx908.mir
llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs-packed.ll