[AMDGPU] Enable load clustering in the post-RA scheduler
authorJay Foad <jay.foad@amd.com>
Tue, 12 Oct 2021 14:39:43 +0000 (15:39 +0100)
committerJay Foad <jay.foad@amd.com>
Wed, 13 Oct 2021 16:12:26 +0000 (17:12 +0100)
commitc885857e9d03dbf2563c43e8a0a99b3f63a4d106
tree02edcb0899d5f4d69c88aecb4927236587ed7197
parent08c8016cfb2af9463514709271ae8c4ad6b19377
[AMDGPU] Enable load clustering in the post-RA scheduler

This has a couple of benefits:
1. It can sometimes fix clusters that got broken apart when the register
   allocator inserted a copy.
2. Post-RA scheduling does not have to worry about increasing register
   pressure, which in some cases gives it more freedom to reorder
   instructions.

Testing on a collection of 10,000 graphics shaders compiled for gfx1010
showed:
- The average length of each run of one or more load instructions
  increased by about 1%.
- The number of runs of two or more load instructions increased by
  about 4%.

Differential Revision: https://reviews.llvm.org/D111646
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll
llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
llvm/test/CodeGen/AMDGPU/idiv-licm.ll
llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll
llvm/test/CodeGen/AMDGPU/sdiv64.ll
llvm/test/CodeGen/AMDGPU/srem64.ll
llvm/test/CodeGen/AMDGPU/udiv64.ll
llvm/test/CodeGen/AMDGPU/urem64.ll