[AMDGPU] Improve v_cmpx usage on GFX10.3.
author Thomas Symalla <thomas.symalla@amd.com>
Tue, 1 Feb 2022 09:28:18 +0000 (10:28 +0100)
committer Thomas Symalla <thomas.symalla@amd.com>
Fri, 25 Mar 2022 10:40:18 +0000 (11:40 +0100)
commit 718aec209c891487294d8a6199cf12c796c6e901
tree 2e32001a85b0eb0564ee820fa37cf4872d0bab32
parent e699b5da445207d0957839aad139ee62f31cb0f5

On GFX10.3 targets, the following instruction sequence

v_cmp_* SGPR, ...
s_and_saveexec ..., SGPR

leads to a fairly long stall: the VALU writes an SGPR and the following SALU
instruction has to wait for that SGPR.

An equivalent sequence saves the exec mask manually, instead of letting
s_and_saveexec do the work, and uses a v_cmpx instruction to do the
comparison.
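
To illustrate why the two sequences can be treated as equivalent, here is a
minimal C++ sketch of their effect on a wave32 exec mask. It is only a toy
model, not code from this patch. The names WaveState, runCmpThenSaveexec and
runSaveThenCmpx are hypothetical. The model assumes that v_cmp writes zero for
inactive lanes, that v_cmpx writes its result straight to exec, and that the
compare result in the SGPR is not needed afterwards.

```cpp
#include <cassert>
#include <cstdint>

// Toy wave32 state: only the registers the two sequences touch.
struct WaveState {
    uint32_t exec;   // execution mask, one bit per lane
    uint32_t cond;   // SGPR written by v_cmp (original sequence only)
    uint32_t saved;  // SGPR that receives the saved exec mask
};

// laneResults: bit i is set if lane i would compare true.

// Original sequence: v_cmp_* SGPR, ... ; s_and_saveexec ..., SGPR
inline WaveState runCmpThenSaveexec(uint32_t exec, uint32_t laneResults) {
    WaveState s{exec, 0, 0};
    s.cond = laneResults & s.exec;  // v_cmp: inactive lanes assumed to yield 0
    s.saved = s.exec;               // s_and_saveexec: save the old exec mask
    s.exec = s.cond & s.exec;       // ... and AND the condition into exec
    return s;
}

// Rewritten sequence: manual save of exec, then v_cmpx.
inline WaveState runSaveThenCmpx(uint32_t exec, uint32_t laneResults) {
    WaveState s{exec, 0, 0};
    s.saved = s.exec;               // manual save of exec
    s.exec = laneResults & s.exec;  // v_cmpx: result assumed to go to exec
    return s;
}
```

Both functions leave the same values in exec and saved for any input. They
differ only in cond, which the rewritten sequence never writes, so the model
only supports the rewrite when that SGPR has no other reader.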

This patch modifies the SIOptimizeExecMasking pass as this is the last position
where s_and_saveexec instructions are inserted. It does the transformation by
trying to find the pattern, extracting the operands and generating the new
instruction sequence.
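
The find / extract / generate steps can be sketched on a toy instruction list.
This is not the code in SIOptimizeExecMasking.cpp, which works on LLVM machine
instructions. Op, Inst, readLater and rewriteToVCmpx are hypothetical names,
and the sketch assumes for simplicity that the v_cmp sits directly in front of
the s_and_saveexec.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR, not LLVM's MachineInstr.
enum class Op { VCmp, VCmpx, SAndSaveexec, SMovExec, Other };

struct Inst {
    Op op;
    std::string dst;                // register written ("" if none)
    std::vector<std::string> srcs;  // registers read
};

// True if any instruction at index >= from reads reg.
inline bool readLater(const std::vector<Inst> &block, std::size_t from,
                      const std::string &reg) {
    for (std::size_t k = from; k < block.size(); ++k)
        for (const std::string &s : block[k].srcs)
            if (s == reg) return true;
    return false;
}

// Rewrites   VCmp C, a, b ; SAndSaveexec S, C
// into       SMovExec S   ; VCmpx a, b
// and returns the number of rewrites performed.
inline int rewriteToVCmpx(std::vector<Inst> &block) {
    int rewrites = 0;
    for (std::size_t i = 1; i < block.size(); ++i) {
        Inst &cmp = block[i - 1];
        Inst &save = block[i];
        // Step 1: find the pattern.
        if (cmp.op != Op::VCmp || save.op != Op::SAndSaveexec) continue;
        if (save.srcs.size() != 1 || save.srcs[0] != cmp.dst) continue;
        // Skip if the compare result is still read afterwards.
        if (readLater(block, i + 1, cmp.dst)) continue;
        // Step 2: extract the operands.
        Inst mov{Op::SMovExec, save.dst, {}};
        Inst cmpx{Op::VCmpx, "", cmp.srcs};
        // Step 3: generate the new sequence in place.
        cmp = mov;
        save = cmpx;
        ++rewrites;
    }
    return rewrites;
}
```

Calling rewriteToVCmpx on a block that holds a VCmp followed by a matching
SAndSaveexec replaces the pair and returns 1. Blocks without the pattern, or
where the compare result is read later, are left unchanged.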

It also changes some existing lit tests and introduces a few new tests to show
the changed behavior on GFX10.3 targets.

Same as D119696 including a buildbot and MIR test fix.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D122332
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
llvm/lib/Target/AMDGPU/SIInstrInfo.h
llvm/lib/Target/AMDGPU/SIInstrInfo.td
llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
llvm/lib/Target/AMDGPU/VOPCInstructions.td
llvm/test/CodeGen/AMDGPU/branch-relaxation-gfx10-branch-offset-bug.ll
llvm/test/CodeGen/AMDGPU/vcmp-saveexec-to-vcmpx.ll [new file with mode: 0644]
llvm/test/CodeGen/AMDGPU/vcmp-saveexec-to-vcmpx.mir [new file with mode: 0644]
llvm/test/CodeGen/AMDGPU/wqm.ll