aco: use v_fma_mix_f32 for v_fma_f32 with 2 fp16 representable, different literals
author: Georg Lehmann <dadschoorse@gmail.com>
Mon, 9 Jan 2023 11:32:56 +0000 (12:32 +0100)
committer: Marge Bot <emma+marge@anholt.net>
Thu, 2 Mar 2023 10:59:05 +0000 (10:59 +0000)
commit: ede0630f9e14614e036c57c2d4401f750b8398a2
tree: 222f5e8027b6392e2153054b310aea7f4ea09187
parent: ed349951cb78b59afa53a2fd4a2206ecc883e3bc

We can pack two fp16 literals into one 32-bit literal and use opsel to select
the correct value. Note that LLVM currently disassembles these instructions
incorrectly.
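
As a rough illustration of the packing step described above, here is a minimal
standalone sketch. It is not the code added to aco_optimizer.cpp; the helper
names exact_fp16_bits and pack_fp16_literals are hypothetical, and the sketch
assumes that subnormals, infinities and NaNs are simply rejected. It checks
that both fp32 constants are exactly representable in fp16 and, if so, places
one in the low and one in the high 16 bits of a single 32-bit literal.

```cpp
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical helper: returns the fp16 bit pattern of `f` if `f` is exactly
// representable as a normal fp16 value or as +/-0. Subnormals, infinities and
// NaNs are rejected to keep the sketch small (an assumption of this example).
static std::optional<uint16_t> exact_fp16_bits(float f)
{
   uint32_t bits;
   std::memcpy(&bits, &f, sizeof(bits));
   const uint32_t sign = (bits >> 16) & 0x8000u;
   const uint32_t biased_exp = (bits >> 23) & 0xffu;
   const uint32_t mantissa = bits & 0x7fffffu;

   // +/-0 converts exactly.
   if (biased_exp == 0 && mantissa == 0)
      return static_cast<uint16_t>(sign);

   // Normal fp16 exponents span [-14, 15]; this also rejects fp32
   // subnormals (exp == -127) and inf/NaN (exp == 128).
   const int exp = static_cast<int>(biased_exp) - 127;
   if (exp < -14 || exp > 15)
      return std::nullopt;

   // fp16 keeps 10 mantissa bits, so the low 13 fp32 bits must be zero.
   if (mantissa & 0x1fffu)
      return std::nullopt;

   const uint32_t half = sign | (static_cast<uint32_t>(exp + 15) << 10) | (mantissa >> 13);
   return static_cast<uint16_t>(half);
}

// Hypothetical helper: packs `lo` into bits [15:0] and `hi` into bits [31:16]
// of one 32-bit literal, or returns nothing if either value would lose
// precision in fp16.
static std::optional<uint32_t> pack_fp16_literals(float lo, float hi)
{
   const auto l = exact_fp16_bits(lo);
   const auto h = exact_fp16_bits(hi);
   if (!l || !h)
      return std::nullopt;
   return static_cast<uint32_t>(*l) | (static_cast<uint32_t>(*h) << 16);
}
```

For example, pack_fp16_literals(0.5f, 2.0f) yields 0x40003800. Both operands
of the instruction would then reference the same literal, with opsel choosing
the low half for one operand and the high half for the other.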

Foz-DB Navi21:
Totals from 13365 (9.91% of 134913) affected shaders:
VGPRs: 840880 -> 840016 (-0.10%); split: -0.11%, +0.01%
SpillSGPRs: 724 -> 722 (-0.28%)
CodeSize: 82439364 -> 82451336 (+0.01%); split: -0.06%, +0.08%
MaxWaves: 244858 -> 244980 (+0.05%)
Instrs: 15265976 -> 15247201 (-0.12%); split: -0.13%, +0.01%
Latency: 223316180 -> 223272495 (-0.02%); split: -0.03%, +0.02%
InvThroughput: 41981375 -> 41969917 (-0.03%); split: -0.04%, +0.01%
VClause: 266775 -> 266558 (-0.08%); split: -0.14%, +0.06%
SClause: 646602 -> 645996 (-0.09%); split: -0.16%, +0.07%
Copies: 794703 -> 776075 (-2.34%); split: -2.46%, +0.12%
Branches: 296317 -> 296316 (-0.00%)
PreSGPRs: 658796 -> 656479 (-0.35%); split: -0.35%, +0.00%
PreVGPRs: 744014 -> 743679 (-0.05%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20587>
src/amd/compiler/aco_optimizer.cpp