aco: use p_parallelcopy for uniform reduction with zero source
author Rhys Perry <pendingchaos02@gmail.com>
Fri, 13 May 2022 12:39:01 +0000 (13:39 +0100)
committer Marge Bot <emma+marge@anholt.net>
Tue, 31 May 2022 18:07:34 +0000 (18:07 +0000)
I think v_mov_b32 was only used because a sub-dword p_parallelcopy
couldn't take constants on some gfx levels. That shouldn't be the case
anymore.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16595>
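
For readers unfamiliar with the function touched here, the arithmetic that the visible branches of emit_addition_uniform_reduce specialise can be modelled with a small standalone sketch. It is not ACO code: the function name uniform_add_reduce, its parameters, and the byte-width masking are hypothetical, chosen only to show why a zero source needs nothing more than a zero constant sized to the destination, and why a source of one reduces to a copy of the lane count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model, not part of Mesa/ACO.
// Adds a wave-uniform value `src` once per counted lane and truncates the
// result to a destination of `bytes` bytes (1, 2 or 4 in this sketch).
// `count` stands for whatever lane-count value the caller already computed.
uint32_t uniform_add_reduce(uint32_t src, uint32_t count, unsigned bytes)
{
    const uint32_t mask =
        bytes >= 4 ? 0xffffffffu : ((1u << (bytes * 8u)) - 1u);

    if (src == 0)
        return 0;                 // zero constant of the destination width
    if (src == 1)
        return count & mask;      // plain copy of the count
    return (src * count) & mask;  // general case: multiply by the constant
}
```

In this model the zero case never depends on the destination size, which is consistent with the commit collapsing the two zero branches into a single copy of a size-matched zero operand.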

src/amd/compiler/aco_instruction_selection.cpp

index 5b337f7..51285cf 100644
@@ -7725,10 +7725,8 @@ emit_addition_uniform_reduce(isel_context* ctx, nir_op op, Definition dst, nir_s
          bld.pseudo(aco_opcode::p_extract_vector, dst, count, Operand::zero());
       else if (nir_src_as_uint(src) == 1)
          bld.copy(dst, count);
-      else if (nir_src_as_uint(src) == 0 && dst.bytes() <= 2)
-         bld.vop1(aco_opcode::v_mov_b32, dst, Operand::zero()); /* RA will use SDWA if possible */
       else if (nir_src_as_uint(src) == 0)
-         bld.copy(dst, Operand::zero());
+         bld.copy(dst, Operand::zero(dst.bytes()));
       else if (count.type() == RegType::vgpr)
          bld.v_mul_imm(dst, count, nir_src_as_uint(src));
       else