From db204121686726c74dd0aba2d1c1790d40e7baba Mon Sep 17 00:00:00 2001
From: Ian Romanick <ian.d.romanick@intel.com>
Date: Tue, 8 Feb 2022 13:26:13 -0800
Subject: [PATCH] intel/fs: Fix constant propagation into 32x16 integer
 multiplication

Don't copy propagate the constant in situations like

    mov(8)          g8<1>D          0x7fffffffD
    mul(8)          g16<1>D         g8<8,8,1>D      g15<16,8,2>W

On platforms that only have a 32x16 multiplier, this will result in
lowering the multiply to

    mul(8)          g15<1>D         g14<8,8,1>D     0xffffUW
    mul(8)          g16<1>D         g14<8,8,1>D     0x7fffUW
    add(8)          g15.1<2>UW      g15.1<16,8,2>UW g16<16,8,2>UW

On Gfx8 and Gfx9, which have the full 32x32 multiplier, it results in

    mul(8)          g16<1>D         g15<16,8,2>W    0x7fffffffD

Volume 2a of the Skylake PRM says:

    When multiplying a DW and any lower precision integer, the
    DW operand must on src0.

See also https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/104.

Previous to INTEL_shader_integer_functions2 (in Vulkan or OpenGL), I
don't think it would be possible to create a situation where this could
occur.  I discovered this via some optimizations that can determine that
the non-constant source must be able to fit in 16-bits.  The case listed
above came from piglit's "ext_transform_feedback-order arrays points"
with those optimizations in place.

No shader-db or fossil-db changes on any Intel platform.

Fixes: de6c0f84879 ("intel/fs: Implement support for NIR opcodes for INTEL_shader_integer_functions2")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
---
 src/intel/compiler/brw_fs_copy_propagation.cpp | 27 ++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp b/src/intel/compiler/brw_fs_copy_propagation.cpp
index 8535b31..9819b5d 100644
--- a/src/intel/compiler/brw_fs_copy_propagation.cpp
+++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
@@ -836,6 +836,33 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
             inst->src[i] = val;
             progress = true;
          } else if (i == 0 && inst->src[1].file != IMM) {
+            /* Don't copy propagate the constant in situations like
+             *
+             *    mov(8)          g8<1>D          0x7fffffffD
+             *    mul(8)          g16<1>D         g8<8,8,1>D      g15<16,8,2>W
+             *
+             * On platforms that only have a 32x16 multiplier, this will
+             * result in lowering the multiply to
+             *
+             *    mul(8)          g15<1>D         g14<8,8,1>D     0xffffUW
+             *    mul(8)          g16<1>D         g14<8,8,1>D     0x7fffUW
+             *    add(8)          g15.1<2>UW      g15.1<16,8,2>UW g16<16,8,2>UW
+             *
+             * On Gfx8 and Gfx9, which have the full 32x32 multiplier, it
+             * results in
+             *
+             *    mul(8)          g16<1>D         g15<16,8,2>W    0x7fffffffD
+             *
+             * Volume 2a of the Skylake PRM says:
+             *
+             *    When multiplying a DW and any lower precision integer, the
+             *    DW operand must on src0.
+             */
+            if (inst->opcode == BRW_OPCODE_MUL &&
+                type_sz(inst->src[1].type) < 4 &&
+                type_sz(val.type) == 4)
+               break;
+
             /* Fit this constant in by commuting the operands.
              * Exception: we can't do this for 32-bit integer MUL/MACH
              * because it's asymmetric.
-- 
2.7.4