Don't copy propagate the constant in situations like
mov(8) g8<1>D 0x7fffffffD
mul(8) g16<1>D g8<8,8,1>D g15<16,8,2>W
On platforms that only have a 32x16 multiplier, this will result in
lowering the multiply to
mul(8) g15<1>D g14<8,8,1>D 0xffffUW
mul(8) g16<1>D g14<8,8,1>D 0x7fffUW
add(8) g15.1<2>UW g15.1<16,8,2>UW g16<16,8,2>UW
On Gfx8 and Gfx9, which have the full 32x32 multiplier, it results in
mul(8) g16<1>D g15<16,8,2>W 0x7fffffffD
Volume 2a of the Skylake PRM says:
When multiplying a DW and any lower precision integer, the
DW operand must on src0.
See also https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/104.
Previous to INTEL_shader_integer_functions2 (in Vulkan or OpenGL), I
don't think it would be possible to create a situation where this could
occur. I discovered this via some optimizations that can determine that
the non-constant source must be able to fit in 16-bits. The case listed
above came from piglit's "ext_transform_feedback-order arrays points"
with those optimizations in place.
No shader-db or fossil-db changes on any Intel platform.
Fixes:
de6c0f84879 ("intel/fs: Implement support for NIR opcodes for INTEL_shader_integer_functions2")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
inst->src[i] = val;
progress = true;
} else if (i == 0 && inst->src[1].file != IMM) {
+ /* Don't copy propagate the constant in situations like
+ *
+ * mov(8) g8<1>D 0x7fffffffD
+ * mul(8) g16<1>D g8<8,8,1>D g15<16,8,2>W
+ *
+ * On platforms that only have a 32x16 multiplier, this will
+ * result in lowering the multiply to
+ *
+ * mul(8) g15<1>D g14<8,8,1>D 0xffffUW
+ * mul(8) g16<1>D g14<8,8,1>D 0x7fffUW
+ * add(8) g15.1<2>UW g15.1<16,8,2>UW g16<16,8,2>UW
+ *
+ * On Gfx8 and Gfx9, which have the full 32x32 multiplier, it
+ * results in
+ *
+ * mul(8) g16<1>D g15<16,8,2>W 0x7fffffffD
+ *
+ * Volume 2a of the Skylake PRM says:
+ *
+ * When multiplying a DW and any lower precision integer, the
+ * DW operand must on src0.
+ */
+ if (inst->opcode == BRW_OPCODE_MUL &&
+ type_sz(inst->src[1].type) < 4 &&
+ type_sz(val.type) == 4)
+ break;
+
/* Fit this constant in by commuting the operands.
* Exception: we can't do this for 32-bit integer MUL/MACH
* because it's asymmetric.