intel/fs: Lower integer multiply correctly when destination stride equals 4.
authorFrancisco Jerez <currojerez@riseup.net>
Thu, 17 Jan 2019 03:01:04 +0000 (19:01 -0800)
committerFrancisco Jerez <currojerez@riseup.net>
Thu, 21 Feb 2019 22:07:25 +0000 (14:07 -0800)
Because the "low" temporary needs to be accessed with word type and
twice the original stride, attempting to preserve the alignment of the
original destination can potentially lead to instructions with illegal
destination stride greater than four.  Because the CHV/BXT alignment
restrictions are now being enforced by the regioning lowering pass run
after lower_integer_multiplication(), there is no real need to
preserve the original strides anymore.

Note that this bug can be reproduced on stable branches, but
back-porting would be non-trivial, because the fix relies on the
regioning lowering pass recently introduced.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
src/intel/compiler/brw_fs.cpp

index 4a8f8ea..b4730c3 100644 (file)
@@ -3996,18 +3996,22 @@ fs_visitor::lower_integer_multiplication()
 
             bool needs_mov = false;
             fs_reg orig_dst = inst->dst;
+
+            /* Get a new VGRF for the "low" 32x16-bit multiplication result if
+             * reusing the original destination is impossible due to hardware
+             * restrictions, source/destination overlap, or it being the null
+             * register.
+             */
             fs_reg low = inst->dst;
             if (orig_dst.is_null() || orig_dst.file == MRF ||
                 regions_overlap(inst->dst, inst->size_written,
                                 inst->src[0], inst->size_read(0)) ||
                 regions_overlap(inst->dst, inst->size_written,
-                                inst->src[1], inst->size_read(1))) {
+                                inst->src[1], inst->size_read(1)) ||
+                inst->dst.stride >= 4) {
                needs_mov = true;
-               /* Get a new VGRF but keep the same stride as inst->dst */
                low = fs_reg(VGRF, alloc.allocate(regs_written(inst)),
                             inst->dst.type);
-               low.stride = inst->dst.stride;
-               low.offset = inst->dst.offset % REG_SIZE;
             }
 
             /* Get a new VGRF but keep the same stride as inst->dst */