[AArch64] Pattern-match SVE extending gather loads
authorRichard Sandiford <richard.sandiford@arm.com>
Sat, 16 Nov 2019 11:26:11 +0000 (11:26 +0000)
committerRichard Sandiford <rsandifo@gcc.gnu.org>
Sat, 16 Nov 2019 11:26:11 +0000 (11:26 +0000)
This patch pattern-matches a partial gather load followed by a sign or
zero extension into an extending gather load.  (The partial gather load
is already an extending load; we just don't rely on the upper bits of
the elements.)

2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
* config/aarch64/iterators.md (SVE_2BHSI, SVE_2HSDI, SVE_4BHI)
(SVE_4HSI): New mode iterators.
(ANY_EXTEND2): New code iterator.
* config/aarch64/aarch64-sve.md
(@aarch64_gather_load_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>):
Extend to...
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>):
...this, handling extension to partial modes as well as full modes.
Describe the extension as a predicated rather than unpredicated
extension.
(@aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>):
Likewise extend to...
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>):
...this, making the same adjustments.
(*aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_sxtw):
Likewise extend to...
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>_sxtw)
...this, making the same adjustments.
(*aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_uxtw):
Likewise extend to...
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>_uxtw)
...this, making the same adjustments.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked):
New pattern.
(*aarch64_ldff1_gather<mode>_sxtw): Canonicalize to a constant
extension predicate.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>)
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>)
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_uxtw):
Describe the extension as a predicated rather than unpredicated
extension.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_sxtw):
Likewise.  Canonicalize to a constant extension predicate.
* config/aarch64/aarch64-sve-builtins-base.cc
(svld1_gather_extend_impl::expand): Add an extra predicate for
the extension.
(svldff1_gather_extend_impl::expand): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/gather_load_extend_1.c: New test.
* gcc.target/aarch64/sve/gather_load_extend_2.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_3.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_4.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_5.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_6.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_7.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_8.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_9.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_10.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_11.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_12.c: Likewise.

From-SVN: r278346

17 files changed:
gcc/ChangeLog
gcc/config/aarch64/aarch64-sve-builtins-base.cc
gcc/config/aarch64/aarch64-sve.md
gcc/config/aarch64/iterators.md
gcc/testsuite/ChangeLog
gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_1.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_10.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_11.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_12.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_2.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_3.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_4.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_5.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_6.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_7.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_8.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_9.c [new file with mode: 0644]

index 1253306..78a79ae 100644 (file)
@@ -1,5 +1,45 @@
 2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>
 
+       * config/aarch64/iterators.md (SVE_2BHSI, SVE_2HSDI, SVE_4BHI)
+       (SVE_4HSI): New mode iterators.
+       (ANY_EXTEND2): New code iterator.
+       * config/aarch64/aarch64-sve.md
+       (@aarch64_gather_load_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>):
+       Extend to...
+       (@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>):
+       ...this, handling extension to partial modes as well as full modes.
+       Describe the extension as a predicated rather than unpredicated
+       extension.
+       (@aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>):
+       Likewise extend to...
+       (@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>):
+       ...this, making the same adjustments.
+       (*aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_sxtw):
+       Likewise extend to...
+       (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>_sxtw)
+       ...this, making the same adjustments.
+       (*aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_uxtw):
+       Likewise extend to...
+       (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>_uxtw)
+       ...this, making the same adjustments.
+       (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked):
+       New pattern.
+       (*aarch64_ldff1_gather<mode>_sxtw): Canonicalize to a constant
+       extension predicate.
+       (@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>)
+       (@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>)
+       (*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_uxtw):
+       Describe the extension as a predicated rather than unpredicated
+       extension.
+       (*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_sxtw):
+       Likewise.  Canonicalize to a constant extension predicate.
+       * config/aarch64/aarch64-sve-builtins-base.cc
+       (svld1_gather_extend_impl::expand): Add an extra predicate for
+       the extension.
+       (svldff1_gather_extend_impl::expand): Likewise.
+
+2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>
+
        * config/aarch64/iterators.md (SVE_24, SVE_2, SVE_4): New mode
        iterators.
        * config/aarch64/aarch64-sve.md
index e12882f..52166c4 100644 (file)
@@ -1097,6 +1097,8 @@ public:
     /* Put the predicate last, since the extending gathers use the same
        operand order as mask_gather_load_optab.  */
     e.rotate_inputs_left (0, 5);
+    /* Add a constant predicate for the extension rtx.  */
+    e.args.quick_push (CONSTM1_RTX (VNx16BImode));
     insn_code icode = code_for_aarch64_gather_load (extend_rtx_code (),
                                                    e.vector_mode (0),
                                                    e.memory_vector_mode ());
@@ -1234,6 +1236,8 @@ public:
     /* Put the predicate last, since ldff1_gather uses the same operand
        order as mask_gather_load_optab.  */
     e.rotate_inputs_left (0, 5);
+    /* Add a constant predicate for the extension rtx.  */
+    e.args.quick_push (CONSTM1_RTX (VNx16BImode));
     insn_code icode = code_for_aarch64_ldff1_gather (extend_rtx_code (),
                                                     e.vector_mode (0),
                                                     e.memory_vector_mode ());
index e26ac45..1dcbb4b 100644 (file)
 
 ;; Predicated extending gather loads for 32-bit elements.  Operand 3 is
 ;; true for unsigned extension and false for signed extension.
-(define_insn "@aarch64_gather_load_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>"
-  [(set (match_operand:VNx4_WIDE 0 "register_operand" "=w, w, w, w, w, w")
-       (ANY_EXTEND:VNx4_WIDE
-         (unspec:VNx4_NARROW
-           [(match_operand:VNx4BI 5 "register_operand" "Upl, Upl, Upl, Upl, Upl, Upl")
-            (match_operand:DI 1 "aarch64_sve_gather_offset_<VNx4_NARROW:Vesize>" "Z, vg<VNx4_NARROW:Vesize>, rk, rk, rk, rk")
-            (match_operand:VNx4_WIDE 2 "register_operand" "w, w, w, w, w, w")
-            (match_operand:DI 3 "const_int_operand" "Ui1, Ui1, Z, Ui1, Z, Ui1")
-            (match_operand:DI 4 "aarch64_gather_scale_operand_<VNx4_NARROW:Vesize>" "Ui1, Ui1, Ui1, Ui1, i, i")
-            (mem:BLK (scratch))]
-           UNSPEC_LD1_GATHER)))]
-  "TARGET_SVE"
-  "@
-   ld1<ANY_EXTEND:s><VNx4_NARROW:Vesize>\t%0.s, %5/z, [%2.s]
-   ld1<ANY_EXTEND:s><VNx4_NARROW:Vesize>\t%0.s, %5/z, [%2.s, #%1]
-   ld1<ANY_EXTEND:s><VNx4_NARROW:Vesize>\t%0.s, %5/z, [%1, %2.s, sxtw]
-   ld1<ANY_EXTEND:s><VNx4_NARROW:Vesize>\t%0.s, %5/z, [%1, %2.s, uxtw]
-   ld1<ANY_EXTEND:s><VNx4_NARROW:Vesize>\t%0.s, %5/z, [%1, %2.s, sxtw %p4]
-   ld1<ANY_EXTEND:s><VNx4_NARROW:Vesize>\t%0.s, %5/z, [%1, %2.s, uxtw %p4]"
+(define_insn_and_rewrite "@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>"
+  [(set (match_operand:SVE_4HSI 0 "register_operand" "=w, w, w, w, w, w")
+       (unspec:SVE_4HSI
+         [(match_operand:VNx4BI 6 "general_operand" "UplDnm, UplDnm, UplDnm, UplDnm, UplDnm, UplDnm")
+          (ANY_EXTEND:SVE_4HSI
+            (unspec:SVE_4BHI
+              [(match_operand:VNx4BI 5 "register_operand" "Upl, Upl, Upl, Upl, Upl, Upl")
+               (match_operand:DI 1 "aarch64_sve_gather_offset_<SVE_4BHI:Vesize>" "Z, vg<SVE_4BHI:Vesize>, rk, rk, rk, rk")
+               (match_operand:VNx4SI 2 "register_operand" "w, w, w, w, w, w")
+               (match_operand:DI 3 "const_int_operand" "Ui1, Ui1, Z, Ui1, Z, Ui1")
+               (match_operand:DI 4 "aarch64_gather_scale_operand_<SVE_4BHI:Vesize>" "Ui1, Ui1, Ui1, Ui1, i, i")
+               (mem:BLK (scratch))]
+              UNSPEC_LD1_GATHER))]
+         UNSPEC_PRED_X))]
+  "TARGET_SVE && (~<SVE_4HSI:narrower_mask> & <SVE_4BHI:self_mask>) == 0"
+  "@
+   ld1<ANY_EXTEND:s><SVE_4BHI:Vesize>\t%0.s, %5/z, [%2.s]
+   ld1<ANY_EXTEND:s><SVE_4BHI:Vesize>\t%0.s, %5/z, [%2.s, #%1]
+   ld1<ANY_EXTEND:s><SVE_4BHI:Vesize>\t%0.s, %5/z, [%1, %2.s, sxtw]
+   ld1<ANY_EXTEND:s><SVE_4BHI:Vesize>\t%0.s, %5/z, [%1, %2.s, uxtw]
+   ld1<ANY_EXTEND:s><SVE_4BHI:Vesize>\t%0.s, %5/z, [%1, %2.s, sxtw %p4]
+   ld1<ANY_EXTEND:s><SVE_4BHI:Vesize>\t%0.s, %5/z, [%1, %2.s, uxtw %p4]"
+  "&& !CONSTANT_P (operands[6])"
+  {
+    operands[6] = CONSTM1_RTX (VNx4BImode);
+  }
 )
 
 ;; Predicated extending gather loads for 64-bit elements.  The value of
 ;; operand 3 doesn't matter in this case.
-(define_insn "@aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>"
-  [(set (match_operand:VNx2_WIDE 0 "register_operand" "=w, w, w, w")
-       (ANY_EXTEND:VNx2_WIDE
-         (unspec:VNx2_NARROW
-           [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl, Upl, Upl")
-            (match_operand:DI 1 "aarch64_sve_gather_offset_<VNx2_NARROW:Vesize>" "Z, vg<VNx2_NARROW:Vesize>, rk, rk")
-            (match_operand:VNx2_WIDE 2 "register_operand" "w, w, w, w")
-            (match_operand:DI 3 "const_int_operand")
-            (match_operand:DI 4 "aarch64_gather_scale_operand_<VNx2_NARROW:Vesize>" "Ui1, Ui1, Ui1, i")
-            (mem:BLK (scratch))]
-           UNSPEC_LD1_GATHER)))]
-  "TARGET_SVE"
+(define_insn_and_rewrite "@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>"
+  [(set (match_operand:SVE_2HSDI 0 "register_operand" "=w, w, w, w")
+       (unspec:SVE_2HSDI
+         [(match_operand:VNx2BI 6 "general_operand" "UplDnm, UplDnm, UplDnm, UplDnm")
+          (ANY_EXTEND:SVE_2HSDI
+            (unspec:SVE_2BHSI
+              [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl, Upl, Upl")
+               (match_operand:DI 1 "aarch64_sve_gather_offset_<SVE_2BHSI:Vesize>" "Z, vg<SVE_2BHSI:Vesize>, rk, rk")
+               (match_operand:VNx2DI 2 "register_operand" "w, w, w, w")
+               (match_operand:DI 3 "const_int_operand")
+               (match_operand:DI 4 "aarch64_gather_scale_operand_<SVE_2BHSI:Vesize>" "Ui1, Ui1, Ui1, i")
+               (mem:BLK (scratch))]
+              UNSPEC_LD1_GATHER))]
+         UNSPEC_PRED_X))]
+  "TARGET_SVE && (~<SVE_2HSDI:narrower_mask> & <SVE_2BHSI:self_mask>) == 0"
   "@
-   ld1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%2.d]
-   ld1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%2.d, #%1]
-   ld1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%1, %2.d]
-   ld1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%1, %2.d, lsl %p4]"
+   ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%2.d]
+   ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%2.d, #%1]
+   ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%1, %2.d]
+   ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%1, %2.d, lsl %p4]"
+  "&& !CONSTANT_P (operands[6])"
+  {
+    operands[6] = CONSTM1_RTX (VNx2BImode);
+  }
 )
 
-;; Likewise, but with the offset being sign-extended from 32 bits.
-(define_insn_and_rewrite "*aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_sxtw"
-  [(set (match_operand:VNx2_WIDE 0 "register_operand" "=w, w")
-       (ANY_EXTEND:VNx2_WIDE
-         (unspec:VNx2_NARROW
-           [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl")
-            (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk")
-            (unspec:VNx2DI
-              [(match_operand 6)
-               (sign_extend:VNx2DI
-                 (truncate:VNx2SI
-                   (match_operand:VNx2DI 2 "register_operand" "w, w")))]
-              UNSPEC_PRED_X)
-            (match_operand:DI 3 "const_int_operand")
-            (match_operand:DI 4 "aarch64_gather_scale_operand_<VNx2_NARROW:Vesize>" "Ui1, i")
-            (mem:BLK (scratch))]
-           UNSPEC_LD1_GATHER)))]
-  "TARGET_SVE"
+;; Likewise, but with the offset being extended from 32 bits.
+(define_insn_and_rewrite "*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked"
+  [(set (match_operand:SVE_2HSDI 0 "register_operand" "=w, w")
+       (unspec:SVE_2HSDI
+         [(match_operand 6)
+          (ANY_EXTEND:SVE_2HSDI
+            (unspec:SVE_2BHSI
+              [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl")
+               (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk")
+               (unspec:VNx2DI
+                 [(match_operand 7)
+                  (ANY_EXTEND2:VNx2DI
+                    (match_operand:VNx2SI 2 "register_operand" "w, w"))]
+                 UNSPEC_PRED_X)
+               (match_operand:DI 3 "const_int_operand")
+               (match_operand:DI 4 "aarch64_gather_scale_operand_<SVE_2BHSI:Vesize>" "Ui1, i")
+               (mem:BLK (scratch))]
+              UNSPEC_LD1_GATHER))]
+         UNSPEC_PRED_X))]
+  "TARGET_SVE && (~<SVE_2HSDI:narrower_mask> & <SVE_2BHSI:self_mask>) == 0"
   "@
-   ld1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%1, %2.d, sxtw]
-   ld1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%1, %2.d, sxtw %p4]"
-  "&& !rtx_equal_p (operands[5], operands[6])"
+   ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%1, %2.d, <ANY_EXTEND2:su>xtw]
+   ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%1, %2.d, <ANY_EXTEND2:su>xtw %p4]"
+  "&& (!CONSTANT_P (operands[6]) || !CONSTANT_P (operands[7]))"
   {
-    operands[6] = copy_rtx (operands[5]);
+    operands[6] = CONSTM1_RTX (VNx2BImode);
+    operands[7] = CONSTM1_RTX (VNx2BImode);
   }
 )
 
-;; Likewise, but with the offset being zero-extended from 32 bits.
-(define_insn "*aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_uxtw"
-  [(set (match_operand:VNx2_WIDE 0 "register_operand" "=w, w")
-       (ANY_EXTEND:VNx2_WIDE
-         (unspec:VNx2_NARROW
-           [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl")
-            (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk")
-            (and:VNx2DI
-              (match_operand:VNx2DI 2 "register_operand" "w, w")
-              (match_operand:VNx2DI 6 "aarch64_sve_uxtw_immediate"))
-            (match_operand:DI 3 "const_int_operand")
-            (match_operand:DI 4 "aarch64_gather_scale_operand_<VNx2_NARROW:Vesize>" "Ui1, i")
-            (mem:BLK (scratch))]
-           UNSPEC_LD1_GATHER)))]
-  "TARGET_SVE"
+;; Likewise, but with the offset being truncated to 32 bits and then
+;; sign-extended.
+(define_insn_and_rewrite "*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>_sxtw"
+  [(set (match_operand:SVE_2HSDI 0 "register_operand" "=w, w")
+       (unspec:SVE_2HSDI
+         [(match_operand 6)
+          (ANY_EXTEND:SVE_2HSDI
+            (unspec:SVE_2BHSI
+              [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl")
+               (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk")
+               (unspec:VNx2DI
+                 [(match_operand 7)
+                  (sign_extend:VNx2DI
+                    (truncate:VNx2SI
+                      (match_operand:VNx2DI 2 "register_operand" "w, w")))]
+                 UNSPEC_PRED_X)
+               (match_operand:DI 3 "const_int_operand")
+               (match_operand:DI 4 "aarch64_gather_scale_operand_<SVE_2BHSI:Vesize>" "Ui1, i")
+               (mem:BLK (scratch))]
+              UNSPEC_LD1_GATHER))]
+         UNSPEC_PRED_X))]
+  "TARGET_SVE && (~<SVE_2HSDI:narrower_mask> & <SVE_2BHSI:self_mask>) == 0"
+  "@
+   ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%1, %2.d, sxtw]
+   ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%1, %2.d, sxtw %p4]"
+  "&& (!CONSTANT_P (operands[6]) || !CONSTANT_P (operands[7]))"
+  {
+    operands[6] = CONSTM1_RTX (VNx2BImode);
+    operands[7] = CONSTM1_RTX (VNx2BImode);
+  }
+)
+
+;; Likewise, but with the offset being truncated to 32 bits and then
+;; zero-extended.
+(define_insn_and_rewrite "*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>_uxtw"
+  [(set (match_operand:SVE_2HSDI 0 "register_operand" "=w, w")
+       (unspec:SVE_2HSDI
+         [(match_operand 7)
+          (ANY_EXTEND:SVE_2HSDI
+            (unspec:SVE_2BHSI
+              [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl")
+               (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk")
+               (and:VNx2DI
+                 (match_operand:VNx2DI 2 "register_operand" "w, w")
+                 (match_operand:VNx2DI 6 "aarch64_sve_uxtw_immediate"))
+               (match_operand:DI 3 "const_int_operand")
+               (match_operand:DI 4 "aarch64_gather_scale_operand_<SVE_2BHSI:Vesize>" "Ui1, i")
+               (mem:BLK (scratch))]
+              UNSPEC_LD1_GATHER))]
+         UNSPEC_PRED_X))]
+  "TARGET_SVE && (~<SVE_2HSDI:narrower_mask> & <SVE_2BHSI:self_mask>) == 0"
   "@
-   ld1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%1, %2.d, uxtw]
-   ld1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%1, %2.d, uxtw %p4]"
+   ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%1, %2.d, uxtw]
+   ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%1, %2.d, uxtw %p4]"
+  "&& !CONSTANT_P (operands[7])"
+  {
+    operands[7] = CONSTM1_RTX (VNx2BImode);
+  }
 )
 
 ;; -------------------------------------------------------------------------
   "@
    ldff1d\t%0.d, %5/z, [%1, %2.d, sxtw]
    ldff1d\t%0.d, %5/z, [%1, %2.d, sxtw %p4]"
-  "&& !rtx_equal_p (operands[5], operands[6])"
+  "&& !CONSTANT_P (operands[6])"
   {
-    operands[6] = copy_rtx (operands[5]);
+    operands[6] = CONSTM1_RTX (VNx2BImode);
   }
 )
 
 
 ;; Predicated extending first-faulting gather loads for 32-bit elements.
 ;; Operand 3 is true for unsigned extension and false for signed extension.
-(define_insn "@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>"
+(define_insn_and_rewrite "@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>"
   [(set (match_operand:VNx4_WIDE 0 "register_operand" "=w, w, w, w, w, w")
-       (ANY_EXTEND:VNx4_WIDE
-         (unspec:VNx4_NARROW
-           [(match_operand:VNx4BI 5 "register_operand" "Upl, Upl, Upl, Upl, Upl, Upl")
-            (match_operand:DI 1 "aarch64_sve_gather_offset_<VNx4_NARROW:Vesize>" "Z, vg<VNx4_NARROW:Vesize>, rk, rk, rk, rk")
-            (match_operand:VNx4_WIDE 2 "register_operand" "w, w, w, w, w, w")
-            (match_operand:DI 3 "const_int_operand" "i, i, Z, Ui1, Z, Ui1")
-            (match_operand:DI 4 "aarch64_gather_scale_operand_<VNx4_NARROW:Vesize>" "Ui1, Ui1, Ui1, Ui1, i, i")
-            (mem:BLK (scratch))
-            (reg:VNx16BI FFRT_REGNUM)]
-           UNSPEC_LDFF1_GATHER)))]
+       (unspec:VNx4_WIDE
+         [(match_operand:VNx4BI 6 "general_operand" "UplDnm, UplDnm, UplDnm, UplDnm, UplDnm, UplDnm")
+          (ANY_EXTEND:VNx4_WIDE
+            (unspec:VNx4_NARROW
+              [(match_operand:VNx4BI 5 "register_operand" "Upl, Upl, Upl, Upl, Upl, Upl")
+               (match_operand:DI 1 "aarch64_sve_gather_offset_<VNx4_NARROW:Vesize>" "Z, vg<VNx4_NARROW:Vesize>, rk, rk, rk, rk")
+               (match_operand:VNx4_WIDE 2 "register_operand" "w, w, w, w, w, w")
+               (match_operand:DI 3 "const_int_operand" "i, i, Z, Ui1, Z, Ui1")
+               (match_operand:DI 4 "aarch64_gather_scale_operand_<VNx4_NARROW:Vesize>" "Ui1, Ui1, Ui1, Ui1, i, i")
+               (mem:BLK (scratch))
+               (reg:VNx16BI FFRT_REGNUM)]
+              UNSPEC_LDFF1_GATHER))]
+         UNSPEC_PRED_X))]
   "TARGET_SVE"
   "@
    ldff1<ANY_EXTEND:s><VNx4_NARROW:Vesize>\t%0.s, %5/z, [%2.s]
    ldff1<ANY_EXTEND:s><VNx4_NARROW:Vesize>\t%0.s, %5/z, [%1, %2.s, uxtw]
    ldff1<ANY_EXTEND:s><VNx4_NARROW:Vesize>\t%0.s, %5/z, [%1, %2.s, sxtw %p4]
    ldff1<ANY_EXTEND:s><VNx4_NARROW:Vesize>\t%0.s, %5/z, [%1, %2.s, uxtw %p4]"
+  "&& !CONSTANT_P (operands[6])"
+  {
+    operands[6] = CONSTM1_RTX (VNx4BImode);
+  }
 )
 
 ;; Predicated extending first-faulting gather loads for 64-bit elements.
 ;; The value of operand 3 doesn't matter in this case.
-(define_insn "@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>"
+(define_insn_and_rewrite "@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>"
   [(set (match_operand:VNx2_WIDE 0 "register_operand" "=w, w, w, w")
-       (ANY_EXTEND:VNx2_WIDE
-         (unspec:VNx2_NARROW
-           [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl, Upl, Upl")
-            (match_operand:DI 1 "aarch64_sve_gather_offset_<VNx2_NARROW:Vesize>" "Z, vg<VNx2_NARROW:Vesize>, rk, rk")
-            (match_operand:VNx2_WIDE 2 "register_operand" "w, w, w, w")
-            (match_operand:DI 3 "const_int_operand")
-            (match_operand:DI 4 "aarch64_gather_scale_operand_<VNx2_NARROW:Vesize>" "Ui1, Ui1, Ui1, i")
-            (mem:BLK (scratch))
-            (reg:VNx16BI FFRT_REGNUM)]
-           UNSPEC_LDFF1_GATHER)))]
+       (unspec:VNx2_WIDE
+         [(match_operand:VNx2BI 6 "general_operand" "UplDnm, UplDnm, UplDnm, UplDnm")
+          (ANY_EXTEND:VNx2_WIDE
+            (unspec:VNx2_NARROW
+              [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl, Upl, Upl")
+               (match_operand:DI 1 "aarch64_sve_gather_offset_<VNx2_NARROW:Vesize>" "Z, vg<VNx2_NARROW:Vesize>, rk, rk")
+               (match_operand:VNx2_WIDE 2 "register_operand" "w, w, w, w")
+               (match_operand:DI 3 "const_int_operand")
+               (match_operand:DI 4 "aarch64_gather_scale_operand_<VNx2_NARROW:Vesize>" "Ui1, Ui1, Ui1, i")
+               (mem:BLK (scratch))
+               (reg:VNx16BI FFRT_REGNUM)]
+              UNSPEC_LDFF1_GATHER))]
+         UNSPEC_PRED_X))]
   "TARGET_SVE"
   "@
    ldff1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%2.d]
    ldff1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%2.d, #%1]
    ldff1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%1, %2.d]
    ldff1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%1, %2.d, lsl %p4]"
+  "&& !CONSTANT_P (operands[6])"
+  {
+    operands[6] = CONSTM1_RTX (VNx2BImode);
+  }
 )
 
 ;; Likewise, but with the offset being sign-extended from 32 bits.
 (define_insn_and_rewrite "*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_sxtw"
   [(set (match_operand:VNx2_WIDE 0 "register_operand" "=w, w")
-       (ANY_EXTEND:VNx2_WIDE
-         (unspec:VNx2_NARROW
-           [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl")
-            (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk")
-            (unspec:VNx2DI
-              [(match_operand 6)
-               (sign_extend:VNx2DI
-                 (truncate:VNx2SI
-                   (match_operand:VNx2DI 2 "register_operand" "w, w")))]
-              UNSPEC_PRED_X)
-            (match_operand:DI 3 "const_int_operand")
-            (match_operand:DI 4 "aarch64_gather_scale_operand_<VNx2_NARROW:Vesize>" "Ui1, i")
-            (mem:BLK (scratch))
-            (reg:VNx16BI FFRT_REGNUM)]
-           UNSPEC_LDFF1_GATHER)))]
+       (unspec:VNx2_WIDE
+         [(match_operand 6)
+          (ANY_EXTEND:VNx2_WIDE
+            (unspec:VNx2_NARROW
+              [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl")
+               (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk")
+               (unspec:VNx2DI
+                 [(match_operand 7)
+                  (sign_extend:VNx2DI
+                    (truncate:VNx2SI
+                      (match_operand:VNx2DI 2 "register_operand" "w, w")))]
+                 UNSPEC_PRED_X)
+               (match_operand:DI 3 "const_int_operand")
+               (match_operand:DI 4 "aarch64_gather_scale_operand_<VNx2_NARROW:Vesize>" "Ui1, i")
+               (mem:BLK (scratch))
+               (reg:VNx16BI FFRT_REGNUM)]
+              UNSPEC_LDFF1_GATHER))]
+         UNSPEC_PRED_X))]
   "TARGET_SVE"
   "@
    ldff1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%1, %2.d, sxtw]
    ldff1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%1, %2.d, sxtw %p4]"
-  "&& !rtx_equal_p (operands[5], operands[6])"
+  "&& (!CONSTANT_P (operands[6]) || !CONSTANT_P (operands[7]))"
   {
-    operands[6] = copy_rtx (operands[5]);
+    operands[6] = CONSTM1_RTX (VNx2BImode);
+    operands[7] = CONSTM1_RTX (VNx2BImode);
   }
 )
 
 ;; Likewise, but with the offset being zero-extended from 32 bits.
-(define_insn "*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_uxtw"
+(define_insn_and_rewrite "*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_uxtw"
   [(set (match_operand:VNx2_WIDE 0 "register_operand" "=w, w")
-       (ANY_EXTEND:VNx2_WIDE
-         (unspec:VNx2_NARROW
-           [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl")
-            (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk")
-            (and:VNx2DI
-              (match_operand:VNx2DI 2 "register_operand" "w, w")
-              (match_operand:VNx2DI 6 "aarch64_sve_uxtw_immediate"))
-            (match_operand:DI 3 "const_int_operand")
-            (match_operand:DI 4 "aarch64_gather_scale_operand_<VNx2_NARROW:Vesize>" "Ui1, i")
-            (mem:BLK (scratch))
-            (reg:VNx16BI FFRT_REGNUM)]
-           UNSPEC_LDFF1_GATHER)))]
+       (unspec:VNx2_WIDE
+         [(match_operand 7)
+          (ANY_EXTEND:VNx2_WIDE
+            (unspec:VNx2_NARROW
+              [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl")
+               (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk")
+               (and:VNx2DI
+                 (match_operand:VNx2DI 2 "register_operand" "w, w")
+                 (match_operand:VNx2DI 6 "aarch64_sve_uxtw_immediate"))
+               (match_operand:DI 3 "const_int_operand")
+               (match_operand:DI 4 "aarch64_gather_scale_operand_<VNx2_NARROW:Vesize>" "Ui1, i")
+               (mem:BLK (scratch))
+               (reg:VNx16BI FFRT_REGNUM)]
+              UNSPEC_LDFF1_GATHER))]
+         UNSPEC_PRED_X))]
   "TARGET_SVE"
   "@
    ldff1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%1, %2.d, uxtw]
    ldff1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%1, %2.d, uxtw %p4]"
+  "&& !CONSTANT_P (operands[7])"
+  {
+    operands[7] = CONSTM1_RTX (VNx2BImode);
+  }
 )
 
 ;; =========================================================================
index c5b0fa7..bfeebe9 100644 (file)
 ;; SVE modes with 2 elements.
 (define_mode_iterator SVE_2 [VNx2QI VNx2HI VNx2HF VNx2SI VNx2SF VNx2DI VNx2DF])
 
+;; SVE integer modes with 2 elements, excluding the widest element.
+(define_mode_iterator SVE_2BHSI [VNx2QI VNx2HI VNx2SI])
+
+;; SVE integer modes with 2 elements, excluding the narrowest element.
+(define_mode_iterator SVE_2HSDI [VNx2HI VNx2SI VNx2DI])
+
 ;; SVE modes with 4 elements.
 (define_mode_iterator SVE_4 [VNx4QI VNx4HI VNx4HF VNx4SI VNx4SF])
 
+;; SVE integer modes with 4 elements, excluding the widest element.
+(define_mode_iterator SVE_4BHI [VNx4QI VNx4HI])
+
+;; SVE integer modes with 4 elements, excluding the narrowest element.
+(define_mode_iterator SVE_4HSI [VNx4HI VNx4SI])
+
 ;; Modes involved in extending or truncating SVE data, for 8 elements per
 ;; 128-bit block.
 (define_mode_iterator VNx8_NARROW [VNx8QI])
 
 ;; Code iterator for sign/zero extension
 (define_code_iterator ANY_EXTEND [sign_extend zero_extend])
+(define_code_iterator ANY_EXTEND2 [sign_extend zero_extend])
 
 ;; All division operations (signed/unsigned)
 (define_code_iterator ANY_DIV [div udiv])
index e15be63..e9535af 100644 (file)
@@ -1,5 +1,20 @@
 2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>
 
+       * gcc.target/aarch64/sve/gather_load_extend_1.c: New test.
+       * gcc.target/aarch64/sve/gather_load_extend_2.c: Likewise.
+       * gcc.target/aarch64/sve/gather_load_extend_3.c: Likewise.
+       * gcc.target/aarch64/sve/gather_load_extend_4.c: Likewise.
+       * gcc.target/aarch64/sve/gather_load_extend_5.c: Likewise.
+       * gcc.target/aarch64/sve/gather_load_extend_6.c: Likewise.
+       * gcc.target/aarch64/sve/gather_load_extend_7.c: Likewise.
+       * gcc.target/aarch64/sve/gather_load_extend_8.c: Likewise.
+       * gcc.target/aarch64/sve/gather_load_extend_9.c: Likewise.
+       * gcc.target/aarch64/sve/gather_load_extend_10.c: Likewise.
+       * gcc.target/aarch64/sve/gather_load_extend_11.c: Likewise.
+       * gcc.target/aarch64/sve/gather_load_extend_12.c: Likewise.
+
+2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>
+
        * gcc.target/aarch64/sve/gather_load_1.c (TEST_LOOP): Start at 0.
        (TEST_ALL): Add tests for 8-bit and 16-bit elements.
        * gcc.target/aarch64/sve/gather_load_2.c: Update accordingly.
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_1.c b/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_1.c
new file mode 100644 (file)
index 0000000..005c33b
--- /dev/null
@@ -0,0 +1,34 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(TYPE1, TYPE2)                                                \
+  void                                                                 \
+  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1,      \
+                      TYPE2 *restrict src2, uint32_t *restrict index,  \
+                      int n)                                           \
+  {                                                                    \
+    for (int i = 0; i < n; ++i)                                                \
+      dst[i] += src1[i] + src2[index[i]];                              \
+  }
+
+#define TEST_ALL(T) \
+  T (uint16_t, uint8_t) \
+  T (uint32_t, uint8_t) \
+  T (uint64_t, uint8_t) \
+  T (uint32_t, uint16_t) \
+  T (uint64_t, uint16_t) \
+  T (uint64_t, uint32_t)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, uxtw\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, uxtw 1\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 7 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 3 } } */
+
+/* { dg-final { scan-assembler-not {\tuxt.\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_10.c b/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_10.c
new file mode 100644 (file)
index 0000000..09b5505
--- /dev/null
@@ -0,0 +1,39 @@
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */
+
+#include <stdint.h>
+
+void
+f1 (int64_t *restrict dst, int16_t *src1, int8_t *src2, uint32_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int16_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f2 (int64_t *restrict dst, int16_t *src1, int8_t *src2, uint64_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int16_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f3 (int64_t *restrict dst, int16_t *src1, int8_t **src2)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int16_t) (src1[i] + *src2[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tsxt.\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tsxth\tz[0-9]+\.d,} 3 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_11.c b/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_11.c
new file mode 100644 (file)
index 0000000..5a93bb7
--- /dev/null
@@ -0,0 +1,39 @@
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */
+
+#include <stdint.h>
+
+void
+f1 (int64_t *restrict dst, int32_t *src1, int8_t *src2, uint32_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int32_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f2 (int64_t *restrict dst, int32_t *src1, int8_t *src2, uint64_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int32_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f3 (int64_t *restrict dst, int32_t *src1, int8_t **src2)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int32_t) (src1[i] + *src2[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tsxt.\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tsxtw\tz[0-9]+\.d,} 3 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_12.c b/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_12.c
new file mode 100644 (file)
index 0000000..f5989ce
--- /dev/null
@@ -0,0 +1,39 @@
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */
+
+#include <stdint.h>
+
+void
+f1 (int64_t *restrict dst, int32_t *src1, int16_t *src2, uint32_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int32_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f2 (int64_t *restrict dst, int32_t *src1, int16_t *src2, uint64_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int32_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f3 (int64_t *restrict dst, int32_t *src1, int16_t **src2)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int32_t) (src1[i] + *src2[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d, lsl 1\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tsxt.\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tsxtw\tz[0-9]+\.d,} 3 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_2.c b/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_2.c
new file mode 100644 (file)
index 0000000..bd1081f
--- /dev/null
@@ -0,0 +1,34 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(TYPE1, TYPE2)                                                \
+  void                                                                 \
+  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1,      \
+                      TYPE2 *restrict src2, uint32_t *restrict index,  \
+                      int n)                                           \
+  {                                                                    \
+    for (int i = 0; i < n; ++i)                                                \
+      dst[i] += src1[i] + src2[index[i]];                              \
+  }
+
+#define TEST_ALL(T) \
+  T (int16_t, int8_t) \
+  T (int32_t, int8_t) \
+  T (int64_t, int8_t) \
+  T (int32_t, int16_t) \
+  T (int64_t, int16_t) \
+  T (int64_t, int32_t)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, uxtw\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, uxtw 1\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1sw\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 7 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 3 } } */
+
+/* { dg-final { scan-assembler-not {\tsxt.\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_3.c b/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_3.c
new file mode 100644 (file)
index 0000000..440f8f6
--- /dev/null
@@ -0,0 +1,34 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(TYPE1, TYPE2)                                                \
+  void                                                                 \
+  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1,      \
+                      TYPE2 *restrict src2, int32_t *restrict index,   \
+                      int n)                                           \
+  {                                                                    \
+    for (int i = 0; i < n; ++i)                                                \
+      dst[i] += src1[i] + src2[index[i]];                              \
+  }
+
+#define TEST_ALL(T) \
+  T (uint16_t, uint8_t) \
+  T (uint32_t, uint8_t) \
+  T (uint64_t, uint8_t) \
+  T (uint32_t, uint16_t) \
+  T (uint64_t, uint16_t) \
+  T (uint64_t, uint32_t)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, sxtw\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, sxtw 1\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 7 } } */
+/* { dg-final { scan-assembler-times {\tld1sw\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 3 } } */
+
+/* { dg-final { scan-assembler-not {\tuxt.\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_4.c b/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_4.c
new file mode 100644 (file)
index 0000000..8cc8e75
--- /dev/null
@@ -0,0 +1,34 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(TYPE1, TYPE2)                                                \
+  void                                                                 \
+  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1,      \
+                      TYPE2 *restrict src2, int32_t *restrict index,   \
+                      int n)                                           \
+  {                                                                    \
+    for (int i = 0; i < n; ++i)                                                \
+      dst[i] += src1[i] + src2[index[i]];                              \
+  }
+
+#define TEST_ALL(T) \
+  T (int16_t, int8_t) \
+  T (int32_t, int8_t) \
+  T (int64_t, int8_t) \
+  T (int32_t, int16_t) \
+  T (int64_t, int16_t) \
+  T (int64_t, int32_t)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, sxtw\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, sxtw 1\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1sw\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 7 } } */
+/* { dg-final { scan-assembler-times {\tld1sw\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 3 } } */
+
+/* { dg-final { scan-assembler-not {\tsxt.\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_5.c b/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_5.c
new file mode 100644 (file)
index 0000000..4916a1e
--- /dev/null
@@ -0,0 +1,29 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(TYPE1, TYPE2)                                                \
+  void                                                                 \
+  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1,      \
+                      TYPE2 *restrict src2, uint64_t *restrict index,  \
+                      int n)                                           \
+  {                                                                    \
+    for (int i = 0; i < n; ++i)                                                \
+      dst[i] += src1[i] + src2[index[i]];                              \
+  }
+
+#define TEST_ALL(T) \
+  T (uint16_t, uint8_t) \
+  T (uint32_t, uint8_t) \
+  T (uint64_t, uint8_t) \
+  T (uint32_t, uint16_t) \
+  T (uint64_t, uint16_t) \
+  T (uint64_t, uint32_t)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-not {\tuxt.\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_6.c b/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_6.c
new file mode 100644 (file)
index 0000000..60a73b3
--- /dev/null
@@ -0,0 +1,29 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(TYPE1, TYPE2)                                                \
+  void                                                                 \
+  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1,      \
+                      TYPE2 *restrict src2, uint64_t *restrict index,  \
+                      int n)                                           \
+  {                                                                    \
+    for (int i = 0; i < n; ++i)                                                \
+      dst[i] += src1[i] + src2[index[i]];                              \
+  }
+
+#define TEST_ALL(T) \
+  T (int16_t, int8_t) \
+  T (int32_t, int8_t) \
+  T (int64_t, int8_t) \
+  T (int32_t, int16_t) \
+  T (int64_t, int16_t) \
+  T (int64_t, int32_t)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1sw\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-not {\tsxt.\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_7.c b/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_7.c
new file mode 100644 (file)
index 0000000..bdfb0f7
--- /dev/null
@@ -0,0 +1,39 @@
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */
+
+#include <stdint.h>
+
+void
+f1 (uint64_t *restrict dst, uint16_t *src1, uint8_t *src2, uint32_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint16_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f2 (uint64_t *restrict dst, uint16_t *src1, uint8_t *src2, uint64_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint16_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f3 (uint64_t *restrict dst, uint16_t *src1, uint8_t **src2)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint16_t) (src1[i] + *src2[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tuxt.\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tuxth\tz[0-9]+\.d,} 3 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_8.c b/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_8.c
new file mode 100644 (file)
index 0000000..20976ea
--- /dev/null
@@ -0,0 +1,39 @@
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */
+
+#include <stdint.h>
+
+void
+f1 (uint64_t *restrict dst, uint32_t *src1, uint8_t *src2, uint32_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint32_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f2 (uint64_t *restrict dst, uint32_t *src1, uint8_t *src2, uint64_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint32_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f3 (uint64_t *restrict dst, uint32_t *src1, uint8_t **src2)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint32_t) (src1[i] + *src2[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tuxt.\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tuxtw\tz[0-9]+\.d,} 3 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_9.c b/gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_9.c
new file mode 100644 (file)
index 0000000..b2f75e9
--- /dev/null
@@ -0,0 +1,39 @@
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */
+
+#include <stdint.h>
+
+void
+f1 (uint64_t *restrict dst, uint32_t *src1, uint16_t *src2, uint32_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint32_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f2 (uint64_t *restrict dst, uint32_t *src1, uint16_t *src2, uint64_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint32_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f3 (uint64_t *restrict dst, uint32_t *src1, uint16_t **src2)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint32_t) (src1[i] + *src2[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d, lsl 1\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tuxt.\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tuxtw\tz[0-9]+\.d,} 3 } } */