middle-end: optimize slp simplify back to back permutes.
author Tamar Christina <tamar.christina@arm.com>
Thu, 5 Nov 2020 11:46:35 +0000 (11:46 +0000)
committer Tamar Christina <tamar.christina@arm.com>
Thu, 5 Nov 2020 11:46:35 +0000 (11:46 +0000)
commit 199988774d74091e467aef695d0d985528360613
tree 1eb96fe6c46d75a4e779ac17499b4bcc496432fc
parent 7eb6c0ad2611e0802c3684196c9a7e94162f2c51

This optimizes sequential permutes, i.e. if there are two permutes back to
back, this function applies the permute of the parent to the child and removes
the parent.
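
As an illustration only, the folding step can be sketched on plain lane-index
lists. This is not GCC's internal representation: the list encoding and the
names apply_permute, fold_parent_into_child and is_noop are hypothetical and
chosen for this sketch. It assumes output lane i of a permute reads input lane
perm[i].

```python
def apply_permute(perm, lanes):
    """Reorder lanes so that output lane i is lanes[perm[i]]."""
    return [lanes[p] for p in perm]


def fold_parent_into_child(parent_perm, child_perm):
    """Apply the parent's permute to the child's permute vector.

    Using the result once is equivalent to applying child_perm first
    and parent_perm second, so the parent node is no longer needed.
    """
    return apply_permute(parent_perm, child_perm)


def is_noop(perm):
    """True when the permute leaves every lane in place."""
    return all(p == i for i, p in enumerate(perm))


lanes = ["a0", "a1", "a2", "a3"]
child = [1, 0, 3, 2]   # swap neighbouring lanes
parent = [1, 0, 3, 2]  # swap them back

folded = fold_parent_into_child(parent, child)

# The folded permute matches the two-step result on the same input.
two_step = apply_permute(parent, apply_permute(child, lanes))
assert apply_permute(folded, lanes) == two_step

print(folded, is_noop(folded))
```

In this sketch the two swaps cancel, so the folded permute is the identity and
could be dropped entirely.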

This relies on the materialization point calculation in vect_optimize_slp.

This allows us to remove useless permutes such as

ldr     q0, [x0, x3]
ldr     q2, [x1, x3]
trn1    v1.4s, v0.4s, v0.4s
trn2    v0.4s, v0.4s, v0.4s
trn1    v0.4s, v1.4s, v0.4s
mov     v1.16b, v3.16b
fcmla   v1.4s, v0.4s, v2.4s, #0
fcmla   v1.4s, v0.4s, v2.4s, #90
str     q1, [x2, x3]

from the sequence the vectorizer puts out and give

ldr     q0, [x0, x3]
ldr     q2, [x1, x3]
mov     v1.16b, v3.16b
fcmla   v1.4s, v0.4s, v2.4s, #0
fcmla   v1.4s, v0.4s, v2.4s, #90
str     q1, [x2, x3]

instead.
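
To see why the three trn instructions in the first listing are redundant, the
sketch below models a 4-lane register as a Python list and follows the
AArch64 TRN1/TRN2 lane semantics (TRN1 interleaves the even-numbered lanes of
its two sources, TRN2 the odd-numbered ones). The list model and the helper
names trn1 and trn2 are assumptions made for this illustration.

```python
def trn1(n, m):
    """Model of TRN1: interleave the even-numbered lanes of n and m."""
    out = []
    for i in range(0, len(n), 2):
        out += [n[i], m[i]]
    return out


def trn2(n, m):
    """Model of TRN2: interleave the odd-numbered lanes of n and m."""
    out = []
    for i in range(0, len(n), 2):
        out += [n[i + 1], m[i + 1]]
    return out


loaded = ["a0", "a1", "a2", "a3"]  # value of v0 after the first ldr

v1 = trn1(loaded, loaded)  # trn1 v1.4s, v0.4s, v0.4s
v0 = trn2(loaded, loaded)  # trn2 v0.4s, v0.4s, v0.4s
result = trn1(v1, v0)      # trn1 v0.4s, v1.4s, v0.4s

# The sequence hands fcmla the lanes in their original order.
assert result == loaded
print(v1, v0, result)
```

Under this model the three permutes compose to the identity, which is why the
second listing can feed the loaded value straight into fcmla.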

gcc/ChangeLog:

* tree-vect-slp.c (vect_slp_tree_permute_noop_p): New.
(vect_optimize_slp): Optimize permutes.
(vectorizable_slp_permutation): Fix typo.
gcc/tree-vect-slp.c