x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast
author H.J. Lu <hjl.tools@gmail.com>
Wed, 2 Jun 2021 14:15:45 +0000 (07:15 -0700)
committer H.J. Lu <hjl.tools@gmail.com>
Thu, 1 Jul 2021 15:11:20 +0000 (08:11 -0700)
commit edafb35bdadf309ebb9d1eddc5549f9e1ad49c09
tree 14d2f553da601c4e2dbb3d8446d43dffa78c5189
parent d63454815de3b93331025bd990efdad5296ae706

1. Update the move expanders to convert CONST_WIDE_INT and CONST_VECTOR
operands to a vector broadcast from an integer with AVX.
2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which
won't increase the stack alignment requirement and which blocks
transformation by the combine pass.

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast

shows that broadcast is the fastest of the three variants on an Intel
Core i7-8559U, with a result about 18% lower than the memory variant:

$ make
gcc -g -I. -O2   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
./test
memory      : 147215
broadcast   : 121213
vec_dup_sse2: 171366
$

The broadcast object file is also smaller, 122 bytes of text against 132:

$ size memory.o broadcast.o
   text    data     bss     dec     hex filename
    132       0       0     132      84 memory.o
    122       0       0     122      7a broadcast.o
$
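
Item 1 above depends on recognizing that a wide constant is one small
integer repeated across the whole value. The following is a minimal
sketch of that check on a plain byte array. It is not the code added by
this commit: the name broadcast_element_width and the byte-array
interface are hypothetical, chosen only for illustration, and the real
expanders operate on RTL constants rather than on raw bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not GCC code: return the smallest element width
   (1, 2, 4 or 8 bytes) such that the LEN-byte constant at BYTES is that
   element repeated end to end, or 0 if no such width exists.  */
static size_t
broadcast_element_width (const unsigned char *bytes, size_t len)
{
  assert (bytes != NULL);

  for (size_t width = 1; width <= 8; width *= 2)
    {
      /* The element must tile the constant at least twice.  */
      if (len < 2 * width || len % width != 0)
        continue;

      /* Compare every later element against the first one.  */
      size_t i = width;
      while (i < len && memcmp (bytes, bytes + i, width) == 0)
        i += width;

      if (i == len)
        return width;
    }

  return 0;
}
```

A 16-byte constant filled with one byte value reports width 1, so under
this sketch it could be rebuilt from a single byte in an integer
register; a constant with no repeating element reports 0 and would still
have to come from memory.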

3. Update PR 87767 tests to expect integer broadcast instead of broadcast
from memory.
4. Update avx512f_cond_move.c to expect integer broadcast.

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast

shows that integer broadcast is faster than embedded memory broadcast
(about 12% lower in the run below):

$ make
gcc -g -I. -O2 -march=skylake-avx512   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -o test test.o memory.o broadcast.o
./test
memory      : 425538
broadcast   : 375260
$
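
For context, a loop of the following shape is one way a vector add with
a constant operand can arise once the loop is vectorized. This is an
illustrative sketch, not one of the tests in this commit: the names
add_constant and ADDEND are hypothetical. Whether a given GCC build
vectorizes the loop, and whether the constant is then supplied by an
embedded memory broadcast or by a broadcast from an integer register,
depends on the options and the target, so inspect the generated
assembly rather than assume either form.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant, chosen only for illustration.  */
enum { ADDEND = 0x1234 };

/* Add the same 32-bit constant to each of the N elements of A.
   Unsigned arithmetic is used so that wraparound is well defined.  */
static void
add_constant (uint32_t *a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] += ADDEND;
}
```

Compiling a file like this with -O2 -march=skylake-avx512 -S, the
options shown in the benchmark build above, is one way to see which
form the compiler chose.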

gcc/

PR target/100865
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
New prototype.
(ix86_byte_broadcast): New function.
(ix86_convert_const_wide_int_to_broadcast): Likewise.
(ix86_expand_move): Convert CONST_WIDE_INT to broadcast if mode
size is 16 bytes or bigger.
(ix86_broadcast_from_integer_constant): New function.
(ix86_expand_vector_move): Convert CONST_WIDE_INT and CONST_VECTOR
to broadcast if mode size is 16 bytes or bigger.
* config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New
prototype.
* config/i386/i386.c (ix86_gen_scratch_sse_rtx): New function.

gcc/testsuite/

PR target/100865
* gcc.target/i386/avx512f-broadcast-pr87767-1.c: Expect integer
broadcast.
* gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512f_cond_move.c: Also pass
-mprefer-vector-width=512 and expect integer broadcast.
* gcc.target/i386/pr100865-1.c: New test.
* gcc.target/i386/pr100865-2.c: Likewise.
* gcc.target/i386/pr100865-3.c: Likewise.
* gcc.target/i386/pr100865-4a.c: Likewise.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-5a.c: Likewise.
* gcc.target/i386/pr100865-5b.c: Likewise.
* gcc.target/i386/pr100865-6a.c: Likewise.
* gcc.target/i386/pr100865-6b.c: Likewise.
* gcc.target/i386/pr100865-6c.c: Likewise.
* gcc.target/i386/pr100865-7a.c: Likewise.
* gcc.target/i386/pr100865-7b.c: Likewise.
* gcc.target/i386/pr100865-7c.c: Likewise.
* gcc.target/i386/pr100865-8a.c: Likewise.
* gcc.target/i386/pr100865-8b.c: Likewise.
* gcc.target/i386/pr100865-8c.c: Likewise.
* gcc.target/i386/pr100865-9a.c: Likewise.
* gcc.target/i386/pr100865-9b.c: Likewise.
* gcc.target/i386/pr100865-9c.c: Likewise.
* gcc.target/i386/pr100865-10a.c: Likewise.
* gcc.target/i386/pr100865-10b.c: Likewise.
* gcc.target/i386/pr100865-11a.c: Likewise.
* gcc.target/i386/pr100865-11b.c: Likewise.
* gcc.target/i386/pr100865-11c.c: Likewise.
* gcc.target/i386/pr100865-12a.c: Likewise.
* gcc.target/i386/pr100865-12b.c: Likewise.
* gcc.target/i386/pr100865-12c.c: Likewise.
35 files changed:
gcc/config/i386/i386-expand.c
gcc/config/i386/i386-protos.h
gcc/config/i386/i386.c
gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c
gcc/testsuite/gcc.target/i386/avx512f_cond_move.c
gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c
gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c
gcc/testsuite/gcc.target/i386/pr100865-1.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-10a.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-10b.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-11a.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-11b.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-11c.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-12a.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-12b.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-12c.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-2.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-3.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-4a.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-4b.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-5a.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-5b.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-6a.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-6b.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-6c.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-7a.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-7b.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-7c.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-8a.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-8b.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-8c.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-9a.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-9b.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr100865-9c.c [new file with mode: 0644]