Remove pass_cpb (pass_constant_pool_broadcast), which was used to enable AVX-512 embedded broadcast from the constant pool.

By turning vector moves of constants into broadcasts in
ix86_expand_vector_move during pass_expand, pass_reload/LRA can
generate AVX-512 embedded broadcasts automatically, so pass_cpb is no
longer needed.
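A vector constant can only be rebuilt from a single scalar when all of its elements are identical. The C sketch below illustrates that check. It assumes the constant is available as a plain array of elements. The name uniform_constant_vector is hypothetical and is not GCC's internal API: GCC itself operates on RTL vector constants inside ix86_broadcast_from_constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not GCC's API: return true if every element of a
   vector constant has the same bit pattern as element 0, i.e. the whole
   vector could be materialized by broadcasting that one scalar.  */
static bool
uniform_constant_vector (const void *elts, size_t n_elts, size_t elt_size)
{
  assert (elts != NULL && n_elts > 0 && elt_size > 0);
  const unsigned char *bytes = elts;
  /* Compare raw bytes rather than using ==, so that for float vectors
     0.0f and -0.0f count as different constants while identical NaN
     bit patterns count as equal.  */
  for (size_t i = 1; i < n_elts; i++)
    if (memcmp (bytes, bytes + i * elt_size, elt_size) != 0)
      return false;
  return true;
}
```

For example, a 16-element float array filled with 2.0f yields true. Such a vector could therefore be loaded as one 4-byte scalar plus a broadcast instead of a 64-byte constant-pool entry. An array that mixes 0.0f and -0.0f yields false. The sketch compares bit patterns because that is the property that matters when extending the check from integer to float modes.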

Even without AVX512F, broadcasting from memory is still slightly faster
than loading the full vector constant from memory, so always enable the
broadcast.

Benchmark:
https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vaddps/broadcast

Performance diff (lower is better):

strategy    : cycles
memory      : 1046611188
memory      : 1255420817
memory      : 1044720793
memory      : 1253414145
average     : 1097868397

broadcast   : 1044430688
broadcast   : 1044477630
broadcast   : 1253554603
broadcast   : 1044561934
average     : 1096756213

However, broadcast results in a larger text size.

Size diff:

size broadcast.o
   text    data     bss     dec     hex filename
    137       0       0     137      89 broadcast.o
size memory.o
   text    data     bss     dec     hex filename
    115       0       0     115      73 memory.o
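The trade-off can be quantified with a small C helper that computes relative change. The name percent_change is hypothetical and introduced only for illustration. The inputs are the two averages and the two text sizes reported above.

```c
#include <assert.h>

/* Hypothetical helper: relative change of `value` against `baseline`,
   in percent.  A negative result means `value` is smaller.  */
static double
percent_change (double baseline, double value)
{
  assert (baseline != 0.0);
  return (value - baseline) / baseline * 100.0;
}
```

Calling percent_change (1097868397.0, 1096756213.0) gives about -0.10, so broadcast uses roughly 0.1% fewer cycles on average. Calling percent_change (115.0, 137.0) gives about +19.1, so the text size of broadcast.o is roughly 19% larger than that of memory.o.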

gcc/ChangeLog:

	* config/i386/i386-expand.c
	(ix86_broadcast_from_integer_constant): Rename to ..
	(ix86_broadcast_from_constant): .. this, and extend it to
	handle float mode.
	(ix86_expand_vector_move): Extend to float mode.
	* config/i386/i386-features.c
	(replace_constant_pool_with_broadcast): Remove.
	(remove_partial_avx_dependency_gate): Ditto.
	(constant_pool_broadcast): Ditto.
	(class pass_constant_pool_broadcast): Ditto.
	(make_pass_constant_pool_broadcast): Ditto.
	(remove_partial_avx_dependency): Adjust gate.
	* config/i386/i386-passes.def: Remove pass_constant_pool_broadcast.
	* config/i386/i386-protos.h
	(make_pass_constant_pool_broadcast): Remove.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/fuse-caller-save-xmm.c: Adjust testcase.