introduce try store by multiple pieces
The ldist pass turns even very short loops into memset calls. E.g.,
the TFmode emulation calls end with a loop of up to 3 iterations, to
zero out trailing words, and the loop distribution pass turns them
into calls of the memset builtin.
Though short constant-length clearing memsets are usually dealt with
efficiently, for non-constant-length ones, the options are setmemM, or
a function calls.
RISC-V doesn't have any setmemM pattern, so the loops above end up
"optimized" into memset calls, incurring not only the overhead of an
explicit call, but also discarding the information the compiler has
about the alignment of the destination, and that the length is a
multiple of the word alignment.
This patch handles variable lengths with multiple conditional
power-of-2-constant-sized stores-by-pieces, so as to reduce the
overhead of length compares.
It also changes the last copy-prop pass into ccp, so that pointer
alignment and length's nonzero bits are detected and made available
for the expander, even for ldist-introduced SSA_NAMEs.
for gcc/ChangeLog
* builtins.c (try_store_by_multiple_pieces): New.
(expand_builtin_memset_args): Use it. If target_char_cast
fails, proceed as for non-constant val. Pass len's ctz to...
* expr.c (clear_storage_hints): ... this. Try store by
multiple pieces after setmem.
(clear_storage): Adjust.
* expr.h (clear_storage_hints): Likewise.
(try_store_by_multiple_pieces): Declare.
* passes.def: Replace the last copy_prop with ccp.