rs6000: Modify the way for extra penalized cost
This patch follows the discussions in [1][2], where Segher
pointed out that the existing way of guarding the extra
penalized cost for strided/elementwise loads with a magic
bound does not scale.
Computing the penalty as nunits * stmt_cost can produce a
greatly exaggerated cost. For example, for V16QI on P8 it is
16 * 20 = 320, which is why a bound was needed. To make this
better and more readable, the penalized cost is simplified
to:
  unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
  unsigned extra_cost = nunits * adjusted_cost;
For V2DI/V2DF, each scalar load is penalized with a cost of 2,
while for the other modes it is penalized with a cost of 1.
This is mainly concluded from performance evaluations. One
possibly related factor is that the more units a vector is
constructed from, the more instructions are used, which gives
more chances to schedule them well (or even to run them in
parallel when enough units are available at that time), so it
seems reasonable not to penalize them more.
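As a rough illustration of the difference, the sketch below compares
the two formulas for a few vector shapes. The function names are
hypothetical and do not appear in rs6000.c; the old formula is shown
without the magic bound, whose value is not repeated here.

```cpp
// Illustrative sketch only: old_extra_cost and new_extra_cost are
// hypothetical helpers, not functions from rs6000.c.

// Previous scheme, before any bound is applied: every unit is
// charged the full scalar statement cost.
constexpr unsigned
old_extra_cost (unsigned nunits, unsigned stmt_cost)
{
  return nunits * stmt_cost;
}

// Scheme from this patch: a cost of 2 per unit for two-unit
// vectors (V2DI/V2DF) and a cost of 1 per unit otherwise.
constexpr unsigned
new_extra_cost (unsigned nunits)
{
  return nunits * ((nunits == 2) ? 2u : 1u);
}

// Compile-time checks of the figures discussed in the text.
static_assert (old_extra_cost (16, 20) == 320, "V16QI on P8, old scheme");
static_assert (new_extra_cost (16) == 16, "V16QI, new scheme");
static_assert (new_extra_cost (2) == 4, "V2DI/V2DF, new scheme");
static_assert (new_extra_cost (4) == 4, "four-unit modes, new scheme");
```

With the new formula the extra cost grows at most linearly in the
number of units and no longer depends on stmt_cost, so no separate
bound is required.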
The SPEC2017 evaluations on Power8/Power9/Power10 at option
sets O2-vect and Ofast-unroll show this change is neutral.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580099.html
gcc/ChangeLog:
* config/rs6000/rs6000.c
(rs6000_cost_data::update_target_cost_per_stmt): Adjust the way to
compute extra penalized cost. Remove useless parameter.
(rs6000_cost_data::rs6000_add_stmt_cost): Adjust the call to function
update_target_cost_per_stmt.