Split vector load from parm_del to elemental loads to avoid STLF stalls.
Since cfg is freed before machine_reorg, just do a rough calculation
of the window according to the layout.
Also according to an experiment on CLX, set window size to 64.
Currently only handle V2DFmode load since it doesn't need any scratch
registers, and it's sufficient to recover cray performance for -O2
compared to GCC11.
gcc/ChangeLog:
PR target/101908
* config/i386/i386.cc (ix86_split_stlf_stall_load): New
function
(ix86_reorg): Call ix86_split_stlf_stall_load.
* config/i386/i386.opt (-param=x86-stlf-window-ninsns=): New
param.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr101908-1.c: New test.
* gcc.target/i386/pr101908-2.c: New test.
* gcc.target/i386/pr101908-3.c: New test.