Split vector load from parm_decl to elemental loads to avoid STLF stalls.
author liuhongt <hongtao.liu@intel.com>
Wed, 30 Mar 2022 12:35:55 +0000 (20:35 +0800)
committer liuhongt <hongtao.liu@intel.com>
Tue, 5 Apr 2022 04:51:37 +0000 (12:51 +0800)
commit e3174d6183e5c042e822d9feabb670235b737441
tree 3cc052908840aea684e965682f7eec3d2eebc39b
parent 418967ca275853a570b0ae566d7022ff38e7cd0d

Since the CFG is freed before machine_reorg, just do a rough calculation
of the window according to the layout.
Also, based on an experiment on CLX, set the default window size to 64.
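
The window calculation described above can be pictured with a small
standalone model. This is only a sketch under stated assumptions, not the
code in i386.cc: struct toy_insn and mark_split_candidates are hypothetical
names, and the assumption is that the window is counted in non-debug
instructions from the start of the function, in layout order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an RTL instruction.  */
struct toy_insn
{
  bool is_debug;          /* Assumed not to take up room in the window.  */
  bool is_v2df_parm_load; /* 16-byte load whose base is a parameter.  */
};

/* Walk INSNS in layout order and set SPLIT[i] for each candidate load
   that falls within the first WINDOW non-debug instructions.  Return
   the number of candidates found.  */
static size_t
mark_split_candidates (const struct toy_insn *insns, size_t n,
                       unsigned window, bool *split)
{
  assert (insns != NULL && split != NULL);

  for (size_t i = 0; i < n; i++)
    split[i] = false;

  unsigned seen = 0;
  size_t found = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (insns[i].is_debug)
        continue;
      /* Past the window: later loads are assumed to be far enough
         from the stores that forwarding is no longer an issue.  */
      if (++seen > window)
        break;
      if (insns[i].is_v2df_parm_load)
        {
          split[i] = true;
          found++;
        }
    }
  return found;
}
```

In this model the window argument plays the role of the new parameter,
which can be set on the command line with
--param=x86-stlf-window-ninsns=N.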

Currently only V2DFmode loads are handled, since they don't need any
scratch registers; this is sufficient to recover cray performance at
-O2 compared to GCC 11.
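
To illustrate what splitting a V2DF load into elemental loads means at the
source level, here is a minimal sketch using GCC's vector_size extension.
The names load_whole and load_split are hypothetical, and the compiler is
free to generate either form for either function; the point is only that
both produce the same value. The stall arises when the two doubles were
written shortly before by separate 8-byte stores and are then read back by
one 16-byte load.

```c
#include <assert.h>
#include <string.h>

/* Two doubles in one 16-byte vector (GCC vector extension).  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Read both elements with a single 16-byte access.  */
static v2df
load_whole (const double *p)
{
  assert (p != NULL);
  v2df v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Read the same memory as two separate 8-byte accesses, low element
   first and then high element.  */
static v2df
load_split (const double *p)
{
  assert (p != NULL);
  v2df v = { p[0], p[1] };
  return v;
}
```

Each elemental load in load_split matches the size of one earlier scalar
store, which is the property the split relies on.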

gcc/ChangeLog:

PR target/101908
* config/i386/i386.cc (ix86_split_stlf_stall_load): New
function.
(ix86_reorg): Call ix86_split_stlf_stall_load.
* config/i386/i386.opt (-param=x86-stlf-window-ninsns=): New
param.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr101908-1.c: New test.
* gcc.target/i386/pr101908-2.c: New test.
* gcc.target/i386/pr101908-3.c: New test.
gcc/config/i386/i386.cc
gcc/config/i386/i386.opt
gcc/testsuite/gcc.target/i386/pr101908-1.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr101908-2.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr101908-3.c [new file with mode: 0644]