x86: Update memcpy/memset inline strategies for Ice Lake
author H.J. Lu <hjl.tools@gmail.com>
Fri, 22 Jan 2021 02:51:35 +0000 (18:51 -0800)
committer H.J. Lu <hjl.tools@gmail.com>
Wed, 31 Mar 2021 12:28:32 +0000 (05:28 -0700)
commit bf24f4ec73b65454ea0edcd6ab5616f04958d41e
tree fec9c1991dec8ed9d18e1287bc87a31e9c6ccba0
parent 1393938e4c7dab9306cdce5a73d93b242fc246ec

Simplify memcpy and memset inline strategies to avoid branches for
-mtune=icelake:

1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
   load and store for up to 16 * 16 (256) bytes when the data size is
   fixed and known.
2. Inline only if data size is known to be <= 256.
   a. Use "rep movsb/stosb" with simple code sequence if the data size
      is a constant.
   b. Use loop if data size is not a constant.
3. Use the memcpy/memset library function if data size is unknown or > 256.
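
The three rules above can be read as a small decision procedure. The sketch below is only a simplified model of that reading, not the code in decide_alg: the enum, the function name choose_strategy, and the by_pieces_ok flag (standing in for "the constant size can be covered by the load/store sequence allowed by MOVE_RATIO/CLEAR_RATIO == 17") are all hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the outcomes listed in rules 1 to 3.  */
enum inline_strategy
{
  STRATEGY_LOAD_STORE,       /* Rule 1: integer/vector load and store.  */
  STRATEGY_REP_MOVSB_STOSB,  /* Rule 2a: "rep movsb/stosb".  */
  STRATEGY_LOOP,             /* Rule 2b: inline loop.  */
  STRATEGY_LIBCALL           /* Rule 3: call memcpy/memset.  */
};

/* Inline limit taken from the commit message: 16 * 16 bytes.  */
#define INLINE_LIMIT 256

/* MAX_SIZE is the largest value the size can take, or SIZE_MAX when
   nothing is known about it.  BY_PIECES_OK is an assumption of this
   sketch: it says whether a constant size fits the load/store
   expansion.  */
enum inline_strategy
choose_strategy (bool size_is_constant, bool by_pieces_ok, size_t max_size)
{
  if (max_size > INLINE_LIMIT)
    return STRATEGY_LIBCALL;
  if (size_is_constant)
    return by_pieces_ok ? STRATEGY_LOAD_STORE : STRATEGY_REP_MOVSB_STOSB;
  return STRATEGY_LOOP;
}
```

In this model, a size that is unknown is passed as SIZE_MAX and therefore always ends in a library call, which is the branch-free fallback the commit message describes.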

On an Ice Lake processor with -march=native -Ofast -flto:

1. Performance impacts on SPEC CPU 2017 rate are:

500.perlbench_r -0.93%
502.gcc_r        0.36%
505.mcf_r        0.31%
520.omnetpp_r   -0.07%
523.xalancbmk_r -0.53%
525.x264_r      -0.09%
531.deepsjeng_r -0.19%
541.leela_r      0.16%
548.exchange2_r  0.22%
557.xz_r        -1.64%
Geomean         -0.24%

503.bwaves_r    -0.01%
507.cactuBSSN_r  0.00%
508.namd_r       0.12%
510.parest_r     0.07%
511.povray_r     0.29%
519.lbm_r        0.00%
521.wrf_r       -0.38%
526.blender_r    0.16%
527.cam4_r       0.18%
538.imagick_r    0.76%
544.nab_r       -0.84%
549.fotonik3d_r -0.07%
554.roms_r      -0.01%
Geomean          0.02%

2. Significant impacts on eembc benchmarks are:

eembc/nnet_test       9.90%
eembc/mp2decoddata2  16.42%
eembc/textv2data3    -4.86%
eembc/qos            12.90%

gcc/

* config/i386/i386-expand.c (expand_set_or_cpymem_via_rep):
For TARGET_PREFER_KNOWN_REP_MOVSB_STOSB, don't convert QImode
to SImode.
(decide_alg): For TARGET_PREFER_KNOWN_REP_MOVSB_STOSB, use
"rep movsb/stosb" only for known sizes.
* config/i386/i386-options.c (processor_cost_table): Use Ice
Lake cost for Cannon Lake, Ice Lake, Tiger Lake, Sapphire
Rapids and Alder Lake.
* config/i386/i386.h (TARGET_PREFER_KNOWN_REP_MOVSB_STOSB): New.
* config/i386/x86-tune-costs.h (icelake_memcpy): New.
(icelake_memset): Likewise.
(icelake_cost): Likewise.
* config/i386/x86-tune.def (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB):
New.

gcc/testsuite/

* gcc.target/i386/memcpy-strategy-5.c: New test.
* gcc.target/i386/memcpy-strategy-6.c: Likewise.
* gcc.target/i386/memcpy-strategy-7.c: Likewise.
* gcc.target/i386/memcpy-strategy-8.c: Likewise.
* gcc.target/i386/memset-strategy-3.c: Likewise.
* gcc.target/i386/memset-strategy-4.c: Likewise.
* gcc.target/i386/memset-strategy-5.c: Likewise.
* gcc.target/i386/memset-strategy-6.c: Likewise.
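
To see the strategies in practice, one can compile a few small functions and inspect the generated assembly. The sketch below is not the contents of the new test files; the function names and sizes are hypothetical, chosen only to cover a constant size, a size bounded by 256, and an unknown size.

```c
#include <stddef.h>
#include <string.h>

/* Constant size within the 256-byte limit: a candidate for inline
   expansion (load/store or "rep movsb", depending on options).  */
void
copy_fixed_64 (char *dst, const char *src)
{
  memcpy (dst, src, 64);
}

/* Variable size, but the guard tells the compiler it is <= 256:
   a candidate for the inline loop.  */
void
copy_bounded (char *dst, const char *src, size_t n)
{
  if (n <= 256)
    memcpy (dst, src, n);
}

/* Nothing is known about the size: expected to stay a call to the
   memcpy library function.  */
void
copy_unknown (char *dst, const char *src, size_t n)
{
  memcpy (dst, src, n);
}

/* Constant-size clear at the 256-byte limit.  */
void
clear_fixed_256 (char *dst)
{
  memset (dst, 0, 256);
}
```

Compiling this with something like `gcc -O2 -mtune=icelake-client -S` and searching the output for `rep movsb`, `rep stosb`, and `call memcpy` should show which strategy was picked for each function. The exact result depends on the GCC version and on other options such as the enabled vector ISA.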

13 files changed:
gcc/config/i386/i386-expand.c
gcc/config/i386/i386-options.c
gcc/config/i386/i386.h
gcc/config/i386/x86-tune-costs.h
gcc/config/i386/x86-tune.def
gcc/testsuite/gcc.target/i386/memcpy-strategy-5.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/memcpy-strategy-6.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/memcpy-strategy-7.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/memcpy-strategy-8.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/memset-strategy-3.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/memset-strategy-4.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/memset-strategy-5.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/memset-strategy-6.c [new file with mode: 0644]