x86: Update -mtune=tremont
authorH.J. Lu <hjl.tools@gmail.com>
Wed, 15 Sep 2021 06:15:10 +0000 (14:15 +0800)
committerliuhongt <hongtao.liu@intel.com>
Fri, 17 Sep 2021 08:17:00 +0000 (16:17 +0800)
commit61b03ade93b0f47dd888cd5228f017979c494263
treec601f18d62399c6b004581f56e6ea014fce29f1f
parent687e30d9d74f5c69a54617147ce3433b07ee59ce
x86: Update -mtune=tremont

Initial -mtune=tremont update

1. Use Haswell scheduling model.
2. Assume that stack engine allows to execute push&pop instructions in
parall.
3. Prepare for scheduling pass as -mtune=generic.
4. Use the same issue rate as -mtune=generic.
5. Enable partial_reg_dependency.
6. Disable accumulate_outgoing_args
7. Enable use_leave
8. Enable push_memory
9. Disable four_jump_limit
10. Disable opt_agu
11. Disable avoid_lea_for_addr
12. Disable avoid_mem_opnd_for_cmove
13. Enable misaligned_move_string_pro_epilogues
14. Enable use_cltd
16. Enable avoid_false_dep_for_bmi
17. Enable avoid_mfence
18. Disable expand_abs
19. Enable sse_typeless_stores
20. Enable sse_load0_by_pxor
21. Disable split_mem_opnd_for_fp_converts
22. Disable slow_pshufb
23. Enable partial_reg_dependency

This is the first patch to tune for Tremont.  With all patches applied,
performance impacts on SPEC CPU 2017 are:

500.perlbench_r         1.81%
502.gcc_r               0.57%
505.mcf_r               1.16%
520.omnetpp_r           0.00%
523.xalancbmk_r         0.00%
525.x264_r              4.55%
531.deepsjeng_r         0.00%
541.leela_r             0.39%
548.exchange2_r         1.13%
557.xz_r                0.00%
geomean for intrate     0.95%
503.bwaves_r            0.00%
507.cactuBSSN_r         6.94%
508.namd_r              12.37%
510.parest_r            1.01%
511.povray_r            3.70%
519.lbm_r               36.61%
521.wrf_r               8.79%
526.blender_r           2.91%
527.cam4_r              6.23%
538.imagick_r           0.28%
544.nab_r               21.99%
549.fotonik3d_r         3.63%
554.roms_r              -1.20%
geomean for fprate      7.50%

gcc/ChangeLog

* common/config/i386/i386-common.c: Use Haswell scheduling model
for Tremont.
* config/i386/i386.c (ix86_sched_init_global): Prepare for Tremont
scheduling pass.
* config/i386/x86-tune-sched.c (ix86_issue_rate): Change Tremont
issue rate to 4.
(ix86_adjust_cost): Handle Tremont.
* config/i386/x86-tune.def (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY):
Enable for Tremont.
(X86_TUNE_USE_LEAVE): Likewise.
(X86_TUNE_PUSH_MEMORY): Likewise.
(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Likewise.
(X86_TUNE_USE_CLTD): Likewise.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Likewise.
(X86_TUNE_AVOID_MFENCE): Likewise.
(X86_TUNE_SSE_TYPELESS_STORES): Likewise.
(X86_TUNE_SSE_LOAD0_BY_PXOR): Likewise.
(X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Disable for Tremont.
(X86_TUNE_FOUR_JUMP_LIMIT): Likewise.
(X86_TUNE_OPT_AGU): Likewise.
(X86_TUNE_AVOID_LEA_FOR_ADDR): Likewise.
(X86_TUNE_AVOID_MEM_OPND_FOR_CMOVE): Likewise.
(X86_TUNE_EXPAND_ABS): Likewise.
(X86_TUNE_SPLIT_MEM_OPND_FOR_FP_CONVERTS): Likewise.
(X86_TUNE_SLOW_PSHUFB): Likewise.
gcc/common/config/i386/i386-common.c
gcc/config/i386/i386.c
gcc/config/i386/x86-tune-sched.c
gcc/config/i386/x86-tune.def