review.tizen.org Git - platform/upstream/llvm.git/commit

author	Ilya Tokar <tokarip@google.com>
	Thu, 15 Dec 2022 20:00:27 +0000 (15:00 -0500)
committer	Ilya Tokar <tokarip@google.com>
	Tue, 24 Jan 2023 22:02:46 +0000 (17:02 -0500)
commit	d7043e8c41bb74a31c9790616c1536596814567b
tree	0b8e78e21ee1febe1c437b402640a7bb94a92402
parent	7e89420116c91647db340a4457b8ad0d60be1d5e

[X86] Add support for "light" AVX

AVX/AVX512 instructions may cause a frequency drop on e.g. Skylake.
The magnitude of the frequency/performance drop depends on the instruction
(multiplication vs. load/store) and the vector width. Currently, users
who want to avoid this drop can specify -mprefer-vector-width=128.
However, this also prevents generation of 256-bit wide instructions
that have no associated frequency drop (mainly loads/stores).

Add a tuning flag that allows generation of 256-bit AVX loads/stores,
even when -mprefer-vector-width=128 is set, to speed up memcpy & co.
Verified that running a memcpy loop on all cores has no frequency impact
and leaves the CORE_POWER:LVL[12]_TURBO_LICENSE perf counters at zero.
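
As a rough illustration of the kind of code this affects, here is a minimal
sketch of a copy loop built from constant-size 32-byte memcpy calls. The
helper name copy_in_32_byte_chunks is hypothetical and is not part of this
commit. The assumption is that each 32-byte copy is a candidate for a single
256-bit load/store pair, and would otherwise be split into two 128-bit pairs
under -mprefer-vector-width=128.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of the commit: copies n bytes in fixed
// 32-byte chunks so the compiler sees constant-size memcpy calls.
void copy_in_32_byte_chunks(unsigned char *dst, const unsigned char *src,
                            std::size_t n) {
  std::size_t i = 0;
  for (; i + 32 <= n; i += 32) {
    // Constant-size copy: a candidate for one 256-bit load/store pair,
    // or two 128-bit pairs when only 128-bit vectors are used.
    std::memcpy(dst + i, src + i, 32);
  }
  // Copy the remaining tail (fewer than 32 bytes, possibly zero).
  std::memcpy(dst + i, src + i, n - i);
}
```

One way to check the effect is to compile with something like
`clang++ -O2 -mavx2 -mprefer-vector-width=128 -S` and look for ymm versus xmm
moves in the output. Which form appears depends on the tuning of the selected
target CPU, so treat the result as something to verify rather than a given.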

Makes copying memory faster, e.g.:
BM_memcpy_aligned/256 80.7GB/s ± 3% 96.3GB/s ± 9% +19.33% (p=0.000 n=9+9)

Differential Revision: https://reviews.llvm.org/D134982

llvm/lib/Target/X86/X86.td
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/lib/Target/X86/X86Subtarget.h
llvm/lib/Target/X86/X86TargetTransformInfo.h
llvm/test/CodeGen/X86/memcpy-light-avx.ll	[new file with mode: 0644]
llvm/test/CodeGen/X86/vector-width-store-merge.ll