review.tizen.org Git - platform/upstream/llvm.git/commit

author	Roman Lebedev <lebedev.ri@gmail.com>
	Tue, 1 Jun 2021 07:39:36 +0000 (10:39 +0300)
committer	Roman Lebedev <lebedev.ri@gmail.com>
	Tue, 1 Jun 2021 07:39:36 +0000 (10:39 +0300)
commit	cf9b1f7a0e9da5d019a8bea853f3cff85d808d18
tree	ee0bf400df718b5512e9b91666f1ad83caf6e5a8
parent	41d7909368bebc897467a75860a524a5f172564f

[X86] Split FeatureFastVariableShuffle tuning into Lane-Crossing and Per-Lane variants

Currently, the X86 backend has only a single, one-size-fits-all `FeatureFastVariableShuffle` feature,
which controls the profitability of both cross-lane and per-lane variable shuffles.
I guess this has been fine so far.

But at least on AMD Zen 3, per-lane variable shuffles (e.g. `VPSHUFB`)
are as fast as shuffles with a fixed/immediate mask,
while lane-crossing shuffles (e.g. `VPERMPS`) perform worse.

So, to get the benefits of variable-mask shuffles without the drawbacks of lane-crossing shuffles,
split the feature flag into two, as suggested by @RKSimon.

Differential Revision: https://reviews.llvm.org/D103274
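
For illustration only, here is a minimal standalone sketch of the idea behind the split. It is not code from the patch: `ShuffleKind`, `ShuffleTuning`, `preferVariableMask`, `preferVariableMaskOld` and `Zen3Like` are hypothetical names, and the flag values are an assumption modelled on the Zen 3 behaviour described above.

```cpp
// Hypothetical model of the tuning split; none of these identifiers are
// taken from the actual patch.
enum class ShuffleKind { PerLane, CrossLane };

// Two independent tuning bits instead of one global flag.
struct ShuffleTuning {
  bool FastVariablePerLaneShuffle;
  bool FastVariableCrossLaneShuffle;
};

// Before the split: a single flag answers for both kinds of shuffle,
// so the kind of shuffle is ignored.
constexpr bool preferVariableMaskOld(bool FastVariableShuffle, ShuffleKind) {
  return FastVariableShuffle;
}

// After the split: the answer depends on whether the shuffle crosses lanes.
constexpr bool preferVariableMask(const ShuffleTuning &T, ShuffleKind K) {
  return K == ShuffleKind::CrossLane ? T.FastVariableCrossLaneShuffle
                                     : T.FastVariablePerLaneShuffle;
}

// Assumed profile for a Zen 3-like target: per-lane variable shuffles are
// considered fast, lane-crossing ones are not.
constexpr ShuffleTuning Zen3Like{true, false};

static_assert(preferVariableMask(Zen3Like, ShuffleKind::PerLane),
              "per-lane variable shuffles stay profitable");
static_assert(!preferVariableMask(Zen3Like, ShuffleKind::CrossLane),
              "lane-crossing variable shuffles are avoided");
```

With a single flag, a target like this would have to either accept slow lane-crossing shuffles or give up the per-lane ones as well; two flags let it opt into only the cheap case.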

66 files changed:

llvm/lib/Target/X86/X86.td
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/lib/Target/X86/X86Subtarget.h
llvm/lib/Target/X86/X86TargetTransformInfo.h
llvm/test/CodeGen/X86/avx2-conversions.ll
llvm/test/CodeGen/X86/avx2-vector-shifts.ll
llvm/test/CodeGen/X86/avx512-extract-subvector-load-store.ll
llvm/test/CodeGen/X86/avx512-shuffles/broadcast-vector-int.ll
llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll
llvm/test/CodeGen/X86/avx512-trunc.ll
llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-sext.ll
llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-zext.ll
llvm/test/CodeGen/X86/broadcastm-lowering.ll
llvm/test/CodeGen/X86/combine-shl.ll
llvm/test/CodeGen/X86/combine-sra.ll
llvm/test/CodeGen/X86/combine-srl.ll
llvm/test/CodeGen/X86/insertelement-zero.ll
llvm/test/CodeGen/X86/oddshuffles.ll
llvm/test/CodeGen/X86/oddsubvector.ll
llvm/test/CodeGen/X86/paddus.ll
llvm/test/CodeGen/X86/phaddsub.ll
llvm/test/CodeGen/X86/psubus.ll
llvm/test/CodeGen/X86/sadd_sat_vec.ll
llvm/test/CodeGen/X86/shuffle-of-splat-multiuses.ll
llvm/test/CodeGen/X86/shuffle-strided-with-offset-128.ll
llvm/test/CodeGen/X86/shuffle-strided-with-offset-256.ll
llvm/test/CodeGen/X86/shuffle-strided-with-offset-512.ll
llvm/test/CodeGen/X86/shuffle-vs-trunc-128.ll
llvm/test/CodeGen/X86/shuffle-vs-trunc-256.ll
llvm/test/CodeGen/X86/shuffle-vs-trunc-512.ll
llvm/test/CodeGen/X86/ssub_sat_vec.ll
llvm/test/CodeGen/X86/uadd_sat_vec.ll
llvm/test/CodeGen/X86/usub_sat_vec.ll
llvm/test/CodeGen/X86/vec_saddo.ll
llvm/test/CodeGen/X86/vec_smulo.ll
llvm/test/CodeGen/X86/vec_ssubo.ll
llvm/test/CodeGen/X86/vec_uaddo.ll
llvm/test/CodeGen/X86/vec_umulo.ll
llvm/test/CodeGen/X86/vec_usubo.ll
llvm/test/CodeGen/X86/vector-half-conversions.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-2.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-3.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-4.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-5.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-6.ll
llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-2.ll
llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-3.ll
llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-4.ll
llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-5.ll
llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-6.ll
llvm/test/CodeGen/X86/vector-shuffle-128-unpck.ll
llvm/test/CodeGen/X86/vector-shuffle-128-v16.ll
llvm/test/CodeGen/X86/vector-shuffle-128-v4.ll
llvm/test/CodeGen/X86/vector-shuffle-128-v8.ll
llvm/test/CodeGen/X86/vector-shuffle-256-v16.ll
llvm/test/CodeGen/X86/vector-shuffle-256-v32.ll
llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll
llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll
llvm/test/CodeGen/X86/vector-shuffle-combining.ll
llvm/test/CodeGen/X86/vector-shuffle-v1.ll
llvm/test/CodeGen/X86/vector-trunc-math.ll
llvm/test/CodeGen/X86/vector-trunc-packus.ll
llvm/test/CodeGen/X86/vector-trunc-ssat.ll
llvm/test/CodeGen/X86/vector-trunc-usat.ll
llvm/test/CodeGen/X86/vector-trunc.ll
llvm/test/CodeGen/X86/vector-zext.ll

Domain: System / Toolchain;
