platform/upstream/llvm.git
3 years agoDelay outgoing register assignments to last.
Amara Emerson [Mon, 27 Sep 2021 06:20:46 +0000 (23:20 -0700)]
Delay outgoing register assignments to last.

The delayed stack protector feature which is currently used for SDAG (and thus
allows for more commonly generating tail calls) depends on being able to extract
the tail call into a separate return block. To do this it also has to extract
the vreg->physreg copies that set up the call's arguments, since if it doesn't
then the call inst ends up using undefined physregs in its new spliced block.

SelectionDAG implementations can do this because they delay emitting register
copies until *after* the stack arguments are set up. GISel however just
processes and emits the arguments in IR order, so stack arguments always end up
last, and thus this breaks the code that looks for any register arg copies that
precede the call instruction.

This patch adds a thunk argument to the assignValueToReg() and custom assignment
hooks. For outgoing arguments, register assignments use this return param to
return a thunk that does the actual generating of the copies. We collect these
until all the outgoing stack assignments have been done and then execute them,
so that the copies (and perhaps some artifacts like G_SEXTs) are placed after
any stores.
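
The collect-then-execute idea can be sketched roughly as follows. This is a standalone illustration, not the patch's code: `OutgoingArg`, `InstList`, and `lowerOutgoingArgs` are hypothetical names, and instructions are modelled as plain strings rather than MachineInstrs.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the instruction stream being built.
using InstList = std::vector<std::string>;

// One outgoing argument: passed either in a register or on the stack.
struct OutgoingArg {
  std::string name;
  bool inRegister;
};

// Emits stack stores immediately, but defers register copies by collecting
// thunks and running them only after every stack assignment is emitted.
InstList lowerOutgoingArgs(const std::vector<OutgoingArg> &args) {
  InstList insts;
  std::vector<std::function<void()>> delayedCopies;

  for (const OutgoingArg &arg : args) {
    if (arg.inRegister) {
      // Capture the name by value so the thunk stays valid after the loop.
      delayedCopies.push_back(
          [&insts, name = arg.name] { insts.push_back("COPY " + name); });
    } else {
      insts.push_back("STORE " + arg.name);
    }
  }

  // All stores are in place; now emit the register copies, then the call.
  for (auto &thunk : delayedCopies)
    thunk();
  insts.push_back("CALL");
  return insts;
}
```

With arguments in IR order `a` (register), `b` (stack), `c` (register), the copies for `a` and `c` end up directly before the call, after the store for `b`.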

Differential Revision: https://reviews.llvm.org/D110610

3 years ago[mlir] rename the "packing" flag of linalg.pad_tensor to "nofold"
Alex Zinenko [Mon, 4 Oct 2021 10:09:29 +0000 (12:09 +0200)]
[mlir] rename the "packing" flag of linalg.pad_tensor to "nofold"

The discussion in https://reviews.llvm.org/D110425 demonstrated that "packing"
may be a confusing term to define the behavior of this op in presence of the
attribute. Instead, indicate the intended effect of preventing the folder from
being applied.

Reviewed By: nicolasvasilache, silvas

Differential Revision: https://reviews.llvm.org/D111046

3 years agoRevert "[GlobalISel] Support vectors in LegalizerHelper::narrowScalarMul"
Jay Foad [Mon, 4 Oct 2021 19:25:42 +0000 (20:25 +0100)]
Revert "[GlobalISel] Support vectors in LegalizerHelper::narrowScalarMul"

This reverts commit 90da0b9a5a5322f5a48574274421357d7b22f2cb.

It was causing an LLVM_ENABLE_EXPENSIVE_CHECKS buildbot failure.

3 years agoUpdate `DynTypedNode` to support the conversion of `TypeLoc`s.
James King [Mon, 4 Oct 2021 19:09:12 +0000 (19:09 +0000)]
Update `DynTypedNode` to support the conversion of `TypeLoc`s.

This provides better support for `TypeLoc`s to allow `TypeLoc`-related
matchers to feature stricter typing and to avoid relying on the dynamic
casting of `TypeLoc`s in matchers.

Reviewed By: ymandel, tdl-g, sbenza

Differential Revision: https://reviews.llvm.org/D110586

3 years ago[GlobalISel] Widen G_EXTRACT_VECTOR_ELT using anyext instead of sext.
Amara Emerson [Sat, 25 Sep 2021 05:52:30 +0000 (22:52 -0700)]
[GlobalISel] Widen G_EXTRACT_VECTOR_ELT using anyext instead of sext.

G_SEXT seems to be unnecessary here, anyext will do.

Differential Revision: https://reviews.llvm.org/D110469

3 years ago[PowerPC] Disable vector types when not supported by subtarget features
Lei Huang [Thu, 2 Sep 2021 18:31:10 +0000 (13:31 -0500)]
[PowerPC] Disable vector types when not supported by subtarget features

Update clang to treat vector unsigned long long and friends as invalid
for AltiVec without VSX.

Reported in: https://bugs.llvm.org/show_bug.cgi?id=47782

Reviewed By: nemanjai, amyk

Differential Revision: https://reviews.llvm.org/D109178

3 years ago[fir] add fir.array_modify op
Jean Perier [Mon, 4 Oct 2021 18:57:40 +0000 (20:57 +0200)]
[fir] add fir.array_modify op

fir.array_update is only handling intrinsic assignments.
There are two big differences with user defined assignments:
1. The LHS and RHS types may not match, this does not play well
   with fir.array_update that relies on both the merge and the
   updated element to have the same type.
2. User defined assignment has call semantics, with potential
   side effects. So if a fir.array_update can hide a call, its traits
   would need to be updated.

Instead of hiding more semantics in the fir.array_update, introduce
a new fir.array_modify op that decouples indicating that
an array value element is modified from how it is modified.
This allows the ArrayValueCopy pass to still perform copy elision
while not having to implement the call itself, and could in general
be used for all kinds of assignments (e.g. character assignment).

Update the alias analysis to not rely on the merge arguments (since
fir.array_modify has none).
Instead, analyze what is done with the element address.
This implies adding the ability to follow the users of fir.array_modify,
as well as being able to go through fir.store that may be generated to
store the RHS value in order to pass it to a user defined routine.
This is done by adding a ReachCollector class to gather all array
accesses.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: schweitz

Differential Revision: https://reviews.llvm.org/D110928

Co-authored-by: Valentin Clement <clementval@gmail.com>
3 years ago[libc++] Disable the Apple system -fno-exceptions CI that is currently failing
Louis Dionne [Mon, 4 Oct 2021 18:57:50 +0000 (14:57 -0400)]
[libc++] Disable the Apple system -fno-exceptions CI that is currently failing

I'm disabling it to avoid blocking everybody until I've fixed the issue.

3 years ago[fir][NFC] Fix couple of clang-tidy warnings
Valentin Clement [Mon, 4 Oct 2021 18:54:43 +0000 (20:54 +0200)]
[fir][NFC] Fix couple of clang-tidy warnings

Fix some clang-tidy warnings in flang/Optimizer/Support and
remove explicit number of inlined elements for SmallVector. This
is mostly to sync with the changes from fir-dev.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: schweitz

Differential Revision: https://reviews.llvm.org/D111044

3 years ago[X86][SLM] Fix BSR/BSF port usage
Simon Pilgrim [Mon, 4 Oct 2021 18:52:43 +0000 (19:52 +0100)]
[X86][SLM] Fix BSR/BSF port usage

Both ports are required for BitScan ops. Update the port usage and distributed throughput based off the most recent llvm-exegesis captures (PR36895) and what Intel AoM / Agner reports as well.

3 years agoAdd core papers added in the October 2021 WG21 plenary
Corentin Jabot [Mon, 4 Oct 2021 18:42:13 +0000 (14:42 -0400)]
Add core papers added in the October 2021 WG21 plenary

3 years ago[GlobalISel] Support vectors in LegalizerHelper::narrowScalarMul
Jay Foad [Fri, 1 Oct 2021 12:30:42 +0000 (13:30 +0100)]
[GlobalISel] Support vectors in LegalizerHelper::narrowScalarMul

Also remove some redundancy because the source and result
types of any multiply are always the same.

Differential Revision: https://reviews.llvm.org/D110926

3 years ago[InstCombine] add helper for "is desirable int type"; NFC
Sanjay Patel [Mon, 4 Oct 2021 16:54:01 +0000 (12:54 -0400)]
[InstCombine] add helper for "is desirable int type"; NFC

This splits out the logic from shouldChangeType() that
currently allows 8/16/32-bit transforms even if those
types are not listed as legal in the data layout.
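
A rough sketch of such a predicate is shown below. The function name follows the commit title, but the signature is an assumption: `legalWidths` is a hypothetical stand-in for the data layout's list of legal integer widths, and falling back to that list for other widths is also an assumption.

```cpp
#include <set>

// Sketch: 8/16/32-bit integers count as desirable even when the data layout
// does not list them as legal; any other width must be explicitly legal.
// `legalWidths` is a hypothetical stand-in for the data layout query.
bool isDesirableIntType(unsigned bitWidth,
                        const std::set<unsigned> &legalWidths) {
  switch (bitWidth) {
  case 8:
  case 16:
  case 32:
    return true;
  default:
    return legalWidths.count(bitWidth) != 0;
  }
}
```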

This could be useful as a predicate for vector
insert/extract transforms.

Note that this leaves the subsequent checks in
shouldChangeType() unchanged. We may want to merge
the checks for i1 and/or "ToLegal" into "isDesirable",
but that may alter existing transforms.

3 years ago[InstCombine] add tests for extractelt of bitcasted scalar; NFC
Sanjay Patel [Mon, 4 Oct 2021 13:40:02 +0000 (09:40 -0400)]
[InstCombine] add tests for extractelt of bitcasted scalar; NFC

3 years ago[GlobalISel][IRTranslator] Emit trap intrinsic for "unreachable"
Amara Emerson [Tue, 28 Sep 2021 01:11:07 +0000 (18:11 -0700)]
[GlobalISel][IRTranslator] Emit trap intrinsic for "unreachable"

We were previously just ignoring unreachable, but targets like Darwin want to
keep unreachable instructions as traps.

Differential Revision: https://reviews.llvm.org/D110603

3 years agoFix msan/tests/msan_test.cpp due to -Wbitwise-instead-of-logical
Amy Kwan [Mon, 4 Oct 2021 17:19:16 +0000 (12:19 -0500)]
Fix msan/tests/msan_test.cpp due to -Wbitwise-instead-of-logical

The LE Power sanitizer bot fails when testing standalone compiler-rt due to
an MSAN test warning introduced by -Wbitwise-instead-of-logical. As this option
along with -Werror is enabled on the bot, the test failure occurs.
This patch updates msan_test.cpp to fix the warning introduced by
-Wbitwise-instead-of-logical.

3 years ago[NFC][X86][Codegen] Add test coverage for interleaved i64 load/store stride=6
Roman Lebedev [Mon, 4 Oct 2021 17:52:57 +0000 (20:52 +0300)]
[NFC][X86][Codegen] Add test coverage for interleaved i64 load/store stride=6

3 years ago[NFC][X86][LV] Add costmodel test coverage for interleaved i64/f64 load/store stride=6
Roman Lebedev [Mon, 4 Oct 2021 17:05:50 +0000 (20:05 +0300)]
[NFC][X86][LV] Add costmodel test coverage for interleaved i64/f64 load/store stride=6

3 years ago[NFC][X86][Codegen] Add test coverage for interleaved i32 load/store stride=6
Roman Lebedev [Mon, 4 Oct 2021 17:26:49 +0000 (20:26 +0300)]
[NFC][X86][Codegen] Add test coverage for interleaved i32 load/store stride=6

3 years ago[NFC][X86][LV] Add costmodel test coverage for interleaved i32/f32 load/store stride=6
Roman Lebedev [Mon, 4 Oct 2021 17:05:47 +0000 (20:05 +0300)]
[NFC][X86][LV] Add costmodel test coverage for interleaved i32/f32 load/store stride=6

3 years ago[FPEnv][InstSimplify] Prepush more tests for D106362.
Kevin P. Neal [Mon, 4 Oct 2021 17:43:32 +0000 (13:43 -0400)]
[FPEnv][InstSimplify] Prepush more tests for D106362.

In working on D106362 I found that a few more tests were needed. I've
been asked to pre-push the tests for that ticket. This should complete
the tests needed for now.

3 years ago[libc++][NFC] Fix include guard for some detail header
Louis Dionne [Mon, 4 Oct 2021 17:36:08 +0000 (13:36 -0400)]
[libc++][NFC] Fix include guard for some detail header

3 years ago[libc++][NFC] Remove header name from <version>
Louis Dionne [Mon, 4 Oct 2021 17:34:26 +0000 (13:34 -0400)]
[libc++][NFC] Remove header name from <version>

3 years ago[AArch64] Disable AArch64StorePairSuppress under optsize
David Green [Mon, 4 Oct 2021 17:28:15 +0000 (18:28 +0100)]
[AArch64] Disable AArch64StorePairSuppress under optsize

AArch64StorePairSuppress will prevent the creation of LDP's based on
scheduling info. This shouldn't apply when optimizing for size though,
where the size decrease should be considered more important.

Differential Revision: https://reviews.llvm.org/D110809

3 years ago[lldb][import-std-module] Prefer the non-module diagnostics when in fallback mode
Raphael Isemann [Mon, 4 Oct 2021 16:55:22 +0000 (18:55 +0200)]
[lldb][import-std-module] Prefer the non-module diagnostics when in fallback mode

The `fallback` setting for import-std-module is supposed to allow running
expressions that require an imported C++ module without causing any regressions
for users (neither in terms of functionality nor performance). This is done by
first trying to normally parse/evaluate an expression and when an error occurred
during this first attempt, we retry with the loaded 'std' module.

When we run into a system with a 'std' module that for some reason doesn't build
or otherwise causes parse errors, then this currently means that the second
parse attempt will overwrite the error diagnostics of the first parse attempt.
Given that the module build errors are outside of the scope of what the user can
influence, it makes more sense to show the errors from the first parse attempt
that are only concerned with the actual user input.
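
The retry policy described above could look roughly like this. It is a simplified sketch, not LLDB's code: `ParseResult` and `parseWithFallback` are hypothetical names, and the parser is modelled as a callback that takes a flag saying whether the 'std' module is loaded.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical result of one parse attempt.
struct ParseResult {
  bool ok;
  std::vector<std::string> diagnostics;
};

// First parse without the 'std' module; on failure retry with it. If the
// retry also fails, report the diagnostics of the first attempt, since those
// only concern the user's own input.
ParseResult parseWithFallback(
    const std::function<ParseResult(bool useStdModule)> &parse) {
  ParseResult first = parse(false);
  if (first.ok)
    return first;
  ParseResult second = parse(true);
  if (second.ok)
    return second;
  return first;
}
```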

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D110696

3 years agolibc++: document in the release notes that a C++20 compiler is expected
Sylvestre Ledru [Mon, 4 Oct 2021 09:06:45 +0000 (11:06 +0200)]
libc++: document in the release notes that a C++20 compiler is expected

Differential Revision: https://reviews.llvm.org/D111043

3 years ago[flang] Better error recovery for missing THEN in ELSE IF
peter klausler [Thu, 30 Sep 2021 22:58:38 +0000 (15:58 -0700)]
[flang] Better error recovery for missing THEN in ELSE IF

The THEN keyword in the "ELSE IF (test) THEN" statement is useless
syntactically, and to omit it is a common error (at least for me!)
that has poor error recovery.  This patch changes the parser to
cough up a simple "expected 'THEN'" and still recognize the rest of
the IF construct.

Differential Revision: https://reviews.llvm.org/D110952

3 years ago[flang][NFC] Fix first line of magic-numbers.h
peter klausler [Fri, 1 Oct 2021 19:55:33 +0000 (12:55 -0700)]
[flang][NFC] Fix first line of magic-numbers.h

The first line of flang/include/flang/Runtime/magic-numbers.h
got split into two somehow; join it back up.

Differential Revision: https://reviews.llvm.org/D110965

3 years ago[NFC] Fix build failure in ScopDetection
Christopher Tetreault [Thu, 30 Sep 2021 23:39:48 +0000 (16:39 -0700)]
[NFC] Fix build failure in ScopDetection

In some build environments, the C++ compiler is unable to infer the
correct type for the DenseMap::insert in isErrorBlock. Typing out
std::make_pair helps.
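
As an illustration of the pattern only, the sketch below uses `std::unordered_map` as a stand-in for `DenseMap`; `cacheErrorBlock` and its parameters are hypothetical and are not taken from ScopDetection.

```cpp
#include <unordered_map>
#include <utility>

// Inserts (blockId, isError) unless an entry already exists and returns the
// cached value. Spelling out std::make_pair gives the argument a concrete
// pair type instead of relying on deduction from a braced initializer.
bool cacheErrorBlock(std::unordered_map<int, bool> &cache, int blockId,
                     bool isError) {
  auto result = cache.insert(std::make_pair(blockId, isError));
  return result.first->second;
}
```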

3 years ago[SimpleLoopUnswitch] Allow threshold to be specified zero or more times
Christopher Tetreault [Mon, 27 Sep 2021 21:23:49 +0000 (14:23 -0700)]
[SimpleLoopUnswitch] Allow threshold to be specified zero or more times

Differential Revision: https://reviews.llvm.org/D110594

3 years ago[mlir][SPIRVToLLVM] Propagate location attribute from spv.GlobalVariable to llvm...
Weiwei Li [Mon, 4 Oct 2021 16:04:33 +0000 (00:04 +0800)]
[mlir][SPIRVToLLVM] Propagate location attribute from spv.GlobalVariable to llvm.mlir.global

This patch is mainly to propagate location attribute from spv.GlobalVariable to llvm.mlir.global.

It also contains three small changes.

1. Remove the restriction on UniformConstant in SPIRVToLLVM.cpp;
2. Remove the errorCheck on relaxedPrecision when deserializing SPIR-V in Deserializer.cpp;
3. In SPIRVOps.cpp, let ConstantOp take signedInteger too.

Co-authored: Alan Liu <alanliu.yf@gmail.com> and Xinyi Liu <xyliuhelen@gmail.com>

Reviewed by: antiagainst

Differential revision: https://reviews.llvm.org/D110207

3 years ago[LLDB] Fix objc_clsopt_v16_t struct
Alfsonso Gregory [Mon, 4 Oct 2021 15:54:05 +0000 (08:54 -0700)]
[LLDB] Fix objc_clsopt_v16_t struct

The objc_clsopt_v16_t struct does not match up with the macOS/iOS15
dyld_shared_cache ObjC runtime structures. A struct field was seemingly
omitted.

Differential revision: https://reviews.llvm.org/D110477

3 years ago[lld] Use checkError more
Nico Weber [Mon, 4 Oct 2021 15:45:55 +0000 (11:45 -0400)]
[lld] Use checkError more

No behavior change.

3 years ago[IR] Migrate from getNumArgOperands to arg_size (NFC)
Kazu Hirata [Mon, 4 Oct 2021 15:40:24 +0000 (08:40 -0700)]
[IR] Migrate from getNumArgOperands to arg_size (NFC)

Note that arg_operands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.

3 years ago[PowerPC][NFC] Remove reg name option in int128 test
Jinsong Ji [Mon, 4 Oct 2021 15:29:04 +0000 (15:29 +0000)]
[PowerPC][NFC] Remove reg name option in int128 test

The test is generated by script, so we don't really need the regname to
be meaningful here.

AIX doesn't support the reg name option, removing it for now so that we
can reuse the CHECKs for AIX triple as well.

3 years ago[libc++][NFC] Qualify nullptr_t in test
Louis Dionne [Mon, 4 Oct 2021 15:19:16 +0000 (11:19 -0400)]
[libc++][NFC] Qualify nullptr_t in test

3 years ago[gn build] Port 811b1736d91b
LLVM GN Syncbot [Mon, 4 Oct 2021 15:13:27 +0000 (15:13 +0000)]
[gn build] Port 811b1736d91b

3 years ago[analyzer] Add InvalidPtrChecker
Zurab Tsinadze [Sat, 18 Sep 2021 20:54:59 +0000 (22:54 +0200)]
[analyzer] Add InvalidPtrChecker

This patch introduces a new checker: `alpha.security.cert.env.InvalidPtr`

Checker finds usage of invalidated pointers related to environment.

Based on the following SEI CERT Rules:
ENV34-C: https://wiki.sei.cmu.edu/confluence/x/8tYxBQ
ENV31-C: https://wiki.sei.cmu.edu/confluence/x/5NUxBQ

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D97699

3 years ago[NFC][X86][Codegen] Add test coverage for interleaved i64 load/store stride=4
Roman Lebedev [Mon, 4 Oct 2021 14:04:29 +0000 (17:04 +0300)]
[NFC][X86][Codegen] Add test coverage for interleaved i64 load/store stride=4

3 years ago[NFC][X86][LV] Add costmodel test coverage for interleaved i64/f64 load/store stride=4
Roman Lebedev [Mon, 4 Oct 2021 13:55:08 +0000 (16:55 +0300)]
[NFC][X86][LV] Add costmodel test coverage for interleaved i64/f64 load/store stride=4

3 years ago[NFC][X86][Codegen] Add test coverage for interleaved i32 load/store stride=4
Roman Lebedev [Mon, 4 Oct 2021 13:50:43 +0000 (16:50 +0300)]
[NFC][X86][Codegen] Add test coverage for interleaved i32 load/store stride=4

3 years ago[NFC][X86][LV] Add costmodel test coverage for interleaved i32/f32 load/store stride=4
Roman Lebedev [Mon, 4 Oct 2021 13:25:45 +0000 (16:25 +0300)]
[NFC][X86][LV] Add costmodel test coverage for interleaved i32/f32 load/store stride=4

3 years ago[llvm-objdump] Fix common symbol output on 32 bit platforms
David Spickett [Mon, 4 Oct 2021 14:24:03 +0000 (14:24 +0000)]
[llvm-objdump] Fix common symbol output on 32 bit platforms

Since https://reviews.llvm.org/D109452 symbol-table.test has
been failing on our Arm32 bots.

https://lab.llvm.org/buildbot/#/builders/171/builds/4201

This is because in that change an implicit widening cast
of the alignment from 32 bit to 64 bit was removed and the
format string expects a 64 bit number.
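
The class of bug can be illustrated with plain `snprintf`. This is a generic sketch rather than the llvm-objdump code: `formatAlignment` is a hypothetical helper, and the explicit `static_cast` plays the role of the widening that the earlier change removed.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// The format string consumes a 64-bit value, so a 32-bit alignment has to be
// widened explicitly. Passing the 32-bit value directly would be undefined
// behaviour and tends to show up on 32-bit platforms first.
std::string formatAlignment(uint32_t alignment) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%016" PRIx64,
                static_cast<uint64_t>(alignment));
  return buf;
}
```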

3 years ago[libc++][NFC] Qualify usage of nullptr_t in the format tests
Louis Dionne [Mon, 4 Oct 2021 14:22:17 +0000 (10:22 -0400)]
[libc++][NFC] Qualify usage of nullptr_t in the format tests

3 years ago[clangd] Improve PopulateSwitch tweak
David Goldman [Fri, 1 Oct 2021 18:46:57 +0000 (14:46 -0400)]
[clangd] Improve PopulateSwitch tweak

- Support enums in C and ObjC as their
  AST representations differ slightly.

- Add support for typedef'ed enums.

Differential Revision: https://reviews.llvm.org/D110954

3 years ago[clang] Fix computation of number of dependencies using OpenMP iterator,
Alexey Bataev [Mon, 4 Oct 2021 13:28:09 +0000 (06:28 -0700)]
[clang] Fix computation of number of dependencies using OpenMP iterator,
by Raul Penacoba.

The size of kmp_depend_info and the number of dependencies are computed by multiplying the iterator sizes, which is not right.
Now size is computed as:

itersize1*numclausedeps1 + itersize2*numclausedeps2 + ... + itersizeN*numclausedepsN

where itersizeX is the size of the iterator and numclausedepsX the number of dependencies in that depend clause.
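
The formula above amounts to a sum of per-clause products. A minimal sketch, where `DependClause` and `totalDependencies` are hypothetical names used only for illustration:

```cpp
#include <cstddef>
#include <vector>

// One depend clause: the size of its iterator and how many dependencies the
// clause lists.
struct DependClause {
  std::size_t iteratorSize;
  std::size_t numClauseDeps;
};

// Total dependency count: sum over clauses of itersize * numclausedeps.
std::size_t totalDependencies(const std::vector<DependClause> &clauses) {
  std::size_t total = 0;
  for (const DependClause &clause : clauses)
    total += clause.iteratorSize * clause.numClauseDeps;
  return total;
}
```

For two clauses with iterator sizes 4 and 3 and dependency counts 2 and 1, this gives 4*2 + 3*1 = 11.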

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D111045

3 years ago[demangle] Add a unittest for _Float16 demangling. NFC
Wang, Pengfei [Wed, 29 Sep 2021 15:05:18 +0000 (23:05 +0800)]
[demangle] Add a unittest for _Float16 demangling. NFC

3 years ago[AArch64] Test for Store Pair Suppress under minsize.
David Green [Mon, 4 Oct 2021 14:01:18 +0000 (15:01 +0100)]
[AArch64] Test for Store Pair Suppress under minsize.

3 years ago[TargetLibraryInfo] Refactor size_t checks in isValidProtoForLibFunc. NFC
Bjorn Pettersson [Tue, 28 Sep 2021 08:26:25 +0000 (10:26 +0200)]
[TargetLibraryInfo] Refactor size_t checks in isValidProtoForLibFunc. NFC

In TargetLibraryInfoImpl::isValidProtoForLibFunc we no longer
need the IsSizeTTy lambda function and the SizeTTy object. Instead
we just follow the regular structure of checking for integer types
given an expected number of bits.

3 years ago[OpenMP] Add options to change Attributor max iterations in OpenMPOpt
Joseph Huber [Wed, 29 Sep 2021 17:45:07 +0000 (13:45 -0400)]
[OpenMP] Add options to change Attributor max iterations in OpenMPOpt

This patch adds a new command line option `openmp-opt-max-iterations`
that controls the maximum number of iterations the attributor will run
for when compiling OpenMP target device code. This patch also adds a
remark to indicate when the attributor failed because it did not run
for enough iterations.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110749

3 years ago[X86] SimplifyDemandedVectorEltsForTargetNode - simplify PMADDWD for known zero elements
Simon Pilgrim [Mon, 4 Oct 2021 13:36:32 +0000 (14:36 +0100)]
[X86] SimplifyDemandedVectorEltsForTargetNode - simplify PMADDWD for known zero elements

Noticed while investigating the regressions in D110995 - if the RHS element is already zero, then we don't need the corresponding LHS element.

Technically we could also recheck RHS once we have LHS's known zeros, but I haven't seen any missed opportunities from that yet.

3 years ago[lldb] Fix a stray array access in Editline
Pavel Labath [Mon, 4 Oct 2021 12:23:44 +0000 (14:23 +0200)]
[lldb] Fix a stray array access in Editline

This manifested itself as an asan failure in TestMultilineNavigation.py.

3 years ago[lldb] Add unit tests for Terminal API
Michał Górny [Fri, 1 Oct 2021 19:28:08 +0000 (21:28 +0200)]
[lldb] Add unit tests for Terminal API

Differential Revision: https://reviews.llvm.org/D110962

3 years ago[X86][Costmodel] Load/store i64/f64 Stride=3 VF=16 interleaving costs
Roman Lebedev [Mon, 4 Oct 2021 11:23:51 +0000 (14:23 +0300)]
[X86][Costmodel] Load/store i64/f64 Stride=3 VF=16 interleaving costs

This required a huge amount of assembly surgery, but I think this is about right.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/z11crMEcj - for intels `Block RThroughput: =20.0`; for ryzens, `Block RThroughput: <=18.0`
So could pick cost of `20`.

For store we have:
https://godbolt.org/z/eqT4ze3j4 - for intels `Block RThroughput: =24.0`; for ryzens, `Block RThroughput: <=16.0`
So we could pick cost of `24`.
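
The selection rule used in these Stride=3 commits appears to be "take the worst reported reciprocal throughput across the scheduling models". A sketch of that rule, under the assumptions that `<=` figures are treated as upper bounds and that fractional values round up; `pickCost` is a hypothetical helper:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Picks an interleaving cost as the largest Block RThroughput reported by any
// scheduling model, rounded up to a whole number.
unsigned pickCost(const std::vector<double> &rthroughputs) {
  double worst = 0.0;
  for (double value : rthroughputs)
    worst = std::max(worst, value);
  return static_cast<unsigned>(std::ceil(worst));
}
```

For the store figures above (24.0 for Intel, at most 16.0 for Ryzen) this yields 24.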

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111031

3 years ago[X86][Costmodel] Load/store i64/f64 Stride=3 VF=8 interleaving costs
Roman Lebedev [Mon, 4 Oct 2021 11:23:51 +0000 (14:23 +0300)]
[X86][Costmodel] Load/store i64/f64 Stride=3 VF=8 interleaving costs

This one required quite a bit of assembly surgery.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/oYWv4cTnK - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `10`.

For store we have:
https://godbolt.org/z/33GMhrsG9 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111027

3 years ago[X86][Costmodel] Load/store i64/f64 Stride=3 VF=4 interleaving costs
Roman Lebedev [Mon, 4 Oct 2021 11:23:46 +0000 (14:23 +0300)]
[X86][Costmodel] Load/store i64/f64 Stride=3 VF=4 interleaving costs

This one required quite a bit of assembly surgery.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Tce3osvcz - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `5`.

For store we have:
https://godbolt.org/z/oc3arEcnE - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111026

3 years ago[X86][Costmodel] Load/store i64/f64 Stride=3 VF=2 interleaving costs
Roman Lebedev [Mon, 4 Oct 2021 11:23:42 +0000 (14:23 +0300)]
[X86][Costmodel] Load/store i64/f64 Stride=3 VF=2 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/sz5qdKnr4 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `1`.

For store we have:
https://godbolt.org/z/Kzdjff63v - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111025

3 years ago[X86][Costmodel] Load/store i32/f32 Stride=3 VF=16 interleaving costs
Roman Lebedev [Mon, 4 Oct 2021 11:23:13 +0000 (14:23 +0300)]
[X86][Costmodel] Load/store i32/f32 Stride=3 VF=16 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/5fqrh4qqo - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/5fqrh4qqo - for intels `Block RThroughput: =22.0`; for ryzens, `Block RThroughput: <=16.0`
So pick cost of `22`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111022

3 years ago[X86][Costmodel] Load/store i32/f32 Stride=3 VF=8 interleaving costs
Roman Lebedev [Mon, 4 Oct 2021 11:23:13 +0000 (14:23 +0300)]
[X86][Costmodel] Load/store i32/f32 Stride=3 VF=8 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/zdz5Ga6fs - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `7`.

For store we have:
https://godbolt.org/z/qn71513ac - for intels `Block RThroughput: =11.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `11`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111021

3 years ago[X86][Costmodel] Load/store i32/f32 Stride=3 VF=4 interleaving costs
Roman Lebedev [Mon, 4 Oct 2021 11:23:08 +0000 (14:23 +0300)]
[X86][Costmodel] Load/store i32/f32 Stride=3 VF=4 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/d8PdhEszo - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/WojonfG5n - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `5`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111020

3 years ago[X86][Costmodel] Load/store i32/f32 Stride=3 VF=2 interleaving costs
Roman Lebedev [Mon, 4 Oct 2021 11:23:04 +0000 (14:23 +0300)]
[X86][Costmodel] Load/store i32/f32 Stride=3 VF=2 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/z8qa14bs3 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: =1.5`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/GYGajoc4K - for intels `Block RThroughput: <=4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111019

3 years ago[PowerPC] Fix __builtin_ppc_load2r to return short instead of int.
Stefan Pintilie [Wed, 29 Sep 2021 20:06:30 +0000 (15:06 -0500)]
[PowerPC] Fix __builtin_ppc_load2r to return short instead of int.

This patch fixes the return value of the builtin __builtin_ppc_load2r to
correctly return short instead of int.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D110771

3 years ago[APFloat] Common up some assertions. NFC.
Jay Foad [Mon, 4 Oct 2021 10:33:22 +0000 (11:33 +0100)]
[APFloat] Common up some assertions. NFC.

3 years ago[mlir] Tighten strided layout specification.
Nicolas Vasilache [Fri, 1 Oct 2021 11:54:29 +0000 (11:54 +0000)]
[mlir] Tighten strided layout specification.

Clarify that the strided layout specification is represented by a single semi-affine map.

Differential Revision: https://reviews.llvm.org/D110921

3 years ago[APFloat] Remove BitWidth argument from getAllOnesValue
Jay Foad [Mon, 4 Oct 2021 09:20:18 +0000 (10:20 +0100)]
[APFloat] Remove BitWidth argument from getAllOnesValue

There's no need to pass this in explicitly because it is
trivially available from the semantics.

3 years ago[lldb] [test] Terminate "process connect" connections via kill
Michał Górny [Sat, 2 Oct 2021 16:13:44 +0000 (18:13 +0200)]
[lldb] [test] Terminate "process connect" connections via kill

Fix the termination of "process connect" (and "gdb-remote") to kill
the process rather than attempting to disconnect the platform.
The latter only results in an error since we did not use "platform
connect", and apparently process-level connections (at least via
gdb-remote) do not really support disconnecting.

Differential Revision: https://reviews.llvm.org/D110996

3 years ago[X86] Add tests for enabling slow-mulld on AVX2 targets
Simon Pilgrim [Mon, 4 Oct 2021 10:17:17 +0000 (11:17 +0100)]
[X86] Add tests for enabling slow-mulld on AVX2 targets

As discussed on D110588 - Haswell/Broadwell don't have a great PMULLD implementation, we might want to enable this for them in the future

3 years ago[MLIR] Fix unused tablegen template arg warnings
Cullen Rhodes [Mon, 4 Oct 2021 09:28:57 +0000 (09:28 +0000)]
[MLIR] Fix unused tablegen template arg warnings

Identified in D109359.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D110805

3 years ago[ELF][test] Fix several LLD ICF tests
Andrew Ng [Thu, 23 Sep 2021 17:42:31 +0000 (18:42 +0100)]
[ELF][test] Fix several LLD ICF tests

A number of the ICF tests were not updated to use --print-icf-sections
instead of --verbose and various '-NOT' checks were not updated to the
latest output format of --print-icf-sections. Because these are all
'negative' tests, these issues have gone unnoticed.

Differential Revision: https://reviews.llvm.org/D110353

3 years ago[mlir][python] Provide more convenient constructors for std.CallOp
Alex Zinenko [Mon, 4 Oct 2021 09:39:19 +0000 (11:39 +0200)]
[mlir][python] Provide more convenient constructors for std.CallOp

The new constructor relies on type-based dynamic dispatch and allows one to
construct call operations given an object representing a FuncOp or its name as
a string, as opposed to requiring an explicitly constructed attribute.

Depends On D110947

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D110948

3 years ago[mlir][python] Provide more convenient wrappers for std.ConstantOp
Alex Zinenko [Mon, 4 Oct 2021 09:38:53 +0000 (11:38 +0200)]
[mlir][python] Provide more convenient wrappers for std.ConstantOp

Constructing a ConstantOp using the default-generated API is verbose and
requires specifying the constant type twice: for the result type of the
operation and for the type of the attribute. It also requires explicitly
constructing the attribute. Provide custom constructors that take the type once
and accept a raw value instead of the attribute. This requires dynamic dispatch
based on type in the constructor. Also provide the corresponding accessors to
raw values.

In addition, provide a "refinement" class ConstantIndexOp similar to what
exists in C++. Unlike other "op view" Python classes, operations cannot be
automatically downcast to this class since it does not correspond to a
specific operation name. It only exists to simplify construction of the
operation.

Depends On D110946

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D110947

3 years ago[mlir][python] Usability improvements for Python bindings
Alex Zinenko [Mon, 4 Oct 2021 09:38:20 +0000 (11:38 +0200)]
[mlir][python] Usability improvements for Python bindings

Provide a couple of quality-of-life usability improvements for Python bindings,
in particular:

  * give access to the list of types for the list of op results or block
    arguments, similarly to ValueRange->TypeRange,

  * allow for constructing empty dictionary arrays,

  * support construction of array attributes by concatenating an existing
    attribute with a Python list of attributes.

All these are required for the upcoming customization of builtin and standard
ops.

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D110946

3 years ago[libFuzzer] Use octal instead of hex escape sequences in PrintASCII
Hans Wennborg [Fri, 1 Oct 2021 08:59:55 +0000 (10:59 +0200)]
[libFuzzer] Use octal instead of hex escape sequences in PrintASCII

Previously, PrintASCII would print the string "\ta" as "\x09a". However,
in C/C++ those strings are not the same: the trailing 'a' is part of the
escape sequence, which means it's equivalent to "\x9a". This is an
annoying quirk of the standard. (See
https://eel.is/c++draft/lex.ccon#nt:hexadecimal-escape-sequence)

To fix this, output three-digit octal escape sequences instead. Since
octal escapes are limited to max three digits, this avoids the problem
of subsequent characters unintentionally becoming part of the escape
sequence.

Dictionary files still use the non-C-compatible hex escapes, but I
believe we can't change the format since it comes from AFL. libFuzzer
never writes such files; it only has to read them, so they're not
affected by this change.

Differential revision: https://reviews.llvm.org/D110920
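
A minimal sketch of the three-digit octal escaping described above, written in Python rather than libFuzzer's C++. The function name `print_ascii` and the exact treatment of backslash and double quote are assumptions made for illustration; only the fixed-width octal escape for non-printable bytes comes from the message.

```python
def print_ascii(data: bytes) -> str:
    """Render bytes as a C-style string body using fixed-width octal escapes."""
    out = []
    for byte in data:
        ch = chr(byte)
        if ch == "\\":
            out.append("\\\\")          # escape the backslash itself (assumed)
        elif ch == '"':
            out.append('\\"')           # escape the double quote (assumed)
        elif 0x20 <= byte <= 0x7E:
            out.append(ch)              # printable ASCII is emitted unchanged
        else:
            # Always three octal digits, so a following character such as
            # 'a' can never be absorbed into the escape sequence.
            out.append("\\%03o" % byte)
    return "".join(out)
```

With this scheme the two-byte input tab followed by 'a' is rendered as `\011a`, which a C/C++ compiler reads back as the same two bytes.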

3 years ago[LoopBoundSplit] Use SCEVAddRecExpr instead of SCEV for AddRecSCEV (NFC)
Jingu Kang [Mon, 13 Sep 2021 11:09:16 +0000 (12:09 +0100)]
[LoopBoundSplit] Use SCEVAddRecExpr instead of SCEV for AddRecSCEV (NFC)

Differential Revision: https://reviews.llvm.org/D109682

3 years ago[NFC] Simple tidy-up in LoopVectorizationCostModel::selectEpilogueVectorizationFactor
David Sherwood [Mon, 4 Oct 2021 08:52:26 +0000 (09:52 +0100)]
[NFC] Simple tidy-up in LoopVectorizationCostModel::selectEpilogueVectorizationFactor

Avoid creating EpilogueVectorizationForceVF twice.

3 years ago[APInt] Stop using soft-deprecated constructors and methods in clang. NFC.
Jay Foad [Thu, 30 Sep 2021 09:50:04 +0000 (10:50 +0100)]
[APInt] Stop using soft-deprecated constructors and methods in clang. NFC.

Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in clang.

Differential Revision: https://reviews.llvm.org/D110808

3 years ago[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Jay Foad [Thu, 30 Sep 2021 08:54:57 +0000 (09:54 +0100)]
[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.

Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807

3 years ago[openmp] [elf_common] Fix linking against LLVM dylib
Michał Górny [Mon, 4 Oct 2021 06:25:45 +0000 (08:25 +0200)]
[openmp] [elf_common] Fix linking against LLVM dylib

The hand-rolled linking logic in elf_common does not account for
the possibility of using LLVM dylib rather than a dozen static
libraries.  Since it does not seem to be easily convertible
to add_llvm_library, just hand-roll support for LLVM_LINK_LLVM_DYLIB.
This is necessary to support stand-alone builds against installed LLVM.

Differential Revision: https://reviews.llvm.org/D111038

3 years ago[LLDB] Skip TestClangREPL.py on Arm/AArch64 Linux
Muhammad Omair Javaid [Mon, 4 Oct 2021 06:49:04 +0000 (11:49 +0500)]
[LLDB] Skip TestClangREPL.py on Arm/AArch64 Linux

TestClangREPL.py has been failing randomly on the Arm/AArch64 Linux
buildbot. I am marking it as skipped to reduce false alarms.

3 years ago[mli][linalg] Change tensor size in unit test (NFC).
Tobias Gysi [Mon, 4 Oct 2021 06:23:53 +0000 (06:23 +0000)]
[mli][linalg] Change tensor size in unit test (NFC).

As a follow-up to https://reviews.llvm.org/D110849, adapt the input tensor size to match the iteration space.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D110906

3 years ago[clangd] Follow-up on rGdea48079b90d
Kirill Bobyrev [Mon, 4 Oct 2021 06:39:06 +0000 (08:39 +0200)]
[clangd] Follow-up on rGdea48079b90d

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D110925

3 years ago[lldb] Refactor variable parsing
Jaroslav Sevcik [Sat, 25 Sep 2021 17:29:04 +0000 (19:29 +0200)]
[lldb] Refactor variable parsing

Separates the methods for recursive variable parsing in function
context and non-recursive parsing of global variables.

Differential Revision: https://reviews.llvm.org/D110570

3 years ago[SCEV] Cap the number of instructions scanned when infering flags
Philip Reames [Sun, 3 Oct 2021 23:14:06 +0000 (16:14 -0700)]
[SCEV] Cap the number of instructions scanned when infering flags

This addresses a comment from review on D109845.  The concern was raised that an unbounded scan would be expensive.  The long-term plan is to cache this search, likely reusing the existing mechanism for loop side effects, but let's be simple and conservative for now.
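
The general shape of a capped scan can be sketched as follows. This is not the SCEV code: the function name, the predicate, and the default limit of 32 are all hypothetical, chosen only to show how running out of budget yields the conservative answer.

```python
from typing import Callable, Iterable


def all_transfer_execution(instructions: Iterable[str],
                           may_not_transfer: Callable[[str], bool],
                           max_scan: int = 32) -> bool:
    """Return True only if every instruction is proven safe within budget.

    The scan inspects at most max_scan instructions. If more remain, it
    gives up and returns False, trading precision for bounded cost.
    """
    scanned = 0
    for inst in instructions:
        if scanned == max_scan:
            return False        # budget exhausted: answer conservatively
        scanned += 1
        if may_not_transfer(inst):
            return False        # found an instruction that blocks the proof
    return True
```

A False result here means "not proven", so a caller would simply skip the flag inference rather than treat it as a failure.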

3 years ago[SCEV] Use trivial bound on defining scope of all SCEVs when computing flags
Philip Reames [Sun, 3 Oct 2021 23:01:30 +0000 (16:01 -0700)]
[SCEV] Use trivial bound on defining scope of all SCEVs when computing flags

This addresses a comment from review on D109845.  Even for SCEVs for which we can't find true bounds without recursing through operands, entry to the function forms a trivial upper bound.  In some cases, this trivial bound is enough to prove safety of flag inference.

3 years ago[SCEV] Use full logic when infering flags on add and gep
Philip Reames [Sun, 3 Oct 2021 22:32:15 +0000 (15:32 -0700)]
[SCEV] Use full logic when infering flags on add and gep

This is a follow-on to D109845. With that landed, we will have fixed all known instances of pr51817, and can thus start inferring flags more aggressively with greatly reduced risk of miscompiles. This patch simply applies the same inference logic used in that patch to our other major flag inference path.

We can still do much better here (on both paths), but this is our first step.

Differential Revision: https://reviews.llvm.org/D111003

3 years ago[SCEV] Correctly propagate nowrap flags across scopes when folding invariant add...
Philip Reames [Sun, 3 Oct 2021 22:19:33 +0000 (15:19 -0700)]
[SCEV] Correctly propagate nowrap flags across scopes when folding invariant add through addrec

This fixes a violation of the wrap flag rules introduced in c4048d8f. This is an alternate fix to D106852.

The basic problem being fixed is that we infer a set of flags which is valid at some inner scope S1 (usually by correctly propagating them from IR), and then (incorrectly) extend them to a SCEV in scope S2 where S1 != S2. This is not in general safe per the wrap flags semantics recently defined.

In this patch, I include a simple inference step to handle the case where we can prove that S2 is the preheader of the loop S1, and that entry into S2 implies execution of S1. See the code for a more detailed explanation.

One worry I have with this patch is that I might be over-fitting what shows up in tests - and thus hiding negative impact we'd see in the real world. My best defense is that the rule used here very closely follows the one used to propagate the flags from IR to the inner add to start with, and thus if one is reasonable, so probably is the other. Curious what others think about that piece.

The test diffs are roughly as expected. Mostly analysis only, with two transform changes. Oddly, the result looks better in the loop-idiom test, and I don't understand the PPC output enough to tell. Nothing terrible looking though. (For context, without the scope inference peephole, the test delta includes a couple of vectorization tests. Again, not super concerning, but slightly more so.)

Differential Revision: https://reviews.llvm.org/D109845

3 years ago[AttrBuilder] Make handling of int attribtues more generifc (NFC)
Nikita Popov [Sun, 3 Oct 2021 20:23:05 +0000 (22:23 +0200)]
[AttrBuilder] Make handling of int attribtues more generifc (NFC)

This is basically the same change as 42cc7f3c524a0ede6b903486c588003fe12d9293
but for integer attributes. Rather than treating each attribute
individually, handle them all the same way. The only thing that
needs to be done per attribute is specify how get/add convert
from/to the raw representation.

3 years ago[openmp] Fix a typo in a test REQUIRES line
Martin Storsjö [Fri, 27 Aug 2021 09:16:03 +0000 (09:16 +0000)]
[openmp] Fix a typo in a test REQUIRES line

Differential Revision: https://reviews.llvm.org/D110963

3 years ago[X86][Costmodel] Load/store i16 Stride=3 VF=32 interleaving costs
Roman Lebedev [Sun, 3 Oct 2021 20:37:23 +0000 (23:37 +0300)]
[X86][Costmodel] Load/store i16 Stride=3 VF=32 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/rMaYr67hz - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=17.8`
So pick cost of `56`.

For store we have:
https://godbolt.org/z/eMsbKqnvv - for intels `Block RThroughput: <=54.0`; for ryzens, `Block RThroughput: <=15.0`
So pick cost of `54`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111018
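
In the numbers quoted above, the chosen cost equals the largest Block RThroughput reported across the scheduling models. A minimal sketch of that selection follows, under the assumption that this is the rule being applied; the function name `pick_cost` and the rounding up to a whole number are illustrative choices, not something stated in the message.

```python
import math
from typing import Dict


def pick_cost(rthroughput_by_family: Dict[str, float]) -> int:
    """Pick the worst-case (largest) reciprocal throughput as the cost."""
    # Upper bounds such as "<=54.0" are treated as the value 54.0.
    return math.ceil(max(rthroughput_by_family.values()))


load_cost = pick_cost({"intel": 56.0, "ryzen": 17.8})
store_cost = pick_cost({"intel": 54.0, "ryzen": 15.0})
```

For the figures in this message the sketch gives 56 for the load and 54 for the store, matching the costs picked.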

3 years ago[X86][Costmodel] Load/store i16 Stride=3 VF=16 interleaving costs
Roman Lebedev [Sun, 3 Oct 2021 20:37:22 +0000 (23:37 +0300)]
[X86][Costmodel] Load/store i16 Stride=3 VF=16 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: =28.0`; for ryzens, `Block RThroughput: <=8.5`
So pick cost of `28`.

For store we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `27`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111017

3 years ago[X86][Costmodel] Load/store i16 Stride=3 VF=8 interleaving costs
Roman Lebedev [Sun, 3 Oct 2021 20:37:18 +0000 (23:37 +0300)]
[X86][Costmodel] Load/store i16 Stride=3 VF=8 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=2.3`
So pick cost of `9`.

For store we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: <=12.0`; for ryzens, `Block RThroughput: <=3.3`
So pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111016

3 years ago[X86][Costmodel] Load/store i16 Stride=3 VF=4 interleaving costs
Roman Lebedev [Sun, 3 Oct 2021 20:37:13 +0000 (23:37 +0300)]
[X86][Costmodel] Load/store i16 Stride=3 VF=4 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.

For store we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111015

3 years ago[X86][Costmodel] Load/store i16 Stride=3 VF=2 interleaving costs
Roman Lebedev [Sun, 3 Oct 2021 20:37:09 +0000 (23:37 +0300)]
[X86][Costmodel] Load/store i16 Stride=3 VF=2 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/xnE988aej - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=2.5`
So pick cost of `5`.

For store we have:
https://godbolt.org/z/rMGT31Tnh - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111014

3 years ago[X86][Costmodel] Load/store i8 Stride=6 VF=32 interleaving costs
Roman Lebedev [Sun, 3 Oct 2021 20:23:13 +0000 (23:23 +0300)]
[X86][Costmodel] Load/store i8 Stride=6 VF=32 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/c1jjKqP7b - for intels `Block RThroughput: <=82.0`; for ryzens, `Block RThroughput: <=26.0`
So pick cost of `82`.

For store we have:
https://godbolt.org/z/YM4ErY8x7 - for intels `Block RThroughput: <=90.0`; for ryzens, `Block RThroughput: <=25.5`
So pick cost of `90`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111013

3 years ago[X86][Costmodel] Load/store i8 Stride=6 VF=16 interleaving costs
Roman Lebedev [Sun, 3 Oct 2021 20:23:13 +0000 (23:23 +0300)]
[X86][Costmodel] Load/store i8 Stride=6 VF=16 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Gz8hhqfTM - for intels `Block RThroughput: <=43.0`; for ryzens, `Block RThroughput: <=14.0`
So pick cost of `43`.

For store we have:
https://godbolt.org/z/9vrdssYa8 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `27`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111012

3 years ago[X86][Costmodel] Load/store i8 Stride=6 VF=8 interleaving costs
Roman Lebedev [Sun, 3 Oct 2021 20:23:08 +0000 (23:23 +0300)]
[X86][Costmodel] Load/store i8 Stride=6 VF=8 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/v98qPTTf6 - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: =6.0`
So pick cost of `18`.

For store we have:
https://godbolt.org/z/rn5T9E8q6 - for intels `Block RThroughput: <=16.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `16`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111011

3 years ago[X86][Costmodel] Load/store i8 Stride=6 VF=4 interleaving costs
Roman Lebedev [Sun, 3 Oct 2021 20:23:03 +0000 (23:23 +0300)]
[X86][Costmodel] Load/store i8 Stride=6 VF=4 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `9`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111010

3 years ago[X86][Costmodel] Load/store i8 Stride=6 VF=2 interleaving costs
Roman Lebedev [Sun, 3 Oct 2021 20:22:58 +0000 (23:22 +0300)]
[X86][Costmodel] Load/store i8 Stride=6 VF=2 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/jvj6jzns5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/ros7eebMP - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111008

3 years ago[Clang][NFC] Fix the comment for Sema::DiagIfReachable
Yuanfang Chen [Sun, 3 Oct 2021 19:49:14 +0000 (12:49 -0700)]
[Clang][NFC] Fix the comment for Sema::DiagIfReachable

3 years ago[mlir] [test] Add missing tool substitutions
Michał Górny [Sat, 2 Oct 2021 09:52:08 +0000 (11:52 +0200)]
[mlir] [test] Add missing tool substitutions

Add missing mlir-capi-*-test tool substitutions in order to fix CAPI
test failures when mlir is not installed yet.

Differential Revision: https://reviews.llvm.org/D110991