review.tizen.org Git - platform/upstream/llvm.git/log

[GlobalISel] Support vectors in LegalizerHelper::narrowScalarMul

Also remove some redundancy because the source and result
types of any multiply are always the same.

Differential Revision: https://reviews.llvm.org/D110926

[InstCombine] add helper for "is desirable int type"; NFC

This splits out the logic from shouldChangeType() that
currently allows 8/16/32-bit transforms even if those
types are not listed as legal in the data layout.

This could be useful as a predicate for vector
insert/extract transforms.

Note that this leaves the subsequent checks in
shouldChangeType() unchanged. We may want to merge
the checks for i1 and/or "ToLegal" into "isDesirable",
but that may alter existing transforms.

[InstCombine] add tests for extractelt of bitcasted scalar; NFC

[GlobalISel][IRTranslator] Emit trap intrinsic for "unreachable"

We were previously just ignoring unreachable, but targets like Darwin want to
keep unreachable instructions as traps.

Differential Revision: https://reviews.llvm.org/D110603

Fix msan/tests/msan_test.cpp due to -Wbitwise-instead-of-logical

The LE Power sanitizer bot fails when testing standalone compiler-rt due to
an MSAN test warning introduced by -Wbitwise-instead-of-logical. As this option
along with -Werror is enabled on the bot, the test failure occurs.
This patch updates msan_test.cpp to fix the warning introduced by the
-Wbitwise-instead-of-logical.

[NFC][X86][Codegen] Add test coverage for interleaved i64 load/store stride=6

[NFC][X86][LV] Add costmodel test coverage for interleaved i64/f64 load/store stride=6

[NFC][X86][Codegen] Add test coverage for interleaved i32 load/store stride=6

[NFC][X86][LV] Add costmodel test coverage for interleaved i32/f32 load/store stride=6

[FPEnv][InstSimplify] Prepush more tests for D106362.

In working on D106362 I found that a few more tests were needed. I've
been asked to pre-push the tests for that ticket. This should complete
the tests needed for now.

[libc++][NFC] Fix include guard for some detail header

[libc++][NFC] Remove header name from <version>

[AArch64] Disable AArch64StorePairSuppress under optsize

AArch64StorePairSuppress will prevent the creation of LDP's based on
scheduling info. This shouldn't apply when optimizing for size though,
where the size decrease should be considered more important.

Differential Revision: https://reviews.llvm.org/D110809

[lldb][import-std-module] Prefer the non-module diagnostics when in fallback mode

The `fallback` setting for import-std-module is supposed to allow running
expression that require an imported C++ module without causing any regressions
for users (neither in terms of functionality nor performance). This is done by
first trying to normally parse/evaluate an expression and when an error occurred
during this first attempt, we retry with the loaded 'std' module.

When we run into a system with a 'std' module that for some reason doesn't build
or otherwise causes parse errors, then this currently means that the second
parse attempt will overwrite the error diagnostics of the first parse attempt.
Given that the module build errors are outside of the scope of what the user can
influence, it makes more sense to show the errors from the first parse attempt
that are only concerned with the actual user input.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D110696

libc++: document in the release notes that a C++20 compiler is expected

Differential Revision: https://reviews.llvm.org/D111043

[flang] Better error recovery for missing THEN in ELSE IF

The THEN keyword in the "ELSE IF (test) THEN" statement is useless
syntactically, and to omit it is a common error (at least for me!)
that has poor error recovery. This patch changes the parser to
cough up a simple "expected 'THEN'" and still recognize the rest of
the IF construct.

Differential Revision: https://reviews.llvm.org/D110952

[flang][NFC] Fix first line of magic-numbers.h

The first line of flang/include/flang/Runtime/magic-numbers.h
got split into two somehow; join it back up.

Differential Revision: https://reviews.llvm.org/D110965

[NFC] Fix build failure in ScopDetection

In some build environments, the C++ compiler is unable to infer the
correct type for the DenseMap::insert in isErrorBlock. Typing out
std::make_pair helps.

[SimpleLoopUnswitch] Allow threshold to be specified zero or more times

Differential Revision: https://reviews.llvm.org/D110594

[mlir][SPIRVToLLVM] Propagate location attribute from spv.GlobalVariable to llvm.mlir.global

This patch is mainly to propogate location attribute from spv.GlobalVariable to llvm.mlir.global.

It also contains three small changes.

1. Remove the restriction on UniformConstant In SPIRVToLLVM.cpp;
2. Remove the errorCheck on relaxedPrecision when deserializering SPIR-V in Deserializer.cpp
3. In SPIRVOps.cpp, let ConstantOp take signedInteger too.

Co-authered: Alan Liu <alanliu.yf@gmail.com> and Xinyi Liu <xyliuhelen@gmail.com>

Reviewed by:antiagainst

Differential revision: https://reviews.llvm.org/D110207

[LLDB] Fix objc_clsopt_v16_t struct

The objc_clsopt_v16_t struct does not match up with the macOS/iOS15
dyld_shared_cache ObjC runtime structures. A struct field was seemingly
omitted.

Differential revision: https://reviews.llvm.org/D110477

[lld] Use checkError more

No behavior change.

[IR] Migrate from getNumArgOperands to arg_size (NFC)

Note that arg_operands is considered a legacy name. See
llvm/include/llvm/IR/InstrTypes.h for details.

[PowerPC][NFC] Remove reg name option in int128 test

The test is generated by script, so we don't really need the regname to
be meaniful here.

AIX doesn't support the reg name option, removing it for now so that we
can reuse the CHECKs for AIX triple as well.

[libc++][NFC] Qualify nullptr_t in test

[gn build] Port 811b1736d91b

[analyzer] Add InvalidPtrChecker

This patch introduces a new checker: `alpha.security.cert.env.InvalidPtr`

Checker finds usage of invalidated pointers related to environment.

Based on the following SEI CERT Rules:
ENV34-C: https://wiki.sei.cmu.edu/confluence/x/8tYxBQ
ENV31-C: https://wiki.sei.cmu.edu/confluence/x/5NUxBQ

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D97699

[NFC][X86][Codegen] Add test coverage for interleaved i64 load/store stride=4

[NFC][X86][LV] Add costmodel test coverage for interleaved i64/f64 load/store stride=4

[NFC][X86][Codegen] Add test coverage for interleaved i32 load/store stride=4

[NFC][X86][LV] Add costmodel test coverage for interleaved i32/f32 load/store stride=4

[llvm-objdump] Fix common symbol output on 32 bit platforms

Since https://reviews.llvm.org/D109452 symbol-table.test has
been failing on our Arm32 bots.

https://lab.llvm.org/buildbot/#/builders/171/builds/4201

This is because in that change an implicit widening cast
of the alignment from 32 bit to 64 bit was removed and the
format string expects a 64 bit number.

[libc++][NFC] Qualify usage of nullptr_t in the format tests

[clangd] Improve PopulateSwitch tweak

- Support enums in C and ObjC as their
AST representations differ slightly.

- Add support for typedef'ed enums.

Differential Revision: https://reviews.llvm.org/D110954

[clang] Fix computation of number of dependencies using OpenMP iterator,
by Raul Penacoba.

The size of kmp_depend_info and the number of dependencies are computed multiplying the iterator sizes, which not right.
Now size is computed as:

itersize1*numclausedeps1 + itersize2*numclausedeps2 + ... + itersizeN*numclausedepsN

where itersizeX is the size of the iterator and numclausedepsX the number of dependencies in that depend clause.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D111045

[demangle] Add a unittest for _Float16 demangling. NFC

[AArch64] Test for Store Pair Suppress under minsize.

[TargetLibraryInfo] Refactor size_t checks in isValidProtoForLibFunc. NFC

In TargetLibraryInfoImpl::isValidProtoForLibFunc we no longer
need the IsSizeTTy lambda function and the SizeTTy object. Instead
we just follow the regular structure of checking for integer types
given an exepected number of bits.

[OpenMP] Add options to change Attributor max iterations in OpenMPOpt

This patch adds a new command line option `openmp-opt-max-iterations`
that controls the maximum number of iterations the attributor will run
for when compiling OpenMP target device code. This patch also adds a
remark to indicate when the attributor failed because it did not run
for enough iterations.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110749

[X86] SimplifyDemandedVectorEltsForTargetNode - simplify PMADDWD for known zero elements

Noticed while investigating the regressions in D110995 - if the RHS element is already zero, then we don't need the corresponding LHS element.

Technically we could also recheck RHS once we have LHS's known zeros, but I haven't seen any missed opportunities from that yet.

[lldb] Fix a stray array access in Editline

This manifested itself as an asan failure in TestMultilineNavigation.py.

[lldb] Add unit tests for Terminal API

Differential Revision: https://reviews.llvm.org/D110962

[X86][Costmodel] Load/store i64/f64 Stride=3 VF=16 interleaving costs

This required huge amount of assembly surgery, but i think this is about right.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/z11crMEcj - for intels `Block RThroughput: =20.0`; for ryzens, `Block RThroughput: <=18.0`
So could pick cost of `25`.

For store we have:
https://godbolt.org/z/eqT4ze3j4 - for intels `Block RThroughput: =24.0`; for ryzens, `Block RThroughput: <=16.0`
So we could pick cost of `24`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111031

[X86][Costmodel] Load/store i64/f64 Stride=3 VF=8 interleaving costs

This one required quite a bit of assembly surgery.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/oYWv4cTnK - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `10`.

For store we have:
https://godbolt.org/z/33GMhrsG9 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111027

[X86][Costmodel] Load/store i64/f64 Stride=3 VF=4 interleaving costs

This one required quite a bit of assembly surgery.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Tce3osvcz - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `5`.

For store we have:
https://godbolt.org/z/oc3arEcnE - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111026

[X86][Costmodel] Load/store i64/f64 Stride=3 VF=2 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/sz5qdKnr4 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `1`.

For store we have:
https://godbolt.org/z/Kzdjff63v - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111025

[X86][Costmodel] Load/store i32/f32 Stride=3 VF=16 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/5fqrh4qqo - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/5fqrh4qqo - for intels `Block RThroughput: =22.0`; for ryzens, `Block RThroughput: <=16.0`
So pick cost of `22`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111022

[X86][Costmodel] Load/store i32/f32 Stride=3 VF=8 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/zdz5Ga6fs - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `7`.

For store we have:
https://godbolt.org/z/qn71513ac - for intels `Block RThroughput: =11.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `11`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111021

[X86][Costmodel] Load/store i32/f32 Stride=3 VF=4 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/d8PdhEszo - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/WojonfG5n - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `5`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111020

[X86][Costmodel] Load/store i32/f32 Stride=3 VF=2 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/z8qa14bs3 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: =1.5`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/GYGajoc4K - for intels `Block RThroughput: <=4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111019

[PowerPC] Fix __builtin_ppc_load2r to return short instead of int.

This patch fixes the return value of the builtin __builtin_ppc_load2r to
correctly return short instead of int.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D110771

[APFloat] Common up some assertions. NFC.

[mlir] Tighten strided layout specification.

Clarify that the strided layout specification is represented by a single semi-affine map.

Differential Revision: https://reviews.llvm.org/D110921

[APFloat] Remove BitWidth argument from getAllOnesValue

There's no need to pass this in explicitly because it is
trivially available from the semantics.

[lldb] [test] Terminate "process connect" connections via kill

Fix the termination of "process connect" (and "gdb-remote") to kill
the process rather than attempting to disconnect the platform.
The latter only results in an error since we did not use "platform
connect", and apparently process-level connections (at least via
gdb-remote) do not really support disconnecting.

Differential Revision: https://reviews.llvm.org/D110996

[X86] Add tests for enabling slow-mulld on AVX2 targets

As discussed on D110588 - Haswell/Broadwell don't have a great PMULLD implementation, we might want to enable this for them in the future

[MLIR] Fix unused tablegen template arg warnings

Identified in D109359.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D110805

[ELF][test] Fix several LLD ICF tests

A number of the ICF tests were not updated to use --print-icf-sections
instead of --verbose and various '-NOT' checks were not updated to the
latest output format of --print-icf-sections. Because these are all
'negative' tests, these issues have gone unnoticed.

Differential Revision: https://reviews.llvm.org/D110353

[mlir][python] Provide more convenient constructors for std.CallOp

The new constructor relies on type-based dynamic dispatch and allows one to
construct call operations given an object representing a FuncOp or its name as
a string, as opposed to requiring an explicitly constructed attribute.

Depends On D110947

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D110948

[mlir][python] Provide more convenient wrappers for std.ConstantOp

Constructing a ConstantOp using the default-generated API is verbose and
requires to specify the constant type twice: for the result type of the
operation and for the type of the attribute. It also requires to explicitly
construct the attribute. Provide custom constructors that take the type once
and accept a raw value instead of the attribute. This requires dynamic dispatch
based on type in the constructor. Also provide the corresponding accessors to
raw values.

In addition, provide a "refinement" class ConstantIndexOp similar to what
exists in C++. Unlike other "op view" Python classes, operations cannot be
automatically downcasted to this class since it does not correspond to a
specific operation name. It only exists to simplify construction of the
operation.

Depends On D110946

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D110947

[mlir][python] Usability improvements for Python bindings

Provide a couple of quality-of-life usability improvements for Python bindings,
in particular:

  * give access to the list of types for the list of op results or block
    arguments, similarly to ValueRange->TypeRange,

  * allow for constructing empty dictionary arrays,

  * support construction of array attributes by concatenating an existing
    attribute with a Python list of attributes.

All these are required for the upcoming customization of builtin and standard
ops.

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D110946

[libFuzzer] Use octal instead of hex escape sequences in PrintASCII

Previously, PrintASCII would print the string "\ta" as "\x09a". However,
in C/C++ those strings are not the same: the trailing 'a' is part of the
escape sequence, which means it's equivalent to "\x9a". This is an
annoying quirk of the standard. (See
https://eel.is/c++draft/lex.ccon#nt:hexadecimal-escape-sequence)

To fix this, output three-digit octal escape sequences instead. Since
octal escapes are limited to max three digits, this avoids the problem
of subsequent characters unintentionally becoming part of the escape
sequence.

Dictionary files still use the non-C-compatible hex escapes, but I
believe we can't change the format since it comes from AFL, and
libfuzzer never writes such files, it only has to read them, so they're
not affected by this change.

Differential revision: https://reviews.llvm.org/D110920

[LoopBoundSplit] Use SCEVAddRecExpr instead of SCEV for AddRecSCEV (NFC)

Differential Revision: https://reviews.llvm.org/D109682

[NFC] Simple tidy-up in LoopVectorizationCostModel::selectEpilogueVectorizationFactor

Avoid creating EpilogueVectorizationForceVF twice.

[APInt] Stop using soft-deprecated constructors and methods in clang. NFC.

Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in clang.

Differential Revision: https://reviews.llvm.org/D110808

[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.

Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807

[openmp] [elf_common] Fix linking against LLVM dylib

The hand-rolled linking logic in elf_common does not account for
the possibility of using LLVM dylib rather than a dozen static
libraries. Since it does not seem to be easily convertible
to add_llvm_library, just hand-roll support for LLVM_LINK_LLVM_DYLIB.
This is necessary to support stand-alone builds against installed LLVM.

Differential Revision: https://reviews.llvm.org/D111038

[LLDB] Skip TestClangREPL.py on Arm/AArch64 Linux

TestClangREPL.py has been failing randomly on Arm/AArch64 Linux
buildbot. I am marking it as skipped to reduce false alarms.

[mli][linalg] Change tensor size in unit test (NFC).

As a follow up to https://reviews.llvm.org/D110849, adapt the input tensor size to match the iteration space.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D110906

[clangd] Follow-up on rGdea48079b90d

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D110925

[lldb] Refactor variable parsing

Separates the methods for recursive variable parsing in function
context and non-recursive parsing of global variables.

Differential Revision: https://reviews.llvm.org/D110570

[SCEV] Cap the number of instructions scanned when infering flags

This addresses a comment from review on D109845. The concern was raised that an unbounded scan would be expensive. Long term plan is to cache this search - likely reusing the existing mechanism for loop side effects - but let's be simple and conservative for now.

[SCEV] Use trivial bound on defining scope of all SCEVs when computing flags

This addresses a comment from review on D109845. Even for SCEVs which we can't find true bounds without recursing through operands, entry to the function forms a trivial upper bound. In some cases, this trivial bound is enough to prove safety of flag inference.

[SCEV] Use full logic when infering flags on add and gep

This is a followon to D109845. With that landed, we will have fixed all known instances of pr51817, and can thus start inferring flags more aggressively with greatly reduced risk of miscompiles. This patch simply applies the same inference logic used in that patch to our other major flag inference path.

We can still do much better here (on both paths), but this is our first step.

Differential Revision: https://reviews.llvm.org/D111003

[SCEV] Correctly propagate nowrap flags across scopes when folding invariant add through addrec

This fixes a violation of the wrap flag rules introduced in c4048d8f. This is an alternate fix to D106852.

The basic problem being fixed is that we infer a set of flags which is valid at some inner scope S1 (usually by correctly propagating them from IR), and then (incorrectly) extend them to a SCEV in scope S2 where S1 != S2. This is not in general safe per the wrap flags semantics recently defined.

In this patch, I include a simple inference step to handle the case where we can prove that S2 is the preheader of the loop S1, and that entry into S2 implies execution of S1. See the code for a more detailed explanation.

One worry I have with this patch is that I might be over-fitting what shows up in tests - and thus hiding negative impact we'd see in the real world. My best defense is that the rule used here very closely follows the one used to propagate the flags from IR to the inner add to start with, and thus if one is reasonable, so probably is the other. Curious what others think about that piece.

The test diffs are roughly as expected. Mostly analysis only, with two transform changes. Oddly, the result looks better in the loop-idiom test, and I don't understand the PPC output enough to have tell. Nothing terrible looking though. (For context, without the scope inference peephole, the test delta includes a couple of vectorization tests. Again, not super concerning, but slightly more so.)

Differential Revision: https://reviews.llvm.org/D109845

[AttrBuilder] Make handling of int attribtues more generifc (NFC)

This is basically the same change as 42cc7f3c524a0ede6b903486c588003fe12d9293
but for integer attributes. Rather than treating each attribute
individually, handle them all the same way. The only thing that
needs to be done per attribute is specify how get/add convert
from/to the raw representation.

[openmp] Fix a typo in a test REQUIRES line

Differential Revision: https://reviews.llvm.org/D110963

[X86][Costmodel] Load/store i16 Stride=3 VF=32 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/rMaYr67hz - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=17.8`
So pick cost of `56`.

For store we have:
https://godbolt.org/z/eMsbKqnvv - for intels `Block RThroughput: <=54.0`; for ryzens, `Block RThroughput: <=15.0`
So pick cost of `54`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111018

[X86][Costmodel] Load/store i16 Stride=3 VF=16 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: =28.0`; for ryzens, `Block RThroughput: <=8.5`
So pick cost of `28`.

For store we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `27`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111017

[X86][Costmodel] Load/store i16 Stride=3 VF=8 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=2.3`
So pick cost of `9`.

For store we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: <=12.0`; for ryzens, `Block RThroughput: <=3.3`
So pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111016

[X86][Costmodel] Load/store i16 Stride=3 VF=4 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.

For store we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111015

[X86][Costmodel] Load/store i16 Stride=3 VF=2 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/xnE988aej - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=2.5`
So pick cost of `5`.

For store we have:
https://godbolt.org/z/rMGT31Tnh - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111014

[X86][Costmodel] Load/store i8 Stride=6 VF=32 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/c1jjKqP7b - for intels `Block RThroughput: <=82.0`; for ryzens, `Block RThroughput: <=26.0`
So pick cost of `82`.

For store we have:
https://godbolt.org/z/YM4ErY8x7 - for intels `Block RThroughput: <=90.0`; for ryzens, `Block RThroughput: <=25.5`
So pick cost of `90`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111013

[X86][Costmodel] Load/store i8 Stride=6 VF=16 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Gz8hhqfTM - for intels `Block RThroughput: <=43.0`; for ryzens, `Block RThroughput: <=14.0`
So pick cost of `43`.

For store we have:
https://godbolt.org/z/9vrdssYa8 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `27`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111012

[X86][Costmodel] Load/store i8 Stride=6 VF=8 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/v98qPTTf6 - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: =6.0`
So pick cost of `18`.

For store we have:
https://godbolt.org/z/rn5T9E8q6 - for intels `Block RThroughput: <=16.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `16`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111011

[X86][Costmodel] Load/store i8 Stride=6 VF=4 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `9`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111010

[X86][Costmodel] Load/store i8 Stride=6 VF=2 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/jvj6jzns5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/ros7eebMP - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111008

[Clang][NFC] Fix the comment for Sema::DiagIfReachable

[mlir] [test] Add missing tool substitutions

Add missing mlir-capi-*-test tool substitutions in order to fix CAPI
test failures when mlir is not installed yet.

Differential Revision: https://reviews.llvm.org/D110991

[ARM] Mark <= -1 immediate constant as cheap

A <= -1 constant on a compare can be converted to a < 0 operation, which
is usually cheap. If we mark the constant as cheap, preventing hoisting,
we allow that fold to happen even across different blocks.

Differential Revision: https://reviews.llvm.org/D109360

[X86] Split Cannonlake + Icelake Tuning. NFC

The Ice/Tiger/RocketLake specs were inheriting the tuning settings from CannonLake, a previous architecture. We shouldn't have this dependency, so I've copied the current tuning settings so we can make future adjustments to both CNL + ICL etc. more easily.

[CostModel][X86] X86TTIImpl::getCmpSelInstrCost - try to use Predicate argument directly first (PR48337)

There's still a lot of cases where getCmpSelInstrCost fails to specify a predicate, once those are in place we should be able to remove the fallback to the Instruction argument entirely.

[ARM] Tests for constant hoisting -1 immediates

[Analysis, CodeGen] Migrate from arg_operands to args (NFC)

Note that arg_operands is considered a legacy name. See
llvm/include/llvm/IR/InstrTypes.h for details.

[NFC][X86][Codegen] Add test coverage for interleaved i64 load/store stride=3

[NFC][X86][LV] Add costmodel test coverage for interleaved i64/f64 load/store stride=3

[InstCombine] fold cast of right-shift if high bits are not demanded (3rd try)

The first two tries at this were reverted because they caused an
infinite loop in instcombine.
That should be fixed after a series of patches that ended with
removing the faulty opposing transform:
3fabd98e5b3e

Original commit message:
(masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C

Narrowing the shift should be better for analysis and can lead
to follow-on transforms as shown.

Attempt at a general proof in Alive2:
https://alive2.llvm.org/ce/z/tRnnSF

Here are a couple of the specific tests:
https://alive2.llvm.org/ce/z/bCnTp-
https://alive2.llvm.org/ce/z/TfaHnb

Differential Revision: https://reviews.llvm.org/D110170

[InstCombine] add test for shl + demanded bits; NFC

This is a reduction of a test that would infinite loop with D110170.

[InstSimplify] Add additional load from constant test (NFC)

This case does not get folded, because the GEP indexes too deeply
(to the i8), making the bitcast logic not apply (on the [8 x i8]).

[NFC][X86][Codegen] Add test coverage for interleaved i32 load/store stride=3