review.tizen.org Git - platform/upstream/llvm.git/log

[flang][NFC] Fix first line of magic-numbers.h

The first line of flang/include/flang/Runtime/magic-numbers.h
got split into two somehow; join it back up.

Differential Revision: https://reviews.llvm.org/D110965

[NFC] Fix build failure in ScopDetection

In some build environments, the C++ compiler is unable to infer the
correct type for the DenseMap::insert in isErrorBlock. Typing out
std::make_pair helps.

[SimpleLoopUnswitch] Allow threshold to be specified zero or more times

Differential Revision: https://reviews.llvm.org/D110594

[mlir][SPIRVToLLVM] Propagate location attribute from spv.GlobalVariable to llvm.mlir.global

This patch mainly propagates the location attribute from spv.GlobalVariable to llvm.mlir.global.

It also contains three small changes.

1. Remove the restriction on UniformConstant in SPIRVToLLVM.cpp;
2. Remove the errorCheck on relaxedPrecision when deserializing SPIR-V in Deserializer.cpp;
3. In SPIRVOps.cpp, let ConstantOp take signedInteger too.

Co-authored: Alan Liu <alanliu.yf@gmail.com> and Xinyi Liu <xyliuhelen@gmail.com>

Reviewed By: antiagainst

Differential revision: https://reviews.llvm.org/D110207

[LLDB] Fix objc_clsopt_v16_t struct

The objc_clsopt_v16_t struct does not match up with the macOS/iOS15
dyld_shared_cache ObjC runtime structures. A struct field was seemingly
omitted.

Differential revision: https://reviews.llvm.org/D110477

[lld] Use checkError more

No behavior change.

[IR] Migrate from getNumArgOperands to arg_size (NFC)

Note that getNumArgOperands is considered a legacy name. See
llvm/include/llvm/IR/InstrTypes.h for details.

[PowerPC][NFC] Remove reg name option in int128 test

The test is generated by script, so we don't really need the regname to
be meaningful here.

AIX doesn't support the reg name option, removing it for now so that we
can reuse the CHECKs for AIX triple as well.

[libc++][NFC] Qualify nullptr_t in test

[gn build] Port 811b1736d91b

[analyzer] Add InvalidPtrChecker

This patch introduces a new checker: `alpha.security.cert.env.InvalidPtr`

The checker finds uses of invalidated pointers related to the environment.

Based on the following SEI CERT Rules:
ENV34-C: https://wiki.sei.cmu.edu/confluence/x/8tYxBQ
ENV31-C: https://wiki.sei.cmu.edu/confluence/x/5NUxBQ

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D97699

[NFC][X86][Codegen] Add test coverage for interleaved i64 load/store stride=4

[NFC][X86][LV] Add costmodel test coverage for interleaved i64/f64 load/store stride=4

[NFC][X86][Codegen] Add test coverage for interleaved i32 load/store stride=4

[NFC][X86][LV] Add costmodel test coverage for interleaved i32/f32 load/store stride=4

[llvm-objdump] Fix common symbol output on 32 bit platforms

Since https://reviews.llvm.org/D109452 symbol-table.test has
been failing on our Arm32 bots.

https://lab.llvm.org/buildbot/#/builders/171/builds/4201

This is because in that change an implicit widening cast
of the alignment from 32 bit to 64 bit was removed and the
format string expects a 64 bit number.

[libc++][NFC] Qualify usage of nullptr_t in the format tests

[clangd] Improve PopulateSwitch tweak

- Support enums in C and ObjC as their
AST representations differ slightly.

- Add support for typedef'ed enums.

Differential Revision: https://reviews.llvm.org/D110954

[clang] Fix computation of number of dependencies using OpenMP iterator,
by Raul Penacoba.

The size of kmp_depend_info and the number of dependencies were computed by multiplying the iterator sizes, which is not right.
Now size is computed as:

itersize1*numclausedeps1 + itersize2*numclausedeps2 + ... + itersizeN*numclausedepsN

where itersizeX is the size of the iterator and numclausedepsX the number of dependencies in that depend clause.
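
To make the difference between the two counting rules concrete, here is a minimal Python sketch. The function names and the sample clause data are illustrative assumptions, not taken from the patch; the "old" rule is modelled only from the description above (a product of iterator sizes).

```python
from math import prod

def deps_old(clauses):
    """Rule described as wrong: multiply all iterator sizes together."""
    return prod(itersize for itersize, _ in clauses)

def deps_new(clauses):
    """Corrected rule: sum of itersizeX * numclausedepsX over the depend clauses."""
    return sum(itersize * numdeps for itersize, numdeps in clauses)

# Two hypothetical depend clauses: (iterator size, dependencies in that clause).
clauses = [(4, 2), (3, 1)]
print(deps_old(clauses))  # 12
print(deps_new(clauses))  # 11
```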

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D111045

[demangle] Add a unittest for _Float16 demangling. NFC

[AArch64] Test for Store Pair Suppress under minsize.

[TargetLibraryInfo] Refactor size_t checks in isValidProtoForLibFunc. NFC

In TargetLibraryInfoImpl::isValidProtoForLibFunc we no longer
need the IsSizeTTy lambda function and the SizeTTy object. Instead
we just follow the regular structure of checking for integer types
given an expected number of bits.

[OpenMP] Add options to change Attributor max iterations in OpenMPOpt

This patch adds a new command line option `openmp-opt-max-iterations`
that controls the maximum number of iterations the attributor will run
for when compiling OpenMP target device code. This patch also adds a
remark to indicate when the attributor failed because it did not run
for enough iterations.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110749

[X86] SimplifyDemandedVectorEltsForTargetNode - simplify PMADDWD for known zero elements

Noticed while investigating the regressions in D110995 - if the RHS element is already zero, then we don't need the corresponding LHS element.

Technically we could also recheck RHS once we have LHS's known zeros, but I haven't seen any missed opportunities from that yet.
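
The reasoning can be illustrated with a small Python model of PMADDWD (signed 16-bit lanes multiplied pairwise, adjacent products summed into 32-bit lanes). This is only a sketch of the arithmetic, not of the SimplifyDemandedVectorElts code; the helper names and sample vectors are assumptions.

```python
def wrap32(value):
    """Wrap an integer to a signed 32-bit lane."""
    return (value + 2**31) % 2**32 - 2**31

def pmaddwd(lhs, rhs):
    """Multiply 16-bit lanes pairwise and add adjacent products into 32-bit lanes."""
    assert len(lhs) == len(rhs) and len(lhs) % 2 == 0
    return [
        wrap32(lhs[i] * rhs[i] + lhs[i + 1] * rhs[i + 1])
        for i in range(0, len(lhs), 2)
    ]

rhs = [3, 0, 0, 5]            # lanes 1 and 2 are known zero
lhs_a = [7, 1234, -99, 2]
lhs_b = [7, -1, 42, 2]        # differs from lhs_a only where rhs is zero

print(pmaddwd(lhs_a, rhs))    # [21, 10]
print(pmaddwd(lhs_b, rhs))    # [21, 10]
```

Because the LHS lanes that pair with zero RHS lanes never reach the result, they do not need to be demanded.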

[lldb] Fix a stray array access in Editline

This manifested itself as an asan failure in TestMultilineNavigation.py.

[lldb] Add unit tests for Terminal API

Differential Revision: https://reviews.llvm.org/D110962

[X86][Costmodel] Load/store i64/f64 Stride=3 VF=16 interleaving costs

This required a huge amount of assembly surgery, but I think this is about right.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/z11crMEcj - for intels `Block RThroughput: =20.0`; for ryzens, `Block RThroughput: <=18.0`
So could pick cost of `20`.

For store we have:
https://godbolt.org/z/eqT4ze3j4 - for intels `Block RThroughput: =24.0`; for ryzens, `Block RThroughput: <=16.0`
So we could pick cost of `24`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111031

[X86][Costmodel] Load/store i64/f64 Stride=3 VF=8 interleaving costs

This one required quite a bit of assembly surgery.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/oYWv4cTnK - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `10`.

For store we have:
https://godbolt.org/z/33GMhrsG9 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `12`.
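
The numbers above suggest the cost is taken as the larger of the two reported block reciprocal throughputs. A minimal Python sketch of that selection, under that assumption: `pick_cost` is a hypothetical helper, the `<=` bounds are treated as plain values, and rounding up to an integer is also an assumption.

```python
import math

def pick_cost(intel_rthroughput, ryzen_rthroughput):
    """Take the worst (largest) reciprocal throughput and round it up."""
    return math.ceil(max(intel_rthroughput, ryzen_rthroughput))

print(pick_cost(10.0, 8.0))  # load:  10
print(pick_cost(12.0, 8.0))  # store: 12
```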

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111027

[X86][Costmodel] Load/store i64/f64 Stride=3 VF=4 interleaving costs

This one required quite a bit of assembly surgery.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Tce3osvcz - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `5`.

For store we have:
https://godbolt.org/z/oc3arEcnE - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111026

[X86][Costmodel] Load/store i64/f64 Stride=3 VF=2 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/sz5qdKnr4 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `1`.

For store we have:
https://godbolt.org/z/Kzdjff63v - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111025

[X86][Costmodel] Load/store i32/f32 Stride=3 VF=16 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/5fqrh4qqo - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/5fqrh4qqo - for intels `Block RThroughput: =22.0`; for ryzens, `Block RThroughput: <=16.0`
So pick cost of `22`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111022

[X86][Costmodel] Load/store i32/f32 Stride=3 VF=8 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/zdz5Ga6fs - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `7`.

For store we have:
https://godbolt.org/z/qn71513ac - for intels `Block RThroughput: =11.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `11`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111021

[X86][Costmodel] Load/store i32/f32 Stride=3 VF=4 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/d8PdhEszo - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/WojonfG5n - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `5`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111020

[X86][Costmodel] Load/store i32/f32 Stride=3 VF=2 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/z8qa14bs3 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: =1.5`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/GYGajoc4K - for intels `Block RThroughput: <=4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111019

[PowerPC] Fix __builtin_ppc_load2r to return short instead of int.

This patch fixes the return value of the builtin __builtin_ppc_load2r to
correctly return short instead of int.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D110771

[APFloat] Common up some assertions. NFC.

[mlir] Tighten strided layout specification.

Clarify that the strided layout specification is represented by a single semi-affine map.

Differential Revision: https://reviews.llvm.org/D110921

[APFloat] Remove BitWidth argument from getAllOnesValue

There's no need to pass this in explicitly because it is
trivially available from the semantics.

[lldb] [test] Terminate "process connect" connections via kill

Fix the termination of "process connect" (and "gdb-remote") to kill
the process rather than attempting to disconnect the platform.
The latter only results in an error since we did not use "platform
connect", and apparently process-level connections (at least via
gdb-remote) do not really support disconnecting.

Differential Revision: https://reviews.llvm.org/D110996

[X86] Add tests for enabling slow-mulld on AVX2 targets

As discussed on D110588 - Haswell/Broadwell don't have a great PMULLD implementation, we might want to enable this for them in the future

[MLIR] Fix unused tablegen template arg warnings

Identified in D109359.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D110805

[ELF][test] Fix several LLD ICF tests

A number of the ICF tests were not updated to use --print-icf-sections
instead of --verbose and various '-NOT' checks were not updated to the
latest output format of --print-icf-sections. Because these are all
'negative' tests, these issues have gone unnoticed.

Differential Revision: https://reviews.llvm.org/D110353

[mlir][python] Provide more convenient constructors for std.CallOp

The new constructor relies on type-based dynamic dispatch and allows one to
construct call operations given an object representing a FuncOp or its name as
a string, as opposed to requiring an explicitly constructed attribute.

Depends On D110947

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D110948

[mlir][python] Provide more convenient wrappers for std.ConstantOp

Constructing a ConstantOp using the default-generated API is verbose and
requires specifying the constant type twice: for the result type of the
operation and for the type of the attribute. It also requires explicitly
constructing the attribute. Provide custom constructors that take the type once
and accept a raw value instead of the attribute. This requires dynamic dispatch
based on type in the constructor. Also provide the corresponding accessors to
raw values.

In addition, provide a "refinement" class ConstantIndexOp similar to what
exists in C++. Unlike other "op view" Python classes, operations cannot be
automatically downcasted to this class since it does not correspond to a
specific operation name. It only exists to simplify construction of the
operation.

Depends On D110946

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D110947

[mlir][python] Usability improvements for Python bindings

Provide a couple of quality-of-life usability improvements for Python bindings,
in particular:

  * give access to the list of types for the list of op results or block
    arguments, similarly to ValueRange->TypeRange,

  * allow for constructing empty dictionary arrays,

  * support construction of array attributes by concatenating an existing
    attribute with a Python list of attributes.

All these are required for the upcoming customization of builtin and standard
ops.

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D110946

[libFuzzer] Use octal instead of hex escape sequences in PrintASCII

Previously, PrintASCII would print the string "\ta" as "\x09a". However,
in C/C++ those strings are not the same: the trailing 'a' is part of the
escape sequence, which means it's equivalent to "\x9a". This is an
annoying quirk of the standard. (See
https://eel.is/c++draft/lex.ccon#nt:hexadecimal-escape-sequence)

To fix this, output three-digit octal escape sequences instead. Since
octal escapes are limited to max three digits, this avoids the problem
of subsequent characters unintentionally becoming part of the escape
sequence.
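
A minimal Python sketch of the escaping scheme described above. It is not the libFuzzer implementation; the function name, and the choice to escape backslash and double quote, are assumptions of this sketch. Python's own `\x` escapes always take exactly two digits, so the C quirk does not occur in the Python source itself; the point is only the text that gets produced.

```python
def print_ascii(data: bytes) -> str:
    """Render bytes as the body of a C string literal using 3-digit octal escapes."""
    out = []
    for b in data:
        if b == ord("\\"):
            out.append("\\\\")          # escaped backslash
        elif b == ord('"'):
            out.append('\\"')           # escaped double quote
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))          # printable ASCII is emitted as-is
        else:
            out.append("\\%03o" % b)    # always exactly three octal digits
    return "".join(out)

print(print_ascii(b"\ta"))  # \011a
```

Since the octal escape is always padded to three digits, the following `a` can never be absorbed into it when the output is pasted into C or C++ source.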

Dictionary files still use the non-C-compatible hex escapes, but I
believe we can't change the format since it comes from AFL, and
libfuzzer never writes such files, it only has to read them, so they're
not affected by this change.

Differential revision: https://reviews.llvm.org/D110920

[LoopBoundSplit] Use SCEVAddRecExpr instead of SCEV for AddRecSCEV (NFC)

Differential Revision: https://reviews.llvm.org/D109682

[NFC] Simple tidy-up in LoopVectorizationCostModel::selectEpilogueVectorizationFactor

Avoid creating EpilogueVectorizationForceVF twice.

[APInt] Stop using soft-deprecated constructors and methods in clang. NFC.

Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in clang.

Differential Revision: https://reviews.llvm.org/D110808

[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.

Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807

[openmp] [elf_common] Fix linking against LLVM dylib

The hand-rolled linking logic in elf_common does not account for
the possibility of using LLVM dylib rather than a dozen static
libraries. Since it does not seem to be easily convertible
to add_llvm_library, just hand-roll support for LLVM_LINK_LLVM_DYLIB.
This is necessary to support stand-alone builds against installed LLVM.

Differential Revision: https://reviews.llvm.org/D111038

[LLDB] Skip TestClangREPL.py on Arm/AArch64 Linux

TestClangREPL.py has been failing randomly on Arm/AArch64 Linux
buildbot. I am marking it as skipped to reduce false alarms.

[mlir][linalg] Change tensor size in unit test (NFC).

As a follow up to https://reviews.llvm.org/D110849, adapt the input tensor size to match the iteration space.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D110906

[clangd] Follow-up on rGdea48079b90d

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D110925

[lldb] Refactor variable parsing

Separates the methods for recursive variable parsing in function
context and non-recursive parsing of global variables.

Differential Revision: https://reviews.llvm.org/D110570

[SCEV] Cap the number of instructions scanned when inferring flags

This addresses a comment from review on D109845. The concern was raised that an unbounded scan would be expensive. Long term plan is to cache this search - likely reusing the existing mechanism for loop side effects - but let's be simple and conservative for now.

[SCEV] Use trivial bound on defining scope of all SCEVs when computing flags

This addresses a comment from review on D109845. Even for SCEVs for which we can't find true bounds without recursing through operands, entry to the function forms a trivial upper bound. In some cases, this trivial bound is enough to prove safety of flag inference.

[SCEV] Use full logic when inferring flags on add and gep

This is a follow-on to D109845. With that landed, we will have fixed all known instances of pr51817, and can thus start inferring flags more aggressively with greatly reduced risk of miscompiles. This patch simply applies the same inference logic used in that patch to our other major flag inference path.

We can still do much better here (on both paths), but this is our first step.

Differential Revision: https://reviews.llvm.org/D111003

[SCEV] Correctly propagate nowrap flags across scopes when folding invariant add through addrec

This fixes a violation of the wrap flag rules introduced in c4048d8f. This is an alternate fix to D106852.

The basic problem being fixed is that we infer a set of flags which is valid at some inner scope S1 (usually by correctly propagating them from IR), and then (incorrectly) extend them to a SCEV in scope S2 where S1 != S2. This is not in general safe per the wrap flags semantics recently defined.

In this patch, I include a simple inference step to handle the case where we can prove that S2 is the preheader of the loop S1, and that entry into S2 implies execution of S1. See the code for a more detailed explanation.

One worry I have with this patch is that I might be over-fitting what shows up in tests - and thus hiding negative impact we'd see in the real world. My best defense is that the rule used here very closely follows the one used to propagate the flags from IR to the inner add to start with, and thus if one is reasonable, so probably is the other. Curious what others think about that piece.

The test diffs are roughly as expected. Mostly analysis only, with two transform changes. Oddly, the result looks better in the loop-idiom test, and I don't understand the PPC output well enough to tell. Nothing terrible looking though. (For context, without the scope inference peephole, the test delta includes a couple of vectorization tests. Again, not super concerning, but slightly more so.)

Differential Revision: https://reviews.llvm.org/D109845

[AttrBuilder] Make handling of int attributes more generic (NFC)

This is basically the same change as 42cc7f3c524a0ede6b903486c588003fe12d9293
but for integer attributes. Rather than treating each attribute
individually, handle them all the same way. The only thing that
needs to be done per attribute is specify how get/add convert
from/to the raw representation.

[openmp] Fix a typo in a test REQUIRES line

Differential Revision: https://reviews.llvm.org/D110963

[X86][Costmodel] Load/store i16 Stride=3 VF=32 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/rMaYr67hz - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=17.8`
So pick cost of `56`.

For store we have:
https://godbolt.org/z/eMsbKqnvv - for intels `Block RThroughput: <=54.0`; for ryzens, `Block RThroughput: <=15.0`
So pick cost of `54`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111018

[X86][Costmodel] Load/store i16 Stride=3 VF=16 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: =28.0`; for ryzens, `Block RThroughput: <=8.5`
So pick cost of `28`.

For store we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `27`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111017

[X86][Costmodel] Load/store i16 Stride=3 VF=8 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=2.3`
So pick cost of `9`.

For store we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: <=12.0`; for ryzens, `Block RThroughput: <=3.3`
So pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111016

[X86][Costmodel] Load/store i16 Stride=3 VF=4 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.

For store we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111015

[X86][Costmodel] Load/store i16 Stride=3 VF=2 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/xnE988aej - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=2.5`
So pick cost of `5`.

For store we have:
https://godbolt.org/z/rMGT31Tnh - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111014

[X86][Costmodel] Load/store i8 Stride=6 VF=32 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/c1jjKqP7b - for intels `Block RThroughput: <=82.0`; for ryzens, `Block RThroughput: <=26.0`
So pick cost of `82`.

For store we have:
https://godbolt.org/z/YM4ErY8x7 - for intels `Block RThroughput: <=90.0`; for ryzens, `Block RThroughput: <=25.5`
So pick cost of `90`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111013

[X86][Costmodel] Load/store i8 Stride=6 VF=16 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Gz8hhqfTM - for intels `Block RThroughput: <=43.0`; for ryzens, `Block RThroughput: <=14.0`
So pick cost of `43`.

For store we have:
https://godbolt.org/z/9vrdssYa8 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `27`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111012

[X86][Costmodel] Load/store i8 Stride=6 VF=8 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/v98qPTTf6 - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: =6.0`
So pick cost of `18`.

For store we have:
https://godbolt.org/z/rn5T9E8q6 - for intels `Block RThroughput: <=16.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `16`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111011

[X86][Costmodel] Load/store i8 Stride=6 VF=4 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `9`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111010

[X86][Costmodel] Load/store i8 Stride=6 VF=2 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/jvj6jzns5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/ros7eebMP - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111008

[Clang][NFC] Fix the comment for Sema::DiagIfReachable

[mlir] [test] Add missing tool substitutions

Add missing mlir-capi-*-test tool substitutions in order to fix CAPI
test failures when mlir is not installed yet.

Differential Revision: https://reviews.llvm.org/D110991

[ARM] Mark <= -1 immediate constant as cheap

A <= -1 constant on a compare can be converted to a < 0 operation, which
is usually cheap. If we mark the constant as cheap, preventing hoisting,
we allow that fold to happen even across different blocks.
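
The equivalence the fold relies on is that, for signed integers, `x <= -1` and `x < 0` select exactly the same values. A quick Python check over the 16-bit signed range (the helper names and the choice of range are just for illustration):

```python
def le_minus_one(x: int) -> bool:
    return x <= -1

def lt_zero(x: int) -> bool:
    return x < 0

def equivalent_on_i16() -> bool:
    """Compare both predicates for every signed 16-bit value."""
    return all(le_minus_one(x) == lt_zero(x) for x in range(-(1 << 15), 1 << 15))

print(equivalent_on_i16())  # True
```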

Differential Revision: https://reviews.llvm.org/D109360

[X86] Split Cannonlake + Icelake Tuning. NFC

The Ice/Tiger/RocketLake specs were inheriting the tuning settings from CannonLake, a previous architecture. We shouldn't have this dependency, so I've copied the current tuning settings so we can make future adjustments to both CNL + ICL etc. more easily.

[CostModel][X86] X86TTIImpl::getCmpSelInstrCost - try to use Predicate argument directly first (PR48337)

There's still a lot of cases where getCmpSelInstrCost fails to specify a predicate, once those are in place we should be able to remove the fallback to the Instruction argument entirely.

[ARM] Tests for constant hoisting -1 immediates

[Analysis, CodeGen] Migrate from arg_operands to args (NFC)

Note that arg_operands is considered a legacy name. See
llvm/include/llvm/IR/InstrTypes.h for details.

[NFC][X86][Codegen] Add test coverage for interleaved i64 load/store stride=3

[NFC][X86][LV] Add costmodel test coverage for interleaved i64/f64 load/store stride=3

[InstCombine] fold cast of right-shift if high bits are not demanded (3rd try)

The first two tries at this were reverted because they caused an
infinite loop in instcombine.
That should be fixed after a series of patches that ended with
removing the faulty opposing transform:
3fabd98e5b3e

Original commit message:
(masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C

Narrowing the shift should be better for analysis and can lead
to follow-on transforms as shown.

Attempt at a general proof in Alive2:
https://alive2.llvm.org/ce/z/tRnnSF

Here are a couple of the specific tests:
https://alive2.llvm.org/ce/z/bCnTp-
https://alive2.llvm.org/ce/z/TfaHnb
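
As an informal complement to the Alive2 links, the following Python sketch brute-forces one instance of the fold (i16 truncated to i8). The bit widths, function names, and the exact demanded-bits mask are assumptions chosen for illustration; this is not the InstCombine code.

```python
def trunc_of_lshr(x, c, src_bits=16, dst_bits=8):
    """trunc (lshr X, C): shift in the wide type, then keep the low dst_bits."""
    return ((x & ((1 << src_bits) - 1)) >> c) & ((1 << dst_bits) - 1)

def lshr_of_trunc(x, c, dst_bits=8):
    """lshr (trunc X), C: keep the low dst_bits, then shift in the narrow type."""
    return (x & ((1 << dst_bits) - 1)) >> c

def fold_is_safe(src_bits=16, dst_bits=8):
    """Check both forms agree whenever the top C bits of the result are not demanded."""
    for c in range(dst_bits):
        demanded = (1 << (dst_bits - c)) - 1  # mask that ignores the top c bits
        for x in range(1 << src_bits):
            before = trunc_of_lshr(x, c, src_bits, dst_bits) & demanded
            after = lshr_of_trunc(x, c, dst_bits) & demanded
            if before != after:
                return False
    return True

print(fold_is_safe())                # True
# Without the mask the two forms can differ in the high bits:
print(trunc_of_lshr(0x0100, 1))      # 128
print(lshr_of_trunc(0x0100, 1))      # 0
```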

Differential Revision: https://reviews.llvm.org/D110170

[InstCombine] add test for shl + demanded bits; NFC

This is a reduction of a test that would infinite loop with D110170.

[InstSimplify] Add additional load from constant test (NFC)

This case does not get folded, because the GEP indexes too deeply
(to the i8), making the bitcast logic not apply (on the [8 x i8]).

[NFC][X86][Codegen] Add test coverage for interleaved i32 load/store stride=3

[NFC][X86][LV] Add costmodel test coverage for interleaved i32/f32 load/store stride=3

Fixed warnings in target/parser codes produced by -Wbitwise-instead-of-logical

Fixed more warnings in LLVM produced by -Wbitwise-instead-of-logical

[NFC][X86][Codegen] Add test coverage for interleaved i8 load/store stride=6

[NFC][X86][LV] Add costmodel test coverage for interleaved i8 load/store stride=6

[X86] Add SSE2/AVX1/AVX512BW test coverage to interleaved load/store tests

Extension to PR51979 so codegen tests keep close to the costmodel tests

Unbreak hexagon-check-builtins.c due to rGb1fcca388441

[clang-format] allow clang-format to be passed a file of filenames so we can add a regression suite of "clean clang-formatted files" from LLVM

This change now generates that list, and the change to clang-format allows
us to run clang-format quickly over these files via the list of files.

clang-format.exe -verbose -n --files=./clang/docs/tools/clang-formatted-files.txt

```
Clang-formating 7926 files
Formatting [1/7925] clang/bindings/python/tests/cindex/INPUTS/header1.h
..
Formatting [7925/7925] utils/bazel/llvm-project-overlay/llvm/include/llvm/Config/config.h
```

This is needed because putting all those files on the command line is too
long, and invoking 7900+ clang-formats is much slower (too slow to be honest)

Using this method it takes only 7.5 minutes (on my machine) to run
`clang-format -n` over all of the files (7925), so we should be able to
test any change quickly and easily.

Rerunning clang-format over this list should let us ensure that we don't regress
clang-format over a large code base, and also that all of the
files which were previously 100% clang-formatted remain so
(which the LLVM premerge checks should be enforcing).

Reviewed By: HazardyKnusperkeks

Differential Revision: https://reviews.llvm.org/D111000

Reland "[Clang] Extend -Wbool-operation to warn about bitwise and of bools with side effects"

This reverts commit a4933f57f3f0a45e1db1075f7285f0761a80fc06. New warnings were fixed.

Fixed warnings in LLVM produced by -Wbitwise-instead-of-logical

Revert "[Clang] Extend -Wbool-operation to warn about bitwise and of bools with side effects"

This reverts commit f62d18ff140f67a8776a7a3c62a75645d8d540b5. Found some cases in LLVM itself.

[Clang] Extend -Wbool-operation to warn about bitwise and of bools with side effects

Motivation: https://arstechnica.com/gadgets/2021/07/google-pushed-a-one-character-typo-to-production-bricking-chrome-os-devices/

Warn for pattern boolA & boolB or boolA | boolB where boolA and boolB have possible side effects.

Casting one operand to int is enough to silence this warning: for example (int)boolA & boolB or boolA | (int)boolB
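
The hazard behind the warning is that a bitwise operator evaluates both operands, while a logical operator short-circuits. The sketch below shows the same evaluation difference in Python, purely as an analogy; it does not model Clang's diagnostic, and all names are hypothetical.

```python
calls = []

def check(name, result):
    """A boolean-returning function with a visible side effect."""
    calls.append(name)
    return result

def bitwise_and():
    calls.clear()
    value = check("a", False) & check("b", True)    # both operands are evaluated
    return value, list(calls)

def logical_and():
    calls.clear()
    value = check("a", False) and check("b", True)  # stops after the first False
    return value, list(calls)

print(bitwise_and())  # (False, ['a', 'b'])
print(logical_and())  # (False, ['a'])
```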

Fixes https://bugs.llvm.org/show_bug.cgi?id=51216

Differential Revision: https://reviews.llvm.org/D108003

[LSV] Change the default value of InsertElement to poison

This patch is changing the InsertElement's placeholder to poison without changing the LSV's behavior.

Regardless of whether `StoreTy` is FixedVectorType or not, the poison value will be overwritten with a different value.
Therefore, whether the InsertElement's placeholder is poison or undef will not affect the result of the program.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D111005

[NFC][RISCV] Update test cases through update_cc_test_checks.py.

[mlir] [test] Include mlir_tools_dir in PATH to fix mlir-reduce

Include mlir_tools_dir in the PATH used in test environment,
as otherwise mlir-reduce is unable to find mlir-opt when building
standalone (and hence mlir_tools_dir != llvm_tools_dir).

Differential Revision: https://reviews.llvm.org/D110992

Fix ASAN execution for the MLIR Python tests

First the leak sanitizer has to be disabled, as even an empty script
leads to leak detection with Python.
Then we need to preload the ASAN runtime, as the main binary (python)
won't be linked against it. This will only work on Linux right now.

Differential Revision: https://reviews.llvm.org/D111004