review.tizen.org Git - platform/upstream/llvm.git/log

[ELF][test] Fix several LLD ICF tests

A number of the ICF tests were not updated to use --print-icf-sections
instead of --verbose and various '-NOT' checks were not updated to the
latest output format of --print-icf-sections. Because these are all
'negative' tests, these issues have gone unnoticed.

Differential Revision: https://reviews.llvm.org/D110353

[mlir][python] Provide more convenient constructors for std.CallOp

The new constructor relies on type-based dynamic dispatch and allows one to
construct call operations given an object representing a FuncOp or its name as
a string, as opposed to requiring an explicitly constructed attribute.

Depends On D110947

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D110948

[mlir][python] Provide more convenient wrappers for std.ConstantOp

Constructing a ConstantOp using the default-generated API is verbose and
requires to specify the constant type twice: for the result type of the
operation and for the type of the attribute. It also requires to explicitly
construct the attribute. Provide custom constructors that take the type once
and accept a raw value instead of the attribute. This requires dynamic dispatch
based on type in the constructor. Also provide the corresponding accessors to
raw values.

In addition, provide a "refinement" class ConstantIndexOp similar to what
exists in C++. Unlike other "op view" Python classes, operations cannot be
automatically downcasted to this class since it does not correspond to a
specific operation name. It only exists to simplify construction of the
operation.

Depends On D110946

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D110947

[mlir][python] Usability improvements for Python bindings

Provide a couple of quality-of-life usability improvements for Python bindings,
in particular:

  * give access to the list of types for the list of op results or block
    arguments, similarly to ValueRange->TypeRange,

  * allow for constructing empty dictionary arrays,

  * support construction of array attributes by concatenating an existing
    attribute with a Python list of attributes.

All these are required for the upcoming customization of builtin and standard
ops.

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D110946

[libFuzzer] Use octal instead of hex escape sequences in PrintASCII

Previously, PrintASCII would print the string "\ta" as "\x09a". However,
in C/C++ those strings are not the same: the trailing 'a' is part of the
escape sequence, which means it's equivalent to "\x9a". This is an
annoying quirk of the standard. (See
https://eel.is/c++draft/lex.ccon#nt:hexadecimal-escape-sequence)

To fix this, output three-digit octal escape sequences instead. Since
octal escapes are limited to max three digits, this avoids the problem
of subsequent characters unintentionally becoming part of the escape
sequence.

Dictionary files still use the non-C-compatible hex escapes, but I
believe we can't change the format since it comes from AFL, and
libfuzzer never writes such files, it only has to read them, so they're
not affected by this change.

Differential revision: https://reviews.llvm.org/D110920

[LoopBoundSplit] Use SCEVAddRecExpr instead of SCEV for AddRecSCEV (NFC)

Differential Revision: https://reviews.llvm.org/D109682

[NFC] Simple tidy-up in LoopVectorizationCostModel::selectEpilogueVectorizationFactor

Avoid creating EpilogueVectorizationForceVF twice.

[APInt] Stop using soft-deprecated constructors and methods in clang. NFC.

Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in clang.

Differential Revision: https://reviews.llvm.org/D110808

[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.

Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807

[openmp] [elf_common] Fix linking against LLVM dylib

The hand-rolled linking logic in elf_common does not account for
the possibility of using LLVM dylib rather than a dozen static
libraries. Since it does not seem to be easily convertible
to add_llvm_library, just hand-roll support for LLVM_LINK_LLVM_DYLIB.
This is necessary to support stand-alone builds against installed LLVM.

Differential Revision: https://reviews.llvm.org/D111038

[LLDB] Skip TestClangREPL.py on Arm/AArch64 Linux

TestClangREPL.py has been failing randomly on Arm/AArch64 Linux
buildbot. I am marking it as skipped to reduce false alarms.

[mli][linalg] Change tensor size in unit test (NFC).

As a follow up to https://reviews.llvm.org/D110849, adapt the input tensor size to match the iteration space.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D110906

[clangd] Follow-up on rGdea48079b90d

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D110925

[lldb] Refactor variable parsing

Separates the methods for recursive variable parsing in function
context and non-recursive parsing of global variables.

Differential Revision: https://reviews.llvm.org/D110570

[SCEV] Cap the number of instructions scanned when infering flags

This addresses a comment from review on D109845. The concern was raised that an unbounded scan would be expensive. Long term plan is to cache this search - likely reusing the existing mechanism for loop side effects - but let's be simple and conservative for now.

[SCEV] Use trivial bound on defining scope of all SCEVs when computing flags

This addresses a comment from review on D109845. Even for SCEVs which we can't find true bounds without recursing through operands, entry to the function forms a trivial upper bound. In some cases, this trivial bound is enough to prove safety of flag inference.

[SCEV] Use full logic when infering flags on add and gep

This is a followon to D109845. With that landed, we will have fixed all known instances of pr51817, and can thus start inferring flags more aggressively with greatly reduced risk of miscompiles. This patch simply applies the same inference logic used in that patch to our other major flag inference path.

We can still do much better here (on both paths), but this is our first step.

Differential Revision: https://reviews.llvm.org/D111003

[SCEV] Correctly propagate nowrap flags across scopes when folding invariant add through addrec

This fixes a violation of the wrap flag rules introduced in c4048d8f. This is an alternate fix to D106852.

The basic problem being fixed is that we infer a set of flags which is valid at some inner scope S1 (usually by correctly propagating them from IR), and then (incorrectly) extend them to a SCEV in scope S2 where S1 != S2. This is not in general safe per the wrap flags semantics recently defined.

In this patch, I include a simple inference step to handle the case where we can prove that S2 is the preheader of the loop S1, and that entry into S2 implies execution of S1. See the code for a more detailed explanation.

One worry I have with this patch is that I might be over-fitting what shows up in tests - and thus hiding negative impact we'd see in the real world. My best defense is that the rule used here very closely follows the one used to propagate the flags from IR to the inner add to start with, and thus if one is reasonable, so probably is the other. Curious what others think about that piece.

The test diffs are roughly as expected. Mostly analysis only, with two transform changes. Oddly, the result looks better in the loop-idiom test, and I don't understand the PPC output enough to have tell. Nothing terrible looking though. (For context, without the scope inference peephole, the test delta includes a couple of vectorization tests. Again, not super concerning, but slightly more so.)

Differential Revision: https://reviews.llvm.org/D109845

[AttrBuilder] Make handling of int attribtues more generifc (NFC)

This is basically the same change as 42cc7f3c524a0ede6b903486c588003fe12d9293
but for integer attributes. Rather than treating each attribute
individually, handle them all the same way. The only thing that
needs to be done per attribute is specify how get/add convert
from/to the raw representation.

[openmp] Fix a typo in a test REQUIRES line

Differential Revision: https://reviews.llvm.org/D110963

[X86][Costmodel] Load/store i16 Stride=3 VF=32 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/rMaYr67hz - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=17.8`
So pick cost of `56`.

For store we have:
https://godbolt.org/z/eMsbKqnvv - for intels `Block RThroughput: <=54.0`; for ryzens, `Block RThroughput: <=15.0`
So pick cost of `54`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111018

[X86][Costmodel] Load/store i16 Stride=3 VF=16 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: =28.0`; for ryzens, `Block RThroughput: <=8.5`
So pick cost of `28`.

For store we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `27`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111017

[X86][Costmodel] Load/store i16 Stride=3 VF=8 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=2.3`
So pick cost of `9`.

For store we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: <=12.0`; for ryzens, `Block RThroughput: <=3.3`
So pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111016

[X86][Costmodel] Load/store i16 Stride=3 VF=4 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.

For store we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111015

[X86][Costmodel] Load/store i16 Stride=3 VF=2 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/xnE988aej - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=2.5`
So pick cost of `5`.

For store we have:
https://godbolt.org/z/rMGT31Tnh - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111014

[X86][Costmodel] Load/store i8 Stride=6 VF=32 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/c1jjKqP7b - for intels `Block RThroughput: <=82.0`; for ryzens, `Block RThroughput: <=26.0`
So pick cost of `82`.

For store we have:
https://godbolt.org/z/YM4ErY8x7 - for intels `Block RThroughput: <=90.0`; for ryzens, `Block RThroughput: <=25.5`
So pick cost of `90`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111013

[X86][Costmodel] Load/store i8 Stride=6 VF=16 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Gz8hhqfTM - for intels `Block RThroughput: <=43.0`; for ryzens, `Block RThroughput: <=14.0`
So pick cost of `43`.

For store we have:
https://godbolt.org/z/9vrdssYa8 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `27`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111012

[X86][Costmodel] Load/store i8 Stride=6 VF=8 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/v98qPTTf6 - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: =6.0`
So pick cost of `18`.

For store we have:
https://godbolt.org/z/rn5T9E8q6 - for intels `Block RThroughput: <=16.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `16`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111011

[X86][Costmodel] Load/store i8 Stride=6 VF=4 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `9`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111010

[X86][Costmodel] Load/store i8 Stride=6 VF=2 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/jvj6jzns5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/ros7eebMP - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111008

[Clang][NFC] Fix the comment for Sema::DiagIfReachable

[mlir] [test] Add missing tool substitutions

Add missing mlir-capi-*-test tool substitutions in order to fix CAPI
test failures when mlir is not installed yet.

Differential Revision: https://reviews.llvm.org/D110991

[ARM] Mark <= -1 immediate constant as cheap

A <= -1 constant on a compare can be converted to a < 0 operation, which
is usually cheap. If we mark the constant as cheap, preventing hoisting,
we allow that fold to happen even across different blocks.

Differential Revision: https://reviews.llvm.org/D109360

[X86] Split Cannonlake + Icelake Tuning. NFC

The Ice/Tiger/RocketLake specs were inheriting the tuning settings from CannonLake, a previous architecture. We shouldn't have this dependency, so I've copied the current tuning settings so we can make future adjustments to both CNL + ICL etc. more easily.

[CostModel][X86] X86TTIImpl::getCmpSelInstrCost - try to use Predicate argument directly first (PR48337)

There's still a lot of cases where getCmpSelInstrCost fails to specify a predicate, once those are in place we should be able to remove the fallback to the Instruction argument entirely.

[ARM] Tests for constant hoisting -1 immediates

[Analysis, CodeGen] Migrate from arg_operands to args (NFC)

Note that arg_operands is considered a legacy name. See
llvm/include/llvm/IR/InstrTypes.h for details.

[NFC][X86][Codegen] Add test coverage for interleaved i64 load/store stride=3

[NFC][X86][LV] Add costmodel test coverage for interleaved i64/f64 load/store stride=3

[InstCombine] fold cast of right-shift if high bits are not demanded (3rd try)

The first two tries at this were reverted because they caused an
infinite loop in instcombine.
That should be fixed after a series of patches that ended with
removing the faulty opposing transform:
3fabd98e5b3e

Original commit message:
(masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C

Narrowing the shift should be better for analysis and can lead
to follow-on transforms as shown.

Attempt at a general proof in Alive2:
https://alive2.llvm.org/ce/z/tRnnSF

Here are a couple of the specific tests:
https://alive2.llvm.org/ce/z/bCnTp-
https://alive2.llvm.org/ce/z/TfaHnb

Differential Revision: https://reviews.llvm.org/D110170

[InstCombine] add test for shl + demanded bits; NFC

This is a reduction of a test that would infinite loop with D110170.

[InstSimplify] Add additional load from constant test (NFC)

This case does not get folded, because the GEP indexes too deeply
(to the i8), making the bitcast logic not apply (on the [8 x i8]).

[NFC][X86][Codegen] Add test coverage for interleaved i32 load/store stride=3

[NFC][X86][LV] Add costmodel test coverage for interleaved i32/f32 load/store stride=3

Fixed warnings in target/parser codes produced by -Wbitwise-instead-of-logicala

Fixed more warnings in LLVM produced by -Wbitwise-instead-of-logical

[NFC][X86][Codegen] Add test coverage for interleaved i8 load/store stride=6

[NFC][X86][LV] Add costmodel test coverage for interleaved i8 load/store stride=6

[X86] Add SSE2/AVX1/AVX512BW test coverage to interleaved load/store tests

Extension to PR51979 so codegen tests keep close to the costmodel tests

Unbreak hexagon-check-builtins.c due to rGb1fcca388441

[clang-format] allow clang-format to be passed a file of filenames so we can add a regression suite of "clean clang-formatted files" from LLVM

This change now generates that list, and the change to clang-format allows
us to run clang-format quickly over these files via the list of files.

clang-format.exe -verbose -n --files=./clang/docs/tools/clang-formatted-files.txt

```
Clang-formating 7926 files
Formatting [1/7925] clang/bindings/python/tests/cindex/INPUTS/header1.h
..
Formatting [7925/7925] utils/bazel/llvm-project-overlay/llvm/include/llvm/Config/config.h
```

This is needed because putting all those files on the command line is too
long, and invoking 7900+ clang-formats is much slower (too slow to be honest)

Using this method it takes on 7.5 minutes (on my machine) to run
`clang-format -n` over all of the files (7925), this should result in us
testing any change quickly and easily.

We should be able to use rerunning this list to ensure that we don't regress
clang-format over a large code base, but also use it to ensure none of the
previous files which were 100% clang-formatted remain so.
(which the LLVM premerge checks should be enforcing)

Reviewed By: HazardyKnusperkeks

Differential Revision: https://reviews.llvm.org/D111000

Reland "[Clang] Extend -Wbool-operation to warn about bitwise and of bools with side effects"

This reverts commit a4933f57f3f0a45e1db1075f7285f0761a80fc06. New warnings were fixed.

Fixed warnings in LLVM produced by -Wbitwise-instead-of-logical

Revert "[Clang] Extend -Wbool-operation to warn about bitwise and of bools with side effects"

This reverts commit f62d18ff140f67a8776a7a3c62a75645d8d540b5. Found some cases in LLVM itself.

[Clang] Extend -Wbool-operation to warn about bitwise and of bools with side effects

Motivation: https://arstechnica.com/gadgets/2021/07/google-pushed-a-one-character-typo-to-production-bricking-chrome-os-devices/

Warn for pattern boolA & boolB or boolA | boolB where boolA and boolB has possible side effects.

Casting one operand to int is enough to silence this warning: for example (int)boolA & boolB or boolA| (int)boolB

Fixes https://bugs.llvm.org/show_bug.cgi?id=51216

Differential Revision: https://reviews.llvm.org/D108003

[LSV] Change the default value of InstertElement to poison

This patch is changing the InsertElement's placeholder to poison without changing the LSV's behavior.

Regardless of whether `StoreTy` is FixedVectorType or not, the poison value will be overwritten with a different value.
Therefore, whether the InsertElement's placeholder is poison or undef will not affect the result of the program.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D111005

[NFC][RISCV] Update test cases through update_cc_test_checks.py.

[mlir] [test] Include mlir_tools_dir in PATH to fix mlir-reduce

Include mlir_tools_dir in the PATH used in test environment,
as otherwise mlir-reduce is unable to find mlir-opt when building
standalone (and hence mlir_tools_dir != llvm_tools_dir).

Differential Revision: https://reviews.llvm.org/D110992

Fix ASAN execution for the MLIR Python tests

First the leak sanitizer has to be disabled, as even an empty script
leads to leak detection with Python.
Then we need to preload the ASAN runtime, as the main binary (python)
won't be linked against it. This will only work on Linux right now.

Differential Revision: https://reviews.llvm.org/D111004

Exclude MLIR python binding tests from Sanitizer tests for now

This requires more config to work reliably during lit execution.
But also I see many leaks when running manually right now.

Fix last leaky MLIR integration test (NFC)

[IR]PATCH 2/2: Add MDNode::printTree and dumpTree

This patch adds the functionalities to print MDNode in tree shape. For
example, instead of printing a MDNode like this:
```
<0x5643e1166888> = !DILocalVariable(name: "foo", arg: 2, scope: <0x5643e11c9740>, file: <0x5643e11c6ec0>, line: 8, type: <0x5643e11ca8e0>, flags: DIFlagPublic | DIFlagFwdDecl, align: 8)
```
The printTree/dumpTree functions can give you:
```
<0x5643e1166888> = !DILocalVariable(name: "foo", arg: 2, scope: <0x5643e11c9740>, file: <0x5643e11c6ec0>, line: 8, type: <0x5643e11ca8e0>, flags: DIFlagPublic | DIFlagFwdDecl, align: 8)
  <0x5643e11c9740> = distinct !DISubprogram(scope: null, spFlags: 0)
  <0x5643e11c6ec0> = distinct !DIFile(filename: "file.c", directory: "/path/to/dir")
  <0x5643e11ca8e0> = distinct !DIDerivedType(tag: DW_TAG_pointer_type, baseType: <0x5643e11668d8>, size: 1, align: 2)
    <0x5643e11668d8> = !DIBasicType(tag: DW_TAG_unspecified_type, name: "basictype")
```
Which is useful when using it in debugger. Where sometimes printing the
whole module to see all MDNodes is too expensive.

Differential Revision: https://reviews.llvm.org/D110113

[IR]PATCH 1/2: Add AsmWriterContext into AsmWriter

AsmWriterContext is a simple compound that stores TypePrinting,
SlotTracker (i.e. "Machine" in AsmWriter), and Module instances -- three
of the most commonly used objects in the AsmWriter infrastructure.
Previously these three objects are passed as separate function arguments
to most of the printer functions in this file. Tidying them up can bring
easier code refactoring on printer functions in the future (e.g. when we
want to pass additional objects to all printer functions).

NOTE: Theoritically, this patch should be NFC.

Differential Revision: https://reviews.llvm.org/D110112

Use standard separator for TSan options in `stress.cpp` test case.

Use of space as a separator for options is problematic for wrapper
scripts (i.e. implementations of `%run`) that have to marshall
environment variables to target different than the host.

Rather than requiring every implementation of `%run` to support spaces
in `TSAN_OPTIONS` it is simpler to fix this single test case.

rdar://83637067

Differential Revision: https://reviews.llvm.org/D110967

[MLIR][NFC] Drop unnecessary use of OpBuilder in build trip count map

NFC. Drop unnecessary use of OpBuilder in buildTripCountMapAndOperands.
Rename this to getTripCountMapAndOperands and remove stale comments.

Differential Revision: https://reviews.llvm.org/D110993

Disable leak check for the MLIR Linalg CPU integration tests (NFC)

See http://llvm.org/pr52047 for tracking.

Disable leak check for the MLIR Sparse CPU integration tests (NFC)

See http://llvm.org/pr52046 for tracking.

Fix memory leaks in MLIR integration tests for vector dialect (NFC)

[LLVM][IR] Fixed input arguments for Verifier getter

ParameterABIAttributes functions work with unsigned integers as the index, so having the getter be signed makes no sense. Additionally, for this reason, the loop vars that were signed were changed to unsigned too.

Reviewed By: jeroen.dobbelaere

Differential Revision: https://reviews.llvm.org/D110344

Re-apply the fix on DwarfEHPrepare and add a test

This patch re-introduces the fix in the commit https://github.com/llvm/llvm-project/commit/66b0cebf7f736 by @yrnkrn

> In DwarfEHPrepare, after all passes are run, RewindFunction may be a dangling
>
> pointer to a dead function. To make sure it's valid, doFinalization nullptrs
> RewindFunction just like the constructor and so it will be found on next run.
>
> llvm-svn: 217737

It seems that the fix was not migrated to `DwarfEHPrepareLegacyPass`.

This patch also updates `llvm/test/CodeGen/X86/dwarf-eh-prepare.ll` to include `-run-twice` to exercise the cleanup. Without this patch `llvm-lit -v llvm/test/CodeGen/X86/dwarf-eh-prepare.ll` fails with

```
-- Testing: 1 tests, 1 workers --
FAIL: LLVM :: CodeGen/X86/dwarf-eh-prepare.ll (1 of 1)
******************** TEST 'LLVM :: CodeGen/X86/dwarf-eh-prepare.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /home/arakaki/build/llvm-project/main/bin/opt -mtriple=x86_64-linux-gnu -dwarfehprepare -simplifycfg-require-and-preserve-domtree=1 -run-twice < /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll -S | /home/arakaki/build/llvm-project/main/bin/FileCheck /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll
--
Exit Code: 2

Command Output (stderr):
--
Referencing function in another module!
  call void @_Unwind_Resume(i8* %ehptr) #1
; ModuleID = '<stdin>'
void (i8*)* @_Unwind_Resume
; ModuleID = '<stdin>'
in function simple_cleanup_catch
LLVM ERROR: Broken function found, compilation aborted!
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/arakaki/build/llvm-project/main/bin/opt -mtriple=x86_64-linux-gnu -dwarfehprepare -simplifycfg-require-and-preserve-domtree=1 -run-twice -S
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'Module Verifier' on function '@simple_cleanup_catch'
#0 0x000056121b570a2c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:0
#1 0x000056121b56eb64 llvm::sys::RunSignalHandlers() /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Signals.cpp:97:0
#2 0x000056121b56f28e SignalHandler(int) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Unix/Signals.inc:397:0
#3 0x00007fc7e9b22980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
#4 0x00007fc7e87d3fb7 raise /build/glibc-S7xCS9/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
#5 0x00007fc7e87d5921 abort /build/glibc-S7xCS9/glibc-2.27/stdlib/abort.c:81:0
#6 0x000056121b4e1386 llvm::raw_svector_ostream::raw_svector_ostream(llvm::SmallVectorImpl<char>&) /home/arakaki/repos/watch/llvm-project/llvm/include/llvm/Support/raw_ostream.h:674:0
#7 0x000056121b4e1386 llvm::report_fatal_error(llvm::Twine const&, bool) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/ErrorHandling.cpp:114:0
#8 0x000056121b4e1528 (/home/arakaki/build/llvm-project/main/bin/opt+0x29e3528)
#9 0x000056121adfd03f llvm::raw_ostream::operator<<(llvm::StringRef) /home/arakaki/repos/watch/llvm-project/llvm/include/llvm/Support/raw_ostream.h:218:0
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/arakaki/build/llvm-project/main/bin/FileCheck /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll

--

********************
********************
Failed Tests (1):
  LLVM :: CodeGen/X86/dwarf-eh-prepare.ll

Testing Time: 0.22s
  Failed: 1
```

Reviewed By: loladiro

Differential Revision: https://reviews.llvm.org/D110979

[libc++] [ranges] Uncomment operator<=> in transform and iota iterators.

The existing tests for transform_view::iterator weren't quite right,
and can be simplified now that we have more of C++20 available to us.
Having done that, let's use the same pattern for iota_view::iterator
as well.

Differential Revision: https://reviews.llvm.org/D110774

Fix memory leak in MLIR SPIRV ModuleCombiner

Fix/disable more MLIR tests exposing leaks in ASAN builds (NFC)

Fix multiple memory leaks in mlir-cpu-runner tests (NFC)

Fix memory leak in mlir-cpu-runner/sgemm_naive_codegen.mlir (NFC)

Fix Undefined Behavior in MLIR Diagnostic: don't call memcpy with a nullptr source

This happens when streaming an empty Twine as part of a diagnostic.

Differential Revision: https://reviews.llvm.org/D111002

Fix memory leaks in MLIR unit-tests (NFC)

Fix memory leaks in mlir/unittests/MLIRTableGenTests

Trying to get MLIR ASAN-clean.

[SCEV] Split isSCEVExprNeverPoison reasoning explicitly into scope and mustexecute parts [NFC]

Inspired by the needs to D111001 and D109845. The seperation of concerns also amakes it easier to reason about correctness and completeness.

[Target] Migrate from getNumArgOperands to arg_size (NFC)

Note that getNumArgOperands is considered a legacy name. See
llvm/include/llvm/IR/InstrTypes.h for details.

[llvm-jitlink] Sink getPageSize call in Session::Create.

The page size for the host process is only needed in the in-process use case.

[X86][Atom] Fix BSR/BSF uops + port usage

Both ports are required for BitScan ops. Update the uops counts + port usage based off the most recent llvm-exegesis captures (PR36895) and what Intel AoM / Agner reports as well.

Revert "[RISCV] Add an GPR def to the Zvlseg SPILL/RELOAD pseudos"

This reverts commit 1f161919065fbfa2b39b8f373553a64b89f826f8.

We're seeing some issues with this internally. It seems that when
the spill is created by register allocation, the GPR doesn't get
allocated and an assertion fires during virtual register rewriting.

The .mir test case contains the spill before register allocation so
register allocation sees it as any other instruction.

[clang-format] NFC 1% improvement in the overall clang-formatted status

Free memory leak on duplicate interface registration

I guess this is why we should use unique_ptr as much as possible.
Also fix the InterfaceAttachmentTest.cpp test.

Differential Revision: https://reviews.llvm.org/D110984

[X86][SSE] Fix typo + infinite-loop in HOP(HOP'(X,X),HOP'(Y,Y)) fold (PR52040)

PR52040 identified several issues with the HOP(HOP'(X,X),HOP'(Y,Y)) -> HOP(PERMUTE(HOP'(X,Y)),PERMUTE(HOP'(X,Y)) slow-HOP fold.

Not only was there a copy+paste typo when accessing the inner HOP operands, but the (unnecessary) ReplaceAllUsesOfValueWith call was missing one use checks.

Now that we have better shuffle combines of HOPs we can just return a new HOP() sequence and not use ReplaceAllUsesOfValueWith at all - this actually improved pair_sum_v8i32_v4i32 codegen as it kicks off further shuffle combines.

[clang-format] Constructor initializer lists format with pp directives

Currently constructor initializer lists sometimes format incorrectly
when there is a preprocessor directive in the middle of the list.
This patch fixes the issue when parsing the initilizer list by
ignoring the preprocessor directive when checking if a block is
part of an initializer list.

rdar://82554274

Reviewed By: MyDeveloperDay, HazardyKnusperkeks

Differential Revision: https://reviews.llvm.org/D109951

[clang-format] [docs] [NFC] improve clarity in the QualifierAlignment warning

Improve the clarity and guidance of the warning when using code modifying option in clang-format see {D69764}

Reviewed By: HazardyKnusperkeks, curdeius

Differential Revision: https://reviews.llvm.org/D110801

[NFC][libc++] Use TEST_HAS_NO_EXCEPTIONS in tests.

[libc++][doc] Update format status.

Updated based on recent commits, new reviews and work continuing for
P2216.

[X86] decomposeMulByConstant - decompose legal vXi32 multiplies on SlowPMULLD targets and all vXi64 multiplies

X86's decomposeMulByConstant never permits mul decomposition to shift+add/sub if the vector multiply is legal.

Unfortunately this isn't great for SSE41+ targets which have PMULLD for vXi32 multiplies, but is often quite slow. This patch proposes to allow decomposition if the target has the SlowPMULLD flag (i.e. Silvermont). We also always decompose legal vXi64 multiplies - even latest IceLake has really poor latencies for PMULLQ.

Differential Revision: https://reviews.llvm.org/D110588

[X86] Atom SSE shift-by-variable take 2uops/3uops not 1uop

Based off the most recent llvm-exegesis captures (PR36895) and what Intel AoM / Agner / InstLatX64 reports as well.

[X86][Costmodel] Load/store i8 Stride=4 VF=32 interleaving costs

While we already model this tuple, the load cost is divergent from reality, so fix it.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/zWMhhnPYa - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=24.0`
So pick cost of `56`.

For store we have:
https://godbolt.org/z/vnqqjWx51 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110971

[X86][Costmodel] Load/store i8 Stride=4 VF=16 interleaving costs

While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/TrGW7cKsE - for intels `Block RThroughput: =24.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `24`.

For store we have:
https://godbolt.org/z/Mh7qaqEfe - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `8`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110970

[X86][Costmodel] Load/store i8 Stride=4 VF=8 interleaving costs

While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/v7746Wcf7 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `12`.

For store we have:
https://godbolt.org/z/aEeEohEbP - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110969

[X86][Costmodel] Load/store i8 Stride=4 VF=4 interleaving costs

While we already model this tuple, the store cost is divergent from reality, so fix it.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1n4bPh7Tn - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/r8K9sveqo - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110968

[X86][Costmodel] Load/store i8 Stride=4 VF=2 interleaving costs

While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/KP6nn36zs - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/ov95zhrq6 - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110966

[X86][Costmodel] Load/store i8 Stride=3 VF=32 interleaving costs

For VF=16, costs are correct.
For VF=32, load cost is divergent.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/qKjevqf4W - for intels `Block RThroughput: <=14.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/xTssTq319 - for intels `Block RThroughput: =13.0`; for ryzens, `Block RThroughput: <=5.5`
So pick cost of `13`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110961

[X86][Costmodel] Load/store i8 Stride=3 VF=8 interleaving costs

While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1jeocxj55 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/fr7xfa3K5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110960

[X86][Costmodel] Load/store i8 Stride=3 VF=4 interleaving costs

While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/obWz3PrfK - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=1.5`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/orjPshn3h - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110958