review.tizen.org Git - platform/upstream/llvm.git/log

[AMDGPU][NFC] Remove unused defvar in AMDGPUInstructions.td.

[DebugInfo][InstrRef] Avoid dropping fragment info during PHI elimination

InstrRefBasedLDV used to crash on the added test -- the exit block is not
in scope for the variable being propagated, but is still considered because
it contains an assignment. The failure-mode was vlocJoin ignoring
assign-only blocks and not updating DIExpressions, but pickVPHILoc would
still find a variable location for it. That led to DBG_VALUEs created with
the wrong fragment information.

Fix this by removing a filter inherited from VarLocBasedLDV: vlocJoin will
now consider assign-only blocks and will update their expressions.

Differential Revision: https://reviews.llvm.org/D114727

[DAG] Create fptosi.sat from clamped fptosi

This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have a less strict semantics
than the fptosisat, with less values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976

[libc++][ABI BREAK] Do not use the C++03 emulation for std::nullptr_t by default

We only support Clangs that implement nullptr as an extension in C++03 mode,
and we don't support GCC in C++03 mode. Hence, this patch disables the
use of the std::nullptr_t emulation in C++03 mode by default. Doing that
is technically an ABI break since it changes the mangling for std::nullptr_t.
However:

(1) The only affected users are those compiling in C++03 mode that have
    std::nullptr_t as part of their ABI, which should be reasonably rare.

(2) Those users already have a lingering problem in that their code will
    be incompatible in C++03 and C++11 modes because of that very ABI break.
    Hence, the only users that could really be inconvenienced about this
    change is those that planned on compiling in C++03 mode forever - for
    other users, we're just breaking them now instead of letting them break
    themselves later on when they try to upgrade to C++11.

(3) The ABI break will cause a linker error since the mangling changed,
    and will not result in an obscure runtime error.

Furthermore, if anyone is broken by this, they can define the
_LIBCPP_ABI_USE_CXX03_NULLPTR_EMULATION macro to return to the
previous behavior. We will then remove that macro after shipping
this for one release if we haven't seen widespread issues.

Concretely, the motivation for making this change is to make our own ABI
consistent in C++03 and C++11 modes and to remove complexity around the
definition of nullptr.

Furthermore, we could investigate making nullptr a keyword in C++03 mode
as a Clang extension -- I don't think that would break anyone, since
libc++ already defines nullptr as a macro to something else. Only users
that do not use libc++ and compile in C++03 mode could potentially be
broken by that.

Differential Revision: https://reviews.llvm.org/D109459

[libc] Add a reasonably optimized version for bcmp

This is based on current memcmp implementation.

Differential Revision: https://reviews.llvm.org/D114432

[libc] Add memmove benchmarks

This patch enables the benchmarking of `memmove`.
Ideally, this should be submitted before D114637.

Differential Revision: https://reviews.llvm.org/D114694

[DebugInfo][InstrRef] "final final" test cleanups for x86 tests

Two "totally definitely the last ones" instruction referencing test
updates:

* fp-stack.ll: this test targets i686, and so it won't be getting
   instruction referencing, or at least not right now,
* X86/live-debug-values.ll: instruction referencing will produce entry
   values in this test, add check lines to account for this. It's not clear
   what the test is supposed to be testing anyway, but the entry values
   appear to be correct.

Differential Revision: https://reviews.llvm.org/D114626

[LV] Move code from widenSelectInstruction to VPWidenSelectRecipe. (NFC)

The code in widenSelectInstruction has already been transitioned
to only rely on information provided by VPWidenSelectRecipe directly.

Moving the code directly to VPWidenSelectRecipe::execute completes
the transition for the recipe.

It provides the following advantages:

1. Less indirection, easier to see what's going on.
2. Removes accesses to fields of ILV.

2) in particular ensures that no dependencies on
fields in ILV for vector code generation are re-introduced.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D114323

[Analyzer][Core] Make SValBuilder to better simplify svals with 3 symbols in the tree

Add the capability to simplify more complex constraints where there are 3
symbols in the tree. In this change I extend simplifySVal to query constraints
of children sub-symbols in a symbol tree. (The constraint for the parent is
asked in getKnownValue.)

Differential Revision: https://reviews.llvm.org/D103317

[Analyzer][solver] Do not remove the simplified symbol from the eq class

Currently, during symbol simplification we remove the original member symbol
from the equivalence class (`ClassMembers` trait). However, we keep the
reverse link (`ClassMap` trait), in order to be able the query the
related constraints even for the old member. This asymmetry can lead to
a problem when we merge equivalence classes:
```
ClassA: [a, b]   // ClassMembers trait,
a->a, b->a       // ClassMap trait, a is the representative symbol
```
Now lets delete `a`:
```
ClassA: [b]
a->a, b->a
```
Let's merge the trivial class `c` into ClassA:
```
ClassA: [c, b]
c->c, b->c, a->a
```
Now after the merge operation, `c` and `a` are actually in different
equivalence classes, which is inconsistent.

One solution to this problem is to simply avoid removing the original
member and this is what this patch does.

Other options I have considered:
1) Always merge the trivial class into the non-trivial class. This might
   work most of the time, however, will fail if we have to merge two
   non-trivial classes (in that case we no longer can track equivalences
   precisely).
2) In `removeMember`, update the reverse link as well. This would cease
   the inconsistency, but we'd loose precision since we could not query
   the constraints for the removed member.

Differential Revision: https://reviews.llvm.org/D114619

[lldb] Remove 'extern "C"' from the lldb-swig-python interface

The LLDBSWIGPython functions had (at least) two problems:
- There wasn't a single source of truth (a header file) for the
  prototypes of these functions. This meant that subtle differences
  in copies of function declarations could go by undetected. And
  not-so-subtle differences would result in strange runtime failures.
- All of the declarations had to have an extern "C" interface, because
  the function definitions were being placed inside and extert "C" block
  generated by swig.

This patch fixes both problems by moving the function definitions to the
%header block of the swig files. This block is not surrounded by extern
"C", and seems more appropriate anyway, as swig docs say it is meant for
"user-defined support code" (whereas the previous %wrapper code was for
automatically-generated wrappers).

It also puts the declarations into the SWIGPythonBridge header file
(which seems to have been created for this purpose), and ensures it is
included by all code wishing to define or use these functions. This
means that any differences in the declaration become a compiler error
instead of a runtime failure.

Differential Revision: https://reviews.llvm.org/D114369

[GlobalISel] Add matchers for constant splat.

This change exposes isBuildVectorConstantSplat() to the llvm namespace
and uses it to implement the constant splat versions of
m_SpecificICst().

CombinerHelper::matchOrShiftToFunnelShift() can now work with vector
types and CombinerHelper::matchMulOBy2()'s match for a constant splat is
simplified.

Differential Revision: https://reviews.llvm.org/D114625

[AMDGPU] Update docs for nontemporal store

Update the documented GFX10 code sequence for nontemporal stores after
D114351.

Differential Revision: https://reviews.llvm.org/D114707

[clang][ARM] PACBTI-M assembly support

Introduce assembly support for Armv8.1-M PACBTI extension. This is an optional
extension in v8.1-M.

There are 10 new system registers and 5 new instructions, all predicated on the
feature.

The attribute for llvm-mc is called "pacbti". For armclang, an architecture
extension also called "pacbti" was created.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Victor Campos
- Ties Stuij

Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D112420

[mlir] Decompose Bufferization Clone operation into Memref Alloc and Copy.

This patch introduces a new conversion to convert bufferization.clone operations
into a memref.alloc and a memref.copy operation. This transformation is needed to
transform all remaining clones which "survive" all previous transformations, before
a given program is lowered further (to LLVM e.g.). Otherwise, these operations
cannot be handled anymore and lead to compile errors.
See: https://llvm.discourse.group/t/bufferization-error-related-to-memref-clone/4665

Differential Revision: https://reviews.llvm.org/D114233

[clangd] Make std symbol generation script python3 friendly

Differential Revision: https://reviews.llvm.org/D114723

[mlir] Move bufferization-related passes to `bufferization` dialect.

[RFC](https://llvm.discourse.group/t/rfc-dialect-for-bufferization-related-ops/4712)

Differential Revision: https://reviews.llvm.org/D114698

[mlir][OpDSL] Fix OpDSL tests after https://reviews.llvm.org/D114680.

Update the shapes of the convolution / pooling tests that where detected after enabling verification during printing (https://reviews.llvm.org/D114680). Also split the emit_structured_generic.py file that previously contained all tests into multiple separate files to simplify debugging.

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D114731

[ELF] Move ObjFile<ELFT>::{getLocalSymbols,getGlobalSymbols} to non-template ELFFileBase. NFC

[RISCV] Decode vtype with reserved fields to raw immediate

This patch fixes a crash when doing "llvm-objdump -D --mattr=+experimental-v"
against an object file which happens to keep a word that can be decoded to
VSETVLI & VSETIVLI with reserved vlmul[2:0]=4. All vtype values with
reserved fields (vlmul[2:0]=4, vsew[2:0]=0b1xx, non-zero bits 8/9/10) are
printed to raw immediate.

Reviewed By: jhenderson, jrtc27, craig.topper

Differential Revision: https://reviews.llvm.org/D114581

[PR52549][clang-cl] Predefine _MSVC_EXECUTION_CHARACTER_SET

Since VS 2022 17.1 MSVC predefines _MSVC_EXECUTION_CHARACTER_SET to inform the users of the execution character set defined at compile time. The value the macro expands to is a Windows Code Page Identifier which are documented here: https://docs.microsoft.com/en-us/windows/win32/intl/code-page-identifiers

As clang currently only supports UTF-8 it is defined as 65001. If clang-cl were to support a different execution character set in the future we'd have to change the value.

Fixes https://bugs.llvm.org/show_bug.cgi?id=52549

Differential Revision: https://reviews.llvm.org/D114576

[llvm-profgen] Compute and show profile density

AutoFDO performance is sensitive to profile density, i.e., the amount of samples in the profile relative to the program size, because profiles with insufficient samples could be inaccurate due to statistical noise and thus hurt AutoFDO performance. A previous investigation showed that AutoFDO performed better on MySQL with increased amount of samples. Therefore, we implement a profile-density computation feature to give hints about profile density to users and the compiler.

We define the density of a profile Prof as follows:

- For each function A in the profile, density(A) = total_samples(A) / sizeof(A).
- density(Prof) = min(density(A)) for all functions A that are warm (defined below).

A function is considered warm if its total-samples is within top N percent of the profile. For implementation, we reuse the `ProfileSummaryBuilder::getHotCountThreshold(..)` as threshold which can be set by percent(`--profile-summary-cutoff-hot`) or by value(`--profile-summary-hot-count`).

We also introduce `--hot-function-density-threshold` to set hot function density threshold and will give suggestion if profile density is below it which implies we should increase samples.

This also applies for CS profile with all profiles merged into base.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D113781

[X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()`

We ask `TTI.getAddressComputationCost()` about the cost of computing vector address,
and then multiply it by the vector width. This doesn't make any sense,
it implies that we'd do a vector GEP and then scalarize the vector of pointers,
but there is no such thing in the vectorized IR, we perform scalar GEP's.

This is *especially* bad on X86, and was effectively prohibiting any scalarized
vectorization of gathers/scatters, because `X86TTIImpl::getAddressComputationCost()`
says that cost of vector address computation is `10` as compared to `1` for scalar.

The computed costs are similar to the ones with D111222+D111220,
but we end up without masked memory intrinsics that we'd then have to
expand later on, without much luck. (D111363)

Differential Revision: https://reviews.llvm.org/D111460

[ARM] create new pseudo t2LDRLIT_ga_pcrel for stack guards

We can't use the existing pseudo ARM::tLDRLIT_ga_pcrel for loading the
stack guard for PIC code that references the GOT, since arm-pseudo may
expand this to the narrow tLDRpci rather than the wider t2LDRpci.

Create a new pseudo, t2LDRLIT_ga_pcrel, and expand it to t2LDRpci.

Fixes: https://bugs.chromium.org/p/chromium/issues/detail?id=1270361

Reviewed By: ardb

Differential Revision: https://reviews.llvm.org/D114762

[clang-tidy] Warn on functional C-style casts

The google-readability-casting check is meant to be on par
with cpplint's readability/casting check, according to the
documentation. However it currently does not diagnose
functional casts, like:

float x = 1.5F;
int y = int(x);

This is detected by cpplint, however, and the guidelines
are clear that such a cast is only allowed when the type
is a class type (constructor call):

> You may use cast formats like `T(x)` only when `T` is a class type.

Therefore, update the clang-tidy check to check this
case.

Differential Revision: https://reviews.llvm.org/D114427

[ELF] Move GOT/PLT relocation code closer. NFC

[X86][clang] Enable floating-point type for -mno-x87 option on 32-bits

We should match GCC's behavior which allows floating-point type for -mno-x87 option on 32-bits. https://godbolt.org/z/KrbhfWc9o

The previous block issues have partially been fixed by D112143.

Reviewed By: asavonic, nickdesaulniers

Differential Revision: https://reviews.llvm.org/D114162

[mlir][python] Audit and fix a lot of the Python pyi stubs.

* Classes that are still todo are marked with "# TODO: Auto-generated. Audit and fix."
* Those without this note have been cross-checked with C++ sources and most have been spot checked by hovering in VsCode.

Differential Revision: https://reviews.llvm.org/D114767

[mlir][python] Implement more SymbolTable methods.

* set_symbol_name, get_symbol_name, set_visibility, get_visibility, replace_all_symbol_uses, walk_symbol_tables
* In integrations I've been doing, I've been reaching for all of these to do both general IR manipulation and module merging.
* I don't love the replace_all_symbol_uses underlying APIs since they necessitate SYMBOL_COUNT walks and have various sharp edges. I'm hoping that whatever emerges eventually for this can still retain this simple API as a one-shot.

Differential Revision: https://reviews.llvm.org/D114687

[mlir][python] Add pyi stub files to enable auto completion.

There is no completely automated facility for generating stubs that are both accurate and comprehensive for native modules. After some experimentation, I found that MyPy's stubgen does the best at generating correct stubs with a few caveats that are relatively easy to fix:
  * Some types resolve to cross module symbols incorrectly.
  * staticmethod and classmethod signatures seem to always be completely generic and need to be manually provided.
  * It does not generate an __all__ which, from testing, causes namespace pollution to be visible to IDE code completion.

As a first step, I did the following:
  * Ran `stubgen` for `_mlir.ir`, `_mlir.passmanager`, and `_mlirExecutionEngine`.
  * Manually looked for all instances where unnamed arguments were being emitted (i.e. as 'arg0', etc) and updated the C++ side to include names (and re-ran stubgen to get a good initial state).
  * Made/noted a few structural changes to each `pyi` file to make it minimally functional.
  * Added the `pyi` files to the CMake rules so they are installed and visible.

To test, I added a `.env` file to the root of the project with `PYTHONPATH=...` set as per instructions. Then reload the developer window (in VsCode) and verify that completion works for various changes to test cases.

There are still a number of overly generic signatures, but I want to check in this low-touch baseline before iterating on more ambiguous changes. This is already a big improvement.

Differential Revision: https://reviews.llvm.org/D114679

[DebugInfo] Do not replace existing nodes from DICompileUnit

When creating a new DIBuilder with an existing DICompileUnit, load the
DINodes from the current DICompileUnit so they don't get overwritten.
This is done in the MachineOutliner pass, but it didn't change the CU so
the bug never appeared. We need this if we ever want to add DINodes to
the CU after it has been created, e.g., DIGlobalVariables.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D114556

[AMDGPU] Enable copy between VGPR and AGPR classes during regalloc

Greedy register allocator prefers to move a constrained
live range into a larger allocatable class over spilling
them. This patch defines the necessary superclasses for
vector registers. For subtargets that support copy between
VGPRs and AGPRs, the vector register spills during regalloc
now become just copies.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D109301

[TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB

Currently we create register mappings for registers used only once in current
MBB. For registers with multiple uses, when all the uses are in the current MBB,
we can also create mappings for them similarly according to the last use.
For example

    %reg101 = ...
            = ... reg101
    %reg103 = ADD %reg101, %reg102

We can create mapping between %reg101 and %reg103.

Differential Revision: https://reviews.llvm.org/D113193

[RISCV] Promote f16 log/pow/exp/sin/cos/etc. to f32 libcalls.

Prevents crashes or cannot select errors.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D113822

[NFC][sanitizer] Track progress of populating the block

In multi-threaded application concurrent StackStore::Store may
finish in order different from assigned Id. So we can't assume
that after we switch writing the next block the previous is done.

The workaround is to count exact number of uptr stored into the block,
including skipped tail/head which were not able to fit entire trace.

Depends on D114490.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D114493

[RISCV] Fix a bug in RISCVFrameLowering.

When we have out-going arguments passing through stack and we do not
reserve the stack space in the prologue. Use BP to access stack objects
after adjusting the stack pointer before function calls.

callseq_start -> sp = sp - reserved_space
//
// Use FP to access fixed stack objects.
// Use BP to access non-fixed stack objects.
//
call @foo
callseq_end -> sp = sp + reserved_space

Differential Revision: https://reviews.llvm.org/D114246

[RISCV] Add a test case to show the bug in RISCVFrameLowering.

If the number of arguments is too large to use register passing, it
needs to occupy stack space to pass the arguments to the callee. There
are two scenarios. One is to reserve the space in prologue and the other
is to reserve the space before the function calls. When we need to
reserve the stack space before function calls, the stack pointer is
adjusted. Under the scenario, we should not use stack pointer to access
the stack objects. It looks like,

callseq_start -> sp = sp - reserved_space
//
// We should not use SP to access stack objects in this area.
//
call @foo
callseq_end -> sp = sp + reserved_space

Differential Revision: https://reviews.llvm.org/D114245

[NFC] Header comment in X86RegisterBanks.td referred to Aarch64

Differential Revision: https://reviews.llvm.org/D114763

[sanitizer] Add Leb128 encoding/decoding

Reviewed By: dvyukov, kstoimenov

Differential Revision: https://reviews.llvm.org/D114464

Revert "[lldb][NFC] Format lldb/include/lldb/Symbol/Type.h"

This reverts commit 6f99e1aa58e3566fcce689bc986b7676e818c038.

Add missing header

[mlir][sparse] generalize sparse tensor output implementation

Moves sparse tensor output support forward by generalizing from injective
insertions only to include reductions. This revision accepts the case with all
parallel outer and all reduction inner loops, since that can be handled with
an injective insertion still. Next revision will allow the inner parallel loop
to move inward (but that will require "access pattern expansion" aka "workspace").

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D114399

X86: Fold masked-merge when and-not is not available

Differential Revision: https://reviews.llvm.org/D112754

Tests for D112754

Differential Revision: https://reviews.llvm.org/D113151

[Demangle] Add support for D anonymous symbols

    Anonymous symbols are represented by 0 in the mangled symbol. We should skip
    them in order to represent the demangled name correctly, otherwise demangled
    names like `demangle..anon` can happen.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D114307

[Demangle] Add support for multiple identifiers in D qualified names

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D114305

[Demangle] Add support for D simple single qualified names

This patch adds support for simple single qualified names that includes
internal mangled names and normal symbol names.

Differential Revision: https://reviews.llvm.org/D111415

[NFC][Regalloc] Split canEvictInterference into hint and general

There are 2 eviction queries. One is made by tryAssign, when it attempts to
free an interference occupying the hint of the candidate. The other is
during 'regular' interference resolution, where we scan over all
physical registers and try to see if we can evict live ranges in favor
of the candidate. We currently use the same logic in both cases, just
that the former never passes the cost to any subsequent query.
Technically, the 2 decisions could be implemented with different
policies.

This patch splits the 2.

RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html

Differential Revision: https://reviews.llvm.org/D114019

[lldb][NFC] Format lldb/include/lldb/Symbol/Type.h

Reviewed By: teemperor, JDevlieghere

Differential Revision: https://reviews.llvm.org/D113604

Signed-off-by: Luís Ferreira <contact@lsferreira.net>

[openmp][devicertl] Add a missing loader_uninitialized attribute

[clang-tidy] Fix pr48613: "llvm-header-guard uses a reserved identifier"

Fixes https://bugs.llvm.org/show_bug.cgi?id=48613.

llvm-header-guard is suggesting header guards with leading underscores
if the header file path begins with a '/' or similar special character.
Only reserved identifiers should begin with an underscore.

Differential Revision: https://reviews.llvm.org/D114149

[DebugInfo][InstrRef] Terminate overlapping variable fragments

If we have a variable where its fragments are split into overlapping
segments:

    DBG_VALUE $ax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment_0, 16)
    ...
    DBG_VALUE $eax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment_0, 32)

we should only propagate the most recently assigned fragment out of a
block. LiveDebugValues only deals with live-in variable locations, as
overlaps within blocks is DbgEntityHistoryCalculators domain.

InstrRefBasedLDV has kept the accumulateFragmentMap method from
VarLocBasedLDV, we just need it to recognise DBG_INSTR_REFs. Once it's
produced a mapping of variable / fragments to the overlapped variable /
fragments, VLocTracker uses it to identify when a debug instruction needs
to terminate the other parts it overlaps with. The test is updated for
some standard "InstrRef picks different registers" variation, and the
order of some unrelated DBG_VALUEs changes.

Differential Revision: https://reviews.llvm.org/D114603

[CVP] Remove ashr of -1 or 0

Fixes PR#52190. There is already a check for converting ashr instructions with non-negative left-hand sides into lshr; this patch adds an optimization to remove ashr altogether if the left-hand side is known to be in the range [-1, 1).

Differential Revision: https://reviews.llvm.org/D113835

[SCEVExpander] Drop poison generating flags when reusing instructions

The basic problem we have is that we're trying to reuse an instruction which is mapped to some SCEV. Since we can have multiple such instructions (potentially with different flags), this is analogous to our need to drop flags when performing CSE. A trivial implementation would simply drop flags on any instruction we decided to reuse, and that would be correct.

This patch is almost that trivial patch except that we preserve flags on the reused instruction when existing users would imply UB on overflow already. Adding new users can, at most, refine this program to one which doesn't execute UB which is valid.

In practice, this fixes two conceptual problems with the previous code: 1) a binop could have been canonicalized into a form with different opcode or operands, or 2) the inbounds GEP case which was simply unhandled.

On the test changes, most are pretty straight forward. We loose some flags (in some cases, they'd have been dropped on the next CSE pass anyways). The one that took me the longest to understand was the ashr-expansion test. What's happening there is that we're considering reuse of the mul, previously we disallowed it entirely, now we allow it with no flags. The surrounding diffs are all effects of generating the same mul with a different operand order, and then doing simple DCE.

The loss of the inbounds is unfortunate, but even there, we can recover most of those once we actually treat branch-on-poison as immediate UB.

Differential Revision: https://reviews.llvm.org/D112734

[DebugInfo][InstrRef][NFC] "Final" x86 test cleanup

These are some final test changes for using instruction referencing on X86:
* Most of these tests just have the flag switched so that they run with
instr-ref, and just work: these tests were fixed by earlier patches.
* There are some spurious differences in textual outputs,
* A few have different temporary labels in the output because more
MCSymbols are printed to the output.

Differential Revision: https://reviews.llvm.org/D114588

[unroll] Use early return in shouldPartialUnroll [nfc]

[unroll] Reduce scope of UnrollFactor variable in computeUnrollCount [NFC]

Suggested in review of D114453, done as a separate change to get all uses at once.

[DebugInfo][InstrRef] Add indirection from dbg.declare in SelectionDAG

Usually dbg.declares get translated into either entries in an MF
side-table, or a DBG_VALUE on entry to the function with IsIndirect set
(including in instruction referencing mode). Much rarer is a dbg.declare
attached to a non-argument value, such as in the test added in this patch
where there's a variable-length-array. Such dbg.declares become SDDbgValue
nodes with InIndirect=true.

As it happens, we weren't correctly emitting DBG_INSTR_REFs with the
additional indirection. This patch adds the extra indirection, encoded as
adding an additional DW_OP_deref to the expression.

Differential Revision: https://reviews.llvm.org/D114440

[unroll] Split full exact and full bound unroll costing [NFC]

This change should be NFC. It's posted for review mostly to make sure others are happy with the names I'm introducing for "exact full unroll" and "bounded full unroll". The motivation here is that our cost model for bounded unrolling is too aggressive - it gives benefits for exits we aren't going to prune - but I also just think the new version of the code is a lot easier to follow.

Differential Revision: https://reviews.llvm.org/D114453

[ELF] --cref: If -Map is specified, print to the map file

PR48282: This behavior matches GNU ld and gold.

Reviewed By: markj

Differential Revision: https://reviews.llvm.org/D114663

[InstCombine] try to fold 'or' into 'mul' operand

or (mul X, Y), X --> mul X, (add Y, 1) (when the multiply has no common bits with X)

We already have this fold if the pattern ends in 'add', but we can miss it if the
'add' becomes 'or' via another no-common-bits transform.

This is part of fixing:
http://llvm.org/PR49055
...but it won't make a difference on that example yet.

https://alive2.llvm.org/ce/z/Vrmoeb

Differential Revision: https://reviews.llvm.org/D114729

[DebugInfo][InstrRef] Preserve properties of restored variables

InstrRefBasedLDV observes when variable locations are clobbered, scans what
values are available in the machine, and re-issues a DBG_VALUE for the
variable if it can find another location. Unfortunately, I hadn't joined up
the Indirectness flag, so if it did this to an Indirect Value, the
indirectness would be dropped.

Fix this, and add a test that if we clobber a variable value (on the stack
in this case), then the recovered variable location keeps the Indirect
flag.

Differential Revision: https://reviews.llvm.org/D114378

[DAG] Add tests for fpsti.sat for various architectures. NFC

OpenMP: Correctly query location for amdgpu-arch

This was trying to figure out the build path for amdgpu-arch, and
making assumptions about where it is which were not working on my
system. Whether a standalone build or not, we should have a proper
imported target to get the location from.

Update unit test API usage (NFC)

[DebugInfo][InstrRef][NFC] Test changes: DBG_VALUE to DBG_INSTR_REF

This patch contains a bunch of replacements of:

    DBG_VALUE $somereg

with,

    SOMEINST debug-instr-number1
    DBG_INSTR_REF 1, 0, ...

It's mostly SelectionDAG tests that are making sure that the variable
location assignment is placed in the correct position in the instructions.

To avoid a loss in test coverage of SelectionDAG, which is used by a lot
of different backends, all these tests now have two modes and sets of RUN
lines, one for DBG_VALUE mode, the other for instruction referencing.

Differential Revision: https://reviews.llvm.org/D114258

Revert "OpenMP: Start calling setTargetAttributes for generated kernels"

This reverts commit 6c27d389c8a00040aad998fe959f38ba709a8750.

This is failing on the buildbots

[mlir][sparse] some leftover cleanup from migration to bufferization dialect

Reviewed By: pifon2a

Differential Revision: https://reviews.llvm.org/D114730

[LICM] Regenerate test checks (NFC)

[InstCombine] add tests for or with mul operand; NFC

[InstCombine] (~(a | b) & c) | ~(c | (a ^ b)) -> ~((a | b) & (c | (b ^ a)))

```
----------------------------------------
define i3 @src(i3 %a, i3 %b, i3 %c) {
%0:
  %or1 = or i3 %b, %c
  %not1 = xor i3 %or1, 7
  %and1 = and i3 %a, %not1
  %xor1 = xor i3 %b, %c
  %or2 = or i3 %xor1, %a
  %not2 = xor i3 %or2, 7
  %or3 = or i3 %and1, %not2
  ret i3 %or3
}
=>
define i3 @tgt(i3 %a, i3 %b, i3 %c) {
%0:
  %obc = or i3 %b, %c
  %xbc = xor i3 %b, %c
  %o = or i3 %a, %xbc
  %and = and i3 %obc, %o
  %r = xor i3 %and, 7
  ret i3 %r
}
Transformation seems to be correct!
```

Differential Revision: https://reviews.llvm.org/D112955

[NFC][clang]Increase the number of driver diagnostics

We're close to hitting the limited number of driver diagnostics, increase `DIAG_SIZE_DRIVER` to accommodate more.

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D114615

[HIP] Add atomic load, atomic store and atomic cmpxchng_weak builtin support in HIP-clang

Introduce `__hip_atomic_load`, `__hip_atomic_store` and `__hip_atomic_compare_exchange_weak`
builtins in HIP.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D114553

[LLDB][NativePDB] fix find-functions.cpp failure on windows bots (2)

OpenMP: Start calling setTargetAttributes for generated kernels

This wasn't setting any of the attributes the target would expect to
emit for kernels.

[libc++] Fix incorrect REQUIRES on a locale-dependent test

The test doesn't depend specifically on the en_US.UTF-8 locale, instead
it depends on whether localization support exists, period.

Differential Revision: https://reviews.llvm.org/D114708

[NFC][AIX]Disable unsupported hip test on AIX

AIX doesn't support GPU. There is no point testing HIP on it.

Reviewed By: Jake-Egan

Differential Revision: https://reviews.llvm.org/D114484

[LLDB][NativePDB] fix find-functions.cpp failure on windows bots

[mlir] Handle an edge case when folding reshapes with multiple trailing 1 dimensions

We would exit early and miss this case.

Differential Revision: https://reviews.llvm.org/D114711

[llvm] Use range-based for loops (NFC)

[InstCombine] Fold (~A | B) ^ A --> ~(A & B)

https://alive2.llvm.org/ce/z/gLrYPk

Fixes:
https://llvm.org/PR52518

Reviewed by: spatel

Differential revision: https://reviews.llvm.org/D114339

[SCEV] Remove incorrect assert

Fix assertion failure reported on D113349 by removing the assert.
While the produced expression should be equivalent, it may not
be strictly the same, e.g. due to lazy nowrap flag updates. Similar
to what the main createSCEV() code does, simply retain the old
value map entry if one already exists.

[HWASan] Disable LTO test on aarch64.

It fails for non-Android aarch64 bots as well.

[fir] Get rid of the global option in FIRBuilder

Replace the global option `nameLengthHashSize` with a constexpr
with the same name. The option was not used in fir-dev so switching
to a constexpr is fine.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D114630

[X86][Costmodel] `getInterleavedMemoryOpCostAVX512()`: masked load can not be folded into a shuffle

The mask on the shuffle is for the output, not the input.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114697

[AMDGPU][GlobalISel] Transform (fsub (fpext (fneg (fmul x, y))), z) -> (fneg (fma (fpext x), (fpext y), z))

Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D98050

[AMDGPU][GlobalISel] Transform (fsub (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), (fneg z))

Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D98049

[AMDGPU][GlobalISel] Transform (fsub (fneg (fmul, x, y)), z) -> (fma (fneg x), y, (fneg z))

Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D98048

[AMDGPU][GlobalISel] Transform (fsub (fmul x, y), z) -> (fma x, y, -z)

Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D96614

[AMDGPU][GlobalISel] Transform (fadd (fma x, y, (fpext (fmul u, v))), z) -> (fma x, y, (fma (fpext u), (fpext v), z))

Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D98047

[AMDGPU][GlobalISel] Transform (fadd (fma x, y, (fmul u, v)), z) -> (fma x, y, (fma u, v, z))

Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D97938

[AMDGPU][GlobalISel] Transform (fadd (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), z)

Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D97937

[AMDGPU][GlobalISel] Transform (fadd (fmul x, y), z) -> (fma x, y, z)

Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D93305

[AMDGPU] Fix list indentation in docs

[AMDGPU] Fix "must generated" typo in docs

[X86] Add vector test coverage for or with no common bits tests

Ensure D113970 handles vector types patterns as well.

Differential Revision: https://reviews.llvm.org/D114575

[mlir][memref] Fix bug in verification of memref.collapse_shape

The verifier computed an illegal type with negative dimension size when collapsing partially static memrefs.

Differential Revision: https://reviews.llvm.org/D114702

Reapply 'Implement target_clones multiversioning'

See discussion in D51650, this change was a little aggressive in an
error while doing a 'while we were here', so this removes that error
condition, as it is apparently useful.

This reverts commit bb4934601d731465e01e2e22c80ce2dbe687d73f.

[clang-format] regressed default behavior for operator parentheses

{D110833} regressed behavior of spaces before parentheses for operators, this revision reverts that so that operators are handled as they were before.

I think in hindsight it was a mistake to try and consume operator behaviour in with the function behaviour, I think Operators can be considered a special style. Its seems the code is getting confused as to if this is a function declaration or definition.

I think latterly we can consider adding an operator parentheses specific custom option but this should have been explicitly called out as it can impact projects.

Reviewed By: HazardyKnusperkeks, curdeius

Differential Revision: https://reviews.llvm.org/D114696

[CodeGen][AArch64] Bail out in performConcatVectorsCombine for scalable vectors

I tried to exercise the existing combine patterns in performConcatVectorsCombine
for scalable vectors and at the moment it doesn't seem possible. Parts of
the code currently assume we're dealing with fixed-width vectors with calls
to getVectorNumElements(), therefore I've decided to simply bail out early
for scalable vectors.

Added a test here to show that we don't crash when attempting to combine
truncate + concat:

CodeGen/AArch64/concat_vector-truncate-combine.ll

Differential Revision: https://reviews.llvm.org/D114600