review.tizen.org Git - platform/upstream/llvm.git/log

[InstCombine] canonicalize icmp with trunc op into mask and cmp, part 2

If C is a high-bit mask:
(trunc X) u< C --> (X & C) != C (are any masked-high-bits clear?)

If C is low-bit mask:
(trunc X) u> C --> (X & ~C) != 0 (are any masked-high-bits set?)

If C is not-of-power-of-2 (one clear bit):
(trunc X) u> C --> (X & (C+1)) == C+1 (are all masked-high-bits set?)

This extends the fold added with:
acabad9ff6bf (https://alive2.llvm.org/ce/z/aFr7qV)

Using decomposeBitTestICmp() to generalize this is a planned follow-up, but that requires removing an inverse fold.

Here are Alive2 generalizations for these folds:
https://alive2.llvm.org/ce/z/u-ZpC_ (ult, the previous patch)
https://alive2.llvm.org/ce/z/YsuAu2 (ult, this patch)
https://alive2.llvm.org/ce/z/ekktQP (ugt, low bitmask)
https://alive2.llvm.org/ce/z/pJY9wR (ugt, one clear bit)

Differential Revision: https://reviews.llvm.org/D112634

[SLP]Improve cost of the gather nodes.

No need to count the final shuffle cost for the constants, gathering of
the constants is just a constant vector + extra inserts, if required.

Differential Revision: https://reviews.llvm.org/D113770

[AIX] XFAIL lto-comp-dir.ll for lack of .file directive support

This test explicitly checks for .file directives, which is not currently supported on AIX. This patch sets this test to XFAIL on AIX for now.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D113640

[SLP]Fix windows build, NFC.

Need to put `IndexIdx` var to the list of captures.

[lld-macho][nfc] rename parsed-section types & variables

This is an NFC diff that prepares for pruning & relocating `__eh_frame`.

Along the way, I made the following changes to ...
* clarify usage of `section` vs. `subsection`
* remove `map` & `vec` from type names
* disambiguate class `Section` from template parameter `SectionHeader`.

Differential Revision: https://reviews.llvm.org/D113241

[flang] Allow write after non advancing read in IO runtime

1. To avoid overwriting the part of the record read in the non advancing read,
the furtherPositionInRecord field must be set to the max of the
furtherPositionInRecord and the positionInRecord at the beginning of the
IO write.

2. To allow any further read to succeed after the write, the unit
beganReadingRecord_ must be set to false when resetting the recordLength
during the write, otherwise, recordLength will not be computed in further
read and an assert is hit (at unit.cpp(398)).

The added unit test exercises both of these scenarios.

Differential Revision: https://reviews.llvm.org/D113740

[SystemZ][z/OS] Fix warnings from unsigned int to long in 32-bit mode

This patch fixes the warnings which shows up when libcxx library started to be compiled in 32-bit mode on z/OS.
More specifically, the assignment from unsigned int to time_t aka long was flags as follows:
```
libcxx/include/c++/v1/__support/ibm/nanosleep.h:31:11: warning: implicit conversion changes signedness: 'unsigned int' to 'time_t' (aka 'long') [-Wsign-conversion]
  __sec = sleep(static_cast<unsigned int>(__sec));
        ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
libcxx/include/c++/v1/__support/ibm/nanosleep.h:36:36: warning: implicit conversion changes signedness: 'unsigned int' to 'long' [-Wsign-conversion]
      __rem->tv_nsec = __micro_sec * 1000;
                     ~ ~~~~~~~~~~~~^~~~~~
libcxx/include/c++/v1/__support/ibm/nanosleep.h:47:36: warning: implicit conversion changes signedness: 'unsigned int' to 'long' [-Wsign-conversion]
      __rem->tv_nsec = __micro_sec * 1000;
                     ~ ~~~~~~~~~~~~^~~~~~
3 warnings generated.
```

Here is a small test case illustrating the issue:

```
typedef long    time_t ;
unsigned int sleep(unsigned int );
int main() {
  time_t sec = 0;
#ifdef FIX
  sec = static_cast<time_t>(sleep(static_cast<unsigned int>(sec)));
#else
  sec = sleep(static_cast<unsigned int>(sec));
#endif
}
```
clang++ -c -Wsign-conversion -m32 t.C
```
t.C:8:9: warning: implicit conversion changes signedness: 'unsigned int' to 'time_t' (aka 'long') [-Wsign-conversion]
  sec = sleep(static_cast<unsigned int>(sec));
      ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reviewed By: ldionne, #libc, Quuxplusone, Mordante

Differential Revision: https://reviews.llvm.org/D112837

[flang][CodeGen] Transform `fir.boxchar_len` to a sequence of LLVM MLIR

This patch extends the `FIRToLLVMLowering` pass in Flang by adding a
hook to transform `fir.boxchar_len` to a sequence of LLVM MLIR
instructions.

This is part of the upstreaming effort from the `fir-dev` branch in [1].

[1] https://github.com/flang-compiler/f18-llvm-project

Differential Revision: https://reviews.llvm.org/D113763

Originally written by:
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>

[SLP]Adjust GEP indices types when trying to build entries.

Need to adjust the types of GEPs indices when building the tree
entries/operands. Otherwise some of the nodes might differ and
vectorizer is unable to correctly find them and count their cost.

Differential Revision: https://reviews.llvm.org/D113792

[SLP][NFC]Add more tests for shuffles that can be optimized after SLP,
NFC.

[fir] Add fir.gentypedesc conversion

Add conversion pattern for the GenTypeDescOp.
Convert to a global constant with an addressof.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D113766

Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>

[OpenMP] Fix initializer not working on AMDGPU

The RAII class used for debugging RTL entry used a shared variable to
keep track of the current depth. This used a global initializer, which
isn't supported on AMDGPU. This patch removes the initializer and
instead sets it to zero when the state is initialized in the runtime.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D113963

[NFC][clangd] cleaning up unused "using"

Cleaning up unused "using" declarations.
This patch was generated from automatically applyning clang-tidy fixes.

Differential Revision: https://reviews.llvm.org/D113891

[llvm-reduce] Add new BitWriter dependency after 28d95a26109e.

[IndVarSimplify] Reduce nondeterministic behaviour in visitIVCast.

rGf39978b84f1d3a1da6c32db48f64c8daae64b3ad led to and/or exposed
an issue with IndVarSimplification for a loop where a i32 phi node is
no longer replaced by a widened (i64) phi node, because the SCEVs of a
sign-extend no longer folded the same way. I'm unsure how to properly
explain this because it's all rather complicated, but in short: SCEVs
don't fold as nicely as they used to and this caused a difference.

While investigating this, I found that IndVarSimplify can actually
optimise the case in the way we want to if it chooses the widened IV to
be 'signed' (the i32 IV is both sign and zero-extended). Oddly enough,
there is some level of indeterminism in the way the algorithm works,
it just picks the sign of the 'first' zext/sext user, where the order of
the users-iterator is not guaranteed to be the same on each invocation
of the pass (e.g. shown by first running loop-rotate, which puts the
users in a different order).

While I think the fix is valid in the sense that consistently picking
_any_ order is better than having an nondeterministic order, I can
use a bit of advice from people more familiar in this area of the
code-base.

For example, I'm not sure if this fix is hiding another issue where the
IndVarSimplify pass could actually draw the same conclusions (i.e. that
it only needs an i64 phi node) if it does a bit more work, regardless
of whether it chooses the induction variable to be signed or unsigned.

I'm also not sure if choosing signed is better than unsigned, or whether
that just happens to be beneficial only in this individual case.

Any feedback would be much appreciated!

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D112573

[llvm-reduce] Allow writing temporary files as bitcode.

Textual LLVM IR files are much bigger and take longer to write to disk.
To avoid the extra cost incurred by serializing to text, this patch adds
an option to save temporary files as bitcode instead.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D113858

[fir] Add fir.cmpc conversion

This patch adds the codegen for fir.cmpc. The real and imaginary parts
are extracted and compared separately. For the .EQ. predicate the
results are AND'd, for the .NE. predicate the results are OR'd, and for
other predicates we keep only the result on the real parts.

This patch is part of the upstreaming effort from fir-dev.

Differential Revision: https://reviews.llvm.org/D113976

Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>

Fix false positive in `bugprone-throw-keyword-missing` check

Fixes PR#52400. The tests for bugprone-throw-keyword-missing actually
already contain exceptions as class members, but not as members with
initializers, which was probably just an oversight.

[lldb] Refactor Platform::ResolveExecutable

Module resolution is probably the most complex piece of lldb [citation
needed], with numerous levels of abstraction, each one implementing
various retry and fallback strategies.

It is also a very repetitive, with only small differences between
"host", "remote-and-connected" and "remote-but-not-(yet)-connected"
scenarios.

The goal of this patch (first in series) is to reduce the number of
abstractions, and deduplicate the code.

One of the reasons for this complexity is the tension between the desire
to offload the process of module resolution to the remote platform
instance (that's how most other platform methods work), and the desire
to keep it local to the outer platform class (its easier to subclass the
outer class, and it generally makes more sense).

This patch resolves that conflict in favour of doing everything in the
outer class. The gdb-remote (our only remote platform) implementation of
ResolveExecutable was not doing anything gdb-specific, and was rather
similar to the other implementations of that method (any divergence is
most likely the result of fixes not being applied everywhere rather than
intentional).

It does this by excising the remote platform out of the resolution
codepath. The gdb-remote implementation of ResolveExecutable is moved to
Platform::ResolveRemoteExecutable, and the (only) call site is
redirected to that. On its own, this does not achieve (much), but it
creates new opportunities for layer peeling and code sharing, since all
of the code now lives closer together.

Differential Revision: https://reviews.llvm.org/D113487

[mlir][spirv] add AtomicFAddEXTOp

Differential Revision: https://reviews.llvm.org/D113764

[SCEV] Support rewriting ZExt expressions with loop guard info.

So far, applying loop guard information has been restricted to
SCEVUnknown. In a few cases, like PR40961 and PR52464, this leads to
SCEV failing to determine tight upper bounds for the backedge taken
count.

This patch adjusts SCEVLoopGuardRewriter and applyLoopGuards to support
re-writing ZExt expressions.

This is a first step towards fixing PR40961 and PR52464.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D113577

[AArch64][SVE] Instcombine SVE LD1/ST1 to stock LLVM IR

InstCombine AArch64 LD1/ST1 to llvm.masked.load/llvm.masked.store
and LD1/ST1 to load/store when a ptrue all predicate pattern operand
is present.

This allows existing IR optimizations such as dead-load removal to
occur.

Differential Revision: https://reviews.llvm.org/D113489

Fix unused variable warning in LoadStoreOpt.cpp with (void)

Revert "Fix unused variable in llvm/lib/CodeGen/GlobalISel/LoadStoreOpt.cpp"

This reverts commit 40a609aebe4ab51174a164852b6399f322bf6d9a.

Revert "Fix unused variable warning."

This reverts commit a062e2a8ca27b615cf3d02ed5c551ca85efc0325.

Revert "Fix another unused variable error."

This reverts commit 5b84ae7c48083bd0f40199837990cf915a2053b8.

[lldb] Simplify specifying of platform supported architectures

The GetSupportedArchitectureAtIndex pattern forces the use of
complicated patterns in both the implementations of the function and in
the various callers.

This patch creates a new method (GetSupportedArchitectures), which
returns a list (vector) of architectures. The
GetSupportedArchitectureAtIndex is kept in order to enable incremental
rollout. Base Platform class contains implementations of both of these
methods, using the other method as the source of truth. Platforms
without infinite stacks should implement at least one of them.

This patch also ports Linux, FreeBSD and NetBSD platforms to the new
API. A new helper function (CreateArchList) is added to simplify the
common task of creating a list of ArchSpecs with the same OS but
different architectures.

Differential Revision: https://reviews.llvm.org/D113608

[lldb/test] Move gdb client utils into the packages tree

This infrastructure has proven proven its worth, so give it a more
prominent place.

My immediate motivation for this is the desire to reuse this
infrastructure for qemu platform testing, but I believe this move makes
sense independently of that. Moving this code to the packages tree will
allow as to add more structure to the gdb client tests -- currently they
are all crammed into the same test folder as that was the only way they
could access this code.

I'm splitting the code into two parts while moving it. The first once
contains just the generic gdb protocol wrappers, while the other one
contains the unit test glue. The reason for that is that for qemu
testing, I need to run the gdb code in a separate process, so I will
only be using the first part there.

Differential Revision: https://reviews.llvm.org/D113893

Fix another unused variable error.

[mlir] spirv: Add scf.while spirv conversion

* It works similar to scf.for coversion, but convert condition and yield ops as part of scf.whille pattern so it don't need to maintain external state

Differential Revision: https://reviews.llvm.org/D113007

Fix unused variable warning.

[mlir] Support multi-dimensional vectors in MathToLibm conversion.

Differential Revision: https://reviews.llvm.org/D113969

[MLIR] Simplify semi-affine expressions using flattening

For the semi affine expressions, whenever rhs of a floordiv, ceildiv, mod
or product expression is a symbolic expression, we introduce a local variable
representing the result, and store the floordiv/ceildiv, mod or product
affine expression in LocalExprs. In this way the expression is flattened,
and trivial addition and subtraction related simplifications are performed.
Also rule based matching for detecting and transforming "expr - q * (expr floordiv q)"
to "expr mod q", where q is a symbolic exxpression, in simplifyAdd function.

Differential Revision: https://reviews.llvm.org/D112808

Fix unused variable in llvm/lib/CodeGen/GlobalISel/LoadStoreOpt.cpp

[ARM] Add datalayout to costmodel tests. NFC

This adds a sensible datalayout to the ARM cost model tests, to prevent
the costs reported being incorrect for the size of pointers.

[MLIR] FlatAffineConstraints: Allow extraction of explicit representation of local variables

This patch extends the existing functionality of computing an explicit
representation for local variables, to also get the explicit representation,
instead of only the inequality pairs.

This is required for a future patch to remove redundant local ids based on
their explicit representation.

Reviewed By: arjunp

Differential Revision: https://reviews.llvm.org/D113814

[CGP][PowerPC] Pre-commit test case for D113872. NFC.

Remove unnecessary <any> include.

[clang-tidy][NFC] Simplify ClangTidyStats

- Use NSDMI and remove constructor.

Differential Revision: https://reviews.llvm.org/D113847

[RISCV] Refactor some rvv instructions' definition with foreach.

Simplify rvv instructions that use eew in their mnemonic and encoding with foreach. And fix a scheduling bug.

Differential Revision: https://reviews.llvm.org/D113453

tsan: use pthread_equal instead of direct pthread_t comparison

man pthread_equal:
  The pthread_equal() function is necessary because thread IDs
  should be considered opaque: there is no portable way for
  applications to directly compare two pthread_t values.

Depends on D113916.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D113919

tsan: speed up pthread_setname_np

pthread_setname_np does linear search over all thread descriptors
to map pthread_t to the thread descriptor. This has O(N^2) complexity
and becomes much worse in the new tsan runtime that keeps all ever
existed threads in the thread registry.
Replace linear search with direct access if pthread_setname_np
is called for the current thread (a very common case).

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D113916

[gn build] Port 2e6ae1d3f2de

[libcxx] [Coroutine] Conform Coroutine Implementation

Since coroutine is merged in C++ standard and the support for coroutine
seems relatively stable. It's the time to move the implementation of
coroutine out of the experimental directory and the std::experimental
namespace. This patch creates header <coroutine> with conformed
implementation with C++ standard. To avoid breaking user's code too
fast, the <experimental/coroutine> header is remained. Note that
<experimental/coroutine> is deprecated and it would be removed in
LLVM15.

Reviewed By: Quuxplusone, ldionne

Differential Revision: https://reviews.llvm.org/D109433

Revert "[MLIR][LLVM] Permit integer types in switch other than i32"

This reverts commit 94992670fcc59d12d7f97cb08beb8d2eb15110ed.
Build is broken with:

tools/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.cpp.inc:23996:3: error: no matching function for call to 'printSwitchOpCases'
printSwitchOpCases(_odsPrinter, *this, getValue().getType(), getCaseValuesAttr(), getCaseDestinations(), getCaseOperands(), getCaseOperands().getTypes());
^~~~~~~~~~~~~~~~~~

[STATEPOINT] Force implicit-def for lr register.

STATEPOINT instruction behavior is similar to call instruction.
In aarch64 BL instruction implicitly define lr register, so
STATEPOINT instruction should do the same.
However STATEPOINT is a general pseudo instruction and I could not find
a way to override list of implicit defs for specific target.

So this patch post processes inserting STATEPOINT instruction by
adding implisit dead def for lr.

Reviewers: reames, loicottet, ostannard
Reviewed By: reames
Subscribers: danilaml, hiraditya, kristof.beyls, llvm-commits, yrouban
Differential Revision: https://reviews.llvm.org/D111114

[MLIR][LLVM] Permit integer types in switch other than i32

LLVM switchop currently only permits i32. Both LLVM IR and MLIR Standard switch permit other integer types leading to an illegal state when lowering an i8 switch from MLIR standard

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D113955

[llvm] Use make_early_inc_range (NFC)

[gn build] Port dc84770d559b

[GlobalISel] Add a store-merging optimization pass and enable for AArch64.

This is a first attempt at a constant value consecutive store merging pass,
a counterpart to the DAGCombiner's store merging optimization.

The high level goals of this pass:

* Have a simple and efficient algorithm. As close to linear time as we can get.
  Thus, prioritizing scalability of the algorithm over merging every corner case
  we can find. The DAGCombiner's store merging code has been the source of
  compile time and complexity issues in the past and I wanted to avoid that.
* Don't introduce any new data structures for ordering memory operations. In MIR,
  we don't have the concept of chains like we do in the DAG, and the instruction
  order is stricter than enforcing ordering with graph edges. Although I
  considered adding something similar, I couldn't justify the overhead.

The pass is current split into 3 main parts. The main store merging code focuses
on identifying candidate stores and managing the candidate group that's under
consideration for merging. Analyzing addressing of stores is a potentially
complex part and for now there's just a basic implementation to identify easy
cases. Finally, the other main bit of complexity is the alias analysis, which
tries to follow the same logic as the DAG's AA.

Currently this implementation only supports merging of constant stores. Stores
of arbitrary variables are technically possible with a very small change, but
the DAG chooses not to do this. Doing so here makes most code worse since
there's extra overhead in merging values into wider registers.

On AArch64 -Os, this optimization results in very minor savings on CTMark.

Differential Revision: https://reviews.llvm.org/D109131

[llvm-profgen] Add switch to allow use of first loadable segment for calculating offset

Adding `-use-loadable-segment-as-base` to allow use of first loadable segment for calculating offset. By default first executable segment is used for calculating offset. The switch helps compatibility with unsymbolized profile generated from older tools.

Differential Revision: https://reviews.llvm.org/D113727

Add the stop count to "statistics dump" in each target's dictionary.

It is great to know how many times the target has stopped over its lifetime as each time the target stops, and possibly resumes without the user seeing it for things like shared library loading and signals that are not notified and auto continued, to help explain why a debug session might be slow. This is now included as "stopCount" inside each target JSON.

Differential Revision: https://reviews.llvm.org/D113810

[RISCV] Override TargetLowering::hasAndNot for Zbb.

Differential Revision: https://reviews.llvm.org/D113937

[RISCV] Add test cases to prepare for overring TargetLowering::hasAndNot. NFC

These test files are copied directly from AArch64. Some of the cases
may benefit from ANDN with the Zbb extension. Somes cases already
improve use ANDN.

selectcc-to-shiftand.ll also contains tests that test select->and
conversion even when a ANDN isn't needed. I think this improves our
coverage of these optimizations.

Differential Revision: https://reviews.llvm.org/D113935

[X86] Fix crash with inline asm using wrong register name

Fixes PR#48678. `X86TargetLowering::getRegForInlineAsmConstraint()` can adjust the register class to match the type, e.g. change `VR128X` to `VR256X` if the type needs 256 bits. However, the function currently returns the unadjusted register and the adjusted register class, e.g. `xmm15` and `VR256X`, which then causes an assertion failure later because the register class does not contain that register. This patch fixes this behavior.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D113834

AMDGPU: Mark prolog/epilog SCC defs as dead

A future change will add SCC liveness checks. Since we are still
relying on forward register scavenging, add dead flags to avoid
spuriously detecting SCC as live.

AMDGPU: Regenerate test checks

[mlir][linalg][bufferize][NFC] Clean up tensor op bufferization

Differential Revision: https://reviews.llvm.org/D113730

[clang] NFC: rename internal `IsPossiblyOpaquelyQualifiedType` overload

Rename `IsPossiblyOpaquelyQualifiedType` overload taking a Type*
as `IsPossiblyOpaquelyQualifiedTypeInternal` instead.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Differential Revision: https://reviews.llvm.org/D113954

DebugInfo: Make DWARFExpression::iterator::skipBytes() const, NFC

Given that DWARFExpression::iterator::skipBytes() doesn't change any
state (it returns a new `iterator`), it might as well be
`const`-qualified.

DebugInfo: const-qualify accessors of DWARFExpression::Operation

Add `const` to DWARFExpression::Operation's accessors and make
Operation::extract() private, since it's only used by the friend class
DWARFExpression::iterator.

DebugInfo: Make DWARFExpression::iterator::operator++ return itself

Looks like an accident that `operator++` was returning `Operator&`
instead of `iterator&`. Update to match standard iterator behaviour.

[DAGCombiner] Prevent unfoldMaskedMerge from creating an AND with two inverted inputs.

It's possible that the mask is already a NOT. At least if InstCombine
hasn't canonicalized the input. In that case we will form an ANDN with
X instead of with Y. So we don't need to worry about Y being a constant.

We might need to check that X isn't a constant instead, but we don't
have a test case for that yet.

This fixes a size regression found when trying to enable this combine
for RISCV in D113937.

Differential Revision: https://reviews.llvm.org/D113948

add tsan shared lib

Change-Id: Ic83ff1ec86d6a7d61b07fa3df7e0cb2790b5ebc7

[LLDB][NativePDB] Fix local-variables.cpp failure on windows bots

[Bazel] Enable layering_check for MLIR build

This feature checks that headers included by a file are provided by a
header exported by one of the direct dependencies of the build rule in
which it is contained. It ensures that appropriate layering (a goal of
the LLVM project) is preserved. So far, I'm only adding this to MLIR
because we've had it turned on internally since the beginning, so MLIR
is already layering clean. It would be nice to also enable it for LLVM,
but that requires some additional cleanup.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D113952

[InstSimplify] Fold A|B | (A^B) --> A|B

This patch adds the following fold opportunity:
A|B | (A^B) --> A|B

that is reported here : https://bugs.llvm.org/show_bug.cgi?id=52479

https://alive2.llvm.org/ce/z/33-My-

Test cases with base results are added in D113860

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D113861

[RISCV] Optimize immediate materialisation with SH*ADD

Use LUI+SH*ADD+ADDI to compose specific immediates.

Reviewed By: craig.topper, luismarques

Differential Revision: https://reviews.llvm.org/D113568

[RISCV][test] Add more tests of immediate materialisation

Reviewed By: craig.topper, luismarques

Differential Revision: https://reviews.llvm.org/D113567

[mlir][sparse] first version of "truly" dynamic sparse tensors as outputs of kernels

This revision contains all "sparsification" ops and rewriting necessary to support sparse output tensors when the kernel has no reduction (viz. insertions occur in lexicographic order and are "injective"). This will be later generalized to allow reductions too. Also, this first revision only supports sparse 1-d tensors (viz. vectors) as output in the runtime support library. This will be generalized to n-d tensors shortly. But this way, the revision is kept to a manageable size.

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D113705

[tosa][mlir] Refactor tosa.reshape lowering to linalg for dynamic cases.

Split tosa.reshape into three individual lowerings: collapse, expand and a
combination of both. Add simple dynamic shape support.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D113936

Revert "[InstSimplify] Fold A|B | (A^B) --> A|B"

This reverts commit 193c40e9667ca2b173232b393fc72ea9e4944aa3.

Add `isInitCapture` and `forEachLambdaCapture` matchers.

This contributes follow-up work from https://reviews.llvm.org/D112491, which
allows for increased control over the matching of lambda captures. This also
updates the documentation for the `lambdaCapture` matcher.

Reviewed By: ymandel, aaron.ballman

Differential Revision: https://reviews.llvm.org/D113575

[llvm-reduce] Don't reuse SmallVector across calls to getAllMetadata()

The SmallVector is not cleared in calls to getAllMetadata().

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D113808

[mlir][tosa] Add tosa.mul by one canonicalization

Multiply by one can be removed during canonicalization. This optimizes away unneeded operations.

Differential Revision: https://reviews.llvm.org/D113807

[NewPM] Only invalidate modified functions' analyses in CGSCC passes + turn on eagerly invalidate analyses

Previously, any change in any function in an SCC would cause all
analyses for all functions in the SCC to be invalidated. With this
change, we now manually invalidate analyses for functions we modify,
then let the pass manager know that all function analyses should be
preserved since we've already handled function analysis invalidation.

So far this only touches the inliner, argpromotion, function-attrs, and
updateCGAndAnalysisManager(), since they are the most used.

This is part of an effort to investigate running the function
simplification pipeline less on functions we visit multiple times in the
inliner pipeline.

However, this causes major memory regressions especially on larger IR.
To counteract this, turn on the option to eagerly invalidate function
analyses. This invalidates analyses on functions immediately after
they're processed in a module or scc to function adaptor for specific
parts of the pipeline.

Within an SCC, if a pass only modifies one function, other functions in
the SCC do not have their analyses invalidated, so in later function
passes in the SCC pass manager the analyses may still be cached. It is
only after the function passes that the eager invalidation takes effect.
For the default pipelines this makes sense because the inliner pipeline
runs the function simplification pipeline after all other SCC passes
(except CoroSplit which doesn't request any analyses).

Overall this has mostly positive effects on compile time and positive effects on memory usage.
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=max-rss

D113196 shows that we slightly regressed compile times in exchange for
some memory improvements when turning on eager invalidation. D100917
shows that we slightly improved compile times in exchange for major
memory regressions in some cases when invalidating less in SCC passes.
Turning these on at the same time keeps the memory improvements while
keeping compile times neutral/slightly positive.

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D113304

[clang] retain type sugar in auto / template argument deduction

This implements the following changes:
* AutoType retains sugared deduced-as-type.
* Template argument deduction machinery analyses the sugared type all the way
down. It would previously lose the sugar on first recursion.
* Undeduced AutoType will be properly canonicalized, including the constraint
template arguments.
* Remove the decltype node created from the decltype(auto) deduction.

As a result, we start seeing sugared types in a lot more test cases,
including some which showed very unfriendly `type-parameter-*-*` types.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Reviewed By: rsmith, #libc, ldionne

Differential Revision: https://reviews.llvm.org/D110216

[JITLink] Fix splitBlock if there are symbols span across the boundary

Fix `splitBlock` so that it can handle the case when the block being
split has symbols span across the split boundary. This is an error
case in general but for EHFrame splitting on macho platforms, there is an
anonymous symbol that marks the entire block. Current implementation
will leave a symbol that is out of bound of the underlying block. Fix
the problem by dropping such symbols when the block is split.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D113912

[InstSimplify] Fold A|B | (A^B) --> A|B

This patch adds the following fold opportunity:
A|B | (A^B) --> A|B

that is reported here : https://bugs.llvm.org/show_bug.cgi?id=52479

https://alive2.llvm.org/ce/z/33-My-

Test cases with base results are added in D113860

(authored by MehrHeidar, committed by rampitec).

Differential Revision: https://reviews.llvm.org/D113861

[SystemZ] Support symbolic displacements.

This patch adds support for symbolic displacements, e.g. like 'lg %r0,
sym(%r1)', which is done using relocations. This is needed to compile the
kernel without disabling the integrated assembler.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D113341

[mlir][Vector] Make vector.shape_cast based size-1 foldings opt-in and separate.

This is in prevision of dropping them altogether and using insert/extract based patterns.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D113928

[NFC][Regalloc] Factor types that would be used by the eviction advisor

This is in prepartion of pulling the eviction decision-making into an
analysis pass, which would then allow swapping that decision making
process.

RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html

Differential Revision: https://reviews.llvm.org/D113929

[X86] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off build. NFC

[llvm] adapt DWARFExpression.h for 6b9b86db9dd974585a5c71cf2e5231d1e3385f7c

No functional changes intended.

Updated the iterator class for https://github.com/llvm/llvm-project/commit/6b9b86db9dd974585a5c71cf2e5231d1e3385f7c.

Differential Revision: https://reviews.llvm.org/D113934

[unroll-runtime] Relax two profitability limitations on multi-exit unrolling

This change is mostly about getting rid of some "uninteresting" cases in a follow on deeper heuristic change.  If anyone sees actually interesting code differences out of this, please let me know.  I'm not expecting this to have much impact at all.

Case 1 - With the single deoptimize non-latch exit, we can't have two exiting blocks sharing an exit block.  We can only hit this with a poorly documented debug flag.

Case 2 - Why should we treat epilog cases differently from prolog cases?  Or to say it differently, why should starting with a constant control whether a multiple exit loop gets unrolled?

Sorry for the lack of tests here.  These are both *exceedingly* narrow cases in practice, and after a while trying, I couldn't come up with a test which did anything "useful" as opposed to simply exercise a random combination of force flags.  Note that the legality cases for each are already exercised with force flags.

[InstrProf][NFC] Fix a few typos in code comments.

[mlir][Linalg] Add a DownscaleDepthwiseConv2DNhwcHwcOp decomposition pattern.

Reviewed By: gysit

Differential Revision: https://reviews.llvm.org/D113907

[asm] Make EmitMSInlineAsmStr and EmitGCCInlineAsmStr more alike

https://reviews.llvm.org/D71677 copied a bunch of code from
EmitGCCInlineAsmStr() to EmitMSInlineAsmStr() but made a few small
(likely unintentional) changes. This makes these pieces look the same.

No behavior change.

(Why are these functions two copies? No great reason as far as I can tell.
https://reviews.llvm.org/rG1778831a3d1d24ab6545635f63da4d9c5f8f0ac7 did the
split; we might want to undo them at some point. But PR23933 suggests
that a bigger change is planned for this file in the future, so keeping
this incremental for now.)

Differential Revision: https://reviews.llvm.org/D113924

[asm] Convert AsmPrinter::PrintSpecial() to StringRef

No behavior change.

Differential Revision: https://reviews.llvm.org/D113911

[asm] Correctly handle special names in variants

There's really no reason why anyone should use these special names in a variant.
I noticed this while reading the code: all other writes to OS are guarded by
this conditional, and the behavior with the check seems more correct, so
let's add the check.

Differential Revision: https://reviews.llvm.org/D113909

[PowerPC] Fix 32bit vector insert instructions for ISA3.1

The platform independent ISD::INSERT_VECTOR_ELT take a element index,
but vins* instructions take a byte index. Update 32bit td patterns for
vector insert to handle the element index accordingly.

Since vector insert for non constant index are supported in
ISA3.1, there is no need to use platform specific ISD node,
PPCISD::VECINSERT. Update td pattern to directly use
ISD::INSERT_VECTOR_ELT instead.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D113802

[NFC][lld] Inclusive language: change master file to merged file

[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with merged in these comments.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D113903

[LLDB][NativePDB] Fix image lookup by address

`image lookup -a ` doesn't work because the compilands list is always empty.
Create CU at given index if doesn't exit.

Differential Revision: https://reviews.llvm.org/D113821

Making the code compliant to the documentation about Floating Point
support default values for C/C++. FPP-MODEL=PRECISE enables FFP-CONTRACT
(FMA is enabled).

Fix a misleading FIXME in an unroll test

[SLP][NFC]Add a test for extra shuffle emission, NFC.

[llvm][fix] Inclusive language: replace master with main in find_interesting_reviews.py

As part of using inclusive language within the llvm project and to fix
the command because the master branch was renamed, this patch replaces
master with main in `find_interesting_reviews.py`.

Reviewed By: kristof.beyls

Differential Revision: https://reviews.llvm.org/D113918

[runtime-unroll] Inline canSafelyUnrollMultiExitLoop [NFC]

All of the interesting logic from this routine has been removed, inline the single check into the sole non-assert caller. The assert use has little value with the restructured code and is simply dropped.

[mlir][linalg] Make loop ops in TileLoopNest accessible

Reviewed By: gysit

Differential Revision: https://reviews.llvm.org/D113927

[PatternMatch] Add m_BinOp/m_c_BinOp with specific opcode

Differential Revision: https://reviews.llvm.org/D113508