review.tizen.org Git - platform/upstream/llvm.git/log

[AArch64][GlobalISel] Fold in G_ANYEXT/G_ZEXT into TB(N)Z

This is similar to the code in getTestBitOperand in AArch64ISelLowering. Instead
of implementing all of the TB(N)Z optimizations at once, this patch implements
the simplest case first. The way that this is set up should make it fairly easy
to add the rest as we go along.

The idea here is that after determining that we can use a TB(N)Z, we can
continue looking through instructions and perform further folding.

In this case, when we have a G_ZEXT or G_ANYEXT where the extended bits are not
used, we can fold it into the TB(N)Z.

Differential Revision: https://reviews.llvm.org/D73673

[AArch64][GlobalISel] Disallow vectors in convertPtrAddToAdd.

Found by inspection, but there's no test for this yet because G_PTR_ADD is
currently illegal for vectors. I'll add the test at a later time when the
legalizer support has landed.

[InstCombine] Remove unnecessary worklist add; NFCI

Again, this will already be added by IRBuilder.

[Fuchsia] Never link in implicit "system dependencies" of sanitizer runtimes

This is never appropriate on Fuchsia and any future needs for
system library dependencies of compiler-supplied runtimes will
be addressed via .deplibs instead of driver hacks.

Patch By: mcgrathr

Differential Revision: https://reviews.llvm.org/D73734

[NFC] Fix check prefix add in fcanonicalize-elimination.ll

The test fix added by "D39306: Fix
CodeGen/AMDGPU/fcanonicalize-elimination.ll on FreeBSD 11.0" uses a test
prefix which is not actually used in the FileCheck stanza. Thus the
problem originally encountered still exists and the tests fails for host
triples that contain "1.0", including AIX 7.1.0.

MSVC Buggy version detection: turn pre-processor error into CMake configuration time check

This allows consumer to override in a cleaner way while still prevent
them from hitting bug without knowing they run an unsupported
configuration.

Differential Revision: https://reviews.llvm.org/D73677

AMDGPU: Replace subtarget check with an assert

This is already checked by the pattern subtarget predicate.

AMDGPU: Don't use separate cache arguments for s_buffer_load node

There's not much value to this separate node from the intrinsic. Make
the operand structure the same as the intrinsic, so we can reuse the
same pattern for GlobalISel.

[InstCombine] Remove unnecessary worklist add; NFCI

The IRBuilder will automatically add instructions to the worklist.
Adding it manually is unnecessary, but may mess up worklist order.

[InstCombine] Create new insts in foldICmpEqIntrinsicWithConstant; NFCI

In line with current conventions, create new instructions rather
than modify two operands in place and performing manual worklist
management.

This should be NFC apart from possible worklist order changes.

Move verification of Sema::MaximumAlignment to a .cpp file

Saves these transitive includes:

[scudo][standalone] Release secondary memory on purge

Summary:
The Secondary's cache needs to be released when the Combined's
`releaseToOS` function is called (via `M_PURGE`) for example,
which this CL adds.

Additionally, if doing a forced release, we'll release the
transfer batch class as well since now we can do that.

There is a couple of other house keeping changes as well:
- read the page size only once in the Secondary Cache `store`
- remove the interval check for `CanRelease`: we are going to
make that configurable via `mallopt` so this needs not be
set in stone there.

Reviewers: cferris, hctim, pcc, eugenis

Subscribers: #sanitizers, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D73730

[lldb][NFC] LLDB_LOGF to LLDB_LOG conversion in ClangASTImporter

[mlir] LLVM dialect: Generate conversions between EnumAttrCase and LLVM API

Summary:
MLIR materializes various enumeration-based LLVM IR operands as enumeration
attributes using ODS. This requires bidirectional conversion between different
but very similar enums, currently hardcoded. Extend the ODS modeling of
LLVM-specific enumeration attributes to include the name of the corresponding
enum in the LLVM C++ API as well as the names of specific enumerants. Use this
new information to automatically generate the conversion functions between enum
attributes and LLVM API enums in the two-way conversion between the LLVM
dialect and LLVM IR proper.

Differential Revision: https://reviews.llvm.org/D73468

[Clang][Bundler][NFC] Replace SmallString<...> with StringRef

Reviewers: ABataev

Reviewed By: ABataev

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D73738

[AMDGPU] Add file headers for few files where it is missing.

Summary:
Added file headers for files which implement iterative lightweight scheduling
strategies. Which is basically an exercise which I undertook in order to get
used to LLVM development process.

Reviewers: arsenm, vpykhtin, cdevadas

Reviewed By: vpykhtin

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73417

[libc++abi] Bump PACKAGE_VERSION

[libc] Add [EXPECT|ASSERT]_[TRUE|FALSE] unittest macros.

Also, other EXPECT_* and ASSERT_* macros have been extended to accept
bool values.

Reviewers: abrachet, gchatelet

Subscribers: MaskRay, tschuett, libc-commits

Tags: #libc-project

Differential Revision: https://reviews.llvm.org/D73668

[mlir][NFC] Update several SPIRV operations to use declarative parsers.

Differential Revision: https://reviews.llvm.org/D73504

[mlir][NFC] Use declarative format for several operations in LLVM and Linalg dialects

Differential Revision: https://reviews.llvm.org/D73503

[mlir] Update various operations to declaratively specify their assembly format.

Summary:
This revision switches over many operations to use the declarative methods for defining the assembly specification. This updates operations in the NVVM, ROCDL, Standard, and VectorOps dialects.

Differential Revision: https://reviews.llvm.org/D73407

[mlir] Add support for generating the parser/printer from the declarative operation format.

Summary:
This revision add support, and testing, for generating the parser and printer from the declarative operation format.

Differential Revision: https://reviews.llvm.org/D73406

[mlir] Add initial support for parsing a declarative operation assembly format

Summary:
This is the first revision in a series that adds support for declaratively specifying the asm format of an operation. This revision
focuses solely on parsing the format. Future revisions will add support for generating the proper parser/printer, as well as
transitioning the syntax definition of many existing operations.

This was originally proposed here:
https://llvm.discourse.group/t/rfc-declarative-op-assembly-format/340

Differential Revision: https://reviews.llvm.org/D73405

[lldb/Reproducers] Fix API boundary tracking bug

When recording the result from the LLDB_RECORD_RESULT macro, we need to
update the boundary so we capture the copy constructor. However, when
called to record the this pointer of the (copy) constructor itself, the
boundary should not be toggled, because it is called from the
LLDB_RECORD_CONSTRUCTOR macro, which might be followed by other API
calls.

This manifested itself as an object encountered during replay that we
hadn't seen before. The index-to-object mapping would return a nullptr
and lldb would crash.

[AIX] Minor cleanup in AsmPrinter. [NFC]

- Extends the comments related to function descriptors, noting how they
are only used on AIX.

- Changes the condition used to gate the creation of the current function
symbol in AsmPrinter::SetupMachineFunction to reflect being AIX
specific. The creation of the symbol is different because of AIXs
linkage conventions, not because AIX uses function descriptors.

Differential Revision: https://reviews.llvm.org/D73115

[AArch64] -fpatchable-function-entry=N,0: place patch label after BTI

Summary:
For -fpatchable-function-entry=N,0 -mbranch-protection=bti, after
9a24488cb67a90f889529987275c5e411ce01dda, we place the NOP sled after
the initial BTI.

```
.Lfunc_begin0:
bti c
nop
nop

.section __patchable_function_entries,"awo",@progbits,f,unique,0
.p2align 3
.xword .Lfunc_begin0
```

This patch adds a label after the initial BTI and changes the __patchable_function_entries entry to reference the label:

```
.Lfunc_begin0:
bti c
.Lpatch0:
nop
nop

.section __patchable_function_entries,"awo",@progbits,f,unique,0
.p2align 3
.xword .Lpatch0
```

This placement is compatible with the resolution in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 .

A local linkage function whose address is not taken does not need a BTI.
Placing the patch label after BTI has the advantage that code does not
need to differentiate whether the function has an initial BTI.

Reviewers: mrutland, nickdesaulniers, nsz, ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73680

Speed up compilation of ASTImporter

Avoid recursively instantiating importSeq. Use initializer list
expansion to stamp out a single instantiation of std::tuple of the
deduced sequence of types, and thread the error around that tuple type.
Avoids needlessly instantiating std::tuple N-1 times.

new time to compile: 0m25.985s
old time to compile: 0m35.563s

new obj size: 10,000kb
old obj size: 12,332kb

I found the slow TU by looking at ClangBuildAnalyzer results, and looked
at -ftime-trace for the file in chrome://tracing to find this.

Tested with: clang-cl, MSVC, and GCC.

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D73667

[ConstantFold][SVE][NFC] Add test for select instruction in scalable vector.

Side notes from D73669, no need to guard the iteration on vectors, as
it is explicitly looking for a ConstantVector/ConstantDataVector, which
is not expected to be scalable at the moment. So, add the test only.

[Concepts] Add 'this' context to instantiation of member requires clause

'this' context was missing in instantiation of member requires clause.

[Concepts] Add check for dependent RC when checking function constraints

Do not attempt to check a dependent requires clause in a function constraint
(may be triggered by, for example, DiagnoseUseOfDecl).

[Concept] Fix incorrect check for containsUnexpandedParameterPack in CSE

We previously checked for containsUnexpandedParameterPack in CSEs by observing the property
in the converted arguments of the CSE. This may not work if the argument is an expanded
type-alias that contains a pack-expansion (see added test).

Check the as-written arguments when determining containsUnexpandedParameterPack and isInstantiationDependent.

[ConstantFold][SVE] Fix constant folding for scalable vector unary operations.

Summary:
Similar to issue D71445. Scalable vector should not be evaluated element by element.
Add support to handle scalable vector UndefValue.

Reviewers: sdesmalen, efriedma, apazos, huntergr, willlovett

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73678

[AArch64][SVE] Add remaining SVE2 mla indexed intrinsics.

Summary:
Add remaining SVE2 mla indexed intrinsics:
- sqdmlalb, sqdmlalt, sqdmlslb, sqdmlslt

Add suffix _lanes and switch immediate types to i32 for all mla indexed intrinsics to align with ACLE builtin definitions.

Reviewers: efriedma, sdesmalen, cameron.mcinally, c-rhodes, rengolin, kmclaughlin

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, arphaman, psnobl, llvm-commits, amehsan

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73633

[Clang][Driver] Disable llvm passes for the first host OpenMP offload compilation

Summary: With OpenMP offloading host compilation is done in two phases to capture host IR that is passed to all device compilations as input. But it turns out that we currently run entire LLVM optimization pipeline on host IR on both compilations which may have unpredictable effects on the resulting code. This patch fixes this problem by disabling LLVM passes on the first compilation, so the host IR that is passed to device compilations will be captured right after front end.

Reviewers: ABataev, jdoerfert, hfinkel

Reviewed By: ABataev

Subscribers: guansong, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D73721

[ThinLTO] Disable "Always import constants" due to compile time issues

Summary:
Disable the always importing of constants introduced in D70404 by
default under a new internal option, since it is causing order of
magnitude compile time regressions during the thin link. Will continue
investigating why the regressions occur.

Reviewers: evgeny777, wmi

Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73724

[libcxxabi] Insert padding in __cxa_exception struct for compatibility

Summary:
Preserve the old ABI for __cxa_exception and __cxa_dependent_exception
on 64 bit platforms or ARM_EHABI platforms.

After r276215, libunwind in llvm-project labels _Unwind_Exception to be
double word aligned. That change implictly adds a padding before
unwindHeader field in __cxa_exception and __cxa_dependent_exception.
Preserve the same negative offsets in those struct by moving the padding
to the beginning of the field.

The assumption here is that if the ABI is not aware of the padding before
unwindHeader and put the referenceCount/primaryException in there, no padding
should exist before unwindHeader.

Reviewers: EricWF, mclow.lists, ldionne, jroelofs, dexonsmith, rjmccall, compnerd, phosek, ahatanak

Reviewed By: rjmccall

Subscribers: hans, smeenai, kristof.beyls, christof, jkorous, ributzka, libcxx-commits

Tags: #libc

Differential Revision: https://reviews.llvm.org/D72543

[LoopFusion] Move instructions from FC1.GuardBlock to FC0.GuardBlock and
from FC0.ExitBlock to FC1.ExitBlock when proven safe.

Summary:
Currently LoopFusion give up when the second loop nest guard
block or the first loop nest exit block is not empty. For example:

if (0 < N) {
  for (int i = 0; i < N; ++i) {}
  x+=1;
}
y+=1;
if (0 < N) {
  for (int i = 0; i < N; ++i) {}
}
The above example should be safe to fuse.
This PR moves instructions in FC1 guard block (e.g. y+=1;) to
FC0 guard block, or instructions in FC0 exit block (e.g. x+=1;) to
FC1 exit block, which then LoopFusion is able to fuse them.
Reviewer: kbarton, jdoerfert, Meinersbur, dmgreen, fhahn, hfinkel,
bmahjour, etiotto
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D73641

[AArch64][ARM] Always expand ordered vector reductions (PR44600)

fadd/fmul reductions without reassoc are lowered to
VECREDUCE_STRICT_FADD/FMUL nodes, which don't have legalization
support. Until that is in place, expand these intrinsics on
ARM and AArch64. Other targets always expand the vector reduction
intrinsics.

Additionally expand fmax/fmin reductions without nonan flag on
AArch64, as the backend asserts that the flag is present when
lowering VECREDUCE_FMIN/FMAX.

This fixes https://bugs.llvm.org/show_bug.cgi?id=44600.

Differential Revision: https://reviews.llvm.org/D73135

[NFC] small refactor on RenamerClangTidyCheck.cpp

[libc] Add a missing `this->` in __llvm_libc::cpp:MutableArrayRef::end.

I had removed it to verify a review comment, but forgot to put it back.

[NFC][IndVarSimplify] Autogenerate exit_value_test2.ll check lines

[OPENMP50]Handle lastprivate conditionals passed as shared in inner
regions.

If the lastprivate conditional is passed as shared in inner region, we
shall check if it was ever changed and use this updated value after exit
from the inner region as an update value.

[BPF] fix a bug in BPFMISimplifyPatchable pass with -O0

The recommended optimization level for BPF programs
is O2 since (1). BPF is running inside the kernel and
linux kernel won't work at -O0 level, and (2). Verifier
is not able to handle O0 code properly, e.g., potential
large stack size and a lot of spills.

But we should keep -O0 at least compiling.
This patch fixed a bug in BPFMISimplifyPatchable phase
where with -O0, a segmentation fault will happen for a
simple program like:
int test(int a, int b) { return a + b; }

A test case is added to capture such a case.

Differential Revision: https://reviews.llvm.org/D73681

[Clang][Bundler] Reduce fat object size

Summary:
Fat object size has significantly increased after D65819 which changed bundler tool to add host object as a normal bundle to the fat output which almost doubled its size. That patch was fixing the following issues

1. Problems associated with the partial linking - global constructors were not called for partially linking objects which clearly resulted in incorrect behavior.
2. Eliminating "junk" target object sections from the linked binary on the host side.

The first problem is no longer relevant because we do not use partial linking for creating fat objects anymore. Target objects sections are now inserted into the resulting fat object with a help of llvm-objcopy tool.

The second issue, "junk" sections in the linked host binary, has been fixed in D73408 by adding "exclude" flag to the fat object's sections which contain target objects. This flag tells linker to drop section from the inputs when linking executable or shared library, therefore these sections will not be propagated in the linked binary.

Since both problems have been solved, we can revert D65819 changes to reduce fat object size and this patch essentially is doing that.

Reviewers: ABataev, alexshap, jdoerfert

Reviewed By: ABataev

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D73642

[MLIR] Add the sqrt operation to mlir.

Summary: Add and pipe through the sqrt operation for Standard and LLVM dialects.

Reviewers: nicolasvasilache, ftynse

Reviewed By: ftynse

Subscribers: frej, ftynse, merge_guards_bot, flaub, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73571

[analyzer] CheckerContext: Make the Preprocessor available

Summary:
This patch hooks the `Preprocessor` trough `BugReporter` to the
`CheckerContext` so the checkers could look for macro definitions.

Reviewed By: NoQ

Differential Revision: https://reviews.llvm.org/D69731

[mlir] EnumsGen: dissociate string form of integer enum from C++ symbol name

Summary:
In some cases, one may want to use different names for C++ symbol of an
enumerand from its string representation. In particular, in the LLVM dialect
for, e.g., Linkage, we would like to preserve the same enumerand names as LLVM
API and the same textual IR form as LLVM IR, yet the two are different
(CamelCase vs snake_case with additional limitations on not being a C++
keyword).

Modify EnumAttrCaseInfo in OpBase.td to include both the integer value and its
string representation. By default, this representation is the same as C++
symbol name. Introduce new IntStrAttrCaseBase that allows one to use different
names. Exercise it for LLVM Dialect Linkage attribute. Other attributes will
follow as separate changes.

Differential Revision: https://reviews.llvm.org/D73362

[XCOFF][AIX] Support basic relocation type on AIX

Summary:

This patch intends to support three most common relocation type
on AIX: R_POS, R_TOC, R_RBR.
These three relocation type will be needed for object file generation
on AIX for small code model.
We will have follow up patches to bring relocation support for
large code model on AIX.

Reviewers: hubert.reinterpretcast, daltenty, DiggerLin

Differential Revision: https://reviews.llvm.org/D72027

[analyzer] DynamicSize: Remove 'getSizeInElements()' from store

Summary:
This patch uses the new `DynamicSize.cpp` to serve dynamic information.
Previously it was static and probably imprecise data.

Reviewed By: NoQ

Differential Revision: https://reviews.llvm.org/D69599

[mlir][spirv] Add GroupNonUniform min and max operations.

Add GroupNonUniform atihtmetic operations: FMax, FMin, SMax, SMin,
UMax, UMin.

Differential Revision: https://reviews.llvm.org/D73563

[gn build] Port 601687bf731

[analyzer] DynamicSize: Remove 'getExtent()' from regions

Summary:
This patch introduces a placeholder for representing the dynamic size of
regions. It also moves the `getExtent()` method of `SubRegions` to the
`MemRegionManager` as `getStaticSize()`.

Reviewed By: NoQ

Differential Revision: https://reviews.llvm.org/D69540

Bring back the tests for update_cc_tests_checks.py

The tests were removed in 287307a0c60b68099d5f9dd22ac1db2a42593533 to
avoid a dependency on python3. update_cc_tests_checks.py also works with
python2 so restore the tests without the python3 dependency.

[PowerPC][Future] Branch Distance Estimation For Prefixed Instructions

By adding the prefixed instructions the branch distances are no longer
computed correctly. Since prefixed instructions cannot cross a 64 byte
boundary we have to assume that a prefixed instruction may have a nop
prepended to it. This patch tries to take that nop into consideration
when computing the size of basic blocks.

Differential Revision: https://reviews.llvm.org/D72572

[InstCombine][DebugInfo] Fold constants wrapped in metadata

Summary:
When constant folding, constants that are wrapped in metadata were not
folded. This could lead to dbg.values being the only user of a constant
expression, due to the non-dbg uses having been rewritten, resulting in
the constant later on being removed by some other pass. This occurred
with the attached test case, in which the non-rewritten GEP in the
dbg.value intrinsic was later on removed by globalopt.

This patch makes the code look through metadata and fold such constants.

I guess that we in the future may want to allow dbg.values using GEPs and
other constant expressions to be emittable even if there are no non-dbg
uses, but for example SelectionDAG does not support that.

Reviewers: jmorse, aprantl, vsk, davide

Reviewed By: aprantl, vsk, davide

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D73630

AMDGPU/GlobalISel: Don't use pointless getConstantVRegVal

This is always a G_CONSTANT already

Changed wrong ROCDL instructions in GPU lowering.

Summary:
In the scope of the lowering phase from GPU to ROCDL, the intructions for the conversion patterns seems to be wrong.
According to https://github.com/ROCm-Developer-Tools/HIP/blob/master/include/hip/hcc_detail/math_fwd.h the instructions need two underscores in the beginning instead of one.

Reviewers: nicolasvasilache, herhut, rriddle

Reviewed By: herhut, rriddle

Subscribers: merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, liufengdb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73535

Fix helptext for opt/llc after 14fc20ca6

The commit https://reviews.llvm.org/rG14fc20ca6 added some options to the X86
back end that cause the help text for opt/llc to become much harder to read.
The issue is that the cl::value_desc is part of the option name and is used to
compute the indentation of the description text (i.e. the maximum length option
name is what everything aligns to). Since the commit puts a large number of
characters into that text, everything is aligned to that width.

This patch just reformats the option so that the description is contained in the
description and the list of possible values is within the angle brackets.

Note: the readability issue of the helptext was fixed in commit
70cbf8c71c510077baadcad305fea6f62e830b06, but the re-formatting wasn't
added on that commit so I am still committing this.

Differential revision: https://reviews.llvm.org/D73267

Drop arm triple from test/CodeGen/AArch64/global-merge-hidden-minsize.ll

Because it's in the AArch64/ directory, it runs in cases where the arm
target may not be available, see comment on D73235.

[FPEnv][AArch64] Add lowering and instruction selection for strict conversions

Strict fp-to-int and int-to-fp conversions can be handled in the same way that
the non-strict versions are (by using the appropriate instruction or converting
to a function call when we have no instruction).

Differential Revision: https://reviews.llvm.org/D73625

GlobalISel: Implement s32->s64 G_FPTOSI lowering

Port directly from DAG version.

The lowering for G_FPTOUI used to fail on AMDGPU because it uses
G_FPTOSI.

AMDGPU/GlobalISel: Handle s64->s64 G_FPTOSI/G_FPTOUI

[clang-format] Improve support for multiline C# strings

Reviewers: krasimir

Reviewed By: krasimir

Tags: #clang-format

Differential Revision: https://reviews.llvm.org/D73622

AMDGPU/GlobalISel: Custom lower G_LOG/G_LOG10

I'm pretty sure this is wrong and we should expand these in a correct
way, but this matches the existing behavior.

AMDGPU/GlobalISel: Legalize unpacked d16 image operations

On targets that don't have the normal packed f16 layout, handle these
during legalization. Directly modify the register types. We can infer
this was a d16 load based on the mem operand size during selection.

A16 operands should possibly be handled here as well, but don't worry
about that yet.

AMDGPU/GlobalISel: Only map VOP operands to VGPRs

This trivially avoids violating the constant bus restriction.

Previously this was allowing one SGPR in the first source
operand, which technically also avoided violating this for most
operations (but not for special cases reading vcc).

We do need to write some new, smarter operand folds to pick the
optimal SGPR to use in some kind of post-isel fold, but that's purely
an optimization.

I was originally thinking we would pick which operands should be SGPRs
in RegBankSelect, but I think this isn't really manageable. There
would be additional complexity to handle every G_* instruction, and
then any nontrivial instruction patterns would need to know when to
avoid violating it, which is likely to be very error prone.

I think having all inputs being canonically copies to VGPRs will
simplify the operand folding logic. The current folding we do is
backwards, and only considers one operand at a time, relative to
operands it already has. It therefore poorly handles the case where
there is already a constant bus operand user. If all operands are
copies, it's somewhat simpler to consider all input operands at once
to choose the optimal constant bus user.

Since the failure mode for constant bus violations is now a verifier
error and not an selection failure, this moves towards a place where
we can turn on the fallback mode. The SGPR copy folding optimizations
can be left for later.

[GlobalISel] (fix) Use pointer type size for offset constant when lowering stores

Commit 9965b12fd1b was supposed to change the offset constant when
lowering load/stores, but only introduced this change for loads. This
patch adds the same fix for stores.

test-release.sh: Add MLIR to the projects list

AMDGPU/GlobalISel: Select llvm.amdgcn.buffer.atomic.cmpswap

[Linalg] Format Linalg/fusion.mlir.

Differential Revision: https://reviews.llvm.org/D73689

Activate extension loading test on Darwin now that the underlying fix has landed

Original bug fixed by ab2300bc154f7bed43f85f74fd3fe31be71d90e0

[gn build] Port f00be8da62b

[PowerPC][Future] Prefixed Instructions 64 Byte Boundary Support

A known limitation for Future CPU is that the new prefixed instructions may
not cross 64 Byte boundaries.

All instructions are already 4 byte aligned so the only situation where this
can occur is when the prefix is in one 64 byte block and the instruction that
is prefixed is at the top of the next 64 byte block. To fix this case
PPCELFStreamer was added to intercept EmitInstruction. When a prefixed
instruction is emitted we try to align it to 64 Bytes by adding a maximum of
4 bytes. If the prefixed instruction crosses the 64 Byte boundary then the
alignment would trigger and a 4 byte nop would be added to push the
instruction into the next 64 byte block.

Differential Revision: https://reviews.llvm.org/D72570

[FPEnv][AArch64] Add lowering and instruction selection for STRICT_FP_ROUND

This gets selected to the appropriate fcvt instruction. Handling from there on
isn't fully correct yet, as we need to model fcvt reading and writing to fpsr
and fpcr.

Differential Revision: https://reviews.llvm.org/D73201

[AVR] Recognize the AVR architecture in lldb

This commit adds AVR support to lldb. With this change, it can load a
binary and do basic things like dump a line table.

Not much else has been implemented, that should be done in later
changes.

Differential Revision: https://reviews.llvm.org/D73539

[libc++] [P0325] Implement to_array from LFTS with updates.

Summary:
This patch implements https://wg21.link/P0325.
Please mind that at it is my first contribution to libc++, so I may have forgotten to abide to some conventions.

Reviewers: EricWF, mclow.lists, ldionne, lichray

Reviewed By: ldionne, lichray

Subscribers: lichray, dexonsmith, zoecarver, christof, ldionne, libcxx-commits

Tags: #libc

Differential Revision: https://reviews.llvm.org/D69882

[DAGCombiner] ISD::AND/OR/XOR - use general SelectionDAG::FoldConstantArithmetic

This handles all the constant splat / opaque testing for us.

[DAGCombiner] ISD::SDIV/UDIV/SREM/UREM - use general SelectionDAG::FoldConstantArithmetic

This handles all the constant splat / opaque testing for us.

[MLIR] Added llvm.invoke and llvm.landingpad

Summary:
I have tried to implement `llvm.invoke` and `llvm.landingpad`.

  # `llvm.invoke` is similar to `llvm.call` with two successors added, the first one is the normal label and the second one is unwind label.
  # `llvm.launchpad` takes a variable number of args with either `catch` or `filter` associated with them. Catch clauses are not array types and filter clauses are array types. This is same as the criteria used by LLVM (https://github.com/llvm/llvm-project/blob/4f82af81a04d711721300f6ca32f402f2ea6faf4/llvm/include/llvm/IR/Instructions.h#L2866)

Examples:
LLVM IR
```
define i32 @caller(i32 %a) personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
    invoke i32 @foo(i32 2) to label %success unwind label %fail

  success:
    ret i32 2

  fail:
    landingpad {i8*, i32} catch i8** @_ZTIi catch i8** null catch i8* bitcast (i8** @_ZTIi to i8*) filter [1 x i8] [ i8 1 ]
    ret i32 3
}
```
MLIR LLVM Dialect
```
llvm.func @caller(%arg0: !llvm.i32) -> !llvm.i32 {
  %0 = llvm.mlir.constant(3 : i32) : !llvm.i32
  %1 = llvm.mlir.constant("\01") : !llvm<"[1 x i8]">
  %2 = llvm.mlir.addressof @_ZTIi : !llvm<"i8**">
  %3 = llvm.bitcast %2 : !llvm<"i8**"> to !llvm<"i8*">
  %4 = llvm.mlir.null : !llvm<"i8**">
  %5 = llvm.mlir.addressof @_ZTIi : !llvm<"i8**">
  %6 = llvm.mlir.constant(2 : i32) : !llvm.i32
  %7 = llvm.invoke @foo(%6) to ^bb1 unwind ^bb2 : (!llvm.i32) -> !llvm.i32
^bb1: // pred: ^bb0
  llvm.return %6 : !llvm.i32
^bb2: // pred: ^bb0
  %8 = llvm.landingpad (catch %5 : !llvm<"i8**">) (catch %4 : !llvm<"i8**">) (catch %3 : !llvm<"i8*">) (filter %1 : !llvm<"[1 x i8]">) : !llvm<"{ i8*, i32 }">
  llvm.return %0 : !llvm.i32
}
```

Signed-off-by: Shraiysh Vaishay <cs17btech11050@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D72006

[ARM][LowOverheadLoops] Skip debug values

While iterating through the loop, don't inspect any dbg values.

Differential Revision: https://reviews.llvm.org/D73688

[yaml2obj] - Add a way to set sh_entsize for relocation sections.

We are missing ability to override the sh_entsize field for
SHT_REL[A] sections. It would be useful for writing test cases.

Differential revision: https://reviews.llvm.org/D73621

[clangd] Make go-to-def jumps to overriden methods on `final` specifier.

Reviewers: sammccall

Reviewed By: sammccall

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D73690

Add 'gpu.terminator' operation.

Summary:
The 'gpu.terminator' operation is used as the terminator for the
regions of gpu.launch. This is to disambugaute them from the
return operation on 'gpu.func' functions.

This is a breaking change and users of the gpu dialect will need
to adapt their code when producting 'gpu.launch' operations.

Reviewers: nicolasvasilache

Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73620

[llvm-readobj] - Improve error message reported by DynRegionInfo.

DynRegionInfo is a helper class used to create memory ranges.
It is used for many things and can report errors.
Errors reported currently do not provide a good diagnostic.
This patch fixes it and adds a test for each possible case.

Differential revision: https://reviews.llvm.org/D73484

[clangd] Log directory when a CDB is loaded

Summary: Fixes https://github.com/clangd/clangd/issues/268

Reviewers: sammccall

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D73628

[lldb][NFC] Remove TypeSystemClang::GetASTContext calls in IRForTarget

Similar to previous commits, this just replaces the lookup in the
global map with the reference to the TypeSystemClang instance we already
have in this context.

[llvm-readobj] - Add a few warnings for --gnu-hash-table.

The current implementation stops dumping in case of a single error
it handles, though we can continue dumping.
This patch refines it: it adds a few warnings and a few test cases.

Differential revision: https://reviews.llvm.org/D73269

[clangd] Bump vscode-clangd v0.0.20

CHANGELOG:
- update lsp dependencies to pickup the latest LSP 3.15
- better support for cu files

Add lowering of STRICT_FSETCC and STRICT_FSETCCS

These become STRICT_FCMP and STRICT_FCMPE, which then get selected to the
corresponding FCMP and FCMPE instructions, though the handling from there on
isn't fully correct as we don't model reads and writes to FPCR and FPSR.

Differential Revision: https://reviews.llvm.org/D73368

[clangd][vscode] Get rid of the deprecated vscode module in the extension.

Summary:
The vscode module has been deprecated, and it doesn't work anymore after
we require the minimal VSCode version 1.41.0, this patch migrate to the
new @type/vscode and vscode-test modules, see
https://code.visualstudio.com/api/working-with-extensions/testing-extension#migrating-from-vscode

Reviewers: sammccall

Subscribers: dschuff, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D73624

[ASTMatchers] Add hasPlacementArg and hasAnyPlacementArg traversal matcher for CXXNewExpr

Summary: Adds new traversal matchers called `hasPlacementArg` and `hasAnyPlacementArg` that matches on arguments to `placement new` operators.

Reviewers: aaron.ballman

Reviewed By: aaron.ballman

Subscribers: merge_guards_bot, mehdi_amini, hiraditya, steven_wu, dexonsmith, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D73562

AMDGPU: Fix AMDGPUUnifyDivergentExitNodes with no normal returns

Summary:
The code was assuming in a few places that if there was only one exit
from the function that it was a normal return, which is invalid. It
could be an infinite loop, in which case we still need to insert the
usual fake edge so that the null export happens. This fixes shaders that
end with an infinite loop that discards.

Reviewers: arsenm, nhaehnle, critson

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71192

[InstCombine][AMDGPU] Trim components of s_buffer_load

Summary:
Add trimming of unused components of s_buffer_load.

For s_buffer_load and unformatted buffer_load also trim unused
components at the beginning of vector and update offset accordingly.

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71785

[DebugInfo] Fix DebugLine::Prologue::getLength

The function a) returned 32-bits when in DWARF64, the PrologueLength
field is 64-bits in size, and b) didn't work for DWARF version 5.

Also deleted some related dead code. With this deletion, getLength is
itself dead, but another change is about to make use of it.

Reviewed by: probinson

Differential Revision: https://reviews.llvm.org/D73626

Inline debug variable.

Summary:
In a release build this variable becomes unused and may break the build
with `-Werror,-Wunused-variable`.

Reviewers: gribozavr2, jdoerfert, sstefan1

Reviewed By: gribozavr2

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73683

[X86][Sched] A bunch of fixes to the Zen2 sched model latencies.

Summary:
As determined with `llvm-exegesis`.

Some of these look like typos/misunderstandings of the sched model td
spec:
  - latency defaults to `1` when not set => Maybe we can avoid
    having a default ?
  - problems with regexps not being anchored by default (XCHG matching
    CMPXHG)

Note that this is not complete, it fixes only the most obvious mistakes,
and only for latency (not uops).

Reviewers: RKSimon, GGanesh

Subscribers: hiraditya, jfb, mstojanovic, hfinkel, craig.topper, andreadb, lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73172

[ARM][LowOverheadLoops] Check scalar predicates

When trying to remove the loop iteration count, check that the
instruction will always execute.

Differential Revision: https://reviews.llvm.org/D73682

[InstCombine] Update SimplifyCFG test

This test also runs -instcombine. Here the operands in an or chain
have been reassociated.

[InstCombine] Add SetVector.h include

Hopefully fixes the build for examples.

[InstCombine] Process newly inserted instructions in the correct order

InstCombine operates on the basic premise that the operands of the
currently processed instruction have already been simplified. It
achieves this by pushing instructions to the worklist in reverse
program order, so that instructions are popped off in program order.
The worklist management in the main combining loop also makes sure
to uphold this invariant.

However, the same is not true for all the code that is performing
manual worklist management. The largest problem (addressed in this
patch) are instructions inserted by InstCombine's IRBuilder. These
will be pushed onto the worklist in order of insertion (generally
matching program order), which means that a) the users of the
original instruction will be visited first, as they are pushed later
in the main loop and b) the newly inserted instructions will be
visited in reverse program order.

This causes a number of problems: First, folds operate on instructions
that have not had their operands simplified, which may result in
optimizations being missed (ran into this in
https://reviews.llvm.org/D72048#1800424, which was the original
motivation for this patch). Additionally, this increases the amount
of folds InstCombine has to perform, both within one iteration, and
by increasing the number of total iterations.

This patch addresses the issue by adding a Worklist.AddDeferred()
method, which is used for instructions inserted by IRBuilder. These
will only be added to the real worklist after the combine finished,
and in reverse order, so they will end up processed in program order.
I should note that the same should also be done to nearly all other
uses of Worklist.Add(), but I'm starting with just this occurrence,
which has by far the largest test fallout.

Most of the test changes are due to
https://bugs.llvm.org/show_bug.cgi?id=44521 or other cases where
we don't canonicalize something. These are neutral. One regression
has been addressed in D73575 and D73647. The remaining regression
in an shl+sdiv fold can't really be fixed without dropping another
transform, but does not seem particularly problematic in the first
place.

Differential Revision: https://reviews.llvm.org/D73411