review.tizen.org Git - platform/upstream/llvm.git/log

[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends.

Generalize the return value of tryToCreateWidenRecipe to return either a
newly create recipe or an existing VPValue. Use this to avoid creating
unnecessary VPBlendRecipes.

Fixes PR44800.

[AMDGPU][SelectionDAG] Don't combine uniform multiplies to MUL_[UI]24

Prefer to keep uniform (non-divergent) multiplies on the scalar ALU when
possible. This significantly improves some game cases by eliminating
v_readfirstlane instructions when the result feeds into a scalar
operation, like the address calculation for a scalar load or store.

Since isDivergent is only an approximation of whether a value is in
SGPRs, it can potentially regress some situations where a uniform value
ends up in a VGPR. These should be rare in real code, although the test
changes do contain a number of examples.

Most of the test changes are just using s_mul instead of v_mul/mad which
is generally better for both register pressure and latency (at least on
GFX10 where sgpr pressure doesn't affect occupancy and vector ALU
instructions have significantly longer latency than scalar ALU). Some
R600 tests now use MULLO_INT instead of MUL_UINT24.

GlobalISel appears to handle more scenarios in the desirable way,
although it can also be thrown off and fails to select the 24-bit
multiplies in some cases.

Alternative solution considered and rejected was to allow selecting
MUL_[UI]24 to S_MUL_I32. I've rejected this because the definition of
those SD operations works is don't-care on the most significant 8 bits,
and this fact is used in some combines via SimplifyDemandedBits.

Based on a patch by Nicolai Hähnle.

Differential Revision: https://reviews.llvm.org/D97063

[JumpThreading] Update computeValueKnownInPredecessors to recognize logical and/or patterns

This allows JumpThreading's computeValueKnownInPredecessors to
recognize select form of and/or patterns as well.

[AMDGPU] Rename a prefix for sanity. NFC.

Add @llvm.coro.async.size.replace intrinsic.

The new intrinsic replaces the size in one specified AsyncFunctionPointer with
the size in another. This ability is necessary for functions which merely
forward to async functions such as those defined for partial applications.

Reviewed By: aschwaighofer

Differential Revision: https://reviews.llvm.org/D97229

[Driver][NFC] Add explicit break to final case

[libcxx] [test] Define _CRT_STDIO_ISO_WIDE_SPECIFIERS while building tests

This matches how libc++ itself is built. This avoids errors due to
mismatch if linking libc++ statically.

Differential Revision: https://reviews.llvm.org/D97169

[clang-tidy] Remove IncludeInserter from MoveConstructorInit check.

This check registers an IncludeInserter, however the check itself doesn't actually emit any fixes or includes, so the inserter is redundant.

From what I can tell the fixes were removed in D26453(rL290051) but the inserter was left in, probably an oversight.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D97243

[clang][SVE] Don't warn on vector to sizeless builtin implicit conversion

This commit prevents warnings from -Wconversion when a clang vector type
is implicitly converted to a sizeless builtin type -- for example, when
implicitly converting a fixed-predicate to a scalable predicate.

The code below:

     1    #include <arm_sve.h>
     2
     3    #define N __ARM_FEATURE_SVE_BITS
     4    #define FIXED_ATTR __attribute__((arm_sve_vector_bits (N)))
     5    typedef svbool_t fixed_svbool_t FIXED_ATTR;
     6
     7    inline fixed_svbool_t foo(fixed_svbool_t p) {
     8      return svnot_z(svptrue_b64(), p);
     9    }

would previously raise this warning:

    warning: implicit conversion turns vector to scalar: \
    'fixed_svbool_t' (vector of 8 'unsigned char' values) to 'svbool_t' \
    (aka '__SVBool_t') [-Wconversion]

Note that many cases of these implicit conversions were already
permitted because many functions inside arm_sve.h are spawned via
preprocessor macros, and the call to isInSystemMacro would cover us in
this case. This commit fixes the remaining cases.

Differential Revision: https://reviews.llvm.org/D97053

[clang-tidy] Extending bugprone-signal-handler with POSIX functions.

An option is added to the check to select wich set of functions is
defined as asynchronous-safe functions.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D90851

[lldb] [test] Un-XFAIL TestBuiltinTrap on FreeBSD/aarch64

[lldb] [test] Un-XFAIL a test that no longer fail on FreeBSD

[X86] Cleanup overflow test check prefixes. NFCI.

Tidy up the check prefixes to improve reuse.

[AMDGPU] Use divergent addresses for vector loads

Change some test cases to use divergent addresses for vector loads,
which should be the common case in real world code. Using uniform
addresses causes poor instruction selection for the surrounding
code which has to be fixed up post-register-allocation, and this causes
a lot of testsuite churn for a forthcoming patch to stop selecting
24-bit vector multiply instructions for uniform multiplies.

This shows up some problems in the idot tests where we fail to select
v_dot instructions because the patterns only match MUL_[UI]24 ISD nodes,
but the DAG contains i16 mul nodes instead.

Differential Revision: https://reviews.llvm.org/D97062

[ARM] do not consider sp as deprecated for ldm/stm

Early versions of the ARMv7 reference manuals considered the sp register
as a deprecated register for ldm/stm familiy of instructions. However,
later versions such as ARM DDI 0406C.d added a note to the Appendix:

D9.3 Use of the SP as a general-purpose register
Most ARM instructions, unlike Thumb instructions, provide exactly the
same access to the SP as to R0-R12. This means that it is possible to
use the SP as a general-purpose register. Earlier issues of this manual
deprecated the use of SP in an ARM instruction, in any way that is
deprecated, not permitted, or not possible in the corresponding
Thumb instruction. However, user feedback indicates a number of cases
where these instructions are useful. Therefore, ARM no longer deprecates
these instruction uses.
Also Armv8 manuals no longer consider SP as deprecated register for ldm/
stm A32 instructions.

Furthermore, GNU as also does not print a deprecated warning when using
SP with those instructions.

Drop deprecation warning for pop/ldm/push/stm instructions.

Patch by: Stefan Agner.

Differential Revision: https://reviews.llvm.org/D82692

[TTI] Change getOperandsScalarizationOverhead to take Type args

As a followup to D95291, getOperandsScalarizationOverhead was still
using a VF as a vector factor if the arguments were scalar, and would
assert on certain matrix intrinsics with differently sized vector
arguments. This patch removes the VF arg, instead passing the Types
through directly. This should allow it to more accurately compute the
cost without having to guess at which operands will be vectorized,
something difficult with more complex intrinsics.

This adjusts one SVE test as it is now calling the wrong intrinsic vs
veccall. Without invalid InstructCosts the cost of the scalarized
intrinsic is too low. This should get fixed when the cost of
scalarization is accounted for with scalable types.

Differential Revision: https://reviews.llvm.org/D96287

[CostModel] Remove VF from IntrinsicCostAttributes

getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing many backends end up getting it wrong.

Instead of trying work with those two values separately this removes the
VF parameter, widening the RetTy/ArgTys by VF used called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

Most backends look like this will be an improvement (or were not using
getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf36af5c88c working. ARM removed the fix in
dfac521da1b90db683, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291

[clang-tidy] Update checks list.

[clang][parse][NFC] Remove dead ProhibitAttributes() call

GNU-style attribute in enum bodies are allowed (and used by several
tests), and this call to ProhibitAttributes() was dead code.

Differential Revision: https://reviews.llvm.org/D97271

[clang-tidy] Install run-clang-tidy.py in bin/ as run-clang-tidy

The run-clang-tidy.py helper script is supposed to be used by the
user, hence it should be placed in the user's PATH. Some
distributions, like Gentoo [1], won't have it in PATH unless it is
installed in bin/.

Furthermore, installed scripts in PATH usually do not carry a filename
extension, since there is no need to know that this is a Python
script. For example Debian and Ubuntu already install this script as
'run-clang-tidy' [2] and hence build systems like Meson also look for
this name first [3]. Hence we install run-clang-tidy.py as
run-clang-tidy, as suggested by Sylvestre Ledru [4].

1: https://bugs.gentoo.org/753380
2: https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/blob/60aefb14171ab5c3867a0081844b507fc9f6e015/debian/clang-tidy-X.Y.links.in#L2
3: https://github.com/mesonbuild/meson/blob/b6dc4d5e5c6e838de0b52e62d982ba2547eb366d/mesonbuild/scripts/clangtidy.py#L44
4: https://reviews.llvm.org/D90972#2380640

Reviewed By: sylvestre.ledru, JonasToth

Differential Revision: https://reviews.llvm.org/D90972

[DSE] Allow ptrs defined in the entry block in IsGuaranteedLoopInvariant.

The **IsGuaranteedLoopInvariant** function is making sure to check if the
incoming pointer is guaranteed to be loop invariant, therefore I think
the case where the pointer is defined in the entry block of a function
automatically guarantees the pointer to be loop invariant, as the entry
block of a function cannot have predecessors or be part of a loop.

I implemented this small patch and tested it using
**ninja check-llvm-unit** and **ninja check-llvm**. I added a contained test
file that shows the problem and used **opt -O3 -debug** on it to make sure
the case is not currently handled (in fact the debug log is showing that
the DSE pass is bailing out when testing if the killer store is able to
clobber the dead store).

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D96979

[RISCV] vle1.v/vse1.v should be unmasked instructions.

vle1.v/vse1.v should be unmasked instructions. The vm encoding is 1 for
unmasked instructions.

Differential Revision: https://reviews.llvm.org/D97237

[OpenCL][Docs] Change description for the OpenCL standard headers.

After updating the user interface in D96515, update the docs
reflecting the new approach.

Tags: #clang

Differential Revision: https://reviews.llvm.org/D96616

[mlir][Linalg] Retire hoistViewAllocOps.

This transformation was only used for quick experimentation and is not general enough.
Retire it.

Differential Revision: https://reviews.llvm.org/D97266

Fix Wdocumentation parameter warning. NFCI.

[mlir] NFC - Use declarative assembly for scf::YieldOp

[lldb][NFC] Remove unused ValueObject::LogValueObject functions

Those functions aren't called anywhere. For debugging purposes we usually
have Dump() methods (which already exist in some semi-functional form in
ValueObject).

[Support] Add reserve() method to the raw_ostream.

If resulting size of the output stream is already known,
then the space for stream data could be preliminary
allocated in some cases. f.e. raw_string_ostream could
preallocate the space for the target string(it allows
to avoid reallocations during writing into the stream).

Differential Revision: https://reviews.llvm.org/D91693

[lldb][NFC] Clean up ValueObject comments

* Remove commented out code.
* Doxygenify comments that serve as documentation.
* Use the LLVM comment style where possible.

[ARM] Add pre/post inc tests of various sizes. NFC

Revert "[WebAssembly] call_indirect issues table number relocs"

This reverts commit 861dbe1a021e6439af837b72b219fb9c449a57ae. It broke
emscripten -- see https://reviews.llvm.org/D90948#2578843.

[RISCV] Support insertion of misaligned subvectors

This patch extends the support for RVV INSERT_SUBVECTOR to cover those
which don't align to a vector register boundary. Like the support for
EXTRACT_SUBVECTOR in D96959, it accomplishes this by extracting the
nearest register-sized subvector (a subregister operation), then sliding
the vector down with VSLIDEDOWN, inserting the subvector to the first
position, and sliding the vector back up again afterwards.

Unlike subvector extraction, for vectors that occupy less than a full
vector register we must preserve the untouched elements. We do this by
lowering to an LMUL=1 INSERT_SUBVECTOR using the above method and
lowering that to a VSLIDEUP with a zero offset. This uses a
tail-undisturbed policy and so has the effect of "sliding in" the
subvector elements while preserving the surrounding ones.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96972

Fix unused variable

[OpenCL] Move remaining defines to opencl-c-base.h

Move any remaining preprocessor defines from `opencl-c.h` to
`opencl-c-base.h`, such that they are shared with
`-fdeclare-opencl-builtins` too.

In particular, move:
- the `as_type` and `as_typen` definitions, and
- the `kernel_exec` and `__kernel_exec` definitions.

Also clang-format the changes.

Differential Revision: https://reviews.llvm.org/D96948

[lldb][NFC] Give CompilerType's IsArrayType/IsVectorType/IsBlockPointerType out-parameters default values

We already do this for most functions that have out-parameters, so let's do
the same here and avoid all the `nullptr, nullptr, nullptr` in every call.

Fix UBSAN in __ubsan::Value::getSIntValue

/home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77:25: runtime error: left shift of 0x0000000000000000fffffffffffffffb by 96 places cannot be represented in type '__int128'
    #0 0x7ffff754edfe in __ubsan::Value::getSIntValue() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77
    #1 0x7ffff7548719 in __ubsan::Value::isNegative() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.h:190
    #2 0x7ffff7542a34 in handleShiftOutOfBoundsImpl /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:338
    #3 0x7ffff75431b7 in __ubsan_handle_shift_out_of_bounds /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:370
    #4 0x40067f in main (/home/marxin/Programming/testcases/a.out+0x40067f)
    #5 0x7ffff72c8b24 in __libc_start_main (/lib64/libc.so.6+0x27b24)
    #6 0x4005bd in _start (/home/marxin/Programming/testcases/a.out+0x4005bd)

Differential Revision: https://reviews.llvm.org/D97263

[Sanitizer][NFC] Fix typo

[lldb][NFC] Don't inherit from UserID in ValueObject

ValueObject inherits from UserID which is just a bad idea:

* The inheritance gives ValueObject some member functions that are at best
  misleading (such as `Clear()` which doesn't clear any value beside `id`).

* It allows passing ValueObject to the overloaded operators for UserID (such as
  `==` or `<<` which won't actually compare or print anything in the ValueObject).

* It exposes the `SetID` and `Clear` which both allow users to change the
  internal id value.

Similar to D91699 which did the same for Process

Reviewed By: #lldb, JDevlieghere

Differential Revision: https://reviews.llvm.org/D97205

[X86] Support amx-int8 intrinsic.

Adding support for intrinsics of TDPBSUD/TDPBUSD/TDPBUUD.

Differential Revision: https://reviews.llvm.org/D97259

[mlir] Add support for DebugCounters using the new DebugAction infrastructure

DebugCounters allow for selectively enabling the execution of a debug action based upon a "counter". This counter is comprised of two components that are used in the control of execution of an action, a "skip" value and a "count" value. The "skip" value is used to skip a certain number of initial executions of a debug action. The "count" value is used to prevent a debug action from executing after it has executed for a set number of times (not including any executions that have been skipped). For example, a counter for a debug action with `skip=47` and `count=2`, would skip the first 47 executions, then execute twice, and finally prevent any further executions.

This is effectively the same as the DebugCounter infrastructure in LLVM, but using the DebugAction infrastructure in MLIR. We can't simply reuse the DebugCounter support already present in LLVM due to its heavy reliance on global constructors (which are not allowed in MLIR). The DebugAction infrastructure already nicely supports the debug counter use case, and promotes the separation of policy and mechanism design philosophy.

Differential Revision: https://reviews.llvm.org/D96395

[mlir] Add a new debug action framework.

This revision adds the infrastructure for `Debug Actions`. This is a DEBUG only
API that allows for external entities to control various aspects of compiler
execution. This is conceptually similar to something like DebugCounters in LLVM, but at a lower level. This framework doesn't make any assumptions about how the higher level driver is controlling the execution, it merely provides a framework for connecting the two together. This means that on top of DebugCounter functionality, we could also provide more interesting drivers such as interactive execution. A high level overview of the workflow surrounding debug actions is
shown below:

*   Compiler developer defines an `action` that is taken by the a pass,
    transformation, utility that they are developing.
*   Depending on the needs, the developer dispatches various queries, pertaining
    to this action, to an `action manager` that will provide an answer as to
    what behavior the action should do.
*   An external entity registers an `action handler` with the action manager,
    and provides the logic to resolve queries on actions.

The exact definition of an `external entity` is left opaque, to allow for more
interesting handlers.

This framework was proposed here: https://llvm.discourse.group/t/rfc-debug-actions-in-mlir-debug-counters-for-the-modern-world

Differential Revision: https://reviews.llvm.org/D84986

[clang][DeclPrinter] Pass Context into StmtPrinter whenever possible

ASTContext were only passed to the StmtPrinter in some places, while it
is always available in DeclPrinter. The context is used by StmtPrinter to better
print statements in some cases, like printing constants as written.

Differential Revision: https://reviews.llvm.org/D97043

[lldb][NFC] Cleanup ValueObject construction code

Just code cleanup for ValueObject constructors:

* Use default member initializers where possible.
* Doxygenify the comments for membersa nd constructors where needed.
* Delete the default constructor which isn't defined.
* Initialize the bitfields via a utility struct instead of doing this in the
different constructors.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D97199

[RISCV] Add test case for missed opportunity use bgez for the canonical form X > -1. NFC

[SimplifyCFG] Minor tweaks to the added tests (NFC)

[SimplifyCFG] Add tests for D97244 (NFC)

[flang][NFC] Add source line to lowering TODO messages

- Add a fatal error handler that can print a message with source location
  before aborting.
- Update TODO macro to take an mlir location argument and to use the
  newly introduced fatal error handler.
- Introduce TODO_NOLOC for the few places where no source location is
  easily accessible.

Reviewed By: schweitz

Differential Revision: https://reviews.llvm.org/D97190

[CMake][profile] Don't use `TARGET lld` to avoid ordering issues

Depending on the order in which lld and compiler-rt projects are
processed by CMake, `TARGET lld` might evaluate to `TRUE` or `FALSE`
even though `lld-available` lit stanza is always set because lld is
being built. We check whether lld project is enabled instead which
is used by other compiler-rt tests.

The ideal solution here would be to use CMake generator expressions,
but those cannot be used for dependencies yet, see:
https://gitlab.kitware.com/cmake/cmake/-/issues/19467

Differential Revision: https://reviews.llvm.org/D97256

[MLIR][LinAlg] Start detensoring implementation.

This commit is the first baby step towards detensoring in
linalg-on-tensors.

Detensoring is the process through which a tensor value is convereted to one
or potentially more primitive value(s). During this process, operations with
such detensored operands are also converted to an equivalen form that works
on primitives.

The detensoring process is driven by linalg-on-tensor ops. In particular, a
linalg-on-tensor op is checked to see whether *all* its operands can be
detensored. If so, those operands are converted to thier primitive
counterparts and the linalg op is replaced by an equivalent op that takes
those new primitive values as operands.

This works towards handling github/google/iree#1159.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D96271

[NFC][libc++] Fix _LIBCPP_HAS_BITSCAN64 usage.

Seems line was accidentally left in
llvm-svn: 290924 86eebc5b658b5c2ccf2f4fbc16e8aee9880919a5

Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D97211

[builtins] Replace __SOFT_FP__ with __SOFTFP__

Fix PR46294

Differential Revision: https://reviews.llvm.org/D82014

[docs][ORC] Fix section title and reference.

[SLP][Test] Add test for PR49081.ll

Move the MLIR integration tests as a subdirectory of test (NFC)

This does not change the behavior directly: the tests only run when
`-DMLIR_INCLUDE_INTEGRATION_TESTS=ON` is configured. However running
`ninja check-mlir` will not run all the tests within a single
lit invocation. The previous behavior would wait for all the integration
tests to complete before starting to run the first regular test. The
test results were also reported separately. This change is unifying all
of this and allow concurrent execution of the integration tests with
regular non-regression and unit-tests.

Differential Revision: https://reviews.llvm.org/D97241

[libc][NFC] Eliminate couple of dependencies on llvm/ADT/StringExtras.h.

[BuildLibCalls] Add noundef to allocator fns' size

This is a patch to explicitly mark the size parameter of allocator functions like malloc/realloc/... as noundef.

For C/C++: undef can be created from reading an uninitialized variable or padding.
Calling a function with uninitialized variable is already UB.
Calling malloc with padding value is.. something that's not expected. Padding bits may appear in a coerced aggregate, which doesn't apply to malloc's size.
Therefore, malloc's size can be marked as noundef.

For transformations that introduce malloc/realloc/..: I ran LLVM unit tests with an updated Alive2 semantics, and found no regression, so it seems okay.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97045

Only verify LazyCallGraph under expensive checks

These verify calls are causing a lot of slowdown on some files, up to 8x.
The LazyCallGraph infra has been tested a lot over the years, so I'm fairly confident that we don't always need to run the verifys.

These verifies took >90% of total time in one of the compilations I looked at.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D97225

[Analysis] Use range-based for loops (NFC)

[llvm] Use llvm::drop_begin (NFC)

[Analysis] Use ListSeparator (NFC)

[libc] [Obvious] Fix.

[mlir][pdl][NFC] Extract the execution of each bytecode operation into its own function

This makes the implementation of each bytecode operation much easier to reason about, and lets the compiler decide which implementations are beneficial to inline into the main switch.

Differential Revision: https://reviews.llvm.org/D95716

[mlir][pdl] Fix bug when ordering predicates

We should be ordering predicates with higher primary/secondary sums first, but we are currently ordering them last. This allows for predicates more frequently encountered to be checked first.

Differential Revision: https://reviews.llvm.org/D95715

[GVN] Fix a typo in comment

NFC.

Differential Revision: https://reviews.llvm.org/D97200

Reviewed By: fhahn

Changes to mktime to handle invalid dates, overflow and underflow andcalculating the correct date and thenumber of seconds even if invalid datesare passed as arguments.

Added tests for invalid dates like the following
Date 1970-01-01 00:00:-1 is treated as 1969-12-31 23:59:59 and seconds
are returned for the modified date.

Tested the code by doing ninja check-libc (and cmake).

Reviewed By: sivachandra, rtenneti

Differential Revision: https://reviews.llvm.org/D96684

[dfsan] Propagate origins at non-memory/phi/call instructions

This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D97200

[obj2yaml,yaml2obj] Add NumBlocks to the BBAddrMapEntry yaml field.

As discussed in D95511, this allows us to encode invalid BBAddrMap
sections to be used in more rigorous testing.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D96831

[lldb] add check for libcxx runtime

When enabling LLDB tests with `LLVM_ENABLE_RUNTIMES=libcxx` CMake will
fail with:

```
LLDB test suite requires libc++, but it is currently disabled.
```

The issue is that the targets in LLVM_ENABLE_RUNTIMES are configured
after the targets in LLVM_ENABLE_PROJECTS, so at this point the check
for the `cxx` target will fail. CMake will add a dependency for a target
that does not exist yet however, so by first checking for `libcxx` in
LLVM_ENABLE_RUNTIMES we ensure that the `cxx` target will be present at
build time.

Tested with:
```
% cmake -G Ninja \
    -C ~/local/llvm-project/lldb/cmake/caches/Apple-lldb-macOS.cmake \
    -DLLVM_ENABLE_PROJECTS="clang;lldb" -DLLVM_ENABLE_RUNTIMES="libcxx" \
    -DLIBCXX_INCLUDE_TESTS=NO ~/local/llvm-project/llvm
% ninja check-lldb
```

Reviewed By: smeenai, JDevlieghere

Differential Revision: https://reviews.llvm.org/D97227

[mlir][IR] Refactor the `getChecked` and `verifyConstructionInvariants` methods on Attributes/Types

`verifyConstructionInvariants` is intended to allow for verifying the invariants of an attribute/type on construction, and `getChecked` is intended to enable more graceful error handling aside from an assert. There are a few problems with the current implementation of these methods:
* `verifyConstructionInvariants` requires an mlir::Location for emitting errors, which is prohibitively costly in the situations that would most likely use them, e.g. the parser.
This creates an unfortunate code duplication between the verifier code and the parser code, given that the parser operates on llvm::SMLoc and it is an undesirable overhead to pre-emptively convert from that to an mlir::Location.
* `getChecked` effectively requires duplicating the definition of the `get` method, creating a quite clunky workflow due to the subtle different in its signature.

This revision aims to talk the above problems by refactoring the implementation to use a callback for error emission. Using a callback allows for deferring the costly part of error emission until it is actually necessary.

Due to the necessary signature change in each instance of these methods, this revision also takes this opportunity to cleanup the definition of these methods by:
* restructuring the signature of `getChecked` such that it can be generated from the same code block as the `get` method.
* renaming `verifyConstructionInvariants` to `verify` to match the naming scheme of the rest of the compiler.

Differential Revision: https://reviews.llvm.org/D97100

Revert "[AArch64][GlobalISel] Match G_SHUFFLE_VECTOR -> insert elt + extract elt"

This reverts commit 867e379c0e14527eb7aa68485a10324693e35f5d.

For some reason this is upsetting Linux/Windows bots. Reverting while I try to
reproduce.

[Test][AArch64] Test SADDE/SSUBE/UADDE/USUBE narrowing legalization

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D96676

[AArch64][GlobalISel] Make overflow legalization use clampScalar

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96674

[GlobalISel] Implement narrowScalar for SADDE/SSUBE/UADDE/USUBE

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96673

[GlobalISel] Implement narrowScalar for SADDO/SSUBO

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96672

[GlobalISel] Implement narrowScalar for UADDO/USUBO

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96671

[lldb] Reinstate support for LLDB_VERSION_STRING

Reinstate support for specifying -DLLDB_VERSION_STRING="best-lldb"
which seems to have gotten accidentally removed in the past.

rdar://38983903

Differential revision: https://reviews.llvm.org/D97235

[MacroExpansionContext] Fix a warning.

This patch fixes:

error: private field 'PP' is not used [-Werror,-Wunused-private-field]

[libunwind] unw_* alias fixes for ELF and Mach-O

Rename the CMake option, LIBUNWIND_HERMETIC_STATIC_LIBRARY, to
LIBUNWIND_HIDE_SYMBOLS. Rename the C macro define,
_LIBUNWIND_DISABLE_VISIBILITY_ANNOTATIONS, to _LIBUNWIND_HIDE_SYMBOLS,
because now the macro adds a .hidden directive rather than merely
suppress visibility annotations.

For ELF, when LIBUNWIND_HIDE_SYMBOLS is enabled, mark unw_getcontext as
hidden. This symbol is the only one defined using src/assembly.h's
WEAK_ALIAS macro. Other unw_* weak aliases are defined in C++ and are
already hidden.

Mach-O doesn't support weak aliases, so remove .weak_reference and
weak_import. When LIBUNWIND_HIDE_SYMBOLS is enabled, output
.private_extern for the unw_* aliases.

In assembly.h, add missing SYMBOL_NAME macro invocations, which are
used to prefix symbol names with '_' on some targets.

Fixes PR46709.

Reviewed By: #libunwind, phosek, compnerd, steven_wu

Differential Revision: https://reviews.llvm.org/D93003

[sparse][mlir] simplify lattice optimization logic

Simplifies the way lattices are optimized with less, but more
powerful rules. This also fixes an inaccuracy where too many
lattices resulted (expecting a non-existing universal index).
Also puts no-side-effects on all proper getters and unifies
bufferization flags order in integration tests (for future,
more complex use cases).

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D97134

[JITLink] Add a getFixupAddress convenience method to Block.

[JITLink] Don't allow creation of sections with duplicate names.

[gn build] Port 8f48ddd19358

[X86][AMX] Lower tile copy instruction.

Since there is no tile copy instruction, we need to store tile
register to stack and load from stack to another tile register.
We need extra GR to hold the stride, and we need stack slot to
hold the tile data register. We would run this pass after copy
propagation, so that we don't miss copy optimization. And we
would run this pass before prolog/epilog insertion, so that we
can allocate stack slot.

Differential Revision: https://reviews.llvm.org/D97112

DebugInfo: Emit "LocalToUnit" flag on local member function decls.

Follow-up to fe2dcd89acfd9301a230e38e9030734553baa8dc.

Update test per review comments, restoring the "D" type to its
original state, and adding new "L" type. (Sorry, this was intended to
be included in the prior commit)

Differential Revision: https://reviews.llvm.org/D96044

Add auto-upgrade support for annotation intrinsics

The llvm.ptr.annotation and llvm.var.annotation intrinsics were changed
since the 11.0 release to add an additional parameter. This patch
auto-upgrades IR containing the four-parameter versions of these
intrinsics, adding a null pointer as the fifth argument.

Differential Revision: https://reviews.llvm.org/D95993

[AMDGPU] Move RPT::getLiveRegs() check under EXPENSIVE_CHECKS

This is too expensive even for debug builds. It doubles
scheduling time if enabled.

Differential Revision: https://reviews.llvm.org/D97232

[CMake] Don't optimize tests so much under ThinLTO

This drops check-llvm under -DLLVM_ENABLE_LTO=Thin from 13m to 10m20s on Windows and 20m to 15m35s on Linux.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D96618

[RISCV] Have sexti32 also recognize AssertZExt from types smaller than i32.

An i64 AssertZExt from a type smaller than i32 has at least 33
leading zeros which mean it has at least 33 sign bits.

Since we have a couple patterns that use two sexti32, I've
switched to a ComplexPattern so tablegen didn't have to generate
9 different permutations.

As noted in the FIXME, maybe we should just call computeNumSignBits,
but we don't have tests that benefit from that yet.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D97130

DebugInfo: Emit "LocalToUnit" flag on local member function decls.

Previously, the definition was so-marked, but the declaration was
not. This resulted in LLVM's dwarf emission treating the function as
being external, and incorrectly emitting DW_AT_external.

Differential Revision: https://reviews.llvm.org/D96044

[AArch64][GlobalISel] Match G_SHUFFLE_VECTOR -> insert elt + extract elt

Match a G_SHUFFLE_VECTOR with a mask that allows it to be represented as a
G_INSERT_VECTOR_ELT and a G_EXTRACT_VECTOR_ELT.

This ports `isINSMask` from AArch64ISelLowering and the portion of
`AArch64TargetLowering::LowerVECTOR_SHUFFLE` which handles the equivalent
transformation.

This provides more opportunities for matching DUP. We don't have all of the
necessary combines to actually make DUP out of these yet, but this is better for
size than the full TBL expansion for G_SHUFFLE_VECTOR.

This is a -0.1% code size improvement on CTMark/Bullet at -Os.

IR example: https://godbolt.org/z/sdcevT

Differential Revision: https://reviews.llvm.org/D97214

[ValueTracking] Improve ComputeNumSignBits for SRem.

The result will have the same sign as the dividend unless the
result is 0. The magnitude of the result will always be less
than or equal to the dividend. So the result will have at least
as many sign bits as the dividend.

Previously we would do this if the divisor was a positive constant,
but that isn't required.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97170

scudo: Support memory tagging in the secondary allocator.

This patch enhances the secondary allocator to be able to detect buffer
overflow, and (on hardware supporting memory tagging) use-after-free
and buffer underflow.

Use-after-free detection is implemented by setting memory page
protection to PROT_NONE on free. Because this must be done immediately
rather than after the memory has been quarantined, we no longer use the
combined allocator quarantine for secondary allocations. Instead, a
quarantine has been added to the secondary allocator cache.

Buffer overflow detection is implemented by aligning the allocation
to the right of the writable pages, so that any overflows will
spill into the guard page to the right of the allocation, which
will have PROT_NONE page protection. Because this would require the
secondary allocator to produce a header at the correct position,
the responsibility for ensuring chunk alignment has been moved to
the secondary allocator.

Buffer underflow detection has been implemented on hardware supporting
memory tagging by tagging the memory region between the start of the
mapping and the start of the allocation with a non-zero tag. Due to
the cost of pre-tagging secondary allocations and the memory bandwidth
cost of tagged accesses, the allocation itself uses a tag of 0 and
only the first four pages have memory tagging enabled.

Differential Revision: https://reviews.llvm.org/D93731

Modify TypePrinter to differentiate between anonymous struct and unnamed struct

Currently TypePrinter lumps anonymous classes and unnamed classes in one group "anonymous" this is not correct and can be confusing in some contexts.

Differential Revision: https://reviews.llvm.org/D96807

[clangd] Shutdown sequence for modules, and doc threading requirements

This allows modules to do work on non-TUScheduler background threads.

Differential Revision: https://reviews.llvm.org/D96755

[clangd] Narrow and document a loophole in blockUntilIdle

blockUntilIdle of a parent can't always be correctly implemented as
return ChildA.blockUntilIdle() && ChildB.blockUntilIdle()
The problem is that B can schedule work on A while we're waiting on it.

I believe this is theoretically possible today between CDB and background index.
Modules open more possibilities and it's hard to reason about all of them.

I don't have a perfect fix, and the abstraction is too good to lose. this patch:
- calls out why we block on workscheduler first, and asserts correctness
- documents the issue
- reduces the practical possibility of spuriously returning true significantly

This function is ultimately only for testing, so we're driving down flake rate.

Differential Revision: https://reviews.llvm.org/D96856

[InstrProfiling] Use ELF section groups for counters, data and values

__start_/__stop_ references retain C identifier name sections such as
__llvm_prf_*. Putting these into a section group disables this logic.

The ELF section group semantics ensures that group members are retained
or discarded as a unit. When a function symbol is discarded, this allows
allows linker to discard counters, data and values associated with that
function symbol as well.

Note that `noduplicates` COMDAT is lowered to zero-flag section group in
ELF. We only set this for functions that aren't already in a COMDAT and
for those that don't have available_externally linkage since we already
use regular COMDAT groups for those.

Differential Revision: https://reviews.llvm.org/D96757

[GloblalISel] Support lowering <3 x i8> arguments in multiple parts.

Differential Revision: https://reviews.llvm.org/D97086

[AArch64][GlobalISel] Support lowering <1 x i8> arguments.

We don't yet have working codegen for the resulting unmerges, and if
we did it would probably be horrible.

Differential Revision: https://reviews.llvm.org/D97035

[lld-link] Add /reproduce: support for several flags

/reproduce: now works correctly with:
- /call-graph-ordering-file:
- /def:
- /natvis:
- /order:
- /pdbstream:

I went through all instances of MemoryBuffer::getFile() and made sure
everything that didn't already do so called takeBuffer().

For natvis, that wasn't possible since DebugInfo/PDB wants to take
owernship of the natvis buffer. For that case, I'm manually adding the
tar file entry.

/natvis: and /pdbstream: is slightly awkward, since createResponseFile()
always adds these flags to the response file but createPDB() (which
ultimately adds the files referenced by the flags) is only called if
/debug is also passed. So when using /natvis: without /debug with
/reproduce:, lld won't warn, but when linking using the response
file from the archive, it won't find the natvis file since it's not
in the tar. This isn't a new issue though, and after this patch things
at least work with using /natvis: _with_ debug with /reproduce:.
(Same for /pdbstream:)

Differential Revison: https://reviews.llvm.org/D97212

[WebAssembly] Remap branch dests after fixCatchUnwindMismatches

Fixing catch unwind mismatches can sometimes invalidate existing branch
destinations. This CL remaps those destinations after placing
try-delegates.

Fixes https://github.com/emscripten-core/emscripten/issues/13515.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D97178