review.tizen.org Git - platform/upstream/llvm.git/log

[LV] Allow scalable vectorization with vscale = 1

This change is a bit subtle. If we have a type like <vscale x 1 x i64>, the vectorizer will currently reject vectorization. The reason is that a type like <1 x i64> is likely to get simply rescalarized, and the vectorizer doesn't want to be in the game of simple unrolling.

(I've given the example in terms of 1 x types which use a single register, but the same issue exists for any N x types which use N registers. e.g. RISCV LMULs.)

This change distinguishes scalable types from fixed types under the reasoning that converting to a scalable type isn't unrolling. Because the actual vscale isn't known until runtime, using a vscale type is potentially very profitable.

This makes an important, but unchecked, assumption. Specifically, the scalable type is assumed to only be legal per the cost model if there's actually a scalable register class which is distinct from the scalar domain. This is, to my knowledge, true for all targets which return non-invalid costs for scalable vector ops today, but in theory, we could have a target decide to lower scalable to fixed length vector or even scalar registers. If that ever happens, we'd need to revisit this code.

In practice, this patch unblocks scalable vectorization for ELEN types on RISCV.

Let me sketch one alternate implementation I considered. We could have restricted this to when we know a minimum value for vscale. Specifically, for the default +v extension for RISCV, we actually know that vscale >= 2 for ELEN types. However, doing it this way means we can't generate scalable vectors when using the various embedded vector extensions which have a minimum vscale of 1.

Differential Revision: https://reviews.llvm.org/D128542

Add wait for child processe(s) to exit. (amended+clang-formatted)

It was possible for the parent process to exit before the
forked child process had finished. In some shells, this
causes the pipe to close and FileCheck misses some output
from the child. Waiting for the child process to exit before
exiting the parent, assures that all output from stdout and
stderr is combined and forwarded through the pipe to FileCheck.

rdar://95241490

Differential Revision: https://reviews.llvm.org/D128565

[libc++][lit][AIX] Port tests for getting time to AIX

Summary:
This patch ports libc++ LIT test cases for getting time in various locales to AIX.

Reviewed by: philnik, Mordante, libc++

Differential Revision: https://reviews.llvm.org/D128087

[libc++][lit][AIX] Port tests for money format to AIX

Summary:
This patch ports libc++ LIT test cases for money formats to AIX. On AIX, the money format of locale zh_CN.UTF-8 is the similar to that of en_US.UTF-8, i.e., sign, symbol, none, value.

Reviewed by: Mordante, DiggerLin, libc++

Differential Revision: https://reviews.llvm.org/D128220

[RISCV] Remove a use of getMinVLen in favor of getRealMinVLen

The later is possibly greater than the former, and thus the assert was overly strong when a wider VLEN was set at the command line.

[RISCV] Cost model for scalable reductions

This extends the existing cost model for reductions for scalable vectors.

The existing cost model assumes that reductions are roughly logarithmic in cost for unordered variants and linear for ordered ones. This change keeps that same basic model, and extends it out to the maximum number of elements a scalable vector could possibly have.

This results in costs which aren't terribly high for unordered reductions, but are for ordered ones. This seems about right; we want to strongly bias away from using scalable ordered reductions if the cost might be linear in VL.

Differential Revision: https://reviews.llvm.org/D127447

Revert "[X86] Support `_Float16` on SSE2 and up"

Breaks buildbot
https://lab.llvm.org/buildbot/#/builders/37/builds/14334

This reverts commit f5d781d6273cc56dd8b44ee9e4cfb2ae5579bb04.

[mlir][bufferize] Improve to_tensor/to_memref folding

Differential Revision: https://reviews.llvm.org/D128615

[test] Add workaround for flaky error we see on Windows bots

[NFC][lldb] Correct Module::FindFunctions documentation

Looks like a copy/paste from ModuleList::FindCompileUnits.

Fix sphinx docs build

Fix "Title underline too short."

[Coroutine] Remove the '!func_sanitize' metadata for split functions

There is no proper RTTI for these split functions. So just delete the
metadata.

Fixes https://github.com/llvm/llvm-project/issues/49689.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D116130

[ubsan] Using metadata instead of prologue data for function sanitizer

Information in the function `Prologue Data` is intentionally opaque.
When a function with `Prologue Data` is duplicated. The self (global
value) references inside `Prologue Data` is still pointing to the
original function. This may cause errors like `fatal error: error in backend: Cannot represent a difference across sections`.

This patch detaches the information from function `Prologue Data`
and attaches it to a function metadata node.

This and D116130 fix https://github.com/llvm/llvm-project/issues/49689.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D115844

[clang][dataflow] Add `buildAndSubstituteFlowCondition` to `DataflowEnvironment`

Depends On D128658

Reviewed By: gribozavr2, xazax.hun

Differential Revision: https://reviews.llvm.org/D128659

[clang][dataflow] Do not allow substitution of true/false boolean literals in `buildAndSubstituteFlowCondition`

Reviewed By: gribozavr2, xazax.hun

Differential Revision: https://reviews.llvm.org/D128658

[mlir][sparse] remove redundant whitespace

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D128673

[MLIR][Parser] Fix AffineParser colliding bare identifiers with primitive types

The parser currently can't parse bare identifiers like 'i0' in affine
maps and sets, and similarly ids like f16/f32. But these bare ids are
part of the grammar - although they are primitive types.

```
error: expected bare identifier
set = affine_set<(i0, i1) : ()>
^
```

This patch allows the parser for AffineMap/IntegerSet to parse bare
identifiers as defined by the grammer.

Reviewed By: bondhugula, rriddle

Differential Revision: https://reviews.llvm.org/D127076

[mlir][sparse]more integration test cases for sparse_tensor.BinaryOp

Adding more test cases for sparse_tensor.BinaryOp, including different cases when overlap/left/right region is implemented/empty/identity

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D128383

[Symbolize] Fix MarkupFilter tests for Windows.

The tests use in-band ANSI color codes, while the Windows cmd console
uses an out-of-band interface for color.

[Symbolize] Fix llvm-symbolizer --filter-markup test on Windows.

The tests use in-band ANSI color codes, while the Windows cmd console
uses an out-of-band interface for color.

[BOLT] Restrict icp-inline to callsites

ICP peel for inline mode only makes sense for calls, not jump tables.
Plus, add a check that the Target BinaryFunction is found.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D128404

[mlir][sparse]Add more integration tests for sparse_tensor.unary

Previously, the sparse_tensor.unary integration test does not contain cases with the use of `linalg.index` (previoulsy unsupported), this commit adds test cases that use `linalg.index` operators.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D128460

[gn build] Port eb5af0acf054

[Symbolize] Add log markup --filter to llvm-symbolizer.

This adds a --filter option to llvm-symbolizer. This takes log-bearing
symbolizer markup from stdin and writes a human-readable version to
stdout.

For now, this only implements the "symbol" markup tag; all others are
passed through unaltered. This is a proof-of-concept bit of
functionalty; implement the various tags is more-or-less just a matter
of hooking up various parts of the Symbolize library to the architecture
established here.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D126980

[Docs] Update clang & llvm release notes for HLSL

Adding release note entries for LLVM & Clang to introduce the HLSL &
DirectX support that is being added.

Reviewed By: aaron.ballman, MaskRay

Differential Revision: https://reviews.llvm.org/D127890

[OpenMP] Only strip runtime attributes if needed

Summary:
Currently in OpenMPOpt we strip `noinline` attributes from runtime
functions. This is here because the device bitcode library that we link
has problems with needed definitions getting prematurely optimized out.
This is only necessary for OpenMP offloading to GPUs so we should narrow
the scope for where we spend time doing this. In the future this
shouldn't be necessary as we move to using a linked library rather than
pulling in a bitcode library in Clang.

[libc][docs] Added fmod performance results.

[BOLT][NFC] Add aliases for ICP flags

- `indirect-call-promotion` -> `icp`
- `indirect-call-promotion-mispredict-threshold` -> `icp-mp-threshold`
- `indirect-call-promotion-use-mispredicts` -> `icp-use-mp`
- `indirect-call-promotion-topn` -> `icp-topn`
- `indirect-call-promotion-calls-topn` -> `icp-calls-topn`
- `indirect-call-promotion-jump-tables-topn` -> `icp-jt-topn`
- `icp-jump-table-targets` -> `icp-jt-targets`

This also fixes an inconsistency in ICP flag names that some start with
`indirect-call-promotion` while others start with `icp`.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D128375

[BOLT][NFC] Use llvm::less_first

Follow the case of https://reviews.llvm.org/D126068 and simplify call sites
with `llvm::less_first`.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D128242

[NFC][SVE] Add more tests of vector compares and selects taking an immediate operand.

Increases coverage of predicated compares (int and fp) along with
predicated zeroing of active floating point lanes.

llvm-reduce: Check shouldKeep before trying to reduce operands

No point doing the more complicated check first.

[lldb] Add a log dump command

Add a log dump command to dump logs to a file. This only works for
channels that have a log handler associated that supports dumping. For
now that's limited to the circular log handler, but more could be added
in the future.

Differential revision: https://reviews.llvm.org/D128557

Round up zero-sized symbols to 1 byte in `.debug_aranges` (without breaking other logic).

This commit modifies the AsmPrinter to avoid emitting any zero-sized symbols to
the .debug_aranges table, by rounding their size up to 1. Entries with zero
length violate the DWARF 5 spec, which states:

> Each descriptor is a triple consisting of a segment selector, the beginning
> address within that segment of a range of text or data covered by some entry
> owned by the corresponding compilation unit, followed by the non-zero length
> of that range.

In practice, these zero-sized entries produce annoying warnings in lld and
cause GNU binutils to truncate the table when parsing it.

Other parts of LLVM, such as DWARFDebugARanges in the DebugInfo module
(specifically the appendRange method), already avoid emitting zero-sized
symbols to .debug_aranges, but not comprehensively in the AsmPrinter. In fact,
the AsmPrinter does try to avoid emitting such zero-sized symbols when labels
aren't involved, but doesn't when the symbol to emitted is a difference of two
labels; this patch extends that logic to handle the case in which the symbol is
defined via labels.

Furthermore, this patch fixes a bug in which `available_externally` symbols
would cause unpredictable values to be emitted into the `.debug_aranges` table
under certain circumstances. In practice I don't believe that this caused
issues up until now, but the root cause of this bug--an invalid DenseMap
lookup--triggered failures in Chromium when combined with an earlier version of
this patch. Therefore, this patch fixes that bug too.

This is a revised version of diff D126257, which was reverted due to breaking
tests. The now-reverted version of this patch didn't distinguish between
symbols that didn't have their size reported to the DwarfDebug handler and
those that had their size reported to be zero. This new version of the patch
instead restricts the special handling only to the symbols whose size is
definitively known to be zero.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D126835

[libc++] Add a few missing min/max macro push/pop

Also, improve the test for nasty macros to define min and max, so this
will be caught in the future.

Differential Revision: https://reviews.llvm.org/D128655

[mlir][LLVMIR] Memorize compatible LLVM types

This patch memorize compatible LLVM types in `LLVM::isCompatibleType` in
order to avoid redundant works.

This is especially useful when the size of program is big and there are
multiple occurrences of some deeply nested LLVM struct types, in which
case we can gain quite some speedups with this patch.

Differential Revision: https://reviews.llvm.org/D127918

[mlir][LLVMIR] Add support for va_start/copy/end intrinsics

This patch adds three new LLVM intrinsic operations: llvm.intr.vastart/copy/end.
And its translation from LLVM IR.

This effectively removes a restriction, imposed by 0126dcf1f0a1, where
non-external functions in LLVM dialect cannot be variadic. At that time
it was not clear how LLVM intrinsics are going to be modeled, which
indirectly affects va_start/copy/end, the core intrinsics used in
variadic functions. But since we have LLVM intrinsics as normal
MLIR operations, it's not a problem anymore.

Differential Revision: https://reviews.llvm.org/D127540

[memprof] Return an error for unsupported symbolization.

Add a check to detect that the profiled binary was build with position
independent code. Add a test with a pie binary to which can be reused
later when support is added. Also clean up the error messages with
trailing colons.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D128564

[lldb] [llgs] Skip new vCont test on Windows

Sponsored by: The FreeBSD Foundation

[llvm-ar] Fix MRI ADDLIB command when used with thin archives

We did not properly handle using CREATETHIN in an MRI script and
attempting to use ADDLIB to add the contents of a regular archive. This
fix outputs a meaningful error message in this case and provides some
more testing.

Differential Revision: https://reviews.llvm.org/D128067

Silence an "illegal conversion" diagnostic

MSVC was issuing "illegal conversion; more than one user-defined
conversion has been implicitly applied" as a warning on this code.
Explicitly calling .str() causes a StringRef to be materialized so
that a second user-defined conversion is not required.

[Clang][OpenMP] Claim nowait clause on taskwait

Silence some format specifier warnings

This solves a format specifier warning for char32_t being converted to
an unsigned integer type, and multiple format specifier warnings for
size_t being converted to long.

[libc++][doc] Fixes a broken table entry.

[AMDGPU] Cluster stores as well as loads for GFX11

Differential Revision: https://reviews.llvm.org/D128517

[Driver][test] Add libclang_rt.profile{{.*}}.a tests for NetBSD

Differential Revision: https://reviews.llvm.org/D128620

Adding support for target in_reduction

Implementing target in_reduction by wrapping target task with host task with in_reduction and if clause. This is in compliance with OpenMP 5.0 section: 2.19.5.6.
So, this

```
  for (int i=0; i<N; i++) {
    res = res+i
  }
```

will become

```

   #pragma omp task in_reduction(+:res) if(0)
   #pragma omp target map(res)
   for (int i=0; i<N; i++) {
     res = res+i
   }
```

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D125669

[lldb] [llgs] Support "t" vCont action

Implement support for the "t" action that is used to stop a thread.
Normally this action is used only in non-stop mode. However, there's
no technical reason why it couldn't be also used in all-stop mode,
e.g. to express "resume all threads except ..." (`t:...;c`).

While at it, add a more complete test for vCont correctly resuming
a subset of program's threads.

Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.llvm.org/D126983

[flang][driver] Use `-O{0|1|2|3}` to define LLVM backend pass pipeline

Support for optimisation flags in LLVM Flang was originally added in
https://reviews.llvm.org/D128043. That patch focused on LLVM
middle-end/optimisation pipelines. With this patch, Flang will
additionally configure LLVM backend pass pipelines accordingly. This
behavior is consistent with Clang.

New hook is added to translate compiler optimisation flags (e.g. `-O3`)
into backend optimisation level: `getCGOptLevel`. Identical hooks are
available in Clang and LLVM. In other words, the meaning of these
optimisation flags remains consistent with other sub-projects that use
LLVM backends.

Differential Revision: https://reviews.llvm.org/D128050

[mlir][SCF][bufferize] Small simplification and more comments

Differential Revision: https://reviews.llvm.org/D128651

[GlobalOpt] Fix memset handling in global ctor evaluation (PR55859)

The global ctor evaluator currently handles by checking whether the
memset memory is already zero, and skips it in that case. However,
it only actually checks the first byte of the memory being set.

This patch extends the code to check all bytes being set. This is
done byte-by-byte to avoid converting undef values to zeros in
larger reads. However, the handling is still not completely correct,
because there might still be padding bytes (though probably this
doesn't matter much in practice, as I'd expect global variable
padding to be zero-initialized in practice).

Mostly fixes https://github.com/llvm/llvm-project/issues/55859.

Differential Revision: https://reviews.llvm.org/D128532

[mlir][complex] complex.arg op to calculate the angle of complex number

Add complex.arg op which calculates the angle of complex number. The op name is inspired by the function carg in libm.

See: https://sourceware.org/newlib/libm.html#carg

Differential Revision: https://reviews.llvm.org/D128531

[GlobalOpt] Add tests for memset with non-zero value (NFC)

[mlir][bufferize] Infer memory space in all bufferization patterns

This change updates all remaining bufferization patterns (except for scf.while) and the remaining bufferization infrastructure to infer the memory space whenever possible instead of falling back to "0". (If a default memory space is set in the bufferization options, we still fall back to that value if the memory space could not be inferred.)

Differential Revision: https://reviews.llvm.org/D128423

tsan: add missing guard for DumpProcessMap call

Add a missing "#if !SANITIZER_GO" guard for a call to DumpProcessMap
in the Finalize hook (needed to build an updated Go race detector syso
image).

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D128641

[mlir][bufferize][NFC] Change signature of allocateTensorForShapedValue

Add a failure return value and bufferization options argument. This is to keep a subsequent change smaller.

Differential Revision: https://reviews.llvm.org/D128278

[X86] Support `_Float16` on SSE2 and up

This is split from D113107 to address #56204 and https://discourse.llvm.org/t/how-to-build-compiler-rt-for-new-x86-half-float-abi/63366

Reviewed By: zahiraam, rjmccall, bkramer

Differential Revision: https://reviews.llvm.org/D128571

[libc++][NFC] Remove trailing whitespace

[Clang] Remove unused function declaration after 77475ffd22418ca72.

[libc++] Remove dummy command in Dockerfile

It turns out that the Docker images on CI instances are not updated
based on what's in this file, but instead when a new image is pushed
to ldionne/libcxx-builder on DockerHub. So this is effectively useless.

[mlir][llvm] Add vector insert/extract intrinsics

These intrinsics will be needed to convert between fixed-length vectors
and scalable vectors.

This operation will be needed for VLS (vector-length specific)
vectorization, when interfacing with vector functions or intrinsics that
take scalable vectors as operands in a context where the length of our
vectors is known or assumed at compile time, but we still want to
generate scalable vector instructions.

Differential Revision: https://reviews.llvm.org/D127100

[SPARC] Don't do leaf optimization on procedures with inline assembly

On SPARC, leaf function optimization omits the register window sliding (and the associated register name changes). This might result in miscompilation of procedures containing inline assembly, as some of the register constraints used may interfere with the register usage of optimized functions, so we disable leaf procedure optimization on those procedures to prevent it from happening.

This is a continuation of patch D102342 by @LemonBoy, the original comment is reproduced below:

> Leaf functions allow the compiler to omit the setup and teardown of a frame pointer, therefore avoiding the exchange of the in/out register. According to the SPARC architecture manual every reference to %i0-%i5 should be replaced with %o0-o5, if the target register is already in use a further remapping step to %g1-%g7 is required to free the output register.
>
> Add a simple check to make sure not to stomp on any output register that's already in use.

Reviewed By: dcederman

Differential Revision: https://reviews.llvm.org/D128263

[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records

Currently the a AAPCS compliant frame record is not always created for
functions when it should. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available trough it.

In order to enable the use of AAPCS compliant frame records whilst keep
backwards compatibility, this patch introduces a new command-line option
(`-mframe-chain=[none|aapcs|aapcs+leaf]`) for Aarch32 and Thumb backends.
The option allows users to explicitly select when to use it, and is also
useful to ensure the extra overhead introduced by the frame records is
only introduced when necessary, in particular for Thumb targets.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D125094

[flang][NFC] Add IO lowering tests

These tests were left behind or only partially upstreamed during
the lower code upstreaming.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D128634

Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>

ARM: don't try to load function pointer before long call.

Deciding to load an arbitrary global based on whether the entire module is
being built for long calls is pretty clearly spurious, and in fact the existing
indirect logic is sufficient.

[IndVars] Add test for PR56242 (NFC)

MIR: Fix parse error on empty CustomRegMask

[gn build] Port 633d1d0df766

[libc++] Use bounded iterators in std::span when the debug mode is enabled

Previously, we'd use raw pointers when the debug mode was enabled,
which means we wouldn't get out-of-range checking with std::span's
iterators.

This patch introduces a new class called __bounded_iter which can
be used to wrap iterators and make them carry around bounds-related
information. This allows iterators to assert when they are dereferenced
outside of their bounds.

As a fly-by change, this commit removes the _LIBCPP_ABI_SPAN_POINTER_ITERATORS
knob. Indeed, not using a raw pointer as the iterator type is useful to
avoid users depending on properties of raw pointers in their code.

This is an alternative to D127401.

Differential Revision: https://reviews.llvm.org/D127418

[libc++] Improve Lit's buildhost=XXXX feature on a few platforms

Differential Revision: https://reviews.llvm.org/D128455

[flang][NFC] Add array lowering tests

These tests were left behind during the upstreaming of parts lowering.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D128632

Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>

[clang][dataflow] Singleton pointer values for null pointers.

When a `nullptr` is assigned to a pointer variable, it is wrapped in a `ImplicitCastExpr` with cast kind `CK_NullTo(Member)Pointer`. This patch assigns singleton pointer values representing null to these expressions.

For each pointee type, a singleton null `PointerValue` is created and stored in the `NullPointerVals` map of the `DataflowAnalysisContext` class. The pointee type is retrieved from the implicit cast expression, and used to initialise the `PointeeLoc` field of the `PointerValue`. The `PointeeLoc` created is not mapped to any `Value`, reflecting the absence of value indicated by null pointers.

Reviewed By: gribozavr2, sgatev, xazax.hun

Differential Revision: https://reviews.llvm.org/D128056

[SCF] Add thread_dim_mapping attribute to scf.foreach_thread

An optional thread_dim_mapping index array attribute specifies for each
virtual thread dimension, how it remaps 1-1 to a set of concrete processing
element resources (e.g. a CUDA grid dimension or a level of concrete nested
async parallelism). At this time, the specification is backend-dependent and
is not verified by the op, beyond being an index array attribute.
It is the reponsibility of the lowering to interpret the index array in the
context of the concrete target the op is lowered to, or to ignore it when
the specification is ill-formed or unsupported for a particular target.

Differential Revision: https://reviews.llvm.org/D128633

[mlir][bufferization][NFC] Add error handling to getBuffer

This is in preparation of adding memory space support.

Differential Revision: https://reviews.llvm.org/D128277

[mlir][bufferization][NFC] Fix typo in AllocTensorOp builders

[mlir][SCF][bufferize][NFC] Bufferize scf.for terminator separately

This allows for better type inference during bufferization and is in preparation of supporting memory spaces.

Differential Revision: https://reviews.llvm.org/D128422

[mlir][SCF][bufferize] Bufferize scf.if/execute_region terminators separately

This allows for better type inference during bufferization and is in preparation of supporting memory spaces.

Differential Revision: https://reviews.llvm.org/D128581

[mlir][SCF][bufferize][NFC] Bufferize parallel_insert_slice separately

This allows for better type inference during bufferization and is in preparation of supporting memory spaces.

Differential Revision: https://reviews.llvm.org/D128580

[AMDGPU] Regenerate MIR checks. NFC.

Fix clang docs build; NFC

This should address the break from:
https://lab.llvm.org/buildbot/#/builders/92/builds/28769

[mlir][shape][bufferize][NFC] Bufferize block terminators separately

This allows for better type inference during bufferization and is in preparation of supporting memory spaces.

Differential Revision: https://reviews.llvm.org/D128579

[AMDGPU][GFX9][DOC][NFC] Update assembler syntax description

Summary of changes:
- Updated MUBUF lds syntax (see https://reviews.llvm.org/D124485).
- Updated SMEM syntax (see https://reviews.llvm.org/D127314).
- Enabled src0=literal for v_madak*, v_madmk* (see https://reviews.llvm.org/D111067).
- Removed SYSMSG_OP_HOST_TRAP_ACK message.
- Minor bug fixing and improvements.

[STACKMAPS] Document+test UINT64_MAX stack size.

When a function does a dynamic stack allocation, the function's stack
size (in the stack map) is reported as UINT64_MAX.

This change tests and documents this property.

Differential Revision: https://reviews.llvm.org/D128525

[IR] Move vector.insert/vector.extract out of experimental namespace

These intrinsics are now fundemental for SVE code generation and have been
present for a year and a half, hence move them out of the experimental
namespace.

Differential Revision: https://reviews.llvm.org/D127976

[X86] combineConcatVectorOps - IsConcatFree must check extraction index

Identified in the regression reported by @alexfh on rGb5d7beeb9792 - IsConcatFree wasn't ensuring the subvector extraction index matched the position it would be concatenated back into.

[mlir][bufferization][NFC] Bufferize with PostOrder traversal

This is useful because the result type of an op can sometimes be inferred from its body (e.g., `scf.if`). This will be utilized in subsequent changes.

Also introduces a new `getBufferType` interface method on BufferizableOpInterface. This method is useful for computing a bufferized block argument type with respect to OpOperand types of the parent op.

Differential Revision: https://reviews.llvm.org/D128420

[AArch64] Define __FP_FAST_FMA[F]

Libraries use this flag to decide whether to use the fma builtin.
Author: Paul Walker

Differential Revision: https://reviews.llvm.org/D127655

[mlir][bufferization] Add `memory_space` op attribute

This attribute is currently supported on AllocTensorOp only. Future changes will add support to other ops. Furthermore, the memory space is not propagated properly in all bufferization patterns and some of the core bufferization infrastructure. This will be addressed in a subsequent change.

Differential Revision: https://reviews.llvm.org/D128274

[llvm-ar] Improve MRI script CREATE command handling

I discovered that when compared to GNU the llvm-ar MRI script parsing of
CREATE could lead to some strange behaviour. This fix improves the error
message in the case when no archive name is given and will not allow the
adding of members until CREATE is called. Along with this change I added
more testing of the CREATE command.

Differential Revision: https://reviews.llvm.org/D128055

[flang][driver] Add support for `-O{0|1|2|3}`

This patch adds support for most common optimisation compiler flags:
`-O{0|1|2|3}`. This is implemented in both the compiler and frontend
drivers. At this point, these options are only used to configure the
LLVM optimisation pipelines (aka middle-end). LLVM backend or MLIR/FIR
optimisations are not supported yet.

Previously, the middle-end pass manager was only required when
generating LLVM bitcode (i.e. for `flang-new -c -emit-llvm <file>` or
`flang-new -fc1 -emit-llvm-bc <file>`). With this change, it becomes
required for all frontend actions that are represented as
`CodeGenAction` and `CodeGenAction::executeAction` is refactored
accordingly (in the spirit of better code re-use).

Additionally, the `-fdebug-pass-manager` option is enabled to facilitate
testing. This flag can be used to configure the pass manager to print
the middle-end passes that are being run. Similar option exists in Clang
and the semantics in Flang are identical. This option translates to
extra configuration when setting up the pass manager. This is
implemented in `CodeGenAction::runOptimizationPipeline`.

This patch also adds some bolier plate code to manage code-gen options
("code-gen" refers to generating machine code in LLVM in this context).
This was extracted from Clang. In Clang, it simplifies defining code-gen
options and enables option marshalling. In Flang, option marshalling is
not yet supported (we might do at some point), but being able to
auto-generate some code with macros is beneficial. This will become
particularly apparent when we start adding more options (at least in
Clang, the list of code-gen options is rather long).

Differential Revision: https://reviews.llvm.org/D128043

[clang][dataflow] Implement functionality for flow condition variable substitution.

This patch introduces `buildAndSubstituteFlowCondition` - given a flow condition token, this function returns the expression of constraints defining the flow condition, with values substituted where specified.

As an example:
Say we have tokens `FC1`, `FC2`, `FC3`:
```
FlowConditionConstraints: {
FC1: C1,
FC2: C2,
FC3: (FC1 v FC2) ^ C3,
}
```
`buildAndSubstituteFlowCondition(FC3, /*Substitutions:*/{{C1 -> C1'}})`
returns a value corresponding to `(C1' v C2) ^ C3`.

Note:
This function returns the flow condition expressed directly as its constraints, which differs to how we currently represent the flow condition as a token bound to a set of constraints and dependencies. Making the representation consistent may be an option to consider in the future.

Depends On D128357

Reviewed By: gribozavr2, xazax.hun

Differential Revision: https://reviews.llvm.org/D128363

[flang] Update the release notes

Document changes introduced in https://reviews.llvm.org/D126164.

Differential Revision: https://reviews.llvm.org/D128413

[clang][dataflow] Move logic for `createStorageLocation` from `DataflowEnvironment` to `DataflowAnalysisContext`.

`createStorageLocation` in `DataflowEnvironment` is now a trivial wrapper around the logic in `DataflowAnalysisContext`.
Additionally, `getObjectFields` and `getFieldsFromClassHierarchy` (required for the implementation of `createStorageLocation`) are also moved to `DataflowAnalysisContext`.

Reviewed By: gribozavr2, sgatev

Differential Revision: https://reviews.llvm.org/D128359

[libc] Add a simple arm32 config.

This will be expanded in future as more functions are brought up on arm32.

[OpenCL] Reduce emitting candidate notes for builtins

When overload resolution fails, clang emits a note diagnostic for each
candidate. For OpenCL builtins this often leads to many repeated note
diagnostics with no new information. Stop emitting such notes.

Update a test that was relying on counting those notes to check how
many builtins are available for certain extension configurations.

Differential Revision: https://reviews.llvm.org/D127961

[SCEV] Assert that GEP source element type is sized (NFC)

This is checked by the IR verifier, so replace the condition with
an assert.

[AMDGPU] Fix assertion failure on mad with negative immediate addend

Without this, the new test case would fail with:

AMDGPUInstPrinter.cpp:545: void llvm::AMDGPUInstPrinter::printImmediate64(uint64_t, const llvm::MCSubtargetInfo &, llvm::raw_ostream &): Assertion `isUInt<32>(Imm) || Imm == 0x3fc45f306dc9c882' failed.

Differential Revision: https://reviews.llvm.org/D128435

[libc][NFC] Make the support thread library an object library.

It was previously a header library. Making it an object library will
allow us to declare thread local variables which can used to setup a
thread's self object.

[mlir][bufferization][NFC] Change signature of getMemRefType

These functions now accep unsigned attributes for address spaces instead of Attributes.

Differential Revision: https://reviews.llvm.org/D128275

[libunwind,EHABI,ARM] Fix get/set of RA_AUTH_CODE.

According to EHABI32 §8.5.2, the PAC for the return address of a
function described in an exception table is supposed to be addressed
in the _Unwind_VRS_{Get,Set} API by setting regclass=_UVRSC_PSEUDO and
regno=0. (The space of 'regno' values is independent for each
regclass, and for _UVRSC_PSEUDO, there is only one valid regno so far.)

That is indeed what libunwind's _Unwind_VRS_{Get,Set} functions expect
to receive. But at two call sites, the wrong values are passed in:
regno is being set to UNW_ARM_RA_AUTH_CODE (0x8F) instead of 0, and in
one case, regclass is _UVRSC_CORE instead of _UVRSC_PSEUDO.

As a result, those calls to _Unwind_VRS_{Get,Set} return
_UVRSR_FAILED, which their callers ignore. So if you compile in the
AUTG instruction that actually validates the PAC, it will try to
validate what's effectively an uninitialised register as an
authentication code, and trigger a CPU fault even on correct exception
unwinding.

Reviewed By: danielkiss

Differential Revision: https://reviews.llvm.org/D128522

[SCEV] Use SCEVUnknown(poison) instead of SCEVUnknown(undef).

Use poison instead of undef for SCEVUnkown of unreachable values.
This should be in line with the movement to replace undef with poison
when possible.

Suggested in D114650.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128586