review.tizen.org Git - platform/upstream/llvm.git/log

[mlir][ArithToAMDGPU] Add option for saturating truncation to fp8 (#74153)

Many machine-learning applications (and most software written at AMD)
expect the operation that truncates floats to 8-bit floats to be
saturatinng. That is, they expect `truncf 256.0 : f32 to f8E4M3FNUZ` to
yield `240.0`, not `NaN`, and similarly for negative numbers. However,
the underlying hardware instruction that can be used for this truncation
implements overflow-to-NaN semantics.

To enable handling this usecase, we add the saturate-fp8-truncf option
to ArithToAMDGPU (off by default), which causes the requisite clamping
code to be emitted. Said clamping code ensures that Inf and NaN are
passed through exactly (and thus trancate to NaN).

Per review feedback, this commit efactors
createScalarOrSplatConstant() to the Arith dialect utilities and uses
it in this code. It also fixes naming of existing patterns and
switches from vector.extractelement/insertelement to
vector.extract/insert.

[mlir][sparse] adjust compression scheme for example (#79212)

[NFCI] Move SANITIZER_WEAK_IMPORT to sanitizer_common (#79208)

SANITIZER_WEAK_IMPORT is useful for any call that needs to be
conditionally linked in. This is currently used for the
tsan_dispatch_interceptors, but can be used for other calls introduced
in newer versions of MacOS. (such as `aligned_alloc` in this PR
https://github.com/llvm/llvm-project/pull/79198).

This PR moves the definition to a higher level so it can be used in
other sanitizers.

AMDGPU: Add SourceOfDivergence for int_amdgcn_global_load_tr (#79218)

[Clang][Driver] Fix `--save-temps` for OpenCL AoT compilation (#78333)

We can directly call `clang -c -x cl -target amdgcn -mcpu=gfx90a test.cl
-o test.o`
to compile an OpenCL kernel file. However, when `--save-temps` is
enabled, it doesn't
work because the preprocessed file (`.i` file) is taken as C source file
when it
is fed to the front end, thus causing compilation error because those
OpenCL keywords
can't be recognized. This patch fixes the issue.

[misc-coroutine-hostile-raii] Use getOperand instead of getCommonExpr. (#79206)

We were previously allowlisting awaitable types returned by
`await_transform` instead of the type of the operand of the `co_await`
expression.

This previously used to give false positives and not respect the
`AllowedAwaitablesList` flag when `await_transform` is used. See added
test cases for such examples.

[clang][FatLTO] Avoid UnifiedLTO until it can support WPD/CFI (#79061)

Currently, the UnifiedLTO pipeline seems to have trouble with several
LTO features, like SplitLTO units, which means we cannot use important
optimizations like Whole Program Devirtualization or security hardening
instrumentation like CFI.

This patch reverts FatLTO to using distinct pipelines for Full LTO and
ThinLTO. It still avoids module cloning, since that was error prone.

[libc++] Fix outdated release procedure for release notes

[Preprocessor][test] Test ARM64EC definitions (#78916)

[lldb][NFCI] Remove unused method BreakpointIDList::AddBreakpointID(const char *) (#79189)

This overload is completely unused.

[mlir][Target] Teach dense_resource conversion to LLVMIR Target (#78958)

This patch adds support for translating dense_resource attributes to
LLVMIR Target.
The support added is similar to how DenseElementsAttr is handled, except
we
don't need to handle splats.

Another possible way of doing this is adding iteration on
dense_resource, but that is
non-trivial as DenseResourceAttr is not meant to be something you should
directly
access. It has subclasses which you are supposed to use to iterate on
it.

Added feature in llvm-profdata merge to filter functions from the profile (#78378)

`--function=<regex>` Include functions matching regex in the output
`--no-function=<regex>` Exclude functions matching regex from the output

If both are specified, `--no-function` has a higher precedence if a
function name matches both filters

[libc] Fix implicit conversion in FEnvImpl for arm32 targets. (#79210)

[clang] Use LazyDetector for all toolchains. (#79073)

Use the LazyDetector also for the remaining toolchains to avoid unnecessarily checking for the Cuda and Rocm installations.

This fixes rdar://121397534.

[PowerPC] lower partial vector store cost (#78358)

There are matching store opcodes (stfd, stxsiwx) for the load opcodes
that make 32-bit and 64-bit vector operations cheap with VSX, so stores
should also be cheap.

[ELF,test] Actually fix defsym.ll

[ELF,test] Fix defsym.ll

[SLP]Fix PR79193: skip analysis of gather nodes for minbitwidth.

No need in trying to analyze small graphs with gather node only to avoid
crash.

[IndVars] Add NUW variants to iv-poison.ll and variants with extra uses.

[Format] Fix detection of languages when reading from stdin (#79051)

The code cleanup in #74794 accidentally broke detection of languages by
reading file content from stdin, e.g. via `clang-format -dump-config - <
/path/to/filename`.

This PR adds unit and integration tests to reproduce the issue and adds
a fix.

Fixes: #79023

[libc++] Run the nightly libc++ build at 03:00 Eastern for real (#79184)

The nightly libc++ build was incorrectly set up to build at 22:00
Eastern when it intended to run at 03:00 Eastern. This patch fixes that.

[libc] Fix aliasing function name got accidentally deleted in #79128. (#79203)

[RISCV] Move FeatureStdExtH in RISCVFeatures.td. NFC

It was accidentally in the middle of the floating point extensions
after the recent reordering.

Revert "[libc] Fix forward arm32 buildbot" (#79201)

Reverts llvm/llvm-project#79151, necessary to revert #79128, which broke
all production builds and landed without review.

[lldb] Include SBFormat.h in LLDB.h (#79194)

This was likely overlooked when SBFormat was added.

[ELF] --save-temps --lto-emit-asm: derive ELF/asm file names from bitcode file names

Port COFF's https://reviews.llvm.org/D78221 and
https://reviews.llvm.org/D137217 to ELF. For the in-process ThinLTO
link, `ld.lld --save-temps a.o d/b.o -o out` will create
ELF relocatable files `out.lto.a.o`/`d/out.lto.b.o` instead of
`out1.lto.o`/`out2.lto.o`. Deriving the LTO-generated relocatable file
name from bitcode file names helps debugging.

The relocatable file name from the first regular LTO partition does not
change: `out.lto.o`. The second, if present due to `--lto-partition=`,
changes from `out1.lto.o` to `lto.1.o`.

For an archive member, e.g. `d/a.a(coll.o at 8)`,
the relocatable file is `d/out.lto.a.a(coll.o at 8).o`.

`--lto-emit-asm` file names are changed similarly. `--lto-emit-asm -o
out` now creates `out.lto.s` instead of `out`, therefore the
`--lto-emit-asm -o -` idiom no longer works. However, I think this new
behavior (which matches COFF) is better since keeping or removing
`--lto-emit-asm` will dump different files, instead of overwriting the
`-o` output file from an executable/shared object to an assembly file.

Reviewers: rnk, igorkudrin, xur-llvm, teresajohnson, ZequanWu

Reviewed By: teresajohnson

Pull Request: https://github.com/llvm/llvm-project/pull/78835

[CMake][Release] Add option for enabling PGO to release cache file. (#78823)

The option is LLVM_RELEASE_ENABLE_PGO and it's turned on by default.

---------

Co-authored-by: Petr Hosek <phosek@google.com>

[ELF] Improve thin-archivecollision.ll

[NFC] Size and element numbers are often swapped when calling calloc (#79081)

gcc-14 will now throw a warning if size and elements are swapped.

[RISCV] Regenerate autogen test to remove spurious diff

[RISCV] Recurse on first operand of two operand shuffles (#79180)

This is the first step towards an alternate shuffle lowering design for
the general two vector argument case. The goal is to leverage the
existing lowering for single vector permutes to avoid as many of the
vrgathers as required - even if we do need the other.

This patch handles only the first argument, and is arguably a slightly
weird half-step. However, the test changes from the full two argument
recurse patch are a lot harder to reason about. Taking this half step
gives much more easily reviewable changes, and is thus worthwhile. I
intend to post the patch for the second argument once this has landed.

[LLD] [COFF] Fix crashes for cfguard with undefined weak symbols (#79063)

When marking symbols as having their address taken, we can have the
sitaution where we have the address taken of a weak symbol. If there's
no strong definition of the symbol, the symbol ends up as an absolute
symbol with the value null. In those cases, we don't have any Chunk.
Skip such symbols from the cfguard tables.

This fixes https://github.com/llvm/llvm-project/issues/78619.

[RISCV] Exploit register boundaries when lowering shuffle with exact vlen (#79072)

If we have a shuffle which is larger than m1, we may be able to split it
into a series of individual m1 shuffles. This patch starts with the
subcase where the mask allows a 1-to-1 mapping from source register to
destination register - each with a possible permutation of their own. We
can potentially extend this later, thought in practice this seems to
already catch a number of the most interesting cases.

[libc] fix sysconf (#79159)

Apply previously discussed fix for `sysconf`

[ASan][libc++] Turn on ASan annotations for short strings (#79049)

Originally merged here: https://github.com/llvm/llvm-project/pull/75882
Reverted here: https://github.com/llvm/llvm-project/pull/78627

Reverted due to failing buildbots. The problem was not caused by the
annotations code, but by code in the `UniqueFunctionBase` class and in
the `JSON.h` file. That code caused the program to write to memory that
was already being used by string objects, which resulted in an ASan
error.

Fixes are implemented in:
- https://github.com/llvm/llvm-project/pull/79065
- https://github.com/llvm/llvm-project/pull/79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
#ifndef NDEBUG
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
#endif
```

---

Original description:

This commit turns on ASan annotations in `std::basic_string` for short
stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here:
https://github.com/llvm/llvm-project/pull/72677

Requires to pass CI without fails:
- https://github.com/llvm/llvm-project/pull/75845
- https://github.com/llvm/llvm-project/pull/75858

Annotating `std::basic_string` with default allocator is implemented in
https://github.com/llvm/llvm-project/pull/72677 but annotations for
short strings (SSO - Short String Optimization) are turned off there.
This commit turns them on. This also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning on and off short string annotations.

Support in ASan API exists since
https://github.com/llvm/llvm-project/commit/dd1b7b797a116eed588fd752fbe61d34deeb24e4.
You can turn off annotations for a specific allocator based on changes
from
https://github.com/llvm/llvm-project/commit/2fa1bec7a20bb23f2e6620085adb257dafaa3be0.

This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.

The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    advenam.tacet@trailofbits.com
    disconnect3d@trailofbits.com

[ASan][JSON] Unpoison memory before its reuse (#79065)

This commit unpoisons memory before its reuse (with reinterpret_cast).
Required by https://github.com/llvm/llvm-project/pull/79049

Notice that it's a temporary solution to prevent buildbots from failing.
Read FIXME for details.

[mlir][AMDGPU] Actually update the default ABI version, add comments (#79185)

Much confusion occurred earlier today when updating the fallback `int
abi;` in addControlVariables() didn't do anything. THis was because that
that value is the fallback for if the ABI version fails to parse ...
which it always should, because it has a default value that comes from
multiple different places.

This commit updates all the places said default variable can come from,
namely:
1. The ROCDL target attribute definition
2. The ROCDL target attribute's builders
3. The rocdl-attach-target pass's default option values.

With this, the printf test is passing.

[test] Avoid libc dep in Update warn-unsafe-buffer-usage-warning-data… (#79183)

Avoid libc dep in warn-unsafe-buffer-usage-warning-data-invocation.

To keep the test hermetic. This is in line with other existing
declarations in the file that avoid includes.

[ASan][ADT] Don't scribble with ASan (#79066)

With this commit, scribbling under AddressSanitizer (ASan) is disabled to prevent overwriting poisoned objects (e.g., annotated short strings).
Needed by https://github.com/llvm/llvm-project/pull/79049

Revert "Reapply [hwasan] Update dbg.assign intrinsics in HWAsan pass … (#79186)

…#78606"

This reverts commit 13c6f1ea2e7eb15fe492d8fca4fa1857c6f86370 because it
causes an assertion in DebugInfoMetadata.cpp:1968 in Clang Linux
builders for Fuchsia.

https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8758111613576762817/+/u/clang/build/stdout

AMDGPU: Do not generate non-temporal hint when Load_Tr intrinsic did not specify it (#79104)

int_amdgcn_global_load_tr did not specify non-temporal load transpose,
thus we should
not genetrate the non-temporal hint for the load. We need to implement
getTgtMemIntrinsic
to create the corresponding MemSDNode. And we don't set the non-temporal
flag because
the intrinsic did not specify it.

NOTE: We need to implement getTgtMemIntrinsic for any memory intrinsics.

[RISCV] Re-format RISCVFeatures.td so it doesn't look like AssemblerPredicate is an operand to Predicate. (#79076)

AssemblerPredicate was almost always indented to the same column as the
first operand to Predicate. But AssemblerPredicate is a separate base
class so should have the same indentation as Predicate.

For the string passed to AssemblePredicate, I aligned it to the other
arguments on the previous if it fit in 80 columns. Otherwise I indented
4 spaces past the start of AssemblerPredicate.

For some vendor extensions I put the 2 classes on new lines instead of
the same line as the def. This gave more room for the strings and was
more consistent with other formatting in that portion of the file.

[Orc] Let LLJIT default to JITLink for ELF-based ARM targets (#77313)

The JITLink AArch32 backend reached feature-parity with RuntimeDyld
on ELF-based systems. This patch changes the default JIT-linker in Orc's
LLJIT for these platforms.

This allows us to run clang-repl with JITLink on ARM and use all the
features we had with RuntimeDyld before. All existing tests for
clang-repl are passing.

Re-land [openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853)

The reverts 94f960925b7f609636fc2ffd83053814d5e45ed1 and fixes it.

Revert "[clang-repl] Enable native CPU detection by default (#77491)" (#79178)

Reverting because `clang-repl -Xcc -mcpu=arm1176jzf-s` isn't overwriting
this as I had expected. We need to check whether a specific CPU flag was
given by the user first.

Reverts llvm/llvm-project#77491

[ConstantHoisting] Cache OptForSize. (#79170)

CacheOptForSize to remove quadratic behavior.

For each constant analyzed, ConstantHoising calls
`shouldOptimizeForSize(F)`, which calls `PSI.getTotalCallCount(F)`.
PSI.getTotalCallCount(F) goes through all the instructions in all basic
blocks, and checks if each is a call, to count them up.

This reduces `llc` time for a very large IR from ~10min to under 3min.
Reproducer testcase is much too large to share.

[gn] port 5176df55d3a

[libc][Docs] Update the GPU RPC documentation (#79069)

Summary:
This adds some more concrete information on the RPC interface. Hopefully
this is intelligable and provides some useful examples.

[RISCV] Add regalloc hints for Zcb instructions. (#78949)

This hints the register allocator to use the same register for source
and destination to enable more compression.

[RISCV] Continue with early return for shuffle lowering [nfc]

Move two cases where we're not actually going to use any of our computed index vectors or mask values above the computation of the same.

[clang][modules] Fix CodeGen options that can affect the AST. (#78816)

`OptimizationLevel` and `OptimizeSize` can affect the generated AST. They indirectly affect the `Optimize` and `OptimizeSize` frontend options, which in turn set predefined macro definitions.

This fixes rdar://121228252.

[MachineCopyPropagation] Make a SmallVector larger (NFC) (#79106)

This patch makes a SmallVector slightly larger. We encounter quite a
few instructions with 3 or 4 defs but very few beyond that on X86.

This saves 0.39% of heap allocations during the compilation of a large
preprocessed file, namely X86ISelLowering.cpp, for the X86 target.

[OpenACC] Implement 'device_type' clause parsing

'device_type' takes either an asterisk or a list of impementation
specific identifiers. This patch implements the parsing for it.
Additionally, 'dtype' is an alias for 'device_type', though we're
implementing it as its own clause kind to improve future diagnostics, as
this will allow us to differentiate the spellings.

[clang] Add missing streaming attributes to SVE builtins (#79134)

This patch adds `IsStreamingCompatible` or `IsStreamingOrSVE2p1` to the
SVE builtins that missed them.

[RISCV] Use early return for select shuffle lowering [nfc]

Minor rework of the fallback case for two argument shuffles in lowerVECTOR_SHUFFLE. We had some common code which wasn't actually common, and simplified significantly once specialized for whether we had a select or not.

[LangRef] adjust IR atomics specification following C++20 model tweaks. (#77263)

C++20 accepted two papers, [P0668](https://wg21.link/P0668) and
[P0982](https://wg21.link/P0982), which changed the atomics memory model
slightly in order to reflect the realities of the existing
implementations.

The rationale for these changes applies as well to the LLVM IR atomics
model. No code changes are expected to be required from this change: it
is primarily a matter of more-correctly-documenting the existing state
of the world.

There's three changes: two of them weaken guarantees, and one
strengthens them:

1. The memory ordering guaranteed by some backends/CPUs when seq_cst
operations are mixed with acquire/release operations on the same
location was weaker than the spec guaranteed. Therefore, the
specification is changed to remove the requirement that seq_cst ordering
is consistent with happens-before, and replaces it with a slightly
weaker requirement of consistency with a new relation named
strongly-happens-before.

2. The rules for a "release sequence" were weakened. Previously, an
acquire synchronizes with an release even if it observes a later
monotonic store from the same thread as the release store. That has now
been removed: now, only read-modify-write operations can extend a
release sequence.

3. The model for a a seq_cst fence is strengthened, such that placing a
seq_cst between monotonic accesses now _is_ sufficient to guarantee
sequential consistency in the model (as it always has been on existing
implementations.)

Note that I've directly referenced the C++ standard's atomics.order
section for the precise semantics of seq_cst, instead of fully
describing them. They are quite complex, and a lot of work has gone into
refining the words in the standard. I'm afraid if I attempt to reiterate
them, I would only introduce errors.

[DebugInfo][RemoveDIs] Disable a run-line while investigating a problem

This just reduces coverage for RemoveDIs temporarily, and it's almost
certainly a patch-ordering problem.

[JITLink][AArch32] Implement Armv5 ldr-pc stubs and use them for all pre-v7 targets (#79082)

This stub type loads an absolute address directly into the PC register.
It's the simplest and most compatible way to implement a branch
indirection across the entire address space (and probably the slowest as
well). It's the ideal fallback for all targets for which we did not
(yet) implement a more performant solution.

[CompilerRT] Attempt to fix a lit-config issue

This is a follow-up to 3112578597c03 -- it looks like passing the added
cmake flag to pythonize_bool also appends "_PYBOOL" to the flag name, which
this lit config file is missing. This trips up builds such as:

https://lab.llvm.org/buildbot/#/builders/275/builds/3661
https://lab.llvm.org/buildbot/#/builders/184/builds/9811

Where COMPILER_RT_HAS_AARCH64_SME ends up expanding to nothing.

[Clang][AArch64] Add diagnostics for builtins that use ZT0. (#79140)

Similar to what we did for ZA, this patch adds diagnostics to flag when
using a ZT0 builtin in a function that does not have ZT0 state.

[RemoveDIs][DebugInfo] Enable creation of DPVAssigns, update outstanding AT tests (#79148)

This is the final patch for DPVAssign support, implementing the actual
creation of DPVAssigns and allowing them to be converted along with
dbg.values and dbg.declares. Numerous tests landed in previous patches
will no longer be rotten after this patch lands (previously they would
trivially pass due to DPVAssigns not actually being used), and a further
batch of tests have been added here that require the changes in this
patch before they pass.

[AMDGPU] Enable architected SGPRs for GFX12 (#79160)

[libc][NFC] Remove `FPBits` cast operator (#79142)

The semantics for casting can range from "bitcast" (same representation)
to "different representation", to "type promotion". Here we remove the
cast operator and force usage of `get_val` as the only function to get
the floating point value, making the intent clearer and more consistent.

[DAG] visitSCALAR_TO_VECTOR - don't fold scalar_to_vector(bin(extract(x),extract(y)) -> bin(x,y) if extracts have other uses

Fixes #78897 - although the test case still has a number of poor codegen issues (in particular for i686 triples) that will need addressing (combining the nodes in topological order should help).

[DebugInfo][RemoveDIs] Use splice in Outliner rather than moveBefore (#79124)

This patch replaces a utility in the outliner that moves the contents of
one basic block into another basic block, with a call to splice instead.
I think it's NFC, however I'd like a second pair of eyes to look at it
just in case.

The reason for doing this is an edge case in the handling of DPValue
objects, the replacement for dbg.values. If there's a variable
assignment "dangling" at the end of a block (which happens when we
delete the terminator), inserting instructions at end() doesn't shift
the DPValue up into the block. We could probably fix this; but it's much
easier to use splice at the only call site that does this.

Patch adds --try-experimental-debuginfo-iterators to a test to exercise
this code path.

[AMDGPU] Properly check op_sel in GCNDPPCombine (#79122)

[RemoveDIs][DebugInfo] Handle DPVAssign in most transforms (#78986)

This patch trivially updates various opt passes to handle DPVAssigns. In
all cases, this means some combination of generifying existing code to
handle DPValues and DbgAssignIntrinsics, iterating over DPValues where
previously we did not, or duplicating code for DbgAssignIntrinsics to
the equivalent DPValue function (in inlining and salvageDebugInfo).

[VecLib] Fix: Restore DebugFlag state in ReplaceWithVecLibTest (#78989)

It appears that Google Tests run multiple modules from the same
invocation.
As a result setting `llvm::DebugFlag` in this pass kept it on in
subsequent passes, which is not the expected behavior.

[LAA] Add test for #79137 (NFC)

[AArch64][FMV] Support feature MOPS in Function Multi Versioning. (#78788)

The patch adds support for FEAT_MOPS (Memory Copy and Memory Set
instructions) in Function Multi Versioning. The bits [19:16] of the
system register ID_AA64ISAR2_EL1 indicate whether FEAT_MOPS is
implemented in AArch64 state. This information is accessible via ELF
hwcaps.

[X86] Add test case for Issue #78897

[libc][NFC] use builder pattern for ErrnoSetterMatcher (#79153)

ErrnoSetterMatcher::returns can be misleading as it does not initialize
the errno. This is made worse as later on there is a switch statement on
the errno comparator using __builtin_unreachable(). This patch make
ErrnoSetterMatcher::returns give back a builder that is nomially
different from ErrnoSetterMatcher.

[libc] Remove specific nan payload in math functions (#79133)

[lld-macho][arm64] implement -objc_stubs_small (#78665)

This patch implements `-objc_stubs_small` targeting arm64, aiming to
align with ld64's behavior.
1. `-objc_stubs_fast`: As previously implemented, this always uses the
Global Offset Table (GOT) to invoke `objc_msgSend`. The alignment of the
objc stub is 32 bytes.
2. `-objc_stubs_small`: This behavior depends on whether `objc_msgSend`
is defined. If it is, it directly jumps to `objc_msgSend`. If not, it
creates another stub to indirectly jump to `objc_msgSend`, minimizing
the size. The alignment of the objc stub in this case is 4 bytes.

[PGO] Remove calls to `__llvm_orderfile_dump()` in `instrprof-api.c` test (#79150)

https://github.com/llvm/llvm-project/pull/78285 added a test which calls
`__llvm_orderfile_dump()`, a functionality that is not supported on many
platforms. This PR removes the call to `__llvm_orderfile_dump()` to
avoid it failing on unsupported platforms, and turn on the test for
Windows, so we test the rest of the API that are supported.

Remove config.aarch64_sme from compiler-rt/unittests/lit.common.unit.configured.in

[libc] Fix forward arm32 buildbot (#79151)

Fix forward #79128

[DebugInfo][RemoveDIs] Handle non-instr debug-info in GlobalISel (#75228)

The RemoveDIs project is aiming to eliminate debug intrinsics like
dbg.value and dbg.declare from LLVM, and replace them with DPValue objects
attached to instructions. ISel is one of the "terminals" where that
information needs to be converted into MIR format: this patch implements
support for that in GlobalISel. We aim for the output of LLVM to be
identical with/without RemoveDIs debug-info.

This patch should be NFC, as we're handling the same data about variables
stored in a different format -- it now appears in a DPValue object rather
than as an intrinsic. To that end, I've refactored the handling of
dbg.values into a dedicated function, and call it whenever a dbg.value or a
DPValue is encountered. dbg.declare is handled in a similar way.

Testing: adding the --try-experimental-debuginfo-iterators switch to llc
causes it to try and convert to the "new" debug-info format if it's built
in (LLVM_EXPERIMENTAL_DEBUGINFO_ITERATORS=On), and it'll be covered by our
buildbot. One test has a few extra wildcard-regexes added: this is because
there's some extra data printed about attached debug-info, which is safe to
ignore.

[MLIR][AMDGPU] Switch to code object version 5 (#79144)

As AMDGPU backend has moved to cov5 as default, mlir should also switch
to it.

[gn build] Port 40bdfd39e394

[CMake][PGO] Add libunwind to list of stage1 runtimes (#78869)

This fixes the build since 8f90e6937a1fac80873bb2dab5f382c82ba1ba4e
which made libcxxabi use llvm's libunwind by default.

Fixes #78487

[ARM] Introduce the v9.5-A architecture version to Arm targets (#78994)

This introduces the Armv9.5-A architecture version to the Arm backend,
following on from the existing implementation for AArch64 targets.

Mode details about the Armv9.5-A architecture version can be found at:
* https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023
* https://developer.arm.com/documentation/ddi0602/2023-09/

[llvm-reduce][DebugInfo] Support reducing non-instruction debug-info (#78995)

LLVM will shortly be able to represent variable locations without
encoding information into intrinsics -- they'll be stored as DPValue
objects instead. We'll still need to be able to llvm-reduce these
variable location assignments just like we can with intrinsics today,
thus, here's an llvm-reduce pass that enumerates and reduces the DPValue
objects.

The test for this is paradoxically written with dbg.value intrinsics:
this is because we're changing all the core parts of LLVM to support
this first, with the textual IR format coming last. Until that arrives,
testing the llvm-reduce'ing of DPValues needs the added test using
intrinsics. We should be able to drop the variable assignment using
%alsoloaded using this method. As with the other llvm-reduce tests, I've
got one set of check lines for making the reduction happen as desired,
and the other set to check the final output.

[RemoveDIs][DebugInfo] Handle DPVAssigns in Assignment Tracking excluding lowering (#78982)

This patch adds support for DPVAssigns across all of
AssignmentTrackingAnalysis except for AssignmentTrackingLowering, which
is implemented in a separate patch. This patch includes handling
DPValues in MemLocFragFill, the removal of redundant DPValues as part of
AssignmentTrackingAnalysis (which is different to the version in
`BasicBlockUtils.cpp`), and preventing the DPVAssigns from being
directly emitted in SelectionDAG (just as we don't emit llvm.dbg.assigns
directly, but receive a set of locations from
AssignmentTrackingAnalysis' output).

[AMDGPU] Remove getWorkGroupIDSGPR, unused since aa6fb4c45e01

[MachineOutliner] Refactor iterating over Candidate's instructions (#78972)

Make Candidate's front() and back() functions return references to
MachineInstr and introduce begin() and end() returning iterators, the
same way it is usually done in other container-like classes.

This makes possible to iterate over the instructions contained in
Candidate the same way one can iterate over MachineBasicBlock (note that
begin() and end() return bundled iterators, just like MachineBasicBlock
does, but no instr_begin() and instr_end() are defined yet).

fix test (#79018)

Mistake with #78628 that got caught after being merged

[clang][Interp][NFC] Move ToVoid casts to the bottom

So we have all the complex casts together.

[Headers][X86] Add macro descriptions to bmiintrin.h (#79048)

These are largely copy-pasted from the corresponding function
descriptions. Added \see cross-references. Also changed <c> tags to \c.

Revert 10f3296dd7d74c975f208a8569221dc8f96d1db1 - [openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853)

It broke the AMDGPU buildbot: https://lab.llvm.org/buildbot/#/builders/193/builds/45378

[test] Update stack_guard_remat.ll (#79139)

Replace cp with a cat. This allows to create a writable file when the
original one is read-only.

[openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853)

There were quite a few compilation warnings when building openmp on Windows with
the latest Visual Studios 2022 version 17.8.4. Some other warnings were visible
with the latest Clang at tip. This commit fixes all of them.

ValueTracking: Recognize fcmp ole/ugt with inf as a class test (#79095)

These were missed and hopefully avoids assertions when
dc3faf0ed0e3f1ea9e435a006167d9649f865da1 is recommitted.

Restore: [mlir][ROCDL] Stop setting amdgpu-implicitarg-num-bytes (#79129)

This patch restores PR#78498

[reland][libc] Remove unnecessary `FPBits` functions and properties (#79128)

- reland #79113
- Fix aarch64 RISC-V build

[gn] port 3ab8d2aac7bc

3ab8d2aac7bc relanded in 3112578597c031, so reland this too.

Might want to set this to True (and add a few source files to
builtins) at some point, but for now heal the bots.

[AArch64] Add vec3 tests with add between load and store.

Extra tests for
https://github.com/llvm/llvm-project/pull/78637
https://github.com/llvm/llvm-project/pull/78632

[MC][X86] Merge lane/element broadcast comment printers. (#79020)

This is /almost/ NFC - the only annoyance is that for some reason we were using "<C1,C2,..>" for ConstantVector types unlike all other cases - these now use the same "[C1,C2,..]" format as the other constant printers.

[RemoveDIs][DebugInfo] Handle DPVAssigns in AssignmentTrackingLowering (#78980)

Following on from the previous patch 6aeb7a7,
this patch adds the necessary code to process the DPV equivalents of
llvm.dbg.assign intrinsics. Most of the content of this patch is simply
duplicating existing functionality, using generic code for simple
functions and PointerUnions where storage is required. The most complex
changes are in the places that iterate over instructions, as iterating
over DPValues between instructions is different to iterating over
instructions that may or may not be debug intrinsics; this is most
complex in `AssignmentTrackingLowering::process`, where I've added some
comments to explain the state of the program at each key point depending
on whether we are operating on intrinsics or DPValues.

[AMDGPU] Handle V_PERMLANE64_B32 in fixVcmpxPermlaneHazards (#79125)

Fixes #78856