review.tizen.org Git - platform/upstream/llvm.git/log

[C++20] [Modules] Don't perform ODR checks in GMF

Close https://github.com/llvm/llvm-project/issues/79240.

See the linked issue for details. Given the frequency of issue reporting
about false positive ODR checks (I received private issue reports too),
I'd like to backport this to 18.x too.

[AArch64] Add some release notes items (#79983)

[clang-format] Fix a bug in AnnotatingParser::rParenEndsCast() (#79549)

Fixes #78965.

(cherry picked from commit f826f55b2ab68c2515fae751dc2d6ef77f37b172)

[Release Notes][FMV] Document support for rcpc3 and mops features. (#80152)

Documents support for Load-Acquire RCpc instructions v3 (rcpc3) as well
as Memory Copy and Memory Set Acceleration instructions (mops) when
targeting AArch64.

[Clang][Sema] Fix regression due to missing ambiguity check before attempting access check. (#80730)

Previously, when fixing ambiguous lookup diagnostics in
cc1b6668c57170cd440d321037ced89d6a61a9cb, the change refactored
`LookupResult` to split out diagnosing access and ambiguous lookups. The
call to `getSema().CheckLookupAccess(...)` should have been guarded by a
check for isAmbiguous(). This change adds that guard.

Fixes: https://github.com/llvm/llvm-project/issues/80435
(cherry picked from commit a7bc9cb6ffa91ff0ebabc45c0c7263c7c2c3a4de)

Revert "[SemaCXX] Implement CWG2137 (list-initialization from objects of the same type) (#77768)" in release/18.x (#79400)

- Revert "[SemaCXX] Implement CWG2137 (list-initialization from objects
of the same type) (#77768)", see
https://github.com/llvm/llvm-project/pull/77768#issuecomment-1908946696

[🍒] Unconditionally lower std::string's alignment requirement from 16 to 8 (#68925) (#79480)

This change saves memory by giving the allocator more freedom to
allocate the most efficient size class, by dropping the alignment
requirement for std::string's pointer from 16 to 8. This changes the
output of std::string::max_size, which makes it ABI breaking.

That said, the discussion concluded that we don't care about this ABI
break and would like this change enabled universally.

The ABI break isn't one of layout or "class size", but rather the value
of "max_size()" changes, which in turn changes whether `std::bad_alloc`
or `std::length_error` is thrown for large allocations.

This change is the child of PR #68807, which enabled the change behind
an ABI flag.
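
As a rough illustration of why a smaller alignment gives the allocator more
freedom, the sketch below rounds a requested buffer size up to a multiple of
the alignment. The helpers `roundUpToAlignment` and `requestedBytes` are
hypothetical and only model the rounding step; this is not libc++'s actual
code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round `bytes` up to the next multiple of `alignment`.
// `alignment` must be a non-zero power of two.
inline std::size_t roundUpToAlignment(std::size_t bytes, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (bytes + alignment - 1) & ~(alignment - 1);
}

// Simplified model: bytes requested for `length` characters plus the
// terminating null, rounded up to the given alignment.
inline std::size_t requestedBytes(std::size_t length, std::size_t alignment) {
  return roundUpToAlignment(length + 1, alignment);
}
```

Under this model a 32-character string asks for 48 bytes with 16-byte
alignment but only 40 bytes with 8-byte alignment.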

Refactor recomputeLiveIns to operate on whole CFG (#79498) (#79641)

Currently, recomputeLiveIns recomputes the live-in registers for a single
MachineBasicBlock, so the result depends on the order in which
recomputeLiveIns is called on the blocks. This can result in incorrect
register allocations down the line.

Now we do not recompute the entire CFG, but we do ensure that the newly
added MBBs reach convergence. This fixes a register allocation bug
introduced in AArch64 stack probing.
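
The order dependence can be seen in a small model of live-in computation.
The sketch below is a generic fixed-point iteration, not the code in this
patch; the `BlockSketch` type and the use of bitmasks for register sets are
assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical block model: each bit of a mask stands for one register.
struct BlockSketch {
  uint32_t uses = 0;          // registers read before being written
  uint32_t defs = 0;          // registers written in the block
  std::vector<int> succs;     // indices of successor blocks
  uint32_t liveIn = 0;        // result of the analysis
};

// Recompute live-ins for all blocks until nothing changes any more.
// Returns the number of passes that were needed.
inline int recomputeUntilStable(std::vector<BlockSketch> &cfg) {
  int passes = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    ++passes;
    for (BlockSketch &block : cfg) {
      uint32_t liveOut = 0;
      for (int succ : block.succs) {
        assert(succ >= 0 && static_cast<std::size_t>(succ) < cfg.size());
        liveOut |= cfg[succ].liveIn;
      }
      // live-in = used here, or live on exit and not overwritten here
      uint32_t newLiveIn = block.uses | (liveOut & ~block.defs);
      if (newLiveIn != block.liveIn) {
        block.liveIn = newLiveIn;
        changed = true;
      }
    }
  }
  return passes;
}
```

With blocks visited in the order 0, 1, 2 and a use only in block 2, a single
pass leaves blocks 0 and 1 with stale live-ins; iterating to convergence
removes the dependence on visiting order.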

(cherry picked from commit ff4636a4ab00b633c15eb3942c26126ceb2662e6)

[X86][tablgen] Fix the broadcast tables (#79675)

(cherry picked from commit 7c3ee7cbe6419ea5e37ce2723cc1a1688380581f)

[LV] Fix handling of interleaving linear args (#78725)

Currently when interleaving vector calls with linear arguments,
the Part is ignored and all vector calls use the initial value
from the first lane of the current iteration.

Fix this to extract from the correct part of the linear vector.
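
To make the role of the Part concrete: if a linear argument starts at some
value and advances by a fixed step per scalar iteration, then with
vectorization factor VF the call for a given interleave part should start
VF * Part steps further along. The function below is a hypothetical sketch of
that arithmetic (all names are made up), not the vectorizer's code.

```cpp
#include <cstdint>

// Hypothetical model: first-lane value of a linear argument for a given
// interleave part. Always passing part 0 corresponds to the bug described
// above.
inline int64_t linearArgForPart(int64_t start, int64_t step, unsigned vf,
                                unsigned part) {
  return start + step * static_cast<int64_t>(vf) * static_cast<int64_t>(part);
}
```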

(cherry picked from commit d4c01714239e80d21e441c3886749fc56b743f81)

[libcxx] Add a release note for Clang-cl specific features (#80010)

[docs] Add release notes for Windows specific changes in 18.x (#80011)

PR for llvm/llvm-project#79568 (#80120)

Backporting https://github.com/llvm/llvm-project/pull/79568 to clang 18.

[clang-format] Simplify the AfterPlacementOperator option (#79796)

Change AfterPlacementOperator to a boolean and deprecate SBPO_Never,
which meant never inserting a space except when after new/delete.

Fixes #78892.

(cherry picked from commit 908fd09a13b2e89a52282478544f7f70cf0a887f)

[ConstraintElim] Make sure min/max intrinsic results are not poison.

The result of umin may be poison and in that case the added constraints
are not valid in contexts where poison doesn't cause UB. Only queue
facts for min/max intrinsics if the result is guaranteed to not be
poison.

This could be improved in the future, by only adding the fact when
solving conditions using the result value.

Fixes https://github.com/llvm/llvm-project/issues/78621.

(cherry picked from commit 3d91d9613e294b242d853039209b40a0cb7853f2)

[ConstraintElim] Add tests for #78621.

Tests with umin where the result may be poison for
https://github.com/llvm/llvm-project/issues/78621.

(cherry picked from commit c83180c1248615cf6ea8842eb4e0cebebba4ab57)

[AArch64][TargetParser] Add mcpu alias for Microsoft Azure Cobalt 100. (#79614)

With a690e86 we added -mcpu/mtune=native support to handle the Microsoft
Azure Cobalt 100 CPU as a Neoverse N2. This patch adds a CPU alias in
TargetParser to maintain compatibility with GCC.

(cherry picked from commit ae8005ffb6cd18900de8ed5a86f60a4a16975471)

[MIPS] Use generic isBlockOnlyReachableByFallthrough (#80799)

FastISel may create a redundant BGTZ terminator which falls through.
```
BGTZ %2:gpr32, %bb.1, implicit-def $at

bb.1.bb1:
; predecessors: %bb.0
```

The `!I->isBarrier()` check in
MipsAsmPrinter::isBlockOnlyReachableByFallthrough will incorrectly not print
a label, leading to an `Undefined temporary symbol` error when we try
assembling the output assembly file. See the updated `Fast-ISel/pr40325.ll`
and https://github.com/rust-lang/rust/issues/108835

In addition, the `SwitchInst` condition is too conservative and prints
many unneeded labels (see the updated tests).

Just use the generic isBlockOnlyReachableByFallthrough, updated by
commit 1995b9fead62f2f6c0ad217bd00ce3184f741fdb for SPARC, which also
handles MIPS.

(cherry picked from commit 6b2fd7aed66d592738f26c76caa8fff95e168598)

[CLANG] Fix INF/NAN warning. (#80290)

In https://github.com/llvm/llvm-project/pull/76873 a warning was added
when the macros INFINITY and NAN are used in binary expressions when
-menable-no-nans or -menable-no-infs are used. If the user uses an
option that nullifies these two options, the warning will still be
generated. This patch adds additional information to the warning
comment to let the user know about this. It also suppresses the warning
when #ifdef INFINITY, #ifdef NAN or #ifndef NAN are used in
the code.
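
For illustration, this is the kind of preprocessor guard in question: the
`#ifdef` only asks whether the macro exists and does not evaluate it. The
function name `upperSentinel` is hypothetical.

```cpp
#include <cmath>
#include <limits>

// Guarded use of the INFINITY macro with a fallback when it is not defined.
inline double upperSentinel() {
#ifdef INFINITY
  return INFINITY;
#else
  return std::numeric_limits<double>::max();
#endif
}
```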

(cherry picked from commit 62c352e13c145b5606ace88ecbe9164ff011b5cf)

[Profile][Windows] Drop extern for __buildid. (#80700)

(cherry picked from commit dd22140e21f2ef51cf031354966a3d41c191c6e7)

[Github] Fix triggers formatting in code format action

A recent commit modified the job to only run on the main branch, but
the formatting was slightly off, causing the job to not run. This patch
fixes the formatting so the job will run as expected.

(cherry picked from commit 4b34558f43121df9b863ff2492f74fb2e65a5af1)

[workflows] Only run code formatter on the main branch (#80348)

Modifying a cherry-picked patch to fix code formatting issues can be
risky, so we don't typically do this. Therefore, it's not necessary to
run this job on the release branches.

(cherry picked from commit 2193c95e2459887e7e6e4f9f4aacf9252e99858f)

[InstCombine] Fix assertion failure in issue80597 (#80614)

The assertion in #80597 failed when we were trying to compute known bits
of a value in an unreachable BB.

https://github.com/llvm/llvm-project/blob/859b09da08c2a47026ba0a7d2f21b7dca705864d/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L749-L810

In this case, `SignBits` is 30 (deduced from instr info), but `Known` is
`10000101010111010011110101000?0?00000000000000000000000000000000`
(deduced from dom cond). Setting high bits of `lshr Known, 1` will lead
to conflict.

This patch masks out high bits of `Known.Zero` to address this problem.
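
A minimal sketch of what "conflict" means here, assuming known bits are kept
as two masks (one for bits known to be zero, one for bits known to be one).
The `KnownBitsSketch` type and `setHighBitsOne` helper are hypothetical and
much simpler than LLVM's `KnownBits`.

```cpp
#include <cstdint>

// Hypothetical 64-bit model of known-bits information.
struct KnownBitsSketch {
  uint64_t Zero = 0; // bits known to be 0
  uint64_t One = 0;  // bits known to be 1
  // A bit cannot be known-zero and known-one at the same time.
  bool hasConflict() const { return (Zero & One) != 0; }
};

// Mark the top `n` bits as known-one. When `maskZeroFirst` is set, the same
// bits are first cleared from Zero so the two masks cannot overlap.
inline KnownBitsSketch setHighBitsOne(KnownBitsSketch known, unsigned n,
                                      bool maskZeroFirst) {
  uint64_t high = n == 0 ? 0 : (n >= 64 ? ~0ULL : ~0ULL << (64 - n));
  if (maskZeroFirst)
    known.Zero &= ~high;
  known.One |= high;
  return known;
}
```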

Fixes #80597.

(cherry picked from commit cb8d83a77c25e529f58eba17bb1ec76069a04e90)

[BPI] Transfer value-handles when assign/move constructing BPI (#77774)

Background: BPI stores a collection of edge branch-probabilities, and
also a set of Callback value-handles for the blocks in the
edge-collection. When a block is deleted, BPI's eraseBlock method is
called to clear the edge-collection of references to that block, to
avoid dangling pointers.

However, when move-constructing or assigning a BPI object, the
edge-collection gets moved, but the value-handles are discarded. This
can lead to stale entries in the edge-collection when blocks are
deleted without the callback -- not normally a problem, but if a new
block is allocated with the same address as an old block, spurious
branch probabilities will be recorded about it. The fix is to transfer
the handles from the source BPI object.

This was exposed by an unrelated debug-info change, which probably just
shifted allocation orders around. Detected as nondeterminism and reduced
by Zequan Wu:

https://github.com/llvm/llvm-project/commit/f1b0a544514f3d343f32a41de9d6fb0b6cbb6021#commitcomment-136737090

(No test because IMHO testing for a behaviour that varies with memory
allocators is likely futile; I can add the reproducer with a CHECK for
the relevant branch weights if it's desired though)

(cherry picked from commit 604a6c409e8473b212952b8633d92bbdb22a45c9)

[CMake][PGO] Add option for using an external project to generate profile data (#78879)

The new CLANG_PGO_TRAINING_DATA_SOURCE_DIR allows users to specify a
CMake project to use for generating the profile data. For example, to
use the llvm-test-suite to generate profile data you would do:

$ cmake -G Ninja -B build -S llvm -C <path to source>/clang/cmake/caches/PGO.cmake \
    -DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=<path to llvm-test-suite> \
    -DBOOTSTRAP_CLANG_PGO_TRAINING_DEPS=runtimes

Note that the CLANG_PERF_TRAINING_DEPS has been renamed to
CLANG_PGO_TRAINING_DEPS.

---------

Co-authored-by: Petr Hosek <phosek@google.com>
(cherry picked from commit dd0356d741aefa25ece973d6cc4b55dcb73b84b4)

AMDGPU: Set max supported div/rem size to 64 (#80669)

This enables IR expansion for i128 divisions. The vector case is still
broken because ExpandLargeDivRem doesn't try to handle them.

Fixes: SWDEV-426193
(cherry picked from commit a5d206df792b61a0b6c5ac44343a97696fc6071d)

[libc++] Rename __bit_reference template parameter to avoid conflict (#80661)

As of 4d20cfcf4eb08217ed37c4d4c38dc395d7a66d26, `__bit_reference`
contains a template `__fill_n` with a bool `_FillValue` parameter.

Unfortunately there is a relatively widely used piece of scientific
software called NetCDF, which exposes a (C) macro `_FillValue` in its
public headers.

When building the NetCDF C++ bindings, this quickly leads to compilation
errors when the macro interferes with the template in `__bit_reference`.

Rename the parameter to `_FillVal` to avoid the conflict.
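
A reduced sketch of the clash, assuming the macro expands to a string literal
(the definition below is a stand-in written for this example, and `fillWord`
is a hypothetical function, not the libc++ template):

```cpp
// Stand-in for the macro exposed by the C headers (assumed definition).
#define _FillValue "_FillValue"

// With the old parameter name, the declaration
//   template <bool _FillValue> unsigned fillWord();
// would be rewritten by the preprocessor into
//   template <bool "_FillValue"> unsigned fillWord();
// and fail to parse.

// The renamed parameter is not touched by the macro.
template <bool _FillVal>
inline unsigned fillWord() {
  return _FillVal ? ~0u : 0u;
}

#undef _FillValue // keep the rest of the translation unit clean
```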

(cherry picked from commit 1ec252298925de50b27930c557ba9de3cc397afe)

[libc++] Add missing conditionals for feature-test macros (#80168)

We noticed that some feature-test macros were not conditional on
configuration flags like _LIBCPP_HAS_NO_FILESYSTEM. As a result, code
attempting to use FTMs would not work as intended.

This patch adds conditionals for a few feature-test macros, but more
issues may exist.

rdar://122020466
(cherry picked from commit f2c84211d2834c73ff874389c6bb47b1c76d391a)

[AMDGPU][PromoteAlloca] Support memsets to ptr allocas (#80678)

Fixes #80366

(cherry picked from commit 4e958abf2f44d08129eafd5b6a4ee2bd3584ed22)

[clang] Add GCC-compatible code model names for sparc64

This adds GCC-compatible names for code model selection on 64-bit SPARC
with absolute code.
Testing with a 2-stage build then running codegen tests works okay under
all of the supported code models.

(32-bit target does not have selectable code models)

Reviewed By: @brad0, @MaskRay

(cherry picked from commit b0f0babff22e9c0af74535b05e2c6424392bb24a)

[AA][JumpThreading] Don't use DomTree for AA in JumpThreading (#79294)

JumpThreading may perform AA queries while the dominator tree is not up
to date, which may result in miscompilations.

Fix this by adding a new AAQI option to disable the use of the dominator
tree in BasicAA.

Fixes https://github.com/llvm/llvm-project/issues/79175.

(cherry picked from commit 4f32f5d5720fbef06672714a62376f236a36aef5)

[JumpThreading] Add test for #79175 (NFC)

(cherry picked from commit 7143b451d71fe314730f7610d7908e3b9611815c)

[Loads] Use BatchAAResults for available value APIs (NFCI)

This allows caching AA queries both within and across the calls,
and enables us to use a custom AAQI configuration.

(cherry picked from commit 89dae798cc77789a43e9a60173f647dae03a65fe)

[compiler-rt] Remove duplicate MS names for chkstk symbols (#80450)

Prior to 885d7b759b5c166c07c07f4c58c6e0ba110fb0c2, the builtins library
contained two chkstk implementations for each of i386 and x86_64, one
that was used in mingw environments, and one unused (with a symbol name
not matching anything that is used anywhere). Some of the functions
additionally had other, also unused, aliases.

After cleaning this up in 885d7b759b5c166c07c07f4c58c6e0ba110fb0c2, the
unused symbol names were removed.

At the same time, symbol aliases were added for the names as they are
used by MSVC; the functions are functionally equivalent, but have
different names between mingw and MSVC style environments.

By adding a symbol alias (so that one object file contains two different
symbols for the same function), users can run into problems with
duplicate definitions, if they themselves define one of the symbols (for
various reasons), but need to link in the other one.

This happens for Wine, which provides their own definition of
"__chkstk", but when built in mingw mode does need compiler-rt to
provide the mingw specific symbol names; see
https://github.com/mstorsjo/llvm-mingw/issues/397.

To avoid the issue, remove the extra MS style names. They weren't
entirely usable as such for MSVC style environments anyway, as
compiler-rt builtins don't build these object files at all, when built
in MSVC mode; thus, the effort to provide them for MSVC style
environments in 885d7b759b5c166c07c07f4c58c6e0ba110fb0c2 was a
half-hearted step towards that.

If we really do want to provide those functions (as an alternative to
the ones provided by MSVC itself), we should do it in a separate object
file (even if the function implementation is the same), so that users
who have a definition of one of them but need a definition of the other,
won't have conflicts.

Additionally, if we do want to provide them for MSVC, those files
actually should be built when building the builtins in MSVC mode as well
(see compiler-rt/lib/builtins/CMakeLists.txt).

If we do that, there's a risk that an MSVC style build ends up linking
in and preferring our implementation over the one provided by MSVC,
which would be suboptimal. Our implementation always probes the
requested amount of stack, while the MSVC one checks the amount of
allocated stack and only probes as much as really is needed.

In short - this reverts the situation to what it was in the 17.x release
series (except for unused functions that have been removed).

(cherry picked from commit 248aeac1ad2cf4f583490dd1312a5b448d2bb8cc)

[Coverage] Let `Decision` take account of expansions (#78969)

The current implementation (D138849) assumes that `Branch`(es) follow
the corresponding `Decision`. This is not true if the `Branch`(es) are
forwarded to an expanded file ID. As a result, consecutive `Decision`(s)
would be confused by an insufficient number of `Branch`(es).

`Expansion` will point to `Branch`(es) in other file IDs if `Expansion` is
included in the range of `Decision`.

Fixes #77871

---------

Co-authored-by: Alan Phipps <a-phipps@ti.com>
(cherry picked from commit d912f1f0cb49465b08f82fae89ece222404e5640)

CoverageMappingWriter: Emit `Decision` before `Expansion` (#78966)

To simplify scanning the records, order them so that `Decision < Expansion`;
otherwise it could not be determined whether an `Expansion` belonged to a
`Decision` or not.

Relevant to #77871

(cherry picked from commit 438fe1db09b0c20708ea1020519d8073c37feae8)

[MSSAUpdater] Handle simplified accesses when updating phis (#78272)

This is a followup to #76819. After those changes, we can still run into
an assertion failure for a slight variation of the test case: When
fixing up MemoryPhis, we map the incoming access to the access of the
cloned instruction -- which may now no longer exist.

Fix this by reusing the getNewDefiningAccessForClone() helper, which
will look upwards for a new defining access in that case.

(cherry picked from commit a7a1b8b17e264fb0f2d2b4165cf9a7f5094b08b3)

[Clang][AArch64] Emit 'unimplemented' diagnostic for SME (#80295)

When a function F that has ZA and ZT0 state calls another function G that
only shares ZT0 state with its caller, F will have to save ZA before
the call to G, and restore it afterwards (rather than setting up a
lazy-save).

This is not yet implemented in LLVM and does not result in a
compile-time error either. So instead of silently generating incorrect
code, it's better to emit an error saying this is not yet implemented.

(cherry picked from commit 319f4c03ba2909c7240ac157cc46216bf1518c10)

[LoongArch] Fixing the incorrect return value of LoongArchTTIImpl::getRegisterBitWidth (#79441)

When we do not enable vector features, we should return the default
value (`TargetTransformInfoImplBase::getRegisterBitWidth`) instead of
zero.

This should fix the LoongArch [buildbot
breakage](https://lab.llvm.org/staging/#/builders/5/builds/486) from
#78943.

(cherry picked from commit 1e9924c1f248bbddcb95d82a59708d617297dad3)

[clang] Represent array refs as `TemplateArgument::Declaration` (#80050)

This restores (probably temporarily) the array-referring NTTP behavior to
what it was prior to #78041 because ~~I'm fed up~~ I have no time to fix
regressions.
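
For readers unfamiliar with the term, an array-referring NTTP is a non-type
template parameter bound to an array by reference. The snippet below only
shows the kind of source construct concerned; it says nothing about how Clang
represents the argument internally, and the names are made up.

```cpp
// An array with static storage duration that a template can refer to.
constexpr int kWeights[3] = {1, 2, 3};

// Non-type template parameter of type "reference to array of 3 const int".
template <const int (&Arr)[3]>
constexpr int sumOf() {
  return Arr[0] + Arr[1] + Arr[2];
}

static_assert(sumOf<kWeights>() == 6, "array passed by reference as an NTTP");
```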

(cherry picked from commit 9bf4e54ef42d907ae7550f36fa518f14fa97af6f)

ReleaseNotes: add lld/ELF notes (#80393)

[ELF] Fix compareSections assertion failure when OutputDescs in sectionCommands are non-contiguous

In a `--defsym y0=0 -T a.lds` link where a.lds contains only INSERT
commands, the `script->sectionCommands` layout may be:
```
orphan sections
SymbolAssignment due to --defsym
sections created by INSERT commands
```

The `OutputDesc` objects are not contiguous in sortInputSections, and
`compareSections` will be called with a SymbolAssignment argument,
leading to an assertion failure.

(cherry picked from commit dee8786f70a3d62b639113343fa36ef55bdbad63)

Backport '[clang] static operators should evaluate object argument (reland)' to release/18.x (#80109)

Cherry picked from commit ee01a2c3996f9647f3158f5acdb921a6ede94dc1.

Closes #80041, backport #80108.

Co-authored-by: Shafik Yaghmour <shafik@users.noreply.github.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>

[LAA] Drop alias scope metadata that is not valid across iterations (#79161)

LAA currently adds memory locations with their original AATags to AST.
However, scoped alias AATags may be valid only within one loop
iteration, while LAA reasons across iterations.

Fix this by determining which alias scopes are defined inside the loop,
and drop AATags that reference these scopes.

Fixes https://github.com/llvm/llvm-project/issues/79137.

(cherry picked from commit cd7ea4ea657ea41b42fcbd0e6b33faa46608d18e)

[PhaseOrdering] Add additional test for #79161 (NFC)

(cherry picked from commit 543cf08636f3a3bb55dddba2e8cad787601647ba)

[Clang][AArch64] Add missing SME macros (#80293)

__ARM_STATE_ZA and __ARM_STATE_ZT0 are set when the compiler can parse
the "za" and "zt0" strings in the SME attributes.

__ARM_FEATURE_SME and __ARM_FEATURE_SME2 are set when the compiler can
generate code for attributes with "za" and "zt0" state, respectively.

__ARM_FEATURE_LOCALLY_STREAMING is set when the compiler supports the
__arm_locally_streaming attribute.

(cherry picked from commit 9e649518e6038a5b9ea38cfa424468657d3be59e)

[libc++][modules] Support using the module std with -fno-char8_t. (#79155)

Exclude some using-declarations in the module purview when compiling
with `-fno-char8_t`.

(cherry picked from commit dc4483659fc51890fdc732acc66a4dcda6e68047)

[TLI][AArch64] Adjust TLI mappings to vector functions taking linear pointers (#80296)

The masked symbols in SLEEF are incorrectly implemented as calls to
non-masked variants, which only works for functions that do not modify
memory.
For vector variants that modify memory we can only use non-masked
symbols for now.
The SVE ArmPL mappings need to be removed for now as well.

(cherry picked from commit 0f26441cb83c1dea9aef12c748a79e3f38e3230a)

[openmp] On Windows, fix standalone cmake build (#80174)

This fixes: https://github.com/llvm/llvm-project/issues/80117

(cherry picked from commit d2565bb11308f6cf98d838e828d9bcbe2d51e0e4)

[coverage] fix crash in code coverage and `if constexpr` with `ExprWithCleanups` (#80292)

Fixes https://github.com/llvm/llvm-project/issues/80285

(cherry picked from commit bfc6eaa26326e4d0d20d1f4a1f0064c6df0135bd)

[🍒][libc++] Fix missing and incorrect push/pop macros (#79204) (#79497)

We recently noticed that the unwrap_iter.h file was pushing macros, but
it was pushing them again instead of popping them at the end of the
file. This led to libc++ basically swallowing any custom definition of
these macros in user code:

    #define min HELLO
    #include <algorithm>
    // min is not HELLO anymore, it's not defined

While investigating this issue, I noticed that our push/pop pragmas were
actually entirely wrong too. Indeed, instead of pushing macros like
`move`, we'd push `move(int, int)` in the pragma, which is not a valid
macro name. As a result, we would not actually push macros like `move`
-- instead we'd simply undefine them. This led to the following code not
working:

    #define move HELLO
    #include <algorithm>
    // move is not HELLO anymore

Fixing the pragma push/pop incantations led to a cascade of issues
because we use identifiers like `move` in a large number of places, and
all of these headers would now need to do the push/pop dance.

This patch fixes all these issues. First, it adds a check that we don't
swallow important names like min, max, move or refresh as explained
above. This is done by augmenting the existing
system_reserved_names.gen.py test to also check that the macros are what
we expect after including each header.

Second, it fixes the push/pop pragmas to work properly and adds missing
pragmas to all the files I could detect a failure in via the newly added
test.
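
A minimal sketch of the intended push/pop pattern, using the `push_macro` and
`pop_macro` pragmas with the bare macro name. The `lib` namespace, the helper
names and the `SKETCH_STR` macros are hypothetical and exist only to make the
effect observable.

```cpp
#include <cstring>

#define min HELLO // user-defined macro, as in the example above

// What a header is expected to do around code that uses the name `min`:
#pragma push_macro("min") // save the user's definition
#undef min
namespace lib {
inline int min(int a, int b) { return b < a ? b : a; }
inline int smallerOf(int a, int b) { return min(a, b); }
} // namespace lib
#pragma pop_macro("min") // restore the user's definition

// Stringify the current expansion of `min` to observe what survived.
#define SKETCH_STR2(x) #x
#define SKETCH_STR(x) SKETCH_STR2(x)
inline bool minIsStillHello() {
  return std::strcmp(SKETCH_STR(min), "HELLO") == 0;
}

#undef min // keep the rest of this translation unit clean
```

Replacing the `pop_macro` line with a second `push_macro` reproduces the
first problem described above: `min` would stay undefined after the header.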

rdar://121365472
(cherry picked from commit 7b4622514d232ce5f7110dd8b20d90e81127c467)

[ELF] --warn-backrefs: --defsym does not make a backward reference

The interaction between --warn-backrefs and --defsym was not tested, but if
a --defsym-created reference causes archive member extraction, it seems
reasonable to suppress the diagnostic, which was the behavior before #78944.

(cherry picked from commit 9a1ca245c8bc60b1ca12cd906fb31130801d977e)

[llvm-jitlink] Fix detectStubKind() for big endian systems (#79970)

This function is used in `jitlink-check` lines in LIT tests. In #78371 I
forgot to swap the initial instruction bytes for systems that store the
constants as big-endian.

(cherry picked from commit 8a5bdd899f3cb57024d92b96c16e805ca9924ac7)

Fix analyzer crash on 'StructuralValue' (#79764)

`OpaqueValueExpr` doesn't necessarily contain a source expression.
Particularly, after #78041, it is used to carry the type and the value
kind of a non-type template argument of floating-point type or referring
to a subobject (those are so called `StructuralValue` arguments).

This fixes #79575.

(cherry picked from commit ef67f63fa5f950f4056b5783e92e137342805d74)

[AArch64] Don't generate neon integer complex numbers with +sve2. NFC (#79829)

The condition for allowing integer complex number support could also
allow neon fixed length complex numbers if +sve2 was specified. This
tightens the condition to only allow integer complex number support for
scalable vectors.

We could generalize this in the future to generate SVE intrinsics for
fixed-length vectors, but for the moment this opts for the simpler fix.

(cherry picked from commit 9520773c46777adbc1d489f831d6c93b8287ca0e)

[Docs] Fix documentation build.

Missing ending `` after c92ad411f2f94d8521cd18abcb37285f9a390ecb

[RISCV] Support __riscv_v_fixed_vlen for vbool types. (#76551)

This adopts a similar behavior to AArch64 SVE, where bool vectors are
represented as a vector of chars with 1/8 the number of elements. This
ensures the vector always occupies a power of 2 number of bytes.

A consequence of this is that vbool64_t, vbool32_t, and vbool16_t can
only be used with a vector length that guarantees at least 8 bits.
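
The size rule can be sketched as simple arithmetic, assuming a `vboolN_t`
mask holds VLEN / N bits. The helper names below are hypothetical.

```cpp
// `vlenBits` is the fixed vector length; `ratio` is the N in vboolN_t.
constexpr unsigned maskBits(unsigned vlenBits, unsigned ratio) {
  return vlenBits / ratio;
}

// The byte-vector representation needs at least one whole byte of mask bits.
constexpr bool fitsInWholeBytes(unsigned vlenBits, unsigned ratio) {
  return maskBits(vlenBits, ratio) >= 8;
}

// Number of char elements used to represent the mask.
constexpr unsigned maskBytes(unsigned vlenBits, unsigned ratio) {
  return maskBits(vlenBits, ratio) / 8;
}
```

Under this model vbool64_t needs a vector length of at least 512 bits,
vbool32_t at least 256 and vbool16_t at least 128.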

[RISCV] Fix M1 shuffle on wrong SrcVec in lowerShuffleViaVRegSplitting

This fixes a miscompile from #79072 where we were taking the wrong SrcVec to do
the M1 shuffle. E.g. if the SrcVecIdx was 2 and we had 2 VRegsPerSrc, we ended
up taking it from V1 instead of V2.
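
The index arithmetic in that example can be sketched as follows. The helper
name and the 0/1 return convention are hypothetical, and the sketch assumes
that V1's registers are numbered before V2's.

```cpp
#include <cassert>

// Returns 0 when the register at flat index `srcVecIdx` belongs to V1 and
// 1 when it belongs to V2.
inline unsigned sourceOperandFor(unsigned srcVecIdx, unsigned vregsPerSrc) {
  assert(vregsPerSrc != 0 && srcVecIdx < 2 * vregsPerSrc);
  return srcVecIdx / vregsPerSrc;
}
```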

[RISCV] Add test to showcase miscompile from #79072

[AArch64][SME] Fix inlining bug introduced in #78703 (#79994)

Calling a `__arm_locally_streaming` function from a function that
is not a streaming-SVE function would lead to incorrect inlining.

The issue didn't surface because the tests were not testing what
they were supposed to test.

(cherry picked from commit 3abf55a68caefd45042c27b73a658c638afbbb8b)

[SME] Stop RA from coalescing COPY instructions that transcend beyond smstart/smstop. (#78294)

This patch introduces a 'COALESCER_BARRIER' which is a pseudo node that
expands to a 'nop', but which stops the register allocator from coalescing
a COPY node when its use/def crosses a SMSTART or SMSTOP instruction.

For example:

    %0:fpr64 = COPY killed $d0
    undef %2.dsub:zpr = COPY %0       // <- Do not coalesce this COPY
    ADJCALLSTACKDOWN 0, 0
    MSRpstatesvcrImm1 1, 0, csr_aarch64_smstartstop, implicit-def dead $d0
    $d0 = COPY killed %0
    BL @use_f64, csr_aarch64_aapcs

If the COPY would be coalesced, that would lead to:

    $d0 = COPY killed %0

being replaced by:

    $d0 = COPY killed %2.dsub

which means the whole ZPR reg would be live up to the call, causing the
MSRpstatesvcrImm1 (smstop) to spill/reload the ZPR register:

    str     q0, [sp]   // 16-byte Folded Spill
    smstop  sm
    ldr     z0, [sp]   // 16-byte Folded Reload
    bl      use_f64

which would be incorrect for two reasons:
1. The program may load more data than it has allocated.
2. If there are other SVE objects on the stack, the compiler might use the
   'mul vl' addressing modes to access the spill location.

By disabling the coalescing, we get the desired results:

    str     d0, [sp, #8]  // 8-byte Folded Spill
    smstop  sm
    ldr     d0, [sp, #8]  // 8-byte Folded Reload
    bl      use_f64

(cherry picked from commit dd736661826e215ac70ff3a4a4ccd75bda0c5ccd)

[sanitizer] Handle Gentoo's libstdc++ path

On Gentoo, libc++ is indeed in /usr/include/c++/*, but libstdc++ is at
e.g. /usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14.

Use '/include/g++' as it should be unique enough. Note that the omission of
a trailing slash is intentional to match g++-*.

See https://github.com/llvm/llvm-project/pull/78534#issuecomment-1904145839.

Reviewed by: mgorny
Closes: https://github.com/llvm/llvm-project/pull/79264
Signed-off-by: Sam James <sam@gentoo.org>
(cherry picked from commit e8f882f83acf30d9b4da8846bd26314139660430)

Backport [RISCV] Graduate Zicond to non-experimental (#79811) (#80018)

The Zicond extension was ratified in the last few months, with no
changes that affect the LLVM implementation. Although there's surely
more tuning that could be done about when to select Zicond or not, there
are no known correctness issues. Therefore, we should mark support as
non-experimental.

(cherry-picked from commit d833b9d677c9dd0a35a211e2fdfada21ea9a464b)

Apply kind code check on exitstat and cmdstat  (#78286)

When testing on gcc, both exitstat and cmdstat must be a kind=4 integer,
e.g. DefaultInt. This patch changes the input arg requirement from
`AnyInt` to `TypePattern{IntType, KindCode::greaterOrEqualToKind, n}`.

The standard states in 16.9.73:
- EXITSTAT (optional) shall be a scalar of type integer with a decimal
exponent range of at least nine.
- CMDSTAT (optional) shall be a scalar of type integer with a decimal
exponent range of at least four.

```fortran
program bug
  implicit none
  integer(kind = 2) :: exitstatvar
  integer(kind = 4) :: cmdstatvar
  character(len=256) :: msg
  character(len=:), allocatable :: command
  command='echo hello'
  call execute_command_line(command, exitstat=exitstatvar, cmdstat=cmdstatvar)
end program
```
When testing the above program with exitstatvar kind<4, an error would
occur:
```
$ ../build-release/bin/flang-new test.f90
error: Semantic errors in test.f90
./test.f90:8:47: error: Actual argument for 'exitstat=' has bad type or kind 'INTEGER(2)'
    call execute_command_line(command, exitstat=exitstatvar)
```

When testing the above program with cmdstatvar kind<2, an error would
occur:
```
$ ../build-release/bin/flang-new test.f90
error: Semantic errors in test.f90
./test.f90:8:47: error: Actual argument for 'cmdstat=' has bad type or kind 'INTEGER(1)'
    call execute_command_line(command, cmdstat=cmdstatvar)
```
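
The kind thresholds follow from the decimal exponent ranges quoted from the
standard. The sketch below works them out, assuming that an integer kind
equals its size in bytes and that integers are two's complement; the helper
names are hypothetical.

```cpp
#include <cassert>

// Decimal exponent range (what Fortran's RANGE() reports) of a signed
// integer occupying `kindBytes` bytes.
inline int decimalRange(int kindBytes) {
  assert(kindBytes >= 1 && kindBytes <= 8);
  unsigned long long maxValue = (1ULL << (8 * kindBytes - 1)) - 1;
  int range = 0;
  while (maxValue >= 10) {
    maxValue /= 10;
    ++range;
  }
  return range;
}

// EXITSTAT needs a range of at least nine, CMDSTAT at least four.
inline bool okForExitstat(int kindBytes) { return decimalRange(kindBytes) >= 9; }
inline bool okForCmdstat(int kindBytes) { return decimalRange(kindBytes) >= 4; }
```

Under these assumptions kinds 1, 2, 4 and 8 have ranges 2, 4, 9 and 18, so
EXITSTAT needs at least kind 4 and CMDSTAT at least kind 2.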

A test file for these semantics has been added to `flang/test/Semantic`.
Fixes: https://github.com/llvm/llvm-project/issues/77990
(cherry picked from commit 14a15103cc9dbdb3e95c04627e0b96b5e3aa4944)

Revert "[AArch64] merge index address with large offset into base address"

This reverts commit 32878c2065c8005b3ea30c79e16dfd7eed55d645 due to #79756 and #76202.

(cherry picked from commit 915c3d9e5a2d1314afe64cd6116a3b6c9809ec90)

[mlir] Revert to old fold logic in IR::Dialect::add{Types, Attributes}() (#79582)

Fold expressions on Clang are limited to 256 elements. This causes
compilation errors in cases when the number of elements added exceeds
this limit. Side-step the issue by restoring the original trick that
would use the std::initializer_list. For the record, our downstream
Clang 16 gives:

mlir/include/mlir/IR/Dialect.h:269:23: fatal error: instantiating fold
expression with 688 arguments exceeded expression nesting limit of 256
(addType<Args>(), ...);

Partially reverts 26d811b3ecd2fa1ca3d9b41e17fb42b8c7ad03d6.
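
A reduced sketch of the two forms, using a hypothetical `RegistrySketch` type
in place of the dialect. The initializer-list version expands the pack as a
flat list of elements rather than as one nested comma expression.

```cpp
#include <initializer_list>

// Hypothetical stand-in for a dialect that counts the types added to it.
struct RegistrySketch {
  int count = 0;
  template <typename T>
  void addType() { ++count; }
};

// Fold-expression form (C++17), which nests one level per element:
//   (registry.addType<Args>(), ...);
//
// Initializer-list form: each pack element becomes one list entry. The
// leading 0 keeps the list non-empty when the pack is empty.
template <typename... Args>
void addTypes(RegistrySketch &registry) {
  (void)std::initializer_list<int>{0, (registry.addType<Args>(), 0)...};
}
```

The elements of a braced list are evaluated left to right, so the types are
still added in pack order.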

Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
(cherry picked from commit e3a38a75ddc6ff00301ec19a0e2488d00f2cc297)

[LoopVectorize] Refine runtime memory check costs when there is an outer loop (#76034)

When we generate runtime memory checks for an inner loop it's
possible that these checks are invariant in the outer loop and
so will get hoisted out. In such cases, the effective cost of
the checks should reduce to reflect the outer loop trip count.

This fixes a 25% performance regression introduced by commit

49b0e6dcc296792b577ae8f0f674e61a0929b99d

when building the SPEC2017 x264 benchmark with PGO, where we
decided the inner loop trip count wasn't high enough to warrant
the (incorrect) high cost of the runtime checks. Also, when
runtime memory checks consist entirely of diff checks these are
likely to be outer loop invariant.

(cherry picked from commit 962fbafecf4730ba84a3b9fd7a662a5c30bb2c7c)

[AMDGPU] Do not bother adding reserved registers to liveins (#79436)

Tweak the implementation of llvm.amdgcn.wave.id to not add TTMP8 to the
function liveins.

[AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325)

This is only valid on targets with architected SGPRs.

Revert "[SemaCXX] Implement CWG2137 (list-initialization from objects of the same type) (#77768)"

This reverts commit 924701311aa79180e86ad8ce43d253f27d25ec7d. Causes compilation
errors on valid code, see
https://github.com/llvm/llvm-project/pull/77768#issuecomment-1908062472.

(cherry picked from commit 6e4930c67508a90bdfd756f6e45417b5253cd741)

[mlir][LLVM] Use int32_t to indirectly construct GEPArg (#79562)

GEPArg can only be constructed from int32_t and mlir::Value. Explicitly
cast other types (e.g. unsigned, size_t) to int32_t to avoid narrowing
conversion warnings on MSVC. Some recent examples of such are:

```
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'size_t' to 'T' requires a narrowing
conversion
    with
    [
        T=mlir::LLVM::GEPArg
    ]

mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'unsigned int' to 'T' requires a narrowing
conversion
    with
    [
        T=mlir::LLVM::GEPArg
    ]
```
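
The pattern behind these errors, and the explicit cast that avoids them,
can be sketched with a stand-in type. IndexArg and makeIndices are
hypothetical names; only the "constructible from int32_t" property is
taken from the description of GEPArg above:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a type that is constructible from int32_t.
// This is not mlir::LLVM::GEPArg.
struct IndexArg {
  IndexArg(int32_t v) : value(v) {}
  int32_t value;
};

std::vector<IndexArg> makeIndices(std::size_t fieldIndex) {
  // Writing {0, fieldIndex} would require converting size_t to int32_t
  // inside a braced list, which is the situation MSVC reports as C2398.
  // The explicit cast makes the conversion intentional.
  return {0, static_cast<int32_t>(fieldIndex)};
}
```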

Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
(cherry picked from commit 89cd345667a5f8f4c37c621fd8abe8d84e85c050)

[llvm] [cmake] Include httplib in LLVMConfig.cmake (#79305)

Include LLVM_ENABLE_HTTPLIB along with httplib package finding in
LLVMConfig.cmake, as this dependency is needed by LLVMDebuginfod that is
now used by LLDB. Without it, building LLDB standalone fails with:

```
CMake Error at /usr/lib/llvm/19/lib64/cmake/llvm/LLVMExports.cmake:90 (set_target_properties):
  The link interface of target "LLVMDebuginfod" contains:

    httplib::httplib

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  /usr/lib/llvm/19/lib64/cmake/llvm/LLVMConfig.cmake:357 (include)
  cmake/modules/LLDBStandalone.cmake:9 (find_package)
  CMakeLists.txt:34 (include)
```

(cherry picked from commit 3c9f34c12450345c6eb524e47cf79664271e4260)

[workflows] Fix argument passing in abi-dump jobs (#79658) (#79836)

This was broken by 859e6aa1008b80d9b10657bac37822a32ee14a23, which added
quotes around the EXTRA_ARGS variable.

[AMDGPU] Move architected SGPR implementation into isel (#79120)

(cherry picked from commit 70fc9703788e8965813c5b677a85cb84b66671b6)

workflows: Merge LLVM tests together into a single job (#78877) (#79710)

This is possible now that the free GitHub runners for Windows and Linux
have more disk space:

https://github.blog/2024-01-17-github-hosted-runners-double-the-power-for-open-source/

I also had to switch from macOS-11 to macOS-13 in order to prevent the
job from timing out. macOS-13 runners have 4 vCPUs and the macOS-11
runners only have 3.

[LTO] Fix Veclib flags correctly pass to LTO flags (#78749)

Flags `-fveclib=name` were not passed to LTO flags.
This patch fixes that by converting the `-fveclib` flags to their
relevant names for opt's `-vector-library=name` flags.

For example:
`-fveclib=SLEEF` would become `-vector-library=sleefgnuabi` and passed
through the `-plugin-opt` flag.
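
A minimal sketch of that conversion, assuming a hypothetical helper named
veclibToPluginOpt. Only the SLEEF mapping is taken from this message; the
exact spelling of the resulting plugin option is an assumption, and other
libraries are deliberately left out:

```cpp
#include <optional>
#include <string>

// Hypothetical helper, not the Clang driver code. It maps a -fveclib=
// value to the option that would be forwarded to the LTO plugin.
std::optional<std::string> veclibToPluginOpt(const std::string &veclib) {
  if (veclib == "SLEEF")
    // Assumed spelling: the -plugin-opt prefix followed by opt's flag.
    return std::string("-plugin-opt=-vector-library=sleefgnuabi");
  // Other vector libraries are not covered by this sketch.
  return std::nullopt;
}
```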

(cherry picked from commit 03cf0e9354e7e56ff794e9efb682ed2971bc91ec)

Fix comparison of Structural Values

Fixes a regression from #78041 as reported in the review. The original
patch failed to compare the canonical type, which this adds. A slightly
modified test of the original report is added.

(cherry picked from commit e3ee3762304aa81e4a240500844bfdd003401b36)

[X86] Do not end 'note.gnu.property' section with -fcf-protection (#79360)

The glibc now adds the required minimum ISA level for libc-nonshared.a
(linked on all programs) and this is done with an inline asm along with
.note.gnu.property and .pushsection/.popsection. However, the x86
backend always ends the 'note.gnu.property' section when building with
-fcf-protection, leading to assert failure:

llvm/llvm-project-git/llvm/lib/MC/MCStreamer.cpp:1251: virtual void
llvm::MCStreamer::switchSection(llvm::MCSection*, const llvm::MCExpr*):
Assertion `!Section->hasEnded() && "Section already ended"' failed.

[1]
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/isa-level.c;h=3f1b269848a52f994275bab6f60dded3ded6b144;hb=HEAD

(cherry picked from commit a58c62fa824fd24d20fa2366e0ec8f241cb321fe)

[LTO] Fix fat-lto output for -c -emit-llvm. (#79404)

Fix and add a test case for combining '-ffat-lto-objects -c -emit-llvm'
options and fix a spelling mistake in same test.

(cherry picked from commit f1b1611148fa533fe198fec3fa4ef8139224dc80)

[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414)

…bf8 instructions

    Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
    instructions that were supported on GFX940 (MI300):
    - V_CVT_F32_FP8
    - V_CVT_F32_BF8
    - V_CVT_PK_F32_FP8
    - V_CVT_PK_F32_BF8
    - V_CVT_PK_FP8_F32
    - V_CVT_PK_BF8_F32
    - V_CVT_SR_FP8_F32
    - V_CVT_SR_BF8_F32

---------

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
(cherry picked from commit cfddb59be2124f7ec615f48a2d0395c6fdb1bb56)

[ELF] Implement R_RISCV_TLSDESC for RISC-V

Support
R_RISCV_TLSDESC_HI20/R_RISCV_TLSDESC_LOAD_LO12/R_RISCV_TLSDESC_ADD_LO12/R_RISCV_TLSDESC_CALL.
LOAD_LO12/ADD_LO12/CALL relocations reference a label at the HI20
location, which requires special handling. We save the value of HI20 to
be reused. Two interleaved TLSDESC code sequences, which compilers do
not generate, are unsupported.

For -no-pie/-pie links, TLSDESC to initial-exec or local-exec
optimizations are eligible. Implement the relevant hooks
(R_RELAX_TLS_GD_TO_LE, R_RELAX_TLS_GD_TO_IE): the first two instructions
are converted to NOP while the latter two are converted to a GOT load or
a lui+addi.

The first two instructions, which would be converted to NOP, are removed
instead in the presence of relaxation. Relaxation is eligible as long as
the R_RISCV_TLSDESC_HI20 relocation has a pairing R_RISCV_RELAX,
regardless of whether the following instructions have a R_RISCV_RELAX.
In addition, for the TLSDESC to LE optimization (`lui a0,<hi20>; addi a0,a0,<lo12>`),
`lui` can be removed (i.e. use the short form) if hi20 is 0.

```
// TLSDESC to LE/IE optimization
.Ltlsdesc_hi2:
  auipc a4, %tlsdesc_hi(c)                      # if relax: remove; otherwise, NOP
  load  a5, %tlsdesc_load_lo(.Ltlsdesc_hi2)(a4) # if relax: remove; otherwise, NOP
  addi  a0, a4, %tlsdesc_add_lo(.Ltlsdesc_hi2)  # if LE && !hi20 {if relax: remove; otherwise, NOP}
  jalr  t0, 0(a5), %tlsdesc_call(.Ltlsdesc_hi2)
  add   a0, a0, tp
```

The implementation carefully ensures that an instruction unrelated to
the current TLSDESC code sequence, if it immediately follows a removable
instruction (HI20 or LOAD_LO12 or (LE-specific) ADD_LO12), is not
converted to NOP.
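
The per-instruction outcome annotated in the listing above can be
summarized as a small decision function. This is only a sketch: the enum
and function names are hypothetical, and `relax` stands for "the
R_RISCV_TLSDESC_HI20 relocation has a pairing R_RISCV_RELAX":

```cpp
// Hypothetical names for the four relocations in the code sequence.
enum class Reloc { Hi20, LoadLo12, AddLo12, Call };

// Rewrite: converted to part of the GOT load or lui+addi sequence.
// Nop:     converted to NOP (no relaxation).
// Remove:  deleted from the output (with relaxation).
enum class Action { Rewrite, Nop, Remove };

// Sketch of the TLSDESC to LE/IE optimization as annotated above.
Action tlsdescAction(Reloc r, bool toLE, bool hi20IsZero, bool relax) {
  // The first two instructions are always dead after the optimization.
  // The ADD_LO12 instruction is dead only for LE when hi20 is 0, because
  // the lui of the lui+addi pair is then unnecessary.
  bool dead = r == Reloc::Hi20 || r == Reloc::LoadLo12 ||
              (r == Reloc::AddLo12 && toLE && hi20IsZero);
  if (!dead)
    return Action::Rewrite;
  return relax ? Action::Remove : Action::Nop;
}
```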

* `riscv64-tlsdesc.s` is inspired by `i386-tlsdesc-gd.s` (https://reviews.llvm.org/D112582).
* `riscv64-tlsdesc-relax.s` tests linker relaxation.
* `riscv-tlsdesc-gd-mixed.s` is inspired by `x86-64-tlsdesc-gd-mixed.s` (https://reviews.llvm.org/D116900).

Link: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373
Reviewed By: ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/79239

(cherry picked from commit 1117fdd7c16873eb389e988c6a39ad922bae0fd0)

[ELF] Fix terminology: TLS optimizations instead of TLS relaxation. NFC

(cherry picked from commit 849951f8759171cb6c74d3ccbcf154506fc1f0ae)

[ELF] Clean up R_RISCV_RELAX code. NFC

(cherry picked from commit ccb99f221422b8de5e1ae04d3427f15878f7cd93)

[RISCV] Use TableGen-based macro fusion (#72224)

We convert existing macro fusions to TableGen.

Because `Fusion` depends on `Instruction` definitions, which are defined
below `RISCVFeatures.td`, we recommend that users add fusion features
when defining a new processor.

(cherry picked from commit 3fdb431b636975f2062b1931158aa4dfce6a3ff1)

[TableGen] Add predicates for immediates comparison (#76004)

These predicates can be used to represent `<`, `<=`, `>`, `>=`.

And a predicate for `in range` is added.

(cherry picked from commit 664a0faac464708fc061d12e5cd492fcbfea979a)

[TableGen] Use MapVector to remove non-determinism

This fixes non-determinism found when the `LLVM_REVERSE_ITERATION`
option is `ON`.
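
To illustrate why an insertion-ordered map removes this kind of
non-determinism, here is a simplified sketch of the idea behind
llvm::MapVector. OrderedMap is a hypothetical class written for this
note, not the LLVM implementation: a hash map stores indices into a
vector, and iteration walks the vector.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified insertion-ordered map (hypothetical, for illustration only).
class OrderedMap {
public:
  // Returns the value for key, inserting a zero value on first use.
  int &operator[](const std::string &key) {
    auto it = index_.find(key);
    if (it == index_.end()) {
      it = index_.emplace(key, entries_.size()).first;
      entries_.emplace_back(key, 0);
    }
    return entries_[it->second].second;
  }

  // Iteration order is insertion order, independent of hashing.
  const std::vector<std::pair<std::string, int>> &entries() const {
    return entries_;
  }

private:
  std::unordered_map<std::string, std::size_t> index_;
  std::vector<std::pair<std::string, int>> entries_;
};
```

Output generated by walking entries() does not depend on the hash
container's iteration order, which is the order that
`LLVM_REVERSE_ITERATION` perturbs.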

Fixes #79420.

Reviewers: ilovepi, MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/79411

(cherry picked from commit 41fe98a6e7e5cdcab4a4e9e0d09339231f480c01)

[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)

Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>

[Driver,CodeGen] Support -mtls-dialect= (#79256)

GCC supports -mtls-dialect= for several architectures to select TLSDESC.
This patch supports the following values

* x86: "gnu". "gnu2" (TLSDESC) is not supported yet.
* RISC-V: "trad" (general dynamic), "desc" (TLSDESC, see #66915)

AArch64 toolchains seem to support TLSDESC from the beginning, and the
general dynamic model has poor support. Nobody seems to use the option
-mtls-dialect= at all, so we don't bother with it.
There also seems to be very little interest in AArch32's TLSDESC support.

TLSDESC does not change IR, but affects object file generation. Without
a backend option the option is a no-op for in-process ThinLTO.

There seems to be no motivation to have fine-grained control mixing
trad/desc for TLS, so we just pass -mllvm, and don't bother with module
flags metadata or a function attribute.

Co-authored-by: Paul Kirth <paulkirth@google.com>
(cherry picked from commit 36b4a9ccd9f7e04010476e6b2a311f2052a4ac20)

Change check for embedded llvm version number to a regex to make test more flexible. (#79528) (#79642)

This test started to fail when LLVM created the release/18.x branch and
the main branch subsequently had the version number increased from 18 to
19.

I investigated this failure (it was blocking our internal automation)
and discovered that the CHECK statement on line 27 seemed to have the
compiler version number (1800) encoded in octal that it was checking
for. I don't know if this is something that explicitly needs to be
checked, so I am leaving it in, but it should be more flexible so the
test doesn't fail anytime the version number is changed. To accomplish
that, I changed the check for the 4-digit version number to be a regex.

I originally updated this test for the 18->19 transition in
a01195ff5cc3d7fd084743b1f47007645bb385f4. This change makes the CHECK
line more flexible so it doesn't need to be continually updated.

(cherry picked from commit 45f883ed06f39fba7557dfbbff4d10595b45f874)

[ELF] Don't resolve relocations referencing SHN_ABS to tombstone in non-SHF_ALLOC sections (#79238)

A SHN_ABS symbol has never been considered for
InputSection::relocateNonAlloc.
Before #74686, the code did make it work in the absence of `-z
dead-reloc-in-nonalloc=`.
There is now a report about such SHN_ABS uses

(https://github.com/llvm/llvm-project/pull/74686#issuecomment-1904101711)
and I think it makes sense for non-SHF_ALLOC to support SHN_ABS, like
SHF_ALLOC sections do.

```
// clang -g
__attribute__((weak)) int symbol;
int *foo() { return &symbol; }

0x00000023:   DW_TAG_variable [2]   (0x0000000c)
                ...
                DW_AT_location [DW_FORM_exprloc]        (DW_OP_addrx 0x0)

```

.debug_addr references `symbol`, which can be redefined by a symbol
assignment or --defsym to become a SHN_ABS symbol.

The problem is that `!sym.getOutputSection()` cannot discern SHN_ABS
from a symbol whose section has been discarded. Since commit
1981b1b6b92f7579a30c9ed32dbdf3bc749c1b40, a symbol relative to a
discarded section is changed to `Undefined`, so the `SHN_ABS` check
becomes trivial.

We currently apply tombstone for a relocation referencing
`SharedSymbol`. This patch does not change the behavior.
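
The resulting decision can be sketched as follows. The types and the
function are hypothetical stand-ins, not lld's classes, and the sketch
ignores the conditions on section and relocation type that decide whether
a tombstone applies at all:

```cpp
#include <cstdint>

// Hypothetical symbol kinds. A symbol relative to a discarded section has
// already been changed to Undefined by the time relocations are resolved.
enum class SymKind { Defined, Undefined, Shared };

struct Sym {
  SymKind kind;
  uint64_t value; // For a Defined SHN_ABS symbol, its absolute value.
};

// Sketch: value written for a relocation in a non-SHF_ALLOC section.
uint64_t resolveNonAlloc(const Sym &sym, uint64_t tombstone) {
  // Undefined (including formerly discarded) and shared symbols get the
  // tombstone value.
  if (sym.kind != SymKind::Defined)
    return tombstone;
  // Defined symbols, including SHN_ABS ones, resolve to their value.
  return sym.value;
}
```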

(cherry picked from commit 8abf8d124ae346016c56209de7f57b85671d4367)

[X86][CodeGen] Fix crash when commute operands of Instruction for code size (#79245)

Reported in 134fcc62786d31ab73439201dce2d73808d1785a

Incorrect opcode is used b/c there is a `[[fallthrough]]` at line 2386.

(cherry picked from commit 33ecef9812e2c9bfadef035b8e34a949acae2abc)

[test] Update dwarf-loongarch-relocs.ll

Address buildbot failures:
http://45.33.8.238/macm1/77360/step_11.txt
http://45.33.8.238/linux/128902/step_12.txt

(cherry picked from commit baba7e4175b6ca21e83b1cf8229f29dbba02e979)

[🍒][ci] Fix the base branch we use to determine changes (#79503) (#79506)

We should diff against the base branch, not always against `main`. This
allows the BuildKite pre-commit CI to work properly when we target other
branches, such as `release/18.x`.

(cherry picked from commit 3b762891826192ded07286852d326f9c9060f52e)

[workflows] Fix version-check.yml to work with the new minor release bump

(cherry picked from commit d5e69147b9d261bd53b4dd027f17131677be8613)

Use rc version suffix

Bump version to 18.1.0

[RISCV][MC] Split tests for A into Zaamo and Zalrsc parts

So that we don't duplicate tests in a later patch.

Reviewers: topperc, dtcxzyw, asb

Reviewed By: asb

Pull Request: https://github.com/llvm/llvm-project/pull/79111

[RISCV] Add sifive-p670 processor (#79015)

This is an OOO core that has a vector unit. For more information see
https://www.sifive.com/cores/performance-p650-670.

Scheduler model and other tuning will come in separate patches.

[llc] Remove C backend support (#79237)

The C backend was removed in 3.1.

[Modules] [HeaderSearch] Don't reenter headers if it is pragma once (#76119)

Close https://github.com/llvm/llvm-project/issues/73023

The direct issue of https://github.com/llvm/llvm-project/issues/73023 is
that we entered a header which is marked as pragma once since the
compiler thinks it is OK if there is a controlling macro.

It doesn't make sense. I feel like it should be sufficient to skip it
after we see the '#pragma once'.

From the context, it looks like the workaround is primarily for
ObjectiveC. So we might need reviewers from OC.