review.tizen.org Git - platform/upstream/llvm.git/log

[mlir] Revert to old fold logic in IR::Dialect::add{Types, Attributes}() (#79582)

Fold expressions on Clang are limited to 256 elements. This causes
compilation errors in cases when the amount of elements added exceeds
this limit. Side-step the issue by restoring the original trick that
would use the std::initializer_list. For the record, in our downstream
Clang 16 gives:

mlir/include/mlir/IR/Dialect.h:269:23: fatal error: instantiating fold
expression with 688 arguments exceeded expression nesting limit of 256
(addType<Args>(), ...);

Partially reverts 26d811b3ecd2fa1ca3d9b41e17fb42b8c7ad03d6.

Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
(cherry picked from commit e3a38a75ddc6ff00301ec19a0e2488d00f2cc297)

[LoopVectorize] Refine runtime memory check costs when there is an outer loop (#76034)

When we generate runtime memory checks for an inner loop it's
possible that these checks are invariant in the outer loop and
so will get hoisted out. In such cases, the effective cost of
the checks should reduce to reflect the outer loop trip count.

This fixes a 25% performance regression introduced by commit

49b0e6dcc296792b577ae8f0f674e61a0929b99d

when building the SPEC2017 x264 benchmark with PGO, where we
decided the inner loop trip count wasn't high enough to warrant
the (incorrect) high cost of the runtime checks. Also, when
runtime memory checks consist entirely of diff checks these are
likely to be outer loop invariant.

(cherry picked from commit 962fbafecf4730ba84a3b9fd7a662a5c30bb2c7c)

[AMDGPU] Do not bother adding reserved registers to liveins (#79436)

Tweak the implementation of llvm.amdgcn.wave.id to not add TTMP8 to the
function liveins.

[AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325)

This is only valid on targets with architected SGPRs.

Revert "[SemaCXX] Implement CWG2137 (list-initialization from objects of the same type) (#77768)"

This reverts commit 924701311aa79180e86ad8ce43d253f27d25ec7d. Causes compilation
errors on valid code, see
https://github.com/llvm/llvm-project/pull/77768#issuecomment-1908062472.

(cherry picked from commit 6e4930c67508a90bdfd756f6e45417b5253cd741)

[mlir][LLVM] Use int32_t to indirectly construct GEPArg (#79562)

GEPArg can only be constructed from int32_t and mlir::Value. Explicitly
cast other types (e.g. unsigned, size_t) to int32_t to avoid narrowing
conversion warnings on MSVC. Some recent examples of such are:

```
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'size_t' to 'T' requires a narrowing
conversion
    with
    [
        T=mlir::LLVM::GEPArg
    ]

mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'unsigned int' to 'T' requires a narrowing
conversion
    with
    [
        T=mlir::LLVM::GEPArg
    ]
```

Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
(cherry picked from commit 89cd345667a5f8f4c37c621fd8abe8d84e85c050)

[llvm] [cmake] Include httplib in LLVMConfig.cmake (#79305)

Include LLVM_ENABLE_HTTPLIB along with httplib package finding in
LLVMConfig.cmake, as this dependency is needed by LLVMDebuginfod that is
now used by LLDB. Without it, building LLDB standalone fails with:

```
CMake Error at /usr/lib/llvm/19/lib64/cmake/llvm/LLVMExports.cmake:90 (set_target_properties):
  The link interface of target "LLVMDebuginfod" contains:

    httplib::httplib

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  /usr/lib/llvm/19/lib64/cmake/llvm/LLVMConfig.cmake:357 (include)
  cmake/modules/LLDBStandalone.cmake:9 (find_package)
  CMakeLists.txt:34 (include)
```

(cherry picked from commit 3c9f34c12450345c6eb524e47cf79664271e4260)

[workflows] Fix argument passing in abi-dump jobs (#79658) (#79836)

This was broken by 859e6aa1008b80d9b10657bac37822a32ee14a23, which added
quotes around the EXTRA_ARGS variable.

[AMDGPU] Move architected SGPR implementation into isel (#79120)

(cherry picked from commit 70fc9703788e8965813c5b677a85cb84b66671b6)

workflows: Merge LLVM tests together into a single job (#78877) (#79710)

This is possible now that the free GitHub runners for Windows and Linux
have more disk space:

https://github.blog/2024-01-17-github-hosted-runners-double-the-power-for-open-source/

I also had to switch from macOS-11 to macOS-13 in order to prevent the
job from timing out. macOS-13 runners have 4 vCPUs and the macOS-11
runners only have 3.

[LTO] Fix Veclib flags correctly pass to LTO flags (#78749)

Flags `-fveclib=name` were not passed to LTO flags.
This pass fixes that by converting the `-fveclib` flags to their
relevant names for opt's `-vector-lib=name` flags.

For example:
`-fveclib=SLEEF` would become `-vector-library=sleefgnuabi` and passed
through the `-plugin-opt` flag.

(cherry picked from commit 03cf0e9354e7e56ff794e9efb682ed2971bc91ec)

Fix comparison of Structural Values

Fixes a regression from #78041 as reported in the review. The original
patch failed to compare the canonical type, which this adds. A slightly
modified test of the original report is added.

(cherry picked from commit e3ee3762304aa81e4a240500844bfdd003401b36)

[X86] Do not end 'note.gnu.property' section with -fcf-protection (#79360)

The glibc now adds the required minimum ISA level for libc-nonshared.a
(linked on all programs) and this is done with an inline asm along with
.note.gnu.property and .pushsection/.popsection. However, the x86
backend always ends the 'note.gnu.property' section when building with
-fcf-protection, leading to assert failure:

llvm/llvm-project-git/llvm/lib/MC/MCStreamer.cpp:1251: virtual void
llvm::MCStreamer::switchSection(llvm::MCSection*, const llvm::MCExpr*):
Assertion `!Section->hasEnded() && "Section already ended"' failed.

[1]
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/isa-level.c;h=3f1b269848a52f994275bab6f60dded3ded6b144;hb=HEAD

(cherry picked from commit a58c62fa824fd24d20fa2366e0ec8f241cb321fe)

[LTO] Fix fat-lto output for -c -emit-llvm. (#79404)

Fix and add a test case for combining '-ffat-lto-objects -c -emit-llvm'
options and fix a spelling mistake in same test.

(cherry picked from commit f1b1611148fa533fe198fec3fa4ef8139224dc80)

[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414)

…bf8 instructions

    Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
    instructions that were supported on GFX940 (MI300):
    - V_CVT_F32_FP8
    - V_CVT_F32_BF8
    - V_CVT_PK_F32_FP8
    - V_CVT_PK_F32_BF8
    - V_CVT_PK_FP8_F32
    - V_CVT_PK_BF8_F32
    - V_CVT_SR_FP8_F32
    - V_CVT_SR_BF8_F32

---------

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
(cherry picked from commit cfddb59be2124f7ec615f48a2d0395c6fdb1bb56)

[ELF] Implement R_RISCV_TLSDESC for RISC-V

Support
R_RISCV_TLSDESC_HI20/R_RISCV_TLSDESC_LOAD_LO12/R_RISCV_TLSDESC_ADD_LO12/R_RISCV_TLSDESC_CALL.
LOAD_LO12/ADD_LO12/CALL relocations reference a label at the HI20
location, which requires special handling. We save the value of HI20 to
be reused. Two interleaved TLSDESC code sequences, which compilers do
not generate, are unsupported.

For -no-pie/-pie links, TLSDESC to initial-exec or local-exec
optimizations are eligible. Implement the relevant hooks
(R_RELAX_TLS_GD_TO_LE, R_RELAX_TLS_GD_TO_IE): the first two instructions
are converted to NOP while the latter two are converted to a GOT load or
a lui+addi.

The first two instructions, which would be converted to NOP, are removed
instead in the presence of relaxation. Relaxation is eligible as long as
the R_RISCV_TLSDESC_HI20 relocation has a pairing R_RISCV_RELAX,
regardless of whether the following instructions have a R_RISCV_RELAX.
In addition, for the TLSDESC to LE optimization (`lui a0,<hi20>; addi a0,a0,<lo12>`),
`lui` can be removed (i.e. use the short form) if hi20 is 0.

```
// TLSDESC to LE/IE optimization
.Ltlsdesc_hi2:
  auipc a4, %tlsdesc_hi(c)                      # if relax: remove; otherwise, NOP
  load  a5, %tlsdesc_load_lo(.Ltlsdesc_hi2)(a4) # if relax: remove; otherwise, NOP
  addi  a0, a4, %tlsdesc_add_lo(.Ltlsdesc_hi2)  # if LE && !hi20 {if relax: remove; otherwise, NOP}
  jalr  t0, 0(a5), %tlsdesc_call(.Ltlsdesc_hi2)
  add   a0, a0, tp
```

The implementation carefully ensures that an instruction unrelated to
the current TLSDESC code sequence, if immediately follows a removable
instruction (HI20 or LOAD_LO12 OR (LE-specific) ADD_LO12), is not
converted to NOP.

* `riscv64-tlsdesc.s` is inspired by `i386-tlsdesc-gd.s` (https://reviews.llvm.org/D112582).
* `riscv64-tlsdesc-relax.s` tests linker relaxation.
* `riscv-tlsdesc-gd-mixed.s` is inspired by `x86-64-tlsdesc-gd-mixed.s` (https://reviews.llvm.org/D116900).

Link: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373
Reviewed By: ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/79239

(cherry picked from commit 1117fdd7c16873eb389e988c6a39ad922bae0fd0)

[ELF] Fix terminology: TLS optimizations instead of TLS relaxation. NFC

(cherry picked from commit 849951f8759171cb6c74d3ccbcf154506fc1f0ae)

[ELF] Clean up R_RISCV_RELAX code. NFC

(cherry picked from commit ccb99f221422b8de5e1ae04d3427f15878f7cd93)

[RISCV] Use TableGen-based macro fusion (#72224)

We convert existed macro fusions to TableGen.

Bacause `Fusion` depend on `Instruction` definitions which is defined
below `RISCVFeatures.td`, so we recommend user to add fusion features
when defining new processor.

(cherry picked from commit 3fdb431b636975f2062b1931158aa4dfce6a3ff1)

[TableGen] Add predicates for immediates comparison (#76004)

These predicates can be used to represent `<`, `<=`, `>`, `>=`.

And a predicate for `in range` is added.

(cherry picked from commit 664a0faac464708fc061d12e5cd492fcbfea979a)

[TableGen] Use MapVector to remove non-determinism

This fixes found non-determinism when `LLVM_REVERSE_ITERATION`
option is `ON`.

Fixes #79420.

Reviewers: ilovepi, MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/79411

(cherry picked from commit 41fe98a6e7e5cdcab4a4e9e0d09339231f480c01)

[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)

Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>

[Driver,CodeGen] Support -mtls-dialect= (#79256)

GCC supports -mtls-dialect= for several architectures to select TLSDESC.
This patch supports the following values

* x86: "gnu". "gnu2" (TLSDESC) is not supported yet.
* RISC-V: "trad" (general dynamic), "desc" (TLSDESC, see #66915)

AArch64 toolchains seem to support TLSDESC from the beginning, and the
general dynamic model has poor support. Nobody seems to use the option
-mtls-dialect= at all, so we don't bother with it.
There also seems very little interest in AArch32's TLSDESC support.

TLSDESC does not change IR, but affects object file generation. Without
a backend option the option is a no-op for in-process ThinLTO.

There seems no motivation to have fine-grained control mixing trad/desc
for TLS, so we just pass -mllvm, and don't bother with a modules flag
metadata or function attribute.

Co-authored-by: Paul Kirth <paulkirth@google.com>
(cherry picked from commit 36b4a9ccd9f7e04010476e6b2a311f2052a4ac20)

Change check for embedded llvm version number to a regex to make test more flexible. (#79528) (#79642)

This test started to fail when LLVM created the release/18.x branch and
the main branch subsequently had the version number increased from 18 to
19.

I investigated this failure (it was blocking our internal automation)
and discovered that the CHECK statement on line 27 seemed to have the
compiler version number (1800) encoded in octal that it was checking
for. I don't know if this is something that explicitly needs to be
checked, so I am leaving it in, but it should be more flexible so the
test doesn't fail anytime the version number is changed. To accomplish
that, I changed the check for the 4-digit version number to be a regex.

I originally updated this test for the 18->19 transition in
a01195ff5cc3d7fd084743b1f47007645bb385f4. This change makes the CHECK
line more flexible so it doesn't need to be continually updated.

(cherry picked from commit 45f883ed06f39fba7557dfbbff4d10595b45f874)

[ELF] Don't resolve relocations referencing SHN_ABS to tombstone in non-SHF_ALLOC sections (#79238)

A SHN_ABS symbol has never been considered for
InputSection::relocateNonAlloc.
Before #74686, the code did made it work in the absence of `-z
dead-reloc-in-nonalloc=`.
There is now a report about such SHN_ABS uses

(https://github.com/llvm/llvm-project/pull/74686#issuecomment-1904101711)
and I think it makes sense for non-SHF_ALLOC to support SHN_ABS, like
SHF_ALLOC sections do.

```
// clang -g
__attribute__((weak)) int symbol;
int *foo() { return &symbol; }

0x00000023:   DW_TAG_variable [2]   (0x0000000c)
                ...
                DW_AT_location [DW_FORM_exprloc]        (DW_OP_addrx 0x0)

```

.debug_addr references `symbol`, which can be redefined by a symbol
assignment or --defsym to become a SHN_ABS symbol.

The problem is that `!sym.getOutputSection()` cannot discern SHN_ABS
from a symbol whose section has been discarded. Since commit
1981b1b6b92f7579a30c9ed32dbdf3bc749c1b40, a symbol relative to a
discarded section is changed to `Undefined`, so the `SHN_ABS` check
become trivial.

We currently apply tombstone for a relocation referencing
`SharedSymbol`. This patch does not change the behavior.

(cherry picked from commit 8abf8d124ae346016c56209de7f57b85671d4367)

[X86][CodeGen] Fix crash when commute operands of Instruction for code size (#79245)

Reported in 134fcc62786d31ab73439201dce2d73808d1785a

Incorrect opcode is used b/c there is a `[[fallthrough]]` at line 2386.

(cherry picked from commit 33ecef9812e2c9bfadef035b8e34a949acae2abc)

[test] Update dwarf-loongarch-relocs.ll

Address buildbot faiures:
http://45.33.8.238/macm1/77360/step_11.txt
http://45.33.8.238/linux/128902/step_12.txt

(cherry picked from commit baba7e4175b6ca21e83b1cf8229f29dbba02e979)

[🍒][ci] Fix the base branch we use to determine changes (#79503) (#79506)

We should diff against the base branch, not always against `main`. This
allows the BuildKite pre-commit CI to work properly when we target other
branches, such as `release/18.x`.

(cherry picked from commit 3b762891826192ded07286852d326f9c9060f52e)

[workflows] Fix version-check.yml to work with the new minor release bump

(cherry picked from commit d5e69147b9d261bd53b4dd027f17131677be8613)

Use rc version suffix

Bump version to 18.1.0

[RISCV][MC] Split tests for A into Zaamo and Zalrsc parts

So that we don't duplicate tests in later patch.

Reviewers: topperc, dtcxzyw, asb

Reviewed By: asb

Pull Request: https://github.com/llvm/llvm-project/pull/79111

[RISCV] Add sifive-p670 processor (#79015)

This is an OOO core that has a vector unit. For more information see
https://www.sifive.com/cores/performance-p650-670.

Scheduler model and other tuning will come in separate patches.

[llc] Remove C backend support (#79237)

C backend is removed in 3.1.

[Modules] [HeaderSearch] Don't reenter headers if it is pragma once (#76119)

Close https://github.com/llvm/llvm-project/issues/73023

The direct issue of https://github.com/llvm/llvm-project/issues/73023 is
that we entered a header which is marked as pragma once since the
compiler think it is OK if there is controlling macro.

It doesn't make sense. I feel like it should be sufficient to skip it
after we see the '#pragma once'.

From the context, it looks like the workaround is primarily for
ObjectiveC. So we might need reviewers from OC.

[gn build] port 7e50f006f7f6

[LSR] Fix incorrect comment. NFC (#79207)

[AMDGPU] Pick available high VGPR for CSR SGPR spilling (#78669)

CSR SGPR spilling currently uses the early available physical VGPRs. It
currently imposes a high register pressure while trying to allocate
large VGPR tuples within the default register budget.

This patch changes the spilling strategy by picking the VGPRs in the
reverse order, the highest available VGPR first and later after regalloc
shift them back to the lowest available range. With that, the initial
VGPRs would be available for allocation and possibility
of finding large number of contiguous registers will be more.

[NewPM][CodeGen][llc] Add NPM support (#70922)

Add new pass manager support to `llc`. Users can use
`--passes=pass1,pass2...` to run mir passes, and use `--enable-new-pm`
to run default codegen pipeline.
This patch is taken from [D83612](https://reviews.llvm.org/D83612), the
original author is @yuanfang-chen.

---------

Co-authored-by: Yuanfang Chen <455423+yuanfang-chen@users.noreply.github.com>

[ELF,test] Improve dead-reloc-in-nonalloc.s

Test an absolute relocation referencing a DSO symbol, relocating a
non-SHF_ALLOC section. Also test --gc-sections.

[SROA] Only try additional vector type candidates when needed (#77678)

https://github.com/llvm/llvm-project/commit/f9c2a341b94ca71508dcefa109ece843459f7f13
causes regressions when we have a slice with integer vector type that is
the same size as the partition, and a ptr load/store slice that is not
the size of the element type.

Ref `vector-promotion.ll:ptrLoadStoreTys`.

Before the patch, we would only consider `<4 x i32>` as a candidate type
for vector promotion, and would find that it is a viable type for all
the slices.

After the patch, we now add `<2 x ptr>` as a candidate type due to slice
with user `store ptr %val0, ptr %obj, align 8` -- and flag that we
`HaveVecPtrTy`. The pre-existing behavior of this flag results in
removing the viable `<4 x i32>` and keeping only the unviable `<2 x
ptr>`, which results in a failure to promote.

The end result is failing to promote an alloca that was previously
promoted -- this does not appear to be the intent of that patch, which
has the goal of increasing promotions by providing more promotion
opportunities.

This PR preserves this behavior via a simple reorganization of the
implemention: try first the slice types with same size as the partition,
then, if there is no promotable type, try the `LoadStoreTys.`

[LoongArch] Insert nops and emit align reloc when handle alignment directive (#72962)

Refer to RISCV, we will fix up the alignment if linker relaxation
changes code size and breaks alignment. Insert enough Nops and emit
R_LARCH_ALIGN relocation type so that linker could satisfy the alignment
by removing Nops.
It does so only in sections with the SHF_EXECINSTR flag.

In LoongArch psABI v2.30, R_LARCH_ALIGN requires symbol index. The
lowest 8 bits of addend represent alignment and the other bits of addend
represent the maximum number of bytes to emit.

[Github] Only run libclang-python-tests on monorepo main

The libclang python binding test CI job currently doesn't have any
restrictions on what branches it will run on when something is pushed
and also isn't restricted to the monorepo. This patch adds a branch
restriction for the push event, only running the CI job when something
is pushed to the main branch (and the path filter is met), and also adds
a filter to ensure that the job comes from a PR against the monorepo or
a push to a branch in the monorepo.

[AsmPrinter] Remove mbb-profile-dump flag (#76595)

Now that the work embedding PGO information in SHT_LLVM_BB_ADDR_MAP ELF
sections has landed, there is no longer a need to keep around the
mbb-profile-dump flag.

[SROA] NFC: Precommit test for pull/77678

Change-Id: I6b2346301f9bd840a0adceba4a0d03e9932af245

[mlir] Add example of `printAlias` to test dialect (NFC) (#79232)

Follow-up from previous pull request. Motivate the API change with an
attribute that decides between sugaring a sub-attribute or using an
alias

[RISCV] Support TLSDESC in the RISC-V backend (#66915)

This patch adds basic TLSDESC support in the RISC-V backend.

Specifically, we add new relocation types for TLSDESC, as prescribed in
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373, and add a
new pseudo instruction to simplify code generation.

This patch does not try to optimize the local dynamic case, which can be
improved in separate patches.

Linker side changes will also be handled separately.

The current implementation is only enabled when passing the new
`-enable-tlsdesc` codegen flag.

[lldb] Improve maintainability and readability for ValueObject methods (#75865)

As I worked through changes to another PR
(https://github.com/llvm/llvm-project/pull/74912), I couldn't help but
rewrite a few methods for readability, maintainability, and possibly
some behavior correctness too.

1. Exiting early instead of nested `if`-statements, which:
- Reduces indentation levels for all subsequent lines
- Treats missing pre-conditions similar to an error
- Clearly indicates that the full length of the method is the "happy
path".
2. Explicitly return empty Value Object shared pointers for those error
(like) situations, which
- Reduces the time it takes a maintainer to figure out what the method
actually returns based on those conditions.

3. Converting a mix of `if` and `if`-`else`-statements around an enum
into one `switch` statement, which:
- Consolidates the former branching logic
- Lets the compiler warn you of a (future) missing enum case
- This one may actually change behavior slightly, because what was an
early test for one enum case, now happens later on in the `switch`.

4. Consolidating near-identical, "copy-pasta" logic into one place,
which:
- Separates the common code to the diverging paths.
- Highlights the differences between the code paths.

rdar://119833526

[nfc][clang] Fix test in new-array-init.cpp (#79225)

This test was originally introduced in
https://github.com/llvm/llvm-project/pull/76976, but it incorrectly
tests braced-list initialization instead of parenthesized
initialization.

[SROA] NFC: Extract code to checkVectorTypesForPromotion

Change-Id: Ib6f237cc791a097f8f2411bc1d6502f11d4a748e

[libc] remove redundant call_once (#79226)

Missed cleanup from https://reviews.llvm.org/D134716.

Fixes: #79220

[Docs][DebugInfo][RemoveDIs] Document some debug-info transition info (#79167)

This is a high level description and FAQ for what we're doing in
RemoveDIs, and how old code should be behave with new debug-info
(exactly the same 99% of the time).

Revert "[ASan][libc++] Turn on ASan annotations for short strings (#79049)"

This reverts commit cb528ec5e6331ce207c7b835d7ab963bd5e13af7.

Reason: buildbot breakage (https://lab.llvm.org/buildbot/#/builders/5/builds/40364):
SUMMARY: AddressSanitizer: container-overflow /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan_ubsan/include/c++/v1/string:1870:29 in __get_long_pointer

[DebugInfo][RemoveDIs] "Final" cleanup for non-instr debug-info (#79121)

Here's a raft of minor fixes for the RemoveDIs project that's replacing
dbg.value intrinsics with DPValue objects, all IMO trivial:
* When inserting functions or blocks and calling setIsNewDbgInfoFormat,
   do that after setting the Parent pointer, just in case conversion from
   (or to) dbg.value mode is triggered.
* When transferring DPValues from an empty range in a splice call, don't
   transfer if there are no DPValues attached to the source block at all.
* stripNonLineTableDebugInfo should drop DPValues.
* In insertBefore, don't try to transfer DPValues if there aren't any.

[mlir][ArithToAMDGPU] Add option for saturating truncation to fp8 (#74153)

Many machine-learning applications (and most software written at AMD)
expect the operation that truncates floats to 8-bit floats to be
saturatinng. That is, they expect `truncf 256.0 : f32 to f8E4M3FNUZ` to
yield `240.0`, not `NaN`, and similarly for negative numbers. However,
the underlying hardware instruction that can be used for this truncation
implements overflow-to-NaN semantics.

To enable handling this usecase, we add the saturate-fp8-truncf option
to ArithToAMDGPU (off by default), which causes the requisite clamping
code to be emitted. Said clamping code ensures that Inf and NaN are
passed through exactly (and thus trancate to NaN).

Per review feedback, this commit efactors
createScalarOrSplatConstant() to the Arith dialect utilities and uses
it in this code. It also fixes naming of existing patterns and
switches from vector.extractelement/insertelement to
vector.extract/insert.

[mlir][sparse] adjust compression scheme for example (#79212)

[NFCI] Move SANITIZER_WEAK_IMPORT to sanitizer_common (#79208)

SANITIZER_WEAK_IMPORT is useful for any call that needs to be
conditionally linked in. This is currently used for the
tsan_dispatch_interceptors, but can be used for other calls introduced
in newer versions of MacOS. (such as `aligned_alloc` in this PR
https://github.com/llvm/llvm-project/pull/79198).

This PR moves the definition to a higher level so it can be used in
other sanitizers.

AMDGPU: Add SourceOfDivergence for int_amdgcn_global_load_tr (#79218)

[Clang][Driver] Fix `--save-temps` for OpenCL AoT compilation (#78333)

We can directly call `clang -c -x cl -target amdgcn -mcpu=gfx90a test.cl
-o test.o`
to compile an OpenCL kernel file. However, when `--save-temps` is
enabled, it doesn't
work because the preprocessed file (`.i` file) is taken as C source file
when it
is fed to the front end, thus causing compilation error because those
OpenCL keywords
can't be recognized. This patch fixes the issue.

[misc-coroutine-hostile-raii] Use getOperand instead of getCommonExpr. (#79206)

We were previously allowlisting awaitable types returned by
`await_transform` instead of the type of the operand of the `co_await`
expression.

This previously used to give false positives and not respect the
`AllowedAwaitablesList` flag when `await_transform` is used. See added
test cases for such examples.

[clang][FatLTO] Avoid UnifiedLTO until it can support WPD/CFI (#79061)

Currently, the UnifiedLTO pipeline seems to have trouble with several
LTO features, like SplitLTO units, which means we cannot use important
optimizations like Whole Program Devirtualization or security hardening
instrumentation like CFI.

This patch reverts FatLTO to using distinct pipelines for Full LTO and
ThinLTO. It still avoids module cloning, since that was error prone.

[libc++] Fix outdated release procedure for release notes

[Preprocessor][test] Test ARM64EC definitions (#78916)

[lldb][NFCI] Remove unused method BreakpointIDList::AddBreakpointID(const char *) (#79189)

This overload is completely unused.

[mlir][Target] Teach dense_resource conversion to LLVMIR Target (#78958)

This patch adds support for translating dense_resource attributes to
LLVMIR Target.
The support added is similar to how DenseElementsAttr is handled, except
we
don't need to handle splats.

Another possible way of doing this is adding iteration on
dense_resource, but that is
non-trivial as DenseResourceAttr is not meant to be something you should
directly
access. It has subclasses which you are supposed to use to iterate on
it.

Added feature in llvm-profdata merge to filter functions from the profile (#78378)

`--function=<regex>` Include functions matching regex in the output
`--no-function=<regex>` Exclude functions matching regex from the output

If both are specified, `--no-function` has a higher precedence if a
function name matches both filters

[libc] Fix implicit conversion in FEnvImpl for arm32 targets. (#79210)

[clang] Use LazyDetector for all toolchains. (#79073)

Use the LazyDetector also for the remaining toolchains to avoid unnecessarily checking for the Cuda and Rocm installations.

This fixes rdar://121397534.

[PowerPC] lower partial vector store cost (#78358)

There are matching store opcodes (stfd, stxsiwx) for the load opcodes
that make 32-bit and 64-bit vector operations cheap with VSX, so stores
should also be cheap.

[ELF,test] Actually fix defsym.ll

[ELF,test] Fix defsym.ll

[SLP]Fix PR79193: skip analysis of gather nodes for minbitwidth.

No need in trying to analyze small graphs with gather node only to avoid
crash.

[IndVars] Add NUW variants to iv-poison.ll and variants with extra uses.

[Format] Fix detection of languages when reading from stdin (#79051)

The code cleanup in #74794 accidentally broke detection of languages by
reading file content from stdin, e.g. via `clang-format -dump-config - <
/path/to/filename`.

This PR adds unit and integration tests to reproduce the issue and adds
a fix.

Fixes: #79023

[libc++] Run the nightly libc++ build at 03:00 Eastern for real (#79184)

The nightly libc++ build was incorrectly set up to build at 22:00
Eastern when it intended to run at 03:00 Eastern. This patch fixes that.

[libc] Fix aliasing function name got accidentally deleted in #79128. (#79203)

[RISCV] Move FeatureStdExtH in RISCVFeatures.td. NFC

It was accidentally in the middle of the floating point extensions
after the recent reordering.

Revert "[libc] Fix forward arm32 buildbot" (#79201)

Reverts llvm/llvm-project#79151, necessary to revert #79128, which broke
all production builds and landed without review.

[lldb] Include SBFormat.h in LLDB.h (#79194)

This was likely overlooked when SBFormat was added.

[ELF] --save-temps --lto-emit-asm: derive ELF/asm file names from bitcode file names

Port COFF's https://reviews.llvm.org/D78221 and
https://reviews.llvm.org/D137217 to ELF. For the in-process ThinLTO
link, `ld.lld --save-temps a.o d/b.o -o out` will create
ELF relocatable files `out.lto.a.o`/`d/out.lto.b.o` instead of
`out1.lto.o`/`out2.lto.o`. Deriving the LTO-generated relocatable file
name from bitcode file names helps debugging.

The relocatable file name from the first regular LTO partition does not
change: `out.lto.o`. The second, if present due to `--lto-partition=`,
changes from `out1.lto.o` to `lto.1.o`.

For an archive member, e.g. `d/a.a(coll.o at 8)`,
the relocatable file is `d/out.lto.a.a(coll.o at 8).o`.

`--lto-emit-asm` file names are changed similarly. `--lto-emit-asm -o
out` now creates `out.lto.s` instead of `out`, therefore the
`--lto-emit-asm -o -` idiom no longer works. However, I think this new
behavior (which matches COFF) is better since keeping or removing
`--lto-emit-asm` will dump different files, instead of overwriting the
`-o` output file from an executable/shared object to an assembly file.

Reviewers: rnk, igorkudrin, xur-llvm, teresajohnson, ZequanWu

Reviewed By: teresajohnson

Pull Request: https://github.com/llvm/llvm-project/pull/78835

[CMake][Release] Add option for enabling PGO to release cache file. (#78823)

The option is LLVM_RELEASE_ENABLE_PGO and it's turned on by default.

---------

Co-authored-by: Petr Hosek <phosek@google.com>

[ELF] Improve thin-archivecollision.ll

[NFC] Size and element numbers are often swapped when calling calloc (#79081)

gcc-14 will now throw a warning if size and elements are swapped.

[RISCV] Regenerate autogen test to remove spurious diff

[RISCV] Recurse on first operand of two operand shuffles (#79180)

This is the first step towards an alternate shuffle lowering design for
the general two vector argument case. The goal is to leverage the
existing lowering for single vector permutes to avoid as many of the
vrgathers as required - even if we do need the other.

This patch handles only the first argument, and is arguably a slightly
weird half-step. However, the test changes from the full two argument
recurse patch are a lot harder to reason about. Taking this half step
gives much more easily reviewable changes, and is thus worthwhile. I
intend to post the patch for the second argument once this has landed.

[LLD] [COFF] Fix crashes for cfguard with undefined weak symbols (#79063)

When marking symbols as having their address taken, we can have the
sitaution where we have the address taken of a weak symbol. If there's
no strong definition of the symbol, the symbol ends up as an absolute
symbol with the value null. In those cases, we don't have any Chunk.
Skip such symbols from the cfguard tables.

This fixes https://github.com/llvm/llvm-project/issues/78619.

[RISCV] Exploit register boundaries when lowering shuffle with exact vlen (#79072)

If we have a shuffle which is larger than m1, we may be able to split it
into a series of individual m1 shuffles. This patch starts with the
subcase where the mask allows a 1-to-1 mapping from source register to
destination register - each with a possible permutation of their own. We
can potentially extend this later, thought in practice this seems to
already catch a number of the most interesting cases.

[libc] fix sysconf (#79159)

Apply previously discussed fix for `sysconf`

[ASan][libc++] Turn on ASan annotations for short strings (#79049)

Originally merged here: https://github.com/llvm/llvm-project/pull/75882
Reverted here: https://github.com/llvm/llvm-project/pull/78627

Reverted due to failing buildbots. The problem was not caused by the
annotations code, but by code in the `UniqueFunctionBase` class and in
the `JSON.h` file. That code caused the program to write to memory that
was already being used by string objects, which resulted in an ASan
error.

Fixes are implemented in:
- https://github.com/llvm/llvm-project/pull/79065
- https://github.com/llvm/llvm-project/pull/79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
#ifndef NDEBUG
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
#endif
```

---

Original description:

This commit turns on ASan annotations in `std::basic_string` for short
stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here:
https://github.com/llvm/llvm-project/pull/72677

Requires to pass CI without fails:
- https://github.com/llvm/llvm-project/pull/75845
- https://github.com/llvm/llvm-project/pull/75858

Annotating `std::basic_string` with default allocator is implemented in
https://github.com/llvm/llvm-project/pull/72677 but annotations for
short strings (SSO - Short String Optimization) are turned off there.
This commit turns them on. This also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning on and off short string annotations.

Support in ASan API exists since
https://github.com/llvm/llvm-project/commit/dd1b7b797a116eed588fd752fbe61d34deeb24e4.
You can turn off annotations for a specific allocator based on changes
from
https://github.com/llvm/llvm-project/commit/2fa1bec7a20bb23f2e6620085adb257dafaa3be0.

This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.

The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    advenam.tacet@trailofbits.com
    disconnect3d@trailofbits.com

[ASan][JSON] Unpoison memory before its reuse (#79065)

This commit unpoisons memory before its reuse (with reinterpret_cast).
Required by https://github.com/llvm/llvm-project/pull/79049

Notice that it's a temporary solution to prevent buildbots from failing.
Read FIXME for details.

[mlir][AMDGPU] Actually update the default ABI version, add comments (#79185)

Much confusion occurred earlier today when updating the fallback `int
abi;` in addControlVariables() didn't do anything. THis was because that
that value is the fallback for if the ABI version fails to parse ...
which it always should, because it has a default value that comes from
multiple different places.

This commit updates all the places said default variable can come from,
namely:
1. The ROCDL target attribute definition
2. The ROCDL target attribute's builders
3. The rocdl-attach-target pass's default option values.

With this, the printf test is passing.

[test] Avoid libc dep in Update warn-unsafe-buffer-usage-warning-data… (#79183)

Avoid libc dep in warn-unsafe-buffer-usage-warning-data-invocation.

To keep the test hermetic. This is in line with other existing
declarations in the file that avoid includes.

[ASan][ADT] Don't scribble with ASan (#79066)

With this commit, scribbling under AddressSanitizer (ASan) is disabled to prevent overwriting poisoned objects (e.g., annotated short strings).
Needed by https://github.com/llvm/llvm-project/pull/79049

Revert "Reapply [hwasan] Update dbg.assign intrinsics in HWAsan pass … (#79186)

…#78606"

This reverts commit 13c6f1ea2e7eb15fe492d8fca4fa1857c6f86370 because it
causes an assertion in DebugInfoMetadata.cpp:1968 in Clang Linux
builders for Fuchsia.

https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8758111613576762817/+/u/clang/build/stdout

AMDGPU: Do not generate non-temporal hint when Load_Tr intrinsic did not specify it (#79104)

int_amdgcn_global_load_tr did not specify non-temporal load transpose,
thus we should
not genetrate the non-temporal hint for the load. We need to implement
getTgtMemIntrinsic
to create the corresponding MemSDNode. And we don't set the non-temporal
flag because
the intrinsic did not specify it.

NOTE: We need to implement getTgtMemIntrinsic for any memory intrinsics.

[RISCV] Re-format RISCVFeatures.td so it doesn't look like AssemblerPredicate is an operand to Predicate. (#79076)

AssemblerPredicate was almost always indented to the same column as the
first operand to Predicate. But AssemblerPredicate is a separate base
class so should have the same indentation as Predicate.

For the string passed to AssemblePredicate, I aligned it to the other
arguments on the previous if it fit in 80 columns. Otherwise I indented
4 spaces past the start of AssemblerPredicate.

For some vendor extensions I put the 2 classes on new lines instead of
the same line as the def. This gave more room for the strings and was
more consistent with other formatting in that portion of the file.

[Orc] Let LLJIT default to JITLink for ELF-based ARM targets (#77313)

The JITLink AArch32 backend reached feature-parity with RuntimeDyld
on ELF-based systems. This patch changes the default JIT-linker in Orc's
LLJIT for these platforms.

This allows us to run clang-repl with JITLink on ARM and use all the
features we had with RuntimeDyld before. All existing tests for
clang-repl are passing.

Re-land [openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853)

The reverts 94f960925b7f609636fc2ffd83053814d5e45ed1 and fixes it.

Revert "[clang-repl] Enable native CPU detection by default (#77491)" (#79178)

Reverting because `clang-repl -Xcc -mcpu=arm1176jzf-s` isn't overwriting
this as I had expected. We need to check whether a specific CPU flag was
given by the user first.

Reverts llvm/llvm-project#77491

[ConstantHoisting] Cache OptForSize. (#79170)

CacheOptForSize to remove quadratic behavior.

For each constant analyzed, ConstantHoising calls
`shouldOptimizeForSize(F)`, which calls `PSI.getTotalCallCount(F)`.
PSI.getTotalCallCount(F) goes through all the instructions in all basic
blocks, and checks if each is a call, to count them up.

This reduces `llc` time for a very large IR from ~10min to under 3min.
Reproducer testcase is much too large to share.