review.tizen.org Git - platform/upstream/llvm.git/log

Reapply #2 of [runtimes] Fix building initial libunwind+libcxxabi+libcxx with compiler implied -lunwind

This does mostly the same as D112126, but for the runtimes cmake files.
Most of that is straightforward, but the interdependency between
libcxx and libunwind is tricky:

Libunwind is built at the same time as libcxx, but libunwind is not
installed yet. LIBCXXABI_USE_LLVM_UNWINDER makes libcxx link directly
against the just-built libunwind, but the compiler implicit -lunwind
isn't found. This patch avoids that by adding --unwindlib=none if
supported, if we are going to link explicitly against a newly built
unwinder anyway.

Since the previous attempt, this no longer uses
llvm_enable_language_nolink (and thus doesn't set
CMAKE_TRY_COMPILE_TARGET_TYPE=STATIC_LIBRARY during the compiler
sanity checks). Setting CMAKE_TRY_COMPILE_TARGET_TYPE=STATIC_LIBRARY
during compiler sanity checks makes cmake not learn about some
aspects of the compiler, which can make further find_library or
find_package fail. This caused OpenMP to not detect libelf and libffi,
disabling some OpenMP target plugins.

Instead, require the caller to set CMAKE_{C,CXX}_COMPILER_WORKS=YES
when building in a configuration with an incomplete toolchain.

Differential Revision: https://reviews.llvm.org/D113253

[NFC][lsan] Change LeakSuppressionContext interface

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D115318

[SLP]Fix comparator for cmp instruction vectorization.

The comparator for the sort functions should provide strict weak
ordering relation between parameters. Current solution causes compiler
crash with some standard c++ library implementations, because it does
not meet this criteria. Tried to fix it + it improves the iverall
vectorization result.

Differential Revision: https://reviews.llvm.org/D115268

[reductions] Delete another piece of dead flag handling [NFC]

The code claimed to handle nsw/nuw, but those aren't passed via builder state and the explicit IR construction just above never sets them.

The only case this bit of code is actually relevant for is FMF flags. However, dropPoisonGeneratingFlags currently doesn't know about FMF at all, so this was a noop. It's also unneeded, as the caller explicitly configures the flags on the builder before this call, and the flags on the individual ops should be controled by the intrinsic flags anyways. If any of the flags aren't safe to propagate, the caller needs to make that change.

[dsymutil][NFC] Fix typo in help message

Just a simple typo fix that allows me to test landing a commit now that
I have commit access.

Reviewed By: xgupta

Differential Revision: https://reviews.llvm.org/D115414

[recurrence] Delete dead flag/fmf handling [NFC]

The recurrence lowering code has handling which claims to be about flag intersection, but all the callers pass empty arrays to the arguments. The sole exception is a caller of a method which has the argument, but no implementation.

I don't know what the intent was here, but it certaintly doesn't actually do anything today.

[asan] Run background thread for asan only on THUMB

As in D114934, or lsan crashes on the same bot.

[sanitizer] Run Stack compression in background thread

Depends on D114495.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D114498

[instcombine] Do demanded elts last when visiting extractelement

This reorders existing transforms to put demanded elements last. The reasoning here is that when we have an example which can be scalarized or handled via demanded bits, we should prefer scalarization as that doesn't require dropping flags on arithmetic instructions.

This doesn't show major changes in the tests today, but once I add support for fast math flags to dropPoisonGeneratingFlags this becomes glaringly obvious.

Differential Revision: https://reviews.llvm.org/D115394

Thread safety analysis: Remove unused variable. NFC.

Revert "[sanitizer] Run Stack compression in background thread"

This reverts commit e5c2a46c5e8fc038b9f6c898df9628f9524dc10e as this
change introduced a linker error when building sanitizer runtimes:

  ld.lld: error: undefined symbol: __sanitizer::internal_start_thread(void* (*)(void*), void*)
  >>> referenced by sanitizer_stackdepot.cpp:133 (compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp:133)
  >>>               compiler-rt/lib/sanitizer_common/CMakeFiles/RTSanitizerCommonSymbolizer.x86_64.dir/sanitizer_stackdepot.cpp.obj:(__sanitizer::(anonymous namespace)::CompressThread::NewWorkNotify())

Compute estimated trip counts for multiple exit loops

This change allows us to estimate trip count from profile metadata for all multiple exit loops. We still do the estimate only from the latch, but that's fine as it causes us to over estimate the trip count at worst.

Reviewing the uses of the API, all but one are cases where we restrict a loop transformation (unroll, and vectorize respectively) when we know the trip count is short enough. So, as a result, the change makes these passes strictly less aggressive. The test change illustrates a case where we'd previously have runtime unrolled a loop which ran fewer iterations than the unroll factor. This is definitely unprofitable.

The one case where an upper bound on estimate trip count could drive a more aggressive transform is peeling, and I duplicated the logic being removed from the generic estimation there to keep it the same. The resulting heuristic makes no sense and should probably be immediately removed, but we can do that in a separate change.

This was noticed when analyzing regressions on D113939.

I plan to come back and incorporate estimated trip counts from other exits, but that's a minor improvement which can follow separately.

Differential Revision: https://reviews.llvm.org/D115362

[InstCombine] (~a & b & c) | ~(a | b | c) -> ~(a | (b ^ c))

Transform
```
(~a & b & c) | ~(a | b | c) -> ~(a | (b ^ c))
```
And swapped case:
```
(~a | b | c) & ~(a & b & c) -> ~a | (b ^ c)
```

```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %or1 = or i4 %b, %a
  %or2 = or i4 %or1, %c
  %not1 = xor i4 %or2, 15
  %not2 = xor i4 %a, 15
  %and1 = and i4 %b, %not2
  %and2 = and i4 %and1, %c
  %or3 = or i4 %and2, %not1
  ret i4 %or3
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %1 = xor i4 %c, %b
  %2 = or i4 %1, %a
  %or3 = xor i4 %2, 15
  ret i4 %or3
}
Transformation seems to be correct!
```
```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %and1 = and i4 %b, %a
  %and2 = and i4 %and1, %c
  %not1 = xor i4 %and2, 15
  %not2 = xor i4 %a, 15
  %or1 = or i4 %not2, %b
  %or2 = or i4 %or1, %c
  %and3 = and i4 %or2, %not1
  ret i4 %and3
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %xor = xor i4 %b, %c
  %not = xor i4 %a, 15
  %or = or i4 %xor, %not
  ret i4 %or
}
Transformation seems to be correct!
```

Differential Revision: https://reviews.llvm.org/D112966

Prevent abseil-cleanup-ctad check from stomping on surrounding context

This change applies two fixes to the abseil-cleanup-ctad check. It uses hasSingleDecl() to ensure only declStmt()s with one varDecl() are matched (leaving compount declStmt()s unchanged). It also addresses a bug in the handling of comments that surround the absl::MakeCleanup() calls by switching to the callArgs() combinator from Clang Transformer.

Reviewed By: ymandel

Differential Revision: https://reviews.llvm.org/D115452

[llvm] Use range-based for loops (NFC)

[mlir][Vector] Avoid infinite loop in InnerOuterDimReductionConversion.

This patterns tries to convert an inner (outer) dim reduction to an
outer (inner) dim reduction. Doing this on a 1D or 0D vector results
in an infinite loop since the converted op is same as the original
operation. Just returning failure when source rank <= 1 fixes the
issue.

Differential Revision: https://reviews.llvm.org/D115426

[fir] Keep runtime function name in comment

Some of the function name in the comment didn't match
their actual name in the runtime. This patch fixes that.

Reviewed By: schweitz

Differential Revision: https://reviews.llvm.org/D115076

Revert "tsan: new runtime (v3)"

This reverts commit 5a33e412815b8847610425a2a3b86d2c7c313b71 becuase it
breaks LLDB.

https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/39208/

[RISCV] Use MULHU for more division by constant cases.

D113805 improved handling of i32 divu/remu on RV64. The basic idea
from that can be extended to (mul (and X, C2), C1) where C2 is any
mask constant.

We can replace the and with an SLLI by shifting by the number of
leading zeros in C2 if we also shift C1 left by XLen - lzcnt(C1)
bits. This will give the full product XLen additional trailing zeros,
putting the result in the output of MULHU. If we can't use ANDI,
ZEXT.H, or ZEXT.W, this will avoid materializing C2 in a register.

The downside is it make take 1 additional instruction to create C1.
But since that's not on the critical path, it can hopefully be
interleaved with other operations.

The previous tablegen pattern is replaced by custom isel code.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D115310

[amdgpu][nfc] Drop dead PtrSet, fix a comment

[lldb] Remove unused lldb.cpp

lldb.cpp is unused after https://github.com/llvm/llvm-project/commit/ccf1469a4cdb03cb2bc7868f76164e85d90ebee1

Differential Revision: https://reviews.llvm.org/D115438

[NFC] Replace some deprecated getAlignment() calls with getAlign()

Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D115370

[RISCV] Reduce duplicate FP test cases.

-Remove feq, fle, flt tests from *-arith.ll in favor of *-fcmp.ll which tests all predicates.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D113703

Avoid unnecessary output buffer allocation and initialization.

The sparse tensor code generator allocates memory for the output tensor. As
such, we only need to allocate a MemRefDescriptor to receive the output tensor
and do not need to allocate and initialize the storage for the tensor.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D115292

[NFC][mlir][OpenMP] Added documentation for omp.atomic ops

This patch adds the documentation for the operations `omp.atomic.read`,
`omp.atomic.write` and `omp.atomic.update`.

Reviewed By: peixin

Differential Revision: https://reviews.llvm.org/D115445

[AArch64][Analysis] Add on overhead costs for SVE gathers and scatters

This patch adds on an overhead cost for gathers and scatters, which
is a rough estimate based on performance investigations I have
performed on SVE hardware for various micro-benchmarks.

Differential Revision: https://reviews.llvm.org/D115143

[MLIR][GPU] Define gpu.printf op and its lowerings

- Define a gpu.printf op, which can be lowered to any GPU printf() support (which is present in CUDA, HIP, and OpenCL). This op only supports constant format strings and scalar arguments
- Define the lowering of gpu.pirntf to a call to printf() (which is what is required for AMD GPUs when using OpenCL) as well as to the hostcall interface present in the AMD Open Compute device library, which is the interface present when kernels are running under HIP.
- Add a "runtime" enum that allows specifying which of the possible runtimes a ROCDL kernel will be executed under or that the runtime is unknown. This enum controls how gpu.printf is lowered

This change does not enable lowering for Nvidia GPUs, but such a lowering should be possible in principle.

And:
[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels

This is something that Clang always sets on both OpenCL and HIP kernels, and failing to include it causes mysterious crashes with printf() support.

In addition, revert the max-flat-work-group-size to (1, 256) to avoid triggering bugs in the AMDGPU backend.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D110448

[LoopVectorize][AArch64] Add vectoriser cost model tests for gathers/scatters

I've added some tests that were previously missing for the gather-scatter costs
being calculated by the vectorizer for AArch64:

Transforms/LoopVectorize/AArch64/sve-gather-scatter-cost.ll

The costs are sometimes different to the ones in

Analysis/CostModel/AArch64/sve-gather.ll

because the vectorizer also adds on the address computation cost.

Revert "[xray] add support for hexagon"

This reverts commit 543a9ad7c460bb8d641b1b7c67bbc032c9bfdb45.

[mlir] AsyncParallelFor: align block size to be a multiple of inner loops iterations

Depends On D115263

By aligning block size to inner loop iterations parallel_compute_fn LLVM can later unroll and vectorize some of the inner loops with small number of trip counts. Up to 2x speedup in multiple benchmarks.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D115436

[mlir] AsyncParallelFor: sink constants into the parallel compute function

With complex recursive structure of async dispatch function LLVM can't always propagate constants to the parallel_compute_fn and it often prevents optimizations like loop unrolling and vectorization. We help LLVM by pushing known constants into the parallel_compute_fn explicitly.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D115263

[test-release.sh] Respect the given width in LIT runs by adding `-j` in LLVM_LIT_ARGS.

This patch adds allows the LIT runs within test-release.sh to obey the width that
is passed into the script. This is accomplished by adding the width in the LLVM_LIT_ARGS
CMake configuration.

Differential Revision: https://reviews.llvm.org/D115350

[InlineAdvisor] Remove outdated comment (NFC)

This just returns None nowadays, so this comment doesn't apply
anymore.

[Inliner] Add debug message for history skip (NFC)

[AIX] Disable failing tests because of missing DWARF sections

The following tests are failing due to missing DWARF sections: `DwarfAccelNamesSection` and `DwarfAddrSection`. This patch sets these tests as `XFAIL` until the sections can be implemented for AIX.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D114681

[xray] add support for hexagon

Adds x-ray support for hexagon to llvm codegen, clang driver,
compiler-rt libs.

Differential Revision: https://reviews.llvm.org/D113638

[ARM][clang] Option b-key must not affect __ARM_FEATURE_PAC_DEFAULT

When using -mbranch-protection=pac-ret+b-key, macro __ARM_FEATURE_PAC_DEFAULT
should still have the value corresponding to a-key, because b-key is only valid
for AArch64.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Victor Campos

Reviewed By: danielkiss

Differential Revision: https://reviews.llvm.org/D115140

[mlir][linalg][bufferize] LinalgOps can bufferize inplace with input args

LinalgOp results usually bufferize inplace with output args. With this change, they may buffer inplace with input args if the value of the output arg is not used in the computation.

Differential Revision: https://reviews.llvm.org/D115022

[clang][clangd] Desugar array type.

Desugar array type.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D115107

[MetaRenamer] Add command line options to disable renaming name with specified prefixes

This patch adds 4 options for specifying functions, aliases, globals and
structs name prefixes hat don't need to be renamed by MetaRenamer pass.
This is useful if one has some downstream logic that depends directly
on an entity name. MetaRenamer can break this logic, but with the patch
you can tell it not to rename certain names.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D115323

[lldb][NFC] clang-format some files as preparation for https://reviews.llvm.org/D114627

Reviewed By: werat

Differential Revision: https://reviews.llvm.org/D115110

[MLIR] Move Presburger Math from FlatAffineConstraints to Presburger/IntegerPolyhedron

This patch factors out math functionality that is a subset of Presburger arithmetic and moves it from FlatAffineConstraints to Presburger/IntegerPolyhedron. This patch only moves some parts of the functionality planned to be moved, with subsequent patches moving more functionality. There are three main reasons for this:

1. This split makes the Presburger Library easier and more flexible to use
    across MLIR, by not depending on IR.
2. This split allows the Presburger library to be developed independently from
    Affine Analysis, with Affine Analysis using this library.
3. With more functionality being upstreamed to the Presburger Library, the
    mlir/Analysis directory will be cluttered with Presburger library components
    since they depend on math functionality from FlatAffineConstraints. Moving this
    functionality to the Presburger directory allows keeping the new functionality
    in the Presburger directory.

This patch is part of an ongoing effort to make the Presburger Library easier to use. The motivation for this effort is the feedback received at the LLVM conference from Mehdi and others.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D114674

Revert "Reapply [runtimes] Fix building initial libunwind+libcxxabi+libcxx with compiler implied -lunwind"

This reverts commit 317dc31e53b83c1d2a468d7a541925f0cc7d9dce.

After that change, OpenMP doesn't find dependencies in the host
system (it fails do find e.g. /usr/lib/x86_64-linux-gnu/libelf.so
which it found before), which causes some OpenMP target offloading
plugins to not be found. This doesn't break the build, but just
causes the AMDGPU OpenMP target plugin to be omitted. See
https://reviews.llvm.org/D113253#3181934 for the report of this
issue.

[LV] Mark various functions as const (NFC).

Make sure various accessors do not modify any state, in preparation for
D115111.

[ARM][clang] Define feature test macro for the PACBTI-M extension

If the extension string "+pacbti" was given in -march=... or -mcpu=... options the compiler shall define the following preprocessor macros:

__ARM_FEATURE_PAUTH with value 1.
__ARM_FEATURE_BTI with value 1.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Momchil Velikov
- Ties Stuij

Reviewed By: miyuki

Differential Revision: https://reviews.llvm.org/D112431

[clang-format] PR48916 PointerAlignment not working when using C++20 init-statement in for loop

https://bugs.llvm.org/show_bug.cgi?id=48916

Left and Right Alignment inside a loop is misaligned.

Reviewed By: HazardyKnusperkeks, curdeius

Differential Revision: https://reviews.llvm.org/D115050

[clang][deps] Use MemoryBuffer in minimizing FS

This patch avoids unnecessarily copying contents of `mmap`-ed files into `CachedFileSystemEntry` by storing `MemoryBuffer` instead. The change leads to ~50% reduction of peak memory footprint when scanning LLVM+Clang via `clang-scan-deps`.

Depends on D115331.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D115043

[llvm] Add null-termination capability to SmallVectorMemoryBuffer

Most of `MemoryBuffer` interfaces expose a `RequiresNullTerminator` parameter that's being used to:
* determine how to open a file (`mmap` vs `open`),
* assert newly initialized buffer indeed has an implicit null terminator.

This patch adds the paramater to the `SmallVectorMemoryBuffer` constructors, meaning:
* null terminator can now be added to `SmallVector`s that didn't have one before,
* `SmallVectors` that had a null terminator before keep it even after the move.

In line with existing code, the new parameter is defaulted to `true`. This patch makes sure all calls to the `SmallVectorMemoryBuffer` constructor set it to `false` to preserve the current semantics.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D115331

[llvm][lldb] Remove unused SmallVectorMemoryBuffer.h includes

[MLIR] Introduce coalesce for PresburgerSet

This patch provides functionality for simplifying `PresburgerSet`s by checking if any `FlatAffineConstraints` in the set is contained in another, and removing such redundant FACs.

This is part of a series of patches to provide functionality for [integer set coalescing](http://impact.gforge.inria.fr/impact2015/papers/impact2015-verdoolaege.pdf) in MLIR.

Reviewed By: arjunp

Differential Revision: https://reviews.llvm.org/D110617

[MLIR][OpenMP] Added omp.atomic.update

This patch supports the atomic construct (update) following section 2.17.7 of OpenMP 5.0 standard. Also added tests and verifier for the same.

Reviewed By: kiranchandramohan, peixin

Differential Revision: https://reviews.llvm.org/D112982

[clang] Fix a misadjusted path style comparison in a unittest

This was changed incorrectly by accident in
99023627010bbfefb71e25a2b4d056de1cbd354e.

Differential Revision: https://reviews.llvm.org/D113254

[clang][deps] Use lock_guard instead of unique_lock

This patch changes uses of `std::unique_lock` to `std::lock_guard`.

The `std::unique_lock` template provides some advanced capabilities (deferred locking, time-constrained locking attempts, etc.) we don't use in the caching filesystem. Plain `std::lock_guard` will do here.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D115332

[Test] [GVN] Add test showing equivalent PHIs generation by GVN

[ARM] Fix gcc warning about mix of enumeral and non-enumeral types

gcc warned with
../lib/Target/ARM/ARMFrameLowering.cpp:797:31: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
797 | Reg == ARM::R12 ? ARM::RA_AUTH_CODE : Reg, true);
| ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~

[PowerPC] Fix gcc warning about unused variable [NFC]

gcc warned about
../lib/Target/PowerPC/PPCTargetTransformInfo.cpp:1401:13: warning: unused variable 'VecTy' [-Wunused-variable]
1401 | if (auto *VecTy = dyn_cast<FixedVectorType>(DataType)) {
| ^~~~~

tsan: new runtime (v3)

This change switches tsan to the new runtime which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutimes

Depends on D112602.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D112603

[NewPM] Port FlattenCFGPass to NPM

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D115361

[mlir][memref] Fix subview offset verification.

Offset-specific verification seems to have been lost in one of the recent refactorings.
Also add proper tests that would have caught this omission.

This addresses the immediate issues discussed in:
https://llvm.discourse.group/t/memref-subview-affine-map-and-symbols/4851

Differential Revision: https://reviews.llvm.org/D115427

[NFC][Verifier] Remove checks for atomic loads/stores that alignment is non-zero

The alignment is never 0 since getAlign() returns 1 << bits.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D115388

[C++20] [Coroutines] Mark coroutine done if unhandled_exception throws

According to [dcl.fct.def.coroutine]/p14:
> If the evaluation of the expression promise.unhandled_exception()
> exits via an exception, the coroutine is considered suspended at the
> final suspend point.

But this is not implemented in clang before. This patch would implement
this feature by marking the coroutine as done at the place of
coro.end(frame, /*InUnwindPath=*/true ).

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D115219

[RISCV] Fix vm operand constraint to fit GCC's behavior

- `vm` constraint is used for masking operand, which always v0.

- Update testcase, only masking operand should use `vm`, vector mask operations
  should just use `vr` for any vector register.

- Revise the description of `vm` constraint.

- This patch also fix issue on RISCVRegisterInfo.td and RISCVISelLowering.cpp.

  RISCVRegisterInfo.td:
  - The first VT in the list must be the largest total size since the
    SelectionDAGBuilder uses the first register in the list as the canonical
    type for the register.

  RISCVISelLowering.cpp:
  - Fix RISCVTargetLowering::splitValueIntoRegisterParts and
    RISCVTargetLowering::joinRegisterPartsIntoValue for handling vectors
    with different total size, that will happened on fractional LMUL since
    fractional LMUL is always occupy one vector register.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D112599

[Coroutines] Remove unused coroutine builtin/intrinsics llvm.coro.param (NFC-ish)

I found that the coroutine intrinsic llvm.coro.param in documentation
(https://llvm.org/docs/Coroutines.html#id101) didn't get used actually
since there isn't lowering codes in LLVM. I also checked the
implementation of libstdc++ and libc++. Both of them didn't use
llvm.coro.param. So I am pretty sure that the llvm.coro.param intrinsic
is unused. I think it would be better t to remove it to avoid possible
misleading understandings.

Note: according to [class.copy.elision]/p1.3, this optimization is
allowed by the C++ language specification. Let's make it someday.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D115222

[mlir][Linalg] Bufferize the region of LinalgOps as well.

The region of `linalg.generic` might contain `tensor` operations. For
example, current lowering of `gather` uses a `tensor.extract` in the
body of the `LinalgOp`. Bufferize the ops within a `LinalgOp` region
as well to catch such cases.

Differential Revision: https://reviews.llvm.org/D115322

tsan: fork runtime

Fork the current version of tsan runtime before commiting
rewrite of the runtime (D112603). The old runtime can be
enabled with TSAN_USE_OLD_RUNTIME option.
This is a temporal measure for emergencies and is required
for Chromium rollout (for context see http://crbug.com/1275581).
The old runtime is supposed to be deleted soon.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D115223

[C++20 Modules] Don't create global module fragment for extern linkage declaration in GMF already

Previously we would create global module fragment for extern linkage
declaration which is alreday in global module fragment. However, it is
clearly redundant to do so. This patch would check if the extern linkage
declaration are already in GMF before we create a GMF for it.

[NFC] Rename MachineFunction::cloneMachineInstrBundle (coding style)

[NFC] Rename MachineFunction::deleteMachineInstr (coding style)

[llvm] Use range-based for loops (NFC)

[gn build] Port 059e03476cbb

[NFC][mlgo] Generalize model runner interface

This prepares it for the regalloc work. Part of it is making model
evaluation accross 'development' and 'release' scenarios more reusable.
This patch:
- extends support to tensors of any shape (not just scalars, like we had
in the inliner -Oz case). While the tensor shape can be anything, we
assume row-major layout and expose the tensor as a buffer.
- exposes the NoInferenceModelRunner, which we use in the 'development'
mode to keep the evaluation code path consistent and simplify logging,
as we'll want to reuse it in the regalloc case.

Differential Revision: https://reviews.llvm.org/D115306

[CSKY] Complete codegen of basic arithmetic and load/store operations

Complete basic arithmetic operations such as add/sub/mul/div, and it also includes converions
and some specific operations such as bswap.Add load/store patterns to generate different addressing mode instructions.

Also enable some infra such as copy physical register and eliminate frame index.

[libc++][NFC] Remove test/support/tracked_value.h

No tests are using `libcxx/test/support/tracked_value.h`. So, remove it.

Differential Revision: https://reviews.llvm.org/D115411

[NFC] Rename MachineFunction::DeleteMachineBasicBlock

Renamed to conform to coding style

[Support] [Debuginfod] Use libcurl imported library.

A [[ https://lists.llvm.org/pipermail/llvm-dev/2021-December/154203.html | report on llvm-dev ]] indicated that `curl.h` could be found during build configuration but not during compilation. This diff uses the libcurl imported library created by findcurl, so that includes and libs are automatically managed correctly by cmake.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D115189

[PowerPC] copy byval parameter to caller's stack when needed

Now we won't copy the byval parameter (bigger than 8 bytes) to
caller's parameter save area. Instead, we will only copy the
byval parameter when it can not be passed entirely in registers
which means we have to use parameter save area according to the
64 bit SVR4 ABI.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D111485

[gn build] (manually) port ccf1469a4cdb (lldbVersion)

Revert "[asan] Run background thread for asan only on THUMB"

This reverts commit 5c27740238007d22f2d0cd0ebe2aaffa90a9c92b.

Reverting due to Windows build issue:

sanitizer_stackdepot.cpp.obj : error LNK2005: "void __cdecl __sanitizer::StackDepotStopBackgroundThread(void)" (?StackDepotStopBackgroundThread@__sanitizer@@YAXXZ) already defined in sanitizer_common_libcdep.cpp.obj
LINK : fatal error LNK1181: cannot open input file 'projects\compiler-rt\lib\asan\CMakeFiles\RTAsan_dynamic.x86_64.dir\asan_rtl_x86_64.S.obj'

Support: Avoid using SmallVector::set_size() in zlib

Stop using `SmallVector::set_size()` in zlib. Replace pairs of
`reserve()` / `set_size()` with `resize_for_overwrite()` and
`truncate()`.

Differential Revision: https://reviews.llvm.org/D115391

Support: Avoid using SmallVector::set_size() in sys::path

Stop using `SmallVector::set_size()` in sys::path APIs. In both cases,
use `truncate()` instead.

Differential Revision: https://reviews.llvm.org/D115391

[gn build] (manually) port f75cce0be861

[libc] Add a .clang-tidy file for the toplevel libc directory.

Generation of the .yaml has been removed to prevent lint from
running with every ninja invocation. The new .clang-tidy file is copied
to the libc build directory so that generated files also get checked.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D115405

Revert "[ASan] Shared optimized callbacks implementation."

This reverts commit f71c553a30cc52c0b4a6abbaa82ce97c30c13979.

Reviewed By: kstoimenov

Differential Revision: https://reviews.llvm.org/D115407

ADT: Make StringRef::size() and StringRef::empty() constexpr

This unblocks using `StringLiteral::size()` for a SmallVector size in
another patch.

Differential Revision: https://reviews.llvm.org/D115395

[gn build] (manually) port f71c553a30cc

[ASan] Shared optimized callbacks implementation.

This change moves optimized callbacks from each .o file to compiler-rt. Instead of using code generation it uses direct assembly implementation. Please note that the 'or' version is not implemented and it will produce unresolved external if somehow 'or' version is requested.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D114558

AMDGPU: Mark scc defs dead in SGPR to VMEM path for no free SGPRs

This introduces verifier errors into this broken situation which we do
not handle correctly, which is better than being silently
miscompiled. For the emergency stack slot, the scavenger likes to move
the restore instruction as late as possible, which ends up separating
the SCC def from the conditional branch.

AMDGPU: Simplify test for SGPR spilling bug

Remove the control flow from the test from
25eb7fa01d7ebbe67648ea03841cda55b4239ab2. It's not necessary to
reproduce the original assert with the patch reverted. The control
flow happens to expose a different issue that calls for a separate
test in a future change.

AMDGPU: Mark SCC def as dead when expanding frame indexes

This improves liveness queries in a future change.

[RISCV] Improve tracking of EndLoc in the assembly parser.

The SMLoc::getFromPointer(S.getPointer() - 1) pattern used in
several place didn't make sense since that puts the End before the
Start location.

This patch corrects this to properly calculate the end location as
we parse. Unsure how much this matters, a lot of these are for custom
operand parsing. If the custom parsing succeeds, the instruction
matching probably won't fail, so the end loc won't be used to build
a range for a diagnostic.

I've also fixed the creation functions for the CSR and VType operands
to assign the EndLoc of the RISCVOperand class using the StartLoc
instead of an empty SMLoc. I don't think these will ever be accessed,
but it makes the code look less like we forgot to assign it. This is
consistent with some operands on the ARM target.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D115192

[lldb/Target] Slide source display for artificial locations at function start

It can happen that a line entry reports that some source code is located
at line 0. In DWARF, line 0 is a special location which indicates that
code has no 1-1 mapping with source.

When stopping in one of those artificial locations, lldb doesn't know which
line to display and shows the beginning of the file instead.

This patch mitigates this behaviour by checking if the current symbol context
of the line entry has a matching function, in which case, it slides the
source listing to the start of that function.

This patch also shows the user a warning explaining why lldb couldn't
show sources at that location.

rdar://83118425

Differential Revision: https://reviews.llvm.org/D115313

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>

OpenMP: Avoid using SmallVector::set_size()

Update `OpenMPIRBuilder::collapseLoops()` to call `resize()` instead of
`set_size()`. The latter asserts on capacity limits and cannot grow,
which seems likely to be unintentional here (if it is, I think a local
assertion would be good for clarity).

Also update `CodeGenFunction::EmitOMPCollapsedCanonicalLoopNest()` to
use `pop_back_n()` instead of `set_size()`.

Differential Revision: https://reviews.llvm.org/D115378

[lldb] Make lldbVersion a full fledged library

Because of its dependency on clang (and potentially other compilers
downstream, such as swift) lldb_private::GetVersion already lives in its
own library called lldbBase. Despite that, its implementation was spread
across unrelated files. This patch improves things by introducing a
Version library with its own directory, header and implementation file.

The benefits of this patch include:

- We can get rid of the ugly quoting macros.
- Other parts of LLDB can read the version number from
lldb/Version/Version.inc.
- The implementation can be swapped out for tools like lldb-server than
don't need to depend on clang at all.

Differential revision: https://reviews.llvm.org/D115211

[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version

Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy.
Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115279

[dfsan] Add a flag to ignore personality routines.

This diff adds "dfsan-ignore-personality-routine" flag, which makes
the dfsan pass to not to generate wrappers for the personality functions if the
function is marked uninstrumented.

This flag is to support dfsan with the cases where the exception handling
routines cannot be instrumented (e.g. use the prebuilt version of c++ standard
library). When the personality function cannot be instrumented it is supposed
to be marked "uninstrumented" from the abi list file. While DFSan generates a
wrapper function for uninstrumented functions, it cannot cannot generate a
valid wrapper for vararg functions, and indirect invocation of vararg function
wrapper terminates the execution of dfsan-instrumented programs. This makes
invocation of personality routine to crash the program, because 1) clang adds a
declaration of personality functions as a vararg function with no fixed
argument, and 2) personality routines are always called indirectly.

To address this issue, the flag introduced in this diff makes dfsan to not to
instrument the personality function. This is not the "correct" solution in the
sense that return value label from the personality function will be undefined.
However, in practice, if the exception handling routines are uninstrumented we
wouldn't expect precise label propagation around them, and it would be more
beneficial to make the rest of the program run without termination.

Reviewed By: browneee

Differential Revision: https://reviews.llvm.org/D115317

[mlir] Added ctlz and cttz to math dialect and LLVM dialect

Count leading/trailing zeros are an existing LLVM intrinsic. Added LLVM
support for the intrinsics with lowerings from the math dialect to LLVM
dialect.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D115206

[llvm-profgen] remove check Attributes to fix build failure

Revert "A new hidden option test-changed=exe that calls exe after each time IR changes"

This reverts commit f9235e45fd1f5ca21f95105427184a6afd0f9d95.

Causes breakages on Windows: http://45.33.8.238/win/50453/step_11.txt.

ADT: Add SmallVectorImpl::truncate() to replace uses of set_size()

Add `SmallVectorImpl::truncate()`, a variant of `resize()` that cannot
increase the size.

- Compared to `resize()`, this has no code path for growing the
  allocation and can be better optimized.
- Compared to `set_size()`, this formally calls destructors, and does
  not skip any constructors.
- Compared to `pop_back_n()`, this takes the new desired size, which in
  many contexts is more intuitive than the number of elements to remove.

The immediate motivation is to pair this with `resize_for_overwrite()`
to remove uses of `set_size()`, which can then be made private.

Differential Revision: https://reviews.llvm.org/D115383

Update sink instruction testcase

Add inferable writeonly attribute. Simplify testcase.