review.tizen.org Git - platform/upstream/llvm.git/log

[AArch64] Codegen for FEAT_LRCPC3

Implements support for the following 128-bit atomic operations with +rcpc3:
- 128-bit store-release -> STILP
- 128-bit load-acquire -> LDIAPP

D126250 and D137590 added support for emitting LDAPR (Load-Acquire RCPc) rather
than LDAR (Load-Acquire) when +rcpc is available. This patch allows emitting
the 128-bit RCPc instructions added in FEAT_LRCPC3 (LDIAPP/STILP). The
implementation is different from LDAPR, because there are no non-RCPc
equivalents for these new instructions.

Support for the offset variants will be added in D141431.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D141429

[SCCP] Add extra tests for Add flag inference.

Add extra tests from #60280 and #60278 as well as a test showing a missed
optimization opportunity.

[clang][Interp] Implement switch statements

Differential Revision: https://reviews.llvm.org/D137415

[MLIR] Convert remaining tests to opaque pointers (NFC)

These were the final tests using -opaque-pointers=0 in mlir/.

[AArch64] Codegen for FEAT_LSE128

Codegen support for 128-bit atomicrmw (and|or|xchg).
      - store atomic -> swpp
      - atomicrmw xchg -> swpp
      - atomicrmw and -> ldclrp
      - atomicrmw or -> ldsetp

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D141406

[clang][Interp] Reject invalid declarations and expressions

Reject them early, since we will run into problems and/or assertions
later on anyway.

Differential Revision: https://reviews.llvm.org/D137386

[Support] Avoid using main thread for llvm::parallelFor().

llvm::parallelFor() uses threads created by ThreadPoolExecutor as well as the main thread.
The index of the main thread matches the index of the first thread created by ThreadPoolExecutor.
As a result, getThreadIndex returns the same value for different threads.
To avoid thread index clashes, do not use the main thread for llvm::parallelFor():

parallel::TaskGroup TG;
for (; Begin + TaskSize < End; Begin += TaskSize) {
  TG.spawn([=, &Fn] {
    for (size_t I = Begin, E = Begin + TaskSize; I != E; ++I)
      Fn(I);
  });
}
for (; Begin != End; ++Begin)    <<<< executed by main thread.
  Fn(Begin);                     <<<<
return;                          <<<<

Differential Revision: https://reviews.llvm.org/D142317

[LTO] Remove OpaquePointers option from config (NFC)

Always use opaque pointers.

[AArch64][SME2] Add intrinsics to move multi-vectors to/from ZA.

Adds intrinsics for the following:
- mova: array to vector / vector to array
- mova: tile to vector / vector to tile

Tablegen patterns have been added to match the ZA write intrinsics. As the
read intrinsics return a multi-vector, a function called SelectMultiVectorMove
has been added to AArch64ISelDAGToDAG to select the correct instruction. The
SelectSMETile function has also been added to check that the tile number
passed to read intrinsics is valid for the base register.

This patch also cleans up the sme_vector_to_tile_patterns multiclass to remove
the pattern for an offset of 0, which is handled by tileslice.

NOTE: These intrinsics are still in development and are subject to future changes.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D142031

[clang][Interp] Fix ImplicitValueInitExprs for pointer types

This previously ran into an "unknown type" assertion when trying to emit
a 'Zero' op for a pointer type. Emit a NullPtr op instead.

Differential Revision: https://reviews.llvm.org/D137235

[LLD] Remove no-opaque-pointers plugin option

We always use opaque pointers. The opaque-pointers option is
retained as a no-op, same as no-lto-legacy-pass-manager.

[gold] Remove no-opaque-pointers option

The opaque-pointers option is retained as a no-op, same as with
new-pass-manager.

[LTO] Remove -lto-opaque-pointers flag

Always use the config default of OpaquePointers == true.

[clang][Interp] Support inc/dec operators for pointers

Differential Revision: https://reviews.llvm.org/D137232

[X86] Ensure the _mm_test_all_ones macro does not reuse argument (PR60006)

The macro _mm_test_all_ones(V) was defined as _mm_testc_si128((V), _mm_cmpeq_epi32((V), (V))) - which could cause side effects depending on the source of the V value.

The _mm_cmpeq_epi32((V), (V)) trick was just to materialize an all-ones value, which can be more safely generated with _mm_set1_epi32(-1) .
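
The argument-reuse problem can be sketched without SSE at all. In the
sketch below, modelTestC, modelCmpEq and the MODEL_* macros are hypothetical
stand-ins that only mirror the shape of the old and new macro; they are not
the real intrinsics.

```cpp
#include <cassert>

// Hypothetical scalar stand-ins for the intrinsics (illustration only).
inline int modelCmpEq(int A, int B) { return A == B ? -1 : 0; }
inline int modelTestC(int A, int B) { return (~A & B) == 0; }

// Old shape: V appears three times, so its expression is evaluated three times.
#define MODEL_TEST_ALL_ONES_OLD(V) modelTestC((V), modelCmpEq((V), (V)))
// New shape: V appears once; the all-ones value is a constant.
#define MODEL_TEST_ALL_ONES_NEW(V) modelTestC((V), -1)

// An argument expression with a visible side effect: it counts its own calls.
inline int nextValue(int &Calls) {
  ++Calls;
  return -1;
}

// Number of times the macro argument is evaluated by the old shape.
inline int countEvaluationsOld() {
  int Calls = 0;
  int Result = MODEL_TEST_ALL_ONES_OLD(nextValue(Calls));
  assert(Result == 1);
  (void)Result;
  return Calls;
}

// Number of times the macro argument is evaluated by the new shape.
inline int countEvaluationsNew() {
  int Calls = 0;
  int Result = MODEL_TEST_ALL_ONES_NEW(nextValue(Calls));
  assert(Result == 1);
  (void)Result;
  return Calls;
}
```

With the old shape, an argument such as a function call or an increment
runs once per appearance of V; with the new shape it runs once.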

Fixes #60006

Differential Revision: https://reviews.llvm.org/D142477

[AArch64][SME2] Add Multi-vector add/sub and accumulate into ZA intrinsic

Add the following intrinsic:
    ADD
    SUB
    FADD
    FSUB
NOTE: These intrinsics are still in development and are subject to future changes.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D142210

[NFC][WebAssembly] More fpclamptosat tests

[clang][Interp] Fix dereferencing arrays with no offset applied

A pointer to an array and a pointer to the first element of the array
are not the same in the interpreter, so handle this specially in
deref().

Differential Revision: https://reviews.llvm.org/D137082

[InstCombine] Add additional tests for dead phi cycles (NFC)

[flang][hlfir] Add hlfir.copy_in and hlfir.copy_out codegen to FIR.

Use runtime Assign to deal with the copy (and the temporary creation, so
that this code can deal with polymorphic temps without any change).

Using Assign for the copy is desired here since the copy happens when
the data is not contiguous, and it happens inside an if/then which
makes it hard to optimize.
See https://github.com/llvm/llvm-project/commit/2b60ed405b8110b20ab2e383839759ea34003127
for more details (note that, contrary to this last commit, the code at
hand is only dealing with copy-in/copy-out, it is not intended to deal
with preparing VALUE arguments).

Differential Revision: https://reviews.llvm.org/D142475

[AArch64][SME2] Add the IR intrinsics for SME2 fclamp, sclamp and uclamp instructions

Adds intrinsics for the following SME2 instructions:

* fclamp (2 and 4 vectors)
* sclamp (2 and 4 vectors)
* uclamp (2 and 4 vectors)

I've added these new instructions to the existing sve2p1-* tests
because although they are included as part of SME2 they are still
SVE-like, in that they only operate on SVE vectors.

NOTE: These intrinsics are still in development and are subject to future changes.

Differential Revision: https://reviews.llvm.org/D142355

[clang][Interp] Re-apply "Implement missing compound assign operators"

Implement mul, div, rem, etc. compound assign operators.

Differential Revision: https://reviews.llvm.org/D137071

[clang][Interp] Fix compound assign operator types

Just like we do (or will do) for floating types, we need to take into
account that the LHSComputationType, ResultType and type of the
expression (what we ultimately store) might be different.

Do this by emitting cast ops before and after doing the computation.

This fixes the test failures introduced by
490e8214fca48824beda8b508d6d6bbbf3d8d9a7 on big endian machines.

Differential Revision: https://reviews.llvm.org/D142328

UpdateTestChecks: cleanup NamelessValues constructor

Remove global_ir_{prefix,prefix_regexp} (one of which is misnamed),
since they are really quite redundant with ir_{prefix,regexp} and
default the is_before_functions argument, which basically just adds
noise to the table of NamelessValues.

Differential Revision: https://reviews.llvm.org/D142451

update_test_checks.py: pick up --tool from UTC_ARGS

It's not clear to me how to write a test for this. The tests run in an
environment where the tools may not be in PATH, and so the existing
custom-tool.test needs to use --tool-binary when invoking
update_test_checks.py. But it can't do so without also using --tool.

This change does fix a problem though with using
update_any_test_checks.py in an environment where the tools *are*
available in PATH.

Differential Revision: https://reviews.llvm.org/D142450

[NFC][LLDB] Rename test file

Commit 92f0e4cca introduced a test file named TestChangeValue.py,
which leads to a test failure because there is already a test file with the same name.
This commit renames the newly added file to fix the failure.

[Docs] Typed pointers are no longer supported

As promised, typed pointers are no longer supported on the main
branch, as a matter of policy.

[AMDGPU][MC][NFC] MUBUF code cleanup

- Simplify concatenation
- Common up expressions

[flang][hlfir] Add hlfir.copy_in and hlfir.copy_out operations

These operations implement the optional copy of a non contiguous
variable into a temporary before a call, and the copy back from the
temporary after the call.

Differential Revision: https://reviews.llvm.org/D142342

[mlir][linalg] Convert tensor.from_elements to destination style

This can be a pre-processing for bufferization and allows for more efficient lowerings without an alloc.

Differential Revision: https://reviews.llvm.org/D142206

[flang] Fix bounds array creation for pointer remapping calls

`PointerAssociateRemapping` expects a descriptor holding
a newRank x 2 array of int64. The previous lowering was wrong.
Adapt the lowering to fit the expectation of the runtime.
Use the `bounds` to get the rank.

Reviewed By: PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D142487

[libc][Obvious] Disable log10_test as it is failing on the x86_64 builders.

[lldb] Consider all breakpoints in breakpoint detection

Currently, in some cases lldb reports the stop reason as "step out" or "step over" (from thread plan completion) instead of "breakpoint" if a user breakpoint happens to be set on the same address.
Part of https://github.com/llvm/llvm-project/commit/f08f5c99262ff9eaa08956334accbb2614b0f7a2 seems to overwrite the internal breakpoint detection logic, so that only the last breakpoint for the current stop address is considered.
Together with step-out plans not clearing their breakpoint until they are destroyed, this creates a situation where a user breakpoint is set for an address, but an internal breakpoint makes lldb report a plan completion stop reason instead of a breakpoint.
This patch reverts that internal breakpoint detection logic to consider all breakpoints.

Reviewed By: jingham

Differential Revision: https://reviews.llvm.org/D140368

[llvm] Use llvm::bit_ceil instead of PowerOf2Ceil (NFC)

The arguments to PowerOf2Ceil in this patch are all known to be
nonzero, so we can safely use llvm::bit_ceil here.

[libc][NFC] Detect host CPU features using try_compile instead of try_run.

This implements the same behavior as D141997 but makes sure that the same detection mechanism is used between CMake and source code.

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D142108

[LLDB] Fixes summary formatter for libc++ map allowing modification of contained value

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D140624

[IndVars] Expand icmp in preheader rather than in loop

The motivation is that 'createInvariantCond' unconditionally
builds icmp in the loop block, while it could always do it
in preheader. Build it in preheader instead.

Patch by Aleksandr Popov!

Differential Revision: https://reviews.llvm.org/D141994
Reviewed By: nikic

[LLDB] Fix for libc++ atomic allowing modification of contained value

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D140623

libcxx: Don't apply ABI tags to extern "C" fns

GCC rejects ABI tags on non mangled functions, as they would otherwise
be a no-op.  This commit replaces such instances with equivalent
_LIBCPP_HIDE_FROM_ABI constants but without ABI tags attached.

  .../include/c++/v1/__support/musl/xlocale.h:28:68: error: 'abi_tag'
  attribute applied to extern "C" declaration 'long long int
  strtoll_l(const char*, char**, int, locale_t)'
     28 | strtoll_l(const char *__nptr, char **__endptr, int __base, locale_t) {
        |                                                                    ^

Bug: https://bugs.gentoo.org/869038

Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D142415

[X86] Use llvm::countr_zero instead of findFirstSet (NFC)

At the call site of findFirstSet, ZMask | (1 << DstIdx) always has
exactly 3 bits set, and they are all among the 4 least significant
bits, so (ZMask | (1 << DstIdx)) ^ 15 has exactly one bit set. Since
the argument to findFirstSet is nonzero, we can safely switch to
llvm::countr_zero.
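
A minimal sketch of the argument above, using a hand-written
countTrailingZeros as a portable stand-in for llvm::countr_zero. The
helper names (popCount4, missingLane) are assumptions made for this
illustration and do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for llvm::countr_zero; this sketch requires nonzero input.
inline unsigned countTrailingZeros(uint32_t X) {
  assert(X != 0 && "sketch only handles nonzero input");
  unsigned N = 0;
  while ((X & 1u) == 0) {
    X >>= 1;
    ++N;
  }
  return N;
}

// Number of set bits among the four least significant bits.
inline unsigned popCount4(uint32_t X) {
  unsigned N = 0;
  for (unsigned I = 0; I < 4; ++I)
    N += (X >> I) & 1u;
  return N;
}

// Given a 4-bit mask with exactly three bits set, return the index of the
// single clear bit. XOR with 15 leaves exactly that one bit set.
inline unsigned missingLane(uint32_t ThreeOfFour) {
  assert((ThreeOfFour & ~15u) == 0 && popCount4(ThreeOfFour) == 3);
  return countTrailingZeros(ThreeOfFour ^ 15u);
}
```

Because the XOR result always has exactly one bit set, the zero-input
case never arises at this call site.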

[MLIR] Expose LocationAttrs in the C API

This patch adds three functions to the C API:
- mlirAttributeIsALocation: returns true if the attribute is a LocationAttr,
false otherwise.
- mlirLocationGetAttribute: returns the underlying LocationAttr of a Location.
- mlirLocationFromAttribute: gets a Location from a LocationAttr.

Reviewed By: mikeurbach, Mogball

Differential Revision: https://reviews.llvm.org/D142182

[libc][NFC] Another round of replacement of "inline" with "LIBC_INLINE".

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D142509

Bump the trunk major version to 17

[CodeGen] Use llvm::bit_ceil (NFC)

If we know that x is nonzero and not a power of 2, then
llvm::findLastSet(x) + 1 is the index of the bit just above the
highest set bit in x. That is, 1 << (llvm::findLastSet(x) + 1) is the
same as llvm::bit_ceil(x).

Since llvm::bit_ceil is a nop on a power of 2, we can unconditionally
call llvm::bit_ceil. The end result actually matches the comment.
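
The equivalence can be checked with a small sketch. highestSetBit and
bitCeil below are hand-written stand-ins for llvm::findLastSet and
llvm::bit_ceil, restricted to 32-bit values; they are assumptions for
illustration, not the LLVM implementations.

```cpp
#include <cassert>
#include <cstdint>

// Index of the highest set bit, counting from the LSB (nonzero input only).
inline unsigned highestSetBit(uint32_t X) {
  assert(X != 0);
  unsigned Idx = 0;
  while (X >>= 1)
    ++Idx;
  return Idx;
}

// Smallest power of two that is >= X, for 1 <= X <= 2^31.
inline uint32_t bitCeil(uint32_t X) {
  assert(X != 0 && X <= (1u << 31));
  uint32_t P = 1;
  while (P < X)
    P <<= 1;
  return P;
}

// True when the old expression, 1 << (findLastSet(x) + 1), matches bit_ceil(x).
inline bool oldAndNewAgree(uint32_t X) {
  assert(X != 0 && X < (1u << 31));
  return (1u << (highestSetBit(X) + 1)) == bitCeil(X);
}
```

The two agree for values that are not powers of 2; for a power of 2 the
old expression overshoots by a factor of two, while bitCeil returns the
value unchanged.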

[SystemZ] Use llvm::bit_floor (NFC)

If x is known to be nonzero, findLastSet(x) returns the index of the
highest set bit counting from the LSB, so 1 << findLastSet(x) is the
same as llvm::bit_floor(x).

[M68k][MC] Make immediate operands relocatable

Sometimes memory addresses are treated as immediate values. Thus
immediate operands have to be relocatable.

Differential Revision: https://reviews.llvm.org/D137902

[M68k][Disassembler] Use custom decoder for 32-bit immediates

32-bit immediates require special care because they go across the
normal word (16 bits) boundaries.
This patch also fixes some incorrect disassembler test cases.

Differential Revision: https://reviews.llvm.org/D142080

[TableGen] Support custom decoders for variable length instructions

Just like the encoder directive for variable-length instructions, this
patch adds a new decoder directive to allow custom decoder function on
an operand.

Right now, due to the design of DecoderEmitter each operand can only
have a single custom decoder in a given instruction.

Differential Revision: https://reviews.llvm.org/D142079

[zero-call-used-regs] Mark only non-debug instruction's register as used

The zero-call-used-regs pass generates an xor instruction to help mitigate
return-oriented programming exploits by zeroing out used registers. But
in the test case below, with the -g option there is a dbg.value instruction
associating the register with the debug-info description of the formal
parameter d, which makes the register appear used. The pass therefore zeroes the
register edi in the -g case, making the binary differ from the one built without -g.

The pass should be looking only at the non-debug uses.

$ cat test.c
char a[];
int b;
__attribute__((zero_call_used_regs("used"))) char c(int d) {
  *a = ({
    int e = d;
    b;
  });
}

This fixes https://github.com/llvm/llvm-project/issues/57962.

Differential Revision: https://reviews.llvm.org/D138757

Revert "[SCCP] Use range info to prove AddInst has NUW flag."

This reverts commit de122cb920080fd9e24b2777114271fbef932d5e.

This change causes assertion failures in many of our internal tests.
I have filed #60280 for this issue.

Revert "[clang-tidy] Introduce HeaderFileExtensions and ImplementationFileExtensions options"

This reverts commit 4240c9146248ac0a91c45dee421c6ef07709ba74.

The current solution won't work since getLocalOrGlobal does not
support returning a vector. More work needs to be put into
ensuring both the local and global way of setting the options
are available during the transition period.

Fix running MLIR tests when enabling examples but the native backends isn't configured (NFC)

[Transform] Rewrite LowerSwitch using APInt

This rewrite fixes https://github.com/llvm/llvm-project/issues/59316.

Previously LowerSwitch used int64_t, which crashes on case branches using integers with more than 64 bits.
Using APInt fixes this problem. This patch also includes a test.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D140747

[AssumptionCache] caches @llvm.experimental.guard's

As discussed in https://github.com/llvm/llvm-project/issues/59901

This change is not NFC. There is one SCEV and EarlyCSE test that have an
improved analysis/optimization case. Rest of the tests are not failing.

I've mostly only added cleanup to SCEV since that is where this issue
started. As a follow up, I believe there is more cleanup opportunity in
SCEV and other affected passes.

There could be cases where there are missed registerAssumption of
guards, but this case is not so bad because there will be no
miscompilation. AssumptionCacheTracker should take care of deleted
guards.

Differential Revision: https://reviews.llvm.org/D142330

[Clang][OpenMP] Find the type `omp_allocator_handle_t` from identifier table

In Clang, in order to determine the type of `omp_allocator_handle_t`, Clang
checks the type of those predefined allocators. The first one it checks is
`omp_null_allocator`. If the language is C, and the system is 64-bit, what Clang
gets is an `int`, instead of an enum of size 8, given how we define
`omp_allocator_handle_t` in `omp.h`. If the allocator is captured by a region,
let's say a parallel region, the allocator will be privatized. Because Clang deems
`omp_allocator_handle_t` as an `int`, it will first cast the value returned by
the runtime library (for `libomp` it is a `void *`) to `int`, and then in the
outlined function, it casts back to `omp_allocator_handle_t`. These two casts
completely shave off the upper 32 bits of the pointer value returned from `libomp`,
and when the private "new" pointer is fed to another runtime function
`__kmpc_allocate()`, it causes a segmentation fault. That is the root cause of PR54082.
I have no idea why `-fno-pic` could hide this bug.
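
The effect of the two casts can be sketched as follows. The function name
and the sample handle value are made up for illustration, and uint32_t is
used in place of a 32-bit `int` as a simplification, so that the narrowing
is well defined and no sign extension occurs on the way back.

```cpp
#include <cassert>
#include <cstdint>

// Models casting a 64-bit handle to a 32-bit integer and back again.
inline uint64_t roundTripThrough32Bits(uint64_t Handle) {
  uint32_t Narrow = static_cast<uint32_t>(Handle); // keeps only the low 32 bits
  return static_cast<uint64_t>(Narrow);            // upper 32 bits are now zero
}

// True if the handle survives the round trip unchanged.
inline bool survivesRoundTrip(uint64_t Handle) {
  assert(sizeof(Handle) == 8);
  return roundTripThrough32Bits(Handle) == Handle;
}
```

A small enum-like value survives the round trip, while a typical 64-bit
heap address does not.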

In this patch, we detect `omp_allocator_handle_t` using roughly the same method
as `omp_event_handle_t`, by looking it up in the identifier table.

Fix #54082.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D142297

[lldb] Remove legacy six module for py2->py3

LLDB only supports Python3 now, so the `six` shim for Python2 is no longer necessary.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D142140

[lldb] Don't create Clang AST nodes in GetDIEClassTemplateParams

Otherwise we may be inserting a decl into a DeclContext that's not fully defined yet.

This simplifies/removes some clang AST node creation code. Instead, use
clang::printTemplateArgumentList().

Reviewed By: Michael137

Differential Revision: https://reviews.llvm.org/D142413

[Clang] Fix test that sometimes fails depending on the temp name

Summary:
This test has a negative check for an extra file. It turns out that
sometimes the temp name can match it. Be more specific with it.

[OpenMP] Create a temp file in /tmp if /dev/shm is not accessible

When `libomp` is initialized, it creates a temp file in `/dev/shm` to store the
registration flag. Some systems, like Android, don't have `/dev/shm`, so this
feature is disabled there by the macro `KMP_USE_SHM`, though most Linux distributions
have it. However, some customized distributions, such as the one reported in
https://github.com/llvm/llvm-project/issues/53955, don't support it either,
which causes a core dump. In this patch, if that is the case, we try to create a
temporary file in `/tmp`, and if that also fails, we error out.
Note that this patch does not consider whether the temporary directory has been
set via `TMPDIR`. If `/tmp` is not accessible, we error out.

Fix #53955.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142175

[clang-format] Put peekNextToken(/*SkipComment=*/true) to good use

To prevent potential bugs in situations where we want to peek the next
non-comment token.

Differential Revision: https://reviews.llvm.org/D142412

[libc++] Introduce a compile-time mechanism to override __libcpp_verbose_abort

This changes the mechanism for verbose termination (again!) to make it
support compile-time customization in addition to link-time customization,
which is important for users who need fine-grained control over what code
gets generated around sites that call the verbose termination handler.

This concern had been raised to me both privately by prospecting users
and in https://llvm.org/D140944, so I think it is clearly worth fixing.

We still support _LIBCPP_AVAILABILITY_CUSTOM_VERBOSE_ABORT_PROVIDED for
a limited time since the same functionality can be achieved by overriding
the _LIBCPP_VERBOSE_ABORT macro.

Differential Revision: https://reviews.llvm.org/D141326

test-release.sh: Only build clang for stage1 and stage2

The stage1 and stage2 builds aren't packaged, so we only need to build
enough of the toolchain to build the next phase.

Reviewed By: thieta, amyk

Differential Revision: https://reviews.llvm.org/D141552

[compiler-rt] Remove XFAIL decorator trampoline_setup_test.c

This patch removes the XFAIL decorator from
builtins/Unit/trampoline_setup_test.c as it is passing on Windows/AArch64
now. It is being skipped in code with __clang__ not defined.

https://lab.llvm.org/buildbot/#/builders/120/builds/3873

[RISCV] Combine extract_vector_elt followed by VFMV_S_F_VL.

If we're extracting an element and inserting into an undef vector
with the same number of elements, we can use the original vector.

This pattern occurs around reductions that have been cascaded
together.

This can be generalized to wider/narrow vectors by using
insert_subvector/extract_subvector, but we don't have lit tests
for that case currently.

We can also support non-undef before by using a slide or vmv.v.v

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D142264

[NFC][libc++] Remove __unexpected namespace

Remove __unexpected namespace.

Reviewed By: philnik, #libc, ldionne

Differential Revision: https://reviews.llvm.org/D141947

[lld-macho] Have all load commands aligned to the word size

This is what ld64 does, and also what we already do for most of the
other load commands. I'm not aware of a good way to test this, but I
don't think it really matters.

Differential Revision: https://reviews.llvm.org/D141462

[ADT] Use fold expressions to compare tuples. NFCI

[HWASAN] Copy some ASAN independent unit tests from ASAN to LSAN

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D142504

[CodeGen] bugfix: add REQUIRES target triple in test

[ADT] Fix circular include dependency by using std::array. NFC

2db6b34ea introduces circular dependency on llvm::ArrayRef. By
inspecting commit history, it appears that we have some issue using
deduction guide on std::array. Why don't we try std::array with explicit
template arguments?

Differential revision: https://reviews.llvm.org/D141352

[clang][test] Remove check that fails if SOURCE_DATE_EPOCH is set globally

The check for "no SOURCE_DATE_EPOCH" wasn't especially interesting, and
I am not aware of a _portable_ way to unset an environment variable in
a lit test. So remove it since it can fail if the build environment has
SOURCE_DATE_EPOCH set globally.

Differential Revision: https://reviews.llvm.org/D142511

[BOLT][DWARF] Reuse entries in .debug_addr when not modified

In some binaries produced with ThinLTO there are CUs that share an entry in
.debug_addr. Previously we would generate a new entry for each, which led to a binary
size increase. This changes the behavior so that we reuse entries in
.debug_addr.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D142425

[mlir][tosa] Add RFFT2d operation

Adds the RFFT2d TOSA operation and supporting
shape inference function.

Signed-off-by: Luke Hutton <luke.hutton@arm.com>
Change-Id: I7e49c47cdd846cdc1b187545ef76d5cda2d5d9ad

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D142336

[ASan] Introduce a flag -asan-constructor-kind to control the generation of the Asan module constructor.

By default, ASan generates an asan.module_ctor function that initializes asan and
registers the globals in the module. This function is added to the
@llvm.global_ctors array. Previously, there was no way to control the
generation of this function.

This patch adds a way to control the generation of this function. The
flag -asan-constructor-kind has two options:

global: This is the default option and the default behavior of ASan. It generates an
asan.module_ctor function.
none: This skips the generation of the asan.module_ctor function.

rdar://104448572

Differential revision: https://reviews.llvm.org/D142505

[CodeGen] bugfix: ApplyDebugLocation goes out of scope before intended

rdar://103570533

Differential Revision: https://reviews.llvm.org/D142243

[OpenMP][libomptarget] Implement memory lock/unlock API in NextGen plugins

This patch implements the memory lock/unlock API, introduced in patch https://reviews.llvm.org/D139208,
in the NextGen plugins. Locked buffers feature reference counting and we allow certain overlapping. Given
an already locked buffer A, other buffers that are fully contained inside A can be locked again, even if
they are smaller than A. In this case, the reference count of locked buffer A will be incremented. However,
extending an existing locked buffer is not allowed. The original buffer is actually unlocked once all its
users have released the locked buffer and sub-buffers (i.e., the reference counter becomes zero).
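
The locking rules described above can be sketched with a small
reference-counted registry. The class, its method names, and the use of
plain integer addresses are all hypothetical; this is not the NextGen
plugin code, only one way to model the stated behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical registry of locked buffers, keyed by start address.
class LockedBufferRegistry {
  struct Entry {
    size_t Size;
    unsigned RefCount;
  };
  std::map<uintptr_t, Entry> Entries;

  // Find the entry that fully contains [Begin, Begin + Size), or end().
  std::map<uintptr_t, Entry>::iterator findContaining(uintptr_t Begin,
                                                      size_t Size) {
    auto It = Entries.upper_bound(Begin);
    if (It == Entries.begin())
      return Entries.end();
    --It;
    if (Begin + Size <= It->first + It->second.Size)
      return It;
    return Entries.end();
  }

  // True if [Begin, Begin + Size) overlaps any existing entry.
  bool overlapsAny(uintptr_t Begin, size_t Size) const {
    for (const auto &KV : Entries)
      if (Begin < KV.first + KV.second.Size && KV.first < Begin + Size)
        return true;
    return false;
  }

public:
  // Lock a range. Fully contained ranges bump the reference count of the
  // enclosing buffer; ranges that would extend a buffer are rejected.
  bool lock(uintptr_t Begin, size_t Size) {
    assert(Size > 0);
    auto It = findContaining(Begin, Size);
    if (It != Entries.end()) {
      ++It->second.RefCount;
      return true;
    }
    if (overlapsAny(Begin, Size))
      return false;
    Entries[Begin] = Entry{Size, 1};
    return true;
  }

  // Release one reference; the buffer is dropped when the count hits zero.
  bool unlock(uintptr_t Addr) {
    auto It = findContaining(Addr, 1);
    if (It == Entries.end())
      return false;
    if (--It->second.RefCount == 0)
      Entries.erase(It);
    return true;
  }

  // Current reference count of the buffer containing Addr (0 if none).
  unsigned refCount(uintptr_t Addr) {
    auto It = findContaining(Addr, 1);
    return It == Entries.end() ? 0 : It->second.RefCount;
  }
};
```

In this model, locking a sub-range of an already locked buffer succeeds
and increments the count, while a range that starts inside the buffer and
runs past its end is refused.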

Differential Revision: https://reviews.llvm.org/D141227

[InlineCost] model calls to llvm.objectsize.*

Very similar to https://reviews.llvm.org/D111272. We very often can
evaluate calls to llvm.objectsize.* regardless of inlining. Don't count
calls to llvm.objectsize.* against the InlineCost when we can evaluate
the call to a constant.

Link: https://github.com/ClangBuiltLinux/linux/issues/1302
Reviewed By: manojgupta

Differential Revision: https://reviews.llvm.org/D111456

[Clang] Add missing requires directives for new test

Summary:
Forgot to add this.

[OpenMP] Do not link the bitcode OpenMP runtime when targeting AMDGPU.

The AMDGPU target can only emit LLVM-IR, so we can always rely on LTO to
link the static version of the runtime optimally. Using only the static
library has a few advantages. Namely, it avoids several known bugs
and allows us to optimize out more functions. This is legal since the
changes in D142486 and D142484.

Depends on D142486 D142484

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142491

[OpenMP] Unconditionally link the OpenMP device RTL static library

Currently we have two versions of the static library. One is built as
individual bitcode files and linked via `-mlink-builtin-bitcode`. The
other is built as a single static archive `omptarget.devicertl.a` and is
linked via `-lomptarget.devicertl` and handled by the linker wrapper
during LTO. We use the former in the case that we are not performing
LTO, because linking the library late wouldn't allow us to optimize the
runtime library effectively. The support in D142484 allows us to
unconditionally link this library, so it will only be pulled in if
needed. That is, if we linked already via `-mlink-builtin-bitcode` then
we will not pull in the static library even if it's linked on the
command line.

Depends on D142484

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142486

[LinkerWrapper] Only import static libraries with needed symbols

Currently, we pull in every single static archive member as long as we
have an offloading architecture that requires it. This goes against the
standard semantics of static libraries that only pull in symbols that
define currently undefined symbols. In order to support this we roll
some custom symbol resolution logic to check if a static library is
needed. Because of offloading semantics, this requires an extra check
for externally visible symbols. E.g. if a static member defines a
kernel we should import it.
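
A rough sketch of this kind of resolution, under simplifying assumptions:
each archive member is reduced to the symbols it defines, the symbols it
needs, and a flag for whether it defines a kernel. The ArchiveMember
struct and selectMembers function are hypothetical and do not reflect the
actual linker wrapper code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of one static archive member.
struct ArchiveMember {
  std::vector<std::string> Defines; // symbols this member defines
  std::vector<std::string> Needs;   // symbols this member leaves undefined
  bool DefinesKernel = false;       // externally visible kernel present
};

// Returns the indices of the members to import, in extraction order.
// A member is imported if it defines a currently undefined symbol or a kernel.
inline std::vector<size_t>
selectMembers(std::set<std::string> Undefined,
              const std::vector<ArchiveMember> &Members) {
  std::set<std::string> Defined;
  std::vector<bool> Taken(Members.size(), false);
  std::vector<size_t> Order;
  bool Changed = true;
  while (Changed) { // iterate until no new member is pulled in
    Changed = false;
    for (size_t I = 0; I < Members.size(); ++I) {
      if (Taken[I])
        continue;
      bool Needed = Members[I].DefinesKernel;
      for (const std::string &S : Members[I].Defines)
        if (Undefined.count(S))
          Needed = true;
      if (!Needed)
        continue;
      Taken[I] = true;
      Order.push_back(I);
      Changed = true;
      for (const std::string &S : Members[I].Defines) {
        Undefined.erase(S);
        Defined.insert(S);
      }
      for (const std::string &S : Members[I].Needs)
        if (!Defined.count(S))
          Undefined.insert(S);
    }
  }
  assert(Order.size() <= Members.size());
  return Order;
}
```

Members that neither resolve an undefined symbol nor define a kernel are
left out, which is what makes it safe to pass the archive unconditionally.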

The main benefit to this is that we can now link against the
`libomptarget.devicertl.a` library unconditionally. This removes the
requirement for users to specify LTO on the link command. This will also
allow us to stop using the `amdgcn` bitcode versions of the libraries.

```
clang foo.c -fopenmp --offload-arch=gfx1030 -foffload-lto -c
clang foo.o -fopenmp --offload-arch=gfx1030 -foffload-lto
```

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D142484

[OpenMP][docs] Update for record-and-replay

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142492

[BOLT] Use range-based implicit def/use accessors. NFCI

[X86] Add support for "light" AVX

AVX/AVX512 instructions may cause frequency drop on e.g. Skylake.
The magnitude of frequency/performance drop depends on instruction
(multiplication vs load/store) and vector width. Currently, users
who want to avoid this drop can specify -mprefer-vector-width=128.
However, this also prevents generation of 256-bit wide instructions
that have no associated frequency drop (mainly load/stores).

Add a tuning flag that allows generation of 256-bit AVX load/stores,
even when -mprefer-vector-width=128 is set, to speed up memcpy&co.
Verified that running memcpy loop on all cores has no frequency impact
and zero CORE_POWER:LVL[12]_TURBO_LICENSE perf counters.

Makes copying memory faster, e.g.:
BM_memcpy_aligned/256 80.7GB/s ± 3% 96.3GB/s ± 9% +19.33% (p=0.000 n=9+9)

Differential Revision: https://reviews.llvm.org/D134982

[OpenMP] Disable tests that are not supported by GCC if it is used for testing

GCC doesn't support `-fopenmp-version`, causing test failure if the compiler used
for testing is GCC.

GCC's OpenMP 5.2 support is still very limited. Disable the tests requiring 5.2
features for GCC as well.

We might want to take a look at all `libomp` tests and mark those tests that
don't support GCC yet.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D142173

[llvm][DiagnosticInfo] handle function pointer casts

As pointed out by @arsenm in https://reviews.llvm.org/D141451#4045099,
we don't handle ConstantExpressions for dontcall-{warn|error} IR Fn
Attrs.

Use CallBase::getCalledOperand() and Value::stripPointerCasts() should
the call to CallBase::getCalledFunction return nullptr.

I don't know how to express the IR test case in C, otherwise I'd add a
clang test, too.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D142058

IR: Add atomicrmw uinc_wrap and udec_wrap

These are essentially add/sub 1 with a clamping value.

AMDGPU has instructions for these. CUDA/HIP expose these as
atomicInc/atomicDec. Currently we use target intrinsics for these,
but those do not carry the ordering and syncscope. Add these to
atomicrmw so we can carry these and benefit from the regular
legalization processes.
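
As a non-atomic sketch of the value computation only: the functions below
model what each operation stores, following the CUDA atomicInc/atomicDec
convention as I understand it. The function names are made up for this
illustration, and real atomicrmw operations additionally carry ordering
and syncscope, which plain functions cannot express.

```cpp
#include <cassert>
#include <cstdint>

// New value stored by uinc_wrap: increment, wrapping to 0 once Old >= Val.
inline uint32_t uincWrap(uint32_t Old, uint32_t Val) {
  return Old >= Val ? 0 : Old + 1;
}

// New value stored by udec_wrap: decrement, wrapping to Val when Old is 0
// or already above Val.
inline uint32_t udecWrap(uint32_t Old, uint32_t Val) {
  return (Old == 0 || Old > Val) ? Val : Old - 1;
}

// Apply uinc_wrap repeatedly, as a counter cycling through 0..Val would.
inline uint32_t uincWrapTimes(uint32_t Start, uint32_t Val, unsigned Times) {
  assert(Start <= Val);
  uint32_t Cur = Start;
  for (unsigned I = 0; I < Times; ++I)
    Cur = uincWrap(Cur, Val);
  return Cur;
}
```

With Val = 3, repeated increments cycle through 0, 1, 2, 3 and back to 0.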

[InstCombine] invert canonicalization of sext (x > -1) --> not (ashr x)

https://alive2.llvm.org/ce/z/2iC4oB

This is similar to changes made for zext + lshr:
21d3871b7c90
6c39a3aae1dc

The existing fold did not account for extra uses, so we
see some instruction count reductions in the test diffs.

This is intended to improve analysis (icmp likely has more
transforms than any other opcode), make other transforms
more symmetric with zext/lshr, and it can be inverted
in codegen if profitable.

As with the earlier changes, there is potential to uncover
infinite combine loops, but I have not found any yet.

[flang] Fixed missing dependency.

It looks like a flaky issue that sometimes breaks the buildbot:
https://lab.llvm.org/buildbot/#/builders/181/builds/13475

Reviewed By: clementval

Differential Revision: https://reviews.llvm.org/D142081

[MC] Store target Insts table in reverse order. NFC.

This will allow an entry in the table to access data that is stored
immediately after the end of the table, by adding its opcode value
to its address.
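
The address arithmetic can be sketched as follows. This is a toy model
only: ToyDesc, ToyInsts, getDesc and tableEnd are invented names, the real
descriptor has many more fields, and the exact offset used by the real
code (here opcode plus one) is an assumption.

```cpp
#include <cstdint>

// Toy descriptor for this sketch; not the real MCInstrDesc.
struct ToyDesc {
  uint16_t Opcode;
};

constexpr unsigned NumOpcodes = 4;

// Descriptors stored in reverse opcode order: opcode 3 first, opcode 0 last.
static const ToyDesc ToyInsts[NumOpcodes] = {{3}, {2}, {1}, {0}};

// The descriptor for Opcode lives at index NumOpcodes - 1 - Opcode.
inline const ToyDesc *getDesc(unsigned Opcode) {
  return &ToyInsts[NumOpcodes - 1 - Opcode];
}

// Starting from any descriptor, adding its own opcode (plus one) to its
// address yields the end of the table, without needing NumOpcodes.
inline const ToyDesc *tableEnd(const ToyDesc *D) {
  return D + D->Opcode + 1;
}
```

Every entry computes the same end-of-table address, which is where data
placed right after the table would start.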

Differential Revision: https://reviews.llvm.org/D142217

[AArch64] Add the Ampere1A core

The Ampere1A core improves on the Ampere1 with key differences being:
* memory tagging is supported
* SM3/SM4 are supported
* adds a new fusion pair for (A+B+1 and A-B-1)
(added in a later commit)

Depends on D142395

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D142396

[MC] Store number of implicit operands in MCInstrDesc. NFC.

Combine the implicit uses and defs lists into a single list of uses
followed by defs. Instead of 0-terminating the list, store the number
of uses and defs. This avoids having to scan the whole list to find the
length and removes one pointer from MCInstrDesc (although it does not
get any smaller due to alignment issues).

Remove the old accessor methods getImplicitUses, getNumImplicitUses,
getImplicitDefs and getNumImplicitDefs as all clients are using the new
implicit_uses and implicit_defs.
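
A minimal sketch of the combined layout, assuming a single pointer plus
two counts. ToyInstrDesc, RegList and the field names are hypothetical and
chosen for this illustration; only the accessor names implicit_uses and
implicit_defs come from the message above.

```cpp
#include <cstdint>

// Minimal stand-in for an array view; the real code would use an LLVM type.
struct RegList {
  const uint16_t *Begin;
  const uint16_t *End;
  const uint16_t *begin() const { return Begin; }
  const uint16_t *end() const { return End; }
  unsigned size() const { return static_cast<unsigned>(End - Begin); }
};

// Toy descriptor: one pointer and two counts instead of two
// zero-terminated lists.
struct ToyInstrDesc {
  const uint16_t *ImplicitOps; // all uses first, then all defs
  uint8_t NumImplicitUses;
  uint8_t NumImplicitDefs;

  // Uses occupy the first NumImplicitUses entries.
  RegList implicit_uses() const {
    return {ImplicitOps, ImplicitOps + NumImplicitUses};
  }

  // Defs follow immediately after the uses.
  RegList implicit_defs() const {
    const uint16_t *DefsBegin = ImplicitOps + NumImplicitUses;
    return {DefsBegin, DefsBegin + NumImplicitDefs};
  }
};
```

Because the counts are stored, the length of either list is known without
scanning for a terminator.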

Differential Revision: https://reviews.llvm.org/D142216

[OpenMP][NFC] Augment release notes

[AArch64] Update enabled extensions for Ampere1 core

The original enablement for the Ampere1 core inadvertently omitted
that FEAT_RAND is supported and erroneously claimed that FEAT_MTE was
available.

Adjust the definition of Ampere1 to match reality.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D142395

[OpenMP][Doc] Update release notes with NextGen plugins

Fix C++11 warnings in RangeSetTest.cpp

This change fixes the following warnings:
   llvm/clang/unittests/StaticAnalyzer/RangeSetTest.cpp:727:55: warning: ISO C++11 requires at least one argument for the "..." in a variadic macro
     727 | TYPED_TEST_SUITE(RangeSetCastToNoopTest, NoopCastTypes);
         |                                                       ^
   llvm/clang/unittests/StaticAnalyzer/RangeSetTest.cpp:728:65: warning: ISO C++11 requires at least one argument for the "..." in a variadic macro
     728 | TYPED_TEST_SUITE(RangeSetCastToPromotionTest, PromotionCastTypes);
         |                                                                 ^
   llvm/clang/unittests/StaticAnalyzer/RangeSetTest.cpp:729:67: warning: ISO C++11 requires at least one argument for the "..." in a variadic macro
     729 | TYPED_TEST_SUITE(RangeSetCastToTruncationTest, TruncationCastTypes);
         |                                                                   ^
   llvm/clang/unittests/StaticAnalyzer/RangeSetTest.cpp:730:67: warning: ISO C++11 requires at least one argument for the "..." in a variadic macro
     730 | TYPED_TEST_SUITE(RangeSetCastToConversionTest, ConversionCastTypes);
         |                                                                   ^
   llvm/clang/unittests/StaticAnalyzer/RangeSetTest.cpp:732:46: warning: ISO C++11 requires at least one argument for the "..." in a variadic macro
     732 |                  PromotionConversionCastTypes);
         |                                              ^
   llvm/clang/unittests/StaticAnalyzer/RangeSetTest.cpp:734:47: warning: ISO C++11 requires at least one argument for the "..." in a variadic macro
     734 |                  TruncationConversionCastTypes);
         |                                               ^

Reviewed By: steakhal

Differential Revision: https://reviews.llvm.org/D142439

[Clang] Only emit textual LLVM-IR in device only mode

Currently, we embed device code into the host to perform
multi-architecture linking and handling of device code. If the user
specified `-S -emit-llvm` then the embedded output will be textual
LLVM-IR. This is a problem because it can't be used by the LTO backend
and it makes reading the file confusing.

This patch changes the behaviour to only emit textual device IR if we
are in device only mode, that is, if the device code is presented
directly to the user instead of being embedded. Otherwise we should
always embed device bitcode instead.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D141717

[analyzer] Update satest dependencies

A couple of packages were outdated when building the satest Docker image.
This patch updates those.

Reviewed By: steakhal

Differential Revision: https://reviews.llvm.org/D142454

[analyzer][solver] Improve reasoning for not equal to operator

This patch fixes certain cases where the solver was unable to infer
disequality because the values in the range sets overlapped. This happened
when casting from a smaller signed type to a larger unsigned type.

Reviewed By: steakhal

Differential Revision: https://reviews.llvm.org/D140086

Revert "[15/15][Clang][RISCV][NFC] Set data member under Policy as constants"

This reverts commit 2b807336ad385e64a7d182d5fb67bdfe449707a3.

This change is causing Windows builds to hang and is triggering out-of-memory errors with clang-15:
- https://lab.llvm.org/buildbot/#/builders/17/builds/33129
- https://lab.llvm.org/buildbot/#/builders/174/builds/17069
- https://lab.llvm.org/buildbot/#/builders/83/builds/28484
- https://lab.llvm.org/buildbot/#/builders/172/builds/22803
- https://lab.llvm.org/buildbot/#/builders/216/builds/16210