review.tizen.org Git - platform/upstream/llvm.git/log

[NFC][InstCombine] Autogenerate checklines in a few tests being affected by an upcoming change

SROA: Enhance speculateSelectInstLoads

Allow the folding even if there is an
intervening bitcast.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106667

[VPlan] Iterate over phi recipes to detect reductions to fix.

After refactoring the phi recipes, we can now iterate over all header
phis in a VPlan to detect reductions when it comes to fixing them up
when tail folding.

This reduces the coupling with the cost model & legal by using the
information directly available in VPlan. It also removes a call to
getOrAddVPValue, which references the original IR value which may
become outdated after VPlan transformations.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100102

[MCA] Simplify the rounding logic used in TimelineView::printWaitTimeEntry.

This is related to PR51392.

Before this patch, the timeline view was rounding doubles to the first decimal,
using a logic similar to this:

```
  double AverageTime = (double)Input / CumulativeExecutions;
  double Result = floor((AverageTime * 10) + 0.5) / 10
```

Here, Input and CumulativeExecutions are both unsigned integers.
The last operation is what effectively performs the rounding of AverageTime.

PR51392 has been raised because - under specific -m32 configurations of GCC -
one of the timeline tests reports slighlty different values (due to a different
rounding choice).

This patch tries to minimise the propagation of floating-point error by
hoisting the multiply by 10, so that it is performed on the unsigned.

```
  double AverageTime = (double)(Input * 10) / CumulativeExecutions;
  floor(AverageTime + 0.5) / 10
```

So we are trading a floating point multiply for a integer multiply (which can be
expanded using a simple MUL or using an `ADD + LEA` sequence). This decrease in
floating point operations executed should also help with decreasing the error in
the computation..

Strictly speaking, that computation will always be potentially subject to error
(depending on what values are passed in input). However, this patch should
improve the situation and make bug like PR51392 less frequent.

[LLD] Add required `ppc` target to the test cases. NFC

[LLD] Support compressed input sections on big-endian targets

This patch enables compressed input sections on big-endian targets by
checking the target endianness and selecting an appropriate `Chdr`
structure.

Fixes PR51369

Differential Revision: https://reviews.llvm.org/D107635

[GlobalISel] Fix a combine causing DBG_VALUE with dangling vregs.

We should use MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
instead of eraseFromParent().

We should probably use that in other places too but fix this issue which
affects clang bootstrap builds for now.

[OpenMP] Fix accidental reuse of VLA size

We were using an OpaqueValueExpr allocated on the stack to store
the size of a VLA. Because the VLASizeMap in CodegenFunction
uses the address of the expression to avoid recomputing VLAs,
we were accidentally reusing an earlier llvm::Value. This led to
invalid LLVM IR.

This is a temporary solution until VLASizeMap can be pushed and popped
based on the context.

Differential Revision: https://reviews.llvm.org/D107666

Fix PPC buildbot break caused by 4c4093e6e39fe6601f9c95a95a6bc242ef648cd5

This commit adds the isnan intrinsic and provides a default expansion
for it in the SDAG. However, it makes the assumption that types
it operates on are IEEE-compliant types. This is not always the case.
An example of that is PPC "double double" which has a representation
that
- Does not need to conform to IEEE requirements for isnan as it is
not an IEEE-compliant type
- Does not have a representation that allows for straightforward
reinterpreting as an integer and use of integer operations

The result was that this commit broke __builtin_isnan for ppc_fp128
making many valid numeric values report a NaN.

This patch simply changes the expansion to always expand to unordered
comparison (regardless of whether FP exceptions are tracked). This
is inline with previous semantics.

[AVR][clang] Add '$SYSROOT/avr' to possible avr-libc locations

Reviewed by: benshi001

Differential Revision: https://reviews.llvm.org/D107672

[profile][Fuchsia] Add missing system header #include

The _zx_vmar_root_self function is not a system call but
a libc function declared in a separate header.

Reviewed By: gulfem

Differential Revision: https://reviews.llvm.org/D107616

Re-land "[lldb] Upstream support for Foundation constant classes"

Upstream support for NSConstantArray, NSConstantIntegerNumber,
NSConstant{Float,Double}Number and NSConstantDictionary.

We would've upstreamed this earlier but testing it requires
-fno-constant-nsnumber-literals, -fno-constant-nsarray-literals and
-fno-constant-nsdictionary-literals which haven't been upstreamed yet.
As a temporary workaround use the system compiler (xcrun clang) for the
constant variant of the tests.

I'm just upstreaming this. The patch and the tests were all authored by
Fred Riss.

Differential revision: https://reviews.llvm.org/D107660

Revert "[lldb] Upstream support for Foundation constant classes"

This reverts commit 34d78b6a6755946e547afc47d38b59b6a2854457.

This breaks build bots witha missing file:
/home/worker/2.0.1/lldb-x86_64-debian/llvm-project/lldb/source/Plugins/Language/ObjC/Cocoa.cpp:10:10: fatal error: 'objc/runtime.h' file not found

Use LC_DYLD_EXPORTS_TRIE to locate the dyld trie structure if present

The pointer to the dyld trie data structure which lldb needs to parse to get
"trampoline kinds" on Darwin used to be a field in the LC_DYLD_INFO load command. A
new load command was added recently dedicated to this purpose: LC_DYLD_EXPORTS_TRIE.
The format of the trie did not change, however. So all we have to do is use the new
command if present. The commands are supposed to be mutually exclusive, so I added
an lldb_assert to warn if they are not.

Differential Revision: https://reviews.llvm.org/D107673

[MLIR][STD] Add safe scalar constant propagation for FPTruncOp

Perform scalar constant propagation for FPTruncOp only if the resulting value can be represented without precision loss or rounding.

Example:
%cst = constant 1.000000e+00 : f32
%0 = fptrunc %cst : f32 to bf16
-->
%cst = constant 1.000000e+00 : bf16

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D107518

opencl-c.h: add 3.0 optional extension support for a few more bits

These 3 are fairly simple, pipes, workgroups and subgroups.

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D105858

[NVPTX] Add NVPTX intrinsics for CUDA PTX 6.5 ldmatrix instructions

Adds NVPTX intrinsics for the CUDA PTX `ldmatrix.sync.aligned` instructions added in PTX 6.5.

PTX ISA description of `ldmatrix.sync.aligned`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-ldmatrix

Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>
Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D107046

[lldb] Upstream support for Foundation constant classes

Upstream support for NSConstantArray, NSConstantIntegerNumber,
NSConstant{Float,Double}Number and NSConstantDictionary.

We would've upstreamed this earlier but testing it requires
-fno-constant-nsnumber-literals, -fno-constant-nsarray-literals and
-fno-constant-nsdictionary-literals which haven't been upstreamed yet.
As a temporary workaround use the system compiler (xcrun clang) for the
constant variant of the tests.

I'm just upstreaming this. The patch and the tests were all authored by
Fred Riss.

Differential revision: https://reviews.llvm.org/D107660

[AMDGPU] Added test for MachineLICM reg pressure. NFC.

The test shows excessive register pressure after the MachineLICM.
This is a pre-commit for the patch fixing it.

Differential Revision:

Add a "current" token to the ThreadID option to break set/modify.

This provides a convenient way to limit a breakpoint
to the current thread when setting it from the command line w/o
having to figure out what the current thread is.

Differential Revision: https://reviews.llvm.org/D107015

Revert "[lit] Have REQUIRES support the target triple"

This reverts commit 100a7b6197863d4d5ebc97761d7e98063e164e26.

Speculating that this is the reason behind a sanitizer failure:
https://lab.llvm.org/buildbot/#/builders/37/builds/5945

Fix Windows bots failure caused by 8c4208d5c1671d1b44eaf87e8f876b7d635f5114

[OpenMP]Fix PR51349: Remove AlwaysInline for if regions.

After D94315 we add the `NoInline` attribute to the outlined function to handle
data environments in the OpenMP if clause. This conflicted with the `AlwaysInline`
attribute added to the outlined function. for better performance in D106799.
The data environments should ideally not require NoInline, but for now this
fixes PR51349.

Reviewed By: mikerice

Differential Revision: https://reviews.llvm.org/D107649

[InstCombine] reduce vector casting before icmp

There may be some generalizations (see test comments) of these patterns,
but this should handle the cases motivated by:
https://llvm.org/PR51315
https://llvm.org/PR51259

The backend may want to transform differently, but at least for
the x86 examples that I looked at, there does not appear to be
any significant perf diff either way.

[InstCombine] add tests for icmp of casted vector; NFC

https://llvm.org/PR51315

[amdgpu] Revise the conversion from i64 to f32.

- Replace 'cmp+sel' with 'umin' if possible.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107507

[Profile][NFC] Clean up initializeProfileForContinuousMode

Merge two versions of `initializeProfileForContinuousMode` function into one.

Differential Revision: https://reviews.llvm.org/D107591

[Clang][DiagnosticSemaKinds] combine diagnostic texts

The diagnostic texts for warning on attributes that don't appear on the
initial declaration is generally useful. We'd like to re-use it in
D106030, but first let's combine two that already are very similar so we
may re-use it a third time in that commit.

Also, fix a few places that were using notePreviousDefinition to point
to declarations, to instead use diag::note_previous_declaration.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D107613

[clangd] Add basic support for attributes (selection, hover)

These aren't terribly common, but we currently mishandle them badly.
Not only do we not recogize the attributes themselves, but we often end up
selecting some node other than the parent (because source ranges aren't accurate
in the presence of attributes).

Differential Revision: https://reviews.llvm.org/D89785

Reapply "Support Attr in DynTypedNode and ASTMatchers."

This reverts commit 3241680f111ddf3eac37db88cacac199083543f0.
Fixed mangled post-test formatting :-(

Revert "Support Attr in DynTypedNode and ASTMatchers."

This reverts commit a4bdcdadc6ffab250b218bbdae9a0ced05bebfc9.

Fails bots:
https://lab.llvm.org/buildbot/#/builders/109/builds/20231/steps/6/logs/stdio

Support Attr in DynTypedNode and ASTMatchers.

Differential Revision: https://reviews.llvm.org/D89743

[lldb] Try harder to find the __NSCFBoolean addresses

It looks like recent CoreFoundation builds strip the non-public symbol
that we were looking for to find the 2 boolean "classes". The public
symbol is of course there, and it contains the address of the private
one. If we don't find the private symbol directly, go through a memory
read at the public symbol's location instead.

Change TargetLowering::canMergeStoresTo() to take a MF instead of DAG.

DAG is unnecessary and we need this hook to implement store merging on GlobalISel too.

[libomptarget][amdgpu] don't declare Elf_Note on FreeBSD

On FreeBSD, the system `<libelf.h>` already declares `struct Elf_Note`
indirectly (via `<sys/elf_common.h>`). This results in compile errors
when building the libomptarget amdgpu plugin. Avoid redeclaring `struct
Elf_Note` on FreeBSD to fix the errors.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D107661

[MachO] Introduce chained fixups related load commands.

[ARC] Add codegen for llvm.ctlz intrinsic for the ARC backend

Differential Revision: https://reviews.llvm.org/D107611

Revert "[clang][fpenv][patch] Change clang option -ffp-model=precise to select ffp-contract=on"

This reverts commit 48ad446a0fb2c9b98cb7047e4daf8a84c29cef8f.

[flang] Lift -Werror checks into local functions

Lift checks for -Werror into local functions.

Reviewed By: awarzynski

Differential Revision: https://reviews.llvm.org/D101261

[llvm] [cmake] Export LLVM_ENABLE_NEW_PASS_MANAGER into LLVMConfig.cmake

Include the vaue of LLVM_ENABLE_NEW_PASS_MANAGER in generated
LLVMConfig.cmake since it is needed by clang's build system. This fixes
test failures when the new pass manager is enabled (i.e. by default)
by having clang's CMake files correctly detect that and skip relevant
tests.

Differential Revision: https://reviews.llvm.org/D107628

[mlir] Add patterns for vector.transfer_read/write to Linalg bufferization.

Differential Revision: https://reviews.llvm.org/D107643

[CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA.

Attempt to enable MemCpyOpt unconditionally in D104801 uncovered the fact that
there are users that do not expect LLVM to materialize `memset` intrinsic.

While other passes can do that, too, MemCpyOpt triggers it more frequently and
breaks sanitizers and some downstream users.

For now introduce a flag to force-enable the flag and opt-in only CUDA
compilation with NVPTX back-end.

Differential Revision: https://reviews.llvm.org/D106401

[CMake] Check the builtins library value first

When the builtins library isn't found, find_compiler_rt_library
returns NOTFOUND so we'll end up linking against -lNOTFOUND. We need
to check the return value before adding it to the list.

Differential Revision: https://reviews.llvm.org/D107627

Revert "[PowerPC][AIX] Limit attribute aligned to 4096."

This reverts commit 5181be344adbf7ba7dffc73526893d4e7750d34c.

Break libcxx type_traits header which uses aligned storage with
alignments greater than 4096. Reverting untill we can fix the header.

2nd Speculative fix for MachO lld test after "Have REQUIRES support the target triple"

See: http://45.33.8.238/macm1/15677/step_10.txt

Follow-up to f88ad8d as it appears the lld invocations both emit an
error message; so, try adding 'not' to the RUN lines.

[libc][nfc] move ctype_utils and FPUtils to __support

Some ctype functions are called from other libc functions (e.g. isspace
is used in atoi). By moving ctype_utils.h to __support it becomes easier
to include just the implementations of these functions. For these
reasons the implementation for isspace was moved into
ctype_utils as well.

FPUtils was moved to simplify the build order, and to clarify which
files are a part of the actual libc.

Many files were modified to accomodate these changes, mostly changing
the #include paths.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D107600

[gn build] Port 4ad9ec8a328c

[clangd] Rename Features.h -> Feature.h to avoid confilct with libstdc++

Fixes https://github.com/clangd/clangd/issues/835

Differential Revision: https://reviews.llvm.org/D107624

[mlir][NFC] Fix typos in DataLayoutInterfaces.td

[LoopCacheAnalysis]: handle mismatch type for Numerator and CacheLineSize

fix an assertion due to mismatch type for Numerator and CacheLineSize in loop cache analysis pass.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D107618

[GlobalISel][KnownBits] Implement G_CTPOP

Implementation copied almost verbatim from ValueTracking.

Differential revision: https://reviews.llvm.org/D107606

[MemCpyOpt] Teach memcpyopt to handle loads from the constant memory.

- Loads from the constant memory (either explicit one or as the source
of memory transfer intrinsics) won't alias any stores.

Reviewed By: asbirlea, efriedma

Differential Revision: https://reviews.llvm.org/D107605

[LegalizeTypes] Add a simple expansion for SMULO when a libcall isn't available.

This isn't optimal, but prevents crashing when the libcall isn't
available. It just calculates the full product and makes sure the high bits
match the sign of the low half. Each of the pieces should go through their own
type legalization.

This can make D107420 unnecessary.

Needs tests, but I wanted to start discussion about D107420.

Reviewed By: FreddyYe

Differential Revision: https://reviews.llvm.org/D107581

[ARM] Define ComplexPatternFuncMutatesDAG

Some of the Arm complex pattern functions call canExtractShiftFromMul,
which can modify the DAG in-place. For this to be valid and handled
successfully we need to define ComplexPatternFuncMutatesDAG.

Differential Revision: https://reviews.llvm.org/D107476

Speculative fix for MachO lld test after "Have REQUIRES support the target triple"

See: http://45.33.8.238/macm1/15677/step_10.txt

This is a test that has `REQUIRES: x86` which means it never ran
before; I don't have a MachO environment but based on the FileCheck
output it looks like it should be sufficient to remove one CHECK line.

[llvm-objcopy] [COFF] Do not patch debug entries if PointerToRawData is zero

Fix an edge case missed by https://reviews.llvm.org/D78921. For e.g.,
the Repro debug entry (generated with the /Brepro linker flag) does not
have a debug-directory payload. Do not attempt to patch Debug entries
without a payload.

Differential Revision: https://reviews.llvm.org/D107324

Disable a dataflow fuzz test after "Have REQUIRES support the target triple"

See: https://lab.llvm.org/buildbot/#/builders/75/builds/8095/steps/8/logs/stdio

which shows:
unsupported option '-fsanitize=dataflow' for target 'i386-unknown-linux-gnu'

The other dataflow tests in the same directory were already disabled,
so I think it's fine to disable this one as well.

[lldb] Fix TestFunctionStarts.py on AS

The tests strips the binary which invalidates the code signature. Skip
code signing for this test.

[MLIR][std] Introduce bitcast operation

This patch introduces a bitcast operation to the standard dialect.
RFC: https://llvm.discourse.group/t/rfc-introduce-a-bitcast-op/3774

Reviewed By: silvas

Differential Revision: https://reviews.llvm.org/D105376

[CodeGen] Remove computeDefOperandLatency (NFC)

The last use was removed on Oct 9, 2016 in commit
5c924d71173afc93aa0f0d115bd445a7496f4294.

[mlir] support collapsed loops in OpenMP-to-LLVM translation

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D105706

Fix test failure found by "Have REQUIRES support the target triple"

[AIX] Define _ARCH_PPC64 macro for 32-bit

%%%
The macro _ARCH_PPC64 is already defined for 64-bit, but this patch defines it for 32-bit on AIX to follow xlc. See: https://www.ibm.com/docs/en/xl-c-and-cpp-aix/13.1.0?topic=features-macros-related-architecture-settings

Note: This change creates a discrepancy between GCC, which defines _ARCH_PPC64 only for 64-bit mode.

Tested with SPEC.
%%%

Reviewed By: cebowleratibm

Differential Revision: https://reviews.llvm.org/D107244

[AIX] Define __HOS_AIX__ macro

%%%
This patch defines __HOS_AIX__ macro for AIX in case of a cross compiler implementation.
%%%
Tested with SPEC.

Reviewed By: cebowleratibm

Differential Revision: https://reviews.llvm.org/D107242

[lit] Have REQUIRES support the target triple

Currently the UNSUPPORTED and XFAIL clauses support specifying
substrings of the target triple; but REQUIRES does not, which can trip
people up or lead to hacking config files to insert substitute feature
names. Consistency across all three lit clauses seems preferable.

Differential Revision: https://reviews.llvm.org/D107162

Implement P1937 consteval in unevaluated contexts

In an unevaluated contexts, consteval functions should not be
immediately evaluated.

Disallow narrowing conversions to bool in noexcept specififers

Completes the support for P1401R5.

[AIX] Define __THW_PPC__ macro

%%%
This patch defines the macro __THW_PPC__ for AIX.
%%%

Tested with SPEC.

Reviewed By: cebowleratibm

Differential Revision: https://reviews.llvm.org/D107243

[AIX] Define __THW_BIG_ENDIAN__ macro

%%%
This patch defines the macro __THW_BIG_ENDIAN__ for AIX.
%%%

Tested with SPEC.

Reviewed By: cebowleratibm

Differential Revision: https://reviews.llvm.org/D107241

[GlobalISel] Improve widening of cttz/cttz_zero_undef

Differential Revision: https://reviews.llvm.org/D107631

[libc++] IWYU to fix Modules complaints about _LIBCPP_ASSERT. NFCI.

This fixes all places that used _LIBCPP_ASSERT without including <__debug>.

git grep -l _LIBCPP_ASSERT | xargs git grep -L __debug

[clangd] Canonicalize inputs provided with `--`

We already strip all the inputs provided without `--`, this patch also
handles the cases with `--`.

Differential Revision: https://reviews.llvm.org/D107637

[clangd] Strip mutliple arch options

This patch strips all the arch options in case of multiple ones. As it
results in multiple compiler jobs, which clangd cannot handle.

It doesn't pick any over the others as it is unclear which one the user wants
and defaulting to host architecture seems less surprising. Users also have the
ability to explicitly specify the architecture they want via clangd config
files.

Fixes https://github.com/clangd/clangd/issues/827.

Differential Revision: https://reviews.llvm.org/D107634

[AMDGPU][MC][NFC][DOC] Updated AMD GPU assembler syntax description.

Corrected sendmsg description (bug https://bugs.llvm.org/show_bug.cgi?id=49648).

[clang] Remove misleading assertion in FullSourceLoc

D31709 added an assertion was added to `FullSourceLoc::hasManager()` that ensured a valid `SourceLocation` is always paired with a `SourceManager`, and missing `SourceManager` is always paired with an invalid `SourceLocation`.

This appears to be incorrect, since clients never cared about constructing `FullSourceLoc` to uphold that invariant, or always checking `isValid()` before calling `hasManager()`.

The assertion started failing when serializing diagnostics pointing into an explicit module. Explicit modules don't have valid `SourceLocation` for the `import` statement, since they are "imported" from the command-line argument `-fmodule-name=x.pcm`.

This patch removes the assertion, since `FullSourceLoc` was never intended to uphold any kind of invariants between the validity of `SourceLocation` and presence of `SourceManager`.

Reviewed By: arphaman

Differential Revision: https://reviews.llvm.org/D106862

[flang][docs] Document the `flang` wrapper script

Differential Revision: https://reviews.llvm.org/D107543

[profile] Only use NT_GNU_BUILD_ID if supported

The Solaris buildbots have been broken for some time by the unconditional
use of `NT_GNU_BUILD_ID`, e.g. Solaris/sparcv9
<https://lab.llvm.org/staging/#/builders/50/builds/4910> and Solaris/amd64
<https://lab.llvm.org/staging/#/builders/101/builds/3751>.  Being a GNU
extension, it is not defined in `<sys/elf.h>`.  However, providing a
fallback definition doesn't help because the code also relies on
`__ehdr_start`, another unportable GNU extension that most likely never
will be implemented in Solaris `ld`.  Besides, there's reallly no point in
supporting build ids since they aren't used on Solaris at all.

This patch fixes this by making the relevant code conditional on the
definition of `NT_GNU_BUILD_ID`.

Tested on `amd64-pc-solaris2.11` and `sparcv9-sun-solaris2.11`.

Differential Revision: https://reviews.llvm.org/D107556

[NFC][MLGO] Make logging more robust

1) add some self-diagnosis (when asserts are enabled) to check that all
features have the same nr of entries

2) avoid storing pointers to mutable fields because the proto API
contract doesn't actually guarantee those stay fixed even if no further
mutation of the object occurs.

Differential Revision: https://reviews.llvm.org/D107594

Split 'qualifier on reference type has no effect' out into a new flag

This introduces a new flag ignored-reference-qualifiers for the
existing "'A' qualifier on reference type B has no effect" diagnostic,
as a child of ignored-qualifiers.

Rationale:
This particular diagnostic is enabled by default, but other parts of
ignored-qualifiers are not. Anecdotally, a user may encounter this
diagnostic in the wild, and, seeing it to be valuable, might try to
raise it to error with -Werror=ignored-qualifiers, whereupon the other
diagnostics the flag covers will also be raised, to the user's surprise
and confusion. By splitting this diagnostic out into a separate flag,
and marking it as a child of ignored-qualifiers, we allow the user more
granular control of the diagnostics they care about, while maintaining
backwards compatibility with existing build scripts.

[AMDGPU] Handle functions in llvm's global ctors and dtors list

This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.

HSAStreamer will use function attributes, "device-init" and
"device-fini" to distinguish between init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.

To reduce the number of init and fini kernels, the ctors and
dtors present in the llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.

Reviewed by: yaxunl

Differential Revision: https://reviews.llvm.org/D105682

[ARM] Fold insert_subvector to concat_vectors

D107068 fixed the same problem on aarch64 but the arm variant wasn't exposed in existing test coverage.

I've copied the arm64-neon-copy tests (and stripped the intrinsic test from it) for testing on arm neon builds as well.

[X86][AVX] Extract SUBV_BROADCAST constant bits from just the lower subvector range (PR51281)

As reported on PR51281, an internal fuzz test encountered an issue when extracting constant bits from a SUBV_BROADCAST node from a constant pool source larger than the broadcasted subvector width.

The getTargetConstantBitsFromNode was assuming that the Constant would the same size as the subvector, resulting in the incorrect packing of the per-element bits data.

This patch attempts to solve this by using the SUBV_BROADCAST node to determine the subvector width, and then ensuring we extract only the lowest bits from Constant of that subvector bitsize.

Differential Revision: https://reviews.llvm.org/D107158

[linalg] Expose `rewriteAsPaddedOp` function.

Differential Revision: https://reviews.llvm.org/D107629

[C++4OpenCL] Introduces __remove_address_space utility

This change provides a way to conveniently declare types that have
address space qualifiers removed.

Since OpenCL adds address spaces implicitly even when they are not
specified in source, it is useful to allow deriving address space
unqualified types.

Fixes llvm.org/PR45326

Differential Revision: https://reviews.llvm.org/D106785

[Orc][examples] Temporarily disable tests for the C API due to failures on sanitizer bots

These tests were added while the OrcV2Example tests had been disabled:
https://reviews.llvm.org/rGe5d8cfb2f134fcf0235ec1a35eec875a9cd36b21

Failures on sanitizer bots:
https://green.lab.llvm.org/green/job/clang-stage2-cmake-RgSan/7992/testReport/

[AArch64] NFC: drop unnecessary llvm:: namespace prefix on MCInst

[OpenCL][Docs] Adding builtins requires adding to both now

As we are trying to reach parity between opencl-c.h and
-fdeclare-opencl-builtins, ensure the documentation mentions that new
builtins should be added to both.

Reviewed by: Anastasia Stulova

[LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform

This patch adds more instructions to the Uniforms list, for example certain
intrinsics that are uniform by definition or whose operands are loop invariant.
This list includes:

  1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which
  are always uniform by definition.
  2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have
  loop invariant input operands then these are also uniform too.

Also, in VPRecipeBuilder::handleReplication we check if an instruction is
uniform based purely on whether or not the instruction lives in the Uniforms
list. However, there are certain cases where calls to some intrinsics can
be effectively treated as uniform too. Therefore, we now also treat the
following cases as uniform for scalable vectors:

  1. If the 'assume' intrinsic's operand is not loop invariant, then we
  are free to treat this as uniform anyway since it's only a performance
  hint. We will get the benefit for the first lane.
  2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop
  variant then for scalable vectors we assume these still ultimately come
  from the broadcast of an alloca. We do not support scalable vectorisation
  of loops containing alloca instructions, hence the alloca itself would
  be invariant. If the pointer does not come from an alloca then the
  intrinsic itself has no effect.

I have updated the assume test for fixed width, since we now treat it
as uniform:

  Transforms/LoopVectorize/assume.ll

I've also added new scalable vectorisation tests for other intriniscs:

  Transforms/LoopVectorize/scalable-assume.ll
  Transforms/LoopVectorize/scalable-lifetime.ll
  Transforms/LoopVectorize/scalable-noalias-scope-decl.ll

Differential Revision: https://reviews.llvm.org/D107284

[mlir] Allow to override type/attr aliases from various hooks

Use new return type for `OpAsmDialectInterface::getAlias`:

* `AliasResult::NoAlias` if an alias was not provided.
* `AliasResult::OverridableAlias` if an alias was provided, but it might be overriden by other hook.
* `AliasResult::FinalAlias` if an alias was provided and it should be used (no other hooks will be checked).

In that case `AsmPrinter` will use either the first alias with `FinalAlias` result or
the last alias with `OverridableAlias` result (it depends on dialect array order).

Used `OverridableAlias` result for `BuiltinOpAsmDialectInterface`.

Use case: provide more informative alias for built-in attributes like `AffineMapAttr`
instead of generic "map<N>".

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D107437

[FuncSpec] Return changed if function is changed by tryToReplaceWithConstant

The may get changed before specialization by RunSCCPSolver. In other
words, the pass may change the function without specialization happens.
Add test and comment to reveal this.
And it may return No Changed if the function get changed by
RunSCCPSolver before the specialization. It looks like a potential bug.

Test Plan: check-all

Reviewed By: https://reviews.llvm.org/D107622

Differential Revision: https://reviews.llvm.org/D107622

[llvm-readobj][XCOFF] Warn about invalid offset

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D107398

Revert "[LoopVectorize] Add support for replication of more intrinsics with scalable vectors"

This reverts commit 95800da914938129083df2fa0165c1901909c273.

[AMDGPU][GlobalISel] Better legalization of 32-bit ctlz/cttz

Differential Revision: https://reviews.llvm.org/D107474

[AMDGPU][GlobalISel] Improve regbankselect for 64-bit VGPR ctlz_zero_undef/cttz_zero_undef

We can improve on the generic splitting by using ffbh/ffbl, which have a
defined result when the input is zero.

Differential Revision: https://reviews.llvm.org/D107442

[AMDGPU][GlobalISel] Add G_AMDGPU_FFBL_B32

This is the counterpart to G_AMDGPU_FFBH_U32 which already exists. These
instructions have a defined result of -1 when the input is zero.

Differential Revision: https://reviews.llvm.org/D107441

[GlobalISel] Improve legalization of narrow CTTZ

Differential Revision: https://reviews.llvm.org/D107457

[NFC] [FuncSpec] Remove unused variables in isArgumentInteresting

[FuncSpec] Move invariant computation for spec cost out of loop (NFC-ish)

Noticed that the computation for function specialization cost of a
function wouldn't change during the traversal of the arguments for the
function. We could hoist the computation out of the traversal. I
observed about ~1% improvement on compile time for spec2017. But I guess
it may not be precise. This should be NFC and fine.

Reviewed By: Sjoerd Meijer

Differential Revision: https://reviews.llvm.org/D107621

Introduce intrinsic llvm.isnan

This is recommit of the patch 16ff91ebccda1128c43ff3cee104e2c603569fb2,
reverted in 0c28a7c990c5218d6aec47c5052a51cba686ec5e because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:

    Clang has builtin function '__builtin_isnan', which implements C
    library function 'isnan'. This function now is implemented entirely in
    clang codegen, which expands the function into set of IR operations.
    There are three mechanisms by which the expansion can be made.

    * The most common mechanism is using an unordered comparison made by
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It however is not suitable if floating
      point exceptions are tracked. Corresponding IEEE 754 operation and C
      function must never raise FP exception, even if the argument is a
      signaling NaN. Compare instructions usually does not have such
      property, they raise 'invalid' exception in such case. So this
      mechanism is unsuitable when exception behavior is strict. In
      particular it could result in unexpected trapping if argument is SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in the cases when raising FP exceptions by 'isnan' is not
      allowed. This solution implements 'isnan' using integer operations.
      It solves the problem of exceptions, but offers one solution for all
      targets, however some can do the check in more efficient way.

    * Solution implemented by https://reviews.llvm.org/D96568 introduced a
      hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
      specific code into IR. Now only SystemZ implements this hook and it
      generates a call to target specific intrinsic function.

    Although these mechanisms allow to implement 'isnan' with enough
    efficiency, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. It complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang, in this case treatment
      of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of options '-ffast-math' or '-fno-honor-nans'. If such option is
    specified, 'fcmp uno' may be optimized to 'false'. It is valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
    operands are never NaNs. This assumption however should not be applied
    to the functions that check FP number properties, including 'isnan'. If
    such function returns expected result instead of actually making
    checks, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance critical code, as it can speed up execution
    by the expense of manual treatment of corner cases. If 'isnan' returns
    assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, like making the check using integer
    operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
    which also expresses the opinion, that limitations imposed by
    '-ffast-math' should be applied only to 'math' functions but not to
    'tests'.

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by IEEE-754
    and C standards in target-agnostic way. During IR transformations it
    does not undergo undesirable optimizations. It reaches instruction
    selection, where is lowered in target-dependent way. The lowering can
    vary depending on options like '-ffast-math' or '-ffp-model' so the
    resulting code satisfies requested semantics.

    Differential Revision: https://reviews.llvm.org/D104854

[LV] Move reduction PHI node fixup to VPlan::execute (NFC).

All information to fix-up the reduction phi nodes in the vectorized loop
is available in VPlan now. This patch moves the code to do so, to make
this clearer. Fixing up the loop exit value still relies on other
information and remains outside of VPlan for now.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100113