review.tizen.org Git - platform/upstream/llvm.git/log

[FPEnv][X86] Implement lowering of llvm.set.rounding

Differential Revision: https://reviews.llvm.org/D74730

Use an allow list on reserved macro identifiers

The allow list is based on various official sources (see in-code comment).

This fixes https://bugs.llvm.org/show_bug.cgi?id=50248

Differential Revision: https://reviews.llvm.org/D102168

[clang-repl] Add exhaustive list of libInterpreter dependencies.

This patch should appease the bots building with -DBUILD_SHARED_LIBS=On,
resolving the regression introduced in 92f9852fc99b.

[clang-repl] Recommit "Land initial infrastructure for incremental parsing"

Original commit message:

  In http://lists.llvm.org/pipermail/llvm-dev/2020-July/143257.html we have
  mentioned our plans to make some of the incremental compilation facilities
  available in llvm mainline.

  This patch proposes a minimal version of a repl, clang-repl, which enables
  interpreter-like interaction for C++. For instance:

  ./bin/clang-repl
  clang-repl> int i = 42;
  clang-repl> extern "C" int printf(const char*,...);
  clang-repl> auto r1 = printf("i=%d\n", i);
  i=42
  clang-repl> quit

  The patch allows very limited functionality, for example, it crashes on invalid
  C++. The design of the proposed patch follows closely the design of cling. The
  idea is to gather feedback and gradually evolve both clang-repl and cling to
  what the community agrees upon.

  The IncrementalParser class is responsible for driving the clang parser and
  codegen and allows the compiler infrastructure to process more than one input.
  Every input adds to the “ever-growing” translation unit. That model is enabled
  by an IncrementalAction which prevents teardown when HandleTranslationUnit.

  The IncrementalExecutor class hides some of the underlying implementation
  details of the concrete JIT infrastructure. It exposes the minimal set of
  functionality required by our incremental compiler/interpreter.

  The Transaction class keeps track of the AST and the LLVM IR for each
  incremental input. That tracking information will be later used to implement
  error recovery.

  The Interpreter class orchestrates the IncrementalParser and the
  IncrementalExecutor to model interpreter-like behavior. It provides the public
  API which can be used (in future) when using the interpreter library.

  Differential revision: https://reviews.llvm.org/D96033

[mlir] Support masks in TransferOpReduceRank and TransferReadPermutationLowering

These two patterns allow for more efficient codegen in VectorToSCF.

Differential Revision: https://reviews.llvm.org/D102222

[gn build] Port d8b37de8a478

[GC][NFC] Move GCStrategy from CodeGen to IR

We want it to be available in analyzes so that we could use the
CodeGen notion in middle-end passes (for example, to check if
a GC may free some particular pointer).

This is a preparatory patch that simply moves the files around.

Note: if this causes some build issues, this patch must just be reverted.

Differential Revision: https://reviews.llvm.org/D100557
Reviewed By: reames

[JITLink] Expose x86-64 pointer jump stub block construction.

This can be useful for clients who want to define their own symbol for the
stub, or re-use some existing symbol.

[JITLink] Add a transferDefinedSymbol operation.

The transferDefinedSymbol operation updates a Symbol's target block, offset,
and size. This can be convenient when you want to redefine the content of some
symbol(s) pointing at a block, while retaining the original block in the graph.

Add some warnings when debugserver is running in translation

A debugserver launched x86_64 cannot control an arm64/arm64e
process on an Apple Silicon system. Warn when this situation
has happened and return an error for the most common case of
attach. I think there will be refinements to this in the
future, but start out by making it easy to spot the problem
when it happens.

rdar://76630595

[Coroutines] Salvege Debug.values

Summary: The previous implementation of coro-split didn't collect values
used by dbg instructions into the spills which made a log debug info
unavailable with optimization on.
This patch tries to collect these uses which are used by dbg.values. In
this way, the debugbility of coroutine could be as powerful as normal
functions with optimization on.

To avoid enlarging the coroutine frame, this patch only collects
`dbg.value` whose value is already in the coroutine frame. This decision
may make some debug info getting unavailable. But if we are with
optimization on, the performance issue should be considered first. And
this patch would make the debugbility of coroutine to be better only
without changing the layout of the frame.

Test-plan: check-llvm

Reviewed By: aprantl, lxfind

Differential Revision: https://reviews.llvm.org/D97673

[mlir][tosa] Fix tosa.cast semantics to perform rounding/clipping

Rounding to integers requires rounding (for floating points) and clipping
to the min/max values of the destination range. Added this behavior and
updated tests appropriately.

Reviewed By: sjarus, silvas

Differential Revision: https://reviews.llvm.org/D102375

Revert "[clang-repl] Land initial infrastructure for incremental parsing"

This reverts commit 44a4000181e1a25027e87f2ae4e71cb876a7a275.

We are seeing build failures due to missing dependency to libSupport and
CMake Error at tools/clang/tools/clang-repl/cmake_install.cmake
file INSTALL cannot find

[Coroutines] Enable printing coroutine frame when dbg info is available

Summary: This patch tries to build debug info for coroutine frame in the
middle end. Although the coroutine frame is constructed and maintained by
the compiler and the programmer shouldn't care about the coroutine frame
by the design of C++20 coroutine,
a lot of programmers told me that they want to see the layout of the
coroutine frame strongly. Although C++ is designed as an abstract layer
so that the programmers shouldn't care about the actual memory in bits,
many experienced C++ programmers  are familiar with assembler and
debugger to see the memory layout in fact, After I was been told they
want to see the coroutine frame about 3 times, I think it is an actual
and desired demand.

However, the debug information is constructed in the front end and
coroutine frame is constructed in the middle end. This is a natural and
clear gap. So I could only try to construct the debug information in the
middle end after coroutine frame constructed. It is unusual, but we are
in consensus that the approch is the best one.

One hard part is we need construct the name for variables since there
isn't a map from llvm variables to DIVar. Then here is the strategy this
patch uses:
- The name `__resume_fn `, `__destroy_fn` and `__coro_index ` are
  constructed by the patch.
- Then the name `__promise` comes from the dbg.variable of corresponding
  dbg.declare of PromiseAlloca, which shows highest priority to
construct the debug information for the member of coroutine frame.
- Then if the member is struct, we would try to get the name of the llvm
  struct directly. Then replace ':' and '.' with '_' to make it
printable for debugger.
- If the member is a basic type like integer or double, we would try to
  emit the corresponding name.
- Then if the member is a Pointer Type, we would add `Ptr` after
  corresponding pointee type.
- Otherwise, we would name it with 'UnknownType'.

Reviewered by: lxfind, aprantl, rjmcall, dblaikie

Differential Revision: https://reviews.llvm.org/D99179

[SLP] Add insertelement instructions to vectorizable tree

Add new type of tree node for `InsertElementInst` chain forming vector.
These instructions could be either removed, or replaced by shuffles during
vectorization and we can add this node to cost model, so naturally estimating
their cost, getting rid of `CompensateCost` tricks and reducing further work
for InstCombine. This fixes PR40522 and PR35732 in a natural way. Also this
patch is the first step towards revectorization of partially vectorization
(to fix PR42022 completely). After adding inserts to tree the next step is
to add vector instructions there (for instance, to merge `store <2 x float>`
and `store <2 x float>` to `store <4 x float>`).

Fixes PR40522 and PR35732.

Differential Revision: https://reviews.llvm.org/D98714

[SLP][Test] Fix and precommit tests for D98714

[clang-repl] Land initial infrastructure for incremental parsing

In http://lists.llvm.org/pipermail/llvm-dev/2020-July/143257.html we have
mentioned our plans to make some of the incremental compilation facilities
available in llvm mainline.

This patch proposes a minimal version of a repl, clang-repl, which enables
interpreter-like interaction for C++. For instance:

./bin/clang-repl
clang-repl> int i = 42;
clang-repl> extern "C" int printf(const char*,...);
clang-repl> auto r1 = printf("i=%d\n", i);
i=42
clang-repl> quit

The patch allows very limited functionality, for example, it crashes on invalid
C++. The design of the proposed patch follows closely the design of cling. The
idea is to gather feedback and gradually evolve both clang-repl and cling to
what the community agrees upon.

The IncrementalParser class is responsible for driving the clang parser and
codegen and allows the compiler infrastructure to process more than one input.
Every input adds to the “ever-growing” translation unit. That model is enabled
by an IncrementalAction which prevents teardown when HandleTranslationUnit.

The IncrementalExecutor class hides some of the underlying implementation
details of the concrete JIT infrastructure. It exposes the minimal set of
functionality required by our incremental compiler/interpreter.

The Transaction class keeps track of the AST and the LLVM IR for each
incremental input. That tracking information will be later used to implement
error recovery.

The Interpreter class orchestrates the IncrementalParser and the
IncrementalExecutor to model interpreter-like behavior. It provides the public
API which can be used (in future) when using the interpreter library.

Differential revision: https://reviews.llvm.org/D96033

[mlir] Support memref layout maps in vector transfer ops

Differential Revision: https://reviews.llvm.org/D102042

[mlir] Unrolled progressive-vector-to-scf.

Instead of an SCF for loop, these pattern generate fully unrolled loops with no temporary buffer allocations.

Differential Revision: https://reviews.llvm.org/D101981

[mlir] Allow empty position in vector.insert and vector.extract

Such ops are no-ops and are folded to their respective `source`/`vector` operand.

Differential Revision: https://reviews.llvm.org/D101879

[mlir] Fix masked vector transfer ops with broadcasts

Broadcast dimensions of a vector transfer op have no corresponding dimension in the mask vector. E.g., a 2-D TransferReadOp, where one dimension is a broadcast, can have a 1-D `mask` attribute.

This commit also adds a few additional transfer op integration tests for various combinations of broadcasts, masking, dim transposes, etc.

Differential Revision: https://reviews.llvm.org/D101745

[Debug-Info] add -gstrict-dwarf support in backend

Reviewed By: dblaikie, probinson

Differential Revision: https://reviews.llvm.org/D100826

Revert "[mlir] Fix masked vector transfer ops with broadcasts"

This reverts commit c9087788f7e41285445729127dd07ff7f82e3fc0.

Accidentally pushed old version of the commit.

[mlir] Fix masked vector transfer ops with broadcasts

Broadcast dimensions of a vector transfer op have no corresponding dimension in the mask vector. E.g., a 2-D TransferReadOp, where one dimension is a broadcast, can have a 1-D `mask` attribute.

This commit also adds a few additional transfer op integration tests for various combinations of broadcasts, masking, dim transposes, etc.

Differential Revision: https://reviews.llvm.org/D101745

[clang] Minor fix for MarkVarDeclODRUsed

Merge two if as follow up of https://reviews.llvm.org/D102270

Rename human-readable name for DW_LANG_Mips_Assembler

The Mips in DW_LANG_Mips_Assembler is a vendor name not an
architecture name and in lack of a proper generic DW_LANG_assembler,
some assemblers emit DWARF using this tag. Due to a warning I recently
introduced users will now be greeted with

This version of LLDB has no plugin for the mipsassem language. Inspection of frame variables will be limited.

By renaming this to just "Assembler" this error message will make more sense.

Differential Revision: https://reviews.llvm.org/D101406

rdar://77214764

Handle unexpanded packs appearing in type-constraints.

For a type-constraint in a lambda signature, this makes the lambda
contain an unexpanded pack; for requirements in a requires-expressions
it makes the requires-expression contain an unexpanded pack; otherwise
it's invalid.

PR50306: When instantiating a generic lambda with a constrained 'auto',
properly track that it has constraints.

Previously an instantiation of a constrained generic lambda would behave
as if unconstrained because we incorrectly cached a "has no constraints"
value that we computed before the constraints from 'auto' parameters
were attached.

Clean up handling of constrained parameters in lambdas.

No functionality change intended.

Add test for substitutability of variable templates in closure type
mangling.

[AMDGPU][OpenMP] Emit textual IR for -emit-llvm -S

Previously clang would print a binary blob into the bundled file
for amdgcn. With this patch, it will instead print textual IR as
expected.

Reviewed By: JonChesterfield, ronlieb

Differential Revision: https://reviews.llvm.org/D102065

Change-Id: I10c0127ab7357787769fdf9a2edd4b3071e790a1

scudo: Require fault address to be in bounds for UAF.

The bounds check that we previously had here was suitable for secondary
allocations but not for UAF on primary allocations, where it is likely
to result in false positives. Fix it by using a different bounds check
for UAF that requires the fault address to be in bounds.

Differential Revision: https://reviews.llvm.org/D102376

[libcxx] modifies `_CmpUnspecifiedParam` ignore types outside its domain

D85051's honeypot solution was a bit too aggressive swallowed up the
comparison types, which made comparing objects of different ordering
types ambiguous.

Depends on D101707.

Differential Revision: https://reviews.llvm.org/D101708

[mlir][sparse][capi][python] add sparse tensor passes

First set of "boilerplate" to get sparse tensor
passes available through CAPI and Python.

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D102362

[AMDGPU] Refactor shouldExpandAtomicRMWInIR(). NFC.

This is logic simplification for better readability.

Differential Revision: https://reviews.llvm.org/D102371

[mlir][Linalg] Add interface methods to get lhs and rhs of contraction

Differential Revision: https://reviews.llvm.org/D102301

Change the context instruction for computeKnownBits in LoadStoreVectorizer pass

This change enables cases for which the index value for the first
load/store instruction in a pair could be a function argument. This
allows using llvm.assume to provide known bits information in such
cases.

Patch by Viacheslav Nikolaev. Thanks!

Differential Revision: https://reviews.llvm.org/D101680

Optimize GSymCreator::finalize.

The algorithm removing duplicates from the Funcs list used to have
amortized quadratic time complexity because it was potentially
removing each entry using std::vector::erase individually. This
patch is now using a erase-remove idiom with an adapted
removeIfBinary algorithm.

Probably this was made under the assumption that these removals are
rare, but there are cases where the case of duplicate entries is
occurring frequently. In these cases, the actual runtime was very
poor, taking hours to process a single binary of around 1 GiB size
including debug info. Another factor contributing to that is the
frequent output of the warning, which is now removed.

It seems this is particularly an issue with GCC-compiled binaries,
rather than clang-built binaries.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D102219

[mlir] Fix ssa values naming bug

Address comments in https://reviews.llvm.org/D102226 to fix the bug + style violations

Differential Revision: https://reviews.llvm.org/D102368

[lld][WebAssembly] Allow data symbols to extend past end of segment

This fixes a bug with string merging with string symbols that contain
NULLs, as is the case in the `merge-string.s` test.

The bug only showed when we run with `--relocatable` and then try read
the resulting object back in.  In this case we would end up with string
symbols that extend past the end of the segment in which they live.

The problem comes from the fact that sections which are flagged as
string mergable assume that all strings are NULL terminated.  The
merging algorithm will drop trailing chars that follow a NULL since they
are essentially unreachable.  However, the "size" attribute (in the
symbol table) of such a truncated symbol is not updated resulting a
symbol size that can overlap the end of the segment.

I verified that this can happen in ELF too given the right conditions
and the its harmless enough.  In practice Strings that contain embedded
null should not be part of a mergable section.

Differential Revision: https://reviews.llvm.org/D102281

[WebAssembly] Add TLS data segment flag: WASM_SEG_FLAG_TLS

Previously the linker was relying solely on the name of the segment
to imply TLS.

Differential Revision: https://reviews.llvm.org/D102202

[WebAssembly] Allow Wasm EH with Emscripten SjLj

We explicitly made it error out in D101403, out of a good intention that
the error message will make people less confusing. Turns out, we weren't
failing all cases of wasm EH + SjLj; only a few cases were failing and
our client was able to get around by fixing source code, but now we made
it fail for all cases, even the cases that previously succeeded fail,
which we didn't intend. This reverts that change.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D102364

[RISCV] Remove RISCVII:VSEW enum. Make encodeVYPE operate directly on SEW.

The VSEW encoding isn't a useful value to pass around. It's better
to use SEW or log2(SEW) directly. The only real ugliness is that
the vsetvli IR intrinsics use the VSEW encoding, but it's easy
enough to decode that when the intrinsic is processed.

Fix bad mangling of <data-member-prefix> for a closure in the initializer of a variable at global namespace scope.

This implements the direction proposed in
https://github.com/itanium-cxx-abi/cxx-abi/pull/126.

Differential Revision: https://reviews.llvm.org/D101968

[mlir-lsp-server][NFC] Add newline between Protocol JSON serialization methods and class definitions.

[mlir-lsp-server] Add support for sending diagnostics to the client

This allows for diagnostics emitted during parsing/verification to be surfaced to the user by the language client, as opposed to just being emitted to the logs like they are now.

Differential Revision: https://reviews.llvm.org/D102293

[flang] Fix standalone builds

Flang's CMake modules directory was being added to the CMake module path
twice, and AddFlang was being included after the first addition. Remove
the unnecessary first addition and move the AddFlang include down to the
second one. This way, it occurs after LLVM's CMake modules have been
included for a standalone build, so it can make use of those modules.

Suppress Deferred Diagnostics in discarded statements.

It doesn't really make sense to emit language specific diagnostics
in a discarded statement, and suppressing these diagnostics results in a
programming pattern that many users will feel is quite useful.

Basically, this makes sure we only emit errors from the 'true' side of a
'constexpr if'.

It does this by making the ExprEvaluatorBase type have an opt-in option
as to whether it should visit discarded cases.

Differential Revision: https://reviews.llvm.org/D102251

[mlir][tosa] Remove tosa.identityn operator

Removes the identityn operator from TOSA MLIR definition.
Removes TosaToLinAlg mappings

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D102329

[ELF][AVR] Add explicit relocation types to getRelExpr

[LLD] [COFF] Fix including the personality function for DWARF EH when linking with --gc-sections

Since c579a5b1d92a9bc2046d00ee2d427832e0f5ddec we don't traverse
.eh_frame when doing GC. But the exception handling personality
function needs to be included, and is only referenced from within
.eh_frame.

Differential Revision: https://reviews.llvm.org/D102138

[libcxx] [test] Fix fs.op.last_write_time for Windows

Don't use stat and lstat on Windows; lstat is missing, stat only provides
the modification times with second granularity (and does the wrong thing
regarding symlinks). Instead do a minimal reimplementation using the
native windows APIs.

Differential Revision: https://reviews.llvm.org/D101731

[cmake] Fix typo in function name

Not sure how my local testing didn't trigger this path. Should fix
https://lab.llvm.org/buildbot/#/builders/132/builds/5494

[libc++][nfc] remove duplicated __to_unsigned.

Both `<type_traits>` and `<charconv>` implemented this function with
different names and a slightly different behavior. This removes the
version in `<charconv>` and improves the version in `<typetraits>`.

- The code can be used again in C++11.
- The original claimed C++14 support, but `[[nodiscard]]` is not
available in C++14.
- Adds `_LIBCPP_INLINE_VISIBILITY`.

Reviewed By: zoecarver, #libc, Quuxplusone

Differential Revision: https://reviews.llvm.org/D102332

[InstCombine] Support one-hot merge for logical and/or

If a logical and/or is used, we need to be careful not to propagate
a potential poison value from the RHS by inserting a freeze
instruction. Otherwise it works the same way as bitwise and/or.

This is intended to address the regression reported at
https://reviews.llvm.org/D101191#2751002.

Differential Revision: https://reviews.llvm.org/D102279

Add type information to integral template argument if required.

Non-comprehensive list of cases:
* Dumping template arguments;
* Corresponding parameter contains a deduced type;
* Template arguments are for a DeclRefExpr that hadMultipleCandidates()

Type information is added in the form of prefixes (u8, u, U, L),
suffixes (U, L, UL, LL, ULL) or explicit casts to printed integral template
argument, if MSVC codeview mode is disabled.

Differential revision: https://reviews.llvm.org/D77598

Revert "Produce warning for performing pointer arithmetic on a null pointer."

This reverts commit dfc1e31d49fe1380c9bab43373995df5fed15e6d.
See discussion on https://reviews.llvm.org/D98798

[clang-tidy] Allow opt-in or out of some commonly occuring patterns in NarrowingConversionsCheck.

Within clang-tidy's NarrowingConversionsCheck.
* Allow opt-out of some common occurring patterns, such as:
  - Implicit casts between types of equivalent bit widths.
  - Implicit casts occurring from the return of a ::size() method.
  - Implicit casts on size_type and difference_type.
* Allow opt-in of errors within template instantiations.

This will help projects adopt these guidelines iteratively.
Developed in conjunction with Yitzhak Mandelbaum (ymandel).

Patch by Stephen Concannon!

Differential Revision: https://reviews.llvm.org/D99543

[PhaseOrdering] Add test for missing vectorization with NewPM.

[SCEV] Add loop-guard pessimizing test with step = 2.

[LoopFlatten] Simplify loops so that the pass can operate on unsimplified loops.

The loop flattening pass requires loops to be in simplified form. If the
loops are not in simplified form, the pass cannot operate. This patch
simplifies all loops before flattening. As a result, all loops will be
simplified regardless of whether anything ends up being flattened.

This change was inspired by observing a certain loop that was not flatten
because the loops were not in simplified form. This loop is added as a
test to verify that it is now flattened.

Differential Revision: https://reviews.llvm.org/D102249

Change-Id: I45bcabe70fb99b0d89f0effafc82eb9e0585ec30

[cmake] Add support for multiple distributions

LLVM's build system contains support for configuring a distribution, but
it can often be useful to be able to configure multiple distributions
(e.g. if you want separate distributions for the tools and the
libraries). Add this support to the build system, along with
documentation and usage examples.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D89177

[mlir][linalg] Fixed issue generating reassociation map with Rank-0 types

Rank-0 case causes a graph during linalg reshape operation.

Differential Revision: https://reviews.llvm.org/D102282

Remove AST inclusion from Basic include

That's a cyclic dependency. NFC.

[mlir][openacc] Add OpenACC translation to LLVM IR (enter_data op create/copyin)

This patch begins to translate acc.enter_data operation to call to tgt runtime call.
It currently only translate create/copyin operands of memref type. This acts as a basis to add support
for FIR types in the Flang/OpenACC support. It follows more or less a similar path than clang
with `omp target enter data map` directives.
This patch is taking a different approach than D100678 and perform a translation to LLVM IR
and make use of the OpenMPIRBuilder instead of doing a conversion to the LLVMIR dialect.

OpenACC support in Flang will rely on the current OpenMP runtime where 1:1 lowering can be
applied. Some extension will be added where features are not available yet.

Big part of this code will be shared for other standalone data operations in the OpenACC
dialect such as acc.exit_data and acc.update.

It is likely that parts of the lowering can also be shared later with the ops for
standalone data directives in the OpenMP dialect when they are introduced.

This is an initial translation and it probably needs more work.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D101504

[NFC][clang][Codegen] Split ThunkInfo into it's own header

Otherwise we'll have issues with forward definition of GlobalDecl.

Split off from https://reviews.llvm.org/D100388

[NFCI][clang][Codegen] CodeGenVTables::addVTableComponent(): use getGlobalDecl

It does the same thing.
Split off from https://reviews.llvm.org/D100388

[X86] Fix -Wunused-lambda-capture

[CMake][ELF] Add -fno-semantic-interposition and -Bsymbolic-functions

llvm-dev message: https://lists.llvm.org/pipermail/llvm-dev/2021-May/150465.html

In an ELF shared object, a default visibility defined symbol is preemptible by default.
This creates some missed optimization opportunities. -fno-semantic-interposition can optimize -fPIC:

* in Clang: avoid GOT/PLT cost for variable access/function calls to external linkage definition in the same TU
* in GCC: enable interprocedural optimizations (including inlining) and avoid PLT

See https://gist.github.com/MaskRay/2d4dfcfc897341163f734afb59f689c6 for more information.

-Bsymbolic-functions is more aggressive than -fvisibility-inlines-hidden (present since 2012) as it applies
to all function definitions. It can

* avoid PLT for cross-TU function calls && reduce dynamic symbol lookup
* reduce dynamic symbol lookup for taking function addresses and optimize out GOT/TOC on x86-64/ppc64

With both options, the libLLVM.so and libclang-cpp.so performance should
be closer to PIE binary linking against `libLLVM*.a` and `libclang*.a`

(In a -DLLVM_TARGETS_TO_BUILD=X86 build, the number of JUMP_SLOT decreases from 12716 to 1628, and the number of GLOB_DAT decreases from 1918 to 1313
The built clang with `-DLLVM_LINK_LLVM_DYLIB=on -DCLANG_LINK_CLANG_DYLIB=on` is significantly faster.
See the Linux kernel build result https://bugs.archlinux.org/task/70697
)

Some implication:

Interposing a subset of functions is no longer supported.
(This is fragile anyway and cannot really be supported. For Mach-O we don't use
`ld -interpose`, so interposition is not supported on Mach-O at all.)

Compiling a program which takes the address of any LLVM function with
`{gcc,clang} -fno-pic` and expects the address to equal to the address taken
from libLLVM.so or libclang-cpp.so is unsupported. I am fairly confident that
llvm-project shouldn't have different behaviors depending on such pointer
equality (as we've been using -fvisibility-inlines-hidden which applies to
inline functions for a long time), but if we accidentally do, users should be
aware that they should not make assumption on pointer equality in `-fno-pic`
mode.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D102090

Update static bound checker for Linalg to cover decreasing cases

The current static checker for linalg does not work on the decreasing
index cases well. So, this is to Update the current static bound checker
for linalg to cover decreasing index cases.

Reviewed By: hanchung

Differential Revision: https://reviews.llvm.org/D102302

[X86][AVX] Fold concat(ps*lq(x,32),ps*lq(y,32)) -> shuffle(concat(x,y),zero) (PR46621)

On AVX1 targets we can handle v4i64 logical shifts by 32 bits as a pair of v8f32 shuffles with zero.

I was hoping to put this in LowerScalarImmediateShift, but performing that early causes regressions where other instructions were respliting the subvectors.

[mlir][sparse] keep runtime support library signature consistent

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D102285

[AArch64][GlobalISel] Add MMOs to constant pool loads to allow LICM hoisting.

This caused performance regressions vs SDAG on SingleSource/Benchmarks/Adobe-C++

[lld-macho] Implement branch-range-extension thunks

Extend the range of calls beyond an architecture's limited branch range by first calling a thunk, which loads the far address into a scratch register (x16 on ARM64) and branches through it.

Other ports (COFF, ELF) use multiple passes with successively-refined guesses regarding the expansion of text-space imposed by thunk-space overhead. This MachO algorithm places thunks during MergedOutputSection::finalize() in a single pass using exact thunk-space overheads. Thunks are kept in a separate vector to avoid the overhead of inserting into the `inputs` vector of `MergedOutputSection`.

FIXME:
* arm64-stubs.s test is broken
* add thunk tests
* Handle thunks to DylibSymbol in MergedOutputSection::finalize()

Differential Revision: https://reviews.llvm.org/D100818

[libomptarget][amdgpu][nfc] Expand errorcheck macros

[libomptarget][amdgpu][nfc] Expand errorcheck macros

These macros expand to continue, which is confusing, or exit,
which is incompatible with continuing execution on offloading fail.

Expanding the macros in place makes the code look untidy but the
control flow obvious and amenable to improving. In particular, exit
becomes easier to eliminate.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D102230

[SystemZ][z/OS] Fix warning caused by umask returning a signed integer type

On z/OS, umask() returns an int because mode_t is type int, however it is being compared to an unsigned int. This patch fixes the following warning we see when compiling Path.cpp.

```
comparison of integers of different signs: 'const int' and 'const unsigned int'
```

Reviewed By: muiez

Differential Revision: https://reviews.llvm.org/D102326

[docs] Fix documentation for bugprone-dangling-handle

string_view isn't experimental anymore.
This check has always handled both forms.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D102313

[MLIR] Factor pass timing out into a dedicated timing manager

This factors out the pass timing code into a separate `TimingManager`
that can be plugged into the `PassManager` from the outside. Users are
able to provide their own implementation of this manager, and use it to
time additional code paths outside of the pass manager. Also allows for
multiple `PassManager`s to run and contribute to a single timing report.

More specifically, moves most of the existing infrastructure in
`Pass/PassTiming.cpp` into a new `Support/Timing.cpp` file and adds a
public interface in `Support/Timing.h`. The `PassTiming` instrumentation
becomes a wrapper around the new timing infrastructure which adapts the
instrumentation callbacks to the new timers.

Reviewed By: rriddle, lattner

Differential Revision: https://reviews.llvm.org/D100647

[PowerPC] Fix definitions of CMPRB8, CMPEQB, CMPRB, SETB in PPCInstr64Bit.td and PPCInstrInfo.td

[AMDGPU] Disable the SIFormMemoryClauses pass at -O1

This patch disables the SIFormMemoryClauses pass at -O1. This pass has a
significant impact on compilation time, so we only want it to be enabled
starting from -O2.

Differential Revision: https://reviews.llvm.org/D101939

Fix grammar in README.md

[X86][AVX] combineConcatVectorOps - add ConcatSubOperand helper. NFCI.

Pull out repeated code to create a concat_vectors of the same operand from all subvecs.

[X86][AVX] Add v4i64 shift-by-32 tests

AVX1 could perform this as a v8f32 shuffle instead of splitting - based off PR46621

[TargetLowering] Improve legalization of scalable vector types

This patch extends the vector type-conversion and legalization capabilities of
scalable vector types.

Firstly, `vscale x 1` types now behave more like the corresponding `vscale x
2+` types. This enables the integer promotion legalization of extended scalable
types, such as the promotion of `<vscale x 1 x i5>` to `<vscale x 1 x i8>`.

These `vscale x 1` types are also now better handled by
`getVectorTypeBreakdown`, where what looks like older handling for 1-element
fixed-length vector types was spuriously updated to include scalable types.

Widening of scalable types is now better supported, by using `INSERT_SUBVECTOR`
to insert the smaller scalable vector "value" type into the wider scalable
vector "part" type. This allows AArch64 to pass and return `vscale x 1` types
by value by widening.

There are still cases where we are unable to legalize `vscale x 1` types, such
as where expansion would require splitting the vector in two.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D102073

[mlir][openacc] Conversion of data operand to LLVM IR dialect

Add a conversion pass to convert higher-level type before translation.
This conversion extract meangingful information and pack it into a struct that
the translation (D101504) will be able to understand.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D102170

[OpenCL] Remove pragma requirement from Arm dot extension.

This removed the pointless need for extension pragma since
it doesn't disable anything properly and it doesn't need to
enable anything that is not possible to disable.

The change doesn't break existing kernels since it allows to
compile more cases i.e. without pragma statements but the
pragma continues to be accepted.

Differential Revision: https://reviews.llvm.org/D100985

[llvm-cov][test] Add test coverage for "gcov" implying "llvm-cov gcov" compatibility.

Much like other LLVM binary utilities, `llvm-cov` has a symlink compatibility feature where it runs in `gcov` compatibility mode if the binary name ends in `gcov`. This is identical to invoking `llvm-cov gcov ...`.

Differential Revision: https://reviews.llvm.org/D102299

[CUDA][HIP] Fix device template variables

Currently clang does not emit device template variables
instantiated only in host functions, however, nvcc is
able to do that:

https://godbolt.org/z/fneEfferY

This patch fixes this issue by refactoring and extending
the existing mechanism for emitting static device
var ODR-used by host only. Basically clang records
device variables ODR-used by host code and force
them to be emitted in device compilation. The existing
mechanism makes sure these device variables ODR-used
by host code are added to llvm.compiler-used, therefore
they are guaranteed not to be deleted.

It also fixes non-ODR-use of static device variable by host code
causing static device variable to be emitted and registered,
which should not.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D102237

[ValueTypes] Rename MVT::getVectorNumElements() to MVT::getVectorMinNumElements(). Fix some misuses of getVectorNumElements()

getVectorNumElements() returns a value for scalable vectors
without any warning so it is effectively getVectorMinNumElements().
By renaming it and making getVectorNumElements() forward to
it, we can insert a check for scalable vectors into getVectorNumElements()
similar to EVT. I didn't do that in this patch because there are still more
fixes needed, but I was able to temporarily do it and passed the RISCV
lit tests with these changes.

The changes to isPow2VectorType and getPow2VectorType are copied from EVT.

The change to TypeInfer::EnforceSameNumElts reduces the size of AArch64's isel table.
We're now considering SameNumElts to require the scalable property to match which
removes some unneeded type checks.

This was motivated by the bug I fixed yesterday in 80b9510806cf11c57f2dd87191d3989fc45defa8

Reviewed By: frasercrmck, sdesmalen

Differential Revision: https://reviews.llvm.org/D102262

Revert "[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics"

This reverts commit 6c80361b8474535852afb2f7201370fb5f410091.
Breaks PowerPC Big Endian buildbots.

[DAGCombiner] Fix DAG combine store elimination, different address space.

Fixes a bug in the DAG combiner that eliminates the stores because it missed
to inspect the address space of the pointers.

%v = load %ptr_as1
// no chain side effect
store %v, %ptr_as2

As well as

store %v, %ptr_as1
store %v, %ptr_as2

Fixes a test for above in X86.

Differential Revision: https://reviews.llvm.org/D102096

[DAGCombiner] Add test exposing bug in DAG combine.

Adds a test in X86, exposing a bug in DAG combine eliminating stores that
are the same value but no the same address space.

Differential Revision: https://reviews.llvm.org/D102243

[CodeGen][AArch64][SVE] Fold [rdffr, ptest] => rdffrs; bugfix for optimizePTestInstr

When a ptest is used to set flags from the output of rdffr, the ptest
can be eliminated, using a flags-setting rdffrs instead.

Additionally, check that nothing consumes flags between rdffr and ptest;
this case appears to have been missed previously.

* There is no unpredicated RDFFRS instruction.
* If substituting RDFFR_PP, require that the mask argument of the
  PTEST matches that of the RDFFR_PP.
* Move some precondition code up inside optimizePTestInstr, so that it
  covers the new code paths for RDFFR which return earlier.
  * Only consider RDFFR, PTEST in same basic block.
  * Check for other flag setting instructions between the two, abort if
    found.
  * Drop an old TODO comment about removing dead PTEST instructions.

RDFFR_P to follow in later patch.

Differential Revision: https://reviews.llvm.org/D101357

[clang][AVR] Redefine some types to be compatible with avr-gcc

Reviewed By: dylanmckay

Differential Revision: https://reviews.llvm.org/D100701

[NFC] Use variable GEP index in vec_demanded_elts tests

I've changed a test in each of these files:

Transforms/InstCombine/vec_demanded_elts.ll
Transforms/InstCombine/vec_demanded_elts-inseltpoison.ll

to use a variable GEP index instead of a constant value so that
we're testing the more general case.

[Passes] Reenable the relative lookup table converter pass for ELF and COFF on aarch64

The bug (PR50227, affecting COFF) that caused the revert in
6f5670a4c3d8c079d4b676140ee69e5cc235d5a8 has been fixed in
382c505d9cfca8adaec47aea2da7bbcbc00fc05c now, so it should be safe
to reenable the pass for that target (and ELF).

In PR50227 it's also mentioned that the same pass seems to cause
problems on aarch64 on darwin, so leaving it disabled there for now.

[llvm-objdump] Exclude __mh_*_header symbols during MachO disassembly

`__mh_(execute|dylib|dylinker|bundle|preload|object)_header` are special symbols whose values hold the VMA of the Mach header to support introspection. They are attached to the first section in `__TEXT`, even though their addresses are outside `__TEXT`, and they do not refer to code.

It is normally harmless, but when the first section of `__TEXT` has no other symbols, `__mh_*_header` is considered by the disassembler when determing function boundaries. Since `__mh_*_header` refers to an address outside `__TEXT`, the boundary determination fails and disassembly quits.

Since `__TEXT,__text` normally has symbols, this bug is obscured. Experiments placing `__stubs` and `__stub_helper` first exposed the bug, since neither has symbols.

Differential Revision: https://reviews.llvm.org/D101786

[AMDGPU] Improve Codegen for build_vector

Improve the code generation of build_vector.
Use the v_pack_b32_f16 instruction instead of
v_and_b32 + v_lshl_or_b32

Differential Revision: https://reviews.llvm.org/D98081

Patch by Julien Pagès!

[InstCombine] ~(C + X) --> ~C - X (PR50308)

We can not rely on (C+X)-->(X+C) already happening,
because we might not have visited that `add` yet.
The added testcase would get stuck in an endless combine loop.