review.tizen.org Git - platform/upstream/llvm.git/log

[RISCV] Return false from isShuffleMaskLegal except for splats.

We don't support any other shuffles currently.

This changes the bswap/bitreverse tests that check for this in
their expansion code. Previously we expanded a byte swapping
shuffle through memory. Now we're scalarizing and doing bit
operations on scalars to swap bytes.

In the future we can probably use vrgather.vx to do a byte swap
shuffle.

[libcxx] adds concept std::copyable

Implements parts of:
- P0898R3 Standard Library Concepts
- P1754 Rename concepts to standard_case for C++20, while we still can

Depends on D97359

Reviewed By: EricWF, #libc, Quuxplusone, zoecarver

Differential Revision: https://reviews.llvm.org/D97443

[libcxx] adds concept std::movable

Implements parts of:
- P0898R3 Standard Library Concepts
- P1754 Rename concepts to standard_case for C++20, while we still can

Depends on D97162

Reviewed By: EricWF, #libc, Quuxplusone

Differential Revision: https://reviews.llvm.org/D97359

BPF: provide better error message for unsupported atomic operations

Currently, BPF backend does not support all variants of
  atomic_load_{add,and,or,xor}, atomic_swap and atomic_cmp_swap
For example, it only supports 32bit (with alu32 mode) and 64bit
operations for atomic_load_{and,or,xor}, atomic_swap and
atomic_cmp_swap. Due to historical reason, atomic_load_add is
always supported with 32bit and 64bit.

If user used an unsupported atomic operation, currently,
codegen selectiondag cannot find bpf support and will issue
a fatal error. This is not user friendly as user may mistakenly
think this is a compiler bug.

This patch added Custom rule for unsupported atomic operations
and will emit better error message during ReplaceNodeResults()
callback. The following is an example output.

  $ cat t.c
  short sync(short *p) {
    return  __sync_val_compare_and_swap (p, 2, 3);
  }
  $ clang -target bpf -O2 -g -c t.c
  t.c:2:11: error: Unsupported atomic operations, please use 64 bit version
    return  __sync_val_compare_and_swap (p, 2, 3);
            ^
  fatal error: error in backend: Cannot select: t19: i64,ch =
    AtomicCmpSwap<(load store seq_cst seq_cst 2 on %ir.p)> t0, t2,
    Constant:i64<2>, Constant:i64<3>, t.c:2:11
    t2: i64,ch = CopyFromReg t0, Register:i64 %0
      t1: i64 = Register %0
    t11: i64 = Constant<2>
    t10: i64 = Constant<3>
  In function: sync
  PLEASE submit a bug report ...

Fatal error will still happen since we did not really do proper
lowering for these unsupported atomic operations. But we do get
a much better error message.

Differential Revision: https://reviews.llvm.org/D98471

Revert "[compiler-rt][asan] Make wild-pointer crash error more useful"

This reverts commit f65e1aee4004c25fbeacd5024de1d17f0a7ebc5c.

[libFuzzer] Add attribute noinline on Fuzzer::ExecuteCallback().

The inlining of this function needs to be disabled as it is part of the
inpsected stack traces. It's string representation will look different
depending on if it was inlined or not which will cause it's string comparison
to fail.

When it was inlined in only one of the two execution stacks,
minimize_two_crashes.test failed on SystemZ. For details see
https://bugs.llvm.org/show_bug.cgi?id=49152.

Reviewers: Ulrich Weigand, Matt Morehouse, Arthur Eubanks

Differential Revision: https://reviews.llvm.org/D97975

[AMDGPU] Do not annotate an else branch if there is a kill

As llvm.amdgcn.kill is lowered to a terminator it can cause
else branch annotations to end up in the wrong block.
Do not annotate conditionals as else branches where there is
a kill to avoid this.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D97427

[compiler-rt][asan] Make wild-pointer crash error more useful

Right now, when you have an invalid memory address, asan would just crash and does not offer much useful info.
This patch attempted to give a bit more detail on the access.

Differential Revision: https://reviews.llvm.org/D98280

Revert "[NPM][CGSCC] FunctionAnalysisManagerCGSCCProxy: do not clear immutable function passes"

This reverts commit 5eaeb0fa67e57391f5584a3f67fdb131e93afda6.

It appears there are analyses that assume clearing - example:
https://lab.llvm.org/buildbot#builders/36/builds/5964

[compiler-rt] PR#39514 Support versioned llvm-symbolizer binaries

Some linux distributions produce versioned llvm-symbolizer binaries,
e.g. my llvm-11 installation puts the symbolizer binary at
/usr/bin/llvm-symbolizer-11.0.0 . However if you then try to run
a binary containing ASAN with
ASAN_SYMBOLIZER_PATH=..../llvm-symbolizer-FOO , it will fail on startup
with "isn't a known symbolizer".

Although it is possible to work around this by setting up symlinks,
that's kindof ugly - supporting versioned binaries is a nicer solution.
(There are now multiple stack overflow and blog posts talking about
this exact issue :) .)

Originally added in:
https://reviews.llvm.org/D8285

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D97682

[mlir][Vector] Lowering of transfer_read/write to vector.load/store

This patch introduces progressive lowering patterns for rewriting
vector.transfer_read/write to vector.load/store and vector.broadcast
in certain supported cases.

Reviewed By: dcaballe, nicolasvasilache

Differential Revision: https://reviews.llvm.org/D97822

[NPM][CGSCC] FunctionAnalysisManagerCGSCCProxy: do not clear immutable function passes

Check with the analysis result by calling invalidate instead of clear on
the analysis manager.

Differential Revision: https://reviews.llvm.org/D98440

void cast to suppress -Wunused-variable in non-asserts build

Move (llvm-original-di-preservation) test example output into the Inputs directory (since it's an input to the test execution)

The "Inputs" subdirectory is used for all files read by the test, not
only those used as input to the execution - so even though this file is
used as a golden reference for the output of the test, it's still an
input to the test execution (it is read in the process of executing the
test).

[NFC] Test commit. Add empty lines.

[RuntimeDyld] Speedup resolution of relocations to external symbols

From what I can tell, the loop inside applyExternalSymbolRelocations()
used to call getSymbolAddress(). After the JITSymbolResolver interface
redesign, the functionality has changed, and the loop should no longer
trigger repopulation of ExternalSymbolRelocations. If that's the case,
there is no need to update the loop iterator manually, and
ExternalSymbolRelocations can be cleared at once. This way, when there
are many external symbols in the program, the function runs much faster.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D97531

[asan] disable no-fd test on darwin

If a log message is triggered between execv and child, this test fails.
In the meantime, disable the test to unblock CI

rdar://74992832

Reviewed By: delcypher

Differential Revision: https://reviews.llvm.org/D98453

[AMDGPU] Restrict image_msaa_load to MSAA dimension types

This instruction is only valid on 2D MSAA and 2D MSAA Array
surfaces. Remove intrinsic support for other dimension types,
and block assembly for unsupported dimensions.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D98397

Replace use of OperationState with builder::create in GPU Kernel Outlining (NFC)

OperationState is a low level API that is rarely indicated, the builder
API convenient wrapper is preferred when possible.

[AMDGPU] Don't check hasStackObjects() when reserving VGPR

We have amdgpu_gfx functions that have high register pressure. If
we do not reserve VGPR for SGPR spill, we will fall into the path
to spill the SGPR to memory, which does not only have correctness issue,
but also have really bad performance.

I don't know why there is the check for hasStackObjects(), in our case,
we don't have stack objects at the time of finalizeLowering(). So just
remove the check that we always reserve a VGPR for possible SGPR spill
in non-entry functions.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D98345

[AMDGPU] Free reserved VGPR if no SGPR spill

I met some code generation behavior change when I tried to remove
the hasStackObject() check when reserving VGPR for SGPR spill.
For example, the function `callee_no_stack_no_fp_elim_all` in the lit
test file `callee-frame-setup.ll`.
The generated code changed from:
```
  s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
  s_mov_b32 s4, s33
  s_mov_b32 s33, s32
  s_mov_b32 s33, s4
  s_setpc_b64 s[30:31]
```

into something like:
```
  s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
  v_writelane_b32 v63, s33, 0
  s_mov_b32 s33, s32
  v_readlane_b32 s33, v63, 0
  s_setpc_b64 s[30:31]
```

I think we still prefer the old version where only scalar instructions are needed.
The idea here is free the reserved VGPR if no SGPR spills. So we will very likely
to use a free SGPR for fp/sp spill.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D98344

[crt][fuzzer] Fix up various numeric conversions

Attempting to build a standalone libFuzzer in Fuchsia's default toolchain for the purpose of cross-compiling the unit tests revealed a number of not-quite-proper type conversions. Fuchsia's toolchain include `-std=c++17` and `-Werror`, among others, leading to many errors like `-Wshorten-64-to-32`, `-Wimplicit-float-conversion`, etc.

Most of these have been addressed by simply making the conversion explicit with a `static_cast`. These typically fell into one of two categories: 1) conversions between types where high precision isn't critical, e.g. the "energy" calculations for `InputInfo`, and 2) conversions where the values will never reach the bits being truncated, e.g. `DftTimeInSeconds` is not going to exceed 136 years.

The major exception to this is the number of features: there are several places that treat features as `size_t`, and others as `uint32_t`. This change makes the decision to cap the features at 32 bits. The maximum value of a feature as produced by `TracePC::CollectFeatures` is roughly:
(NumPCsInPCTables + ValueBitMap::kMapSizeInBits + ExtraCountersBegin() - ExtraCountersEnd() + log2(SIZE_MAX)) * 8

It's conceivable for extremely large targets and/or extra counters that this limit could be reached. This shouldn't break fuzzing, but it will cause certain features to collide and lower the fuzzers overall precision. To address this, this change adds a warning to TracePC::PrintModuleInfo about excessive feature size if it is detected, and recommends refactoring the fuzzer into several smaller ones.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D97992

[RISCV] Add test cases for fixed vector bitreverse, bswap, ctlz, cttz, and ctpop.

Codegen needs to be improved, but I wanted to check for crashes.

Fix use of deprecated IRBuilder::CreateLoad in Kaleidoscope

Reland: [mlir][Affine][Vector] Add initial support for 'iter_args' to Affine vectorizer.

This patch adds support for vectorizing loops with 'iter_args' when those loops
are not a vector dimension. This allows vectorizing outer loops with an inner
'iter_args' loop (e.g., reductions). Vectorizing scenarios where 'iter_args'
loops are vector dimensions would require more work (e.g., analysis,
generating horizontal reduction, etc.) not included in this patch.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D97892

[PDB] Improve warning for corrupt debug info

The S_[GL]PROC32_ID symbol records are supposed to point to function ID
records. If they don't, they are corrupt. The warning message here was
very technical, but a user has encountered it in the wild. Add some more
information and some more testing.

[Matrix] Add support for matrix-by-scalar division.

This patch extends the matrix spec to allow matrix-by-scalar division.

Originally support for `/` was left out to avoid ambiguity for the
matrix-matrix version of `/`, which could either be elementwise or
specified as matrix multiplication M1 * (1/M2).

For the matrix-scalar version, no ambiguity exists; `*` is also
an elementwise operation in that case. Matrix-by-scalar division
is commonly supported by systems including Matlab, Mathematica
or NumPy.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D97857

Reland: [mlir][Vector][Affine] Improve affine vectorizer algorithm

This patch replaces the root-terminal vectorization approach implemented in the
Affine vectorizer with a topological order approach that vectorizes all the
operations within the target loop nest. These are the most important changes
introduced by the new algorithm:
  * Removed tracking of root and terminal ops. Existing vectorization
    functionality is preserved and extended so that loop nests without
    root-terminal chains can be vectorized.
  * Vectorizing a loop nest now only requires a single topological traversal.
  * A new vector loop nest is incrementally built along the vectorization
    process. The original scalar loop is kept intact. No cloning guard is needed
    to recover the scalar loop if vectorization fails. This approach also
    simplifies the challenging task of replacing a loop operation amid the
    vectorization process without invalidating the analysis information that
    depends on the original loop.
  * Vectorization of specific operations has been implemented as independent,
    preparing them to be moved to a potential vectorization interface.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D97442

[AMDGPU] Remove dead MTBUF patterns

These patterns are obviously dead, they are using format
operand which is not selected and we have no corresponding
SelectMUBUF() function.

Differential Revision: https://reviews.llvm.org/D98451

[dfsan] Disable testing origin tracking on non x86_64 arch

Fix test cases related to https://reviews.llvm.org/D95835.

[compiler-rt] Partially revert 8bd2722f65cfd7883ed9769f7bad3ff50e4c6905

Don't normalize arm architecture names; doing that loses the ability
to pick the right implementation of builtins for each architecture
variant. When building compiler-rt builtins as part of a
runtimes build, builtins for multiple armv* variants could be built
in the same directory, and with the simplified architecture name,
they'd all be built in the same directory, overlapping each other.

[Attributor] Don't access pointer elem type in constructPointer (NFC)

Splitting this out as the change is non-trivial: The way this code
handled pointer types doesn't really make sense, as GEPs can only
apply an offset to the outermost pointer, but can't drill down
into interior pointer types (which would require dereferencing
memory).

Instead give special treatment to the first (pointer) index.
I've hardcoded it to zero as that's the only way the function is
used right now, but handling non-zero indexes would be
straightforward.

The original goal here was to have an element type for CreateGEP.

[InstrProfiling] Generate runtime hook for ELF platforms

When using -fprofile-list to selectively apply instrumentation only
to certain files or functions, we may end up with a binary that doesn't
have any counters in the case where no files were selected. However,
because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the
runtime would still be pulled in and incur some non-trivial overhead,
especially in the case when the continuous or runtime counter relocation
mode is being used. A better way would be to pull in the profile runtime
only when needed by declaring the __llvm_profile_runtime symbol in the
translation unit only when needed.

This approach was already used prior to 9a041a75221ca, but we changed it
to always generate the __llvm_profile_runtime due to a TAPI limitation.
Since TAPI is only used on Mach-O platforms, we could use the early
emission of __llvm_profile_runtime there, and on other platforms we
could change back to the earlier approach where the symbol is generated
later only when needed. We can stop passing -u__llvm_profile_runtime to
the linker on Linux and Fuchsia since the generated undefined symbol in
each translation unit that needed it serves the same purpose.

Differential Revision: https://reviews.llvm.org/D98061

Test commit

[SLP] Fix crash when matching associative reduction for integer min/max.

Associative reduction matcher in SLP begins with select instruction but when
it reached call to llvm.umax (or alike) via def-use chain the latter also matched
as UMax kind. The routine's later code assumes matched instruction to be a select
and thus it merely died on the first encountered cast that did not fit.

Differential Revision: https://reviews.llvm.org/D98432

[libc++] [docs] Add link to clang status page for C++2b and fix anchor for C++20.

[mlir][StorageUniquer] Properly call the destructor on non-trivially destructible storage instances

This allows for storage instances to store data that isn't uniqued in the context, or contain otherwise non-trivial logic, in the rare situations that they occur. Storage instances with trivial destructors will still have their destructor skipped. A consequence of this is that the storage instance definition must be visible from the place that registers the type.

Differential Revision: https://reviews.llvm.org/D98311

[clangd] Make ProjectAwareIndex optionally sync

Depends on D98029.

Differential Revision: https://reviews.llvm.org/D98165

[clangd] Add config block for Completion and option for AllScopes

Depends on D98029

Differential Revision: https://reviews.llvm.org/D98037

[IndirectCallPromotion] Don't strip ".__uniq." suffix when it strips
".llvm." suffix.

Currently IndirectCallPromotion simply strip everything after the first "."
in LTO mode, in order to match the symbol name and the name with ".llvm."
suffix in the value profile. However, if -funique-internal-linkage-names
and thinlto are both enabled, the name may have both ".__uniq." suffix and
".llvm." suffix, and the current mechanism will strip them both, which is
unexpected. The patch fixes the problem.

Differential Revision: https://reviews.llvm.org/D98389

[libcxx] Test accessing a directory on windows that gives "access denied" errors

Fix handling of skip_permission_denied on windows; after converting
the return value of GetLastError() to a standard error_code, ec.value()
is in the standard errc range, not a native windows error code. This
was missed in 156180727d6c347eda3ba749730707acb8a48093.

The directory "C:\System Volume Information" does seem to exist and
have these properties on most relevant contempory setups.

Differential Revision: https://reviews.llvm.org/D98166

[SelectionDAG] Improve scalarization of irregular vector types

Use a more general strategy when splitting a vector into scalar parts (and vice-versa) to correctly handle vector types whose element size is not a power of 2 (and a multiple of 8).

Reviewed By: atanasyan

Differential Revision: https://reviews.llvm.org/D98273

[MIPS] Fix lowering of irregular vector arguments

The code deciding how to split the vector in register-sized integers used the integer division operator, thus rounding down the result.
Correct the computation for irregularly-sized types (non-power-of-two, non multiple of 8) by rounding the division result upwards.

Reviewed By: atanasyan

Differential Revision: https://reviews.llvm.org/D98189

[lldb] Fix the man page build

In D94489 we changed the way we build the docs and now have some additional
dependencies to generate the Python API docs. As the same sphinx project is
generating the man pages for LLDB it should have in theory the same setup code
that sets up the mocked LLDB module.

However, as we don't have that setup code the man page generation just fails as
there is no mocked LLDB module and the Python API generation errors out.

The man page anyway doesn't cover the Python API so I don't think there is any
point of going through the whole process (and requiring the sphinx plugins) just
to generate the (eventually unused) Python docs.

This patch just skips the relevant Python API generation when we are building
the man page.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D98441

[mlir][Vector][Affine] Fix heap-use-after-free in vectorizer

This patch fixes a heap-use-after-free introduced by the recent changes
in the vectorizer: https://reviews.llvm.org/rG95db7b4aeaad590f37720898e339a6d54313422f
The problem is due to the way candidate loops are visited. All candidate loops
are pattern-matched beforehand using the 'NestedMatch' utility. These matches may
intersect with each other so it may happen that we try to vectorize a loop that
was previously vectorized. The new vectorization algorithm replaces the original
loops that are vectorized with new loops and, therefore, any reference to the
original loops in the pre-computed matches becomes invalid.

This patch fixes the problem by classifying the candidate matches into buckets
before vectorization. Each bucket contains all the matches that intersect. The
vectorizer uses these buckets to make sure that we only vectorize *one* match from
each bucket, at most.

Differential Revision: https://reviews.llvm.org/D98382

[gn build] Port 5433a79176a3

[lld-macho] Unbreak build breakage from rG1752f2850685

[lld-macho] Avoid requiring shell in tests

There are 3 remaining tests that still have `REQUIRE: shell`:
* color-diagnostics.test -- seems necessary for ANSI escape sequence support
* stabs.s -- the shell part could be removed, but I don't think we can support
the test on Windows anyway due to its reliance on `touch` to set the modtime
* framework.s -- uses symlinks, I'm not sure this works on Windows

Addresses PR49512.

Reviewed By: #lld-macho, alexshap

Differential Revision: https://reviews.llvm.org/D98395

[lld-macho][nfc] Refactor subtractor reloc handling

SUBTRACTOR relocations are always paired with UNSIGNED
relocations to indicate a pair of symbols whose address difference we
want. Functionally they are like a single relocation: only one pointer
gets written / relocated. Previously, we would handle these pairs by
skipping over the SUBTRACTOR relocation and writing the pointer when
handling the UNSIGNED reloc. This diff reverses things, so we write
while handling SUBTRACTORs and skip over the UNSIGNED relocs instead.

Being able to distinguish between SUBTRACTOR and UNSIGNED relocs in the
write phase (i.e. inside `relocateOne`) is useful for the upcoming range
check diff: we want to check that SUBTRACTOR relocs write signed values,
but UNSIGNED relocs (naturally) write unsigned values.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D98386

[lld-macho] Fix handling of X86_64_RELOC_SIGNED_{1,2,4}

The previous implementation miscalculated the addend, resulting
in an underflow. This meant that every SIGNED_N section relocation would
be associated with the last subsection (since the addend would now be a
huge number). We were "lucky" that this mistake was typically cancelled
out -- 64-to-32-bit-truncation meant that the final value was correct,
as long as subsections were not rearranged.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D98385

[lld-macho][nfc] Create Relocations.{h,cpp} for relocation-specific code

This more closely mirrors the structure of lld-ELF.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D98384

[lld-macho][nfc] Remove `MachO::` prefix where possible

Previously, SyntheticSections.cpp did not have a top-level `using namespace
llvm::MachO` because it caused a naming conflict: `llvm::MachO::Symbol` would
collide with `lld::macho::Symbol`.

`MachO::Symbol` represents the symbols defined in InterfaceFiles (TBDs). By
moving the inclusion of InterfaceFile.h into our .cpp files, we can avoid this
name collision in other files where we are only dealing with LLD's own symbols.

Along the way, I removed all unnecessary "MachO::" prefixes in our code.

Cons of this approach: If TextAPI/MachO/Symbol.h gets included via some other
header file in the future, we could run into this collision again.

Alternative 1: Have either TextAPI/MachO or BinaryFormat/MachO.h use a different
namespace. Most of the benefit of `using namespace llvm::MachO` comes from being
able to use things in BinaryFormat/MachO.h conveniently; if TextAPI was under a
different (and fully-qualified) namespace like `llvm::tapi` that would solve our
problems. Cons: lots of files across llvm-project will need to be updated, and
folks who own the TextAPI code need to agree to the name change.

Alternative 2: Rename our Symbol to something like `LldSymbol`. I think this is
ugly.

Personally I think alternative #1 is ideal, but I'm not sure the effort to do it is
worthwhile, this diff's halfway solution seems good enough to me. Thoughts?

Reviewed By: #lld-macho, oontvoo, MaskRay

Differential Revision: https://reviews.llvm.org/D98149

[libcxx] [test] Disable a test regarding error behaviour for excessively long paths on windows

Checking for the existence of an invalid long path name isn't
an error in itself on windows.

Differential Revision: https://reviews.llvm.org/D98141

[flang] Handle type-bound procedures with alternate returns

If you specify a type-bound procedure with an alternate return, there
will be no symbol associated with that dummy argument.  In such cases,
the compiler's list of dummy arguments will contain a nullptr.  In our
analysis of the PASS arguments of type-bound procedures, we were
assuming that all dummy arguments had non-null symbols associated with
them and were using that assumption to get the name of the dummy
argument.  This caused the compiler to try to dereference a nullptr.

I fixed this by explicitly checking for a nullptr and, in such cases, emitting
an error message.  I also added tests that contain type-bound procedures with
alternate returns in both legal and illegal constructs to ensure that semantic
analysis is working for them.

Differential Revision: https://reviews.llvm.org/D98430

[SamplePGO] Skip inlinee profile scaling for sample loader inlining

For CGSCC inline, we need to scale down a function's branch weights and entry counts when thee it's inlined at a callsite. This is done through updateCallProfile. Additionally, we also scale the weigths for the inlined clone based on call site count in updateCallerBFI. Neither is needed for inlining during sample profile loader as it's using context profile that is separated from inlinee's own profile. This change skip the inlinee profile scaling for sample loader inlining.

Differential Revision: https://reviews.llvm.org/D98187

[Driver] Drop $sysroot/usr special case from Gentoo gcc-config detection

If --gcc-toolchain is specified, we should detect GCC installation there, and suppress other directories for detection.

Reviewed By: mgorny, manojgupta

Differential Revision: https://reviews.llvm.org/D97894

[UnitTests] Remove uses of deprecated CreateLoad() API

Missed this usage inside OpenMPIRBuilderTest.

[RISCV] Support fixed vector copysign.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98394

[ARM] Move t2DoLoopStart reg alloc hint

This adjusts the place that the t2DoLoopStart reg allocation hint is
inserted, adding it in the ARMTPAndVPTOptimizaionPass in a similar place
as other tail predicated loop optimizations. This removes the need for
doing so in a custom inserter, and should make the hint more accurate,
only adding it where we expect to create a DLS (not DLSTP or WLS).

[ARM] Improve WLS lowering

Recently we improved the lowering of low overhead loops and tail
predicated loops, but concentrated first on the DLS do style loops. This
extends those improvements over to the WLS while loops, improving the
chance of lowering them successfully. To do this the lowering has to
change a little as the instructions are terminators that produce a value
- something that needs to be treated carefully.

Lowering starts at the Hardware Loop pass, inserting a new
llvm.test.start.loop.iterations that produces both an i1 to control the
loop entry and an i32 similar to the llvm.start.loop.iterations
intrinsic added for do loops. This feeds into the loop phi, properly
gluing the values together:

  %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div)
  %wls0 = extractvalue { i32, i1 } %wls, 0
  %wls1 = extractvalue { i32, i1 } %wls, 1
  br i1 %wls1, label %loop.ph, label %loop.exit
...
loop:
  %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ]
  ..
  %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1)
  %cmp = icmp ne i32 %iv.next, 0
  br i1 %cmp, label %loop, label %loop.exit

The llvm.test.start.loop.iterations need to be lowered through ISel
lowering as a pair of WLS and WLSSETUP nodes, which each get converted
to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent
t2WhileLoopStart from being a terminator that produces a value,
something difficult to control at that stage in the pipeline. Instead
the t2WhileLoopSetup produces the value of LR (essentially acting as a
lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc).

These are then converted into a single t2WhileLoopStartLR at the same
point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop
to prevent them from progressing further in the pipeline. The
t2WhileLoopStartLR is a single instruction that takes a GPR and produces
LR, similar to the WLS instruction.

  %1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3
  t2B %bb.1
...
bb.2.loop:
  %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2
  ...
  %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2
  t2B %bb.3

The t2WhileLoopStartLR can then be treated similar to the other low
overhead loop pseudos, eventually being lowered to a WLS providing the
branches are within range.

Differential Revision: https://reviews.llvm.org/D97729

[AArch64] Fix -Wunused-but-set-variable in GCC non-debug build

[AArch64] Fix -Wunused-but-set-variable in GCC -DLLVM_ENABLE_ASSERTIONS=off non-debug build.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D98431

[PGO] Fix two issues in PGOMemOPSizeOpt.

1. PGOMemOPSizeOpt grabs only the first, up to five (by default) entries from
the value profile metadata and preserves the remaining entries for the fallback
memop call site. If there are more than five entries, the rest of the entries
would get dropped. This is fine for PGOMemOPSizeOpt itself as it only promotes
up to 3 (by default) values, but potentially not for other downstream passes
that may use the value profile metadata.

2. PGOMemOPSizeOpt originally assumed that only values 0 through 8 are kept
track of. When the range buckets were introduced, it was changed to skip the
range buckets, but since it does not grab all entries (only five), if some range
buckets exist in the first five entries, it could potentially cause fewer
promotion opportunities (eg. if 4 out of 5 were range buckets, it may be able to
promote up to one non-range bucket, as opposed to 3.) Also, combined with 1, it
means that wrong entries may be preserved, as it didn't correctly keep track of
which were entries were skipped.

To fix this, PGOMemOPSizeOpt now grabs all the entries (up to the maximum number
of value profile buckets), keeps track of which entries were skipped, and
preserves all the remaining entries.

Differential Revision: https://reviews.llvm.org/D97592

[IRBuilder] Deprecate CreateLoad APIs with implicit type

These APIs are not compatible with opaque pointers. Deprecate them
to avoid the introduction of further uses.

[RISCV] Handle vmv.x.s intrinsic for i64 vectors on RV32.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98372

[mlir] Remove uses of type-less CreateLoad() APIs (NFC)

For the use in LLVMOps.td I used the getPointerElementType()
escape hatch, as it's not obvious to me how the load type
should be properly obtained here.

[Polly] Remove uses of type-less CreateLoad() APIs (NFC)

These are incompatible with opaque pointers and are going away.
Explicitly specify the loaded type instead.

[ELF] Simplify isValidCIdentifier. NFC

[libcxx] Avoid intermediate string objects for substrings in windows operator/=

Check that appends with a path object doesn't do allocations, even
on windows.

Suggested by Marek in D98398. The patch might apply without D98398
(depending on how much of the diff context has to match), but doesn't
make much sense until after that patch has landed.

Differential Revision: https://reviews.llvm.org/D98412

[libcxx] [test] Use a string_view of the native path type in the concat test

This makes sure that no extra allocations happen on windows, fixing
earlier errors in the DisableAllocationGuard (in the second case that
is modified).

This is split out from D98398.

Differential Revision: https://reviews.llvm.org/D98406

[ELF] Support . and $ in symbol names in expressions

GNU ld supports `.` and `$` in symbol names while LLD doesn't support them in
`readPrimary` expressions. Using `.` can result in such an error:

```
https://github.com/ClangBuiltLinux/linux/issues/1318
ld.lld: error: ./arch/powerpc/kernel/vmlinux.lds:255: malformed number: .TOC.
>>> __toc_ptr = (DEFINED (.TOC.) ? .TOC. : ADDR (.got)) + 0x8000;
```

Allow `.` (ppc64 special symbol `.TOC.`) and `$` (RISC-V special symbol `__global_pointer$`).

Change `diag[3-5].test` to use an invalid character `^`.

Note: GNU ld allows `~` in non-leading positions of a symbol name. `~`
is not used in practice, conflicts with the unary operator, and can
cause some parsing difficulty, so this patch does not add it.

Differential Revision: https://reviews.llvm.org/D98306

[RISCV] Support extract_vector_elt for fixed and scalable masked registers.

This uses a really simple approach of converting to an i8 vector
and extracting. This is probably not the best approach especially
if you know the index is constant.

Other ideas:
-Store to stack temporary using vse1, load as scalar and shift.
-Sort of bitcast the vector to a vector of i8, slide down the
appropriate 8 bit element, copy to scalar, shift down the
correct bit within the 8 bits we extracted. Not exactly sure
how to describe such a bitcast from i1 vector to i8 vector
within the type system for elements less than 8.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98310

[ValueTypes][RISCV] Add MVT for v1f16.

RISCV makes all fixed vector MVTs with size less than or equal
to a command line option legal.

This didn't include v1f16 because it was missing but did include v1f32 and v1f64.

One test is affected where we did test this type, but it is a horizontal
reduction so it is non-sensical. Perhaps we should canonicalize that
away somewhere.

I'm not sure if we should be making v1 types legal, but this will at
least make RISCV consistent across all types.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98365

[mlir] fix cmake build

[mlir] Introduce data layout modeling subsystem

Data layout information allows to answer questions about the size and alignment
properties of a type. It enables, among others, the generation of various
linear memory addressing schemes for containers of abstract types and deeper
reasoning about vectors. This introduces the subsystem for modeling data
layouts in MLIR.

The data layout subsystem is designed to scale to MLIR's open type and
operation system. At the top level, it consists of attribute interfaces that
can be implemented by concrete data layout specifications; type interfaces that
should be implemented by types subject to data layout; operation interfaces
that must be implemented by operations that can serve as data layout scopes
(e.g., modules); and dialect interfaces for data layout properties unrelated to
specific types. Built-in types are handled specially to decrease the overall
query cost.

A concrete default implementation of these interfaces is provided in the new
Target dialect. Defaults for built-in types that match the current behavior are
also provided.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D97067

[mlir] Add LLVM loop codegen options to control software pipelining

Support specifying the II and disabling pipelining.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D98420

AMDGPU/GlobalISel: Improve private addressing mode matching

This enables the look-through-copy to hack around not correctly
regbankselecting constants to match the use bank.

GlobalISel: Fix off by one in finding explicit byval alignment

For attribute sets, the return index is at 0, and arguments start at
1. getParamAlignment adds the offset of 1, so we need to convert from
attribute index back to IR index.

AMDGPU/GlobalISel: Add more tests for byval arguments

[flang][OpenMP] Add semantic check for occurrence of multiple list items in aligned clause for simd directive

Reviewed By: kiranchandramohan, clementval

Differential Revision: https://reviews.llvm.org/D97964

[lldb] Add missing debugserver dependency to check-lldb

D96202 removes the test dependency on debugserver which is causing a bunch of
tests to fail on a clean build.

This patch re-adds the dependency. I decided to not add the dependency in the
`test/API` folder as we actually need it in all kinds of tests (we have shell,
API and a few unit tests such as the LLDBServerTests that depend on the
debugserver binary). lldb-server is also handled in the same way.

Reviewed By: JDevlieghere, labath

Differential Revision: https://reviews.llvm.org/D98196

[OpenMP] Restore backwards compatibility for libomptarget

Summary:
The changes introduced in D87946 changed the API for libomptarget
functions. `__kmpc_push_target_tripcount` was a function in Clang 11.x
but was not given a backward-compatible interface. This change will
require people using Clang 13.x or 12.x to recompile their offloading
programs.

Reviewed By: jdoerfert cchen

Differential Revision: https://reviews.llvm.org/D98358

[Sema] Use castAs<> instead getAs<> for dereferenced pointer casts. NFCI.

getAs<> returns null for missed casts, resulting in null dereferences - use castAs<> instead which will assert the cast is correct.

Revert "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"

This reverts commit c0f3dfb9f119bb5f22dd8846f5502b6abaf026d3.

Reverted due to an error on the clang-x64-windows-msvc buildbot.

[llvm-mca] Fix uninitialized variable in InOrderIssueStage constructor warning. NFCI.

[PowerPC] Exploit paddi instruction on Power 10 for constant materialization

Starting with Power 10 the instruction paddi is available to use.
The instruction allows for immediates that are 34 bits.

This patch adds exploitation of the paddi instruction to allow us
to materialize constants.

Reviewed By: lei, amyk

Differential Revision: https://reviews.llvm.org/D93300

[lld-macho] minimal TimeTrace support

This is the minimal port from ELF. Any extension should easy from here

Test plan: ninja check-all-macho

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D98419

[OpenCL][Docs] Add guidelines for new extensions and features.

Add documentation that explains how to extend clang with the new
extensions/features. The guidelines also detail clang's position
about the extension pragmas for the new functionality.

Differential Revision: https://reviews.llvm.org/D97072

[Orc] Deallocate debug objects explicitly when destroying the DebugObjectManagerPlugin

[Transforms] SampleProfileLoaderBaseImpl<BT>::getFunctionLoc - fix Wdocumentation warnings. NFCI.

[PowerPC] Fix multi-use case for swap reduction

4c973ae implemented reduction of vector swap for lane-insensitive
operations. This commit fixes it for checking number of uses of the
vector operation.

[Sema] Add some basic lambda capture fix-its

Adds fix-its when users forget to explicitly capture variables or this in lambdas

Addresses https://github.com/clangd/clangd/issues/697

Reviewed By: kbobyrev

Differential Revision: https://reviews.llvm.org/D96975

[OpaquePtrs] Remove some uses of type-less CreateLoad APIs (NFC)

Explicitly pass loaded type when creating loads, in preparation
for the deprecation of these APIs.

There are still a couple of uses left.

[AArch64][SVE] Add fixed/scalable lowering of FMAXIMUM/FMINIMUM ISD nodes

Differential Revision: https://reviews.llvm.org/D98348

[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands

This patch improves salvageDebugInfoImpl by allowing it to salvage arithmetic
operations with two or more non-const operands; this includes the GetElementPtr
instruction, and most Binary Operator instructions. These salvages produce
DIArgList locations and are only valid for dbg.values, as currently variadic
DIExpressions must use DW_OP_stack_value. This functionality is also only added
for salvageDebugInfoForDbgValues; other functions that directly call
salvageDebugInfoImpl (such as in ISel or Coroutine frame building) can be
updated in a later patch.

Differential Revision: https://reviews.llvm.org/D91722

Revert "[AArch64][SVE] Allow accesses to SVE stack objects to use frame pointer"

This patch introduced codegen faults. An attempt to fix this was done
in https://reviews.llvm.org/D97193, but ultimately it was decided to
approach this differently.

This reverts commit 42635856ed3c9a05957640f9deb50cf865c03825.

Differential Revision: https://reviews.llvm.org/D98350

[PowerPC] Fix infinite loop in peephole CR optimization (PR49509)

If we encounter a degenerate select node where both operands are
the same, then we can continue negating the condition while swapping
operands, resulting in an infinite loop. Avoid this by bailing out
if both operands are the same.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49509.

Differential Revision: https://reviews.llvm.org/D98340

[compiler-rt] Set CMAKE_TRY_COMPILE_TARGET_TYPE to STATIC_LIBRARY when building builtins standalone

When building builtins, the toolchain might not yet be at a stage
when linking a test application works yet, as builtins aren't
available. Therefore set CMAKE_TRY_COMPILE_TARGET_TYPE to STATIC_LIBRARY,
to avoid failing the compiler sanity check.

Setting CMAKE_TRY_COMPILE_TARGET_TYPE to STATIC_LIBRARY has the risk
of making checks for library availability succeed falsely (e.g.
indicating that libs would be available that really aren't, as the
tests don't do any linking), but the builtins library doesn't try to
link against any external libraries (and only produces static libraries
anyway), so it should be safe here.

This avoids having to set CMAKE_C_COMPILER_WORKS when bootstrapping a
cross toolchain, when building the builtins.

Differential Revision: https://reviews.llvm.org/D91334

Revert rGcd938ab162b0ac560dd0e9fee290980c7e0e47e5 "[X86] canonicalizeShuffleWithBinOps - add X86ISD::PSHUFB handling."

Investigating an issue reported by @bkramer, possibly when the PSHUFB mask generates zero elements.

[flang][driver] Add -fdebug-module-writer option

[clangd] Fix buildbots without grpc enabled