review.tizen.org Git - platform/upstream/llvm.git/log

BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible

Move abstractMemberAccess and PreserveDIType passes as early as
possible, right after clang code generation.

Currently, compiler may transform the above code
  p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
  p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
  a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
  if (a) {
    p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
    p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
    bpf_probe_read(buf, buf_size, p2);
  }
to
  p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
  p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
  a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
  if (a) {
    bpf_probe_read(buf, buf_size, p2);
  }
and eventually assembly code looks like
  reloc_exist = 1;
  reloc_member_offset = 10; //calculate member offset from base
  p2 = base + reloc_member_offset;
  if (reloc_exist) {
    bpf_probe_read(bpf, buf_size, p2);
  }
if during libbpf relocation resolution, reloc_exist is actually
resolved to 0 (not exist), reloc_member_offset relocation cannot
be resolved and will be patched with illegal instruction.
This will cause verifier failure.

This patch attempts to address this issue by do chaining
analysis and replace chains with special globals right
after clang code gen. This will remove the cse possibility
described in the above. The IR typically looks like
  %6 = load @llvm.sk_buff:0:50$0:0:0:2:0
  %7 = bitcast %struct.sk_buff* %2 to i8*
  %8 = getelementptr i8, i8* %7, %6
for a particular address computation relocation.

But this transformation has another consequence, code sinking
may happen like below:
  PHI = <possibly different @preserve_*_access_globals>
  %7 = bitcast %struct.sk_buff* %2 to i8*
  %8 = getelementptr i8, i8* %7, %6

For such cases, we will not able to generate relocations since
multiple relocations are merged into one.

This patch introduced a passthrough builtin
to prevent such optimization. Looks like inline assembly has more
impact for optimizaiton, e.g., inlining. Using passthrough has
less impact on optimizations.

A new IR pass is introduced at the beginning of target-dependent
IR optimization, which does:
  - report fatal error if any reloc global in PHI nodes
  - remove all bpf passthrough builtin functions

Changes for existing CORE tests:
  - for clang tests, add "-Xclang -disable-llvm-passes" flags to
    avoid builtin->reloc_global transformation so the test is still
    able to check correctness for clang generated IR.
  - for llvm CodeGen/BPF tests, add "opt -O2 <ir_file> | llvm-dis" command
    before "llc" command since "opt" is needed to call newly-placed
    builtin->reloc_global transformation. Add target triple in the IR
    file since "opt" requires it.
  - Since target triple is added in IR file, if a test may produce
    different results for different endianness, two tests will be
    created, one for bpfeb and another for bpfel, e.g., some tests
    for relocation of lshift/rshift of bitfields.
  - field-reloc-bitfield-1.ll has different relocations compared to
    old codes. This is because for the structure in the test,
    new code returns struct layout alignment 4 while old code
    is 8. Align 8 is more precise and permits double load. With align 4,
    the new mechanism uses 4-byte load, so generating different
    relocations.
  - test intrinsic-transforms.ll is removed. This is used to test
    cse on intrinsics so we do not lose metadata. Now metadata is attached
    to global and not instruction, it won't get lost with cse.

Differential Revision: https://reviews.llvm.org/D87153

[clang][driver][AIX] Set compiler-rt as default rtlib

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D88182

Attempt to clear some msan errors in the libcxx atomic tests.

[mlir][Affine][VectorOps] Fix super vectorizer utility (D85869)

Adding missing code that should have been part of "D85869: Utility to
vectorize loop nest using strategy."

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D88346

[scudo][standalone] Remove unused atomic_compare_exchange_weak

`atomic_compare_exchange_weak` is unused in Scudo, and its associated
test is actually wrong since the weak variant is allowed to fail
spuriously (thanks Roland).

This lead to flakes such as:
```
[ RUN      ] ScudoAtomicTest.AtomicCompareExchangeTest
../../zircon/third_party/scudo/src/tests/atomic_test.cpp:98: Failure: Expected atomic_compare_exchange_weak(reinterpret_cast<T *>(&V), &OldVal, NewVal, memory_order_relaxed) is true.
    Expected: true
    Which is: 01
    Actual  : atomic_compare_exchange_weak(reinterpret_cast<T *>(&V), &OldVal, NewVal, memory_order_relaxed)
    Which is: 00
../../zircon/third_party/scudo/src/tests/atomic_test.cpp:100: Failure: Expected atomic_compare_exchange_weak( reinterpret_cast<T *>(&V), &OldVal, NewVal, memory_order_relaxed) is false.
    Expected: false
    Which is: 00
    Actual  : atomic_compare_exchange_weak( reinterpret_cast<T *>(&V), &OldVal, NewVal, memory_order_relaxed)
    Which is: 01
../../zircon/third_party/scudo/src/tests/atomic_test.cpp:101: Failure: Expected OldVal == NewVal.
    Expected: NewVal
    Which is: 24
    Actual  : OldVal
    Which is: 42
[  FAILED  ] ScudoAtomicTest.AtomicCompareExchangeTest (0 ms)
[----------] 2 tests from ScudoAtomicTest (1 ms total)
```

So I am removing this, if someone ever needs the weak variant, feel
free to add it back with a test that is not as terrible. This test was
initially ported from sanitizer_common, but their weak version calls
the strong version, so it works for them.

Differential Revision: https://reviews.llvm.org/D88443

[clang] Selectively ena/disa-ble format-insufficient-args warning

Differential Revision: https://reviews.llvm.org/D87176

Guard `find_library(tensorflow_c_api ...)` by checking for TENSORFLOW_C_LIB_PATH to be set by the user

Also have CMake fails if the user provides a TENSORFLOW_C_LIB_PATH but
we can't find TensorFlow at this path.

At the moment the CMake script tries to figure if TensorFlow is
available on the system and enables support for it. This is in general
not desirable to customize build features this way and instead it is
preferable to let the user opt-in explicitly into the features they want
to enable. This is in line with other optional external dependencies
like Z3.
There are a few reasons to this but amongst others:
- reproducibility: making features "magically" enabled based on whether
  we find a package on the system or not makes it harder to handle bug
  reports from users.
- user control: they can't have TensorFlow on the system and build LLVM
  without TensorFlow right now. They also would suddenly distribute LLVM
  with a different set of features unknowingly just because their build
  machine environment would change subtly.

Right now this is motivated by a user reporting build failures on their system:

.../mesa-git/llvm-git/src/llvm-project/llvm/lib/Analysis/TFUtils.cpp:23:10: fatal error: tensorflow/c/c_api.h: No such file or directory
   23 | #include "tensorflow/c/c_api.h"
      |          ^~~~~~

It looks like we detected TensorFlow at configure time but couldn't set all the paths correctly.

Differential Revision: https://reviews.llvm.org/D88371

[CVP] Allow two transforms in one invocation

For a call site which had both constant deopt operands and nonnull arguments, we were missing the opportunity to recognize the later by bailing early.

This is somewhat of a speculative fix. Months ago, I'd had a private report of performance and compile time regressions from the deopt operand folding. I never received a test case. However, the only possibility I see was that after that change CVP missed the nonnull fold, and we end up with a pass ordering/missed simplification issue. So, since it's a real issue, fix it and hope.

[EHStreamer] Simplify sharedTypeIDs with std::mismatch

(Note that EMStreamer.cpp is largely under tested. The only test checking the prefix sharing is CodeGen/WebAssembly/eh-lsda.ll)

[mlir][shape] Make conversion passes more consistent.

- use select-ops to make the lowering simpler
- change style of FileCheck variables names to be consistent
- change some variable names in the code to be more explicit

Differential Revision: https://reviews.llvm.org/D88258

[libcxx] Don't pass -s to libtool

This flag is the default in libtool on Darwin, and it's not supported
by llvm-libtool-darwin causing a build failure.

Differential Revision: https://reviews.llvm.org/D88449

[libc++] Fix constexpr dynamic allocation on GCC 10

We're technically not allowed by the Standard to call ::operator new in
constexpr functions like __libcpp_allocate. Clang doesn't seem to complain
about it, but GCC does.

[X86] Add support for calling SimplifyDemandedBits on the input of PDEP with a constant mask.

We can do several optimizations for PDEP using computeKnownBits and SimplifyDemandedBits

-If the MSBs of the output aren't demanded, those MSBs of the mask input aren't demanded either. We need to keep the most significant demanded bit of the mask and any mask bits before it.
-The number of possible ones in the mask determines how many bits of the lsbs of the other operand are demanded. Any bits of the mask we don't demand by the previous rule should not be counted.
-The result will have zeros in any position that the mask is zero.
-Since non-mask input bits can only be output in the original position or a higher bit position, the result will have at least as many trailing zeroes as the non-mask input.

Differential Revision: https://reviews.llvm.org/D87883

[X86] Add tests for D87883. NFC

[GlobalISel] Add support for lowering of vector G_SELECT and use for AArch64.

The lowering is a port of the SDAG expansion.

Differential Revision: https://reviews.llvm.org/D88364

[CMake][AIX] Limit tools in external project build

This is a follow on to D85329 which disabled some llvm tools in the
runtimes build due to XCOFF64 limitations. This change disables them
in other external project builds as well, when no list of tools is
specified in the arguments.

Reviewed By: hubert.reinterpretcast, stevewan

Differential Revision: https://reviews.llvm.org/D88310

[gn build] Re-run CompletionModelCodegen when input json files change

Fix a think-o with the numerical suffixes in the docs for init_priority.

[lldb] Add print_function import

Revert "Revert "[AArch64][GlobalISel] Add selection support for <8 x s16> G_INSERT_VECTOR_ELT with GPR scalar.""

This isn't a real with the codegen, it's a previously known bug in clang which
causes non-deterministic failures due to garbage bits in undef registers being
used in saturating instructions.

I'm disabling the result checking for the test until this issue is resolved.

This reverts commit 6c8168324b5329c94fe7e8f9a1619802091b9bec.

[mlir] [VectorOps] Relaxed restrictions on vector.reduction types even more

Recently, restrictions on vector reductions were made more relaxed by
accepting any width signless integer and floating-point. This CL relaxes
the restriction even more by including unsigned and signed integers.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D88442

[X86] Use inlineasm flag output for the _bittest* intrinsics.

Instead of expliciting emitting a setc in the inline asm instructions,
we can use flag output. This allows the backend to use the flag
directly if it is needed by a branch. Previously we needed a test
instruction to convert the register back to a flag.

If the flag can't be used directly, the backend will emit a setcc.

Differential Revision: https://reviews.llvm.org/D87888

[InstCombine] Regenerate cast tests. NFC.

[COFF] Aliases resolve directly to defined external targets

Avoid introducing unnecessary indirection for weak-external references.

We only need to introduce ".weak.<SYMBOL>.default" when referencing a
symbol that is defined, but not external.

Reviewed By: mstorsjo

Differential Revision: https://reviews.llvm.org/D88305

[libc++] Replace uses of __libcpp_allocate by std::allocator<>

Both are equivalent, however std::allocator can appear in constant
expressions and is higher level.

[libc++] Add UNSUPPORTED markup to atomic test in single-threaded mode

[libc++] Fix heap UaF issue in coroutine test

This wasn't being flagged by older versions of ASAN, but it is now.

[wasm] Move WasmTraits.h to BinaryFormat

There's no dependency on Object in there and this avoids a cyclic
dependency between libMC and libObject.

[CostModel] remove hack for intrinsic cost based on cost type

This hack seems to only have been necessary because of the
constructor bug noted in 33125cffd.

Once again, it's hard to prove NFC, but that's the hope...

Once we've found a firmware binary and loaded it, don't search more

Add the flag in ProcessMachCore::DoLoadCore that stops additional
searches for the binaries when we have an LC_NOTE identifying the
firmware/standalone binary as the correct one & we have loaded it
successfully.

[lldb] Enable markdown support for documentation

This enables support for writing LLDB documentation in markdown in
addition to reStructured text. We already had documentation written in
markdown (StructuredDataPlugins and DarwinLog) which will now also be
available on the website.

[PowerPC] Legalize v256i1 and v512i1 and implement load and store of these types

This patch legalizes the v256i1 and v512i1 types that will be used for MMA.

It implements loads and stores of these types.
v256i1 is a pair of VSX registers, so for this type, we load/store the two
underlying registers. v512i1 is used for MMA accumulators. So in addition to
loading and storing the 4 associated VSX registers, we generate instructions to
prime (copy the VSX registers to the accumulator) after loading and unprime
(copy the accumulator back to the VSX registers) before storing.

This patch also adds the UACC register class that is necessary to implement the
loads and stores. This class represents accumulator in their unprimed form and
allow the distinction between primed and unprimed accumulators to avoid invalid
copies of the VSX registers associated with primed accumulators.

Differential Revision: https://reviews.llvm.org/D84968

[CostModel] fill in arguments as part of intrinsic attribute constructor

This appears to be an error of code duplication - instead of
one constructor variant calling another, we have N similar
but not identical versions.

I think this is 'NFC' based on the current callers, but it's
hard to tell or guess the intent in all cases.

[python][tests] Fix string comparison with "is"

scudo: Re-order Allocator fields for improved performance. NFCI.

Move smaller and frequently-accessed fields near the beginning
of the data structure in order to improve locality and reduce
the number of instructions required to form an access to those
fields. With this change I measured a ~5% performance improvement on
BM_malloc_sql_trace_default on aarch64 Android devices (Pixel 4 and
DragonBoard 845c).

Differential Revision: https://reviews.llvm.org/D88350

[mlir] [VectorOps] changes to printing support for integers

(1) simplify integer printing logic by always using 64-bit print
(2) add index support (since vector<16xindex> is planned to be added)
(3) adjust naming convention print_x -> printX

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D88436

[AArch64] reuse another map iterator. NFC

Revert "[AArch64][GlobalISel] Add selection support for <8 x s16> G_INSERT_VECTOR_ELT with GPR scalar."

This reverts commit b5e87c9ef2243ecd65e0ef87a1bf303c0c26db04 as it seems to have
broken a bot.

[clangd] Rename evaluate() to evaluateHeuristics()

Since we have 2 scoring functions (heuristics and decision forest),
renaming the existing evaluate() function to be more descriptive of the
Heuristics being evaluated in it.

Differential Revision: https://reviews.llvm.org/D88431

[AddressSanitizer] Copy type metadata to prevent miscompilation

When ASan and e.g. Dead Virtual Function Elimination are enabled, the
latter will rely on type metadata to determine if certain virtual calls can be
removed. However, ASan currently does not copy type metadata, which can cause
virtual function calls to be incorrectly removed.

Differential Revision: https://reviews.llvm.org/D88368

[InstCombine] Add trunc(shr(trunc(x),c)) non-uniform vector tests

[WebAssembly] Use wasm::Signature for in ObjectWriter (NFC)

There are two `WasmSignature` structs, one in
include/llvm/BinaryFormat/Wasm.h and the other in
lib/MC/WasmObjectWriter.cpp. I don't know why they got separated in this
way in the first place, but it seems we can unify them to use the one in
Wasm.h for all cases.

Reviewed By: dschuff, sbc100

Differential Revision: https://reviews.llvm.org/D88428

[AArch64][GlobalISel] Infer whether G_PHI is going to be a FPR in regbankselect

Some instructions (G_LOAD, G_SELECT, G_UNMERGE_VALUES) check if their uses
will define/use FPRs (using `onlyUsesFP` and `onlyDefinesFP`).

The register bank of a use isn't necessarily known when an instruction asks for
this.

Teach `hasFPConstraints` to look at the instructions feeding into a G_PHI when
its destination bank is unknown. If any of them are FPR, assume the entire
G_PHI will also be assigned a FPR.

Since a phi can have many inputs, and those inputs can in turn be phis,
restrict the search depth to a very low number.

Also improve the docs for `hasFPConstraints` and friends a little.

This is a 0.3% code size improvement on CTMark/Bullet at -O3, and a 0.2% code
size improvement at CTMark/pairlocalalign at -O3.

Differential Revision: https://reviews.llvm.org/D88177

[CostModel] move early exit for free intrinsics

This should be NFC unless some target was expecting that
some form of cttz/ctlz/memcpy is free in terms of size/latency
but not free in throughput cost.

[CostModel] split handling of intrinsics from other calls

This should be close to NFC (no-functional-change), but I
can't completely rule out that some call on some target
travels down a different path. There's an especially large
amount of code spaghetti in this part of the cost model.

The goal is to clean up the intrinsic cost handling so
we can canonicalize to the new min/max intrinsics without
causing regressions.

[AArch64][GlobalISel] Support shifted register form in emitTST

Support emitting ANDSXrs and ANDSWrs in `emitTST`. Update opt-fold-compare.mir
to show that it works.

Differential Revision: https://reviews.llvm.org/D87530

[GlobalISel] Combine (xor (and x, y), y) -> (and (not x), y)

When we see this:

```
%and = G_AND %x, %y
%xor = G_XOR %and, %y
```

Produce this:

```
%not = G_XOR %x, -1
%new_and = G_AND %not, %y
```

as long as we are guaranteed to eliminate the original G_AND.

Also matches all commuted forms. E.g.

```
%and = G_AND %y, %x
%xor = G_XOR %y, %and
```

will be matched as well.

Differential Revision: https://reviews.llvm.org/D88104

[InstCombine] Add basic trunc(shr(trunc(x),c)) tests

Helps improve the minor regressions noticed on D88316

[clangd] Use Decision Forest to score code completions.

By default clangd will score a code completion item using heuristics model.

Scoring can be done by Decision Forest model by passing `--ranking_model=decision_forest` to
clangd.

Features omitted from the model:
- `NameMatch` is excluded because the final score must be multiplicative in `NameMatch` to allow rescoring by the editor.
- `NeedsFixIts` is excluded because the generating dataset that needs 'fixits' is non-trivial.

There are multiple ways (heuristics) to combine the above two features with the prediction of the DF:
- `NeedsFixIts` is used as is with a penalty of `0.5`.

Various alternatives of combining NameMatch `N` and Decision forest Prediction `P`
- N * scale(P, 0, 1): Linearly scale the output of model to range [0, 1]
- N * a^P:
- More natural: Prediction of each Decision Tree can be considered as a multiplicative boost (like NameMatch)
- Ordering is independent of the absolute value of P. Order of two items is proportional to `a^{difference in model prediction score}`. Higher `a` gives higher weightage to model output as compared to NameMatch score.

Baseline MRR = 0.619
MRR for various combinations:
N * P = 0.6346, advantage%=2.5768
N * 1.1^P = 0.6600, advantage%=6.6853
N * **1.2**^P = 0.6669, advantage%=**7.8005**
N * **1.3**^P = 0.6668, advantage%=**7.7795**
N * **1.4**^P = 0.6659, advantage%=**7.6270**
N * 1.5^P = 0.6646, advantage%=7.4200
N * 1.6^P = 0.6636, advantage%=7.2671
N * 1.7^P = 0.6629, advantage%=7.1450
N * 2^P = 0.6612, advantage%=6.8673
N * 2.5^P = 0.6598, advantage%=6.6491
N * 3^P = 0.6590, advantage%=6.5242
N * scaled[0, 1] = 0.6465, advantage%=4.5054

Differential Revision: https://reviews.llvm.org/D88281

Add FunctionType to MLIR C and Python bindings.

Differential Revision: https://reviews.llvm.org/D88416

[AArch64] Reuse map iterator instead of double lookup. NFC

[unittests] Preserve LD_LIBRARY_PATH in crash recovery test

We need to preserve the LD_LIBRARY_PATH environment variable when
spawning a child process (certain setups rely on non-standard paths
for e.g. libstdc++). In order to achieve this, set
LLVM_CRC_UNIXCRCRETURNCODE in the parent process instead of creating
the child's environment from scratch.

Reviewed By: aganea

Differential Revision: https://reviews.llvm.org/D88308

[ubsan] nullability-arg: Fix crash on C++ member pointers

Extend -fsanitize=nullability-arg to handle call sites which accept C++
member pointers.

rdar://62476022

Differential Revision: https://reviews.llvm.org/D88336

[clangd] Add a trained DecisionForest for code completion.

Replaces the dummy CodeCompletion model with a trained DecisionForest
model.
The features.json needs to be manually curated specifying the features
to be used. This is a one-time cost and does not change if the model
changes until we decide to add/remove features.

Differential Revision: https://reviews.llvm.org/D88071

Revert "Add the ability to write target stop-hooks using the ScriptInterpreter."

This temporarily reverts commit b65966cff65bfb66de59621347ffd97238d3f645
while Jim figures out why the test is failing on the bots.

[clang][codegen] Annotate `correctly-rounded-divide-sqrt-fp-math` fn-attr for OpenCL only.

- `-cl-fp32-correctly-rounded-divide-sqrt` is an OpenCL-specific option
and `correctly-rounded-divide-sqrt-fp-math` should be added for OpenCL
at most.

Differential revision: https://reviews.llvm.org/D88303

[AMDGPU] Reformat AMDGPUTargetLowering::isSDNodeAlwaysUniform. NFC.

[ARM][LowOverheadLoops] Cleanup and re-arrange

Rename and reorganise how we decide where to put the LoopStart
instruction.

[llvm] Fix unused variable in non-debug configurations

[ARM] Added more patterns to generate SSAT/USAT with shift

Added patterns to generate an SSAT or USAT with shift for
SSAT/USAT instructions that are matched from IR patterns.

Differential Revision: https://reviews.llvm.org/D88145

[SVE] Lower fixed length VECREDUCE_[UMAX|UMIN] to Scalable

Essentially the same as the signed variants from D88259. Also includes a clean up of the lowering function.

Differential Revision: https://reviews.llvm.org/D88317

[ValueTracking] Fix analyses to update CxtI to be phi's incoming edges' terminators

It was mentioned that D88276 that when a phi node is visited, terminators at their incoming edges should be used for CtxI.
This is a patch that makes two functions (ComputeNumSignBitsImpl, isGuaranteedNotToBeUndefOrPoison) to do so.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D88360

[TableGen] Improved messages in PseudoLoweringEmitter.

[InstCombine] matchRotate - force splat of uniform constant rotation amounts (PR46895)

Fixes minor bug in D88402 where we were using the original shift constant (with undefs) instead of one with the splat values (re)splatted to all elements.

[NFC][ARM] Factor out some logic for LoLoops.

Create a DCE function that accepts an instruction.

[AMDGPU] Reformat SITargetLowering::isSDNodeSourceOfDivergence. NFC.

[llvm-readobj/elf] - Fix the PREL31 relocation computation used for dumping arm32 unwind info (-u).

This is a part of https://bugs.llvm.org/show_bug.cgi?id=47581.

We have the following computation:
```
(1) uint64_t Location = Address & 0x7fffffff;
(2) if (Location & 0x04000000)
(3) Location |= (uint64_t) ~0x7fffffff;
(4) return Location + Place;
```

At line 2 there is a mistype. The constant should be `0x40000000`,
not `0x04000000`, because the intention here is to sign extend the `Location`,
which is the 31 bit signed value.

Differential revision: https://reviews.llvm.org/D88407

[clang-tidy] IncludeInserter: allow <> in header name

This adds a pair of overloads for create(MainFile)?IncludeInsertion methods that
use the presence of the <> in the file name to control whether the #include
directive will use angle brackets or quotes. Motivating examples:
https://reviews.llvm.org/D82089#inline-789412 and
https://github.com/llvm/llvm-project/blob/master/clang-tools-extra/clang-tidy/modernize/MakeSmartPtrCheck.cpp#L433

The overloads with the IsAngled parameter can be removed after the users are
updated.

Update usages of createIncludeInsertion.

Update (almost all) usages of createMainFileIncludeInsertion.

Reviewed By: hokein

Differential Revision: https://reviews.llvm.org/D85666

[clang] Don't emit "no member" diagnostic if the lookup fails on an invalid record decl.

The "no member" diagnostic is likely bogus.

Reviewed By: sammccall, #libc

Differential Revision: https://reviews.llvm.org/D86765

[ARM][MVE] Enable tail-predication by default

We have been running tests/benchmarks downstream with tail-predication enabled
for some time now and this behaves as expected: we are not aware of any
correctness issues, and this performs better across the board than with
tail-predication disabled. Time to flip the switch!

Differential Revision: https://reviews.llvm.org/D88093

[InstCombine] matchRotate - allow undef in uniform constant rotation amounts (PR46895)

An extension to D87452, we can safely permit undefs in the uniform/splat detection

https://alive2.llvm.org/ce/z/nT-ptN

Differential Revision: https://reviews.llvm.org/D88402

[SCEV] Also use info from assumes in applyLoopGuards.

Similar to collecting information from branches guarding a loop, we can
also collect information from assumes dominating the loop header.

Fixes PR47247.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D87854

[AArch64] Generate .note.gnu.property based on module flags.

Flags of the module derived exclusively from the compiler flag `-mbranch-protection`.
The note is generated based on the module flags accordingly.
After this change in case of compile unit without function won't have
the .note.gnu.property if the compiler flag is not present [1].

[1] https://bugs.llvm.org/show_bug.cgi?id=46480

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D80791

[X86] Flip isShuffleEquivalent argument order to match isTargetShuffleEquivalent

A while ago, we converted isShuffleEquivalent/isTargetShuffleEquivalent to both use IsElementEquivalent internally.

This allows us to make the shuffle args optional like isTargetShuffleEquivalent and update foldShuffleOfHorizOp to use isShuffleEquivalent (which it should as its using a ISD::VECTOR_SHUFFLE mask).

[X86] Simplify broadcast mask detection with isUndefOrEqual helper.

Add an additional isUndefOrEqual variant that matches an entire mask, not just a single value.

[gn build] Port 018066d9475

[clangd] Add a tweak for filling in enumerators of a switch statement.

Add a tweak that populates an empty switch statement of an enumeration type with all of the enumerators of that type.

Before:
```
enum Color { RED, GREEN, BLUE };
void f(Color color) {
  switch (color) {}
}
```

After:
```
enum Color { RED, GREEN, BLUE };
void f(Color color) {
  switch (color) {
  case RED:
  case GREEN:
  case BLUE:
    break;
  }
}
```

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D88383

[lldb][NFC] Minor cleanup in CxxModuleHandler::tryInstantiateStdTemplate

Using llvm::None and `contains` instead of `find`.

[PowerPC] Clean-up mayRaiseFPException bits

According to POWER ISA, floating point instructions altering exception
bits in FPSCR should be 'may raise FP exception'. (excluding those
read or write the whole FPSCR directly, like mffs/mtfsf) We need to
model FPSCR well in future patches to handle the special case properly.

Instructions added mayRaiseFPException:
- fre(s)/frsqrte(s)
- fmadd(s)/fmsub(s)/fnmadd(s)/fnmsub(s)
- xscmpoqp/xscmpuqp/xscmpeqdp/xscmpgedp/xscmpgtdp
- xscvdphp/xscvhpdp/xvcvhpsp/xvcvsphp/xsrqpxp
- xsmaxcdp/xsincdp/xsmaxjdp/xsminjdp

Instructions removed mayRaiseFPException:
- xstdivdp/xvtdiv(d|s)p/xstsqrtdp/xvtsqrt(d|s)p
- xsabsdp/xsnabsdp/xvabs(d|s)p/xvnabs(d|s)p
- xsnegdp/xscpsgndp/xvneg(d|s)p/xvcpsgn(d|s)p
- xvcvsxwdp/xvcvuxwdp
- xscvdpspn/xscvspdpn

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D87738

[AMDGPU] Add bfi immediate pattern

Differential Revision: https://reviews.llvm.org/D88246

[AMDGPU] Make bfi patterns divergence-aware

This tends to increase code size but more importantly it reduces vgpr
usage, and could avoid costly readfirstlanes if the result needs to be
in an sgpr.

Differential Revision: https://reviews.llvm.org/D88245

[AMDGPU] Split R600 and GCN bfi patterns

This is in preparation for making the GCN patterns divergence-aware.
NFC.

Differential Revision: https://reviews.llvm.org/D88244

[InstCombine] Add tests for vector rotate by constants with undefs.

[yaml2obj][obj2yaml] - Add a support for SHT_ARM_EXIDX section.

This adds the support for SHT_ARM_EXIDX sections to obj2yaml/yaml2obj tools.

SHT_ARM_EXIDX is a ARM specific index table filled with entries.
Each entry consists of two 4-bytes values (words).
(https://developer.arm.com/documentation/ihi0038/c/?lang=en#index-table-entries)

Differential revision: https://reviews.llvm.org/D88228

[lldb] Reference STL types in import-std-module tests

With the recent patches to the ASTImporter that improve template type importing
(D87444), most of the import-std-module tests can now finally import the
type of the STL container they are testing. This patch removes most of the casts
that were added to simplify types to something the ASTImporter can import
(for example, std::vector<int>::size_type was casted to `size_t` until now).
Also adds the missing tests that require referencing the container type (for
example simply printing the whole container) as here we couldn't use a casting
workaround.

The only casts that remain are in the forward_list tests that reference
the iterator and the stack test. Both tests are still failing to import the
respective container type correctly (or crash while trying to import).

[obj2yaml][yaml2obj] - Stop recognizing SHT_MIPS_ABIFLAGS on non-MIPS targets.

Currently we are always recognizing the `SHT_MIPS_ABIFLAGS` section,
even on non-MIPS targets.

The problem of doing this is briefly discussed in D88228 which does the same for `SHT_ARM_EXIDX`:

"The problem is that `SHT_ARM_EXIDX` shares the value with `SHT_X86_64_UNWIND (0x70000001U)`.
We might have other machine specific conflicts, e.g.
`SHT_ARM_ATTRIBUTES` vs `SHT_MSP430_ATTRIBUTES` vs `SHT_RISCV_ATTRIBUTES (0x70000003U)`."

I think we should only recognize target specific sections when the machine type
matches. I.e. `SHT_MIPS_*` should be recognized only on `MIPS`, `SHT_ARM_*`
only on `ARM` etc.

This patch stops recognizing `SHT_MIPS_ABIFLAGS` on `non-MIPS` targets.

Note: I had to update `ScalarEnumerationTraits<ELFYAML::MIPS_ISA>::enumeration`, because
otherwise test crashes, calling `llvm_unreachable`.

Differential revision: https://reviews.llvm.org/D88294

[Coroutines] Remove unused includes. NFC.

[ARM][MVE] tail-predication: overflow checks for elementcount, cont'd

This is a reimplementation of the overflow checks for the elementcount,
i.e. the 2nd argument of intrinsic get.active.lane.mask. The element
count is lowered in each iteration of the tail-predicated loop, and
we must prove that this expression doesn't overflow.

Many thanks to Eli Friedman and Sam Parker for all their help with
this work.

Differential Revision: https://reviews.llvm.org/D88086

[ARM] Expand cannotInsertWDLSTPBetween to the last instruction

9d9a11c7be037 added this check for predicatable instructions between the
D/WLSTP and the loop's start, but it was missing the last instruction in
the block. Change it to use some iterators instead.

Differential Revision: https://reviews.llvm.org/D88354

[lldb] Remove nothreadallow from SWIG's __str__ wrappers to work around a Python>=3.7 crash

Usually when we enter a SWIG wrapper function from Python, SWIG automatically
adds a `Py_BEGIN_ALLOW_THREADS`/`Py_END_ALLOW_THREADS` around the call to the SB
API C++ function. This will ensure that Python's GIL is released when we enter
LLDB and locked again when we return to the wrapper code.

D51569 changed this behaviour but only for the generated `__str__` wrappers. The
added `nothreadallow` disables the injection of the GIL release/re-acquire code
and the GIL is now kept locked when entering LLDB and is expected to be still
locked when returning from the LLDB implementation. The main reason for that was
that back when D51569 landed the wrapper itself created a Python string. These
days it just creates a std::string and SWIG itself takes care of getting the GIL
and creating the Python string from the std::string, so that workaround isn't
necessary anymore.

This patch just removes `nothreadallow` so that our `__str__` functions now
behave like all other wrapper functions in that they release the GIL when
calling into the SB API implementation.

The motivation here is actually to work around another potential bug in LLDB.
When one calls into the LLDB SB API while holding the GIL and that call causes
LLDB to interpret some Python script via `ScriptInterpreterPython`, then the GIL
will be unlocked when the control flow returns from the SB API. In the case of
the `__str__` wrapper this would cause that the next call to a Python function
requiring the GIL would fail (as SWIG will not try to reacquire the GIL as it
isn't aware that LLDB removed it).

The reason for this unexpected GIL release seems to be a workaround for recent
Python versions:
```
    // The only case we should go further and acquire the GIL: it is unlocked.
    if (PyGILState_Check())
      return;
```

The early-exit here causes `InitializePythonRAII::m_was_already_initialized` to
be always false and that causes that `InitializePythonRAII`'s destructor always
directly unlocks the GIL via `PyEval_SaveThread`. I'm investigating how to
properly fix this bug in a follow up patch, but for now this straightforward
patch seems to be enough to unblock my other patches (and it also has the
benefit of removing this workaround).

The test for this is just a simple test for `std::deque` which has a synthetic
child provider implemented as a Python script. Inspecting the deque object will
cause `expect_expr` to create a string error message by calling
`str(deque_object)`. Printing the ValueObject causes the Python script for the
synthetic children to execute which then triggers the bug described above where
the GIL ends up being unlocked.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D88302

[Coroutines] Reuse storage for local variables with non-overlapping lifetimes

bug 45566 shows the process of building coroutine frame won't consider
that the lifetimes of different local variables are not overlapped,
which means the compiler could generates smaller frame.

This patch calculate the lifetime range of each alloca by StackLifetime
class. Then the patch build non-overlapped sets for allocas whose
lifetime ranges are not overlapped. We use the largest type in a
non-overlapped set as the field type in the frame. In insertSpills
process, if we find the type of field is not the same with the alloca,
we cast the pointer to the field type to the pointer to the alloca type.
Since the lifetime range of alloca in one non-overlapped set is not
overlapped with each other, it should be ok to reuse the storage space
in the frame.

Test plan: check-llvm, check-clang, cppcoro, folly

Reviewers: junparser, lxfind, modocache

Differential Revision: https://reviews.llvm.org/D87596

[SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy

After some recent upstream discussion we decided that it was best
to avoid having the / operator for both ElementCount and TypeSize,
since this could give the impression that these classes can be used
in the same way as basic integer integer types. However, division
for scalable types is a bit odd because we are only dividing the
minimum quantity by a value, as opposed to something like:

(MinSize * Vscale) / SomeValue

This is why when performing division it's important the caller
first establishes whether the operation makes sense, perhaps by
calling isKnownMultipleOf() prior to division. The caller must now
explictly call divideCoefficientBy() on the class to perform the
operation.

Differential Revision: https://reviews.llvm.org/D87700

[PowerPC] Add tests for `select` patterns. NFC.

Revert "Reland [CodeGen] emit CG profile for COFF object file"

This reverts commit 506b6170cb513f1cb6e93a3b690c758f9ded18ac.

This still causes link errors, see https://crbug.com/1130780.

[Test] Add tests where we can replace condition with invariants

Add profiling support for APValues.

For C++20 P0732R2; unused so far. Will be used and tested by a follow-on
commit.

Canonicalize declaration pointers when forming APValues.

References to different declarations of the same entity aren't different
values, so shouldn't have different representations.

Recommit of e6393ee813178e9d3306b8e3c6949a4f32f8a2cb with fixed handling
for weak declarations. We now look for attributes on the most recent
declaration when determining whether a declaration is weak. (Second
recommit with further fixes for mishandling of weak declarations. Our
behavior here is fundamentally unsound -- see PR47663 -- but this
approach attempts to not make things worse.)

[mlir][openacc] Add if, deviceptr operands and default attribute

Add operands to represent if and deviceptr. Default clause is represented with
an attribute.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D88331

[mlir][openacc] Switch to assembly format for acc.data

This patch remove the printer/parser for the acc.data operation since its syntax
fits nicely with the assembly format. It reduces the maintenance for this op.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D88330

[mlir][openacc] Remove detach and delete operands from acc.data

This patch remove the detach and delete operands. Those operands represent the detach
and delete clauses that will appear in another operation acc.exit_data

Reviewed By: kiranktp, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D88326