review.tizen.org Git - platform/upstream/llvm.git/log

[JITLink] Suppress expect-death test in release mode.

[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration

This patch handles one particular case of one-iteration loops for which SCEV
cannot straightforwardly prove BECount = 1. The idea of the optimization is to
symbolically execute conditional branches on the 1st iteration, moving in topoligical
order, and only visiting blocks that may be reached on the first iteration. If we find out
that we never reach header via the latch, then the backedge can be broken.

Differential Revision: https://reviews.llvm.org/D102615
Reviewed By: reames

[Test] Add test for unreachable backedge with duplicating predecessors

AMDGPU/GlobalISel: Legalize G_[SU]DIVREM instructions

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D100726

[JITLink] Enable creation and management of mutable block content.

This patch introduces new operations on jitlink::Blocks: setMutableContent,
getMutableContent and getAlreadyMutableContent. The setMutableContent method
will set the block content data and size members and flag the content as
mutable. The getMutableContent method will return a mutable copy of the existing
content value, auto-allocating and populating a new mutable copy if the existing
content is marked immutable. The getAlreadyMutableMethod asserts that the
existing content is already mutable and returns it.

setMutableContent should be used when updating the block with totally new
content backed by mutable memory. It can be used to change the size of the
block. The argument value should *not* be shared with any other block.

getMutableContent should be used when clients want to modify the existing
content and are unsure whether it is mutable yet.

getAlreadyMutableContent should be used when clients want to modify the existing
content and know from context that it must already be immutable.

These operations reduce copy-modify-update boilerplate and unnecessary copies
introduced when clients couldn't me sure whether the existing content was
mutable or not.

[cfe] Support target-specific escaped character in inline asm

GCC allows each target to define a set of non-letter and non-digit
escaped characters for inline assembly that will be replaced by another
string (They call this "punctuation" characters. The existing "%%" and
"%{" -- replaced by '%' and '{' at the end -- can be seen as special
cases shared by all targets).
This patch implements this feature by adding a new hook in `TargetInfo`.

Differential Revision: https://reviews.llvm.org/D103036

[Sema] Always search the full function scope context if a potential availability violation is encountered

This fixes both https://bugs.llvm.org/show_bug.cgi?id=50309 and https://bugs.llvm.org/show_bug.cgi?id=50310.

Previously, lambdas inside functions would mark their own bodies for later analysis when encountering a potentially unavailable decl, without taking into consideration that the entire lambda itself might be correctly guarded inside an @available check. The same applied to inner class member functions. Blocks happened to work as expected already, since Sema::getEnclosingFunction() skips through block scopes.

This patch instead simply and conservatively marks the entire outermost function scope for search, and removes some special-case logic that prevented DiagnoseUnguardedAvailabilityViolations from traversing down into lambdas and nested functions. This correctly accounts for arbitrarily nested lambdas, inner classes, and blocks that may be inside appropriate @available checks at any ancestor level. It also treats all potential availability violations inside functions consistently, without being overly sensitive to the current DeclContext, which previously caused issues where e.g. nested struct members were warned about twice.

DiagnoseUnguardedAvailabilityViolations now has more work to do in some cases, particularly in functions with many (possibly deeply) nested lambdas and classes, but the big-O is the same, and the simplicity of the approach and the fact that it fixes at least two bugs feels like a strong win.

Differential Revision: https://reviews.llvm.org/D102338

[lld:elf] Weaken the requirement for a computed binding to be STB_LOCAL

Given the following scenario:

```
// Cat.cpp
struct Animal { virtual void makeNoise() const = 0; };
struct Cat : Animal { void makeNoise() const override; };

extern "C" int puts(char const *);
void Cat::makeNoise() const { puts("Meow"); }
void doThingWithCat(Animal *a) { static_cast<Cat *>(a)->makeNoise(); }

// CatUser.cpp
struct Animal { virtual void makeNoise() const = 0; };
struct Cat : Animal { void makeNoise() const override; };

void doThingWithCat(Animal *a);

void useDoThingWithCat() {
  Cat *d = new Cat;
  doThingWithCat(d);
}

// cat.ver
{
  global: _Z17useDoThingWithCatv;
  local: *;
};

$ clang++ Cat.cpp CatUser.cpp -fpic -flto=thin -fwhole-program-vtables
-shared -O3 -fuse-ld=lld -Wl,--lto-whole-program-visibility
-Wl,--version-script,cat.ver
```

We cannot devirtualize `Cat::makeNoise`. The issue is complex:

Due to `-fsplit-lto-unit` and usage of type metadata, we place the Cat
vtable declaration into module 0 and the Cat vtable definition with type
metadata into module 1, causing duplicate entries (Undefined followed by
Defined) in the `lto::InputFile::symbols()` output.
In `BitcodeFile::parse`, after processing the `Undefined` then the
`Defined`, the final state is `Defined`.
In `BitcodeCompiler::add`, for the first symbol, `computeBinding`
returns `STB_LOCAL`, then we reset it to `Undefined` because it is
prevailing (`versionId` is `preserved`). For the second symbol, because
the state is now `Undefined`, `computeBinding` returns `STB_GLOBAL`,
causing `ExportDynamic` to be true and suppressing devirtualization.

In D77280, the `computeBinding` change used a stricter `isDefined()`
condition to make weak``Lazy` symbol work.
This patch relaxes the condition to weaker `!isLazy()` to keep it
working while making the devirtualization work as well.

Differential Revision: https://reviews.llvm.org/D98686

Making Instrumentation aware of LoopNest Pass

Intrumentation callbacks are not made aware of LoopNest passes. From the loop pass manager, we can pass the outermost loop of the LoopNest to instrumentation in case of LoopNest passes.

The current patch made the change in two places in StandardInstrumentation.cpp. I will submit a proper patch where the OuterMostLoop is passed from the LoopPassManager to the call backs. That way we will avoid making changes at multiple places in StandardInstrumentation.cpp.

A testcase also will be submitted.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D102463

Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass"

This reverts commit d65c32fb41b03a35a2a16330ba1ea15cf6818f04.

[libomptarget] [amdgpu] Added LDS usage to the kernel trace

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103059

Revert "Do not create LLVM IR `constant`s for objects with dynamic initialisation"

This reverts commit 13dd65b3a1a3ac049b5f3a9712059f7c61649bea.
Breaks check-clang on macOS, see https://reviews.llvm.org/D102693

[NFC][scudo] Add paramenters DCHECKs

Reviewed By: hctim

Differential Revision: https://reviews.llvm.org/D103042

lld-coff: Simplify a few lambda uses after 7975dd033cb9

Add a range-based wrapper for std::unique(begin, end, binary_predicate)

[NFC][OMP] Fix 'unused' warning

[NFC][scudo] Avoid cast in test

[dsymutil] Emit an error when the Mach-O exceeds the 4GB limit.

The Mach-O object file format is limited to 4GB because its used of
32-bit offsets in the header. It is possible for dsymutil to (silently)
emit an invalid binary. Instead of having consumers deal with this, emit
an error instead.

[dsymutil] Use EXIT_SUCCESS and EXIT_FAILURE (NFC)

[dsymutil] Compute the output location once per input file (NFC)

Compute the location of the output file just once outside the loop over
the different architectures.

PR50456: Properly handle multiple escaped newlines in a '*/'.

[scudo] Add unmapTestOnly() to secondary.

When trying to track down a vaddr-poisoning bug, I found that that the
secondary cache isn't emptied on test teardown. We should probably do
that to make the tests hermetic. Otherwise, repeating the tests lots of
times using --gtest_repeat fails after the mmap vaddr space is
exhausted.

To repro:
$ ninja check-scudo_standalone # build
$ ./projects/compiler-rt/lib/scudo/standalone/tests/ScudoUnitTest-x86_64-Test \
--gtest_filter=ScudoSecondaryTest.*:-ScudoSecondaryTest.SecondaryCombinations \
--gtest_repeat=10000

Reviewed By: cryptoad

Differential Revision: https://reviews.llvm.org/D102874

[mlir-opt] Don't enable `printOpOnDiagnostic` if it was explicitly disabled.

We are currently explicitly setting the flag solely based on the value of `-verify`, which ends up ignoring the situation where the user explicitly disabled this option from the command line.

Differential Revision: https://reviews.llvm.org/D102952

[SLP] Fix "gathering" of insertelement instructions

For rare exceptional case vector tree node (insertelements for now only)
is marked as `NeedToGather`, this case is processed by patch. Follow-up
of D98714 to fix bug reported here https://reviews.llvm.org/D98714#2764135.

Differential Revision: https://reviews.llvm.org/D102675

[OpenMP] Fix crashing critical section with hint clause

Runtime was using the default lock type without using the hint.

Differential Revision: https://reviews.llvm.org/D102955

[libomptarget] [amdgpu] Fix copy-paste error setting NumThreads for a corner case.

Fix the case where NumTeams was set incorrectly instead of NumThreads

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103037

[lldb][NFC] Remove unused header from Target

Should have been removed with 4c0b0de904a5622c33e3ed97e86c6792fbc13feb
but I forgot to do so.

[mlir] Lower sm version for TensorCore intergration tests

Those tests only require sm70, this allows to run those integration
tests on more hardware.

Differential Revision: https://reviews.llvm.org/D103049

[compiler-rt][scudo] Fix sign-compare warnings

Fix buildbot failure
https://lab.llvm.org/buildbot/#/builders/57/builds/6542/steps/6/logs/stdio

/llvm-project/llvm/utils/unittest/googletest/include/gtest/gtest.h:1629:28:
error: comparison of integers of different signs: 'const unsigned long'
and 'const int' [-Werror,-Wsign-compare]
GTEST_IMPL_CMP_HELPER_(GT, >);
~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/llvm-project/llvm/utils/unittest/googletest/include/gtest/gtest.h:1609:12:
note: expanded from macro 'GTEST_IMPL_CMP_HELPER_'
  if (val1 op val2) {\
      ~~~~ ^  ~~~~
/llvm-project/compiler-rt/lib/scudo/standalone/tests/common_test.cpp:30:3:
note: in instantiation of function template specialization
'testing::internal::CmpHelperGT<unsigned long, int>' requested here
  EXPECT_GT(OnStart, 0);
  ^

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D103029

[libc++] Assume that __wrap_iter always wraps a fancy pointer.

Not only do we conscientiously avoid using `__wrap_iter` for non-contiguous
iterators (in vector, string, span...) but also we make the assumption
(in regex) that `__wrap_iter<_Iter>` is contiguous for all `_Iter`.

So `__wrap_iter<reverse_iterator<int*>>` should be considered IFNDR,
and every `__wrap_iter` should correctly advertise contiguity in C++20.

Drive-by simplify some type traits.

Reviewed as part of https://reviews.llvm.org/D102781

Do not create LLVM IR `constant`s for objects with dynamic initialisation

When a const-qualified object has a section attribute, that
section is set to read-only and clang outputs a LLVM IR constant
for that object. This is incorrect for dynamically initialised
objects.

For example:

    int init() { return 15; }

    __attribute__((section("SA")))
    const int a = init();

a is allocated to a read-only section and is left
unintialised (zero-initialised).

This patch adds checks if an initialiser is a constant expression
and allocates objects to sections as follows:

* const-qualified objects
  - no initialiser or constant initialiser: .rodata
  - dynamic initializer: .bss
* non const-qualified objects
  - no initialiser or dynamic initialiser: .bss
  - constant initialiser: .data

(".rodata", ".data", and ".bss" names used just for explanatory
purpose)

Differential Revision: https://reviews.llvm.org/D102693

[lldb] Move ClangModulesDeclVendor ownership to ClangPersistentVariables from Target

More decoupling of plugins and non-plugins. Target doesn't need to
manage ClangModulesDeclVendor and ClangPersistentVariables is always available
in situations where you need ClangModulesDeclVendor.

Differential Revision: https://reviews.llvm.org/D102811

[flang][cmake] Set the default for FLANG_BUILD_NEW_DRIVER for oot builds

For out-of-tree builds of Flang, FLANG_BUILD_NEW_DRIVER is not inherited
from llvm-project/llvm/CMakeLists.txt. Instead, a separate definition is
required (but only for out-of-tree builds).

Differential Revision: https://reviews.llvm.org/D102323

[NFC][CSSPGO]llvm-profge] Fix Build warning dueo to an attrbute usage.

[GreedyPatternRewriter] Introduce a config object that allows controlling internal parameters. NFC.

This exposes the iterations and top-down processing as flags, and also
allows controlling whether region simplification is desirable for a client.
This allows deleting some duplicated entrypoints to
applyPatternsAndFoldGreedily.

This also deletes the Constant Preprocessing pass, which isn't worth it
on balance.

All defaults are all kept the same, so no one should see a behavior change.

Differential Revision: https://reviews.llvm.org/D102988

[CSSPGO][llvm-profgen] Report samples for untrackable frames.

Fixing an issue where samples collected for an untrackable frame is not reported. An untrackable frame refers to a frame whose caller is untrackable due to missing debug info or pseudo probe. Though the frame is connected to its parent frame through the frame pointer chain at runtime, the compiler cannot build the connection without debug info or pseudo probe. In such case we just need to report the untrackable frame as the base frame and all of its child frames.

With more samples reported I'm seeing this improves the performance of an internal benchmark by 2.5%.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D102961

fix up test from D102742

In D102742, I mistakenly put the split file designator above a bunch of
CHECK lines, which unintentionally removed the CHECKs from actually
being verified.

This can be verified by observing:
<build dir>/test/CodeGen/X86/Output/stack-protector-3.ll.tmp/main.ll

Surface clone APIs in CAPI

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D102987

[gn build] Port b510e4cf1b96

[RISCV] Add a vsetvli insert pass that can be extended to be aware of incoming VL/VTYPE from other basic blocks.

This is a replacement for D101938 for inserting vsetvli
instructions where needed. This new version changes how
we track the information in such a way that we can extend
it to be aware of VL/VTYPE changes in other blocks. Given
how much it changes the previous patch, I've decided to
abandon the previous patch and post this from scratch.

For now the pass consists of a single phase that assumes
the incoming state from other basic blocks is unknown. A
follow up patch will extend this with a phase to collect
information about how VL/VTYPE change in each block and
a second phase to propagate this information to the entire
function. This will be used by a third phase to do the
vsetvli insertion.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102737

[gn build] Port a64ebb863727

[WebAssembly] Add NullifyDebugValueLists pass

`WebAssemblyDebugValueManager` does not currently handle
`DBG_VALUE_LIST`, which is a recent addition to LLVM. We tried to
nullify them within the constructor of `WebAssemblyDebugValueManager` in
D102589, but it made the class error-prone to use because it deletes
instructions within the constructor and thus invalidates existing
iterators within the BB, so the user of the class should take special
care not to use invalidated iterators. This actually caused a bug in
ExplicitLocals pass.

Instead of trying to fix ExplicitLocals pass to make the iterator usage
correct, which is possible but error-prone, this adds
NullifyDebugValueLists pass that nullifies all `DBG_VALUE_LIST`
instructions before we run WebAssembly specific passes in the backend.
We can remove this pass after we implement handlers for
`DBG_VALUE_LIST`s in `WebAssemblyDebugValueManager` and elsewhere.

Fixes https://github.com/emscripten-core/emscripten/issues/14255.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D102999

[dfsan] Add function that prints origin stack trace to buffer

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D102451

[CUDA] Work around compatibility issue with libstdc++ 11.1.0

libstdc++ redeclares __failed_assertion multiple times and that results in the
function declared with conflicting set of attributes when we include <complex>
with __host__ __device__ attributes force-applied to all functions.

In order to work around the issue, we rename __failed_assertion within the
region with forced attributes.

See https://bugs.llvm.org/show_bug.cgi?id=50383 for the details.

Differential Revision: https://reviews.llvm.org/D102936

Enable MLIR Python bindings for TOSA.

Differential Revision: https://reviews.llvm.org/D103035

[lldb] Add missing mutex guards to TargetList::CreateTarget

TestMultipleTargets is randomly failing on the bots. The reason for that is that
the test is calling `SBDebugger::CreateTarget` from multiple threads.
`TargetList::CreateTarget` is curiously missing the guard that all of its other
member functions have, so all the threads in the test end up changing the
internal TargetList state at the same time and end up corrupting it.

Reviewed By: vsk, JDevlieghere

Differential Revision: https://reviews.llvm.org/D103020

Revert "[NFC] remove explicit default value for strboolattr attribute in tests"

This reverts commit bda6e5bee04c75b1f1332b4fd1ac4e8ef6c3c247.

See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance

[NFC] remove explicit default value for strboolattr attribute in tests

Since d6de1e1a71406c75a4ea4d5a2fe84289f07ea3a1, no attributes is quivalent to
setting attribute to false.

This is a preliminary commit for https://reviews.llvm.org/D99080

[X86] Call insertDAGNode on trunc/zext created in tryShiftAmountMod.

This puts the new nodes in the proper place in the topologically
sorted list of nodes.

Fixes PR50431, which was introduced recently in D101944.

[gn build] Port 095e91c9737b

[NFC][scudo] Small test cleanup

Fixing issues raised on D102979 review.

Reviewed By: cryptoad

Differential Revision: https://reviews.llvm.org/D102994

[Remarks] Add analysis remarks for memset/memcpy/memmove lengths

Re-landing now that the crasher this patch previously uncovered has been fixed
in: https://reviews.llvm.org/D102935

Differential revision: https://reviews.llvm.org/D102452

[X86][Costmodel] getMaskedMemoryOpCost(): don't scalarize non-power-of-two vectors with legal element type

This follows in steps of similar `getMemoryOpCost()` changes, D100099/D100684.

Intel SDM, `VPMASKMOV — Conditional SIMD Integer Packed Loads and Stores`:
```
Faults occur only due to mask-bit required memory accesses that caused the faults. Faults will not occur due to
referencing any memory location if the corresponding mask bit for that memory location is 0. For example, no
faults will be detected if the mask bits are all zero.
```
I.e., if mask is all-zeros, any address is fine.

Masked load/store's prime use-case is e.g. tail masking the loop remainder,
where for the last iteration, only first some few elements of a vector exist.

So much similarly, i don't see why must we scalarize non-power-of-two vectors,
iff the element type is something we can masked- store/load.
We simply need to legalize it, widen the mask, and be done with it.
And we even already count the cost of widening the mask.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D102990

[RISCV] Optimize getVLENFactoredAmount function.

If the local variable `NumOfVReg` isPowerOf2_32(NumOfVReg - 1) or isPowerOf2_32(NumOfVReg + 1), the ADDI and MUL instructions can be replaced with SLLI and ADD(or SUB) instructions.

Based on original patch by StephenFan.

Reviewed By: frasercrmck, StephenFan

Differential Revision: https://reviews.llvm.org/D100577

[mlir][doc] Fix links and references in top level docs directory

This is the fourth and final patch in a series of patches fixing markdown links and references inside the mlir documentation. This patch combined with the other three should fix almost every broken link on mlir.llvm.org as far as I can tell.

This patch in particular addresses all Markdown files in the top level docs directory.

Differential Revision: https://reviews.llvm.org/D103032

[Remarks] Look through inttoptr/ptrtoint for -ftrivial-auto-var-init remarks.

The crasher is a related problem that @aemerson found broke speck2k6/403.gcc
when I landed https://reviews.llvm.org/D102452. It has been reduced & modified
to reproduce without that patch.

Differential revision: https://reviews.llvm.org/D102935

CoroSplit: Replace ad-hoc implementation of reachability with API from CFG.h

The current ad-hoc implementation used to determine whether a basic
block is unreachable doesn't work correctly in the general case (for
example it won't detect successors of unreachable blocks as
unreachable). This patch replaces it with the correct API that uses a
DominatorTree to answer the question correctly and quickly.

rdar://77181156

Differential Revision: https://reviews.llvm.org/D102963

[llvm] Revert align attr test in test/Bitcode/attribute-3.3.ll

Revert testcase changed in D87304 now the upgrader can correctly handle
the align attribute in upgrader.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D102880

[mlir][tosa] Align tensor rank specifications with current spec

Deconstrains several TOSA operators to align with the current TOSA spec, including all the elementwise ops.
Note: some more ops are under consideration for further cleanup; they will follow once the spec has been updated.

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D102958

[scudo] Separate Fuchsia & Default SizeClassMap

The Fuchsia allocator config was using the default size class map.

This CL gives Fuchsia its own size class map and changes a couple of
things in the default one:
- make `SizeDelta` configurable in `Config` for a fixed size class map
  as it currently is for a table size class map;
- switch `SizeDelta` to 0 for the default config, it allows for size
  classes that allow for power of 2s, and overall better wrt pages
  filling;
- increase the max number of caches pointers to 14 in the default,
  this makes the transfer batch 64/128 bytes on 32/64-bit platforms,
  which is cache-line friendly (previous size was 48/96 bytes).

The Fuchsia size class map remains untouched for now, this doesn't
impact Android which uses the table size class map.

Differential Revision: https://reviews.llvm.org/D102783

[CVP] Add additional test for phi common val transform (NFC)

[LoopUnroll] Add additional trip multiple test (NFC)

This uses a trip multiple on a (unique) non-latch exit.

[LoopUnroll] Regenerate test checks (NFC)

Remark was added to clang tooling Diagnostic

The diff adds Remark to Diagnostic::Level for clang tooling. That makes
Remark diagnostic level ready to use in clang-tidy checks: the
clang-diagnostic-module-import becomes visible as a part of the change.

[CostModel][X86] Add missing SSE41 v2iX sext/zext costs

Also fix existing v4i8->v4i16 sext cost to match the equivalents

[libc++][doc] Update format paper status.

- Fixes paper number P1862 -> P1868. (The title was correct.)
- Marks P1868 as in progress.
- Marks P1892 as in progress.
- Marks LWG-3327 as nothing to do, since the wording change doesn't
impact the code. (Also updated on the general C++20 status page.)

[NVPTX] Fix lowering of frem for negative values

to match fmod frem result must have the dividend sign. Previous implementation
had the wrong sign when passing negative numbers. For ex: frem(-16, 7) was
returning 5 instead of -2. We should just a ftrunc instead of floor when
lowering to get the right behavior.

Differential Revision: https://reviews.llvm.org/D102528

[CostModel][X86] Regenerate sse-itoi.ll test checks

[ConstProp] propagate poison from vector reduction element(s) to result

This follows from the underlying logic for binops and min/max.
Although it does not appear that we handle this for min/max
intrinsics currently.
https://alive2.llvm.org/ce/z/Kq9Xnh

[ConstProp] add tests for vector reductions with poison elements; NFC

[VPlan] Add first VPlan version of sinkScalarOperands.

This patch adds a first VPlan-based implementation of sinking of scalar
operands.

The current version traverse a VPlan once and processes all operands of
a predicated REPLICATE recipe. If one of those operands can be sunk,
it is moved to the block containing the predicated REPLICATE recipe.
Continue with processing the operands of the sunk recipe.

The initial version does not re-process candidates after other recipes
have been sunk. It also cannot partially sink induction increments at
the moment. The VPlan only contains WIDEN-INDUCTION recipes and if the
induction is used for example in a GEP, only the first lane is used and
in the lowered IR the adds for the other lanes can be sunk into the
predicated blocks.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100258

[lldb] Readd deleted variable in the sample test

In D102771 wanted to make `test_var` global to demonstrate the a no-launch test,
but the old variable is still needed for another test. This just creates the
global var with a different name to demonstrate the no-launch functionality.

[lldb] Introduce createTestTarget for creating a valid target in API tests

At the moment nearly every test calls something similar to
`self.dbg.CreateTarget(self.getBuildArtifact("a.out"))` and them sometimes
checks if the created target is actually valid with something like
`self.assertTrue(target.IsValid(), "some useless text")`.

Beside being really verbose the error messages generated by this pattern are
always just indicating that the target failed to be created but now why.

This patch introduces a helper function `createTestTarget` to our Test class
that creates the target with the much more verbose `CreateTarget` overload that
gives us back an SBError (with a fancy error). If the target couldn't be created
the function prints out the SBError that LLDB returned and asserts for us. It
also defaults to the "a.out" build artifact path that nearly all tests are using
to avoid to hardcode "a.out" in every test.

I converted a bunch of tests to the new function but I'll do the rest of the
test suite as follow ups.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D102771

[lldb] Reland "Fix UB in half2float" to fix the ubsan bot.

This relands part of the UB fix in 4b074b49be206306330076b9fa40632ef1960823.
The original commit also added some additional tests that uncovered some
other issues (see D102845). I landed all the passing tests in
48780527dd6820698f3537f5ebf76499030ee349 and this patch is now just fixing
the UB in half2float. See D102846 for a proposed rewrite of the function.

Original commit message:

  The added DumpDataExtractorTest uncovered that this is lshifting a negative
  integer which upsets ubsan and breaks the sanitizer bot. This patch just
  changes the variable we shift to be unsigned.

[OpenCL][Docs] Minor update to OpenCL 3.0

[CostModel][X86] Improve accuracy of vector non-uniform shift costs on XOP/AVX2 targets

By llvm-mca analysis, Haswell/Broadwell has a non-uniform vector shift recip-throughput cost of the AVX2 targets at 2 for both 128 and 256-bit vectors - XOP capable targets have better 128-bit vector shifts so improve the fallback in those cases.

[VectorCombine] Fix load extract scalarization tests with assumes.

The input IR for @load_extract_idx_var_i64_known_valid_by_assume
and @load_extract_idx_var_i64_not_known_valid_by_assume_after_load
has been swapped.

This patch fixes the test so that @load_extract_idx_var_i64_known_valid_by_assume
has the assume before the load and the other test has it after.

[VPlan] Add mayReadOrWriteMemory & friends.

This patch adds initial implementation of mayReadOrWriteMemory,
mayReadFromMemory and mayWriteToMemory to VPRecipeBase.

Used by D100258.

[OpenCL] Fix test by adding SPIR triple

[AArch64][SVE] Add fixed length codegen for FP_ROUND/FP_EXTEND

Depends on D102498

Differential Revision: https://reviews.llvm.org/D102607

[AArch64][SVE] Improve codegen for fixed length vector concat

Differential Revision: https://reviews.llvm.org/D102498

[OpenCL] Add clang extension for bit-fields.

Allow use of bit-fields as a clang extension
in OpenCL. The extension can be enabled using
pragma directives.

This fixes PR45339!

Differential Revision: https://reviews.llvm.org/D101843

[ARM] Allow findLoopPreheader to return headers with multiple loop successors

The findLoopPreheader function will currently not find a preheader if it
branches to multiple different loop headers. This patch adds an option
to relax that, allowing ARMLowOverheadLoops to process more loops
successfully. This helps with WhileLoopStart setup instructions that can
branch/fallthrough to the low overhead loop and to branch to a separate
loop from the same preheader (but I don't believe it is possible for
both loops to be low overhead loops).

Differential Revision: https://reviews.llvm.org/D102747

Recommit "[VectorCombine] Scalarize vector load/extract."

This reverts commit 94d54155e2f38b56171811757044a3e6f643c14b.

This fixes a sanitizer failure by moving scalarizeLoadExtract(I)
before foldSingleElementStore(I), which may remove instructions.

[ARM] Ensure WLS preheader blocks have branches during memcpy lowering

This makes sure that the blocks created for lowering memcpy to loops end
up with branches, even if they fall through to the successor. Otherwise
IfCvt is getting confused with unanalyzable branches and creating
invalid block layouts.

The extra branches should be removed as the tail predicated loop is
finalized in almost all cases.

[ARM] Fix inline memcpy trip count sequence

The trip count for a memcpy/memset will be n/16 rounded up to the
nearest integer. So (n+15)>>4. The old code was including a BIC too, to
clear one of the bits, which does not seem correct. This remove the
extra BIC.

Note that ideally this would never actually be generated, as in the
creation of a tail predicated loop we will DCE that setup code, letting
the WLSTP perform the trip count calculation. So this doesn't usually
come up in testing (and apparently the ARMLowOverheadLoops pass does not
do any sort of validation on the tripcount). Only if the generation of
the WLTP fails will it use the incorrect BIC instructions.

Differential Revision: https://reviews.llvm.org/D102629

[MLIR] Drop old cmake var names

Drop old cmake variable names that were kept around so that zorg
buildbot could be migrated, which has now happened (D102977). D102976
had fixed the inconsistent names.

Differential Revision: https://reviews.llvm.org/D102997

[RISCV] Prevent store combining from infinitely looping

RVV code generation does not successfully custom-lower BUILD_VECTOR in all
cases. When it resorts to default expansion it may, on occasion, be expanded to
scalar stores through the stack. Unfortunately these stores may then be picked
up by the post-legalization DAGCombiner which merges them again. The merged
store uses a BUILD_VECTOR which is then expanded, and so on.

This patch addresses the issue by overriding the `mergeStoresAfterLegalization`
hook. A lack of granularity in this method (being passed the scalar type) means
we opt out in almost all cases when RVV fixed-length vector support is enabled.
The only exception to this rule are mask vectors, which are always either
custom-lowered or are expanded to a load from a constant pool.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D102913

[debuginfo-tests] Stop using installed LLDB and remove redundancy

The removed code just replicated what use_llvm_tool does, plus looked
for an installed LLDB on the PATH to use. In a monorepo world, it seems
likely that if people want to run the tests that require LLDB, they
should enable and build LLDB itself. If users really want to use the
installed LLDB executable, they can specify the path to the executable
as an environment variable "LLDB".

See the discussion in https://reviews.llvm.org/D95339#2638619 for
more details.

Reviewed by: jmorse, aprantl

Differential Revision: https://reviews.llvm.org/D102680

[NFCI][LoopIdiom] 'left-shift until bittest': assert that BaseX is loop-invariant

Given that BaseX is an incoming value when coming from the preheader,
it *should* be loop-invariant, but let's just document this assumption.

[LoopIdiom] 'logical right shift until zero': the value must be loop-invariant

As per the reproducer provided by Mikael Holmén in post-commit review.

flang: include limits

Revert "[VectorCombine] Scalarize vector load/extract."

This reverts commit 86497785d540e59eaca24bed4219ddec183cbc9b.

One of the tests causes an ASAN failure.
https://lab.llvm.org/buildbot/#/builders/5/builds/7927/steps/12/logs/stdio

[CostModel][X86] Improve accuracy of vXi64 MUL costs on AVX2/AVX512 targets

By llvm-mca analysis, Haswell/Broadwell has the worst v4i64 recip-throughput cost of the AVX2 targets at 6 (vs the currently used cost of 8). Similarly SkylakeServer (our only AVX512 target model) implements PMULLQ with an average cost of 1.5 (rounded up to 2.0), and the PMULUDQ-sequence (without AVX512DQ) as a cost of 6.

[AMDGPU][Libomptarget] Remove global KernelNameMap

KernelNameMap contains entries like "key.kd" => key which clearly
could be replaced by simple logic of removing suffix from the key.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102691

[Debug-Info]update section name to match AIX behaviour; nfc

[VectorCombine] Scalarize vector load/extract.

This patch adds a new combine that tries to scalarize chains of
`extractelement (load %ptr), %idx` to `load (gep %ptr, %idx)`. This is
profitable when extracting only a few elements out of a large vector.

At the moment, `store (extractelement (load %ptr), %idx), %ptr`
operations on large vectors result in huge code in the backend.

This can easily be triggered by using the matrix extension, e.g.
https://clang.godbolt.org/z/qsccPdPf4

This should complement D98240.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D100273

[analyzer] Correctly propagate ConstructionContextLayer thru ParenExpr

Previously, information about `ConstructionContextLayer` was not
propagated thru causing the expression like:

Var c = (createVar());

To produce unrelated temporary for the `createVar()` result and conjure
a new symbol for the value of `c` in C++17 mode.

Reviewed By: steakhal

Patch By: tomasz-kaminski-sonarsource!

Differential Revision: https://reviews.llvm.org/D102835

[Attributor] Introduce a helper do deal with constant type mismatches

If we simplify values we sometimes end up with type mismatches. If the
value is a constant we can often cast it though to still allow
propagation. The logic is now put into a helper and it replaces some
ad hoc things we did before.

This also introduces the AA namespace for abstract attribute related
functions and types.

[Attributor] Teach AAIsDead about undef values

Not only if the branch or switch condition is dead but also if it is
assumed `undef` we can delay AAIsDead exploration.