review.tizen.org Git - platform/upstream/llvm.git/log

[RISCV] Add IR intrinsics for vmsge(u).vv/vx/vi.

These instructions don't really exist, but we have ways we can
emulate them.

.vv will swap operands and use vmsle().vv. .vi will adjust the
immediate and use .vmsgt(u).vi when possible. For .vx we need to
use some of the multiple instruction sequences from the V extension
spec.

For unmasked vmsge(u).vx we use:
  vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd

For cases where mask and maskedoff are the same value then we have
vmsge{u}.vx v0, va, x, v0.t which is the vd==v0 case that
requires a temporary so we use:
  vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt

For other masked cases we use this sequence:
  vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
We trust that register allocation will prevent vd in vmslt{u}.vx
from being v0 since v0 is still needed by the vmxor.

Differential Revision: https://reviews.llvm.org/D100925

[RISCV] Add missing tests for vector type for second operand of vmsgt and vmsgtu IR intrinsics.

Refactor to use new multiclass instead of individual patterns.

We already supported this due to SEW=64 on RV32, but we didn't have
test cases for all the types we supported.

Part of D100925

[RISCV] Support vector type for second operand of vmfge and vmfgt IR intrinsics.

We don't have instructions for these, but can swap the operands
to use vmle/vmflt. This makes the IR interface more consistent and
simplifies the frontend implementation.

Part of D100925

[NFC] Remove reference to file deleted by D100981.

[scudo] Check if MADV_DONTNEED zeroes memory

QEMU just ignores MADV_DONTNEED
https://github.com/qemu/qemu/blob/b1cffefa1b163bce9aebc3416f562c1d3886eeaa/linux-user/syscall.c#L11941

Depends on D100998.

Differential Revision: https://reviews.llvm.org/D101031

[sanitizer] Use COMPILER_RT_EMULATOR with gtests

Differential Revision: https://reviews.llvm.org/D100998

[flang] Update recently added OpenMP tests to use the new driver

Switching from `%f18` to `%flang_fc1` in LIT tests added in
https://reviews.llvm.org/D91159. This way these tests are run with the
new driver, `flang-new`, when enabled (i.e. when
`FLANG_BUILD_NEW_DRIVER` is set).

Differential Revision: https://reviews.llvm.org/D101078

Temporarily revert the code part of D100981 "Delete le32/le64 targets"

This partially reverts commit 77ac823fd285973cfb3517932c09d82e6a32f46d.

Halide uses le32/le64 (https://github.com/halide/Halide/pull/5934).
Temporarily brings back the code part to give them some time for migration.

[lsan] Temporarily disable new check broken on arm7

[RISCV] Turn splat shuffles of vector loads into strided load with stride of x0.

Implementations are allowed to optimize an x0 stride to perform
less memory accesses. This is the case in SiFive cores.

No idea if this is the case in other implementations. We might
need a tuning flag for this.

Reviewed By: frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D100815

[lldb] Fix that the expression commands --top-level flag overwrites --allow-jit false

The `--allow-jit` flag allows the user to force the IR interpreter to run the
provided expression.

The `--top-level` flag parses and injects the code as if its in the top level
scope of a source file.

Both flags just change the ExecutionPolicy of the expression:
* `--allow-jit true` -> doesn't change anything (its the default)
* `--allow-jit false` -> ExecutionPolicyNever
* `--top-level` -> ExecutionPolicyTopLevel

Passing `--allow-jit false` and `--top-level` currently causes the `--top-level`
to silently overwrite the ExecutionPolicy value that was set by `--allow-jit
false`. There isn't any ExecutionPolicy value that says "top-level but only
interpret", so I would say we reject this combination of flags until someone
finds time to refactor top-level feature out of the ExecutionPolicy enum.

The SBExpressionOptions suffer from a similar symptom as `SetTopLevel` and
`SetAllowJIT` just silently disable each other. But those functions don't have
any error handling, so not a lot we can do about this in the meantime.

Reviewed By: labath, kastiglione

Differential Revision: https://reviews.llvm.org/D91780

[RISCV] Use stack temporary to splat two GPRs into SEW=64 vector on RV32.

Rather than doing splatting each separately and doing bit manipulation
to merge them in the vector domain, copy the data to the stack
and splat it using a strided load with x0 stride. At least on
some implementations this vector load is optimized to not do
a load for each element.

This is equivalent to how we move i64 to f64 on RV32.

I've only implemented this for the intrinsic fallbacks in this
patch. I think we do similar splatting/shifting/oring in other
places. If this is approved, I'll refactor the others to share
the code.

Differential Revision: https://reviews.llvm.org/D101002

[Hexagon] Add HVX intrinsics for conditional vector loads/stores

Intrinsics for the following instructions are added. The intrinsic
name is "int_hexagon_<inst>[_128B]", e.g.
  int_hexagon_V6_vL32b_pred_ai        for 64-byte version
  int_hexagon_V6_vL32b_pred_ai_128B   for 128-byte version

V6_vL32b_pred_ai        if (Pv4) Vd32 = vmem(Rt32+#s4)
V6_vL32b_pred_pi        if (Pv4) Vd32 = vmem(Rx32++#s3)
V6_vL32b_pred_ppu       if (Pv4) Vd32 = vmem(Rx32++Mu2)
V6_vL32b_npred_ai       if (!Pv4) Vd32 = vmem(Rt32+#s4)
V6_vL32b_npred_pi       if (!Pv4) Vd32 = vmem(Rx32++#s3)
V6_vL32b_npred_ppu      if (!Pv4) Vd32 = vmem(Rx32++Mu2)

V6_vL32b_nt_pred_ai     if (Pv4) Vd32 = vmem(Rt32+#s4):nt
V6_vL32b_nt_pred_pi     if (Pv4) Vd32 = vmem(Rx32++#s3):nt
V6_vL32b_nt_pred_ppu    if (Pv4) Vd32 = vmem(Rx32++Mu2):nt
V6_vL32b_nt_npred_ai    if (!Pv4) Vd32 = vmem(Rt32+#s4):nt
V6_vL32b_nt_npred_pi    if (!Pv4) Vd32 = vmem(Rx32++#s3):nt
V6_vL32b_nt_npred_ppu   if (!Pv4) Vd32 = vmem(Rx32++Mu2):nt

V6_vS32b_pred_ai        if (Pv4) vmem(Rt32+#s4) = Vs32
V6_vS32b_pred_pi        if (Pv4) vmem(Rx32++#s3) = Vs32
V6_vS32b_pred_ppu       if (Pv4) vmem(Rx32++Mu2) = Vs32
V6_vS32b_npred_ai       if (!Pv4) vmem(Rt32+#s4) = Vs32
V6_vS32b_npred_pi       if (!Pv4) vmem(Rx32++#s3) = Vs32
V6_vS32b_npred_ppu      if (!Pv4) vmem(Rx32++Mu2) = Vs32

V6_vS32Ub_pred_ai       if (Pv4) vmemu(Rt32+#s4) = Vs32
V6_vS32Ub_pred_pi       if (Pv4) vmemu(Rx32++#s3) = Vs32
V6_vS32Ub_pred_ppu      if (Pv4) vmemu(Rx32++Mu2) = Vs32
V6_vS32Ub_npred_ai      if (!Pv4) vmemu(Rt32+#s4) = Vs32
V6_vS32Ub_npred_pi      if (!Pv4) vmemu(Rx32++#s3) = Vs32
V6_vS32Ub_npred_ppu     if (!Pv4) vmemu(Rx32++Mu2) = Vs32

V6_vS32b_nt_pred_ai     if (Pv4) vmem(Rt32+#s4):nt = Vs32
V6_vS32b_nt_pred_pi     if (Pv4) vmem(Rx32++#s3):nt = Vs32
V6_vS32b_nt_pred_ppu    if (Pv4) vmem(Rx32++Mu2):nt = Vs32
V6_vS32b_nt_npred_ai    if (!Pv4) vmem(Rt32+#s4):nt = Vs32
V6_vS32b_nt_npred_pi    if (!Pv4) vmem(Rx32++#s3):nt = Vs32
V6_vS32b_nt_npred_ppu   if (!Pv4) vmem(Rx32++Mu2):nt = Vs32

[OpenMP] Add function for setting LIBOMPTARGET_INFO at runtime

Summary:
This patch adds a new runtime function __tgt_set_info_flag that allows the
user to set the information level at runtime without using the environment
variable. Using this will require an extern function, but will eventually be
added into an auxilliary library for OpenMP support functions.

This patch required moving the current InfoLevel to a global variable which must
be instantiated by each plugin.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D100774

Fix memory leak in MicrosoftDemangleNodes's Node::toString

The buffer we turn into a std::string here is malloc'd and should be
free'd before we return from this function.

Follow up to LLDB leak fixes such as D100806.

Reviewed By: mstorsjo, rupprecht, MaskRay

Differential Revision: https://reviews.llvm.org/D100843

[flang] Fix spurious errors from runtime derived type table construction

Andrezj W. @ Arm discovered that the runtime derived type table
building code in semantics was detecting fatal errors in the tests
that the f18 driver wasn't printing. This patch fixes f18 so that
these messages are printed; however, the messages were not valid user
errors, and the rest of this patch fixes them up.

There were two sources of the bogus errors. One was that the runtime
derived type information table builder was calculating the shapes of
allocatable and pointer array components in derived types, and then
complaining that they weren't constant or LEN parameter values, which
of course they couldn't be since they have to have deferred shapes
and those bounds were expressions like LBOUND(component,dim=1).

The second was that f18 was forwarding the actual LEN type parameter
expressions of a type instantiation too far into the uses of those
parameters in various expressions in the declarations of components;
when an actual LEN type parameter is not a constant value, it needs
to remain a "bare" type parameter inquiry so that it will be lowered
to a descriptor inquiry and acquire a captured expression value.

Fixing this up properly involved: moving some code into new utility
function templates in Evaluate/tools.h, tweaking the rewriting of
conversions in expression folding to elide needless integer kind
conversions of type parameter inquiries, making type parameter
inquiry folding *not* replace bare LEN type parameters with
non-constant actual parameter values, and cleaning up some
altered test results.

Differential Revision: https://reviews.llvm.org/D101001

[dfsan] Track origin at loads

    The first version of origin tracking tracks only memory stores. Although
    this is sufficient for understanding correct flows, it is hard to figure
    out where an undefined value is read from. To find reading undefined values,
    we still have to do a reverse binary search from the last store in the chain
    with printing and logging at possible code paths. This is
    quite inefficient.

    Tracking memory load instructions can help this case. The main issues of
    tracking loads are performance and code size overheads.

    With tracking only stores, the code size overhead is 38%,
    memory overhead is 1x, and cpu overhead is 3x. In practice #load is much
    larger than #store, so both code size and cpu overhead increases. The
    first blocker is code size overhead: link fails if we inline tracking
    loads. The workaround is using external function calls to propagate
    metadata. This is also the workaround ASan uses. The cpu overhead
    is ~10x. This is a trade off between debuggability and performance,
    and will be used only when debugging cases that tracking only stores
    is not enough.

Reviewed By: gbalats

Differential Revision: https://reviews.llvm.org/D100967

[libc++] [test] Fix nodiscard_extensions.pass.cpp in _LIBCPP_DEBUG mode.

`std::clamp(2, 1, 3, std::greater<int>())` has UB because (1 > 3) is false.
Swap the operands to fix the _LIBCPP_ASSERT failure in this test.

[flang][openmp] Add General Semantic Checks for Allocate Directive

This patch adds semantic checks for the General Restrictions of the
Allocate Directive.

Since the requires directive is not yet implemented in Flang, the
restriction:
```
allocate directives that appear in a target region must
specify an allocator clause unless a requires directive with the
dynamic_allocators clause is present in the same compilation unit
```
will need to be updated at a later time.

A different patch will be made with the Fortran specific restrictions of
this directive.

I have used the code from https://reviews.llvm.org/D89395 for the
CheckObjectListStructure function.

Co-authored-by: Isaac Perry <isaac.perry@arm.com>
Reviewed By: clementval, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D91159

[x86] remove stale comment from test file; NFC

[libc++] Eliminate macro _LIBCPP_UNUSED_VAR. NFCI.

Reviewed as part of https://reviews.llvm.org/D100737

[libc++] Fix some typos and remove unused macros. NFCI.

Reviewed as part of https://reviews.llvm.org/D100737

[SLP]Skip undefs trying to find perfect/shuffled tree entries matching.

We can skip check for undefs trying to find perfect/shuffled tree
entries matching, they can be ignored completely improving the final
cost/vectorization results.

Differential Revision: https://reviews.llvm.org/D101061

[llvm-profgen] A couple tweaks to the testing harness.

1. Remove unnecessary filtering code.
2. Add llvm-profgen to tool substitutions.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D101006

[flang] Fix checking of argument passing for parameterized derived types

We were erroneously not taking into account the constant values of LEN type
parameters of parameterized derived types when checking for argument
compatibility.  The required checks are identical to those for assignment
compatibility.  Since argument compatibility is checked in .../lib/Evaluate and
assignment compatibility is checked in .../lib/Semantics, I moved the common
code into .../lib/Evaluate/tools.cpp and changed the assignment compatibility
checking code to call it.

After implementing these new checks, tests in resolve53.f90 were failing
because the tests were erroneous.  I fixed these tests and added new tests
to call03.f90 to test argument passing of parameterized derived types more
completely.

Differential Revision: https://reviews.llvm.org/D100989

[flang][driver][Revert] Reverts f18 to allow options passed to -W

Reviewed By: awarzynski

Differential Revision: https://reviews.llvm.org/D100883

[PowerPC] Add missing casts for vec_xlds and vec_load_splats

The previous commits just missed some pointer casts and ended up
producing warnings.

[PowerPC] Add vec_vclz as an alias for vec_cntlz in altivec.h

Another addition for compatibility with XLC. The functions have the
same overloads so just add it as a preprocessor define.

[PowerPC] Add vec_load_splats to altivec.h

Add these overloads for compatibility with XLC. This is a word
load-and-splat.

[PowerPC] Add vec_xlds to altivec.h

Add these overloads for compatibility with XLC. This is a doubleword
load-and-splat.

[PowerPC] Add vec_roundz as alias for vec_trunc in altivec.h

Add the overloads for compatibility with XLC.

[PowerPC] Add vec_roundp as alias for vec_ceil

Add the overloads for compatibility with XLC.

[PowerPC] Add missing VSX guard for vec_roundm with vector double

The guard was missed in the previous commit.

[PowerPC] Add vec_roundm as alias for vec_floor in altivec.h

Add the overloads for compatibility with XLC.

[libc++] Re-apply `std::indirectly_readable` and `std::indirectly_writable`

That was originally committed in 04733181b513 and then reverted in
a9f11cc0d965 because it broke several people.

The problem was a missing include of __iterator/concepts.h, which has now
been fixed.

Differential Revision: https://reviews.llvm.org/D100073

[Hexagon] Unmasked and masked load pair to dame bae -? one load and selects

[lld/mac] tweak comment in a test

[AArch64] Block tryCombineToBSL combines for vectors wider than NEON

There are no patterns for the AArch64ISD::BSP ISD node for anything
other than NEON vectors at the moment. As a result, if we hit these
combines for vectors wider than a NEON vector (such as what we might get
with fixed length SVE) we will fail to lower.

This patch simply prevents us from attempting the combines if the input
vector type is too wide.

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D100961

[OPENMP]Mark test as unsupported to avoid possible unexpected passes,
NFC.

[LoopVectorize] Fix bug where predicated loads/stores were dropped

This commit fixes a bug where the loop vectoriser fails to predicate
loads/stores when interleaving for targets that support masked
loads and stores.

Code such as:

     1  void foo(int *restrict data1, int *restrict data2)
     2  {
     3    int counter = 1024;
     4    while (counter--)
     5      if (data1[counter] > data2[counter])
     6        data1[counter] = data2[counter];
     7  }

... could previously be transformed in such a way that the predicated
store implied by:

    if (data1[counter] > data2[counter])
       data1[counter] = data2[counter];

... was lost, resulting in miscompiles.

This bug was causing some tests in llvm-test-suite to fail when built
for SVE.

Differential Revision: https://reviews.llvm.org/D99569

[SLP]Replace more `TTI` with `TTIRef`, NFC.

To pacify MSVC buildbots.

[SLP]Added explicit ref to TargetTransformInfo to try to pacify MSVC
buildbots, NFC.

[lld/mac] make a few "named parameter comments" more consistent

Most of LLVM and almost all of lld/MachO uses `/*foo=*/bar` style.
No behavior change.

[SLP]Improve cost model for the vectorized extractelements.

1. No need to call `areAllUsersVectorized` as later the cost is
   calculated only if the instruction has one use and gets vectorized.
2. Need to calculate the cost of the dead extractelement more precisely,
   taking the vector type of the vector operand, not the resulting
   vector type.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D99980

[LoopIdiom] Added testcase for double memset (fixed in LLVM 12); NFC

[C++4OpenCL] Add extra diagnostics for kernel argument types

Add restrictions on type layout (PR48099):
- Types passed by pointer or reference must be standard layout types.
- Types passed by value must be POD types.

Patch by olestrohm (Ole Strohm)!

Differential Revision: https://reviews.llvm.org/D100471

[mlir] Move PyConcreteAttribute to header. NFC.

This allows out-of-tree users to derive PyConcreteAttribute to bind custom
attributes.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D101063

[OpenCL] Add missing C++ legacy atomics with generic

https://reviews.llvm.org/D62335 added some C++ for OpenCL specific
builtins to opencl-c.h, but these were not mirrored to the TableGen
builtin functions yet.

The TableGen builtins machinery does not have dedicated version
handling for C++ for OpenCL at the moment: all builtin versioning is
tied to `LangOpts.OpenCLVersion` (i.e., the OpenCL C version). As a
workaround, to add builtins that are only available in C++ for OpenCL,
we define a function extension guarded by the __cplusplus macro.

Differential Revision: https://reviews.llvm.org/D100935

Fixes PR50041.

Revert "[Hexagon] Masked and unmasked load to same base -> load and two selects"

This reverts commit 96dc8d7e7dee68592e56d69184b92fcb021cdb9c.

It breaks a few builds.

AArch64: support mixed-size fp <-> int conversions in GlobalISel.

AArch64: expand G_DIVREM operations in GlobalISel

We don't have a specific instruction for these, so they should be expanded to
whatever separate division & multiplication is needed.

Revert "[libcxx][iterator] adds `std::indirectly_readable` and `std::indirectly_writable`"

This reverts commit 04733181b5136e2b3df2b37c6bdd9e25f8afecd0 which was
failing for multiple people.

Update shebang for clang-format-diff script to python3.

Different distributions have different strategies migrating the `python` symlink. Debian and its derivatives provide `python-is-python2` and `python-is-python3`. If neither is installed, the user gets no `/usr/bin/python`. The clang-format-diff script and consequently `arc diff` can thus fail with a python not found error. Since we require python greater than 3.6 as part of llvm prerequisites (https://llvm.org/docs/GettingStarted.html#software), let's go ahead and update this shebang.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D100968

[Hexagon] Masked and unmasked load to same base -> load and two selects

[lld/mac] add a comment pointing to a test that took me a while to find

[X86] Regenerate atomic-eflags-reuse.ll

[LTO] Caching.h - remove unused <string> include. NFCI.

[InstCombine][NFC] Use --check-globals flag in tests.

This patch adds strings content checking to printf-2.ll via --check-globals flag.

Split off from D100724.

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D101034

[SimplifyLibCalls][NFC] Use StringRef::back instead explicit indexing.

Split off from D100724.

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D101032

[DAGCombiner] Allow operand of step_vector to be negative.

It is proper to relax non-negative limitation of step_vector.
Also this patch adds more combines for step_vector:
(sub X, step_vector(C)) -> (add X, step_vector(-C))

Differential Revision: https://reviews.llvm.org/D100812

[mlir] Add `tensor.reshape`.

This operation a counterpart of `memref.reshape`.

RFC [Reshape Ops Restructuring](https://llvm.discourse.group/t/rfc-reshape-ops-restructuring/3310)

Differential Revision: https://reviews.llvm.org/D100971

[X86][AMX][NFC] Remove assert for comparison between different BBs.

SmallSet may use operator `<` when we insert MIRef elements, so we
cannot limit the comparison between different BBs.

We allow MIRef() to be less that any initialized MIRef object, otherwise,
we always reture false when compare between different BBs.

Differential Revision: https://reviews.llvm.org/D101039

[gn build] (manually) port aee6c86c4d better

"EmptyNodeIntrospection.inc.in" needs to be a source of the action,
so that ninja knows to rerun this action if that input changes.

[AST] Make comment a bit more specific

[gn build] (manually) port aee6c86c4d

[lldb/elf] Avoid side effects in function calls ParseUnwindSymbols

This addresses post-commit feedback to cd64273.

clang: libstdc++ LWM is 4.8.3

Document oldest libstdc++ as 4.8.3, remove a hack for a 4.6 issue.

Differential Revision: https://reviews.llvm.org/D100465

[AArch64][SVE] Regression test all ACLE tests with C++

We found issues with a number of intrinsics when building them with
C++, so it makes sense to guard these tests with some extra RUN lines
to build the tests in C++ mode.

[-Wcalled-once] Do not run analysis on Obj-C++

Objective-C++ is not yet suppoerted.

rdar://76729552

Differential Revision: https://reviews.llvm.org/D100955

[MLIR][Shape] Add canonicalizations for `shape.broadcast`

Eliminate empty shapes from the operands, partially fold all constant shape
operands, and fix normal folding.

Differential Revision: https://reviews.llvm.org/D100634

[lldb] Don't leak LineSequence in PDB parsers

`InsertSequence` doesn't take ownership of the pointer so releasing this pointer
is just leaking memory.

Follow up to D100806 that was fixing other leak sanitizer test failures

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D100846

[clang][deps] Check extra args in tests

These flags are being generated by `clang-scan-deps` and it makes sense to ensure it keeps doing so.

[AST] Add clarification comment

[lldb] XFAIL TestStoppedInStaticMemberFunction on Windows

It seems we can't find the symbols of static members on Windows? The bug is not
relevant to what this test is actually testing so let's just XFAIL it.

Fix typo "beneficiates" in comments

[AST] De-duplicate empty node introspection

This way we can add support for other nodes without duplication.

Differential Revision: https://reviews.llvm.org/D98774

[lldb-vscode] Use a DenseMap to pacify overly aggressive linters

Some linters get rather upset upon seeing
`std::unordered_map<const char*`, because it looks like a map of
strings but isn't. lldb uses interned strings so this is not a problem.
DenseMap is a better data structure for this anyways, so use that
instead.

[Bitcode] Ensure DIArgList in bitcode has no null or forward metadata refs

This patch fixes an issue in which ConstantAsMetadata arguments to a
DIArglist, as well as the Constant values referenced by that metadata,
would not be always be emitted correctly into bitcode. This patch fixes
this issue firstly by searching for ConstantAsMetadata in DIArgLists
(previously we would only search for them when directly wrapped in
MetadataAsValue), and secondly by enumerating all of a DIArgList's
arguments directly prior to enumerating the DIArgList itself.

This patch also adds a number of asserts, and no longer treats the
arguments to a DIArgList as optional fields when reading/writing to
bitcode.

Differential Revision: https://reviews.llvm.org/D100572

[Matrix] Support #pragma clang fp

From https://bugs.llvm.org/show_bug.cgi?id=49739:

Currently, `#pragma clang fp` are ignored for matrix types.

For the code below, the `contract` fast-math flag should be added to the generated call to `llvm.matrix.multiply` and `fadd`

```
typedef float fx2x2_t __attribute__((matrix_type(2, 2)));

void foo(fx2x2_t &A, fx2x2_t &C, fx2x2_t &B) {
#pragma clang fp contract(fast)
C = A*B + C;
}
```

Reviewed By: fhahn, mibintc

Differential Revision: https://reviews.llvm.org/D100834

MipsSEFrameLowering.h - remove unused headers. NFCI.

[X86][AVX] Add PR49971 test case

This is a llvm12 only bug, and is already avoided in trunk, but we should keep track of it.

[PowerPC] Add vec_roundc as alias for vec_rint in altivec.h

For compatibility with XLC, add these overloads.

[AST] Add NestedNameSpecifierLoc accessors to node introspection

Differential Revision: https://reviews.llvm.org/D100712

[lldb][NFC] Fix unsigned/signed cmp warning in MainLoopTest

The gtest checks compare all against unsigned int constants so this also needs
to be unsigned.

[lldb] Add support for evaluating expressions in static member functions

At the moment the expression parser doesn't support evaluating expressions in
static member functions and just pretends the expression is evaluated within a
non-member function. This causes that all static members are inaccessible when
doing unqualified name lookup.

This patch adds support for evaluating in static member functions. It
essentially just does the same setup as what LLDB is already doing for
non-static member functions (i.e., wrapping the expression in a fake member
function) with the difference that we now mark the wrapping function as static
(to prevent access to non-static members).

Reviewed By: shafik, jarin

Differential Revision: https://reviews.llvm.org/D81550

[PowerPC] Improve codegen for vector fp to int widening conversions

We currently do not utilize instructions that convert single
precision vectors to doubleword integer vectors. These conversions
come up in code occasionally and this improvement allows us to
open code some functions that need to be added to altivec.h.

[mlir] Move memref-tests from standard to memref folder.

Split memref-test from standard test and move them to the folder MemRef.

Differential Revision: https://reviews.llvm.org/D100950

[AArch64] Fix calling windows varargs with floats in fixed args from non-windows functions

When inspecting the calling convention, for calling windows functions
from a non-windows function, inspect the calling convention of
the called function, not the caller.

Also remove an unnecessary parameter to AArch64CallLowering
OutgoingArgHandler.

Differential Revision: https://reviews.llvm.org/D100890

[clang][deps] Include "-cc1" in the arguments

To simplify tools consuming dependency scanning results, prepend the "-cc1" argument by default.

Reviewed By: Bigcheese

Differential Revision: https://reviews.llvm.org/D100942

[mlir][linalg] remove interchange option on linalg to loop lowering.

The interchange option attached to the linalg to loop lowering affects only the loops and does not update the memory accesses generated in to body of the operation. Instead of performing the interchange during the loop lowering use the interchange pattern.

Differential Revision: https://reviews.llvm.org/D100758

clang-format: [JS] do not merge side-effect imports.

The if condition was testing the current element, but
forgot to check the previous element (doh), so it
would fail depending on sort order of the imports.

Differential Revision: https://reviews.llvm.org/D101020

[AMDGPU] SIWholeQuadMode: don't add duplicate implicit $exec operands

STRICT_WWM and STRICT_WQM are already defined with Uses = [EXEC], so
there is no need to add another implicit use of $exec when lowering them
to V_MOV_B32 instructions.

Differential Revision: https://reviews.llvm.org/D100969

[RISCV] Custom lowering of SET_ROUNDING

Differential Revision: https://reviews.llvm.org/D91242

[LoopVectorize] Don't create unnecessary vscale intrinsic calls

In quite a few cases in LoopVectorize.cpp we call createStepForVF
with a step value of 0, which leads to unnecessary generation of
llvm.vscale intrinsic calls. I've optimised IRBuilder::CreateVScale
and createStepForVF to return 0 when attempting to multiply
vscale by 0.

Differential Revision: https://reviews.llvm.org/D100763

[CSSPGO][llvm-profdata] Support trimming cold context when merging profiles

The change adds support for triming and merging cold context when mergine CSSPGO profiles using llvm-profdata. This is similar to the context profile trimming in llvm-profgen, however the flexibility to trim cold context after profile is generated can be useful.

Differential Revision: https://reviews.llvm.org/D100528

[libcxx] [test] Fix testing on windows with c++experimental enabled

The straightforward `AddLinkFlag('-lc++experimental')` approach doesn't
work on e.g. MSVC. For linking to libc++ itself, a more convoluted logic
is used (see configure_link_flags_cxx_library).

Differential Revision: https://reviews.llvm.org/D99177

[clang-tidy] Avoid bugprone-macro-parentheses warnings after goto argument

clang-tidy should not generate warnings for the goto argument without
parentheses, because it would be a syntax error.

The only valid case where an argument can be enclosed in parentheses is
"Labels as Values" gcc extension: https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html.
This commit adds support for the label-as-values extension as implemented in clang.

Fixes bugzilla issue 49634.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D99924

[NewPM] Mark some more wrapper passes as ignored

We shouldn't print IR when seeing these passes.

[RISCV] Use TargetConstant for condition code of RISCVISD::SELECT_CC.

The value is always an immediate and can never be in a register.
This the kind of thing TargetConstant is for.

Saves a step GenDAGISel to convert a Constant to a TargetConstant.

[GVN] Introduce loop load PRE

This patch allows PRE of the following type of loads:

```
preheader:
  br label %loop

loop:
  br i1 ..., label %merge, label %clobber

clobber:
  call foo() // Clobbers %p
  br label %merge

merge:
  ...
  br i1 ..., label %loop, label %exit

```

Into
```
preheader:
  %x0 = load %p
  br label %loop

loop:
  %x.pre = phi(x0, x2)
  br i1 ..., label %merge, label %clobber

clobber:
  call foo() // Clobbers %p
  %x1 = load %p
  br label %merge

merge:
  x2 = phi(x.pre, x1)
  ...
  br i1 ..., label %loop, label %exit

```

So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern which it is trying to address is: hot loop, with all code inlined and
provably having no side effects, and some side-effecting calls on cold path.

The worst overhead from it is, if we always take clobber block, we make 1 more load
overall (in preheader). It only matters if loop has very few iteration. If clobber block is not taken
at least once, the transform is neutral or profitable.

There are several improvements prospect open up:
- We can sometimes be smarter in loop-exiting blocks via split of critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that
  we don't know if their sum is colder than the header.

Differential Revision: https://reviews.llvm.org/D99926
Reviewed By: reames