review.tizen.org Git - platform/upstream/llvm.git/log

[AMDGPU] Fixed mfma-loop test. NFC.

[LLDB] Cleanup the DataEncoder utility. (NFC)

This commit removes unused methods from the DataEncoder class and cleans
up the API by making all the internal methods private.

Revert a hunk from 9634064cfa1b9bf7b7

This causes errors when building LLDB because the Windows implementation
doesn't implement this method:

C:\src\llvm-project\lldb\source\Plugins\ScriptInterpreter\Python\ScriptInterpreterPython.cpp(915,19): error: allocating an object of abstract class type 'lldb_private::ConnectionGenericFile'
              new ConnectionGenericFile(read_file, true));
                  ^
C:\src\llvm-project\lldb\include\lldb/Utility/Connection.h(174,28): note: unimplemented pure virtual method 'GetReadObject' in 'ConnectionGenericFile'
  virtual lldb::IOObjectSP GetReadObject() = 0;
                           ^

[LLDB] Implement pure virtual method in MockConnection

I made GetReadObject pure virtual in the base class and forgot to add
the method to the mock class.

Sink MachineFunction private method out of line

This method is private and only called from this file and doesn't need
to be inline. Saves a TargetMachine.h include in MachineFunction.h, a
popular header. The include was introduced in 98603a8153086 despite the
forward decl of LLVMTargetMachine.

[X86] Don't treat mxcsr as a register name when parsing MS inline assembly.

No instruction takes mxcsr as a an operand so we should always
treat it as an identifier name.

[LLDB] Fix another set of -Wdocumentation warnings

At this point I'm just fixing issues as I see them pop up locally in
incremental builds.

[LLDB] Remove dead code from StreamFile

[RegisterContext] Remove now unneded vestiges.

[LLDB] Fix a bunch of -Wdocumentation warnings in ExpressionParser

Remove redundant check. (NFC)

Use cheaper, equivalent predicate. (NFC)

[X86] Don't set the operation action for i16 SINT_TO_FP to Promote just because SSE1 is enabled.

Instead do custom promotion in the handler so that we can still
allow i16 to be used with fp80. And f64 without sse2.

[X86] Fix typo in comment. NFC

[X86] Move all the FP_TO_XINT/XINT_TO_FP setOperationActions into the same !useSoftFloat block. Qualify all of the Promote actions for these with !useSoftFloat too. NFCI

The Promote action doesn't apply until LegalizeDAG. By the time
we get there, we would have already softened all the FP operations
if useSoftFloat was true. So there wouldn't be any operation left
to Promote.

Rename clang-module-related *DWO* functions to *ClangModule* (NFC)

This avoids confusing them with fission-related functionality.

I also moved two accessor functions from DWARFDIE into static
functions in DWARFASTParserClang were their only use is located.

[PGO][PGSO] Temporarily disable the large working set size behavior.

Summary:
This temporarily disables the large working set size behavior in profile guided
size optimization due to internal benchmark regressions.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70207

libc++ status page: Fix HTML.

[SimplifyCFG] add test for select with FMF; NFC

Rename ParseTypeFromDWO to ParseTypeFromClangModule (NFC)

Because that is what this function really does. The old name is
misleading.

Revert "[RISCV] Use compiler-rt if no GCC installation detected"

This change causes test failures for builds configured with
-DCLANG_DEFAULT_RTLIB=compiler-rt.

This reverts commit 3289352e6bb9d2949c678c625478024bf2a5fbfb.

[SLP] fix miscompile on min/max reductions with extra uses (PR43948)

The bug manifests as replacing a reduction operand with an undef
value.

The problem appears to be limited to cases where a min/max reduction
has extra uses of the compare operand to the select.

In the general case, we are tracking "ExternallyUsedValues" and
an "IgnoreList" of the reduction operations, but those may not apply
to the final compare+select in a min/max reduction.

For that, we use replaceAllUsesWith (RAUW) to ensure that the new
vectorized reduction values are transferred to all subsequent users.

Differential Revision: https://reviews.llvm.org/D70148

[clang-format] refactor the use of the SMDiagnostics in replacement warnings

Summary:
Review comments in {D69854} recommended a simpler approach of creating the SMDiagnostics to remove much of the complexity. (thanks @thakis)

@vlad.tsyrklevich I've rebuilt on both Windows and Linux (running Linux with Address and Undefined sanitizers) over the clang code base

Reviewers: thakis, klimek, mitchell-stellar, vlad.tsyrklevich

Reviewed By: thakis

Subscribers: cfe-commits, thakis, vlad.tsyrklevich

Tags: #clang-format, #clang

Differential Revision: https://reviews.llvm.org/D69921

[LLD] [COFF] Fix automatically importing data symbols from DLLs with LTO

This broke in 51dcb292cc002, "[lld-link] diagnose undefined symbols
before LTO when possible" (very soon after the 9.0 branch, so
luckily the 9.0 release is unaffected).

The code for loading objects we believe might be needed for autoimport
(loadMinGWAutomaticImports()) does run before the new
reportUnresolvable() function, but it had a condition to only operate
on symbols from regular object files. This condition came from
resolveRemainingUndefines(), but as loadMinGWAutomaticImports() now
has to operate before the LTO, it has to operate on undefineds from
LTO objects as well.

Differential Revision: https://reviews.llvm.org/D70166

Add -disable-builtin option to opt

Summary:
The option allows to disable specific target library builtin functions,
instead of -disable-simplify-libcalls, which disables all of them.

This is a prerequisite for D70143, which fixes PR43081.

Reviewers: xbolva00, spatel, jdoerfert, efriedma

Reviewed By: efriedma

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70193

[LLDB] Fix a bunch of -Wdocumentation warnings

[dsymutil] Add -dump to llvm-bcanalyzer invocations

[TargetLowering] Increase the storage size of NumRegistersForVT to allow the type break down for v256i1 and other types to be stored correctly

v256i1 on X86 without avx512 breaks down to 256 i8 values when passed between basic blocks. But the NumRegistersForVT was sized at a byte for each VT. This results in 256 being stored as 0.

This patch enlarges the type to 16 bits and adds an assert to ensure that no information is lost when the entry is stored.

Differential Revision: https://reviews.llvm.org/D70138

[mips] Reduce number of nested `if` statements. NFC

[mips] Add test to check ELF output for JAL XGOT expansion. NFC

[mips] Add tests to check `jal sym+offset`. NFC

[LiveInterval] Allow updating subranges with slightly out-dated IR

During register coalescing, we update the live-intervals on-the-fly.
To do that we are in this strange mode where the live-intervals can
be slightly out-of-sync (more precisely they are forward looking)
compared to what the IR actually represents.
This happens because the register coalescer only updates the IR when
it is done with updating the live-intervals and it has to do it this
way because updating the IR on-the-fly would actually clobber some
information on how the live-ranges that are being updated look like.

This is problematic for updates that rely on the IR to accurately
represents the state of the live-ranges. Right now, we have only
one of those: stripValuesNotDefiningMask.
To reconcile this need of out-of-sync IR, this patch introduces a
new argument to LiveInterval::refineSubRanges that allows the code
doing the live range updates to reason about how the code should
look like after the coalescer will have rewritten the registers.
Essentially this captures how a subregister index with be offseted
to match its position in a new register class.

E.g., let say we want to merge:
    V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>

We do that by choosing a class where sub1:<2 x s32> and sub3:<4 x s32>
overlap, i.e., by choosing a class where we can find "offset + 1 == 3".
Put differently we align V2's sub3 with V1's sub1:
    V2: sub0 sub1 sub2 sub3
    V1: <offset>  sub0 sub1

This offset will look like a composed subregidx in the the class:
     V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
=>  V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>

Now if we didn't rewrite the uses and def of V1, all the checks for V1
need to account for this offset to match what the live intervals intend
to capture.

Prior to this patch, we would fail to recognize the uses and def of V1
and would end up with machine verifier errors: No live segment at def.
This could lead to miscompile as we would drop some live-ranges and
thus, miss some interferences.

For this problem to trigger, we need to reach stripValuesNotDefiningMask
while having a mismatch between the IR and the live-ranges (i.e.,
we have to apply a subreg offset to the IR.)

This requires the following three conditions:
1. An update of overlapping subreg lanes: e.g., dsub0 == <ssub0, ssub1>
2. An update with Tuple registers with a possibility to coalesce the
   subreg index: e.g., v1.dsub_1 == v2.dsub_3
3. Subreg liveness enabled.

looking at the IR to decide what is alive and what is not, i.e., calling
stripValuesNotDefiningMask.
coalescer maintains for the live-ranges information.

None of the targets that currently use subreg liveness (i.e., the targets
that fulfill #3, Hexagon, AMDGPU, PowerPC, and SystemZ IIRC) expose #1 and
and #2, so this patch also artificial enables subreg liveness for ARM,
so that a nice test case can be attached.

[TTI] Fix cast cost on vector types.

- Only split vector types when both src and dst types are splittable.

[llvm-bcanalyzer] Don't dump the contents if -dump is not passed

With all the previous refactorings this slipped through and now we
always dump the contents of the bitcode files, even if -dump is not
passed.

[AArch64][v8.3a] Add missing imp-defs on RETA*.

RETA always implicitly uses LR, unlike RET which merely has an
alias that defaults it to LR.
Additionally, RETA implicitly uses SP as well, which it uses as
a discriminator to authenticate LR.

This isn't usually noticeable, because RET_ReallyLR is used in most
of the backend. However, the post-RA scheduler, if enabled, will
cause miscompiles if the imp-uses are missing.

While there, fix a typo in the lone affected testcase.

[AArch64][v8.3a] Add LDRA '[xN]!' alias.

The instruction definition has been retroactively expanded to
allow for an alias for '[xN, 0]!' as '[xN]!'.
That wouldn't make sense on LDR, but does for LDRA.

[SLP] improve test readability; NFC

[BPF] fix clang test failure for bpf-attr-preserve-access-index-4.c

Depending on different cmake configures, clang may generate different
IR name for slot variables. Let us use the regex instead of hard
coding the name. I did the same for other bpf-attr-preserve-access-index
tests with such an approach, but somehow did not do for this one.

[RISCV] Use compiler-rt if no GCC installation detected

If a GCC installation is not detected, then this attempts to
use compiler-rt and the compiler-rt crtbegin/crtend
implementations as a fallback.

Differential Revision: https://reviews.llvm.org/D68407

Fix typo in DwarfDebug [NFC]

Don't set LLVM_NO_DEAD_STRIP on AIX

Summary:
when building plugins, as AIX has symbols in it's standard library that
must be garbage collected or we will see link errors. Export lists will
handle this instead on AIX.

Reviewers: stevewan, sfertile, jasonliu, xingxue, DiggerLin

Reviewed By: DiggerLin

Subscribers: mgorny, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70130

[BPF] add missing attribute in pragma-attribute-supported-attributes-list.test

Add the newly supported BPF specific __attribute__((preserve_access_index)
in the pragma-attribute-supported-attributes-list.test.

[SLP] reduce code duplication for min/max vs. other reductions; NFCI

[BPF] Add preserve_access_index attribute for record definition

This is a resubmission for the previous reverted commit
943436040121 with the same subject. This commit fixed the
segfault issue and addressed additional review comments.

This patch introduced a new bpf specific attribute which can
be added to struct or union definition. For example,
  struct s { ... } __attribute__((preserve_access_index));
  union u { ... } __attribute__((preserve_access_index));
The goal is to simplify user codes for cases
where preserve access index happens for certain struct/union,
so user does not need to use clang __builtin_preserve_access_index
for every members.

The attribute has no effect if -g is not specified.

When the attribute is specified and -g is specified, any member
access defined by that structure or union, including array subscript
access and inner records, will be preserved through
  __builtin_preserve_{array,struct,union}_access_index()
IR intrinsics, which will enable relocation generation
in bpf backend.

The following is an example to illustrate the usage:
  -bash-4.4$ cat t.c
  #define __reloc__ __attribute__((preserve_access_index))
  struct s1 {
    int c;
  } __reloc__;

  struct s2 {
    union {
      struct s1 b[3];
    };
  } __reloc__;

  struct s3 {
    struct s2 a;
  } __reloc__;

  int test(struct s3 *arg) {
    return arg->a.b[2].c;
  }
  -bash-4.4$ clang -target bpf -g -S -O2 t.c

A relocation with access string "0:0:0:0:2:0" will be generated
representing access offset of arg->a.b[2].c.

forward declaration with attribute is also handled properly such
that the attribute is copied and populated in real record definition.

Differential Revision: https://reviews.llvm.org/D69759

Fix comment spelling {addresing -> addressing} (NFC)

[profile] Factor out logic for mmap'ing merged profile, NFC

Split out the logic to get the size of a merged profile and to do a
compatibility check. This can be shared with both the continuous+merging
mode implementation, as well as the runtime-allocated counters
implementation planned for Fuchsia.

Lifted out of D69586.

Differential Revision: https://reviews.llvm.org/D70135

[InstCombine] propagate fast-math-flags (FMF) to select when inverting fcmp+select

As noted by the FIXME comment, this is not correct based on our current FMF semantics.
We should be propagating FMF from the final value in a sequence (in this case the
'select'). So the behavior even without this patch is wrong, but we did not allow FMF
on 'select' until recently.

But if we do the correct thing right now in this patch, we'll inevitably introduce
regressions because we have not wired up FMF propagation for 'phi' and 'select' in
other passes (like SimplifyCFG) or other places in InstCombine. I'm not seeing a
better incremental way to make progress.

That said, the potential extra damage over the existing wrong behavior from this
patch is very limited. AFAIK, the only way to have different FMF on IR in the same
function is if we have LTO inlined IR from 2 modules that were compiled using
different fast-math settings.

As seen in the tests, we may actually see some improvements with this patch because
adding the FMF to the 'select' allows matching to min/max intrinsics that were
previously missed (in the common case, the 'fcmp' and 'select' should have identical
FMF to begin with).

Next steps in the transition:

    Make similar changes in instcombine as needed.
    Enable phi-to-select FMF propagation in SimplifyCFG.
    Remove dependencies on fcmp with FMF.
    Deprecate FMF on fcmp.

Differential Revision: https://reviews.llvm.org/D69720

DWARFDebugLoclists: Add an api to get the location lists of a DWARF unit

Summary:
This avoid the need to duplicate the location lists searching logic in
various users. The "inline location list dumping" code (which is the
only user actually updated to handle DWARF v5 location lists) is
switched to this method. After adding v4 location list support, I'll
switch other users too.

Reviewers: dblaikie, probinson, JDevlieghere, aprantl, SouraVX

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70084

Remove commented out CHECK-NEXT to try and appease llvm-clang-x86_64-expensive-checks-win buildbot

PowerPC - fix uninitialized variable warnings. NFCI.

Fix uninitialized variable warning. NFCI.

Sparc - fix uninitialized variable warnings. NFCI.

PPCReduceCRLogicals - fix static analyzer warnings. NFC
- Fix uninitialized variable warnings.
- Fix null dereference warnings.

SLPVectorizer - make comparison operators + isInSchedulingRegion const

Fixes cppcheck warnings.

[clang][Tooling] Filter flags that generate output in SyntaxOnlyAdjuster

Summary:
Flags that generate output could result in failures when creating
syntax only actions. This patch introduces initial logic for filtering out
those. The first such flag is "save-temps", which saves intermediate
files(bitcode, assembly, etc.) into a specified directory.

Fixes https://github.com/clangd/clangd/issues/191

Reviewers: hokein

Subscribers: ilya-biryukov, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D70173

[clangd] Add bool return type to Index::refs API.

Summary:
Similar to fuzzyFind, the bool indicates whether there are more xref
results.

Reviewers: ilya-biryukov

Reviewed By: ilya-biryukov

Subscribers: merge_guards_bot, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D70139

[InstCombine] Avoid moving ops that do restrict undef across shuffles.

I think we have to be a bit more careful when it comes to moving
ops across shuffles, if the op does restrict undef. For example, without
this patch, we would move 'and %v, <0, 0, -1, -1>' over a
'shufflevector %a, undef, <undef, undef, 1, 2>'. As a result, the first
2 lanes of the result are undef after the combine, but they really
should be 0, unless I am missing something.

For ops that do fold to undef on undef operands, the current behavior
should be fine. I've add conservative check OpDoesRestrictUndef, maybe
there's a better existing utility?

Reviewers: spatel, RKSimon, lebedev.ri

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D70093

Revert "[RISCV] Fix wrong CFI directives"

test/DebugInfo/RISCV/relax-debug-frame.ll wasn't properly updated.

[InstCombine] Precommit shuffle tests for D70093.

[ARM][MVE] canTailPredicateLoop

This implements TTI hook 'preferPredicateOverEpilogue' for MVE. This is a
first version and it operates on single block loops only. With this change, the
vectoriser will now determine if tail-folding scalar remainder loops is
possible/desired, which is the first step to generate MVE tail-predicated
vector loops.

This is disabled by default for now. I.e,, this is depends on option
-disable-mve-tail-predication, which is off by default.

I will follow up on this soon with a patch for the vectoriser to respect loop
hint 'vectorize.predicate.enable'. I.e., with this loop hint set to Disabled,
we don't want to tail-fold and we shouldn't query this TTI hook, which is
done in D70125.

Differential Revision: https://reviews.llvm.org/D69845

[RISCV] Fix wrong CFI directives

Summary: Removes CFI CFA directives that could incorrectly propagate
beyond the basic block they were inteded for. Specifically it removes
the epilogue CFI directives. See the branch_and_tail_call test for an
example of the issue. Should fix the stack unwinding issues caused by
the incorrect directives.

Reviewers: asb, lenary, shiva0217
Reviewed By: lenary
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69723

[ARM,MVE] Add intrinsics for contiguous load/stores.

This patch adds the ACLE intrinsics for all the MVE load and store
instructions not already handled by D69791. These ones don't need new
IR intrinsics, because they can be implemented in terms of standard
LLVM IR constructions.

Some of the load and store instructions access less than 128 bits of
memory, sign/zero extending each value to a wider vector lane on load
or truncating it on store. These are represented in IR by a load of a
shorter vector followed by a zext/sext, and conversely, a trunc
followed by a short store. Existing ISel patterns already recognize
those combinations and turn them into the right MVE instructions.

The predicated forms of all these instructions are represented in the
same way, except that the ordinary load/store operation is replaced
with the existing intrinsics @llvm.masked.{load,store}. These are
currently only code-generated as predicated MVE load/store
instructions if you give LLVM the `-enable-arm-maskedldst` option; so
I've done that in the LLVM codegen test. When we make that the
default, that option can be removed.

In the Tablegen backend, I've had to add a handful of extra support
features:

* We need to be able to make clang::Address objects out of a
  pointer and an alignment (previously we only needed these when the
  user passed us an existing one).

* We can now specify vector types that aren't 128 bits wide (for use
  in those intermediate values in IR), the parametrized type system
  can make one starting from two existing vector types (using the lane
  count of one and the element type of the other).

* I've added support for code generation of pointer casts, and for
  specifying LLVM types as operands to IRBuilder operations (for zext
  and sext, though I think they'll come in useful again).

* Now not all IR construction operations need to be specified as
  Builder.CreateFoo; some don't involve a Builder at all, and one
  passes it as a parameter to a tiny static helper function in
  CGBuiltin.cpp.

Reviewers: ostannard, MarkMurrayARM, dmgreen

Subscribers: kristof.beyls, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70088

[X86][AVX] Add plausible schedule classes to MASKPAIR/VP2INTERSECT/VDPBF16PS instructions

These are really just placeholders that use approximately the right resources - once we have CPUs scheduler models that support these instructions they will need revisiting.

In the meantime this means that all instructions have a class of some kind., meaning models can be more easily flagged as complete.

[libomptarget] Move supporti.h to support.cu

Summary:
[libomptarget] Move supporti.h to support.cu
Reimplementation of D69652, without the unity build and refactors.
Will need a clean build of libomptarget as the cmakelists changed.

Reviewers: ABataev, jdoerfert

Reviewed By: jdoerfert

Subscribers: mgorny, jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D70131

Revert 57dd4b0 "[ValueTracking] Allow context-sensitive nullness check for non-pointers"

This caused miscompiles of Chromium (https://crbug.com/1023818). The reduced
repro is small enough to fit here:

  $ cat /tmp/a.c
  unsigned char f(unsigned char *p) {
    unsigned char result = 0;
    for (int shift = 0; shift < 1; ++shift)
      result |= p[0] << (shift * 8);
    return result;
  }
  $ bin/clang -O2 -S -o - /tmp/a.c | grep -A4 f:
  f:                                      # @f
          .cfi_startproc
  # %bb.0:                                # %entry
          xorl    %eax, %eax
          retq

That's nicely optimized, but I don't think it's the right result :-)

> Same as D60846 but with a fix for the problem encountered there which
> was a missing context adjustment in the handling of PHI nodes.
>
> The test that caused D60846 to be reverted was added in e15ab8f277c7.
>
> Reviewers: nikic, nlopes, mkazantsev,spatel, dlrobertson, uabelho, hakzsam
>
> Subscribers: hiraditya, bollu, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D69571

This reverts commit 57dd4b03e4806bbb4760ab6150940150d884df20.

[Mips] Add rematerialization support for ldi.fmt

Instruction ldi.fmt can be considered cheap enough to avoid spill and restore
of value that it produces since it's loaded from immediate.

Differential Revision: https://reviews.llvm.org/D69898

[mips] Show an error if 64-bit target triple provided with 32-bit CPU

When a 64-bit triple is used emit an error if the CPU only supports
32-bit code.

Patch by Miloš Stojanović.

Differential Revision: https://reviews.llvm.org/D70018

[mips][test] Add Mips CPU tests. NFC

Adding tests check all available CPUs on Mips.

Patch by Miloš Stojanović.

Differential Revision: https://reviews.llvm.org/D70017

[OpenCL] Add remaining vector data builtin functions

Add the remaining half (fp16) vector data load and store builtin
functions from the OpenCL C specification.

Patch by Pierre Gondois and Sven van Haastregt.

Temporarily revert "[InstCombine] Fold PHIs with equal incoming pointers"

Revert due to sanitizer-windows buildbot failure.

This reverts commit bbb29738b58aaf6f6518269abdcf8f64131665a9.

[DebugInfo] Avoid creating entry values for clobbered registers

Summary:
Entry values are considered for parameters that have register-described
DBG_VALUEs in the entry block (along with other conditions).

If a parameter's value has been propagated from the caller to the
callee, then the parameter's DBG_VALUE in the entry block may be
described using a register defined by some instruction, and entry values
should not be emitted for the parameter, which can currently occur.
One such case was seen in the attached test case, in which the second
parameter, which is described by a redefinition of the first parameter's
register, would incorrectly get an entry value using the first
parameter's register. This commit intends to solve such cases by keeping
track of register defines, and ignoring DBG_VALUEs in the entry block
that are described by such registers.

In a RelWithDebInfo build of clang-8, the average size of the set was
27, and in a RelWithDebInfo+ASan build it was 30.

Reviewers: djtodoro, NikolaPrica, aprantl, vsk

Reviewed By: djtodoro, vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D69889

[DebugInfo] Add helper for finding entry value candidates [NFC]

Summary:
The conditions that are used to determine if entry values should be
emitted for a parameter are quite many, and will grow slightly
in a follow-up commit, so move those to a helper function, as was
suggested in the code review for D69889.

Reviewers: djtodoro, NikolaPrica

Reviewed By: djtodoro

Subscribers: probinson, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69955

[AArch64] Extend storeRegToStackSlot to spill SVE registers.

This patch allows the register allocator to spill SVE registers to the stack.

Reviewers: ostannard, efriedma, rengolin, cameron.mcinally

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D70082

[InstCombine] Fold PHIs with equal incoming pointers

In case when all incoming values of a PHI are equal pointers, this
transformation inserts a definition of such a pointer right after
definition of the base pointer and replaces with this value both PHI and
all it's incoming pointers. Primary goal of this transformation is
canonicalization of this pattern in order to enable optimizations that
can't handle PHIs. Non-inbounds pointers aren't currently supported.

Reviewers: spatel, RKSimon, lebedev.ri, apilipenko

Reviewed By: apilipenko

Tags: #llvm

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D68128

[AArch64][SVE] Allocate locals that are scalable vectors.

This patch adds a target interface to set the StackID for a given type,
which allows scalable vectors (e.g. `<vscale x 16 x i8>`) to be assigned a
'sve-vec' StackID, so it is allocated in the SVE area of the stack frame.

Reviewers: ostannard, efriedma, rengolin, cameron.mcinally

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D70080

[ARM,MVE] Use VMOV.{S8,S16} for sign-extended extractelement.

MVE includes instructions that extract an 8- or 16-bit lane from a
vector and sign-extend it into the output 32-bit GPR. `ARMInstrMVE.td`
already included isel patterns to select those instructions in
response to the `ARMISD::VGETLANEs` selection-DAG node type. But
`ARMISD::VGETLANEs` was never actually generated, because the code
that creates it was conditioned on NEON only.

It's an easy fix to enable the same code for integer MVE, and now IR
that sign-extends the result of an extractelement (whether explicitly
or as part of the function call ABI) will use `vmov.s8` instead of
`vmov.u8` followed by `sxtb`.

Reviewers: SjoerdMeijer, dmgreen, ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70132

[libcxx testing] Fix -Wtautological-overlap-compare bug

[TargetLowering][DAGCombine][MSP430] Shift Amount Threshold in DAGCombine (4)

Summary:
Replaces
```
unsigned getShiftAmountThreshold(EVT VT)
```
by

```
bool shouldAvoidTransformToShift(EVT VT, unsigned amount)
```
thus giving more flexibility for targets to decide whether particular shift amounts must be considered expensive or not.

Updates the MSP430 target with a custom implementation.

This continues D69116, D69120, D69326 and updates them, so all of them must be committed before this.

Existing tests apply, a few more have been added.

Reviewers: asl, spatel

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70042

[X86] Remove setOperationAction for FP_TO_SINT v8i16.

This is no longer needed after widening legalization as we
custom legalize v8i8 ourselves.

Added entries to the cost model, but bumped the cost slightly
to account for the truncate shuffle that wasn't costed before.

[GPGPU] Fix regression test after 395124.

Commit 395124 "NVPTX: Don't insert an extra empty line at the end of the last section"
changed the length of the kernel payload. Update the regression test to the new binary size.

[Reproducer] Discard reproducer directory if not generated.

If lldb was run in capture mode, but no reproducer was generated, make
sure we clean up the reproducer directory.

[VFABI] Add LLVM internal mangling for vector functions.

Summary:
This patch adds a custom ISA for vector functions for internal use
in LLVM. The <isa> token is set to "_LLVM_", and it is not attached
to any specific instruction Vector ISA, or Vector Function ABI.

The ISA is used as a token for handling Vector Function ABI-style
vectorization for those vector functions that are not directly
associated to any existing Vector Function ABI (for example, some of
the vector functions exposed by TargetLibraryInfo). The demangling
function for this ISA in a Vector Function ABI context is set to be
the same as the common one shared between X86 and AArch64.

Reviewers: jdoerfert, sdesmalen, simoll

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70089

Add 8548 CPU definition and attributes

8548 CPU is GCC's name for the e500v2, so accept this in clang. The
e500v2 doesn't support lwsync, so define __NO_LWSYNC__ for this as well,
as GCC does.

Differential Revision: https://reviews.llvm.org/D67787

AMDGPU: Extend add x, (ext setcc) combine to sub

This is the same as the add case, but inverts the operation type.

This avoids regressions in a future patch.

AMDGPU: Switch backend default max workgroup size to 1024

Previously this would default to 256, not the maximum supported size
of 1024. Using a maximum lower than the hardware maximum requires
language runtimes to enforce this limit for correctness, which no
language has correctly done. Switch the default to the conservatively
correct maximum, and force frontends to opt-in to the more optimal 256
default maximum.

I don't really understand why the changes in occupancy-levels.ll
increased the computed occupancy, which I expected to decrease. I'm
not sure if these tests should be forcing the old maximum.

AMDGPU Reduce reported maximum group size to 1024

While some targets allow encoding 2048, this was never tested or
supported.

[GlobalsAA] Reenable test.

[LLDB] Add core definition for armv8l and armv7l

This patch adds core definitions in lldb ArchSpecs for armv8l and armv7l cores.

This was needed because on Linux running on 32-bit Arm v8 we are returned
armv8l in case we are running 32-bit sysroot on 64bit kernel. In case of 32-bit
kernel and 32-bit sysroot running on arm v8 hardware we are returned armv7l.
This is quite common when we run 32 bit arm using docker container.

Signed-off-by: Muhammad Omair Javaid <omair.javaid@linaro.org>
Differential Revision: https://reviews.llvm.org/D69904

Don't assume that the clang binary's resolved name includes the string
'clang'.

This is not true in practice in some content-addressed file systems.

[Sema] Add MacroQualified case for FunctionTypeUnwrapper

This is a fix for PR43315. An assertion error is hit for this minimal example:

```
//clang -cc1 -triple x86_64-- -S tstVMStructRC-min.cpp
int (a b)(); // Assertion `Chunk.Kind == DeclaratorChunk::Function' failed.
```

This is because we do not cover the case in the FunctionTypeUnwrapper where it
receives a MacroQualifiedType. We have not run into this earlier because this
is a unique case where the __attribute__ contains both __cdecl__ and
__regparm__ (in that order), and we are compiling for x86_64. Changing the
architecture or the order of __cdecl__ and __regparm__ does not raise the
assertion.

Differential Revision: https://reviews.llvm.org/D67992

Temporarily disable test.

Temporarily Revert "Reapply [LVI] Normalize pointer behavior" as it's broken python 3.6.

Reverting to figure out if it's a problem in python or the compiler for now.

This reverts commit 885a05f48a5d320946c89590b73a764e5884fe4f.

[LLDB] Only set FRAMEWORK when we're actually building a framework.

[LLDB] Remove debug message in AddLLDB.cmake

Add a shim for setenv on PS4 since it does not exist.

A few years back a similar change was made for getenv since neither function is supported on the PS4 platform.

Recently, commit d889d1e added a call to setenv in compiler-rt which was causing linking errors because the symbol was not found. This fixes that issue by putting in a shim similar to how we previously dealt with the lack of getenv.

Differential Revision: https://reviews.llvm.org/D70033

[X86] Don't consider v64i1 as a legal type unless v64i8 is also a legal type.

This avoids some nasty issues with argument passing and lowering of
arbitrary v64i8 shuffles.

[X86] Only pass v64i8/v32i16 as v16i32 on non-avx512bw targets if the v16i32 type won't be split by prefer-vector-width=256

Otherwise just let the v64i8/v32i16 types be split to v32i8/v16i16.

In reality this shouldn't happen because it means we have a 512-bit
vector argument, but min-legal-vector-width says a value less than
512. But a 512-bit argument should have been factored into the
preferred vector width.

Fix include guard and properly order __deregister_frame_info.

Summary:
This patch fixes two problems with the crtbegin.c as written:

1. In do_init, register_frame_info is not guarded by a #define, but in
do_fini, deregister_frame_info is guarded by #ifndef
CRT_HAS_INITFINI_ARRAY. Thus when CRT_HAS_INITFINI_ARRAY is not
defined, frames are registered but then never deregistered.

The frame registry mechanism builds a linked-list from the .so's
static variable do_init.object, and when the .so is unloaded, this
memory becomes invalid and should be deregistered.

Further, libgcc's crtbegin treats the frame registry as independent
from the initfini array mechanism.

This patch fixes this by adding a new #define,
"EH_USE_FRAME_INFO_REGISTRY", which is set by the cmake option
COMPILER_RT_CRT_USE_EH_FRAME_REGISTRY Currently, do_init calls
register_frame_info, and then calls the binary's constructors. This
allows constructors to safely use libunwind. However, do_fini calls
deregister_frame_info and then calls the binary's destructors. This
prevents destructors from safely using libunwind.

This patch also switches that ordering, so that destructors can safely
use libunwind. As it happens, this is a fairly common scenario for
thread sanitizer.