platform/upstream/llvm.git
3 years agoFactor out common code from the iPhone/AppleTV/WatchOS simulator platform plugins...
Adrian Prantl [Wed, 5 Aug 2020 22:22:20 +0000 (15:22 -0700)]
Factor out common code from the iPhone/AppleTV/WatchOS simulator platform plugins. (NFC)

The implementation of these classes was copied & pasted from the
iPhone simulator plugin with only a handful of configuration
parameters substituted. This patch moves the redundant implementations
into the base class PlatformAppleSimulator.

Differential Revision: https://reviews.llvm.org/D85243

3 years agoGlobalISel: Implement lower for G_INSERT_VECTOR_ELT
Matt Arsenault [Tue, 28 Jul 2020 01:13:40 +0000 (21:13 -0400)]
GlobalISel: Implement lower for G_INSERT_VECTOR_ELT

3 years ago[gn build] mac: use frameworks instead of libs where appropriate
Mark Mentovai [Thu, 6 Aug 2020 22:57:56 +0000 (18:57 -0400)]
[gn build] mac: use frameworks instead of libs where appropriate

As of GN 3028c6a426a4, the hack that transformed "libs" ending in
".framework" from -l arguments to -framework arguments has been removed.
Instead, "frameworks" must be used, and the toolchain must provide
support.

Differential Revision: https://reviews.llvm.org/D84219

3 years ago[NewPM][GuardWidening] Fix loop guard widening tests under NPM
Arthur Eubanks [Thu, 6 Aug 2020 04:12:08 +0000 (21:12 -0700)]
[NewPM][GuardWidening] Fix loop guard widening tests under NPM

Reviewed By: ychen, asbirlea

Differential Revision: https://reviews.llvm.org/D85394

3 years ago[ELF] Change tombstone values to (.debug_ranges/.debug_loc) 1 and (other .debug_*) 0
Fangrui Song [Thu, 6 Aug 2020 19:34:16 +0000 (12:34 -0700)]
[ELF] Change tombstone values to (.debug_ranges/.debug_loc) 1 and (other .debug_*) 0

tl;dr See D81784 for the 'tombstone value' concept. This patch changes our behavior to be almost the same as GNU ld (except that we also use 1 for .debug_loc):

* .debug_ranges & .debug_loc: 1 (LLD<11: 0+addend; GNU ld uses 1 for .debug_ranges)
* .debug_*: 0 (LLD<11: 0+addend; GNU ld uses 0; future LLD: -1)

We make the tweaks because:

1) The new tombstone is novel and needs more time to be adopted by consumers before it's the default.
2) The old (gold) strategy had problems with zero-length functions, so rather than going back to that, we're going to the GNU ld strategy, which doesn't have that problem.
3) One slight tweak to (2) is to apply the .debug_ranges workaround to .debug_loc for the same reasons it applies to .debug_ranges: to avoid terminating lists early.
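
Point 3 can be illustrated with a small sketch. It is a simplified model, not LLD or any real DWARF reader: it assumes a DWARF v4 style list of (begin, end) pairs terminated by a (0, 0) pair, and it assumes that both addresses of a discarded entry resolve to the tombstone value with the addend ignored. The names `read_range_list` and `resolve` are hypothetical.

```python
def read_range_list(pairs):
    """Return the pairs a consumer sees before the (0, 0) end-of-list entry."""
    seen = []
    for begin, end in pairs:
        if (begin, end) == (0, 0):
            break  # end-of-list entry: the consumer stops reading here
        seen.append((begin, end))
    return seen


def resolve(entries, tombstone):
    """Resolve (live, begin, end) entries into address pairs.

    Assumption: a dead (discarded) entry gets `tombstone` for both addresses.
    """
    out = []
    for live, begin, end in entries:
        out.append((begin, end) if live else (tombstone, tombstone))
    return out + [(0, 0)]  # the real terminator


entries = [
    (True, 0x1000, 0x1010),
    (False, 0, 0),  # range of a discarded function
    (True, 0x2000, 0x2020),
]

# Tombstone 0 looks like a terminator, so the last live range is lost.
with_zero = read_range_list(resolve(entries, tombstone=0))
# Tombstone 1 yields an empty (1, 1) range and the list is read to the end.
with_one = read_range_list(resolve(entries, tombstone=1))
```

In this model the tombstone 0 hides the range at 0x2000 from the consumer, while the tombstone 1 leaves only a harmless empty range behind.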

-----

http://lists.llvm.org/pipermail/llvm-dev/2020-July/143482.html

The tombstone value -1 in .debug_line caused problems for lldb (fixed by D83957;
will be included in 11.0.0) and breakpad (fixed by
https://crrev.com/c/2321300). It may potentially affect other DWARF consumers.

For .debug_ranges & .debug_loc: 1, an argument preferring 1 (GNU ld for .debug_ranges) over -2 is that:
```
{-1, -2}    <<< base address selection entry
{0, length} <<< address range
```
may create a situation where low_pc is greater than high_pc. So we use
1, the GNU ld behavior for .debug_ranges.

For other .debug_* sections, there haven't been many reports. One issue is that
bloaty (src/dwarf.cc) can incorrectly count address ranges in .debug_ranges. To
reduce similar disruption, this patch changes the tombstone values to be similar to GNU ld.

This does mean another behavior change to the default trunk behavior. Sorry
about it. The default trunk behavior will be similar to release/11.x while we work on a transition plan for LLD users.

Reviewed By: dblaikie, echristo

Differential Revision: https://reviews.llvm.org/D84825

3 years agoBPF: fix libLLVMBPFCodeGen.so build failure
Yonghong Song [Thu, 6 Aug 2020 22:22:03 +0000 (15:22 -0700)]
BPF: fix libLLVMBPFCodeGen.so build failure

Buildbot reported a build failure when building shared
library libLLVMBPFCodeGen.so with unknown reference to
"createCFGSimplificationPass".

Commit 87cba434027b ("BPF: add a SimplifyCFG IR pass during
generic Scalar/IPO optimization") added an IR pass SimplifyCFG
by BPF target. The commit called function
createCFGSimplificationPass() defined in "Scalar" library.
Add this library in Target/BPF/LLVMBuild.txt so
shared library build can succeed.

3 years ago[AMDGPU] Correct missing sram-ecc target feature for gfx906
Tony [Thu, 6 Aug 2020 22:04:20 +0000 (22:04 +0000)]
[AMDGPU] Correct missing sram-ecc target feature for gfx906

Differential Revision: https://reviews.llvm.org/D85476

3 years agoAMDGPU/GlobalISel: Enable s_{and|or}n2_{b32|b64} patterns
Matt Arsenault [Sun, 26 Jul 2020 21:47:59 +0000 (17:47 -0400)]
AMDGPU/GlobalISel: Enable s_{and|or}n2_{b32|b64} patterns

3 years ago[msan] Support %ms in scanf.
Evgenii Stepanov [Wed, 5 Aug 2020 19:32:17 +0000 (12:32 -0700)]
[msan] Support %ms in scanf.

Differential Revision: https://reviews.llvm.org/D85350

3 years ago[flang][msvc] Do not use gcc/clang command line options for msvc.
Michael Kruse [Thu, 6 Aug 2020 20:44:47 +0000 (15:44 -0500)]
[flang][msvc] Do not use gcc/clang command line options for msvc.

The command line options `-Wno-error` and `-Wno-unused-parameter` are specific to gcc/clang, do not use them when compiling with other compilers.

This patch is part of the series to [[ http://lists.llvm.org/pipermail/flang-dev/2020-July/000448.html | make flang compilable with MS Visual Studio ]].

Reviewed By: isuruf

Differential Revision: https://reviews.llvm.org/D85355

3 years ago[InstCombine] Fold (x + C1) * (-1<<C2) --> (-C1 - x) * (1<<C2)
Roman Lebedev [Thu, 6 Aug 2020 20:04:05 +0000 (23:04 +0300)]
[InstCombine] Fold  (x + C1) * (-1<<C2)  -->  (-C1 - x) * (1<<C2)

Negator knows how to do this, but the one-use reasoning is getting
a bit muddy here: we don't really want to increase the instruction count,
so we need to both lie about "IsNegation" and have a one-use check
on the outermost LHS value.
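
The fold in the title can be sanity-checked by brute force. The sketch below is not part of the patch; it models i8 wrapping arithmetic in Python (the helper names `wrap`, `original` and `folded` are made up for illustration) and compares both sides of the fold for every i8 value of x and C1 and every shift amount C2 below the bit width.

```python
BITS = 8
MASK = (1 << BITS) - 1


def wrap(value):
    """Truncate to BITS bits, modelling two's complement wraparound."""
    return value & MASK


def original(x, c1, c2):
    # (x + C1) * (-1 << C2)
    return wrap(wrap(x + c1) * wrap(-1 << c2))


def folded(x, c1, c2):
    # (-C1 - x) * (1 << C2)
    return wrap(wrap(-c1 - x) * wrap(1 << c2))


def fold_holds_everywhere():
    """Exhaustively compare both forms over all i8 inputs."""
    return all(
        original(x, c1, c2) == folded(x, c1, c2)
        for x in range(1 << BITS)
        for c1 in range(1 << BITS)
        for c2 in range(BITS)
    )
```

Once the multiplier is a true power of two, the multiplication can become a left shift, which is the point of sinking the negation into the LHS.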

3 years ago[InstCombine] Generalize %x * (-1<<C) --> (-%x) * (1<<C) fold
Roman Lebedev [Thu, 6 Aug 2020 18:10:43 +0000 (21:10 +0300)]
[InstCombine] Generalize  %x * (-1<<C)  -->  (-%x) * (1<<C)  fold

Multiplication is commutative, and either of the operands can be negative,
so if the RHS is a negated power-of-two, we should try to make it
a true power-of-two (which will allow us to turn it into a left-shift),
by trying to sink the negation down into the LHS op.

But, we shouldn't re-invent the logic for sinking negation,
let's just use Negator for that.

Tests and original patch by: Simon Pilgrim @RKSimon!

Differential Revision: https://reviews.llvm.org/D85446

3 years ago[NFC][InstCombine] Add some more tests for negation sinking into mul
Roman Lebedev [Thu, 6 Aug 2020 19:43:39 +0000 (22:43 +0300)]
[NFC][InstCombine] Add some more tests for negation sinking into mul

3 years ago[InstCombine] Fold sdiv exact X, -1<<C --> -(ashr exact X, C)
Roman Lebedev [Thu, 6 Aug 2020 18:08:30 +0000 (21:08 +0300)]
[InstCombine] Fold  sdiv exact X, -1<<C  -->  -(ashr exact X, C)

While that does increase the instruction count,
a shift is obviously better than a division.

Name: base
Pre: (1<<C1) >= 0
%o0 = shl i8 1, C1
%r = sdiv exact i8 C0, %o0
  =>
%r = ashr exact i8 C0, C1

Name: neg
%o0 = shl i8 -1, C1
%r = sdiv exact i8 C0, %o0
  =>
%t0 = ashr exact i8 C0, C1
%r = sub i8 0, %t0

Name: reverse
Pre: C1 != 0 && C1 u< 8
%t0 = ashr exact i8 C0, C1
%r = sub i8 0, %t0
  =>
%o0 = shl i8 -1, C1
%r = sdiv exact i8 C0, %o0

https://rise4fun.com/Alive/MRplf
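
As a rough cross-check of the "neg" case above, the following sketch enumerates all i8 inputs in Python. It is an independent illustration, not a replacement for the Alive proof: inputs where the division is not exact (poison) or where the quotient overflows i8 (-128 / -1, undefined behaviour) are skipped, and the helper names are hypothetical.

```python
BITS = 8


def to_signed(value):
    """Interpret the low BITS bits of value as a two's complement integer."""
    value &= (1 << BITS) - 1
    return value - (1 << BITS) if value >= 1 << (BITS - 1) else value


def sdiv(a, b):
    """Signed division that truncates toward zero, like LLVM's sdiv."""
    quotient = abs(a) // abs(b)
    return -quotient if (a < 0) != (b < 0) else quotient


def neg_fold_mismatches():
    """Return every (x, c) where -(ashr x, c) differs from sdiv x, (-1 << c)."""
    mismatches = []
    lo, hi = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1
    for c in range(BITS):
        divisor = to_signed(-1 << c)
        for x in range(lo, hi + 1):
            if x % divisor != 0:
                continue  # not exact: the sdiv would be poison
            expected = sdiv(x, divisor)
            if not lo <= expected <= hi:
                continue  # signed overflow: the sdiv would be UB
            folded = to_signed(-(x >> c))  # Python's >> is an arithmetic shift
            if folded != expected:
                mismatches.append((x, c))
    return mismatches
```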

3 years ago[NFC][InstCombine] Negator: add a comment about negating exact arithmetic shift
Roman Lebedev [Thu, 6 Aug 2020 18:07:45 +0000 (21:07 +0300)]
[NFC][InstCombine] Negator: add a comment about negating exact arithmetic shift

3 years ago[InstCombine] Generalize sdiv exact X, 1<<C --> ashr exact X, C fold to handle...
Roman Lebedev [Thu, 6 Aug 2020 17:18:55 +0000 (20:18 +0300)]
[InstCombine] Generalize  sdiv exact X, 1<<C  -->  ashr exact X, C  fold to handle non-splat vectors

3 years ago[NFC][InstCombine] Better tests for x s/EXACT (1 << y) pattern
Roman Lebedev [Thu, 6 Aug 2020 18:04:03 +0000 (21:04 +0300)]
[NFC][InstCombine] Better tests for  x s/EXACT (1 << y)  pattern

3 years ago[NFC][InstCombine] Tests for x s/EXACT (-1 << y) pattern
Roman Lebedev [Thu, 6 Aug 2020 16:57:33 +0000 (19:57 +0300)]
[NFC][InstCombine] Tests for  x s/EXACT (-1 << y)  pattern

3 years agoUnify the code that updates the ArchSpec after finding a fat binary
Adrian Prantl [Thu, 6 Aug 2020 17:52:16 +0000 (10:52 -0700)]
Unify the code that updates the ArchSpec after finding a fat binary
with how it is done for a lean binary

In particular this affects how target create --arch is handled — it
allowed us to override the deployment target (a useful feature for the
expression evaluator), but the fat binary case didn't.

rdar://problem/66024437

Differential Revision: https://reviews.llvm.org/D85049

(cherry picked from commit 470bdd3caaab0b6e0ffed4da304244be40b78668)

3 years agoAdd -Wtautological-value-range-compare warning.
Richard Smith [Tue, 4 Aug 2020 22:33:57 +0000 (15:33 -0700)]
Add -Wtautological-value-range-compare warning.

This warning diagnoses cases where an expression is compared to a
constant, and the comparison is tautological due to the form of the
expression (but not merely due to its type). This applies in cases such
as comparisons of bit-fields and the result of bit-masks.

The new warning is added to the Clang diagnostic group
-Wtautological-constant-in-range-compare but not to the
formerly-equivalent GCC-compatibility diagnostic group -Wtype-limits,
which retains its old meaning of diagnosing only tautological
comparisons to extremal values of a type (eg, int > INT_MAX).
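
To make the idea concrete, here is a minimal sketch of value-range reasoning for the two kinds of expression mentioned above. It is not Clang's implementation; the function names and the restriction to a `<` comparison are simplifying assumptions for illustration only.

```python
def masked_range(mask):
    """Value range of `x & mask` for a non-negative constant mask."""
    return (0, mask)


def bitfield_range(width, signed=False):
    """Value range of a bit-field with the given width in bits."""
    if signed:
        return (-(1 << (width - 1)), (1 << (width - 1)) - 1)
    return (0, (1 << width) - 1)


def always_less_than(value_range, constant):
    """Return True or False if `expr < constant` has a fixed result, else None."""
    lo, hi = value_range
    if hi < constant:
        return True   # tautologically true
    if lo >= constant:
        return False  # tautologically false
    return None       # depends on the run-time value
```

For example, `(x & 0xF) < 16` is always true because of the form of the expression, whatever the type of `x` is.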

Reviewed By: rtrieu

Differential Revision: https://reviews.llvm.org/D85256

3 years ago[LegalTypes] Move VSELECT node creation out of WidenVSELECTAndMask and push to 2...
Craig Topper [Thu, 6 Aug 2020 19:44:30 +0000 (12:44 -0700)]
[LegalTypes] Move VSELECT node creation out of WidenVSELECTAndMask and push to 2 of the 3 callers.

One of the callers only wants the condition, but the vselect can
be simplified by getNode making it hard or impossible to retrieve
the condition.

Instead, return the condition and make the other 2 callers
responsible for creating the vselect node using the condition.
Rename the function to WidenVSELECTMask accordingly.

Differential Revision: https://reviews.llvm.org/D85468

3 years ago[X86] Optimize out a few extra strlen calls in getX86TargetCPU. NFCI
Craig Topper [Thu, 6 Aug 2020 18:07:21 +0000 (11:07 -0700)]
[X86] Optimize out a few extra strlen calls in getX86TargetCPU. NFCI

We had a conversion from const char * to StringRef and another from
const char * to std::string. These both do their own
strlen call if the compiler doesn't figure out how to share them.
By adding the temporary StringRef we can convert it to std::string
instead.

The other case is to use a StringSwitch<StringRef> instead of
StringSwitch<const char *> since the output values of the switch
are string literals. This allows the length to be computed at
compile time. Otherwise we have to convert from const char *
to std::string after the StringSwitch.

3 years ago[X86] Make getX86TargetCPU return std::string instead of const char *. Remove call...
Craig Topper [Thu, 6 Aug 2020 16:23:22 +0000 (09:23 -0700)]
[X86] Make getX86TargetCPU return std::string instead of const char *. Remove call to MakeArgString. NFCI

I believe this function used to be called directly from X86-specific
code and was used to immediately create the -target-cpu
command line. A later refactoring changed it to be called from
a generic getCPU function that returns std::string. So on some
paths we created a string using MakeArgString, converted that to
std::string, then called MakeArgString again from that.

Instead just return std::string directly like the other targets.

3 years agoBPF: add a SimplifyCFG IR pass during generic Scalar/IPO optimization
Yonghong Song [Wed, 5 Aug 2020 21:11:35 +0000 (14:11 -0700)]
BPF: add a SimplifyCFG IR pass during generic Scalar/IPO optimization

The following bpf linux kernel selftest failed with latest
llvm:
  $ ./test_progs -n 7/10
  ...
  The sequence of 8193 jumps is too complex.
  verification time 126272 usec
  stack depth 320
  processed 114799 insns (limit 1000000)
  ...
  libbpf: failed to load object 'pyperf600_nounroll.o'
  test_bpf_verif_scale:FAIL:110
  #7/10 pyperf600_nounroll.o:FAIL
  #7 bpf_verif_scale:FAIL

After some investigation, I found the following llvm patch
  https://reviews.llvm.org/D84108
is responsible. The patch disabled hoisting common instructions
in SimplifyCFG by default. By the time a later SimplifyCFG phase runs
with hoisting on, the code has changed and it cannot do the work any more.

A test is provided to demonstrate the problem.
The IR before simplifyCFG looks like:
  for.cond:
    %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
    %cmp = icmp ult i32 %i.0, 6
    br i1 %cmp, label %for.body, label %for.cond.cleanup

  for.cond.cleanup:
    %2 = load i8*, i8** %frame_ptr, align 8, !tbaa !2
    %cmp2 = icmp eq i8* %2, null
    %conv = zext i1 %cmp2 to i32
    call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %1) #3
    call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #3
    ret i32 %conv

  for.body:
    %3 = load i8*, i8** %frame_ptr, align 8, !tbaa !2
    %tobool.not = icmp eq i8* %3, null
    br i1 %tobool.not, label %for.inc, label %land.lhs.true

The first two insns of `for.cond.cleanup` and `for.body`, load and
icmp, can be hoisted to the `for.cond` block. With patch D84108, the
optimization is delayed. But unfortunately, later on loop rotation
added additional phi nodes to `for.body` and hoisting cannot
be done any more.

Note such a hoisting is beneficial to bpf programs as the
bpf verifier does path-sensitive analysis and verification.
The hoisting prevents reloading from the stack, which will assume
a conservative value and increase exploited insns. In this case,
it caused verifier failure.

To fix this problem, I added an IR pass from the bpf target
to perform an additional simplifycfg with hoisting of common insts
enabled.

Differential Revision: https://reviews.llvm.org/D85434

3 years ago[NFC] Rename BBSectionsPrepare -> BasicBlockSections.
Snehasish Kumar [Thu, 6 Aug 2020 00:11:48 +0000 (17:11 -0700)]
[NFC] Rename BBSectionsPrepare -> BasicBlockSections.

Rename the BBSectionsPrepare pass as suggested by the review comment in
https://reviews.llvm.org/D85368.

Differential Revision: https://reviews.llvm.org/D85380

3 years agoAdd missing override to Makefile
Adrian Prantl [Thu, 6 Aug 2020 20:07:06 +0000 (13:07 -0700)]
Add missing override to Makefile

3 years ago[InstSimplify] avoid crashing by trying to rem-by-zero
Sanjay Patel [Thu, 6 Aug 2020 20:05:04 +0000 (16:05 -0400)]
[InstSimplify] avoid crashing by trying to rem-by-zero

Bug was noted in the post-commit comments for:
rGe8760bb9a8a3

3 years ago[LLDB] Skip test_launch_simple from TestTargetAPI.py when remote
Jonas Devlieghere [Thu, 6 Aug 2020 20:02:52 +0000 (13:02 -0700)]
[LLDB] Skip test_launch_simple from TestTargetAPI.py when remote

3 years agoclang: Use byref for aggregate kernel arguments
Matt Arsenault [Wed, 6 May 2020 00:24:53 +0000 (20:24 -0400)]
clang: Use byref for aggregate kernel arguments

Add address space to indirect abi info and use it for kernels.

Previously, indirect arguments assumed a stack-passed object
in the alloca address space using byval. A stack pointer is unsuitable
for kernel arguments, which are passed in a separate, constant buffer
with a different address space.

Start using the new byref for aggregate kernel arguments. Previously
these were emitted as raw struct arguments, and turned into loads in
the backend. These will lower identically, although with byref you now
have the option of applying an explicit alignment. In the future, a
reasonable implementation would use byref for all kernel arguments
(this would be a practical problem at the moment due to losing things
like noalias on pointer arguments).

This is mostly to avoid fighting the optimizer's treatment of
aggregate load/store. SROA and instcombine both turn aggregate loads
and stores into a long sequence of element loads and stores, rather
than the optimizable memcpy I would expect in this situation. Now an
explicit memcpy will be introduced up-front which is better understood
and helps eliminate the alloca in more situations.

This skips using byref in the case where HIP kernel pointer arguments
in structs are promoted to global pointers. At minimum an additional
patch is needed to allow coercion with indirect arguments. This also
skips using it for OpenCL due to the current workaround used to
support kernels calling kernels. Distinct function bodies would need
to be generated up front instead of emitting an illegal call.

3 years ago[VectorCombine] add tests for load+insert; NFC
Sanjay Patel [Thu, 6 Aug 2020 16:54:55 +0000 (12:54 -0400)]
[VectorCombine] add tests for load+insert; NFC

3 years agoCorrectly detect legacy iOS simulator Mach-O objectfiles
Adrian Prantl [Wed, 5 Aug 2020 20:57:14 +0000 (13:57 -0700)]
Correctly detect legacy iOS simulator Mach-O objectfiles

The code in ObjectFileMachO didn't disambiguate between ios and
ios-simulator object files for Mach-O objects using the legacy
ambiguous LC_VERSION_MIN load commands. This used to not matter before
we taught ArchSpec that ios and ios-simulator are no longer compatible.

rdar://problem/66545307

Differential Revision: https://reviews.llvm.org/D85358

3 years ago[libc] Add tolower, toupper implementation.
cgyurgyik [Thu, 6 Aug 2020 19:21:07 +0000 (15:21 -0400)]
[libc] Add tolower, toupper implementation.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D85326

3 years ago[SLP] Fix order of `insertelement`/`insertvalue` seed operands
Anton Afanasyev [Tue, 14 Jul 2020 15:04:25 +0000 (18:04 +0300)]
[SLP] Fix order of `insertelement`/`insertvalue` seed operands

Summary:
This patch takes the index operands of `insertelement`/`insertvalue`
into account while generating seed elements for `findBuildAggregate()`.
Previously this function kept the original order of the `insert`s.
This patch also optimizes `findBuildAggregate()`, avoiding
redundant temporary vector allocations and repeated reversal.

Fixes llvm.org/pr44067

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83779

3 years agoFix CFI issues in <future>
Evgenii Stepanov [Thu, 6 Aug 2020 18:32:33 +0000 (11:32 -0700)]
Fix CFI issues in <future>

This change fixes errors reported by Control Flow Integrity (CFI) checking when using `std::packaged_task`.  The errors mostly stem from casting the underlying storage (`__buf_`) to `__base*`, even if it is uninitialized.  The solution is to wrap `__base*` access to `__buf_` behind a getter marked with _LIBCPP_NO_CFI.

Differential Revision: https://reviews.llvm.org/D82627

3 years agoAdd freeze keyword to IR emacs mode
Matt Arsenault [Thu, 6 Aug 2020 18:57:11 +0000 (14:57 -0400)]
Add freeze keyword to IR emacs mode

3 years ago[mlir][SPIR-V] Fix wrongly placed Rationale section.
MaheshRavishankar [Thu, 6 Aug 2020 18:48:49 +0000 (11:48 -0700)]
[mlir][SPIR-V] Fix wrongly placed Rationale section.

Differential Revision: https://reviews.llvm.org/D85461

3 years ago[lldb] Use target.GetLaunchInfo() instead of creating an empty one.
Jonas Devlieghere [Thu, 6 Aug 2020 18:45:12 +0000 (11:45 -0700)]
[lldb] Use target.GetLaunchInfo() instead of creating an empty one.

Update tests that were creating an empty LaunchInfo instead of using the
one coming from the target. This ensures target properties are honored.

3 years ago[clangd] Fix crash in bugprone-bad-signal-to-kill-thread clang-tidy check.
Aleksandr Platonov [Thu, 6 Aug 2020 18:44:08 +0000 (21:44 +0300)]
[clangd] Fix crash in bugprone-bad-signal-to-kill-thread clang-tidy check.

Inside clangd, clang-tidy checks don't see preprocessor events in the preamble.
This leads to `Token::PtrData == nullptr` for the tokens that a macro is defined as.
E.g. `#define SIGTERM 15`:
- Token::Kind == tok::numeric_constant (Token::isLiteral() == true)
- Token::UintData == 2
- Token::PtrData == nullptr

As a result, the bugprone-bad-signal-to-kill-thread check crashes with a null dereference inside clangd.

Reviewed By: hokein

Differential Revision: https://reviews.llvm.org/D85417

3 years ago[AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation.
dfukalov [Fri, 31 Jul 2020 00:56:54 +0000 (03:56 +0300)]
[AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation.

Add cases of fused fmul+fadd/fsub with f16 and f64 operands to cost model.
Also added operations with contract attribute.

Fixed line endings in test.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D84995

3 years agoGlobalISel: Implement fewerElementsVector for G_EXTRACT_VECTOR_ELT
Matt Arsenault [Mon, 27 Jul 2020 13:58:17 +0000 (09:58 -0400)]
GlobalISel: Implement fewerElementsVector for G_EXTRACT_VECTOR_ELT

Use the same basic strategy as LegalizeVectorTypes. Try to index into
smaller pieces if there's a constant index, and otherwise fall back to
a stack temporary.
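
The constant-index case can be pictured as follows. This is only a sketch of the indexing arithmetic in Python, not the GlobalISel legalizer code; the function names are hypothetical and the stack-temporary fallback for non-constant indices is not modelled.

```python
def split_vector(vec, piece_len):
    """Split a vector (a list) into consecutive pieces of piece_len elements."""
    return [vec[i:i + piece_len] for i in range(0, len(vec), piece_len)]


def extract_constant_index(vec, index, piece_len):
    """Extract vec[index] by indexing into one of the smaller pieces."""
    pieces = split_vector(vec, piece_len)
    piece, offset = divmod(index, piece_len)  # which piece, and where in it
    return pieces[piece][offset]
```

With a constant index, only the one piece that holds the element has to be touched, so the wide vector never needs to be materialized as a whole.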

3 years ago[NewPM][LoopUnswitch] Pin loop-unswitch to legacy PM or use simple-loop-unswitch
Arthur Eubanks [Wed, 5 Aug 2020 20:49:00 +0000 (13:49 -0700)]
[NewPM][LoopUnswitch] Pin loop-unswitch to legacy PM or use simple-loop-unswitch

As mentioned in
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143395.html,
loop-unswitch has not been ported to the NPM. Instead people are using
simple-loop-unswitch.

Pin all tests in Transforms/LoopUnswitch to legacy PM and replace all
other uses of loop-unswitch with simple-loop-unswitch.

One test that didn't fit into the above was
2014-06-21-congruent-constant.ll which seems to only pass with
loop-unswitch. That is also pinned to legacy PM.

Now all tests containing "-loop-unswitch" anywhere in the test succeed with
NPM turned on by default.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D85360

3 years ago[HIP] Ignore invalid ar linker options
Aaron En Ye Shi [Wed, 5 Aug 2020 19:53:30 +0000 (19:53 +0000)]
[HIP] Ignore invalid ar linker options

Instead of accepting the same arguments as regular linker,
the static linker will only accept input files.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D85442

3 years ago[lldb/testsuite] Change get_debugserver_exe to support Rosetta
Fred Riss [Thu, 6 Aug 2020 17:38:06 +0000 (10:38 -0700)]
[lldb/testsuite] Change get_debugserver_exe to support Rosetta

In order to be able to run the debugserver tests against the Rosetta
debugserver, detect the Rosetta run configuration and return the
system Rosetta debugserver.

3 years ago[NewPM] Pin -assumption-cache-tracker tests to legacy PM
Arthur Eubanks [Thu, 6 Aug 2020 04:32:26 +0000 (21:32 -0700)]
[NewPM] Pin -assumption-cache-tracker tests to legacy PM

All tests have corresponding NPM RUN lines.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D85395

3 years agoAMDGPU: Define raw/struct variants of buffer atomic fadd
Matt Arsenault [Tue, 4 Aug 2020 21:42:47 +0000 (17:42 -0400)]
AMDGPU: Define raw/struct variants of buffer atomic fadd

Somehow the new FP atomic buffer intrinsics ended up using the legacy
style for buffer intrinsics.

3 years agoRemove unused variable "saved_opts".
Sterling Augustine [Thu, 6 Aug 2020 17:16:23 +0000 (10:16 -0700)]
Remove unused variable "saved_opts".

wattr_get is a macro, and the documentation states:
"The parameter opts is reserved for  future use,
applications must supply a null pointer."

In practice, passing a variable there is harmless, except
that it is unused inside the macro, which causes unused
variable warnings.

The various places where

3 years agoAArch64/GlobalISel: Fix verifier error after selecting returnaddress
Matt Arsenault [Thu, 6 Aug 2020 17:10:08 +0000 (13:10 -0400)]
AArch64/GlobalISel: Fix verifier error after selecting returnaddress

This was caching the wrong register to re-use later.

3 years ago[SLP][X86] Regenerate sdiv test noticed in D83779. NFC.
Simon Pilgrim [Thu, 6 Aug 2020 17:00:06 +0000 (18:00 +0100)]
[SLP][X86] Regenerate sdiv test noticed in D83779. NFC.

3 years ago[NFC][MLInliner] Point out the tests' model dependencies
Mircea Trofin [Thu, 6 Aug 2020 16:21:14 +0000 (09:21 -0700)]
[NFC][MLInliner] Point out the tests' model dependencies

3 years agoAMDGPU: Fix spilling of 96-bit AGPRs
Matt Arsenault [Fri, 31 Jul 2020 23:31:07 +0000 (19:31 -0400)]
AMDGPU: Fix spilling of 96-bit AGPRs

3 years agoAMDGPU/GlobalISel: Start trying to handle AGPR bank
Matt Arsenault [Thu, 2 Jul 2020 02:34:16 +0000 (22:34 -0400)]
AMDGPU/GlobalISel: Start trying to handle AGPR bank

Try to use AGPR banks for the various merge/unmerge type
operations. Previously these would introduce copies to VGPR.

3 years agoGlobalISel: Define InvalidRegBankID enum value
Matt Arsenault [Thu, 16 Jul 2020 21:18:43 +0000 (17:18 -0400)]
GlobalISel: Define InvalidRegBankID enum value

3 years ago[OPENMP]Fix for Windows buildbots, NFC.
Alexey Bataev [Thu, 6 Aug 2020 16:36:52 +0000 (12:36 -0400)]
[OPENMP]Fix for Windows buildbots, NFC.

3 years ago[OPENMP]Redesign of OMPExecutableDirective/OMPDeclarativeDirective representation.
Alexey Bataev [Fri, 26 Jun 2020 21:42:31 +0000 (17:42 -0400)]
[OPENMP]Redesign of OMPExecutableDirective/OMPDeclarativeDirective representation.

Summary:
Introduced the OMPChildren class to handle all associated clauses, statements
and child expressions/statements. It allows some directives to be represented
more correctly (like flush, depobj etc. with pseudo clauses, ordered
depend directives, which are standalone, and target data directives).
Also, it will make it easier to avoid using CapturedStmt in directives,
if required (atomic, tile etc. directives).
Also, it simplifies serialization/deserialization of the
executable/declarative directives.
It also reduces the number of allocation operations for mapper declarations.

Reviewers: jdoerfert

Subscribers: yaxunl, guansong, jfb, cfe-commits, sstefan1, aaron.ballman, caomhin

Tags: #clang

Differential Revision: https://reviews.llvm.org/D83261

3 years ago[InstCombine] Add tests for mul(add(x,c),negpow2) -> mul(sub(-c,x),pow2) fold
Simon Pilgrim [Thu, 6 Aug 2020 16:13:13 +0000 (17:13 +0100)]
[InstCombine] Add tests for mul(add(x,c),negpow2) -> mul(sub(-c,x),pow2) fold

Also fix some undef vector elements in the similar vector tests that I missed.

3 years ago[llvm][MLInliner] Don't log 'mandatory' events
Mircea Trofin [Thu, 6 Aug 2020 16:04:15 +0000 (09:04 -0700)]
[llvm][MLInliner] Don't log 'mandatory' events

We don't want mandatory events in the training log. We do want to handle
them, to keep the native size accounting accurate, but that's all.

Fixed the code, also expanded the test to capture this.

Differential Revision: https://reviews.llvm.org/D85373

3 years ago[OpenMP] Fix ref count dec for implicit map of partial data
Joel E. Denny [Thu, 6 Aug 2020 15:36:27 +0000 (11:36 -0400)]
[OpenMP] Fix ref count dec for implicit map of partial data

D85342 broke this case.  The new test case presents an example.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D85369

3 years ago[lldb][NFC] Document and encapsulate OriginMap in ASTContextMetadata
Raphael Isemann [Tue, 4 Aug 2020 12:18:41 +0000 (14:18 +0200)]
[lldb][NFC] Document and encapsulate OriginMap in ASTContextMetadata

Just adds the respective accessor functions to ASTContextMetadata instead
of directly exposing the OriginMap to the whole world.

3 years ago[InstCombine] Add tests for mul(sub(x,y),negpow2) -> mul(sub(y,x),pow2) fold
Simon Pilgrim [Thu, 6 Aug 2020 15:31:40 +0000 (16:31 +0100)]
[InstCombine] Add tests for mul(sub(x,y),negpow2) -> mul(sub(y,x),pow2) fold

Add full vector coverage (these cases are currently not folded).

3 years agoPDBExtras.h - remove unnecessary raw_ostream forward declaration. NFCI.
Simon Pilgrim [Thu, 6 Aug 2020 15:13:50 +0000 (16:13 +0100)]
PDBExtras.h - remove unnecessary raw_ostream forward declaration. NFCI.

We already need to include raw_ostream.h, also add missing StringRef.h implicit dependency.

3 years ago[ELF] Allow sections after a non-SHF_ALLOC section to be covered by PT_LOAD
Fangrui Song [Thu, 6 Aug 2020 15:26:43 +0000 (08:26 -0700)]
[ELF] Allow sections after a non-SHF_ALLOC section to be covered by PT_LOAD

GNU ld allows sections after a non-SHF_ALLOC section to be covered by PT_LOAD
(PR37607) and assigns addresses to non-SHF_ALLOC output sections (similar to
SHF_ALLOC NOBITS sections; the location counter is not advanced).

This patch tries to fix PR37607 (remove a special case in
`Writer<ELFT>::createPhdrs`). To make the created PT_LOAD meaningful, we cannot
reset dot to 0 for a middle non-SHF_ALLOC output section. This results in
removal of two special cases in LinkerScript::assignOffsets. Non-SHF_ALLOC
non-orphan sections can have non-zero addresses like in GNU ld.

The zero address rule for non-SHF_ALLOC sections is weakened to apply to orphan
only. This results in a special case in createSection and findOrphanPos, respectively.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D85100

3 years agoAMDGPU/GlobalISel: Handle llvm.amdgcn.ds.{fadd|fmin|fmax}
Matt Arsenault [Sun, 26 Jul 2020 15:46:23 +0000 (11:46 -0400)]
AMDGPU/GlobalISel: Handle llvm.amdgcn.ds.{fadd|fmin|fmax}

These intrinsics are missing mangling for both the pointer and data
type.

3 years agoAMDGPU/GlobalISel: Try to promote to use packed saturating add/sub
Matt Arsenault [Wed, 15 Jul 2020 17:49:03 +0000 (13:49 -0400)]
AMDGPU/GlobalISel: Try to promote to use packed saturating add/sub

This produces worse results right now for i8 vectors, but that should
be addressed when we actually try to optimize packed vectors.

3 years ago[PatternMatch] allow intrinsic form of min/max with existing matchers
Sanjay Patel [Thu, 6 Aug 2020 14:49:26 +0000 (10:49 -0400)]
[PatternMatch] allow intrinsic form of min/max with existing matchers

I skimmed the existing users of these matchers and don't see any problems
(eg, the caller assumes the matched value was a select instruction without checking).

So I think we can generalize the matching to allow the new intrinsics or the cmp+select idioms.

I did not find any unit tests for the matchers, so added some basics there. The instsimplify
tests are adapted from existing tests for the cmp+select pattern and cover the folds in
simplifyICmpWithMinMax().

Differential Revision: https://reviews.llvm.org/D85230

3 years agoAMDGPU/GlobalISel: Move frame index selection to patterns
Matt Arsenault [Fri, 31 Jul 2020 20:01:38 +0000 (16:01 -0400)]
AMDGPU/GlobalISel: Move frame index selection to patterns

Doesn't really save any code until global value is handled too.

3 years agoAMDGPU: Fix code duplication between the selectors
Matt Arsenault [Sun, 26 Jul 2020 15:30:44 +0000 (11:30 -0400)]
AMDGPU: Fix code duplication between the selectors

Not sure this is the right place for this helper.

3 years ago[XCOFF][AIX] Put each jump table in an independent section if -ffunction-sections...
jasonliu [Thu, 6 Aug 2020 13:19:59 +0000 (13:19 +0000)]
[XCOFF][AIX] Put each jump table in an independent section if -ffunction-sections is specified

If a function is in a unique section, putting all jump tables in
.rodata will prevent functions that have a jump table from being
garbage collected by the linker.
Therefore, we need to put each jump table into a unique section as well.

Reviewed By: Xiangling_L

Differential Revision: https://reviews.llvm.org/D84761

3 years agoAMDGPU/GlobalISel: Implement expansion for rsq.clamp
Matt Arsenault [Sat, 25 Jul 2020 20:26:33 +0000 (16:26 -0400)]
AMDGPU/GlobalISel: Implement expansion for rsq.clamp

Not sure why we handle this removed instruction on newer subtargets
for this one and no others, but maintain compatibility with the DAG.

3 years agoAMDGPU/GlobalISel: Fix trying to widen <3 x s1> boolean ops
Matt Arsenault [Sat, 18 Jul 2020 16:35:28 +0000 (12:35 -0400)]
AMDGPU/GlobalISel: Fix trying to widen <3 x s1> boolean ops

3 years agoAMDGPU/GlobalISel: Stop using G_EXTRACT in argument lowering
Matt Arsenault [Fri, 10 Jul 2020 20:12:35 +0000 (16:12 -0400)]
AMDGPU/GlobalISel: Stop using G_EXTRACT in argument lowering

We really need to put this undef padding stuff into a helper
somewhere, but leave that for when this is moved to generic code.

3 years agoAMDGPU/GlobalISel: Implement LLT version of allowsMisalignedMemoryAccesses
Matt Arsenault [Fri, 31 Jul 2020 15:04:13 +0000 (11:04 -0400)]
AMDGPU/GlobalISel: Implement LLT version of allowsMisalignedMemoryAccesses

3 years ago[flang] Add options to control IMPLICIT NONE
Tim Keith [Thu, 6 Aug 2020 13:47:59 +0000 (06:47 -0700)]
[flang] Add options to control IMPLICIT NONE

Add `-fimplicit-none-type-always` to treat each specification-part
like it has `IMPLICIT NONE`. This is helpful for enforcing good Fortran
programming practices. We might consider something similar for
`IMPLICIT NONE(EXTERNAL)` as well.

Add `-fimplicit-none-type-never` to ignore occurrences of `IMPLICIT NONE`
and `IMPLICIT NONE(TYPE)`. This is to handle cases like the one below,
which violates the standard but is accepted by some compilers:
```
subroutine s(a, n)
  implicit none
  real :: a(n)
  integer :: n
end
```

Differential Revision: https://reviews.llvm.org/D85363

3 years agoAMDGPU/GlobalISel: Make s16 phi legal
Matt Arsenault [Thu, 30 Jul 2020 02:34:11 +0000 (22:34 -0400)]
AMDGPU/GlobalISel: Make s16 phi legal

If we were to have an operation with an s16 def that needs to be
executed in a waterfall loop, not having s16 legal would place an
avoidable burden on RegBankSelect to widen it.

3 years agoAMDGPU/GlobalISel: Fix assert on copy to vcc
Matt Arsenault [Wed, 22 Jul 2020 03:24:02 +0000 (23:24 -0400)]
AMDGPU/GlobalISel: Fix assert on copy to vcc

This was trying to constrain a physical register. By the verifier's
understanding, it's impossible to have a 1-bit copy to vcc/vcc_lo so
don't try to handle physregs.

3 years agoRevert "PDBExtras.h - remove unnecessary raw_ostream forward declaration. NFCI."
Raphael Isemann [Thu, 6 Aug 2020 13:15:36 +0000 (15:15 +0200)]
Revert "PDBExtras.h - remove unnecessary raw_ostream forward declaration. NFCI."

This reverts commit 87c5437afd273e909e0fed3389de7531d5452ea5.

The commit includes several headers in the middle of a function, which
breaks pretty much everything.

3 years ago[mlir][Vector] NFC - Use matchAndRewrite in ContractionOp lowering patterns
Nicolas Vasilache [Thu, 6 Aug 2020 13:01:57 +0000 (09:01 -0400)]
[mlir][Vector] NFC - Use matchAndRewrite in ContractionOp lowering patterns

Replace the use of separate match and rewrite which unnecessarily duplicates logic.

Differential Revision: https://reviews.llvm.org/D85421

3 years ago[DOCS] Add more detail to stack protector documentation
Peter Smith [Wed, 5 Aug 2020 11:47:54 +0000 (12:47 +0100)]
[DOCS] Add more detail to stack protector documentation

The Clang -fstack-protector documentation mentions what functions are considered
vulnerable but does not mention the details of the implementation such as the use
of a global variable for the guard value. This brings the documentation more in
line with the GCC documentation at:
https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html
and gives someone using the option more idea about what is protected.

This partly addresses https://bugs.llvm.org/show_bug.cgi?id=42764

Differential Revision: https://reviews.llvm.org/D85239

3 years ago[DWARFYAML][debug_info] Make the 'Values' field optional.
Xing GUO [Thu, 6 Aug 2020 12:41:12 +0000 (20:41 +0800)]
[DWARFYAML][debug_info] Make the 'Values' field optional.

This patch makes the 'Values' field optional. This is useful when we
handcraft the terminating entry of DIEs.

```
debug_info:
  - Version:  4
    ...
    Entries:
      - AbbrCode: 1
        Values:
          - Value: 0x1234
      - AbbrCode: 0 ## Termination
```

Reviewed By: jhenderson, grimar

Differential Revision: https://reviews.llvm.org/D85397

3 years ago[obj2yaml] Test dumping an empty .debug_aranges section.
Xing GUO [Thu, 6 Aug 2020 12:39:34 +0000 (20:39 +0800)]
[obj2yaml] Test dumping an empty .debug_aranges section.

This patch adds one test case that tests dumping an empty .debug_aranges
section.

Reviewed By: jhenderson, grimar

Differential Revision: https://reviews.llvm.org/D85405

3 years ago[lldb][AArch64] Correct compile options for Neon corefile
David Spickett [Mon, 3 Aug 2020 14:09:48 +0000 (15:09 +0100)]
[lldb][AArch64] Correct compile options for Neon corefile

SVE is not required, it has its own test. Note that
there is no "+neon" so "+simd" is used instead.

Also rename the file to match the name of the corefile
it produces.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D85134

3 years ago[LLDB] Skip test_launch_simple from TestTargetAPI.py on Arm/AArch64 Linux
Muhammad Omair Javaid [Thu, 6 Aug 2020 12:33:26 +0000 (17:33 +0500)]
[LLDB] Skip test_launch_simple from TestTargetAPI.py on Arm/AArch64 Linux

The recently added TestTargetAPI.py test "test_launch_simple" is failing on
Arm/AArch64 Linux targets. Skip it there until it is fixed.

Differential Revision: https://reviews.llvm.org/D85235

3 years ago[GlobalISel][InlineAsm] Fix matching input constraint to physreg
Petar Avramovic [Thu, 6 Aug 2020 12:26:10 +0000 (14:26 +0200)]
[GlobalISel][InlineAsm] Fix matching input constraint to physreg

Add the given input and mark it as tied.
This doesn't create an additional copy compared to
matching an input constraint to a virtual register.

Differential Revision: https://reviews.llvm.org/D85122

3 years ago[ABI][NFC] Fix the confusion of ByVal and ByRef argument names
Anatoly Trosinenko [Thu, 6 Aug 2020 11:34:10 +0000 (14:34 +0300)]
[ABI][NFC] Fix the confusion of ByVal and ByRef argument names

The second argument of getNaturalAlignIndirect() was `bool ByRef`, but
the implementation was just delegating to getIndirect() with `ByRef`
passed unchanged to `bool ByVal` parameter of getIndirect().

Fix a couple of /*ByRef=*/ comments as well.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D85113

3 years agoBitstreamRemarkParser.h - remove unnecessary includes. NFCI.
Simon Pilgrim [Thu, 6 Aug 2020 12:17:36 +0000 (13:17 +0100)]
BitstreamRemarkParser.h - remove unnecessary includes. NFCI.

Remove unused includes, moving to the lib header or cpp file as necessary.

3 years agoRevert "[ELF] Allow sections after a non-SHF_ALLOC section to be covered by PT_LOAD"
Muhammad Omair Javaid [Thu, 6 Aug 2020 11:30:05 +0000 (16:30 +0500)]
Revert "[ELF] Allow sections after a non-SHF_ALLOC section to be covered by PT_LOAD"

This reverts commit 030ddc0a0bb9e2b25319eb681d520a9cee32b761.

This breaks http://lab.llvm.org:8011/builders/lldb-arm-ubuntu
and http://lab.llvm.org:8011/builders/lldb-aarch64-ubuntu

Differential Revision: https://reviews.llvm.org/D85100

3 years agoFix include sorting order. NFC
Simon Pilgrim [Thu, 6 Aug 2020 10:46:53 +0000 (11:46 +0100)]
Fix include sorting order. NFC

3 years ago[X86] getX86MaskVec - replace mask limit from NumElts < 8 with NumElts <= 4
Simon Pilgrim [Thu, 6 Aug 2020 10:46:19 +0000 (11:46 +0100)]
[X86] getX86MaskVec - replace mask limit from NumElts < 8 with NumElts <= 4

As noted on PR46885, the number of mask elements should always be a power of 2, so to fix the static analyzer warning we are better off changing the condition to <= 4. I've added a pow2 assertion as well.
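
A small sketch of why the two conditions are interchangeable once the element count is known to be a power of 2. The function names are hypothetical and only model the comparison, not the surrounding getX86MaskVec logic.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def below_limit_old(num_elts: int) -> bool:
    # Old form of the check: NumElts < 8.
    return num_elts < 8


def below_limit_new(num_elts: int) -> bool:
    # New form: reject non-power-of-2 counts up front, then test NumElts <= 4.
    if not is_power_of_two(num_elts):
        raise ValueError(f"expected a power of 2, got {num_elts}")
    return num_elts <= 4


# The two forms agree on every power of 2.
for shift in range(0, 8):
    n = 1 << shift
    assert below_limit_old(n) == below_limit_new(n)
```

The forms only differ for counts such as 5, 6 or 7, which the power-of-2 check rules out.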

3 years agoPDBExtras.h - remove unnecessary raw_ostream forward declaration. NFCI.
Simon Pilgrim [Thu, 6 Aug 2020 10:28:29 +0000 (11:28 +0100)]
PDBExtras.h - remove unnecessary raw_ostream forward declaration. NFCI.

We already need to include raw_ostream.h, also add missing StringRef.h and cstdint implicit dependencies.

Remove unnecessary includes from PDBExtras.cpp

3 years ago[X86][SSE] Expose all memory offsets in expand load tests
Simon Pilgrim [Wed, 5 Aug 2020 19:14:48 +0000 (20:14 +0100)]
[X86][SSE] Expose all memory offsets in expand load tests

Since we're messing with individual element loads we need to expose this to show what's going on.

Part of the work to fix the masked_expandload.ll regressions in D66004

3 years ago[SVE] Lower scalable vector mul operations.
Paul Walker [Wed, 5 Aug 2020 16:19:54 +0000 (17:19 +0100)]
[SVE] Lower scalable vector mul operations.

This allows us to remove extra patterns from AArch64SVEInstrInfo.td
because we can reuse those required for fixed length vectors.

Differential Revision: https://reviews.llvm.org/D85328

3 years ago[mlir][Linalg] Introduce canonicalization to remove dead LinalgOps
Nicolas Vasilache [Thu, 6 Aug 2020 09:13:33 +0000 (05:13 -0400)]
[mlir][Linalg] Introduce canonicalization to remove dead LinalgOps

When any of the memrefs in a structured linalg op has a zero dimension, it becomes dead.
This is consistent with the fact that linalg ops deduce their loop bounds from their operands.

Note however that this is not the case for the `tensor<0xelt_type>` which is a special convention
that must be lowered away into either `memref<elt_type>` or just `elt_type` before this
canonicalization can kick in.
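
As a rough illustration (a simplified model, not the MLIR implementation), the trip count of a matmul-like op can be derived from its operand shapes, so a zero dimension leaves nothing to execute. The helper names below are hypothetical, and the `tensor<0xelt_type>` convention is deliberately not modelled.

```python
def has_zero_dim(shape):
    """True if any dimension of the shape is zero. A rank-0 shape () has none."""
    return any(d == 0 for d in shape)


def is_dead_structured_op(operand_shapes):
    """Simplified rule: the op is dead if any operand has a zero dimension."""
    return any(has_zero_dim(s) for s in operand_shapes)


def matmul_trip_count(a_shape, b_shape, c_shape):
    """Loop bounds (m, n, k) are deduced from the operands of C += A * B."""
    m, k = a_shape
    k2, n = b_shape
    if k != k2 or tuple(c_shape) != (m, n):
        raise ValueError("inconsistent matmul shapes")
    return m * n * k


shapes = [(4, 0), (0, 8), (4, 8)]
print(matmul_trip_count(*shapes), is_dead_structured_op(shapes))
```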

Differential Revision: https://reviews.llvm.org/D85413

3 years ago[SVE] Implement lowering for fixed length vector multiplication.
Paul Walker [Thu, 6 Aug 2020 10:01:39 +0000 (11:01 +0100)]
[SVE] Implement lowering for fixed length vector multiplication.

NOTE: Also uses SVE code generation for NEON size vectors, instead
of expanding i64 based vector multiplications.

Differential Revision: https://reviews.llvm.org/D85327

3 years ago[MLIR] Change GpuLaunchFuncToGpuRuntimeCallsPass to wrap a RewritePattern with the...
Christian Sigg [Sat, 1 Aug 2020 13:06:25 +0000 (15:06 +0200)]
[MLIR] Change GpuLaunchFuncToGpuRuntimeCallsPass to wrap a RewritePattern with the same functionality.

The RewritePattern will become one of several, and will be part of the LLVM conversion pass (instead of a separate pass following LLVM conversion).

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D84946

3 years ago[analyzer][tests] Understand when diagnostics change between builds
Valeriy Savchenko [Wed, 5 Aug 2020 14:32:24 +0000 (17:32 +0300)]
[analyzer][tests] Understand when diagnostics change between builds

Before this patch, `SATest compare` produced quite obscure results
when something about a diagnostic had changed (e.g. its description
or the name of the corresponding checker) because it was simply two
lists of warnings, ADDED and REMOVED.  It was up to the developer
to match those warnings, understand that they are essentially the
same, and figure out what caused the difference.

This patch introduces another category of results: MODIFIED.
It tries to match new warnings against the old ones and prints out
clues on what is different between two builds.
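
One way such matching could work, as a sketch only: pair each added warning with a removed warning at the same location and report which fields differ. The `Diagnostic` type, the `compare` function and the matching key (file and line) are assumptions for illustration, not what SATest actually does, and the sample warnings are made up.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Diagnostic:
    file: str
    line: int
    checker: str
    description: str


def compare(old, new):
    """Return (added, removed, modified) between two lists of diagnostics."""
    old_set, new_set = set(old), set(new)
    removed = old_set - new_set
    added = new_set - old_set

    # Index removed warnings by location. If several share a location,
    # only one is kept, which is a simplification of this sketch.
    by_location = {(w.file, w.line): w for w in removed}

    still_added, modified = [], []
    key = lambda w: (w.file, w.line, w.checker, w.description)
    for w in sorted(added, key=key):
        prev = by_location.pop((w.file, w.line), None)
        if prev is None:
            still_added.append(w)
        else:
            changed = [f.name for f in fields(w)
                       if getattr(prev, f.name) != getattr(w, f.name)]
            modified.append((prev, w, changed))

    still_removed = sorted(by_location.values(), key=key)
    return still_added, still_removed, modified


old = [Diagnostic("a.c", 10, "core.NullDereference", "Dereference of null pointer"),
       Diagnostic("b.c", 5, "unix.Malloc", "Memory leak")]
new = [Diagnostic("a.c", 10, "core.NullDereference", "Dereference of null pointer 'p'"),
       Diagnostic("c.c", 1, "core.DivideZero", "Division by zero")]
added, removed, modified = compare(old, new)
print(len(added), len(removed), [changed for _, _, changed in modified])
```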

Differential Revision: https://reviews.llvm.org/D85311

3 years ago[dsymutil] Disable dsymutil/X86/reproducer.test on windows.
Alexey Lapshin [Tue, 4 Aug 2020 16:46:33 +0000 (19:46 +0300)]
[dsymutil] Disable dsymutil/X86/reproducer.test on windows.

The dsymutil/X86/reproducer.test test could create paths
longer than MAX_PATH:

C:\ps4-buildslave2\llvm-clang-x86_64-expensive-checks-win
\build\test\tools\dsymutil\X86\Output\reproducer.test.tmp.repro
\ps4-buildslave2\llvm-clang-x86_64-expensive-checks-win\build
\test\tools\dsymutil\X86\Output\reproducer.test.tmp\Inputs\
basic1.macho.x86_64.o

Disable this test on Windows.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D85294

3 years ago[mlir] Lower DimOp to LLVM for unranked memrefs.
Alexander Belyaev [Thu, 6 Aug 2020 08:33:48 +0000 (10:33 +0200)]
[mlir] Lower DimOp to LLVM for unranked memrefs.

Differential Revision: https://reviews.llvm.org/D85361

3 years ago[flang] Add parser support for OpenMP allocate clause
Irina Dobrescu [Wed, 5 Aug 2020 14:58:13 +0000 (15:58 +0100)]
[flang] Add parser support for OpenMP allocate clause

Differential Revision: https://reviews.llvm.org/D85212

3 years ago[InstCombine] Fold freeze(undef) into a proper constant
Juneyoung Lee [Thu, 30 Jul 2020 14:15:26 +0000 (23:15 +0900)]
[InstCombine] Fold freeze(undef) into a proper constant

This is a simple patch that folds freeze(undef) into a proper constant after inspecting its uses.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84948

3 years ago[InstCombine] Add tests for D84948; NFC
Juneyoung Lee [Thu, 6 Aug 2020 05:32:42 +0000 (14:32 +0900)]
[InstCombine] Add tests for D84948; NFC

3 years ago[LoopVectorizer] Inloop vector reductions
David Green [Thu, 6 Aug 2020 09:10:50 +0000 (10:10 +0100)]
[LoopVectorizer] Inloop vector reductions

Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiply them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop, not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe that
represents the reductions and replaces the original chain of reduction
operations. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).
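
The difference between the two reduction placements can be sketched in scalar Python, assuming a vector factor of 16 and a dot-product style loop. This models only the shape of the computation, not the vectorizer or the MVE instructions. Python integers do not wrap, so the widening to i32 is implicit here.

```python
def dot_out_of_loop(a, b, vf=16):
    """Usual form: keep a vector accumulator, reduce it once after the loop."""
    if len(a) != len(b) or len(a) % vf != 0:
        raise ValueError("inputs must have equal length, a multiple of vf")
    acc = [0] * vf  # one partial sum per lane
    for i in range(0, len(a), vf):
        for lane in range(vf):
            acc[lane] += a[i + lane] * b[i + lane]
    return sum(acc)  # the reduction happens after the loop


def dot_in_loop(a, b, vf=16):
    """In-loop form: each iteration reduces vf products into a scalar."""
    if len(a) != len(b) or len(a) % vf != 0:
        raise ValueError("inputs must have equal length, a multiple of vf")
    acc = 0  # scalar accumulator, like a general purpose register
    for i in range(0, len(a), vf):
        acc += sum(x * y for x, y in zip(a[i:i + vf], b[i:i + vf]))
    return acc


a = list(range(-16, 16))  # values that fit in i8
b = [3] * 32
print(dot_out_of_loop(a, b), dot_in_loop(a, b))
```

Both forms compute the same sum. The in-loop form carries a single scalar between iterations instead of a vector of wide partial sums.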

Differential Revision: https://reviews.llvm.org/D75069