platform/upstream/llvm.git
3 years ago[clangd] Disable msan instrumentation for generated Evaluate().
Utkarsh Saxena [Tue, 29 Sep 2020 15:06:13 +0000 (17:06 +0200)]
[clangd] Disable msan instrumentation for generated Evaluate().

MSAN build times out for generated DecisionForest inference runtime.

A solution worth trying is to split the function into 300 smaller
functions and then re-enable msan.

For now we are disabling instrumentation for the generated function.

Differential Revision: https://reviews.llvm.org/D88495

3 years agoMSP430TargetMachine.h - remove unused includes. NFCI.
Simon Pilgrim [Tue, 29 Sep 2020 15:36:58 +0000 (16:36 +0100)]
MSP430TargetMachine.h - remove unused includes. NFCI.

3 years agoNVPTXTargetMachine.h - remove unused includes. NFCI.
Simon Pilgrim [Tue, 29 Sep 2020 15:29:51 +0000 (16:29 +0100)]
NVPTXTargetMachine.h - remove unused includes. NFCI.

3 years agoSparcSubtarget.h - cleanup include dependencies. NFCI.
Simon Pilgrim [Tue, 29 Sep 2020 15:15:35 +0000 (16:15 +0100)]
SparcSubtarget.h - cleanup include dependencies. NFCI.

TargetFrameLowering.h is guaranteed to be covered by SparcFrameLowering.h

Fix missing implicit Triple.h dependency.

3 years ago[OpenMP][VE plugin] Fixing failure to build VE plugin with consolidated error handlin...
Manoel Roemmer [Tue, 29 Sep 2020 14:21:09 +0000 (16:21 +0200)]
[OpenMP][VE plugin] Fixing failure to build VE plugin with consolidated error handling in libomptarget

The libomptarget VE plugin [[
http://lab.llvm.org:8014/builders/clang-ve-ninja/builds/8937/steps/build-unified-tree/logs/stdio
| fails to build ]] after ae95ceeb8f98d81f615c69da02f73b5ee6b1519a.

Differential Revision: https://reviews.llvm.org/D88476

3 years ago[scudo][standalone] Fix Primary's ReleaseToOS test
Kostya Kortchinsky [Tue, 29 Sep 2020 00:21:00 +0000 (17:21 -0700)]
[scudo][standalone] Fix Primary's ReleaseToOS test

Said test was flaking on Fuchsia for non-obvious reasons, and only
for ASan variants (the release was returning 0).

It turned out that the templating was off, `true` being promoted to
a `s32` and used as the minimum interval argument. This meant that in
some circumstances, the normal release would occur, and the forced
release would have nothing to release, hence the 0 byte released.

The symbols are giving it away (note the 1):
```
scudo::SizeClassAllocator64<scudo::FixedSizeClassMap<scudo::DefaultSizeClassConfig>,24ul,1,2147483647,false>::releaseToOS(void)
```

This also probably means that there was no MTE version of that test!
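
A minimal sketch of how such a mix-up compiles silently, assuming a simplified allocator template (`ToyAllocator` and its parameter names are hypothetical, not Scudo's real declaration): a `bool` passed where an `s32` non-type template parameter is expected is promoted to 1, and the trailing flag keeps its default.

```cpp
#include <cstdint>

using s32 = std::int32_t;

// Hypothetical stand-in for an allocator template; the parameter list is an
// assumption made for illustration only.
template <unsigned RegionSizeLog, s32 MinReleaseIntervalMs,
          s32 MaxReleaseIntervalMs, bool SupportsTagging = false>
struct ToyAllocator {
  static constexpr s32 minReleaseIntervalMs() { return MinReleaseIntervalMs; }
  static constexpr bool supportsTagging() { return SupportsTagging; }
};

// `true` lands in the s32 slot and is promoted to 1; no diagnostic is issued.
using Misconfigured = ToyAllocator<24, true, INT32_MAX>;

static_assert(Misconfigured::minReleaseIntervalMs() == 1,
              "bool argument was promoted to the integer 1");
static_assert(!Misconfigured::supportsTagging(),
              "the trailing flag silently kept its default");
```

The instantiated argument list is `24, 1, 2147483647, false`, the same shape as in the symbol above.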

Differential Revision: https://reviews.llvm.org/D88457

3 years ago[SVE] Fix typo in CHECK lines for sve-fixed-length-int-reduce.ll
Cameron McInally [Tue, 29 Sep 2020 15:12:58 +0000 (10:12 -0500)]
[SVE] Fix typo in CHECK lines for sve-fixed-length-int-reduce.ll

3 years ago[InstCombine] use redirect of input file in regression tests; NFC
Sanjay Patel [Tue, 29 Sep 2020 15:02:03 +0000 (11:02 -0400)]
[InstCombine] use redirect of input file in regression tests; NFC

This is a repeat of 1880092722 from 2009. We should have less risk
of hitting bugs at this point because we auto-generate positive CHECK
lines only, but this makes things consistent.

Copying the original commit msg:
"Change tests from "opt %s" to "opt < %s" so that opt doesn't see the
input filename so that opt doesn't print the input filename in the
output so that grep lines in the tests don't unintentionally match
strings in the input filename."

3 years ago[mlir][openacc] Add init operation
Valentin Clement [Tue, 29 Sep 2020 14:58:46 +0000 (10:58 -0400)]
[mlir][openacc] Add init operation

This patch introduces the init operation that represents the init executable directive
from the OpenACC 3.0 specifications.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D88254

3 years ago[InstCombine] Add some basic trunc(lshr(zext(x),c)) tests
Simon Pilgrim [Tue, 29 Sep 2020 14:49:43 +0000 (15:49 +0100)]
[InstCombine] Add some basic trunc(lshr(zext(x),c)) tests

Copied from the sext equivalents

3 years ago[mlir][openacc] Add wait operation
Valentin Clement [Tue, 29 Sep 2020 14:39:13 +0000 (10:39 -0400)]
[mlir][openacc] Add wait operation

This patch introduces the wait operation that represents the OpenACC wait directive.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D88125

3 years ago[clangd] Improve PopulateSwitch tweak to work on non-empty switches
Tadeo Kondrak [Tue, 29 Sep 2020 14:29:22 +0000 (16:29 +0200)]
[clangd] Improve PopulateSwitch tweak to work on non-empty switches

Improve the recently-added PopulateSwitch tweak to work on non-empty switches.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D88434

3 years ago[InstCombine] Inherit exact flags on extended shifts in trunc (lshr (sext A), C)...
Simon Pilgrim [Tue, 29 Sep 2020 14:30:46 +0000 (15:30 +0100)]
[InstCombine] Inherit exact flags on extended shifts in trunc (lshr (sext A), C) --> (ashr A, C)

This was missed in D88475

3 years ago[mlir] Expose Dialect class and registration/loading to C API
Alex Zinenko [Tue, 29 Sep 2020 14:23:02 +0000 (16:23 +0200)]
[mlir] Expose Dialect class and registration/loading to C API

- Add a minimalist C API for mlir::Dialect.
- Allow one to query the context about registered and loaded dialects.
- Add API for loading dialects.
- Provide functions to register the Standard dialect.

When used naively, this will require registering each dialect separately. When
we have more than one exposed, we can add variadic macros that expand to
individual calls.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D88162

3 years ago[InstCombine] Add exact shift tests missed in D88475
Simon Pilgrim [Tue, 29 Sep 2020 14:05:30 +0000 (15:05 +0100)]
[InstCombine] Add exact shift tests missed in D88475

I missed the post-LGTM comment from @lebedev.ri

3 years ago[Sema] Address-space sensitive check for unbounded arrays (v2)
Chris Hamilton [Tue, 29 Sep 2020 14:11:41 +0000 (16:11 +0200)]
[Sema] Address-space sensitive check for unbounded arrays (v2)

Check applied to unbounded (incomplete) arrays and pointers to spot
cases where the computed address is beyond the largest possible
addressable extent of the array, based on the address space in which the
array is declared, or which the pointer refers to.

Check helps to avoid cases of nonsense pointer math and array indexing
which could lead to linker failures or runtime exceptions.  Of
particular interest when building for embedded systems with small
address spaces.
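
A simplified model of the idea, not Clang's implementation: given a pointer width for the address space and an element size, any index at or beyond the number of elements that fit in the address space cannot name a valid element. The function name and its parameters are assumptions for this sketch.

```cpp
#include <cstdint>

// Returns true when Index lies beyond the largest array of ElementSizeBytes
// elements that could fit in an address space with PointerWidthBits-bit
// pointers. Assumes PointerWidthBits < 64 and ElementSizeBytes > 0.
constexpr bool indexExceedsAddressSpace(unsigned PointerWidthBits,
                                        std::uint64_t ElementSizeBytes,
                                        std::uint64_t Index) {
  const std::uint64_t AddressableBytes = std::uint64_t{1} << PointerWidthBits;
  const std::uint64_t MaxElements = AddressableBytes / ElementSizeBytes;
  return Index >= MaxElements;
}
```

With 16-bit pointers and 4-byte elements, at most 16384 elements are addressable, so index 16383 is accepted and index 16384 is flagged.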

This is version 2 of this patch -- version 1 had some testing issues
due to a sign error in existing code.  That error is corrected and
the lit test for this change is extended to verify the fix.

Originally reviewed/accepted by: aaron.ballman
Original revision: https://reviews.llvm.org/D86796

Reviewed By: ebevhan

Differential Revision: https://reviews.llvm.org/D88174

3 years ago[SDAG] Do not convert undef to 0 when folding CONCAT/BUILD_VECTOR
Krzysztof Parzyszek [Thu, 24 Sep 2020 23:59:02 +0000 (18:59 -0500)]
[SDAG] Do not convert undef to 0 when folding CONCAT/BUILD_VECTOR

Differential Revision: https://reviews.llvm.org/D88273

3 years ago[InstCombine] visitTrunc - trunc (lshr (sext A), C) --> (ashr A, C) non-uniform support
Simon Pilgrim [Tue, 29 Sep 2020 13:45:30 +0000 (14:45 +0100)]
[InstCombine] visitTrunc - trunc (lshr (sext A), C) --> (ashr A, C) non-uniform support

This came from @lebedev.ri's suggestion to use m_SpecificInt_ICMP for D88429 - since I was going to change the m_APInt to m_Constant for that patch I thought I would do it for the only other user of the APInt first.

I've added a ConstantExpr::getUMin helper - it's trivial to add UMAX/SMIN/SMAX but thought I'd wait until we have use cases.
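
As a scalar illustration of the fold named in the subject line, the sketch below models `trunc (lshr (sext A), C)` for an 8-bit `A` extended to 32 bits and compares it with an arithmetic shift whose amount is clamped with an unsigned minimum. The widths, the clamp to 7 and the range `C <= 24` are assumptions chosen for this model; it does not use LLVM's APIs.

```cpp
#include <algorithm>
#include <cstdint>

// trunc(lshr(sext(A), C)) for A : i8, widened to i32, truncated back to i8.
constexpr std::uint8_t truncLshrSext(std::int8_t A, unsigned C) {
  const std::uint32_t Wide =
      static_cast<std::uint32_t>(static_cast<std::int32_t>(A)); // sext
  return static_cast<std::uint8_t>(Wide >> C);                  // lshr, trunc
}

// ashr(A, umin(C, 7)), written without relying on signed right shift.
constexpr std::uint8_t ashrClamped(std::int8_t A, unsigned C) {
  const unsigned Amount = std::min(C, 7u);
  const std::int32_t Wide = A;
  const std::int32_t Shifted =
      Wide < 0 ? ~(~Wide >> Amount) : (Wide >> Amount);
  return static_cast<std::uint8_t>(Shifted);
}

// Exhaustively compare both forms for every i8 value and every C in [0, 24].
constexpr bool foldHoldsForAllInputs() {
  for (int A = -128; A <= 127; ++A)
    for (unsigned C = 0; C <= 24; ++C)
      if (truncLshrSext(static_cast<std::int8_t>(A), C) !=
          ashrClamped(static_cast<std::int8_t>(A), C))
        return false;
  return true;
}

static_assert(foldHoldsForAllInputs(), "fold must hold for C in [0, 24]");
```

For shift amounts above 24 the logical shift starts moving zero bits into the truncated result, so the model deliberately stops there.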

Differential Revision: https://reviews.llvm.org/D88475

3 years ago[mlir][openacc] Add update operation
Valentin Clement [Tue, 29 Sep 2020 13:56:54 +0000 (09:56 -0400)]
[mlir][openacc] Add update operation

This patch introduces the update operation that represents the OpenACC update directive.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D88102

3 years ago[mlir][Linalg] Refactor Linalg op initTensors support - NFC
Nicolas Vasilache [Tue, 29 Sep 2020 12:23:37 +0000 (08:23 -0400)]
[mlir][Linalg] Refactor Linalg op initTensors support - NFC

Manually-defined named ops do not currently support `init_tensors` or return values and may never support them. Add extra interface to the StructuredOpInterface so that we can still write op-agnostic transformations based on StructuredOpInterface.

This is an NFC extension in preparation for tiling on tensors.

Differential Revision: https://reviews.llvm.org/D88481

3 years ago[GlobalISel] fix widenScalarUnmerge if widen type is not a multiple of destination...
Dominik Montada [Mon, 28 Sep 2020 14:38:35 +0000 (16:38 +0200)]
[GlobalISel] fix widenScalarUnmerge if widen type is not a multiple of destination type

Fix creation of illegal unmerge when widen was requested to a type which
is not a multiple of the destination type. E.g. when trying to widen
an s48 unmerge to s64 the existing code would create an illegal unmerge
from s64 to s48.

Instead, create further unmerges to a GCD type, then use this to remerge
these intermediate results to the actual destinations.
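
The arithmetic behind the GCD approach can be sketched as follows, using the s48/s64 widths from the example above. `splitViaGcd` and `PieceCounts` are hypothetical names for this illustration and are not part of GlobalISel.

```cpp
// Greatest common divisor of two bit widths (Euclid's algorithm).
constexpr unsigned gcdBits(unsigned A, unsigned B) {
  while (B != 0) {
    const unsigned Remainder = A % B;
    A = B;
    B = Remainder;
  }
  return A;
}

struct PieceCounts {
  unsigned PieceBits;     // width of the common piece type
  unsigned PiecesPerWide; // pieces obtained from one widened value
  unsigned PiecesPerDest; // pieces needed to rebuild one destination
};

// Compute how a widened value and a destination decompose into GCD pieces.
constexpr PieceCounts splitViaGcd(unsigned WideBits, unsigned DestBits) {
  const unsigned Piece = gcdBits(WideBits, DestBits);
  return PieceCounts{Piece, WideBits / Piece, DestBits / Piece};
}
```

For s64 and s48 the common piece type is s16: the wide value splits into four pieces and each s48 destination is rebuilt from three of them.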

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D88422

3 years ago[mlir][Linalg] Refactor Linalg creation of loops to allow passing iterArgs - NFC
Nicolas Vasilache [Tue, 29 Sep 2020 12:07:08 +0000 (08:07 -0400)]
[mlir][Linalg] Refactor Linalg creation of loops to allow passing iterArgs - NFC

This revision changes the signatures of the helper functions that Linalg uses to create loops so that they can also take iterArgs.
iterArgs are asserted empty to ensure no functional change.
This is a mechanical change in preparation for tiling Linalg on tensors, made separately to avoid polluting that implementation with an NFC change.

Differential Revision: https://reviews.llvm.org/D88480

3 years ago[AArch64] Add v8.5 Branch Target Identification support.
Daniel Kiss [Tue, 29 Sep 2020 13:50:19 +0000 (15:50 +0200)]
[AArch64] Add v8.5 Branch Target Identification support.

The .note.gnu.property must be in the assembly file to indicate
support for BTI; otherwise BTI will be disabled for the whole library.
__unw_getcontext and libunwind::Registers_arm64::jumpto() may be called
indirectly therefore they should start with a landing pad.

Reviewed By: tamas.petz, #libunwind, compnerd

Differential Revision: https://reviews.llvm.org/D77786

3 years agoRevert "[AMDGPU] Reorganize GCN subtarget features for unaligned access"
Mirko Brkusanin [Tue, 29 Sep 2020 13:29:26 +0000 (15:29 +0200)]
Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access"

This reverts commit f5cd7ec9f3fc969ff5e1feed961996844333de3b.

Certain rocPRIM/rocThrust/hipCUB tests were failing because of this change.

3 years ago[mlir] Fix shared libs build
Andrzej Warzynski [Tue, 29 Sep 2020 13:20:35 +0000 (14:20 +0100)]
[mlir] Fix shared libs build

The following change causes the shared libraries build
(BUILD_SHARED_LIBS=On) to fail:
  * https://reviews.llvm.org/D88351
This patch will fix that.

Differential Revision: https://reviews.llvm.org/D88484

3 years ago[SDag] Verify DAG divergence after dumping. NFC.
Jay Foad [Mon, 28 Sep 2020 12:37:49 +0000 (13:37 +0100)]
[SDag] Verify DAG divergence after dumping. NFC.

When debugging, it's useful to be able to see the DAG that has just
failed divergence verification.

3 years ago[SDag] Refactor and simplify divergence calculation and checking. NFC.
Jay Foad [Mon, 28 Sep 2020 11:44:43 +0000 (12:44 +0100)]
[SDag] Refactor and simplify divergence calculation and checking. NFC.

3 years ago[SystemZ] Don't emit PC-relative memory accesses to unaligned symbols.
Jonas Paulsson [Thu, 10 Sep 2020 13:59:36 +0000 (15:59 +0200)]
[SystemZ] Don't emit PC-relative memory accesses to unaligned symbols.

In the presence of packed structures (#pragma pack(1)) where elements are
referenced through pointers, there will be stores/loads with alignment values
matching the default alignments for the element types while the elements are
in fact unaligned. Strictly speaking this is incorrect source code, but is
unfortunately part of existing code and therefore now addressed.

This patch improves the pattern predicate for PC-relative loads and stores by
not only checking the alignment value of the instruction, but also making
sure that the symbol (and element) itself is aligned.
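
A small illustration of the situation described above, assuming a compiler that honours `#pragma pack` (GCC, Clang and MSVC do): the member's type suggests 4-byte alignment, but its offset inside the packed structure is odd. `PackedRecord` and `isNaturallyAligned` are made-up names for this sketch.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct PackedRecord {
  std::uint8_t Tag;
  std::uint32_t Value; // placed at offset 1 because padding is suppressed
};
#pragma pack(pop)

// True when an object at Offset satisfies the given alignment.
constexpr bool isNaturallyAligned(std::size_t Offset, std::size_t Alignment) {
  return Offset % Alignment == 0;
}

static_assert(offsetof(PackedRecord, Value) == 1,
              "packing removes the padding before Value");
static_assert(!isNaturallyAligned(offsetof(PackedRecord, Value), 4),
              "Value is not 4-byte aligned inside the packed struct");
```

A `std::uint32_t *` pointing at `Value` carries the type's default alignment even though the address is not a multiple of 4, which is the mismatch the predicate now guards against.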

Fixes https://bugs.llvm.org/show_bug.cgi?id=44405

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D87510

3 years ago[mlir][GPU] Improve constant sinking in kernel outlining
Stephan Herhut [Tue, 29 Sep 2020 11:20:37 +0000 (13:20 +0200)]
[mlir][GPU] Improve constant sinking in kernel outlining

The previous implementation did not support sinking simple expressions. In particular,
it is often beneficial to sink dim operations.

Differential Revision: https://reviews.llvm.org/D88439

3 years ago[LoopUtils] Only verify SE in builds with assertions.
Florian Hahn [Tue, 29 Sep 2020 12:37:24 +0000 (13:37 +0100)]
[LoopUtils] Only verify SE in builds with assertions.

Follow up to 60b852092c98.

3 years ago[sanitizer] Don't build gmock for tests (follow-up to 82827244).
Hans Wennborg [Tue, 29 Sep 2020 12:29:58 +0000 (14:29 +0200)]
[sanitizer] Don't build gmock for tests (follow-up to 82827244).

A use of gmock was briefly added in a90229d6, but was soon removed in
82827244. This also removes it from the cmake files.

3 years ago[SYCL] Assume SYCL device functions are convergent
Alexey Bader [Tue, 25 Aug 2020 14:05:19 +0000 (17:05 +0300)]
[SYCL] Assume SYCL device functions are convergent

The SYCL device compiler (similar to other SPMD compilers) assumes that
functions are convergent by default to avoid invalid transformations.
This attribute can be removed if the compiler can prove that the function does
not have convergent operations.

Reviewed By: Naghasan

Differential Revision: https://reviews.llvm.org/D87282

3 years ago[AArch64] Add BTI to CFI jumptables.
Daniel Kiss [Tue, 29 Sep 2020 11:35:25 +0000 (13:35 +0200)]
[AArch64] Add BTI to CFI jumptables.

With branch protection the jump to the jump table entries requires a landing pad.

Reviewed By: eugenis, tamas.petz

Differential Revision: https://reviews.llvm.org/D81251

3 years ago[IndVarSimplify] Fix Modified status for removal of overflow intrinsics
David Stenberg [Tue, 29 Sep 2020 09:04:13 +0000 (11:04 +0200)]
[IndVarSimplify] Fix Modified status for removal of overflow intrinsics

When removing an overflow intrinsic the Changed status in SimplifyIndvar
was not set, leading to the IndVarSimplify pass returning an incorrect
status.

This was caught using the check introduced by D80916.

As pointed out in the code review, a similar bug may exist for
eliminateTrunc().

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D85971

3 years ago[msan] Fix llvm.abs.v intrinsic
Vitaly Buka [Tue, 29 Sep 2020 10:15:37 +0000 (03:15 -0700)]
[msan] Fix llvm.abs.v intrinsic

The last argument of the intrinsic is a boolean
flag to control INT_MIN handling and does
not affect msan metadata.

3 years ago[msan] Add test for vector abs intrinsic
Vitaly Buka [Tue, 29 Sep 2020 10:08:24 +0000 (03:08 -0700)]
[msan] Add test for vector abs intrinsic

3 years ago[OpenMPOpt][Fix] Only initialize ICV initial values once.
sstefan1 [Tue, 29 Sep 2020 09:51:36 +0000 (11:51 +0200)]
[OpenMPOpt][Fix] Only initialize ICV initial values once.

Reviewers: jdoerfert, ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D88441

3 years ago[InstCombine] Add trunc(lshr(sext(x),c)) non-uniform vector tests
Simon Pilgrim [Tue, 29 Sep 2020 09:56:00 +0000 (10:56 +0100)]
[InstCombine] Add trunc(lshr(sext(x),c)) non-uniform vector tests

3 years ago[LoopDeletion] Forget loop before setting values to undef
Florian Hahn [Tue, 29 Sep 2020 09:38:44 +0000 (10:38 +0100)]
[LoopDeletion] Forget loop before setting values to undef

After D71539, we need to forget the loop before setting the incoming
values of phi nodes in exit blocks, because we are looking through those
phi nodes now and the SCEV expression could depend on the loop phi. If
we update the phi nodes before forgetting the loop, we miss those users
during invalidation.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D88167

3 years ago[SCEV][NFC] Introduce isBasicBlockEntryGuardedByCond
Max Kazantsev [Tue, 29 Sep 2020 08:32:53 +0000 (15:32 +0700)]
[SCEV][NFC] Introduce isBasicBlockEntryGuardedByCond

Currently, we have the `isLoopEntryGuardedByCond` method in SCEV, which
checks that some fact is true if we enter the loop. In fact, this is just a
particular case of the more general concept `isBasicBlockEntryGuardedByCond`
applied to the given loop's header. The logic of this code is largely
independent of the given loop and only cares about the code above it.

This patch makes this generalization. Now we can query it for any block,
and `isLoopEntryGuardedByCond` is just a particular case.

Differential Revision: https://reviews.llvm.org/D87828
Reviewed By: fhahn

3 years agoRevert "OpaquePtr: Add type to sret attribute"
Tres Popp [Tue, 29 Sep 2020 08:24:54 +0000 (10:24 +0200)]
Revert "OpaquePtr: Add type to sret attribute"

This reverts commit 55c4ff91bd820d72014f63dcf7f3d5a0d3397986.

Issues were introduced as discussed in https://reviews.llvm.org/D88241
where this change made previous bugs in the linker and BitCodeWriter
visible.

3 years ago[IsKnownNonZero] Handle the case with non-constant phi nodes
Serguei Katkov [Thu, 24 Sep 2020 17:45:15 +0000 (00:45 +0700)]
[IsKnownNonZero] Handle the case with non-constant phi nodes

Handle the case when all inputs of the phi are proven to be non-zero.

Constants are checked at the beginning of this method, before the check for depth of recursion,
so it is a special case of non-constant phi.

Recursion depth is already handled by the function.

Reviewers: aqjune, nikic, efriedma
Reviewed By: nikic
Subscribers: dantrushin, hiraditya, jdoerfert, llvm-commits
Differential Revision: https://reviews.llvm.org/D88276

3 years agoRevert "Recommit "[SCCP] Do not replace deref'able ptr with un-deref'able one.""
Florian Hahn [Tue, 29 Sep 2020 08:18:19 +0000 (09:18 +0100)]
Revert "Recommit "[SCCP] Do not replace deref'able ptr with un-deref'able one.""

Looks like there is still another remaining issue:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-msan/builds/22273/steps/build%20libcxx%2Fmsan/logs/stdio

This reverts commit 86a20d9e34f5a9989da72097f23f3b0a44157e73.

3 years agoRecommit "[SCCP] Do not replace deref'able ptr with un-deref'able one."
Florian Hahn [Mon, 28 Sep 2020 15:08:30 +0000 (16:08 +0100)]
Recommit "[SCCP] Do not replace deref'able ptr with un-deref'able one."

This version includes a small fix allowing function pointers to be
unconditionally replaced for now.

This reverts commit 4c5e4aa89b11ec3253258b8df5125833773d1b1e.

3 years ago[NFC][ARM] Comments and lambdas
Sam Parker [Tue, 29 Sep 2020 07:41:53 +0000 (08:41 +0100)]
[NFC][ARM] Comments and lambdas

Add some comments in LowOverheadLoops and make some lambda variables
explicit arguments instead of capturing.

3 years agoThis reduces code duplication between CGObjCMac.cpp and Mangle.cpp
Ellis Hoag [Tue, 29 Sep 2020 06:25:24 +0000 (02:25 -0400)]
This reduces code duplication between CGObjCMac.cpp and Mangle.cpp
for generating the mangled name of an Objective-C method.

This has no intended functionality change.

https://reviews.llvm.org/D88329

3 years ago[Driver] Filter out <libdir>/gcc and <libdir>/gcc-cross if they do not exist
Dmitry Antipov [Tue, 29 Sep 2020 03:32:51 +0000 (06:32 +0300)]
[Driver] Filter out <libdir>/gcc and <libdir>/gcc-cross if they do not exist

Differential Revision: https://reviews.llvm.org/D87901

3 years ago[X86] Add computeKnownBits support for PEXT.
Craig Topper [Tue, 29 Sep 2020 05:52:31 +0000 (22:52 -0700)]
[X86] Add computeKnownBits support for PEXT.

The number of zeros in the mask provides a lower bound on the number
of leading zeros in the result.
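
The bound can be checked with a portable software model of 32-bit PEXT. This is only an illustration of the reasoning; the helper names are made up and the code is unrelated to the actual computeKnownBits implementation.

```cpp
#include <cstdint>

// Software model of PEXT: gather the bits of Src selected by Mask and pack
// them into the low bits of the result.
constexpr std::uint32_t pext32(std::uint32_t Src, std::uint32_t Mask) {
  std::uint32_t Result = 0;
  unsigned OutBit = 0;
  for (unsigned Bit = 0; Bit < 32; ++Bit) {
    if (Mask & (std::uint32_t{1} << Bit)) {
      if (Src & (std::uint32_t{1} << Bit))
        Result |= std::uint32_t{1} << OutBit;
      ++OutBit;
    }
  }
  return Result;
}

// Number of zero bits above the highest set bit (32 for a zero value).
constexpr unsigned countLeadingZeros32(std::uint32_t Value) {
  unsigned Count = 0;
  for (std::uint32_t Bit = 0x80000000u; Bit != 0 && (Value & Bit) == 0;
       Bit >>= 1)
    ++Count;
  return Count;
}

// Number of zero bits anywhere in the value.
constexpr unsigned countZeroBits32(std::uint32_t Value) {
  unsigned Zeros = 0;
  for (unsigned Bit = 0; Bit < 32; ++Bit)
    if ((Value & (std::uint32_t{1} << Bit)) == 0)
      ++Zeros;
  return Zeros;
}

// The mask 0x0000FF00 has 24 zero bits, so the result has at least 24
// leading zeros regardless of the source value.
static_assert(countLeadingZeros32(pext32(0xFFFFFFFFu, 0x0000FF00u)) >=
                  countZeroBits32(0x0000FF00u),
              "leading zeros are bounded below by the zeros in the mask");
```

Only as many result bits as there are set mask bits can ever be written, and they are packed from bit 0 upward, which is why every zero in the mask contributes one guaranteed leading zero.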

3 years ago[X86] Add known bits test for PEXT. NFC
Craig Topper [Tue, 29 Sep 2020 05:46:27 +0000 (22:46 -0700)]
[X86] Add known bits test for PEXT. NFC

3 years agoRevert "[OpenMP][FIX] Verify compatible types for declare variant calls"
Johannes Doerfert [Tue, 29 Sep 2020 05:36:45 +0000 (00:36 -0500)]
Revert "[OpenMP][FIX] Verify compatible types for declare variant calls"

This reverts commit c942095790decf525a445f3bd68fb9bcc9aa43c6.

One of the tests broke, revert to investigate.

3 years ago[Docs][NewPM] Add note about required passes
Arthur Eubanks [Fri, 25 Sep 2020 22:21:54 +0000 (15:21 -0700)]
[Docs][NewPM] Add note about required passes

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D88342

3 years ago[NFC] Use assert instead of checking the guaranteed condition
Max Kazantsev [Tue, 29 Sep 2020 04:37:17 +0000 (11:37 +0700)]
[NFC] Use assert instead of checking the guaranteed condition

From preconditions it is known that either A dominates B or
B dominates A. If A does not dominate B, we do not really need
to check it. Assert should be enough. Should save some compile
time.

3 years ago[IndVars] Remove exiting conditions that are trivially true/false
Max Kazantsev [Tue, 29 Sep 2020 04:34:15 +0000 (11:34 +0700)]
[IndVars] Remove exiting conditions that are trivially true/false

When removing exiting loop conditions, we only consider checks for
which we know the exact exit count. We could also eliminate checks for
which the condition is always true/false.

Differential Revision: https://reviews.llvm.org/D87344
Reviewed By: lebedev.ri, reames

3 years ago[OpenMP][FIX] Verify compatible types for declare variant calls
Johannes Doerfert [Sun, 27 Sep 2020 20:52:52 +0000 (15:52 -0500)]
[OpenMP][FIX] Verify compatible types for declare variant calls

Especially for templates we need to check at some point if the base
function matches the specialization we might call instead. Previously, this
led to the replacement of `std::sqrt(int(2))` calls with one that
converts the argument to a `std::complex<int>`, clearly not the desired
behavior.
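
For reference, the standard overload set already gives the behavior the call site expects: with an integer argument, `std::sqrt` yields a `double`. A small compile-time check, independent of OpenMP and written only as an illustration (`sqrtOfTwo` is a made-up helper):

```cpp
#include <cmath>
#include <type_traits>

// The integer overload of std::sqrt converts its argument to double.
static_assert(std::is_same<decltype(std::sqrt(int(2))), double>::value,
              "std::sqrt(int) is expected to return double");

// Evaluates the call from the commit message using the standard overload.
inline double sqrtOfTwo() { return std::sqrt(int(2)); }
```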

Reported as PR47655

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D88384

3 years ago[MLIR][OpenMP] Removed the ambiguity in flush op assembly syntax
Kiran Kumar T P [Tue, 29 Sep 2020 04:11:46 +0000 (09:41 +0530)]
[MLIR][OpenMP] Removed the ambiguity in flush op assembly syntax

Summary:
========
Bugzilla Ticket No: Bug 46884 [https://bugs.llvm.org/show_bug.cgi?id=46884]

Flush op assembly syntax was ambiguous:

Consider the test case below:
The flush operation does not have any arguments,
but the token that starts the next statement, i.e. "%2", is read as an argument of the flush operation, and the translator then issues an error.
***************************************************************
$ cat -n flush.mlir
     1  llvm.func @_QQmain(%arg0: !llvm.i32) {
     2    %0 = llvm.mlir.constant(1 : i64) : !llvm.i64
     3    %1 = llvm.alloca %0 x !llvm.i32 {in_type = i32, name = "a"} : (!llvm.i64) -> !llvm.ptr<i32>
     4    omp.flush
     5    %2 = llvm.load %1 : !llvm.ptr<i32>
     6    llvm.return
     7  }

$ mlir-translate -mlir-to-llvmir flush.mlir
flush.mlir:5:6: error: expected ':'
  %2 = llvm.load %1 : !llvm.ptr<i32>
     ^
***************************************************************

Solution:
=========
Introduced a begin token ( `(` ) and an end token ( `)` ) to delimit the variadic arguments.

The patch includes code changes and testcase modifications.

Reviewed By: Valentin Clement, Mehdi AMINI

Differential Revision: https://reviews.llvm.org/D88376

3 years agoBPF: explicitly specify bpfel triple for certain tests
Yonghong Song [Tue, 29 Sep 2020 03:15:05 +0000 (20:15 -0700)]
BPF: explicitly specify bpfel triple for certain tests

Commit 54d9f743c8b0 ("BPF: move AbstractMemberAccess and
PreserveDIType passes to EP_EarlyAsPossible") changed most
of the CORE tests to run opt followed by llc, and opt requires
the target triple to be specified in the IR.

There are a few tests where little endian and big endian
report different results. For the little endian versions of these
tests, "target triple = "bpf"" will produce wrong results
if the test is executed on a big endian machine, e.g. a
PowerPC big endian machine, since target "bpf" represents
host endian and will resolve to "bpfeb".
The buildbot reported such failures when building and running
on a PowerPC big endian machine.

To fix the issue, use "target triple = "bpfel"" instead.
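
The host-dependent resolution described above can be modelled in a few lines. `BpfArch` and `resolveBpfArch` are hypothetical names for this sketch and do not correspond to LLVM's Triple API.

```cpp
// Models the three architecture names "bpf", "bpfel" and "bpfeb".
enum class BpfArch { Bpf, Bpfel, Bpfeb };

// A generic "bpf" request follows the host's endianness; explicit requests
// are returned unchanged.
constexpr BpfArch resolveBpfArch(BpfArch Requested, bool HostIsLittleEndian) {
  if (Requested != BpfArch::Bpf)
    return Requested;
  return HostIsLittleEndian ? BpfArch::Bpfel : BpfArch::Bpfeb;
}

// On a big endian host, "bpf" becomes "bpfeb" while "bpfel" stays "bpfel".
static_assert(resolveBpfArch(BpfArch::Bpf, false) == BpfArch::Bpfeb,
              "generic triple follows a big endian host");
static_assert(resolveBpfArch(BpfArch::Bpfel, false) == BpfArch::Bpfel,
              "explicit little endian triple is host independent");
```

Pinning the triple in the test IR removes the dependence on the machine that happens to run the test.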

3 years ago[HIP] Return non-zero value for invalid target ID
Yaxun (Sam) Liu [Sun, 27 Sep 2020 03:29:57 +0000 (23:29 -0400)]
[HIP] Return non-zero value for invalid target ID

This is part of https://reviews.llvm.org/D60620

3 years agoRecommit "[HIP] Change default --gpu-max-threads-per-block value to 1024"
Yaxun (Sam) Liu [Tue, 29 Sep 2020 02:39:21 +0000 (22:39 -0400)]
Recommit "[HIP] Change default --gpu-max-threads-per-block value to 1024"

Recommit 04abbb3a78186aa92809866b43217c32cba90b71

3 years ago[AArch64][GlobalISel] Scalarize <2 x s64> G_MUL since we don't have native support...
Amara Emerson [Mon, 28 Sep 2020 16:46:26 +0000 (09:46 -0700)]
[AArch64][GlobalISel] Scalarize <2 x s64> G_MUL since we don't have native support for it.

Differential Revision: https://reviews.llvm.org/D88437

3 years agoSkip -fPIE for AMDGPU and HIP toolchain
Yaxun (Sam) Liu [Mon, 28 Sep 2020 16:07:06 +0000 (12:07 -0400)]
Skip -fPIE for AMDGPU and HIP toolchain

The AMDGPU toolchain does not support -fPIE, so skip it if the driver specifies it.

Differential Revision: https://reviews.llvm.org/D88425

3 years ago[mlir][openacc] Add acc.data operation verifier
Valentin Clement [Tue, 29 Sep 2020 01:22:07 +0000 (21:22 -0400)]
[mlir][openacc] Add acc.data operation verifier

Add a basic verifier for the data operation following the restriction from the standard.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D88334

3 years ago[clangd] When finding refs for a renaming alias, do not return refs to underlying...
Nathan Ridge [Mon, 7 Sep 2020 06:28:46 +0000 (02:28 -0400)]
[clangd] When finding refs for a renaming alias, do not return refs to underlying decls

Fixes https://github.com/clangd/clangd/issues/515

Differential Revision: https://reviews.llvm.org/D87225

3 years agoRemove dependency from LLVM Dialect on the OpenMP dialect
Mehdi Amini [Mon, 28 Sep 2020 22:16:12 +0000 (22:16 +0000)]
Remove dependency from LLVM Dialect on the OpenMP dialect

The OmpDialect is in practice optional during translation to LLVM IR: the code tolerates
a "nullptr" when the dialect is not present / needed.

The dependency still exists on the export to LLVMIR.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D88351

3 years ago[gn build] Port 54d9f743c8b
LLVM GN Syncbot [Tue, 29 Sep 2020 00:24:06 +0000 (00:24 +0000)]
[gn build] Port 54d9f743c8b

3 years agoEnsure that we don't compute linkage for an anonymous class too early if
Richard Smith [Tue, 29 Sep 2020 00:21:42 +0000 (17:21 -0700)]
Ensure that we don't compute linkage for an anonymous class too early if
it has a member whose name is the same as a builtin.

Fixes a regression from the introduction of BuiltinAttr.

3 years ago[clang] Update warning-wall.c test
Jan Korous [Tue, 29 Sep 2020 00:19:31 +0000 (17:19 -0700)]
[clang] Update warning-wall.c test

Follow-up to 1e86d637eb4f:
[clang] Selectively ena/disa-ble format-insufficient-args warning

3 years ago[RegisterCoalescer] Pass Undefs to extendToIndices()
Ruiling Song [Tue, 15 Sep 2020 00:06:57 +0000 (08:06 +0800)]
[RegisterCoalescer] Pass Undefs to extendToIndices()

When extending the subranges, the reaching def may be an undef. When
extending such a subrange, it will try to search for the reaching
def first. If the reaching def is an undef and we did not provide 'Undefs',
findReachingDefs() will fail with the message:
"Use of $noreg does not have a corresponding definition on every path:
 LLVM ERROR: Use not jointly dominated by defs."
So we call computeSubRangeUndefs() and pass the result to extendToIndices().

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87744

3 years agoBuildVectorType with a dependent (array) type is crashing the compiler - Fix for...
Zahira Ammarguellat [Mon, 28 Sep 2020 23:54:40 +0000 (16:54 -0700)]
BuildVectorType with a dependent (array) type is crashing the compiler - Fix for PR-47542

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D88150

3 years agoBPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible
Yonghong Song [Thu, 3 Sep 2020 05:56:41 +0000 (22:56 -0700)]
BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible

Move the AbstractMemberAccess and PreserveDIType passes as early as
possible, right after clang code generation.

Currently, the compiler may transform the following code
  p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
  p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
  a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
  if (a) {
    p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
    p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
    bpf_probe_read(buf, buf_size, p2);
  }
to
  p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
  p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
  a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
  if (a) {
    bpf_probe_read(buf, buf_size, p2);
  }
and eventually assembly code looks like
  reloc_exist = 1;
  reloc_member_offset = 10; //calculate member offset from base
  p2 = base + reloc_member_offset;
  if (reloc_exist) {
    bpf_probe_read(buf, buf_size, p2);
  }
If, during libbpf relocation resolution, reloc_exist is actually
resolved to 0 (not exist), the reloc_member_offset relocation cannot
be resolved and will be patched with an illegal instruction.
This will cause verifier failure.

This patch attempts to address this issue by doing chaining
analysis and replacing chains with special globals right
after clang code gen. This removes the CSE possibility
described above. The IR typically looks like
  %6 = load @llvm.sk_buff:0:50$0:0:0:2:0
  %7 = bitcast %struct.sk_buff* %2 to i8*
  %8 = getelementptr i8, i8* %7, %6
for a particular address computation relocation.

But this transformation has another consequence, code sinking
may happen like below:
  PHI = <possibly different @preserve_*_access_globals>
  %7 = bitcast %struct.sk_buff* %2 to i8*
  %8 = getelementptr i8, i8* %7, %6

For such cases, we will not be able to generate relocations since
multiple relocations are merged into one.

This patch introduces a passthrough builtin
to prevent such optimization. Inline assembly appears to have more
impact on optimization, e.g., inlining, whereas a passthrough builtin has
less impact on optimizations.

A new IR pass is introduced at the beginning of target-dependent
IR optimization, which does:
  - report fatal error if any reloc global in PHI nodes
  - remove all bpf passthrough builtin functions

Changes for existing CORE tests:
  - for clang tests, add "-Xclang -disable-llvm-passes" flags to
    avoid builtin->reloc_global transformation so the test is still
    able to check correctness for clang generated IR.
  - for llvm CodeGen/BPF tests, add "opt -O2 <ir_file> | llvm-dis" command
    before "llc" command since "opt" is needed to call newly-placed
    builtin->reloc_global transformation. Add target triple in the IR
    file since "opt" requires it.
  - Since target triple is added in IR file, if a test may produce
    different results for different endianness, two tests will be
    created, one for bpfeb and another for bpfel, e.g., some tests
    for relocation of lshift/rshift of bitfields.
  - field-reloc-bitfield-1.ll has different relocations compared to
    the old code. This is because for the structure in the test,
    the new code returns struct layout alignment 4 while the old code
    returned 8. Align 8 is more precise and permits a double load. With align 4,
    the new mechanism uses a 4-byte load, thus generating different
    relocations.
  - test intrinsic-transforms.ll is removed. This was used to test
    cse on intrinsics so we do not lose metadata. Now that metadata is attached
    to the global and not the instruction, it won't get lost with cse.

Differential Revision: https://reviews.llvm.org/D87153

3 years ago[clang][driver][AIX] Set compiler-rt as default rtlib
David Tenty [Thu, 3 Sep 2020 22:34:57 +0000 (18:34 -0400)]
[clang][driver][AIX] Set compiler-rt as default rtlib

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D88182

3 years agoAttempt to clear some msan errors in the libcxx atomic tests.
ogiroux [Mon, 28 Sep 2020 23:34:41 +0000 (16:34 -0700)]
Attempt to clear some msan errors in the libcxx atomic tests.

3 years ago[mlir][Affine][VectorOps] Fix super vectorizer utility (D85869)
Diego Caballero [Mon, 28 Sep 2020 23:15:13 +0000 (16:15 -0700)]
[mlir][Affine][VectorOps] Fix super vectorizer utility (D85869)

Adding missing code that should have been part of "D85869: Utility to
vectorize loop nest using strategy."

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D88346

3 years ago[scudo][standalone] Remove unused atomic_compare_exchange_weak
Kostya Kortchinsky [Mon, 28 Sep 2020 20:07:33 +0000 (13:07 -0700)]
[scudo][standalone] Remove unused atomic_compare_exchange_weak

`atomic_compare_exchange_weak` is unused in Scudo, and its associated
test is actually wrong since the weak variant is allowed to fail
spuriously (thanks Roland).

This led to flakes such as:
```
[ RUN      ] ScudoAtomicTest.AtomicCompareExchangeTest
../../zircon/third_party/scudo/src/tests/atomic_test.cpp:98: Failure: Expected atomic_compare_exchange_weak(reinterpret_cast<T *>(&V), &OldVal, NewVal, memory_order_relaxed) is true.
    Expected: true
    Which is: 01
    Actual  : atomic_compare_exchange_weak(reinterpret_cast<T *>(&V), &OldVal, NewVal, memory_order_relaxed)
    Which is: 00
../../zircon/third_party/scudo/src/tests/atomic_test.cpp:100: Failure: Expected atomic_compare_exchange_weak( reinterpret_cast<T *>(&V), &OldVal, NewVal, memory_order_relaxed) is false.
    Expected: false
    Which is: 00
    Actual  : atomic_compare_exchange_weak( reinterpret_cast<T *>(&V), &OldVal, NewVal, memory_order_relaxed)
    Which is: 01
../../zircon/third_party/scudo/src/tests/atomic_test.cpp:101: Failure: Expected OldVal == NewVal.
    Expected: NewVal
    Which is: 24
    Actual  : OldVal
    Which is: 42
[  FAILED  ] ScudoAtomicTest.AtomicCompareExchangeTest (0 ms)
[----------] 2 tests from ScudoAtomicTest (1 ms total)
```

So I am removing this. If someone ever needs the weak variant, feel
free to add it back with a test that is not as terrible. This test was
initially ported from sanitizer_common, but their weak version calls
the strong version, so it works for them.
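
For illustration only, here is a minimal sketch of how a caller can tolerate
the spurious failures described above. It uses `std::atomic` as a stand-in
for Scudo's own atomic helpers, and `compareExchangeRetry` is a hypothetical
name, not something in the Scudo tree.

```cpp
#include <atomic>

// Hypothetical helper: atomically replace V with NewVal if V equals Expected.
// compare_exchange_weak may fail even when the values match, so a single
// call cannot be asserted to succeed; retry until the outcome is decisive.
bool compareExchangeRetry(std::atomic<int> &V, int Expected, int NewVal) {
  int Observed = Expected;
  while (!V.compare_exchange_weak(Observed, NewVal,
                                  std::memory_order_relaxed)) {
    // On failure, Observed holds the value actually read from V.
    if (Observed != Expected)
      return false; // Genuine mismatch: give up.
    // Otherwise the failure was spurious: loop and try again.
  }
  return true;
}
```

A test written against a helper like this checks the final outcome rather
than the result of one weak exchange.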

Differential Revision: https://reviews.llvm.org/D88443

3 years ago[clang] Selectively ena/disa-ble format-insufficient-args warning
Jan Korous [Wed, 16 Sep 2020 16:52:51 +0000 (09:52 -0700)]
[clang] Selectively ena/disa-ble format-insufficient-args warning

Differential Revision: https://reviews.llvm.org/D87176

3 years agoGuard `find_library(tensorflow_c_api ...)` by checking for TENSORFLOW_C_LIB_PATH...
Mehdi Amini [Mon, 28 Sep 2020 20:46:22 +0000 (20:46 +0000)]
Guard `find_library(tensorflow_c_api ...)` by checking for TENSORFLOW_C_LIB_PATH to be set by the user

Also have CMake fail if the user provides a TENSORFLOW_C_LIB_PATH but
we can't find TensorFlow at this path.

At the moment the CMake script tries to figure out whether TensorFlow is
available on the system and enables support for it. It is in general
not desirable to customize build features this way; instead it is
preferable to let the user opt in explicitly to the features they want
to enable. This is in line with other optional external dependencies
like Z3.
There are a few reasons for this, among others:
- reproducibility: making features "magically" enabled based on whether
  we find a package on the system or not makes it harder to handle bug
  reports from users.
- user control: they can't have TensorFlow on the system and build LLVM
  without TensorFlow right now. They also would suddenly distribute LLVM
  with a different set of features unknowingly just because their build
  machine environment would change subtly.

Right now this is motivated by a user reporting build failures on their system:

.../mesa-git/llvm-git/src/llvm-project/llvm/lib/Analysis/TFUtils.cpp:23:10: fatal error: tensorflow/c/c_api.h: No such file or directory
   23 | #include "tensorflow/c/c_api.h"
      |          ^~~~~~

It looks like we detected TensorFlow at configure time but couldn't set all the paths correctly.

Differential Revision: https://reviews.llvm.org/D88371

3 years ago[CVP] Allow two transforms in one invocation
Philip Reames [Mon, 28 Sep 2020 22:08:25 +0000 (15:08 -0700)]
[CVP] Allow two transforms in one invocation

For a call site which had both constant deopt operands and nonnull arguments, we were missing the opportunity to recognize the latter by bailing early.

This is somewhat of a speculative fix.  Months ago, I'd had a private report of performance and compile time regressions from the deopt operand folding.  I never received a test case.  However, the only possibility I see was that after that change CVP missed the nonnull fold, and we end up with a pass ordering/missed simplification issue.  So, since it's a real issue, fix it and hope.

3 years ago[EHStreamer] Simplify sharedTypeIDs with std::mismatch
Fangrui Song [Mon, 28 Sep 2020 22:05:09 +0000 (15:05 -0700)]
[EHStreamer] Simplify sharedTypeIDs with std::mismatch

(Note that EHStreamer.cpp is largely undertested. The only test checking the prefix sharing is CodeGen/WebAssembly/eh-lsda.ll)
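
As a rough sketch of the technique named in the title, the length of the
shared prefix of two type-ID lists can be computed with `std::mismatch`.
`sharedPrefixLength` and the `std::vector<unsigned>` parameter type are
assumptions made for this example; the actual EHStreamer code may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: number of leading elements that A and B have in common.
std::size_t sharedPrefixLength(const std::vector<unsigned> &A,
                               const std::vector<unsigned> &B) {
  // The four-iterator overload (C++14) stops at the end of the shorter range,
  // so neither input needs to be trimmed beforehand.
  auto Its = std::mismatch(A.begin(), A.end(), B.begin(), B.end());
  return static_cast<std::size_t>(std::distance(A.begin(), Its.first));
}
```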

3 years ago[mlir][shape] Make conversion passes more consistent.
Sean Silva [Thu, 24 Sep 2020 20:03:30 +0000 (13:03 -0700)]
[mlir][shape] Make conversion passes more consistent.

- use select-ops to make the lowering simpler
- change style of FileCheck variables names to be consistent
- change some variable names in the code to be more explicit

Differential Revision: https://reviews.llvm.org/D88258

3 years ago[libcxx] Don't pass -s to libtool
Petr Hosek [Mon, 28 Sep 2020 21:18:55 +0000 (14:18 -0700)]
[libcxx] Don't pass -s to libtool

This flag is the default in libtool on Darwin, and it's not supported
by llvm-libtool-darwin, causing a build failure.

Differential Revision: https://reviews.llvm.org/D88449

3 years ago[libc++] Fix constexpr dynamic allocation on GCC 10
Louis Dionne [Mon, 28 Sep 2020 21:29:52 +0000 (17:29 -0400)]
[libc++] Fix constexpr dynamic allocation on GCC 10

We're technically not allowed by the Standard to call ::operator new in
constexpr functions like __libcpp_allocate. Clang doesn't seem to complain
about it, but GCC does.

3 years ago[X86] Add support for calling SimplifyDemandedBits on the input of PDEP with a consta...
Craig Topper [Mon, 28 Sep 2020 21:20:20 +0000 (14:20 -0700)]
[X86] Add support for calling SimplifyDemandedBits on the input of PDEP with a constant mask.

We can do several optimizations for PDEP using computeKnownBits and SimplifyDemandedBits:

- If the MSBs of the output aren't demanded, those MSBs of the mask input aren't demanded either. We need to keep the most significant demanded bit of the mask and any mask bits before it.
- The number of possible ones in the mask determines how many bits of the LSBs of the other operand are demanded. Any bits of the mask we don't demand by the previous rule should not be counted.
- The result will have zeros in any position that the mask is zero.
- Since non-mask input bits can only be output in the original position or a higher bit position, the result will have at least as many trailing zeroes as the non-mask input.
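
The properties listed above can be checked against a small software model of
PDEP. This is only an illustrative sketch: `pdep32` is a hypothetical
reference function written for this example, not the X86 backend code or the
hardware intrinsic.

```cpp
#include <cstdint>

// Hypothetical reference model of 32-bit PDEP: the low bits of Src are
// deposited, in order, at the positions of the set bits of Mask.
std::uint32_t pdep32(std::uint32_t Src, std::uint32_t Mask) {
  std::uint32_t Result = 0;
  for (std::uint32_t Bit = 1; Mask != 0; Bit <<= 1) {
    std::uint32_t Lowest = Mask & (~Mask + 1); // Isolate lowest set mask bit.
    if (Src & Bit)
      Result |= Lowest; // Deposit the next source bit at that position.
    Mask &= Mask - 1;   // Clear the mask bit that was just consumed.
  }
  return Result;
}
```

With a mask of 0x1A (three set bits), only the low three bits of the source
affect the result, and every result bit outside the mask is zero.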

Differential Revision: https://reviews.llvm.org/D87883

3 years ago[X86] Add tests for D87883. NFC
Craig Topper [Mon, 28 Sep 2020 21:14:14 +0000 (14:14 -0700)]
[X86] Add tests for D87883. NFC

3 years ago[GlobalISel] Add support for lowering of vector G_SELECT and use for AArch64.
Amara Emerson [Sat, 26 Sep 2020 17:02:39 +0000 (10:02 -0700)]
[GlobalISel] Add support for lowering of vector G_SELECT and use for AArch64.

The lowering is a port of the SDAG expansion.

Differential Revision: https://reviews.llvm.org/D88364

3 years ago[CMake][AIX] Limit tools in external project build
David Tenty [Thu, 20 Aug 2020 22:24:11 +0000 (18:24 -0400)]
[CMake][AIX] Limit tools in external project build

This is a follow on to D85329 which disabled some llvm tools in the
runtimes build due to XCOFF64 limitations. This change disables them
in other external project builds as well, when no list of tools is
specified in the arguments.

Reviewed By: hubert.reinterpretcast, stevewan

Differential Revision: https://reviews.llvm.org/D88310

3 years ago[gn build] Re-run CompletionModelCodegen when input json files change
Nico Weber [Mon, 28 Sep 2020 20:57:48 +0000 (16:57 -0400)]
[gn build] Re-run CompletionModelCodegen when input json files change

3 years agoFix a think-o with the numerical suffixes in the docs for init_priority.
Aaron Ballman [Mon, 28 Sep 2020 20:49:15 +0000 (16:49 -0400)]
Fix a think-o with the numerical suffixes in the docs for init_priority.

3 years ago[lldb] Add print_function import
Jonas Devlieghere [Mon, 28 Sep 2020 20:50:22 +0000 (13:50 -0700)]
[lldb] Add print_function import

3 years agoRevert "Revert "[AArch64][GlobalISel] Add selection support for <8 x s16> G_INSERT_V...
Amara Emerson [Mon, 28 Sep 2020 20:42:56 +0000 (13:42 -0700)]
Revert "Revert "[AArch64][GlobalISel] Add selection support for <8 x s16>  G_INSERT_VECTOR_ELT with GPR scalar.""

This isn't a real issue with the codegen, it's a previously known bug in clang which
causes non-deterministic failures due to garbage bits in undef registers being
used in saturating instructions.

I'm disabling the result checking for the test until this issue is resolved.

This reverts commit 6c8168324b5329c94fe7e8f9a1619802091b9bec.

3 years ago[mlir] [VectorOps] Relaxed restrictions on vector.reduction types even more
Aart Bik [Mon, 28 Sep 2020 19:56:10 +0000 (12:56 -0700)]
[mlir] [VectorOps] Relaxed restrictions on vector.reduction types even more

Recently, restrictions on vector reductions were made more relaxed by
accepting any width signless integer and floating-point. This CL relaxes
the restriction even more by including unsigned and signed integers.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D88442

3 years ago[X86] Use inlineasm flag output for the _bittest* intrinsics.
Craig Topper [Mon, 28 Sep 2020 19:32:34 +0000 (12:32 -0700)]
[X86] Use inlineasm flag output for the _bittest* intrinsics.

Instead of explicitly emitting a setc in the inline asm instructions,
we can use flag output. This allows the backend to use the flag
directly if it is needed by a branch. Previously we needed a test
instruction to convert the register back to a flag.

If the flag can't be used directly, the backend will emit a setcc.

Differential Revision: https://reviews.llvm.org/D87888

3 years ago[InstCombine] Regenerate cast tests. NFC.
Simon Pilgrim [Mon, 28 Sep 2020 20:31:55 +0000 (21:31 +0100)]
[InstCombine] Regenerate cast tests. NFC.

3 years ago[COFF] Aliases resolve directly to defined external targets
Eric Astor [Mon, 28 Sep 2020 20:11:44 +0000 (16:11 -0400)]
[COFF] Aliases resolve directly to defined external targets

Avoid introducing unnecessary indirection for weak-external references.

We only need to introduce ".weak.<SYMBOL>.default" when referencing a
symbol that is defined, but not external.

Reviewed By: mstorsjo

Differential Revision: https://reviews.llvm.org/D88305

3 years ago[libc++] Replace uses of __libcpp_allocate by std::allocator<>
Louis Dionne [Mon, 28 Sep 2020 19:47:49 +0000 (15:47 -0400)]
[libc++] Replace uses of __libcpp_allocate by std::allocator<>

Both are equivalent, however std::allocator can appear in constant
expressions and is higher level.

3 years ago[libc++] Add UNSUPPORTED markup to atomic test in single-threaded mode
Louis Dionne [Mon, 28 Sep 2020 18:45:48 +0000 (14:45 -0400)]
[libc++] Add UNSUPPORTED markup to atomic test in single-threaded mode

3 years ago[libc++] Fix heap UaF issue in coroutine test
Louis Dionne [Mon, 28 Sep 2020 18:28:45 +0000 (14:28 -0400)]
[libc++] Fix heap UaF issue in coroutine test

This wasn't being flagged by older versions of ASAN, but it is now.

3 years ago[wasm] Move WasmTraits.h to BinaryFormat
Benjamin Kramer [Mon, 28 Sep 2020 20:06:34 +0000 (22:06 +0200)]
[wasm] Move WasmTraits.h to BinaryFormat

There's no dependency on Object in there and this avoids a cyclic
dependency between libMC and libObject.

3 years ago[CostModel] remove hack for intrinsic cost based on cost type
Sanjay Patel [Mon, 28 Sep 2020 19:54:11 +0000 (15:54 -0400)]
[CostModel] remove hack for intrinsic cost based on cost type

This hack seems to only have been necessary because of the
constructor bug noted in 33125cffd.

Once again, it's hard to prove NFC, but that's the hope...

3 years agoOnce we've found a firmware binary and loaded it, don't search more
Jason Molenda [Mon, 28 Sep 2020 19:42:16 +0000 (12:42 -0700)]
Once we've found a firmware binary and loaded it, don't search more

Add the flag in ProcessMachCore::DoLoadCore that stops additional
searches for the binaries when we have an LC_NOTE identifying the
firmware/standalone binary as the correct one & we have loaded it
successfully.

3 years ago[lldb] Enable markdown support for documentation
Jonas Devlieghere [Mon, 28 Sep 2020 19:48:22 +0000 (12:48 -0700)]
[lldb] Enable markdown support for documentation

This enables support for writing LLDB documentation in markdown in
addition to reStructured text. We already had documentation written in
markdown (StructuredDataPlugins and DarwinLog) which will now also be
available on the website.

3 years ago[PowerPC] Legalize v256i1 and v512i1 and implement load and store of these types
Baptiste Saleil [Mon, 28 Sep 2020 19:12:14 +0000 (14:12 -0500)]
[PowerPC] Legalize v256i1 and v512i1 and implement load and store of these types

This patch legalizes the v256i1 and v512i1 types that will be used for MMA.

It implements loads and stores of these types.
v256i1 is a pair of VSX registers, so for this type, we load/store the two
underlying registers. v512i1 is used for MMA accumulators. So in addition to
loading and storing the 4 associated VSX registers, we generate instructions to
prime (copy the VSX registers to the accumulator) after loading and unprime
(copy the accumulator back to the VSX registers) before storing.

This patch also adds the UACC register class that is necessary to implement the
loads and stores. This class represents accumulators in their unprimed form and
allows the distinction between primed and unprimed accumulators to avoid invalid
copies of the VSX registers associated with primed accumulators.

Differential Revision: https://reviews.llvm.org/D84968