review.tizen.org Git - platform/upstream/llvm.git/log


Zarko Todorovski [Mon, 10 May 2021 12:06:28 +0000 (08:06 -0400)]

[PowerPC] Enable safe for 32bit vins* P10 instructions

Correctly emit `vins` instructions that are safe in 32-bit mode.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D101383


Alexey Bataev [Thu, 6 May 2021 20:44:03 +0000 (13:44 -0700)]

[SLP]Do not count perfect diamond matches for gathers several times.

Need to remove the old code for avoiding double counting of the gather
nodes with perfect diamond matches within the tree after we started
detecting perfect/shuffled matching in the previous patch D100495. We
may skip the cost for such nodes completely.

Differential Revision: https://reviews.llvm.org/D102023


jasonliu [Mon, 10 May 2021 13:45:36 +0000 (13:45 +0000)]

[libc++][AIX] Define _LIBCPP_ELAST

The aim is to define _LIBCPP_ELAST for AIX since strerror/strerror_r
can't handle out-of-range errno values.

Differential Revision: https://reviews.llvm.org/D100986


Bradley Smith [Thu, 6 May 2021 11:19:38 +0000 (12:19 +0100)]

[AArch64][SVE] Improve SVE codegen for fixed length BITCAST

Expanding a fixed length operation involves wrapping the operation in an
insert/extract subvector pair. As such, when this is done to a bitcast we
end up with an extract_subvector of a bitcast. DAGCombine tries to
convert this into a bitcast of an extract_subvector which restores the
initial fixed length bitcast, causing an infinite loop of legalization.

As part of this patch, we must make sure the above DAGCombine does not
trigger after legalization if the created bitcast would not be legal.

Differential Revision: https://reviews.llvm.org/D101990


Alexey Bataev [Wed, 5 May 2021 15:01:58 +0000 (08:01 -0700)]

[OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

Follow the more general patch for now, do not try to SPMDize the kernel
if the variable is used and local.

Differential Revision: https://reviews.llvm.org/D101911


qixingxue [Mon, 10 May 2021 06:32:21 +0000 (14:32 +0800)]

[TableGen] Remove redundant `Error:` in msg (NFC)

Since calling `PrintFatalError` automatically adds an `error: `
prefix to the printed message, there is no need for an extra
`ERROR:` prefix in the argument passed.

Differential Revision: https://reviews.llvm.org/D102151
Reviewed By: Paul-C-Anagnostopoulos


Simon Pilgrim [Mon, 10 May 2021 13:00:19 +0000 (14:00 +0100)]

X86FlagsCopyLowering.cpp - try to pass DebugLoc by const-ref to avoid costly TrackingMDNodeRef copies. NFCI.


Simon Pilgrim [Mon, 10 May 2021 12:58:05 +0000 (13:58 +0100)]

X86LoadValueInjectionLoadHardening.cpp - use const-reference in for-range loops to avoid unnecessary copies. NFCI.
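
The effect of iterating by const reference can be seen in a small standalone sketch. `Tracked`, `sumByValue` and `sumByConstRef` are hypothetical names chosen for illustration; they do not come from the LLVM sources.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type that counts how often it is copy-constructed.
struct Tracked {
  static int copies;
  int value;
  explicit Tracked(int v) : value(v) {}
  Tracked(const Tracked &other) : value(other.value) { ++copies; }
};
int Tracked::copies = 0;

// Iterating by value copy-constructs every element.
inline int sumByValue(const std::vector<Tracked> &items) {
  int sum = 0;
  for (Tracked item : items)
    sum += item.value;
  return sum;
}

// Iterating by const reference reads the elements in place.
inline int sumByConstRef(const std::vector<Tracked> &items) {
  int sum = 0;
  for (const Tracked &item : items)
    sum += item.value;
  return sum;
}
```

Both functions return the same sum; only the by-value loop increments `Tracked::copies`, once per element.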


Fraser Cormack [Fri, 7 May 2021 16:41:27 +0000 (17:41 +0100)]

[Constant] Allow ConstantAggregateZero a scalable element count

A ConstantAggregateZero may be created from a scalable vector type.
However, it still assumed a fixed number of elements when queried for
them. This patch changes ConstantAggregateZero to correctly report its
element count.

This change fixes a couple of issues. Firstly, it fixes a crash in
Constant::getUniqueValue when called on a scalable-vector
zeroinitializer constant.

Secondly, it fixes a latent bug in GlobalISel's IRTranslator in which
translating a scalable-vector zeroinitializer would hit the assertion in
ConstantAggregateZero::getNumElements when casting to a FixedVectorType,
rather than reporting an error more gracefully. This is currently
hypothetical as the IRTranslator has deeper issues preventing the use of
scalable vector types.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D102082


Christian Kandeler [Mon, 10 May 2021 12:56:55 +0000 (14:56 +0200)]

[clangd] Fix data type of WorkDoneProgressReport::percentage

According to the specification, this should be an unsigned integer.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D101616


Djordje Todorovic [Mon, 10 May 2021 12:08:33 +0000 (05:08 -0700)]

[NFC][llvm-dwarfdump] Code clean up for inlined var loc stats

This is preparation for https://reviews.llvm.org/D101025.
D101025 will start calculating var locstats for concrete fns
that refer to an abstract origin as well.


Nico Weber [Mon, 10 May 2021 12:47:28 +0000 (08:47 -0400)]

clang: Fix tests after 7f78e409d028 if clang is not called clang-13

We might release a new version at some point after all.
In fact, use the same pattern the other CHECK lines in this test
use, for consistency.


Bradley Smith [Fri, 30 Apr 2021 15:21:50 +0000 (16:21 +0100)]

[AArch64][SVE] Better utilisation of unpredicated forms of remaining intrinsics

When using predicated intrinsics, if the predicate used is all lanes active,
use an unpredicated form of the instruction. Additionally, this allows for
better use of immediate forms.

This only includes instructions where the unpredicated/predicated forms
matched in such a way that instruction selection would not introduce extra
ptrue instructions. This allows us to convert the intrinsics directly to
architecture independent ISD nodes.

Depends on D101062

Differential Revision: https://reviews.llvm.org/D101828


Bradley Smith [Fri, 30 Apr 2021 15:17:37 +0000 (16:17 +0100)]

[AArch64][SVE] Better utilisation of unpredicated forms of arithmetic intrinsics

When using predicated arithmetic intrinsics, if the predicate used is all
lanes active, use an unpredicated form of the instruction. Additionally,
this allows for better use of immediate forms.

This also includes a new complex isel pattern which allows matching an
all active predicate when the types are different but the predicate is a
superset of the type being used. For example, to allow a b8 ptrue for a
b32 predicate operand.

This only includes instructions where the unpredicated/predicated forms
are mismatched between variants, meaning that the removal of the
predicate is done during instruction selection in order to prevent
spurious re-introductions of ptrue instructions.

Co-authored-by: Paul Walker <paul.walker@arm.com>
Differential Revision: https://reviews.llvm.org/D101062


Momchil Velikov [Mon, 10 May 2021 10:19:13 +0000 (11:19 +0100)]

[GlobalISel] Fix wrong invocation of `getParamStackAlign` (NFC)

The function template `CallLowering::setArgFlags` is invoked both
for arguments and return values. In the latter case, it calls
`getParamStackAlign` with argument index `~0u`. Nothing goes wrong
at present, as the argument is safely incremented back to 0
inside `getParamStackAlign` (the type is `unsigned`), but in
principle this is fragile and may become incorrect.
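
The wrap-around that this relies on can be checked with a short sketch. `shiftIndex` is a hypothetical stand-in for an index translation that adds one; it is not the LLVM function, and the offset of one is assumed from the description above.

```cpp
#include <limits>

// Hypothetical stand-in for an index translation that adds one to an
// argument number. The name and the offset are assumptions for illustration.
constexpr unsigned shiftIndex(unsigned ArgNo) { return ArgNo + 1u; }

// Unsigned arithmetic is modular, so the all-ones value wraps to zero.
static_assert(~0u == std::numeric_limits<unsigned>::max(), "all bits set");
static_assert(shiftIndex(~0u) == 0u, "all-ones index wraps around to 0");
static_assert(shiftIndex(0u) == 1u, "ordinary indices shift by one");
```

The result is well defined only because the type is unsigned; the same pattern with a signed index would be undefined behaviour on overflow, which is one reason to call it fragile.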

Differential Revision: https://reviews.llvm.org/D102004


Sander de Smalen [Mon, 10 May 2021 10:27:38 +0000 (11:27 +0100)]

[AArch64][SVE] Fix isel failure for FP-extending loads

DAGCombiner tries to combine a (fpext (load)) to (fround (extload))
but SVE has no FP-extending loads. By marking these as expand,
the combine no longer happens.

This also fixes a similar issue for fptrunc, where the source type
is not a legal type.

Reviewed By: bsmith, kmclaughlin

Differential Revision: https://reviews.llvm.org/D102053


Simon Pilgrim [Mon, 10 May 2021 09:49:08 +0000 (10:49 +0100)]

HexagonVectorCombine.cpp - don't negate a bool value. NFCI.

Silences MSVC warning.


Kadir Cetinkaya [Fri, 7 May 2021 13:22:29 +0000 (15:22 +0200)]

[clang][PreProcessor] Cutoff parsing after hitting completion point

This fixes a crash caused by Lexers being invalidated at code
completion points in
https://github.com/llvm/llvm-project/blob/main/clang/lib/Lex/PPLexerChange.cpp#L520.

Differential Revision: https://reviews.llvm.org/D102069


Mats Petersson [Mon, 10 May 2021 08:54:41 +0000 (08:54 +0000)]

[OpenMP][MLIR]Add support for guided, auto and runtime scheduling

When using parallel loop construct, the OpenMP specification allows for
guided, auto and runtime as scheduling variants (as well as static and
dynamic which are already supported).

This adds the translation from MLIR to LLVM-IR for these scheduling
variants.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101435


Julian Gross [Mon, 3 May 2021 14:59:59 +0000 (16:59 +0200)]

Fixed bug in buffer deallocation pass using unranked memref types.

In the buffer deallocation pass, unranked memref types are not properly supported.
After investigating this issue, it turns out that the Clone and Dealloc operations
do not support unranked memref types in the current implementation.
This patch adds the missing feature and enables the transformation of any memref
type.

This patch solves this bug: https://bugs.llvm.org/show_bug.cgi?id=48385

Differential Revision: https://reviews.llvm.org/D101760


David Spickett [Wed, 5 May 2021 10:49:35 +0000 (11:49 +0100)]

[compiler-rt] Handle None value when polling addr2line pipe

According to:
https://docs.python.org/3/library/subprocess.html#subprocess.Popen.poll

poll can return None if the process hasn't terminated.

I'm not quite sure how addr2line could end up closing the pipe without
terminating but we did see this happen on one of our bots:
```
<...>scripts/asan_symbolize.py",
line 211, in symbolize
logging.debug("addr2line exited early (broken pipe), returncode=%d"
% self.pipe.poll())
TypeError: %d format: a number is required, not NoneType
```

Handle None by printing a message that we couldn't get the return
code.

Reviewed By: delcypher

Differential Revision: https://reviews.llvm.org/D101891


Frederik Gossen [Mon, 10 May 2021 08:22:23 +0000 (10:22 +0200)]

[MLIR][Shape] Concretize broadcast result type if possible

As a canonicalization, infer the resulting shape rank if possible.

Differential Revision: https://reviews.llvm.org/D102068


Guillaume Chatelet [Mon, 10 May 2021 08:23:30 +0000 (08:23 +0000)]

[libc] Simplifies multi implementations and benchmarks

This is a follow up on D101524 which:
- simplifies cpu features detection and usage,
- flattens target dependent optimizations so it's obvious which implementations are generated,
- provides an implementation targeting the host (march/mtune=native) for the mem* functions,
- makes sure all implementations are unittested (provided the host can run them),
- makes sure all implementations are benchmarkable (provided the host can run them).

Differential Revision: https://reviews.llvm.org/D101895


Petar Avramovic [Mon, 10 May 2021 08:16:09 +0000 (10:16 +0200)]

AMDGPU/GlobalISel: Use destination register bank in applyMappingLoad

Large loads on targets that do not useFlatForGlobal have to be split
in regbankselect. This did not happen when the destination had a vgpr
bank and the address had an sgpr bank.
Instead of checking whether the address bank is sgpr, check the bank of the destination.

Differential Revision: https://reviews.llvm.org/D101992


Petar Avramovic [Fri, 7 May 2021 11:12:28 +0000 (13:12 +0200)]

AMDGPU/GlobalISel: Add regbankselect test for vgpr(dest) sgpr(address) load

Pre-commit for D101992.


Alex Zinenko [Mon, 10 May 2021 08:02:18 +0000 (10:02 +0200)]

[mlir] OpenMP-to-LLVM: properly set outer alloca insertion point

Previously, the OpenMP to LLVM IR conversion was setting the alloca insertion
point to the same position as the main computation when converting OpenMP
`parallel` operations. This is problematic if, for example, the `parallel`
operation is placed inside a loop and would keep allocating on stack on each
iteration leading to stack overflow.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D101307


Pushpinder Singh [Fri, 7 May 2021 11:37:07 +0000 (11:37 +0000)]

[AMDGPU][OpenMP] Emit textual IR for -emit-llvm -S

Previously clang would print a binary blob into the bundled file
for amdgcn. With this patch, it will instead print textual IR as
expected.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102065


Guillaume Chatelet [Mon, 10 May 2021 07:53:48 +0000 (07:53 +0000)]

[libc] Allow target architecture customization

This patch provides a way to specify the default target cpu optimizations to use when compiling llvm-libc.
This ensures we don't rely on current compiler's default and allows compiling and cross compiling for a particular target.

Differential Revision: https://reviews.llvm.org/D101991


Pushpinder Singh [Fri, 7 May 2021 08:15:49 +0000 (08:15 +0000)]

[AMDGPU][OpenMP] Disable tests when amdgpu-arch fails

This patch prevents runtime tests running on systems without amdgpu.

Reviewed By: protze.joachim, tianshilei1992

Differential Revision: https://reviews.llvm.org/D102054


Pushpinder Singh [Fri, 7 May 2021 11:56:46 +0000 (11:56 +0000)]

[amdgpu-arch] Guard hsa.h with __has_include

This patch is supposed to fix the issue of hsa.h not being found.
Issue was reported in D99949
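
A guard of this kind might look like the sketch below. The macro name `HAVE_HSA_HEADER` and the helper `hasHsaHeader` are hypothetical, and the include path is assumed from the commit title rather than taken from the patch.

```cpp
// Only probe for the header when the compiler supports __has_include.
#if defined(__has_include)
#  if __has_include("hsa.h")
#    include "hsa.h"
#    define HAVE_HSA_HEADER 1 // hypothetical macro name
#  endif
#endif

// Fall back to "not available" so later code can test a defined value.
#ifndef HAVE_HSA_HEADER
#  define HAVE_HSA_HEADER 0
#endif

// Hypothetical helper: lets ordinary code branch on header availability.
constexpr bool hasHsaHeader() { return HAVE_HSA_HEADER != 0; }
```

Nesting the `__has_include` test inside `defined(__has_include)` keeps the file compilable on preprocessors that do not know the operator.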

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102067


Fraser Cormack [Fri, 7 May 2021 10:20:21 +0000 (11:20 +0100)]

[LegalizeVectorOps][RISCV] Add scalable-vector SELECT expansion

This patch extends VectorLegalizer::ExpandSELECT to permit expansion
also for scalable vector types. The only real change is conditionally
checking for BUILD_VECTOR or SPLAT_VECTOR legality depending on the
vector type.

We can use this to fix "cannot select" errors for scalable vector
selects on the RISCV target. Note that in future patches RISCV will
possibly custom-lower vector SELECTs to VSELECTs for branchless codegen.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102063


Adrian Kuegel [Mon, 10 May 2021 05:48:45 +0000 (07:48 +0200)]

[mlir] Fix compile error.

Inside a templated function, other class members need to be called with
this->.
Otherwise we get: explicit qualification required to use member
'setDebugName' from dependent base class.
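
The rule can be reproduced in a few lines. The class names below are hypothetical and only mirror the shape of the error message; they are not the MLIR classes involved.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class template with the member named in the error.
template <typename T>
struct NamedBase {
  std::string debugName;
  void setDebugName(const std::string &name) { debugName = name; }
};

template <typename T>
struct Derived : NamedBase<T> {
  void init() {
    // An unqualified `setDebugName("derived");` is rejected by conforming
    // compilers: the base depends on T, so unqualified lookup skips it.
    this->setDebugName("derived"); // lookup is deferred to instantiation
  }
};

// Small helper that instantiates the template and returns the stored name.
inline std::string demoDebugName() {
  Derived<int> d;
  d.init();
  return d.debugName;
}
```

Qualifying the call as `NamedBase<T>::setDebugName(...)` would also work; `this->` is the shorter of the two spellings.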


Jun Ma [Fri, 30 Apr 2021 02:30:37 +0000 (10:30 +0800)]

[AArch64][SVE] Remove index_vector node.

Since index_vector is lowered into step_vector in D100816, we can just remove
index_vector and use step_vector for codegen directly.

Differential Revision: https://reviews.llvm.org/D101593


Lang Hames [Sun, 9 May 2021 18:20:54 +0000 (11:20 -0700)]

[ORC] Use the new dispatchTask API to run query callbacks.

Dispatching query callbacks, rather than running them on the current thread,
will allow them to be distributed across multiple threads.


Lang Hames [Sun, 9 May 2021 00:45:42 +0000 (17:45 -0700)]

[ORC] Generalize materialization dispatch to task dispatch.

Generalizing this API allows work to be distributed more evenly. In particular,
query callbacks can now be dispatched (rather than running immediately on the
thread that satisfied the query). This avoids the pathological case where an
operation on one thread satisfies many queries simultaneously, causing large
amounts of work to be run on that thread while other threads potentially sit
idle.
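
The difference between running a callback immediately and dispatching it can be sketched as follows. `TaskDispatcher` is a hypothetical class written for this illustration and is not the ORC API; `drain` stands in for whatever worker threads would later pick up the queued work.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

using Task = std::function<void()>;

// Hypothetical dispatcher: either runs tasks inline or defers them.
class TaskDispatcher {
public:
  explicit TaskDispatcher(bool inlineMode) : runInline(inlineMode) {}

  // Run the task on the calling thread, or queue it for later.
  void dispatch(Task task) {
    if (runInline)
      task();
    else
      pending.push_back(std::move(task));
  }

  // Stand-in for worker threads: run everything that was deferred and
  // report how many tasks were executed.
  std::size_t drain() {
    std::size_t ran = 0;
    while (!pending.empty()) {
      Task task = std::move(pending.front());
      pending.pop_front();
      task();
      ++ran;
    }
    return ran;
  }

private:
  bool runInline;
  std::deque<Task> pending;
};
```

With deferral, the thread that satisfies many queries only enqueues the callbacks; the work itself can then be shared out instead of piling up on that one thread.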


Teresa Johnson [Wed, 28 Apr 2021 22:20:04 +0000 (15:20 -0700)]

[SimplifyCFG] Ignore ephemeral values when counting insts for threading

Ignore ephemeral values (only feeding llvm.assume intrinsics) when
computing the instruction count to decide if a block is small enough for
threading. This is similar to the handling of these values in the
InlineCost computation. These instructions will eventually be removed
and shouldn't count against code size (similar to the existing ignoring
of phis).

Without this change, when enabling -fwhole-program-vtables, which causes
type test / assume sequences to be inserted by clang, we can get
different threading decisions. In particular, when building with
instrumentation FDO it can affect the optimization decisions before FDO
matching, leading to some mismatches.

Differential Revision: https://reviews.llvm.org/D101494


Yuanfang Chen [Mon, 10 May 2021 02:04:07 +0000 (19:04 -0700)]

[NFC][Coroutines] Fix two tests by removing hardcoded SSA value.


Zakk Chen [Wed, 5 May 2021 07:53:41 +0000 (15:53 +0800)]

[RISCV][NFC] Don't need to create a new STI in RISCVAsmPrinter.

RISCVAsmPrinter already has MCSubtargetInfo.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D101889


Chia-hung Duan [Fri, 16 Apr 2021 05:34:10 +0000 (13:34 +0800)]

Support NativeCodeCall binding in rewrite pattern.

We are able to bind the result of a native function while rewriting a
pattern. When matching a pattern, if we want to get some values back, we can
do so by passing a parameter as a return value placeholder. In addition, add
the semantics of '$_self' to NativeCodeCall while matching: it will be the
operation that defines a certain operand.

Differential Revision: https://reviews.llvm.org/D100746


Jez Ng [Mon, 10 May 2021 01:11:29 +0000 (21:11 -0400)]

[lld-macho] Add llvm-otool as a test dependency

This unbreaks my local build, which is configured to build only parts of
LLVM.


Nico Weber [Sun, 9 May 2021 22:35:16 +0000 (18:35 -0400)]

[lld/mac] Fix alignment on subsections

On a section with alignment of 16, subsections aligned to 16-byte
boundaries should keep their 16-byte alignment.

Fixes PR50274. (The same bug could have happened with -order_file
previously.)
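
One way to reason about the expected result is sketched below. This is not lld's code: `subsectionAlign` and `lowestSetBit` are hypothetical helpers, and the rule they encode (the section alignment capped by the largest power of two dividing the offset) is an assumption about how such an alignment can be derived.

```cpp
#include <cstdint>

// Largest power of two that divides v (v must be non-zero).
constexpr std::uint32_t lowestSetBit(std::uint32_t v) {
  return v & (~v + 1u);
}

// Hypothetical helper: alignment that a subsection starting at `offset`
// inherits from a section aligned to `sectionAlign` (a power of two).
constexpr std::uint32_t subsectionAlign(std::uint32_t sectionAlign,
                                        std::uint32_t offset) {
  return offset == 0 ? sectionAlign
         : lowestSetBit(offset) < sectionAlign ? lowestSetBit(offset)
                                               : sectionAlign;
}

static_assert(subsectionAlign(16, 32) == 16, "16-byte boundary keeps 16");
static_assert(subsectionAlign(16, 8) == 8, "8-byte offset lowers it to 8");
static_assert(subsectionAlign(16, 0) == 16, "offset 0 keeps section alignment");
```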

Differential Revision: https://reviews.llvm.org/D102139


Jez Ng [Mon, 10 May 2021 00:05:45 +0000 (20:05 -0400)]

[lld-macho] Don't reference entry symbol for non-executables

This would cause us to pull in symbols (and code) that should
be unused.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D102137


Tomasz Miąsko [Sun, 9 May 2021 20:38:13 +0000 (13:38 -0700)]

[Demangle][Rust] Print special namespaces

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D101821


Roman Lebedev [Sun, 9 May 2021 20:45:44 +0000 (23:45 +0300)]

[X86] AMD Zen 3: same-reg CMP is a zero-cycle dependency-breaking instruction

As measured by exegesis, and confirmed by ref docs.


Roman Lebedev [Sun, 9 May 2021 20:28:08 +0000 (23:28 +0300)]

[NFC][X86][MCA] AMD Zen 3: add tests for CMP dependency breaking


Roman Lebedev [Sun, 9 May 2021 20:14:17 +0000 (23:14 +0300)]

[X86] AMD Zen 3: same-reg SBB is a dependency-breaking instruction

As confirmed by exegesis measurements, and ref docs.
It does actually execute.

While there, bump latency for MULX32rr, that seems to match measurements.


Roman Lebedev [Sun, 9 May 2021 20:14:12 +0000 (23:14 +0300)]

[NFC][X86][MCA] AMD Zen 3: add tests for SBB dependency breaking


Roman Lebedev [Sun, 9 May 2021 19:43:30 +0000 (22:43 +0300)]

[X86] AMD Zen 3: same-register XOR/SUB are GPR dependency breaking zero-idioms

As measured by exegesis and confirmed in reference docs.


Roman Lebedev [Sun, 9 May 2021 19:27:16 +0000 (22:27 +0300)]

[NFC][X86][MCA] AMD Zen3: add GPR zero-idiom dependency breaking tests


David Green [Sun, 9 May 2021 20:57:55 +0000 (21:57 +0100)]

[ARM] Fix postinc of vst1xN

These nodes are not handled correctly by CombineBaseUpdate. For the
moment, similar to 5f1cad4d296a20025f0b mark them as unsupported.


Nikita Popov [Sat, 1 May 2021 14:59:06 +0000 (16:59 +0200)]

[SCEV] Handle and/or in applyLoopGuards()

applyLoopGuards() already combines conditions from multiple nested
guards. However, it cannot use multiple conditions on the same guard,
combined using and/or. Add support for this by recursing into either
`and` or `or`, depending on the direction of the branch.
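
The recursion described above can be sketched on a toy condition tree. `Cond`, `Fact` and `collectFacts` are hypothetical and unrelated to SCEV's data structures; the sketch only shows which operands become known for each branch direction.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical condition tree; not SCEV's or LLVM IR's representation.
struct Cond {
  enum Kind { Leaf, And, Or } kind;
  std::string name; // used for Leaf
  const Cond *lhs;  // used for And / Or
  const Cond *rhs;  // used for And / Or
};

using Fact = std::pair<std::string, bool>; // (leaf name, known truth value)

// Collect the leaf conditions whose value is implied by the branch direction.
inline void collectFacts(const Cond &c, bool branchTaken,
                         std::vector<Fact> &out) {
  if (c.kind == Cond::Leaf) {
    out.emplace_back(c.name, branchTaken);
    return;
  }
  // A true `and` makes both operands true; a false `or` makes both false.
  const bool allOperandsKnown = (c.kind == Cond::And && branchTaken) ||
                                (c.kind == Cond::Or && !branchTaken);
  if (!allOperandsKnown)
    return; // e.g. a false `and` says nothing about either operand alone
  collectFacts(*c.lhs, branchTaken, out);
  collectFacts(*c.rhs, branchTaken, out);
}
```

For a guard `a and b` taken on its true edge, this yields both `a` and `b` as true; for `a or b` taken on its true edge, it yields nothing.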

Differential Revision: https://reviews.llvm.org/D101692


Nikita Popov [Sun, 9 May 2021 19:21:54 +0000 (21:21 +0200)]

[SCEV] Add additional loop guard and/or tests (NFC)

Add tests for and/and, and/or, or/or, or/and combinations.


Roman Lebedev [Sun, 9 May 2021 17:37:30 +0000 (20:37 +0300)]

[NFC][X86] Znver3: drop obsolete fixme


Roman Lebedev [Sun, 9 May 2021 14:32:37 +0000 (17:32 +0300)]

[X86] AMD Zen 3: XCHG is a zero-cycle instruction

As measured by exegesis and confirmed by reference docs.


LemonBoy [Sun, 9 May 2021 16:51:05 +0000 (18:51 +0200)]

[SelectionDAG] Regenerate test checks (NFC)


Nikita Popov [Sun, 9 May 2021 16:20:37 +0000 (18:20 +0200)]

[SROA] Regenerate test checks (NFC)


Mark de Wever [Sun, 9 May 2021 15:55:50 +0000 (17:55 +0200)]

[libc++][doc] Update the Format library status.

- Move LWG-3218 to the chrono section.
- Mark the several parts 'In progress'.


Greg McGary [Sat, 8 May 2021 18:42:15 +0000 (11:42 -0700)]

[lld-macho][NFC] Purge stale test-output trees prior to split-file

Enforce standard practice

Differential Revision: https://reviews.llvm.org/D102112


Roman Lebedev [Sat, 8 May 2021 21:57:59 +0000 (00:57 +0300)]

[NFC][LoopIdiom] Add some tests for 'lshr until zero' ('count active bits') "on steroids" idiom


Roman Lebedev [Sat, 8 May 2021 17:42:14 +0000 (20:42 +0300)]

[NFCI][X86] Mark Znver3 scheduling model as complete

To the best of my knowledge, all instructions are modelled,
and have reasonable values to them; flipping the switch
doesn't cause any diff for MCA tests, so either we're good,
or we have test coverage gaps.

I'm not really sure why no other X86 sched model is marked as complete.


Roman Lebedev [Sat, 8 May 2021 17:39:26 +0000 (20:39 +0300)]

[NFCI][X86] Mark a few lately-added system instructions as such for Scheduling purposes


Fangrui Song [Sat, 8 May 2021 20:41:36 +0000 (13:41 -0700)]

[test] Fix tools/gold/X86/new-pm.ll after D101797


Krzysztof Parzyszek [Fri, 7 May 2021 17:51:10 +0000 (12:51 -0500)]

[Hexagon] Propagate metadata in Hexagon Vector Combine


Andrea Di Biagio [Sat, 8 May 2021 18:41:56 +0000 (19:41 +0100)]

[llvm-mca][View] Update the Register File statistics.

Correctly track the number of moves eliminated in the
Register File statistics.


Greg McGary [Sat, 8 May 2021 01:05:47 +0000 (18:05 -0700)]

[lld-macho] Explicitly undefine literal exported symbols

Symbols explicitly exported via command-line options `--exported_symbol SYM` and `--exported_symbols_list FILE` must be defined. Before this fix, lazy symbols defined in archives would be left to languish. We now force them to be included in the linked output.

Differential Revision: https://reviews.llvm.org/D102100


Andrea Di Biagio [Sat, 8 May 2021 16:58:46 +0000 (17:58 +0100)]

[MCA][RegisterFile] Refactor the move elimination logic to address PR50258.

This patch lifts the restriction on the number of read/write registers for a
move elimination candidate. With this patch, move elimination candidates with
exactly two reads and two writes are treated like register swap operations for
the purpose of move elimination.

This patch currently doesn't affect any upstream model. However, it should help
unblock the progress on PR50258.


Nico Weber [Sat, 8 May 2021 17:03:17 +0000 (13:03 -0400)]

[lld/mac] Copy some of the commit message of d5a70db193 into a comment


Louis Dionne [Sat, 8 May 2021 16:15:30 +0000 (12:15 -0400)]

[libc++] NFC: Refactor Lit annotations

Annotations for c++03 mode are useless, since we only run these tests
in C++11 and C++14.


Florian Hahn [Sun, 11 Apr 2021 10:41:48 +0000 (11:41 +0100)]

[VPlan] Add test for sink scalars and merging using VPlan.

Add a couple of tests with scalars that can be sunk to their predicated
users.

This pre-commits tests for D100258.


Simon Pilgrim [Sat, 8 May 2021 15:22:46 +0000 (16:22 +0100)]

[GlobalISel] Ensure MachineIRBuilder::getDebugLoc() returns a const reference. NFCI.

Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.


Simon Pilgrim [Sat, 8 May 2021 15:19:18 +0000 (16:19 +0100)]

[X86] combineHorizOpWithShuffle - generalize HOP(SHUFFLE(X),SHUFFLE(Y)) -> SHUFFLE(HOP(X,Y)) fold.

For 128-bit types, generalize the fold to recognise duplicate operands in either shuffle.


Louis Dionne [Fri, 7 May 2021 14:15:36 +0000 (10:15 -0400)]

[libc++] Move handling of the target triple to the DSL

This fixes a long standing issue where the triple is not always set
consistently in all configurations. This change also moves the
back-deployment Lit features to using the proper target triple
instead of using something ad-hoc.

This will be necessary for using from scratch Lit configuration files
in both normal testing and back-deployment testing.

Differential Revision: https://reviews.llvm.org/D102012


Vinayaka Bandishti [Sat, 8 May 2021 14:42:23 +0000 (20:12 +0530)]

[MLIR] Add memref dialect dependency for affine fusion pass

For `AffineLoopFusion` pass, add `memref` dialect as a dependent
dialect. Since the fusion pass can create `memref::AllocOp`s, the
dialect must be registered in its dependent dialects.

The missing dependency was not discovered until now because the above
said op creation happens only when the input already has
`memref::AllocOp`s in it, and all dialects in the input are
automatically added to the context.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D102104


Uday Bondhugula [Sat, 8 May 2021 13:15:14 +0000 (18:45 +0530)]

[MLIR][NFC] Remove unused MLIRContext declaration

Remove unused MLIRContext declaration. NFC.

Differential Revision: https://reviews.llvm.org/D102103


Roman Lebedev [Sat, 8 May 2021 12:42:11 +0000 (15:42 +0300)]

Revert "[LICM] Hoist loads with invariant.group metadata"

This appears to miscompile google benchmark's GetCacheSizesFromKVFS()
when compiling with -fstrict-vtable-pointers.
Runnable reproducer: https://godbolt.org/z/f9ovKqTzb
The "f.fail()" crashes with a BUS error; it is compiled into testb,
and the address it is testing is nonsensical.

This reverts commit 4c89bcadf6cae8320a1925eb9cbeb8c8c1f5f58b.


Saurabh Jha [Sat, 8 May 2021 12:24:05 +0000 (13:24 +0100)]

Test commit to check commit access


Roman Lebedev [Sat, 8 May 2021 12:15:41 +0000 (15:15 +0300)]

[X86] Improve costmodel for scalar byte swaps

Currently we model i16 bswap as very high cost (`10`),
which doesn't seem right, with all others being at `1`.

Regardless of `MOVBE`, i16 reg-reg bswap is lowered into
(an extending move plus) rot-by-8:
https://godbolt.org/z/8jrq7fMTj
I think it should at worst have throughput of `1`:

Since i32/i64 already have cost of `1`,
`MOVBE` doesn't improve their costs any further.

BUT, `MOVBE` must have at least a single memory operand,
with the other being a register. Which means, if we have
a bswap of load, iff load has a single use,
we'll fold bswap into load.

Likewise, if we have store of a bswap, iff bswap
has a single use, we'll fold bswap into store.

So I think we should treat such a bswap as free,
unless of course we know that for the particular CPU
it performs badly.
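
The rot-by-8 lowering mentioned above rests on a simple identity: swapping the two bytes of a 16-bit value is the same as rotating it by 8 bits. A minimal sketch, with hypothetical helper names:

```cpp
#include <cstdint>

// Rotate a 16-bit value left by 8 bits.
constexpr std::uint16_t rotl16By8(std::uint16_t x) {
  return static_cast<std::uint16_t>((x << 8) | (x >> 8));
}

// Reference byte swap that moves each byte explicitly.
constexpr std::uint16_t bswap16(std::uint16_t x) {
  return static_cast<std::uint16_t>(((x & 0x00FFu) << 8) |
                                    ((x & 0xFF00u) >> 8));
}

static_assert(rotl16By8(0x1234) == 0x3412, "bytes trade places");
static_assert(rotl16By8(0xABCD) == bswap16(0xABCD), "rotate equals swap");
```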

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D101924


Louis Dionne [Fri, 7 May 2021 17:14:57 +0000 (13:14 -0400)]

[libc++] Use Xcode's CMake if it's present

This resolves issues when the CMake in use on the host is too old to
configure libc++ properly, but Xcode has a sufficiently recent version.
It is technically possible for the reverse issue to happen, where the
Xcode version would be too old and the user-installed version would be
better, however in the context of our build bots, we use AppleClang on
Apple platforms, and the CMake shipped with Xcode should work with the
AppleClang shipped alongside that Xcode.

Differential Revision: https://reviews.llvm.org/D102083


Qiu Chaofan [Sat, 8 May 2021 10:13:05 +0000 (18:13 +0800)]

[VectorCombine] Simplify to scalar store if only one element updated

This patch simplifies load-insertelt-store pattern into
getelementptr-store.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D98240


Butygin [Sat, 10 Apr 2021 16:38:11 +0000 (19:38 +0300)]

[mlir] Debug print pattern before and after matchAndRewrite call

Motivation: we have passes with a lot of rewrites, and when one of them segfaults or asserts, it is very hard to find exactly which pattern failed without debug info.

Differential Revision: https://reviews.llvm.org/D101443


Xiang1 Zhang [Sat, 8 May 2021 05:46:51 +0000 (13:46 +0800)]

[X86] Support AMX fast register allocation

Differential Revision: https://reviews.llvm.org/D100026


Arthur Eubanks [Sat, 8 May 2021 06:18:44 +0000 (23:18 -0700)]

Fix build after 34a8a437b


Xiang1 Zhang [Sat, 8 May 2021 05:43:32 +0000 (13:43 +0800)]

Revert "[X86] Support AMX fast register allocation"

This reverts commit 77e2e5e07d01fe0b83c39d0c527c0d3d2e659146.


Xiang1 Zhang [Fri, 7 May 2021 02:46:52 +0000 (10:46 +0800)]

[X86] Support AMX fast register allocation


Michael Liao [Sat, 8 May 2021 05:09:15 +0000 (01:09 -0400)]

Replace a remaining CRLF with LF. NFC.


Arthur Eubanks [Mon, 3 May 2021 23:09:56 +0000 (16:09 -0700)]

[NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose

Printing pass manager invocations is fairly verbose and not super
useful.

This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.

This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D101797


RamNalamothu [Sat, 8 May 2021 04:45:49 +0000 (10:15 +0530)]

[DebugInfo] UnwindTable::create() should not add empty rows to CFI unwind table

UnwindTable::parseRows() may return successfully if the CFIProgram has either
no CFI instructions or only DW_CFA_nop instructions and the UnwindRow return
argument will be empty. But currently, the callers are not checking for this case
which is leading to incorrect dumps in the unwind tables in such cases i.e.

CFA=unspecified

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D101892


River Riddle [Sat, 8 May 2021 02:30:25 +0000 (19:30 -0700)]

[mlir] Refactor the representation of function-like argument/result attributes.

The current design uses a unique entry for each argument/result attribute, with the name of the entry being something like "arg0". This provides for a somewhat sparse design, but ends up being much more expensive (from a runtime perspective) in practice. The design requires building a string every time we look up the dictionary for a specific arg/result, and also requires N attribute lookups when collecting all of the arg/result attribute dictionaries.

This revision restructures the design to instead have an ArrayAttr that contains all of the attribute dictionaries for arguments and another for results. This design reduces the number of attribute name lookups to 1, and allows for O(1) lookup for individual element dictionaries. The major downside is that we can end up with larger memory usage, as the ArrayAttr contains an entry for each element even if that element has no attributes. If the memory usage becomes too problematic, we can experiment with a more sparse structure that still provides a lot of the wins in this revision.

This dropped the compilation time of a somewhat large TensorFlow model from ~650 seconds to ~400 seconds.
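
The difference between the two layouts can be sketched with plain Python dictionaries. This is an illustration only: the helper names and the `arg_attrs` key are assumptions made for the example, not MLIR APIs.

```python
# Illustrative model; the helper names and the "arg_attrs" key are
# assumptions for this sketch, not taken from MLIR.

def get_arg_attrs_old(func_attrs, index):
    # Old layout: one top-level entry per argument, so every query
    # builds a key string such as "arg1" before the lookup.
    return func_attrs.get(f"arg{index}", {})


def get_all_arg_attrs_old(func_attrs, num_args):
    # Collecting everything costs N string builds and N lookups.
    return [get_arg_attrs_old(func_attrs, i) for i in range(num_args)]


def get_arg_attrs_new(func_attrs, index):
    # New layout: a single name lookup, then O(1) indexing into the array.
    return func_attrs["arg_attrs"][index]


# Sparse: only the argument that has attributes gets an entry.
old_layout = {"sym_name": "f", "arg1": {"noalias": True}}

# Dense: one slot per argument, even when the slot is empty.
new_layout = {"sym_name": "f", "arg_attrs": [{}, {"noalias": True}, {}]}

assert get_all_arg_attrs_old(old_layout, 3) == new_layout["arg_attrs"]
```

The empty dictionaries in `new_layout` are the memory cost mentioned above: arguments without attributes still occupy a slot in the array.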

Differential Revision: https://reviews.llvm.org/D102035

commit | commitdiff | tree

Arthur Eubanks [Sat, 8 May 2021 01:11:21 +0000 (18:11 -0700)]

[lit] Bump up the Windows process cap from 32 to 60

At 61 or over, I see messages like

  File "...\Python\Python39\lib\multiprocessing\connection.py", line 816, in _exhaustive_wait
    res = _winapi.WaitForMultipleObjects(L, False, timeout)

  ValueError: need at most 63 handles, got a sequence of length 64

60 seems to work for me.

If this causes issues for anybody else, feel free to revert.
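
A minimal sketch of such a cap, assuming a hypothetical helper `cap_workers` (not the actual lit code). The traceback above suggests the waited-on sequence holds a few more handles than there are workers, which would explain why the limit is set below 63.

```python
import platform

# Value taken from the commit message; the helper itself is hypothetical.
WINDOWS_MAX_WORKERS = 60


def cap_workers(requested, system=None):
    """Limit the worker count on Windows; leave other systems unchanged."""
    if system is None:
        system = platform.system()
    if system == "Windows":
        return min(requested, WINDOWS_MAX_WORKERS)
    return requested
```

Passing `system` explicitly makes the helper easy to exercise on any host, for example `cap_workers(74, "Windows")` versus `cap_workers(74, "Linux")`.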

commit | commitdiff | tree

River Riddle [Sat, 8 May 2021 00:55:52 +0000 (17:55 -0700)]

[mlir] Add hover support to mlir-lsp-server

This provides information when the user hovers over a part of the source .mlir file. This revision adds the following hover behavior:
* Operation:
  - Shows the generic form.
* Operation Result:
  - Shows the parent operation name, result number(s), and type(s).
* Block:
  - Shows the parent operation name, block number, predecessors, and successors.
* Block Argument:
  - Shows the parent operation name, parent block, argument number, and type.

Differential Revision: https://reviews.llvm.org/D101113

commit | commitdiff | tree

Arthur Eubanks [Sat, 8 May 2021 01:00:11 +0000 (18:00 -0700)]

Revert "lit: revert 134b103fc0f3a995d76398bf4b029d72bebe8162"

This reverts commit d319005a3746a7661c8c9a3302266b6ff7cf61be.

Causing messages like:

  File "...\Python\Python39\lib\multiprocessing\connection.py", line 816, in _exhaustive_wait
    res = _winapi.WaitForMultipleObjects(L, False, timeout)

  ValueError: need at most 63 handles, got a sequence of length 74

commit | commitdiff | tree

Arthur Eubanks [Sat, 8 May 2021 00:54:32 +0000 (17:54 -0700)]

[gn build] Manually port 5b158093e

commit | commitdiff | tree

thomasraoux [Sat, 8 May 2021 00:10:35 +0000 (17:10 -0700)]

[mlir][vector] Fix warning

The previous change caused another warning in some build configurations:
"default label in switch which covers all enumeration values"

commit | commitdiff | tree

Amara Emerson [Fri, 7 May 2021 00:14:04 +0000 (17:14 -0700)]

[AArch64][GlobalISel] Create a new minimal combiner pass just for -O0.

We never bothered to have a separate set of combines for -O0 in the prelegalizer
before. This results in some minor performance hits for a mode where performance
isn't a concern (although not regressing code size significantly is still preferable).

This also removes the CSE option since we don't need it for -O0.

Through experiments, I've arrived at a set of combines that gets the most code
size improvement at -O0, while reducing the amount of time spent in the combiner
by around 35%.

Differential Revision: https://reviews.llvm.org/D102038

commit | commitdiff | tree

Amara Emerson [Wed, 5 May 2021 18:37:00 +0000 (11:37 -0700)]

[GlobalISel] Don't form zero/sign extending loads for atomics.

For importing patterns, we only support matching G_LOAD, not G_ZEXTLOAD or
G_SEXTLOAD.

Differential Revision: https://reviews.llvm.org/D101932

commit | commitdiff | tree

Weston Carvalho [Thu, 29 Apr 2021 21:30:47 +0000 (14:30 -0700)]

Make `hasTypeLoc` matcher support more node types.

Differential Revision: https://reviews.llvm.org/D101572

commit | commitdiff | tree

Weston Carvalho [Fri, 7 May 2021 23:32:57 +0000 (00:32 +0100)]

NFC: Move TypeList implementation up the file

This will make it possible for more code to use it.

commit | commitdiff | tree

Arthur Eubanks [Fri, 7 May 2021 21:32:40 +0000 (14:32 -0700)]

[NewPM] Move analysis invalidation/clearing logging to instrumentation

We're trying to move DebugLogging into instrumentation, rather than
being part of PassManagers/AnalysisManagers.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D102093

commit | commitdiff | tree

Jessica Paquette [Tue, 20 Apr 2021 23:07:54 +0000 (16:07 -0700)]

[AArch64][GlobalISel] Legalize narrow type G_CTPOPs

Using `clampScalar` here because we ought to mark s128 as custom eventually.

(Right now, it will just fall back.)

With this legalization, we get the same code as SDAG:
https://godbolt.org/z/TneoPKrKG

Differential Revision: https://reviews.llvm.org/D100908

commit | commitdiff | tree

Adrian Prantl [Fri, 7 May 2021 21:44:45 +0000 (14:44 -0700)]

Fix the module-enabled build by removing a redundant type definition.
