review.tizen.org Git - platform/upstream/llvm.git/log

cgyurgyik [Wed, 29 Jul 2020 20:29:19 +0000 (16:29 -0400)]

[libc] Adds fuzz test for strstr and alphabetizes string fuzz CMakeList.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D84611

Philip Reames [Wed, 29 Jul 2020 20:28:26 +0000 (13:28 -0700)]

[Statepoint] Enable cross block relocates w/vreg lowering

This change is mechanical; it just removes the restriction and updates tests.  The key building blocks were submitted in 31342eb and 8fe2abc.

Note that this (and preceding changes) entirely subsumes D83965.  I did include a couple of its tests.

From the codegen changes, an interesting observation: this doesn't actually reduce spilling, it just lets the register allocator do its job.  That results in a slightly different overall result which has both pros and cons over the eager spill lowering.  (i.e. We'll have some perf tuning to do once this is stable.)

Luboš Luňák [Sat, 5 Oct 2019 19:14:01 +0000 (21:14 +0200)]

[lldb] implement 'up' and 'down' shortcuts in lldb gui

Also add a unittest.

Differential Revision: https://reviews.llvm.org/D68541

Luboš Luňák [Fri, 11 Oct 2019 19:59:50 +0000 (21:59 +0200)]

[lldb] change shortcut for 'step out' from 'o' to 'f'

This makes it consistent with gdb tui, where 'f' is 'finish'.
See the discussion at https://reviews.llvm.org/D68541 .

Differential Revision: https://reviews.llvm.org/D68909

Luboš Luňák [Fri, 11 Oct 2019 19:42:56 +0000 (21:42 +0200)]

[lldb] remove somewhat dangerous 'd'(etach) and 'k'(ill) shortcuts

'd' would be much better used for up/down shortcuts, and this also removes
the possibility of ruining the whole debugging session by accidentally
hitting 'd' or 'k'. Also change menu to have both 'detach and resume'
and 'detach suspended' to make it clear which one is which. See
discussion at https://reviews.llvm.org/D68541 .

Differential Revision: https://reviews.llvm.org/D68908

aartbik [Wed, 29 Jul 2020 20:25:31 +0000 (13:25 -0700)]

[mlir] [VectorOps] [integration_test] Sparse matrix times vector (jagged SAXPY version)

Transposed jagged diagonal storage yields longer vector lengths. Also, in
contrast with naive SAXPY (one gather/scatter), this only performs one gather.

Reviewed By: reidtatge

Differential Revision: https://reviews.llvm.org/D84699

Johannes Doerfert [Wed, 29 Jul 2020 20:18:20 +0000 (15:18 -0500)]

[OpenMP] Fix D83281 issue on windows by allowing `dso_local` in CHECK

Nikita Popov [Sat, 25 Jul 2020 14:21:55 +0000 (16:21 +0200)]

[ConstantRange] Add API for intrinsics (NFC)

This adds a common API for computing constant ranges of intrinsics.
The intention here is that
a) we can reuse the same code across different passes that handle
constant ranges, i.e. this can be reused in SCCP
b) we only have to add knowledge about supported intrinsics to
ConstantRange, not any consumers.
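
As a rough illustration of point (b), and not LLVM's actual ConstantRange code, the sketch below models a range as a plain closed integer interval without wrap-around. The names `is_intrinsic_supported` and `range_of_intrinsic`, and the choice of `abs`/`smin`/`smax`, are assumptions made for the example; the point is only that knowledge about each intrinsic lives in one table that every consumer queries.

```python
from typing import Callable, Dict, Tuple

# Simplified model: a range is a closed interval (lo, hi) with no wrap-around.
# The real ConstantRange can wrap; this is only an illustration.
Range = Tuple[int, int]


def _abs(x: Range) -> Range:
    lo, hi = x
    if lo >= 0:
        return (lo, hi)
    if hi <= 0:
        return (-hi, -lo)
    return (0, max(-lo, hi))


def _min(a: Range, b: Range) -> Range:
    return (min(a[0], b[0]), min(a[1], b[1]))


def _max(a: Range, b: Range) -> Range:
    return (max(a[0], b[0]), max(a[1], b[1]))


# Hypothetical table of supported intrinsics, kept in a single place.
_SUPPORTED: Dict[str, Callable[..., Range]] = {
    "abs": _abs,
    "smin": _min,
    "smax": _max,
}


def is_intrinsic_supported(name: str) -> bool:
    return name in _SUPPORTED


def range_of_intrinsic(name: str, *operands: Range) -> Range:
    return _SUPPORTED[name](*operands)
```

A pass would first ask `is_intrinsic_supported(name)` and then call `range_of_intrinsic`, so adding a new intrinsic only touches the table.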

Differential Revision: https://reviews.llvm.org/D84587

Tobias Gysi [Wed, 29 Jul 2020 20:00:48 +0000 (22:00 +0200)]

[mlir] fix error handling in rocm runtime wrapper

The patch fixes minor issues in the rocm runtime wrapper due to API differences between CUDA and HIP.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D84861

Richard Smith [Wed, 29 Jul 2020 20:06:52 +0000 (13:06 -0700)]

Fix invalid attempted explicit instantiation, which Clang now rejects.

Fangrui Song [Wed, 29 Jul 2020 20:01:31 +0000 (13:01 -0700)]

[ELF][test] Fix ppc64-reloc-pcrel34-overflow.s

Victor Huang [Wed, 29 Jul 2020 18:38:55 +0000 (18:38 +0000)]

[PowerPC] Support for R_PPC64_REL24_NOTOC calls where the caller has no TOC and the callee is not DSO local

This patch supports the situation where the caller does not have a valid TOC and
calls using the R_PPC64_REL24_NOTOC relocation and the callee is not DSO local.
In this case the call cannot be made directly since the callee may or may not
require a valid TOC pointer. As a result this situation requires a PC-relative
PLT stub to set up r12.

Reviewed By: sfertile, MaskRay, stefanp

Differential Revision: https://reviews.llvm.org/D83669

Simon Pilgrim [Wed, 29 Jul 2020 18:52:11 +0000 (19:52 +0100)]

[X86][AVX] isHorizontalBinOp - relax no-lane-crossing limit for AVX1-only targets.

Instead of never accepting v8f32/v4f64 FHADD/FHSUB if the input shuffle masks cross lanes, perform the matching and determine if the post shuffle mask simplifies to a 'whole lane shuffle' mask - in which case we are guaranteed to cheaply perform this as a VPERM2F128 shuffle.
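
To make the 'whole lane shuffle' condition concrete, here is a small sketch of a check that a mask only moves complete 128-bit lanes. The helper `is_whole_lane_shuffle` is hypothetical and is not the code in the X86 backend; it assumes 4 elements per lane (the v8f32 case) and treats undef (negative) mask entries as a mismatch.

```python
from typing import List


def is_whole_lane_shuffle(mask: List[int], elts_per_lane: int = 4) -> bool:
    """Return True if every result lane is an in-order copy of one source lane."""
    if len(mask) % elts_per_lane != 0:
        return False
    for start in range(0, len(mask), elts_per_lane):
        first = mask[start]
        # The lane must begin on a source-lane boundary...
        if first % elts_per_lane != 0:
            return False
        # ...and continue with consecutive elements of that same source lane.
        expected = list(range(first, first + elts_per_lane))
        if mask[start:start + elts_per_lane] != expected:
            return False
    return True
```

For example, `[4, 5, 6, 7, 0, 1, 2, 3]` swaps the two lanes and passes, while the interleaving mask `[0, 4, 1, 5, 2, 6, 3, 7]` does not.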

Philip Reames [Wed, 29 Jul 2020 19:44:49 +0000 (12:44 -0700)]

[Tests] Split a file for ease of update

Florian Hahn [Wed, 29 Jul 2020 18:35:14 +0000 (19:35 +0100)]

Reland "[SCEVExpander] Add option to preserve LCSSA directly."

This reverts the revert commit dc2867576886247cbe351e7c63618c09ab6af808.

It includes a fix for Polly, which uses SCEVExpander on IR that is not
in LCSSA form. Set PreserveLCSSA = false in that case, to ensure we do
not introduce LCSSA phis where there were none before.

Richard Smith [Wed, 29 Jul 2020 19:29:05 +0000 (12:29 -0700)]

PR46231: Promote diagnostic for 'template<...>;' from ExtWarn to Error.

No other compiler accepts this as an extension, not even in permissive
mode. We're not doing anyone any favors by allowing this, and it's
unlikely to be at all common, even in Clang-only code, in the wild.

Stanislav Mekhanoshin [Wed, 29 Jul 2020 19:21:28 +0000 (12:21 -0700)]

[AMDGPU] Fixed formatting in GCNHazardRecognizer.cpp. NFC.

Stanislav Mekhanoshin [Wed, 29 Jul 2020 18:47:18 +0000 (11:47 -0700)]

[AMDGPU] prefer non-mfma in post-RA schedule

MFMA instructions shall not be scheduled back to back
to avoid MAI SIMD stall. Tell post-RA schedule we would
prefer some other instruction instead.

Differential Revision: https://reviews.llvm.org/D84883

Louis Dionne [Wed, 29 Jul 2020 18:24:02 +0000 (14:24 -0400)]

[libc++] Re-enable tests for C11 math macros in <float.h> and <cfloat>

Fixes http://llvm.org/PR38572.

Matt Arsenault [Wed, 29 Jul 2020 18:40:51 +0000 (14:40 -0400)]

GlobalISel: Fix insert point in CSEMIRBuilder unit test

This was using invalid MIR for the test instructions. The test add was
the first instruction in the block, before the trunc inputs or copies
from physical registers which I assume was not intended.

Tatyana Krasnukha [Thu, 23 Jul 2020 15:41:14 +0000 (18:41 +0300)]

[lldb/Breakpoint] Rename StoppointLocation to StoppointSite and drop its relationship with BreakpointLocation

Both BreakpointLocation and BreakpointSite inherited from StoppointLocation. However, the only thing
they shared was hit counting logic. The patch encapsulates that logic into StoppointHitCounter, renames
StoppointLocation to StoppointSite, and stops BreakpointLocation from inheriting from it.
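
The shape of that refactoring, replacing a shared base class with a small owned helper, can be sketched as follows. This is an illustrative model only: the class names mirror the commit message, but the methods and fields (`increment`, `reset`, `hit_count`, `hit_counter`) are assumptions for the example, not lldb's actual interface.

```python
class StoppointHitCounter:
    """Hit-counting logic, shared by composition instead of a common base class."""

    def __init__(self) -> None:
        self._hit_count = 0

    def increment(self) -> None:
        self._hit_count += 1

    def reset(self) -> None:
        self._hit_count = 0

    @property
    def hit_count(self) -> int:
        return self._hit_count


class StoppointSite:
    """Stand-in for the renamed base class; only sites derive from it."""

    def __init__(self, address: int) -> None:
        self.address = address
        self.hit_counter = StoppointHitCounter()


class BreakpointSite(StoppointSite):
    pass


class BreakpointLocation:
    """No longer derives from the stoppoint base; it just owns a counter."""

    def __init__(self, location_id: int) -> None:
        self.location_id = location_id
        self.hit_counter = StoppointHitCounter()
```

Both kinds of object still count hits the same way, but a location is no longer usable where a site is expected.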

Differential Revision: https://reviews.llvm.org/D84527

Richard Smith [Wed, 29 Jul 2020 19:01:47 +0000 (12:01 -0700)]

PR46859: Fix crash if declaring a template in a DeclScope with no DeclContext.

This can happen during error recovery; it could also happen at block
scope if we ever parsed a template declaration at block scope.

Baptiste Saleil [Tue, 28 Jul 2020 21:02:50 +0000 (16:02 -0500)]

[PowerPC] Add options to control paired vector memops support

Adds frontend and backend options to enable and disable the
PowerPC paired vector memory operations added in ISA 3.1.
Instructions using these options will be added in subsequent patches.

Differential Revision: https://reviews.llvm.org/D83722

Matt Morehouse [Wed, 29 Jul 2020 18:58:14 +0000 (18:58 +0000)]

[DFSan] Add efficient fast16labels instrumentation mode.

Adds the -fast-16-labels flag, which enables efficient instrumentation
for DFSan when the user needs <=16 labels. The instrumentation
eliminates most branches and most calls to __dfsan_union or
__dfsan_union_load.
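
One way to see why 16 or fewer labels permit branch-free instrumentation: if each base label is assumed to occupy one bit of a 16-bit value, the union of two labels is a single bitwise OR. The sketch below models that assumption; the helper names are hypothetical and this is not the instrumentation the pass emits.

```python
def base_label(index: int) -> int:
    """Return the label for base label `index` (0..15): a single set bit."""
    if not 0 <= index < 16:
        raise ValueError("only 16 base labels fit in a 16-bit label")
    return 1 << index


def union(a: int, b: int) -> int:
    """Union of two labels is a bitwise OR: no branch, no table lookup."""
    return (a | b) & 0xFFFF


def has_label(label: int, base: int) -> bool:
    """Check whether the combined `label` includes the base label `base`."""
    return (label & base) != 0
```

With this encoding, `union(base_label(0), base_label(3))` is `0b1001`, and membership is a single AND.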

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D84371

Tatyana Krasnukha [Wed, 29 Jul 2020 18:52:40 +0000 (21:52 +0300)]

[lldb/BreakpointSite] Handle all ways of control flow

Amara Emerson [Fri, 24 Jul 2020 20:00:12 +0000 (13:00 -0700)]

[GlobalISel] Add G_INTRINSIC_LRINT and translate from llvm.lrint

Differential Revision: https://reviews.llvm.org/D84551

Philip Reames [Wed, 29 Jul 2020 18:41:40 +0000 (11:41 -0700)]

[Statepoint] Consolidate relocation type tracking [NFC]

Change the way we track how a particular pointer was relocated at a statepoint in selection dag. Previously, we used an optional<location> for the spill lowering, and a block local Register for the newly introduced vreg lowering. Combine all three lowerings (norelocate, spill, and vreg) into a single helper class, and keep a single copy of the information.

This is submitted separately as it really does make the code more readable on its own, but the indirect motivation is to move vreg tracking from StatepointLowering to FunctionLoweringInfo. This is the last piece needed to support cross block relocations with vregs; that will follow in a separate (non-NFC) patch.

Amara Emerson [Wed, 29 Jul 2020 07:21:15 +0000 (00:21 -0700)]

[AArch64][GlobalISel] Selection support for vector DUP[X]lane instructions.

In future, we'd like to use the perfect-shuffle mechanism to deal with these
shuffle permutations. For now, this improves performance by avoiding the
super-expensive const-pool load + tbl instruction.

Differential Revision: https://reviews.llvm.org/D84866

Tatyana Krasnukha [Tue, 21 Jul 2020 18:04:36 +0000 (21:04 +0300)]

[lldb] Don't use hardware index to determine whether a breakpoint site is hardware

Most process plugins (if not all) don't set hardware index for breakpoints. They are
not even able to determine this index.

This patch makes StoppointLocation::IsHardware pure virtual and lets BreakpointSite
override it using more accurate BreakpointSite::Type.

It also adds assertions to be sure that a breakpoint site is hardware when this is required.

Differential Revision: https://reviews.llvm.org/D84257

Tatyana Krasnukha [Wed, 22 Jul 2020 11:52:52 +0000 (14:52 +0300)]

[lldb] Make process plugins check whether a hardware breakpoint is required

Remove @skipIfWindows as process should report the error correctly on Windows now.

Differential Revision: https://reviews.llvm.org/D84255

Tatyana Krasnukha [Wed, 22 Jul 2020 11:45:10 +0000 (14:45 +0300)]

[lldb] Skip overlapping hardware and external breakpoints when writing memory

This fixes the assertion `assert(intersects);` in the Process::WriteMemory function.

Differential Revision: https://reviews.llvm.org/D84254

Matt Arsenault [Tue, 21 Jul 2020 23:41:24 +0000 (19:41 -0400)]

AMDGPU/GlobalISel: Handle llvm.amdgcn.reloc.constant

Julian Lettner [Wed, 29 Jul 2020 16:55:52 +0000 (09:55 -0700)]

[compiler-rt][Darwin] Disable EXC_GUARD exceptions

ASan/TSan use mmap in a way that creates “deallocation gaps” which
triggers EXC_GUARD exceptions on macOS 10.15+ (XNU 19.0+). Let's
suppress those.

Tatyana Krasnukha [Wed, 22 Jul 2020 11:41:15 +0000 (14:41 +0300)]

[lldb/test] Put hardware breakpoint tests together, NFC

Create a common base class for them to re-use supports_hw_breakpoints function in decorators.

Differential Revision: https://reviews.llvm.org/D84311

Florian Hahn [Wed, 29 Jul 2020 18:17:24 +0000 (19:17 +0100)]

Revert "[SCEVExpander] Add option to preserve LCSSA directly."

This reverts commit 99166fd4fb422351f131fb1265cb85d5f6c5b8da, because it
breaks the polly builders.

polly/test/Isl/CodeGen/invariant_load_escaping_second_scop.ll fails
because an apparently unnecessary LCSSA phi node is introduced.

Make the bots green again, while I take a closer look.

Louis Dionne [Wed, 29 Jul 2020 18:17:57 +0000 (14:17 -0400)]

[libc++] Remove c++98 from the possible Standards of the test suite

Clang treats C++98 and C++03 as the same anyway, so it's no use having
two different settings for the same standard.

Louis Dionne [Wed, 29 Jul 2020 18:16:51 +0000 (14:16 -0400)]

[libc++][pstl] Remove c++98 from UNSUPPORTED annotations

c++98 isn't used by the test suite anymore, only c++03 is.

Matt Arsenault [Mon, 27 Jul 2020 01:25:10 +0000 (21:25 -0400)]

GlobalISel: Implement lower for G_EXTRACT_VECTOR_ELT

Use the basic store to stack and reload.

Kostya Serebryany [Wed, 29 Jul 2020 17:34:07 +0000 (10:34 -0700)]

Add more debug code for https://github.com/google/sanitizers/issues/1193 (getting desperate, not being able to reproduce it for a few months, but the users are seeing it)

more debug code

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D84819

Jessica Paquette [Tue, 28 Jul 2020 18:33:39 +0000 (11:33 -0700)]

[AArch64][GlobalISel] Select XRO addressing mode with wide immediates

Port the wide immediate case from AArch64DAGToDAGISel::SelectAddrModeXRO.

If we have a wide immediate which can't be represented in an add, we can end up
with code like this:

```
mov  x0, imm
add  x1, base, x0
ldr  x2, [x1, 0]
```

If we use the [base, xN] addressing mode instead, we can produce this:

```
mov  x0, imm
ldr  x2, [base, x0]
```

This saves 0.4% code size on 7zip at -O3, and gives a geomean code size
improvement of 0.1% on CTMark.
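
The addressing-mode choice described above can be sketched roughly as follows. This is a simplified model, not the selector's real logic: it only checks whether the offset fits the AArch64 ADD (immediate) encoding (a 12-bit unsigned value, optionally shifted left by 12) and ignores the load's own immediate-offset forms and any other conditions the actual code applies. The function names and register choices are hypothetical.

```python
from typing import List


def fits_add_immediate(imm: int) -> bool:
    """12-bit unsigned immediate, optionally shifted left by 12 bits."""
    if imm < 0:
        return False
    if imm < (1 << 12):
        return True
    return (imm & 0xFFF) == 0 and (imm >> 12) < (1 << 12)


def lower_load(base: str, imm: int) -> List[str]:
    """Return an illustrative instruction sequence for loading from base + imm."""
    if fits_add_immediate(imm):
        return [f"add x1, {base}, #{imm}", "ldr x2, [x1]"]
    # Wide immediate: materialize it once and use the [base, xN] form,
    # which avoids the separate add.
    return [f"mov x0, #{imm}", f"ldr x2, [{base}, x0]"]
```

For a wide offset such as `0x123456` the sketch produces the two-instruction `mov` + `ldr [base, x0]` form shown in the second listing.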

Differential Revision: https://reviews.llvm.org/D84784

Matt Arsenault [Mon, 20 Jul 2020 02:57:24 +0000 (22:57 -0400)]

AMDGPU: Relax restriction on folding immediates into physregs

I never completed the work on the patches referenced by
f8bf7d7f42f28fa18144091022236208e199f331, but this was intended to
avoid folding immediate writes into m0 which the coalescer doesn't
understand very well. Relax this to allow simple SGPR immediates to
fold directly into VGPR copies. This pattern shows up routinely in
current GlobalISel code since nothing is smart enough to emit VGPR
constants yet.

Tres Popp [Wed, 29 Jul 2020 14:03:53 +0000 (16:03 +0200)]

[MLIR][NFC] Move Shape::WitnessType Declaration.

This moves it from ShapeOps.td to ShapeBase.td

Differential Revision: https://reviews.llvm.org/D84845

Matt Arsenault [Wed, 29 Jul 2020 17:31:59 +0000 (13:31 -0400)]

GlobalISel: Remove unreachable condition

Fixes bug 46882

LLVM GN Syncbot [Wed, 29 Jul 2020 17:37:10 +0000 (17:37 +0000)]

[gn build] Port 276f9e8cfaf

Heejin Ahn [Wed, 29 Jul 2020 08:19:36 +0000 (01:19 -0700)]

[WebAssembly] Fix getBottom for loops

When it was first created, CFGSort only made sure BBs in each
`MachineLoop` are sorted together. After we added exception support,
CFGSort now also sorts BBs in each `WebAssemblyException`, which
represents a `catch` block, together, and the
`Region` class was introduced to be a thin wrapper for both
`MachineLoop` and `WebAssemblyException`.

But how we compute those loops and exceptions is different.
`MachineLoopInfo` is constructed using the standard loop computation
algorithm in LLVM; the definition of loop is "a set of BBs that are
dominated by a loop header and have a path back to the loop header". So
even if some BBs are semantically contained by a loop in the original
program, or in other words dominated by a loop header, if they don't
have a path back to the loop header, they are not considered a part of
the loop. For example, if a BB is dominated by a loop header but
contains `call abort()` or `rethrow`, it wouldn't have a path back to
the header, so it is not included in the loop.

But `WebAssemblyException` is a wasm-specific data structure, and its
algorithm is simple: a `WebAssemblyException` consists of an EH pad and
all BBs dominated by the EH pad. So this scenario is possible: (This is
also the situation in the newly added test in cfg-stackify-eh.ll)

```
Loop L: header, A, ehpad, latch
Exception E: ehpad, latch, B
```
(B contains `abort()`, so it does not have a path back to the loop
header, so it is not included in L.)

And it is sorted in this order:
```
header
A
ehpad
latch
B
```

And when CFGStackify places `end_loop` or `end_try` markers, it
previously used `WebAssembly::getBottom()`, which returns the latest BB
in the sorted order, and placed the marker there. So in this case the
marker placements will be like this:
```
loop
  header
  try
    A
  catch
    ehpad
    latch
end_loop         <-- misplaced!
    B
  end_try
```
in which nesting between the loop and the exception is not correct.
`end_loop` marker has to be placed after `B`, and also after `end_try`.

Maybe the fundamental way to solve this problem is to come up with our
own algorithm for computing loop region too, in which we include all BBs
dominated by a loop header in a loop. But this takes a lot more effort.
The only thing we actually need to fix is `getBottom()`. If we make it
return the right BB, which means in case of a loop, the latest BB of the
loop itself and all exceptions contained in there, we are good.

This renames `Region` and `RegionInfo` to `SortRegion` and
`SortRegionInfo` and extracts them into their own file. It also adds
`getBottom` to the `SortRegionInfo` class, from which it can access
`WebAssemblyExceptionInfo`, so that it can compute a correct bottom
block for loops.
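
The difference between the old and new bottom computation can be sketched on the example from this message. The data below is copied from the `Loop L` / `Exception E` listing; the function names and the dictionary keyed by EH pad are assumptions for illustration and do not reflect how `SortRegionInfo` is implemented.

```python
from typing import Dict, List, Set

# Sorted block order and regions taken from the example in this commit message.
ORDER: List[str] = ["header", "A", "ehpad", "latch", "B"]
LOOP_L: Set[str] = {"header", "A", "ehpad", "latch"}
# Exceptions keyed by their EH pad (hypothetical representation).
EXCEPTIONS: Dict[str, Set[str]] = {"ehpad": {"ehpad", "latch", "B"}}


def naive_bottom(region: Set[str]) -> str:
    """Old behaviour: the last block of the region itself in sorted order."""
    return max(region, key=ORDER.index)


def loop_bottom(loop: Set[str]) -> str:
    """New behaviour for loops: also cover exceptions whose EH pad is in the loop."""
    blocks = set(loop)
    for eh_pad, exception_blocks in EXCEPTIONS.items():
        if eh_pad in loop:
            blocks |= exception_blocks
    return max(blocks, key=ORDER.index)
```

Here `naive_bottom(LOOP_L)` yields `latch`, which is where the misplaced `end_loop` came from, while `loop_bottom(LOOP_L)` yields `B`, so the marker lands after `end_try`.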

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D84724

Jonas Devlieghere [Wed, 29 Jul 2020 15:59:57 +0000 (08:59 -0700)]

[lldb] Improve platform handling in CreateTargetInternal

Currently, `target create` has no --platform option. However,
TargetList::CreateTargetInternal, which is called under the hood, will
return an error when either no platform or multiple matching platforms
are found, saying that a platform should be specified with --platform.

This patch adds the platform option, but that doesn't solve either of
these errors.

- If more than one platform matches, specifying the platform isn't
   going to fix that. The current code will only look at the
   architecture instead. I've updated the error message to ask the user
   to specify an architecture.

- If no architecture is found, specifying a new one via platform isn't
   going to change that either because we already try to find one that
   matches the given architecture.

Differential revision: https://reviews.llvm.org/D84809

Arthur Eubanks [Tue, 28 Jul 2020 21:29:13 +0000 (14:29 -0700)]

[Scudo][CMake] Add -fno-lto to Scudo libraries

-fno-lto is in SANITIZER_COMMON_CFLAGS but not here.
Don't use SANITIZER_COMMON_CFLAGS because of performance issues.
See https://bugs.llvm.org/show_bug.cgi?id=46838.

Fixes
$ ninja TScudoCUnitTest-i386-Test
on an LLVM build with -DLLVM_ENABLE_LTO=Thin.
check-scudo now passes.

Reviewed By: cryptoad

Differential Revision: https://reviews.llvm.org/D84805

Hiroshi Yamauchi [Wed, 29 Jul 2020 16:13:37 +0000 (09:13 -0700)]

[PGO] Remove insignificant function hash values from some tests.

This is to avoid the need to update a bunch of test files when the PGO
instrumentation function hashing changes.

Split off of D84782.

Differential Revision: https://reviews.llvm.org/D84865

Craig Topper [Wed, 29 Jul 2020 17:08:12 +0000 (10:08 -0700)]

[X86] Add custom lowering for llvm.roundeven with sse4.1.

We can use the roundss/sd/ps/pd instructions like we do for
ceil/floor/trunc/rint/nearbyint.

Differential Revision: https://reviews.llvm.org/D84592

Craig Topper [Wed, 29 Jul 2020 17:05:25 +0000 (10:05 -0700)]

[LV] Add abs/smin/smax/umin/umax intrinsics to isTriviallyVectorizable

This patch adds support for vectorizing these intrinsics.

Differential Revision: https://reviews.llvm.org/D84796

Arthur Eubanks [Mon, 27 Jul 2020 21:19:12 +0000 (14:19 -0700)]

[DFSan][NewPM] Port DataFlowSanitizer to NewPM

Reviewed By: ychen, morehouse

Differential Revision: https://reviews.llvm.org/D84707

Sanjay Patel [Wed, 29 Jul 2020 17:12:51 +0000 (13:12 -0400)]

[InstSimplify] try constant folding intrinsics before general simplifications

This matches the behavior of simplify calls for regular opcodes -
rely on ConstantFolding before spending time on folds with variables.

I am not aware of any diffs from this re-ordering currently, but there was
potential for unintended behavior from the min/max intrinsics because that
code is implicitly assuming that only 1 of the input operands is constant.
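
The ordering can be pictured with a toy simplifier for a signed max. This is a sketch under assumptions, not InstSimplify's code: operands are modelled as Python ints (constants) or strings (non-constant values), the type is assumed to be 32-bit signed, and `simplify_smax` is a hypothetical name.

```python
from typing import Optional, Union

INT_MAX = 2**31 - 1  # assumed 32-bit signed type for the example

Operand = Union[int, str]  # int = constant, str = name of a non-constant value


def simplify_smax(a: Operand, b: Operand) -> Optional[Operand]:
    # Step 1: constant folding when every operand is a constant.
    if isinstance(a, int) and isinstance(b, int):
        return max(a, b)
    # Step 2: general folds; from here on at most one operand is constant.
    if a == b:
        return a  # smax(x, x) -> x
    if INT_MAX in (a, b):
        return INT_MAX  # smax(x, INT_MAX) -> INT_MAX
    return None  # no simplification found
```

Because the all-constant case returns early, the later folds never see two constant operands, which is the assumption the message says the min/max code relies on.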

Simon Pilgrim [Wed, 29 Jul 2020 16:33:56 +0000 (17:33 +0100)]

[DAG][AMDGPU][X86] Add SimplifyMultipleUseDemandedBits handling for SIGN/ZERO_EXTEND + SIGN/ZERO_EXTEND_VECTOR_INREG

Peek through multiple use ops like we already do for ANY_EXTEND/ANY_EXTEND_VECTOR_INREG

Differential Revision: https://reviews.llvm.org/D84863

Roman Lebedev [Wed, 29 Jul 2020 16:54:33 +0000 (19:54 +0300)]

[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline

I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach the vectorizer in a rather mangled form, with weird PHIs,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprisingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike its friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently I've picked the same one as for code sinking,
but I suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprisingly help:
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of compile-time and
run-time performance, and code size. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, so some of them will be caught late.

As per benchmarks I've run {F12360204}, this is mostly within the noise;
there are some small improvements, some small regressions.
One big regression I saw I fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but I'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprisingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal when optimizing for code size,
so one might want to tune the pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108

Kang Zhang [Wed, 29 Jul 2020 16:39:27 +0000 (16:39 +0000)]

[PowerPC] Set v1i128 to expand for SETCC to avoid crash

Summary:
PPC only supports instruction selection of ISD::SETCC for v16i8, v8i16,
v4i32, v2i64, v4f32 and v2f64. It does not support v1i128, so ISD::SETCC
on v1i128 crashes.

This patch sets v1i128 to expand to avoid the crash.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D84238

Philip Reames [Wed, 29 Jul 2020 16:13:15 +0000 (09:13 -0700)]

[Statepoint] When using the tied def lowering, unconditionally use vregs [almost NFC]

This builds on 3da1a96 on the path towards supporting invokes and cross block relocations. The actual change attempts to be NFC, but does fail in one corner-case explained below.

The change itself is fairly mechanical. Rather than remember SDValues - which are inherently block local - immediately produce a virtual register copy and remember that.

Once this lands, we'll update the FunctionLoweringInfo::StatepointSpillMap map to allow register based lowerings, delete VirtRegs from StatepointLowering, and drop the restriction against cross block relocations. I deliberately separate the semantic part into its own change for ease of understanding and fault isolation.

The corner-case which isn't quite NFC is that the old implementation implicitly CSEd gc.relocates of the same SDValue regardless of type. The new implementation still only relocates once, but it produces distinct vregs for the bitcast and its source, whereas SelectionDAG's generic CSE was able to remove the bitcast in the old implementation. Note that the final assembly doesn't change (at least in the test), as our MI level optimizations catch the duplication.

I assert that this is an uninteresting corner-case. It's functionally correct, and if we find a case where this influences performance, we should really be canonicalizing types to i8* at the IR level.

Differential Revision: https://reviews.llvm.org/D84692

Joel E. Denny [Wed, 29 Jul 2020 16:18:50 +0000 (12:18 -0400)]

[OpenMP] Implement TR8 `present` motion modifier in runtime (2/2)

This patch implements OpenMP runtime support for the OpenMP TR8
`present` motion modifier for `omp target update` directives. The
previous patch in this series implements Clang front end support.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D84712

Joel E. Denny [Wed, 29 Jul 2020 16:18:45 +0000 (12:18 -0400)]

[OpenMP] Implement TR8 `present` motion modifier in Clang (1/2)

This patch implements Clang front end support for the OpenMP TR8
`present` motion modifier for `omp target update` directives. The
next patch in this series implements OpenMP runtime support.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D84711

Arthur Eubanks [Wed, 29 Jul 2020 00:57:21 +0000 (17:57 -0700)]

[NewPM][Attributor] Pin tests with -attributor to legacy PM

All these tests already explicitly test against both legacy PM and NPM.

$ sed -i 's/ -attributor / -attributor -enable-new-pm=0 /g' $(rg --path-separator // -l -- -passes=)
$ sed -i 's/ -attributor-cgscc / -attributor-cgscc -enable-new-pm=0 /g' $(rg --path-separator // -l -- -passes=)

Now all tests in Transforms/Attributor/ pass under NPM.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84813

Sanjay Patel [Wed, 29 Jul 2020 15:24:36 +0000 (11:24 -0400)]

[InstSimplify] allow partial undef constants for vector min/max folds

Sanjay Patel [Wed, 29 Jul 2020 15:17:11 +0000 (11:17 -0400)]

[InstSimplify] fold integer min/max intrinsic with same args

Kang Zhang [Wed, 29 Jul 2020 15:43:47 +0000 (15:43 +0000)]

[MachineVerifier] Handle the PHI node for verifyLiveVariables()

Summary:
When running MachineVerifier for LiveVariables, the MachineVerifier pass
calculates the LiveVariables and compares the result with the
result the livevars pass gave. If they are different, verifyLiveVariables()
reports an error.

But when we calculate the LiveVariables in MachineVerifier, we don't
consider PHI nodes, while livevars does.

This patch fixes that bug.

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D80274

Matt Arsenault [Sat, 18 Jul 2020 01:32:24 +0000 (21:32 -0400)]

AMDGPU: Account for the size of LDS globals used through constant
expressions.

Also "fix" the longstanding bug where the computed size depends on the
order of the visitation. We could try to predict the allocation order
used by legalization, but it would never be 100% perfect. Until we
start fixing the addresses somehow (or have a more reliable allocation
scheme later), just try to compute the size based on the worst case
padding.

Nathan James [Wed, 29 Jul 2020 15:34:06 +0000 (16:34 +0100)]

[clang-tidy] Handled insertion only fixits when determining conflicts.

Handle insertion fix-its when removing incompatible errors by introducing a new EventType `ET_Insert`.
This has lower priority than End events, but higher than Begin events.
The idea is that if an insert is at the same place as a Begin event, the insert should be processed first to reduce unnecessary conflicts.
Likewise, if it's at the same place as an End event, the End event is processed first for the same reason.
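
The tie-breaking rule can be sketched as a sort key. This is a simplified model with assumptions: the numeric enum values and the `(offset, type, name)` tuple are chosen for the example only, and clang-tidy's real event ordering considers more fields than offset and type.

```python
from enum import IntEnum
from typing import List, Tuple


class EventType(IntEnum):
    # Illustrative values: a smaller value is processed first at the same offset.
    ET_End = 0
    ET_Insert = 1
    ET_Begin = 2


Event = Tuple[int, EventType, str]  # (offset, type, fix-it name)


def sort_events(events: List[Event]) -> List[Event]:
    """Order by offset, then End before Insert before Begin."""
    return sorted(events, key=lambda e: (e[0], e[1]))
```

At a shared offset, an End event is handled first, then the insertion, then a Begin event, so an insertion at a range boundary is never seen as lying inside the range.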

This also fixes https://bugs.llvm.org/show_bug.cgi?id=46511.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D82898

David Sherwood [Fri, 10 Jul 2020 09:26:33 +0000 (10:26 +0100)]

[SVE] Don't consider scalable vector types in SLPVectorizerPass::vectorizeChainsInBlock

In vectorizeChainsInBlock we try to collect chains of PHI nodes
that have the same element type, but the code is relying upon
the implicit conversion from TypeSize -> uint64_t. For now, I have
modified the code to ignore PHI nodes with scalable types.

Differential Revision: https://reviews.llvm.org/D83542

commit | commitdiff | tree

Yuanfang Chen [Tue, 28 Jul 2020 23:31:46 +0000 (16:31 -0700)]

[NewPM][PassInstrument] Make PrintIR and TimePasses to use before-pass-run callback

Reviewed By: asbirlea, aeubanks

Differential Revision: https://reviews.llvm.org/D84773

commit | commitdiff | tree

Yuanfang Chen [Tue, 28 Jul 2020 23:31:29 +0000 (16:31 -0700)]

[NewPM][PassInstrument] Add a new kind of before-pass callback that only get called if the pass is not skipped

TODO
* PrintIRInstrumentation and TimePassesHandler would be using this new callback.
* "Running pass" logging will also be moved to use this callback.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D84772

commit | commitdiff | tree

Johannes Doerfert [Tue, 7 Jul 2020 06:08:03 +0000 (01:08 -0500)]

[OpenMP] Allow traits for the OpenMP context selector `isa`

It was unclear what `isa` was supposed to mean so we did not provide any
traits for this context selector. With this patch we will allow *any*
string or identifier. We use the target attribute and target info to
determine if the trait matches. In other words, we will check if the
provided value is a target feature that is available (at the call site).

Fixes PR46338

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D83281

commit | commitdiff | tree

Johannes Doerfert [Wed, 15 Jul 2020 14:50:46 +0000 (09:50 -0500)]

[OpenMP][Docs] Update Clang Support docs after D75591

commit | commitdiff | tree

Simon Wallis [Wed, 29 Jul 2020 15:17:47 +0000 (16:17 +0100)]

[MachineCopyPropagation] BackwardPropagatableCopy: add check for hasOverlappingMultipleDef

In MachineCopyPropagation::BackwardPropagatableCopy(),
a check is added for multiple destination registers.

The copy propagation is avoided if the copied destination register
is the same register as another destination on the same instruction.

A new test is added. This used to fail on ARM like this:
error: unpredictable instruction, RdHi and RdLo must be different
umull r9, r9, lr, r0
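
A minimal sketch of the kind of check described above, assuming a toy instruction model. `Inst`, `Reg` and `wouldDuplicateDef` are hypothetical names for illustration; the sketch compares register numbers only, whereas a real check would presumably also have to consider overlapping sub- and super-registers.

```cpp
#include <vector>

using Reg = unsigned; // toy register number (assumption of this sketch)

// Toy instruction: only the list of registers it defines.
struct Inst {
  std::vector<Reg> Defs;
};

// Returns true if rewriting the def OldDef of MI to NewDef would make
// NewDef appear as a destination more than once on the same instruction.
bool wouldDuplicateDef(const Inst &MI, Reg OldDef, Reg NewDef) {
  for (Reg D : MI.Defs)
    if (D != OldDef && D == NewDef)
      return true;
  return false;
}
```

For an instruction defining r8 and r9, propagating a copy into r9 backwards onto the r8 def would give both destinations the same register, so the propagation is skipped in that case.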

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D82638

commit | commitdiff | tree

Nathan James [Wed, 29 Jul 2020 15:19:06 +0000 (16:19 +0100)]

[clang-tidy] Fix module options being registered with different priorities

Not a bug that is ever likely to materialise, but still worth fixing

Reviewed By: DmitryPolukhin

Differential Revision: https://reviews.llvm.org/D84850

commit | commitdiff | tree

Xing GUO [Wed, 29 Jul 2020 15:05:47 +0000 (23:05 +0800)]

[DWARFYAML] Make the field names consistent with the DWARF spec. NFC.

This patch replaces 'AddrSize'/'SegSize' with
'AddressSize'/'SegmentSelectorSize'. NFC.

commit | commitdiff | tree

Sanjay Patel [Wed, 29 Jul 2020 14:54:47 +0000 (10:54 -0400)]

[ConstantFolding] fold integer min/max intrinsics

If both operands are undef, return undef.
If one operand is undef, clamp to limit constant.
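
As a rough illustration of the two rules above, here is a standalone sketch for 32-bit signed max and min. The `Operand` struct and the `foldSMax`/`foldSMin` names are assumptions of this sketch, not the ConstantFolding API; an undef operand is modelled with a flag.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Toy operand: either undef or a concrete 32-bit value (sketch only).
struct Operand {
  bool IsUndef;
  int32_t Value;
};

Operand foldSMax(Operand A, Operand B) {
  if (A.IsUndef && B.IsUndef)
    return {true, 0}; // both undef: result is undef
  if (A.IsUndef || B.IsUndef) // one undef: clamp to the signed maximum
    return {false, std::numeric_limits<int32_t>::max()};
  return {false, std::max(A.Value, B.Value)};
}

Operand foldSMin(Operand A, Operand B) {
  if (A.IsUndef && B.IsUndef)
    return {true, 0}; // both undef: result is undef
  if (A.IsUndef || B.IsUndef) // one undef: clamp to the signed minimum
    return {false, std::numeric_limits<int32_t>::min()};
  return {false, std::min(A.Value, B.Value)};
}
```

The clamp is valid because undef may be taken to be any value, including the limit that dominates the other operand; the unsigned variants would use the unsigned limits in the same way.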

commit | commitdiff | tree

Sanjay Patel [Wed, 29 Jul 2020 13:44:51 +0000 (09:44 -0400)]

[ConstantFolding] add tests for integer min/max intrinsics; NFC

commit | commitdiff | tree

Chris Bowler [Wed, 29 Jul 2020 13:52:34 +0000 (09:52 -0400)]

[NFC][PPC][AIX] Add test coverage for _Complex return values

Differential Revision: https://reviews.llvm.org/D84069

commit | commitdiff | tree

Simon Pilgrim [Wed, 29 Jul 2020 14:47:28 +0000 (15:47 +0100)]

[CostModel][X86] Add SSE costs for SMAX/SMIN/UMAX/UMIN intrinsics

commit | commitdiff | tree

Frederik Gossen [Wed, 29 Jul 2020 14:52:27 +0000 (14:52 +0000)]

[MLIR][Shape] Limit shape to SCF lowering patterns to their supported types

Differential Revision: https://reviews.llvm.org/D84444

commit | commitdiff | tree

Jakub Lichman [Wed, 29 Jul 2020 14:24:48 +0000 (16:24 +0200)]

[mlir][Linalg] Conv1D, Conv2D and Conv3D added as named ops

This commit is part of a larger project that aims to add
full end-to-end support for convolutions inside mlir. The
reason for having conv ops for each rank rather than
one generic ConvOp is to enable better optimizations
for every N-D case, which better reflects the memory layout of the
input/kernel buffers and simplifies the code as well. We expect plain
linalg.conv to be progressively retired.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D83879

commit | commitdiff | tree

Nathan James [Wed, 29 Jul 2020 14:35:28 +0000 (15:35 +0100)]

[clang-tidy] Fix RedundantStringCStrCheck with r values

The previous fix for this, https://reviews.llvm.org/D76761, passed the test cases but failed in the real world because std::string has a non-trivial destructor and so creates a CXXBindTemporaryExpr.

This handles that shortfall and updates the test case's std::basic_string implementation to use a non-trivial destructor to reflect real-world behaviour.

Reviewed By: gribozavr2

Differential Revision: https://reviews.llvm.org/D84831

commit | commitdiff | tree

Alexey Bader [Wed, 29 Jul 2020 12:07:06 +0000 (15:07 +0300)]

[OpenCL] Add global_device and global_host address spaces

This patch introduces 2 new address spaces in OpenCL: global_device and global_host,
which are subsets of the global address space, so the address space scheme will
look like:

```
generic->global->host
               ->device
       ->private
       ->local
constant
```
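
The subset relation in the scheme above can be written down as a small predicate. This is a sketch under the assumption that the diagram is the whole story; the `AddrSpace` enum and `isSupersetOf` are illustrative names, not the clang types.

```cpp
// Toy address-space enum mirroring the diagram (names are illustrative).
enum class AddrSpace {
  Generic,
  Global,
  GlobalHost,
  GlobalDevice,
  Private,
  Local,
  Constant
};

// Returns true if every pointer in address space B is also valid in A.
bool isSupersetOf(AddrSpace A, AddrSpace B) {
  if (A == B)
    return true;
  switch (A) {
  case AddrSpace::Generic:
    // generic covers everything except the separate constant space.
    return B != AddrSpace::Constant;
  case AddrSpace::Global:
    // global covers the two new, narrower spaces.
    return B == AddrSpace::GlobalHost || B == AddrSpace::GlobalDevice;
  default:
    return false;
  }
}
```

Under this model a global_host pointer converts to global or generic, but a global pointer does not convert to global_host without an explicit cast.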

Justification: USM allocations may be associated with both host and device memory. We
want to give users a way to tell the compiler the allocation type of a USM pointer for
optimization purposes. (Link to the Unified Shared Memory extension:
https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/cl_intel_unified_shared_memory.asciidoc)

Before this patch, a USM pointer could only be in the opencl_global
address space, hence a device backend can't tell if a particular pointer
points to host or device memory. On FPGAs at least we can generate more
efficient hardware code if the user tells us where the pointer can point -
being able to distinguish between these types of pointers at compile time
allows us to instantiate simpler load-store units to perform memory
transactions.

Patch by Dmitry Sidorov.

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D82174

commit | commitdiff | tree

Tim Keith [Wed, 29 Jul 2020 14:23:28 +0000 (07:23 -0700)]

[flang] Fix bug with intrinsic in type declaration stmt

When an intrinsic function is declared in a type declaration statement,
we need to set the INTRINSIC attribute and (per 8.2(3)) ignore the
specified type.

To simplify the check, add IsIntrinsic utility to BaseVisitor.

Also, intrinsics and external procedures were getting assigned a size
and offset and they shouldn't be.

Differential Revision: https://reviews.llvm.org/D84702

commit | commitdiff | tree

Juneyoung Lee [Wed, 29 Jul 2020 14:21:39 +0000 (23:21 +0900)]

[InstSimplify] add tests for expandCommutativeBinOp; NFC

commit | commitdiff | tree

Florian Hahn [Wed, 29 Jul 2020 13:54:03 +0000 (14:54 +0100)]

[SCEVExpander] Add option to preserve LCSSA directly.

This patch teaches SCEVExpander to directly preserve LCSSA.

As it is currently, SCEV does not look through PHI nodes in loops,
as it might break LCSSA form. Once SCEVExpander can preserve
LCSSA form, it should be safe for SCEV to look through PHIs.

To preserve LCSSA form, this patch uses formLCSSAForInstructions
on operands of newly created instructions, if the definition is inside
a different loop than the new instruction.

The final value we return from expandCodeFor may also need LCSSA
phis, depending on the insert point. As no user for it exists there yet,
create a temporary instruction at the insert point, which can be passed
to formLCSSAForInstructions. This temporary instruction is removed
after LCSSA construction.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D71538

commit | commitdiff | tree

Frederik Gossen [Wed, 29 Jul 2020 13:53:41 +0000 (13:53 +0000)]

[MLIR][Shape] Limit shape to standard lowerings to their supported types

The lowering does not support all types for its source operations. This change
makes the patterns fail in a well-defined manner.

Differential Revision: https://reviews.llvm.org/D84443

commit | commitdiff | tree

Bruno Ricci [Wed, 29 Jul 2020 13:49:59 +0000 (14:49 +0100)]

[clang][NFC] clang-format fix after eb10b065f2a870b425dcc2040b9955e0eee464b4

commit | commitdiff | tree

Bruno Ricci [Tue, 28 Jul 2020 14:49:05 +0000 (15:49 +0100)]

[clang][NFC] Pass the ASTContext to CXXRecordDecl::setCaptures

In general Decl::getASTContext() is relatively expensive and here the changes
are non-invasive. NFC.

commit | commitdiff | tree

Tres Popp [Wed, 29 Jul 2020 10:24:17 +0000 (12:24 +0200)]

Forward extent tensors through shape.broadcast.

Differential Revision: https://reviews.llvm.org/D84832

commit | commitdiff | tree

Sanjay Patel [Wed, 29 Jul 2020 13:35:26 +0000 (09:35 -0400)]

[ConstantFolding] update test checks FP min/max intrinsics

There's a slight difference in functionality with the new CHECK lines:
before, we allowed either -0.0 or 0.0 for maxnum/minnum. That matches
the definition, but we should always get a deterministic result from
constant folding within the compiler, so now we assert that we got
the single expected result in all cases.

commit | commitdiff | tree

Victor Campos [Wed, 29 Jul 2020 13:32:51 +0000 (14:32 +0100)]

[Driver][ARM] Fix testcase that should only run on ARM

Fix testcase introduced in d1a3396bfbc6fd6df927f2864c18d86e742cabff.

commit | commitdiff | tree

Simon Pilgrim [Wed, 29 Jul 2020 13:33:44 +0000 (14:33 +0100)]

[CostModel][X86] Add SSE costs for ABS intrinsics

commit | commitdiff | tree

Victor Campos [Tue, 21 Jul 2020 16:18:20 +0000 (17:18 +0100)]

[Driver][ARM] Disable unsupported features when nofp arch extension is used

A list of target features is disabled when there is no hardware
floating-point support. This is the case when one of the following
options is passed to clang:

- -mfloat-abi=soft
- -mfpu=none

However, this option list is missing the "+nofp" extension, which can be
specified in -march flags, such as "-march=armv8-a+nofp".

This patch also disables unsupported target features when nofp is passed
to -march.
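
A minimal sketch of the resulting decision, assuming the three inputs are available as plain strings. `marchHasNoFP` and `hasNoHardwareFP` are hypothetical helpers written for this note, not functions from the clang driver.

```cpp
#include <string>

// Hypothetical helper: does the -march value carry a "+nofp" extension?
// Extensions are the '+'-separated components after the architecture name.
bool marchHasNoFP(const std::string &March) {
  std::size_t Pos = March.find('+');
  while (Pos != std::string::npos) {
    std::size_t Next = March.find('+', Pos + 1);
    std::size_t Len =
        Next == std::string::npos ? std::string::npos : Next - Pos - 1;
    if (March.substr(Pos + 1, Len) == "nofp")
      return true;
    Pos = Next;
  }
  return false;
}

// Hypothetical helper: true if any of the three options from the message
// says there is no hardware floating-point support.
bool hasNoHardwareFP(const std::string &FloatABI, const std::string &FPU,
                     const std::string &March) {
  return FloatABI == "soft" || FPU == "none" || marchHasNoFP(March);
}
```

For example, `hasNoHardwareFP("hard", "neon", "armv8-a+nofp")` is true in this model, while the same call with `"armv8-a+crc"` is false.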

Differential Revision: https://reviews.llvm.org/D82948

commit | commitdiff | tree

Andrew Ng [Tue, 28 Jul 2020 13:33:25 +0000 (14:33 +0100)]

[ELF][test] Add test coverage of `__real_` to wrap-plt.s

Differential Revision: https://reviews.llvm.org/D84749

commit | commitdiff | tree

Simon Pilgrim [Wed, 29 Jul 2020 12:32:05 +0000 (13:32 +0100)]

[TTI] Move abs/smax/smin/umax/umin cost expansion to ICA getIntrinsicInstrCost variant

This will simplify target overrides, and matches what we do for most integer intrinsic costs.

commit | commitdiff | tree

Stephan Herhut [Wed, 29 Jul 2020 10:50:05 +0000 (12:50 +0200)]

[mlir][Standard] Allow unranked memrefs as operands to dim and rank

`std.dim` currently only accepts ranked memrefs and `std.rank` is limited to
tensors.

Differential Revision: https://reviews.llvm.org/D84790

commit | commitdiff | tree

David Green [Wed, 29 Jul 2020 12:41:34 +0000 (13:41 +0100)]

[ARM] Tune getCastInstrCost for extending masked loads and truncating masked stores

This patch uses the feature added in D79162 to fix the cost of a
sext/zext of a masked load, or a trunc for a masked store.
Previously, those were considered cheap or even free, but it's
not the case as we cannot split the load in the same way we would for
normal loads.

This updates the costs to better reflect reality, and adds a test for it
in test/Analysis/CostModel/ARM/cast.ll.

It also adds a vectorizer test that showcases the improvement: in some
cases, the vectorizer will now choose a smaller VF when
tail-predication is enabled, which results in better codegen. (Because
if it were to use a higher VF in those cases, the code we see above
would be generated, and the vmovs would block tail-predication later in
the process, resulting in very poor codegen overall)

Original Patch by Pierre van Houtryve

Differential Revision: https://reviews.llvm.org/D79163

commit | commitdiff | tree

David Green [Wed, 29 Jul 2020 12:32:53 +0000 (13:32 +0100)]

[Analysis] TTI: Add CastContextHint for getCastInstrCost

Currently, getCastInstrCost has limited information about the cast it's
rating, often just the opcode and types. Sometimes there is a context
instruction as well, but it isn't trustworthy: for instance, when the
vectorizer is rating a plan, it calls getCastInstrCost with the old
instructions when, in fact, it's trying to evaluate the cost of the
instruction post-vectorization. Thus, the current system can get the
cost of certain casts wrong, as the correct cost can vary greatly
based on the context in which the cast is used.

For example, if the vectorizer queries getCastInstrCost to evaluate the
cost of a sext(load) with tail predication enabled, getCastInstrCost
will think it's free most of the time, but it's not always free. On ARM
MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar
situations can come up with how masked loads can be extended when being
split.

To fix that, this patch adds a new parameter to getCastInstrCost to give
it a hint about the context of the cast. It adds a CastContextHint enum
which contains the type of the load/store being created by the
vectorizer - one for each of the types it can produce.
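
To make the idea concrete, here is a toy cost function that takes such a hint. Everything in it is an assumption of this sketch: the enumerator names are one plausible set of load/store kinds rather than a quote from the patch, `CastOpcode` and `getCastCost` are made up, and the cost numbers are invented.

```cpp
// One enumerator per kind of load/store the vectorizer may create
// (illustrative names; see the patch for the real list).
enum class CastContextHint {
  None,          // no load/store context known
  Normal,        // plain contiguous load/store
  Masked,        // masked load/store
  GatherScatter, // gather or scatter
  Interleave,    // interleaved group
  Reversed       // reversed contiguous access
};

enum class CastOpcode { SExt, ZExt, Trunc };

// Toy cost model with made-up numbers: a cast next to a normal
// load/store is treated as free, anything else costs one unit.
unsigned getCastCost(CastOpcode Op, CastContextHint CCH) {
  (void)Op; // every opcode is treated alike in this sketch
  return CCH == CastContextHint::Normal ? 0u : 1u;
}
```

The point is only that the caller states the context explicitly, so the cost model no longer has to infer it from a pre-vectorization instruction.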

Original patch by Pierre van Houtryve

Differential Revision: https://reviews.llvm.org/D79162

commit | commitdiff | tree

David Sherwood [Thu, 16 Jul 2020 10:29:25 +0000 (11:29 +0100)]

[SVE][CodeGen] Add simple integer add tests for SVE tuple types

I have added tests to:

CodeGen/AArch64/sve-intrinsics-int-arith.ll

for doing simple integer add operations on tuple types. Since these
tests introduced new warnings due to incorrect use of
getVectorNumElements() I have also fixed up these warnings in the
same patch. These fixes are:

1. In narrowExtractedVectorBinOp I have changed the code to bail out
early for scalable vector types, since we've not yet hit a case that
proves the optimisations are profitable for scalable vectors.
2. In DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS I have replaced
calls to getVectorNumElements with getVectorMinNumElements in cases
that work with scalable vectors. For the other cases I have added
asserts that the vector is not scalable because we should not be
using shuffle vectors and build vectors in such cases.
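
The distinction between the two queries can be sketched with a toy vector type. `VecType` and the three functions below are stand-ins invented for this note; they only mimic the idea that a scalable vector has a known minimum element count but no known exact count.

```cpp
#include <cassert>

// Toy vector type: a scalable vector holds MinNumElements * vscale
// elements, where vscale is unknown at compile time.
struct VecType {
  unsigned MinNumElements;
  bool Scalable;
};

// Always safe: the minimum count is known for fixed and scalable vectors.
unsigned getMinNumElements(const VecType &VT) { return VT.MinNumElements; }

// Only meaningful for fixed-width vectors, hence the assert.
unsigned getFixedNumElements(const VecType &VT) {
  assert(!VT.Scalable && "exact element count unknown for scalable vectors");
  return VT.MinNumElements;
}

// Result type of concatenating NumOperands vectors of type Op: the minimum
// count scales with the operand count and scalability is preserved.
VecType concatType(const VecType &Op, unsigned NumOperands) {
  return {Op.MinNumElements * NumOperands, Op.Scalable};
}
```

Code that works for both kinds of vector uses the minimum count, while code that builds shuffles or build vectors needs the exact count and therefore asserts that the type is not scalable.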

Differential revision: https://reviews.llvm.org/D84016

commit | commitdiff | tree

Sjoerd Meijer [Wed, 29 Jul 2020 12:13:04 +0000 (13:13 +0100)]

[ARM] Optimize immediate selection

Optimize the selection of some specific immediates by materializing them with
sub/mvn instructions instead of loading them from the constant pool.

Patch by Ben Shi, powerman1st@163.com.

Differential Revision: https://reviews.llvm.org/D83745

commit | commitdiff | tree

Matt Arsenault [Mon, 20 Jul 2020 19:56:39 +0000 (15:56 -0400)]

AMDGPU/GlobalISel: Refactor special argument management

commit | commitdiff | tree

Matt Arsenault [Tue, 14 Jul 2020 00:57:31 +0000 (20:57 -0400)]

AMDGPU: Make saturating add/sub legal for DAG path

Domain: System / Toolchain;
