platform/upstream/llvm.git
3 years ago[clang] accept -fsanitize-ignorelist= in addition to -fsanitize-blacklist=
Nico Weber [Tue, 4 May 2021 13:50:43 +0000 (09:50 -0400)]
[clang] accept -fsanitize-ignorelist= in addition to -fsanitize-blacklist=

Use that for internal names (including the default ignorelists of the
sanitizers).

Differential Revision: https://reviews.llvm.org/D101832

3 years ago[AArch64][SVE] Fold insert(zero, extract(X, 0), 0) -> X, when X is known to zero...
Bradley Smith [Tue, 27 Apr 2021 14:31:46 +0000 (15:31 +0100)]
[AArch64][SVE] Fold insert(zero, extract(X, 0), 0) -> X, when X is known to zero lanes 1-N

Specifically, this allows us to rely on the lane-zeroing behaviour of
SVE reduce instructions.

Co-authored-by: Paul Walker <paul.walker@arm.com>
Differential Revision: https://reviews.llvm.org/D101369

3 years agoLocal.cpp - Avoid DebugLoc copies - use const reference from getDebugLoc. NFCI.
Simon Pilgrim [Tue, 4 May 2021 13:31:29 +0000 (14:31 +0100)]
Local.cpp - Avoid DebugLoc copies - use const reference from getDebugLoc. NFCI.

3 years ago[OpenCL] Allow pipe as a valid identifier prior to OpenCL 2.0.
Anastasia Stulova [Tue, 4 May 2021 13:17:40 +0000 (14:17 +0100)]
[OpenCL] Allow pipe as a valid identifier prior to OpenCL 2.0.

Pipe has not been a reserved keyword in the earlier OpenCL
standards. However we failed to allow its use as an identifier
in the original commit. This issue is fixed now and testing
is improved accordingly.

Differential Revision: https://reviews.llvm.org/D101052

3 years ago[clang][cli][docs] Clarify marshalling infrastructure documentation
Jan Svoboda [Tue, 4 May 2021 13:15:44 +0000 (15:15 +0200)]
[clang][cli][docs] Clarify marshalling infrastructure documentation

3 years ago[Utils] recognizeBSwapOrBitReverseIdiom - support matching from funnel shift roots...
Simon Pilgrim [Tue, 4 May 2021 12:46:45 +0000 (13:46 +0100)]
[Utils] recognizeBSwapOrBitReverseIdiom - support matching from funnel shift roots (PR40058)

We were missing bitreverse matches in cases where InstCombine had seen a byte-level rotation at the end of a bitreverse sequence (replacing or() with fshl()), hindering the exhaustive bitreverse matching in CodeGenPrepare later on.
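
As a rough illustration of why the two spellings are interchangeable, the sketch below models a 32-bit funnel shift in plain C++ and compares it with a byte rotation written with or(). The helper names `fshl32` and `rotl8ViaOr` are hypothetical and are not part of the patch.

```cpp
#include <cstdint>

// 32-bit funnel shift left: concatenate hi:lo, shift left by s (mod 32),
// and keep the upper 32 bits. This models the semantics of llvm.fshl.i32.
inline std::uint32_t fshl32(std::uint32_t hi, std::uint32_t lo, unsigned s) {
  s %= 32;
  if (s == 0)
    return hi; // avoid the undefined shift by 32 below
  return (hi << s) | (lo >> (32 - s));
}

// The same byte-level rotation spelled with or() of two shifts, i.e. the
// form a matcher would see before the or() is replaced by a funnel shift.
inline std::uint32_t rotl8ViaOr(std::uint32_t x) {
  return (x << 8) | (x >> 24);
}
```

With both operands equal, `fshl32(x, x, 8)` is a rotate left by one byte, so a matcher that only looks for or() roots would miss the sequence once it has been rewritten.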

3 years ago[CodeGenPrepare][X86] Add bitreverse detection tests
Simon Pilgrim [Tue, 4 May 2021 12:20:31 +0000 (13:20 +0100)]
[CodeGenPrepare][X86] Add bitreverse detection tests

Initially only test for XOP which is the only thing that supports scalar bitreverse - we can add vector tests later.

3 years ago[X86] Update PR20841 test description to make it clear we SHOULDN'T be folding EFLAGS...
Simon Pilgrim [Fri, 30 Apr 2021 12:09:28 +0000 (13:09 +0100)]
[X86] Update PR20841 test description to make it clear we SHOULDN'T be folding EFLAGS with XADD

3 years ago[clang][cli] NFC: Remove confusing `EmptyKPM` variable
Jan Svoboda [Tue, 4 May 2021 12:27:40 +0000 (14:27 +0200)]
[clang][cli] NFC: Remove confusing `EmptyKPM` variable

3 years ago[mlir] Add lowering from math.expm1 to LLVM.
Adrian Kuegel [Tue, 16 Feb 2021 13:34:35 +0000 (14:34 +0100)]
[mlir] Add lowering from math.expm1 to LLVM.

Differential Revision: https://reviews.llvm.org/D96776

3 years ago[AMDGPU][AsmParser] Correct the order of optional operands to mimg
David Stuttard [Fri, 30 Apr 2021 10:23:33 +0000 (11:23 +0100)]
[AMDGPU][AsmParser] Correct the order of optional operands to mimg

Ordering of operands was incorrect, meaning that the a16 operand was treated as tfe.

Differential Revision: https://reviews.llvm.org/D101618

Change-Id: I3b15e71ef5ff625f19f52823414ab684d76aca33

3 years ago[IndVarSimplify] Add additional tests using isImpliedViaMerge.
Florian Hahn [Tue, 4 May 2021 10:24:14 +0000 (11:24 +0100)]
[IndVarSimplify] Add additional tests using isImpliedViaMerge.

3 years agoRevert "[SLP]Allow masked gathers only if allowed by target."
Alexey Bataev [Tue, 4 May 2021 11:52:28 +0000 (04:52 -0700)]
Revert "[SLP]Allow masked gathers only if allowed by target."

This reverts commit fd18547e0721983dcb273670d16341921f831e50. Need to
add a check for the size of the vectorization tree to avoid some extra
vectorization.

3 years ago[InstCombine] ctpop(X) ^ ctpop(Y) & 1 --> ctpop(X^Y) & 1 (PR50094)
Dávid Bolvanský [Tue, 4 May 2021 11:15:13 +0000 (13:15 +0200)]
[InstCombine] ctpop(X) ^ ctpop(Y) & 1 --> ctpop(X^Y) & 1 (PR50094)

Original pattern: (__builtin_parity(x) ^ __builtin_parity(y))

LLVM rewrites it as: (__builtin_popcount(x) ^ __builtin_popcount(y)) & 1

Optimized form:  __builtin_popcount(X^Y) & 1

Alive proof: https://alive2.llvm.org/ce/z/-GdWFr
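
The identity behind the fold can be spot-checked in plain C++. This is only a sketch on a handful of sample values, not a substitute for the Alive proof; the helper names are hypothetical and a hand-written popcount stands in for ctpop.

```cpp
#include <cstdint>

// Portable population count (number of set bits); stands in for ctpop.
inline unsigned popcount32(std::uint32_t v) {
  unsigned n = 0;
  for (; v != 0; v &= v - 1) // clear the lowest set bit each iteration
    ++n;
  return n;
}

// The pattern before the fold: two ctpops, xor, then mask the low bit.
inline unsigned parityXorOriginal(std::uint32_t x, std::uint32_t y) {
  return (popcount32(x) ^ popcount32(y)) & 1u;
}

// The folded form: a single ctpop of x ^ y.
inline unsigned parityXorFolded(std::uint32_t x, std::uint32_t y) {
  return popcount32(x ^ y) & 1u;
}

// Compare both forms on a small grid of inputs.
inline bool parityFoldHoldsOnSamples() {
  const std::uint32_t samples[] = {0u,          1u,          2u,
                                   3u,          0x80000000u, 0xFFFFFFFFu,
                                   0x12345678u, 0xDEADBEEFu};
  for (std::uint32_t x : samples)
    for (std::uint32_t y : samples)
      if (parityXorOriginal(x, y) != parityXorFolded(x, y))
        return false;
  return true;
}
```

The fold works because popcount(x ^ y) = popcount(x) + popcount(y) - 2 * popcount(x & y), so both sides always have the same low bit.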

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D101802

3 years ago[clang-tidy] Fix cppcoreguidelines-pro-type-vararg false positives with __builtin_ms_...
Georgy Komarov [Sun, 25 Apr 2021 15:52:16 +0000 (18:52 +0300)]
[clang-tidy] Fix cppcoreguidelines-pro-type-vararg false positives with __builtin_ms_va_list

This commit fixes cppcoreguidelines-pro-type-vararg false positives on
'char *' variables.

The incorrect warnings generated by clang-tidy can be illustrated with
the following minimal example:

```
void foo(char* in) {
  char *tmp = in;
}
```

The problem is that __builtin_ms_va_list is desugared as 'char *', which
leads to false positives.

Fixes bugzilla issue 48042.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D101259

3 years agoIntroduce clangd-server-monitor tool
Kirill Bobyrev [Tue, 4 May 2021 10:48:20 +0000 (12:48 +0200)]
Introduce clangd-server-monitor tool

Reviewed By: kadircet

Differential Revision: https://reviews.llvm.org/D101516

3 years ago[clang-format] Prevent extraneous space insertion in bitshift operators
Luis Penagos [Tue, 4 May 2021 10:00:00 +0000 (12:00 +0200)]
[clang-format] Prevent extraneous space insertion in bitshift operators

This serves to augment the improvements made in https://reviews.llvm.org/D86581. It prevents clang-format from interpreting bitshift operators as template arguments in certain circumstances. This is an attempt at fixing https://bugs.llvm.org/show_bug.cgi?id=49868

Reviewed By: MyDeveloperDay, krasimir

Differential Revision: https://reviews.llvm.org/D100778

3 years ago[RISCV] Pre-commit tests for D101342
Jessica Clarke [Tue, 4 May 2021 10:12:43 +0000 (11:12 +0100)]
[RISCV] Pre-commit tests for D101342

These tests show inefficient sign extension for AMOs on RISC-V. The
normal CodeGen tests use anyext return values, but if marked signext
then we end up generating unnecessary sign extension instructions. This
can be seen when compiling C that returns an i32 (signed or unsigned),
where the calling convention results in a signext return value.

3 years ago[llvm] Unbreak no-assertion testing
David Zarzycki [Tue, 4 May 2021 10:05:38 +0000 (06:05 -0400)]
[llvm] Unbreak no-assertion testing

3 years ago[gn build] Port 1db4dbba24dd
LLVM GN Syncbot [Tue, 4 May 2021 09:56:46 +0000 (09:56 +0000)]
[gn build] Port 1db4dbba24dd

3 years agoMake dependency between certain analysis passes transitive
Bjorn Pettersson [Wed, 21 Apr 2021 12:18:32 +0000 (14:18 +0200)]
Make dependency between certain analysis passes transitive

LazyBlockFrequencyInfoPass, LazyBranchProbabilityInfoPass and
LoopAccessLegacyAnalysis all cache pointers to their nested required
analysis passes. One needs to use addRequiredTransitive to describe
that the nested passes can't be freed until those analysis passes
are no longer used themselves.

There is still a bit of a mess considering the getLazyBPIAnalysisUsage
and getLazyBFIAnalysisUsage functions. Those functions are used from
both Transform, CodeGen and Analysis passes. I figure it is OK to
use addRequiredTransitive also when being used from Transform and
CodeGen passes. On the other hand, I figure we must do it when
used from other Analysis passes. So using addRequiredTransitive should
be more correct here. An alternative solution would be to add a
bool option in those functions to let the user tell if it is an
analysis pass or not. Since those lazy passes will be obsolete when
new PM has conquered the world I figure we can leave it like this
right now.

Intention with the patch is to fix PR49950. It at least solves the
problem for the reproducer in PR49950. However, that reproducer
needs five passes in a specific order, so there are lots of various
"solutions" that could avoid the crash without actually fixing the
root cause.

Differential Revision: https://reviews.llvm.org/D100958

3 years agoRecommit "[VP,Integer,#2] ExpandVectorPredication pass"
Simon Moll [Fri, 30 Apr 2021 11:43:48 +0000 (13:43 +0200)]
Recommit "[VP,Integer,#2] ExpandVectorPredication pass"

This reverts the revert 02c5ba8679873e878ae7a76fb26808a47940275b

Fix:

Pass was registered as DUMMY_FUNCTION_PASS causing the newpm-pass
functions to be doubly defined. Triggered in -DLLVM_ENABLE_MODULE=1
builds.

Original commit:

This patch implements expansion of llvm.vp.* intrinsics
(https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).

VP expansion is required for targets that do not implement VP code
generation. Since expansion is controllable with TTI, targets can switch
on the VP intrinsics they do support in their backend offering a smooth
transition strategy for VP code generation (VE, RISC-V V, ARM SVE,
AVX512, ..).

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D78203

3 years ago[clangd] Fix hover crash on broken code
Kadir Cetinkaya [Mon, 3 May 2021 07:13:56 +0000 (09:13 +0200)]
[clangd] Fix hover crash on broken code

Differential Revision: https://reviews.llvm.org/D101743

3 years agoIntroduce -Wreserved-identifier
serge-sans-paille [Wed, 9 Dec 2020 08:26:27 +0000 (09:26 +0100)]
Introduce -Wreserved-identifier

Warn when a declaration uses an identifier that doesn't obey the reserved
identifier rule from C and/or C++.
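
To give a feel for the kind of rule involved, here is a rough sketch of a check for identifiers that are reserved in every scope. It is not the logic clang uses for the warning; the function name is hypothetical, and it deliberately ignores the extra rule for a single leading underscore at global scope.

```cpp
#include <cctype>
#include <string>

// Rough check for identifiers reserved in every scope:
//  - a leading underscore followed by an uppercase letter, or
//  - a double underscore anywhere (the C++ rule; C only reserves a
//    leading double underscore, so this is slightly stricter for C).
inline bool isReservedEverywhere(const std::string &id) {
  if (id.size() >= 2 && id[0] == '_' &&
      std::isupper(static_cast<unsigned char>(id[1])))
    return true;
  return id.find("__") != std::string::npos;
}
```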

Differential Revision: https://reviews.llvm.org/D93095

3 years ago[RISCV] Lower splats of non-constant i1s as SETCCs
Fraser Cormack [Wed, 28 Apr 2021 15:11:57 +0000 (16:11 +0100)]
[RISCV] Lower splats of non-constant i1s as SETCCs

This patch adds support for splatting i1 types to fixed-length or
scalable vector types. It does so by lowering the operation to a SETCC
of the equivalent i8 type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101465

3 years ago[TTI] Replace ceil lambdas with divideCeil. NFCI
David Green [Tue, 4 May 2021 08:04:44 +0000 (09:04 +0100)]
[TTI] Replace ceil lambdas with divideCeil. NFCI

As pointed out in D101726, this function already exists in MathExtras.
It uses different types, but with the values used here I believe that
should not make a functional difference.
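
For readers unfamiliar with the helper, a ceiling division can be sketched as below. This is a standalone stand-in with the hypothetical name `ceilDiv`, not the MathExtras implementation; it assumes unsigned operands and a non-zero denominator.

```cpp
#include <cassert>
#include <cstdint>

// Integer division rounded up. Written as quotient plus a remainder check
// so that it cannot overflow the way (num + den - 1) / den can.
inline std::uint64_t ceilDiv(std::uint64_t num, std::uint64_t den) {
  assert(den != 0 && "division by zero");
  return num / den + (num % den != 0 ? 1 : 0);
}
```

A typical use in cost modelling is computing how many registers of a given width are needed to hold a wider vector, e.g. `ceilDiv(vectorBits, registerBits)`, where both names are placeholders.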

3 years ago[gn build] Port ed51156084dd
LLVM GN Syncbot [Tue, 4 May 2021 06:39:48 +0000 (06:39 +0000)]
[gn build] Port ed51156084dd

3 years ago[ModuleUtils] NFC: Add unit tests for appendToUsedList
Reshabh Sharma [Tue, 4 May 2021 06:35:50 +0000 (12:05 +0530)]
[ModuleUtils] NFC: Add unit tests for appendToUsedList

This patch adds initial unit tests for appendToUsedList
in the ModuleUtils. It specifically tests changes from
https://reviews.llvm.org/D101363 which intend to allow
insertion of globals in non-zero address spaces into the
llvm used lists.

Reviewed by: dblaikie

Differential Revision: https://reviews.llvm.org/D101746

3 years ago[lld-macho] Implement builtin section renaming
Greg McGary [Sun, 25 Apr 2021 23:00:24 +0000 (16:00 -0700)]
[lld-macho] Implement builtin section renaming

ld64 automatically renames many sections depending on output type and assorted flags. Here, we implement the most common configs. We can add more obscure flags and behaviors as needed.

Depends on D101393

Differential Revision: https://reviews.llvm.org/D101395

3 years ago[NFC] Give better diagnose on clang-format not found error
Shivam Gupta [Tue, 4 May 2021 03:52:06 +0000 (09:22 +0530)]
[NFC] Give better diagnose on clang-format not found error

Contributors are often confused by whether this is a server or local issue.

3 years ago[clang][CodeGen] Use llvm::stable_sort for multi version resolver options
Alex Lorenz [Tue, 4 May 2021 03:05:38 +0000 (20:05 -0700)]
[clang][CodeGen] Use llvm::stable_sort for multi version resolver options

The use of llvm::sort causes periodic failures on the bot with EXPENSIVE_CHECKS enabled,
as the regular sort pre-shuffles the array in the expensive checks mode, leading to a
non-deterministic test result which causes the CodeGenCXX/attr-cpuspecific-outoflinedefs.cpp
testcase to fail on the bot (http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-expensive/).
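
The difference matters whenever the comparator treats some elements as equal. The sketch below uses std::stable_sort as a stand-in for llvm::stable_sort; the `Option` type and `orderByPriority` are hypothetical and only illustrate the ordering guarantee.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// (priority, name); both fields are made up for the illustration.
using Option = std::pair<int, std::string>;

// Sort by descending priority only. Because the sort is stable, options
// with equal priority keep their input order, so the result is the same
// regardless of any shuffling an unstable sort might do on ties.
inline std::vector<std::string> orderByPriority(std::vector<Option> opts) {
  std::stable_sort(
      opts.begin(), opts.end(),
      [](const Option &a, const Option &b) { return a.first > b.first; });
  std::vector<std::string> names;
  for (const Option &o : opts)
    names.push_back(o.second);
  return names;
}
```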

3 years ago[mlir] Fix bug in TransferOpReduceRank when all dims are broadcasts
Matthias Springer [Tue, 4 May 2021 01:43:10 +0000 (10:43 +0900)]
[mlir] Fix bug in TransferOpReduceRank when all dims are broadcasts

TransferReadOps that are a scalar read + broadcast are handled by TransferReadToVectorLoadLowering.

Differential Revision: https://reviews.llvm.org/D101808

3 years ago[mlir][tosa] Add lowerings for tosa.equal and tosa.arithmetic_right_shift
natashaknk [Tue, 4 May 2021 01:08:14 +0000 (18:08 -0700)]
[mlir][tosa] Add lowerings for tosa.equal and tosa.arithmetic_right_shift

Lowers the equal and arithmetic_right_shift elementwise ops to the linalg dialect using linalg.generic.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D101804

3 years ago[IndVarSimplify][NFC] Removed mayThrow from if-condition in predicateLoopExits of...
Philip Reames [Tue, 4 May 2021 01:23:41 +0000 (18:23 -0700)]
[IndVarSimplify][NFC] Removed mayThrow from if-condition in predicateLoopExits of IndVarSimplify

Instruction has a mayHaveSideEffects method that returns true whenever mayThrow returns true, because it calls mayThrow internally.  As such, the call being removed is redundant.

Patch By: vdsered (Daniil Seredkin)
Differential Revision: https://reviews.llvm.org/D101685

3 years ago[lld][WebAssembly] Fix crash with `-pie` without `--allow-undefined`
Sam Clegg [Sat, 1 May 2021 22:37:40 +0000 (15:37 -0700)]
[lld][WebAssembly] Fix crash with `-pie` without `--allow-undefined`

`shouldImport` was not returning true in PIC mode even though our
assumption elsewhere (in Relocations.cpp:scanRelocations) is that we
don't report undefined symbols in PIC mode today.  This was resulting in
functions that were undefined but also not imported, which hits an
assert later on that all functions have valid indexes.

Differential Revision: https://reviews.llvm.org/D101716

3 years ago[InstCombine] generalize select + select/and/or folding using implied conditions
Juneyoung Lee [Sun, 2 May 2021 11:07:25 +0000 (20:07 +0900)]
[InstCombine] generalize select + select/and/or folding using implied conditions

This patch optimizes the remaining possible cases in D101191 by generalizing isImpliedCondition()-based
foldings.

Assume that there is `op a, (select b, _, _)` where op is one of `and i1`, `or i1` or their select forms.

We can do the following optimization based on the result of `isImpliedCondition(a, b)`:

If a = true implies…
- b = true:
    - select a, (select b, A, B), false => select a, A, false : https://alive2.llvm.org/ce/z/WCnZYh
    - and a, (select b, A, B) => select a, A, false : https://alive2.llvm.org/ce/z/uZhcMG
- b = false:
    - select a, (select b, A, B), false => select a, B, false : https://alive2.llvm.org/ce/z/c2hJpV
    - and a, (select b, A, B) => select a, B, false : https://alive2.llvm.org/ce/z/5ggwMM

If a = false implies…
- b = true:
    - select a, true, (select b, A, B) => select a, true, A : https://alive2.llvm.org/ce/z/tidKvH
    - or a, (select b, A, B) =>  select a, true, A : https://alive2.llvm.org/ce/z/cC-uyb
- b = false:
    - select a, true, (select b, A, B) => select a, true, B : https://alive2.llvm.org/ce/z/ZXpJq9
    - or a, (select b, A, B) => select a, true, B : https://alive2.llvm.org/ce/z/hnDrJj
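
The eight rewrites above can be sanity-checked with an exhaustive truth table. The sketch below is only a two-valued model: it ignores poison, which is the reason the and/or forms fold to selects in the first place, so it complements the Alive links rather than replacing them. `sel` and `foldsHoldUnderImplication` are hypothetical names.

```cpp
// Boolean model of `select c, t, f`.
constexpr bool sel(bool c, bool t, bool f) { return c ? t : f; }

// Enumerate all boolean inputs and, for each implication, check the folds
// only on the (a, b) pairs that satisfy that implication.
inline bool foldsHoldUnderImplication() {
  const bool vals[] = {false, true};
  for (bool a : vals)
    for (bool b : vals)
      for (bool A : vals)
        for (bool B : vals) {
          const bool inner = sel(b, A, B);
          if (!a || b) { // a = true implies b = true
            if (sel(a, inner, false) != sel(a, A, false)) return false;
            if ((a && inner) != sel(a, A, false)) return false;
          }
          if (!a || !b) { // a = true implies b = false
            if (sel(a, inner, false) != sel(a, B, false)) return false;
            if ((a && inner) != sel(a, B, false)) return false;
          }
          if (a || b) { // a = false implies b = true
            if (sel(a, true, inner) != sel(a, true, A)) return false;
            if ((a || inner) != sel(a, true, A)) return false;
          }
          if (a || !b) { // a = false implies b = false
            if (sel(a, true, inner) != sel(a, true, B)) return false;
            if ((a || inner) != sel(a, true, B)) return false;
          }
        }
  return true;
}
```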

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101720

3 years agoPrecommit tests for D101720 (NFC)
Juneyoung Lee [Sun, 2 May 2021 10:24:09 +0000 (19:24 +0900)]
Precommit tests for D101720 (NFC)

3 years agoClarify the help for "breakpoint command add" and "watchpoint command add".
Jim Ingham [Tue, 4 May 2021 00:17:51 +0000 (17:17 -0700)]
Clarify the help for "breakpoint command add" and "watchpoint command add".

These two commands add a list of commands to the breakpoint/watchpoint. The current
implementation only supports replacing the current command list.  I started with
that as overwrite seems to be the most common operation.  But using "add" will
allow us to later offer other add-modes: "prepend", "append" and "insert".
That and "overwrite" then make up a useful set of options for this operation.

3 years ago[NewPM] Only invalidate modified functions' analyses in CGSCC passes
Arthur Eubanks [Mon, 3 May 2021 23:50:26 +0000 (16:50 -0700)]
[NewPM] Only invalidate modified functions' analyses in CGSCC passes

Previously, any change in any function in an SCC would cause all
analyses for all functions in the SCC to be invalidated. With this
change, we now manually invalidate analyses for functions we modify,
then let the pass manager know that all function analyses should be
preserved.

So far this only touches the inliner, argpromotion, funcattrs, and
updateCGAndAnalysisManager(), since they are the most used.

Slight compile time improvements:
http://llvm-compile-time-tracker.com/compare.php?from=326da4adcb8def2abdd530299d87ce951c0edec9&to=8942c7669f330082ef159f3c6c57c3c28484f4be&stat=instructions

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D100917

3 years ago[lld][WebAssembly] Convert more tests to asm format. NFC
Sam Clegg [Sun, 2 May 2021 04:18:06 +0000 (21:18 -0700)]
[lld][WebAssembly] Convert more tests to asm format. NFC

Two of these are trivial.  The third (shared.s) did have some
expectations changes but only due to two data symbols being re-ordered.

Differential Revision: https://reviews.llvm.org/D101711

3 years ago[libc] Introduce asctime, asctime_r to LLVM libc
Raman Tenneti [Wed, 31 Mar 2021 20:56:41 +0000 (13:56 -0700)]
[libc] Introduce asctime, asctime_r to LLVM libc

asctime and asctime_r share the same common code. They call asctime_internal,
a static inline function.

asctime uses snprintf to return the string representation in a buffer.
It uses the following format (26 characters is the buffer size) as per
7.27.3.1 section in http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2478.pdf.
The buf parameter for asctime_r shall point to a buffer of at least 26 bytes.

snprintf(buf, 26, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",...)
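
A minimal sketch of how that format string produces the 26-byte result follows. It is not the LLVM libc code: `formatAsctime`, `asctimeString` and `makeTm` are hypothetical helpers, and the name tables are assumptions needed to make the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <string>

// Formats t into buf using the format string quoted above. Only the
// fields used by the format are read; out-of-range indices are rejected
// by assertion rather than handled.
inline int formatAsctime(const std::tm &t, char *buf, std::size_t size) {
  static const char *const kDays[] = {"Sun", "Mon", "Tue", "Wed",
                                      "Thu", "Fri", "Sat"};
  static const char *const kMonths[] = {"Jan", "Feb", "Mar", "Apr",
                                        "May", "Jun", "Jul", "Aug",
                                        "Sep", "Oct", "Nov", "Dec"};
  assert(t.tm_wday >= 0 && t.tm_wday < 7);
  assert(t.tm_mon >= 0 && t.tm_mon < 12);
  return std::snprintf(buf, size, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
                       kDays[t.tm_wday], kMonths[t.tm_mon], t.tm_mday,
                       t.tm_hour, t.tm_min, t.tm_sec, 1900 + t.tm_year);
}

// Convenience wrapper using the 26-byte buffer size mentioned above.
inline std::string asctimeString(const std::tm &t) {
  char buf[26];
  formatAsctime(t, buf, sizeof buf);
  return std::string(buf);
}

// Builds a std::tm from the fields the format needs (year is the full year).
inline std::tm makeTm(int wday, int mon, int mday, int hour, int min, int sec,
                      int year) {
  std::tm t{};
  t.tm_wday = wday;
  t.tm_mon = mon;
  t.tm_mday = mday;
  t.tm_hour = hour;
  t.tm_min = min;
  t.tm_sec = sec;
  t.tm_year = year - 1900;
  return t;
}
```

For a four-digit year the output is 25 characters plus the terminating NUL, which is where the 26-byte requirement comes from.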

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D99686

3 years ago[gn build] Port 7310403e3cdf
LLVM GN Syncbot [Tue, 4 May 2021 00:04:57 +0000 (00:04 +0000)]
[gn build] Port 7310403e3cdf

3 years ago[demangler] Initial support for the new Rust mangling scheme
Tomasz Miąsko [Mon, 3 May 2021 23:41:30 +0000 (16:41 -0700)]
[demangler] Initial support for the new Rust mangling scheme

Add demangling support for a small subset of the new Rust mangling
scheme, with complete support planned as follow-up work.

Integrate Rust demangling into llvm-cxxfilt and use llvm-cxxfilt for
end-to-end testing. The new Rust mangling scheme uses "_R" as a prefix,
which makes it easy to disambiguate it from other mangling schemes.
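
Prefix-based dispatch of this kind might look like the sketch below. The enum and function are hypothetical and do not reflect how llvm-cxxfilt is structured; the only facts relied on are the "_R" prefix named above and the "_Z" prefix of Itanium C++ names.

```cpp
#include <string>

enum class ManglingScheme { Rust, Itanium, Unknown };

// Pick a demangler based on the first two characters of the symbol.
inline ManglingScheme classifyMangledName(const std::string &name) {
  if (name.compare(0, 2, "_R") == 0)
    return ManglingScheme::Rust;
  if (name.compare(0, 2, "_Z") == 0)
    return ManglingScheme::Itanium;
  return ManglingScheme::Unknown;
}
```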

The public API is modeled after __cxa_demangle / llvm::itaniumDemangle,
since potential candidates for further integration use those.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D101444

3 years ago[lld][WebAssembly] Do not merge comdat data segments
Sam Clegg [Sat, 1 May 2021 22:37:40 +0000 (15:37 -0700)]
[lld][WebAssembly] Do not merge comdat data segments

When running in relocatable mode any input data segments that are part
of a comdat group should not be merged with other segments of the same
name.  This is because the final linker needs to keep them separate so
they can be included/excluded individually.

Often this is not a problem since normally only one section with a given
name `foo` ends up in the output object file.  However, the problem
occurs when one input contains `foo` which is part of a comdat and another
object contains a local symbol `foo`: we were attempting to merge them.

This behaviour matches (I believe) that of the ELF linker.  See
`LinkerScript.cpp:addInputSec`.

Fixes: https://github.com/emscripten-core/emscripten/issues/9726

Differential Revision: https://reviews.llvm.org/D101703

3 years agoRecommit "Generalize getInvertibleOperand recurrence handling slightly"
Philip Reames [Mon, 3 May 2021 23:35:47 +0000 (16:35 -0700)]
Recommit "Generalize getInvertibleOperand recurrence handling slightly"

This was reverted because of a reported problem.  It turned out this patch didn't introduce said problem, it just exposed it more widely.  15a4233 fixes the root issue, so this simply a) rebases over that, and b) adds a much more extensive comment explaining why that weakened assert is correct.

Original commit message follows:

Follow up to D99912, specifically the revert, fix, and reapply thereof.

This generalizes the invertible recurrence logic in two ways:
* By allowing mismatching operand numbers of the phi, we can recurse through a pair of phi recurrences whose operand orders have not been canonicalized.
* By allowing recurrences through operand 1, we can invert these odd (but legal) recurrences.

Differential Revision: https://reviews.llvm.org/D100884

3 years ago[NewPM] Invalidate AAManager after populating GlobalsAA
Arthur Eubanks [Tue, 27 Apr 2021 15:56:11 +0000 (08:56 -0700)]
[NewPM] Invalidate AAManager after populating GlobalsAA

GlobalsAA is only created at the beginning of the inliner pipeline.  If
an AAManager is cached from previous passes, it won't get rebuilt to
include the newly created GlobalsAA.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D101379

3 years ago[Attributor] Add AAExecutionDomainInfo interface to OpenMPOpt
Joseph Huber [Wed, 28 Apr 2021 20:22:53 +0000 (16:22 -0400)]
[Attributor] Add AAExecutionDomainInfo interface to OpenMPOpt

Summary:
Add the AAExecutionDomainInfo attributor instance to OpenMPOpt.
This will infer information about the execution domain that an
instruction might be executing in. Right now this only includes a very
crude check for instructions that will be executed by the master thread
by comparing a thread-id function with a constant zero.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101578

3 years agoOne more test case inspired by PR50191
Philip Reames [Mon, 3 May 2021 23:22:56 +0000 (16:22 -0700)]
One more test case inspired by PR50191

3 years ago[mlir] Linalg: add vector transfer lowering patterns to the contraction lowering
Eugene Zhulenev [Mon, 3 May 2021 23:06:43 +0000 (16:06 -0700)]
[mlir] Linalg: add vector transfer lowering patterns to the contraction lowering

This fixes a performance regression in vec-mat vectorization

Reviewed By: asaadaldien

Differential Revision: https://reviews.llvm.org/D101795

3 years ago[OpenMP] Refactor/Rework topology discovery code
Peyton, Jonathan L [Fri, 16 Apr 2021 21:30:26 +0000 (16:30 -0500)]
[OpenMP] Refactor/Rework topology discovery code

This patch does the following:

1) Introduce kmp_topology_t as the runtime-friendly structure (the
corresponding global variable is __kmp_topology) to determine the
exact machine topology which can vary widely among current and future
architectures. The current design is not easy to expand beyond the assumed
three layer topology: sockets, cores, and threads so a rework capable of
using the existing KMP_AFFINITY mechanisms is required.

This new topology structure has:
* The depth and types of the topology
* Ratio count for each consecutive level (e.g., number of cores per
   socket, number of threads per core)
* Absolute count for each level (e.g., 2 sockets, 16 cores, 32 threads)
* Equivalent topology layer map (e.g., Numa domain is equivalent to
   socket, L1/L2 cache equivalent to core)
* Whether it is uniform or not

The hardware threads are represented with the kmp_hw_thread_t
structure. This structure contains the ids (e.g., socket 0, core 1,
thread 0) and other information grabbed from the previous Address
structure. The kmp_topology_t structure contains an array of these.

2) Generalize the KMP_HW_SUBSET envirable for the new
kmp_topology_t structure. The algorithm doesn't assume any order with
tiles,numa domains,sockets,cores,threads. Instead it just parses the
envirable, makes sure it is consistent with the detected topology
(including taking into account equivalent layers) and then trims away
the unneeded subset of hardware threads. To enable this, a new
kmp_hw_subset_t structure is introduced which contains a vector of
items (hardware type, number user wants, offset). Any keyword within
__kmp_hw_get_keyword() can be used as a name and can be shortened as
well. e.g.,
KMP_HW_SUBSET=1s,2numa,4tile,2c,3t can be used on the KNL SNC-4 machine.

3) Simplify topology detection functions so they only do the singular
task of detecting the machine's topology. Printing, and all
canonicalizing functionality is now done afterwards. So many lines of
duplicated code are eliminated.

4) Add new ll_caches and numa_domains to OMP_PLACES, and
consequently, KMP_AFFINITY's granularity setting. All the names within
__kmp_hw_get_keyword() are available for use in OMP_PLACES or
KMP_AFFINITY's granularity setting.

5) Simplify and future-proof code where explicit lists of allowed
affinity settings keywords appear inside if() conditions.

6) Add x86 CPUID leaf 4 cache detection to existing x2apic id method
so equivalent caches could be detected (in particular for the ll_caches
place).

Differential Revision: https://reviews.llvm.org/D100997

3 years agoAdd some additional test cases inspired by PR50191
Philip Reames [Mon, 3 May 2021 22:55:25 +0000 (15:55 -0700)]
Add some additional test cases inspired by PR50191

3 years ago[lld-macho] Add ARM requirement to objc.s
Jez Ng [Mon, 3 May 2021 22:47:30 +0000 (18:47 -0400)]
[lld-macho] Add ARM requirement to objc.s

3 years ago[lld-macho] De-templatize mach_header operations
Jez Ng [Mon, 3 May 2021 22:31:23 +0000 (18:31 -0400)]
[lld-macho] De-templatize mach_header operations

@thakis pointed out that `mach_header` and `mach_header_64`
actually have the same set of (used) fields, with the 64-bit version
having extra padding. So we can access the fields we need using the
single `mach_header` type instead of using templates to switch between
the two.

I also spotted a potential issue where hasObjCSection tries to parse a
file w/o checking if it does indeed match the target arch... As such,
I've added a quick magic number check to ensure we don't access invalid
memory during `findCommand()`.
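
The idea can be sketched with simplified stand-ins for the two header layouts. `Header32`, `Header64` and the helpers are hypothetical names, not lld's types; the sketch assumes host-endian input and does not handle byte-swapped magic values.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Simplified copies of the 32-bit and 64-bit Mach-O header layouts.
struct Header32 {
  std::uint32_t magic;
  std::int32_t cputype;
  std::int32_t cpusubtype;
  std::uint32_t filetype;
  std::uint32_t ncmds;
  std::uint32_t sizeofcmds;
  std::uint32_t flags;
};

struct Header64 {
  std::uint32_t magic;
  std::int32_t cputype;
  std::int32_t cpusubtype;
  std::uint32_t filetype;
  std::uint32_t ncmds;
  std::uint32_t sizeofcmds;
  std::uint32_t flags;
  std::uint32_t reserved; // the only extra field
};

// Every shared field sits at the same offset, so one view serves both.
static_assert(offsetof(Header32, ncmds) == offsetof(Header64, ncmds),
              "ncmds offset differs");
static_assert(offsetof(Header32, flags) == offsetof(Header64, flags),
              "flags offset differs");

constexpr std::uint32_t kMagic32 = 0xfeedface;
constexpr std::uint32_t kMagic64 = 0xfeedfacf;

// Quick check that the buffer is big enough and starts with the magic
// number expected for the target, before any load command is read.
inline bool hasExpectedMagic(const unsigned char *buf, std::size_t size,
                             bool is64Bit) {
  if (size < (is64Bit ? sizeof(Header64) : sizeof(Header32)))
    return false;
  std::uint32_t magic;
  std::memcpy(&magic, buf, sizeof magic);
  return magic == (is64Bit ? kMagic64 : kMagic32);
}

// Demo: build a 64-bit header, then read ncmds back through the 32-bit view.
inline std::uint32_t ncmdsViaSharedView(std::uint32_t ncmds) {
  Header64 h64{};
  h64.magic = kMagic64;
  h64.ncmds = ncmds;
  unsigned char bytes[sizeof(Header64)];
  std::memcpy(bytes, &h64, sizeof h64);
  if (!hasExpectedMagic(bytes, sizeof bytes, /*is64Bit=*/true))
    return 0;
  Header32 view;
  std::memcpy(&view, bytes, sizeof view);
  return view.ncmds;
}
```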

Addresses PR50180.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D101724

3 years ago[InstCombine] Added tests for PR50094, NFC
Dávid Bolvanský [Mon, 3 May 2021 22:15:58 +0000 (00:15 +0200)]
[InstCombine] Added tests for PR50094, NFC

3 years ago[Utils] Add prof metadata to matched unnamed values
Giorgis Georgakoudis [Mon, 3 May 2021 18:12:11 +0000 (11:12 -0700)]
[Utils] Add prof metadata to matched unnamed values

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101742

3 years ago[mlir] Add polynomial approximation for math::Log1p
Emilio Cota [Mon, 3 May 2021 22:05:14 +0000 (15:05 -0700)]
[mlir] Add polynomial approximation for math::Log1p

This approximation matches the one in Eigen.

```
name                      old cpu/op  new cpu/op  delta
BM_mlir_Log1p_f32/10      83.2ns ± 7%  34.8ns ± 5%  -58.19%    (p=0.000 n=84+71)
BM_mlir_Log1p_f32/100      664ns ± 4%   129ns ± 4%  -80.57%    (p=0.000 n=82+82)
BM_mlir_Log1p_f32/1k      6.75µs ± 4%  0.81µs ± 3%  -88.07%    (p=0.000 n=88+79)
BM_mlir_Log1p_f32/10k     76.5µs ± 3%   7.8µs ± 4%  -89.84%    (p=0.000 n=80+80)
BM_eigen_s_Log1p_f32/10   70.1ns ±14%  72.6ns ±14%   +3.49%  (p=0.000 n=116+112)
BM_eigen_s_Log1p_f32/100   706ns ± 9%   717ns ± 3%   +1.60%   (p=0.018 n=117+80)
BM_eigen_s_Log1p_f32/1k   8.26µs ± 1%  8.26µs ± 1%     ~       (p=0.567 n=84+86)
BM_eigen_s_Log1p_f32/10k  92.1µs ± 5%  92.6µs ± 6%   +0.60%  (p=0.047 n=115+115)
BM_eigen_v_Log1p_f32/10   31.8ns ±24%  34.9ns ±17%   +9.72%    (p=0.000 n=98+96)
BM_eigen_v_Log1p_f32/100   169ns ±10%   177ns ± 5%   +4.66%   (p=0.000 n=119+81)
BM_eigen_v_Log1p_f32/1k   1.42µs ± 4%  1.46µs ± 8%   +2.70%   (p=0.000 n=93+113)
BM_eigen_v_Log1p_f32/10k  14.4µs ± 5%  14.9µs ± 8%   +3.61%  (p=0.000 n=115+110)
```

Reviewed By: ezhulenev, ftynse

Differential Revision: https://reviews.llvm.org/D101765

3 years ago[AArch64][SVE] More unpredicated ld1/st1 patterns for reg+reg addressing modes
Eli Friedman [Thu, 15 Apr 2021 05:21:35 +0000 (22:21 -0700)]
[AArch64][SVE] More unpredicated ld1/st1 patterns for reg+reg addressing modes

In some cases, we can improve the generated code by using a load with
the "wrong" element width: in particular, using ld1b/st1b when we see
reg+reg without a shift.

Differential Revision: https://reviews.llvm.org/D100527

3 years ago[debugserver] Include LLDB_VERSION_SUFFIX in debugserver version
Jonas Devlieghere [Mon, 3 May 2021 22:04:07 +0000 (15:04 -0700)]
[debugserver] Include LLDB_VERSION_SUFFIX in debugserver version

The lack of a dot before the suffix is intentional, as the suffix itself
includes a dot or dash.

Differential revision: https://reviews.llvm.org/D101655

3 years ago[InstCombine] cttz(sext(x)) -> cttz(zext(x))
Dávid Bolvanský [Mon, 3 May 2021 21:59:19 +0000 (23:59 +0200)]
[InstCombine] cttz(sext(x)) -> cttz(zext(x))

```

----------------------------------------
define i32 @src(i16 %x, i1 %b) {
%0:
  %z = sext i16 %x to i32
  %p = cttz i32 %z, %b
  ret i32 %p
}
=>
define i32 @tgt(i16 %x, i1 %b) {
%0:
  %z = zext i16 %x to i32
  %p = cttz i32 %z, %b
  ret i32 %p
}
Transformation seems to be correct!
```

https://alive2.llvm.org/ce/z/evomeg
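
The same equivalence can be checked exhaustively for the i16 to i32 case in plain C++. This is a sketch with hypothetical helper names; `cttz32` returns 32 for a zero input, which corresponds to the intrinsic's result when zero is not treated as poison.

```cpp
#include <cstdint>

// Count trailing zeros of a 32-bit value; defined as 32 for zero.
inline unsigned cttz32(std::uint32_t v) {
  if (v == 0)
    return 32;
  unsigned n = 0;
  while ((v & 1u) == 0) {
    v >>= 1;
    ++n;
  }
  return n;
}

// cttz of the sign-extended value.
inline unsigned cttzOfSext(std::int16_t x) {
  return cttz32(static_cast<std::uint32_t>(static_cast<std::int32_t>(x)));
}

// cttz of the zero-extended value.
inline unsigned cttzOfZext(std::int16_t x) {
  return cttz32(static_cast<std::uint32_t>(static_cast<std::uint16_t>(x)));
}

// Compare both forms for every 16-bit input.
inline bool cttzExtensionsAgree() {
  for (int v = -32768; v <= 32767; ++v) {
    const auto x = static_cast<std::int16_t>(v);
    if (cttzOfSext(x) != cttzOfZext(x))
      return false;
  }
  return true;
}
```

The two forms agree because the extension only changes bits above the original width, while the lowest set bit, if any, lies within the original 16 bits.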

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101764

3 years ago[WebAssembly] Reenable end-to-end test in wasm-eh.cpp
Heejin Ahn [Mon, 3 May 2021 01:03:25 +0000 (18:03 -0700)]
[WebAssembly] Reenable end-to-end test in wasm-eh.cpp

This was temporarily disabled while we were reimplementing the new spec.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D101735

3 years ago[mlir][Linalg] Add a utility method to get reassociations maps for reshape.
MaheshRavishankar [Mon, 3 May 2021 21:39:36 +0000 (14:39 -0700)]
[mlir][Linalg] Add a utility method to get reassociations maps for reshape.

Given the source and destination shapes, if they are static, or if the
expanded/collapsed dimensions are unit-extent, it is possible to
compute the reassociation maps that can be used to reshape one type
into another. Add a utility method to return the reassociation maps
when possible.

This utility function can be used to fuse a sequence of reshape ops,
given the type of the source of the producer and the final result
type. This pattern supersedes a more constrained folding pattern added
to DropUnitDims pass.

Differential Revision: https://reviews.llvm.org/D101343

3 years ago[libcxx][iterator][ranges] adds `bidirectional_iterator` and `bidirectional_range`
Christopher Di Bella [Mon, 12 Apr 2021 01:16:45 +0000 (01:16 +0000)]
[libcxx][iterator][ranges] adds `bidirectional_iterator` and `bidirectional_range`

Implements parts of:
    * P0896R4 The One Ranges Proposal

Depends on D100275.

Differential Revision: https://reviews.llvm.org/D100278

3 years ago[mlir][sparse] fixed typo: sparse -> sparse_tensor
Aart Bik [Mon, 3 May 2021 18:11:41 +0000 (11:11 -0700)]
[mlir][sparse] fixed typo: sparse -> sparse_tensor

The test passes either way, but this is the full name of the dialect.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D101774

3 years agoRevert "[MC][ELF] Work around R_MIPS_LO16 relocation handling problem"
Dimitry Andric [Mon, 3 May 2021 21:08:04 +0000 (23:08 +0200)]
Revert "[MC][ELF] Work around R_MIPS_LO16 relocation handling problem"

This reverts commit ab40c027f0ce9492919a72ad339de40bdb84b354.

Some additional test cases are influenced by the workaround, and I need
to do a complete test run to identify and check them all.

3 years ago[MC][ELF] Work around R_MIPS_LO16 relocation handling problem
Dimitry Andric [Mon, 3 May 2021 18:08:49 +0000 (20:08 +0200)]
[MC][ELF] Work around R_MIPS_LO16 relocation handling problem

This fixes PR49821, and avoids "ld.lld: error: test.o:(.rodata.str1.1):
offset is outside the section" errors when linking MIPS objects with
negative R_MIPS_LO16 implicit addends.

ld.lld handles R_MIPS_HI16/R_MIPS_LO16 separately, not as a whole, so it
doesn't know that an R_MIPS_HI16 with implicit addend 1 and an
R_MIPS_LO16 with implicit addend -32768 represents 32768, which is in
range of a MergeInputSection. We could introduce a new RelExpr member
(like R_RISCV_PC_INDIRECT for R_RISCV_PCREL_HI20 / R_RISCV_PCREL_LO12)
but the complexity is unnecessary given that GNU as keeps the original
symbol for this case as well.
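
The addend arithmetic described above can be checked with a small sketch. It assumes the usual MIPS ABI definition of the combined addend, (AHI << 16) + sign-extended ALO. The helper name `combinedAddend` is hypothetical and is not part of lld or MC.

```cpp
#include <cstdint>

// Hypothetical helper, not lld or MC code: combines the implicit addends of a
// paired R_MIPS_HI16 / R_MIPS_LO16 as (AHI << 16) + sign-extended ALO.
constexpr int64_t combinedAddend(uint16_t hi, int16_t lo) {
  return (static_cast<int64_t>(hi) << 16) + lo;
}

// The case from the commit message: HI16 addend 1, LO16 addend -32768.
static_assert(combinedAddend(1, -32768) == 32768, "pair encodes +32768");

// Looked at in isolation, the LO16 addend appears to be a negative offset,
// which is what triggers the "offset is outside the section" error.
static_assert(combinedAddend(0, -32768) == -32768, "LO16 alone is negative");
```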

Reviewed By: atanasyan, MaskRay

Differential Revision: https://reviews.llvm.org/D101773

3 years ago[sanitizer] Set IndentPPDirectives: AfterHash in .clang-format
Fangrui Song [Mon, 3 May 2021 20:49:41 +0000 (13:49 -0700)]
[sanitizer] Set IndentPPDirectives: AfterHash in .clang-format

Code patterns like this are common: `#` at the beginning of the line
(https://google.github.io/styleguide/cppguide.html#Preprocessor_Directives),
with one space of indentation for nested if/elif/else directives.
```
#if SANITIZER_LINUX
# if defined(__aarch64__)
# endif
#endif
```

However, currently clang-format wants to reformat the code to
```
#if SANITIZER_LINUX
#if defined(__aarch64__)
#endif
#endif
```

This significantly harms readability in my review.  Use `IndentPPDirectives:
AfterHash` to defeat the diagnostic. clang-format will now suggest:

```
#if SANITIZER_LINUX
#  if defined(__aarch64__)
#  endif
#endif
```

Unfortunately there is no clang-format option to use an indent width of 1
for just preprocessor directives. However, this is still one step forward
from the current behavior.

Reviewed By: #sanitizers, vitalybuka

Differential Revision: https://reviews.llvm.org/D100238

3 years agoRevert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0"
Tomas Matheson [Mon, 3 May 2021 20:48:20 +0000 (21:48 +0100)]
Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0"

This reverts commit 753185031d939711f8733639a77a6fdc3bdbad22.

3 years ago[libcxx][iterator][ranges] adds `forward_iterator` and `forward_range`
Christopher Di Bella [Sun, 11 Apr 2021 22:11:03 +0000 (22:11 +0000)]
[libcxx][iterator][ranges] adds `forward_iterator` and `forward_range`

Implements parts of:
    * P0896R4 The One Ranges Proposal

Depends on D100271.

Differential Revision: https://reviews.llvm.org/D100275

3 years ago[SimplifyCFG] Look for control flow changes instead of side effects.
Teresa Johnson [Wed, 28 Apr 2021 23:05:04 +0000 (16:05 -0700)]
[SimplifyCFG] Look for control flow changes instead of side effects.

When passingValueIsAlwaysUndefined scans for an instruction between an
inst with a null or undef argument and its first use, it was checking
for instructions that may have side effects, which is a superset of the
instructions it intended to find (as per the comments, control flow
changing instructions that would prevent reaching the uses). Switch
to using isGuaranteedToTransferExecutionToSuccessor() instead.

Without this change, when enabling -fwhole-program-vtables, which causes
assumes to be inserted by clang, we can get different simplification
decisions. In particular, when building with instrumentation FDO it can
affect the optimization decisions before FDO matching, leading to some
mismatches.

I had to modify d83507-knowledge-retention-bug.ll since this fix enables
more aggressive optimization of that code such that it no longer tested
the original bug it was meant to test. I removed the undef which still
provokes the original failure (confirmed by temporarily reverting the
fix) and also changed it to just invoke the passes of interest to narrow
the testing.

Similarly I needed to adjust code for UnreachableEliminate.ll to avoid
an undef which was causing the function body to get optimized away with
this fix.

Differential Revision: https://reviews.llvm.org/D101507

3 years ago[WebAssembly] Fixup order of ins variables for table instructions
Paulo Matos [Mon, 3 May 2021 20:04:50 +0000 (13:04 -0700)]
[WebAssembly] Fixup order of ins variables for table instructions

WebAssembly instructions should have their arguments ordered from
the deepest to the shallowest on the stack.

3 years ago[ValueTracking] soften assert for invertible recurrence matching
Sanjay Patel [Mon, 3 May 2021 19:07:10 +0000 (15:07 -0400)]
[ValueTracking] soften assert for invertible recurrence matching

There's a TODO comment in the code and discussion in D99912
about generalizing this, but I wasn't sure how to implement that,
so just going with a potential minimal fix to avoid crashing.

The test is a reduction beyond useful code (there's no user of
%user...), but it is based on https://llvm.org/PR50191, so this
is asserting on real code.

Differential Revision: https://reviews.llvm.org/D101772

3 years ago[mlir][Linalg] Use rank-reduced versions of subtensor and subtensor insert when possible.
MaheshRavishankar [Mon, 3 May 2021 19:50:29 +0000 (12:50 -0700)]
[mlir][Linalg] Use rank-reduced versions of subtensor and subtensor insert when possible.

Convert subtensor and subtensor_insert operations to use their
rank-reduced versions to drop unit dimensions.

Differential Revision: https://reviews.llvm.org/D101495

3 years ago[OpenMPIRBuilder] Add createOffloadMaptypes and createOffloadMapnames functions
Valentin Clement [Mon, 3 May 2021 19:42:19 +0000 (15:42 -0400)]
[OpenMPIRBuilder] Add createOffloadMaptypes and createOffloadMapnames functions

Add functions to create the offload_maptypes and the offload_mapnames globals. These two functions
are used in clang. They will be used in the Flang/MLIR lowering as well.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D101503

3 years ago[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0
Tomas Matheson [Wed, 31 Mar 2021 16:45:45 +0000 (17:45 +0100)]
[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0

atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.

To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.

Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)

Differential Revision: https://reviews.llvm.org/D101164

Originally submitted as 3338290c187b254ad071f4b9cbf2ddb2623cefc0.
Reverted in c7df6b1223d88dfd15248fbf7b7b83dacad22ae3.

3 years ago[mlir][linalg] Fix vectorization bug in vector transfer indexing map calculation
thomasraoux [Mon, 3 May 2021 18:56:11 +0000 (11:56 -0700)]
[mlir][linalg] Fix vectorization bug in vector transfer indexing map calculation

The current implementation had a bug as it was relying on the target vector
dimension sizes to calculate where to insert broadcast. If several dimensions
have the same size we may insert the broadcast on the wrong dimension. The
correct broadcast cannot be inferred from the type of the source and
destination vector.

Instead when we want to extend transfer ops we calculate an "inverse" map to the
projected permutation and insert broadcast in place of the projected dimensions.

Differential Revision: https://reviews.llvm.org/D101738

3 years ago[MLIR][Linalg] Avoid forward declaration in `Loops.cpp`
Frederik Gossen [Mon, 3 May 2021 19:05:43 +0000 (21:05 +0200)]
[MLIR][Linalg] Avoid forward declaration in `Loops.cpp`

Differential Revision: https://reviews.llvm.org/D101771

3 years ago[MLIR][Linalg] Lower `linalg.tiled_loop` in a separate pass
Frederik Gossen [Mon, 3 May 2021 18:58:21 +0000 (20:58 +0200)]
[MLIR][Linalg] Lower `linalg.tiled_loop` in a separate pass

Add dedicated pass `convert-linalg-tiled-loops-to-scf` to lower
`linalg.tiled_loop`s.

Differential Revision: https://reviews.llvm.org/D101768

3 years ago[AsmParser][SystemZ][z/OS] Implement HLASM location counter syntax ("*") for Z PC...
Anirudh Prasad [Mon, 3 May 2021 18:57:45 +0000 (14:57 -0400)]
[AsmParser][SystemZ][z/OS] Implement HLASM location counter syntax ("*") for Z PC-relative instructions.

- This patch attempts to implement the location counter syntax (*) for the HLASM variant for PC-relative instructions.
- In the HLASM variant, for purely constant relocatable values, we expect a * token preceding it, with special support for " *" which is parsed as "<pc-rel-insn 0>"
- For combinations of absolute values and relocatable values, we don't expect the "*" preceding the token.

When you have a " * "  what’s accepted is:

```
*<space>.*{.*} -> <pc-rel-insn> 0
*[+|-][constant-value] -> <pc-rel-insn> [+|-]constant-value
```

When you don’t have a " * " what’s accepted is:

```
brasl  1,func           is allowed (MCSymbolRef type)
brasl  1,func+4         is allowed (MCBinary type)
brasl  1,4+func         is allowed (MCBinary type)
brasl  1,-4+func        is allowed (MCBinary type)
brasl  1,func-4         is allowed (MCBinary type)
brasl  1,*func          is not allowed (* cannot be used for non-MCConstantExprs)
brasl  1,*+func         is not allowed (* cannot be used for non-MCConstantExprs)
brasl  1,*+func+4       is not allowed (* cannot be used for non-MCConstantExprs)
brasl  1,*+4+func       is not allowed (* cannot be used for non-MCConstantExprs)
brasl  1,*-4+8+func     is not allowed (* cannot be used for non-MCConstantExprs)
```

Reviewed By: Kai

Differential Revision: https://reviews.llvm.org/D100987

3 years ago[scudo] Don't track free/use stats for transfer batches.
Mitch Phillips [Mon, 3 May 2021 17:42:19 +0000 (10:42 -0700)]
[scudo] Don't track free/use stats for transfer batches.

The Scudo C unit tests are currently non-hermetic. In particular, adding
or removing a transfer batch changes global allocator state that
persists between tests. This can cause flakiness in
ScudoWrappersCTest.MallInfo, because the creation or teardown of a batch
causes mallinfo's uordblks or fordblks to move up or down by the size of
a transfer batch on malloc/free.

It's my opinion that uordblks and fordblks should track the statistics
related to the user's malloc() and free() usage, and not the state of
the internal allocator structures. Thus, excluding the transfer batches
from stat collection does the trick and makes these tests pass.

Repro instructions of the bug:
 1. ninja ./projects/compiler-rt/lib/scudo/standalone/tests/ScudoCUnitTest-x86_64-Test
 2. ./projects/compiler-rt/lib/scudo/standalone/tests/ScudoCUnitTest-x86_64-Test --gtest_filter=ScudoWrappersCTest.MallInfo

Reviewed By: cryptoad

Differential Revision: https://reviews.llvm.org/D101653

3 years ago[libc++] Use the internal Lit shell to run the tests
Louis Dionne [Thu, 30 Apr 2020 21:01:28 +0000 (17:01 -0400)]
[libc++] Use the internal Lit shell to run the tests

This makes the libc++ tests more portable -- almost all of them should
now work on Windows, except for some tests that assume a shell is
available on the target. We should probably provide a way to exclude
those anyway for the purpose of running tests on embedded targets.

Differential Revision: https://reviews.llvm.org/D89495

3 years ago[libc++] Fix template instantiation depth issues with std::tuple
Louis Dionne [Mon, 3 May 2021 16:06:28 +0000 (12:06 -0400)]
[libc++] Fix template instantiation depth issues with std::tuple

This fixes the issue by implementing _And using the short-circuiting
SFINAE trick that we previously used only in std::tuple. One thing we
could look into is use the naive recursive implementation for disjunctions
with a small number of arguments, and use that trick with larger numbers
of arguments. It might be the case that the constant overhead for setting
up the SFINAE trick makes it only worth doing for larger packs, but that's
left for further work.
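
A minimal sketch of this kind of SFINAE-based conjunction, assuming C++14. The names `expand_to_true`, `and_helper` and `And` are hypothetical stand-ins rather than libc++'s internal identifiers. The result is computed by a single overload resolution rather than by a recursive chain of instantiations, so the instantiation depth does not grow with the number of predicates.

```cpp
#include <type_traits>

// Hypothetical names for illustration only.
// Maps any list of types to std::true_type.
template <class...>
using expand_to_true = std::true_type;

// Viable only when every Pred::value is true: a false predicate makes
// std::enable_if_t ill-formed, which removes this overload via SFINAE.
template <class... Pred>
expand_to_true<std::enable_if_t<Pred::value>...> and_helper(int);

// Fallback chosen when the overload above drops out.
template <class...>
std::false_type and_helper(...);

// The conjunction is the return type selected by overload resolution.
template <class... Pred>
using And = decltype(and_helper<Pred...>(0));

static_assert(And<>::value, "empty conjunction is true");
static_assert(And<std::true_type, std::true_type>::value, "all true");
static_assert(!And<std::true_type, std::false_type>::value, "one false");
```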

This problem was raised in https://reviews.llvm.org/D96523.

Differential Revision: https://reviews.llvm.org/D101661

3 years agoMove MLIR python sources to mlir/python.
Stella Laurenzo [Wed, 28 Apr 2021 20:04:17 +0000 (20:04 +0000)]
Move MLIR python sources to mlir/python.

* NFC but has some fixes for CMake glitches discovered along the way (things not cleaning properly, co-mingled depends).
* Includes previously unsubmitted fix in D98681 and a TODO to fix it more appropriately in a smaller followup.

Differential Revision: https://reviews.llvm.org/D101493

3 years ago[libc++] Disentangle std::pointer_safety
Louis Dionne [Tue, 13 Apr 2021 20:43:42 +0000 (16:43 -0400)]
[libc++] Disentangle std::pointer_safety

This patch gets rid of technical debt around std::pointer_safety which,
I claim, is entirely unnecessary. I don't think anybody has used
std::pointer_safety in actual code because we do not implement the
underlying garbage collection support. In fact, P2186 even proposes
removing these facilities entirely from a future C++ version. As such,
I think it's entirely fine to get rid of complex workarounds whose goals
were to avoid breaking the ABI back in 2017.

I'm putting this up both to get reviews and to discuss this proposal for
a breaking change. I think we should be comfortable with making these
tiny breaks if we are confident they won't hurt anyone, which I'm fairly
confident is the case here.

Differential Revision: https://reviews.llvm.org/D100410

3 years ago[DebuggerTuning] Move a comment to a more useful place.
Paul Robinson [Mon, 3 May 2021 18:07:12 +0000 (11:07 -0700)]
[DebuggerTuning] Move a comment to a more useful place.

The comment about how to make use of debugger tuning within DwarfDebug
really belongs inside the DwarfDebug declaration, where it will be
easier to find.

3 years ago[mlir][spirv] Add support to convert std.splat op
thomasraoux [Mon, 3 May 2021 17:56:15 +0000 (10:56 -0700)]
[mlir][spirv] Add support to convert std.splat op

Differential Revision: https://reviews.llvm.org/D101511

3 years ago[AMDGPU] Change FLAT Scratch SADDR to VADDR form in moveToVALU
Stanislav Mekhanoshin [Fri, 30 Apr 2021 18:26:53 +0000 (11:26 -0700)]
[AMDGPU] Change FLAT Scratch SADDR to VADDR form in moveToVALU

Extend the legalization of global SADDR loads and stores
with changing to VADDR to the FLAT scratch instructions.

Differential Revision: https://reviews.llvm.org/D101408

3 years ago[AIX] Remove unused vector registers from allocation order in the default AltiVec ABI
Zarko Todorovski [Mon, 3 May 2021 17:01:49 +0000 (13:01 -0400)]
[AIX] Remove unused vector registers from allocation order in the default AltiVec ABI

The previous implementation of the default AltiVec ABI marked registers V20-V31
as reserved.  This failed to prevent reserved VFRC registers being allocated.
In this patch instead of marking the registers reserved we remove unallowed
registers from the allocation order completely.

This is a slight rework of an implementation by @nemanjai

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D100050

3 years agoModules: Remove an extra early return, NFC
Duncan P. N. Exon Smith [Fri, 30 Apr 2021 22:16:36 +0000 (15:16 -0700)]
Modules: Remove an extra early return, NFC

Remove an early return from an `else` block that's immediately followed
by an equivalent early return after the `else` block.

Differential Revision: https://reviews.llvm.org/D101671

3 years ago[mlir][vector] Extend vector transfer unrolling to support permutations and broadcast
thomasraoux [Mon, 3 May 2021 17:47:02 +0000 (10:47 -0700)]
[mlir][vector] Extend vector transfer unrolling to support permutations and broadcast

Differential Revision: https://reviews.llvm.org/D101637

3 years ago[mlir][vector] Add canonicalization for extract/insert -> shapecast
thomasraoux [Mon, 3 May 2021 17:41:15 +0000 (10:41 -0700)]
[mlir][vector] Add canonicalization for extract/insert -> shapecast

Differential Revision: https://reviews.llvm.org/D101643

3 years ago[libFuzzer] Deflake entropic exec-time test.
Matt Morehouse [Mon, 3 May 2021 17:25:32 +0000 (10:25 -0700)]
[libFuzzer] Deflake entropic exec-time test.

3 years ago[libFuzzer] Fix off-by-one error in ApplyDictionaryEntry
Fabian Meumertzheim [Fri, 30 Apr 2021 16:16:43 +0000 (09:16 -0700)]
[libFuzzer] Fix off-by-one error in ApplyDictionaryEntry

In the overwrite branch of MutationDispatcher::ApplyDictionaryEntry in
FuzzerMutate.cpp, the index Idx at which W.size() bytes are overwritten
with the word W is chosen uniformly at random in the interval
[0, Size - W.size()). This means that Idx + W.size() will always be
strictly less than Size, i.e., the last byte of the current unit will
never be overwritten.

This is fixed by adding 1 to the exclusive upper bound.
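
The bound arithmetic can be illustrated with a small sketch. The helper names are hypothetical and this is not the FuzzerMutate.cpp code; it assumes the word is strictly shorter than the unit (WSize < Size) and that the index is drawn uniformly from [0, Bound).

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only.
// Exclusive upper bound for Idx before the fix: [0, Size - WSize).
constexpr std::size_t oldBound(std::size_t Size, std::size_t WSize) {
  return Size - WSize;
}

// Exclusive upper bound for Idx after the fix: [0, Size - WSize + 1).
constexpr std::size_t newBound(std::size_t Size, std::size_t WSize) {
  return Size - WSize + 1;
}

// One past the last byte written when Idx takes its largest value, Bound - 1.
constexpr std::size_t maxEnd(std::size_t Bound, std::size_t WSize) {
  return (Bound - 1) + WSize;
}

// With an 8-byte unit and a 3-byte word, the old bound stops at byte 6,
// so byte 7 (the last one) can never be overwritten.
static_assert(maxEnd(oldBound(8, 3), 3) == 7, "last byte unreachable");
// With the fix, the largest Idx places the word flush with the end.
static_assert(maxEnd(newBound(8, 3), 3) == 8, "last byte reachable");
```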

Addresses https://bugs.llvm.org/show_bug.cgi?id=49989.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D101625

3 years ago[AMDGPU] Change FLAT SADDR to VADDR form in moveToVALU
Stanislav Mekhanoshin [Mon, 26 Apr 2021 23:12:50 +0000 (16:12 -0700)]
[AMDGPU] Change FLAT SADDR to VADDR form in moveToVALU

Instead of legalizing saddr operand with a readfirstlane
when address is moved from SGPR to VGPR we can just
change the opcode.

Differential Revision: https://reviews.llvm.org/D101405

3 years ago[OpenMP] Fix non-determinism in clang task codegen
Giorgis Georgakoudis [Mon, 3 May 2021 04:49:05 +0000 (21:49 -0700)]
[OpenMP] Fix non-determinism in clang task codegen

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101739

3 years ago[mlir] Fix multidimensional lowering from std.select to llvm.select
Benjamin Kramer [Mon, 3 May 2021 17:24:00 +0000 (19:24 +0200)]
[mlir] Fix multidimensional lowering from std.select to llvm.select

The converter assumed that all operands have the same type, that's not
true for select.

Differential Revision: https://reviews.llvm.org/D101767

3 years ago[mlir][vector][NFC] split TransposeOp lowering out of contractLowering
thomasraoux [Mon, 3 May 2021 17:04:12 +0000 (10:04 -0700)]
[mlir][vector][NFC] split TransposeOp lowering out of contractLowering

Move TransposeOp lowering into its own populate function, as in some cases
it is better to keep it during ContractOp lowering to better
canonicalize it rather than emitting scalar insert/extract.

Differential Revision: https://reviews.llvm.org/D101647

3 years ago[docs][NewPM] Add section on analyses
Arthur Eubanks [Tue, 20 Apr 2021 19:44:19 +0000 (12:44 -0700)]
[docs][NewPM] Add section on analyses

Reviewed By: asbirlea, ychen

Differential Revision: https://reviews.llvm.org/D100912

3 years ago[MLIR] Fix TestAffineDataCopy for test cases with no load ops
Uday Bondhugula [Sun, 2 May 2021 09:40:22 +0000 (15:10 +0530)]
[MLIR] Fix TestAffineDataCopy for test cases with no load ops

Add a missing check in -test-affine-data-copy without which a test case
that has no affine.loads at all would crash this test pass. Fix two
clang-tidy warnings in the file while at it. (Not adding a test case
given the triviality.)

Differential Revision: https://reviews.llvm.org/D101719

3 years ago[mlir][Python] Add casting constructor to Type and Attribute.
Stella Laurenzo [Sun, 2 May 2021 22:15:21 +0000 (15:15 -0700)]
[mlir][Python] Add casting constructor to Type and Attribute.

* This makes them consistent with custom types/attributes, whose constructors will do a type checked conversion. Of course, the base classes can represent everything, so they never raise an error.
* More importantly, this makes it possible to subclass Type and Attribute out of tree in sensible ways.

Differential Revision: https://reviews.llvm.org/D101734

3 years ago[Support/Parallel] Add a special case for 0/1 items to llvm::parallel_for_each.
Chris Lattner [Sat, 1 May 2021 21:07:17 +0000 (14:07 -0700)]
[Support/Parallel] Add a special case for 0/1 items to llvm::parallel_for_each.

This avoids the non-trivial overhead of creating a TaskGroup in these degenerate
cases, but also exposes parallelism.  It turns out that the default executor
underlying TaskGroup prevents recursive parallelism - so an instance of a task
group being alive will make nested ones become serial.

This is a big issue in MLIR in some dialects, if they have a single instance of
an outer op (e.g. a firrtl.circuit) that has many parallel ops within it (e.g.
a firrtl.module).  This patch side-steps the problem by avoiding creating the
TaskGroup in the unneeded case.  See this issue for more details:
https://github.com/llvm/circt/issues/993

Note that this isn't a really great solution for the general case of nested
parallelism.  A redesign of the TaskGroup stuff would be better, but would be
a much more invasive change.

Differential Revision: https://reviews.llvm.org/D101699