Simon Pilgrim [Sun, 16 Aug 2020 13:52:07 +0000 (14:52 +0100)]
[X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for float types
We can now enable this for AVX1 targets, which can now assist with canonicalizeShuffleMaskWithHorizOp cleanup.
There are still a few missed opportunities for merging subvector insert/extracts into shuffles, but they shouldn't cause any regressions now.
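A minimal sketch of why the fold is value-preserving, assuming HOP is a horizontal add and that the 256-bit form operates independently on each 128-bit lane. The helper names (hadd128, hadd256, concat) are illustrative only and are not LLVM APIs.

```python
def hadd128(a, b):
    """Model a 128-bit horizontal add on two 4-element float vectors."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


def hadd256(a, b):
    """Model a 256-bit horizontal add: one hadd128 per 128-bit lane."""
    return hadd128(a[:4], b[:4]) + hadd128(a[4:], b[4:])


def concat(lo, hi):
    """Model CONCAT of two 128-bit vectors into one 256-bit vector."""
    return lo + hi


x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 6.0, 7.0, 8.0]
z = [9.0, 10.0, 11.0, 12.0]
w = [13.0, 14.0, 15.0, 16.0]

# CONCAT(HOP(X,Y),HOP(Z,W)): two narrow ops, then a concat.
before = concat(hadd128(x, y), hadd128(z, w))
# HOP(CONCAT(X,Z),CONCAT(Y,W)): two concats feeding one wide op.
after = hadd256(concat(x, z), concat(y, w))
print(before == after)
```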
Sanjay Patel [Sun, 16 Aug 2020 13:52:33 +0000 (09:52 -0400)]
Revert "[PhaseOrdering] add test for memcpy removal (PR47114); NFC"
This reverts commit babb59496b540583c6951813d1e0b3abdea97e7d.
This test addition was queued up with some unrelated changes,
but it seems more likely that we need to fix something internal
to -memcpyopt. Also, I'm not sure if including target-specific
attributes in a generic regression test dir will cause bot
problems.
Sanjay Patel [Sat, 15 Aug 2020 18:21:24 +0000 (14:21 -0400)]
[InstCombine] fold copysign with fabs/fneg operand
We already get this in the backend, but we need to do
it in IR too to consistently get yet more copysign
transforms.
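A small sketch of the identity behind this fold, assuming the fabs/fneg sits on the magnitude operand (the message does not say which operand is meant). copysign only uses the magnitude of its first argument, so changing that argument's sign first has no effect. Python's math.copysign stands in for the IR intrinsic here.

```python
import math


def same_float(a, b):
    """Compare floats so that 0.0 and -0.0 count as different values."""
    return repr(a) == repr(b)


values = [0.0, -0.0, 1.5, -1.5, math.inf, -math.inf]

# copysign(fabs(x), y) == copysign(x, y) and copysign(-x, y) == copysign(x, y)
fold_holds = all(
    same_float(math.copysign(abs(x), y), math.copysign(x, y))
    and same_float(math.copysign(-x, y), math.copysign(x, y))
    for x in values
    for y in values
)
print(fold_holds)
```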
Sanjay Patel [Sat, 15 Aug 2020 17:59:13 +0000 (13:59 -0400)]
[InstCombine] reduce code duplication; NFC
Sanjay Patel [Sat, 15 Aug 2020 17:33:35 +0000 (13:33 -0400)]
[InstCombine] add tests for copysign; NFC
Sanjay Patel [Fri, 14 Aug 2020 20:46:37 +0000 (16:46 -0400)]
[PhaseOrdering] add test for memcpy removal (PR47114); NFC
Vitaly Buka [Thu, 6 Aug 2020 12:28:03 +0000 (05:28 -0700)]
[StackSafety] Change how callee searched in index
Handle linkage types other than local.
Simon Pilgrim [Sun, 16 Aug 2020 11:26:09 +0000 (12:26 +0100)]
[X86][SSE] Replace combineShuffleWithHorizOp with canonicalizeShuffleMaskWithHorizOp
Instead of just attempting to fold shuffle(HOP,HOP) for a specific target shuffle, make this part of combineX86ShufflesRecursively so we can perform this on the combined shuffle chain. This is particularly useful for recognising more cases where we're performing multiple HOPs that can be merged, and for pre-AVX targets where we don't have good blend/unary target shuffle support.
Brad Smith [Sun, 16 Aug 2020 10:50:50 +0000 (06:50 -0400)]
Create strict aligned code for OpenBSD/arm64.
Simon Pilgrim [Sun, 16 Aug 2020 10:51:44 +0000 (11:51 +0100)]
[X86] isRepeatedTargetShuffleMask - don't require specific MVT type. NFC.
Split isRepeatedTargetShuffleMask into a wrapper variant that takes an MVT describing the mask width, and an internal version that just needs the raw mask element bit size.
This will be necessary for an upcoming change where the horizontal ops element width might not match the shuffle mask element width.
Shoaib Meenai [Sun, 16 Aug 2020 07:16:35 +0000 (00:16 -0700)]
[llvm-libtool-darwin] Fix test on all host architectures
By default, if a universal binary has a slice matching the host
architecture, llvm-objdump will only print that slice, otherwise it'll
print all architectures. Explicitly pass `--arch all` to force it to
always print all architectures, as we want for this test.
Fady Ghanim [Sun, 9 Aug 2020 18:46:21 +0000 (14:46 -0400)]
[OpenMP][OMPBuilder] Adding support for `omp single`
This adds support for generating `omp single`, and necessary calls for
`copyprivate` clause.
Differential Revision: https://reviews.llvm.org/D85617
Shoaib Meenai [Sun, 16 Aug 2020 04:32:09 +0000 (21:32 -0700)]
[llvm-libtool-darwin] Speculative buildbot fix
http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l is failing
this test. Attempt to explicitly use the Mach-O dump format as a
speculative fix.
LLVM GN Syncbot [Sun, 16 Aug 2020 03:17:58 +0000 (03:17 +0000)]
[gn build] Port 577e58bcc75
Wenlei He [Sat, 15 Aug 2020 21:52:10 +0000 (14:52 -0700)]
[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks
This change adds a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. The DWARF inline stack with line and discriminator is used as the anchor for call sites, including call context. The change can be useful for inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like the alwaysinline attribute are per-function, not per-callsite. A per-callsite inline intrinsic could be another solution (it does not exist yet), but it's intrusive to implement and also does not differentiate call context.
A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decisions can be specialized for each call context, hence we should be able to replay inlining accurately. However, with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call contexts. Apart from that limitation, the new inline advisor can still be used by the regular CGSCC inliner later if needed for tuning purposes.
This is a resubmit of https://reviews.llvm.org/D83743
Fangrui Song [Sun, 16 Aug 2020 02:33:35 +0000 (19:33 -0700)]
[ARC] Fix CodeGen/ARC/brcc.ll
Jon Chesterfield [Sat, 15 Aug 2020 22:52:19 +0000 (23:52 +0100)]
[libomptarget] Implement host plugin for amdgpu
Replacement for D71384. Primary difference is inlining the dependency on atmi
followed by extensive simplification and bugfixes. This is the latest version
from https://github.com/ROCm-Developer-Tools/amd-llvm-project/tree/aomp12 with
minor patches and a rename from hsa to amdgpu, on the basis that this can't be
used by other implementations of hsa without additional work.
This will not build unless the ROCM_DIR variable is passed so won't break other
builds. That variable is used to locate two amdgpu specific libraries that ship
as part of rocm:
libhsakmt at https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface
libhsa-runtime64 at https://github.com/RadeonOpenCompute/ROCR-Runtime
These libraries build from source. The build scripts in those repos are for
shared libraries, but can be adapted to statically link both into this plugin.
There are caveats.
- This works well enough to run various tests and benchmarks, and will be used
to support the current clang bring up
- It is adequately thread safe for the above but there will be races remaining
- It is not stylistically correct for llvm, though has had clang-format run
- It has suboptimal memory management and locking strategies
- The debug printing / error handling is inconsistent
I would like to contribute this pretty much as-is and then improve it in-tree.
This would be advantageous because the aomp12 branch that was in use for fixing
this codebase has just been joined with the amd internal rocm dev process.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85742
Lang Hames [Sat, 15 Aug 2020 22:45:49 +0000 (15:45 -0700)]
[JITLink][MachO] Use correct symbol scope when N_PEXT is set and N_EXT unset.
MachOLinkGraphBuilder has been treating these as hidden, but they should be
treated as local.
Symbols with N_PEXT set and N_EXT unset are produced when hidden symbols are
run through 'ld -r' without passing -keep_private_externs. They will show up
under 'nm -m' as "was private extern", hence the name of the test cases.
Testcase committed as relocatable object to ensure that the test suite doesn't
depend on having 'ld -r' available.
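The scope rule described above can be sketched as follows. The bit values are the N_EXT and N_PEXT masks from mach-o/nlist.h; the function name and the string scope labels are illustrative and do not mirror the MachOLinkGraphBuilder code.

```python
N_EXT = 0x01   # external symbol bit in n_type
N_PEXT = 0x10  # private external symbol bit in n_type


def symbol_scope(n_type):
    """Map the N_EXT / N_PEXT bits of n_type to a link scope label."""
    if n_type & N_EXT:
        # External and private-external: visible within the linked image only.
        return "hidden" if n_type & N_PEXT else "default"
    # N_EXT unset: local, even if N_PEXT is still set ("was private extern").
    return "local"


for bits in (N_EXT, N_EXT | N_PEXT, N_PEXT, 0):
    print(hex(bits), symbol_scope(bits))
```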
Mehdi Amini [Sat, 15 Aug 2020 21:37:11 +0000 (21:37 +0000)]
Slightly relax the regex on lld version in test (NFC)
This makes the test introduced in 537f5483fe4e more robust with respect
to the actual version number. The previous regex restricted the version
to start with a leading `1` which was overly restrictive.
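To illustrate the kind of relaxation described, here is a sketch with two hypothetical patterns. The actual regex in the test is not quoted in this message, so both patterns and the sample version strings below are assumptions.

```python
import re

# Hypothetical "before" pattern: the version must start with a leading 1.
strict = re.compile(r"LLD 1[0-9]+\.[0-9]+")
# Hypothetical "after" pattern: any leading digit is accepted.
relaxed = re.compile(r"LLD [0-9]+\.[0-9]+")

samples = ["LLD 11.0.0", "LLD 9.0.1", "LLD 20.1.0"]
for s in samples:
    print(s, bool(strict.search(s)), bool(relaxed.search(s)))
```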
Amara Emerson [Fri, 14 Aug 2020 08:37:49 +0000 (01:37 -0700)]
[GlobalISel] Enable copy-propagation in post-legalizer combiner.
This cleans up copies that the legalizer or other combines leave around. They
can occasionally end up escaping as moves.
Differential Revision: https://reviews.llvm.org/D85964
Mehdi Amini [Sat, 15 Aug 2020 19:02:56 +0000 (19:02 +0000)]
Refactor mlir-opt setup in a new helper function (NFC)
This will help refactor some of the tools in preparation for the explicit registration of
Dialects.
Differential Revision: https://reviews.llvm.org/D86023
Shoaib Meenai [Sat, 15 Aug 2020 18:41:57 +0000 (11:41 -0700)]
[llvm-libtool-darwin] Use Optional operator overloads. NFC
Use operator bool instead of hasValue and operator* instead of getValue
to simplify the code slightly.
LLVM GN Syncbot [Sat, 15 Aug 2020 16:24:37 +0000 (16:24 +0000)]
[gn build] Port 79298a50670
Matt Arsenault [Sat, 15 Aug 2020 15:09:21 +0000 (11:09 -0400)]
GlobalISel: Remove unnecessary llvm::
Matt Arsenault [Sat, 15 Aug 2020 00:22:04 +0000 (20:22 -0400)]
AMDGPU: Remove register class params from flat memory patterns
Matt Arsenault [Sat, 15 Aug 2020 00:01:51 +0000 (20:01 -0400)]
AMDGPU: Fix global atomic saddr operand class
Matt Arsenault [Fri, 14 Aug 2020 20:42:18 +0000 (16:42 -0400)]
AMDGPU: Remove slc from flat offset complex patterns
This was always set to 0. Use a default value of 0 in this context to
satisfy the instruction definition patterns. We can't unconditionally
use SLC with a default value of 0 due to limitations in TableGen's
handling of defaulted operands when followed by non-default operands.
Matt Arsenault [Thu, 13 Aug 2020 22:51:58 +0000 (18:51 -0400)]
AMDGPU: Fix matching wrong offsets for global atomic loads
These used signed offsets with a different size.
Matt Arsenault [Thu, 13 Aug 2020 22:57:06 +0000 (18:57 -0400)]
AMDGPU: Remove redundant FLAT complex patterns
These were identical to the non-atomic cases. I'm not sure why these
were ever separated.
Matt Arsenault [Wed, 12 Aug 2020 00:38:40 +0000 (20:38 -0400)]
AMDGPU: Correct definitions for global saddr instructions
The VGPR component is a 32-bit offset, not 64-bits.
I'm not sure what the correct syntax is for this. This maintains the
vaddr position and leaves saddr in the end "off" position. This is
particularly terrible for stores, since the operand order is now <vgpr
offset>, <data>, <sgpr base>, splitting the pointer operands. I
suppose this is a logical consequence from the mistake of not putting
the data operand first. I'm not sure what sp3 does.
Matt Arsenault [Thu, 13 Aug 2020 20:17:42 +0000 (16:17 -0400)]
AMDGPU: Remove SIFixupVectorISel pass
This was only used for matching the saddr addressing mode of global
instructions, but this was not implemented correctly. The instruction
definitions aren't even correct, and are defined as using a 64-bit
VGPR component. Eliminate this pass to enable correcting the
instruction definitions. A new matching implementation can work in
GlobalISel or rely on DAG divergence information for the base
address.
Aditya Kumar [Sat, 15 Aug 2020 15:51:48 +0000 (08:51 -0700)]
[NFC] Fix typo and variable names
Luofan Chen [Sat, 15 Aug 2020 15:53:11 +0000 (23:53 +0800)]
[Attributor][NFC] Format code
Luofan Chen [Sat, 15 Aug 2020 15:04:11 +0000 (23:04 +0800)]
[Attributor][NFC] Use indexes instead of iterator
When adding elements while iterating, the iterator will become
invalid, which could cause errors. This fixes the issue by using
indexes instead of an iterator.
Bernhard Manfred Gruber [Sat, 15 Aug 2020 14:40:22 +0000 (10:40 -0400)]
Add support for C++20 concepts and decltype to modernize-use-trailing-return-type.
Cyndy Ishida [Sat, 15 Aug 2020 13:37:06 +0000 (06:37 -0700)]
[TextAPI] update DriverKit string value
String value differed from downstream, where upstream doesn't depend on
casing difference.
<rdar://problem/67106257>
Xing GUO [Sat, 15 Aug 2020 13:08:17 +0000 (21:08 +0800)]
[MachOYAML] Move EmitFunc to an inner scope. NFC.
Luofan Chen [Sat, 15 Aug 2020 11:17:44 +0000 (19:17 +0800)]
[Attributor] Use internalized version of non-exact functions
This patch internalizes non-exact functions and replaces their uses
with the internalized version. Doing this enables the analysis of
non-exact functions.
We can do this because non-exact functions with the same name
whose linkage is `linkonce_odr` or `weak_odr` should have the same
semantics, so we can safely internalize them and replace their uses (the
result of the other version of the function should be the same).
Note that not all functions can be internalized, e.g., functions with
`linkonce` or `weak` linkage.
For now, when specified on the command line, we internalize all functions
that meet the requirements without calculating the cost of such
internalization.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84167
Xing GUO [Sat, 15 Aug 2020 12:09:37 +0000 (20:09 +0800)]
[DWARFYAML] Simplify isEmpty(). NFC.
Dimitry Andric [Sat, 1 Aug 2020 14:26:36 +0000 (16:26 +0200)]
On FreeBSD, add -pthread to ASan dynamic compile flags for tests
Otherwise, lots of these tests fail with a CHECK error similar to:
==12345==AddressSanitizer CHECK failed: compiler-rt/lib/asan/asan_posix.cpp:120 "((0)) == ((pthread_key_create(&tsd_key, destructor)))" (0x0, 0x4e)
This is because the default pthread stubs in FreeBSD's libc always
return failures (such as ENOSYS for pthread_key_create) in case the
pthread library is not linked in.
Reviewed By: arichardson
Differential Revision: https://reviews.llvm.org/D85082
Dávid Bolvanský [Sat, 15 Aug 2020 10:07:45 +0000 (12:07 +0200)]
Reland "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)"
Mehdi Amini [Sat, 15 Aug 2020 09:21:47 +0000 (09:21 +0000)]
Revert "Separate the Registration from Loading dialects in the Context"
This reverts commit 20563933875a9396c8ace9c9770ecf6a988c4ea6.
Build is broken on a few bots.
Mehdi Amini [Sat, 15 Aug 2020 06:40:18 +0000 (06:40 +0000)]
Separate the Registration from Loading dialects in the Context
This changes the behavior of constructing MLIRContext to no longer load globally registered dialects on construction. Instead Dialects are only loaded explicitly on demand:
- The Parser lazily loads Dialects in the context as it encounters them during parsing. This is the only purpose for registering dialects without loading them in the context.
- Passes are expected to declare the dialects they will create entities from (Operations, Attributes, or Types), and the PassManager loads Dialects into the Context when starting a pipeline.
This change simplifies the configuration of the registration: a compiler only needs to load the dialect for the IR it will emit, and the optimizer is self-contained and loads the required Dialects. For example, in the Toy tutorial, the compiler only needs to load the Toy dialect in the Context; all the others (linalg, affine, std, LLVM, ...) are automatically loaded depending on the optimization pipeline enabled.
Differential Revision: https://reviews.llvm.org/D85622
Mehdi Amini [Sat, 15 Aug 2020 07:33:59 +0000 (07:33 +0000)]
Revert "Separate the Registration from Loading dialects in the Context"
This was landed by accident, will reland with the right comments
addressed from the reviews.
Also revert dependent build fixes.
Martin Storsjö [Sat, 15 Aug 2020 06:19:54 +0000 (09:19 +0300)]
Revert "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)"
This reverts commit 6dbf0cfcf789365493f70ae69df8a7a59be41c75.
That commit caused failed assertions, e.g. like this:
$ cat sprintf-strcpy.c
char *ptr; void func(void) { ptr += sprintf(ptr, "%s", ""); }
$ clang -c sprintf-strcpy.c -O2 -target x86_64-linux-gnu
clang: ../lib/IR/Value.cpp:473: void llvm::Value::doRAUW(llvm::Value*,
llvm::Value::ReplaceMetadataUses): Assertion `New->getType() ==
getType() && "replaceAllUses of value with new value of different
type!"' failed.
Raphael Isemann [Sat, 15 Aug 2020 06:14:42 +0000 (08:14 +0200)]
[lldb] Remove XFAIL from now passing TestPtrRefs/TestPtrRefsObjC
8fcfe2862fd4fde4793e232cfeebe6c5540c80a5 and
0cceb54366b406649fdfe7bb11b133ab96f3cd70 fixed those tests.
Philip Reames [Sat, 15 Aug 2020 03:45:48 +0000 (20:45 -0700)]
[Tests] Be consistent w/definition of statepoint-example
These tests use the statepoint-example builtin gc, which expects address space #1 to be the only non-integral address space. The fact the test used as=0 happened to work, but was caught by a downstream assert. (Literally years ago, I just happened to notice the XFAIL and fix it now.)
Philip Reames [Sat, 15 Aug 2020 03:29:41 +0000 (20:29 -0700)]
[Statepoint] Remove code related to inline operand bundles
This code becomes dead for valid IR after 48f4312 and a96fc46. The reason for the test change is that the verifier reports the first verification error encountered, in some unspecified visit order. By removing the verification code in gc.relocates for a statepoint with inline gc operands, I change the error the verifier reports. And in one case, the checked-for error is no longer possible with the bundle representation, so I simply delete the file.
Philip Reames [Sat, 15 Aug 2020 02:42:18 +0000 (19:42 -0700)]
Remove inline gc arguments from statepoints
The "gc-live" operand bundles were recently added, and all tests have been updated to use that format. A migration period was provided, though it's worth noting these intrinsics are experimental, so formally there is no compatibility requirement.
This is an extension to a96fc46. "gc-live" hadn't been implemented at the point that patch was initially posted.
Stanislav Mekhanoshin [Fri, 14 Aug 2020 22:38:13 +0000 (15:38 -0700)]
[AMDGPU] Fix MAI ld/st hazard handling
It did not process the hazard for ds_permute, which does not
load or store even though it is a DS instruction.
Differential Revision: https://reviews.llvm.org/D86003
Dávid Bolvanský [Fri, 14 Aug 2020 23:49:02 +0000 (01:49 +0200)]
[SLC] Transform strncpy(dst, "text", C) to memcpy(dst, "text\0\0\0", C) for C <= 128 only
Transformation creates big strings for big C values, so bail out for C > 128.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86004
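A rough model of the transformation above, assuming standard strncpy semantics (copy at most C bytes, then pad with NULs up to C). The function names are illustrative; only the 128 limit comes from the message.

```python
def strncpy_model(src, n):
    """Bytes that strncpy(dst, src, n) writes to dst for a NUL-free src."""
    copied = src[:n]
    return copied + b"\0" * (n - len(copied))


def fold_to_memcpy_source(src, n, limit=128):
    """Return the padded constant a memcpy of n bytes would read, or None.

    The padded constant is n bytes long, so large n means a large string;
    the fold is skipped when n exceeds the limit.
    """
    if n > limit:
        return None
    return (src + b"\0" * n)[:n]


print(fold_to_memcpy_source(b"text", 8))
print(fold_to_memcpy_source(b"text", 4096))
```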
Gui Andrade [Fri, 14 Aug 2020 23:34:16 +0000 (23:34 +0000)]
[MSAN] Avoid dangling ActualFnStart when replacing instruction
This would be a problem if the entire instrumented function was a call
to, e.g., memcpy.
Use FnPrologueEnd Instruction* instead of ActualFnStart BB*
Differential Revision: https://reviews.llvm.org/D86001
Cameron McInally [Fri, 14 Aug 2020 23:36:16 +0000 (18:36 -0500)]
[SVE] Lower fixed length vXi32/vXi64 SDIV to scalable vectors.
Differential Revision: https://reviews.llvm.org/D85982
Christopher Tetreault [Fri, 14 Aug 2020 22:54:16 +0000 (15:54 -0700)]
[SVE] Remove calls to VectorType::getNumElements from AggressiveInstCombine
Reviewed By: fpetrogalli
Differential Revision: https://reviews.llvm.org/D82218
Michael Park [Fri, 14 Aug 2020 23:30:10 +0000 (16:30 -0700)]
[libcxx/variant] Avoided variable name shadowing.
Philip Reames [Fri, 14 Aug 2020 23:06:19 +0000 (16:06 -0700)]
Remove deopt and gc transition arguments from gc.statepoint intrinsic
(Forgot to land this a couple of weeks back.)
In a recent series of changes, I've introduced support for using the respective operand bundle kinds on the statepoint. At the moment, code supports either/or, but there's no need to keep the old support around. For the moment, I am simply changing the specification and verifier to require zero length argument sets in the intrinsic.
The intrinsic itself is experimental. Given that, there's no forward serialization needed. The in-tree uses and generation have already been updated to use the new operand-bundle-based forms; the only folks broken by the change will be those with frontends generating statepoints directly, and the updates should be easy.
Why not go ahead and just remove the arguments entirely? Well, I plan to. But while working on this I've found that almost all of the arguments to the statepoint can be expressed via operand bundles or attributes. Given that, I'm planning a radical simplification of the arguments and figured I'd do one update not several small ones.
Differential Revision: https://reviews.llvm.org/D80892
Arthur Eubanks [Sat, 8 Aug 2020 00:56:31 +0000 (17:56 -0700)]
[test][LoopUnroll] Cleanup FullUnroll.ll
This is in preparation for enabling proper handling of optnone under the
NPM. Most optimizations won't run on an optnone function.
Previously the test would rely on lots of optimizations to optimize the
IR into a simple infinite loop. This is an optnone function, so clearly
that shouldn't be the case.
This IR was found by printing the module before the LoopFullUnrollerPass ran.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D85578
Arthur Eubanks [Thu, 6 Aug 2020 18:10:14 +0000 (11:10 -0700)]
[NewPM][optnone] Mark various passes as required
This was done by turning on -enable-npm-optnone and fixing failures.
That will be enabled in a follow-up change for ease of reverting.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D85457
Fangrui Song [Fri, 14 Aug 2020 22:50:52 +0000 (15:50 -0700)]
Fix TargetSubtargetInfo derivatives after D85165
Fangrui Song [Fri, 14 Aug 2020 22:38:05 +0000 (15:38 -0700)]
[ELF] Re-initialize InputFile::isInGroup so that elf::link can be called more than once
Craig Topper [Fri, 14 Aug 2020 21:56:54 +0000 (14:56 -0700)]
[X86][MC][Target] Initial backend support a tune CPU to support -mtune
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.
This patch adds MC layer support for a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables. These feature lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all targets as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.
One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.
I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.
Differential Revision: https://reviews.llvm.org/D85165
Davide Italiano [Fri, 14 Aug 2020 22:31:02 +0000 (15:31 -0700)]
[TestPtrRefsObjC] Prefer `command script import`.
Davide Italiano [Fri, 14 Aug 2020 22:30:07 +0000 (15:30 -0700)]
[TestPtrRefs] Prefer `command script import`.
Jordan Rupprecht [Fri, 14 Aug 2020 21:51:49 +0000 (14:51 -0700)]
Temporarily revert "[SCEVExpander] Add helper to clean up instrs inserted while expanding."
This reverts commit 7829c33084a7a5097533cf862daef521380c4e63. The assertion is triggering on some internal code. A reduced test case is in progress.
Dávid Bolvanský [Fri, 14 Aug 2020 21:48:30 +0000 (23:48 +0200)]
[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)
Transform sprintf(dst, "%s", str) -> strcpy(dst, str) if result is unused
Avoid sprintf(dest, "%s", str) -> llvm.memcpy(align 1 dest, align 1 str, strlen(str)+1) if optimizing for size.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85963
Nicolas Guillemot [Wed, 12 Aug 2020 22:22:58 +0000 (15:22 -0700)]
[TableGen] Allow mnemonics with uppercase letters to be matched
The assembly parser "canonicalizes" the mnemonics it processes at an
early level by making them lowercase. The goal of this is presumably to
allow assembly to be case-insensitive. However, if one declares an
instruction with a mnemonic using uppercase letters, then it will
never get matched, since the generated lookup tables for the
AsmMatcherEmitter didn't lower() their inputs. This made it difficult to
have instructions that get printed using a mnemonic that includes
uppercase letters, since they could not be parsed.
To fix this problem, this patch adds a few calls to lower() to make the
lookup tables used in AsmMatcherEmitter be case-insensitive. This allows
instruction mnemonics with uppercase letters to be parsed.
Differential Revision: https://reviews.llvm.org/D85858
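The mismatch described above can be sketched with a plain dictionary standing in for the generated lookup table. The mnemonic "ADDx" and both function names are made up for illustration; this is not the AsmMatcherEmitter code.

```python
def build_table(mnemonics, lower_keys):
    """Build a mnemonic lookup table, optionally lowercasing its keys."""
    return {(m.lower() if lower_keys else m): m for m in mnemonics}


def match(table, token):
    """The parser canonicalizes the token to lowercase before the lookup."""
    return table.get(token.lower())


declared = ["ADDx", "mov"]  # hypothetical mnemonics, one with uppercase

old_table = build_table(declared, lower_keys=False)
new_table = build_table(declared, lower_keys=True)

# With unlowered keys the uppercase mnemonic can never be found.
print(match(old_table, "ADDx"), match(new_table, "ADDx"))
```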
Gui Andrade [Fri, 14 Aug 2020 18:26:23 +0000 (18:26 +0000)]
[MSAN] Convert ActualFnStart to be a particular Instruction *, not BB
This allows us to add additional instrumentation before the function start,
without splitting the first BB.
Differential Revision: https://reviews.llvm.org/D85985
Matt Morehouse [Fri, 14 Aug 2020 20:45:36 +0000 (13:45 -0700)]
[docs] Add missing semicolon to example.
Gui Andrade [Fri, 14 Aug 2020 20:31:10 +0000 (20:31 +0000)]
[MSAN] Reintroduce libatomic load/store instrumentation
Have the front-end use the `nounwind` attribute on atomic libcalls.
This prevents us from seeing `invoke __atomic_load` in MSAN, which
is problematic as it has no successor for instrumentation to be added.
Xiangling Liao [Fri, 7 Aug 2020 14:47:31 +0000 (10:47 -0400)]
[AIX] Generate unique module id based on Pid and timestamp
The module id, which is part of the sinit and sterm function names, must be
unique. However, `getUniqueModuleId` will fail if there is
no strong external symbol within a module. We fall back to using the Pid and
timestamp when this happens.
Differential Revision: https://reviews.llvm.org/D85527
Sanjay Patel [Fri, 14 Aug 2020 20:16:39 +0000 (16:16 -0400)]
[x86] add tests for store merging (PR46662); NFC
Artem Belevich [Wed, 12 Aug 2020 00:17:53 +0000 (17:17 -0700)]
Split Preprocessor/init.c test
Some parts of the test had been extracted into separate files previously.
This patch continues the trend and extracts a few more large blocks.
This reduces wall time by replacing a single 14s-long test with a set of
smaller tests that can be run in parallel.
The before/after state of the check-clang tests is here:
https://gist.github.com/Artem-B/d0b05c2e98a49158c02de23f7f4f0279
Differential Revision: https://reviews.llvm.org/D85798
Michael Park [Tue, 11 Aug 2020 22:52:49 +0000 (15:52 -0700)]
[libcxx/variant] Introduce `switch`-based mechanism for `std::visit`.
This patch introduces a mechanism for `std::visit` backed by `switch`.
The `switch` is structured such that it's a flattened manual vtable (an n-ary array).
The `switch` mechanism is enabled if `(1 * ... * vs.size()) < 1024`.
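As a rough illustration of the two ideas above, the sketch below computes the size product used in the enabling condition and a row-major flat index into an n-ary table. The function names are hypothetical, the row-major layout is an assumption, and this is not the libc++ implementation; only the 1024 limit is taken from the message.

```python
import math


def use_switch(sizes, limit=1024):
    """Condition as stated above: product of the variant sizes below the limit."""
    return math.prod(sizes) < limit


def flat_index(indices, sizes):
    """Row-major index of one alternative combination in the flattened table."""
    flat = 0
    for i, n in zip(indices, sizes):
        flat = flat * n + i
    return flat


# Two variants with 3 and 4 alternatives: 12 combinations in total.
print(use_switch([3, 4]), flat_index([1, 2], [3, 4]))
```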
The following are performance numbers from the benchmarks added in D85419, tested on my 2017 Macbook Pro.
```
$ ./projects/libcxx/benchmarks/variant_visit_1.libcxx.out
2020-08-09 23:55:14
Running ./projects/libcxx/benchmarks/variant_visit_1.libcxx.out
Run on (8 X 3100 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 262K (x4)
L3 Unified 8388K (x1)
Load Average: 2.03, 2.36, 2.43
------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
BM_Visit<1, 1>         0.260 ns        0.260 ns   1000000000
BM_Visit<1, 2>          1.56 ns         1.56 ns    435925220
BM_Visit<1, 3>          1.55 ns         1.55 ns    444416228
BM_Visit<1, 4>          1.57 ns         1.57 ns    427951336
BM_Visit<1, 5>          1.57 ns         1.56 ns    444766371
BM_Visit<1, 6>          1.70 ns         1.68 ns    446639358
BM_Visit<1, 7>          1.64 ns         1.64 ns    400441630
BM_Visit<1, 8>          1.56 ns         1.56 ns    430729471
BM_Visit<1, 9>          1.58 ns         1.58 ns    449894596
BM_Visit<1, 10>         1.54 ns         1.54 ns    449660506
BM_Visit<1, 20>         1.56 ns         1.56 ns    450813074
BM_Visit<1, 30>         1.59 ns         1.59 ns    440032940
BM_Visit<1, 40>         1.59 ns         1.59 ns    443731656
BM_Visit<1, 50>         1.56 ns         1.56 ns    444709859
BM_Visit<1, 60>         1.59 ns         1.58 ns    439527320
BM_Visit<1, 70>         1.57 ns         1.57 ns    438450890
BM_Visit<1, 80>         1.58 ns         1.58 ns    443001525
BM_Visit<1, 90>         1.63 ns         1.62 ns    448456349
BM_Visit<1, 100>        1.57 ns         1.57 ns    445740630
$ ./projects/libcxx/benchmarks/variant_visit_2.libcxx.out
2020-08-09 23:59:35
Running ./projects/libcxx/benchmarks/variant_visit_2.libcxx.out
Run on (8 X 3100 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 262K (x4)
L3 Unified 8388K (x1)
Load Average: 1.40, 1.94, 2.22
-----------------------------------------------------------
Benchmark                 Time             CPU   Iterations
-----------------------------------------------------------
BM_Visit<2, 1>        0.261 ns        0.260 ns   1000000000
BM_Visit<2, 2>         1.55 ns         1.54 ns    432844219
BM_Visit<2, 3>         1.30 ns         1.30 ns    532529974
BM_Visit<2, 4>         1.54 ns         1.54 ns    446055910
BM_Visit<2, 5>         1.31 ns         1.31 ns    531099680
BM_Visit<2, 6>         1.56 ns         1.56 ns    443203475
BM_Visit<2, 7>         1.29 ns         1.29 ns    526478087
BM_Visit<2, 8>         1.56 ns         1.56 ns    439000834
BM_Visit<2, 9>         1.30 ns         1.30 ns    528756817
BM_Visit<2, 10>        1.56 ns         1.55 ns    442923039
BM_Visit<2, 20>        1.35 ns         1.35 ns    517021072
BM_Visit<2, 30>        1.60 ns         1.59 ns    419724661
BM_Visit<2, 40>        1.45 ns         1.44 ns    472137163
BM_Visit<2, 50>        1.65 ns         1.65 ns    421389743
$ ./projects/libcxx/benchmarks/variant_visit_3.libcxx.out
2020-08-10 00:01:32
Running ./projects/libcxx/benchmarks/variant_visit_3.libcxx.out
Run on (8 X 3100 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 262K (x4)
L3 Unified 8388K (x1)
Load Average: 2.20, 2.01, 2.21
-----------------------------------------------------------
Benchmark                 Time             CPU   Iterations
-----------------------------------------------------------
BM_Visit<3, 1>        0.272 ns        0.271 ns   1000000000
BM_Visit<3, 2>         1.87 ns         1.86 ns    361858090
BM_Visit<3, 3>         1.77 ns         1.77 ns    391192579
BM_Visit<3, 4>         1.84 ns         1.84 ns    374694223
BM_Visit<3, 5>         1.75 ns         1.75 ns    408270392
BM_Visit<3, 6>         1.88 ns         1.88 ns    378759185
BM_Visit<3, 7>         1.79 ns         1.79 ns    395498102
BM_Visit<3, 8>         1.85 ns         1.85 ns    371660366
BM_Visit<3, 9>         1.80 ns         1.80 ns    386872851
BM_Visit<3, 10>        1.84 ns         1.84 ns    362367606
BM_Visit<3, 15>        1.77 ns         1.77 ns    392060220
BM_Visit<3, 20>        1.85 ns         1.85 ns    379157188
```
```
$ ./projects/libcxx/benchmarks/variant_visit_1.libcxx.out
2020-08-10 00:05:57
Running ./projects/libcxx/benchmarks/variant_visit_1.libcxx.out
Run on (8 X 3100 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 262K (x4)
L3 Unified 8388K (x1)
Load Average: 2.27, 2.36, 2.34
------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
BM_Visit<1, 1>         0.271 ns        0.271 ns   1000000000
BM_Visit<1, 2>         0.269 ns        0.269 ns   1000000000
BM_Visit<1, 3>         0.271 ns        0.271 ns   1000000000
BM_Visit<1, 4>         0.270 ns        0.270 ns   1000000000
BM_Visit<1, 5>         0.269 ns        0.269 ns   1000000000
BM_Visit<1, 6>         0.270 ns        0.269 ns   1000000000
BM_Visit<1, 7>         0.265 ns        0.265 ns   1000000000
BM_Visit<1, 8>         0.269 ns        0.269 ns   1000000000
BM_Visit<1, 9>         0.268 ns        0.268 ns   1000000000
BM_Visit<1, 10>        0.269 ns        0.269 ns   1000000000
BM_Visit<1, 20>        0.267 ns        0.267 ns   1000000000
BM_Visit<1, 30>        0.272 ns        0.272 ns   1000000000
BM_Visit<1, 40>        0.268 ns        0.268 ns   1000000000
BM_Visit<1, 50>        0.268 ns        0.268 ns   1000000000
BM_Visit<1, 60>        0.268 ns        0.268 ns   1000000000
BM_Visit<1, 70>        0.269 ns        0.269 ns   1000000000
BM_Visit<1, 80>        0.266 ns        0.266 ns   1000000000
BM_Visit<1, 90>        0.268 ns        0.268 ns   1000000000
BM_Visit<1, 100>       0.267 ns        0.267 ns   1000000000
$ ./projects/libcxx/benchmarks/variant_visit_2.libcxx.out
2020-08-12 04:09:59
Running ./projects/libcxx/benchmarks/variant_visit_2.libcxx.out
Run on (8 X 3100 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 262K (x4)
L3 Unified 8388K (x1)
Load Average: 2.17, 4.20, 4.78
-----------------------------------------------------------
Benchmark                 Time             CPU   Iterations
-----------------------------------------------------------
BM_Visit<2, 1>        0.302 ns        0.301 ns   1000000000
BM_Visit<2, 2>        0.297 ns        0.295 ns   1000000000
BM_Visit<2, 3>        0.353 ns        0.351 ns   1000000000
BM_Visit<2, 4>        0.276 ns        0.276 ns   1000000000
BM_Visit<2, 5>        0.285 ns        0.283 ns   1000000000
BM_Visit<2, 6>        0.290 ns        0.287 ns   1000000000
BM_Visit<2, 7>        0.282 ns        0.280 ns   1000000000
BM_Visit<2, 8>        0.290 ns        0.287 ns   1000000000
BM_Visit<2, 9>        0.291 ns        0.285 ns   1000000000
BM_Visit<2, 10>       0.293 ns        0.287 ns   1000000000
BM_Visit<2, 20>        1.70 ns         1.68 ns    391400375
BM_Visit<2, 30>        1.64 ns         1.63 ns    418925874
BM_Visit<2, 40>        1.63 ns         1.62 ns    423623677
BM_Visit<2, 50>        1.68 ns         1.67 ns    411687212
$ ./projects/libcxx/benchmarks/variant_visit_3.libcxx.out
2020-08-12 04:10:43
Running ./projects/libcxx/benchmarks/variant_visit_3.libcxx.out
Run on (8 X 3100 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 262K (x4)
L3 Unified 8388K (x1)
Load Average: 1.57, 3.76, 4.59
-----------------------------------------------------------
Benchmark             Time        CPU   Iterations
-----------------------------------------------------------
BM_Visit<3, 1>    0.271 ns   0.270 ns   1000000000
BM_Visit<3, 2>    0.344 ns   0.334 ns   1000000000
BM_Visit<3, 3>    0.347 ns   0.336 ns   1000000000
BM_Visit<3, 4>    0.300 ns   0.296 ns   1000000000
BM_Visit<3, 5>    0.290 ns   0.286 ns   1000000000
BM_Visit<3, 6>    0.272 ns   0.271 ns   1000000000
BM_Visit<3, 7>     1.72 ns    1.71 ns    415765841
BM_Visit<3, 8>     1.73 ns    1.72 ns    408909555
BM_Visit<3, 9>     2.16 ns    2.04 ns    380898485
BM_Visit<3, 10>    2.45 ns    2.40 ns    295714256
BM_Visit<3, 15>    1.92 ns    1.85 ns    375990332
BM_Visit<3, 20>    1.66 ns    1.65 ns    414456233
```
Differential Revision: https://reviews.llvm.org/D85420
Vitaly Buka [Fri, 14 Aug 2020 19:42:21 +0000 (12:42 -0700)]
[StackSafety] Use ValueInfo in ParamAccess::Call
This avoids a GUID lookup in Index.findSummaryInModule.
Follow up for D81242.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D85269
cgyurgyik [Fri, 14 Aug 2020 19:38:52 +0000 (15:38 -0400)]
[libc] Add restrict qualifiers to string library; give consistent naming scheme to TableGen files.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D85945
Greg McGary [Fri, 14 Aug 2020 19:35:31 +0000 (12:35 -0700)]
[lld-macho] Emit load command LC_BUILD_VERSION
Reviewed By: int3
Differential Revision: https://reviews.llvm.org/D85786
Greg McGary [Fri, 14 Aug 2020 19:34:20 +0000 (12:34 -0700)]
[MachO] Add skeletal support for DriverKit platform
Define the platform ID = 10, and simple mappings between platform ID & name.
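A minimal sketch of what such a mapping between platform ID and name could look like. The enum, the function names, and the lowercase name strings are hypothetical and are not taken from this commit; only the numeric IDs (macOS = 1, iOS = 2, DriverKit = 10) follow the Mach-O platform numbering.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative subset of Mach-O platform IDs. The enumerator names are
// hypothetical; only the numeric values follow the Mach-O numbering.
enum class Platform : std::uint32_t {
  Unknown = 0,
  MacOS = 1,
  IOS = 2,
  DriverKit = 10,
};

// Map a platform ID to a display name (name strings are assumptions).
std::string platformName(Platform p) {
  switch (p) {
  case Platform::MacOS:
    return "macos";
  case Platform::IOS:
    return "ios";
  case Platform::DriverKit:
    return "driverkit";
  default:
    return "unknown";
  }
}

// Map a display name back to a platform ID; unknown names yield Unknown.
Platform platformFromName(const std::string &name) {
  if (name == "macos")
    return Platform::MacOS;
  if (name == "ios")
    return Platform::IOS;
  if (name == "driverkit")
    return Platform::DriverKit;
  return Platform::Unknown;
}
```

Keeping both directions next to each other makes it harder for a newly added platform to end up with only one half of the mapping.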
Reviewed By: MaskRay, cishida
Differential Revision: https://reviews.llvm.org/D85594
Marius Brehler [Fri, 14 Aug 2020 19:26:15 +0000 (21:26 +0200)]
Test commit
Test commit access to the LLVM repository.
Mauricio Sifontes [Fri, 14 Aug 2020 19:12:07 +0000 (19:12 +0000)]
Fix warning caused by ReductionTreePass class
Explicitly declare ReductionTreeBase base class in ReductionTreePass copy constructor.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D85983
Sameer Arora [Fri, 7 Aug 2020 18:03:54 +0000 (11:03 -0700)]
[llvm-libtool-darwin] Add support for -l and -L
Add support for passing in libraries via `-l` and `-L` options to
`llvm-libtool-darwin`.
Reviewed by jhenderson, smeenai
Differential Revision: https://reviews.llvm.org/D85540
Matt Morehouse [Fri, 14 Aug 2020 18:43:33 +0000 (11:43 -0700)]
[DFSan] Don't unmap during dfsan_flush().
Unmapping and remapping is dangerous since another thread could touch
the shadow memory while it is unmapped. But there is really no need to
unmap anyway, since mmap(MAP_FIXED) will happily clobber the existing
mapping with zeroes. This is thread-safe since the mmap() is done under
the same kernel lock as page faults are done.
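The clobbering behaviour described above can be sketched as follows. This is not the DFSan runtime code: `remapZeroed` and `demoClobber` are hypothetical helpers, and the region here is an ordinary anonymous mapping rather than the shadow memory.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: replace [addr, addr + len) with fresh anonymous pages
// in a single mmap() call, without an intermediate munmap(). The address
// range therefore never becomes unmapped.
bool remapZeroed(void *addr, std::size_t len) {
  assert(len > 0);
  void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
  return p == addr;
}

// Dirty a mapping, remap over it in place, and check that it reads as zero.
bool demoClobber() {
  const std::size_t len = 4 * 4096;
  void *region = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (region == MAP_FAILED)
    return false;
  std::memset(region, 0xAB, len);

  bool ok = remapZeroed(region, len);
  const unsigned char *bytes = static_cast<const unsigned char *>(region);
  for (std::size_t i = 0; ok && i < len; ++i)
    ok = (bytes[i] == 0);

  munmap(region, len);
  return ok;
}
```

Because the old pages are replaced by the same call that installs the new ones, another thread touching the region sees either the old contents or zeroes, never an unmapped address.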
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D85947
Stephen Neuendorffer [Fri, 14 Aug 2020 18:37:51 +0000 (11:37 -0700)]
[examples][cmake] build fix for examples with BUILD_SHARED_LIBS=on
Differential Revision: https://reviews.llvm.org/D85987
Stephen Neuendorffer [Fri, 14 Aug 2020 18:26:29 +0000 (11:26 -0700)]
[mlir] build fix for gcc-5
It appears in this case that an implicit cast from StringRef to std::string
doesn't happen. Fixed with an explicit cast.
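A minimal sketch of the kind of fix described, assuming a reference type whose conversion to std::string must be spelled out. `Ref` and `toOwned` are hypothetical stand-ins; this is not llvm::StringRef and not the code touched by this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a non-owning string reference.
struct Ref {
  const char *data;
  std::size_t size;
  // Explicit conversion: `std::string s = r;` does not compile.
  explicit operator std::string() const { return std::string(data, size); }
};

std::string toOwned(Ref r) {
  assert(r.data != nullptr);
  // The explicit cast makes the copy visible at the call site and does not
  // depend on the compiler selecting an implicit conversion.
  return static_cast<std::string>(r);
}
```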
Differential Revision: https://reviews.llvm.org/D85986
Marius Brehler [Fri, 14 Aug 2020 06:28:01 +0000 (23:28 -0700)]
[mlir] Make mlir_check_link_libraries() work with interface libraries
This changes mlir_check_link_libraries() to work with interface libraries.
These don't have the LINK_LIBRARIES property.
Differential Revision: https://reviews.llvm.org/D85957
Sameer Arora [Wed, 5 Aug 2020 17:22:24 +0000 (10:22 -0700)]
[llvm-libtool-darwin] Support universal outputs
Add support for producing universal binaries containing archives when
`llvm-libtool-darwin` is given inputs of multiple architectures.
Reviewed by jhenderson, smeenai
Differential Revision: https://reviews.llvm.org/D85334
zacharyselk [Fri, 14 Aug 2020 18:27:30 +0000 (12:27 -0600)]
[clang-tools-extra] Added missing comma
The new diagnostic tool (D85545) caught a missing comma, adding one to fix the warning.
Differential Revision: https://reviews.llvm.org/D85978
Haowei Wu [Thu, 13 Aug 2020 21:19:06 +0000 (14:19 -0700)]
Remove unnecessary HEADER_DIRS in lib/InterfaceStub/CMakeLists.txt
This change removes unnecessary HEADER_DIRS from the
//llvm/lib/InterfaceStub/CMakeLists.txt file.
Differential Revision: https://reviews.llvm.org/D85936
Matt Arsenault [Sat, 1 Aug 2020 14:39:21 +0000 (10:39 -0400)]
TableGen/GlobalISel: Partially handle immAllOnesV/immAllZerosV
These should really match either G_BUILD_VECTOR or
G_BUILD_VECTOR_TRUNC, but there doesn't seem to be an existing
mechanism for matching alternative opcodes. There is GIM_SwitchOpcode,
but it seems to assume it's only used for matcher optimization.
I could also omit any opcode check and rely on the matcher directly
checking the opcode, but the table optimizer currently assumes there
has to be an opcode check.
This also doesn't try to handle undef elements as the DAG version does.
Simon Pilgrim [Fri, 14 Aug 2020 15:15:05 +0000 (16:15 +0100)]
[X86][SSE] Fold HOP(SHUFFLE(X),SHUFFLE(Y)) --> SHUFFLE(HOP(X,Y))
This is beginning to look like a canonicalization stage that could be performed as part of shuffle combining.
Another step towards PR41813.
Recommit of rG9bd97d036398 with fixed offset adjustments.
Matt Arsenault [Fri, 31 Jul 2020 17:48:58 +0000 (13:48 -0400)]
AMDGPU/GlobalISel: Match andn2/orn2 for more types
Unfortunately this ends up not working as expected on targets with
16-bit operations due to AMDGPUCodeGenPrepare's promotion of uniform
16-bit ops to i32.
The vector case annoyingly requires switching the checked opcode,
since constants for vectors aren't directly handled.
I also need to think more carefully about whether this is valid for i1.
Jim Ingham [Fri, 14 Aug 2020 00:41:14 +0000 (17:41 -0700)]
Add python enumerators for SBTypeEnumMemberList, and some tests for this API.
Differential Revision: https://reviews.llvm.org/D85951
Mehdi Amini [Fri, 14 Aug 2020 16:54:01 +0000 (16:54 +0000)]
Minor build fix (pointer must be dereferenced with `->`)
Julian Lettner [Tue, 11 Aug 2020 22:01:20 +0000 (15:01 -0700)]
[TSan][libdispatch] Add interceptors for dispatch_async_and_wait()
Add interceptors for `dispatch_async_and_wait[_f]()`, which were added in
macOS 10.14. This pair of functions is similar to `dispatch_sync()`,
but does not force a context switch of the queue onto the caller thread
when the queue is active (and hence is more efficient). For TSan, we
can apply the same semantics as for `dispatch_sync()`.
From the header docs:
> Differences with dispatch_sync()
>
> When the runtime has brought up a thread to invoke the asynchronous
> workitems already submitted to the specified queue, that servicing
> thread will also be used to execute synchronous work submitted to the
> queue with dispatch_async_and_wait().
>
> However, if the runtime has not brought up a thread to service the
> specified queue (because it has no workitems enqueued, or only
> synchronous workitems), then dispatch_async_and_wait() will invoke the
> workitem on the calling thread, similar to the behaviour of functions
> in the dispatch_sync family.
Additional context:
> The guidance is to use `dispatch_async_and_wait()` instead of
> `dispatch_sync()` when it is necessary to mix async and sync calls on
> the same queue. `dispatch_async_and_wait()` does not guarantee
> execution on the caller thread which allows to reduce context switches
> when the target queue is active.
> https://gist.github.com/tclementdev/6af616354912b0347cdf6db159c37057
rdar://35757961
Reviewed By: kubamracek
Differential Revision: https://reviews.llvm.org/D85854
Mehdi Amini [Fri, 14 Aug 2020 16:34:24 +0000 (16:34 +0000)]
Remove dependency from lib/CAPI/IR/IR.cpp on registerAllDialects() (build fix)
This library does not depend on all the dialects, conceptually. This is
changing the recently introduced `mlirContextLoadAllDialects()` function
to not call `registerAllDialects()` itself, which aligns it better with
the C++ code anyway (and this is deprecated and will be removed soon).
Stefan Gränitz [Fri, 14 Aug 2020 16:06:33 +0000 (18:06 +0200)]
[ORC] Build LLJITWithChildProcess example only on UNIX host systems
Differential Revision: https://reviews.llvm.org/D85919
Jonas Devlieghere [Fri, 14 Aug 2020 15:44:29 +0000 (08:44 -0700)]
[lldb] Remove Python 2 fallback and only support Python 3
This removes the fallback to Python 2 and makes Python 3 the only
supported configuration. This is the first step to fully migrate to
Python 3 over the coming releases as discussed on the mailing list.
http://lists.llvm.org/pipermail/lldb-dev/2020-August/016388.html
As a reminder, for the current release the test suite and the generated
bindings should remain compatible with Python 2.
Differential revision: https://reviews.llvm.org/D85942
Jordan Rupprecht [Fri, 14 Aug 2020 15:35:58 +0000 (08:35 -0700)]
[NFC] Silence variables unused in release builds
Jonas Devlieghere [Fri, 14 Aug 2020 15:32:21 +0000 (08:32 -0700)]
[lldb] Use file to synchronize TestDeepBundle and TestBundleWithDotInFilename
Currently these two tests use an arbitrary wait of 5 seconds for the
inferior to finish setting up. When the test machine is under heavy load
this is sometimes insufficient, leading to spurious test failures. This
patch adds synchronization through a token on the file system. In
addition to making the test more reliable it also makes it much faster
because we no longer have to wait the full 5 seconds if the setup was
completed faster than that.
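The idea can be sketched as below. The actual tests are lldb Python tests with their own inferior; this is only an illustration, and `signalReady` and `waitForToken` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <fstream>
#include <string>
#include <thread>

// Inferior side (hypothetical): create the token file once setup is done.
void signalReady(const std::string &token) {
  std::ofstream out(token);
  out << "ready\n";
}

// Test side: poll for the token and return as soon as it exists, instead of
// always sleeping for a fixed interval. Returns false if the timeout expires.
bool waitForToken(const std::string &token, std::chrono::milliseconds timeout,
                  std::chrono::milliseconds poll = std::chrono::milliseconds(10)) {
  assert(!token.empty());
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (std::chrono::steady_clock::now() < deadline) {
    if (std::ifstream(token).good())
      return true;
    std::this_thread::sleep_for(poll);
  }
  // One last check in case the token appeared during the final sleep.
  return std::ifstream(token).good();
}
```

With this shape the timeout can be generous, since it is only an upper bound: a fast setup returns after one poll interval, and a slow machine simply waits longer.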
Differential revision: https://reviews.llvm.org/D85915
Denis Antrushin [Fri, 14 Aug 2020 15:08:54 +0000 (22:08 +0700)]
[Statepoints] FixupStatepoint: properly set isKill on spilled register.
When spilling a statepoint meta arg register, it is incorrect to blindly
mark it as killed - it may be used in non-meta args (e.g., as a call
parameter).
Matt Morehouse [Fri, 14 Aug 2020 15:17:35 +0000 (08:17 -0700)]
Revert "[NFC][StackSafety] Move out sort from the loop"
This reverts commit 0426e28419799c35cf52fe3d773c5bab9928c699 due to an
ASan buildbot failure.