Max Kazantsev [Mon, 26 Oct 2020 07:55:46 +0000 (14:55 +0700)]
Fix broken build after previous commit
Max Kazantsev [Mon, 26 Oct 2020 07:53:22 +0000 (14:53 +0700)]
[NFC] Remove unused function param
Max Kazantsev [Mon, 26 Oct 2020 07:49:37 +0000 (14:49 +0700)]
[NFC] Factor out common code into lambda for further improvement
Max Kazantsev [Mon, 26 Oct 2020 06:47:11 +0000 (13:47 +0700)]
[IndVars] Use contextual knowledge when proving trivial conds
No exact example where it would help, but it's generally a more
powerful way to prove predicates.
Kirill Bobyrev [Mon, 26 Oct 2020 06:08:49 +0000 (07:08 +0100)]
[clangd] Add dependency on remote index service proto
It requires Index.proto to be built first. Failed builds:
https://github.com/clangd/clangd/runs/1305985916
Christudasan Devadasan [Fri, 9 Oct 2020 11:20:24 +0000 (16:50 +0530)]
[AMDGPU] Avoid offset register in MUBUF for direct stack object accesses
We use an absolute address for stack objects, so it is
necessary to have a constant 0 in the soffset field.
Fixes: SWDEV-228562
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D89234
Craig Topper [Mon, 26 Oct 2020 03:40:45 +0000 (20:40 -0700)]
[X86] Don't disassemble wbinvd with 0xf2 or 0x66 prefix.
The 0xf3 prefix has been defined as wbnoinvd on Icelake Server. So
the prefix isn't ignored by the CPU. AMD documentation suggests that
wbnoinvd is treated as wbinvd on older processors. Intel documentation
is not clear. Perhaps 0xf2 and 0x66 are treated the same, but it's
not documented.
This patch changes TB to PS in the td file so 0xf2 and 0x66 will
be treated as errors. This matches versions of objdump after
wbnoinvd was added.
Liu, Chen3 [Fri, 23 Oct 2020 03:32:19 +0000 (11:32 +0800)]
[X86] VEX/EVEX prefix doesn't work for inline assembly.
Currently, we lose the encoding information when using inline assembly.
The encoding for the inline assembly stays at the default even if we add
the vex/evex prefix.
Differential Revision: https://reviews.llvm.org/D90009
Craig Topper [Sun, 25 Oct 2020 19:46:53 +0000 (12:46 -0700)]
[X86] Use TargetConstant for immediates for VASTART_SAVE_XMM_REGS.
Craig Topper [Sun, 25 Oct 2020 19:19:05 +0000 (12:19 -0700)]
[X86] Use TargetConstant instead of Constant for operands to X86vaarg64.
Sanjay Patel [Sun, 25 Oct 2020 19:17:52 +0000 (15:17 -0400)]
[CostModel] remove cost-kind predicate for some vector reduction costs
This is a modified 2nd try of 22d10b8ab44f (reverted by 1c8371692d
because it managed to expose an existing crashing bug that should be
fixed by 74ffc823).
Original commit message:
This is similar in spirit to 01ea93d85d6e (memcpy) except that
here the underlying caller assumptions were created for vectorizer
use (throughput) rather than other passes.
That meant targets could have an enormous throughput cost with no
corresponding size, latency, or blended cost increase.
The ARM costs show a small difference between throughput and
size because there's an underlying difference in cmp/sel
costs that is also predicated on cost-kind.
Paraphrasing from the previous commits:
This may not make sense for some callers, but at least now the
costs will be consistently wrong instead of mysteriously wrong.
Targets should provide better overrides if the current modeling
is not accurate.
Sanjay Patel [Sun, 25 Oct 2020 18:58:13 +0000 (14:58 -0400)]
[CostModel] fix operand/type accounting for fadd/fmul reductions
I'm not sure if/how this ever worked, but it must not be tested
currently because the basic tests added here were crashing as
noted in the post-review comments for 1c83716 (which reverted
another cost-model fix in 22d10b8ab44f).
Nikita Popov [Sun, 25 Oct 2020 18:39:07 +0000 (19:39 +0100)]
[SCEV] Strengthen nowrap flags after constant folding for mul exprs
Same change as 0dda6333175c1749f12be660456ecedade3bcf21, but for
mul expressions. We want to first fold any constant operands and
then strengthen the nowrap flags, as we can compute more precise
flags at that point.
Aaron Puchert [Sat, 25 Jul 2020 23:53:32 +0000 (01:53 +0200)]
Thread safety analysis: Nullability improvements in TIL, NFCI
The constructor of Project asserts that the contained ValueDecl is not
null; use that in the ThreadSafetyAnalyzer. In the case of LiteralPtr
it's the other way around.
Also dyn_cast<> is sufficient if we know something isn't null.
Aaron Puchert [Sun, 25 Oct 2020 18:31:53 +0000 (19:31 +0100)]
Thread safety analysis: Consider global variables in scope
Instead of just mutex members we also consider mutex globals.
Unsurprisingly they are always in scope. Now the paper [1] says that
> The scope of a class member is assumed to be its enclosing class,
> while the scope of a global variable is the translation unit in
> which it is defined.
But I don't think we should limit this to TUs where a definition is
available - a declaration is enough to acquire the mutex, and if a mutex
is really limited in scope to a translation unit, it should probably be
only declared there.
The previous attempt in 9dcc82f34ea was causing false positives because
I wrongly assumed that LiteralPtrs were always globals, which they are
not. This should be fixed now.
[1] https://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/42958.pdf
Fixes PR46354.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D84604
Nikita Popov [Sun, 25 Oct 2020 17:46:27 +0000 (18:46 +0100)]
[SCEV] Always constant fold mul expression operands
Establish parity with the handling of add expressions, by always
constant folding mul expression operands before checking the depth
limit (this is a non-recursive simplification). The code was already
unconditionally constant folding the case where all operands were
constants, but was not folding multiple constant operands together
if there were also non-constant operands.
This requires picking out a different demonstration for depth-based
folding differences in the limit-depth.ll test.
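As a rough illustration only (a Python sketch, not the SCEV code), folding
several constant operands together while non-constant operands are present
could look like the following. The name `fold_mul_constants`, the use of
strings for non-constant operands, and the 32-bit width are assumptions made
for this example.

```python
from math import prod


def fold_mul_constants(operands, bit_width=32):
    """Fold every integer constant in a multiply's operand list into one.

    Integers stand for constants; anything else (here: strings) stands for
    a non-constant operand. This is a hypothetical model, not LLVM's API.
    """
    mask = (1 << bit_width) - 1
    constants = [op for op in operands if isinstance(op, int)]
    symbols = [op for op in operands if not isinstance(op, int)]
    if not constants:
        return symbols
    folded = prod(constants) & mask  # wrap to the modelled bit width
    if folded == 0:
        return [0]                   # anything times zero is zero
    if folded == 1 and symbols:
        return symbols               # multiplying by one is a no-op
    return [folded] + symbols


print(fold_mul_constants([2, "x", 3, "y"]))
```

The folding step does not recurse into the operands, which is why it is cheap
enough to run before any depth limit is checked.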
Nikita Popov [Sun, 25 Oct 2020 17:14:20 +0000 (18:14 +0100)]
[SCEV] Separate out constant folding in mul expr creation
Separate out the code handling constant folding into a separate
block, that is independent of other folds that need a constant
first operand. Also make some minor adjustments to make the
constant folding look nearly identical to the same code in
getAddExpr().
The only reason this change is not strictly NFC is that the
C1*(C2+V) fold is moved below the constant folding, which means
that it now also applies to C1*C2*(C3+V), as it should.
Nikita Popov [Sun, 25 Oct 2020 16:13:38 +0000 (17:13 +0100)]
[SCEV] Strengthen nowrap flags after constant folding
We should first try to constant fold the add expression and only
strengthen nowrap flags afterwards. This allows us to determine
stronger flags if e.g. only two operands are left after constant
folding (and thus "guaranteed no wrap region" code applies) or the
resulting operands are non-negative and thus nsw->nuw strengthening
applies.
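The nsw->nuw strengthening mentioned above can be checked by brute force on a
small bit width. The following is a Python sketch under assumptions: the
helper names and the 8-bit width are made up for the example, and it models
only a two-operand add, not the SCEV implementation.

```python
def add_wrap_flags(a, b, bits=8):
    """Return (no_signed_wrap, no_unsigned_wrap) for a + b.

    a and b are given as signed values of the modelled bit width.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    assert lo <= a <= hi and lo <= b <= hi
    mask = (1 << bits) - 1
    nsw = lo <= a + b <= hi                 # signed result is representable
    nuw = (a & mask) + (b & mask) <= mask   # unsigned result is representable
    return nsw, nuw


def nsw_implies_nuw_when_non_negative(bits=8):
    """Exhaustively check: non-negative operands + nsw imply nuw."""
    for a in range(1 << (bits - 1)):
        for b in range(1 << (bits - 1)):
            nsw, nuw = add_wrap_flags(a, b, bits)
            if nsw and not nuw:
                return False
    return True


print(add_wrap_flags(-1, 1))  # a negative operand: nsw holds, nuw does not
print(nsw_implies_nuw_when_non_negative())
```

With a possibly negative operand the implication fails, which is why the
operands left after folding matter for which flags can be inferred.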
Nikita Popov [Sun, 25 Oct 2020 16:15:55 +0000 (17:15 +0100)]
[IndVars] Regenerate test checks (NFC)
Also run the test case through -instnamer.
Sanjay Patel [Sun, 25 Oct 2020 15:09:35 +0000 (11:09 -0400)]
[InstSimplify] peek through 'not' operand in logic-of-icmps fold
This extends D78430 to solve cases like:
https://llvm.org/PR47858
There are still missed opportunities shown in the tests,
and as noted in the earlier patches, we have related
functionality in InstCombine, so we may want to extend
other folds in a similar way.
A semi-random sampling of test diff proofs in this patch:
https://rise4fun.com/Alive/sS4C
Sanjay Patel [Sun, 25 Oct 2020 14:30:23 +0000 (10:30 -0400)]
[InstSimplify] add tests for logic-of-cmps with not op; NFC
One variant of this is shown in:
https://llvm.org/PR47858
Melanie Blower [Sun, 25 Oct 2020 15:10:24 +0000 (08:10 -0700)]
Correct LIT test failure detected on buildbot after mibintc committed rG2e204e23911b: [clang] Enable support for #pragma STDC FENV_ACCESS D87528
Florian Hahn [Sun, 25 Oct 2020 12:57:05 +0000 (12:57 +0000)]
[SLP] Add AArch64 tests with vectorizable compare/select patterns.
This patch adds an additional set of tests that can be vectorized
efficiently on AArch64, using CMxx & BFI.
Simon Pilgrim [Sun, 25 Oct 2020 14:14:09 +0000 (14:14 +0000)]
Remove superfluous whitespace around if(). NFC.
Melanie Blower [Tue, 29 Sep 2020 17:44:36 +0000 (10:44 -0700)]
[clang] Enable support for #pragma STDC FENV_ACCESS
Reviewers: rjmccall, rsmith, sepavloff
Differential Revision: https://reviews.llvm.org/D87528
Simon Pilgrim [Sun, 25 Oct 2020 10:17:45 +0000 (10:17 +0000)]
[InstCombine] matchBSwapOrBitReverse - recognise or(fshl(),fshl()) bswap patterns.
I'm not certain InstCombinerImpl::matchBSwapOrBitReverse needs to filter the or(op0(),op1()) ops - there are just too many cases that recognizeBSwapOrBitReverseIdiom/collectBitParts handle now (and quickly).
Simon Pilgrim [Sun, 25 Oct 2020 10:07:19 +0000 (10:07 +0000)]
[InstCombine] Add test for or(fshl(),fshl()) bswap pattern.
Currently InstCombinerImpl::matchBSwapOrBitReverse won't match starting from funnel shifts.
Richard Smith [Sun, 25 Oct 2020 07:28:48 +0000 (00:28 -0700)]
[c++20] For P0732R2: Support string literal operator templates.
Craig Topper [Sun, 25 Oct 2020 07:25:19 +0000 (00:25 -0700)]
[X86] Use TargetConstant for FPDiff with X86::TC_RETURN.
It's required to be a constant and can never be in a register so
make it explicit.
Martin Storsjö [Sun, 25 Oct 2020 06:35:33 +0000 (08:35 +0200)]
Revert "[CostModel] remove cost-kind predicate for vector reduction costs"
This reverts commit 22d10b8ab44f703b72b8316a9b3b8adc623ca73f.
This broke compilation e.g. like this:
$ cat synth.c
*a;
float *b;
c() {
  for (;;) {
    float d = -*b * *a++;
    d -= *--b * *a++;
    d -= *--b * *a;
    d -= *--b * *a;
    e(d);
  }
}
$ clang -target x86_64-linux-gnu -c -O2 -ffast-math synth.c
clang: ../include/llvm/Support/Casting.h:104: static bool llvm::isa_impl_cl<To, const From*>::doit(const From*) [with To = llvm::PointerType; From = llvm::Type]: Assertion `Val && "isa<> used on a null pointer"' failed.
Teresa Johnson [Sun, 25 Oct 2020 06:07:34 +0000 (23:07 -0700)]
[MemProf] Temporarily disable part of test
Disable the part of this test that started failing only on the
llvm-avr-linux bot after 5c20d7db9f2791367b9311130eb44afecb16829c.
Unfortunately, "XFAIL: avr" does not work. Still in the process of
trying to figure out how to debug.
Richard Smith [Sun, 25 Oct 2020 05:08:24 +0000 (22:08 -0700)]
For P0732R2, P1907R1: ensure that template parameter objects don't refer
to disallowed objects or have non-constant destruction.
Nathan Ridge [Tue, 13 Oct 2020 06:09:45 +0000 (02:09 -0400)]
[clangd] Add a TestWorkspace utility
TestWorkspace allows easily writing tests involving multiple
files that can have inclusion relationships between them.
BackgroundIndexTest.RelationsMultiFile is refactored to use
TestWorkspace, and moved to FileIndexTest as it no longer
depends on BackgroundIndex.
Differential Revision: https://reviews.llvm.org/D89297
Arthur Eubanks [Sat, 24 Oct 2020 23:26:48 +0000 (16:26 -0700)]
Fix typo SSC -> SCC
Fangrui Song [Sat, 24 Oct 2020 22:13:47 +0000 (15:13 -0700)]
[ELF] Don't crash on R_X86_64_GOTPCRELX for test/binop instructions
While MC did not produce R_X86_64_GOTPCRELX for test/binop instructions
(movl/adcl/addl/andl/...) before the previous commit, this code path has been
exercised by -fno-integrated-as for GNU as since 2016: -no-pie relaxing
may incorrectly access loc[-3] and produce a corrupted instruction.
Simply handle test/binop R_X86_64_GOTPCRELX like R_X86_64_GOTPCREL.
Fangrui Song [Sat, 24 Oct 2020 20:48:55 +0000 (13:48 -0700)]
[X86] Produce R_X86_64_GOTPCRELX for test/binop instructions (MOV32rm/TEST32rm/...) when -Wa,-mrelax-relocations=yes is enabled
We have been producing R_X86_64_REX_GOTPCRELX (MOV64rm/TEST64rm/...) and
R_X86_64_GOTPCRELX for CALL64m/JMP64m without the REX prefix since 2016 (to be
consistent with GNU as), but not for MOV32rm/TEST32rm/...
Drew Fisher [Sat, 24 Oct 2020 21:24:10 +0000 (14:24 -0700)]
[asan] Fix stack-use-after-free checks on non-main thread on Fuchsia
While some platforms call `AsanThread::Init()` from the context of the
thread being started, others (like Fuchsia) call `AsanThread::Init()`
from the context of the thread spawning a child. Since
`AsyncSignalSafeLazyInitFakeStack` writes to a thread-local, we need to
avoid calling it from the spawning thread on Fuchsia. Skipping the call
here on Fuchsia is fine; it'll get called from the new thread lazily on first
attempted access.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D89607
Drew Fisher [Sat, 24 Oct 2020 19:26:40 +0000 (12:26 -0700)]
[asan][fuchsia] set current thread before reading thread state
When enabling stack use-after-free detection, we discovered that we read
the thread ID on the main thread while it is still set to 2^24-1.
This patch moves our call to AsanThread::Init() out of CreateAsanThread,
so that we can call SetCurrentThread first on the main thread.
Reviewed By: mcgrathr
Differential Revision: https://reviews.llvm.org/D89606
Fangrui Song [Sat, 24 Oct 2020 19:46:47 +0000 (12:46 -0700)]
[AArch64][GlobalISel] Fix -Wunused-variable. NFC
Nico Weber [Sat, 24 Oct 2020 19:04:22 +0000 (15:04 -0400)]
Revert "hwasan: Disable operator {new,delete} interceptors when interceptors are disabled."
This reverts commit fa66bcf4bc9467514dddacdba711a42e0a83cf9d.
Seems to break tests, see https://reviews.llvm.org/D89827#2351930
Sanjay Patel [Fri, 23 Oct 2020 17:44:45 +0000 (13:44 -0400)]
[CostModel] remove cost-kind predicate for vector reduction costs
This is similar in spirit to 01ea93d85d6e (memcpy) except that
here the underlying caller assumptions were created for vectorizer
use (throughput) rather than other passes.
That meant targets could have an enormous throughput cost with no
corresponding size, latency, or blended cost increase.
The ARM costs show a small difference between throughput and
size because there's an underlying difference in cmp/sel
costs that is also predicated on cost-kind.
Paraphrasing from the previous commits:
This may not make sense for some callers, but at least now the
costs will be consistently wrong instead of mysteriously wrong.
Targets should provide better overrides if the current modeling
is not accurate.
Benjamin Kramer [Sat, 24 Oct 2020 16:00:33 +0000 (18:00 +0200)]
[X86] Add a stub for Intel's alderlake.
No scheduling, no autodetection.
Benjamin Kramer [Sat, 24 Oct 2020 16:00:20 +0000 (18:00 +0200)]
[X86] Add a stub for znver3 based on the little public information there is in AMD's manuals
No scheduling, no autodetection. Just enough so -march=znver3 works.
Benjamin Kramer [Sat, 24 Oct 2020 13:40:24 +0000 (15:40 +0200)]
Unbreak the clang-interpreter example after 0aec49c8531bc5282b095730d34681455826bc2c
dfukalov [Thu, 22 Oct 2020 16:38:56 +0000 (19:38 +0300)]
[AMDGPU][CostModel] Refine cost model for half- and quarter-rate instructions.
1. Throughput and codesize cost estimations were separated and updated.
2. Updated fdiv cost estimation for different cases.
3. Added scalarization processing for types that are treated as !isSimple() to
improve codesize estimation in getArithmeticInstrCost() and
getArithmeticInstrCost(). The code was borrowed from TCK_RecipThroughput path
of base implementation.
The next step is to unify the scalarization part in the base class, which
currently works for the TCK_RecipThroughput path only.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D89973
David Green [Sat, 24 Oct 2020 16:22:49 +0000 (17:22 +0100)]
[ARM] Remove some dead code. NFC
Andrzej Warzynski [Sat, 24 Oct 2020 16:04:25 +0000 (17:04 +0100)]
[flang][tests] Fix Python bug in the lit config
Without this change LIT tests for Flang fail with:
```
TypeError: append() takes exactly one argument (2 given)
```
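For context, a minimal Python sketch of how this class of error arises.
`list.append` accepts exactly one argument; the list name and values here are
made up for the example and are not the actual lit config contents.

```python
# Hypothetical list standing in for whatever the lit config appends to.
entries = []

try:
    entries.append("name", "value")  # two arguments: raises TypeError
except TypeError as err:
    message = str(err)

# Two common ways to express the intent with a single argument:
entries.append(("name", "value"))    # append one tuple
entries.extend(["other", "items"])   # or add several elements at once

print(message)
print(entries)
```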
Stefan Gränitz [Sat, 24 Oct 2020 14:58:06 +0000 (16:58 +0200)]
Reapply "[jitlink][ELF] Add zero-fill blocks for symbols in section SHN_COMMON"
Root cause of the test failure was fixed with:
[JITLink][ELF] PCRel32GOTLoad edge offset can be smaller than three
This reverts commit 10b1a61bafba39fd7400a814a7272f41222ad579.
Stefan Gränitz [Sat, 24 Oct 2020 14:53:06 +0000 (16:53 +0200)]
[JITLink][ELF] PCRel32GOTLoad edge offset can be smaller than three
Offset is 2 for MOVL instruction in test ELF_x86-64_common. This should fix the test failures.
Differential Revision: https://reviews.llvm.org/D89795
Caroline Concatto [Sat, 24 Oct 2020 11:33:19 +0000 (12:33 +0100)]
[Flang][Driver] Add infrastructure for basic frontend actions and file I/O
This patch introduces the dependencies required to read and manage input files
provided by the command line option. It also adds the infrastructure to create
and write to output files. The output is sent to either stdout or a file
(specified with the `-o` flag).
Separately, in order to be able to test the code for file I/O, it adds
infrastructure to create frontend actions. As a basic testable example, it adds
the `InputOutputTest` FrontendAction. The sole purpose of this action is to
read a file from the command line and print it either to stdout or the output
file. This action is run by using the `-test-io` flag also introduced in this
patch (available for `flang-new` and `flang-new -fc1`). With this patch:
```
flang-new -test-io input-file.f90
```
will read input-file.f90 and print it in the output file.
The `InputOutputTest` frontend action has been introduced primarily to
facilitate testing. It is hidden from users (i.e. it's only displayed with
`--help-hidden`). Currently Clang doesn’t have an equivalent action.
`-test-io` is used to trigger the InputOutputTest action in the Flang frontend
driver. This patch makes sure that “flang-new” forwards it to “flang-new -fc1”
by creating a preprocessor job. However, in Flang.cpp, `-test-io` is passed to
“flang-new -fc1” without `-E`. This way we make sure that the preprocessor is
_not_ run in the frontend driver. This is the desired behaviour: `-test-io`
should only read the input file and print it to the output stream.
co-authored-by: Andrzej Warzynski <andrzej.warzynski@arm.com>
Differential Revision: https://reviews.llvm.org/D87989
TaWeiTu [Sat, 24 Oct 2020 13:50:33 +0000 (21:50 +0800)]
[NPM] Port -loop-versioning-licm to NPM
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D89371
Stefan Gränitz [Sat, 24 Oct 2020 13:41:48 +0000 (15:41 +0200)]
Revert "[jitlink][ELF] Add zero-fill blocks for symbols in section SHN_COMMON"
This reverts commit e9955b0843cc1e5876430f3f051494d4197419f3. Cannot reproduce the buildbot failures yet. Reverting in the meantime.
TaWeiTu [Sat, 24 Oct 2020 13:39:42 +0000 (21:39 +0800)]
[LoopVersioning] Form dedicated exits for versioned loop to preserve simplify form
The exit blocks of the versioned and non-versioned loops are not dedicated and thus the two loops are not in simplify form.
Insert dummy exit blocks after loop versioning with `formDedicatedExits()` to preserve the simplify form for subsequent passes.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D89569
Stefan Gränitz [Sat, 24 Oct 2020 12:27:19 +0000 (14:27 +0200)]
[jitlink][ELF] Add zero-fill blocks for symbols in section SHN_COMMON
Symbols with special section index SHN_COMMON (0xfff2) haven't been handled so far and caused an invalid section error.
This is a more or less straightforward use of the code commented out at the end of the function. I checked the ELF spec to confirm that the symbol value gives the alignment.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D89795
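To make the SHN_COMMON convention concrete, here is a small Python sketch
that decodes a little-endian ELF64 symbol table entry and treats `st_value`
as the alignment when the section index is SHN_COMMON. The function name and
the returned dictionaries are assumptions for illustration; this is not the
JITLink code, and other special section indexes are not handled.

```python
import struct

SHN_COMMON = 0xFFF2

# Elf64_Sym layout: st_name, st_info, st_other, st_shndx, st_value, st_size
ELF64_SYM = struct.Struct("<IBBHQQ")


def describe_symbol(raw):
    """Classify one 24-byte Elf64_Sym record (hypothetical helper)."""
    _name, _info, _other, st_shndx, st_value, st_size = ELF64_SYM.unpack(raw)
    if st_shndx == SHN_COMMON:
        # For common symbols st_value holds the alignment constraint, so the
        # symbol can be modelled as a zero-filled block of st_size bytes.
        return {"kind": "zero-fill", "size": st_size, "alignment": st_value}
    return {"kind": "defined", "section": st_shndx,
            "address": st_value, "size": st_size}


common = ELF64_SYM.pack(1, 0x11, 0, SHN_COMMON, 8, 64)
print(describe_symbol(common))
```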
Stefan Gränitz [Fri, 23 Oct 2020 19:52:12 +0000 (21:52 +0200)]
[JITLink][ELF] PCRel32GOTLoad relocations are resolved like regular PCRel32 ones
The difference is that the former are indirect and go to the GOT while the latter go to the target directly. This info can be used to relax indirect ones that don't need the GOT (because the target is in range). We check for this optimization beforehand. For formal correctness and to avoid confusion, we should only change the relocation kind if we actually apply the relaxation.
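A simplified Python model of the idea, under stated assumptions: the fixup
value is computed as target + addend - fixup address, the edge kinds are
plain strings, and `pick_edge` is a hypothetical helper. Instruction
rewriting, which a real relaxation also needs, is left out.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def fits_int32(value):
    return INT32_MIN <= value <= INT32_MAX


def pcrel32_value(target, fixup_address, addend):
    """Compute the 32-bit PC-relative field for either edge kind."""
    delta = target + addend - fixup_address
    if not fits_int32(delta):
        raise OverflowError("PC-relative delta does not fit in 32 bits")
    return delta & 0xFFFFFFFF


def pick_edge(kind, symbol_address, got_entry_address, fixup_address, addend):
    """Change the edge kind only when the relaxation is actually applied."""
    if kind == "PCRel32GOTLoad":
        if fits_int32(symbol_address + addend - fixup_address):
            return "PCRel32", symbol_address       # relaxed: go direct
        return "PCRel32GOTLoad", got_entry_address  # keep going via the GOT
    return kind, symbol_address


print(pick_edge("PCRel32GOTLoad", 0x1000, 0x2000, 0x1100, -4))
print(hex(pcrel32_value(0x1000, 0x1100, -4)))
```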
Simon Pilgrim [Sat, 24 Oct 2020 11:51:51 +0000 (12:51 +0100)]
Fix some signed/unsigned comparison gcc warnings from D87930
Simon Pilgrim [Sat, 24 Oct 2020 11:42:43 +0000 (12:42 +0100)]
[InstCombine] narrowFunnelShift - fold trunc/zext or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) (PR35155)
As discussed on PR35155, this extends narrowFunnelShift (recently renamed from narrowRotate) to support basic funnel shift patterns.
Unlike matchFunnelShift we don't include the computeKnownBits limitation as extracting the pattern from the zext/trunc layers should be an indicator of reasonable funnel shift codegen; in D89139 we demonstrated how to efficiently promote funnel shifts to wider types.
Differential Revision: https://reviews.llvm.org/D89542
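The equivalence behind this fold can be checked numerically. The Python
sketch below models `fshl` as "concatenate a:b, shift left, keep the high
half" and compares it with the or/shl/lshr form for shift amounts strictly
between 0 and the bit width. The function names are chosen for this example
only. Python integers are unbounded, so the or/shift form is in effect
evaluated in a wider type and then truncated by the mask.

```python
def fshl(a, b, x, bw):
    """Funnel shift left on bw-bit values: high half of (a:b) << (x mod bw)."""
    mask = (1 << bw) - 1
    x %= bw
    concat = ((a & mask) << bw) | (b & mask)
    return (concat >> (bw - x)) & mask


def or_shift_pattern(a, b, x, bw):
    """trunc(or(shl(a, x), lshr(b, sub(bw, x)))) for 0 < x < bw."""
    mask = (1 << bw) - 1
    return ((a << x) | (b >> (bw - x))) & mask


print(hex(fshl(0x12, 0x34, 4, 8)))
print(all(fshl(a, b, x, 8) == or_shift_pattern(a, b, x, 8)
          for a in range(256) for b in range(0, 256, 17) for x in range(1, 8)))
```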
Simon Pilgrim [Sat, 24 Oct 2020 11:23:09 +0000 (12:23 +0100)]
[DAG] Add BuildVectorSDNode::getRepeatedSequence helper to recognise multi-element splat patterns
Replace the X86 specific isSplatZeroExtended helper with a generic BuildVectorSDNode method.
I've just used this to simplify the X86ISD::BROADCASTM lowering so far (and remove isSplatZeroExtended), but we should be able to use this in more places to lower to complex broadcast patterns.
Differential Revision: https://reviews.llvm.org/D87930
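A sketch of what recognising a multi-element splat can mean, written in
Python. It looks for the shortest power-of-two-length prefix that repeats
across the whole vector. The function name, the restriction to power-of-two
sizes, and the absence of undef handling are assumptions of this example, not
a description of the BuildVectorSDNode method.

```python
def repeated_sequence(elements):
    """Return the shortest repeating power-of-two prefix, or None."""
    n = len(elements)
    if n == 0 or n & (n - 1):
        return None  # this sketch only considers power-of-two vector sizes
    seq_size = 1
    while seq_size < n:
        if all(elements[i] == elements[i % seq_size] for i in range(n)):
            return elements[:seq_size]
        seq_size *= 2
    return None


print(repeated_sequence([1, 2, 1, 2, 1, 2, 1, 2]))
print(repeated_sequence([1, 2, 3, 4]))
```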
Simon Pilgrim [Sat, 24 Oct 2020 10:30:32 +0000 (11:30 +0100)]
[LegalizeTypes] Legalize vector rotate operations
Lower vector rotate operations as long as the legalization occurs outside of LegalizeVectorOps.
This fixes https://bugs.llvm.org/show_bug.cgi?id=47320
Patch By: @rsanthir.quic (Ryan Santhirarajan)
Differential Revision: https://reviews.llvm.org/D89497
Nikita Popov [Sat, 24 Oct 2020 08:15:30 +0000 (10:15 +0200)]
[BasicAA] Avoid duplicate cache lookup (NFCI)
Rather than performing the cache lookup with both possible orders
for the locations, use the same canonicalization as the other
AliasCache lookups in BasicAA.
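The general technique, sketched in Python with made-up names: order the pair
once so that (A, B) and (B, A) map to the same key, and a single lookup
suffices. The tuple representation of a location and the `AliasCache` class
below are hypothetical and only illustrate the canonicalization idea.

```python
def canonical_key(loc_a, loc_b):
    """Order the two locations so both query orders share one key."""
    return (loc_a, loc_b) if loc_a <= loc_b else (loc_b, loc_a)


class AliasCache:
    """Toy cache keyed by an unordered pair of locations."""

    def __init__(self):
        self._results = {}
        self.lookups = 0

    def get(self, loc_a, loc_b):
        self.lookups += 1  # exactly one dictionary lookup per query
        return self._results.get(canonical_key(loc_a, loc_b))

    def put(self, loc_a, loc_b, result):
        self._results[canonical_key(loc_a, loc_b)] = result


cache = AliasCache()
cache.put(("p", 4), ("q", 8), "NoAlias")
print(cache.get(("q", 8), ("p", 4)), cache.lookups)
```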
Nikita Popov [Fri, 23 Oct 2020 18:50:05 +0000 (20:50 +0200)]
[BasicAA] Fix caching in the presence of phi cycles
Any time we insert a block into VisitedPhiBBs, previously cached
values may no longer be valid for the recursive alias queries. As
such, perform them using an empty AAQueryInfo.
Note that if we recurse to the same phi, the block will already
be inserted, so we reuse the old AAQueryInfo, and thus still
protect against infinite recursion.
This problem can appear with and without BatchAA, but is more
likely to occur with BatchAA, as more values are cached.
Differential Revision: https://reviews.llvm.org/D90066
Jonas Paulsson [Fri, 23 Oct 2020 18:45:54 +0000 (20:45 +0200)]
[SystemZ] Define MaxInstLength to have the value of 6.
This value had the default value of 4 which caused branch relaxation to fail.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D90065
Michał Górny [Wed, 14 Oct 2020 07:57:33 +0000 (09:57 +0200)]
[lldb] [Process/NetBSD] Use XStateRegSet for all FPU registers
Unify the x86 regset API to use XStateRegSet for all FPU registers,
therefore eliminating the legacy API based on FPRegSet. This makes
the code a little bit simpler but most notably, it provides future
compatibility for register caching.
Since the NetBSD kernel takes care of providing compatibility with
pre-XSAVE processors, PT_{G,S}ETXSTATE can be used on systems supporting
only FXSAVE or even plain FSAVE (and unlike PT_{G,S}ETXMMREGS, it
clearly indicates that XMM registers are not supported).
Differential Revision: https://reviews.llvm.org/D90034
Martin Storsjö [Sat, 24 Oct 2020 06:32:11 +0000 (09:32 +0300)]
[lldb] Fix building with GCC 7. NFC.
Tony [Thu, 22 Oct 2020 07:15:41 +0000 (07:15 +0000)]
[AMDGPU] Cleanup AMDGPUUsage.rst
- Layout and typo improvements.
- Add memory spaces section.
- reStructuredText syntax fixes.
Differential Revision: https://reviews.llvm.org/D90002
Michael Kruse [Sat, 24 Oct 2020 03:59:57 +0000 (22:59 -0500)]
[flang] Fix pimpl idiom for IntrinsicProcTable.
The class IntrinsicProcTable uses the pimpl idiom and manages its own pointer-to-implementation. However, it violates the rule-of-five and does not implement a move-constructor or assignment-operator. Due to differences between compilers in their implementation of copy elision, these may or may not be used. Due to the missing user implementation for resource handling, using them results in runtime errors.
Fix by using `std::unique_ptr` instead of custom resource management.
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D88794
Med Ismail Bennani [Fri, 23 Oct 2020 02:17:07 +0000 (04:17 +0200)]
[llvm/DebugInfo] Emit DW_OP_implicit_value when tuning for LLDB
This patch enables emitting DWARF `DW_OP_implicit_value` opcode when
tuning debug information for LLDB (`-debugger-tune=lldb`).
This will also propagate to Darwin platforms, since they use LLDB tuning
as a default.
rdar://67406059
Differential Revision: https://reviews.llvm.org/D90001
Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
Vitaly Buka [Sat, 24 Oct 2020 04:09:07 +0000 (21:09 -0700)]
[NFC][UBSAN] Refine CHECK pattern in test
As-is, the test was failing because of an unrelated linker warning with
the filename in the output.
Peter Collingbourne [Tue, 20 Oct 2020 21:36:34 +0000 (14:36 -0700)]
hwasan: Disable operator {new,delete} interceptors when interceptors are disabled.
Differential Revision: https://reviews.llvm.org/D89827
Michael Kruse [Sat, 24 Oct 2020 03:24:22 +0000 (22:24 -0500)]
[flang][msvc] Fix lambda capture ambiguity. NFC.
Patch D88695 introduces a new local variable inside a lambda with the same name as a variable outside of it. In some of the if constexpr regions, msvc prioritizes the outer declaration and emits the error.
```
C:\Users\meinersbur\src\llvm-project\flang\lib\Evaluate\fold-implementation.h(1200): error C3493: 'context' cannot be implicitly captured because no default capture mode has been specified
```
This is fixed by giving the inner variable a different name.
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D89367
Michael Kruse [Fri, 23 Oct 2020 14:50:30 +0000 (09:50 -0500)]
[flang][windows] Support platform-specific path separator.
Remove the assumption that the path separator is `/`. Use functions from `llvm::sys::path` instead.
Reviewed By: isuruf, klausler
Differential Revision: https://reviews.llvm.org/D89369
Zequan Wu [Wed, 14 Oct 2020 01:40:45 +0000 (18:40 -0700)]
[llvm-cov] don't include all source files when provided source files are filtered out
When all provided source files are filtered out, either due to `--ignore-filename-regex` or because they are not part of the binary, don't generate coverage results for all source files. If users want to generate coverage results for all source files, they don't even need to provide selected source files or `--ignore-filename-regex`.
Differential Revision: https://reviews.llvm.org/D89359
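The intended behaviour can be modelled roughly as follows. This is a Python
sketch with hypothetical names (`select_sources`, `requested`, `in_binary`),
not llvm-cov's implementation: an explicit request that filters down to
nothing yields nothing, and only an empty request falls back to every source
file in the binary.

```python
import re


def select_sources(requested, in_binary, ignore_regex=None):
    """Pick the source files to report on (illustrative model only)."""
    ignored = re.compile(ignore_regex) if ignore_regex else None

    def keep(path):
        if path not in in_binary:
            return False
        return ignored is None or not ignored.search(path)

    if requested:
        # Files were named explicitly: never fall back to "all files",
        # even if every one of them ends up filtered out.
        return [path for path in requested if keep(path)]
    return [path for path in sorted(in_binary) if keep(path)]


binary_sources = {"a.cpp", "b.cpp"}
print(select_sources(["a.cpp"], binary_sources, r"a\.cpp"))
print(select_sources([], binary_sources))
```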
David Blaikie [Sat, 24 Oct 2020 02:19:36 +0000 (19:19 -0700)]
fix lldb for recent libDebugInfoDWARF API change
Vitaly Buka [Sat, 24 Oct 2020 01:53:33 +0000 (18:53 -0700)]
[NFC][UBSAN] Avoid "not FileCheck" in tests
It's not clear if "not FileCheck" succeeded because
input is empty or because input does not match "CHECK:"
pattern.
Duncan P. N. Exon Smith [Thu, 15 Oct 2020 21:17:55 +0000 (17:17 -0400)]
HeaderSearch: Simplify use of FileEntryRef in HeaderSearch::LookupFile, NFC
Simplify `HeaderSearch::LookupFile`. Instead of deconstructing a
`FileEntryRef` into a name and `FileEntry` and then rebuilding it later,
use it as is. This helps to unblock making the constructor of
`FileEntryRef` private to `FileManager`.
Differential Revision:
David Blaikie [Sat, 24 Oct 2020 00:51:56 +0000 (17:51 -0700)]
llvm-dwarfdump: Support verbose printing DW_OP_convert to print the CU local offset before the resolved absolute offset
Duncan P. N. Exon Smith [Thu, 15 Oct 2020 23:29:10 +0000 (19:29 -0400)]
clangd: Stop calling FileEntryRef::FileEntryRef
In `ReplayPreamble::replay`, use `getFileRef` instead of `getFile`, and
then use that `FileEntryRef` later to avoid needing
`FileEntryRef::FileEntryRef`. The latter is going to become private to
`FileManager` in a later commit.
Mehdi Amini [Sat, 24 Oct 2020 01:22:38 +0000 (01:22 +0000)]
Add CMake dependency from MLIRJitRunner on all dialects
This dependency was already existing indirectly, but is now more direct
since the registration relies on an inline function. This fixes the
link of the tools with BFD.
Duncan P. N. Exon Smith [Fri, 16 Oct 2020 01:03:49 +0000 (21:03 -0400)]
FileManager: Reorder declarations of FileEntry and FileEntryRef, NFC
This reduces noise in a future patch, but shouldn't change anything
otherwise.
Differential Revision: https://reviews.llvm.org/D89521
Hongtao Yu [Fri, 23 Oct 2020 06:17:45 +0000 (23:17 -0700)]
[AutoFDO] Remove a broken assert in merging inlinee samples
Duplicated callsites share the same callee profile if the original callsite was inlined. The sharing also causes the profile of callee's callee to be shared. This breaks the assert introduced earlier by D84997 in a tricky way.
To illustrate, I'm using an abstract example. Say we have three functions `A`, `B` and `C`. A calls B twice and B calls C once. Some optimization performed prior to the sample profile loader duplicates the first callsite to `B` and the program may look like
```
A()
{
  B(); // with nested profile B1 and C1
  B(); // duplicated, with nested profile B1 and C1
  B(); // with nested profile B2 and C2
}
```
For some reason, the sample profile loader inliner then decides to only inline the first callsite in `A` and transforms `A` into
```
A()
{
  C(); // with nested profile C1
  B(); // duplicated, with nested profile B1 and C1
  B(); // with nested profile B2 and C2.
}
```
Here is what happens next:
1. Failing to inline the callsite `C()` results in `C1`'s samples returned to `C`'s base (outlined) profile. In the meantime, `C1`'s head samples are updated to `C1`'s entry sample. This also affects the profile of the middle callsite which shares `C1` with the first callsite.
2. Failing to inline the middle callsite results in `B1` returned to `B`'s base profile, which in turn will cause `C1` merged into `B`'s base profile. Note that the nested `C` profile in `B`'s base has a non-zero head sample count now. The value is actually equal to `C1`'s entry count.
3. Failing to inline the last callsite results in `B2` returned to `B`'s base profile. Note that the nested `C` profile in `B`'s base now has an entry count equal to the sum of that of `C1` and `C2`, with the head count equal to that of `C1`. This will trigger the assert later on.
4. Compiling `B` using `B`'s base profile. Failing to inline `C` there triggers the returning of the nested `C` profile. Since the nested `C` profile has a non-zero head count, the returning doesn't go through. Instead, the assert goes off.
It's good that `C1` is only returned once, based on using a non-zero head count to ensure an inline profile is only returned once. However C2 is never returned. While it seems hard to solve this perfectly within the current framework, I'm just removing the broken assert. This should be reasonably fixed by the upcoming CSSPGO work where counts returning is based on context-sensitivity and a distribution factor for callsite probes.
The simple example is extracted from one of our internal services. In reality, why the original callsite `B()` and the duplicated one have different inlining behavior is something of a mystery. It has to do with imperfect counts in the profile and extra complicated inlining that makes the hotness for them different.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D90056
Mehdi Amini [Sat, 24 Oct 2020 00:36:51 +0000 (00:36 +0000)]
Remove unused verifyRegStateMapping() function in RegAllocFast (NFC)
This fixes compiler warning when building with assertions.
Mehdi Amini [Fri, 23 Oct 2020 20:19:35 +0000 (20:19 +0000)]
Remove global dialect registration
This has been deprecated for >1month now and removal was announced in:
https://llvm.discourse.group/t/rfc-revamp-dialect-registration/1559/11
Differential Revision: https://reviews.llvm.org/D86356
Mehdi Amini [Sat, 24 Oct 2020 00:34:58 +0000 (00:34 +0000)]
Topologically sort the library to link to mlir-cpu-runner which is required with some linkers like BFD (NFC)
Mehdi Amini [Sat, 24 Oct 2020 00:22:48 +0000 (00:22 +0000)]
Fix a few warnings from GCC (NFC)
Walter Erquinigo [Fri, 23 Oct 2020 23:28:29 +0000 (16:28 -0700)]
[intel-pt] Disable/Enable tracing to guarantee the trace is correct
As mentioned in the comment inside the code, the Intel documentation
states that the internal CPU buffer is flushed out to RAM only when tracing is
disabled. Otherwise, the buffer on RAM might be stale.
This diff disables tracing when the trace buffer is about to be read. This is
quite a safe operation, as the reading is done while the inferior is paused at
a breakpoint, so we are not losing any packets because no code is being
executed.
After the reading is finished, tracing is re-enabled.
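The disable, read, re-enable sequence can be sketched as a scope guard. This is only an illustrative model of the idea: `TraceSession`, `PausedTracing`, and `readTraceBuffer` are hypothetical names, not the API of the LLDB intel-pt plugin, and the "flush on disable" behavior is modeled by hand.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a per-thread trace session; not LLDB's real API.
struct TraceSession {
  bool enabled = true;
  std::vector<uint8_t> cpuBuffer; // packets still held inside the CPU
  std::vector<uint8_t> ramBuffer; // packets visible to the debugger

  // Models the documented behavior: disabling flushes the CPU buffer to RAM.
  void disable() {
    ramBuffer.insert(ramBuffer.end(), cpuBuffer.begin(), cpuBuffer.end());
    cpuBuffer.clear();
    enabled = false;
  }
  void enable() { enabled = true; }
};

// Scope guard: tracing is off for the guard's lifetime and back on afterwards.
class PausedTracing {
  TraceSession &session;

public:
  explicit PausedTracing(TraceSession &s) : session(s) { session.disable(); }
  ~PausedTracing() { session.enable(); }
  PausedTracing(const PausedTracing &) = delete;
  PausedTracing &operator=(const PausedTracing &) = delete;
};

// Reads the trace while the inferior is stopped, so no packets are lost.
std::vector<uint8_t> readTraceBuffer(TraceSession &session) {
  PausedTracing pause(session);
  assert(!session.enabled && "the buffer must be read with tracing off");
  return session.ramBuffer; // copied after the flush, before re-enabling
}
```

Using a guard rather than two explicit calls means tracing is turned back on even if the read path returns early.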
It's a bit hard to write a test for this now, but Greg Clayton and I will
refactor the PT support, and writing tests for it will then be easier. However,
I tested it manually with a script that automates the following flow:
```
(lldb) b main
Breakpoint 1: where = a.out`main + 15 at main.cpp:4:7, address = 0x000000000040050f
(lldb) r
Process 3078226 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x000000000040050f a.out`main at main.cpp:4:7
(lldb) processor-trace start
(lldb) b 5
Breakpoint 2: where = a.out`main + 22 at main.cpp:5:12, address = 0x0000000000400516
(lldb) c
Process 3078226 resuming
Process 3078226 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000400516 a.out`main at main.cpp:5:12
(lldb) processor-trace show-instr-log
thread #1: tid=3078226
0x40050f <+15>: movl $0x0, -0x8(%rbp)
>>> Before, some runs of the script up to this point lead to empty traces
(lldb) b 6
Breakpoint 3: where = a.out`main + 42 at main.cpp:6:14, address = 0x000000000040052a
(lldb) c
Process 3092991 resuming
Process 3092991 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 3.1
frame #0: 0x000000000040052a a.out`main at main.cpp:6:14
(lldb) processor-trace show-instr-log
thread #1: tid=3092991
0x40050f <+15>: movl $0x0, -0x8(%rbp)
0x400516 <+22>: movl $0x0, -0xc(%rbp)
0x40051d <+29>: cmpl $0x2710, -0xc(%rbp) ; imm = 0x2710
0x400524 <+36>: jge 0x400546 ; <+70> at main.cpp
0x400524 <+36>: jge 0x400546 ; <+70> at main.cpp
>>> The trace was re-enabled correctly and includes the instruction of the
first reading.
```
Those instructions correspond to these lines:
```
3 int main() {
4 int z = 0;
5 for (int i = 0; i < 10000; i++) {
6 z += fun(z);
...
```
Differential Revision: https://reviews.llvm.org/D85241
Richard Smith [Fri, 23 Oct 2020 23:13:49 +0000 (16:13 -0700)]
Don't allow structured binding declarations to decompose a
lambda-expression's captures.
The built-in structured binding rules for classes require that all
fields can be accessed by name, and the fields introduced for lambda
captures are unnamed, so decomposing a capturing lambda is ill-formed.
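A minimal sketch of the distinction, assuming C++17: decomposing a plain struct with named public members is fine, while decomposing a capturing lambda is the case this change rejects (left commented out so the block still compiles). The names `Point`, `sumOfFields`, and `sumOfCaptures` are made up for illustration.

```cpp
#include <cassert>

struct Point {
  int x;
  int y;
};

// Fine: every non-static data member of Point is public and named.
int sumOfFields() {
  Point p{1, 2};
  auto [a, b] = p;
  assert(a == p.x && b == p.y); // the bindings name copies of the fields
  return a + b;
}

// Ill-formed: the closure type's capture fields are unnamed.
//   int i = 1, j = 2;
//   auto [c, d] = [i, j] { return i + j; };

// Calling the lambda remains the supported way to use what it captured.
int sumOfCaptures() {
  int i = 1, j = 2;
  auto add = [i, j] { return i + j; };
  return add();
}
```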
Krzysztof Parzyszek [Fri, 23 Oct 2020 23:05:06 +0000 (18:05 -0500)]
[Hexagon] Handle selection between HVX vector predicates
Make sure that (select i1 q0 q1) is handled properly.
Max Moroz [Fri, 23 Oct 2020 18:07:30 +0000 (11:07 -0700)]
[libFuzzer] Added -print_full_coverage flag.
-print_full_coverage=1 produces a detailed branch coverage dump when run on a single file.
Uses the same infrastructure as the -print_coverage flag, but prints all branches (regardless of coverage status) in an easy-to-parse format.
Usage: For internal use with machine learning fuzzing models which require detailed coverage information on seed files to generate mutations.
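For context, a toy fuzz target with a few branches, of the kind whose per-branch coverage such a dump would describe. `matchDepth` is a hypothetical function standing in for the code under test; only `LLVMFuzzerTestOneInput` is the real libFuzzer entry point. The output format of the flag is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical code under test: returns how far the input got.
int matchDepth(const uint8_t *data, size_t size) {
  if (size < 2)
    return 0;
  if (data[0] != 'H')
    return 1;
  if (data[1] != 'I')
    return 2;
  return 3;
}

// libFuzzer entry point; libFuzzer itself supplies main().
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  int depth = matchDepth(data, size);
  assert(depth >= 0 && depth <= 3); // invariant checked on every input
  return 0;
}
```

Assuming the target is built with `clang++ -g -fsanitize=fuzzer`, running the resulting binary as `./a.out -print_full_coverage=1 seed_file` on a single seed file would be the intended way to get the dump.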
Differential Revision: https://reviews.llvm.org/D85928
Teresa Johnson [Fri, 23 Oct 2020 22:57:38 +0000 (15:57 -0700)]
[MemProf] Attempt to debug avr bot failure
Reverts the XFAIL added in b67a2aef8ac9fd9c10666a05d72d909315140dcb,
which had no effect.
Adjust the test to make sure all output is dumped to stderr, so that
hopefully I can get a better idea of where/why this is failing.
Remove some redundant checking while here.
Arthur Eubanks [Thu, 8 Oct 2020 05:07:30 +0000 (22:07 -0700)]
[StructurizeCFG][NewPM] Port -structurizecfg to NPM
This doesn't support -structurizecfg-skip-uniform-regions since that
would require porting LegacyDivergenceAnalysis.
The NPM doesn't support adding a non-analysis pass as a dependency of
another, so I had to add -lowerswitch to some tests or pin them to the
legacy PM.
This is the only RegionPass in tree, so I simply copied the logic for
finding all Regions from the legacy PM's RGManager into
StructurizeCFG::run().
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D89026
Arthur Eubanks [Fri, 16 Oct 2020 19:08:59 +0000 (12:08 -0700)]
[Inliner][NPM] Properly pass callee AAResults
Fixes noalias-calls.ll under NPM.
Differential Revision: https://reviews.llvm.org/D89592
Arthur Eubanks [Wed, 21 Oct 2020 15:46:25 +0000 (08:46 -0700)]
[test] Simplify pr33641_remove_arg_dbgvalue.ll
This makes it pass under the NPM.
The legacy PM pass ran passes on SCCs in a different order, causing
argpromotion to not trigger on @bar().
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D89889
Keith Smiley [Fri, 23 Oct 2020 22:00:25 +0000 (15:00 -0700)]
[llvm-install-name-tool] Add -prepend_rpath option
This diff adds the option -prepend_rpath which inserts an rpath as
the first rpath in the binary.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D89605
Akira Hatanaka [Fri, 23 Oct 2020 21:21:34 +0000 (14:21 -0700)]
[CodeGen] Emit destructor calls to destruct non-trivial C struct
temporaries created by conditional and assignment operators
rdar://problem/64989559
Differential Revision: https://reviews.llvm.org/D83448
Evandro Menezes [Thu, 8 Oct 2020 21:20:24 +0000 (16:20 -0500)]
[RISCV] Use the commercial name for scheduling model (NFC)
Use the commercial name for the scheduling model for the SiFive 7 Series.
Richard Smith [Fri, 23 Oct 2020 21:27:24 +0000 (14:27 -0700)]
PR47954 / DR2126: permit temporary objects that are lifetime-extended by
variables that are usable in constant expressions to themselves be
usable in constant expressions.
Mehdi Amini [Fri, 23 Oct 2020 21:26:32 +0000 (21:26 +0000)]
Revert "Remove global dialect registration"
This reverts commit b22e2e4c6e420b78a8a4c307f0cf002f51af9590.
Investigating broken builds
Cameron McInally [Fri, 23 Oct 2020 20:56:40 +0000 (15:56 -0500)]
[SVE] Lower fixed length VECREDUCE_SEQ_FADD operation
Differential Revision: https://reviews.llvm.org/D89162
Kirsten Lee [Fri, 23 Oct 2020 21:14:53 +0000 (14:14 -0700)]
Add a mlir natvis file for debugging with Visual Studio
Differential Revision: https://reviews.llvm.org/D89601
Artur Pilipenko [Fri, 2 Oct 2020 03:05:23 +0000 (20:05 -0700)]
GC-parseable element atomic memcpy/memmove
This change introduces a GC-parseable lowering for element atomic
memcpy/memmove intrinsics. This way the runtime can provide an
implementation which can take a safepoint during the copy operation.
See "GC-parseable element atomic memcpy/memmove" thread on llvm-dev
for the background and details:
https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ
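To make "take a safepoint during the copy" concrete, here is a rough model of what a runtime-provided copy routine might look like. It is an assumption-laden sketch, not the lowering or the runtime interface from this change: `copyWithSafepoints`, the `pollSafepoint` callback, and the chunk size are all hypothetical, element atomicity is not modeled beyond copying one element at a time, and what the runtime does at a poll is out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical runtime-side routine: copies `count` 32-bit elements one at a
// time and polls for a safepoint each time a full chunk has been copied.
void copyWithSafepoints(uint32_t *dst, const uint32_t *src, size_t count,
                        const std::function<void()> &pollSafepoint,
                        size_t chunk = 1024) {
  assert(chunk != 0 && "chunk size must be positive");
  for (size_t i = 0; i < count; ++i) {
    if (i != 0 && i % chunk == 0)
      pollSafepoint(); // the copy may be paused here between elements
    dst[i] = src[i];
  }
}
```

Polling between chunks rather than after every element keeps the overhead of the check small for long copies while still bounding how long the thread runs without reaching a safepoint.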
Differential Revision: https://reviews.llvm.org/D88861