Reid Kleckner [Sun, 20 Dec 2020 01:45:49 +0000 (17:45 -0800)]
Fix left shift overflow UB in PPC backend on LLP64 platforms
Andrew Litteken [Thu, 27 Aug 2020 20:16:37 +0000 (15:16 -0500)]
[IROutliner] Deduplicating functions that only require inputs.
Extracted regions can have both inputs and outputs. In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics). We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to be elevated to arguments. This patch deduplicates
extracted functions that only have inputs and none of the special cases.
We test that we correctly deduplicate in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll
Reviewers: jroelofs, paquette
Differential Revision: https://reviews.llvm.org/D86978
Andrew Litteken [Sat, 19 Dec 2020 23:33:49 +0000 (17:33 -0600)]
Revert "[IROutliner] Deduplicating functions that only require inputs."
Missing reviewers and differential revision in commit message.
This reverts commit 5cdc4f57e50bbe0d211c109517c17defe78e0b73.
Andrew Litteken [Thu, 27 Aug 2020 20:16:37 +0000 (15:16 -0500)]
[IROutliner] Deduplicating functions that only require inputs.
Extracted regions can have both inputs and outputs. In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics). We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to be elevated to arguments. This patch deduplicates
extracted functions that only have inputs and none of the special cases.
We test that we correctly deduplicate in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll
Craig Topper [Sat, 19 Dec 2020 22:25:16 +0000 (14:25 -0800)]
[TableGen][ARM][X86] Detect combining IntrReadMem and IntrWriteMem.
These properties aren't additive. They are closer to ReadOnly and
WriteOnly. The default is ReadWrite. ReadMem cancels the write property and
WriteMem cancels the read property. Combining them leaves neither.
This patch checks that when we process WriteMem, the Mod flag is
still set. And for ReadMem we check that the Ref flag is still set.
I've updated 2 target intrinsics that were combining these properties.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D93571
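The cancelling behaviour of IntrReadMem/IntrWriteMem described above can be illustrated with a minimal sketch. This is not the TableGen implementation; the two-bit flag model and the helper names applyReadMem/applyWriteMem are assumptions made for illustration only.

```cpp
#include <cassert>

// Hypothetical two-bit model: Ref = may read memory, Mod = may write memory.
constexpr unsigned Ref = 1u;
constexpr unsigned Mod = 2u;
constexpr unsigned ReadWrite = Ref | Mod; // the default

// ReadMem cancels the write property. It is only meaningful while Ref is
// still set; otherwise an earlier WriteMem already cancelled the read property.
bool applyReadMem(unsigned &Flags) {
  if (!(Flags & Ref))
    return false; // ReadMem combined with WriteMem: report an error
  Flags &= ~Mod;
  return true;
}

// WriteMem cancels the read property. It is only meaningful while Mod is
// still set; otherwise an earlier ReadMem already cancelled the write property.
bool applyWriteMem(unsigned &Flags) {
  if (!(Flags & Mod))
    return false; // WriteMem combined with ReadMem: report an error
  Flags &= ~Ref;
  return true;
}
```

Starting from ReadWrite, applying either property alone succeeds; applying the second one afterwards is rejected instead of silently leaving neither flag set.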
Greg McGary [Mon, 7 Dec 2020 06:33:38 +0000 (22:33 -0800)]
Handle overflow beyond the 127 common encodings limit
The common encodings table holds only 127 entries. The encodings index for compact entries is 8 bits wide, and indexes 127..255 are stored locally to each second-level page. Prior to this diff, lld would `fatal()` if encodings overflowed the 127 limit.
This diff populates a per-second-level-page encodings table as needed. When the per-page encodings table hits its limit, we must terminate the page. If such early termination would consume fewer entries than a regular (non-compact) encoding page, then we prefer the regular format.
Caveat: one reason the common-encoding table might overflow is because of DWARF debug-info references, which are not yet implemented and will come with a later diff.
Differential Revision: https://reviews.llvm.org/D93267
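The index split described above (common table first, then a per-page table for indexes 127..255) might be sketched as follows. This is not lld's code; the function name encodingIndex, the use of std::vector, and the -1 "page is full" result are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t CommonLimit = 127; // capacity of the common table
constexpr std::size_t IndexLimit = 256;  // an 8-bit index addresses 0..255

// Returns the 8-bit encoding index to use within one second-level page.
// Encodings missing from the common table are appended to the page-local
// table. Returns -1 when the page-local table is full, which is the point
// at which the caller would have to terminate the page.
int encodingIndex(const std::vector<std::uint32_t> &Common,
                  std::vector<std::uint32_t> &PageLocal,
                  std::uint32_t Encoding) {
  assert(Common.size() <= CommonLimit && "common table holds at most 127");
  auto C = std::find(Common.begin(), Common.end(), Encoding);
  if (C != Common.end())
    return static_cast<int>(C - Common.begin());

  auto L = std::find(PageLocal.begin(), PageLocal.end(), Encoding);
  if (L == PageLocal.end()) {
    if (CommonLimit + PageLocal.size() >= IndexLimit)
      return -1; // no index left in this page
    PageLocal.push_back(Encoding);
    L = PageLocal.end() - 1;
  }
  return static_cast<int>(CommonLimit +
                          static_cast<std::size_t>(L - PageLocal.begin()));
}
```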
Roman Lebedev [Sat, 19 Dec 2020 19:23:35 +0000 (22:23 +0300)]
[SimplifyCFG] Teach FoldBranchToCommonDest() to preserve DomTree, part 1
... for conditional branch case
Roman Lebedev [Sat, 19 Dec 2020 17:51:48 +0000 (20:51 +0300)]
[SimplifyCFG] Teach TryToMergeLandingPad() to preserve DomTree
Roman Lebedev [Sat, 19 Dec 2020 17:07:26 +0000 (20:07 +0300)]
[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 2
... for the custom case returning void.
Roman Lebedev [Sat, 19 Dec 2020 16:12:30 +0000 (19:12 +0300)]
[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 1
... for the general case of returning a value.
Roman Lebedev [Sat, 19 Dec 2020 16:10:27 +0000 (19:10 +0300)]
[NFCI][SimplifyCFG] SimplifyCondBranchToTwoReturns(): pull out BI->getParent() into a variable
Roman Lebedev [Sat, 19 Dec 2020 15:09:10 +0000 (18:09 +0300)]
[SimplifyCFG] simplifySingleResume(): FoldReturnIntoUncondBranch() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater into it.
Apparently, there were no dedicated tests just for that functionality,
so I'm adding one here.
Roman Lebedev [Sat, 19 Dec 2020 13:52:54 +0000 (16:52 +0300)]
[SimplifyCFG] Teach simplifySingleResume() to preserve DomTree
Roman Lebedev [Sat, 19 Dec 2020 13:18:04 +0000 (16:18 +0300)]
[SimplifyCFG] Teach simplifyCommonResume() to preserve DomTree
Roman Lebedev [Sat, 19 Dec 2020 12:38:30 +0000 (15:38 +0300)]
[SimplifyCFG] Teach removeEmptyCleanup() to preserve DomTree
Roman Lebedev [Sat, 19 Dec 2020 09:48:32 +0000 (12:48 +0300)]
[SimplifyCFG] Teach FoldTwoEntryPHINode() to preserve DomTree
Still boring, simply drop all edges to successors of DomBlock,
and add an edge to BB instead.
Roman Lebedev [Sat, 19 Dec 2020 08:47:56 +0000 (11:47 +0300)]
[NFCI][SimplifyCFG] simplifyOnce(): also perform DomTree validation
And that exposes that a number of tests don't *actually* manage to
maintain DomTree validity, which is in line with my observations.
Once again, the SimplifyCFG pass currently does not require/preserve DomTree
by default, so this is effectively NFC.
Andrew Litteken [Thu, 3 Sep 2020 17:20:47 +0000 (12:20 -0500)]
[IRSim][IROutliner] Limit to extracting regions that only require inputs.
Extracted regions can have both inputs and outputs. In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics). We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to be elevated to arguments. This patch limits the
extracted regions to those that only require inputs, and do not have any
other special cases.
We test that we do not outline the wrong constants in:
test/Transforms/IROutliner/outliner-different-constants.ll
test/Transforms/IROutliner/outliner-different-globals.ll
test/Transforms/IROutliner/outliner-constant-vs-registers.ll
We test that we correctly outline in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll
Reviewers: paquette, plofti
Differential Revision: https://reviews.llvm.org/D86977
Craig Topper [Fri, 18 Dec 2020 08:54:45 +0000 (00:54 -0800)]
[X86] Teach assembler to accept vmsave/vmload/vmrun/invlpga/skinit with or without the fixed register operands
These instructions read their inputs from fixed registers rather
than using a modrm byte. We shouldn't require the user to list them
when parsing assembly. This matches the GNU assembler.
This patch adds InstAliases so we can accept either form. It also
changes the printing code to use the form without registers. This
will change the behavior of llvm-objdump, but should be consistent
with binutils objdump. This also matches what we already do in LLVM for
clzero and monitorx, which also use fixed registers.
I need to add and improve tests before this can be committed. The
disassembler tests exist, but weren't checking the fixed register
so they pass before and after this change.
Fixes https://github.com/ClangBuiltLinux/linux/issues/1216
Differential Revision: https://reviews.llvm.org/D93524
Kazu Hirata [Sat, 19 Dec 2020 18:57:35 +0000 (10:57 -0800)]
[Analysis] Remove dead function getInstTypePair (NFC)
The last use of getInstTypePair with two parameters was removed on
Jan 9, 2015 in commit 33d7f9de332701294f6528ae7151bc40ba008737. It
seems to have been unused since then.
Kazu Hirata [Sat, 19 Dec 2020 18:43:18 +0000 (10:43 -0800)]
[Target, Transforms] Use contains (NFC)
Juneyoung Lee [Sat, 19 Dec 2020 16:03:19 +0000 (01:03 +0900)]
apply update_test_checks.py to a few files in llvm/test/Transforms/InstCombine
Mark de Wever [Sat, 19 Dec 2020 15:16:54 +0000 (16:16 +0100)]
[NFC][libc++] Fixes swapped comments.
Zakk Chen [Thu, 17 Dec 2020 17:30:03 +0000 (09:30 -0800)]
[RISCV] Define vlxe/vsxe/vsuxe intrinsics.
Define vlxe/vsxe intrinsics and lower to vlxei<EEW>/vsxei<EEW>
instructions.
We worked with @rogfer01 from BSC to come up with this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>
Differential Revision: https://reviews.llvm.org/D93471
Kristof Beyls [Sun, 13 Dec 2020 18:56:48 +0000 (18:56 +0000)]
[ARM] Add clang command line support for -mharden-sls=
The command line syntax is identical to the -mharden-sls= command line
syntax for AArch64 targets.
Differential Revision: https://reviews.llvm.org/D93221
Kristof Beyls [Thu, 26 Nov 2020 13:45:37 +0000 (13:45 +0000)]
[ARM] harden-sls-blr: avoid r12 and lr in indirect calls.
As a linker is allowed to clobber r12 on function calls, the code
transformation that hardens indirect calls is not correct in case a
linker does so. Similarly, the transformation is not correct when
register lr is used.
This patch makes sure that r12 or lr are not used for indirect calls
when harden-sls-blr is enabled.
Differential Revision: https://reviews.llvm.org/D92469
Kristof Beyls [Fri, 20 Nov 2020 16:11:17 +0000 (16:11 +0000)]
[ARM] Harden indirect calls against SLS
To make sure that no barrier gets placed on the architectural execution
path, each indirect call to the function in register rN gets
transformed to a direct call to __llvm_slsblr_thunk_mode_rN. mode is
either arm or thumb, depending on the mode in which the indirect call
happens.
The __llvm_slsblr_thunk_mode_rN thunk contains:
bx rN
<speculation barrier>
Therefore, the indirect call gets split into two: one direct call and one
indirect jump.
This transformation results in not inserting a speculation barrier on
the architectural execution path.
The mitigation is off by default and can be enabled by the
harden-sls-blr subtarget feature.
As a linker is allowed to clobber r12 on function calls, the
above code transformation is not correct in case a linker does so.
Similarly, the transformation is not correct when register lr is used.
Avoiding r12/lr being used is done in a follow-on patch to make
reviewing this code easier.
Differential Revision: https://reviews.llvm.org/D92468
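As a small illustration of the thunk naming scheme and of the r12/lr restriction mentioned above, here is a hedged sketch. The helper names slsBlrThunkName and isUsableCallRegister are hypothetical and do not come from the patch; lr is taken to be r14, as on AArch32.

```cpp
#include <cassert>
#include <string>

// Builds the thunk symbol name for an indirect call through register rN.
// The "arm"/"thumb" component is the mode of the code containing the call.
std::string slsBlrThunkName(bool IsThumb, unsigned RegNo) {
  return std::string("__llvm_slsblr_thunk_") + (IsThumb ? "thumb" : "arm") +
         "_r" + std::to_string(RegNo);
}

// r12 may be clobbered by the linker on function calls, and lr (r14) is
// overwritten by the direct call to the thunk, so neither can carry the
// call target once the call is routed through a thunk.
bool isUsableCallRegister(unsigned RegNo) {
  return RegNo != 12 && RegNo != 14;
}
```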
Kristof Beyls [Thu, 19 Nov 2020 13:58:26 +0000 (13:58 +0000)]
[ARM] Implement harden-sls-retbr for Thumb mode
The only non-trivial consideration in this patch is that the formation
of TBB/TBH instructions, which is done in the constant island pass, does
not understand the speculation barriers inserted by the SLSHardening
pass. As such, when harden-sls-retbr is enabled for a function, the
formation of TBB/TBH instructions in the constant island pass is
disabled.
Differential Revision: https://reviews.llvm.org/D92396
LLVM GN Syncbot [Sat, 19 Dec 2020 12:25:56 +0000 (12:25 +0000)]
[gn build] Port 195f44278c4
Kristof Beyls [Wed, 28 Oct 2020 21:04:11 +0000 (21:04 +0000)]
[ARM] Implement harden-sls-retbr for ARM mode
Some processors may speculatively execute the instructions immediately
following indirect control flow, such as returns, indirect jumps and
indirect function calls.
To avoid a potential mis-speculatively executed gadget after these
instructions leaking secrets through side channels, this pass places a
speculation barrier immediately after every indirect control flow where
control flow doesn't return to the next instruction, such as returns and
indirect jumps, but not indirect function calls.
Hardening of indirect function calls will be done in a later,
independent patch.
This patch implements the same functionality as the AArch64
counterpart implemented in https://reviews.llvm.org/D81400.
For AArch64, returns and indirect jumps only occur on RET and BR
instructions and hence the function attribute to control the hardening
is called "harden-sls-retbr" there. On AArch32, there is a much wider
variety of instructions that can trigger an indirect unconditional
control flow change. I've decided to stick with the name
"harden-sls-retbr" as introduced for the corresponding AArch64
mitigation.
This patch implements this for ARM mode. A future patch will extend this
to also support Thumb mode.
The inserted barriers are never on the correct, architectural execution
path, and therefore performance overhead of this is expected to be low.
To ensure these barriers are never on an architecturally executed path,
when the harden-sls-retbr function attribute is present, indirect
control flow is never conditionalized/predicated.
On targets that implement the Armv8.0-SB Speculation Barrier extension,
a single SB instruction is emitted that acts as a speculation barrier.
On other targets, a DSB SYS followed by an ISB is emitted to act as a
speculation barrier.
These speculation barriers are implemented as pseudo instructions to
prevent later passes from analyzing and potentially removing them.
The mitigation is off by default and can be enabled by the
harden-sls-retbr subtarget feature.
Differential Revision: https://reviews.llvm.org/D92395
Kazu Hirata [Sat, 19 Dec 2020 03:08:17 +0000 (19:08 -0800)]
[Analysis, CodeGen, IR] Use contains (NFC)
Jonas Devlieghere [Sat, 19 Dec 2020 02:40:12 +0000 (18:40 -0800)]
[lldb] Simplify the is_finalized logic in process and make it thread safe.
This is a speculative fix when looking at the finalization code in
Process. It tackles the following issues:
- Adds synchronization to prevent races between threads.
- Marks the process as finalized/invalid as soon as Finalize is called
rather than at the end.
- Simplifies the code by using only a single instance variable to track
finalization.
Differential revision: https://reviews.llvm.org/D93479
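The three points above (synchronization, marking finalized at the start, one tracking variable) could look roughly like the sketch below. It is not the lldb Process code; the class name ProcessSketch and the choice of std::atomic<bool> are assumptions for illustration.

```cpp
#include <atomic>
#include <cassert>

class ProcessSketch {
public:
  // Returns true only for the first caller. The flag is flipped atomically
  // before any tear-down starts, so concurrent or repeated calls see the
  // process as finalized immediately rather than at the end.
  bool Finalize() {
    if (m_finalized.exchange(true))
      return false; // another call already started finalization
    // ... tear-down work would go here ...
    return true;
  }

  bool IsFinalized() const { return m_finalized.load(); }

private:
  std::atomic<bool> m_finalized{false}; // the single tracking variable
};
```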
Tim Keith [Sat, 19 Dec 2020 01:43:51 +0000 (17:43 -0800)]
[flang] Fix bug in IMPLICIT NONE(EXTERNAL)
We were only checking the restrictions of IMPLICIT NONE(EXTERNAL) when a
procedure name is first encountered. But it can also happen with an
existing symbol, e.g. if an external function's return type is declared
before it is called. This change adds a check in that branch too.
Differential Revision: https://reviews.llvm.org/D93552
Jacques Pienaar [Sat, 19 Dec 2020 01:26:15 +0000 (17:26 -0800)]
[FileCheck] Add a literal check directive modifier
Introduce CHECK modifiers that change the behavior of the CHECK
directive. Also add a LITERAL modifier for cases where matching could
end up requiring escaping of strings interpreted as regex when only
literal/fixed string matching is desired (making the CHECKs more
difficult to write, fragile, and difficult to interpret).
Sam McCall [Sat, 19 Dec 2020 01:23:39 +0000 (02:23 +0100)]
[clangd] Fix windows path handling in .clang-tidy parsing
Aditya Kumar [Fri, 18 Dec 2020 16:57:38 +0000 (08:57 -0800)]
[HotColdSplit] Reflect full cost of parameters in split penalty
Make the penalty for splitting a region more accurately reflect the cost
of materializing all of the inputs/outputs to/from the region.
This almost entirely eliminates code growth within functions which
undergo splitting in key internal frameworks, and reduces the size of
those frameworks by between 2.6% and 3%.
rdar://49167240
Patch by: Vedant Kumar(@vsk)
Reviewers: hiraditya,rjf,t.p.northover
Reviewed By: hiraditya,rjf
Differential Revision: https://reviews.llvm.org/D59715
Sam McCall [Sat, 19 Dec 2020 00:55:26 +0000 (01:55 +0100)]
[clangd] Don't cancel requests based on "updates" with same content
There's an unfortunate collision between two features:
- we implicitly cancel certain requests when the file changes, to avoid
the queue getting clogged building old revisions to service stale requests
- we "reparse-if-needed" by synthesizing a file change, e.g. on didSave
We could explicitly mark these synthetic requests to avoid this, but
looking for changes in file content clutters our APIs less and is
arguably the correct thing to do in any case.
Fixes https://github.com/clangd/clangd/issues/620
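The content-based check described above can be sketched as follows. This is not clangd's TUScheduler code; FileStateSketch and update are hypothetical names, and the only point shown is that an "update" with identical content does not count as a change.

```cpp
#include <cassert>
#include <string>
#include <utility>

class FileStateSketch {
public:
  // Stores the new contents and returns true only if they differ from the
  // previous ones. A caller would cancel stale requests only on true, so a
  // synthetic "reparse-if-needed" update with the same content is harmless.
  bool update(std::string NewContents) {
    if (NewContents == Contents)
      return false;
    Contents = std::move(NewContents);
    return true;
  }

private:
  std::string Contents;
};
```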
Akira Hatanaka [Sat, 19 Dec 2020 00:59:06 +0000 (16:59 -0800)]
[ObjC][ARC] Fix a bug where the inline-asm retain/claim RV marker wasn't inserted when the original call had a 'returned' argument
The code is testing whether the instruction BBI points to is the call
that is paired up with the retainRV/claimRV call, but it doesn't work
when the call has a 'returned' argument since GetArgRCIdentityRoot looks
through 'returned' arguments.
rdar://72485383
Kazushi (Jam) Marukawa [Fri, 18 Dec 2020 16:01:24 +0000 (01:01 +0900)]
[VE] Support copy of vector mask registers
Support VM and VMP registers in copyPhysReg() function. Also add
regression tests.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D93547
Sam McCall [Fri, 18 Dec 2020 17:39:20 +0000 (18:39 +0100)]
[clangd] Make our printing policies for Hover more consistent, especially tags
Different cases were using a bunch of different variants of the printing policy.
Each of these had something going for it, but the result was inconsistent.
Goals:
- single printing policy used (almost) everywhere
- avoid unidiomatic tags like `class vector<class X>`
- be informative and easy to understand
For tags, the solution I wound up with is: we print only the outer tag and only
in the simplest cases where this elaboration won't cause confusion.
For example:
- class X
- enum Foo
- vector<int>
- X*
This seems to strike a nice balance of providing plenty of info/context in common
cases while never being confusing.
Differential Revision: https://reviews.llvm.org/D93553
Harald van Dijk [Fri, 18 Dec 2020 23:38:38 +0000 (23:38 +0000)]
[X86] Avoid generating invalid R_X86_64_GOTPCRELX relocations
We need to make sure not to emit R_X86_64_GOTPCRELX relocations for
instructions that use a REX prefix. If a REX prefix is present, we need to
instead use a R_X86_64_REX_GOTPCRELX relocation. The existing logic for
CALL64m, JMP64m, etc. already handles this by checking the HasREX parameter
and using it to determine which relocation type to use. Do this for all
instructions that can use relaxed relocations.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D93561
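The selection rule above reduces to a single decision on the REX prefix. A minimal sketch, not the MC layer code: the constant and function names are illustrative, while the numeric values 41 and 42 are the relocation type numbers assigned by the x86-64 psABI.

```cpp
#include <cassert>
#include <cstdint>

// Relocation type numbers from the x86-64 psABI.
constexpr std::uint32_t RelocGotPcRelX = 41;    // R_X86_64_GOTPCRELX
constexpr std::uint32_t RelocRexGotPcRelX = 42; // R_X86_64_REX_GOTPCRELX

// Instructions carrying a REX prefix must use the REX variant, so that a
// linker relaxing the instruction knows the prefix byte is present.
constexpr std::uint32_t relaxableGotReloc(bool HasREX) {
  return HasREX ? RelocRexGotPcRelX : RelocGotPcRelX;
}
```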
Richard Smith [Fri, 18 Dec 2020 22:13:45 +0000 (14:13 -0800)]
[www] Remove '$Date$' marker from cxx_dr_status.
This doesn't actually work (any more?), and instead renders as a literal
$Date$ on the website.
Fraser Cormack [Fri, 18 Dec 2020 21:49:14 +0000 (21:49 +0000)]
[RISCV] Address clang-tidy warnings in RISCVTargetMachine. NFC.
Sanjay Patel [Fri, 18 Dec 2020 21:44:04 +0000 (16:44 -0500)]
[SLP] fix typo; NFC
Richard Smith [Fri, 18 Dec 2020 21:25:18 +0000 (13:25 -0800)]
Fix memory leak for complicated non-type template arguments.
Richard Smith [Fri, 18 Dec 2020 21:18:47 +0000 (13:18 -0800)]
[c++2b] Add tests for feature test macros.
Richard Smith [Fri, 18 Dec 2020 21:11:01 +0000 (13:11 -0800)]
Add tests for the absence of feature test macros for features we don't support yet.
Richard Smith [Fri, 18 Dec 2020 21:10:05 +0000 (13:10 -0800)]
[c++20] Mark class type NTTPs as done and start defining the feature test macro.
Fraser Cormack [Fri, 18 Dec 2020 12:51:48 +0000 (12:51 +0000)]
[RISCV] Assume no-op addrspacecasts by default
To support OpenCL, which typically uses SPIR as an IR, non-zero address
spaces must be accounted for. This patch makes the RISC-V target assume
no-op address space casts across the board, which effectively removes
the need to support addrspacecast instructions in the backend.
For a RISC-V implementation with different configurations or specialized
address spaces where casts aren't no-ops, the function can be adjusted
as required.
Reviewed By: jrtc27
Differential Revision: https://reviews.llvm.org/D93536
Chih-Ping Chen [Fri, 18 Dec 2020 21:01:37 +0000 (16:01 -0500)]
Rename files with same (case insensitive) name
Patch by: Aditya Kumar.
Differential Revision: https://reviews.llvm.org/D93559
Craig Topper [Fri, 18 Dec 2020 20:08:27 +0000 (12:08 -0800)]
[RISCV] Add intrinsics for vsetvli instruction
This patch adds two IR intrinsics for vsetvli instruction. One to set the vector length to a user specified value and one to set it to vlmax. The vlmax uses the X0 source register encoding.
Clang builtins will follow in a separate patch
Differential Revision: https://reviews.llvm.org/D92973
Fangrui Song [Fri, 18 Dec 2020 20:08:16 +0000 (12:08 -0800)]
[TableGen] Fix D90844 introduced non-determinism due to iteration over a std::map over allocated object pointers
993eaf2d69d8beb97e4695cbd919b927ed1cfe86 (D90844) is still wrong.
The allocated const Record* pointers do not have an order guarantee
so switching from DenseMap to std::map does not help.
ProcModelMapTy = std::map<const Record*, unsigned>
Sort the values instead.
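To make the point concrete: a std::map keyed by heap pointers iterates in address order, which can differ between runs, whereas the mapped values can be sorted into a stable order. The sketch below is illustrative only; the empty Record struct and the helper sortedModelIndices are stand-ins, not TableGen code.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

struct Record {}; // stand-in for the TableGen Record class

using ProcModelMapTy = std::map<const Record *, unsigned>;

// Iteration order of the map follows pointer values, which carry no
// guarantee across runs. Collecting the mapped values and sorting them
// gives an order that depends only on the values themselves.
std::vector<unsigned> sortedModelIndices(const ProcModelMapTy &Map) {
  std::vector<unsigned> Indices;
  Indices.reserve(Map.size());
  for (const auto &Entry : Map)
    Indices.push_back(Entry.second);
  std::sort(Indices.begin(), Indices.end());
  return Indices;
}
```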
Nikita Popov [Fri, 18 Dec 2020 19:49:56 +0000 (20:49 +0100)]
[InstCombine] Regenerate test checks (NFC)
Craig Topper [Fri, 18 Dec 2020 19:22:43 +0000 (11:22 -0800)]
[RISCV] Sign extend constant arguments to V intrinsics when promoting to XLen.
The default behavior for any_extend of a constant is to zero extend.
This occurs inside of getNode rather than allowing type legalization
to promote the constant which would sign extend. By using sign extend
with getNode the constant will be sign extended. This gives a better
chance for isel to find a simm5 immediate since all xlen bits are
examined there.
For instructions that use a uimm5 immediate, this change only affects
constants >= 128 for i8 or >= 32768 for i16. Constants that large
already wouldn't have been eligible for uimm5 and would need to use a
scalar register.
If the instruction isn't able to use simm5 or the immediate is
too large, we'll need to materialize the immediate in a register.
As far as I know constants with all 1s in the upper bits should
materialize as well or better than all 0s.
Longer term we should probably have a SEW aware PatFrag to ignore
the bits above SEW before checking simm5.
I updated about half the test cases in some tests to use a negative
constant to get coverage for this.
Reviewed By: evandro
Differential Revision: https://reviews.llvm.org/D93487
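A worked example of why the extension kind matters for simm5 matching, using an i8 constant with all bits set. The helper names below are illustrative and not part of the RISC-V backend; simm5 is taken to be the signed 5-bit range -16..15.

```cpp
#include <cassert>
#include <cstdint>

// A signed 5-bit immediate covers -16..15.
constexpr bool isSimm5(std::int64_t V) { return V >= -16 && V <= 15; }

// Zero extension keeps the raw bit pattern as a non-negative value.
constexpr std::int64_t zeroExtend8(std::uint8_t V) { return V; }

// Sign extension: i8 values >= 128 have the sign bit set and become negative.
constexpr std::int64_t signExtend8(std::uint8_t V) {
  return V >= 128 ? static_cast<std::int64_t>(V) - 256
                  : static_cast<std::int64_t>(V);
}
```

For the i8 constant 0xFF, zero extension yields 255, which is outside simm5, while sign extension yields -1, which fits. Constants below 128 are unaffected, matching the note above about uimm5.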
Nikita Popov [Mon, 30 Nov 2020 22:51:54 +0000 (23:51 +0100)]
[DSE] Use correct memory location for read clobber check
MSSA DSE starts at a killing store, finds an earlier store and
then checks that the earlier store is not read along any paths
(without being killed first). However, it uses the memory location
of the killing store for that, not the earlier store that we're
attempting to eliminate.
This has a number of problems:
* Mismatches between what BasicAA considers aliasing and what DSE
considers an overwrite (even though both are correct in isolation)
can result in miscompiles. This is PR48279, which D92045 tries to
fix in a different way. The problem is that we're using a location
from a store that is potentially not executed and thus may be UB,
in which case analysis results can be arbitrary.
* Metadata on the killing store may be used to determine aliasing,
but there is no guarantee that the metadata is valid, as the specific
killing store may not be executed. Using the metadata on the earlier
store is valid (it is the store we're removing, so on any execution
where its removal may be observed, it must be executed).
* The location is imprecise. For full overwrites the killing store
will always have a location that is larger than or equal to the earlier
access location, so it's beneficial to use the earlier access
location. This is not the case for partial overwrites, in which
case either location might be smaller. There is some room for
improvement here.
Using the earlier access location means that we can no longer cache
which accesses are read for a given killing store, as we may be
querying different locations. However, it turns out that simply
dropping the cache has no notable impact on compile-time.
Differential Revision: https://reviews.llvm.org/D93523
Craig Topper [Fri, 18 Dec 2020 19:17:09 +0000 (11:17 -0800)]
Recommit "[RISCV] Add intrinsics for vfmv.f.s and vfmv.s.f"
This time with tests.
Original message:
Similar to D93365, but for floating point. No need for special ISD opcodes
though. We can directly isel these from intrinsics. I had to use anyfloat_ty
instead of anyvector_ty in the intrinsics to make LLVMVectorElementType not
crash when imported into the -gen-dag-isel tablegen backend.
Differential Revision: https://reviews.llvm.org/D93426
Craig Topper [Fri, 18 Dec 2020 19:16:36 +0000 (11:16 -0800)]
Revert "[RISCV] Add intrinsics for vfmv.f.s and vfmv.s.f"
This reverts commit 46a40c4bc10671ebddb45fabd1a3b0b419a58109.
I forgot to git add the tests.
Craig Topper [Fri, 18 Dec 2020 19:11:15 +0000 (11:11 -0800)]
[RISCV] Add intrinsics for vfmv.f.s and vfmv.s.f
Similar to D93365, but for floating point. No need for special ISD opcodes
though. We can directly isel these from intrinsics. I had to use anyfloat_ty
instead of anyvector_ty in the intrinsics to make LLVMVectorElementType not
crash when imported into the -gen-dag-isel tablegen backend.
Differential Revision: https://reviews.llvm.org/D93426
Roman Lebedev [Fri, 18 Dec 2020 18:33:09 +0000 (21:33 +0300)]
[NFC][InstCombine] Fixup check lines for prof md in select_meta.ll test
Craig Topper [Fri, 18 Dec 2020 17:50:23 +0000 (09:50 -0800)]
[RISCV] Add intrinsics for vmv.x.s and vmv.s.x
This adds intrinsics for vmv.x.s and vmv.s.x.
I've used stricter type constraints on these intrinsics than what we've been doing on the arithmetic intrinsics so far. This will allow us to not need to pass the scalar type to the Intrinsic::getDeclaration call when creating these intrinsics.
A custom ISD is used for vmv.x.s in order to implement the change in computeNumSignBitsForTargetNode which can remove sign extends on the result.
I also modified the MC layer description of these instructions to show the tied source/dest operand. This is different than what we do for masked instructions where we drop the tied source operand when converting to MC. But it is a more accurate description of the instruction. We can't do this for masked instructions since we use the same MC instruction for masked and unmasked. Tools like llvm-mca operate in the MC layer and rely on ins/outs and Uses/Defs for analysis so I don't know if we'll be able to maintain the current behavior for masked instructions. So I went with the accurate description here since it was easy.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D93365
Kazu Hirata [Fri, 18 Dec 2020 18:29:51 +0000 (10:29 -0800)]
[GVNHoist] Remove successorDominate (NFC)
The function was introduced on Aug 25, 2016 in commit
5f0d0e60d11b8d2e48aacf31a82762280f9a8712.
Its last use was removed on Sep 13, 2017 in commit
dfa8741c9693c344477c842a25ee0cb6a6f59fcd.
Roman Lebedev [Fri, 18 Dec 2020 17:50:20 +0000 (20:50 +0300)]
[InstCombine] Canonicalize SPF to abs intrinsic
This patch enables canonicalization of SPF_ABS and SPF_NABS
to the abs intrinsic.
This is a recommit; the original try was
05d4c4ebc2fb006b8a2bd05b24c6aba10dd2eef8,
but it was reverted due to an apparent miscompile,
which since then has just been fixed by the previous commit.
Differential Revision: https://reviews.llvm.org/D87188
Roman Lebedev [Fri, 18 Dec 2020 17:29:27 +0000 (20:29 +0300)]
[InstSimplify] Don't miscompile `X == 0 ? abs(X) : -abs(X) --> -abs(X)` xform
The transform wasn't checking that the LHS of the comparison
*is* the `X` in question...
This is the miscompile that was holding up D87188.
Thanks to Dave Green for producing an actionable reproducer!
Roman Lebedev [Fri, 18 Dec 2020 17:24:25 +0000 (20:24 +0300)]
[NFC][InstSimplify] Add miscompiled testcase from D87188/D87197
Thanks to Dave Green for producing an actionable reproducer!
It is (obviously) a miscompile:
```
----------------------------------------
define i32 @select_abs_of_abs_eq_wrong(i32 %x, i32 %y) {
%0:
%abs = abs i32 %x, 0
%neg = sub i32 0, %abs
%cmp = icmp eq i32 %y, 0
%sel = select i1 %cmp, i32 %neg, i32 %abs
ret i32 %sel
}
=>
define i32 @select_abs_of_abs_eq_wrong(i32 %x, i32 %y) {
%0:
%abs = abs i32 %x, 0
ret i32 %abs
}
Transformation doesn't verify!
ERROR: Value mismatch
Example:
i32 %x = #xe0000000 (3758096384, -536870912)
i32 %y = #x00000000 (0)
Source:
i32 %abs = #x20000000 (536870912)
i32 %neg = #xe0000000 (3758096384, -536870912)
i1 %cmp = #x1 (1)
i32 %sel = #xe0000000 (3758096384, -536870912)
Target:
i32 %abs = #x20000000 (536870912)
Source value: #xe0000000 (3758096384, -536870912)
Target value: #x20000000 (536870912)
Alive2: Transform doesn't verify!
```
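The counterexample can be replayed in plain C++ as a sanity check. The function below mirrors the select from the test case; its name and parameters are illustrative, and INT32_MIN is avoided as input because std::abs is undefined for it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Mirrors the IR: %sel = select (Cond == 0), -abs(X), abs(X).
std::int32_t selectNegAbs(std::int32_t X, std::int32_t Cond) {
  std::int32_t Abs = std::abs(X);
  return Cond == 0 ? -Abs : Abs;
}
```

When the compared value is X itself, the result always equals abs(X): the negated arm is only taken for X == 0, where both arms are 0. With an unrelated %y equal to 0 and %x equal to -536870912, the select yields -536870912 while the folded form yields 536870912, which is the mismatch Alive2 reports.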
Chih-Ping Chen [Thu, 17 Dec 2020 16:08:46 +0000 (11:08 -0500)]
[DebugInfo] Support Fortran 'use <external module>' statement.
The main change is to add a 'IsDecl' field to DIModule so
that when IsDecl is set to true, the debug info entry generated
for the module would be marked as a declaration. That way, the debugger
would look up the definition of the module in the global scope.
Please see the comments in llvm/test/DebugInfo/X86/dimodule.ll
for what the debug info entries would look like.
Differential Revision: https://reviews.llvm.org/D93462
Björn Schäpers [Sat, 12 Dec 2020 22:17:15 +0000 (23:17 +0100)]
[clang-format][NFC] Expand BreakBeforeBraces examples
Differential Revision: https://reviews.llvm.org/D93170
diggerlin [Fri, 18 Dec 2020 18:02:41 +0000 (13:02 -0500)]
[AIX] Change the code based on https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20201214/864235.html
Summary:
Change the code based on the discussion at:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20201214/864235.html
Florian Hahn [Fri, 18 Dec 2020 17:48:01 +0000 (17:48 +0000)]
Revert "[BasicAA] Handle two unknown sizes for GEPs"
Temporarily revert commit 8b1c4e310c2f9686cad925ad81d8e2be10a1ef3c.
After 8b1c4e310c2f the compile-time for `MultiSource/Benchmarks/MiBench/consumer-lame`
dramatically increases with -O3 & LTO, causing issues for builders with
that configuration.
I filed PR48553 with a smallish reproducer that shows a 10-100x compile
time increase.
Craig Topper [Fri, 18 Dec 2020 05:56:42 +0000 (21:56 -0800)]
[RISCV] Add intrinsics for vmv.v.v, vmv.v.x, and vmv.x.i
We worked with @rogfer01 from BSC to come up with this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Craig Topper <craig.topper@sifive.com>
Differential Revision: https://reviews.llvm.org/D93514
Kevin P. Neal [Wed, 16 Dec 2020 19:12:35 +0000 (14:12 -0500)]
Revert "Revert "[FPEnv] Teach the IRBuilder about invoke's correct use of the strictfp attribute.""
Similar to D69312, and documented in D69839, the IRBuilder needs to add
the strictfp attribute to invoke instructions when constrained floating
point is enabled.
This is try 2, with the test corrected.
Differential Revision: https://reviews.llvm.org/D93134
Whitney Tsang [Fri, 18 Dec 2020 17:35:46 +0000 (17:35 +0000)]
Ensure SplitEdge returns the new block between the two given blocks
This PR implements the function splitBasicBlockBefore to address an
issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.
Below is an example of splitting the block bb1 at its first instruction.
/// Original IR
bb0:
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
br bb1
bb1:
br bb1.split
bb1.split:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
br bb1.split
bb1.split:
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
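The difference between the two splitting directions can be sketched on a toy CFG. This is only an illustration: `Block`, `CFG`, `splitAfter`, `splitBefore` and `makeExample` are hypothetical names, not LLVM's BasicBlock API, and every block is assumed to have at most one successor.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy block: a list of instructions plus one successor name
// (empty string = no successor). Not LLVM's BasicBlock.
struct Block {
  std::vector<std::string> Insts;
  std::string Succ;
};
using CFG = std::map<std::string, Block>;

// Split "after": the instructions move into a new block that follows BB,
// giving Pred -> BB -> New.
std::string splitAfter(CFG &G, const std::string &BB) {
  assert(G.count(BB) && "block must exist");
  std::string New = BB + ".split";
  Block NewBlock{G[BB].Insts, G[BB].Succ};
  G[BB].Insts.clear();
  G[BB].Succ = New;
  G[New] = NewBlock;
  return New;
}

// Split "before": an empty new block is placed in front of BB and all
// predecessors are redirected to it, giving Pred -> New -> BB.
std::string splitBefore(CFG &G, const std::string &BB) {
  assert(G.count(BB) && "block must exist");
  std::string New = BB + ".split";
  for (auto &Entry : G)
    if (Entry.second.Succ == BB)
      Entry.second.Succ = New;
  G[New] = Block{{}, BB};
  return New;
}

// Builds the bb0 -> bb1 -> bb2 example from the commit message.
CFG makeExample() {
  CFG G;
  G["bb0"] = Block{{}, "bb1"};
  G["bb1"] = Block{{"%0 = mul i32 1, 2"}, "bb2"};
  G["bb2"] = Block{{}, ""};
  return G;
}
```

Calling `splitBefore(G, "bb1")` on the example leaves the `mul` in bb1 and puts the new block between bb0 and bb1, which is the shape a caller of SplitEdge(bb0, bb1) expects.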
Differential Revision: https://reviews.llvm.org/D92200
Kazu Hirata [Fri, 18 Dec 2020 17:09:04 +0000 (09:09 -0800)]
[MCA, ExecutionEngine, Object] Use llvm::is_contained (NFC)
Craig Blackmore [Fri, 18 Dec 2020 16:57:01 +0000 (16:57 +0000)]
[RegisterScavenging] Fix assert in scavengeRegisterBackwards
According to the documentation, if a spill is required to make a
register available and AllowSpill is false, then NoRegister should be
returned. However, this scenario was actually triggering an assertion
failure.
This patch moves the assertion after the handling of AllowSpill.
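A minimal sketch of the ordering issue, not the actual scavenger code: `Candidate` and `scavenge` are hypothetical, and the condition being asserted is a placeholder for whatever the real assertion checks.

```cpp
#include <cassert>

constexpr unsigned NoRegister = 0;

// Hypothetical search result: the chosen register and whether using it
// would require spilling its current value.
struct Candidate {
  unsigned Reg;
  bool NeedsSpill;
};

unsigned scavenge(const Candidate &C, bool AllowSpill) {
  if (!C.NeedsSpill)
    return C.Reg;      // a free register was found, nothing to spill
  if (!AllowSpill)
    return NoRegister; // documented result when spilling is not permitted
  // Only past this point is a spill really going to happen, so this is the
  // earliest place where a spill-related precondition may be asserted.
  assert(C.Reg != NoRegister && "need a register to spill");
  return C.Reg;
}
```

With the assertion placed before the AllowSpill check, the call `scavenge({NoRegister, true}, false)` would abort instead of returning NoRegister.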
Authored by: Lewis Revill
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D92104
Arnamoy Bhattacharyya [Fri, 18 Dec 2020 16:38:51 +0000 (11:38 -0500)]
[SROA] Remove Dead Instructions while creating speculative instructions
The SROA pass tries to be lazy about removing dead instructions that are collected in the DeadInsts list during iterative runs of the pass. However, it does not remove instructions from the dead list when eraseFromParent() is run on them.
This causes (rare) null pointer dereferences. For example, in the speculatePHINodeLoads() function, in the following code snippet:
```
while (!PN.use_empty()) {
LoadInst *LI = cast<LoadInst>(PN.user_back());
LI->replaceAllUsesWith(NewPN);
LI->eraseFromParent();
}
```
If the Load instruction LI belongs to the DeadInsts list, it should be removed when eraseFromParent() is called. However, the bug does not show up in most cases, because immediately in the same function, a new LoadInst is created in the following line:
```
LoadInst *Load = PredBuilder.CreateAlignedLoad(
LoadTy, InVal, Alignment,
(PN.getName() + ".sroa.speculate.load." + Pred->getName()));
```
This new LoadInst object takes the same memory address of the just deleted LI using eraseFromParent(), therefore the bug does not materialize. In very rare cases, the addresses differ and therefore, a dangling pointer is created, causing a crash.
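The general shape of the fix can be sketched without LLVM: whenever an object is destroyed, it is first dropped from any list that still refers to it. `Inst`, `Worklist` and `eraseAndForget` below are hypothetical stand-ins, not the SROA data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

struct Inst {
  int Id;
};

struct Worklist {
  std::vector<std::unique_ptr<Inst>> Owned; // stands in for the parent block
  std::vector<Inst *> DeadInsts;            // processed lazily later on

  Inst *create(int Id) {
    Owned.push_back(std::make_unique<Inst>(Inst{Id}));
    return Owned.back().get();
  }

  // Destroy I, but first remove it from DeadInsts so that no dangling
  // pointer is left behind for the lazy cleanup to trip over.
  void eraseAndForget(Inst *I) {
    DeadInsts.erase(std::remove(DeadInsts.begin(), DeadInsts.end(), I),
                    DeadInsts.end());
    auto It = std::find_if(
        Owned.begin(), Owned.end(),
        [I](const std::unique_ptr<Inst> &P) { return P.get() == I; });
    assert(It != Owned.end() && "instruction is not owned by this worklist");
    Owned.erase(It);
  }
};
```

Erasing from `Owned` alone would reproduce the situation described above: the pointer in `DeadInsts` would only stay usable if a later allocation happened to reuse the same address.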
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D92431
Fangrui Song [Fri, 18 Dec 2020 16:24:42 +0000 (08:24 -0800)]
[ELF] Rename R_TLS to R_TPREL and R_NEG_TLS to R_TPREL_NEG. NFC
The scope of R_TLS (TP offset relocation types (TPREL/TPOFF) used for the
local-exec TLS model) is actually narrower than its name may imply. R_NEG_TLS
is only used by Solaris R_386_TLS_LE_32.
Rename them so that they will be less confusing.
Reviewed By: grimar, psmith, rprichard
Differential Revision: https://reviews.llvm.org/D93467
Nicolas Vasilache [Fri, 18 Dec 2020 16:14:44 +0000 (16:14 +0000)]
[mlir][Linalg] Reflow Linalg.md - NFC
Markdown formatting seems to now be available, reflowing the doc without changing any content.
David Green [Fri, 18 Dec 2020 16:13:08 +0000 (16:13 +0000)]
[ARM] Match dual lane vmovs from insert_vector_elt
MVE has a dual lane vector move instruction, capable of moving two
general purpose registers into lanes of a vector register. They look
like one of:
vmov q0[2], q0[0], r2, r0
vmov q0[3], q0[1], r3, r1
They only accept these lane indices though (and only insert into an
i32), either moving lanes 1 and 3, or 0 and 2.
This patch adds some tablegen patterns for them, selecting from vector
insert elements. Because the insert_elements are known to be
canonicalized to ascending order there are several patterns that we need
to select. These lane indices are:
3 2 1 0 -> vmovqrr 31; vmovqrr 20
3 2 1 -> vmovqrr 31; vmov 2
3 1 -> vmovqrr 31
2 1 0 -> vmovqrr 20; vmov 1
2 0 -> vmovqrr 20
With the top one being the most common. All other potential patterns of
lane indices will be matched by a combination of these and the
individual vmov pattern already present. This does mean that we are
selecting several machine instructions at once due to the need to
re-arrange the inserts, but in this case there is nothing else that will
attempt to match an insert_vector_elt node.
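The lane table above can be reproduced by a small greedy selection: pair lanes (3,1) and (2,0) when both are present, then fall back to single-lane moves. This is only a sketch of the selection logic in plain C++, not the tablegen patterns; `selectLaneMoves` and the printed mnemonics are illustrative.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns the instruction sequence (as text) used to insert into the given
// set of i32 lanes of a 4-lane vector.
std::vector<std::string> selectLaneMoves(std::set<int> Lanes) {
  for (int L : Lanes) {
    assert(L >= 0 && L <= 3 && "only four i32 lanes exist");
    (void)L;
  }
  std::vector<std::string> Out;
  // Dual-lane moves exist only for the pairs (3,1) and (2,0).
  if (Lanes.count(3) && Lanes.count(1)) {
    Out.push_back("vmovqrr 31");
    Lanes.erase(3);
    Lanes.erase(1);
  }
  if (Lanes.count(2) && Lanes.count(0)) {
    Out.push_back("vmovqrr 20");
    Lanes.erase(2);
    Lanes.erase(0);
  }
  // Anything left over is handled by the existing single-lane vmov,
  // highest lane first.
  for (auto It = Lanes.rbegin(); It != Lanes.rend(); ++It)
    Out.push_back("vmov " + std::to_string(*It));
  return Out;
}
```

For example, lanes {3, 2, 1} yield "vmovqrr 31" followed by "vmov 2", matching the second row of the table.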
This is a recommit of 6cc3d80a84884a79967fffa4596c14001b8ba8a3 after
fixing the backward instruction definitions.
Xun Li [Fri, 18 Dec 2020 16:05:04 +0000 (08:05 -0800)]
Cleanup coro-inline.ll
Following up with the comments in D92706.
- Use -passes instead of -enable-new-pm
- CoroEarly should happen before AlwaysInliner, adjust it.
- Remove some unnecessary barriers (still kept one)
- Clean up unnecessary debug info
Differential Revision: https://reviews.llvm.org/D93342
Matt Arsenault [Fri, 18 Dec 2020 15:51:54 +0000 (10:51 -0500)]
PEI: Only call updateLiveness once per function
This only needs to be called once for the function, and it visits all
the necessary blocks in the function. It looks like
631f6b888c50276450fee8b9ef129f37f83fc5a1 accidentally moved this into
the loop over all save blocks.
Simon Pilgrim [Fri, 18 Dec 2020 16:00:27 +0000 (16:00 +0000)]
[X86] Avoid std::string creation in RecognizableInstr constructor. NFCI.
The value names in byteFromRec calls are compile time constants - just create StringRef directly instead of via std::string.
Lucas Prates [Fri, 18 Dec 2020 13:17:35 +0000 (13:17 +0000)]
[AArch64] Updating .arch_extension negative tests
This updates the negative tests for the `.arch_extension` asm directive
to properly enable the extensions being tested on the llvm-mc command
line before validating that the directive correctly disables them.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D93538
Lucas Prates [Wed, 2 Dec 2020 16:00:02 +0000 (16:00 +0000)]
[AArch64] Add support for ls64 to the .arch_extension asm directive
This adds support for the 'ls64' AArch64 extension to the `.arch_extension`
asm directive.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D92574
Simon Pilgrim [Fri, 18 Dec 2020 15:19:43 +0000 (15:19 +0000)]
[X86][AVX] Remove X86ISD::SUBV_BROADCAST (PR38969)
Followup to D92645 - remove the remaining places where we create X86ISD::SUBV_BROADCAST, and fold splatted vector loads to X86ISD::SUBV_BROADCAST_LOAD instead.
Remove all the X86SubVBroadcast isel patterns, including all the fallbacks for if memory folding failed.
Andrzej Warzynski [Fri, 18 Dec 2020 15:32:55 +0000 (15:32 +0000)]
[flang][driver] Rename unittest file (nfc)
This patch renames PrintPreprocessedTest.cpp as FrontendActionTest.cpp.
The latter reflects the contents of the file more accurately.
Sam McCall [Fri, 18 Dec 2020 15:34:34 +0000 (16:34 +0100)]
[clangd] zap a few warnings
Quentin Chateau [Fri, 18 Dec 2020 15:10:29 +0000 (16:10 +0100)]
[clangd] Smarter hover on auto and decltype
Only show the keyword as the hover "Name".
Show whether the type is deduced or undeduced as
the hover "Documentation".
Show the deduced type (if any) as the "Definition".
Don't show any hover information for:
- the "auto" word of "decltype(auto)"
- "auto" in lambda parameters
- "auto" in template arguments
---------------
This diff is a suggestion based on what @sammccall suggested in https://reviews.llvm.org/D92977 about hover on "auto". It somehow "hacks" onto the "Documentation" and "Definition" fields of `HoverInfo`. It sure looks good on VSCode, let me know if this seems acceptable to you.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D93227
Sanjay Patel [Fri, 18 Dec 2020 13:49:05 +0000 (08:49 -0500)]
[VectorCombine] allow peeking through GEPs when creating a vector load
This is an enhancement motivated by https://llvm.org/PR16739
(see D92858 for another).
We can look through a GEP to find a base pointer that may be
safe to use for a vector load. If so, then we shuffle (shift)
the necessary vector element over to index 0.
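As a rough scalar model of the idea (not the VectorCombine implementation): instead of loading one element through an offset pointer, load the whole vector from the base pointer and move the wanted lane to index 0. `Vec4` and `loadElementViaVector` are hypothetical, and the code assumes the base pointer has at least four readable floats, which is the safety condition the pass has to prove.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>;

// Load a 4-element vector from Base, then shift lane Idx to lane 0.
// Assumed precondition: Base points to at least 4 readable floats.
Vec4 loadElementViaVector(const float *Base, std::size_t Idx) {
  assert(Idx < 4 && "element must lie inside the loaded vector");
  Vec4 Wide;
  for (std::size_t I = 0; I < 4; ++I)
    Wide[I] = Base[I];   // the vector load from the base pointer
  Vec4 Result{};         // other lanes are don't-care; zeroed in this model
  Result[0] = Wide[Idx]; // the shuffle that moves the element to index 0
  return Result;
}
```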
Alive2 proof based on 1 of the regression tests:
https://alive2.llvm.org/ce/z/yPJLkh
The vector translation is independent of endianness (verify by
changing to leading 'E' in the datalayout string).
Differential Revision: https://reviews.llvm.org/D93229
Sam McCall [Fri, 18 Dec 2020 14:11:08 +0000 (15:11 +0100)]
[clangd] Fix broken JSON test on windows
Georgii Rymar [Fri, 11 Dec 2020 11:54:39 +0000 (14:54 +0300)]
[libObject, llvm-readobj] - Reimplement `ELFFile<ELFT>::getEntry`.
Currently, `ELFFile<ELFT>::getEntry` does not check an index of
an entry. Because of that the code might read past the end of the symbol
table silently. I've added a test to `llvm-readobj\ELF\relocations.test`
to demonstrate the possible issue. Also, I've added a unit test for
this method.
After this change, `getEntry` stops reporting the section index and
reuses the `getSectionContentsAsArray` method, which already has
all the validation needed. Our related warnings now provide
more and better context sometimes.
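The missing check amounts to validating the index against the table size before forming a pointer to the entry. A minimal sketch, assuming a plain `std::vector` as the table and a null pointer as the error signal; the real `getEntry` reports errors through LLVM's error types, which are not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns a pointer to entry Index, or nullptr when Index is out of range.
// Without the range check, an oversized index would silently read past the
// end of the table.
template <typename T>
const T *getEntry(const std::vector<T> &Table, std::size_t Index) {
  if (Index >= Table.size())
    return nullptr;
  return &Table[Index];
}
```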
Differential revision: https://reviews.llvm.org/D93209
David Green [Fri, 18 Dec 2020 13:33:40 +0000 (13:33 +0000)]
Revert "[ARM] Match dual lane vmovs from insert_vector_elt"
This one needed more testing.
Tomas Matheson [Fri, 18 Dec 2020 13:29:50 +0000 (13:29 +0000)]
[AArch64] Fix Copy Elimination for negative values
Redundant Copy Elimination was eliminating a MOVi32imm -1 when it
determined that the value of the destination register is already -1.
However, it didn't take into account that the MOVi32imm zeroes the upper
32 bits (which are FFFFFFFF) and therefore cannot be eliminated.
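The arithmetic behind this can be checked in isolation. The sketch below models a 32-bit register write as zero-extension into the 64-bit register; `afterMovW`, `looksRedundant` and `isRedundant` are hypothetical helpers, and the "naive" comparison is an assumed simplification of the pass's old check, not a copy of it.

```cpp
#include <cassert>
#include <cstdint>

// Value held by the 64-bit register after Imm is written to its 32-bit
// view: the upper 32 bits are cleared.
std::uint64_t afterMovW(std::int32_t Imm) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(Imm));
}

// Naive check (assumed): compare the known 64-bit value with the
// sign-extended immediate.
bool looksRedundant(std::int64_t KnownX, std::int32_t Imm) {
  return KnownX == static_cast<std::int64_t>(Imm);
}

// Stricter check: the move is redundant only if the register already holds
// exactly what the 32-bit write would leave behind.
bool isRedundant(std::int64_t KnownX, std::int32_t Imm) {
  return static_cast<std::uint64_t>(KnownX) == afterMovW(Imm);
}
```

For a register known to hold the 64-bit value -1, `looksRedundant(-1, -1)` is true while `isRedundant(-1, -1)` is false, because the move would leave 0x00000000FFFFFFFF rather than all ones.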
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D93100
Paul Walker [Wed, 16 Dec 2020 14:58:20 +0000 (14:58 +0000)]
[NFC][SVE] Clean up bfloat isel patterns that emit non-bfloat instructions.
During isel there's no need to protect illegal types. Patch also
adds a missing unit test for tbl2 intrinsic using bfloat types.
Differential Revision: https://reviews.llvm.org/D93404
LLVM GN Syncbot [Fri, 18 Dec 2020 13:00:09 +0000 (13:00 +0000)]
[gn build] Port e69e551e0e5
Aaron Ballman [Fri, 18 Dec 2020 12:53:39 +0000 (07:53 -0500)]
No longer reject tag declarations in the clause-1 of a for loop.
We currently reject this valid C construct by claiming it declares a
non-local variable: for (struct { int i; } s={0}; s.i != 0; s.i--) ;
We expected all declarations in the clause-1 declaration statement to be
a local VarDecl, but there can be other declarations involved such as a
tag declaration. This fixes PR35757.
David Zarzycki [Fri, 18 Dec 2020 11:04:50 +0000 (06:04 -0500)]
[LLDB] Unbreak the build after recent clang changes
9e08e51a20d0d2b1c5724bb17e969d036fced4cd introduced a new enum case.
Frank Derry Wanye [Fri, 18 Dec 2020 12:49:48 +0000 (07:49 -0500)]
new altera single work item barrier check
This lint check is a part of the FLOCL (FPGA Linters for OpenCL)
project out of the Synergy Lab at Virginia Tech.
FLOCL is a set of lint checks aimed at FPGA developers who write code
in OpenCL.
The altera single work item barrier check finds OpenCL kernel functions
that call a barrier function but do not call an ID function. These
kernel functions will be treated as single work-item kernels, which
could be inefficient or lead to errors.
Based on the "Altera SDK for OpenCL: Best Practices Guide."
Aleksandr Platonov [Fri, 18 Dec 2020 12:14:15 +0000 (15:14 +0300)]
[clangd] Ignore the static index refs from the dynamic index files.
This patch fixes the following problem:
- open a file with references to the symbol `Foo`
- remove all references to `Foo` (from the dynamic index).
- `MergedIndex::refs()` result will contain positions of removed references (from the static index).
The idea of this patch is to keep a set of files which were used during index build inside the index.
Thus, when processing the static index references, we can check whether the file containing the reference is part of the dynamic index.
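A minimal sketch of this filtering, assuming toy types: `Ref`, `ToyIndex` and `mergedRefs` are hypothetical and much simpler than clangd's index interfaces.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct Ref {
  std::string File;
  int Line;
};

struct ToyIndex {
  std::vector<Ref> Refs;
  std::set<std::string> Files; // files that were used to build this index
};

// Dynamic refs always win. A static ref is kept only if its file is not
// covered by the dynamic index, since otherwise it may be stale.
std::vector<Ref> mergedRefs(const ToyIndex &Dynamic, const ToyIndex &Static) {
  std::vector<Ref> Out = Dynamic.Refs;
  for (const Ref &R : Static.Refs)
    if (!Dynamic.Files.count(R.File))
      Out.push_back(R);
  return Out;
}
```

If a.cpp is open and all its references to `Foo` have been removed, the dynamic index still lists a.cpp in `Files`, so the stale static reference in a.cpp is dropped while references in unopened files survive.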
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D93393
Pavel Labath [Thu, 10 Dec 2020 14:52:00 +0000 (15:52 +0100)]
[lldb/test] Reduce boilerplate in lldb-server tests
Nearly all of our lldb-server tests have two flavours (lldb-server and
debugserver). Each of them is tagged with an appropriate decorator, and
each of them starts with a call to a matching "init" method. The init
calls are mandatory, and it's not possible to meaningfully combine them
with a different decorator.
This patch leverages the existing decorators to also tag the tests with
the appropriate debug server tag, similar to how we do with debug info
flavours. This allows us to make the "init" calls from inside the common
setUp method.
Kerry McLaughlin [Fri, 18 Dec 2020 11:04:41 +0000 (11:04 +0000)]
[SVE][CodeGen] Vector + immediate addressing mode for masked gather/scatter
This patch extends LowerMGATHER/MSCATTER to make use of the vector + reg/immediate
addressing modes for scalable masked gathers & scatters.
selectGatherScatterAddrMode checks if the base pointer is null, in which case
we can swap the base pointer and the index, e.g.
getelementptr nullptr, <vscale x N x T> (splat(%offset) + %indices)
-> getelementptr %offset, <vscale x N x T> %indices
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D93132
Simon Pilgrim [Fri, 18 Dec 2020 01:01:39 +0000 (01:01 +0000)]
[X86][AVX] Replace extract_subvector(broadcast(), 0) folds with generic SimplifyDemandedVectorEltsForTargetNode handling.
Simplifies a few more cases, notably shuffle demanded elts cases.