Simon Pilgrim [Mon, 20 Jan 2020 16:49:44 +0000 (16:49 +0000)]
[SelectionDAG] GetDemandedBits - fallback to SimplifyMultipleUseDemandedBits by default.
First step towards removing SelectionDAG::GetDemandedBits entirely since it so similar to SimplifyMultipleUseDemandedBits anyhow.
Thomas Preud'homme [Wed, 15 Jan 2020 00:02:15 +0000 (00:02 +0000)]
[FileCheck] Make Match unittest more flexible
Summary:
FileCheck's Match unittest needs updating whenever some call to
initNextPattern() is inserted before its final block of checks. This
commit change usage of LineNumber inside the Tester object so that the
line number of the current pattern can be queries, thereby making the
Match test more solid.
Reviewers: jhenderson, jdenny, probinson, grimar, arichardson, rnk
Reviewed By: jhenderson
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72913
Alexey Bataev [Sat, 18 Jan 2020 01:39:04 +0000 (20:39 -0500)]
[OPENMP]Fix PR44578: crash in target construct with captured global.
Target regions have implicit outer region which may erroneously capture
some globals when it should not. It may lead to a compiler crash at the
compile time.
Martin Probst [Mon, 20 Jan 2020 11:19:47 +0000 (12:19 +0100)]
clang-format: [JS] fix `??` opreator wrapping.
Summary:
clang-format currently treats the nullish coalescing operator `??` like
the ternary operator. That causes multiple nullish terms to be each
indented relative to the last `??`, as they would in a ternary.
The `??` operator is often used in chains though, and as such more
similar to other binary operators, such as `||`. So to fix the indent,
set its token type to `||`, so it inherits the same treatment.
This opens up the question of operator precedence. However, `??` is
required to be parenthesized when mixed with `||` and `&&`, so this is
not a problem that can come up in syntactically legal code.
Reviewers: krasimir
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73026
Haojian Wu [Mon, 20 Jan 2020 13:14:52 +0000 (14:14 +0100)]
[clangd] Avoid redundant testcases in rename unittest, NFC.
Reviewers: kadircet
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73035
Sid Manning [Fri, 27 Dec 2019 19:03:01 +0000 (13:03 -0600)]
Add support for Linux/Musl ABI
Differential revision: https://reviews.llvm.org/D72701
The patch adds a new option ABI for Hexagon. It primary deals with
the way variable arguments are passed and is use in the Hexagon Linux Musl
environment.
If a callee function has a variable argument list, it must perform the
following operations to set up its function prologue:
1. Determine the number of registers which could have been used for passing
unnamed arguments. This can be calculated by counting the number of
registers used for passing named arguments. For example, if the callee
function is as follows:
int foo(int a, ...){ ... }
... then register R0 is used to access the argument ' a '. The registers
available for passing unnamed arguments are R1, R2, R3, R4, and R5.
2. Determine the number and size of the named arguments on the stack.
3. If the callee has named arguments on the stack, it should copy all of these
arguments to a location below the current position on the stack, and the
difference should be the size of the register-saved area plus padding
(if any is necessary).
The register-saved area constitutes all the registers that could have
been used to pass unnamed arguments. If the number of registers forming
the register-saved area is odd, it requires 4 bytes of padding; if the
number is even, no padding is required. This is done to ensure an 8-byte
alignment on the stack. For example, if the callee is as follows:
int foo(int a, ...){ ... }
... then the named arguments should be copied to the following location:
current_position - 5 (for R1-R5) * 4 (bytes) - 4 (bytes of padding)
If the callee is as follows:
int foo(int a, int b, ...){ ... }
... then the named arguments should be copied to the following location:
current_position - 4 (for R2-R5) * 4 (bytes) - 0 (bytes of padding)
4. After any named arguments have been copied, copy all the registers that
could have been used to pass unnamed arguments on the stack. If the number
of registers is odd, leave 4 bytes of padding and then start copying them
on the stack; if the number is even, no padding is required. This
constitutes the register-saved area. If padding is required, ensure
that the start location of padding is 8-byte aligned. If no padding is
required, ensure that the start location of the on-stack copy of the
first register which might have a variable argument is 8-byte aligned.
5. Decrement the stack pointer by the size of register saved area plus the
padding. For example, if the callee is as follows:
int foo(int a, ...){ ... } ;
... then the decrement value should be the following:
5 (for R1-R5) * 4 (bytes) + 4 (bytes of padding) = 24 bytes
The decrement should be performed before the allocframe instruction.
Increment the stack-pointer back by the same amount before returning
from the function.
Thomas Preud'homme [Wed, 15 Jan 2020 23:02:03 +0000 (23:02 +0000)]
[FileCheck] Clean and improve unit tests
Summary:
Clean redundant unit test checks (codepath already tested elsewhere) and
add a few missing checks for existing numeric substitution and match
logic.
Reviewers: jhenderson, jdenny, probinson, grimar, arichardson, rnk
Reviewed By: jhenderson
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72912
Sanjay Patel [Mon, 20 Jan 2020 15:25:47 +0000 (10:25 -0500)]
[InstCombine] form copysign from select of FP constants (PR44153)
This should be the last step needed to solve the problem in the
description of PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
If we're casting an FP value to int, testing its signbit, and then
choosing between a value and its negated value, that's a
complicated way of saying "copysign":
(bitcast X) < 0 ? -TC : TC --> copysign(TC, X)
Differential Revision: https://reviews.llvm.org/D72643
LLVM GN Syncbot [Mon, 20 Jan 2020 15:32:54 +0000 (15:32 +0000)]
[gn build] Port
24b7b99b7d6
Miloš Stojanović [Fri, 17 Jan 2020 13:28:54 +0000 (14:28 +0100)]
[llvm-exegesis][NFC] Disassociate snippet generators from benchmark runners
The addition of `inverse_throughput` mode highlighted the disjointedness
of snippet generators and benchmark runners because it used the
`UopsSnippetGenerator` with the `LatencyBenchmarkRunner`.
To keep the code consistent tie the snippet generators to
parallelization/serialization rather than their benchmark runners.
Renaming `LatencySnippetGenerator` -> `SerialSnippetGenerator`.
Renaming `UopsSnippetGenerator` -> `ParallelSnippetGenerator`.
Differential Revision: https://reviews.llvm.org/D72928
Eric Astor [Mon, 20 Jan 2020 14:54:59 +0000 (09:54 -0500)]
Fix build - removing legacy target reference.
Jon Chesterfield [Mon, 20 Jan 2020 14:52:15 +0000 (14:52 +0000)]
[libomptarget] Implement smid for amdgcn
Summary:
[libomptarget] Implement smid for amdgcn
Implementation is in a new file as it uses an intrinsic with
complicated encoding that warranted substantial comments.
Reviewers: jdoerfert, grokos, ABataev, ronlieb
Reviewed By: jdoerfert
Subscribers: jvesely, mgorny, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D72956
Guillaume Chatelet [Mon, 20 Jan 2020 13:47:28 +0000 (14:47 +0100)]
[Alignment][NFC] Use Align with CreateElementUnorderedAtomicMemCpy
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet, nicolasvasilache
Subscribers: hiraditya, jfb, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73041
Mark Murray [Thu, 16 Jan 2020 10:34:14 +0000 (10:34 +0000)]
[ARM][MVE][Intrinsics] Take abs() of VMINNMAQ, VMAXNMAQ intrinsics' first arguments.
Summary: Fix VMINNMAQ, VMAXNMAQ intrinsics; BOTH arguments have the absolute values taken.
Reviewers: dmgreen, simon_tatham
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72830
Eric Astor [Mon, 20 Jan 2020 14:18:25 +0000 (09:18 -0500)]
[ms] [llvm-ml] Add placeholder for llvm-ml, based on llvm-mc
As discussed on the mailing list, I plan to introduce an ml-compatible MASM assembler as part of providing more of the Windows build tools. This will be similar to llvm-mc, but with different command-line parameters.
This placeholder is purely a stripped-down version of llvm-mc; we'll eventually add support for the Microsoft-style command-line flags, and back it with a MASM parser.
Relanding this revision after fixing ARM-compatibility issues.
Reviewers: rnk, thakis, RKSimon
Reviewed By: thakis, RKSimon
Differential Revision: https://reviews.llvm.org/D72679
Raphael Isemann [Tue, 14 Jan 2020 17:58:17 +0000 (18:58 +0100)]
[lldb][NFC] Add test for iterator invalidation during code completion.
Sanjay Patel [Mon, 20 Jan 2020 13:17:42 +0000 (08:17 -0500)]
[InstSimplify] fold select of vector constants that include undef elements
As mentioned in D72643, we'd like to be able to assert that any select
of equivalent constants has been removed before we're deep into InstCombine.
But there's a loophole in that assertion for vectors with undef elements
that don't match exactly.
This patch should close that gap. If we have undefs, we can't safely
propagate those unless both constants elements for that lane are undef.
Differential Revision: https://reviews.llvm.org/D72958
dfukalov [Mon, 20 Jan 2020 13:40:02 +0000 (16:40 +0300)]
[SCEV] Swap guards estimation sequence. NFC
Summary:
Loop unroll spends a lot of time in SCEVs processing in case when a function
contains hundreds of simple 'for' loops with a quite complex arrays indexes like
for (int i = 0; i < 8; ++i) {
for (int j = 0; j < 32; ++j) {
C[j*8+i] = B[j*32+i+128] + A[i*64+128];
}
}
for (int i = 0; i < 8; ++i) {
for (int j = 0; j < 8; ++j) {
for (int k = 0; k < 32; ++k) {
D[k*64+i*8+j] = D[k*64+i*8+j] + E[i+16] * C[k*8+j+256];
}
}
}
The patch improves loop unroll speed since isLoopBackedgeGuardedByCond takes
much less time than isLoopEntryGuardedByCond in the edge case.
Reviewers: skatkov, sanjoy, mkazantsev
Reviewed By: sanjoy
Subscribers: fhahn, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72929
Raphael Isemann [Mon, 20 Jan 2020 13:02:50 +0000 (14:02 +0100)]
[lldb] Mark the implicit copy constructor as deleted when a move constructor is provided.
Summary:
CXXRecordDecls that have a move constructor but no copy constructor need to
have their implicit copy constructor marked as deleted (see C++11 [class.copy]p7, p18)
Currently we don't do that when building an AST with ClangASTContext which causes
Sema to realise that the AST is malformed and asserting when trying to create an implicit
copy constructor for us in the expression:
```
Assertion failed: ((data().DefaultedCopyConstructorIsDeleted || needsOverloadResolutionForCopyConstructor())
&& "Copy constructor should not be deleted"), function setImplicitCopyConstructorIsDeleted, file include/clang/AST/DeclCXX.h, line 828.
```
In the test case there is a class `NoCopyCstr` that should have its copy constructor marked as
deleted (as it has a move constructor). When we end up trying to tab complete in the
`IndirectlyDeletedCopyCstr` constructor, Sema realises that the `IndirectlyDeletedCopyCstr`
has no implicit copy constructor and tries to create one for us. It then realises that
`NoCopyCstr` also has no copy constructor it could find via lookup. However because we
haven't marked the FieldDecl as having a deleted copy constructor the
`needsOverloadResolutionForCopyConstructor()` returns false and the assert fails.
`needsOverloadResolutionForCopyConstructor()` would return true if during the time we
added the `NoCopyCstr` FieldDecl to `IndirectlyDeletedCopyCstr` we would have actually marked
it as having a deleted copy constructor (which would then mark the copy constructor of
`IndirectlyDeletedCopyCstr ` as needing overload resolution and Sema is happy).
This patch sets the correct mark when we complete our CXXRecordDecls (which is the time when
we know whether a copy constructor has been declared). In theory we don't have to do this if
we had a Sema around when building our debug info AST but at the moment we don't have this
so this has to do the job for now.
Reviewers: shafik
Reviewed By: shafik
Subscribers: aprantl, JDevlieghere, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D72694
Alex Zinenko [Mon, 20 Jan 2020 13:29:25 +0000 (14:29 +0100)]
[mlir] clarify LangRef wording around control flow in regions
It was unclear what "exiting a region" meant in the existing formulation.
Phrase it in terms of control flow transfer to the operation enclosing the
region.
Discussion: https://groups.google.com/a/tensorflow.org/d/msg/mlir/73d2O8gjTuA/xVj1KoCTBAAJ
Simon Tatham [Mon, 20 Jan 2020 13:25:50 +0000 (13:25 +0000)]
[ARM,MVE] Fix confusing MC names for MVE VMINA/VMAXA insns.
Summary:
A recent commit accidentally defined names like `MVE_VMAXAs8` as
instances of the multiclass `MVE_VMINA`, and vice versa. This has no
effect on the test suite, because nothing directly refers to those
instruction names (the isel patterns are generated in Tablegen using
`!cast<Instruction>(NAME)` inside a lower-level multiclass). But it
means that `llvm-mc -show-inst` was listing VMAXA as VMINA, and it
would also affect any further draft code gen patches that use those
instruction ids.
Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73034
Yi Kong [Mon, 20 Jan 2020 12:52:12 +0000 (20:52 +0800)]
[llvm-profdata] Fix hint message since argument format has changed
"-sample" option is now changed to "--sample".
Christian Sigg [Mon, 20 Jan 2020 12:30:24 +0000 (13:30 +0100)]
[mlir] Add in-dialect lowering of gpu.all_reduce.
Reviewers: ftynse, nicolasvasilache, herhut
Reviewed By: ftynse, herhut
Subscribers: liufengdb, aartbik, herhut, merge_guards_bot, mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72129
Andrzej Warzynski [Fri, 17 Jan 2020 12:04:07 +0000 (12:04 +0000)]
[AArch64][SVE] Extend int_aarch64_sve_ld1_gather_imm
The ACLE distinguishes between the following addressing modes for gather
loads:
* "scalar base, vector offset", and
* "vector base, scalar offset".
For the "vector base, scalar offset" case, the
`int_aarch64_sve_ld1_gather_imm` intrinsic was added in
79f2422d.
Currently, that intrinsic assumes that the scalar offset is passed as an
immediate. As a result, it does not cater for cases where scalar offset
is stored in a register.
In this patch `int_aarch64_sve_ld1_gather_imm` is extended so that all
cases are covered:
* `int_aarch64_sve_ld1_gather_imm` is renamed as
`int_aarch64_sve_ld1_gather_scalar_offset`
* new DAG combine rules are added for GLD1_IMM for scenarios where the
offset is a non-immediate scalar or an out-of-range immediate
* sve-intrinsics-gather-loads-vector-base.ll is renamed as
sve-intrinsics-gather-loads-vector-base-imm-offset.ll
* sve-intrinsics-gather-loads-vector-base-scalar-offset.ll is added to test
file for non-immediate offsets
Similar changes are made for scatter store intrinsics.
Reviewed By: sdesmalen, efriedma
Differential Revision: https://reviews.llvm.org/D71773
Pavel Labath [Thu, 14 Nov 2019 13:21:41 +0000 (14:21 +0100)]
[lldb] Allow loading of minidumps with no process id
Summary:
Normally, on linux we retrieve the process ID from the LinuxProcStatus
stream (which is just the contents of /proc/%d/status pseudo-file).
However, this stream is not strictly required (it's a breakpad
extension), and we are encountering a fair amount of minidumps which do
not have it present. It's not clear whether this is the case with all
these minidumps, but the two known situations where this stream can be
missing are:
- /proc filesystem not mounted (or something to that effect)
- process crashing after exhausting (almost) all file descriptors (so
the minidump writer may not be able to open the /proc file)
Since this is a corner case which will become less and less relevant
(crashpad-generated minidumps should not suffer from this problem), I
work around this problem by hardcoding the PID to 1 in these cases.
The same thing is done by the gdb plugin when talking to a stub which
does not report a process id (e.g. a hardware probe).
Reviewers: jingham, clayborg
Subscribers: markmentovai, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D70238
Pavel Labath [Fri, 20 Dec 2019 15:34:55 +0000 (16:34 +0100)]
[lldb] Don't process symlinks deep inside DWARFUnit
Summary:
This code is handling debug info paths starting with /proc/self/cwd,
which is one of the mechanisms people use to obtain "relocatable" debug
info (the idea being that one starts the debugger with an appropriate
cwd and things "just work").
Instead of resolving the symlinks inside DWARFUnit, we can do the same
thing more elegantly by hooking into the existing Module path remapping
code. Since llvm::DWARFUnit does not support any similar functionality,
doing things this way is also a step towards unifying llvm and lldb
dwarf parsers.
Reviewers: JDevlieghere, aprantl, clayborg, jdoerfert
Subscribers: lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D71770
Stephen Kelly [Sat, 11 Jan 2020 12:09:45 +0000 (12:09 +0000)]
Fix the invisible-traversal to ignore more nodes
Unnar Freyr Erlendsson [Mon, 20 Jan 2020 11:49:15 +0000 (12:49 +0100)]
Make SymbolFileDWARF::ParseLineTable use std::sort instead of insertion sort
Summary:
Motivation: When setting breakpoints in certain projects line sequences are frequently being inserted out of order.
Rather than inserting sequences one at a time into a sorted line table, store all the line sequences as we're building them up and sort and flatten afterwards.
Reviewers: jdoerfert, labath
Reviewed By: labath
Subscribers: teemperor, labath, mgrang, JDevlieghere, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D72909
Pavel Labath [Fri, 17 Jan 2020 13:58:46 +0000 (14:58 +0100)]
[lldb/DWARF] Simplify DWARFDebugInfoEntry::LookupAddress
Summary:
This method was doing a lot more than it's only caller needed
(DWARFDIE::LookupDeepestBlock) needed, so I inline it into the caller,
and remove any code which is not actually used. This includes code for
searching for the deepest function, and the code for working around
incomplete DW_AT_low_pc/high_pc attributes on a compile unit DIE (modern
compiler get this right, and this method is called on function DIEs
anyway).
This also improves our llvm consistency, as llvm::DWARFDebugInfoEntry is
just a very simple struct with no nontrivial logic.
Reviewers: JDevlieghere, aprantl
Subscribers: lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D72920
Stephen Kelly [Mon, 20 Jan 2020 11:37:58 +0000 (11:37 +0000)]
Fix clang-formatting for recent commits
Evgeniy Brevnov [Fri, 27 Dec 2019 05:39:24 +0000 (12:39 +0700)]
[LV] Vectorizer should adjust trip count in profile information
Summary: Vectorized loop processes VFxUF number of elements in one iteration thus total number of iterations decreases proportionally. In addition epilog loop may not have more than VFxUF - 1 iterations. This patch updates profile information accordingly.
Reviewers: hsaito, Ayal, fhahn, reames, silvas, dcaballe, SjoerdMeijer, mkuper, DaniilSuchkov
Reviewed By: Ayal, DaniilSuchkov
Subscribers: fedor.sergeev, hiraditya, rkruppe, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67905
Kadir Cetinkaya [Mon, 13 Jan 2020 08:00:19 +0000 (09:00 +0100)]
[clang][CodeComplete] Propogate printing policy to FunctionDecl
Summary:
Printing policy was not propogated to functiondecls when creating a
completion string which resulted in canonical template parameters like
`foo<type-parameter-0-0>`. This patch propogates printing policy to those as
well.
Fixes https://github.com/clangd/clangd/issues/76
Reviewers: ilya-biryukov
Subscribers: jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D72715
Stephen Kelly [Sun, 19 Jan 2020 15:37:24 +0000 (15:37 +0000)]
Compare traversal for memoization before bound nodes container
Stephen Kelly [Sun, 19 Jan 2020 16:27:26 +0000 (16:27 +0000)]
Add missing tests for parent traversal
Haojian Wu [Mon, 20 Jan 2020 11:06:37 +0000 (12:06 +0100)]
[clangd] Remove a stale FIXME, NFC.
Simon Pilgrim [Mon, 20 Jan 2020 10:48:40 +0000 (10:48 +0000)]
[X86][SSE] Add PACKSS SimplifyMultipleUseDemandedBits 'sign bit' handling.
Attempt to use SimplifyMultipleUseDemandedBits to simplify PACKSS if we're only after the sign bit.
Pavel Labath [Fri, 17 Jan 2020 12:43:55 +0000 (13:43 +0100)]
[lldb/DWARF] Change how we construct a llvm::DWARFContext
Summary:
The goal of this patch is two-fold. First, it fixes a use-after-free in
the construction of the llvm DWARFContext. This happened because the
construction code was throwing away the lldb DataExtractors it got while
reading the sections (unlike their llvm counterparts, these are also
responsible for memory ownership). In most cases this did not matter,
because the sections are just slices of the mmapped file data. But this
isn't the case for compressed elf sections, in which case the section is
decompressed into a heap buffer. A similar thing also happen with object
files which are loaded from process memory.
The second goal is to make it explicit which sections go into the llvm
DWARFContext -- any access to the sections through both DWARF parsers
carries a risk of parsing things twice, so it's better if this is a
conscious decision. Also, this avoids loading completely irrelevant
sections (e.g. .text). At present, the only section that needs to be
present in the llvm DWARFContext is the debug_line_str. Using it through
both APIs is not a problem, as there is no parsing involved.
The first goal is achieved by loading the sections through the existing
lldb DWARFContext APIs, which already do the caching. The second by
explicitly enumerating the sections we wish to load.
Reviewers: JDevlieghere, aprantl
Subscribers: lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D72917
Sjoerd Meijer [Mon, 20 Jan 2020 10:26:36 +0000 (10:26 +0000)]
[ARM][MVE] Tail-Predication: rematerialise iteration count in exit blocks
This patch uses helper function rewriteLoopExitValues that is refactored in
D72602 to rematerialise the iteration count in exit blocks, so that we can
clean-up loop update expressions inside the hardware-loops later in
ARMLowOverheadLoops, which is necessary to get actual performance gains for
tail-predicated loops.
Differential Revision: https://reviews.llvm.org/D72714
Evgeniy Brevnov [Mon, 20 Jan 2020 10:04:12 +0000 (17:04 +0700)]
[NFC][LoopUtils] Minor change in comment according to review D71990.
David Spickett [Mon, 13 Jan 2020 13:35:57 +0000 (13:35 +0000)]
[test] On Mac, don't try to use result of sysctl command if calling it failed.
If sysctl is not found at all, let the usual exception propogate
so that the user can fix their env. If it fails because of the
permissions required to read the property then print a warning
and continue.
Differential Revision: https://reviews.llvm.org/D72278
Evgeniy Brevnov [Thu, 16 Jan 2020 10:34:02 +0000 (17:34 +0700)]
[LoopUtils] Better accuracy for getLoopEstimatedTripCount.
Summary: Current implementation of getLoopEstimatedTripCount returns 1 iteration less than it should. The reason is that in bottom tested loop first iteration is executed before first back branch is taken. For example for loop with !{!"branch_weights", i32 1 // taken, i32 1 // exit} metadata getLoopEstimatedTripCount gives 1 while actual number of iterations is 2.
Reviewers: Ayal, fhahn
Reviewed By: Ayal
Subscribers: mgorny, hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71990
Awanish Pandey [Mon, 20 Jan 2020 09:34:20 +0000 (15:04 +0530)]
Recommit "[DWARF5][DebugInfo]: Added support for DebugInfo generation for auto return type for C++ member functions."
Summary:
This was reverted in
328e0f3dcac52171b8cdedeaba22c98e7fbb75ea due to
chromium bot failure. This revision addresses that case.
Original commit message:
Summary:
This patch will provide support for auto return type for the C++ member
functions. Before this return type of the member function is deduced and
stored in the DIE.
This patch includes llvm side implementation of this feature.
Patch by: Awanish Pandey <Awanish.Pandey@amd.com>
Reviewers: dblaikie, aprantl, shafik, alok, SouraVX, jini.susan.george
Reviewed by: dblaikie
Differential Revision: https://reviews.llvm.org/D70524
Georgii Rymar [Thu, 16 Jan 2020 13:13:23 +0000 (16:13 +0300)]
[llvm-objdump] - Fix the indentation when printing dynamic tags.
We have a bug currently: printed tag names might overlap the
value column. It happens for MIPS now.
This patch adds a logic to calculate the size of indentation on fly
to fix such issues.
Differential revision: https://reviews.llvm.org/D72838
Sjoerd Meijer [Mon, 20 Jan 2020 09:01:34 +0000 (09:01 +0000)]
[IndVarSimplify][LoopUtils] rewriteLoopExitValues. NFCI
This moves `rewriteLoopExitValues()` from IndVarSimplify to LoopUtils thus
making it a generic loop utility function. This allows to rewrite loop exit
values by just calling this function without running the whole IndVarSimplify
pass.
We use this in D72714 to rematerialise the iteration count in exit blocks, so
that we can clean-up loop update expressions inside the hardware-loops later.
Differential Revision: https://reviews.llvm.org/D72602
Fangrui Song [Mon, 20 Jan 2020 09:00:21 +0000 (01:00 -0800)]
[test] Simplify CodeGen/PowerPC/stack-coloring-vararg.mir
Georgii Rymar [Thu, 16 Jan 2020 10:51:01 +0000 (13:51 +0300)]
[llvm-mc] - Produce R_X86_64_PLT32 relocation for branches with JCC opcodes too.
The idea is to produce R_X86_64_PLT32 instead of
R_X86_64_PC32 for branches.
It fixes https://bugs.llvm.org/show_bug.cgi?id=44397.
This patch teaches MC to do that for JCC (jump if condition is met)
instructions. The new behavior matches modern GNU as.
It is similar to D43383, which did the same for "call/jmp foo",
but missed JCC cases.
Differential revision: https://reviews.llvm.org/D72831
Nathan Chancellor [Mon, 20 Jan 2020 08:08:31 +0000 (00:08 -0800)]
[LLVMgold][test] Fix llvm-nm test after D72658
Differential Revision: https://reviews.llvm.org/D73014
David Green [Sat, 18 Jan 2020 18:25:45 +0000 (18:25 +0000)]
[ARM] MVE VLDn postinc
This adds Post inc variants of the VLD2/4 and VST2/4 instructions in
MVE. It uses the same mechanism/nodes as Neon, transforming the
intrinsic+add pair into a ARMISD::VLD2_UPD, which gets selected to a
post-inc instruction. The code to do that is mostly taken from the
existing Neon code, but simplified as less variants are needed.
It also fills in some getTgtMemIntrinsic for the arm.mve.vld2/4
instrinsics, which allow the nodes to have MMO's, calculated as the full
length to the memory being loaded/stored.
Differential Revision: https://reviews.llvm.org/D71194
David Green [Sat, 18 Jan 2020 18:15:06 +0000 (18:15 +0000)]
[ARM] MVE VLDn post inc tests. NFC
David Green [Fri, 17 Jan 2020 15:45:14 +0000 (15:45 +0000)]
[ARM] Favour post inc for MVE loops
We were previously not necessarily favouring postinc for the MVE loads
and stores, leading to extra code prior to the loop to set up the
preinc. MVE in general can benefit from postinc (as we don't have
unrolled loops), and certain instructions like the VLD2's only post-inc
versions are available.
Differential Revision: https://reviews.llvm.org/D70790
Fangrui Song [Mon, 20 Jan 2020 05:53:10 +0000 (21:53 -0800)]
[StackColoring] Remap FixedStackPseudoSourceValue frame index referenced by MachineMemOperand
StackColoring::remapInstructions() remaps MachineOperand frame index (e.g. %stack.1 -> %stack.0)
but does not remap FixedStackPseudoSourceValue frame index (e.g. store 4 into %stack.1.ap2.i.i)
referenced by MachineMemoryOperand.
This can cause an assertion failure when LiveDebugValues references a dead stack object.
It is difficult to craft a test case. -g, va_copy and stack-coloring are required.
I can only reproduce it on ppc32.
Kazuaki Ishizaki [Mon, 20 Jan 2020 03:14:37 +0000 (03:14 +0000)]
[mlir] NFC: Fix trivial typos in comments
Differential Revision: https://reviews.llvm.org/D73012
Eric Fiselier [Mon, 20 Jan 2020 02:49:14 +0000 (21:49 -0500)]
[libc++][libc++abi] Fix or suppress failing tests in single-threaded
builds.
Fix a libc++abi test that was incorrectly checking for threading
primitives even when threading was disabled.
Additionally, temporarily XFAIL some module tests that fail because
the <atomic> header is unsupported but still built as a part of the
std module.
To properly address this libc++ would either need to produce a different
module.modulemap for single-threaded configurations, or it would need
to make the <atomic> header not hard-error and instead be empty
for single-threaded configurations
Richard Smith [Wed, 15 Jan 2020 21:21:44 +0000 (13:21 -0800)]
Undo changes to release notes intended for the Clang 10 branch, not
Richard Smith [Fri, 17 Jan 2020 02:35:46 +0000 (18:35 -0800)]
List implicit operator== after implicit destructors in a vtable.
Summary:
We previously listed first declared members, then implicit operator=,
then implicit operator==, then implicit destructors. Per discussion on
https://github.com/itanium-cxx-abi/cxx-abi/issues/88, put the implicit
equality comparison operators at the very end, after all special member
functions.
Reviewers: rjmccall
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D72897
Richard Smith [Mon, 20 Jan 2020 02:14:18 +0000 (18:14 -0800)]
PR42108 Consistently diagnose binding a reference template parameter to
a temporary.
We previously failed to materialize a temporary when performing an
implicit conversion to a reference type, resulting in our thinking the
argument was a value rather than a reference in some cases.
Michael Liao [Mon, 20 Jan 2020 02:11:54 +0000 (21:11 -0500)]
Reorder targets in alphabetical order. NFC.
Florian Hahn [Mon, 20 Jan 2020 01:11:43 +0000 (17:11 -0800)]
[X86] Try to avoid casts around logical vector ops recursively.
Currently PromoteMaskArithemtic only looks at a single operation to
skip casts. This means we miss cases where we combine multiple masks.
This patch updates PromoteMaskArithemtic to try to recursively promote
AND/XOR/AND nodes that terminate in truncates of the right size or
constant vectors.
Reviewers: craig.topper, RKSimon, spatel
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D72524
Fangrui Song [Sun, 19 Jan 2020 22:53:45 +0000 (14:53 -0800)]
[BranchRelaxation] Simplify offset computation and fix a bug in adjustBlockOffsets()
If Start!=0, adjustBlockOffsets() may unnecessarily adjust the offset of
Start. There is no correctness issue, but it can create more block
splits.
Nico Weber [Sun, 19 Jan 2020 23:13:08 +0000 (18:13 -0500)]
fix doc typos to cycle bots
Fangrui Song [Sun, 19 Jan 2020 21:35:54 +0000 (13:35 -0800)]
[TargetRegisterInfo] Default trackLivenessAfterRegAlloc() to true
Except AMDGPU/R600RegisterInfo (a bunch of MIR tests seem to have
problems), every target overrides it with true. PostMachineScheduler
requires livein information. Not providing it can cause assertion
failures in ScheduleDAGInstrs::addSchedBarrierDeps().
Lang Hames [Fri, 17 Jan 2020 22:48:48 +0000 (14:48 -0800)]
[ORC] Add weak symbol support to defineMaterializing, fix for PR40074.
The MaterializationResponsibility::defineMaterializing method allows clients to
add new definitions that are in the process of being materialized to the JIT.
This patch adds support to defineMaterializing for symbols with weak linkage
where the new definitions may be rejected if another materializer concurrently
defines the same symbol. If a weak symbol is rejected it will not be added to
the MaterializationResponsibility's responsibility set. Clients can check for
membership in the responsibility set via the
MaterializationResponsibility::getSymbols() method before resolving any
such weak symbols.
This patch also adds code to RTDyldObjectLinkingLayer to tag COFF comdat symbols
introduced during codegen as weak, on the assumption that these are COFF comdat
constants. This fixes http://llvm.org/PR40074.
Michael Liao [Sun, 19 Jan 2020 17:23:11 +0000 (12:23 -0500)]
Fix gcc `-Wunused-variable` warning. NFC.
Reid Kleckner [Sun, 19 Jan 2020 16:20:17 +0000 (08:20 -0800)]
Remove extra "\01" prefix in EH docs
These escapes haven't been necessary since
f8b51c5f90c60. Remove them to
declutter the docs.
mydeveloperday [Sun, 19 Jan 2020 15:56:04 +0000 (15:56 +0000)]
[clang-format] Expand the SpacesAroundConditions option to include catch statements
Summary: This diff expands the SpacesAroundConditions option added in D68346 to include adding spaces to catch statements.
Reviewed By: MyDeveloperDay
Patch by: timwoj
Differential Revision: https://reviews.llvm.org/D72793
mydeveloperday [Sun, 19 Jan 2020 15:52:26 +0000 (15:52 +0000)]
[clang-format] Add IndentCaseBlocks option
Summary:
The documentation for IndentCaseLabels claimed that the "Switch
statement body is always indented one level more than case labels". This
is technically false for the code block immediately following the label.
Its closing bracket aligns with the start of the label.
If the case label are not indented, it leads to a style where the
closing bracket of the block aligns with the closing bracket of the
switch statement, which can be hard to parse.
This change introduces a new option, IndentCaseBlocks, which when true
treats the block as a scope block (which it technically is).
(Note: regenerated ClangFormatStyleOptions.rst using tools/dump_style.py)
Reviewed By: MyDeveloperDay
Patch By: capn
Tags: #clang-format, #clang
Differential Revision: https://reviews.llvm.org/D72276
mydeveloperday [Sun, 19 Jan 2020 15:41:40 +0000 (15:41 +0000)]
Allow space after C-style cast in C# code
Reviewed By: MyDeveloperDay, krasimir
Patch By: jbcoe
Differential Revision: https://reviews.llvm.org/D72150
LLVM GN Syncbot [Sun, 19 Jan 2020 14:54:02 +0000 (14:54 +0000)]
[gn build] Port
a0f50d73163
Nico Weber [Sun, 19 Jan 2020 14:51:25 +0000 (09:51 -0500)]
fix doc typos to cycle bots
Fangrui Song [Sun, 19 Jan 2020 05:44:06 +0000 (21:44 -0800)]
[CodeGen] Move fentry-insert, xray-instrumentation and patchable-function before addPreEmitPass()
This intention is to move patchable-function before aarch64-branch-targets
(configured in AArch64PassConfig::addPreEmitPass) so that we emit BTI before NOPs
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424).
This also allows addPreEmitPass() passes to know the precise instruction sizes if they want.
Tried x86-64 Debug/Release builds of ccls with -fxray-instrument -fxray-instruction-threshold=1.
No output difference with this commit and the previous commit.
Fangrui Song [Sun, 19 Jan 2020 06:36:33 +0000 (22:36 -0800)]
[XRay] Set hasSideEffects flag of PATCHABLE_FUNCTION_{ENTER,EXIT}
Otherwise they may be picked as the delay slot by mips-delay-slot-filler, if we move patchable-function before mips-delay-slot-filler.
Fangrui Song [Sun, 19 Jan 2020 07:12:22 +0000 (23:12 -0800)]
[DebugInfo][test] Change two MIR tests to use -start-before=livedebugvalues instead of -start-after=patchable-function
To break order dependency between livedebugvalues and patchable-function.
Craig Topper [Sun, 19 Jan 2020 05:25:17 +0000 (23:25 -0600)]
[X86] Remove X86ISD::FILD_FLAG and stop gluing nodes together.
Summary:
I think whatever problem the gluing was fixing has long since been fixed. We don't have any of the restrictions on FP stack stuff that existed back when this was first added.
I had to change which type we use for FILD in BuildFILD when X86 was enabled because most of the isel patterns block f32/f64 instructions when SSE1/SSE2 are enabled. So I needed to use the f80 pattern, but this shouldn't have an effect the generated code since there is only one FILD instruction anyway. We already use f80 explicitly in other other places.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: andrew.w.kaylor, scanon, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72805
Fangrui Song [Thu, 16 Jan 2020 22:08:31 +0000 (14:08 -0800)]
[X86][BranchAlign] Suppress branch alignment for {,_}__tls_get_addr
The x86-64 General Dynamic TLS code sequence uses prefixes to allow
linker relaxation. Adding segment override prefix or NOPs can break
linker relaxation (ld -pie/-no-pie).
i386 General Dynamic and x86-64 Local Dynamic do not use prefixes, but
for simplicity, just disable auto padding consistently.
Reviewed By: skan, LuoYuanke
Differential Revision: https://reviews.llvm.org/D72878
Fangrui Song [Sun, 19 Jan 2020 00:59:11 +0000 (16:59 -0800)]
[AsmPrinter] Delete dead takeDeletedSymbsForFunction()
The code added in r98579 is dead now.
Saar Raz [Sat, 18 Jan 2020 22:45:25 +0000 (00:45 +0200)]
[Concepts] Fix name-type conflict compilation issues
D50360 caused some platforms to not compile due to a parameter with the name of a type.
Rename the parameter.
Saar Raz [Sat, 18 Jan 2020 07:11:43 +0000 (09:11 +0200)]
[Concepts] Requires Expressions
Implement support for C++2a requires-expressions.
Re-commit after compilation failure on some platforms due to alignment issues with PointerIntPair.
Differential Revision: https://reviews.llvm.org/D50360
Fangrui Song [Sat, 18 Jan 2020 21:54:11 +0000 (13:54 -0800)]
[llvm-exegesis][mips] Fix -Wunused-function after D72858
Jonas Devlieghere [Sat, 18 Jan 2020 21:12:45 +0000 (13:12 -0800)]
[lldb/Test] XFAIL TestRequireHWBreakpoints when HW BPs are avialable
Resolves PR44055
Rainer Orth [Sat, 18 Jan 2020 21:10:46 +0000 (22:10 +0100)]
[mlir] NFC: Rename index_t to index_type
mlir currently fails to build on Solaris:
/vol/llvm/src/llvm-project/dist/mlir/lib/Conversion/VectorToLoops/ConvertVectorToLoops.cpp:78:20: error: reference to 'index_t' is ambiguous
IndexHandle zero(index_t(0)), one(index_t(1));
^
/usr/include/sys/types.h:103:16: note: candidate found by name lookup is 'index_t'
typedef short index_t;
^
/vol/llvm/src/llvm-project/dist/mlir/include/mlir/EDSC/Builders.h:27:8: note: candidate found by name lookup is 'mlir::edsc::index_t'
struct index_t {
^
and many more.
Given that POSIX reserves all identifiers ending in `_t` 2.2.2 The Name Space <https://pubs.opengroup.org/onlinepubs/
9699919799/functions/V2_chap02.html>, it seems
quite unwise to use such identifiers in user code, even more so without a distinguished
prefix.
The following patch fixes this by renaming `index_t` to `index_type`.
cases.
Tested on `amd64-pc-solaris2.11` and `sparcv9-sun-solaris2.11`.
Differential Revision: https://reviews.llvm.org/D72619
Alexandre Ganea [Sat, 18 Jan 2020 17:57:16 +0000 (12:57 -0500)]
[mlir] Fix compilation with VS2019.
Jonas Devlieghere [Sat, 18 Jan 2020 19:36:28 +0000 (11:36 -0800)]
[debugserver] Share code between Enable/DisableHardwareWatchpoint (NFC)
This extract the common functionality of enabling and disabling hardware
watchpoints into a single function.
Differential revision: https://reviews.llvm.org/D72971
Fangrui Song [Sat, 18 Jan 2020 17:54:35 +0000 (09:54 -0800)]
[test] clang/test/InterfaceStubs/externstatic.c requires x86-registered-target
Reid Kleckner [Sat, 18 Jan 2020 17:33:00 +0000 (09:33 -0800)]
Revert "[Support] Explicitly instantiate BumpPtrAllocatorImpl"
This reverts commit
add95990508ee0aec90d07bcce1bba47b4f46622.
Buildbots don't seem to like it.
Reid Kleckner [Sat, 18 Jan 2020 17:20:20 +0000 (09:20 -0800)]
[Support] Explicitly instantiate BumpPtrAllocatorImpl
Most clients only ever use the default BumpPtrAllocator.
Eric Astor [Sat, 18 Jan 2020 14:50:32 +0000 (09:50 -0500)]
Revert "[ms] [llvm-ml] Add placeholder for llvm-ml, based on llvm-mc"
This reverts commit
22af2cbefc86dbef6e11ddaa96a08956e0baf22b, due to breakages on ARM platforms.
Saar Raz [Sat, 18 Jan 2020 12:58:01 +0000 (14:58 +0200)]
Revert "[Concepts] Requires Expressions"
This reverts commit
027931899763409e2c61a84bdee6057b5e838ffa.
There have been some failing tests on some platforms, reverting while investigating.
Simon Pilgrim [Sat, 18 Jan 2020 11:29:14 +0000 (11:29 +0000)]
[X86] Rename lowerShuffleAsRotate -> lowerShuffleAsVALIGN
Since it can only ever create VALIGN nodes.
Simon Pilgrim [Sat, 18 Jan 2020 10:55:09 +0000 (10:55 +0000)]
[X86][SSE] Add some v16i8 reverse + endian swap style shuffle tests
Saar Raz [Sat, 18 Jan 2020 07:11:43 +0000 (09:11 +0200)]
[Concepts] Requires Expressions
Implement support for C++2a requires-expressions.
Differential Revision: https://reviews.llvm.org/D50360
Michael Liao [Wed, 15 Jan 2020 07:06:57 +0000 (02:06 -0500)]
[DAG] Add helper for creating constant vector index with correct type. NFC.
Fred Riss [Sat, 18 Jan 2020 04:49:08 +0000 (20:49 -0800)]
[lldb/testsuite] Modernize 2 test Makefiles
Those old Makefiles used completely ad-hoc rules for building files,
which means they didn't obey the test harness' variants.
They were somewhat tricky to update as they use very peculiar build
flags for some files. For this reason I was careful to compare the
build commands before and after the change, which is how I found the
discrepancy fixed by the previous commit.
While some of the make syntax used here might not be easy to grasp for
newcomers (per-target variable overrides), it seems better than to
have to repliacte the Makefile.rules logic for the test variants and
platform support.
Fred Riss [Sat, 18 Jan 2020 04:34:16 +0000 (20:34 -0800)]
[lldb/Makefile.rules] Force the default target to be 'all'
The test harness invokes the test Makefiles with an explicit 'all'
target, but it's handy to be able to recursively call Makefile.rules
without speficying a goal.
Some time ago, we rewrote some tests in terms of recursive invocations
of Makefile.rules. It turns out this had an unintended side
effect. While using $(MAKE) for a recursive invocation passes all the
variables set on the command line down, it doesn't pass the make
goals. This means that those recursive invocations would invoke the
default rule. It turns out the default rule of Makefile.rules is not
'all', but $(EXE). This means that ti would work becuase the
executable is always needed, but it also means that the created
binaries would not follow some of the other top-level build
directives, like MAKE_DSYM.
Forcing 'all' to be the default target seems easier than making sure
all the invocations are correct going forward. This patch does this
using the .DEFAULT_GOAL directive rather than hoisting the 'all' rule
to be the first one of the file. It seems like this explicit approach
will be less prone to be broken in the future. Hopefully all the make
implementations we use support it.
David Blaikie [Sat, 18 Jan 2020 02:12:34 +0000 (18:12 -0800)]
DebugInfo: Move SectionLabel tracking into CU's addRange
This makes the SectionLabel handling more resilient - specifically for
future PROPELLER work which will have more CU ranges (rather than just
one per function).
Ultimately it might be nice to make this more general/resilient to
arbitrary labels (rather than relying on the labels being created for CU
ranges & then being reused by ranges, loclists, and possibly other
addresses). It's possible that other (non-rnglist/loclist) uses of
addresses will need the addresses to be in SectionLabels earlier (eg:
move the CU.addRange to be done on function begin, rather than function
end, so during function emission they are already populated for other
use).
David Blaikie [Sat, 18 Jan 2020 01:29:34 +0000 (17:29 -0800)]
[IR] Remove some unnecessary cleanup in Module's dtor, and use a unique_ptr to simplify some
Follow on from D72812, based on Mehdi Amini's feedback.
Derek Schuff [Wed, 18 Dec 2019 22:50:19 +0000 (14:50 -0800)]
[WebAssembly] Track frame registers through VReg and local allocation
This change has 2 components:
Target-independent: add a method getDwarfFrameBase to TargetFrameLowering. It
describes how the Dwarf frame base will be encoded. That can be a register (the
default), the CFA (which replaces NVPTX-specific logic in DwarfCompileUnit), or
a DW_OP_WASM_location descriptr.
WebAssembly: Allow WebAssemblyFunctionInfo::getFrameRegister to return the
correct virtual register instead of FP32/SP32 after WebAssemblyReplacePhysRegs
has run. Make WebAssemblyExplicitLocals store the local it allocates for the
frame register. Use this local information to implement getDwarfFrameBase
The result is that the DW_AT_frame_base attribute is correctly encoded for each
subprogram, and each param and local variable has a correct DW_AT_location that
uses DW_OP_fbreg to refer to the frame base.
This is a reland of rG3a05c3969c18 with fixes for the expensive-checks
and Windows builds
Differential Revision: https://reviews.llvm.org/D71681
Frank Laub [Sat, 18 Jan 2020 01:11:04 +0000 (17:11 -0800)]
[MLIR] LLVM dialect: modernize and cleanups
Summary:
Modernize some of the existing custom parsing code in the LLVM dialect.
While this reduces some boilerplate code, it also reduces the precision
of the diagnostic error messges.
Reviewers: ftynse, nicolasvasilache, rriddle
Reviewed By: rriddle
Subscribers: merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72967
Matt Arsenault [Sat, 18 Jan 2020 01:02:51 +0000 (20:02 -0500)]
TableGen/GlobalISel: Don't check exact intrinsic opcode value
Matt Arsenault [Sat, 2 Nov 2019 00:57:29 +0000 (17:57 -0700)]
Consolidate internal denormal flushing controls
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.
- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute
AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).
Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.
Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.
-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.
Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.
This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.
AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.
Matt Arsenault [Fri, 17 Jan 2020 15:22:36 +0000 (10:22 -0500)]
AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp
The existing test is overly reliant on -mattr=-flat-for-global, and
some missing optimizations to re-use.