Nikita Popov [Sun, 17 Mar 2019 20:24:02 +0000 (20:24 +0000)]
[ConstantRange] Add fromKnownBits() method
Following the suggestion in D59450, I'm moving the code for constructing
a ConstantRange from KnownBits out of ValueTracking, which also allows us
to test this code independently.
I'm adding this method to ConstantRange rather than KnownBits (which
would have been a bit nicer API wise) to avoid creating a dependency
from Support to IR, where ConstantRange lives.
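As a minimal usage sketch (assuming the method lands with a KnownBits plus an
IsSigned parameter, as described above):
```
#include "llvm/IR/ConstantRange.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// 8-bit value whose top bit is known zero and whose low bit is known one;
// interpreted as unsigned this gives the range [1, 128).
ConstantRange rangeFromKnownBits() {
  KnownBits Known(/*BitWidth=*/8);
  Known.Zero.setBit(7);
  Known.One.setBit(0);
  return ConstantRange::fromKnownBits(Known, /*IsSigned=*/false);
}
```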
Differential Revision: https://reviews.llvm.org/D59475
llvm-svn: 356339
Sanjay Patel [Sun, 17 Mar 2019 19:08:00 +0000 (19:08 +0000)]
[InstCombine] canonicalize rotate right by constant to rotate left
This was noted as a backend problem:
https://bugs.llvm.org/show_bug.cgi?id=41057
...and subsequently fixed for x86:
rL356121
But we should canonicalize these in IR for the benefit of all targets
and to improve IR analyses such as CSE.
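For reference, a small sketch of the identity behind the canonicalization (an
illustration of the math only, not the InstCombine code): a rotate right of an
N-bit value by a constant k equals a rotate left by (N - k) % N.
```
#include <cstdint>

uint32_t rotl32(uint32_t X, unsigned K) {
  return (X << (K & 31)) | (X >> ((32 - K) & 31));
}
uint32_t rotr32(uint32_t X, unsigned K) {
  return (X >> (K & 31)) | (X << ((32 - K) & 31));
}
// rotr32(X, 3) == rotl32(X, 29) for every X.
```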
llvm-svn: 356338
Sanjay Patel [Sun, 17 Mar 2019 18:50:39 +0000 (18:50 +0000)]
[InstCombine] add tests for rotate by constant using funnel intrinsics; NFC
llvm-svn: 356337
David Green [Sun, 17 Mar 2019 16:11:22 +0000 (16:11 +0000)]
[ARM] Search backwards for CMP when combining into CBZ
The constant island pass currently only looks at the instruction immediately
before a branch for a CMP to fold into a CBZ/CBNZ. This extends it to search
backwards for the instruction that defines CPSR. We need to ensure that the
register is not overridden between the CMP and the branch.
Differential Revision: https://reviews.llvm.org/D59317
llvm-svn: 356336
David Green [Sun, 17 Mar 2019 16:00:21 +0000 (16:00 +0000)]
[ARM] Add some CBZ constant island tests. NFC
llvm-svn: 356335
George Rimar [Sun, 17 Mar 2019 15:46:03 +0000 (15:46 +0000)]
[LLD][ELF] - Replace one of the tests with a YAML version.
This removes one more binary from the inputs.
Differential revision: https://reviews.llvm.org/D59085
llvm-svn: 356334
Nikita Popov [Sun, 17 Mar 2019 15:45:38 +0000 (15:45 +0000)]
[DAGCombine] Fold (x & ~y) | y patterns
Fold (x & ~y) | y and it's four commuted variants to x | y. This pattern
can in particular appear when a vselect c, x, -1 is expanded to
(x & ~c) | (-1 & c) and combined to (x & ~c) | c.
This change has some overlap with D59066, which avoids creating a
vselect of this form in the first place during uaddsat expansion.
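For reference, a quick sanity-check sketch of the bitwise identity being folded
(not the DAGCombine code itself):
```
#include <cstdint>

// (x & ~y) | y == x | y for all x and y; the same holds for the commuted variants.
bool identityHolds(uint32_t X, uint32_t Y) {
  return ((X & ~Y) | Y) == (X | Y);
}
```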
Differential Revision: https://reviews.llvm.org/D59174
llvm-svn: 356333
Sanjay Patel [Sun, 17 Mar 2019 14:57:40 +0000 (14:57 +0000)]
[TargetLowering] improve the default expansion of uaddsat/usubsat
This is a subset of what was proposed in:
D59006
...and may overlap with test changes from:
D59174
...but it seems like a good general optimization to turn selects
into bitwise-logic when possible because we never know exactly
what can happen at this stage of DAG combining depending on how
the target has defined things.
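As a hedged scalar illustration of the select-to-bitwise-logic idea (not the
actual TargetLowering expansion), unsigned saturating add/sub can use an
all-ones/all-zeros mask instead of a select:
```
#include <cstdint>

uint32_t uaddsat(uint32_t X, uint32_t Y) {
  uint32_t Sum = X + Y;
  // If the add wrapped, Sum < X, so the mask is all ones and we saturate to max.
  return Sum | (uint32_t)-(uint32_t)(Sum < X);
}
uint32_t usubsat(uint32_t X, uint32_t Y) {
  uint32_t Diff = X - Y;
  // If the sub wrapped, X < Y, so the mask is all zeros and we saturate to 0.
  return Diff & (uint32_t)-(uint32_t)(X >= Y);
}
```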
Differential Revision: https://reviews.llvm.org/D59066
llvm-svn: 356332
Fangrui Song [Sun, 17 Mar 2019 13:53:42 +0000 (13:53 +0000)]
[ELF] De-virtualize findOrphanPos, excludeLibs and handleARMTlsRelocation
llvm-svn: 356331
Alex Bradbury [Sun, 17 Mar 2019 12:02:32 +0000 (12:02 +0000)]
[RISCV][NFC] Factor out matchRegisterNameHelper in RISCVAsmParser.cpp
Contains common logic to match a string to a register name.
llvm-svn: 356330
Alex Bradbury [Sun, 17 Mar 2019 12:00:58 +0000 (12:00 +0000)]
[RISCV] Fix RISCVAsmParser::ParseRegister and add tests
RISCVAsmParser::ParseRegister is called from AsmParser::parseRegisterOrNumber,
which in turn is called when processing CFI directives. The RISC-V
implementation wasn't setting RegNo, and so was incorrect. This patch addresses
that and adds cfi directive tests that demonstrate the fix. A follow-up patch
will factor out the register parsing logic shared between ParseRegister and
parseRegister.
llvm-svn: 356329
Peter Collingbourne [Sat, 16 Mar 2019 19:25:39 +0000 (19:25 +0000)]
CodeGen: Preserve packed attribute in constStructWithPadding.
Otherwise the object may have an incorrect size due to tail padding.
Differential Revision: https://reviews.llvm.org/D59446
llvm-svn: 356328
Simon Pilgrim [Sat, 16 Mar 2019 17:36:26 +0000 (17:36 +0000)]
[DAGCombine] combineShuffleOfScalars - handle non-zero SCALAR_TO_VECTOR indices (PR41097)
rL356292 reduces the size of scalar_to_vector if we know the upper bits are undef - which means that shuffles may find they are suddenly referencing scalar_to_vector elements other than zero - so make sure we handle this as undef.
llvm-svn: 356327
Yonghong Song [Sat, 16 Mar 2019 15:36:31 +0000 (15:36 +0000)]
[BPF] Add BTF Var and DataSec Support
Two new kinds, BTF_KIND_VAR and BTF_KIND_DATASEC, are added.
BTF_KIND_VAR has the following specification:
btf_type.name: var name
btf_type.info: type kind
btf_type.type: var type
// btf_type is followed by one u32
u32: varinfo (currently, only 0 - static, 1 - global allocated in elf sections)
Not all globals are supported in this patch. The following globals are supported:
. static variables with or without section attributes
. global variables with section attributes
The inclusion of globals with section attributes
is for future potential extraction of key/value
type id's from map definition.
BTF_KIND_DATASEC has the following specification:
btf_type.name: section name associated with variable or
one of .data/.bss/.readonly
btf_type.info: type kind and vlen for # of variables
btf_type.size: 0
#vlen number of the following:
u32: id of corresponding BTF_KIND_VAR
u32: in-section offset of the var
u32: the size of memory var occupied
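A C-style sketch of the per-kind payloads described above; the struct and field
names are illustrative only and are not taken from the actual BTF headers:
```
#include <cstdint>

struct BTFVarExtra {      // single u32 that follows the BTF_KIND_VAR btf_type
  uint32_t VarInfo;       // 0 - static, 1 - global allocated in ELF sections
};

struct BTFDataSecEntry {  // repeated vlen times after the BTF_KIND_DATASEC btf_type
  uint32_t VarTypeId;     // id of the corresponding BTF_KIND_VAR
  uint32_t Offset;        // in-section offset of the var
  uint32_t Size;          // size of memory the var occupies
};
```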
At the time of debug info emission, the data section
size is unknown, so the btf_type.size = 0 for
BTF_KIND_DATASEC. The loader can patch it during
loading time.
The in-section offset of the var is only available
for static variables. For global variables, the
loader needs to assign the global variable symbol value in
symbol table to in-section offset.
The size of memory specifies the amount of memory
a variable occupies. Typically, it equals
the type size, but for certain structures, e.g.,
struct tt {
int a;
int b;
char c[];
};
static volatile struct tt s2 = {3, 4, "abcdefghi"};
The static variable s2 has a size of 20.
Note that the BTF_KIND_DATASEC name (the section name)
does not contain the object name. The compiler does have the
input module name. For example, consider the two cases below:
. clang -target bpf -O2 -g -c test.c
The compiler knows the input file (module) is test.c
and can generate sec name like test.data/test.bss etc.
. clang -target bpf -O2 -g -emit-llvm -c test.c -o - |
llc -march=bpf -filetype=obj -o test.o
The llc compiler has the input file as stdin, and
would generate something like stdin.data/stdin.bss etc.
which does not really make sense.
For any user-specified section name, e.g.,
static volatile int a __attribute__((section("id1")));
static volatile const int b __attribute__((section("id2")));
The DataSecs with names "id1" and "id2" do not contain
information about whether the section is read-only or not.
The loader needs to check the corresponding elf section
flags for such information.
A simple example:
-bash-4.4$ cat t.c
int g1;
int g2 = 3;
const int g3 = 4;
static volatile int s1;
struct tt {
int a;
int b;
char c[];
};
static volatile struct tt s2 = {3, 4, "abcdefghi"};
static volatile const int s3 = 4;
int m __attribute__((section("maps"), used)) = 4;
int test() { return g1 + g2 + g3 + s1 + s2.a + s3 + m; }
-bash-4.4$ clang -target bpf -O2 -g -S t.c
Checking t.s, 4 BTF_KIND_VAR's are generated (s1, s2, s3 and m).
4 BTF_KIND_DATASEC's are generated with names
".data", ".bss", ".rodata" and "maps".
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D59441
llvm-svn: 356326
Simon Pilgrim [Sat, 16 Mar 2019 15:02:00 +0000 (15:02 +0000)]
[X86][SSE] Constant fold PEXTRB/PEXTRW/EXTRACT_VECTOR_ELT nodes.
Replaces existing i1-only fold.
llvm-svn: 356325
Simon Pilgrim [Sat, 16 Mar 2019 14:29:50 +0000 (14:29 +0000)]
[X86] Add SimplifyDemandedBitsForTargetNode support for PEXTRB/PEXTRW
Improved constant folding for PEXTRB/PEXTRW will be added in a future commit
llvm-svn: 356324
Csaba Dabis [Sat, 16 Mar 2019 13:47:55 +0000 (13:47 +0000)]
[analyzer] ConditionBRVisitor: Unknown condition evaluation support
Summary:
If the constraint information does not change between two program states, the
analyzer has not learnt new information and makes no report. But this can also
happen because we have no information at all. The new approach evaluates the
condition to determine whether that is the case and lets the user know that we
are just `Assuming...` some value.
Reviewers: NoQ, george.karpenkov
Reviewed By: NoQ
Subscribers: llvm-commits, xazax.hun, baloghadamsoftware, szepet, a.sidorin,
mikhail.ramalho, Szelethus, donat.nagy, dkrupp, gsd, gerazo
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D57410
llvm-svn: 356323
Csaba Dabis [Sat, 16 Mar 2019 11:55:07 +0000 (11:55 +0000)]
[analyzer] ConditionBRVisitor: Remove GDM checking
Summary:
Removed the `GDM` checking, which could prevent reports made by this visitor.
Now we rely on constraint changes instead.
(This reapplies r356318 with a feature from r356319 because of a build-bot failure.)
Reviewers: NoQ, george.karpenkov
Reviewed By: NoQ
Subscribers: cfe-commits, jdoerfert, gerazo, xazax.hun, baloghadamsoftware,
szepet, a.sidorin, mikhail.ramalho, Szelethus, donat.nagy, dkrupp
Tags: #clang
Differential Revision: https://reviews.llvm.org/D54811
llvm-svn: 356322
Csaba Dabis [Sat, 16 Mar 2019 10:44:49 +0000 (10:44 +0000)]
Revert "[analyzer] ConditionBRVisitor: Remove GDM checking"
This reverts commit f962485adad9d646511fd3240c0408d9554e6784.
llvm-svn: 356321
Csaba Dabis [Sat, 16 Mar 2019 10:06:06 +0000 (10:06 +0000)]
Revert "[analyzer] ConditionBRVisitor: Unknown condition evaluation support"
This reverts commit 0fe67a61cd4aec13c7969a179517f1cc06ab05cd.
llvm-svn: 356320
Csaba Dabis [Sat, 16 Mar 2019 09:24:30 +0000 (09:24 +0000)]
[analyzer] ConditionBRVisitor: Unknown condition evaluation support
Summary: If the constraint information does not change between two program states, the analyzer has not learnt new information and makes no report. But this can also happen because we have no information at all. The new approach evaluates the condition to determine whether that is the case and lets the user know that we are just 'Assuming...' some value.
Reviewers: NoQ, george.karpenkov
Reviewed By: NoQ
Subscribers: xazax.hun, baloghadamsoftware, szepet, a.sidorin, mikhail.ramalho, Szelethus, donat.nagy, dkrupp, gsd, gerazo
Tags: #clang
Differential Revision: https://reviews.llvm.org/D57410
llvm-svn: 356319
Csaba Dabis [Sat, 16 Mar 2019 09:16:16 +0000 (09:16 +0000)]
[analyzer] ConditionBRVisitor: Remove GDM checking
Summary: Removed the `GDM` checking, which could prevent reports made by this visitor. Now we rely on constraint changes instead.
Reviewers: NoQ, george.karpenkov
Reviewed By: NoQ
Subscribers: jdoerfert, gerazo, xazax.hun, baloghadamsoftware, szepet, a.sidorin, mikhail.ramalho, Szelethus, donat.nagy, dkrupp
Tags: #clang
Differential Revision: https://reviews.llvm.org/D54811
llvm-svn: 356318
Heejin Ahn [Sat, 16 Mar 2019 05:39:12 +0000 (05:39 +0000)]
[WebAssembly] Use rethrow intrinsic in the rethrow block
Summary:
Because in wasm we merge all catch clauses into one big catchpad, in
case none of the types in the catch handlers matches after we test against
each of them, we should unwind to the next enclosing EH scope. For this,
we should NOT use a call to `__cxa_rethrow` but rather a call to our own
rethrow intrinsic, because what we're trying to do here is just to
transfer the control flow into the next enclosing EH pad (or the
caller). Calls to `__cxa_rethrow` should only be used after a call to
`__cxa_begin_catch`.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D59353
llvm-svn: 356317
Heejin Ahn [Sat, 16 Mar 2019 05:38:57 +0000 (05:38 +0000)]
[WebAssembly] Make rethrow take an except_ref type argument
Summary:
In the new wasm EH proposal, `rethrow` takes an `except_ref` argument.
This change was missing in r352598.
This patch adds the `llvm.wasm.rethrow.in.catch` intrinsic. This intrinsic
will eventually be lowered to the wasm `rethrow` instruction, but it can
appear only within a catchpad or a cleanuppad scope. Also, this intrinsic
needs to be invokable - otherwise the EH pad successor for it will not be
correctly generated in clang.
This also adds lowering logic for this intrinsic in
`SelectionDAGBuilder::visitInvoke`. This routine is basically a
specialized and simplified version of
`SelectionDAGBuilder::visitTargetIntrinsic`, but we can't use it
because it is only for `CallInst`s.
This deletes the previous `llvm.wasm.rethrow` intrinsic and related
tests, which was meant to be used within a `__cxa_rethrow` library
function. It turned out this needs some more logic, so the intrinsic for
this purpose will be added later.
LateEHPrepare takes a result value of `catch` and inserts it into
matching `rethrow` as an argument.
`RETHROW_IN_CATCH` is a pseudo instruction that serves as a link between
`llvm.wasm.rethrow.in.catch` and the real wasm `rethrow` instruction. To
generate a `rethrow` instruction, we need an `except_ref` argument,
which is generated from the `catch` instruction. But `catch` instructions are
added in the LateEHPrepare pass, so we use `RETHROW_IN_CATCH`, which takes
no argument, until we are able to correctly lower it to `rethrow` in
LateEHPrepare.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59352
llvm-svn: 356316
Heejin Ahn [Sat, 16 Mar 2019 04:46:05 +0000 (04:46 +0000)]
[WebAssembly] Method order change in LateEHPrepare (NFC)
Summary:
Currently the order of these methods does not matter, but the following
CL needs to have this order changed. Merging the order change and the
semantics change within a CL complicates the diff, so submitting the
order change first.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59342
llvm-svn: 356315
Peter Collingbourne [Sat, 16 Mar 2019 03:46:02 +0000 (03:46 +0000)]
gn build: Merge r356305.
llvm-svn: 356314
Heejin Ahn [Sat, 16 Mar 2019 03:00:19 +0000 (03:00 +0000)]
[WebAssembly] Irreducible control flow rewrite
Summary:
Rewrite WebAssemblyFixIrreducibleControlFlow to a simpler and cleaner
design, which directly computes reachability and other properties
itself. This avoids previous complexity and bugs. (The new graph
analyses are very similar to how the Relooper algorithm would find loop
entries and so forth.)
This fixes a few bugs, including where we had a false positive and
thought fannkuch was irreducible when it was not, which made us much
larger and slower there, and a reverse bug where we missed
irreducibility. On fannkuch, we used to be 44% slower than asm2wasm and
are now 4% faster.
Reviewers: aheejin
Subscribers: jdoerfert, mgrang, dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D58919
Patch by Alon Zakai (kripken)
llvm-svn: 356313
Fangrui Song [Sat, 16 Mar 2019 02:41:45 +0000 (02:41 +0000)]
[ADT] Make SmallVector emplace_back return a reference
This follows the C++17 std::vector change and can simplify immediate
back() calls.
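A minimal sketch of the simplification this enables:
```
#include "llvm/ADT/SmallVector.h"
#include <utility>

void example() {
  llvm::SmallVector<std::pair<int, int>, 4> V;
  // emplace_back now returns a reference to the new element (as in C++17
  // std::vector), so no separate back() call is needed.
  auto &P = V.emplace_back(1, 2);
  P.second = 3;
}
```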
llvm-svn: 356312
Julian Lettner [Sat, 16 Mar 2019 02:07:50 +0000 (02:07 +0000)]
[TSan][libdispatch] Configure libdispatch lit tests
llvm-svn: 356311
Sam Clegg [Sat, 16 Mar 2019 01:18:12 +0000 (01:18 +0000)]
[WebAssembly] Error on R_WASM_MEMORY_ADDR relocations against undefined data symbols.
For these types of relocations an absolute memory address is
required which is not possible for undefined data symbols. For symbols
that can be undefined at link time (i.e. external data symbols in
shared libraries) a different type of relocation (i.e. via a GOT) will
be needed.
Differential Revision: https://reviews.llvm.org/D59337
llvm-svn: 356310
Amara Emerson [Sat, 16 Mar 2019 01:02:10 +0000 (01:02 +0000)]
[GlobalISel] Make isel verification checks of vregs run under NDEBUG only.
llvm-svn: 356309
Devin Coughlin [Sat, 16 Mar 2019 01:01:29 +0000 (01:01 +0000)]
[analyzer] Teach scan-build to find clang when installed in /usr/local/bin/
Change scan-build to support the scenario where scan-build is installed in
$TOOLCHAIN/usr/local/bin/ but clang itself is installed in $TOOLCHAIN/usr/bin/.
This is restricted to when 'xcrun' is present; that is, on the Mac.
rdar://problem/48914634
Differential Revision: https://reviews.llvm.org/D59406
llvm-svn: 356308
Csaba Dabis [Fri, 15 Mar 2019 23:44:35 +0000 (23:44 +0000)]
hello, clang
Test commit with head and body.
llvm-svn: 356307
Peter Collingbourne [Fri, 15 Mar 2019 22:47:34 +0000 (22:47 +0000)]
gn build: Add missing dependency to check-clang target.
llvm-svn: 356306
Fedor Sergeev [Fri, 15 Mar 2019 22:15:23 +0000 (22:15 +0000)]
[TimePasses] allow -time-passes reporting into a custom stream
The TimePassesHandler object (the implementation of time-passes for the new pass
manager) gains the ability to report into a stream customizable per instance (per
pipeline). The intended use is to specify a separate time-passes output stream per
compilation, by setting up the TimePasses member of StandardInstrumentation during
PassBuilder setup. That allows us to get independent, non-overlapping pass-times
reports for parallel independent compilations (in JIT-like setups).
By default it still puts timing reports into the info-output-file stream
(created by CreateInfoOutputFile every time a report is requested).
A unit test was added for the non-default case, and it also revealed that print()
did not work as declared - it did not reset the timers, leading to yet another
report being printed into the default stream.
Fixed print() to actually reset the timers, as declared in print's comments.
Reviewed By: philip.pfaffe
Differential Revision: https://reviews.llvm.org/D59366
llvm-svn: 356305
Amara Emerson [Fri, 15 Mar 2019 21:59:50 +0000 (21:59 +0000)]
[GlobalISel] Allow MachineIRBuilder to build subregister copies.
This relaxes some asserts about sizes, and adds an optional subreg parameter
to buildCopy().
Also update AArch64 instruction selector to use this in places where we
previously used MachineInstrBuilder manually.
Differential Revision: https://reviews.llvm.org/D59434
llvm-svn: 356304
Eli Friedman [Fri, 15 Mar 2019 21:44:49 +0000 (21:44 +0000)]
[ARM] Add MachineVerifier logic for some Thumb1 instructions.
tMOVr and tPUSH/tPOP/tPOP_RET have register constraints which can't be
expressed in TableGen, so check them explicitly. I've unfortunately run
into issues with both of these recently; hopefully this saves some time
for someone else in the future.
Differential Revision: https://reviews.llvm.org/D59383
llvm-svn: 356303
Jonathan Peyton [Fri, 15 Mar 2019 21:24:45 +0000 (21:24 +0000)]
[OpenMP] Fix OMPT cancellation test for GOMP
The GOMP sections interface uses schedule(dynamic) dispatch so it cannot
be assumed which thread executes the cancel and which thread executes
the cancellation point. This patch allows either thread to execute either
section.
llvm-svn: 356302
Roman Lebedev [Fri, 15 Mar 2019 21:18:05 +0000 (21:18 +0000)]
[X86] X86ISelLowering::combineSextInRegCmov(): also handle i8 CMOV's
Summary:
As noted by @andreadb in https://reviews.llvm.org/D59035#inline-525780
If we have `sext (trunc (cmov C0, C1) to i8)`,
we can instead do `cmov (sext (trunc C0 to i8)), (sext (trunc C1 to i8))`
Reviewers: craig.topper, andreadb, RKSimon
Reviewed By: craig.topper
Subscribers: llvm-commits, andreadb
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59412
llvm-svn: 356301
Roman Lebedev [Fri, 15 Mar 2019 21:17:53 +0000 (21:17 +0000)]
[X86] Promote i8 CMOV's (PR40965)
Summary:
@mclow.lists brought up this issue up in IRC, it came up during
implementation of libc++ `std::midpoint()` implementation (D59099)
https://godbolt.org/z/oLrHBP
Currently the LLVM X86 backend only promotes an i8 CMOV if it came from 2x `trunc`.
This differential proposes to always promote i8 CMOV.
There are several concerns here:
* Is this actually more performant, or is it just the ASM that looks cuter?
* Does this result in partial register stalls?
* What about branch predictor?
# Indeed, performance should be the main point here.
Let's look at a simple microbenchmark: {F8412076}
```
#include "benchmark/benchmark.h"
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <limits>
#include <random>
#include <type_traits>
#include <utility>
#include <vector>
// Future preliminary libc++ code, from Marshall Clow.
namespace std {
template <class _Tp>
__inline _Tp midpoint(_Tp __a, _Tp __b) noexcept {
using _Up = typename std::make_unsigned<typename remove_cv<_Tp>::type>::type;
int __sign = 1;
_Up __m = __a;
_Up __M = __b;
if (__a > __b) {
__sign = -1;
__m = __b;
__M = __a;
}
return __a + __sign * _Tp(_Up(__M - __m) >> 1);
}
} // namespace std
template <typename T>
std::vector<T> getVectorOfRandomNumbers(size_t count) {
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
std::numeric_limits<T>::max());
std::vector<T> v;
v.reserve(count);
std::generate_n(std::back_inserter(v), count,
[&dis, &gen]() { return dis(gen); });
assert(v.size() == count);
return v;
}
struct RandRand {
template <typename T>
static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
return std::make_pair(getVectorOfRandomNumbers<T>(count),
getVectorOfRandomNumbers<T>(count));
}
};
struct ZeroRand {
template <typename T>
static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
return std::make_pair(std::vector<T>(count, T(0)),
getVectorOfRandomNumbers<T>(count));
}
};
template <class T, class Gen>
void BM_StdMidpoint(benchmark::State& state) {
const size_t Length = state.range(0);
const std::pair<std::vector<T>, std::vector<T>> Data =
Gen::template Gen<T>(Length);
const std::vector<T>& a = Data.first;
const std::vector<T>& b = Data.second;
assert(a.size() == Length && b.size() == a.size());
benchmark::ClobberMemory();
benchmark::DoNotOptimize(a);
benchmark::DoNotOptimize(a.data());
benchmark::DoNotOptimize(b);
benchmark::DoNotOptimize(b.data());
for (auto _ : state) {
for (size_t i = 0; i < Length; i++) {
const auto calculated = std::midpoint(a[i], b[i]);
benchmark::DoNotOptimize(calculated);
}
}
state.SetComplexityN(Length);
state.counters["midpoints"] =
benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
state.counters["midpoints/sec"] =
benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
const size_t BytesRead = 2 * sizeof(T) * Length;
state.counters["bytes_read/iteration"] =
benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
benchmark::Counter::OneK::kIs1024);
state.counters["bytes_read/sec"] = benchmark::Counter(
BytesRead, benchmark::Counter::kIsIterationInvariantRate,
benchmark::Counter::OneK::kIs1024);
}
template <typename T>
static void CustomArguments(benchmark::internal::Benchmark* b) {
const size_t L2SizeBytes = 2 * 1024 * 1024;
// What is the largest range we can check to always fit within given L2 cache?
const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
/*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
}
// Both of the values are random.
// The comparison is unpredictable.
BENCHMARK_TEMPLATE(BM_StdMidpoint, int32_t, RandRand)
->Apply(CustomArguments<int32_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint32_t, RandRand)
->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, int64_t, RandRand)
->Apply(CustomArguments<int64_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint64_t, RandRand)
->Apply(CustomArguments<uint64_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, int16_t, RandRand)
->Apply(CustomArguments<int16_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint16_t, RandRand)
->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, int8_t, RandRand)
->Apply(CustomArguments<int8_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint8_t, RandRand)
->Apply(CustomArguments<uint8_t>);
// One value is always zero, and another is bigger or equal than zero.
// The comparison is predictable.
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint32_t, ZeroRand)
->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint64_t, ZeroRand)
->Apply(CustomArguments<uint64_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint16_t, ZeroRand)
->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_StdMidpoint, uint8_t, ZeroRand)
->Apply(CustomArguments<uint8_t>);
```
```
$ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks ./llvm-cmov-bench-OLD ./llvm-cmov-bench-NEW
RUNNING: ./llvm-cmov-bench-OLD --benchmark_out=/tmp/tmp5a5qjm
2019-03-06 21:53:31
Running ./llvm-cmov-bench-OLD
Run on (8 X 4000 MHz CPU s)
CPU Caches:
L1 Data 16K (x8)
L1 Instruction 64K (x4)
L2 Unified 2048K (x4)
L3 Unified 8192K (x1)
Load Average: 1.78, 1.81, 1.36
----------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters<...>
----------------------------------------------------------------------------------------------------
<...>
BM_StdMidpoint<int32_t, RandRand>/131072 300398 ns 300404 ns 2330 bytes_read/iteration=1024k bytes_read/sec=3.25083G/s midpoints=305.398M midpoints/sec=436.319M/s
BM_StdMidpoint<int32_t, RandRand>_BigO 2.29 N 2.29 N
BM_StdMidpoint<int32_t, RandRand>_RMS 2 % 2 %
<...>
BM_StdMidpoint<uint32_t, RandRand>/131072 300433 ns 300433 ns 2330 bytes_read/iteration=1024k bytes_read/sec=3.25052G/s midpoints=305.398M midpoints/sec=436.278M/s
BM_StdMidpoint<uint32_t, RandRand>_BigO 2.29 N 2.29 N
BM_StdMidpoint<uint32_t, RandRand>_RMS 2 % 2 %
<...>
BM_StdMidpoint<int64_t, RandRand>/65536 169857 ns 169858 ns 4121 bytes_read/iteration=1024k bytes_read/sec=5.74929G/s midpoints=270.074M midpoints/sec=385.828M/s
BM_StdMidpoint<int64_t, RandRand>_BigO 2.59 N 2.59 N
BM_StdMidpoint<int64_t, RandRand>_RMS 3 % 3 %
<...>
BM_StdMidpoint<uint64_t, RandRand>/65536 169770 ns 169771 ns 4125 bytes_read/iteration=1024k bytes_read/sec=5.75223G/s midpoints=270.336M midpoints/sec=386.026M/s
BM_StdMidpoint<uint64_t, RandRand>_BigO 2.59 N 2.59 N
BM_StdMidpoint<uint64_t, RandRand>_RMS 3 % 3 %
<...>
BM_StdMidpoint<int16_t, RandRand>/262144 591169 ns 591179 ns 1182 bytes_read/iteration=1024k bytes_read/sec=1.65189G/s midpoints=309.854M midpoints/sec=443.426M/s
BM_StdMidpoint<int16_t, RandRand>_BigO 2.25 N 2.25 N
BM_StdMidpoint<int16_t, RandRand>_RMS 1 % 1 %
<...>
BM_StdMidpoint<uint16_t, RandRand>/262144 591264 ns 591274 ns 1184 bytes_read/iteration=1024k bytes_read/sec=1.65162G/s midpoints=310.378M midpoints/sec=443.354M/s
BM_StdMidpoint<uint16_t, RandRand>_BigO 2.25 N 2.25 N
BM_StdMidpoint<uint16_t, RandRand>_RMS 1 % 1 %
<...>
BM_StdMidpoint<int8_t, RandRand>/524288 2983669 ns 2983689 ns 235 bytes_read/iteration=1024k bytes_read/sec=335.156M/s midpoints=123.208M midpoints/sec=175.718M/s
BM_StdMidpoint<int8_t, RandRand>_BigO 5.69 N 5.69 N
BM_StdMidpoint<int8_t, RandRand>_RMS 0 % 0 %
<...>
BM_StdMidpoint<uint8_t, RandRand>/524288 2668398 ns 2668419 ns 262 bytes_read/iteration=1024k bytes_read/sec=374.754M/s midpoints=137.363M midpoints/sec=196.479M/s
BM_StdMidpoint<uint8_t, RandRand>_BigO 5.09 N 5.09 N
BM_StdMidpoint<uint8_t, RandRand>_RMS 0 % 0 %
<...>
BM_StdMidpoint<uint32_t, ZeroRand>/131072 300887 ns 300887 ns 2331 bytes_read/iteration=1024k bytes_read/sec=3.24561G/s midpoints=305.529M midpoints/sec=435.619M/s
BM_StdMidpoint<uint32_t, ZeroRand>_BigO 2.29 N 2.29 N
BM_StdMidpoint<uint32_t, ZeroRand>_RMS 2 % 2 %
<...>
BM_StdMidpoint<uint64_t, ZeroRand>/65536 169634 ns 169634 ns 4102 bytes_read/iteration=1024k bytes_read/sec=5.75688G/s midpoints=268.829M midpoints/sec=386.338M/s
BM_StdMidpoint<uint64_t, ZeroRand>_BigO 2.59 N 2.59 N
BM_StdMidpoint<uint64_t, ZeroRand>_RMS 3 % 3 %
<...>
BM_StdMidpoint<uint16_t, ZeroRand>/262144 592252 ns 592255 ns 1182 bytes_read/iteration=1024k bytes_read/sec=1.64889G/s midpoints=309.854M midpoints/sec=442.62M/s
BM_StdMidpoint<uint16_t, ZeroRand>_BigO 2.26 N 2.26 N
BM_StdMidpoint<uint16_t, ZeroRand>_RMS 1 % 1 %
<...>
BM_StdMidpoint<uint8_t, ZeroRand>/524288 987295 ns 987309 ns 711 bytes_read/iteration=1024k bytes_read/sec=1012.85M/s midpoints=372.769M midpoints/sec=531.028M/s
BM_StdMidpoint<uint8_t, ZeroRand>_BigO 1.88 N 1.88 N
BM_StdMidpoint<uint8_t, ZeroRand>_RMS 1 % 1 %
RUNNING: ./llvm-cmov-bench-NEW --benchmark_out=/tmp/tmpPvwpfW
2019-03-06 21:56:58
Running ./llvm-cmov-bench-NEW
Run on (8 X 4000 MHz CPU s)
CPU Caches:
L1 Data 16K (x8)
L1 Instruction 64K (x4)
L2 Unified 2048K (x4)
L3 Unified 8192K (x1)
Load Average: 1.17, 1.46, 1.30
----------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters<...>
----------------------------------------------------------------------------------------------------
<...>
BM_StdMidpoint<int32_t, RandRand>/131072 300878 ns 300880 ns 2324 bytes_read/iteration=1024k bytes_read/sec=3.24569G/s midpoints=304.611M midpoints/sec=435.629M/s
BM_StdMidpoint<int32_t, RandRand>_BigO 2.29 N 2.29 N
BM_StdMidpoint<int32_t, RandRand>_RMS 2 % 2 %
<...>
BM_StdMidpoint<uint32_t, RandRand>/131072 300231 ns 300226 ns 2330 bytes_read/iteration=1024k bytes_read/sec=3.25276G/s midpoints=305.398M midpoints/sec=436.578M/s
BM_StdMidpoint<uint32_t, RandRand>_BigO 2.29 N 2.29 N
BM_StdMidpoint<uint32_t, RandRand>_RMS 2 % 2 %
<...>
BM_StdMidpoint<int64_t, RandRand>/65536 170819 ns 170777 ns 4115 bytes_read/iteration=1024k bytes_read/sec=5.71835G/s midpoints=269.681M midpoints/sec=383.752M/s
BM_StdMidpoint<int64_t, RandRand>_BigO 2.60 N 2.60 N
BM_StdMidpoint<int64_t, RandRand>_RMS 3 % 3 %
<...>
BM_StdMidpoint<uint64_t, RandRand>/65536 171705 ns 171708 ns 4106 bytes_read/iteration=1024k bytes_read/sec=5.68733G/s midpoints=269.091M midpoints/sec=381.671M/s
BM_StdMidpoint<uint64_t, RandRand>_BigO 2.62 N 2.62 N
BM_StdMidpoint<uint64_t, RandRand>_RMS 3 % 3 %
<...>
BM_StdMidpoint<int16_t, RandRand>/262144 592510 ns 592516 ns 1182 bytes_read/iteration=1024k bytes_read/sec=1.64816G/s midpoints=309.854M midpoints/sec=442.425M/s
BM_StdMidpoint<int16_t, RandRand>_BigO 2.26 N 2.26 N
BM_StdMidpoint<int16_t, RandRand>_RMS 1 % 1 %
<...>
BM_StdMidpoint<uint16_t, RandRand>/262144 614823 ns 614823 ns 1180 bytes_read/iteration=1024k bytes_read/sec=1.58836G/s midpoints=309.33M midpoints/sec=426.373M/s
BM_StdMidpoint<uint16_t, RandRand>_BigO 2.33 N 2.33 N
BM_StdMidpoint<uint16_t, RandRand>_RMS 4 % 4 %
<...>
BM_StdMidpoint<int8_t, RandRand>/524288 1073181 ns 1073201 ns 650 bytes_read/iteration=1024k bytes_read/sec=931.791M/s midpoints=340.787M midpoints/sec=488.527M/s
BM_StdMidpoint<int8_t, RandRand>_BigO 2.05 N 2.05 N
BM_StdMidpoint<int8_t, RandRand>_RMS 1 % 1 %
BM_StdMidpoint<uint8_t, RandRand>/524288 1071010 ns 1071020 ns 653 bytes_read/iteration=1024k bytes_read/sec=933.689M/s midpoints=342.36M midpoints/sec=489.522M/s
BM_StdMidpoint<uint8_t, RandRand>_BigO 2.05 N 2.05 N
BM_StdMidpoint<uint8_t, RandRand>_RMS 1 % 1 %
<...>
BM_StdMidpoint<uint32_t, ZeroRand>/131072 300413 ns 300416 ns 2330 bytes_read/iteration=1024k bytes_read/sec=3.2507G/s midpoints=305.398M midpoints/sec=436.302M/s
BM_StdMidpoint<uint32_t, ZeroRand>_BigO 2.29 N 2.29 N
BM_StdMidpoint<uint32_t, ZeroRand>_RMS 2 % 2 %
<...>
BM_StdMidpoint<uint64_t, ZeroRand>/65536 169667 ns 169669 ns 4123 bytes_read/iteration=1024k bytes_read/sec=5.75568G/s midpoints=270.205M midpoints/sec=386.257M/s
BM_StdMidpoint<uint64_t, ZeroRand>_BigO 2.59 N 2.59 N
BM_StdMidpoint<uint64_t, ZeroRand>_RMS 3 % 3 %
<...>
BM_StdMidpoint<uint16_t, ZeroRand>/262144 591396 ns 591404 ns 1184 bytes_read/iteration=1024k bytes_read/sec=1.65126G/s midpoints=310.378M midpoints/sec=443.257M/s
BM_StdMidpoint<uint16_t, ZeroRand>_BigO 2.26 N 2.26 N
BM_StdMidpoint<uint16_t, ZeroRand>_RMS 1 % 1 %
<...>
BM_StdMidpoint<uint8_t, ZeroRand>/524288 1069421 ns 1069413 ns 655 bytes_read/iteration=1024k bytes_read/sec=935.092M/s midpoints=343.409M midpoints/sec=490.258M/s
BM_StdMidpoint<uint8_t, ZeroRand>_BigO 2.04 N 2.04 N
BM_StdMidpoint<uint8_t, ZeroRand>_RMS 0 % 0 %
Comparing ./llvm-cmov-bench-OLD to ./llvm-cmov-bench-NEW
Benchmark Time CPU Time Old Time New CPU Old CPU New
----------------------------------------------------------------------------------------------------------------------------------------
<...>
BM_StdMidpoint<int32_t, RandRand>/131072 +0.0016 +0.0016 300398 300878 300404 300880
<...>
BM_StdMidpoint<uint32_t, RandRand>/131072 -0.0007 -0.0007 300433 300231 300433 300226
<...>
BM_StdMidpoint<int64_t, RandRand>/65536 +0.0057 +0.0054 169857 170819 169858 170777
<...>
BM_StdMidpoint<uint64_t, RandRand>/65536 +0.0114 +0.0114 169770 171705 169771 171708
<...>
BM_StdMidpoint<int16_t, RandRand>/262144 +0.0023 +0.0023 591169 592510 591179 592516
<...>
BM_StdMidpoint<uint16_t, RandRand>/262144 +0.0398 +0.0398 591264 614823 591274 614823
<...>
BM_StdMidpoint<int8_t, RandRand>/524288 -0.6403 -0.6403 2983669 1073181 2983689 1073201
<...>
BM_StdMidpoint<uint8_t, RandRand>/524288 -0.5986 -0.5986 2668398 1071010 2668419 1071020
<...>
BM_StdMidpoint<uint32_t, ZeroRand>/131072 -0.0016 -0.0016 300887 300413 300887 300416
<...>
BM_StdMidpoint<uint64_t, ZeroRand>/65536 +0.0002 +0.0002 169634 169667 169634 169669
<...>
BM_StdMidpoint<uint16_t, ZeroRand>/262144 -0.0014 -0.0014 592252 591396 592255 591404
<...>
BM_StdMidpoint<uint8_t, ZeroRand>/524288 +0.0832 +0.0832 987295 1069421 987309 1069413
```
What can we tell from the benchmark?
* `BM_StdMidpoint<[u]int8_t, RandRand>` indeed has the worst performance.
* All `BM_StdMidpoint<uint{8,16,32}_t, ZeroRand>` are all performant, even the 8-bit case.
That is because there we are computing mid point between zero and some random number,
thus if the branch predictor is in use, it is in optimal situation.
* Promoting 8-bit CMOV did improve performance of `BM_StdMidpoint<[u]int8_t, RandRand>`, by -59%..-64%.
# What about branch predictor?
* `BM_StdMidpoint<uint8_t, ZeroRand>` was faster than `BM_StdMidpoint<uint{16,32,64}_t, ZeroRand>`,
which may mean that well-predicted branch is better than `cmov`.
* Promoting 8-bit CMOV degraded performance of `BM_StdMidpoint<uint8_t, ZeroRand>`,
`cmov` is up to +10% worse than well-predicted branch.
* However, I do not believe this is a concern. If the branch is well predicted, then PGO
will also say that it is well predicted, and LLVM will happily expand the cmov back into a branch:
https://godbolt.org/z/P5ufig
# What about partial register stalls?
I'm not really able to answer that.
What I can say is that if the branch is unpredictable (if it is predictable, then use PGO and you'll have a branch),
in ~50% of cases you will have to pay the branch misprediction penalty.
```
$ grep -i MispredictPenalty X86Sched*.td
X86SchedBroadwell.td: let MispredictPenalty = 16;
X86SchedHaswell.td: let MispredictPenalty = 16;
X86SchedSandyBridge.td: let MispredictPenalty = 16;
X86SchedSkylakeClient.td: let MispredictPenalty = 14;
X86SchedSkylakeServer.td: let MispredictPenalty = 14;
X86ScheduleBdVer2.td: let MispredictPenalty = 20; // Minimum branch misdirection penalty.
X86ScheduleBtVer2.td: let MispredictPenalty = 14; // Minimum branch misdirection penalty
X86ScheduleSLM.td: let MispredictPenalty = 10;
X86ScheduleZnver1.td: let MispredictPenalty = 17;
```
...which can be as small as 10 cycles and as large as 20 cycles.
Partial register stalls do not seem to be an issue for AMD CPUs.
For Intel CPUs, they should be around ~5 cycles?
Is that actually an issue here? I'm not sure.
In short, I'd say this is an improvement, at least on this microbenchmark.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40965 | PR40965 ]].
Reviewers: craig.topper, RKSimon, spatel, andreadb, nikic
Reviewed By: craig.topper, andreadb
Subscribers: jfb, jdoerfert, llvm-commits, mclow.lists
Tags: #llvm, #libc
Differential Revision: https://reviews.llvm.org/D59035
llvm-svn: 356300
Nikita Popov [Fri, 15 Mar 2019 21:04:34 +0000 (21:04 +0000)]
[AArch64] Turn BIC immediate creation into a DAG combine
Switch BIC immediate creation for vector ANDs from custom lowering
to a DAG combine, which gives generic DAG combines a change to
apply first. In particular this avoids (and x, -1) being turned into
a (bic x, 0) instead of being eliminated entirely.
Differential Revision: https://reviews.llvm.org/D59187
llvm-svn: 356299
Changpeng Fang [Fri, 15 Mar 2019 21:02:48 +0000 (21:02 +0000)]
AMDGPU: Fix a SIAnnotateControlFlow issue when there are multiple backedges.
Summary:
At the exit of the loop, the compiler uses a register to remember and accumulate
the number of threads that have already exited. When all active threads exit the
loop, this register is used to restore the exec mask, and the execution continues
for the post loop code.
When there is a "continue" in the loop, the compiler made a mistake to reset the
register to 0 when the "continue" backedge is taken. This will result in some
threads not executing the post loop code as they are supposed to.
This patch fixed the issue.
Reviewers: nhaehnle, arsenm
Differential Revision: https://reviews.llvm.org/D59312
llvm-svn: 356298
Alex Langford [Fri, 15 Mar 2019 20:43:53 +0000 (20:43 +0000)]
[CMake] Correct CMake message mode
Summary:
This wasn't actually printing out a CMake warning, it was prepending
"WARN" to the message.
Reviewers: zturner
Subscribers: mgorny, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59432
llvm-svn: 356297
Brian Gesiak [Fri, 15 Mar 2019 20:25:49 +0000 (20:25 +0000)]
[coroutines][PR40978] Emit error for co_yield within catch block
Summary:
As reported in https://bugs.llvm.org/show_bug.cgi?id=40978, it's an
error to use the `co_yield` or `co_await` keywords outside of a valid
"suspension context" as defined by [expr.await]p2 of
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/n4775.pdf.
Whether or not the current scope was in a function-try-block's
(https://en.cppreference.com/w/cpp/language/function-try-block) handler
could be determined using scope flag `Scope::FnTryCatchScope`. No
such flag existed for a simple C++ catch statement, so this commit adds
one.
Reviewers: GorNishanov, tks2103, rsmith
Reviewed By: GorNishanov
Subscribers: EricWF, jdoerfert, cfe-commits, lewissbaker
Tags: #clang
Differential Revision: https://reviews.llvm.org/D59076
llvm-svn: 356296
Dan Liew [Fri, 15 Mar 2019 20:14:46 +0000 (20:14 +0000)]
[CMake] Fix broken uses of `try_compile_only()` and improve the function.
Summary:
There were existing calls to `try_compile_only()` with arguments not
prefixed by `SOURCE` or `FLAGS`. These were silently being ignored.
It looks like the `SOURCE` and `FLAGS` arguments were first introduced
in r278454.
One implication of this is that for a builtins-only build for Darwin
(see `darwin_test_archs()`) we weren't actually passing
`-arch <arch>` to the compiler. This would result in compiler-rt
claiming all supplied architectures could be targeted provided
the compiler could build for Clang's default architecture.
This patch fixes this in several ways.
* Fixes all incorrect calls to `try_compile_only()`.
* Adds code to `try_compile_only()` to check for unhandled arguments
and raises a fatal error if this occurs. This should stop any
incorrect calls in the future.
* Improve the documentation on `try_compile_only()` which seemed
completely wrong.
rdar://problem/48928526
Reviewers: beanz, fjricci, dsanders, kubamracek, yln, dcoughlin
Subscribers: mgorny, jdoerfert, #sanitizers, llvm-commits
Tags: #llvm, #sanitizers
Differential Revision: https://reviews.llvm.org/D59429
llvm-svn: 356295
Craig Topper [Fri, 15 Mar 2019 19:59:35 +0000 (19:59 +0000)]
[X86] Strip the SAE bit from the rounding mode passed to the _RND opcodes. Use TargetConstant to save a conversion in the isel table.
The asm parser generates the immediate without the SAE bit. So for consistency we should generate the MCInst the same way from CodeGen.
Since they are now both the same, remove the masking from the printer and replace with an llvm_unreachable.
Use a target constant since we're rebuilding the node anyway. Then we don't have to have isel convert it. Saves about 500 bytes from the isel table.
llvm-svn: 356294
Philip Reames [Fri, 15 Mar 2019 19:54:06 +0000 (19:54 +0000)]
[SimplifyDemandedVec] Strengthen handling all undef lanes (particularly GEPs)
A change of two parts:
1) A generic enhancement for all callers of SDVE to exploit the fact that if all lanes are undef, the result is undef.
2) A GEP specific piece to strengthen/fix the vector index undef element handling, and call into the generic infrastructure when visiting the GEP.
The result is that we replace a vector gep that has at least one undef in each lane with an undef. We can also do the same for vector intrinsics. Once the masked.load patch (D57372) has landed, I'll update to include call tests as well.
Differential Revision: https://reviews.llvm.org/D57468
llvm-svn: 356293
Simon Pilgrim [Fri, 15 Mar 2019 19:14:28 +0000 (19:14 +0000)]
[X86][SSE] Fold scalar_to_vector(i64 anyext(x)) -> bitcast(scalar_to_vector(i32 anyext(x)))
Reduce the size of an any-extended i64 scalar_to_vector source to i32 - the any_extend nodes are often introduced by SimplifyDemandedBits.
llvm-svn: 356292
Evgeny Mankov [Fri, 15 Mar 2019 19:04:46 +0000 (19:04 +0000)]
[CUDA][Windows] Partial fix for bug 38811 (Step 2 of 3)
Partial fix for the clang Bug 38811 "Clang fails to compile with CUDA-9.x on Windows".
[Synopsis]
__sptr is a new Microsoft specific modifier (https://docs.microsoft.com/en-us/cpp/cpp/sptr-uptr?view=vs-2017).
[Solution]
Replace all `__sptr` occurrences with `__s` (and all `__cptr` with `__c` as well) to eliminate the below clang compilation error on Windows.
In file included from C:\GIT\LLVM\trunk\llvm-64-release-vs2017-15.9.5\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:162:
C:\GIT\LLVM\trunk\llvm-64-release-vs2017-15.9.5\dist\lib\clang\9.0.0\include\__clang_cuda_device_functions.h:524:33: error: expected expression
return __nv_fast_sincosf(__a, __sptr, __cptr);
^
Reviewed by: Artem Belevich
Differential Revision: http://reviews.llvm.org/D59423
llvm-svn: 356291
Nikita Popov [Fri, 15 Mar 2019 18:37:45 +0000 (18:37 +0000)]
[ValueTracking] Use ConstantRange overflow checks for unsigned add/sub; NFC
Use the methods introduced in rL356276 to implement the
computeOverflowForUnsigned(Add|Sub) functions in ValueTracking, by
converting the KnownBits into a ConstantRange.
This is NFC: The existing KnownBits based implementation uses the same
logic as the ConstantRange based one. This is not the case for the
signed equivalents, so I'm only changing unsigned here.
This is in preparation for D59386, which will also intersect the
computeConstantRange() result into the range determined from KnownBits.
llvm-svn: 356290
Jonathan Peyton [Fri, 15 Mar 2019 18:27:14 +0000 (18:27 +0000)]
[OpenMP] Add missing parenthesis in Perl module
llvm-svn: 356289
Jonathan Peyton [Fri, 15 Mar 2019 18:24:59 +0000 (18:24 +0000)]
[OpenMP] Remove deprecated taskq
Remove very old, unused, and deprecated taskq code.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D58989
llvm-svn: 356288
Sanjay Patel [Fri, 15 Mar 2019 18:14:25 +0000 (18:14 +0000)]
[InstCombine] add tests for logic of NaN fcmps; NFC
llvm-svn: 356287
Nikita Popov [Fri, 15 Mar 2019 18:08:06 +0000 (18:08 +0000)]
[ConstantRange] Try to fix compiler warnings; NFC
Try to fix "ignoring return value" and "default label" errors on
clang-with-thin-lto-ubuntu buildbot.
llvm-svn: 356286
Philip Reames [Fri, 15 Mar 2019 18:06:32 +0000 (18:06 +0000)]
[tests] Add a test for constexpr mask as requested in D57372
llvm-svn: 356285
Zachary Turner [Fri, 15 Mar 2019 18:00:43 +0000 (18:00 +0000)]
Abbreviation declarations are required to have non-null tags.
Treat a null tag as an error.
llvm-svn: 356284
Sanjay Patel [Fri, 15 Mar 2019 18:00:28 +0000 (18:00 +0000)]
[InstCombine] add tests for masked store/scatter; NFC
Baseline tests for D57247
llvm-svn: 356283
Amara Emerson [Fri, 15 Mar 2019 18:00:01 +0000 (18:00 +0000)]
[AArch64][GlobalISel] Regbankselect: Fix G_BUILD_VECTOR trying to use s16 gpr sources.
Since we can't insert s16 gprs (we don't have 16-bit GPR registers), we need to
teach RBS to assign them to the FPR bank so that our selector works.
llvm-svn: 356282
Julian Lettner [Fri, 15 Mar 2019 17:52:27 +0000 (17:52 +0000)]
[TSan][libdispatch] Enable linking and running of tests on Linux
When COMPILER_RT_INTERCEPT_LIBDISPATCH is ON the TSan runtime library
now has a dependency on the blocks runtime and libdispatch. Make sure we
set all the required linking options.
Also add cmake options for specifying additional library paths to
instruct the linker where to search for libdispatch and the blocks
runtime. This allows us to build TSan runtime with libdispatch support
without installing those libraries into default linker library paths.
`CMAKE_TRY_COMPILE_TARGET_TYPE=STATIC_LIBRARY` is necessary to avoid
aborting the build due to failing the link step in CMake's
check_c_compiler test.
Reviewed By: dvyukov, kubamracek
Differential Revision: https://reviews.llvm.org/D59334
llvm-svn: 356281
Philip Reames [Fri, 15 Mar 2019 17:50:30 +0000 (17:50 +0000)]
[X86][GlobalISEL] Support lowering aligned unordered atomics
The existing lowering code is accidentally correct for unordered atomics as far as I can tell. An unordered atomic has no memory ordering, and simply requires the actual load or store to be done as a single well aligned instruction. As such, relax the restriction while adding tests to ensure the lowering remains correct in the future.
Differential Revision: https://reviews.llvm.org/D57803
llvm-svn: 356280
Yonghong Song [Fri, 15 Mar 2019 17:39:10 +0000 (17:39 +0000)]
[BPF] handle external global properly
Previous commit 6bc58e6d3dbd ("[BPF] do not generate unused local/global types")
tried to exclude global variables from type generation. The condition was:
if (Global.hasExternalLinkage())
continue;
This is not right. It also excluded initialized globals.
The correct condition (from AssemblyWriter::printGlobal()) is:
if (!GV->hasInitializer() && GV->hasExternalLinkage())
Out << "external ";
Let us do the same in BTF type generation. Also added a test for it.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 356279
Zachary Turner [Fri, 15 Mar 2019 17:32:05 +0000 (17:32 +0000)]
Return Error and Expected from more DWARF interfaces.
This continues the work of introducing Error and Expected into
the DWARF parsing interfaces, this time for the DWARFCompileUnit
and DWARFDebugAranges classes.
Differential Revision: https://reviews.llvm.org/D59381
llvm-svn: 356278
Aaron Enye Shi [Fri, 15 Mar 2019 17:31:51 +0000 (17:31 +0000)]
[HIP-Clang] propagate -mllvm options to opt and llc
Change the HIP Toolchain to pass the OPT_mllvm options into OPT and LLC stages. Added a lit test to verify the command args.
Reviewers: yaxunl
Differential Revision: https://reviews.llvm.org/D59316
llvm-svn: 356277
Nikita Popov [Fri, 15 Mar 2019 17:29:05 +0000 (17:29 +0000)]
[ConstantRange] Add overflow check helpers
Add functions to ConstantRange that determine whether the
unsigned/signed addition/subtraction of two ConstantRanges
may/always/never overflows. This will allow checking overflow
conditions based on known constant ranges in addition to known bits.
I'm implementing these methods on ConstantRange to allow them to be
unit tested independently of any ValueTracking machinery. The tests
include exhaustive testing on 4-bit ranges, to make sure the result
is both conservatively correct and maximally precise.
The OverflowResult enum is redeclared on ConstantRange, because
I wanted to avoid a dependency in either direction between
ValueTracking.h and ConstantRange.h.
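A small usage sketch, assuming the helpers land roughly as described; the
method name unsignedAddMayOverflow and the NeverOverflows enumerator below are
assumptions based on the description, not a definitive API reference:
```
#include "llvm/IR/ConstantRange.h"
using namespace llvm;

// True when the unsigned addition of two ranges can never overflow.
bool unsignedAddNeverOverflows(const ConstantRange &A, const ConstantRange &B) {
  return A.unsignedAddMayOverflow(B) ==
         ConstantRange::OverflowResult::NeverOverflows;
}
```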
Differential Revision: https://reviews.llvm.org/D59193
llvm-svn: 356276
Adrian Prantl [Fri, 15 Mar 2019 17:22:00 +0000 (17:22 +0000)]
Implement a better way of not passing the sanitizer environment on to tests.
rdar://problem/48889580
llvm-svn: 356275
Simon Pilgrim [Fri, 15 Mar 2019 17:17:37 +0000 (17:17 +0000)]
[AArch64] Regenerate build vector tests
llvm-svn: 356274
Simon Pilgrim [Fri, 15 Mar 2019 17:00:55 +0000 (17:00 +0000)]
[SelectionDAG] Add SimplifyDemandedBits handling for ISD::SCALAR_TO_VECTOR
Fixes a lot of constant folding mismatches between i686 and x86_64
llvm-svn: 356273
Robert Widmann [Fri, 15 Mar 2019 16:57:23 +0000 (16:57 +0000)]
[LLVM-C] Expose the "Add Discriminators" Pass To LLVM-C
Summary: Add bindings to create a wrapped "Add Discriminators" pass. Now that we have debug info support, this is a handy transform to have.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: dblaikie, aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58624
llvm-svn: 356272
Davide Italiano [Fri, 15 Mar 2019 16:55:51 +0000 (16:55 +0000)]
[DataFormatters] Remove LLDB_DISABLE_PYTHON from TypeCategory.
llvm-svn: 356271
Simon Pilgrim [Fri, 15 Mar 2019 16:16:49 +0000 (16:16 +0000)]
[X86] Add SimplifyDemandedBitsForTargetNode support for PINSRB/PINSRW
llvm-svn: 356270
Pavel Labath [Fri, 15 Mar 2019 15:34:10 +0000 (15:34 +0000)]
YAMLIO: Improve endian type support
Summary:
Now that endian types support enumerations (D59141), the existing yaml
support for them is somewhat insufficient. The current solution was to
define the ScalarTraits class for these types, which always forwards to
the ScalarTraits of the underlying type. However, the enum types will
usually have ScalarEnumerationTraits or ScalarBitsetTraits.
In this patch I add the two extra Traits types to the endian types. In
order to properly SFINAE-ize them, I've also added an extra "Enable"
template argument to the Traits template classes.
Reviewers: zturner, sammccall
Subscribers: kristina, Bigcheese, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59289
llvm-svn: 356269
Teresa Johnson [Fri, 15 Mar 2019 15:11:38 +0000 (15:11 +0000)]
[ThinLTO] Restructure AliasSummary to contain ValueInfo of Aliasee
Summary:
The AliasSummary previously contained the AliaseeGUID, which was only
populated when reading the summary from bitcode. This patch changes it
to instead hold the ValueInfo of the aliasee, and always populates it.
This enables more efficient access to the ValueInfo (specifically in the
recent patch r352438 which needed to perform an index hash table lookup
using the aliasee GUID).
As noted in the comments in AliasSummary, we no longer technically need
to keep a pointer to the corresponding aliasee summary, since it could
be obtained by walking the list of summaries on the ValueInfo looking
for the summary in the same module. However, I am concerned that this
would be inefficient when walking through the index during the thin
link for various analyses. That can be reevaluated in the future.
By always populating this new field, we can remove the guard and special
handling for a 0 aliasee GUID when dumping the dot graph of the summary.
An additional improvement in this patch is when reading the summaries
from LLVM assembly we now set the AliaseeSummary field to the aliasee
summary in that same module, which makes it consistent with the behavior
when reading the summary from bitcode.
Reviewers: pcc, mehdi_amini
Subscribers: inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D57470
llvm-svn: 356268
Simon Pilgrim [Fri, 15 Mar 2019 15:07:44 +0000 (15:07 +0000)]
[Hexagon] Remove icmp undef from reduced tests
Pre-commit for D59363 (Add icmp UNDEF handling to SelectionDAG::FoldSetCC)
Approved by @kparzysz (Krzysztof Parzyszek)
llvm-svn: 356267
Marshall Clow [Fri, 15 Mar 2019 15:00:41 +0000 (15:00 +0000)]
Update a deque test with more assertions. NFC
llvm-svn: 356266
Mircea Trofin [Fri, 15 Mar 2019 15:00:12 +0000 (15:00 +0000)]
[llvm] Skip over empty line table entries.
Summary:
This is similar to how addr2line handles consecutive entries with the
same address - pick the last one.
Reviewers: dblaikie, friss, JDevlieghere
Reviewed By: dblaikie
Subscribers: eugenis, vitalybuka, echristo, JDevlieghere, probinson, aprantl, hiraditya, rupprecht, jdoerfert, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D58952
llvm-svn: 356265
Ilya Biryukov [Fri, 15 Mar 2019 14:30:07 +0000 (14:30 +0000)]
[clangd] Remove includes of "gmock-matchers.h". NFC
For consistency with most of the test code.
llvm-svn: 356264
Pavel Labath [Fri, 15 Mar 2019 14:03:52 +0000 (14:03 +0000)]
Fix a typo in FindLibEdit.cmake
The package name is LibEdit, so we should use that name in the call to
find_package_handle_standard_args. Failing to do so results in the
standard_args (such as the one telling us whether REQUIRED was used in
the find_package invocation) not being handled.
llvm-svn: 356263
Pavel Labath [Fri, 15 Mar 2019 14:02:35 +0000 (14:02 +0000)]
Delete type_sp member from TypePair
Summary:
As discussed in the review of D59217, this member is unnecessary, since the
first thing we always do is convert it to a CompilerType.
This opens up possibilities for further cleanups (e.g. the whole
TypePair class now loses purpose, since we can just pass around
CompilerType everywhere), but I did not want to do that yet, because I
am not sure if this will not introduce breakages in some of the
platforms/configurations that I am not testing on.
Reviewers: clayborg, zturner, jingham
Subscribers: jdoerfert, lldb-commits
Differential Revision: https://reviews.llvm.org/D59297
llvm-svn: 356262
Ilya Biryukov [Fri, 15 Mar 2019 14:00:49 +0000 (14:00 +0000)]
[clangd] Tune the fuzzy-matching algorithm
Summary:
To reduce the gap between prefix and initialism matches.
The motivation is producing better scoring in one particular example,
but the change does not seem to cause large regressions in other cases.
The example is matching 'up' against 'unique_ptr' and 'upper_bound'.
Before the change, we had:
- "[u]nique_[p]tr" with a score of 0.3,
- "[up]per_bound" with a score of 1.0.
A 3x difference meant that symbol quality signals were almost always ignored
and 'upper_bound' was always ranked higher.
However, intuitively, the match scores should be very close for the two.
After the change we have the following scores:
- "[u]nique_[p]tr" with a score of 0.75,
- "[up]per_bound" with a score of 1.0.
Reviewers: ioeric
Reviewed By: ioeric
Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D59300
llvm-svn: 356261
Mikael Holmen [Fri, 15 Mar 2019 13:51:05 +0000 (13:51 +0000)]
[CodeGenPrepare] avoid crashing from replacing a phi twice
Summary:
This is a fix to bug 41052:
https://bugs.llvm.org/show_bug.cgi?id=41052
While trying to optimize a memory instruction in a dead basic block, we end up registering the same phi for replacement twice. This patch avoids registering more than the first replacement candidate for a phi.
Patch by: JesperAntonsson
Reviewers: skatkov, aprantl
Reviewed By: aprantl
Subscribers: jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59358
llvm-svn: 356260
Sam Parker [Fri, 15 Mar 2019 13:36:37 +0000 (13:36 +0000)]
[ARM] Remove EarlyCSE from backend
There is an issue with early CSE hitting an assert, so temporarily
remove the pass from the Arm backend.
Bug: https://bugs.llvm.org/show_bug.cgi?id=41081
Differential Revision: https://reviews.llvm.org/D59410
llvm-svn: 356259
Michael Liao [Fri, 15 Mar 2019 12:42:21 +0000 (12:42 +0000)]
[AMDGPU] Fix SGPR fixing through SCC chaining
Summary:
- During the fixing of SGPR copying from VGPR, ensure users of SCC are
properly propagated, i.e.
* only propagate through live def of SCC,
* skip the SCC-def inst itself, and
* stop the propagation on the other SCC-def inst after checking its
SCC-use first.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59362
llvm-svn: 356258
Florian Hahn [Fri, 15 Mar 2019 12:37:50 +0000 (12:37 +0000)]
[LSR] Update test from rL356256 after rebase.
llvm-svn: 356257
Florian Hahn [Fri, 15 Mar 2019 12:17:36 +0000 (12:17 +0000)]
[LSR] Check for signed overflow in NarrowSearchSpaceByDetectingSupersets.
We are adding a sign extended IR value to an int64_t, which can cause
signed overflows, as in the attached test case, where we have a formula
with BaseOffset = -1 and a constant with numeric_limits<int64_t>::min().
If the addition would overflow, skip the simplification for this
formula. Note that the target triple is required to trigger the failure.
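A hedged sketch of the kind of guard being added (not the exact LSR code;
__builtin_add_overflow merely stands in for the overflow check):
```
#include <cstdint>

// Skip the simplification when BaseOffset + C would overflow int64_t,
// e.g. BaseOffset = -1 and C = INT64_MIN as in the attached test case.
bool sumWouldOverflow(int64_t BaseOffset, int64_t C) {
  int64_t Sum;
  return __builtin_add_overflow(BaseOffset, C, &Sum);
}
```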
Reviewers: qcolombet, gilr, kparzysz, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D59211
llvm-svn: 356256
Evgeny Mankov [Fri, 15 Mar 2019 12:05:36 +0000 (12:05 +0000)]
[CUDA][Windows] Partial fix for bug #38811 (Step 1 of 3)
Partial fix for the clang Bug https://bugs.llvm.org/show_bug.cgi?id=38811 "Clang fails to compile with CUDA-9.x on Windows".
Adding a defined(_WIN64) check alongside the existing #if defined(__LP64__) eliminates the below clang (64-bit) compilation error on Windows.
C:/GIT/LLVM/trunk/llvm-64-release-vs2017/dist/lib/clang/9.0.0\include\__clang_cuda_device_functions.h(1609,45): error GEF7559A7: no matching function for call to 'roundf'
__DEVICE__ long lroundf(float __a) { return roundf(__a); }
Reviewed by: Artem Belevich
Differential Revision: http://reviews.llvm.org/D59361
llvm-svn: 356255
Nico Weber [Fri, 15 Mar 2019 11:54:01 +0000 (11:54 +0000)]
Rename directory housing clang-change-namespace to be eponymous
Makes the name of this directory consistent with the names of the other
directories in clang-tools-extra.
Differential Revision: https://reviews.llvm.org/D59382
llvm-svn: 356254
Simon Pilgrim [Fri, 15 Mar 2019 11:24:17 +0000 (11:24 +0000)]
[SPARC] Regenerate label test for D59363
llvm-svn: 356253
Simon Pilgrim [Fri, 15 Mar 2019 11:14:59 +0000 (11:14 +0000)]
[ARM] Remove icmp undef from reduced tests
Pre-commit for D59363 (Add icmp UNDEF handling to SelectionDAG::FoldSetCC)
Approved by @efriedma (Eli Friedman)
llvm-svn: 356252
Simon Pilgrim [Fri, 15 Mar 2019 11:13:26 +0000 (11:13 +0000)]
[WebAssembly] Remove icmp undef in stackify test
Pre-commit for D59363 (Add icmp UNDEF handling to SelectionDAG::FoldSetCC)
Approved by @tlively (Thomas Lively)
llvm-svn: 356251
Benjamin Kramer [Fri, 15 Mar 2019 11:09:41 +0000 (11:09 +0000)]
Make getFullyQualifiedName qualify both the pointee and class type for member ptr types
We already handle pointers and references; member ptrs are just another
special case. Fixes PR40732.
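A hypothetical example of the case now handled (the namespace and type names
are made up for illustration):
```
namespace outer {
struct T {};
struct S { void f(T); };
} // namespace outer

// For a member pointer like this, both the class type (outer::S) and the
// pointee/parameter types (outer::T) should now be spelled fully qualified,
// i.e. "void (outer::S::*)(outer::T)".
using MemFn = void (outer::S::*)(outer::T);
```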
Differential Revision: https://reviews.llvm.org/D59387
llvm-svn: 356250
Simon Pilgrim [Fri, 15 Mar 2019 11:05:42 +0000 (11:05 +0000)]
[X86][SSE] Attempt to convert SSE shift-by-var to shift-by-imm.
Prep work for PR40203
llvm-svn: 356249
Fangrui Song [Fri, 15 Mar 2019 10:43:51 +0000 (10:43 +0000)]
[llvm-profdata] Deleted unused Cutoffs added by D16005
llvm-svn: 356248
James Henderson [Fri, 15 Mar 2019 10:35:27 +0000 (10:35 +0000)]
[yaml2obj]Allow explicit setting of p_filesz, p_memsz, and p_offset
yaml2obj currently derives the p_filesz, p_memsz, and p_offset values of
program headers from their sections. This makes writing tests for
certain formats more complex, and sometimes impossible. This patch
allows setting these fields explicitly, overriding the default value,
when relevant.
Reviewed by: jakehehrlich, Higuoxing
Differential Revision: https://reviews.llvm.org/D59372
llvm-svn: 356247
Fangrui Song [Fri, 15 Mar 2019 10:34:57 +0000 (10:34 +0000)]
[llvm-readobj] Delete unused variable. NFC
llvm-svn: 356246
Fangrui Song [Fri, 15 Mar 2019 10:27:28 +0000 (10:27 +0000)]
[llvm-objcopy] Delete unused parameter from replaceDebugSections. NFC
llvm-svn: 356245
Fangrui Song [Fri, 15 Mar 2019 10:20:51 +0000 (10:20 +0000)]
[llvm-objcopy] Don't use {}; NFC
llvm-svn: 356244
Sam Parker [Fri, 15 Mar 2019 10:19:32 +0000 (10:19 +0000)]
[ARM][ParallelDSP] Disable for big-endian
Bail early when we don't have a preheader and also if the target is
big endian because it's written with only little endian in mind!
Differential Revision: https://reviews.llvm.org/D59368
llvm-svn: 356243
Jonas Hahnfeld [Fri, 15 Mar 2019 10:15:13 +0000 (10:15 +0000)]
[msan] Fix BMI2 detection in msan tests, take 2.
It's not enough if only one bit is present, we need to check that
both are set. This finally fixes the test failures for me.
llvm-svn: 356242
Fangrui Song [Fri, 15 Mar 2019 09:40:03 +0000 (09:40 +0000)]
[COFF] Delete unused declarations and add a missing forward declaration. NFC
llvm-svn: 356241
Fangrui Song [Fri, 15 Mar 2019 07:16:39 +0000 (07:16 +0000)]
[ELF] Delete unused forward declarations and unused DynamicReloc::getInputSec(). NFC
llvm-svn: 356239