Sam Kolton [Thu, 22 Dec 2016 11:30:48 +0000 (11:30 +0000)]
[AMDGPU] Disassembler: fix for disassembling v_mac_f32/16_dpp/sdwa
Summary: Real instruction should copy constraints from the pseudo instruction. This allows the auto-generated disassembler to correctly process tied operands.
Reviewers: nhaustov, vpykhtin, tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27847
llvm-svn: 290336
George Rimar [Thu, 22 Dec 2016 11:05:05 +0000 (11:05 +0000)]
[ELF] - Use error() instead of fatal() during relaxation of R_X86_64_GOTTPOFF
This is the last known noticeable fatal() in Target.cpp.
We also have other ones for unknown relocations or
creating unknown targets, but that one can be just an error, I think.
Used yaml2obj to generate the test.
Differential revision: https://reviews.llvm.org/D28049
llvm-svn: 290335
Rui Ueyama [Thu, 22 Dec 2016 09:54:32 +0000 (09:54 +0000)]
Do not return null or Undefined from find{All,}ByVersion.
Vectors returned from that function contained nullptrs or Undefined symbols.
This patch filters them out. This makes use of the function a bit easier.
llvm-svn: 290334
Ayman Musa [Thu, 22 Dec 2016 08:42:46 +0000 (08:42 +0000)]
[X86][AVX2] Passing the appropriate memory operand class to VPMADDWD instruction.
Replacing the memory operand in the ymm version of VPMADDWD from i128mem to i256mem.
Differential Revision: https://reviews.llvm.org/D28024
llvm-svn: 290333
Rui Ueyama [Thu, 22 Dec 2016 08:20:28 +0000 (08:20 +0000)]
Make -color-diagnostics an alias to -color-diagnostics=always.
Previously, that was an alias to -color-diagnostics=auto. However,
Clang's -fcolor-diagnostics is an alias to -fcolor-diagnostics=always,
so that was confusing. This patch fixes that issue.
llvm-svn: 290332
Chandler Carruth [Thu, 22 Dec 2016 07:53:20 +0000 (07:53 +0000)]
[PM] Loosen the check ever so slightly -- MSVC appears to not include
a space after the comma in template arguments with our hacky type name
system.
llvm-svn: 290331
Diana Picus [Thu, 22 Dec 2016 07:35:56 +0000 (07:35 +0000)]
[XRay] [compiler-rt] Move machine-dependent code into machine-dependent files
Reapply r290077.
Authors: pelikan
Differential Revision: https://reviews.llvm.org/D27979
llvm-svn: 290330
Richard Smith [Thu, 22 Dec 2016 07:24:39 +0000 (07:24 +0000)]
Speculative revert of r290310 to see if that's the change that's making some of
the bots unhappy.
llvm-svn: 290329
Chandler Carruth [Thu, 22 Dec 2016 07:14:35 +0000 (07:14 +0000)]
[PM] Make a couple of CHECK lines a bit more precise, NFC.
I was staring at these and didn't realize these were module-layer
proxies as opposed to some other layer. Justin and I have a plan to
rename things to make the names themselves much easier to reason about,
but I at least want the CHECK lines to be precise for now.
llvm-svn: 290328
Chandler Carruth [Thu, 22 Dec 2016 07:14:33 +0000 (07:14 +0000)]
[PM] Remove now-dead extern template and explicit instantiation
declarations.
We're using a custom class here instead of the helper template; these
bits just didn't get deleted when the other bits did get deleted. This
was found by a really nice MSVC warning about explicitly instantiating
a template where some member functions aren't defined and thus can't be
instantiated.
llvm-svn: 290327
Bruno Cardoso Lopes [Thu, 22 Dec 2016 07:06:03 +0000 (07:06 +0000)]
[CrashReproducer] Add support for merging -ivfsoverlay
Merge all VFS mapped files inside -ivfsoverlay inputs into the vfs
overlay provided by the crash reproducer. This is the last missing piece
to allow crash reproducers to fully work with user frameworks; when
combined with headermaps, it allows clang to find additional frameworks.
rdar://problem/27913709
llvm-svn: 290326
Chandler Carruth [Thu, 22 Dec 2016 06:59:15 +0000 (06:59 +0000)]
[PM] Introduce a reasonable port of the main per-module pass pipeline
from the old pass manager in the new one.
I'm not trying to support (initially) the numerous options that are
currently available to customize the pass pipeline. If we end up really
wanting them, we can add them later, but I suspect many are no longer
interesting. The simplicity of omitting them will help a lot as we sort
out what the pipeline should look like in the new PM.
I've also documented to the best of my ability *why* each pass or group
of passes is used so that reading the pipeline is more helpful. In many
cases I think we have some questionable choices of ordering and I've
left FIXME comments in place so we know what to come back and revisit
going forward. But for now, I've left it as similar to the current
pipeline as I could.
Lastly, I've had to comment out several places where passes are not
ported to the new pass manager or where the loop pass infrastructure is
not yet ready. I did at least fix a few bugs in the loop pass
infrastructure uncovered by running the full pipeline, but I didn't want
to go too far in this patch -- I'll come back and re-enable these as the
infrastructure comes online. But I'd like to keep the comments in place
because I don't want to lose track of which passes need to be enabled
and where they go.
One thing that seemed like a significant API improvement was to require
that we don't build pipelines for O0. It seems to have no real benefit.
I've also switched back to returning pass managers by value as at this
API layer it feels much more natural to me for composition. But if
others disagree, I'm happy to go back to an output parameter.
I'm not 100% happy with the testing strategy currently, but it seems at
least OK. I may come back and try to refactor or otherwise improve this
in subsequent patches but I wanted to at least get a good starting point
in place.
Differential Revision: https://reviews.llvm.org/D28042
llvm-svn: 290325
Adrian Prantl [Thu, 22 Dec 2016 06:10:41 +0000 (06:10 +0000)]
Fix an assertion in DwarfExpression when emitting fragments in vector registers
When DwarfExpression is emitting a fragment that is located in a
register and that fragment is smaller than the register, and the
register must be composed from sub-registers (are you still with me?)
the last DW_OP_piece operation must not be larger than the size of the
fragment itself, since the last piece of the fragment could be smaller
than the last subregister that is being emitted.
rdar://problem/29779065
llvm-svn: 290324
Rui Ueyama [Thu, 22 Dec 2016 05:31:52 +0000 (05:31 +0000)]
Define a getter function for a lazily-created object.
Previously, you had to call initDemangledSyms() before accessing DemangledSyms.
Now getDemangledSyms() initializes it and then returns it, so it is now harder
to use it incorrectly.
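The getter pattern described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `SymbolIndex` class with placeholder contents; it is not lld's actual code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical class; the names are illustrative and not lld's own.
class SymbolIndex {
  std::map<std::string, int> Demangled; // filled on first access
  bool Initialized = false;
  int BuildCount = 0;

  void build() {
    Demangled["foo()"] = 1; // placeholder for the expensive computation
    ++BuildCount;
  }

public:
  // The only way to reach the map, so it cannot be read before it is built.
  const std::map<std::string, int> &getDemangled() {
    if (!Initialized) {
      build();
      Initialized = true;
    }
    assert(Initialized);
    return Demangled;
  }

  int buildCount() const { return BuildCount; }
};
```

Because callers never see the uninitialized member, there is no separate init call to forget.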
llvm-svn: 290323
Adrian Prantl [Thu, 22 Dec 2016 05:27:12 +0000 (05:27 +0000)]
Refactor the DIExpression fragment query interface (NFC)
... so it becomes available to DIExpressionCursor.
llvm-svn: 290322
Rui Ueyama [Thu, 22 Dec 2016 05:22:29 +0000 (05:22 +0000)]
Simplify. NFC.
llvm-svn: 290321
Rui Ueyama [Thu, 22 Dec 2016 05:11:12 +0000 (05:11 +0000)]
Define a function to avoid a magic number 0x3.
llvm-svn: 290320
Antonio Maiorano [Thu, 22 Dec 2016 05:10:07 +0000 (05:10 +0000)]
Make FormatStyle.GetStyleOfFile test work on MSVC
Modify getStyle to use vfs::FileSystem::makeAbsolute just like FS.addFile does,
rather than sys::fs::make_absolute. The latter gets the CWD from the platform,
while the former expects it to be set by the client, causing a mismatch when
converting relative paths to absolute.
Differential Revision: https://reviews.llvm.org/D27971
llvm-svn: 290319
Rui Ueyama [Thu, 22 Dec 2016 04:40:56 +0000 (04:40 +0000)]
Remove a typedef that is used only once.
llvm-svn: 290318
Matt Arsenault [Thu, 22 Dec 2016 04:39:45 +0000 (04:39 +0000)]
DAG: Add helper for testing constant values
There are helpers for testing for constant or constant build_vector,
and for splat ConstantFP vectors, but not for a constantfp or
non-splat ConstantFP vector.
llvm-svn: 290317
Matt Arsenault [Thu, 22 Dec 2016 04:39:41 +0000 (04:39 +0000)]
AMDGPU: Fix missing commute table entries for cmpx
No tests because these aren't currently used anywhere.
llvm-svn: 290316
Saleem Abdulrasool [Thu, 22 Dec 2016 04:26:57 +0000 (04:26 +0000)]
Sema: print qualified name for overload candidates
Print the fully qualified names for the overload candidates. This makes
it easier to tell what the ambiguity is. Especially if a template
is instantiated after a using namespace, it will not inherit the
namespace where it was declared. The specialization will give a message
about a partial order being ambiguous for the same (unqualified) name,
which does not help identify the failure.
Addresses PR31450!
llvm-svn: 290315
Mehdi Amini [Thu, 22 Dec 2016 04:09:29 +0000 (04:09 +0000)]
[ThinLTO] Save 8B per summary entry by rearranging the fields (NFC)
Size goes from 72B to 64B per entry.
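A minimal sketch of why reordering fields can shrink a struct, assuming two hypothetical layouts (these are not the actual ThinLTO summary types). Small fields placed between 8-byte fields each tend to pull in their own padding; grouping them lets them share one padded slot.

```cpp
#include <cstdint>

// Hypothetical layouts for illustration; not the actual ThinLTO summary types.
struct Interleaved {
  uint8_t Kind;    // typically followed by padding before the next field
  uint64_t GUID;
  uint8_t Flags;   // typically followed by padding again
  uint64_t Offset;
};

struct Grouped {
  uint64_t GUID;
  uint64_t Offset;
  uint8_t Kind;    // the two small fields now sit next to each other
  uint8_t Flags;
};

// Grouping the small fields never makes the struct larger.
static_assert(sizeof(Grouped) <= sizeof(Interleaved),
              "grouping small fields must not grow the struct");
```

On a typical 64-bit ABI the first layout occupies 32 bytes and the second 24, the same kind of 8-byte saving the commit reports.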
Differential Revision: https://reviews.llvm.org/D27970
llvm-svn: 290314
Matt Arsenault [Thu, 22 Dec 2016 04:03:40 +0000 (04:03 +0000)]
AMDGPU: Swap order of operands in fadd/fsub combine
FMA is canonicalized to constant in the middle operand. Do
the same so fmad matches and avoid an extra combine step.
llvm-svn: 290313
Matt Arsenault [Thu, 22 Dec 2016 04:03:35 +0000 (04:03 +0000)]
AMDGPU: Check fast math flags in fadd/fsub combines
llvm-svn: 290312
Matt Arsenault [Thu, 22 Dec 2016 03:55:35 +0000 (03:55 +0000)]
AMDGPU: Form more FMAs if fusion is allowed
Extend the existing fadd/fsub->fmad combines to produce
FMA if allowed.
llvm-svn: 290311
Richard Smith [Thu, 22 Dec 2016 03:52:37 +0000 (03:52 +0000)]
Only substitute into type of non-type template parameter once, rather than
twice, in finalization of template argument deduction.
llvm-svn: 290310
Matt Arsenault [Thu, 22 Dec 2016 03:44:42 +0000 (03:44 +0000)]
AMDGPU: Move combines into separate functions
llvm-svn: 290309
Matt Arsenault [Thu, 22 Dec 2016 03:40:39 +0000 (03:40 +0000)]
AMDGPU: Enable some f32 fadd/fsub combines for f16
llvm-svn: 290308
Matt Arsenault [Thu, 22 Dec 2016 03:21:48 +0000 (03:21 +0000)]
AMDGPU: Implement isFMAFasterThanFMulAndFAdd for f16
llvm-svn: 290307
Matt Arsenault [Thu, 22 Dec 2016 03:21:45 +0000 (03:21 +0000)]
AMDGPU: setcc test cleanup
llvm-svn: 290306
Saleem Abdulrasool [Thu, 22 Dec 2016 03:09:04 +0000 (03:09 +0000)]
Driver: use the triple to query the arch, not the toolchain
Although the result is the same, the intent is much more clear this way:
we care about the architecture we are targeting. NFC.
llvm-svn: 290305
Saleem Abdulrasool [Thu, 22 Dec 2016 03:09:02 +0000 (03:09 +0000)]
Driver: remove unnecessary parameter
We can query the Triple and EffectiveTriple from the ToolChain. Avoid
passing in the argument and query it in the function. NFC.
llvm-svn: 290304
Saleem Abdulrasool [Thu, 22 Dec 2016 03:09:00 +0000 (03:09 +0000)]
Driver: rename parameter to reduce confusion
The parameter to ParsePICOpts passed the effective triple and then used
that in a few places and used the actual triple in others. This was
slightly confusing. Rename the parameter to make it more obvious.
llvm-svn: 290303
Matt Arsenault [Thu, 22 Dec 2016 03:05:44 +0000 (03:05 +0000)]
AMDGPU: Allow rcp and rsq usage with f16
llvm-svn: 290302
Matt Arsenault [Thu, 22 Dec 2016 03:05:41 +0000 (03:05 +0000)]
AMDGPU: Custom lower f16 fdiv
llvm-svn: 290301
Matt Arsenault [Thu, 22 Dec 2016 03:05:37 +0000 (03:05 +0000)]
AMDGPU: Implement f16 fcanonicalize
llvm-svn: 290300
Matt Arsenault [Thu, 22 Dec 2016 03:05:30 +0000 (03:05 +0000)]
AMDGPU: Update isFPImmLegal for f16
I don't think this matters because ConstantFP is legal.
llvm-svn: 290299
Peter Collingbourne [Thu, 22 Dec 2016 02:52:23 +0000 (02:52 +0000)]
Clear the PendingTypeTests vector after moving from it.
This is to put the vector into a well defined state. Apparently the state of a
vector after being moved from is valid but unspecified. Found with clang-tidy.
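A minimal sketch of the idiom, assuming a hypothetical `Pending` struct rather than the real class that owns PendingTypeTests. The explicit clear() makes the emptiness a guarantee instead of an implementation detail.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an object that owns a pending-work list.
struct Pending {
  std::vector<std::string> Tests;

  // Hand the accumulated entries to the caller and leave Tests empty.
  std::vector<std::string> take() {
    std::vector<std::string> Out = std::move(Tests);
    Tests.clear(); // moved-from state is valid but unspecified; make it empty
    assert(Tests.empty());
    return Out;
  }
};
```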
llvm-svn: 290298
George Burgess IV [Thu, 22 Dec 2016 02:50:20 +0000 (02:50 +0000)]
Add the alloc_size attribute to clang, attempt 2.
This is a recommit of r290149, which was reverted in r290169 due to msan
failures. msan was failing because we were calling
`isMostDerivedAnUnsizedArray` on an invalid designator, which caused us
to read uninitialized memory. To fix this, the logic of the caller of
said function was simplified, and we now have a `!Invalid` assert in
`isMostDerivedAnUnsizedArray`, so we can catch this particular bug more
easily in the future.
Fingers crossed that this patch sticks this time. :)
Original commit message:
This patch does three things:
- Gives us the alloc_size attribute in clang, which lets us infer the
number of bytes handed back to us by malloc/realloc/calloc/any user
functions that act in a similar manner.
- Teaches our constexpr evaluator that evaluating some `const` variables
is OK sometimes. This is why we have a change in
test/SemaCXX/constant-expression-cxx11.cpp and other seemingly
unrelated tests. Richard Smith okay'ed this idea some time ago in
person.
- Uniques some Blocks in CodeGen, which was reviewed separately at
D26410. Lack of uniquing only really shows up as a problem when
combined with our new eagerness in the face of const.
llvm-svn: 290297
Haicheng Wu [Thu, 22 Dec 2016 01:39:24 +0000 (01:39 +0000)]
[AArch64] Correct the check of signed 9-bit imm in getIndexedAddressParts().
-256 is a legal indexed address part.
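For reference, a signed 9-bit field covers the range [-256, 255], so the lower bound is inclusive. A minimal sketch, assuming a hypothetical helper `isSigned9BitImm` (not the code in getIndexedAddressParts()):

```cpp
#include <cstdint>

// Hypothetical helper, not the LLVM function: true if Imm fits in a
// signed 9-bit field, i.e. lies in [-2^8, 2^8 - 1] = [-256, 255].
constexpr bool isSigned9BitImm(int64_t Imm) {
  return Imm >= -256 && Imm <= 255;
}

static_assert(isSigned9BitImm(-256), "lower bound is inclusive");
static_assert(!isSigned9BitImm(256), "upper bound is 255");
```

A check written with symmetric bounds would reject -256, which is the kind of off-by-one this commit corrects.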
Differential Revision: https://reviews.llvm.org/D27537
llvm-svn: 290296
Easwaran Raman [Thu, 22 Dec 2016 01:07:01 +0000 (01:07 +0000)]
Pass GetAssumptionCache to InlineFunctionInfo constructor
Differential revision: https://reviews.llvm.org/D28038
llvm-svn: 290295
David Majnemer [Thu, 22 Dec 2016 00:51:59 +0000 (00:51 +0000)]
[NVVMIntrRange] Only set range metadata if none is already present
The range metadata inserted by NVVMIntrRange is pessimistic; range
metadata already present could be more precise.
llvm-svn: 290294
Adrian Prantl [Thu, 22 Dec 2016 00:45:21 +0000 (00:45 +0000)]
Renumber testcase metadata nodes after r290153.
This patch renumbers the metadata nodes in debug info testcases after
https://reviews.llvm.org/D26769. This is a separate patch because it
causes so much churn. This was implemented with a python script that
pipes the testcases through llvm-as - | llvm-dis - and then goes
through the original and new output side by side to insert all
comments at a close-enough location.
Differential Revision: https://reviews.llvm.org/D27765
llvm-svn: 290292
Adrian Prantl [Thu, 22 Dec 2016 00:29:00 +0000 (00:29 +0000)]
[LLParser] Make the line field of DIMacro(File) optional.
Otherwise these records do not survive roundtrips.
llvm-svn: 290291
Alexander Kornienko [Wed, 21 Dec 2016 23:44:23 +0000 (23:44 +0000)]
[clang-tidy] Ignore `size() == 0` in the container implementation.
llvm-svn: 290289
Adrian Prantl [Wed, 21 Dec 2016 23:38:17 +0000 (23:38 +0000)]
Legalize metadata in legacy testcases
llvm-svn: 290288
Adrian Prantl [Wed, 21 Dec 2016 23:36:06 +0000 (23:36 +0000)]
Legalize metadata in legacy testcases
llvm-svn: 290287
Adrian Prantl [Wed, 21 Dec 2016 23:30:35 +0000 (23:30 +0000)]
Legalize metadata in legacy testcases
llvm-svn: 290286
Adrian Prantl [Wed, 21 Dec 2016 23:28:49 +0000 (23:28 +0000)]
Legalize metadata in legacy testcases
llvm-svn: 290285
Ahmed Bougacha [Wed, 21 Dec 2016 23:26:20 +0000 (23:26 +0000)]
[GlobalISel] Add basic Selector-emitter tblgen backend.
This adds a basic tablegen backend that analyzes the SelectionDAG
patterns to find simple ones that are eligible for GlobalISel-emission.
That's similar to FastISel, with one notable difference: we're not fed
ISD opcodes, so we need to map the SDNode operators to generic opcodes.
That's done using GINodeEquiv in TargetGlobalISel.td.
Otherwise, this is mostly boilerplate, and lots of filtering of any kind
of "complicated" pattern. On AArch64, this is sufficient to match G_ADD
up to s64 (to ADDWrr/ADDXrr) and G_BR (to B).
Differential Revision: https://reviews.llvm.org/D26878
llvm-svn: 290284
Ahmed Bougacha [Wed, 21 Dec 2016 23:26:13 +0000 (23:26 +0000)]
[AsmWriter] Remove redundant cast<>s. NFC.
llvm-svn: 290283
Sean Callanan [Wed, 21 Dec 2016 23:21:11 +0000 (23:21 +0000)]
specify -DNDEBUG for BNI builds of all targets in the Xcode build
llvm-svn: 290282
Dan Gohman [Wed, 21 Dec 2016 23:09:42 +0000 (23:09 +0000)]
[WebAssembly] Fix the opcode value for i64.rotr.
llvm-svn: 290281
Peter Collingbourne [Wed, 21 Dec 2016 23:03:45 +0000 (23:03 +0000)]
IR: Function summary representation for type tests.
Each function summary has an attached list of type identifier GUIDs. The
idea is that during the regular LTO phase we would match these GUIDs to type
identifiers defined by the regular LTO module and store the resolutions in
a top-level "type identifier summary" (which will be implemented separately).
Differential Revision: https://reviews.llvm.org/D27967
llvm-svn: 290280
Evgeniy Stepanov [Wed, 21 Dec 2016 22:50:08 +0000 (22:50 +0000)]
Increase the threshold in unit test to accommodate the quarantine size increase.
Reviewers: eugenis
Patch by Alex Shlyapnikov.
Subscribers: llvm-commits, kubabrecka
Differential Revision: https://reviews.llvm.org/D28029
llvm-svn: 290279
Mike Aizatsky [Wed, 21 Dec 2016 22:10:01 +0000 (22:10 +0000)]
[sancov] skip duplicated points
llvm-svn: 290278
Mike Aizatsky [Wed, 21 Dec 2016 22:09:57 +0000 (22:09 +0000)]
[sancov] hash prefix results in huge merge files, use shorter prefix
llvm-svn: 290277
Richard Smith [Wed, 21 Dec 2016 21:42:57 +0000 (21:42 +0000)]
Perform type-checking for a converted constant expression in a template
argument even if the expression is value-dependent (we need to suppress the
final portion of the narrowing check, but the rest of the checking can still be
done eagerly).
This affects template template argument validity and partial ordering under
p0522r0.
llvm-svn: 290276
Haicheng Wu [Wed, 21 Dec 2016 21:40:47 +0000 (21:40 +0000)]
[AArch64] Remove a redundant check. NFC.
The case AM.Scale == 0 is already handled by the code right above.
Differential Revision: https://reviews.llvm.org/D28003
llvm-svn: 290275
Greg Clayton [Wed, 21 Dec 2016 21:37:06 +0000 (21:37 +0000)]
Add the ability for DWARFDie objects to get the parent DWARFDie.
In order for the llvm DWARF parser to be used in LLDB we will need to be able to get the parent of a DIE. This patch adds that functionality by changing the DWARFDebugInfoEntry class to store a depth field instead of a sibling index. Using a depth field allows us to easily calculate the sibling and the parent without increasing the size of DWARFDebugInfoEntry.
I tested llvm-dsymutil on a debug version of clang where this fully parses DWARF in over 1200 .o files to verify there was no serious regression in performance.
Added a full suite of unit tests to test this functionality.
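The depth-field idea can be sketched on a simplified model: a flat, pre-order array in which each entry stores only its depth. The `Entry` struct and the helpers `parentOf` and `siblingOf` are hypothetical and ignore details of real DWARF such as null terminator entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flat DIE array in pre-order; each entry stores only its depth.
struct Entry {
  uint32_t Depth;
};

constexpr std::size_t NoEntry = static_cast<std::size_t>(-1);

// Parent: the nearest preceding entry whose depth is exactly one less.
std::size_t parentOf(const std::vector<Entry> &Dies, std::size_t I) {
  assert(I < Dies.size());
  if (Dies[I].Depth == 0)
    return NoEntry;
  for (std::size_t J = I; J-- > 0;)
    if (Dies[J].Depth + 1 == Dies[I].Depth)
      return J;
  return NoEntry;
}

// Sibling: the next entry at the same depth, unless a shallower entry
// (the end of the parent's children) comes first.
std::size_t siblingOf(const std::vector<Entry> &Dies, std::size_t I) {
  assert(I < Dies.size());
  for (std::size_t J = I + 1; J < Dies.size(); ++J) {
    if (Dies[J].Depth == Dies[I].Depth)
      return J;
    if (Dies[J].Depth < Dies[I].Depth)
      break;
  }
  return NoEntry;
}
```

Both relations are recovered from the same field, so no extra per-entry storage is needed for the parent link.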
Differential Revision: https://reviews.llvm.org/D27995
llvm-svn: 290274
Chris Bieneman [Wed, 21 Dec 2016 21:23:27 +0000 (21:23 +0000)]
[CMake] Support distribution install for LLDB.framework
This patch adds the last bit of support to get LLVM_DISTRIBUTION_COMPONENTS working with libLLDB when built as a framework.
This patch adds dummy install targets for binaries built into the framework's Resources directory, and makes the framework's install target depend on all the binaries that get installed with the framework.
llvm-svn: 290273
Andrey Churbanov [Wed, 21 Dec 2016 21:20:20 +0000 (21:20 +0000)]
Fix for the __kmpc_global_num_threads function to return the value of the __kmp_all_nth global var.
Patch by Yonghong Yan.
Differential Revision: https://reviews.llvm.org/D27975
llvm-svn: 290272
Justin Bogner [Wed, 21 Dec 2016 21:19:00 +0000 (21:19 +0000)]
cmake: Don't build llvm-config and tblgen concurrently in cross builds
This sets USES_TERMINAL for the native llvm-config build, so that it
doesn't run at the same time as builds of other native tools (namely,
tablegen). Without this, if you're very unlucky with the timing it's
possible to be relinking libSupport as one of the tools is linking,
causing a spurious failure.
The tablegen build adopted USES_TERMINAL for this same reason in
r280748.
llvm-svn: 290271
Ed Maste [Wed, 21 Dec 2016 20:51:42 +0000 (20:51 +0000)]
Update mailing list post URL and add libunwind reference
RTDyldMemoryManager.cpp describes the differing __register_frame
API between libunwind and libgcc, with a mailing list posting URL.
The original link was 404; replace it with what I believe is the
intended post, as well as a reference to the "OS X" implementation in
libunwind.
Differential Revision: https://reviews.llvm.org/D27965
llvm-svn: 290269
Tim Northover [Wed, 21 Dec 2016 20:49:43 +0000 (20:49 +0000)]
ARM: define a macro for the FPv5 FPU in ARM mode.
FPv5 is in Cortex-M7 and the 64-bit CPUs when running in 32-bit mode. The name
is from the Cortex-M7 TRM.
llvm-svn: 290268
Simon Pilgrim [Wed, 21 Dec 2016 20:00:10 +0000 (20:00 +0000)]
[X86][SSE] Improve lowering of vXi64 multiplies
As mentioned on PR30845, we were performing our vXi64 multiplication as:
AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi, 32) + psllqi(AhiBlo, 32);
when we could avoid one of the upper shifts with:
AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi + AhiBlo, 32);
This matches the lowering on gcc/icc.
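The equivalence of the two forms can be checked on a scalar model of one 64-bit lane. This is a sketch under stated assumptions: `pmuludqLane`, `mul64Old`, `mul64New` and `checkLane` are hypothetical names, and pmuludq is modeled as a 32x32->64 multiply of the low halves.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one 64-bit lane of pmuludq: multiply the low 32 bits
// of each operand and keep the full 64-bit product.
static uint64_t pmuludqLane(uint64_t A, uint64_t B) {
  return (A & 0xffffffffu) * (B & 0xffffffffu);
}

// Old lowering: each cross term is shifted separately.
uint64_t mul64Old(uint64_t A, uint64_t B) {
  uint64_t AloBlo = pmuludqLane(A, B);
  uint64_t AloBhi = pmuludqLane(A, B >> 32);
  uint64_t AhiBlo = pmuludqLane(A >> 32, B);
  return AloBlo + (AloBhi << 32) + (AhiBlo << 32);
}

// New lowering: add the cross terms first, then shift once.
uint64_t mul64New(uint64_t A, uint64_t B) {
  uint64_t AloBlo = pmuludqLane(A, B);
  uint64_t AloBhi = pmuludqLane(A, B >> 32);
  uint64_t AhiBlo = pmuludqLane(A >> 32, B);
  return AloBlo + ((AloBhi + AhiBlo) << 32);
}

// Both forms equal the wrapping 64-bit product.
void checkLane(uint64_t A, uint64_t B) {
  assert(mul64Old(A, B) == A * B);
  assert(mul64New(A, B) == A * B);
}
```

The single shift is valid because a left shift distributes over addition modulo 2^64.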
Differential Revision: https://reviews.llvm.org/D27756
llvm-svn: 290267
David Majnemer [Wed, 21 Dec 2016 19:21:59 +0000 (19:21 +0000)]
Revert "[InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmp"
This reverts commit r289813, it caused PR31449.
llvm-svn: 290266
Tom Stellard [Wed, 21 Dec 2016 19:06:24 +0000 (19:06 +0000)]
AMDGPU/SI: Fix file header
llvm-svn: 290265
Peter Collingbourne [Wed, 21 Dec 2016 19:00:47 +0000 (19:00 +0000)]
TypeMetadataUtils: Simplify; spotted by Mehdi.
llvm-svn: 290264
Zachary Turner [Wed, 21 Dec 2016 18:50:52 +0000 (18:50 +0000)]
Add missing includes on Windows.
Patch by Andrey Khalyavin
Differential Revision: https://reviews.llvm.org/D27915
llvm-svn: 290263
Paul Robinson [Wed, 21 Dec 2016 18:33:17 +0000 (18:33 +0000)]
Make some diagnostic tests C++11 clean.
Differential Revision: http://reviews.llvm.org/D27794
llvm-svn: 290262
Michael Kuperstein [Wed, 21 Dec 2016 18:29:47 +0000 (18:29 +0000)]
[LLParser] Parse vector GEP constant expression correctly
The constantexpr parsing was too constrained and rejected legal vector GEPs.
This relaxes it to be similar to the ones for instruction parsing.
This fixes PR30816.
Differential Revision: https://reviews.llvm.org/D28013
llvm-svn: 290261
Michael Kuperstein [Wed, 21 Dec 2016 17:34:21 +0000 (17:34 +0000)]
[ConstantFolding] Fix vector GEPs harder
For vector GEPs, CastGEPIndices can end up in an infinite recursion, because
we compare the vector type to the scalar pointer type, find them different,
and then try to cast a type to itself.
Differential Revision: https://reviews.llvm.org/D28009
llvm-svn: 290260
Daniel Jasper [Wed, 21 Dec 2016 17:02:06 +0000 (17:02 +0000)]
clang-format: Fix bug in handling of single-column lists.
Members that are themselves wrapped in fake parentheses would lead to
AvoidBinPacking being set on the wrong ParenState.
After:
vector<int> aaaa = {
aaaaaa.
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,
aaaaaa.
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,
aaaaaa.aaaaaaa,
aaaaaa.aaaaaaa,
aaaaaa.aaaaaaa,
aaaaaa.aaaaaaa,
};
Before we were falling back to bin-packing these.
llvm-svn: 290259
Simon Pilgrim [Wed, 21 Dec 2016 16:39:09 +0000 (16:39 +0000)]
Wdocumentation fix
llvm-svn: 290258
Simon Pilgrim [Wed, 21 Dec 2016 15:49:01 +0000 (15:49 +0000)]
[CostModel] Pass shuffle mask args with ArrayRef. NFCI.
llvm-svn: 290257
Roman Gareev [Wed, 21 Dec 2016 12:51:12 +0000 (12:51 +0000)]
Change the determination of parameters of macro-kernel
Typically processor architectures do not include an L3 cache, which means that
Nc, the parameter of the macro-kernel, is, for all practical purposes,
redundant ([1]). However, its small values can cause the redundant packing of
the same elements of the matrix A, the first operand of the matrix
multiplication. At the same time, big values of the parameter Nc can cause
segmentation faults in case the available stack is exceeded.
This patch adds an option to specify the parameter Nc as a multiple of
the parameter of the micro-kernel Nr.
In case of Intel Core i7-3820 SandyBridge and the following options,
clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=8
it helps to improve the performance from 11.303 GFlops/sec (39.247% of
theoretical peak) to 17.896 GFlops/sec (62.14% of theoretical peak).
Refs.:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D28019
llvm-svn: 290256
Michael Zuckerman [Wed, 21 Dec 2016 12:48:01 +0000 (12:48 +0000)]
Revert first commit, removing the empty line in X86.h
llvm-svn: 290255
Michael Zuckerman [Wed, 21 Dec 2016 12:44:47 +0000 (12:44 +0000)]
First commit adding new line to X86.h
llvm-svn: 290254
Roman Gareev [Wed, 21 Dec 2016 12:37:36 +0000 (12:37 +0000)]
Align newly created arrays to the first level cache line boundary
Aligning data to cache lines boundaries helps to avoid overheads related to
an access to it ([1]). This patch aligns newly created arrays and adds an
option to specify the first level cache line size. By default we use 64 bytes,
which is a typical cache-line size ([2]).
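As a source-level analogy only (Polly applies the alignment to arrays it creates in generated code, not through C++ declarations), a cache-line-aligned buffer can be sketched like this. `CacheLineSize`, `PackedBlock`, `isCacheLineAligned` and `scratchBlock` are hypothetical names, and 64 bytes is the assumed line size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t CacheLineSize = 64; // assumed first-level line size

// Hypothetical packed-operand buffer that starts on a cache-line boundary.
struct alignas(CacheLineSize) PackedBlock {
  double Data[32];
};

static_assert(alignof(PackedBlock) == CacheLineSize, "type carries the alignment");

// True if the address is a multiple of the cache-line size.
bool isCacheLineAligned(const void *P) {
  return reinterpret_cast<std::uintptr_t>(P) % CacheLineSize == 0;
}

// Returns a statically allocated block and checks its alignment.
PackedBlock &scratchBlock() {
  static PackedBlock Block;
  assert(isCacheLineAligned(&Block));
  return Block;
}
```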
In case of Intel Core i7-3820 SandyBridge and the following options,
clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=8
it helps to improve the performance from 11.303 GFlops/sec (39.247% of
theoretical peak) to 12.63 GFlops/sec (43.8542% of theoretical peak).
Refs.:
[1] - http://www.alexonlinux.com/aligned-vs-unaligned-memory-access
[2] - http://igoro.com/archive/gallery-of-processor-cache-effects/
Differential Revision: https://reviews.llvm.org/D28020
Reviewed-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 290253
Davide Italiano [Wed, 21 Dec 2016 12:22:19 +0000 (12:22 +0000)]
[ELF/tests] Use cpio -it instead of cpio -t.
OpenBSD's cpio does not accept the -t option without -i.
Apparently some systems implement cpio -t as a shortcut
for cpio -it, but the latter is the only form that's documented.
This change avoids test failures on OpenBSD.
Patch by Mark Kettenis!
Differential Revision: https://reviews.llvm.org/D28002
llvm-svn: 290252
Roman Gareev [Wed, 21 Dec 2016 11:18:42 +0000 (11:18 +0000)]
[Polly] Use three-dimensional arrays to store packed operands of the matrix
multiplication
Previously we had two-dimensional accesses to store packed operands of
the matrix multiplication for the sake of simplicity of the packed arrays.
However, addition of the third dimension helps to simplify the corresponding
memory access, reduce the execution time of isl operations applied to it, and
consequently reduce the compile-time of Polly. For example, in case of
Intel Core i7-3820 SandyBridge and the following options,
clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=7
it helps to reduce the compile-time from about 361.456 seconds to about 0.816
seconds.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>,
Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D27878
llvm-svn: 290251
Elena Demikhovsky [Wed, 21 Dec 2016 10:43:36 +0000 (10:43 +0000)]
Added a template for building target specific memory node in DAG.
I added an API for creating a target-specific memory node in the DAG. Today, all memory nodes are common to all targets and their constructors are located in SelectionDAG.cpp.
There are some cases in X86 where we need to create a special node: truncation-with-saturation store, float-to-half store.
In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for the truncation-with-saturation pattern.
Differential Revision: https://reviews.llvm.org/D27899
llvm-svn: 290250
Davide Italiano [Wed, 21 Dec 2016 10:19:00 +0000 (10:19 +0000)]
[AMDGPU] Garbage collect dead code. NFCI.
llvm-svn: 290249
Oren Ben Simhon [Wed, 21 Dec 2016 09:47:31 +0000 (09:47 +0000)]
[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
Fixing a warning.
llvm-svn: 290248
George Rimar [Wed, 21 Dec 2016 09:42:25 +0000 (09:42 +0000)]
[ELF] - Linkerscript: Fall back to search paths when INCLUDE not found
From https://sourceware.org/binutils/docs/ld/File-Commands.html:
The file will be searched for in the current directory, and in any
directory specified with the -L option.
Patch done by Alexander Richardson.
Differential revision: https://reviews.llvm.org/D27831
llvm-svn: 290247
Oren Ben Simhon [Wed, 21 Dec 2016 09:18:37 +0000 (09:18 +0000)]
[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
Fixing failing test.
llvm-svn: 290246
Oren Ben Simhon [Wed, 21 Dec 2016 09:04:08 +0000 (09:04 +0000)]
Reverting last change.
llvm-svn: 290245
Oren Ben Simhon [Wed, 21 Dec 2016 08:59:42 +0000 (08:59 +0000)]
[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
Fixing build issues.
llvm-svn: 290244
George Rimar [Wed, 21 Dec 2016 08:58:36 +0000 (08:58 +0000)]
[ELF] - Removed trailing whitespaces. NFC.
llvm-svn: 290243
Oren Ben Simhon [Wed, 21 Dec 2016 08:58:19 +0000 (08:58 +0000)]
[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
Fixing build issues.
llvm-svn: 290242
Rui Ueyama [Wed, 21 Dec 2016 08:40:09 +0000 (08:40 +0000)]
De-template DefinedSynthetic.
DefinedSynthetic is not created for a real ELF object, so it doesn't
have to be a template function. It has a virtual st_value, which is
either 32 bit or 64 bit, but we can simply use 64 bit.
llvm-svn: 290241
Oren Ben Simhon [Wed, 21 Dec 2016 08:31:45 +0000 (08:31 +0000)]
[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible.
vectorcall uses more registers for arguments than fastcall or the default x64 calling convention use.
The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above.
The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly and this review attempts to fix it.
This submit also includes additional lit tests to better cover HVA corner cases.
Differential Revision: https://reviews.llvm.org/D27392
llvm-svn: 290240
George Rimar [Wed, 21 Dec 2016 08:21:34 +0000 (08:21 +0000)]
[ELF] - Do not call fatal() in Target.cpp, call error() instead.
We would probably want to avoid fatal() where we can in the context of librarification,
but for me the reason for this patch is to help D27900 land.
D27900 changes error reporting to something like
error: text1
note: text2
note: text3
where the notes provide additional information about the location. In that case
I can't just call fatal(), because the user would not see the notes that follow, which adds a complication to handle.
So it is good to switch fatal() to error() where possible.
This patch also adds a testcase with a broken relocation number.
Previously we did not have one; it checks that error() works fine in place of fatal().
Differential revision: https://reviews.llvm.org/D27973
llvm-svn: 290239
George Rimar [Wed, 21 Dec 2016 08:11:49 +0000 (08:11 +0000)]
[ELF] - Fix use of freed memory.
It was revealed by D27831.
If we have a linker script that includes another one that sets OUTPUT, for example:
RUN: echo "INCLUDE \"foo.script\"" > %t.script
RUN: echo "OUTPUT(\"%t.out\")" > %T/foo.script
then we do:
void ScriptParser::readInclude() {
...
std::unique_ptr<MemoryBuffer> &MB = *MBOrErr;
tokenize(MB->getMemBufferRef());
OwningMBs.push_back(std::move(MB));
}
void ScriptParser::readOutput() {
...
Config->OutputFile = unquote(Tok);
...
}
The problem is that OwningMBs are destroyed after the script parser does its job.
So all Toks are dead and Config->OutputFile points to destroyed data.
The patch suggests saving all included scripts using the string Saver.
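The saver idea can be sketched in isolation. This is a minimal model, assuming a hypothetical `SimpleSaver` class and `readOutputName` function; it is not lld's Saver or ScriptParser. Text is copied into storage that outlives the short-lived buffer, and the long-lived pointer refers to the copy.

```cpp
#include <cassert>
#include <cstring>
#include <deque>
#include <memory>
#include <string>

// Hypothetical stand-in for a string saver: copies text into storage that
// lives as long as the saver and returns a pointer into that copy.
class SimpleSaver {
  std::deque<std::string> Storage; // push_back never moves existing elements

public:
  const char *save(const std::string &S) {
    Storage.push_back(S);
    return Storage.back().c_str();
  }
};

// Reads a value out of a short-lived buffer and returns a pointer that
// stays valid after the buffer is gone.
const char *readOutputName(SimpleSaver &Saver) {
  std::unique_ptr<std::string> Buffer(new std::string("a.out"));
  const char *Saved = Saver.save(*Buffer);
  assert(Saved != Buffer->c_str()); // the saver made its own copy
  Buffer.reset();                   // the script buffer is destroyed here
  return Saved;                     // still points at the saver's copy
}
```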
Differential revision: https://reviews.llvm.org/D27987
llvm-svn: 290238
Simon Atanasyan [Wed, 21 Dec 2016 05:31:57 +0000 (05:31 +0000)]
[ELF][MIPS] Allow .MIPS.abiflags larger than one Elf_Mips_ABIFlags struct
Older versions of BFD generate libraries with .MIPS.abiflags that only
concatenate the individual .MIPS.abiflags sections instead of merging.
Patch by Alexander Richardson.
Differential revision: https://reviews.llvm.org/D27770
llvm-svn: 290237
David L. Jones [Wed, 21 Dec 2016 04:34:52 +0000 (04:34 +0000)]
Rename several methods on ASTRecordReader to follow LLVM style (lowerCamelCase).
Summary:
This follows up to r290217, and makes functions on ASTRecordReader consistent
and valid style.
Reviewers: rsmith
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D28008
llvm-svn: 290236
Adam Nemet [Wed, 21 Dec 2016 04:07:40 +0000 (04:07 +0000)]
[LDist] Match behavior between invoking via optimization pipeline or opt -loop-distribute
In r267672, where the loop distribution pragma was introduced, I tried
hard to keep the old behavior for opt: when opt is invoked
with -loop-distribute, it should distribute the loop (it's off by
default when run via the optimization pipeline).
As MichaelZ has discovered, this has the unintended consequence of
breaking a very common developer workflow for reproducing compilations
using opt: first you print the pass pipeline of clang
with -debug-pass=Arguments and then invoke opt with the returned
arguments.
clang -debug-pass will include -loop-distribute but the pass is invoked
with default=off so nothing happens unless the loop carries the pragma,
while through opt (default=on) we will try to distribute all loops.
This changes opt's default to off as well to match clang. The tests are
modified to explicitly enable the transformation.
llvm-svn: 290235
Sebastian Pop [Wed, 21 Dec 2016 03:37:39 +0000 (03:37 +0000)]
remove pretty-print test that requires debug
There is no need to test the pretty printer. Remove the bogus test to make the
build bots happy.
llvm-svn: 290234