Hans Wennborg [Wed, 30 Mar 2016 23:38:01 +0000 (23:38 +0000)]
[X86] Enable call frame optimization ("mov to push") not only for optsize (PR26325)
The size savings are significant, and from what I can tell, both ICC and GCC do this.
Differential Revision: http://reviews.llvm.org/D18573
llvm-svn: 264966
Justin Lebar [Wed, 30 Mar 2016 23:30:25 +0000 (23:30 +0000)]
[CUDA] Don't initialize the CUDA toolchain if we don't have any CUDA inputs.
Summary:
This prevents errors when you invoke clang with a flag that the NVPTX
toolchain doesn't support. For example, on x86-64,
clang -mthread-model single -x c++ /dev/null -o /dev/null
should output just one error about "invalid thread model 'single' in
'-mthread-model single' for this target"; x86-64 doesn't support
-mthread-model, but we shouldn't also instantiate a NVPTX target!
Reviewers: echristo
Subscribers: tra, sunfish, cfe-commits
Differential Revision: http://reviews.llvm.org/D18629
llvm-svn: 264965
Justin Lebar [Wed, 30 Mar 2016 23:30:21 +0000 (23:30 +0000)]
[CUDA] Make unattributed constexpr functions implicitly host+device.
With this patch, by default a constexpr function is implicitly host+device
unless:
a) it's a variadic function (variadic functions are not allowed on the
device side), or
b) it's preceded by a __device__ overload in a system header.
The restriction on overloading __host__ __device__ functions on the
basis of their CUDA attributes remains in place, but we use (b) to allow
us to define __device__ overloads for constexpr functions in cmath,
which would otherwise be __host__ __device__ and thus not overloadable.
You can disable this behavior with -fno-cuda-host-device-constexpr.
Reviewers: tra, rnk, rsmith
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18380
llvm-svn: 264964
Justin Lebar [Wed, 30 Mar 2016 23:30:14 +0000 (23:30 +0000)]
[CUDA] Add math forward declares to CUDA header wrapper.
Summary:
This is necessary for a future patch which will make all constexpr
functions implicitly host+device. cmath may declare constexpr
functions, but these we do *not* want to be host+device. The forward
declares added in this patch prevent this (because the rule will be,
constexpr functions become implicitly host+device unless they're
preceded by a decl with __device__).
Reviewers: tra
Subscribers: cfe-commits, rnk, rsmith
Differential Revision: http://reviews.llvm.org/D18539
llvm-svn: 264963
Pete Cooper [Wed, 30 Mar 2016 23:28:49 +0000 (23:28 +0000)]
Fix MachO test which is failing on a Windows bot.
This is breaking http://lab.llvm.org:8011/builders/lld-x86_64-win7/builds/31647/steps/test%20lld/logs/stdio.
The issue seems to be that it can't write to a file in /tmp, probably because that path doesn't
exist on Windows. This was failing after I added EXPECT_FALSE(ec) in r264961 for the error
handling migration.
llvm-svn: 264962
Pete Cooper [Wed, 30 Mar 2016 23:10:39 +0000 (23:10 +0000)]
Convert lld file writing to llvm::Error. NFC.
This converts the writeFile method, as well as some of the ones it calls
in the normalized binary file writer and yaml writer.
llvm-svn: 264961
Matt Arsenault [Wed, 30 Mar 2016 22:57:40 +0000 (22:57 +0000)]
AMDGPU: Add frexp_mant + frexp_exp builtins
llvm-svn: 264960
Matthias Braun [Wed, 30 Mar 2016 22:46:04 +0000 (22:46 +0000)]
CodeGen: Factor out code for tail call result compatibility check; NFC
llvm-svn: 264959
Matthias Braun [Wed, 30 Mar 2016 22:45:58 +0000 (22:45 +0000)]
Avoid unnecessary #include; NFC
llvm-svn: 264958
Enrico Granata [Wed, 30 Mar 2016 22:45:13 +0000 (22:45 +0000)]
Enhance the 'type X list' commands such that they actually alert the user if no formatters matching the constraints could be found
llvm-svn: 264957
Simon Atanasyan [Wed, 30 Mar 2016 22:43:14 +0000 (22:43 +0000)]
[ELF][MIPS] Revert r264761 and add test case to demonstrate the problem
If we make R_MIPS_LO16 a relative relocation, the linker:
- never creates R_MIPS_COPY relocation for it
- attempts to create R_MIPS_REL32 dynamic relocation if R_MIPS_LO16's
target is a preemptible symbol
Differential Revision: http://reviews.llvm.org/D18607
llvm-svn: 264956
Paul Robinson [Wed, 30 Mar 2016 22:41:38 +0000 (22:41 +0000)]
Update copyright year to 2016.
llvm-svn: 264955
Paul Robinson [Wed, 30 Mar 2016 22:41:06 +0000 (22:41 +0000)]
Update copyright year to 2016.
llvm-svn: 264954
Paul Robinson [Wed, 30 Mar 2016 22:40:59 +0000 (22:40 +0000)]
Update copyright year to 2016.
llvm-svn: 264953
Paul Robinson [Wed, 30 Mar 2016 22:40:47 +0000 (22:40 +0000)]
Update copyright year to 2016.
llvm-svn: 264952
Rui Ueyama [Wed, 30 Mar 2016 22:40:16 +0000 (22:40 +0000)]
Fix -Wpessimizing-move warnings.
llvm-svn: 264951
Paul Robinson [Wed, 30 Mar 2016 22:39:53 +0000 (22:39 +0000)]
Update copyright year to 2016.
llvm-svn: 264950
Paul Robinson [Wed, 30 Mar 2016 22:39:03 +0000 (22:39 +0000)]
Update copyright year to 2016.
llvm-svn: 264949
Paul Robinson [Wed, 30 Mar 2016 22:38:50 +0000 (22:38 +0000)]
Update copyright year to 2016.
llvm-svn: 264948
Paul Robinson [Wed, 30 Mar 2016 22:38:47 +0000 (22:38 +0000)]
Update copyright year to 2016.
llvm-svn: 264947
Paul Robinson [Wed, 30 Mar 2016 22:38:44 +0000 (22:38 +0000)]
Update copyright year to 2016.
llvm-svn: 264946
Pete Cooper [Wed, 30 Mar 2016 22:34:37 +0000 (22:34 +0000)]
Remove useless unreachable. Switch coverage already gives us this. NFC
llvm-svn: 264945
Matt Arsenault [Wed, 30 Mar 2016 22:28:52 +0000 (22:28 +0000)]
AMDGPU: Add frexp_exp intrinsic
llvm-svn: 264944
Matt Arsenault [Wed, 30 Mar 2016 22:28:26 +0000 (22:28 +0000)]
AMDGPU: Constant folding for frexp_mant
llvm-svn: 264943
Paul Robinson [Wed, 30 Mar 2016 22:25:04 +0000 (22:25 +0000)]
Docs: keep copyright years up-to-date.
llvm-svn: 264942
Paul Robinson [Wed, 30 Mar 2016 22:24:57 +0000 (22:24 +0000)]
Docs: keep copyright years up-to-date.
llvm-svn: 264941
Richard Trieu [Wed, 30 Mar 2016 22:23:00 +0000 (22:23 +0000)]
Fix Clang crash with template type diffing.
Fixes https://llvm.org/bugs/show_bug.cgi?id=27129 which is a crash involving type
aliases and template type diffing. Template arguments for type aliases and
template arguments for the underlying desugared type may not have one-to-one
relations, which could mess up the attempt to get more information from the
desugared type. For type aliases, ignore the iterator over the desugared type.
llvm-svn: 264940
Vassil Vassilev [Wed, 30 Mar 2016 22:22:50 +0000 (22:22 +0000)]
Add -emit-llvm-only to the regression test for PR21547.
llvm-svn: 264939
Ryan Govostes [Wed, 30 Mar 2016 22:21:58 +0000 (22:21 +0000)]
[asan] Mark the initialization-bug.cc unsupported on OS X Yosemite and older
This test should fail on OS X Yosemite and older, and pass on OS X El Capitan
and newer as well as on other platforms.
llvm-svn: 264938
Vassil Vassilev [Wed, 30 Mar 2016 22:18:29 +0000 (22:18 +0000)]
Canonicalize UnaryTransformType types when they don't have a known underlying type.
Fixes https://llvm.org/bugs/show_bug.cgi?id=26014
Reviewed by Richard Smith.
llvm-svn: 264937
Teresa Johnson [Wed, 30 Mar 2016 22:17:28 +0000 (22:17 +0000)]
Use existing PrintEscapedString in AssemblyWriter
r264884 introduced a helper to escape the backslashes in the source file
path, but I since discovered an existing mechanism to escape strings.
llvm-svn: 264936
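The escaping discussed in r264936 can be illustrated with a small sketch. This is not the AssemblyWriter code: `escapeString` is a hypothetical helper, and the convention shown (a backslash followed by two hex digits for anything that is not plainly printable) is an assumption about the general approach.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not the AssemblyWriter code: keep printable
// characters as-is and emit everything else, plus '\' and '"', as a
// backslash followed by two uppercase hex digits.
inline std::string escapeString(const std::string &in) {
  static const char hex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : in) {
    if (std::isprint(c) && c != '\\' && c != '"') {
      out += static_cast<char>(c);
    } else {
      out += '\\';
      out += hex[c >> 4];
      out += hex[c & 0x0F];
    }
  }
  return out;
}

// Usage example: a Windows-style path keeps its letters, but each
// backslash becomes the three characters \5C.
inline void checkEscapeExample() {
  assert(escapeString("C:\\src\\a.c") == "C:\\5Csrc\\5Ca.c");
}
```

Reusing one escaping routine for every string avoids a second, path-specific helper that only handles backslashes.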
Peter Collingbourne [Wed, 30 Mar 2016 22:05:13 +0000 (22:05 +0000)]
Cloning: Reduce complexity of debug info cloning and fix correctness issue.
Commit r260791 contained an error in that it would introduce a cross-module
reference in the old module. It also introduced O(N^2) complexity in the
module cloner by requiring the entire module to be visited for each function.
Fix both of these problems by avoiding use of the CloneDebugInfoMetadata
function (which is only designed to do intra-module cloning) and cloning
function-attached metadata in the same way that we clone all other metadata.
Differential Revision: http://reviews.llvm.org/D18583
llvm-svn: 264935
Jonathan Peyton [Wed, 30 Mar 2016 21:50:59 +0000 (21:50 +0000)]
Fix bug when KMP_USE_ADAPTIVE_LOCKS is 0
#endif was one line too low. If KMP_USE_ADAPTIVE_LOCKS is 0,
then queuing locks would incorrectly use drdpa lock mechanism.
This is a fix for https://llvm.org/bugs/show_bug.cgi?id=26649
llvm-svn: 264934
Sanjay Patel [Wed, 30 Mar 2016 21:38:20 +0000 (21:38 +0000)]
fix typos
llvm-svn: 264933
Aaron Ballman [Wed, 30 Mar 2016 21:33:34 +0000 (21:33 +0000)]
Silencing warnings from MSVC 2015 Update 2. Both of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC.
llvm-svn: 264932
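For readers unfamiliar with C4334, the pattern it flags looks roughly like the sketch below. `bitMask` is a hypothetical helper, not code from the changed files; it only shows the usual fix of widening the operand before the shift.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: build a 64-bit mask with a single bit set.
// Writing `1 << bit` would shift a 32-bit int and widen the result
// afterwards, which is the pattern C4334 warns about. Starting from a
// 64-bit one performs the whole shift in 64 bits.
inline std::uint64_t bitMask(unsigned bit) {
  assert(bit < 64 && "shift amount must be smaller than the type width");
  return std::uint64_t{1} << bit;
}
```

With this form, `bitMask(40)` yields a value with only bit 40 set, whereas a 32-bit shift by 40 would be undefined behavior.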
Matt Arsenault [Wed, 30 Mar 2016 21:32:37 +0000 (21:32 +0000)]
AMDGPU: Remove separate r600 double data layout
This is identical to the other r600 datalayout string.
llvm-svn: 264931
Oleg Ranevskyy [Wed, 30 Mar 2016 21:30:30 +0000 (21:30 +0000)]
[Clang][ARM] __va_list declaration is not saved in ASTContext causing compilation error or crash
Summary:
When the code is compiled for arm32 and the builtin `__va_list` declaration is created by `CreateAAPCSABIBuiltinVaListDecl`, the declaration is not saved in the `ASTContext` which may lead to a compilation error or crash.
Minimal reproducer I was able to find:
**header.h**
```
#include <stdarg.h>
typedef va_list va_list_1;
```
**test.cpp**
```
typedef __builtin_va_list va_list_2;
void foo(const char* format, ...) { va_list args; va_start( args, format ); }
```
Steps to reproduce:
```
clang -x c++-header --target=armv7l-linux-eabihf header.h
clang -c -include header.h --target=armv7l-linux-eabihf test.cpp
```
Compilation error:
```
error: non-const lvalue reference to type '__builtin_va_list'
cannot bind to a value of unrelated type 'va_list' (aka '__builtin_va_list')
```
Compiling the same code as a C source leads to a crash:
```
clang --target=armv7l-linux-eabihf header.h
clang -c -x c -include header.h --target=armv7l-linux-eabihf test.cpp
```
Reviewers: logan, rsmith
Subscribers: cfe-commits, asl, aemerson, rengolin
Differential Revision: http://reviews.llvm.org/D18557
llvm-svn: 264930
Aaron Ballman [Wed, 30 Mar 2016 21:30:00 +0000 (21:30 +0000)]
Silencing warnings from MSVC 2015 Update 2. All of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC.
llvm-svn: 264929
Matt Arsenault [Wed, 30 Mar 2016 21:15:18 +0000 (21:15 +0000)]
LegalizeDAG: Don't replace vector store with integer if not legal
For the same reason as the corresponding load change.
Note that ExpandStore is completely broken for non-byte sized element
vector stores, but preserve the current broken behavior which has tests
for it. The behavior should be the same, but now introduces a new typed
store that is incorrectly split later rather than doing it directly.
llvm-svn: 264928
Matt Arsenault [Wed, 30 Mar 2016 21:15:10 +0000 (21:15 +0000)]
LegalizeDAG: Don't replace vector load with integer unless legal
On AMDGPU we want to be able to promote i64/f64 loads to v2i32.
If the access is unaligned, this would conclude that since i64 is legal,
it would convert it back to i64 and there is an endless legalization
loop.
Extract the logic for scalarizing the load into a new TargetLowering
function, where this can also replace the custom function AMDGPU
has for this.
llvm-svn: 264927
David Majnemer [Wed, 30 Mar 2016 21:12:06 +0000 (21:12 +0000)]
[IndVarSimplify] Don't insert after a catchswitch
Widening a PHI requires us to insert a trunc.
The logical place for this trunc is in the same BB as the PHI.
This is not possible if the BB is terminated by a catchswitch.
This fixes PR27133.
llvm-svn: 264926
Davide Italiano [Wed, 30 Mar 2016 21:01:14 +0000 (21:01 +0000)]
[LTO] Add a test to ensure we treat externally available symbols correctly.
We already get it right, but there was no coverage for it.
llvm-svn: 264925
Pete Cooper [Wed, 30 Mar 2016 20:56:54 +0000 (20:56 +0000)]
Convert file handle* methods to llvm::Error instead of std::error_code. NFC.
This updates most of the file handling methods in the linking context and
resolver to use the new API.
llvm-svn: 264924
Justin Lebar [Wed, 30 Mar 2016 20:52:40 +0000 (20:52 +0000)]
Add #include <functional> to PassManagerBuilder, now that it uses std::function. NFC
llvm-svn: 264923
Simon Pilgrim [Wed, 30 Mar 2016 20:52:24 +0000 (20:52 +0000)]
[X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for consecutive loads/zeros
Fix for issue introduced in D17297, where we were breaking early from the loop detecting consecutive loads which could leave us thinking a consecutive load with zeros was possible.
llvm-svn: 264922
Pete Cooper [Wed, 30 Mar 2016 20:44:14 +0000 (20:44 +0000)]
Change loadFileList to llvm::Error. NFC
llvm-svn: 264921
Justin Lebar [Wed, 30 Mar 2016 20:41:05 +0000 (20:41 +0000)]
[Sema] s/UseUsingDeclRules/UseMemberUsingDeclRules/
Summary:
IsOverload has a param named UseUsingDeclRules. But as far as I can
tell, it should be called UseMemberUsingDeclRules. That is, it only
applies to "using" declarations inside classes or structs.
Reviewers: rsmith
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18538
llvm-svn: 264920
Justin Lebar [Wed, 30 Mar 2016 20:40:11 +0000 (20:40 +0000)]
[NVPTX] Make NVVMReflect a function pass.
Summary:
Currently it's a module pass. Make it a function pass so that we can
move it to PassManagerBuilder's EP_EarlyAsPossible extension point,
which only accepts function passes.
Reviewers: rnk
Subscribers: tra, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D18615
llvm-svn: 264919
Justin Lebar [Wed, 30 Mar 2016 20:39:29 +0000 (20:39 +0000)]
[PassManager] Make PassManagerBuilder::addExtension take an std::function, rather than a function pointer.
Summary:
This gives callers flexibility to pass lambdas with captures, which lets
callers avoid the C-style void*-ptr closure style. (Currently, callers
in clang store state in the PassManagerBuilderBase arg.)
No functional change, and the new API is backwards-compatible.
Reviewers: chandlerc
Subscribers: joker.eph, cfe-commits
Differential Revision: http://reviews.llvm.org/D18613
llvm-svn: 264918
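A minimal sketch of why std::function helps here, assuming nothing about the real PassManagerBuilder interface: `Builder`, `ExtensionFn`, `addFixedPass` and `demo` are all hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a builder that stores extension callbacks.
class Builder {
public:
  using ExtensionFn = std::function<void(std::vector<std::string> &)>;

  void addExtension(ExtensionFn fn) {
    assert(fn && "extension callback must not be empty");
    extensions_.push_back(std::move(fn));
  }

  // Run every registered callback in registration order.
  std::vector<std::string> run() const {
    std::vector<std::string> passes;
    for (const ExtensionFn &fn : extensions_)
      fn(passes);
    return passes;
  }

private:
  std::vector<ExtensionFn> extensions_;
};

// A plain function still converts to std::function, so callers that
// passed function pointers keep compiling.
inline void addFixedPass(std::vector<std::string> &passes) {
  passes.push_back("fixed");
}

inline std::vector<std::string> demo() {
  Builder b;
  b.addExtension(addFixedPass);
  // State travels in the lambda capture instead of a void* argument.
  std::string name = "captured";
  b.addExtension(
      [name](std::vector<std::string> &passes) { passes.push_back(name); });
  return b.run();
}
```

The capture replaces the separate state pointer that a C-style callback would need.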
Pete Cooper [Wed, 30 Mar 2016 20:36:31 +0000 (20:36 +0000)]
Convert lld Pass::runOnFile to llvm::Error from std::error_code. NFC.
Pretty mechanical change here. Just replacing all the std::error_code() with
llvm::Error() and make_dynamic_error_code with make_error<GenericError>
llvm-svn: 264917
Justin Bogner [Wed, 30 Mar 2016 20:36:07 +0000 (20:36 +0000)]
test: Remove a test for a transform that hasn't existed in 5 years.
The TailDup transform was removed in r138841 in 2011, along with most
of the tests for it. This test, however, was missed. Probably because
it had already been XFAIL'd for 3 years at that point (since r52243!)
and continued to fail when the opt flag for -tailduplicate stopped
being valid.
llvm-svn: 264916
Rui Ueyama [Wed, 30 Mar 2016 20:25:26 +0000 (20:25 +0000)]
Attempt to fix test failure on Windows.
Windows seems to complain that the file cannot be removed because
it is still in use. We don't have to remove the file but instead
just overwrite it, so do that.
llvm-svn: 264915
Sean Callanan [Wed, 30 Mar 2016 20:17:41 +0000 (20:17 +0000)]
Fixed a problem where a dSYM wasn't properly found because it had the wrong name
<rdar://problem/25447765>
llvm-svn: 264914
Vassil Vassilev [Wed, 30 Mar 2016 20:16:03 +0000 (20:16 +0000)]
[modules] Write out identifiers if the ID is local, too.
In some cases a slot for an identifier is requested but it gets written to
another module, causing an assertion.
At the point when we start serializing Rtypes, we have no imported IdentifierID
for float_round_style. We start serializing stuff and allocate an ID for it.
Then, during the serialization process, we pull in the identifier info for it
from TSchemaHelper. Finally, WriteIdentifierTable decides that the identifier
has not changed since it was deserialized, so doesn't emit it.
Fixes https://llvm.org/bugs/show_bug.cgi?id=27041
Discussed on IRC with Richard Smith. Agreed on post commit review if needed.
llvm-svn: 264913
Reid Kleckner [Wed, 30 Mar 2016 20:15:50 +0000 (20:15 +0000)]
Fix the detection of the shell feature and disable some tests when it's not present
llvm-svn: 264912
Reid Kleckner [Wed, 30 Mar 2016 20:15:41 +0000 (20:15 +0000)]
Remove unused fwd decl for LLVM IR stuff that lives in LTO now
llvm-svn: 264911
Pete Cooper [Wed, 30 Mar 2016 20:15:06 +0000 (20:15 +0000)]
Change getReferenceInfo/getPairReferenceInfo to use new Error handling. NFC.
Adds a GenericError class to lld/Core which can carry a string. This is
analogous to the dynamic_error we currently use in lld/Core.
Use this GenericError instead of make_dynamic_error_code. Also, provide
an implementation of GenericError::convertToErrorCode which for now converts
it into the dynamic_error_code we used to have. This will go away once
all the APIs are converted.
llvm-svn: 264910
Greg Clayton [Wed, 30 Mar 2016 20:14:35 +0000 (20:14 +0000)]
When support for DWO files was added, there were two ways to pass lldb::user_id_t out to the rest of LLDB:
1 - DWARF in .o files with debug map in executable: we would place the compile unit index in the upper 32 bits of the 64 bit value and the lower 32 bits would be the DIE offset
2 - DWO: we would place the compile unit offset in the upper 32 bits of the 64 bit value and the lower 32 bits would be the DIE offset
There was a mixing and matching of this and it wasn't done consistently.
Major changes include:
The DIERef constructor that takes a lldb::user_id_t now requires a SymbolFileDWARF:
DIERef(lldb::user_id_t uid, SymbolFileDWARF *dwarf)
It is needed so that it can be decoded correctly. If it is DWARF in .o files with debug map in executable, then we get the right compile unit from the SymbolFileDWARFDebugMap, otherwise, we use the compile unit offset and DIE offset for DWO or normal DWARF.
The function:
lldb::user_id_t DIERef::GetUID() const;
Now becomes
lldb::user_id_t DIERef::GetUID(SymbolFileDWARF *dwarf) const;
Again, we need the DWARF file to encode it correctly.
This removes the need for "lldb::user_id_t SymbolFileDWARF::MakeUserID() const" and for "bool SymbolFileDWARF::UserIDMatches (lldb::user_id_t uid) const". There were also many places that were doing things inefficiently, like:
1 - encode a dw_offset_t into a lldb::user_id_t
2 - call the public SymbolFile interface to resolve types using the lldb::user_id_t
3 - This would then decode the lldb::user_id_t into a DIERef, and then try to find that type.
There are many places that are now doing this more efficiently by storing DW_AT_type form values as DWARFFormValue objects and then making a DIERef from them and directly calling the underlying function to resolve the lldb_private::Type, lldb_private::CompilerType, lldb_private::CompilerDecl, lldb_private::CompilerDeclContext.
If there are any regressions in DWARF with DWO, we will need to fix any issues that arise since the original patch wasn't functional for the much more widely used DWARF in .o files with debug map.
<rdar://problem/25200976>
llvm-svn: 264909
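The 64-bit layout described in r264909 (compile unit index or offset in the upper 32 bits, DIE offset in the lower 32 bits) can be sketched as follows. The names `encodeUID`, `decodeUID` and `DecodedID` are hypothetical and are not the DIERef interface; `user_id_t` is assumed to be a 64-bit unsigned integer, as the description implies.

```cpp
#include <cassert>
#include <cstdint>

using user_id_t = std::uint64_t; // assumed 64-bit, per the description

struct DecodedID {
  std::uint32_t upper;     // compile unit index (debug map) or offset (DWO)
  std::uint32_t dieOffset; // DIE offset
};

// Pack the two 32-bit halves into one 64-bit identifier.
inline user_id_t encodeUID(std::uint32_t upper, std::uint32_t dieOffset) {
  return (static_cast<user_id_t>(upper) << 32) | dieOffset;
}

// Split a 64-bit identifier back into its halves.
inline DecodedID decodeUID(user_id_t uid) {
  return {static_cast<std::uint32_t>(uid >> 32),
          static_cast<std::uint32_t>(uid & 0xFFFFFFFFu)};
}

// Usage example: a quick round trip through the packed form.
inline void checkRoundTrip() {
  const DecodedID d = decodeUID(encodeUID(3, 0x1234));
  assert(d.upper == 3 && d.dieOffset == 0x1234);
}
```

Because the bits alone do not say whether the upper half is an index or an offset, the decoder needs outside context, which is why the message passes a SymbolFileDWARF to both directions.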
Vassil Vassilev [Wed, 30 Mar 2016 20:10:07 +0000 (20:10 +0000)]
[modules] Add a regression test for PR21547.
llvm-svn: 264908
Hal Finkel [Wed, 30 Mar 2016 19:54:56 +0000 (19:54 +0000)]
Add a copy constructor to StringMap
There is code under review that requires StringMap to have a copy constructor,
and this makes StringMap more consistent with our other containers (like
DenseMap) that have copy constructors.
Differential Revision: http://reviews.llvm.org/D18506
llvm-svn: 264906
Rui Ueyama [Wed, 30 Mar 2016 19:41:51 +0000 (19:41 +0000)]
Split Writer::assignAddresses. NFC.
llvm-svn: 264905
Hal Finkel [Wed, 30 Mar 2016 19:37:08 +0000 (19:37 +0000)]
[LoopVectorize] Don't vectorize loops when everything will be scalarized
This change prevents the loop vectorizer from vectorizing when all of the vector
types it generates will be scalarized. I've run into this problem on the PPC's QPX
vector ISA, which only holds floating-point vector types. The loop vectorizer
will, however, happily vectorize loops with purely integer computation. Here's
an example:
LV: The Smallest and Widest types: 32 / 32 bits.
LV: The Widest register is: 256 bits.
LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
LV: Found an estimated cost of 0 for VF 1 For instruction: %2 = trunc i64 %indvars.iv25 to i32
LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %2, i32* %arrayidx, align 4
LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body
LV: Scalar loop costs: 3.
LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
LV: Found an estimated cost of 0 for VF 2 For instruction: %2 = trunc i64 %indvars.iv25 to i32
LV: Found an estimated cost of 2 for VF 2 For instruction: store i32 %2, i32* %arrayidx, align 4
LV: Found an estimated cost of 1 for VF 2 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body
LV: Vector loop of width 2 costs: 2.
LV: Found an estimated cost of 0 for VF 4 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
LV: Found an estimated cost of 0 for VF 4 For instruction: %2 = trunc i64 %indvars.iv25 to i32
LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %2, i32* %arrayidx, align 4
LV: Found an estimated cost of 1 for VF 4 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
LV: Found an estimated cost of 1 for VF 4 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body
LV: Vector loop of width 4 costs: 1.
...
LV: Selecting VF: 8.
LV: The target has 32 registers
LV(REG): Calculating max register usage:
LV(REG): At #0 Interval # 0
LV(REG): At #1 Interval # 1
LV(REG): At #2 Interval # 2
LV(REG): At #4 Interval # 1
LV(REG): At #5 Interval # 1
LV(REG): VF = 8
The problem is that the cost model here is not wrong, exactly. Since all of
these operations are scalarized, their costs (aside from the uniform ones) are
indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF
picked, the lower the relative overhead from the loop itself (and the
induction-variable update and check), and so in a sense, picking the largest VF
here is the right thing to do.
The problem is that vectorizing like this, where all of the vectors will be
scalarized in the backend, isn't really vectorizing, but rather interleaving.
By itself, this would be okay, but then the vectorizer itself also interleaves,
and that's where the problem manifests itself. There aren't actually enough
scalar registers to support the normal interleave factor multiplied by a factor
of VF (8 in this example). In other words, the problem with this is that our
register-pressure heuristic does not account for scalarization.
While we might want to improve our register-pressure heuristic, I don't think
this is the right motivating case for that work. Here we have a more-basic
problem: The job of the vectorizer is to vectorize things (interleaving aside),
and if the IR it generates won't generate any actual vector code, then
something is wrong. Thus, if every type looks like it will be scalarized (i.e.
will be split into VF or more parts), then don't consider that VF.
This is not a problem specific to PPC/QPX, however. The problem comes up under
SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's
reduced test case from PR26837 to this commit.
Differential Revision: http://reviews.llvm.org/D18537
llvm-svn: 264904
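The arithmetic behind the quoted log in r264904 can be reproduced with a short sketch. It assumes, based only on the printed numbers, that each "loop costs" figure is the sum of the per-instruction estimates divided by VF and truncated. `loopCostPerLane` and `everythingScalarized` are hypothetical helpers, not the vectorizer's functions.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Sum the per-instruction estimates for one VF and divide by VF,
// truncating as the integer figures in the log appear to do (assumption).
inline unsigned loopCostPerLane(const std::vector<unsigned> &instrCosts,
                                unsigned vf) {
  assert(vf > 0 && "vectorization factor must be positive");
  const unsigned total =
      std::accumulate(instrCosts.begin(), instrCosts.end(), 0u);
  return total / vf;
}

// Hypothetical form of the rule from the message: reject a VF when every
// vector type would be split into VF or more parts.
inline bool everythingScalarized(const std::vector<unsigned> &partsPerType,
                                 unsigned vf) {
  return !partsPerType.empty() &&
         std::all_of(partsPerType.begin(), partsPerType.end(),
                     [vf](unsigned parts) { return parts >= vf; });
}
```

Feeding in the estimates from the log gives totals of 3, 4 and 6 for VF 1, 2 and 4, and per-lane costs of 3, 2 and 1. Only the store scales with VF, so the fixed loop overhead is amortized and larger VFs keep looking cheaper.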
Adhemerval Zanella [Wed, 30 Mar 2016 19:12:18 +0000 (19:12 +0000)]
[lld] [ELF/AArch64] Add aarch64 TLS IE to LE relax for local symbol test
This patch adds a TLS relax optimization test when transforming
Initial-Exec to Local-Exec for local symbols (which can not be preempted).
llvm-svn: 264903
Rong Xu [Wed, 30 Mar 2016 18:37:52 +0000 (18:37 +0000)]
[PGO] PGOFuncName in LTO optimizations
PGOFuncNames are used as the key to retrieve the Function definition from the
MD5 stored in the profile. For internal linkage functions, we prefix the source
file name to the PGOFuncNames. LTO's internalization privatizes many global linkage
symbols. This happens after value profile annotation, but those internal
linkage functions should not have a source prefix. To differentiate compiler
generated internal symbols from original ones, PGOFuncName meta data are
created and attached to the original internal symbols in the value profile
annotation step. If a symbol does not have the meta data, its original linkage
must be non-internal.
Also add a new map that maps PGOFuncName's MD5 value to the function definition.
Differential Revision: http://reviews.llvm.org/D17895
llvm-svn: 264902
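The naming rule in r264902 can be sketched as two small functions. Everything here is illustrative: `originalPGOName` and `recoveredPGOName` are hypothetical names, the ':' separator is an assumption, and an empty string stands in for "no metadata attached".

```cpp
#include <cassert>
#include <string>

// Name computed at annotation time: internal-linkage functions get the
// source file name as a prefix (the ':' separator is an assumption).
inline std::string originalPGOName(const std::string &funcName,
                                   bool internalLinkage,
                                   const std::string &sourceFile) {
  return internalLinkage ? sourceFile + ":" + funcName : funcName;
}

// Name recovered after LTO internalization: prefer the recorded metadata.
// A symbol without metadata was originally non-internal, so its plain
// name is used even if it has since been made internal.
inline std::string recoveredPGOName(const std::string &funcName,
                                    const std::string &pgoNameMetadata) {
  return pgoNameMetadata.empty() ? funcName : pgoNameMetadata;
}

// Usage example: an originally internal function keeps its prefixed name.
inline void checkNames() {
  const std::string recorded = originalPGOName("helper", true, "a.c");
  assert(recoveredPGOName("helper", recorded) == "a.c:helper");
}
```

Recording the name before internalization keeps the lookup key stable regardless of what LTO later does to the linkage.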
Reid Kleckner [Wed, 30 Mar 2016 18:31:14 +0000 (18:31 +0000)]
[cmake] Get the MSVC version by running cl rather than relying on MSVC_VERSION
MSVC_VERSION comes from the _MSC_VER macro, which won't correspond to
the STL version if the host compiler is clang-cl.
llvm-svn: 264901
Reid Kleckner [Wed, 30 Mar 2016 18:19:39 +0000 (18:19 +0000)]
[cmake] Instead of testing char16_t for MSVC compat, directly ask cl.exe its version
Credit to Aaron Ballman for thinking of this.
llvm-svn: 264886
Tobias Grosser [Wed, 30 Mar 2016 18:18:31 +0000 (18:18 +0000)]
Revert 264782 and 264789
These caused LNT failures due to new assertions when running with
-polly-position=before-vectorizer -polly-process-unprofitable for:
FAIL: clamscan.compile_time
FAIL: cjpeg.compile_time
FAIL: consumer-jpeg.compile_time
FAIL: shapes.compile_time
FAIL: clamscan.execution_time
FAIL: cjpeg.execution_time
FAIL: consumer-jpeg.execution_time
FAIL: shapes.execution_time
The failures have been introduced by r264782, but r264789 had to be reverted
as it depended on the earlier patch.
llvm-svn: 264885
Teresa Johnson [Wed, 30 Mar 2016 18:15:08 +0000 (18:15 +0000)]
Restore "[ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly"
This restores commit 264869, with a fix for windows bots to properly
escape '\' in the path when serializing out. Added test.
llvm-svn: 264884
Jim Ingham [Wed, 30 Mar 2016 18:14:36 +0000 (18:14 +0000)]
Fix header name.
llvm-svn: 264883
Chad Rosier [Wed, 30 Mar 2016 18:08:51 +0000 (18:08 +0000)]
[AArch64] Fix warnings pointed out by Hal.
llvm-svn: 264882
Reid Kleckner [Wed, 30 Mar 2016 17:30:26 +0000 (17:30 +0000)]
[cmake] Add -fms-compatibility-version=19 when clang-cl gives errors about char16_t
What we are really trying to do here is to figure out if we are using
the 2015 STL. Unfortunately, so far as I know the MSVC STL does not
define a version macro that we can check directly. Instead I wrote a
check to see if char16_t works.
llvm-svn: 264881
Reid Kleckner [Wed, 30 Mar 2016 17:28:21 +0000 (17:28 +0000)]
[cmake] Allow EH usage with clang-cl
llvm-svn: 264880
Rong Xu [Wed, 30 Mar 2016 16:56:31 +0000 (16:56 +0000)]
[PGO] Use ArrayRef in annotateValueSite()
Using ArrayRef in annotateValueSite's parameter instead of using an array
and its size.
Differential Revision: http://reviews.llvm.org/D18568
llvm-svn: 264879
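The general idea behind r264879 is to replace a pointer-plus-length parameter pair with a single non-owning view. `ArrayView` below is a simplified stand-in written for this illustration, not llvm::ArrayRef, and `sumCounts` is a hypothetical function rather than annotateValueSite.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal non-owning view over contiguous elements (illustrative only).
template <typename T> class ArrayView {
public:
  ArrayView(const T *data, std::size_t size) : data_(data), size_(size) {
    assert((data != nullptr || size == 0) && "null data needs size 0");
  }
  ArrayView(const std::vector<T> &v) : data_(v.data()), size_(v.size()) {}
  template <std::size_t N>
  ArrayView(const T (&arr)[N]) : data_(arr), size_(N) {}

  const T *begin() const { return data_; }
  const T *end() const { return data_ + size_; }
  std::size_t size() const { return size_; }

private:
  const T *data_;
  std::size_t size_;
};

// Before: sumCounts(const std::uint64_t *counts, std::size_t n).
// After: one parameter that cannot get out of sync with its length.
inline std::uint64_t sumCounts(ArrayView<std::uint64_t> counts) {
  std::uint64_t total = 0;
  for (std::uint64_t c : counts)
    total += c;
  return total;
}
```

Callers can pass a std::vector or a C array directly, and the length is taken from the argument itself.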
Rui Ueyama [Wed, 30 Mar 2016 16:51:57 +0000 (16:51 +0000)]
Include line number in error message for linker scripts.
This patch is based on http://reviews.llvm.org/D18545 written
by George Rimar.
llvm-svn: 264878
Tom Stellard [Wed, 30 Mar 2016 16:35:13 +0000 (16:35 +0000)]
AMDGPU/SI: Improve MachineSchedModel definition
This patch contains a few improvements to the model, including:
- Using a single resource with a defined buffers size for each memory unit.
- Setting the IssueWidth correctly.
- Fixing latency values for memory instructions.
shader-db stats:
16429 shaders in 3231 tests
Totals:
SGPRS: 318232 -> 312328 (-1.86 %)
VGPRS: 208996 -> 209346 (0.17 %)
Code Size: 7147044 -> 7166440 (0.27 %) bytes
LDS: 83 -> 83 (0.00 %) blocks
Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave
Max Waves: 49182 -> 49243 (0.12 %)
Wait states: 0 -> 0 (0.00 %)
Differential Revision: http://reviews.llvm.org/D18453
llvm-svn: 264877
Tom Stellard [Wed, 30 Mar 2016 16:35:09 +0000 (16:35 +0000)]
AMDGPU/SI: Enable lanemask tracking in misched
Summary:
This results in higher register usage, but should make it easier for
the compiler to hide latency.
This patch is a prerequisite for some more scheduler improvements, and I
think the increased register usage with this patch is acceptable, because
when combined with the scheduler improvements, the total register usage
will decrease.
shader-db stats:
2382 shaders in 478 tests
Totals:
SGPRS: 48672 -> 49088 (0.85 %)
VGPRS: 34148 -> 34847 (2.05 %)
Code Size: 1285816 -> 1289128 (0.26 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 492544 -> 573440 (16.42 %) bytes per wave
Max Waves: 6856 -> 6846 (-0.15 %)
Wait states: 0 -> 0 (0.00 %)
Depends on D18451
Reviewers: nhaehnle, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18452
llvm-svn: 264876
Jonas Paulsson [Wed, 30 Mar 2016 16:11:58 +0000 (16:11 +0000)]
[SystemZ] Add nop and nopr InstAliases.
For compatibility with GAS, nop and nopr are recognized as aliases for
bc and bcr, respectively. A mask of 0 turns these instructions
effectively into no-operations.
Reviewed by Ulrich Weigand.
llvm-svn: 264875
Vedant Kumar [Wed, 30 Mar 2016 16:03:02 +0000 (16:03 +0000)]
[c-index-test] Delete dead function, NFC
llvm-svn: 264874
Jonas Paulsson [Wed, 30 Mar 2016 15:51:24 +0000 (15:51 +0000)]
[SystemZ] Specify required features for builtins.
BuiltinsSystemZ.def is extended to include the required processor
features per intrinsic.
New test test/CodeGen/builtins-systemz-error2.c that checks for
expected errors when intrinsics are used with a subtarget that does
not support the required feature (e.g. vector support).
Reviewed by Ulrich Weigand.
llvm-svn: 264873
Nirav Dave [Wed, 30 Mar 2016 15:41:12 +0000 (15:41 +0000)]
Remove HasFnAttribute guards to getFnAttribute calls
These checks are redundant and can be removed
Reviewers: hans
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D18564
llvm-svn: 264872
Teresa Johnson [Wed, 30 Mar 2016 15:16:04 +0000 (15:16 +0000)]
Revert "[ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly"
This reverts commit r264869. I am seeing Windows bot failures due to the
"\" in the path being mishandled at some point (it seems to be interpreted
wrongly, and llvm-as | llvm-dis is yielding some junk
characters). Need to investigate.
llvm-svn: 264871
Simon Pilgrim [Wed, 30 Mar 2016 14:14:00 +0000 (14:14 +0000)]
[X86][XOP] BITREVERSE lowering using VPPERM
XOP's VPPERM has some great 'permute operations' that it can do as part of shuffling the bytes of a 128-bit vector - in this case we use it to perform BITREVERSE in a single instruction.
llvm-svn: 264870
Teresa Johnson [Wed, 30 Mar 2016 14:00:02 +0000 (14:00 +0000)]
[ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly
Summary:
This change serializes out and in the SourceFileName to LLVM assembly
so that it is preserved through "llvm-dis | llvm-as". This is
necessary to ensure that the global identifiers created for local values
in the module summary index are the same even if the bitcode is
streamed out and read back from LLVM assembly.
Serializing the summary itself to LLVM assembly is in progress.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18588
llvm-svn: 264869
Teresa Johnson [Wed, 30 Mar 2016 13:59:49 +0000 (13:59 +0000)]
Prepare tests for change to emit Module SourceFileName to LLVM assembly
Modify these tests to ignore the source file name when looking for the
expected string. It was already catching the source file name once via
the ModuleID, and will catch it another time with an impending change to
LLVM to serialize out the module's SourceFileName.
llvm-svn: 264868
Simon Pilgrim [Wed, 30 Mar 2016 13:55:00 +0000 (13:55 +0000)]
[X86][SSE] Test the legalization of vector comparison results
We are currently doing a REALLY bad job of packing results of vector comparisons into the legalized <X x i1> result equivalents - a mixture of PACKSS/PMOVMSKB would be much better here.
llvm-svn: 264867
Rafael Espindola [Wed, 30 Mar 2016 13:27:50 +0000 (13:27 +0000)]
No relocation needs both SA and ZA.
Pass only one of them to relocateOne.
llvm-svn: 264866
Rafael Espindola [Wed, 30 Mar 2016 13:18:08 +0000 (13:18 +0000)]
Implement getImplicitAddend for mips.
llvm-svn: 264865
Rafael Espindola [Wed, 30 Mar 2016 12:45:58 +0000 (12:45 +0000)]
Simplify mips addend processing.
It is now added to the addend in the same way as a regular Elf_Rel
addend.
llvm-svn: 264864
Rafael Espindola [Wed, 30 Mar 2016 12:40:38 +0000 (12:40 +0000)]
Fix handling of addends on i386.
Because of merge sections it is not sufficient to just add them while
applying a relocation.
llvm-svn: 264863
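The addend commits above (r264863 to r264865) deal with REL-style relocations, where the addend is not stored in the relocation record but in the bytes being relocated. A minimal C++ sketch of that idea follows. It is an assumption-laden illustration, not the linker's code: read32le, write32le, readImplicitAddend32 and applyAbs32 are hypothetical names, and only a 32-bit little-endian absolute relocation (S + A) is modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: little-endian 32-bit access to a section buffer.
inline std::uint32_t read32le(const std::uint8_t *p) {
  return static_cast<std::uint32_t>(p[0]) |
         static_cast<std::uint32_t>(p[1]) << 8 |
         static_cast<std::uint32_t>(p[2]) << 16 |
         static_cast<std::uint32_t>(p[3]) << 24;
}

inline void write32le(std::uint8_t *p, std::uint32_t v) {
  for (int i = 0; i < 4; ++i)
    p[i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// With REL relocations the addend is implicit: it is whatever value is
// already stored at the location to be relocated.
inline std::int32_t readImplicitAddend32(const std::uint8_t *loc) {
  std::uint32_t raw = read32le(loc);
  std::int32_t addend;
  std::memcpy(&addend, &raw, sizeof addend); // reinterpret as signed
  return addend;
}

// Apply an absolute 32-bit relocation as S + A, with the addend passed in
// explicitly instead of being added to the field in place.
inline void applyAbs32(std::uint8_t *loc, std::uint32_t symbolValue,
                       std::int32_t addend) {
  write32le(loc, symbolValue + static_cast<std::uint32_t>(addend));
}

// Small usage example.
inline void implicitAddendExample() {
  std::uint8_t field[4] = {0xFC, 0xFF, 0xFF, 0xFF}; // encodes addend -4
  std::int32_t addend = readImplicitAddend32(field);
  assert(addend == -4);
  applyAbs32(field, 0x1000u, addend);
  assert(read32le(field) == 0x0FFCu);
}
```

Reading the addend up front, rather than adding the symbol value to the field while applying the relocation, makes the addend available as an ordinary value before the final write, which is what the merge-section case in the i386 commit needs.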
Alexander Kornienko [Wed, 30 Mar 2016 12:35:05 +0000 (12:35 +0000)]
[clang-tidy] Fix MSVC build.
llvm-svn: 264862
Benjamin Kramer [Wed, 30 Mar 2016 12:31:51 +0000 (12:31 +0000)]
[NVPTX] Avoid temporary std::string and make single-use function local to the cpp file.
No functionality change intended.
llvm-svn: 264861
Marianne Mailhot-Sarrasin [Wed, 30 Mar 2016 12:20:53 +0000 (12:20 +0000)]
gold-plugin: Fixed typo in an error message.
llvm-svn: 264860
Gabor Horvath [Wed, 30 Mar 2016 12:16:09 +0000 (12:16 +0000)]
[clang-tidy] Adjust dangling references check to ASTMatcher changes.
llvm-svn: 264859
Alexander Kornienko [Wed, 30 Mar 2016 12:05:33 +0000 (12:05 +0000)]
[docs] Added 3.8 clang-tidy release notes, fixed formatting.
llvm-svn: 264858
Simon Pilgrim [Wed, 30 Mar 2016 11:43:26 +0000 (11:43 +0000)]
[X86][SSE] Added tests for clearing upper bits of vector elements
Patterns based on PR6455
llvm-svn: 264857
Alexander Kornienko [Wed, 30 Mar 2016 11:31:33 +0000 (11:31 +0000)]
[clang-tidy] readability check for const params in declarations
Summary: Adds a clang-tidy warning for top-level consts in function declarations.
Reviewers: hokein, sbenza, alexfh
Subscribers: cfe-commits
Patch by Matt Kulukundis!
Differential Revision: http://reviews.llvm.org/D18408
llvm-svn: 264856
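As an illustration of what the readability check above (r264856) is about, the following C++ sketch shows a top-level const on a by-value parameter in a declaration. The function name scale is hypothetical and the exact diagnostic text is not reproduced here; the sketch only demonstrates the language rule that makes such a const redundant in a declaration.

```cpp
#include <cassert>
#include <type_traits>

// Declaration with a top-level const on a by-value parameter. The qualifier
// is not part of the function type, so it tells callers nothing; this is the
// kind of declaration the check is described as warning about.
int scale(const int factor);

// The same function redeclared without the qualifier. Both declarations
// name one function, which is why the const above is noise.
int scale(int factor);

// In the definition the const is meaningful: it stops the body from
// modifying its local copy of the argument.
int scale(const int factor) { return factor * 2; }

// The language drops top-level const from parameter types when forming the
// function type, so these two types are identical.
static_assert(std::is_same<int(int), int(const int)>::value,
              "top-level const does not change the function type");

// Small usage example.
inline void constParamExample() { assert(scale(21) == 42); }
```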
Gabor Horvath [Wed, 30 Mar 2016 11:22:14 +0000 (11:22 +0000)]
[ASTMatchers] Existing matcher hasAnyArgument fixed
Summary: A checker (to be uploaded after this patch) needs to check implicit casts. The checker needs the hasAnyArgument matcher, but that matcher ignores implicit casts and parenthesized expressions, which prevents the checker from checking implicit casts in arguments. The matcher's documentation contains a FIXME saying that this behavior should be removed once separate matchers for ignoring implicit casts and parenthesized expressions are ready. Since those matchers already exist, the fix could be made. Only one Clang checker was affected; it was also fixed (ignoreParenImpCasts added) and is uploaded separately. Third-party checkers (outside the Clang repository) may be affected by this fix, so it must be emphasized in the release notes.
Reviewers: klimek, sbenza, alexfh
Subscribers: alexfh, klimek, xazax.hun, cfe-commits
Differential Revision: http://reviews.llvm.org/D18243
llvm-svn: 264855
Kuba Brecka [Wed, 30 Mar 2016 10:50:24 +0000 (10:50 +0000)]
Fix the ThreadSanitizer support to avoid creating empty SBThreads and to not crash when thread_id is unavailable. Plus a whitespace fix.
llvm-svn: 264854
Alexey Bataev [Wed, 30 Mar 2016 10:43:55 +0000 (10:43 +0000)]
[OPENMP 4.0] Initial support for '#pragma omp declare simd' directive.
Initial parsing/sema/serialization/deserialization support for '#pragma
omp declare simd' directive.
The 'declare simd' construct can be applied to a function to enable the
creation of one or more versions that can process multiple arguments
using SIMD instructions from a single invocation from a SIMD loop.
If the function has any declarations, then the declare simd construct
for any declaration that has one must be equivalent to the one specified
for the definition. Otherwise, the result is unspecified.
This pragma can be applied many times to the same declaration.
Internally this pragma is represented as an attribute, but it needs special processing because it must appear before the function declaration it applies to.
Differential Revision: http://reviews.llvm.org/D10599
llvm-svn: 264853
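A short C++ sketch of how the 'declare simd' directive from the commit above (r264853) is typically written in source. The function names scaleAndAdd, saxpy and declareSimdExample are hypothetical. The commit describes parsing, sema and serialization support only, so this shows the syntax rather than any generated code. Without OpenMP enabled the pragmas are ignored (a compiler may warn about unknown pragmas) and the code behaves as plain scalar C++.

```cpp
#include <cassert>
#include <cstddef>

// The directive is placed before the declaration it applies to.
#pragma omp declare simd
float scaleAndAdd(float a, float x, float y);

// The definition carries an equivalent directive, matching the rule that
// the construct on a declaration must be equivalent to the one on the
// definition.
#pragma omp declare simd
float scaleAndAdd(float a, float x, float y) { return a * x + y; }

// A SIMD loop that calls the function once per element. With OpenMP
// enabled, a compiler may use a vector version of scaleAndAdd here.
inline void saxpy(float a, const float *x, const float *y, float *out,
                  std::size_t n) {
#pragma omp simd
  for (std::size_t i = 0; i < n; ++i)
    out[i] = scaleAndAdd(a, x[i], y[i]);
}

// Small usage example.
inline void declareSimdExample() {
  const float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  const float y[4] = {1.0f, 1.0f, 1.0f, 1.0f};
  float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
  saxpy(2.0f, x, y, out, 4);
  assert(out[0] == 3.0f && out[3] == 9.0f);
}
```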
James Molloy [Wed, 30 Mar 2016 10:11:43 +0000 (10:11 +0000)]
[VectorUtils] Don't try and truncate PHIs to a smaller bitwidth
We already try not to truncate PHIs in computeMinimalBitwidths. LoopVectorize can't handle it and we really don't need to, because both induction and reduction PHIs are truncated by other means.
However, we weren't bailing out in all the places we should have, and we ended up returning a PHI to be truncated, which caused PR27018.
This fixes PR27018.
llvm-svn: 264852