Johannes Doerfert [Thu, 26 Aug 2021 18:04:29 +0000 (13:04 -0500)]
[OpenMP][FIX] Allow declare variant to work with reference types
Reference types in the return or parameter position caused the OpenMP
declare variant overload reasoning to give up. We should allow them as
we allow any other type.
This should fix the bug reported on the mailing list:
https://lists.llvm.org/pipermail/openmp-dev/2021-August/004094.html
Reviewed By: ABataev, pdhaliwal
Differential Revision: https://reviews.llvm.org/D108774
Johannes Doerfert [Tue, 17 Aug 2021 06:37:29 +0000 (01:37 -0500)]
[Attributor][FIX] Recursion via memory needs to be tracked explicitly
Recursion can happen when we see a PHI use the second time or when we
look at a store value operand use again. We already visited the
potential copies and doing so again will just cause endless looping.
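A minimal sketch of the idea, assuming a simplified use graph. This is not the Attributor code: `Node` and `collectReachable` are hypothetical names, and the indices stand in for uses reached through PHIs or stored values. The explicit visited set is what keeps a cyclic graph from looping forever.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a value whose uses we follow; not an LLVM type.
struct Node {
  std::vector<std::size_t> uses; // indices of nodes reached through this one
};

// Walk all nodes reachable from `start`, processing each one at most once.
// Without the `visited` check, a cycle (for example a PHI that feeds itself
// through memory) would keep refilling the worklist.
std::set<std::size_t> collectReachable(const std::vector<Node> &graph,
                                       std::size_t start) {
  assert(start < graph.size() && "start must name an existing node");
  std::set<std::size_t> visited;
  std::vector<std::size_t> worklist{start};
  while (!worklist.empty()) {
    std::size_t cur = worklist.back();
    worklist.pop_back();
    if (!visited.insert(cur).second)
      continue; // already seen: do not follow its uses again
    for (std::size_t next : graph[cur].uses)
      worklist.push_back(next);
  }
  return visited;
}
```

The walk terminates on a graph with back edges because a node is only expanded the first time it is inserted into `visited`.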
Reviewed By: kuter
Differential Revision: https://reviews.llvm.org/D108190
Johannes Doerfert [Mon, 16 Aug 2021 16:04:09 +0000 (11:04 -0500)]
[Attributor][FIX] Do not treat byval args as local memory (for now)
For now we should not treat byval arguments as local copies performed
on the call edge, though in general we should. To make that happen we
need to teach various passes, e.g., DSE, about the copy effect of a
byval. That would also allow us to mark functions only accessing byval
arguments as readnone again; arguably their accesses have no effect
outside of the function, like accesses to allocas.
Reviewed By: kuter
Differential Revision: https://reviews.llvm.org/D108140
Jason Liu [Fri, 27 Aug 2021 17:37:34 +0000 (13:37 -0400)]
Fix assertion when passing function into inline asm's input operand
This seems to be a regression caused by this change:
https://reviews.llvm.org/D60943.
Since we delay reporting the error, we would run into some invalid
state in clang and llvm.
Without this fix, clang would assert when passing a function into
inline asm's input operand.
Differential Revision: https://reviews.llvm.org/D107941
LLVM GN Syncbot [Fri, 27 Aug 2021 17:29:43 +0000 (17:29 +0000)]
[gn build] Port 54e8cae56529
Philip Reames [Fri, 27 Aug 2021 17:12:15 +0000 (10:12 -0700)]
Special case common branch patterns in breakLoopBackedge (try 2)
Changes since aec08e:
* Adjust placement of a closing brace so that the general case actually runs. Turns out we had *no* coverage of the switch case. I added one in eae90fd.
* Drop .llvm.loop.* metadata from the new branch as there is no longer a loop to annotate.
Original commit message:
This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability. I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.
Roman Lebedev [Fri, 27 Aug 2021 17:15:45 +0000 (20:15 +0300)]
[Codegen][X86] EltsFromConsecutiveLoads(): if only have AVX1, ensure that the "load" is actually foldable (PR51615)
This fixes another reproducer from https://bugs.llvm.org/show_bug.cgi?id=51615
And again, the fix lies not in the code added in D105390.
In this case, we don't check at all that the "broadcast-from-mem" we create
can actually fold the load. Here, its operand was not a load at all:
```
Combining: t16: v8i32 = vector_shuffle<0,u,u,u,0,u,u,u> t14, undef:v8i32
Creating new node: t29: i32 = undef
RepeatLoad:
t8: i32 = truncate t7
  t7: i64 = extract_vector_elt t5, Constant:i64<0>
    t5: v2i64,ch = load<(load (s128) from %ir.arg)> t0, t2, undef:i64
      t2: i64,ch = CopyFromReg t0, Register:i64 %0
        t1: i64 = Register %0
      t4: i64 = undef
    t3: i64 = Constant<0>
Combining: t15: v8i32 = undef
```
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D108821
Philipp Krones [Fri, 27 Aug 2021 16:55:52 +0000 (17:55 +0100)]
[MC][RISCV] Add RISCV MCObjectFileInfo
This makes sure that the text section will have a 2-byte alignment if
the +c extension is enabled.
Reviewed By: MaskRay, luismarques
Differential Revision: https://reviews.llvm.org/D102052
Craig Topper [Fri, 27 Aug 2021 16:51:05 +0000 (09:51 -0700)]
[RISCV] Add -riscv-v-fixed-length-vector-elen-max to limit the ELEN used for fixed length vectorization.
This adds an ELEN limit for fixed length vectors. This will scalarize
any elements larger than this. It will also disable some fractional
LMULs. For example, if ELEN=32 then mf8 becomes illegal, i32/f32
vectors can't use any fractional LMULs, i16/f16 can only use mf2,
and i8 can use mf2 and mf4.
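The ELEN=32 example can be checked with a small sketch. It assumes the rule that a fractional LMUL of 1/N is usable only for element widths up to ELEN/N, which reproduces the cases listed above; `isFractionalLMULUsable` is a hypothetical helper, not a function from the RISC-V backend.

```cpp
#include <cassert>

// Returns true if a fractional LMUL of 1/denom (mf2, mf4, mf8) can be used
// with elements of `sewBits` bits when the largest supported element width
// is `elenBits` bits. Assumed rule: SEW must not exceed ELEN / denom.
bool isFractionalLMULUsable(unsigned sewBits, unsigned denom,
                            unsigned elenBits) {
  assert((denom == 2 || denom == 4 || denom == 8) &&
         "only mf2, mf4 and mf8 exist");
  return sewBits <= elenBits / denom;
}
```

With `elenBits = 32`, mf8 allows at most 4-bit elements, so no element type can use it, while 8-bit elements fit mf2 and mf4 and 16-bit elements fit only mf2.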
We may also need something for the scalable vectors, but that has
interactions with the intrinsics and we can't scalarize a scalable
vector.
Longer term this should come from one of the Zve* features
Sizhe Zhao [Thu, 26 Aug 2021 20:11:54 +0000 (23:11 +0300)]
[libcxx] Use GetSystemTimePreciseAsFileTime() if available
We will try to use GetSystemTimePreciseAsFileTime if possible.
Reference: https://sourceforge.net/p/mingw-w64/mingw-w64/ci/59195b2d7fe26549f70969b0dd487293819f023e/.
Reviewed By: compnerd, #libc, mstorsjo, ldionne
Differential Revision: https://reviews.llvm.org/D104987
Philip Reames [Fri, 27 Aug 2021 17:09:51 +0000 (10:09 -0700)]
[test] exercise breakLoopBackedge with a switch latch cond
This was reduced from a test case which triggered a revert to my recent change to same function. It turns out we didn't have *any* coverage of the non-branch latch and my patch was blatantly broken.
LLVM GN Syncbot [Fri, 27 Aug 2021 16:46:52 +0000 (16:46 +0000)]
[gn build] Port c8b14c03ec74
Louis Dionne [Fri, 27 Aug 2021 14:36:04 +0000 (10:36 -0400)]
[libc++][NFC] Fix include guard for decay_copy.h and remove underscores from the header
We don't use double underscores for private header names when they are
in a subdirectory with double underscores already.
Differential Revision: https://reviews.llvm.org/D108820
Louis Dionne [Thu, 26 Aug 2021 18:54:07 +0000 (14:54 -0400)]
[libc++][NFC] Remove useless _LIBCPP_PUSH_MACROS
Only files that actually use min/max are required to do this dance.
Differential Revision: https://reviews.llvm.org/D108778
Walter Erquinigo [Fri, 27 Aug 2021 16:31:41 +0000 (09:31 -0700)]
[trace] [intel pt] Create a "process trace save" command
Added a new command "process trace save -d <directory>".
- It saves a JSON file as <directory>/trace.json, with the main properties of the trace session.
- It saves the binary Intel-pt trace as <directory>/thread_id.trace, one file per thread.
- It saves modules to the directory <directory>/modules.
- It only works for live processes and only supports Intel-pt right now.
Example:
```
b main
run
process trace start
n
process trace save -d /tmp/mytrace
```
Files named trace.json and xxx.trace should be generated in /tmp/mytrace. To load the trace that was just saved:
```
trace load /tmp/mytrace
thread trace dump instructions
```
You should see the instructions of the trace printed.
To run a test:
```
cd ~/llvm-sand/build/Release/fbcode-x86_64/toolchain
ninja lldb-dotest
./bin/lldb-dotest -p TestTraceSave
```
Reviewed By: wallace
Differential Revision: https://reviews.llvm.org/D107669
Siva Chandra Reddy [Fri, 27 Aug 2021 15:48:55 +0000 (15:48 +0000)]
[libc][Obvious] Add header guards for the generated linux syscall header file.
Fangrui Song [Fri, 27 Aug 2021 15:53:55 +0000 (08:53 -0700)]
[MC] Change ELFOSABI_NONE to ELFOSABI_GNU for STB_GNU_UNIQUE
Similar to D97976.
On Linux, most GCC installations are configured with
`--enable-gnu-unique-object` and such GCC emits `@gnu_unique_object` assembly.
The feature is highly controversial and disliked by many folks.
(On glibc DF_1_NODELETE is implicitly enabled and makes dlclose a no-op).
In llvm-project STB_GNU_UNIQUE is assembly only. Clang does not use STB_GNU_UNIQUE.
Use ELFOSABI_GNU to match GNU as behavior and avoid collision with other
OSABI binding values.
Reviewed By: jrtc27
Differential Revision: https://reviews.llvm.org/D107861
Arthur Eubanks [Wed, 25 Aug 2021 22:24:49 +0000 (15:24 -0700)]
[gn build] Don't copy xray includes
The gn build doesn't support xray, so there's no reason to make the xray
headers available. Some CMake checks check if xray includes are
available to determine if xray is usable. Since we don't build the xray
runtime, there are link errors.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D108737
Louis Dionne [Fri, 27 Aug 2021 15:47:27 +0000 (11:47 -0400)]
[libc++][NFC] Remove unused helper function in the test suite
Fanbo Meng [Fri, 27 Aug 2021 14:26:51 +0000 (10:26 -0400)]
[MCParser][z/OS] Mark test as unsupported for the z/OS Target
Marking test as unsupported for the same reason as https://reviews.llvm.org/D105204
Reviewed By: abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D108819
Kazu Hirata [Fri, 27 Aug 2021 15:42:57 +0000 (08:42 -0700)]
[IR] Remove getWithOperandReplaced (NFC)
The function hasn't been used for at least 10 years.
Dmitry Preobrazhensky [Fri, 27 Aug 2021 14:16:22 +0000 (17:16 +0300)]
[AMDGPU][MC][NFC][DOC] Updated AMD GPU assembler syntax description.
Summary of changes:
- Added f16 omod modifier (bug 51386).
- Corrected names of data types (bug 48638).
- Enabled a16 with most GFX10 MIMG opcodes (see https://reviews.llvm.org/D102231).
- Corrected description of integer operands (bug 51130).
- Corrected description of 8-bit DS offsets (bug 51536).
- Improved PERMLANE op_sel description.
- Corrected *SAD* opcode types.
Joe Loser [Fri, 27 Aug 2021 14:08:11 +0000 (10:08 -0400)]
[libc++][NFC] Remove extra __ranges/take_view.h entry in CMakeLists.txt
Differential Revision: https://reviews.llvm.org/D108802
Louis Dionne [Fri, 27 Aug 2021 14:01:29 +0000 (10:01 -0400)]
Revert "[CMake] Enable LLVM_ENABLE_PER_TARGET_RUNTIME_DIR by default on Linux"
This reverts commit abb956370ee71d018e9a88ae196f039f6c4e0dae, which broke
the libc++ CI on Linux.
owenca [Wed, 25 Aug 2021 23:22:02 +0000 (16:22 -0700)]
[clang-format] Group options that pack constructor initializers
Add a new option PackConstructorInitializers and deprecate the
related options ConstructorInitializerAllOnOneLineOrOnePerLine and
AllowAllConstructorInitializersOnNextLine. Below is the mapping:
PackConstructorInitializers  ConstructorInitializer...  AllowAll...
Never                        -                          -
BinPack                      false                      -
CurrentLine                  true                       false
NextLine                     true                       true
The option value Never fixes PR50549 by always placing each
constructor initializer on its own line.
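The mapping can be read as a small function from the two deprecated booleans to the new value. The sketch below only restates the table; `PackStyle` and `fromDeprecatedOptions` are hypothetical names and not the clang-format implementation.

```cpp
// Values of the new option, as listed in the mapping.
enum class PackStyle { Never, BinPack, CurrentLine, NextLine };

// Derive the new value from the two deprecated options:
//   allOnOneLineOrOnePerLine: ConstructorInitializerAllOnOneLineOrOnePerLine
//   allowAllOnNextLine:       AllowAllConstructorInitializersOnNextLine
// Never has no equivalent in the old options, so it is never returned here.
PackStyle fromDeprecatedOptions(bool allOnOneLineOrOnePerLine,
                                bool allowAllOnNextLine) {
  if (!allOnOneLineOrOnePerLine)
    return PackStyle::BinPack; // the second option is ignored in this case
  return allowAllOnNextLine ? PackStyle::NextLine : PackStyle::CurrentLine;
}
```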
Differential Revision: https://reviews.llvm.org/D108752
Matt Arsenault [Fri, 27 Aug 2021 13:18:26 +0000 (09:18 -0400)]
GlobalISel: Remove check for empty functions as these are invalid IR
Nico Weber [Fri, 27 Aug 2021 02:03:26 +0000 (22:03 -0400)]
[lld/COFF] Ignore /LTCG, /LTCG:, /LTCGOUT:, /ILK: flags
We currently complain "could not open /LTCG: no such file or directory",
which isn't very useful. We could emit a warning when we see this flag, but
just ignoring it seems fine.
Final missing part of PR38799.
Differential Revision: https://reviews.llvm.org/D108799
Nico Weber [Fri, 27 Aug 2021 02:01:00 +0000 (22:01 -0400)]
[lld/COFF] Use P_priv more
P_priv does the same as the old QF further down. Standardize on P_priv.
No behavior change.
Differential Revision: https://reviews.llvm.org/D108798
Balazs Benics [Fri, 27 Aug 2021 12:41:26 +0000 (14:41 +0200)]
[analyzer] MallocOverflow should consider comparisons only preceding malloc
MallocOverflow works in two phases:
1) Collects suspicious malloc calls, whose argument is a multiplication
2) Filters the aggregated list of suspicious malloc calls by iterating
over the BasicBlocks of the CFG looking for comparison binary
operators over the variable appearing in any suspicious malloc.
Consequently, it suppressed true-positive cases when the comparison
check was after the malloc call.
In this patch the checker will consider the relative position of the
relation check to the malloc call.
E.g.:
```lang=C++
void *check_after_malloc(int n, int x) {
  int *p = NULL;
  if (x == 42)
    p = malloc(n * sizeof(int)); // Previously **no** warning, now it
                                 // warns about this.
  // The check is after the allocation!
  if (n > 10) {
    // Do something conditionally.
  }
  return p;
}
```
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D107804
Sanjay Patel [Fri, 27 Aug 2021 12:09:28 +0000 (08:09 -0400)]
[GlobalOpt] don't hoist constant expressions that can trap
We try to forward a stored-once-constant-value from one global access
to another, but that's not safe if the constant value is an expression
that can trap.
The tests are reduced from the miscompile examples in:
https://llvm.org/PR47578
Differential Revision: https://reviews.llvm.org/D108771
Jun Ma [Wed, 25 Aug 2021 11:25:38 +0000 (19:25 +0800)]
[AArch64][SVE] Optimize ptrue predicate pattern with known sve register width.
For vectors that are exactly equal to getMaxSVEVectorSizeInBits, just use
AArch64SVEPredPattern::all, which can enable the use of unpredicated ptrue when available.
TestPlan: check-llvm
Differential Revision: https://reviews.llvm.org/D108706
Jun Ma [Wed, 25 Aug 2021 09:25:39 +0000 (17:25 +0800)]
[AArch64][SVE] Add API for conversion between SVE predicate pattern and element number. NFC
This patch solely moves the conversion between SVE predicate pattern
and element number into two small functions. It is a pre-commit patch for optimizing
ptrue with known SVE register width.
Differential Revision: https://reviews.llvm.org/D108705
Jun Ma [Wed, 25 Aug 2021 07:43:18 +0000 (15:43 +0800)]
[AArch64][SVE] Use getPTrue uniformly. NFC.
Andrea Di Biagio [Fri, 27 Aug 2021 11:48:30 +0000 (12:48 +0100)]
[MCA][NFC] Removed unused method, and fixed a coverity issue.
The coverity issue was reported against class MCAOperand
due to the lack of proper initialization for field Index.
No functional change intended.
Jon Chesterfield [Fri, 27 Aug 2021 11:34:02 +0000 (12:34 +0100)]
[openmp][amdgpu] Initial gfx10 offloading implementation
Lets wavefront size be 32 for amdgpu openmp, as well as 64.
Fixes up as little as possible to pass that through the libraries. This change
is end to end, as opposed to updating clang/devicertl/plugin separately. It can
be broken up for review/commit if preferred. Posting as-is so that others with
a gfx10 can try it out. It works roughly as well as gfx9 for me, but there are
probably bugs remaining as well as the todo: for letting grid values vary more.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D108708
Serge Pavlov [Fri, 13 Aug 2021 09:52:29 +0000 (16:52 +0700)]
[X86] Implement llvm.isnan(x86_fp80) as unordered comparison
x86_fp80 format allows values that do not fit any IEEE-754 category.
Previously they were recognized by intrinsic __builtin_isnan as NaNs.
Now this intrinsic is implemented using instruction FXAM, which
distinguishes between NaNs and unsupported values. It can make some
programs behave differently.
As a solution, this fix changes lowering of the intrinsic. If floating
point exceptions are ignored, llvm.isnan is lowered into unordered
comparison, as __builtin_isnan was implemented earlier. In strictfp
functions the intrinsic is lowered using FXAM, which does not raise
exceptions even for signaling NaN, as required by IEEE-754 and C
standards.
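At the source level, the unordered-comparison form corresponds to comparing a value with itself. The sketch below shows that idea only; `isNaNByComparison` and `agreesWithLibrary` are hypothetical helpers, and the code assumes default floating-point semantics (no -ffast-math style flags).

```cpp
#include <cmath>

// A value compares unequal to itself only when it is a NaN, because every
// comparison involving a NaN is unordered.
bool isNaNByComparison(long double x) { return x != x; }

// Cross-check against the standard library classification.
bool agreesWithLibrary(long double x) {
  return isNaNByComparison(x) == std::isnan(x);
}
```

A comparison like this may raise a floating-point exception for a signaling NaN, which is the reason given above for keeping FXAM in strictfp functions.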
Differential Revision: https://reviews.llvm.org/D108037
Nathan Sidwell [Thu, 26 Aug 2021 11:05:25 +0000 (04:05 -0700)]
[NFC][X86] Sret return register cleanup
There are no paths into LowerFormalParms that have already specified
the sret register. We always materialize a virtual and then assign it
to the physical reg at the point of the return.
Differential Revision: https://reviews.llvm.org/D108762
Carl Ritson [Fri, 27 Aug 2021 10:08:10 +0000 (19:08 +0900)]
[DAGCombine] Allow FMA combine with both FMA and FMAD
Without this change only the preferred fusion opcode is tested
when attempting to combine FMA operations.
If both FMA and FMAD are available then FMA ops formed prior to
legalization will not be merged post legalization as FMAD becomes
the preferred fusion opcode.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D108619
Ricky Taylor [Thu, 26 Aug 2021 21:13:28 +0000 (22:13 +0100)]
[M68k] Update pointer data layout
Fixes PR51626.
The M68k requires that all instruction, word and long word reads are
aligned to word boundaries. From the 68020 onwards, there is a
performance benefit from aligning long words to long word boundaries.
The M68k uses the same data layout for pointers and integers.
In line with this, this commit updates the pointer data layout to
match the layout already set for 32-bit integers: 32:16:32.
Differential Revision: https://reviews.llvm.org/D108792
Roman Lebedev [Fri, 27 Aug 2021 10:23:27 +0000 (13:23 +0300)]
[X86] AMD Zen 3: MULX w/ mem operand has the same throughput as with reg op
Exegesis is faulty and sometimes when measuring throughput^-1
produces snippets that have loop-carried dependencies,
which must be what caused me to incorrectly measure it originally.
After looking much more carefully, the inverse throughput should match
that of the MULX w/ reg op.
As per llvm-exegesis measurements.
Roman Lebedev [Fri, 27 Aug 2021 10:01:36 +0000 (13:01 +0300)]
[X86] AMD Zen 3: MULX produces low part of the result in 3cy, +1cy for high part
As per llvm-exegesis measurements.
Roman Lebedev [Fri, 27 Aug 2021 09:12:57 +0000 (12:12 +0300)]
[NFC][X86][MCA] AMD Zen 3: improve MULX test coverage
Latency for MULX isn't right
Yaron Keren [Fri, 27 Aug 2021 09:14:58 +0000 (12:14 +0300)]
[docs] Add DIA register instructions to Getting Started with Visual Studio page
Since Visual Studio 2017 the DIA libs are not registered by default, see:
https://docs.microsoft.com/en-us/visualstudio/extensibility/breaking-changes-2017?view=vs-2019#change-reduce-registry-impact
The LLDB build instructions already specify registering these DLLs, which are required
for both the LLVM PDB tests and the LLDB build.
Differential Revision: https://reviews.llvm.org/D108811
Balazs Benics [Fri, 27 Aug 2021 09:31:16 +0000 (11:31 +0200)]
[analyzer] Catch leaking stack addresses via stack variables
Not only global variables can hold references to dead stack variables.
Consider this example:
void write_stack_address_to(char **q) {
  char local;
  *q = &local;
}
void test_stack() {
  char *p;
  write_stack_address_to(&p);
}
The address of 'local' is assigned to 'p', which becomes a dangling
pointer after 'write_stack_address_to()' returns.
The StackAddrEscapeChecker was looking for bindings in the store which
referred to variables of the popped stack frame, but it only considered
global variables in this regard. This patch relaxes this, catching
stack variable bindings as well.
---
This patch also works for temporary objects like:
struct Bar {
  const int &ref;
  explicit Bar(int y) : ref(y) {
    // Okay.
  } // End of the constructor call, `ref` is dangling now. Warning!
};
void test() {
  Bar{33}; // Temporary object, so the corresponding memregion is
           // *not* a VarRegion.
}
---
The return value optimization aka. copy-elision might kick in but that
is modeled by passing an imaginary CXXThisRegion which refers to the
parent stack frame which is supposed to be the 'return slot'.
Objects residing in the 'return slot' outlive the scope of the inner
call, thus we should expect no warning about them - except if we
explicitly disable copy-elision.
Reviewed By: NoQ, martong
Differential Revision: https://reviews.llvm.org/D107078
Sylvestre Ledru [Fri, 27 Aug 2021 08:46:50 +0000 (10:46 +0200)]
polly: remove the old reference to svn in the doc
Sylvestre Ledru [Fri, 27 Aug 2021 07:06:52 +0000 (09:06 +0200)]
[clang] Move the soname declaration in a variable at the top of the file
Currently, it is a bit buried in the file even though it is
pretty important for distros.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D108533
Chuanqi Xu [Fri, 27 Aug 2021 06:00:03 +0000 (14:00 +0800)]
[NFC] [ASTReader] Remove unused variables
LLVM GN Syncbot [Fri, 27 Aug 2021 04:42:51 +0000 (04:42 +0000)]
[gn build] Port b749ef9e2241
Lang Hames [Thu, 26 Aug 2021 21:52:40 +0000 (07:52 +1000)]
[ORC][ORC-RT] Reapply "Introduce ELF/*nix Platform and runtime..." with fixes.
This reapplies e256445bfff, which was reverted in 45ac5f54418 due to bot errors
(e.g. https://lab.llvm.org/buildbot/#/builders/112/builds/8599). The issue that
caused the bot failure was fixed in 2e6a4fce356.
Lang Hames [Thu, 26 Aug 2021 21:47:58 +0000 (07:47 +1000)]
[ORC][JITLink][ELF] Treat STB_GNU_UNIQUE as Weak in the JIT.
This should fix the bot error in
https://lab.llvm.org/buildbot/#/builders/112/builds/8599
which forced reversion of the ELFNixPlatform in 45ac5f54418.
This should allow us to re-enable the ELFNixPlatform in a follow-up patch.
Matt Arsenault [Sun, 15 Aug 2021 01:16:42 +0000 (21:16 -0400)]
AMDGPU: Fix hardcoded registers in test
Matt Arsenault [Sat, 14 Aug 2021 23:01:27 +0000 (19:01 -0400)]
AMDGPU/GlobalISel: Add baseline test for new ABI attribute hints
Matt Arsenault [Fri, 13 Aug 2021 17:28:57 +0000 (13:28 -0400)]
AMDGPU: Remove implicit argument attributes when introducing new calls
In a future patch, a new set of amdgpu-no-* attributes will be
introduced to indicate when a function does not need an implicitly
passed input. This pass introduces new instances of these intrinsic
calls, and should remove the attributes if they were present before.
Matt Arsenault [Fri, 27 Aug 2021 02:04:13 +0000 (22:04 -0400)]
AMDGPU: Fix broken test
Chen Zheng [Thu, 1 Jul 2021 09:41:03 +0000 (09:41 +0000)]
[PowerPC][ELF] make sure local variable space does not overlap with parameter save area
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D105271
Matt Arsenault [Wed, 11 Aug 2021 22:33:40 +0000 (18:33 -0400)]
AMDGPU: Invert AMDGPUAttributor
Switch to using BitIntegerState for each of the inputs, and invert
their meanings.
This now diverges more from the old AMDGPUAnnotateKernelFeatures, but
this isn't used yet anyway.
Matt Arsenault [Sat, 14 Aug 2021 17:24:54 +0000 (13:24 -0400)]
AMDGPU: Fix broken check lines
Matt Arsenault [Sat, 14 Aug 2021 19:58:17 +0000 (15:58 -0400)]
GlobalISel: Add CallBase to CallLoweringInfo
The DAG version has this, and it is necessary for call lowering to take
advantage of any attributes at the call site.
Matt Arsenault [Fri, 13 Aug 2021 18:20:00 +0000 (14:20 -0400)]
AMDGPU: Restrict attributor transforms
We only really want this to add the custom attributes. Theoretically
the regular transforms were already run at this point. Touching
undefined behavior breaks a lot of tests when this is enabled by
default, many of which are expecting to test handling of undef
operations.
George Rokos [Fri, 27 Aug 2021 01:00:05 +0000 (18:00 -0700)]
[libomptarget][NFC] Replaced obsolete name "getOrAllocTgtPtr" with new "getTargetPointer" in debug messages.
Matt Arsenault [Wed, 11 Aug 2021 23:01:30 +0000 (19:01 -0400)]
AMDGPU: Remove hacky attribute deduction from AMDGPUAttributor
amdgpu-calls and amdgpu-stack-objects don't really belong as
attributes, and are currently a hacky way of passing an analysis into
the DAG. These don't really belong in the IR, and don't really fit in
with the other attributes. Remove these to facilitate inverting the
pass.
I don't exactly understand the indirect call test changes. These tests
are using calls which are trivially replaceable with a direct call, so
I'm not sure what the point is.
Matt Arsenault [Sat, 14 Aug 2021 00:43:32 +0000 (20:43 -0400)]
AMDGPU: Stop inferring use of llvm.amdgcn.kernarg.segment.ptr
We no longer use this intrinsic outside of the backend and no longer
support using it outside of kernels.
Heejin Ahn [Thu, 26 Aug 2021 19:25:03 +0000 (12:25 -0700)]
[WebAssembly] Fix PHI when relaying longjmps
When doing Emscripten EH, if SjLj is also enabled and used and if the
thrown exception has a possibility of being a longjmp instead of an
exception, we shouldn't swallow it; we should rethrow, or relay it. It
was done in D106525 and the code is here:
https://github.com/llvm/llvm-project/blob/8441a8eea8007b9eaaaabf76055949180a702d6d/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp#L858-L898
Here is the pseudocode of that part: (copied from comments)
```
if (%__THREW__.val == 0 || %__THREW__.val == 1)
  goto %tail
else
  goto %longjmp.rethrow

longjmp.rethrow: ;; This is longjmp. Rethrow it
  %__threwValue.val = __threwValue
  emscripten_longjmp(%__THREW__.val, %__threwValue.val);

tail: ;; Nothing happened or an exception is thrown
  ... Continue exception handling ...
```
If the current BB (where the `invoke` is created) has successors that
have the current BB as a PHI incoming node, that incoming node now has to
change to `tail` in the pseudocode, because `tail` is the latest BB that is
connected with the next BB, but this was missing.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D108785
David Blaikie [Wed, 25 Aug 2021 19:03:53 +0000 (12:03 -0700)]
Remove set-but-unused variable
Vitaly Buka [Thu, 26 Aug 2021 17:25:09 +0000 (10:25 -0700)]
[sanitizer] No THREADLOCAL in qsort and bsearch
qsort can reuse qsort_r if available.
bsearch always passes key as the first comparator argument, so we
can use it to wrap the original comparator.
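The bsearch part relies on the C guarantee that the comparator is called as compar(key, element). A minimal sketch of the wrapping trick follows; `WrappedKey`, `wrappedCompar`, `bsearchWrapped` and `compareInts` are hypothetical names for illustration, not the sanitizer interceptor code.

```cpp
#include <cstddef>
#include <cstdlib>

using Comparator = int (*)(const void *, const void *);

// Bundles the caller's key with the caller's comparator so that both travel
// through the single "key" pointer that bsearch forwards. No thread-local
// storage is needed to remember the original comparator.
struct WrappedKey {
  const void *key;
  Comparator compar;
};

// bsearch always passes the key first, so `wrapped` is our WrappedKey and
// `element` points into the searched array.
static int wrappedCompar(const void *wrapped, const void *element) {
  const WrappedKey *w = static_cast<const WrappedKey *>(wrapped);
  // An interceptor would perform its extra checks here.
  return w->compar(w->key, element);
}

void *bsearchWrapped(const void *key, const void *base, std::size_t count,
                     std::size_t size, Comparator compar) {
  WrappedKey w{key, compar};
  return std::bsearch(&w, base, count, size, wrappedCompar);
}

// Example comparator for arrays of int.
int compareInts(const void *a, const void *b) {
  int x = *static_cast<const int *>(a);
  int y = *static_cast<const int *>(b);
  return (x > y) - (x < y);
}
```

The same trick does not apply to plain qsort, whose comparator only ever sees array elements, which is why the message falls back to qsort_r there.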
Differential Revision: https://reviews.llvm.org/D108751
Matt Arsenault [Sat, 14 Aug 2021 15:00:47 +0000 (11:00 -0400)]
AMDGPU: Remove unnecessary -NEXT checks
This avoids spuriously breaking the test in a future change
Matt Arsenault [Sat, 14 Aug 2021 15:58:07 +0000 (11:58 -0400)]
AMDGPU: Fix amdgpu_gfx calling convention usage in test
This was calling a regular C function from amdgpu_gfx, which isn't
defined to have all of the necessary implicit arguments.
Jez Ng [Thu, 8 Jul 2021 16:31:37 +0000 (12:31 -0400)]
[lld-macho][nfc] Clean up InputSection constructors
Artem Belevich [Thu, 26 Aug 2021 23:00:18 +0000 (16:00 -0700)]
[CUDA] update constraints on NVPTX builtins to include PTX73 and 74.
Matt Arsenault [Thu, 26 Aug 2021 21:41:33 +0000 (17:41 -0400)]
AMDGPU: Fix crashing on kernel declarations when lowering LDS
This was trying to insert the used marker into a declaration.
Jez Ng [Thu, 26 Aug 2021 17:51:38 +0000 (13:51 -0400)]
[lld-macho] Have -ObjC load archive members before symbol resolution
This is what ld64 does. Deviating in behavior here can result
in some subtle duplicate symbol errors, as detailed in the objc.s test.
Differential Revision: https://reviews.llvm.org/D108781
Jez Ng [Thu, 26 Aug 2021 15:49:47 +0000 (11:49 -0400)]
[lld-macho] Refactor archive loading
The previous logic was duplicated between symbol-initiated
archive loads versus flag-initiated loads (i.e. `-force_load` and
`-ObjC`). This resulted in code duplication as well as redundant work --
we would create Archive instances twice whenever we had one of those
flags; once in `getArchiveMembers` and again when we constructed the
ArchiveFile.
This was motivated by an upcoming diff where we load archive members
containing ObjC-related symbols before loading those containing
ObjC-related sections, as well as before performing symbol resolution.
Without this refactor, it would be difficult to do that while avoiding
loading the same archive member twice.
Differential Revision: https://reviews.llvm.org/D108780
Jez Ng [Thu, 26 Aug 2021 02:46:48 +0000 (22:46 -0400)]
[lld-macho] Fix unwind info personality size
This was missed by {D107035}. This fix addresses the following warning:
loop variable 'personality' has type 'const uint32_t &' (aka 'const unsigned int &') but is initialized with type 'const unsigned long long' resulting in a copy [-Wrange-loop-analysis]
In addition to fixing the size, I also removed the const reference,
since there's no performance benefit to avoiding copies of integer-sized
values.
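A small sketch of the pattern behind the warning, under assumed types: the container and the name `sumPersonalities` are hypothetical and only mirror the shape of the loop, not the lld-macho code.

```cpp
#include <cstdint>
#include <vector>

// Iterating 64-bit elements with `const std::uint32_t &p` would bind the
// reference to a converted temporary, so each element is copied (and
// truncated) anyway. Taking the element by value with the matching type
// avoids both the surprise and the warning.
std::uint64_t sumPersonalities(const std::vector<std::uint64_t> &values) {
  std::uint64_t sum = 0;
  for (std::uint64_t p : values) // by value: integers are cheap to copy
    sum += p;
  return sum;
}
```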
Butygin [Sat, 14 Aug 2021 08:57:02 +0000 (11:57 +0300)]
[mlir][spirv] Initial support for 64 bit index type and builtins
Differential Revision: https://reviews.llvm.org/D108516
Benson Chu [Sun, 15 Aug 2021 18:12:21 +0000 (13:12 -0500)]
[AST] Pick last tentative definition as the acting definition
Clang currently picks the second tentative definition when
VarDecl::getActingDefinition is called.
This can lead to attributes being dropped if they are attached to
tentative definitions that appear after the second one. This is
because VarDecl::getActingDefinition loops through VarDecl::redecls
assuming that the last tentative definition is the last element in the
iterator. However, it is the second element that would be the last
tentative definition.
This changeset modifies getActingDefinition to iterate through the
declaration chain in reverse, so that it can immediately return when
it encounters a tentative definition.
Originally the unit test for this changeset did not have a -triple
flag for the clang invocation, leading to this test being broken on
MacOS, since Mach-O does not support the section attribute.
Differential Revision: https://reviews.llvm.org/D99732
Arthur Eubanks [Thu, 26 Aug 2021 21:32:06 +0000 (14:32 -0700)]
[clang][NewPM] Mention that legacy PM flags are deprecated
Differential Revision: https://reviews.llvm.org/D108789
Yonghong Song [Thu, 26 Aug 2021 18:25:04 +0000 (11:25 -0700)]
[DebugInfo] convert btf_tag attrs to DI annotations for func parameters
Generate btf_tag annotations for DILocalVariable. The annotations
are represented as a DINodeArray in DebugInfo.
Differential Revision: https://reviews.llvm.org/D106620
Fangrui Song [Thu, 26 Aug 2021 21:25:31 +0000 (14:25 -0700)]
[CMake] Change -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=off to -DLLVM_ENABLE_NEW_PASS_MANAGER=off
LLVM_ENABLE_NEW_PASS_MANAGER is set to ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER, so
-DLLVM_ENABLE_NEW_PASS_MANAGER=off has no effect.
Change the cache variable to LLVM_ENABLE_NEW_PASS_MANAGER instead.
A user opting out of the new PM needs to switch from
-DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=off to
-DLLVM_ENABLE_NEW_PASS_MANAGER=off.
Also give a warning that -DLLVM_ENABLE_NEW_PASS_MANAGER=off is deprecated.
Reviewed By: aeubanks, phosek
Differential Revision: https://reviews.llvm.org/D108775
Yonghong Song [Mon, 19 Jul 2021 16:11:10 +0000 (09:11 -0700)]
[DebugInfo] generate btf_tag annotations for func parameters
Generate btf_tag annotations for function parameters.
A field "annotations" is introduced to DILocalVariable, and
annotations are represented as a DINodeArray, similar to
DIComposite elements. The following example illustrates how
annotations are encoded in IR:
distinct !DILocalVariable(name: "info",, arg: 1, ..., annotations: !10)
!10 = !{!11, !12}
!11 = !{!"btf_tag", !"a"}
!12 = !{!"btf_tag", !"b"}
Differential Revision: https://reviews.llvm.org/D106620
Artem Dergachev [Thu, 26 Aug 2021 04:33:38 +0000 (21:33 -0700)]
[analyzer] Fix scan-build report deduplication.
The previous behavior was to deduplicate reports based on md5 of the
html file. This algorithm might have worked originally but right now
HTML reports contain information rich enough to make them virtually
always distinct which breaks deduplication entirely.
The new strategy is to (finally) take advantage of IssueHash - the
stable report identifier provided by clang that is the same if and only if
the reports are duplicates of each other.
Additionally, scan-build no longer performs deduplication on its own.
Instead, the report file name is now based on the issue hash,
and clang instances will silently refuse to produce a new html file
when a duplicate already exists. This eliminates the problem entirely.
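The effect of hash-based file names can be sketched with an in-memory stand-in for the output directory. `ReportDir`, `emitReport` and the "report-<hash>.html" naming are assumptions made for illustration; the real file naming and the file-system check in clang may differ.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical model of a report directory keyed by issue hash.
class ReportDir {
  std::set<std::string> files; // names of reports already written

public:
  // Returns true if a new report file was recorded, false if a report with
  // the same issue hash already exists and the new one is silently dropped.
  bool emitReport(const std::string &issueHash) {
    std::string name = "report-" + issueHash + ".html"; // assumed scheme
    return files.insert(name).second;
  }

  std::size_t size() const { return files.size(); }
};
```

Because the name depends only on the issue hash, two reports for the same issue collide by construction and no separate deduplication pass is required.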
The '-analyzer-config stable-report-filename' option is deprecated
because report filenames are no longer unstable. A new option is
introduced, '-analyzer-config verbose-report-filename', to produce
verbose file names that look similar to the old "stable" file names.
The old option acts as an alias to the new option.
Differential Revision: https://reviews.llvm.org/D105167
Kirill Stoimenov [Thu, 19 Aug 2021 17:58:29 +0000 (17:58 +0000)]
[asan] Implemented flag to emit intrinsics to optimize ASan callbacks.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D108377
Kirill Stoimenov [Thu, 26 Aug 2021 00:12:53 +0000 (00:12 +0000)]
[asan] Fixed a runtime crash.
Looks like the NoRegister has some effect on the final code that is generated. My guess is that some optimization kicks in at the end?
When I use -S to dump the assembly I get the correct version with 'shrq $3, %r8':
movq %r9, %r8
shrq $3, %r8
movsbl 2147450880(%r8), %r8d
But when I disassemble the final binary I get RAX instead of R8:
mov %r9,%r8
shr $0x3,%rax
movsbl 0x7fff8000(%r8),%r8d
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D108745
Rob Suderman [Thu, 26 Aug 2021 18:20:58 +0000 (11:20 -0700)]
[mlir][tosa] Tosa reverse to linalg supporting dynamic shapes
Needed to switch to extract to support tosa.reverse using dynamic shapes.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D108744
Alexey Bataev [Tue, 3 Aug 2021 20:20:32 +0000 (13:20 -0700)]
[SLP]Improve graph reordering.
Reworked reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reorderable nodes (loads, stores,
extractelements, extractvalues) and then fully rebuilt the graph in
the best order. This was not efficient, since it required extra
memory and time for building/rebuilding the tree and doubled the use of
the scheduling budget, which could lead to missed vectorization due to
exhausted scheduling resources.
The patch provides a 2-way approach to the graph reordering problem. First, all
reordering is done in-place; it does not require tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.
The first step (top-to-bottom) rotates the whole graph, similarly to the previous
implementation. The compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then it repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle the smaller subgraph when building operands for the graph
nodes with a larger vectorization factor, so we can rotate just the subgraph,
not the whole graph.
The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reordered in the best fashion, they are reordered and their users too.
This allows removing double shuffles to the same ordering of the operands in
many cases and just reordering the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows removing an extra shuffle, because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.
Also, the patch improves the cost model for gathering of loads, which improves
the x264 benchmark in some cases.
Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, and improves most of the other benchmarks.
The compile and link times are almost the same, though in some cases they
should be better (we're not doing an extra instruction scheduling
anymore), and we may vectorize more code for large basic blocks again
because of the saved scheduling budget.
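The top-to-bottom step can be pictured with a small sketch. This is a simplified model under stated assumptions, not the SLP vectorizer code: nodes are plain dictionaries with hypothetical `vf`, `order` and `scalars` keys, an empty order stands for "no reordering needed", and the composition of masks for nodes whose own order differs from the winner is left out.

```python
from collections import Counter


def most_used_order(orders):
    """Return the order that occurs most often; () means the identity order."""
    return Counter(tuple(o) for o in orders).most_common(1)[0][0]


def apply_order(scalars, order):
    """Permute scalars in place so that new[i] == old[order[i]]."""
    scalars[:] = [scalars[i] for i in order]


def top_down_reorder(nodes):
    """Rotate nodes group by group, from the largest vectorization factor down.

    Returns the order chosen for each vectorization factor.
    """
    chosen = {}
    for vf in sorted({n["vf"] for n in nodes}, reverse=True):
        group = [n for n in nodes if n["vf"] == vf]
        best = most_used_order(n["order"] for n in group)
        chosen[vf] = best
        if not best:
            continue  # the most used order is empty: nothing to rotate
        for n in group:
            apply_order(n["scalars"], best)  # in place, nothing is rebuilt
    return chosen
```

The point of the sketch is only that the permutation is applied to existing node data, one vectorization factor at a time, instead of deleting and rebuilding the tree.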
Differential Revision: https://reviews.llvm.org/D105020
Nikita Popov [Thu, 26 Aug 2021 19:12:11 +0000 (21:12 +0200)]
[MergeICmps] Add test for call before first load (NFC)
If a clobbering call happens before all loads, that shouldn't
block the transform.
Arthur Eubanks [Thu, 26 Aug 2021 19:05:56 +0000 (12:05 -0700)]
[test] Update precommit tests for D108734
Vitaly Buka [Thu, 26 Aug 2021 19:02:45 +0000 (12:02 -0700)]
[sanitizer] Add basic qsort test
Jon Chesterfield [Thu, 26 Aug 2021 17:56:01 +0000 (18:56 +0100)]
[libomptarget][amdgpu][nfc] Rename variables, delete dead code
Andrea Di Biagio [Thu, 26 Aug 2021 18:53:17 +0000 (19:53 +0100)]
Revert "[MCA][NFC] Remove redundant calls to std::move."
This reverts commit 9cc0023fb863194be526f0bf19bd21e36236c5f6
due to buildbot failures.
Siva Chandra Reddy [Thu, 26 Aug 2021 05:21:54 +0000 (05:21 +0000)]
[libc][NFC] Move the mutex implementation into a utility class.
This allows other parts of the libc to use the mutex types without
actually pulling in public function implementations.
Along the way, a few cleanups have been done, like using a uniform type to
refer to the Linux futex word.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D108749
Andrea Di Biagio [Thu, 26 Aug 2021 18:43:18 +0000 (19:43 +0100)]
[MCA][NFC] Remove redundant calls to std::move.
This fixes some "redundant move in return statement" [-Wredundant-move]
warnings from gcc 9.3.0.
This also fixes a minor Coverity issue reported against class MCAOperand about
the lack of proper initialization for field Index.
No functional change intended.
Jessica Paquette [Thu, 26 Aug 2021 18:04:17 +0000 (11:04 -0700)]
[AArch64][GlobalISel] Optimize G_BUILD_VECTOR of undef + 1 elt -> SUBREG_TO_REG
This pattern
```
%elt = ... something ...
%undef = G_IMPLICIT_DEF
%vec = G_BUILD_VECTOR %elt, %undef, %undef, ... %undef
```
can be selected to a SUBREG_TO_REG, assuming `%elt` and `%vec` have the same
register bank. We don't care about any of the bits in `%vec` aside from those
in `%elt`, which just happens to be the 0th element.
This is preferable to emitting `mov` instructions for every index.
This gives minor code size improvements on the test suite at -Os.
Differential Revision: https://reviews.llvm.org/D108773
RamNalamothu [Thu, 26 Aug 2021 18:24:15 +0000 (23:54 +0530)]
[docs, AMDGPU] Fix typo in dwarf register number mapping
Reviewed By: xgupta
Differential Revision: https://reviews.llvm.org/D108557
Yaron Keren [Sat, 21 Aug 2021 17:59:45 +0000 (20:59 +0300)]
[docs] Update Getting Started with Visual Studio guide
Update this document for 2021.
Reviewed By: aaron.ballman, kuhnel, amccarth
Differential Revision: https://reviews.llvm.org/D108513
Rob Suderman [Thu, 26 Aug 2021 18:06:12 +0000 (11:06 -0700)]
[mlir][tosa] Elementwise operation dynamic shape support
Added dynamic shape support for elementwise operations. This assumes equal
sizes (broadcasting 1-length dynamic is problematic).
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D108730
Louis Dionne [Thu, 26 Aug 2021 18:18:03 +0000 (14:18 -0400)]
[libc++][NFC] Sort headers alphabetically
Shafik Yaghmour [Thu, 26 Aug 2021 18:11:00 +0000 (11:11 -0700)]
[LLDB] Add type to the output for FieldDecl when logging in ClangASTSource::layoutRecordType
I was debugging a problem and noticed that it would have been helpful to have
the type of each FieldDecl when looking at the output from
ClangASTSource::layoutRecordType.
Differential Revision: https://reviews.llvm.org/D108257
Andrea Di Biagio [Thu, 26 Aug 2021 17:57:59 +0000 (18:57 +0100)]
[MCA][RegisterFile] Consistently update the PRF in the presence of multiple writes to the same register.
My last change to the RegisterFile (PR51495) has introduced a bug in the logic
that allocates physical registers in the PRF.
In some cases, this bug could have triggered a nasty unsigned wrap in the number
of allocated registers, thus resulting in mca being stuck forever in a loop of
PRF availability checks.
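A generic illustration of how an unsigned wrap can make an availability check fail forever. This is not the RegisterFile code: the function names, the 32-bit width and the clamping guard are assumptions, and the guard is only one possible way to avoid the wrap, not necessarily what this patch does.

```python
MASK32 = 0xFFFFFFFF  # emulate a 32-bit unsigned counter in Python


def release_unchecked(allocated, n):
    """Subtract with wrap-around, as C++ unsigned arithmetic would."""
    return (allocated - n) & MASK32


def release_checked(allocated, n):
    """Clamp at zero so that releasing too much cannot wrap the counter."""
    return allocated - min(allocated, n)


def is_available(allocated, total, needed):
    """True if `needed` more registers fit into a file of `total` registers."""
    return allocated <= total and total - allocated >= needed
```

Releasing two registers when only one is counted as allocated wraps the unchecked counter to 0xFFFFFFFF, after which `is_available` returns False for any request, so a loop that waits for availability never exits.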
LLVM GN Syncbot [Thu, 26 Aug 2021 18:08:07 +0000 (18:08 +0000)]
[gn build] Port ee44dd8062a2
Louis Dionne [Wed, 11 Aug 2021 21:36:35 +0000 (17:36 -0400)]
[libc++] Implement the underlying mechanism for range adaptors
This patch implements the underlying mechanism for range adaptors. It
does so based on http://wg21.link/p2387, even though that paper hasn't
been adopted yet. In the future, if p2387 is adopted, it would suffice
to rename `__bind_back` to `std::bind_back` and `__range_adaptor_closure`
to `std::range_adaptor_closure` to implement that paper by the spec.
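For readers unfamiliar with the mechanism, here is a rough Python analogue of the two building blocks. It is not the libc++ implementation: `bind_back` here binds trailing arguments of a callable, `RangeAdaptorClosure` makes a one-argument callable usable with `|`, and the `take` and `transform` helpers are hypothetical stand-ins for range adaptors.

```python
def bind_back(fn, *bound):
    """Return a callable that appends `bound` after the call-time arguments."""
    return lambda *args: fn(*args, *bound)


class RangeAdaptorClosure:
    """Wrap a one-argument callable so it can be piped and composed with |."""

    def __init__(self, fn):
        self._fn = fn

    def __call__(self, rng):
        return self._fn(rng)

    def __ror__(self, rng):
        # rng | closure  ->  closure(rng)
        return self._fn(rng)

    def __or__(self, other):
        # closure | closure  ->  a new closure applying self, then other
        if isinstance(other, RangeAdaptorClosure):
            return RangeAdaptorClosure(lambda rng: other(self(rng)))
        return NotImplemented


def take(rng, n):
    return list(rng)[:n]


def transform(rng, f):
    return [f(x) for x in rng]


def take_adaptor(n):
    # The range is supplied later; n is bound as the trailing argument.
    return RangeAdaptorClosure(bind_back(take, n))


def transform_adaptor(f):
    return RangeAdaptorClosure(bind_back(transform, f))
```

With these pieces, `[1, 2, 3, 4] | transform_adaptor(f) | take_adaptor(2)` and `(transform_adaptor(f) | take_adaptor(2))([1, 2, 3, 4])` evaluate the same pipeline, which mirrors the two ways a C++ range adaptor closure can be used.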
Differential Revision: https://reviews.llvm.org/D107098