Yaxun (Sam) Liu [Sat, 21 Mar 2020 21:06:39 +0000 (17:06 -0400)]
[AMDGPU] Add builtin functions image_bvh_intersect_ray
Reviewed by: Stanislav Mekhanoshin, Matt Arsenault
Differential Revision: https://reviews.llvm.org/D104946
Nico Weber [Wed, 30 Jun 2021 16:58:59 +0000 (12:58 -0400)]
[gn build] add dep needed after b56e5f8a10c1e
Nico Weber [Wed, 30 Jun 2021 16:52:01 +0000 (12:52 -0400)]
[clangd] Unbreak mac build after 0c96a92d8666b8
That commit removed the include of Features.inc from ClangdLSPServer.h,
but ClangdMain.cpp relied on this include to pull in Features.inc for
the #if at the bottom of Transport.h.
Since the include is needed in Transport.h, just add it there
directly.
Fangrui Song [Wed, 30 Jun 2021 16:43:28 +0000 (09:43 -0700)]
[ELF] -pie: produce dynamic relocations for absolute relocations referencing undef weak
See the comment for my understanding of -no-pie and -shared expectation.
-pie has freedom on choices. We choose dynamic relocations to be consistent
with the handling of GOT-generating relocations.
Note: GNU ld has arch-varying behaviors and its x86 -pie has a very
complex rule:
if there is at least one GOT-generating or PLT-generating relocation and
-z dynamic-undefined-weak (enabled by default) is in effect, generate a
dynamic relocation.
We don't emulate its rule.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D105164
David Goldman [Fri, 11 Jun 2021 14:16:19 +0000 (10:16 -0400)]
[clangd] Fix highlighting for implicit ObjC property refs
Objective-C lets you use the `self.prop` syntax as sugar for both
`[self prop]` and `[self setProp:]`, but clangd previously did not
provide a semantic token for `prop`.
Now, we provide a semantic token, treating it like a normal property
except it's backed by a `ObjCMethodDecl` instead of a
`ObjCPropertyDecl`.
Differential Revision: https://reviews.llvm.org/D104117
Caroline Tice [Tue, 29 Jun 2021 21:50:10 +0000 (14:50 -0700)]
[lldb] Replace SVE_PT* macros in NativeRegisterContextLinux_arm64.{cpp,h} with their equivalent definitions in LinuxPTraceDefines_arm64sve.h
Commit 090306fc80dbf (August 2020) changed most of the arm64 SVE_PT*
macros, but apparently did not make the changes in the
NativeRegisterContextLinux_arm64.* files (or those files were pulled
over from someplace else after that commit). This change replaces the
macros in NativeRegisterContextLinux_arm64.cpp with the replacement
definitions in LinuxPTraceDefines_arm64sve.h. It also includes
LinuxPTraceDefines_arm64sve.h in NativeRegisterContextLinux_arm64.h.
Differential Revision: https://reviews.llvm.org/D104826
thomasraoux [Wed, 30 Jun 2021 07:00:11 +0000 (00:00 -0700)]
[mlir] Fix wrong type in WmmaConstantOpToNVVMLowering
InsertElement takes a scalar integer attribute not an array of integer.
Differential Revision: https://reviews.llvm.org/D105174
thomasraoux [Wed, 30 Jun 2021 07:02:47 +0000 (00:02 -0700)]
[mlir][VectorToGPU] Support converting vector.broadcast to MMA op
Differential Revision: https://reviews.llvm.org/D105175
LLVM GN Syncbot [Wed, 30 Jun 2021 15:57:43 +0000 (15:57 +0000)]
[gn build] Port 0c96a92d8666
Jeremy Morse [Tue, 29 Jun 2021 17:50:24 +0000 (18:50 +0100)]
[LiveDebugValues][InstrRef][1/2] Recover more clobbered variable locations
In various circumstances, when we clobber a register there may be
alternative locations that the value is live in. The classic example would
be a value loaded from the stack, and then clobbered: the value is still
available on the stack. InstrRefBasedLDV was coping with this at block
starts where it's forced to pick a location, however it wasn't searching
for alternative locations when values were clobbered.
This patch notifies the "Transfer Tracker" object when clobbers occur, and
it's able to find alternatives and issue DBG_VALUEs for that location. See:
the added test.
Differential Revision: https://reviews.llvm.org/D88405
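The idea of recovering a clobbered location can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the InstrRefBasedLDV implementation: the class `LocationTracker`, its methods, and the location names are all hypothetical.

```python
class LocationTracker:
    """Hypothetical tracker: remembers every location a value currently lives in."""

    def __init__(self):
        self.locs_of_value = {}  # value id -> set of locations holding it
        self.value_in_loc = {}   # location -> value id

    def clobber(self, loc):
        """Forget whatever value was in loc; return that value (or None)."""
        old = self.value_in_loc.pop(loc, None)
        if old is not None:
            self.locs_of_value[old].discard(loc)
        return old

    def define(self, loc, value):
        """Record that loc now holds value, overwriting its previous content."""
        self.clobber(loc)
        self.value_in_loc[loc] = value
        self.locs_of_value.setdefault(value, set()).add(loc)

    def copy(self, src, dst):
        """Model a load or move: dst now holds the same value as src."""
        self.define(dst, self.value_in_loc[src])

    def recover(self, value):
        """Return some location still holding value, or None if it is gone."""
        locs = self.locs_of_value.get(value)
        return min(locs) if locs else None


tracker = LocationTracker()
tracker.define("stack0", "v1")   # value lives on the stack
tracker.copy("stack0", "$rax")   # loaded into a register
tracker.clobber("$rax")          # register is overwritten
alternative = tracker.recover("v1")
```

After the clobber, `recover` still finds the stack slot, which is the location a new DBG_VALUE could refer to.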
Joseph Huber [Wed, 30 Jun 2021 14:55:14 +0000 (10:55 -0400)]
[OpenMP] Change analysis remarks to not emit on cold functions
The remarks will trigger on some functions that are marked cold, such as the
`__muldc3` intrinsic functions. Change the remarks to avoid these functions.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D105196
Sam McCall [Thu, 11 Mar 2021 00:20:36 +0000 (01:20 +0100)]
[clangd] Show padding following a field on field hover.
This displays as: `Size: 4 bytes (+4 padding)`
Also stop showing (byte) offset/size for bitfields. They're not
meaningful and using them to calculate padding is dangerous!
Differential Revision: https://reviews.llvm.org/D98377
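As a rough illustration of the hover text, the padding after a field is the gap between the end of that field and the start of whatever follows it. The helper names below are hypothetical and the sketch is in Python, not clangd's C++.

```python
def trailing_padding(field_offset, field_size, next_offset):
    """Bytes between the end of a field and the start of the next one."""
    return next_offset - (field_offset + field_size)


def hover_size_line(field_offset, field_size, next_offset):
    """Format the size line, mentioning padding only when there is some."""
    padding = trailing_padding(field_offset, field_size, next_offset)
    line = f"Size: {field_size} bytes"
    if padding > 0:
        line += f" (+{padding} padding)"
    return line


# A 4-byte field at offset 0 followed by a field at offset 8.
example = hover_size_line(0, 4, 8)
```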
Sam McCall [Thu, 15 Apr 2021 12:29:57 +0000 (14:29 +0200)]
[clangd] Log feature configuration (linux+asan+grpc) of the clangd build
Included in logs, --version, remote index queries, and LSP serverInfo.
Differential Revision: https://reviews.llvm.org/D100553
Sam McCall [Wed, 16 Jun 2021 13:24:23 +0000 (15:24 +0200)]
[clangd] Correct SelectionTree behavior around anonymous field access.
struct A { struct { int b; }; };
A().^b;
This should be considered a reference to b, but currently it's
considered a reference to the anonymous struct field.
Fixes https://github.com/clangd/clangd/issues/798
Differential Revision: https://reviews.llvm.org/D104376
Philip Reames [Wed, 30 Jun 2021 15:31:13 +0000 (08:31 -0700)]
[SCEV] Fold (0 udiv %x) to 0
We have analogous rules in instsimplify, etc., but were missing the same in SCEV. The fold is nearly trivial, but came up in the context of a larger change.
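A toy version of the fold, written in Python over a hypothetical tuple-based expression form (this is not the ScalarEvolution API), might look like this:

```python
def fold_udiv(lhs, rhs):
    """Build an unsigned-division expression, folding (0 udiv x) to 0.

    Operands are non-negative ints (constants) or strings (symbolic values).
    """
    if lhs == 0:
        # Zero divided by anything is zero; a zero divisor would be
        # undefined anyway, so folding to 0 loses nothing.
        return 0
    if isinstance(lhs, int) and isinstance(rhs, int) and rhs != 0:
        return lhs // rhs
    return ("udiv", lhs, rhs)


folded = fold_udiv(0, "%x")
unfolded = fold_udiv("%a", "%x")
```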
Philip Reames [Wed, 30 Jun 2021 15:26:18 +0000 (08:26 -0700)]
[test] precommit a test for missing (0 /u %x) SCEV fold
Louis Dionne [Wed, 30 Jun 2021 15:11:52 +0000 (11:11 -0400)]
[libc++] Remove broken links and outdated information in the docs
The various design docs have been moved to RST, and the linked blog post
does not apply anymore since libc++ is the default library used by Clang
on Apple platforms.
Craig Topper [Tue, 29 Jun 2021 23:21:26 +0000 (16:21 -0700)]
[ARM] Fix incorrect assignment of Changed variable in MVEGatherScatterLowering::optimiseOffsets.
I believe this Changed flag should be initialized to false,
otherwise the if (!Changed) is always dead. This doesn't
manifest in a functional issue because the PHINode checks will
fail if nothing changed. They are identical to the earlier
checks that must have already failed to get into this else block.
While there remove an else after return to reduce indentation.
Differential Revision: https://reviews.llvm.org/D105159
Louis Dionne [Fri, 18 Jun 2021 17:33:14 +0000 (13:33 -0400)]
[lit] Add the ability to parse regexes in Lit boolean expressions
This patch augments Lit with the ability to parse regular expressions
in boolean expressions. This includes REQUIRES:, XFAIL:, UNSUPPORTED:,
and all other special Lit markup that evaluates to a boolean expression.
Regular expressions can be specified by enclosing them in {{...}},
similarly to how FileCheck handles such regular expressions. The regular
expression can either be on its own, or it can be part of an identifier.
For example, a match expression like {{.+}}-apple-darwin{{.+}} would match
the following variables:
x86_64-apple-darwin20.0
arm64-apple-darwin20.0
arm64-apple-darwin22.0
etc...
In the long term, this could be used to remove the need to handle the
target triple specially when parsing boolean expressions.
Differential Revision: https://reviews.llvm.org/D104572
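To make the matching rule concrete, here is a minimal Python sketch of how a pattern with {{...}} chunks could be turned into a single regular expression. It is an assumption-laden illustration, not Lit's actual parser: `compile_lit_pattern` and `matches` are hypothetical names, and the sketch does not handle regexes that themselves contain "}}".

```python
import re


def compile_lit_pattern(pattern):
    """Turn text with {{regex}} chunks into one Python regex.

    Text outside {{...}} is matched literally; text inside is used as-is.
    """
    # re.split with a capturing group alternates literal, regex, literal, ...
    parts = re.split(r"\{\{(.+?)\}\}", pattern)
    pieces = []
    for index, part in enumerate(parts):
        pieces.append(part if index % 2 else re.escape(part))
    return re.compile("".join(pieces))


def matches(pattern, feature):
    """True if the whole feature string matches the pattern."""
    return compile_lit_pattern(pattern).fullmatch(feature) is not None


darwin = "{{.+}}-apple-darwin{{.+}}"
results = [matches(darwin, t) for t in (
    "x86_64-apple-darwin20.0",
    "arm64-apple-darwin22.0",
    "x86_64-unknown-linux-gnu",
)]
```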
Florian Mayer [Wed, 30 Jun 2021 13:48:57 +0000 (14:48 +0100)]
[hwasan] Add missing newline in report.
Reviewed By: glider
Differential Revision: https://reviews.llvm.org/D105190
Simon Pilgrim [Wed, 30 Jun 2021 14:01:51 +0000 (15:01 +0100)]
[CostModel][X86] Adjust fp<->int vXi32 AVX1+ costs based on llvm-mca reports
Based off the worst-case numbers generated by D103695, the AVX1/2/512 sitofp/uitofp/fptosi/fptoui costs were higher than necessary (based off instruction counts instead of actual throughput).
The SSE costs still need further fixes, but I hit an issue with the order in which SSE costs are checked - we need to check CUSTOM costs (with non-legal types) first, and then fallback to LEGALIZED types. I'm looking at this now, and this should let us start thinning out a lot of the duplicates in the costs tables.
Then we can finally start work on vXi64 / vXi16 / vXi8 / vXi1 integers, which should let us look at sub-128-bit vectorization (D103925).
Nico Weber [Wed, 30 Jun 2021 14:21:33 +0000 (10:21 -0400)]
Revert "[Coroutine] Add statistics for the number of elided coroutine"
This reverts commit 1d9539cf49a585e7c3cd8faa1b8e7291e0ce285c.
Test fails in LLVM_ENABLE_ASSERTIONS=OFF builds (such as regular
release builds).
William S. Moses [Wed, 30 Jun 2021 14:09:42 +0000 (10:09 -0400)]
[MLIR] Update description of SCF.execute_region op
See https://reviews.llvm.org/D104865
William S. Moses [Fri, 25 Jun 2021 23:40:35 +0000 (19:40 -0400)]
[MLIR][SCF] Inline ExecuteRegion if parent can contain multiple blocks
The ExecuteRegionOp is used to allow multiple blocks within SCF constructs. If the container allows multiple blocks, inline the region.
Differential Revision: https://reviews.llvm.org/D104960
Melanie Blower [Mon, 28 Jun 2021 16:45:56 +0000 (12:45 -0400)]
[clang][patch] Add builtin __arithmetic_fence and option fprotect-parens
This patch adds a new clang builtin, __arithmetic_fence. The purpose of the
builtin is to provide the user fine control, at the expression level, over
floating point optimization when -ffast-math (-ffp-model=fast) is enabled.
The builtin prevents the optimizer from rearranging floating point expression
evaluation. The new option fprotect-parens has the same effect on
parenthesized expressions, forcing the optimizer to respect the parentheses.
Reviewed By: aaron.ballman, kpn
Differential Revision: https://reviews.llvm.org/D100118
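Why the grouping of a floating point expression matters can be shown with plain IEEE doubles. The Python sketch below only demonstrates that reassociation changes the result; it does not use the builtin or the option, and the variable names are illustrative.

```python
a, b, c = 0.1, 0.2, 0.3

# The grouping the programmer wrote.
as_written = (a + b) + c

# A grouping an optimizer might pick when reassociation is allowed.
reassociated = a + (b + c)

# The two sums are not bit-identical in double precision.
differs = as_written != reassociated
```

Keeping the written grouping is the guarantee that `__arithmetic_fence` and `fprotect-parens` are described as providing.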
Joseph Huber [Tue, 29 Jun 2021 21:05:31 +0000 (17:05 -0400)]
[OpenMP] Add additional remarks for OpenMPOpt
This patch adds additional remarks, suggesting the use of `noescape` for failed
globalization and indicating when internalization failed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D105150
William S. Moses [Thu, 10 Jun 2021 19:12:04 +0000 (15:12 -0400)]
[MLIR] Eliminate unnecessary affine stores
Deduce circumstances where an affine store could not possibly be read by an operation (such as an affine load), and if so, eliminate the store.
Differential Revision: https://reviews.llvm.org/D105041
Florian Hahn [Wed, 30 Jun 2021 12:28:09 +0000 (13:28 +0100)]
[Matrix] Add tests for hoisting address computations.
Peter Smith [Fri, 25 Jun 2021 10:39:47 +0000 (11:39 +0100)]
[LLD][ELF][ARM] Tidy up test to hook up missing filecheck patterns [NFC]
A couple of filecheck patterns had not been hooked up with
the patterns suffering from some drift. As this test is old
and llvm-objdump has improved a lot, take this opportunity to
hide the instruction encoding. I've also taken out a lot of
the explanatory comments that llvm-objdump improvements make
redundant, as these comments often don't get updated when addresses
change.
Differential Revision: https://reviews.llvm.org/D104907
alex-t [Wed, 30 Jun 2021 12:48:02 +0000 (15:48 +0300)]
[AMDGPU] PHI node cost should not be counted for the size and latency.
Details: https://reviews.llvm.org/D96805 changed the GCNTTIImpl::getCFInstrCost to return 1 for the PHI nodes
for the TTI::TCK_CodeSize and TTI::TCK_SizeAndLatency. This is incorrect because the value moves that are the
result of the PHI lowering are inserted into the basic block predecessors - not into the block itself.
As a result of this change LoopRotate and LoopUnroll were broken because of the incorrect Loop header and loop
body size/cost estimation.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D105104
Peter Smith [Fri, 25 Jun 2021 09:52:15 +0000 (10:52 +0100)]
[LLD][ELF][ARM] Fix case of patched unrelocated BLX
There are a couple of problems with the code to patch
unrelocated BLX instructions:
1. The calculation of the PC needs to take into account
the alignment of the instruction. The Thumb BLX
uses alignDown(PC, 4) for the source address.
2. The calculation of the PC bias is hard-coded to 4
which works for Thumb, but when there is a BLX the
branch will be in Arm state so it needs an 8 byte
PC bias.
No assembler generates an unrelocated BLX instruction
so these problems do not affect real world programs.
However we should still fix them.
Differential Revision: https://reviews.llvm.org/D104905
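The two points above can be sketched as a small Python function. It is a simplified model of the address arithmetic only, not LLD's patching code; `blx_target` and `align_down` are hypothetical helpers.

```python
def align_down(value, alignment):
    """Round value down to a multiple of alignment (a power of two)."""
    return value & ~(alignment - 1)


def blx_target(insn_addr, imm, is_thumb):
    """Destination of a BLX with immediate offset imm at insn_addr."""
    if is_thumb:
        # Thumb state: PC bias is 4, and BLX uses the PC aligned down to 4.
        return align_down(insn_addr + 4, 4) + imm
    # Arm state: PC bias is 8; Arm instructions are already 4-byte aligned.
    return insn_addr + 8 + imm


arm_to_thumb = blx_target(0x200B4, -4, is_thumb=False)
thumb_to_arm = blx_target(0x200B8, -8, is_thumb=True)
unaligned_thumb = blx_target(0x200BA, -8, is_thumb=True)
```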
Bradley Smith [Mon, 28 Jun 2021 12:39:07 +0000 (13:39 +0100)]
[TargetLowering][AArch64][SVE] Take into account accessed type when clamping address
When clamping the index for a memory access to a stacked vector we must
take into account the entire type being accessed, not just assume that
we are accessing only a single element.
Differential Revision: https://reviews.llvm.org/D105016
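A minimal Python sketch of the clamping rule, assuming a fixed-length vector and an access measured in elements (the real code also has to deal with scalable vectors, which this ignores; `clamp_vector_index` is a hypothetical name):

```python
def clamp_vector_index(index, num_elements, accessed_elements=1):
    """Clamp index so that the whole accessed range stays inside the vector.

    Assumes 1 <= accessed_elements <= num_elements.
    """
    max_index = num_elements - accessed_elements
    return max(0, min(index, max_index))


# Accessing 4 elements of an 8-element vector: the last valid start is 4.
subvector_start = clamp_vector_index(7, 8, accessed_elements=4)
# Accessing a single element: the last valid index is 7.
element_index = clamp_vector_index(9, 8)
```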
Tobias Gysi [Wed, 30 Jun 2021 12:26:33 +0000 (12:26 +0000)]
[mlir][linalg][python] Update the OpDSL doc (NFC).
Update the OpDSL documentation to reflect recent changes. In particular, the updated documentation discusses:
- Attributes used to parameterize index expressions
- Shape-only tensor support
- Scalar parameters
Differential Revision: https://reviews.llvm.org/D105123
Saiyedul Islam [Wed, 9 Jun 2021 13:19:45 +0000 (18:49 +0530)]
[clang-offload-bundler] Add unbundling of archives containing bundled object files into device specific archives
This patch adds unbundling support of an archive file. It takes an
archive file along with a set of offload targets as input.
Output is a device specific archive for each given offload target.
Input archive contains bundled code objects bundled using
clang-offload-bundler. Each generated device specific archive contains
a set of device code object files which are named as
<Parent Bundle Name>-<CodeObject-GPUArch>.
Entries in input archive can be of any binary type which is
supported by clang-offload-bundler, like *.bc. Output archives will
contain files in same type.
Example Usage:
clang-offload-bundler --unbundle --inputs=lib-generic.a -type=a
-targets=openmp-amdgcn-amdhsa--gfx906,openmp-amdgcn-amdhsa--gfx908
-outputs=devicelib-gfx906.a,deviceLib-gfx908.a
Reviewed By: jdoerfert, yaxunl
Differential Revision: https://reviews.llvm.org/D93525
Simon Pilgrim [Wed, 30 Jun 2021 10:36:06 +0000 (11:36 +0100)]
Fix MSVC "32-bit shift implicitly converted to 64 bits" warning.
Alexey Bataev [Tue, 29 Jun 2021 19:26:37 +0000 (12:26 -0700)]
[OPENMP]Fix PR50929: Ignored initializer clause in user-defined reduction.
No need to try to create the default constructor for private copy, it
will be called automatically in the initializer of the declare
reduction. Fixes balance between constructors/destructors calls.
Differential Revision: https://reviews.llvm.org/D105143
Zhouyi Zhou [Wed, 30 Jun 2021 11:46:35 +0000 (19:46 +0800)]
[clang] NFC: add line break at the end of if expressions
Hi,
In function TransformTemplateArgument,
would it be better to add line break at the end of "if" expressions?
I use clang-format to do the job for me.
Thanks a lot
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D104604
madhur13490 [Fri, 18 Jun 2021 08:14:54 +0000 (13:44 +0530)]
[AMDGPU] Simplify getReservedNumSGPRs
This is a followup patch on D103636 where
it seemed checking on amdgpu-calls and
amdgpu-stack-objects is unnecessary. Removing these
checks didn't regress any tests functionally.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D104513
Florian Hahn [Wed, 30 Jun 2021 08:45:50 +0000 (09:45 +0100)]
[ConstantRanges] Use APInt for constant case for urem/srem.
Currently UREM & SREM on constant ranges produces overly pessimistic
results for single element constant ranges.
Delegate to APInt's implementation if both operands are single element
constant ranges. We already do something similar for other binary
operators, like binary AND.
Fixes PR49731.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D105115
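The single-element special case can be illustrated with a simplified Python model. It assumes inclusive, non-wrapping unsigned ranges written as `(lo, hi)` tuples, which is not how ConstantRange is represented; `urem_range` is a hypothetical name.

```python
def urem_range(lhs, rhs):
    """Range of x urem y for x in lhs and y in rhs (inclusive bounds)."""
    (l_lo, l_hi), (r_lo, r_hi) = lhs, rhs
    if l_lo == l_hi and r_lo == r_hi and r_lo != 0:
        # Both operands are constants: compute the exact result.
        exact = l_lo % r_lo
        return (exact, exact)
    if r_hi == 0:
        return None  # only division by zero is possible
    # Conservative bound: the remainder is at most x and at most y - 1.
    return (0, min(l_hi, r_hi - 1))


exact_result = urem_range((10, 10), (3, 3))
conservative_result = urem_range((0, 10), (3, 3))
```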
Florian Mayer [Mon, 28 Jun 2021 13:19:43 +0000 (14:19 +0100)]
[hwasan] Make sure we retag with a new tag on free.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105021
David Sherwood [Tue, 29 Jun 2021 15:12:17 +0000 (16:12 +0100)]
[NFC] Rename shadowed variable in InnerLoopVectorizer::createInductionVariable
Avoid creating an IRBuilder stack variable with the same name as the
class member.
Florian Mayer [Tue, 29 Jun 2021 19:11:41 +0000 (20:11 +0100)]
[MTE] Remove redundant helper function.
Looking at PostDominatorTree::dominates, we can see that it has the same
logic (with the addition of handling Phi nodes - which are not used as inputs in
this pass) as the helper function.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105141
Jay Foad [Fri, 25 Jun 2021 16:08:36 +0000 (17:08 +0100)]
[TableGen] Allow identical MnemonicAliases with no predicate
My use case for this is illustrated in the test case: I want to define
the same instruction twice with different (disjoint) predicates, because
the instruction has different operands on different subtargets. It's
convenient to do this with a multiclass that also defines an alias for
the instruction.
Previously tablegen would complain if this alias was defined twice with
no predicate. One way to fix this would be to add a predicate on each
definition of the alias, matching the predicate on the instruction. But
this (a) is slightly awkward to do in the real world use case I had, and
(b) leads to an inefficient matcher that will do something like this:
  if (Mnemonic == "foo_alias") {
    if (Features.test(Feature_Subtarget1Bit))
      Mnemonic = "foo";
    else if (Features.test(Feature_Subtarget2Bit))
      Mnemonic = "foo";
    return;
  }
It would be more efficient to skip the feature tests and return "foo"
unconditionally.
Overall it seems better to allow multiple definitions of the identical
alias with no predicate.
Differential Revision: https://reviews.llvm.org/D105033
Valeriy Savchenko [Wed, 30 Jun 2021 09:49:31 +0000 (12:49 +0300)]
[analyzer][satest][NFC] Relax dependencies requirements
Igor Kudrin [Wed, 30 Jun 2021 09:34:52 +0000 (16:34 +0700)]
[ARMInstPrinter] Print the target address of a branch instruction
This follows other patches that changed printing immediate values of
branch instructions to target addresses, see D76580 (x86), D76591 (PPC),
D77853 (AArch64).
As observing immediate values might sometimes be useful, they are
printed as comments for branch instructions.
// llvm-objdump -d output (before)
000200b4 <_start>:
200b4: ff ff ff fa blx #-4 <thumb>
000200b8 <thumb>:
200b8: ff f7 fc ef blx #-8 <_start>
// llvm-objdump -d output (after)
000200b4 <_start>:
200b4: ff ff ff fa blx 0x200b8 <thumb> @ imm = #-4
000200b8 <thumb>:
200b8: ff f7 fc ef blx 0x200b4 <_start> @ imm = #-8
// GNU objdump -d.
000200b4 <_start>:
200b4: faffffff blx 200b8 <thumb>
000200b8 <thumb>:
200b8: f7ff effc blx 200b4 <_start>
Differential Revision: https://reviews.llvm.org/D104701
Tobias Gysi [Wed, 30 Jun 2021 08:59:22 +0000 (08:59 +0000)]
[mlir][linalg][python] Explicit shape and dimension order in OpDSL.
Extend the OpDSL syntax with an optional `domain` function to specify an explicit dimension order. The extension is needed to provide more control over the dimension order instead of deducing it implicitly depending on the formulation of the tensor comprehension. Additionally, the patch also ensures the symbols are ordered according to the operand definitions of the operation.
Differential Revision: https://reviews.llvm.org/D105117
Igor Kudrin [Wed, 30 Jun 2021 08:54:53 +0000 (15:54 +0700)]
[ARM][NFC] Remove an unused method
`ARMInstPrinter::printMveAddrModeQOperand()` was added in D62680, but
was never used. It looks like `printT2AddrModeImm8Operand<false>()` is
used instead.
Differential Revision: https://reviews.llvm.org/D105124
Stephan Herhut [Tue, 29 Jun 2021 14:47:58 +0000 (16:47 +0200)]
[mlir][llvm] Add a test for memref.copy lowering to llvm
This was missing and also there was a bug in the lowering itself, which went unnoticed due to it.
Differential Revision: https://reviews.llvm.org/D105122
Sjoerd Meijer [Tue, 29 Jun 2021 10:33:14 +0000 (11:33 +0100)]
Recommit "[AArch64] Custom lower <4 x i8> loads"
This recommits D104782 including a fix for adding a wrong operand to the new
load node.
Differential Revision: https://reviews.llvm.org/D105110
Dmitry Polukhin [Tue, 29 Jun 2021 12:57:14 +0000 (05:57 -0700)]
[clang] Fix UB when string.front() is used for the empty string
Compilation database might have empty string as a command line argument.
But ExpandResponseFilesDatabase::expand doesn't expect this and assumes
that string.front() can be used for any argument. It is undefined behaviour if
string is empty. With debug build mode it causes crash in clangd.
Test Plan: check-clang
Differential Revision: https://reviews.llvm.org/D105120
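The shape of the fix, checking for emptiness before looking at the first character, can be sketched in Python. The function name is hypothetical, and the assumption that the first character is compared against '@' (the response file marker) is ours; in Python, indexing an empty string raises IndexError rather than being undefined behaviour.

```python
def is_response_file_arg(arg):
    """True if arg names a response file, i.e. starts with '@'.

    An empty argument is valid input and is simply not a response file.
    """
    return bool(arg) and arg[0] == "@"


command_line = ["clang", "", "@args.rsp", "-c"]
flags = [is_response_file_arg(arg) for arg in command_line]
```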
Kai Luo [Wed, 30 Jun 2021 05:36:26 +0000 (05:36 +0000)]
[PowerPC][AIX] Re-generate test aix-framepointer-save-restore.ll. NFC.
Uday Bondhugula [Wed, 30 Jun 2021 04:20:33 +0000 (09:50 +0530)]
[MLIR] Fix generateCopyForMemRefRegion
Fix generateCopyForMemRefRegion for a missing check: in some cases, when
the thing to generate copies for itself is empty, no fast buffer/copy
loops would have been allocated/generated. Add an extra assertion there
while at this.
Differential Revision: https://reviews.llvm.org/D105170
Kai Luo [Wed, 30 Jun 2021 04:39:31 +0000 (04:39 +0000)]
[PowerPC][AIX] Pre-commit tracetable test for D100167. NFC.
Mehdi Amini [Wed, 30 Jun 2021 04:08:36 +0000 (04:08 +0000)]
Fix test pass registration to use the new API / not use the deprecated one (NFC)
Tony Tye [Fri, 7 May 2021 20:55:23 +0000 (20:55 +0000)]
[AMDGPU] Update gfx90a memory model support
Update AMDGPU gfx90a memory model to make coarse grain memory allocations
consistent when fine grained system scope atomic acquire and release is
performed.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D105137
Chuanqi Xu [Wed, 30 Jun 2021 03:24:44 +0000 (11:24 +0800)]
[FuncSpec] Add an option to specialize literal constants
The option is off by default for now, since we are not sure whether it
would increase compile time significantly. Although we tested it
on SPEC2017, more testing may be needed before turning it on by default.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D104365
Chuanqi Xu [Wed, 30 Jun 2021 03:20:51 +0000 (11:20 +0800)]
[Coroutine] Add statistics for the number of elided coroutine
We currently lack a benchmark to measure the performance change of each
commit.
Since coro elide is the main optimization in the coroutine module, counting
the number of elided coroutines in private code bases may serve as an
estimate.
E.g., if the number of elided coroutines goes down for a certain commit,
we could notice it before the commit is checked in.
Reviewed By: lxfind
Differential Revision: https://reviews.llvm.org/D105095
Fangrui Song [Wed, 30 Jun 2021 01:47:55 +0000 (18:47 -0700)]
[llvm-objcopy][MachO] Support LC_LINKER_OPTIMIZATION_HINT load command
The load command is currently specific to arm64 and holds information
for instruction rewriting, e.g. converting a GOT load to an ADR to
compute a local address.
(On ELF the information is usually conveyed by relocations, e.g.
R_X86_64_REX_GOTPCRELX, R_PPC64_TOC16_HA)
Reviewed By: alexander-shaposhnikov
Differential Revision: https://reviews.llvm.org/D104968
Greg Clayton [Wed, 30 Jun 2021 01:03:25 +0000 (18:03 -0700)]
Fix buildbot compile error for https://reviews.llvm.org/D105160.
Greg Clayton [Tue, 29 Jun 2021 20:12:36 +0000 (13:12 -0700)]
Create synthetic symbol names on demand to improve memory consumption and startup times.
This fix was created after profiling the target creation of a large C/C++/ObjC application that contained almost 4,000,000 redacted symbol names. The symbol table parsing code was creating names for each of these synthetic symbols and adding them to the name indexes. The code was also adding the object file basename to the end of the symbol name which doesn't allow symbols from different shared libraries to share the names in the constant string pool.
Prior to this fix this was creating 180MB of "___lldb_unnamed_symbol" symbol names and was taking a long time to generate each name, add them to the string pool and then add each of these names to the name index.
This patch fixes the issue by:
- not adding a name to synthetic symbols at creation time, and allowing the name to be generated dynamically when accessed
- not adding synthetic symbol names to the name indexes, but catching this special case at name lookup time. Users won't typically set breakpoints on or look up these synthetic names, but support was added to do the lookup in case it does happen
- removing the object file basename from the generated names to allow the names to be shared in the constant string pool
Prior to this fix the startup times for a large application were:
12.5 seconds (cold file caches)
8.5 seconds (warm file caches)
After this fix:
9.7 seconds (cold file caches)
5.7 seconds (warm file caches)
The names of the symbols are auto-generated by appending the symbol's UserID to the end of the "___lldb_unnamed_symbol" string; this is done only when the name is requested from a synthetic symbol that has no name.
Differential Revision: https://reviews.llvm.org/D105160
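The on-demand naming scheme can be sketched in Python as follows. The `Symbol` class and its fields are hypothetical stand-ins, not lldb's classes; only the "___lldb_unnamed_symbol" prefix and the use of the UserID come from the description above.

```python
class Symbol:
    """Hypothetical symbol record with a lazily generated synthetic name."""

    def __init__(self, user_id, name=None, synthetic=False):
        self.user_id = user_id
        self.synthetic = synthetic
        self._name = name  # stays None for unnamed synthetic symbols

    @property
    def name(self):
        if self._name is None and self.synthetic:
            # Generated on each access and never stored, so unnamed
            # symbols cost no string storage until someone asks.
            return f"___lldb_unnamed_symbol{self.user_id}"
        return self._name


unnamed = Symbol(42, synthetic=True)
named = Symbol(1, name="main")
generated = unnamed.name
```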
Nick Desaulniers [Wed, 30 Jun 2021 00:09:39 +0000 (17:09 -0700)]
[Test] delete LPM RUNs in inline_nossp.ll
This test was modified in D104958. Invoking opt with -{passname} (vs
-passes={passname}) without -enable-new-pm={0|1} is now ambiguous and
dependent on how LLVM was configured. Drop the LPM runs rather than
fix since there are unlikely to be any users still on LPM that rely on the
behavior in this test.
See also:
https://lists.llvm.org/pipermail/llvm-dev/2021-June/151553.html
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D105154
David Blaikie [Tue, 29 Jun 2021 23:39:05 +0000 (16:39 -0700)]
Conditionalize function only used in an assert to address -Wunused-function
Akira Hatanaka [Tue, 29 Jun 2021 23:27:24 +0000 (16:27 -0700)]
[CodeGen] Add ParmVarDecls to FunctionDecls that are created to generate ObjC property getter/setter functions
This is needed to prevent clang from crashing when we make the changes
proposed in https://reviews.llvm.org/D98799.
Differential Revision: https://reviews.llvm.org/D104883
Lei Huang [Tue, 29 Jun 2021 23:03:23 +0000 (18:03 -0500)]
Revert "Attempt to disable MLIR JIT tests on PowerPC to unbreak the bot"
This reverts commit 652f4b5140e231b679564a86019307291f7bf7cc.
Re-enable MLIR JIT tests.
The MLIR Bot was updated to export LD_LIBRARY_PATH=/usr/lib64, which
seems to fix this issue.
Steffen Larsen [Mon, 28 Jun 2021 22:43:10 +0000 (15:43 -0700)]
[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `wmma.load`, `wmma.store`, `wmma.mma`, and `mma` instructions added in PTX 6.5 and 7.0.
PTX ISA description of
- `wmma.load`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-ld
- `wmma.store`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-st
- `wmma.mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-mma
- `mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-mma
Overview of `wmma.mma` and `mma` matrix shape/type combinations added with specific PTX versions: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-shape
Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>
Co-Authored-by: Stuart Adams <stuart.adams@codeplay.com>
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D104847
Dhruva Chakrabarti [Tue, 29 Jun 2021 22:07:57 +0000 (15:07 -0700)]
[libomptarget] [amdgpu] Change default number of teams per computation unit
This patch is related to https://reviews.llvm.org/D98832. Based on discussions there, I decided to separate out the teams default as this patch. This change is to increase the number of teams per computation unit so as to provide more wavefronts for hiding latency. This change improves performance for some programs, including 20-50% for some Stream benchmarks.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D99003
Adrian Prantl [Tue, 29 Jun 2021 22:24:36 +0000 (15:24 -0700)]
Improve path remapping in cross-debugging scenarios
This patch implements a slight improvement when debugging across
platforms and remapping source paths that are in a non-native
format. See the unit test for examples.
rdar://79205675
Differential Revision: https://reviews.llvm.org/D104407
Adrian Prantl [Tue, 29 Jun 2021 22:19:31 +0000 (15:19 -0700)]
Modernize Module::RemapFile to return an Optional (NFC)
This addresses feedback raised in https://reviews.llvm.org/D104404.
Differential Revision: https://reviews.llvm.org/D104724
Adrian Prantl [Tue, 29 Jun 2021 22:14:31 +0000 (15:14 -0700)]
Express PathMappingList::FindFile() in terms of PathMappingList::RemapPath()
NFC.
This patch replaces the function body FindFile() with a call to
RemapPath(), since the two functions implement the same functionality.
Differential Revision: https://reviews.llvm.org/D104406
Adrian Prantl [Tue, 29 Jun 2021 21:59:12 +0000 (14:59 -0700)]
Change PathMappingList::FindFile to return an optional result (NFC)
This is an NFC modernization refactoring that replaces the combination
of a bool return + reference argument, with an Optional return value.
Differential Revision: https://reviews.llvm.org/D104405
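The same refactoring idea, returning an optional value instead of a bool plus an out-parameter, looks like this in Python. The function is a hypothetical prefix remapper for illustration, not the lldb PathMappingList API.

```python
from typing import List, Optional, Tuple


def remap_path(mappings: List[Tuple[str, str]], path: str) -> Optional[str]:
    """Return path with the first matching prefix replaced, or None.

    None plays the role of the old 'return false' case, so callers cannot
    accidentally read an output argument that was never written.
    """
    for old_prefix, new_prefix in mappings:
        if path.startswith(old_prefix):
            return new_prefix + path[len(old_prefix):]
    return None


mappings = [("/build", "/src")]
remapped = remap_path(mappings, "/build/a.c")
missing = remap_path(mappings, "/other/a.c")
```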
Aaron Puchert [Tue, 29 Jun 2021 21:51:52 +0000 (23:51 +0200)]
Thread safety analysis: Rename parameters of ThreadSafetyAnalyzer::intersectAndWarn (NFC)
In D104261 we made the parameters' meaning slightly more specific, this
changes their names accordingly. In all uses we're building a new lock
set by intersecting existing locksets. The first (modifiable) argument
is the new lock set being built, the second (non-modifiable) argument is
the exit set of a preceding block.
Reviewed By: aaron.ballman, delesley
Differential Revision: https://reviews.llvm.org/D104649
Aaron Puchert [Tue, 29 Jun 2021 21:46:43 +0000 (23:46 +0200)]
Thread safety analysis: Always warn when dropping locks on back edges
We allow branches to join where one holds a managed lock but the other
doesn't, but we can't do so for back edges: because there we can't drop
them from the lockset, as we have already analyzed the loop with the
larger lockset. So we can't allow dropping managed locks on back edges.
We move the managed() check from handleRemovalFromIntersection up to
intersectAndWarn, where we additionally check if we're on a back edge if
we're removing from the first lock set (the entry set of the next block)
but not if we're removing from the second lock set (the exit set of the
previous block). Now that the order of arguments matters, I had to swap
them in one invocation, which also causes some minor differences in the
tests.
Reviewed By: delesley
Differential Revision: https://reviews.llvm.org/D104261
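The rule described above can be modelled with a short Python sketch. It is a simplification under stated assumptions: a lock set is a dict from lock name to a "managed" flag, and `intersect_and_warn` is a hypothetical stand-in for the analyzer's method, not its real signature.

```python
def intersect_and_warn(entry_set, exit_set, back_edge):
    """Intersect two lock sets and collect the locks that deserve a warning.

    entry_set: locks at the entry of the next block (first argument).
    exit_set:  locks at the exit of the preceding block (second argument).
    Returns (joined_set, warnings).
    """
    joined = {}
    warnings = []
    for lock, managed in entry_set.items():
        if lock in exit_set:
            joined[lock] = managed
        elif back_edge or not managed:
            # Dropping from the first set: managed locks are tolerated
            # only when this is not a back edge.
            warnings.append(lock)
    for lock, managed in exit_set.items():
        if lock not in entry_set and not managed:
            # Dropping from the second set: no back-edge check here.
            warnings.append(lock)
    return joined, warnings


forward_join = intersect_and_warn({"mu": True}, {}, back_edge=False)
loop_back_edge = intersect_and_warn({"mu": True}, {}, back_edge=True)
```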
Arthur Eubanks [Tue, 29 Jun 2021 18:38:18 +0000 (11:38 -0700)]
[OpaquePtr][BitcodeWriter] Handle attributes with types
For example, byval.
Skip the type attribute auto-upgrade if we already have the type.
I've actually seen this error of the ValueEnumerator missing a type
attribute's type in a non-opaque pointer context.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D105138
Nikita Popov [Tue, 29 Jun 2021 21:43:58 +0000 (23:43 +0200)]
[Test] Regenerate test checks (NFC)
Make these follow the update_test_checks.py format.
Matt Arsenault [Thu, 20 May 2021 02:06:14 +0000 (22:06 -0400)]
CodeGen: Store LLT instead of uint64_t in MachineMemOperand
GlobalISel is relying on regular MachineMemOperands to track all of
the memory properties of accesses. Just the raw byte size is
insufficient to disambiguate all situations. For example, if we need to
split an unaligned extending load, we need to know the number of bits
in the original source value and can't infer it from the result
type. This is also a problem for extending vector loads.
This does decrease the maximum representable size from the full
uint64_t bytes to a maximum of 16 bits. No in-tree testcases hit this,
other than places using UINT64_MAX for unknown sizes. This may be an
issue for G_MEMCPY and co., although they can just use unknown size
for large static sizes. This also has potential for backend abuse by
relying on the type when it really shouldn't be relevant after
selection.
This does not include the necessary MIR printer/parser changes to
represent this.
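To illustrate why a raw byte size loses information, here is a minimal sketch. `ScalarMemType` is a hypothetical stand-in that records only a bit width; it is not the LLT class, and both helpers are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a scalar memory type: it records only a bit width.
struct ScalarMemType {
  uint16_t bits;
};

// The size a plain byte count would record: the bit width rounded up to bytes.
inline uint64_t storeSizeInBytes(ScalarMemType ty) {
  return (ty.bits + 7u) / 8u;
}

// Number of bits an extending load has to fill in when widening the value
// read from memory to a result of resultBits bits.
inline unsigned extensionBits(ScalarMemType memTy, unsigned resultBits) {
  assert(resultBits >= memTy.bits && "result must be at least as wide");
  return resultBits - memTy.bits;
}
```

A 1-bit and an 8-bit memory type both occupy one byte, yet an extending load to 32 bits has to fill in 31 bits in one case and 24 in the other. The byte count alone cannot tell them apart.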
Matt Arsenault [Tue, 8 Jun 2021 22:23:34 +0000 (18:23 -0400)]
Revert "GlobalISel: Use MMO helper for getting the size in bits"
This reverts commit dc98adfb448bdb845605185bb173e99614a17790.
This should still be done, but this is currently causing some commit
ordering issues.
Akira Hatanaka [Tue, 29 Jun 2021 21:22:26 +0000 (14:22 -0700)]
[CodeGen] Stop creating fake FunctionDecls when generating IR for
functions implicitly generated by the compiler
These fake functions would cause clang to crash if the changes proposed
in https://reviews.llvm.org/D98799 were made.
Duncan P. N. Exon Smith [Tue, 29 Jun 2021 19:49:57 +0000 (12:49 -0700)]
OpaquePtr: Support i32** with --force-opaque-pointers
4506f614cb6983a16d117cf77a968608e66d7a5c fixed parsing of textual IR to
reject `ptr*`, but broke the auto-conversion of `i32**` to `ptr` with
`--force-opaque-pointers`.
Get that working again by refactoring LLParser::parseType to only send
`ptr`-spelled pointers into the type suffix logic when it's the return
of a function type. This also rejects `ptr addrspace(3) addrspace(2)`,
which 1e6303e60ca5af4fbe7ca728572fd65666a98271 inadvertently started
accepting. The double-addrspace case just gets the default top-level
error message, since I had trouble thinking of something nicer; it's
probably fine as is (it doesn't look valid the way that `ptr*` does).
Differential Revision: https://reviews.llvm.org/D105146
Alexander Shaposhnikov [Tue, 29 Jun 2021 20:46:20 +0000 (13:46 -0700)]
[llvm-objcopy][MachO] Code cleanup
1. Remove unnecessary templates.
2. Fix potentially unaligned reads inside constructSection.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D105089
Stella Stamenova [Tue, 29 Jun 2021 20:54:48 +0000 (13:54 -0700)]
[lldb] Fix debug_loc.s which was broken after https://reviews.llvm.org/D103502
An empty location is now printed as <empty>
Dhruva Chakrabarti [Tue, 29 Jun 2021 00:52:01 +0000 (17:52 -0700)]
[libomptarget] [amdgpu] Fix default setting of max flat workgroup size
When max flat workgroup size is not specified, it is set to the default
workgroup size. This prevents kernel launch with a workgroup size larger
than the default. The fix is to ignore a size of 0 and treat it as
unspecified.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D105073
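The intent of the fix can be sketched as below. `clampWorkgroupSize` and its parameters are hypothetical names chosen for this example; the plugin's actual code also handles other limits that are not modeled here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a limit of 0 means the kernel did not specify a max
// flat workgroup size, so it must not constrain the launch.
inline uint32_t clampWorkgroupSize(uint32_t requested,
                                   uint32_t maxFlatWorkgroupSize) {
  if (maxFlatWorkgroupSize == 0)
    return requested; // unspecified: leave the requested size alone
  return requested < maxFlatWorkgroupSize ? requested : maxFlatWorkgroupSize;
}
```

The key design choice is to keep "unspecified" as a distinct state instead of replacing it with the default size, which would then act as an upper bound.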
Siva Chandra Reddy [Tue, 29 Jun 2021 20:27:28 +0000 (20:27 +0000)]
[libc] Allow target architecture independent configs
Previously, we required entrypoints.txt for every target architecture
supported by a target OS. With this change, we allow architecture
independent config for a target OS. That is, if an architecture specific
entrypoints.txt is missing, then a generic entrypoints.txt for that
target OS will be used.
Reviewed By: caitlyncano
Differential Revision: https://reviews.llvm.org/D105147
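The fallback rule can be sketched as follows. The real selection happens in the build system; this C++ function only models the rule, and the directory layout, the function name, and the `exists` callback are assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical layout: <configDir>/<os>/<arch>/entrypoints.txt is preferred,
// with <configDir>/<os>/entrypoints.txt as the architecture-independent
// fallback. exists is injected so the rule can be exercised without files.
std::string selectEntrypointsFile(
    const std::string &configDir, const std::string &os,
    const std::string &arch,
    const std::function<bool(const std::string &)> &exists) {
  const std::string archSpecific =
      configDir + "/" + os + "/" + arch + "/entrypoints.txt";
  if (exists(archSpecific))
    return archSpecific;
  return configDir + "/" + os + "/entrypoints.txt";
}
```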
Stella Stamenova [Tue, 29 Jun 2021 20:39:18 +0000 (13:39 -0700)]
[lldb] Fix globals-bss.cpp which was broken in https://reviews.llvm.org/D105055
-S replaced -s, so the test needs to be updated to use the new option
Jianzhou Zhao [Tue, 29 Jun 2021 06:17:00 +0000 (06:17 +0000)]
[dfsan] Expose dfsan_get_track_origins to get origin tracking status
This allows application code to check whether origin tracking is on
before printing out traces.
-dfsan-track-origins can be 0, 1, or 2.
The current code only distinguishes 1 and 2 at compile time, but not at runtime.
Made the runtime distinguish 1 and 2 too.
Reviewed By: browneee
Differential Revision: https://reviews.llvm.org/D105128
Stella Laurenzo [Mon, 28 Jun 2021 22:54:11 +0000 (22:54 +0000)]
[mlir] Generate .cpp.inc files for dialects.
* Previously, we were only generating .h.inc files. We foresee the need to also generate implementations and this is a step towards that.
* Discussed in https://llvm.discourse.group/t/generating-cpp-inc-files-for-dialects/3732/2
* Deviates from the discussion above by generating a default constructor in the .cpp.inc file (and adding a tablegen bit that disables this in case it is user-provided).
* Generating the destructor started as a way to flush out the missing includes (produces a link error), but it is a strict improvement on its own that is worth doing (i.e. by emitting key methods in the .cpp file, we root vtables in one translation unit, which is a non-controversial improvement).
Differential Revision: https://reviews.llvm.org/D105070
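The point about rooting vtables can be illustrated with a small sketch. `ToyDialect` and the file names in the comments are hypothetical. The sketch relies on the Itanium C++ ABI rule that a class's vtable is emitted in the translation unit that defines its key function, that is, its first non-pure virtual function that is not defined inline.

```cpp
#include <cassert>
#include <string>

// --- would live in a header such as ToyDialect.h.inc (hypothetical) ---
class ToyDialect {
public:
  ToyDialect();
  // Declared here but defined out of line: this is the key function.
  virtual ~ToyDialect();
  // Defined inline, so it cannot serve as the key function.
  virtual std::string getNamespace() const { return "toy"; }
};

// --- would live in an implementation file such as ToyDialect.cpp.inc ---
ToyDialect::ToyDialect() {}
ToyDialect::~ToyDialect() {}
```

If the implementation part is never compiled into the program, creating a `ToyDialect` fails at link time, which matches the missing-include detection described above.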
Stella Stamenova [Tue, 29 Jun 2021 19:09:56 +0000 (12:09 -0700)]
Revert D104488 and friends since it broke the windows bot
Reverts commits:
"Fix failing tests after https://reviews.llvm.org/D104488."
"Fix buildbot failure after https://reviews.llvm.org/D104488."
"Create synthetic symbol names on demand to improve memory consumption and startup times."
This series of commits broke the windows lldb bot and then failed to fix all of the failing tests.
Eugene Zhulenev [Tue, 29 Jun 2021 19:56:15 +0000 (12:56 -0700)]
[mlir:Async] Change async-parallel-for block size/count calculation
Depends On D105037
Avoid creating too many tasks when the number of workers is large.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D105126
Eugene Zhulenev [Tue, 29 Jun 2021 19:12:15 +0000 (12:12 -0700)]
[mlir:Async] Add an async reference counting pass based on the user defined policy
Depends On D104999
Automatic reference counting based on the liveness analysis can add a lot of reference counting overhead at runtime. If the IR is known to be constrained to a few particular "shapes", it's much more efficient to provide a custom reference counting policy that will specify where it is required to update the async value reference count.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D105037
Nicolas Vasilache [Tue, 29 Jun 2021 15:39:14 +0000 (15:39 +0000)]
[mlir][Linalg] Add a ComprehensiveModuleBufferizePass and support for CallOp analysis (9/n)
This revision adds the minimal plumbing to create a simple ComprehensiveModuleBufferizePass that can behave conservatively in the presence of CallOps.
A topological sort of caller/callee is performed and, if the call-graph is cycle-free, analysis can proceed.
Differential revision: https://reviews.llvm.org/D104859
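A minimal sketch of such an ordering step, using Kahn's algorithm over an adjacency list, is shown below. The function name, the integer function ids, and the choice to emit callees before callers are assumptions made for illustration; the pass itself works on MLIR operations.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// callees[f] lists the functions that f calls. On success, order holds every
// function with callees placed before their callers and the result is true.
// The result is false if the call graph has a cycle (including direct
// recursion), in which case order is incomplete.
bool orderCalleesFirst(const std::vector<std::vector<std::size_t>> &callees,
                       std::vector<std::size_t> &order) {
  const std::size_t n = callees.size();
  std::vector<std::size_t> pendingCallees(n, 0);
  std::vector<std::vector<std::size_t>> callers(n);
  for (std::size_t f = 0; f < n; ++f) {
    pendingCallees[f] = callees[f].size();
    for (std::size_t callee : callees[f])
      callers[callee].push_back(f);
  }

  // Functions that call nothing can be analyzed first.
  std::queue<std::size_t> ready;
  for (std::size_t f = 0; f < n; ++f)
    if (pendingCallees[f] == 0)
      ready.push(f);

  order.clear();
  while (!ready.empty()) {
    const std::size_t f = ready.front();
    ready.pop();
    order.push_back(f);
    // A caller becomes ready once all of its callees have been emitted.
    for (std::size_t caller : callers[f])
      if (--pendingCallees[caller] == 0)
        ready.push(caller);
  }
  return order.size() == n;
}
```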
Stefan Pintilie [Tue, 29 Jun 2021 19:01:48 +0000 (14:01 -0500)]
[Clang] Add option to handle behaviour of vector bool/vector pixel.
Added the option `-altivec-src-compat=[mixed,gcc,xl]`. The default at this time is `mixed`.
The default behavior for clang is for all vector compares to return a scalar
unless the vectors being compared are vector bool or vector pixel. In that case
the compare returns a vector. In the gcc case all vector compares return
vectors, and in the xl case all vector compares return scalars.
This patch does not change the default behavior of clang.
This option will be used in future patches to implement behaviour compatibility for the vector bool/pixel types.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D103615
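The three modes can be summarized as a small decision function. All enum and function names below are hypothetical and exist only to restate the rules from the description; they are not clang's internal representation.

```cpp
#include <cassert>

// Hypothetical names that mirror the option values mixed, gcc and xl.
enum class SrcCompat { Mixed, GCC, XL };
enum class VecElem { Bool, Pixel, Other };
enum class CompareResult { Scalar, Vector };

// Restates the description: gcc always yields a vector, xl always yields a
// scalar, and mixed yields a vector only for vector bool and vector pixel.
inline CompareResult vectorCompareResult(SrcCompat mode, VecElem elem) {
  switch (mode) {
  case SrcCompat::GCC:
    return CompareResult::Vector;
  case SrcCompat::XL:
    return CompareResult::Scalar;
  case SrcCompat::Mixed:
    break;
  }
  return (elem == VecElem::Bool || elem == VecElem::Pixel)
             ? CompareResult::Vector
             : CompareResult::Scalar;
}
```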
Leonard Chan [Wed, 23 Jun 2021 23:12:52 +0000 (16:12 -0700)]
[NFC][compiler-rt][hwasan] Re-use ring buffer size calculation
Users can call HwasanThreadList::GetRingBufferSize rather than RingBufferSize
to prevent having to do the calculation in RingBufferSize. This will be useful
for Fuchsia where we plan to initialize the stack ring buffer separately from
the rest of thread initialization.
Differential Revision: https://reviews.llvm.org/D104823
Fangrui Song [Tue, 29 Jun 2021 18:56:26 +0000 (11:56 -0700)]
[llvm-readobj] Make -s and -t match llvm-readelf
llvm-readobj is an internal testing tool for binary formats. Its output and
command line options do not need to be stable. It isn't supposed to be part of a
build process.
llvm-readelf was created as a user-facing utility and its interface intends to
be compatible with GNU readelf (unless there are good reasons not to).
The two tools have mostly compatible options. -s and -t are noticeable
exceptions due to history. I think the cost of keeping the inconsistency
outweighs the small benefit of history compatibility, and the
inconsistency hinders the transition from cl::opt to OptTable, so let's
change it.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D105055
Leonard Chan [Wed, 23 Jun 2021 23:18:44 +0000 (16:18 -0700)]
[NFC][compiler-rt][hwasan] Move GetCurrentThread to hwasan.cpp
We can reuse the same implementation for getting the current thread on fuchsia.
Differential Revision: https://reviews.llvm.org/D104824
Fangrui Song [Tue, 29 Jun 2021 18:50:31 +0000 (11:50 -0700)]
[test] Change -t to --syms and -s to -S for llvm-readobj RUN lines
-s and -t will be changed to improve consistency with llvm-readelf.
The inconsistency regularly causes confusion when using the two tools.
Nikita Popov [Tue, 29 Jun 2021 18:29:10 +0000 (20:29 +0200)]
[SanitizerCoverage] Fix global type check with opaque pointers
The code was previously relying on the fact that an incorrectly
typed global would result in the insertion of a BitCast constant
expression. With opaque pointers, this is no longer the case, so
we should check the type explicitly.
Fangrui Song [Tue, 29 Jun 2021 18:23:30 +0000 (11:23 -0700)]
[llvm-objcopy][MachO] Support ARM64_RELOC_ADDEND
An ARM64_RELOC_ADDEND relocation reuses the symbol field for the addend value.
We should pass through such relocations.
Reviewed By: alexander-shaposhnikov
Differential Revision: https://reviews.llvm.org/D104967
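What "pass through" means here can be sketched as follows. `SimpleReloc`, `remapSymbolIndexes`, and `decodeAddend` are hypothetical and greatly simplified. The sketch assumes an arm64 object, where relocation type 10 is ARM64_RELOC_ADDEND, and treats the 24-bit symbol field as a sign-extended addend.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Value of ARM64_RELOC_ADDEND in the Mach-O arm64 relocation type enum.
const uint8_t kArm64RelocAddend = 10;

// Hypothetical, simplified view of a Mach-O relocation entry.
struct SimpleReloc {
  uint32_t symbolNum; // 24-bit field: symbol index, or the addend for ADDEND
  uint8_t type;
  bool isExtern;
};

// Interprets the 24-bit field as a signed addend.
inline int32_t decodeAddend(uint32_t symbolNum) {
  const uint32_t v = symbolNum & 0xFFFFFFu;
  if (v & 0x800000u)
    return static_cast<int32_t>(v) - 0x1000000;
  return static_cast<int32_t>(v);
}

// Rewrites symbol indexes after the symbol table changed. ADDEND relocations
// are left untouched because their field is not a symbol index.
void remapSymbolIndexes(std::vector<SimpleReloc> &relocs,
                        const std::vector<uint32_t> &newIndex) {
  for (SimpleReloc &r : relocs) {
    if (r.type == kArm64RelocAddend)
      continue;
    if (r.isExtern)
      r.symbolNum = newIndex.at(r.symbolNum);
  }
}
```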
Jacob Hegna [Tue, 29 Jun 2021 18:14:24 +0000 (18:14 +0000)]
[NFC] clang-format on InlineCost.cpp and InlineAdvisor.h.
Nikita Popov [Fri, 25 Jun 2021 20:14:37 +0000 (22:14 +0200)]
[OpaquePtr] Support forward references in textual IR
Currently, LLParser will create a Function/GlobalVariable forward
reference based on the desired pointer type and then modify it when
it is declared. With opaque pointers, we generally do not know the
correct type to use until we see the declaration.
Solve this by creating the forward reference with a dummy type, and
then performing a RAUW with the correct Function/GlobalVariable when
it is declared. The approach is adopted from
https://github.com/TNorthover/llvm-project/commit/b5b55963f62038319fa7a8b1b232226ba1d8ef3c.
This results in a change to the use list order, which is why we see
test changes on some module passes that are not stable under use list
reordering.
Differential Revision: https://reviews.llvm.org/D104950
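The placeholder-then-replace scheme can be sketched with a toy model. `ToyValue` and `ToyParser` are hypothetical and do not reflect LLParser's data structures; the sketch only shows a use being created against a dummy-typed placeholder and later redirected to the real definition, in the spirit of RAUW. It does not model use list ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy value: a name, a type string, and a placeholder marker.
struct ToyValue {
  std::string name;
  std::string type;
  bool isPlaceholder;
};

class ToyParser {
public:
  // Records a use of name and returns the index of that use. If name is not
  // yet declared, the use points at a placeholder with a dummy type.
  std::size_t addUse(const std::string &name) {
    ToyValue *&slot = symbols_[name];
    if (!slot)
      slot = create(name, "<dummy>", true);
    uses_.push_back(slot);
    return uses_.size() - 1;
  }

  // Declares name with its real type and redirects every existing use of the
  // previous value (typically the placeholder) to the new one.
  void define(const std::string &name, const std::string &type) {
    ToyValue *real = create(name, type, false);
    ToyValue *&slot = symbols_[name];
    if (slot)
      for (ToyValue *&use : uses_)
        if (use == slot)
          use = real;
    slot = real;
  }

  const ToyValue &useTarget(std::size_t i) const { return *uses_.at(i); }

private:
  ToyValue *create(const std::string &name, const std::string &type,
                   bool placeholder) {
    storage_.push_back(
        std::unique_ptr<ToyValue>(new ToyValue{name, type, placeholder}));
    return storage_.back().get();
  }

  std::vector<std::unique_ptr<ToyValue>> storage_; // owns all values
  std::map<std::string, ToyValue *> symbols_;      // current value per name
  std::vector<ToyValue *> uses_;                   // operand slots
};
```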
Craig Topper [Tue, 29 Jun 2021 17:38:47 +0000 (10:38 -0700)]
[LegalizeTypes][VE] Don't Expand BITREVERSE/BSWAP during type legalization promotion if they will be promoted for NVT in op legalization.
We were trying to expand these if they were going to be expanded
in op legalization so that we generated the minimum number of
operations. We failed to take into account that NVT could be
promoted to another legal type in op legalization.
Hoping this fixes the issue on the VE target reported as a follow
up to D96681. The check line changes were taken from before
1e46b6f4012399a2fef5fbbb4ed06fc919835414 so this patch does
appear to improve some cases that had previously regressed.
Jonas Devlieghere [Tue, 29 Jun 2021 17:56:01 +0000 (10:56 -0700)]
[lldb] Check for the mangled symbol name for objc_copyRealizedClassList_nolock
When we check whether the Objective-C SPI is available, we need to check
for the mangled symbol name. Unlike `objc_copyRealizedClassList`, which
is C exported, the `nolock` variant is not.
Differential revision: https://reviews.llvm.org/D105136