Konstantin Zhuravlyov [Fri, 30 Apr 2021 14:07:02 +0000 (10:07 -0400)]
AMDGPU/llvm-readobj: Add missing tests for note parsing/displaying
This is a follow up review/change for https://reviews.llvm.org/D95638
Add valid note tests for code object v2 notes:
- NT_AMD_HSA_CODE_OBJECT_VERSION (required yaml2obj update)
- NT_AMD_HSA_HSAIL (required yaml2obj update)
- NT_AMD_HSA_ISA_VERSION (required yaml2obj update)
- NT_AMD_HSA_METADATA
- NT_AMD_HSA_ISA_NAME
- NT_AMD_PAL_METADATA
Add valid note tests for code object v3 notes:
- NT_AMDGPU_METADATA
Add invalid note tests for code object v2 notes:
- NT_AMD_HSA_CODE_OBJECT_VERSION (required yaml2obj update)
- NT_AMD_HSA_HSAIL (required yaml2obj update)
- NT_AMD_HSA_ISA_VERSION (required yaml2obj update)
Add invalid note tests for code object v3 notes:
- NT_AMDGPU_METADATA
Differential Revision: https://reviews.llvm.org/D101304
David Spickett [Fri, 30 Apr 2021 10:35:54 +0000 (11:35 +0100)]
[lldb] More tests for DumpDataExtractor
* Using a base address or skipping it with LLDB_INVALID_ADDRESS
* Using a data offset, which does not effect the printed addresses
* Not providing an output stream
* Formatting a double sized HexFloat
* Formatting over multiple lines
Since address printing now has its own test,
I've removed the base address from all the format
type tests.
The multi line tests still use a base address to check that
it's incremented correctly for each new line.
Reviewed By: teemperor
Differential Revision: https://reviews.llvm.org/D101627
Jingu Kang [Tue, 23 Mar 2021 13:46:54 +0000 (13:46 +0000)]
[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch
Differential Revision: https://reviews.llvm.org/D99354
Amy Kwan [Wed, 28 Apr 2021 03:37:02 +0000 (22:37 -0500)]
[PowerPC] Add new infrastructure to select load/store instructions, update P8/P9 load/store patterns.
This patch introduces a new infrastructure that is used to select the load and
store instructions in the PPC backend.
The primary motivation is that the current implementation of selecting load/stores
is dependent on the ordering of patterns in TableGen. Given this limitation, we
are not able to easily and reliably generate the P10 prefixed load and stores
instructions (such as when the immediates that fit within 34-bits). This
refactoring is meant to provide us with more control over the patterns/different
forms to exploit, as well as eliminating dependency of pattern declaration in TableGen.
The idea of this refactoring is that it introduces a set of addressing modes that
correspond to different instruction formats of a particular load and store
instruction, along with a set of common flags that describes a load/store.
Whenever a load/store instruction is being selected, we analyze the instruction
and compute a set of flags for it. The computed flags are then used to
select the most optimal load/store addressing mode.
This patch is the first of a series of patches to be committed - it contains the
initial implementation of the refactored load/store selection infrastructure and
also updates P8/P9 patterns to adopt this infrastructure. The idea is that
incremental patches will add more implementation and support, and eventually
the old implementation will be removed.
Differential Revision: https://reviews.llvm.org/D93370
Sidharth Baveja [Fri, 30 Apr 2021 14:48:02 +0000 (14:48 +0000)]
[XCOFF][AIX] Add Global Variables Directly to TOC for 32 bit AIX
Summary:
This patch implements the backend implementation of adding global variables
directly to the table of contents (TOC), rather than adding the address of the
variable to the TOC.
Currently, this patch will look for the "toc-data" attribute on symbols in the
IR, and then add those symbols to the TOC.
ATM, this is implemented for 32 bit AIX.
Reviewers: sfertile
Differential Revision: https://reviews.llvm.org/D101178
Adam Czachorowski [Fri, 16 Apr 2021 18:07:46 +0000 (20:07 +0200)]
[clang] Fix assert() crash when checking undeduced arg alignment
There already was a check for undeduced and incomplete types, but it
failed to trigger when outer type (SubstTemplateTypeParm in test) looked
fine, but inner type was not.
Differential Revision: https://reviews.llvm.org/D100667
Jay Foad [Fri, 30 Apr 2021 13:50:46 +0000 (14:50 +0100)]
[AMDGPU] Add test for set_gpr_idx removal with conditional branches
Dmitry Vyukov [Wed, 28 Apr 2021 07:19:37 +0000 (09:19 +0200)]
sanitizer_common: introduce kInvalidTid/kMainTid
Currently we have a bit of a mess related to tids:
- sanitizers re-declare kInvalidTid multiple times
- some call it kUnknownTid
- implicit assumptions that main tid is 0
- asan/memprof claim their tids need to fit into 24 bits,
but this does not seem to be true anymore
- inconsistent use of u32/int to store tids
Introduce kInvalidTid/kMainTid in sanitizer_common
and use them consistently.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D101428
LLVM GN Syncbot [Fri, 30 Apr 2021 13:48:40 +0000 (13:48 +0000)]
[gn build] Port
43bc584dc05e
Simon Moll [Fri, 30 Apr 2021 13:46:59 +0000 (15:46 +0200)]
[VE] VP intrinsics are legal
Simon Moll [Fri, 30 Apr 2021 11:43:48 +0000 (13:43 +0200)]
[VP,Integer,#2] ExpandVectorPredication pass
This patch implements expansion of llvm.vp.* intrinsics
(https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).
VP expansion is required for targets that do not implement VP code
generation. Since expansion is controllable with TTI, targets can switch
on the VP intrinsics they do support in their backend offering a smooth
transition strategy for VP code generation (VE, RISC-V V, ARM SVE,
AVX512, ..).
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D78203
Jay Foad [Fri, 30 Apr 2021 13:17:14 +0000 (14:17 +0100)]
[AMDGPU] Add implicit negative check for the set_gpr_idx tests
The only effect of the optimization is to remove s_set_gpr_idx_*
instructions, and update_mir_test_checks.py always inserts CHECK: rather
than CHECK-NEXT: checks, so without this implicit negative check, the
tests would always pass even if the optimization did nothing.
Differential Revision: https://reviews.llvm.org/D101622
Anastasia Stulova [Fri, 30 Apr 2021 13:38:36 +0000 (14:38 +0100)]
[OpenCL] Prevent adding vendor extensions for all targets
Removed extension begin/end pragma as it has no effect and
it is added unconditionally for all targets.
Differential Revision: https://reviews.llvm.org/D92244
Nico Weber [Fri, 30 Apr 2021 13:22:46 +0000 (09:22 -0400)]
[lld/mac] Remove unused -L%t flags from tests
No behavior change.
Differential Revision: https://reviews.llvm.org/D101623
Pooja Yadav [Fri, 30 Apr 2021 13:32:02 +0000 (19:02 +0530)]
[docs]Added llvm/bindings section
Added information about language bindings provided by LLVM.
Reviewed By: xgupta, gandhi21299
Differential Revision: https://reviews.llvm.org/D101295
Nico Weber [Fri, 30 Apr 2021 13:30:46 +0000 (09:30 -0400)]
[lld/mac] Tweak two comments and fix style on one variable name
Cosmetic, no behavior change.
Andrea Di Biagio [Fri, 30 Apr 2021 12:53:33 +0000 (13:53 +0100)]
[MCA] Fix CarryOver check in the DispatchStage (PR50174).
Early exit from method DispatchStage::isAvailable() if the dispatch group is
already full. Not all instructions declare at least one uOP.
Fixes PR50174.
Florian Hahn [Fri, 30 Apr 2021 13:13:47 +0000 (14:13 +0100)]
[clang] Refactor mustprogress handling, add it to all loops in c++11+.
Currently Clang does not add mustprogress to inifinite loops with a
known constant condition, matching C11 behavior. The forward progress
guarantee in C++11 and later should allow us to add mustprogress to any
loop (http://eel.is/c++draft/intro.progress#1).
This allows us to simplify the code dealing with adding mustprogress a
bit.
Reviewed By: aaron.ballman, lebedev.ri
Differential Revision: https://reviews.llvm.org/D96418
Jay Foad [Fri, 30 Apr 2021 12:42:45 +0000 (13:42 +0100)]
[AMDGPU] Fix inconsistent ---/... in MIR tests and regenerate checks
In some cases the lack of --- or ... confused update_mir_test_checks.py
into not adding any checks for a function.
Arthur O'Dwyer [Thu, 29 Apr 2021 21:32:17 +0000 (17:32 -0400)]
[libc++] [test] Run the clang-format and generated-output checks on the "service" queue
As these jobs only run in a couple seconds, and block starting of
other jobs, they can run on the "service" queue which doesn't get
blocked by other long-running jobs.
Differential Revision: https://reviews.llvm.org/D101437
Arthur O'Dwyer [Thu, 29 Apr 2021 21:40:17 +0000 (17:40 -0400)]
[libc++] Minor cleanups in <iterator>. NFCI.
Tomas Matheson [Thu, 29 Apr 2021 13:46:11 +0000 (14:46 +0100)]
[ARM][MVE] vcreateq lane ordering for big endian
Use of bitcast resulted in lanes being swapped for vcreateq with big
endian. Fix this by using vreinterpret. No code change for little
endian. Adds IR lit test.
Differential Revision: https://reviews.llvm.org/D101606
Nathan James [Fri, 30 Apr 2021 12:27:23 +0000 (13:27 +0100)]
[clangd][NFC] Remove unnecessary string captures in lambdas.
Due to a somewhat annoying, but necessary, shortfall in -Wunused-lambda-capture, These unused captures aren't warned about.
Reviewed By: kadircet
Differential Revision: https://reviews.llvm.org/D101611
Hans Wennborg [Fri, 30 Apr 2021 12:23:30 +0000 (14:23 +0200)]
Require shell for lld/test/MachO/reproduce.s
as a way of not running it on Windows, where the file paths when
extracting repro2.tar can become longer than the maximum file length
limit (depending on the build dir name) and cause the test to fail.
(See https://crbug.com/1204463 for example test failure.)
Martin Probst [Thu, 29 Apr 2021 09:35:27 +0000 (11:35 +0200)]
clang-format: [JS] handle "off" in imports
Previously, the JavaScript import sorter would ignore `// clang-format
off` and `on` comments. This change fixes that. It tracks whether
formatting is enabled for a stretch of imports, and then only sorts and
merges the imports where formatting is enabled, in individual chunks.
This means that there's no meaningful total order when module references are mixed
with blocks that have formatting disabled. The alternative approach
would have been to sort all imports that have formatting enabled in one
group. However that raises the question where to insert the
formatting-off block, which can also impact symbol visibility (in
particular for exports). In practice, sorting in chunks probably isn't a
big problem.
This change also simplifies the general algorithm: instead of tracking
indices separately and sorting them, it just sorts the vector of module
references. And instead of attempting to do fine grained tracking of
whether the code changed order, it just prints out the module references
text, and compares that to the previous text. Given that source files
typically have dozens, but not even hundreds of imports, the performance
impact seems negligible.
Differential Revision: https://reviews.llvm.org/D101515
Evgeniy Brevnov [Tue, 27 Apr 2021 13:15:05 +0000 (20:15 +0700)]
[NARY] Don't optimize min/max if there are side uses (part2)
Previous attempt to fix infinite recursion in min/max reassociation was not fully successful (D100170). Newly discovered failing case is due to not properly handled when there is a single use. It should be processed separately from 2 uses case.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D101359
Alexey Bader [Thu, 29 Apr 2021 14:11:11 +0000 (17:11 +0300)]
[Doc] Fix sphinx warnings about wrong code-block format
Differential Revision: https://reviews.llvm.org/D101549
Florian Hahn [Fri, 30 Apr 2021 08:39:34 +0000 (09:39 +0100)]
[Passes] Run sinking/hoisting in SimplifyCFG earlier.
Hoisting and sinking instructions out of conditional blocks enables
additional vectorization by:
1. Executing memory accesses unconditionally.
2. Reducing the number of instructions that need predication.
After disabling early hoisting / sinking, we miss out on a few
vectorization opportunities. One of those is causing a ~10% performance
regression in one of the Geekbench benchmarks on AArch64.
This patch tires to recover the regression by running hoisting/sinking
as part of a SimplifyCFG run after LoopRotate and before LoopVectorize.
Note that in the legacy pass-manager, we run LoopRotate just before
vectorization again and there's no SimplifyCFG run in between, so the
sinking/hoisting may impact the later run on LoopRotate. But the impact
should be limited and the benefit of hosting/sinking at this stage
should outweigh the risk of not rotating.
Compile-time impact looks slightly positive for most cases.
http://llvm-compile-time-tracker.com/compare.php?from=
2ea7fb7b1c045a7d60fcccf3df3ebb26aa3699e5&to=
e58b4a763c691da651f25996aad619cb3d946faf&stat=instructions
NewPM-O3: geomean -0.19%
NewPM-ReleaseThinLTO: geoman -0.54%
NewPM-ReleaseLTO-g: geomean -0.03%
With a few benchmarks seeing a notable increase, but also some
improvements.
Alternative to D101290.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D101468
Jun Ma [Tue, 20 Apr 2021 02:51:50 +0000 (10:51 +0800)]
[AArch64][SVE] Lower index_vector to step_vector
As discussed in D100107, this patch first convert index_vector to
step_vector, and convert step_vector back to index_vector after LegalizeDAG.
Differential Revision: https://reviews.llvm.org/D100816
Roman Lebedev [Fri, 30 Apr 2021 10:22:35 +0000 (13:22 +0300)]
[InlineCost] CallAnalyzer: use TTI info for extractvalue - they are free (PR50099)
It seems incorrect to use TTI data in some places,
and override it in others. In this case, TTI says
that `extractvalue` are free, yet we bill them.
While this doesn't address https://bugs.llvm.org/show_bug.cgi?id=50099 yet,
it reduces the cost from 55 to 50 while the threshold is 45.
Differential Revision: https://reviews.llvm.org/D101228
Neal (nealsid) [Fri, 30 Apr 2021 10:29:00 +0000 (12:29 +0200)]
Wrap edit line configuration calls into helper functions
Currently we call el_set directly to configure the editor in the libedit
wrapper. There are some cases in which this causes extra casting, but we pass
captureless lambdas as function pointers, which should work out of the box.
Since el_set takes varargs, if the cast is incorrect or if the cast is not
present, it causes a run time failure rather than compile error. This change
makes it so a few different types of configuration is done inside a helper
function to provide type safety and eliminate that casting. I didn't do all
edit line configuration because I'm not sure how important it was in other cases
and it might require something more general keep up with libedit's signature.
I'm open to suggestions, though.
Reviewed By: teemperor, JDevlieghere
Differential Revision: https://reviews.llvm.org/D101250
David Stuttard [Fri, 30 Apr 2021 09:06:30 +0000 (10:06 +0100)]
[AMDGPU] Tidy up some simple expressions for clarity NFC
Slight refactor for clarity.
Change-Id: Ib25e7f4582c67a7c57f066cfd5382c1405d7d4c5
Differential Revision: https://reviews.llvm.org/D101610
David Stuttard [Wed, 7 Apr 2021 13:19:21 +0000 (14:19 +0100)]
[JITLink] Minor fix to avoid Windows compiler warning for static-cast
Change-Id: Id0c1d5535b53e2aebe314151c0efa585e763f3f6
Differential Revision: https://reviews.llvm.org/D100093
Keith Walker [Thu, 29 Apr 2021 12:46:25 +0000 (13:46 +0100)]
[AArch64] Change __ARM_FEATURE_FP16FML macro name to __ARM_FEATURE_FP16_FML
The "Arm C Language extensions" document (the current version can be
found at https://developer.arm.com/documentation/101028/0012/?lang=en)
states that the name of the feature test macro for the FP16 FML extension
is __ARM_FEATURE_FP16_FML.
Differential Revision: https://reviews.llvm.org/D101532
David Spickett [Tue, 13 Apr 2021 14:42:02 +0000 (15:42 +0100)]
[lldb] Add tests for DumpDataExtractor formats
Covering basic cases where you have 1 item on 1 line.
Apart from eFormatCharArray, where using multiple lines
highlights the difference between it and eFormatVectorOfChar.
Reviewed By: #lldb, teemperor
Differential Revision: https://reviews.llvm.org/D101453
Fraser Cormack [Fri, 30 Apr 2021 08:39:25 +0000 (09:39 +0100)]
[RISCV][NFC] Merge RV32/RV64 test checks with a common prefix
Fraser Cormack [Tue, 20 Apr 2021 14:23:30 +0000 (15:23 +0100)]
[RISCV] Support STEP_VECTOR with a step greater than one
DAGCombiner was recently taught how to combine STEP_VECTOR nodes,
meaning the step value is no longer guaranteed to be one by the time it
reaches the backend for lowering.
This patch supports such cases on RISC-V by lowering to other step
values to a multiply following the vid.v instruction. It includes a
small optimization for common cases where the multiply can be expressed
as a shift left.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D100856
Timm Bäder [Fri, 30 Apr 2021 08:17:03 +0000 (10:17 +0200)]
[llvm][Support][NFC] Fix fallthrough attribute indentation
The attribute does not belong to the if statement before and trips up
gcc's indentation checker.
Dmitry Vyukov [Fri, 30 Apr 2021 08:20:01 +0000 (10:20 +0200)]
tsan: fix fork syscall test
Arm64 builders failed with:
error: use of undeclared identifier 'SYS_fork'
https://lab.llvm.org/buildbot/#/builders/7/builds/2575
Indeed, not all arches have fork syscall.
Implement fork via clone on these arches.
Differential Revision: https://reviews.llvm.org/D101603
Dominik Montada [Mon, 29 Mar 2021 13:21:46 +0000 (15:21 +0200)]
[GISel] Teach TableGen to check predicates of immediate operands in patterns
Reviewed By: dsanders
Differential Revision: https://reviews.llvm.org/D91703
Jay Foad [Fri, 30 Apr 2021 07:58:20 +0000 (08:58 +0100)]
[AMDGPU] Simplify getWaitStatesSince. NFC.
Martin Storsjö [Wed, 21 Apr 2021 05:35:40 +0000 (08:35 +0300)]
[cmake] Use -ffunction-sections and -Wl,--gc-sections on MinGW targets
If compiling with GCC or linking with ld.bfd, these options have little
effect, but if built with Clang and linked with LLD, they provide a
quite notable size decrease - this shrinks an entire llvm-mingw
distribution package by 22%.
If building with BUILD_SHARED_LIBS or LLVM_BUILD_LLVM_DYLIB with LLD,
this requires a version of LLD that contains a fix for auto exporting
symbols from comdats,
2b01a417d7ccb001ccc1185ef5fdc967c9fac8d7.
Differential Revision: https://reviews.llvm.org/D101568
Evgeny Leviant [Fri, 30 Apr 2021 07:18:23 +0000 (10:18 +0300)]
Fix -fdebug-pass-structure test case
Pass structure can change when -O0 is given and extensions are used.
Martin Storsjö [Mon, 12 Apr 2021 09:49:17 +0000 (12:49 +0300)]
Reapply [llvm-readobj] [ARMWinEH] Fix handling of relocations and symbol offsets
When looking up data referenced from pdata/xdata structures, the
referenced data can be found in two different ways:
- For an unrelocated object file, it's located via a relocation
- For a relocated, linked image, the data is referenced with an
(image relative) absolute address
For the latter case, the absolute address can optionally be
described with a symbol.
For the case of an object file, there's two offsets involved; one
immediate offset encoded in the data location that is modified by
the relocation, and a section offset in the symbol.
Previously, for the ExceptionRecord field, we printed the offset
from the symbol (only) but used the immediate offset ignoring
the symbol's address (using only the symbol's section) for printing
the exception data.
Add a helper method for doing the lookup and address calculation,
for simplifying the calling code and making all the cases consistent.
This addresses an existing FIXME comment, fixing printing of the
exception data for cases where relocations point at individual
symbols in the xdata section (which is what MSVC generates) instead of
all relocations pointing at the start of the xdata section (which is
what LLVM generates).
This also fixes printing of the function name for packed entries in
linked images.
Relanded with a format string fix in the formatSymbol function; one
can't use %X as format string for an uint64_t. That bug has been
present since this code was added in
e6971cab306cd.
Differential Revision: https://reviews.llvm.org/D100305
Dmitry Vyukov [Fri, 30 Apr 2021 06:32:52 +0000 (08:32 +0200)]
tsan: refactor fork handling
Commit
efd254b6362 ("tsan: fix deadlock in pthread_atfork callbacks")
fixed another deadlock related to atfork handling.
But builders with DCHECKs enabled reported failures of
pthread_atfork_deadlock2.c and pthread_atfork_deadlock3.c tests
related to the fact that we hold runtime locks on interceptor exit:
https://lab.llvm.org/buildbot/#/builders/70/builds/6727
This issue is somewhat inherent to the current approach,
we indeed execute user code (atfork callbacks) with runtime lock held.
Refactor fork handling to not run user code (atfork callbacks)
with runtime locks held. This change does this by installing
own atfork callbacks during runtime initialization.
Atfork callbacks run in LIFO order, so the expectation is that
our callbacks run last, right before the actual fork.
This way we lock runtime mutexes around fork, but not around
user callbacks.
Extend tests to also install after fork callbacks just to cover
more scenarios. Some tests also started reporting real races
that we previously suppressed.
Also extend tests to cover fork syscall support.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D101517
Jonas Devlieghere [Fri, 30 Apr 2021 05:07:46 +0000 (22:07 -0700)]
[debugserver] Use add_lldb_library instead of add_library
Use add_lldb_library to ensure debugserver inherits the defines set by
llvm and lldb.
Differential revision: https://reviews.llvm.org/D101596
Jianzhou Zhao [Thu, 29 Apr 2021 23:10:19 +0000 (23:10 +0000)]
[msan] Add static to some msan allocator functions
This is to help review refactor the allocator code.
So it is easy to see which are the real public interfaces.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D101586
Qiu Chaofan [Fri, 30 Apr 2021 04:02:37 +0000 (12:02 +0800)]
Pre-commit test for PPC vector extraction test
Arthur Eubanks [Thu, 29 Apr 2021 21:19:14 +0000 (14:19 -0700)]
[InlineCost] Remove visitUnaryInstruction()
The simplifyInstruction() in visitUnaryInstruction() does not trigger
for all of check-llvm. Looking at all delegates to UnaryInstruction in
InstVisitor, the only instructions that either don't have a visitor in
CallAnalyzer, or redirect to UnaryInstruction, are VAArgInst and Alloca.
VAArgInst will never get simplified, and visitUnaryInstruction(Alloca)
would always return false anyway.
Reviewed By: mtrofin, lebedev.ri
Differential Revision: https://reviews.llvm.org/D101577
Christudasan Devadasan [Thu, 29 Apr 2021 18:52:45 +0000 (00:22 +0530)]
[AMDGPU] Skip promote-alloca for insertelement/insertvalue users
It is difficult to track the users of vector and aggregate types.
Reviewed by: arsenm
Differential Revision: https://reviews.llvm.org/D101562
luxufan [Mon, 12 Apr 2021 05:28:00 +0000 (13:28 +0800)]
[RISCV] Fix StackOffset calculation when using sp to access the fixed stack object in the case of rvv vector objects existed
When rvv vector objects existed, using sp to access the fixed stack object will pass the rvv vector objects field. So the StackOffset needs add a scalable offset of the size of rvv vector objects field
Differential Revision: https://reviews.llvm.org/D100286
luxufan [Mon, 12 Apr 2021 05:28:00 +0000 (13:28 +0800)]
[RISCV] Precommit a test case that test accessing a fixed object when has rvv vector object existed
Differential Revision: https://reviews.llvm.org/D100284
Steven Wu [Thu, 29 Apr 2021 22:48:46 +0000 (15:48 -0700)]
[CMake][compiler-rt] avoid conflict with builtin check_linker_flag
Rename `check_linker_flag` in compiler_rt to avoid conflict. Follow up
as the fix in D100901.
Patched by radford.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101581
Wang, Pengfei [Fri, 30 Apr 2021 01:34:06 +0000 (09:34 +0800)]
[MS] Preserve base register %rbx around cpuid
This patch copies implementation from cpuid.h, which preserve base register %rbx around cpuid. It fixes PR50133.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D101338
Brendon Cahoon [Thu, 29 Apr 2021 15:03:24 +0000 (11:03 -0400)]
[AArch64][GlobalISel] Fix width value for G_SBFX/G_UBFX
When creating G_SBFX/G_UBFX opcodes, the last operand is the
width instead of the bit position. The bit position is used
for the AArch64 SBFM and UBFM instructions. The bit position
is converted to a width if the SBFX/UBFX aliases are generated.
For other SBMF/UBFM aliases, such as shifts, the bit position
is used.
Differential Revision: https://reviews.llvm.org/D101543
Matt Arsenault [Thu, 25 Oct 2018 21:47:57 +0000 (14:47 -0700)]
VirtRegMap: Support partially allocated virtual registers
Don't assert if there are unassigned virtual registers. Maintain
LiveIntervals by removing the RegUnits for allocated registers, since
they should not longer be necessary.
One part I find somewhat questionable is the special handling
necessary for handleIdentityCopy. The LiveIntervals for the relevant
regunits needs to be removed.
Walter Erquinigo [Tue, 27 Apr 2021 23:02:38 +0000 (16:02 -0700)]
[lldb-vscode] Follow up of D99989 - store some strings more safely
As a follow up of https://reviews.llvm.org/D99989#inline-953343, I'm now
storing std::string instead of char *. I know it might never break as char *,
but if it does, chasing that bug might be dauting.
Besides, I'm also checking of the strings gotten through the SB API are
null or not.
Matt Arsenault [Thu, 25 Oct 2018 21:45:55 +0000 (14:45 -0700)]
VirtRegMap: Add pass option to not clear virt regs
In a future change it will be possible to run register
allocation with a specific set of register classes,
so some of the remaining virtual registers will still
be meaningful.
Matt Arsenault [Wed, 28 Apr 2021 23:36:09 +0000 (19:36 -0400)]
AMDGPU: Add missing runline to test
There are checks for gfx908, but this wasn't actually running with it.
Carl Ritson [Fri, 30 Apr 2021 00:01:10 +0000 (09:01 +0900)]
[AMDGPU][NFC] Refactor hazard recognition IsHazardFn and IsExpiredFn
Refactor IsHazardFn and IsExpiredFn to use constant references as these should not be mutating the instructions visited and the instruction can never be null.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D101430
Carl Ritson [Thu, 29 Apr 2021 23:55:42 +0000 (08:55 +0900)]
[AMDGPU] Remove dead early-out in GCNHazardRecognizer
Remove an early-out in wait state counting which can never be
taken.
Reviewed By: foad, rampitec
Differential Revision: https://reviews.llvm.org/D101520
Akira Hatanaka [Thu, 22 Apr 2021 16:48:54 +0000 (09:48 -0700)]
[Sema] Don't set BlockDecl's DoesNotEscape bit if the parameter type of
the function the block is passed to isn't a block pointer type
This patch fixes a bug where a block passed to a function taking a
parameter that doesn't have a block pointer type (e.g., id or reference
to a block pointer) was marked as noescape.
This partially fixes PR50043.
rdar://
77030453
Differential Revision: https://reviews.llvm.org/D101097
Jianzhou Zhao [Thu, 29 Apr 2021 18:47:04 +0000 (18:47 +0000)]
[msan] Remove dead function/fields
To see how to extract a shared allocator interface for D101204,
found some unused code. Tests passed. Are they safe to remove?
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D101559
Akira Hatanaka [Tue, 27 Apr 2021 01:38:58 +0000 (18:38 -0700)]
[ObjC][ARC] Don't enter the cleanup scope if the initializer expression
isn't an ExprWithCleanups
This patch fixes a bug where a temporary ObjC pointer is released before
the end of the full expression.
This fixes PR50043.
rdar://
77030453
Differential Revision: https://reviews.llvm.org/D101502
Zequan Wu [Thu, 29 Apr 2021 22:52:24 +0000 (15:52 -0700)]
Aart Bik [Thu, 29 Apr 2021 21:31:18 +0000 (14:31 -0700)]
[mlir][sparse] migrate sparse operations into new sparse tensor dialect
This is the very first step toward removing the glue and clutter from linalg and
replace it with proper sparse tensor types. This revision migrates the LinalgSparseOps
into SparseTensorOps of a sparse tensor dialect. This also provides a new home for
sparse tensor related transformation.
NOTE: the actual replacement with sparse tensor types (and removal of linalg glue/clutter)
will follow but I am trying to keep the amount of changes per revision manageable.
Differential Revision: https://reviews.llvm.org/D101573
Zequan Wu [Thu, 29 Apr 2021 04:25:51 +0000 (21:25 -0700)]
[CodeGen] don't emit addrsig symbol if it's used only by metadata
Value only used by metadata can be removed from .addrsig table.
This solves the undefined symbol error when enabling addrsig table on COFF LTO.
Differential Revision: https://reviews.llvm.org/D101512
Rob Suderman [Fri, 23 Apr 2021 00:40:35 +0000 (17:40 -0700)]
[mlir][tosa] Remove constant-0 dim expr values from TOSA lowerings
Constant-0 dim expr values should be avoided for linalg as it can prevent
fusion. This includes adding support for rank-0 reshapes.
Differential Revision: https://reviews.llvm.org/D101418
jasonliu [Thu, 29 Apr 2021 20:39:43 +0000 (20:39 +0000)]
[XCOFF] Handle the case when personality routine is an alias
Summary:
Personality routine could be an alias to another personality routine.
Fix the situation when we compile the file that contains the personality
routine and the file also have functions that need to refer to the
personality routine.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D101401
Alex Lorenz [Mon, 26 Apr 2021 21:56:56 +0000 (14:56 -0700)]
Recommit "[clang][driver] Use the provided arch name for a Darwin target triple
This ensures that the Darwin driver uses a consistent target triple
representation when the triple is printed out to the user.
This reverts the revert commit
ab0df6c0346e515291a381467527621ab0ccf953.
Differential Revision: https://reviews.llvm.org/D100807
zoecarver [Tue, 27 Apr 2021 16:03:52 +0000 (09:03 -0700)]
[libcxx][ranges] Fix tests for stdlib types that conform to sized_sentinel_for.
Differential Revision: https://reviews.llvm.org/D101371
Amara Emerson [Thu, 29 Apr 2021 21:35:02 +0000 (14:35 -0700)]
[GlobalISel][Legalizer] Bump up a smallvector size that was found to be too small. NFC.
Vladimir Vereschaka [Thu, 29 Apr 2021 21:19:24 +0000 (14:19 -0700)]
[CMake] Stop using c++ subdirectory for libc++ on Win to ARM Linux cross builds. NFC
Updated cross Win-x-ARM Linux toolchain cmake cache file in according of
the following changes: https://reviews.llvm.org/D100869
Stop using use c++ subdirectory for libc++ library
Lang Hames [Thu, 29 Apr 2021 20:47:44 +0000 (13:47 -0700)]
[ORC] JITDylib::addDependencies should be run under the session lock.
Martin Storsjö [Thu, 29 Apr 2021 21:03:40 +0000 (00:03 +0300)]
Revert "[llvm-readobj] [ARMWinEH] Fix handling of relocations and symbol offsets"
This reverts commit
37789240882bfacd951767acdb4c088fcbf53385.
The added test fails on at least one buildbot, by printing a reversed
combination, printing "func3_xdata +0x18 (0x8)" while it's supposed to
be "func3_xdata +0x8 (0x18)", see e.g.
https://lab.llvm.org/buildbot/#/builders/107/builds/7269. Currently
no idea how that could happen, but reverting until it can be figured
out.
Amara Emerson [Mon, 26 Apr 2021 15:29:59 +0000 (08:29 -0700)]
[AArch64][GlobalISel] Simplify out of range rotate amount.
Differential Revision: https://reviews.llvm.org/D101005
Mehdi Amini [Thu, 29 Apr 2021 20:59:41 +0000 (20:59 +0000)]
Revert "[mlir][sparse] migrate sparse operations into new sparse tensor dialect"
This reverts commit
a6d92a971175d727873a9e7644913ee02d7232a8.
The build with -DBUILD_SHARED_LIBS=ON is broken.
Martin Storsjö [Mon, 12 Apr 2021 09:49:17 +0000 (12:49 +0300)]
[llvm-readobj] [ARMWinEH] Fix handling of relocations and symbol offsets
When looking up data referenced from pdata/xdata structures, the
referenced data can be found in two different ways:
- For an unrelocated object file, it's located via a relocation
- For a relocated, linked image, the data is referenced with an
(image relative) absolute address
For the latter case, the absolute address can optionally be
described with a symbol.
For the case of an object file, there's two offsets involved; one
immediate offset encoded in the data location that is modified by
the relocation, and a section offset in the symbol.
Previously, for the ExceptionRecord field, we printed the offset
from the symbol (only) but used the immediate offset ignoring
the symbol's address (using only the symbol's section) for printing
the exception data.
Add a helper method for doing the lookup and address calculation,
for simplifying the calling code and making all the cases consistent.
This addresses an existing FIXME comment, fixing printing of the
exception data for cases where relocations point at individual
symbols in the xdata section (which is what MSVC generates) instead of
all relocations pointing at the start of the xdata section (which is
what LLVM generates).
This also fixes printing of the function name for packed entries in
linked images.
Differential Revision: https://reviews.llvm.org/D100305
Martin Storsjö [Thu, 29 Apr 2021 08:57:33 +0000 (11:57 +0300)]
[LLD] [COFF] Fix the mingw --export-all-symbols behaviour with comdat symbols
When looking for the "all" symbols that are supposed to be exported,
we can't look at the live flag - the symbols we mark as to be
exported will become GC roots even if they aren't yet marked as live.
With this in place, building an LLVM library with BUILD_SHARED_LIBS
produces the same set of symbols exported regardless of whether the
--gc-sections flag is specified, both with and without being built
with -ffunction-sections.
Differential Revision: https://reviews.llvm.org/D101522
Arnamoy Bhattacharyya [Thu, 29 Apr 2021 20:12:28 +0000 (16:12 -0400)]
[flang][OpenMP][FIX] Fix the worksharing nesting check with inclusion of more constructs to cover combined constructs.
Philip Reames [Thu, 29 Apr 2021 20:06:26 +0000 (13:06 -0700)]
Revert "Generalize getInvertibleOperand recurrence handling slightly"
This reverts commit
0c01b37eeb18a51a7e9c9153330d8009de0f600e while a problem reported is investigated.
Benjamin Kramer [Thu, 29 Apr 2021 13:26:10 +0000 (15:26 +0200)]
[mlir] Fix lowering of multi-dimensional vector log1p to LLVM
This was using the untransformed operand, leading to invalid IR.
Differential Revision: https://reviews.llvm.org/D101531
Jay Foad [Thu, 29 Apr 2021 16:48:54 +0000 (17:48 +0100)]
[AMDGPU] Fix v_swap_b32 formation on physical registers
As explained in the comments, matchSwap matches:
// mov t, x
// mov x, y
// mov y, t
and turns it into:
// mov t, x (t is potentially dead and move eliminated)
// v_swap_b32 x, y
On physical registers we don't have full use-def chains so the check
for T being live-out was not working properly with subregs/superregs.
Differential Revision: https://reviews.llvm.org/D101546
Alexey Bataev [Thu, 29 Apr 2021 19:46:59 +0000 (12:46 -0700)]
[COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.
Differential Revision: https://reviews.llvm.org/D100865
Alexey Bataev [Thu, 29 Apr 2021 19:39:48 +0000 (12:39 -0700)]
Revert "[COST] Improve shuffle kind detection if shuffle mask is provided."
This reverts commit
92399322217917e67c0d72a55ec51ddc82251cf6 to fix
a compiler crash on mask checks.
Jez Ng [Thu, 29 Apr 2021 19:32:43 +0000 (15:32 -0400)]
[lld-macho] Remove stray file
Sriraman Tallam [Thu, 29 Apr 2021 18:48:11 +0000 (11:48 -0700)]
Basic block sections for functions with implicit-section-name attribute
Functions can have section names set via #pragma or section attributes,
basic block sections should be correctly named for such functions.
With #pragma, the expectation is that all functions in that file are placed
in the same section in the final binary. Basic block sections should be
correctly named with the unique flag set so that the final binary has all the
basic blocks of the function in that named section. This patch fixes the bug
by calling getExplictSectionGlobal when implicit-section-name attribute is set
to make sure the function's basic blocks get the correct section name.
Differential Revision: https://reviews.llvm.org/D101311
Jez Ng [Thu, 29 Apr 2021 02:45:03 +0000 (22:45 -0400)]
[lld-macho][nfc] Clean up header.s test
I don't think it's super worthwhile to test the dylib headers outputs of
all the different archs when x86_64 is the only one that has interesting
behavior.
Motivated by my upcoming addition of arm32...
Jez Ng [Thu, 29 Apr 2021 19:09:01 +0000 (15:09 -0400)]
[lld-macho] Make everything PIE by default
Modern versions of macOS (>= 10.7) and in general all modern Mach-O
target archs want PIEs by default. ld64 defaults to PIE for iOS >= 4.3,
as well as for all versions of watchOS and simulators. Basically all the
platforms LLD is likely to target want PIE. So instead of cluttering LLD's
code with legacy version checks, I think it's simpler to just default to
PIE for everything.
Note that `-no_pie` still works, so users can still opt out of it.
Reviewed By: #lld-macho, thakis, MaskRay
Differential Revision: https://reviews.llvm.org/D101513
Aart Bik [Thu, 29 Apr 2021 01:15:11 +0000 (18:15 -0700)]
[mlir][sparse] migrate sparse operations into new sparse tensor dialect
This is the very first step toward removing the glue and clutter from linalg and
replace it with proper sparse tensor types. This revision migrates the LinalgSparseOps
into SparseTensorOps of a sparse tensor dialect. This also provides a new home for
sparse tensor related transformation.
NOTE: the actual replacement with sparse tensor types (and removal of linalg glue/clutter)
will follow but I am trying to keep the amount of changes per revision manageable.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D101488
Sanjay Patel [Thu, 29 Apr 2021 18:02:50 +0000 (14:02 -0400)]
[InstCombine] narrow popcount with zext operand
https://llvm.org/PR50141
Sanjay Patel [Thu, 29 Apr 2021 17:46:10 +0000 (13:46 -0400)]
[InstCombine] add tests for popcount with zext operand; NFC
PR50141
Tim Northover [Thu, 29 Apr 2021 18:58:51 +0000 (19:58 +0100)]
Revert "RegAlloc: do not consider liveins to EH-pad successors as liveout."
Some liveins *can* come from this block (e.g. any SSA value except the call),
it's only the ones that produce `landingpad` values that can't and I didn't
think it through properly.
Petar Avramovic [Wed, 28 Apr 2021 11:11:44 +0000 (13:11 +0200)]
AMDGPU/GlobalISel: Fix selection of image intrinsics with unused return
When atomic image intrinsic return value is unused, register class for
destination of a sub-register copy of return value ends up not being set.
This copy then hits 'Register class not set' assert later.
If return value has uses, register class is determined by use instruction.
Fix is to not create sub-register copy when image intrinsic destination has
no uses because it would be deleted by dead-mi-elimination later anyway.
Differential Revision: https://reviews.llvm.org/D101448
Dan Liew [Wed, 28 Apr 2021 21:22:08 +0000 (14:22 -0700)]
[ASan] Rename `-fsanitize-address-destructor-kind=` to drop the `-kind` suffix.
Renaming the option is based on discussions in https://reviews.llvm.org/D101122.
It is normally not a good idea to rename driver flags but this flag is
new enough and obscure enough that it is very unlikely to have adopters.
While we're here also drop the `<kind>` metavar. It's not necessary and
is actually inconsistent with the documentation in
`clang/docs/ClangCommandLineReference.rst`.
Differential Revision: https://reviews.llvm.org/D101491
Tim Northover [Thu, 29 Apr 2021 11:31:22 +0000 (12:31 +0100)]
RegAlloc: do not consider liveins to EH-pad successors as liveout.
These registers get defined by the runtime, not the block being allocated, and
treating them as preassigned in RegAllocFast adds extra pressure, sometimes
enough to make the function unallocatable.
Victor Huang [Wed, 28 Apr 2021 18:57:16 +0000 (13:57 -0500)]
[AIX][TLS] Add ASM portion changes to support TLSGD relocations to XCOFF objects
- Add new variantKinds for the symbol's variable offset and region handle
- Print the proper relocation specifier @gd in the asm streamer when emitting
the TC Entry for the variable offset for the symbol
- Fix the switch section failure between the TC Entry of variable offset and
region handle
- Put .__tls_get_addr symbol in the ProgramCodeSects with XTY_ER property
Reviewed by: sfertile
Differential Revision: https://reviews.llvm.org/D100956
Roman Lebedev [Thu, 29 Apr 2021 16:20:06 +0000 (19:20 +0300)]
[SimplifyCFG] Common code sinking: fix application of profitability check
The profitability check is: we don't want to create more than a single PHI
per instruction sunk. We need to create the PHI unless we'll sink
all of it's would-be incoming values.
But there is a caveat there.
This profitability check doesn't converge on the first iteration!
If we first decide that we want to sink 10 instructions,
but then determine that 5'th one is unprofitable to sink,
that may result in us not sinking some instructions that
resulted in determining that some other instruction
we've determined to be profitable to sink becoming unprofitable.
So we need to iterate until we converge, as in determine
that all leftover instructions are profitable to sink.
But, the direct approach of just re-iterating seems dumb,
because in the worst case we'd find that the last instruction
is unprofitable, which would result in revisiting instructions
many many times.
Instead, i think we can get away with just two passes - forward and backward.
However then it isn't obvious what is the most performant way to update
InstructionsToSink.
Sam Clegg [Mon, 5 Apr 2021 15:00:30 +0000 (08:00 -0700)]
[lld][WebAssembly] Add `--export-if-defined`
Unlike the existing `--export` option this will not causes errors
or warnings if the specified symbol is not defined.
See: https://github.com/emscripten-core/emscripten/issues/13736
Differential Revision: https://reviews.llvm.org/D99887
Mark de Wever [Sat, 27 Feb 2021 15:52:39 +0000 (16:52 +0100)]
[libc++] Fixes std::to_chars for bases != 10.
While working on D70631, Microsoft's unit tests discovered an issue.
Our `std::to_chars` implementation for bases != 10 uses the range
`[first,last)` as temporary buffer. This violates the contract for
to_chars:
[charconv.to.chars]/1 http://eel.is/c++draft/charconv#to.chars-1
`to_chars_result to_chars(char* first, char* last, see below value, int base = 10);`
"If the member ec of the return value is such that the value is equal to
the value of a value-initialized errc, the conversion was successful and
the member ptr is the one-past-the-end pointer of the characters
written."
Our implementation modifies the range `[member ptr, last)`, which causes
Microsoft's test to fail. Their test verifies the buffer
`[member ptr, last)` is unchanged. (The test is only done when the
conversion is successful.)
While looking at the code I noticed the performance for bases != 10 also
is suboptimal. This is tracked in D97705.
This patch fixes the issue and adds a benchmark. This benchmark will be
used as baseline for D97705.
Reviewed By: #libc, Quuxplusone, zoecarver
Differential Revision: https://reviews.llvm.org/D100722