Kazu Hirata [Wed, 27 Jan 2021 04:00:16 +0000 (20:00 -0800)]
[AMDGPU] Forward-declare TargetRegisterClass (NFC)
AMDGPUInstructionSelector.h needs TargetRegisterClass but relies on a
forward declaration of TargetRegisterClass in InstructionSelector.h.
This patch adds a forward declaration right in
AMDGPUInstructionSelector.h.
While we are at it, this patch removes the one in
InstructionSelector.h, where it is unnecessary.
Craig Topper [Wed, 27 Jan 2021 03:41:52 +0000 (19:41 -0800)]
[TableGen] Add isContradictoryImpl implementation to CheckCondCodeMatcher and CheckChild2CondCodeMatcher.
This enables better pattern factoring in the RISCV ISel table.
Tom Stellard [Wed, 27 Jan 2021 03:37:08 +0000 (19:37 -0800)]
Bump the trunk major version to 13
and clear the release notes.
Duncan P. N. Exon Smith [Wed, 27 Jan 2021 03:27:32 +0000 (19:27 -0800)]
Frontend: Use early returns in CompilerInstance::clearOutputFiles, NFC
Use early returns in `CompilerInstance::clearOutputFiles` to clarify the
logic, and rename `ec` to `EC` as a drive-by.
No functionality change.
Duncan P. N. Exon Smith [Wed, 27 Jan 2021 03:26:24 +0000 (19:26 -0800)]
Rename clang/test/Frontend/output-{failures,paths}.c, NFC
A follow-up patch will add a few success cases here; rename it to
`output-paths.c` instead of `output-failures.c`.
LLVM GN Syncbot [Wed, 27 Jan 2021 01:23:23 +0000 (01:23 +0000)]
[gn build] Port bb9eb1982980
Shilei Tian [Wed, 27 Jan 2021 01:21:27 +0000 (20:21 -0500)]
[OpenMP][NVPTX] Drop dependence on CUDA to build NVPTX `deviceRTLs`
With D94745, we no longer use the CUDA SDK to compile `deviceRTLs`. Therefore,
much of the CMake code in the project is useless. This patch cleans up unnecessary code
and also drops the CUDA requirement for building NVPTX `deviceRTLs`. However, CUDA
detection is still used to determine whether we need to include the tests. Auto
detection of compute capability is enabled by default and can be disabled by
setting CMake variable `LIBOMPTARGET_NVPTX_AUTODETECT_COMPUTE_CAPABILITY=OFF`.
If auto detection is enabled, and CUDA is also valid, it will only build the
bitcode library for the detected version; otherwise, all variants supported will
be generated. One drawback of this patch is that we now generate 96 variants of
the bitcode library, for a total of 1485 files to build in a clean build on a
non-CUDA system. `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=""` can be used to
disable building NVPTX `deviceRTLs`.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D95466
Craig Topper [Wed, 27 Jan 2021 01:03:21 +0000 (17:03 -0800)]
[RISCV] Add rv64 run lines to rv32 MC layer tests for B extension
Remove common instructions from rv64 tests since they are now
covered by the rv64 run lines in the rv32 tests.
Add rv32-only* tests for a few cases that aren't common between
rv32 and rv64.
Addresses review feedback from D95150.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D95272
Petr Hosek [Fri, 15 Jan 2021 09:14:37 +0000 (01:14 -0800)]
Support for instrumenting only selected files or functions
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.
Differential Revision: https://reviews.llvm.org/D94820
Arthur O'Dwyer [Mon, 25 Jan 2021 22:00:14 +0000 (17:00 -0500)]
[libc++] Give `MoveOnly` all six comparison operators, not just == and <.
Split out of D93512.
Nawrin Sultana [Mon, 2 Nov 2020 22:17:37 +0000 (16:17 -0600)]
[OpenMP] Modify OMP_ALLOCATOR environment variable
This patch sets the def-allocator-var ICV based on the environment variables
provided in OMP_ALLOCATOR. Previously, the only allowed value for OMP_ALLOCATOR
was a predefined memory allocator. OpenMP 5.1 specification allows predefined
memory allocator, predefined mem space, or predefined mem space with traits in
OMP_ALLOCATOR. If an allocator can not be created using the provided environment
variables, the def-allocator-var is set to omp_default_mem_alloc.
Differential Revision: https://reviews.llvm.org/D94985
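The three accepted forms and the fallback described above can be sketched roughly as follows. This is only an illustration in Python, not the runtime's parser: the function name `classify_omp_allocator` and the returned tuples are hypothetical, and the `trait=value` comma-separated syntax is assumed from the OpenMP 5.1 description of OMP_ALLOCATOR. The allocator and memory space names are the predefined ones from the specification.

```python
# Predefined names from the OpenMP 5.1 specification.
PREDEFINED_ALLOCATORS = {
    "omp_default_mem_alloc", "omp_large_cap_mem_alloc", "omp_const_mem_alloc",
    "omp_high_bw_mem_alloc", "omp_low_lat_mem_alloc", "omp_cgroup_mem_alloc",
    "omp_pteam_mem_alloc", "omp_thread_mem_alloc",
}
PREDEFINED_MEM_SPACES = {
    "omp_default_mem_space", "omp_large_cap_mem_space", "omp_const_mem_space",
    "omp_high_bw_mem_space", "omp_low_lat_mem_space",
}

# Hypothetical fallback result, mirroring "def-allocator-var is set to
# omp_default_mem_alloc" when nothing usable was provided.
FALLBACK = ("allocator", "omp_default_mem_alloc", {})


def classify_omp_allocator(value):
    """Classify an OMP_ALLOCATOR value as (kind, name, traits)."""
    value = value.strip()
    if value in PREDEFINED_ALLOCATORS:
        return ("allocator", value, {})

    space, sep, trait_text = value.partition(":")
    if space not in PREDEFINED_MEM_SPACES:
        return FALLBACK
    if not sep:
        return ("mem_space", space, {})

    # Assumed syntax: comma-separated trait=value pairs after the colon.
    traits = {}
    for item in trait_text.split(","):
        key, eq, val = item.partition("=")
        if not eq or not key.strip() or not val.strip():
            return FALLBACK
        traits[key.strip()] = val.strip()
    return ("mem_space_with_traits", space, traits)
```

Anything that cannot be classified falls back to `omp_default_mem_alloc`, which matches the behaviour the commit message describes for allocators that cannot be created.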
Jon Chesterfield [Wed, 27 Jan 2021 00:22:28 +0000 (00:22 +0000)]
[libomptarget][cuda] Handle missing _v2 symbols gracefully
Follow on from D95367. Dlsym the _v2 symbols if present, otherwise use the
unsuffixed version. Builds a hashtable for the check, can revise for zero
heap allocations later if necessary.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95415
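The lookup order described above (prefer the `_v2` symbol, otherwise use the unsuffixed one) can be sketched like this. It is an illustration only: the plugin itself is C++ and uses dlsym, whereas here a plain Python set stands in for the hash table of exported symbols, and `resolve_symbol` is a hypothetical name.

```python
def resolve_symbol(name, exported):
    """Pick the symbol to bind for `name`.

    `exported` is a set of symbol names the loaded library provides
    (a stand-in for what dlsym would find). Returns the chosen symbol
    name, or None if neither variant is available.
    """
    versioned = name + "_v2"
    if versioned in exported:
        return versioned
    if name in exported:
        return name
    return None


# Illustrative symbol set; a real library exports many more entries.
exported = {"cuInit", "cuMemAlloc", "cuMemAlloc_v2"}
chosen = resolve_symbol("cuMemAlloc", exported)
```

With the set above, `cuMemAlloc` resolves to its `_v2` variant while `cuInit`, which has no suffixed variant in the set, resolves to itself.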
Nico Weber [Wed, 27 Jan 2021 00:20:23 +0000 (19:20 -0500)]
[gn build] fix get.py change
Nico Weber [Wed, 27 Jan 2021 00:19:19 +0000 (19:19 -0500)]
[gn build] restore build command removed in 9595a7ff55b6 for platforms without prebuilts
Dan Albert [Wed, 27 Jan 2021 00:04:56 +0000 (16:04 -0800)]
Disable rosegment for old Android versions.
The unwinder used by the crash handler on versions of Android prior to
API 29 did not correctly handle binaries built with rosegment, which is
enabled by default for LLD. Android only supports LLD, so it's not an
issue that this flag is not accepted by other linkers.
Reviewed By: srhines
Differential Revision: https://reviews.llvm.org/D95166
Nico Weber [Wed, 27 Jan 2021 00:11:56 +0000 (19:11 -0500)]
llvm-lib: Pull error printing code out of two functions
Slightly changes the output in error code, but no behavior change in
normal use. This is in preparation for using these two functions
elsewhere.
Vyacheslav Zakharin [Tue, 26 Jan 2021 22:45:40 +0000 (14:45 -0800)]
[libomptarget][NFC] Avoid gcc 5/6 issue with lambda captures.
Differential Revision: https://reviews.llvm.org/D95486
Duncan P. N. Exon Smith [Sat, 21 Nov 2020 02:04:32 +0000 (18:04 -0800)]
Frontend: Fix layering between create{,Default}OutputFile, NFC
Fix layering between `CompilerInstance::createDefaultOutputFile` and the
two versions of `createOutputFile`.
- Add missing configuration flags to `createDefaultOutputFile` so that
GeneratePCHAction and GenerateModuleFromModuleMapAction can use it.
They previously promised that temporary files were turned on; now
`createDefaultOutputFile` handles that logic.
- Lift the logic handling `InFile` and `Extension` to
`createDefaultOutputFile`, since it's only the callers of that
function that are using it.
- Rename the deeper of the two `createOutputFile`s to
`createOutputFileImpl` and make it private to `CompilerInstance` (to
prove that no one else is using it).
- Sink the logic for adding to `CompilerInstance::OutputFiles` down to
`createOutputFileImpl`, allowing two "optional" (but always used)
`std::string*` out parameters to be removed.
- Instead of passing a `std::error_code` out parameter into
`createOutputFileImpl`, have it return `Expected<>`.
- As a drive-by, inline `CompilerInstance::addOutputFile` into its only
caller, `createOutputFileImpl`.
Clean layering makes it easier for a future commit to extract
`createOutputFileImpl` out of `CompilerInstance`.
Differential Revision: https://reviews.llvm.org/D93248
Fangrui Song [Tue, 26 Jan 2021 23:33:37 +0000 (15:33 -0800)]
[llc] Add reportError helper and canonicalize error messages
Duncan P. N. Exon Smith [Tue, 15 Dec 2020 01:46:11 +0000 (17:46 -0800)]
Frontend: Simplify handling of non-seeking streams in CompilerInstance, NFC
Add a new `raw_pwrite_ostream` variant, `buffer_unique_ostream`, which
is like `buffer_ostream` but with unique ownership of the stream it's
wrapping. Use this in CompilerInstance to simplify the ownership of
non-seeking output streams, avoiding logic sprawled around to deal with
them specially.
This also simplifies future work to encapsulate output files in a
different class.
Differential Revision: https://reviews.llvm.org/D93260
Jessica Paquette [Tue, 26 Jan 2021 22:39:39 +0000 (14:39 -0800)]
[GlobalISel] Implement computeKnownBits for G_SEXT_INREG
Just use the existing `Known.sextInReg` implementation.
- Update KnownBitsTest.cpp.
- Update combine-redundant-and.mir for a more concrete example.
Differential Revision: https://reviews.llvm.org/D95484
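For readers unfamiliar with known-bits reasoning, the effect of a sign-extend-in-register on known bits can be sketched as below. This is a minimal Python model, not LLVM's `KnownBits` class: known bits are represented as two integer masks (`zero` and `one`), and the function name `sext_in_reg` is chosen here for illustration.

```python
def sext_in_reg(zero, one, bit_width, src_width):
    """Known bits after sign-extending the low `src_width` bits.

    `zero` has a bit set where the input bit is known to be 0, `one`
    where it is known to be 1. Returns the (zero, one) masks of the
    `bit_width`-wide result.
    """
    assert 0 < src_width <= bit_width
    low_mask = (1 << src_width) - 1
    high_mask = ((1 << bit_width) - 1) & ~low_mask
    sign_bit = 1 << (src_width - 1)

    # Only the low bits of the input survive; the rest are overwritten.
    new_zero = zero & low_mask
    new_one = one & low_mask

    # The high bits are copies of the sign bit, so they are known
    # exactly when the sign bit is known.
    if zero & sign_bit:
        new_zero |= high_mask
    elif one & sign_bit:
        new_one |= high_mask
    return new_zero, new_one
```

If the sign bit of the narrow value is unknown, every extended bit is unknown too, which is why neither mask gains high bits in that case.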
Adrian Prantl [Tue, 26 Jan 2021 22:30:10 +0000 (14:30 -0800)]
Salvage debug info for function arguments in coro-split funclets.
This patch improves the availability for variables stored in the
coroutine frame by emitting an alloca to hold the pointer to the frame
object and rewriting dbg.declare intrinsics to point inside the frame
object using salvaged DIExpressions. Finally, a new alloca is created
in the funclet to hold the FramePtr pointer to ensure that it is
available throughout the entire function at -O0.
This patch also effectively reverts D90772. The testcase updates
highlight nicely how every removed CHECK for a dbg.value is preceded
by a new CHECK for a dbg.declare.
Thanks to JunMa, Yifeng, and Bruno for their thoughtful reviews!
Differential Revision: https://reviews.llvm.org/D93497
rdar://71866936
Duncan P. N. Exon Smith [Sat, 21 Nov 2020 03:13:19 +0000 (19:13 -0800)]
Frontend: Fix memory leak in CompilerInstance::setVerboseOutputStream
Found this memory leak in `CompilerInstance::setVerboseOutputStream` by
inspection; it looks like this wasn't previously exercised, since it was
never called twice.
Differential Revision: https://reviews.llvm.org/D93249
Zhuojia Shen [Tue, 26 Jan 2021 22:00:58 +0000 (14:00 -0800)]
[ARM] Fix STRT/STRHT/STRBT input/output operands.
STRT, STRHT, and STRBT are store instructions and their source register
$Rt should be treated as an input operand instead of an output operand.
This should fix things (e.g., liveness tracking in LivePhysRegs) if
these instructions were used in CodeGen.
Differential Revision: https://reviews.llvm.org/D95074
Bjorn Pettersson [Fri, 22 Jan 2021 23:54:04 +0000 (00:54 +0100)]
[NewPM] Add ExtraVectorizerPasses support
As it looks like NewPM generally is using SimpleLoopUnswitch
instead of LoopUnswitch, this patch also uses SimpleLoopUnswitch
in the ExtraVectorizerPasses sequence (compared with LegacyPM
which uses the LoopUnswitch pass).
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D95457
Vyacheslav Zakharin [Tue, 26 Jan 2021 21:52:53 +0000 (13:52 -0800)]
[libomptarget][NFC] Use portable printf format specifiers.
Differential Revision: https://reviews.llvm.org/D95476
Valery N Dmitriev [Tue, 26 Jan 2021 19:05:16 +0000 (11:05 -0800)]
[InstCombine] Preserve FMF for powi simplifications.
Differential Revision: https://reviews.llvm.org/D95455
Valery N Dmitriev [Tue, 26 Jan 2021 18:59:27 +0000 (10:59 -0800)]
[NFC] Show instcombine powi simplifications drop FMF
Differential Revision: https://reviews.llvm.org/D95454
Craig Topper [Tue, 26 Jan 2021 19:28:25 +0000 (11:28 -0800)]
[X86] In shrinkAndImmediate, place the new constant into the topological sort.
Revert the change to use APInt::isSignedIntN from 5ff5cf8e057782e3e648ecf5ccf1d9990b53ee90.
It's clear that the games we were playing to avoid the topological
sort aren't working. So just fix it once and for all.
Fixes PR48888.
Julian Lettner [Tue, 26 Jan 2021 18:54:16 +0000 (10:54 -0800)]
[NFC][lit] Cleanup code using string interpolation
LLVM now requires Python 3.6, so we can use string interpolation to make
code more readable.
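As a small illustration of the kind of cleanup meant here, the two functions below build the same message with `%` formatting and with an f-string (available since Python 3.6). The function names and the message are made up for this sketch and are not taken from lit.

```python
def describe_old(name, passed, total):
    # Pre-3.6 style: positional % formatting.
    return "%s: %d of %d tests passed" % (name, passed, total)


def describe_new(name, passed, total):
    # Python 3.6+ style: the values appear inline where they are used.
    return f"{name}: {passed} of {total} tests passed"
```

Both produce identical strings; the f-string version keeps each value next to the place it is printed, which is the readability gain the commit refers to.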
Amara Emerson [Tue, 26 Jan 2021 20:54:41 +0000 (12:54 -0800)]
[GlobalISel][IRTranslator] Ignore the llvm.experimental.noalias.scope.decl intrinsic.
These don't generate any code.
Atmn Patel [Tue, 26 Jan 2021 20:56:48 +0000 (15:56 -0500)]
[OpenMP][Libomptarget] Fix cmake error on remote plugin
Requiring 3.15 causes a build breakage. I'm sure none of the contents actually require
3.15 or above.
Differential Revision: https://reviews.llvm.org/D95474
LLVM GN Syncbot [Tue, 26 Jan 2021 20:48:31 +0000 (20:48 +0000)]
[gn build] Port
1e634f3952aa
Fangrui Song [Tue, 26 Jan 2021 20:45:45 +0000 (12:45 -0800)]
[llvm-elfabi] Fix test after D95140
Jon Chesterfield [Tue, 26 Jan 2021 20:43:06 +0000 (20:43 +0000)]
[libomptarget][cuda] Gracefully handle missing cuda library
If using dynamic cuda, and it failed to load, it is not safe to call
cuGetErrorString.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95412
Jon Chesterfield [Tue, 26 Jan 2021 20:41:06 +0000 (20:41 +0000)]
[libomptarget][cuda] Only run tests when sure there is cuda available
Prior to D95155, building the cuda plugin implied cuda was installed locally.
With that change, every machine can build a cuda plugin, but they won't all have
cuda and/or an nvptx card installed locally.
This change enables the nvptx tests when either:
- libcuda is present
- the user has forced use of the dlopen stub
The default case when there is no cuda detected will no longer attempt to
run the tests on nvptx hardware, as was the case before D95155.
Reviewed By: jdoerfert, ronlieb
Differential Revision: https://reviews.llvm.org/D95467
Atmn Patel [Sat, 23 Jan 2021 17:23:50 +0000 (12:23 -0500)]
[OpenMP][Libomptarget] Introduce Remote Offloading Plugin
This introduces a remote offloading plugin for libomptarget. This
implementation relies on gRPC and protobuf, so this library will only
build if both libraries are available on the system. The corresponding
server is compiled to `openmp-offloading-server`.
This is a large change, but the only way to split this up is into RTL/server,
and I fear that could introduce an inconsistency between them.
Ideally, tests for this should be added to the current ones, but that is
problematic for at least one reason. Given that libomptarget registers plugins
on a first-come, first-served basis, if we wanted to offload onto a local x86
through a different process, then we'd have to either re-order the plugin list
in `rtl.cpp` (which is what I did locally for testing) or find a better
solution for runtime plugin registration in libomptarget.
Differential Revision: https://reviews.llvm.org/D95314
Haowei Wu [Tue, 26 Jan 2021 19:34:51 +0000 (11:34 -0800)]
[llvm-elfabi] Support ELF file that lacks .gnu.hash section
Before this change, when reading an ELF file, elfabi determined the number of
entries in .dynsym by reading the .gnu.hash section. This change makes
elfabi read section headers directly first, which allows elfabi
to work on ELF files that do not have .gnu.hash sections.
Differential Revision: https://reviews.llvm.org/D93362
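When a section header for .dynsym is available, the entry count follows from the standard ELF relationship between `sh_size` and `sh_entsize`. The sketch below shows only that arithmetic; it is not taken from the patch, and the function name `dynsym_entry_count` is hypothetical.

```python
def dynsym_entry_count(sh_size, sh_entsize):
    """Number of symbol table entries described by a section header.

    `sh_size` is the section size in bytes and `sh_entsize` the size of
    one entry (24 bytes for an Elf64_Sym, 16 for an Elf32_Sym).
    """
    if sh_entsize == 0:
        raise ValueError("section header has no fixed entry size")
    if sh_size % sh_entsize != 0:
        raise ValueError("section size is not a multiple of the entry size")
    return sh_size // sh_entsize
```

For example, a 240-byte .dynsym in a 64-bit ELF file holds ten entries, with no need to consult .gnu.hash.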
Louis Dionne [Tue, 26 Jan 2021 20:30:42 +0000 (15:30 -0500)]
[libc++] Fix oss-fuzz build
Fangrui Song [Tue, 26 Jan 2021 20:28:23 +0000 (12:28 -0800)]
Add -fbinutils-version= to gate ELF features on the specified binutils version
There are two use cases.
Assembler
We have accrued some code gated on MCAsmInfo::useIntegratedAssembler(). Some
features are supported by the latest GNU as, but we have to use
MCAsmInfo::useIntegratedAssembler() because the newer versions have not been widely
adopted (e.g. SHF_LINK_ORDER 'o' and 'unique' linkage in 2.35, --compress-debug-sections= in 2.26).
Linker
We want to use features supported only by LLD or very new GNU ld, or don't want
to work around older GNU ld. We currently can't represent that "we don't care
about old GNU ld". You can find such workarounds in a few other places, e.g.
Mips/MipsAsmPrinter.cpp PowerPC/PPCTOCRegDeps.cpp X86/X86MCInstrLower.cpp
AArch64 TLS workaround for R_AARCH64_TLSLD_MOVW_DTPREL_* (PR ld/18276),
R_AARCH64_TLSLE_LDST8_TPREL_LO12 (https://bugs.llvm.org/show_bug.cgi?id=36727 https://sourceware.org/bugzilla/show_bug.cgi?id=22969)
Mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER components (supported by LLD in D84001;
GNU ld feature request https://sourceware.org/bugzilla/show_bug.cgi?id=16833 may take a while before available).
This feature allows garbage collection of some unused sections (e.g. fragmented .gcc_except_table).
This patch adds `-fbinutils-version=` to clang and `-binutils-version` to llc.
It changes one codegen place in SHF_MERGE to demonstrate its usage.
`-fbinutils-version=2.35` means the produced object file does not care about GNU
ld<2.35 compatibility. When `-fno-integrated-as` is specified, the produced
assembly can be consumed by GNU as>=2.35, but older versions may not work.
`-fbinutils-version=none` means that we can use all ELF features, regardless of
GNU as/ld support.
Both clang and llc need `parseBinutilsVersion`. Such command line parsing is
usually implemented in `llvm/lib/CodeGen/CommandFlags.cpp` (LLVMCodeGen),
however, ClangCodeGen does not depend on LLVMCodeGen. So I add
`parseBinutilsVersion` to `llvm/lib/Target/TargetMachine.cpp` (LLVMTarget).
Differential Revision: https://reviews.llvm.org/D85474
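The version gate described above can be sketched as a small parser plus a comparison. This is an illustrative Python model only: the names `parse_binutils_version` and `binutils_is_at_least` are hypothetical, and treating `none` as a maximal sentinel (so that every feature check passes) is an assumption made for this sketch rather than a statement about the C++ implementation.

```python
import sys

# Assumed sentinel: "none" compares greater than any real version.
NONE_VERSION = (sys.maxsize, sys.maxsize)


def parse_binutils_version(text):
    """Parse 'MAJOR.MINOR' (or 'none') into a (major, minor) tuple."""
    if text == "none":
        return NONE_VERSION
    major_text, _, minor_text = text.partition(".")
    # Components that are missing or non-numeric are treated as 0.
    major = int(major_text) if major_text.isdecimal() else 0
    minor = int(minor_text) if minor_text.isdecimal() else 0
    return (major, minor)


def binutils_is_at_least(version, major, minor):
    """True if features needing binutils >= major.minor may be used."""
    return version >= (major, minor)
```

With `-fbinutils-version=2.35`, a feature that needs GNU as 2.26 would be allowed, while with 2.25 it would not; `none` enables everything.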
Petr Hosek [Tue, 26 Jan 2021 20:25:28 +0000 (12:25 -0800)]
Revert "Support for instrumenting only selected files or functions"
This reverts commit 4edf35f11a9e20bd5df3cb47283715f0ff38b751 because
the test fails on Windows bots.
Jim Ingham [Tue, 26 Jan 2021 20:15:09 +0000 (12:15 -0800)]
Make SBDebugger::CreateTargetWithFileAndArch work with lldb::LLDB_DEFAULT_ARCH
Second try, handling both a bogus arch string and the "null file & arch" used
to create an empty but valid target.
Also check in that case before logging (previously the logging would have
crashed.)
Valentin Clement [Tue, 26 Jan 2021 19:53:50 +0000 (14:53 -0500)]
[flang][openacc][NFC] Organize clause validity tests by directive
Split the tests from acc-clause-validity.f90 into dedicated files by directive.
The file acc-clause-validity.f90 was getting too big to be correctly maintained.
Tests are identical.
Reviewed By: SouraVX
Differential Revision: https://reviews.llvm.org/D95328
Fangrui Song [Tue, 26 Jan 2021 19:53:25 +0000 (11:53 -0800)]
CGDebugInfo CreateLimitedType: Drop file/line for RecordType with invalid location
For Clang synthesized `__va_list_tag` (`CreateX86_64ABIBuiltinVaListDecl`),
its DW_AT_decl_file/DW_AT_decl_line are arbitrarily set from `CurLoc`.
In a stage 2 `-DCMAKE_BUILD_TYPE=Debug` clang build, I observe that
in driver.cpp, DW_AT_decl_file/DW_AT_decl_line may be set to an `#include` line
(the transitively included file uses va_arg (`__builtin_va_arg`)).
This seems arbitrary. Drop that.
Reviewed By: #debug-info, dblaikie
Differential Revision: https://reviews.llvm.org/D94735
Fangrui Song [Tue, 26 Jan 2021 19:44:41 +0000 (11:44 -0800)]
CGDebugInfo: Drop Loc.isInvalid() special case from getLineNumber
`getLineNumber()` picks CurLoc if the parameter is invalid. This appears to
mainly work around missing SourceLocation information for some constructs, but
sometimes adds unintended locations.
* For `CodeGenObjC/debug-info-blocks.m`, `CurLoc` has been advanced to the closing brace. The debug line of `ImplicitVarParameter` is set to the line of `}` because this implicit parameter has an invalid `SourceLocation`. The debug line is a bit arbitrary - perhaps the location of `^{` is better.
* The file/line of Clang synthesized `__va_list_tag` is arbitrarily attached to a `#include` line. D94735
Drop the special case to make getLineNumber less magic and add CurLoc fallback in its callers instead.
Tested with stage 2 -DCMAKE_BUILD_TYPE=Debug clang, byte identical.
Reviewed By: #debug-info, aprantl
Differential Revision: https://reviews.llvm.org/D94391
Austin Kerbow [Mon, 30 Nov 2020 17:06:35 +0000 (09:06 -0800)]
[AMDGPU] Update subtarget features for new target ID support
Support for XNACK and SRAMECC is not static on some GPUs. We must be able
to differentiate between different scenarios for these dynamic subtarget
features.
The possible settings are:
- Unsupported: The GPU has no support for XNACK/SRAMECC.
- Any: Preference is unspecified. Use conservative settings that can run anywhere.
- Off: Request support for XNACK/SRAMECC Off
- On: Request support for XNACK/SRAMECC On
GCNSubtarget will track the four options based on the following criteria. If
the subtarget does not support XNACK/SRAMECC we say the setting is
"Unsupported". If no subtarget features for XNACK/SRAMECC are requested we
must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the
feature string when initializing the subtarget, the settings are "On/Off".
The defaults are updated to be conservatively correct, meaning if no setting
for XNACK or SRAMECC is explicitly requested, defaults will be used which
generate code that can be run anywhere. This corresponds to the "Any" setting.
Differential Revision: https://reviews.llvm.org/D85882
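The criteria above amount to a small decision procedure, sketched here in Python. The enum and function names are hypothetical and this is not GCNSubtarget code; the only thing borrowed from LLVM is the usual `+feature` / `-feature` comma-separated feature string syntax.

```python
from enum import Enum


class TargetIDSetting(Enum):
    UNSUPPORTED = "Unsupported"
    ANY = "Any"
    OFF = "Off"
    ON = "On"


def target_id_setting(feature, supported, feature_string):
    """Derive the setting for `feature` (e.g. 'xnack' or 'sramecc').

    `supported` says whether the GPU supports the feature at all;
    `feature_string` is a comma-separated list such as '+xnack,-sramecc'.
    """
    if not supported:
        return TargetIDSetting.UNSUPPORTED

    # Nothing requested explicitly: stay conservative.
    setting = TargetIDSetting.ANY
    for item in feature_string.split(","):
        item = item.strip()
        if item == "+" + feature:
            setting = TargetIDSetting.ON
        elif item == "-" + feature:
            setting = TargetIDSetting.OFF
    return setting
```

The "Any" result is what makes the new defaults conservatively correct: code generated under it has to run whether the feature ends up on or off.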
Atmn [Tue, 26 Jan 2021 19:19:10 +0000 (14:19 -0500)]
[OpenMP][Libomptarget] Introduce changes to support remote plugin
In order to support remote execution, we need to be able to send the
target binary description to the remote host for registration (and
consequent deregistration). To support this, I added these two
optional new functions to the plugin API:
- `__tgt_rtl_register_lib`
- `__tgt_rtl_unregister_lib`
These functions will be called to properly manage the instance of
libomptarget running on the remote host.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D93293
LLVM GN Syncbot [Tue, 26 Jan 2021 19:12:09 +0000 (19:12 +0000)]
[gn build] Port 4edf35f11a9e
Petr Hosek [Fri, 15 Jan 2021 09:14:37 +0000 (01:14 -0800)]
Support for instrumenting only selected files or functions
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.
Differential Revision: https://reviews.llvm.org/D94820
Jon Chesterfield [Tue, 26 Jan 2021 19:05:21 +0000 (19:05 +0000)]
[libomptarget][devicertl][amdgpu] Fix build, variable renaming error
Nathan James [Tue, 26 Jan 2021 18:59:29 +0000 (18:59 +0000)]
[clangd] FindTarget resolves base specifier
FindTarget on the virtual keyword or access specifier of a base specifier will now resolve to type of the base specifier.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D95338
Nathan James [Tue, 26 Jan 2021 18:58:53 +0000 (18:58 +0000)]
[clangd] Selection handles CXXBaseSpecifier
Selection now includes the virtual and access modifier as part of their range for cxx base specifiers.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D95231
Adhemerval Zanella [Tue, 26 Jan 2021 18:20:26 +0000 (15:20 -0300)]
[ARM] [ELF] Fix ARMMaterializeGV for Indirect calls
Recent shouldAssumeDSOLocal changes (introduced by 961f31d8ad14c66)
do not take the relocation model into consideration anymore. The ARM
fast-isel pass uses the function's return value to set whether a global symbol
is loaded indirectly or not, and without the expected information
llvm now generates an extra load for the following code:
```
$ cat test.ll
@__asan_option_detect_stack_use_after_return = external global i32
define dso_local i32 @main(i32 %argc, i8** %argv) #0 {
entry:
%0 = load i32, i32* @__asan_option_detect_stack_use_after_return, align 4
%1 = icmp ne i32 %0, 0
br i1 %1, label %2, label %3
2:
ret i32 0
3:
ret i32 1
}
attributes #0 = { noinline optnone }
$ llc test.ll -o -
[...]
main:
.fnstart
[...]
movw r0, :lower16:__asan_option_detect_stack_use_after_return
movt r0, :upper16:__asan_option_detect_stack_use_after_return
ldr r0, [r0]
ldr r0, [r0]
cmp r0, #0
[...]
```
And without 'optnone' it produces:
```
[...]
main:
.fnstart
[...]
movw r0, :lower16:__asan_option_detect_stack_use_after_return
movt r0, :upper16:__asan_option_detect_stack_use_after_return
ldr r0, [r0]
clz r0, r0
lsr r0, r0, #5
bx lr
[...]
```
This triggered a lot of invalid memory accesses in sanitizers for
arm-linux-gnueabihf. I checked this patch with both a stage1 build with
gcc and a stage2 bootstrap, and it fixes all the Linux sanitizer
issues.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D95379
Craig Topper [Tue, 26 Jan 2021 18:47:49 +0000 (10:47 -0800)]
[RISCV] Have customLegalizeToWOp truncate to the original type instead of i32 now that we use it for i8/i16 as well.
239cfbccb0509da1a08d9e746706013b732e646b added support for legalizing
i8/i16 UDIV/UREM/SDIV to use *W instructions. So we need to truncate
to i8/i16 if we're legalizing one of those.
Eric Schweitz [Tue, 26 Jan 2021 18:18:44 +0000 (10:18 -0800)]
[mlir] sret and byval now require a type argument when constructed.
Fixes the LLVM code gen bugs and adds the missing tests.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D95378
Julian Lettner [Fri, 15 Jan 2021 01:04:39 +0000 (17:04 -0800)]
Reland "[lit] Use os.cpu_count() to cleanup TODO"
The initial problem with the remaining bot config was resolved.
We can now use Python3. Let's use `os.cpu_count()` to clean up this
helper.
Differential Revision: https://reviews.llvm.org/D94734
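A helper built on `os.cpu_count()` can be as short as the sketch below. The function name `detect_cpus` and the fallback to 1 are assumptions made for this illustration; `os.cpu_count()` itself is documented to return `None` when the count cannot be determined, so some fallback is needed.

```python
import os


def detect_cpus():
    """Return the number of CPUs, falling back to 1 if it is unknown."""
    # os.cpu_count() returns None when the count is undetermined.
    return os.cpu_count() or 1
```

This replaces the platform-specific probing that was typically needed before Python 3 could be assumed.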
Raphael Isemann [Tue, 26 Jan 2021 17:13:45 +0000 (18:13 +0100)]
[lldb][NFC] Another attempt to fix GCC 5.x compilation
37510f69b4cb8d76064f108d57bebe95984a23ae tried to fix GCC 5.x compilation
by making the enum which is used as a unordered_map key unscoped. However it
seems that in GCC 5.x, enum keys are not supported *at all* in unordered_maps
(at least that's what some trial&error on godbolt tells me). This updates the
workaround to just use an int until GCC 5.x support is dropped.
Christian Sigg [Tue, 26 Jan 2021 13:24:43 +0000 (14:24 +0100)]
[mlir] Set CUDA/ROCm context before creating resources.
The current context is thread-local state, and in preparation for GPU async execution (on multiple threads) we need to set the context before calling APIs that create resources.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D94495
Matt Arsenault [Mon, 21 Sep 2020 12:43:06 +0000 (08:43 -0400)]
AMDGPU: Fix redundant FP spilling/assert in some functions
If a function has stack objects, and a call, we require an FP. If we
did not initially have any stack objects, and only introduced them
during PrologEpilogInserter for CSR VGPR spills, SILowerSGPRSpills
would end up spilling the FP register as if it were a normal
register. This would result in an assert in a debug build, or
redundant handling of the FP register in a release build.
Try to predict that we will have an FP later, although this is ugly.
Matt Arsenault [Thu, 21 Jan 2021 18:19:50 +0000 (13:19 -0500)]
AMDGPU: Add assertion to determineCalleeSaves
Make sure this isn't getting called multiple times. I was surprised we
were modifying the function here, which I think is a bit questionable.
Shilei Tian [Tue, 26 Jan 2021 17:28:15 +0000 (12:28 -0500)]
[OpenMP][deviceRTLs] Build the deviceRTLs with OpenMP instead of target dependent language
Starting with this patch (plus some already landed patches), `deviceRTLs` is treated as a regular OpenMP program with just `declare target` regions. In this way, ideally, `deviceRTLs` can be written in OpenMP directly. No CUDA, no HIP anymore. (Well, AMD is still working on getting it to work. For now AMDGCN still uses the original way to compile.) However, some target specific functions are still required, but they're no longer written in a target specific language. For example, the CUDA parts have all been refined by replacing CUDA intrinsics and builtins with LLVM/Clang/NVVM intrinsics.
Here is a list of changes in this patch.
1. For NVPTX, `DEVICE` is defined empty in order to make the common parts still work with AMDGCN. Later once AMDGCN is also available, we will completely remove `DEVICE` or probably some other macros.
2. Shared variable is implemented with OpenMP allocator, which is defined in `allocator.h`. Again, this feature is not available on AMDGCN, so two macros are redefined properly.
3. CUDA header `cuda.h` is dropped in the source code. In order to deal with code difference in various CUDA versions, we build one bitcode library for each supported CUDA version. For each CUDA version, the highest PTX version it supports will be used, just as what we currently use for CUDA compilation.
4. Correspondingly, compiler driver is also updated to support CUDA version encoded in the name of bitcode library. Now the bitcode library for NVPTX is named as `libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc`, such as `libomptarget-nvptx-cuda_80-sm_20.bc`.
With this change, there are also multiple features to be expected in the near future:
1. CUDA will be completely dropped when compiling OpenMP. At that point, we will also build bitcode libraries for all supported SMs, multiplied by all supported CUDA versions.
2. Atomic operations used in `deviceRTLs` can be replaced by `omp atomic` if OpenMP 5.1 feature is fully supported. For now, the IR generated is totally wrong.
3. Target specific parts will be wrapped into `declare variant` with `isa` selector if it can work properly. No target specific macro is needed anymore.
4. (Maybe more...)
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D94745
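The naming scheme `libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc` and the "all SMs multiplied by all CUDA versions" build matrix can be sketched as follows. The function names are hypothetical, and the version and SM lists passed in are placeholders supplied by the caller, not the sets the build actually uses.

```python
from itertools import product


def nvptx_bitcode_library_name(cuda_version, sm_number):
    """Build the bitcode library file name for one CUDA/SM combination."""
    return f"libomptarget-nvptx-cuda_{cuda_version}-sm_{sm_number}.bc"


def all_bitcode_libraries(cuda_versions, sm_numbers):
    """One library per (CUDA version, SM) pair, in a stable order."""
    return [
        nvptx_bitcode_library_name(cuda, sm)
        for cuda, sm in product(cuda_versions, sm_numbers)
    ]
```

Calling `nvptx_bitcode_library_name(80, 20)` yields the example name from the commit message, and the size of the full list is simply the product of the two input lengths.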
Dave Lee [Tue, 26 Jan 2021 00:52:32 +0000 (16:52 -0800)]
[lldb] Remove unused ThreadPlanStack::GetStackOfKind (NFC)
This function isn't used.
Differential Revision: https://reviews.llvm.org/D95411
Kadir Cetinkaya [Tue, 26 Jan 2021 06:39:43 +0000 (07:39 +0100)]
[clangd] Add std::size_t to StdSymbol mapping
This is a common symbol that's missing from our mapping because
cppreference yields multiple headers.
Add it manually by picking cstddef to prevent insertion of some stdlib-internal
headers instead.
Fixes https://github.com/clangd/clangd/issues/666.
Differential Revision: https://reviews.llvm.org/D95423
Alex Zinenko [Mon, 25 Jan 2021 17:17:19 +0000 (18:17 +0100)]
[mlir] Add Python bindings for IntegerSet
This follows up on the introduction of C API for the same object and is similar
to AffineExpr and AffineMap.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D95437
Sanjay Patel [Tue, 26 Jan 2021 15:43:38 +0000 (10:43 -0500)]
[LoopVectorize] add test for fmin/fmax FMF propagation; NFC
The existing test has less FMF than we might expect if
our FMF was fixed (on all FP values), so this additional
test is intended to check propagation in a more "normal"
example.
Sanjay Patel [Tue, 26 Jan 2021 15:25:36 +0000 (10:25 -0500)]
[LoopUtils] do not initialize Cmp predicate unnecessarily; NFC
The switch must set the predicate correctly; anything else
should lead to unreachable/assert.
I'm trying to fix FMF propagation here and the callers,
so this is a preliminary cleanup.
Simon Pilgrim [Tue, 26 Jan 2021 16:19:18 +0000 (16:19 +0000)]
Fix null dereference static analysis warning. NFCI.
Replace cast_or_null<> with cast<> as we immediately dereference the pointer afterward so we're not expecting a null pointer.
Simon Pilgrim [Tue, 26 Jan 2021 16:09:27 +0000 (16:09 +0000)]
[AMDGPU] HSAMD::fromString - replace std::string arg with StringRef. NFCI.
Removes an unnecessary chain of StringRef -> std::string -> StringRef conversions
Mircea Trofin [Tue, 26 Jan 2021 04:44:41 +0000 (20:44 -0800)]
[NFC] Disallow unused prefixes under clang/test/CodeGen
Differential Revision: https://reviews.llvm.org/D95417
Alexander Belyaev [Tue, 26 Jan 2021 15:59:19 +0000 (16:59 +0100)]
[mlir][nfc] Move `getInnermostParallelLoops` to SCF/Transforms/Utils.h.
Simon Pilgrim [Tue, 26 Jan 2021 15:51:49 +0000 (15:51 +0000)]
[Sema] diagnoseEquivalentInternalLinkageDeclarations - assert for non-null NamedDecl. NFCI.
Fixes clang static analysis warnings.
Simon Pilgrim [Tue, 26 Jan 2021 15:33:31 +0000 (15:33 +0000)]
[AMDGPU] Fix null-dereference static analysis warnings. NFCI.
Avoid repeated calls to isZeroValue() and check for a null pointer before dereferencing a dyn_cast<>.
George Rokos [Tue, 26 Jan 2021 15:37:21 +0000 (07:37 -0800)]
[libomptarget][NFC] Fixed obsolete function names in comments
Matt Arsenault [Mon, 18 Jan 2021 15:49:48 +0000 (10:49 -0500)]
AMDGPU: Clear IsSSA property in SIFormMemoryClauses
Fixes verifier error when writing MIR testcases
Florian Hahn [Fri, 22 Jan 2021 22:49:45 +0000 (22:49 +0000)]
[LoopUnswitch] Avoid partially unswitching too aggressively.
This patch adds additional checks to avoid partial unswitching
in cases where it won't be profitable, e.g. because the path directly
exits the loop anyway.
Florian Hahn [Fri, 22 Jan 2021 22:47:43 +0000 (22:47 +0000)]
[LoopUnswitch] Add some additional tests.
Add a few additional tests where partial unswitching is not really
profitable and should be avoided.
Simon Pilgrim [Tue, 26 Jan 2021 14:57:37 +0000 (14:57 +0000)]
Fix signed/unsigned comparison warning. NFCI.
Sander de Smalen [Tue, 26 Jan 2021 14:03:55 +0000 (14:03 +0000)]
[CostModel] Handle CTLZ and CTTZ in getTypeBasedIntrinsicInstrCost
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D95355
Sebastian Neubauer [Tue, 26 Jan 2021 11:44:02 +0000 (12:44 +0100)]
[AMDGPU] Add IntrWillReturn to three intrinsics
None of these can terminate a wave or lane.
With these, all intrinsics are IntrWillReturn except those that change
exec or can terminate the wave.
Not marking intrinsics as WillReturn may prevent optimizations in the
future: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148047.html
Differential Revision: https://reviews.llvm.org/D95436
Mirko Brkusanin [Tue, 26 Jan 2021 14:21:11 +0000 (15:21 +0100)]
[AMDGPU] Fix use of HasModifiers in VopProfile
HasModifiers should be true if at least one modifier is used.
This should make the use of this field a bit more consistent.
Differential Revision: https://reviews.llvm.org/D94795
Florian Hahn [Tue, 26 Jan 2021 13:43:39 +0000 (13:43 +0000)]
[Passes] Run peeling as part of simple/full loop unrolling.
Loop peeling removes conditions from loop bodies that become invariant
after a small number of iterations. When triggered, this leads to fewer
compares and possibly PHIs in loop bodies, enabling further
optimizations. The current cost-model of loop peeling should be quite
conservative/safe, i.e. only peel if a condition in the loop becomes
known after peeling.
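As a rough source-level illustration of the transformation described above (the pass itself works on LLVM IR, and the function names here are made up for this sketch), peeling the first iteration removes a condition that is only true once:

```cpp
#include <cstddef>
#include <vector>

// Original loop: `i == 0` holds only on the first iteration. From the second
// iteration on the condition is invariant (always false), yet it is still
// evaluated every time around the loop.
long sumWithBranch(const std::vector<int> &v) {
  long acc = 0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (i == 0)
      acc += 2L * v[i]; // special handling for the first element
    else
      acc += v[i];
  }
  return acc;
}

// Hand-peeled equivalent: iteration 0 is pulled out in front of the loop, so
// the remaining loop body contains no compare on `i == 0` at all.
long sumPeeled(const std::vector<int> &v) {
  long acc = 0;
  if (!v.empty())
    acc += 2L * v[0]; // peeled iteration 0
  for (std::size_t i = 1; i < v.size(); ++i)
    acc += v[i]; // branch-free body, simpler for later passes
  return acc;
}
```

Both functions return the same value for any input; the second form is the shape that later passes such as the vectorizer tend to handle more easily.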
For example, see PR47671, where loop peeling enables vectorization by
removing a PHI the vectorizer does not understand. Granted, the
loop-vectorizer could also be taught about constant PHIs, but loop
peeling is likely to enable other optimizations as well.
This has an impact on quite a few benchmarks from
MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto, for example:
Same hash: 186 (filtered out)
Remaining: 51
Metric: loop-vectorize.LoopsVectorized
Program                                          base     patch     diff
test-suite...ve-susan/automotive-susan.test      8.00      9.00    12.5%
test-suite...nal/skidmarks10/skidmarks.test     35.00     31.00   -11.4%
test-suite...lications/sqlite3/sqlite3.test     41.00     43.00     4.9%
test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test     25.00     26.00     4.0%
test-suite...006/450.soplex/450.soplex.test     88.00     89.00     1.1%
test-suite...TimberWolfMC/timberwolfmc.test    120.00    119.00    -0.8%
test-suite.../CINT2006/403.gcc/403.gcc.test    215.00    216.00     0.5%
test-suite...006/447.dealII/447.dealII.test    957.00    958.00     0.1%
test-suite...ternal/HMMER/hmmcalibrate.test     75.00     75.00     0.0%
Same hash: 186 (filtered out)
Remaining: 51
Metric: loop-vectorize.LoopsAnalyzed
Program                                          base     patch     diff
test-suite...ks/Prolangs-C/agrep/agrep.test    440.00    434.00    -1.4%
test-suite...nal/skidmarks10/skidmarks.test    312.00    308.00    -1.3%
test-suite...marks/7zip/7zip-benchmark.test   6399.00   6323.00    -1.2%
test-suite...lications/minisat/minisat.test    134.00    135.00     0.7%
test-suite...rks/FreeBench/pifft/pifft.test    295.00    297.00     0.7%
test-suite...TimberWolfMC/timberwolfmc.test   1879.00   1869.00    -0.5%
test-suite...pplications/treecc/treecc.test    689.00    691.00     0.3%
test-suite...T2000/300.twolf/300.twolf.test   1593.00   1597.00     0.3%
test-suite.../Benchmarks/Bullet/bullet.test   1394.00   1392.00    -0.1%
test-suite...ications/JM/ldecod/ldecod.test   1431.00   1429.00    -0.1%
test-suite...6/464.h264ref/464.h264ref.test   2229.00   2230.00     0.0%
test-suite...lications/sqlite3/sqlite3.test   2590.00   2589.00    -0.0%
test-suite...ications/JM/lencod/lencod.test   2732.00   2733.00     0.0%
test-suite...006/453.povray/453.povray.test   3395.00   3394.00    -0.0%
Note the -11% regression in number of loops vectorized for skidmarks. I
suspect this corresponds to the fact that those loops are gone now (see
the reduction in number of loops analyzed by LV).
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D88471
Lang Hames [Tue, 26 Jan 2021 12:56:01 +0000 (23:56 +1100)]
[ORC] Attempt to auto-claim responsibility for weak defs in ObjectLinkingLayer.
Compilers may insert new definitions during compilation, e.g. EH personality
function pointers, or named constant pool entries. This commit causes
ObjectLinkingLayer to attempt to claim responsibility for all weak definitions
in objects as they're linked. This is always safe (first claimant for each
symbol is granted responsibility, subsequent claims are rejected without error)
and prevents compiler-injected symbols from being dead-stripped (which they
will be if they remain unclaimed by anyone).
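The "first claimant wins, later claims are rejected without error" rule can be pictured with a toy model. This is only a sketch: `ClaimTable` and its methods are hypothetical names for illustration and are not the ORC API.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the bookkeeping of which symbols already have a
// responsible owner. Not an ORC class.
class ClaimTable {
public:
  // Returns true if this call took responsibility for `symbol`, false if
  // someone claimed it earlier. Neither outcome is treated as an error.
  bool tryClaim(const std::string &symbol) {
    return claimed_.insert(symbol).second;
  }

  // Unclaimed symbols are the ones at risk of being dead-stripped.
  bool isClaimed(const std::string &symbol) const {
    return claimed_.count(symbol) != 0;
  }

private:
  std::unordered_set<std::string> claimed_;
};
```

Because a rejected claim is harmless, attempting to claim every weak definition is safe even when some of them already have an owner.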
This change was motivated by errors seen by an out-of-tree client while testing
eh-frame support in JITLink ELF/x86-64: IR containing exceptions didn't define
DW.ref.__gxx_personality_v0 (since it's added by CodeGen), and this caused
DW.ref.__gxx_personality_v0 to be dead-stripped leading to linker failures.
No test case yet: We won't have a way to test in-tree until we enable JITLink
for lli on Linux.
Andrzej Warzynski [Tue, 26 Jan 2021 12:54:12 +0000 (12:54 +0000)]
Revert "[flang] Search for #include "file" in right directory"
This reverts commit
d987b61b1dce9948801ac37704477e7c257100b1.
As pointed out in https://reviews.llvm.org/D95388, the reverted commit
causes build failures in the following Flang buildbots:
* http://lab.llvm.org:8011/#/builders/32/builds/2642
* http://lab.llvm.org:8011/#/builders/33/builds/2131
* http://lab.llvm.org:8011/#/builders/135/builds/1473
* http://lab.llvm.org:8011/#/builders/66/builds/1559
* http://lab.llvm.org:8011/#/builders/134/builds/1409
* http://lab.llvm.org:8011/#/builders/132/builds/1817
I'm guessing that the patch was only tested with
`FLANG_BUILD_NEW_DRIVER=Off` (i.e. the default). The builders listed
above set `FLANG_BUILD_NEW_DRIVER` to `On`.
Although fixing the build is relatively easy, the reverted patch
modifies the behaviour of the frontend, which breaks driver tests. In
particular, in https://reviews.llvm.org/D93453 support for `-I` was
added that depends on the current behaviour. The reverted patch
changes that behaviour. Either the tests have to be updated or the
change fine-tuned.
Zarko Todorovski [Tue, 26 Jan 2021 12:43:22 +0000 (07:43 -0500)]
Remove requirement for -maltivec to be used when using -mabi=vec-extabi or -mabi=vec-default when not using vector code
The previous implementation required that `-maltivec` be specified when using either `-mabi=vec-extabi` or `-mabi=vec-default`; this patch removes that requirement.
Reviewed By: cebowleratibm
Differential Revision: https://reviews.llvm.org/D94986
Lang Hames [Tue, 26 Jan 2021 12:52:44 +0000 (23:52 +1100)]
[ORC] Fix debug logging message.
Lang Hames [Tue, 26 Jan 2021 12:46:33 +0000 (23:46 +1100)]
[JITLink][ELF/x86-64] When building PLT stub, use -4 offset for PCRel32.
This is required for ELF where PCRel32 doesn't implicitly subtract 4.
No test case yet: I haven't figured out a good way to test stub
generation -- this may require extensions to jitlink-check.
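The arithmetic behind the -4 can be sketched as follows. This assumes the usual x86-64 situation where the 4-byte displacement field is the last part of the instruction, so the CPU measures the displacement from the address just past that field. The function names are illustrative only and do not come from JITLink.

```cpp
#include <cstdint>

// ELF-style PC-relative fixup value: S + A - P, where S is the target
// address, A the addend and P the address of the 4-byte field being patched.
std::int32_t pcRel32(std::uint64_t target, std::int64_t addend,
                     std::uint64_t fixupAddr) {
  return static_cast<std::int32_t>(static_cast<std::int64_t>(target) + addend -
                                   static_cast<std::int64_t>(fixupAddr));
}

// What the CPU expects: the distance from the end of the 4-byte field
// (assumed here to be the address of the next instruction) to the target.
std::int32_t cpuDisplacement(std::uint64_t target, std::uint64_t fixupAddr) {
  return static_cast<std::int32_t>(target - (fixupAddr + 4));
}
```

With an addend of -4 the two values agree; with an addend of 0 the fixup would point 4 bytes past the intended target.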
Alexey Bataev [Tue, 26 Jan 2021 12:43:31 +0000 (07:43 -0500)]
[LIBOMPTARGET]FIX define declaration, NFC
Fixed declaration of define by adding a comma symbol. Required to fix build without profiling.
Alex Zinenko [Tue, 26 Jan 2021 12:30:45 +0000 (13:30 +0100)]
[mlir] drop unused statics
Adhemerval Zanella [Wed, 13 Jan 2021 17:29:16 +0000 (17:29 +0000)]
[LLD][ELF][AArch64] Add support for R_AARCH64_LD64_GOTPAGE_LO15 relocation
It is not used by LLVM, but GCC might generate it when compiling
with -fpie, as indicated by PR#40357 [1].
[1] https://bugs.llvm.org/show_bug.cgi?id=40357
Dmitry Preobrazhensky [Tue, 26 Jan 2021 11:52:24 +0000 (14:52 +0300)]
[AMDGPU][MC] Refactored exp tgt handling
Summary:
- Separated tgt encoding from parsing;
- Separated tgt decoding from printing;
- Improved error handling;
- Disabled leading zeroes in index. The following code is no longer accepted: exp pos00 v3, v2, v1, v0
Reviewers: arsenm, rampitec, foad
Differential Revision: https://reviews.llvm.org/D95216
Eugene Zhulenev [Tue, 26 Jan 2021 10:40:43 +0000 (02:40 -0800)]
[mlir] Async: add a separate pass to lower from async to async.coro and async.runtime
Depends On D95000
Move async.execute outlining and async -> async.runtime lowering into the separate Async transformation pass
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D95311
David Sherwood [Tue, 26 Jan 2021 10:59:36 +0000 (10:59 +0000)]
[SVE] Fix some logical arithmetic tests
There were some right-shift tests in
CodeGen/AArch64/sve-int-arith-imm.ll
that were being folded away because we were shifting all the bits
out to the right. I've updated the tests to ensure this doesn't
happen.
Marek Kurdej [Tue, 26 Jan 2021 10:58:56 +0000 (11:58 +0100)]
Revert "[clang-format] add case aware include sorting"
This reverts commit
3395a336b02538d0bb768ccfae11c9b6151b102e as there was a post-merge doubt about option naming and type.
Eugene Zhulenev [Mon, 25 Jan 2021 22:14:12 +0000 (14:14 -0800)]
[mlir:async] Use ODS to define async types
Depends On D94923
Migrate Async dialect to ODS `TypeDef`
Reviewed By: ftynse, rriddle
Differential Revision: https://reviews.llvm.org/D95000
Georgii Rymar [Mon, 25 Jan 2021 14:15:31 +0000 (17:15 +0300)]
[yaml2obj][obj2yaml] - Improve how we set/dump the sh_entsize field.
We already set the `sh_entsize` field in a single place
for all non-implicit sections.
This patch reorders the logic slightly and with it
we finally have only one place where the `sh_entsize` is set.
obj2yaml will no longer dump the `EntSize` key for `SHT_DYNSYM/SHT_SYMTAB` sections
when the value of `sh_entsize` is equal to `sizeof(Elf_Sym)`.
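For reference, a 64-bit ELF symbol entry is 0x18 bytes, which is the default value being elided. The struct below copies the ELF64 field layout for illustration; `Elf64SymSketch` and `shouldDumpEntSize` are hypothetical names, not the obj2yaml implementation.

```cpp
#include <cstdint>

// Field layout of a 64-bit ELF symbol table entry, reproduced here so the
// example does not depend on <elf.h> or LLVM headers.
struct Elf64SymSketch {
  std::uint32_t st_name;   // offset into the string table
  unsigned char st_info;   // binding and type
  unsigned char st_other;  // visibility
  std::uint16_t st_shndx;  // section index
  std::uint64_t st_value;  // symbol value or address
  std::uint64_t st_size;   // symbol size
};

static_assert(sizeof(Elf64SymSketch) == 0x18,
              "a 64-bit ELF symbol entry is 24 bytes");

// Hypothetical helper mirroring the rule above: only emit `EntSize` when it
// differs from the default entry size for a symbol table.
bool shouldDumpEntSize(std::uint64_t shEntSize) {
  return shEntSize != sizeof(Elf64SymSketch);
}
```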
Note that this also seems to have revealed an issue in llvm-objcopy:
Previously yaml2obj set the `sh_entsize` for the `.symtab` section to 0x18,
now it sets it for `SHT_SYMTAB` sections, i.e. by type.
But the `llvm-objcopy/ELF/only-keep-debug.test` has a `.symtab` section of type `SHT_STRTAB`,
and now yaml2obj sets the `sh_entsize` to 0 for it.
I had to update the corresponding check lines for `ES`, but the behavior of
`llvm-objcopy` should be fixed instead, I think.
I've added a TODO and a comment.
Differential revision: https://reviews.llvm.org/D95364
Martin Storsjö [Tue, 26 Jan 2021 10:29:14 +0000 (12:29 +0200)]
[llvm-nm] Silence a gcc warning about a stray semicolon. NFC.
Ben Shi [Tue, 26 Jan 2021 09:50:56 +0000 (17:50 +0800)]
[update_llc_test_checks] Support AVR
Reviewed By: arichardson
Differential Revision: https://reviews.llvm.org/D95240
Georgii Rymar [Tue, 26 Jan 2021 09:42:24 +0000 (12:42 +0300)]
[LLDB][test] - Fix test after yaml2obj change.
D95354 started to set the sh_link field for SHT_SYMTAB sections.
Previously it was set for symbol tables based on their names (e.g. ".symtab").
This test now crashes; see:
http://lab.llvm.org:8011/#/builders/68/builds/5911
I updated it to restore the old behavior.
Jan Svoboda [Tue, 26 Jan 2021 08:35:32 +0000 (09:35 +0100)]
[clang][cli] Port GPU-related language options to marshalling system
Port some GPU-related language options to the marshalling system for automatic command line parsing and generation.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D95345
Georgii Rymar [Mon, 25 Jan 2021 13:26:21 +0000 (16:26 +0300)]
[yaml2obj] - Refine how we set the sh_link field. NFCI.
This refactors the logic that sets the `sh_link` field.
With this patch we set it in a single place for all sections.
Differential revision: https://reviews.llvm.org/D95354