Sander de Smalen [Tue, 28 Sep 2021 18:44:10 +0000 (19:44 +0100)]
[AArch64][SVE] Fix extract_subvector patterns for unpacked fp types.
The patterns added in D110163 were incorrect, since they used the wrong
element widths for their shuffles.
Example for nxv2f16 extract_subvector(nxv8f16 %in, 6):
<a|b|c|d|e|f|g|h>
^^^
extract g and h.
=> UUNPKHI .h -> .s results in:
<e |f |g |h >
=> UUNPKHI .s -> .d results in:
<g |h >
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D110523
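The two unpack steps in the example above can be modelled in a few lines. This is only an illustrative Python sketch, not the TableGen patterns from the patch: `uunpkhi` is a hypothetical helper that models the lane selection of UUNPKHI (keep the high half of the lanes, each widened to twice its size), and the vector is assumed to have vscale = 1.

```python
def uunpkhi(lanes):
    """Model the lane selection of UUNPKHI: keep the high half of the lanes.

    The real instruction also zero-extends each surviving element to twice
    its width; only the lane contents matter for this illustration.
    """
    half = len(lanes) // 2
    return lanes[half:]


packed = list("abcdefgh")   # nxv8f16 input, one letter per .h lane
step1 = uunpkhi(packed)     # .h -> .s
step2 = uunpkhi(step1)      # .s -> .d
print(step1, step2)
```

Applying the unpack once per halving of the element count is what leaves `g` and `h` in the unpacked nxv2f16 result.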
Martin Storsjö [Thu, 23 Sep 2021 10:38:36 +0000 (13:38 +0300)]
[X86] Fix handling of i128<->fp on Windows
On Windows, i128 arguments are passed as indirect arguments, and
they are returned in xmm0.
This is mostly fixed up by `WinX86_64ABIInfo::classify` in Clang, making
the IR functions return v2i64 instead of i128, and making the arguments
indirect. However for cases where libcalls are generated in the target
lowering, the lowering uses the default x86_64 calling convention for
i128, where they are passed/returned as a register pair.
Add custom lowering logic, similar to the existing logic for i128
div/mod (added in 4a406d32e97b1748c4eed6674a2c1819b9cf98ea),
manually making the libcall (while overriding the return type to
v2i64 or passing the arguments as pointers to arguments on the stack).
X86CallingConv.td doesn't seem to handle i128 at all, otherwise
the windows specific behaviours would ideally be implemented as
overrides there, in generic code, handling these cases automatically.
This fixes https://bugs.llvm.org/show_bug.cgi?id=48940.
Differential Revision: https://reviews.llvm.org/D110413
Amara Emerson [Wed, 29 Sep 2021 09:54:18 +0000 (02:54 -0700)]
[AArch64][GlobalISel] Add selection tests for vector G_UMULH/G_SMULH.
We already import these patterns from SelectionDAG.
David Spickett [Wed, 29 Sep 2021 09:52:27 +0000 (10:52 +0100)]
[AMDGPU] Require AMDGPU target for ASAN instrumentation tests
Should fix test failure on Arm/AArch64 quick bots which
only build those targets.
https://lab.llvm.org/buildbot/#/builders/171/builds/4077
Jay Foad [Wed, 29 Sep 2021 09:11:57 +0000 (10:11 +0100)]
[RemoveRedundantDebugValues] Enable machine verification after this pass
Machine verification after RemoveRedundantDebugValues has been disabled
since the pass was first added in D105279, but I guess this was just due
to copy-and-paste. Enabling it does not show any problems in check-llvm
in an LLVM_ENABLE_EXPENSIVE_CHECKS build.
Differential Revision: https://reviews.llvm.org/D110688
Igor Kudrin [Wed, 29 Sep 2021 09:36:37 +0000 (16:36 +0700)]
[llvm-objcopy] Rename relocation sections together with their targets.
Currently, llvm-objcopy renames only sections that are specified
explicitly in --rename-section, while GNU objcopy keeps names of
relocation sections in sync with their targets. For example:
> readelf -S test.o
...
[ 1] .foo PROGBITS
[ 2] .rela.foo RELA
> objcopy --rename-section .foo=.bar test.o gnu.o
> readelf -S gnu.o
...
[ 1] .bar PROGBITS
[ 2] .rela.bar RELA
> llvm-objcopy --rename-section .foo=.bar test.o llvm.o
> readelf -S llvm.o
...
[ 1] .bar PROGBITS
[ 2] .rela.foo RELA
This patch makes llvm-objcopy match the behavior of GNU objcopy more closely.
Differential Revision: https://reviews.llvm.org/D110352
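The renaming rule shown in the readelf output above can be sketched as follows. This is a simplified Python model, not the llvm-objcopy implementation: `rename_sections` is a hypothetical helper, it recognises relocation sections purely by their `.rel`/`.rela` name prefix, and it assumes that an explicit rename of a relocation section takes priority.

```python
def rename_sections(names, renames):
    """Rename sections and keep .rel/.rela names in sync with their targets.

    names   -- list of section names found in the object
    renames -- dict mapping old section names to new section names
    """
    result = []
    for name in names:
        if name in renames:
            # Explicitly requested rename (assumed to win in this sketch).
            result.append(renames[name])
            continue
        new_name = name
        for prefix in (".rela", ".rel"):
            target = name[len(prefix):]
            if name.startswith(prefix) and target in renames:
                # Relocation section follows the new name of its target.
                new_name = prefix + renames[target]
                break
        result.append(new_name)
    return result


print(rename_sections([".foo", ".rela.foo"], {".foo": ".bar"}))
```

A real tool would identify the target through the relocation section's header fields rather than its name; the prefix check here only keeps the sketch short.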
Amara Emerson [Wed, 29 Sep 2021 09:09:21 +0000 (02:09 -0700)]
[AArch64][GlobalISel] Make some vector G_SMULH/G_UMULH legal.
Andrzej Warzynski [Wed, 29 Sep 2021 07:28:40 +0000 (07:28 +0000)]
[Flang] Fix failing plugin tests
The updated tests were originally added in
https://reviews.llvm.org/D109890 and are currently causing some
buildbots to fail.
This patch:
* adds missing items in the `REQUIRES` list in tests
* adds `flangOmpReport` (the plugin library added in D109890) as a CMake
dependency for tests (only when examples are enabled)
Differential Revision: https://reviews.llvm.org/D110682
Pavel Labath [Wed, 29 Sep 2021 09:22:15 +0000 (11:22 +0200)]
[lldb/gdb-remote] Remove last_stop_packet_mutex
This is a remnant of the non-stop mode.
Krasimir Georgiev [Wed, 29 Sep 2021 08:52:53 +0000 (10:52 +0200)]
David Spickett [Wed, 29 Sep 2021 08:47:16 +0000 (09:47 +0100)]
[libcxx] Run u16string tests for gdb pretty printers
As far as I can tell these were just missed out when the tests
were first added. No specific reason they should be skipped.
Simon Moll [Wed, 29 Sep 2021 06:46:51 +0000 (08:46 +0200)]
[VP] Vector predicated vector splice intrinsic
This patch introduces the vector-predicated version of the
experimental_vector_splice intrinsic [1] at the IR level. It considers
the active vector length for both vectors and uses a vector mask to
disable certain lanes in the result.
[1] https://reviews.llvm.org/D94708
Change originally authored by Vineet Kumar <vineet.kumar@bsc.es>
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D103898
Sven van Haastregt [Wed, 29 Sep 2021 08:40:06 +0000 (09:40 +0100)]
[OpenCL] Fix as_type3 invalid store creation
With -fpreserve-vec3-type enabled, a cast was not created when
converting from a non-vec3 type to a vec3 type, even though a
conversion to vec3 was performed. This resulted in creation of
invalid store instructions.
Differential Revision: https://reviews.llvm.org/D108470
Eric Schweitz [Wed, 29 Sep 2021 08:30:54 +0000 (10:30 +0200)]
[fir][NFC] Rename operand of ArrayCoorOp
Rename `lenParams` to `typeparams` to be in sync with fir-dev.
This patch is part of the upstreaming effort from fir-dev branch.
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Valentin Clement <clementval@gmail.com>
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D110645
Stefan Gränitz [Mon, 27 Sep 2021 12:36:50 +0000 (14:36 +0200)]
[ORC][examples] Port LLJITWithRemoteDebugging to SimpleRemoteEPC
Though this is a full port of the example, it is not yet fully functional due to a threading issue in the SimpleRemoteEPC implementation. The issue was discussed in D110530, but it needs a more thorough solution. For now we are dropping the dependency on the old `OrcRPC` here (it was the last in-tree use case). The test for the example is under review in ... and will be re-enabled once the threading issue is solved.
Lang Hames [Wed, 29 Sep 2021 05:08:47 +0000 (22:08 -0700)]
[ORC-RT] Add target dependencies to ORC-RT regression tests.
check-orc-rt had no cmake target dependency on orc or llvm-jitlink, which
could lead to regression test failures in compiler-rt. This patch should
fix the issue.
Patch by Jack Andersen (jackoalan@gmail.com). Thanks Jack!
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D110659
Lang Hames [Wed, 29 Sep 2021 02:11:28 +0000 (19:11 -0700)]
[JITLink][MachO][x86-64] Add support for splitting compact-unwind sections.
Follow-up to fc734da7954 to enable compact-unwind splitting on x86-64.
Jinsong Ji [Wed, 29 Sep 2021 01:53:29 +0000 (01:53 +0000)]
[AIX] Enable PGO without LTO
On AIX, we relied on LTO to merge the csects for profiling data/counter
sections.
The AIX binder now has namedcsect support, which handles the merging,
so we can enable PGO without LTO when using the new binder.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D110671
hsmahesha [Wed, 29 Sep 2021 01:48:36 +0000 (07:18 +0530)]
[AMDGPU] Do not internalize ASan device library functions.
ASan device library functions (those that start with the prefix __asan_)
currently undergo undesired optimizations due to internalization. To
avoid such optimizations, do not internalize these functions in the
first place.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D110468
LLVM GN Syncbot [Wed, 29 Sep 2021 01:03:11 +0000 (01:03 +0000)]
[gn build] Port c07f7099690e
Sterling Augustine [Wed, 29 Sep 2021 00:09:18 +0000 (17:09 -0700)]
Revert "Recommit "[AArch64] Split bitmask immediate of bitwise AND operation""
This reverts commit 73a196a11c0e6fe7bbf33055cc2c96ce3c61ff0d.
Causes crashes as reported in https://reviews.llvm.org/D109963
Itay Bookstein [Mon, 27 Sep 2021 22:36:41 +0000 (15:36 -0700)]
[SelectionDAG] Fix incorrect condition for shift amount truncation
Comment says:
// If the operand is larger than the shift count type but the shift
// count type has enough bits to represent any shift value ...
It clearly talks about the shifted operand, not the shift-amount operand,
but the comparison is performed against Log2_32_Ceil(Op2.getValueSizeInBits())
where Op2 is the shift amount operand. This comparison also doesn't make
sense in the context of the previous one (ShiftsSize > Op2Size) because
Op2Size == Op2.getValueSizeInBits(). Fix to use Op1.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D110509
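The difference between the two conditions can be sketched in Python. This is an illustrative model only, not the SelectionDAG code: `log2_ceil` stands in for `Log2_32_Ceil`, and the function names and bit widths below are hypothetical.

```python
def log2_ceil(n):
    """Smallest k such that 2**k >= n, for n >= 1."""
    return (n - 1).bit_length()


def can_truncate_fixed(shift_ty_bits, shifted_bits, amount_bits):
    # Fixed check: the shift-count type must be able to represent every
    # in-range shift amount for the *shifted* operand (Op1).
    return shift_ty_bits >= log2_ceil(shifted_bits)


def can_truncate_old(shift_ty_bits, shifted_bits, amount_bits):
    # Old check: compared against the width of the shift-amount operand (Op2).
    return shift_ty_bits >= log2_ceil(amount_bits)


# Hypothetical case: an i512 value shifted by an i16 amount, with an i8
# shift-count type. Amounts up to 511 need 9 bits, so i8 is too narrow.
print(can_truncate_old(8, 512, 16), can_truncate_fixed(8, 512, 16))
```

In this constructed case the old check would allow the truncation even though an 8-bit count cannot hold shift amounts of 256 or more.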
Jinsong Ji [Tue, 28 Sep 2021 22:28:13 +0000 (22:28 +0000)]
[AIX] Change the linkage of profiling counter/data to be private
We generate symbols like `profc`/`profd` for each function and put them into csects.
When there are weak functions, we generate weak symbols for them as well.
With ELF (and some other formats), the linker (binder) discards the duplicates and keeps only one copy of the weak symbols.
However, on AIX, the current binder cannot discard the weak symbols if we put all of them into the same csect,
because the binder cannot discard a subset of a csect.
This creates a unique challenge when using those symbols to calculate relative offsets.
This patch changes the linkage of the `profc`/`profd` symbols to private, so that the profc/profd for each weak symbol are *local* to their objects and are all kept in the csect, which avoids the problem. Although only one of the counters will be used, all the pointers in the profd are correct.
The downside is that we won't be able to discard the duplicated counters and profile data,
but those cannot be discarded even if we keep the weak linkage,
because the binder cannot discard a subset of a csect.
Reviewed By: Whitney, MaskRay
Differential Revision: https://reviews.llvm.org/D110422
Lang Hames [Wed, 29 Sep 2021 00:14:54 +0000 (17:14 -0700)]
[JITLink][MachO][arm64] Add support for splitting compact-unwind sections.
CompactUnwindSplitter splits compact-unwind sections on record boundaries and
adds keep-alive edges from target functions back to their respective records.
In MachO_arm64.cpp, a CompactUnwindSplitter pass is added to the pre-prune pass
list when setting up the standard pipeline.
This patch does not provide runtime support for compact-unwind, but is a first
step towards enabling it.
Jessica Paquette [Tue, 28 Sep 2021 22:57:02 +0000 (15:57 -0700)]
[AArch64][GlobalISel] Run overlapping_and after legalization
When we have code with truncates, those truncates may be changed into G_ANDs
with constants. These may, in turn, feed into other G_AND instructions.
Running this combine post-legalize allows us to optimize examples like this one:
https://godbolt.org/z/zrGY4dfEW
SDAG currently optimizes the example above so that there is only one `and`.
GISel doesn't optimize it, because the G_AND we'd optimize here is translated
as a G_TRUNC. Later, that G_TRUNC is turned into a G_AND during legalization.
Differential Revision: https://reviews.llvm.org/D110667
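The identity behind folding two chained masks into one can be checked with a small Python sketch. This is not the combine's implementation; the helper names are hypothetical, and the truncate is modelled as a mask with 0xFF, as happens when a G_TRUNC becomes a G_AND during legalization.

```python
def and_chain(x, c1, c2):
    """Two dependent ANDs, as seen after the truncate became a mask."""
    return (x & c1) & c2


def and_combined(x, c1, c2):
    """Single AND with the two constants folded together."""
    return x & (c1 & c2)


TRUNC_TO_I8 = 0xFF   # mask standing in for a truncate to 8 bits
USER_MASK = 0x0F     # mask from the original source-level `and`

print(hex(and_combined(0x1234, TRUNC_TO_I8, USER_MASK)))
```

When the two constants share no set bits, the folded constant is zero, so the whole expression is zero.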
Teresa Johnson [Wed, 22 Sep 2021 18:41:44 +0000 (11:41 -0700)]
Clean up large copies of binaries copied into temp directories in tests
In looking at the disk space used by a ninja check-all, I found that a
few of the largest files were copies of clang and lld made into temp
directories by a couple of tests. These tests were added in D53021 and
D74811. Clean up these copies after usage.
Differential Revision: https://reviews.llvm.org/D110276
Arthur Eubanks [Tue, 28 Sep 2021 23:59:15 +0000 (16:59 -0700)]
[test] Specify triple in backend-attribute-error-warning.cpp
Tests fail on Windows otherwise.
Jessica Paquette [Tue, 28 Sep 2021 22:04:23 +0000 (15:04 -0700)]
[GlobalISel] Combine mulo x, 2 -> addo x, x
Similar to what SDAG does when it sees a smulo/umulo against 2
(see: `DAGCombiner::visitMULO`)
This pattern is fairly common in Swift code AFAICT.
Here's an example extracted from a Swift testcase:
https://godbolt.org/z/6cT8Mesx7
Differential Revision: https://reviews.llvm.org/D110662
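The equivalence behind this combine can be checked exhaustively for a small bit width. The sketch below is a Python model of the overflow operations, not GlobalISel code; each helper (all names are illustrative) returns a `(wrapped_result, overflow_flag)` pair for a given bit width.

```python
def wrap_signed(value, bits):
    """Wrap an unbounded integer to a two's-complement value of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def umulo(x, y, bits):
    full = x * y
    return full & ((1 << bits) - 1), full >> bits != 0


def uaddo(x, y, bits):
    full = x + y
    return full & ((1 << bits) - 1), full >> bits != 0


def smulo(x, y, bits):
    full = x * y
    wrapped = wrap_signed(full, bits)
    return wrapped, wrapped != full


def saddo(x, y, bits):
    full = x + y
    wrapped = wrap_signed(full, bits)
    return wrapped, wrapped != full


# Check every 8-bit input: multiplying by 2 matches adding x to itself,
# for both the result and the overflow flag.
unsigned_ok = all(umulo(x, 2, 8) == uaddo(x, x, 8) for x in range(256))
signed_ok = all(smulo(x, 2, 8) == saddo(x, x, 8) for x in range(-128, 128))
print(unsigned_ok, signed_ok)
```

Because `x * 2` and `x + x` are the same mathematical value, both the wrapped result and the overflow condition agree, which is why the multiply can be replaced by the cheaper add.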
Michael Jones [Tue, 28 Sep 2021 18:25:43 +0000 (18:25 +0000)]
[libc] Add support for 128 bit ints in limits.h
Also, this adds unit tests to check that limits.h complies with the C
standard.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D110643
Nico Weber [Tue, 28 Sep 2021 15:55:11 +0000 (11:55 -0400)]
[clang] Let PPCallbacks::PragmaWarning() pass specifier as enum instead of string
Differential Revision: https://reviews.llvm.org/D110635
Fred Grim [Sat, 25 Sep 2021 16:07:12 +0000 (09:07 -0700)]
fixes bug #51926 where dangling comma caused overrun
Bug 51926 identified an issue where a dangling comma caused the cell count to be off by one.
Differential Revision: https://reviews.llvm.org/D110481
Vitaly Buka [Tue, 28 Sep 2021 18:20:18 +0000 (11:20 -0700)]
[NFC][sanitizer] Return StackDepotStats by value
Differential Revision: https://reviews.llvm.org/D110644
Sean Silva [Tue, 28 Sep 2021 21:58:51 +0000 (21:58 +0000)]
[mlir][Python] Fix lifetime of ExecutionEngine runtime functions.
We weren't retaining the ctypes closures that the ExecutionEngine was
calling back into, leading to mysterious errors.
Open to feedback about how to test this. And an extra pair of eyes to
make sure I caught all the places that need to be aware of this.
Differential Revision: https://reviews.llvm.org/D110661
Arthur Eubanks [Thu, 23 Sep 2021 20:54:24 +0000 (13:54 -0700)]
Reland [clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).
One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.
Previous revisions didn't properly declare the new dependencies.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110364
Rob Suderman [Tue, 28 Sep 2021 21:53:07 +0000 (14:53 -0700)]
[mlir][tosa] Add i32 to supported quantized type
The quantized int type should include I32, as it is the output of a quantized
convolution or matmul operation.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D110651
Shoaib Meenai [Tue, 28 Sep 2021 03:10:23 +0000 (20:10 -0700)]
[CodeGen] Fix wrapping personality symbol on ARM
The ARM backend was explicitly setting global binding on the personality
symbol. This was added without any comment in a7ec2dcefd954, which
introduced EHABI support (back in 2011). None of the other backends do
anything equivalent, as far as I can tell.
This causes problems when attempting to wrap the personality symbol.
Wrapped symbols are marked as weak inside LTO to inhibit IPO (see
https://reviews.llvm.org/D33621). When we wrap the personality symbol,
it initially gets weak binding, and then the ARM backend attempts to
change the binding to global, which causes an error in MC because of
attempting to change the binding of a symbol from non-global to global
(the error was added in https://reviews.llvm.org/D90108).
Simply drop the ARM backend's explicit global binding setting to fix
this. This matches all the other backends, and a large internal
application successfully linked and ran with this change, so it
shouldn't cause any problems. Test via LLD, since wrapping is required
to exhibit the issue.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D110609
Stuart Ellis [Tue, 28 Sep 2021 21:17:27 +0000 (22:17 +0100)]
Flang OpenMP Report Plugin
This plugin parses Fortran files and creates a
YAML report with all the OpenMP constructs and
clauses seen in the file.
The following tests have been modified to be
compatible for testing the plugin, hence why
they are not reused from another directory:
- omp-atomic.f90
- omp-declarative-directive.f90
- omp-device-constructs.f90
The plugin outputs a single file in the same
directory as the source file in the following format:
`<source-file-name>.yaml`
Building the plugin:
`ninja flangOmpReport`
Running the plugin:
`./bin/flang-new -fc1 -load lib/flangOmpReport.so -plugin flang-omp-report -fopenmp <source_file.f90>`
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
Co-authored-by: Stuart Ellis <stuart.ellis@arm.com>
Reviewed By: awarzynski, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D109890
bakhtiyar [Tue, 28 Sep 2021 21:35:15 +0000 (14:35 -0700)]
Remove unnecessary async group creates and awaits.
Reviewed By: ezhulenev
Differential Revision: https://reviews.llvm.org/D110605
bakhtiyar [Tue, 28 Sep 2021 21:34:53 +0000 (14:34 -0700)]
Rename target block size to min task size for clarity.
Reviewed By: ezhulenev
Differential Revision: https://reviews.llvm.org/D110604
Arthur Eubanks [Tue, 28 Sep 2021 21:49:27 +0000 (14:49 -0700)]
Revert "[clang] Rework dontcall attributes"
This reverts commit 2943071e2ee0c7f31f34062a44d12aeb0e3a66fd.
Breaks bots
Arthur Eubanks [Tue, 28 Sep 2021 21:42:23 +0000 (14:42 -0700)]
Revert "[test] Pin some RUN lines in optimization-remark.c to new PM"
This reverts commit 952f030fe6ade193ead8f23a7654cf8d2c7aa3df.
Causes bot failures.
Amy Zhuang [Tue, 28 Sep 2021 20:54:15 +0000 (13:54 -0700)]
[mlir] Unroll-and-jam loops with iter_args.
Unroll-and-jam currently doesn't work when the loop being unroll-and-jammed
or any of its inner loops has iter_args. This patch modifies the
unroll-and-jam utility to support loops with iter_args.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D110085
Louis Dionne [Thu, 23 Sep 2021 22:09:17 +0000 (18:09 -0400)]
[libc++] Simplify std::ranges::subrange
Instead of using a base class to store the members and the optional
size, use [[no_unique_address]] to achieve the same thing without
needing a base class.
Also, as a fly-by:
- Change subrange from struct to class (per the standard)
- Improve the diagnostic for when one doesn't provide a size to the ctor of a sized subrange
- Replace this->member by just member since it's not in a dependent base anymore
This change would be an ABI break due to [[no_unique_address]], but we
haven't shipped ranges anywhere yet, so this shouldn't affect anyone.
Differential Revision: https://reviews.llvm.org/D110370
Arthur Eubanks [Tue, 28 Sep 2021 21:29:33 +0000 (14:29 -0700)]
[test] Pin some RUN lines in optimization-remark.c to new PM
Some people downstream are reporting that this test fails. I've been
unable to reproduce, but there is indeed something spooky going on.
Pinning to the new PM suppresses the failure. I'm continuing to
investigate this.
Arthur Eubanks [Thu, 23 Sep 2021 20:54:24 +0000 (13:54 -0700)]
[clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).
One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110364
Siva Chandra Reddy [Wed, 25 Aug 2021 19:52:09 +0000 (19:52 +0000)]
[libc] Add implementations of the C standard condition variable functions.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D108948
Sanjay Patel [Tue, 28 Sep 2021 20:59:18 +0000 (16:59 -0400)]
[InstCombine] reduce redundant code for shl-binop folds
This is NFCI (no-functional-change-intended), but there
are benign diffs possible with commutable ops as seen in
the test diffs.
The transforms were repeated for the commutative opcodes,
but that should not be necessary if we canonicalize the
patterns that we're matching. If both operands of the
binop match, that should get folded eventually.
The transform that starts with a mask op seems to
over-constrain the use checks, so that could be a
potential enhancement.
Sanjay Patel [Tue, 28 Sep 2021 20:51:19 +0000 (16:51 -0400)]
[InstCombine] add multi-use tests for shl folds; NFC
Fangrui Song [Tue, 28 Sep 2021 20:39:41 +0000 (13:39 -0700)]
[LTO] Avoid repeated Triple construction. NFC
thomasraoux [Tue, 28 Sep 2021 16:26:41 +0000 (09:26 -0700)]
[mlir] Fix bug in FoldSubview with rank reducing subview
Fix how we calculate the new permutation map of the transfer ops.
Differential Revision: https://reviews.llvm.org/D110638
Lang Hames [Tue, 28 Sep 2021 20:13:57 +0000 (13:13 -0700)]
[llvm-jitlink] Add -slab-page-size to tests that need it.
Also fixes 80-column rule violations.
Louis Dionne [Tue, 28 Sep 2021 20:02:43 +0000 (16:02 -0400)]
[libc++] Clarify the name of Lit features related to standard library selection
Before this patch, we had features named 'libc++', 'libstdc++' and
'msvc' to describe the three implementations that use our test suite.
This patch renames them to 'stdlib=libc++', 'stdlib=libstdc++', etc
to avoid confusion between MSVC's STL and the MSVC compiler (or Clang
in MSVC mode).
Furthermore, this prepares the terrain for adding support for additional
"implementations" to the test suite. Basically, I'd like to be able to
treat Apple's libc++ differently from LLVM's libc++ for the purpose of
testing, because those effectively behave in different ways in some aspects.
serge-sans-paille [Tue, 28 Sep 2021 19:54:27 +0000 (21:54 +0200)]
Fix memcpy-nobuiltin.c test case
Make it more generic by accepting weak_odr and dso_local specifiers.
Differential Revision: https://reviews.llvm.org/D109967
Nikita Popov [Tue, 28 Sep 2021 19:49:36 +0000 (21:49 +0200)]
Revert "Improve the effectiveness of BDCE's debug info salvaging"
This reverts commit f6954bf80472cbfc06e39dac75a4a72120c9bd15.
This breaks the test-suite O3 build:
/home/nikic/llvm-test-suite/build-O3/tools/timeit --summary Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o.time /home/nikic/llvm-project/build/bin/clang++ -DNDEBUG -O3 -w -Werror=date-time -save-stats=obj -save-stats=obj -std=c++11 -MD -MT Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o -MF Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o.d -o Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o -c ../Bitcode/Benchmarks/Halide/local_laplacian/local_laplacian.bc
While deleting: i64 %
Use still stuck around after Def is destroyed: %12620 = mul i64 %12619, <badref>
clang++: /home/nikic/llvm-project/llvm/lib/IR/Value.cpp:103: llvm::Value::~Value(): Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"' failed.
Anna Thomas [Tue, 28 Sep 2021 19:32:49 +0000 (15:32 -0400)]
Add profile count. Regenerate check lines. NFC
Function profile counts added to test cases. Regenerated test lines for
loop predication test.
wlei [Fri, 24 Sep 2021 18:32:32 +0000 (11:32 -0700)]
[llvm-profgen] Strip context to support non-CS profile generation for hybrid sample
Differential Revision: https://reviews.llvm.org/D109769
serge-sans-paille [Thu, 16 Sep 2021 16:13:15 +0000 (18:13 +0200)]
Simplify handling of builtin with inline redefinition
(This is a recommit of 3d6f49a56995b845 that should no longer break validation
since bd379915de38a9af3d65e1).
It is common practice in glibc headers to provide an inline redefinition of an
existing function. This is especially the case for fortified functions.
Clang currently has an imperfect approach to the problem, using a combination of
trivially recursive function detection and the noinline attribute.
Simplify the logic by suffixing these functions with `.inline` during codegen, so
that they are not recognized as builtins by LLVM.
After that patch, clang passes all tests from https://github.com/serge-sans-paille/fortify-test-suite
Differential Revision: https://reviews.llvm.org/D109967
Leonard Chan [Tue, 28 Sep 2021 18:49:37 +0000 (11:49 -0700)]
[llvm][profile] Add padding after binary IDs
Some tests with binary IDs would fail with error: no profile can be merged.
This is because raw profiles could have unaligned headers when emitting binary
IDs. This means padding should be emitted after binary IDs are emitted to
ensure everything else is aligned. This patch adds padding after each binary ID
to ensure the next binary ID size is 8-byte aligned. This also adds extra
checks to ensure we aren't reading corrupted data when printing binary IDs.
Differential Revision: https://reviews.llvm.org/D110365
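The alignment rule described above can be sketched as follows. This is a Python illustration under stated assumptions, not the profile runtime's writer: it assumes each binary ID is stored as an 8-byte little-endian length followed by the ID bytes and zero padding, and the helper names are hypothetical.

```python
import struct


def padding_to_8(size):
    """Number of padding bytes needed to round `size` up to a multiple of 8."""
    return (8 - size % 8) % 8


def encode_binary_ids(ids):
    """Serialize binary IDs so that every length field starts 8-byte aligned."""
    out = bytearray()
    for data in ids:
        out += struct.pack("<Q", len(data))       # assumed 8-byte length field
        out += data                               # the binary ID itself
        out += b"\0" * padding_to_8(len(data))    # pad up to the next boundary
    return bytes(out)


blob = encode_binary_ids([b"\x01" * 20, b"\x02" * 3])
print(len(blob))
```

A 20-byte ID (a common build-ID length) needs 4 bytes of padding, so without it the second length field would start at a misaligned offset.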
Aaron Ballman [Tue, 28 Sep 2021 18:47:26 +0000 (14:47 -0400)]
Revert "Add support for `NOLINTBEGIN` ... `NOLINTEND` comments"
This reverts commit c0687e1984a82925918c874b7bb68ad34c32aed0.
There are testing failures being caught by bots.
See http://45.33.8.238/linux/56886/step_8.txt as an example.
Sanjay Patel [Tue, 28 Sep 2021 18:28:53 +0000 (14:28 -0400)]
[InstCombine] add/move tests for shl with binop; NFC
Akira Hatanaka [Mon, 13 Sep 2021 16:15:16 +0000 (09:15 -0700)]
[docs] Fix indentation
Kevin Athey [Tue, 28 Sep 2021 18:08:32 +0000 (11:08 -0700)]
Revert "Simplify handling of builtin with inline redefinition"
This reverts commit 3d6f49a56995b845c40be5827ded5d1e3f692cec.
Broke bot: https://lab.llvm.org/buildbot/#/builders/5/builds/12360
Artem Belevich [Mon, 27 Sep 2021 22:16:05 +0000 (15:16 -0700)]
[CUDA] Move CUDA SDK include path further down the include search path.
This allows clang to work on Linux distributions like Debian where
<CUDA-PATH>/include may be a symlink to /usr/include. We only need
`cuda_wrappers` to be present before the standard C++ library headers.
The CUDA SDK headers themselves do not need to be found that early.
This addresses https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=995122
mentioned in post-commit comments on D108247
Differential Revision: https://reviews.llvm.org/D110596
Vitaly Buka [Tue, 28 Sep 2021 17:04:41 +0000 (10:04 -0700)]
[NFC][sanitizer] Clang-format some code
Siva Chandra Reddy [Tue, 28 Sep 2021 06:53:09 +0000 (06:53 +0000)]
[libc] Add FE_DFL_ENV and handle it in fesetenv.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D110611
Roman Gareev [Sat, 25 Sep 2021 14:19:33 +0000 (19:19 +0500)]
[Polly] Check the properties of accesses to operands of a matrix-matrix multiplication
The following code modifies elements of the array D.
  for (i = 0; i < _PB_NI; i++)
    for (j = 0; j < _PB_NJ; j++)
      {
        for (k = 0; k < _PB_NK; k++)
          {
            double Mul = A[i][k] * B[k][j];
            D[i][j][k] += Mul;
            C[i][j] += Mul;
          }
      }
Nevertheless, the code is recognised as a matrix-matrix multiplication, since
the second and third dimensions of D are accessed with non-zero strides.
This fixes the typo, which was made during the translation to C++ bindings
(https://reviews.llvm.org/D35845).
Reviewed By: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D110491
Jozef Lawrynowicz [Tue, 28 Sep 2021 17:47:14 +0000 (20:47 +0300)]
[MSP430][Clang] Remove support for -mmcu=msp430
The -mmcu= option accepts a generic MCU named "msp430", which sets the
CPU to msp430 and disables hardware multiply support.
The current purpose of accepting this value is to allow -mmcu= to be
used as an alias for -mcpu=, however there are some downsides to doing
this. -mmcu= provides additional features that will interfere
with the expected behavior if the user tries to use it as an alias
for -mcpu=.
-mmcu=msp430 will conflict with -mhwmult=, since the "msp430" MCU is
defined to have no hardware multiply support, so the user will not be
able to set an explicit hardware multiply version.
-mmcu=msp430 will put "-Tmsp430.ld" on the linker command line, however
TI's support files do not provide a linker script with this name and so
the user would have to explicitly create it.
Differential Revision: https://reviews.llvm.org/D108299
David Blaikie [Tue, 28 Sep 2021 02:33:37 +0000 (19:33 -0700)]
DebugInfo: Use sugared function type when emitting function declarations for call sites
Otherwise we're losing type information for these functions.
Lang Hames [Tue, 28 Sep 2021 17:25:11 +0000 (10:25 -0700)]
[llvm-jitlink] Add a -slab-page-size option to override process page size.
The slab allocator is frequently used in -noexec tests where we want a
consistent memory layout. In this context we also want to set the effective
page size, rather than using the page size of the host process, since not all
systems use the same page size. The -slab-page-size option allows us to set
the page size for such tests.
The -slab-page-size option will also be honored in exec mode when using the
slab allocator, but will trigger an error if the requested size is not a
multiple of the actual process page size.
This option was motivated by test failures on a ppc64 bot that was returning
zero from sys::Process::getPageSize(), so it also contains a check for errors
and zero results from that function if the -slab-page-size option is absent.
Existing slab allocator tests will be updated to use this option in a follow-up
commit so that we can point the failing bot at this commit and observe errors
associated with sys::Process::getPageSize().
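The page-size selection rules described above can be summarised in a short sketch. This is a hypothetical Python model, not the llvm-jitlink code: `choose_slab_page_size` and its parameters are illustrative names, and `requested` is `None` when -slab-page-size is absent.

```python
def choose_slab_page_size(requested, process_page_size, exec_mode):
    """Pick the page size used by the slab allocator.

    requested         -- value of -slab-page-size, or None if not given
    process_page_size -- what the host reports (may be 0 on a broken system)
    exec_mode         -- True when the code will actually run in this process
    """
    if requested is None:
        # No override: fall back to the host value, but reject a zero result.
        if not process_page_size:
            raise ValueError("could not determine the process page size")
        return process_page_size
    if exec_mode and (not process_page_size or requested % process_page_size):
        # In exec mode the override must be a multiple of the real page size.
        raise ValueError("requested size is not a multiple of the process page size")
    return requested


print(choose_slab_page_size(16384, 0, exec_mode=False))
```

In -noexec tests the override is taken as given, which is what lets such tests produce the same memory layout on hosts with different page sizes.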
Lang Hames [Sun, 26 Sep 2021 22:10:33 +0000 (15:10 -0700)]
[MCJIT] Mark test-global-ctors as UNSUPPORTED on Darwin, rather than XFAIL.
MachO doesn't have a '.text.startup' -- this is just plain unsupported.
Michael Jones [Mon, 27 Sep 2021 20:23:53 +0000 (20:23 +0000)]
[libc][NFC] Make strchr and strrchr more consistent
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D110581
Adrian Prantl [Mon, 27 Sep 2021 17:51:04 +0000 (10:51 -0700)]
Improve the effectiveness of BDCE's debug info salvaging
This patch improves the effectiveness of BDCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!
Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.
Differential Revision: https://reviews.llvm.org/D110568
Adrian Prantl [Sat, 25 Sep 2021 00:16:59 +0000 (17:16 -0700)]
Improve the effectiveness of ADCE's debug info salvaging
This patch improves the effectiveness of ADCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!
Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.
Differential Revision: https://reviews.llvm.org/D110462
Adrian Prantl [Fri, 24 Sep 2021 23:35:04 +0000 (16:35 -0700)]
Add salvageDebugInfo support for truncating/extending ptr/int conversions.
This patch enables debug info salvaging for truncating/extending ptr/int
conversions. The testcase uncovered a bug in ADCE, which is
addressed separately.
rdar://80227769
Differential Revision: https://reviews.llvm.org/D110461
Paul Robinson [Fri, 24 Sep 2021 13:10:02 +0000 (06:10 -0700)]
[TargetLibraryInfo] Pick new/delete calls by target
There are two sets of new/delete functions, one with Windows/MSVC
mangling and one with Itanium mangling. Mark one set or the other
as unavailable depending on the target.
Split the test malloc-free-delete.ll into three parts: malloc-free.ll
for the C API tests, new-delete-itanium.ll and new-delete-msvc.ll for
the target-specific new/delete tests.
Differential Revision: https://reviews.llvm.org/D110419
Simon Pilgrim [Tue, 28 Sep 2021 16:59:30 +0000 (17:59 +0100)]
[CostModel][X86] Add SSE2/AVX1/AVX512BW test coverage for i16 interleaved load/store
Yuanfang Chen [Thu, 24 Jun 2021 06:46:42 +0000 (23:46 -0700)]
Diagnose -Wunused-value based on CFG reachability
(This relands 59337263ab45d7657e and makes sure comma operator
diagnostics are suppressed in a SFINAE context.)
While at it, add the diagnostic message "left operand of comma operator has no effect" (used by GCC) for the comma operator.
This also makes Clang diagnose in constant evaluation contexts, which aligns with GCC/MSVC behavior. (https://godbolt.org/z/7zxb8Tx96)
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D103938
Fangrui Song [Tue, 28 Sep 2021 16:58:27 +0000 (09:58 -0700)]
[llvm-objdump] Fix -R display and support ET_EXEC
* Add a newline before `DYNAMIC RELOCATION RECORDS` (see D101796)
* Add the missing `OFFSET TYPE VALUE` line
* Align columns
Note: llvm-readobj/ELFDumper.cpp `loadDynamicTable` has sophisticated PT_DYNAMIC
code which is unavailable in llvm-objdump.
Reviewed By: jhenderson, Higuoxing
Differential Revision: https://reviews.llvm.org/D110595
Alex Richardson [Tue, 28 Sep 2021 15:06:35 +0000 (16:06 +0100)]
[InstCombine] Fold ptrtoint(gep i8 null, x) -> x
This commit is the InstCombine follow-up to the previous constant-folding
change that enables noticeable optimizations for CHERI-enabled targets.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D110247
Alex Richardson [Tue, 28 Sep 2021 14:32:18 +0000 (15:32 +0100)]
[ConstantFolding] Fold ptrtoint(gep i8 null, x) -> x
I was looking at some missed optimizations in CHERI-enabled targets and
noticed that we weren't removing vtable indirection for calls via known
pointers-to-members. The underlying reason for this is that we represent
pointers-to-function-members as {i8 addrspace(200)*, i64} and generate the
constant offsets using (gep i8 null, <index>). We use a constant GEP here
since inttoptr should be avoided for CHERI capabilities. The pointer-to-member
call uses ptrtoint to extract the index, and due to this missing fold we can't
infer the actual value loaded from the vtable.
This is the initial constant folding change for this pattern; I will add
an InstCombine fold as a follow-up.
We could fold every inbounds GEP on null to null (and therefore the ptrtoint to
zero) since zero is the only valid offset for an inbounds GEP. If the
offset is not zero, that GEP is poison and therefore returning 0 is valid
(https://alive2.llvm.org/ce/z/Gzb5iH). However, Clang currently generates
inbounds GEPs on NULL for hand-written offsetof() expressions, so this
could lead to miscompilations.
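As a rough illustration of the fold, here is a sketch over a toy expression tree. The tuple encoding and the name `fold_ptrtoint` are assumptions made for this example and are unrelated to LLVM's ConstantFolding code. The sketch also ignores any width mismatch between the index and the ptrtoint result, and it deliberately does not fold inbounds GEPs to zero, for the offsetof() reason given above.

```python
def fold_ptrtoint(expr):
    """Fold ptrtoint(gep i8 null, x) to x; return expr unchanged otherwise.

    Toy encoding: ("ptrtoint", operand) and ("gep", elem_type, base, index).
    """
    if isinstance(expr, tuple) and len(expr) == 2 and expr[0] == "ptrtoint":
        inner = expr[1]
        if (isinstance(inner, tuple) and len(inner) == 4
                and inner[0] == "gep"
                and inner[1] == "i8"       # byte-sized elements: offset == index
                and inner[2] == "null"):   # base address is zero
            return inner[3]
    return expr


index = ("arg", "idx")
folded = fold_ptrtoint(("ptrtoint", ("gep", "i8", "null", index)))
not_i8 = fold_ptrtoint(("ptrtoint", ("gep", "i32", "null", index)))
not_null = fold_ptrtoint(("ptrtoint", ("gep", "i8", ("arg", "p"), index)))
```

With an `i8` element type the byte offset equals the index, which is why only that case is folded here; the other two calls return their input unchanged.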
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D110245
Alex Richardson [Tue, 28 Sep 2021 14:27:58 +0000 (15:27 +0100)]
[InstCombine][ConstantFold] Baseline tests for ptrtoint(gep null, x)
Differential Revision: https://reviews.llvm.org/D110244
Alex Richardson [Wed, 22 Sep 2021 09:09:17 +0000 (10:09 +0100)]
[NFC][clang] Add CHECK lines to tests checking offsetof-like expressions
I am looking at constant-folding changes that could affect these tests, so
check that it emits the expected global value instead of just checking
that it doesn't crash.
Alex Richardson [Wed, 22 Sep 2021 09:48:22 +0000 (10:48 +0100)]
[NFC] Add a comment to member-function-pointer-calls.cpp
Looking at this test I did not see why MinGW was using a different command
line until I looked at the git history. Add a comment explaining what this
RUN line is actually testing. Also add two more RUN lines to show that
indirectly passed member pointers don't inhibit the optimization.
Alex Richardson [Wed, 22 Sep 2021 10:29:35 +0000 (11:29 +0100)]
Drop REQUIRES: arm-registered-target from an IR-only test
This works just fine even if the Arm backend is not built.
Alex Richardson [Fri, 24 Sep 2021 13:15:22 +0000 (14:15 +0100)]
[UpdateTestChecks][NFC] Drop a python2 workaround
Alex Richardson [Tue, 28 Sep 2021 14:10:39 +0000 (15:10 +0100)]
Fix incorrect GEP bitwidth in areNonOverlapSameBaseLoadAndStore()
When using a datalayout that has pointer index width != pointer size this
code triggers an assertion in Value::stripAndAccumulateConstantOffsets().
I encountered this while compiling FreeBSD for CHERI-RISC-V.
Also update LoadsTest.cpp to use a DataLayout with index width != pointer
width to ensure this case is tested.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D110406
Alex Richardson [Tue, 28 Sep 2021 14:10:07 +0000 (15:10 +0100)]
[update_llc_test_checks.py] Fix MIPS ASM regex for functions with EH
On MIPS, functions with exception handling code emit an additional
temporary label at the start of the function (due to UseAssignmentForEHBegin):
    _Z8do_catchv:                           # @_Z8do_catchv
    .Ltmp3:
        .set .Lfunc_begin0, .Ltmp3
        .cfi_startproc
        .cfi_personality 128, DW.ref.__gxx_personality_v0
        .cfi_lsda 0, .Lexception0
        .frame $c11,48,$c17
        .mask 0x00000000,0
        .fmask 0x00000000,0
        .set noreorder
        .set nomacro
        .set noat
    # %bb.0:                                # %entry
The `[^:]*` regex was terminating the search after .Ltmp<N>: and therefore
not detecting functions with exception handling.
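The failure mode can be reproduced with a pair of simplified patterns. These are illustrative only and are not the regexes used by update_llc_test_checks.py: `OLD` and `NEW` are hypothetical, and the optional `.Ltmp` group in `NEW` is just one possible way to step over the extra label.

```python
import re

ASM = """\
_Z8do_catchv:                           # @_Z8do_catchv
.Ltmp3:
\t.set .Lfunc_begin0, .Ltmp3
\t.cfi_startproc
\t.frame $c11,48,$c17
\t.mask 0x00000000,0
\t.set noreorder
# %bb.0:                                # %entry
"""

# The function label, then anything without a colon, then the prologue
# directives. The colon in ".Ltmp3:" stops "[^:]*?" before it can reach them.
OLD = re.compile(
    r'^(?P<func>[^:\n]+):[ \t]*#+[ \t]*@(?P=func)\n'
    r'[^:]*?'
    r'(?:^[ \t]+\.(?:frame|f?mask|set).*?\n)+',
    flags=re.M)

# Same pattern, but an optional temporary label line is consumed explicitly.
NEW = re.compile(
    r'^(?P<func>[^:\n]+):[ \t]*#+[ \t]*@(?P=func)\n'
    r'(?:\.Ltmp[^:\n]*:\n)?'
    r'[^:]*?'
    r'(?:^[ \t]+\.(?:frame|f?mask|set).*?\n)+',
    flags=re.M)

old_match = OLD.search(ASM)
new_match = NEW.search(ASM)
```

Here `old_match` is `None`, so the function would be skipped, while `new_match.group('func')` is `_Z8do_catchv`.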
Reviewed By: atanasyan, MaskRay
Differential Revision: https://reviews.llvm.org/D100027
Alex Richardson [Tue, 28 Sep 2021 14:09:51 +0000 (15:09 +0100)]
[update_llc_test_checks] Baseline test for D100027
Show that we fail to generate CHECK lines for MIPS64 functions with EH.
Differential Revision: https://reviews.llvm.org/D110408
Arthur O'Dwyer [Mon, 27 Sep 2021 05:10:52 +0000 (01:10 -0400)]
[libc++] [compare] Rip out more vestiges of *_equality. NFCI.
There's really no reason to even have two different enums here,
but *definitely* we shouldn't have *three*, and they don't need
so many synonymous enumerator values.
Differential Revision: https://reviews.llvm.org/D110516
Roman Lebedev [Tue, 28 Sep 2021 16:15:08 +0000 (19:15 +0300)]
[X86][Costmodel] Load/store i16 Stride=6 VF=16 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For this tuple, measuring becomes problematic since there's a lot of spilling going on,
but apparently all these memory ops do not affect the worst-case estimate at all here.
For load we have:
https://godbolt.org/z/5qGb9odP6 - for intels `Block RThroughput: <=106.0`; for ryzens, `Block RThroughput: <=34.8`
So pick cost of `106`.
For store we have:
https://godbolt.org/z/KrWcv4Ph7 - for intels `Block RThroughput: =58.0`; for ryzens, `Block RThroughput: <=20.5`
So pick cost of `58`.
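The chosen costs are consistent with taking the worst (largest) reciprocal throughput across the measured scheduling models. A small sketch of that arithmetic follows; the helper name `pick_cost` is made up for this example, the `<=` bounds are treated as plain values, and the rule itself is inferred from the numbers above rather than stated in the patch.

```python
import math


def pick_cost(*rthroughputs: float) -> int:
    """Return the worst-case block reciprocal throughput as an integer cost."""
    return math.ceil(max(rthroughputs))


load_cost = pick_cost(106.0, 34.8)   # Intel vs. Ryzen estimates for the load
store_cost = pick_cost(58.0, 20.5)   # Intel vs. Ryzen estimates for the store
```

With the figures quoted above, this gives 106 for the load and 58 for the store.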
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110593
Roman Lebedev [Tue, 28 Sep 2021 16:15:07 +0000 (19:15 +0300)]
[X86][Costmodel] Load/store i16 Stride=6 VF=8 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/3Tc5s897j - for intels `Block RThroughput: =39.0`; for ryzens, `Block RThroughput: <=13.5`
So pick cost of `39`.
For store we have:
https://godbolt.org/z/fo1h9E67e - for intels `Block RThroughput: =21.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `21`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110592
Roman Lebedev [Tue, 28 Sep 2021 16:15:01 +0000 (19:15 +0300)]
[X86][Costmodel] Load/store i16 Stride=6 VF=4 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/1Wcaf9c7T - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `9`.
For store we have:
https://godbolt.org/z/1Wcaf9c7T - for intels `Block RThroughput: =15.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `15`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110591
Roman Lebedev [Tue, 28 Sep 2021 16:14:56 +0000 (19:14 +0300)]
[X86][Costmodel] Load/store i16 Stride=6 VF=2 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/bhscej4WM - for intels `Block RThroughput: =13.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `13`.
For store we have:
https://godbolt.org/z/Yf4Pfnxbq - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=3.5`
So pick cost of `10`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110590
wlei [Sat, 25 Sep 2021 01:16:36 +0000 (18:16 -0700)]
[llvm-profgen][CSSPGO] On-demand function size computation for preinliner
Similar to https://reviews.llvm.org/D110465, we can compute function size on-demand for the functions that are hit by samples.
Here we leverage the raw range samples' addresses to compute the set of sample-hit functions. Then `BinarySizeContextTracker` only works on those functions' ranges for the size.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D110466
wlei [Sat, 25 Sep 2021 00:06:39 +0000 (17:06 -0700)]
[llvm-profgen] On-demand symbolization
Previously we did symbolization for all the functions, but we actually only need the symbols that are hit by the samples.
This can significantly reduce the run time for large binaries.
Optimization for the preinliner will come along with the next patch.
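The idea can be sketched as follows. This is a simplified model and not llvm-profgen's code: `functions_hit`, `symbolize_on_demand`, the `(start, end, name)` range layout, and the `symbolize` callback are all hypothetical.

```python
import bisect
from typing import Dict, Iterable, List, Set, Tuple

FuncRange = Tuple[int, int, str]  # (start, end exclusive, name); hypothetical layout


def functions_hit(ranges: List[FuncRange], samples: Iterable[int]) -> Set[str]:
    """Return names of functions whose address range contains a sample."""
    starts = [start for start, _, _ in ranges]  # ranges must be sorted by start
    hit: Set[str] = set()
    for addr in samples:
        i = bisect.bisect_right(starts, addr) - 1  # last range starting <= addr
        if i >= 0 and addr < ranges[i][1]:
            hit.add(ranges[i][2])
    return hit


def symbolize_on_demand(ranges, samples, symbolize) -> Dict[str, str]:
    """Call the (expensive) symbolize callback only for sample-hit functions."""
    return {name: symbolize(name) for name in sorted(functions_hit(ranges, samples))}


ranges = [(0x1000, 0x1100, "foo"), (0x1100, 0x1200, "bar"), (0x2000, 0x2040, "baz")]
samples = [0x1004, 0x1010, 0x2008, 0x3000]
calls = []  # records which functions were actually symbolized
symbols = symbolize_on_demand(ranges, samples, lambda n: calls.append(n) or n.upper())
```

In this example only `foo` and `baz` are symbolized; `bar` receives no samples and the address `0x3000` falls outside every range, so neither costs any symbolization work.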
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110465
Quinn Pham [Wed, 8 Sep 2021 13:39:28 +0000 (08:39 -0500)]
[PowerPC] FP compare and test XL compat builtins.
This patch is in a series of patches to provide builtins for
compatibility with the XL compiler. This patch adds builtins for compare
exponent and test data class operations on floating point values.
Reviewed By: #powerpc, lei
Differential Revision: https://reviews.llvm.org/D109437
Kazu Hirata [Tue, 28 Sep 2021 15:38:05 +0000 (08:38 -0700)]
[SystemZ] Remove redundant declaration SystemZMnemonicSpellCheck (NFC)
Note that SystemZMnemonicSpellCheck is defined in
SystemZGenAsmMatcher.inc, which SystemZAsmParser.cpp includes.
Identified with readability-redundant-declaration.
Roman Lebedev [Tue, 28 Sep 2021 15:23:17 +0000 (18:23 +0300)]
Revert "[CMake] Enable LLVM_ENABLE_PER_TARGET_RUNTIME_DIR by default on Linux"
See original review https://reviews.llvm.org/D107799
This reverts commit f9dbca68d48e705f6d45df8f58d6b2ee88bce76c.
Dmitry Vyukov [Mon, 27 Sep 2021 12:07:28 +0000 (14:07 +0200)]
tsan: print a meaningful frame for stack races
Depends on D110631.
Differential Revision: https://reviews.llvm.org/D110632
Dmitry Vyukov [Tue, 28 Sep 2021 14:59:45 +0000 (16:59 +0200)]
tsan: fix tls_race3 test on darwin
Darwin also needs to use __tsan_tls_initialization
to pass the test.
Differential Revision: https://reviews.llvm.org/D110631