Muiez Ahmed [Fri, 16 Sep 2022 14:22:21 +0000 (10:22 -0400)]
[SystemZ][z/OS] define REMOVE_ALL_USE_DIRECTORY_ITERATOR (libc++)
This patch fixes the z/OS build by using the first implementation of __remove_all since we don't have access to the openat() family of POSIX functions.
Differential Revision: https://reviews.llvm.org/D132948
Mark de Wever [Thu, 1 Sep 2022 16:38:03 +0000 (18:38 +0200)]
[libc++] Shows the detailed compiler version info.
The libc++ pre-commit CI uses Clang nightly builds. Currently it's not
possible to determine the exact version used since CMake doesn't show
this information by default. Instead use the --version flag to get this
information.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D133122
Sander de Smalen [Thu, 15 Sep 2022 15:17:23 +0000 (15:17 +0000)]
[AArch64][SME] Implement ABI for calls to/from streaming functions.
This patch implements the ABI for calls from:
Normal -> Streaming
Normal -> Streaming-compatible
Streaming -> Normal
Streaming -> Streaming-compatible
Streaming -> Streaming
The compiler inserts SMSTART/SMSTOP instructions before and after the call,
depending on the required transition.
More details about the SME attributes and design can be found
in D131562.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D131576
Florian Hahn [Fri, 16 Sep 2022 13:57:43 +0000 (14:57 +0100)]
[AArch64] Use tbl for truncating vector FPtoUI conversions.
On AArch64, doing the vector truncate separately after the fptoui
conversion can be lowered more efficiently using tbl.4, building on
D133495.
https://alive2.llvm.org/ce/z/T538CC
Depends on D133495
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D133496
sstwcw [Fri, 16 Sep 2022 13:18:21 +0000 (13:18 +0000)]
[clang-format] Fix template arguments in macros
Fixes https://github.com/llvm/llvm-project/issues/57738
old
```
#define FOO(typeName, realClass) \
{ \
#typeName, foo < FooType>(new foo <realClass>(#typeName)) \
}
```
new
```
#define FOO(typeName, realClass) \
{ #typeName, foo<FooType>(new foo<realClass>(#typeName)) }
```
Previously, when an UnwrappedLine began with a hash in a macro
definition, the program incorrectly assumed the line was a preprocessor
directive. It should be treated as stringification.
The rule in spaceRequiredBefore was added in 8b5297117b. Its purpose is
to add a space in an include directive. It also added a space to a
template opener when the line began with a stringification hash, so we
changed it.
Reviewed By: HazardyKnusperkeks, owenpan
Differential Revision: https://reviews.llvm.org/D133954
sstwcw [Sat, 10 Sep 2022 19:28:37 +0000 (19:28 +0000)]
[clang-format] Parse the else part of `#if 0`
Fixes https://github.com/llvm/llvm-project/issues/57539
Previously things outside of `#if` blocks were parsed as if only the
first branch of the conditional compilation existed, unless the
first condition was 0. In that case the outer parts would be parsed as
if nothing inside the conditional parts existed. Now we use the second
conditional branch if the first condition is 0.
Reviewed By: owenpan
Differential Revision: https://reviews.llvm.org/D133647
Adrian Vogelsgesang [Wed, 14 Sep 2022 10:21:34 +0000 (03:21 -0700)]
[libunwind] Fix usage of `_dl_find_object` on 32-bit x86
On 32-bit x86, `_dl_find_object` also returns a `dlfo_eh_dbase` address.
So far, compiling against a version of `_dl_find_object` which returns a
`dlfo_eh_dbase` was blocked using a `#if` + `#error`. This commit now
removes this compile time assertion and simply ignores the returned
`dlfo_eh_dbase`. All test cases are passing on a 32-bit build now.
According to https://www.gnu.org/software/libc/manual/html_node/Dynamic-Linker-Introspection.html,
`dlfo_eh_dbase` should be the base address for all DW_EH_PE_datarel
relocations. However, glibc/elf/dl-find_object.h says that eh_dbase
is the relocated DT_PLTGOT value. I don't understand how those two
statements fit together, but to fix 32-bit x86, ignoring `dlfo_eh_dbase`
seems to be good enough.
Fixes #57733
Differential Revision: https://reviews.llvm.org/D133846
Mark de Wever [Tue, 13 Sep 2022 15:40:18 +0000 (17:40 +0200)]
[NFC][libc++][test] Uses public functions.
Replaces std::__format_context_create with the public wrapper
test_format_context_create.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D133781
Simon Pilgrim [Fri, 16 Sep 2022 12:03:06 +0000 (13:03 +0100)]
[CostModel][X86] Add CostKinds handling for vector integer comparisons
These were based off a mixture of vector integer add/sub costs and the numbers from the 'cost-tables vs llvm-mca' script from D103695. The extra costs for different predicates are still proving tricky to implement, but I've gotten most costs to within +/-1 now. The AVX512 costs are tricky as we still don't handle predicate results properly, so most of these were done by hand.
Joseph Huber [Thu, 15 Sep 2022 23:28:52 +0000 (18:28 -0500)]
[Libomptarget] Revert changes to AMDGPU plugin destructors
These patches exposed a lot of problems in the AMD toolchain. Rather
than keep it broken we should revert it to its old semi-functional
state. This will prevent us from using device destructors but should
remove some new bugs. In the future this interface should be changed
once these problems are addressed more correctly.
This reverts commit ed0f21811544320f829124efbb6a38ee12eb9155.
This reverts commit 2b7203a35972e98b8521f92d2791043dc539ae88.
Fixes #57536
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D133997
Florian Hahn [Fri, 16 Sep 2022 11:42:49 +0000 (12:42 +0100)]
[AArch64] Lower vector trunc using tbl.
Similar to using tbl to lower vector ZExts, tbl4 can be used to lower
vector truncates.
The initial version supports i32->i8 conversions.
Depends on D120571
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D133495
Aaron Ballman [Fri, 16 Sep 2022 11:19:30 +0000 (07:19 -0400)]
Fix the clang Sphinx bot
This addresses failures introduced by:
https://lab.llvm.org/buildbot/#/builders/92/builds/32809
It also fixes a secondary issue that crept in after the above build
started failing.
Kadir Cetinkaya [Thu, 15 Sep 2022 18:57:07 +0000 (20:57 +0200)]
[clang(d)] Include/Exclude CLDXC options properly
This handles the new CLDXC options that were introduced in
https://reviews.llvm.org/D128462 inside clang-tooling to make sure cl driver
mode is not broken.
Fixes https://github.com/clangd/clangd/issues/1292.
Differential Revision: https://reviews.llvm.org/D133962
Matheus Izvekov [Fri, 16 Sep 2022 10:03:34 +0000 (12:03 +0200)]
Revert "[clang] use getCommonSugar in an assortment of places"
This reverts commit aff1f6310e5f4cea92c4504853d5fd824754a74f.
Matheus Izvekov [Sun, 10 Oct 2021 13:28:37 +0000 (15:28 +0200)]
[clang] use getCommonSugar in an assortment of places
For this patch, a simple search was performed for patterns where there are
two types (usually an LHS and an RHS) which are structurally the same, and there
is some result type which is resolved as either one of them (typically LHS for
consistency).
We change those cases to resolve as the common sugared type between those two,
utilizing the new infrastructure created for this purpose.
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Differential Revision: https://reviews.llvm.org/D111509
Nikita Popov [Thu, 15 Sep 2022 15:00:55 +0000 (17:00 +0200)]
[CodeGen] Don't zero callee-save registers with zero-call-used-regs (PR57692)
Callee-save registers must be preserved, so -fzero-call-used-regs
should not be zeroing them. The previous implementation only skipped
zeroing callee-save registers that were saved and restored inside the
function, but we need to preserve all of them.
Fixes https://github.com/llvm/llvm-project/issues/57692.
Differential Revision: https://reviews.llvm.org/D133946
Stanislav Mekhanoshin [Thu, 15 Sep 2022 19:46:02 +0000 (12:46 -0700)]
[AMDGPU] Added __builtin_amdgcn_ds_bvh_stack_rtn
Differential Revision: https://reviews.llvm.org/D133966
Matheus Izvekov [Fri, 16 Sep 2022 09:37:55 +0000 (11:37 +0200)]
NFC: remove accidental inclusion of libcxx test changes
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Johannes Reifferscheid [Fri, 16 Sep 2022 09:38:30 +0000 (11:38 +0200)]
Fixes for D133947.
Johannes Reifferscheid [Fri, 16 Sep 2022 09:09:09 +0000 (11:09 +0200)]
Fix bufferization of collapse_shape of subviews with size 1 dims.
Currently, there's an optimization that claims dimensions of size 1 are always
contiguous. This is not necessarily the case for subviews.
```
Input:
[
  [
    [0, 1],
    [2, 3]
  ],
  [
    [4, 5],
    [6, 7]
  ]
]
Subview:
[
  [
    [0, 1]
  ],
  [
    [4, 5]
  ]
]
```
The old logic treats this subview as contiguous, when it is not.
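To make the non-contiguity concrete, here is a minimal sketch that enumerates the linear offsets a strided view touches. The helper name `isContiguous` and the row-major strides `{4, 2, 1}` for the 2x2x2 input are assumptions for illustration; this is not the bufferization code itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: enumerate the linear offsets touched by a strided view
// (all sizes assumed positive) and report whether they form one gap-free run
// starting at offset 0.
bool isContiguous(const std::vector<int64_t> &sizes,
                  const std::vector<int64_t> &strides) {
  std::vector<int64_t> index(sizes.size(), 0);
  int64_t expected = 0; // offset the next element must have
  while (true) {
    int64_t offset = 0;
    for (std::size_t d = 0; d < sizes.size(); ++d)
      offset += index[d] * strides[d];
    if (offset != expected)
      return false;
    ++expected;
    // Advance the multi-dimensional index in row-major order.
    int d = static_cast<int>(sizes.size()) - 1;
    for (; d >= 0; --d) {
      if (++index[d] < sizes[d])
        break;
      index[d] = 0;
    }
    if (d < 0)
      return true; // every element was visited without a gap
  }
}
```

With these assumptions, the full input `{2, 2, 2}` is contiguous, while the subview `{2, 1, 2}` with the same strides touches offsets 0, 1, 4, 5 and is not, even though its middle dimension has size 1.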
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D134026
Florian Hahn [Fri, 16 Sep 2022 09:25:27 +0000 (10:25 +0100)]
[AArch64] Add tests with 2 x tbl2 for v8i8 and nonconst masks.
Extra tests for D133491.
Matheus Izvekov [Wed, 25 May 2022 20:00:58 +0000 (22:00 +0200)]
[clang] template / auto deduction deduces common sugar
After upgrading the type deduction machinery to retain type sugar in
D110216, we were left with a situation where there is no general
well behaved mechanism in Clang to unify the type sugar of multiple
deductions of the same type parameter.
So we ended up making an arbitrary choice: keep the sugar of the first
deduction, ignore subsequent ones.
In general, we already had this problem, but on a smaller scale.
The result of the conditional operator and many other binary ops
could benefit from such a mechanism.
This patch implements such a type sugar unification mechanism.
The basics:
This patch introduces a `getCommonSugaredType(QualType X, QualType Y)`
method to ASTContext which implements this functionality, and uses it
for unifying the results of type deduction and return type deduction.
This will return the most derived type sugar which occurs in both X and
Y.
Example:
Suppose we have these types:
```
using Animal = int;
using Cat = Animal;
using Dog = Animal;
using Tom = Cat;
using Spike = Dog;
using Tyke = Dog;
```
For `X = Tom, Y = Spike`, this will result in `Animal`.
For `X = Spike, Y = Tyke`, this will result in `Dog`.
How it works:
We take two types, X and Y, which we wish to unify as input.
These types must have the same (qualified or unqualified) canonical
type.
We dive down fast through top-level type sugar nodes, to the
underlying canonical node. If these canonical nodes differ, we
build a common one out of the two, unifying any sugar they had.
Note that this might involve a recursive call to unify any children
of those. We then return that canonical node, handling any qualifiers.
If they don't differ, we walk up the list of sugar type nodes we dived
through, finding the last identical pair, and returning that as the
result, again handling qualifiers.
Note that this patch will not unify sugar nodes if they are not
identical already. We will simply strip off top-level sugar nodes that
differ between X and Y. This sugar node unification will instead be
implemented in a subsequent patch.
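The "walk up and keep the last identical pair" step can be sketched on a toy model where every alias is just a name that maps to the name it was declared as. The names `AliasMap`, `sugarChain`, `commonSugar` and `exampleAliases` are hypothetical, and the sketch ignores qualifiers and non-alias sugar; it is not the ASTContext implementation.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model: each alias maps to the type it is declared as. A name missing
// from the map is treated as canonical.
using AliasMap = std::map<std::string, std::string>;

// Collect the chain from a type down to its canonical type.
std::vector<std::string> sugarChain(const AliasMap &aliases, std::string type) {
  std::vector<std::string> chain{type};
  for (auto it = aliases.find(type); it != aliases.end();
       it = aliases.find(type)) {
    type = it->second;
    chain.push_back(type);
  }
  return chain;
}

// Walk both chains upward from the canonical end and keep the last identical
// pair. Returns an empty string if the canonical types differ.
std::string commonSugar(const AliasMap &aliases, const std::string &x,
                        const std::string &y) {
  std::vector<std::string> cx = sugarChain(aliases, x);
  std::vector<std::string> cy = sugarChain(aliases, y);
  std::string result;
  auto ix = cx.rbegin();
  auto iy = cy.rbegin();
  for (; ix != cx.rend() && iy != cy.rend() && *ix == *iy; ++ix, ++iy)
    result = *ix;
  return result;
}

// The aliases from the example above.
AliasMap exampleAliases() {
  return {{"Animal", "int"}, {"Cat", "Animal"}, {"Dog", "Animal"},
          {"Tom", "Cat"},    {"Spike", "Dog"},  {"Tyke", "Dog"}};
}
```

On this toy model, `Tom` and `Spike` share the chain suffix `Animal`, `int`, so the result is `Animal`, and `Spike` and `Tyke` share `Dog`, `Animal`, `int`, so the result is `Dog`.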
This patch also implements a few users of this mechanism:
* Template argument deduction.
* Auto deduction, for functions returning auto / decltype(auto), with
special handling for initializer_list as well.
Further users will be implemented in a subsequent patch.
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Differential Revision: https://reviews.llvm.org/D111283
Florian Hahn [Fri, 16 Sep 2022 09:20:10 +0000 (10:20 +0100)]
[AArch64] Lower extending uitofp using tbl.
On AArch64, doing the zero-extend separately first can be lowered more
efficiently using tbl, building on D120571.
https://alive2.llvm.org/ce/z/8Je595
Depends on D120571
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D133494
Alex Zinenko [Thu, 15 Sep 2022 16:30:03 +0000 (18:30 +0200)]
[mlir] switch bufferization to use strided layout attribute
Bufferization already makes the assumption that buffers pass function
boundaries in the strided form and uses the corresponding affine map layouts.
Switch it to use the recently introduced strided layout instead to avoid
unnecessary casts when bufferizing further operations to the memref dialect
counterparts that now largely rely on the strided layout attribute.
Depends On D133947
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D133951
Alex Zinenko [Thu, 15 Sep 2022 16:29:38 +0000 (18:29 +0200)]
[mlir] make remaining memref dialect ops produce strided layouts
The three following ops in the memref dialect: transpose, expand_shape,
collapse_shape, have been originally designed to operate on memrefs with
strided layouts but had to go through the affine map representation as the type
did not support anything else. Make these ops produce memref values with
StridedLayoutAttr instead now that it is available.
Depends On D133938
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D133947
Alex Zinenko [Thu, 15 Sep 2022 16:29:14 +0000 (18:29 +0200)]
[mlir] make memref.subview produce strided layout
Memref subview operation has been initially designed to work on memrefs with
strided layouts only and has never supported anything else. Port it to use the
recently added StridedLayoutAttr instead of extracting the strided form
implicitly from affine maps.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D133938
Nikita Popov [Fri, 9 Sep 2022 09:15:53 +0000 (11:15 +0200)]
[libcxx] Use interface library for libcxx-abi-shared
The libc++.so linker script generation uses the IMPORTED_LIBNAME
target property on libcxx-abi-shared. However, libcxx-abi-shared
is not an interface library and as such cannot have an
IMPORTED_LIBNAME target property.
Convert libcxx-abi-shared into an imported interface library
and use IMPORTED_LIBNAME in place of IMPORTED_LOCATION. This makes
linker script generation work correctly with system-libcxxabi.
I believe this fixes the issue that D131037 was intended to fix.
Differential Revision: https://reviews.llvm.org/D133566
Dmitry Makogon [Fri, 16 Sep 2022 07:49:55 +0000 (14:49 +0700)]
[Test] Add tests showing instcombine sinking opportunity (NFC)
InstCombine could sink an instruction to the nearest common dominator (NCD) of its users.
jacquesguan [Thu, 15 Sep 2022 06:33:17 +0000 (14:33 +0800)]
[mlir][Math] Add constant folder for SinOp.
This patch adds constant folder for SinOp by using sin/sinf of libm.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D133915
Zequan Wu [Wed, 7 Sep 2022 19:29:31 +0000 (12:29 -0700)]
[LLDB][NativePDB] Global ctor and dtor should be global decls.
This fixes a crash caused by mistaking global ctors/dtors for function methods.
Differential Revision: https://reviews.llvm.org/D133446
Lang Hames [Fri, 16 Sep 2022 05:24:47 +0000 (22:24 -0700)]
[ORC-RT] Update COFF, ELF support after ExecutorAddrDiff change in 4c434831865.
Vitaly Buka [Fri, 16 Sep 2022 04:49:57 +0000 (21:49 -0700)]
[msan] Add msan-insert-check DEBUG_COUNTER
Fangrui Song [Fri, 16 Sep 2022 04:41:18 +0000 (21:41 -0700)]
[Driver][test] Disable hip-link-bc-to-bc.hip
As it was disabled due to unsupported feature "clang-driver" before.
Dave Lee [Fri, 16 Sep 2022 03:28:26 +0000 (20:28 -0700)]
[lldb] Improve formatting of skipped categories message (NFC)
jacquesguan [Fri, 16 Sep 2022 03:18:27 +0000 (11:18 +0800)]
[ORC-RT] Remove wrong getValue of ExecutorAddrDiff.
jacquesguan [Fri, 16 Sep 2022 02:31:18 +0000 (10:31 +0800)]
[RISCV][test] Add precommit test for D132923.
Fangrui Song [Fri, 16 Sep 2022 02:58:42 +0000 (19:58 -0700)]
[HIP][test] Avoid %T
%T is a deprecated lit feature. It refers to the parent directory.
When two tests in test/Driver refer to the same `%T/foo`, they are racy with each other.
%t includes the test name and is safe for use.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D133998
Jez Ng [Fri, 16 Sep 2022 02:55:41 +0000 (22:55 -0400)]
[lld-macho][reland] Add support for N_INDR symbols
This is similar to the `-alias` CLI option, but it gives finer-grained
control in that it allows the aliased symbols to be treated as private
externs.
While working on this, I realized that our `-alias` handling did not
cover the cases where the aliased symbol is a common or dylib symbol,
nor the case where we have an undefined that gets treated specially and
converted to a defined later on. My N_INDR handling neglects this too
for now; I've added checks and TODO messages for these.
`N_INDR` symbols cropped up as part of our attempt to link swift-stdlib.
Reviewed By: #lld-macho, thakis, thevinster
Differential Revision: https://reviews.llvm.org/D133825
Lang Hames [Thu, 15 Sep 2022 03:11:22 +0000 (20:11 -0700)]
[ORC-RT] Invert the layout of the trivial-jit-re-dlopen testcase.
Compiles and moves the original C code for main to Inputs/dlopen-dlclose-x2.S,
where it can be shared with other testcases that want a
dlopen-dlclose-dlopen-dlclose sequence. The assembly containing the
initializers to be tested is moved into the test file.
Lang Hames [Fri, 16 Sep 2022 02:06:16 +0000 (19:06 -0700)]
[ORC-RT] Make ExecutorAddrDiff an alias for uint64_t.
Unlike ExecutorAddr, there's limited value to having a distinct type for
ExecutorAddrDiff, and it's occasionally awkward to work with. The corresponding
LLVM type (llvm::orc::ExecutorAddrDiff) was already made a type-alias in
9e2cfb061a882.
Gulfem Savrun Yeniceri [Fri, 26 Aug 2022 16:38:44 +0000 (16:38 +0000)]
[InstrProfiling] No runtime hook for unused funcs
This is a reland of https://reviews.llvm.org/D122336.
Original patch caused a problem in collecting coverage in
Fuchsia because it was returning early without putting unused
function names into __llvm_prf_names section. This patch
fixes that issue.
The original commit message follows:
CoverageMappingModuleGen generates a coverage mapping record
even for unused functions with internal linkage, e.g.
static int foo() { return 100; }
Clang frontend eliminates such functions, but InstrProfiling pass
still emits runtime hook since there is a coverage record.
Fuchsia uses runtime counter relocation, and pulling in profile
runtime for unused functions causes a linker error:
undefined hidden symbol: __llvm_profile_counter_bias.
Since https://reviews.llvm.org/D98061, we do not hook the profile
runtime in Fuchsia for binaries in which no translation unit
has been instrumented. This patch extends that to
instrumented binaries that consist of only unused functions.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D122336
Brad Smith [Fri, 16 Sep 2022 01:43:01 +0000 (21:43 -0400)]
[lit] Set shlibpath_var on OpenBSD
Yuta Mukai [Thu, 15 Sep 2022 16:52:18 +0000 (01:52 +0900)]
[MachinePipeliner] Fix the interpretation of the scheduling model
The method of counting resource consumption is modified to be based on
"Cycles" value when DFA is not used.
The calculation of ResMII is modified to total "Cycles" and divide it
by the number of units for each resource. Previously, ResMII was
excessive because it was assumed that resources were consumed for
the cycles of "Latency" value.
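As a rough sketch of the ResMII calculation described above: the `ResourceUse` struct and `computeResMII` function are hypothetical, and rounding up and taking the maximum over resources follow the usual definition of ResMII rather than anything stated in this commit message.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical description of one resource kind in a scheduling model.
struct ResourceUse {
  uint64_t totalCycles; // sum of "Cycles" over all instructions in the loop
  uint64_t numUnits;    // number of identical units of this resource
};

// Sketch: ResMII as the largest per-resource ceil(totalCycles / numUnits).
// The ceiling and the maximum are assumptions of this sketch.
uint64_t computeResMII(const std::vector<ResourceUse> &resources) {
  uint64_t resMII = 1; // an initiation interval is at least one cycle
  for (const ResourceUse &r : resources) {
    if (r.numUnits == 0)
      continue; // skip malformed entries rather than divide by zero
    uint64_t needed = (r.totalCycles + r.numUnits - 1) / r.numUnits;
    resMII = std::max(resMII, needed);
  }
  return resMII;
}
```

For example, a loop that needs 6 cycles on a resource with 2 units and 5 cycles on another resource with 2 units would get a ResMII of 3 under these assumptions.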
The method of resource reservation is modified similarly. When a
value of "Cycles" is larger than 1, the resource is considered to be
consumed by 1 for cycles of its length from the scheduled cycle.
To realize this, ResourceManager maintains a resource table for all
slots. Previously, resource consumption was always 1 for 1 cycle
regardless of the value of "Cycles" or "Latency".
In addition, the number of micro operations per cycle is modified to
be constrained by "IssueWidth". To disable the constraint,
--pipeliner-force-issue-width=100 can be used.
For the case of using DFA, the scheduling results are unchanged.
Reviewed By: dpenry
Differential Revision: https://reviews.llvm.org/D133572
Colin Cross [Thu, 15 Sep 2022 23:58:57 +0000 (23:58 +0000)]
Set HOME for tests that use module cache path
Getting the default module cache path calls llvm::sys::path::cache_directory,
which calls home_directory, which checks the HOME environment variable
before falling back to getpwuid. When compiling against musl libc,
which does not support NSS, and running on a machine that doesn't have
the current user in /etc/passwd due to NSS, no home directory can
be found. Set the HOME environment variable in the tests to avoid
depending on getpwuid.
Reviewed By: pirama, srhines
Differential Revision: https://reviews.llvm.org/D132984
Navid Emamdoost [Thu, 15 Sep 2022 22:33:43 +0000 (15:33 -0700)]
Add -fsanitize-coverage=control-flow
Reviewed By: kcc, vitalybuka, MaskRay
Differential Revision: https://reviews.llvm.org/D133157
Jeffrey Byrnes [Thu, 15 Sep 2022 22:37:58 +0000 (15:37 -0700)]
[NFC] Fix tests in commit 20cf170e68def
Colin Cross [Thu, 15 Sep 2022 21:58:24 +0000 (21:58 +0000)]
Fix std::fpos pretty printer on musl
The mbstate_t field in std::fpos is an opaque type provided by libc,
and musl's implementation does not match the one used by glibc.
Change StdFposPrinter to verify its assumptions about the layout
of mbstate_t, and leave out the state printing if it doesn't match.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D132983
Aart Bik [Thu, 15 Sep 2022 20:38:14 +0000 (13:38 -0700)]
[mlir][sparse][python] improve sparse encoding test
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D133971
David Green [Thu, 15 Sep 2022 20:52:55 +0000 (21:52 +0100)]
[AArch64] Add some vector lowering tests and regenerate a couple of files. NFC
Roy Sundahl [Thu, 15 Sep 2022 19:09:35 +0000 (12:09 -0700)]
[test][fuzzer] XFAIL tvOS tests pending investigation. (rdar://99981102)
These four tests are failing on tvOS devices (not simulators) so XFAIL
them for now for CI and investigate further.
rdar://99981102
Differential Revision: https://reviews.llvm.org/D133963
Amy Huang [Thu, 15 Sep 2022 20:23:25 +0000 (20:23 +0000)]
Fix error in clang /MT equivalent flag patch.
This is a followup to reviews.llvm.org/D133457.
Philip Reames [Thu, 15 Sep 2022 19:50:00 +0000 (12:50 -0700)]
[RISCV] Verify merge operand is tied properly
Differential Revision: https://reviews.llvm.org/D133957
Philip Reames [Thu, 15 Sep 2022 19:47:58 +0000 (12:47 -0700)]
[RISCV] Verify VL operand on instructions if present
These should only be immediate values or GPR registers.
Differential Revision: https://reviews.llvm.org/D133953
Alexander Timofeev [Fri, 9 Sep 2022 17:32:51 +0000 (19:32 +0200)]
[AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode
This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler. It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions optimally, inserting the copy only when the dependency cannot be resolved.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D133593
Vitaly Buka [Thu, 15 Sep 2022 19:32:43 +0000 (12:32 -0700)]
[test] Regenerate a few tests
Erich Keane [Thu, 15 Sep 2022 19:07:23 +0000 (12:07 -0700)]
Stop trying to fixup 'overloadable' prototypeless functions.
While investigating something else, I discovered that a prototypeless
function with 'overloadable' was having the attribute left on the
declaration, which caused 'ambiguous' call errors later on. This led to
some confusion. This patch removes the 'overloadable' attribute from
the declaration and leaves it as prototypeless, instead of trying to
make it variadic.
Joseph Huber [Thu, 15 Sep 2022 18:58:21 +0000 (13:58 -0500)]
[Libomptarget] Embed bitcode library in static library instead.
This patch changes the CMake to instead embed the already generated
LLVM-IR bitcode library into an object file to create the static
library. This is different from the previous method which generated them
separately. This will make the build faster and allow us to perform the
same internalization into a single library we do with the bitcode
library.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D133952
Hanhan Wang [Thu, 15 Sep 2022 18:44:52 +0000 (11:44 -0700)]
[mlir][linalg] Propagate attributes when doing named ops conversion.
Custom attributes can be set on the operation. This patch prevents them
from being removed when doing named ops conversion.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D133892
Florian Hahn [Thu, 15 Sep 2022 18:35:25 +0000 (19:35 +0100)]
[CGP] Update failing test missed in 81a11da762577.
Groverkss [Thu, 15 Sep 2022 17:45:58 +0000 (18:45 +0100)]
[MLIR][Presburger] Improve unittest parsing
This patch adds better functions for parsing MultiAffineFunctions and
PWMAFunctions in Presburger unittests.
A PWMAFunction can now be parsed as:
```
PWMAFunction result = parsePWMAF({
{"(x, y) : (x >= 10, x <= 20, y >= 1)", "(x, y) -> (x + y)"},
{"(x, y) : (x >= 21)", "(x, y) -> (x + y)"},
{"(x, y) : (x <= 9)", "(x, y) -> (x - y)"},
{"(x, y) : (x >= 10, x <= 20, y <= 0)", "(x, y) -> (x - y)"},
});
```
which is much more readable than the old format since the output can be
described as an AffineMap, instead of coefficients.
This patch also adds support for parsing divisions in MultiAffineFunctions
and PWMAFunctions which was previously not possible.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D133654
Philip Reames [Thu, 15 Sep 2022 18:15:35 +0000 (11:15 -0700)]
[RISCV] Add test coverage for mixed fixed and scalable uses of splats
Florian Hahn [Thu, 15 Sep 2022 18:18:12 +0000 (19:18 +0100)]
[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl.
This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops
using a wide shuffle creating a v64i8 vector, selecting groups of 3
zero elements and an element from the input.
This is profitable on AArch64 where such shuffles can be lowered to tbl
instructions, but only in loops, because it requires materializing 4
masks, which can be done in the loop preheader.
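The shuffle can be modelled in scalar code as a byte permutation over a 64-byte vector. The sketch below assumes little-endian byte order inside each 32-bit result and uses a sentinel lane for "take a zero byte"; `kZeroLane`, `buildZExtMask` and `zextViaShuffle` are hypothetical names, not the CodeGenPrepare implementation.

```cpp
#include <array>
#include <cstdint>

// Sentinel lane index meaning "take a zero byte" (an assumption of this
// sketch; it stands in for selecting from an all-zero shuffle operand).
constexpr int kZeroLane = -1;

// Build the 64-entry byte mask: for each input element, one input lane
// followed by three zero lanes (little-endian order within each result).
std::array<int, 64> buildZExtMask() {
  std::array<int, 64> mask{};
  for (int i = 0; i < 16; ++i) {
    mask[4 * i] = i;
    for (int b = 1; b < 4; ++b)
      mask[4 * i + b] = kZeroLane;
  }
  return mask;
}

// Apply the byte shuffle, then reassemble each group of four bytes into a
// 32-bit value.
std::array<uint32_t, 16> zextViaShuffle(const std::array<uint8_t, 16> &in) {
  const std::array<int, 64> mask = buildZExtMask();
  std::array<uint8_t, 64> wide{};
  for (int i = 0; i < 64; ++i)
    wide[i] = mask[i] == kZeroLane ? 0 : in[mask[i]];
  std::array<uint32_t, 16> out{};
  for (int i = 0; i < 16; ++i)
    for (int b = 0; b < 4; ++b)
      out[i] |= static_cast<uint32_t>(wide[4 * i + b]) << (8 * b);
  return out;
}
```

The mask depends only on the vector shape, not on the data, which is why it can be built once outside the loop.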
This is the only reason the transform is part of CGP. If there's a
better alternative I missed, please let me know. The same goes for the
shouldReplaceZExtWithShuffle hook which guards this. I am not sure if
this transform will be beneficial on other targets, but it seems like
there is no other convenient way.
This improves the generated code for loops like the one below in
combination with D96522.
  int foo(uint8_t *p, int N) {
    unsigned long long sum = 0;
    for (int i = 0; i < N; i++, p++) {
      unsigned int v = *p;
      sum += (v < 127) ? v : 256 - v;
    }
    return sum;
  }
https://clang.godbolt.org/z/Wco866MjY
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D120571
Haojian Wu [Thu, 15 Sep 2022 18:01:20 +0000 (20:01 +0200)]
Revert "Fix bazel build after 84d07d021333f7b5716f0444d5c09105557272e0."
This reverts commit 10250c5a2a2ca6be683ff940d776648a2d5968e3 as the
related patch is being reverted.
Sergei Barannikov [Thu, 15 Sep 2022 17:42:47 +0000 (13:42 -0400)]
[SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduces the amount of boilerplate code a bit. This
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.
Sanjay Patel [Thu, 15 Sep 2022 17:42:30 +0000 (13:42 -0400)]
[SCCP] convert ashr to lshr for non-negative shift value
This is similar to the existing signed instruction folds.
We get the obvious minimal patterns in other passes, but
this avoids potential missed folds when the multi-block
tests are converted to selects.
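Why the conversion is sound can be checked with a small scalar model. The helpers `lshr`, `ashr` and `shiftsAgree` below are illustrative names on 32-bit values, not part of SCCP; the arithmetic shift is written out by hand so the sketch does not rely on how a given compiler shifts negative values.

```cpp
#include <cstdint>

// Logical shift right: always shifts in zero bits.
int32_t lshr(int32_t value, unsigned amount) {
  return static_cast<int32_t>(static_cast<uint32_t>(value) >> amount);
}

// Arithmetic shift right: replicates the sign bit.
int32_t ashr(int32_t value, unsigned amount) {
  uint32_t bits = static_cast<uint32_t>(value) >> amount;
  if (value < 0 && amount > 0)
    bits |= ~(UINT32_MAX >> amount); // fill the vacated high bits with ones
  return static_cast<int32_t>(bits);
}

// Check whether both shifts give the same result for every in-range amount.
bool shiftsAgree(int32_t value) {
  for (unsigned amount = 0; amount < 32; ++amount)
    if (ashr(value, amount) != lshr(value, amount))
      return false;
  return true;
}
```

For a non-negative shifted value the sign bit is zero, so both shifts fill with zeros and agree; a negative value such as -8 is where they diverge.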
Sanjay Patel [Thu, 15 Sep 2022 17:10:24 +0000 (13:10 -0400)]
[SCCP] add tests for ashr range transforms; NFC
Amy Huang [Wed, 31 Aug 2022 22:09:45 +0000 (22:09 +0000)]
Add Clang driver flags equivalent to cl's /MD, /MT, /MDd, /MTd.
This will allow selecting the MS C runtime library without having to use
cc1 flags.
Differential Revision: https://reviews.llvm.org/D133457
Groverkss [Thu, 15 Sep 2022 17:30:57 +0000 (18:30 +0100)]
Revert "[MLIR][Presburger] Improve unittest parsing"
This reverts commit 84d07d021333f7b5716f0444d5c09105557272e0.
Reverted to fix a compilation issue on gcc8.
Groverkss [Thu, 15 Sep 2022 17:29:22 +0000 (18:29 +0100)]
Revert "[mlir] Remove the unused source file."
This reverts commit e488ce29ec5ead2d518c183890215313c9d1b1f0.
Reverted to fix a compilation issue on gcc8.
Aart Bik [Thu, 15 Sep 2022 03:18:51 +0000 (20:18 -0700)]
[mlir][sparse] partially implement codegen for sparse_tensor.compress
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D133912
Siva Chandra Reddy [Thu, 15 Sep 2022 07:52:17 +0000 (07:52 +0000)]
[libc] Add the implementation of the "remove" function.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D133922
Craig Topper [Thu, 15 Sep 2022 16:38:02 +0000 (09:38 -0700)]
[IntegerDivision][AMDGPU] Use CreateLogicalOr to block poison propagation.
There are two ctlz intrinsics here with the zero_is_poison flag
set. There are also two comparisons that check if either of the
inputs to the ctlzs is zero. We need to use a logical or to block
the poison from the ctlz if either of the inputs is zero.
Reviewed By: arsenm, aqjune
Differential Revision: https://reviews.llvm.org/D130680
Sanjay Patel [Thu, 15 Sep 2022 16:01:11 +0000 (12:01 -0400)]
[InstCombine] fold X*X == 0 --> X == 0
This is safe when the mul does not overflow:
https://alive2.llvm.org/ce/z/LedVVP
This could be extended to handle non-zero compare constants
and non-squared multiplies.
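The no-overflow requirement can be illustrated exhaustively at 8 bits. This is only a sketch with hypothetical helper names; the Alive2 link above is the actual proof.

```cpp
#include <cstdint>

// 8-bit multiply that wraps on overflow, like a `mul` without nuw/nsw.
uint8_t wrappingSquare(uint8_t x) {
  return static_cast<uint8_t>(static_cast<unsigned>(x) * x);
}

// True when squaring x does not overflow 8 unsigned bits.
bool squareFits(uint8_t x) { return static_cast<unsigned>(x) * x <= 0xFFu; }

// Exhaustively check: whenever the square does not overflow,
// (x * x == 0) and (x == 0) agree.
bool foldHoldsWithoutOverflow() {
  for (unsigned v = 0; v <= 0xFF; ++v) {
    uint8_t x = static_cast<uint8_t>(v);
    if (squareFits(x) && ((wrappingSquare(x) == 0) != (x == 0)))
      return false;
  }
  return true;
}
```

With wrapping arithmetic, 16 * 16 is 256, which truncates to 0 in 8 bits, so the fold would be wrong for x = 16 if overflow were allowed.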
Sanjay Patel [Thu, 15 Sep 2022 15:56:45 +0000 (11:56 -0400)]
[InstCombine] add tests for X*X == 0; NFC
Simon Pilgrim [Thu, 15 Sep 2022 15:20:56 +0000 (16:20 +0100)]
[CostModel][X86] Add CostKinds handling for vector shift by generic/non-uniform shift amounts
These are the worst case generic vector shift costs, where nothing is known about the shift amounts. In particular, this should stop us using the default sizelatency cost of 1 for so many pre-AVX2 vector shifts, which can often expand during lowering to 20+ uops for 128-bit vectors alone, resulting in some horrible inline/unroll decisions.
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)
Jay Foad [Thu, 15 Sep 2022 09:40:54 +0000 (10:40 +0100)]
[AMDGPU] Add GFX11 ds_bvh_stack_rtn_b32 instruction
Differential Revision: https://reviews.llvm.org/D133928
Joe Loser [Fri, 26 Aug 2022 03:31:22 +0000 (21:31 -0600)]
[libc++] Clean up `_LIBCPP_HAS_NO_PLATFORM_WAIT` macro
As the comment suggests, `_LIBCPP_HAS_NO_PLATFORM_WAIT` is not documented or
defined anywhere internally in the build system. It's a direct define in terms
of `_LIBCPP_HAS_NO_THREADS`. So, remove `_LIBCPP_HAS_NO_PLATFORM_WAIT` and use
`_LIBCPP_HAS_NO_THREADS` instead to control the desired behavior.
Differential Revision: https://reviews.llvm.org/D132715
Matt Arsenault [Sat, 23 Jul 2022 16:32:05 +0000 (12:32 -0400)]
AMDGPU: Use GlobalPriority for largest register tuples
Only do this for 16 and 32 register tuples, although we might want to
extend to 8 tuples.
It's incredibly expensive to spill these, and doing so majorly
interferes with the ability to allocate anything else in the function.
The lit tests show mostly sizeable improvements with a handful of tiny
regressions with large vectors.
Jakub Kuderski [Thu, 15 Sep 2022 15:34:43 +0000 (11:34 -0400)]
[mlir][arith] Support wide int cast emulation
Add support for `arith.extsi`, `arith.extui`, and `arith.trunci` ops.
Tested by checking the results for all 16-bit inputs when emulating i16 with i8.
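What emulating i16 with i8 means for these three casts can be sketched in scalar code. The `EmulatedI16` struct with a (low, high) half order is a hypothetical representation chosen for this sketch and may not match the element order the pass uses.

```cpp
#include <cstdint>

// Hypothetical representation: an emulated i16 as two i8 halves.
struct EmulatedI16 {
  uint8_t low;
  uint8_t high;
};

// extui: zero-extend, so the high half is all zeros.
EmulatedI16 extui(uint8_t x) { return {x, 0}; }

// extsi: sign-extend, so the high half copies the sign bit of the input.
EmulatedI16 extsi(uint8_t x) {
  return {x, static_cast<uint8_t>((x & 0x80) ? 0xFF : 0x00)};
}

// trunci: keep only the low half.
uint8_t trunci(EmulatedI16 v) { return v.low; }

// Reassemble the halves into a native 16-bit value for comparison.
uint16_t toNative(EmulatedI16 v) {
  return static_cast<uint16_t>(v.low | (v.high << 8));
}
```

Comparing `toNative` against the native casts for every input is the same style of exhaustive check the commit message describes.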
Reviewed By: antiagainst, Mogball
Differential Revision: https://reviews.llvm.org/D133612
Dmitry Preobrazhensky [Thu, 15 Sep 2022 15:15:50 +0000 (18:15 +0300)]
[AMDGPU][MC][GFX11] Add disassembler tests for v_readfirstlane_b32
Differential Revision: https://reviews.llvm.org/D133437
Nico Weber [Thu, 15 Sep 2022 15:12:32 +0000 (11:12 -0400)]
Revert "[lld-macho] Add support for N_INDR symbols"
This reverts commit 5b8da10b87f7009c06215449e4a9c61dab91697a.
Breaks tests, see https://reviews.llvm.org/D133825
Sander de Smalen [Wed, 14 Sep 2022 15:53:13 +0000 (15:53 +0000)]
[AArch64][SME] Fix lowering of llvm.aarch64.get.pstatesm()
A thread may not have access to SME or TPIDR2_EL0, so in order to
safely query PSTATE.SM in a streaming-compatible function, the
code should call `__arm_sme_state()`, as described in the ABI:
https://github.com/ARM-software/abi-aa/pull/123/commits/c2bb09c4d4ee60a5787baf1ccc7e92e67e4240b7
This means that the value of pstate.sm is:
* 0 if the function is non-streaming.
* 1 if the function has `arm_streaming` or `arm_locally_streaming`.
* evaluated at runtime by a call to __arm_sme_state() otherwise.
This patch also adds a calling convention for calls to SME support routines.
At some point we can remove the need for the llvm.aarch64.get.pstatesm() intrinsic
and use function calls (with the corresponding cc) directly instead.
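The three cases above can be sketched as a small decision function. This is a Python model of the rule only, not the lowering code: the enum, the function name, and the `query_runtime` callback (standing in for the call to `__arm_sme_state()`) are all hypothetical.

```python
from enum import Enum

class StreamingMode(Enum):
    # Hypothetical labels for the function kinds described above.
    NON_STREAMING = "non-streaming"
    STREAMING = "arm_streaming"
    LOCALLY_STREAMING = "arm_locally_streaming"
    STREAMING_COMPATIBLE = "streaming-compatible"

def resolve_pstate_sm(mode, query_runtime):
    """Return the value of pstate.sm for a function of the given kind.

    `query_runtime` is a stand-in for the runtime query; it is only
    invoked when the value cannot be decided statically.
    """
    if mode is StreamingMode.NON_STREAMING:
        return 0
    if mode in (StreamingMode.STREAMING, StreamingMode.LOCALLY_STREAMING):
        return 1
    # Streaming-compatible: the answer is only known at run time.
    return query_runtime()
```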
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D131571
Dmitry Preobrazhensky [Thu, 15 Sep 2022 15:03:26 +0000 (18:03 +0300)]
[AMDGPU][MC][GFX11][NFC] Update disassembler tests for MIMG instructions
Differential Revision: https://reviews.llvm.org/D133411
Simon Pilgrim [Thu, 15 Sep 2022 14:55:00 +0000 (15:55 +0100)]
[CostModel][X86] Remove redundant SSSE3 checks from div/rem costs
These all match the default SSE2 costs so use those instead
Matt Arsenault [Sat, 23 Jul 2022 14:13:25 +0000 (10:13 -0400)]
RegAllocGreedy: Avoid overflowing priority bitfields
The class priority is expected to be at most 5 bits before it starts
clobbering bits used for other fields. Also clamp the instruction
distance in case we have millions of instructions.
AMDGPU was accidentally overflowing into the global priority bit in
some cases. I think in principle we would have wanted this, but in the
cases I've looked at, it had the counterintuitive effect and
de-prioritized the large register tuple.
Avoid using the weird bit hack PPC uses for global priority. The
AllocationPriority field is really 5 bits, and PPC was relying on
overflowing this to 6 bits to forcibly set the global priority
bit. Split this out as a separate flag to avoid having magic behavior
for values above 31.
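To make the overflow concrete, here is a minimal Python sketch of packing such a priority word with clamping. The bit positions are assumptions chosen for illustration (a 24-bit distance, a 5-bit class priority above it, and a global priority flag in the next bit); they are not claimed to match the actual RegAllocGreedy layout.

```python
# Hypothetical layout, for illustration only:
#   bits 0..23  instruction distance
#   bits 24..28 class priority (5 bits)
#   bit  29     global priority flag
CLASS_PRIORITY_BITS = 5
CLASS_PRIORITY_SHIFT = 24
GLOBAL_PRIORITY_BIT = 1 << 29
DISTANCE_MAX = (1 << CLASS_PRIORITY_SHIFT) - 1
CLASS_PRIORITY_MAX = (1 << CLASS_PRIORITY_BITS) - 1

def pack_priority(instr_distance, class_priority, global_priority):
    """Pack the fields, clamping each one so it cannot spill into its neighbour."""
    distance = min(instr_distance, DISTANCE_MAX)
    cls = min(class_priority, CLASS_PRIORITY_MAX)
    prio = distance | (cls << CLASS_PRIORITY_SHIFT)
    if global_priority:
        # The flag is set explicitly rather than by overflowing the class field.
        prio |= GLOBAL_PRIORITY_BIT
    return prio
```

Without the clamp, a class priority of 32 or more would set the flag bit as a side effect, which is the kind of magic behavior the separate flag avoids.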
Simon Pilgrim [Thu, 15 Sep 2022 14:28:51 +0000 (15:28 +0100)]
[CostModel][X86] Remove redundant BTVER2 checks from arithmetic costs
These all match the default AVX/AVX1 costs so use those instead
Simon Pilgrim [Thu, 15 Sep 2022 14:25:52 +0000 (15:25 +0100)]
[CostModel][X86] Remove redundant BTVER2 checks from shift costs
These all match the default AVX/AVX1 costs so use those instead
Florian Hahn [Thu, 15 Sep 2022 14:12:33 +0000 (15:12 +0100)]
[AArch64] Add big-endian tests for trunc-to-tbl.ll
Extra tests for D133495.
Dmitry Preobrazhensky [Thu, 15 Sep 2022 13:36:19 +0000 (16:36 +0300)]
[AMDGPU][MC][GFX11] Add validation of constant bus limitations for VOPD
Differential Revision: https://reviews.llvm.org/D133881
Dmitry Preobrazhensky [Thu, 15 Sep 2022 13:29:53 +0000 (16:29 +0300)]
[AMDGPU][MC][GFX11] Add VOPD literals validation
Differential Revision: https://reviews.llvm.org/D133864
Dmitry Preobrazhensky [Thu, 15 Sep 2022 13:24:25 +0000 (16:24 +0300)]
[AMDGPU][MC][NFC] Refactor AMDGPUAsmParser::validateVOPLiteral
Differential Revision: https://reviews.llvm.org/D133861
Mehdi Amini [Mon, 29 Aug 2022 12:18:14 +0000 (12:18 +0000)]
Apply clang-tidy fixes for llvm-include-order in TypeTest.cpp (NFC)
Mehdi Amini [Mon, 29 Aug 2022 12:14:14 +0000 (12:14 +0000)]
Apply clang-tidy fixes for bugprone-argument-comment in LLVMTypeTest.cpp (NFC)
Mehdi Amini [Mon, 29 Aug 2022 12:10:49 +0000 (12:10 +0000)]
Apply clang-tidy fixes for readability-simplify-boolean-expr in OpFormatGen.cpp (NFC)
Tue Ly [Thu, 15 Sep 2022 05:00:13 +0000 (01:00 -0400)]
[libc][math] Improve sinhf and coshf performance.
Optimize `sinhf` and `coshf` by computing exp(x) and exp(-x) simultaneously.
Currently `sinhf` and `coshf` are implemented using the following formulas:
```
sinh(x) = 0.5 *(exp(x) - 1) - 0.5*(exp(-x) - 1)
cosh(x) = 0.5*exp(x) + 0.5*exp(-x)
```
where `exp(x)` and `exp(-x)` are calculated separately using the formula:
```
exp(x) ~ 2^hi * 2^mid * exp(dx)
~ 2^hi * 2^mid * P(dx)
```
By expanding the polynomial `P(dx)` into even and odd parts
```
P(dx) = P_even(dx) + dx * P_odd(dx)
```
we can see that the computations of `exp(x)` and `exp(-x)` have many things in common,
namely:
```
exp(x) ~ 2^(hi + mid) * (P_even(dx) + dx * P_odd(dx))
exp(-x) ~ 2^(-(hi + mid)) * (P_even(dx) - dx * P_odd(dx))
```
Expanding `sinh(x)` and `cosh(x)` with respect to the above formulas, we can compute
these two functions as follows in order to maximize the shared parts:
```
sinh(x) = (e^x - e^(-x)) / 2
~ 0.5 * (P_even * (2^(hi + mid) - 2^(-(hi + mid))) +
dx * P_odd * (2^(hi + mid) + 2^(-(hi + mid))))
cosh(x) = (e^x + e^(-x)) / 2
~ 0.5 * (P_even * (2^(hi + mid) + 2^(-(hi + mid))) +
dx * P_odd * (2^(hi + mid) - 2^(-(hi + mid))))
```
So in this patch, we perform the following optimizations for `sinhf` and `coshf`:
# Use the above formulas to maximize sharing intermediate results,
# Apply similar optimizations from https://reviews.llvm.org/D133870
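The sharing described by the formulas above can be sketched in Python as follows. This is a double-precision model under simplifying assumptions, not the libc implementation: `hi` and `mid` are merged into a single integer exponent `k`, the range reduction is done against ln(2), and plain Taylor coefficients stand in for the patch's polynomial `P`. The function name is hypothetical.

```python
import math

# Taylor coefficients of exp, used here as a stand-in for the polynomial P.
COEFFS = [1.0 / math.factorial(n) for n in range(14)]

def sinh_cosh_shared(x):
    """Compute (sinh(x), cosh(x)) from one shared range reduction."""
    # Range reduction: x = k * ln(2) + dx, with |dx| <= ln(2) / 2.
    k = round(x / math.log(2))
    dx = x - k * math.log(2)
    dx2 = dx * dx
    # P(dx) = P_even(dx) + dx * P_odd(dx); both parts are polynomials in dx^2.
    p_even = sum(c * dx2 ** i for i, c in enumerate(COEFFS[0::2]))
    p_odd = sum(c * dx2 ** i for i, c in enumerate(COEFFS[1::2]))
    pos = math.ldexp(1.0, k)   # 2^k
    neg = math.ldexp(1.0, -k)  # 2^(-k)
    # The same four intermediate values feed both results.
    sinh_x = 0.5 * (p_even * (pos - neg) + dx * p_odd * (pos + neg))
    cosh_x = 0.5 * (p_even * (pos + neg) + dx * p_odd * (pos - neg))
    return sinh_x, cosh_x
```

Only one reduction and one pair of polynomial evaluations are needed for both functions, which is where the savings over two separate `exp` evaluations come from.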
Performance benchmark using the `perf` tool from the CORE-MATH project on a Ryzen 1700:
For `sinhf`:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinhf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput : 16.718
System LIBC reciprocal throughput : 63.151
BEFORE:
LIBC reciprocal throughput : 90.116
LIBC reciprocal throughput : 28.554 (with `-msse4.2` flag)
LIBC reciprocal throughput : 22.577 (with `-mfma` flag)
AFTER:
LIBC reciprocal throughput : 36.482
LIBC reciprocal throughput : 16.955 (with `-msse4.2` flag)
LIBC reciprocal throughput : 13.943 (with `-mfma` flag)
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinhf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency : 48.821
System LIBC latency : 137.019
BEFORE
LIBC latency : 97.122
LIBC latency : 84.214 (with `-msse4.2` flag)
LIBC latency : 71.611 (with `-mfma` flag)
AFTER
LIBC latency : 54.555
LIBC latency : 50.865 (with `-msse4.2` flag)
LIBC latency : 48.700 (with `-mfma` flag)
```
For `coshf`:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh coshf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput : 16.939
System LIBC reciprocal throughput : 19.695
BEFORE:
LIBC reciprocal throughput : 52.845
LIBC reciprocal throughput : 29.174 (with `-msse4.2` flag)
LIBC reciprocal throughput : 22.553 (with `-mfma` flag)
AFTER:
LIBC reciprocal throughput : 37.169
LIBC reciprocal throughput : 17.805 (with `-msse4.2` flag)
LIBC reciprocal throughput : 14.691 (with `-mfma` flag)
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh coshf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency : 48.478
System LIBC latency : 48.044
BEFORE
LIBC latency : 99.123
LIBC latency : 85.595 (with `-msse4.2` flag)
LIBC latency : 72.776 (with `-mfma` flag)
AFTER
LIBC latency : 57.760
LIBC latency : 53.967 (with `-msse4.2` flag)
LIBC latency : 50.987 (with `-mfma` flag)
```
Reviewed By: orex, zimmermann6
Differential Revision: https://reviews.llvm.org/D133913
Alexey Lapshin [Sun, 4 Sep 2022 09:38:36 +0000 (12:38 +0300)]
[DWARFLinker][NFC] Set the target DWARF version explicitly.
Currently, DWARFLinker determines the target DWARF version internally.
It examines the incoming object files, detects the maximal
DWARF version, and uses that version for the output file.
This patch allows the consumer of DWARFLinker to set the output DWARF
version explicitly, so that DWARFLinker uses the specified version instead
of an autodetected one. This lets consumers use different logic for
setting the target DWARF version. For example, instead of the maximal version
in use, someone could set a higher version to convert from DWARFv4 to DWARFv5
(this possibility is not supported yet, but it would be good for
the interface to support it). Another variant is to set the target
version through the command line. In this patch, the autodetection is moved
into the consumers (DwarfLinkerForBinary.cpp, DebugInfoLinker.cpp).
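The division of responsibility described above can be modelled in a few lines of Python. This is a sketch of the idea only: the class and function names are hypothetical and do not correspond to the DWARFLinker C++ interface.

```python
class SketchLinker:
    """Stand-in for a linker that no longer guesses the version itself."""

    def __init__(self):
        self.target_version = None

    def set_target_dwarf_version(self, version):
        # The consumer decides; the linker just records the choice.
        self.target_version = version

def autodetect_max_version(input_versions):
    """Consumer-side policy: pick the highest version seen in the inputs."""
    return max(input_versions)

# One possible consumer: keep the old autodetection behaviour.
linker = SketchLinker()
linker.set_target_dwarf_version(autodetect_max_version([2, 4, 4]))
```

A different consumer could instead pass a fixed value, for instance one taken from a command-line option, without any change to the linker side.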
Differential Revision: https://reviews.llvm.org/D132755
Simon Pilgrim [Thu, 15 Sep 2022 13:01:27 +0000 (14:01 +0100)]
[CostModel][X86] Add CostKinds handling for vector shift by uniform/constuniform ops
Vector shift by const uniform is the cheapest shift instruction we have; non-const uniform shifts have a marginally higher cost: some targets 'splat' the amount internally to use the shift-per-element instruction, while others see a higher cost for the explicit zeroing of the upper bits of the (64-bit) shift amount.
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)
Florian Hahn [Thu, 15 Sep 2022 13:01:26 +0000 (14:01 +0100)]
[AArch64] Add big-endian tests for zext-to-tbl.ll
Extra tests for D120571.
wanglei [Thu, 15 Sep 2022 12:31:24 +0000 (20:31 +0800)]
[LoongArch] Fixup value adjustment in applyFixup
A complete implementation of `applyFixup` for D132323.
Make `LoongArchAsmBackend::shouldForceRelocation` determine
whether the relocation types must be forced.
This patch also adds range and alignment checks for `b*` instructions'
operands, at which point the offset to a label is known.
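The kind of range and alignment check mentioned above can be sketched in Python as follows. This assumes a branch immediate that stores the byte offset shifted right by 2; the immediate width is passed as a parameter, and the function name and error messages are hypothetical rather than the patch's actual diagnostics.

```python
def check_branch_offset(offset, encoded_bits):
    """Validate a byte offset and return the immediate field value.

    Assumes the instruction stores `offset >> 2` in a signed field that
    is `encoded_bits` wide.
    """
    if offset % 4 != 0:
        raise ValueError("fixup value must be 4-byte aligned")
    # The reachable byte range is a signed (encoded_bits + 2)-bit value.
    total_bits = encoded_bits + 2
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    if not lowest <= offset <= highest:
        raise ValueError("fixup value out of range")
    # Drop the two implied zero bits and keep the two's-complement field.
    return (offset >> 2) & ((1 << encoded_bits) - 1)
```

Such a check can only run once the offset to the label is known, which is why it belongs at fixup application time.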
Differential Revision: https://reviews.llvm.org/D132818
Aleksandr Platonov [Thu, 15 Sep 2022 12:51:30 +0000 (15:51 +0300)]
[clang][RecoveryExpr] Don't perform alignment check if parameter type is dependent
This patch fixes a crash caused by calling getTypeAlignInChars() with a dependent type.
Reviewed By: hokein
Differential Revision: https://reviews.llvm.org/D133886