Fraser Cormack [Tue, 1 Feb 2022 10:54:39 +0000 (10:54 +0000)]
[RISCV][3/3] Switch undef -> poison in scalable-vector RVV tests
Fraser Cormack [Tue, 1 Feb 2022 10:54:19 +0000 (10:54 +0000)]
[RISCV][2/3] Switch undef -> poison in fixed-vector RVV tests
Fraser Cormack [Tue, 1 Feb 2022 10:51:01 +0000 (10:51 +0000)]
[RISCV][1/3] Switch undef -> poison in VP RVV tests
Inspired by a recent Discourse post on undef vs. poison usage, this
series of patches should reduce the number of undefs in LLVM tests by
around 10%.
Only undef vector operands to insertelement/shufflevector have been
handled, which are by far the most common we've got.
The switchover is split into 3 fairly arbitrary clusters to make it
slightly more manageable: vector predication, fixed-length vectors,
scalable vectors.
Benjamin Kramer [Tue, 1 Feb 2022 10:58:27 +0000 (11:58 +0100)]
[mlir] Attempt working around a GCC 5 bug
It doesn't like implicit `this` in generic lambdas.
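For illustration only (not code from this patch), a minimal sketch of the usual workaround pattern: spell out `this->` inside a generic lambda. `Accumulator` and its members are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical class, used only to illustrate the pattern.
struct Accumulator {
  int total = 0;

  void add(int v) { total += v; }

  void addAll(const std::vector<int> &values) {
    // Generic lambda (auto parameter) that captures `this`.
    auto visit = [this](auto v) {
      // Writing plain `add(v)` would rely on the implicit `this`.
      // Spelling it out as `this->add(v)` avoids that.
      this->add(v);
    };
    for (int v : values)
      visit(v);
  }
};
```

Calling `addAll({1, 2, 3})` on a fresh `Accumulator` leaves `total` at 6.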
Nicolas Vasilache [Wed, 26 Jan 2022 10:57:09 +0000 (05:57 -0500)]
[mlir][LLVM] Add support for operand_attrs to InlineAsmOp
This revision adds enough support to allow InlineAsmOp to work properly with indirect memory constraints "*m".
These require an explicit "elementtype" TypeAttr on the operands to pass LLVM verification and need to be provided.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D118006
Nikita Popov [Tue, 1 Feb 2022 10:52:31 +0000 (11:52 +0100)]
[AArch64] Regenerate test checks (NFC)
The check lines were in the wrong order.
Prashant Kumar [Sat, 22 Jan 2022 18:52:46 +0000 (00:22 +0530)]
[MLIR] Extract division representation from equality expressions.
Extract the division representation from equality constraints.
For example:
32*k == 16*i + j - 31 <-- k is the localVariable
expr = 16*i + j - 31, divisor = 32
k = (16*i + j - 31) floordiv 32
The dividend of the division is set to [16, 1, -31] and the divisor is set
to 32.
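A small arithmetic check, as a sketch rather than the code in this patch: whenever an integer k satisfies 32*k == 16*i + j - 31, k equals the floor division of the right-hand side by 32. The helper names (`floorDiv`, `dividend`, `equalityImpliesFloorDiv`) are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Floor division; C++ `/` truncates toward zero, so adjust when the signs
// differ and the division is inexact.
inline int64_t floorDiv(int64_t lhs, int64_t rhs) {
  assert(rhs != 0 && "division by zero");
  int64_t quotient = lhs / rhs;
  bool inexact = (lhs % rhs) != 0;
  bool signsDiffer = (lhs < 0) != (rhs < 0);
  return (inexact && signsDiffer) ? quotient - 1 : quotient;
}

// Right-hand side of the equality: 16*i + j - 31.
inline int64_t dividend(int64_t i, int64_t j) { return 16 * i + j - 31; }

// Checks, over a small box of (i, j) values, that whenever an integer k with
// 32*k == 16*i + j - 31 exists, it equals dividend(i, j) floordiv 32.
inline bool equalityImpliesFloorDiv() {
  for (int64_t i = -8; i <= 8; ++i) {
    for (int64_t j = -64; j <= 64; ++j) {
      int64_t rhs = dividend(i, j);
      if (rhs % 32 != 0)
        continue; // no integer k satisfies the equality here
      int64_t k = rhs / 32; // exact, since rhs is a multiple of 32
      if (32 * k != rhs || floorDiv(rhs, 32) != k)
        return false;
    }
  }
  return true;
}
```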
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D117959
Benjamin Kramer [Tue, 1 Feb 2022 10:37:15 +0000 (11:37 +0100)]
Revert "[SLP]Alternate vectorization for cmp instructions."
This reverts commit
afaaecc88c6e5989de8a6a0266610860ef99d9d6.
Crashes when compiling SciPy, test case https://reviews.llvm.org/P8276
tyb0807 [Sun, 23 Jan 2022 22:29:33 +0000 (22:29 +0000)]
[ARM] Make getInstSizeInBytes() use instruction size from InstrInfo.td
Currently, ARMBaseInstrInfo::getInstSizeInBytes() uses hard-coded
instruction size for some pseudo-instructions, while this
information should ideally be found in ARMInstrInfo.td,
ARMInstrThumb(2).td files (which can be accessed via MCInstrDesc). Hence,
the .td files should be updated and no hard-coded instruction sizes
should be used by getInstSizeInBytes() anymore.
Differential Revision: https://reviews.llvm.org/D118009
tyb0807 [Sat, 22 Jan 2022 10:31:56 +0000 (10:31 +0000)]
[AArch64] Make getInstSizeInBytes() use instruction size from InstrInfo.td
Currently, AArch64InstrInfo::getInstSizeInBytes() uses hard-coded
instruction size for some pseudo-instructions, while this
information should ideally be found in AArch64InstrInfo.td file (which
can be accessed via MCInstrDesc). Hence, the .td file should be updated
and no hard-coded instruction sizes should be used by
getInstSizeInBytes() anymore.
Differential Revision: https://reviews.llvm.org/D117970
Fraser Cormack [Mon, 31 Jan 2022 17:43:37 +0000 (17:43 +0000)]
[RISCV] Add a test showing an incorrect VSETVLI insertion
This test shows a loop, whose preheader uses a SEW=64, LMUL=1 vector
operation. The loop body starts off with another SEW=64, LMUL=1 VADD
vector operation, before switching to a SEW=32, LMUL=1/2 vector store
instruction.
We can see that the VSETVLI insertion pass omits a VSETVLI before the
VADD (thinking it inherits its configuration from the preheader) but
does place a SEW=32, LMUL=1/2 VSETVLI before the store. This results in
a miscompilation as when the loop comes back around, the VADD is
incorrectly configured with SEW=32, LMUL=1/2.
It appears to be a bad load/store optimization, as replacing the vector
store with an SEW=32, LMUL=1/2 VADD does correctly insert a VSETVLI. The
issue is therefore possibly arising from canSkipVSETVLIForLoadStore.
Differential Revision: https://reviews.llvm.org/D118629
David Spickett [Tue, 1 Feb 2022 10:11:02 +0000 (10:11 +0000)]
[compiler-rt][fuzzer] Disable 2 tests for Arm Thumb builds
These tests appear to be causing timeouts on our silent
Thumbv7 bot: https://lab.llvm.org/staging/#/builders/162/builds/260
It is possible they would complete given enough time. value-profile-switch
seems to take a long time even on a powerful Armv8 machine.
Bjorn Pettersson [Fri, 28 Jan 2022 12:23:47 +0000 (13:23 +0100)]
[DAGCombine] Add simple folds for SSHLSAT/USHLSAT
Do "simplifyShift" and "FoldConstantArithmetic" folds for the SSHLSAT
and USHLSAT DAG nodes.
This includes folds such as:
(shlsat undef/poison, x) -> 0
(shlsat x, undef/poison) -> undef
(shlsat x, too_large_shamt) -> undef
(shlsat 0, x) -> 0
(shlsat x, 0) -> x
(shlsat c1, c2) -> c3
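To make the well-defined folds concrete, here is a scalar 8-bit model of the two operations. This is a sketch of the semantics only, not the DAG combiner code, and `ushlsat8`/`sshlsat8` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned saturating shift left on 8-bit values (scalar model of USHLSAT).
// The shift amount must be smaller than the bit width.
inline uint8_t ushlsat8(uint8_t x, unsigned amt) {
  assert(amt < 8 && "shift amount must be less than the bit width");
  unsigned wide = static_cast<unsigned>(x) << amt;
  if (wide > 0xFFu)
    return 0xFF; // clamp to the unsigned maximum
  return static_cast<uint8_t>(wide);
}

// Signed saturating shift left on 8-bit values (scalar model of SSHLSAT).
inline int8_t sshlsat8(int8_t x, unsigned amt) {
  assert(amt < 8 && "shift amount must be less than the bit width");
  // Multiply instead of shifting so negative inputs are handled portably.
  int wide = static_cast<int>(x) * (1 << amt);
  if (wide > 127)
    return 127; // clamp to the signed maximum
  if (wide < -128)
    return -128; // clamp to the signed minimum
  return static_cast<int8_t>(wide);
}
```

In this model `shlsat 0, x` is always 0 and `shlsat x, 0` is always x, which is what the last two simplifications rely on; two constant operands can be evaluated directly.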
Differential Revision: https://reviews.llvm.org/D118603
Bjorn Pettersson [Mon, 31 Jan 2022 22:23:48 +0000 (23:23 +0100)]
Pre-commit test cases missing SSHLSAT/USHLSAT folds. NFC
Florian Hahn [Tue, 1 Feb 2022 09:50:47 +0000 (09:50 +0000)]
[LV] Use onlyFirstLaneDemanded when widening pointer phis (NFCI).
This removes another instance of recipe execution still relying on
the cost model.
Depends on D116554.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D116656
David Sherwood [Thu, 13 Jan 2022 11:07:54 +0000 (11:07 +0000)]
[CodeGen] Support folds of not(cmp(cc, ...)) -> cmp(!cc, ...) for scalable vectors
I have updated TargetLowering::isConstTrueVal to also consider
SPLAT_VECTOR nodes with constant integer operands. This allows the
optimisation to also work for targets that support scalable vectors.
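The identity behind the fold, shown per lane on scalar integers. This is only a sketch: `CondCode`, `invert` and `evaluate` are hypothetical stand-ins, and floating-point predicates would additionally need the ordered/unordered distinction.

```cpp
#include <cassert>

// Integer comparison predicates, a small stand-in for a condition code.
enum class CondCode { EQ, NE, LT, GE, GT, LE };

// Returns the predicate that holds exactly when `cc` does not.
inline CondCode invert(CondCode cc) {
  switch (cc) {
  case CondCode::EQ: return CondCode::NE;
  case CondCode::NE: return CondCode::EQ;
  case CondCode::LT: return CondCode::GE;
  case CondCode::GE: return CondCode::LT;
  case CondCode::GT: return CondCode::LE;
  case CondCode::LE: return CondCode::GT;
  }
  assert(false && "unknown condition code");
  return cc;
}

// Evaluates `lhs cc rhs` for one lane.
inline bool evaluate(CondCode cc, int lhs, int rhs) {
  switch (cc) {
  case CondCode::EQ: return lhs == rhs;
  case CondCode::NE: return lhs != rhs;
  case CondCode::LT: return lhs < rhs;
  case CondCode::GE: return lhs >= rhs;
  case CondCode::GT: return lhs > rhs;
  case CondCode::LE: return lhs <= rhs;
  }
  assert(false && "unknown condition code");
  return false;
}
```

For every predicate and every pair of inputs, `!evaluate(cc, a, b)` equals `evaluate(invert(cc), a, b)`, so the separate `not` can be dropped.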
Differential Revision: https://reviews.llvm.org/D117210
Nikita Popov [Tue, 1 Feb 2022 09:42:34 +0000 (10:42 +0100)]
[ArgPromotion] Add alignment test (NFC)
This shows a miscompile in the current argpromotion implementation:
We may speculatively execute overaligned loads.
Jay Foad [Mon, 31 Jan 2022 16:56:32 +0000 (16:56 +0000)]
[StructurizeCFG] Clean up some boolean not instructions
In some cases StructurizeCFG inserts i1 xor instructions to invert
predicates. Add a quick loop to clean these up afterwards if we can get
away with modifying an existing compare instruction instead.
(StructurizeCFG is generally run late in the pipeline so instcombine
does not clean them up for us.)
Differential Revision: https://reviews.llvm.org/D118623
Nikita Popov [Tue, 1 Feb 2022 09:34:02 +0000 (10:34 +0100)]
[ArgPromotion] Regenerate test checks (NFC)
Nikita Popov [Tue, 1 Feb 2022 09:32:58 +0000 (10:32 +0100)]
[ArgPromotion] Use range-based for loop (NFC)
David Green [Tue, 1 Feb 2022 09:21:49 +0000 (09:21 +0000)]
[LV][AArch64] Add test for scalar interleaving with predication. NFC
Siva Chandra [Tue, 1 Feb 2022 06:17:44 +0000 (22:17 -0800)]
[libc] Add a few missing deps, includes, and fix a few typos.
This allows us to enable rmdir, mkdir, mkdirat, unlink and unlinkat for
aarch64.
Johannes Doerfert [Tue, 1 Feb 2022 08:23:55 +0000 (02:23 -0600)]
[Attributor][FIX] Relax assertion in IRPosition::verify
A call base can be a floating value if we talk about the instruction and
not the return value. This distinction was not made before but is
important for liveness, e.g., a call site return value might be unused
(=dead) but the call site is not.
Markus Lavin [Tue, 1 Feb 2022 08:16:50 +0000 (09:16 +0100)]
[llvm-reduce] Set ShouldPreserveUseListOrder=true
When exporting textual IR during reduction the ShouldPreserveUseListOrder
parameter of the IR printer should be set to get predictable results.
Differential Revision: https://reviews.llvm.org/D118585
Johannes Doerfert [Tue, 1 Feb 2022 08:18:02 +0000 (02:18 -0600)]
[UpdateTestChecks][FIX] Expected output changed with Attributor
Marek Kurdej [Tue, 1 Feb 2022 08:05:26 +0000 (09:05 +0100)]
[clang-format] Fix AlignConsecutiveAssignments breaking lambda formatting.
Fixes https://github.com/llvm/llvm-project/issues/52772.
This patch fixes the formatting of the code:
```
auto
aaaaaaaaaaaaaaaaaaaaa = {};
auto b = g([] {
return;
});
```
which should be left as is, but before this patch was formatted to:
```
auto
aaaaaaaaaaaaaaaaaaaaa = {};
auto b = g([] {
return;
});
```
Reviewed By: MyDeveloperDay, HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D115972
Fangrui Song [Tue, 1 Feb 2022 08:16:42 +0000 (00:16 -0800)]
[ELF] Change vector<Symbol *> to SmallVector. NFC
Siva Chandra Reddy [Tue, 1 Feb 2022 08:13:43 +0000 (08:13 +0000)]
[libc] Adjust few fcntl macros for aarch64.
Fangrui Song [Tue, 1 Feb 2022 08:14:21 +0000 (00:14 -0800)]
[ELF] Change vector<InputSection *> to SmallVector. NFC
My x86-64 lld executable is 8KiB smaller.
Johannes Doerfert [Tue, 1 Feb 2022 08:13:01 +0000 (02:13 -0600)]
[Attributor][FIX] Repair broken unit test
Fangrui Song [Tue, 1 Feb 2022 08:09:30 +0000 (00:09 -0800)]
[ELF] Switch split-stack to use SmallVector. NFC
My x86-64 lld executable is 1.1KiB smaller.
Marek Kurdej [Mon, 31 Jan 2022 17:39:00 +0000 (18:39 +0100)]
[clang-format] Don't break block comments when sorting includes.
Fixes https://github.com/llvm/llvm-project/issues/34626.
Before, the include sorter would break the code:
```
#include <stdio.h>
#include <stdint.h> /* long
comment */
```
and change it into:
```
#include <stdint.h> /* long
#include <stdio.h>
comment */
```
This commit handles only the most basic case of a single block comment on an include line, but does not try to handle all the possible edge cases with multiple comments.
Reviewed By: HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D118627
Johannes Doerfert [Mon, 31 Jan 2022 22:45:17 +0000 (16:45 -0600)]
[Attributor] Introduce the `AA::isPotentiallyReachable` helper APIs
To make usage easier (compared to the many reachability related AAs),
this patch introduces a helper API, `AA::isPotentiallyReachable`, which
performs all the necessary steps. It also does the "backwards"
reachability (see D106720) as that simplifies the AA a lot (backwards
queries were somewhat different from the other query resolvers), and
ensures we use cached values in every stage.
To test inter-procedural reachability in a reasonable way this patch
includes an extension to `AAPointerInfo::forallInterferingWrites`.
Basically, we can exclude writes if they cannot reach a load "during the
lifetime" of the allocation. That is, we need to go up the call graph to
determine reachability until we can determine the allocation would be
dead in the caller. This leads to new constant propagations (through
memory) in `value-simplify-pointer-info-gpu.ll`.
Note: The new code contains plenty of debug output to determine how
reachability queries are resolved.
Parts extracted from D110078.
Differential Revision: https://reviews.llvm.org/D118673
Johannes Doerfert [Mon, 31 Jan 2022 20:16:43 +0000 (14:16 -0600)]
[Attributor] Introduce the concept of query AAs
D106720 introduced features that did not work properly as we could add
new queries after a fixpoint was reached and which could not be answered
by the information gathered up to the fixpoint alone.
As an alternative to D110078, which forced eager computation where we
want to continue to be lazy, this patch fixes the problem.
QueryAAs are AAs that allow lazy queries during their lifetime. They are
never fixed if they have no outstanding dependences and always run as
part of the updates in an iteration. To determine if we are done, all
query AAs are asked if they received new queries, if not, we only need
to consider updated AAs, as before. If new queries are present we go for
another iteration.
Differential Revision: https://reviews.llvm.org/D118669
Johannes Doerfert [Tue, 1 Feb 2022 01:16:45 +0000 (19:16 -0600)]
[Attributor] Pre-commit test case
This test shows how we can use alloca position and kernel+AS information
to improve reachability queries and consequently store-load forwarding.
The thirst argument passed to the @use function can be determined
statically (a constant). The others cannot and are there for
verification.
Kuter Dinel [Mon, 31 Jan 2022 20:29:09 +0000 (14:29 -0600)]
[Attributor] AAFunctionReachability, Instruction reachability.
This patch implements instruction reachability for the AAFunctionReachability
attribute. It is used to tell if a certain instruction can reach a function
transitively.
NOTE: I created a new commit based off D106720 and set the author back to
Kuter. Other metadata, etc. is wrong. I also addressed the
remaining review comments and fixed the unit test.
Differential Revision: https://reviews.llvm.org/D106720
Johannes Doerfert [Mon, 20 Sep 2021 19:41:52 +0000 (14:41 -0500)]
[Attributor] Use AAFunctionReachability to determine AANoRecurse
We missed out on AANoRecurse in the module pass because we had no call
graph. With AAFunctionReachability we can simply ask if the function may
reach itself.
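The idea can be sketched on a toy call graph; this is not the Attributor implementation, and `CallGraph`, `mayReach` and `isNoRecurse` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy call graph: callees[f] lists the functions called directly by f.
using CallGraph = std::vector<std::vector<std::size_t>>;

// Returns true if `to` is reachable from `from` through one or more call
// edges. All callee indices are assumed to be valid indices into `callees`.
inline bool mayReach(const CallGraph &callees, std::size_t from,
                     std::size_t to) {
  assert(from < callees.size() && to < callees.size());
  std::vector<bool> visited(callees.size(), false);
  std::vector<std::size_t> worklist(callees[from].begin(),
                                    callees[from].end());
  while (!worklist.empty()) {
    std::size_t f = worklist.back();
    worklist.pop_back();
    if (f == to)
      return true;
    if (visited[f])
      continue;
    visited[f] = true;
    worklist.insert(worklist.end(), callees[f].begin(), callees[f].end());
  }
  return false;
}

// A function is treated as non-recursive if it cannot reach itself.
inline bool isNoRecurse(const CallGraph &callees, std::size_t f) {
  return !mayReach(callees, f, f);
}
```

With edges 0 -> 1, 1 -> 2, 2 -> 1 and an isolated function 3, functions 0 and 3 are non-recursive while 1 and 2 are not.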
Differential Revision: https://reviews.llvm.org/D110099
Johannes Doerfert [Sun, 30 Jan 2022 22:40:34 +0000 (16:40 -0600)]
[Attributor] Make interprocedural value explicit in genericValueTraversal
genericValueTraversal can look through arguments and allow value
simplification across function boundaries. In fact, the latter already
happened unchecked. With this change we allow the user of
genericValueTraversal to opt-out of interprocedural traversal if
required. We explicitly look through arguments now which helps to do
various things, incl. the propagation of constants into OpenMP parallel
regions (on the host).
Christian Sigg [Tue, 1 Feb 2022 06:15:35 +0000 (07:15 +0100)]
[MLIR][arith] Mark addf/mulf as commutative
Following the discussion in D118318, mark `arith.addf/mulf` commutative.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D118600
Mogball [Tue, 1 Feb 2022 06:58:02 +0000 (06:58 +0000)]
[mlir][ods] Unify Attr/TypeDef and Operation Format Parsing
Part 2 of 3 of unifying the assembly formats of attributes/types and operations. The last patch that introduced attribute/type formats (D111594) factored out the format lexer entirely. This patch factors out most of the format parsers such that the attribute/type and op parsers only need to implement handling for specific elements.
Certain things could be factored better (element verification, 'seen' variables) but the primary goal of factoring is so that features can be used across both assembly formats.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D117971
Johannes Doerfert [Tue, 1 Feb 2022 02:23:18 +0000 (20:23 -0600)]
[Attributor][FIX] Liveness handling in the isAssumedDead helpers
This fixes a conceptual problem with our AAIsDead usage which conflated
call site liveness with call site return value liveness. Without the
fix tests would obviously miscompile as we make genericValueTraversal
more powerful (in a follow up). The effects on the tests are mixed but
mostly marginal. The most prominent one is the lack of `noreturn` for
functions. The reason is that we make entire blocks live at the same
time (for time reasons). Now that we actually look at the block
liveness, which we need to do, the return instructions are live and
will survive. As an example, `noreturn_async.ll` has been modified
to retain the `noreturn` even with block granularity. We could address
this easily but there is little need in practice.
Johannes Doerfert [Mon, 31 Jan 2022 13:55:11 +0000 (07:55 -0600)]
[Attributor] Use edge liveness rather than block liveness
We moved to the edge API a while back, but not all uses were adjusted.
Edge liveness is more precise.
Johannes Doerfert [Mon, 31 Jan 2022 13:53:31 +0000 (07:53 -0600)]
[Attributor][FIX] Address two oversights in AAIsDead
No tests as these were found browsing the code and I'm not sure how to
test them properly.
Johannes Doerfert [Mon, 31 Jan 2022 13:52:19 +0000 (07:52 -0600)]
[Attributor][NFCI] Improve debug diagnostic
Johannes Doerfert [Sun, 30 Jan 2022 21:51:14 +0000 (15:51 -0600)]
[Attributor] Provide convenient helpers for isAssumedRead{None,Only}
We have two attributes that can answer readnone queries. While there is
a dependence between them, it seems best to not force the users to know
which AA to ask. The helpers also make it easy to check for readonly.
Test changes show where we now deduce readnone but haven't before,
mostly because we only asked AAMemoryBehavior and not AAMemoryLocation.
AANoAlias has not been ported to the new API yet.
Johannes Doerfert [Sat, 17 Jul 2021 06:54:43 +0000 (01:54 -0500)]
[Attributor] Use CFG reasoning to filter potentially interfering writes
Since D104432 we can look through memory by analyzing all writes that
might interfere with a load. This patch provides some logic to exclude
writes that cannot interfere with a location, due to CFG reasoning.
We make sure to avoid multi-thread write-read situations properly while
we ignore writes that cannot reach a load or writes that will be
overwritten before the load is reached.
Differential Revision: https://reviews.llvm.org/D106397
Johannes Doerfert [Sun, 30 Jan 2022 21:21:27 +0000 (15:21 -0600)]
[Attributor][NFC] Make debug output more useful and concise
Johannes Doerfert [Wed, 26 Jan 2022 21:53:39 +0000 (15:53 -0600)]
[OpenMP][FIX] Explicit barriers in SPMD mode are not aligned
Due to num_threads (probably also other reasons) we cannot assume
explicit barriers are always executed by all threads in an aligned
fashion. We can optimize them if that property can be proven but
that is different.
Johannes Doerfert [Sun, 23 Jan 2022 20:06:22 +0000 (14:06 -0600)]
[Attributor][NFCI] Expose some nosync reasoning to outside users.
No-sync is a property that we need in more places as complex
transformations emerge. To simplify the query we provide an
`AA::isNoSyncInst` helper now and expose two existing helpers through
the `AANoSync` class.
Johannes Doerfert [Sun, 23 Jan 2022 20:08:06 +0000 (14:08 -0600)]
[Attributor][NFCI] Remove anonymous namespaces
The namespaces made it more complicated to implement static helpers,
among other things. We should not need them at all.
Johannes Doerfert [Sat, 22 Jan 2022 22:24:52 +0000 (16:24 -0600)]
[OpenMP] Eliminate redundant barriers in the same block
Patch originally by Giorgis Georgakoudis (@ggeorgakoudis), typos and
bugs introduced later by me.
This patch allows us to remove redundant barriers if they are part
of a "consecutive" pair of barriers in a basic block with no impacted
memory effect (read or write) in-between them. Memory accesses to
local (=thread private) or constant memory are allowed to appear.
Technically we could also allow any other memory that is not used to
share information between threads, e.g., the result of a malloc that
is also not captured. However, it will be easier to do more reasoning
once the code is put into an AA. That will also allow us to look through
phis/selects reasonably. At that point we should also deal with calls,
barriers in different blocks, and other complexities.
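A toy model of the in-block rule, as a sketch under stated assumptions: instructions are pre-classified by a hypothetical `InstKind`, and of two consecutive barriers the second one is dropped (which of the pair the pass actually removes is not stated here).

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of the instructions in one basic block.
enum class InstKind {
  Barrier,
  SharedAccess, // read or write of memory visible to other threads
  LocalAccess,  // access to thread-private or constant memory
  Other         // no memory effect
};

// Drops a barrier when the previous barrier in the block is separated from
// it only by instructions that do not touch memory shared between threads.
inline std::vector<InstKind>
removeRedundantBarriers(const std::vector<InstKind> &block) {
  std::vector<InstKind> result;
  bool barrierPending = false; // barrier seen, no shared access since
  for (InstKind kind : block) {
    if (kind == InstKind::Barrier) {
      if (barrierPending)
        continue; // second barrier of a "consecutive" pair: redundant
      barrierPending = true;
    } else if (kind == InstKind::SharedAccess) {
      barrierPending = false; // the next barrier is needed again
    }
    result.push_back(kind);
  }
  assert(result.size() <= block.size()); // instructions are only removed
  return result;
}
```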
Differential Revision: https://reviews.llvm.org/D118002
Johannes Doerfert [Fri, 21 Jan 2022 21:47:56 +0000 (15:47 -0600)]
[OpenMP] Ensure to remove noinline from all runtime functions eventually
We used to remove noinline from known OpenMP runtime functions (which
are declared in OMPKinds.td). Now we remove noinline from all functions
with the proper prefixes: __kmpc, _ZN4_OMP (= namespace _OMP), omp_
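A minimal sketch of such a prefix test on plain strings; `hasOpenMPRuntimePrefix` is a hypothetical helper and the real pass works on LLVM function objects rather than `std::string`.

```cpp
#include <cassert>
#include <string>

// Returns true if `name` starts with one of the three prefixes named above.
inline bool hasOpenMPRuntimePrefix(const std::string &name) {
  static const char *const prefixes[] = {"__kmpc", "_ZN4_OMP", "omp_"};
  for (const char *prefix : prefixes) {
    // rfind(prefix, 0) only matches at position 0, i.e. a prefix match.
    if (name.rfind(prefix, 0) == 0)
      return true;
  }
  return false;
}
```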
Amir Ayupov [Mon, 31 Jan 2022 06:02:51 +0000 (22:02 -0800)]
[BOLT][CMAKE] Add extra BOLT_INCLUDE_TESTS condition for merge-fdata emit-relocs option
Only enable --emit-relocs linker option for merge-fdata target if tests are enabled.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D118580
Siva Chandra Reddy [Mon, 31 Jan 2022 17:32:07 +0000 (17:32 +0000)]
[libc] Add implementations of POSIX mkdir, mkdirat, rmdir, unlink and unlinkat.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D118641
Jez Ng [Tue, 1 Feb 2022 04:45:19 +0000 (23:45 -0500)]
[lld-macho][test] Add test for UUID format
Reviewed By: keith
Differential Revision: https://reviews.llvm.org/D118646
Serguei Katkov [Thu, 27 Jan 2022 05:21:09 +0000 (12:21 +0700)]
[RS4GC] Make PointerToBase mapping be independent on call site. NFC.
PointerToBase is a mapping from a potentially derived pointer to its base.
Since we are in SSA form, if a derived pointer has a base that is available
at the def of the derived pointer, the same base is available at any
point where the derived pointer is live.
So the mapping from derived pointer to base pointer is not a property
of a call site; it is the same at function level.
Reviewers: reames, yrouban
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D118604
Joseph Huber [Tue, 1 Feb 2022 04:32:33 +0000 (23:32 -0500)]
[OpenMP] Remove new driver tests for AMDGPU
Some of the new driver tests are flaky on AMDGPU, remove for now.
Joseph Huber [Mon, 31 Jan 2022 19:31:54 +0000 (14:31 -0500)]
[Libomptarget] Run GPU offloading tests using the new driver
This patch adds a new target to the tests to run using the new driver as
the method for generating offloading code.
Depends on D116541
Differential Revision: https://reviews.llvm.org/D118637
Joseph Huber [Fri, 21 Jan 2022 20:43:20 +0000 (15:43 -0500)]
[PassBuilder] Add OpenMPOpt to default LTO pipeline
The LTO support for OpenMP offloading allows us to run the OpenMPOpt
pass during the LTO pipeline. This patch introduces an early run of the
Module pass and a late run of the CGSCC pass. These are quick no-ops if
there is no OpenMP in the module.
Depends on D118198
Differential Revision: https://reviews.llvm.org/D118611
Joseph Huber [Tue, 25 Jan 2022 22:46:01 +0000 (17:46 -0500)]
[OpenMP] Remove call to 'clang-offload-wrapper' binary
Summary:
This patch removes the system call to the `clang-offload-wrapper` tool
by replicating its functionality in a new file. This improves
performance and makes the future wrapping functionality easier to
change.
Differential Revision: https://reviews.llvm.org/D118198
Joseph Huber [Tue, 25 Jan 2022 19:25:39 +0000 (14:25 -0500)]
[OpenMP] Replace system call to `llc` with target machine
Summary:
This patch replaces the system call to the `llc` binary with a library
call to the target machine interface. This should be faster than
relying on an external system call to compile the final wrapper binary.
Differential Revision: https://reviews.llvm.org/D118197
Joseph Huber [Tue, 25 Jan 2022 16:23:27 +0000 (11:23 -0500)]
[OpenMP] Cleanup the Linker Wrapper
Summary:
Various changes and cleanup for the Linker Wrapper tool.
Joseph Huber [Tue, 18 Jan 2022 15:56:12 +0000 (10:56 -0500)]
[OpenMP] Include the executable name in the temporary files
Summary:
This parses the executable name out of the linker arguments so we can
use it to give more informative temporary file names and so we don't
accidentally use it for device linking.
Joseph Huber [Sun, 16 Jan 2022 21:06:59 +0000 (16:06 -0500)]
[OpenMP] Implement save temps functionality in linker wrapper
Summary:
This patch implements the `-save-temps` flag for the linker wrapper.
This allows the user to inspect the intermediate output that the
linker wrapper creates.
Joseph Huber [Sun, 16 Jan 2022 04:10:52 +0000 (23:10 -0500)]
[OpenMP] Embed bitcode after optimizations instead of linking
Summary:
Various changes to the linker wrapper, and the bitcode embedding is now
done after the optimizations have run rather than after linking is done.
This saves time when doing JIT.
Joseph Huber [Fri, 14 Jan 2022 03:59:05 +0000 (22:59 -0500)]
[OpenMP] Improve symbol resolution for OpenMP Offloading LTO
This patch improves the symbol resolution done for LTO with offloading
applications. The symbol resolution done here allows the LTO backend to
internalize more functions. The symbol resolution done is a simplified
view that does not take into account various options like `--wrap` or
`--dynamic-list` and always assumes we are creating a shared object.
The actual target may be an executable, but semantically it is used as a
shared object because certain objects need to be visible outside of the
executable when they are read by the OpenMP plugin.
Depends on D117246
Differential Revision: https://reviews.llvm.org/D118155
Joseph Huber [Thu, 13 Jan 2022 17:42:02 +0000 (12:42 -0500)]
[OpenMP] Add support for linking AMDGPU images
This patch adds support for linking AMDGPU images using the LLD binary.
AMDGPU files are always bitcode images and will always use the LTO
backend. Additionally we now pass the default architecture found with
the `amdgpu-arch` tool to the argument list.
Depends on D117156
Differential Revision: https://reviews.llvm.org/D117246
Joseph Huber [Wed, 12 Jan 2022 21:14:52 +0000 (16:14 -0500)]
[OpenMP] Add extra flag handling to linker wrapper
This patch adds support for a few extra flags in the linker wrapper,
such as debugging flags, verbose output, and passing arguments to ptxas. We also
now forward pass remarks to the LLVM backend so they will show up in the LTO
passes.
Depends on D117049
Differential Revision: https://reviews.llvm.org/D117156
Joseph Huber [Tue, 11 Jan 2022 20:50:39 +0000 (15:50 -0500)]
[OpenMP] Add support for embedding bitcode images in wrapper tool
Summary:
This patch adds support for embedding device images in the linker
wrapper tool. This will be used for performing JIT functionality in the
future.
Depends on D117048
Differential Revision: https://reviews.llvm.org/D117049
Joseph Huber [Tue, 11 Jan 2022 15:53:59 +0000 (10:53 -0500)]
[OpenMP] Link the bitcode library late for device LTO
Summary:
This patch adds support for linking the OpenMP device bitcode library
late when doing LTO. This simply passes it in as an additional device
file when doing the final device linking phase with LTO. This has the
advantage that we don't link it multiple times, and the device
references do not get inlined and prevent us from doing needed OpenMP
optimizations when we have visibility of the whole module.
Fix some failings where the implicit conversion of an Error to an
Expected triggered the deleted copy constructor.
Depends on D116675
Differential revision: https://reviews.llvm.org/D117048
Joseph Huber [Fri, 7 Jan 2022 22:12:51 +0000 (17:12 -0500)]
[OpenMP] Initial Implementation of LTO and bitcode linking in linker wrapper
This patch implements the first support for handling LTO in the
offloading pipeline. The flag `-foffload-lto` is used to control if
bitcode is embedded into the device. If bitcode is found in the device,
the extracted files will be sent to the LTO pipeline to be linked and
sent to the backend. This implementation does not separately link the
device bitcode libraries yet.
Depends on D116675
Differential Revision: https://reviews.llvm.org/D116975
Joseph Huber [Wed, 5 Jan 2022 18:21:03 +0000 (13:21 -0500)]
[OpenMP] Search for static libraries in offload linker tool
This patch adds support for searching through the linker library paths
to identify static libraries that may contain device code. If device
code is present it will be extracted. This should ideally fully support
static linking with OpenMP offloading.
Depends on D116627
Differential Revision: https://reviews.llvm.org/D116675
Joseph Huber [Tue, 4 Jan 2022 22:20:04 +0000 (17:20 -0500)]
[Clang] Initial support for linking offloading code in tool
This patch adds the initial support for linking NVPTX offloading code
using the clang-linker-wrapper tool. This uses the extracted device
files and runs `nvlink` on them. Currently this is then passed to the
existing toolchain for creating linkable OpenMP offloading programs
using `clang-offload-wrapper` and compiling it manually using `llc`.
More work is required to support LTO, Bitcode linking, AMDGPU, and x86
offloading.
Depends on D116545
Differential Revision: https://reviews.llvm.org/D116627
Joseph Huber [Mon, 3 Jan 2022 17:31:52 +0000 (12:31 -0500)]
[OpenMP] Add support for extracting device code in linker wrapper
This patch adds support for extracting device offloading code from the
linker's input files. If the file contains a section with the name
`.llvm.offloading.<triple>.<arch>` it will be extracted to a new
temporary file to be linked. Additionally, the host file containing it
will have the section stripped so it does not remain in the executable
once linked.
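How such a section name could be taken apart, as a sketch only: it assumes the `<arch>` part contains no '.', so the name is split at the last dot. `OffloadSection` and `parseOffloadSectionName` are hypothetical names, not the wrapper's API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Result of parsing a `.llvm.offloading.<triple>.<arch>` section name.
struct OffloadSection {
  bool valid;
  std::string triple;
  std::string arch;
};

// Assumption: <arch> contains no '.', so the last dot separates the two parts.
inline OffloadSection parseOffloadSectionName(const std::string &name) {
  static const std::string prefix = ".llvm.offloading.";
  OffloadSection result{false, "", ""};
  if (name.compare(0, prefix.size(), prefix) != 0)
    return result; // not an offloading section
  std::string rest = name.substr(prefix.size());
  std::size_t dot = rest.rfind('.');
  if (dot == std::string::npos || dot == 0 || dot + 1 == rest.size())
    return result; // missing or empty triple/arch
  result.valid = true;
  result.triple = rest.substr(0, dot);
  result.arch = rest.substr(dot + 1);
  return result;
}
```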
Depends on D116544
Differential Revision: https://reviews.llvm.org/D116545
Sam Clegg [Mon, 12 Oct 2020 13:59:51 +0000 (06:59 -0700)]
llvm-readobj: support globals in initializer expressions
Differential Revision: https://reviews.llvm.org/D117747
River Riddle [Fri, 21 Jan 2022 08:38:30 +0000 (00:38 -0800)]
[mlir] Add isa/dyn_cast support for dialect interfaces
This matches the same API usage as attributes/ops/types. For example:
```c++
Dialect *dialect = ...;
// Instead of this:
if (auto *interface = dialect->getRegisteredInterface<DialectInlinerInterface>())
// You can do this:
if (auto *interface = dyn_cast<DialectInlinerInterface>(dialect))
```
Differential Revision: https://reviews.llvm.org/D117859
Fangrui Song [Tue, 1 Feb 2022 03:16:11 +0000 (19:16 -0800)]
[AArch64] Temporarily use getPointerElementType to fix -Wdeprecated-declarations. NFC
Tanya Lattner [Tue, 1 Feb 2022 03:03:29 +0000 (19:03 -0800)]
Add status of migration.
Mircea Trofin [Tue, 1 Feb 2022 02:59:47 +0000 (18:59 -0800)]
[nfc][mlgo][regalloc] 'hasPreferredPhys' out of feature components
It isn't cacheable: it can be updated by events other than live interval
resizing.
Geoffrey Martin-Noble [Tue, 1 Feb 2022 01:50:59 +0000 (17:50 -0800)]
[Bazel] Don't fail the build on usage of deprecated APIs
Build failures are not a particularly helpful way to enforce not using
deprecated APIs and that isn't the point of the Bazel build.
At the same time, this removes `-Wno-unused`, since this is a check that we do
enforce in the Google internal build and so are ok maintaining in our
maintenance of the upstream Bazel build (the comment about not wanting
to do so was from a time when this was in a separate repository and I was
the only one maintaining it).
Differential Revision: https://reviews.llvm.org/D118671
Changpeng Fang [Tue, 1 Feb 2022 02:07:47 +0000 (18:07 -0800)]
AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args
Summary:
Add code object v5 support (default is still v4)
Generate metadata for implicit kernel args for the new ABI
Set the metadata version to be 1.2
Reviewers:
t-tye, b-sumner, arsenm, and bcahoon
Fixes:
SWDEV-307188, SWDEV-307189
Differential Revision:
https://reviews.llvm.org/D118272
Chris Bieneman [Tue, 1 Feb 2022 01:44:37 +0000 (19:44 -0600)]
Fix memory leak I introduced in
2d66ed370a40
This should fix the asan issue identified on the Linux asan bot.
David Blaikie [Tue, 1 Feb 2022 01:32:31 +0000 (17:32 -0800)]
Disable -Wmissing-prototypes for internal linkage functions that aren't explicitly marked "static"
Some functions can end up non-externally visible despite not being
declared "static" or in an unnamed namespace in C++ - such as by having
parameters that are of non-external types.
Such functions aren't mistakenly intended to be defining some function
that needs a declaration. They could be maybe more legible (except for
the `operator new` example) with an explicit static, but that's a
stylistic thing outside what should be addressed by a warning.
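A hypothetical example of the kind of function described above (the names `Hidden` and `twice` are made up): it is neither `static` nor in an unnamed namespace, yet it cannot be used from another translation unit because its parameter type can only be named in this one.

```cpp
#include <cassert>

namespace {
// A type that cannot be named from another translation unit.
struct Hidden {
  int value;
};
} // namespace

// Not marked `static` and not inside the unnamed namespace. No other
// translation unit can declare or call it, because `Hidden` is only
// nameable here, so a separate prototype would serve no purpose.
int twice(Hidden h) { return 2 * h.value; }
```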
Jonas Devlieghere [Tue, 1 Feb 2022 00:51:07 +0000 (16:51 -0800)]
[lldb] Use the build's python interpreter in the shell tests
Make sure that the shell tests use the same python interpreter as the
rest of the build instead of picking up `python` from the PATH.
It would be nice if we could use the _disallow helper, but that triggers
on invocations that specify python as the scripting language.
Fangrui Song [Tue, 1 Feb 2022 00:46:11 +0000 (16:46 -0800)]
[BitcodeWriter] Fix cases of some functions
`WriteIndexToFile` is used by external projects so I do not touch it.
Fangrui Song [Tue, 1 Feb 2022 00:33:56 +0000 (16:33 -0800)]
[ModuleUtils] Move EmbedBufferInModule to LLVMTransformsUtils
D116542 adds EmbedBufferInModule which introduces a layer violation
(https://llvm.org/docs/CodingStandards.html#library-layering).
See 2d5f857a1eaf5f7a806d12953c79b96ed8952da8 for detail.
EmbedBufferInModule does not use BitcodeWriter functionality and should be
moved to LLVMTransformsUtils. While here, change the function case to the
prevailing convention.
It seems that EmbedBufferInModule just follows the steps of
EmbedBitcodeInModule. EmbedBitcodeInModule calls WriteBitcodeToFile but has IR
update operations which ideally should be refactored to another library.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D118666
Joseph Huber [Mon, 31 Jan 2022 23:58:35 +0000 (18:58 -0500)]
[LLVM] Resolve layer violation in BitcodeWriter
Summary:
The changes introduced in D116542 added a dependency on TransformUtils
to use the `appendToCompilerUsed` method. This created a circular
dependency. This patch simply copies the needed function locally to
remove the dependency.
Keith Smiley [Sat, 29 Jan 2022 04:06:51 +0000 (20:06 -0800)]
[llvm-objcopy][MachO] Ignore LC_LINKER_OPTION when redefining symbols
Previously, redefining the symbols of a binary that contained an
LC_LINKER_OPTION load command failed with this error:
```
error: unsupported load command (cmd=0x2d)
```
This command does not need to be changed when redefining symbols, so we
can ignore it like many others.
Differential Revision: https://reviews.llvm.org/D118526
Fangrui Song [Mon, 31 Jan 2022 23:41:45 +0000 (15:41 -0800)]
[Bazel] Add include/llvm/Transforms/Utils/ModuleUtils.h to work around layer violation after D116542
There is a layer violation that can break clang -fmodule-name=X -fmodules-strict-decluse builds:
* LLVMTransformUtils has `#include "llvm/Bitcode/BitcodeWriterPass.h"`
* LLVMBitWriter depends on LLVMTransformUtils after D116542
Temporarily work around the issue.
Michael Kruse [Mon, 31 Jan 2022 15:49:44 +0000 (09:49 -0600)]
[Clang][OpenMPIRBuilder] Fix off-by-one error when dividing by stepsize.
When the stepsize does not evenly divide the range's end, round up to ensure that the last multiple of the stepsize before the upper bound is reached. For instance, the trip count of
for (int i = 0; i < 7; i+=5)
is two (i=0 and i=5), not (7-0)/5 == 1.
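The rounding can be sketched as a ceiling division. This is an illustration only, assuming a positive step and a `<` comparison; `tripCount` is a hypothetical helper and not the OpenMPIRBuilder code:

```cpp
#include <cstdint>

// Trip count of:  for (i = Start; i < Stop; i += Step)
// Assumes Step > 0; returns 0 for an empty or invalid loop.
inline std::uint64_t tripCount(std::int64_t Start, std::int64_t Stop,
                               std::int64_t Step) {
  if (Step <= 0 || Start >= Stop)
    return 0;
  // Unsigned subtraction avoids signed overflow for wide ranges.
  std::uint64_t Range =
      static_cast<std::uint64_t>(Stop) - static_cast<std::uint64_t>(Start);
  std::uint64_t S = static_cast<std::uint64_t>(Step);
  // Round up: a partial final step still executes one iteration.
  return Range / S + (Range % S != 0 ? 1 : 0);
}
```

With these assumptions, `tripCount(0, 7, 5)` gives 2, whereas plain truncating division `(7 - 0) / 5` gives 1.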
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D118542
Peter Klausler [Wed, 26 Jan 2022 17:53:12 +0000 (09:53 -0800)]
[flang] Make NEWUNIT= use a range suitable for INTEGER(KIND=1) and recycle unit numbers
Use a bit-set to manage runtime-generated I/O unit numbers, recycle
them after they're closed, and use a range of values that fits in
a minimal-sized integer.
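A rough sketch of the idea, not the flang runtime's actual data structure: the class name `UnitPool`, the pool size of 64, and the mapping of bit index `i` to unit number `-2 - i` are all assumptions chosen so that every generated value is negative, is never -1, and fits in an 8-bit signed integer.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical pool of runtime-generated unit numbers.
class UnitPool {
public:
  static constexpr std::size_t Capacity = 64; // assumed pool size

  // Returns the first free unit number, or 0 if the pool is exhausted
  // (0 is never a generated unit number in this sketch).
  int acquire() {
    for (std::size_t I = 0; I < Capacity; ++I) {
      if (!InUse.test(I)) {
        InUse.set(I);
        return -2 - static_cast<int>(I);
      }
    }
    return 0;
  }

  // Marks a unit number as free again so that acquire() can reuse it.
  // Returns false if the number was not handed out by this pool.
  bool release(int Unit) {
    int Index = -2 - Unit;
    if (Index < 0 || Index >= static_cast<int>(Capacity) ||
        !InUse.test(static_cast<std::size_t>(Index)))
      return false;
    InUse.reset(static_cast<std::size_t>(Index));
    return true;
  }

private:
  std::bitset<Capacity> InUse;
};
```

Because a released bit is the first one found by the next scan, closing a unit and opening another reuses the same number instead of drifting toward ever more negative values.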
Differential Revision: https://reviews.llvm.org/D118651
Mircea Trofin [Mon, 31 Jan 2022 22:43:03 +0000 (14:43 -0800)]
[mlgo][regalloc] Factor live interval feature calculation
Factoring it out so we can subsequently cache it. This should be an NFC,
however, for the float quantities, we see small errors in the least
significant digits. This is because, before, we were summing up one by
one. Now, we sum up results of sums.
This shouldn't matter for ML, and will require rework when we do
quantization (avoiding floats altogether), but meanwhile, it did require
an update to the reference file used for testing.
The patch also bumps the precision of the variables involved in this, to
reduce the error (note they are cast back to float at the end by the
SET macro, since we only work with float and not double in TF).
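The effect of the summation order can be reproduced in isolation. The sketch below is unrelated to the regalloc feature code; the function names are hypothetical, and it assumes IEEE 754 single and double precision without fast-math reassociation.

```cpp
#include <cstddef>

// Adds every element to a single running float accumulator.
inline float sumOneByOne(const float *V, std::size_t N) {
  float Acc = 0.0f;
  for (std::size_t I = 0; I < N; ++I)
    Acc += V[I];
  return Acc;
}

// Sums [0, Split) and [Split, N) separately, then adds the two results.
inline float sumOfSums(const float *V, std::size_t N, std::size_t Split) {
  return sumOneByOne(V, Split) + sumOneByOne(V + Split, N - Split);
}

// Accumulates in double and converts back to float only at the end.
inline float sumInDouble(const float *V, std::size_t N) {
  double Acc = 0.0;
  for (std::size_t I = 0; I < N; ++I)
    Acc += V[I];
  return static_cast<float>(Acc);
}
```

For the input `{1e8f, 1, 1, 1, 1, 1, 1, 1, 1}`, each `+ 1` in `sumOneByOne` is lost because neighboring floats near 1e8 are 8 apart, so the result stays at 1e8. `sumOfSums` with `Split = 1` first adds the eight ones to get 8 and then reaches 100000008, as does `sumInDouble`.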
Differential Revision: https://reviews.llvm.org/D118659
Snehasish Kumar [Mon, 31 Jan 2022 22:15:36 +0000 (14:15 -0800)]
[instrprof][NFC] Refactor out the common logic for getProfileKind.
The logic for getProfileKind for RawInstrProfReader and
InstrProfReaderIndex is similar. To avoid duplication, move the logic
from the header to InstrProfReader.cpp and introduce a static method
which implements the common code.
Differential Revision: https://reviews.llvm.org/D118656
Snehasish Kumar [Wed, 29 Dec 2021 23:31:11 +0000 (15:31 -0800)]
[memprof] Move the meminfo block struct to MemProfData.inc.
The definition of the MemInfoBlock is shared between the memprof
compiler-rt runtime and llvm/lib/ProfileData/. This change removes the
memprof_meminfoblock header and moves the struct to the shared include
file. To enable this sharing, the Print method is moved to the
memprof_allocator (the only place it is used) and the remaining uses are
updated to refer to the MemInfoBlock defined in the MemProfData.inc
file.
Also a couple of other minor changes which improve usability of the
types in MemProfData.inc.
* Update the PACKED macro to handle commas.
* Add constructors and equality operators.
* Don't initialize the buildid field.
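To illustrate why a packing macro needs to handle commas: a macro with a single named parameter splits its argument at every top-level comma, and braces do not protect commas from the preprocessor. A variadic macro avoids this. The sketch below uses a hypothetical macro name, `SKETCH_PACKED`, and assumes a GCC- or Clang-compatible compiler; it does not reproduce the contents of MemProfData.inc.

```cpp
#include <cstdint>

// Variadic so that commas inside the struct body are passed through.
// With a one-parameter form such as  #define P(X) X __attribute__(...)
// the declaration below would be parsed as two macro arguments.
#define SKETCH_PACKED(...) __VA_ARGS__ __attribute__((__packed__))

SKETCH_PACKED(struct Record {
  char Tag;
  std::uint32_t First, Second; // comma inside the macro argument
});
```

Under these assumptions `sizeof(Record)` is 9, since no padding is inserted after `Tag`.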
Differential Revision: https://reviews.llvm.org/D116780
Peter Klausler [Thu, 20 Jan 2022 22:09:05 +0000 (14:09 -0800)]
[flang] runtime perf: larger I/O buffer growth increments
When reallocating an I/O buffer to accommodate a large record,
ensure that the amount of growth is at least as large as the
minimum initial record size (64KiB). The previous policy was
causing input buffer reallocation for each byte after the minimum
buffer size when scanning input data for record termination
newlines.
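The growth policy can be sketched as follows. This is an assumption-laden illustration, not the flang runtime code: `grownCapacity` and `MinBuffer` are hypothetical names, and only the 64KiB figure comes from the message above.

```cpp
#include <algorithm>
#include <cstddef>

constexpr std::size_t MinBuffer = 64 * 1024; // 64KiB, per the message

// Returns the capacity to reallocate to when the buffer must hold at
// least Needed bytes. Growth is never smaller than MinBuffer, so a scan
// that extends the record one byte at a time reallocates once per
// 64KiB instead of once per byte.
inline std::size_t grownCapacity(std::size_t Current, std::size_t Needed) {
  if (Needed <= Current)
    return Current; // already large enough
  return std::max(Needed, Current + MinBuffer);
}
```

For example, a 64KiB buffer that needs one more byte grows to 128KiB, while a request well beyond that is satisfied exactly.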
Differential Revision: https://reviews.llvm.org/D118649
Dávid Bolvanský [Mon, 31 Jan 2022 22:45:56 +0000 (23:45 +0100)]
[Clang][NFC] Added testcase from #49549
The issue is fixed in trunk, so add testcase to avoid regression in the future.
Konstantin Varlamov [Mon, 31 Jan 2022 22:44:53 +0000 (14:44 -0800)]
[libc++][ranges][NFC] Fix formatting on newly-added links on the Ranges status page.
Sam Clegg [Sun, 30 Jan 2022 03:09:06 +0000 (19:09 -0800)]
[clang][WebAssembly] Imply -fno-threadsafe-static when threading is disabled
When we don't enable atomics, threading is completely disabled, in
which case there is no point in generating thread-safe code for
static initialization.
This should always be safe because, in WebAssembly, it is not
possible to link objects compiled without the atomics feature into a
multi-threaded program.
See https://github.com/emscripten-core/emscripten/pull/16152
Differential Revision: https://reviews.llvm.org/D118571
Chris Bieneman [Mon, 31 Jan 2022 21:44:55 +0000 (15:44 -0600)]
[NFC] Skip PassBuilderCTests if no default triple
This fixes the unit tests so that they are skipped if there is no default
target triple set. An unset default target triple is a supported build
configuration for LLVM.
Mircea Trofin [Mon, 31 Jan 2022 22:01:43 +0000 (14:01 -0800)]
[NFC][regalloc] Move evict advisor initialization before VRAI
This is because a subsequent patch will propose obtaining the VRAI from
the advisor, which will enable feature caching for the ML advisor, for
better compile time. Making this change first as it's both innocuous and
keeps the future patch to be reviewed small.