Hongtao Yu [Fri, 22 Jan 2021 23:52:46 +0000 (15:52 -0800)]
[CSSPGO] Passing the clang driver switch -fpseudo-probe-for-profiling to the linker.
As titled.
Reviewed By: wmi, wenlei
Differential Revision: https://reviews.llvm.org/D95271
Jonas Devlieghere [Tue, 2 Feb 2021 17:40:08 +0000 (09:40 -0800)]
[debugserver] Fix -Winconsistent-missing-override warnings on arm64
Fangrui Song [Tue, 2 Feb 2021 17:41:05 +0000 (09:41 -0800)]
[MC] Upgrade DWARF version to 5 upon .file 0
Without `-dwarf-version`, llvm-mc uses the default `MCContext::DwarfVersion` 4.
Without `-gdwarf-N`, Clang cc1as uses `clang::driver::ToolChain::GetDefaultDwarfVersion`
which is 4 on many toolchains. Note: `clang -c` can synthesize .debug_info without -g.
There is currently a MCParser warning upon `.file 0` and MCParser errors upon
`.loc 0` if the DWARF version is less than 5. This causes friction to the
following usage:
```
clang -S -g -gdwarf-5 a.c
// MC warning due to .file 0, MC error due to .loc 0
clang -c a.s
llvm-mc -filetype=obj a.s
```
My idea is that we can just upgrade `MCContext::DwarfVersion` to 5 upon
`.file 0` to make the above commands work.
The downside is that for an explicit version `clang -c -gdwarf-4 a.s`, it can be
argued that the new behavior drops the probably intended diagnostic. I think the
downside is small because in most cases DWARF version for an assembly action
should either match the original compile action or be omitted.
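The upgrade rule described above can be sketched as follows. This is a toy model, not the MC implementation: `ToyContext` and `handleFileDirective` are hypothetical names, and only the "bump to 5 on `.file 0`" behavior is taken from the description.

```cpp
#include <cstdint>

// Toy stand-in for the assembler context; not the real MCContext API.
struct ToyContext {
  // Default used when no explicit DWARF version option is given.
  uint16_t DwarfVersion = 4;
};

// Hypothetical handler for a `.file <N>` directive. File number 0 only makes
// sense in DWARF v5, so seeing it upgrades the version instead of diagnosing.
inline void handleFileDirective(ToyContext &Ctx, unsigned FileNumber) {
  if (FileNumber == 0 && Ctx.DwarfVersion < 5)
    Ctx.DwarfVersion = 5;
}
```

A non-zero file number leaves the version alone, so assembly that never uses `.file 0` keeps the default.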
Ongoing discussion about taking a similar action for GNU as: https://sourceware.org/pipermail/binutils/2021-January/114980.html
Differential Revision: https://reviews.llvm.org/D94882
Florian Hahn [Tue, 2 Feb 2021 16:59:18 +0000 (16:59 +0000)]
[ConstraintElimination] Add test with pointer bitcast.
Fangrui Song [Tue, 2 Feb 2021 17:35:27 +0000 (09:35 -0800)]
[test] Add basic _Unwind_ForcedUnwind + exception tests
Forced unwinding is like a foreign exception, which can be caught by `catch (...)` and rethrown.
If not rethrown, `__cxa_end_catch` will call `_Unwind_DeleteException` to destroy the object.
The behavior going through empty `throw()` and non-empty `throw(int)` is not
clear (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98785), so I do not add such
tests.
Differential Revision: https://reviews.llvm.org/D95200
Fangrui Song [Tue, 2 Feb 2021 17:34:08 +0000 (09:34 -0800)]
[MC] Support SHF_GNU_RETAIN as section flag 'R'
On Linux target triples, GNU as sets EI_OSABI to ELFOSABI_GNU when SHF_GNU_RETAIN is used.
On `*-*-freebsd`, it usually sets EI_OSABI to ELFOSABI_FREEBSD.
GNU ld respects SHF_GNU_RETAIN only for ELFOSABI_FREEBSD/ELFOSABI_GNU.
https://sourceware.org/bugzilla/show_bug.cgi?id=27282
MC doesn't set ELFOSABI_GNU for SHF_GNU_RETAIN/STB_GNU_UNIQUE/STT_GNU_IFUNC.
MC assembled object files do not have special semantics in GNU ld.
Reviewed By: psmith
Differential Revision: https://reviews.llvm.org/D95730
Fangrui Song [Tue, 2 Feb 2021 17:19:53 +0000 (09:19 -0800)]
[yaml2obj/obj2yaml/llvm-readobj] Support SHF_GNU_RETAIN
In binutils, the flag is defined for ELFOSABI_GNU and ELFOSABI_FREEBSD.
It can be used to mark a section as a GC root.
In practice, the flag has generic semantics and can be applied to many
EI_OSABI values, so we consider it generic.
Differential Revision: https://reviews.llvm.org/D95728
Sanjay Patel [Tue, 2 Feb 2021 16:02:07 +0000 (11:02 -0500)]
[ExpandReductions] add test for fmin with FMF; NFC
Jeroen Dobbelaere [Tue, 2 Feb 2021 16:55:06 +0000 (17:55 +0100)]
[InlineFunction] Only update noalias scopes once for an instruction.
Inlining sometimes maps different instructions to be inlined onto the same instruction.
We must make sure to remap the noalias scopes only once; otherwise the scope might disappear (at best).
This patch ensures that we only replace scopes for which the mapping is known.
This approach is preferred over tracking which instructions we already handled in a SmallPtrSet,
as that one will need more memory.
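A minimal sketch of the "only replace scopes for which the mapping is known" idea, assuming scopes can be modeled as strings and the clone map as a hash map. `Scope` and `remapKnownScopes` are illustrative names, not the InlineFunction code.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

using Scope = std::string; // stand-in for a noalias scope metadata node

// Replace only scopes that have a known clone and keep everything else as is.
// Already-remapped scopes are not keys of the map, so visiting the same
// instruction a second time leaves its scope list unchanged.
inline std::vector<Scope>
remapKnownScopes(const std::vector<Scope> &Scopes,
                 const std::unordered_map<Scope, Scope> &ClonedScopes) {
  std::vector<Scope> Result;
  Result.reserve(Scopes.size());
  for (const Scope &S : Scopes) {
    auto It = ClonedScopes.find(S);
    Result.push_back(It != ClonedScopes.end() ? It->second : S);
  }
  return Result;
}
```

Because the operation is idempotent, no per-instruction "already handled" set is needed.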
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D95862
David Green [Tue, 2 Feb 2021 16:55:31 +0000 (16:55 +0000)]
[ARM] Correct some tablegen operand types. NFC
Peyton, Jonathan L [Tue, 2 Feb 2021 16:38:33 +0000 (10:38 -0600)]
[OpenMP] Fix sign comparison warnings from GCC
The new affinity patch introduced legitimate sign-compare warnings that
clang doesn't report but GCC-10 does. This removes the warnings by
changing the types of two variables to unsigned.
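For illustration only, the kind of comparison that triggers GCC's `-Wsign-compare` and the unsigned fix look roughly like this. `countMatches` is a made-up function, not the affinity code touched by the patch.

```cpp
#include <cstddef>
#include <vector>

// Counts the entries equal to Id. The loop index is unsigned (std::size_t),
// so `I < Ids.size()` compares two unsigned values. Declaring it as `int I`
// instead would mix signed and unsigned operands in that comparison.
inline std::size_t countMatches(const std::vector<unsigned> &Ids, unsigned Id) {
  std::size_t Count = 0;
  for (std::size_t I = 0; I < Ids.size(); ++I)
    if (Ids[I] == Id)
      ++Count;
  return Count;
}
```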
Differential Revision: https://reviews.llvm.org/D95818
Eric Schweitz [Mon, 1 Feb 2021 23:18:37 +0000 (15:18 -0800)]
[flang][NFC] Update #include and comment.
Differential Revision: https://reviews.llvm.org/D95828
Florian Hahn [Tue, 2 Feb 2021 15:27:58 +0000 (15:27 +0000)]
[ConstraintElimination] Add nicer way to dump constraints (NFC).
Use ConstraintSystem::dump(Names) to display the result of decomposing a
condition.
David Green [Tue, 2 Feb 2021 16:35:47 +0000 (16:35 +0000)]
[ARM] Mark MVE_VMOV_to_lane_32 as isInsertSubregLike
This allows the peephole optimizer to know that a MVE_VMOV_to_lane_32 is
the same as an insert subreg, allowing it to optimize some redundant
lane moves.
Differential Revision: https://reviews.llvm.org/D95433
Anastasia Stulova [Tue, 2 Feb 2021 16:15:28 +0000 (16:15 +0000)]
Fixed failing OpenCL test
Sebastian Neubauer [Tue, 2 Feb 2021 16:08:57 +0000 (17:08 +0100)]
[AMDGPU] Remove unused tmp register
The temporary register is only used to compute the frame pointer.
The frame pointer is overwritten and not used in between, so we
can reuse the frame pointer for the computation, saving one register.
Differential Revision: https://reviews.llvm.org/D95865
Sebastian Neubauer [Mon, 1 Feb 2021 15:38:50 +0000 (16:38 +0100)]
[AMDGPU] Save fp/bp after csr saves
Saving callee-save registers happens in whole wave mode. Exec is saved
to a free register, which can be reused to save the frame pointer.
Therefore, saving the fp needs to happen after saving csrs.
Differential Revision: https://reviews.llvm.org/D95861
Lei Zhang [Tue, 2 Feb 2021 16:13:39 +0000 (11:13 -0500)]
Revert "[mlir] Fix scf.for single iteration canonicalization check"
This reverts commit
b2b35697dc5172ab1e815e08c0a2714f2a1a9330.
It was accidentally landed before LGTM.
Lei Zhang [Tue, 2 Feb 2021 16:08:39 +0000 (11:08 -0500)]
[mlir][spirv] Define spv.VectorShuffle
This patch adds basic op definition, parser/printer, and verifier.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D95825
Lei Zhang [Tue, 2 Feb 2021 13:30:10 +0000 (08:30 -0500)]
[mlir] Fix scf.for single iteration canonicalization check
We should check whether lb + step >= ub to determine
whether this is a single iteration. Previously we were
checking lb + lb >= ub.
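The corrected check can be written out for constant bounds as below. This is a sketch with a hypothetical helper name; the `Lb < Ub` guard is an added assumption to exclude empty loops, and overflow of `Lb + Step` is ignored.

```cpp
#include <cstdint>

// For `for (i = Lb; i < Ub; i += Step)` with Step > 0, the body runs exactly
// once when the loop is entered (Lb < Ub) and the second induction value is
// already out of range (Lb + Step >= Ub).
inline bool isSingleIteration(int64_t Lb, int64_t Ub, int64_t Step) {
  return Lb < Ub && Lb + Step >= Ub;
}
```

With lb = 6, ub = 10, step = 1 the loop runs four times; the old `lb + lb >= ub` test (12 >= 10) would have wrongly treated it as a single iteration.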
Differential Revision: https://reviews.llvm.org/D95440
Zarko Todorovski [Tue, 2 Feb 2021 15:56:15 +0000 (10:56 -0500)]
[AIX] Improve option processing for mabi=vec-extabi and mabi=vec-default
Opening this revision to better address comments by @hubert.reinterpretcast in https://reviews.llvm.org/rGcaaaebcde462
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D95702
Wenlei He [Wed, 20 Jan 2021 07:29:14 +0000 (23:29 -0800)]
[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline
Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path.
This is a resubmit of D95024, with the build break and the overly tight assertion fixed.
Test Plan:
Joseph Huber [Tue, 2 Feb 2021 15:48:36 +0000 (10:48 -0500)]
[OpenMP][NFC] Adding FAQ Entry for errors with static libraries
Stefan Pintilie [Tue, 2 Feb 2021 12:07:50 +0000 (06:07 -0600)]
[PowerPC] Materialize 34 bit constants with pli on Power 10.
NOTE: This patch was originally written by Anil Mahmud. His code has been
rebased but otherwise left mostly unchanged.
A new instruction on Power 10 allows for the materialization of 34 bit
immediate values. This patch allows the compiler to take advantage of
the new instruction in this situation.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D92879
David Green [Tue, 2 Feb 2021 15:15:04 +0000 (15:15 +0000)]
[ARM] Add MVE insert-of-extract pattern
A v4i32 insert of an extract can become a simple lane move, as opposed
to round-tripping via a GPR. This adds a pattern that turns a v4i32
insert-extract pair into an EXTRACT_SUBREG/INSERT_SUBREG, with the
required COPY_TO_REGCLASS. These get better optimized into a simple lane
move by the rest of the backend.
Differential Revision: https://reviews.llvm.org/D95428
Stephen Kelly [Tue, 2 Feb 2021 15:11:40 +0000 (15:11 +0000)]
Ensure that the matcher is instantiated
Fix issue diagnosed by Windows linker.
Anastasia Stulova [Tue, 2 Feb 2021 13:00:09 +0000 (13:00 +0000)]
[OpenCL] Add diagnostics for references to functions
Restrict use of references to functions as they can
result in non-conforming behavior.
Tags: #clang
Differential Revision: https://reviews.llvm.org/D95442
Melanie Blower [Tue, 2 Feb 2021 15:06:25 +0000 (07:06 -0800)]
[clang][PATCH][NFC] Correct test case related to review D95482
Lei Zhang [Tue, 2 Feb 2021 15:03:29 +0000 (10:03 -0500)]
[mlir] Put template specialization in the same namespace
This should address GCC 5 failure due to specialization of
runStrategy in different namespace.
Roman Lebedev [Tue, 2 Feb 2021 13:59:47 +0000 (16:59 +0300)]
[InstCombine] Hoist inversion out of ashr's value operand (PR48995)
This is yet another hint that we will eventually need InstCombineInverter,
which would consistently sink inversions, but for that we'll need
to consistently hoist inversions where possible, so let's do that here.
Example of a proof: https://alive2.llvm.org/ce/z/78SbDq
See https://bugs.llvm.org/show_bug.cgi?id=48995
Roman Lebedev [Tue, 2 Feb 2021 13:54:55 +0000 (16:54 +0300)]
[NFC][InstCombine] Add tests for (~x) a>> y --> ~(x a>> y) fold (PR48995)
See https://bugs.llvm.org/show_bug.cgi?id=48995
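The identity behind the `(~x) a>> y --> ~(x a>> y)` fold can be checked exhaustively for 8-bit values. This is a standalone sanity check, not part of the InstCombine tests; it assumes `>>` on a negative signed value is an arithmetic shift, which C++20 guarantees and mainstream compilers also implement for earlier standards.

```cpp
#include <cstdint>

// Returns true if (~x) a>> y == ~(x a>> y) holds for every 8-bit x and every
// shift amount y in 0..7.
inline bool inversionHoistHolds() {
  for (int X = -128; X <= 127; ++X)
    for (int Y = 0; Y < 8; ++Y) {
      int8_t Lhs = static_cast<int8_t>(static_cast<int8_t>(~X) >> Y);
      int8_t Rhs = static_cast<int8_t>(~(static_cast<int8_t>(X) >> Y));
      if (Lhs != Rhs)
        return false;
    }
  return true;
}
```

The identity holds because inverting flips the sign bit along with every other bit, so the bits shifted in on one side are exactly the inverse of those shifted in on the other.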
Ben Shi [Tue, 2 Feb 2021 14:45:52 +0000 (22:45 +0800)]
[AVR][clang] Fix a bug in AVR toolchain search paths
Reviewed By: dylanmckay, MaskRay
Differential Revision: https://reviews.llvm.org/D95529
Sam McCall [Tue, 2 Feb 2021 14:20:18 +0000 (15:20 +0100)]
[clangd] Fix race in Global CDB shutdown
I believe the atomic write can be reordered after the notify, and that
seems to be happening on mac m1: http://45.33.8.238/macm1/2654/step_8.txt
In practice maybe seq_cst is enough? But no reason not to lock here.
https://bugs.llvm.org/show_bug.cgi?id=48998
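A minimal sketch of the lock-then-notify shutdown pattern, assuming a single background thread that waits on a condition variable. `Worker` is a hypothetical class and does not mirror the clangd code.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical worker that sleeps until it is asked to stop.
class Worker {
  std::mutex Mu;
  std::condition_variable CV;
  bool ShouldStop = false; // guarded by Mu
  std::thread Thread;      // declared last so it starts after the members above

public:
  Worker() : Thread([this] { run(); }) {}

  // The flag is written while holding the mutex. The waiter therefore either
  // sees the new value in its predicate check, or is already blocked in
  // wait() and receives the notification.
  ~Worker() {
    {
      std::lock_guard<std::mutex> Lock(Mu);
      ShouldStop = true;
    }
    CV.notify_all();
    Thread.join();
  }

private:
  void run() {
    std::unique_lock<std::mutex> Lock(Mu);
    CV.wait(Lock, [this] { return ShouldStop; });
  }
};
```

With the write under the lock, the flag does not need to be atomic at all, which sidesteps the memory-ordering question raised above.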
Stephen Kelly [Sat, 30 Jan 2021 16:19:43 +0000 (16:19 +0000)]
[ASTMatchers] Ignore parts of BindingDecls which are not spelled in source
Differential Revision: https://reviews.llvm.org/D95740
Tom Weaver [Tue, 2 Feb 2021 14:19:31 +0000 (14:19 +0000)]
Revert "[InstrProfiling] Use !associated metadata for counters, data and values"
This reverts commit
df3e39f60b356ca9dbfc11e96e5fdda30afa7acb.
introduced failing test instrprof-gc-sections.c
causing build bot to fail:
http://lab.llvm.org:8011/#/builders/53/builds/1184
David Green [Tue, 2 Feb 2021 14:16:42 +0000 (14:16 +0000)]
[ARM] Extra shuffle tests. NFC
Kent Sommer [Tue, 2 Feb 2021 13:41:55 +0000 (14:41 +0100)]
[clang-format] Add case aware include sorting.
Adds an option to clang-format which sorts headers alphabetically, using case only as a tie-breaker. The option is off by default in favor of the current ASCIIbetical sorting style.
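One way to picture "case only as a tie-breaker" is a comparator that orders by the lower-cased name first and falls back to the original spelling. This is a sketch of that ordering under that assumption, not the clang-format implementation; `toLower` and `sortCaseAware` are illustrative names.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Lower-cases an ASCII string.
inline std::string toLower(std::string S) {
  std::transform(S.begin(), S.end(), S.begin(), [](unsigned char C) {
    return static_cast<char>(std::tolower(C));
  });
  return S;
}

// Sorts include names alphabetically, consulting case only when two names are
// equal after lower-casing.
inline void sortCaseAware(std::vector<std::string> &Includes) {
  std::sort(Includes.begin(), Includes.end(),
            [](const std::string &A, const std::string &B) {
              std::string LowerA = toLower(A), LowerB = toLower(B);
              if (LowerA != LowerB)
                return LowerA < LowerB;
              return A < B; // tie-breaker: plain byte-wise comparison
            });
}
```

For the input `b.h`, `A.h`, `a.h`, `B.h`, a byte-wise sort yields `A.h`, `B.h`, `a.h`, `b.h`, whereas this comparator yields `A.h`, `a.h`, `B.h`, `b.h`.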
Reviewed By: MyDeveloperDay, curdeius, HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D95017
Stephen Kelly [Sat, 30 Jan 2021 15:50:44 +0000 (15:50 +0000)]
[ASTMatchers] Add matchers for decomposition decls
Differential Revision: https://reviews.llvm.org/D95739
David Green [Tue, 2 Feb 2021 13:50:02 +0000 (13:50 +0000)]
[ARM] Select VINS from vector inserts
This patch adds tablegen patterns for pairs of i16/f16 insert/extracts.
If we are inserting into two adjacent vector lanes (0 and 1 for
example), we can use either a vmov;vins or vmovx;vins to insert the pair
together, avoiding a round-trip from GPR registers. This is quite a
large pattern with a number of EXTRACT_SUBREG/INSERT_SUBREG/
COPY_TO_REGCLASS nodes, but hopefully as most of those become copies all
that will be cleaned up by further optimizations.
The VINS pattern was also adjusted to allow it to represent that it is
inserting into the top half of an existing register.
Differential Revision: https://reviews.llvm.org/D95381
Simon Pilgrim [Tue, 2 Feb 2021 12:52:10 +0000 (12:52 +0000)]
[X86][SSE] LowerINSERT_VECTOR_ELT - pull out repeated EltSizeInBits calls. NFCI.
Raphael Isemann [Tue, 2 Feb 2021 13:41:41 +0000 (14:41 +0100)]
Revert "[lldb] Use current execution context in SBDebugger"
This reverts commit
754ab803b8dc659e3645d369d1b5d6d2f97be29e.
As pointed out in https://reviews.llvm.org/D95761, this patch could lead to
having the wrong execution context in some situations (thanks Jim!).
D92164 is addressing the same issue and will replace this patch, so I'll
revert this one.
Sander de Smalen [Tue, 2 Feb 2021 12:50:18 +0000 (12:50 +0000)]
NFC: Migrate SpeculateAroundPHIs to work on InstructionCost
This patch migrates cost values and arithmetic to work on InstructionCost.
When the interfaces to TargetTransformInfo are changed, any InstructionCost
state will propagate naturally.
See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
Reviewed By: ctetreau
Differential Revision: https://reviews.llvm.org/D95353
Sander de Smalen [Tue, 2 Feb 2021 12:28:20 +0000 (12:28 +0000)]
NFC: Migrate SimpleLoopUnswitch to work on InstructionCost
This patch migrates cost values and arithmetic to work on InstructionCost.
When the interfaces to TargetTransformInfo are changed, any InstructionCost
state will propagate naturally.
See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D95352
Stephen Kelly [Sat, 30 Jan 2021 01:36:40 +0000 (01:36 +0000)]
[ASTMatchers] Fix matching after generic top-level matcher
With a matcher like
expr(anyOf(integerLiteral(equals(42)), unless(expr())))
and code such as
struct B {
B(int);
};
B func1() { return 42; }
the top-level expr() would match each of the nodes which are not spelled
in the source and then ignore-traverse to match the integerLiteral node.
This would result in multiple results reported for the integerLiteral.
Fix that by only running matching logic on nodes which are not skipped
with the top-level matcher.
Differential Revision: https://reviews.llvm.org/D95735
Nicolas Vasilache [Tue, 2 Feb 2021 12:16:51 +0000 (12:16 +0000)]
[mlir][Linalg] Fix and properly test CodegenStrategy API
Fix a bug that was introduced where calling the codegen strategy with actual concrete C++ Op types did not trigger the expected behavior.
Also introduce a test for the behavior that was missing.
Differential Revision: https://reviews.llvm.org/D95863
Nico Weber [Tue, 2 Feb 2021 12:38:44 +0000 (07:38 -0500)]
Revert "[test] Default clang/test to FileCheck --allow-unused-prefixes=false"
This reverts commit
80f539526eec31f03aadd96753648686312b1ad1.
Many test failures on mac: http://45.33.8.238/macm1/2772/summary.html
One on win: http://45.33.8.238/win/32442/summary.html
Utkarsh Saxena [Mon, 1 Feb 2021 20:17:53 +0000 (21:17 +0100)]
[clangd] Report only decl of overriding method in xref.
See: https://github.com/clangd/clangd/issues/668
```
struct A { virtual void foo() = 0; };
struct B : A { void foo() override; };
```
Find refs on `A::foo()` will show:
- decls of `A::foo()`
- decls of `B::foo()`
- refs to `A::foo()`
- no refs to `B::foo()`.
Differential Revision: https://reviews.llvm.org/D95812
Benjamin Kramer [Tue, 2 Feb 2021 11:59:41 +0000 (12:59 +0100)]
[mlir][Linalg] Fix unused variable warning in Release builds. NFC.
Dmitry Preobrazhensky [Tue, 2 Feb 2021 11:49:59 +0000 (14:49 +0300)]
[AMDGPU][MC] Corrected parsing of optional modifiers
Fixed bugs in parsing of "no*" modifiers and improved error handling.
See https://bugs.llvm.org/show_bug.cgi?id=41282.
Differential Revision: https://reviews.llvm.org/D95675
Simon Pilgrim [Tue, 2 Feb 2021 11:40:39 +0000 (11:40 +0000)]
[X86][AVX512] Support variable-index vector insertion on AVX512 targets (PR47924)
With predicate masks, AVX512 can efficiently perform variable-index vector insertion with 2 broadcasts + 1 comparison, avoiding a lot of aliased memory traffic.
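A scalar emulation may help to see what "2 broadcasts + 1 comparison" computes. This is only a model of the data flow: `insertElement` is a hypothetical helper, and the final masked move into the selected lane is an assumption about how the comparison result is consumed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar emulation of a masked variable-index insert into an N-lane vector.
template <std::size_t N>
std::array<int32_t, N> insertElement(std::array<int32_t, N> Vec, int32_t Val,
                                     std::size_t Idx) {
  std::array<std::size_t, N> IdxSplat; // broadcast 1: the index in every lane
  IdxSplat.fill(Idx);
  std::array<int32_t, N> ValSplat; // broadcast 2: the new value in every lane
  ValSplat.fill(Val);
  for (std::size_t Lane = 0; Lane < N; ++Lane) {
    // Comparison against the constant lane numbers 0, 1, ..., N-1 yields one
    // mask bit per lane; at most one bit is set.
    bool MaskBit = (IdxSplat[Lane] == Lane);
    if (MaskBit)
      Vec[Lane] = ValSplat[Lane]; // masked move into the selected lane
  }
  return Vec;
}
```

Everything stays in registers, which is where the saving over a store, indexed overwrite and reload through the stack comes from.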
Differential Revision: https://reviews.llvm.org/D95779
Andrew Ng [Wed, 27 Jan 2021 16:47:21 +0000 (16:47 +0000)]
[X86] Fix disassembly of x86-64 GDTLS code sequence
For x86-64 the REX.w prefix takes precedence over any other size
override (i.e. 0x66). Therefore, for x86-64 when REX.w is present set
'hasOpSize' to false to ensure that any size override is ignored.
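The precedence rule can be summarized in a few lines. This is a simplified sketch for an instruction whose default operand size in 64-bit mode is 32 bits; `operandSizeBits` is a hypothetical helper, not the disassembler's code.

```cpp
// Effective operand size, in bits, for an x86-64 instruction with a 32-bit
// default operand size.
inline unsigned operandSizeBits(bool HasRexW, bool HasOpSizePrefix) {
  if (HasRexW)
    return 64; // REX.W wins; a 0x66 prefix is ignored
  if (HasOpSizePrefix)
    return 16; // 0x66 alone selects the 16-bit form
  return 32;
}
```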
Fixes PR48901.
Differential Revision: https://reviews.llvm.org/D95682
Nicolas Vasilache [Tue, 2 Feb 2021 11:19:21 +0000 (11:19 +0000)]
[mlir][Linalg] Refactor Linalg vectorization for better reuse and extensibility.
This revision unifies Linalg vectorization and paves the way for vectorization of Linalg ops with mixed-precision operations.
The new algorithm traverses the ops in the linalg block in order and avoids recursion.
It uses a BlockAndValueMapping to keep track of vectorized operations.
The revision makes the following modifications but is otherwise NFC:
1. vector.transfer_read are created eagerly and may appear in a different order than the original order.
2. a more progressive vectorization to vector.contract results in only the multiply operation being converted to `vector.contract %a, %b, %zero`, where `%zero` is a
constant of the proper type. Later vector canonicalizations are assumed to rewrite vector.contract %a, %b, %zero + add to a proper accumulate form.
Differential revision: https://reviews.llvm.org/D95797
Simon Pilgrim [Tue, 2 Feb 2021 10:53:28 +0000 (10:53 +0000)]
[X86][AVX] Add missing VEX_WIG tags from VPACKUSDW/VPHSUBD/VPCMPISTRI/VPCMPISTRM/VPCMPESTRI/VPCMPESTRM
Fixes PR48877
Differential Revision: https://reviews.llvm.org/D95801
Sven van Haastregt [Tue, 2 Feb 2021 11:15:29 +0000 (11:15 +0000)]
[OpenCL] Change extension handling for -fdeclare-opencl-builtins
Until now, the `-fdeclare-opencl-builtins` option behaved differently
compared to inclusion of `opencl-c.h`: builtins that are part of an
extension were only available if the extension was enabled using the
corresponding pragma.
Builtins that belong to an extension are guarded using a preprocessor
macro (that is named after the extension) in `opencl-c.h`. Align the
behaviour of `-fdeclare-opencl-builtins` with this.
Co-authored-by: Anastasia Stulova
Differential Revision: https://reviews.llvm.org/D95616
David Green [Tue, 2 Feb 2021 11:09:31 +0000 (11:09 +0000)]
[ARM] Remove DLS lr, lr
A DLS lr, lr instruction only moves lr to itself. It need not be emitted
on its own, which saves an instruction in the loop preheader.
Differential Revision: https://reviews.llvm.org/D78916
Adrian Kuegel [Tue, 2 Feb 2021 10:46:54 +0000 (11:46 +0100)]
Revert "[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline"
This reverts commit
9a03058d6322edb8abc803ba3e436cc62647d979.
Adrian Kuegel [Tue, 2 Feb 2021 10:46:32 +0000 (11:46 +0100)]
Revert "Fix build break from D95024"
This reverts commit
09cd849fdef2b2d3de2d0b0a5c512100957e0ef6.
David Green [Tue, 2 Feb 2021 10:28:58 +0000 (10:28 +0000)]
[ARM] Regenerate LowOverheadLoops mir tests. NFC
Andrzej Warzynski [Tue, 2 Feb 2021 09:07:33 +0000 (09:07 +0000)]
[flang][driver] Disallow non-existent input files in the frontend driver
This patch adds a check that verifies that the input file used when
calling the frontend driver (i.e. `flang-new -fc1`) actually exists.
This was not required for the compiler driver, `flang-new`, as that's
already handled in libclangDriver.
Once all input/output file management is moved to the driver, we should
also check that for input from `stdin` the corresponding file descriptor
was successfully acquired.
This patch also makes sure that the default action in the frontend is
`ParseSyntaxOnly`. This is consistent with Clang. Before this change
`flang-new -fc1` would do nothing, which makes testing changes like the
one introduced here a bit tricky.
Reviewed By: SouraVX
Differential Revision: https://reviews.llvm.org/D95127
David Sherwood [Fri, 22 Jan 2021 16:53:21 +0000 (16:53 +0000)]
[SVE][LoopVectorize] Add masked load/store and gather/scatter support for SVE
This patch updates IRBuilder::CreateMaskedGather/Scatter to work
with ScalableVectorType and adds isLegalMaskedGather/Scatter functions
to AArch64TargetTransformInfo. In addition I've fixed up
isLegalMaskedLoad/Store to return true for supported scalar types,
since this is what the vectorizer asks for.
In LoopVectorize.cpp I've changed
LoopVectorizationCostModel::getInterleaveGroupCost to return an invalid
cost for scalable vectors, since currently this relies upon using shuffle
vector for reversing vectors. In addition, in
LoopVectorizationCostModel::setCostBasedWideningDecision I have assumed
that the cost of scalarising memory ops is infinitely expensive.
I have added some simple masked load/store and gather/scatter tests,
including cases where we use gathers and scatters for conditional invariant
loads and stores.
Differential Revision: https://reviews.llvm.org/D95350
Benjamin Kramer [Tue, 2 Feb 2021 09:50:48 +0000 (10:50 +0100)]
Fold one-use variable into assert. NFCI.
Avoids a warning in Release builds.
Alex Zinenko [Fri, 29 Jan 2021 18:41:10 +0000 (19:41 +0100)]
[mlir] Keep track of region signature conversions as argument replacements
In dialect conversion, signature conversions essentially perform block argument
replacement and are added to the general value remapping. However, the replaced
values were not tracked, so if a signature conversion was rolled back, the
construction of operand lists for the following patterns could have obtained
block arguments from the mapping and given them to the pattern, leading to a
use-after-free. Keep track of signature conversions similarly to normal block
argument replacement, and erase such replacements from the general mapping when
the conversion is rolled back.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D95688
Hans Wennborg [Fri, 29 Jan 2021 09:54:40 +0000 (10:54 +0100)]
[dllimport] Honor always_inline when deciding whether a dllimport function should be available for inlining (PR48925)
Normally, Clang will not make dllimport functions available for inlining
if they reference non-imported symbols, as this can lead to confusing
link errors. But if the function is marked always_inline, the user
presumably knows what they're doing and the attribute should be honored.
Differential revision: https://reviews.llvm.org/D95673
Sebastian Neubauer [Mon, 1 Feb 2021 15:02:09 +0000 (16:02 +0100)]
[AMDGPU] Mark epilog restores as frame-destroy
I guess instructions were marked as frame-setup by accident, they are
restores as part of the epilog.
Differential Revision: https://reviews.llvm.org/D95783
Sebastian Neubauer [Thu, 28 Jan 2021 13:53:22 +0000 (14:53 +0100)]
[AMDGPU] Clarify calling conv about inactive lanes
So far, it was not specified what happens with the VGPRs of inactive
lanes when functions are called. This patch explicitly mentions that
the VGPR values of inactive lanes need to be preserved for all
registers.
This describes the current behavior, as only active lanes of registers
are saved to scratch. Also, as the multi-lane nature of VGPRs is not
properly modeled, we cannot determine the live VGPRs from inactive lanes
at calls. So we cannot save them, even if we intended to do so.
Differential Revision: https://reviews.llvm.org/D95610
Wenlei He [Tue, 2 Feb 2021 09:00:24 +0000 (01:00 -0800)]
Fix build break from D95024
Wenlei He [Wed, 20 Jan 2021 07:29:14 +0000 (23:29 -0800)]
[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline
Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path.
Test Plan:
Differential Revision: https://reviews.llvm.org/D95024
Thomas Symalla [Tue, 2 Feb 2021 08:32:25 +0000 (09:32 +0100)]
[AMDGPU] Add new short clamp pattern in GlobalISel.
Thomas Symalla [Tue, 2 Feb 2021 07:50:30 +0000 (08:50 +0100)]
Removed Diff file.
Differential Revision: https://reviews.llvm.org/D93708
Thomas Symalla [Mon, 1 Feb 2021 16:58:51 +0000 (17:58 +0100)]
Fixed includes.
Differential Revision: https://reviews.llvm.org/D93708
Thomas Symalla [Mon, 1 Feb 2021 15:59:57 +0000 (16:59 +0100)]
Fixed includes.
Thomas Symalla [Mon, 1 Feb 2021 08:19:44 +0000 (09:19 +0100)]
Reverted whitespace changes.
Differential Revision: https://reviews.llvm.org/D90968
Thomas Symalla [Mon, 1 Feb 2021 08:17:33 +0000 (09:17 +0100)]
Added missing includes.
Thomas Symalla [Tue, 26 Jan 2021 10:26:50 +0000 (11:26 +0100)]
Renamed med3 opcode, removed superfluous copy.
Thomas Symalla [Mon, 25 Jan 2021 14:20:24 +0000 (15:20 +0100)]
Removed the generic virtual register creations. Reworked the tests.
Thomas Symalla [Mon, 18 Jan 2021 15:05:00 +0000 (16:05 +0100)]
Implemented a MED3_S32 GIR opcode.
Thomas Symalla [Wed, 13 Jan 2021 14:23:45 +0000 (15:23 +0100)]
Added and used new target pseudo for v_cvt_pk_i16_i32, changes due to code review.
Thomas Symalla [Tue, 12 Jan 2021 12:02:49 +0000 (13:02 +0100)]
Formatting changes
Thomas Symalla [Tue, 12 Jan 2021 12:01:04 +0000 (13:01 +0100)]
Formatting changes.
Thomas Symalla [Tue, 12 Jan 2021 11:58:36 +0000 (12:58 +0100)]
Updating formatting changes.
Thomas Symalla [Tue, 12 Jan 2021 11:50:04 +0000 (12:50 +0100)]
Resolve formatting changes.
Thomas Symalla [Tue, 12 Jan 2021 08:40:21 +0000 (09:40 +0100)]
Code changes yielded from review.
Thomas Symalla [Mon, 11 Jan 2021 14:09:47 +0000 (15:09 +0100)]
Fixed tests.
Thomas Symalla [Mon, 11 Jan 2021 13:50:45 +0000 (14:50 +0100)]
Move step to PreLegalizer
Thomas Symalla [Mon, 11 Jan 2021 13:50:28 +0000 (14:50 +0100)]
Move Combiner to PreLegalize step
Thomas Symalla [Tue, 5 Jan 2021 10:24:27 +0000 (11:24 +0100)]
Renamed identifiers in lit
Thomas Symalla [Tue, 5 Jan 2021 08:59:17 +0000 (09:59 +0100)]
Reverted unintended git-format change.
Thomas Symalla [Tue, 5 Jan 2021 08:58:00 +0000 (09:58 +0100)]
Fixed the lit tests and a bug in the implementation.
Thomas Symalla [Mon, 4 Jan 2021 14:23:44 +0000 (15:23 +0100)]
Refactored the pattern matching.
Thomas Symalla [Mon, 4 Jan 2021 10:27:30 +0000 (11:27 +0100)]
Renames
Thomas Symalla [Tue, 22 Dec 2020 15:04:26 +0000 (16:04 +0100)]
Added early exit.
Thomas Symalla [Tue, 22 Dec 2020 13:49:24 +0000 (14:49 +0100)]
Added comments.
Thomas Symalla [Tue, 22 Dec 2020 13:40:07 +0000 (14:40 +0100)]
clang-format
Thomas Symalla [Tue, 22 Dec 2020 13:37:49 +0000 (14:37 +0100)]
Added clamp i64 to i16 global isel pattern.
Craig Topper [Tue, 2 Feb 2021 07:53:54 +0000 (23:53 -0800)]
[RISCV] Replace NoX0 SDNodeXForm with a ComplexPattern to do the selection of the VL operand.
I think this is a more standard way of doing this.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D95833
Wenlei He [Mon, 4 Jan 2021 00:43:06 +0000 (16:43 -0800)]
[CSSPGO] Call site prioritized inlining for sample PGO
This change implements call site prioritized BFS profile guided inlining for the sample profile loader. The new inlining strategy maximizes the benefit of context-sensitive profiles, as mentioned in the follow-up discussion of the CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`).
Motivation
With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO.
- It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat.
- The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately.
- It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inaccurate hotness heuristic. With pseudo-probe and the change above, this is no longer an issue for CSSPGO.
New FDO Inliner
Putting all the requirements for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here are the reasons why each component is needed.
- Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655.
- Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check.
- Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites.
- BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path.
Note that the new inliner avoids repeatedly evaluating the same set of call sites, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of the same call site (TODO).
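The interaction of the priority queue and the size cap can be sketched as below. This is a toy model under several assumptions: `Candidate`, `ByHotness` and `inlinePrioritized` are made-up names, priority is the call site count alone, the per-candidate cost check is not modeled, and inlining stops at the first candidate that would exceed the budget.

```cpp
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

// Hypothetical inline candidate: a call site with its count, the size of the
// callee, and the call sites that become visible once it is inlined.
struct Candidate {
  std::string Callee;
  uint64_t CallSiteCount;
  unsigned Size;
  std::vector<Candidate> Nested;
};

// Orders the queue so that the hottest call site is on top.
struct ByHotness {
  bool operator()(const Candidate &A, const Candidate &B) const {
    return A.CallSiteCount < B.CallSiteCount;
  }
};

// Inlines the hottest pending call site first and enqueues the call sites it
// exposes, until the next candidate would exceed the size budget.
inline std::vector<std::string>
inlinePrioritized(const std::vector<Candidate> &Initial, unsigned SizeBudget) {
  std::priority_queue<Candidate, std::vector<Candidate>, ByHotness> Queue;
  for (const Candidate &C : Initial)
    Queue.push(C);

  std::vector<std::string> Inlined;
  unsigned Used = 0;
  while (!Queue.empty()) {
    Candidate C = Queue.top();
    Queue.pop();
    if (Used + C.Size > SizeBudget)
      break; // size cap reached: stop inlining
    Used += C.Size;
    Inlined.push_back(C.Callee);
    for (const Candidate &N : C.Nested)
      Queue.push(N); // newly exposed call sites compete with the rest
  }
  return Inlined;
}
```

Each call site enters the queue once and is evaluated once. With a hot call to `A` (exposing a cold nested call `A1`) and a warm call to `B`, a budget that fits two callees is spent on `A` and `B` rather than on going deep into `A1`.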
Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO.
Tunings and knobs
I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO.
Results
Evaluated with an internal LLVM fork a couple of months ago, plus another change to adjust the hot-threshold cutoff for context profile (will send up after this one), the new inliner shows a ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. The measurement was done using a train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it).
Differential Revision: https://reviews.llvm.org/D94001
Nicolas Vasilache [Tue, 2 Feb 2021 07:41:07 +0000 (07:41 +0000)]
[mlir][Standard] Extend n-D vector lowering to LLVM to [s|z]exti ops.
[s|z]exti ops do not have the same operand and result type.
As a consequence, the lowering of the n-D vector form needs to be relaxed a bit.
This revision additionally performs a few NFC renamings of variables to make them more intuitive.
Differential Revision: https://reviews.llvm.org/D95760
Craig Topper [Tue, 2 Feb 2021 07:08:46 +0000 (23:08 -0800)]
[SelectionDAG] Prevent scalable vector warning from ComputeNumSignBits on extract_vector_elt on a scalable vector.
xgupta [Tue, 2 Feb 2021 07:23:45 +0000 (12:53 +0530)]
[NFC][Docs] Fix RAVFrontendAction doc's CMakeLists.txt for Shared build
The [[ https://clang.llvm.org/docs/RAVFrontendAction.html | Example tutorial ]] gives an undefined reference error when built with BUILD_SHARED_LIBS=ON.
Differential Revision: https://reviews.llvm.org/D95737
Puyan Lotfi [Tue, 2 Feb 2021 07:33:44 +0000 (02:33 -0500)]
Revert "[AArch64] Homogeneous Prolog and Epilog Size Optimization"
This reverts commit
0426be3df6180747bd68706db87a70580f064f0f.
Reverting due to some expensive-checks failures in tests.