Christopher Tetreault [Thu, 30 Sep 2021 23:39:48 +0000 (16:39 -0700)]
[NFC] Fix build failure in ScopDetection
In some build environments, the C++ compiler is unable to infer the
correct type for the DenseMap::insert in isErrorBlock. Typing out
std::make_pair helps.
Christopher Tetreault [Mon, 27 Sep 2021 21:23:49 +0000 (14:23 -0700)]
[SimpleLoopUnswitch] Allow threshold to be specified zero or more times
Differential Revision: https://reviews.llvm.org/D110594
Weiwei Li [Mon, 4 Oct 2021 16:04:33 +0000 (00:04 +0800)]
[mlir][SPIRVToLLVM] Propagate location attribute from spv.GlobalVariable to llvm.mlir.global
This patch mainly propagates the location attribute from spv.GlobalVariable to llvm.mlir.global.
It also contains three small changes.
1. Remove the restriction on UniformConstant in SPIRVToLLVM.cpp;
2. Remove the errorCheck on relaxedPrecision when deserializing SPIR-V in Deserializer.cpp;
3. In SPIRVOps.cpp, let ConstantOp take signedInteger too.
Co-authored: Alan Liu <alanliu.yf@gmail.com> and Xinyi Liu <xyliuhelen@gmail.com>
Reviewed By: antiagainst
Differential revision: https://reviews.llvm.org/D110207
Alfsonso Gregory [Mon, 4 Oct 2021 15:54:05 +0000 (08:54 -0700)]
[LLDB] Fix objc_clsopt_v16_t struct
The objc_clsopt_v16_t struct does not match up with the macOS/iOS15
dyld_shared_cache ObjC runtime structures. A struct field was seemingly
omitted.
Differential revision: https://reviews.llvm.org/D110477
Nico Weber [Mon, 4 Oct 2021 15:45:55 +0000 (11:45 -0400)]
[lld] Use checkError more
No behavior change.
Kazu Hirata [Mon, 4 Oct 2021 15:40:24 +0000 (08:40 -0700)]
[IR] Migrate from getNumArgOperands to arg_size (NFC)
Note that getNumArgOperands is considered a legacy name. See
llvm/include/llvm/IR/InstrTypes.h for details.
Jinsong Ji [Mon, 4 Oct 2021 15:29:04 +0000 (15:29 +0000)]
[PowerPC][NFC] Remove reg name option in int128 test
The test is generated by script, so we don't really need the regname to
be meaningful here.
AIX doesn't support the reg name option; remove it for now so that we
can reuse the CHECKs for the AIX triple as well.
Louis Dionne [Mon, 4 Oct 2021 15:19:16 +0000 (11:19 -0400)]
[libc++][NFC] Qualify nullptr_t in test
LLVM GN Syncbot [Mon, 4 Oct 2021 15:13:27 +0000 (15:13 +0000)]
[gn build] Port 811b1736d91b
Zurab Tsinadze [Sat, 18 Sep 2021 20:54:59 +0000 (22:54 +0200)]
[analyzer] Add InvalidPtrChecker
This patch introduces a new checker: `alpha.security.cert.env.InvalidPtr`
The checker finds uses of invalidated pointers related to the environment.
It is based on the following SEI CERT rules:
ENV34-C: https://wiki.sei.cmu.edu/confluence/x/8tYxBQ
ENV31-C: https://wiki.sei.cmu.edu/confluence/x/5NUxBQ
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D97699
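The following is a minimal C++ sketch of the compliant pattern behind ENV34-C. The idea is to copy the string returned by getenv into owned storage before any later call can overwrite it. `copyEnv` is a hypothetical helper and is not part of the checker.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns an owned copy of an environment variable's
// value, or an empty string if it is unset. Copying right away means a later
// getenv/setenv/putenv call cannot invalidate what the caller holds.
std::string copyEnv(const char *name) {
  const char *value = std::getenv(name);
  return value ? std::string(value) : std::string();
}

// Non-compliant shape (ENV34-C), kept as a comment:
//   const char *tmp = std::getenv("TMP");
//   const char *temp = std::getenv("TEMP"); // may overwrite tmp's storage
//   use(tmp);                               // tmp may no longer be valid
// Compliant shape:
//   std::string tmp = copyEnv("TMP");
//   std::string temp = copyEnv("TEMP");
```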
Roman Lebedev [Mon, 4 Oct 2021 14:04:29 +0000 (17:04 +0300)]
[NFC][X86][Codegen] Add test coverage for interleaved i64 load/store stride=4
Roman Lebedev [Mon, 4 Oct 2021 13:55:08 +0000 (16:55 +0300)]
[NFC][X86][LV] Add costmodel test coverage for interleaved i64/f64 load/store stride=4
Roman Lebedev [Mon, 4 Oct 2021 13:50:43 +0000 (16:50 +0300)]
[NFC][X86][Codegen] Add test coverage for interleaved i32 load/store stride=4
Roman Lebedev [Mon, 4 Oct 2021 13:25:45 +0000 (16:25 +0300)]
[NFC][X86][LV] Add costmodel test coverage for interleaved i32/f32 load/store stride=4
David Spickett [Mon, 4 Oct 2021 14:24:03 +0000 (14:24 +0000)]
[llvm-objdump] Fix common symbol output on 32 bit platforms
Since https://reviews.llvm.org/D109452 symbol-table.test has
been failing on our Arm32 bots.
https://lab.llvm.org/buildbot/#/builders/171/builds/4201
This is because in that change an implicit widening cast
of the alignment from 32 bit to 64 bit was removed and the
format string expects a 64 bit number.
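The following is a minimal C++ sketch of this bug class, assuming a printf-style formatter. llvm-objdump's actual output code differs. Passing a 32-bit value where the format expects a 64-bit argument is undefined behavior, so the value has to be widened explicitly. `formatAlignment` and the 16-digit width are illustrative only.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats an alignment with a 64-bit format specifier. Without the explicit
// widening cast, a 32-bit argument would be passed where PRIx64 expects a
// 64-bit one, which is undefined behavior in a varargs call and can print a
// wrong value on 32-bit targets.
std::string formatAlignment(uint32_t Align) {
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "%016" PRIx64, static_cast<uint64_t>(Align));
  return Buf;
}
```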
Louis Dionne [Mon, 4 Oct 2021 14:22:17 +0000 (10:22 -0400)]
[libc++][NFC] Qualify usage of nullptr_t in the format tests
David Goldman [Fri, 1 Oct 2021 18:46:57 +0000 (14:46 -0400)]
[clangd] Improve PopulateSwitch tweak
- Support enums in C and ObjC as their
AST representations differ slightly.
- Add support for typedef'ed enums.
Differential Revision: https://reviews.llvm.org/D110954
Alexey Bataev [Mon, 4 Oct 2021 13:28:09 +0000 (06:28 -0700)]
[clang] Fix computation of number of dependencies using OpenMP iterator,
by Raul Penacoba.
The size of kmp_depend_info and the number of dependencies were computed by multiplying the iterator sizes, which is not right.
Now the size is computed as:
itersize1*numclausedeps1 + itersize2*numclausedeps2 + ... + itersizeN*numclausedepsN
where itersizeX is the size of the iterator and numclausedepsX is the number of dependencies in that depend clause.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D111045
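The corrected formula can be written as a minimal C++ sketch. `DependClause` and `countDependencies` are hypothetical names; this is not the Clang CodeGen code.

```cpp
#include <cstddef>
#include <vector>

// One depend clause: the iteration count of its iterator and the number of
// dependencies listed in that clause.
struct DependClause {
  std::size_t IterSize;
  std::size_t NumClauseDeps;
};

// Sum over all clauses of IterSize * NumClauseDeps, rather than a product of
// the iterator sizes.
std::size_t countDependencies(const std::vector<DependClause> &Clauses) {
  std::size_t Total = 0;
  for (const DependClause &C : Clauses)
    Total += C.IterSize * C.NumClauseDeps;
  return Total;
}
```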
Wang, Pengfei [Wed, 29 Sep 2021 15:05:18 +0000 (23:05 +0800)]
[demangle] Add a unittest for _Float16 demangling. NFC
David Green [Mon, 4 Oct 2021 14:01:18 +0000 (15:01 +0100)]
[AArch64] Test for Store Pair Suppress under minsize.
Bjorn Pettersson [Tue, 28 Sep 2021 08:26:25 +0000 (10:26 +0200)]
[TargetLibraryInfo] Refactor size_t checks in isValidProtoForLibFunc. NFC
In TargetLibraryInfoImpl::isValidProtoForLibFunc we no longer
need the IsSizeTTy lambda function and the SizeTTy object. Instead
we just follow the regular structure of checking for integer types
given an expected number of bits.
Joseph Huber [Wed, 29 Sep 2021 17:45:07 +0000 (13:45 -0400)]
[OpenMP] Add options to change Attributor max iterations in OpenMPOpt
This patch adds a new command line option `openmp-opt-max-iterations`
that controls the maximum number of iterations the attributor will run
for when compiling OpenMP target device code. This patch also adds a
remark to indicate when the attributor failed because it did not run
for enough iterations.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110749
Simon Pilgrim [Mon, 4 Oct 2021 13:36:32 +0000 (14:36 +0100)]
[X86] SimplifyDemandedVectorEltsForTargetNode - simplify PMADDWD for known zero elements
Noticed while investigating the regressions in D110995 - if the RHS element is already zero, then we don't need the corresponding LHS element.
Technically we could also recheck RHS once we have LHS's known zeros, but I haven't seen any missed opportunities from that yet.
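The reasoning can be shown with a scalar C++ model; the actual change works on SelectionDAG nodes inside SimplifyDemandedVectorEltsForTargetNode. Each i32 lane of a 128-bit PMADDWD sums two signed i16 products. An LHS lane paired with a known-zero RHS lane therefore contributes nothing and does not need to be demanded. `pmaddwd` and `demandedLHS` are hypothetical names.

```cpp
#include <bitset>
#include <cstdint>

// Scalar model of a 128-bit PMADDWD: eight i16 lanes per operand produce four
// i32 lanes, each the sum of two adjacent signed products.
void pmaddwd(const int16_t A[8], const int16_t B[8], int32_t Out[4]) {
  for (int I = 0; I != 4; ++I) {
    int64_t Sum = int64_t(A[2 * I]) * B[2 * I] +
                  int64_t(A[2 * I + 1]) * B[2 * I + 1];
    // Only the all -32768 case exceeds int32; the hardware wraps it, and so
    // does this conversion on GCC/Clang (and by definition since C++20).
    Out[I] = static_cast<int32_t>(Sum);
  }
}

// Given the demanded i32 result lanes and the RHS i16 lanes known to be zero,
// return the LHS i16 lanes that are still demanded. A product with a zero RHS
// lane is zero whatever the LHS lane holds, so that LHS lane drops out.
std::bitset<8> demandedLHS(std::bitset<4> DemandedOut,
                           std::bitset<8> RHSKnownZero) {
  std::bitset<8> Demanded;
  for (int I = 0; I != 4; ++I)
    if (DemandedOut[I]) {
      Demanded[2 * I] = true;
      Demanded[2 * I + 1] = true;
    }
  return Demanded & ~RHSKnownZero;
}
```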
Pavel Labath [Mon, 4 Oct 2021 12:23:44 +0000 (14:23 +0200)]
[lldb] Fix a stray array access in Editline
This manifested itself as an asan failure in TestMultilineNavigation.py.
Michał Górny [Fri, 1 Oct 2021 19:28:08 +0000 (21:28 +0200)]
[lldb] Add unit tests for Terminal API
Differential Revision: https://reviews.llvm.org/D110962
Roman Lebedev [Mon, 4 Oct 2021 11:23:51 +0000 (14:23 +0300)]
[X86][Costmodel] Load/store i64/f64 Stride=3 VF=16 interleaving costs
This required a huge amount of assembly surgery, but I think this is about right.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/z11crMEcj - for intels `Block RThroughput: =20.0`; for ryzens, `Block RThroughput: <=18.0`
So we could pick cost of `20`.
For store we have:
https://godbolt.org/z/eqT4ze3j4 - for intels `Block RThroughput: =24.0`; for ryzens, `Block RThroughput: <=16.0`
So we could pick cost of `24`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111031
Roman Lebedev [Mon, 4 Oct 2021 11:23:51 +0000 (14:23 +0300)]
[X86][Costmodel] Load/store i64/f64 Stride=3 VF=8 interleaving costs
This one required quite a bit of assembly surgery.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/oYWv4cTnK - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `10`.
For store we have:
https://godbolt.org/z/33GMhrsG9 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `12`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111027
Roman Lebedev [Mon, 4 Oct 2021 11:23:46 +0000 (14:23 +0300)]
[X86][Costmodel] Load/store i64/f64 Stride=3 VF=4 interleaving costs
This one required quite a bit of assembly surgery.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/Tce3osvcz - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `5`.
For store we have:
https://godbolt.org/z/oc3arEcnE - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `6`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111026
Roman Lebedev [Mon, 4 Oct 2021 11:23:42 +0000 (14:23 +0300)]
[X86][Costmodel] Load/store i64/f64 Stride=3 VF=2 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/sz5qdKnr4 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `1`.
For store we have:
https://godbolt.org/z/Kzdjff63v - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `4`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111025
Roman Lebedev [Mon, 4 Oct 2021 11:23:13 +0000 (14:23 +0300)]
[X86][Costmodel] Load/store i32/f32 Stride=3 VF=16 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/5fqrh4qqo - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `14`.
For store we have:
https://godbolt.org/z/5fqrh4qqo - for intels `Block RThroughput: =22.0`; for ryzens, `Block RThroughput: <=16.0`
So pick cost of `22`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111022
Roman Lebedev [Mon, 4 Oct 2021 11:23:13 +0000 (14:23 +0300)]
[X86][Costmodel] Load/store i32/f32 Stride=3 VF=8 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/zdz5Ga6fs - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `7`.
For store we have:
https://godbolt.org/z/qn71513ac - for intels `Block RThroughput: =11.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `11`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111021
Roman Lebedev [Mon, 4 Oct 2021 11:23:08 +0000 (14:23 +0300)]
[X86][Costmodel] Load/store i32/f32 Stride=3 VF=4 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/d8PdhEszo - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `3`.
For store we have:
https://godbolt.org/z/WojonfG5n - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `5`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111020
Roman Lebedev [Mon, 4 Oct 2021 11:23:04 +0000 (14:23 +0300)]
[X86][Costmodel] Load/store i32/f32 Stride=3 VF=2 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/z8qa14bs3 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: =1.5`
So pick cost of `3`.
For store we have:
https://godbolt.org/z/GYGajoc4K - for intels `Block RThroughput: <=4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111019
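To make the stride-3 series above concrete, here is a hypothetical C++ table. It holds a subset of the AVX2 costs picked in the i32/f32 and i64/f64 commits, with a lookup that returns std::nullopt for missing entries. LLVM's real tables in X86TargetTransformInfo.cpp use their own entry type, so this is an illustration only.

```cpp
#include <optional>

// Hypothetical element-type tag for this illustration.
enum class ElemTy { I32, I64 };

struct InterleaveCost {
  unsigned Factor;    // interleave stride
  ElemTy Ty;          // element type
  unsigned VF;        // vectorization factor
  unsigned LoadCost;  // cost picked for the interleaved load
  unsigned StoreCost; // cost picked for the interleaved store
};

constexpr InterleaveCost AVX2Stride3Costs[] = {
    {3, ElemTy::I32, 2, 3, 4},   {3, ElemTy::I32, 4, 3, 5},
    {3, ElemTy::I32, 8, 7, 11},  {3, ElemTy::I32, 16, 14, 22},
    {3, ElemTy::I64, 2, 1, 4},   {3, ElemTy::I64, 4, 5, 6},
    {3, ElemTy::I64, 8, 10, 12},
};

// Returns the table cost, or std::nullopt so a caller could fall back to a
// generic estimate.
std::optional<unsigned> lookupCost(unsigned Factor, ElemTy Ty, unsigned VF,
                                   bool IsLoad) {
  for (const InterleaveCost &E : AVX2Stride3Costs)
    if (E.Factor == Factor && E.Ty == Ty && E.VF == VF)
      return IsLoad ? E.LoadCost : E.StoreCost;
  return std::nullopt;
}
```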
Stefan Pintilie [Wed, 29 Sep 2021 20:06:30 +0000 (15:06 -0500)]
[PowerPC] Fix __builtin_ppc_load2r to return short instead of int.
This patch fixes the return value of the builtin __builtin_ppc_load2r to
correctly return short instead of int.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D110771
Jay Foad [Mon, 4 Oct 2021 10:33:22 +0000 (11:33 +0100)]
[APFloat] Common up some assertions. NFC.
Nicolas Vasilache [Fri, 1 Oct 2021 11:54:29 +0000 (11:54 +0000)]
[mlir] Tighten strided layout specification.
Clarify that the strided layout specification is represented by a single semi-affine map.
Differential Revision: https://reviews.llvm.org/D110921
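For reference, a strided layout with offset off and strides s0 ... sn roughly corresponds to a single semi-affine map: (d0, ..., dn)[off, s0, ..., sn] -> (off + d0 * s0 + ... + dn * sn). The following minimal C++ sketch evaluates that map for concrete indices. The function name is hypothetical, and it assumes Indices and Strides have the same length.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Evaluates off + d0 * s0 + ... + dn * sn. The map is semi-affine because each
// stride is a symbol multiplying a dimension; a purely affine map only allows
// multiplication by constants.
int64_t stridedOffset(int64_t Offset, const std::vector<int64_t> &Strides,
                      const std::vector<int64_t> &Indices) {
  int64_t Linear = Offset;
  for (std::size_t K = 0; K < Strides.size(); ++K)
    Linear += Indices[K] * Strides[K];
  return Linear;
}
```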
Jay Foad [Mon, 4 Oct 2021 09:20:18 +0000 (10:20 +0100)]
[APFloat] Remove BitWidth argument from getAllOnesValue
There's no need to pass this in explicitly because it is
trivially available from the semantics.
Michał Górny [Sat, 2 Oct 2021 16:13:44 +0000 (18:13 +0200)]
[lldb] [test] Terminate "process connect" connections via kill
Fix the termination of "process connect" (and "gdb-remote") to kill
the process rather than attempting to disconnect the platform.
The latter only results in an error since we did not use "platform
connect", and apparently process-level connections (at least via
gdb-remote) do not really support disconnecting.
Differential Revision: https://reviews.llvm.org/D110996
Simon Pilgrim [Mon, 4 Oct 2021 10:17:17 +0000 (11:17 +0100)]
[X86] Add tests for enabling slow-mulld on AVX2 targets
As discussed on D110588, Haswell/Broadwell don't have a great PMULLD implementation, so we might want to enable this for them in the future.
Cullen Rhodes [Mon, 4 Oct 2021 09:28:57 +0000 (09:28 +0000)]
[MLIR] Fix unused tablegen template arg warnings
Identified in D109359.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D110805
Andrew Ng [Thu, 23 Sep 2021 17:42:31 +0000 (18:42 +0100)]
[ELF][test] Fix several LLD ICF tests
A number of the ICF tests were not updated to use --print-icf-sections
instead of --verbose and various '-NOT' checks were not updated to the
latest output format of --print-icf-sections. Because these are all
'negative' tests, these issues have gone unnoticed.
Differential Revision: https://reviews.llvm.org/D110353
Alex Zinenko [Mon, 4 Oct 2021 09:39:19 +0000 (11:39 +0200)]
[mlir][python] Provide more convenient constructors for std.CallOp
The new constructor relies on type-based dynamic dispatch and allows one to
construct call operations given an object representing a FuncOp or its name as
a string, as opposed to requiring an explicitly constructed attribute.
Depends On D110947
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D110948
Alex Zinenko [Mon, 4 Oct 2021 09:38:53 +0000 (11:38 +0200)]
[mlir][python] Provide more convenient wrappers for std.ConstantOp
Constructing a ConstantOp using the default-generated API is verbose and
requires specifying the constant type twice: for the result type of the
operation and for the type of the attribute. It also requires explicitly
constructing the attribute. Provide custom constructors that take the type once
and accept a raw value instead of the attribute. This requires dynamic dispatch
based on type in the constructor. Also provide the corresponding accessors to
raw values.
In addition, provide a "refinement" class ConstantIndexOp similar to what
exists in C++. Unlike other "op view" Python classes, operations cannot be
automatically downcast to this class since it does not correspond to a
specific operation name. It only exists to simplify construction of the
operation.
Depends On D110946
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D110947
Alex Zinenko [Mon, 4 Oct 2021 09:38:20 +0000 (11:38 +0200)]
[mlir][python] Usability improvements for Python bindings
Provide a couple of quality-of-life usability improvements for Python bindings,
in particular:
* give access to the list of types for the list of op results or block
arguments, similarly to ValueRange->TypeRange,
* allow for constructing empty dictionary arrays,
* support construction of array attributes by concatenating an existing
attribute with a Python list of attributes.
All these are required for the upcoming customization of builtin and standard
ops.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D110946
Hans Wennborg [Fri, 1 Oct 2021 08:59:55 +0000 (10:59 +0200)]
[libFuzzer] Use octal instead of hex escape sequences in PrintASCII
Previously, PrintASCII would print the string "\ta" as "\x09a". However,
in C/C++ those strings are not the same: the trailing 'a' is part of the
escape sequence, which means it's equivalent to "\x9a". This is an
annoying quirk of the standard. (See
https://eel.is/c++draft/lex.ccon#nt:hexadecimal-escape-sequence)
To fix this, output three-digit octal escape sequences instead. Since
octal escapes are limited to max three digits, this avoids the problem
of subsequent characters unintentionally becoming part of the escape
sequence.
Dictionary files still use the non-C-compatible hex escapes, but I
believe we can't change the format since it comes from AFL. libFuzzer
never writes such files (it only reads them), so they're
not affected by this change.
Differential revision: https://reviews.llvm.org/D110920
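The escaping scheme described above can be sketched in C++. `escapeASCII` is a hypothetical stand-in that returns a string, whereas libFuzzer's PrintASCII prints directly. The backslash-prefix handling of backslash and double quote is an assumption here.

```cpp
#include <cstdio>
#include <string>

// Escapes bytes for a C/C++ string literal. Backslash and double quote get a
// backslash prefix; other non-printable bytes become three-digit octal
// escapes. An octal escape ends after at most three digits, so the next
// character (the 'a' in "\011a") can never be absorbed into it, unlike the
// hex form "\x09a", which the compiler reads as the single escape \x9a.
std::string escapeASCII(const std::string &Data) {
  std::string Out;
  for (unsigned char C : Data) {
    if (C == '\\') {
      Out += "\\\\";
    } else if (C == '"') {
      Out += "\\\"";
    } else if (C >= 32 && C < 127) {
      Out += static_cast<char>(C);
    } else {
      char Buf[5]; // backslash + three octal digits + NUL
      std::snprintf(Buf, sizeof(Buf), "\\%03o", static_cast<unsigned>(C));
      Out += Buf;
    }
  }
  return Out;
}
```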
Jingu Kang [Mon, 13 Sep 2021 11:09:16 +0000 (12:09 +0100)]
[LoopBoundSplit] Use SCEVAddRecExpr instead of SCEV for AddRecSCEV (NFC)
Differential Revision: https://reviews.llvm.org/D109682
David Sherwood [Mon, 4 Oct 2021 08:52:26 +0000 (09:52 +0100)]
[NFC] Simple tidy-up in LoopVectorizationCostModel::selectEpilogueVectorizationFactor
Avoid creating EpilogueVectorizationForceVF twice.
Jay Foad [Thu, 30 Sep 2021 09:50:04 +0000 (10:50 +0100)]
[APInt] Stop using soft-deprecated constructors and methods in clang. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in clang.
Differential Revision: https://reviews.llvm.org/D110808
Jay Foad [Thu, 30 Sep 2021 08:54:57 +0000 (09:54 +0100)]
[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.
Differential Revision: https://reviews.llvm.org/D110807
Michał Górny [Mon, 4 Oct 2021 06:25:45 +0000 (08:25 +0200)]
[openmp] [elf_common] Fix linking against LLVM dylib
The hand-rolled linking logic in elf_common does not account for
the possibility of using LLVM dylib rather than a dozen static
libraries. Since it does not seem to be easily convertible
to add_llvm_library, just hand-roll support for LLVM_LINK_LLVM_DYLIB.
This is necessary to support stand-alone builds against installed LLVM.
Differential Revision: https://reviews.llvm.org/D111038
Muhammad Omair Javaid [Mon, 4 Oct 2021 06:49:04 +0000 (11:49 +0500)]
[LLDB] Skip TestClangREPL.py on Arm/AArch64 Linux
TestClangREPL.py has been failing randomly on Arm/AArch64 Linux
buildbot. I am marking it as skipped to reduce false alarms.
Tobias Gysi [Mon, 4 Oct 2021 06:23:53 +0000 (06:23 +0000)]
[mlir][linalg] Change tensor size in unit test (NFC).
As a follow up to https://reviews.llvm.org/D110849, adapt the input tensor size to match the iteration space.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D110906
Kirill Bobyrev [Mon, 4 Oct 2021 06:39:06 +0000 (08:39 +0200)]
[clangd] Follow-up on rGdea48079b90d
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D110925
Jaroslav Sevcik [Sat, 25 Sep 2021 17:29:04 +0000 (19:29 +0200)]
[lldb] Refactor variable parsing
Separates the methods for recursive variable parsing in function
context and non-recursive parsing of global variables.
Differential Revision: https://reviews.llvm.org/D110570
Philip Reames [Sun, 3 Oct 2021 23:14:06 +0000 (16:14 -0700)]
[SCEV] Cap the number of instructions scanned when inferring flags
This addresses a comment from review on D109845. The concern was raised that an unbounded scan would be expensive. The long-term plan is to cache this search - likely reusing the existing mechanism for loop side effects - but let's be simple and conservative for now.
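The following is a generic C++ sketch of a capped, conservative scan. The real code walks IR instructions; the element type, predicate and budget here are placeholders, and `allOfBounded` is hypothetical.

```cpp
#include <cstddef>

// Returns true only if P holds for every element in [Begin, End) and the
// whole range was inspected within Budget steps. Running out of budget
// returns false ("could not prove it"), which is the conservative answer:
// the flags are simply not inferred.
template <typename It, typename Pred>
bool allOfBounded(It Begin, It End, Pred P, std::size_t Budget) {
  std::size_t Scanned = 0;
  for (It I = Begin; I != End; ++I) {
    if (++Scanned > Budget)
      return false; // too expensive to check; give up
    if (!P(*I))
      return false; // found an element that fails the check
  }
  return true;
}
```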
Philip Reames [Sun, 3 Oct 2021 23:01:30 +0000 (16:01 -0700)]
[SCEV] Use trivial bound on defining scope of all SCEVs when computing flags
This addresses a comment from review on D109845. Even for SCEVs for which we can't find true bounds without recursing through operands, entry to the function forms a trivial upper bound. In some cases, this trivial bound is enough to prove safety of flag inference.
Philip Reames [Sun, 3 Oct 2021 22:32:15 +0000 (15:32 -0700)]
[SCEV] Use full logic when inferring flags on add and gep
This is a follow-on to D109845. With that landed, we will have fixed all known instances of pr51817, and can thus start inferring flags more aggressively with greatly reduced risk of miscompiles. This patch simply applies the same inference logic used in that patch to our other major flag inference path.
We can still do much better here (on both paths), but this is our first step.
Differential Revision: https://reviews.llvm.org/D111003
Philip Reames [Sun, 3 Oct 2021 22:19:33 +0000 (15:19 -0700)]
[SCEV] Correctly propagate nowrap flags across scopes when folding invariant add through addrec
This fixes a violation of the wrap flag rules introduced in
c4048d8f. This is an alternate fix to D106852.
The basic problem being fixed is that we infer a set of flags which is valid at some inner scope S1 (usually by correctly propagating them from IR), and then (incorrectly) extend them to a SCEV in scope S2 where S1 != S2. This is not in general safe per the wrap flags semantics recently defined.
In this patch, I include a simple inference step to handle the case where we can prove that S2 is the preheader of the loop S1, and that entry into S2 implies execution of S1. See the code for a more detailed explanation.
One worry I have with this patch is that I might be over-fitting what shows up in tests - and thus hiding negative impact we'd see in the real world. My best defense is that the rule used here very closely follows the one used to propagate the flags from IR to the inner add to start with, and thus if one is reasonable, so probably is the other. Curious what others think about that piece.
The test diffs are roughly as expected. Mostly analysis only, with two transform changes. Oddly, the result looks better in the loop-idiom test, and I don't understand the PPC output well enough to tell. Nothing terrible looking though. (For context, without the scope inference peephole, the test delta includes a couple of vectorization tests. Again, not super concerning, but slightly more so.)
Differential Revision: https://reviews.llvm.org/D109845
Nikita Popov [Sun, 3 Oct 2021 20:23:05 +0000 (22:23 +0200)]
[AttrBuilder] Make handling of int attributes more generic (NFC)
This is basically the same change as
42cc7f3c524a0ede6b903486c588003fe12d9293
but for integer attributes. Rather than treating each attribute
individually, handle them all the same way. The only thing that
needs to be done per attribute is specify how get/add convert
from/to the raw representation.
Martin Storsjö [Fri, 27 Aug 2021 09:16:03 +0000 (09:16 +0000)]
[openmp] Fix a typo in a test REQUIRES line
Differential Revision: https://reviews.llvm.org/D110963
Roman Lebedev [Sun, 3 Oct 2021 20:37:23 +0000 (23:37 +0300)]
[X86][Costmodel] Load/store i16 Stride=3 VF=32 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/rMaYr67hz - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=17.8`
So pick cost of `56`.
For store we have:
https://godbolt.org/z/eMsbKqnvv - for intels `Block RThroughput: <=54.0`; for ryzens, `Block RThroughput: <=15.0`
So pick cost of `54`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111018
Roman Lebedev [Sun, 3 Oct 2021 20:37:22 +0000 (23:37 +0300)]
[X86][Costmodel] Load/store i16 Stride=3 VF=16 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: =28.0`; for ryzens, `Block RThroughput: <=8.5`
So pick cost of `28`.
For store we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `27`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111017
Roman Lebedev [Sun, 3 Oct 2021 20:37:18 +0000 (23:37 +0300)]
[X86][Costmodel] Load/store i16 Stride=3 VF=8 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=2.3`
So pick cost of `9`.
For store we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: <=12.0`; for ryzens, `Block RThroughput: <=3.3`
So pick cost of `12`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111016
Roman Lebedev [Sun, 3 Oct 2021 20:37:13 +0000 (23:37 +0300)]
[X86][Costmodel] Load/store i16 Stride=3 VF=4 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.
For store we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `6`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111015
Roman Lebedev [Sun, 3 Oct 2021 20:37:09 +0000 (23:37 +0300)]
[X86][Costmodel] Load/store i16 Stride=3 VF=2 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/xnE988aej - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=2.5`
So pick cost of `5`.
For store we have:
https://godbolt.org/z/rMGT31Tnh - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111014
Roman Lebedev [Sun, 3 Oct 2021 20:23:13 +0000 (23:23 +0300)]
[X86][Costmodel] Load/store i8 Stride=6 VF=32 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/c1jjKqP7b - for intels `Block RThroughput: <=82.0`; for ryzens, `Block RThroughput: <=26.0`
So pick cost of `82`.
For store we have:
https://godbolt.org/z/YM4ErY8x7 - for intels `Block RThroughput: <=90.0`; for ryzens, `Block RThroughput: <=25.5`
So pick cost of `90`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111013
Roman Lebedev [Sun, 3 Oct 2021 20:23:13 +0000 (23:23 +0300)]
[X86][Costmodel] Load/store i8 Stride=6 VF=16 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/Gz8hhqfTM - for intels `Block RThroughput: <=43.0`; for ryzens, `Block RThroughput: <=14.0`
So pick cost of `43`.
For store we have:
https://godbolt.org/z/9vrdssYa8 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `27`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111012
Roman Lebedev [Sun, 3 Oct 2021 20:23:08 +0000 (23:23 +0300)]
[X86][Costmodel] Load/store i8 Stride=6 VF=8 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/v98qPTTf6 - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: =6.0`
So pick cost of `18`.
For store we have:
https://godbolt.org/z/rn5T9E8q6 - for intels `Block RThroughput: <=16.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `16`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111011
Roman Lebedev [Sun, 3 Oct 2021 20:23:03 +0000 (23:23 +0300)]
[X86][Costmodel] Load/store i8 Stride=6 VF=4 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `14`.
For store we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `9`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111010
Roman Lebedev [Sun, 3 Oct 2021 20:22:58 +0000 (23:22 +0300)]
[X86][Costmodel] Load/store i8 Stride=6 VF=2 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/jvj6jzns5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.
For store we have:
https://godbolt.org/z/ros7eebMP - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111008
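In each i16 and i8 commit above, the chosen cost equals the larger of the Intel and Ryzen Block RThroughput figures, which is always the Intel one here. The following hypothetical C++ helper is one reading of that rule, not LLVM code. It treats a `<=` bound as its upper value and rounds up to a whole cost unit.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Picks a cost from per-CPU Block RThroughput figures by taking the worst
// (largest) one and rounding up to a whole number.
unsigned pickCost(const std::vector<double> &BlockRThroughputs) {
  double Worst = 0.0;
  for (double T : BlockRThroughputs)
    Worst = std::max(Worst, T);
  return static_cast<unsigned>(std::ceil(Worst));
}
```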
Yuanfang Chen [Sun, 3 Oct 2021 19:49:14 +0000 (12:49 -0700)]
[Clang][NFC] Fix the comment for Sema::DiagIfReachable
Michał Górny [Sat, 2 Oct 2021 09:52:08 +0000 (11:52 +0200)]
[mlir] [test] Add missing tool substitutions
Add missing mlir-capi-*-test tool substitutions in order to fix CAPI
test failures when mlir is not installed yet.
Differential Revision: https://reviews.llvm.org/D110991
David Green [Sun, 3 Oct 2021 18:30:08 +0000 (19:30 +0100)]
[ARM] Mark <= -1 immediate constant as cheap
A <= -1 constant on a compare can be converted to a < 0 operation, which
is usually cheap. If we mark the constant as cheap, preventing hoisting,
we allow that fold to happen even across different blocks.
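The fold relies on `x <= -1` and `x < 0` being the same predicate for signed integers, and on `x < 0` being a plain sign-bit test. A minimal C++ sketch of that equivalence (the ARM-specific hoisting decision is not modelled; the function names are hypothetical):

```cpp
#include <cstdint>

// For signed integers there is no value strictly between -1 and 0, so
// "X <= -1" and "X < 0" are the same predicate.
bool isLeMinusOne(int32_t X) { return X <= -1; }
bool isLtZero(int32_t X) { return X < 0; }

// "X < 0" only needs the sign bit, so no -1 immediate has to be materialised.
bool signBitSet(int32_t X) { return (static_cast<uint32_t>(X) >> 31) != 0; }
```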
Differential Revision: https://reviews.llvm.org/D109360
Simon Pilgrim [Sun, 3 Oct 2021 17:38:47 +0000 (18:38 +0100)]
[X86] Split Cannonlake + Icelake Tuning. NFC
The Ice/Tiger/RocketLake specs were inheriting the tuning settings from CannonLake, a previous architecture. We shouldn't have this dependency, so I've copied the current tuning settings, which lets us make future adjustments to both CNL and ICL etc. more easily.
Simon Pilgrim [Sun, 3 Oct 2021 16:16:45 +0000 (17:16 +0100)]
[CostModel][X86] X86TTIImpl::getCmpSelInstrCost - try to use Predicate argument directly first (PR48337)
There are still a lot of cases where getCmpSelInstrCost fails to specify a predicate; once those are in place, we should be able to remove the fallback to the Instruction argument entirely.
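The intended lookup order can be sketched as follows. This is a simplified, hypothetical model: `Predicate`, `CmpInstLike` and `resolvePredicate` are illustrative stand-ins, and `Predicate::Bad` plays roughly the role of LLVM's `CmpInst::BAD_ICMP_PREDICATE`/`BAD_FCMP_PREDICATE`:

```cpp
// Hypothetical, simplified stand-ins for a compare predicate and instruction.
enum class Predicate { Bad, EQ, NE, SLT, SGT };

struct CmpInstLike {
  Predicate Pred;
};

// Prefer the predicate passed in explicitly; only if the caller did not
// specify one, fall back to inspecting the instruction (if there is one).
Predicate resolvePredicate(Predicate Explicit, const CmpInstLike *I) {
  if (Explicit != Predicate::Bad)
    return Explicit;
  if (I != nullptr)
    return I->Pred;
  return Predicate::Bad; // nothing known: the caller must use a generic cost
}
```

Once every caller passes a real predicate, the `I` fallback branch becomes dead, which is the cleanup the message anticipates.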
David Green [Sun, 3 Oct 2021 15:32:31 +0000 (16:32 +0100)]
[ARM] Tests for constant hoisting -1 immediates
Kazu Hirata [Sun, 3 Oct 2021 15:22:19 +0000 (08:22 -0700)]
[Analysis, CodeGen] Migrate from arg_operands to args (NFC)
Note that arg_operands is considered a legacy name. See
llvm/include/llvm/IR/InstrTypes.h for details.
Roman Lebedev [Sun, 3 Oct 2021 14:50:51 +0000 (17:50 +0300)]
[NFC][X86][Codegen] Add test coverage for interleaved i64 load/store stride=3
Roman Lebedev [Sun, 3 Oct 2021 14:34:21 +0000 (17:34 +0300)]
[NFC][X86][LV] Add costmodel test coverage for interleaved i64/f64 load/store stride=3
Sanjay Patel [Sun, 3 Oct 2021 14:37:22 +0000 (10:37 -0400)]
[InstCombine] fold cast of right-shift if high bits are not demanded (3rd try)
The first two tries at this were reverted because they caused an
infinite loop in instcombine.
That should be fixed after a series of patches that ended with
removing the faulty opposing transform:
3fabd98e5b3e
Original commit message:
(masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C
Narrowing the shift should be better for analysis and can lead
to follow-on transforms as shown.
Attempt at a general proof in Alive2:
https://alive2.llvm.org/ce/z/tRnnSF
Here are a couple of the specific tests:
https://alive2.llvm.org/ce/z/bCnTp-
https://alive2.llvm.org/ce/z/TfaHnb
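A small concrete instance of the transform, assuming an i32 source truncated to i8 and modelling "not demanded" as an explicit mask with the low (8 - C) bits set. This is a sampled check written in C++ for illustration, not a substitute for the Alive2 proofs linked above; the function names are hypothetical:

```cpp
#include <cstdint>

// Before: (masked) trunc (lshr X, C): shift in 32 bits, then truncate to 8.
uint8_t truncOfShift(uint32_t X, unsigned C, uint8_t Mask) {
  return static_cast<uint8_t>(X >> C) & Mask;
}

// After: (masked) lshr (trunc X), C: truncate to 8 bits, then shift.
uint8_t shiftOfTrunc(uint32_t X, unsigned C, uint8_t Mask) {
  return static_cast<uint8_t>(static_cast<uint8_t>(X) >> C) & Mask;
}

// The two agree whenever only the low (8 - C) bits are demanded, i.e. the
// bits that cannot have been shifted in from above bit 7 of X.
bool foldIsValid(uint32_t X, unsigned C) {
  const uint8_t Demanded = static_cast<uint8_t>(0xFFu >> C);
  return truncOfShift(X, C, Demanded) == shiftOfTrunc(X, C, Demanded);
}

// Check every shift amount against all 16-bit inputs (covers bits 8..15,
// which are the ones that differ between the two forms).
bool foldHoldsForSampledInputs() {
  for (unsigned C = 0; C < 8; ++C)
    for (uint32_t X = 0; X <= 0xFFFFu; ++X)
      if (!foldIsValid(X, C))
        return false;
  return true;
}
```

Without the mask the rewrite is wrong: for X = 0x100 and C = 1, the original form gives 0x80 while the narrowed form gives 0, because bit 8 of X is shifted into the truncated result.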
Differential Revision: https://reviews.llvm.org/D110170
Sanjay Patel [Sun, 3 Oct 2021 14:35:59 +0000 (10:35 -0400)]
[InstCombine] add test for shl + demanded bits; NFC
This is a reduction of a test that would infinite loop with D110170.
Nikita Popov [Sun, 3 Oct 2021 11:26:14 +0000 (13:26 +0200)]
[InstSimplify] Add additional load from constant test (NFC)
This case does not get folded because the GEP indexes too deeply
(to the i8), so the bitcast logic (on the [8 x i8]) does not apply.
Roman Lebedev [Sun, 3 Oct 2021 13:48:45 +0000 (16:48 +0300)]
[NFC][X86][Codegen] Add test coverage for interleaved i32 load/store stride=3
Roman Lebedev [Sun, 3 Oct 2021 13:31:23 +0000 (16:31 +0300)]
[NFC][X86][LV] Add costmodel test coverage for interleaved i32/f32 load/store stride=3
Dávid Bolvanský [Sun, 3 Oct 2021 12:52:42 +0000 (14:52 +0200)]
Fixed warnings in target/parser code produced by -Wbitwise-instead-of-logical
Dávid Bolvanský [Sun, 3 Oct 2021 11:57:57 +0000 (13:57 +0200)]
Fixed more warnings in LLVM produced by -Wbitwise-instead-of-logical
Roman Lebedev [Sun, 3 Oct 2021 10:41:30 +0000 (13:41 +0300)]
[NFC][X86][Codegen] Add test coverage for interleaved i8 load/store stride=6
Roman Lebedev [Sun, 3 Oct 2021 10:30:49 +0000 (13:30 +0300)]
[NFC][X86][LV] Add costmodel test coverage for interleaved i8 load/store stride=6
Simon Pilgrim [Sun, 3 Oct 2021 11:31:22 +0000 (12:31 +0100)]
[X86] Add SSE2/AVX1/AVX512BW test coverage to interleaved load/store tests
Extension to PR51979 so codegen tests keep close to the costmodel tests
Dávid Bolvanský [Sun, 3 Oct 2021 11:19:04 +0000 (13:19 +0200)]
Unbreak hexagon-check-builtins.c due to rGb1fcca388441
mydeveloperday [Sun, 3 Oct 2021 11:08:24 +0000 (12:08 +0100)]
[clang-format] allow clang-format to be passed a file of filenames so we can add a regression suite of "clean clang-formatted files" from LLVM
This change now generates that list, and the change to clang-format allows
us to run clang-format quickly over these files via the list of files.
clang-format.exe -verbose -n --files=./clang/docs/tools/clang-formatted-files.txt
```
Clang-formating 7926 files
Formatting [1/7925] clang/bindings/python/tests/cindex/INPUTS/header1.h
..
Formatting [7925/7925] utils/bazel/llvm-project-overlay/llvm/include/llvm/Config/config.h
```
This is needed because putting all those files on the command line would make it too
long, and invoking clang-format 7900+ times is much slower (too slow, to be honest).
Using this method it takes only 7.5 minutes (on my machine) to run
`clang-format -n` over all of the files (7925), so any change can be
tested quickly and easily.
We should be able to rerun this list to ensure that we don't regress
clang-format over a large code base, and also to ensure that all of the
files which were previously 100% clang-formatted remain so
(which the LLVM premerge checks should be enforcing).
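Reading such a list is straightforward. A minimal sketch, assuming one path per line; `readFileList` is a hypothetical helper, and the handling of blank lines and `\r` endings is an assumption, not a description of the actual clang-format implementation:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: read one file name per line, as a --files=<list>
// style option might, skipping blank lines and stripping a trailing '\r'
// left over from Windows line endings.
std::vector<std::string> readFileList(std::istream &In) {
  std::vector<std::string> Files;
  std::string Line;
  while (std::getline(In, Line)) {
    if (!Line.empty() && Line.back() == '\r')
      Line.pop_back();
    if (!Line.empty())
      Files.push_back(Line);
  }
  return Files;
}
```

A single process can then iterate over the returned paths, avoiding both the command-line length limit and the per-file process startup cost described above.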
Reviewed By: HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D111000
Dávid Bolvanský [Sun, 3 Oct 2021 11:05:09 +0000 (13:05 +0200)]
Reland "[Clang] Extend -Wbool-operation to warn about bitwise and of bools with side effects"
This reverts commit
a4933f57f3f0a45e1db1075f7285f0761a80fc06. New warnings were fixed.
Dávid Bolvanský [Sun, 3 Oct 2021 11:04:18 +0000 (13:04 +0200)]
Fixed warnings in LLVM produced by -Wbitwise-instead-of-logical
Dávid Bolvanský [Sun, 3 Oct 2021 10:47:12 +0000 (12:47 +0200)]
Revert "[Clang] Extend -Wbool-operation to warn about bitwise and of bools with side effects"
This reverts commit
f62d18ff140f67a8776a7a3c62a75645d8d540b5. Found some cases in LLVM itself.
Dávid Bolvanský [Sun, 3 Oct 2021 09:06:19 +0000 (11:06 +0200)]
[Clang] Extend -Wbool-operation to warn about bitwise and of bools with side effects
Motivation: https://arstechnica.com/gadgets/2021/07/google-pushed-a-one-character-typo-to-production-bricking-chrome-os-devices/
Warn for the pattern boolA & boolB or boolA | boolB where boolA and boolB have possible side effects.
Casting one operand to int is enough to silence this warning: for example, (int)boolA & boolB or boolA | (int)boolB.
Fixes https://bugs.llvm.org/show_bug.cgi?id=51216
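Side effects matter here because bitwise `&` on bools always evaluates both operands, whereas `&&` short-circuits. A minimal sketch with a hypothetical counting predicate `sideEffect`:

```cpp
// Hypothetical side-effecting predicate: counts how often it is evaluated.
static int Evaluations = 0;

bool sideEffect(bool Result) {
  ++Evaluations;
  return Result;
}

// Bitwise '&' always evaluates both operands. The (int) cast is the idiom
// the commit message gives for silencing the new warning when this is intended.
int evaluationsWithBitwiseAnd() {
  Evaluations = 0;
  bool R = (int)sideEffect(false) & sideEffect(true);
  (void)R;
  return Evaluations;
}

// Logical '&&' short-circuits: the right operand is skipped when the left is false.
int evaluationsWithLogicalAnd() {
  Evaluations = 0;
  bool R = sideEffect(false) && sideEffect(true);
  (void)R;
  return Evaluations;
}
```

`evaluationsWithBitwiseAnd()` returns 2 and `evaluationsWithLogicalAnd()` returns 1; a one-character typo turning `&&` into `&` silently changes which calls run.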
Differential Revision: https://reviews.llvm.org/D108003
hyeongyu kim [Sun, 3 Oct 2021 08:57:05 +0000 (17:57 +0900)]
[LSV] Change the default value of InsertElement to poison
This patch changes the InsertElement's placeholder to poison without changing the LSV's behavior.
Regardless of whether `StoreTy` is FixedVectorType or not, the poison value will be overwritten with a different value.
Therefore, whether the InsertElement's placeholder is poison or undef will not affect the result of the program.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D111005
Hsiangkai Wang [Sun, 3 Oct 2021 07:43:38 +0000 (15:43 +0800)]
[NFC][RISCV] Update test cases through update_cc_test_checks.py.
Michał Górny [Sat, 2 Oct 2021 09:59:15 +0000 (11:59 +0200)]
[mlir] [test] Include mlir_tools_dir in PATH to fix mlir-reduce
Include mlir_tools_dir in the PATH used in test environment,
as otherwise mlir-reduce is unable to find mlir-opt when building
standalone (and hence mlir_tools_dir != llvm_tools_dir).
Differential Revision: https://reviews.llvm.org/D110992
Mehdi Amini [Sun, 3 Oct 2021 01:24:07 +0000 (01:24 +0000)]
Fix ASAN execution for the MLIR Python tests
First the leak sanitizer has to be disabled, as even an empty script
leads to leak detection with Python.
Then we need to preload the ASAN runtime, as the main binary (python)
won't be linked against it. This will only work on Linux right now.
Differential Revision: https://reviews.llvm.org/D111004
Mehdi Amini [Sun, 3 Oct 2021 01:25:10 +0000 (01:25 +0000)]
Exclude MLIR python binding tests from Sanitizer tests for now
This requires more config to work reliably during lit execution.
Also, I see many leaks when running manually right now.