Philip Reames [Tue, 5 Oct 2021 21:45:48 +0000 (14:45 -0700)]
[test] factor out reliance on noundef return value
Philip Reames [Tue, 5 Oct 2021 21:41:29 +0000 (14:41 -0700)]
[test] rework recently added SCEV tests
These are meant to check a future patch which recurses through the operands of SCEVs. Because all SCEVs are trivially bounded by function entry, we need to arrange for the trivial scope to be invalid (i.e., we specifically need a lower defining scope).
Mircea Trofin [Thu, 30 Sep 2021 23:13:05 +0000 (16:13 -0700)]
[inliner] Mandatory inlining decisions produce remarks
This also removes the need to disable the mandatory inlining phase in
tests.
In a departure from the previous remark, we don't output a 'cost' in
this case, because there's no such thing. We just report that inlining
happened because of the attribute.
Differential Revision: https://reviews.llvm.org/D110891
Louis Dionne [Tue, 5 Oct 2021 17:37:43 +0000 (13:37 -0400)]
[libc++] Run the no-unicode CI job on new testing configs
This was most likely an oversight, since we're running all other jobs on
the new configs.
Differential Revision: https://reviews.llvm.org/D111168
Aaron Ballman [Tue, 5 Oct 2021 20:43:55 +0000 (16:43 -0400)]
Fix some Sphinx warnings in the static analyzer docs
A heading wasn't underlined properly, and two links share the same text
and so they should use an anonymous hyperlink instead of a named one.
Aaron Ballman [Tue, 5 Oct 2021 20:38:01 +0000 (16:38 -0400)]
Update the release notes for consteval if support; NFC
This support landed in
424733c12aacc227a28114deba72061153f8dff2.
Sanjay Patel [Tue, 5 Oct 2021 20:26:01 +0000 (16:26 -0400)]
[InstCombine] refactor folds of 'not' instructions; NFC
This removes repeated calls to m_Not, so it is hopefully a little
more efficient.
Also, we may need to enhance some of these blocks to allow
logical and/or (select of bools).
Sam Clegg [Tue, 28 Sep 2021 18:43:47 +0000 (11:43 -0700)]
[lld][WebAssembly] Create optional internal symbols only after LTO object has been added
This is important for cases where new symbols can be introduced
during LTO. Specifically, this happens during TLS lowering, where
references to `__tls_base` can be introduced.
Fixes: https://github.com/emscripten-core/emscripten/issues/12489
Differential Revision: https://reviews.llvm.org/D111171
Nikita Popov [Tue, 5 Oct 2021 20:25:41 +0000 (22:25 +0200)]
[SCEV] Don't check if propagation safe if there are no flags (NFC)
If there are no nowrap flags, then we don't need to determine
whether propagating flags is safe -- it will make no difference.
Philip Reames [Tue, 5 Oct 2021 20:16:10 +0000 (13:16 -0700)]
[tests] Cover cases we could infer SCEV flags, but don't
Vitaly Buka [Tue, 5 Oct 2021 20:08:16 +0000 (13:08 -0700)]
[sanitizer] Fix Android bot
We don't need to check for equality, we need to check
that storage is large enough.
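A minimal sketch of the distinction, assuming a hypothetical `Payload` object constructed into a hypothetical pre-reserved `Storage` buffer (these are not the sanitizer's real types): an exact-size check breaks on any target that lays the type out differently, while a `>=` check plus an alignment check states the real requirement.

```cpp
#include <cassert>
#include <new>

// Hypothetical payload whose exact size can differ between targets
// (e.g. `long` is 4 bytes on some ABIs and 8 on others).
struct Payload {
  long a;
  int b;
};

// Hypothetical storage reserved up front for a Payload.
struct Storage {
  alignas(Payload) unsigned char bytes[32];
};

// Too strict: fails to compile on any target where the sizes differ.
//   static_assert(sizeof(Storage) == sizeof(Payload), "exact size");

// The actual requirement: the storage is large enough and suitably aligned.
static_assert(sizeof(Storage) >= sizeof(Payload), "storage too small");
static_assert(alignof(Storage) >= alignof(Payload), "storage misaligned");

// Constructs a Payload in place; valid because of the checks above.
Payload *constructIn(Storage &s) { return new (s.bytes) Payload{1, 2}; }
```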
Vitaly Buka [Tue, 5 Oct 2021 19:14:48 +0000 (12:14 -0700)]
[NFC][sanitizer] Combine MSAN data in single field
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D111118
Jonas Devlieghere [Tue, 5 Oct 2021 19:12:04 +0000 (12:12 -0700)]
[lldb] Improve meta data stripping from JSON crashlogs
JSON crashlogs normally start with a single line of metadata that we
strip unconditionally. Some producers started omitting the metadata,
which tripped up crashlog. Be more resilient by only removing the first
line when we know it really is metadata.
rdar://82641662
Roman Lebedev [Tue, 5 Oct 2021 18:06:30 +0000 (21:06 +0300)]
[NFC] Fixup newly-added costmodel tests to actually test what they should
Valentin Clement [Tue, 5 Oct 2021 18:32:43 +0000 (20:32 +0200)]
[fir] Add external name interop pass
Add the external name conversion pass needed for compiler
interoperability. This pass converts the Flang internal symbol name to
the common gfortran convention.
Clean up old passes without implementation in the Passes.td file so
the project and fir-opt can build correctly.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: schweitz
Differential Revision: https://reviews.llvm.org/D111057
Louis Dionne [Thu, 30 Sep 2021 18:16:39 +0000 (14:16 -0400)]
[libc++abi] Mark __cxa_new_handler with _LIBCPP_SAFE_STATIC
For consistency with the other handlers, and because requiring constant
initialization whenever we can is a good thing.
Differential Revision: https://reviews.llvm.org/D110866
Lei Zhang [Tue, 5 Oct 2021 17:56:20 +0000 (13:56 -0400)]
[mlir][spirv] Add ops and patterns for lowering standard max/min ops
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D111143
peter klausler [Wed, 29 Sep 2021 23:42:22 +0000 (16:42 -0700)]
[flang] Fold MAXLOC and MINLOC
Generalize the code that folds FINDLOC to also handle
folding for MAXLOC and MINLOC.
Differential Revision: https://reviews.llvm.org/D110951
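One way to picture the generalization is a single location-finding routine parameterized by a predicate that decides whether the current element replaces the best one found so far. This is a hypothetical C++ sketch over a 1-D array, not flang's folding code; it follows the Fortran conventions that results are 1-based, the first qualifying element wins, and no match yields 0, and it ignores `DIM`, `MASK`, `BACK`, and NaN handling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the 1-based position chosen by `better`, or 0 if nothing qualifies.
// better(candidate, currentBest, haveBest) says whether the candidate wins.
template <typename T, typename Better>
std::size_t findLocation(const std::vector<T> &xs, Better better) {
  std::size_t loc = 0;
  for (std::size_t i = 0; i < xs.size(); ++i)
    if (better(xs[i], loc ? xs[loc - 1] : xs[i], loc != 0))
      loc = i + 1;
  return loc;
}

template <typename T> std::size_t maxloc(const std::vector<T> &xs) {
  // Strict '>' keeps the first occurrence of the maximum.
  return findLocation(xs, [](const T &c, const T &b, bool have) { return !have || c > b; });
}

template <typename T> std::size_t minloc(const std::vector<T> &xs) {
  return findLocation(xs, [](const T &c, const T &b, bool have) { return !have || c < b; });
}

template <typename T> std::size_t findloc(const std::vector<T> &xs, const T &value) {
  // Only the first match is taken; later matches never replace it.
  return findLocation(xs, [&](const T &c, const T &, bool have) { return !have && c == value; });
}
```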
Philip Reames [Tue, 5 Oct 2021 18:15:35 +0000 (11:15 -0700)]
[SCEV] Tweak the algorithm for figuring out if flags must apply to a SCEV [mostly-NFC]
Behavior-wise, this patch should be mostly NFC. The only known behavior difference is that on the isSCEVExprNeverPoison path we'll consider a bound imposed by the SCEVable operands (if any).
Algorithmically, it's an inversion of the existing code. Previously, for each operand we checked whether we could find a bound, then checked for must-execute given that bound. With the patch, we use dominance to refine the innermost bound, then check must-execute once. The interesting case is when we have multiple unknowns within a single basic block. While both dominance and must-execute are worst-case linear walks within the block, only dominance is cached. As such, refining based on dominance should be more efficient.
River Riddle [Tue, 5 Oct 2021 00:24:24 +0000 (00:24 +0000)]
[mlir:Pass] Generate a reproducer as early as possible
This avoids keeping references to passes that may be freed by
the time that the pass manager has finished executing (in the
non-crash case).
Fixes PR#52069
Differential Revision: https://reviews.llvm.org/D111106
Joe Loser [Tue, 5 Oct 2021 18:08:42 +0000 (14:08 -0400)]
[libc++][test] Use = delete over DELETE_FUNCTION. NFC.
Some tests repeat the definition of the `DELETE_FUNCTION` macro locally.
However, there is no need to guard against the C++03 case, since
Clang supports `= delete;` in C++03 mode. A warning is issued, but
`libc++` tests run with `-Wno-c++11-extensions`, so this isn't an issue.
Since we don't support other compilers in C++03 mode, `= delete;` is
always available for use. As such, replace all uses of `DELETE_FUNCTION`
with `= delete;`.
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D111148
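For illustration, a minimal sketch of the resulting style, using a hypothetical `NonCopyable` test type. The commented-out macro form is an assumed shape of the old local definitions, not a quote from the tests.

```cpp
#include <cassert>
#include <type_traits>

// Old style (assumed shape of the local macro):
//   #define DELETE_FUNCTION = delete   // empty in C++03 mode
//   NonCopyable(const NonCopyable &) DELETE_FUNCTION;
// New style: write the deleted function directly.
struct NonCopyable {
  NonCopyable() {}
  NonCopyable(const NonCopyable &) = delete;
  NonCopyable &operator=(const NonCopyable &) = delete;
};

static_assert(!std::is_copy_constructible<NonCopyable>::value, "copy ctor is deleted");
static_assert(!std::is_copy_assignable<NonCopyable>::value, "copy assignment is deleted");
```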
Rob Suderman [Tue, 5 Oct 2021 17:49:08 +0000 (10:49 -0700)]
[mlir][tosa] tosa.cast support for unsigned integers
Unsigned integers need to be handled for cast to floating point.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D111102
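Why unsigned inputs need their own path: the same bit pattern converts to a different floating-point value depending on whether it is read as signed (LLVM IR's `sitofp`) or unsigned (`uitofp`). A small C++ illustration, independent of the MLIR lowering itself:

```cpp
#include <cassert>
#include <cstdint>

// Read the raw 8-bit pattern as unsigned before converting (uitofp-like).
float castUnsigned(std::uint8_t bits) { return static_cast<float>(bits); }

// Read the same pattern as two's-complement signed (sitofp-like).
// The uint8_t -> int8_t step is implementation-defined before C++20,
// but wraps as two's complement on mainstream compilers.
float castSigned(std::uint8_t bits) {
  return static_cast<float>(static_cast<std::int8_t>(bits));
}
```

For the pattern `0xC8`, the unsigned reading gives 200.0 while the signed reading gives -56.0.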
Amy Huang [Mon, 20 Sep 2021 16:50:49 +0000 (09:50 -0700)]
[Sema] Allow comparisons between different ms ptr size address space types.
We're currently using address spaces to implement __ptr32/__ptr64 attributes;
this patch fixes a bug where clang doesn't allow types with different pointer
size attributes to be compared.
Fixes https://bugs.llvm.org/show_bug.cgi?id=51889
Differential Revision: https://reviews.llvm.org/D110670
Simon Pilgrim [Tue, 5 Oct 2021 17:34:02 +0000 (18:34 +0100)]
[llvm] Update report_fatal_error calls from raw_string_ostream to use Twine(OS.str())
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
We can use the raw_string_ostream::str() method to perform the implicit flush() and return a reference to the std::string container that we can then wrap inside Twine().
David Zarzycki [Tue, 5 Oct 2021 17:39:29 +0000 (13:39 -0400)]
[lldb testing] NFC: run through clang-format
Keith Smiley [Sat, 2 Oct 2021 00:41:56 +0000 (17:41 -0700)]
[lldb] Improve help for platform put-file
Previously it was not clear what arguments this required, or what it would do if you didn't pass the destination argument.
Differential Revision: https://reviews.llvm.org/D110981
Petr Hosek [Tue, 5 Oct 2021 07:26:08 +0000 (00:26 -0700)]
[InstrProfData] Bump the raw profile version to 8
This is to account for the change that made CountersPtr in __profd_
relative, which landed in a1532ed27582038e2d9588108ba0fe8237f01844.
That change didn't update the raw profile version, and while the
profile layout stayed the same, profiles generated by tip-of-tree
LLVM are incompatible with 13.x tooling.
Differential Revision: https://reviews.llvm.org/D111123
Kirill Bobyrev [Tue, 5 Oct 2021 16:44:43 +0000 (18:44 +0200)]
[clangd] Revert unwanted change from D108194
Geoffrey Martin-Noble [Tue, 5 Oct 2021 01:09:02 +0000 (18:09 -0700)]
[MLIR][linalg] Preserve location during elementwise fusion
This otherwise loses a lot of debugging info and results in a painful
debugging experience.
Reviewed By: mravishankar, stellaraccident
Differential Revision: https://reviews.llvm.org/D111107
Alexey Bataev [Tue, 5 Oct 2021 13:33:14 +0000 (06:33 -0700)]
[SLP]Detect reused scalars in all possible gathers for better vectorization cost.
Some initially gathered nodes skipped the check for reused scalars,
which led to a high gather cost. Such nodes can still be represented as
m gathers + a shuffle instead of n gathers, where m < n.
Differential Revision: https://reviews.llvm.org/D111153
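The m-gathers-plus-shuffle idea amounts to deduplication. Gather each of the m distinct scalars once, then use a shuffle mask to map every one of the n lanes back to its unique source. This is a hypothetical, self-contained illustration with scalars modeled as integer IDs. It is not the SLPVectorizer's code.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct GatherPlan {
  std::vector<int> uniqueScalars; // the m values that actually get gathered
  std::vector<int> shuffleMask;   // per lane (n total): index into uniqueScalars
};

GatherPlan planGather(const std::vector<int> &lanes) {
  GatherPlan plan;
  std::unordered_map<int, int> firstIndex; // scalar ID -> slot in uniqueScalars
  for (int scalar : lanes) {
    auto it = firstIndex.find(scalar);
    if (it == firstIndex.end()) {
      // First time we see this scalar: it needs a real gather slot.
      it = firstIndex.emplace(scalar, static_cast<int>(plan.uniqueScalars.size())).first;
      plan.uniqueScalars.push_back(scalar);
    }
    plan.shuffleMask.push_back(it->second);
  }
  return plan;
}
```

With lanes `{5, 7, 5, 7}`, only two scalars are gathered and the mask `{0, 1, 0, 1}` reconstructs the four-lane vector.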
Roman Lebedev [Tue, 5 Oct 2021 16:28:23 +0000 (19:28 +0300)]
[NFC][X86][LV] Add basic costmodel test coverage for not-fully-interleaved i32 loads
The coverage could easily explode combinatorially here,
so I'm adding only the most basic cases
and hoping that's enough; more can be added if needed.
Aart Bik [Mon, 4 Oct 2021 20:13:24 +0000 (13:13 -0700)]
[mlir][sparse] add a "release" operation to sparse tensor dialect
We have several ways to materialize sparse tensors (new and convert) but no explicit operation to release the underlying sparse storage scheme at runtime (other than making an explicit delSparseTensor() library call). To simplify memory management, a sparse_tensor.release operation has been introduced that lowers to the runtime library call while keeping tensors, opaque pointers, and memrefs transparent in the initial IR.
*Note* There is obviously some tension between the concept of immutable tensors and memory management methods. This tension is addressed by simply stating that after the "release" call, no further memref related operations are allowed on the tensor value. We expect the design to evolve over time, however, and arrive at a more satisfactory view of tensors and buffers eventually.
Bug: http://llvm.org/pr52046
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111099
Jonas Paulsson [Tue, 3 Aug 2021 17:49:45 +0000 (19:49 +0200)]
[SystemZ] Implement memcmp of variable length with CLC.
Following the same pattern as memset/memcpy, this patch implements a variable
length memcmp with a CLC loop followed by an EXRL instruction.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D107380
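Roughly, the expansion compares full 256-byte blocks in a loop (a single CLC compares at most 256 bytes) and handles the remaining bytes with one CLC whose length is supplied at run time through EXRL. The C++ below only models that control flow, with `std::memcmp` standing in for CLC. The function name and structure are illustrative assumptions, not the SystemZ backend's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Models the shape of a variable-length memcmp expansion: a loop of full
// 256-byte compares followed by one compare of the remaining bytes.
int blockwiseMemcmp(const void *a, const void *b, std::size_t n) {
  constexpr std::size_t kMaxCLC = 256; // one CLC compares at most 256 bytes
  const unsigned char *p = static_cast<const unsigned char *>(a);
  const unsigned char *q = static_cast<const unsigned char *>(b);
  // Loop part: full blocks, exiting on the first differing block.
  while (n >= kMaxCLC) {
    if (int r = std::memcmp(p, q, kMaxCLC))
      return r;
    p += kMaxCLC;
    q += kMaxCLC;
    n -= kMaxCLC;
  }
  // Tail: in the real expansion the length (minus one) is patched in via EXRL.
  return n ? std::memcmp(p, q, n) : 0;
}
```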
Nikita Popov [Tue, 5 Oct 2021 16:09:09 +0000 (18:09 +0200)]
[APInt] Fix type limits warning (NFC)
An unsigned number is always >= 0.
Matt Beardsley [Tue, 5 Oct 2021 16:09:25 +0000 (18:09 +0200)]
[clang-tidy] Fix add_new_check.py to generate correct list.rst autofix column from relative path
Previously, the code in add_new_check.py that looks for fixit keywords in check source files when generating list.rst assumed that the script would only be called from its own path. That means it doesn't find any source files for the checks it's attempting to scan for, and it defaults to writing out nothing in the "Offers fixes" column for all checks. Other parts of add_new_check.py work from other paths, just not this part.
After this fix, add_new_check.py's "offers fixes" column generation for list.rst will be consistent regardless of what path it's called from by using the caller path that's deduced elsewhere already from sys.argv[0].
Reviewed By: kbobyrev
Differential Revision: https://reviews.llvm.org/D110600
Joe Nash [Mon, 4 Oct 2021 14:29:51 +0000 (10:29 -0400)]
[MacroFusion] Expose useful static methods. NFC.
hasLessThanNumFused and fuseInstructionPair are useful for
DAG mutations similar to MacroFusion, but which cannot use
MacroFusion as a whole (such as fusing non-dependent instructions).
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D111070
Change-Id: I3a5d56aba0471d45ef64cebb9b724030e2eae2f3
Kirill Bobyrev [Tue, 5 Oct 2021 16:08:00 +0000 (18:08 +0200)]
[clangd] IncludeCleaner: Mark used headers
Follow-up on D105426.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D108194
Nikita Popov [Sun, 3 Oct 2021 10:33:59 +0000 (12:33 +0200)]
[ConstantFold] Refactor load folding
This refactors load folding to happen in two cleanly separated
steps: ConstantFoldLoadFromConstPtr() takes a pointer to load from
and decomposes it into a constant initializer base and an offset.
Then ConstantFoldLoadFromConst() loads from that initializer at
the given offset. This makes the core logic independent of having
actual GEP expressions (and those GEP expressions having certain
structure) and will allow exposing ConstantFoldLoadFromConst() as
an independent API in the future.
This is mostly only a refactoring, but it does make the folding
logic slightly more powerful.
Differential Revision: https://reviews.llvm.org/D111023
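The two-step split can be modeled outside LLVM. First, resolve a pointer expression to an (initializer, byte offset) pair. Then read bytes from that initializer at the offset. The sketch below is a hypothetical simplification: globals are raw byte arrays, a pointer is a global name plus a chain of constant offsets standing in for GEPs, and loads are little-endian. The function names only loosely echo the commit message. They are not LLVM's signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// A constant pointer: a global plus chained constant byte offsets (GEP stand-in).
struct ConstPtr {
  std::string global;
  std::vector<std::int64_t> offsets;
};

using Initializers = std::map<std::string, std::vector<std::uint8_t>>;

// Step 2: load `size` bytes (little-endian) from an initializer at `offset`.
std::optional<std::uint64_t> loadFromConst(const std::vector<std::uint8_t> &init,
                                           std::int64_t offset, std::size_t size) {
  if (offset < 0 || size > 8 || static_cast<std::size_t>(offset) + size > init.size())
    return std::nullopt;
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < size; ++i)
    value |= std::uint64_t(init[static_cast<std::size_t>(offset) + i]) << (8 * i);
  return value;
}

// Step 1: decompose the pointer into (initializer, total offset), then defer
// to step 2, which never needs to look at how the offset was expressed.
std::optional<std::uint64_t> loadFromConstPtr(const Initializers &globals,
                                              const ConstPtr &ptr, std::size_t size) {
  auto it = globals.find(ptr.global);
  if (it == globals.end())
    return std::nullopt;
  std::int64_t offset = 0;
  for (std::int64_t o : ptr.offsets)
    offset += o;
  return loadFromConst(it->second, offset, size);
}
```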
Simon Pilgrim [Tue, 5 Oct 2021 16:02:18 +0000 (17:02 +0100)]
[Support] Update SmallVector report_fatal_error calls to use Twine and add missing implicit header dependency.
Simon Pilgrim [Tue, 5 Oct 2021 16:00:13 +0000 (17:00 +0100)]
[TableGen] CodeEmitterGen - emit report_fatal_error(const char*) instead of report_fatal_error(std::string&)
As described on D111049, we're trying to remove the <string> dependency from error handling. In most cases the plan is to use the Twine() variant directly but to reduce introducing additional headers for the generated files, I'm using the const char* variant here instead.
Simon Pilgrim [Tue, 5 Oct 2021 15:29:33 +0000 (16:29 +0100)]
[clang] FatalErrorHandler.cpp - add explicit <stdio.h> include
Required for fprintf/stderr usage in the error handler, noticed while trying to remove the <string> dependency described in D111049
Chris Lattner [Tue, 5 Oct 2021 04:33:51 +0000 (21:33 -0700)]
[APInt] Make insertBits and concat work with zero width APInts.
These should both clearly work with our current model for zero-width
integers, but didn't until now!
Differential Revision: https://reviews.llvm.org/D111113
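In this model a zero-width value is the identity for concat, and inserting zero bits is a no-op. Below is a hypothetical C++ sketch of those semantics for widths up to 64 bits, not LLVM's APInt implementation. The explicit zero-width checks matter because concatenating a zero-width high part onto a 64-bit value would otherwise shift by 64, which is undefined in C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed-width integer (width <= 64); only the low `width` bits matter.
struct Bits {
  unsigned width;
  std::uint64_t value;
};

// concat(hi, lo): hi occupies the upper bits; a zero-width operand adds nothing.
Bits concat(Bits hi, Bits lo) {
  assert(hi.width + lo.width <= 64);
  if (lo.width == 0) return hi;
  if (hi.width == 0) return lo; // avoids `hi.value << 64` when lo is 64 bits wide
  return {hi.width + lo.width, (hi.value << lo.width) | lo.value};
}

// insertBits(dst, sub, pos): overwrite sub.width bits of dst starting at bit pos.
Bits insertBits(Bits dst, Bits sub, unsigned pos) {
  assert(pos + sub.width <= dst.width);
  if (sub.width == 0) return dst; // nothing to insert
  std::uint64_t mask = (sub.width == 64 ? ~0ULL : ((1ULL << sub.width) - 1)) << pos;
  dst.value = (dst.value & ~mask) | ((sub.value << pos) & mask);
  return dst;
}
```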
Utkarsh Saxena [Mon, 4 Oct 2021 06:20:09 +0000 (08:20 +0200)]
[clangd] Include refs of base method in refs for derived method.
Addresses https://github.com/clangd/clangd/issues/881
Include refs of the base class method in refs of the derived class method.
Previously, we reported the base class method's refs only for the declaration
of the derived class method. Ideally, this should work for all usages of the derived class method.
Related patch:
https://github.com/llvm/llvm-project/commit/fbeff2ec2bc6e44b92931207b0063f83ff7a3b3a
Differential Revision: https://reviews.llvm.org/D111039
Kazu Hirata [Tue, 5 Oct 2021 15:29:19 +0000 (08:29 -0700)]
[llvm] Migrate from getNumArgOperands to arg_size (NFC)
Note that getNumArgOperands is considered a legacy name. See
llvm/include/llvm/IR/InstrTypes.h for details.
Amara Emerson [Tue, 5 Oct 2021 15:24:44 +0000 (08:24 -0700)]
Revert "Revert "Revert "[GlobalISel][IRTranslator] Emit trap intrinsic for "unreachable""""
This reverts commit c93bc508ee446d17f9d5d59b48d98aef15f22d52.
Seems to break a different thing now.
Jonas Paulsson [Tue, 20 Jul 2021 18:53:22 +0000 (20:53 +0200)]
[SystemZ] Implement memcpy of variable length with MVC.
Instead of making a memcpy libcall, emit an MVC loop and an EXRL instruction
the same way as is already done for memset 0.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D106874
Peter Waller [Tue, 21 Sep 2021 14:31:09 +0000 (14:31 +0000)]
[AArch64][SVE] Remove redundant PTEST following PNEXT/PFIRST
PNEXT and PFIRST set the NZCV flags, so the subsequent PTEST can be
optimized away in AArch64InstrInfo::optimizePTestInstr.
See-also: https://reviews.llvm.org/D93292
Differential Revision: https://reviews.llvm.org/D110177
Matthew Devereau [Mon, 4 Oct 2021 15:56:56 +0000 (16:56 +0100)]
[AArch64][SVE] Propagate math flags from intrinsics to instructions
Retain floating-point math flags inside instCombineSVEVectorBinOp
David Zarzycki [Tue, 5 Oct 2021 14:21:32 +0000 (10:21 -0400)]
[lldb testing] Avoid subtle terminfo behavioral differences
The original "arbitrary" changes were causing EINVAL on a Fedora 34 box.
Joe Loser [Tue, 5 Oct 2021 14:14:48 +0000 (10:14 -0400)]
[libc++][test] Remove unused macro in is_constructible.pass.cpp. NFC.
The test file defines `LIBCPP11_STATIC_ASSERT` but no longer uses it; it
always uses `static_assert` unconditionally. So, remove the unused
macro.
TN Khanh [Tue, 5 Oct 2021 14:04:34 +0000 (19:34 +0530)]
Add .cmt and .cmti files for OCaml bindings
Build .cmt and .cmti files to enable easier
code navigation in the OCaml bindings.
Roman Lebedev [Tue, 5 Oct 2021 13:28:54 +0000 (16:28 +0300)]
[X86][Costmodel] Load/store i64/f64 Stride=6 VF=8 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/1jfGddcre - for Intel, `Block RThroughput: =36.0`; for Ryzen, `Block RThroughput: =12.0`
So we could pick a cost of `36`.
For store we have:
https://godbolt.org/z/ao9srMT8r - for Intel, `Block RThroughput: =30.0`; for Ryzen, `Block RThroughput: =12.0`
So we could pick a cost of `30`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111094
Roman Lebedev [Tue, 5 Oct 2021 13:28:54 +0000 (16:28 +0300)]
[X86][Costmodel] Load/store i64/f64 Stride=6 VF=4 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/rc8jYxW6M - for Intel, `Block RThroughput: =18.0`; for Ryzen, `Block RThroughput: =6.0`
So we could pick a cost of `18`.
For store we have:
https://godbolt.org/z/9PhPEr65G - for Intel, `Block RThroughput: =15.0`; for Ryzen, `Block RThroughput: =6.0`
So we could pick a cost of `15`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111093
Roman Lebedev [Tue, 5 Oct 2021 13:28:49 +0000 (16:28 +0300)]
[X86][Costmodel] Load/store i64/f64 Stride=6 VF=2 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/onese7rec - for Intel, `Block RThroughput: =6.0`; for Ryzen, `Block RThroughput: =3.0`
So we could pick a cost of `6`.
For store we have:
https://godbolt.org/z/bMd7dddnT - for Intel, `Block RThroughput: =8.0`; for Ryzen, `Block RThroughput: <=6.0`
So we could pick a cost of `8`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111092
Roman Lebedev [Tue, 5 Oct 2021 13:28:04 +0000 (16:28 +0300)]
[X86][Costmodel] Load/store i32/f32 Stride=6 VF=16 interleaving costs
This one required quite a bit of assembly surgery, but I think it's in the right ballpark.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/na97Kb96o - for Intel, `Block RThroughput: <=64.0`; for Ryzen, `Block RThroughput: <=32.0`
So we could pick a cost of `64`.
For store we have:
https://godbolt.org/z/GG1WeoKar - for Intel, `Block RThroughput: =66.0`; for Ryzen, `Block RThroughput: <=27.5`
So we could pick a cost of `66`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111091
Roman Lebedev [Tue, 5 Oct 2021 13:28:03 +0000 (16:28 +0300)]
[X86][Costmodel] Load/store i32/f32 Stride=6 VF=8 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/jK85GWKaK - for Intel, `Block RThroughput: =31.0`; for Ryzen, `Block RThroughput: <=17.0`
So we could pick a cost of `31`.
For store we have:
https://godbolt.org/z/hPWWhEEf9 - for Intel, `Block RThroughput: =33.0`; for Ryzen, `Block RThroughput: <=13.8`
So we could pick a cost of `33`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111089
Roman Lebedev [Tue, 5 Oct 2021 13:27:58 +0000 (16:27 +0300)]
[X86][Costmodel] Load/store i32/f32 Stride=6 VF=4 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/szEj1ceee - for Intel, `Block RThroughput: =15.0`; for Ryzen, `Block RThroughput: <=8.8`
So we could pick a cost of `15`.
For store we have:
https://godbolt.org/z/81bq4fTo1 - for Intel, `Block RThroughput: =12.0`; for Ryzen, `Block RThroughput: <=10.0`
So we could pick a cost of `12`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111087
Roman Lebedev [Tue, 5 Oct 2021 13:27:53 +0000 (16:27 +0300)]
[X86][Costmodel] Load/store i32/f32 Stride=6 VF=2 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/aec96Thee - for Intel, `Block RThroughput: =6.0`; for Ryzen, `Block RThroughput: <=3.3`
So we could pick a cost of `6`.
For store we have:
https://godbolt.org/z/aec96Thee - for Intel, `Block RThroughput: =9.0`; for Ryzen, `Block RThroughput: <=3.0`
So we could pick a cost of `9`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111083
Roman Lebedev [Mon, 4 Oct 2021 19:50:11 +0000 (22:50 +0300)]
[X86][Costmodel] Load/store i64/f64 Stride=4 VF=8 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/3M3hbq7n8 - for Intel, `Block RThroughput: =20.0`; for Ryzen, `Block RThroughput: =8.0`
So we could pick a cost of `20`.
For store we have:
https://godbolt.org/z/zvnPYWTx7 - for Intel, `Block RThroughput: =20.0`; for Ryzen, `Block RThroughput: =8.0`
So we could pick a cost of `20`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111076
Roman Lebedev [Mon, 4 Oct 2021 19:50:11 +0000 (22:50 +0300)]
[X86][Costmodel] Load/store i64/f64 Stride=4 VF=4 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/MTKdzjvnr - for Intel, `Block RThroughput: =8.0`; for Ryzen, `Block RThroughput: <=4.0`
So we could pick a cost of `8`.
For store we have:
https://godbolt.org/z/cMYEvqoah - for Intel, `Block RThroughput: =8.0`; for Ryzen, `Block RThroughput: <=4.0`
So we could pick a cost of `8`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111075
Roman Lebedev [Mon, 4 Oct 2021 19:50:06 +0000 (22:50 +0300)]
[X86][Costmodel] Load/store i64/f64 Stride=4 VF=2 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/z197317d1 - for Intel, `Block RThroughput: =6.0`; for Ryzen, `Block RThroughput: =2.0`
So we could pick a cost of `6`.
For store we have:
https://godbolt.org/z/8dzszjf9q - for Intel, `Block RThroughput: =6.0`; for Ryzen, `Block RThroughput: <=4.0`
So we could pick a cost of `6`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111073
Roman Lebedev [Mon, 4 Oct 2021 16:30:14 +0000 (19:30 +0300)]
[X86][Costmodel] Load/store i32/f32 Stride=4 VF=16 interleaving costs
This one required quite a bit of assembly surgery, but the trend continues, so I think this is right.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/EKWdj8cKT - for Intel, `Block RThroughput: <=32.0`; for Ryzen, `Block RThroughput: <=24.0`
So we could pick a cost of `32`.
For store we have:
https://godbolt.org/z/zj4bb9P75 - for Intel, `Block RThroughput: =32.0`; for Ryzen, `Block RThroughput: <=16.0`
So we could pick a cost of `32`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111064
Roman Lebedev [Mon, 4 Oct 2021 16:30:14 +0000 (19:30 +0300)]
[X86][Costmodel] Load/store i32/f32 Stride=4 VF=8 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/a6rxMG6ec - for Intel, `Block RThroughput: =16.0`; for Ryzen, `Block RThroughput: <=12.0`
So we could pick a cost of `16`.
For store we have:
https://godbolt.org/z/ced1bdqc9 - for Intel, `Block RThroughput: =16.0`; for Ryzen, `Block RThroughput: <=8.0`
So we could pick a cost of `16`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111063
Roman Lebedev [Mon, 4 Oct 2021 16:30:07 +0000 (19:30 +0300)]
[X86][Costmodel] Load/store i32/f32 Stride=4 VF=4 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/avq1oz98W - for Intel, `Block RThroughput: =8.0`; for Ryzen, `Block RThroughput: =4.0`
So we could pick a cost of `8`.
For store we have:
https://godbolt.org/z/89PGMc1qs - for Intel, `Block RThroughput: =6.0`; for Ryzen, `Block RThroughput: <=6.0`
So we could pick a cost of `6`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111061
Roman Lebedev [Mon, 4 Oct 2021 16:30:00 +0000 (19:30 +0300)]
[X86][Costmodel] Load/store i32/f32 Stride=4 VF=2 interleaving costs
Finally, we are getting to the heavy-hitter stuff!
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/7crGWoar6 - for Intel, `Block RThroughput: =4.0`; for Ryzen, `Block RThroughput: <=2.0`
So we could pick a cost of `4`.
For store we have:
https://godbolt.org/z/T8aq3MszM - for Intel, `Block RThroughput: =5.0`; for Ryzen, `Block RThroughput: <=2.0`
So we could pick a cost of `5`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111060
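Taken together, the interleaving-cost commits in this stretch amount to a lookup keyed by element type, stride, VF, and load/store. Below is a hypothetical C++ sketch of such a table, filled with the costs quoted in those messages. The real X86 cost-model tables are organized differently, so this layout is an assumption. In every message the chosen cost equals the larger (Intel) llvm-mca `Block RThroughput`. `pickCost` encodes that as "take the maximum, rounded up", which is a reading of the messages rather than a documented policy.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <optional>
#include <vector>

enum class Elt { I32, I64 }; // i32/f32 and i64/f64 share costs
enum class Op { Load, Store };

struct CostEntry {
  Elt elt;
  unsigned stride, vf;
  Op op;
  unsigned cost;
};

// Costs quoted in the commit messages (AVX2 without AVX512).
const std::vector<CostEntry> kInterleaveCosts = {
    {Elt::I32, 4, 2, Op::Load, 4},    {Elt::I32, 4, 2, Op::Store, 5},
    {Elt::I32, 4, 4, Op::Load, 8},    {Elt::I32, 4, 4, Op::Store, 6},
    {Elt::I32, 4, 8, Op::Load, 16},   {Elt::I32, 4, 8, Op::Store, 16},
    {Elt::I32, 4, 16, Op::Load, 32},  {Elt::I32, 4, 16, Op::Store, 32},
    {Elt::I64, 4, 2, Op::Load, 6},    {Elt::I64, 4, 2, Op::Store, 6},
    {Elt::I64, 4, 4, Op::Load, 8},    {Elt::I64, 4, 4, Op::Store, 8},
    {Elt::I64, 4, 8, Op::Load, 20},   {Elt::I64, 4, 8, Op::Store, 20},
    {Elt::I32, 6, 2, Op::Load, 6},    {Elt::I32, 6, 2, Op::Store, 9},
    {Elt::I32, 6, 4, Op::Load, 15},   {Elt::I32, 6, 4, Op::Store, 12},
    {Elt::I32, 6, 8, Op::Load, 31},   {Elt::I32, 6, 8, Op::Store, 33},
    {Elt::I32, 6, 16, Op::Load, 64},  {Elt::I32, 6, 16, Op::Store, 66},
    {Elt::I64, 6, 2, Op::Load, 6},    {Elt::I64, 6, 2, Op::Store, 8},
    {Elt::I64, 6, 4, Op::Load, 18},   {Elt::I64, 6, 4, Op::Store, 15},
    {Elt::I64, 6, 8, Op::Load, 36},   {Elt::I64, 6, 8, Op::Store, 30},
};

// Returns the cost for a configuration, or nullopt if it is not in the table.
std::optional<unsigned> lookupCost(Elt elt, unsigned stride, unsigned vf, Op op) {
  auto it = std::find_if(kInterleaveCosts.begin(), kInterleaveCosts.end(),
                         [&](const CostEntry &e) {
                           return e.elt == elt && e.stride == stride && e.vf == vf && e.op == op;
                         });
  if (it == kInterleaveCosts.end())
    return std::nullopt;
  return it->cost;
}

// Selection rule inferred from the messages: the worse of the two measurements.
unsigned pickCost(double intelRThroughput, double ryzenRThroughput) {
  return static_cast<unsigned>(std::ceil(std::max(intelRThroughput, ryzenRThroughput)));
}
```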
kpyzhov [Tue, 5 Oct 2021 13:56:04 +0000 (09:56 -0400)]
[AMDGPU] Use "hostcall" module flag instead of searching for ockl_hostcall_internal() declaration.
Detecting hostcalls by looking for the "ockl_hostcall_internal()" function in the module is not reliable enough. LTO may rename the "ockl_hostcall_internal()" function when an application is compiled with "-fgpu-rdc", causing the MetadataStreamer pass to miss the hostcalls, so it does not set the "hidden_hostcall_buffer" kernel argument.
This change adds a new module flag, "hostcall", that can be used to detect whether GPU functions use host calls for printf.
Differential revision: https://reviews.llvm.org/D110337
Hsiangkai Wang [Tue, 5 Oct 2021 06:20:36 +0000 (14:20 +0800)]
[RISCV] Update to vlm.v and vsm.v according to v1.0-rc1.
vle1.v -> vlm.v
vse1.v -> vsm.v
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106044
Lei Zhang [Tue, 5 Oct 2021 13:32:35 +0000 (09:32 -0400)]
[mlir] Add a 'cppNamespace' field to availability
This allows us to generate interfaces in a namespace,
following other TableGen'erated code.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D108311
Dmitry Vyukov [Tue, 5 Oct 2021 13:26:02 +0000 (15:26 +0200)]
tsan: improve detection of stack/tls races
Print meaningful stack frames for stack/tls races
(instead of PC 1/2 that don't symbolize).
Imitate stack/tls writes after we create and initialize
the new thread, otherwise the races are not detected.
This is re-submit of the following reverted commits,
but without tests as they failed on a number of OSes/arches:
"tsan: fix and test detection of TLS races"
"tsan: fix tls_race3 test on darwin"
"tsan: print a meaningful frame for stack races"
Differential Revision: https://reviews.llvm.org/D111147
Lei Zhang [Tue, 5 Oct 2021 13:31:49 +0000 (09:31 -0400)]
[mlir][spirv] Fix path in define_enum.sh script
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D108310
Jay Foad [Fri, 1 Oct 2021 14:39:38 +0000 (15:39 +0100)]
[PHIElimination] Update LiveVariables after handling an unspillable terminator
Update the LiveVariables analysis after the special handling for
unspillable terminators which was added in D91358. This is just enough
to fix some "Block should not be in AliveBlocks" / "Block missing from
AliveBlocks" errors in the codegen test suite when machine verification
is forced to run after PHIElimination (currently it is disabled).
Differential Revision: https://reviews.llvm.org/D110939
Dmitry Vyukov [Fri, 24 Sep 2021 04:57:37 +0000 (06:57 +0200)]
tsan: make cur_thread_init return cur_thread
Whenever we call cur_thread_init, we call cur_thread on the next line.
So make cur_thread_init return the current thread directly.
Makes code a bit shorter, does not affect codegen.
Reviewed By: vitalybuka, melver
Differential Revision: https://reviews.llvm.org/D110384
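The calling-pattern change can be illustrated with a simplified, hypothetical thread-local state. These are not tsan's real definitions, since tsan's per-thread state handling is platform-specific.

```cpp
#include <cassert>

// Hypothetical per-thread state standing in for tsan's ThreadState.
struct ThreadState {
  bool initialized = false;
  int id = 0;
};

thread_local ThreadState tls_state;

ThreadState *cur_thread() { return &tls_state; }

// Before: `void cur_thread_init();` and callers wrote
//   cur_thread_init();
//   ThreadState *thr = cur_thread();
// After: initialization hands back the current thread directly.
ThreadState *cur_thread_init() {
  ThreadState *thr = cur_thread();
  if (!thr->initialized) {
    thr->initialized = true;
    thr->id = 1; // placeholder for the real initialization work
  }
  return thr;
}
```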
Vassil Vassilev [Tue, 5 Oct 2021 06:11:42 +0000 (06:11 +0000)]
Reland "[clang-repl] Allow loading of plugins in clang-repl."
Differential revision: https://reviews.llvm.org/D110484
Jeremy Morse [Tue, 5 Oct 2021 12:44:40 +0000 (13:44 +0100)]
[DebugInfo][InstrRef] Track all of DBG_PHIs operands
An important part of the instruction referencing solution is that we
identify all the registers that values move between before we then compute
an SSA-like function from the machine code, and from the variable
intrinsics. DBG_PHIs weren't causing all the subregisters of their operands
to be tracked; this patch forces that to happen.
The practical implication was that not enough space was allocated for
storing values when analysing the function -- asan will crash on the
attached test case with an unpatched compiler. Non-asan builds of llc produce
a DBG_VALUE $noreg where it should be $dil.
Differential Revision: https://reviews.llvm.org/D109064
Kamau Bridgeman [Thu, 30 Sep 2021 16:36:54 +0000 (11:36 -0500)]
[PowerPC][MMA] Allow MMA builtin types in pre-P10 compilation units
This patch allows the use of __vector_quad and __vector_pair, PPC MMA builtin
types, on all PowerPC 64-bit compilation units. When these types are
made available, the builtins that use them automatically become available,
so semantic checking for MMA and paired vector memop __builtins is also
expanded to ensure these builtin function calls are only allowed on
Power10 and newer architectures. All related test cases are updated to
ensure test coverage.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D109599
Tobias Gysi [Tue, 5 Oct 2021 12:24:19 +0000 (12:24 +0000)]
[mlir][linalg] Move generalization pattern to Transforms (NFC).
Move the generalization pattern to the other Linalg transforms to make it available to the codegen strategy.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110728
Raphael Isemann [Tue, 5 Oct 2021 12:27:54 +0000 (14:27 +0200)]
[lldb][NFC] Remove unnecessary include in cpp/const_this test
Aaron Ballman [Tue, 5 Oct 2021 12:21:29 +0000 (08:21 -0400)]
consteval if is now fully supported
This amends 424733c12aacc227a28114deba72061153f8dff2, which accidentally
dropped the change to the status page.
Corentin Jabot [Tue, 5 Oct 2021 12:02:53 +0000 (08:02 -0400)]
Implement if consteval (P1938)
Modify the IfStmt node to support constant evaluated expressions.
Add a new ExpressionEvaluationContext::ImmediateFunctionContext to
keep track of immediate function contexts.
This proved easier/better/probably more efficient than walking the AST
backward as it allows diagnosing nested if consteval statements.
Valentin Clement [Tue, 5 Oct 2021 12:01:17 +0000 (14:01 +0200)]
[fir] Split FIROptimizer lib into several smaller libraries
Partition libFIROptimizer into smaller libraries that reflect the code
structure, and adapt the code to any problems the split exposes.
This patch is part of the upstreaming effort from the fir-dev branch. It is a
building block for upstreaming transformations.
Reviewed By: schweitz
Differential Revision: https://reviews.llvm.org/D111055
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Sjoerd Meijer [Tue, 5 Oct 2021 11:12:39 +0000 (12:12 +0100)]
[SCCPSolver] Fix use-after-free in markArgInFuncSpecialization
In SCCPSolver::markArgInFuncSpecialization, the ValueState map may be
reallocated *after* the initial ValueLatticeElement reference is grabbed, but
*before* its use in copy initialization. This causes a use-after-free. To fix
this, this commit changes the behavior to create the new ValueLatticeElement
before assigning the old one to it.
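A minimal sketch of the hazard and of the fixed ordering. It uses a hypothetical `FlatMap` backed by `std::vector`, which, like `llvm::DenseMap`, may reallocate on insertion and invalidate references; it is an illustration, not the SCCPSolver code, and it assumes the old key is already present in the map.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical flat map stored in a std::vector. Like llvm::DenseMap,
// inserting a new key may reallocate and invalidate existing references.
struct FlatMap {
  std::vector<std::pair<int, int>> Entries;

  // Non-inserting lookup: returns nullptr if Key is absent.
  int *lookup(int Key) {
    for (auto &E : Entries)
      if (E.first == Key)
        return &E.second;
    return nullptr;
  }

  // Inserting lookup, mimicking DenseMap::operator[].
  int &operator[](int Key) {
    if (int *V = lookup(Key))
      return *V;
    Entries.emplace_back(Key, 0); // may reallocate Entries
    return Entries.back().second;
  }
};

// Unsafe ordering (for contrast, not called):
//   const int &Old = Map[OldKey];
//   Map[NewKey] = Old; // inserting NewKey may reallocate, leaving Old dangling

// Safe ordering: create the new entry first, so any reallocation happens
// before a reference to the old entry is taken.
void copyEntry(FlatMap &Map, int OldKey, int NewKey) {
  int &NewValue = Map[NewKey];
  int *OldValue = Map.lookup(OldKey); // does not insert: NewValue stays valid
  assert(OldValue && "OldKey must already be present");
  NewValue = *OldValue;
}
```

Chaining many copies forces repeated reallocations, which is where the unsafe ordering would read freed memory.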
Patch by: https://github.com/duck-37/
Differential Revision: https://reviews.llvm.org/D111112
Mirko Brkusanin [Tue, 5 Oct 2021 11:43:39 +0000 (13:43 +0200)]
[GlobalISel] Combine fabs(fneg(x)) to fabs(x)
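The fold is sound because, in IEEE 754 arithmetic, fneg only flips the sign bit and fabs only clears it, so the inner negation cannot affect the result. A small C++ check of that identity using standard library calls (an illustration, not the GlobalISel combiner) compares bit patterns so that -0.0 is handled exactly:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Bit pattern of a float, for exact comparisons (including -0.0).
std::uint32_t bits(float F) {
  std::uint32_t B;
  std::memcpy(&B, &F, sizeof B);
  return B;
}

// fabs(-x) and fabs(x) should have identical bits for every input.
bool foldHolds(float X) { return bits(std::fabs(-X)) == bits(std::fabs(X)); }
```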
Differential Revision: https://reviews.llvm.org/D110943
Nicolas Vasilache [Tue, 5 Oct 2021 10:06:50 +0000 (10:06 +0000)]
[mlir][Linalg] Allow operand-less scf::ExecuteRegionOp to encapsulate scf::YieldOp
These are considered noops.
Bufferization will still fail on scf.execute_region ops that yield values.
This is used to make comprehensive bufferization interoperate better with external clients.
Differential Revision: https://reviews.llvm.org/D111130
Aaron Ballman [Tue, 5 Oct 2021 11:13:00 +0000 (07:13 -0400)]
Silence an implicit conversion warning on the bit shift result in MSVC; NFC
gbhyamso [Tue, 5 Oct 2021 11:04:01 +0000 (12:04 +0100)]
[llvm-cxxfilt][NFC] Fix test for running in Windows cmd
The test llvm\test\tools\llvm-cxxfilt\delimiters.test started failing when run
from cmd.exe on Windows after D110986, which added a Unicode character (⦙) to it.
Piping the unicode character in cmd.exe causes it to be converted to a '?'.
That causes the test to fail because the llvm-cxxfilt output becomes Foo?Bar
rather than the expected Foo⦙Bar.
Redirect the echo output to and from a temporary file to get around this
problem.
It's not entirely clear what the root cause is, but two separate downstream
builders are tripping up on this, so we are landing the work around for the
time being.
Differential Revision: https://reviews.llvm.org/D111072
Balázs Kéri [Tue, 5 Oct 2021 10:24:34 +0000 (12:24 +0200)]
[clang][ASTImporter] Add import of thread safety attributes.
Attributes from the "C/C++ Thread safety attributes" section in Attr.td
are added to ASTImporter. The attributes from this section that are not
added here do not need special import handling.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D110528
Max Kazantsev [Tue, 5 Oct 2021 10:49:03 +0000 (17:49 +0700)]
[Test] Add test showing profitable peeling opportunity
Patch by Dmitry Makogon!
LLVM GN Syncbot [Tue, 5 Oct 2021 10:41:33 +0000 (10:41 +0000)]
[gn build] Port 214054f78a4e
Michał Górny [Fri, 1 Oct 2021 15:21:45 +0000 (17:21 +0200)]
[lldb] Move DynamicRegisterInfo to public Target library
Move DynamicRegisterInfo from the internal lldbPluginProcessUtility
library to the public lldbTarget library. This is a prerequisite
towards ABI plugin changes that are going to pass DynamicRegisterInfo
parameters.
Differential Revision: https://reviews.llvm.org/D110942
Andrew Ng [Mon, 4 Oct 2021 16:09:18 +0000 (17:09 +0100)]
[ELF][test] Enhance relative dynamic relocation tests
Add checking of the value of the relocation with an addend. Also check
all relocation offsets.
Differential Revision: https://reviews.llvm.org/D111071
Bjorn Pettersson [Mon, 20 Sep 2021 09:32:01 +0000 (11:32 +0200)]
[SelectionDAG] Replace error prone index check in BaseIndexOffset::computeAliasing
Deriving NoAlias based on having the same index in two BaseIndexOffset
expressions seemed weird (and, as shown in the added unittest, the
correctness of doing so depended on undocumented preconditions that
the user of BaseIndexOffset::computeAliasing would need to take care
of).
This patch removes the code that derived NoAlias based on the indices
being the same. As compensation, to avoid regressions/diffs in
various lit tests, we also add a new check. The new check derives
NoAlias when the two base pointers are based on two different
GlobalValues (neither of them being a GlobalAlias).
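A simplified model of the global-vs-global rule described above. The `BaseKind`, `BasePtr` and `AliasResult` types are hypothetical and stand in for the real SelectionDAG classes; only this one rule is modeled, and any case it cannot decide is reported as unknown.

```cpp
// Hypothetical, simplified classification of a base pointer.
enum class BaseKind { GlobalObject, GlobalAlias, Other };

struct BasePtr {
  BaseKind Kind;
  int GlobalId; // distinct globals have distinct ids (unused for Other)
};

enum class AliasResult { NoAlias, Unknown };

// Models only the rule for two global base pointers.
AliasResult compareGlobalBases(const BasePtr &A, const BasePtr &B) {
  // The rule only applies when both bases are globals.
  if (A.Kind == BaseKind::Other || B.Kind == BaseKind::Other)
    return AliasResult::Unknown;
  // A GlobalAlias may refer to the same storage as another global,
  // so no conclusion can be drawn.
  if (A.Kind == BaseKind::GlobalAlias || B.Kind == BaseKind::GlobalAlias)
    return AliasResult::Unknown;
  // Two different non-alias globals occupy disjoint storage.
  if (A.GlobalId != B.GlobalId)
    return AliasResult::NoAlias;
  // Same global: the offsets would have to be compared (not modeled here).
  return AliasResult::Unknown;
}
```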
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D110256
Bjorn Pettersson [Mon, 20 Sep 2021 08:29:11 +0000 (10:29 +0200)]
[SelectionDAG] Assume that a GlobalAlias may alias other global values
This fixes a bug detected in DAGCombiner when using global alias
variables. Here is an example:
@foo = global i16 0, align 1
@aliasFoo = alias i16, i16 * @foo
define i16 @bar() {
...
store i16 7, i16 * @foo, align 1
store i16 8, i16 * @aliasFoo, align 1
...
}
BaseIndexOffset::computeAliasing would incorrectly derive NoAlias
for the two accesses in the example above, resulting in DAGCombiner
miscompiles.
This patch fixes the problem with a defensive approach: when comparing
two global values where at least one is a GlobalAlias,
BaseIndexOffset::computeAliasing now returns false, i.e. the aliasing
could not be determined. In the future we might improve this with a
deeper analysis that looks at the aliasee of the GlobalAlias, etc. But
that is a bit more complicated considering that we could have
'local_unnamed_addr' and situations with several 'alias' variables.
Fixes PR51878.
Differential Revision: https://reviews.llvm.org/D110064
Adrian Kuegel [Tue, 5 Oct 2021 09:10:42 +0000 (11:10 +0200)]
[mlir] Convert ConstShapeOp to a static tensor type.
ConstShapeOp knows its shape, so it should also have a static tensor type.
Differential Revision: https://reviews.llvm.org/D111127
Jay Foad [Fri, 1 Oct 2021 12:30:42 +0000 (13:30 +0100)]
[AMDGPU][GlobalISel] Fix legalization of G_UMULH
Scalarize before narrowing because the narrowing implementation does not
work on vectors. This matches what we do for regular G_MUL.
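G_UMULH produces the high half of an unsigned multiply. The following plain C++ sketch illustrates the two legalization steps in the order described: first scalarize (split the vector into lanes), then narrow each 64-bit scalar into 32-bit pieces. It illustrates the semantics only and is not the AMDGPU legalizer code; the function names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Narrowing step on one scalar: high 64 bits of the 128-bit product A*B,
// computed using only 32x32->64 multiplies.
std::uint64_t umulh64(std::uint64_t A, std::uint64_t B) {
  const std::uint64_t Mask = 0xffffffffu;
  std::uint64_t ALo = A & Mask, AHi = A >> 32;
  std::uint64_t BLo = B & Mask, BHi = B >> 32;

  std::uint64_t LoLo = ALo * BLo;
  std::uint64_t HiLo = AHi * BLo;
  std::uint64_t LoHi = ALo * BHi;
  std::uint64_t HiHi = AHi * BHi;

  // Sum of the middle column; at most 2^64 - 1, so it cannot overflow.
  std::uint64_t Cross = (LoLo >> 32) + (HiLo & Mask) + LoHi;
  return HiHi + (HiLo >> 32) + (Cross >> 32);
}

// Scalarizing step: apply the narrowed scalar operation to each lane.
template <std::size_t N>
std::array<std::uint64_t, N> umulhVector(const std::array<std::uint64_t, N> &A,
                                         const std::array<std::uint64_t, N> &B) {
  std::array<std::uint64_t, N> R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = umulh64(A[I], B[I]);
  return R;
}
```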
Differential Revision: https://reviews.llvm.org/D111129
Simon Pilgrim [Tue, 5 Oct 2021 09:51:28 +0000 (10:51 +0100)]
[Support] Change fatal_error_handler_t to take a const char* instead of std::string
https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html
Excessive use of the <string> header has a massive impact on compile time; it's most commonly included via the ErrorHandling.h header, which has to be included in many key headers, impacting many source files that have no need for std::string.
As an initial step toward removing the <string> include from ErrorHandling.h, this patch proposes to update the fatal_error_handler_t handler to just take a raw const char* instead.
The next step will be to remove the report_fatal_error std::string variant, which will involve a lot of cleanup and better use of Twine/StringRef.
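A standalone sketch of a handler with a raw `const char*` reason. It mirrors, rather than includes, the LLVM type, and assumes the signature `void (*)(void *user_data, const char *reason, bool gen_crash_diag)`. `recordingHandler` and `reportFatal` are hypothetical names. The point it shows is that a handler may still use std::string internally, while the declaring header no longer has to.

```cpp
#include <cassert>
#include <string>

// Standalone mirror of the handler type (assumed shape, not LLVM's header).
using fatal_error_handler_t = void (*)(void *user_data, const char *reason,
                                       bool gen_crash_diag);

// Example handler: user_data carries a caller-owned std::string, so only
// this translation unit needs <string>.
void recordingHandler(void *user_data, const char *reason,
                      bool /*gen_crash_diag*/) {
  assert(user_data && "expected a std::string to record into");
  static_cast<std::string *>(user_data)->assign(reason);
}

// Hypothetical caller: passes a C string straight through, no std::string.
void reportFatal(fatal_error_handler_t Handler, void *UserData,
                 const char *Reason) {
  Handler(UserData, Reason, /*gen_crash_diag=*/true);
}
```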
Differential Revision: https://reviews.llvm.org/D111049
Jay Foad [Tue, 5 Oct 2021 09:47:54 +0000 (10:47 +0100)]
[GlobalISel] Simplify narrowScalarMul. NFC.
Remove some redundancy because the source and result types of any
multiply are always the same.
David Green [Tue, 5 Oct 2021 09:51:18 +0000 (10:51 +0100)]
[ARM] Reset speculation-hardening-sls.ll test checks.
The commit e497b12a69604b6d691312a30f6b86da4f18f7f8 went and regenerated
all the check lines in the Arm speculation-hardening-sls.ll test in a
way that removed most of the important checks. This just resets them
back to how they were before, with the single character fix to change:
; NOHARDENARM: {{bxge lr$}}
to
; NOHARDENARM: {{bxgt lr$}}
Differential Revision: https://reviews.llvm.org/D111074
Frederik Gossen [Mon, 4 Oct 2021 14:42:24 +0000 (16:42 +0200)]
[MLIR] Add an option to disable `maxIterations` in greedy pattern rewrites
This option is needed for passes that are known to reach a fixed point, but may
need many iterations depending on the size of the input IR.
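A generic sketch of a greedy fixed-point loop with an optional iteration cap. It uses a hypothetical `applyUntilFixedPoint` function and expresses "no limit" with `std::optional`. It is not MLIR's GreedyRewriteConfig API, only the control-flow idea behind disabling `maxIterations`.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Applies Step until it reports no change (a fixed point) or, if
// MaxIterations is set, until that many rounds have run.
// Returns true if a fixed point was reached.
bool applyUntilFixedPoint(const std::function<bool()> &Step,
                          std::optional<int> MaxIterations) {
  assert((!MaxIterations || *MaxIterations > 0) && "cap must be positive");
  for (int I = 0; !MaxIterations || I < *MaxIterations; ++I)
    if (!Step())
      return true; // nothing changed: converged
  return false;    // gave up after MaxIterations rounds
}
```

With a cap, a large input can stop short of convergence. Passing `std::nullopt` lets a pass that is known to terminate run as long as it needs.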
Differential Revision: https://reviews.llvm.org/D111058
David Green [Tue, 5 Oct 2021 09:32:30 +0000 (10:32 +0100)]
[AArch64] Make speculation-hardening-sls.ll x16 test more robust
As suggested in D110830, this copies the Arm backend method of testing
function calls through specific registers, using inline assembly to
force the variable into x16 to check that the __llvm_slsblr_thunk calls
do not use a register that may be clobbered by the linker.
Differential Revision: https://reviews.llvm.org/D111056
Tim Northover [Thu, 23 Sep 2021 10:31:03 +0000 (11:31 +0100)]
AArch64+GISel: legalize vector remainder operations.