platform/upstream/llvm.git
2 years ago[ARM] Regenerate sxt_rot.ll tests
Simon Pilgrim [Sun, 21 Nov 2021 18:33:05 +0000 (18:33 +0000)]
[ARM] Regenerate sxt_rot.ll tests

2 years ago[Thumb2] Regenerate ext + rot tests
Simon Pilgrim [Sun, 21 Nov 2021 18:32:10 +0000 (18:32 +0000)]
[Thumb2] Regenerate ext + rot tests

2 years ago[PowerPC] Regenerate rlwinm2.ll test
Simon Pilgrim [Sun, 21 Nov 2021 18:30:58 +0000 (18:30 +0000)]
[PowerPC] Regenerate rlwinm2.ll test

2 years agoAdd a best practice section on how to configure a fast builder
Philip Reames [Sun, 21 Nov 2021 16:00:34 +0000 (08:00 -0800)]
Add a best practice section on how to configure a fast builder

This is based on conversations with a couple of folks currently running buildbots. There are a couple of pieces that didn't make it in, but this tries to cover the common themes.

Differential Revision: https://reviews.llvm.org/D114325

2 years ago[MLIR][NFC] Simplex::restoreRow: improve documentation
Arjun P [Sun, 21 Nov 2021 13:53:15 +0000 (19:23 +0530)]
[MLIR][NFC] Simplex::restoreRow: improve documentation

2 years ago[ARM][ParallelDSP] Regenerate complex_dot_prod.ll test
Simon Pilgrim [Sun, 21 Nov 2021 12:01:44 +0000 (12:01 +0000)]
[ARM][ParallelDSP] Regenerate complex_dot_prod.ll test

2 years ago[AArch64] Extra testing for sinking splats to various instructions. NFC
David Green [Sun, 21 Nov 2021 11:46:34 +0000 (11:46 +0000)]
[AArch64] Extra testing for sinking splats to various instructions. NFC

2 years ago[ELF] Move getOutputSectionName from Writer.cpp to LinkerScript.cpp. NFC
Fangrui Song [Sun, 21 Nov 2021 06:18:09 +0000 (22:18 -0800)]
[ELF] Move getOutputSectionName from Writer.cpp to LinkerScript.cpp. NFC

and internalize it.

2 years ago[llvm] Use range-based for loops (NFC)
Kazu Hirata [Sun, 21 Nov 2021 02:42:10 +0000 (18:42 -0800)]
[llvm] Use range-based for loops (NFC)

2 years ago[X86][FP16] Relax the pattern condition for VZEXT_MOVL to match more cases
Phoebe Wang [Sun, 21 Nov 2021 01:12:46 +0000 (09:12 +0800)]
[X86][FP16] Relax the pattern condition for VZEXT_MOVL to match more cases

Fixes pr52560

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114313

2 years ago[libc++][NFC] Fix typo in ranges::iterator_t synopsis
Joe Loser [Sun, 21 Nov 2021 00:15:00 +0000 (19:15 -0500)]
[libc++][NFC] Fix typo in ranges::iterator_t synopsis

The `iterator_t` alias template is on `T` not a `R` like the other
neighboring alias templates. Fix the typo.

2 years ago[libc++] [doc] Mark some spaceship-related LWG issues as "Complete."
Arthur O'Dwyer [Thu, 14 Oct 2021 20:49:58 +0000 (16:49 -0400)]
[libc++] [doc] Mark some spaceship-related LWG issues as "Complete."

LWG3330 has been "Completed" since D99309, which was in the 13.x timeframe.
Reviewed as part of D110738.

2 years ago[NFC][X86][Costmodel] Actually test +prefer-256-bit in replication-shuffle-related...
Roman Lebedev [Sat, 20 Nov 2021 22:11:05 +0000 (01:11 +0300)]
[NFC][X86][Costmodel] Actually test +prefer-256-bit in replication-shuffle-related tests :(

While -prefer-256-bit indeed becomes complete with D114314,
the real-world (the one with +prefer-256-bit) coverage is lacking.

Hilarious.

2 years ago[DSE] Drop hasAnalyzableMemoryWrite() (NFCI)
Nikita Popov [Sat, 20 Nov 2021 22:17:41 +0000 (23:17 +0100)]
[DSE] Drop hasAnalyzableMemoryWrite() (NFCI)

The functionality of hasAnalyzableMemoryWrite() is effectively
subsumed by getLocForWriteEx(), which will return None if the
instruction is not analyzable. The implementations don't match
exactly (e.g. getLocForWriteEx() does not limit non-calls to
stores), but in conjunction with the isRemovable() check, it ends
up being the same.

2 years ago[clang-tidy] performance-unnecessary-copy-initialization: Correctly match the type...
Felix Berger [Fri, 19 Nov 2021 01:33:22 +0000 (20:33 -0500)]
[clang-tidy] performance-unnecessary-copy-initialization: Correctly match the type name of the thisPointertype.

The matching did not work correctly for pointer and reference types.

Differential Revision: https://reviews.llvm.org/D114212

Reviewed-by: courbet

2 years ago[LVI] Drop requirement that modulus is constant
Nikita Popov [Sat, 20 Nov 2021 20:06:08 +0000 (21:06 +0100)]
[LVI] Drop requirement that modulus is constant

If we're looking only at the lower bound, the actual modulus
doesn't matter. This is a leftover from when I wanted to consider
the upper bound as well, where the modulus does matter.

2 years ago[LVI] Support urem in implied conditions
Nikita Popov [Sat, 20 Nov 2021 18:03:45 +0000 (19:03 +0100)]
[LVI] Support urem in implied conditions

If (X urem M) >= C we know that X >= C. Make use of this fact
when computing the implied condition range.

In some cases we could also establish an upper bound, but that's
both trickier and not interesting in practice.

Alive: https://alive2.llvm.org/ce/z/R5ZGSW
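
The implication used here, that (X urem M) >= C gives X >= C, follows from X urem M <= X for unsigned values. A minimal sketch that checks it exhaustively over 8-bit values is shown below; the function names are hypothetical and are not part of LVI.

```cpp
#include <cassert>

// Unsigned remainder; division by zero is undefined, so it is rejected here.
unsigned urem(unsigned X, unsigned M) {
  assert(M != 0 && "urem by zero is undefined");
  return X % M;
}

// Returns true if, for every 8-bit X and every non-zero 8-bit M,
// (X urem M) >= C implies X >= C.
bool uremLowerBoundHolds(unsigned C) {
  for (unsigned X = 0; X <= 255; ++X) {
    for (unsigned M = 1; M <= 255; ++M) {
      if (urem(X, M) >= C && X < C)
        return false; // counterexample found
    }
  }
  return true;
}

// Checks the implication for every 8-bit constant C.
bool uremLowerBoundHoldsForAllC() {
  for (unsigned C = 0; C <= 255; ++C)
    if (!uremLowerBoundHolds(C))
      return false;
  return true;
}
```

Note that the modulus M ranges over all non-zero values in the check, which matches the later commit's observation that the lower bound does not depend on M being constant.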

2 years ago[CVP] Add tests for implied conditions using urem (NFC)
Nikita Popov [Sat, 20 Nov 2021 19:48:56 +0000 (20:48 +0100)]
[CVP] Add tests for implied conditions using urem (NFC)

2 years ago[VPlan] Wrap vector loop blocks in region.
Florian Hahn [Sat, 20 Nov 2021 17:59:47 +0000 (17:59 +0000)]
[VPlan] Wrap vector loop blocks in region.

A first step towards modeling preheader and exit blocks in VPlan as well.
Keeping the vector loop in a region allows for changing the VF as we
traverse region boundaries.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D113182

2 years ago[InstCombine] add folds for binop with sexted bool and constant operands
Sanjay Patel [Sat, 20 Nov 2021 15:55:41 +0000 (10:55 -0500)]
[InstCombine] add folds for binop with sexted bool and constant operands

This is a generalization/extension of the existing and/or
folds noted with TODO comments. Those have a one-use
constraint that is not necessary.

Potential follow-ups are noted by the TODO comments in
the new function. We can also call this function from
other binop visit* functions, but we need to add tests
first.

This solves:
https://llvm.org/PR52543

https://alive2.llvm.org/ce/z/NWuCR5
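
The commit message does not spell out the resulting IR, so the following is only a sketch of the underlying observation: a sign-extended bool is either 0 or -1, so a binop of it with a constant can take only two values, both computable up front. The names `sextBool`, `apply`, and `foldedForm` are hypothetical, and 8-bit integers stand in for arbitrary widths.

```cpp
#include <cassert>
#include <cstdint>

// sext i1 -> i8: false becomes 0, true becomes -1 (all bits set).
int8_t sextBool(bool B) { return B ? int8_t(-1) : int8_t(0); }

enum class BinOp { And, Or, Xor };

// Applies a bitwise binop to two 8-bit operands.
int8_t apply(BinOp Op, int8_t L, int8_t R) {
  switch (Op) {
  case BinOp::And:
    return static_cast<int8_t>(L & R);
  case BinOp::Or:
    return static_cast<int8_t>(L | R);
  case BinOp::Xor:
    return static_cast<int8_t>(L ^ R);
  }
  assert(false && "unhandled binop");
  return 0;
}

// Hypothetical folded form: pick between two constants computed up front.
int8_t foldedForm(BinOp Op, bool B, int8_t C) {
  const int8_t IfTrue = apply(Op, int8_t(-1), C);
  const int8_t IfFalse = apply(Op, int8_t(0), C);
  return B ? IfTrue : IfFalse;
}

// Compares the original and folded forms for every op, constant, and bool.
bool foldMatchesOriginal() {
  const BinOp Ops[] = {BinOp::And, BinOp::Or, BinOp::Xor};
  for (BinOp Op : Ops)
    for (int C = -128; C <= 127; ++C)
      for (int I = 0; I < 2; ++I) {
        const bool B = I != 0;
        const int8_t K = static_cast<int8_t>(C);
        if (apply(Op, sextBool(B), K) != foldedForm(Op, B, K))
          return false;
      }
  return true;
}
```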

2 years ago[InstCombine] add tests for bitwise logic with bool op; NFC
Sanjay Patel [Sat, 20 Nov 2021 15:19:27 +0000 (10:19 -0500)]
[InstCombine] add tests for bitwise logic with bool op; NFC

2 years ago[libc++] [test] Eliminate libcpp-no-noexcept-function-type and libcpp-no-structured...
Arthur O'Dwyer [Mon, 8 Nov 2021 22:00:43 +0000 (17:00 -0500)]
[libc++] [test] Eliminate libcpp-no-noexcept-function-type and libcpp-no-structured-bindings.

At this point, every supported compiler that claims a -std=c++17 mode
should also support these features.

Differential Revision: https://reviews.llvm.org/D113436

2 years ago[MLIR] Simplify Semi-affine expressions by rule based matching and replacing "expr...
Arnab Dutta [Sat, 20 Nov 2021 15:34:59 +0000 (21:04 +0530)]
[MLIR] Simplify Semi-affine expressions by rule based matching and replacing "expr - q * (expr floordiv q)" with  "expr mod q" expression.

Add rule-based matching for detecting and transforming "expr - q * (expr floordiv q)"
to "expr mod q", where q is a symbolic expression, in simplifyAdd function.

Reviewed By: bondhugula, dcaballe

Differential Revision: https://reviews.llvm.org/D112985
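
The rewrite relies on the identity expr - q * (expr floordiv q) = expr mod q. A small sketch that checks it numerically follows, assuming that floordiv rounds toward negative infinity and that mod yields a result in [0, q) for positive q; `floorDiv`, `floorMod`, and `rewriteIsSound` are hypothetical helpers, not MLIR functions.

```cpp
#include <cassert>
#include <cstdint>

// Floor division: rounds toward negative infinity. Q must be positive.
int64_t floorDiv(int64_t E, int64_t Q) {
  assert(Q > 0 && "divisor must be positive");
  int64_t Quot = E / Q; // C++ division truncates toward zero
  if (E % Q != 0 && E < 0)
    --Quot; // adjust for negative dividends with a remainder
  return Quot;
}

// Modulus with a result in [0, Q). Q must be positive.
int64_t floorMod(int64_t E, int64_t Q) {
  assert(Q > 0 && "divisor must be positive");
  const int64_t Rem = E % Q;
  return Rem < 0 ? Rem + Q : Rem;
}

// Checks E - Q * (E floordiv Q) == E mod Q for E in [Lo, Hi], Q in [1, MaxQ].
bool rewriteIsSound(int64_t Lo, int64_t Hi, int64_t MaxQ) {
  for (int64_t E = Lo; E <= Hi; ++E)
    for (int64_t Q = 1; Q <= MaxQ; ++Q)
      if (E - Q * floorDiv(E, Q) != floorMod(E, Q))
        return false;
  return true;
}
```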

2 years ago[Libomptarget] Remove undefined symbol in old runtime
Joseph Huber [Fri, 19 Nov 2021 15:02:28 +0000 (10:02 -0500)]
[Libomptarget] Remove undefined symbol in old runtime

A function with no definition was left in the old runtime, causing
linker errors when trying to compile.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D114264

2 years agocompiler-rt: Use FreeBSD's elf_aux_info to detect AArch64 HW features
Dimitry Andric [Sat, 20 Nov 2021 11:10:06 +0000 (12:10 +0100)]
compiler-rt: Use FreeBSD's elf_aux_info to detect AArch64 HW features

Using the out-of-line LSE atomics helpers for AArch64 on FreeBSD also
requires adding support for initializing __aarch64_have_lse_atomics
correctly. On Linux this is done with getauxval(3), on FreeBSD with
elf_aux_info(3), which has a slightly different interface.

Differential Revision: https://reviews.llvm.org/D109330
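
The interface difference between the two calls can be sketched as follows. This is not the compiler-rt code; `detectLseAtomics` and `kHwcapAtomics` are hypothetical names, and the bit position (bit 8 of AT_HWCAP for LSE atomics on AArch64) is stated here as an assumption. On any other target the function simply reports false.

```cpp
#include <cassert>
#if defined(__linux__) || defined(__FreeBSD__)
#include <sys/auxv.h>
#endif

// Assumption: bit 8 of AT_HWCAP signals LSE atomics on AArch64.
constexpr unsigned long kHwcapAtomics = 1UL << 8;

// Reports whether the running CPU advertises LSE atomics.
bool detectLseAtomics() {
#if defined(__aarch64__) && defined(__linux__)
  // Linux: getauxval returns the value directly (0 if the entry is absent).
  return (getauxval(AT_HWCAP) & kHwcapAtomics) != 0;
#elif defined(__aarch64__) && defined(__FreeBSD__)
  // FreeBSD: elf_aux_info fills a caller-provided buffer and returns 0 on
  // success.
  unsigned long HwCap = 0;
  if (elf_aux_info(AT_HWCAP, &HwCap, sizeof(HwCap)) != 0)
    return false;
  return (HwCap & kHwcapAtomics) != 0;
#else
  // Not AArch64 on Linux or FreeBSD: nothing to detect in this sketch.
  return false;
#endif
}
```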

2 years ago[NFC][X86][Costmodel] Add AVX512DQ runlines to trunc.ll/extend.ll
Roman Lebedev [Sat, 20 Nov 2021 10:55:13 +0000 (13:55 +0300)]
[NFC][X86][Costmodel] Add AVX512DQ runlines to trunc.ll/extend.ll

2 years ago[NFC][X86][MCA] Add forgotten test coverage for AVX512's VPMOVM2[BWDQ] / VPMOV[BWDQ]2M
Roman Lebedev [Sat, 20 Nov 2021 10:09:18 +0000 (13:09 +0300)]
[NFC][X86][MCA] Add forgotten test coverage for AVX512's VPMOVM2[BWDQ] / VPMOV[BWDQ]2M

2 years ago[MLIR] Avoid creation of buggy affine maps while replacing dimension and symbol
Arnab Dutta [Sat, 20 Nov 2021 06:30:49 +0000 (12:00 +0530)]
[MLIR] Avoid creation of buggy affine maps while replacing dimension and symbol

Previously, AffineMap::get() was called before the newly composed
dimensions and symbols were appended to the dimension and symbol lists
whose sizes are passed to it, so the wrong dimCount and symbolCount were
passed as arguments. We move the call to AffineMap::get() after the
dimension and symbol lists are updated.

Differential Revision: https://reviews.llvm.org/D114237

2 years ago[X86] Don't combine (x86cmp (trunc (movmsk (bitcast X))), 0) if the truncate discards...
Craig Topper [Sat, 20 Nov 2021 03:05:10 +0000 (19:05 -0800)]
[X86] Don't combine (x86cmp (trunc (movmsk (bitcast X))), 0) if the truncate discards unknown bits.

We have a transform that tries to turn a pmovmskb into movmskps/pd or
movmskps into movmskpd. This transform isn't valid if the truncate
discarded bits that might be set by the original movmsk.

We could fix this by inserting an AND after the new movmsk to discard
the equivalent of the truncated bits, but I've left that for a later
patch.

Fixes PR52567.

Differential Revision: https://reviews.llvm.org/D114306

2 years ago[X86] Add test case for pr52567. NFC
Craig Topper [Sat, 20 Nov 2021 02:52:17 +0000 (18:52 -0800)]
[X86] Add test case for pr52567. NFC

2 years ago[ORC] Make JITDylib::AsynchronousSymbolQuerySet private.
Lang Hames [Sat, 20 Nov 2021 05:12:23 +0000 (21:12 -0800)]
[ORC] Make JITDylib::AsynchronousSymbolQuerySet private.

This type does not need to be public.

2 years ago[llvm] Use range-based for loops (NFC)
Kazu Hirata [Sat, 20 Nov 2021 05:12:12 +0000 (21:12 -0800)]
[llvm] Use range-based for loops (NFC)

2 years ago[AMDGPU] Do not generate ELF symbols for the local branch target labels
RamNalamothu [Fri, 19 Nov 2021 20:23:38 +0000 (01:53 +0530)]
[AMDGPU] Do not generate ELF symbols for the local branch target labels

The compiler was generating symbols in the final code object for local
branch target labels. This bloats the code object, slows down the loader,
and is only used to simplify disassembly.

Use '--symbolize-operands' with llvm-objdump to improve readability of the
branch target operands in disassembly.

Fixes: SWDEV-312223

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D114273

2 years ago[ORC][JITLink] Move JITDylib name into JITLinkDylib base class.
Lang Hames [Sat, 20 Nov 2021 04:51:44 +0000 (20:51 -0800)]
[ORC][JITLink] Move JITDylib name into JITLinkDylib base class.

This will enable better error messages and debug logs in JITLink.

2 years ago[GVN][NFC] Remove redundant check
ksyx [Sat, 13 Nov 2021 20:59:43 +0000 (15:59 -0500)]
[GVN][NFC] Remove redundant check

The if-check above the deleted part guarantees that StoreOffset <= LoadOffset
and that StoreOffset + StoreSize >= LoadOffset + LoadSize. Since
LoadOffset + LoadSize > LoadOffset when LoadSize > 0, it follows that
StoreOffset + StoreSize > LoadOffset is guaranteed given LoadSize > 0.
A type with nonpositive size would be meaningless, so the check can be
removed. The values are converted to signed types to avoid unsigned
arithmetic on negative offsets.

Part of revision D100179
Reapply commit c35e8185d8c170c20e28956e0c9f3c1be895fefb with a fix for the
problem reported by mstorsjo
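
The argument above can be checked by brute force over small signed values. In this sketch the function names are hypothetical; only the inequalities come from the commit message.

```cpp
#include <cassert>

// Condition established by the earlier if-check: the store covers the load.
bool storeCoversLoad(int StoreOffset, int StoreSize, int LoadOffset,
                     int LoadSize) {
  return StoreOffset <= LoadOffset &&
         StoreOffset + StoreSize >= LoadOffset + LoadSize;
}

// The check that is removed as redundant.
bool storeEndsAfterLoadStart(int StoreOffset, int StoreSize, int LoadOffset) {
  return StoreOffset + StoreSize > LoadOffset;
}

// Returns true if the removed check is implied by the covering condition for
// all offsets in [-Range, Range], store sizes in [0, Range], and load sizes
// in [1, Range].
bool removedCheckIsImplied(int Range) {
  assert(Range > 0 && "need a non-empty search space");
  for (int SO = -Range; SO <= Range; ++SO)
    for (int LO = -Range; LO <= Range; ++LO)
      for (int SS = 0; SS <= Range; ++SS)
        for (int LS = 1; LS <= Range; ++LS)
          if (storeCoversLoad(SO, SS, LO, LS) &&
              !storeEndsAfterLoadStart(SO, SS, LO))
            return false;
  return true;
}
```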

2 years ago[hmaptool] Port to python3
Nathan Lanza [Wed, 11 Aug 2021 22:55:01 +0000 (18:55 -0400)]
[hmaptool] Port to python3

This is just a few trivial changes -- change the interpreter and fix a
few byte-vs-string issues.

Differential Revision: https://reviews.llvm.org/D107944

2 years ago[NFC] Test commit, add whitespace to end-of-line
James Nagurne [Sat, 20 Nov 2021 00:21:23 +0000 (18:21 -0600)]
[NFC] Test commit, add whitespace to end-of-line

2 years ago[clangd] Avoid possible crash: apply configuration after binding methods
Sam McCall [Sat, 20 Nov 2021 00:10:30 +0000 (01:10 +0100)]
[clangd] Avoid possible crash: apply configuration after binding methods

The configuration may kick off indexing, which may involve sending LSP
messages.
The crash is fiddly to reproduce in a hermetic test (we need background
indexing on without disk storage, and to handle server->client messages
in LSPClient...)

Fixes https://github.com/clangd/clangd/issues/926

2 years ago[InstrProf] Use i32 for GEP index from lowering llvm.instrprof.increment
Ellis Hoag [Fri, 19 Nov 2021 23:44:48 +0000 (15:44 -0800)]
[InstrProf] Use i32 for GEP index from lowering llvm.instrprof.increment

The `llvm.instrprof.increment` intrinsic uses `i32` for the index. We should use this same type for the index into the GEP instructions.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D114268

2 years ago[MLIR][GPU] Link in device libraries during HSA compilation if needed
Krzysztof Drewniak [Thu, 18 Nov 2021 22:37:53 +0000 (22:37 +0000)]
[MLIR][GPU] Link in device libraries during HSA compilation if needed

To perform some operations, such as sin() or printf(), code compiled
for AMD GPUs must be linked to a series of device libraries. This
commit adds support for linking in these libraries.

However, since these device libraries are delivered as LLVM bitcode,
raising the possibility of version incompatibilities, this commit only
links in libraries when the functions from those libraries are called
by the code being compiled.

This code also sets the math flags to their most conservative values,
as MLIR doesn't have a `-ffast-math` equivalent.

Depends on D114114

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D114117

2 years ago[NFC][llvm] Inclusive language: remove instance of master from Thumb2SizeReduction.cpp
Quinn Pham [Thu, 18 Nov 2021 21:52:39 +0000 (15:52 -0600)]
[NFC][llvm] Inclusive language: remove instance of master from Thumb2SizeReduction.cpp

[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with main in `Thumb2SizeReduction.cpp`.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D114196

2 years ago[mlir] Bug fix. Stream must outlive the pass manager.
rdzhabarov [Fri, 19 Nov 2021 21:43:17 +0000 (21:43 +0000)]
[mlir] Bug fix. Stream must outlive the pass manager.

Bug fix. Stream must outlive the pass manager.

Reviewed By: Chia-hungDuan

Differential Revision: https://reviews.llvm.org/D114277

2 years ago[Sema] fix nondeterminism in ASTContext::getDeducedTemplateSpecializationType
Wei Wang [Fri, 19 Nov 2021 21:14:41 +0000 (13:14 -0800)]
[Sema] fix nondeterminism in ASTContext::getDeducedTemplateSpecializationType

`DeducedTemplateSpecializationTypes` is a `llvm::FoldingSet<DeducedTemplateSpecializationType>` [1],
where `FoldingSetNodeID` is based on the values: {`TemplateName`, `QualType`, `IsDeducedAsDependent`},
those values are also used as `DeducedTemplateSpecializationType` constructor arguments.

A `FoldingSetNodeID` created by the static `DeducedTemplateSpecializationType::Profile` may not be equal
to a `FoldingSetNodeID` created by a member `DeducedTemplateSpecializationType::Profile` of an instance
created with the same {`TemplateName`, `QualType`, `IsDeducedAsDependent`}, which makes
`DeducedTemplateSpecializationTypes` lookups nondeterministic.

Specifically, while the `IsDeducedAsDependent` value is passed to the constructor, the `IsDependent()` method on
the created instance may return a different value, because `IsDependent` is not saved as is:
```name=clang/include/clang/AST/Type.h
  DeducedTemplateSpecializationType(TemplateName Template,  QualType DeducedAsType, bool IsDeducedAsDependent)
      : DeducedType(DeducedTemplateSpecialization, DeducedAsType,
                    toTypeDependence(Template.getDependence()) | // <~  also considers `TemplateName` parameter
                        (IsDeducedAsDependent ? TypeDependence::DependentInstantiation : TypeDependence::None)),
```
For example, suppose an instance A is inserted with key `FoldingSetNodeID {A, B, false}`, and then a key
`FoldingSetNodeID {A, B, true}` is probed:
If it happens to map to the same bucket in `FoldingSet` as the first key, and `A.Profile()` returns
`FoldingSetNodeID {A, B, true}`, then it's a hit.
If the bucket for the second key is different from that of the first key, instance A is not considered at all, and it's
a miss, even if `A.Profile()` returns `FoldingSetNodeID {A, B, true}`.

Since the `TemplateName` and `QualType` parameter values involve memory pointers, the lookup result depends on the allocator,
and may differ from run to run. When this is used as part of modules compilation, it may result in "module out of date"
errors, if imported modules are built on different machines.
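
The hit-or-miss behavior described above can be reproduced with a toy bucketed set. This is only a sketch: `Key`, `Node`, `ToySet`, and `probeHits` are hypothetical stand-ins, not clang or `llvm::FoldingSet` types, and varying the bucket count stands in for the pointer-dependent hashing of the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a FoldingSetNodeID: (template identity, dependent flag).
struct Key {
  std::size_t Template; // stands in for a pointer-valued TemplateName
  bool Dependent;
  bool operator==(const Key &O) const {
    return Template == O.Template && Dependent == O.Dependent;
  }
};

// Toy node: the stored flag also folds in the template's own dependence,
// so the node's profile can differ from the key it was inserted under.
struct Node {
  std::size_t Template;
  bool Dependent;
  Node(std::size_t T, bool TemplateIsDependent, bool IsDeducedAsDependent)
      : Template(T), Dependent(TemplateIsDependent || IsDeducedAsDependent) {}
  Key profile() const { return {Template, Dependent}; }
};

class ToySet {
  std::vector<std::vector<Node>> Buckets;
  std::size_t bucketFor(const Key &K) const {
    return (K.Template + (K.Dependent ? 1 : 0)) % Buckets.size();
  }

public:
  explicit ToySet(std::size_t N) : Buckets(N) { assert(N > 0); }
  // Stores the node in the bucket of the caller-supplied key.
  void insert(const Key &K, const Node &N) {
    Buckets[bucketFor(K)].push_back(N);
  }
  // Looks only in the probe key's bucket, comparing each node's own profile.
  bool contains(const Key &K) const {
    for (const Node &N : Buckets[bucketFor(K)])
      if (N.profile() == K)
        return true;
    return false;
  }
};

// Inserts a node under {T, false} whose own profile is {T, true}, then probes
// with {T, true}.
bool probeHits(std::size_t NumBuckets, std::size_t T) {
  ToySet S(NumBuckets);
  S.insert(Key{T, false},
           Node(T, /*TemplateIsDependent=*/true,
                /*IsDeducedAsDependent=*/false));
  return S.contains(Key{T, true});
}
```

With a single bucket both keys collide and the probe hits; with two buckets the keys land in different buckets and the same probe misses, even though the node's profile equals the probed key.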

This makes `ASTContext::getDeducedTemplateSpecializationType` consider `Template.isDependent()`, similar to the
`DeducedTemplateSpecializationType` constructor.

Tested on a very big codebase, by running modules compilations from directories with varied path length
(which seems to affect the allocator seed).

1. https://llvm.org/docs/ProgrammersManual.html#llvm-adt-foldingset-h

Patch by Wei Wang and Igor Sugak!

Reviewed By: bruno

Differential Revision: https://reviews.llvm.org/D112481

2 years ago[InstCombine] add/adjust tests for mask of sext i1; NFC
Sanjay Patel [Fri, 19 Nov 2021 17:48:27 +0000 (12:48 -0500)]
[InstCombine] add/adjust tests for mask of sext i1; NFC

These are sibling transforms, but the test coverage was
uneven and incomplete.

2 years ago[PowerPC][NFC] Add a series of codegen tests for vector reductions.
Stefan Pintilie [Mon, 15 Nov 2021 21:26:30 +0000 (15:26 -0600)]
[PowerPC][NFC] Add a series of codegen tests for vector reductions.

This patch only adds tests for PowerPC. The purpose of these tests
is to track what code is generated for various vector reductions.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D113801

2 years ago[libc++][NFC] Add missing include in test
Louis Dionne [Fri, 19 Nov 2021 21:01:39 +0000 (16:01 -0500)]
[libc++][NFC] Add missing include in test

2 years agoAllow __attribute__((swift_attr)) in attribute push pragmas
Becca Royal-Gordon [Fri, 19 Nov 2021 20:10:15 +0000 (12:10 -0800)]
Allow __attribute__((swift_attr)) in attribute push pragmas

This change allows SwiftAttr to be used with #pragma clang attribute push
to add Swift attributes to large regions of header files.
We plan to use this to annotate headers with concurrency information.

Patch by: Becca Royal-Gordon

Differential Revision: https://reviews.llvm.org/D112773

2 years ago[MLIR][GPU] Make the path to ROCm a runtime option
Krzysztof Drewniak [Thu, 18 Nov 2021 21:42:42 +0000 (21:42 +0000)]
[MLIR][GPU] Make the path to ROCm a runtime option

Our current build assumes that the path to ROCm we find at build time
will be the path at which ROCm is located when the built code is
executed. This commit adds a --rocm-path option to SerializeToHsaco,
and removes the HIP dependency that the SerializeToHsaco previously had.

Depends on D114113

(though the dependency is to ensure the diffs apply cleanly and to capture the dependency on D114107)

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D114114

2 years agoNFC: Callout restriction on folding 0-result ops in documentation.
Stella Laurenzo [Fri, 19 Nov 2021 20:32:21 +0000 (20:32 +0000)]
NFC: Callout restriction on folding 0-result ops in documentation.

Differential Revision: https://reviews.llvm.org/D114271

2 years agoDWARFVerifier: Simplify name lookups
David Blaikie [Fri, 19 Nov 2021 20:31:27 +0000 (12:31 -0800)]
DWARFVerifier: Simplify name lookups

No need to use the dynamic fallback query when the name type is known
statically at the call site.

2 years ago[openmp][amdgpu][nfc] Simplify implicit args handling
Jon Chesterfield [Fri, 19 Nov 2021 20:18:23 +0000 (20:18 +0000)]
[openmp][amdgpu][nfc] Simplify implicit args handling

Removes a +x/-x pair on the only store/load of a variable
and deletes some nearby dead code. Also reduces the size of the implicit
struct to reflect the code currently emitted by clang.

Differential Revision: https://reviews.llvm.org/D114270

2 years ago[libc++] Test that our algorithms never copy a user-provided comparator.
Arthur O'Dwyer [Thu, 18 Nov 2021 05:07:23 +0000 (00:07 -0500)]
[libc++] Test that our algorithms never copy a user-provided comparator.

This is not mandated by the standard, so it goes in libcxx/test/libcxx/.
It's certainly arguable that the algorithms changed here
(`is_heap`, `is_sorted`, `min`, `max`) are harmless and we should
just let them copy their comparators once. But at the same time,
it's nice to have all our algorithms be 100% consistent and never
copy a comparator, not even once.

Differential Revision: https://reviews.llvm.org/D114136

2 years ago[clang][NFC] Inclusive terms: replace some uses of sanity in clang
Zarko Todorovski [Fri, 19 Nov 2021 19:50:09 +0000 (14:50 -0500)]
[clang][NFC] Inclusive terms: replace some uses of sanity in clang

Rewording of comments to avoid using `sanity test, sanity check`.

Reviewed By: aaron.ballman, Quuxplusone

Differential Revision: https://reviews.llvm.org/D114025

2 years ago[libc++] Fix feature test macro for __cpp_lib_to_chars
Louis Dionne [Fri, 19 Nov 2021 14:52:28 +0000 (09:52 -0500)]
[libc++] Fix feature test macro for __cpp_lib_to_chars

We would have been defining it in <utility> instead of <charconv>. For
the time being, this doesn't change anything since we don't implement
the feature test macro anyways.

Also, as a fly-by, this removes obsolete feature test macro tests. There
was a brief time back in the days when we wrote feature test macro tests
manually. In particular, we had test files for __cpp_lib_to_chars and
__cpp_lib_memory_resource. Since we now have a principled way of generating
these tests with scripts, this commit removes the obsolete (and empty)
tests for these two feature test macros.

Differential Revision: https://reviews.llvm.org/D114243

2 years ago[libc++] Fix some tests that were broken in the single-threaded configuration
Louis Dionne [Fri, 19 Nov 2021 14:50:05 +0000 (09:50 -0500)]
[libc++] Fix some tests that were broken in the single-threaded configuration

We never noticed it because our CI doesn't actually build against a C
library that doesn't have threading functionality; however, building
against a truly thread-free platform surfaces these issues.

Differential Revision: https://reviews.llvm.org/D114242

2 years ago[libc++] Avoid potential truncation warnings in std::abs test
Louis Dionne [Fri, 19 Nov 2021 14:55:45 +0000 (09:55 -0500)]
[libc++] Avoid potential truncation warnings in std::abs test

On some platforms, -Wimplicit-int-conversion is enabled by default,
which can lead to additional warnings being triggered in this test.
Since we're only trying to test errors related to calling abs(), the
assignment is superfluous.

As a fly-by fix, correct one instance of ::abs to std::abs and make
the test a .verify.cpp test instead.

Differential Revision: https://reviews.llvm.org/D114244

2 years ago[MLIR][GPU] Run generic LLVM optimizations when serializing (on AMD)
Krzysztof Drewniak [Thu, 18 Nov 2021 21:45:27 +0000 (21:45 +0000)]
[MLIR][GPU] Run generic LLVM optimizations when serializing (on AMD)

- Adds hooks that allow SerializeTo* passes to arbitrarily transform
the produced LLVM Module before it is passed to the code generation
passes.

- Uses these hooks within the SerializeToHsaco pass in order to run
LLVM optimizations and to set the optimization level on the
TargetMachine.

- Adds an optLevel parameter to SerializeToHsaco

Future work may include moving much of what's been added to
SerializeToHsaco to SerializeToBlob, but that would require
confirmation from the NVVM backend maintainers that it would be
appropriate to do so.

Depends on D114107

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D114113

2 years ago[mlir][gpu] Extend shuffle op modes and add nvvm lowering
Thomas Raoux [Fri, 19 Nov 2021 19:03:10 +0000 (11:03 -0800)]
[mlir][gpu] Extend shuffle op modes and add nvvm lowering

Add up, down and idx modes to gpu shuffle ops, also change the mode from
string to enum

Differential Revision: https://reviews.llvm.org/D114188

2 years ago[AMDGPU] Add an implicit use of M0 to all V_MOV_B32_indirect_read/write
Jay Foad [Fri, 19 Nov 2021 13:48:23 +0000 (13:48 +0000)]
[AMDGPU] Add an implicit use of M0 to all V_MOV_B32_indirect_read/write

NFCI. Previously the implicit use was added to V_MOV_B32_indirect_read
when building the instruction. V_MOV_B32_indirect_write didn't have an
implicit use of M0 at all, but apparently it did not cause any problems.

Differential Revision: https://reviews.llvm.org/D114239

2 years ago[ELF] Support discarding .got.plt
Fangrui Song [Fri, 19 Nov 2021 18:50:53 +0000 (10:50 -0800)]
[ELF] Support discarding .got.plt

Fix a null pointer dereference when .got.plt is discarded.

This also adds a test for discarding `.plt`.

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D114180

2 years ago[openmp][amdgpu][nfc] Inline interop_hsa_get_kernel_info into only caller
Jon Chesterfield [Fri, 19 Nov 2021 18:40:24 +0000 (18:40 +0000)]
[openmp][amdgpu][nfc] Inline interop_hsa_get_kernel_info into only caller

2 years ago[mlir][vector] NFC, move some vector patterns in a separate file
Thomas Raoux [Fri, 19 Nov 2021 18:31:52 +0000 (10:31 -0800)]
[mlir][vector] NFC, move some vector patterns in a separate file

Move patterns related to dropping lead unit dim into their own file.

Differential Revision: https://reviews.llvm.org/D114265

2 years ago[mlir][vector] Remove usage of shapecast to remove unit dim
Thomas Raoux [Fri, 19 Nov 2021 00:09:49 +0000 (16:09 -0800)]
[mlir][vector] Remove usage of shapecast to remove unit dim

Instead of using the shape_cast op in the pattern removing leading unit
dimensions, we use extract/broadcast ops. This is part of the effort to
restrict ShapeCastOp further in the future and only allow them to
convert to or from 1D vectors.

This also adds extra canonicalization to fill the gaps in simplifying
broadcast/extract ops.

Differential Revision: https://reviews.llvm.org/D114205

2 years ago[SROA] Add new test cases to cover existing SROA behavior that structs will be scalar...
Mingming Liu [Fri, 19 Nov 2021 18:16:11 +0000 (18:16 +0000)]
[SROA] Add new test cases to cover existing SROA behavior that structs will be scalarized.

Add an IR file to the unit test directory that demonstrates the scalarization of struct allocations.
This is added to pave the way for an SROA change to skip scalarization for some cases.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D114128

2 years ago[DSE] Improve handling of `strncpy` in Dead Store Elimination
Fabian Wolff [Fri, 19 Nov 2021 17:46:17 +0000 (17:46 +0000)]
[DSE] Improve handling of `strncpy` in Dead Store Elimination

Fixes PR#52062 and one of the remaining cases of PR#47644.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D114035

2 years ago[analyzer][NFC] MaybeUInt -> MaybeCount
Balazs Benics [Fri, 19 Nov 2021 17:36:55 +0000 (18:36 +0100)]
[analyzer][NFC] MaybeUInt -> MaybeCount

I forgot to include this in D113594

Differential Revision: https://reviews.llvm.org/D113594

2 years ago[analyzer][NFC] Use enum for CallDescription flags
Balazs Benics [Fri, 19 Nov 2021 17:32:13 +0000 (18:32 +0100)]
[analyzer][NFC] Use enum for CallDescription flags

Yeah, let's prefer a slightly stronger type representing this.

Reviewed By: martong, xazax.hun

Differential Revision: https://reviews.llvm.org/D113595

2 years ago[analyzer][NFC] Consolidate the inner representation of CallDescriptions
Balazs Benics [Fri, 19 Nov 2021 17:32:13 +0000 (18:32 +0100)]
[analyzer][NFC] Consolidate the inner representation of CallDescriptions

`CallDescriptions` have `RequiredArgs` and `RequiredParams` members,
but they are of different types, `unsigned` and `size_t` respectively.
In the patch I use only `unsigned` for both, which should be large enough
anyway.
I also introduce the `MaybeUInt` type alias for `Optional<unsigned>`.

Additionally, I also avoid the use of the //smart// less-than operator.

  template <typename T>
  constexpr bool operator<=(const Optional<T> &X, const T &Y);

This operator would check whether the optional **has** a value and compare
the data only after. I found it surprising, thus I think we are better off
without it.

Reviewed By: martong, xazax.hun

Differential Revision: https://reviews.llvm.org/D113594
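
A short sketch of why the mixed comparison can surprise, using `std::optional` as a stand-in for `llvm::Optional` (an assumption made for illustration; the helper names are hypothetical). With `std::optional`, an empty value compares as less than or equal to any plain value, so the bound check passes even though no value is present.

```cpp
#include <cassert>
#include <optional>

// Mixed comparison between an optional and a plain value.
bool atMost(const std::optional<unsigned> &X, unsigned Y) { return X <= Y; }

// Explicit alternative: an empty optional never satisfies the bound.
bool hasValueAtMost(const std::optional<unsigned> &X, unsigned Y) {
  return X.has_value() && *X <= Y;
}

// Usage: the two helpers disagree only for an empty optional.
void example() {
  const std::optional<unsigned> Empty;
  assert(atMost(Empty, 3u));          // empty compares as "<= 3"
  assert(!hasValueAtMost(Empty, 3u)); // explicit check rejects it
}
```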

2 years ago[analyzer][NFC] CallDescription should own the qualified name parts
Balazs Benics [Fri, 19 Nov 2021 17:32:13 +0000 (18:32 +0100)]
[analyzer][NFC] CallDescription should own the qualified name parts

Previously, CallDescription simply referred to the qualified name parts
by `const char*` pointers.
In the future we might want to dynamically load and populate
`CallDescriptionMaps`, hence we will need the `CallDescriptions` to
actually **own** their qualified name parts.

Reviewed By: martong, xazax.hun

Differential Revision: https://reviews.llvm.org/D113593

2 years ago[analyzer][NFC] Demonstrate the use of CallDescriptionSet
Balazs Benics [Fri, 19 Nov 2021 17:32:13 +0000 (18:32 +0100)]
[analyzer][NFC] Demonstrate the use of CallDescriptionSet

Reviewed By: martong, xazax.hun

Differential Revision: https://reviews.llvm.org/D113592

2 years ago[analyzer][NFC] Switch to using CallDescription::matches() instead of isCalled()
Balazs Benics [Fri, 19 Nov 2021 17:32:13 +0000 (18:32 +0100)]
[analyzer][NFC] Switch to using CallDescription::matches() instead of isCalled()

This patch replaces each use of the previous API with the new one.
In variadic cases, it will use the ADL `matchesAny(Call, CDs...)`
variadic function.
Also simplifies some code involving such operations.

Reviewed By: martong, xazax.hun

Differential Revision: https://reviews.llvm.org/D113591

2 years ago[analyzer][NFC] Introduce CallDescription::matches() in addition to isCalled()
Balazs Benics [Fri, 19 Nov 2021 17:32:13 +0000 (18:32 +0100)]
[analyzer][NFC] Introduce CallDescription::matches() in addition to isCalled()

This patch introduces `CallDescription::matches()` member function,
accepting a `CallEvent`.
Semantically, `Call.isCalled(CD)` is the same as `CD.matches(Call)`.

The patch also introduces the `matchesAny()` variadic free function template.
It accepts a `CallEvent` and at least one `CallDescription` to match
against.

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D113590
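
The shape of the new API can be sketched with heavily simplified stand-ins. The `CallEvent` and `CallDescription` structs below are hypothetical toy types that match on a name and argument count only; they are not the analyzer's classes, and the real `matchesAny` may be implemented differently.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for the analyzer's CallEvent.
struct CallEvent {
  std::string CalleeName;
  unsigned NumArgs;
};

// Toy stand-in for CallDescription, with the matching logic on this side.
struct CallDescription {
  std::string Name;
  unsigned RequiredArgs;
  bool matches(const CallEvent &Call) const {
    return Call.CalleeName == Name && Call.NumArgs == RequiredArgs;
  }
};

// Variadic free function template: requires at least one description and
// returns true if any of them matches the call.
template <typename... Ts>
bool matchesAny(const CallEvent &Call, const CallDescription &CD1,
                const Ts &...CDs) {
  return CD1.matches(Call) || (CDs.matches(Call) || ...);
}

// Usage: one call checked against several descriptions at once.
void example() {
  const CallEvent Call{"malloc", 1};
  const CallDescription Malloc{"malloc", 1}, Free{"free", 1};
  assert(matchesAny(Call, Free, Malloc));
}
```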

2 years ago[analyzer][NFC] Introduce CallDescriptionSets
Balazs Benics [Fri, 19 Nov 2021 17:32:13 +0000 (18:32 +0100)]
[analyzer][NFC] Introduce CallDescriptionSets

Sometimes we only want to decide if some function is called, and we
don't care which one in the set it is.
This `CallDescriptionSet` will have the same behavior, except
instead of `lookup()` returning a pointer to the mapped value,
the `contains()` returns `bool`.
Internally, it uses the `CallDescriptionMap<bool>` for implementing the
behavior. It is preferable to reuse the generic
`CallDescriptionMap::lookup()` logic, instead of duplicating it.
The generic version might be improved by implementing a hash lookup or
something along those lines.

Reviewed By: martong, Szelethus

Differential Revision: https://reviews.llvm.org/D113589

2 years ago[LV] Remove obsolete comment about creating a dummy block (NFC)
Florian Hahn [Fri, 19 Nov 2021 17:17:03 +0000 (17:17 +0000)]
[LV] Remove obsolete comment about creating a dummy block (NFC)

No dummy pre-entry block is created since a6c4969f5f45. The comment is
stale now and can be removed.

Mentioned by @Ayal in D113182.

2 years ago[MLIR] Make the ROCM integration tests runnable
Krzysztof Drewniak [Thu, 18 Nov 2021 20:21:33 +0000 (20:21 +0000)]
[MLIR] Make the ROCM integration tests runnable

- Move the #defines to the GPU Transform library from GPU Ops so that
SerializeToHsaco is non-trivially compiled

- Add required includes to SerializeToHsaco

- Move MCSubtargetInfo creation to the correct point in the
compilation process

- Change mlir in ROCM tests to account for renamed/moved ops

Differential Revision: https://reviews.llvm.org/D114184

2 years agoSkip tests when compiled with older versions of clang
Adrian Prantl [Fri, 19 Nov 2021 17:05:38 +0000 (09:05 -0800)]
Skip tests when compiled with older versions of clang

2 years ago[libc][Obvious][NFC] A bunch of cosmetic cleanup.
Siva Chandra Reddy [Fri, 19 Nov 2021 07:32:58 +0000 (07:32 +0000)]
[libc][Obvious][NFC] A bunch of cosmetic cleanup.

* Added missing header guards.
* Fixed license header format in a few files.
* Renamed files to more suitable names.

2 years ago[lldb/test] Add ability to terminate connection from a gdb-client handler
Pavel Labath [Thu, 18 Nov 2021 12:56:36 +0000 (13:56 +0100)]
[lldb/test] Add ability to terminate connection from a gdb-client handler

We were using the client socket close as a way to terminate the handler
thread. But this kind of concurrent access to the same socket is not
safe. It also complicates running the handler without a dedicated thread
(next patch).

Instead, here I add an explicit way for a packet handler to request
termination. Waiting for lldb to terminate the connection would almost
be sufficient, but in the pty test we want to keep the pty open so we
can examine its state. Ability to disconnect at an arbitrary point may
be useful for testing other aspects of lldb functionality as well.

The way this works is that now each packet handler can optionally return
a list of responses (instead of just one). One of those responses (it
only makes sense for it to be the last one) can be a special
RESPONSE_DISCONNECT object, which triggers a disconnection (via a new
TerminateConnectionException).

As the mock server now cleans up the connection whenever it disconnects,
the pty test needs to explicitly dup(2) the descriptors in order to
inspect the post-disconnect state.

Differential Revision: https://reviews.llvm.org/D114156

2 years ago[SCEV] Revert two speculative compile time optimizations which made no difference
Philip Reames [Fri, 19 Nov 2021 16:40:24 +0000 (08:40 -0800)]
[SCEV] Revert two speculative compile time optimizations which made no difference

Revert "[SCEV] Defer all work from ea12c2cb as late as possible"
Revert "[SCEV] Defer loop property checks from ea12c2cb as late as possible"

This reverts commit 734abbad79dbcbd0e880510fbab1ef0e701cfc7b and 1a5666acb281c7510504e726ba481d09ab5f5b95.

Both of these changes were speculative attempts to address a compile time regression.  Neither worked, and both complicated the code in undesirable ways.

2 years ago[RISCV] Don't call setHasMultipleConditionRegisters(), so icmp is sunk
Philipp Tomsich [Fri, 19 Nov 2021 03:13:46 +0000 (19:13 -0800)]
[RISCV] Don't call setHasMultipleConditionRegisters(), so icmp is sunk

On RISC-V, icmp is not sunk (as the following snippet shows) which
generates the following suboptimal branch pattern:
```
  core_list_find:
        lh a2, 2(a1)
        seqz a3, a0         <<
        bltz a2, .LBB0_5
        bnez a3, .LBB0_9    << should sink the seqz
        [...]
        j .LBB0_9
  .LBB0_5:
        bnez a3, .LBB0_9    << should sink the seqz
        lh a1, 0(a1)
        [...]
due to an icmp not being sunk.

The blocks after `codegenprepare` look as follows:
```
  define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
  entry:
    %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
    %0 = load i16, i16* %idx, align 2, !tbaa !4
    %cmp = icmp sgt i16 %0, -1
    %tobool.not37 = icmp eq %struct.list_head_s* %list, null
    br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

  while.cond9.preheader:                            ; preds = %entry
    br i1 %tobool.not37, label %return, label %land.rhs11.lr.ph
```
where the `%tobool.not37` is the result of the icmp that is not sunk.
Note that it is computed in the basic-block up until what becomes the
`bltz` instruction and the `bnez` is a basic-block of its own.

Compare this to what happens on AArch64 (where the icmp is correctly sunk):
```
  define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
  entry:
    %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
    %0 = load i16, i16* %idx, align 2, !tbaa !6
    %cmp = icmp sgt i16 %0, -1
    br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader

  while.cond9.preheader:                            ; preds = %entry
    %1 = icmp eq %struct.list_head_s* %list, null
    br i1 %1, label %return, label %land.rhs11.lr.ph
```

This is caused by sinkCmpExpression() being skipped, if multiple
condition registers are supported.

Given that the check for multiple condition registers affects only
sinkCmpExpression() and shouldNormalizeToSelectSequence(), this change
adjusts the RISC-V target as follows:
 * we no longer signal multiple condition registers (thus changing
   the behaviour of sinkCmpExpression() back to sinking the icmp)
 * we override shouldNormalizeToSelectSequence() to always select
   the preferred normalisation strategy for our backend

With both changes, the test results remain unchanged.  Note that without
the target-specific override to shouldNormalizeToSelectSequence(), there
is worse code (more branches) generated for select-and.ll and select-or.ll.

The original test case changes as expected:
```
  core_list_find:
        lh a2, 2(a1)
        bltz a2, .LBB0_5
        beqz a0, .LBB0_9    <<
        [...]
        j .LBB0_9
  .LBB0_5:
        beqz a0, .LBB0_9    <<
        lh a1, 0(a1)
        [...]
```

Differential Revision: https://reviews.llvm.org/D98932

2 years ago[RISCV] Pre-commit test for D98932. NFC
Craig Topper [Fri, 19 Nov 2021 06:15:09 +0000 (22:15 -0800)]
[RISCV] Pre-commit test for D98932. NFC

2 years ago[PowerPC] Add a flag for conditional trap optimization
Victor Huang [Fri, 19 Nov 2021 16:10:19 +0000 (10:10 -0600)]
[PowerPC] Add a flag for conditional trap optimization

This patch adds a flag to enable/disable conditional trap optimization.
The optimization is disabled by default.

Peer reviewed by: nemanjai

2 years ago[DSE] Add additional strncpy tests.
Fabian Wolff [Fri, 19 Nov 2021 16:01:36 +0000 (16:01 +0000)]
[DSE] Add additional strncpy tests.

Test for PR#52062 and one of the remaining cases of PR#47644.

2 years ago[NFC][llvm] Inclusive language: remove instance of master in IntrinsicsNVVM.td
Quinn Pham [Thu, 18 Nov 2021 21:10:49 +0000 (15:10 -0600)]
[NFC][llvm] Inclusive language: remove instance of master in IntrinsicsNVVM.td

[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with main in `IntrinsicsNVVM.td`.

Reviewed By: steffenlarsen

Differential Revision: https://reviews.llvm.org/D114193

2 years ago[libc++][nfc] Move functions to a generic place.
Mark de Wever [Fri, 19 Nov 2021 15:38:35 +0000 (16:38 +0100)]
[libc++][nfc] Move functions to a generic place.

This allows the floating-point formatter to use the same functions as
the integral formatter. This was tested in D114001.

2 years ago[libc++] Adds (to|from)_chars_result operator==.
Mark de Wever [Sat, 23 Oct 2021 16:28:31 +0000 (18:28 +0200)]
[libc++] Adds (to|from)_chars_result operator==.

Implements part of P1614 The Mothership has Landed.

Reviewed By: #libc, Quuxplusone, Mordante

Differential Revision: https://reviews.llvm.org/D112366

2 years ago[ORC] Fix materialization of weak local symbols
Ben Langmuir [Fri, 19 Nov 2021 00:47:16 +0000 (16:47 -0800)]
[ORC] Fix materialization of weak local symbols

We were adding all defined weak symbols to the materialization
responsibility, but local symbols will not be in the symbol table, so it
failed to materialize due to the "missing" symbol.

Local weak symbols come up in practice when using `ld -r` with a hidden
weak symbol.

rdar://85574696

2 years ago[X86] Selective relocation relaxation for +tagged-globals
Matt Morehouse [Fri, 19 Nov 2021 14:12:51 +0000 (06:12 -0800)]
[X86] Selective relocation relaxation for +tagged-globals

For tagged-globals, we only need to disable relaxation for globals that
we actually tag.  With this patch function pointer relocations, which
we do not instrument, can be relaxed.

This patch also makes tagged-globals work properly with LTO, as
-Wa,-mrelax-relocations=no doesn't work with LTO.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D113220

2 years ago[SLP][NFC]Introduce TreeEntry::getVectorFactor member function, NFC.
Alexey Bataev [Thu, 18 Nov 2021 21:02:43 +0000 (13:02 -0800)]
[SLP][NFC]Introduce TreeEntry::getVectorFactor member function, NFC.

Added TreeEntry::getVectorFactor to get the final vectorization factor
to simplify the code.

Differential Revision: https://reviews.llvm.org/D114190

2 years ago[OpenMP] support depend clause for taskwait directive, by Deepak
Alexey Bataev [Fri, 19 Nov 2021 13:59:40 +0000 (05:59 -0800)]
[OpenMP] support depend clause for taskwait directive, by Deepak
Eachempati.

This patch adds clang (parsing, sema, serialization, codegen) support for the 'depend' clause on the 'taskwait' directive.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D113540

2 years ago[asm] Allow block address operands in `asm inteldialect`
Nico Weber [Thu, 18 Nov 2021 16:38:52 +0000 (11:38 -0500)]
[asm] Allow block address operands in `asm inteldialect`

This makes the following program build with -masm=intel:

    int foo(int count) {
      asm goto ("dec %0; jb %l[stop]" : "+r" (count) : : : stop);
      return count;
    stop:
      return 0;
    }

It's also another step towards merging EmitGCCInlineAsmStr() and
EmitMSInlineAsmStr().

Differential Revision: https://reviews.llvm.org/D114167

2 years ago[lld/mac] Crash even less on undefined symbols with --icf=all
Nico Weber [Thu, 18 Nov 2021 21:59:21 +0000 (16:59 -0500)]
[lld/mac] Crash even less on undefined symbols with --icf=all

Follow-up to https://reviews.llvm.org/D112643. Even after that change, we were
still asserting if two separate functions that are eligible for ICF (same size,
same data, same number of relocs, same reloc types, ...) referred to
Undefineds. This fixes that oversight.

Differential Revision: https://reviews.llvm.org/D114195

2 years ago[asm] Remove explicit branch for modifier 'l'
Nico Weber [Fri, 19 Nov 2021 03:50:42 +0000 (22:50 -0500)]
[asm] Remove explicit branch for modifier 'l'

No intended behavior change.

EmitGCCInlineAsmStr() used to explicitly check for modifier 'l'
after handling block address and machine basic block operands.
This prevented passing a MachineOperand with 'l' modifier to
PrintAsmMemoryOperand(). Conceptually that seems kind of nice,
but in practice the overrides of PrintAsmMemoryOperand() in all (*)
AsmPrinter subclasses already reject modifiers they don't know about,
and none of them know about 'l'. So removing this doesn't have
a behavior difference, is less code, and it makes EmitGCCInlineAsmStr()
and EmitMSInlineAsmStr() more similar, to prepare for merging them later.

(Why not _add_ the branch to EmitMSInlineAsmStr() instead? Because that
always works with X86AsmPrinter I think, and
X86AsmPrinter::PrintAsmMemoryOperand() very decisively rejects the 'l'
modifier, so it's hard to motivate adding that branch.)

*: The one exception was AVRAsmPrinter, which had an llvm_unreachable instead
of returning true. So this commit changes that, so that the AVR target keeps
emitting an error instead of crashing when passing a mem operand with a :l
modifier to it. All the other targets already don't crash on this.

Differential Revision: https://reviews.llvm.org/D114216

2 years agoThe _Float16 type is supported on x86 systems with SSE2 enabled.
Zahira Ammarguellat [Wed, 17 Nov 2021 16:53:36 +0000 (11:53 -0500)]
The _Float16 type is supported on x86 systems with SSE2 enabled.
Operations are emulated using software emulation and “float” instructions.
This patch allows support of the _Float16 type without the use of the
-mavx512fp16 flag. The final goal is to perform _Float16 emulation for
all arithmetic expressions.

2 years agoMake clang-format fuzz through Lexing with asserts enabled.
Manuel Klimek [Fri, 19 Nov 2021 13:11:53 +0000 (14:11 +0100)]
Make clang-format fuzz through Lexing with asserts enabled.

Makes clang-format bail out if an in-memory source file with an
unsupported BOM is handed in instead of creating source locations that
are violating clang's assumptions.

In the future, we should add support to better transport error messages
like this through clang-format instead of printing to stderr and not
creating any changes.

2 years ago[AMDGPU] Use new opcode for indexed vgpr reads
Jay Foad [Fri, 19 Nov 2021 10:32:35 +0000 (10:32 +0000)]
[AMDGPU] Use new opcode for indexed vgpr reads

Introduce V_MOV_B32_indirect_read for indexed vgpr reads
(and rename the old V_MOV_B32_indirect to
V_MOV_B32_indirect_write) so they can be unambiguously
distinguished from regular V_MOV_B32_e32. Previously they
were distinguished by looking for extra implicit operands
but this is fragile because regular moves sometimes have
extra implicit operands too:
- either by accident, when instructions end up with
  duplicate implicit operands (see e.g. D100939)
- or by design, when SIInstrInfo::copyPhysReg breaks a
  multi-dword copy into individual subreg mov instructions
  and adds implicit operands for the super-register.

The effect of this is that SIInstrInfo::isFoldableCopy can
be simplified and identifies more foldable copies. The test
diffs show that more immediate 0 values have been folded as
inline operands.
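
A simplified model of why a dedicated opcode makes the check more precise. The `Opcode` enum, the `MachineInstr` struct, and both helper functions below are hypothetical stand-ins for illustration, not the SIInstrInfo code:

```cpp
#include <cassert>

// Hypothetical, heavily reduced model of the relevant opcodes.
enum Opcode {
  V_MOV_B32_e32,
  V_MOV_B32_indirect_read,
  V_MOV_B32_indirect_write
};

// Stand-in for a machine instruction: its opcode plus the number of implicit
// operands beyond those the opcode's description declares.
struct MachineInstr {
  Opcode Opc;
  unsigned ExtraImplicitOperands;
};

// Old-style heuristic: indexed reads shared the V_MOV_B32_e32 opcode, so any
// extra implicit operand had to be treated as a sign of register indexing.
inline bool isFoldableCopyByOperands(const MachineInstr &MI) {
  return MI.Opc == V_MOV_B32_e32 && MI.ExtraImplicitOperands == 0;
}

// New-style check: indexed accesses have their own opcodes, so the opcode
// alone decides and extra implicit operands no longer matter.
inline bool isFoldableCopyByOpcode(const MachineInstr &MI) {
  return MI.Opc == V_MOV_B32_e32;
}
```

In this model, a regular move that carries an implicit super-register operand is rejected by the operand-count heuristic but accepted by the opcode check, while an indexed read is rejected by the opcode check regardless of its operands.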

SIInstrInfo::isReallyTriviallyReMaterializable could
probably be simplified too but that is not part of this
patch.

Differential Revision: https://reviews.llvm.org/D114230

2 years ago[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 8...
Roman Lebedev [Fri, 19 Nov 2021 12:55:31 +0000 (15:55 +0300)]
[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 8 bit when have AVX512BW+AVX512VBMI

If in addition to AVX512BW (that provides `{k}<->{i8,i16}` casts and i16 shuffles),
we have AVX512VBMI, which provides i8 shuffles, we are in an optimal situation.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114071

2 years ago[X86][Costmodel] `trunc v16i8 to v8i1` can appear after legalization, cost is same...
Roman Lebedev [Fri, 19 Nov 2021 12:55:21 +0000 (15:55 +0300)]
[X86][Costmodel] `trunc v16i8 to v8i1` can appear after legalization, cost is same as for `trunc v8i8 to v8i1`

Note that there are many other missing costs; I'm *only* adding the ones that are queried
from `getReplicationShuffleCost()` for the existing (quite exhaustive) test coverage.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114070

2 years ago[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 16...
Roman Lebedev [Fri, 19 Nov 2021 12:55:07 +0000 (15:55 +0300)]
[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 16 bit when have AVX512BW

Here we get pretty lucky. AVX512F does not provide any instructions
to convert between a `k` vector mask and a vector,
but AVX512BW adds `{k}<->nX{i8,i16}` conversions,
and just as it happens, with AVX512BW we have an i16 shuffle.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113915

2 years ago[LangRef][VP] Correct operands' types in vp.select documentation
Fraser Cormack [Fri, 19 Nov 2021 12:06:54 +0000 (12:06 +0000)]
[LangRef][VP] Correct operands' types in vp.select documentation

The types of llvm.vp.select's operands must match the return type.