platform/upstream/llvm.git
2 years ago[clang][preprocessor] Allow calling DumpToken() on annotation tokens
Timm Bäder [Tue, 29 Mar 2022 14:58:45 +0000 (16:58 +0200)]
[clang][preprocessor] Allow calling DumpToken() on annotation tokens

Differential Revision: https://reviews.llvm.org/D122659

2 years ago[X86][test] Add encoding/decoding tests for VEX instruction w/ address-size prefix
Shengchen Kan [Wed, 13 Apr 2022 04:49:51 +0000 (12:49 +0800)]
[X86][test] Add encoding/decoding tests for VEX instruction w/ address-size prefix

This patch also contains a regression test for D122448

Reviewed By: hvdijk, RKSimon

Differential Revision: https://reviews.llvm.org/D122449

2 years ago[clang-format] Allow empty .clang-format file
owenca [Wed, 13 Apr 2022 00:51:28 +0000 (17:51 -0700)]
[clang-format] Allow empty .clang-format file

Differential Revision: https://reviews.llvm.org/D123535

2 years ago[libomptarget][amdgpu] Add hidden_heap_v1 kernarg metadata
Saiyedul Islam [Mon, 11 Apr 2022 17:28:07 +0000 (17:28 +0000)]
[libomptarget][amdgpu] Add hidden_heap_v1 kernarg metadata

Code object version 5 adds support of hidden_heap_v1 kernarg
metadata field [1]. It is a global address space pointer to an
initialized memory buffer that conforms to the requirements of the
malloc/free device library V1 version implementation.

[1] https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table-v5

Reviewed By: carlo.bertolli

Differential Revision: https://reviews.llvm.org/D123527

2 years ago[lldb] Re-enable TestStepNoDebug.py on AS
Jonas Devlieghere [Wed, 13 Apr 2022 03:27:10 +0000 (20:27 -0700)]
[lldb] Re-enable TestStepNoDebug.py on AS

This test showed up as an unexpected pass and is now consistently
passing on Apple Silicon.

2 years ago[lldb] Print diagnostic prefixes (error, warning) in color
Jonas Devlieghere [Wed, 13 Apr 2022 03:26:37 +0000 (20:26 -0700)]
[lldb] Print diagnostic prefixes (error, warning) in color

Print diagnostic prefixes (error, warning) in their respective colors
when colors are enabled.

2 years ago[NFC][sanitizer] Consolidate malloc hook invocations
Vitaly Buka [Wed, 13 Apr 2022 03:07:34 +0000 (20:07 -0700)]
[NFC][sanitizer] Consolidate malloc hook invocations

2 years ago[mlir][LLVM-IR] Added support for global variable attributes
Shraiysh Vaishay [Wed, 13 Apr 2022 02:50:56 +0000 (08:20 +0530)]
[mlir][LLVM-IR] Added support for global variable attributes

This patch adds thread_local to llvm.mlir.global and adds translation for dso_local and addr_space to and from LLVM IR.

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D123412

2 years ago[NFC] [AST] Reduce the size of TemplateParmPosition
Chuanqi Xu [Thu, 7 Apr 2022 10:53:55 +0000 (18:53 +0800)]
[NFC] [AST] Reduce the size of TemplateParmPosition

I found this while reading the code. I think it makes sense to reduce
the space for TemplateParmPosition. It is hard to imagine the depth of
a template parameter being larger than 2^20 or the index being larger
than 2^12, so I think the patch is reasonable.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D123298
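
The size argument above can be illustrated with a minimal sketch that packs a depth and an index into one 32-bit value. The 20/12 bit split comes from the commit message; the function names and the exact bit layout are assumptions made for illustration and are not taken from the Clang sources.

```python
# Field widths from the commit message: 20 bits of depth, 12 bits of index.
# The layout (depth in the high bits) is an assumption for this sketch.
DEPTH_BITS = 20
INDEX_BITS = 12


def pack_position(depth, index):
    """Pack (depth, index) into a single 32-bit integer."""
    if not 0 <= depth < (1 << DEPTH_BITS):
        raise ValueError("depth does not fit in 20 bits")
    if not 0 <= index < (1 << INDEX_BITS):
        raise ValueError("index does not fit in 12 bits")
    return (depth << INDEX_BITS) | index


def unpack_position(packed):
    """Recover (depth, index) from a value produced by pack_position."""
    return packed >> INDEX_BITS, packed & ((1 << INDEX_BITS) - 1)
```

Both fields together occupy exactly 32 bits, which is where the space saving would come from.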

2 years ago[NFC][sanitizer] Remove unnececary HOOK macros
Vitaly Buka [Wed, 13 Apr 2022 02:19:44 +0000 (19:19 -0700)]
[NFC][sanitizer] Remove unnececary HOOK macros

2 years ago[InstCombine] [NFC] Add a test for fneg.ll
Chenbing Zheng [Wed, 13 Apr 2022 02:33:54 +0000 (10:33 +0800)]
[InstCombine] [NFC] Add a test for fneg.ll

2 years ago[clang][test] Disable opaque pointers in test
Arthur Eubanks [Wed, 13 Apr 2022 02:14:52 +0000 (19:14 -0700)]
[clang][test] Disable opaque pointers in test

Was missed in opaque pointer switch due to not being run on x86.

2 years ago[mlir][Arithmetic] Add common constant folder function for type cast ops.
jacquesguan [Mon, 11 Apr 2022 09:24:43 +0000 (09:24 +0000)]
[mlir][Arithmetic] Add common constant folder function for type cast ops.

This revision replaces the current type cast constant folders with a new common type cast constant folder function template.
It covers all the former folders and also supports folding constant splats and vectors.

Differential Revision: https://reviews.llvm.org/D123489

2 years ago[NFC][msan] Rename SymbolizerScope to UnwinderScope and hide
Vitaly Buka [Wed, 13 Apr 2022 01:57:01 +0000 (18:57 -0700)]
[NFC][msan] Rename SymbolizerScope to UnwinderScope and hide

2 years ago[NFC][sanitizer] Clang format some code
Vitaly Buka [Wed, 13 Apr 2022 01:43:22 +0000 (18:43 -0700)]
[NFC][sanitizer] Clang format some code

2 years ago[NFC][msan] Switch pointer to a reference
Vitaly Buka [Tue, 12 Apr 2022 22:29:13 +0000 (15:29 -0700)]
[NFC][msan] Switch pointer to a reference

2 years ago[lldb] Escape semicolons for all shells
Raphael Isemann [Wed, 13 Apr 2022 01:12:18 +0000 (18:12 -0700)]
[lldb] Escape semicolons for all shells

LLDB supports having globbing regexes in the process launch arguments
that will be resolved using the user's shell. This requires that we pass
the launch args to the shell and then read back the expanded arguments
using LLDB's argdumper utility.

As the shell will not just expand the globbing regexes but all special
characters, we need to escape all non-globbing characters such as $, &,
<, >, etc. as those otherwise are interpreted and removed in the step
where we expand the globbing characters. Also because the special
characters are shell-specific, LLDB needs to maintain a list of all the
characters that need to be escaped for each specific shell.

This patch adds the missing semicolon character to the escape list for
all currently supported shells. Without this, having a semicolon in the
binary path or having a semicolon in the launch arguments will cause the
argdumping process to fail. E.g., lldb -- ./calc "a;b" was failing
before but is working now.

Fixes rdar://55776943

Differential revision: https://reviews.llvm.org/D104629
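
The escaping step described above might be sketched as follows. The per-shell character lists and the function name are hypothetical and much shorter than whatever LLDB actually maintains; the point is only that globbing characters are left alone while other special characters, now including the semicolon, get a backslash.

```python
# Hypothetical escape lists, for illustration only. Globbing characters
# such as * ? [ ] are deliberately absent so the shell can still expand them.
SHELL_SPECIALS = {
    "bash": " '\"<>()&;$",
    "sh": " '\"<>()&;$",
}
DEFAULT_SPECIALS = " '\""


def escape_for_shell(arg, shell):
    """Backslash-escape every character that the given shell would interpret."""
    specials = SHELL_SPECIALS.get(shell, DEFAULT_SPECIALS)
    return "".join("\\" + ch if ch in specials else ch for ch in arg)
```

With these lists, the argument `a;b` from the example above would be passed to bash as `a\;b`, while `*.c` would be passed through unchanged.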

2 years ago[SLP]Improve reductions analysis and emission, part 1.
Alexey Bataev [Thu, 18 Nov 2021 16:08:01 +0000 (08:08 -0800)]
[SLP]Improve reductions analysis and emission, part 1.

Currently the SLP vectorizer walks through the instructions and selects
3 main classes of values: 1) reduction operations - instructions with the
same reduction opcode (add, mul, min/max, etc.), which build the reduction,
2) reduced values - instructions with the same opcodes, but different
from the reduction opcode, 3) extra arguments - all other values,
instructions from a different basic block than the root node,
instructions with too many/too few uses.

This scheme is not very efficient. It excludes some instructions and all
non-instruction values from the reductions (constants, proficient
gathers), and too many possibly reduced values are marked as extra
arguments. The patch improves this process by introducing a slightly
extended analysis stage. During this stage, we still try to select 3
classes of values: 1) reduction operations - same as before, 2) possibly
reduced values - all instructions from the current block/non-instructions,
which may build a vectorization tree, 3) extra arguments - instructions
from different basic blocks. Additionally, the possibly reduced values
are sorted to build the scalar sequences which are highly likely to be
vectorized, e.g. loads are grouped by the distance between them,
constants are grouped together, cmp instructions are sorted by their
compare types and predicates, extractelement instructions are sorted by
the vector operand, etc. Also, these groups are reordered by their
length so the longest group is the first in the list of the possibly
reduced values.

The vectorization process tries to emit the reductions for all these
groups. These reductions, the remaining non-vectorized possibly reduced
values and the extra arguments are then combined into the final
expression, just as before.

Differential Revision: https://reviews.llvm.org/D114171
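
The grouping and reordering of possibly reduced values can be sketched roughly as below. This is not the SLP vectorizer's code: the tuple representation of values and the `key` callback are assumptions, and the real pass uses more elaborate keys (load distance, compare predicate, vector operand).

```python
from collections import defaultdict


def group_candidates(values, key):
    """Group values by key, then order groups so the longest comes first."""
    groups = defaultdict(list)
    for value in values:
        groups[key(value)].append(value)
    # sorted() is stable, so groups of equal length keep first-seen order.
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical candidates: (kind, operand) pairs grouped by kind.
candidates = [("load", 0), ("const", 1), ("load", 1), ("cmp", "eq"), ("load", 2)]
ordered = group_candidates(candidates, key=lambda v: v[0])
```

Here the three loads form the longest group and would therefore be tried first.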

2 years agoAMDGPU: Update reqd-work-group-size optimization for umin intrinsic
Matt Arsenault [Thu, 7 Apr 2022 17:50:08 +0000 (13:50 -0400)]
AMDGPU: Update reqd-work-group-size optimization for umin intrinsic

This code was pattern matching the ID computation expression as it
appears in the library. This was a compare and select, but now that
umin is canonical, we were no longer matching. Update to match the
intrinsic instead.

2 years agoRevert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"
Muhammad Omair Javaid [Tue, 12 Apr 2022 23:51:25 +0000 (04:51 +0500)]
Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"

This reverts commit 64b6192e812977092242ae34d6eafdcd42fea39d.

This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage:

https://lab.llvm.org/buildbot/#/builders/176/builds/1515

llvm-tblgen crashes after applying this patch.

2 years ago[test][DSE] Precommit test
Arthur Eubanks [Tue, 12 Apr 2022 23:20:49 +0000 (16:20 -0700)]
[test][DSE] Precommit test

2 years agoRegAllocGreedy: Fix illegal eviction assert for urgent evictions
Matt Arsenault [Tue, 29 Mar 2022 12:48:21 +0000 (08:48 -0400)]
RegAllocGreedy: Fix illegal eviction assert for urgent evictions

The condition in canEvictInterferenceBasedOnCost is slightly different
from the assertion in evictInterference.
canEvictInterferenceBasedOnCost uses a <= check for the cascade number
for legality, but the assert was checking for <. For equal cascade
numbers for an urgent eviction, canEvictInterferenceBasedOnCost could
return success. The actual eviction would then hit this assert. Avoid
ever returning true for equivalent cascade numbers.

The resulting failed allocation seems a bit off to me. e.g. in
illegal-eviction-assert.mir, I would assume %0 gets allocated starting
at $vgpr0. That was its initial allocation choice, but was later
evicted. In this example no evictions can help improve anything.

2 years ago[AMDGPU] Split unaligned 4 DWORD DS operations
Stanislav Mekhanoshin [Mon, 11 Apr 2022 22:25:11 +0000 (15:25 -0700)]
[AMDGPU] Split unaligned 4 DWORD DS operations

Similarly to 3 DWORD operations, it is better for performance
to split unaligned operations as long as these are at least
DWORD aligned. Performance data:

```
Using platform: AMD Accelerated Parallel Processing
Using device: gfx900:xnack-

ds_write_b128                      aligned by 16:  4.9 sec
ds_write2_b64                      aligned by 16:  5.1 sec
ds_write2_b32 * 2                  aligned by 16:  5.5 sec
ds_write_b128                      aligned by  1:  8.1 sec
ds_write2_b64                      aligned by  1:  8.7 sec
ds_write2_b32 * 2                  aligned by  1: 14.0 sec
ds_write_b128                      aligned by  2:  8.1 sec
ds_write2_b64                      aligned by  2:  8.7 sec
ds_write2_b32 * 2                  aligned by  2: 14.0 sec
ds_write_b128                      aligned by  4:  5.6 sec
ds_write2_b64                      aligned by  4:  8.7 sec
ds_write2_b32 * 2                  aligned by  4:  5.6 sec
ds_write_b128                      aligned by  8:  5.6 sec
ds_write2_b64                      aligned by  8:  5.1 sec
ds_write2_b32 * 2                  aligned by  8:  5.6 sec
ds_read_b128                       aligned by 16:  3.8 sec
ds_read2_b64                       aligned by 16:  3.8 sec
ds_read2_b32 * 2                   aligned by 16:  4.0 sec
ds_read_b128                       aligned by  1:  4.6 sec
ds_read2_b64                       aligned by  1:  8.1 sec
ds_read2_b32 * 2                   aligned by  1: 14.0 sec
ds_read_b128                       aligned by  2:  4.6 sec
ds_read2_b64                       aligned by  2:  8.1 sec
ds_read2_b32 * 2                   aligned by  2: 14.0 sec
ds_read_b128                       aligned by  4:  4.6 sec
ds_read2_b64                       aligned by  4:  8.1 sec
ds_read2_b32 * 2                   aligned by  4:  4.0 sec
ds_read_b128                       aligned by  8:  4.6 sec
ds_read2_b64                       aligned by  8:  3.8 sec
ds_read2_b32 * 2                   aligned by  8:  4.0 sec

Using platform: AMD Accelerated Parallel Processing
Using device: gfx1030

ds_write_b128                      aligned by 16:  6.2 sec
ds_write2_b64                      aligned by 16:  7.1 sec
ds_write2_b32 * 2                  aligned by 16:  7.6 sec
ds_write_b128                      aligned by  1: 24.1 sec
ds_write2_b64                      aligned by  1: 25.2 sec
ds_write2_b32 * 2                  aligned by  1: 43.7 sec
ds_write_b128                      aligned by  2: 24.1 sec
ds_write2_b64                      aligned by  2: 25.1 sec
ds_write2_b32 * 2                  aligned by  2: 43.7 sec
ds_write_b128                      aligned by  4: 14.4 sec
ds_write2_b64                      aligned by  4: 25.1 sec
ds_write2_b32 * 2                  aligned by  4:  7.6 sec
ds_write_b128                      aligned by  8: 14.4 sec
ds_write2_b64                      aligned by  8:  7.1 sec
ds_write2_b32 * 2                  aligned by  8:  7.6 sec
ds_read_b128                       aligned by 16:  6.2 sec
ds_read2_b64                       aligned by 16:  6.3 sec
ds_read2_b32 * 2                   aligned by 16:  7.5 sec
ds_read_b128                       aligned by  1: 12.5 sec
ds_read2_b64                       aligned by  1: 24.0 sec
ds_read2_b32 * 2                   aligned by  1: 43.6 sec
ds_read_b128                       aligned by  2: 12.5 sec
ds_read2_b64                       aligned by  2: 24.0 sec
ds_read2_b32 * 2                   aligned by  2: 43.6 sec
ds_read_b128                       aligned by  4: 12.5 sec
ds_read2_b64                       aligned by  4: 24.0 sec
ds_read2_b32 * 2                   aligned by  4:  7.5 sec
ds_read_b128                       aligned by  8: 12.5 sec
ds_read2_b64                       aligned by  8:  6.3 sec
ds_read2_b32 * 2                   aligned by  8:  7.5 sec
```

Differential Revision: https://reviews.llvm.org/D123634
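
As a small worked reading of the tables above, the sketch below picks the fastest variant for DWORD-aligned (aligned by 4) accesses. The timings are copied from the performance data; the dictionary layout and the helper name are assumptions made for illustration.

```python
# Timings in seconds for "aligned by 4", copied from the tables above.
TIMINGS_ALIGN4 = {
    ("gfx1030", "write"): {
        "ds_write_b128": 14.4,
        "ds_write2_b64": 25.1,
        "ds_write2_b32 * 2": 7.6,
    },
    ("gfx1030", "read"): {
        "ds_read_b128": 12.5,
        "ds_read2_b64": 24.0,
        "ds_read2_b32 * 2": 7.5,
    },
    ("gfx900", "read"): {
        "ds_read_b128": 4.6,
        "ds_read2_b64": 8.1,
        "ds_read2_b32 * 2": 4.0,
    },
}


def fastest(device, kind):
    """Return the instruction sequence with the lowest measured time."""
    timings = TIMINGS_ALIGN4[(device, kind)]
    return min(timings, key=timings.get)
```

In all three cases the split into two `b32` pairs is the quickest option, which matches the motivation given for the patch.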

2 years ago[docs][ORC] Fix RST error in dfffb7df24e.
Lang Hames [Tue, 12 Apr 2022 23:05:01 +0000 (16:05 -0700)]
[docs][ORC] Fix RST error in dfffb7df24e.

2 years agoRevert "[clang-format] Allow empty .clang-format file"
owenca [Tue, 12 Apr 2022 23:04:59 +0000 (16:04 -0700)]
Revert "[clang-format] Allow empty .clang-format file"

This reverts commit 4e814a6f2db90046914734fac4f9e3110c7e0424.

2 years agoRegAllocGreedy: Roll back successful recolorings on failure
Matt Arsenault [Thu, 17 Mar 2022 17:12:36 +0000 (13:12 -0400)]
RegAllocGreedy: Roll back successful recolorings on failure

This is a replacement for the original fix attempted in
c46aab01c002b7a04135b8b7f1f52d8c9ae23a58.

This fixes "overlapping insert" assertion failures when trying to
unwind an unsuccessful recoloring attempt.

The problem would occur when there are multiple recoloring candidates
which recursively required recoloring. If one recoloring candidate was
successfully recolored at one level, and the next recoloring candidate
was unsuccessful, we would not roll back the first candidate's
successful recoloring. The forgotten successful recoloring may have
been assigned to something that conflicts with a register that needs
to be restored in a parent recoloring attempt.

See the testcase added in issue48473 for a more concrete example with
explanation.
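
The rollback idea can be sketched in a few lines. This is a simplified model, not the RegAllocGreedy implementation: the snapshot-based undo, the dictionary of assignments and the `pick_register` callback are all assumptions for illustration.

```python
def try_recolor(assignment, candidates, pick_register):
    """Recolor all candidates, or leave `assignment` exactly as it was."""
    saved = dict(assignment)  # snapshot taken before any recoloring
    for vreg in candidates:
        reg = pick_register(vreg, assignment)
        if reg is None:
            # One candidate failed: undo every recoloring done at this
            # level, including the ones that had succeeded.
            assignment.clear()
            assignment.update(saved)
            return False
        assignment[vreg] = reg
    return True
```

Without the undo step, a candidate that was recolored successfully before the failure would keep its new register and could conflict with what a parent attempt later restores.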

2 years ago[docs] Update OrcV2 doc to include some notes on code removal.
Lang Hames [Tue, 12 Apr 2022 22:23:42 +0000 (15:23 -0700)]
[docs] Update OrcV2 doc to include some notes on code removal.

2 years ago[clang-format] Allow empty .clang-format file
owenca [Tue, 12 Apr 2022 21:35:58 +0000 (14:35 -0700)]
[clang-format] Allow empty .clang-format file

Differential Revision: https://reviews.llvm.org/D123535

2 years agoFix libcxx build after cd0a5889d71c62ae7cefc
Yuanfang Chen [Tue, 12 Apr 2022 22:42:21 +0000 (15:42 -0700)]
Fix libcxx build after cd0a5889d71c62ae7cefc

2 years ago[ArgPromo][OpaquePointer] Don't promote mismatched function types
Arthur Eubanks [Tue, 12 Apr 2022 22:16:11 +0000 (15:16 -0700)]
[ArgPromo][OpaquePointer] Don't promote mismatched function types

A call with mismatched call/callee function types is considered an indirect call.

Fixes crash in https://reviews.llvm.org/D123300#3446023.

2 years ago[examples][ORC] Add a new example showing the ORCv2 removable code APIs.
Lang Hames [Tue, 12 Apr 2022 21:47:07 +0000 (14:47 -0700)]
[examples][ORC] Add a new example showing the ORCv2 removable code APIs.

2 years ago[MSan] Ensure argument shadow initialized on memcpy
Nikita Popov [Tue, 12 Apr 2022 20:45:53 +0000 (13:45 -0700)]
[MSan] Ensure argument shadow initialized on memcpy

We need to explicitly query the shadow here, because it is lazily
initialized for byval arguments. Without opaque pointers this used to
mostly work out, because there would be a bitcast to `i8*` present, and
that would query, and copy in case of byval, the argument shadow.

Reviewed By: vitalybuka, eugenis

Differential Revision: https://reviews.llvm.org/D123602

2 years agoRevert "[MSan] Ensure argument shadow initialized on memcpy"
Vitaly Buka [Tue, 12 Apr 2022 21:51:00 +0000 (14:51 -0700)]
Revert "[MSan] Ensure argument shadow initialized on memcpy"

Invalid author.

This reverts commit 163a9f4552bea71b2d53126a5f74f9a1b47d2865.

2 years ago[Reland][lit] Use sharding for GoogleTest format
Yuanfang Chen [Tue, 12 Apr 2022 19:09:34 +0000 (12:09 -0700)]
[Reland][lit] Use sharding for GoogleTest format

This helps lit unit test performance by a lot, especially on windows. The performance gain comes from launching one gtest executable for many subtests instead of one (this is the current situation).

The shards are executed by the test runner and the results are stored in the
JSON format supported by GoogleTest. Later, in the test reporting stage,
all test results in the JSON file are retrieved to continue with the test
results summary etc.

On my Win10 desktop, before this patch: `check-clang-unit`: 177s, `check-llvm-unit`: 38s; after this patch: `check-clang-unit`: 37s, `check-llvm-unit`: 11s.
On my Linux machine, before this patch: `check-clang-unit`: 46s, `check-llvm-unit`: 8s; after this patch: `check-clang-unit`: 7s, `check-llvm-unit`: 4s.

Reviewed By: yln, rnk, abrachet

Differential Revision: https://reviews.llvm.org/D122251
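
A rough sketch of how subtests could be divided among shards is shown below. GoogleTest itself selects a shard through the `GTEST_TOTAL_SHARDS` and `GTEST_SHARD_INDEX` environment variables; the round-robin helper here is only an illustrative assumption about the distribution and is not lit's code.

```python
def split_into_shards(tests, total_shards):
    """Distribute tests round-robin so each shard runs many subtests."""
    if total_shards < 1:
        raise ValueError("total_shards must be positive")
    shards = [[] for _ in range(total_shards)]
    for index, test in enumerate(tests):
        shards[index % total_shards].append(test)
    return shards


# Hypothetical subtest names, for illustration only.
shards = split_into_shards(["A.a", "A.b", "B.a", "B.b", "C.a"], 2)
```

Each shard then corresponds to one launch of the gtest executable, instead of one launch per subtest.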

2 years ago[MSan] Ensure argument shadow initialized on memcpy
Vitaly Buka [Tue, 12 Apr 2022 20:45:53 +0000 (13:45 -0700)]
[MSan] Ensure argument shadow initialized on memcpy

We need to explicitly query the shadow here, because it is lazily
initialized for byval arguments. Without opaque pointers this used to
mostly work out, because there would be a bitcast to `i8*` present, and
that would query, and copy in case of byval, the argument shadow.

Reviewed By: vitalybuka, eugenis

Differential Revision: https://reviews.llvm.org/D123602

2 years ago[GlobalsModRef][FIX] Ensure we honor synchronizing effects of intrinsics
Johannes Doerfert [Mon, 11 Apr 2022 18:32:22 +0000 (13:32 -0500)]
[GlobalsModRef][FIX] Ensure we honor synchronizing effects of intrinsics

This is a long standing problem that resurfaces once in a while [0].
There might actually be two problems because I'm not 100% sure if the
issue underlying https://reviews.llvm.org/D115302 would be solved by
this or not. Anyway.

In 2008 we thought intrinsics do not read/write globals passed to them:
https://github.com/llvm/llvm-project/commit/d4133ac31535ce5176f97e9fc81825af8a808760
This is not correct given that intrinsics can synchronize threads and
cause effects to effectively become visible.

NOTE: I did not yet modify any tests but only tried out the reproducer
      of https://github.com/llvm/llvm-project/issues/54851.

Fixes: https://github.com/llvm/llvm-project/issues/54851

[0] https://discourse.llvm.org/t/bug-gvn-memdep-bug-in-the-presence-of-intrinsics/59402

Differential Revision: https://reviews.llvm.org/D123531

2 years ago[NVPTX][FIX] Allow __nvvm_reflect in the presence of opaque pointers
Johannes Doerfert [Mon, 11 Apr 2022 17:23:50 +0000 (12:23 -0500)]
[NVPTX][FIX] Allow __nvvm_reflect in the presence of opaque pointers

Differential Revision: https://reviews.llvm.org/D123522

2 years ago[OpenMP][FIX] Ensure to set the context for wait events if necessary
Johannes Doerfert [Sat, 9 Apr 2022 05:12:44 +0000 (00:12 -0500)]
[OpenMP][FIX] Ensure to set the context for wait events if necessary

Differential Revision: https://reviews.llvm.org/D123445

2 years agoAMDGPU: Don't use unreachable on stores to unhandled address space
Matt Arsenault [Mon, 21 Feb 2022 23:00:20 +0000 (18:00 -0500)]
AMDGPU: Don't use unreachable on stores to unhandled address space

For stores to constant address space, this will now consistently hit a
selection error instead of hitting unreachable in an asserts build.

I'm not sure what we should really do here. We could either just
codegen as if it were global, delete the instruction, or declare the
IR invalid (we really should have a target IR verifier to enforce it).

2 years agoRevert "[clang-format] Allow empty .clang-format file"
owenca [Tue, 12 Apr 2022 21:28:02 +0000 (14:28 -0700)]
Revert "[clang-format] Allow empty .clang-format file"

This reverts commit 6eafda0ef0543cad4b190002e9dae93b036a4ded.

2 years ago[clang-format] Allow empty .clang-format file
owenca [Mon, 11 Apr 2022 19:00:03 +0000 (12:00 -0700)]
[clang-format] Allow empty .clang-format file

Differential Revision: https://reviews.llvm.org/D123535

2 years agoGlobalISel: Implement MoreElements for select of vector conditions
Matt Arsenault [Tue, 12 Apr 2022 01:31:15 +0000 (21:31 -0400)]
GlobalISel: Implement MoreElements for select of vector conditions

2 years agoAArch64/GlobalISel: Remove pointless s1 legalize rules
Matt Arsenault [Tue, 12 Apr 2022 00:43:58 +0000 (20:43 -0400)]
AArch64/GlobalISel: Remove pointless s1 legalize rules

These have no net effect on the legalize rules.

2 years agoGlobalISel: Fix lowerSelect handling of boolean high bits
Matt Arsenault [Tue, 12 Apr 2022 01:11:26 +0000 (21:11 -0400)]
GlobalISel: Fix lowerSelect handling of boolean high bits

This was making several invalid assumptions about the incoming
select. First, it was assuming the incoming condition was either s1 or
already sign extended, not accounting for different boolean high bits
behavior between scalar and vector conditions. We only had a vector
boolean due to the intermediate step vector select, which is now
avoided.

Second, it was assuming it can use the result vector type as a boolean
mask. These types don't have anything to do with each other, and this only
makes sense in the context of the expansion to bit operations. Since these
logically are part of the same lowering, do the complete expansion in
a single step.

The added select_v4s1_s1 test does fail to legalize, since it seems
AArch64's vector legalization support is pretty incomplete.
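
The expansion of a select to bit operations can be sketched on plain integers. This sketch assumes that only bit 0 of the condition is meaningful and that the high bits may hold arbitrary values; how the high bits actually behave depends on the target's boolean contents, so treat the function as an illustration rather than the GlobalISel lowering.

```python
def select_via_mask(cond, a, b, width=32):
    """Compute cond ? a : b using only AND/OR/NOT on width-bit values."""
    full = (1 << width) - 1
    # Sign-extend bit 0 of the condition into an all-ones or all-zeros mask,
    # ignoring whatever the high bits of `cond` contain.
    mask = full if (cond & 1) else 0
    return (a & mask) | (b & ~mask & full)
```

Using the raw condition as the mask, without first extending it, would give wrong results whenever the high bits are not already a copy of bit 0.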

2 years agoGlobalISel: Handle widening addo/subo booleans
Matt Arsenault [Tue, 12 Apr 2022 15:49:22 +0000 (11:49 -0400)]
GlobalISel: Handle widening addo/subo booleans

This will be tested in a future patch

2 years agoGlobalISel: Handle widening umulo/smulo condition outputs
Matt Arsenault [Tue, 12 Apr 2022 16:03:04 +0000 (12:03 -0400)]
GlobalISel: Handle widening umulo/smulo condition outputs

2 years agoGlobalISel: Update mutationIsSane assert for scalable vectors
Matt Arsenault [Tue, 12 Apr 2022 20:13:59 +0000 (16:13 -0400)]
GlobalISel: Update mutationIsSane assert for scalable vectors

2 years agoMips/GlobalISel: Add test for atomic load
Matt Arsenault [Sun, 10 Apr 2022 16:45:45 +0000 (12:45 -0400)]
Mips/GlobalISel: Add test for atomic load

2 years ago[RISCV] Add a encodeLMUL function to RISCVVType. NFC
Craig Topper [Tue, 12 Apr 2022 20:25:51 +0000 (13:25 -0700)]
[RISCV] Add a encodeLMUL function to RISCVVType. NFC

This moves the encoding handling out of the assembly parser.

Reviewed By: khchen, frasercrmck

Differential Revision: https://reviews.llvm.org/D123553

2 years ago[PowerPC] Fix EmitPPCBuiltinExpr to emit arguments once
Quinn Pham [Tue, 12 Apr 2022 17:46:33 +0000 (12:46 -0500)]
[PowerPC] Fix EmitPPCBuiltinExpr to emit arguments once

This patch changes `EmitPPCBuiltinExpr` in `CGBuiltin.cpp` to remove
the loop at the beginning of the function that emits the arguments and
to delay emitting the arguments until inside the switch statement. These
changes will put `EmitPPCBuiltinExpr` in line with the strategy of the
target independent function `EmitBuiltinExpr`. Also, this patch
ensures that arguments are only emitted once.

Tests that included builtins affected by these changes have been
modified to match expected behaviour.

Reviewed By: #powerpc, nemanjai, amyk

Differential Revision: https://reviews.llvm.org/D121637

2 years agolit.cfg.py: remove obsoleted feature clang-driver
Fangrui Song [Tue, 12 Apr 2022 20:31:06 +0000 (13:31 -0700)]
lit.cfg.py: remove obsoleted feature clang-driver

2 years ago[Driver][test] Remove unused/obsoleted REQUIRES: clang-driver
Fangrui Song [Tue, 12 Apr 2022 20:29:46 +0000 (13:29 -0700)]
[Driver][test] Remove unused/obsoleted REQUIRES: clang-driver

It (introduced by 556d713c70bfaf58ac18d089883f9c34c581633a) appears to be
related to the removed dragonegg project. In addition, the feature was a bit
misnamed and may lure users to unnecessarily use it.

2 years ago[trace][intelpt] Remove code smell when printing the raw trace size
Walter Erquinigo [Fri, 8 Apr 2022 03:57:01 +0000 (20:57 -0700)]
[trace][intelpt] Remove code smell when printing the raw trace size

Something ugly I did was to report the trace buffer size to the DecodedThread,
which is later used as part of the `dump info` command. Instead of doing that,
we can just directly ask the trace for the raw buffer and print its size.

I thought about not asking for the entire trace but instead just for its size,
but in this case, as our traces are not extremely big, I prefer to ask for the
entire trace, ensuring it could be fetched, and then print its size.

Differential Revision: https://reviews.llvm.org/D123358

2 years ago[trace][intelpt] Add task timer classes
Walter Erquinigo [Fri, 8 Apr 2022 03:24:25 +0000 (20:24 -0700)]
[trace][intelpt] Add task timer classes

I'm adding two new classes that can be used to measure the duration of long
tasks at process and thread level, e.g. decoding, fetching data from
lldb-server, etc. In this first patch, I'm using it to measure the time it takes
to decode each thread, which is printed out with the `dump info` command. In a
later patch I'll start adding process-level tasks and I might move these
classes to the upper Trace level, instead of having them in the intel-pt
plugin. I might need to do that anyway in the future when we have to
measure HTR. For now, I want to keep the impact of this change minimal.

With it, I was able to generate the following info of a very big trace:

```
(lldb) thread trace dump info
Trace technology: intel-pt

thread #1: tid = 616081
  Total number of instructions: 9729366

  Memory usage:
    Raw trace size: 1024 KiB
    Total approximate memory usage (excluding raw trace): 123517.34 KiB
    Average memory usage per instruction (excluding raw trace): 13.00 bytes

  Timing:
    Decoding instructions: 1.62s

  Errors:
    Number of TSC decoding errors: 0
```

As seen above, it took 1.62 seconds to decode 9.7M instructions. This is great
news, as we don't need to do any optimization work in this area.

Differential Revision: https://reviews.llvm.org/D123357
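
The general shape of such a timer might look like the sketch below. It is written in Python for brevity and is not the C++ class added by the patch; the class name, the method names and the injectable clock are assumptions for illustration.

```python
import time
from collections import defaultdict


class TaskTimer:
    """Accumulates the wall-clock duration of named tasks."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._durations = defaultdict(float)

    def time_task(self, name, task):
        """Run task(), add its duration under `name`, return its result."""
        start = self._clock()
        try:
            return task()
        finally:
            self._durations[name] += self._clock() - start

    def durations(self):
        return dict(self._durations)
```

A caller would wrap the decoding step, for example `timer.time_task("Decoding instructions", decode)`, and print `timer.durations()` as part of the info dump.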

2 years ago[ubsan][test] Unsupport Android for new test diag-stacktrace.cpp
Fangrui Song [Tue, 12 Apr 2022 19:55:43 +0000 (12:55 -0700)]
[ubsan][test] Unsupport Android for new test diag-stacktrace.cpp

https://reviews.llvm.org/D123562#3446485 reported that the test failed
on arm-linux-android.

2 years ago[clang][extract-api] Add support for true anonymous enums
Daniel Grumberg [Mon, 11 Apr 2022 18:53:33 +0000 (19:53 +0100)]
[clang][extract-api] Add support for true anonymous enums

Anonymous enums without a typedef should have a "(anonymous)" identifier.

Differential Revision: https://reviews.llvm.org/D123533

2 years agoAMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally
Changpeng Fang [Tue, 12 Apr 2022 19:36:30 +0000 (12:36 -0700)]
AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally

Summary:
  Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is set by default.
We use implicitarg_ptr + offset to check whether the multigrid synchronization
pointer is used. If yes, we remove this attribute and also remove
amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg
only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function.

Reviewers: arsenm, sameerds, b-sumner and foad

Differential Revision: https://reviews.llvm.org/D123548

2 years ago[AMDGPU] Update ds-alignment.ll test checks. NFC.
Stanislav Mekhanoshin [Tue, 12 Apr 2022 18:56:55 +0000 (11:56 -0700)]
[AMDGPU] Update ds-alignment.ll test checks. NFC.

2 years ago[mlir][sparse] refactored python setup of sparse compiler
Aart Bik [Fri, 8 Apr 2022 18:51:14 +0000 (11:51 -0700)]
[mlir][sparse] refactored python setup of sparse compiler

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D123419

2 years ago[mlir][Linalg] Allow collapsing subset of the reassociations when fusing by collapsing.
Mahesh Ravishankar [Tue, 12 Apr 2022 16:50:43 +0000 (16:50 +0000)]
[mlir][Linalg] Allow collapsing subset of the reassociations when fusing by collapsing.

This change generalizes the fusion of `tensor.expand_shape` ->
`linalg.generic` op by collapsing to handle cases where only a subset
of the reassociations specified in the `tensor.expand_shape` are valid
to be collapsed.
The method that does the collapsing is refactored to allow it to be a
generic utility when required.

Reviewed By: gysit

Differential Revision: https://reviews.llvm.org/D123153

2 years ago[SLP][X86] Add ray_sphere intersection methods from c-ray benchmark
Simon Pilgrim [Tue, 12 Apr 2022 18:46:53 +0000 (19:46 +0100)]
[SLP][X86] Add ray_sphere intersection methods from c-ray benchmark

We're failing to vectorize several comparison reduction patterns.

Issue #43090 was based off this, but while that simplified test case is now folding, the original still fails due to poor cost model values for vXi1 extractions

2 years ago[lldb] Re-enable fixed on-device tests
Jonas Devlieghere [Tue, 12 Apr 2022 18:37:06 +0000 (11:37 -0700)]
[lldb] Re-enable fixed on-device tests

These tests were fixed by 833882b32701.

2 years ago[Bitcode] materialize Functions early when BlockAddress taken
Nick Desaulniers [Tue, 12 Apr 2022 18:37:42 +0000 (11:37 -0700)]
[Bitcode] materialize Functions early when BlockAddress taken

IRLinker builds a work list of functions to materialize, then moves them
from a source module to a destination module one at a time.

This is a problem for blockaddress Constants, since they need not refer
to the function they are used in; IPSCCP is quite good at sinking these
constants deep into other functions when passed as arguments.

This would lead to curious errors during LTO:
  ld.lld: error: Never resolved function from blockaddress ...
based on the ordering of function definitions in IR.

The problem was that IRLinker would basically do:

  for function f in worklist:
    materialize f
    splice f from source module to destination module

in one pass, with Functions being lazily added to the running worklist.
This confuses BitcodeReader, which cannot disambiguate whether a
blockaddress is referring to a function which has not yet been parsed
("materialized") or is simply empty because its body was spliced out.
This causes BitcodeReader to insert Functions into its BasicBlockFwdRefs
list incorrectly, as it will never re-materialize an already
materialized (but spliced out) function.

Because of the possibility that blockaddress Constants may appear in
Functions other than the ones they reference, this patch adds a new
bitcode function code FUNC_CODE_BLOCKADDR_USERS that is a simple list of
Functions that contain BlockAddress Constants that refer back to this
Function, rather than the Function they are scoped in. We then
materialize those functions when materializing `f` from the example loop
above. This might over-materialize Functions should the user of
BitcodeReader ultimately decide not to link those Functions, but at
least now we can avoid this ordering-related issue with blockaddresses.
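
As a rough illustration of the ordering problem, here is a toy model in
Python rather than LLVM's C++. The function `link` and the two dictionaries
are hypothetical stand-ins: `blockaddr_targets` plays the role of the
blockaddress constants inside each function, and `blockaddr_users` plays the
role of the new FUNC_CODE_BLOCKADDR_USERS record. None of these names are
LLVM APIs, and the real reader state is more involved.

```python
# Toy model only: names below are illustrative, not LLVM APIs.

def link(worklist, blockaddr_targets, blockaddr_users=None):
    """Simulate the one-pass materialize-then-splice loop.

    blockaddr_targets maps a function name to the functions whose blocks it
    takes the address of. blockaddr_users is the reverse map (the role the
    FUNC_CODE_BLOCKADDR_USERS record plays); None models the old behaviour.
    Returns the blockaddress references that could not be resolved.
    """
    materialized = set()   # bodies that have been parsed
    spliced = set()        # bodies already moved to the destination module
    unresolved = []

    def materialize(f):
        if f in materialized:
            return
        materialized.add(f)
        for target in blockaddr_targets.get(f, ()):
            # The target's body has left the source module, so the
            # referenced block can no longer be found there.
            if target in spliced:
                unresolved.append((f, target))

    for f in worklist:
        materialize(f)
        if blockaddr_users is not None:
            # Materialize every function that refers back to f before
            # f's body is spliced away.
            for user in blockaddr_users.get(f, ()):
                materialize(user)
        spliced.add(f)
    return unresolved


targets = {"user": ["callee"]}
print(link(["callee", "user"], targets))                          # old behaviour
print(link(["callee", "user"], targets, {"callee": ["user"]}))    # with the users list
```

In this model the first call reports `("user", "callee")` as unresolved
because `callee` was spliced before `user` was parsed, while the second call
resolves everything at the cost of materializing `user` early.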

Fixes: https://github.com/llvm/llvm-project/issues/52787
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1215

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D120781

2 years ago[mlir][OpenMP] Added omp.task
Shraiysh Vaishay [Tue, 12 Apr 2022 18:20:27 +0000 (23:50 +0530)]
[mlir][OpenMP] Added omp.task

This patch adds the tasking construct according to Section 2.10.1 of OpenMP 5.0.

Reviewed By: peixin, kiranchandramohan, abidmalikwaterloo

Differential Revision: https://reviews.llvm.org/D123575

2 years ago[ubsan] Fix print_stacktrace=1:fast_unwind_on_fatal=0 to correctly fallback to fast...
Fangrui Song [Tue, 12 Apr 2022 18:24:19 +0000 (11:24 -0700)]
[ubsan] Fix print_stacktrace=1:fast_unwind_on_fatal=0 to correctly fallback to fast unwinder

ubsan_GetStackTrace (from 52b751088b11547e0f4ef0589ebbe5e57752c68c) called by
~ScopeReport leaves top/bottom zeroes in the
`!WillUseFastUnwind(request_fast_unwind)` code path.
When BufferedStackTrace::Unwind falls back to UnwindFast,
`if (stack_top < 4096) return;` will return early, leaving just one frame in the stack trace.

Fix this by always initializing top/bottom like 261d6e05d5574bec753ea6b7e9a7f99229927753.
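
The early return can be pictured with a small Python toy. This is only a
sketch of the control flow described above, not the sanitizer runtime's
interface: `unwind_fast`, `frame_pcs` and `MIN_PLAUSIBLE_STACK_TOP` are
hypothetical names chosen for illustration.

```python
# Toy model only: mirrors the control flow described above, not the
# sanitizer runtime's real interface.
MIN_PLAUSIBLE_STACK_TOP = 4096

def unwind_fast(pc, frame_pcs, stack_top):
    """Return the trace a frame-pointer unwinder would record.

    frame_pcs is the list of caller PCs that walking the frame chain would
    find; stack_top is the upper stack bound handed to the unwinder.
    """
    trace = [pc]                      # the current PC is always recorded
    if stack_top < MIN_PLAUSIBLE_STACK_TOP:
        return trace                  # bound looks uninitialized: stop early
    trace.extend(frame_pcs)
    return trace


callers = [0x2000, 0x3000]
print(len(unwind_fast(0x1000, callers, stack_top=0)))           # zeroed bound
print(len(unwind_fast(0x1000, callers, stack_top=0x7FFF0000)))  # initialized bound
```

With a zeroed bound the toy records a single frame, which is the symptom the
patch addresses by always initializing top/bottom.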

Reviewed By: eugenis, yln

Differential Revision: https://reviews.llvm.org/D123562

2 years ago[InstCombine] Add more memrchr tests (NFC).
Martin Sebor [Tue, 12 Apr 2022 17:10:42 +0000 (11:10 -0600)]
[InstCombine] Add more memrchr tests (NFC).

2 years ago[OpenMP][libomp] Replace global variable references with local object
Jonathan Peyton [Tue, 12 Apr 2022 16:15:54 +0000 (11:15 -0500)]
[OpenMP][libomp] Replace global variable references with local object

Remove references to global __kmp_topology within a kmp_topology_t
object method. There should just be implicit references to the
private object.

2 years ago[docs] Mention that we are in the process of removing the legacy PM for the optimizat...
Arthur Eubanks [Mon, 11 Apr 2022 21:13:53 +0000 (14:13 -0700)]
[docs] Mention that we are in the process of removing the legacy PM for the optimization pipeline

And remove references to flags to turn it off.

Reviewed By: nikic, MaskRay

Differential Revision: https://reviews.llvm.org/D123547

2 years ago[libc++] Define legacy symbols for inline functions at a finer-grained level
Louis Dionne [Mon, 11 Apr 2022 16:32:40 +0000 (12:32 -0400)]
[libc++] Define legacy symbols for inline functions at a finer-grained level

When we build the library with the stable ABI, we need to include some
functions in the dylib that were made inline in later versions of the
library (to avoid breaking code that might be relying on those symbols).

However, those methods were made non-inline whenever we'd be building
the library, which means that all translation units would end up using
the old out-of-line definition of these methods, as opposed to the new
inlined version. This patch makes it so that only the translation units
that actually define the out-of-line methods use the old definition,
opening up potential optimization opportunities in other translation
units.

This should solve some of the issues encountered in D65667.

Differential Revision: https://reviews.llvm.org/D123519

2 years ago[AArch64][LOH] Don't ignore regmasks in bundles by iterating over instrs.
Ahmed Bougacha [Tue, 12 Apr 2022 16:23:11 +0000 (09:23 -0700)]
[AArch64][LOH] Don't ignore regmasks in bundles by iterating over instrs.

The LOH pass iterates over instructions to build its custom register
state machine, but it uses the top-level bundle iterator.
This should be okay, because when the wrapper BUNDLE MI is built,
it aggregates the register defs/uses in its instructions into MOs.

However, that doesn't apply to regmasks, and accumulating regmasks
across multiple instructions would be messy business.
There are a couple AnalyzePhysRegInBundle (/Virt) helpers that
do look at regmasks, but those don't fit in very well here.

AArch64 has started to use a few bundle instructions, specifically
as glorified pseudos for variant call instructions, which have regmasks.
So the LOH pass ends up ignoring regmasks.

Concretely, this has been wrong for a while, but, on aarch64, the
most common bundle (rv_marker call) was always followed by the
attached call instruction, a plain BL with a regmask, which
was properly detected by the pass.

However, we recently started keeping the attached call in the bundle,
so the regmask is now ignored.  And the pass happily combines ADRPs, of
say, x8, across the bundle, resulting in corrupt pointers later.
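
The difference between the two iteration styles can be sketched with a toy
model in Python. The `Instr` and `Bundle` classes and `killed_regs` are
hypothetical and only mimic the behaviour described above (explicit defs are
aggregated on the bundle, regmask clobbers are not); they are not LLVM's
MachineInstr API.

```python
# Toy model only: the classes are illustrative, not LLVM's MachineInstr API.
from dataclasses import dataclass, field

@dataclass
class Instr:
    name: str
    defs: set = field(default_factory=set)       # explicit register defs
    clobbers: set = field(default_factory=set)   # registers clobbered via a regmask

@dataclass
class Bundle:
    instrs: list

    # Regmasks are not aggregated onto the bundle header.
    clobbers = frozenset()

    @property
    def defs(self):
        # The bundle header aggregates the explicit defs of its instructions.
        return set().union(*(i.defs for i in self.instrs))

def killed_regs(top_level, look_inside_bundles):
    """Registers whose tracked value must be invalidated."""
    killed = set()
    for item in top_level:
        if look_inside_bundles and isinstance(item, Bundle):
            units = item.instrs      # iterate over the bundled instructions
        else:
            units = [item]           # treat the bundle as one opaque item
        for unit in units:
            killed |= unit.defs | set(unit.clobbers)
    return killed


call = Bundle([Instr("BL", clobbers={"x8", "x9"}), Instr("MOV", defs={"x29"})])
print(sorted(killed_regs([call], look_inside_bundles=False)))
print(sorted(killed_regs([call], look_inside_bundles=True)))
```

Only the second call sees that x8 is clobbered, which is why iterating over
the instructions inside the bundle keeps the pass from pairing ADRPs of x8
across the call.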

2 years ago[AArch64] Cleanup call-rv-marker.ll test. NFC.
Ahmed Bougacha [Tue, 12 Apr 2022 16:39:03 +0000 (09:39 -0700)]
[AArch64] Cleanup call-rv-marker.ll test. NFC.

This was doing -iphoneos instead of -ios. While there,
remove an old TODO and cleanup some alignment.

2 years ago[X86] Fix handling of maskmovdqu in x32 differently
Harald van Dijk [Tue, 12 Apr 2022 17:32:14 +0000 (18:32 +0100)]
[X86] Fix handling of maskmovdqu in x32 differently

This reverts the functional changes of D103427 but keeps its tests, and
reimplements the functionality by reusing the existing 32-bit
MASKMOVDQU and VMASKMOVDQU instructions as suggested by skan in review.
These instructions were previously predicated on Not64BitMode. This
reimplementation restores the disassembly of a class of instructions,
which will see a test added in followup patch D122449.

These instructions are in 64-bit mode special cased in
X86MCInstLower::Lower, because we use flags with one meaning for subtly
different things: we have an AdSize32 class which indicates both that
the instruction needs a 0x67 prefix and that the text form of the
instruction implies a 0x67 prefix. These instructions are special in
needing a 0x67 prefix but having a text form that does *not* imply a
0x67 prefix, so we encode this in MCInst as an instruction that has an
explicit address size override.

Note that originally VMASKMOVDQU64 was special cased to be excluded from
disassembly, as we cannot distinguish between VMASKMOVDQU and
VMASKMOVDQU64 and rely on the fact that these are indistinguishable, or
close enough to it, at the MCInst level that it does not matter which we
use. Because VMASKMOVDQU now receives special casing, even though it
does not make a difference in the current implementation, as a
precaution VMASKMOVDQU is excluded from disassembly rather than
VMASKMOVDQU64.

Reviewed By: RKSimon, skan

Differential Revision: https://reviews.llvm.org/D122540

2 years ago[MLIR][Presburger] Remove inheritance from PresburgerSpace in IntegerRelation, Presbu...
Groverkss [Tue, 12 Apr 2022 17:14:25 +0000 (22:44 +0530)]
[MLIR][Presburger] Remove inheritance from PresburgerSpace in IntegerRelation, PresburgerRelation and PWMAFunction

This patch removes inheritance from PresburgerSpace in IntegerRelation and
instead makes it a member of these classes.

This is required for three reasons:
  - It prevents implicit casting to PresburgerSpace.
  - Not all functions of PresburgerSpace need to be exposed by the deriving classes.
  - IntegerRelation and IntegerPolyhedron are defined in a PresburgerSpace. It
    makes more sense for the space to be a member instead of them inheriting from
    a space.

Reviewed By: arjunp, ftynse

Differential Revision: https://reviews.llvm.org/D123585

2 years ago[clang][ExtractAPI][NFC] Fix sed delimiter in test
Zixu Wang [Mon, 11 Apr 2022 17:52:36 +0000 (10:52 -0700)]
[clang][ExtractAPI][NFC] Fix sed delimiter in test

Fix path replacement in sed (properly this time) using lit
regex_replacement.

Differential Revision: https://reviews.llvm.org/D123526

Co-authored-by: Michele Scandale <michele.scandale@gmail.com>
Co-authored-by: Zixu Wang <9819235+zixu-w@users.noreply.github.com>
2 years ago[NFC][CodeGen] Use ArrayRef in TargetLowering functions
Shao-Ce SUN [Tue, 12 Apr 2022 15:22:41 +0000 (23:22 +0800)]
[NFC][CodeGen] Use ArrayRef in TargetLowering functions

This patch is similar to D122557, adding an `ArrayRef` version for `setOperationAction`, `setLoadExtAction`, `setCondCodeAction`, `setLibcallName`.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D123467

2 years ago[AMDGPU][Codegen] Unsupported image sample texture map instructions
Anshil Gandhi [Tue, 12 Apr 2022 15:46:41 +0000 (09:46 -0600)]
[AMDGPU][Codegen] Unsupported image sample texture map instructions

Disables image_sample_*_g16 instructions on architectures lacking g16 support. This patch fixes issue 54672.

Differential Revision: https://reviews.llvm.org/D123461

2 years ago[SimplifyCFG] cleanup code for converting switch to select (NFC)
Sanjay Patel [Tue, 12 Apr 2022 16:06:27 +0000 (12:06 -0400)]
[SimplifyCFG] cleanup code for converting switch to select (NFC)

This renames functions for more general usage (and current capitalization style)
before a proposed logic change in D122485.

Differential Revision: https://reviews.llvm.org/D123614

2 years ago[OpenMP][libomp] Fix some Doxygen issues
Jonathan Peyton [Tue, 12 Apr 2022 16:00:36 +0000 (11:00 -0500)]
[OpenMP][libomp] Fix some Doxygen issues

Fix spelling of variable names and remove accidental references (#)
in Doxygen comments.

2 years ago[AArch64] Async unwind - function epilogues
Momchil Velikov [Tue, 12 Apr 2022 15:30:46 +0000 (16:30 +0100)]
[AArch64] Async unwind - function epilogues

Reviewed By: MaskRay, chill

Differential Revision: https://reviews.llvm.org/D112330

2 years ago[NFC][libc++][test] Move time tests.
Mark de Wever [Wed, 30 Mar 2022 16:52:31 +0000 (18:52 +0200)]
[NFC][libc++][test] Move time tests.

In the C++20 Standard time is no longer a section under utilities, but
became its own chapter. This moves the time tests accordingly so their
location matches the current Standard.

Reviewed By: ldionne, #libc

Differential Revision: https://reviews.llvm.org/D122745

2 years ago[AMDGPU] Use default member initializers in Subtarget classes
Jay Foad [Tue, 12 Apr 2022 15:19:11 +0000 (16:19 +0100)]
[AMDGPU] Use default member initializers in Subtarget classes

Use default member initializers in AMDGPUSubtarget and subclasses. This
is to guard against adding a new feature boolean in AMDGPUSubtarget.h
but forgetting to initialize it to false in AMDGPUSubtarget.cpp.

This was mostly autogenerated by:
clang-tidy -checks=-*,cppcoreguidelines-prefer-member-initializer,modernize-use-default-member-init -header-filter=Subtarget -fix lib/Target/AMDGPU/*Subtarget.cpp

Differential Revision: https://reviews.llvm.org/D123613

2 years ago[gn build] Fix a URL in a comment
Nico Weber [Tue, 12 Apr 2022 15:37:57 +0000 (11:37 -0400)]
[gn build] Fix a URL in a comment

2 years ago[InstSimplify] Don't fold phi of poison and trapping const expr (PR49839)
Nikita Popov [Tue, 12 Apr 2022 15:31:29 +0000 (17:31 +0200)]
[InstSimplify] Don't fold phi of poison and trapping const expr (PR49839)

Folding this case would result in the constant expression being
executed unconditionally, which may introduce a new trap.
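
The rule can be sketched as a toy in Python. Strings stand in for IR values,
and `simplify_phi` and `can_trap` are hypothetical names, not LLVM's
InstSimplify interface; the sketch only models the single case described
above.

```python
# Toy model only: strings stand in for IR values; this is not LLVM's API.
POISON = "poison"

def simplify_phi(incoming, can_trap):
    """Return the common value a phi may be folded to, or None.

    incoming is the list of incoming values; can_trap tells whether a value
    is a constant expression that may trap when executed.
    """
    non_poison = {v for v in incoming if v != POISON}
    if len(non_poison) != 1:
        return None                  # no single common value to fold to
    (common,) = non_poison
    if POISON in incoming and can_trap(common):
        # Folding would execute the expression on paths that previously
        # only saw poison, which may introduce a new trap.
        return None
    return common


traps = lambda v: v == "trapping_constexpr"
print(simplify_phi([POISON, "trapping_constexpr"], traps))
print(simplify_phi([POISON, "x"], traps))
```

The first call declines to fold, while the second still folds to `x`
because `x` cannot trap.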

Fixes https://github.com/llvm/llvm-project/issues/49839.

2 years ago[InstSimplify] Add test for PR49839 (NFC)
Nikita Popov [Tue, 12 Apr 2022 15:25:28 +0000 (17:25 +0200)]
[InstSimplify] Add test for PR49839 (NFC)

2 years ago[AMDGPU] Split unaligned 3 DWORD DS operations
Stanislav Mekhanoshin [Mon, 11 Apr 2022 17:13:31 +0000 (10:13 -0700)]
[AMDGPU] Split unaligned 3 DWORD DS operations

I have written a minitest to check the performance. Overall,
the benefit of aligned b96 operations on data which is not
known to be aligned but happens to be is small, while the
performance hit of using b96 operations on really unaligned
memory is high.

The only exception is data that is not aligned even by 4; it
is better to use b96 in this case.

Here is the test output on Vega and Navi:

```
Using platform: AMD Accelerated Parallel Processing
Using device: gfx900:xnack-

ds_write_b96                                  aligned: 3.4 sec
ds_write_b32 + ds_write_b64                   aligned: 4.5 sec
ds_write_b32 * 3                              aligned: 4.8 sec
ds_write_b96                          misaligned by 1: 4.8 sec
ds_write_b32 + ds_write_b64           misaligned by 1: 7.2 sec
ds_write_b32 * 3                      misaligned by 1: 10.0 sec
ds_write_b96                          misaligned by 2: 4.8 sec
ds_write_b32 + ds_write_b64           misaligned by 2: 7.2 sec
ds_write_b32 * 3                      misaligned by 2: 10.1 sec
ds_write_b96                          misaligned by 4: 4.8 sec
ds_write_b32 + ds_write_b64           misaligned by 4: 4.2 sec
ds_write_b32 * 3                      misaligned by 4: 4.9 sec
ds_write_b96                          misaligned by 8: 4.8 sec
ds_write_b32 + ds_write_b64           misaligned by 8: 4.6 sec
ds_write_b32 * 3                      misaligned by 8: 4.9 sec
ds_read_b96                                   aligned: 3.3 sec
ds_read_b32 + ds_read_b64                     aligned: 4.9 sec
ds_read_b32 * 3                               aligned: 2.6 sec
ds_read_b96                           misaligned by 1: 4.1 sec
ds_read_b32 + ds_read_b64             misaligned by 1: 7.2 sec
ds_read_b32 * 3                       misaligned by 1: 10.1 sec
ds_read_b96                           misaligned by 2: 4.1 sec
ds_read_b32 + ds_read_b64             misaligned by 2: 7.2 sec
ds_read_b32 * 3                       misaligned by 2: 10.1 sec
ds_read_b96                           misaligned by 4: 4.1 sec
ds_read_b32 + ds_read_b64             misaligned by 4: 2.6 sec
ds_read_b32 * 3                       misaligned by 4: 2.6 sec
ds_read_b96                           misaligned by 8: 4.1 sec
ds_read_b32 + ds_read_b64             misaligned by 8: 4.9 sec
ds_read_b32 * 3                       misaligned by 8: 2.6 sec

Using platform: AMD Accelerated Parallel Processing
Using device: gfx1030

ds_write_b96                                  aligned: 4.1 sec
ds_write_b32 + ds_write_b64                   aligned: 13.0 sec
ds_write_b32 * 3                              aligned: 4.5 sec
ds_write_b96                          misaligned by 1: 12.5 sec
ds_write_b32 + ds_write_b64           misaligned by 1: 22.0 sec
ds_write_b32 * 3                      misaligned by 1: 31.5 sec
ds_write_b96                          misaligned by 2: 12.4 sec
ds_write_b32 + ds_write_b64           misaligned by 2: 22.0 sec
ds_write_b32 * 3                      misaligned by 2: 31.5 sec
ds_write_b96                          misaligned by 4: 12.4 sec
ds_write_b32 + ds_write_b64           misaligned by 4: 4.0 sec
ds_write_b32 * 3                      misaligned by 4: 4.5 sec
ds_write_b96                          misaligned by 8: 12.4 sec
ds_write_b32 + ds_write_b64           misaligned by 8: 13.0 sec
ds_write_b32 * 3                      misaligned by 8: 4.5 sec
ds_read_b96                                   aligned: 3.8 sec
ds_read_b32 + ds_read_b64                     aligned: 12.8 sec
ds_read_b32 * 3                               aligned: 4.4 sec
ds_read_b96                           misaligned by 1: 10.9 sec
ds_read_b32 + ds_read_b64             misaligned by 1: 21.8 sec
ds_read_b32 * 3                       misaligned by 1: 31.5 sec
ds_read_b96                           misaligned by 2: 10.9 sec
ds_read_b32 + ds_read_b64             misaligned by 2: 21.9 sec
ds_read_b32 * 3                       misaligned by 2: 31.5 sec
ds_read_b96                           misaligned by 4: 10.9 sec
ds_read_b32 + ds_read_b64             misaligned by 4: 3.8 sec
ds_read_b32 * 3                       misaligned by 4: 4.5 sec
ds_read_b96                           misaligned by 8: 10.9 sec
ds_read_b32 + ds_read_b64             misaligned by 8: 12.8 sec
ds_read_b32 * 3                       misaligned by 8: 4.5 sec
```
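
As a reading aid, the gfx1030 ds_write timings from the output above can be
tabulated to see which variant is fastest per alignment case. This is a small
Python sketch; the variable names and the short variant labels are
illustrative, and the numbers are copied from the output.

```python
# gfx1030 ds_write timings in seconds, copied from the output above.
timings = {
    "aligned":         {"b96": 4.1,  "b32+b64": 13.0, "b32*3": 4.5},
    "misaligned by 1": {"b96": 12.5, "b32+b64": 22.0, "b32*3": 31.5},
    "misaligned by 2": {"b96": 12.4, "b32+b64": 22.0, "b32*3": 31.5},
    "misaligned by 4": {"b96": 12.4, "b32+b64": 4.0,  "b32*3": 4.5},
    "misaligned by 8": {"b96": 12.4, "b32+b64": 13.0, "b32*3": 4.5},
}

def fastest(row):
    """Return the variant with the smallest time in one row."""
    return min(row, key=row.get)

best = {case: fastest(row) for case, row in timings.items()}
for case, variant in best.items():
    print(f"{case}: {variant}")
```

On these numbers b96 wins only when the data is fully aligned or misaligned
by less than 4, and a split variant wins once the misalignment is a multiple
of 4.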

Fixes: SWDEV-330802

Differential Revision: https://reviews.llvm.org/D123524

2 years ago[AMDGPU] Refactor LDS alignment checks.
Stanislav Mekhanoshin [Thu, 7 Apr 2022 00:41:25 +0000 (17:41 -0700)]
[AMDGPU] Refactor LDS alignment checks.

Move features/bugs checks into the single place
allowsMisalignedMemoryAccessesImpl.

This is mostly NFCI except for the order of selection in a couple of places.
A separate change may be needed to stop lying about Fast.

Differential Revision: https://reviews.llvm.org/D123343

2 years ago[X86] getFauxShuffleMask - remove use DemandedElts TODO
Simon Pilgrim [Tue, 12 Apr 2022 14:36:20 +0000 (15:36 +0100)]
[X86] getFauxShuffleMask - remove use DemandedElts TODO

Most of the getTargetShuffleInputs recursive calls have now gone and the remaining uses aren't likely to benefit from a DemandedElts mask

2 years ago[pseudo] Remove unused clangTesting dep. NFC
Sam McCall [Tue, 12 Apr 2022 14:17:32 +0000 (16:17 +0200)]
[pseudo] Remove unused clangTesting dep. NFC

2 years ago[clang-tidy] Never consider assignments as equivalent in `misc-redundant-expression...
Fabian Wolff [Tue, 12 Apr 2022 14:03:14 +0000 (16:03 +0200)]
[clang-tidy] Never consider assignments as equivalent in `misc-redundant-expression` check

Fixes https://github.com/llvm/llvm-project/issues/35853.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D122535

2 years ago[lldb] Adjust libc++ string formatter for changes in D122598
Pavel Labath [Tue, 12 Apr 2022 13:41:45 +0000 (15:41 +0200)]
[lldb] Adjust libc++ string formatter for changes in D122598

The __size_ member is now in a slightly different location.

2 years ago[Clang] Fix unknown type attributes diagnosed twice with [[]] spelling
Jun Zhang [Tue, 12 Apr 2022 13:11:51 +0000 (21:11 +0800)]
[Clang] Fix unknown type attributes diagnosed twice with [[]] spelling

Don't warn on unknown type attributes in Parser::ProhibitCXX11Attributes
for most cases, but leave the diagnostic to the later checks.
Module declarations and module import declarations are special cases.

Fixes https://github.com/llvm/llvm-project/issues/54817

Differential Revision: https://reviews.llvm.org/D123447

2 years ago[ValueTracking] Make getStringLenth aware of strdup
serge-sans-paille [Mon, 11 Apr 2022 11:38:31 +0000 (13:38 +0200)]
[ValueTracking] Make getStringLenth aware of strdup

During strlen compile-time evaluation, make it possible to track size of
strduped strings.
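
The idea can be sketched as a toy in Python. Tuples stand in for IR values
and `string_length` is a hypothetical helper, not LLVM's ValueTracking
interface; constant strings are assumed to contain no embedded NUL.

```python
# Toy model only: tuples stand in for IR values; not LLVM's ValueTracking API.

def string_length(value):
    """Return the known strlen of value, or None if it cannot be determined."""
    kind = value[0]
    if kind == "const":                       # ("const", "abc"): constant C string
        return len(value[1])
    if kind == "call" and value[1] == "strdup":
        # strdup returns a copy of its argument, so the length carries over.
        return string_length(value[2])
    return None                               # anything else is unknown


print(string_length(("call", "strdup", ("const", "hello"))))
print(string_length(("call", "malloc", ("const", "hello"))))
```

In this model a strlen of a strduped constant can be evaluated at compile
time, while a call to any other function stays unknown.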

Differential Revision: https://reviews.llvm.org/D123497

2 years ago[lldb][AArch64] Automatically add all extensions to disassembler
David Spickett [Thu, 7 Apr 2022 15:12:21 +0000 (15:12 +0000)]
[lldb][AArch64] Automatically add all extensions to disassembler

This means we don't have to remember to update this code as much.

This is all tested in lldb/test/Shell/Commands/command-disassemble-aarch64-extensions.s
which I added previously.

We don't have a way to get the latest base architecture yet
so that remains manual. Having all the extensions specified
will probably be equivalent to the latest architecture version
in any case.

Reviewed By: labath

Differential Revision: https://reviews.llvm.org/D123582

2 years ago[AMDGPU][DOC][NFC] Updated GFX10 assembler syntax description
Dmitry Preobrazhensky [Tue, 12 Apr 2022 12:16:20 +0000 (15:16 +0300)]
[AMDGPU][DOC][NFC] Updated GFX10 assembler syntax description

The description has been updated to reflect AMDGPU MC changes:
- enabled literals for src0 of v_fmaak_f*, v_fmamk_f*, v_madak_f32, v_madmk_f32;
- enabled global_atomic_fcmpswap and global_atomic_fcmpswap_x2;
- enabled dlc with flat_atomic* and global_atomic_*.

Bug fixing and improvements:
- enabled s_wait_idle;
- enabled s_waitcnt_depctr;
- added description of s_waitcnt_depctr syntactic sugar;
- disabled SYSMSG_OP_HOST_TRAP_ACK (it is not supported on GFX10);
- corrected description of lgkmcnt (accept values from 0 to 63).

2 years ago[MLIR][Presburger] normalizeDiv: add assert that denom > 0
Arjun P [Tue, 12 Apr 2022 12:04:56 +0000 (13:04 +0100)]
[MLIR][Presburger] normalizeDiv: add assert that denom > 0

2 years ago[AMDGPU][DOC][NFC] Updated GFX1030 assembler syntax description
Dmitry Preobrazhensky [Tue, 12 Apr 2022 11:55:46 +0000 (14:55 +0300)]
[AMDGPU][DOC][NFC] Updated GFX1030 assembler syntax description

Summary of changes:
- enabled null for VOP operands;
- added description of s_waitcnt_depctr syntactic sugar.

2 years ago[DAG] Add non-uniform vector support to (shl (sr[la] exact X, C1), C2) folds
Simon Pilgrim [Tue, 12 Apr 2022 11:57:48 +0000 (12:57 +0100)]
[DAG] Add non-uniform vector support to (shl (sr[la] exact X,  C1), C2) folds

2 years agoUpdate the Bazel build files for "[mlir][Math] Replace some constant ..."
Dmitri Gribenko [Tue, 12 Apr 2022 11:47:51 +0000 (13:47 +0200)]
Update the Bazel build files for "[mlir][Math] Replace some constant ..."

2 years ago[mlir][Math] Replace some constant folder functions with common folder functions.
jacquesguan [Mon, 11 Apr 2022 07:22:32 +0000 (07:22 +0000)]
[mlir][Math] Replace some constant folder functions with common folder functions.

Differential Revision: https://reviews.llvm.org/D123485

2 years ago[MLIR][Presburger][Simplex] addSymbolicCut: don't add symbol div if denom is 1
Arjun P [Mon, 11 Apr 2022 20:21:34 +0000 (21:21 +0100)]
[MLIR][Presburger][Simplex] addSymbolicCut: don't add symbol div if denom is 1

This is unnecessary, so we remove it as an optimization.

Reviewed By: Groverkss

Differential Revision: https://reviews.llvm.org/D123540