platform/upstream/llvm.git
16 months ago[X86] Precommit a test
Kazu Hirata [Tue, 21 Feb 2023 08:01:43 +0000 (00:01 -0800)]
[X86] Precommit a test

This patch precommits a test for:

https://github.com/llvm/llvm-project/issues/60374

16 months ago[AMDGPU] MIR-Tests for Multiplication using KBA
Jessica Del [Mon, 20 Feb 2023 15:51:15 +0000 (16:51 +0100)]
[AMDGPU] MIR-Tests for Multiplication using KBA

These tests show inefficient behavior that will be optimized by a
later change.

By using Known Bits Analysis, we can avoid unnecessary multiplications
or additions with 0.

16 months ago[clang-tidy] update docs for new hungarian identifier-naming types (unsigned char...
Alexis Murzeau [Tue, 21 Feb 2023 07:37:00 +0000 (07:37 +0000)]
[clang-tidy] update docs for new hungarian identifier-naming types (unsigned char and void)

Since 37e6a4f9496c8e35efc654d7619a79d6dbb72f99, `void` and
`unsigned char` were added to primitive types for hungarian notation.

This commit adds them to the documentation.

Reviewed By: carlosgalvezp

Differential Revision: https://reviews.llvm.org/D144422

16 months ago[ADT] Alternative way to declare enum type as bitmask
Serge Pavlov [Tue, 21 Feb 2023 06:44:24 +0000 (13:44 +0700)]
[ADT] Alternative way to declare enum type as bitmask

If an enumeration represents a set of bit flags, using the macro
LLVM_MARK_AS_BITMASK_ENUM can make operations with such enumeration more
convenient. It however brings problems if the enumeration is non-scoped.
As the macro adds an item LLVM_BITMASK_LARGEST_ENUMERATOR to the
enumeration type, only one such type may be declared as bitmask. This
problem could be solved by converting the enumeration to a scoped one,
but that requires static_casts in new places and can eliminate the
convenience.

This change introduces a new macro LLVM_DECLARE_ENUM_AS_BITMASK, which
allows non-invasive conversion of an enumeration into a bitmask. It
provides specialization to trait classes, which previously were built
based on presence of LLVM_BITMASK_LARGEST_ENUMERATOR in the enumeration.
The macro must be specified in global or llvm namespace because the
trait classes are declared in llvm namespace.
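The trait-based idea described above can be sketched in isolation. This is a minimal illustration, not LLVM's implementation: the names `SketchIsBitmaskEnum`, `SketchLargestEnumerator`, and `SKETCH_DECLARE_ENUM_AS_BITMASK`, as well as the example enums, are hypothetical, and the traits live in the global namespace here rather than in `llvm`.

```cpp
#include <type_traits>

// Hypothetical trait classes; the names are illustrative, not LLVM's.
template <typename E> struct SketchIsBitmaskEnum : std::false_type {};
template <typename E> struct SketchLargestEnumerator;

// Specializes the traits from outside the enumeration, so nothing is added
// to the enum itself. Must be used at the scope where the traits live.
#define SKETCH_DECLARE_ENUM_AS_BITMASK(Enum, Largest)                          \
  template <> struct SketchIsBitmaskEnum<Enum> : std::true_type {};            \
  template <> struct SketchLargestEnumerator<Enum> {                           \
    static constexpr std::underlying_type_t<Enum> value =                      \
        static_cast<std::underlying_type_t<Enum>>(Largest);                    \
  }

// Typed bitwise OR, enabled only for enums marked through the trait.
template <typename E>
constexpr std::enable_if_t<SketchIsBitmaskEnum<E>::value, E> operator|(E L,
                                                                       E R) {
  using U = std::underlying_type_t<E>;
  return static_cast<E>(static_cast<U>(L) | static_cast<U>(R));
}

// Two unscoped enums in the same scope can both be bitmasks, because no
// shared enumerator name is injected into either of them.
enum Flags { F1 = 1, F2 = 2, F4 = 4 };
enum Perms { PermRead = 1, PermWrite = 2, PermExec = 4 };
enum Plain { PlainA = 1, PlainB = 2 }; // not declared as a bitmask

SKETCH_DECLARE_ENUM_AS_BITMASK(Flags, F4);
SKETCH_DECLARE_ENUM_AS_BITMASK(Perms, PermExec);

static_assert(std::is_same<decltype(F1 | F4), Flags>::value, "typed result");
static_assert(std::is_same<decltype(PermRead | PermWrite), Perms>::value,
              "typed result");
static_assert(std::is_same<decltype(PlainA | PlainB), int>::value,
              "built-in | is used for unmarked enums");
static_assert((F1 | F4) == static_cast<Flags>(5), "bits are combined");
```

Because the marker is a trait specialization rather than an extra enumerator, `Flags` and `Perms` coexist in one scope, which is the case the in-enum macro cannot handle for non-scoped enumerations.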

Differential Revision: https://reviews.llvm.org/D144202

16 months ago[AMDGPU] Add tests for future commit
Konstantina Mitropoulou [Sat, 18 Feb 2023 00:42:35 +0000 (16:42 -0800)]
[AMDGPU] Add tests for future commit

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D144312

16 months ago[MLIR][Affine] Fix affine.parallel op domain add
Uday Bondhugula [Tue, 21 Feb 2023 04:45:57 +0000 (10:15 +0530)]
[MLIR][Affine] Fix affine.parallel op domain add

Fix obvious bug in `addAffineParallelOpDomain` that would lead to
incorrect domain constraints for any affine.parallel op with
dimensionality greater than one.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D144349

16 months ago[MLIR] Add replaceUsesWithIf on Operation
Uday Bondhugula [Tue, 21 Feb 2023 04:40:15 +0000 (10:10 +0530)]
[MLIR] Add replaceUsesWithIf on Operation

Add replaceUsesWithIf on Operation along the lines of
Value::replaceUsesWithIf. This had been missing on Operation and is
convenient to replace multi-result operations' results conditionally.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D144348

16 months ago[TypeSize][NFC] Fix type-o
eopXD [Tue, 21 Feb 2023 04:15:56 +0000 (20:15 -0800)]
[TypeSize][NFC] Fix type-o

Signed-off-by: eop Chen <eop.chen@sifive.com>

16 months ago[AIX] Lower some memory intrinsics to millicode functions on AIX
esmeyi [Tue, 21 Feb 2023 03:25:49 +0000 (22:25 -0500)]
[AIX] Lower some memory intrinsics to millicode functions on AIX

Summary: Currently we lower MEMCPY/MEMMOVE/MEMSET/BZERO to the corresponding libc functions, which in turn call the millicode functions on AIX. Lowering these intrinsics directly to the millicode functions saves one call layer.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D143997

16 months ago[X86][NFC] Reorganize X86InstrInfo.td
Wang, Xin10 [Tue, 21 Feb 2023 02:16:47 +0000 (10:16 +0800)]
[X86][NFC] Reorganize X86InstrInfo.td

X86InstrInfo.td currently holds many instruction and pattern
definitions that arguably do not belong there. Extract them
and move them to other files.

This makes it clearer that X86InstrInfo.td only defines
X86-specific properties and does not contain detailed
instruction definitions.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D144244

16 months ago[X86] Precommit a test
Kazu Hirata [Tue, 21 Feb 2023 01:00:03 +0000 (17:00 -0800)]
[X86] Precommit a test

This is for:

https://github.com/llvm/llvm-project/issues/60854

16 months ago[X86] Improve (select carry C1+1 C1)
Kazu Hirata [Tue, 21 Feb 2023 00:38:21 +0000 (16:38 -0800)]
[X86] Improve (select carry C1+1 C1)

Without this patch:

  return X < 4 ? 3 : 2;

  return X < 9 ? 7 : 6;

are compiled as:

  31 c0                   xor    %eax,%eax
  83 ff 04                cmp    $0x4,%edi
  0f 93 c0                setae  %al
  83 f0 03                xor    $0x3,%eax

  31 c0                   xor    %eax,%eax
  83 ff 09                cmp    $0x9,%edi
  0f 92 c0                setb   %al
  83 c8 06                or     $0x6,%eax

respectively.  With this patch, we generate:

  31 c0                   xor    %eax,%eax
  83 ff 04                cmp    $0x4,%edi
  83 d0 02                adc    $0x2,%eax

  31 c0                   xor    %eax,%eax
  83 ff 09                cmp    $0x9,%edi
  83 d0 06                adc    $0x6,%eax

respectively, saving 3 bytes while reducing the tree height.

This patch recognizes the equivalence of OR and ADD
(if bits do not overlap) and delegates to combineAddOrSubToADCOrSBB
for further processing.  The same applies to the equivalence of XOR
and SUB.
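The arithmetic behind this rewrite can be checked with a small model. The sketch below assumes `X` is unsigned (as the `setae`/`setb` conditions suggest) and models the carry flag as a plain 0/1 value; the function names are made up for illustration and are not part of the patch.

```cpp
#include <cstdint>

// Source-level form from the commit message: X < Limit ? C1 + 1 : C1.
constexpr uint32_t selectForm(uint32_t X, uint32_t Limit, uint32_t C1) {
  return X < Limit ? C1 + 1 : C1;
}

// Models "xor %eax,%eax; cmp $Limit,%edi; adc $C1,%eax": the unsigned
// compare sets the carry flag when X < Limit, and adc computes
// 0 + C1 + carry.
constexpr uint32_t adcForm(uint32_t X, uint32_t Limit, uint32_t C1) {
  const uint32_t Carry = X < Limit ? 1u : 0u;
  return 0u + C1 + Carry;
}

// OR behaves like ADD when the operands share no set bits.
constexpr bool orEqualsAdd(uint32_t A, uint32_t B) {
  return (A & B) != 0 || (A | B) == (A + B);
}

// XOR behaves like SUB when every set bit of B is also set in A.
constexpr bool xorEqualsSub(uint32_t A, uint32_t B) {
  return (A & B) != B || (A ^ B) == (A - B);
}

// Compares both forms over a small range of inputs.
constexpr bool formsAgree(uint32_t Limit, uint32_t C1) {
  for (uint32_t X = 0; X < 64; ++X)
    if (selectForm(X, Limit, C1) != adcForm(X, Limit, C1))
      return false;
  return true;
}

static_assert(formsAgree(4, 2), "X < 4 ? 3 : 2");
static_assert(formsAgree(9, 6), "X < 9 ? 7 : 6");
static_assert(orEqualsAdd(6, 1), "setb | 6 == setb + 6");
static_assert(xorEqualsSub(3, 1), "3 ^ setae == 3 - setae");
```

The last two assertions correspond to the `or $0x6` and `xor $0x3` instructions in the unpatched output: once they are viewed as an add and a subtract of a 0/1 flag, both fit the add-with-carry form.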

Differential Revision: https://reviews.llvm.org/D143838

16 months ago[X86] Add test case that clobber base pointer register.
Luo, Yuanke [Mon, 20 Feb 2023 12:00:37 +0000 (20:00 +0800)]
[X86] Add test case that clobber base pointer register.

16 months ago[SLP][X86] minimum-sizes.ll - add AVX512 test coverage
Simon Pilgrim [Mon, 20 Feb 2023 23:31:49 +0000 (23:31 +0000)]
[SLP][X86] minimum-sizes.ll - add AVX512 test coverage

As noticed on D144128, we need better AVX512 coverage for GEP vectorization

16 months ago[Support] Silence warning with Clang ToT.
Alexandre Ganea [Mon, 20 Feb 2023 22:16:21 +0000 (17:16 -0500)]
[Support] Silence warning with Clang ToT.

This fixes the following warning on Windows with latest Clang:
```
[160/3057] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/Signals.cpp.obj
In file included from C:/git/llvm-project/llvm/lib/Support/Signals.cpp:260:
C:/git/llvm-project/llvm/lib/Support/Windows/Signals.inc(834,15): warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
  if (RetCode == (0xE0000000 | EX_IOERR))
      ~~~~~~~ ^   ~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
```

16 months ago[SLP][X86] load-merge.ll - add AVX512 test coverage
Simon Pilgrim [Mon, 20 Feb 2023 23:21:27 +0000 (23:21 +0000)]
[SLP][X86] load-merge.ll - add AVX512 test coverage

As noticed on D144128, we need better AVX512 coverage for GEP vectorization

16 months ago[PowerPC] Correctly use ELFv2 ABI on all OS's that use the ELFv2 ABI
Brad Smith [Mon, 20 Feb 2023 22:57:15 +0000 (17:57 -0500)]
[PowerPC] Correctly use ELFv2 ABI on all OS's that use the ELFv2 ABI

Add a member function isPPC64ELFv2ABI() to determine what ABI is used on the
64-bit PowerPC big endian operating environment.

Reviewed By: nemanjai, dim, pkubaj

Differential Revision: https://reviews.llvm.org/D144321

16 months agoUse modern @got syntax in tsan assembly, instead of old style non_lazy_pointers....
Peter Cooper [Mon, 13 Feb 2023 23:38:01 +0000 (15:38 -0800)]
Use modern @got syntax in tsan assembly, instead of old style non_lazy_pointers.  NFC

Reviewed By: kubamracek, yln, wrotki, dvyukov

Differential Revision: https://reviews.llvm.org/D143959

16 months ago[RISCV][NFC] Add test for different LMULs in vectorizer
Luke Lau [Fri, 10 Feb 2023 01:17:14 +0000 (01:17 +0000)]
[RISCV][NFC] Add test for different LMULs in vectorizer

This is a test for an upcoming patch that proposes to change the default LMUL used by the loop vectorizer from 1 to 2

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D143722

16 months ago[libc] Fix GPU include directories not being set properly
Joseph Huber [Mon, 20 Feb 2023 21:42:28 +0000 (15:42 -0600)]
[libc] Fix GPU include directories not being set properly

Summary:
For some reason, this variable was set after the point where it was used,
causing weird behaviour when including the standard headers. Fix it.

16 months ago[InstCombine] relax constraint on udiv fold
Sanjay Patel [Mon, 20 Feb 2023 19:39:51 +0000 (14:39 -0500)]
[InstCombine] relax constraint on udiv fold

The pair of div folds was just added with:
4966d8ebe1bbe5bd6a4d28

But as noted in the post-commit review, we don't actually need
the no-remainder requirement for an unsigned division (still
need the no-unsigned-wrap though):
https://alive2.llvm.org/ce/z/qHjK3Q
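As a rough sanity check of that claim, the fold `((X * C2) + C1) / C2 --> X + C1/C2` can be brute-forced over 8-bit unsigned values, skipping any input where the multiply or the add would wrap. This is only an illustrative sketch with made-up function names; the Alive2 link above is the actual proof.

```cpp
#include <cstdint>

// Returns true if ((X * C2) + C1) / C2 == X + C1 / C2 holds for every 8-bit
// unsigned X, C1 and nonzero C2 where neither the multiply nor the add wraps
// (the no-unsigned-wrap requirement). No condition is placed on C1 % C2.
inline bool udivFoldHoldsFor8Bit() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned C1 = 0; C1 < 256; ++C1)
      for (unsigned C2 = 1; C2 < 256; ++C2) {
        const unsigned Mul = X * C2;
        if (Mul > 255)
          continue; // the 8-bit multiply would wrap
        const unsigned Add = Mul + C1;
        if (Add > 255)
          continue; // the 8-bit add would wrap
        if (Add / C2 != X + C1 / C2)
          return false;
      }
  return true;
}

// Contrast: with signed division, truncation toward zero breaks the identity
// when C1 is not a multiple of C2 (here X = -1, C2 = 2, C1 = 1).
constexpr bool signedCaseDiffersWithRemainder() {
  return ((-1 * 2) + 1) / 2 != -1 + 1 / 2;
}

static_assert(signedCaseDiffersWithRemainder(), "0 vs. -1");
```

The unsigned case works because all quantities are non-negative, so flooring `(X * C2 + C1) / C2` always yields `X` plus the floor of `C1 / C2`.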

16 months ago[debug-info][codegen] Prevent creation of self-referential SP node
Felipe de Azevedo Piovezan [Fri, 10 Feb 2023 20:21:46 +0000 (15:21 -0500)]
[debug-info][codegen] Prevent creation of self-referential SP node

The function `CGDebugInfo::EmitFunctionDecl` is supposed to create a
declaration -- never a _definition_ -- of a subprogram. This is made
evident by the fact that the SPFlags never have the "Definition" bit
set by that function.

However, when `EmitFunctionDecl` calls `DIBuilder::createFunction`, it
still tries to fill the "Declaration" argument by passing it the result
of `getFunctionDeclaration(D)`. This will query an internal cache of
previously created declarations and, for most code paths, we return
nullptr; all is good.

However, as reported in [0], there are pathological cases in which we
attempt to recreate a declaration, so the cache query succeeds,
resulting in a subprogram declaration whose declaration field points to
another declaration. Through a series of RAUWs, the declaration field
ends up pointing to the SP itself. Self-referential MDNodes can't be
`unique`, which causes the verifier to fail (declarations must be
`unique`).

We can argue that the caller should check the cache first, but this is
not a correctness issue (declarations are `unique` anyway). The bug is
that `CGDebugInfo::EmitFunctionDecl` should always pass `nullptr` to the
declaration argument of `DIBuilder::createFunction`, expressing the fact
that declarations don't point to other declarations. AFAICT this is not
something for which any reasonable meaning exists.

This seems a lot like a copy-paste mistake that has survived for ~10
years, since other places in this file have the exact same call almost
token-by-token.

I've tested this by compiling LLVMSupport with and without the patch, O2
and O0, and comparing the dwarfdump of the lib. The dumps are identical
modulo the attributes decl_file/producer/comp_dir.

[0]: https://github.com/llvm/llvm-project/issues/59241

Differential Revision: https://reviews.llvm.org/D143921

16 months ago[clang-tidy] add primitive types for hungarian identifier-naming (unsigned char and...
Alexis Murzeau [Mon, 20 Feb 2023 19:18:36 +0000 (19:18 +0000)]
[clang-tidy] add primitive types for hungarian identifier-naming (unsigned char and void)

Add `unsigned char` and `void` types to recognized PrimitiveTypes.
Fixes: #60670

Depends on D144037

Reviewed By: carlosgalvezp

Differential Revision: https://reviews.llvm.org/D144041

16 months ago[clang-tidy] allow tests to use --config-file instead of --config
Alexis Murzeau [Mon, 20 Feb 2023 19:03:10 +0000 (19:03 +0000)]
[clang-tidy] allow tests to use --config-file instead of --config

The previous way to test hungarian notation doesn't check CHECK-FIXES.

This will allow readability-identifier-naming tests of Hungarian
notation to keep the use of an external .clang-tidy file (not embedded
within the .cpp test file) and properly check CHECK-FIXES.

Also, it turns out the hungarian notation tests use the wrong
.clang-tidy file, so fix that too to make these tests ok.

This is a part of a fix for issue https://github.com/llvm/llvm-project/issues/60670.

Reviewed By: carlosgalvezp

Differential Revision: https://reviews.llvm.org/D144037

16 months ago[InstCombine] auto-generate test CHECK lines; NFC
Sanjay Patel [Mon, 20 Feb 2023 19:04:03 +0000 (14:04 -0500)]
[InstCombine] auto-generate test CHECK lines; NFC

The check line was not enabled until bfb1559fbe2fb656f3,
and then it was excessive, so the test started failing.

16 months ago[InstCombine] distribute div over add with matching mul-by-constant
Sanjay Patel [Mon, 20 Feb 2023 18:42:04 +0000 (13:42 -0500)]
[InstCombine] distribute div over add with matching mul-by-constant

((X * C2) + C1) / C2 --> X + C1/C2
https://alive2.llvm.org/ce/z/P66io8
https://alive2.llvm.org/ce/z/vghegw

This could be made more general -- the multiplier could be a
multiple of the divisor -- but this is the pattern from
issue #60754.

16 months ago[InstCombine] add tests for div with muladd operand; NFC
Sanjay Patel [Mon, 20 Feb 2023 18:09:17 +0000 (13:09 -0500)]
[InstCombine] add tests for div with muladd operand; NFC

issue #60754

16 months ago[InstCombine] add tests for add with sub-from-constant operand; NFC
Sanjay Patel [Mon, 20 Feb 2023 16:20:30 +0000 (11:20 -0500)]
[InstCombine] add tests for add with sub-from-constant operand; NFC

16 months ago[NFC] Fix missing colon in CHECK directives
Tiwari Abhinav Ashok Kumar [Mon, 20 Feb 2023 18:29:29 +0000 (23:59 +0530)]
[NFC] Fix missing colon in CHECK directives

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D144412

16 months ago[lldb] Use llvm::rotr (NFC)
Kazu Hirata [Mon, 20 Feb 2023 18:38:18 +0000 (10:38 -0800)]
[lldb] Use llvm::rotr (NFC)

16 months ago[AArch64] Add tests for saba (NFC)
Ricardo Jesus [Mon, 20 Feb 2023 15:43:59 +0000 (15:43 +0000)]
[AArch64] Add tests for saba (NFC)

Tests in sve-saba.ll currently exhibit inefficient codegen.

Differential Revision: https://reviews.llvm.org/D144399

16 months ago[InstCombine] canonicalize urem as cmp+select
zhongyunde [Mon, 20 Feb 2023 15:49:13 +0000 (23:49 +0800)]
[InstCombine] canonicalize urem as cmp+select

Fix https://github.com/llvm/llvm-project/issues/60546

Reviewed By: nikic, efriedma, RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D143883

16 months ago[InstCombine] Remove early constant fold
Nikita Popov [Mon, 20 Feb 2023 08:46:54 +0000 (09:46 +0100)]
[InstCombine] Remove early constant fold

InstCombine currently performs a constant folding attempt as part
of the main InstCombine loop, before visiting the instruction.
However, each visit method will also attempt to simplify the
instruction, which will in turn constant fold it. (Additionally,
we also constant fold instructions before the main InstCombine loop
and use a constant folding IR builder, so this is doubly redundant.)

There is one place where InstCombine visit methods currently don't
call into simplification, and that's casts. To be conservative,
I've added an explicit constant folding call there (though it has
no impact on tests).

This makes for a mild compile-time improvement and in particular
mitigates the compile-time regression from enabling load
simplification in be88b5814d9efce131dbc0c8e288907e2e6c89be.

Differential Revision: https://reviews.llvm.org/D144369

16 months ago[test] precommit some tests for D143883 NFC
zhongyunde [Mon, 20 Feb 2023 12:50:55 +0000 (20:50 +0800)]
[test] precommit some tests for D143883 NFC

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D144372

16 months ago[mlir][bufferization] Fix crash in EmptyTensorElimination
Matthias Springer [Mon, 20 Feb 2023 15:32:43 +0000 (16:32 +0100)]
[mlir][bufferization] Fix crash in EmptyTensorElimination

Differential Revision: https://reviews.llvm.org/D144389

16 months ago[InstCombine] Use CaptureTracking in foldAllocaCmp()
Nikita Popov [Mon, 20 Feb 2023 15:35:58 +0000 (16:35 +0100)]
[InstCombine] Use CaptureTracking in foldAllocaCmp()

foldAllocaCmp() checks whether the alloca is not captured (ignoring
the icmp). Replace the manual implementation of escape analysis
with CaptureTracking.

The primary practical difference is that CaptureTracking handles
nocapture arguments, while foldAllocaCmp() was using a hardcoded
list.

This is basically just the CaptureTracking refactoring from D120371
without the other changes.

16 months ago[libc] Fix dependencies for generating the GPU binary file
Joseph Huber [Fri, 17 Feb 2023 18:06:02 +0000 (12:06 -0600)]
[libc] Fix dependencies for generating the GPU binary file

This patch adjusts the way dependencies are handled in the packaged
version of the GPU libc runtime. Previously the files were not getting
updated properly in the install when they changed.

Depends on D144214

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D144280

16 months ago[libc] Support add_object_library for the GPU build
Joseph Huber [Thu, 16 Feb 2023 20:46:39 +0000 (14:46 -0600)]
[libc] Support add_object_library for the GPU build

This patch unifies the handling of generating the GPU build targets
between the `add_entrypoint_library` and the `add_object_library`
functions. The `_build_gpu_objects` function will create two targets.
One contains a single object file with several GPU binaries embedded in
it, a so-called fatbinary. The other is a direct compile of the
supported target to be used internally only. This patch pulls out some
of the properties logic so that we can handle both more easily. This
patch also required adding an override `NO_GPU_BUILD` for cases when
we only want to build the source file as normal.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D144214

16 months ago[AArch64] Add SLP test for abs (NFC)
Ricardo Jesus [Mon, 20 Feb 2023 14:34:29 +0000 (14:34 +0000)]
[AArch64] Add SLP test for abs (NFC)

Differential Revision: https://reviews.llvm.org/D144376

16 months ago[Libomptarget] Implement the host memory allocator with fine grained memory
Joseph Huber [Fri, 10 Feb 2023 20:37:20 +0000 (14:37 -0600)]
[Libomptarget] Implement the host memory allocator with fine grained memory

This patch should enable the "Host" allocation using fine-grained
memory. As far as I understand, this is HSA managed memory that is
available to the host, but can be accessed by the device as well.
The original patch that introduced these extensions just stipulated that
it's "non-migratable" memory, which is most likely true because it's
managed by the host but accessible by the device. This should work
sufficiently well for what we expect the "host" allocation to do.

Depends on D143771

Reviewed By: kevinsala

Differential Revision: https://reviews.llvm.org/D143775

16 months ago[Libmoptarget] Enable the shared allocator for AMDGPU
Joseph Huber [Fri, 10 Feb 2023 19:13:21 +0000 (13:13 -0600)]
[Libmoptarget] Enable the shared allocator for AMDGPU

Currently, the AMDGPU plugin does not support the `TARGET_ALLOC_SHARED`
allocation kind. We use the fine-grained memory allocator for the
"host" alloc, which is most likely not what is intended. Fine-grained
memory can be accessed by all agents, so it should be considered shared.
This patch removes the use of fine-grained memory for the host
allocator. A later patch will add support for this via the
`hsa_amd_memory_lock` method.

Reviewed By: kevinsala

Differential Revision: https://reviews.llvm.org/D143771

16 months ago[ConstraintElimination] Add tests to check for `or` instruction decomposition if...
Zain Jaffal [Mon, 20 Feb 2023 13:56:00 +0000 (13:56 +0000)]
[ConstraintElimination] Add tests to check for `or` instruction decomposition if a constant operand is < 2^known_zero_bits of the first operand.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D142545

16 months agoasan: fix crash in strdup on malloc failure
Dmitry Vyukov [Mon, 20 Feb 2023 10:58:15 +0000 (11:58 +0100)]
asan: fix crash in strdup on malloc failure

There are some programs that try to handle all malloc failures.
Let's allow testing of such programs.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D144374

16 months agoRework "llvm-tblgen: Anonymize some functions.", llvmorg-17-init-2668-gc45e90cf152d
NAKAMURA Takumi [Sun, 19 Feb 2023 22:12:57 +0000 (07:12 +0900)]
Rework "llvm-tblgen: Anonymize some functions.", llvmorg-17-init-2668-gc45e90cf152d

Anonymous namespace should be applied only to class definitions.

16 months ago[SLP]Add shuffling of extractelements to avoid extra costs/data movement.
Alexey Bataev [Tue, 10 Jan 2023 12:28:21 +0000 (04:28 -0800)]
[SLP]Add shuffling of extractelements to avoid extra costs/data movement.

If the scalar must be extracted and then used in the gather node,
we can instead emit a shuffle instruction to avoid those extra
extractelements and the vector-to-scalar-and-back data movement.

Part of D110978

Differential Revision: https://reviews.llvm.org/D141940

16 months ago[AArch64] More consistently use buildvector for zero and all-ones constants
David Green [Mon, 20 Feb 2023 14:13:53 +0000 (14:13 +0000)]
[AArch64] More consistently use buildvector for zero and all-ones constants

The AArch64 backend will use legal BUILDVECTORs for zero vectors or all-ones
vectors, so during selection tablegen patterns can rely on immAllZerosV and
immAllOnesV pattern frags in patterns like vnot. It was not always consistent
though, which this patch attempts to fix by recognizing where constant splat +
insert vector element is used. The main outcome of this will be that full
vector movi v0.2d, #0000000000000000 will be used as opposed to movi d0, #0, as
per https://reviews.llvm.org/D53579. This helps simplify what tablegen will
see, to make pattern matching simpler.

Differential Revision: https://reviews.llvm.org/D144018

16 months ago[VPlan] Use usesScalars in shouldPack.
Florian Hahn [Mon, 20 Feb 2023 14:11:18 +0000 (14:11 +0000)]
[VPlan] Use usesScalars in shouldPack.

Suggested by @Ayal as follow-up improvement in D143864.

I was unable to find a case where this actually changes generated code,
but it unifies the code by using common infrastructure.

16 months ago[SME2][AArch64] Add multi-multi multiply-add long long intrinsics
Kerry McLaughlin [Mon, 20 Feb 2023 11:00:47 +0000 (11:00 +0000)]
[SME2][AArch64] Add multi-multi multiply-add long long intrinsics

Adds intrinsics for the following SME2 instructions (2 & 4 vectors):
 - smlall
 - smlsll
 - umlall
 - umlsll
 - usmlall

NOTE: These intrinsics are still in development and are subject to future changes.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D143277

16 months agoFix LLVM sphinx build
Aaron Ballman [Mon, 20 Feb 2023 13:35:42 +0000 (08:35 -0500)]
Fix LLVM sphinx build

This fixes the issue found by:
https://lab.llvm.org/buildbot/#/builders/30/builds/32127

16 months ago[IR] Add new intrinsics interleave and deinterleave vectors
Caroline Concatto [Mon, 20 Feb 2023 12:21:38 +0000 (12:21 +0000)]
[IR] Add new intrinsics interleave and deinterleave vectors

This patch adds 2 new intrinsics:

  ; Interleave two vectors into a wider vector
  <vscale x 4 x i64> @llvm.vector.interleave2.nxv2i64(<vscale x 2 x i64> %even, <vscale x 2 x i64> %odd)

  ; Deinterleave the odd and even lanes from a wider vector
  {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv2i64(<vscale x 4 x i64> %vec)

The main motivator for adding these intrinsics is to support vectorization of
complex types using scalable vectors.

The intrinsics are kept simple by only supporting a stride of 2, which makes
them easy to lower and type-legalize. A stride of 2 is sufficient to handle
complex types which only have a real/imaginary component.

The format of the intrinsics matches how `shufflevector` is used in
LoopVectorize. For example:

  using cf = std::complex<float>;

  void foo(cf * dst, int N) {
      for (int i=0; i<N; ++i)
          dst[i] += cf(1.f, 2.f);
  }

For this loop, LoopVectorize:
  (1) Loads a wide vector (e.g. <8 x float>)
  (2) Extracts odd lanes using shufflevector (leading to <4 x float>)
  (3) Extracts even lanes using shufflevector (leading to <4 x float>)
  (4) Performs the addition
  (5) Interleaves the two <4 x float> vectors into a single <8 x float> using
      shufflevector
  (6) Stores the wide vector.

In this example, we can 1-1 replace shufflevector in (2) and (3) with the
deinterleave intrinsic, and replace the shufflevector in (5) with the
interleave intrinsic.
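The lane semantics of the two intrinsics, and how they map onto the steps above, can be modeled on ordinary fixed-size containers. This is a sketch only: `interleave2`, `deinterleave2`, and `addComplexConstant` are hypothetical helper names operating on `std::vector`, not LLVM APIs, and a complex array is assumed to be stored as a flat sequence of real/imaginary pairs.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Models vector.interleave2: result lanes are Even[0], Odd[0], Even[1], ...
template <typename T>
std::vector<T> interleave2(const std::vector<T> &Even,
                           const std::vector<T> &Odd) {
  const std::size_t N = std::min(Even.size(), Odd.size());
  std::vector<T> Wide;
  Wide.reserve(2 * N);
  for (std::size_t I = 0; I < N; ++I) {
    Wide.push_back(Even[I]);
    Wide.push_back(Odd[I]);
  }
  return Wide;
}

// Models vector.deinterleave2: splits a wide vector into its even-indexed
// and odd-indexed lanes.
template <typename T>
std::pair<std::vector<T>, std::vector<T>>
deinterleave2(const std::vector<T> &Wide) {
  std::vector<T> Even, Odd;
  for (std::size_t I = 0; I + 1 < Wide.size(); I += 2) {
    Even.push_back(Wide[I]);
    Odd.push_back(Wide[I + 1]);
  }
  return {Even, Odd};
}

// The loop body of foo() on a flat real/imaginary array: deinterleave,
// add (1.f, 2.f) to the real and imaginary lanes, then interleave again.
inline void addComplexConstant(std::vector<float> &Dst) {
  auto Parts = deinterleave2(Dst);               // steps (2) and (3)
  for (float &Re : Parts.first)                  // step (4), real lanes
    Re += 1.f;
  for (float &Im : Parts.second)                 // step (4), imaginary lanes
    Im += 2.f;
  Dst = interleave2(Parts.first, Parts.second);  // step (5)
}
```

For example, calling `addComplexConstant` on the flat array `{0, 0, 10, 20}` (two complex values) produces `{1, 2, 11, 22}`.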

The SelectionDAG nodes might be extended to support higher strides (3, 4, etc)
as well in the future.

Similar to what was done for vector.splice and vector.reverse, the intrinsic
is lowered to a shufflevector when the type is fixed width, so to benefit from
existing code that was written to recognize/optimize shufflevector patterns.

Note that this approach does not prevent us from adding new intrinsics for other
strides, or adding a more generic shuffle intrinsic in the future. It just solves
the immediate problem of being able to vectorize loops with complex math.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D141924

16 months agoRevert "[AssumptionCache] caches @llvm.experimental.guard's"
Max Kazantsev [Mon, 20 Feb 2023 11:38:07 +0000 (18:38 +0700)]
Revert "[AssumptionCache] caches @llvm.experimental.guard's"

This reverts commit f9599bbc7a3f831e1793a549d8a7a19265f3e504.

For some reason it caused us a huge compile time regression in downstream
workloads. Not sure whether the source of it is in upstream code or not.
Temporarily reverting until investigated.

Differential Revision: https://reviews.llvm.org/D142330

16 months ago[LV] Harden the test of the minmax with index pattern. (NFC)
Mel Chen [Mon, 13 Feb 2023 13:28:42 +0000 (05:28 -0800)]
[LV] Harden the test of the minmax with index pattern. (NFC)

  - Add test config: -force-vector-width=4 -force-vector-interleave=1
  - New test case: The test case returns both the minimum value and the index.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D143905

16 months ago[VPlan] Move shouldPack outside of DEBUG ifdef.
Florian Hahn [Mon, 20 Feb 2023 10:53:45 +0000 (10:53 +0000)]
[VPlan] Move shouldPack outside of DEBUG ifdef.

This fixes a build failure with assertions disabled.

16 months ago[LowerTypeTests] Support generating Armv6-M jump tables. (reland)
Simon Tatham [Thu, 16 Feb 2023 15:34:33 +0000 (15:34 +0000)]
[LowerTypeTests] Support generating Armv6-M jump tables. (reland)

[Originally committed as f6ddf7781471b71243fa3c3ae7c93073f95c7dff;
reverted in bbef38352fbade9e014ec97d5991da5dee306da7 due to test
breakage; now relanded with the Arm tests conditioned on
`arm-registered-target`]

The LowerTypeTests pass emits a jump table in the form of an
`inlineasm` IR node containing a string representation of some
assembly. It tests the target triple to see what architecture it
should be generating assembly for. But that's not good enough for
`Triple::thumb`, because the 32-bit PC-relative `b.w` branch
instruction isn't available in all supported architecture versions. In
particular, Armv6-M doesn't support that instruction (although the
similar Armv8-M Baseline does).

Most of this patch is concerned with working out whether the
compilation target is Armv6-M or not, which I'm doing by going through
all the functions in the module, retrieving a TargetTransformInfo for
each one, and querying it via a new method I've added to check its
SubtargetInfo. If any function's TTI indicates that it's targeting an
architecture supporting B.W, then we assume we're also allowed to use
B.W in the jump table.

The Armv6-M compatible jump table format requires a temporary
register, and therefore also has to use the stack in order to restore
that register.

Another consequence of this change is that jump tables on Arm/Thumb
are no longer always the same size. In particular, on an architecture
that supports Arm and Thumb-1 but not Thumb-2, the Arm and Thumb
tables are different sizes from each other. As a consequence,
``getJumpTableEntrySize`` can no longer base its answer on the target
triple's architecture: it has to take into account the decision that
``selectJumpTableArmEncoding`` made, which meant I had to move that
function to an earlier point in the code and store its answer in the
``LowerTypeTestsModule`` class.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D143576

16 months ago[VPlan] Replace AlsoPack field with shouldPack() method (NFC).
Florian Hahn [Mon, 20 Feb 2023 10:28:24 +0000 (10:28 +0000)]
[VPlan] Replace AlsoPack field with shouldPack() method (NFC).

There is no need to update the AlsoPack field when creating
VPReplicateRecipes. It can be easily computed based on the VP def-use
chains when it is needed.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D143864

16 months ago[InstSimplify] Correct icmp_lshr test to use ult instead of slt
Matt Devereau [Mon, 20 Feb 2023 09:47:53 +0000 (09:47 +0000)]
[InstSimplify] Correct icmp_lshr test to use ult instead of slt

16 months ago[mlir][linalg][TransformOps] Connect hoistRedundantVectorTransfers
Nicolas Vasilache [Mon, 20 Feb 2023 09:40:18 +0000 (01:40 -0800)]
[mlir][linalg][TransformOps] Connect hoistRedundantVectorTransfers

Connect the hoistRedundantVectorTransfers functionality to the transform
dialect.

Authored-by: Quentin Colombet <quentin.colombet@gmail.com>
Differential Revision: https://reviews.llvm.org/D144260

16 months ago[InstCombine] Call simplifyLoadInst()
Nikita Popov [Wed, 15 Feb 2023 11:14:55 +0000 (12:14 +0100)]
[InstCombine] Call simplifyLoadInst()

InstCombine is supposed to be a superset of InstSimplify, but
failed to invoke load simplification.

Unfortunately, this causes a minor compile-time regression, which
will be mitigated in a future commit.

16 months ago[Test] Move test for D143726 to LICM
Max Kazantsev [Mon, 20 Feb 2023 09:42:14 +0000 (16:42 +0700)]
[Test] Move test for D143726 to LICM

Seems that it's a more appropriate place to do this transform.

16 months ago[InstCombine] Add additional load folding tests (NFC)
Nikita Popov [Mon, 20 Feb 2023 09:38:22 +0000 (10:38 +0100)]
[InstCombine] Add additional load folding tests (NFC)

These show that we currently fail to call load simplification from
InstCombine.

16 months ago[InstSimplify] Simplify icmp between Shl instructions of the same value
Matt Devereau [Tue, 31 Jan 2023 13:30:09 +0000 (13:30 +0000)]
[InstSimplify] Simplify icmp between Shl instructions of the same value

define i1 @compare_vscales() {
  %vscale = call i64 @llvm.vscale.i64()
  %vscalex2 = shl nuw nsw i64 %vscale, 1
  %vscalex4 = shl nuw nsw i64 %vscale, 2
  %cmp = icmp ult i64 %vscalex2, %vscalex4
  ret i1 %cmp
}

This IR is currently emitted by LLVM. This icmp is redundant as this snippet
can be simplified to true or false as both operands originate from the same
@llvm.vscale.i64() call.

Differential Revision: https://reviews.llvm.org/D142542

16 months ago[SCEV] Canonicalize ext(min/max(x, y)) to min/max(ext(x), ext(y))
Max Kazantsev [Mon, 20 Feb 2023 08:48:05 +0000 (15:48 +0700)]
[SCEV] Canonicalize ext(min/max(x, y)) to min/max(ext(x), ext(y))

I stumbled over this while trying to improve our exit count work. These expressions
are equivalent for complementary signed/unsigned ext and min/max (including umin_seq),
but they are not canonicalized and SCEV cannot recognize them as the same.
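The equivalence can be illustrated with an exhaustive check on 8-bit values widened to 32 bits. The sketch reads "complementary" as zext paired with umin/umax and sext paired with smin/smax; that reading, and the function names, are assumptions made for illustration and do not cover umin_seq.

```cpp
#include <algorithm>
#include <cstdint>

// Checks zext(umin/umax(a, b)) == umin/umax(zext(a), zext(b)) and
// sext(smin/smax(a, b)) == smin/smax(sext(a), sext(b)) for all 8-bit inputs.
inline bool matchingExtAndMinMaxAgree() {
  for (int A = 0; A < 256; ++A)
    for (int B = 0; B < 256; ++B) {
      const uint8_t UA = static_cast<uint8_t>(A);
      const uint8_t UB = static_cast<uint8_t>(B);
      const uint32_t ZA = UA, ZB = UB; // zero extension
      if (static_cast<uint32_t>(std::min(UA, UB)) != std::min(ZA, ZB))
        return false;
      if (static_cast<uint32_t>(std::max(UA, UB)) != std::max(ZA, ZB))
        return false;
    }
  for (int A = -128; A < 128; ++A)
    for (int B = -128; B < 128; ++B) {
      const int8_t SA = static_cast<int8_t>(A);
      const int8_t SB = static_cast<int8_t>(B);
      const int32_t XA = SA, XB = SB; // sign extension
      if (static_cast<int32_t>(std::min(SA, SB)) != std::min(XA, XB))
        return false;
      if (static_cast<int32_t>(std::max(SA, SB)) != std::max(XA, XB))
        return false;
    }
  return true;
}

// A mismatched pairing, zext with signed min, for a = 1 and b = -1:
// zext(smin(a, b)) is 255, but smin(zext(a), zext(b)) is 1.
inline bool zextOfSminDiffers() {
  const int8_t A = 1, B = -1;
  const uint32_t ExtOfMin = static_cast<uint8_t>(std::min(A, B));
  const int32_t ZA = static_cast<uint8_t>(A);
  const int32_t ZB = static_cast<uint8_t>(B);
  const int32_t MinOfExt = std::min(ZA, ZB);
  return ExtOfMin != static_cast<uint32_t>(MinOfExt);
}
```

The matching pairs agree because zero extension preserves unsigned order and sign extension preserves signed order, so the extension can be moved across the min or max.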

The benefit of this canonicalization is that SCEV can prove some new equivalences which
it couldn't prove before because of the different forms. There is 1 test where the trip count
seems pessimized. I could not directly figure out why, but it seems to be an unrelated issue
that we can fix. Other changes seem neutral or positive to me.

Differential Revision: https://reviews.llvm.org/D141481
Reviewed By: nikic

16 months agoMigrate away from the soft-deprecated functions in APInt.h (NFC)
Kazu Hirata [Mon, 20 Feb 2023 08:58:29 +0000 (00:58 -0800)]
Migrate away from the soft-deprecated functions in APInt.h (NFC)

Note that those functions on the left hand side are soft-deprecated in
favor of those on the right hand side:

  getMinSignedBits -> getSignificantBits
  getNullValue     -> getZero
  isNullValue      -> isZero
  isOneValue       -> isOne

16 months ago[llvm][Uniformity] A phi with an undef argument is not always divergent
Sameer Sahasrabuddhe [Mon, 20 Feb 2023 08:55:37 +0000 (14:25 +0530)]
[llvm][Uniformity] A phi with an undef argument is not always divergent

The uniformity analysis treated an undef argument to phi to be distinct from any
other argument, equivalent to calling PHINode::hasConstantValue() instead of
PHINode::hasConstantOrUndefValue(). Such a phi was reported as divergent. This
is different from the older divergence analysis which treats such a phi as
uniform. Fixed uniformity analysis to match the older behaviour.

The original behaviour was added to DivergenceAnalysis in D19013. But it is not
clear if relying on the undef value is safe. The defined values are not constant
per se; they just happen to be uniform and the non-constant uniform value may
not dominate the PHI.

Reviewed By: ruiling

Differential Revision: https://reviews.llvm.org/D144254

16 months ago[flang] Carry over the derived type from target in pointer remapping
Valentin Clement [Mon, 20 Feb 2023 08:43:57 +0000 (09:43 +0100)]
[flang] Carry over the derived type from target in pointer remapping

When calling PointerAssociateRemapping the dynamic type information
from the target needs to be carried over to the pointer if any.

Reviewed By: klausler

Differential Revision: https://reviews.llvm.org/D143717

16 months ago[InstSimplify] Fold LoadInst for uniform constant global variables
Kohei Asano [Mon, 20 Feb 2023 08:38:03 +0000 (09:38 +0100)]
[InstSimplify] Fold LoadInst for uniform constant global variables

Fold LoadInst for uniformly initialized constants, even if there
are non-constant GEP indices.
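
The reasoning behind the fold can be sketched in plain C++. This is only an illustration under simplifying assumptions (a flat array of one element type); the patch itself works on LLVM constants, and `uniformValue` is a hypothetical helper.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// If every element of a constant array holds the same value, a load from any
// in-bounds index yields that value, so the index does not need to be known.
// Returns the common value, or nullopt if the array is empty or not uniform.
template <typename T, std::size_t N>
std::optional<T> uniformValue(const std::array<T, N>& init) {
  if constexpr (N == 0) {
    return std::nullopt;
  } else {
    for (std::size_t i = 1; i < N; ++i)
      if (init[i] != init[0])
        return std::nullopt;
    return init[0];
  }
}
```

When the helper returns a value, any `init[i]` with an in-bounds but unknown `i` could be replaced by that value.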

Goal proof: https://alive2.llvm.org/ce/z/oZtVby

Motivated by https://github.com/rust-lang/rust/issues/107208

Differential Revision: https://reviews.llvm.org/D144184

16 months ago[libc][AArch64] Fix fullbuild when using G++/GCC
David Spickett [Fri, 3 Feb 2023 11:45:05 +0000 (11:45 +0000)]
[libc][AArch64] Fix fullbuild when using G++/GCC

The libc uses some functions that GCC does not currently
implement, that come from Arm's ACLE header usually.

These are:
```
__arm_wsr64
__arm_rsr64
__arm_wsr
__arm_rsr
```

This issue was reported to us (https://github.com/llvm/llvm-project/issues/60473)
and I've then reported that back to GCC (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108642).

Even if these functions are added, clang has some non standard extensions
to them that gcc may not take. So we're looking at a fix in gcc 13 at best,
and that may not be enough for what we're doing with them.

So I've added ifdefs to use alternatives with gcc.

For handling the stack pointer, inline assembly is unfortunately the only option.
I have verified that the single mov is essentially what __arm_rsr64 generates.

For fpsr and fpcr the gcc devs suggested using
https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/AArch64-Built-in-Functions.html#AArch64-Built-in-Functions.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D143261

16 months ago[InstSimplify] Add additional load folding tests (NFC)
Kohei Asano [Mon, 20 Feb 2023 08:23:39 +0000 (09:23 +0100)]
[InstSimplify] Add additional load folding tests (NFC)

For D144184.

16 months ago[mlir][llvm] Add atomic support to the StoreOp.
Tobias Gysi [Mon, 20 Feb 2023 07:46:33 +0000 (08:46 +0100)]
[mlir][llvm] Add atomic support to the StoreOp.

This revision adds atomic support to the StoreOp. It chooses
to print the atomic keywords together with the syncscope and
ordering arguments. The revision also implements verifiers to
ensure the constraints that apply to atomic store operations
are checked.

Depends on D144112

Reviewed By: Dinistro

Differential Revision: https://reviews.llvm.org/D144200

16 months agoUse APInt::getSignificantBits instead of APInt::getMinSignedBits (NFC)
Kazu Hirata [Mon, 20 Feb 2023 07:56:52 +0000 (23:56 -0800)]
Use APInt::getSignificantBits instead of APInt::getMinSignedBits (NFC)

Note that getMinSignedBits has been soft-deprecated in favor of
getSignificantBits.

16 months ago[NFC] Remove the unused parameter in Sema::PushGlobalModuleFragment
Chuanqi Xu [Mon, 20 Feb 2023 07:37:11 +0000 (15:37 +0800)]
[NFC] Remove the unused parameter in Sema::PushGlobalModuleFragment

The `IsImplicit` parameter should be removed since it is not used now.

16 months ago[llvm] Use APInt::isAllOnes instead of isAllOnesValue (NFC)
Kazu Hirata [Mon, 20 Feb 2023 07:35:39 +0000 (23:35 -0800)]
[llvm] Use APInt::isAllOnes instead of isAllOnesValue (NFC)

Note that isAllOnesValue has been soft-deprecated in favor of
isAllOnes.

16 months ago[NFC] Remove unused Sema::DirectModuleImports
Chuanqi Xu [Mon, 20 Feb 2023 07:07:07 +0000 (15:07 +0800)]
[NFC] Remove unused Sema::DirectModuleImports

Sema::DirectModuleImports is not used now. Remove it for clarity.

16 months agoUse APInt::isOne instead of APInt::isOneValue (NFC)
Kazu Hirata [Mon, 20 Feb 2023 07:06:36 +0000 (23:06 -0800)]
Use APInt::isOne instead of APInt::isOneValue (NFC)

Note that isOneValue has been soft-deprecated in favor of isOne.

16 months agoUse APInt::getAllOnes instead of APInt::getAllOnesValue (NFC)
Kazu Hirata [Mon, 20 Feb 2023 06:54:23 +0000 (22:54 -0800)]
Use APInt::getAllOnes instead of APInt::getAllOnesValue (NFC)

Note that getAllOnesValue has been soft-deprecated in favor of
getAllOnes.

16 months ago[llvm] Use APInt::getZero instead of APInt::getNullValue (NFC)
Kazu Hirata [Mon, 20 Feb 2023 06:42:01 +0000 (22:42 -0800)]
[llvm] Use APInt::getZero instead of APInt::getNullValue (NFC)

Note that APInt::getNullValue has been soft-deprecated in favor of
APInt::getZero.

16 months ago[SimpleLoopUnswitch] Fix an assert in injectPendingInvariantConditions
Serguei Katkov [Mon, 20 Feb 2023 06:03:18 +0000 (13:03 +0700)]
[SimpleLoopUnswitch] Fix an assert in injectPendingInvariantConditions

Since canonicalizeForInvariantConditionInjection was introduced, the
in-loop successor may be the second successor.

Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D144361

16 months agoUse APInt::isZero instead of APInt::isNullValue (NFC)
Kazu Hirata [Mon, 20 Feb 2023 06:23:57 +0000 (22:23 -0800)]
Use APInt::isZero instead of APInt::isNullValue (NFC)

Note that APInt::isNullValue has been soft-deprecated in favor of
APInt::isZero.

16 months ago[Modules] Handle the visibility of GMF during the template instantiation
Chuanqi Xu [Mon, 20 Feb 2023 05:58:28 +0000 (13:58 +0800)]
[Modules] Handle the visibility of GMF during the template instantiation

Close https://github.com/llvm/llvm-project/issues/60775

Previously, we marked all the declarations in the GMF as not visible
to other module units. But this is too strict, and users may run into
problems during template instantiation, as the above example shows.
This patch addresses the problem.

16 months ago[SCEV] Support umin/smin in SCEVLoopGuardRewriter
Max Kazantsev [Mon, 20 Feb 2023 05:39:33 +0000 (12:39 +0700)]
[SCEV] Support umin/smin in SCEVLoopGuardRewriter

Adds support for these SCEVs to cover more cases.

Differential Revision: https://reviews.llvm.org/D143259
Reviewed By: dmakogon, fhahn

16 months agoUse APInt::count{l,r}_{zero,one} (NFC)
Kazu Hirata [Mon, 20 Feb 2023 06:04:47 +0000 (22:04 -0800)]
Use APInt::count{l,r}_{zero,one} (NFC)

16 months ago[RISCV] Add more tests for D144166. NFC
Craig Topper [Mon, 20 Feb 2023 05:43:15 +0000 (21:43 -0800)]
[RISCV] Add more tests for D144166. NFC

Adding load and store tests. Addressing post commit feedback.

16 months ago[LoopIdiomRecognize] Remove legacy pass
Fangrui Song [Mon, 20 Feb 2023 05:39:47 +0000 (21:39 -0800)]
[LoopIdiomRecognize] Remove legacy pass

Following recent changes to remove non-core legacy passes.

16 months ago[Fuchsia] Use cleaner method of adding driver binary
Alex Brachet [Mon, 20 Feb 2023 03:57:23 +0000 (03:57 +0000)]
[Fuchsia] Use cleaner method of adding driver binary

16 months ago[Fuchsia] Fix driver build on Windows
Alex Brachet [Mon, 20 Feb 2023 03:33:29 +0000 (03:33 +0000)]
[Fuchsia] Fix driver build on Windows

Don't include llvm-driver when building for Windows

16 months ago[clang-format] Put ports on separate lines in Verilog module headers
sstwcw [Mon, 20 Feb 2023 03:03:33 +0000 (03:03 +0000)]
[clang-format] Put ports on separate lines in Verilog module headers

New:
```
module mh1
    (input var int in1,
     input var in2, in3,
     output tagged_st out);
endmodule
```

Old:
```
module mh1
    (input var int in1, input var in2, in3, output tagged_st out);
endmodule
```

`getNextNonComment` was modified to return a non-const pointer because
we needed to use it that way in `verilogGroupDecl`.

The comment on line 2626 was a typo.  We corrected it while modifying
the function.

Reviewed By: MyDeveloperDay

Differential Revision: https://reviews.llvm.org/D143825

16 months agoRecommit [Coroutines] Stop supporting std::experimental::coroutine_traits
Chuanqi Xu [Mon, 20 Feb 2023 02:26:41 +0000 (10:26 +0800)]
Recommit [Coroutines] Stop supporting std::experimental::coroutine_traits

As we discussed before, we should stop supporting
std::experimental::coroutine_traits in clang 17. Now that clang 16 has
branched, we can clean them up.

All the removed tests have been duplicated before.

16 months ago[GISelEmitter][NFC] Correct path of GISel's td file in the comment.
Kai Luo [Mon, 20 Feb 2023 02:05:37 +0000 (10:05 +0800)]
[GISelEmitter][NFC] Correct path of GISel's td file in the comment.

`include/llvm/CodeGen/TargetGlobalISel.td` no longer exists.

16 months agoAMDGPU: Restrict foldFreeOpFromSelect combine based on legal source mods
Matt Arsenault [Mon, 23 Jan 2023 15:22:33 +0000 (11:22 -0400)]
AMDGPU: Restrict foldFreeOpFromSelect combine based on legal source mods

Provides a small code size savings for some f32 cases.

16 months agoReland "[Fuchsia] Enable llvm-driver build".
Alex Brachet [Mon, 20 Feb 2023 01:57:40 +0000 (01:57 +0000)]
Reland "[Fuchsia] Enable llvm-driver build".

The MacOS problem has been fixed. Additionally, don't enable the
driver build on Windows. We can look into enabling it later if
symlinks work better than I think on Windows.

Differential Revision: https://reviews.llvm.org/D144287

16 months agoAMDGPU: Teach fneg combines that select has source modifiers
Matt Arsenault [Thu, 15 Dec 2022 00:23:55 +0000 (19:23 -0500)]
AMDGPU: Teach fneg combines that select has source modifiers

We do match source modifiers for f32 typed selects already, but the
combiner code was never informed of this.

A long time ago the documentation lied and stated that source
modifiers don't work for v_cndmask_b32 when they in fact do. We had a
bunch of code operating under the assumption that they don't support
source modifiers, so we tried to move fnegs around to work around
this.

Gets a few small improvements here and there. The main hazard to watch
out for is infinite loops in the combiner since we try to move fnegs
up and down the DAG. For now, don't fold fneg directly into select.
The generic combiner does this for a restricted set of cases
when getNegatedExpression obviously shows an improvement for both
operands. It turns out to be trickier to avoid infinite loops in the
combiner in conjunction with pulling out source modifiers, so
leave this for a later commit.

16 months ago[GlobalISel] Fix a store-merging bug due to use of >= instead of >.
Amara Emerson [Sun, 19 Feb 2023 23:36:36 +0000 (15:36 -0800)]
[GlobalISel] Fix a store-merging bug due to use of >= instead of >.

This fixes a corner case where we would skip doing an alias check because of a
>= vs > bug, due to the presence of a non-aliasing instruction, in this case
the load %safeld.

Fixes issue #59376

16 months ago[CMake] Fix driver build on MacOS
Alex Brachet [Sun, 19 Feb 2023 23:42:11 +0000 (23:42 +0000)]
[CMake] Fix driver build on MacOS

16 months ago[InstCombine] canonicalize "extract lowest set bit" away from cttz intrinsic
Sanjay Patel [Sun, 19 Feb 2023 15:33:30 +0000 (10:33 -0500)]
[InstCombine] canonicalize "extract lowest set bit" away from cttz intrinsic

1 << (cttz X) --> -X & X
https://alive2.llvm.org/ce/z/qv3E9e
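
The identity can also be checked by brute force. This is a minimal C++ sketch with hypothetical helper names. It excludes zero, because the loop-based helper would not terminate there and a shift by the full bit width is undefined in C++.

```cpp
#include <cstdint>

// Counts trailing zero bits with a plain loop. Precondition: x != 0.
constexpr unsigned countTrailingZeros(std::uint32_t x) {
  unsigned n = 0;
  while ((x & 1u) == 0u) {
    x >>= 1;
    ++n;
  }
  return n;
}

// "Extract lowest set bit" written as 1 << cttz(x). Precondition: x != 0.
constexpr std::uint32_t lowestSetBitViaCttz(std::uint32_t x) {
  return std::uint32_t{1} << countTrailingZeros(x);
}

// The same value written as -x & x, using only negate and and.
constexpr std::uint32_t lowestSetBitViaNegate(std::uint32_t x) {
  return (0u - x) & x;
}

// Compares both forms for every x in [1, limit].
constexpr bool formsAgree(std::uint32_t limit) {
  for (std::uint32_t x = 1; x <= limit; ++x)
    if (lowestSetBitViaCttz(x) != lowestSetBitViaNegate(x))
      return false;
  return true;
}

static_assert(formsAgree(4096), "1 << cttz(x) equals -x & x for non-zero x");
```

Note that the negate form reads `x` twice, which is the extra use of the input value mentioned below.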

This creates an extra use of the input value, so that's generally
not preferred, but there are advantages to this direction:
1. 'negate' and 'and' allow for better analysis than 'cttz'.
2. This is more likely to induce follow-on transforms (in the
   example from issue #60801, we'll get the decrement pattern).
3. The more basic ALU ops are more likely to result in better
   codegen across a variety of targets.

This won't solve the motivating bugs (see issue #60799) because
we do not recognize the redundant icmp+sel, and the x86 backend
may not have the pattern-matching to produce the optimal BMI
instructions.

Differential Revision: https://reviews.llvm.org/D144329

16 months agoRecommit "[Support] change StringMap hash function from djbHash to xxHash"
Erik Desjardins [Sun, 19 Feb 2023 18:47:09 +0000 (13:47 -0500)]
Recommit "[Support] change StringMap hash function from djbHash to xxHash"

This reverts commit 37eb9d13f891f7656f811516e765b929b169afe0.

Test failures have been fixed:

- ubsan failure fixed by 72eac42f21c0f45a27f3eaaff9364cbb5189b9e4
- warn-unsafe-buffer-usage-fixits-local-var-span.cpp fixed by
  03cc52dfd1dbb4a59b479da55e87838fb93d2067 (wasn't related)
- test-output-format.ll failure was spurious, build failed at
  https://lab.llvm.org/buildbot/#/builders/54/builds/3545 (b4431b2d945b6fc19b1a55ac6ce969a8e06e1e93)
  but passed at
  https://lab.llvm.org/buildbot/#/builders/54/builds/3546 (5ae99be0377248c74346096dc475af254a3fc799)
  which is before my revert
  https://github.com/llvm/llvm-project/compare/b4431b2d945b6fc19b1a55ac6ce969a8e06e1e93...5ae99be0377248c74346096dc475af254a3fc799

Original commit message:

    Depends on https://reviews.llvm.org/D142861.

    Alternative to https://reviews.llvm.org/D137601.

    xxHash is much faster than djbHash. This makes a simple Rust test case with a large constant string 10% faster to compile.

    Previous attempts at changing this hash function (e.g. https://reviews.llvm.org/D97396) had to be reverted due to breaking tests that depended on iteration order.
    No additional tests fail with this patch compared to `main` when running `check-all` with `-DLLVM_ENABLE_PROJECTS="all"` (on a Linux host), so I hope I found everything that needs to be changed.

    Differential Revision: https://reviews.llvm.org/D142862
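
For context, the classic DJB hash can be sketched as below. This is a standalone illustration of the well-known formulation (seed 5381, multiply by 33, add the byte), not a copy of LLVM's source, and `djbHashSketch` is a hypothetical name. It folds in one byte per step, which is one reason a hash of this shape tends to be slower on long keys than a hash that consumes larger chunks.

```cpp
#include <cstdint>
#include <string_view>

// Classic DJB hash: start from 5381 and fold in each byte as h * 33 + c.
constexpr std::uint32_t djbHashSketch(std::string_view s,
                                      std::uint32_t h = 5381) {
  for (unsigned char c : s)
    h = (h << 5) + h + c; // (h << 5) + h is h * 33
  return h;
}
```

Because a hash table places entries according to these values, swapping the hash function changes iteration order, which is why tests that depended on that order had to be fixed.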

16 months ago[SLP] Fix infinite loop in isUndefVector.
Florian Hahn [Sun, 19 Feb 2023 21:42:04 +0000 (21:42 +0000)]
[SLP] Fix infinite loop in isUndefVector.

This fixes an infinite loop if isa<T>(II->getOperand(1)) is true.
Update Base at the top of the loop, before the continue.
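
The general shape of this kind of fix can be sketched as follows. This is a hypothetical example, not the SLP vectorizer code: `Link` and `sumUnskipped` are made-up names that only illustrate why the cursor must advance before any `continue`.

```cpp
// A singly linked chain where some links should be skipped.
struct Link {
  bool skip;
  int value;
  const Link* next;
};

// Sums the values of all links that are not marked as skipped.
int sumUnskipped(const Link* base) {
  int sum = 0;
  while (base != nullptr) {
    const Link* current = base;
    base = current->next; // advance first, so `continue` cannot stall the loop
    if (current->skip)
      continue;
    sum += current->value;
  }
  return sum;
}
```

If the `base = current->next` update were placed after the `continue`, the first skipped link would be revisited forever.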

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D144292

16 months ago[RISCV][MC] Mark Zawrs extension as non-experimental
Alex Bradbury [Sun, 19 Feb 2023 20:40:58 +0000 (20:40 +0000)]
[RISCV][MC] Mark Zawrs extension as non-experimental

Support for the unratified 1.0-rc3 specification was introduced in
D133443. The specification has since been ratified (in November 2022
according to the recently ratified extensions list
<https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions>).

A review of the diff
<https://github.com/riscv/riscv-zawrs/compare/V1.0-rc3...main> of the
1.0-rc3 spec vs the current/ratified document shows no changes to the
instruction encoding or naming. At one point, a note was added
<https://github.com/riscv/riscv-zawrs/commit/e84f42406a7c88eb92452515b2035144a7023a51>
indicating Zawrs depends on the Zalrsc extension (not officially
specified, but I believe to be just the LR/SC instructions from the A
extension). The final text ended up as "The instructions in the Zawrs
extension are only useful in conjunction with the LR instructions, which
are provided by the A extension, and which we also expect to be provided
by a narrower Zalrsc extension in the future." I think it's consistent
with this phrasing to not require the A extension for Zawrs, which
matches what was implemented.

No intrinsics are implemented for Zawrs currently, meaning we don't need
to additionally review whether those intrinsics can be considered
finalised and ready for exposure to end users.

Differential Revision: https://reviews.llvm.org/D143507

16 months ago[RISCV] Add fgtq.s and fgeq.s assembler aliases for Zfa.
Craig Topper [Sun, 19 Feb 2023 20:27:24 +0000 (12:27 -0800)]
[RISCV] Add fgtq.s and fgeq.s assembler aliases for Zfa.

We can swap operands and use fltq.s and fleq.s. Similar for D and H.
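
The operand swap can be illustrated in C++. This sketch assumes that fltq.s and fleq.s behave like the quiet comparisons `std::isless` and `std::islessequal`; the wrapper names are hypothetical.

```cpp
#include <cmath>

// Quiet "greater than", expressed as quiet "less than" with swapped operands.
inline bool quietGreater(float a, float b) { return std::isless(b, a); }

// Quiet "greater or equal", expressed as quiet "less or equal" with swapped
// operands.
inline bool quietGreaterEqual(float a, float b) {
  return std::islessequal(b, a);
}
```

Both wrappers return false when either operand is NaN, which matches `std::isgreater` and `std::isgreaterequal`.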

16 months ago[RISCV] Remove Commutable property from Zfa fltq/fleq instructions.
Craig Topper [Sun, 19 Feb 2023 19:37:17 +0000 (11:37 -0800)]
[RISCV] Remove Commutable property from Zfa fltq/fleq instructions.

16 months agoUse APInt::popcount instead of APInt::countPopulation (NFC)
Kazu Hirata [Sun, 19 Feb 2023 19:29:12 +0000 (11:29 -0800)]
Use APInt::popcount instead of APInt::countPopulation (NFC)

This is for consistency with the C++20-style bit manipulation
functions in <bit>.