platform/upstream/llvm.git
2 years agoAdd missing entries for Annex F and Annex H to the C status page
Aaron Ballman [Wed, 8 Jun 2022 19:52:06 +0000 (15:52 -0400)]
Add missing entries for Annex F and Annex H to the C status page

2 years ago[DWARF] Support 'G' in dwarf parser
Florian Mayer [Tue, 7 Jun 2022 00:54:56 +0000 (17:54 -0700)]
[DWARF] Support 'G' in dwarf parser

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D127171

2 years ago[MC] Add 'G' to augmentation string for MTE instrumented functions
Florian Mayer [Fri, 3 Jun 2022 01:05:02 +0000 (18:05 -0700)]
[MC] Add 'G' to augmentation string for MTE instrumented functions

This was agreed on in
https://lists.llvm.org/pipermail/llvm-dev/2020-May/141345.html

The thread proposed two options
* add a character to augmentation string and handle in libunwind
* use a separate personality function.

It was determined that this is the simpler and better option.

This is part of ARM's Aarch64 ABI:
https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id22

The next step after this is teaching libunwind to untag when this
augmentation character is set.
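
As a rough illustration of what a consumer of the augmentation string has to
do, the sketch below scans a CIE augmentation string and records whether 'G'
is present. The function name and the dictionary keys are hypothetical; this
is not the LLVM DWARF parser or the libunwind implementation.

```python
def scan_augmentation(aug: str) -> dict:
    """Record which flag characters appear in a CIE augmentation string."""
    flags = {
        "has_aug_data": False,      # 'z': augmentation data is present
        "signal_frame": False,      # 'S': frame belongs to a signal handler
        "mte_tagged_frame": False,  # 'G': MTE instrumented function
    }
    for ch in aug:
        if ch == "z":
            flags["has_aug_data"] = True
        elif ch == "S":
            flags["signal_frame"] = True
        elif ch == "G":
            flags["mte_tagged_frame"] = True
        # Other characters (e.g. 'R', 'P', 'L') carry extra data that a real
        # parser must decode; this sketch simply skips them.
    return flags


print(scan_augmentation("zRG"))
```

An unwinder would check the `mte_tagged_frame` flag (hypothetical name) to
decide whether the frame's stack memory needs untagging.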

Reviewed By: MaskRay, eugenis

Differential Revision: https://reviews.llvm.org/D127007

2 years ago[compiler-rt][test] Restore original symbolize_stack test
Paul Kirth [Wed, 8 Jun 2022 18:59:27 +0000 (18:59 +0000)]
[compiler-rt][test] Restore original symbolize_stack test

In D126580 we updated the test to reflect that there should always
be a full trace. However, some executions do not have symbolizer
information, so we will restore the original test until we can formulate
a more robust test.

Reviewed By: leonardchan

Differential Revision: https://reviews.llvm.org/D127334

2 years ago[JITLink][ELF][AArch64] Implement R_AARCH64_PREL32 and R_AARCH64_PREL64.
Sunho Kim [Wed, 8 Jun 2022 18:13:03 +0000 (11:13 -0700)]
[JITLink][ELF][AArch64] Implement R_AARCH64_PREL32 and R_AARCH64_PREL64.

This patch implements the R_AARCH64_PREL64 and R_AARCH64_PREL32 relocations that are
used in eh frame pointers. The test case utilizes the obj2yaml tool to create an
artificial eh frame that generates the related relocation types.
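
For orientation, both relocations are PC-relative data relocations: the stored
value is S + A - P (symbol address plus addend minus the address of the place
being fixed up). The sketch below is not JITLink code; the function name is
hypothetical, and the plain signed range check is a simplifying assumption.

```python
def prel_fixup(symbol_addr: int, addend: int, place: int, bits: int) -> bytes:
    """Compute S + A - P and encode it as a little-endian field of `bits` bits."""
    value = symbol_addr + addend - place
    # Simplified overflow check: require the value to fit a signed field.
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise OverflowError(f"value {value:#x} does not fit in {bits} bits")
    # Mask to the field width so negative values become two's complement.
    return (value & ((1 << bits) - 1)).to_bytes(bits // 8, "little")


print(prel_fixup(0x1000, 0, 0x1010, 32).hex())
```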

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D127058

2 years ago[CSSPGO][Preinliner] Set default value of sample-profile-inline-limit-max to 3000
Hongtao Yu [Wed, 8 Jun 2022 18:50:19 +0000 (11:50 -0700)]
[CSSPGO][Preinliner] Set default value of sample-profile-inline-limit-max to 3000

The default value of sample-profile-inline-limit-max is defined as 10000 in sampleprofile.cpp. This is too big for cspreinliner which works with assembly size instead of IR size. The value 3000 turns out to be a good tradeoff. Compared to the value 10000, 3000 gives as good performance and code size, but lower build time.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D127330

2 years agoRevert "Reland "[NFC][compiler-rt][asan] Unify asan and lsan allocator settings""
Leonard Chan [Wed, 8 Jun 2022 18:54:18 +0000 (11:54 -0700)]
Revert "Reland "[NFC][compiler-rt][asan] Unify asan and lsan allocator settings""

This reverts commit b37d84aa8d59dde2fae7388da5101bf471ec3434.

This broke aarch64 asan builders for fuchsia. I accidentally changed the allocator
settings for fuchsia on aarch64 because the new asan allocator settings use:

```
// AArch64/SANITIZER_CAN_USE_ALLOCATOR64 is only for 42-bit VMA
// so no need to different values for different VMA.
const uptr kAllocatorSpace =  0x10000000000ULL;
const uptr kAllocatorSize  =  0x10000000000ULL;  // 3T.
typedef DefaultSizeClassMap SizeClassMap;
```

rather than reaching the final `#else` which would use fuchsia's lsan config.

2 years ago[APFloat] Fix truncation of certain subnormal numbers
Danila Malyutin [Mon, 6 Jun 2022 17:12:43 +0000 (20:12 +0300)]
[APFloat] Fix truncation of certain subnormal numbers

Certain subnormals would be incorrectly rounded away from zero.

Fixes #55838

Differential Revision: https://reviews.llvm.org/D127140

2 years ago[SystemZ] Fix check for zero size when lowering memcmp.
Kai Nacke [Thu, 2 Jun 2022 17:42:49 +0000 (13:42 -0400)]
[SystemZ] Fix check for zero size when lowering memcmp.

During lowering of memcmp/bcmp, the check for a size of 0 is done
in 2 different ways. In rare cases this can lead to a crash in
SystemZSelectionDAGInfo::EmitTargetCodeForMemcmp(). The root cause
is that SelectionDAGBuilder::visitMemCmpBCmpCall() checks for a
constant int value which is not yet evaluated. When the value is
turned into a SDValue, then the evaluation is done and results in
a ConstantSDNode. But EmitTargetCodeForMemcmp() expects the special
case of 0 length to be handled, which results in an assertion.

The fix is to turn the value into a SDValue, so that both functions
use the same check.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D126900

2 years ago[lldb] Improve error reporting from TestAppleSimulatorOSType.py
Jonas Devlieghere [Wed, 8 Jun 2022 18:47:03 +0000 (11:47 -0700)]
[lldb] Improve error reporting from TestAppleSimulatorOSType.py

When we can't find a simulator, report the platform and architecture in
the error message.

2 years ago[MLIR][Presburger] subtract: improve redundant constraint detection
Arjun P [Wed, 8 Jun 2022 18:43:43 +0000 (14:43 -0400)]
[MLIR][Presburger] subtract: improve redundant constraint detection

When constraints in the two operands make each other redundant, prefer constraints of the second because this affects the number of sets in the output at each level; reducing these can help prevent exponential blowup.

This is accomplished by adding extra overloads to Simplex::detectRedundant that only scan a subrange of the constraints for redundancy.

Reviewed By: Groverkss

Differential Revision: https://reviews.llvm.org/D127237

2 years ago[cmake] Don't try creating an executable when detecting the linker
Louis Dionne [Tue, 17 May 2022 19:05:05 +0000 (15:05 -0400)]
[cmake] Don't try creating an executable when detecting the linker

On most platforms, the linker detection command that we run ends up being
something like `clang++ -Wl,-v` or `clang++ -Wl,--version`. This usually
fails with a missing reference to `_main` because we don't have any input
file. However, when compiling for a target that is implicitly freestanding,
the invocation actually succeeds and a dummy `a.out` file is created in
the current working directory. This is extremely annoying because it
creates a `a.out` file at the root of the monorepo when running CMake
configuration from the root.

Differential Revision: https://reviews.llvm.org/D125827

2 years ago[compiler-rt][hwasan] Check address tagging mode in InitializeOsSupport on Fuchsia
Leonard Chan [Wed, 8 Jun 2022 00:16:28 +0000 (17:16 -0700)]
[compiler-rt][hwasan] Check address tagging mode in InitializeOsSupport on Fuchsia

Differential Revision: https://reviews.llvm.org/D127262

2 years ago[lldb] Use objc_getRealizedClassList_trylock on macOS Ventura and later
Jonas Devlieghere [Wed, 8 Jun 2022 18:32:36 +0000 (11:32 -0700)]
[lldb] Use objc_getRealizedClassList_trylock on macOS Ventura and later

In order to avoid stranding the Objective-C runtime lock, we switched
from objc_copyRealizedClassList to its non locking variant
objc_copyRealizedClassList_nolock. Not taking the lock was relatively
safe because we run this expression on one thread only, but it was still
possible that someone was in the middle of modifying this list while we
were trying to read it. Worst case that would result in a crash in the
inferior without side-effects and we'd unwind and try again later.

With the introduction of macOS Ventura, we can use
objc_getRealizedClassList_trylock instead. It has semantics similar to
objc_copyRealizedClassList_nolock, but instead of not locking at all,
the function returns if the lock is already taken, which avoids the
aforementioned crash without stranding the Objective-C runtime lock.
Because LLDB gets to allocate the underlying memory we also avoid
stranding the malloc lock.
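
The difference between the locking, non-locking and trylock variants can be
sketched with an ordinary mutex. The snippet below uses Python's
`threading.Lock` as a stand-in for the Objective-C runtime lock; the function
name and the class list are hypothetical and do not mirror the runtime's API.

```python
import threading

runtime_lock = threading.Lock()              # stand-in for the runtime lock
realized_classes = ["NSObject", "NSString"]  # hypothetical class list


def get_class_list_trylock():
    """Return a copy of the list, or None if the lock is already taken."""
    # Non-blocking acquire: never waits, so it cannot strand the lock.
    if not runtime_lock.acquire(blocking=False):
        return None
    try:
        return list(realized_classes)
    finally:
        runtime_lock.release()


print(get_class_list_trylock())
```

A caller that gets `None` back can simply retry later, which corresponds to
the "try again later" behaviour described above without risking a crash.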

rdar://89373233

Differential revision: https://reviews.llvm.org/D127252

2 years ago[mlir] Refactoring the tablegen Tensor types
wren romano [Sat, 4 Jun 2022 00:17:56 +0000 (17:17 -0700)]
[mlir] Refactoring the tablegen Tensor types

Reduces repetition in tablegen files for defining various tensor types.  In particular the goal is to reduce the repetition when defining new tensor types (e.g., D126994).

Reviewed By: aartbik, rriddle

Differential Revision: https://reviews.llvm.org/D127039

2 years ago[clang][dataflow] Enable use of synthetic properties on all Value instances.
Wei Yi Tee [Wed, 8 Jun 2022 17:55:54 +0000 (19:55 +0200)]
[clang][dataflow] Enable use of synthetic properties on all Value instances.

This patch moves the implementation of synthetic properties from the StructValue class into the Value base class so that it can be used across all Value instances.

Reviewed By: gribozavr2, ymandel, sgatev, xazax.hun

Differential Revision: https://reviews.llvm.org/D127196

2 years ago[MSAN] Add result printing for failed call in pthread_getaffinity_np.
Kevin Athey [Wed, 8 Jun 2022 16:45:05 +0000 (09:45 -0700)]
[MSAN] Add result printing for failed call in pthread_getaffinity_np.

Will be reverted when test failure is diagnosed.

Depends on: https://reviews.llvm.org/D127185

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D127320

2 years ago[clang][deps] Make order of module dependencies deterministic
Ben Langmuir [Tue, 7 Jun 2022 19:13:08 +0000 (12:13 -0700)]
[clang][deps] Make order of module dependencies deterministic

This fixes the underlying module dependencies, which had a
non-deterministic order, which was also visible in the order of calls to
DependencyConsumer methods. This was not directly observable in
the clang-scan-deps utility, because it was previously seeing a sorted
order from std::map in DependencyScanningTool. However, the underlying
API previously created a likely issue for any other clients. Note: if
you only apply the change from DependencyScanningTool, you can see the
issue in clang-scan-deps, and existing tests will fail
non-deterministically.

Differential Revision: https://reviews.llvm.org/D127243

2 years ago[clang][deps] Set -disable-free for module compilations
Ben Langmuir [Tue, 7 Jun 2022 16:53:38 +0000 (09:53 -0700)]
[clang][deps] Set -disable-free for module compilations

The command-line arguments for module builds are cc1 commands, so they
do not implicitly set -disable-free like a driver invocation, and
Tooling will disable it for the scanning instance itself. Set
-disable-free explicitly so that separate invocations for building
modules will not pay for freeing memory unnecessarily.

Differential Revision: https://reviews.llvm.org/D127229

2 years ago[AMDGPU] gfx11 VOP3P instruction MC support
Joe Nash [Tue, 24 May 2022 17:31:09 +0000 (13:31 -0400)]
[AMDGPU] gfx11 VOP3P instruction MC support

Includes dpp versions of VOP3P instructions.

Patch 18/N for upstreaming of AMDGPU gfx11 architecture

Depends on D126917

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D126978

2 years ago[clang][driver] adds `-print-diagnostics`
Christopher Di Bella [Wed, 1 Jun 2022 17:43:52 +0000 (17:43 +0000)]
[clang][driver] adds `-print-diagnostics`

Prints a list of all the warnings that Clang offers.

Differential Revision: https://reviews.llvm.org/D126796

2 years ago[PseudoProbe] Use callee name as callsite identfier for MCDecodedPseudoProbeInlineTree.
Hongtao Yu [Wed, 25 May 2022 23:30:07 +0000 (16:30 -0700)]
[PseudoProbe] Use callee name as callsite identfier for MCDecodedPseudoProbeInlineTree.

The callsite identifier used in pseudo probe encoding and decoding consists of a function name and the callsite probe id. For encoding, i.e., `MCPseudoProbeInlineTree`, the function name is the callee function name. However, for decoding, i.e., `MCDecodedPseudoProbeInlineTree`, the caller function name is actually used. This results in multiple callees that are inlined at the same callsite, likely via indirect call promotion, sharing the same decoded inline frame. While it is not a problem for profile generation, it confuses probe re-encoding in Bolt.

In Bolt, we decode pseudo probes first and build `MCDecodedPseudoProbeInlineTree`. The decoded tree is used for final re-encoding. Here comes the problem. Two inlinees from the same callsite share the same decoded inline frame. During re-encoding, the frame name (whichever inlinee comes first) will be used and encoded in the bolted binary. This will cause wrong inline contexts in the profile generated on the bolted binary.
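
The collision can be pictured with a dictionary keyed by (function name,
callsite probe id). This is only a toy model with hypothetical names and data,
not the MC data structures.

```python
def build_frames(inlinees, use_callee_name):
    """Group inlinees into frames keyed by (function name, callsite probe id)."""
    frames = {}
    for caller, callee, probe_id in inlinees:
        name = callee if use_callee_name else caller
        frames.setdefault((name, probe_id), []).append(callee)
    return frames


# Two callees inlined at the same callsite (probe id 3) of `main`,
# e.g. after indirect call promotion.
inlinees = [("main", "foo", 3), ("main", "bar", 3)]

print(build_frames(inlinees, use_callee_name=False))  # one shared frame
print(build_frames(inlinees, use_callee_name=True))   # one frame per callee
```

Keying by the callee name keeps the two inlinees in separate frames, so each
can be re-encoded under its own name.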

The fix is a no-op to pre-bolt profile generation. Some of the bolt tests are not yet upstreamed, thus I'm not adding a bolt test here.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D126434

2 years ago[mlir] Lower complex.power and complex.rsqrt to standard dialect.
bixia1 [Wed, 8 Jun 2022 16:11:28 +0000 (09:11 -0700)]
[mlir] Lower complex.power and complex.rsqrt to standard dialect.

Add conversion tests and correctness tests.

Reviewed By: pifon2a

Differential Revision: https://reviews.llvm.org/D127255

2 years agoAdd Python bindings for the OpaqueType
dime10 [Wed, 8 Jun 2022 17:50:12 +0000 (19:50 +0200)]
Add Python bindings for the OpaqueType

Implement the C-API and Python bindings for the builtin opaque type, which was previously missing.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D127303

2 years ago[docs][clang] Minor typo fix
Jose Manuel Monsalve Diaz [Wed, 8 Jun 2022 17:41:04 +0000 (17:41 +0000)]
[docs][clang] Minor typo fix

Changing "iamge" to "image"

2 years ago[WebAssembly] Implement remaining relaxed SIMD instructions
Thomas Lively [Wed, 8 Jun 2022 17:32:10 +0000 (10:32 -0700)]
[WebAssembly] Implement remaining relaxed SIMD instructions

Add codegen, intrinsics, and builtins for the i16x8.relaxed_q15mulr_s,
i16x8.dot_i8x16_i7x16_s, and i32x4.dot_i8x16_i7x16_add_s instructions. These are
the last instructions from the relaxed SIMD proposal[1] that had not been
implemented.

[1]:
https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md.

Differential Revision: https://reviews.llvm.org/D127170

2 years ago[CodeView] Fix incorrect CodeView encoding of signed integer constants
Steve Merritt [Mon, 23 May 2022 19:41:58 +0000 (15:41 -0400)]
[CodeView] Fix incorrect CodeView encoding of signed integer constants

Add proper CodeView encoding for positive constant integer values greater than
127.  In addition, use the two byte encoding form for positive values less
than LF_NUMERIC.
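
To make the encoding rule concrete, here is a small sketch of a CodeView
numeric leaf encoder. The LF_* constants are the values used by the CodeView
format as far as I know; the function name and the exact leaf chosen for each
range are assumptions and may differ from what the patch emits.

```python
import struct

LF_NUMERIC = 0x8000  # values below this are stored directly in two bytes
LF_CHAR = 0x8000
LF_SHORT = 0x8001
LF_USHORT = 0x8002
LF_LONG = 0x8003
LF_ULONG = 0x8004
LF_QUADWORD = 0x8009
LF_UQUADWORD = 0x800A


def encode_numeric(value: int) -> bytes:
    """Encode an integer as a little-endian CodeView numeric leaf (sketch)."""
    if 0 <= value < LF_NUMERIC:
        return struct.pack("<H", value)             # two byte form, no prefix
    if -0x80 <= value < 0:
        return struct.pack("<Hb", LF_CHAR, value)
    if -0x8000 <= value < 0:
        return struct.pack("<Hh", LF_SHORT, value)
    if 0 <= value <= 0xFFFF:
        return struct.pack("<HH", LF_USHORT, value)
    if -0x80000000 <= value < 0:
        return struct.pack("<Hi", LF_LONG, value)
    if 0 <= value <= 0xFFFFFFFF:
        return struct.pack("<HI", LF_ULONG, value)
    if value < 0:
        return struct.pack("<Hq", LF_QUADWORD, value)
    return struct.pack("<HQ", LF_UQUADWORD, value)


print(encode_numeric(200).hex())
```

The point of the first branch is that a positive value such as 200 needs no
leaf prefix at all, whereas treating it as a signed 8-bit quantity would
misrepresent it.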

Differential Revision: https://reviews.llvm.org/D126968

2 years ago[X86] Regenerate slow-pmulld.ll with common SSE check prefixes
Simon Pilgrim [Wed, 8 Jun 2022 17:17:50 +0000 (18:17 +0100)]
[X86] Regenerate slow-pmulld.ll with common SSE check prefixes

Add back some unused check prefixes to simplify the D127115 regeneration

2 years agoRevert "[libc++][CI] Updates Docker image."
Mark de Wever [Wed, 8 Jun 2022 17:16:02 +0000 (19:16 +0200)]
Revert "[libc++][CI] Updates Docker image."

This reverts commit f2f0dba818a50fc17ed309823b2fdb72cb725eec.

This Docker file doesn't work on the CI. It fails to clone the checkout.
This seems like an issue with a newer glibc on an older Docker where the
clone3() call fails.

This needs further investigation before relanding.

2 years ago[mlir] Fix handling of some region branch terminator successors
Mogball [Wed, 8 Jun 2022 00:01:44 +0000 (00:01 +0000)]
[mlir] Fix handling of some region branch terminator successors

When `RegionBranchOpInterface::getSuccessorRegions` is called for anything other than the parent op, it expects the operands of the terminator of the source region to be passed, not the operands of the parent op. This was not always respected.

This fixes a bug in integer range inference and ForwardDataFlowSolver and changes `scf.while` to allow narrowing of successors using constant inputs.

Fixes #55873

Reviewed By: mehdi_amini, krzysz00

Differential Revision: https://reviews.llvm.org/D127261

2 years ago[Clang] Fix memory leak due to TemplateArgumentListInfo used in AST node.
Andrew Browne [Fri, 3 Jun 2022 00:42:54 +0000 (17:42 -0700)]
[Clang] Fix memory leak due to TemplateArgumentListInfo used in AST node.

It looks like the leak is rooted at the allocation here:
https://github.com/llvm/llvm-project/blob/1a155ee7de3b62a2fabee86fb470a1554fadc54d/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp#L3857

The VarTemplateSpecializationDecl is allocated using placement new which uses the AST structure for ownership: https://github.com/llvm/llvm-project/blob/1a155ee7de3b62a2fabee86fb470a1554fadc54d/clang/lib/AST/DeclBase.cpp#L99

The problem is the TemplateArgumentListInfo inside https://github.com/llvm/llvm-project/blob/1a155ee7de3b62a2fabee86fb470a1554fadc54d/clang/include/clang/AST/DeclTemplate.h#L2721
This object contains a vector which does not use placement new: https://github.com/llvm/llvm-project/blob/1a155ee7de3b62a2fabee86fb470a1554fadc54d/clang/include/clang/AST/TemplateBase.h#L564

Apparently ASTTemplateArgumentListInfo should be used instead https://github.com/llvm/llvm-project/blob/1a155ee7de3b62a2fabee86fb470a1554fadc54d/clang/include/clang/AST/TemplateBase.h#L575

https://reviews.llvm.org/D125802#3551305

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D126944

2 years ago[libc++][NFC] Add missing 'return 0'
Louis Dionne [Wed, 8 Jun 2022 16:56:13 +0000 (12:56 -0400)]
[libc++][NFC] Add missing 'return 0'

2 years ago[mlir][sparse] Add F16 and BF16.
bixia1 [Tue, 7 Jun 2022 23:07:13 +0000 (16:07 -0700)]
[mlir][sparse] Add F16 and BF16.

This is the first PR to add `F16` and `BF16` support to the sparse codegen. There are still problems in supporting these two data types; for example, `BF16` is not quite working yet.

Add test cases.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D127010

2 years ago[X86] combineMOVMSK - constant fold with getTargetConstantBitsFromNode not just BUILD...
Simon Pilgrim [Wed, 8 Jun 2022 16:48:47 +0000 (17:48 +0100)]
[X86] combineMOVMSK - constant fold with getTargetConstantBitsFromNode not just BUILD_VECTOR

Help avoid a regression in D127115

2 years ago[libc++][NFC] Simplify enable_if for std::copy optimization
Louis Dionne [Tue, 7 Jun 2022 17:16:52 +0000 (13:16 -0400)]
[libc++][NFC] Simplify enable_if for std::copy optimization

Get rid of the __is_trivially_copy_assignable_unwrapped helper, which
is only used in one place, and use __iter_value_type instead of
iterator_traits<T>::value_type.

Differential Revision: https://reviews.llvm.org/D127230

2 years ago[flang] Add one missed semantic check for named constant in common block
PeixinQiao [Wed, 8 Jun 2022 16:43:30 +0000 (00:43 +0800)]
[flang] Add one missed semantic check for named constant in common block

Per Fortran 2018 R874, a common block object must be a variable name, which
cannot be a named constant. Add this check.

Reviewed By: klausler

Differential Revision: https://reviews.llvm.org/D126762

2 years ago[flang] Add one semantic check for procedure bind(C) interface-name
PeixinQiao [Wed, 8 Jun 2022 16:38:14 +0000 (00:38 +0800)]
[flang] Add one semantic check for procedure bind(C) interface-name

Per Fortran 2018 C1521, in a procedure declaration statement, if
proc-language-binding-spec (bind(c)) is specified, the proc-interface
shall appear, it shall be an interface-name, and the interface-name shall
be declared with a proc-language-binding-spec.

Reviewed By: klausler, Jean Perier

Differential Revision: https://reviews.llvm.org/D127121

2 years ago[LIBOMPTARGET] Adding AMD to llvm-omp-device-info
Jose Manuel Monsalve Diaz [Wed, 1 Jun 2022 21:49:23 +0000 (21:49 +0000)]
[LIBOMPTARGET] Adding AMD to llvm-omp-device-info

Adding device information print for AMD devices on the
`llvm-omp-device-info` command line tool. The output is inspired by
the rocminfo command line tool.

This commit adds missing HSA functions, enums and structs
needed to query additional information from the HSA agents.
A generic message for the `generic-elf-64bit` plugin is also added.

Example of an output:
```
llvm-omp-device-info
Device (0):
    This is a generic-elf-64bit device

Device (1):
    This is a generic-elf-64bit device

Device (2):
    This is a generic-elf-64bit device

Device (3):
    This is a generic-elf-64bit device

Device (4):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           0
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (5):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           1
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (6):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           2
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (7):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           3
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE
```

Differential Revision: https://reviews.llvm.org/D126836

2 years ago[NFC][Flang][OpenMP] Refactor getting ompobject symbol
PeixinQiao [Wed, 8 Jun 2022 16:29:07 +0000 (00:29 +0800)]
[NFC][Flang][OpenMP] Refactor getting ompobject symbol

Getting ompobject symbol is needed in multiple places and will be
needed later for the lowering of other constructs/clauses such as
copyin clause. Extract them into one function.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D127280

2 years ago[libc++] Make sure we add /llvm to the list of safe directories
Louis Dionne [Wed, 8 Jun 2022 16:25:05 +0000 (12:25 -0400)]
[libc++] Make sure we add /llvm to the list of safe directories

With the new version of Git in Ubuntu Jammy (which is now what we use in
our Docker image), we need to add `/llvm` to the list of safe directories
to avoid failures.

2 years ago[AArch64] Remove ToBeRemoved from AArch64MIPeepholeOpt
David Green [Wed, 8 Jun 2022 16:26:07 +0000 (17:26 +0100)]
[AArch64] Remove ToBeRemoved from AArch64MIPeepholeOpt

The ToBeRemoved is used to remove any MachineInstructions that are no
longer needed, making sure we don't invalidate the iterator that is
currently in use by erasing the instruction straight away. This causes
problems for keeping the code in SSA form though, where subsequent
transforms that require SSA form may have been broken by previous
peepholes.

If, instead, we use make_early_inc_range the iteration issue shouldn't
be present, so long as we do not remove the subsequent instruction in
the peephole optimizations. That way the code between transforms is kept
in SSA form, meaning hopefully fewer things that can go wrong.

Differential Revision: https://reviews.llvm.org/D127296

2 years ago[libc] Fix build when __FE_DENORM is defined
Alex Brachet [Wed, 8 Jun 2022 16:21:53 +0000 (16:21 +0000)]
[libc] Fix build when __FE_DENORM is defined

Differential revision: https://reviews.llvm.org/D127222

2 years ago[flang][NFC] Move genMaxWithZero into fir:::factory
jeanPerier [Wed, 8 Jun 2022 16:01:50 +0000 (18:01 +0200)]
[flang][NFC] Move genMaxWithZero into fir:::factory

Move the function to allow its usage in the Optimizer/Builder functions.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D127295

2 years ago[lldb] Update TestMultithreaded to report FAIL for a non-zero exit code
Jonas Devlieghere [Wed, 8 Jun 2022 15:45:53 +0000 (08:45 -0700)]
[lldb] Update TestMultithreaded to report FAIL for a non-zero exit code

A non-zero exit code from the test binary results in a
CalledProcessError. Without catching the exception, that would result in
an error (unresolved test) instead of a failure. This patch fixes that.
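
A minimal sketch of the pattern, assuming the test binary is launched with
`subprocess.check_output`. The helper name `run_test_binary` and the returned
strings are illustrative and are not taken from TestMultithreaded.

```python
import subprocess
import sys


def run_test_binary(cmd):
    """Run a test binary and map a non-zero exit code to a failure."""
    try:
        subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as e:
        # check_output raises for any non-zero exit code. Catching it here
        # turns what would be an unhandled exception into a plain failure.
        return "FAIL (exit code %d)" % e.returncode
    return "PASS"


# Stand-ins for a failing and a passing test binary.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
passing = [sys.executable, "-c", "pass"]
```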

2 years ago[lldb] Parse the dotest output to determine the most appropriate result code
Jonas Devlieghere [Wed, 8 Jun 2022 15:35:38 +0000 (08:35 -0700)]
[lldb] Parse the dotest output to determine the most appropriate result code

Currently we look for keywords in the dotest.py output to determine the
lit result code. This binary approach of a keyword being present works
for PASS and FAIL, where having at least one test pass or fail
respectively results in that exit code. Things are more complicated
for tests that neither passed nor failed, but report a combination of
(un)expected failures, skips or unresolved tests.

This patch changes the logic to parse the number of tests with a
particular result from the dotest.py output. For tests that did not PASS
or FAIL, we now report the lit result code for the one that occurred the
most. For example, if we had a test with 3 skips and 4 expected
failures, we report the test as XFAIL.

We're still mapping multiple tests to one result code, so some loss of
information is inevitable.

Differential revision: https://reviews.llvm.org/D127258
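
The selection rule can be sketched as follows. This is an illustration under
assumptions, not the code from the patch: the input is taken to be an already
parsed mapping from lit result code to test count, FAIL is assumed to win over
PASS, and the fallback for an empty summary is made up for the sketch.

```python
def pick_result_code(counts):
    """Pick one lit result code from per-code test counts (hypothetical helper)."""
    # At least one failing or passing test decides the result directly.
    if counts.get("FAIL", 0) > 0:
        return "FAIL"
    if counts.get("PASS", 0) > 0:
        return "PASS"
    # Otherwise report the code that occurred most often.
    others = {code: n for code, n in counts.items()
              if code not in ("PASS", "FAIL") and n > 0}
    if not others:
        return "UNRESOLVED"  # assumption: nothing was reported at all
    return max(others, key=others.get)


# The example from the message: 3 skips and 4 expected failures.
example = pick_result_code({"UNSUPPORTED": 3, "XFAIL": 4})
```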

2 years ago[WebAssembly] Regenerate simd-build-vector.ll to show full codegen
Simon Pilgrim [Wed, 8 Jun 2022 15:54:26 +0000 (16:54 +0100)]
[WebAssembly] Regenerate simd-build-vector.ll to show full codegen

2 years ago[flang] Add proper todo in BoxValue
Valentin Clement [Wed, 8 Jun 2022 15:50:49 +0000 (17:50 +0200)]
[flang] Add proper todo in BoxValue

Switch debug messages to proper TODOs.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D127282

2 years agoReland [AMDGPU] gfx11 VOP1+VOP2 Instruction MC support
Joe Nash [Mon, 23 May 2022 14:26:02 +0000 (10:26 -0400)]
Reland [AMDGPU] gfx11 VOP1+VOP2 Instruction MC support

The reverted dependent commit is now relanded, so reland this.
Includes dpp instructions and vop1/vop2 promoted to vop3.

Patch 17/N for upstreaming of AMDGPU gfx11 architecture

Depends on D126483

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D126917

2 years agoAdd a parameter to LoadFromASTFile that accepts a file system and defaults to the...
Andy Soffer [Wed, 8 Jun 2022 15:17:22 +0000 (15:17 +0000)]
Add a parameter to LoadFromASTFile that accepts a file system and defaults to the real file-system.

Reviewed By: ymandel

Differential Revision: https://reviews.llvm.org/D126888

2 years agoAdd an error message to the default SIGPIPE handler
Tim Northover [Wed, 11 May 2022 08:52:10 +0000 (09:52 +0100)]
Add an error message to the default SIGPIPE handler

UNIX03 conformance requires utilities to flush stdout before exiting and raise
an error if writing fails. Flushing already happens on a call to exit
and thus automatically on a return from main. Write failure is then
detected by LLVM's default SIGPIPE handler. The handler already exits with
a non-zero code, but conformance additionally requires an error message.

2 years ago[Dexter] Use PurePath to compare paths in Dexter commands
Stephen Tozer [Mon, 6 Jun 2022 10:20:05 +0000 (11:20 +0100)]
[Dexter] Use PurePath to compare paths in Dexter commands

Prior to this patch, when comparing the paths of source files in Dexter
commands, we would use os.path.samefile. This function performs actual file
operations and requires the files to exist on the current system; this
is suitable when running the test for the first time, but renders the
DextIR output files non-portable, and unusable if the source files no
longer exist in their original location.

Differential Revision: https://reviews.llvm.org/D127099
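
The difference between the two comparisons can be seen in a short sketch. The
paths below are made up, and `PurePosixPath` is used instead of `PurePath` only
so that the result does not depend on the host platform.

```python
import os
from pathlib import PurePosixPath

# Pure paths compare lexically: redundant slashes and "." components are
# normalized, and nothing on disk is touched.
recorded = PurePosixPath("/build/tests/./main.cpp")
command = PurePosixPath("/build//tests/main.cpp")
same_lexically = (recorded == command)

# os.path.samefile has to stat both files, so it raises when they are gone.
try:
    os.path.samefile("/nonexistent/dir/main.cpp", "/nonexistent/dir/main.cpp")
    samefile_ok = True
except OSError:
    samefile_ok = False
```

Note that a pure comparison does not resolve symlinks or `..` components, so it
can treat two spellings of the same file as different.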

2 years ago[RISCV] Support (addi (addi globaladdr, C1), C2) in RISCVMergeBaseOffset.
Craig Topper [Wed, 8 Jun 2022 15:20:34 +0000 (08:20 -0700)]
[RISCV] Support (addi (addi globaladdr, C1), C2) in RISCVMergeBaseOffset.

Adds with immediates in the range [-4096, -2049] or [2048, 4094] get
converted to two ADDIs. Teach RISCVMergeBaseOffset to recognize this
pattern as well.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D126843
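
A small sketch of how such an immediate can be split, assuming the usual
12-bit signed ADDI immediate. The helper name and the particular split (saturate
the first ADDI, put the remainder in the second) are illustrative and need not
match what instruction selection emits.

```python
SIMM12_MIN, SIMM12_MAX = -2048, 2047


def split_add_immediate(imm):
    """Return the ADDI immediates needed to add `imm` (hypothetical helper)."""
    if SIMM12_MIN <= imm <= SIMM12_MAX:
        return (imm,)  # a single ADDI is enough
    # Saturate the first ADDI and leave the remainder for the second one.
    first = SIMM12_MIN if imm < 0 else SIMM12_MAX
    second = imm - first
    if not SIMM12_MIN <= second <= SIMM12_MAX:
        raise ValueError("immediate %d does not fit in two ADDIs" % imm)
    return (first, second)
```

With this split, 2048 becomes `(2047, 1)` and -4096 becomes `(-2048, -2048)`.
The largest value two 12-bit immediates can reach is 2047 + 2047 = 4094.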

2 years ago[RISCV] Support LUI+ADDIW in RISCVMergeBaseOffsetOpt::matchLargeOffset.
Craig Topper [Wed, 8 Jun 2022 15:06:56 +0000 (08:06 -0700)]
[RISCV] Support LUI+ADDIW in RISCVMergeBaseOffsetOpt::matchLargeOffset.

LUI+ADDIW always produces a simm32. This allows us to always
fold it into a global offset.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D126729
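
The simm32 claim follows from the RV64 semantics of the two instructions: LUI
sign-extends its 32-bit result, and ADDIW sign-extends the low 32 bits of the
sum. A sketch that models this in Python, with made-up helper names and
illustrative operand values:

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def lui_addiw(imm20, imm12):
    """Model `lui rd, imm20` followed by `addiw rd, rd, imm12` on RV64."""
    upper = sign_extend(imm20 << 12, 32)            # LUI
    total = upper + sign_extend(imm12, 12)          # 64-bit add
    return sign_extend(total, 32)                   # ADDIW keeps a simm32


value = lui_addiw(1048572, 928)
```

Whatever the two immediates are, the final sign extension keeps the result in
the range [-2^31, 2^31 - 1].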

2 years ago[mlir][spirv] NFC: fix typo in UnifyAliasedResourcePass pass
Lei Zhang [Wed, 8 Jun 2022 15:17:41 +0000 (08:17 -0700)]
[mlir][spirv] NFC: fix typo in UnifyAliasedResourcePass pass

Reviewed By: ThomasRaoux, hanchung

Differential Revision: https://reviews.llvm.org/D127265

2 years ago[DAG] visitVSELECT - don't wait for truncation of sub before attempting to match...
Simon Pilgrim [Wed, 8 Jun 2022 15:16:26 +0000 (16:16 +0100)]
[DAG] visitVSELECT - don't wait for truncation of sub before attempting to match with getTruncatedUSUBSAT

Fixes some X86 PSUBUS regressions encountered in D127115 where the truncate was being replaced with a PACKSS/PACKUS before the fold got called again

2 years ago[DA] Handle mismatching loop levels by considering them non-linear
Bardia Mahjour [Wed, 8 Jun 2022 15:15:37 +0000 (11:15 -0400)]
[DA] Handle mismatching loop levels by considering them non-linear

To represent various loop levels within a nest, DA implements a special
numbering scheme (see comment atop establishNestingLevels). The goal of
this numbering scheme appears to be representing each unique loop
distinctively by using as little memory as possible. This numbering
scheme is simple when the source and destination of the dependence are
in the same loop. In such cases the level is simply the depth of the
loop in which src and dst reside. When the src and dst are not in the
same loop, we could run into the following situation exposed by
https://reviews.llvm.org/D71539. This patch fixes this by detecting
such cases in checkSubscripts and treating them as non-linear/non-affine.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D110973

2 years ago[SystemZ] Use STDY/STEY/LDY/LEY for VR32/VR64 in eliminateFrameIndex().
Jonas Paulsson [Wed, 8 Dec 2021 00:34:26 +0000 (18:34 -0600)]
[SystemZ] Use STDY/STEY/LDY/LEY for VR32/VR64 in eliminateFrameIndex().

When e.g. a VR64 register is spilled to a stack slot requiring a long
(20-bit) displacement, it is possible to use an FP opcode if the allocated
phys reg allows it. This eliminates the use of a separate LAY instruction.

Reviewed By: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D115406

2 years ago[RISCV] Untangle instruction properties from VSETVLIInfo [NFC]
Philip Reames [Wed, 8 Jun 2022 15:08:03 +0000 (08:08 -0700)]
[RISCV] Untangle instruction properties from VSETVLIInfo [NFC]

The abstract state used in the data flow should not know anything about the instructions which produced the abstract states. Instead, when comparing two states, we can simply use information about the machine instr at that time.

In the old design, basically any use of the instruction flags on the current (as opposed to a "Require" - aka upcoming state) would be a bug. We don't seem to actually have any such bugs, but we can make this much more obvious with code structure.

Differential Revision: https://reviews.llvm.org/D126921

2 years ago[clang] co_return cleanup
Nathan Sidwell [Fri, 13 May 2022 12:04:48 +0000 (05:04 -0700)]
[clang] co_return cleanup

There's no need for the CoreturnStmt getChildren member to deal with
the presence or absence of the operand member. All users already deal
with null children.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D125542

2 years ago[clang][NFC][SVE] Add tests for operators on VLS vectors
David Truby [Tue, 7 Jun 2022 15:10:23 +0000 (16:10 +0100)]
[clang][NFC][SVE] Add tests for operators on VLS vectors

This patch adds codegen tests for operators on SVE VLS vector types

2 years ago[Dexter] Catch value error when encountering invalid address
Stephen Tozer [Mon, 6 Jun 2022 11:22:24 +0000 (12:22 +0100)]
[Dexter] Catch value error when encountering invalid address

The DexDeclareAddress command checks the value of a variable at a
certain point in the debugged program, and saves that value to be used
in other commands. If the value at that point is not a valid address
however, it currently causes an error in Dexter when we try to cast it -
this is fixed in this patch by catching the error and leaving the
address value unresolved.

Differential Revision: https://reviews.llvm.org/D127101
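
The fix amounts to guarding the cast. A minimal sketch, assuming the debugger
hands back the variable's value as text and that addresses are hexadecimal.
`resolve_address` is a hypothetical helper, not Dexter's actual function.

```python
def resolve_address(value_text):
    """Return the address as an int, or None to leave it unresolved."""
    try:
        return int(value_text, 16)
    except ValueError:
        # Not a valid address at this point in the program: keep the
        # address unresolved instead of raising an error.
        return None
```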

2 years agoRestore isa<Ty>(X) asserts inside cast<Ty>(X)
Philip Reames [Wed, 8 Jun 2022 14:11:12 +0000 (07:11 -0700)]
Restore isa<Ty>(X) asserts inside cast<Ty>(X)

PLEASE DO NOT REVERT without careful consideration, and preferably prior
discussion.

cast<Ty>(X) is a "checked cast". Its entire purpose is explicitly documented
(https://llvm.org/docs/ProgrammersManual.html#the-isa-cast-and-dyn-cast-templates)
as catching bad casts by asserting that the cast is valid.
Unfortunately, in a recent rewrite of our casting infrastructure about three
months back, these asserts got dropped.

This is discussed in more detail on discourse in https://discourse.llvm.org/t/cast-x-is-broken-implications-and-proposal-to-address/63033.

Differential Revision: https://reviews.llvm.org/D127231
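
The contract can be illustrated with a loose Python analogue. This is only an
analogy to show the three behaviours side by side. The class names and the
assertion text are made up, and `isinstance` stands in for LLVM's own type
check.

```python
class Value:
    pass


class Instruction(Value):
    pass


class Constant(Value):
    pass


def isa(ty, x):
    """Type test only, never fails."""
    return isinstance(x, ty)


def cast(ty, x):
    """Checked cast: a bad cast trips the assertion instead of passing silently."""
    assert isa(ty, x), "cast to incompatible type"
    return x


def dyn_cast(ty, x):
    """Checking cast: returns None when the type does not match."""
    return x if isa(ty, x) else None
```

Like an assert-enabled LLVM build, the check in `cast` only fires when
assertions are enabled (Python drops `assert` under `-O`).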

2 years ago[AArch64] Add tests for bitcast high register extracts. NFC
David Green [Wed, 8 Jun 2022 14:26:31 +0000 (15:26 +0100)]
[AArch64] Add tests for bitcast high register extracts. NFC

2 years ago[RISCV] Add ISD::EH_DWARF_CFA
Shao-Ce SUN [Thu, 26 May 2022 18:42:00 +0000 (02:42 +0800)]
[RISCV] Add ISD::EH_DWARF_CFA

Based on D24038.
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible __builtin_dwarf_cfa() builtin.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D126181

2 years ago[Target] Remove `startswith` for adding `SHF_EXCLUDE` to offload section
Joseph Huber [Wed, 8 Jun 2022 13:54:08 +0000 (09:54 -0400)]
[Target] Remove `startswith` for adding `SHF_EXCLUDE` to offload section

Summary:
We use the special section name `.llvm.offloading` to store device
images in the host object file. We want these to be stripped by the
linker as they are not used after linking, so we use the `SHF_EXCLUDE`
flag to instruct the linker to drop them. We used to do this for all
sections that started with `.llvm.offloading` when we encoded metadata
in the section name itself. Now that we embed a special binary containing
the metadata, we should only add the flag on this name specifically.

2 years ago[Libomptarget] Add missing include to define `printf`
Joseph Huber [Wed, 8 Jun 2022 13:11:04 +0000 (09:11 -0400)]
[Libomptarget] Add missing include to define `printf`

Summary:
This test was failing because of an implicit declaration of `printf`,
which isn't legal in newer versions of C. This patch just adds
the necessary header.

2 years ago[libc] Add expm1f function to bazel's build overlay.
Tue Ly [Wed, 8 Jun 2022 13:36:07 +0000 (09:36 -0400)]
[libc] Add expm1f function to bazel's build overlay.

Add expm1f function to bazel's build overlay.

Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D127298

2 years agoM68k: Fix build
Matt Arsenault [Wed, 8 Jun 2022 13:25:57 +0000 (09:25 -0400)]
M68k: Fix build

2 years agoRevert "[RISCV] Testcase to show wrong register allocation result of subreg liveness"
Kito Cheng [Wed, 8 Jun 2022 13:19:27 +0000 (21:19 +0800)]
Revert "[RISCV] Testcase to show wrong register allocation result of subreg liveness"

Reverted because of failures with LLVM_ENABLE_EXPENSIVE_CHECKS.

This reverts commit cbe22c794348a1962af8a5d21fbedbb65974d94c.

2 years agoRecommit "[VPlan] Remove uneeded needsVectorIV check."
Florian Hahn [Wed, 8 Jun 2022 13:06:45 +0000 (14:06 +0100)]
Recommit "[VPlan] Remove uneeded needsVectorIV check."

This reverts commit 266ea446ab747671eb6c736569c3c9c5f3c53d11.

The reasons for the revert have been addressed by cleaning up condition
handling in VPlan and properly marking VPBranchOnMaskRecipe as using
scalars.

The test case for the revert from D123720 has been added in 3d663308a5d.

2 years agoCorrecting some links in the C status page
Aaron Ballman [Wed, 8 Jun 2022 12:51:52 +0000 (08:51 -0400)]
Correcting some links in the C status page

The paper titles were correct, but the document number and links were
incorrect (typo'ed numbers).

2 years ago[sanitizer] Fix shift UB in LEB128 test
Nikita Popov [Wed, 8 Jun 2022 12:19:07 +0000 (14:19 +0200)]
[sanitizer] Fix shift UB in LEB128 test

If u64 and uptr have the same size, then this will perform a shift
by the bitwidth, which is UB. We only need this code if uptr is
smaller than u64.

2 years agoAdd the 2022 papers to the C status tracking page
Aaron Ballman [Wed, 8 Jun 2022 11:41:34 +0000 (07:41 -0400)]
Add the 2022 papers to the C status tracking page

This adds the papers from Feb 2022 (parts 1 and 2) and May 2022.

2 years ago[LV] Add test that caused revert of D123720.
Florian Hahn [Wed, 8 Jun 2022 11:25:17 +0000 (12:25 +0100)]
[LV] Add test that caused revert of D123720.

2 years ago[BOLT] Set valid index for functions with profiles
Vladislav Khmelevsky [Tue, 7 Jun 2022 15:40:04 +0000 (18:40 +0300)]
[BOLT] Set valid index for functions with profiles

Some of the passes that calculate a tentative layout, like LongJmp and
Golang, expect that only functions with a valid index will be
located in the hot text section. But currently functions with valid profiles
and no index set break this logic. To fix this we can move the
hasValidProfile() condition from the AssignSections pass to ReorderFunctions.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Differential Revision: https://reviews.llvm.org/D127223

2 years ago[MLIR][Math] Add round operation
lorenzo chelini [Thu, 2 Jun 2022 14:49:23 +0000 (16:49 +0200)]
[MLIR][Math] Add round operation

Introduce RoundOp in the math dialect. The operation rounds the operand to the
nearest integer value in floating-point format. RoundOp lowers to LLVM
intrinsics 'llvm.intr.round' or as a function call to libm (round or roundf).

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D127286
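
For reference, libm's round() rounds halfway cases away from zero, which
differs from Python's built-in `round` (ties to even). A sketch of the libm
behaviour in Python; `round_half_away` is a name made up for this example and
is not part of MLIR or libm.

```python
import math


def round_half_away(x):
    """Round to the nearest integer value, ties away from zero, as a float."""
    if not math.isfinite(x):
        return x  # infinities and NaN pass through unchanged
    whole = math.trunc(x)
    if abs(x - whole) >= 0.5:
        whole += 1 if x > 0 else -1
    # The result stays in floating-point format and keeps the sign of x.
    return math.copysign(float(whole), x)
```

`round_half_away(2.5)` gives `3.0` and `round_half_away(-2.5)` gives `-3.0`,
whereas the built-in `round(2.5)` gives `2`.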

2 years ago[Hexagon] Regenerate build-vector-v4i8-zext.ll to show full codegen
Simon Pilgrim [Wed, 8 Jun 2022 10:41:54 +0000 (11:41 +0100)]
[Hexagon] Regenerate build-vector-v4i8-zext.ll to show full codegen

2 years ago[AST] Make header self-contained
Benjamin Kramer [Wed, 8 Jun 2022 10:35:24 +0000 (12:35 +0200)]
[AST] Make header self-contained

There's a dependency in AbstractTypeReader.inc that becomes an error
after D127231.

2 years ago[gn build] Port 916e9052ba95
LLVM GN Syncbot [Wed, 8 Jun 2022 10:19:18 +0000 (10:19 +0000)]
[gn build] Port 916e9052ba95

2 years ago[CMake] Improve support for ASAN on Windows with MSVC cl & clang-cl
Andrew Ng [Tue, 31 May 2022 14:13:24 +0000 (15:13 +0100)]
[CMake] Improve support for ASAN on Windows with MSVC cl & clang-cl

Tested with MSVC 2019 (19.29) and LLVM 14.0.4.

Differential Revision: https://reviews.llvm.org/D126706

2 years ago[libc++] Implement ranges::adjacent_find
Nikolas Klauser [Wed, 8 Jun 2022 10:14:12 +0000 (12:14 +0200)]
[libc++] Implement ranges::adjacent_find

Reviewed By: Mordante, var-const, #libc

Spies: libcxx-commits, mgorny

Differential Revision: https://reviews.llvm.org/D126610

2 years ago[Docs] Add version support information for opaque pointers (NFC)
Nikita Popov [Wed, 8 Jun 2022 09:51:28 +0000 (11:51 +0200)]
[Docs] Add version support information for opaque pointers (NFC)

I've seen a few people try to enable opaque pointers with LLVM 14
already. While LLVM 14 has pretty good baseline support, there are
enough missing pieces that you're definitely going to hit assertion
failures if you try this.

Add some wording to make it clear what the support (or planned
support) for opaque/typed pointers is across LLVM 14, 15, and 16.

2 years ago[SelectionDAG] Remove invalid TypeSize conversion from PromoteIntRes_BITCAST.
Paul Walker [Mon, 6 Jun 2022 15:23:48 +0000 (16:23 +0100)]
[SelectionDAG] Remove invalid TypeSize conversion from PromoteIntRes_BITCAST.

Extend the TypeWidenVector case of PromoteIntRes_BITCAST to work
with TypeSize directly rather than silently casting to unsigned.

To accomplish this I've extended TypeSize with an interface that
essentially allows TypeSize division when both operands have the
same number of dimensions.

There still exist combinations of scalable vector bitcasts that
cause compiler crashes. I call these out by adding "is missing"
entries to sve-bitcast.

Depends on D126957.
Fixes: #55114

Differential Revision: https://reviews.llvm.org/D127126

2 years ago[SVE] Fix incorrect code generation for bitcasts of unpacked vector types.
Paul Walker [Tue, 31 May 2022 09:59:05 +0000 (10:59 +0100)]
[SVE] Fix incorrect code generation for bitcasts of unpacked vector types.

Bitcasting between unpacked scalable vector types of different
element counts is not a NOP because the live elements are laid out
differently.
               01234567
e.g. nxv2i32 = XX??XX??
     nxv4f16 = X?X?X?X?

Differential Revision: https://reviews.llvm.org/D126957
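
The two layouts in the diagram can be reproduced with a small sketch. It
assumes each column is one 16-bit lane of a 128-bit granule and that every
element sits at the low end of an equally sized container. `lane_layout` is a
hypothetical helper written for this illustration.

```python
def lane_layout(num_elements, element_bits, granule_bits=128, column_bits=16):
    """Mark live ('X') and undefined ('?') columns of an unpacked vector."""
    container_bits = granule_bits // num_elements
    live = element_bits // column_bits
    padding = (container_bits - element_bits) // column_bits
    return ("X" * live + "?" * padding) * num_elements


nxv2i32 = lane_layout(2, 32)
nxv4f16 = lane_layout(4, 16)
```

The live columns of the two types do not line up, so reinterpreting the
register bits unchanged would read undefined lanes.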

2 years ago[Bitcode] Re-enable verify-uselistorder test (NFC)
Nikita Popov [Wed, 8 Jun 2022 09:28:57 +0000 (11:28 +0200)]
[Bitcode] Re-enable verify-uselistorder test (NFC)

This issue has since been fixed, so re-enable the commented RUN
line.

2 years ago[Test] Add XFAIL test for PR55689
Max Kazantsev [Wed, 8 Jun 2022 08:59:50 +0000 (15:59 +0700)]
[Test] Add XFAIL test for PR55689

SCEV issues in dynamically unreached code, see details at https://github.com/llvm/llvm-project/issues/55689

1st reduced test by Nikic!

2 years ago[doc] Add release notes about SEH unwind information on ARM
Martin Storsjö [Mon, 6 Jun 2022 20:56:31 +0000 (23:56 +0300)]
[doc] Add release notes about SEH unwind information on ARM

Differential Revision: https://reviews.llvm.org/D127150

2 years ago[mlir][bufferize] Improve buffer writability analysis
Matthias Springer [Tue, 7 Jun 2022 22:04:54 +0000 (00:04 +0200)]
[mlir][bufferize] Improve buffer writability analysis

Find writability conflicts (writes to buffers that are not allowed to be written to) by checking SSA use-def chains. This is better than the current writability analysis, which is too conservative and finds false positives.

Differential Revision: https://reviews.llvm.org/D127256

2 years ago[NFC] Remove commented cerr debugging loggings
Chuanqi Xu [Wed, 8 Jun 2022 07:45:21 +0000 (15:45 +0800)]
[NFC] Remove commented cerr debugging loggings

There are some unused, commented-out cerr debug logging statements in the
code. It is odd to keep such commented-out debug helpers in the product.

2 years ago[Sanitizers] intercept FreeBSD procctl
David CARLIER [Wed, 8 Jun 2022 07:55:10 +0000 (08:55 +0100)]
[Sanitizers] intercept FreeBSD procctl

Reviewers: vitalybuka, emaster

Reviewed-By: vitalybuka
Differential Revision: https://reviews.llvm.org/D127069

2 years ago[mlir][MemRef] Fix a crash when expanding a scalar shape
Benjamin Kramer [Tue, 7 Jun 2022 17:30:10 +0000 (19:30 +0200)]
[mlir][MemRef] Fix a crash when expanding a scalar shape

In this case the reassociation is empty, yielding no strides for the
result type.

Differential Revision: https://reviews.llvm.org/D127232

2 years ago[gn build] Port 638b0fb4d651
LLVM GN Syncbot [Wed, 8 Jun 2022 07:20:40 +0000 (07:20 +0000)]
[gn build] Port 638b0fb4d651

2 years ago[ADT][NFC] Early bail out for ComputeEditDistance
Nathan James [Wed, 8 Jun 2022 07:20:28 +0000 (08:20 +0100)]
[ADT][NFC] Early bail out for ComputeEditDistance

The minimum bound for the number of edits is the size difference between the two arrays.
If MaxEditDistance is smaller than this, we can bail out early without needing to traverse either array.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D127070
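
A sketch of the bail-out on top of a plain Levenshtein distance. This is not
the ADT implementation: the function name, the convention that 0 means "no
limit", and returning the limit plus one when it is exceeded are assumptions
made for the example.

```python
def edit_distance(a, b, max_edit_distance=0):
    """Levenshtein distance with an optional upper limit (0 means no limit)."""
    if max_edit_distance:
        # Each missing or extra element costs at least one edit, so the
        # length difference is a lower bound. No traversal is needed.
        if abs(len(a) - len(b)) > max_edit_distance:
            return max_edit_distance + 1

    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(previous[j] + 1,              # deletion
                               current[j - 1] + 1,           # insertion
                               previous[j - 1] + (x != y)))  # substitution
        previous = current

    result = previous[-1]
    if max_edit_distance:
        result = min(result, max_edit_distance + 1)
    return result
```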

2 years ago[MLIR][SCF] Improve doc (NFC)
lorenzo chelini [Wed, 8 Jun 2022 06:46:36 +0000 (08:46 +0200)]
[MLIR][SCF] Improve doc (NFC)

2 years agoRevert "[SplitKit] Handle early clobber + tied to def correctly"
Kito Cheng [Wed, 8 Jun 2022 05:05:35 +0000 (13:05 +0800)]
Revert "[SplitKit] Handle early clobber + tied to def correctly"

Reverted because of failures with LLVM_ENABLE_EXPENSIVE_CHECKS.

This reverts commit e14d04909df4e52e531f6c2e045c3cf9638dd817.

2 years ago[CMake] Enable LLVM_ENABLE_PER_TARGET_RUNTIME_DIR by default on Linux
Fangrui Song [Wed, 8 Jun 2022 04:22:38 +0000 (21:22 -0700)]
[CMake] Enable LLVM_ENABLE_PER_TARGET_RUNTIME_DIR by default on Linux

This makes the LLVM_ENABLE_PROJECTS mode (supported for compiler-rt, deprecated
(D112724) for libcxx/libcxxabi/libunwind) closer to
https://libcxx.llvm.org/BuildingLibcxx.html#bootstrapping-build .
The layout is arguably superior because different libraries of target triples
are in different directories, similar to GCC/Debian multiarch.

When LLVM_DEFAULT_TARGET_TRIPLE is x86_64-unknown-linux-gnu,
`lib/clang/15.0.0/lib/libclang_rt.asan-x86_64.a`
is moved to
`lib/clang/15.0.0/lib/x86_64-unknown-linux-gnu/libclang_rt.asan.a`.

In addition, if the host compiler supports -m32 (multilib),
`lib/clang/15.0.0/lib/libclang_rt.asan-i386.a`
is moved to
`lib/clang/15.0.0/lib/i386-unknown-linux-gnu/libclang_rt.asan.a`.

Reviewed By: mstorsjo, ldionne, #libc

Differential Revision: https://reviews.llvm.org/D107799
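
The renaming in the two examples above can be written down as a path mapping.
This is only an illustration of the layout change: `per_target_path` and the
arch-to-triple table are made up for the sketch and are not part of the CMake
logic.

```python
from pathlib import PurePosixPath

# Assumed mapping for the two cases named in the message.
ARCH_TO_TRIPLE = {
    "x86_64": "x86_64-unknown-linux-gnu",
    "i386": "i386-unknown-linux-gnu",
}


def per_target_path(old_path):
    """Map `<dir>/<lib>-<arch>.a` to `<dir>/<triple>/<lib>.a`."""
    path = PurePosixPath(old_path)
    # "libclang_rt.asan-x86_64" -> ("libclang_rt.asan", "x86_64")
    stem, arch = path.stem.rsplit("-", 1)
    return str(path.parent / ARCH_TO_TRIPLE[arch] / (stem + path.suffix))


new_x86_64 = per_target_path("lib/clang/15.0.0/lib/libclang_rt.asan-x86_64.a")
new_i386 = per_target_path("lib/clang/15.0.0/lib/libclang_rt.asan-i386.a")
```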

2 years ago[DirectX][Fail crash in DXILPrepareModule pass when input has typed ptr.
python3kgae [Wed, 8 Jun 2022 01:38:29 +0000 (18:38 -0700)]
[DirectX][Fail crash in DXILPrepareModule pass when input has typed ptr.

Check supportsTypedPointers instead of hasSetOpaquePointersValue when querying for typed pointers.

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D127268

2 years ago[SplitKit] Handle early clobber + tied to def correctly
Kito Cheng [Thu, 26 May 2022 10:43:04 +0000 (18:43 +0800)]
[SplitKit] Handle early clobber + tied to def correctly

The splitter will try to extend a live range into the `r` slot for a use operand.
That works in most situations, but it does not work correctly when the operand
is tied to a def and the def operand is early clobber.

An example to demonstrate what goes wrong:
  0  %0 = ...
 16  early-clobber %0 = Op %0 (tied-def 0), ...
 32  ... = Op %0

Before extend:
 %0 = [0r, 0d) [16e, 32d)

The point we want to extend to is from 0d to 16e, not 16r, in this case. If
we use 16r here we will extend nothing, because that is already contained
in [16e, 32d).

This patch adds a check to detect such cases and adjusts the extend point.

Detailed explanation for testcase: https://reviews.llvm.org/D126047

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D126048

2 years ago[RISCV] Testcase to show wrong register allocation result of subreg liveness
Kito Cheng [Wed, 8 Jun 2022 02:23:35 +0000 (10:23 +0800)]
[RISCV] Testcase to show wrong register allocation result of subreg liveness

This testcase shows that the live range isn't constructed correctly when subreg
liveness is enabled.

In the testcase `early-clobber-tied-def-subreg-liveness.ll`, the first operand of
`vsext.vf2 v8, v16, v0.t` is both a def and a use. The use comes from
the memory location of `.L__const._Z3foov.var_49`; it is loaded and spilled
to the stack, and then...v8 is overwritten by another instruction.

```
lui a0, %hi(.L__const._Z3foov.var_49)
addi a0, a0, %lo(.L__const._Z3foov.var_49)
...
vle16.v v8, (a0) # Load value from var_49
...
addi a0, sp, 16
...
vs2r.v v8, (a0) # Spill
...
vl2r.v v8, (a1) # Reload
...
lui a0, %hi(.L__const._Z3foov.var_40)
addi a0, a0, %lo(.L__const._Z3foov.var_40)
vle16.v v8, (a0)     # Load value...into v8???
vmsbc.vx v0, v8, a0  # And use that.
...
vsext.vf2 v8, v16, v0.t # But v8 is here...which is expect value from the reload
```

The `early-clobber-tied-def-subreg-liveness.mir` has more detailed
information: `%25.sub_vrm2_0` is defined at 64, used at 464,
and defined again at 464. We have used an inline asm that clobbers all
vector registers to trigger the splitter.

```
0B      bb.0.entry:
16B       %0:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_49
32B       %1:gpr = ADDI %0:gpr, target-flags(riscv-lo) @__const._Z3foov.var_49
48B       dead $x0 = PseudoVSETIVLI 2, 73, implicit-def $vl, implicit-def $vtype
64B       undef %25.sub_vrm2_0:vrn4m2nov0 = PseudoVLE16_V_M2 %1:gpr, 2, 4, implicit $vl, implicit $vtype
80B       %3:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_48
96B       %4:gpr = ADDI %3:gpr, target-flags(riscv-lo) @__const._Z3foov.var_48
112B      %5:vr = PseudoVLE8_V_M1 %4:gpr, 2, 3, implicit $vl, implicit $vtype
128B      %6:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_46
144B      %7:gpr = ADDI %6:gpr, target-flags(riscv-lo) @__const._Z3foov.var_46
160B      %25.sub_vrm2_1:vrn4m2nov0 = PseudoVLE16_V_M2 %7:gpr, 2, 4, implicit $vl, implicit $vtype
176B      %9:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_45
192B      %10:gpr = ADDI %9:gpr, target-flags(riscv-lo) @__const._Z3foov.var_45
208B      %25.sub_vrm2_2:vrn4m2nov0 = PseudoVLE16_V_M2 %10:gpr, 2, 4, implicit $vl, implicit $vtype
224B      INLINEASM &"" [sideeffect] [attdialect], $0:[clobber], ...
240B      %12:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_44
256B      %13:gpr = ADDI %12:gpr, target-flags(riscv-lo) @__const._Z3foov.var_44
272B      dead $x0 = PseudoVSETIVLI 2, 73, implicit-def $vl, implicit-def $vtype
288B      %25.sub_vrm2_3:vrn4m2nov0 = PseudoVLE16_V_M2 %13:gpr, 2, 4, implicit $vl, implicit $vtype
304B      $x0 = PseudoVSETIVLI 2, 73, implicit-def $vl, implicit-def $vtype
320B      %16:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_40
336B      %17:gpr = ADDI %16:gpr, target-flags(riscv-lo) @__const._Z3foov.var_40
352B      %18:vrm2 = PseudoVLE16_V_M2 %17:gpr, 2, 4, implicit $vl, implicit $vtype
368B      $x0 = PseudoVSETIVLI 2, 73, implicit-def $vl, implicit-def $vtype
384B      %20:gpr = LUI 1048572
400B      %21:gpr = ADDIW %20:gpr, 928
416B      early-clobber %22:vr = PseudoVMSBC_VX_M2 %18:vrm2, %21:gpr, 2, 4, implicit $vl, implicit $vtype
432B      $x0 = PseudoVSETIVLI 2, 9, implicit-def $vl, implicit-def $vtype
448B      $v0 = COPY %22:vr
464B      early-clobber %25.sub_vrm2_0:vrn4m2nov0 = PseudoVSEXT_VF2_M2_MASK %25.sub_vrm2_0:vrn4m2nov0(tied-def 0), %5:vr, killed $v0, 2, 4, 0, implicit $vl, implicit $vtype
480B      %26:gpr = LUI target-flags(riscv-hi) @var_47
496B      %27:gpr = ADDI %26:gpr, target-flags(riscv-lo) @var_47
512B      PseudoVSSEG4E16_V_M2 %25:vrn4m2nov0, %27:gpr, 2, 4, implicit $vl, implicit $vtype
528B      PseudoRET
```

When the splitter tries to split %25:

```
selectOrSplit VRN4M2NoV0:%25 [64r,160r:4)[160r,208r:0)[208r,288r:1)[288r,464e:2)[464e,512r:3) 0@160r 1@208r 2@288r 3@464e 4@64r  L0000000000000030 [160r,512r:0) 0@160r  L00000000000000C0 [208r,512r:0) 0@208r  L0000000000000300 [288r,512r:0) 0@288r  L000000000000000C [64r,464e:1)[464e,512r:0) 0@464e 1@64r  weight:1.179245e-02 w=1.179245e-02
```

```
Best local split range: 64r-208r, 6.999861e-03, 3 instrs
    enterIntvBefore 64r: not live
    leaveIntvAfter 208r: valno 1
    useIntv [64B;216r): [64B;216r):1
  blit [64r,160r:4): [64r;160r)=1(%29)(recalc)
  blit [160r,208r:0): [160r;208r)=1(%29)(recalc)
  blit [208r,288r:1): [208r;216r)=1(%29)(recalc) [216r;288r)=0(%28)(recalc)
  blit [288r,464e:2): [288r;464e)=0(%28)(recalc)
  blit [464e,512r:3): [464e;512r)=0(%28)(recalc)
  rewr %bb.0    464e:0  early-clobber %28.sub_vrm2_0:vrn4m2nov0 = PseudoVSEXT_VF2_M2_MASK %25.sub_vrm2_0:vrn4m2nov0(tied-def 0), %5:vr, $v0, 2, 4, 0, implicit $vl, implicit $vtype
  rewr %bb.0    288r:0  %28.sub_vrm2_3:vrn4m2nov0 = PseudoVLE16_V_M2 %13:gpr, 2, 4, implicit $vl, implicit $vtype
  rewr %bb.0    208r:1  %29.sub_vrm2_2:vrn4m2nov0 = PseudoVLE16_V_M2 %10:gpr, 2, 4, implicit $vl, implicit $vtype
  rewr %bb.0    160r:1  %29.sub_vrm2_1:vrn4m2nov0 = PseudoVLE16_V_M2 %7:gpr, 2, 4, implicit $vl, implicit $vtype
  rewr %bb.0    64r:1   undef %29.sub_vrm2_0:vrn4m2nov0 = PseudoVLE16_V_M2 %1:gpr, 2, 4, implicit $vl, implicit $vtype
  rewr %bb.0    464B:0  early-clobber %28.sub_vrm2_0:vrn4m2nov0 = PseudoVSEXT_VF2_M2_MASK %28.sub_vrm2_0:vrn4m2nov0(tied-def 0), %5:vr, $v0, 2, 4, 0, implicit $vl, implicit $vtype
  rewr %bb.0    512B:0  PseudoVSSEG4E16_V_M2 %28:vrn4m2nov0, %27:gpr, 2, 4, implicit $vl, implicit $vtype
  rewr %bb.0    216B:1  undef %28.sub_vrm1_0_sub_vrm1_1_sub_vrm1_2_sub_vrm1_3_sub_vrm1_4_sub_vrm1_5:vrn4m2nov0 = COPY %29.sub_vrm1_0_sub_vrm1_1_sub_vrm1_2_sub_vrm1_3_sub_vrm1_4_sub_vrm1_5:vrn4m2nov0
queuing new interval: %28 [216r,288r:0)[288r,464e:1)[464e,512r:2) 0@216r 1@288r 2@464e  L000000000000000C [216r,216d:0)[464e,512r:1) 0@216r 1@464e  L0000000000000300 [288r,512r:0) 0@288r  L00000000000000C0 [216r,512r:0) 0@216r  L0000000000000030 [216r,512r:0) 0@216r  weight:8.706897e-03
Enqueuing %28
queuing new interval: %29 [64r,160r:0)[160r,208r:1)[208r,216r:2) 0@64r 1@160r 2@208r  L000000000000000C [64r,216r:0) 0@64r  L00000000000000C0 [208r,216r:0) 0@208r  L0000000000000030 [160r,216r:0) 0@160r  weight:1.097826e-02
Enqueuing %29
```

The live range of the first subreg part of %25 becomes [216r,216d:0)[464e,512r:1),
however the first live range should live until 464e rather than just
[216r,216d:0).

The register allocator then produces a wrong allocation according to that
live range info.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D126047

2 years ago[MLIR] Add an install target for mlir-libraries
Nathan Lanza [Wed, 8 Jun 2022 02:55:05 +0000 (22:55 -0400)]
[MLIR] Add an install target for mlir-libraries

This is required for the distribution system for installing the
mlir-libraries component. This is copied from clang's equivalent
feature.

Differential Revision: https://reviews.llvm.org/D126837