platform/upstream/llvm.git
Groverkss [Thu, 15 Sep 2022 17:30:57 +0000 (18:30 +0100)]
Revert "[MLIR][Presburger] Improve unittest parsing"

This reverts commit 84d07d021333f7b5716f0444d5c09105557272e0.

Reverted to fix a compilation issue on gcc8.

Groverkss [Thu, 15 Sep 2022 17:29:22 +0000 (18:29 +0100)]
Revert "[mlir] Remove the unused source file."

This reverts commit e488ce29ec5ead2d518c183890215313c9d1b1f0.

Reverted to fix a compilation issue on gcc8.

Aart Bik [Thu, 15 Sep 2022 03:18:51 +0000 (20:18 -0700)]
[mlir][sparse] partially implement codegen for sparse_tensor.compress

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D133912

Siva Chandra Reddy [Thu, 15 Sep 2022 07:52:17 +0000 (07:52 +0000)]
[libc] Add the implementation of the "remove" function.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D133922

Craig Topper [Thu, 15 Sep 2022 16:38:02 +0000 (09:38 -0700)]
[IntegerDivision][AMDGPU] Use CreateLogicalOr to block poison propagation.

There are two ctlz intrinsics here with the zero_is_poison flag
set. There are also two comparisons that check whether either of the
inputs to the ctlzs is zero. We need to use a logical or to block
the poison from the ctlz if either of the inputs is zero.
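A minimal Python model of the problem, assuming a simplified poison semantics: a `POISON` sentinel, a plain `or` that is poison if either operand is, and a logical or (`select a, true, b`, which is what `IRBuilder::CreateLogicalOr` emits) that only looks at `b` when `a` is false. The `early_return_flag` helper is hypothetical and only mirrors the shape of the check described above, not the exact IR that IntegerDivision generates.

```python
# Simplified model of LLVM poison semantics; an illustration, not LLVM code.
POISON = object()  # sentinel standing in for a poison value


def ctlz_zero_is_poison(x: int, bits: int = 32):
    """ctlz with zero_is_poison set: a zero input yields POISON."""
    if x == 0:
        return POISON
    return bits - x.bit_length()


def bitwise_or(a, b):
    """Plain `or i1`: poison in either operand poisons the result."""
    if a is POISON or b is POISON:
        return POISON
    return a or b


def logical_or(a, b):
    """`select i1 a, i1 true, i1 b`: b is ignored when a is true."""
    if a is POISON:
        return POISON
    return True if a else b


def early_return_flag(dividend: int, divisor: int, use_logical_or: bool):
    """Hypothetical: 'either input is zero' or-ed with a ctlz-derived test."""
    is_zero = divisor == 0 or dividend == 0
    lz_divisor = ctlz_zero_is_poison(divisor)
    lz_dividend = ctlz_zero_is_poison(dividend)
    if lz_divisor is POISON or lz_dividend is POISON:
        too_big = POISON  # poison propagates through the sub and the icmp
    else:
        too_big = ((lz_divisor - lz_dividend) & 0xFFFFFFFF) > 31  # icmp ugt
    combine = logical_or if use_logical_or else bitwise_or
    return combine(is_zero, too_big)
```

With a zero divisor, `early_return_flag(5, 0, use_logical_or=False)` yields `POISON`, while the logical-or variant yields `True`, because the poisoned comparison is never consulted.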

Reviewed By: arsenm, aqjune

Differential Revision: https://reviews.llvm.org/D130680

Sanjay Patel [Thu, 15 Sep 2022 16:01:11 +0000 (12:01 -0400)]
[InstCombine] fold X*X == 0 --> X == 0

This is safe when the mul does not overflow:
https://alive2.llvm.org/ce/z/LedVVP

This could be extended to handle non-zero compare constants
and non-squared multiplies.
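An exhaustive check over `i8` makes the precondition concrete. This is a small Python sketch, not the Alive2 proof: it assumes two's-complement wrapping arithmetic and only models the signed no-overflow (`nsw`) case. Without a no-overflow guarantee, every non-zero multiple of 16 squares to 0 modulo 256, so the fold would be wrong.

```python
def mul_i8(x: int, y: int) -> int:
    """8-bit two's-complement wrapping multiply, result in [-128, 127]."""
    r = (x * y) & 0xFF
    return r - 256 if r >= 128 else r


def mul_nsw_i8_ok(x: int, y: int) -> bool:
    """True when x * y does not signed-overflow i8 (so `mul nsw` is defined)."""
    return -128 <= x * y <= 127


# Values where `X*X == 0` and `X == 0` disagree under wrapping arithmetic.
counterexamples = [x for x in range(-128, 128) if (mul_i8(x, x) == 0) != (x == 0)]

# None of them survive once the multiply is known not to overflow.
nsw_counterexamples = [x for x in counterexamples if mul_nsw_i8_ok(x, x)]
```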

Sanjay Patel [Thu, 15 Sep 2022 15:56:45 +0000 (11:56 -0400)]
[InstCombine] add tests for X*X == 0; NFC

Simon Pilgrim [Thu, 15 Sep 2022 15:20:56 +0000 (16:20 +0100)]
[CostModel][X86] Add CostKinds handling for vector shift by generic/non-uniform shift amounts

These are the worst-case generic vector shift costs, where nothing is known about the shift amounts. In particular, this should stop us from using the default size-latency cost of 1 for so many pre-AVX2 vector shifts that can often expand to 20+ uops during lowering, even for 128-bit vectors, resulting in some horrible inline/unroll decisions.

This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)

Jay Foad [Thu, 15 Sep 2022 09:40:54 +0000 (10:40 +0100)]
[AMDGPU] Add GFX11 ds_bvh_stack_rtn_b32 instruction

Differential Revision: https://reviews.llvm.org/D133928

Joe Loser [Fri, 26 Aug 2022 03:31:22 +0000 (21:31 -0600)]
[libc++] Clean up `_LIBCPP_HAS_NO_PLATFORM_WAIT` macro

As the comment suggests, `_LIBCPP_HAS_NO_PLATFORM_WAIT` is not documented or
defined anywhere internally in the build system. It's a direct define in terms
of `_LIBCPP_HAS_NO_THREADS`. So, remove `_LIBCPP_HAS_NO_PLATFORM_WAIT` and use
`_LIBCPP_HAS_NO_THREADS` instead to control the desired behavior.

Differential Revision: https://reviews.llvm.org/D132715

Matt Arsenault [Sat, 23 Jul 2022 16:32:05 +0000 (12:32 -0400)]
AMDGPU: Use GlobalPriority for largest register tuples

Only do this for 16- and 32-register tuples, although we might want to
extend this to 8-register tuples.

It's incredibly expensive to spill these, and doing so severely
interferes with the ability to allocate anything else in the function.

The lit tests show mostly sizeable improvements, with a handful of tiny
regressions with large vectors.

Jakub Kuderski [Thu, 15 Sep 2022 15:34:43 +0000 (11:34 -0400)]
[mlir][arith] Support wide int cast emulation

Add support for `arith.extsi`, `arith.extui`, and `arith.trunci` ops.

Tested by checking the results for all 16-bit inputs when emulating i16 with i8.
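A small Python model of the three casts when an `i16` is emulated as a pair of `i8` halves (low, high). It assumes the low half comes first and only models bit patterns, not the MLIR rewrite itself; the function names are hypothetical. In the spirit of the testing described above, the checks are exhaustive over every input.

```python
MASK8 = 0xFF


def extui_i8_to_i16(x: int) -> tuple:
    """arith.extui: the high half is all zeros."""
    return (x & MASK8, 0)


def extsi_i8_to_i16(x: int) -> tuple:
    """arith.extsi: the high half replicates the sign bit of the low half."""
    lo = x & MASK8
    return (lo, MASK8 if lo & 0x80 else 0)


def trunci_i16_to_i8(v: tuple) -> int:
    """arith.trunci: keep only the low half."""
    return v[0]


def to_i16_bits(v: tuple) -> int:
    """Reassemble an emulated (low, high) pair into a 16-bit pattern."""
    lo, hi = v
    return (hi << 8) | lo
```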

Reviewed By: antiagainst, Mogball

Differential Revision: https://reviews.llvm.org/D133612

Dmitry Preobrazhensky [Thu, 15 Sep 2022 15:15:50 +0000 (18:15 +0300)]
[AMDGPU][MC][GFX11] Add disassembler tests for v_readfirstlane_b32

Differential Revision: https://reviews.llvm.org/D133437

Nico Weber [Thu, 15 Sep 2022 15:12:32 +0000 (11:12 -0400)]
Revert "[lld-macho] Add support for N_INDR symbols"

This reverts commit 5b8da10b87f7009c06215449e4a9c61dab91697a.
Breaks tests, see https://reviews.llvm.org/D133825

Sander de Smalen [Wed, 14 Sep 2022 15:53:13 +0000 (15:53 +0000)]
[AArch64][SME] Fix lowering of llvm.aarch64.get.pstatesm()

A thread may not have access to SME or TPIDR2_EL0, so in order to
safely query PSTATE.SM in a streaming-compatible function, the
code should call `__arm_sme_state()`, as described in the ABI:

  https://github.com/ARM-software/abi-aa/pull/123/commits/c2bb09c4d4ee60a5787baf1ccc7e92e67e4240b7

This means that the value of PSTATE.SM is:
* 0 if the function is non-streaming.
* 1 if the function has `arm_streaming` or `arm_locally_streaming`.
* evaluated at runtime by a call to `__arm_sme_state()` otherwise.
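The three cases can be modelled as a small decision function. This is a hypothetical Python sketch: `Mode` stands in for the function's streaming attributes, and `query_runtime` stands in for the call to `__arm_sme_state()`, whose actual return-value layout is not modelled here.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    NON_STREAMING = "non-streaming"
    STREAMING = "streaming"                  # e.g. `arm_streaming`
    LOCALLY_STREAMING = "locally-streaming"  # e.g. `arm_locally_streaming`
    STREAMING_COMPATIBLE = "streaming-compatible"


def pstate_sm(mode: Mode, query_runtime: Callable[[], int]) -> int:
    """Return PSTATE.SM, consulting the runtime only when it is unknown."""
    if mode is Mode.NON_STREAMING:
        return 0
    if mode in (Mode.STREAMING, Mode.LOCALLY_STREAMING):
        return 1
    # Streaming-compatible: SM depends on the caller, so ask at runtime.
    return query_runtime()
```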

This patch also adds a calling convention for calls to SME support routines.

At some point we can remove the need for the llvm.aarch64.get.pstatesm() intrinsic
and use function calls (with the corresponding cc) directly instead.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131571

Dmitry Preobrazhensky [Thu, 15 Sep 2022 15:03:26 +0000 (18:03 +0300)]
[AMDGPU][MC][GFX11][NFC] Update disassembler tests for MIMG instructions

Differential Revision: https://reviews.llvm.org/D133411

Simon Pilgrim [Thu, 15 Sep 2022 14:55:00 +0000 (15:55 +0100)]
[CostModel][X86] Remove redundant SSSE3 checks from div/rem costs

These all match the default SSE2 costs, so use those instead.

Matt Arsenault [Sat, 23 Jul 2022 14:13:25 +0000 (10:13 -0400)]
RegAllocGreedy: Avoid overflowing priority bitfields

The class priority is expected to be at most 5 bits before it starts
clobbering bits used for other fields. Also clamp the instruction
distance in case we have millions of instructions.

AMDGPU was accidentally overflowing into the global priority bit in
some cases. I think in principle we would have wanted this, but in the
cases I've looked at, it had the counter-intuitive effect of
de-prioritizing the large register tuple.

Avoid the weird bit hack PPC uses for global priority. The
AllocationPriority field is really 5 bits, and PPC was relying on
overflowing it to 6 bits to forcibly set the global priority
bit. Split this out as a separate flag to avoid having magic behavior
for values above 31.
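A toy model of the overflow. The bit layout below is hypothetical and chosen only for illustration (it is not RegAllocGreedy's actual encoding): a 5-bit class-priority field sits directly below a global-priority bit, so a class priority of 32 silently sets the global bit, and an oversized instruction distance bleeds into the class field. The checked version rejects out-of-range class priorities and clamps the distance, mirroring the two fixes described in this commit.

```python
# Hypothetical 32-bit priority word, for illustration only:
#   bit 31      global-priority flag
#   bits 26-30  5-bit register-class priority
#   bits 0-25   instruction distance
GLOBAL_BIT = 1 << 31
CLASS_SHIFT, CLASS_BITS = 26, 5
DIST_BITS = 26


def pack_naive(class_prio: int, distance: int, is_global: bool) -> int:
    """No range handling: oversized fields overwrite their neighbours."""
    return (GLOBAL_BIT if is_global else 0) | (class_prio << CLASS_SHIFT) | distance


def pack_checked(class_prio: int, distance: int, is_global: bool) -> int:
    """Keep every field inside its own bits."""
    if not 0 <= class_prio < (1 << CLASS_BITS):
        raise ValueError("class priority must fit in 5 bits")
    distance = min(distance, (1 << DIST_BITS) - 1)  # clamp huge distances
    return (GLOBAL_BIT if is_global else 0) | (class_prio << CLASS_SHIFT) | distance


def class_field(prio: int) -> int:
    """Extract the 5-bit class-priority field from a packed word."""
    return (prio >> CLASS_SHIFT) & ((1 << CLASS_BITS) - 1)
```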

Simon Pilgrim [Thu, 15 Sep 2022 14:28:51 +0000 (15:28 +0100)]
[CostModel][X86] Remove redundant BTVER2 checks from arithmetic costs

These all match the default AVX/AVX1 costs, so use those instead.

Simon Pilgrim [Thu, 15 Sep 2022 14:25:52 +0000 (15:25 +0100)]
[CostModel][X86] Remove redundant BTVER2 checks from shift costs

These all match the default AVX/AVX1 costs, so use those instead.

Florian Hahn [Thu, 15 Sep 2022 14:12:33 +0000 (15:12 +0100)]
[AArch64] Add big-endian tests for trunc-to-tbl.ll

Extra tests for D133495.

Dmitry Preobrazhensky [Thu, 15 Sep 2022 13:36:19 +0000 (16:36 +0300)]
[AMDGPU][MC][GFX11] Add validation of constant bus limitations for VOPD

Differential Revision: https://reviews.llvm.org/D133881

Dmitry Preobrazhensky [Thu, 15 Sep 2022 13:29:53 +0000 (16:29 +0300)]
[AMDGPU][MC][GFX11] Add VOPD literals validation

Differential Revision: https://reviews.llvm.org/D133864

Dmitry Preobrazhensky [Thu, 15 Sep 2022 13:24:25 +0000 (16:24 +0300)]
[AMDGPU][MC][NFC] Refactor AMDGPUAsmParser::validateVOPLiteral

Differential Revision: https://reviews.llvm.org/D133861

Mehdi Amini [Mon, 29 Aug 2022 12:18:14 +0000 (12:18 +0000)]
Apply clang-tidy fixes for llvm-include-order in TypeTest.cpp (NFC)

Mehdi Amini [Mon, 29 Aug 2022 12:14:14 +0000 (12:14 +0000)]
Apply clang-tidy fixes for bugprone-argument-comment in LLVMTypeTest.cpp (NFC)

Mehdi Amini [Mon, 29 Aug 2022 12:10:49 +0000 (12:10 +0000)]
Apply clang-tidy fixes for readability-simplify-boolean-expr in OpFormatGen.cpp (NFC)

Tue Ly [Thu, 15 Sep 2022 05:00:13 +0000 (01:00 -0400)]
[libc][math] Improve sinhf and coshf performance.

Optimize `sinhf` and `coshf` by computing exp(x) and exp(-x) simultaneously.

Currently `sinhf` and `coshf` are implemented using the following formulas:
```
  sinh(x) = 0.5 * (exp(x) - 1) - 0.5 * (exp(-x) - 1)
  cosh(x) = 0.5*exp(x) + 0.5*exp(-x)
```
where `exp(x)` and `exp(-x)` are calculated separately using the formula:
```
  exp(x) ~ 2^hi * 2^mid * exp(dx)
         ~ 2^hi * 2^mid * P(dx)
```
By expanding the polynomial `P(dx)` into even and odd parts
```
  P(dx) = P_even(dx) + dx * P_odd(dx)
```
we can see that the computations of `exp(x)` and `exp(-x)` have many things in common,
namely:
```
  exp(x)  ~ 2^(hi + mid) * (P_even(dx) + dx * P_odd(dx))
  exp(-x) ~ 2^(-(hi + mid)) * (P_even(dx) - dx * P_odd(dx))
```
Expanding `sinh(x)` and `cosh(x)` with respect to the above formulas, we can compute
these two functions as follows to maximize the shared parts:
```
  sinh(x) = (e^x - e^(-x)) / 2
          ~ 0.5 * (P_even * (2^(hi + mid) - 2^(-(hi + mid))) +
                  dx * P_odd * (2^(hi + mid) + 2^(-(hi + mid))))
  cosh(x) = (e^x + e^(-x)) / 2
          ~ 0.5 * (P_even * (2^(hi + mid) + 2^(-(hi + mid))) +
                  dx * P_odd * (2^(hi + mid) - 2^(-(hi + mid))))
```
So in this patch, we perform the following optimizations for `sinhf` and `coshf`:
  1. Use the above formulas to maximize sharing of intermediate results.
  2. Apply similar optimizations from https://reviews.llvm.org/D133870.
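The sharing can be sketched in double-precision Python. This is an illustration under stated assumptions, not the libc implementation: it assumes a 32-entry split (`hi + mid = k / 32`), uses truncated Taylor series for `P_even` and `P_odd` instead of the fitted polynomials and lookup table, and computes `2^mid` directly. Both results reuse the same `2^(hi + mid) ± 2^(-(hi + mid))` values and the same `P_even`/`P_odd` evaluations.

```python
import math

LN2 = math.log(2.0)
MID_BITS = 5              # assumption: 2^mid is taken from a 32-entry table
SCALE = 1 << MID_BITS     # so hi + mid = k / 32 for an integer k


def _exp_parts(x: float):
    """Split x = (hi + mid) * ln2 + dx; return 2^(hi+mid), 2^-(hi+mid), dx."""
    k = round(x * SCALE / LN2)
    dx = x - k * LN2 / SCALE                    # |dx| <= ln2 / 64
    hi, mid = k >> MID_BITS, k & (SCALE - 1)    # k == hi * 32 + mid
    a = math.ldexp(2.0 ** (mid / SCALE), hi)    # 2^(hi + mid)
    b = math.ldexp(2.0 ** (-mid / SCALE), -hi)  # 2^-(hi + mid)
    return a, b, dx


def _p_even(dx: float) -> float:
    """Even part of exp(dx): truncated Taylor series of cosh(dx)."""
    d2 = dx * dx
    return 1.0 + d2 * (1 / 2 + d2 * (1 / 24 + d2 / 720))


def _p_odd(dx: float) -> float:
    """Odd part of exp(dx) divided by dx: truncated series of sinh(dx)/dx."""
    d2 = dx * dx
    return 1.0 + d2 * (1 / 6 + d2 * (1 / 120 + d2 / 5040))


def sinh_cosh(x: float):
    """Compute (sinh(x), cosh(x)) from one shared set of intermediates."""
    a, b, dx = _exp_parts(x)
    pe = _p_even(dx)
    po = dx * _p_odd(dx)
    s, d = a + b, a - b        # 2^(hi+mid) +/- 2^-(hi+mid), used by both
    sinh = 0.5 * (pe * d + po * s)
    cosh = 0.5 * (pe * s + po * d)
    return sinh, cosh
```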

Performance benchmarks using the `perf` tool from the CORE-MATH project on a Ryzen 1700:
For `sinhf`:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinhf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 16.718
System LIBC reciprocal throughput : 63.151

BEFORE:
LIBC reciprocal throughput        : 90.116
LIBC reciprocal throughput        : 28.554    (with `-msse4.2` flag)
LIBC reciprocal throughput        : 22.577    (with `-mfma` flag)

AFTER:
LIBC reciprocal throughput        : 36.482
LIBC reciprocal throughput        : 16.955    (with `-msse4.2` flag)
LIBC reciprocal throughput        : 13.943    (with `-mfma` flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinhf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 48.821
System LIBC latency : 137.019

BEFORE
LIBC latency        : 97.122
LIBC latency        : 84.214    (with `-msse4.2` flag)
LIBC latency        : 71.611    (with `-mfma` flag)

AFTER
LIBC latency        : 54.555
LIBC latency        : 50.865    (with `-msse4.2` flag)
LIBC latency        : 48.700    (with `-mfma` flag)
```
For `coshf`:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh coshf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 16.939
System LIBC reciprocal throughput : 19.695

BEFORE:
LIBC reciprocal throughput        : 52.845
LIBC reciprocal throughput        : 29.174    (with `-msse4.2` flag)
LIBC reciprocal throughput        : 22.553    (with `-mfma` flag)

AFTER:
LIBC reciprocal throughput        : 37.169
LIBC reciprocal throughput        : 17.805    (with `-msse4.2` flag)
LIBC reciprocal throughput        : 14.691    (with `-mfma` flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh coshf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 48.478
System LIBC latency : 48.044

BEFORE
LIBC latency        : 99.123
LIBC latency        : 85.595    (with `-msse4.2` flag)
LIBC latency        : 72.776    (with `-mfma` flag)

AFTER
LIBC latency        : 57.760
LIBC latency        : 53.967    (with `-msse4.2` flag)
LIBC latency        : 50.987    (with `-mfma` flag)
```

Reviewed By: orex, zimmermann6

Differential Revision: https://reviews.llvm.org/D133913

Alexey Lapshin [Sun, 4 Sep 2022 09:38:36 +0000 (12:38 +0300)]
[DWARFLinker][NFC] Set the target DWARF version explicitly.

Currently, DWARFLinker determines the target DWARF version internally:
it examines the incoming object files, detects the maximal DWARF version,
and uses that version for the output file.
This patch allows the consumer of DWARFLinker to set the output DWARF
version explicitly, so that DWARFLinker uses the specified version instead
of an autodetected one. This lets consumers use different logic for
setting the target DWARF version. For example, instead of the maximal
version used, someone could set a higher version to convert from DWARFv4
to DWARFv5 (this is not supported yet, but it would be good if the
interface supported it). Another option is to set the target version
through the command line. In this patch, the autodetection is moved
into the consumers (DwarfLinkerForBinary.cpp, DebugInfoLinker.cpp).
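A minimal sketch of the consumer-side policy this enables, assuming a hypothetical helper (it is not the DWARFLinker API): an explicitly requested version, for example from a command-line option, wins; otherwise the consumer falls back to the previous behaviour of using the maximal version found in the inputs.

```python
from typing import Iterable, Optional


def choose_target_dwarf_version(input_versions: Iterable[int],
                                override: Optional[int] = None) -> int:
    """Hypothetical consumer-side policy for picking the output DWARF version.

    An explicit override wins; otherwise use the maximal input version.
    """
    if override is not None:
        return override
    versions = list(input_versions)
    if not versions:
        raise ValueError("no input DWARF versions to autodetect from")
    return max(versions)
```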

Differential Revision: https://reviews.llvm.org/D132755

Simon Pilgrim [Thu, 15 Sep 2022 13:01:27 +0000 (14:01 +0100)]
[CostModel][X86] Add CostKinds handling for vector shift by uniform/constuniform ops

Vector shift by a const uniform amount is the cheapest shift instruction we have; non-const uniform shifts have a marginally higher cost: some targets 'splat' the amount internally to use the shift-per-element instruction, while others see a higher cost for the explicit zeroing of the upper bits of the (64-bit) shift amount.

This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)

Florian Hahn [Thu, 15 Sep 2022 13:01:26 +0000 (14:01 +0100)]
[AArch64] Add big-endian tests for zext-to-tbl.ll

Extra tests for D120571.

wanglei [Thu, 15 Sep 2022 12:31:24 +0000 (20:31 +0800)]
[LoongArch] Fixup value adjustment in applyFixup

A complete implementation of `applyFixup` for D132323.

Makes `LoongArchAsmBackend::shouldForceRelocation` determine
whether the relocation types must be forced.

This patch also adds range and alignment checks for the operands of `b*`
instructions, at which point the offset to a label is known.
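The range and alignment checks can be sketched as follows. This is a hypothetical Python helper, not the LLVM code. The field widths used in the checks (16 bits for `beq`-style branches, 21 for `beqz`/`bnez`, 26 for `b`/`bl`) are assumptions based on the LoongArch encoding and should be verified against the ISA manual. Branch offsets are stored right-shifted by 2, so the byte offset must be 4-byte aligned and `offset >> 2` must fit in a signed field.

```python
def check_branch_fixup(offset: int, field_bits: int) -> int:
    """Validate a PC-relative branch offset and return its encoded field.

    Hypothetical helper: the offset must be 4-byte aligned and offset >> 2
    must fit in a signed `field_bits`-bit immediate.
    """
    if offset % 4 != 0:
        raise ValueError(f"fixup value {offset} must be 4-byte aligned")
    lo = -(1 << (field_bits + 1))       # most negative reachable byte offset
    hi = (1 << (field_bits + 1)) - 4    # most positive reachable byte offset
    if not lo <= offset <= hi:
        raise ValueError(f"fixup value {offset} out of range [{lo}, {hi}]")
    return (offset >> 2) & ((1 << field_bits) - 1)  # two's-complement field
```

For example, `check_branch_fixup(8, 16)` encodes to `2`, while an offset of `6` is rejected as misaligned.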

Differential Revision: https://reviews.llvm.org/D132818

Aleksandr Platonov [Thu, 15 Sep 2022 12:51:30 +0000 (15:51 +0300)]
[clang][RecoveryExpr] Don't perform alignment check if parameter type is dependent

This patch fixes a crash caused by calling getTypeAlignInChars() with a dependent type.

Reviewed By: hokein

Differential Revision: https://reviews.llvm.org/D133886

Ivan Kosarev [Thu, 15 Sep 2022 12:20:24 +0000 (13:20 +0100)]
[AMDGPU][SILoadStoreOptimizer] Merge SGPR_IMM scalar buffer loads.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D133787

Jez Ng [Thu, 15 Sep 2022 12:34:35 +0000 (08:34 -0400)]
[lld-macho] Add support for N_INDR symbols

This is similar to the `-alias` CLI option, but it gives finer-grained
control in that it allows the aliased symbols to be treated as private
externs.

While working on this, I realized that our `-alias` handling did not
cover the cases where the aliased symbol is a common or dylib symbol,
nor the case where we have an undefined symbol that gets treated specially and
converted to a defined symbol later on. My N_INDR handling neglects this too
for now; I've added checks and TODO messages for these.

`N_INDR` symbols cropped up as part of our attempt to link swift-stdlib.

Reviewed By: #lld-macho, thakis, thevinster

Differential Revision: https://reviews.llvm.org/D133825

Haojian Wu [Thu, 15 Sep 2022 11:54:29 +0000 (13:54 +0200)]
[mlir] Remove the unused source file.

It seems to be a missing file in 84d07d021333f7b5716f0444d5c09105557272e0

Differential Revision: https://reviews.llvm.org/D133937

Arjun P [Thu, 15 Sep 2022 12:23:00 +0000 (13:23 +0100)]
[MLIR][Presburger] clarify why -0 is used instead of 0 (NFC)

Ilia Diachkov [Wed, 14 Sep 2022 08:51:03 +0000 (11:51 +0300)]
[SPIRV] add IR regularization pass

The patch adds the regularization pass that prepares LLVM IR for
the IR translation. It also contains the following changes:
- reduce indentation, make getNonParametrizedType, getSamplerType,
getPipeType, getImageType, getSampledImageType static in SPIRVBuiltins,
- rename mayBeOclOrSpirvBuiltin to getOclOrSpirvBuiltinDemangledName,
- move isOpenCLBuiltinType, isSPIRVBuiltinType, isSpecialType from
SPIRVGlobalRegistry.cpp to SPIRVUtils.cpp, renaming isSpecialType to
isSpecialOpaqueType,
- implement getTgtMemIntrinsic() in SPIRVISelLowering,
- add hasSideEffects = 0 in Pseudo (SPIRVInstrFormats.td),
- add legalization rule for G_MEMSET, correct G_BRCOND rule,
- add capability processing for OpBuildNDRange in SPIRVModuleAnalysis,
- don't correct types of registers holding constants and used in
G_ADDRSPACE_CAST (SPIRVPreLegalizer.cpp),
- lower memset/bswap intrinsics to functions in SPIRVPrepareFunctions,
- change TargetLoweringObjectFileELF to SPIRVTargetObjectFile
in SPIRVTargetMachine.cpp,
- correct comments.
5 LIT tests are added to show the improvement.

Differential Revision: https://reviews.llvm.org/D133253

Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com>
Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com>
Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>

Michael Platings [Thu, 15 Sep 2022 10:59:03 +0000 (11:59 +0100)]
[NFC] Don't assume llvm directory is CMake root

This makes the file consistent with ARM/CMakeLists.txt

Haojian Wu [Thu, 15 Sep 2022 11:52:46 +0000 (13:52 +0200)]
Fix bazel build after 84d07d021333f7b5716f0444d5c09105557272e0.

Nicolas Vasilache [Thu, 15 Sep 2022 11:12:58 +0000 (04:12 -0700)]
[mlir][Linalg] Post submit addressed comments missed in f0cdc5bcd3f25192f12bfaff072ce02497b59c3c

Differential Revision: https://reviews.llvm.org/D133936

Tobias Hieta [Thu, 15 Sep 2022 11:32:32 +0000 (13:32 +0200)]
[NFC] Fix exception in version-check.py script

Aaron Ballman [Thu, 15 Sep 2022 11:29:49 +0000 (07:29 -0400)]
Add a "Potentially Breaking Changes" section to the Clang release notes

Sometimes we make changes to the compiler that we expect may cause
disruption for users. For example, we may strengthen a warning to
default to be an error, or fix an accepts-invalid bug that's been
around for a long time, etc., which may cause previously accepted code to
now be rejected. Rather than hope users discover that information by
reading all of the release notes, it's better that we call these out in
one location at the top of the release notes.

Based on feedback collected in the discussion at:
https://discourse.llvm.org/t/configure-script-breakage-with-the-new-werror-implicit-function-declaration/65213/

Differential Revision: https://reviews.llvm.org/D133771

Groverkss [Thu, 15 Sep 2022 11:09:00 +0000 (12:09 +0100)]
[MLIR][Presburger] Improve unittest parsing

This patch adds better functions for parsing MultiAffineFunctions and
PWMAFunctions in Presburger unittests.

A PWMAFunction can now be parsed as:

```
PWMAFunction result = parsePWMAF({
    {"(x, y) : (x >= 10, x <= 20, y >= 1)", "(x, y) -> (x + y)"},
    {"(x, y) : (x >= 21)", "(x, y) -> (x + y)"},
    {"(x, y) : (x <= 9)", "(x, y) -> (x - y)"},
    {"(x, y) : (x >= 10, x <= 20, y <= 0)", "(x, y) -> (x - y)"},
});
```

which is much more readable than the old format since the output can be
described as an AffineMap, instead of coefficients.
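To make the semantics concrete, the same piecewise function can be modelled in Python as a list of (domain, function) pairs over integer points. This is only an illustrative model, not the Presburger library's representation; `PIECES` and `evaluate` are hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

# (domain predicate, affine function) pairs over integer points (x, y),
# transcribing the four pieces of the parsePWMAF example in this message.
Piece = Tuple[Callable[[int, int], bool], Callable[[int, int], int]]

PIECES: List[Piece] = [
    (lambda x, y: 10 <= x <= 20 and y >= 1, lambda x, y: x + y),
    (lambda x, y: x >= 21, lambda x, y: x + y),
    (lambda x, y: x <= 9, lambda x, y: x - y),
    (lambda x, y: 10 <= x <= 20 and y <= 0, lambda x, y: x - y),
]


def evaluate(pieces: List[Piece], x: int, y: int) -> Optional[int]:
    """Evaluate the piecewise function; None if no domain contains (x, y)."""
    matches = [f for domain, f in pieces if domain(x, y)]
    if len(matches) > 1:
        raise ValueError(f"pieces overlap at {(x, y)}")
    return matches[0](x, y) if matches else None
```

Over the integers the four domains are disjoint and together cover every point, so `evaluate` never returns `None` for this function.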

This patch also adds support for parsing divisions in MultiAffineFunctions
and PWMAFunctions which was previously not possible.

Reviewed By: arjunp

Differential Revision: https://reviews.llvm.org/D133654

esmeyi [Thu, 15 Sep 2022 10:06:25 +0000 (06:06 -0400)]
[PowerPC] Converts to comparison against zero even when the optimization doesn't happen in the peephole optimizer.

Summary: Converting a comparison against 1 or -1 into a comparison
against 0 can exploit record-form instructions for comparison optimization.
The conversion will happen only when a record-form instruction can be used
to replace the comparison during the peephole optimizer (see function optimizeCompareInstr).
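The rewrite relies on simple identities for signed integers, which can be spot-checked over a range of values. The table below is a hypothetical Python illustration; the exact set of predicates the PowerPC code handles is an assumption, and the real transformation operates on machine instructions and CR fields rather than Python values.

```python
# Signed compares against 1 or -1 that have an equivalent compare against 0.
REWRITES = {
    ("lt", 1): "le",   # x <  1   <=>  x <= 0
    ("ge", 1): "gt",   # x >= 1   <=>  x >  0
    ("gt", -1): "ge",  # x > -1   <=>  x >= 0
    ("le", -1): "lt",  # x <= -1  <=>  x <  0
}

OPS = {
    "lt": lambda a, b: a < b,
    "le": lambda a, b: a <= b,
    "gt": lambda a, b: a > b,
    "ge": lambda a, b: a >= b,
}


def equivalent_zero_compare(pred: str, c: int):
    """Return the predicate to use against 0 instead of c, or None."""
    return REWRITES.get((pred, c))
```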

In post-RA, we also want to optimize the comparison by using the record
form (see D131873) and it requires additional dataflow analysis to reliably
find uses of the CR register set.

It's reasonable to share the conversion between the peephole optimizer and
the post-RA optimizer.

Converting to a comparison against zero even when the optimization doesn't
happen in the peephole optimizer may create additional opportunities for the
post-RA optimization.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D131374

Guray Ozen [Thu, 15 Sep 2022 08:39:13 +0000 (10:39 +0200)]
[mlir][linalg] Retire Linalg's Vectorization Pattern

This revision retires the LinalgCodegenStrategy vectorization pattern. Please see the context: https://discourse.llvm.org/t/psa-retire-linalg-filter-based-patterns/63785.
This revision improves the transform dialect's VectorizeOp in the following ways:
- Adds LinalgDialect as a dependent dialect. When `transform.structured.vectorize` vectorizes `tensor.pad`, it generates `linalg.init_tensor`; in this case, the linalg dialect must be registered.
- Inserts CopyVectorizationPattern in order to vectorize `memref.copy`.
- Creates two attributes: `disable_multi_reduction_to_contract_patterns` and `disable_transfer_permutation_map_lowering_patterns`. They limit the power of vectorization and are currently intended for testing purposes.

It also removes some of the "CHECK: vector.transfer_write" lines in the vectorization.mlir test. These writes are redundant because the same location is written again at the end of the code, and the transform dialect no longer generates them.

Depends on D133684 that retires the LinalgCodegenStrategy vectorization pass.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D133699

Guray Ozen [Thu, 15 Sep 2022 08:45:46 +0000 (10:45 +0200)]
[mlir][linalg] Retire Linalg's StrategyVectorizePass

This retires Linalg's strategy vectorize pass. Our goal is to use the transform dialect instead of passes.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D133684

Tom Praschan [Sun, 11 Sep 2022 12:54:26 +0000 (14:54 +0200)]
[clangd] Fix hover on symbol introduced by using declaration

This fixes https://github.com/clangd/clangd/issues/1284. The example
there was C++20's "using enum", but I noticed that we had the same issue
for other using-declarations.

Differential Revision: https://reviews.llvm.org/D133664

Marco Elver [Thu, 15 Sep 2022 08:36:11 +0000 (10:36 +0200)]
[GlobalISel][AArch64] Fix pcsections for expanded atomics and add more tests

Add fix for propagation of !pcsections metadata for expanded atomics,
together with more tests for interesting atomic instructions (based on
llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll).

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D133710

Christian Sigg [Wed, 7 Sep 2022 22:00:38 +0000 (00:00 +0200)]
[Bazel] Add lit tests to bazel builds.

Add BUILD.bazel files for most of the MLIR tests and lit tests itself.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D133455

Evgeniy Brevnov [Thu, 15 Sep 2022 05:24:56 +0000 (12:24 +0700)]
[JumpThreading][NFC] Reuse existing DT instead of recomputation (newPM)

This is the same change as
503d5771b6c5e3544a9fa3be6b8d085ffbbd4057 with the same intent but for the new pass manager.

Fangrui Song [Thu, 15 Sep 2022 05:14:36 +0000 (22:14 -0700)]
[IRBuilder] Fix -Wunused-variable in non-assertion build. NFC

Dhruva Chakrabarti [Thu, 15 Sep 2022 03:08:46 +0000 (03:08 +0000)]
Revert "[OpenMP] Codegen aggregate for outlined function captures"

This reverts commit 7539e9cf811e590d9f12ae39673ca789e26386b4.

Vitaly Buka [Mon, 12 Sep 2022 04:49:18 +0000 (21:49 -0700)]
[nfc][msan] getShadowOriginPtr on <N x ptr>

Some vector instructions can benefit from support
for Addr as <N x ptr>.

Differential Revision: https://reviews.llvm.org/D133681

Vitaly Buka [Mon, 12 Sep 2022 04:30:07 +0000 (21:30 -0700)]
[IRBuilder] Add CreateMaskedExpandLoad and CreateMaskedCompressStore

Vitaly Buka [Sun, 11 Sep 2022 19:55:57 +0000 (12:55 -0700)]
[NFC][msan] Rename variables to match definition

Vitaly Buka [Sun, 11 Sep 2022 00:22:32 +0000 (17:22 -0700)]
[NFC][msan] Convert some code to early returns

Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D133673

Vitaly Buka [Sun, 11 Sep 2022 00:13:12 +0000 (17:13 -0700)]
[NFC][msan] Simplify llvm.masked.load origin code

Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D133652

Vitaly Buka [Thu, 15 Sep 2022 01:18:45 +0000 (18:18 -0700)]
[msan] Resolve FIXME from D133880

We don't need to change tests: we convertToBool
unconditionally, but only before the OR.

Vitaly Buka [Thu, 15 Sep 2022 01:42:13 +0000 (18:42 -0700)]
[test][msan] Use implicit-check-not

Sheng [Thu, 15 Sep 2022 01:22:15 +0000 (09:22 +0800)]
[M68k] Fix the crash of fast register allocator

`MOVEM` is used to spill the register, which causes problems with 1-byte data, since it only supports word (2-byte) and long (4-byte) sizes.

Use the normal `move` instruction to spill 1-byte data instead.

Fixes #57660

Reviewed By: myhsu

Differential Revision: https://reviews.llvm.org/D133636

Jeff Niu [Wed, 14 Sep 2022 00:35:38 +0000 (17:35 -0700)]
[mlir] Allow `Attribute::print` to elide the type

This patch adds a flag to `Attribute::print` that prints the attribute
without its type.

Fixes #57689

Reviewed By: rriddle, lattner

Differential Revision: https://reviews.llvm.org/D133822

Jeff Niu [Wed, 14 Sep 2022 20:43:45 +0000 (13:43 -0700)]
[mlir][ods] Add cppClassName to ConfinedType

So ODS can generate `OneTypedResult` when a ConfinedType is used as a
result type.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D133893

Giorgis Georgakoudis [Thu, 15 Sep 2022 00:09:54 +0000 (00:09 +0000)]
[OpenMP] Codegen aggregate for outlined function captures

Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) on the number of arguments, and (3) forwarded arguments must be cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Reviewed By: jdoerfert, jhuber6, ABataev

Differential Revision: https://reviews.llvm.org/D102107

Stanislav Mekhanoshin [Wed, 14 Sep 2022 22:42:30 +0000 (15:42 -0700)]
Fix crash while printing MMO target flags

MachineMemOperand::print can dereference a NULL pointer if TII
is not passed from printMemOperand. This does not happen while
dumping the DAG/MIR from llc but crashes the debugger if a dump()
method is called from gdb.

Differential Revision: https://reviews.llvm.org/D133903

Craig Topper [Wed, 14 Sep 2022 22:53:18 +0000 (15:53 -0700)]
[RISCV] Simplify some code in RISCVInstrInfo::verifyInstruction. NFCI

This code was written as if it lived in the MC layer instead of
the CodeGen layer. We get the MCInstrDesc directly from MachineInstr.
And we can use RISCVSubtarget::is64Bit instead of going to the
Triple.

Differential Revision: https://reviews.llvm.org/D133905

Sam Clegg [Thu, 1 Sep 2022 07:59:54 +0000 (00:59 -0700)]
[MC] Fix typo in getSectionAddressSize comment. NFC

The comment was referring to a now non-existent function that was removed
in 93e3cf0ebd9c95a8df42fff0aa38fc022422b4d4.

Differential Revision: https://reviews.llvm.org/D133098

22 months ago[IR][VP] Remove IntrArgMemOnly from vp.gather/scatter.
Craig Topper [Wed, 14 Sep 2022 21:52:17 +0000 (14:52 -0700)]
[IR][VP] Remove IntrArgMemOnly from vp.gather/scatter.

IntrArgMemOnly is only valid for intrinsics that use a scalar
pointer argument. These intrinsics use a vector of pointers.

Alias analysis will try to find a scalar pointer argument and
will return incorrect alias results when it doesn't find one.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133898

22 months ago[GVN][VP] Add test case for incorrect removal of a vp.gather. NFC
Craig Topper [Wed, 14 Sep 2022 21:52:05 +0000 (14:52 -0700)]
[GVN][VP] Add test case for incorrect removal of a vp.gather. NFC

Pre-commit for D133898

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133899

22 months ago[lld-macho] Have ICF dedup explicitly-defined selrefs
Jez Ng [Tue, 13 Sep 2022 15:15:15 +0000 (11:15 -0400)]
[lld-macho] Have ICF dedup explicitly-defined selrefs

This is what ld64 does (though it doesn't use ICF to do this; instead it
always dedups selrefs by default).

We'll want to dedup implicitly-defined selrefs as well, but I will leave
that for future work.

Additionally, I'm not *super* happy with the current LLD implementation
because I think it is rather janky and inefficient. But at least it
moves us toward the goal of closing the size gap with ld64. I've
described ideas for cleaning up our implementation here:
https://github.com/llvm/llvm-project/issues/57714
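As a rough illustration of what deduplicating selrefs means, the sketch below maps each selector reference to the first reference naming the same selector string, so later duplicates can be folded into it. `dedupSelrefs` is a hypothetical helper; LLD's actual ICF-based implementation works on input sections and differs from this string map.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Given the selector name each selref points to, return for every selref the
// index of the canonical (first) selref with the same name. Duplicates can
// then be folded into their canonical entry.
std::vector<std::size_t> dedupSelrefs(const std::vector<std::string> &selectorNames) {
  std::unordered_map<std::string, std::size_t> firstByName;
  std::vector<std::size_t> canonical;
  canonical.reserve(selectorNames.size());
  for (std::size_t i = 0; i < selectorNames.size(); ++i) {
    // emplace keeps the existing entry if the name was already seen.
    auto it = firstByName.emplace(selectorNames[i], i).first;
    canonical.push_back(it->second);
  }
  assert(canonical.size() == selectorNames.size());
  return canonical;
}
```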

Differential Revision: https://reviews.llvm.org/D133780

22 months ago[lld-macho][nfc] Clean up ICF code
Jez Ng [Wed, 14 Sep 2022 21:46:41 +0000 (17:46 -0400)]
[lld-macho][nfc] Clean up ICF code

Split these changes out from https://reviews.llvm.org/D133780.

22 months ago[msan] Change logic of ClInstrumentationWithCallThreshold
Vitaly Buka [Wed, 14 Sep 2022 03:17:55 +0000 (20:17 -0700)]
[msan] Change logic of ClInstrumentationWithCallThreshold

According to the logs, ClInstrumentationWithCallThreshold is a workaround
for a slow backend with a large number of basic blocks.
However, I can't reproduce that slowdown; instead I see a significant slowdown
after ClCheckConstantShadow. Without ClInstrumentationWithCallThreshold,
the compiler is able to eliminate many of the branches.

So maybe we should drop ClInstrumentationWithCallThreshold completely.

For now I just change the logic to ignore constant shadow so it will
not trigger callback fallback too early.
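A hedged sketch of the changed counting rule, assuming each pending check is tagged with whether its shadow folded to a constant. `ShadowCheck` and `shouldUseCallbacks` are hypothetical; in MemorySanitizer the decision is made inside the instrumentation pass and its exact comparison may differ. Checks on constant shadow are excluded from the count, so they no longer push a function over the callback threshold.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct ShadowCheck {
  bool shadowIsConstant; // shadow is a compile-time constant
};

// Returns true when the function should fall back to callback-based checks.
// Only checks whose shadow is not constant count toward the threshold.
bool shouldUseCallbacks(const std::vector<ShadowCheck> &checks,
                        std::size_t threshold) {
  std::size_t nonConstant = static_cast<std::size_t>(std::count_if(
      checks.begin(), checks.end(),
      [](const ShadowCheck &c) { return !c.shadowIsConstant; }));
  return nonConstant > threshold;
}
```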

Reviewed By: kstoimenov

Differential Revision: https://reviews.llvm.org/D133880

22 months ago[RISCV] Update error message to not call 'RV32' and 'RV64' an extension.
Craig Topper [Wed, 14 Sep 2022 21:41:19 +0000 (14:41 -0700)]
[RISCV] Update error message to not call 'RV32' and 'RV64' an extension.

I used RV32 so I didn't have to write RV32I and RV32E. Ideally
these builtins will be wrapped in a header someday so long term I don't
expect users to see these errors.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D133444

22 months agoRevert "Revert "Be more careful to maintain quoting information when parsing commands.""
Jim Ingham [Wed, 14 Sep 2022 19:45:26 +0000 (12:45 -0700)]
Revert "Revert "Be more careful to maintain quoting information when parsing commands.""

    This reverts commit ac05bc0524c66c74278b26742896a4c634c034cf.

I had incorrectly removed one set of checks in the option handling in
Options::ParseAlias because I couldn't see what they were for. They were a
bit obscure, but they handled the case where you pass "-something=other --"
as the input_line. Without them, the built-in "run" alias did not return
the right value for IsDashDashCommand, causing TestHelp.py to fail.

22 months ago[RISCV] Verify SEW/VecPolicy immediate values
Philip Reames [Wed, 14 Sep 2022 21:40:29 +0000 (14:40 -0700)]
[RISCV] Verify SEW/VecPolicy immediate values

Copy the asserts from the printing code, and turn them into actual verifier rules. Doing this revealed an existing bug - see 0a14551.
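The kind of rule involved can be sketched as below. This assumes, without quoting the RISC-V backend, that the SEW operand holds log2(SEW) with 0 accepted for mask-only instructions, that valid element widths are 8 to 64 bits, and that the policy immediate only has a tail-agnostic bit (1) and a mask-agnostic bit (2). `verifyVectorImms` and the error strings are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed encoding of the policy immediate.
constexpr uint64_t kTailAgnostic = 1;
constexpr uint64_t kMaskAgnostic = 2;

// Returns an empty string if the immediates are valid, else an error message,
// in the spirit of a verifier hook rather than an assert in the printer.
std::string verifyVectorImms(uint64_t log2SEW, uint64_t policy) {
  // 0 is assumed to mean a mask-only operation; otherwise SEW must be
  // 8, 16, 32 or 64, i.e. log2SEW in [3, 6].
  if (log2SEW != 0 && (log2SEW < 3 || log2SEW > 6))
    return "unexpected SEW value";
  if (policy > (kTailAgnostic | kMaskAgnostic))
    return "invalid policy value";
  return "";
}
```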

Differential Revision: https://reviews.llvm.org/D133869

22 months ago[RISCV] Fix a silent miscompile in copyPhysReg
Philip Reames [Wed, 14 Sep 2022 21:39:09 +0000 (14:39 -0700)]
[RISCV] Fix a silent miscompile in copyPhysReg

Found this when adding verifier rules. The case which arises is that we have a DefMBBI which has a VecPolicy operand. The code was not expecting this, and the unconditional copy of the last two operands resulted in the SEW and VecPolicy fields being added to the VMV_V_V as AVL and SEW respectively.

Oddly, this appears to be silent in practice. There's no test change, despite the verifier changes proving that we definitely hit this in existing tests.

Differential Revision: https://reviews.llvm.org/D133868

22 months ago[mlir] Fix warnings
Kazu Hirata [Wed, 14 Sep 2022 21:35:19 +0000 (14:35 -0700)]
[mlir] Fix warnings

This patch fixes three warnings of the form:

  mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp:1436:5: error:
  default label in switch which covers all enumeration values
  [-Werror,-Wcovered-switch-default]
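For context, a minimal example of the pattern this warning targets and the usual fix, using a hypothetical `ConvKind` enum: when every enumerator has a `case`, a `default:` label is dead, so it is dropped. The trailing statements after the switch only cover out-of-range values; LLVM code would normally use `llvm_unreachable` there.

```cpp
#include <cassert>

enum class ConvKind { Conv1D, Conv2D, Depthwise };

// Every enumerator is handled, so there is no `default:` label: adding one
// would trigger -Wcovered-switch-default, and leaving it out keeps -Wswitch
// able to report a newly added enumerator that is not handled here.
const char *kindName(ConvKind kind) {
  switch (kind) {
  case ConvKind::Conv1D:
    return "conv1d";
  case ConvKind::Conv2D:
    return "conv2d";
  case ConvKind::Depthwise:
    return "depthwise";
  }
  // Only reachable with an out-of-range value.
  assert(false && "invalid ConvKind");
  return "";
}
```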

22 months ago[mlir][vector] Check minor identity map in FoldExtractSliceIntoTransferRead
Lei Zhang [Wed, 14 Sep 2022 21:23:27 +0000 (17:23 -0400)]
[mlir][vector] Check minor identity map in FoldExtractSliceIntoTransferRead

vector.transfer_read ops with a minor identity indexing map are rank
reducing, with implicit leading unit dimensions. This should be
a natural extension to support in addition to full identity indexing
maps.
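To make "minor identity" concrete: writing each map result as the input dimension it selects, a minor identity selects the trailing (most minor) dimensions in order, e.g. `(d0, d1, d2) -> (d1, d2)`, and the missing leading dimension is the implicit unit dimension. The helper below is a hypothetical stand-alone check; MLIR provides `AffineMap::isMinorIdentity()` for real maps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// `results[i]` is the input dimension selected by result i, so
// (d0, d1, d2) -> (d1, d2) is represented as numDims = 3, results = {1, 2}.
bool isMinorIdentitySketch(std::size_t numDims,
                           const std::vector<std::size_t> &results) {
  if (results.size() > numDims)
    return false;
  // A minor identity keeps the last results.size() dimensions, in order.
  std::size_t offset = numDims - results.size();
  for (std::size_t i = 0; i < results.size(); ++i)
    if (results[i] != offset + i)
      return false;
  return true;
}
```

A full identity is the special case where `results.size() == numDims`.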

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D133883

22 months ago[libc] Add POSIX functions pread and pwrite.
Siva Chandra Reddy [Wed, 14 Sep 2022 19:23:40 +0000 (19:23 +0000)]
[libc] Add POSIX functions pread and pwrite.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D133888

22 months ago[OptBisect] Add flag to print IR when opt-bisect kicks in
Arthur Eubanks [Tue, 13 Sep 2022 21:01:15 +0000 (14:01 -0700)]
[OptBisect] Add flag to print IR when opt-bisect kicks in

-opt-bisect-print-ir-path=foo will dump the IR to foo when opt-bisect-limit starts skipping passes.

Currently we don't print the IR if the opt-bisect-limit is higher than the total number of times opt-bisect is called.

This makes getting the IR right before a bad transform easier.
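A simplified model of this behaviour, with hypothetical names (`OptBisectSketch`, `shouldRunPass`) and the IR kept as a string instead of being written to the file given by `-opt-bisect-print-ir-path`: passes are numbered as they are queried, those past the limit are skipped, and the IR is captured only the first time a pass is skipped. If the limit is never reached, nothing is captured, as noted above.

```cpp
#include <cassert>
#include <string>

class OptBisectSketch {
public:
  explicit OptBisectSketch(int limit) : limit(limit) {}

  // Called before each pass with the IR it would transform.
  bool shouldRunPass(const std::string &currentIR) {
    int passNum = ++lastPassNum;
    bool run = passNum <= limit;
    if (!run && !irCaptured) {
      // First skipped pass: with the limit set to the last good pass number,
      // this is the IR right before the bad transform.
      capturedIR = currentIR;
      irCaptured = true;
    }
    return run;
  }

  bool hasCapturedIR() const { return irCaptured; }
  const std::string &getCapturedIR() const { return capturedIR; }

private:
  int limit;
  int lastPassNum = 0;
  bool irCaptured = false;
  std::string capturedIR;
};
```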

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D133809

22 months ago[lsan][Darwin] Scan libdispatch and Foundation memory regions
Leonard Grey [Wed, 7 Sep 2022 17:21:09 +0000 (13:21 -0400)]
[lsan][Darwin] Scan libdispatch and Foundation memory regions

libdispatch uses its own heap (_dispatch_main_heap) for some allocations, including the dispatch_continuation_t that holds a dispatch source's event handler.
Objective-C block trampolines (creating methods at runtime with a block as the implementation) use the VM_MEMORY_FOUNDATION region (see https://github.com/apple-oss-distributions/objc4/blob/8701d5672d3fd3cd817aeb84db1077aafe1a1604/runtime/objc-block-trampolines.mm#L371).

This change scans both regions to fix false positives. See tests for details; unfortunately I was unable to reduce the trampoline example with imp_implementationWithBlock on a new class, so I'm resorting to something close to the bug as seen in the wild.

Differential Revision: https://reviews.llvm.org/D129385

22 months ago[pipelines] Require GlobalsAA after sanitizers
Vitaly Buka [Thu, 8 Sep 2022 22:39:15 +0000 (15:39 -0700)]
[pipelines] Require GlobalsAA after sanitizers

Restore GlobalsAA if sanitizers were inserted at the early optimize callback.
The analysis can be useful for the following FunctionPassManager.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D133537

22 months ago[NFC][CodeGen] Remove empty line
Vitaly Buka [Wed, 14 Sep 2022 20:28:46 +0000 (13:28 -0700)]
[NFC][CodeGen] Remove empty line

22 months ago[mlir][LLVMIR] Add lifetime start and end marker intrinsics
Jeff Niu [Wed, 14 Sep 2022 15:36:51 +0000 (08:36 -0700)]
[mlir][LLVMIR] Add lifetime start and end marker intrinsics

This patch adds the `llvm.intr.lifetime.start` and `llvm.intr.lifetime.end`
intrinsics which are used to indicate to LLVM the lifetimes of allocated
memory.

These ops have the requirement that the first argument (the size) be an
"immediate argument". I added an OpTrait to check this, but it is
possible that an approach like GEPArg would work too.

Reviewed By: rriddle, dcaballe

Differential Revision: https://reviews.llvm.org/D133867

22 months ago[mlir][linalg] fix switch case for conv-vec to have brackets
Stanley Winata [Wed, 14 Sep 2022 20:08:21 +0000 (13:08 -0700)]
[mlir][linalg] fix switch case for conv-vec to have brackets

The Windows build requires braces around switch cases that initialize
variables.
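C++ does not allow a jump to a `case` label to bypass the initialization of a variable that is still in scope, so a declaration with an initializer in one case body can break the cases after it. Wrapping each case body in braces gives its locals their own scope. A minimal illustration with a hypothetical function:

```cpp
#include <cassert>

int scaleForKind(int kind, int value) {
  switch (kind) {
  case 0: {
    // The braces end the scope of `doubled` before `case 1`, so jumping to
    // `case 1` no longer skips its initialization.
    int doubled = value * 2;
    return doubled;
  }
  case 1: {
    int tripled = value * 3;
    return tripled;
  }
  default:
    return value;
  }
}
```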

Reviewed By: hanchung

Differential Revision: https://reviews.llvm.org/D133889

22 months ago[compiler-rt][builtins] Enable more warnings in add_security_warnings
Akira Hatanaka [Tue, 13 Sep 2022 15:47:43 +0000 (08:47 -0700)]
[compiler-rt][builtins] Enable more warnings in add_security_warnings

Enable -Wsizeof-array-div and -Wsizeof-pointer-div.

Also, replace -Wmemset-transposed-args with -Wsuspicious-memaccess. The
latter automatically enables the former and a few other warnings.
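For reference, the idioms the two new warnings inspect, shown with a hypothetical `kTable` array. Dividing an array's `sizeof` by its element size yields the element count; the commented-out lines show the mistakes that `-Wsizeof-array-div` (divisor is the size of a different type) and `-Wsizeof-pointer-div` (numerator is a pointer, not an array) are meant to catch.

```cpp
#include <cassert>
#include <cstddef>

static const short kTable[] = {1, 2, 3, 4};

// Correct: the divisor is the element size, so this is the element count.
constexpr std::size_t kTableCount = sizeof(kTable) / sizeof(kTable[0]);

// Would be flagged by -Wsizeof-array-div: the divisor is sizeof(int), not
// the element size, so the result is not the element count.
//   constexpr std::size_t kWrong = sizeof(kTable) / sizeof(int);

// Would be flagged by -Wsizeof-pointer-div: sizeof(p) is the size of the
// pointer, not of the array it points to.
//   const short *p = kTable;
//   std::size_t n = sizeof(p) / sizeof(*p);
```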

Differential Revision: https://reviews.llvm.org/D133783

22 months ago[CMake] Avoid `LLVM_BINARY_DIR` when other more specific variables are better-suited...
John Ericson [Wed, 14 Sep 2022 04:12:04 +0000 (00:12 -0400)]
[CMake] Avoid `LLVM_BINARY_DIR` when other more specific variables are better-suited, part 2

A simple sed doing these substitutions:

- `${LLVM_BINARY_DIR}/lib${LLVM_LIBDIR_SUFFIX}\>` -> `${LLVM_LIBRARY_DIR}`
- `${LLVM_BINARY_DIR}/bin\>` -> `${LLVM_TOOLS_BINARY_DIR}`

where `\>` means "word boundary".

The only manual modifications were reverting changes in

- `runtimes/CMakeLists.txt`

because these were "entry points" where we wanted to tread carefully not to introduce a "loop" which would end with an undefined variable being expanded to nothing.

There are some `${LLVM_BINARY_DIR}/lib` without the `${LLVM_LIBDIR_SUFFIX}`, but these refer to the lib subdirectory of the source (`llvm/lib`). `add_subdirectory` automatically appends that `lib` to form the local `CMAKE_CURRENT_BINARY_DIR` value; since the directory name in the source tree is fixed without any suffix, the corresponding `CMAKE_CURRENT_BINARY_DIR` is too. We therefore do not replace these occurrences but leave them as-is.

This picks up where D133828 left off, getting the occurrences with*out* `CMAKE_CFG_INTDIR`. But this is difficult to do correctly and so not done in the (retroactively) previous diff.

This hopefully increases readability overall, and also decreases the usages of `LLVM_LIBDIR_SUFFIX`, preparing us for D130586.

Reviewed By: sebastian-ne

Differential Revision: https://reviews.llvm.org/D132316

22 months ago[DAGCombiner] More load-store forwarding for big-endian
Roland Froese [Wed, 14 Sep 2022 19:35:37 +0000 (15:35 -0400)]
[DAGCombiner] More load-store forwarding for big-endian

Handle some load-store forwarding cases on big-endian where a larger store covers
a smaller load: the offset would be 0 and already handled on little-endian, but on
big-endian the offset is adjusted to be non-zero. The idea is simply to shift the
data to make it look like the offset 0 case.
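The byte arithmetic behind this can be sketched with a hypothetical stand-alone helper (not the DAGCombiner code): on little-endian the loaded bytes are reached by shifting the stored value right by the load offset, while on big-endian the shift is measured from the other end, so even a load at offset 0 that does not cover the whole store needs a non-zero shift.

```cpp
#include <cassert>
#include <cstdint>

// Value that a `loadSize`-byte load at byte offset `loadOffset` reads from a
// preceding `storeSize`-byte store of `storedVal` to the same base address.
uint64_t forwardedLoadValue(uint64_t storedVal, unsigned storeSize,
                            unsigned loadSize, unsigned loadOffset,
                            bool bigEndian) {
  assert(storeSize <= 8 && loadSize > 0 && loadOffset + loadSize <= storeSize);
  // Little-endian: byte k holds bits [8k, 8k + 8), so shift by the offset.
  // Big-endian: the first bytes in memory are the most significant ones, so
  // the shift counts the bytes that lie after the loaded range instead.
  unsigned shiftBytes =
      bigEndian ? storeSize - loadSize - loadOffset : loadOffset;
  uint64_t shifted = storedVal >> (8 * shiftBytes);
  uint64_t mask =
      loadSize == 8 ? ~uint64_t{0} : (uint64_t{1} << (8 * loadSize)) - 1;
  return shifted & mask;
}
```

For a 4-byte store of `0x11223344`, a 1-byte load at offset 0 yields `0x44` on little-endian but `0x11` on big-endian, which corresponds to a right shift by 24 bits.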

Differential Revision: https://reviews.llvm.org/D130115

22 months ago[llvm-objdump] Change printSymbolVersionDependency to use ELFFile API
Fangrui Song [Wed, 14 Sep 2022 19:30:34 +0000 (12:30 -0700)]
[llvm-objdump] Change printSymbolVersionDependency to use ELFFile API

When .gnu.version_r is empty (accepted by readelf but warned about by objdump),
llvm-objdump -p may decode the next section as .gnu.version_r and may crash due
to out-of-bounds C string reference. ELFFile<ELFT>::getVersionDependencies
handles 0-entry .gnu.version_r gracefully. Just use it.

Fix https://github.com/llvm/llvm-project/issues/57707

Differential Revision: https://reviews.llvm.org/D133751

22 months ago[llvm-objdump][test] Add verneed-invalid.test
Fangrui Song [Wed, 14 Sep 2022 19:27:30 +0000 (12:27 -0700)]
[llvm-objdump][test] Add verneed-invalid.test

22 months agolld: Include name of output file in "failed to write output" diag
Nico Weber [Thu, 1 Sep 2022 14:01:54 +0000 (10:01 -0400)]
lld: Include name of output file in "failed to write output" diag

Differential Revision: https://reviews.llvm.org/D133110

22 months ago[lldb][tests] Move C++ gmodules tests into new gmodules/ subdirectory
Michael Buch [Wed, 14 Sep 2022 16:31:25 +0000 (12:31 -0400)]
[lldb][tests] Move C++ gmodules tests into new gmodules/ subdirectory

This is in preparation for adding more gmodules
tests.

Differential Revision: https://reviews.llvm.org/D133876

22 months ago[libc][math] Improve exp2f performance.
Tue Ly [Wed, 14 Sep 2022 15:37:29 +0000 (11:37 -0400)]
[libc][math] Improve exp2f performance.

Reduce the number of subintervals that need lookup table and optimize
the evaluation steps.

Currently, `exp2f` is computed by reducing to `2^hi * 2^mid * 2^lo` where
`-16/32 <= mid <= 15/32` and `-1/64 <= lo <= 1/64`, and `2^lo` is then
approximated by a degree 6 polynomial.

Experiments with Sollya showed that by using a degree 6 polynomial, we
can approximate `2^lo` for a bigger range with reasonable errors:
```
> P = fpminimax((2^x - 1)/x, 5, [|D...|], [-1/64, 1/64]);
> dirtyinfnorm(2^x - 1 - x*P, [-1/64, 1/64]);
0x1.e18a1bc09114def49eb851655e2e5c4dd08075ac2p-63

> P = fpminimax((2^x - 1)/x, 5, [|D...|], [-1/32, 1/32]);
> dirtyinfnorm(2^x - 1 - x*P, [-1/32, 1/32]);
0x1.05627b6ed48ca417fe53e3495f7df4baf84a05e2ap-56
```
So we can optimize the implementation a bit with:
# Reduce the range to `mid = i/16` for `i = 0..15` and `-1/32 <= lo <= 1/32`
# Store the table `2^mid` in bits, and add `hi` directly to its exponent field to compute `2^hi * 2^mid`
# Rearrange the order of evaluating the polynomial approximating `2^lo`.
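A rough double-precision sketch of the reduction described in the list above, for illustration only. It is not the LLVM libc implementation: the `2^mid` table is computed with `std::exp2` on first use instead of being stored as bit patterns, and `2^lo` uses truncated Taylor coefficients of `e^(lo*ln 2)` rather than the Sollya-fitted polynomial. `exp2fSketch` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

float exp2fSketch(float xf) {
  if (std::isnan(xf))
    return xf;
  if (xf >= 128.0f)
    return std::numeric_limits<float>::infinity(); // 2^x overflows float
  if (xf < -150.0f)
    return 0.0f; // below half the smallest subnormal float
  // Table of 2^(i/16), i = 0..15.
  static const std::array<double, 16> kExp2Mid = [] {
    std::array<double, 16> t{};
    for (int i = 0; i < 16; ++i)
      t[i] = std::exp2(i / 16.0);
    return t;
  }();

  double x = xf;
  // k = round(16 * x) splits x = hi + mid + lo, mid = i/16, |lo| <= 1/32.
  int k = static_cast<int>(std::nearbyint(x * 16.0));
  int mid = ((k % 16) + 16) % 16; // i in 0..15, also for negative k
  int hi = (k - mid) / 16;        // exact division
  double lo = x - k / 16.0;

  // 2^hi * 2^mid: add hi directly to the exponent field of 2^mid.
  uint64_t bits;
  std::memcpy(&bits, &kExp2Mid[mid], sizeof(bits));
  bits += static_cast<uint64_t>(static_cast<int64_t>(hi)) << 52;
  double twoHiMid;
  std::memcpy(&twoHiMid, &bits, sizeof(bits));

  // 2^lo = e^t with t = lo * ln 2: Horner form of the degree-5 Taylor polynomial.
  double t = lo * 0.6931471805599453;
  double p = 1.0 + t * (1.0 + t * (0.5 + t * (1.0 / 6 + t * (1.0 / 24 + t / 120))));
  return static_cast<float>(twoHiMid * p);
}
```

The exponent-field addition stays valid here because the early returns keep `hi` in a range where `2^hi * 2^mid` is a normal double.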

Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp2f
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 9.534
System LIBC reciprocal throughput : 6.229

BEFORE:
LIBC reciprocal throughput        : 21.405
LIBC reciprocal throughput        : 15.241    (with `-msse4.2` flag)
LIBC reciprocal throughput        : 11.111    (with `-mfma` flag)

AFTER:
LIBC reciprocal throughput        : 18.617
LIBC reciprocal throughput        : 12.852    (with `-msse4.2` flag)
LIBC reciprocal throughput        : 9.253     (with `-mfma` flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp2f --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 40.869
System LIBC latency : 30.580

BEFORE
LIBC latency        : 64.888
LIBC latency        : 61.027    (with `-msse4.2` flag)
LIBC latency        : 48.778    (with `-mfma` flag)

AFTER
LIBC latency        : 48.803
LIBC latency        : 45.047    (with `-msse4.2` flag)
LIBC latency        : 37.487    (with `-mfma` flag)
```

Reviewed By: sivachandra, orex

Differential Revision: https://reviews.llvm.org/D133870

22 months ago[CMake] Enable LLVM_ENABLE_PER_TARGET_RUNTIME_DIR by default on *BSD
Fangrui Song [Wed, 14 Sep 2022 18:24:00 +0000 (11:24 -0700)]
[CMake] Enable LLVM_ENABLE_PER_TARGET_RUNTIME_DIR by default on *BSD

Similar to D107799 but for *BSD (DragonFlyBSD, FreeBSD, NetBSD, OpenBSD, etc).
This Linux default has been in main and release/15.x for a while.

`CMAKE_SYSTEM_PROCESSOR MATCHES "^arm"` is excluded for now.

Link: https://discourse.llvm.org/t/rfc-time-to-drop-legacy-runtime-paths/64628
Reviewed By: dim

Differential Revision: https://reviews.llvm.org/D110126

22 months ago[mlir][linalg] Vectorization for conv_1d_ncw_fcw
Stanley Winata [Wed, 14 Sep 2022 18:07:46 +0000 (11:07 -0700)]
[mlir][linalg] Vectorization for conv_1d_ncw_fcw

Most computer vision torch models use nchw/ncw convolution. In a previous patch we added a decomposition of conv2dNchw to conv1dNcw. To enhance performance on torch models, we add this vectorization pattern for conv1dNcw, which consequently also improves performance on conv2dNchw.
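For readers unfamiliar with the layout names: `ncw` is (batch, channel, width) for the input and `fcw` is (filter, channel, kernel width) for the filter. Assuming unit strides and dilations, the reference loop nest below spells out what is being vectorized. `conv1dNcwFcw` is a plain C++ reference written for illustration, not the MLIR implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// out[n][f][ow] = sum over c, kw of in[n][c][ow + kw] * filter[f][c][kw]
// Buffers are flattened row-major: in is N x C x W, filter is F x C x KW,
// and the result is N x F x OW with OW = W - KW + 1 (stride 1, dilation 1).
std::vector<float> conv1dNcwFcw(const std::vector<float> &in,
                                const std::vector<float> &filter, std::size_t N,
                                std::size_t C, std::size_t W, std::size_t F,
                                std::size_t KW) {
  assert(KW >= 1 && KW <= W);
  assert(in.size() == N * C * W && filter.size() == F * C * KW);
  std::size_t OW = W - KW + 1;
  std::vector<float> out(N * F * OW, 0.0f);
  for (std::size_t n = 0; n < N; ++n)
    for (std::size_t f = 0; f < F; ++f)
      for (std::size_t ow = 0; ow < OW; ++ow)
        for (std::size_t c = 0; c < C; ++c)
          for (std::size_t kw = 0; kw < KW; ++kw)
            out[(n * F + f) * OW + ow] +=
                in[(n * C + c) * W + ow + kw] * filter[(f * C + c) * KW + kw];
  return out;
}
```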

On IREE + Intel Xeon 8360 + Resnet50, we got a ~7x speedup (~880ms to 126ms).

Reviewed By: nicolasvasilache, hanchung

Differential Revision: https://reviews.llvm.org/D133675

22 months ago[AMDGPU] Check for num elts in SelectVOP3PMods
Piotr Sobczak [Wed, 14 Sep 2022 11:19:16 +0000 (13:19 +0200)]
[AMDGPU] Check for num elts in SelectVOP3PMods

The rest of the code section assumes there are exactly two elements
in the vector (Lo, Hi), so add the check before entering the section.

Differential Revision: https://reviews.llvm.org/D133852

22 months ago[Clang]: Diagnose deprecated copy operations also in MSVC compatibility mode
Julius [Wed, 14 Sep 2022 17:34:16 +0000 (19:34 +0200)]
[Clang]: Diagnose deprecated copy operations also in MSVC compatibility mode

When running in MSVC compatibility mode, previously no deprecated copy
operation warnings (enabled by -Wdeprecated-copy) were raised. This
restriction was already in place when the deprecated copy warning was
first introduced.

This patch removes said restriction so that deprecated copy warnings, if
enabled, are also raised in MSVC compatibility mode. The reasoning here
being that these warnings are still useful when running in MSVC
compatibility mode and also have to be semi-explicitly enabled in the
first place (using -Wdeprecated-copy, -Wdeprecated or -Wextra).
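As an illustration of the kind of code -Wdeprecated-copy is about (hypothetical `Handle` type): since C++11, the implicit declaration of a copy assignment operator is deprecated when the class has a user-declared copy constructor, and the warning fires where that implicit operator is used. Declaring the operator explicitly, as below, is one way to avoid it.

```cpp
#include <cassert>

// Without the explicitly defaulted operator= below, `b = a` would use an
// implicitly declared copy assignment operator. Because Handle has a
// user-declared copy constructor, relying on that implicit declaration is
// deprecated, which is what -Wdeprecated-copy diagnoses.
struct Handle {
  int id = 0;
  Handle() = default;
  Handle(const Handle &) = default;            // user-declared copy constructor
  Handle &operator=(const Handle &) = default; // explicit: no deprecation
};

int copyId(int id) {
  Handle a;
  a.id = id;
  Handle b;
  b = a; // uses the explicitly defaulted copy assignment
  return b.id;
}
```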

Differential Revision: https://reviews.llvm.org/D133354

22 months ago[ConstraintElimination] Track if variables are positive in constraint.
Florian Hahn [Wed, 14 Sep 2022 17:43:53 +0000 (18:43 +0100)]
[ConstraintElimination] Track if variables are positive in constraint.

Keep track of whether variables are known positive during constraint
decomposition, aggregate the information when building the constraint
object, and encode the extra information as constraints to be used during
reasoning.

22 months ago[mlir][vector] Clean up and generalize lowering of warp_execute to scf
Thomas Raoux [Wed, 14 Sep 2022 02:10:38 +0000 (02:10 +0000)]
[mlir][vector] Clean up and generalize lowering of warp_execute to scf

Simplify the lowering of warp_execute_on_lane0 of scf.if by making the
logic more generic. Also remove the assumption that the innermost
dimension is the distributed dimension.

Differential Revision: https://reviews.llvm.org/D133826

22 months agollvm-reduce: Do not insert replacement IMPLICIT_DEFs for dead defs
Matt Arsenault [Wed, 14 Sep 2022 13:10:25 +0000 (09:10 -0400)]
llvm-reduce: Do not insert replacement IMPLICIT_DEFs for dead defs

Also skip dead defs when looking for a previous vreg with the same
class. This helps avoid some mid-reduction verifier errors when
LiveIntervals computation starts introducing dead flags everywhere.