review.tizen.org Git - platform/upstream/llvm.git/log


commit | commitdiff | tree

Fangrui Song [Fri, 16 Sep 2022 02:58:42 +0000 (19:58 -0700)]

[HIP][test] Avoid %T

%T is a deprecated lit feature. It refers to the parent directory.
When two tests in test/Driver refer to the same `%T/foo`, they are racy with each other.
%t includes the test name and is safe for use.
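
The difference can be sketched in Python as follows. This is a rough approximation of how lit derives the two substitutions, not lit's actual code; the helper name `lit_tmp_substitutions` and the test paths are hypothetical.

```python
import os

def lit_tmp_substitutions(test_path):
    """Approximate the paths lit substitutes for %T and %t (assumed layout)."""
    exec_dir, exec_base = os.path.split(test_path)
    tmp_dir = os.path.join(exec_dir, "Output")
    return {
        "%T": tmp_dir,                                   # shared by every test in the directory
        "%t": os.path.join(tmp_dir, exec_base) + ".tmp", # includes the test name
    }

a = lit_tmp_substitutions("test/Driver/hip-a.hip")
b = lit_tmp_substitutions("test/Driver/hip-b.hip")
# Both tests would write to the same %T/foo, but to distinct %t paths.
same_dir = a["%T"] == b["%T"]
distinct_files = a["%t"] != b["%t"]
```

Under these assumptions `same_dir` and `distinct_files` are both true, which is why two tests sharing `%T/foo` can race while `%t` cannot.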

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D133998

commit | commitdiff | tree

Jez Ng [Fri, 16 Sep 2022 02:55:41 +0000 (22:55 -0400)]

[lld-macho][reland] Add support for N_INDR symbols

This is similar to the `-alias` CLI option, but it gives finer-grained
control in that it allows the aliased symbols to be treated as private
externs.

While working on this, I realized that our `-alias` handling did not
cover the cases where the aliased symbol is a common or dylib symbol,
nor the case where we have an undefined that gets treated specially and
converted to a defined later on. My N_INDR handling neglects this too
for now; I've added checks and TODO messages for these.

`N_INDR` symbols cropped up as part of our attempt to link swift-stdlib.

Reviewed By: #lld-macho, thakis, thevinster

Differential Revision: https://reviews.llvm.org/D133825

commit | commitdiff | tree

Lang Hames [Thu, 15 Sep 2022 03:11:22 +0000 (20:11 -0700)]

[ORC-RT] Invert the layout of the trivial-jit-re-dlopen testcase.

Compiles and moves the original C code for main to Inputs/dlopen-dlclose-x2.S,
where it can be shared with other testcases that want a
dlopen-dlclose-dlopen-dlclose sequence. The assembly containing the
initializers to be tested is moved into the test file.

commit | commitdiff | tree

Lang Hames [Fri, 16 Sep 2022 02:06:16 +0000 (19:06 -0700)]

[ORC-RT] Make ExecutorAddrDiff an alias for uint64_t.

Unlike ExecutorAddr, there's limited value to having a distinct type for
ExecutorAddrDiff, and it's occasionally awkward to work with. The corresponding
LLVM type (llvm::orc::ExecutorAddrDiff) was already made a type-alias in
9e2cfb061a882.

commit | commitdiff | tree

Gulfem Savrun Yeniceri [Fri, 26 Aug 2022 16:38:44 +0000 (16:38 +0000)]

[InstrProfiling] No runtime hook for unused funcs

This is a reland of https://reviews.llvm.org/D122336.
Original patch caused a problem in collecting coverage in
Fuchsia because it was returning early without putting unused
function names into __llvm_prf_names section. This patch
fixes that issue.

The original commit message follows:
CoverageMappingModuleGen generates a coverage mapping record
even for unused functions with internal linkage, e.g.
static int foo() { return 100; }
Clang frontend eliminates such functions, but InstrProfiling pass
still emits runtime hook since there is a coverage record.
Fuchsia uses runtime counter relocation, and pulling in profile
runtime for unused functions causes a linker error:
undefined hidden symbol: __llvm_profile_counter_bias.
Since https://reviews.llvm.org/D98061, we do not hook the profile
runtime on Fuchsia for binaries in which no translation unit
has been instrumented. This patch extends that to
instrumented binaries that consist only of unused functions.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D122336

commit | commitdiff | tree

Brad Smith [Fri, 16 Sep 2022 01:43:01 +0000 (21:43 -0400)]

[lit] Set shlibpath_var on OpenBSD

commit | commitdiff | tree

Yuta Mukai [Thu, 15 Sep 2022 16:52:18 +0000 (01:52 +0900)]

[MachinePipeliner] Fix the interpretation of the scheduling model

The method of counting resource consumption is modified to be based on
"Cycles" value when DFA is not used.

The calculation of ResMII is modified to total "Cycles" and divide it
by the number of units for each resource. Previously, ResMII was
excessive because it was assumed that resources were consumed for
the cycles of "Latency" value.

The method of resource reservation is modified similarly. When a
value of "Cycles" is larger than 1, the resource is considered to be
consumed by 1 for cycles of its length from the scheduled cycle.
To realize this, ResourceManager maintains a resource table for all
slots. Previously, resource consumption was always 1 for 1 cycle
regardless of the value of "Cycles" or "Latency".

In addition, the number of micro operations per cycle is modified to
be constrained by "IssueWidth". To disable the constraint,
--pipeliner-force-issue-width=100 can be used.

For the case of using DFA, the scheduling results are unchanged.
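
A minimal sketch of the ResMII calculation described above. It assumes the per-resource quotient is rounded up and the maximum over all resources is taken; the commit text only says "total and divide". The input format and resource names are hypothetical, and the IssueWidth constraint is left out.

```python
def res_mii(instr_cycles, num_units):
    """instr_cycles: one dict per instruction, mapping resource -> "Cycles".
    num_units: dict mapping resource -> number of available units."""
    totals = {}
    for usage in instr_cycles:
        for res, cycles in usage.items():
            totals[res] = totals.get(res, 0) + cycles
    # Ceiling division per resource, then the most constrained resource wins.
    return max((-(-total // num_units[res]) for res, total in totals.items()),
               default=0)

loop_body = [{"ALU": 1, "LSU": 2}, {"ALU": 1}, {"LSU": 2}]
units = {"ALU": 2, "LSU": 1}
mii = res_mii(loop_body, units)  # ALU: ceil(2/2) = 1, LSU: ceil(4/1) = 4
```

Here the single load/store unit is the bottleneck, so the sketch yields 4.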

Reviewed By: dpenry

Differential Revision: https://reviews.llvm.org/D133572

commit | commitdiff | tree

Colin Cross [Thu, 15 Sep 2022 23:58:57 +0000 (23:58 +0000)]

Set HOME for tests that use module cache path

Getting the default module cache path calls llvm::sys::path::cache_directory,
which calls home_directory, which checks the HOME environment variable
before falling back to getpwuid. When compiling against musl libc,
which does not support NSS, and running on a machine that doesn't have
the current user in /etc/passwd due to NSS, no home directory can
be found. Set the HOME environment variable in the tests to avoid
depending on getpwuid.
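
The lookup order described above can be sketched in Python. This is an illustration of the fallback chain, not LLVM's implementation; the function name is hypothetical, and treating an empty HOME as unset is a choice made for this sketch.

```python
import os

def home_directory(environ=None):
    """Return HOME if set, else fall back to the passwd database."""
    env = os.environ if environ is None else environ
    home = env.get("HOME")
    if home:
        return home
    try:
        import pwd  # POSIX only
        return pwd.getpwuid(os.getuid()).pw_dir
    except (ImportError, KeyError):
        # KeyError: the current uid has no passwd entry (the failure in the commit).
        return None

explicit = home_directory({"HOME": "/tmp/test-home"})
```

Setting HOME in the test environment makes the first branch succeed, so the `getpwuid` path is never reached.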

Reviewed By: pirama, srhines

Differential Revision: https://reviews.llvm.org/D132984

commit | commitdiff | tree

Navid Emamdoost [Thu, 15 Sep 2022 22:33:43 +0000 (15:33 -0700)]

Add -fsanitize-coverage=control-flow

Reviewed By: kcc, vitalybuka, MaskRay

Differential Revision: https://reviews.llvm.org/D133157

commit | commitdiff | tree

Jeffrey Byrnes [Thu, 15 Sep 2022 22:37:58 +0000 (15:37 -0700)]

[NFC] Fix tests in commit 20cf170e68def

commit | commitdiff | tree

Colin Cross [Thu, 15 Sep 2022 21:58:24 +0000 (21:58 +0000)]

Fix std::fpos pretty printer on musl

The mbstate_t field in std::fpos is an opaque type provided by libc,
and musl's implementation does not match the one used by glibc.
Change StdFposPrinter to verify its assumptions about the layout
of mbstate_t, and leave out the state printing if it doesn't match.

Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D132983

commit | commitdiff | tree

Aart Bik [Thu, 15 Sep 2022 20:38:14 +0000 (13:38 -0700)]

[mlir][sparse][python] improve sparse encoding test

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D133971

commit | commitdiff | tree

David Green [Thu, 15 Sep 2022 20:52:55 +0000 (21:52 +0100)]

[AArch64] Add some vector lowering tests and regenerate a couple of files. NFC

commit | commitdiff | tree

Roy Sundahl [Thu, 15 Sep 2022 19:09:35 +0000 (12:09 -0700)]

[test][fuzzer] XFAIL tvOS tests pending investigation. (rdar://99981102)

These four tests are failing on tvOS devices (not simulators) so XFAIL
them for now for CI and investigate further.

rdar://99981102

Differential Revision: https://reviews.llvm.org/D133963

commit | commitdiff | tree

Amy Huang [Thu, 15 Sep 2022 20:23:25 +0000 (20:23 +0000)]

Fix error in clang /MT equivalent flag patch.

This is a followup to reviews.llvm.org/D133457.

commit | commitdiff | tree

Philip Reames [Thu, 15 Sep 2022 19:50:00 +0000 (12:50 -0700)]

[RISCV] Verify merge operand is tied properly

Differential Revision: https://reviews.llvm.org/D133957

commit | commitdiff | tree

Philip Reames [Thu, 15 Sep 2022 19:47:58 +0000 (12:47 -0700)]

[RISCV] Verify VL operand on instructions if present

These should only be immediate values or GPR registers.

Differential Revision: https://reviews.llvm.org/D133953

commit | commitdiff | tree

Alexander Timofeev [Fri, 9 Sep 2022 17:32:51 +0000 (19:32 +0200)]

[AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode

This patch contains the changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler. It adds an edge to the SDNodeScheduler dependency graph instead of inserting an SCC copy between each definition and use. This approach lets the scheduler place instructions optimally, inserting the copy only when the dependency cannot be resolved.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D133593

commit | commitdiff | tree

Vitaly Buka [Thu, 15 Sep 2022 19:32:43 +0000 (12:32 -0700)]

[test] Regenerate few tests

commit | commitdiff | tree

Erich Keane [Thu, 15 Sep 2022 19:07:23 +0000 (12:07 -0700)]

Stop trying to fixup 'overloadable' prototypeless functions.

While investigating something else, I discovered that a prototypeless
function with 'overloadable' was having the attribute left on the
declaration, which caused 'ambiguous' call errors later on. This led to
some confusion. This patch removes the 'overloadable' attribute from
the declaration and leaves it as prototypeless, instead of trying to
make it variadic.

commit | commitdiff | tree

Joseph Huber [Thu, 15 Sep 2022 18:58:21 +0000 (13:58 -0500)]

[Libomptarget] Embed bitcode library in static library instead.

This patch changes the CMake to instead embed the already generated
LLVM-IR bitcode library into an object file to create the static
library. This is different from the previous method which generated them
separately. This will make the build faster and allow us to perform the
same internalization into a single library we do with the bitcode
library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D133952

commit | commitdiff | tree

Hanhan Wang [Thu, 15 Sep 2022 18:44:52 +0000 (11:44 -0700)]

[mlir][linalg] Propagate attributes when doing named ops conversion.

Custom attributes can be set on the operation. This patch prevents them
from being removed when doing named ops conversion.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D133892

commit | commitdiff | tree

Florian Hahn [Thu, 15 Sep 2022 18:35:25 +0000 (19:35 +0100)]

[CGP] Update failing test missed in 81a11da762577.

commit | commitdiff | tree

Groverkss [Thu, 15 Sep 2022 17:45:58 +0000 (18:45 +0100)]

[MLIR][Presburger] Improve unittest parsing

This patch adds better functions for parsing MultiAffineFunctions and
PWMAFunctions in Presburger unittests.

A PWMAFunction can now be parsed as:

```
PWMAFunction result = parsePWMAF({
    {"(x, y) : (x >= 10, x <= 20, y >= 1)", "(x, y) -> (x + y)"},
    {"(x, y) : (x >= 21)", "(x, y) -> (x + y)"},
    {"(x, y) : (x <= 9)", "(x, y) -> (x - y)"},
    {"(x, y) : (x >= 10, x <= 20, y <= 0)", "(x, y) -> (x - y)"},
});
```

which is much more readable than the old format since the output can be
described as an AffineMap, instead of coefficients.

This patch also adds support for parsing divisions in MultiAffineFunctions
and PWMAFunctions which was previously not possible.
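
To illustrate what the piecewise function in the example above computes, here is a plain Python sketch that evaluates the same four pieces. It does not use the Presburger library; the `PIECES` table and `evaluate` helper are hypothetical.

```python
# Each entry pairs a domain predicate with the affine output on that domain.
PIECES = [
    (lambda x, y: 10 <= x <= 20 and y >= 1, lambda x, y: x + y),
    (lambda x, y: x >= 21,                  lambda x, y: x + y),
    (lambda x, y: x <= 9,                   lambda x, y: x - y),
    (lambda x, y: 10 <= x <= 20 and y <= 0, lambda x, y: x - y),
]

def evaluate(x, y):
    """Return the value of the first piece whose domain contains (x, y)."""
    for in_domain, output in PIECES:
        if in_domain(x, y):
            return output(x, y)
    return None  # (x, y) lies outside every piece

samples = [evaluate(15, 2), evaluate(30, 1), evaluate(5, 3), evaluate(15, 0)]
```

The four domains are disjoint and cover every integer point, so `evaluate` never returns `None` for integer inputs.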

Reviewed By: arjunp

Differential Revision: https://reviews.llvm.org/D133654

commit | commitdiff | tree

Philip Reames [Thu, 15 Sep 2022 18:15:35 +0000 (11:15 -0700)]

[RISCV] Add test coverage for mixed fixed and scalable uses of splats

commit | commitdiff | tree

Florian Hahn [Thu, 15 Sep 2022 18:18:12 +0000 (19:18 +0100)]

[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl.

This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops
using a wide shuffle that creates a v64i8 vector, selecting groups of 3
zero elements and an element from the input.

This is profitable on AArch64 where such shuffles can be lowered to tbl
instructions, but only in loops, because it requires materializing 4
masks, which can be done in the loop preheader.

This is the only reason the transform is part of CGP. If there's a
better alternative I missed, please let me know. The same goes for the
shouldReplaceZExtWithShuffle hook which guards this. I am not sure if
this transform will be beneficial on other targets, but it seems like
there is no other convenient way.

This improves the generated code for loops like the one below in
combination with D96522.

    int foo(uint8_t *p, int N) {
      unsigned long long sum = 0;
      for (int i = 0; i < N ; i++, p++) {
        unsigned int v = *p;
        sum += (v < 127) ? v : 256 - v;
      }
      return sum;
    }

https://clang.godbolt.org/z/Wco866MjY
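
The shuffle described above can be modelled in Python. This is a little-endian sketch only, with hypothetical helper names; it shows that selecting one input byte followed by three zero bytes yields the zero-extended 32-bit lanes.

```python
def zext_shuffle_mask(num_elts=16, factor=4):
    """Mask for shuffling <input, zeroinitializer>: index num_elts is a zero lane."""
    zero_idx = num_elts
    mask = []
    for i in range(num_elts):
        mask.extend([i] + [zero_idx] * (factor - 1))  # input byte, then 3 zeros
    return mask

def shuffle(a, b, mask):
    """Model of shufflevector: indices address the concatenation of a and b."""
    both = list(a) + list(b)
    return [both[i] for i in mask]

src = list(range(240, 256))                       # a v16i8 value
wide = shuffle(src, [0] * 16, zext_shuffle_mask())  # a v64i8 value
# Reinterpret each group of 4 bytes as a little-endian i32 lane.
lanes = [int.from_bytes(bytes(wide[i:i + 4]), "little") for i in range(0, 64, 4)]
```

In this model `lanes` equals `src`, that is, each byte has been zero-extended to 32 bits.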

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D120571

commit | commitdiff | tree

Haojian Wu [Thu, 15 Sep 2022 18:01:20 +0000 (20:01 +0200)]

Revert "Fix bazel build after 84d07d021333f7b5716f0444d5c09105557272e0."

This reverts commit 10250c5a2a2ca6be683ff940d776648a2d5968e3 as the
related patch is being reverted.

commit | commitdiff | tree

Sergei Barannikov [Thu, 15 Sep 2022 17:42:47 +0000 (13:42 -0400)]

[SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s

All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduces the amount of boilerplate code a bit. It
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.

commit | commitdiff | tree

Sanjay Patel [Thu, 15 Sep 2022 17:42:30 +0000 (13:42 -0400)]

[SCCP] convert ashr to lshr for non-negative shift value

This is similar to the existing signed instruction folds.
We get the obvious minimal patterns in other passes, but
this avoids potential missed folds when the multi-block
tests are converted to selects.
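
A brute-force check of the underlying fact on 8-bit values, as a sketch: when the shifted value is non-negative, an arithmetic shift right and a logical shift right agree. The helper names are hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1

def to_signed(v):
    """Interpret an 8-bit pattern as a two's-complement integer."""
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v

def ashr(v, s):
    return (to_signed(v) >> s) & MASK  # Python's >> on negatives is arithmetic

def lshr(v, s):
    return (v & MASK) >> s

# Every non-negative value (sign bit clear) gives the same result either way.
agree_when_non_negative = all(
    ashr(v, s) == lshr(v, s) for v in range(1 << (BITS - 1)) for s in range(BITS)
)
# A negative value shows why the range information is required.
negative_case = (ashr(0x80, 1), lshr(0x80, 1))
```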

commit | commitdiff | tree

Sanjay Patel [Thu, 15 Sep 2022 17:10:24 +0000 (13:10 -0400)]

[SCCP] add tests for ashr range transforms; NFC

commit | commitdiff | tree

Amy Huang [Wed, 31 Aug 2022 22:09:45 +0000 (22:09 +0000)]

Add Clang driver flags equivalent to cl's /MD, /MT, /MDd, /MTd.

This will allow selecting the MS C runtime library without having to use
cc1 flags.

Differential Revision: https://reviews.llvm.org/D133457

commit | commitdiff | tree

Groverkss [Thu, 15 Sep 2022 17:30:57 +0000 (18:30 +0100)]

Revert "[MLIR][Presburger] Improve unittest parsing"

This reverts commit 84d07d021333f7b5716f0444d5c09105557272e0.

Reverted to fix a compilation issue on gcc8.

commit | commitdiff | tree

Groverkss [Thu, 15 Sep 2022 17:29:22 +0000 (18:29 +0100)]

Revert "[mlir] Remove the unused source file."

This reverts commit e488ce29ec5ead2d518c183890215313c9d1b1f0.

Reverted to fix a compilation issue on gcc8.

commit | commitdiff | tree

Aart Bik [Thu, 15 Sep 2022 03:18:51 +0000 (20:18 -0700)]

[mlir][sparse] partially implement codegen for sparse_tensor.compress

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D133912

commit | commitdiff | tree

Siva Chandra Reddy [Thu, 15 Sep 2022 07:52:17 +0000 (07:52 +0000)]

[libc] Add the implementation of the "remove" function.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D133922

commit | commitdiff | tree

Craig Topper [Thu, 15 Sep 2022 16:38:02 +0000 (09:38 -0700)]

[IntegerDivision][AMDGPU] Use CreateLogicalOr to block poison propagation.

There are two ctlz intrinsics here with the zero_is_poison flag
set. There are also two comparisons that check if either of the
inputs to the ctlzs is zero. We need to use a logical or to block
the poison from the ctlz if either of the inputs is zero.
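
The difference between the two forms of "or" can be sketched with a toy poison model in Python. The `POISON` sentinel, the helper names, and the threshold 15 are hypothetical; the logical or is modelled as `select a, true, b`.

```python
POISON = object()  # stand-in for an LLVM poison value

def ctlz(x, bits=32):
    """Count leading zeros with zero_is_poison set: ctlz(0) is poison."""
    return POISON if x == 0 else bits - x.bit_length()

def icmp_ugt(a, b):
    """Comparison that propagates poison from its first operand."""
    return POISON if a is POISON else a > b

def bitwise_or(a, b):
    """Models `or i1`: poison in either operand poisons the result."""
    if a is POISON or b is POISON:
        return POISON
    return a or b

def logical_or(a, b):
    """Models `select i1 a, i1 true, i1 b`: b is ignored when a is true."""
    if a is POISON:
        return POISON
    return True if a else b

divisor = 0
is_zero = divisor == 0                   # well-defined comparison
ctlz_cond = icmp_ugt(ctlz(divisor), 15)  # poison, because divisor is zero
```

With these definitions `bitwise_or(is_zero, ctlz_cond)` is poison, while `logical_or(is_zero, ctlz_cond)` is `True`.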

Reviewed By: arsenm, aqjune

Differential Revision: https://reviews.llvm.org/D130680

commit | commitdiff | tree

Sanjay Patel [Thu, 15 Sep 2022 16:01:11 +0000 (12:01 -0400)]

[InstCombine] fold X*X == 0 --> X == 0

This is safe when the mul does not overflow:
https://alive2.llvm.org/ce/z/LedVVP

This could be extended to handle non-zero compare constants
and non-squared multiplies.
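
A brute-force sketch on 8-bit values of why the no-overflow condition matters. It models only the unsigned (nuw) case, and the names are hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1

def mul_overflows(x):
    """True if x * x does not fit in BITS unsigned bits."""
    return x * x > MASK

# With wrapping arithmetic the fold is wrong: these non-zero x have x*x == 0.
wrapping_counterexamples = [
    x for x in range(1, 1 << BITS) if (x * x) & MASK == 0
]

# Restricted to non-overflowing multiplies, X*X == 0 is equivalent to X == 0.
fold_valid_without_overflow = all(
    ((x * x) & MASK == 0) == (x == 0)
    for x in range(1 << BITS)
    if not mul_overflows(x)
)
```

In 8 bits the counterexamples are exactly the non-zero multiples of 16, all of which overflow.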

commit | commitdiff | tree

Sanjay Patel [Thu, 15 Sep 2022 15:56:45 +0000 (11:56 -0400)]

[InstCombine] add tests for X*X == 0; NFC

commit | commitdiff | tree

Simon Pilgrim [Thu, 15 Sep 2022 15:20:56 +0000 (16:20 +0100)]

[CostModel][X86] Add CostKinds handling for vector shift by generic/non-uniform shift amounts

These are the worst case generic vector shift costs, where nothing is known about the shift amounts - in particular this should stop us using the default sizelatency cost of 1 for so many pre-AVX2 vector shifts that can often actually expand during lowering to +20 uops, just for 128-bit vectors, resulting in some horrible inline/unroll decisions.

This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)

commit | commitdiff | tree

Jay Foad [Thu, 15 Sep 2022 09:40:54 +0000 (10:40 +0100)]

[AMDGPU] Add GFX11 ds_bvh_stack_rtn_b32 instruction

Differential Revision: https://reviews.llvm.org/D133928

commit | commitdiff | tree

Joe Loser [Fri, 26 Aug 2022 03:31:22 +0000 (21:31 -0600)]

[libc++] Clean up `_LIBCPP_HAS_NO_PLATFORM_WAIT` macro

As the comment suggests, `_LIBCPP_HAS_NO_PLATFORM_WAIT` is not documented or
defined anywhere internally in the build system. It's a direct define in terms
of `_LIBCPP_HAS_NO_THREADS`. So, remove `_LIBCPP_HAS_NO_PLATFORM_WAIT` and use
`_LIBCPP_HAS_NO_THREADS` instead to control the desired behavior.

Differential Revision: https://reviews.llvm.org/D132715

commit | commitdiff | tree

Matt Arsenault [Sat, 23 Jul 2022 16:32:05 +0000 (12:32 -0400)]

AMDGPU: Use GlobalPriority for largest register tuples

Only do this for 16 and 32 register tuples, although we might want to
extend to 8 tuples.

It's incredibly expensive to spill these, and doing so majorly
interferes with the ability to allocate anything else in the function.

The lit tests show mostly sizeable improvements with a handful of tiny
regressions with large vectors.

commit | commitdiff | tree

Jakub Kuderski [Thu, 15 Sep 2022 15:34:43 +0000 (11:34 -0400)]

[mlir][arith] Support wide int cast emulation

Add support for `arith.extsi`, `arith.extui`, and `arith.trunci` ops.

Tested by checking the results for all 16-bit inputs when emulating i16 with i8.
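
A Python sketch of the three casts when an i16 is represented as a pair of i8 halves (low, high). The pair representation and the function names are assumptions made for illustration; they are not the pass's actual data layout.

```python
def extui_i8_to_i16(x):
    """Zero-extend: the high half is always 0."""
    return (x & 0xFF, 0x00)

def extsi_i8_to_i16(x):
    """Sign-extend: the high half is filled with copies of the sign bit."""
    lo = x & 0xFF
    hi = 0xFF if lo & 0x80 else 0x00
    return (lo, hi)

def trunci_i16_to_i8(pair):
    """Truncate: keep only the low half."""
    lo, _hi = pair
    return lo

def as_i16(pair):
    """Recombine the halves into a single 16-bit pattern for checking."""
    lo, hi = pair
    return lo | (hi << 8)

example = as_i16(extsi_i8_to_i16(0x80))  # -128 as i8 becomes 0xFF80 as i16
```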

Reviewed By: antiagainst, Mogball

Differential Revision: https://reviews.llvm.org/D133612

commit | commitdiff | tree

Dmitry Preobrazhensky [Thu, 15 Sep 2022 15:15:50 +0000 (18:15 +0300)]

[AMDGPU][MC][GFX11] Add disassembler tests for v_readfirstlane_b32

Differential Revision: https://reviews.llvm.org/D133437

commit | commitdiff | tree

Nico Weber [Thu, 15 Sep 2022 15:12:32 +0000 (11:12 -0400)]

Revert "[lld-macho] Add support for N_INDR symbols"

This reverts commit 5b8da10b87f7009c06215449e4a9c61dab91697a.
Breaks tests, see https://reviews.llvm.org/D133825

commit | commitdiff | tree

Sander de Smalen [Wed, 14 Sep 2022 15:53:13 +0000 (15:53 +0000)]

[AArch64][SME] Fix lowering of llvm.aarch64.get.pstatesm()

A thread may not have access to SME or TPIDR2_EL0, so in order to
safely query PSTATE.SM in a streaming-compatible function, the
code should call `__arm_sme_state()`, as described in the ABI:

https://github.com/ARM-software/abi-aa/pull/123/commits/c2bb09c4d4ee60a5787baf1ccc7e92e67e4240b7

This means that the value of pstate.sm is:
* 0 if the function is non-streaming.
* 1 if the function has `arm_streaming` or `arm_locally_streaming`.
* evaluated at runtime by a call to __arm_sme_state() otherwise.

This patch also adds a calling convention for calls to SME support routines.

At some point we can remove the need for the llvm.aarch64.get.pstatesm() intrinsic
and use function calls (with the corresponding cc) directly instead.
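
The three cases above amount to a small decision table, sketched here in Python. The attribute name `arm_streaming_compatible` and the returned strings are hypothetical; only `arm_streaming`, `arm_locally_streaming` and `__arm_sme_state` come from the text.

```python
def lower_get_pstatesm(attrs):
    """Return what llvm.aarch64.get.pstatesm() would lower to (illustrative)."""
    if "arm_streaming" in attrs or "arm_locally_streaming" in attrs:
        return "1"
    if "arm_streaming_compatible" in attrs:  # hypothetical attribute name
        return "call __arm_sme_state"        # value only known at runtime
    return "0"                               # non-streaming function

results = [
    lower_get_pstatesm(set()),
    lower_get_pstatesm({"arm_locally_streaming"}),
    lower_get_pstatesm({"arm_streaming_compatible"}),
]
```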

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131571

commit | commitdiff | tree

Dmitry Preobrazhensky [Thu, 15 Sep 2022 15:03:26 +0000 (18:03 +0300)]

[AMDGPU][MC][GFX11][NFC] Update disassembler tests for MIMG instructions

Differential Revision: https://reviews.llvm.org/D133411

commit | commitdiff | tree

Simon Pilgrim [Thu, 15 Sep 2022 14:55:00 +0000 (15:55 +0100)]

[CostModel][X86] Remove redundant SSSE3 checks from div/rem costs

These all match the default SSE2 costs so use those instead

commit | commitdiff | tree

Matt Arsenault [Sat, 23 Jul 2022 14:13:25 +0000 (10:13 -0400)]

RegAllocGreedy: Avoid overflowing priority bitfields

The class priority is expected to be at most 5 bits before it starts
clobbering bits used for other fields. Also clamp the instruction
distance in case we have millions of instructions.

AMDGPU was accidentally overflowing into the global priority bit in
some cases. I think in principle we would have wanted this, but in the
cases I've looked at, it had the counterintuitive effect and
de-prioritized the large register tuple.

Avoid using the weird bit hack that PPC uses for global priority. The
AllocationPriority field is really 5 bits, and PPC was relying on
overflowing this to 6-bits to forcibly set the global priority
bit. Split this out as a separate flag to avoid having magic behavior
for values above 31.
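
A sketch of the idea in Python: clamp each field before packing so that one field cannot spill into its neighbour, and carry global priority as a separate flag. The 24-bit distance field and the bit positions are assumptions for illustration; only the 5-bit class priority is stated above.

```python
DIST_BITS = 24        # assumed width of the instruction-distance field
CLASS_PRIO_BITS = 5   # width stated in the commit message

def pack_priority(distance, class_priority, global_priority):
    """Pack the fields into one integer without letting them overlap."""
    if not 0 <= class_priority < (1 << CLASS_PRIO_BITS):
        raise ValueError("class priority must fit in 5 bits")
    dist = min(distance, (1 << DIST_BITS) - 1)  # clamp very large functions
    prio = dist | (class_priority << DIST_BITS)
    if global_priority:                          # explicit flag, no magic values
        prio |= 1 << (DIST_BITS + CLASS_PRIO_BITS)
    return prio

clamped = pack_priority(10**9, 31, False)
flagged = pack_priority(0, 0, True)
```

Even with the largest allowed class priority and a huge distance, `clamped` leaves the global priority bit clear.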

commit | commitdiff | tree

Simon Pilgrim [Thu, 15 Sep 2022 14:28:51 +0000 (15:28 +0100)]

[CostModel][X86] Remove redundant BTVER2 checks from arithmetic costs

These all match the default AVX/AVX1 costs so use those instead

commit | commitdiff | tree

Simon Pilgrim [Thu, 15 Sep 2022 14:25:52 +0000 (15:25 +0100)]

[CostModel][X86] Remove redundant BTVER2 checks from shift costs

These all match the default AVX/AVX1 costs so use those instead

commit | commitdiff | tree

Florian Hahn [Thu, 15 Sep 2022 14:12:33 +0000 (15:12 +0100)]

[AArch64] Add big-endian tests for trunc-to-tbl.ll

Extra tests for D133495.

commit | commitdiff | tree

Dmitry Preobrazhensky [Thu, 15 Sep 2022 13:36:19 +0000 (16:36 +0300)]

[AMDGPU][MC][GFX11] Add validation of constant bus limitations for VOPD

Differential Revision: https://reviews.llvm.org/D133881

commit | commitdiff | tree

Dmitry Preobrazhensky [Thu, 15 Sep 2022 13:29:53 +0000 (16:29 +0300)]

[AMDGPU][MC][GFX11] Add VOPD literals validation

Differential Revision: https://reviews.llvm.org/D133864

commit | commitdiff | tree

Dmitry Preobrazhensky [Thu, 15 Sep 2022 13:24:25 +0000 (16:24 +0300)]

[AMDGPU][MC][NFC] Refactor AMDGPUAsmParser::validateVOPLiteral

Differential Revision: https://reviews.llvm.org/D133861

commit | commitdiff | tree

Mehdi Amini [Mon, 29 Aug 2022 12:18:14 +0000 (12:18 +0000)]

Apply clang-tidy fixes for llvm-include-order in TypeTest.cpp (NFC)

commit | commitdiff | tree

Mehdi Amini [Mon, 29 Aug 2022 12:14:14 +0000 (12:14 +0000)]

Apply clang-tidy fixes for bugprone-argument-comment in LLVMTypeTest.cpp (NFC)

commit | commitdiff | tree

Mehdi Amini [Mon, 29 Aug 2022 12:10:49 +0000 (12:10 +0000)]

Apply clang-tidy fixes for readability-simplify-boolean-expr in OpFormatGen.cpp (NFC)

commit | commitdiff | tree

Tue Ly [Thu, 15 Sep 2022 05:00:13 +0000 (01:00 -0400)]

[libc][math] Improve sinhf and coshf performance.

Optimize `sinhf` and `coshf` by computing exp(x) and exp(-x) simultaneously.

Currently `sinhf` and `coshf` are implemented using the following formulas:
```
  sinh(x) = 0.5 *(exp(x) - 1) - 0.5*(exp(-x) - 1)
  cosh(x) = 0.5*exp(x) + 0.5*exp(-x)
```
where `exp(x)` and `exp(-x)` are calculated separately using the formula:
```
  exp(x) ~ 2^hi * 2^mid * exp(dx)
         ~ 2^hi * 2^mid * P(dx)
```
By expanding the polynomial `P(dx)` into even and odd parts
```
  P(dx) = P_even(dx) + dx * P_odd(dx)
```
we can see that the computations of `exp(x)` and `exp(-x)` have many things in common,
namely:
```
  exp(x)  ~ 2^(hi + mid) * (P_even(dx) + dx * P_odd(dx))
  exp(-x) ~ 2^(-(hi + mid)) * (P_even(dx) - dx * P_odd(dx))
```
Expanding `sinh(x)` and `cosh(x)` with respect to the above formulas, we can compute
these two functions as follows, maximizing the shared parts:
```
  sinh(x) = (e^x - e^(-x)) / 2
          ~ 0.5 * (P_even * (2^(hi + mid) - 2^(-(hi + mid))) +
                  dx * P_odd * (2^(hi + mid) + 2^(-(hi + mid))))
  cosh(x) = (e^x + e^(-x)) / 2
          ~ 0.5 * (P_even * (2^(hi + mid) + 2^(-(hi + mid))) +
                  dx * P_odd * (2^(hi + mid) - 2^(-(hi + mid))))
```
So in this patch, we perform the following optimizations for `sinhf` and `coshf`:
  # Use the above formulas to maximize sharing intermediate results,
  # Apply similar optimizations from https://reviews.llvm.org/D133870
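
The shared computation can be sketched in Python. This is a simplified illustration, not the libc code: it folds `hi + mid` into a single integer exponent `k`, uses a truncated Taylor series for `P`, and works in double precision. The function name and the `terms` parameter are hypothetical.

```python
import math

def sinh_cosh_shared(x, terms=12):
    """Compute sinh(x) and cosh(x) from one range reduction and one polynomial."""
    k = round(x / math.log(2))           # stands in for hi + mid
    dx = x - k * math.log(2)             # |dx| <= ln(2) / 2
    # Even and odd parts of P(dx) = P_even(dx) + dx * P_odd(dx).
    p_even = sum(dx**n / math.factorial(n) for n in range(0, terms, 2))
    p_odd = sum(dx**(n - 1) / math.factorial(n) for n in range(1, terms, 2))
    up, down = 2.0**k, 2.0**(-k)
    sinh = 0.5 * (p_even * (up - down) + dx * p_odd * (up + down))
    cosh = 0.5 * (p_even * (up + down) + dx * p_odd * (up - down))
    return sinh, cosh

s, c = sinh_cosh_shared(0.7)
```

Both results reuse `p_even`, `p_odd`, `up` and `down`, which is the sharing the formulas above describe.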

Performance benchmark using `perf` tool from the CORE-MATH project on Ryzen 1700:
For `sinhf`:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinhf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 16.718
System LIBC reciprocal throughput : 63.151

BEFORE:
LIBC reciprocal throughput        : 90.116
LIBC reciprocal throughput        : 28.554    (with `-msse4.2` flag)
LIBC reciprocal throughput        : 22.577    (with `-mfma` flag)

AFTER:
LIBC reciprocal throughput        : 36.482
LIBC reciprocal throughput        : 16.955    (with `-msse4.2` flag)
LIBC reciprocal throughput        : 13.943    (with `-mfma` flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinhf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 48.821
System LIBC latency : 137.019

BEFORE
LIBC latency        : 97.122
LIBC latency        : 84.214    (with `-msse4.2` flag)
LIBC latency        : 71.611    (with `-mfma` flag)

AFTER
LIBC latency        : 54.555
LIBC latency        : 50.865    (with `-msse4.2` flag)
LIBC latency        : 48.700    (with `-mfma` flag)
```
For `coshf`:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh coshf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 16.939
System LIBC reciprocal throughput : 19.695

BEFORE:
LIBC reciprocal throughput        : 52.845
LIBC reciprocal throughput        : 29.174    (with `-msse4.2` flag)
LIBC reciprocal throughput        : 22.553    (with `-mfma` flag)

AFTER:
LIBC reciprocal throughput        : 37.169
LIBC reciprocal throughput        : 17.805    (with `-msse4.2` flag)
LIBC reciprocal throughput        : 14.691    (with `-mfma` flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh coshf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 48.478
System LIBC latency : 48.044

BEFORE
LIBC latency        : 99.123
LIBC latency        : 85.595    (with `-msse4.2` flag)
LIBC latency        : 72.776    (with `-mfma` flag)

AFTER
LIBC latency        : 57.760
LIBC latency        : 53.967    (with `-msse4.2` flag)
LIBC latency        : 50.987    (with `-mfma` flag)
```

Reviewed By: orex, zimmermann6

Differential Revision: https://reviews.llvm.org/D133913

commit | commitdiff | tree

Alexey Lapshin [Sun, 4 Sep 2022 09:38:36 +0000 (12:38 +0300)]

[DWARFLinker][NFC] Set the target DWARF version explicitly.

Currently, DWARFLinker determines the target DWARF version internally.
It examines the incoming object files, detects the maximal
DWARF version, and uses that version for the output file.
This patch allows the consumer of DWARFLinker to set the output DWARF
version explicitly, so that DWARFLinker uses the specified version instead
of an autodetected one. This lets consumers use different logic for
setting the target DWARF version. For example, instead of the maximal version
in use, a consumer could set a higher version to convert from DWARFv4 to DWARFv5
(this is not supported yet, but it would be good for
the interface to support it). Another option is to set the target
version through the command line. In this patch, the autodetection is moved
into the consumers (DwarfLinkerForBinary.cpp, DebugInfoLinker.cpp).

Differential Revision: https://reviews.llvm.org/D132755

commit | commitdiff | tree

Simon Pilgrim [Thu, 15 Sep 2022 13:01:27 +0000 (14:01 +0100)]

[CostModel][X86] Add CostKinds handling for vector shift by uniform/constuniform ops

Vector shift by const uniform is the cheapest shift instruction we have, non-const uniform have a marginally higher cost - some targets 'splat' the amount internally to use the shift-per-element instruction, others see a higher cost for the explicit zeroing of the upper bits for the (64-bit) shift amount.

This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)

commit | commitdiff | tree

Florian Hahn [Thu, 15 Sep 2022 13:01:26 +0000 (14:01 +0100)]

[AArch64] Add big-endian tests for zext-to-tbl.ll

Extra tests for D120571.

commit | commitdiff | tree

wanglei [Thu, 15 Sep 2022 12:31:24 +0000 (20:31 +0800)]

[LoongArch] Fixup value adjustment in applyFixup

A complete implementation of `applyFixup` for D132323.

Makes `LoongArchAsmBackend::shouldForceRelocation` determine
whether the relocation types must be forced.

This patch also adds range and alignment checks for `b*` instructions'
operands, at which point the offset to a label is known.
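
A generic sketch of such a range and alignment check in Python. It is not the backend's code; the helper name is hypothetical, and the immediate widths in the usage lines (16 and 26 bits, each encoding a 4-byte-aligned offset) are my recollection of the LoongArch branch formats and should be treated as assumptions.

```python
def check_branch_offset(offset, imm_bits):
    """Validate a byte offset and return the value to encode (offset / 4)."""
    if offset % 4 != 0:
        raise ValueError("branch offset must be 4-byte aligned")
    total_bits = imm_bits + 2  # the immediate is stored shifted right by 2
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    if not lo <= offset <= hi:
        raise ValueError("branch offset out of range")
    return offset >> 2

short_back = check_branch_offset(-8, 16)      # conditional-branch sized field
long_fwd = check_branch_offset(1 << 20, 26)   # unconditional-branch sized field
```

An offset of 6 fails the alignment check, and `1 << 17` is just outside the range of a 16-bit field.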

Differential Revision: https://reviews.llvm.org/D132818

commit | commitdiff | tree

Aleksandr Platonov [Thu, 15 Sep 2022 12:51:30 +0000 (15:51 +0300)]

[clang][RecoveryExpr] Don't perform alignment check if parameter type is dependent

This patch fixes a crash caused by calling getTypeAlignInChars() with a dependent type.

Reviewed By: hokein

Differential Revision: https://reviews.llvm.org/D133886

commit | commitdiff | tree

Ivan Kosarev [Thu, 15 Sep 2022 12:20:24 +0000 (13:20 +0100)]

[AMDGPU][SILoadStoreOptimizer] Merge SGPR_IMM scalar buffer loads.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D133787

commit | commitdiff | tree

Jez Ng [Thu, 15 Sep 2022 12:34:35 +0000 (08:34 -0400)]

[lld-macho] Add support for N_INDR symbols

This is similar to the `-alias` CLI option, but it gives finer-grained
control in that it allows the aliased symbols to be treated as private
externs.

While working on this, I realized that our `-alias` handling did not
cover the cases where the aliased symbol is a common or dylib symbol,
nor the case where we have an undefined that gets treated specially and
converted to a defined later on. My N_INDR handling neglects this too
for now; I've added checks and TODO messages for these.

`N_INDR` symbols cropped up as part of our attempt to link swift-stdlib.

Reviewed By: #lld-macho, thakis, thevinster

Differential Revision: https://reviews.llvm.org/D133825

commit | commitdiff | tree

Haojian Wu [Thu, 15 Sep 2022 11:54:29 +0000 (13:54 +0200)]

[mlir] Remove the unused source file.

It seems to be a missing file in 84d07d021333f7b5716f0444d5c09105557272e0

Differential Revision: https://reviews.llvm.org/D133937

commit | commitdiff | tree

Arjun P [Thu, 15 Sep 2022 12:23:00 +0000 (13:23 +0100)]

[MLIR][Presburger] clarify why -0 is used instead of 0 (NFC)

commit | commitdiff | tree

Ilia Diachkov [Wed, 14 Sep 2022 08:51:03 +0000 (11:51 +0300)]

[SPIRV] add IR regularization pass

The patch adds the regularization pass that prepares LLVM IR for
the IR translation. It also contains the following changes:
- reduce indentation, make getNonParametrizedType, getSamplerType,
getPipeType, getImageType, getSampledImageType static in SPIRVBuiltins,
- rename mayBeOclOrSpirvBuiltin to getOclOrSpirvBuiltinDemangledName,
- move isOpenCLBuiltinType, isSPIRVBuiltinType, isSpecialType from
SPIRVGlobalRegistry.cpp to SPIRVUtils.cpp, renaming isSpecialType to
isSpecialOpaqueType,
- implement getTgtMemIntrinsic() in SPIRVISelLowering,
- add hasSideEffects = 0 in Pseudo (SPIRVInstrFormats.td),
- add legalization rule for G_MEMSET, correct G_BRCOND rule,
- add capability processing for OpBuildNDRange in SPIRVModuleAnalysis,
- don't correct types of registers holding constants and used in
G_ADDRSPACE_CAST (SPIRVPreLegalizer.cpp),
- lower memset/bswap intrinsics to functions in SPIRVPrepareFunctions,
- change TargetLoweringObjectFileELF to SPIRVTargetObjectFile
in SPIRVTargetMachine.cpp,
- correct comments.
5 LIT tests are added to show the improvement.

Differential Revision: https://reviews.llvm.org/D133253

Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com>
Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com>
Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>

commit | commitdiff | tree

Michael Platings [Thu, 15 Sep 2022 10:59:03 +0000 (11:59 +0100)]

[NFC] Don't assume llvm directory is CMake root

This makes the file consistent with ARM/CMakeLists.txt

commit | commitdiff | tree

Haojian Wu [Thu, 15 Sep 2022 11:52:46 +0000 (13:52 +0200)]

Fix bazel build after 84d07d021333f7b5716f0444d5c09105557272e0.

commit | commitdiff | tree

Nicolas Vasilache [Thu, 15 Sep 2022 11:12:58 +0000 (04:12 -0700)]

[mlir][Linalg] Post submit addressed comments missed in f0cdc5bcd3f25192f12bfaff072ce02497b59c3c

Differential Revision: https://reviews.llvm.org/D133936

commit | commitdiff | tree

Tobias Hieta [Thu, 15 Sep 2022 11:32:32 +0000 (13:32 +0200)]

[NFC] Fix exception in version-check.py script

commit | commitdiff | tree

Aaron Ballman [Thu, 15 Sep 2022 11:29:49 +0000 (07:29 -0400)]

Add a "Potentially Breaking Changes" section to the Clang release notes

Sometimes we make changes to the compiler that we expect may cause
disruption for users. For example, we may strengthen a warning to
default to be an error, or fix an accepts-invalid bug that's been
around for a long time, etc., which may cause previously accepted code to
now be rejected. Rather than hope users discover that information by
reading all of the release notes, it's better that we call these out in
one location at the top of the release notes.

Based on feedback collected in the discussion at:
https://discourse.llvm.org/t/configure-script-breakage-with-the-new-werror-implicit-function-declaration/65213/

Differential Revision: https://reviews.llvm.org/D133771

commit | commitdiff | tree

Groverkss [Thu, 15 Sep 2022 11:09:00 +0000 (12:09 +0100)]

[MLIR][Presburger] Improve unittest parsing

This patch adds better functions for parsing MultiAffineFunctions and
PWMAFunctions in Presburger unittests.

A PWMAFunction can now be parsed as:

```
PWMAFunction result = parsePWMAF({
    {"(x, y) : (x >= 10, x <= 20, y >= 1)", "(x, y) -> (x + y)"},
    {"(x, y) : (x >= 21)", "(x, y) -> (x + y)"},
    {"(x, y) : (x <= 9)", "(x, y) -> (x - y)"},
    {"(x, y) : (x >= 10, x <= 20, y <= 0)", "(x, y) -> (x - y)"},
});
```

which is much more readable than the old format since the output can be
described as an AffineMap, instead of coefficients.

This patch also adds support for parsing divisions in MultiAffineFunctions
and PWMAFunctions which was previously not possible.

Reviewed By: arjunp

Differential Revision: https://reviews.llvm.org/D133654

commit | commitdiff | tree

esmeyi [Thu, 15 Sep 2022 10:06:25 +0000 (06:06 -0400)]

[PowerPC] Convert to comparison against zero even when the optimization
doesn't happen in peephole optimizer.

Summary: Converting a comparison against 1 or -1 into a comparison
against 0 can exploit record-form instructions for comparison optimization.
The conversion will happen only when a record-form instruction can be used
to replace the comparison during the peephole optimizer (see function optimizeCompareInstr).

In post-RA, we also want to optimize the comparison by using the record
form (see D131873) and it requires additional dataflow analysis to reliably
find uses of the CR register set.

It's reasonable to share the conversion between the peephole optimizer and
the post-RA optimizer.

Converting to comparison against zero even when the optimization doesn't
happen in the peephole optimizer may create additional opportunities for the
post-RA optimization.
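
The conversion described above can be illustrated with a small standalone sketch. It assumes signed integer comparisons, and the names `Pred`, `Compare`, `toZeroCompare` and `eval` are hypothetical helpers for this illustration, not the identifiers used in the PowerPC backend.

```cpp
#include <cstdint>

// Hypothetical predicate names for this sketch; not LLVM's enums.
enum class Pred { LT, LE, GT, GE };

// A signed comparison "x <pred> imm".
struct Compare {
  Pred pred;
  int64_t imm;
};

// Rewrite a signed compare against 1 or -1 as an equivalent compare
// against 0. The input is returned unchanged when no rewrite applies.
inline Compare toZeroCompare(Compare c) {
  if (c.imm == 1) {
    if (c.pred == Pred::GE) return {Pred::GT, 0}; // x >= 1  <=>  x > 0
    if (c.pred == Pred::LT) return {Pred::LE, 0}; // x < 1   <=>  x <= 0
  } else if (c.imm == -1) {
    if (c.pred == Pred::GT) return {Pred::GE, 0}; // x > -1  <=>  x >= 0
    if (c.pred == Pred::LE) return {Pred::LT, 0}; // x <= -1 <=>  x < 0
  }
  return c;
}

// Evaluate the comparison for a concrete value of x.
inline bool eval(Compare c, int64_t x) {
  switch (c.pred) {
  case Pred::LT: return x < c.imm;
  case Pred::LE: return x <= c.imm;
  case Pred::GT: return x > c.imm;
  case Pred::GE: return x >= c.imm;
  }
  return false;
}
```

Once the immediate is 0, the comparison has the same form as the CR0 result that a record-form instruction produces, which is what makes the later replacement possible.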

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D131374

commit | commitdiff | tree

Guray Ozen [Thu, 15 Sep 2022 08:39:13 +0000 (10:39 +0200)]

[mlir][linalg] Retire Linalg's Vectorization Pattern

This revision retires the LinalgCodegenStrategy vectorization pattern. Please see the context: https://discourse.llvm.org/t/psa-retire-linalg-filter-based-patterns/63785.
This revision improves the transform dialect's VectorizeOp in the following ways:
- Adds LinalgDialect as a dependent dialect. When `transform.structured.vectorize` vectorizes `tensor.pad`, it generates `linalg.init_tensor`. In this case, linalg dialect must be registered.
- Inserts CopyVectorizationPattern in order to vectorize `memref.copy`.
- Creates two attributes: `disable_multi_reduction_to_contract_patterns` and `disable_transfer_permutation_map_lowering_patterns`. They limit the power of vectorization and are currently intended for testing purposes.

It also removes some of the "CHECK: vector.transfer_write" in the vectorization.mlir test. They are redundant writes; at the end of the code there is a rewrite to the same place. Transform dialect no longer generates them.

Depends on D133684 that retires the LinalgCodegenStrategy vectorization pass.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D133699

commit | commitdiff | tree

Guray Ozen [Thu, 15 Sep 2022 08:45:46 +0000 (10:45 +0200)]

[mlir][linalg] Retire Linalg's StrategyVectorizePass

We retire linalg's strategy vectorize pass. Our goal is to use transform dialect instead of passes.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D133684

commit | commitdiff | tree

Tom Praschan [Sun, 11 Sep 2022 12:54:26 +0000 (14:54 +0200)]

[clangd] Fix hover on symbol introduced by using declaration

This fixes https://github.com/clangd/clangd/issues/1284. The example
there was C++20's "using enum", but I noticed that we had the same issue
for other using-declarations.

Differential Revision: https://reviews.llvm.org/D133664

commit | commitdiff | tree

Marco Elver [Thu, 15 Sep 2022 08:36:11 +0000 (10:36 +0200)]

[GlobalISel][AArch64] Fix pcsections for expanded atomics and add more tests

Add fix for propagation of !pcsections metadata for expanded atomics,
together with more tests for interesting atomic instructions (based on
llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll).

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D133710

commit | commitdiff | tree

Christian Sigg [Wed, 7 Sep 2022 22:00:38 +0000 (00:00 +0200)]

[Bazel] Add lit tests to bazel builds.

Add BUILD.bazel files for most of the MLIR tests and for lit's own tests.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D133455

commit | commitdiff | tree

Evgeniy Brevnov [Thu, 15 Sep 2022 05:24:56 +0000 (12:24 +0700)]

[JumpThreading][NFC] Reuse existing DT instead of recomputation (newPM)

This is the same change as
503d5771b6c5e3544a9fa3be6b8d085ffbbd4057 with the same intent but for new pass manager.

commit | commitdiff | tree

Fangrui Song [Thu, 15 Sep 2022 05:14:36 +0000 (22:14 -0700)]

[IRBuilder] Fix -Wunused-variable in non-assertion build. NFC

commit | commitdiff | tree

Dhruva Chakrabarti [Thu, 15 Sep 2022 03:08:46 +0000 (03:08 +0000)]

Revert "[OpenMP] Codegen aggregate for outlined function captures"

This reverts commit 7539e9cf811e590d9f12ae39673ca789e26386b4.

commit | commitdiff | tree

Vitaly Buka [Mon, 12 Sep 2022 04:49:18 +0000 (21:49 -0700)]

[nfc][msan] getShadowOriginPtr on <N x ptr>

Some vector instructions can benefit from
handling of Addr as <N x ptr>.

Differential Revision: https://reviews.llvm.org/D133681

commit | commitdiff | tree

Vitaly Buka [Mon, 12 Sep 2022 04:30:07 +0000 (21:30 -0700)]

[IRBuilder] Add CreateMaskedExpandLoad and CreateMaskedCompressStore

commit | commitdiff | tree

Vitaly Buka [Sun, 11 Sep 2022 19:55:57 +0000 (12:55 -0700)]

[NFC][msan] Rename variables to match definition

commit | commitdiff | tree

Vitaly Buka [Sun, 11 Sep 2022 00:22:32 +0000 (17:22 -0700)]

[NFC][msan] Convert some code to early returns

Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D133673

commit | commitdiff | tree

Vitaly Buka [Sun, 11 Sep 2022 00:13:12 +0000 (17:13 -0700)]

[NFC][msan] Simplify llvm.masked.load origin code

Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D133652

commit | commitdiff | tree

Vitaly Buka [Thu, 15 Sep 2022 01:18:45 +0000 (18:18 -0700)]

[msan] Resolve FIXME from D133880

We don't need to change tests, since we convertToBool
unconditionally only before OR.

commit | commitdiff | tree

Vitaly Buka [Thu, 15 Sep 2022 01:42:13 +0000 (18:42 -0700)]

[test][msan] Use implicit-check-not

commit | commitdiff | tree

Sheng [Thu, 15 Sep 2022 01:22:15 +0000 (09:22 +0800)]

[M68k] Fix the crash of fast register allocator

`MOVEM` is used to spill the register, which causes problems with 1-byte data, since it only supports word (2 bytes) and long (4 bytes) sizes.

We now use the normal `move` instruction to spill 1-byte data.

Fixes #57660

Reviewed By: myhsu

Differential Revision: https://reviews.llvm.org/D133636

commit | commitdiff | tree

Jeff Niu [Wed, 14 Sep 2022 00:35:38 +0000 (17:35 -0700)]

[mlir] Allow `Attribute::print` to elide the type

This patch adds a flag to `Attribute::print` that prints the attribute
without its type.

Fixes #57689

Reviewed By: rriddle, lattner

Differential Revision: https://reviews.llvm.org/D133822

commit | commitdiff | tree

Jeff Niu [Wed, 14 Sep 2022 20:43:45 +0000 (13:43 -0700)]

[mlir][ods] Add cppClassName to ConfinedType

So ODS can generate `OneTypedResult` when a ConfinedType is used as a
result type.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D133893

commit | commitdiff | tree

Giorgis Georgakoudis [Thu, 15 Sep 2022 00:09:54 +0000 (00:09 +0000)]

[OpenMP] Codegen aggregate for outlined function captures

Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) on the number of arguments, (3) forwarded arguments must be cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
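
As a rough sketch of the aggregate approach: the captured variables are packed into one struct and a single pointer to it is forwarded to the outlined function. Everything here (`forkCall`, `OutlinedFn`, `Captures`, `outlinedRegion`, `runRegion`) is a hypothetical stand-in for illustration; the real runtime entry point and the code Clang generates have different signatures.

```cpp
// Hypothetical stand-in for the runtime entry point: it takes exactly one
// pointer, so it no longer needs to be variadic.
using OutlinedFn = void (*)(void *captures);

void forkCall(OutlinedFn fn, void *captures) {
  // A real runtime would run fn on every thread of the team;
  // this sketch just calls it once.
  fn(captures);
}

// All captured variables of one parallel region, packed in one struct.
// Each field keeps its own type instead of being cast to a pointer type.
struct Captures {
  int *sum;
  const int *data;
  int n;
};

// The outlined body of the region: unpacks the aggregate and does the work.
void outlinedRegion(void *raw) {
  auto *c = static_cast<Captures *>(raw);
  for (int i = 0; i < c->n; ++i)
    *c->sum += c->data[i];
}

// The caller side: build the aggregate on the stack and pass its address.
int runRegion(const int *data, int n) {
  int sum = 0;
  Captures c{&sum, data, n};
  forkCall(outlinedRegion, &c);
  return sum;
}
```

With this shape the number of captures is bounded only by the struct, not by a fixed argument limit in the runtime.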

Reviewed By: jdoerfert, jhuber6, ABataev

Differential Revision: https://reviews.llvm.org/D102107

commit | commitdiff | tree

Stanislav Mekhanoshin [Wed, 14 Sep 2022 22:42:30 +0000 (15:42 -0700)]

Fix crash while printing MMO target flags

MachineMemOperand::print can dereference a NULL pointer if TII
is not passed from the printMemOperand. This does not happen while
dumping the DAG/MIR from llc but crashes the debugger if a dump()
method is called from gdb.

Differential Revision: https://reviews.llvm.org/D133903

commit | commitdiff | tree

Craig Topper [Wed, 14 Sep 2022 22:53:18 +0000 (15:53 -0700)]

[RISCV] Simplify some code in RISCVInstrInfo::verifyInstruction. NFCI

This code was written as if it lived in the MC layer instead of
the CodeGen layer. We get the MCInstrDesc directly from MachineInstr.
And we can use RISCVSubtarget::is64Bit instead of going to the
Triple.

Differential Revision: https://reviews.llvm.org/D133905

commit | commitdiff | tree

Sam Clegg [Thu, 1 Sep 2022 07:59:54 +0000 (00:59 -0700)]

[MC] Fix typo in getSectionAddressSize comment. NFC

The comment was referring to a now nonexistent function that was removed
in 93e3cf0ebd9c95a8df42fff0aa38fc022422b4d4.

Differential Revision: https://reviews.llvm.org/D133098

commit | commitdiff | tree

Craig Topper [Wed, 14 Sep 2022 21:52:17 +0000 (14:52 -0700)]

[IR][VP] Remove IntrArgMemOnly from vp.gather/scatter.

IntrArgMemOnly is only valid for intrinsics that use a scalar
pointer argument. These intrinsics use a vector of pointers.

Alias analysis will try to find a scalar pointer argument and
will return incorrect alias results when it doesn't find one.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133898

commit | commitdiff | tree

Craig Topper [Wed, 14 Sep 2022 21:52:05 +0000 (14:52 -0700)]

[GVN][VP] Add test case for incorrect removal of a vp.gather. NFC

Pre-commit for D133898

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133899

Domain: System / Toolchain;