platform/upstream/llvm.git
Christian Trott [Thu, 29 Jun 2023 14:06:47 +0000 (10:06 -0400)]
[libc++][mdspan] Implement layout_right

This commit implements layout_right in support of C++23 mdspan
(https://wg21.link/p0009). layout_right is a layout mapping policy
whose index mapping corresponds to the memory layout of multidimensional
C-arrays, and is thus also referred to as the C-layout.
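The following is an illustration, not the libc++ implementation. A layout_right mapping assigns each multi-index an offset that can be computed with a Horner-style loop, in which the rightmost index varies fastest. The helper name `layout_right_offset` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: linear offset of a multi-index under a row-major
// (layout_right / C-layout) mapping. The rightmost index has stride 1.
template <std::size_t Rank>
std::size_t layout_right_offset(const std::array<std::size_t, Rank>& extents,
                                const std::array<std::size_t, Rank>& idx) {
  std::size_t offset = 0;
  for (std::size_t r = 0; r < Rank; ++r) {
    // offset = ((i0 * e1 + i1) * e2 + i2) * ...
    offset = offset * extents[r] + idx[r];
  }
  return offset;
}
```

For extents 2x3x4, index (1, 2, 3) maps to 1*12 + 2*4 + 3 = 23. This is the same element offset that a C array `int a[2][3][4]` uses for `a[1][2][3]`.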

Co-authored-by: Damien L-G <dalg24@gmail.com>
Differential Revision: https://reviews.llvm.org/D151267

Takuya Shimizu [Thu, 29 Jun 2023 14:02:09 +0000 (23:02 +0900)]
[clang][Sema] Remove dead diagnostic for loss of __unaligned qualifier

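D120936 made the loss of the `__unaligned` qualifier no longer a bad conversion.
Because of this, the bad-conversion note about the loss of this qualifier no longer reflects why a conversion fails: it can blame `__unaligned` when `const` is the qualifier actually being dropped.
For example: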
```
void foo(int *ptr);

void func(const __unaligned int *var) { foo(var); }
```
BEFORE this patch:
```
source.cpp:3:41: error: no matching function for call to 'foo'
    3 | void func(const __unaligned int *var) { foo(var); }
      |                                         ^~~
source.cpp:1:6: note: candidate function not viable: 1st argument ('const __unaligned int *') would lose __unaligned qualifier
    1 | void foo(int *ptr);
      |      ^
    2 |
    3 | void func(const __unaligned int *var) { foo(var); }
      |                                             ~~~
```
AFTER this patch:
```
source.cpp:3:41: error: no matching function for call to 'foo'
    3 | void func(const __unaligned int *var) { foo(var); }
      |                                         ^~~
source.cpp:1:6: note: candidate function not viable: 1st argument ('const __unaligned int *') would lose const qualifier
    1 | void foo(int *ptr);
      |      ^
    2 |
    3 | void func(const __unaligned int *var) { foo(var); }
      |                                             ~~~
```
Note how the two notes differ: the first blames `__unaligned`, while the second correctly blames `const`.

Reviewed By: cjdb, rnk
Differential Revision: https://reviews.llvm.org/D153690

Mike Crowe [Wed, 28 Jun 2023 15:36:42 +0000 (15:36 +0000)]
[clang-tidy] Fix modernize-use-std-print check when return value used

The initial implementation of the modernize-use-std-print check was
capable of converting calls to printf (etc.) which used the return value
to calls to std::print which has no return value, thus breaking the
code.

Use code inspired by the implementation of the bugprone-unused-return-value
check to ignore cases where the return value is used. Add appropriate
lit test cases and documentation.
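Below is a minimal sketch of the two cases the check now distinguishes; the function names are hypothetical. Only the first call is a candidate for rewriting to std::print or std::println, because std::print returns void.

```cpp
#include <cassert>
#include <cstdio>

// Return value discarded: the check may rewrite this call to std::print/std::println.
void report(int x) { std::printf("%d\n", x); }

// Return value used: std::print returns void, so a rewrite would break this
// code. The fixed check leaves such calls untouched.
int report_and_count(int x) { return std::printf("%d\n", x); }
```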

Reviewed By: PiotrZSL

Differential Revision: https://reviews.llvm.org/D153860

Alexey Lapshin [Tue, 27 Jun 2023 19:27:34 +0000 (21:27 +0200)]
[DWARFv5][DWARFLinker] Remove dsymutil-classic compatibility feature as it leads to an error.

DWARFLinker has a compatibility feature with dsymutil-classic:
it may keep a location expression attribute even if it does not
reference a live address. The current llvm-dwarfdump --verify
reports an error if a variable references an address but is not
added to the .debug_names table.

error: Name Index @ 0x0: Entry for DIE @ 0xf35 (DW_TAG_variable) with name seed missing.

DW_TAG_variable
  DW_AT_name      ("seed")
  DW_AT_type      (0x00000000000047b7 "uint64_t")
  DW_AT_location  (DW_OP_addr 0x9ff8)  <<<< dead address

DWARFLinker does not add the variable to the .debug_names table
because it references a dead address. To have a valid variable and a
consistent accelerator table, the location expression referencing the
dead address must be removed. This patch removes the dsymutil-classic
compatibility feature.
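The invariant at stake can be sketched with a heavily simplified, hypothetical model; none of these types or functions exist in DWARFLinker. After the patch, a location referencing a dead address is dropped, so every variable that keeps an address is also present in the name index that the verifier checks.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a DW_TAG_variable DIE.
struct Variable {
  std::string name;
  std::optional<std::uint64_t> address;  // models DW_AT_location (DW_OP_addr ...)
};

// Post-patch linking step: drop locations that reference dead addresses,
// then index only the variables that still have a location.
std::set<std::string> linkVariables(std::vector<Variable>& vars,
                                    const std::set<std::uint64_t>& liveAddresses) {
  std::set<std::string> nameIndex;
  for (Variable& v : vars) {
    if (v.address && liveAddresses.count(*v.address) == 0)
      v.address.reset();  // dead address: remove the location expression
    if (v.address)
      nameIndex.insert(v.name);
  }
  return nameIndex;
}

// Models the verifier check: a variable with an address must be indexed.
bool verifyNameIndex(const std::vector<Variable>& vars,
                     const std::set<std::string>& nameIndex) {
  for (const Variable& v : vars)
    if (v.address && nameIndex.count(v.name) == 0)
      return false;
  return true;
}
```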

Differential Revision: https://reviews.llvm.org/D153988

Nikita Popov [Thu, 29 Jun 2023 13:31:18 +0000 (15:31 +0200)]
[X86] Add tests for PR63475 (NFC)

pvanhout [Mon, 26 Jun 2023 15:59:55 +0000 (17:59 +0200)]
[MCP] Optimize copies from undef

Revert D152502 and instead optimize away copies from undef, but clear the undef flag on the original copy.
Apparently, not optimizing the COPY can cause performance issues in some cases.

Fixes SWDEV-405813, SWDEV-405899

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D153838

Sean Perry [Thu, 29 Jun 2023 12:54:20 +0000 (08:54 -0400)]
[SystemZ][z/OS] Add support for z/OS link step (executable and shared libs)

Add support for performing a link step on z/OS. This supports building C and C++ executables and shared libraries.

Reviewed By: zibi, abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D153580

Luke Lau [Thu, 29 Jun 2023 12:54:57 +0000 (13:54 +0100)]
[RISCV] Add tests for cost modelling constants in phis

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D149168

pvanhout [Tue, 27 Jun 2023 14:50:01 +0000 (16:50 +0200)]
[AMDGPU] Handle Additional Cases in tryFoldPhiAGPR

Sometimes PHIs have different incoming values, such as:
```
%1:vgpr_256 = COPY %0:agpr_256
%2:vgpr_32 = COPY %1:vgpr_256.sub0
```

Those weren't handled, which could lead to massive performance issues if break-large-PHIs kicked in and AGPRs were used (MFMA).

Fixes SWDEV-407986

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D153879

Luke Lau [Thu, 4 May 2023 18:42:05 +0000 (19:42 +0100)]
[TTI] Use users of GEP to guess access type in getGEPCost

Currently getGEPCost uses the target type of the GEP as a heuristic for
the type that will be accessed, to pass onto isLegalAddressingMode.
Targets use this to work out if a GEP can then be folded into the
load/store instruction that uses the GEP.
For example, on RISC-V loads and stores can have an offset added to a
base register folded into a single instruction, so the following GEP is
free:

%p = getelementptr i32, ptr %base, i32 42       ; getInstructionCost = 0
%x = load i32, ptr %p                           ; getInstructionCost = 1
------------------------------------------------------------------------
lw    t0, 168(a0)

However vector loads and stores cannot have an offset folded into them,
so the following GEP is costed:

%p = getelementptr <2 x i32>, ptr %base, i32 42 ; getInstructionCost = 1
%x = load <2 x i32>, ptr %p                     ; getInstructionCost = 1
------------------------------------------------------------------------
addi  a0, a0, 336
vle32 v8, (a0)

The issue arises whenever there is a mismatch between the target type of
the GEP and the type that is actually accessed:

%p = getelementptr i32, ptr %base, i32 42       ; getInstructionCost = 0
%x = load <2 x i32>, ptr %p                     ; getInstructionCost = 1
------------------------------------------------------------------------
addi  a0, a0, 168
vle32 v8, (a0)

Even though this GEP will result in an add instruction, because TTI
thinks it's loading an i32, it will think it can be folded and not
charge for it.

The target type can become mismatched with the memory access during
transformations, notably during SLP where a scalar base pointer will
be reused to perform a vector load or store.

This patch adds an optional AccessType argument to getGEPCost which
allows the type of memory accessed by users to be passed in as a hint,
so that we can more accurately determine if the GEP can be folded into
its users.

If AccessType is not provided, getGEPCost falls back to the old
behaviour of using the PointeeType to guess the memory access type. This
can be revisited in a later patch.

Also for now, only GEPs with exactly one user use the access type hint.
Whilst we could look through all users and use all access types to
determine if we can fold the GEP, this patch avoids doing so to prevent
O(N) behaviour.
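The decision can be modelled roughly as follows. This is a toy model under stated assumptions: the enum, the function names and the offset ranges are hypothetical stand-ins for getGEPCost and isLegalAddressingMode. They approximate RISC-V, where scalar loads and stores fold a 12-bit signed immediate and unit-stride vector loads and stores take only a base register. When the access-type hint is present, it overrides the guess derived from the GEP's element type.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class AccessKind { Scalar, Vector };

// Hypothetical stand-in for isLegalAddressingMode on RISC-V.
bool isLegalAddressingMode(AccessKind kind, std::int64_t offsetBytes) {
  if (kind == AccessKind::Vector)
    return offsetBytes == 0;  // vle32/vse32 take a bare base register
  return offsetBytes >= -2048 && offsetBytes <= 2047;  // 12-bit signed imm
}

// Hypothetical stand-in for getGEPCost with a constant byte offset.
// pointeeGuess: kind inferred from the GEP's element type (old behaviour).
// accessHint:   kind actually accessed by the GEP's single user, if known.
int gepCost(AccessKind pointeeGuess, std::optional<AccessKind> accessHint,
            std::int64_t offsetBytes) {
  AccessKind kind = accessHint.value_or(pointeeGuess);
  return isLegalAddressingMode(kind, offsetBytes) ? 0 : 1;  // 0 = folds for free
}
```

With the hint, the mismatched case above (an i32 GEP feeding a <2 x i32> load) is charged 1 instead of 0.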

Differential Revision: https://reviews.llvm.org/D149889

Luke Lau [Thu, 4 May 2023 18:36:48 +0000 (19:36 +0100)]
[RISCV][SLP] Add tests for GEP costs

This patch updates the tests in gep.ll to have explicit memory
accesses using them, to illustrate the new behaviour in D149889.
New tests have also been added for mismatched pointer types and memory
access types, and gep-zero-indices.ll has also been added to make sure
that we always cost GEPs with all zero indices as free.

Ivan Butygin [Fri, 23 Jun 2023 17:38:32 +0000 (19:38 +0200)]
[mlir][memref] Add some missing interfaces to memref ops.

Add `ViewLikeOpInterface` to `ExtractStridedMetadataOp` as it returns its buffer as one of the results.
Add mem Read/Write attributes to atomic ops.

Differential Revision: https://reviews.llvm.org/D153647

David Green [Thu, 29 Jun 2023 12:29:34 +0000 (13:29 +0100)]
[AArch64] Add and cmp cost model tests. NFC

See D153611. Tests for the cost of icmp(and, 0) are added, in addition to
expanding the extractelements-to-shuffle.ll test, which has always been a bit
simple, to include a more complete example with both a vector and scalar
version. The icmp(and, 0) costs are aimed at improving the second when the
cost of vector inserts and extracts is lowered.

eopXD [Thu, 29 Jun 2023 08:39:25 +0000 (01:39 -0700)]
[Clang][RISCV] Fix RISC-V vector / SiFive intrinsic inclusion in SemaLookup

The existing code assumes that both `DeclareRISCVVBuiltins` and
`DeclareRISCVSiFiveVectorBuiltins` are set when coming into the if-statement
under SemaLookup.cpp.

This is not always the case, which causes issue #63571.

This patch resolves the issue.

Reviewed By: 4vtomat, kito-cheng

Differential Revision: https://reviews.llvm.org/D154050

Guillaume Chatelet [Thu, 29 Jun 2023 12:21:26 +0000 (12:21 +0000)]
[libc][NFC] Use SIZE_MAX instead of size_t(-1)

Paul Walker [Thu, 22 Jun 2023 14:03:28 +0000 (14:03 +0000)]
[Clang] Allow C++11 style initialisation of SVE types.

Fixes https://github.com/llvm/llvm-project/issues/63223

Differential Revision: https://reviews.llvm.org/D153560

Ben Shi [Fri, 23 Jun 2023 07:15:57 +0000 (15:15 +0800)]
[CSKY][test][NFC] Add tests of ANDI/ORI

These tests will be optimized with BSETI32/BCLRI32
in the future.

Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D153613

Ben Shi [Wed, 21 Jun 2023 07:43:06 +0000 (15:43 +0800)]
[CSKY][NFC] Simplify code with multiclass

Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D153402

Alex Brachet [Thu, 29 Jun 2023 11:22:00 +0000 (11:22 +0000)]
[ELF][NFC] Change comment terminology

Differential Revision: https://reviews.llvm.org/D153978

Quentin Colombet [Thu, 29 Jun 2023 10:25:15 +0000 (12:25 +0200)]
[mlir][Linalg] Add a softmax op

This patch adds a softmax op.
For now, nothing interesting happens: we can only round-trip the op.
Later patches will add the tiling interface and the lowering of this op to
a sequence of simpler ops.
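For reference, the semantics of softmax along one dimension can be sketched in C++ as max-reduce, subtract, exp, sum-reduce, divide. This is an illustrative sketch, not the op's actual lowering.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative 1-D softmax built from simpler steps.
std::vector<double> softmax(const std::vector<double>& x) {
  if (x.empty()) return {};
  // 1. max-reduce
  const double m = *std::max_element(x.begin(), x.end());
  // 2. subtract the max, 3. exponentiate, 4. sum-reduce
  std::vector<double> e(x.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    e[i] = std::exp(x[i] - m);
    sum += e[i];
  }
  // 5. normalize
  for (double& v : e) v /= sum;
  return e;
}
```

Subtracting the maximum first keeps `exp` from overflowing. As a result, `softmax({1000.0, 0.0})` yields approximately {1.0, 0.0} rather than NaNs.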

This graduates the linalg_ext.softmax op from IREE to LLVM.

Original implementation from Harsh Menon <harsh@nod-labs.com>
Nicolas Vasilache <nicolas.vasilache@gmail.com> co-authored this patch.

Differential Revision: https://reviews.llvm.org/D153422

Joel Wee [Thu, 29 Jun 2023 10:46:54 +0000 (12:46 +0200)]
[mlir][GreedyPatternRewriter] Add out param to detect changes in IR in `applyPatternsAndFoldGreedily`

This allows users of `applyPatternsAndFoldGreedily` to detect whether the IR was changed. An example use case is when we expect `applyPatternsAndFoldGreedily` to change the IR and want to validate that it indeed does.

Differential Revision: https://reviews.llvm.org/D153986

Florian Hahn [Thu, 29 Jun 2023 10:18:43 +0000 (11:18 +0100)]
[ConstraintElim] Add ptr phi tests with upper bounds with const offsets.

Extra tests for D152730.

Ivan Kosarev [Thu, 29 Jun 2023 09:51:37 +0000 (10:51 +0100)]
[AMDGPU][AsmParser][NFC] Simplify instruction operand definitions.

This addresses the trivial cases that only require removing the
operand classes and renaming related entities.

Part of <https://github.com/llvm/llvm-project/issues/62629>.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D153965

Liren Peng [Thu, 29 Jun 2023 08:08:06 +0000 (16:08 +0800)]
Revert "[ScalarEvolution] Infer loop max trip count from array accesses"

This reverts commit 57e093162e27334730d8ed8f7b25b1b6f65ec8c8.

Jie Fu [Thu, 29 Jun 2023 08:58:12 +0000 (16:58 +0800)]
[RISCV] Remove unused variables in RISCVISelDAGToDAG.cpp (NFC)

/Users/jiefu/llvm-project/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:97:33: error: unused variable 'FuncInfo' [-Werror,-Wunused-variable]
      RISCVMachineFunctionInfo *FuncInfo =
                                ^
/Users/jiefu/llvm-project/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:106:29: error: unused variable 'TLI' [-Werror,-Wunused-variable]
      const TargetLowering &TLI = CurDAG->getTargetLoweringInfo();
                            ^
2 errors generated

Yunze Zhu [Thu, 29 Jun 2023 06:38:45 +0000 (14:38 +0800)]
[RISCV] Use temporary stack in expanding SPLAT_VECTOR_SPLIT_I64_VL node

There is an issue: https://github.com/llvm/llvm-project/issues/63515
It occurs because, when expanding the SPLAT_VECTOR_SPLIT_I64_VL node, only a memory operand is used to create the dependency.
However, ScheduleDAGNodes checks dependencies through the chain only, so the order of the store/load instructions can be broken.
In the llvm.bitreverse.nxv2i64 intrinsic, the SPLAT_VECTOR_SPLIT_I64_VL nodes appear to be processed in parallel,
so no chain should be added to these nodes.
Using a temporary stack slot when expanding the SPLAT_VECTOR_SPLIT_I64_VL node keeps the vlse instruction reading the correct value
no matter how the stores are reordered.

Differential Revision: https://reviews.llvm.org/D153743

Florian Hahn [Thu, 29 Jun 2023 08:35:35 +0000 (09:35 +0100)]
[ConstraintElim] Add pointer induction tests with struct types.

Extra tests for D152730.

David Spickett [Wed, 28 Jun 2023 08:01:14 +0000 (08:01 +0000)]
Reland "[LLDB] Fix the use of "platform process launch" with no extra arguments"

This reverts commit 3254623d73fb7252385817d8057640c9d5d5ffd1.

One test has been updated to add the "-s" flag which along with
86fd957af981f146a306831608d7ad2de65b9560 should fix the tests on MacOS.

An assert on the hijack listener added in that patch was removed; it seems
to hold on MacOS but not on Linux.

Arthur Eubanks [Thu, 29 Jun 2023 08:18:07 +0000 (10:18 +0200)]
[PhaseOrdering] Add test with gep null compare in loop (NFC)

Test from D153392 in both the alloca and malloc variants.

Florian Hahn [Thu, 29 Jun 2023 08:17:37 +0000 (09:17 +0100)]
[ConstraintElim] Allow and check preconditions in doesHold.

Delegate checking of the constraint & its preconditions to the existing
::isValid. This reduces duplication and allows additional optimizations
together with D152730.

Christian Ulmann [Thu, 29 Jun 2023 07:53:18 +0000 (07:53 +0000)]
[mlir][llvm] Dominance violating debug intrinsic import

Debug intrinsics are allowed to violate SSA dominance and might thus
cause the LLVM import to produce invalid LLVM dialect. This commit
ensures that the debug intrinsics are emitted right after the definition
of their SSA operands.
As the position of debug intrinsics has no meaning, changing it has no
semantic implication.

Reviewed By: gysit

Differential Revision: https://reviews.llvm.org/D153984

Michael Platings [Thu, 29 Jun 2023 08:07:12 +0000 (09:07 +0100)]
[Clang][Driver] Change missing multilib error to warning

The error could be awkward to work around when experimenting with flags
that didn't have a matching multilib. It also broke many tests when
multilib.yaml was present in the build directory.

Reviewed By: simon_tatham, MaskRay

Differential Revision: https://reviews.llvm.org/D153885

Michael Platings [Wed, 28 Jun 2023 07:44:41 +0000 (08:44 +0100)]
[test] Replace aarch64-*-eabi with aarch64

Also replace aarch64_be-*-eabi with aarch64_be

Using "eabi" for aarch64 targets is a common mistake, and the Clang Driver warns about it.
We want to avoid it elsewhere as well. Just use the common "aarch64" without
other triple components.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D153943

Nikita Popov [Thu, 29 Jun 2023 07:58:49 +0000 (09:58 +0200)]
[FunctionAttrs] Regenerate test checks (NFC)

Juan Manuel MARTINEZ CAAMAÑO [Thu, 29 Jun 2023 07:31:34 +0000 (09:31 +0200)]
[InlineCost][TargetTransformInfo][AMDGPU] Consider cost of alloca instructions in the caller (2/2)

Before this patch, the compiler gave a bump to the inline-threshold
when the total size of the allocas passed as arguments to the
callee was below 256 bytes.
This heuristic ignores that some of these allocas could have been removed
by SROA if inlining were applied.

Ideally, this bonus would be attributed to the threshold once the
size of all the allocas that could not be handled by SROA is known:
at the end of the InlineCost analysis.
However, we may never reach this point if the inline-cost analysis exits
early when the inline cost goes over the threshold mid-analysis.

This patch proposes:
* Attributing the bonus in the inline-threshold when allocas are passed
  as arguments (regardless of their total size).
* Assigning a cost to each alloca proportional to its size,
  such that the cost of all the allocas cancels the bonus.

Potential problems:
* This patch assumes that removing alloca instructions with SROA is
  always profitable. This may not be the case if the total size of the
  allocas is still too big to be promoted to registers/LDS.
* Redundant calls to getTotalAllocaSize
* Awkwardly, the threshold attributed contributes to the single-bb and
  vector bonus.
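The proposed accounting can be modelled roughly as follows. This is a hypothetical sketch: the struct, the function name and the integer arithmetic are assumptions. The real patch adds the per-alloca charge to the inline cost rather than subtracting it from the threshold, though the net effect is the same.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of an alloca passed as a call argument.
struct ArgAlloca {
  std::int64_t sizeBytes;
  bool removedBySROA;  // would SROA eliminate it after inlining?
};

// Net threshold change: the bonus is granted whenever allocas are passed,
// then each alloca that survives SROA is charged a share of the bonus
// proportional to its size.
std::int64_t netBonus(std::int64_t bonus, const std::vector<ArgAlloca>& allocas) {
  std::int64_t total = 0;
  for (const ArgAlloca& a : allocas) total += a.sizeBytes;
  if (total == 0) return 0;  // no allocas passed: no bonus
  std::int64_t net = bonus;
  for (const ArgAlloca& a : allocas)
    if (!a.removedBySROA)
      net -= bonus * a.sizeBytes / total;
  return net;
}
```

If every alloca is removed by SROA, the whole bonus remains. If none are, the charges sum to the bonus and cancel it, up to integer rounding.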

Reviewed By: scchan

Differential Revision: https://reviews.llvm.org/D149741

Juan Manuel MARTINEZ CAAMAÑO [Thu, 29 Jun 2023 07:11:54 +0000 (09:11 +0200)]
[InlineCost][TargetTransformInfo][AMDGPU] Consider cost of alloca instructions in the caller (1/2)

On AMDGPU, alloca instructions have a penalty that can
be avoided when SROA is applied after inlining.

This patch introduces the default implementation of
TargetTransformInfo::getCallerAllocaCost.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D149740

mgrzywac [Thu, 29 Jun 2023 07:41:08 +0000 (07:41 +0000)]
[libunwind] Add cached compile and link flags to libunwind

Add flags that allow libunwind to use compile flags and libraries provided in the CMake cache.
Similar flags are already present in libc++ and libc++abi CMakeLists files.

Differential Revision: https://reviews.llvm.org/D150252

Craig Topper [Thu, 29 Jun 2023 07:20:47 +0000 (00:20 -0700)]
[RISCV] Do a more complete job of disabling extending loads and truncating stores for fixed vector types.

We weren't marking some combinations as Expand if one of the
types wasn't legal.

Fixes #63596.

Hanbum Park [Thu, 29 Jun 2023 07:13:18 +0000 (09:13 +0200)]
[InstSimplify] Fold icmp of allocas based on offset difference

Strengthen the fold for icmps of non-overlapping storage, by
working on the difference of offsets, rather than considering
both offsets independently. In particular, this allows handling
comparisons of pointers to the end of equal-sized allocations.
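The reasoning behind the offset-difference check can be sketched as follows. This is a derivation written for illustration, not the InstSimplify source, and the function name is hypothetical. Let D = OffA - OffB. Equality requires A - B == -D, while non-overlap forces A + SizeA <= B or B + SizeB <= A.

```cpp
#include <cassert>
#include <cstdint>

// Can `A + OffA == B + OffB` be folded to false, knowing only that A and B
// are distinct, non-overlapping objects of SizeA and SizeB bytes?
bool canFoldToNotEqual(std::int64_t OffA, std::int64_t SizeA,
                       std::int64_t OffB, std::int64_t SizeB) {
  // Equality needs A - B == OffB - OffA == -D.
  std::int64_t D = OffA - OffB;
  // Non-overlap means A + SizeA <= B (so D >= SizeA) or B + SizeB <= A
  // (so D <= -SizeB). Strictly between those bounds, equality is impossible.
  return D > -SizeB && D < SizeA;
}
```

Take two 4-byte allocas. Comparing their end pointers (both offsets 4) folds to false. A+4 vs B+0 cannot be folded, because the allocations may be adjacent.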

Proofs: https://alive2.llvm.org/ce/z/Po2nL4

Differential Revision: https://reviews.llvm.org/D153752

Martin Braenne [Thu, 29 Jun 2023 06:39:39 +0000 (06:39 +0000)]
[clang][dataflow] Don't crash when creating pointers to members.

The newly added tests crash without the other changes in this patch.

Reviewed By: sammccall, xazax.hun, gribozavr2

Differential Revision: https://reviews.llvm.org/D153960

Nikita Popov [Fri, 23 Jun 2023 10:50:27 +0000 (12:50 +0200)]
[SCEV] Make use of non-null pointers for range calculation

We know that certain pointers (e.g. non-extern-weak globals or
allocas in default address space) are not null, in which case the
lowest address they can be allocated at is their alignment.

This allows us to calculate better exit counts for loops that have
an additional null check in the guarding condition
(see alloca_icmp_null_exit_count).

Differential Revision: https://reviews.llvm.org/D153624

Tobias Gysi [Thu, 29 Jun 2023 06:31:03 +0000 (06:31 +0000)]
[mlir][llvm] Add debug label intrinsic

This revision adds support for the llvm.dbg.label intrinsic
and the corresponding DILabel metadata.

Reviewed By: Dinistro

Differential Revision: https://reviews.llvm.org/D153975

Jianjian GUAN [Thu, 29 Jun 2023 03:15:14 +0000 (11:15 +0800)]
[RISCV] Update computeKnownBitsForTargetNode for FPCLASS.

The fclass instruction sets exactly one of the low 10 bits.
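To illustrate why the upper bits are known to be zero, here is a software model of fclass.d; it is a sketch, and the function name is hypothetical. Bits 0 to 9 encode, in order: -inf, negative normal, negative subnormal, -0, +0, positive subnormal, positive normal, +inf, signaling NaN and quiet NaN.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Software model of RISC-V fclass.d: returns a mask with exactly one of
// bits 0..9 set, so bits 10 and above are always zero.
std::uint64_t fclass(double x) {
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  const bool neg = std::signbit(x);
  switch (std::fpclassify(x)) {
    case FP_INFINITE:  return neg ? 1u << 0 : 1u << 7;
    case FP_NORMAL:    return neg ? 1u << 1 : 1u << 6;
    case FP_SUBNORMAL: return neg ? 1u << 2 : 1u << 5;
    case FP_ZERO:      return neg ? 1u << 3 : 1u << 4;
    default:  // FP_NAN: the top significand bit (bit 51) marks a quiet NaN
      return ((bits >> 51) & 1) ? 1u << 9 : 1u << 8;
  }
}
```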

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154040

Mikael Holmen [Thu, 29 Jun 2023 05:51:15 +0000 (07:51 +0200)]
[StructuralHash] Ignore global variable declarations

Ignore declarations of global variables, just as we do with declarations
of functions.

Done as a follow up to the comments in https://reviews.llvm.org/D149209

Differential Revision: https://reviews.llvm.org/D153855

# Conflicts:
# llvm/lib/IR/StructuralHash.cpp

Han Shen [Thu, 29 Jun 2023 05:18:53 +0000 (22:18 -0700)]
[Analysis] Refactor MBB hotness/coldness into templated PSI functions.

Currently, to use PSI->isFunctionHotInCallGraph, we first need to
calculate BPI->BFI, which is expensive. Instead, we can implement this
directly with MBFI. Also, as @wenlei mentioned in another patch review,
MachineSizeOpts already implements isFunctionColdInCallGraph,
isFunctionHotInCallGraphNthPercentile, etc. These can be
refactored so that they can be reused across the MachineFunctionSplitting
and MachineSizeOpts passes.

This CL does this: it moves those internal static functions
into PSI as templated functions, so they can be accessed easily.

Differential Revision: https://reviews.llvm.org/D153927

Freddy Ye [Thu, 29 Jun 2023 05:29:00 +0000 (13:29 +0800)]
[NFC] Add missing cpu tests in predefined-arch-macros.c

Added tests for penryn, nehalem, westmere, sandybridge, ivybridge,
haswell, bonnell, silvermont.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D153714

Martin Braenne [Wed, 28 Jun 2023 08:38:00 +0000 (08:38 +0000)]
[clang][dataflow] Make `getThisPointeeStorageLocation()` return an `AggregateStorageLocation`.

This avoids the need for casts at callsites.

Depends On D153852

Reviewed By: sammccall, xazax.hun, gribozavr2

Differential Revision: https://reviews.llvm.org/D153854

Martin Braenne [Wed, 28 Jun 2023 08:36:06 +0000 (08:36 +0000)]
[clang][dataflow] Initialize fields of anonymous records correctly.

Previously, the newly added test would crash.

Depends On D153851

Reviewed By: gribozavr2

Differential Revision: https://reviews.llvm.org/D153852

luxufan [Wed, 28 Jun 2023 15:01:09 +0000 (23:01 +0800)]
[ValueTracking] Guaranteed well-defined if parameter has a dereferenceable_or_null attribute

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D153945

zhanglimin [Thu, 29 Jun 2023 03:40:22 +0000 (11:40 +0800)]
[sanitizer][msan] The LLVM part of the LoongArch memory sanitizer implementation

This patch enables msan in LLVM and fixes all failing tests in
check-msan.

It does not add a VarArgHelper implementation for LoongArch; that
will be done separately. It also adds a test for VarArgNoOpHelper,
based on the X86 one.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D152692

Manna, Soumi [Thu, 29 Jun 2023 02:51:15 +0000 (19:51 -0700)]
[analyzer] Refactor code in findMethodDecl()

In findMethodDecl(clang::ObjCMessageExpr const *, clang::ObjCObjectPointerType const *, clang::ASTContext &), the pointer `ReceiverObjectPtrType` is never dereferenced if MessageExpr->getReceiverKind() is not Instance or Class, nor if ReceiverType is ObjCIdType or ObjCClassType. The pointer is therefore only used in one branch, and its declaration belongs there.

This patch directly uses ReceiverType->castAs<ObjCObjectPointerType>() instead of ReceiverObjectPtrType when calling canAssignObjCInterfaces(), to express the intent more clearly.

Reviewed By: erichkeane, steakhal

Differential Revision: https://reviews.llvm.org/D152194

zhanglimin [Thu, 29 Jun 2023 01:46:02 +0000 (09:46 +0800)]
[MSan] Enable MSAN for loongarch64

This patch adds basic memory sanitizer support for loongarch64
with a 47-bit VMA, using a memory layout based on that of x86_64.

The LLVM part of the LoongArch memory sanitizer implementation will
be done separately and will fix the failing tests in check-msan.
These tests all fail with the same error: "error in
backend: unsupported architecture".

Reviewed By: #sanitizers, vitalybuka, MaskRay

Differential Revision: https://reviews.llvm.org/D140528

Wang, Xin10 [Thu, 29 Jun 2023 03:23:06 +0000 (23:23 -0400)]
[NFC] Fix possible nullptr dereference in ComplexDeinterleavingPass.cpp

Add an assert to silence an issue reported by a static analyzer.

Reviewed By: igor.kirillov

Differential Revision: https://reviews.llvm.org/D153942

Weining Lu [Wed, 28 Jun 2023 00:25:11 +0000 (08:25 +0800)]
[LoongArch] Emit R_LARCH_64_PCREL relocation for FK_Data_8 when IsPCRel is true

Reviewed By: xen0n, MaskRay, hev

Differential Revision: https://reviews.llvm.org/D153872

4vtomat [Tue, 27 Jun 2023 07:06:40 +0000 (00:06 -0700)]
[RISCV] Bump vector crypto to v1.0.0-rc1

Differential Revision: https://reviews.llvm.org/D153836

Manna, Soumi [Thu, 29 Jun 2023 02:25:46 +0000 (19:25 -0700)]
[CLANG] Fix potential integer overflow value in getRVVTypeSize()

In getRVVTypeSize(clang::ASTContext &, clang::BuiltinType const *), the expression VScale->first * MinElts has type unsigned int (32 bits). It is therefore evaluated using 32-bit arithmetic and may overflow before being used in a context that expects a uint64_t (64 bits).

To avoid the overflow, this patch changes the types of the variables MinElts and EltSize from unsigned to uint64_t instead of adding a cast.
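Below is a minimal demonstration of the overflow and of the fix, assuming a 32-bit `unsigned`. The function names and values are hypothetical and chosen only to trigger wraparound.

```cpp
#include <cassert>
#include <cstdint>

// Multiply in unsigned (32-bit here), then widen: the product wraps
// modulo 2^32 before the conversion to uint64_t.
std::uint64_t sizeNarrow(unsigned vscale, unsigned minElts) {
  return vscale * minElts;
}

// Operands already uint64_t (the approach described above): the
// multiplication itself happens in 64 bits.
std::uint64_t sizeWide(std::uint64_t vscale, std::uint64_t minElts) {
  return vscale * minElts;
}
```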

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D153146

Hideto Ueno [Thu, 29 Jun 2023 01:56:48 +0000 (18:56 -0700)]
[mlir][IR] clang-format OperationSupport.cpp, NFC

Follow-up to D154015

Hideto Ueno [Thu, 29 Jun 2023 01:47:21 +0000 (18:47 -0700)]
[mlir][IR] Combine location hash if required in OperationEquivalence::computeHash

This fixes a bug where `OperationEquivalence::computeHash` doesn't
combine the hash of operation locations even when `IgnoreLocations` is false.
Added a unit test which fails at the current trunk.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D154015

Kai Sasaki [Thu, 29 Jun 2023 01:04:35 +0000 (10:04 +0900)]
[mlir][memref] Make result normalization aware of the number of symbols

Memref normalization fails to account for the symbols used in the memref type itself through its strided layout and offset information. This causes a crash with types like `memref<128x512xf32, strided<[?, ?], offset: ?>>`. The original issue is https://github.com/llvm/llvm-project/issues/61345

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D150250

Amir Ayupov [Thu, 29 Jun 2023 00:53:54 +0000 (17:53 -0700)]
[BOLT] Add -dump-cg option to dump call graph

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D153994

Amir Ayupov [Thu, 29 Jun 2023 00:52:35 +0000 (17:52 -0700)]
[BOLT][NFC] Add extra debug logging to buildCallGraph

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D153987

Amir Ayupov [Thu, 29 Jun 2023 00:50:39 +0000 (17:50 -0700)]
[BOLT][NFC] Print functions after attaching profile (-print-profile)

Add an extra point of dumping functions: immediately after attaching the profile information.
This dumping is enabled by the newly introduced `-print-profile` option and by `-print-all`.

The reason is that in `aggregate-only`/perf2bolt mode BOLT may not reach the point of
printing the function after CFG is constructed (`-print-cfg`), while we may still want to inspect
the attached profile, especially for diff'ing purposes.

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D153996

14 months ago[libc][NFC] Set rounding mode for sincosf exhaustive test.
Tue Ly [Thu, 29 Jun 2023 00:30:54 +0000 (20:30 -0400)]
[libc][NFC] Set rounding mode for sincosf exhaustive test.

14 months ago[scudo] Use fast get time in secondary.
Christopher Ferris [Wed, 28 Jun 2023 21:24:38 +0000 (14:24 -0700)]
[scudo] Use fast get time in secondary.

When I moved the primary to use the faster get time syscall, I missed
the secondary use. Now fix the secondary to use this function too.

Reviewed By: Chia-hungDuan

Differential Revision: https://reviews.llvm.org/D154012

14 months ago[ConstraintElim] Add tests with phis and different alloc sizes/end ptrs.
Florian Hahn [Wed, 28 Jun 2023 22:12:05 +0000 (23:12 +0100)]
[ConstraintElim] Add tests with phis and different alloc sizes/end ptrs.

Extra tests for D152730

14 months ago[libc++][hardening][NFC] Introduce `_LIBCPP_ASSERT_UNCATEGORIZED`.
varconst [Tue, 27 Jun 2023 23:40:39 +0000 (16:40 -0700)]
[libc++][hardening][NFC] Introduce `_LIBCPP_ASSERT_UNCATEGORIZED`.

Replace most uses of `_LIBCPP_ASSERT` with
`_LIBCPP_ASSERT_UNCATEGORIZED`.

This is done as a prerequisite to introducing hardened mode to libc++.
The idea is to make enabling assertions an opt-in with (somewhat)
fine-grained controls over which categories of assertions are enabled.
The vast majority of assertions are currently uncategorized; the new
macro will allow turning on `_LIBCPP_ASSERT` (the underlying mechanism
for all kinds of assertions) without enabling all the uncategorized
assertions (in the future; this patch preserves the current behavior).
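Below is a rough sketch of the layering described above. It uses hypothetical `DEMO_*` macros, not libc++'s actual configuration knobs. There is one underlying assertion mechanism, plus a category wrapper that forwards to it only when that category is enabled. The sketch models the future opt-in behavior; the patch itself keeps uncategorized assertions behaving as before.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical configuration knobs; these are NOT libc++'s real macros.
#ifndef DEMO_ENABLE_ASSERTIONS
#define DEMO_ENABLE_ASSERTIONS 1
#endif
#ifndef DEMO_ENABLE_UNCATEGORIZED
#define DEMO_ENABLE_UNCATEGORIZED 0  // opt-in, as envisioned for the future
#endif

// Counts failures instead of aborting so the behavior is observable.
inline int demo_failures = 0;
inline void demo_assert_fail(const char* msg) {
  ++demo_failures;
  std::fprintf(stderr, "assertion failed: %s\n", msg);
}

// Underlying mechanism shared by every category (the role _LIBCPP_ASSERT plays).
#if DEMO_ENABLE_ASSERTIONS
#define DEMO_ASSERT(cond, msg) ((cond) ? (void)0 : demo_assert_fail(msg))
#else
#define DEMO_ASSERT(cond, msg) ((void)0)
#endif

// A category wrapper: forwards to the mechanism only if its category is on.
#if DEMO_ENABLE_ASSERTIONS && DEMO_ENABLE_UNCATEGORIZED
#define DEMO_ASSERT_UNCATEGORIZED(cond, msg) DEMO_ASSERT(cond, msg)
#else
#define DEMO_ASSERT_UNCATEGORIZED(cond, msg) ((void)0)
#endif
```

With these defaults, a failing `DEMO_ASSERT` is reported, while a failing `DEMO_ASSERT_UNCATEGORIZED` compiles to nothing.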

Differential Revision: https://reviews.llvm.org/D153816

14 months ago[RISCV] Add test cases for vmv.v.vs which could be combined
Luke Lau [Tue, 20 Jun 2023 14:23:00 +0000 (15:23 +0100)]
[RISCV] Add test cases for vmv.v.vs which could be combined

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153350

14 months ago[RISCV] Add test cases for insert subvector shuffles for fixed vectors
Luke Lau [Tue, 13 Jun 2023 13:51:49 +0000 (13:51 +0000)]
[RISCV] Add test cases for insert subvector shuffles for fixed vectors

These cases could have the vmv.v.v folded into the VL of the previous
instruction.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153030

14 months ago[DAGCombine] Fold (store (insert_elt (load p)) x p) -> (store x)
Luke Lau [Tue, 6 Jun 2023 11:16:13 +0000 (13:16 +0200)]
[DAGCombine] Fold (store (insert_elt (load p)) x p) -> (store x)

If we store a value that was loaded from the same address, with no other
uses in between, the store is considered dead and is removed. So when
legalizing a fixed-length vector store of an insert, we sometimes end up
producing better code through scalarization than without it.
For example:

  %a = load <4 x i64>, ptr %x
  %b = insertelement <4 x i64> %a, i64 %y, i32 2
  store <4 x i64> %b, ptr %x

If this is scalarized, then DAGCombine successfully removes 3 of the 4
stores which are considered dead, and on RISC-V we get:

  sd a1, 16(a0)

However if we make the vector type legal (-mattr=+v), then we lose the
optimisation because we don't scalarize it.

This patch attempts to recover the optimisation for vectors by
identifying patterns where we store a load with a single insert
in between, replacing it with a scalar store of the inserted element.
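The matching condition can be sketched on a toy node description. `ToyVectorStore` and `foldToScalarStoreOffset` are hypothetical and much simpler than the real SelectionDAG combine. The point is that the scalar store lands at `index * element size` bytes from `p`. For the example above, index 2 of a `<4 x i64>` gives 2 * 8 = 16, which is the `16(a0)` offset in the RISC-V output.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical, heavily simplified node description (not SelectionDAG).
struct ToyVectorStore {
  std::string storePtr;    // pointer operand of the store
  std::string loadPtr;     // pointer the stored vector was loaded from
  bool loadHasOneUse;      // the load feeds only the insert_elt
  bool noUsesInBetween;    // nothing else touches memory between load and store
  std::int64_t insertIdx;  // constant insert_elt index
  unsigned numElts;        // vector element count
  unsigned eltBytes;       // element size in bytes
};

// If (store (insert_elt (load p), x, idx), p) matches, return the byte
// offset from p at which a single scalar store of x is sufficient.
inline std::optional<std::uint64_t> foldToScalarStoreOffset(const ToyVectorStore& s) {
  if (s.storePtr != s.loadPtr || !s.loadHasOneUse || !s.noUsesInBetween)
    return std::nullopt;
  if (s.insertIdx < 0 || s.insertIdx >= static_cast<std::int64_t>(s.numElts))
    return std::nullopt;  // out-of-range or otherwise unusable index
  return static_cast<std::uint64_t>(s.insertIdx) * s.eltBytes;
}
```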

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D152276

14 months ago[RISCV] Add fixed vector insert tests that are pass by value
Luke Lau [Wed, 28 Jun 2023 11:57:32 +0000 (11:57 +0000)]
[RISCV] Add fixed vector insert tests that are pass by value

These tests pass the vectors by value so that insert_vector_elt lowering
can still be tested once D152276 folds the load/insert/store pattern.

Reviewed By: frasercrmck, craig.topper

Differential Revision: https://reviews.llvm.org/D153964

14 months ago[gn build] Port 75a1797044fc
LLVM GN Syncbot [Wed, 28 Jun 2023 21:38:12 +0000 (21:38 +0000)]
[gn build] Port 75a1797044fc

14 months agoReland [llvm] Preliminary fat-lto-objects support
Paul Kirth [Wed, 28 Jun 2023 15:33:24 +0000 (15:33 +0000)]
Reland [llvm] Preliminary fat-lto-objects support

Fat LTO objects contain both LTO compatible IR, as well as generated
object code. This allows users to defer the choice of whether to use LTO
or not to link-time. This is a feature available in GCC for some time,
and makes the existing -ffat-lto-objects flag functional in the same
way as GCC's.

Within LLVM, we add a new EmbedBitcodePass that serializes the module to
the object file, and expose a new pass pipeline for compiling fat
objects. The new pipeline initially clones the module and runs the
selected (Thin)LTOPrelink pipeline, after which it will serialize the
module into a `.llvm.lto` section of an ELF file. When compiling for
(Thin)LTO, this is normally the point at which the compiler would emit an
object file containing the bitcode and metadata.

After that point we compile the original module using the
PerModuleDefaultPipeline used for non-LTO compilation. We generate
standard object files at the end of this pipeline, which contain machine
code and the new `.llvm.lto` section containing bitcode.

Since the two pipelines operate on different copies of the module, we
can be sure that the bitcode in the `.llvm.lto` section and object code
in `.text` are congruent with the existing output produced by the
default and LTO pipelines.
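The two-pipeline flow can be modeled with toy types to show why the outputs stay congruent: each pipeline runs on its own copy of the module. Everything here (`ToyModule`, `ToyFatObject`, `compileFat`) is hypothetical and only labels the steps. It is not LLVM's pass manager API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model of the fat-object flow.
struct ToyModule {
  std::vector<std::string> history;  // pipelines applied to this copy
};

struct ToyFatObject {
  std::string text;     // stands in for machine code in .text
  std::string llvmLto;  // stands in for bitcode in .llvm.lto
};

inline std::string serialize(const ToyModule& m) {
  std::string out;
  for (const auto& p : m.history) out += p + ";";
  return out;
}

inline ToyFatObject compileFat(const ToyModule& original, bool thin) {
  // 1. Clone the module and run the (Thin)LTO pre-link pipeline on the copy only.
  ToyModule ltoCopy = original;
  ltoCopy.history.push_back(thin ? "ThinLTOPreLink" : "LTOPreLink");
  // 2. Embed the pre-link result (the role EmbedBitcodePass plays).
  ToyFatObject obj;
  obj.llvmLto = serialize(ltoCopy);
  // 3. Compile the untouched original with the default non-LTO pipeline.
  ToyModule codegen = original;
  codegen.history.push_back("PerModuleDefault");
  obj.text = serialize(codegen);
  return obj;
}
```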

Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977

Earlier versions of this patch were missing REQUIRES lines for llc
related tests in Transforms/EmbedBitcode. Those tests are now under
CodeGen/X86, which should avoid running the check on unsupported
platforms.

The EmbedBitcodePass also returned PreservedAnalyses::all when adding a
metadata section, which failed expensive checks since it modified the
module. This is now corrected.

Reviewed By: tejohnson, MaskRay, nikic

Differential Revision: https://reviews.llvm.org/D146776

14 months ago[mlir][sparse] admit un-sparsifiable operations if all its operands are loaded from...
Peiming Liu [Wed, 28 Jun 2023 19:56:38 +0000 (19:56 +0000)]
[mlir][sparse] admit un-sparsifiable operations if all its operands are loaded from dense input

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D153998

14 months ago[Object] Add ELF section type SHT_LLVM_BITCODE for LLVM bitcode
Fangrui Song [Wed, 28 Jun 2023 21:01:08 +0000 (14:01 -0700)]
[Object] Add ELF section type SHT_LLVM_BITCODE for LLVM bitcode

clang -ffat-lto-objects can use this new ELF section type for the .llvm.lto
section for fat LTO support (D146776).

Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D153215

14 months agoHIP: Directly call floor builtins
Matt Arsenault [Sun, 20 Nov 2022 16:50:13 +0000 (08:50 -0800)]
HIP: Directly call floor builtins

14 months ago[SLP]Fix emission of buildvectors with full match.
Alexey Bataev [Wed, 28 Jun 2023 19:53:39 +0000 (12:53 -0700)]
[SLP]Fix emission of buildvectors with full match.

If the buildvector node is a full match of another node, we need to
correctly build the mask for the original vector value and build a common
mask for the emitted node.

14 months ago[NFC][Sample PGO] Avoid non-const accessor for CallsiteSamples
Wenlei He [Wed, 28 Jun 2023 18:54:39 +0000 (11:54 -0700)]
[NFC][Sample PGO] Avoid non-const accessor for CallsiteSamples

Exposing a non-const accessor just to clear CallsiteSamples during flattening is a bit of an overkill. Replace the non-const accessor with removeAllCallsiteSamples.
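Here is a minimal sketch of the accessor pattern. A hypothetical `ToyFunctionSamples` class stands in for the real sample profile type. It offers a const getter for reads and one narrow mutator for the only mutation that flattening needs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical simplified profile record; not llvm::sampleprof::FunctionSamples.
class ToyFunctionSamples {
 public:
  using CallsiteMap = std::map<unsigned, std::string>;  // line offset -> callee

  // Read-only access: callers can inspect the callsite samples but not edit them.
  const CallsiteMap& getCallsiteSamples() const { return callsites_; }

  // The single mutation flattening needs, exposed as a narrow operation
  // instead of handing out a non-const reference to the whole map.
  void removeAllCallsiteSamples() { callsites_.clear(); }

  void addCallsite(unsigned offset, std::string callee) {
    callsites_[offset] = std::move(callee);
  }

 private:
  CallsiteMap callsites_;
};
```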

Differential Revision: https://reviews.llvm.org/D153995

14 months ago[clang] Fix checking the equality comparator of base classes in __is_trivially_equali...
Nikolas Klauser [Wed, 28 Jun 2023 20:33:40 +0000 (13:33 -0700)]
[clang] Fix checking the equality comparator of base classes in __is_trivially_equality_comparable

Fixes #63192

Reviewed By: cor3ntin

Spies: cfe-commits

Differential Revision: https://reviews.llvm.org/D153890

14 months ago[flang][openmp] Fortran offloading test
Ethan Luis McDonough [Wed, 28 Jun 2023 20:02:37 +0000 (15:02 -0500)]
[flang][openmp] Fortran offloading test

Flang currently supports offloading for AMD GPUs. This patch establishes a test structure for Fortran offloading tests in libomptarget.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D148778

14 months ago[ConstraintElim] Add additional induction phi tests with end argument.
Florian Hahn [Wed, 28 Jun 2023 20:10:43 +0000 (21:10 +0100)]
[ConstraintElim] Add additional induction phi tests with end argument.

Extra tests for D152730 with different GEP step sizes and the end pointer
being an argument.

14 months ago[SLP] Use vector types for cmp alt instructions costs
David Green [Wed, 28 Jun 2023 20:02:29 +0000 (21:02 +0100)]
[SLP] Use vector types for cmp alt instructions costs

Similar to the other code that costs main/alt instructions, the cmp should be
using the VecTy for the costs, not the ScalarTy.

One of the tests looks like it gets worse only because it is not simplified to
0.

Differential Revision: https://reviews.llvm.org/D153507

14 months agoRevert "[Clang] Reset FP options before function instantiations"
Serge Pavlov [Wed, 28 Jun 2023 19:04:31 +0000 (02:04 +0700)]
Revert "[Clang] Reset FP options before function instantiations"

This reverts commit 98390ccb80569e8fbb20e6c996b4b8cff87fbec6.
It caused issue #63542.

14 months agoHIP: Use frexp builtins in math headers
Matt Arsenault [Tue, 6 Jun 2023 22:06:38 +0000 (18:06 -0400)]
HIP: Use frexp builtins in math headers

14 months agoLangRef: Fix sphinx build error
Matt Arsenault [Wed, 28 Jun 2023 19:04:08 +0000 (15:04 -0400)]
LangRef: Fix sphinx build error

14 months agoadding bf16 support to NVPTX
root [Fri, 23 Jun 2023 17:59:22 +0000 (10:59 -0700)]
adding bf16 support to NVPTX

Currently, bf16 support has been added to the PTX codegen piecemeal. This patch aims to complete the set of instructions and code paths required to support the bf16 data type.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D144911

Co-authored-by: Artem Belevich <tra@google.com>

14 months agoclang: Use new frexp intrinsic for builtins and add f16 version
Matt Arsenault [Tue, 2 May 2023 13:07:47 +0000 (09:07 -0400)]
clang: Use new frexp intrinsic for builtins and add f16 version

14 months agoIR: Add llvm.frexp intrinsic
Matt Arsenault [Thu, 27 Apr 2023 01:57:10 +0000 (21:57 -0400)]
IR: Add llvm.frexp intrinsic

Add an intrinsic which returns the two pieces as multiple return
values. Alternatively, we could introduce a pair of intrinsics to
separately return the fractional and exponent parts.

AMDGPU has native instructions to return the two halves, but could use
some generic legalization and optimization handling. For example, we
should be able to handle legalization of f16 on older targets, and for
bf16. Additionally antique targets need a hardware workaround which
would be better handled in the backend rather than in library code
where it is now.
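The C++ standard library's `std::frexp` already computes both pieces, which helps build intuition. It returns the fraction and writes the exponent through an out-parameter. The sketch below wraps it so both halves come back together, as the intrinsic does. `frexpPair` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Returns {fraction, exponent} with x == fraction * 2^exponent.
// For finite nonzero x, 0.5 <= |fraction| < 1; for zero, both parts are 0.
inline std::pair<double, int> frexpPair(double x) {
  int exp = 0;
  double frac = std::frexp(x, &exp);
  return {frac, exp};
}
```

For example, `frexpPair(8.0)` yields `{0.5, 4}` because 8 = 0.5 * 2^4.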

14 months ago[LLDB] Fix buffer overflow problem in DWARFExpression::Evaluate.
Caroline Tice [Tue, 27 Jun 2023 07:18:33 +0000 (00:18 -0700)]
[LLDB] Fix buffer overflow problem in DWARFExpression::Evaluate.

In two calls to ReadMemory in DWARFExpression.cpp, the buffer size
passed to ReadMemory is not actually the size of the buffer (I suspect
a copy/paste error where the variable name was not properly
updated). This caused a buffer overflow bug, which we found through
Address Sanitizer. This patch fixes the problem by passing the
correct buffer size to the calls to ReadMemory (and to the
DataExtractor).
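The bug class is easy to reproduce in miniature. In this sketch, `readMemory` is a hypothetical reader (not LLDB's `ReadMemory` signature) that trusts the size it is given. The caller must therefore pass the size of the destination buffer it actually owns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical reader: copies up to `size` bytes into `dst`, trusting the
// caller that `dst` really has room for `size` bytes.
inline std::size_t readMemory(const std::uint8_t* src, std::size_t srcLen,
                              void* dst, std::size_t size) {
  std::size_t n = size < srcLen ? size : srcLen;
  std::memcpy(dst, src, n);
  return n;
}

inline std::uint32_t readU32(const std::uint8_t* mem, std::size_t memLen) {
  std::uint8_t buf[sizeof(std::uint32_t)];
  // Pass sizeof(buf), the size of *this* buffer. Passing the size of some
  // other, larger variable is the copy/paste mistake that overflows buf.
  std::size_t got = readMemory(mem, memLen, buf, sizeof(buf));
  assert(got == sizeof(buf));
  std::uint32_t v;
  std::memcpy(&v, buf, sizeof(v));
  return v;
}
```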

Differential Revision: https://reviews.llvm.org/D153840

14 months ago[libc++] Add missing _LIBCPP_HIDE_FROM_ABI in uninitialized_buffer.h
Nikolas Klauser [Wed, 28 Jun 2023 18:22:11 +0000 (11:22 -0700)]
[libc++] Add missing _LIBCPP_HIDE_FROM_ABI in uninitialized_buffer.h

14 months ago[libc][math] Implement erff function correctly rounded to all rounding modes.
Tue Ly [Sat, 24 Jun 2023 04:08:31 +0000 (00:08 -0400)]
[libc][math] Implement erff function correctly rounded to all rounding modes.

Implement correctly rounded `erff` functions.

For `x >= 4`, `erff(x) = 1` for `FE_TONEAREST` or `FE_UPWARD`, and `0x1.fffffep-1` (the largest float below 1) for `FE_DOWNWARD` or `FE_TOWARDZERO`.

For `0 <= x < 4`, we divide into 32 sub-intervals of length `1/8`, and use a degree-15 odd polynomial to approximate `erff(x)` in each sub-interval:
```
  erff(x) ~ x * (c0 + c1 * x^2 + c2 * x^4 + ... + c7 * x^14).
```

For `x < 0`, we can use the same formula as above, with the sub-interval chosen from `|x|`: the approximation has the form `x * P(x^2)`, so it is odd, matching `erff(-x) = -erff(x)`.
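The structure of the approximation can be sketched in C++. First pick the sub-interval from `|x|`, then evaluate the odd polynomial by Horner's rule in `x^2`. The coefficients here are an assumption for illustration only: they come from the Maclaurin series of erf, which is accurate only near 0. The actual implementation uses its own fitted coefficients for each of the 32 sub-intervals.

```cpp
#include <cassert>
#include <cmath>

// Sub-interval selection: [0, 4) is split into 32 pieces of width 1/8.
inline int erffSubinterval(float ax) {
  assert(ax >= 0.0f && ax < 4.0f);
  return static_cast<int>(ax * 8.0f);  // 0 .. 31
}

// Evaluates x * (c0 + c1*x^2 + ... + c7*x^14) by Horner's rule in x^2.
// ASSUMPTION: c_n are the Maclaurin coefficients of erf,
//   c_n = 2/sqrt(pi) * (-1)^n / (n! * (2n + 1)),
// which are only accurate near 0 (sub-interval 0). The libc implementation
// uses its own fitted coefficients per sub-interval, not shown here.
inline double erfOddPoly(double x) {
  const double twoOverSqrtPi = 2.0 / std::sqrt(std::acos(-1.0));
  double c[8];
  double factorial = 1.0;
  for (int n = 0; n < 8; ++n) {
    if (n > 0) factorial *= n;
    const double sign = (n % 2 == 0) ? 1.0 : -1.0;
    c[n] = sign * twoOverSqrtPi / (factorial * (2 * n + 1));
  }
  const double x2 = x * x;
  double p = c[7];
  for (int n = 6; n >= 0; --n) p = p * x2 + c[n];
  return x * p;  // odd in x: p depends only on x^2
}
```

Because `p` depends only on `x * x`, `erfOddPoly(-x) == -erfOddPoly(x)` holds exactly. This is why negative inputs need no separate table.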

Performance tested with `perf.sh` tool from the CORE-MATH project on AMD Ryzen 9 5900X:

Reciprocal throughput (clock cycles / op)
```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput --  with -march=native      (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 11.790 + 0.182 clc/call; Median-Min = 0.154 clc/call; Max = 12.255 clc/call;
-- CORE-MATH reciprocal throughput --  with -march=x86-64-v2      (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 14.205 + 0.151 clc/call; Median-Min = 0.159 clc/call; Max = 15.893 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 45.519 + 0.445 clc/call; Median-Min = 0.552 clc/call; Max = 46.345 clc/call;

-- LIBC reciprocal throughput --  with -mavx2 -mfma     (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 9.595 + 0.214 clc/call; Median-Min = 0.220 clc/call; Max = 9.887 clc/call;
-- LIBC reciprocal throughput --  with -msse4.2     (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 10.223 + 0.190 clc/call; Median-Min = 0.222 clc/call; Max = 10.474 clc/call;
```

and latency (clock cycles / op):
```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency --  with -march=native      (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 38.566 + 0.391 clc/call; Median-Min = 0.503 clc/call; Max = 39.170 clc/call;
-- CORE-MATH latency --  with -march=x86-64-v2      (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 43.223 + 0.667 clc/call; Median-Min = 0.680 clc/call; Max = 43.913 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 111.613 + 1.267 clc/call; Median-Min = 1.696 clc/call; Max = 113.444 clc/call;

-- LIBC latency --  with -mavx2 -mfma     (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 40.138 + 0.410 clc/call; Median-Min = 0.536 clc/call; Max = 40.729 clc/call;
-- LIBC latency --  with -msse4.2     (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 44.858 + 0.872 clc/call; Median-Min = 0.814 clc/call; Max = 46.019 clc/call;
```

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D153683

14 months ago[Symbolizer] Ignore unknown additional symbolizer markup fields
Daniel Thornburgh [Fri, 23 Jun 2023 22:24:48 +0000 (15:24 -0700)]
[Symbolizer] Ignore unknown additional symbolizer markup fields

The symbolizer markup syntax is structured such that fields require only
previous fields for their interpretation; this was originally intended
to make adding new fields a natural extension mechanism for existing
elements. This patch codifies that in the spec and makes the behavior of
llvm-symbolizer match: extra fields now produce a warning but are ignored,
rather than causing the whole element to be ignored.
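Here is a sketch of the new rule. It assumes a hypothetical parser (not llvm-symbolizer's implementation) and an element that expects a fixed number of fields. Missing fields still reject the element, while extra trailing fields only trigger a warning.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

// Splits "{{{tag:f1:f2:...}}}" into {tag, f1, f2, ...}. Hypothetical helper.
inline std::optional<std::vector<std::string>> splitElement(const std::string& s) {
  const std::string open = "{{{", close = "}}}";
  if (s.size() < open.size() + close.size() ||
      s.compare(0, open.size(), open) != 0 ||
      s.compare(s.size() - close.size(), close.size(), close) != 0)
    return std::nullopt;
  const std::string body = s.substr(open.size(), s.size() - open.size() - close.size());
  std::vector<std::string> parts;
  std::size_t start = 0;
  while (true) {
    const std::size_t colon = body.find(':', start);
    parts.push_back(body.substr(start, colon == std::string::npos ? std::string::npos
                                                                  : colon - start));
    if (colon == std::string::npos) break;
    start = colon + 1;
  }
  return parts;
}

// Returns exactly `expected` fields after the tag. Missing fields are still an
// error; extra trailing fields only produce a warning and are dropped.
inline std::optional<std::vector<std::string>> parseFields(const std::string& s,
                                                           std::size_t expected) {
  auto parts = splitElement(s);
  if (!parts || parts->size() < 1 + expected) return std::nullopt;
  if (parts->size() > 1 + expected)
    std::cerr << "warning: ignoring " << parts->size() - 1 - expected
              << " extra field(s) in " << s << '\n';
  auto first = parts->begin() + 1;
  return std::vector<std::string>(first, first + static_cast<std::ptrdiff_t>(expected));
}
```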

Reviewed By: mcgrathr

Differential Revision: https://reviews.llvm.org/D153821

14 months ago[MachineInst] Bump NumOperands back up to 24bits
Jon Roelofs [Mon, 26 Jun 2023 17:31:57 +0000 (10:31 -0700)]
[MachineInst] Bump NumOperands back up to 24bits

In https://reviews.llvm.org/D149445, it was lowered from 32 to 16 bits, which
broke an internal project of ours. The relevant code being compiled is a fairly
large nested switch that results in a PHI node with 65k+ operands, which can't
easily be turned into a table for performance reasons.

This change unifies `NumOperands`, `Flags`, and `AsmPrinterFlags` into a packed
7-byte struct, which `CapOperands` can follow as the 8th byte, rounding it up
to a nice alignment before the `Info` field.
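The following sketch illustrates the packing. It stores the three fields in the low 7 bytes of a 64-bit word and leaves the top byte free for a one-byte capacity. Only the 24-bit `NumOperands` width and the 7-byte total come from the message above. The 24/8 split between `Flags` and `AsmPrinterFlags` is an assumption, and the real `MachineInstr` may lay the fields out differently.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMED widths: only NumOperands = 24 bits and the 7-byte total are given.
constexpr unsigned kNumOperandsBits = 24;
constexpr unsigned kFlagsBits = 24;
constexpr unsigned kAsmPrinterFlagsBits = 8;
static_assert(kNumOperandsBits + kFlagsBits + kAsmPrinterFlagsBits == 56,
              "three fields fit in 7 bytes");

constexpr std::uint64_t mask(unsigned bits) { return (std::uint64_t{1} << bits) - 1; }

// Packs into the low 7 bytes; the top byte stays zero (room for a capacity).
constexpr std::uint64_t pack(std::uint32_t numOps, std::uint32_t flags,
                             std::uint8_t asmFlags) {
  return (std::uint64_t{numOps} & mask(kNumOperandsBits)) |
         ((std::uint64_t{flags} & mask(kFlagsBits)) << kNumOperandsBits) |
         (std::uint64_t{asmFlags} << (kNumOperandsBits + kFlagsBits));
}

constexpr std::uint32_t numOperandsOf(std::uint64_t w) {
  return static_cast<std::uint32_t>(w & mask(kNumOperandsBits));
}
constexpr std::uint32_t flagsOf(std::uint64_t w) {
  return static_cast<std::uint32_t>((w >> kNumOperandsBits) & mask(kFlagsBits));
}
constexpr std::uint8_t asmPrinterFlagsOf(std::uint64_t w) {
  return static_cast<std::uint8_t>(w >> (kNumOperandsBits + kFlagsBits));
}
```

With 24 bits, the limit is 16,777,215 operands, comfortably above the 65k+ PHI that overflowed the 16-bit field (maximum 65,535).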

rdar://111217742&109362033

Differential revision: https://reviews.llvm.org/D153791

14 months agoAMDGPU: Move AMDGPUAttributor run earlier
Matt Arsenault [Thu, 8 Jun 2023 16:42:59 +0000 (12:42 -0400)]
AMDGPU: Move AMDGPUAttributor run earlier

Move it up with other module passes. It's a higher level optimization
that should probably be done before hacking up the IR for codegen. It
should really be done earlier than this. We could possibly move this
with other IPO passes, but we'd have to stop inferring the lack of
lds.kernel.id calls and have the LDS module pass mark functions which
don't need the ID.

The one test change is because that pass is relying on the backend run
of SROA (which we ideally wouldn't have).

14 months ago[docs][RISCV] Remove duplicate entries for zvfbfmin and zvfbfwma
Philip Reames [Wed, 28 Jun 2023 16:38:40 +0000 (09:38 -0700)]
[docs][RISCV] Remove duplicate entries for zvfbfmin and zvfbfwma

14 months ago[instrprof] Add an overload to accept raw_string_ostream.
Snehasish Kumar [Tue, 27 Jun 2023 18:26:57 +0000 (18:26 +0000)]
[instrprof] Add an overload to accept raw_string_ostream.

Add an overload for InstrProfWriter::write so that users can emit the
buffer to a string. Also use this new overload for existing unit test
use cases.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D153904

14 months ago[SLP][AArch64] Extend extracts-from-scalarizable-vector.ll test for cmp cost testing...
David Green [Wed, 28 Jun 2023 16:16:34 +0000 (17:16 +0100)]
[SLP][AArch64] Extend extracts-from-scalarizable-vector.ll test for cmp cost testing. NFC

See D153507. The existing test is over-simplified; as written, it should have
been simplified prior to SLP vectorization. I have left it as-is to ensure the
crash it was protecting against doesn't arise again. A new test with valid
inputs is also added to show the incorrect costs of alt cmp vectorization.

14 months ago[InstSimplify] Fix a scalable-vector crash
Fraser Cormack [Thu, 22 Jun 2023 15:49:39 +0000 (16:49 +0100)]
[InstSimplify] Fix a scalable-vector crash

D143505 fixed/simplified folding of operations with SNaN operands. In
doing so it introduced a crash when handling scalable vector types,
wherein the scalable-vector ConstantVector was cast to a ConstantFP.

Since we know by that point in the code that if we've found a NaN, we're
dealing with a scalable-vector splat (as there are no other kinds of
scalable-vector constant for which that holds), we can grab the splatted
value and re-use the existing code, which will automatically splat the
new NaN back to a scalable vector for us.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D153566

14 months ago[flang][openacc] Resolve symbol in device, host and self clause
Valentin Clement [Wed, 28 Jun 2023 16:08:22 +0000 (09:08 -0700)]
[flang][openacc] Resolve symbol in device, host and self clause

Some symbols were not resolved in the device, host and self clauses,
resulting in an `Internal: no symbol found` error.

This patch adds symbol resolution for these clauses.

Reviewed By: razvanlupusoru

Differential Revision: https://reviews.llvm.org/D153919

14 months ago[flang][openacc] Relax clause rule on routine directive
Valentin Clement [Wed, 28 Jun 2023 16:07:04 +0000 (09:07 -0700)]
[flang][openacc] Relax clause rule on routine directive

Some compilers treat `acc routine` without a parallelism clause as
if seq is present. Relax the parser rule to allow acc routine
without a clause. The default clause will be handled in lowering.

Reviewed By: razvanlupusoru

Differential Revision: https://reviews.llvm.org/D153896

14 months ago[doc] Fix link typo
Paul Robinson [Wed, 28 Jun 2023 15:27:25 +0000 (08:27 -0700)]
[doc] Fix link typo