review.tizen.org Git - platform/upstream/llvm.git/log

thinlto_embed_bitcode.ll: clarify grep should treat input as text

The input to the test's use of grep should be treated as text, and
that's not the case on certain Linux distros. Added --text.

[llvm][clang][mlir] Add checks for the return values from Target::createXXX to prevent protential null deref

All these potential null pointer dereferences are reported by my static analyzer for null smart pointer dereferences, which has a different implementation from `alpha.cplusplus.SmartPtr`.

The checked pointers in this patch are initialized by Target::createXXX functions. When the creator function pointer is not correctly set, a null pointer will be returned, or the creator function may originally return a null pointer.

Some of them may not make sense as they may be checked before entering the function, but I fixed them all in this patch. I submit this fix because 1) similar checks are found in some other places in the LLVM codebase for the same return value of the function; and, 2) some of the pointers are dereferenced before they are checked, which may definitely trigger a null pointer dereference if the return value is nullptr.

Reviewed By: tejohnson, MaskRay, jpienaar

Differential Revision: https://reviews.llvm.org/D91410

[StaticAnalyzer] Support struct annotations in FuchsiaHandleChecker

Support adding handle annotations to sturucture that contains
handles. All the handles referenced by the structure (direct
value or ptr) would be treated as containing the
release/use/acquire annotations directly.

Patch by Yu Shan

Differential Revision: https://reviews.llvm.org/D91223

[InstCombine] Use is_contained (NFC)

Fix shared build.

[libunwind] Delete unused handlerNotFound in unwind_phase1

[BasicAA] Remove unnecessary sextOrSelf (NFC)

We are doing a sextOrTrunc directly afterwards, so this seems
useless. There is a multiplication in between, but truncating
before or after the multiplication should not make a difference.

[compiler-rt] [profile] Silence a warning about an unused function on mingw targets

This function is only used within the ifdef below.

Differential Revision: https://reviews.llvm.org/D91850

[BasicAA] Return DecomposedGEP (NFC)

Instead of requiring the caller to initialize the DecomposedGEP
structure and then passing it in by reference, make
DecomposeGEPExpression() responsible for initializing and returning
the structure.

[BasicAA] Remove some intermediate variables (NFC)

Use DecompGEP1.Offset instead of GEP1BaseOffset, etc. I found the
asymmetry of modifying DecompGEP1.VarIndices, but not modifying
DecompGEP1.Offset odd here.

[NFC] Fix typo in atomic

[flang][openmp] Separate memory-order-clause parser creating OmpClause node

This patch introduce the separate parser for the memory-order-clause from the general
OmpClauseList. This parser still creates OmpClause node and therefore can use all the feature
from TableGen and the OmpStructureChecker.
This is applied only for the Flush construct in this patch and it should be applied for
atomic as well.

This is the approach we disscussed several time during the weekly call.

Reviewed By: kiranchandramohan, sameeranjoshi

Differential Revision: https://reviews.llvm.org/D91839

[BasicAA] Remove stale FIXME (NFC)

If aliasGEP returns MayAlias, the code does fall through to
aliasPHI etc, so this FIXME is no longer applicable.

[X86] Include %rip for 32-bit RIP-relative relocs for x32

%rip was only included for 64-bit RIP-relative relocations, but needs to be included for 32-bit as well.

Reviewed By: MaskRay, RKSimon

Differential Revision: https://reviews.llvm.org/D91339

MachineDominators.h - remove unused <vector> include

DominanceFrontier - remove unused <vector> includes

[X86] Regenerate vector-reduce-or-cmp.ll

Fix AVX512 prefixes to appease update_llc_test_checks.py

[Flang][OpenMP][NFC][2/2] Reorder OmpStructureChecker and simplify it.

`OmpStructureChecker` has too much boilerplate code in source file.

This patch:
1. Use helpers from `check-directive-structure.h` and reduces the boilerplate.
2. Use TableGen infrastructure as much as possible.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D90834

[BasicAA] Add recphi test with dynamic offset (NFC)

Currently, we don't recognize that %a an %p don't alias.

[lldb] Reland "Use translated full ftag values"

Translate between abridged and full ftag values in order to expose
the latter in the gdb-remote protocol while the former are used by
FXSAVE/XSAVE... This matches the gdb behavior.

The Shell/Register tests now rely on the new behavior, and therefore
are run on non-Darwin systems only. The Python (API) test relies
on the legacy behavior, and is run on Darwin only.

Differential Revision: https://reviews.llvm.org/D91504

[TableGen] [ISel Matcher Emitter] Rework with two passes: one to size, one to emit

Differential Revision: https://reviews.llvm.org/D91632

[NFC, Refactor] Modernize enum FunctionDefinitionKind (DeclSpech.h) into a scoped enum

Reviewed by aaron.ballman, rsmith, wchilders
Highlights of review:
- avoid specifying an underlying type (unless such an enum is stored (or part of an abi?))
- avoid using enums as bit-fields, preferring unsigned bit-fields that we static_cast enumerators to. (MS's abi laysout enum bit-fields differently).
- clang-format, clang-format, clang-format.

https://reviews.llvm.org/D91035

Thank you!

[mlir] Fix async microbench integration test

Differential Revision: https://reviews.llvm.org/D91912

[flang][openmp] Fix bug in `OmpClause::Hint` clause which was missing to generate inside in OMP.cpp.inc file.

Before this patch "Hint" isn't found inside the generated file.
./bin/llvm-tblgen --gen-directive-gen ../llvm-project/llvm/include/llvm/Frontend/OpenMP/OMP.td -I ../llvm-project/llvm/include/ > OMP.cpp.in

Reviewed By: clementval

Differential Revision: https://reviews.llvm.org/D91909

[mlir] AsynToLLVM: do no use op->getOperands() in conversion patterns

Differential Revision: https://reviews.llvm.org/D91910

[mlir] Add microbenchmark for linalg+async-parallel-for

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D91896

[VE][NFC] Modify function order and simplify comments

[VE] Correct types of return/argument values for getAdjustedFrameSize()

A getAdjustedFrameSize function may need to handle larger than 32 bits
integer, so change int to uint64_t.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91862

GitHub Actions: Add job for automatically updating the main branch

Differential Revision: https://reviews.llvm.org/D91554

[NFC][AMDGPU] Document kernel descriptor

- Document that the kernel descriptor defined is for code object V3.
Document that it also applies to earlier code object formats for CP.

- Document the deprecated bits in kernel descriptor.

Differential Revision: https://reviews.llvm.org/D91458

[VE][NFC] Update missing bulk update tests to use typed sret

[mac/arm] Fix test/Driver/darwin-sdk-version.c on arm macs

Two invocations in this test used `-m64`, which on an arm mac means
arm64 in the triple, not x86_64.

[mac/arm] Fix clang/test/Sema/wchar.c on mac/arm hosts

Part of PR46644.

OpaquePtr: Make byval/sret types mandatory

AMDGPU: Fix counting kernel arguments towards register usage

Also use DataLayout to get type size. Relying on the IR type size is
also pretty broken here, since this won't perfectly capture how types
are legalized.

[Analysis] Use llvm::is_contained (NFC)

Revert "Revert "[libc++] ADL-proof <vector> by adding _VSTD:: qualification on calls.""

This reverts commit 620adacf87a376ec536ccc66af59df5bb4dc3b38.

Fix: unsupport C++03 for the new test, define helpers before __swap_allocator

(1) Add _VSTD:: qualification to __swap_allocator.

(2) Add _VSTD:: qualification consistently to __to_address.

(3) Add some more missing _VSTD:: to <vector>, with a regression test.
This part is cleanup after d9a4f936d05.

Note that a vector whose allocator actually runs afoul of any of these ADL calls will
likely also run afoul of simple things like `v1 == v2` (which is also an ADL call).
But, still, libc++ should be consistent in qualifying function calls wherever possible.

Relevant blog post: https://quuxplusone.github.io/blog/2019/09/26/uglification-doesnt-stop-adl/

Differential Revision: https://reviews.llvm.org/D91708

[flang][openacc] Add clause validity tests for the host_data directive

Add some clause validity tests for the host_data directive to avoid future regressions.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D91889

Verifier: Fix assert when verifying non-pointer byval or preallocated

This would fail on a cast<PointerType> when verifying the attribute if
these attributes were incorrectly used with a non-pointer type.

OpaquePtr: Update more tests to use typed sret

[flang][openacc] Add clause validity tests for the parallel directive

Add some clause validity tests for parallel directive.

Reviewed By: sameeranjoshi

Differential Revision: https://reviews.llvm.org/D91871

[mlir][sparse] refine optimization, add few more test cases

Adds tests for full sum reduction (tensors summed up into scalars)
and the well-known sampled-dense-dense-matrix-product. Refines
the optimizations rules slightly to handle the summation better.

Reviewed By: penpornk

Differential Revision: https://reviews.llvm.org/D91818

[hwasan] Implement error report callback.

Similar to __asan_set_error_report_callback, pass the entire report to a
user provided callback function.

Differential Revision: https://reviews.llvm.org/D91825

[llvm-profgen][NFC]Fix build failure on different platform

see titile
Test Plan:
ninja & ninja check-llvm

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D91897

[libc] Make more of the libc unit testing llvm independent

(WIP, hopefully I'll add more to this patch before submitting)

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D91665

[libc] Switch functions to using global headers

This switches all of the files in src/string, src/math, and
test/src/math from using relative paths (e.g. `#include “include/string.h”`)
to global paths (e.g. `#include <string.h>`) to make bringing up those
functions on other platforms, such as fuchsia, easier.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D91394

Don’t break before nested block param when prior param is not a block

Add ScopedTrace to verify methods in FormatTestObjC
Add tests from D17700

Reviewed By: keith, kastiglione

Differential Revision: https://reviews.llvm.org/D91669

OpaquePtr: Bulk update tests to use typed sret

[CSSPGO][llvm-profgen] Instruction symbolization

This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.

This change adds the support of instruction symbolization. Given the RVA on an instruction pointer, a full calling context can be printed side-by-side with the disassembly code.
E.g.
```
Disassembly of section .text [0x0, 0x4a]:

<funcA>:
     0: mov eax, edi                           funcA:0
     2: mov ecx, dword ptr [rip]               funcLeaf:2 @ funcA:1
     8: lea edx, [rcx + 3]                     fib:2 @ funcLeaf:2 @ funcA:1
     b: cmp ecx, 3                             fib:2 @ funcLeaf:2 @ funcA:1
     e: cmovl edx, ecx                           fib:2 @ funcLeaf:2 @ funcA:1
    11: sub eax, edx                           funcLeaf:2 @ funcA:1
    13: ret                                        funcA:2
    14: nop word ptr cs:[rax + rax]
    1e: nop

<funcLeaf>:
    20: mov eax, edi                           funcLeaf:1
    22: mov ecx, dword ptr [rip]               funcLeaf:2
    28: lea edx, [rcx + 3]                     fib:2 @ funcLeaf:2
    2b: cmp ecx, 3                             fib:2 @ funcLeaf:2
    2e: cmovl edx, ecx                           fib:2 @ funcLeaf:2
    31: sub eax, edx                           funcLeaf:2
    33: ret                                        funcLeaf:3
    34: nop word ptr cs:[rax + rax]
    3e: nop

<fib>:
    40: lea eax, [rdi + 3]                     fib:2
    43: cmp edi, 3                             fib:2
    46: cmovl eax, edi                           fib:2
    49: ret                                        fib:8
```

Test Plan:
ninja check-llvm

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D89715

[CSSPGO][llvm-profgen] Disassemble text sections

This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.

This change enables disassembling the text sections to build various address maps that are potentially used by the virtual unwinder. A switch `--show-disassembly` is being added to print the disassembly code.

Like the llvm-objdump tool, this change leverages existing LLVM components to parse and disassemble ELF binary files. So far X86 is supported.

Test Plan:

ninja check-llvm

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D89712

[CSSPGO][llvm-profgen] Parse mmap events from perf script

This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.

As a starter, this change sets up an entry point by introducing PerfReader to load profiled binaries and perf traces(including perf events and perf samples). For the event, here it parses the mmap2 events from perf script to build the loader snaps, which is used to retrieve the image load address in the subsequent perf tracing parsing.

As described in llvm-profgen.rst, the tool being built aims to support multiple input perf data (preprocessed by perf script) as well as multiple input binary images. It should also support dynamic reload/unload shared objects by leveraging the loader snaps being built by this change

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D89707

[AArch64][GlobalISel] Make G_EXTRACT_VECTOR_ELT of <2 x p0> legal.

Also fix a selection issue for this which was using LLT::isScalar() when it
should have been using !isVector(), add test for that too.

Demangling support for class type non-type template parameter extensions.

The extensions in question are described in:
https://github.com/itanium-cxx-abi/cxx-abi/issues/47
https://github.com/itanium-cxx-abi/cxx-abi/issues/63

Differential Revision: https://reviews.llvm.org/D90003

[SLP][NFC]Fix assert condition in newTreeEntry, NFC.

[msan] unpoison_file from fclose and fflash

Also unpoison IO_write_base/_IO_write_end buffer

memcpy from fclose and fflash can copy internal bytes without metadata into user memory.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D91858

Revert "[lldb] add a missing dependency on intrinsics_gen"

This reverts commit 137ff7331705179c83533a074d800c481b7df1ac.

This belongs in Apple's Swift fork since this is a direct fix for
unified Swift + llvm + lldb builds.

Guard init_priority attribute within libc++

Not all platforms support priority attribute. I'm moving conditional definition of this attribute to `include/__config`.

Reviewed By: #libc, aaron.ballman

Differential Revision: https://reviews.llvm.org/D91565

[RISCV] Put RV32 before RV64 in the ValueTypeByHwMode and RegInfoByHwMode lists in RISCVRegisterInfo.td

Addresses post-commit feedback from 77e25b5bc8860e23395f617dcca4940489f6355c

[mlir][vector] Add transfer_op LoadToStore forwarding and deadStore optimizations

Add transformation to be able to forward transfer_write into transfer_read
operation and to be able to remove dead transfer_write when a transfer_write is
overwritten before being read.

Differential Revision: https://reviews.llvm.org/D91321

[clangd] semanticTokens: fields are 'property', not 'member'

This isn't obvious, but vscode maps member as 'entity.name.function.member',
so it's really for member functions.

Fixes https://github.com/clangd/vscode-clangd/issues/105

[OPENMP]Use the real pointer value as base, not indexed value.

After fix for PR48174 the base pointer for pointer-based
array-sections/array-subscripts will be emitted as `&ptr[idx]`, but
actually it should be just `ptr`, i.e. the address stored in the ponter
to point correctly to the beginning of the array. Currently it may lead
to a crash in the runtime.

Differential Revision: https://reviews.llvm.org/D91805

[RISCV] Remove RV32 HwMode. Use DefaultMode for RV32

Prior to this the DefaultMode was never selected, but RISCVGenDAGISel.inc, RISCVGenRegisterInfo.inc, RISCVGenGlobalISel.inc all ended up with extra table entries for that mode.

This patch removes the RV32 and uses DefaultMode for RV32. This impressively reduces the size of my release+asserts llc binary by about 270K. About 15K from RISCVGenDAGISel.inc, 1-2K from RISCVGenRegisterInfo.inc, but the vast majority from RISCVGenGlobalISel.inc.

Differential Revision: https://reviews.llvm.org/D90973

[OPENMP]Honor constantness of captured variables.

Fixes bug reported via Stackoverflow:
https://stackoverflow.com/questions/64179168/clang-overload-resolution-failure-with-templates-and-openmp-collapse

Need to honor constantness of private/target variables to make the code
compilable.

Differential Revision: https://reviews.llvm.org/D91644

OpaquePtr: Bulk update tests to use typed byval

Upgrade of the IR text tests should be the only thing blocking making
typed byval mandatory. Partially done through regex and partially
manual.

[CSSPGO] MIR target-independent pseudo instruction for pseudo-probe intrinsic

This change introduces a MIR target-independent pseudo instruction corresponding to the IR intrinsic llvm.pseudoprobe for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

An `llvm.pseudoprobe` intrinsic call will be lowered into a target-independent operation named `PSEUDO_PROBE`. Given the following instrumented IR,

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}
```
the corresponding MIR is shown below. Note that block `bb3` is duplicated into `bb1` and `bb2` where its probe is duplicated too. This allows for an accurate execution count to be collected for `bb3`, which is basically the sum of the counts of `bb1` and `bb2`.

```
bb.0.bb0:
   frame-setup PUSH64r undef $rax, implicit-def $rsp, implicit $rsp
   TEST32rr killed renamable $edi, renamable $edi, implicit-def $eflags
   PSEUDO_PROBE 837061429793323041, 1, 0
   $edi = MOV32ri 1, debug-location !13; test.c:0
   JCC_1 %bb.1, 4, implicit $eflags

bb.2.bb2:
   PSEUDO_PROBE 837061429793323041, 3, 0
   PSEUDO_PROBE 837061429793323041, 4, 0
   $rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp
   RETQ

bb.1.bb1:
   PSEUDO_PROBE 837061429793323041, 2, 0
   PSEUDO_PROBE 837061429793323041, 4, 0
   $rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp
   RETQ
```

The target op PSEUDO_PROBE will be converted into a piece of binary data by the object emitter with no machine instructions generated. This is done in a different patch.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86495

[RISCV] Custom type legalize i32 bswap/bitreverse to GREVIW on RV64 with Zbp extension

Previously we required a sra to pattern match these properly in isel. If the consumer didn't need the result sign extended we'll have an srl instead of sra and fail to match.

This patch switches to custom legalizing to GREVIW using portions of D91259.

Differential Revision: https://reviews.llvm.org/D91457

[CSSPGO] IR intrinsic for pseudo-probe block instrumentation

This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues:

1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.

We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flags given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but is does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality.

Let's now look at an example. Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}

```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86490

Update OptionComparison.md

[RISCV] Add RISCVISD::ROLW/RORW use those for custom legalizing i32 rotl/rotr on RV64IZbb.

This should result in better utilization of RORIW since we
don't need to look for a SIGN_EXTEND_INREG that may not exist.

Also remove rotl/rotr isel matching to GREVI and just prefer RORI.
This is to keep consistency so we don't have to match ROLW/RORW
to GREVIW as well. I imagine RORI/RORIW performance will be the
same or better than GREVI.

Differential Revision: https://reviews.llvm.org/D91449

[X86][AVX] LowerADDSAT_SUBSAT - avoid X86ISD::BLENDV in UADDSAT/USUBSAT v8i32/v4i64 lowering

Use the OR(CMP,ADD) / AND(CMP,SUB) patterns like we do on SSE targets.

Enable custom lowering for v8i32/v4i64 and generalize the 128-bit lowering code for any vector size - this also lets us use the slightly cheaper codegen for icmp_ugt instead of umin/umax.

[MLIR] Correct block merge bug

Block merging in MLIR will incorrectly merge blocks with operations whose values are used outside of that block. This change forbids this behavior and provides a test where it is illegal to perform such a merge.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D91745

[libTooling] Update Transformer's `node` combinator to include the trailing semicolon for decls.

Currently, `node` only includes the semicolon for (some) statements. However,
declarations have the same issue of (potentially) trailing semicolons, so `node`
should behave the same for them.

Differential Revision: https://reviews.llvm.org/D91872

[SelectionDAG][X86][PowerPC][Mips] Replace the default implementation of LowerOperationWrapper with the X86 and PowerPC version.

The default version only works if the returned node has a single
result. The X86 and PowerPC versions support multiple results
and allow a single result to be returned from a node with
multiple outputs. And allow a single result that is not result 0
of the node.

Also replace the Mips version since the new version should work
for it. The original version handled multiple results, but only
if the new node and original node had the same number of results.

Differential Revision: https://reviews.llvm.org/D91846

[mlir] add canonicalization patterns for trivial SCF 'for' and 'if'

Add canoncalization patterns to remove zero-iteration 'for' loops, replace
single-iteration 'for' loops with their bodies; remove known-false conditionals
with no 'else' branch and replace conditionals with known value by the
respective region. Although similar transformations are performed at the CFG
level, not all flows reach that level, e.g., the GPU flow may want to remove
single-iteration loops before deciding on loop mapping to thread dimensions.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D91865

[lldb] Add examples and reword source-map help string

Update the help string for `target.source-map` to remove the use of the word
"duple" and to add examples. Additionally I rewrote parts with the goal of
making the description more concrete.

rdar://68736012

Differential Revision: https://reviews.llvm.org/D91742

[Hexagon][NewPM] Port -hexagon-loop-idiom and add to pipeline

Fixes pmpy-mod.ll under NPM

Reviewed By: kparzysz

Differential Revision: https://reviews.llvm.org/D91829

[mlir] Expose parseDimAndSymbolList from affineops.h

This was removed from ops.h, but it is used by onnx-mlir

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D91830

[DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead

Patched by: guopeilin
Reviewed By: hliao,rampitec

Differential Revision: https://reviews.llvm.org/D91513

[X86][SSE] LowerADDSAT_SUBSAT - avoid X86ISD::BLENDV in UADDSAT/USUBSAT custom lowering

Use the OR(CMP,ADD) / AND(CMP,SUB) patterns like we do on pre-SSE4 targets.

We're still using X86ISD::BLENDV on some AVX targets as we don't do custom lowering for >= 256-bit vectors.

Really this (and combineVSelectWithAllOnesOrZeros) needs moving to DAGCombiner, but pre-SSE42 we see the vXi64 comparison type as a 2 x 32-bits result so we can't just rely on ComputeNumSignBits to give us the 'all bits' result we need.

[flang][driver] Remove unnecessary CMake dependencies (nfc)

Remove clangFrontend from the list of dependencies. These should have
been removed in: 8d51d37e0628bde3eb5a3200507ba7135dfc2751. See also
https://reviews.llvm.org/D87774.

[CostModel] mostly remove cost-kind predicate for intrinsics in basic TTI implementation

This is re-applying a combination of f7eac51b9b3f and 8ec7ea3ddce7 as one patch
to avoid regressions now that we have better testing in place.

Those were reverted with 32dd5870ee31 because of crashing in experimental intrinsics.
That bug should be fixed with 7ae346434.

Paraphrased original commit messages:

This is the last step in removing cost-kind as a consideration in the
basic class model for intrinsics.
See D89461 for the start of that.
Subsequent commits dealt with each of the special-case intrinsics that
had customization here in the basic class. This should remove a barrier
to retrying D87188 (canonicalization to the abs intrinsic).

The ARM and x86 cost diffs seen here may be wrong because the
target-specific overrides have their own bugs, but we hope this is
less wrong - if something has a significant throughput cost, then it
should have a significant size / blended cost too by default.

The only behavioral diff in current regression tests is shown in the
x86 scatter-gather test (which is misplaced or broken because it runs
the entire -O3 pipeline) - we unrolled less, and we assume that is
a improvement.

Exception: in general, we want the *size* cost for a scalar call to be
cheap even if the other costs are expensive - we expect it to just be
a branch with some optional stack manipulation.

It is likely that we will want to carve out some
exceptions/overrides to this rule as follow-up patches for
calls that have some general and/or target-specific difference
to the expected lowering.

This was noticed as a regression in unrolling, so we have a test
for that now along with a couple of direct cost model tests.

If the assumed scalarization costs for the oversized vector
calls are not realistic, that would be another follow-up
refinement of the cost models.

Differential Revision: https://reviews.llvm.org/D90554

[X86] Add SSE42 sat-add test coverage

Check SSE42 targets which have PCMPGTQ

[AMDGPU] Set the default globals address space to 1

This will ensure that passes that add new global variables will create them
in address space 1 once the passes have been updated to no longer default
to the implicit address space zero.
This also changes AutoUpgrade.cpp to add -G1 to the DataLayout if it wasn't
already to present to ensure bitcode backwards compatibility.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D84345

Add a default address space for globals to DataLayout

This is similar to the existing alloca and program address spaces (D37052)
and should be used when creating/accessing global variables.
We need this in our CHERI fork of LLVM to place all globals in address space 200.
This ensures that values are accessed using CHERI load/store instructions
instead of the normal MIPS/RISC-V ones.

The problem this is trying to fix is that most of the time the type of
globals is created using a simple PointerType::getUnqual() (or ::get() with
the default address-space value of 0). This does not work for us and we get
assertion/compilation/instruction selection failures whenever a new call
is added that uses the default value of zero.

In our fork we have removed the default parameter value of zero for most
address space arguments and use DL.getProgramAddressSpace() or
DL.getGlobalsAddressSpace() whenever possible. If this change is accepted,
I will upstream follow-up patches to use DL.getGlobalsAddressSpace() instead
of relying on the default value of 0 for PointerType::get(), etc.

This patch and the follow-up changes will not have any functional changes
for existing backends with the default globals address space of zero.
A follow-up commit will change the default globals address space for
AMDGPU to 1.

Reviewed By: dylanmckay

Differential Revision: https://reviews.llvm.org/D70947

[libc] Combine all math differential fuzzers into one target.

Also added diffing of a few more math functions. Combining the diff check
for all of these functions helps us meet the OSS fuzz bar of a minimum of
100 program edges.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D91817

[SLP][Test] Update pr47269.ll test. NFC

Expand test for PR47269 to better demonstrate changes introduced by D90445.

Reland: Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug.
Summary:
Expand existing loopsink testing to also test loopsinking using new pass
manager.  Enable memoryssa for loopsink with new pass manager.  This
combination exposed a bug that was previously fixed for loopsink
without memoryssa.  When sinking an instruction into a loop, the source
block may not be part of the loop but still needs to be checked for
pointer invalidation.  This is the fix for bugzilla #39695 (PR 54659)
expanded to also work with memoryssa.

Respond to review comments.  Enable Memory SSA in legacy Loop Sink pass
under EnableMSSALoopDependency option control.  Update tests accordingly.

Respond to review comments.  Add options controlling whether memoryssa is
used for loop sink, defaulting to off.  Expand testing based on these
options.

Respond to review comments.  Properly indicated preserved analyses.

This relanding addresses a compile-time performance problem by moving
test for profile data earlier to avoid unnecessary computations.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D90249

[CostModel] avoid crashing while finding scalarization overhead

The constrained intrinsics have metadata arguments, so the
tests here were crashing as noted in D90554 (and that was
reverted even though this bug exists independently of that
change).

[clang-tidy] Include std::basic_string_view in readability-redundant-string-init.

std::string_view("") produces a string_view instance that compares
equal to std::string_view(), but requires more complex initialization
(storing the address of the string literal, rather than zeroing).

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D91009

[NFC intended] Refactor the code for printChanged for reuse and to facilitate subsequent reporters of changes to the IR in the new pass manager.

Summary:
[NFC intended] Refactor the code for printChanged for reuse and to facilitate
subsequent reporters of changes to the IR in the new pass manager.

Create abstract template base classes for common functionality and give
classes more appropriate names. The base classes handle all of the
determination of when a function or pass is "interesting" and should be
reported or filtered out. They have pure virtual functions which are called
when a change by a pass has been recognized so the derived class need only
provide the overrides to present the information about the changing IR.
There are at least 2 more change reporters to come (which were presented
in my tutorial at the 2020 llvm developer's meeting) that derive from
these classes.

Respond to review comments: move function out of line, remove inline keyword,
remove unneeded qualifiers, simplify comparison.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks), madhur13490 (Madhur Amilkanthwar)
Differential Revision: https://reviews.llvm.org/D87000

[clang] Do not crash on pointer wchar_t pointer assignment.

wchar_t can be signed (thus hasSignedIntegerRepresentation() returns
true), but it doesn't have an unsigned type, which would lead to a crash
when trying to get it.

With this fix, we special-case WideChar types in the pointer assignment
code.

Differential Revision: https://reviews.llvm.org/D91625

[AArch64] Enable post RA scheduler for Cortex-R82

Just something I forgot when I added the R82. Need to have a look
at crypto and fusing, but will do that as a follow up.

Differential Revision: https://reviews.llvm.org/D91848

Add a call super attribute plugin example

If a virtual method is marked as call_super, the
override method must call it, simpler feature like @CallSuper
in Android Java.

Add documentation illustrating use of IgnoreUnlessSpelledInSource

Differential Revision: https://reviews.llvm.org/D91639

[ARM] Disable WLSTP loops

This checks to see if the loop will likely become a tail predicated loop
and disables wls loop generation if so, as the likelihood for reverting
is currently too high. These should be fairly rare situations anyway due
to the way iterations and element counts are used during lowering. Just
not trying can alter how SCEV's are materialized however, leading to
different codegen.

It also adds a option to disable all while low overhead loops, for
debugging.

Differential Revision: https://reviews.llvm.org/D91663

[AArch64] Out-of-line atomics (-moutline-atomics) implementation.

This patch implements out of line atomics for LSE deployment
mechanism. Details how it works can be found in llvm/docs/Atomics.rst
Options -moutline-atomics and -mno-outline-atomics to enable and disable it
were added to clang driver. This is clang and llvm part of out-of-line atomics
interface, library part is already supported by libgcc. Compiler-rt
support is provided in separate patch.

Differential Revision: https://reviews.llvm.org/D91157

[CostModel] add tests for math library calls; NFC

This is a partial un-revert of 32dd5870ee31 (originally df09f82599 ).

I'm adding back the baseline tests first, so we don't have
to back-track as much in case there are still problems.

[LoopUnroll] add test for full unroll that is sensitive to cost-model; NFC

See discussion in D90554.

This is a partial un-revert of 32dd5870ee31. I'm adding
back the baseline tests first, so we don't have to
back-track as much in case there are still problems.

[sanitizers][test] Test sanitizer_common and ubsan_minimal on Solaris

During the initial Solaris sanitizer port, I missed to enable the
`sanitizer_common` and `ubsan_minimal` testsuites.  This patch fixes this,
correcting a few unportabilities:

- `Posix/getpass.cpp` failed to link since Solaris lacks `libutil`.
  Omitting the library lets the test `PASS`, but I thought adding `%libutil`
  along the lines of `%librt` to be overkill.
- One subtest of `Posix/getpw_getgr.cpp` is disabled because Solaris
  `getpwent_r` has a different signature than expected.
- `/dev/null` is a symlink on Solaris.
- XPG7 specifies that `uname` returns a non-negative value on success.

Tested on `amd64-pc-solaris2.11` and `sparcv9-sun-solaris2.11`.

Differential Revision: https://reviews.llvm.org/D91606

[mlir][std] Canonicalize a dim(memref_reshape) into a load from the shape operand

This canonicalization helps propagate shape information through the program.

Differential Revision: https://reviews.llvm.org/D91854