review.tizen.org Git - platform/upstream/llvm.git/log

[PDLL] Add support for user defined constraint and rewrite functions

These functions allow for defining pattern fragments usable within the `match` and `rewrite` sections of a pattern. The main structure of Constraints and Rewrites functions are the same, and are similar to functions in other languages; they contain a signature (i.e. name, argument list, result list) and a body:

```pdll
// Constraint that takes a value as an input, and produces a value:
Constraint Cst(arg: Value) -> Value { ... }

// Constraint that returns multiple values:
Constraint Cst() -> (result1: Value, result2: ValueRange);
```

When returning multiple results, each result can be optionally be named (the result of a Constraint/Rewrite in the case of multiple results is a tuple).

These body of a Constraint/Rewrite functions can be specified in several ways:

* Externally
In this case we are importing an external function (registered by the user outside of PDLL):

```pdll
Constraint Foo(op: Op);
Rewrite Bar();
```

* In PDLL (using PDLL constructs)
In this case, the body is defined using PDLL constructs:

```pdll
Rewrite BuildFooOp() {
  // The result type of the Rewrite is inferred from the return.
  return op<my_dialect.foo>;
}
// Constraints/Rewrites can also implement a lambda/expression
// body for simple one line bodies.
Rewrite BuildFooOp() => op<my_dialect.foo>;
```

* In PDLL (using a native/C++ code block)
In this case the body is specified using a C++(or potentially other language at some point) code block. When building PDLL in AOT mode this will generate a native constraint/rewrite and register it with the PDL bytecode.

```pdll
Rewrite BuildFooOp() -> Op<my_dialect.foo> [{
  return rewriter.create<my_dialect::FooOp>(...);
}];
```

Differential Revision: https://reviews.llvm.org/D115836

[PDLL] Add support for single line lambda-like patterns

This allows for defining simple patterns in a single line. The lambda
body of a Pattern expects a single operation rewrite statement:

```
Pattern => replace op<my_dialect.foo>(operands: ValueRange) with operands;
```

Differential Revision: https://reviews.llvm.org/D115835

[libcxx] Silence -Wformat-nonliteral warnings in the Windows support code

This brings the mingw build back to zero build warnings as it was
at some earlier time.

Differential Revision: https://reviews.llvm.org/D119134

[llvm-libtool-darwin] Use cast<> instead of dyn_cast<> to avoid dereference of nullptr

The pointer is dereferenced immediately, so assert the cast is correct instead of returning nullptr

[LoopVectorize] getStepVector - reduce scope of local variable. NFC.

[libc++] Prepare string.modifiers tests for constexpr

Reviewed By: ldionne, #libc

Spies: libcxx-commits

Differential Revision: https://reviews.llvm.org/D119329

[SCEV] Constify some uses of SCEVUnionPredicate* [NFC]

This exploits the immutability introduced in d334fec.

[NFC] Simplify pairwise store test mir to drop stack accesses.

Test simplified:
test/CodeGen/AArch64/stp-opt-with-renaming-undef-assert.mir

Modified SourceLevelDebugging.rst to include information about memory location exp

updated local branch to incorporate latest changes

worked on review comments

Updated the test to include proper string get functions

Updated the test to include addtional details

Added StringLocationExp to the new apis

Addressed review comments

Adding DIBuilder interface for assumed length string

Cleanup LLVMObject headers

Most notably,

llvm/Object/Binary.h no longer includes llvm/Support/MemoryBuffer.h
llvm/Object/MachOUniversal*.h no longer include llvm/Object/Archive.h
llvm/Object/TapiUniversal.h no longer includes llvm/Object/TapiFile.h

llvm-project preprocessed size:
before: 1068185081
after: 1068324320

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119457

Update all LLVM documentation mentioning runtimes in LLVM_ENABLE_PROJECTS

We are moving away from building the runtimes with LLVM_ENABLE_PROJECTS,
however the documentation was largely outdated. This commit updates all
the documentation I could find to use LLVM_ENABLE_RUNTIMES instead of
LLVM_ENABLE_PROJECTS for building runtimes.

Note that in the near future, libcxx, libcxxabi and libunwind will stop
supporting being built with LLVM_ENABLE_PROJECTS altogether. I don't know
what the plans are for other runtimes like libc, openmp and compiler-rt,
so I didn't make any changes to the documentation that would imply
something for those projects.

Once this lands, I will also cherry-pick this on the release/14.x branch
to make sure that LLVM's documentation is up-to-date and reflects what
we intend to support in the future.

Differential Revision: https://reviews.llvm.org/D119351

Sign-extend addresses in CompactRingBuffer.

Summary:
This is neccessary to support solaris/sparc9 where some userspace
addresses have all top bits set, as well as, potentially, kernel memory
on aarch64.

This change does not update the compiler side (HWASan IR pass) which
needs to be done separately for the affected targets.

Reviewers: ro, vitalybuka

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D91827

[Attributor][NFC] Expose new API in AAPointerInfo

New users might want to check bins without a load or store instruction
at hand. Since we use those instructions only to find the offset and
size of the access anyway, we can expose an offset and size interface
to the outside world as well.

This commit mainly moves code around and exposes a class (OffsetAndSize)
as well as a method forallInterferingAccesses in AAPointerInfo.

Differential Revision: https://reviews.llvm.org/D119249

[Attributor][FIX] Reachability needs to account for readonly callees

The oversight caused us to ignore call sites that are effectively dead
when we computed reachability (or more precise the call edges of a
function). The problem is that loads in the readonly callee might depend
on stores prior to the callee. If we do not track the call edge we
mistakenly assumed the store before the call cannot reach the load.
The problem is nicely visible in:
`llvm/test/Transforms/Attributor/ArgumentPromotion/basictest.ll`

Caused by D118673.

Fixes https://github.com/llvm/llvm-project/issues/53726

[Attributor][FIX] Honor alloca address space in AAPrivatizablePtr

When we privatize a pointer (~argument promotion) we introduce new
private allocas as replacement. These need to be placed in the alloca
address space as later passes cannot properly deal with them otherwise.

Fixes https://github.com/llvm/llvm-project/issues/53725

[sanitizer] Try to enable test on Android

#53721 suggests that it should work after https://reviews.llvm.org/D119461

[MLIR][GPU][lld] Use LLD bundled in ROCm, removing workaround

Having clarified that executing the SerializeToHsaco pass can
depend on a ROCm installation, switch from calling lld as a library to
using the copy of lld guaranteed to be included in a ROCm install.

This removes the workaround introduced in D119277

Reviewed By: whchung

Differential Revision: https://reviews.llvm.org/D119463

[flang] Refine pointer/target test for ASSOCIATED intrinsic

The second argument to the ASSOCIATED intrinsic must be a valid pointer
or target. The test for this property only checked the last symbol
in a data-reference, but any symbol in the reference with the
POINTER or TARGET attribute will do.

Differential Revision: https://reviews.llvm.org/D119450

[clang-tidy] Add early exit for defaulted FunctionDecls

This prevents matching of defaulted comparison operators.

Resolves: https://github.com/llvm/llvm-project/issues/53355

Author: Febbe (Fabian Keßler)

[libc++][P2321R2] Add vector<bool>::reference::operator=(bool) const

Add vector<bool>::reference::operator(bool) const

Reviewed By: Quuxplusone, ldionne, #libc

Spies: BRevzin, libcxx-commits

Differential Revision: https://reviews.llvm.org/D117736

[libc][obvious] only include vector with malloc

the vector class, due to being dynamically resized, needs malloc. This
fixes the build so that it only includes it when malloc should be
available.

Differential Revision: https://reviews.llvm.org/D119464

[compiler-rt] Fix endianness in get_sock_peer_name test

Fix passing the port and IP address with the wrong endianness
in get_sock_peer_name() that causes the connect() to fail inside
without an outgoing network interface (it's trying to connect
to 1.0.0.127 instead of 127.0.0.1).

Differential Revision: https://reviews.llvm.org/D119461

[OpenMP][CUDA] Refine the logic to determine grid size

This patch refines the logic to determine grid size as previous method
can escape the check of whether `CudaBlocksPerGrid` could be greater than the actual
hardware limit.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119311

Add -fmodules-local-submodule-visibility to MANDATORY_MODULE_BUILD_CFLAGS

to work around build failure introduced in https://reviews.llvm.org/D119036

[libc] add a vector internal class

Add a basic implementation of the vector class for use internally to
LLVM-libc.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D118954

[TTI][X86] Pull out repeated getSizeInBits() calls. NFC.

Fix a double debug info size counting in top level stats for "statistics dump".

This mainly affects Darwin targets (macOS, iOS, tvOS and watchOS) when these targets don't use dSYM files and the debug info was in the .o files. All modules, including the .o files that are loaded by the debug maps, were in the global module list. This was great because it allows us to see each .o file and how much it contributes. There were virtual functions on the SymbolFile class to fetch the symtab/debug info parse and index times, and also the total debug info size. So the main executable would add all of the .o file's stats together and report them as its own data. Then the "totalDebugInfoSize" and many other "totalXXX" top level totals were all being added together. This stems from the fact that my original patch only emitted the modules for a target at the start of the patch, but as comments from the reviews came in, we switched to emitting all of the modules from the global module list.

So this patch fixes it so when we have a SymbolFileDWARFDebugMap that loads .o files, the main executable will have no debug info size or symtab/debug info parse/index times, but each .o file will have its own data as a separate module. Also, to be able to tell when/if we have a dSYM file I have added a "symbolFilePath" if the SymbolFile for the main modules path doesn't match that of the main executable. We also include a "symbolFileModuleIdentifiers" key in each module if the module does have multiple lldb_private::Module objects that contain debug info so that you can track down the information for a module and add up the contributions of all of the .o files.

Tests were added that are labeled with @skipUnlessDarwin and @no_debug_info_test that test all of this functionality so it doesn't regress.

For a module with a dSYM file, we can see the "symbolFilePath" is included:
```
  "modules": [
    {
      "debugInfoByteSize": 1070,
      "debugInfoIndexLoadedFromCache": false,
      "debugInfoIndexSavedToCache": false,
      "debugInfoIndexTime": 0,
      "debugInfoParseTime": 0,
      "identifier": 4873280600,
      "path": "/Users/gclayton/Documents/src/lldb/main/Debug/lldb-test-build.noindex/commands/statistics/basic/TestStats.test_dsym_binary_has_symfile_in_stats/a.out",
      "symbolFilePath": "/Users/gclayton/Documents/src/lldb/main/Debug/lldb-test-build.noindex/commands/statistics/basic/TestStats.test_dsym_binary_has_symfile_in_stats/a.out.dSYM/Contents/Resources/DWARF/a.out",
      "symbolTableIndexTime": 7.9999999999999996e-06,
      "symbolTableLoadedFromCache": false,
      "symbolTableParseTime": 7.8999999999999996e-05,
      "symbolTableSavedToCache": false,
      "triple": "arm64-apple-macosx12.0.0",
      "uuid": "E1F7D85B-3A42-321E-BF0D-29B103F5F2E3"
    },
```
And for the DWARF in .o file case we can see the "symbolFileModuleIdentifiers" in the executable's module stats:
```
  "modules": [
    {
      "debugInfoByteSize": 0,
      "debugInfoIndexLoadedFromCache": false,
      "debugInfoIndexSavedToCache": false,
      "debugInfoIndexTime": 0,
      "debugInfoParseTime": 0,
      "identifier": 4603526968,
      "path": "/Users/gclayton/Documents/src/lldb/main/Debug/lldb-test-build.noindex/commands/statistics/basic/TestStats.test_no_dsym_binary_has_symfile_identifiers_in_stats/a.out",
      "symbolFileModuleIdentifiers": [
        4604429832
      ],
      "symbolTableIndexTime": 7.9999999999999996e-06,
      "symbolTableLoadedFromCache": false,
      "symbolTableParseTime": 0.000112,
      "symbolTableSavedToCache": false,
      "triple": "arm64-apple-macosx12.0.0",
      "uuid": "57008BF5-A726-3DE9-B1BF-3A9AD3EE8569"
    },
```
And the .o file for 4604429832 looks like:
```
    {
      "debugInfoByteSize": 1028,
      "debugInfoIndexLoadedFromCache": false,
      "debugInfoIndexSavedToCache": false,
      "debugInfoIndexTime": 0,
      "debugInfoParseTime": 6.0999999999999999e-05,
      "identifier": 4604429832,
      "path": "/Users/gclayton/Documents/src/lldb/main/Debug/lldb-test-build.noindex/commands/statistics/basic/TestStats.test_no_dsym_binary_has_symfile_identifiers_in_stats/main.o",
      "symbolTableIndexTime": 0,
      "symbolTableLoadedFromCache": false,
      "symbolTableParseTime": 0,
      "symbolTableSavedToCache": false,
      "triple": "arm64-apple-macosx"
    }
```

Differential Revision: https://reviews.llvm.org/D119400

Wild guess to fix LLDB bot

[gn build] Port bd3a1de683f8

[MLIR][GPU] Add now-required include to SerializeToHsaco

Reviewed By: whchung

Differential Revision: https://reviews.llvm.org/D119455

[flang] Handle "type(foo) function f" when foo is defined in f

Fortran allows forward references to derived types, including
function results that are typed in a prefix of a FUNCTION statement.
If a type is defined in the body of the function, a reference to
that type from a prefix on the FUNCTION statement must resolve to
the local symbol, even and especially when that type shadows one
from the host scope.

The solution is to defer the processing of that type until the
end of the function's specification part. But the language doesn't
allow for forward references to other names in the prefix, so defer
the processing of the type only when it is not an intrinsic type.
The data structures in name resolution that track this information
for functions needed to become a stack in order to make this work,
since functions can contain interfaces that are functions.

Differential Revision: https://reviews.llvm.org/D119448

[clang-cl] Support the /JMC flag

The introduction and some examples are on this page:
https://devblogs.microsoft.com/cppblog/announcing-jmc-stepping-in-visual-studio/

The `/JMC` flag enables these instrumentations:
- Insert at the beginning of every function immediately after the prologue with
  a call to `void __fastcall __CheckForDebuggerJustMyCode(unsigned char *JMC_flag)`.
  The argument for `__CheckForDebuggerJustMyCode` is the address of a boolean
  global variable (the global variable is initialized to 1) with the name
  convention `__<hash>_<filename>`. All such global variables are placed in
  the `.msvcjmc` section.
- The `<hash>` part of `__<hash>_<filename>` has a one-to-one mapping
  with a directory path. MSVC uses some unknown hashing function. Here I
  used DJB.
- Add a dummy/empty COMDAT function `__JustMyCode_Default`.
- Add `/alternatename:__CheckForDebuggerJustMyCode=__JustMyCode_Default` link
  option via ".drectve" section. This is to prevent failure in
  case `__CheckForDebuggerJustMyCode` is not provided during linking.

Implementation:
All the instrumentations are implemented in an IR codegen pass. The pass is placed immediately before CodeGenPrepare pass. This is to not interfere with mid-end optimizations and make the instrumentation target-independent (I'm still working on an ELF port in a separate patch).

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D118428

[AArch64][ARM] add -Wunaligned-access only for clang

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D119301

[AArch64][LoadStoreOptimizer] Ignore undef registers when checking rename register used between paired instructions.

The content of undef registers are not used in meaningful ways, when checking
if a rename register is used between paired instructions we should ignore
undef registers.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D119305

[clang-format] Do not remove required spaces when aligning tokens.

Fixes https://github.com/llvm/llvm-project/issues/44292.
Fixes https://github.com/llvm/llvm-project/issues/45874.

Reviewed By: HazardyKnusperkeks, owenpan

Differential Revision: https://reviews.llvm.org/D119419

[mlir][vector] Add pattern to drop lead unit dim for Contraction Op

If the result operand has a unit leading dim it is removed from all operands.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D119206

[X86] Add smulo and umulo with add+load test coverage

[clangd] Crash in __memcmp_avx2_movbe

There is a clangd crash at `__memcmp_avx2_movbe`. Short problem description is below.

The method `HeaderIncludes::addExistingInclude` stores `Include` objects by reference at 2 places: `ExistingIncludes` (primary storage) and `IncludesByPriority` (pointer to the object's location at ExistingIncludes). `ExistingIncludes` is a map where value is a `SmallVector`. A new element is inserted by `push_back`. The operation might do resize. As result pointers stored at `IncludesByPriority` might become invalid.

Typical stack trace
```
    frame #0: 0x00007f11460dcd94 libc.so.6`__memcmp_avx2_movbe + 308
    frame #1: 0x00000000004782b8 clangd`llvm::StringRef::compareMemory(Lhs="
\"t2.h\"", Rhs="", Length=6) at StringRef.h:76:22
    frame #2: 0x0000000000701253 clangd`llvm::StringRef::compare(this=0x0000
7f10de7d8610, RHS=(Data = "", Length = 7166742329480737377)) const at String
Ref.h:206:34
  * frame #3: 0x00000000007603ab clangd`llvm::operator<(llvm::StringRef, llv
m::StringRef)(LHS=(Data = "\"t2.h\"", Length = 6), RHS=(Data = "", Length =
7166742329480737377)) at StringRef.h:907:23
    frame #4: 0x0000000002d0ad9f clangd`clang::tooling::HeaderIncludes::inse
rt(this=0x00007f10de7fb1a0, IncludeName=(Data = "t2.h\"", Length = 4), IsAng
led=false) const at HeaderIncludes.cpp:365:22
    frame #5: 0x00000000012ebfdd clangd`clang::clangd::IncludeInserter::inse
rt(this=0x00007f10de7fb148, VerbatimHeader=(Data = "\"t2.h\"", Length = 6))
const at Headers.cpp:262:70
```

A unit test test for the crash was created (`HeaderIncludesTest.RepeatedIncludes`). The proposed solution is to use std::list instead of llvm::SmallVector

Test Plan
```
./tools/clang/unittests/Tooling/ToolingTests --gtest_filter=HeaderIncludesTest.RepeatedIncludes
```

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D118755

[RISCV] Lower the shufflevector equivalent of vector.splice

We can lower a vector splice to a vslidedown and a vslideup.

The majority of the matching code here came from X86's code for matching
PALIGNR and VPALIGND/Q.

The slidedown and slideup lowering don't really require it to be concatenation,
but it happened to be an interesting pattern with existing analysis code I
could use.

This helps with cases where the scalar loop optimizer forwarded a load
result from a previous loop iteration. For example, this happens if the
loop uses x[i] and x[i+1] on the same iteration. The scalar optimizer
will forward x[i+1] load from the previous loop to satisfy x[i] on this
loop. When this get vectorized it results in one element of a vector
being forwarded from the previous loop to be concatenated with elements
loaded on this iteration.

Whether that's more efficient than doing a shifted loaded or reloading
the single scalar and using vslide1up is an interesting question.
But that's not something the backend can help with.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D119039

[flang] Lower simple RETURN statement

This patch adds the lowering for the RETURN statement
without alternate returns in the main program or in subroutine
and functions.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D119429

Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>

[flang][NFC] Replace hardcoded attribute name

Replace the hardcoded attribute name with the constexpr StringRef
defined in the FIROps.td file.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D119422

[RISCV] Move the creation of VLMaxSentinel to isel. Use X0 during lowering.

The VLMaxSentinel is represented as TargetConstant, but that's included
in isa<ConstantSDNode>. To keep constant VLs and VLMax separate as long
as possible, use the X0 register during lowering and only convert to
VLMaxSentinel during isel.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118845

[clang-cl] Accept the "legacy" -target flag spelling

we already accept "--target=". No reason to not accept "-target" too
(that's the one I typically use for some reason).

Differential revision: https://reviews.llvm.org/D119446

[SVE] Remove AArch64ISD::ADD_PRED and AArch64ISD::SUB_PRED.

These nodes provide an indirection that is not necessary because
SVE has unpredicated add/sub instructions and there's no downside
to using them for partial register operations. In fact, the test
changes show that unifying how fixed-length and scalable vector
add/sub are lowered enables better use of existing isel patterns.

Differential Revision: https://reviews.llvm.org/D119355

[X86] Add tests showing failure to use LEA to avoid spoiling EFLAGS from smulo

Add smulo and umulo test coverage

[X86] getFMA3OpcodeToCommuteOperands - use unreachable to detect fma3 format mismatch

Matches what we do in getThreeSrcCommuteCase.

Fixes static analyzer out of bounds array access warning.

[hwasan][test] Rework memaccess-clobber.ll

Previously memaccess-clobber.ll relied on both legacy PM-specific things
like `-analyze` and MemoryDependenceAnalysis, which are both deprecated.

This uses MemorySSA, which is the cool new thing that a bunch of passes
have migrated to.

Differential Revision: https://reviews.llvm.org/D119393

[RISCV] Remove stale comment. NFC

Now that we pre-process SPLAT_VECTOR to VFMV_V_F_VL, these patterns
handled scalable vectors and vectors converted from fixed. These
are also used by vp.fma lowering.

[libc++] Remove usage of `_LIBCPP_DEBUG` in `__comp_ref_type` and replace with `_LIBCPP_DEBUG_LEVEL`

In libc++, checking specific `_LIBCPP_DEBUG_LEVEL` levels is used everywhere except in `comp_ref_type.h`. `_LIBCPP_DEBUG` is meant as a user-facing option, and internally libc++ should be checking the value of `_LIBCPP_DEBUG_LEVEL`.

The definition of `std::__debug_less` doesn't need to be hidden behind the macro, we can unconditionally expose it. It will be unused by `__comp_ref_type` unless debug mode is enabled.

This was suggested in D118940.

Reviewed By: #libc, philnik, Quuxplusone, ldionne

Differential Revision: https://reviews.llvm.org/D118950

[libc++] Fix std::__debug_less in c++17.

b07b5bd72716625e0976a84d23652d94d8d0165a adds a use of `__comp_ref_type.h` to `std::min`. When libc++ is built with `-D_LIBCPP_DEBUG=0`, this enables `std::__debug_less`, which is only marked constexpr after c++17.

`std::min` itself is marked as being `constexpr` as of c++14, so by extension, `std::__debug_less` should also be marked `constexpr` for the same versions so that `std::min` can use it. This change lowers the guard from `> 17` to `> 11`.

Reproducer in godbolt: https://godbolt.org/z/ans3TGsj8

```

constexpr int x() { return std::min<int>({1, 2, 3, 4}); }

static_assert(x() == 1);
```

Reviewed By: #libc, philnik, Quuxplusone, ldionne

Differential Revision: https://reviews.llvm.org/D118940

[libc++][nfc] Add TEST_HAS_NO_LOCALIZATION.

This avoids using an libc++ internal macro in our tests.

Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D119352

[DebugInfo][InstrRef] Avoid duplicate instruction numbers in x86-lea-fixup

This new-ish LEA-fixup code path creates two substitutions for an
instruction number -- this is incorrect because each Value should be
replaced by a single replacement Value. Fix by deleting the duplicate
substitution. Add some test coverage for this path with debug-info
attached.

Differential Revision: https://reviews.llvm.org/D119232

[AMDGPU] Missed sign/zero extend patterns for divergence-driven instruction selection

This change includes tablegen patterns that were missed by https://reviews.llvm.org/D110950 and https://reviews.llvm.org/D76230

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D119302

[AMDGPU] Pull out repeated getVecSize() calls. NFC.

This is guaranteed to be evaluated so we can avoid repeated calls.

Helps the static analyzer as it couldn't recognise that each getVecSize() would return the same value.

[AArch64] Improve codegen for get.active.lane.mask when SVE is available

When lowering the get.active.lane.mask intrinsic with a fixed-width
predicate vector result, we can actually make use of the SVE whilelo
instruction when SVE is enabled. We do this by carefully choosing
a sensible VT for the whilelo instruction, then promoting it to an
integer vector, i.e. nxv16i1 -> nx16i8. We can then extract a v16i8
subvector and truncate back to the original return type, i.e. v16i1.
This leads to a significant improvement in code quality.

Differential Revision: https://reviews.llvm.org/D116664

[libc++] Prepare string.{contains, ends_with, iterators, require, starts_with} tests for constexpr

Reviewed By: ldionne, #libc

Spies: libcxx-commits

Differential Revision: https://reviews.llvm.org/D119304

[LV] Add tests with chained first-order recurrences.

[mlir][sparse][pytaco] migrate to sparse compiler pipeline

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D119395

[OpenMP][NFC] update status for 5.1 'nothing' directive to 'worked on'

Differential Revision: https://reviews.llvm.org/D119440

[fir] Fix FlangOptimizerTests link on Solaris

As reported in Issue #53690,
`tools/flang/unittests/Optimizer/FlangOptimizerTests` `FAIL`s to link on
Solaris:

  Undefined                       first referenced
   symbol                             in file
  _ZN3fir7runtimeL8getModelIcEEPFN4mlir4TypeEPNS2_11MLIRContextEEv lib/libFIRBuilder.a(Reduction.cpp.o)

which is `mlir::Type (*fir::runtime::getModel<char>())(mlir::MLIRContext*)`.

`clang++` warn's

  In file included from /var/llvm/llvm-14.0.0-rc1/rc1/llvm-project/flang/lib/Optimizer/Builder/Runtime/Reduction.cpp:14:
  /var/llvm/llvm-14.0.0-rc1/rc1/llvm-project/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h:60:34: warning: function 'fir::runtime::getModel<char>' has internal linkage but is not defined [-Wundefined-internal]
  static constexpr TypeBuilderFunc getModel();
                                   ^
  /var/llvm/llvm-14.0.0-rc1/rc1/llvm-project/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h:289:29: note: used here
        TypeBuilderFunc ret = getModel<RT>();
                              ^

Fixed by adding an explicit template instantiation for `getModel<char>`.  I
suppose this is necessary because on Solaris `char` is `signed`.

Tested on `sparcv9-sun-solaris2.11`.

Differential Revision: https://reviews.llvm.org/D119438

[gn build] Port 9d9053190498

[InstSimplify] Remove zero-index opaque pointer GEP

With opaque pointers, a zero-index GEP is a no-op. It does not
need to be retained for the pointer element type change it may
perform.

[libc++][ranges] Implement std::ranges::swap_ranges()

Implement `std::ranges::swap_ranges()`

Reviewed By: Quuxplusone, #libc, ldionne

Spies: ldionne, mgorny, jloser, libcxx-commits

Differential Revision: https://reviews.llvm.org/D116303

clangd: Set a diagnostic on a code action resulting from a tweak

... if there is a match.
This is needed to that clients can can make a connection between a
diagnostic and an associated quickfix-tweak.
Ideally, quickfix-kind tweak code actions would be provided inline along
with the non-tweak fixes, but this doesn't seem easily achievable.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D118976

[AMDGPU] Rename DSAtomicCmpXChg to DSAtomicCmpXChgSwapped. NFC.

This is just a reminder that the operands are swapped compared with all
the other CmpXChg instructions.

Differential Revision: https://reviews.llvm.org/D119421

[SVE] Remove redundant hasBF16 calls from lowering code.

The are several places where hasBF16 is used to protect code that
has no requirement for the +bf16 feature. The lowering code uses
stock SVE instructions for things like loads and stores and so is
safe even when +bf16 is not available.

NOTE: Currently the nxvbf16 type is not legal unless the +bf16
feature is available, but that isn't an issue because the affected
code is post type legalisation.

NOTE: This patch mirrors previous work that removed the same
redundant protection from isel patterns where the resulting
selection emitted stock SVE instructions.

Differential Revision: https://reviews.llvm.org/D119328

[NFC][SCEV] `createNodeForSelectOrPHIViaUMinSeq()`: refactor `i1 cond ? i1 x : i1 y` handling

While that effectively concludes i1 select handling,
that boolean restriction can be lifted later.

[NFC][SCEV] `createNodeForSelectOrPHIViaUMinSeq()`: refactor `i1 cond ? i1 C : i1 y` pattern

https://alive2.llvm.org/ce/z/uRvVtN

[NFC][SCEV] `createNodeForSelectOrPHIViaUMinSeq()`: refactor `i1 cond ? i1 x : i1 C` pattern

https://alive2.llvm.org/ce/z/2Q7Du_

[SCEV] Recognize `cond ? i1 0 : i1 y` as `umin_seq ~cond, x`

By definition, `umin_seq` has the exact same
poison stopping properties the original `select` had:
https://alive2.llvm.org/ce/z/N6XwV-

[SCEV] Recognize `cond ? i1 x : i1 1` as `~umin_seq cond, ~x`

By definition, `umin_seq` has the exact same
poison stopping properties the original `select` had:
https://alive2.llvm.org/ce/z/aqe9GK

[SCEV] Recognize logical `or` as `not umin_seq (not, not)`

By definition, `umin_seq` has the exact same
poison stopping properties the original `select` had:
https://alive2.llvm.org/ce/z/MUfbTL

[SCEV] Recognize logical `and` as `umin_seq`

By definition, `umin_seq` has the exact same
poison stopping properties the original `select` had:
https://alive2.llvm.org/ce/z/59KuZZ

[SCEV] `createNodeForSelectOrPHI()`: try constant-folding even if not an Instruction

We'd catch the tautological select pattern later anyways
due to constant folding, so that leaves PHI-like select,
but it does not appear to fire there.

[NFC][SCEV] Prepare `createNodeForSelectOrPHI()` for gaining additional strategy

Currently `createNodeForSelectOrPHI()` takes an Instruction,
and only works on the Cond that is an ICmpInst,
but that can be relaxed somewhat.

For now, simply rename the existing function,
and add a thin wrapper ontop that still does
the same thing as it used to.

[SCEV] Recognize binary `xor` as bit-wise `add`

https://alive2.llvm.org/ce/z/ULuZxB

We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `add`
bit/element -wise, the expression will be rather large,
so let's not do that for now.

[SCEV] Recognize binary `and` as bit-wise `umin`

https://alive2.llvm.org/ce/z/aKAr94

We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `umin`
bit/element -wise, the expression will be rather large,
so let's not do that for now.

[SCEV] Recognize binary `or` as bit-wise `umax`

https://alive2.llvm.org/ce/z/SMEaoc

We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `umax`
bit/element -wise, the expression will be rather large,
so let's not do that for now.

[NFC][SCEV] Add some tests with logical operations and whatnot

[SVE] Prefer zero-extending loads when lowering ISD::EXTLOAD.

The decision is perhaps arbitrary but I figure zeroing has no
dependency on the value being loaded.

Differential Revision: https://reviews.llvm.org/D119327

[SVE][CodeGen] Bail out for scalable vectors in AArch64TargetLowering::ReconstructShuffle

Previously the code in AArch64TargetLowering::ReconstructShuffle assumed
the input vectors were always fixed-width, however this is not always
the case since you can extract elements from scalable vectors and insert
into fixed-width ones. We were hitting crashes here for two different
cases:

1. When lowering a fixed-length vector extract from a scalable vector
with i1 element types. This happens due to the fact the i1 elements
get promoted to larger integer types for fixed-width vectors and leads
to sequences of INSERT_VECTOR_ELT and EXTRACT_VECTOR_ELT nodes. In this
case AArch64TargetLowering::ReconstructShuffle will still fail to make
a transformation, but at least it no longer crashes.
2. When lowering a sequence of extractelement/insertelement operations
on mixed fixed-width/scalable vectors.

For now, I've just changed AArch64TargetLowering::ReconstructShuffle to
bail out if it finds a scalable vector.

Tests for both instances described above have been added here:

(1) CodeGen/AArch64/sve-extract-fixed-vector.ll
(2) CodeGen/AArch64/sve-fixed-length-reshuffle.ll

Differential Revision: https://reviews.llvm.org/D116602

[mlir] Add missing dep to new cf dialect

[cross-project-tests] REQUIRES: system-darwin in llgdb-tests/asan-deque.cpp

Some configurations of gdb pretty print std::deque and some don't. Make
this test run only on system-darwin (which uses lldb instead), otherwise
it will fail on some non-darwin machines and not others.

[clang] Add test for C++ DR2390

DR2390 clarifies that the argument to __has_cpp_attribute() needs to be
macro-expanded. Clang already supports this and tests it explicitly in
clang/test/Preprocessor/has_attribute.cpp.

Copy the test over to clang/test/CXX/drs/ so the make_cxx_drs script
picks it up.

Differential Revision: https://reviews.llvm.org/D119230

[clang][tests] Add test for C++ DR2406

Clang already handles this fine, so add a test case to let the
make_cxx_dr_status script pick it up.

Differential Revision: https://reviews.llvm.org/D119224

[mlir][linalg] Fold tensor.pad(linalg.fill) with the same value

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D119160

[mlir][vector] Add helper that builds a scalar reduction according to CombiningKind

Differential Revision: https://reviews.llvm.org/D119433

[clang-tidy] add option performance-move-const-arg.CheckMoveToConstRef

This option allows callers to disable the warning from
https://clang.llvm.org/extra/clang-tidy/checks/performance-move-const-arg.html
that would warn on the following

```
void f(const string &s);
string s;
f(std::move(s)); // ALLOWED if performance-move-const-arg.CheckMoveToConstRef=false
```

The reason people might want to disable this check, is because it allows
callers to use `std::move()` or not based on local reasoning about the
argument, and without having to care about how the function `f` accepts
the argument. Indeed, `f` might accept the argument by const-ref today,
but change to by-value tomorrow, and if the caller had moved the
argument that they were finished with, the code would work as
efficiently as possible regardless of how `f` accepted the parameter.

Reviewed By: ymandel

Differential Revision: https://reviews.llvm.org/D119370

[InstCombine] Extend fold (icmp sgt smin(PosA, B) 0) -> (icmp sgt B 0) to support smin intrinsic

Replace matchSelectPattern pattern match with the more general m_SMin so that it can handle smin intrinsics as well as the icmp+select pattern

Noticed while reviewing regressions from D98152

[InstCombine] Add test showing failure to fold (icmp sgt smin(PosA, B) 0) -> (icmp sgt B 0) with smin intrinsic

Noticed while reviewing regressions from D98152

[InstCombine] reduce mul operands based on undemanded high bits

We already do this in SDAG, but mul was left out of the fold
for unused high bits in IR.

The high bits of a mul's operands do not change the low bits
of the result:
https://alive2.llvm.org/ce/z/XRj5Ek

Verify some test diffs to confirm that they are correct:
https://alive2.llvm.org/ce/z/y_W8DW
https://alive2.llvm.org/ce/z/7DM5uf
https://alive2.llvm.org/ce/z/GDiHCK

This gets a fold that was presumed not possible in D114272:
https://alive2.llvm.org/ce/z/tAN-WY

Removing nsw/nuw is needed for general correctness (and is
also done in the codegen version), but we might be able to
recover more of those with better analysis.

Differential Revision: https://reviews.llvm.org/D119369

[FastISel] Remove redundant reg class check (NFC)

SrcVT and DstVT are the same in this branch, as such their register
classes will also be the same.

[CodeGen] Regenerate test checks (NFC)