review.tizen.org Git - platform/upstream/llvm.git/log

[tsan] Relax atexit5.cpp a bit more so it's not as dependent on the standard library implementation

[clang][deps] NFC: Extract function

This commits extracts a couple of nested conditions into a separate function with early returns, making the control flow easier to understand.

Added line numbers to the debug output of PDL bytecode.

This is a small diff that splits out the debug output for PDL bytecode. When running bytecode with debug output on, it is useful to know the line numbers where the PDLIntepr operations are performed. Usually, these are in a single MLIR file, so it's sufficient to print out the line number rather than the entire location (which tends to be quite verbose). This debug output is gated by `LLVM_DEBUG` rather than `#ifndef NDEBUG` to make it easier to test.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D114061

Multi-root PDL matching using upward traversals.

This is commit 4 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

This PR integrates the various components (root ordering algorithm, nondeterministic execution of PDL bytecode) to implement multi-root PDL matching. The main idea is for the pattern to specify mulitple candidate roots. The PDL-to-PDLInterp lowering selects one of these roots and "hangs" the pattern from this root, traversing the edges downwards (from operation to its operands) when possible and upwards (from values to its uses) when needed. The root is selected by invoking the optimal matching multiple times, once for each candidate root, and the connectors are determined form the optimal matching. The costs in the directed graph are equal to the number of upward edges that need to be traversed when connecting the given two candidate roots. It can be shown that, for this choice of the cost function, "hanging" the pattern an inner node is no better than from the optimal root.

The following three main additions were implemented as a part of this PR:
1. OperationPos predicate has been extended to allow tracing the operation accepting a value (the opposite of operation defining a value).
2. Predicate checking if two values are not equal - this is useful to ensure that we do not traverse the edge back downwards after we traversed it upwards.
3. Function for for building the cost graph among the candidate roots.
4. Updated buildPredicateList, building the predicates optimal branching has been determined.

Testing: unit tests (an integration test to follow once the stack of commits has landed)

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D108550

Implementation of the root ordering algorithm

This is commit 3 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

We form a graph over the specified roots, provided in `pdl.rewrite`, where two roots are connected by a directed edge if the target root can be connected (via a chain of operations) in the underlying pattern to the source root. We place a restriction that the path connecting the two candidate roots must only contain the nodes in the subgraphs underneath these two roots. The cost of an edge is the smallest number of upward traversals (edges) required to go from the source to the target root, and the connector is a `Value` in the intersection of the two subtrees rooted at the source and target root that results in that smallest number of such upward traversals. Optimal root ordering is then formulated as the problem of finding a spanning arborescence (i.e., a directed spanning tree) of minimal weight.

In order to determine the spanning arborescence (directed spanning tree) of minimum weight, we use the [Edmonds' algorithm](https://en.wikipedia.org/wiki/Edmonds%27_algorithm). The worst-case computational complexity of this algorithm is O(_N_^3) for a single root, where _N_ is the number of specified roots. The `pdl`-to-`pdl_interp` lowering calls this algorithm as a subroutine _N_ times (once for each candidate root), so the overall complexity of root ordering is O(_N_^4). If needed, this complexity could be reduced to O(_N_^3) with a more efficient algorithm. However, note that the underlying implementation is very efficient, and _N_ in our instances tends to be very small (<10). Therefore, we believe that the proposed (asymptotically suboptimal) implementation will suffice for now.

Testing: a unit test of the algorithm

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D108549

Introduced iterative bytecode execution.

This is commit 2 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

This commit implements the features needed for the execution of the new operations pdl_interp.get_accepting_ops, pdl_interp.choose_op:
1. The implementation of the generation and execution of the two ops.
2. The addition of Stack of bytecode positions within the ByteCodeExecutor. This is needed because in pdl_interp.choose_op, we iterate over the values returned by pdl_interp.get_accepting_ops until we reach finalize. When we reach finalize, we need to return back to the position marked in the stack.
3. The functionality to extend the lifetime of values that cross the nondeterministic choice. The existing bytecode generator allocates the values to memory positions by representing the liveness of values as a collection of disjoint intervals over the matcher positions. This is akin to register allocation, and substantially reduces the footprint of the bytecode executor. However, because with iterative operation pdl_interp.choose_op, execution "returns" back, so any values whose original liveness cross the nondeterminstic choice must have their lifetime executed until finalize.

Testing: pdl-bytecode.mlir test

Reviewed By: rriddle, Mogball

Differential Revision: https://reviews.llvm.org/D108547

[AArch64][SVEIntrinsicOpts] Fix: predicated SVE mul/fmul are not commutative

We can not swap multiplicand and multiplier because the sve intrinsics
are predicated. Imagine lanes in vectors having the following values:
         pg = 0
         multiplicand = 1 (from dup)
         multiplier = 2
The resulting value should be 1, but if we swap multiplicand and multiplier it will become 2,
which is incorrect.

Differential Revision: https://reviews.llvm.org/D114577

[Docs] Removed /Zd flag still mentioned in documentation

https://reviews.llvm.org/D93458 removed the /Zd flag as MSVC doesn't support that syntax. Instead users should be using -gline-tables-only.
The /Zd flag is still mentioned at https://clang.llvm.org/docs/UsersManual.html#clang-cl : /Zd Emit debug line number tables only.

Fix PR52571

Reviewed By: xgupta

Differential Revision: https://reviews.llvm.org/D114632

Defines new PDLInterp operations needed for multi-root matching in PDL.

This is commit 1 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

These operations are:
* pdl.get_accepting_ops: Returns a list of operations accepting the given value or a range of values at the specified position. Thus if there are two operations `%op1 = "foo"(%val)` and `%op2 = "bar"(%val)` accepting a value at position 0, `%ops = pdl_interp.get_accepting_ops of %val : !pdl.value at 0` will return both of them. This allows us to traverse upwards from a value to operations accepting the value.
* pdl.choose_op: Iteratively chooses one operation from a range of operations. Therefore, writing `%op = pdl_interp.choose_op from %ops` in the example above will select either `%op1`or `%op2`.

Testing: Added the corresponding test cases to mlir/test/Dialect/PDLInterp/ops.mlir.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D108543

[libunwind][ARM] Handle end of stack during unwind

When unwind step reaches the end of the stack that means the force unwind should notify the stop function.
This is not an error, it could mean just the thread is cleaned up completely.

Reviewed By: #libunwind, mstorsjo

Differential Revision: https://reviews.llvm.org/D109856

[AMDGPU] Add SIMemoryLegalizer comments to clarify bit usage

Attempt to further document the intended cache policies requested
by different combinations of GLC, SLC and DLC bits.
GFX10 non-temporal stores are updated to set GLC.

Reviewed By: t-tye

Differential Revision: https://reviews.llvm.org/D114351

[GlobalISel] Fold or of shifts to funnel shift.

This change folds a basic funnel shift idiom:
- (or (shl x, amt), (lshr y, sub(bw, amt))) -> fshl(x, y, amt)
- (or (shl x, sub(bw, amt)), (lshr y, amt)) -> fshr(x, y, amt)

This also helps in folding to rotate shift if x and y are equal since we
already have a funnel shift to rotate combine.

Differential Revision: https://reviews.llvm.org/D114499

[LoopVectorize] When tail-folding, don't always predicate uniform loads

In VPRecipeBuilder::handleReplication if we believe the instruction
is predicated we then proceed to create new VP region blocks even
when the load is uniform and only predicated due to tail-folding.

I have updated isPredicatedInst to avoid treating a uniform load as
predicated when tail-folding, which means we can do a single scalar
load and a vector splat of the value.

Tests added here:

Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll

Differential Revision: https://reviews.llvm.org/D112552

[clang][deps] NFC: Clean up wording (ignored vs minimized)

The filesystem used during dependency scanning does two things: it caches file entries and minimizes source file contents. We use the term "ignored file" in a couple of places, but it's not clear what exactly that means. This commit clears up the semantics, explicitly spelling out this relates to minimization.

[clang][deps] NFC: Remove else after early return

[AArch64][SVE] Generate ASRD instructions for power of 2 signed divides

Differential Revision: https://reviews.llvm.org/D113281

[ARM] Generate VCTP from SETCC

This converts a vector SETCC([0,1,2,..], splat(n), ult) to vctp n, which
can be fewer instructions and prevent the need for constant pool loads.

Differential Revision: https://reviews.llvm.org/D114177

[DAG] SimplifyDemandedVectorElts - attempt to handle ADD(x,x) as single use

If the ADD node is the only user of the repeated operand, then treat this as single use - allows us to peek through shl(x,1) patterns.

[lldb] Fix 'memory write' to not allow specifying values when writing file contents

Currently the 'memory write' command allows specifying the values when
writing the file contents to memory but the values are actually ignored. This
patch fixes that by erroring out when values are specified in such cases.

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D114544

[libcxx] Implement three-way comparison for std::reverse_iterator

This patch implements operator<=> for std::reverse_iterator and
also adds a test that checks that three-way comparison of different
instantiations of std::reverse_iterator works as expected (related to
D113417).

Reviewed By: ldionne, Quuxplusone, #libc

Differential Revision: https://reviews.llvm.org/D113695

[clang] Fix crash on broken parameter declarators

Differential Revision: https://reviews.llvm.org/D114609

[ARM] Add some vctp from setcc tests. NFC

[clang] Change ordering of PreableCallbacks to make sure PP can be referenced in them

Currently, BeforeExecute is called before BeginSourceFile which does not allow
using PP in the callbacks. Change the ordering to ensure it is possible.
This is a prerequisite for D114370.

Originated from a discussion with @kadircet.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D114525

[CodeGen] Add scalable vector support for lowering of llvm.get.active.lane.mask

Currently the generic lowering of llvm.get.active.lane.mask is done
in SelectionDAGBuilder::visitIntrinsicCall and currently assumes
only fixed-width vectors are used. This patch changes the code to be
more generic and support scalable vectors too. I have added tests
for SVE here:

CodeGen/AArch64/active_lane_mask.ll

although the code quality leaves a lot to be desired. The code will
be improved significantly in a later patch that makes use of the
SVE whilelo instruction.

Differential Revision: https://reviews.llvm.org/D114541

[mlir][linalg] Simplify the hoist padding tests.

Use primarily matvec instead of matmul to test hoist padding. Test the hoisting only starting from already padded IR. Use one-dimensional tiling only except for the tile_and_fuse test that exercises hoisting on a larger loop nest with fill and pad tensor operations in the backward slice.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114608

[clang][AST] Check context of record in structural equivalence.

The AST structural equivalence check did not differentiate between
a struct and a struct with same name in different namespace. When
type of a member is checked it is possible to encounter such a case
and wrongly decide that the types are similar. This problem is fixed
by check for the namespaces of a record declaration.

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D113118

[mlir][Vector] Minor formatting fixes in Vector.md

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D113854

tsan: remember and print function that installed at_exit callbacks

Sometimes stacks for at_exit callbacks don't include any of the user functions/files.
For example, a race with a global std container destructor will only contain
the container type name and our at_exit_wrapper function. No signs what global variable
this is.
Remember and include in reports the function that installed the at_exit callback.
This should give glues as to what variable is being destroyed.

Depends on D114606.

Reviewed By: vitalybuka, melver

Differential Revision: https://reviews.llvm.org/D114607

tsan: add a test for on_exit

Depends on D114605.

Reviewed By: vitalybuka, melver

Differential Revision: https://reviews.llvm.org/D114606

tsan: add test for __cxa_atexit

Add a test for a common C++ bug when a global object is destroyed
while background threads still use it.

Depends on D114604.

Reviewed By: vitalybuka, melver

Differential Revision: https://reviews.llvm.org/D114605

tsan: check stack in atexit4.cpp test

Reviewed By: vitalybuka, melver

Differential Revision: https://reviews.llvm.org/D114604

[llvm] Use range-based for loops (NFC)

[AMDGPU] Make vector superclasses allocatable

The combined vector register classes with both
VGPRs and AGPRs are currently unallocatable.
This patch turns them into allocatable as a
prerequisite to enable copy between VGPR and
AGPR registers during regalloc.

Also, added the missing AV register classes from
192b to 1024b.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D109300

[ELF] Rename BaseCommand to SectionCommand. NFC

BaseCommand was picked when PHDRS/INSERT/etc were not implemented. Rename it to
SectionCommand to match `sectionCommands` and make it clear that the commands
are used in SECTIONS (except a special case for SymbolAssignment).

Also, improve naming of some BaseCommand variables (base -> cmd).

[NFC] Fix typo in 95875d246acb

[mlir][linalg][bufferize][NFC] Pass BufferizationState to PostAnalysisStep

Pass BufferizationStep instead of BufferizationAliasInfo. Note: BufferizationState contains BufferizationAliasInfo.

Differential Revision: https://reviews.llvm.org/D114512

[mlir][linalg][bufferize] Compose dialect-specific bufferization state

Use composition instead of inheritance for storing dialect-specific bufferization state. This is in preparation of adding "tensor dialect"-specific bufferization state.

Differential Revision: https://reviews.llvm.org/D114508

[mlir][linalg][bufferize][NFC] Allow returning arbitrary memrefs

If `allowReturnMemref` is set to true, arbitrary memrefs may be returned from FuncOps. Also remove allocation hoisting code, which is only partly implemented at the moment.

The purpose of this commit is to untangle `bufferize` from `aliasInfo`. (Even with this change, they are not fully untangled yet.)

Differential Revision: https://reviews.llvm.org/D114507

[mlir][linalg][bufferize][NFC] Extract func boundary bufferization

Bufferization of function boundaries is extracted from ComprehensiveBufferize into a separate file. This will become its own build target in the future.

Differential Revision: https://reviews.llvm.org/D114226

[ELF] Make ExprValue smaller. NFC'

[ELF] Rename OutputSection::sectionCommands to commands. NFC

This partially reverts r315409: the description applies to LinkerScript, but not
to OutputSection.

The name "sectionCommands" is used in both LinkerScript::sectionCommands and
OutputSection::sectionCommands, which may lead to confusion.
"commands" in OutputSection has no ambiguity because there are no other types
of commands.

[mlir][linalg][bufferize][NFC] Move Affine interface impl to new build target

This makes ComprehensiveBufferize entirely independent of the Affine dialect.

Differential Revision: https://reviews.llvm.org/D114222

Fix link to the other docs from the Bufferization dialect

[ELF] Remove redundant part.dynSymTab creation. NFC

[ELF] Simplify GnuHashSection::write. NFC

[ELF] Simplify DynamicSection content computation. NFC

The new code computes the content twice, but avoides the tricky
std::function<uint64_t()>. Removed 13KiB code in a Release build.

[DebugInfo][InstrRef] Add extra indirection for NRVO tests

In some scenarios, usually involving NRVO, we can issue indirect DBG_VALUEs
after SelectionDAG, even in instruction referencing mode (if the variable
is an argument). If the corresponding argument value is spilt to the stack,
then we have:
* Indirection from it being on the stack,
* Indirection from it being a dbg.declare or a dbg.addr.

However InstrRefBasedLDV only emits one level of indirection. This patch
adds the second, by adding an extra DW_OP_deref if necessary. The two
tests modified fail otherwise -- they feature some NRVO, and require two
levels of indirection to be correct.

Differential Revision: https://reviews.llvm.org/D114364

[clang][NFC] Inclusive terms: rename AccessDeclContextSanity to AccessDeclContextCheck

Rename function to more inclusive name.

Reviewed By: quinnp

Differential Revision: https://reviews.llvm.org/D114029

[NFC] Inclusive language: rename master flag to main flag

[NFC] As part of using inclusive language within the llvm project, this patch
renames master flag to main flag in these comments.

Reviewed By: ZarkoCA

Differential Revision: https://reviews.llvm.org/D114090

[NFC][flang] Inclusive language: remove instances of master

[NFC] As part of using inclusive language within the llvm project, this patch:
- replaces master with main in C++style.md to match the renaming of the master
branch,
- removes master from `FortranIR.md` where it is superfluous,
- renames a logical variable in `pre-fir-tree04.f90` containing master.

Reviewed By: ZarkoCA

Differential Revision: https://reviews.llvm.org/D113923

[DebugInfo][InstrRef] Avoid some quadratic behaviour in LiveDebugVariables

This is a performance patch -- LiveDebugVariables can behave quadratically
if a lot of debug instructions are inserted back into the same place, and
we have to repeatedly step-over hte ones we've already inserted.

To get around it, whenever we insert a debug instruction at a slot index,
check whether there are more debug instructions to insert at this point,
and insert them too. That avoids the repeated lookup and stepping through.
It relies on the container for unlinked debug instructions being recorded
in-order, which is how LiveDebugVariables currently does it.

Differential Revision: https://reviews.llvm.org/D114587

[libunwind] Fix testing with sanitizers enabled

When testing with sanitizers enabled, we need to link against a plethora
of system libraries. Using `-nodefaultlibs` like we used to breaks this,
and we would have to add all these system libraries manually, which is
not portable and error prone. Instead, stop using `-nodefaultlibs` so
that we get the libraries added by default by the compiler.

The only caveat with this approach is that we are now relying on the
fact that `-L <path-to-local-libunwind>` will cause the just built
libunwind to be selected before the system implementation (either of
libunwind or libgcc_s.so), which is somewhat fragile.

This patch also turns the 32 bit multilib build into a soft failure
since we are in the process of removing it anyway, see D114473 for
details. This patch is incompatible with the 32 bit multilib build
because Ubuntu does not provide a proper libstdc++ for 32 bits, and
that is required when running with sanitizers enabled.

Differential Revision: https://reviews.llvm.org/D114385

[ThreadPool] Use auto again for future with ENABLE_THREADS=Off.

This fixes a build failure with LLVM_ENABLE_THREADS=Off.

[mlir][Vector] Support 0-D vectors in `VectorPrintOpConversion`

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114549

[AIX] Disable unsupported offloading gpu tests

GPUs are not supported on AIX, so this patch sets these tests as unsupported.

Reviewed By: stevewan

Differential Revision: https://reviews.llvm.org/D114381

Recommit [ThreadPool] Support returning futures with results.

This reverts commit 71a7c55f0f021b04b9a7303d0cd391b9161cf303.

The revert broken building llvm-reduce and it is not clear it fixes an
issue with LLVM_ENABLE_THREADS=OFF.

See discussion in https://reviews.llvm.org/D114183 for more details.

[clang-format] Extend AllowShortBlocksOnASingleLine for else blocks

Extend AllowShortBlocksOnASingleLine for else blocks. See https://bugs.llvm.org/show_bug.cgi?id=49722

Reviewed By: HazardyKnusperkeks, owenpan, MyDeveloperDay

Differential Revision: https://reviews.llvm.org/D114320

[NFC][llvm] Inclusive language: replace master in llvm docs

[NFC] As part of using inclusive language within the llvm project, this patch
removes instances of master in these files.

Reviewed By: ZarkoCA

Differential Revision: https://reviews.llvm.org/D114187

[clang-format] NFC update LLVM overall clang-formatted status

Whilst the % clang-formatted remains the same, the number
of files added to the LLVM project has risen by almost by 259.

- 190 of them have been added clang-format clean.
- 69 files have been added unformatted. (lit tests should be excluded from this number)

- 291 files have been added to the list of files that are clang-format clean
- 101 files have either become unclean or have been removed

As this updates the clang-formatted-files there are now
8139 files that are clean which we can be used as a regression test when making changes to clang-format.

```
clang-format -verbose -n -files ./clang/docs/tools/clang-formatted-files.txt
```

[NFC][compiler-rt] Inclusive language: replace master/slave with primary/secondary

[NFC] As part of using inclusive language within the llvm project, this patch
replaces master and slave with primary and secondary respectively in
`sanitizer_mac.cpp`.

Reviewed By: ZarkoCA

Differential Revision: https://reviews.llvm.org/D114255

[MLIR] NFC. Rename MLIR CAPI ExecutionEngine target for consistency

Rename MLIR CAPI ExecutionEngine target for consistency:
MLIRCEXECUTIONENGINE -> MLIRCAPIExecutionEngine in line with other
targets.

Differential Revision: https://reviews.llvm.org/D114596

[PowerPC] Prevent the optimizer from producing wide vector types in IR.

This patch prevents the optimizer from producing wide vectors in the IR,
specifically the MMA types (v256i1, v512i1). The idea is that on Power, we only
want to be producing these types only if the vector_pair and vector_quad types
are used specifically.

To prevent the optimizer from producing these types in the IR,
vectorCostAdjustmentFactor() is updated to return an instruction cost factor or
an invalid instruction cost if the current type is that of an MMA type. An
invalid instruction cost returned by this function signifies to other cost
computing functions to return the maximum instruction cost to inform the
optimizer that producing these types within the IR is expensive, and should not
be produced in the first place.

This issue was first seen in the test case included within this patch.

Differential Revision: https://reviews.llvm.org/D113900

[NFC][llvm] Inclusive language: replace master with main in dbg-call-site-spilled-arg.mir

[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with main in `dbg-call-site-spilled-arg.mir`.

Reviewed By: ZarkoCA

Differential Revision: https://reviews.llvm.org/D114097

[LLDB] Provide target specific directories to libclang

On Linux some C++ and C include files reside in target specific directories, like /usr/include/x86_64-linux-gnu.
Patch adds them to libclang, so LLDB jitter has more chances to compile expression.

OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg

Reviewed By: teemperor

Differential Revision: https://reviews.llvm.org/D110827

[libc++] Fix ssize test that made an assumption about ptrdiff_t being 'long'

On some platforms like armv7m, the size() method of containers returns
unsigned long, while ptrdiff_t is just int. Hence, std::ssize_t ends up
being long, which is not the same as ptrdiff_t. This is usually not an
issue because std::ptrdiff_t is long, so everything works out, but it
breaks on some more exotic architectures.

Differential Revision: https://reviews.llvm.org/D114563

[CMake] Add new cmake option to control adding comments in GenDAGISel

Add new cmake option `LLVM_OMIT_DAGISEL_COMMENTS` to control adding
of comments in GenDAGISel for none debug builds

Ref: https://reviews.llvm.org/D78884

Reviewed By: nemanjai, MaskRay, #powerpc

Differential Revision: https://reviews.llvm.org/D114122

tsan: new runtime (v3)

This change switches tsan to the new runtime which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutimes

Depends on D112602.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D112603

Revert "[ThreadPool] Support returning futures with results."

This reverts commit 6149e57dc1313d32c85524f8009a1249e0b8f4d1.

The offending commit broke building with LLVM_ENABLE_THREADS=OFF.

[NFC][clang-tools-extra] Inclusive language: replace master with main

[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with main in `SubModule2.h`.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D114100

[llvm] Use range-based for loops (NFC)

[libc++] Avoid overload resolution in path comparison operators

Rework `std::filesystem::path::operator==` and friends to avoid overload
resolution and atomic constraint caching issues shown from
https://reviews.llvm.org/D113161.

Always call `__compare(string_view)` from the comparison operators which avoids
overload resolution.

Differential Revision: https://reviews.llvm.org/D114570

[libc++] Fix constraints for string_view's iterator/sentinel constructor

The `string_view` constructor taking an iterator/sentinel uses concepts
instead of type traits like the Standard states. Using `same_as` instead
of `is_same_v` should be harmless. Prefer `std::is_same_v` instead which is
cheaper to compile. Replace `convertible_to` with `is_convertible_v` as
well.

This observation came up while working on
https://reviews.llvm.org/D113161

Differential Revision: https://reviews.llvm.org/D114561

tsan: fix another potential deadlock in fork

Linux/fork_deadlock.cpp currently hangs in debug mode in the following stack.
Disable memory access handling in OnUserAlloc/Free around fork.

1  0x000000000042c54b in __sanitizer::internal_sched_yield () at sanitizer_linux.cpp:452
2  0x000000000042da15 in __sanitizer::StaticSpinMutex::LockSlow (this=0x57ef02 <__sanitizer::internal_allocator_cache_mu>) at sanitizer_mutex.cpp:24
3  0x0000000000423927 in __sanitizer::StaticSpinMutex::Lock (this=0x57ef02 <__sanitizer::internal_allocator_cache_mu>) at sanitizer_mutex.h:32
4  0x000000000042354c in __sanitizer::GenericScopedLock<__sanitizer::StaticSpinMutex>::GenericScopedLock (this=this@entry=0x7ffcabfca0b8, mu=0x1) at sanitizer_mutex.h:367
5  0x0000000000423653 in __sanitizer::RawInternalAlloc (size=size@entry=72, cache=cache@entry=0x0, alignment=8, alignment@entry=0) at sanitizer_allocator.cpp:52
6  0x00000000004235e9 in __sanitizer::InternalAlloc (size=size@entry=72, cache=0x1, cache@entry=0x0, alignment=4, alignment@entry=0) at sanitizer_allocator.cpp:86
7  0x000000000043aa15 in __sanitizer::SymbolizedStack::New (addr=4802655) at sanitizer_symbolizer.cpp:45
8  0x000000000043b353 in __sanitizer::Symbolizer::SymbolizePC (this=0x7f578b77a028, addr=4802655) at sanitizer_symbolizer_libcdep.cpp:90
9  0x0000000000439dbe in __sanitizer::(anonymous namespace)::StackTraceTextPrinter::ProcessAddressFrames (this=this@entry=0x7ffcabfca208, pc=4802655) at sanitizer_stacktrace_libcdep.cpp:36
10 0x0000000000439c89 in __sanitizer::StackTrace::PrintTo (this=this@entry=0x7ffcabfca2a0, output=output@entry=0x7ffcabfca260) at sanitizer_stacktrace_libcdep.cpp:109
11 0x0000000000439fe0 in __sanitizer::StackTrace::Print (this=0x18) at sanitizer_stacktrace_libcdep.cpp:132
12 0x0000000000495359 in __sanitizer::PrintMutexPC (pc=4802656) at tsan_rtl.cpp:774
13 0x000000000042e0e4 in __sanitizer::InternalDeadlockDetector::Lock (this=0x7f578b1ca740, type=type@entry=2, pc=pc@entry=4371612) at sanitizer_mutex.cpp:177
14 0x000000000042df65 in __sanitizer::CheckedMutex::LockImpl (this=<optimized out>, pc=4) at sanitizer_mutex.cpp:218
15 0x000000000042bc95 in __sanitizer::CheckedMutex::Lock (this=0x600001000000) at sanitizer_mutex.h:127
16 __sanitizer::Mutex::Lock (this=0x600001000000) at sanitizer_mutex.h:165
17 0x000000000042b49c in __sanitizer::GenericScopedLock<__sanitizer::Mutex>::GenericScopedLock (this=this@entry=0x7ffcabfca370, mu=0x1) at sanitizer_mutex.h:367
18 0x000000000049504f in __tsan::TraceSwitch (thr=0x7f578b1ca980) at tsan_rtl.cpp:656
19 0x000000000049523e in __tsan_trace_switch () at tsan_rtl.cpp:683
20 0x0000000000499862 in __tsan::TraceAddEvent (thr=0x7f578b1ca980, fs=..., typ=__tsan::EventTypeMop, addr=4499472) at tsan_rtl.h:624
21 __tsan::MemoryAccessRange (thr=0x7f578b1ca980, pc=4499472, addr=135257110102784, size=size@entry=16, is_write=true) at tsan_rtl_access.cpp:563
22 0x000000000049853a in __tsan::MemoryRangeFreed (thr=thr@entry=0x7f578b1ca980, pc=pc@entry=4499472, addr=addr@entry=135257110102784, size=16) at tsan_rtl_access.cpp:487
23 0x000000000048f6bf in __tsan::OnUserFree (thr=thr@entry=0x7f578b1ca980, pc=pc@entry=4499472, p=p@entry=135257110102784, write=true) at tsan_mman.cpp:260
24 0x000000000048f61f in __tsan::user_free (thr=thr@entry=0x7f578b1ca980, pc=4499472, p=p@entry=0x7b0400004300, signal=true) at tsan_mman.cpp:213
25 0x000000000044a820 in __interceptor_free (p=0x7b0400004300) at tsan_interceptors_posix.cpp:708
26 0x00000000004ad599 in alloc_free_blocks () at fork_deadlock.cpp:25
27 __tsan_test_only_on_fork () at fork_deadlock.cpp:32
28 0x0000000000494870 in __tsan::ForkBefore (thr=0x7f578b1ca980, pc=pc@entry=4904437) at tsan_rtl.cpp:510
29 0x000000000046fcb4 in syscall_pre_fork (pc=1) at tsan_interceptors_posix.cpp:2577
30 0x000000000046fc9b in __sanitizer_syscall_pre_impl_fork () at sanitizer_common_syscalls.inc:3094
31 0x00000000004ad5f5 in myfork () at syscall.h:9
32 main () at fork_deadlock.cpp:46

Depends on D114595.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114597

tsan: fix Java heap block begin in reports

We currently use a wrong value for heap block
(only works for C++, but not for Java).
Use the correct value (we already computed it before, just forgot to use).

Depends on D114593.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114595

tsan: add a benchmark for vector memory accesses

Depends on D114592.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114593

tsan: add a test for vector memory accesses

Add a basic test that checks races between vector/non-vector
read/write accesses of different sizes/offsets in different orders.
This gives coverage of __tsan_read/write16 callbacks.

Depends on D114591.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114592

tsan: enable -msse4 when compiling tests

Vector SSE accesses make compiler emit __tsan_[unaligned_]read/write16 callbacks.
Make it possible to test these.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114591

[ARM] Convert fptoi.sat to fixed point multiply

This is a very small addition to the existing MVE fixed point vcvt code
to also create them from FP_TO_SINT_SAT and FP_TO_UINT_SAT nodes, which
should be equally valid for native saturating converts under MVE.

Differential Revision: https://reviews.llvm.org/D114360

[llvm][ubsan] Inclusive language: replace use of blacklist HandleLLVMOptions.cmake but use old option name

Retry at https://reviews.llvm.org/D113689, this time with using the old option name
to support older versions of clang.

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D114033

[DebugInfo][InstrRef] Track variable assignments in out-of-scope blocks

DBG_INSTR_REF's and DBG_VALUE's can end up in blocks that aren't in the
lexical scope of their variable. It's arguable as to what we should do
about this, however VarLocBasedLDV permits such variable locations to be
propagated, so let's allow it in InstrRefBasedLDV.

It's necessary for the modified test to work.

Differential Revision: https://reviews.llvm.org/D114578

[ARM] Add fptosi.sat variants of the fixed point vcvt tests. NFC

[clang][OpenMP][DebugInfo] Debug support for private variables inside an OpenMP task construct

Currently variables appearing inside private/firstprivate/lastprivate
clause of openmp task construct are not visible inside lldb debugger.
This is because compiler does not generate debug info for it.

Please consider the testcase debug_private.c attached with patch.

```
   28   #pragma omp task shared(res) private(priv1, priv2) firstprivate(fpriv)
   29       {
   30         priv1 = n;
   31         priv2 = n + 2;
   32         printf("Task n=%d,priv1=%d,priv2=%d,fpriv=%d\n",n,priv1,priv2,fpriv);
   33
-> 34         res = priv1 + priv2 + fpriv + foo(n - 1);
   35       }
   36   #pragma omp taskwait
   37       return res;
(lldb) p priv1
error: <user expression 0>:1:1: use of undeclared identifier 'priv1'
priv1
^
(lldb) p priv2
error: <user expression 1>:1:1: use of undeclared identifier 'priv2'
priv2
^
(lldb) p fpriv
error: <user expression 2>:1:1: use of undeclared identifier 'fpriv'
fpriv
^
```

After the current patch, lldb is able to show the variables

```
(lldb) p priv1
(int) $0 = 10
(lldb) p priv2
(int) $1 = 12
(lldb) p fpriv
(int) $2 = 14
```

Reviewed By: djtodoro

Differential Revision: https://reviews.llvm.org/D114504

Don't store nullptrs in mlir::FuncOp::getAll*Attrs' result

These methods for results and arguments would create an ArrayRef full
of nullptrs when there were no argument attributes. This is problematic
because this result could not be passed to the FuncOp::build creator
without causing a segfault. Now the list will have empty attributes.

Differential Revision: https://reviews.llvm.org/D114358

[PowerPC/ Regenerate fp128-bitcast-after-operation test checks

Revert "[SLP]Improve analysis/emission of vector operands for alternate nodes."

This reverts commit 496254cf802a21e1967b61dec48017b8ec831574 to fix
compiler crashes reported in D114101#3152982.

[MLIR] [docs] Fix misguided examples in memref.subview operation.

The examples in `memref.subview` operation are misguided in that subview's strides operands mean "memref-rank number of strides that compose multiplicatively with the base memref strides in each dimension.".
So the below examples should be changed from `Strides: [64, 4, 1]` to `Strides: [1, 1, 1]`

Before changes
```
// Subview with constant offsets, sizes and strides.
%1 = memref.subview %0[0, 2, 0][4, 4, 4][64, 4, 1]
  : memref<8x16x4xf32, (d0, d1, d2) -> (d0 * 64 + d1 * 4 + d2)> to
    memref<4x4x4xf32, (d0, d1, d2) -> (d0 * 64 + d1 * 4 + d2 + 8)>
```

After changes
```
// Subview with constant offsets, sizes and strides.
%1 = memref.subview %0[0, 2, 0][4, 4, 4][1, 1, 1]
  : memref<8x16x4xf32, (d0, d1, d2) -> (d0 * 64 + d1 * 4 + d2)> to
    memref<4x4x4xf32, (d0, d1, d2) -> (d0 * 64 + d1 * 4 + d2 + 8)>
```

Also I fixed some syntax issues in docs related with memref layout map and added detailed explanation in subview rank reducing case.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D114500

[NFC][llvm] Inclusive language: reword uses of sanity test and check

Part of continuing work to use more inclusive language. Reworded uses
of sanity check and sanity test in llvm/test/

[clangd] Move IncludeCleaner tracer to the actual computation

This way we won't get results with 0 ms for all the users with disabled
IncludeCleaner.

[clang-format] [C++20] [Module] clang-format couldn't recognize partitions

https://bugs.llvm.org/show_bug.cgi?id=52517

clang-format is butchering modules, this could easily become a barrier to entry for modules given clang-formats wide spread use.

Prevent the following from adding spaces around the `:` (cf was considering the ':' as an InheritanceColon)

Reviewed By: HazardyKnusperkeks, owenpan, ChuanqiXu

Differential Revision: https://reviews.llvm.org/D114151

[lldb/gdb-remote] Ignore spurious ACK packets

Although I cannot find any mention of this in the specification, both
gdb and lldb agree on sending an initial + packet after establishing the
connection.

OTOH, gdbserver and lldb-server behavior is subtly different. While
lldb-server *expects* the initial ack, and drops the connection if it is
not received, gdbserver will just ignore a spurious ack at _any_ point
in the connection.

This patch changes lldb's behavior to match that of gdb. An ACK packet
is ignored at any point in the connection (except when expecting an ACK
packet, of course). This is inline with the "be strict in what you
generate, and lenient in what you accept" philosophy, and also enables
us to remove some special cases from the server code. I've extended the
same handling to NAK (-) packets, mainly because I don't see a reason to
treat them differently here.

(The background here is that we had a stub which was sending spurious
+ packets. This bug has since been fixed, but I think this change makes
sense nonetheless.)

Differential Revision: https://reviews.llvm.org/D114520

[lldb/gdb-remote] Remove initial pipe-draining workaround

This code, added in rL197579 (Dec 2013) is supposed to work around what
was presumably a qemu bug, where it would send unsolicited stop-reply
packets after the initial connection.

At present, qemu does not exhibit such behavior. Also, the 10ms delay
introduced by this code is sufficient to mask bugs in other stubs, but
it is not sufficient to *reliably* mask those bugs. This resulted in
flakyness in one of our stubs, which was (incorrectly) sending a +
packet at the start of the connection, resulting in a small-but-annoying
number of dropped connections.

Differential Revision: https://reviews.llvm.org/D114529

[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED)

If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.

For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.

Reapplied with fix for AArch64 rev patterns to matching the ARM fix.

https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)

Differential Revision: https://reviews.llvm.org/D114354

[clang-format]NFC improve the comment to match the code

Missing from {D114519}

[clang-format] [PR52595] clang-format does not recognize rvalue references to array

https://bugs.llvm.org/show_bug.cgi?id=52595

missing space between `T(&&)` but not between `T (&` due to && being incorrectly thought of as `UnaryOperator` rather than `PointerOrReference`

```
int operator()(T (&)[N]) { return 0; }
int operator()(T(&&)[N]) { return 1; }
```

Existing Unit tests are changed because actually I think they are originally incorrect, and are inconsistent with the (&) cases that are 4 or 5 lines above them.

Reviewed By: curdeius

Differential Revision: https://reviews.llvm.org/D114519

[mlir] Move memref.[tensor_load|buffer_cast|clone] to "bufferization" dialect.

https://llvm.discourse.group/t/rfc-dialect-for-bufferization-related-ops/4712

Differential Revision: https://reviews.llvm.org/D114552

[mlir][linalg] Cleanup hoisting test (NFC).

Rename the check prefixes to HOIST21 and HOIST32 to clarify the different flag configurations.

Depends On D114438

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114442

[mlir][linalg] Perform checks early in hoist padding.

Instead of checking for unexpected operations (any operation with a region except for scf::For and `padTensorOp` or operations with a memory effect) while cloning the packing loop nest perform the checks early. Update `dropNonIndexDependencies` to check for unexpected operations. Additionally, check all of these operations have index type operands only.

Depends On D114428

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114438

[mlir][linalg] Limit hoist padding to constant paddings.

Limit hoist padding to pad tensor ops that depend only on a constant value. Supporting arbitrary padding values that depend on computations part of the backward slice to hoist require complex analysis to ensure the computation can be hoisted.

Depends On D114420

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114428

[mlir][linalg] Add backward slice filtering in hoist padding.

Adapt hoist padding to filter the backward slice before cloning the packing loop nest. The filtering removes all operations that are not used to index the hoisted pad tensor op and its extract slice op. The filtering is needed to support the more complex loop nests created after fusion. For example, fusing the producer of an output operand can added linalg ops and pad tensor ops to the backward slice. These operations have regions and currently prevent hoisting.

The following example demonstrates the effect of the newly introduced `dropNonIndexDependencies` method that filters the backward slice:
```
%source = linalg.fill(%cst, %arg0)
scf.for %i
  %unrelated = linalg.fill(%cst, %arg1)    // not used to index %source!
  scf.for %j (%arg2 = %unrelated)
    scf.for %k                             // not used to index %source!
      %ubi = affine.min #map(%i)
      %ubj = affine.min #map(%j)
      %slice = tensor.extract_slice %source [%i, %j] [%ubi, %ubj]
      %padded_slice = linalg.pad_tensor %slice
```
dropNonIndexDependencies(%padded_slice, %slice)
removes [scf.for %k, linalg.fill(%cst, %arg1)] from backwardSlice.

Depends On D114175

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114420

[clangd] Add ObjC method support to prepareCallHierarchy

This fixes "textDocument/prepareCallHierarchy" in clangd for ObjC methods. Details at https://github.com/clangd/vscode-clangd/issues/247.

clangd uses Decl::isFunctionOrFunctionTemplate to check if the decl given in a prepareCallHierarchy request is eligible for prepareCallHierarchy. We change to use isFunctionOrMethod which includes functions and ObjC methods.

Reviewed By: kadircet

Differential Revision: https://reviews.llvm.org/D114058