review.tizen.org Git - platform/upstream/llvm.git/log

[InstCombine] add tests for pow reassociation; NFC

[NFC] [scudo] actually fix DCHECK now

[NFC] [scudo] fix mistake in DCHECK

sorry, my test build (and all the pre-merge bots) did not exercise this.

[OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation)
The current only way to obtain pinned memory with libomptarget is to use a custom allocator llvm_omp_target_alloc_host.
This reflects well the CUDA implementation of libomptarget, but it does not correctly expose the AMDGPU runtime API,
where any system allocated page can be locked/unlocked through a call to hsa_amd_memory_lock/unlock.
This patch enables users to allocate memory through malloc (mmap, sbreak) and then pin the related memory pages
with a libomptarget special call. It is a base support in the amdgpu libomptarget plugin to enable users to prelock
their host memory pages so that the runtime doesn't need to lock them itself for asynchronous memory transfers.

Reviewed By: jdoerfert, ye-luo

Differential Revision: https://reviews.llvm.org/D139208

AMDGPU: Fix format string indexes for existing llvm.printf.fmts

The index stored to the buffer is just an index into this named
metadata. It would more robust to produce a private constant table,
and use a constant expression to index into it.

Revert "[clang-scan-deps] Migrate to OptTable"

This reverts commit 0c60ec699fc1ccca2e444ceb041cad9b1dca3a64.

[clang-scan-deps] Migrate to OptTable

Differential Revision: https://reviews.llvm.org/D139949

[SimplifyCFG] Reapply: when eliminating `unreachable` landing pads, mark `call`s as `nounwind`

This time the change is in it's least intrusive form since only the return
type in prototype for `removeUnwindEdge()` is changed, since only a single
specific caller need that knowledge.

We really can't recover that knowledge, and `nounwind` knowledge,
(and not just a lack of the unwind edge, aka `call` instead of `invoke`),
is e.g. part of the reasoning in e.g. `mayHaveSideEffects()`.

Note that this is call-site-specific knowledge,
just because some callsite had an `unreachable`
unwind edge, does not mean that all will.

[mlir][gpu] Migrate hard-coded address space integers to an enum attribute (gpu::AddressSpaceAttr)

This is a purely mechanical change that introduces an enum attribute in the GPU
dialect to represent the various memref memory spaces as opposed to the
hard-coded integer attributes that are currently used.

The following steps were taken to make the transition across the codebase:

1. Introduce a pass "gpu-lower-memory-space-attributes":

The pass updates all memref types that have a memory space attribute that is a
`gpu::AddressSpaceAttr`. These attributes are changed to `IntegerAttr`'s using a
mapping that is given by the caller. This pass is based on the
"map-memref-spirv-storage-class" pass and the common functions can probably
be refactored into a set of utilities under the MemRef dialect.

2. Update the verifiers of GPU/NVGPU dialect operations.

If a verifier currently checks the address space of an operand using
e.g.`getWorkspaceAddressSpace`, then it can continue to do so. However, the
checks are changed to only fail if the memory space is either missing or a wrong
value of type `gpu::AddressSpaceAttr`. Otherwise, it just assumes the address
space is correct because it was specifically lowered to something other than a
`gpu::AddressSpaceAttr`.

3. Update existing gpu-to-llvm conversion infrastructure.

In the existing gpu-to-X passes, we add a full conversion equivalent to
`gpu-lower-memory-space-attributes` just before doing the conversion to the
LLVMDialect. This is done because currently both the gpu-to-llvm passes
(rocdl,nvvm) run gpu-to-gpu rewrites within the pass, which introduce
`AddressSpaceAttr` memory space annotations. Therefore, I inserted the
memory space conversion between the gpu-to-gpu rewrites and the LLVM
conversion.

For more context see the below discourse discussion:
https://discourse.llvm.org/t/gpu-workgroup-shared-memory-address-space-is-hard-coded/

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D140644

[Sanitizer] Clean up SANITIZER_CAN_USE_ALLOCATOR64 logic

Update: A change to this code recently broke our bots after this change enabled the 64 bit allocator for defined(aarch64): https://reviews.llvm.org/D137136

We added logic initially to get our test passing, but want to further clean up this code to enable MacOS to use allocator64 and increase readability and clarity of the logic.

rdar://103647896

Differential Revision: https://reviews.llvm.org/D141171

allocation_ring_buffer_size to 0 disables stack collection

Reviewed By: hctim, eugenis

Differential Revision: https://reviews.llvm.org/D141631

[VPlan] Use to_vector when iterating over a temporary vector. (NFC)

AMDGPU: Some printf call edge case tests

Check printf printing printf, and printf passed to a function.

AMDGPU: Don't expand printf users if printf is defined

[thread] Fix a FIXME with std::apply. NFCI

Revert "[lldb] Add Debugger & ScriptedMetadata reference to Platform::CreateInstance"

This reverts commit 2d53527e9c64c70c24e1abba74fa0a8c8b3392b1.

[libc++][doc] Updates the release notes.

This is a preparation for the upcoming LLVM 17 release.

Reviewed By: ldionne, philnik, avogelsgesang, jloser, #libc

Differential Revision: https://reviews.llvm.org/D141304

[mlir][sparse] Improve ConcatenateOp rewriting for annotated all dense result.

Previously, we rely on InsertOp to add values to the result, in the same way we
add values to a sparse tensor with compressed dimensions. We now direct store
values to the values buffer.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D141517

[NFC] Remove Function::getParamAlignment

Differential Revision: https://reviews.llvm.org/D141696

[AArch64] Add some tests for vscale being a power 2. NFC

[Clang][Sema] Enabled implicit conversion warning for CompoundAssignment operator.

This change enables implicit conversion warnings (like Wshorten-64-to-32) for compound assignment operator with integral operands.

rdar://10466193

Differential Revision: https://reviews.llvm.org/D139114

[libunwind] Fixed an upcoming clang -Wsign-conversion warning

Fixing an upcoming clang warning (from https://reviews.llvm.org/D139114) in libunwind.

Differential Revision: https://reviews.llvm.org/D141515

[mlir][spirv] Fix crash in spirv-lower-abi-attributes

... when the are no SPIR-V env attributes.

Fixes: https://github.com/llvm/llvm-project/issues/59983

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D141695

[mlir][vector] Support multiple result types in vector.mask

The verifier already had support for multiple result types, but the op definition assumed a single, optional result.

Differential Revision: https://reviews.llvm.org/D141683

[Driver][RISCV] Adjust the priority between -mcpu, -mtune and -march

RISC-V supports `-march`, `-mtune`, and `-mcpu`: `-march` provides the
architecture extension information, `-mtune` provide the pipeline model, and
`-mcpu` provides both.

What's the priority among those options for now(w/o this patch)?

Pipeline model:
- Take from `-mtune` if present.
- Take from `-mcpu` if present
- Use the default pipeline model: `generic-rv32` or `generic-rv64`
Architecture extension has quite complicated behavior now:
- Take union from `-march` and `-mcpu` if both are present.
- Take from `-march` if present.
- Take from `-mcpu` if present.
- Implied from `-mabi` if present.
- Use the default architecture depending on the target triple

We treat `-mcpu`/`-mtune` and `-mcpu`/`-march` differently, and it's
kind of counterintuitive: -march is explicitly specified but ignored.

This patch adjusts the priority between `-mcpu`/`-march`, letting it use
architecture extension information from `-march` if it's present.

So the priority of architecture extension information becomes:
- Take from `-march` if present.
- Take from `-mcpu` if present.
- Implied from `-mabi` if present.
- Use the default architecture depending on the target triple

And this also match what we implement in RISC-V GCC too.

Reviewed By: craig.topper, MaskRay

Differential Revision: https://reviews.llvm.org/D140693

[mlir][vector] Add scalable vectors support to OuterProductOp

This will probably be the first in a series of patches that tries to
enable code generation for ARM SME (extension of SVE).

Since SME's core operation is the outer product instruction, I figured
that it would probably be a good idea to enable the outer product
operation to properly accept and generate scalable vectors.

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D138718

Move definitions to prevent incomplete types.

C++20 is more strict when erroring out due to incomplete types.
Thus the code required some restructoring so that it complies in C++20.

Differential Revision: https://reviews.llvm.org/D141671

Add default constructurs to `filter_iterator_impl` and `filter_iterator_impl`.

Bases of `reverse_iterator` must be default-constructible. This is enforced when using `libstdc++-12` plus C++20.

Differential Revision: https://reviews.llvm.org/D141587

[mlir][bufferization][NFC] Make getEnclosingRepetitiveRegion public

These functions are generally useful and not specific to One-Shot Analysis. Move them to `BufferizableOpInterface.h` and make them public.

Differential Revision: https://reviews.llvm.org/D141685

[InstCombine] improve description of fold and add TODO; NFC

D58633

[MachineCombiner] Lift same-bb restriction for reassociable ops.

This patch relaxes the restriction that both reassociate operands must
be in the same block as the root instruction.

The comment indicates that the reason for this restriction was that the
operands not in the same block won't have a depth in the trace.

I believe this is outdated; if the operand is in a different block, it
must dominate the current block (otherwise it would need to be phi),
which in turn means the operand's block must be included in the current
rance, and depths must be available.

There's a test case (no_reassociate_different_block) added in
70520e2f1c5fc4 which shows that we have accurate depths for operands
defined in other blocks.

This allows reassociation of code that computes the final reduction
value after vectorization, among other things.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D141302

[HashBuilder] Simplify with C++17. NFCI

[include-cleaner] Remove a stale FIXME.

This FIXME was addressed in 0e545816a9e582af29ea4b9441fea8ed376cf52a.

[AArch64] Update 2 RME MEC instruction encodings

The encodings of these 2 RME MEC instructions are
incorrect and need swapping:

- DC CIPAE
- DC CIGDPAE

The correct encoding is:

Operation       op0     op1     CRn     CRm     op2
DC CIPAE, Xt    0b01    0b100   0b0111  0b1110  0b000
DC CIGDPAE, Xt  0b01    0b100   0b0111  0b1110  0b111

Differential Revision: https://reviews.llvm.org/D141689

[AArch64] Add extra tests for sinking to umull/smull. NFC

[lld][Mach-O] Fix build with Mach-O due to missing library

Summary:
The build was failing due to an undefined symbol. This is because this
function is defined in the `BitWriter` component but was not linked.

[ARM] Accept two-register form of vnmul

The previous vnmul only accepts three registers. It should accept either
two or three registers as vmul does.

Differential Revision: https://reviews.llvm.org/D141405

[analyzer] Fix a FIXME. NFCI

[MLIR] Fold outer dims permutation to pack when propagating

Instead of folding the transpose into the linalg.generic keep the
transposition in the packing operation, effectively making the
linalg.generic transparent to the propagation. Additionally, if the init
operand of the generic has users pack the init and pass it as the
operand to the generic.

Reviewed By: hanchung

Differential Revision: https://reviews.llvm.org/D141483

[LVI] Check for non-speculatable instructions

When constraining an operand based on a condition at a (potentially
transitive) use site, make sure we don't skip over non-speculatable
instructions. While the result is only used under the condition,
the non-speculatable instruction may have a side-effect or UB.

Demonstrating this issue requires raising the limit on the walk,
so do that.

Deprecate DataLayout::getPrefTypeAlignment

[lldb][NFC] Remove dependency on DataLayout::getPrefTypeAlignment

[mlir][NFC] Remove dependency on DataLayout::getPrefTypeAlignment

[clang][NFC] Remove dependency on DataLayout::getPrefTypeAlignment

[CVP] Add additional tests for use-site conditions (NFC)

For longer chains, we need to consider non-speculatable instructions.

[compiler-rt] fix typo in #95507c82

Revert "Workaround an assertion failure during module build"

This reverts commit e77e14ecf17bba5f9e2ef43d8c3dbc9c86685287.

According to @rsmith on https://reviews.llvm.org/D131858, he believes
the reason for this workaround in the firstplace has been fixed.
Reverting the workaround to hopefully confirm that is the case.

[CVP] Handle use-site conditions in urem folds

[CVP] Add additional tests for use-site conditions (NFC)

[compiler-rt] Fix XFAIL conditions after converting to 'target=...'

Fixes #60002.

[IPSCCP] Enable specialization of functions.

Re-enable the optimization after having fixed the compilation error
found in SPEC/CINT2017rate/502.gcc_r when both LTO and PGO are in use
(see https://reviews.llvm.org/D141474).

Differential Revision: https://reviews.llvm.org/D140210

[lld][MachO] Store test outputs in %t

[CVP] Avoid duplicate range fetch (NFC)

In preparation for switching this to use getConstantRangeAtUse().

Fix -Wlogical-op-parentheses warning inconsistency for const and constexpr values

When using the && operator within a || operator, both Clang and GCC
produce a warning for potentially confusing operator precedence.
However, Clang avoids this warning for certain patterns, such as
a && b || 0 or a || b && 1, where the operator precedence of && and ||
does not change the result.

However, this behavior appears inconsistent when using the const or
constexpr qualifiers. For example:

bool t = true;
bool tt = true || false && t; // Warning: '&&' within '||'
const bool t = true;
bool tt = true || false && t; // No warning
const bool t = false;
bool tt = true || false && t; // Warning: '&&' within '||'

The second example does not produce a warning because
true || false && t matches the a || b && 1 pattern, while the third one
does not match any of them.

This behavior can lead to the lack of warnings for complicated
constexpr expressions. Clang should only suppress this warning when
literal values are placed in the place of t in the examples above.

This patch adds the literal-or-not check to fix the inconsistent
warnings for && within || when using const or constexpr.

[clang][NFC] Remove dependency on DataLayout::getPrefTypeAlignment

[SVE] Restrict SVE fixed length extload/truncstore combine to float and double types.

Prior to this patch we would create floating point extending load
and truncating store operations involving fp128 types, which we
cannot lower.

Fixes #58530

Differential Revision: https://reviews.llvm.org/D140318

[X86][test] Add pre-commit test for D141657.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D141677

Re-apply "[mlir][SparseTensor] Add a few more tests for sparse vectorization"

This reverts commit 93f40c983e0adbb63cbb7c59814090134d691dd1.

Update the tests to also work on window.
The order in which the `arith.constant`s appear in the output IR is
slightly different between window and linux.
Use CHECK.*-DAG for the constants.

Original message:
These tests cover muli, xor, and, addf, subf, and addi.

The tests themselves are not that interesting, their goal is to provide
code coverage for all the types of reductions currently supported.

NFC

Differential Revision: https://reviews.llvm.org/D141369

[RISCV] Optimize (brcond (seteq (and X, (1 << C)-1), 0))

Inspired by gcc's assembly: https://godbolt.org/z/54hbzsGYn, while referring to D130203

Replace AND+IMM{32,64} with a slli.

But gcc does not handle 0xffff and 0xffffffff, which also seem to be optimizable.

The testcases copies all the bits in D130203 and adds 16, 32, and 64 bits.

Differential Revision: https://reviews.llvm.org/D141607

[LLDB][RISCV] Add RVDC instruction support for EmulateInstructionRISCV

RVC is the RISC-V standard compressed instruction-set extension, named "C", which reduces static and dynamic code size by adding short 16-bit instruction encodings for common operations, and RVCD is the compressed "D extension".

And "D extension" is a double-precision floating-point instruction-set extension, which adds double-precision floating-point computational instructions compliant with the IEEE 754-2008 arithmetic standard.

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D140961

[flang] Initial support of allocate statement with source

Support allocate statement with source in runtime version. The source
expression is evaluated only once for each allocate statement. When the
source expression has shape-spec, uses it for bounds. Otherwise, get
the bounds from the source expression. Get the length if the source
expression has deferred length parameter.

Reviewed By: clementval, jeanPerier

Differential Revision: https://reviews.llvm.org/D137812

[lld][ARM] support position independent thunks for Armv4(T)

- Position independent thunks now work for both Armv4 and Armv4T
- Armv4 arm->arm thunks don't emit a BX anymore, which doesn't exist for the
arch. This fixes https://github.com/llvm/llvm-project/issues/50764.
- Armv4 and Armv4T both have the same arm->arm behaviour. Which also is
desirable for the above ticket.

Reviewed By: MaskRay, peter.smith

Differential Revision: https://reviews.llvm.org/D141272

[TableGen] Use raw_ostream::write_escaped instead of reinventing it. NFCI.

Recommit [SchedBoundary] Add dump method for resource usage.

Summary:
As supporting information, I have added an example that describes how
the indexes of the vector of resources SchedBoundary::ReservedCycles
are tracked by the field SchedBoundary::ReservedCyclesIndex.

This has a minor rework of
https://github.com/llvm/llvm-project/commit/b39a9a94f420a25a239ae03097c255900cbd660e
which was reverted in
https://github.com/llvm/llvm-project/commit/df6ae1779fafd9984e144a27315d6dd65b32c325
becasue the llc invocation of the test was missing the argument
`-mtriple`.

See for example the failure at
https://lab.llvm.org/buildbot#builders/231/builds/7245 that reported
the following when targeting a non-aarch64 native build:

'cortex-a55' is not a recognized processor for this target (ignoring processor)

Reviewers: jroelofs

Subscribers:

Differential Revision: https://reviews.llvm.org/D141367

[SCEV] Support all Min/Max SCEVs for GetMinTrailingZeros

There is already support for U/SMax. No reason why Min and SequentialMin
should not be supported.

NFC: code in GetMinTrailingZeroes is copied for a couple node types.
Refactor them into a single code block.

Differential Revision: https://reviews.llvm.org/D141568

Revert "[mlir][SparseTensor] Add a few more tests for sparse vectorization"

This reverts commit 904f2ccc3ba1d3aaf94140aa4595fd41af67d897.

This breaks a window bot. Reverting while I investigate.
https://lab.llvm.org/buildbot/#/builders/13/builds/30748

[mlir][memref] Add details on the semantic of reinterpret_cast

Make it clearer what the semantic of reinterpret_cast is.

In particular, call out that this instruction is not a no-op.

NFC

Related to https://github.com/llvm/llvm-project/issues/59896

Differential Revision: https://reviews.llvm.org/D141662

Revert "[SchedBoundary] Add dump method for resource usage."

Reverting because of https://lab.llvm.org/buildbot#builders/16/builds/41860

When building on x86, I need to specify also -mtriple in the
invocation of llc otherwise the folllowing error shows up:

'cortex-a55' is not a recognized processor for this target (ignoring processor)

This reverts commit b39a9a94f420a25a239ae03097c255900cbd660e.

[flang] Implement codegen of hlfir.designate with component refs

Lower all the different kinds of hlfir.designate component refs to
FIR.

Differential Revision: https://reviews.llvm.org/D141476

[InstCombine] Regenerate test checks (NFC)

Pick up changes in variables names.

[bazel] Updates for https://github.com/llvm/llvm-project/commit/aa0883b59ae17e5465906dad51b5561b5292a28d

[mlir] GreedyPatternRewriter: Add ancestors to worklist

When adding an op to the worklist, also add its ancestors to the worklist. This allows for RewritePatterns to match an op `a` based on what is inside of the body of `a`.

This change fixes a problem that became apparent with `vector.warp_execute_on_lane_0`, but could probably be triggered with similar patterns. The pattern extracts an op `b` with `eligible = true` from the body of an op `a`:
```
test.a {
  %0 = test.b() {eligible = true}
  yield %0
}
```

Afterwards:
```
%0 = test.b() {eligible = true}
test.a {
  yield %0
}
```

The pattern is an `OpRewritePattern<OpA>`. For some reason, `test.a` is not on the GreedyPatternRewriter's worklist. E.g., because no pattern could be applied and it was removed. Now, another pattern updates `test.b`, so that `eligible` is changed from `true` to `false`. The `OpRewritePattern<OpA>` could now be applied, but (without this revision) `test.a` is still not on the worklist.

Note: In the above example, an `OpRewritePattern<OpB>` could have been used instead of an `OpRewritePattern<OpA>`. With such a design, we can run into the same problem (when the `eligible` attr is on `test.a` and `test.b` is removed from the worklist because no patterns could be applied).

Note: This change uncovered an unrelated bug in TestSCFUtils.cpp that was triggered due to a change in the order in which ops are processed. A TODO is added to the broken code and test cases are adapted so that the bug is no longer triggered.

Differential Revision: https://reviews.llvm.org/D140304

[AMDGPU][AsmParser][NFC] Refine defining i8- and i16-typed custom operands.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D140799

[mlir][SparseTensor] Add a few more tests for sparse vectorization

These tests cover muli, xor, and, addf, subf, and addi.

The tests themselves are not that interesting, their goal is to provide
code coverage for all the types of reductions currently supported.

NFC

Differential Revision: https://reviews.llvm.org/D141369

[MemDep] Reduce block limit

The non-local MemDep analysis has a limit on the number of blocks
it will scan trying to find dependencies. The current limit of 1000
is very high, especially when we consider that each block scan can
also visit up to 100 instructions. In degenerate cases (where we
actually scan that many blocks) MemDep/GVN dominate overall
compile-time, for little benefit.

This patch reduces the limit to 200, which is probably still too
large, but at least mitigates some of the more catastrophic cases.
(For comparison, MSSA clobber walks consider up to 100
MemoryDefs/MemoryPhis, rather than 200 blocks * 100 instructions,
but these limits aren't directly comparable.)

I know that we were kind of hoping that this issue would resolve
itself in time, either by a switch to NewGVN or use of MSSA in GVN.
But I think we should still address this in the meantime.
Additionally, a switch to an MSSA-based implementation will
effectively be doing this as well, in a roundabout way (by dint of
MSSA having lower cutoffs than MDA).

Differential Revision: https://reviews.llvm.org/D140097

[SchedBoundary] Add dump method for resource usage.

As supporting information, I have added an example that describes how
the indexes of the vector of resources SchedBoundary::ReservedCycles
are tracked by the field SchedBoundary::ReservedCyclesIndex.

Reviewed By: jroelofs

Differential Revision: https://reviews.llvm.org/D141367

[Mips] Convert some tests to opaque pointers (NFC)

I'm not sure why, but the absence of bitcasts / no-op GEPs causes
the branch delay slot to be used.

Differential Revision: https://reviews.llvm.org/D141593

[mlir][affine][transform] Simplify affine.min/max ops with given constraints

This transform op uses `mlir::simplifyConstrainedMinMaxOp` to simplify `affine.min` and `affine.max` ops based on a given constraints.

Differential Revision: https://reviews.llvm.org/D140997

[GVN][NFC] Refactor GVN::AnalyzeLoadAvailability method

Simplify AnalyzeLoadAvailability code:
1. Use std::optional for return value
2. Use range-based loop for non-local dependencies

Differential Revision: https://reviews.llvm.org/D141664

Reland "[mlir][llvm] Add an explicit void type debug info attribute."

Previously, the DISubroutineType attribute used an optional result
parameter and an optional argument types array to model the subroutine
signature. LLVM IR debug metadata, on the other hand, has one types
list whose first entry maps to the result type. That entry may be
null to model a void result type. The type list may also be entirely
empty not specifying any type information. The latter is problematic
since the current DISubroutineType attribute cannot express it.

The revision changes DISubroutineTypeAttr to closely follow the
LLVM metadata design. In particular, it uses a single types parameter
array to model the subroutine signature and introduces an explicit
DIVoidResultTypeAttr to model the null entries.

Reviewed By: Dinistro

Differential Revision: https://reviews.llvm.org/D141261

This reverts commit 81f57b6
and relands commit a960547

Fixes flang build and drop_begin on an empty array ref.

[clang] Redefine some AVR specific macros

Fixes https://github.com/llvm/llvm-project/issues/58855

Reviewed By: aykevl, Miss_Grape

Differential Revision: https://reviews.llvm.org/D141598

[Driver] Add crtfastmath.o on Solaris if appropriate

`Flang :: Driver/fast_math.f90` `FAIL`s on Solaris because `crtfastmath.o`
is missing from the link line.

This patch adds it as appropriate.

Tested on `amd64-pc-solaris2.11`.

Differential Revision: https://reviews.llvm.org/D141596

MachineIRBuilder: Add buildMergeValues. NFC

Add a `buildMergeValues` method that unconditionally builds a
G_MERGE_VALUES instruction, as opposed to `buildMergeLikeInstr` which
may decide on a different opcode based on the input types.

I haven't audited all the uses of `buildMergeLikeInstr` to see if they
can be replaced with `buildMergeValues`, but I did find a couple of
obvious ones where we check that we're merging scalars right before
calling `buildMerge`.

This is a follow-up suggested in https://reviews.llvm.org/D140964

Differential Revision: https://reviews.llvm.org/D141373

MachineIRBuilder: Rename buildMerge. NFC

`buildMerge` may build a G_MERGE_VALUES, G_BUILD_VECTOR or
G_CONCAT_VECTORS. Rename it to `buildMergeLikeInstr`.

This is a follow-up suggested in https://reviews.llvm.org/D140964

Differential Revision: https://reviews.llvm.org/D141372

GlobalISel: s/Op/Instr in some places. NFC

This patch replaces `GMergeLikeOp` with `GMergeLikeInstr` and
`MachineIRBuilder::buildAssertOp` with `buildAssertInstr` in order to
remove ambiguity. Discussed in: https://reviews.llvm.org/D141372

[flang] Lower elemental intrinsics to hlfir.elemental

- Move the core code generating hlfir.elemental for user calls from
  genUserElementalCall into a new ElementalCallBuilder class and use
  C++ CRTP (curiously recursive template pattern) to implement the
  parts specific to user and intrinsic call into ElementalUserCallBuilder
  and ElementalIntrinsicCallBuilder. This allows sharing the core logic
  to lower elemental procedures for both user defined and intrinsics
  procedures.

- To allow using ElementalCallBuilder, split the intrinsic lowering code
  into two parts: first lower the arguments to hlfir::Entity regardless
  of the interface of the intrinsics, and then, in a different function
  (genIntrinsicProcRefCore), prepare the hlfir::Entity according to the
  interface. This allows using the same core logic to prepare "normal"
  arguments for non-elemental intrinsics, and to prepare the elements of
  array arguments inside elemental call (ElementalIntrinsicCallBuilder
  calls genIntrinsicProcRefCore once it has computed the scalar actual
  arguments).
  To allow this split, genExprBox/genExprAddr/genExprValue logic had to
  be split in ConvertExprToHlfir.[cpp/h].

- Add missing statement context pushScope/finalizeAndPop around the
  code generation inside the hlfir.elemental so that any temps created
  while lowering the call at the element level is correctly cleaned-up.

- One piece of code in hlfir::Entity::hasNonDefaultLowerBounds() was wrong for assumed shape arrays (returned true when an assumed shaped array had no explicit lower bounds). This caused the added test to hit a bogus TODO, so fix it.

Elemental intrinsics returning are still TODO (e.g., adjustl). I will implement this in a next patch, this one is big enough.

Differential Revision: https://reviews.llvm.org/D141612

[libc][NFC] Use C headers in host CPU sniffing code.

[C2x] reject type definitions in offsetof

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2350.htm made very
clear that it is an UB having type definitions with in offsetof. After
this patch clang will reject any type definitions in __builtin_offsetof.

Fixes https://github.com/llvm/llvm-project/issues/57065

```
local/offsetof.c:10:38: error: 'struct S' cannot be defined in '__builtin_offsetof'
return __builtin_offsetof(struct S{ int a, b;}, a);
^
```

Reviewed By: aaron.ballman, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D133574

[Instcombine] Add precommit tests for pattern xor(and, or); NFC

[clang-tidy][doc] Deprecate the AnalyzeTemporaryDtors option

It's not used anywhere, and we should not keep it
for eternity. Document it as deprecated and announce
its removal 2 releases later, so people have time
to update their .clang-tidy files.

Differential Revision: https://reviews.llvm.org/D141583

[RISCV][VP] expand vp intrinsics if no +zve32x feature

If the subtarget does not support VInstructions, expand vp intrinscs to scalar instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D139706

[AVR] Fix a bug in AsmPrinter when printing inline-asm operands

Fixes https://github.com/llvm/llvm-project/issues/58878

Reviewed By: aykevl, Miss_Grape

Differential Revision: https://reviews.llvm.org/D141589

[X86] Remove unused variable after D140087

[LoopReroll] Allow for multiple loop control only induction vars

Before this, LoopReroll would fail an assertion, falsely assuming that
there can only possibly a single loop control only induction variable.

For example:

```
%a = phi i16 [ %dec2, %for.body ], [ 0, %entry ]
%b = phi i16 [ %dec1, %for.body ], [ 0, %entry ]
%a.next = add nsw i16 %1, -1
%b.next = add nsw i16 %0, -1
%add = add nsw i16 %a, %b
; ... rerollable code
%cmp.not = icmp eq i16 -10, %add
br i1 %cmp.not, label %exit, label %loop
```

Both %a and %b are valid loop control only induction vars

Additionally, some NFC changes to remove unnecessary isa<PHINode> check

Updated complex_reroll checks

Differential Revision: https://reviews.llvm.org/D141109

[X86] Improve mul x, 2^N +/- 2 pattern by making the +/- 2x compute independently to x << N

Previous pattern was omitting ops in sequence which just increases the
latency (to 3c, same as imul!) i.e:

`(add/sub (add/sub (shl x, N), x), x)`

Better is to compute 2x indepedently so x << N for better ULP i.e:
`(add/sub (shl x, N), (add x, x))`

Reviewed By: pengfei, RKSimon

Differential Revision: https://reviews.llvm.org/D141113

[X86] Replace (31/63 -/^ X) with (NOT X) and ignore (32/64 ^ X) when computing shift count

Shift count is masked by hardware so these peepholes just extend
common patterns for NOT to the lower bits of shift count.

As well (32/64 ^ X) is masked off by the shift so can be safely
ignored.

Reviewed By: pengfei, lebedev.ri

Differential Revision: https://reviews.llvm.org/D140087

AMDGPU/GlobalISel: Make regbankselect of implicit_def consistent with constants

[RISCV] Change the return type of getStreamer() to support the use of overloading and other functions in RISCVELFStreamer

Move the declaration of RISCVELFStreamer from RISCVELFStreamer.cpp to RISCVELFStreamer.h.
Change the return type of getStreamer() to support the use of overloading and other functions in RISCVELFStreamer.

Differential Revision: https://reviews.llvm.org/D138500

[lldb/test] Fix data racing issue in TestStackCoreScriptedProcess

This patch should fix an nondeterministic error in TestStackCoreScriptedProcess.

In order to test both the multithreading capability and shared library
loading in Scripted Processes, the test would create multiple threads
that would take the same variable as a reference.

The first thread would alter the value and the second thread would
monitor the value until it gets altered. This assumed a certain ordering
regarding the `std::thread` spawning, however the ordering was not
always guaranteed at runtime.

To fix that, the test now makes use of a `std::condition_variable`
shared between the each thread. On the former, it will notify the other
thread when the variable gets initialized or updated and on the latter,
it will wait until the variable it receives a new notification.

This should fix the data racing issue while preserving the testing
coverage.

rdar://98678134

Differential Revision: https://reviews.llvm.org/D139484

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>

[lldb] Update custom commands to always be overrriden

This is a follow-up patch to 6f7835f309b9.

As explained previously, when running from an IDE, it can happen that
the IDE imports some lldb scripts by itself. If the user also tries to
import these commands, lldb will show the following message:

```
error: cannot add command: user command exists and force replace not set
```

This message is confusing to the user, because it suggests that the
command import failed and that the execution should stop. However, in
this case, lldb will continue the execution with the command added
previously by the user.

To prevent that, this patch updates every first-party lldb-packaged
custom commands to override commands that were pre-imported in lldb.

Differential Revision: https://reviews.llvm.org/D140293

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>