review.tizen.org Git - platform/upstream/llvm.git/log

[OpenMP] Support kernel record and replay

This patch adds functionality for recording and replaying the execution of OpenMP offload kernels, based on an original implementation by Steve Rangel. The patch extends libomptarget to extract a JSON description of the kernel, the device image binary, and a device memory snapshot before and after the execution of a recorded kernel. Kernel recording and replaying in libomptarget is controlled through environment variables (LIBOMPTARGET_RECORD, LIBOMPTARGET_REPLAY). The patch also provides a tool, llvm-omp-kernel-replay, which replays a kernel from the extracted information, can verify the replayed execution against the post-execution device memory snapshot, and supports changing the number of teams/threads for the replay.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D138931

[nfc][mlgo] Remove abstraction layers for training logger

This follows from D141720

Differential Revision: https://reviews.llvm.org/D141967

[MachineBasicBlock] Explicit FT branching param

Introduce a parameter in getFallThrough() to optionally
allow returning the fall through basic block in spite of
an explicit branch instruction to it. This parameter is
set to false by default.

Introduce getLogicalFallThrough() which calls
getFallThrough(false) to obtain the block while avoiding
insertion of a jump instruction to its immediate successor.

This patch also reverts the changes made by D134557 and
solves the case where a jump is inserted after another jump
(branch-relax-no-terminators.mir).

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D140790

Revert ""Reland "[pgo] Avoid introducing relocations by using private alias""

This reverts commit 6e5cbc097a5ac7fa95a8f425af8b03958151c763.

Causes link errors, see http://go/crb/1408161.

Fix OSX build break introduced by D141720

[Propeller] Use Fixed MBB ID instead of volatile MachineBasicBlock::Number.

Let Propeller use specialized IDs for basic blocks, instead of MBB number.

This allows optimizations not just prior to asm-printer, but throughout the entire codegen.
This patch only implements the functionality under the new `LLVM_BB_ADDR_MAP` version, but the old version is still being used. A later patch will change the used version.

#### Background
Today Propeller uses machine basic block (MBB) numbers, which already exist, to map native assembly to machine IR.  This is done as follows.
    - Basic block addresses are captured and dumped into the `LLVM_BB_ADDR_MAP` section just before the AsmPrinter pass which writes out object files. This ensures that we have a mapping that is close to assembly.
    - Profiling mapping works by taking a virtual address of an instruction and looking up the `LLVM_BB_ADDR_MAP` section to find the MBB number it corresponds to.
    - While this works well today, we need to do better when we scale Propeller to target other Machine IR optimizations like spill code optimization.  Register allocation happens earlier in the Machine IR pipeline and we need an annotation mechanism that is valid at that point.
    - The current scheme will not work in this scenario because the MBB number of a particular basic block is not fixed and changes over the course of codegen (via renumbering, adding, and removing the basic blocks).
    - In other words, the volatile MBB numbers do not provide a one-to-one correspondence throughout the lifetime of Machine IR.  Profile annotation using MBB numbers is restricted to a fixed point; only valid at the exact point where it was dumped.
    - Further, the object file can only be dumped before AsmPrinter and cannot be dumped at an arbitrary point in the Machine IR pass pipeline.  Hence, MBB numbers are not suitable and we need something else.
#### Solution
We propose using fixed unique incremental MBB IDs for basic blocks instead of volatile MBB numbers. These IDs are assigned upon the creation of machine basic blocks. We modify `MachineFunction::CreateMachineBasicBlock` to assign the fixed ID to every newly created basic block. It assigns `MachineFunction::NextBBID` to the MBB ID and then increments it, which ensures that IDs are unique.

To ensure correct profile attribution, multiple equivalent compilations must generate the same Propeller IDs. This is guaranteed as long as the MachineFunction passes run in the same order. Since the `NextBBID` variable is scoped to `MachineFunction`, interleaving of codegen for different functions won't cause any inconsistencies.
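
A minimal sketch of the ID scheme described above, not the actual LLVM code. `Block`, `Function`, `createBlock`, and `eraseAndRenumber` are hypothetical stand-ins for `MachineBasicBlock`, `MachineFunction`, and their members. It only illustrates that a per-function counter assigned at creation survives renumbering, while the layout number does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for MachineBasicBlock and MachineFunction.
struct Block {
  unsigned FixedID; // assigned once at creation and never changed
  unsigned Number;  // layout number, rewritten whenever blocks are renumbered
};

class Function {
  unsigned NextID = 0; // per-function counter, so functions cannot interfere
  std::vector<Block> Blocks;

public:
  // Create a block at the end of the layout and return its index.
  std::size_t createBlock() {
    unsigned Number = static_cast<unsigned>(Blocks.size());
    Blocks.push_back(Block{NextID++, Number});
    return Blocks.size() - 1;
  }

  // Erase the block at Index, then renumber the survivors in layout order.
  void eraseAndRenumber(std::size_t Index) {
    Blocks.erase(Blocks.begin() + static_cast<std::ptrdiff_t>(Index));
    for (std::size_t I = 0; I < Blocks.size(); ++I)
      Blocks[I].Number = static_cast<unsigned>(I);
  }

  const Block &block(std::size_t Index) const { return Blocks[Index]; }
};

// Usage: after erasing the middle block, numbers shift but IDs do not.
inline void checkFixedIDs() {
  Function F;
  F.createBlock();
  F.createBlock();
  F.createBlock();
  F.eraseAndRenumber(1);
  assert(F.block(1).Number == 1);  // was 2 before renumbering
  assert(F.block(1).FixedID == 2); // unchanged
}
```

Because the counter lives in the function object, two functions compiled in any interleaving still hand out the same IDs to their own blocks.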

The new encoding is generated under the new version number 2 and we keep backward-compatibility with older versions.

#### Impact on Size of the `LLVM_BB_ADDR_MAP` Section
Emitting the Propeller ID results in a 23% increase in the size of the `LLVM_BB_ADDR_MAP` section for the clang binary.

Reviewed By: tmsriram

Differential Revision: https://reviews.llvm.org/D100808

[RISCV] Use zeroext instead of signext in mask reduction tests. NFC

This is more consistent with ABI and how bools on RISC-V are
represented.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D141963

[libc++] Add ALLOW_RETRIES to a few flaky tests

Fixes #59464

Reviewed By: ldionne, Mordante, #libc

Spies: libcxx-commits

Differential Revision: https://reviews.llvm.org/D141885

[SystemZ][z/OS] Fix cityhash lit for EBCDIC

This fixes __murmur2_or_cityhash.pass.cpp in EBCDIC mode. The test fails because string literals are used as input to the CityHash algorithm, so the expected results need to be adjusted for EBCDIC.

Reviewed By: #libc, philnik

Differential Revision: https://reviews.llvm.org/D141623

[CompilerRT] Remove sanitizer support for i386 iossim

Summary:
This patch removes building sanitizers for i386 iossim. This is to reduce the toolchain size.

[CompilerRT] Remove ubsan static runtime on Apple

This patch removes the static ubsan runtime on Apple devices. The motivation
is to reduce the toolchain size.

rdar://102061519

Differential Revision: https://reviews.llvm.org/D141550

[ORC-RT] Reapply ab59185fbfb (Add IntervalMap/Set), with missing files included.

The original commit was reverted in c151e8428a due to missing files (thanks Kazu!).

Revert "[ORC-RT] Add IntervalMap and IntervalSet collections."

This reverts commit ab59185fbfb15c9ce5a64e3aacd3a8c7f6a97621.

It looks like this commit is missing interval_set_test.cpp.

[Analysis] Fix a warning

This patch fixes:

  llvm/include/llvm/Analysis/Utils/TrainingLogger.h:94:14: error:
  private field 'IncludeReward' is not used
  [-Werror,-Wunused-private-field]

[RISCV][TableGen] Correct formatting in RISCVGenCompressInstEmitter.inc. NFC

[clang][sema][Matrix] Move code from try-cast to `TypeLocVisitor`. NFC intended.

`MatrixTypeLoc` is not a "sugar" `TypeLoc`, so there is no need to use the
underlying `TypeLoc` instead.

Differential Revision: https://reviews.llvm.org/D141422

[ORC-RT] Add IntervalMap and IntervalSet collections.

IntervalMap is an optionally-coalescing map -- it uses half-open ranges as keys,
allows lookups based on elements of the ranges (returning an iterator to the
containing range) and optionally coalesces adjacent ranges that have the same
value.

IntervalSet is an optionally-coalescing set based on IntervalMap.

These collections will be used to store and look up metadata section ranges,
e.g. unwind-info ranges.
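
A rough sketch of the coalescing behaviour described above, built on `std::map` rather than on the ORC runtime's own code. `SimpleIntervalMap` and its members are hypothetical names, the sketch assumes that inserted ranges never overlap existing ones, and `lookup` returns a pointer to the value instead of an iterator to the containing range.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical simplified coalescing interval map. Keys are half-open
// ranges [Start, End), stored as Start -> (End, Value).
class SimpleIntervalMap {
  std::map<unsigned, std::pair<unsigned, int>> Ranges;

public:
  // Insert [Start, End) -> Value, merging with touching neighbours that
  // carry the same value. Assumes no overlap with existing ranges.
  void insert(unsigned Start, unsigned End, int Value) {
    assert(Start < End && "empty or inverted range");
    // Absorb a following range that begins exactly at End.
    auto Next = Ranges.find(End);
    if (Next != Ranges.end() && Next->second.second == Value) {
      End = Next->second.first;
      Ranges.erase(Next);
    }
    // Extend a preceding range that finishes exactly at Start.
    auto Prev = Ranges.lower_bound(Start);
    if (Prev != Ranges.begin()) {
      --Prev;
      if (Prev->second.first == Start && Prev->second.second == Value) {
        Prev->second.first = End;
        return;
      }
    }
    Ranges[Start] = {End, Value};
  }

  // Return the value of the range containing Addr, or nullptr if none.
  const int *lookup(unsigned Addr) const {
    auto It = Ranges.upper_bound(Addr); // first range starting after Addr
    if (It == Ranges.begin())
      return nullptr;
    --It;
    return Addr < It->second.first ? &It->second.second : nullptr;
  }

  std::size_t size() const { return Ranges.size(); }
};

// Usage: adjacent ranges with equal values collapse into one entry.
inline void checkCoalescing() {
  SimpleIntervalMap M;
  M.insert(0, 10, 1);
  M.insert(10, 20, 1); // coalesces with [0, 10)
  M.insert(20, 30, 2); // different value, stays separate
  assert(M.size() == 2);
  assert(*M.lookup(19) == 1);
  assert(*M.lookup(20) == 2);
  assert(M.lookup(30) == nullptr); // End is exclusive
}
```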

[mlir] Fix a deprecation warning

This patch fixes:

  mlir/include/mlir/Dialect/Affine/LoopUtils.h:332:25: error:
  'makeMutableArrayRef<mlir::scf::ForOp>' is deprecated: Use deduction
  guide instead [-Werror,-Wdeprecated-declarations]

[mlgo] Remove the protobuf dependency

The dependency was due to the log format. This change switches to the
previously-introduced (D139370) "dependency-free" logger instead of the
protobuf-based one.

A subsequent change will clean out the unnecessary abstraction left
behind.

This change drops the logger unittest. We have sufficient test coverage
via lit tests, and a unit test would require adding, unnecessarily, a log
reader (the reader is expected to be python, for the ML side, and there
is a reader for that under Analysis/models, used for tests).

Differential Revision: https://reviews.llvm.org/D141720

[VPlan] Replace VPExpandSCEVRecipe::classof with VP_CLASSOF_IMPL. (NFC)

[libc++] Mark std::pmr virtual functions as _LIBCPP_HIDE_FROM_ABI_VIRTUAL

Reviewed By: ldionne, Mordante, #libc

Spies: libcxx-commits

Differential Revision: https://reviews.llvm.org/D141864

nullptr returned from ActOnTag() is not a valid result

DeclResult tracks two states: valid/invalid and usable/unusable.
Passing a null pointer to the constructor creates a valid but unusable
result and we wanted an invalid result instead. This changes some
functions to return a DeclResult rather than a Decl * to make it harder
to get this incorrect in callers.
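
A minimal sketch of the two-state idea, under the assumption that a result built from a null pointer is "valid but unusable". `Result`, `actOnTagRaw`, and `actOnTagChecked` are hypothetical names for illustration and are not Clang's `DeclResult` or `ActOnTag`.

```cpp
#include <cassert>

// Hypothetical simplified result type tracking two independent states:
// valid/invalid (did an error occur?) and usable/unusable (is there a value?).
template <typename T> class Result {
  T *Ptr = nullptr;
  bool Invalid = false;

public:
  Result(T *P) : Ptr(P) {} // valid; usable only when P is non-null

  static Result invalid() {
    Result R(nullptr);
    R.Invalid = true;
    return R;
  }

  bool isInvalid() const { return Invalid; }
  bool isUsable() const { return !Invalid && Ptr != nullptr; }
  T *get() const { return Ptr; }
};

struct Decl {};

// Raw-pointer style: a failure is reported as nullptr.
inline Decl *actOnTagRaw(bool Fail, Decl &D) { return Fail ? nullptr : &D; }

// Result style: the callee states explicitly that an error happened.
inline Result<Decl> actOnTagChecked(bool Fail, Decl &D) {
  return Fail ? Result<Decl>::invalid() : Result<Decl>(&D);
}

inline void checkResultStates() {
  Decl D;
  // Wrapping the raw nullptr silently yields "valid but unusable".
  Result<Decl> FromNull(actOnTagRaw(/*Fail=*/true, D));
  assert(!FromNull.isInvalid() && !FromNull.isUsable());
  // Returning the result type directly keeps the error visible.
  Result<Decl> Checked = actOnTagChecked(/*Fail=*/true, D);
  assert(Checked.isInvalid() && !Checked.isUsable());
  assert(actOnTagChecked(/*Fail=*/false, D).isUsable());
}
```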

Discovered when working on https://reviews.llvm.org/D141280.

Co-authored-by: Haojian Wu <hokein@google.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Differential Revision: https://reviews.llvm.org/D141580

[mlir] Fix a warning

This patch fixes:

  mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp:820:13:
  error: unused function 'hasAtMostOneResultFunctionOfDim'
  [-Werror,-Wunused-function]

[RISCV] Add missing check prefixes to vreductions-mask.ll. NFC

There's a conflict between the riscv32 and riscv64 output for some
tests which caused the script to drop the check lines.

Add specific check prefixes for these cases.

[mlir][sparse] avoid using mutable descriptor when unnecessary (NFC)

Use SparseTensorDescriptor whenever not calling setters, to avoid needing to create a temporary buffer for simple query purposes.

Reviewed By: bixia, wrengr

Differential Revision: https://reviews.llvm.org/D141953

[VPlan] Replace VPScalarIVStepsRecipe::classof with VP_CLASSOF_IMPL(NFC)

[Clang] Reject in-class defaulting of previously declared comparison operators

Comparison operators are not allowed to be defaulted if they were previously declared outside the class.
Pretty low-impact, but it's nice to reject this without a linking error.
Fixes https://github.com/llvm/llvm-project/issues/51227.

Reviewed By: #clang-language-wg, ChuanqiXu

Differential Revision: https://reviews.llvm.org/D141803

[Libomptarget][NFC] Rename device environment variable

This variable is used by the runtime. Before kernel launch we set it to
indicate several configuration options from the host. This patch renames
it to be more in line with the rest of the names exported from the
runtime. This is better because this is the only symbol visible to the
host from the runtime, so it should have a reserved name.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D141960

[Clang] Configure definitions for amdgpu/nvptx arch query tools

Summary:
These tools are built unconditionally now. However, there seemed to be
problems where the headers would be found during cross compilation, but
no libraries present. To combat this we should elect to make the CMake
indicate whether or not we should use the dynamic library method or link
it directly rather than using `__has_include`.

[llvm][ADT] Mark `makeMutableArrayRef` as deprecated

Now that all of the uses of `makeMutableArrayRef` are replaced in-tree with use
of deduction guides (see
https://github.com/llvm/llvm-project/commit/a288d7f937708cf67d960962bfa22ffae37ddbf4),
mark `makeMutableArrayRef` as deprecated.

Also remove the old tests for `makeMutableArrayRef` in favor of the ones
introduced with the deduction guides in
https://github.com/llvm/llvm-project/commit/38791259c1165cedfa313e06dc20e443f1e20634.
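
For readers unfamiliar with the pattern, here is a small sketch of replacing a `make...` helper with a C++17 deduction guide. `MutableRef` and `makeMutableRef` are hypothetical names, not the LLVM `MutableArrayRef` API. For this simple constructor the compiler could also deduce `T` implicitly, so the explicit guide is spelled out only to show the syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical minimal mutable view over contiguous elements.
template <typename T> class MutableRef {
  T *Data;
  std::size_t Length;

public:
  MutableRef(T *D, std::size_t N) : Data(D), Length(N) {}
  MutableRef(std::vector<T> &V) : Data(V.data()), Length(V.size()) {}

  T &operator[](std::size_t I) const { return Data[I]; }
  std::size_t size() const { return Length; }
};

// Explicit deduction guide: MutableRef(V) names MutableRef<T> for a
// std::vector<T> argument.
template <typename T> MutableRef(std::vector<T> &) -> MutableRef<T>;

// Old style: a helper function whose only job is to deduce T.
template <typename T> MutableRef<T> makeMutableRef(std::vector<T> &V) {
  return MutableRef<T>(V);
}

inline void checkDeduction() {
  std::vector<int> V{1, 2, 3};
  auto Old = makeMutableRef(V); // helper-based spelling
  MutableRef New(V);            // deduction-guide spelling
  static_assert(std::is_same_v<decltype(Old), decltype(New)>,
                "both spellings produce MutableRef<int>");
  New[0] = 42; // writes through to the vector
  assert(V[0] == 42 && Old.size() == 3);
}
```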

Differential Revision: https://reviews.llvm.org/D141872

[InstCombine] factor difference-of-squares to reduce multiplication

(X * X) - (Y * Y) --> (X + Y) * (X - Y)
https://alive2.llvm.org/ce/z/BAuRCf

The no-wrap propagation could be relaxed in some cases,
but there does not seem to be an obvious rule for that.
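
The identity can be checked quickly with wrapping unsigned arithmetic. This is only an illustration in C++, not the InstCombine implementation, and `squaresDiff` and `factored` are hypothetical names. Unsigned overflow wraps modulo 2^N, so both forms agree for all inputs.

```cpp
#include <cassert>

// Original form: two multiplies and one subtract.
inline unsigned squaresDiff(unsigned X, unsigned Y) { return X * X - Y * Y; }

// Factored form: one multiply, one add, and one subtract.
inline unsigned factored(unsigned X, unsigned Y) { return (X + Y) * (X - Y); }

inline void checkDifferenceOfSquares() {
  assert(squaresDiff(7u, 3u) == 40u);
  assert(factored(7u, 3u) == 40u);
  // The two forms also agree when intermediate results wrap around.
  assert(squaresDiff(3u, 7u) == factored(3u, 7u));
  assert(squaresDiff(0xFFFFFFFFu, 0x12345678u) ==
         factored(0xFFFFFFFFu, 0x12345678u));
}
```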

[InstCombine] add tests for difference-of-squares; NFC

[RISCV] Remove MCRegisterInfo dependency from compressInst/uncompressInst/isCompressibleInst.

This was being used to look up the register class for a register number,
but those live in a tablegened array. We can index that array directly
just like RISCVAsmParser does.

Differential Revision: https://reviews.llvm.org/D141951

[MC] Use MCRegister instead of unsigned in MCInstPrinter (NFC)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D140654

[RISCV] Use Zvl*b as a lower bound for VScaleRange.

The backend has a fatal error in RISCVSubtarget::getMinRVVVectorSizeInBits
if RVVVectorBitsMin is less than the Zvl length from -march. Now that
RVVVectorBitsMin is connected to VScaleRange in the backend, we
can trip this fatal error.

This patch adds the Zvl*b length as a lower bound to protect this.
The test is updated to test vscale-min with Zvl64b instead of V.

I'd like to do a proper diagnostic for this, but I don't think we
can do that from this function. Since -mvscale-min is an internal cc1
option, I'm not sure it's a big deal.

I'm planning to add a driver option -msve-vector-bits. I will
probably implement a diagnostic for that.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D141459

Diagnose extensions in 'offsetof'

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2350.htm made it very
clear that defining a type within offsetof is undefined behavior.
Clang supports defining a type as the first argument as a conforming
extension due to how many projects use the construct in C99 and earlier
to calculate the alignment of a type. GCC also supports defining a type
as the first argument.

This adds extension warnings and documentation for the functionality
Clang explicitly supports.

Fixes #57065

Co-authored-by: Yingchi Long <i@lyc.dev>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>

[PS5] Handle visibility options same as PS4

This update was missed in the initial rounds of upstreaming PS5.

[PS4] NFC: rewrite a test to use lit's DEFINE feature

Preparatory to running the same test for PS5.

[MLIR] Add return type inference to scf.if builder

Differential Revision: https://reviews.llvm.org/D141928

Add additional tests for ctlz{_zero_undef} to test folding with xor; NFC

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D141549

[mlir] fix dereferencing of optional sym_name attribute

`sym_name` is an optional attribute of `ModuleOp`, so it is unsafe to
fetch the underlying value without checking whether it is non-empty.
Such unsafe dereferencing causes the lower-host-to-llvm-calls_fail.mlir
test to segfault. Although this bug existed for four months, it wasn't
triggered, since previous tests executed a code path that used a default
value instead of one fetched from the module attribute.

This patch makes the code use a default value if the optional attribute
does not have a value.
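
The shape of the fix can be sketched with `std::optional`. `Module`, `SymName`, and `moduleNameOr` are hypothetical stand-ins rather than the MLIR `ModuleOp` API. The point is only that the fallback is chosen before the value is read, instead of dereferencing unconditionally.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for a module op whose name attribute is optional.
struct Module {
  std::optional<std::string> SymName;
};

// Safe access: use the attribute if present, otherwise a default name.
// Writing *M.SymName here would be undefined behavior for an unnamed module.
inline std::string moduleNameOr(const Module &M, const std::string &Default) {
  return M.SymName.value_or(Default);
}

inline void checkModuleName() {
  Module Named{std::string("kernels")};
  Module Unnamed{};
  assert(moduleNameOr(Named, "fallback") == "kernels");
  assert(moduleNameOr(Unnamed, "fallback") == "fallback");
}
```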

Reviewed By: stella.stamenova

Differential Revision: https://reviews.llvm.org/D141941

[OpenMP] Make `-Xarch_host` and `-Xarch_device` work for OpenMP offloading

Clang currently supports the `-Xarch_host` and `-Xarch_device` variants
to handle passing arguments to only one part of the offloading
toolchain. This was previously only supported fully for HIP / CUDA. This
patch simply updates the logic to make it work for any offloading kind.

Fixes #59799

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D141935

[Libomptarget] Replace Nvidia arch lookup with 'nvptx-arch'

This method to look up the CUDA architecture is deprecated in newer
versions of CMake. We also have our own way to query this information
that we control now via the `nvptx-arch` program, which should always be
present in LLVM builds with clang going forward. This is currently only
used for testing so I think we should be okay with the dependency.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D141933

[lldb] Only allow SymbolFiles to construct Types

SymbolFiles should own Types by keeping them in their TypeList. This
patch makes the Type constructor private to guarantee that every created
Type is kept in the SymbolFile's type list.

Reland: [GWP-ASan] Add recoverable mode.

The GWP-ASan recoverable mode allows a process to continue to function
after a GWP-ASan error is detected. The error will continue to be
dumped, but GWP-ASan now has APIs that a signal handler (like the
example optional crash handler) can call in order to allow the
continuation of a process.

When an error occurs with an allocation, the slot used for that
allocation will be permanently disabled. This means that free() of that
pointer is a no-op, and use-after-frees will succeed (writing and
reading the data present in the page).

For heap-buffer-overflow/underflow, the guard page is marked as accessible
and buffer-overflows will succeed (writing and reading the data present
in the now-accessible guard page). This does impact adjacent
allocations: buffer-underflows and buffer-overflows from adjacent
allocations will no longer touch an inaccessible guard page. This could
be improved in the future by having two guard pages between each adjacent
allocation, but that's out of scope for this patch.

Each allocation only ever has a single error report generated. It's
whatever came first between invalid-free, double-free, use-after-free or
heap-buffer-overflow, but only one.

Reviewed By: eugenis, fmayer

Differential Revision: https://reviews.llvm.org/D140173

[flang] Generate TBAA information.

This is the initial version of TBAA information generation for Flang
generated IR. The desired behavior is that TBAA type descriptors
are generated for FIR types during FIR to LLVM types conversion,
and then TBAA access tags are attached to memory accessing operations
when they are converted to LLVM IR dialect.

In the initial version the type conversion is not producing
TBAA type descriptors, and all memory accesses are just partitioned
into two sets of box and non-box accesses, which can never alias.

The TBAA generation is enabled by default at >O0 optimization levels.
TBAA generation may also be enabled via `apply-tbaa` option of
`fir-to-llvm-ir` conversion pass. `-mllvm -disable-tbaa` engineering
option allows disabling TBAA generation to override Flang's default
(e.g. when -O1 is used).

SPEC CPU2006/437.leslie3d speeds up by more than 2x on Icelake.

Reviewed By: jeanPerier, clementval

Differential Revision: https://reviews.llvm.org/D141820

[InstCombine] Handle PHI nodes in PtrReplacer

This patch adds on to the functionality implemented
in rG42ab5dc5a5dd6c79476104bdc921afa2a18559cf,
where PHI nodes are supported in the use-def traversal
algorithm to determine if an alloca is ever overwritten
in addition to a memmove/memcpy. This patch implements
the support needed by the PointerReplacer to collect
all (indirect) users of the alloca in cases where a PHI
is involved. Finally, a new PHI is defined in the replace
method which takes in replaced incoming values and
updates the WorkMap accordingly.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D136201

[MLIR][SCF] Fix comment in `TestTilingInterface.cpp` (NFC)

The method is called `tileConsumerAndFuseProducerGreedilyUsingSCFForOp`
and not `tileAndFuseGreedilyUsingSCFForOp`.

tsan: fix broken aarch64_39/42 mappings and expand them

The aarch64 39- and 42-bit mappings were broken: mappings to meta and shadow were not fully invertible. This CL introduces a working set of mappings, and also increases the size of some app regions:
* aarch64, 39-bit (2^39 == 512GB):
- Low: (Old) 4GB -> (New) 20GB
- Mid: 4GB -> 20GB
- Heap: 4GB -> 12GB
- High: 8GB -> 12GB
* aarch64, 42-bit (2^42 == 4TB):
- Low: 64GB -> 128GB
- Mid: 4GB -> 88GB
- Heap: 64GB -> 192GB
- High: 64GB

Additionally, this CL improves the code comments for all the linux aarch64 mappings.

Differential Revision: https://reviews.llvm.org/D141640

[mlir][vector] Fix extract op canonicalization for 0d vector

Fix ExtractOpFromBroadcast when the broadcast source is a 0d vector.

Differential Revision: https://reviews.llvm.org/D141735

[mlir][gpu] Improve foreach_thread distribution

Replace Ids with 0 when block dim is 1 when distributing foreach_thread.

Differential Revision: https://reviews.llvm.org/D141718

[mlir][vector] Add extra lowering for more transfer_write maps

Add a pattern to lower transfer_write ops whose permutation map is not a
permutation of a minor identity map.

Differential Revision: https://reviews.llvm.org/D141815

[mlir][EmitC] Remove Pure trait from `emitc.include`

The op `emitc.include` has no results, so with the 'Pure' trait it is
elided during canonicalization, which is not correct behavior. This change
removes the 'Pure' trait and adds a canonicalization test.

Reviewed By: jpienaar, marbre

Differential Revision: https://reviews.llvm.org/D141704

[mlir][vector] Fix lowering of permutation maps for transfer_write op

The lowering of transfer write permutation maps didn't match the op definition:
https://github.com/llvm/llvm-project/blob/93ccccb00d9717b58ba93f0942a243ba6dac4ef6/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td#L1476

Fix the lowering and add a case to the integration test in
order to enforce the correct semantic.

Differential Revision: https://reviews.llvm.org/D141801

[scudo] Fix -Wsign-compare warning

Fix crash in LLVM Dialect inliner interface: add support for llvm.return

The LLVM inliner was missing the `handleTerminator` method in the
Dialect interface implementation.

Fixes #60093

Differential Revision: https://reviews.llvm.org/D141901

Fix crash in scf.parallel verifier

Fixes #59989

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D141911

[clangd] Disable modernize-macro-to-enum tidy check

Check relies on seeing PP-directives from preamble, hence it's unusable.
See https://github.com/clangd/clangd/issues/1464.

[CVP] Avoid duplicate range calculation (NFC)

Calculate the range once for all the sdiv/srem transforms.

[AArch64][SVE] Implement isVScaleKnownToBeAPowerOfTwo

According to https://developer.arm.com/documentation/102105/ia-00/?lang=en

> Arm is making a retrospective change to the SVE architecture to remove
> the capability of selecting a non-power-of-two vector length in
> non-Streaming SVE as well as in Streaming SVE mode. Specific updates as
> a result of this change will be communicated in due course.

This patch implements the isVScaleKnownToBeAPowerOfTwo method to teach
DAG Combines that VScale will be known to be a power of 2, which helps
reduce or simplify some expressions (notably the udiv in vector trip
count expressions).
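
As a rough illustration of why the power-of-two guarantee helps: when the divisor in a trip count expression is known to be a power of two, the unsigned division can be replaced by a shift. The helpers below are hypothetical and are not the DAG combine code.

```cpp
#include <cassert>
#include <cstdint>

inline bool isPowerOfTwo(std::uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Exact base-2 logarithm of a power-of-two value.
inline unsigned log2Exact(std::uint64_t V) {
  assert(isPowerOfTwo(V));
  unsigned Shift = 0;
  while (V > 1) {
    V >>= 1;
    ++Shift;
  }
  return Shift;
}

// Trip count written with a general unsigned division.
inline std::uint64_t tripCountDiv(std::uint64_t N, std::uint64_t ElemsPerVector) {
  return N / ElemsPerVector;
}

// The same trip count as a shift, valid only for power-of-two divisors.
inline std::uint64_t tripCountShift(std::uint64_t N, std::uint64_t ElemsPerVector) {
  return N >> log2Exact(ElemsPerVector);
}

inline void checkTripCount() {
  // E.g. 4 elements per 128-bit granule with vscale == 2 gives 8 per vector.
  const std::uint64_t ElemsPerVector = 4 * 2;
  assert(tripCountDiv(1000, ElemsPerVector) == 125);
  assert(tripCountShift(1000, ElemsPerVector) == 125);
}
```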

Differential Revision: https://reviews.llvm.org/D141486

[CVP] Avoid duplicate range calculation (NFC)

Calculate the range once and use it in processURem() and
narrowUDivOrURem().

[CVP] Handle use-site conditions in domain-based folds

As a side-effect, this switches them to use getConstantRange() rather
than getPredicateAt(). getPredicateAt() is not supposed to be more
powerful than getConstantRange() for non-equality comparisons (as
long as block values are used).

Revert "[clang] Instantiate concepts with sugared template arguments"

This reverts commit b8064374b217db061213c561ec8f3376681ff9c8.

Based on the report here:
https://github.com/llvm/llvm-project/issues/59271

this produces a significant increase in memory use of the compiler and a
large compile-time regression. This patch reverts this so that we don't
branch for release with that issue.

[CVP] Handle use-site conditions in more folds

[flang] Support allocate with source for polymorphic entities

Apply the source type spec to the descriptor for
polymorphic entities.

Reviewed By: PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D141822

[VPlan] Remove duplicated VPValue IDs (NFCI).

At the moment, both VPValue and VPDef have an ID used when casting via
classof. This duplication is cumbersome, because it requires adding IDs
for new recipes twice and also requires setting them twice. In a few
cases, there's only a VPDef ID and no VPValue ID, which can cause some
confusion.

To simplify things, remove the VPValue IDs for different recipes.
Instead, only retain the generic VPValue ID (= used VPValues without a
corresponding defining recipe) and VPVRecipe for VPValues that are
defined by recipes that inherit from VPValue.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140848

[mlir][Transform] Add a transform.get_consumers_of_result navigation op

Differential Revision: https://reviews.llvm.org/D141930

[MIScheduler] Print top/down cycle in the SUnit dump.

Add an extra command line option to `llc` that allows checking at what cycle an instruction has been scheduled by the machine scheduler.

Differential Revision: https://reviews.llvm.org/D141289

[flang] Lower allocation with MOLD

Lower allocate statement with MOLD= to calls to the Fortran
runtime. PointerApplyMold and AllocatableApplyMold are called
depending on the object to be allocated.

Reviewed By: jeanPerier, PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D141843

[Flang] [OpenMP] Refine parser restrictions for OMP TARGET UPDATE clauses.

In Parser, move some clauses of OMP TARGET UPDATE to allowedOnceClauses so that restrictions will be imposed.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D141567

[MLIR][Transform] Introduce loop.coalesce transform op.

This patch makes a minor refactor of LoopCoalescing.cpp's walkLoops
templated method and places it in Affine's LoopUtils.cpp/h.
The method is also renamed to coalescePerfectlyNestedLoops. This
minor change enables the method to be invoked by both the original
LoopCoalescing pass and the newly introduced loop.coalesce transform op.

The loop.coalesce transform op can coalesce affine and scf loop nests,
leveraging the existing LoopCoalescing mechanism. I have created it inside
SCFTransformOps.td instead of AffineTransformOps.td because it is similar
in spirit to the loop.unroll op, which can handle both scf and affine
loops. Please let me know if you feel that this op
should be moved into AffineTransformOps.td instead.

The added testcase illustrates the loop.coalesce transform op working for
scf and affine loops (inner, outer), and shows that the
coalesced loop can be further unrolled (achieving composability).
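
For intuition, coalescing a perfect two-deep nest can be sketched as one loop over the product of the trip counts, with the original indices recovered by division and remainder. This is a generic C++ illustration under that assumption, not the MLIR implementation. `nested` and `coalesced` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Visit order of a perfectly nested two-deep loop.
inline std::vector<int> nested(int Rows, int Cols) {
  std::vector<int> Out;
  for (int I = 0; I < Rows; ++I)
    for (int J = 0; J < Cols; ++J)
      Out.push_back(I * 100 + J);
  return Out;
}

// Coalesced form: a single loop over Rows * Cols iterations that
// recomputes the original induction variables from the linear index.
inline std::vector<int> coalesced(int Rows, int Cols) {
  std::vector<int> Out;
  for (int K = 0; K < Rows * Cols; ++K) {
    int I = K / Cols; // outer induction variable
    int J = K % Cols; // inner induction variable
    Out.push_back(I * 100 + J);
  }
  return Out;
}

inline void checkCoalescedLoop() {
  assert(nested(3, 4) == coalesced(3, 4));
  assert(coalesced(3, 4).size() == 12);
}
```

The single loop has a simple linear trip count, which is what allows a later step such as unrolling to be applied to it.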

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D141202

[clang-repl] XFAIL riscv targets in simple-exception test case

This test fails for RISC-V. Arm targets are already XFAILed, so add
RISC-V to the XFAIL list.

Differential Revision: https://reviews.llvm.org/D141380

[InstCombine] Don't combine smul of i1 type constant one

Fixes: https://github.com/llvm/llvm-project/issues/59876

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D141214

[flang] fix FIRLangRef.md path

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D141416

[mlir][Linalg] Fix post-commit typo for 5443743ca1874acfe2d5654fedd4a0c0bed6777e

[mlir][Linalg] Add a transform.structured.pack operation

This revision introduces a `transform.structured.pack` operation to
transform any Linalg operation to a higher-dimensional Linalg operation on
packed operands.

`tensor.pack` (resp. `tensor.unpack`) operations are inserted for the operands
(resp. results) that need to be packed (resp. unpacked) according to the
`packed_sizes` specification.

At the moment, the packing operation always pads with `getZeroAttr` which will
need to be adjusted depending on the consumers.

Packing is limited to those dimensions that are indexed only by AffineDimExpr.
Packing more advanced indexings requires modular arithmetic that is outside the
scope of a `linalg.generic` at the moment.

Differential Revision: https://reviews.llvm.org/D141860

[AArch64][SVE] Fix typo after post review change to D141471.

[docs] Add llvm & clang release notes for LoongArch

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D141750

[AArch64][SVE] Fix crash for DestructiveBinaryComm zero merging

This fix is similar to D124325. The DestructiveBinaryComm
operation type may also be allocated the same register, so insert the LSL.

      movprfx       z0.s, p0/z, z0.s
      lsl z0.b, p0/m, z0.b, #0
      fmul z0.s, p0/m, z0.s, z0.s

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D141471

[flang][hlfir] Lower some character elemental references

Lower character elemental user procedures with constant length, and
both dynamic and constant length ADJUSTL, ADJUSTR, and MERGE references
(which leaves out MIN/MAX).

Character elemental user procedures with dynamic length are a bit more
involved, and since this is an edge case that is not currently supported,
I will take this on later.

Differential Revision: https://reviews.llvm.org/D141847

[flang][OpenMP] Parser support for the unroll construct (5.1)

Added parser support for the unroll construct.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D138229

[mlir][Tensor][NFC] Migrate Tensor dialect to the new fold API

See https://discourse.llvm.org/t/psa-new-improved-fold-method-signature-has-landed-please-update-your-downstream-projects/67618 for context

Differential Revision: https://reviews.llvm.org/D141530

[BitcodeReader] Allow reading pointer types from old IR

When opaque pointers are enabled and old IR with typed pointers is read,
the BitcodeReader automatically upgrades all typed pointers to opaque
pointers. This is a lossy conversion, i.e. when a function argument is a
pointer and unused, it’s impossible to reconstruct the original type
behind the pointer.

There are cases where the type information of pointers is needed. One is
reading DXIL, which is bitcode of old LLVM IR and makes a lot of use of
pointers in function signatures.
We’d like to keep using up-to-date llvm to read in and process DXIL, so
in the face of opaque pointers, we need some way to access the type
information of pointers from the read bitcode.

This patch allows extracting type information by supplying functions to
parseBitcodeFile that get called for each function signature or metadata
value. The function can access the type information via the reader’s
type IDs and the getTypeByID and getContainedTypeID functions.
The tests show, by example, how type info from pointers can be stored in
metadata for use after the BitcodeReader has finished.

Differential Revision: https://reviews.llvm.org/D127728

Fix bazel build overlay.

[VPlan] Add test for VPAllSuccessorIterator directly. (NFC)

Additional test coverage for D140511.

[flang][OpenMP] Added parser support for Tile Construct (OpenMP 5.1)

Added parser support for Tile Construct.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D136359

[VPlan] Remove unnecessary getNumSuccessors call (NFC).

If ParentWithSuccs is nullptr, the number of successors is guaranteed to
be 0. Simplify the code as suggested by @Ayal in D140511.

[ARM] Fix i1 shuffle lowering with multiple operands.

The existing lowering of i1 vector shuffle was only considering
single-source shuffles, always assuming the second was undef. This
extends that to properly handle both operands.

[Linker] Convert test to opaque pointers (NFC)

Remove pointer indirections to preserve test intent.

[Linker] Convert test to opaque pointers (NFC)

Removing pointer indirections to at least somewhat preserve test
intent. I wasn't aware this kind of directly co-recursive type
is even legal.

[mlir][vector] Share enums with the transform dialect

Refactor the definition of the enums that are used in the lower_vectors
operation of the transformation dialect.
This avoids duplicating the definition of all the configurations that
this operation can trigger.

NFC

Differential Revision: https://reviews.llvm.org/D141867

[libc] Fix memcpy inefficiency

[Linker] Convert test to opaque pointers (NFC)

To at least somewhat preserve the test intent, remove some
pointer indirections and make types structurally different.

[flang] Lower elemental and transformational clean-up in HLFIR

In lowering to hlfir, no clean-up was added yet for
the created hlfir.elemental. Add the needed hlfir.destroy.

Regarding transformational lowering, clean-ups were created because
they are lowered in memory, but this is inconvenient because this
prevented lowering to hlfir from "moving" the created variable to
an expression. Add a new entry point in IntrinsicCall.h that keeps
track of whether or not the returned storage needs to be deallocated,
but does not insert the deallocation in the StatementContext.
This allows the newly added hlfir.as_expr "move" aspect to be
used, which saves creating a copy.

Depends on D141839

Reviewed By: clementval

Differential Revision: https://reviews.llvm.org/D141841

Revert "[AArch64] fold subs ugt/ult to ands when the second operand is a mask"

This reverts commit 4a64024c1410692197e4b54e27e7b269a67c78f4.

The original commit made a mistake: the reverse of ugt should be ule.

[NFC][WebAssembly] Update test

Run update_llc_test_checks.py on address-offsets.ll

[flang][hlfir] Add hlfir.destroy operation.

Add the operation to mark the end of life of hlfir.expr.
As described in its description, this is the easiest solution
to deploy given that lowering "knows" where expression values are last
used.
However, inserting these points in lowering will probably make
it harder to do some IR transformations that would move the code
using or creating hlfir.expr (no use should be moved after an
hlfir.destroy).
Once the dust settles with the HLFIR change, it will be worth assessing
the situation to see if an analysis could do a better and safer job of
finding those destruction points.

Depends on D141832

Reviewed By: clementval

Differential Revision: https://reviews.llvm.org/D141839

[flang][hlfir] Add move semantics to hlfir.as_expr.

hlfir.as_expr allows turning an array, character, or derived type
variable into a value when the usage requires an hlfir.expr (e.g.,
when returning the element value inside an hlfir.elemental).

The default implementation of this operation in bufferization is to
make a copy of the variable into a temporary buffer.
This adds a time and memory overhead in cases where such copy is not
needed because the variable is already a temporary that was created
in lowering to compute the expression value, and the "as_expr" is
the sole usage of the variable.

This is for instance the case for many transformational intrinsics
that do not have hlfir.expr operation (at least for now, but some may
never benefit from having one) and must be implemented "on memory"
in lowering.

This patch adds a way to "move" the variable storage along with its value.
It allows the bufferization to re-use the variable storage for the
hlfir.expr created by hlfir.as_expr, and in exchange, the
responsibility of deallocating the buffer (if the variable was heap
allocated) is passed along to the hlfir.expr, and will need to be
done after the last hlfir.expr usage.

Differential Revision: https://reviews.llvm.org/D141832

[MLIR] Convert some tests to opaque pointers (NFC)

[MLIR] Convert test to opaque pointers (NFC)