review.tizen.org Git - platform/upstream/llvm.git/log

[ELF] Reword symMap/symVector comment. NFC

Having symVector makes iteration efficient and is actually more
efficient than using llvm::DenseMap<llvm::CachedHashStringRef, Symbol
*>, so the FIXME comment can be removed. Using an alternative
implementation ankerl/unordered_dense.h decreases link time for chromium
by 0.x% but I am unsure it justifies the extra header file.

[lldb] Accept negative indexes in __getitem__

To the Python bindings, add support for Python-like negative indexes.

While using `script`, I tried to access a thread's bottom frame with
`thread.frame[-1]`, but that failed. This change updates the `__getitem__`
implementations to support negative indexes as one would expect in Python.
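
As a rough illustration of the indexing behavior described above, here is a
minimal sketch. `FrameList` is a hypothetical stand-in, not one of the actual
lldb classes, and the real bindings may normalize the index differently.

```python
class FrameList:
    """Hypothetical stand-in for an indexable, list-like binding object."""

    def __init__(self, frames):
        self._frames = list(frames)

    def __len__(self):
        return len(self._frames)

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("index must be an integer")
        count = len(self)
        # Map a negative index onto its non-negative equivalent,
        # so that -1 refers to the last element as in plain Python lists.
        if index < 0:
            index += count
        if not 0 <= index < count:
            raise IndexError("index out of range")
        return self._frames[index]


frames = FrameList(["frame #0", "frame #1", "frame #2"])
bottom = frames[-1]  # the last (bottom) frame
```

Raising `IndexError` for out-of-range values keeps the object usable in a
plain `for` loop that relies on the legacy sequence protocol.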

Differential Revision: https://reviews.llvm.org/D143282

[TLS]: Clamp the alignment of TLS global variables if required by the target

Adding a module flag 'MaxTLSAlign' describing the maximum alignment a global TLS
variable can have. Optimizers are prevented from increasing the alignment of such
variables beyond this threshold.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D140123

[SROA] Pre-commit vector-promotion.ll tests for D143225

[LLDB] Fix assertion failure by removing `CopyType` in `std::coroutine_handle` pretty printer

The pretty printer for `std::coroutine_handle` was running into
> Assertion failed: (target_ctx != source_ctx && "Can't import into itself")
from ClangASTImporter.h, line 270.

This commit fixes the issue by removing the `CopyType` call from the
pretty printer. While this call was necessary in the past, it seems to
be no longer required, at least all test cases are still passing. Maybe
something changed in the meantime around the handling of `TypesystemClang`
instances. I don't quite understand why `CopyType` was necessary earlier.

I am not sure how to add a regression test for this, though. It seems
the issue is already triggered by the existing `TestCoroutineHandle.py`,
but API tests seem to ignore all violations of `lldbassert` and still
report the test as "passed", even if assertions were triggered.

Differential Revision: https://reviews.llvm.org/D143127

[RISCV] Precommit a test for upcoming miscompile bugfix

[DAG] Fold freeze(build_pair(x,y)) -> build_pair(freeze(x),freeze(y))

One of the cleanups necessary for D136529 - another being how we're going to handle moving freeze through multiple result nodes (like uaddo and subcarry)

[flang] Fix rank and byte stride in pointer remapping

In some remapping cases, the rank of the pointer differs
from that of the target.

```
program remap
  type :: t
    integer :: a
  end type t
  type(t), target :: ta(10) = [ (t(i),i=1,10) ]
  class(t), pointer :: p(:,:)
  p(1:2,1:5) => ta
end
```

This patch updates the rank and the byte stride to fix such cases.

Reviewed By: klausler

Differential Revision: https://reviews.llvm.org/D143566

[RuntimeDyld][ELF] Fixed relocations referencing undefined TLS symbols

The classification of TLS symbols in ELF was changed from ST_Data to
ST_Other in the following commit:
018a484cd26d72fb4c9e7fd75e5f5bc7838dfc73

RuntimeDyldELF::processRelocationRef() needs to be updated to also
handle ST_Other symbols so that it handles TLS relocations correctly.
The current tests did not fail because we have a shortcut for global
symbols that are already defined.

Differential Revision: https://reviews.llvm.org/D143568

[X86] Merge DQ/BW AVX512 ISD::ABDS/ABDU setOperationAction calls. NFCI.

All set to Custom - there's no need to have them in separate loops

[flang] Fix optional assertion in PFTBuilder

D142279 enabled assertions in libstdc++ and one was triggered
in the PFTBuilder because an optional was accessed even though it was
null.
This patch fixes this issue and adds a regression test.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D143589

[LV] Perform recurrence sinking directly on VPlan.

This patch updates LV to sink recipes directly using the VPlan use
chains. The initial patch only moves sinking to be purely VPlan-based.
Follow-up patches will move legality checks to VPlan as well.

At the moment, there's a single test failure remaining.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142589

Revert "[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices."

This patch causes a regression, so reverting it while I investigate the issue.

This reverts commit e6eb84a191ca2a1afd5789c5bb398da68bb6065e.

Revert "[Support] change StringMap hash function from djbHash to xxHash"

This reverts commit d768b97424f9e1a0aae45440a18b99f21c4027ce.

Causes sanitizer failure: https://lab.llvm.org/buildbot/#/builders/238/builds/1114

```
/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/llvm/lib/Support/xxhash.cpp:107:12: runtime error: applying non-zero offset 8 to null pointer
#0 0xaaaab28ec6c8 in llvm::xxHash64(llvm::StringRef) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/llvm/lib/Support/xxhash.cpp:107:12
#1 0xaaaab28cbd38 in llvm::StringMapImpl::LookupBucketFor(llvm::StringRef) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/llvm/lib/Support/StringMap.cpp:87:28
```

Probably causes test failure in `warn-unsafe-buffer-usage-fixits-local-var-span.cpp`: https://lab.llvm.org/buildbot/#/builders/60/builds/10619

Probably causes reverse-iteration test failure in `test-output-format.ll`: https://lab.llvm.org/buildbot/#/builders/54/builds/3545

[flang] Unlimited polymorphic allocated as character

Allocation of unlimited polymorphic allocatable with
character intrinsic type is now done through
`PointerNullifyCharacter` or `AllocatableInitCharacter` so the length
is correctly set.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D143580

[AArch64][GlobalISel] Lower formal arguments of AAPCS & ms_abi variadic functions.

Reimplemented SelectionDAG code for GlobalISel.

Fixes https://github.com/llvm/llvm-project/issues/54079

Differential Revision: https://reviews.llvm.org/D130903

[ARM] Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFC.

Use APInt::setBit() method instead of OR'ing individual bits.

[hexagon] Turning off sign mismatch warning by default.

Patch-by: Colin Lemahieu <colinl@codeaurora.org>
Differential Revision: https://reviews.llvm.org/D143531

[flang] Support polymorphic inputs for UNPACK intrinsic

Result must carry the polymorphic type information
from the vector.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D143575

Recommit "[ConstraintElimination] Move Value2Index map to ConstraintSystem (NFC)"

This reverts commit 665ee0cd57f92a112cf8e929d00768e282fb205a.

Fix comments and formatting style.

[libc] Don't try to use MPFR with the GPU build for now

Summary:
We don't have the infrastructure to support MPFR on the GPU. We should
disable this categorically on GPU builds for now.

[libc][bazel] Add missing libc_root dep

[InstSimplify] add tests for strict fadd with SNaN operand; NFC

[DSE] Add test with llvm.memcpy & memcpy_chk.

This adds test coverage to avoid crashes with further changes.

[AArch64] Fix creation of invalid instructions with XZR register

A combination of GlobalISel and MachineCombiner can end up creating
`SUB xzr, (MOVI -2105098)` instructions which have not been constant
folded. The AArch64MIPeepholeOpt pass will then attempt to create
`ADD xzr, 513, lsl 12`, which is not a valid instruction. This adds
a bail out of the transform if the register is xzr/wzr.

Fixes #60528

Differential Revision: https://reviews.llvm.org/D143475

[NVPTX] Increase inline threshold multiplier to 11 in nvptx backend.

I used https://github.com/zjin-lcf/HeCBench (with nvcc usage swapped to
clang++), which is an adaptation of the classic Rodinia benchmarks aimed
at CUDA and SYCL programming models, to compare different values of the
multiplier using both clang++ cuda and clang++ sycl nvptx backends. I
find that the value is currently too low for both cases. Qualitatively
(and in most cases there is a very close quantitative agreement across
both cases) the change in code execution time for a range of values from
5 to 1000 matches in both variations (CUDA clang++ vs SYCL (with cuda
backend) using the intel/llvm clang++ compiler) of the HeCbench samples.
This value of 11 is optimal for clang++ cuda for all cases I've
investigated. I have not found a single case where performance is
degraded by this change of the value from 5 to 11. For one sample the
sycl cuda backend preferred a higher value. However we are happy to
prioritize clang++ cuda, and we find that this value is close to ideal
for both cases anyway. It would be good to do some further investigation
using clang++ openmp cuda offload. However since I do not know of an
appropriate set of benchmarks for this case, and the fact that we are
now getting complaints about register spills related to insufficient
inlining on a weekly basis, we have decided to propose this change and
potentially seek some more input from someone who may have more
expertise in the openmp case. Incidentally this value coincides with the
value used for the amd-gcn backend. We have also been able to use the
amd backend of the intel/llvm "dpc++" compiler to compare the inlining
behaviour of an identical code when targeting amd (compared to nvptx).
Unsurprisingly the amd backend with a multiplier value of 11 was
performing better (with regard to inlining) than the nvptx case when the
value of 5 was used. When the two backends use the same multiplier value
the inlining behaviors appear to align closely.

This also considerably improves the performance of at least one of the
most popular HPC applications: NWCHEMX.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Reviewed by: tra
Differential Revision: https://reviews.llvm.org/D142232

[SanitizerBinaryMetadata] Emit constants as ULEB128

Emit all constant integers produced by SanitizerBinaryMetadata as
ULEB128 to further reduce binary space used. Increasing the version is
not necessary given this change depends on (and will land along with)
the bump to v2.

To support this, the !pcsections metadata format is extended to allow
for per-section options, encoded in the first MD operand which must
always be a string and contain the section: "<section>!<options>".
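
For readers unfamiliar with the encoding, the following is a minimal sketch of
ULEB128 (7 payload bits per byte, high bit set on every byte except the last).
The function names are illustrative only and are not taken from the patch.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128 bytes."""
    if value < 0:
        raise ValueError("ULEB128 encodes unsigned values only")
    out = bytearray()
    while True:
        byte = value & 0x7F      # low 7 bits go into this byte
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte, high bit clear
            return bytes(out)


def decode_uleb128(data):
    """Decode one ULEB128 value; return (value, number_of_bytes_consumed)."""
    result = 0
    shift = 0
    for consumed, byte in enumerate(data, start=1):
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, consumed
        shift += 7
    raise ValueError("truncated ULEB128 value")
```

Small constants fit in a single byte, which is where the space saving over a
fixed 4-byte field comes from.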

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D143484

[SanitizerBinaryMetadata] Optimize used space for features and UAR stack args

Optimize the encoding of "covered" metadata by:

1. Reducing feature mask from 4 bytes to 1 byte (needs increase once we
reach more than 8 features).

2. Only emitting UAR stack args size if it is non-zero, saving 4 bytes
in the common case.

One caveat is that the emitted metadata for function PC (offset), size,
and UAR size (if enabled) are no longer aligned to 4 bytes.

SanitizerBinaryMetadata version base is increased to 2, since the change
is backwards incompatible.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D143482

[bazel] Actually put Importer in the right library

Fixes a81136c332

[bazel] Port b83caa32dc

[xxHash] Don't trigger UB on empty StringRef

This is quite silly, but casting to uintptr_t seems like the easiest
option to quiet ubsan.

```
llvm/lib/Support/xxhash.cpp:107:12: runtime error: applying non-zero offset 8 to null pointer
#0 0x7fe3660404c0 in llvm::xxHash64(llvm::StringRef) llvm/lib/Support/xxhash.cpp:107:12
```

[flang][NFC] Move Procedure designator lowering in its own file

Code move without any change, the goal is to re-use this piece of
code for procedure designator lowering in HLFIR since there is no
significant changes in the way procedure designators will be
lowered.

Differential Revision: https://reviews.llvm.org/D143563

[DAG] Fold Op(vecreduce(a), vecreduce(b)) into vecreduce(Op(a,b))

So long as the operation is reassociative, we can reassociate the double
vecreduce from for example fadd(vecreduce(a), vecreduce(b)) to
vecreduce(fadd(a,b)). This will in general save a few instructions, but some
architectures (MVE) require the opposite fold, so a shouldExpandReduction is
added to account for it. Only targets that use shouldExpandReduction will be
affected.
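
The identity behind the fold can be checked with a small sketch. It uses
integer addition, which is exactly reassociative; for floating-point fadd the
fold is only valid when reassociation is permitted. The helper names are
illustrative, not DAGCombiner APIs.

```python
from functools import reduce
import operator


def vecreduce(op, vec):
    """Reduce all lanes of a vector to a scalar with a binary op."""
    return reduce(op, vec)


def elementwise(op, a, b):
    """Apply a binary op lane by lane to two equally sized vectors."""
    return [op(x, y) for x, y in zip(a, b)]


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Op(vecreduce(a), vecreduce(b)): two reductions plus one scalar op.
before = operator.add(vecreduce(operator.add, a), vecreduce(operator.add, b))

# vecreduce(Op(a, b)): one vector op plus a single reduction.
after = vecreduce(operator.add, elementwise(operator.add, a, b))
```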

Differential Revision: https://reviews.llvm.org/D141870

Revert "[ConstraintElimination] Move Value2Index map to ConstraintSystem (NFC)"

This reverts commit 40ffe9c167395256b43846733ab69eec17eead78.

Reverted because some comments were missed in the review https://reviews.llvm.org/D142647

[mlir][llvm] Add MD_prof import error handling

This commit adds additional checks and warning messages to the MD_prof
import. As LLVM does not verify most metadata, the import has to be
resilient towards ill-formatted inputs.

Reviewed By: gysit

Differential Revision: https://reviews.llvm.org/D143492

[ConstraintElimination] Move Value2Index map to ConstraintSystem (NFC)

Differential Revision: https://reviews.llvm.org/D142647

[mlir][llvm] Add support for loop metadata import

This commit introduces functionality to import loop metadata. Loop
metadata nodes are transformed into LoopAnnotationAttrs and attached to
the corresponding branch operations.

Reviewed By: gysit

Differential Revision: https://reviews.llvm.org/D143376

[flang] Use clang sysroot image to test fastmath linking

This test has been very unreliable across different machines. Update it
to use clang's sysroot image so that the fastmath object file name is
stable across different distributions and distro types.

Based on clang/test/Driver/linux-ld.c

Thanks to mnadeem for pointing this out at https://reviews.llvm.org/D138675

Differential Revision: https://reviews.llvm.org/D142807

[flang][NFC] add convertToX functions to HLFIRTools

These will be useful for sharing code with intrinsic argument processing
when lowering hlfir transformational intrinsic operations to FIR in
the BufferizeHLFIR pass.

Differential Revision: https://reviews.llvm.org/D143503

[clang][AIX] Remove test for the default OpenMP runtime

The default OpenMP runtime may not be libomp since it can be changed
by specified `CLANG_DEFAULT_OPENMP_RUNTIME`. This test will fail if
we change the default OpenMP runtime.

This patch removes test for the default OpenMP runtime and moves the
CHECKs downward.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D143549

[X86] Add ISD::ABDS/ABDU vXi64 support on SSE41+ targets

If IMINMAX ops aren't legal, we can lower to the select(icmp(x,y),sub(x,y),sub(y,x)) pattern
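
A minimal sketch of the select pattern on 64-bit lanes, modelled with masked
Python integers. The choice of a greater-than predicate (unsigned for ABDU,
signed for ABDS) is an assumption made for illustration; the actual lowering
may use a different but equivalent comparison.

```python
MASK64 = (1 << 64) - 1


def to_signed64(x):
    """Reinterpret a 64-bit pattern as a signed two's-complement value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def abdu64(x, y):
    """Unsigned absolute difference: select(x >u y, x - y, y - x)."""
    x &= MASK64
    y &= MASK64
    return (x - y) & MASK64 if x > y else (y - x) & MASK64


def abds64(x, y):
    """Signed absolute difference: select(x >s y, x - y, y - x)."""
    sx, sy = to_signed64(x), to_signed64(y)
    return (sx - sy) & MASK64 if sx > sy else (sy - sx) & MASK64
```

Selecting between the two subtractions avoids needing a wider intermediate
type: the chosen difference is always non-negative and fits in 64 bits.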

[flang] Add a proper TODO for polymorphic array lowering with vector subscript

Creation of a polymorphic array temporary cannot be done inline.
Add a TODO so the current code exits in a clean way when lowering
reaches it. A solution involving the runtime will be put in place.

Depends on D143490

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D143491

[flang][NFC] Centralize fir.class addition in ConvertType

The fir.class type is always needed for polymorphic and unlimited
polymorphic entities. Wrapping the element type with a fir.class
type was done in ConvertType for some cases and elsewhere in the
code for others. Centralize this in ConvertType when converting
from expr or symbol.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D143490

[mlir][MemRef] Add option to `-finalize-memref-to-llvm` to emit opaque pointers

This is the first patch in a series of patches part of this RFC: https://discourse.llvm.org/t/rfc-switching-the-llvm-dialect-and-dialect-lowerings-to-opaque-pointers/68179

This patch adds the ability to lower the memref dialect to the LLVM Dialect with the use of opaque pointers instead of typed pointers. The latter are being phased out of LLVM and this patch is part of an effort to phase them out of MLIR as well. To do this, we'll need to support both typed and opaque pointers in lowering passes, to allow downstream projects to change without breakage.

The gist of changes required to change a conversion pass are:
* Change any `LLVM::LLVMPointerType::get` calls to NOT use an element type if opaque pointers are to be used.
* Use the `build` method of `llvm.load` with the explicit result type. Since the pointer does not have an element type anymore it has to be specified explicitly.
* Use the `build` method of `llvm.getelementptr` with the explicit `basePtrType`. Ditto to above, we have to now specify what the element type is so that GEP can do its indexing calculations.
* Use the `build` method of `llvm.alloca` with the explicit `elementType`. Ditto to the above, alloca needs to know how many bytes to allocate through the element type.
* Get rid of any `llvm.bitcast`s
* Adapt the tests to the above. Note that `llvm.store` changes syntax as well when using opaque pointers

I'd like to note that the 3 `build` method changes work for both opaque and typed pointers, so unconditionally using the explicit element type form is always correct.

For the testsuite a practical approach suggested by @ftynse was taken: I created a separate test file for testing the typed pointer lowering of Ops. This mostly comes down to checking that bitcasts have been created at the appropriate places, since these are required for typed pointer support.

Differential Revision: https://reviews.llvm.org/D143268

[lit] Pass LLVM_PROFILE_FILE environment

When building a PGO version of LLVM you might want to customize
the output profile file when building tests. For this to work
we need to pass the LLVM_PROFILE_FILE environment variable.

Reviewed By: abrachet

Differential Revision: https://reviews.llvm.org/D143556

[SanitizerBinaryMetadata] Make module_[cd]tor external

If a COMDAT key has a local linkage, it behaves as `comdat nodeduplicate` and
llvm/lib/Linker/LinkModules.cpp does not deduplicate its members.
This is not intended. Switch to an external linkage to allow deduplication.

See also https://maskray.me/blog/2021-07-25-comdat-and-section-group#grp_comdat

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D143530

[LoongArch] Merge the 12bit constant address into the offset field of the instruction

There are 12bit offset fields in the ld.[b/h/w/d] and st.[b/h/w/d].
When the constant address is less than 12 bits, the address
calculation is incorporated into the offset field of the instruction.

Differential Revision: https://reviews.llvm.org/D143470

[C++20] [Modules] Allow -fmodule-file=<module-name>=<BMI-Path> for implementation unit and document the behavior

Close https://github.com/llvm/llvm-project/issues/57293.

Previously, we couldn't use `-fmodule-file=<module-name>=<BMI-Path>` for
implementation units, which is a bug. Also, the behavior of the above option
is neither tested nor documented for C++20 Modules. This patch addresses
both problems.

[mlir][bufferize][NFC] OneShotAnalysis: Expose analysis hooks from AnalysisState

This is in preparation of reusing the same AnalysisState for tensor.empty elimination and One-Shot Bufferize (to address performance bottlenecks).

Differential Revision: https://reviews.llvm.org/D143379

[flang] Carry over the derived type from MOLD

Derived type from the MOLD was not carried over
to the newly allocated pointer or allocatable.
This may lead to wrong dynamic type when the pointer or allocatable
is polymorphic as shown in the example below:

```
type :: p1
integer :: a
end type

type, extends(p1) :: p2
integer :: b
end type

class(p1), pointer :: p(:)

allocate(p(5), MOLD=p2(1,2))
```

Reviewed By: klausler

Differential Revision: https://reviews.llvm.org/D143525

[mlir][bufferize][NFC] Merge AnalysisState and BufferizationAliasInfo

There is no longer a need to keep the two separate. This is in preparation of reusing the same AnalysisState for tensor.empty elimination and One-Shot Bufferize (to address performance bottlenecks).

Differential Revision: https://reviews.llvm.org/D143313

[compiler-rt][macOS]: Disable iOS support if iOS SDK is not found

If you are missing the iOS SDK on your macOS (for example you don't have
full Xcode but just CommandLineTools) then CMake currently errors
out without a helpful message. This patch disables iOS support in
compiler-rt if the iOS SDK is not found. This can be overridden by
passing -DCOMPILER_RT_ENABLE_IOS=ON.

Reviewed By: delcypher, thetruestblue

Differential Revision: https://reviews.llvm.org/D133273

Revert "[RISCV] Add performMULcombine to perform strength-reduction"

This reverts commit 3304d51b676ea511feca28089cb60eba3873132e.

Revert "[RISCV] Add vendor-defined XTHeadBs (single-bit) extension"

This reverts commit 656188ddc4075eb50260607b3497589873f373d2.

Revert "[RISCV] Add vendor-defined XTheadBb (basic bit-manipulation) extension"

This reverts commit 19a59099095b3cbc9846e5330de26fca0a44ccbe.

[RISCV] Fix comment for Zba tests. NFC.

The comments in the Zba tests were referring to the "bitmanip base"
extension (i.e., the Zbb). Fix it.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D143534

[RISCV] Add vendor-defined XTheadBb (basic bit-manipulation) extension

The vendor-defined XTHeadBb (predating the standard Zbb extension)
extension adds some bit-manipulation instructions with semantics
somewhat similar to some of the Zbb instructions.

It is supported by the C9xx cores (e.g., found in the wild in the
Allwinner D1) by Alibaba T-Head.

The current (as of this commit) public documentation for XTHeadBb is
available from:
https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf

Support for these instructions has already landed in GNU Binutils:
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=8254c3d2c94ae5458095ea6c25446ba89134b9da

Depends on D143036

Differential Revision: https://reviews.llvm.org/D143439

[RISCV] Add vendor-defined XTHeadBs (single-bit) extension

The vendor-defined XTHeadBs (predating the standard Zbs extension)
extension adds a bit-test instruction (th.tst) with similar semantics
as bexti from Zbs.  It is supported by the C9xx cores (e.g., found in
the wild in the Allwinner D1) by Alibaba T-Head.

The current (as of this commit) public documentation for XTHeadBs is
available from:
  https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf

Support for these instructions has already landed in GNU Binutils:
  https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=8254c3d2c94ae5458095ea6c25446ba89134b9da

Depends on D143394

Differential Revision: https://reviews.llvm.org/D143036

[RISCV] Add performMULcombine to perform strength-reduction

The RISC-V backend thus far does not provide strength-reduction, which
has led to a long (but not complete) list of 3-instruction patterns
being written out to utilize the shift-and-add instructions from Zba and
XTHeadBa for strength-reduction.

This adds the logic to perform strength-reduction through the DAG
combine for ISD::MUL.  Initially, we wire this up for XTheadBa only,
until this has had some time to settle and get real-world test
exposure.

The following strength-reduction strategies are currently supported:
  - XTheadBa
    - C = (n + 1)           // th.addsl
    - C = (n + 1)k          // th.addsl, slli
    - C = (n + 1)(m + 1)    // th.addsl, th.addsl
    - C = (n + 1)(m + 1)k   // th.addsl, th.addsl, slli
    - C = ((n + 1)m + 1)    // th.addsl, th.addsl
    - C = ((n + 1)m + 1)k   // th.addsl, th.addsl, slli
  - base ISA
    - C being 2 set-bits    // slli, slli, add
       (possibly slli, th.addsl)
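
The XTheadBa decompositions listed above can be sanity-checked with a small
sketch. It assumes that `th.addsl rd, rs1, rs2, imm2` computes
`rs1 + (rs2 << imm2)` with `imm2` in 0..3, that n and m are powers of two
expressible as such a shift, and that k is a power of two; register wraparound
is ignored. The helper names are illustrative only.

```python
def addsl(rs1, rs2, shamt):
    """Model of th.addsl (assumed semantics): rs1 + (rs2 << shamt)."""
    assert 0 <= shamt <= 3
    return rs1 + (rs2 << shamt)


def slli(rs, shamt):
    """Model of slli: logical shift left by an immediate."""
    return rs << shamt


def mul_n1(x, s):
    """C = (n + 1), with n = 2**s: one th.addsl."""
    return addsl(x, x, s)


def mul_n1_k(x, s, t):
    """C = (n + 1)k, with k = 2**t: th.addsl, slli."""
    return slli(addsl(x, x, s), t)


def mul_n1_m1(x, s, u):
    """C = (n + 1)(m + 1), with m = 2**u: th.addsl, th.addsl."""
    y = addsl(x, x, s)
    return addsl(y, y, u)


def mul_n1m_1(x, s, u):
    """C = ((n + 1)m + 1): th.addsl, th.addsl."""
    y = addsl(x, x, s)
    return addsl(x, y, u)
```

For example, multiplying by 11 = (4 + 1) * 2 + 1 is `mul_n1m_1(x, 2, 1)`.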

Even though the slli+slli+add sequence would be supported without
XTheadBa, this currently is gated to avoid having to update a large
number of test cases (i.e., anything that has a multiplication with a
constant where only 2 bits are set) in this commit.

With the strength reduction now being performed in performMULcombine,
we drop the (now redundant) patterns from RISCVInstrInfoXTHead.td.

Depends on D143029

Differential Revision: https://reviews.llvm.org/D143394

[RISCV] Add vendor-defined XTHeadBa (address-generation) extension

The vendor-defined XTHeadBa (predating the standard Zba extension)
extension adds an address-generation instruction (th.addsl) with
similar semantics as sh[123]add from Zba.  It is supported by the C9xx
cores (e.g., found in the wild in the Allwinner D1) by Alibaba T-Head.

The current (as of this commit) public documentation for XTHeadBa is
available from:
  https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf

Support for these instructions has already landed in GNU Binutils:
  https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=8254c3d2c94ae5458095ea6c25446ba89134b9da

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D143029

DAGCombiner: fix -Wunused-private-field. NFC

[AMDGPU] Introduce never uniform bit field in tablegen

IsNeverUniform can be set to 1 to mark instructions which are
inherently never-uniform/divergent. Enabling this bit in
Writelane instruction for now. To be extended to all required
instructions.

Reviewed By: arsenm, sameerds, #amdgpu

Differential Revision: https://reviews.llvm.org/D143154

[libc++] Implement LWG3657 std::hash<filesystem::path>

This is implemented as a DR on top of C++17.

Differential Revision: https://reviews.llvm.org/D143452

[VP][DAGCombiner] Introduce generalized pattern match for vp sdnodes.

The patch tries to solve duplicated combine work for vp sdnodes. The idea is to
introduce a MatchContext that verifies specific patterns and generates specific node
information. There are two MatchContexts in DAGCombiner. EmptyMatcher is for
normal nodes and VPMatcher is for vp nodes.

The idea of this patch comes from Simon Moll's proposal [0]. I only fixed some
minor issues and added a few new features in this patch.

[0]: https://github.com/sx-aurora-dev/llvm-project/commit/c38a14484aa2945f3b05369560b65916dd832f76

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D141891

[Support] change StringMap hash function from djbHash to xxHash

Depends on https://reviews.llvm.org/D142861.

Alternative to https://reviews.llvm.org/D137601.

xxHash is much faster than djbHash. This makes a simple Rust test case with a large constant string 10% faster to compile.

Previous attempts at changing this hash function (e.g. https://reviews.llvm.org/D97396) had to be reverted due to breaking tests that depended on iteration order.
No additional tests fail with this patch compared to `main` when running `check-all` with `-DLLVM_ENABLE_PROJECTS="all"` (on a Linux host), so I hope I found everything that needs to be changed.

Differential Revision: https://reviews.llvm.org/D142862

[MachineCopyPropagation] Eliminate spillage copies that might be caused by eviction chain

Remove spill-reload like copy chains. For example
```
r0 = COPY r1
r1 = COPY r2
r2 = COPY r3
r3 = COPY r4
<def-use r4>
r4 = COPY r3
r3 = COPY r2
r2 = COPY r1
r1 = COPY r0
```
will be folded into
```
r0 = COPY r1
r1 = COPY r4
<def-use r4>
r4 = COPY r1
r1 = COPY r0
```
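
As a sanity check of the example above, the following sketch simulates both
sequences on symbolic register values and compares the results. Modelling
`<def-use r4>` as "read r4, then overwrite it" is an assumption made for
illustration; the interpreter is not related to the actual pass.

```python
def run(program, initial):
    """Execute COPYs on a register file; return (final_regs, values_used)."""
    regs = dict(initial)
    used = []
    for instr in program:
        if instr == "def-use r4":
            used.append(regs["r4"])   # the use observes the current r4
            regs["r4"] = "clobbered"  # the def overwrites r4
        else:
            dst, src = instr
            regs[dst] = regs[src]     # dst = COPY src
    return regs, used


original = [
    ("r0", "r1"), ("r1", "r2"), ("r2", "r3"), ("r3", "r4"),
    "def-use r4",
    ("r4", "r3"), ("r3", "r2"), ("r2", "r1"), ("r1", "r0"),
]
folded = [
    ("r0", "r1"), ("r1", "r4"),
    "def-use r4",
    ("r4", "r1"), ("r1", "r0"),
]

initial = {"r0": "a", "r1": "b", "r2": "c", "r3": "d", "r4": "e"}
original_result = run(original, initial)
folded_result = run(folded, initial)
```

Both sequences leave every register with the same value and expose the same
value to the use of r4, while the folded form needs four fewer copies.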

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D122118

[Clang] Disable building tools for 32-bit hosts as well

Summary:
Offloading is not supported on 32-bit applications. We already disable
this for 32-bit cross-compiling but we also need to disable it for
32-bit native machines as well.

[mlir][vector] Support 0-D vector when eliding single element reduction

ElideSingleElementReduction causes an assertion failure when given a 0-D vector. It's possible to fold the case by using the vector.extractelement op instead. It was originally reported in https://github.com/llvm/llvm-project/issues/60193.

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D143242

Revert "[llvm-profdata] Add option to cap profile output size"

This reverts commit 48f163b889a8f373474c7d198c43e27779f38692.

[llvm-profdata] Add option to cap profile output size

D139603 (add option to llvm-profdata to reduce output profile size) contains test cases that are not cross-platform. This moves those tests to unit tests and makes sure the feature is callable from the LLVM library.

Reviewed By: snehasish

Differential Revision: https://reviews.llvm.org/D141446

[mlir][IRNumbering] Fix the dialect comparator to be strict

Check if rhs is the dialect to be ordered first, ensuring that
we don't inadvertently order something before it by
falling back to pure number comparison.

This only shows up depending on the implementation of
stable_sort. This was hit in a build of MSVC that was
checking for strict ordering.
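
A minimal sketch of the strictness problem, using (name, number) tuples in
place of dialect entries. Both comparators are hypothetical reconstructions
for illustration and are not the code from IRNumbering.

```python
from functools import cmp_to_key


def non_strict_less(lhs, rhs, first):
    """Hypothetical flawed comparator: only checks lhs for the first dialect."""
    if lhs[0] == first:
        return True
    return lhs[1] < rhs[1]  # may order rhs == first after something else


def strict_less(lhs, rhs, first):
    """Comparator that also checks rhs, giving a strict weak ordering."""
    if lhs[0] == first:
        return rhs[0] != first
    if rhs[0] == first:
        return False
    return lhs[1] < rhs[1]


def violates_strictness(less, a, b, first):
    """True if less(a, a) holds, or both less(a, b) and less(b, a) hold."""
    return less(a, a, first) or (less(a, b, first) and less(b, a, first))


def sort_dialects(entries, first):
    """Sort entries with the strict comparator via cmp_to_key."""
    def cmp(a, b):
        if strict_less(a, b, first):
            return -1
        if strict_less(b, a, first):
            return 1
        return 0
    return sorted(entries, key=cmp_to_key(cmp))


builtin = ("builtin", 5)
arith = ("arith", 1)
ordered = sort_dialects([("func", 3), arith, builtin], "builtin")
```

With the flawed variant, `builtin` and `arith` each compare less than the
other, which is the kind of inconsistency a checked sort implementation
rejects.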

[AArch64] Fix missing comment on D138888, NFC

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D143459

[flang] Enable LoongArch for x86CompatibleBehavior in floating point flag

Similar to D138503 for RISC-V, which fixes the flang-OldUnit test failure:

```
.../llvm-project/flang/unittests/Evaluate/real.cpp:504: FAIL:
FlagsToBits(prod.flags) == 0x18, not 0x10
0 0x800001 * 0xbf7ffffe
```

With this patch applied, all `check-flang` tests pass.

Reviewed By: vzakhari

Differential Revision: https://reviews.llvm.org/D143132

[flang] Add LoongArch64 support to lib/Optimizer/CodeGen/Target.cpp

Add LoongArch64 linux target specifics to Target.cpp which is similar to
RISCV-64 in D136547.

For LoongArch, a complex floating-point number, or a structure
containing just one complex floating-point number, is passed as though
it were a structure containing two floating-point reals.

Reviewed By: vzakhari

Differential Revision: https://reviews.llvm.org/D143131

[RISCV][CodeGen] Account for LMUL from VS2 for Vector Reduction Instructions

The reduction instruction's destination register LMUL is 1, but the source
register (vs2) can have a different LMUL (MF8 to M8). It's beneficial to know how
many registers are working on reduction instructions.
This patch creates a separate SchedWrite for each relevant LMUL from VS2.

Reviewed By: michaelmaitland

Differential Revision: https://reviews.llvm.org/D141565

[RISCV] Allow mismatched SmallDataLimit and use Min for conflicting values

Fix an issue about module linking with LTO.

When compiling with PIE, the small data limitation needs to be consistent with that in PIC, otherwise there will be linking errors due to conflicting values.

bar.c
```
int bar() { return 1; }
```

foo.c
```
int foo() { return 1; }
```

```
clang --target=riscv64-unknown-linux-gnu -flto -c foo.c -o foo.o -fPIE
clang --target=riscv64-unknown-linux-gnu -flto -c bar.c -o bar.o -fPIC

clang --target=riscv64-unknown-linux-gnu -flto foo.o bar.o -flto -nostdlib -v -fuse-ld=lld
```

```
ld.lld: error: linking module flags 'SmallDataLimit': IDs have conflicting values in 'bar.o' and 'ld-temp.o'
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
```

Use Min instead of Error for conflicting SmallDataLimit.
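
The difference between the two merge behaviors can be sketched as follows.
The function names and the limit values 8 and 0 are hypothetical and chosen
only to illustrate two modules that disagree.

```python
def merge_error(name, lhs, rhs):
    """'Error' behavior: differing values abort the link."""
    if lhs != rhs:
        raise ValueError(
            "linking module flags '%s': IDs have conflicting values" % name
        )
    return lhs


def merge_min(lhs, rhs):
    """'Min' behavior: the smaller value wins, so linking succeeds."""
    return min(lhs, rhs)


# Hypothetical limits recorded by two modules built with different flags.
pie_limit = 8
pic_limit = 0

merged = merge_min(pie_limit, pic_limit)
```

Taking the minimum is the conservative choice: an object is only placed in
the small data section if every linked module agreed it was small enough.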

Authored by: @joshua-arch1
Signed-off-by: xiaojing.zhang <xiaojing.zhang@xcalibyte.com>
Signed-off-by: jianxin.lai <jianxin.lai@xcalibyte.com>
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131230

Revert "[-Wunsafe-buffer-usage] Add unsafe buffer checking opt-out pragmas"

This reverts commit aef05b5dc5c566bcaa15b66c989ccb8d2841ac71.
It causes a buildbot failure: https://lab.llvm.org/buildbot/#/builders/216/builds/16879/steps/6/logs/stdio

[libc++] Implement P1328R1 constexpr std::type_info::operator==

Differential Revision: https://reviews.llvm.org/D143447

[DAGCombine] Allow scalable type dead store elimination.

Add support to allow removing a dead store for scalable types. Avoid removing
a scalable-type store in favor of a fixed-type store, since the scalable type's
size is unknown at compile time.

Differential Revision: https://reviews.llvm.org/D142100

[-Wunsafe-buffer-usage] Add unsafe buffer checking opt-out pragmas

Add a pair of clang pragmas:
- `#pragma clang unsafe_buffer_usage begin` and
- `#pragma clang unsafe_buffer_usage end`,
which specify the start and end of an (unsafe buffer checking) opt-out
region, respectively.

Behaviors of opt-out regions conform to the following rules:

- Opt-out regions may be neither nested nor overlapping. One cannot
  start an opt-out region with `... unsafe_buffer_usage begin` but never
  close it with `... unsafe_buffer_usage end`. Misuse of the pragmas
  triggers a warning.
- Warnings raised from unsafe buffer operations inside such an opt-out
  region will always be suppressed. This behavior CANNOT be changed by
  `clang diagnostic` pragmas or command-line flags.
- Warnings raised from unsafe operations outside of such opt-out
  regions may be reported on declarations inside opt-out
  regions. These warnings are NOT suppressed.
- An un-suppressed unsafe operation warning may have notes attached.
  These notes are NOT suppressed either, regardless of whether
  they are in opt-out regions.

The implementation maintains a separate sequence of location pairs
representing opt-out regions in `Preprocessor`.  The `UnsafeBufferUsage`
analyzer reads the region sequence to check if an unsafe operation is
in an opt-out region. If it is, discard the warning raised from the
operation immediately.
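
The region bookkeeping described above can be sketched as follows. This is a simplified Python model, not the `Preprocessor` code: `OptOutRegions` and its methods are hypothetical names, source locations are modeled as plain integers, and treating both region ends as inclusive is an assumption.

```python
from bisect import bisect_right


class OptOutRegions:
    """Ordered sequence of (begin, end) location pairs (toy model)."""

    def __init__(self):
        self._regions = []       # closed regions, in source order
        self._open_begin = None  # begin location of a region not yet closed

    def begin(self, loc):
        if self._open_begin is not None:
            raise ValueError("nested or overlapping opt-out region")
        self._open_begin = loc

    def end(self, loc):
        if self._open_begin is None:
            raise ValueError("'end' without a matching 'begin'")
        self._regions.append((self._open_begin, loc))
        self._open_begin = None

    def contains(self, loc):
        # Find the last region whose begin is <= loc, then check its end.
        i = bisect_right(self._regions, (loc, float("inf"))) - 1
        return i >= 0 and self._regions[i][0] <= loc <= self._regions[i][1]


regions = OptOutRegions()
regions.begin(10)
regions.end(20)
regions.begin(30)
regions.end(40)
print(regions.contains(15), regions.contains(25))
```

Because the regions cannot nest or overlap, they stay sorted in source order, which is what makes the binary search valid in this sketch.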

Reviewed by: NoQ

Differential revision: https://reviews.llvm.org/D140179

[libc++][NFC] Use _LIBCPP_HIDE_FROM_ABI everywhere in pair

[gn build] Port 692da6245d71

[-Wunsafe-buffer-usage] Filter out conflicting fix-its

Two fix-its conflict if they have overlapping source ranges. We shall
not emit conflicting fix-its. This patch checks conflicts in fix-its
generated for one variable (including variable declaration fix-its and
variable usage fix-its). If there is any, we do NOT emit any fix-it
for that variable.
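
A minimal sketch of the conflict check, written in Python rather than against Clang's fix-it classes. The function names are hypothetical, and source ranges are assumed to be non-empty half-open `(begin, end)` integer pairs.

```python
def has_conflict(ranges):
    """Return True if any two half-open (begin, end) ranges overlap."""
    ordered = sorted(ranges)
    # After sorting by begin, any overlap shows up between some adjacent pair.
    return any(prev[1] > cur[0] for prev, cur in zip(ordered, ordered[1:]))


def fixits_to_emit(fixits):
    """fixits: (begin, end, replacement) tuples generated for one variable."""
    ranges = [(begin, end) for begin, end, _ in fixits]
    # All or nothing: one conflict drops every fix-it for the variable.
    return [] if has_conflict(ranges) else list(fixits)


ok = fixits_to_emit([(0, 5, "std::span<int>"), (10, 14, "p[1]")])
dropped = fixits_to_emit([(0, 5, "std::span<int>"), (3, 8, "p[1]")])
print(len(ok), len(dropped))
```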

Reviewed by: NoQ

Differential revision: https://reviews.llvm.org/D141338

Only run the weird new try-to-read-too-much test on Darwin

I'm still getting Linux CI bot failures for this test. It's not
critical, and it depends on a failure mode that holds on Darwin;
it was always a gamble whether it would fail in the same way on
other systems.

Fix TestProcessAPI.py to only allocate sys.maxsize buffer

I hardcoded a number close to UINT64_MAX in this test case,
and Python is not able to convert it to a long on some
platforms. Use sys.maxsize instead; the hardcoded value also
would have failed if the testsuite was run on a 32-bit system.
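
The range problem can be illustrated with the standard `struct` module, used here only as a stand-in for the integer conversion done in the binding layer (an assumption; the test itself does not use `struct`). `fits_in_ssize_t` is a hypothetical helper.

```python
import struct
import sys

UINT64_MAX = 2**64 - 1


def fits_in_ssize_t(value):
    """Check whether value converts to the platform's native ssize_t."""
    try:
        struct.pack("n", value)  # "n" is the native ssize_t format code
    except (struct.error, OverflowError):
        return False
    return True


print(fits_in_ssize_t(sys.maxsize), fits_in_ssize_t(UINT64_MAX))
```

`sys.maxsize` is the largest value a `Py_ssize_t` can hold on the running interpreter, so it stays in range on both 32-bit and 64-bit builds.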

[emacs] Add llvm-mir-mode

Adds an emacs mode for .mir files. For the most part this just
consists of keyword rules for various MIR constructs and then
appending the llvm-mode keywords to that. This doesn't currently
attempt to do anything to be aware of the YAML structure or
differentiate between machine IR and embedded LLVM IR.

Add `set(CMAKE_CXX_STANDARD 17)` to MLIR CMakeLists.txt

This is only useful when building the project in a "standalone" way: that is by
invoking cmake pointing at mlir/ to build against an already built LLVM.

Fixes #60574

[mlir][memref] fix typo in documentation (NFC)

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D143532

[bazel] Fix libc

[-Wunsafe-buffer-usage] Generate fix-it for local variable declarations

Use clang fix-its to transform declarations of local variables that
are used for buffer access to be of std::span type.

We placed a few limitations to keep the solution simple:
- it only transforms local variable declarations (no parameter declaration);
- it only considers single level pointers, i.e., pointers of type T * regardless of whether T is again a pointer;
- it only transforms to std::span types (no std::array, or std::span::iterator, or ...);
- it can only transform a VarDecl that belongs to a DeclStmt that has a single child.

One of the purposes of keeping this patch simple enough is to first
evaluate if fix-it is an appropriate approach to do the
transformation.

This commit was reverted by 622be09c815266632e204eaf1c7a35f050220459
because of a compilation warning, which is now fixed.

Reviewed by: NoQ, jkorous

Differential revision: https://reviews.llvm.org/D139737

[Sanitizers] Fix read buffer overrun in scanning loader commands

The fix only affects Darwin, but to write the test I had to modify
the MemoryMappingLayout class, which is used by all OSes, to allow
mocking of the image header (this change should be NFC). Hence there is
no [Darwin] in the subject, so I can get more eyes on it.

While looking for a memory gap to put the shadow area into, the sanitizer code
scans through the loaded images, and for each image it scans through its
loader commands to determine the occupied memory ranges.

While doing so, if the 'segment load' (kLCSegment) loader command is encountered, the command scanning function
returns success (true), but does not decrement the command list iterator counter.
The result is that the function is called again and again, with the iterator counter
now being too high. The command scanner keeps updating the loader command pointer
by using the command size field.

If the loop counter is too high, the command pointer
lands in an unintended area (beyond
+sizeof(mac_header64)+header->sizeofcmds),
and the result depends on the random content found there.

The random content interpreted as loader command might contain a large integer value in the
cmdsize field - this value is added to the current loader command pointer,
which might now point to an inaccessible memory address. It can occasionally result
in a crash if it happens to run beyond the mapped memory segment.

Note that when the area after the loader command list
contains only zeros or small integers, the loop will end normally and the problem
will go unnoticed. This is why it went unnoticed until now: a large value
after the header area that happens to fall into the command size field is a pretty rare situation.

The fix makes sure that the iterator counter gets updated when the segment load (kLCSegment)
loader command is found too, and in the same code location so the updates will always go together.
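
The idea of advancing the pointer and the counter together can be sketched on a mock command list. This is a Python model, not the code in sanitizer_procmaps_mac.cpp: `scan_commands` is a hypothetical name, each command is reduced to a little-endian `(cmd, cmdsize)` pair, and the explicit bounds checks are an addition of this sketch rather than something the commit describes.

```python
import struct

LC_SEGMENT_64 = 0x19  # Mach-O load command value for a 64-bit segment
LC_SYMTAB = 0x2


def scan_commands(buf, ncmds):
    """Return offsets of segment commands among ncmds commands in buf."""
    segments = []
    offset = 0
    remaining = ncmds
    while remaining > 0:
        if offset + 8 > len(buf):
            raise ValueError("load command runs past the command area")
        cmd, cmdsize = struct.unpack_from("<II", buf, offset)
        if cmdsize < 8 or offset + cmdsize > len(buf):
            raise ValueError("bad cmdsize")
        if cmd == LC_SEGMENT_64:
            segments.append(offset)
        # Pointer and counter are updated in one place, for every command,
        # segment commands included.
        offset += cmdsize
        remaining -= 1
    return segments


# Mock command area: one 16-byte segment command, one 8-byte symtab command.
area = (struct.pack("<II", LC_SEGMENT_64, 16) + b"\0" * 8
        + struct.pack("<II", LC_SYMTAB, 8))
print(scan_commands(area, 2))
```

If the counter were one too high (`ncmds=3` here), the loop would try to read a third command past the end of the command area, which this sketch rejects instead of interpreting whatever bytes follow.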

Undo the changes in the sanitizer_procmaps_mac.cpp to see the test failing.

rdar://101161047
rdar://102819707

Differential Revision: https://reviews.llvm.org/D142164

[Clang][CMake] Set up distribution target for Clang-BOLT

Provide a way to install usable BOLT-optimized Clang
(clang + resource headers) using
`ninja clang-bolt install-distribution` with BOLT.cmake cache file
or `ninja stage2-clang-bolt stage2-install-distribution`
with BOLT-PGO.cmake cache file.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D140565

[mlir][gpu] Add support for integer types in gpu.subgroup_mma ops

The signedness is carried by `!gpu.mma_matrix` types to most closely
match the Cooperative Matrix specification which determines signedness
with the type (and sometimes the operation).

See: https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/NV/SPV_NV_cooperative_matrix.html

To handle the lowering from vector to gpu, ops such as arith.extsi are
pattern matched next to `vector.transfer_read` and `vector.contract` to
determine the signedness of the matrix type.

Enables s8 and u8 WMMA types in NVVM for the GPUToNVVM conversion.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D143223

Revert "[-Wunsafe-buffer-usage] Generate fix-it for local variable declarations"

This reverts commit a29e67614c3b7018287e5f68c57bba7618aa880e.

[mlir] Relax version requirement for PyYAML in mlir

Some Ubuntu 20.04 images come with PyYAML 5.3.1 pre-installed through distutils. This makes pip very angry. See https://github.com/yaml/pyyaml/issues/349.

Since older versions of PyYAML should work for mlir, relax the version requirement to ease developer setup.

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D143523

[NVPTX] Lower extraction of upper half of i32/i64 as partial move.

This produces better SASS than right-shift + truncate and is fairly common for
CUDA code that operates on __half2 values represented as opaque integers.
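
The equivalence behind this lowering can be checked with a little arithmetic. The Python below only models the two ways of obtaining the upper half of a 32-bit value; the function names and the sample constant are hypothetical, and it says nothing about the generated PTX or SASS.

```python
import struct


def upper_half_shift(x, bits=32):
    """Upper half via right-shift followed by truncation."""
    half = bits // 2
    return (x >> half) & ((1 << half) - 1)


def upper_half_split(x):
    """Upper half by viewing a 32-bit value as two 16-bit parts."""
    lo, hi = struct.unpack("<HH", struct.pack("<I", x))
    return hi


packed = 0x3C004000  # two 16-bit payloads packed into one 32-bit integer
print(hex(upper_half_shift(packed)), hex(upper_half_split(packed)))
```

Both functions return the same 16-bit value; the split view needs no shift, which is the property the lowering relies on.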

Differential Revision: https://reviews.llvm.org/D143448

Check if null buffer handed to SBProcess::ReadMemory

Add a check for a null destination buffer in SBProcess::ReadMemory,
and return an error if that happens. If a Python SB API script
tries to allocate a huge amount of memory, the malloc done by the
intermediate layers will fail and will hand a null pointer to
ReadMemory. lldb will eventually crash trying to write into that
buffer.

Also add a test that tries to allocate an impossibly large amount
of memory, and hopefully should result in a failed malloc and hitting
this error codepath.

Differential Revision: https://reviews.llvm.org/D143012
rdar://104846609

[VPlan] Mark load VPWidenMemoryInstruction as not having side-effects.

Also add an assert using the underlying instruction to catch any
potential violations.

[bazel] Make libc:__support_common depend on :libc_root

[FuncSpec] Prevent assertion failure when no store value is found

If the only user of the Alloca argument provided to getPromotableAlloca()
is the same as the Call argument, StoreValue is never set and results
in an assertion failure that isa<> was used on a nullptr when passed into
getCandidateConstant().

This was originally seen when trying to build SPEC 2006 416.gamess using
flang with LTO enabled.

Differential Revision: https://reviews.llvm.org/D143457