review.tizen.org Git - platform/upstream/llvm.git/log

[AArch64] Stringref'ize AArch64Subtarget constructor. NFCI

[mlir][gpu][spirv] Lower gpu reduction ops to spirv

Supports only "add" and "mul" ops for now. More ops will be added later.

Differential Revision: https://reviews.llvm.org/D140576

[CVP] Expand bound `urem`s

This kind of thing happens really frequently in LLVM's very own
shuffle combining methods, and it is even considered bad practice
to use `%` there, instead of using this expansion directly.
Though, many of the cases there have variable divisors,
so this won't help everything.

Simple case: https://alive2.llvm.org/ce/z/PjvYf-
There's an alternative expansion via `umin`:
https://alive2.llvm.org/ce/z/hWCVPb

BUT while we can transform the first expansion
into the `umin` one (e.g. for SCEV):
https://alive2.llvm.org/ce/z/iNxKmJ
... we can't go in the opposite direction.

Also, the non-`umin` expansion seems somewhat more codegen-friendly:
https://godbolt.org/z/qzjx5bqWK
https://godbolt.org/z/a7bj1axbx

There's a second variant of the precondition:
https://alive2.llvm.org/ce/z/zE6cbM
but there the numerator must be non-undef / must be frozen.
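
For illustration only, a minimal C++ sketch of the two expansions on
8-bit values. It assumes the precondition is `x u< 2*y`; the helper names
are hypothetical and this is not the CVP implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Select-based expansion. Assumed precondition: y != 0 and x < 2*y.
inline uint8_t uremViaSelect(uint8_t x, uint8_t y) {
  assert(y != 0 && x < 2 * y);
  return x >= y ? static_cast<uint8_t>(x - y) : x;
}

// umin-based expansion: when x < y, x - y wraps to a value above x,
// so the minimum picks x; otherwise it picks x - y.
inline uint8_t uremViaUMin(uint8_t x, uint8_t y) {
  return std::min<uint8_t>(x, static_cast<uint8_t>(x - y));
}

// Compare both expansions against `%` for every 8-bit input pair that
// satisfies the assumed precondition.
inline bool expansionsMatchURem() {
  for (unsigned y = 1; y < 256; ++y)
    for (unsigned x = 0; x < 256 && x < 2 * y; ++x) {
      uint8_t x8 = static_cast<uint8_t>(x);
      uint8_t y8 = static_cast<uint8_t>(y);
      uint8_t expected = static_cast<uint8_t>(x % y);
      if (uremViaSelect(x8, y8) != expected || uremViaUMin(x8, y8) != expected)
        return false;
    }
  return true;
}
```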

[NFC][CVP] `processURem()`: add statistic and increase readability

[NFC][CVP] Add tests for urem expansion

[NFC][PhaseOrdering] Re-autogenerate check lines in one test

ValueTracking: Fix canCreateUndefOrPoison for saturating shifts

These need to consider the shift amount.

[AMDGPU][AsmParser] Refine parsing cache policy modifiers.

Reviewed By: dp, arsenm

Differential Revision: https://reviews.llvm.org/D140108

[MemProf] Fix inline propagation of memprof metadata

It isn't correct to always remove memprof metadata MIBs from the
original allocation call after inlining.

Let's say we have the following partial call graph:

C     D
 \   /
  v v
   B   E
   |  /
   v v
    A

where A contains an allocation call. If both contexts including B have
the same allocation behavior, the context in the memprof metadata on the
allocation will be pruned, and we will have 2 MIBs with contexts:
A,B and A,E.

Previously, if we inlined A into B, we propagated the matching MIBs onto
the inlined allocation call in B' (A,B in this case) and removed them from
the original out-of-line allocation in A. This is correct if we have a
single round of bottom-up inlining.

However, in the compiler we can have multiple invocations of the inliner
pass (e.g. LTO). We may also inline non-bottom up with an alternative
inliner such as the ModuleInliner. In that case, we could end up first
inlining B into C, without having inlined A into B. The call graph then
looks like:

    D
    |
    v
C'  B   E
 \  |  /
  v v v
    A

If we subsequently (perhaps on a later invocation of bottom up inlining)
inline A into B, the previous handling would propagate the memprof MIB
context A,B up into the inlined allocation in B', and remove it from the
original allocation in A. The propagation into B' is fine, however, by
removing it from A's allocation, we no longer reflect the context coming
from C'.

To fix this, simply prevent the removal of MIB from the original
allocation callsites.

Note that the memprof_inline.ll test has some changes to existing
checking to replace "noncold" with "notcold" in the metadata. The
corresponding CHECK was accidentally commented out in the old version
and thus this mistake was not previously detected.

Differential Revision: https://reviews.llvm.org/D140764

[SLP] Do not emit many extractelements, reuse the single one emitted.

We do not need to emit a separate extractelement for each particular use; we
can reuse a single one and just need to adjust its position so that it
dominates all uses.

Differential Revision: https://reviews.llvm.org/D140580

[InstSimplify] fold selects where true/false arm is the same as condition

We managed to fold related patterns in issue #59704,
but we were missing these more basic folds:
https://alive2.llvm.org/ce/z/y6d7SN
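
For illustration only, a small C++ truth-table sketch of the folds. The
two fold shapes in the comments are an assumption based on the title
(the linked proof is authoritative), poison is not modelled, and the
helper names are hypothetical.

```cpp
#include <cassert>
#include <initializer_list>

// Model `select i1 C, i1 T, i1 F` on plain booleans.
inline bool selectBool(bool c, bool t, bool f) { return c ? t : f; }

// Check the assumed fold shapes for both values of X.
inline bool selectFoldsHold() {
  for (bool x : {false, true}) {
    if (selectBool(x, x, false) != x) // select X, X, false --> X
      return false;
    if (selectBool(x, true, x) != x)  // select X, true, X --> X
      return false;
  }
  return true;
}
```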

[InstSimplify] add tests for select-of-bool; NFC

IROutliner: Convert tests to opaque pointers

Some of these show improvements. outlining-bitcasts.ll might not be
relevant anymore (or should be rewritten to test some other type of
non-pointer bitcast).

[AMDGPU][GFX11] Correct tied src2 of v_fmac_f16_e64

src2 was incorrectly defined as VSrc_f16, but it is tied to dst, which is VGPR_32. As a result, the disassembler failed to decode src2.

Differential Revision: https://reviews.llvm.org/D140299

AMDGPU: Use default attributes on image dim intrinsics

These were missing nocallback and willreturn

[AMDGPU][MC][GFX11] Correct encoding of neg modifier for v_dot2_f32_bf16

Fix a bug with neg_lo:[0,1,0] and neg_hi:[0,1,0] modifiers - they are accepted but not encoded.

Differential Revision: https://reviews.llvm.org/D140470

[NFC][IR] Remove unused assignment to Offset

This value is overwritten anyway, so let's remove it

[ScheduleDAG] Support REG_SEQUENCE unscheduling

A REG_SEQUENCE node requires special treatment during
unscheduling because the node is untyped and neither its
class nor its cost can be retrieved the same way as for
typed nodes.

Related issue: https://github.com/llvm/llvm-project/issues/58911

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D138837

[mlir][Arith] Fold integer shift op with zero.

This revision folds arith.shrui, arith.shrsi and arith.shli with zero
rhs to lhs.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D140749

[mlir][Arith] Remove redundant definition, NFC.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D140774

[Flang] Add ppc64 support to Optimizer/CodeGen/Target.cpp for AIX 64 bit

Adding support for ppc64 (big endian) in order to support flang on 64 bit AIX

Reviewed By: clementval, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D138390

[mlir] Simplify a test for vectorizing tensor.extract

Remove unused arguments and the corresponding logic (e.g. affine maps).

Differential Revision: https://reviews.llvm.org/D140755

[CodeGen] Temporarily disable-lsr in HWASAN build

HWASAN exposes some non-determinism in the pass and triggers:
ScalarEvolution.cpp:11540: bool llvm::ScalarEvolution::isLoopEntryGuardedByCond(const Loop *, ICmpInst::Predicate, const SCEV *, const SCEV *): Assertion `isAvailableAtLoopEntry(LHS, L) && "LHS is not available at Loop Entry"' failed.

E.g.
https://lab.llvm.org/buildbot/#/builders/236/builds/1629/steps/16/logs/stdio
is broken after D137838. I tried to split D137838 into smaller patches
and the one which reproduced was just a move of cpp from one dir to another.

Maybe it has something to do with the comparison of tagged pointers and
the PtrSets used in the pass.

The issue is hard to reproduce; even slight changes in the path, or preprocessing
the cpp file, hide it.

[clang][dataflow] Fix crash when having boolean-to-integral casts.

Since now we just ignore all (implicit) integral casts, treating the
resulting value as the same as the underlying value, it could cause
inconsistency between values after `Join` if in some paths the type
doesn't strictly match. This could cause intermittent crashes.

  std::optional<bool> o;
  int x;
  if (o.has_value()) {
    x = o.value();
  }

Fixes: https://github.com/llvm/llvm-project/issues/59728

Signed-off-by: Jun Zhang <jun@junz.org>
Differential Revision: https://reviews.llvm.org/D140753

[Bazel] Exclude lib/Headers/openmp_wrappers/stdlib.h out of builtin_headers

It has been there since llvmorg-16-init-14999-g07ff3c5ccce6

[MLIR][Arith][NFC] Use the interface of 'getElementTypeOrSelf' to get the resType

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D140608

[RISCV] Use SUB instead of XOR in lowerShiftLeftParts/lowerShiftRightParts.

isel is now capable of turning the SUB into XOR for shift amounts.
Though it uses NOT instead of XOR with ShiftSize-1.

By using SUB during lowering we enable more DAG combines with
other arithmetic on the shift amount.

[RISCV] RISCVDAGToDAGISel::selectShiftMask to shift by (sub size-1, X).

If the shift amount is (sub C, X) where C is -1 modulo the size of
the shift, we can replace the sub with a NOT.

We could also use XORI X, size-1, but NOT would work better with
c.not from the future Zce extension.
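
For illustration only, a C++ sketch of why the replacement is sound on a
64-bit shift, assuming the shift reads only the low 6 bits of its amount
(as RV64 register shifts do). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Model a 64-bit shift that only reads the low 6 bits of the amount.
inline uint64_t shiftLeft64(uint64_t value, uint64_t amount) {
  return value << (amount & 63);
}

// For every C that is -1 modulo 64, shifting by (C - X) matches shifting
// by NOT X; XOR with 63 gives the same low 6 bits as well.
inline bool subMatchesNot() {
  const uint64_t value = 0x0123456789abcdefULL;
  for (uint64_t x = 0; x < 256; ++x) {
    for (uint64_t c : {63ULL, 127ULL, ~0ULL})
      if (shiftLeft64(value, c - x) != shiftLeft64(value, ~x))
        return false;
    if (shiftLeft64(value, x ^ 63) != shiftLeft64(value, ~x))
      return false;
  }
  return true;
}
```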

AMDGPU/clang: Remove target features from address space test builtins

It turns out we can codegen these on targets without flat addressing,
although the runtime probably didn't put anything useful there. The
proper diagnostic would be to disallow flat pointer uses or languages
with them, not this one edge case. Allows removing one of the special
cases requiring subtarget support in the device libraries.

[mlir][spirv] Fail vector.bitcast conversion with different bitwidth

Depending on the target environment, we may need to emulate certain
types, which can cause issue with bitcast.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D140437

libclc: Add parentheses to silence warning

Fixes #59209

DAG: Prevent store value forwarding to distinct addrspace load

DAGCombiner replaces (load const_addr1) directly chained with (store
(val, const_addr2)) with val if address space stripped const_addr1 ==
const_addr2. The patch fixes the issue by checking address spaces as
well. However, it might make sense not to chain together side
effects that belong to different address spaces in the first place and
make SelectionDAG::root address space aware.

[RISCV] Teach RISCVDAGToDAGISel::selectShiftMask to bypass adds with constant.

If the shift amount is (add X, C) where C is 0 modulo the size of
the shift, we can bypass the add.

Similar to other targets like AArch64 and X86.
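
For illustration only, a C++ sketch of the bypass on a 32-bit shift,
assuming the shift reads only the low 5 bits of its amount. The helper
names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Model a 32-bit logical right shift that only reads the low 5 bits
// of the amount.
inline uint32_t shiftRight32(uint32_t value, uint32_t amount) {
  return value >> (amount & 31);
}

// Adding a constant that is 0 modulo 32 does not change the shift result.
inline bool addOfMultipleIsBypassed() {
  const uint32_t value = 0x89abcdefu;
  for (uint32_t x = 0; x < 128; ++x)
    for (uint32_t c : {32u, 64u, 96u, 0xFFFFFFE0u})
      if (shiftRight32(value, x + c) != shiftRight32(value, x))
        return false;
  return true;
}
```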

AMDGPU/clang: Add builtins for llvm.amdgcn.ballot

Use explicit _w32/_w64 suffixes for the wave size to be consistent
with the existing other wave dependent intrinsics. Also start
diagnosing trying to use both wave32 and wave64.

I would have preferred to avoid the +wavefrontsize64 spam on targets
where that's the only option, but avoiding this seems to be more work
than I expected.

[NFC][Codegen][X86] zero_extend_vector_inreg.ll: add SSE4.2 runline

[DAGCombiner] Try to partition ISD::EXTRACT_VECTOR_ELT to accommodate its ISD::BUILD_VECTOR users

This mainly cleans up a few patterns that are legalized by scalarization
from a wide-element vector, but then are further split apart to build
a more narrow-sized-element vector. In particular this happens in some
cases for illegal ISD::ZERO_EXTEND_VECTOR_INREG.

Given an ISD::EXTRACT_VECTOR_ELT, which is a glorified bit sequence extract,
recursively analyse all of its users and try to model them as
bit sequence extractions. If all of them agree on the new, narrower element
type, and all of them can be modelled as ISD::EXTRACT_VECTOR_ELT's of that
new element type, do that, but only if unmodelled users are ISD::BUILD_VECTOR.

[TargetLowering] Teach BuildUDIV to take advantage of leading zeros in the dividend.

If the dividend has leading zeros, we can use them to reduce the
size of the multiplier and avoid the fixup cases.

This patch is for scalars only, but we might be able to do this
for vectors in a follow up.

Differential Revision: https://reviews.llvm.org/D140750

[instrprof] Fix issue in binary-ids-padding.test

https://reviews.llvm.org/D135929 caused a failure in
binary-ids-padding.test in big endian configurations:
https://lab.llvm.org/buildbot/#/builders/231/builds/6709

binary-ids-padding.test writes the profile in little-endian format.
This patch changes the raw profile reader to use getDataEndianness()
instead of llvm::support::endian::system_endianness() to fix the issue.
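
For illustration only, a C++ sketch of decoding with the byte order of
the data rather than of the host. `Endian` and `readU64` are hypothetical
names, not the profile reader's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Endian { Little, Big };

// Decode a 64-bit value using the byte order recorded for the data,
// independent of the byte order of the machine running the reader.
inline uint64_t readU64(const unsigned char *bytes, Endian dataOrder) {
  assert(bytes != nullptr);
  uint64_t value = 0;
  for (std::size_t i = 0; i < 8; ++i) {
    // Visit bytes from most significant to least significant.
    std::size_t idx = dataOrder == Endian::Little ? 7 - i : i;
    value = (value << 8) | bytes[idx];
  }
  return value;
}
```

A reader that decoded the same bytes with the host byte order instead
would get a different value on a big endian host for a little-endian
profile.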

Apply clang-tidy fixes for performance-for-range-copy in Vectorization.cpp (NFC)

Apply clang-tidy fixes for readability-identifier-naming in TestDialect.cpp (NFC)

[clang] Use try_emplace instead of insert when getting new identifier

This is both less verbose and slightly faster, according to:

https://llvm-compile-time-tracker.com/compare.php?from=d9ab3e82f30d646deff054230b0c742704a1cf26&to=73405077ad913f634797ffc7a7bbb110ac9cae99&stat=instructions:u

No functional change intended :-)

[mlir] Add constBuilderCall to DictionaryAttr

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D140740

[mlir][sparse] layout fixes (NFC)

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D140761

[ProfileData] Fix msan -fsanitize-memory-param-retval after D135929

test/tools/llvm-cov/load-multiple-objects.test calls
IndexedInstrProfReader::readBinaryIds with uninitialized BinaryIdsStart.

[llvm][AsmPrinter][NFC] Cleanup `GCMetadataPrinters` field

The field is currently `void*`, which was originally chosen in 2010 to avoid including `DenseMap`. Since then, `DenseMap` has been included in the header file anyway, so there is no more need for the indirection via `void*`, and the cruft around it can be removed.

Differential Revision: https://reviews.llvm.org/D140758

[InstCombine] avoid miscompile in sinkNotIntoLogicalOp()

Fixes #59704

[InstCombine] add test for miscompile from sinkNotIntoLogicalOp(); NFC

issue #59704

[SLP] Fix debug print for cost in tryToVectorizeList - NFC.

Actual VF was confused with local variable named "VF".

[BPF] Use SectionForGlobal() for section names computation in BTF

Use function TargetLoweringObjectFile::SectionForGlobal() to compute
section names for globals described in BTF_KIND_DATASEC records.

This fixes a discrepancy in section name computation between
BTFDebug::processGlobals and the rest of the LLVM pipeline.

Specifically, the following example illustrates the discrepancy
before this commit:

  struct Foo {
    int i;
  } __attribute__((aligned(16)));
  struct Foo foo = { 0 };

The initializer for 'foo' looks as follows:

  %struct.Foo { i32 0, [12 x i8] undef }

TargetLoweringObjectFile::SectionForGlobal() classifies 'foo' as
a part of '.bss' section, while BTFDebug::processGlobals
classified it as a part of '.data' section because of the
following expression:

  SecName = Global.getInitializer()->isZeroValue() ? ".bss" : ".data"

The isZeroValue() returns false because of the undef tail of the
initializer, while SectionForGlobal() allows such patterns in '.bss'.

Differential Revision: https://reviews.llvm.org/D140505

[SLP] A couple of minor improvements for slp graph view - NFC.

Show ScatterVectorize nodes in frames of blue color
and print vectorize tree indices.

[profile] Add binary ids into indexed profiles

This patch adds support for including binary ids in an indexed profile.
It adds a new field into the header that points to the offset of the
binary id section. The binary id section consists of a size of the
section, and a list of binary ids (if they are present) that consist
of two parts: length and data.

This patch guarantees that indexed profile is backwards compatible
after adding binary ids.

Differential Revision: https://reviews.llvm.org/D135929

[test] Fix dfsan/stack_trace.c

[Support] Fix what I think is an off by 1 bug in UnsignedDivisionByConstantInfo.

The code in Hacker's Delight says
`nc = -1 - (-d)%d;`

But we have
`NC = AllOnes - (AllOnes-D)%D`

The Hacker's Delight code is written for the LeadingZeros==0 case.
`AllOnes - D` is not the same as `-d` from Hacker's Delight.

This patch changes the code to
`NC = AllOnes - (AllOnes+1-D)%D`

This will increment AllOnes to 0 in the LeadingZeros==0 case. This
will make it equivalent to -D. I believe this is also correct for
LeadingZeros>0.

At least for i8, i16, and i32 the only divisor that changes is
((1 << (BitWidth-1)) | 1). Or 129 for i8, 32769 for i16, and 2147483649
for i32. These are all large enough that the quotient is 0 or 1 so
InstCombine replaces them with an icmp and zext before SelectionDAG.
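
For illustration only, a C++ sketch of the three formulas for an 8-bit
type in the LeadingZeros==0 case. The function names are hypothetical;
this is not the UnsignedDivisionByConstantInfo code.

```cpp
#include <cassert>
#include <cstdint>

// Hacker's Delight: nc = -1 - (-d) % d, in wrapping 8-bit arithmetic.
inline uint8_t ncHackersDelight(uint8_t d) {
  assert(d != 0);
  uint8_t negD = static_cast<uint8_t>(0u - d);
  return static_cast<uint8_t>(0xFFu - negD % d);
}

// Old code: NC = AllOnes - (AllOnes - D) % D.
inline uint8_t ncOld(uint8_t d) {
  assert(d != 0);
  return static_cast<uint8_t>(0xFFu - (0xFFu - d) % d);
}

// New code: NC = AllOnes - (AllOnes + 1 - D) % D, where AllOnes + 1
// wraps to 0 in 8 bits.
inline uint8_t ncNew(uint8_t d) {
  assert(d != 0);
  uint8_t allOnesPlusOne = static_cast<uint8_t>(0xFFu + 1u);
  uint8_t diff = static_cast<uint8_t>(allOnesPlusOne - d);
  return static_cast<uint8_t>(0xFFu - diff % d);
}

// The new formula agrees with Hacker's Delight for every non-zero
// divisor, and nc is congruent to d - 1 modulo d.
inline bool newFormulaMatchesHackersDelight() {
  for (unsigned d = 1; d < 256; ++d) {
    uint8_t d8 = static_cast<uint8_t>(d);
    if (ncNew(d8) != ncHackersDelight(d8))
      return false;
    if (ncNew(d8) % d8 != d8 - 1)
      return false;
  }
  return true;
}
```

For example, with d = 3 the old formula gives 255 while the other two
give 254.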

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D140636

[flang] Restore checking for some optional values before use

Recent commits (2098ad7f00324ee0f2a6538f418a6f81dfdd2edb and
15a9a72ee68166c0cff3f036cacd3c82be66c729) replaced usage of "o.value()"
on optionals with "*o". Those optional values are expected to be
present -- but now, if it ever turns out that they're not,
compilation will proceed with garbage data rather than crashing
immediately (and more debuggably) with an uncaught exception.

Add asserts for presence to restore the previous level of safety.
(I could have reverted these patches so as to resume use of .value(),
but I didn't want to just have them get broken again.)
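
For illustration only, a C++ sketch of the pattern: assert presence, then
dereference. `checkedDeref` and `valueThrowsWhenEmpty` are hypothetical
helpers, not code from flang.

```cpp
#include <cassert>
#include <optional>

// Checked access: assert that the optional holds a value before using `*o`.
template <typename T> const T &checkedDeref(const std::optional<T> &o) {
  assert(o.has_value() && "optional expected to hold a value");
  return *o;
}

// Returns true if `.value()` on an empty optional throws
// std::bad_optional_access, a check that plain `*o` does not perform.
inline bool valueThrowsWhenEmpty() {
  std::optional<int> empty;
  try {
    (void)empty.value();
  } catch (const std::bad_optional_access &) {
    return true;
  }
  return false;
}
```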

Differential Revision: https://reviews.llvm.org/D140340

[InstSimplify] fold exact divide to poison if it is known to not divide evenly

This is related to the discussion in D140665. I was looking over the demanded
bits implementation in IR and noticed that we just bail out of a potential
fold if a udiv is exact:
https://github.com/llvm/llvm-project/blob/82be8a1d2b00f6e89096b86f670a8be894c7b9e6/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L799

Also, see tests added with 7f0c11509e8f.

Then, I saw that we could lose a fold to poison if we zap the exact with that
transform, so this patch tries to catch that as a preliminary step.

Alive2 proofs:
https://alive2.llvm.org/ce/z/zCjKM7
https://alive2.llvm.org/ce/z/-tz_RK (trailing zeros must be "less-than")
https://alive2.llvm.org/ce/z/c9CMsJ (general proof and specific example)
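
For illustration only, a C++ sketch of the arithmetic fact behind the
"less-than" note, assuming the fold keys off trailing zeros: if the
dividend has fewer trailing zeros than the divisor, the division cannot
be exact. The helper names are hypothetical and known-bits analysis is
not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Count trailing zero bits of an 8-bit value; returns 8 for zero.
inline unsigned countTrailingZeros8(uint8_t v) {
  unsigned n = 0;
  while (n < 8 && ((v >> n) & 1u) == 0)
    ++n;
  return n;
}

// Exhaustive 8-bit check: whenever x has fewer trailing zeros than y,
// x % y is non-zero, so an `exact` udiv of x by y would be poison.
inline bool fewerTrailingZerosImpliesInexact() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 1; y < 256; ++y) {
      unsigned tzX = countTrailingZeros8(static_cast<uint8_t>(x));
      unsigned tzY = countTrailingZeros8(static_cast<uint8_t>(y));
      if (tzX < tzY && x % y == 0)
        return false;
    }
  return true;
}
```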

Differential Revision: https://reviews.llvm.org/D140733

Detemplate llvm::EmitGEPOffset and move it into a cpp file. NFC.

[MC] [llvm-ml] Add support for the extrn keyword

It is the same as the already supported `extern` keyword.
https://learn.microsoft.com/en-us/cpp/assembler/masm/extrn?view=msvc-170

Fixes: https://github.com/llvm/llvm-project/issues/59712

Reviewed By: epastor

Differential Revision: https://reviews.llvm.org/D140679

[InstSimplify] fix formatting and add bool function argument comments; NFC

Make existing code conform with proposed additions in D140733.

[RISCV] Add fmin/fmax scalar instructions to isAssociativeAndCommutative

Follow-up patch of D140530.

We can add FMIN, FMAX to isAssociativeAndCommutative to
increase instruction-level parallelism by the existing MachineCombiner
pass.

Differential Revision: https://reviews.llvm.org/D140602

[RISCV] Add integer scalar instructions to isAssociativeAndCommutative

Inspired by D138107.

We can add ADD, AND, OR, XOR, MUL, MIN[U]/MAX[U] to isAssociativeAndCommutative
to increase instruction-level parallelism by the existing MachineCombiner pass.
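
For illustration only, a C++ sketch of the kind of reassociation that an
associative and commutative operation permits. The function names are
hypothetical; the actual rewriting happens on machine instructions.

```cpp
#include <cassert>
#include <cstdint>

// Serial chain: every add depends on the previous one (depth 3).
inline uint64_t sumSerial(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
  return ((a + b) + c) + d;
}

// Reassociated: a + b and c + d are independent and can issue in
// parallel (depth 2). Wrapping unsigned addition keeps the result equal.
inline uint64_t sumBalanced(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
  return (a + b) + (c + d);
}
```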

Differential Revision: https://reviews.llvm.org/D140530

[mlir] NFC: work around gcc-aarch64 v8.3 compilation issue in getRegionBranchSuccessorOperands implementation.

https://reviews.llvm.org/rG25671db3d343 didn't quite do it because the underlying issue was that the specific compiler chokes on the second standard conversion sequence after the user-defined `MutableOperandRange::operator OperandRange() const` conversion (see https://en.cppreference.com/w/cpp/language/implicit_conversion).

[M68k] Define __GCC_HAVE_SYNC_COMPARE_AND_SWAP macros

Define __GCC_HAVE_SYNC_COMPARE_AND_SWAP macros

Fixes #58974

Reviewed By: myhsu, glaubitz, 0x59616e

Differential Revision: https://reviews.llvm.org/D140695

[mlir] Add a newline character in the Linalg debug macro

Differential Revision: https://reviews.llvm.org/D140752

[InstCombine] Fold (X << Z) / (X * Y) -> (1 << Z) / Y

Alive2: https://alive2.llvm.org/ce/z/CBJLeP
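
For illustration only, an exhaustive 8-bit C++ check of the unsigned
form of this fold. It assumes that neither the shift nor the multiply
wraps and that the division is unsigned; the exact preconditions are
those in the linked proof. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Check (x << z) / (x * y) == (1 << z) / y for all 8-bit inputs where
// neither x << z nor x * y wraps, with x and y non-zero.
inline bool foldHoldsWithoutWrap() {
  for (unsigned x = 1; x < 256; ++x)
    for (unsigned y = 1; y < 256; ++y)
      for (unsigned z = 0; z < 8; ++z) {
        unsigned shl = x << z;
        unsigned mul = x * y;
        if (shl > 0xFF || mul > 0xFF)
          continue; // would wrap in 8 bits, outside the assumed precondition
        if (shl / mul != (1u << z) / y)
          return false;
      }
  return true;
}
```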

Fix build of nvptx-arch with CLANG_LINK_CLANG_DYLIB

The function clang_target_link_libraries must only be used with real
Clang libraries; with CLANG_LINK_CLANG_DYLIB, it will instead link in
clang-cpp. We must use the standard CMake target_link_libraries for
the CUDA library.

[RISCV] Add Svpbmt extension support.

Spec of Svpbmt: https://github.com/riscv/riscv-isa-manual/blob/master/src/supervisor.tex#L2399

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D140692

[RISCV] Add SH1ADD/SH2ADD/SH3ADD to RISCVDAGToDAGISel::hasAllNBitUsers.

[Clang][RISCV] Use poison instead of undef

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D140687

[BOLT] Respect -function-order in lite mode

Process functions listed in -function-order file even in lite mode.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D140435

[RISCV] Prefer ADDI over ORI if the known bits are disjoint.

There is no compressed form of ORI but there is a compressed form
for ADDI.

This also works for XORI since DAGCombine will turn Xor with disjoint
bits into Or.

Note: The compressed forms require a simm6 immediate, but I'm doing
this for the full simm12 range.
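
For illustration only, an exhaustive 8-bit C++ check of the identity this
relies on: when no bit is set in both operands, OR, XOR and ADD agree.
The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// When x and c share no set bits, no carries occur, so
// x | c == x ^ c == x + c.
inline bool disjointOrEqualsAddAndXor() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned c = 0; c < 256; ++c) {
      if ((x & c) != 0)
        continue; // bits overlap, the identity is not claimed
      if ((x | c) != x + c || (x ^ c) != (x | c))
        return false;
    }
  return true;
}
```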

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D140674

[DFSan] Add `zeroext` attribute for callbacks with 8bit shadow variable arguments

Add the `zeroext` attribute to the first parameter (the 8-bit shadow
variable argument) of the callbacks below, to conform to the ABI calling
conventions of many platforms and to some compiler behavior:
- __dfsan_load_callback
- __dfsan_store_callback
- __dfsan_cmp_callback
- __dfsan_conditional_callback
- __dfsan_conditional_callback_origin
- __dfsan_reaches_function_callback
- __dfsan_reaches_function_callback_origin

The type of these callbacks' first parameter is u8 (see the
definition of `dfsan_label`). First, the ABIs of many platforms
require that unsigned integer data types (except unsigned int)
be zero-extended when stored in a general-purpose register.
Second, compiler optimizations may assume that the arguments
are zero-extended and misbehave if they are not, e.g. by
using an `i8` argument to index into a jump table. If the
argument has non-zero high bits, the output executable may
crash at run-time. So we need to add the `zeroext` attribute
when declaring and calling them.

Reviewed By: browneee, MaskRay

Differential Revision: https://reviews.llvm.org/D140689

[XRay] Unsupport version<2 sled entry

For many features we expect clang and compiler-rt to have a version lock
relation, yet for XRaySledEntry we have kept version<2 compatibility for more
than 2 years (I migrated away the last user mips in 2020-09 (D87977)).
I think it's fair to call an end to version<2 now. This should discourage more
work on version<2 (e.g. D140725).

Reviewed By: ianlevesque

Differential Revision: https://reviews.llvm.org/D140739

Revert "[MLIR][Arith] Remove unused assertions"

This reverts commit 50e6c306b1cb03fe398aebc41d1bef5b6c9d9bb0.

[NFC][Codegen][X86] Add exhaustive-ish test coverage for ZERO_EXTEND_VECTOR_INREG

It should be possible to deduplicate AVX2 and AVX512F checklines,
but I'm not sure which combination of check prefixes would do that.

https://godbolt.org/z/sndT9n1nz

[mlir][py] Add StrAttr convenience builder.

[dfsan][test] Replace REQUIRES: x86_64-target-arch with lit.cfg.py check

Make it easier to support a new architecture.

Reviewed By: #sanitizers, vitalybuka

Differential Revision: https://reviews.llvm.org/D140744

[RISCV] Fix mistakes in fixed-vectors-vreductions-mask.ll command lines. NFC

There were 4 RUN lines, but only 2 of them were unique. I believe
we were trying to test LMUL=1 and LMUL=8 with riscv32 and riscv64.
But we put riscv32 on both LMUL=1 lines and riscv64 on both LMUL=8 lines.

[RISCV] Add RISCV::XORI to RISCVDAGToDAGISel::hasAllNBitUsers.

[Clang] Move AMDGPU IAS enabling to Generic_GCC::IsIntegratedAssemblerDefault, NFC

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D140657

Apply clang-tidy fixes for readability-identifier-naming in InferTypeOpInterface.cpp (NFC)

Apply clang-tidy fixes for readability-simplify-boolean-expr in BufferizableOpInterfaceImpl.cpp (NFC)

[RISCV] Support SRLI in hasAllNBitUsers.

We can recursively look through SRLI if the shift amount is less
than the demanded bits. We can reduce the demanded bit count by
the shift amount and check the users of the SRLI.
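
For illustration only, a C++ sketch of the bit-tracking argument: if the
users of a logical right shift by `s` read only its low `n - s` bits,
then only the low `n` bits of the shift's input matter. The helper names
are hypothetical; this is not the hasAllNBitUsers code.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the low n bits set (n < 32).
inline uint32_t lowMask(unsigned n) {
  assert(n < 32);
  return (uint32_t{1} << n) - 1;
}

// For every shift amount s < n, two inputs that agree on their low n
// bits produce shift results that agree on their low n - s bits.
inline bool srliOnlyNeedsLowNBits() {
  for (unsigned n = 1; n <= 12; ++n)
    for (unsigned s = 0; s < n; ++s)
      for (uint32_t x = 0; x < 4096; ++x) {
        uint32_t clean = x & lowMask(n);         // high bits cleared
        uint32_t dirty = clean | ~lowMask(n);    // high bits all set
        if (((clean >> s) & lowMask(n - s)) != ((dirty >> s) & lowMask(n - s)))
          return false;
      }
  return true;
}
```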

[RISCV] Refactor RISCV::hasAllWUsers to hasAllNBitUsers similar to RISCVISelDAGToDAG's version. NFC

Move to RISCVInstrInfo since we need RISCVSubtarget now.

Instead of asking if only the lower 32 bits are used we can now
ask if the lower N bits are used. This will be needed by a future
patch.

CodingStandards: restrict CamelCase variable names guideline to llvm/clang/clang-tools-extra/polly/bolt

See https://discourse.llvm.org/t/top-level-clang-tidy-options-and-variablename-suggestion-on-codingstandards/58783 ,
the CamelCase variable names guideline does not reflect the truth:
flang, libc, libclc, libcxx, libcxxabi, libunwind, lld, mlir, openmp,
and pstl use camelCase. lldb uses snake_case.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D140585

[MLIR][Affine] Make fusion helper check method significantly more efficient

The `hasDependencePath` method in affine fusion is quite inefficient as
it does a DFS on the complete graph for what is a small part of the
checks before fusion can be performed. Make this efficient by using the
fact that the nodes involved are all at the top-level of the same block.
With this change, for large graphs with about 10,000 nodes, the check
runs in a few seconds instead of not terminating even in a few hours.

This is NFC from a functionality standpoint; it only leads to an
improvement in pass running time on large IR.

Differential Revision: https://reviews.llvm.org/D140522

[XRay] Fix Hexagon sled version

D113638 emitted version 0 for XRaySledEntry, which will lead to an incorrect
address computation in the runtime.

While here, improve the test.

[OpenMP][JIT] Fixed a couple of issues in the initial implementation of JIT

This patch fixes a couple of issues:
1. Instead of using `llvm_unreachable` in the base virtual functions, an unknown
value is now returned. The previous approach could cause a runtime error on
targets where the image is not compatible but JIT is not implemented.
2. Fixed the typo in CMake that caused the `Target` CMake variable to be undefined.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D140732

[RISCV] Add const qualifiers to some function arguments. NFC

[X86] Emit RIP-relative access to local function in PIC medium code model

Currently, the medium code model for x86_64 emits position-dependent relocations (R_X86_64_64) for local functions, regardless of PIC or no-PIC mode. (This means generically that code compiled with the medium model cannot be linked into a position-independent executable.)

Example:

```
static int g(int n) {
  return 2 * n + 3;
}

void f(int(**p)(int)) {
  *p = g;
}
```

This results in:

```
Disassembly of section .text:

0000000000000000 <f>:
       0: 48 b8 00 00 00 00 00 00 00 00 movabs rax, 0x0
       a: 48 89 07                      mov qword ptr [rdi], rax
       d: c3                            ret
```

```
Relocation section '.rela.text' at offset 0xf0 contains 1 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000002  0000000200000001 R_X86_64_64            0000000000000000 .text + 10
```

This patch changes the behaviour to unconditionally emit a RIP-relative access, both in PIC and non-PIC mode. This fixes PIC mode, and is perhaps an improvement in non-PIC mode, too, since it results in a shorter instruction. A 32-bit relocation should suffice since the medium memory model demands that all code fit within 2GiB.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D140593

[InstSimplify] add tests for div exact; NFC

[InstCombine] add tests for udiv-by-constant demanded bits; NFC

[libc++][CI] Improves clang-(tidy|query) selection.

Hardcode the version of the tools used in the test feature script
instead of the tests. By changing the hard-coded location it's
easier to make the location flexible in the future.

Drive-by change
- The minimum required version for clang-query is now 15, which matches
  our future idea as outlined in the Dockerfile.
- The minimum required version for clang-tidy is now 16, which enables
  the new clang-tidy ADL plugin. This plugin is disabled for C++03
  due to false positives when using `noexcept`, which is not an operator
  in C++03.

Reviewed By: ldionne, #libc

Differential Revision: https://reviews.llvm.org/D139545

[lld] Fix iwyu problems after 83d59e05b201760e3f364ff6316301d347cbad95

The commit transitively includes lld/include/lld/Common/ErrorHandler.h into
lld/include/lld/Common/Driver.h, which is not intended.

[NVPTX] Emit .noreturn directive

Differential Revision: https://reviews.llvm.org/D140238

Handle simple diamond CFG hoisting in DivRemPairs.

Previously, we only handled triangle CFGs. This patch expands that
to support diamonds, where the div and rem appear in the then/else
sides of a condition. In that case, we can hoist the div into the
shared predecessor.

This could be generalized further to use nearest common ancestors,
but some of the conditions for hoisting would then require
post-dominator information.
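
For illustration only, a source-level C++ sketch of the diamond shape and
of the hoisted form. Rebuilding the remainder from the quotient is an
assumption about what a target without a remainder instruction would do;
the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Before: the div and the rem sit in opposite arms of the diamond.
inline uint32_t diamondBefore(bool cond, uint32_t a, uint32_t b) {
  assert(b != 0);
  if (cond)
    return a / b;
  else
    return a % b;
}

// After (sketch): the div is hoisted into the shared predecessor and the
// else arm derives the remainder from the quotient.
inline uint32_t diamondAfter(bool cond, uint32_t a, uint32_t b) {
  assert(b != 0);
  uint32_t q = a / b; // hoisted division
  if (cond)
    return q;
  return a - q * b; // equals a % b
}
```

Both arms already divide by `b`, so the hoisted division does not
introduce a division by zero that was not already present on every path.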

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D140647

[AArch64] Fix AArch64TargetParser.def includes for standalone builds.

[NFC][libc++] Replaces tabs by spaces.

[test] Exclude //llvm/unittests:llvm_exegesis_tests due to buildkite environment.

Buildkite does not allow user perf monitoring and fails: https://buildkite.com/llvm-project/upstream-bazel/builds/49579.

```
[ RUN ] PerfHelperTest.FunctionalTest
Unable to open event. ERRNO: Permission denied. Make sure your kernel allows user space perf monitoring.
You may want to try:
$ sudo sh -c 'echo -1 > /proc/sys/kernel/perf_event_paranoid'
llvm_exegesis_tests: external/llvm-project/llvm/tools/llvm-exegesis/lib/PerfHelper.cpp:111: llvm::exegesis::pfm::Counter::Counter(llvm::exegesis::pfm::PerfEvent &&): Assertion `FileDescriptor != -1 && "Unable to open event"' failed.
```

[mlir][sparse] Use DLT in the mangled function names for insertion.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D140484

[bazel] Restore libpfm as a conditional dependency for exegesis.

We used to have `pfm` built into exegesis, although since it's an external dependency we marked it as a manual target. Because of this we didn't have buildbot coverage and so we removed it in D134510 after we had a few breakages that weren't caught. This adds it back, but with three possible states similar to the story with `mpfr`, i.e. it can either be disabled, built from external sources (git/make), or use whatever `-lpfm` is installed on the system.

This change is modeled after D119547. Like that patch, the default is off (matching the status quo), but unlike that patch we don't enable it for CI because IIRC we don't have the package installed there, and building from source might be expensive. We could enable it later either after installing it on buildbot machines or by measuring build cost and deeming it OK.

Reviewed By: GMNGeoffrey

Differential Revision: https://reviews.llvm.org/D138470