review.tizen.org Git - platform/upstream/llvm.git/log

[MLIR][TOSA] Switch Tosa to DenseArrayAttr

This diff completes switching Tosa to DenseArrayAttr.

Test plan: ninja check-mlir check-all

Differential revision: https://reviews.llvm.org/D141111

[Fix][-Wunsafe-buffer-usage] Add a new `forEachDescendant` matcher that skips callable declarations

The original patch included a `new` expression without a matching
`delete`, causing Sanitizer warnings in
https://lab.llvm.org/buildbot/#/builders/5/builds/30522/steps/13/logs/stdio.

This commit fixes that.

Differential Revision: https://reviews.llvm.org/D138329

AMDGPU: Try to fix 32-bit build bot

[ubsan][test] Fix typo in D139230

Fix "runtime runtime error" -> "runtime error"

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D140321

AMDGPU: Use BinaryByteStream in printf expansion

Attempt to fix test failures on big endian bots. This pass definitely
needs more test coverage.

AMDGPU: Add additional printf string tests

Test various inputs passed to %s.

[mlir][tensor] Add producer fusion for tensor.unpack op.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D141151

[Support] On Windows 11 and Windows Server 2022, fix an affinity mask issue on large core count machines

Before Windows 11 and Windows Server 2022, only one 'processor group' is assigned by default to a starting process, then the program is responsible for dispatching its own threads on more 'processor groups'. That is what 8404aeb56a73ab24f9b295111de3b37a37f0b841 was doing, allowing LLVM tools to automatically use all hardware threads in the machine.

Starting with Windows 11 and Windows Server 2022, the OS takes care of that. This has an adverse effect, reported in #56618: the `GetProcessAffinityMask()` API now seems buggy in some edge cases. That API is used to detect whether an affinity mask was set and to adjust the threads available to a ThreadPool accordingly.

With this patch, on one hand, we let the OS dispatch threads on all 'processor groups', but only for Windows 11 & Windows Server 2022 and after. We retain the old behavior for older OS versions. On the other hand, a workaround was added to mitigate the `GetProcessAffinityMask()` issue described above (see Threading.inc, L226).

Differential Revision: https://reviews.llvm.org/D138747

[mlir][py] Fix python modules build with clang-cl due to requiring exceptions

The generator expression previously used to enable exceptions did not work because the compiler ID is Clang even when Clang is invoked as clang-cl.

The patch fixes that by replacing the generator expression with simple logic, setting the right compiler flags for all MSVC-like compilers (including clang-cl) and all GCC-like compilers.

Differential Revision: https://reviews.llvm.org/D141155

[BOLT][DWARF] Change rangelists to use DW_RLE_offset_pair

Previously, we always used DW_RLE_startx_length. This is not very efficient and
leads to a bigger .debug_addr section. This change switches to
DW_RLE_base_addressx/DW_RLE_offset_pair.

Measured on a clang-16 build in debug mode, with llvm-bolt run on it using
--update-debug-sections:
| section | before | after | diff | % decrease |
| .debug_rnglists | 32732292 | 31986051 | -746241 | 2.3% |
| .debug_addr | 14415808 | 14184128 | -231680 | 1.6% |
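
The percentages in the table can be re-derived from the before/after sizes. The sketch below only redoes that arithmetic; the helper names are made up for illustration and are not part of BOLT.

```python
# Section sizes copied from the table above, as (before, after).
SIZES = {
    ".debug_rnglists": (32732292, 31986051),
    ".debug_addr": (14415808, 14184128),
}


def percent_decrease(before: int, after: int) -> float:
    """Return the size reduction as a percentage of `before`."""
    return (before - after) / before * 100.0


def summarize(sizes):
    """Return (section, diff, percent) rows; diff is negative when a section shrinks."""
    return [
        (name, after - before, round(percent_decrease(before, after), 1))
        for name, (before, after) in sizes.items()
    ]


if __name__ == "__main__":
    for name, diff, pct in summarize(SIZES):
        print(f"{name}: {diff} ({pct}% decrease)")
```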

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D140439

Revert "[Fix][-Wunsafe-buffer-usage] Add a new `forEachDescendant` matcher that skips callable declarations"

This reverts commit 6d140b952805bd9277fba666520ce46c19f2c637.

This commit may cause `test/SemaCXX/warn-unsafe-buffer-usage.cpp` to fail.

[Fix][-Wunsafe-buffer-usage] Add a new `forEachDescendant` matcher that skips callable declarations

The original patch included a `new` expression without a matching
`delete`, causing Sanitizer warnings in
https://lab.llvm.org/buildbot/#/builders/5/builds/30522/steps/13/logs/stdio.

This commit fixes that.

Differential Revision: https://reviews.llvm.org/D138329

[mlir][Arith] Remove expansions of integer min and max ops

As of several months ago, both ArithToLLVM and ArithToSPIRV have
native support for integer min and max operations. Since these are all
the targets available in MLIR core, the need to "expand" arith.minui,
arith.minsi, arith.maxsi, and arith.maxui to more primitive
operations is no longer present.

Therefore, the expanding of integer min and max operations in Arith,
while correct, is likely to lead to performance loss by way of
misoptimization further down the line, and is no longer needed for
anyone's correctness.

This change may break downstream tests, but will not affect the
semantics of MLIR programs.

arith.minf and arith.maxf have a lot of underlying complexity due to
the many different possible NaN and signed zero semantics available on
various platforms, and so removing their expansion is left to a future
commit.

Reviewed By: ThomasRaoux, Mogball

Differential Revision: https://reviews.llvm.org/D140856

[-Wunsafe-buffer-usage] Replace the use of None with std::nullopt to address a warning.

[mlir] Add header file for ssize_t

ssize_t is part of POSIX and not standard C/C++, so using ssize_t
without the necessary header files causes the build to fail on Windows
with the following error: 'ssize_t': undeclared identifier.

This patch includes llvm/Support/DataTypes.h to resolve the problem.

Differential Revision: https://reviews.llvm.org/D141149

[-Wunsafe-buffer-usage] Safe-buffers re-architecture to introduce Fixable gadgets

Re-architect the safe-buffers gadgets, re-classifying them as warning gadgets
and fixable gadgets. The warning gadgets identify unsafe operations on buffer
variables and emit suitable warnings, while the fixable gadgets consider all
operations on variables identified by the warning gadgets and emit the
necessary fix-its.

Differential Revision: https://reviews.llvm.org/D140062?id=486625

[libc] add noexcept to external function headers

To improve code generation for C++ code that directly includes our
headers, the external function definitions will now be marked noexcept.
This may not be necessary for the internal definitions since we build
with the -fno-exceptions flag.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D141095

Fix PDL verifiers to be resilient to invalid IR

This would cause a crash when calling `dump()` on an operation that
didn't have a parent yet.

[libc++][test] Add missing include

`std::out_of_range` is in `<stdexcept>`

[DebugInfo] Allow non-stack_value variadic expressions and use in DBG_INSTR_REF

Prior to this patch, variadic DIExpressions (i.e. ones that contain
DW_OP_LLVM_arg) could only be created by salvaging debug values to create
stack value expressions, resulting in a DBG_VALUE_LIST being created. As of
the previous patch in this patch stack, DBG_INSTR_REF's syntax has been
changed to match DBG_VALUE_LIST in preparation for supporting variadic
expressions. This patch adds some minor changes needed to allow variadic
expressions that aren't stack values to exist, and allows variadic expressions
that are trivially reducible to non-variadic expressions to be handled
similarly to non-variadic expressions.

Reviewed by: jmorse

Differential Revision: https://reviews.llvm.org/D133926

[mlir] Support TBAA metadata in LLVMIR dialect.

This change introduces new LLVMIR dialect operations to represent
TBAA root, type descriptor and access tag metadata nodes.

For the purpose of importing TBAA metadata from LLVM IR it only
supports the current version of TBAA format described in
https://llvm.org/docs/LangRef.html#tbaa-metadata (i.e. size-aware
representation introduced in D41501 is not supported).

TBAA attribute support is only added for LLVM::LoadOp and LLVM::StoreOp.
Support for intrinsics operations (e.g. LLVM::MemcpyOp) may be added later.

The TBAA attribute is represented as an array of access tags, though
LLVM IR supports only a single access tag per memory accessing instruction.
I implemented it as an array anticipating similar support in LLVM IR
to combine TBAA graphs with different roots for Flang - one of the options
described in https://docs.google.com/document/d/16kKZVmI585wth01VSaJAqZMZpoX68rcdBmgfj0kNAt0/edit#heading=h.jzzheaz9vqac

It should be easy to restrict MLIR operation to a single access tag,
if we end up using a different approach for Flang.

Differential Revision: https://reviews.llvm.org/D140768

[AMDGPU] Combine redundant Asm64 and AsmVOP3DPPBase. NFC

Reduce duplication in the codebase by combining these fields in
VOPProfile.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D141088

Cleanup unwind table emission code a bit.

This change removes the `tidyLandingPads` function, which previously
had a few responsibilities:

1. Dealing with the deletion of an invoke, after MachineFunction lowering.
2. Dealing with the deletion of a landing pad BB, after MachineFunction lowering.
3. Cleaning up the type-id list generated by `MachineFunction::addLandingPad`.

Case 3 has been fixed in the generator, and the others are now handled
during table emission.

This change also removes `MachineFunction`'s `addCatchTypeInfo`,
`addFilterTypeInfo`, and `addCleanup` helper fns, as they had a single
caller, and being outlined didn't make it simpler.

Finally, as calling `tidyLandingPads` was effectively the only thing
`DwarfCFIExceptionBase` did, that class has been eliminated.

Remove special cases for invoke of non-throwing inline-asm.

Non-throwing inline asm infers the nounwind attribute in
instcombine. Thus, it can be handled in the same manner as
non-throwing target functions are generally. Further special casing is
unnecessary complexity.

[mlir][tosa] Add tosa.conv3d lowering to Linalg

Conv3D has an existing linalg operation for floating point. Adding a quantized
variant and corresponding lowering from TOSA. Numerical correctness was validated
using the TOSA conformance tests.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D140919

When loading mach-o corefile, new fallback for finding images

When lldb is reading a user process corefile, it starts by finding
dyld, then finding the dyld_all_image_infos structure in dyld by
symbol name, then getting the list of loaded binaries. If it fails
to find the structure by name, it can't load binaries. There is
an additional fallback that this patch adds, which is to look for
this object by the section name it is stored in, if the symbol name
lookup fails.

Differential Revision: https://reviews.llvm.org/D140066
rdar://103369931

Re-land "[-Wunsafe-buffer-usage] Add a new `forEachDescendant` matcher that skips callable declarations"

This reverts commit 22df4549a3718dcd8b387ba8246978349e4be50c.

After a quick investigation, I concluded that the Sanitizer test
failures caused by this patch are not likely to block other
contributors. I am re-landing this patch before taking a closer look at
those tests so that it won't block the [-Wunsafe-buffer-usage]
development.

Fix: Title underline too short in D129372

This patch fixes an error in commit e10e9363 in which the
added documentation contained an incorrectly-styled underline
for the title "Debug Instruction Reference Operands".

[DebugInfo][NFC] Add new MachineOperand type and change DBG_INSTR_REF syntax

This patch makes two notable changes to the MIR debug info representation,
which result in different MIR output but identical final DWARF output (NFC
w.r.t. the full compilation). The two changes are:

  * The introduction of a new MachineOperand type, MO_DbgInstrRef, which
    consists of two unsigned numbers that are used to index an instruction
    and an output operand within that instruction, having a meaning
    identical to the first two operands of the current DBG_INSTR_REF
    instruction. This operand is only used in DBG_INSTR_REF (see below).
  * A change in syntax for the DBG_INSTR_REF instruction, shuffling the
    operands to make it resemble DBG_VALUE_LIST instead of DBG_VALUE,
    and replacing the first two operands with a single MO_DbgInstrRef-type
    operand.

This patch is the first of a set that will allow DBG_INSTR_REF
instructions to refer to multiple machine locations in the same manner
as DBG_VALUE_LIST.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D129372

[libc++][test] Suppress MSVC warnings in std::expected tests

* Initialize `short`s with `short`s instead of `int`s to avoid narrowing warnings
* Explicitly discard the result of `value` calls to avoid `[[nodiscard]]` warnings

Drive-by: `testException` from `value` test is duplicated in `value_or` test; remove the duplicate.
Differential Revision: https://reviews.llvm.org/D141108

[PPC] Add support for tune-cpu attribute

clang (like gcc) has the -mtune= command line option. This option
adds the "tune-cpu" attribute to a function. The intended functionality
is that the scheduling model of that cpu is used. E.g. -mtune=pwr9 -march=pwr8
generates only instructions supported on pwr8 but uses the scheduling model
of pwr9 for it.
This PR adds the infrastructure to support this in LLVM.
clang support was added in https://reviews.llvm.org/D130526.

Reviewed By: amyk, qiucf

Differential Revision: https://reviews.llvm.org/D138317

Recommit "[RISCV] Enable the LocalStackSlotAllocation pass support"

This includes a fix for the tramp3d failure from the llvm-testsuite
that caused the last revert. Hopefully the other failures were the
same issue.

Original commit message:
For RISC-V, load/store instructions (excluding vector loads/stores) have only a 12-bit immediate operand. If an offset is out of range, a temporary register must be used to materialize it. If several such offsets are separated by only a small relative offset (one that satisfies IsInt<12>), the LocalStackSlotAllocation pass can pick one value for a frame base register and replace each original offset with that register's value plus the relative offset.
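
The idea of sharing a base register among nearby out-of-range offsets can be sketched as follows. This is a greedy illustration under stated assumptions, not the pass's actual algorithm; `is_int12` and `rebase_offsets` are hypothetical names.

```python
def is_int12(value: int) -> bool:
    """True if `value` fits a signed 12-bit immediate (-2048..2047)."""
    return -2048 <= value <= 2047


def rebase_offsets(offsets):
    """Express each frame offset as (base, relative) so `relative` fits 12 bits.

    `base` is None when the offset already fits the immediate field and no
    extra register is needed. Otherwise `base` is a value assumed to be held
    in a base register and shared by later offsets that are close enough.
    """
    result = []
    base = None
    for off in offsets:
        if is_int12(off):
            result.append((None, off))
            continue
        # Start a new base when there is none yet or the current one is too far.
        if base is None or not is_int12(off - base):
            base = off
        result.append((base, off - base))
    return result


if __name__ == "__main__":
    for base, rel in rebase_offsets([8, 4096, 4104, 4200, 10000]):
        print(base, rel)
```

In this sketch the three offsets around 4096 share one base value, so only one temporary is needed for them instead of three.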

Co-authored-by: luxufan <luxufan@iscas.ac.cn>
Co-authored-by: Craig Topper <craig.topper@sifive.com>
Differential Revision: https://reviews.llvm.org/D98101

Regenerate a test in preparation for D141060

[mlir] improve error handling in Linalg op splitting

In several cases, the splitting may be known to be a noop, i.e., produce
no second part. Thread this information through the transform utilities
to the transform dialect, and differentiate it from the error state.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D141138

[mlir][nvvm] Add lowering of gpu.printf to nvvm

When converting to nvvm, lowering gpu.printf to vprintf allows us to
support printing when running on cuda.

Differential Revision: https://reviews.llvm.org/D141049

[SLP]Fix cost of the broadcast buildvector/gather.

Need to include the cost of the initial insertelement to the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into poison/undef vector.

Differential Revision: https://reviews.llvm.org/D140498

[RISCV] Improve 4x and 8x (s/u)int_to_fp.

Previously we emitted a 4x or 8x vzext followed by a vfcvt.
We can instead use a 2x or 4x vzext followed by a vfwcvt.

Revert "[Dominator] Add findNearestCommonDominator() for Instructions (NFC)"

This reverts commit 7f0de9573f758f5f9108795850337a5acbd17eef.

This is missing handling for !isReachableFromEntry() blocks, which
may be relevant for some callers. Revert for now.

[RISCV] Add more XVentanaCondOps patterns.

Add patterns with seteq/setne conditions.

We don't have instructions for seteq/setne except for comparing
with zero and need to emit an ADDI or XOR before a seqz/snez to
compare other values.

The select ISD node takes a 0/1 value for the condition, but the
VT_MASKC(N) instructions check all XLen bits for zero or non-zero.
We can use this to avoid the seqz/snez in many cases.
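
A small model of why the seqz/snez can be dropped. It assumes that VT.MASKC yields its first source when the second is non-zero (else 0) and that VT.MASKCN is the inverse; the Python function names are illustrative only.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def vt_maskc(rs1: int, rs2: int) -> int:
    """Assumed semantics: rs1 if any of the XLEN bits of rs2 is set, else 0."""
    return rs1 & MASK if (rs2 & MASK) != 0 else 0


def vt_maskcn(rs1: int, rs2: int) -> int:
    """Assumed semantics: rs1 if all XLEN bits of rs2 are zero, else 0."""
    return rs1 & MASK if (rs2 & MASK) == 0 else 0


def select_ne_with_snez(a: int, b: int, x: int) -> int:
    """select(a != b, x, 0) with the condition first normalized to 0/1."""
    cond = 1 if ((a ^ b) & MASK) != 0 else 0  # models xor followed by snez
    return vt_maskc(x, cond)


def select_ne_direct(a: int, b: int, x: int) -> int:
    """Same select, feeding the xor result straight into the mask op."""
    return vt_maskc(x, a ^ b)


def select_eq_direct(a: int, b: int, x: int) -> int:
    """select(a == b, x, 0) using the inverted mask op on the xor result."""
    return vt_maskcn(x, a ^ b)
```

Because the mask operations test the whole register for zero, the xor result already carries the condition and the 0/1 normalization adds nothing.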

This is a pretty ridiculous number of patterns. I wonder if we could
use some ComplexPatterns to merge them, but I'd like to do that as
a follow up and focus on correctness of the result in this patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D140421

[GVN] Name instructions in test (NFC)

[EntryExitInstrumenter] Convert test to opaque pointers (NFC)

[RISCV] Add support for the vscale_range attribute.

This is based on @frasercrmck's D107290. At least some of the clang
portion of D107290 has already been committed.

This uses vscale_range for min/max vector width unless the command
line overrides are used.

As a follow up, I plan to add a max or exact VLEN option to clang
to control the vscale_range. This will eliminate many of the reasons
for users to use the overrides through the -mllvm interface.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D139873

[mlir][vector] Relax restriction on reduction distribution

Relax unnecessary restriction when distributing a vector.reduce op.
All the float and integer types can be supported by user's lambda.

Differential Revision: https://reviews.llvm.org/D141094

flang: break the build on 32bit systems

Reviewed By: PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D141132

Doc: improve the flang readme page

Reviewed By: PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D141126

[Dominator] Add findNearestCommonDominator() for Instructions (NFC)

This is a recurring pattern: We want to find the nearest common
dominator (instruction) for two instructions, but currently only
provide an API for the nearest common dominator of two basic blocks.

Add an overload that accepts and returns instructions.
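
One way such an overload can be built on top of the block-level query is sketched below on a toy CFG. The data model (blocks as lists, instructions as `(block, index)` pairs, an `idom` map) is hypothetical, and the sketch assumes every block is reachable from the entry block.

```python
def nearest_common_dominator_block(idom, a, b):
    """Nearest common dominator of blocks `a` and `b`.

    `idom` maps each block to its immediate dominator; the entry block
    maps to None. Both blocks are assumed reachable from the entry.
    """
    ancestors = set()
    node = a
    while node is not None:
        ancestors.add(node)
        node = idom[node]
    node = b
    while node not in ancestors:
        node = idom[node]
    return node


def nearest_common_dominator_inst(idom, blocks, i1, i2):
    """Instruction-level variant; instructions are (block, index) pairs."""
    b1, n1 = i1
    b2, n2 = i2
    if b1 == b2:
        # Within one block, the earlier instruction dominates the later one.
        return i1 if n1 <= n2 else i2
    dom = nearest_common_dominator_block(idom, b1, b2)
    if dom == b1:
        return i1
    if dom == b2:
        return i2
    # Neither block dominates the other: use the terminator of the common block.
    return (dom, len(blocks[dom]) - 1)
```

For a diamond-shaped CFG, two instructions in the left and right arms resolve to the terminator of the entry block.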

[gn build] Port 16c1c9fdcc48

[SelectionDAG] Implicitly truncate known bits in SPLAT_VECTOR

Now that D139525 fixes the Hexagon infinite loop, the stopgap can be
removed to provide more information about known bits in SPLAT_VECTOR
whose operands are smaller than the bit width (which is most of the
time)

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D141075

[WebAssembly][NFC] Add test case for PR59626

For D141079

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D141120

Revert D140263 "[NFC] Vastly simplifies TypeSize"

This broke some build bots: https://lab.llvm.org/buildbot/#/builders/16/builds/41419/steps/5/logs/stdio

This reverts commit 4670d5ece57d9b030597da679072f78bb3f4d419.

[LoopFlattening] Check for extra uses on Mul

Similar to D138404, we were not guarding against extra uses of the Mul.
In most cases other checks would catch the issue due to unsupported
instructions in the outer loop, but certain non-canonical loop forms
could still get through.

Fixes #59339

Differential Revision: https://reviews.llvm.org/D141114

[LoopFlatten][NFC] Run instnamer on pr59339.ll

[AArch64][SME]: Make 'Expand' the default action for all Ops.

By default expand all operations, then change to Custom/Legal if needed.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D141068

Revert D141134 "[NFC] Only expose getXXXSize functions in TypeSize"

The patch should be discussed further.

This reverts commit dd56e1c92b0e6e6be249f2d2dd40894e0417223f.

[NFC] Only expose getXXXSize functions in TypeSize

Currently 'TypeSize' exposes two functions that serve the same purpose:
- getFixedSize / getFixedValue
- getKnownMinSize / getKnownMinValue

Source: https://github.com/llvm/llvm-project/blob/bf82070ea465969e9ae86a31dfcbf94c2a7b4c4c/llvm/include/llvm/Support/TypeSize.h#L337-L338

This patch proposes removing one of the two and sticking to a single function in the code base.

Differential Revision: https://reviews.llvm.org/D141134

[StackLifetime] Fix sign compare warning (NFC)

[MemCpyOpt] Extract processStoreOfLoad() method (NFC)

[Libomptarget] Add more moves to expected conversion

Summary:
Fixes other instances of the same problem in the previous patch.

[Libomptarget] Add move to expected conversion

Summary:
These implicit conversions from move-only types to expected seem to only
work with newer compilers. This should hopefully fix it.

[mlir] fix use-after-free on error path in transform dialect

[clang-format] fix template closer followed by >

Fixes https://github.com/llvm/llvm-project/issues/59785

Reviewed By: HazardyKnusperkeks, MyDeveloperDay, owenpan

Differential Revision: https://reviews.llvm.org/D140843

[IR] Use isEntryBlock() API (NFC)

[IR] Add AllocaInst::getAllocationSize() (NFC)

When fetching allocation sizes, we almost always want to have the
size in bytes, but we were only providing an InBits API. Also add
the corresponding byte-based counterpart to save some *8 and /8
juggling everywhere.

[SDAG] try to avoid multiply for X*Y==0

Forking this off from D140850 -
https://alive2.llvm.org/ce/z/TgBeK_
https://alive2.llvm.org/ce/z/STVD7d

We could almost justify doing this in IR, but consideration for
"minsize" requires that we only try it in codegen -- the
transform is not reversible.

In all other cases, avoiding multiply should be a win because a
mul is more expensive than simple/parallelizable compares. AArch64
even has a trick to keep instruction count even for some types.
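
The message does not spell out the replacement pattern. The sketch below assumes the multiply is replaced by comparing each operand with zero, and checks by brute force on 8-bit values when that is equivalent. It shows that the rewrite only holds if the multiply cannot wrap; the exact preconditions used by the patch are in the review and are not restated here.

```python
def mul_is_zero_wrapping(x: int, y: int, bits: int = 8) -> bool:
    """(x * y) == 0 with the product truncated to `bits` bits."""
    return ((x * y) & ((1 << bits) - 1)) == 0


def either_is_zero(x: int, y: int) -> bool:
    """The multiply-free form: (x == 0) or (y == 0)."""
    return x == 0 or y == 0


def find_mismatches(bits: int = 8, require_no_overflow: bool = False):
    """Return unsigned operand pairs where the two predicates disagree."""
    limit = 1 << bits
    return [
        (x, y)
        for x in range(limit)
        for y in range(limit)
        if not (require_no_overflow and x * y >= limit)
        and mul_is_zero_wrapping(x, y, bits) != either_is_zero(x, y)
    ]
```

With wrapping allowed, a pair such as (16, 16) multiplies to 256, which truncates to 0 in 8 bits even though neither operand is zero.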

Differential Revision: https://reviews.llvm.org/D141086

AMDGPU/GlobalISel: Add missing test for implicit_def regbankselect

AMDGPU/GlobalISel: Add wave32 checks to bool test

[C++20] Determine the dependency of unevaluated lambdas more accurately

During template instantiation, the instantiator enters a constant-evaluated
context before instantiating a template argument that originated from an
expression, and this prevents the instantiator from creating lambdas with
independent types.

This patch solves the problem by widening the condition under which the
instantiator marks lambdas as never dependent, and fixes issue #57960.

Differential Revision: https://reviews.llvm.org/D140554

[AMDGPU] Add a feature for VALUTransUseHazard

NFCI. This just allows us to experiment with enabling/disabling the
workaround on different subtargets.

Differential Revision: https://reviews.llvm.org/D141121

[llvm-exegesis][NFC] Update benchmark phase naming to match documentation

[mlir][memref] Add runtime verification for memref::CastOp

Verify unranked -> ranked casts and casts of dynamic sizes/offset/strides to static ones.

Differential Revision: https://reviews.llvm.org/D138671

[AArch64] add tests for x*y == 0; NFC

[x86] add tests for x*y == 0; NFC

[UpdateTestChecks] Do not add --force-update to UTC_ARGS

Persisting this flag only introduces test churn.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D141124

Make switch-to-lookup-large-types.ll more reliable

When larger integer types are natively supported, simplifycfg will use an
inline constant instead of a global variable for this transform. I noticed
this while trying to automatically infer the datalayout from the target
triple in opt if it is not explicitly specified. Since the x86_64
datalayout includes "n8:16:32:64", this test started failing.
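
The "inline constant instead of a global variable" form can be pictured as packing the table into one integer and extracting entries by shift and mask. The sketch below is an illustration of that idea only; the function names and the native-width check are assumptions, not SimplifyCFG's code.

```python
def pack_table(values, width: int) -> int:
    """Pack `values` (each < 2**width) into one integer, entry 0 in the low bits."""
    packed = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << width):
            raise ValueError("value does not fit the element width")
        packed |= v << (i * width)
    return packed


def lookup(packed: int, index: int, width: int) -> int:
    """Extract entry `index` with a shift and a mask instead of a memory load."""
    return (packed >> (index * width)) & ((1 << width) - 1)


def fits_native(num_entries: int, width: int, native_widths=(8, 16, 32, 64)) -> bool:
    """True if the packed table fits the largest natively supported integer.

    The default widths mirror the "n8:16:32:64" datalayout component.
    """
    return num_entries * width <= max(native_widths)
```

Four 8-bit entries fit a 32-bit constant and can be looked up inline, while four 32-bit entries would need 128 bits and exceed the native widths listed in the datalayout.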

While touching this file also change i128 to i64 in the first test since
this was intended behaviour in the original commit.

Reviewed By: spatel, fhahn

Differential Revision: https://reviews.llvm.org/D141055

[CallSiteSplitting] Convert test to opaque pointers (NFC)

Keeping the bitcasts here because this is in part testing the
(legal) bitcast after a musttail call, even though it's no longer
really relevant.

[gn build] Port 4670d5ece57d

[NFC] Vastly simplifies TypeSize

Simplifies the implementation of `TypeSize` while retaining its interface.
There is no need for abstract concepts like `LinearPolyBase`, `UnivariateLinearPolyBase` or `LinearPolySize`.

Differential Revision: https://reviews.llvm.org/D140263

[WebAssembly] Explicitly add {z,s}ext so extends are selected

During DAG legalization, {u,s}itofp instructions on v2i8, v2i16, v4i8
and v4i16 types ended up being legalized into scalar instructions, when
they could just be extended to v2i32/v4i32 instead.

Fixes https://github.com/llvm/llvm-project/issues/57182

Differential Revision: https://reviews.llvm.org/D140916

[mlir] adapt TransformEachOpTrait to parameter values

Adapt the implementation of TransformEachOpTrait to the existence of
parameter values recently introduced into the transform dialect. In
particular, allow `applyToOne` hooks to return a list containing a mix
of `Operation *` that will be associated with handles and `Attribute`
that will be associated with parameter values by the trait
implementation of the transform interface's `apply` method.

Disentangle the "transposition" of the list of per-payload op partial
results to decrease its overall complexity and detemplatize the code
that doesn't really need templates. This removes the poorly documented
special handling for single-result ops with TransformEachOpTrait that
could have assigned null pointer values to handles.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D140979

[mlir] NFC: move DiagnosedSilenceableFailure to Utils in Transform dialect

It was originally placed in TransformInterfaces for convenience, but it
is really a generic utility. It may also create an include cycle between
TransformTypes and TransformInterfaces if the latter needs to include
the former because the former uses the failure util.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D140978

[mlir] NFC: rename TransformTypeInterface to TransformHandleTypeInterface

This makes it more consistent with the recently added
TransformParamTypeInterface.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D140977

[mlir] introduce parameters into the transform dialect

Introduce a new kind of value into the transform dialect -- parameter
values. These values have a type implementing the new
`TransformParamTypeInterface` and are associated with lists of
attributes rather than lists of payload operations. This mechanism
allows one to wrap numeric calculations, typically heuristics, into
transform operations separate from those actually applying the
transformation. For example, tile size computation can now be separated
from tiling itself, and not hardcoded in the transform dialect. This
further improves the separation of concerns between transform choice and
implementation.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D140976

[bazel] Add missing :Support dependency after 1b8224537070

[mlir][tensor] Support parallel_insert_slice in MergeConsecutiveInsertExtractSlicePatterns.cpp

Differential Revision: https://reviews.llvm.org/D141116

[mlir][linalg] Swap extract_slice(fill(x)) ops

This pattern is similar to `FoldFillWithTensorReshape`, which performs the same swapping with reshapes.

Fill the smaller extracted tensor slice instead of `x`. This allows for additional simplifications in case `x` is the result of another extract_slice.
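
The equivalence behind the swap can be checked on a one-dimensional model, with Python lists standing in for tensors. The helpers below are illustrative stand-ins, not the MLIR ops themselves.

```python
def fill(value, dest):
    """Model of a fill: every element of `dest` is overwritten with `value`."""
    return [value for _ in dest]


def extract_slice(tensor, offset: int, size: int):
    """Model of a 1-D extract_slice with unit stride."""
    return tensor[offset:offset + size]


def slice_of_fill(value, dest, offset: int, size: int):
    """Original form: fill the whole tensor, then take the slice."""
    return extract_slice(fill(value, dest), offset, size)


def fill_of_slice(value, dest, offset: int, size: int):
    """Swapped form: take the slice first, then fill only the slice."""
    return fill(value, extract_slice(dest, offset, size))
```

Both forms give the same result, but the swapped form touches only `size` elements and leaves an extract_slice of `x` that can fold with an earlier extract_slice.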

Differential Revision: https://reviews.llvm.org/D141117

[clang][analyzer] Extend StreamChecker with some new functions.

The stream handling functions `ftell`, `rewind`, `fgetpos`, `fsetpos`
are evaluated in the checker more precisely than before.
New tests are added to test the behavior of the checker together with
StdLibraryFunctionsChecker. The option ModelPOSIX of that checker
affects whether (most of) the stream functions are recognized, and checker
StdLibraryFunctionArgs generates warnings if constraints for arguments
are not satisfied. The state of `errno` is also set by
StdLibraryFunctionsChecker for every case in the stream functions.
StreamChecker works with the stream state only, does not set the errno state,
and does not depend on other checkers.

Reviewed By: Szelethus

Differential Revision: https://reviews.llvm.org/D140395

[Transforms] Convert some tests to opaque pointers (NFC)

[AArch64] add GlobalISel support for scalar CNT instruction

When feature CSSC is available we should use instruction CNT for s32, s64 and
s128 types in GlobalISel's G_CTPOP.

spec:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/CNT--Count-bits-

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D139417

[GlobalSplit] Convert test to opaque pointers (NFC)

[ConstantFold] Don't drop zero index gep with inrange attribute

This may cause GlobalSplit to fail if opaque pointers are used.

inrange really needs a new representation, but for now restore the
pre-opaque pointers status.

NFC Address review comment for D140905

[LV] Disable runtime unrolling for vectorized loops.

This patch adds metadata to disable runtime unrolling to the vectorized
loop. If runtime unrolling/interleaving is considered profitable, LV
will interleave the loop directly. There should be no need to perform
runtime unrolling at a later stage.

Note that we already add metadata to disable runtime unrolling to the
scalar loop after vectorization.

The additional unrolling unnecessarily increases code size and compile
time. In addition to that we have several bug reports of unnecessary
runtime unrolling for vectorized loops, e.g. PR40961.

Compile-time improvements:

  NewPM-O3: -1.04%
  NewPM-ReleaseThinLTO: -0.59%
  NewPM-ReleaseLTO-g: -0.97%

https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u

Fixes #40306.

Reviewed By: lebedev.ri, nikic

Differential Revision: https://reviews.llvm.org/D115261

[DebugInfo] Replace UndefValue with PoisonValue in setKillLocation

This helps towards the effort to remove UndefValue from LLVM.

Related to https://discourse.llvm.org/t/auto-undef-debug-uses-of-a-deleted-value

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D140905

[LoopUnroll] Convert test to opaque pointers (NFC)

[LoopUnroll] Name instructions in test (NFC)

Apply clang-tidy fixes for performance-unnecessary-value-param in SparseTensorCodegen.cpp (NFC)

[LoopIdiom] Convert tests to opaque pointers (NFC)

The differences here are due to SCEVExpander producing GEPs with
explicit offset calculation, a known difference with opaque pointers.

[cmake] Add llvm-debuginfod as test dependency

llvm-debuginfod is used by llvm-lit as of
36f01909a0e29c1014301ed6835687a84bf0e9fa, so adding this dependency
fixes a "note: Did not find llvm-debuginfod" warning from showing up
when running tests.

Differential Revision: https://reviews.llvm.org/D141071

[DebugInfo] Prefer setKillLocation rather than replacing operands with undef

NFC-ish. There is a functional change but the outputs are semantically
identical. Where we might've before replaced one operand with undef (which
means "this is a kill location marker") the use of `setKillLocation` will
replace all location operands with `undef` (which also means "this is a kill
location marker").

Related to https://discourse.llvm.org/t/auto-undef-debug-uses-of-a-deleted-value

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D140904

[LoopIdiom] Name instructions in test (NFC)