review.tizen.org Git - platform/upstream/llvm.git/log

[flang] optionally lower scalar and explicit shape with fir.declare

Lower scalar and explicit shape arrays to fir.declare under the -hlfir option.
Update the SymMap so that it can hold fir::FortranVariableInterface.

The plan is to go towards a SymMap that only contains fir::FortranVariableInterface
once current expression lowering can be replaced. This should make the SymMap lighter
than it is today (SymBox/ExtendedValue are above 256 bytes).

Assumed shape, allocatable and pointer are left TODOs for now. Anything with a
specification expression that is not a constant expression will only be able to
be lowered when the HLFIR expression lowering skeleton is added.

Differential Revision: https://reviews.llvm.org/D136252

[flang] add fir.declare codegen support

For now, nothing is done about debug info and the fir.declare is simply
replaced by the memref argument. This is done in the PreCGRewrite in
order to avoid requiring adding support for fir.shape codegen, which
would still be useless and undesired at that point.

Differential Revision: https://reviews.llvm.org/D136254

[gn build] Port 3c397c90c183

[llvm-debuginfo-analyzer] (04/09) - Locations and ranges

llvm-debuginfo-analyzer is a command line tool that processes debug
info contained in a binary file and produces a debug information
format agnostic “Logical View”, which is a high-level semantic
representation of the debug info, independent of the low-level
format.

The code has been divided into the following patches:

1) Interval tree
2) Driver and documentation
3) Logical elements
4) Locations and ranges
5) Select elements
6) Warning and internal options
7) Compare elements
8) ELF Reader
9) CodeView Reader

Full details:
https://discourse.llvm.org/t/llvm-dev-rfc-llvm-dva-debug-information-visual-analyzer/62570

This patch:

Locations and ranges
- All functionality for logical debug locations and ranges:
LVLocation, LVRanges.

Reviewed By: psamolysov, probinson

Differential Revision: https://reviews.llvm.org/D125779

[mlir][aarch64] Disable bf16 tests on AArch64

This patch disables 2 bf16 tests that are currently not supported on
AArch64. I've triaged these failures and opened [1] to track this. I
don't have a simple reproducer for dense_output_bf16.mlir, but it's
rather clear that both tests fail due to missing support for `bfloat`
operations in the AArch64 backend.

I'm not sure what the path forward to enable these tests on AArch64
should be. I think that there are two options:
  * AArch64 backened gains capability to legalize these nodes containing
    `bfloat` operands, or
  * MLIR (similarly to Clang) is taught not to emit such nodes in the
    first place.

[1] https://github.com/llvm/llvm-project/issues/58465

Differential Revision: https://reviews.llvm.org/D136273

[libc][Obvious] Fix incomplete spec definition of sys/random.h.

[X86] Move 128/256-bit FP16/BF16 typedef to emmintrin.h or avxintrin.h, NFCI

[mlir][llvm] Use longer variable names in LLVM IR import (NFC).

Rename single letter member variables and function arguments to use
longer names in ConvertFromLLVMIR.cpp. Also drop some uses of auto in
favor our spelling out the type and refactor some llvm::enumerate loops.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D136246

[gn build] Port e28b9357b14c

[llvm-debuginfo-analyzer] (03/09) - Logical elements

llvm-debuginfo-analyzer is a command line tool that processes debug
info contained in a binary file and produces a debug information
format agnostic “Logical View”, which is a high-level semantic
representation of the debug info, independent of the low-level
format.

The code has been divided into the following patches:

1) Interval tree
2) Driver and documentation
3) Logical elements
4) Locations and ranges
5) Select elements
6) Warning and internal options
7) Compare elements
8) ELF Reader
9) CodeView Reader

Full details:
https://discourse.llvm.org/t/llvm-dev-rfc-llvm-dva-debug-information-visual-analyzer/62570

This patch:

Logical elements
- All basic functionality for the logical elements:
LVScope, LVLine, LVSymbol, LVType.
- The logical reader:
LVReader.h

Reviewed By: psamolysov, probinson

Differential Revision: https://reviews.llvm.org/D125778

[MLIR] Enable distribution target in standalone builds

Invoke llvm_distribution_add_targets() when doing standalone build
explicitly in order to create the `distribution` target.

Differential Revision: https://reviews.llvm.org/D136269

[testcase][OpenMP] Fix the testcase error of check-all when DCLANG_DEFAULT_OPENMP_RUNTIME is not libomp

When DCLANG_DEFAULT_OPENMP_RUNTIME is set to libgomp, there is some check-all error.
The expected CHECK result only displays when fopenmp=libomp is specified explicitly.

Differential Revision: https://reviews.llvm.org/D136239

[X86] Remove redundant static from constexpr. NFC

[CMake] Disable BOLT instrumentation of Clang on instrumented build

This enables multi-stage PGO build optimized by BOLT using BOLT.cmake cache.

The issue is that `-DPGO_BUILD_CONFIGURATION` cache file is passed to both
stage2-instrumented and stage2-optimized builds (for them to be identical),
but in case of BOLT.cmake, it doesn't make sense to BOLT-instrument the
instrumented binary (it's not going to be optimized). Hence turn off
`CLANG_BOLT_INSTRUMENT` code if `LLVM_BUILD_INSTRUMENTED` is enabled.

The final workflow that enables multi-stage InstrPGO+ThinLTO+BOLT Clang build:
```
cmake <llvm-project>/llvm -GNinja -DLLVM_ENABLE_LLD=ON \
-DBOOTSTRAP_LLVM_ENABLE_LLD=ON -DBOOTSTRAP_BOOTSTRAP_LLVM_ENABLE_LLD=ON \
-DPGO_INSTRUMENT_LTO=Thin -C llvm-project/clang/cmake/caches/BOLT-PGO.cmake
ninja stage2-clang++-bolt
```

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D136023

[gn build] Port 62ca79102cf9

[X86][1/2] Support PREFETCHI instructions

For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D136040

[docs] Update compiler-rt/CODE_OWNERS.TXT

Reviewed By: compnerd, kcc, lhames, phosek, tejohnson, dvyukov, MaskRay

Differential Revision: https://reviews.llvm.org/D135617

[mlir][NVGPU] Handle native mma.sync and ldmatrix(x4) sizes

This patch handles native `mma.sync` sizes and enables issuing `ldmatrix` on
largest possible tiles for matrixB. It requires handling
`vector.extract_strided_slice` from vector to ngpu lowering.

Differential Revision: https://reviews.llvm.org/D135749

[test][asan][Darwin] Pass -mlinker-version=133 to linker invocation in odr-lto.cpp

When building clang with lld, we don't get a default mlinker-version [1]. This causes us to not pass -lto_library to ld64 [2].
Explicitly pass -mlinker-version=133 so we properly pass -lto_library to ld64 and don't get LLVM bitcode version mismatches due to LTO.

[1] https://github.com/llvm/llvm-project/blob/55ae180a4cb7fc68b3ac153f07752c8c6a2d92f0/clang/CMakeLists.txt#L345
[2] https://github.com/llvm/llvm-project/blob/55ae180a4cb7fc68b3ac153f07752c8c6a2d92f0/clang/lib/Driver/ToolChains/Darwin.cpp#L262-L270

[VP] Teach isVPBinaryOp to recognize vp.smin/smax/umin/umax/minnum/maxnum.

Those vp intrinsics should be vp binary operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D135753

[clang][RISCV] Set vscale_range attribute based on VLEN

Follow up on D135894, restructure code to work in terms of minimum and maximum VLEN coming from RISCVISAInfo.cpp. In the original review, I'd mentioned that MinVLEN was sometimes zero. This turns out to be a case of human error, combined with really bad (lack of) error reporting.

This patch adds appropriate tests for various vector extension combinations to show the mechanism works, but doesn't try to provide exhaustive coverage of the extension interactions. Presumably, that is already covered in existing tests elsewhere.

Differential Revision: https://reviews.llvm.org/D136106

[VE] Change the way to lower selectcc

Change to use VEISD::CMPI/CMPU/CMPF/CMPQ and VEISD::CMOV in combineSelectCC
for better optimization. Support VEISD::CMPI/CMPU in combineTRUNCATE also
to optimize truncate. Remove obsolete lower patterns from VEInstrInfo.td.
Update regression tests also.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D136049

[libc][Obvious] Add termios.h to the list of x86_64 linux headers.

[RISCV] Remove -enable-unsafe-fp-math from machine combiner tests. NFC

The optimization is using fast math flags on the instructions instead.

[clang] Disable assertion that can "easily happen"

Disable the assertion for getting a module ID for non-local,
non-imported module. According to the FIXME this can "easily happen" and
indeed, we're hitting this assertion regularly. Disable it until it can
be properly investigated.

rdar://99352728

Differential revision: https://reviews.llvm.org/D136290

[examples][ORC] Make sure eh-frame registration code is linked into an example.

Since aedeb8d5570, which switched to EPC-based eh-frame registration, the
eh-frame registration functions need to be forcibly linked into the target
process.

We need a general solution to this problem, but for now just force it in this
example to fix the test failures in
https://green.lab.llvm.org/green/job/clang-stage1-RA/31497

rdar://101083784

[Sema] Don't treat a non-null template argument as if it were null.

The way this code checks whether a pointer is null is wrong for other
reasons; it doesn't actually check whether a null pointer constant is a
"constant" in the C++ standard sense. But this fix at least makes sure
we don't treat a non-null pointer as if it were null.

Fixes https://github.com/llvm/llvm-project/issues/57883

Differential Revision: https://reviews.llvm.org/D134928

[Hexagon] Fix insertion point for pointer difference calculation

HVC::calculatePointerDifference inserts temporary instructions for
simplification, and calulation of known bits. These instructions were
inserted at the end of a basic block (after the terminator), which
caused BB->getTerminator() to return nullptr. This, in turn, caused
a crash when a PHI instruction was examined in computeKnownBits.

[mlir][sparse] Fix breakage on older versions of cmake

Per https://reviews.llvm.org/D136005#3866692 the introduction of the MLIRSparseTensorEnums target in D136002 caused breakage on some versions of cmake. This differential aims to fix those errors.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D136217

[lldb] Allow SymbolFileDWARFDebugMap to register multiple compile units

Currently, SymbolFileDWARFDebugMap works on the assumption that there is
only one compile unit per object file. This patch documents this
limitation (when using the general SymbolFile API), and allows users of
the concrete SymbolFileDWARFDebugMap class to find out about these extra
compile units.

Differential Revision: https://reviews.llvm.org/D136114

[mlgo] Fix one test post-D135934

The test was checking output opcodes, one changed as result of D135934.

[mlir][linalg] Add back split reduction tests dropped by previous commit

The transition to transform dialect based tests dropped several cases of
the split reduction testing. Adding them back.

Differential Revision: https://reviews.llvm.org/D136287

[BitcodeReader] Convert pair to triple in preparation for MemProf (NFC)

Extracted from D135714 which adds summary support for MemProf. We will
need a 3rd tuple member in the ValueIdToValueInfoMap, this patch makes a
number of NFC changes to the existing clients of that map to reflect the
conversion of pair to tuple.

[SPIR-V] Add get_image_num_mip_levels implementation

Differential Revision: https://reviews.llvm.org/D135904

[SPIR-V] Add atomic_init and fix atomic explicit lowering

Differential Revision: https://reviews.llvm.org/D135902

[lldb] Add matching based on Python callbacks for data formatters.

This patch adds a new matching method for data formatters, in addition
to the existing exact typename and regex-based matching. The new method
allows users to specify the name of a Python callback function that
takes a `SBType` object and decides whether the type is a match or not.

Here is an overview of the changes performed:

- Add a new `eFormatterMatchCallback` matching type, and logic to handle
  it in `TypeMatcher` and `SBTypeNameSpecifier`.

- Extend `FormattersMatchCandidate` instances with a pointer to the
  current `ScriptInterpreter` and the `TypeImpl` corresponding to the
  candidate type, so we can run registered callbacks and pass the type
  to them. All matcher search functions now receive a
  `FormattersMatchCandidate` instead of a type name.

- Add some glue code to ScriptInterpreterPython and the SWIG bindings to
  allow calling a formatter matching callback. Most of this code is
  modeled after the equivalent code for watchpoint callback functions.

- Add an API test for the new callback-based matching feature.

For more context, please check the RFC thread where this feature was
originally discussed:
https://discourse.llvm.org/t/rfc-python-callback-for-data-formatters-type-matching/64204/11

Differential Revision: https://reviews.llvm.org/D135648

[SLP][NFC]Remove unused variable, NFC.

[NFC][CostModel] Added floating point frem test for SVE

Differential Revision: https://reviews.llvm.org/D136241

[mlir][sparse] end-to-end sparse vector insertion codegen

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D136275

[clang][modules] Add time traces for AST serialization

Fills gaps in the time trace when precompiled headers are created/loaded.

Reviewed By: jansvoboda11

Differential Revision: https://reviews.llvm.org/D135657

[RISCV] Add more check prefixes to extractelt-fp.ll to fix a conflicting case.

The existing prefix conflicted and the script silently dropped the checks.

[BOLT] Ignore duplicate global symbols

We noticed some binaries with duplicated global symbol
entries (same name, address and size). Ignore them as it is possibly a
bug in the linker, and continue processing, unless the symbol has a
different size or address.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D136122

[flang][NFC] Add fir.dispatch codegen test with pass object at pos 1

D136189 was missing a test where the pass object is not at
position 0. This patch adds one.

Reviewed By: jeanPerier, PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D136231

[mlir][NVGPU] Fixing minor typo (first test commit)

[docs] Add myself for LLVM Office hours

Add an entry for my office hours. Intended focus is low-level LLVM
stuff.

Differential Version: https://reviews.llvm.org/D136270

Revert D135427 "[LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally"

This reverts commit 8ef3fd8d59ba0100bc6e83350ab1e978536aa531.

I mentioned that GlobalAlias was not handled. It turns out GlobalAlias has to be handled in the same patch (as opposed to in a follow-up),
as otherwise clang codegen of C5/D5 constructor/destructor would regress (https://reviews.llvm.org/D135427#3869003).

[mlir][sparse] remove vector support in sparsification

Sparse compiler used to generate vectorized code for sparse tensors computation, but it should really be delegated to other vectorization passes for better progressive lowering.

https://discourse.llvm.org/t/rfc-structured-codegen-beyond-rectangular-arrays/64707

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D136183

[lit] fix a error when using --show-used-features

The error is
```
NotADirectoryError: [Errno 20] Not a directory: '<build-dir>/unittests/Analysis/./AnalysisTests/0/40'
```

Exclude unittests when collecting features because
unittests don't make use of feature keywords.

[lld-macho][nfc] Clean up includes

- remove unused/duplicate includes
- reformatting/whitespaces

Differential Revision: https://reviews.llvm.org/D136266

[JMCInstrument] rename ELF section name from ".just.my.code" to ".data.just.my.code"

This gives linker scripts a hint about where to place the section.

[BOLT][DWARF] Add support for DW_FORM_addr for DW_AT_call_return_pc

GCC 12 produces DW_FORM_addr for DW_AT_call_return_pc. Added support for that.
Fixes facebookincubator/BOLT#307

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D136204

[NFC] Updating an incorrect code comment

This slipped in by accident.

[libc++][doc] Fixes status pages.

Addresses post-commit review comment in D134742.

[mlir][sparse] Replace the folding of nop convert with a codegen rule.

This is to allow the use of a nop convert to express that the sparse tensor
allocated through bufferization::AllocTensorOp will be expanded to sparse
tensor storage by sparse tensor codegen.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D136214

[OMPIRBuilder] Support depend clause for task

This patch adds support for the `depend` clause for the `task`
construct.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D135695

[DX] Fix missing preserved analysis

The ShaderFlagsAnalysisWrapper needs to be marked to preserve all
analyssis.

Fixes #58474 (https://github.com/llvm/llvm-project/issues/58474)

[AArch64] Fix minor issue introduced in D135950.

The Key for the SubtargetMap had the StreamingSVEModeDisabled in the
wrong place. This change is non-functional, since the string (key) is
still unique.

[DirectX] Disabling currently failing test

The pretty-printer isn't working because the resource analysis isn't
properly preservered.

[AArch64] SME2 Single-multi vector ternary int/FP 2 and 4 registers

This patch adds the assembly/disassembly for the following instructions:

For INT:
    ADD(array results, multiple and single vector): Add replicated single
        vector to multi-vector with ZA array vector results.
    SUB(array results, multiple and single vector): Subtract replicated single
        vector from multi-vector with ZA array vector results.
For FP:
    FMLA (multiple and single vector): Multi-vector floating-point fused
          multiply-add by vector.
    FMLS (multiple and single vector): Multi-vector floating-point
          multiply-subtract long by vector.
The reference can be found here:

https://developer.arm.com/documentation/ddi0602/2022-09

The Matriz Operand has 2 new sizes 32(.s) and 64(.d) bits
(MatrixOp32 and MatrixOp64)

Depends on: D135448

Depends on:  D135952

Differential Revision: https://reviews.llvm.org/D135455

[AArch64][SME] Disable (SLP|Loop)Vectorizer when function may be executed in streaming mode.

When the SME attributes tell that a function is or may be executed in Streaming
SVE mode, we currently need to be conservative and disable _any_ vectorization
(fixed or scalable) because the code-generator does not yet support generating
streaming-compatible code.

Scalable auto-vec will be gradually enabled in the future when we have
confidence that the loop-vectorizer won't use any SVE or NEON instructions
that are illegal in Streaming SVE mode.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D135950

[MLIR][Tensor] Remove assert in PadOp builder

The assert is misplaced as the result type is allowed to be null. A few
lines below the result type is inferred if it is passed a nullptr.
Besides, this behavior is described in the documentation of the builder.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D136262

Move HLSL builtins into hlsl namespace

Should have done this from the start. Since all the injected AST types
are in the hlsl namespace we should also put the header-defined types
and functions in there too.

This updates the basic_types test to run once with the namespaced types
and once without, and adds using declarations or namespaces calls in
other tests.

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D135973

[X86][RFC] Using `__bf16` for AVX512_BF16 intrinsics

This is an alternative of D120395 and D120411.

Previously we use `__bfloat16` as a typedef of `unsigned short`. The
name may give user an impression it is a brand new type to represent
BF16. So that they may use it in arithmetic operations and we don't have
a good way to block it.

To solve the problem, we introduced `__bf16` to X86 psABI and landed the
support in Clang by D130964. Now we can solve the problem by switching
intrinsics to the new type.

Reviewed By: LuoYuanke, RKSimon

Differential Revision: https://reviews.llvm.org/D132329

[AMDGPU] New helper function SIInsertWaitcnts::getVmemWaitEventType

This just commons up and simplifies some logic that was repeated in
SIInsertWaitcnts::updateEventWaitcntAfter. NFCI.

Differential Revision: https://reviews.llvm.org/D136253

[SLP][NFC]Add a test for possible reordering gap in SLP, NFC.

Avoid exporting 80-bit fp functions for architectures other than Intel.

This patch is a partial fix for [[ https://github.com/llvm/llvm-project/issues/56349 | issue ]], due to functions affected by D117473.

Implementation details:
The patch essentially creates a new macro if the architecture is either
intel32 or intel64, since the generate-def.pl cannot process boolean algebra
on macros.

Reviewed By: jlpeyton

Differential Revision: https://reviews.llvm.org/D135795

[analyzer] Make directly bounded LazyCompoundVal as lazily copied

Previously, `LazyCompoundVal` bindings to subregions referred by
`LazyCopoundVals`, were not marked as //lazily copied//.

This change returns `LazyCompoundVals` from `getInterestingValues()`,
so their regions can be marked as //lazily copied// in `RemoveDeadBindingsWorker::VisitBinding()`.

Depends on D134947

Authored by: Tomasz Kamiński <tomasz.kamiński@sonarsource.com>

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D135136

[analyzer] Fix the liveness of Symbols for values in regions referred by LazyCompoundVal

To illustrate our current understanding, let's start with the following program:
https://godbolt.org/z/33f6vheh1
```lang=c++
void clang_analyzer_printState();

struct C {
   int x;
   int y;
   int more_padding;
};

struct D {
   C c;
   int z;
};

C foo(D d, int new_x, int new_y) {
   d.c.x = new_x;       // B1
   assert(d.c.x < 13);  // C1

   C c = d.c;           // L

   assert(d.c.y < 10);  // C2
   assert(d.z < 5);     // C3

   d.c.y = new_y;       // B2

   assert(d.c.y < 10);  // C4

   return c;  // R
}
```
In the code, we create a few bindings to subregions of root region `d` (`B1`, `B2`), a constrain on the values  (`C1`, `C2`, ….), and create a `lazyCompoundVal` for the part of the region `d` at point `L`, which is returned at point `R`.

Now, the question is which of these should remain live as long the return value of the `foo` call is live. In perfect a word we should preserve:

  # only the bindings of the subregions of `d.c`, which were created before the copy at `L`. In our example, this includes `B1`, and not `B2`.  In other words, `new_x` should be live but `new_y` shouldn’t.

  # constraints on the values of `d.c`, that are reachable through `c`. This can be created both before the point of making the copy (`L`) or after. In our case, that would be `C1` and `C2`. But not `C3` (`d.z` value is not reachable through `c`) and `C4` (the original value of`d.c.y` was overridden at `B2` after the creation of `c`).

The current code in the `RegionStore` covers the use case (1), by using the `getInterestingValues()` to extract bindings to parts of the referred region present in the store at the point of copy. This also partially covers point (2), in case when constraints are applied to a location that has binding at the point of the copy (in our case `d.c.x` in `C1` that has value `new_x`), but it fails to preserve the constraints that require creating a new symbol for location (`d.c.y` in `C2`).

We introduce the concept of //lazily copied// locations (regions) to the `SymbolReaper`, i.e. for which a program can access the value stored at that location, but not its address. These locations are constructed as a set of regions referred to by `lazyCompoundVal`. A //readable// location (region) is a location that //live// or //lazily copied// . And symbols that refer to values in regions are alive if the region is //readable//.

For simplicity, we follow the current approach to live regions and mark the base region as //lazily copied//, and consider any subregions as //readable//. This makes some symbols falsy live (`d.z` in our example) and keeps the corresponding constraints alive.

The rename `Regions` to `LiveRegions` inside  `RegionStore` is NFC change, that was done to make it clear, what is difference between regions stored in this two sets.

Regression Test: https://reviews.llvm.org/D134941
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
Reviewed By: martong, xazax.hun

Differential Revision: https://reviews.llvm.org/D134947

[Libomptarget][NFC] clang-format the libomptarget OpenMP tests

Summary:
Recent changes to clang-format improved the handling of OpenMP pragmas.
Clean up the existing libomptarget tests.

[AMDGPU] V_LDEXP_F16 encoding fix and doc update.

The amdgcn.ldexp.* intrinsics take an i32 value as src1.
The V_LDEXP_F16 instruction considers src1 an f16 operand, and therefore
src1 is implicitly truncated to 16 bits when lowering to that instruction from the
intrinsic. This is unlikely to result in an error in practice
because values that large are not useful.

The operand class of src1 in the True16 version of the instruction has
been corrected to encode correctly on GFX11.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D136195

[Verifier] Allow undef/poison token argument to llvm.experimental.gc.result

As part of the optimization in the unreachable code, we remove
tokens, thereby replacing them with undef/poison in intrinsics.
But the verifier falls on the assertion, within of what it sees
token poison in unreachable code, which in turn is incorrect.

bug: 57871, https://github.com/llvm/llvm-project/issues/57871
Differential Revision: https://reviews.llvm.org/D134427

[flang] Fix missing generated includes in out of tree build

875fd9df76ded4a88a3a44b690f290ea98f91705 added a new dialect
with some generated files.

When flang is built out of tree (build llvm/clang/mlir first, then
build flang pointing at the first build) those files were not created
at all.

I don't 100% understand why not but juding by the comment at the top
of the file, add_mlir_interface probably expects to run in an MLIR
directory, as add_mlir_dialect does.

So in the same way, I've just inlined enough of that function to
fix the out of tree build.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D136250

[gn build] Port 8cadac41e9f6

[clang][dataflow] Add equivalence relation `Value` type.

Defines an equivalence relation on the `Value` type to standardize several
places in the code where we replicate the ~same equivalence comparison.

Differential Revision: https://reviews.llvm.org/D135964

[VPlan] Add VPValue::isDefinedOutsideVectorRegions helper (NFC).

@Ayal suggested a better named helper than using `!getDef()` to check if
a value is invariant across all parts.

The property we are using here is that the VPValue is defined outside
any vector loop region. There's a TODO left to handle recipes defined in
pre-header blocks.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133666

[clangd] consider ~^foo() to target the destructor, not the type

This behavior was once deliberate, but i've yet to find someone who likes it.
The reference behavior is unchanged: the `foo` within ~foo is still considered
a reference to the type. This means rename etc still works.

fixes https://github.com/clangd/clangd/issues/179

Differential Revision: https://reviews.llvm.org/D136212

Revert rG42230efccf8fe1185be5fa6c23dce0a8183d6ec9 "[DAG] Fold (sra (or (shl x, c1), (shl y, c2)), c1) -> (sext_inreg (or x, (shl y,c2-c1)) iff c2 >= c1"

@foad was right - this isn't actually going to help with D136042 as much as hoped, we need a better AMDGPU-specific solution as other targets are likely to make use of it

[Attr][Doc] Fix pragma unroll documentation.

There is a contradiction in the #pragma unroll behavior documentation.
It says that specifying `#pragma unroll` without a parameter directs the
loop unroller to attempt to partially unroll the loop if the trip count
is not known at compile time. At the same time later it states that
`#pragma unroll` has identical semantics to `#pragma clang loop unroll(full)`,
which doesn't attempt to unroll partially if the trip count is not known
at compile time.

pragma clang loop unroll(enable):
If unroll(enable) is specified the unroller will attempt to fully unroll
the loop if the trip count is known at compile time. If the fully
unrolled code size is greater than an internal limit the loop will be
partially unrolled up to this limit. If the trip count is not known at
compile time the loop will be partially unrolled with a heuristically
chosen unroll factor.

pragma clang loop unroll(full):
If unroll(full) is specified the unroller will attempt to fully unroll
the loop if the trip count is known at compile time identically to
unroll(enable). However, with unroll(full) the loop will not be unrolled
if the loop count is not known at compile time.

Differential Revision: https://reviews.llvm.org/D136160

[mlir] Add TransposeOp to Linalg structured ops.

RFC: https://discourse.llvm.org/t/rfc-primitive-ops-add-mapop-reductionop-transposeop-broadcastop-to-linalg/64184

Differential Revision: https://reviews.llvm.org/D135854

[SCEV] Replace assert with returning CouldNotComp in computeMaxBECountForLT.

This patch removes the bail out for signed predicates and non-positive
strides in howManyLessThans and updates computeMaxBECountForLT to return
SCEVCouldNotCompute for signed predicates with negative strides.

AFAICT bail-out was only added because computeMaxBECountForLT may not
handle negative signed strides correctly. Instead of not calling
computeMaxBECountForLT at all because we bail out earlier, we can
instead return SCEVCouldNotCompute in computeMaxBECountForLT.

The max backedge taken count will be computed as the max value of the
symbolic backedge taken count.

This improves precision in cases where we can compute symbolic backedge
taken counts and also fixes a crash.

Fixes #57818.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D135667

[AggressiveInstCombine] Load merge the reverse load pattern of consecutive loads.

This patch extends the load merge/widen in AggressiveInstCombine() to handle reverse load patterns.

Differential Revision: https://reviews.llvm.org/D135137

Keep configuration file search directories in ExpansionContext. NFC

Class ExpansionContext encapsulates options for search and expansion of
response files, including configuration files. With this change the
directories which are searched for configuration files are also stored
in ExpansionContext.

Differential Revision: https://reviews.llvm.org/D135439

[DAG] Fold (sra (or (shl x, c1), (shl y, c2)), c1) -> (sext_inreg (or x, (shl y,c2-c1)) iff c2 >= c1

Helps with some of the AMDGPU regressions identified in D136042 where we were losing signed BFE patterns after sinking shifts behind logic ops.

Differential Revision: https://reviews.llvm.org/D136081

[AMDGPU] Assume getDefIgnoringCopies will succeed. NFC.

getDefIgnoringCopies and getSrcRegIgnoringCopies should not fail on
valid MIR, so don't bother to check for failure.

Differential Revision: https://reviews.llvm.org/D136238

[mlir][llvm] Ordered traversal in LLVM IR import.

The revision performs a topological sort of the blocks to
ensure the operations are processed in dominance order.
After the change, we do not need to introduce dummy
instructions if an operand has not yet been processed.
Additionally, the revision also moves and simplifies the
control-flow related tests to a separate test file.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D136230

[AArch64] Replace sme-i64 by sme-i16i64 and sme-f64 by sme-f64f64

The names in developer.arm for these SME features are:
HaveSMEI16I64 and HaveSMEF64F64
so the new flag names are consistent with the documentation page

Reviewed By: sdesmalen, c-rhodes

Differential Revision: https://reviews.llvm.org/D135974

[AMDGPU] Add test case for a VOPD s_delay_alu insertion bug

[AMDGPU][Backend] Fix user-after-free in AMDGPUReleaseVGPRs::isLastVGPRUseVMEMStore

Reviewed By: jpages, arsenm

Differential Revision: https://reviews.llvm.org/D134641

[libc++] Remove std::function in C++03

We've said that we'll remove `std::function` from C++03 in LLVM 16, so we might as well do it now before we forget.

Reviewed By: ldionne, #libc, Mordante

Spies: jloser, Mordante, libcxx-commits

Differential Revision: https://reviews.llvm.org/D135868

[flang] Add fir.declare operation

Add fir.declare operation whose purpose was described in https://reviews.llvm.org/D134285.
It uses the FortranVariableInterfaceOp for most of its logic (including the verifier).
The rational is that all these aspects/logic will be shared by hlfir.designate and
hlfir.associate.

Its codegen and lowering will be added in later patches.

Differential Revision: https://reviews.llvm.org/D136181

[AA] Rename getModRefBehavior() to getMemoryEffects() (NFC)

Follow up on D135962, renaming the method name to match the new
type name.

[AA] Rename uses of FunctionModRefBehavior (NFC)

Followup to D135962 to rename remaining uses of
FunctionModRefBehavior to MemoryEffects. Does not touch API names
yet, but also updates variables names FMRB/MRB to ME, to match the
new type name.

[AA] Rename FunctionModRefBehavior to MemoryEffects (NFC)

As part of https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579,
the FunctionModRefBehavior class sees a good bit of additional use,
and I've found the name to be something of a mouthful. This patch
renames it to MemoryEffects, which has a couple of advantages over
the old name:
* It is more concise.
* It decouples it from modelling only functions.
* It matches the terminology of the aforementioned RFC.
* The meaning should be more obvious to people not familiar with
our particular AA lingo.

This patch just updates the class definition. Other uses of the
name will be updated separately.

Differential Revision: https://reviews.llvm.org/D135962

[RISCV] Enable the LocalStackSlotAllocation pass support

For RISC-V, load/store(exclude vector load/store) instructions only
has a 12 bit immediate operand. If the offset is out-of-range, it
must make use of a temp register to make up this offset. If between
these offsets, they have a small(IsInt<12>) relative offset,
LocalStackSlotAllocation pass can find a value as frame base register's
value, and replace the origin offset with this register's value plus
the relative offset.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98101

[flang][NFC] Fix printed name from proc_nopass_p2

[lldb][trace] Fix some minor bugs in the call tree

- We weren't truncating the output files
- We weren't considering the case in which we couldn't disassembly an
instruction.

[flang] Add fir.dispatch code generation

fir.dispatch code generation uses the binding table stored in the
type descriptor. There is no runtime call involved. The binding table
is always build from the parent type so the index of a specific binding
is the same in the parent derived-type or in the extended type.

Follow-up patches will deal cases not present here such as allocatable
polymorphic entities or pointers.

Reviewed By: jeanPerier, PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D136189

[include-cleaner] Fix link errors when -DBUILD_SHARED_LIBS=ON

Fixed ppc buildbot https://lab.llvm.org/buildbot/#/builders/121/builds/24273 which is using `-DBUILD_SHARED_LIBS=ON`.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D136229

[flang] Introduce FortranVariableOpInterface for ops creating variable

HLFIR will rely on certain operations to create SSA memory values
that correspond to a Fortran variable. They will hold bounds and type
parameters information as well as metadata (like Fortran attributes).

This patch adds an interface that for such operations so that Fortran
variable can be stored, manipulated, and queried regardless of what
created them. This is so far intended for fir.declare, hlfir.designate
and hlfir.associate operations.
It is added to FIR and not HLFIR because fir.declare needs it and it
does not itself needs any HLFIR concepts.

Unit tests for the interface methods will be added alongside
fir.declare in the next patch.

Differential Revision: https://reviews.llvm.org/D136151

[mlir][spirv] Consider target when converting one-element vector

Vectors with just one element will be converted into scalars.
However, we cannot just return the element types and assume it
is supported in the target environment; we need to conver the
element type again factoring in those considerations.

Reviewed By: kuhar

Differential Revision: https://reviews.llvm.org/D136226