review.tizen.org Git - platform/upstream/llvm.git/log

[IR] Allow typed pointers to be used in vector types

Reviewed By: nikic, jcranmer-intel

Differential Revision: https://reviews.llvm.org/D136768

[NFC] Remove unused variables

Community calendar: more clearly document how to add events

[libc++abi][AIX] Use reserved slot in stack to pass the address of exception object

Summary:
The existing implementation of the personality for legacy IBM xlclang++ compiler generated code passes the address of exception object in r14 for the landing pad to retrieve with a call to __xlc_exception_handle(). This clobbers the content of r14 in user code (and potentially, when running cleanup actions, the address of another exception object being passed). This patch changes to use the stack slot reserved for compilers to pass the address. It has been confirmed that xlclang++-generated code does not use this slot.

Reviewed by: hubert.reinterpretcast, cebowleratibm

[mlir] Add `parseSymbolName` that doesn't take an attribute list

This patch adds a version of `parseSymbolName` and
`parseOptionalSymbolName` to AsmParser that don't take an attribute name
and attribute list.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D136696

[flang] Carry dynamic type when emboxing polymorphic pointer

In order to be passed as the passed-object in dynamic dispatch,
polymorphic pointer entities are emboxed. In this process, the dynamic
type must be preserved and passed to fir.embox as the tdesc operand. This
patch introduces a new ExtendedValue that carries over the
dynamic type when the value is unboxed.

Depends on D136820

Reviewed By: PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D136824

[flang] Lower allocate for polymorphic pointer

Lowering of the allocate statement for polymorphic pointers is a bit
different than for allocatables. A call to `PointerNullifyDerived`
runtime function is done instead of `AllocatableInitDerived`.

Reviewed By: PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D136820

[mlir][spirv] Add target control to UnifyAliasedResourcePass

The UnifyAliasedResourcePass is actually only necessary for
targeting Apple GPUs via MoltenVK, where we need to translate
SPIR-V into MSL. The translation has limitations--no support
of aliased resources. So introduce a control to disable
this pass when targeting other platforms.

Reviewed By: kuhar

Differential Revision: https://reviews.llvm.org/D136869

[SLP]Improve analysis of same/alternate code ops and scheduling.

Should improve compile time for analysis and vectorization.

Metric: SLP.NumVectorInstructions

Program                                                                                       SLP.NumVectorInstructions
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  6380.00                   6378.00 -0.0%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test   6380.00                   6378.00 -0.0%
test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test           2023.00                   2022.00 -0.0%
test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test               148.00                    146.00 -1.4%

Generated more vector instructions.

Differential Revision: https://reviews.llvm.org/D127531

Fix whitespace introduced by 891aaff9a8a9997582eac1bb1edb8d4b4e117ef1

[AArch64][SVE2] Add the SVE2.1 pext and ptrue predicate-as-counter instructions

This patch adds the assembly/disassembly for the following instructions:

pext (predicate) : Set predicate from predicate-as-counter
ptrue (predicate-as-counter) : Initialise predicate-as-counter to all active

This patch also introduces the predicate-as-counter registers pn8, etc.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09

Differential Revision: https://reviews.llvm.org/D136678

[mlir][sparse] add a cursor to sparse storage scheme

This prepares a subsequent revision that will generalize
the insertion code generation. Similar to the support lib,
insertions become much easier to perform with some "cursor"
bookkeeping. Note that we, in the long run, could perhaps
avoid storing the "cursor" permanently and use some
restricted-scope solution (alloca?) instead. However,
that puts harder restrictions on insertion-chain operations,
so for now we follow the more straightforward approach.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D136800

[OpenMP][libomptarget] New plugin infrastructure and new CUDA plugin

This patch adds a new infrastructure for OpenMP target plugins. It also implements the CUDA and GenericELF64bit plugins under this new infrastructure. We place the sources in a separate directory named plugins-nextgen, and we build the new plugins as different plugin libraries. The original plugins, which remain untouched, will be used by default. However, the user can change this behavior at run-time through the boolean envar LIBOMPTARGET_NEXTGEN_PLUGINS. If enabled, the libomptarget will try to load the NextGen version of each plugin, falling back to the original if they are not present or valid.

The idea of this new plugin infrastructure is to implement the common parts of target plugins in generic classes (defined in files inside the plugins-nextgen/common/PluginInterface folder), and then have each specific plugin define its own classes inheriting from the common ones. In this way, most logic remains in the common interface, reducing the plugin-specific source code. It is also beneficial in that most code and behavior are now the same across the different plugins. As an example, we define classes for a plugin, a device, a device image, a stream manager, etc. The plugin object (a single instance per plugin library) holds different device objects (i.e., one per available device), while each device object is responsible for managing its own resources.
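
The layering described above could look roughly like the following sketch. All class names here (`DeviceBase`, `PluginBase`, `HostDevice`) are hypothetical and are not the classes introduced by the patch; the sketch only illustrates shared logic living in base classes while a concrete plugin overrides device-specific hooks.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical names throughout; not the classes from the patch.

// Common device logic lives in the base class. A concrete plugin only
// overrides the device-specific hooks.
class DeviceBase {
public:
  virtual ~DeviceBase() = default;

  // Shared behavior: every device tracks its own live allocations.
  void *allocate(std::size_t Size) {
    void *Ptr = allocateImpl(Size);
    if (Ptr)
      ++LiveAllocations;
    return Ptr;
  }

  void release(void *Ptr) {
    if (!Ptr)
      return;
    releaseImpl(Ptr);
    --LiveAllocations;
  }

  std::size_t liveAllocations() const { return LiveAllocations; }

protected:
  virtual void *allocateImpl(std::size_t Size) = 0;
  virtual void releaseImpl(void *Ptr) = 0;

private:
  std::size_t LiveAllocations = 0;
};

// The plugin object owns one device object per available device.
class PluginBase {
public:
  virtual ~PluginBase() = default;

  void addDevice(std::unique_ptr<DeviceBase> Device) {
    Devices.push_back(std::move(Device));
  }

  std::size_t numDevices() const { return Devices.size(); }

  DeviceBase &device(std::size_t Id) {
    assert(Id < Devices.size() && "device id out of range");
    return *Devices[Id];
  }

private:
  std::vector<std::unique_ptr<DeviceBase>> Devices;
};

// A toy "device" backed by host memory, standing in for a real target.
class HostDevice : public DeviceBase {
protected:
  void *allocateImpl(std::size_t Size) override { return ::operator new(Size); }
  void releaseImpl(void *Ptr) override { ::operator delete(Ptr); }
};
```

In this sketch the bookkeeping in `allocate`/`release` is written once and reused by every device type, which is the kind of sharing the commit message describes.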

Most code on this patch is based on the changes made by @jdoerfert (Johannes Doerfert)

Reviewed By: jhuber6, jdoerfert

Differential Revision: https://reviews.llvm.org/D134396

[clang] Implement -fstrict-flex-arrays=3

The -fstrict-flex-arrays=3 is the most restrictive type of flex arrays.
No number, including 0, is allowed in the FAM. In the cases where a "0"
is used, the resulting size is the same as if a zero-sized object were
substituted.

This is needed for proper _FORTIFY_SOURCE coverage in the Linux kernel,
among other reasons. So while the only reason for specifying a
zero-length array at the end of a structure is to specify a FAM,
treating it as such will cause _FORTIFY_SOURCE not to work correctly;
__builtin_object_size will report -1 instead of 0 for a destination
buffer size to keep any kernel internals from using the deprecated
members as fake FAMs.

For example:

  struct broken {
      int foo;
      int fake_fam[0];
      struct something oops;
  };

There have been bugs where the above struct was created because "oops"
was added after "fake_fam" by someone not realizing the problem. Under
_FORTIFY_SOURCE, doing:

  memcpy(p->fake_fam, src, len);

raises no warnings when __builtin_object_size(p->fake_fam, 1) returns -1
and may stomp on "oops."

Omitting a warning when using the (invalid) zero-length array is how GCC
treats -fstrict-flex-arrays=3. A warning in that situation is likely an
irritant, because requesting this option level is explicitly requesting
this behavior.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836
Differential Revision: https://reviews.llvm.org/D134902

[mlir][sparse] fix crash when sparsifying broadcast operations.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D136866

[RISCV] Fix an obvious CSE opportunity in LSR test case. NFC

[mlir][CAPI] Allow specifying pass manager anchor

This adds a new function for creating pass managers that takes an
argument for the anchor string.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D136404

[ObjectYAML] Add support for DXContainer HASH

DXContainer files contain a part that has an MD5 of the generated
shader. This adds support to the ObjectYAML tooling to expand the hash
part data and hash itself in preparation for adding hashing support to
DirectX code generation.

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D136632

[libc] add fgets

This adds the fgets function and its unit tests.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D136785

[LSR] Drop LSR solution if it is less profitable than baseline

LSR may suggest a less profitable transformation for the loop. This
patch adds a check to prevent LSR from generating worse code than what
we already have.

Since LSR affects nearly all targets, the patch is guarded by the
option 'lsr-drop-solution', which is disabled by default for now.

The next step should be extending a TTI interface to allow target(s)
to enable this enhancement.

A debug log is added to inform the user when the LSR solution is
skipped.

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D126043

[mlir][python] Include pipeline parse errors in exception message

Currently any errors during pipeline parsing are reported to stderr.
This adds a new pipeline parsing function to the C api that reports
errors through a callback, and updates the python bindings to use it.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D136402

[llvm-readelf] --section-details: display SHF_COMPRESSED headers

readelf --section-details displays ch_type/ch_size/ch_addralign for
a SHF_COMPRESSED section. Port the feature. There is a small difference
that readelf doesn't display `[<corrupt>]` for an empty section while
we do.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D136636

[mlir] Fix asan issue in Vectorization.cpp of Linalg.

Differential Revision: https://reviews.llvm.org/D136852

[NFC][PhaseOrdering] Add one more test for SROA after partial unroll

https://reviews.llvm.org/D136806

[LegalizeVectorOps][X86][RISCV] Expand vector S/USHLSAT instead of unrolling.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D136478

[lldb][test] Remove explicit mydir definitions (NFC)

[RISCV] Drop single letter b extension support

It was split into several zb* extensions, and `b` was dropped after
0.93, so it is time to retire it like the other non-ratified zb* extensions.

Currently clang accepts it with a warning:

$ clang -target riscv64-elf ~/hello.c -S -march=rv64gcb
'+b' is not a recognized feature for this target (ignoring feature)
'+b' is not a recognized feature for this target (ignoring feature)
'+b' is not a recognized feature for this target (ignoring feature)

Reviewed By: asb, luismarques

Differential Revision: https://reviews.llvm.org/D136812

[lldb][test] Remove empty setUp/tearDown methods (NFC)

[FuncSpec] Do not overestimate the specialization bonus for users inside loops.

When calculating the specialization bonus for a given function argument,
we recursively traverse the chain of (certain) users, accumulating the
instruction costs. Then we exponentially increase the bonus to account
for loop nests. This is problematic for two reasons: (a) the users might
not themselves be inside the loop nest, (b) if they are we are accounting
for it multiple times. Instead we should be adjusting the bonus before
traversing the user chain.

This reduces the instruction count for CTMark (newPM-O3) when Function
Specialization is enabled without actually reducing the number of
specializations performed (geomean: -0.001% non-LTO, -0.406% LTO).

Differential Revision: https://reviews.llvm.org/D136692

[InstCombine] improve demanded bits for Sub operand 0

This is copying the code that was added for 'add' with D130075.
(That patch removed a fallthrough in the cases, but we can
probably still share at least some code again as a follow-up
cleanup, but I didn't want to risk it here.)

The reasoning is similar to the carry propagation for 'add':
if we don't demand low bits of the subtraction and the
subtrahend (aka RHS or operand 1) is known zero in those low
bits, then there can't be any borrowing required from the
higher bits of operand 0, so the low bits don't matter.
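
The borrow argument above can be checked exhaustively on 8-bit values. The following is only an illustrative sketch, not code from the patch; the function name and its parameters are made up for this example.

```cpp
#include <cassert>

// Exhaustive 8-bit check of the claim: if the low K bits of the result are
// not demanded, clearing the low K bits of operand 0 (A) does not change the
// remaining high bits of A - B.
//
// When RequireKnownZero is true, only subtrahends B whose low K bits are zero
// are considered, matching the precondition in the commit message.
bool highBitsUnaffectedByLowBitsOfOp0(unsigned K, bool RequireKnownZero) {
  assert(K <= 8 && "this sketch only models 8-bit values");
  const unsigned LowMask = (1u << K) - 1u;
  for (unsigned A = 0; A < 256; ++A) {
    for (unsigned B = 0; B < 256; ++B) {
      if (RequireKnownZero && (B & LowMask) != 0)
        continue;
      // 8-bit wrapping subtraction, with and without the low bits of A.
      const unsigned Full = (A - B) & 0xFFu;
      const unsigned Cleared = ((A & ~LowMask) - B) & 0xFFu;
      if ((Full >> K) != (Cleared >> K))
        return false;
    }
  }
  return true;
}
```

With the known-zero precondition the check passes for every K from 0 to 8. Without it, a borrow out of the low bits is possible (for example A = 1, B = 1 with K = 1), so the check fails.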

Also, the no-wrap flags can be propagated (and I think that
should be true for add too).

Here's an attempt to prove that in Alive2:
https://alive2.llvm.org/ce/z/xqh7Pa
(can add nsw or nuw to src and tgt, and it should still pass)

Differential Revision: https://reviews.llvm.org/D136788

[P0857R0 Part-B] Allow `requires` clauses appearing in template-template parameters

Although this affects whether a template can be
used as an argument for another template, the constraint seems not to
be checked, nor do other major implementations (GCC, MSVC, et al.) check it.

Additionally, Part-A of the document seems to have been implemented.
So mark P0857R0 as completed.

Differential Revision: https://reviews.llvm.org/D134128

[LoongArch] Add codegen support for cmpxchg on LA64

Differential Revision: https://reviews.llvm.org/D135948

[MachineCSE] Allow PRE of instructions that read physical registers

Currently MachineCSE forbids PRE when the instruction reads a physical
register. Relax this so that it's allowed when the value being read is
the same as what would be read in the place the instruction would be
hoisted to.

This is being done in preparation for adding FPCR handling to the
AArch64 backend, in order to prevent it from worsening the
generated code, but for targets that already have a similar register
it should improve things.

This patch affects code generation in several tests. The new code
looks better except for in Thumb2/LowOverheadLoops/memcall.ll where
we perform PRE but the LowOverheadLoops transformation then undoes
it. Also in AMDGPU/selectcc-opt.ll the CHECK makes things look worse,
but actually the function as a whole is better (as a MOV is PRE'd).

Differential Revision: https://reviews.llvm.org/D136675

[gn build] Port b51b90d6e25c

[gn build] Port 17059753f133

[gn build] semi-automatically port 4f06d46f465c (LogicalView input files)

Revert "Harmonize cmake_policy() across standalone builds of all projects"

This reverts commit 88d7508dc479210f07abccb17f0194b66264b125.
It's reported to break builds when symlinking other projects inside
the `tools` directory.

[intelpt] Update Python tests to account for new errors

Update the Python tests (i.e., tests run via `lldb-dotest -p TestTrace`) to
handle the new errors introduced in D136610.

Test Plan:
`lldb-dotest -p TestTrace`

Differential Revision: https://reviews.llvm.org/D136801

[llvm-debuginfo-analyzer] (08/09) - ELF Reader

The fix for the unittest case introduced a dependency on the
MC library causing a failure in:

  https://lab.llvm.org/buildbot/#/builders/121/builds/24567
  clang-ppc64le-multistage/stage1
  undefined reference to symbol 'llvm::TargetRegistry::lookupTarget'

Added:
- MC to the LLVM_LINK_COMPONENTS list.

Reviewed By: jryans

Differential Revision: https://reviews.llvm.org/D136837

[mlir] Fix printing when linalg.map has no inputs.

Differential Revision: https://reviews.llvm.org/D136836

Recommit: [FuncSpec] Fix specialisation based on literals

[fixed test to work with reverse iteration]

The `FunctionSpecialization` pass has support for specialising
functions, which are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant`. There are a few
issues with the implementation, though:

* even with the default, the pass will still specialise based on
   floating-point literals

* even when it's enabled, the pass will specialise only for the `i1`
    type (or `i2` if all of the possible 4 values occur, or `i3` if all
    of the possible 8 values occur, etc)

The reason for this is an incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
the reason for the specialisation to trigger. However, if the set of
the possible arguments is not the full set, that must not prevent the
specialisation.

This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:

* unknown or undef
* a constant
* a constant range with a single element

on the basis that specialisation is pointless for those cases.

It also changes the criteria for picking an actual argument to
specialise on; the argument is picked if it:

* is an LLVM IR constant
* has a `constant` lattice value
* has a `constantrange` lattice value with a single element.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D135893

Change-Id: Iea273423176082ec51339aa66a5fe9fea83557ee

Harmonize cmake_policy() across standalone builds of all projects

Move `cmake_policy()` settings from `llvm/CMakeLists.txt` into a shared
`cmake/modules/CMakePolicy.cmake`. Include it from all relevant
projects that support standalone builds, in order to ensure that
the policies are consistently set whether they are built in-tree
or stand-alone.

Differential Revision: https://reviews.llvm.org/D136572

[mlir] Fix `AffineMap.dropResults`.

`AffineMap.dropResult` erases one result from the array, which changes the indexing of the remaining results. Calling `dropResult` in a loop with increasing indexes therefore does not produce the desired result.
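
The index-shifting problem can be reproduced on a plain `std::vector`. This is a sketch only: `dropResult` here is a stand-in on a vector of integers, not the MLIR method, and dropping in descending order is one possible remedy, not necessarily the one used in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for a single-result drop: removes the element at Pos, so every
// later element moves down by one index.
std::vector<int> dropResult(std::vector<int> Results, std::size_t Pos) {
  assert(Pos < Results.size() && "position out of range");
  Results.erase(Results.begin() + static_cast<std::ptrdiff_t>(Pos));
  return Results;
}

// Problematic variant: positions refer to the original vector, but after the
// first erase they no longer point at the intended elements.
std::vector<int> dropResultsAscending(std::vector<int> Results,
                                      const std::vector<std::size_t> &Positions) {
  for (std::size_t Pos : Positions)
    Results = dropResult(std::move(Results), Pos);
  return Results;
}

// One possible fix: drop from the highest position down, so earlier erases
// never shift the positions still to be processed.
std::vector<int> dropResultsDescending(std::vector<int> Results,
                                       std::vector<std::size_t> Positions) {
  std::sort(Positions.begin(), Positions.end(), std::greater<std::size_t>());
  for (std::size_t Pos : Positions)
    Results = dropResult(std::move(Results), Pos);
  return Results;
}
```

For results `{10, 20, 30, 40}` and positions `{0, 2}`, the ascending loop yields `{20, 30}` (it removes 10 and then 40), while the descending loop yields the intended `{20, 40}`.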

Differential Revision: https://reviews.llvm.org/D136833

Fix iterator corruption in splitBasicBlockBefore

We should not delete block predecessors (via replacing successors
of terminators) while iterating them, otherwise we may skip some
of them. Instead, save predecessors to a separate vector and iterate
over it.
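
The snapshot pattern described above can be sketched on a toy CFG. `Block`, `replaceSuccessor`, and `redirectPredecessors` are hypothetical names for illustration and are not LLVM APIs; the sketch also assumes at most one edge between any pair of blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy CFG node with explicit successor and predecessor lists.
struct Block {
  std::vector<Block *> Succs;
  std::vector<Block *> Preds;
};

// Rewrites the edge From -> Old into From -> New and updates both
// predecessor lists. Note that this mutates Old.Preds.
void replaceSuccessor(Block &From, Block &Old, Block &New) {
  std::replace(From.Succs.begin(), From.Succs.end(), &Old, &New);
  Old.Preds.erase(std::remove(Old.Preds.begin(), Old.Preds.end(), &From),
                  Old.Preds.end());
  New.Preds.push_back(&From);
}

// Redirects every predecessor of Old to New. Iterating Old.Preds directly
// would be unsafe because replaceSuccessor erases from it, so the
// predecessors are first copied into a separate vector.
void redirectPredecessors(Block &Old, Block &New) {
  assert(&Old != &New && "cannot redirect a block to itself");
  const std::vector<Block *> Snapshot = Old.Preds;
  for (Block *Pred : Snapshot)
    replaceSuccessor(*Pred, Old, New);
}
```

Because the loop walks the copy, every predecessor is visited exactly once even though the original list shrinks during the loop.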

[FunctionAttrs] Add additional tests with operand bundles (NFC)

[mlir][tensor][bufferize] Support memory_space for tensor.pad

This change adds memory space support to tensor.pad. (tensor.generate and tensor.from_elements do not support memory spaces yet.)

The memory space is inferred from the buffer of the source tensor.

Instead of lowering tensor.pad to tensor.generate + tensor.insert_slice, it is now lowered to bufferization.alloc_tensor (with the correct memory space) + linalg.map + tensor.insert_slice.

Memory space support for the remaining two tensor ops is left for a later point, as this requires some more design discussions.

Differential Revision: https://reviews.llvm.org/D136265

Fix buildbot fail

[mlir][tensor] Fix build: Add missing line break to test case

This should have been part of D136767.

[mlir][tensor][bufferize] Lower tensor.generate to linalg.map

There is no memref equivalent of tensor.generate. The purpose of this change is to avoid creating scf.parallel loops during bufferization.

Differential Revision: https://reviews.llvm.org/D136767

[BasicAA] Remove redundant libcall handling

The writeonly attribute for memset_pattern16 (and other referenced
libcalls) is being added by InferFunctionAttrs nowadays. No need
to special-case it here.

[clang] Do not hide base member using-decls with different template head.

Fixes: https://github.com/llvm/llvm-project/issues/50886

**Adding requires clause to template head** or **constraining the template parameter type** is ineffective because, even though it creates a non-equivalent template head [temp.over.link#6](https://eel.is/c++draft/temp.over.link#6) and hence eligible for overload resolution, `Derived::foo` still [hides any previous using decl](https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaOverload.cpp#L1283-L1301,).
Clang diverges from GCC here, which can be seen more clearly in this example:
```
struct base {
  template <int N, int M>
  int foo() { return 1; };
};

struct bar : public base {
  using base::foo;
  template <int N>
  int foo() { return 2; };
};

int main() {
  bar f;
  f.foo<10, 10>(); // clang previously errored while GCC does not.
}
```
https://godbolt.org/z/v5hnh6czq. We see that `bar::foo` hides `base::foo` because it only differs in the head.

Adding a trailing `requires` to the definition was a nice find. In this case, clang considers them [overloads](https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaOverload.cpp#L1148-L1152) because of a [mismatching requires clause](https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaOverload.cpp#L1390-L1405). So both of them make it to overload resolution (where the constrained `Derived::foo` is then rejected).

---

In this patch, we do not ignore matching the template head (template parameters, type constraints and trailing requires) while considering whether the using decl of a base member should be hidden. The return type of a templated function is still not considered, as different return types would create ambiguous candidates.

The changed tests look reasonable and also match GCC behaviour: https://godbolt.org/z/8KqPEThrY

Note: We are now able to create an ambiguity in the case where both base member and derived member specialisations satisfy the constraints (when the constraints are not the same). Ideally a using-decl should not create ambiguity. I plan to fix this later if it gathers more attention.

Reviewed By: ilya-biryukov, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D136440

[mlir] Fix circular dialect initialization

This change fixes a bug where a dialect is initialized multiple times. This triggers an assertion when the ops of the dialect are registered (`error: operation named ... is already registered`).

This bug can be triggered as follows:

1. Dialect A depends on dialect B (as per ADialect.td).

2. Somewhere there is an extension of dialect B that depends on dialect A (e.g., it defines external models that create ops from dialect A). E.g.:
```
registry.addExtension(+[](MLIRContext *ctx, BDialect *dialect) {
  BDialectOp::attachInterface ...
  ctx->loadDialect<ADialect>();
});
```

3. When dialect A is loaded, its `initialize` function is called twice:

```
     ADialect::ADialect()
        |     |
        |     v
        |   ADialect::initialize()
        v
     getOrLoadDialect<BDialect>()
        |
        v
     (load extension of BDialect)
        |
        v
     ctx->loadDialect<ADialect>()  // user wrote this in the extension
        |
        v
     getOrLoadDialect<ADialect>()  // the dialect is not "fully" loaded yet
        |
        v
     ADialect::ADialect()
        |
        v
     ADialect::initialize()
```

An example of a dialect extension that depends on other dialects is `Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp`. That particular dialect extension does not trigger this bug. (It would trigger this bug if the SCF dialect would depend on the Tensor dialect.)

This change introduces a new dialect state: dialects that are currently being loaded. Same as dialects that were already fully loaded (and initialized), dialects that are in the process of being loaded are not loaded a second time.
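
The effect of the additional "currently being loaded" state can be sketched with a toy registry. `ToyContext` and its members are hypothetical and do not reflect the MLIR implementation; dependencies stand in for both the declared dialect dependency and the `loadDialect` call made from an extension.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model of re-entrant loading with a "Loading" state.
class ToyContext {
  enum class State { Loading, Loaded };

public:
  // Records that loading Dialect triggers a load of Dep.
  void addDependency(const std::string &Dialect, const std::string &Dep) {
    Deps[Dialect].push_back(Dep);
  }

  void load(const std::string &Name) {
    // A dialect that is fully loaded OR still in the process of being loaded
    // is not loaded a second time.
    if (States.count(Name))
      return;
    States[Name] = State::Loading;

    // Copy the list so recursive calls cannot disturb the iteration.
    const std::vector<std::string> ToLoad = Deps[Name];
    for (const std::string &Dep : ToLoad)
      load(Dep);

    assert(States[Name] == State::Loading && "state changed during loading");
    ++InitCalls[Name]; // stands in for the dialect's initialize()
    States[Name] = State::Loaded;
  }

  // Number of times the dialect's initialization ran.
  int initCalls(const std::string &Name) const {
    auto It = InitCalls.find(Name);
    return It == InitCalls.end() ? 0 : It->second;
  }

private:
  std::map<std::string, State> States;
  std::map<std::string, std::vector<std::string>> Deps;
  std::map<std::string, int> InitCalls;
};
```

With a circular setup (A triggers B, and B triggers A), loading A initializes each dialect exactly once, because the nested request for A finds it already in the `Loading` state and returns.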

Differential Revision: https://reviews.llvm.org/D136685

[X86][1/2] SUPPORT RAO-INT

For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Initially authored by Liu Chen (@LiuChen3)

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D135951

[mlir][vector][bufferize] Implement DestinationStyleOpInterface on TransferWriteOp

This simplifies the BufferizableOpInterface implementation of vector.transfer_write.

Differential Revision: https://reviews.llvm.org/D136348

[llvm-debuginfo-analyzer] (08/09) - ELF Reader

The unittest and test cases are platform dependent (x86_64)
causing failures in:

  https://lab.llvm.org/buildbot/#/builders/245/builds/146
  https://lab.llvm.org/buildbot/#/builders/188/builds/21397
  No available targets are compatible with triple "x86_64-unknown-unknown".

Added:
- ';REQUIRES: x86-registered-target' to the LIT tests.
- Code to check if the target 'Triple::x86_64' is supported to
  the unittest case.

Pre-commit test case for D136784

This is a pre-commit for the fix in D136784.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D136783

[LSR][RISCV] Pre-commit test case for D126043

Pre-commit test case for D126043

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D134823

[mlir][bufferize] Provide default BufferizableOpInterface impl for destination style ops

tensor.insert and tensor.insert_slice (as destination style ops) no longer need to implement the entire BufferizableOpInterface.

Differential Revision: https://reviews.llvm.org/D136347

Fix a -Wunused-const-variable warning.

[LLVM] Use DWARFv4 bitfields when tuning for GDB

GDB implemented data_bit_offset in https://sourceware.org/bugzilla/show_bug.cgi?id=12616
which has been present since GDB 8.0.

GCC started using it at GCC 11.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D135583

[RISCV] Map pseudos to their BaseInstr to reduce cases

There are a lot of cases for pseudos of the same instruction; here
we use the existing mapping table to map pseudos to real instructions
to reduce the number of cases.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D128271

[IntervalMap] Add move and copy ctors and assignment operators

And update the unittest.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D136242

[NFC] [Modules] Rename modules related things in Preprocessor and AffectingModules

Rename module-related things according to the consensus in
https://discourse.llvm.org/t/rfc-unifying-the-terminology-about-modules-in-clang/66054/
to reduce further confusion.

This only renames the things I am sure about. It doesn't mean all the names
in Preprocessor are correct now.

Revert D136595 "[libc] Switch to new implementation of mem* functions"

This patch seems to introduce bugs on aarch64.
Reverting while we investigate the root cause.

This reverts commit 02841488138160f9064f334a833d4bf3e80385c6.

[PowerPC] Fix check for ieeelongdouble support

Clang detects the GCC version from the libdir. However, modern
GCC versions only include the major version in the libdir
(something like lib/gcc/powerpc64le-linux-gnu/12/), not all
version components. For this reason, even though the system has
a supported libstdcxx, it will still fail the check against the
12.1.0 version requirement.

Fix this by doing the same thing we do for patch versions: Assume
that a missing minor version is larger than any specific version.
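
The comparison rule can be sketched as follows. `ToolchainVersion` and `isOlderThan` are hypothetical names used only for this illustration (not clang's actual types), with `-1` marking a component that could not be parsed from the libdir.

```cpp
#include <cassert>

// Hypothetical version record; -1 marks a missing component.
struct ToolchainVersion {
  int Major;
  int Minor;
  int Patch;

  explicit ToolchainVersion(int Major, int Minor = -1, int Patch = -1)
      : Major(Major), Minor(Minor), Patch(Patch) {}
};

// Returns true if V is older than the fully specified required version.
// A missing minor or patch component is treated as larger than any specific
// value, so "12" is not considered older than "12.1.0".
bool isOlderThan(const ToolchainVersion &V, int Major, int Minor, int Patch) {
  assert(Major >= 0 && Minor >= 0 && Patch >= 0 &&
         "the required version must be fully specified");
  if (V.Major != Major)
    return V.Major < Major;
  if (V.Minor == -1)
    return false; // missing minor: assume newer than any specific minor
  if (V.Minor != Minor)
    return V.Minor < Minor;
  if (V.Patch == -1)
    return false; // missing patch: same assumption
  return V.Patch < Patch;
}
```

Under this rule a libdir such as `lib/gcc/powerpc64le-linux-gnu/12/` (major version only) passes a 12.1.0 requirement, while 11 or 12.0.5 would still be rejected.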

To allow this to be tested, we need to fix two additional issues:
First, the GCC toolchain directories used for testing need to
contain a crtbegin.o file to be properly detected. The existing
tests actually ended up using a 0.0.0 version, rather than the intended
one. Second, we also need to satisfy the glibc version check based
on the dynamic linker. To do so, respect the --dyld-prefix argument
and add the necessary file to the test toolchain directory.

Differential Revision: https://reviews.llvm.org/D136258

Revert "Revert "[mlir] Add vectorization tests for linalg.map,reduce,transpose.""

This reverts commit c34de60ea3a0f58ac8d21d6176356161ba5cc875.

[mlir][tensor] Implement DestinationStyleOpInterface for tensor.insert/insert_slice

Also allow unranked tensors/memrefs with destination style op outputs.

This allows for a simpler implementation of the BufferizableOpInterface (in a subsequent commit).

Differential Revision: https://reviews.llvm.org/D136346

[BasicAA] Replace VisitedPhiBBs with a single flag

When looking through phis, BasicAA has to guard against the
possibility that values from two separate cycle iterations are
being compared -- in this case, even though the SSA values may
be the same, they cannot be considered as equal.

This is currently done by keeping a set of VisitedPhiBBs for any
phis we looked through, and then checking whether the relevant
instruction is reachable from one of the phis.

This patch replaces this set with a single flag. If the flag is
set, then we will not assume equality for any instruction part
of a cycle. While this is nominally less accurate, it makes
essentially no difference in practice. Here are the AA stats
for test-suite:

    aa.NumMayAlias  |   3072005 |  3072016
    aa.NumMustAlias |    337858 |   337854
    aa.NumNoAlias   |  13255345 | 13255349

The motivation for the change is to expose the MayBeCrossIteration
flag to AA users, which will allow fixing miscompiles related to
incorrect handling of cross-iteration AA queries.

Differential Revision: https://reviews.llvm.org/D136174

[clang-format] Move InsertBraces unit tests out of FormatTest.cpp

Also add line range examples from #58161.

Differential Revision: https://reviews.llvm.org/D136658

[mlir][interfaces] Allow only ranked tensors/memrefs in DestinationStyleOpInterface

We have currently no need for unranked tensors/memrefs.

Differential Revision: https://reviews.llvm.org/D136588

[Support] Use find() for faster StringRef::count (NFC)

While profiling InclusionRewriter, it was found that counting lines was
so slow that it took up 20% of the processing time. Surely, calling
memcmp() of size 1 on every substring in the window isn't a good idea.

Use StringRef::find() instead; in the case of N=1 it will forward to
memchr, which is much faster. For 2<=N<256 it will run the same
memcmp loop as we have now, which is still suboptimal but at least does
not regress anything.
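
The find-based counting loop can be sketched as follows, using `std::string::find` as a stand-in for `StringRef::find`. The function name is hypothetical, and returning 0 for an empty needle is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts non-overlapping occurrences of Needle in Haystack by repeatedly
// calling find() and resuming the search just past each match.
std::size_t countOccurrences(const std::string &Haystack,
                             const std::string &Needle) {
  // Assumption of this sketch: an empty needle counts as zero occurrences.
  if (Needle.empty())
    return 0;

  std::size_t Count = 0;
  std::size_t Pos = 0;
  while ((Pos = Haystack.find(Needle, Pos)) != std::string::npos) {
    ++Count;
    Pos += Needle.size(); // skip the match so occurrences do not overlap
    assert(Pos <= Haystack.size() && "a match cannot extend past the end");
  }
  return Count;
}
```

Counting lines is then a single call such as `countOccurrences(Text, "\n")`, which lets the library's single-character search do the scanning instead of comparing at every offset.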

Differential Revision: https://reviews.llvm.org/D133658

[mlir] Print bbArgs of linalg.map/reduce/transpose on the next line.

```
%mapped = linalg.map
  ins(%arg0 : tensor<64xf32>)
  outs(%arg1 : tensor<64xf32>)
  (%in: f32) {
    %0 = math.absf %in : f32
    linalg.yield %0 : f32
  }
%reduced = linalg.reduce
  ins(%arg0 : tensor<16x32x64xf32>)
  outs(%arg1 : tensor<16x64xf32>)
  dimensions = [1]
  (%in: f32, %init: f32) {
    %0 = arith.addf %in, %init : f32
    linalg.yield %0 : f32
  }
%transposed = linalg.transpose
  ins(%arg0 : tensor<16x32x64xf32>)
  outs(%arg1 : tensor<32x64x16xf32>)
  permutation = [1, 2, 0]
```

Differential Revision: https://reviews.llvm.org/D136818

[mlir][tensor] Disallow unranked tensors for tensor.extract/insert

When writing a tensor.extract/tensor.insert, the rank of the tensor is implied by the number of specified indices. When extracting from/inserting into an unranked tensor, it should first be cast to a ranked version.

Differential Revision: https://reviews.llvm.org/D136756

update_test_checks.py: allow use with custom tools

We have a downstream project with a command-line utility that operates
pretty much exactly like `opt`. So it would make sense for us to
maintain tests with update_test_checks.py with our custom tool
substituted for `opt`, as this change allows.

Differential Revision: https://reviews.llvm.org/D136329

Account for memory locations in DIExpression::createFragmentExpression

createFragmentExpression rejects expressions containing certain ops, like
DW_OP_plus, that may cause the expression to compute a value that can't be
split.

Teach createFragmentExpression that the value loaded from an address computed
using those ops is safe to split.

Update a unittest to account for and test this change.

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D136243

Revert "[libc] Implement getopt"

This reverts commit a678f86351c30a7d57197ffefab4e6e44e61a857.

[libc] Implement getopt

Differential Revision: https://reviews.llvm.org/D133487

[libc] Cleanup stale documentation.

[libunwind] Add module maps for libunwind

Add module maps for the libunwind headers. unwind_arm_ehabi.h and unwind_itanium.h aren't covered because they don't get installed on all platforms.

Reviewed By: #libunwind, MaskRay

Differential Revision: https://reviews.llvm.org/D135345

[RISCV][NFC] Remove ISel of SPLAT_VECTOR

SPLAT_VECTOR is already converted to VMV_V_X_VL or VFMV_V_F_VL in
RISCVDAGToDAGISel::PreprocessISelDAG(), so its ISel handling is no
longer needed.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D136814

[NFC] [AAPointerInfo] OffsetAndSize is no longer an std::pair

The struct OffsetAndSize is a simple tuple of two int64_t. Treating it as a
derived class of std::pair has no special benefit, but it makes the code
verbose since we need get/set functions that avoid using "first" and "second" in
client code. Eliminating the std::pair makes this more readable.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D136745

[Bazel][llvm-debuginfo-analyzer] Add deps for DebugInfoLogicalView after 4f06d46f465c

Add clang_CXXMethod_isCopyAssignmentOperator to libclang

The new method is a wrapper of `CXXMethodDecl::isCopyAssignmentOperator` and
can be used to recognize copy-assignment operators in libclang.

An export for the method, together with its documentation, was added to
"clang/include/clang-c/Index.h" with an implementation provided in
"clang/tools/libclang/CIndex.cpp". The implementation was based on
similar `clang_CXXMethod.*` implementations, following the same
structure but calling `CXXMethodDecl::isCopyAssignmentOperator` for its
main logic.

The new symbol was further added to "clang/tools/libclang/libclang.map"
to be exported, under the LLVM16 tag.

"clang/tools/c-index-test/c-index-test.c" was modified to print a
specific tag, "(copy-assignment operator)", for cursors that are
recognized by `clang_CXXMethod_isCopyAssignmentOperator`.
A new regression test file,
"clang/test/Index/copy-assignment-operator.cpp", was added to ensure
that the correct constructs were recognized or not by the new function.

The "clang/test/Index/get-cursor.cpp" regression test file was updated
as it was affected by the new "(copy-assignment operator)" tag.

A binding for the new function was added to libclang's Python
bindings, in "clang/bindings/python/clang/cindex.py", adding a new
method for `Cursor`, `is_copy_assignment_operator_method`.

The current release note, `clang/docs/ReleaseNotes.rst`, was modified to
report the new addition under the "libclang" section.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D136604

[libc] Add a doc about the libc overlay mode.

Reviewed By: jeffbailey

Differential Revision: https://reviews.llvm.org/D136810

[llvm-debuginfo-analyzer] (08/09) - ELF Reader

llvm-debuginfo-analyzer is a command line tool that processes debug
info contained in a binary file and produces a debug information
format agnostic “Logical View”, which is a high-level semantic
representation of the debug info, independent of the low-level
format.

The code has been divided into the following patches:

1) Interval tree
2) Driver and documentation
3) Logical elements
4) Locations and ranges
5) Select elements
6) Warning and internal options
7) Compare elements
8) ELF Reader
9) CodeView Reader

Full details:
https://discourse.llvm.org/t/llvm-dev-rfc-llvm-dva-debug-information-visual-analyzer/62570

This patch:

This is a high level summary of the changes in this patch.

ELF Reader
- Support for ELF/DWARF.
LVBinaryReader, LVELFReader

Reviewed By: psamolysov, probinson

Differential Revision: https://reviews.llvm.org/D125783

[clang] Instantiate concepts with sugared template arguments

Since we don't unique specializations for concepts, we can just instantiate
them with the sugared template arguments, at negligible cost.

If we don't track their specializations, we can't resugar them later
anyway, and that would be more expensive than just instantiating them
sugared in the first place since it would require an additional pass.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Differential Revision: https://reviews.llvm.org/D136566

[clang] Instantiate alias templates with sugar

This makes use of the changes introduced in D134604, in order to
instantiate alias templates with a final sugared substitution.

This comes at no additional relevant cost.
Since we don't track / unique them in specializations, we wouldn't be
able to resugar them later anyway.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Differential Revision: https://reviews.llvm.org/D136565

[clang] Instantiate NTTPs and template default arguments with sugar

This makes use of the changes introduced in D134604, in order to
instantiate non-type template parameters and default template arguments
with a final sugared substitution.

This comes at no additional relevant cost.
Since we don't track / unique them in specializations, we wouldn't be
able to resugar them later anyway.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Differential Revision: https://reviews.llvm.org/D136564

[clang] Implement sugared substitution changes to infrastructure

Implements the changes required to perform substitution with
non-canonical template arguments, and to 'finalize' them
by not placing 'Subst' nodes.

A finalized substitution means we won't resugar them later,
because these templates themselves were eagerly substituted
with the intended arguments at the point of use. We may still
resugar other templates used within those, though.

This patch does not actually implement any uses of this
functionality, those will be added in subsequent patches,
so expect no changes to existing tests.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Differential Revision: https://reviews.llvm.org/D134604

[SampleFDO] Compute and report profile staleness metrics

When a profile is stale, profile mismatches can occur and the mismatched samples are discarded. We therefore compute mismatch metrics to quantify how stale the profile is; a high number suggests that the user should refresh the profile.

Two sets of metrics are introduced here:

- (Num_of_mismatched_funchash / Total_profiled_funchash), (Samples_of_mismatched_func_hash / Samples_of_profiled_function): This leverages the checksum attribute of FunctionSamples, which is a pseudo-probe feature. When the CFG of the source code changes, the function checksum changes and the sample loader later discards all of that function's samples. This metric shows the percentage of samples discarded for this reason.
- (Num_of_mismatched_callsite / Total_profiled_callsite), (Samples_of_mismatched_callsite / Samples_of_profiled_callsite): This shows how many callsite locations mismatch. A callsite location mismatch affects inlining, which is highly correlated with performance. It goes through all the callsite locations in the IR and the profile, uses the call target name to match them, and reports the number of samples in the profile that do not match an IR callsite.
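
As a rough illustration of the callsite metrics, here is a minimal sketch. It is not the SampleProfileMatcher code: the dictionary shapes and the names `mismatch_ratio` and `callsite_staleness` are hypothetical, chosen only to show how the two ratios relate.

```python
def mismatch_ratio(mismatched, total):
    """Return mismatched / total, or 0.0 when there is nothing to compare."""
    return mismatched / total if total else 0.0


def callsite_staleness(profile_callsites, ir_callsites):
    """Compute (callsite ratio, sample ratio) for mismatched callsites.

    profile_callsites: {location: (callee_name, samples)}  (assumed shape)
    ir_callsites:      {location: callee_name}              (assumed shape)
    """
    total_sites = len(profile_callsites)
    total_samples = sum(samples for _, samples in profile_callsites.values())
    mismatched_sites = 0
    mismatched_samples = 0
    for location, (callee, samples) in profile_callsites.items():
        # A profiled callsite mismatches when the IR has no callsite at
        # that location or the call target name differs.
        if ir_callsites.get(location) != callee:
            mismatched_sites += 1
            mismatched_samples += samples
    return (
        mismatch_ratio(mismatched_sites, total_sites),
        mismatch_ratio(mismatched_samples, total_samples),
    )
```

With two profiled callsites carrying 100 and 300 samples, where only the second no longer matches the IR, the sketch reports half of the callsites and three quarters of the samples as mismatched.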

This is implemented in a new class (SampleProfileMatcher) under a switch ("--report-profile-staleness"). We plan to extend it with a fuzzy profile matching feature in the future.

Reviewed By: hoy, wenlei, davidxl

Differential Revision: https://reviews.llvm.org/D136627

[clang] Perform sugared substitution of builtin templates

Since these are much like template type aliases, where we don't
track a specialization for them and just substitute them eagerly,
we can't resugar them anyway, and there is no relevant cost in just
performing a finalizing sugared substitution.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Differential Revision: https://reviews.llvm.org/D136563

[clang] Changes to produce sugared converted template arguments

Makes CheckTemplateArgumentList and the template deduction functions
produce a sugared converted argument list in addition to the canonical one.

This is mostly NFC except that we hook this up to a few diagnostics in
SemaOverload.

The infrastructure here will be used in subsequent patches
where we perform a finalized sugared substitution for entities
which we do not unique per specializations on canonical arguments,
and later on will be used for template specialization resugaring.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Differential Revision: https://reviews.llvm.org/D133874

[clang] Include the type of a pointer or reference non-type template parameter in its notion of template argument identity.

We already did this for all the other kinds of non-type template
argument.

Note: Based on earlier reverted patch from zygoloid.

Differential Revision: https://reviews.llvm.org/D136803

[mlir][Tensor] Change `createDimValues` to return a list of `OpFoldResult`s.

Reviewed By: nicolasvasilache, hanchung, ThomasRaoux

Differential Revision: https://reviews.llvm.org/D136733

Revert "[lldb-vscode] Send Statistics Dump in terminated event"

This reverts commit c8a26f8c6de30dbd814546f02e4c89a4fcb2b4ef.

Returning the full statistics result in the "terminated" (DAP) event could cause a delay in the UI when debugging from VSCode.

If the program runs to exit and the debug session terminates, the DAP event order will be: exited event --> terminateCommands --> terminated event --> disconnect request --> disconnect response.

The debugging UI in VSCode is tied to the "disconnect" request/response. If the terminated event takes long to process, the IDE won't leave the debugging UI until it is done.

For a big binary (the tested example has 29 GB of debug info), this can cause a ~15s delay in the terminated event itself, and the UI can take ~20s to reflect it.

This may cause confusion in debug sessions. We should pursue a more lightweight return or another solution to return such info.

[AArch64] Adjust operand sequence for Add+Sub to combine more inline shift

((X >> C) - Y) + Z --> (Z - Y) + (X >> C)
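
The rewrite only reorders operands, so both forms compute the same value. A small check of that identity under 64-bit wraparound, written as a sketch (the function names are hypothetical and this says nothing about the DAG combine itself):

```python
MASK = (1 << 64) - 1  # model 64-bit wraparound arithmetic


def original_form(x, y, z, c):
    """((X >> C) - Y) + Z, truncated to 64 bits."""
    return (((x >> c) - y) + z) & MASK


def reordered_form(x, y, z, c):
    """(Z - Y) + (X >> C), truncated to 64 bits."""
    return ((z - y) + (x >> c)) & MASK


def forms_agree(x, y, z, c):
    """Return True when both operand orders give the same 64-bit result."""
    return original_form(x, y, z, c) == reordered_form(x, y, z, c)
```

In the reordered form the shift is an operand of the final add rather than the first operand of the subtract, which is presumably what lets it be folded as an inline shift.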

Fixes the AArch64 part of #55714.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D136158

[CodeGen] Improve large stack frame diagnostic

Add statistics about how much memory is used, in variables, spills, and
unsafestack.

Issue #58168 describes some of the difficulty diagnosing stack size issues
identified by -Wframe-larger-than. D135488 addresses some of those issues by
giving developers a method to view the stack layout and thereby understand
where and how stack memory is used.

However, that solution requires an additional pass, when a short summary about
how the compiler has allocated stack memory can inform developers about where
they should investigate. When they need the complete context, D135488 can
provide them with a more comprehensive set of diagnostics.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D136484

[NFC][PhaseOrdering] Add new test for SROA misplacement

[flang] Add atomic_fetch_and to list of intrinsics

Add the atomic subroutine, atomic_fetch_and, to the list of
intrinsic subroutines, add its last dummy argument to a check
for coindexed-object, and update test.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D136720

[Driver] Allow target override containing . in executable name

The gcc compatible driver has support for overriding the default
target based on the driver's executable name, for instance
x86_64-pc-linux-gnu-clang will set the default target to
x86_64-pc-linux-gnu.

Previously, this failed when the target contained a minor version, for
example x86_64-pc-freebsd13.1, so instead of finding the file's
stem, use the whole file name, but strip off any '.exe' from the tail.
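
A minimal sketch of why the stem-based approach breaks, assuming the stem is everything before the last '.' in the file name. The helper names are hypothetical, and the case-insensitive '.exe' check is an assumption of this sketch rather than a statement about the driver:

```python
import os


def stem_based_name(path):
    """Old approach (sketch): drop everything after the last '.'."""
    return os.path.splitext(os.path.basename(path))[0]


def exe_stripped_name(path):
    """New approach (sketch): keep the whole file name, minus a trailing '.exe'."""
    name = os.path.basename(path)
    if name.lower().endswith(".exe"):  # case-insensitivity is an assumption
        name = name[: -len(".exe")]
    return name


def target_prefix(name, driver="clang"):
    """Return the target part of '<target>-<driver>', or None if absent."""
    suffix = "-" + driver
    return name[: -len(suffix)] if name.endswith(suffix) else None
```

For `x86_64-pc-freebsd13.1-clang`, the stem-based name is cut at the '.' in the version, so the `-clang` suffix is lost and no target is found, while the '.exe'-stripping variant yields `x86_64-pc-freebsd13.1`.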

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D135284