review.tizen.org Git - platform/upstream/llvm.git/log

[ADT] Use alignas + sizeof for inline storage, NFC

AlignedCharArrayUnion is really only needed to handle the "union" case
when we need memory of suitable size and alignment for multiple types.
SmallVector only needs storage for one type, so use that directly.

[clang][NFC] Change diagnostic to start with lowercase letter

[Format/ObjC] Add NS_SWIFT_NAME() and CF_SWIFT_NAME() to WhitespaceSensitiveMacros

The argument passed to the preprocessor macros `NS_SWIFT_NAME(x)` and
`CF_SWIFT_NAME(x)` is stringified before passing to
`__attribute__((swift_name("x")))`.

ClangFormat didn't know about this stringification, so its custom parser
tried to parse the argument(s) passed to the macro as if they were
normal function arguments.

That means ClangFormat currently incorrectly inserts whitespace
between `NS_SWIFT_NAME` arguments with colons and dots, so:

```
extern UIWindow *MainWindow(void) NS_SWIFT_NAME(getter:MyHelper.mainWindow());
```

becomes:

```
extern UIWindow *MainWindow(void) NS_SWIFT_NAME(getter : MyHelper.mainWindow());
```

which clang treats as a parser error:

```
error: 'swift_name' attribute has invalid identifier for context name [-Werror,-Wswift-name-attribute]
```

Thankfully, D82620 recently added the ability to treat specific macros
as "whitespace sensitive", meaning their arguments are implicitly
treated as strings (so whitespace is not added anywhere inside).

This diff adds `NS_SWIFT_NAME` and `CF_SWIFT_NAME` to
`WhitespaceSensitiveMacros` so their arguments are implicitly treated
as whitespace-sensitive.

Test Plan:
New tests added. Ran tests with:
% ninja FormatTests && ./tools/clang/unittests/Format/FormatTests

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D89425

Register TargetCXXABI.def as a textual header

[mlir][Linalg] Rethink fusion of linalg ops with reshape ops.

The current fusion on tensors fuses reshape ops with generic ops by
linearizing the indexing maps of the fused tensor in the generic
op. This has some limitations
- It only works for static shapes
- The resulting indexing map has a linearization that would be
potentially prevent fusion later on (for ex. tile + fuse).

Instead, try to fuse the reshape consumer (producer) with generic op
producer (consumer) by expanding the dimensionality of the generic op
when the reshape is expanding (folding). This approach conflicts with
the linearization approach. The expansion method is used instead of
the linearization method.

Further refactoring that changes the fusion on tensors to be a
collection of patterns.

Differential Revision: https://reviews.llvm.org/D89002

Make header self-contained. NFC.

clang/Basic: Replace ContentCache::getBuffer with Optional semantics

Remove `ContentCache::getBuffer`, which always returned a
dereferenceable `MemoryBuffer*` and had a `bool*Invalid` out parameter,
and replace it with:

- `ContentCache::getBufferOrNone`, which returns
  `Optional<MemoryBufferRef>`. This is the new API that consumers should
  use. Later it could be renamed to `getBuffer`, but intentionally using
  a different name to root out any unexpected callers.
- `ContentCache::getBufferPointer`, which returns `MemoryBuffer*` with
  "optional" semantics. This is `private` to avoid growing callers and
  `SourceManager` has temporarily been made a `friend` to access it.
  Later paches will update the transitive callers to not need a raw
  pointer, and eventually this will be deleted.

No functionality change intended here.

Differential Revision: https://reviews.llvm.org/D89348

[llvm] Update default cutoff threshold for machine function splitter.

Based on internal testing at Google we found that setting the profile
summary cutoff threshold to 999950 yields the best results in terms of
itlb and icache metrics (as observed on Intel CPUs).

*default* = Split out code if no profile count available for block
*size-%*  = The fraction of bytes split out of .text and .text.hot
*itlb*    = Misses per kilo instructions (MPKI) for itlb
*icache*  = Misses per kilo instructions (MPKI) for L1 icache

Search1

| cutoff  | size-%  | itlb      | icache  |
|---------|---------|-----------|---------|
| default | 42.5861 | 0.0822151 | 2.46363 |
|  999999 | 44.9350 | 0.0767194 | 2.44416 |
|  999950 | 50.0660 |  0.075744 |  2.4091 |
|  999500 | 56.9158 |  0.082564 |  2.4188 |
|  995000 | 63.8625 | 0.0814927 | 2.42832 |
|  990000 | 71.7314 |  0.106906 | 2.57785 |

Search2

| cutoff  | size-% | itlb     | icache  |
|---------|--------|----------|---------|
| default | 2.8845 | 0.626712 | 4.73245 |
|  999999 | 3.3291 | 0.602309 | 4.70045 |
|  999950 | 3.8577 | 0.587842 | 4.71632 |
|  999500 | 4.4170 |  0.63577 | 4.68351 |
|  995000 | 5.1020 | 0.657969 | 4.82272 |
|  990000 | 5.7153 | 0.719122 | 5.39496 |

Differential Revision: https://reviews.llvm.org/D89085

[mlir] Fix some style comments from D89268

That change was a pure move, so split out the stylistic changes into
this patch.

Differential Revision: https://reviews.llvm.org/D89272

[mlir][bufferize] Rename BufferAssignment* to Bufferize*

Part of the refactor discussed in:
https://llvm.discourse.group/t/what-is-the-strategy-for-tensor-memref-conversion-bufferization/1938/17

Differential Revision: https://reviews.llvm.org/D89271

[mlir] Refactor code out of BufferPlacement.cpp

Now BufferPlacement.cpp doesn't depend on Bufferize.h.

Part of the refactor discussed in:
https://llvm.discourse.group/t/what-is-the-strategy-for-tensor-memref-conversion-bufferization/1938/17

Differential Revision: https://reviews.llvm.org/D89268

[mlir] Rename ShapeTypeConversion to ShapeBufferize

Once we have tensor_to_memref ops suitable for type materializations,
this pass can be split into a generic type conversion pattern.

Part of the refactor discussed in:
https://llvm.discourse.group/t/what-is-the-strategy-for-tensor-memref-conversion-bufferization/1938/17

Differential Revision: https://reviews.llvm.org/D89258

[mlir] Linalg refactor for using "bufferize" terminology.

Part of the refactor discussed in:
https://llvm.discourse.group/t/what-is-the-strategy-for-tensor-memref-conversion-bufferization/1938/17

Differential Revision: https://reviews.llvm.org/D89261

[clang] Add -fc++-abi= flag for specifying which C++ ABI to use

This implements the flag proposed in RFC http://lists.llvm.org/pipermail/cfe-dev/2020-August/066437.html.

The goal is to add a way to override the default target C++ ABI through
a compiler flag. This makes it easier to test and transition between different
C++ ABIs through compile flags rather than build flags.

In this patch:
- Store `-fc++-abi=` in a LangOpt. This isn't stored in a
  CodeGenOpt because there are instances outside of codegen where Clang
  needs to know what the ABI is (particularly through
  ASTContext::createCXXABI), and we should be able to override the
  target default if the flag is provided at that point.
- Expose the existing ABIs in TargetCXXABI as values that can be passed
  through this flag.
  - Create a .def file for these ABIs to make it easier to check flag
    values.
  - Add an error for diagnosing bad ABI flag values.

Differential Revision: https://reviews.llvm.org/D85802

[llvm] Set the default for -bbsections-cold-text-prefix to .text.split.

After using this for a while, we find that it is generally useful to
have it set to .text.split. by default, removing the need for an
additional -mllvm option.

Differential Revision: https://reviews.llvm.org/D88997

[MBP] Add whole chain to BlockFilterSet instead of individual BB

Currently we add individual BB to BlockFilterSet if its frequency satisfies

LoopFreq / Freq <= LoopToColdBlockRatio

LoopFreq is edge frequency from outside to loop header.
LoopToColdBlockRatio is a command line parameter.

It doesn't make sense since we always layout whole chain, not individual BBs.

It may also cause a tricky problem. Sometimes it is possible that the LoopFreq
of an inner loop is smaller than LoopFreq of outer loop. So a BB can be in
BlockFilterSet of inner loop, but not in BlockFilterSet of outer loop,
like .cold in the test case. So it is added to the chain of inner loop. When
work on the outer loop, .cold is not added to BlockFilterSet, so the edge to
successor .problem is not counted in UnscheduledPredecessors of .problem chain.
But other blocks in the inner loop are added BlockFilterSet, so the whole inner
loop chain can be layout, and markChainSuccessors is called to decrease
UnscheduledPredecessors of following chains. markChainSuccessors calls
markBlockSuccessors for every BB, even it is not in BlockFilterSet, like .cold,
so .problem chain's UnscheduledPredecessors is decreased, but this edge was not
counted on in fillWorkLists, so .problem chain's UnscheduledPredecessors
becomes 0 when it still has an unscheduled predecessor .pred! And it causes
problems in following various successor BB selection algorithms.

Differential Revision: https://reviews.llvm.org/D89088

[lldb] More memory allocation test fixes

XFAIL nodefaultlib.cpp on darwin - the test does not pass there

XFAIL TestGdbRemoteMemoryAllocation on windows - memory is allocated
with incorrect permissions

[flang] Fix CMake bug in the definition of flang-new

Recent patch that improved Flang's compatibility with respect to how LLVM
dynamic libraries should be linked (and specified in CMake recipes),
introduced a bug in the definition of `flang-new`:
  * https://reviews.llvm.org/D87893
More specifically, `add_flang_tool` does not support the
`LINK_COMPONENTS` CMake argument.  Instead, one should set
`LLVM_LINK_COMPONENTS` before calling `add_flang_tool`.

This patch reverts the change for `flang-new` from
https://reviews.llvm.org/D87893, and instead:
  * sets `LLVM_LINK_COMPONENTS`
  * calls `clang_target_link_libraries` to add Clang dependencies

Differential Revision: https://reviews.llvm.org/D89403

Preserve param alignment in NVPTXLowerArgs pass.

NVPTXLowerArgs works as follows.

  * Create a regular alloca with alignment identical to arg.
  * Copy arg from param space (and ASC'ing it from generic AS first) to
    the alloca (it's still in generic AS).
  * Replace loads of arg with loads of alloca.

The bug here is that we did not preserve the arg's alignment when
loading from the alloca.

The impact of this bug is that sometimes param loads would be lowered as
a series of u8 loads, because we're incorrectly assuming everything has
alignment 1.

Differential Revision: https://reviews.llvm.org/D89404

[DDR] Introduce implicit equality check for the source pattern operands with the same name.

This CL allows user to specify the same name for the operands in the source pattern which implicitly enforces equality on operands with the same name.
E.g., Pat<(OpA $a, $b, $a) ... > would create a matching rule for checking equality for the first and the last operands. Equality of the operands is enforced at any depth, e.g., OpA ($a, $b, OpB($a, $c, OpC ($a))).

Example usage: Pat<(Reshape $arg0, (Shape $arg0)), (replaceWithValue $arg0)>

Note, this feature only covers operands but not attributes.
Current use cases are based on the operand equality and explicitly add the constraint into the pattern. Attribute equality will be worked out on the different CL.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D89254

[lldb] [Process/FreeBSDRemote] Support YMM reg via PT_*XSTATE

Add a framework for reading/writing extended register sets via
PT_GETXSTATE/PT_GETXSTATE_INFO/PT_SETXSTATE, and use it to support
YMM0..YMM15. The code is prepared to handle arbitrary XSAVE extensions,
including correct offset handling.

This fixes Shell/Register/*ymm* tests.

Differential Revision: https://reviews.llvm.org/D89193

[Hexagon] Generate better splat code on v62+

[mlir] More changes to avoid args now inserted.NFC

Migrates a bit more from the old/to be deprecated form.

[Driver]: fix compiler-rt path when printing libgcc for baremetal

clang --target arm-none-eabi --print-libgcc-file-name --rtlib=compiler-rt
used to print `/path/to/lib/clang/version/lib/libclang_rt.builtins-arm.a`
but should print `/path/to/lib/clang/version/lib/baremetal/libclang_rt.builtins-arm.a`.
Similarly, --target armv7m-none-eabi should print libclang_rt.builtins-armv7m.a
This matches the compiler-rt file name used at link time in the
baremetal driver.

Reviewed By: manojgupta

Differential Revision: https://reviews.llvm.org/D89327

[X86] Add test case to demonstrate a Log2_32_Ceil that can just be Log2_32 in SimplifySetCC ctpop combine.

This combine can look through (trunc (ctpop X)). When doing this
it tries to make sure the trunc doesn't lose any information
from the ctpop. It does this by checking that the truncated type
has more bits that Log2_32_Ceil of the ctpop type. The Ceil is
unnecessary and pessimizes non-power of 2 types.

For example, ctpop of i256 requires 9 bits to represent the max
value of 256. But ctpop of i255 only requires 8 bits to represent
the max result of 255. Log2_32_Ceil of 256 and 255 both return 8
while Log2_32 returns 8 for 256 and 7 for 255.

Revert rG25a97c3a43d7 - "[InstCombine] visitCallInst - retain undefs in vector funnel shift amounts"

This reverts commit 25a97c3a43d7bc469ec67dd4e901a507b9b11116.

We have other constant folds that fold undef funnel shift amounts to 0 - so we need to be consistent.

If we end up with regressions where we lose a splat shift amount pattern we'll have to investigate other canonicalizations, but matchFunnelShift currently protects us from that.

AMDGPU: Update AMDHSA code object version handling

Differential Revision: https://reviews.llvm.org/D89076

InstCombine: Fix losing load properties in copy-constant-to-alloca

Preserve the alignment and metadata. Atomic loads are skipped for
this, but pass along the properties for consistency.

InstCombine: Fix infinite loop in copy-constant-to-alloca transform

This was broken by 16295d521e294b27106e51fac29957c1aac8ff89, when
instructions started being handled and not just constant
expressions. This was re-inserting an equivalent bitcast to the
original memcpy operand, which made a non-functional IR change on
every iteration.

This also fixes a secondary problem where it was inserting
addrspacecasts which may not have been legal (i.e. it changed the
source address space). Start visiting all pointer users and fail out
if we can't process them. Also start handling the relevant memory
intrinsic users. These cases can be dealt with by running
InferAddressSpaces separately.

[libc++] Mark two tests as unsupported in C++03

This was dropped when I split the tests into individual source files
to make sure they would actually run (in 2908eb20ba).

Recommit "[VPlan] Use VPValue def for VPMemoryInstructionRecipe."

This reverts the revert commit 710aceb645e7dba4de7053eef2c616311b9163d4
and includes a fix for a memsan failure.

Original message:

    This patch turns VPMemoryInstructionRecipe into a VPValue and uses it
    during VPlan construction and codegeneration instead of the plain IR
    reference where possible.

[CodeGen] Move x86 specific ms intrinsic tests into x86 target subfolder. NFCI.

Polly - specify address space when creating a pointer to a vector type

Polly incorrectly dropped the address space specified for a load instruction when it vectorized the code.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D88907

[clangd] clang-format TweakTests, NFC

Add Allocate Clause to MLIR Parallel Operation Definition

Differential Revision: https://reviews.llvm.org/D87684

[libc++] Use ADDITIONAL_COMPILE_FLAGS instead of #define for _LIBCPP_DEBUG

[libc++] Split off debug tests that were missed by ce1365f8f7e into test/libcxx

Also, some tests had multiple death tests in them, so split them into
separate tests instead. The second death test would obviously never
get run, because the first one would kill the program before.

[AIX] Turn -fdata-sections on by default in Clang

Summary:

This patch does the following:
1. Make InitTargetOptionsFromCodeGenFlags() accepts Triple as a
parameter, because some options' default value is triple dependant.
2. DataSections is turned on by default on AIX for llc.
3. Test cases change accordingly because of the default behaviour change.
4. Clang Driver passes in -fdata-sections by default on AIX.

Reviewed By: MaskRay, DiggerLin

Differential Revision: https://reviews.llvm.org/D88737

[InstCombine] narrowRotate - canonicalize to OR(SHL,LSHR). NFCI.

Match the canonicalization code that was added to matchFunnelShift at rG02295e6d1a15

[NFC][MC] Use MCRegister in Machine{Sink|Pipeliner}.cpp

Differential Revision: https://reviews.llvm.org/D89328

Remove Combine.td.rej file

Fix an apparent typo. `assert()` must not contain side-effects. NFC.

[mlir][vulkan-runner] Fix buffer usage flags

The buffers are used as source or destination of transfer commands
so always add VK_BUFFER_USAGE_TRANSFER_{DST,SRC}_BIT to their usage
flags.

Signed-off-by: Kevin Petit <kevin.petit@arm.com>

Fix conjuntion of -Werror,-Wsuggest-override with google/benchmark

[InstCombine] Add m_SpecificIntAllowUndef pattern matcher

m_SpecificInt doesn't accept undef elements in a vector splat value - tweak specific_intval to optionally allow undefs and add the m_SpecificIntAllowUndef variants.

Allows us to remove the m_APIntAllowUndef + comparison hack inside matchFunnelShift

[profile] Remove useless msync when dumping gcda files

Summary:
According the mmap man page (https://man7.org/linux/man-pages/man2/mmap.2.html) is only required to precisely control updates, so we can safely remove it.
Since gcda files are dumped just before to call exec** functions, dump need to be fast.
On my computer, Firefox built with --coverage needs ~1min40 to display something and in removing msync it needs ~8s.

Reviewers: void

Subscribers: #sanitizers, marco-c, sylvestre.ledru

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D81060

[lldb] Fix TestGdbRemoteMemoryAllocation on windows

It appears that memory allocation actually works on windows (but it was
not fully wired up before 2c4226f8).

[lldb] Remove bogus ProcessMonitor forward-decls

This class is not used in those files.

[SVE] Lower fixed length VECREDUCE_FADD operation

Differential Revision: https://reviews.llvm.org/D89263

[flang] Rework host runtime folding and enable REAL(2) folding with it.

- Rework the host runtime table so that it is constexpr to avoid
having to construct it and to store/propagate it.
- Make the interface simpler (remove many templates and a file)
- Enable 16bits float folding using 32bits float host runtime
- Move StaticMultimapView into its own header to use it for host
folding

Reviewed By: klausler, PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D88981

[libc++] Remove signal-based checkpoints in libc++ tests

While this adds some convenience to the test suite, it prevents the tests
using these checkpoints from being used on systems where signals are not
available, such as some embedded systems. It will also prevent these tests
from being constexpr-friendly once e.g. std::map is made constexpr, due
to the use of statics.

Instead, one can always use a debugger to figure out exactly where a
test is failing when that isn't clear from the log output without
checkpoints.

Fix `-Wparentheses` warnings. NFC.

[mlir] expand the legal floating-point types in the LLVM IR dialect type check

This patch adds a couple missing LLVM IR dialect floating point types to
the legality check.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D89350

[mlir][Linalg] Add missing dependency

[InstCombine] visitCallInst - retain undefs in vector funnel shift amounts

By always performing a modulo on the shift amount constants this was causing undef amounts being replaced with zero, meaning we were losing funnel shift by splat (with undef) patterns.

Tweaked the shift amount bounds check to support (passthrough) undefs, and use Constant::mergeUndefsWith to preserve the undefs after folding.

[SystemZ] Bugfix in SystemZVectorConstantInfo

In order to correctly load an all-ones FP NaN value into a floating point
register with a VGBM, the analyzed 32/64 FP bits must first be shifted left
(into element 0 of the vector register).

SystemZVectorConstantInfo has so far relied on element replication which has
bypassed the need to do this shift, but now it is clear that this must be
done in order to handle NaNs.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D89389

[DebugInstrRef] Create DBG_INSTR_REFs in SelectionDAG

When given the -experimental-debug-variable-locations option (via -Xclang
or to llc), have SelectionDAG generate DBG_INSTR_REF instructions instead
of DBG_VALUE. For now, this only happens in a limited circumstance: when
the value referred to is not a PHI and is defined in the current block.
Other situations introduce interesting problems, addresed in later patches.

Practically, this patch hooks into InstrEmitter and if it can find a
defining instruction for a value, gives it an instruction number, and
points the DBG_INSTR_REF at that <instr, operand> pair.

Differential Revision: https://reviews.llvm.org/D85747

Fix a broken build for gcc <= 7.1

we need add a "this->" inside the lambda body to workaround it. Rewrite
it to normal for-range loop.

Revert "Reland "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"" and it's follow-ups

While we haven't encountered an earth-shattering problem with this yet,
by now it is pretty evident that trying to model the ptr->int cast
implicitly leads to having to update every single place that assumed
no such cast could be needed. That is of course the wrong approach.

Let's back this out, and re-attempt with some another approach,
possibly one originally suggested by Eli Friedman in
https://bugs.llvm.org/show_bug.cgi?id=46786#c20
which should hopefully spare us this pain and more.

This reverts commits 1fb610429308a7c29c5065f5cc35dcc3fd69c8b1,
7324616660fc0995fa8c166e3c392361222d5dbc,
aaafe350bb65dfc24c2cdad4839059ac81899fbe,
e92a8e0c743f83552fac37ecf21e625ba3a4b11e.

I've kept&improved the tests though.

[lldb-server][linux] Add ability to allocate memory

This patch adds support for the _M and _m gdb-remote packets, which
(de)allocate memory in the inferior. This works by "injecting" a
m(un)map syscall into the inferior. This consists of:
- finding an executable page of memory
- writing the syscall opcode to it
- setting up registers according to the os syscall convention
- single stepping over the syscall

The advantage of this approach over calling the mmap function is that
this works even in case the mmap function is buggy or unavailable. The
disadvantage is it is more platform-dependent, which is why this patch
only works on X86 (_32 and _64) right now. Adding support for other
linux architectures should be easy and consist of defining the
appropriate syscall constants. Adding support for other OSes depends on
the its ability to do a similar trick.

Differential Revision: https://reviews.llvm.org/D89124

[lldb] Modernize PseudoTerminal::OpenFirstAvailablePrimary

replace char*+length combo with llvm::Error

[flang] Make flang build compatible with LLVM dylib

Harmonize usage of LLVM components througout Flang.

Explicit LLVM Libs where used across several CMakeFIles, which led to
incompatibilities with LLVM shlibs.
Fortunately, the LLVM component system can be relied on to harmoniously handle
both cases.

Differential Revision: https://reviews.llvm.org/D87893

[clangd] Disable extract variable for RHS of assignments

Differential Revision: https://reviews.llvm.org/D89307

[ASTImporter] Fix crash caused by unset AttributeSpellingListIndex

During the import of attributes we forgot to set the spelling list
index. This caused a segfault when we wanted to traverse the AST
(e.g. by the dump() method).

Differential Revision: https://reviews.llvm.org/D89318

[ASTImporter] Fix crash caused by unimported type of FromatAttr

During the import of FormatAttrs we forgot to import the type (e.g
`__scanf__`) of the attribute. This caused a segfault when we wanted to
traverse the AST (e.g. by the dump() method).

Differential Revision: https://reviews.llvm.org/D89319

[clangd] Refine recoveryAST flags in clangd

so that we could start experiment for C.

Previously, these flags in clangd were only meaningful for C++. We need
to flip them for C, this patch repurpose these flags.

- if true, just set it.
- if false, just respect the value in clang.

this would allow us to keep flags on for C++, and optionally flip them on for C.

Differential Revision: https://reviews.llvm.org/D89233

[AMDGPU] Base getSubRegFromChannel on TableGen data

Generate (at runtime) the table used to drive getSubRegFromChannel,
base on AMDGPUSubRegIdxRanges from TableGen data.
The is a step closer to it being staticly generated by TableGen and
allows getSubRegFromChannel handle all bitwidths in the mean time.

Reviewed By: rampitec, arsenm, foad

Differential Revision: https://reviews.llvm.org/D89217

[ValueTracking] Use assume's noundef operand bundle

This patch updates `isGuaranteedNotToBeUndefOrPoison` to use `llvm.assume`'s `noundef` operand bundle.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89219

Analysis: only query size of sized objects.

Recently we started looking into sret parameters, though the issue could crop
up elsewhere. If the pointee type is opaque, we should not try to compute its
size because that leads to an assertion failure.

[mlir][Linalg] Make a Linalg CodegenStrategy available.

This revision adds a programmable codegen strategy from linalg based on staged rewrite patterns. Testing is exercised on a simple linalg.matmul op.

Differential Revision: https://reviews.llvm.org/D89374

[InstCombine] Add undef funnel shift amount test coverage

[Flang][OpenMP] Fix issue in only a single nowait clause can appear on a sections directive.

The OpenMP 5.0 standard restricts nowait clause to appear only once on sections
directive.
See OpenMP 5.0
- 2.8.1
- point 3 in restrictions.

Added a test with fix.

Reviewed By: DavidTruby

Differential Revision: https://reviews.llvm.org/D88556

Add x86 REQUIRES to tests from 2c5f3d54c5ee / D85746

[Test] Auto-update for some tests

Reland "[Support][unittests] Enforce alignment in ConvertUTFTest"

This relands commit 53b3873cf428fd78f1d92504cc20adf11181ead7. The failure
of `ConvertUTFTest.UTF16WrappersForConvertUTF16ToUTF8String` detected the
first time is fixed.

Differential Revision: https://reviews.llvm.org/D88824

[AArch64] Add more addv tests

Differential Revision: https://reviews.llvm.org/D89365

[DebugInstrRef] Parse debug instruction-references from/to MIR

This patch defines the MIR format for debug instruction references: it's an
integer trailing an instruction, marked out by "debug-instr-number", much
like how "debug-location" identifies the DebugLoc metadata of an
instruction. The instruction number is stored directly in a MachineInstr.

Actually referring to an instruction comes in a later patch, but is done
using one of these instruction numbers.

I've added a round-trip test and two verifier checks: that we don't label
meta-instructions as generating values, and that there are no duplicates.

Differential Revision: https://reviews.llvm.org/D85746

[LV] Unroll factor is expected to be > 0

LV fails with assertion checking that UF > 0. We already set UF to 1 if it is 0 except the case when IC > MaxInterleaveCount. The fix is to set UF to 1 for that case as well.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87679

[InstCombine] matchFunnelShift - add support for non-uniform vectors containing undefs.

Replace m_SpecificInt with m_APIntAllowUndef to matching splats containing undefs, then use ConstantExpr::mergeUndefsWith to merge the undefs together in the result.

The undef funnel shift amounts are getting replaced with zero later on - I'll address this in a later patch, otherwise we lose potential shift by splat value patterns.

[SyntaxTree][NFC] Nit on `replaceChildRangeLowLevel`

[SyntaxTree] Bug fix in `MutationsImpl::addAfter`.

* Add assertions to other `MutationsImpl` member functions
* `findPrevious` is a free function

Differential Revision: https://reviews.llvm.org/D89314

[SyntaxTree] Improve safety of `replaceChildRangeLowLevel`

* Add assertions for other preconditions.
* If nothing is modified, don't mark it.

Differential Revision: https://reviews.llvm.org/D89303

[LoopFlatten] Precommit new test cases. NFC.

[lldb] [test/Register] Add read/write tests for multithreaded process

Add a test to verify that 'register read' and 'register write' commands
work correctly in a multithreaded program, in particular that they read
or write registers for the correct thread. The tests use locking
to ensure that events are serialized and the test can execute reliably.

Differential Revision: https://reviews.llvm.org/D89248

[Flang][OpenMP] Rework parser changes for OpenMP atomic construct.

`OmpStructureChecker` is supposed to work only with `parser::OmpClause`
after tablegen changes for OpenMP and OpenACC were introduced.
Hence `OmpMemoryOrderClause`, `OmpAtomicMemoryOrderClause` and similar ones were failing
to catch semantic errors, inspite of having code for semantic checks.
This patch tries to change parser for `OmpMemoryOrderClause` and similar dependent ones
and use `OmpClauseList` which resides/comes from common tablegen for OpenMP/OpenACC eventually using `parser::OmpClause`.

This patch also tries to :
1. Change `OmpCriticalDirective` in `openmp-parsers.cpp` to support `OmpClauseList`.
2. Check-flang regresses when changes were introduced due to missing semantic checks in OmpCritical, patch implements them at the minimal level to pass the regression.
3. Change tablegen to support Hint clause.
4. Adds missing source locations `CharBlock Source` in each atomic construct.
5. Remove dead code realted to `memory-order-clauses` after moving to `OmpClauseList`.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D88965

[SVE] Add fatal error when running out of registers for SVE tuple call arguments

When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to a SVE stack
object in a GPR. However, if we switch over part-way through
processing a SVE tuple then part of it will be in registers and
the other part will be on the stack. This is wrong and we'd like
to avoid any silent ABI compatibility issues in future. For now,
I've added a fatal error when this happens until we can get a
proper fix.

Differential Revision: https://reviews.llvm.org/D89326

Fix typos in the documentation of dynamic values in subview ops

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D89338

[lldb] Reject redefinitions of persistent variables

Currently one can redefine a persistent variable and LLDB will just silently
ignore the second definition:

```
(lldb) expr int $i = 1
(lldb) expr int $i = 2
(lldb) expr $i
(int) $i = 1
```

This patch makes this an error and rejects the expression with the second
definition.

A nice follow up would be to refactor LLDB's persistent variables to not just be
a pair of type and name, but also contain some way to obtain the original
declaration and source code that declared the variable. That way we could
actually make a full diagnostic as we would get from redefining a variable twice
in the same expression.

Reviewed By: labath, shafik, JDevlieghere

Differential Revision: https://reviews.llvm.org/D89310

[Attributor][NFC] Make `createShallowWrapper()` available outside of Attributor

D85703 will need to create shallow wrappers in order to track the spmd icv. We need to make it available.

Differential Revision: https://reviews.llvm.org/D89342

[clang-rename] Simplify the code of handling class paritial specializations, NFC.

Instead of collecting all specializations and doing a post-filterin, we
can just get all targeted specializations from getPartialSpecializationsizations.

Differential Revision: https://reviews.llvm.org/D89220

[test][lld] Mark TLS tests as REQUIRES: x86.

Fixes http://lab.llvm.org:8011/#/builders/119/builds/92

[libcxxabi,libunwind] support running tests in standalone mode

Remove check for standalone and shared library mode in libcxxabi to
allow including tests in said mode. This check prevented running the
tests in standalone mode with static libraries, which is the case for
baremetal targets.

Fix check-unwind target trying to use a non-existent llvm-lit executable
in standalone mode. Copy the HandleOutOfTreeLLVM logic from libcxxabi to
libunwind in order to make the tests work in standalone mode.

Reviewed By: ldionne, #libc_abi, #libc

Differential Revision: https://reviews.llvm.org/D86540

[ARM.td] Make instruction definitions visible to sched models

Differential revision: https://reviews.llvm.org/D89308

[lldb] Remove lexical block and fix formatting LoadScriptingModule (NFC)

[lldb] Unconditionally strip the `.py(c)` extension when loading a module

Currently we only strip the Python extension when the file exists on
disk because we assumed that if it didn't exist it was a module.
However, with the change from D89334 this is no longer the case as we
want to be able to import a relative path to a .py as a module. Since we
always import a scripting module as a "python module" we should always
strip the extension if present.

Differential revision: https://reviews.llvm.org/D89352

Revert "[clang] Improve handling of physical registers in inline assembly operands."

This reverts commit c78da037783bda0f27f4d82060149166e6f0c796.

Temporarily reverted due to https://bugs.llvm.org/show_bug.cgi?id=47837.

[AMDGPU] Cleanup memory legalizer interfaces

- Rename interfaces to be in terms of acquire and release.
- Improve comments.

Differential Revision: https://reviews.llvm.org/D89355

[test][NewPM] Pin -mergereturn tests to legacy PM

Looks like this pass isn't really used and hasn't been worked on in a
loooong time.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89010

[LoopExtract][NewPM] Port -loop-extract to NPM

-loop-extract-single is just -loop-extract on one loop.

-loop-extract depended on -break-crit-edges and -loop-simplify in the
legacy PM, but the NPM doesn't allow specifying pass dependencies like
that, so manually add those passes to the RUN lines where necessary.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89016

libDebugInfoDWARF: Don't try to parse loclist[.dwo] headers when parsing debug_info[.dwo]

There's no way to know whether there's a loclist contribution to parse
if there's no loclistx encoding - and if there is one, there's no need
to walk back from the loclist_base (or, uin the case of
info.dwo/loclist.dwo - starting at 0 in the contribution) to parse the
header, instead rely on the DWARF32/64 and address size in the CU
that's already available.

This would come up in split DWARF (non-split wouldn't try to read a
loclist header in the absence of a loclist_base) when one unit had
location lists and another does not (because the loclists.dwo section
would be non-empty in that case - in the case where it's empty the
parsing would silently skip).

Simplify the testing a bit, rather than needing a whole dwp, etc - by
creating a malformed loclists.dwo section (and use single file Split
DWARF) that would trip up any attempt to parse it - but no attempt
should be made.