review.tizen.org Git - platform/upstream/llvm.git/log

[mlir][sparse] scalarize reductions in for-loops during sparse codegen

Reductions in innermost loops become harder for the backend to disambiguate
after bufferization into memrefs, resulting in less efficient load-update-store
cycles. By scalarizing innermost reductions, the backend is more likely to assign
a register to perform the reduction (also prepares vectorization). Even though
we could scalarize reductions for more outer loops and while-loops as well,
currently scalarization is only done for chains of innermost for-loops, where
it matters most, to avoid complicating codegen unnecessary (viz. adding lots
of yield instructions).

This CL also refactors condition simplification into the merger class,
where it belongs, so that conditions are simplified only once per loop
nest and not repeatedly as was currently done. This CL also fixes a few
minor bugs, some layout issues, and comments.

Reviewed By: penpornk

Differential Revision: https://reviews.llvm.org/D93143

Remove unneeded header include (NFC)

[mlir] Move `std.tensor_cast` -> `tensor.cast`.

This is almost entirely mechanical.

Differential Revision: https://reviews.llvm.org/D93357

Workaround around clang 5.0 bug by including SmallVector.h in LLVM.h (PR41549)

The forward declaration for SmallVector does not play well with clang-5.

Differential Revision: https://reviews.llvm.org/D93498

[mlir][Linalg] Define a linalg.init_tensor operation.

This operation is used to materialize a tensor of a particular
shape. The shape could be specified as a mix of static and dynamic
values.

The use of this operation is to be an `init` tensor for Linalg
structured operation on tensors where the bounds of the computation
depends on the shape of the output of the linalg operation. The result
of this operation will be used as the `init` tensor of such Linalg
operations. To note,

1) The values in the tensor materialized is not used. Any operation to
   which this is an init tensor is expected to overwrite the entire
   tensor.
2) The tensor is materialized only for the shape of the output and to
   make the loop bounds depend only on operands of the structured
   operation.

Based on (1) and (2) it is assumed that these operations eventually go
away since they are only used in `dim` operations that can be
canonicalized to make this operation dead. Such canonicalization are
added here too.

Differential Revision: https://reviews.llvm.org/D93374

[mlir] Add canonicalization from `tensor_cast` to `dim` op.

Fold a `tensor_cast` -> `dim` to take the `dim` of the original tensor.

Differential Revision: https://reviews.llvm.org/D93492

[DSE] Add test for potential caching bug (NFC)

This one would miscompile if read-clobber checks switched to using
the EarlierAccess location, but the read cache was retained.

CodeGen: Improve generated IR for __builtin_mul_overflow(uint, uint, int)

Add a special case for handling __builtin_mul_overflow with unsigned
inputs and a signed output to avoid emitting the __muloti4 library
call on x86_64. __muloti4 is not implemented in libgcc, so avoiding
this call fixes compilation of some programs that call
__builtin_mul_overflow with these arguments.

For example, this fixes the build of cpio with clang, which includes code from
gnulib that calls __builtin_mul_overflow with these argument types.

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D84405

[VectorCombine] add tests for gep load with cast; NFC

[SimplifyCFG] Teach simplifyUnreachable() to preserve DomTree

Pretty boring, removeUnwindEdge() already known how to update DomTree,
so if we are to call it, we must first flush our own pending updates;
otherwise, we just stop predecessors from branching to us,
and for certain predecessors, stop their predecessors from
branching to them also.

[SimplifyCFG] ConstantFoldTerminator() already knows how to preserve DomTree

... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a number of tests,
all of which are marked as such so that they do not regress.

[SimplifyCFG] DeleteDeadBlock() already knows how to preserve DomTree

... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.

Fix -Wno-error= parsing in clang-format.

As noted in https://reviews.llvm.org/D86137#2460135 parsing of
the clang-format parameter -Wno-error=unknown fails.
This currently is done by having `-Wno-error=unknown` as an option.
In this patch this is changed to make `-Wno-error=` parse an enum into a bit set.
This way the parsing is fixed and also we can possibly add new options easily.

Reviewed By: MyDeveloperDay

Differential Revision: https://reviews.llvm.org/D93459

[libc++] Fix extern C for __sanitizer_annotate_contiguous_container() (for gcc)

gcc supports it only at the beginning:

    $ g++ -o /dev/null -c /tmp/test_extern.cpp
    $ cat /tmp/test_extern.cpp
    extern "C" __attribute__ ((__visibility__("default"))) int foo();

Otherwise:

    $ g++ -o /dev/null -c /tmp/test_extern.cpp
    /tmp/test_extern.cpp:1:52: error: expected unqualified-id before string constant
        1 | __attribute__ ((__visibility__("default"))) extern "C" int foo();
          |                                                    ^~~
    $ cat /tmp/test_extern.cpp
    __attribute__ ((__visibility__("default"))) extern "C" int foo();

Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D93316

lld: Replace some lld::outs()s with message()

No behavior change.

[mlir][IR][NFC] Move context/location parameters of builtin Type::get methods to the start of the parameter list

This better matches the rest of the infrastructure, is much simpler, and makes it easier to move these types to being declaratively specified.

Differential Revision: https://reviews.llvm.org/D93432

Revert "Ensure SplitEdge to return the new block between the two given blocks"

This reverts commit d20e0c3444ad9ada550d9d6d1d56fd72948ae444.

[mlir] Partially update the conversion-to-llvm document

This document was not updated after the LLVM dialect type system had been
reimplemented and was using an outdated syntax. Rewrite the part of the
document that concerns type conversion and prepare the ground for splitting it
into a document that explains how built-in types are converted and a separate
document that explains how standard types and functions are converted, which
will better correspond to the fact that built-in types do not belong to the
standard dialect.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D93486

clang-cl: Remove /Zd flag

cl.exe doesn't understand Zd (in either MSVC 2017 or 2019), so neiter
should we. It used to do the same as `-gline-tables-only` which is
exposed as clang-cl flag as well, so if you want this behavior, use
`gline-tables-only`. That makes it clear that it's a clang-cl-only flag
that won't work with cl.exe.

Motivated by the discussion in D92958.

Differential Revision: https://reviews.llvm.org/D93458

[gn build] Link with -Wl,--gdb-index when linking with LLD

For full-debug-info (is_debug=true / symbol_level=2 builds), this makes
linking 15% slower, but gdb startup 1500% faster (for lld: link time
3.9s->4.4s, gdb load time >30s->2s).

For link time, I ran

    bench.py -o {noindex,index}.txt \
        sh -c 'rm out/gn/bin/lld && ninja -C out/gn lld'

and then `ministat noindex.txt index.txt`:

```
x noindex.txt
+ index.txt
    N           Min           Max        Median           Avg        Stddev
x   5      3.784461     4.0200169     3.8452811     3.8754988   0.089902595
+   5       4.32496     4.6058481     4.3361208     4.4141198    0.12288267
Difference at 95.0% confidence
0.538621 +/- 0.15702
13.8981% +/- 4.05161%
(Student's t, pooled s = 0.107663)
```

For gdb load time I loaded the crash in PR48392 with

    gdb -ex r --args ../out/gn/bin/ld64.lld.darwinnew @response.txt

and just stopped the time until the crash got displayed with a stopwatch
a few times. So the speedup there is less precise, but it's so
pronounced that that's ok (loads ~instantly with the patch, takes a very
long time without it).

Only doing this for LLD because I haven't tried it with other linkers.

Differential Revision: https://reviews.llvm.org/D92844

[OpenMP][NFC] Provide a new remark and documentation

If a GPU function is externally reachable we give up trying to find the
(unique) kernel it is called from. This can hinder optimizations. Emit a
remark and explain mitigation strategies.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D93439

[AttrDocs] document always_inline

GNU documentaion for always_inline:
https://gcc.gnu.org/onlinedocs/gcc/Inline.html

GNU documentation for function attributes:
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html

Microsoft documentation for __force_inline:
https://docs.microsoft.com/en-us/cpp/cpp/inline-functions-cpp

Reviewed By: ojeda

Differential Revision: https://reviews.llvm.org/D68410

[mlir][ArmSVE] Add documentation generation

Adds missing cmake command to generate documentation for ArmSVE
Dialect.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D93465

[gn build] (manually) merge f4c8b8031800

[DSE] Add more tests for read clobber location (NFC)

[test] Factor out creation of copy of SCC Nodes into function

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D93434

Fix NDEBUG build after https://reviews.llvm.org/D93005.

Differential Revision: https://reviews.llvm.org/D93480

[NFC][AMDGPU] Reorganize description of scratch handling

Differential Revision: https://reviews.llvm.org/D93440

[mlir][LLVMIR] Add 'llvm.switch' op

The LLVM IR 'switch' instruction allows control flow to be transferred
to one of any number of branches depending on an integer control value,
or a default value if the control does not match any branch values. This patch
adds `llvm.switch` to the MLIR LLVMIR dialect, as well as translation routines
for lowering it to LLVM IR.

To store a variable number of operands for a variable number of branch
destinations, the new op makes use of the `AttrSizedOperandSegments`
trait. It stores its default branch operands as one segment, and all
remaining case branches' operands as another. It also stores pairs of
begin and end offset values to delineate the sub-range of each case branch's
operands. There's probably a better way to implement this, since the
offset computation complicates several parts of the op definition. This is the
approach I settled on because in doing so I was able to delegate to the default
op builder member functions. However, it may be preferable to instead specify
`skipDefaultBuilders` in the op's ODS, or use a completely separate
approach; feedback is welcome!

Another contentious part of this patch may be the custom printer and
parser functions for the op. Ideally I would have liked the MLIR to be
printed in this way:

```
llvm.switch %0, ^bb1(%1 : !llvm.i32) [
1: ^bb2,
2: ^bb3(%2, %3 : !llvm.i32, !llvm.i32)
]
```

The above would resemble how LLVM IR is formatted for the 'switch'
instruction. But I found it difficult to print and parse something like
this, whether I used the declarative assembly format or custom functions.
I also was not sure a multi-line format would be welcome -- it seems
like most MLIR ops do not use newlines. Again, I'd be happy to hear any
feedback here as well, or on any other aspect of the patch.

Differential Revision: https://reviews.llvm.org/D93005

[openmp] Remove clause from OMPKinds.def and use OMP.td info

Remove the OpenMP clause information from the OMPKinds.def file and use the
information from the new OMP.td file. There is now a single source of truth for the
directives and clauses.

To avoid generate lots of specific small code from tablegen, the macros previously
used in OMPKinds.def are generated almost as identical. This can be polished and
possibly removed in a further patch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D92955

[WebAssembly][lld] Don't mark a file live from an undefine symbol

Live symbols should only cause the files in which they are defined
to become live.

For now this is only tested in emscripten: we're continuing
to work on reducing the test case further for an lld-style
unit test.

Differential Revision: https://reviews.llvm.org/D93472

[OpenMP] Add definitions for 5.1 interop to omp.h

scudo: Adjust test to use correct check for primary allocations.

canAllocate() does not take into account the header size so it does
not return the right answer in borderline cases. There was already
code handling this correctly in isTaggedAllocation() so split it out
into a separate function and call it from the test.

Furthermore the test was incorrect when MTE is enabled because MTE
does not pattern fill primary allocations. Fix it.

Differential Revision: https://reviews.llvm.org/D93437

Add brief description of dialects doc section.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D93466

[PowerPC] Rename the vector pair intrinsics and builtins to replace the _mma_ prefix by _vsx_

On PPC, the vector pair instructions are independent from MMA.
This patch renames the vector pair LLVM intrinsics and Clang builtins to replace the _mma_ prefix by _vsx_ in their names.
We also move the vector pair type/intrinsic/builtin tests to their own files.

Differential Revision: https://reviews.llvm.org/D91974

[scudo][standalone] Allow the release of smaller sizes

Initially we were avoiding the release of smaller size classes due to
the fact that it was an expensive operation, particularly on 32-bit
platforms. With a lot of batches, and given that there are a lot of
blocks per page, this was a lengthy operation with little results.

There has been some improvements since then to the 32-bit release,
and we still have some criterias preventing us from wasting time
(eg, 9x% free blocks in the class size, etc).

Allowing to release blocks < 128 bytes helps in situations where a lot
of small chunks would not have been reclaimed if not for a forced
reclaiming.

Additionally change some `CHECK` to `DCHECK` and rearrange a bit the
code.

I didn't experience any regressions in my benchmarks.

Differential Revision: https://reviews.llvm.org/D93141

Add call site location getter to C API

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D93334

[gn build] Port dae34463e3e

[IRSim][IROutliner] Adding the extraction basics for the IROutliner.

Extracting the similar regions is the first step in the IROutliner.

Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
sort them by how many instructions will be removed. Each
IRSimilarityCandidate is used to define an OutlinableRegion. Each
region is ordered by their occurrence in the Module and the regions that
are not compatible with previously outlined regions are discarded.

Each region is then extracted with the CodeExtractor into its own
function.

We test that correctly extract in:
test/Transforms/IROutliner/extraction.ll
test/Transforms/IROutliner/address-taken.ll
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Recommit of bf899e891387d07dfd12de195ce2a16f62afd5e0 fixing memory
leaks.

Reviewers: paquette, jroelofs, yroux

Differential Revision: https://reviews.llvm.org/D86975

[gn build] Add symbol_level to adjust debug info level

is_debug by default makes symbol_level = 2 and !is_debug means by
default symbol_level = 0.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D92958

[LangRef] Update new ssp/sspstrong/sspreq semantics after D91816

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D93422

[libc][Obvious] Fix typo is wrappergen unittest.

[lldb] [Process/FreeBSDRemote] Use RegSetKind consistently [NFC]

Use RegSetKind enum for register sets everything, rather than int.
Always spell it as 'RegSetKind', without unnecessary 'enum'. Add
missing switch case. While at it, use uint32_t for regnums
consistently.

Differential Revision: https://reviews.llvm.org/D93450

[lldb] [Process/FreeBSDRemote] Replace GetRegisterSetCount()

Replace the wrong code in GetRegisterSetCount() with a constant return.
The original code passed register index in place of register set index,
effectively getting always true. Correcting the code to check for
register set existence is not possible as LLDB supports only eliminating
last register sets. Just return the full number for now which should
be NFC.

Differential Revision: https://reviews.llvm.org/D93396

[libc] Add python3 to libc buildbot depedencies.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D93463

[libc] Refactor WrapperGen to make the flow cleaner.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D93417

Revert "[lldb] Make CommandInterpreter's execution context the same as debugger's one."

This reverts commit a01b26fb51c710a3a8ef88cc83b0701461f5b9ab, because it
breaks the "finish" command in some way -- the command does not
terminate after it steps out, but continues running the target. The
exact blast radius is not clear, but it at least affects the usage of
the "finish" command in TestGuiBasicDebug.py. The error is *not*
gui-related, as the same issue can be reproduced by running the same
steps outside of the gui.

There is some kind of a race going on, as the test fails only 20% of the
time on the buildbot.

Detect section type conflicts between functions and variables

If two variables are declared with __attribute__((section(name))) and
the implicit section types (e.g. read only vs writeable) conflict, an
error is raised. Extend this mechanism so that an error is raised if the
section type implied by a function's __attribute__((section)) conflicts
with that of another variable.

[flang][openacc] Enforce restriction on routine directive and clauses

This patch add some checks for the restriction on the routine directive
and fix several issue at the same time.

Validity tests have been added in a separate file than acc-clause-validity.f90 since this one
became quite large. I plan to split the larger file once on-going review are done.

Reviewed By: sameeranjoshi

Differential Revision: https://reviews.llvm.org/D92672

[DebugInfo] Avoid re-ordering assignments in LCSSA

The LCSSA pass makes use of a function insertDebugValuesForPHIs() to
propogate dbg.value() intrinsics to newly inserted PHI instructions. Faulty
behaviour occurs when the parent PHI of a newly inserted PHI is not the
most recent assignment to a source variable. insertDebugValuesForPHIs ends
up propagating a value that isn't the most recent assignemnt.

This change removes the call to insertDebugValuesForPHIs() from LCSSA,
preventing incorrect dbg.value intrinsics from being propagated.
Propagating variable locations between blocks will occur later, during
LiveDebugValues.

Differential Revision: https://reviews.llvm.org/D92576

[PowerPC][NFC] Cleanup PPCCTRLoopsVerify pass

The PPCCTRLoop pass has been moved to HardwareLoops,
so the comments and some useless code are deprecated now.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D93336

[amdgpu] Default to code object v3

[amdgpu] Default to code object v3
v4 is not yet readily available, and doesn't appear
to be implemented in the back end

Reviewed By: t-tye, yaxunl

Differential Revision: https://reviews.llvm.org/D93258

[mlir][spirv] NFC: Shuffle code around to better follow convention

This commit shuffles SPIR-V code around to better follow MLIR
convention. Specifically,

* Created IR/, Transforms/, Linking/, and Utils/ subdirectories and
  moved suitable code inside.
* Created SPIRVEnums.{h|cpp} for SPIR-V C/C++ enums generated from
  SPIR-V spec. Previously they are cluttered inside SPIRVTypes.{h|cpp}.
* Fixed include guards in various header files (both .h and .td).
* Moved serialization tests under test/Target/SPIRV.
* Renamed TableGen backend -gen-spirv-op-utils into -gen-spirv-attr-utils
  as it is only generating utility functions for attributes.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D93407

Ensure SplitEdge to return the new block between the two given blocks

This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.

Below is an example of splitting the block bb1 at its first instruction.

/// Original IR
bb0:
br bb1
bb1:
        %0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
br bb1
bb1:
br bb1.split
bb1.split:
        %0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
br bb1.split
bb1.split
br bb1
bb1:
        %0 = mul i32 1, 2
br bb2
bb2:

Differential Revision: https://reviews.llvm.org/D92200

[llvm-symbolizer][Windows] Add start line when searching in line table sections.

Fixes issue where if a line section doesn't start with a line number
then the addresses at the beginning of the section don't have line numbers.

For example, for a line section like this
```
0001:00000010-00000014, line/column/addr entries = 1
7 00000013 !
```
a line number wouldn't be found for addresses from 10 to 12.

This matches behavior when using the DIA SDK.

Differential Revision: https://reviews.llvm.org/D93306

[SampleFDO] Fix uninitialized field warnings. NFCI.

Seems to have been caused by D93254 which added the SecHdrTableEntry::LayoutIndex field.

[flang][openacc] Update serial construct clauses for OpenACC 3.1

Update the allowed clauses for the SERIAL construct for the new OpenACC 3.1
specification.

Reviewed By: sameeranjoshi

Differential Revision: https://reviews.llvm.org/D92123

[Clang] Make nomerge attribute a function attribute as well as a statement attribute.

Differential Revision: https://reviews.llvm.org/D92800

[TableGen] Return const std::string& in InstrMap getName()/getFilterClass() methods. NFCI.

Avoid temp std::string instances - we're never keeping these around, just printing them to streams, converting to StringRef etc.

[InstCombine] Preserve !annotation on newly created instructions.

If the source instruction has !annotation metadata, all instructions
created during combining should also have it. Tell the builder to
add it.

The !annotation system was discussed on llvm-dev as part of
'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)

This patch is based on an earlier patch by Francis Visoiu Mistrih.

Reviewed By: thegameg, lebedev.ri

Differential Revision: https://reviews.llvm.org/D91444

[ARM][MachineOutliner] Fix costs model.

Fix candidates calls costs models allocation and prepare stack fixups
handling.

Differential Revision: https://reviews.llvm.org/D92933

[TableGen] Make InstrMap::getFilterClass() const. NFCI.

Reported by cppcheck.

I've run clang-format across all the InstrMap accessors as well.

Fix dead link

Remove Python2 fallback and only advertise Python3 in the doc

Differential Revision: https://www.youtube.com/watch?v=RsL0cipURA0

[lld] [ELF] AArch64: Handle DT_AARCH64_VARIANT_PCS

As indicated by AArch64 ELF specification, symbols with st_other
marked with STO_AARCH64_VARIANT_PCS indicates it may follow a variant
procedure call standard with different register usage convention
(for instance SVE calls).

Static linkers must preserve the marking and propagate it to the dynamic
symbol table if any reference or definition of the symbol is marked with
STO_AARCH64_VARIANT_PCS, and add a DT_AARCH64_VARIANT_PCS dynamic tag if
there are R_<CLS>_JUMP_SLOT relocations that reference that symbols.

It implements https://bugs.llvm.org/show_bug.cgi?id=48368.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D93045

[llvm-readobj/elf] - AArch64: Handle AARCH64_VARIANT_PCS for GNUStyle

It mimics the GNU readelf where it prints a [VARIANT_PCS] for symbols
with st_other with STO_AARCH64_VARIANT_PCS.

Reviewed By: grimar, MaskRay

Differential Revision: https://reviews.llvm.org/D93044

[obj2yaml][yaml2obj] - Add AArch64 STO_AARCH64_VARIANT_PCS support

Reviewed By: grimar, MaskRay

Differential Revision: https://reviews.llvm.org/D93235

[clang-tidy][NFC] Reduce copies of Intrusive..FileSystem

Swapped a few instances where a move is more optimal or the target doesn't need to hold a reference.

[SimplifyCFG] Preserve !annotation in FoldBranchToCommonDest.

When folding a branch to a common destination, preserve !annotation on
the created instruction, if the terminator of the BB that is going to be
removed has !annotation. This should ensure that !annotation is attached
to the instructions that 'replace' the original terminator.

Reviewed By: jdoerfert, lebedev.ri

Differential Revision: https://reviews.llvm.org/D93410

[InstCombine] Remove scalable vector restriction in InstCombineCasts

Differential Revision: https://reviews.llvm.org/D93389

[lld-macho] Use LC_LOAD_WEAK_DYLIB for dylibs with only weakrefs

Note that dylibs without *any* refs will still be loaded in the usual
(strong) fashion.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D93435

[lld-macho] Add support for weak references

Weak references need not necessarily be satisfied at runtime (but they must
still be satisfied at link time). So symbol resolution still works as per usual,
but we now pass around a flag -- ultimately emitting it in the bind table -- to
indicate if a given dylib symbol is a weak reference.

ld64's behavior for symbols that have both weak and strong references is
a bit bizarre. For non-function symbols, it will emit a weak import. For
function symbols (those referenced by BRANCH relocs), it will emit a
regular import. I'm not sure what value there is in that behavior, and
since emulating it will make our implementation more complex, I've
decided to treat regular weakrefs like function symbol ones for now.

Fixes PR48511.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D93369

[ARM] Adding v8.7-A command-line support for the ARM target

This extends the command-line support for the 'armv8.7-a' architecture
name to the ARM target.

Based on a patch written by Momchil Velikov.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D93231

[ARM][AAarch64] Initial command-line support for v8.7-A

This introduces command-line support for the 'armv8.7-a' architecture name
(and an alias without the '-', as usual), and for the 'ls64' extension name.

Based on patches written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91776

[AArch64] Adding the v8.7-A LD64B/ST64B Accelerator extension

This adds support for the v8.7-A LD64B/ST64B Accelerator extension
through a subtarget feature called "ls64". It adds four 64-byte
load/store instructions with an operand in the new GPR64x8 register
class, and one system register that's part of the same extension.

Based on patches written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91775

[AArch64] Add a GPR64x8 register class

This adds a GPR64x8 register class that will be needed as the data
operand to the LD64B/ST64B family of instructions in the v8.7-A
Accelerator Extension, which load or store a contiguous range of eight
x-regs. It has to be its own register class so that register allocation
will have visibility of the full set of registers actually read/written
by the instructions, which will be needed when we add intrinsics and/or
inline asm access to this piece of architecture.

Patch written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91774

[ARM][AArch64] Adding basic support for the v8.7-A architecture

This introduces support for the v8.7-A architecture through a new
subtarget feature called "v8.7a". It adds two new "WFET" and "WFIT"
instructions, the nXS limited-TLB-maintenance qualifier for DSB and TLBI
instructions, a new CPU id register, ID_AA64ISAR2_EL1, and the new
HCRX_EL2 system register.

Based on patches written by Simon Tatham and Victor Campos.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91772

[NFC][AArch64] Capturing multiple feature requirements in AsmParser messages

This enables the capturing of multiple required features in the AArch64
AsmParser's SysAlias error messages.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D92388

[NFC][AArch64] Move AArch64 MSR/MRS into a new decoder namespace

This removes the general forms of the AArch64 MSR and MRS instructions
from the same decoding table that contains many more specific
instructions that supersede them. They're now in a separate decoding
table of their own, called "Fallback", which is only consulted in the
event of the main decoder table failing to produce an answer.

This should avoid decoding conflicts on future specialized instructions
in the MSR space.

Patch written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91771

[IRBuilder] Generalize debug loc handling for arbitrary metadata.

This patch extends IRBuilder to allow adding/preserving arbitrary
metadata on created instructions.

Instead of using references to specific metadata nodes (like DebugLoc),
IRbuilder now keeps a vector of (metadata kind, MDNode *) pairs, which
are added to each created instruction.

The patch itself is a NFC and only moves the existing debug location
handling over to the new system. In a follow-up patch it will be used to
preserve !annotation metadata besides !dbg.

The current approach requires iterating over MetadataToCopy to avoid
adding duplicates, but given that the number of metadata kinds to
copy/preserve is going to be very small initially (0, 1 (for !dbg) or 2
(!dbg and !annotation)) that should not matter.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D93400

[libc] revamp memory function benchmark

The benchmarking infrastructure can now run in two modes:
- Sweep Mode: which generates a ramp of size values (same as before),
- Distribution Mode: allows the user to select a distribution for the size paramater that is representative from production.

The analysis tool has also been updated to handle both modes.

Differential Revision: https://reviews.llvm.org/D93210

[lldb-vscode] Speculative fix for raciness in TestVSCode_attach

The test appears to expect the inferior to be stopped, but the custom
"attach commands" leave it in a running state.

It's unclear how this could have ever worked.

[mlir] Fix syntax error in markdown documentation

[lldb] [unittests] Filter FreeBSD through CMake rather than #ifdef

[lldb] [unittests] Add tests for NetBSD register offsets/sizes

Differential Revision: https://reviews.llvm.org/D93299

[LV] Disable epilogue vectorization for scalable VFs

Epilogue vectorization doesn't support scalable vectorization factors
yet, disable it for now.

Reviewed By: sdesmalen, bmahjour

Differential Revision: https://reviews.llvm.org/D93063

[DebugInfo] Fix MSVC build by adding back necessary reverse_iterator != operator

Put back the std::reverse_iterator<DWARFDie::iterator> != operator that was removed in D78938 to fix VS2019 builds

[clangd] Add llvm:: qualifier to work around GCC bug. NFC

Some old GCC versions seem to miss the default template parameter when
using the clang/Basic/LLVM.h forward declarations of SmallVector.

See D92788

[AArch64] Renamed sve-masked-scatter-legalise.ll. NFC.

[libcxx] Remove ifdefs in the message to static_assert. NFC.

Differential Revision: https://reviews.llvm.org/D93283

[mlir] Move LLVM Dialect Op documentation to ODS

This was long overdue. The initial documentation for the LLVM dialect was
introduced before ODS had support for long descriptions. This is now possible,
so the documentation is moved to ODS, which can serve as a single source of
truth. The high-level description of the dialect structure is updated to
reflect that.

Depends On: D93315

Reviewed By: rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D93425

[mlir] partially update LLVM dialect documentation

Rewrite the parts of the documentation that became stale: context/module
handling and type system. Expand the type system description.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D93315

[SVE][CodeGen] Add bfloat16 support to scalable masked gather

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D93307

[NFC] Reduce include files dependency and AA header cleanup (part 2).

Continuing work started in https://reviews.llvm.org/D92489:

Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h".

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92852

[X86] Remove extract_subvector(subv_broadcast_load()) fold.

This was needed in an earlier version of D92645, but isn't now - and I've just noticed that it was potentially flawed depending on the relevant widths of the broadcasted and extracted subvectors.

fix a -Wunused-variable warning in release build

[lldb] Add std::array to the supported template list of the CxxModuleHandler

Identical to the other patches that add STL containers to the supported
templated list.

Make LLVM build in C++20 mode

Part of the <=> changes in C++20 make certain patterns of writing equality
operators ambiguous with themselves (sorry!).
This patch goes through and adjusts all the comparison operators such that
they should work in both C++17 and C++20 modes. It also makes two other small
C++20-specific changes (adding a constructor to a type that cases to be an
aggregate, and adding casts from u8 literals which no longer have type
const char*).

There were four categories of errors that this review fixes.
Here are canonical examples of them, ordered from most to least common:

// 1) Missing const
namespace missing_const {
    struct A {
    #ifndef FIXED
        bool operator==(A const&);
    #else
        bool operator==(A const&) const;
    #endif
    };

    bool a = A{} == A{}; // error
}

// 2) Type mismatch on CRTP
namespace crtp_mismatch {
    template <typename Derived>
    struct Base {
    #ifndef FIXED
        bool operator==(Derived const&) const;
    #else
        // in one case changed to taking Base const&
        friend bool operator==(Derived const&, Derived const&);
    #endif
    };

    struct D : Base<D> { };

    bool b = D{} == D{}; // error
}

// 3) iterator/const_iterator with only mixed comparison
namespace iter_const_iter {
    template <bool Const>
    struct iterator {
        using const_iterator = iterator<true>;

        iterator();

        template <bool B, std::enable_if_t<(Const && !B), int> = 0>
        iterator(iterator<B> const&);

    #ifndef FIXED
        bool operator==(const_iterator const&) const;
    #else
        friend bool operator==(iterator const&, iterator const&);
    #endif
    };

    bool c = iterator<false>{} == iterator<false>{} // error
          || iterator<false>{} == iterator<true>{}
          || iterator<true>{} == iterator<false>{}
          || iterator<true>{} == iterator<true>{};
}

// 4) Same-type comparison but only have mixed-type operator
namespace ambiguous_choice {
    enum Color { Red };

    struct C {
        C();
        C(Color);
        operator Color() const;
        bool operator==(Color) const;
        friend bool operator==(C, C);
    };

    bool c = C{} == C{}; // error
    bool d = C{} == Red;
}

Differential revision: https://reviews.llvm.org/D78938

[X86] Add X86ISD::SUBV_BROADCAST_LOAD and begin removing X86ISD::SUBV_BROADCAST (PR38969)

Subvector broadcasts are only load instructions, yet X86ISD::SUBV_BROADCAST treats them more generally, requiring a lot of fallback tablegen patterns.

This initial patch replaces constant vector lowering inside lowerBuildVectorAsBroadcast with direct X86ISD::SUBV_BROADCAST_LOAD loads which helps us merge a number of equivalent loads/broadcasts.

As well as general plumbing/analysis additions for SUBV_BROADCAST_LOAD, I needed to wrap SelectionDAG::makeEquivalentMemoryOrdering so it can handle result chains from non generic LoadSDNode nodes.

Later patches will continue to replace X86ISD::SUBV_BROADCAST usage.

Differential Revision: https://reviews.llvm.org/D92645

[libc] add back math.h #include utils/FPUtil/ManipulationFunctions.h

This partially reverts cee1e7d14f4628d6174b33640d502bff3b54ae45:
  [libc][NFC][Obvious] Remove few unnecessary #include directives in tests.

That commit causes a test failure in our configuration:
[ RUN      ] ILogbTest.SpecialNumbers_ilogb
third_party/llvm/llvm-project/libc/test/src/math/ILogbTest.h:28: FAILURE
      Expected: FP_ILOGBNAN
      Which is: 2147483647
To be equal to: func(__llvm_libc::fputil::FPBits<T>::buildNaN(1))
      Which is: -2147483648