review.tizen.org Git - platform/upstream/llvm.git/log

[InstCombine] avoid 'tmp' usage in test files; NFC

The update script ( utils/update_test_checks.py ) warns against this.

[InstCombine] avoid 'tmp' usage in test file; NFC

The update script ( utils/update_test_checks.py ) warns against this.

Revert "Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration""

This reverts commit 43d2e51c2e86788b9e2a582fdd3d8ffa7829328a.

Committed the wrong version.

Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration"

The patch was reverted due to compile time impact of contextual SCEV
queries. It also appeared that it introduced a miscompile on irreducible CFG.

Changes made:
1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate;
2. Irreducible CFG in live code is now detected and excluded from processing.

Differential Revision: https://reviews.llvm.org/D102615

[mlir] Fold complex.create(complex.re(op), complex.im(op))

Differential Revision: https://reviews.llvm.org/D103148

[AArch64] Generate LD1 for anyext i8 or i16 vector load

The existing LD1 patterns do not cover cases where the result type does
not match the memory type. This happens when illegal vector types are
extended and scalarized, for example:

  load <2 x i16>* %v2i16

is lowered into:

  // first element
  (v4i32 (insert_subvector (v2i32 (scalar_to_vector (load anyext from i16)))))
  // other elements
  (v4i32 (insert_vector_elt (i32 (load anyext from i16)) idx))

Before this patch these patterns were compiled into LDR + INS.
Now they are compiled into LD1.

The problem was reported in
PR24820: LLVM Generates abysmal code in simple situation.

Differential Revision: https://reviews.llvm.org/D102938

[Test] Add Loop Deletion test with irreducible CFG

Authored by Mikael Holmén. It demonstrated a miscompile on irreducible
CFG with patch "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration".
The patch is reverted. Checking in the test to make sure this bug
does not return.

[OpenCL] Include header for atomic-ops test

Avoid duplicating the memory_order and memory_scope enum definitions.

[MC] Move elf-unique-sections-by-flags.ll to X86/

[Docs] Updated the content of getting started documentation under llvm/lib/MC

Wrote about llvm/lib/MC subproject on https://llvm.org/docs/GettingStarted.html page.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D101047

[MC][ELF] Emit unique sections for different flags

Global values imply flags such as readable, writable, executable for the
sections that they will be placed in. Currently MC places all such
entries into the same section, using the first set of flags seen. This
can lead to situations in LTO where a writable global is placed in the
same named section as a readable global from another file, and the
section may not be marked writable.

D72194 ensures that mergeable globals with explicit sections are placed
in separate sections with compatible entry size, by emitting the
`unique` assembly syntax where appropriate. This change extends that
approach to include section flags, so that globals with different
section flags are emitted in separate unique sections.
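
One way to picture the change is that the unique ID is keyed on the section flags as well as the section name. The sketch below is an illustration only, not MC's code: `UniqueSectionIDs`, `get`, and the constant names are hypothetical, although the two flag values match SHF_WRITE and SHF_ALLOC from the ELF specification.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// ELF section flag values (SHF_WRITE and SHF_ALLOC in the ELF specification).
constexpr uint32_t kShfWrite = 0x1;
constexpr uint32_t kShfAlloc = 0x2;

// Hypothetical helper: hands out one ID per distinct (name, flags) pair, so
// two globals that share a section name but imply different flags no longer
// land in the same section.
class UniqueSectionIDs {
  std::map<std::pair<std::string, uint32_t>, unsigned> ids;

public:
  unsigned get(const std::string &name, uint32_t flags) {
    auto key = std::make_pair(name, flags);
    auto it = ids.find(key);
    if (it != ids.end())
      return it->second; // same name and flags: reuse the section
    unsigned id = static_cast<unsigned>(ids.size());
    ids.emplace(key, id); // new combination: new unique section
    return id;
  }
};
```

With this keying, a read-only and a writable global both placed in `.mysec` get different IDs, while a second read-only global reuses the first ID.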

Differential revision: https://reviews.llvm.org/D100944

[MC][NFCI] Factor out ELF section unique ID calculation

Precursor to D100944. The logic for determining the unique ID had become
quite difficult to reason about, so I have factored this out into a
separate function.

Differential Revision: https://reviews.llvm.org/D102336

[AMDGPU][Libomptarget] Inline atmi_init/atmi_finalize

After D102847, these functions can be inlined.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103075

[AMDGPU][Libomptarget] Delete g_atmi_initialized

This patch drops g_atmi_initialized and inlines the Initialize &
Finalize methods from Runtime class.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102847

[lldb][NFC] Use C++ versions of the deprecated C standard library headers

The C headers are deprecated, so as requested in D102845 this replaces them
all with their (non-deprecated) C++ equivalents.
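
A minimal sketch of the kind of substitution described, assuming a file that used `<stdio.h>` and `<string.h>`; the function `formatLength` is hypothetical and only exists to exercise the headers.

```cpp
// Before (C-compatibility headers):
//   #include <stdio.h>
//   #include <string.h>
// After (C++ equivalents, which declare the names in namespace std):
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper that formats the length of a C string as text.
std::string formatLength(const char *s) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%zu", std::strlen(s));
  return buf;
}
```

Calling `formatLength("lldb")` yields the string "4".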

Reviewed By: shafik

Differential Revision: https://reviews.llvm.org/D103084

[X86][SLM] Fix vector PSHUFB + variable shift resource/throughputs

Match what's documented in the Intel AOM (+Agner) - PSHUFB xmm is really slow, and mmx/xmm vector shifts are half rate.

Noticed while working to get the cost tables to more closely match llvm-mca analysis, in this case for shifts and truncations.

[SCEV] Add tests with signed predicates for applyLoopGuards.

[AMDGPU][Libomptarget] Move Kernel/Symbol info tables to RTLDeviceInfoTy

Two globals KernelInfoTable & SymbolInfoTable are moved
into RTLDeviceInfoTy class.
This builds on top of D102691.
[2/2]

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102692

[NFC] Add CHECK lines for unordered FP reductions

An additional RUN line has been added to both strict-fadd.ll &
scalable-strict-fadd.ll to ensure the correct behaviour of these
tests where `-enable-strict-reductions` is false.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D103015

[AMDGPU][GlobalISel] Stop foldInsertEltToCmpSelect from changing reg banks

This function can change regbank for registers which already have a selected
bank. Depending on the instruction where these registers were used it can
cause instruction selection to fail.

Differential Revision: https://reviews.llvm.org/D98515

Revert "[AMDGPU][GlobalISel] Stop foldInsertEltToCmpSelect from changing reg banks"

This reverts commit 18c5444702893fd63b0a99ec7133dd714284f9d2.

[RISCV] Pre-commit fixed-length mask vselect tests

These are default-expanded but later unrolled due to RISC-V's vector
boolean content policy. A patch to improve this codegen will follow
shortly.

[Test] Add simplified versions of tests for loop deletion that don't need context

AArch64: support post-indexed stores to bfloat types.

[CostModel][X86] Remove old testshift* tests

The vector shift cost tests are better covered (more cpu/sse levels) by the vshift-*-*cost files, and we're trying to avoid codegen tests in here as it makes it harder to maintain the test files.

[X86][Atom] Fix vector variable shift resource/throughputs

Match what's documented in the Intel AOM - the non-immediate variants of the PSLL*/PSRA*/PSRL* shift instructions require BOTH ports - this was being incorrectly modelled as EITHER port.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.

[Test] Add test on unrolling to make sure it won't fail

Initially it failed an assertion with "Do actual DCE in LoopUnroll (try 2)"
which was later reverted. Make sure that when this patch is returned, the
test works fine.

[NFC][X86] clang-format X86TTIImpl::getInterleavedMemoryOpCostAVX2()

I plan to make changes to it, and undoing formatting each time is not going to be fun.

Fix warning introduced by 9c766f4090d19e3e2f56e87164177f8c3eba4b96

[HIP] Adjust check in hip-include-path.hip test case

The changes in commit 722c39fef5ab6 caused the test case to fail
when building with -DLLVM_LIBDIR_SUFFIX=64. This patch makes the
checks a bit more relaxed to support libdir suffixes again.

Also adjusting the regular expressions to avoid matches including
double quotes.

[mlir] LocalAliasAnalysis: Assume allocation scope to function scope if cannot determine better

It helps when checking aliasing between AllocOp result and function arguments.

Differential Revision: https://reviews.llvm.org/D102557

[mlir] Simplify folding code (NFC)

[InstCombine] Fold extractelement + vector GEP with one use

We sometimes see code like this:

Case 1:
  %gep = getelementptr i32, i32* %a, <2 x i64> %splat
  %ext = extractelement <2 x i32*> %gep, i32 0

or this:

Case 2:
  %gep = getelementptr i32, <4 x i32*> %a, i64 1
  %ext = extractelement <4 x i32*> %gep, i32 0

where there is only one use of the GEP. In such cases it makes
sense to fold the two together such that we create a scalar GEP:

Case 1:
  %ext = extractelement <2 x i64> %splat, i32 0
  %gep = getelementptr i32, i32* %a, i64 %ext

Case 2:
  %ext = extractelement <4 x i32*> %a, i32 0
  %gep = getelementptr i32, i32* %ext, i64 1

This may create further folding opportunities as a result, i.e.
the extract of a splat vector can be completely eliminated. Also,
even for the general case where the vector operand is not a splat
it seems beneficial to create a scalar GEP and extract the scalar
element from the operand. Therefore, in this patch I've assumed
that a scalar GEP is always preferable to a vector GEP and have
added code to unconditionally fold the extract + GEP.

I haven't added folds for the case when we have both a vector of
pointers and a vector of indices, since this would require
generating an additional extractelement operation.

Tests have been added here:

  Transforms/InstCombine/gep-vector-indices.ll

Differential Revision: https://reviews.llvm.org/D101900

[mlir] Fold complex.re(complex.create) and complex.im(complex.create)

This extends the folding we already have. A test needs to be adjusted.

Differential Revision: https://reviews.llvm.org/D103141

[NFC][object] Change the input parameter of the method isDebugSection.

Summary: This is an NFC patch to change the input parameter of the method SectionRef::isDebugSection(), by replacing the StringRef SectionName with DataRefImpl Sec. This allows us to determine if a section is debug type in more ways than just by section name.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D102601

[ARM] Add patterns for vmulh

Now that vmulh can be selected, this adds the MVE patterns to make it
legal and generate instructions.

Differential Revision: https://reviews.llvm.org/D88011

[clang-format][NFC] correctly sort StatementAttributeLike-macros' IO.map

[gn build] Port 36d0fdf9ac3b

[libcxx][iterator] adds `std::ranges::advance`

Implements part of P0896 'The One Ranges Proposal'.
Implements [range.iter.op.advance].

Differential Revision: https://reviews.llvm.org/D101922

[OpaquePtr] Make atomicrmw work with opaque pointers

FullTy is only necessary when we need to figure out what type an
instruction works with given a pointer's pointee type. However, we just
end up using the value operand's type, so FullTy isn't necessary.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102788

Revert "[lldb] Avoid format string in LLDB_SCOPED_TIMER"

Right after pushing, I remembered that this was added to silence a GCC
warning (https://reviews.llvm.org/D99120). This reverts my patch and
adds a comment.

[lldb] Avoid format string in LLDB_SCOPED_TIMER

Pass LLVM_PRETTY_FUNCTION directly for the no-argument macro.

[LTT] Handle merged llvm.assume when dropping type tests

When the lower type test pass is invoked a second time with
DropTypeTests set to true, it expects that all remaining type tests feed
assume instructions, which are removed along with the type tests.

In some cases the llvm.assume might have been merged with another one,
i.e. from a builtin_assume instruction, in which case the type test
would actually feed a phi that in turn feeds the merged assume
instruction. In this case we can simply replace that operand of the phi
with "true" before removing the type test.

Differential Revision: https://reviews.llvm.org/D103073

[OpaquePtr] Create new bitcode encoding for atomicrmw

Since the opaque pointer type won't contain the pointee type, we need to
separately encode the value type for an atomicrmw.

Emit this new code for atomicrmw.

Handle this new code and the old one in the bitcode reader.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D103123

[sanitizer] Let glibc aarch64 use O(1) GetTls

The generic approach can still be used by musl and FreeBSD. Note: on glibc
2.31, TLS_PRE_TCB_SIZE is 0x700, larger than ThreadDescriptorSize() by 16, but
this is benign: as long as the range includes pthread::{specific_1stblock,specific}
pthread_setspecific will not cause false positives.

Note: the state before afec953857ffd682cb4119e7950f3593efbaaa81 underestimated
the TLS size a lot (nearly ThreadDescriptorSize() = 1776).
That may explain why afec953857ffd682cb4119e7950f3593efbaaa81 actually made some
tests pass.

LLVM Detailed IR tests for introduction of flag -fsanitize-address-detect-stack-use-after-return-mode.

Rework all tests that interact with use after return to correctly handle the case where the mode has been explicitly set to Never or Always.

for issue: https://github.com/google/sanitizers/issues/1394

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D102462

[benchmark] Silence 'suggest override' and 'missing override' warnings

When building with Clang 11 on Windows, silence the following:

F:\aganea\llvm-project\llvm\utils\benchmark\include\benchmark/benchmark.h(955,8): warning: 'Run' overrides a member function but is not marked 'override' [-Wsuggest-override]
  void Run(State& st);
       ^
F:\aganea\llvm-project\llvm\utils\benchmark\include\benchmark/benchmark.h(895,16): note: overridden virtual function is here
  virtual void Run(State& state) = 0;
               ^
1 warning generated.
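
For context, the usual source-level way to address this class of warning is to mark the overriding function `override`. The sketch below illustrates that pattern and is not necessarily what this commit does; `Fixture`, `MyBench`, and `runThroughBase` are hypothetical names, not the benchmark library's types.

```cpp
#include <string>

// Hypothetical base class with a pure virtual entry point.
struct Fixture {
  virtual ~Fixture() = default;
  virtual std::string Run() = 0;
};

struct MyBench : Fixture {
  // Without 'override', Clang's -Wsuggest-override reports that this
  // function overrides a virtual member but is not marked as such.
  std::string Run() override { return "ran"; }
};

// Dispatches through the base class to show the override is picked up.
std::string runThroughBase(Fixture &f) { return f.Run(); }
```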

[gcov] Silence warning: comparison of integers of different signs

When building with Clang 11 on Windows, silence the following:

[432/5643] Building C object projects\compiler-rt\lib\profile\CMakeFiles\clang_rt.profile-x86_64.dir\GCDAProfiling.c.obj
F:\aganea\llvm-project\compiler-rt\lib\profile\GCDAProfiling.c(464,13): warning: comparison of integers of different signs: 'uint32_t' (aka 'unsigned int') and 'int' [-Wsign-compare]
if (val != (gcov_version >= 90 ? GCOV_TAG_OBJECT_SUMMARY
~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.

[NFC][MLIR][TOSA] Replaced tosa linalg.indexed_generic lowerings with linalg.index

Indexed Generic should be going away in the future. Migrate to linalg.index.

Reviewed By: NatashaKnk, nicolasvasilache

Differential Revision: https://reviews.llvm.org/D103110

[NFC][SCUDO] Fix unittest for -gtest_repeat=10

Reviewed By: cryptoad

Differential Revision: https://reviews.llvm.org/D103122

[MLIR Core] Cache the empty StringAttr like we do for empty dictionaries. NFC.

MLIRContext holds a few special case values that occur frequently like empty
dictionary and NoneType, which allow us to avoid taking locks to get an instance
of them. Give the empty StringAttr this treatment as well. This cuts several
percent off compile time for CIRCT.

Differential Revision: https://reviews.llvm.org/D103117

[Toy] Update tests to pass with top-down canonicalize pass. NFC

[libomptarget][nfc] Move hostcall required test to rtl

[libomptarget][nfc] Move hostcall required test to rtl

Remove a global, fix minor race. First of N patches to bring up hostcall.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103058

[libc++] Install GCC 11 on CI builders

[ARM] Extra predicated tests for VMULH. NFC

[Internalize] Rename instead of removal if a to-be-internalized comdat has more than one member

Besides the `comdat any` deduplication feature, instrumentations use comdat to
establish dependencies among a group of sections, to prevent section based
linker garbage collection from discarding some members without discarding all.
LangRef acknowledges this usage with the following wording:

> All global objects that specify this key will only end up in the final object file if the linker chooses that key over some other key.

On ELF, for PGO instrumentation, a `__llvm_prf_cnts` section and its associated
`__llvm_prf_data` section are placed in the same GRP_COMDAT group. A
`__llvm_prf_data` is usually not referenced and expects the liveness of its
associated `__llvm_prf_cnts` to retain it.

The `setComdat(nullptr)` code (added by D10679) in InternalizePass can break the
use case (a `__llvm_prf_data` may be dropped with its associated `__llvm_prf_cnts` retained).
The main goal of this patch is to fix the dependency relationship.

I think it makes sense for InternalizePass to internalize a comdat and thus
suppress the deduplication feature, e.g. a relocatable link of a regular LTO can
create an object file affected by InternalizePass.
If a non-internal comdat in a.o is prevailed by an internal comdat in b.o, the
a.o references to the comdat definitions will be non-resolvable (references
cannot bind to STB_LOCAL definitions in b.o).

On PE-COFF, for a non-external selection symbol, deduplication is naturally
suppressed with link.exe and lld-link. However, this is fuzzy on ELF and I tend
to believe the spec creator has not thought about this use case (see D102973).

GNU ld and gold are still using the "signature is name based" interpretation.
So even if D102973 for ld.lld is accepted, for portability, a better approach is
to rename the comdat. A comdat with one single member is the common case,
leaving the comdat can waste (sizeof(Elf64_Shdr)+4*2) bytes, so we optimize by
deleting the comdat; otherwise we rename the comdat.
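
The (sizeof(Elf64_Shdr)+4*2) figure can be worked out as below. This is a sketch under two assumptions: `Shdr64` is a local mirror of the `Elf64_Shdr` layout rather than the system header's type, and the `4*2` is read as the group section's 4-byte flag word plus one 4-byte member index.

```cpp
#include <cstddef>
#include <cstdint>

// Field-for-field mirror of Elf64_Shdr (ELF-64 section header).
struct Shdr64 {
  uint32_t sh_name;
  uint32_t sh_type;
  uint64_t sh_flags;
  uint64_t sh_addr;
  uint64_t sh_offset;
  uint64_t sh_size;
  uint32_t sh_link;
  uint32_t sh_info;
  uint64_t sh_addralign;
  uint64_t sh_entsize;
};

// Cost of keeping a one-member group: one section header plus two
// 4-byte words of group section contents.
constexpr std::size_t kGroupOverhead = sizeof(Shdr64) + 4 * 2;

static_assert(sizeof(Shdr64) == 64, "Elf64_Shdr is 64 bytes");
static_assert(kGroupOverhead == 72, "64-byte header + 8 bytes of contents");
```

So each single-member comdat that is kept costs 72 bytes in the object file under these assumptions.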

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103043

Revert "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration"

This reverts commit 2531fd70d19aa5d61feb533bbdeee7717a4129eb due to
performance regression on the PPC buildbot.

[libc++] [P0619] Hide not1 and not2 under _LIBCPP_ENABLE_CXX20_REMOVED_NEGATORS.

This also provides some of the scaffolding needed by D102992 and D101729, and mops up after D101730 etc.
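
For code that still uses the removed negators, `std::not_fn` (available since C++17) is the usual replacement. A minimal sketch, with `isEven` and `countOdd` as hypothetical names:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

bool isEven(int x) { return x % 2 == 0; }

// Pre-C++20 code might have written std::not1(std::ptr_fun(isEven));
// std::not_fn wraps any callable directly.
std::ptrdiff_t countOdd(const std::vector<int> &v) {
  return std::count_if(v.begin(), v.end(), std::not_fn(isEven));
}
```

For the input {1, 2, 3, 4, 5}, `countOdd` returns 3.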

Differential Revision: https://reviews.llvm.org/D103055

[libcxx] Fix the function name in exceptions from create_directories

If the nested create_directory call fails, we'd still want to
re-report the errors with the create_directories function name,
which is what the caller called.

This fixes one aspect from MS STL's tests for std::filesystem.

Differential Revision: https://reviews.llvm.org/D102365

[Canonicalize] Switch the default setting to "top down".

This provides a sizable compile time improvement by seeding
the worklist in an order that leads to fewer iterations of the
worklist.

This patch only changes the behavior of the Canonicalize pass
itself, it does not affect other passes that use the
GreedyPatternRewrite driver.

Differential Revision: https://reviews.llvm.org/D103053

[flang] Implement checks for defined input/output procedures

Defined input/output procedures are specified in 12.6.4.8. There are different
versions for read versus write and formatted versus unformatted, but they all
share the same basic set of dummy arguments.

I added several checking functions to check-declarations.cpp along with a test.

In the process of implementing this, I noticed and fixed a typo in
.../lib/Evaluate/characteristics.cpp.

Differential Revision: https://reviews.llvm.org/D103045

[Canonicalize] Fully parameterize the pass based on config options. NFC.

This allows C++ clients of the Canonicalize pass to specify their own
Config option struct to control how Canonicalize works, increasing reusability.

This also allows controlling these settings for the default Canonicalize pass
using command line options. This is useful for testing and for playing with
things on the command line.

Differential Revision: https://reviews.llvm.org/D103069

[libcxxabi] Use ASan interface header for declaration. NFC

This was changed from using the header to using a forward declaration in
c4600ccf891c, since older versions of the header didn't declare the
function. At this point, it's been declared for ~3.5 years, and it
should be pretty safe to assume that we can rely on the ASan interface
header to provide a declaration instead of needing to write our own.

Reviewed By: #libc_abi, ldionne

Differential Revision: https://reviews.llvm.org/D103003

[libcxx] [test] Explain an XFAIL LIBCXX-WINDOWS-FIXME and convert into UNSUPPORTED

This particular test relies on internal details from the libc++
filesystem implementation header, and those details are structured
differently in the implementation for Windows.

Differential Revision: https://reviews.llvm.org/D102357

[libcxx] Make the visibility attributes consistent for __narrow_to_utf8/__widen_from_utf8

Use the same visibility attributes as for all other template
specializations in the same file; declare the specialization itself
using _LIBCPP_TYPE_VIS, and don't use _LIBCPP_EXPORTED_FROM_ABI on
the destructor. Methods that are excluded from the ABI are marked
with _LIBCPP_INLINE_VISIBILITY.

This makes the vtable exported from DLL builds of libc++. Practically,
it doesn't make any difference for the CI configuration, but it
can make a difference in mingw setups.

Differential Revision: https://reviews.llvm.org/D102717

[docs] [CMake] Change recommendations for how to use LLVM_DEFINITIONS

LLVM_DEFINITIONS is a string variable containing a list of arguments
to pass to the compiler. When CMake's add_definitions is passed a
string variable, this is interpreted as one argument. To make it
behave properly, the string variable needs to be split into a list.

Despite the fact that add_definitions isn't supposed to be used like
the LLVM docs recommended, it worked fine in practice in many cases.
If the first argument in LLVM_DEFINITIONS is of the form -DFOO=42
instead of plain -DFOO, the rest of the string is treated as value
to this define. I.e. if LLVM_DEFINITIONS consists of `-DFOO=42 -DBAR`,
CMake ended up passing `-DFOO="42 -DBAR"` to the compiler.

See https://gitlab.kitware.com/cmake/cmake/-/issues/22162
for discussion on the matter.

Changing LLVM_DEFINITIONS to be a list variable would possibly be
more disruptive; instead keep the variable defined as before but
change the recommendation for how to use it. Then projects using it
can gradually be updated to follow the new recommendation.

Differential Revision: https://reviews.llvm.org/D103044

[Hexagon] Remove unused function from HexagonISelDAGToDAGHVX.cpp

It will be reintroduced shortly with an actual use. This change is
simply to eliminate a compilation warning.

[sanitizer][test] s/A<10>/A<7>/ to fix "WARNING: Symbolizer buffer too small" which is somehow a hard error on s390x

https://reviews.llvm.org/D102046#2766553

[docs] Explain address spaces a bit more in opaque pointers doc

Reviewed By: theraven

Differential Revision: https://reviews.llvm.org/D102523

[TSAN][CMake] Add support to run lit on individual tests

Handy when testing specific files, already supported in other components.

Example:
cd build; ./bin/llvm-lit ../compiler-rt/test/tsan/ignore_free.cpp

Differential Revision: https://reviews.llvm.org/D103054

[AMDGPU] Fix unused variable warning. NFC.

[NFC] Fix 'unused' warning

[JITLink][MachO][arm64] Build GOT entries for defined symbols too.

During the generic x86-64 support refactor in ecf6466f01c52 the implementation
of MachO_arm64_GOTAndStubsBuilder::isGOTEdgeToFix was altered to only return
true for external symbols. This behavior is incorrect: GOT entries may be
required for defined symbols (e.g. in the large code model).

This patch fixes the bug and adds a test case for it (renaming an old test
case to avoid any ambiguity).

[JITLink][MachO][arm64] Use a more descriptive test name.

[mlir] Add a copy constructor to FailureOr

The copy constructor was missing from FailureOr.

Note that I do not have commit access.

Differential Revision: https://reviews.llvm.org/D98955

[Matrix] Use LLVM_DEBUG for a debug flag

dump() doesn't exist in release builds.

ld.lld: error: undefined symbol: llvm::Value::dump() const
>>> referenced by LowerMatrixIntrinsics.cpp
>>> LowerMatrixIntrinsics.o:((anonymous namespace)::LowerMatrixIntrinsics::Visit())

Revert "[AIX] Avoid structor alias; die before bad alias codegen"

Avoiding structor alias is no longer needed because AIX now has an alias implementation here: https://reviews.llvm.org/D83252.

This reverts commit b116ded57da3530e661f871f4191c59cd9e091cd.

Reviewed By: jasonliu

Differential Revision: https://reviews.llvm.org/D102724

[SCEV] Cache operands used in BEInfo (NFC)

When memoized values for a SCEV expression are dropped, we also
drop all BECounts that make use of the SCEV expression. This is done
by iterating over all the ExitNotTaken counts and (recursively)
checking whether they use the SCEV expression. If there are many
exits, this will take a lot of time.

This patch improves the situation by pre-computing a set of all
used operands, so that we can determine whether a certain BEInfo
needs to be invalidated using a simple set lookup. Will still need
to loop over all BEInfos though.
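
The idea can be sketched as follows. This is an illustration with hypothetical types (`Expr`, `BackedgeInfo`, `collect`), not ScalarEvolution's data structures: the operands reachable from the exit counts are collected once, so a later "does this info use expression E?" query is a set lookup instead of a recursive walk.

```cpp
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical expression node: just a list of operand expressions.
struct Expr {
  std::vector<const Expr *> operands;
};

// Recursively record an expression and everything it uses.
void collect(const Expr *e, std::unordered_set<const Expr *> &out) {
  if (!out.insert(e).second)
    return; // already visited
  for (const Expr *op : e->operands)
    collect(op, out);
}

// Hypothetical per-loop record of exit counts.
struct BackedgeInfo {
  std::vector<const Expr *> exitCounts;
  std::unordered_set<const Expr *> used; // computed once, at construction

  explicit BackedgeInfo(std::vector<const Expr *> counts)
      : exitCounts(std::move(counts)) {
    for (const Expr *e : exitCounts)
      collect(e, used);
  }

  // Invalidation check: a hash lookup, independent of the number of exits.
  bool uses(const Expr *e) const { return used.count(e) != 0; }
};
```

The trade-off is extra memory per record in exchange for cheaper invalidation queries; every record still has to be visited, as the message notes.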

This makes for a mild improvement on non-degenerate cases:
https://llvm-compile-time-tracker.com/compare.php?from=b661a55a253f4a1cf5a0fbcb86e5ba7b9fb1387b&to=be1393f450e594c53f0ad7e62339a6bc831b16f6&stat=instructions

For the degenerate case from https://bugs.llvm.org/show_bug.cgi?id=50384,
for n=128 I'm seeing run time drop from 1.6s to 1.1s.

Differential Revision: https://reviews.llvm.org/D102796

[gn build] Port 33706191d88d

[lld-macho][nfc] Remove unnecessary parameterization of section sort

As @alexshap pointed out [here](https://reviews.llvm.org/D102972#inline-975208),
it's a bit confusing to have the option to sort OutputSections with any
comparator when in practice we only use one.

Reviewed By: #lld-macho, alexshap, thakis

Differential Revision: https://reviews.llvm.org/D102974

[lld-macho][nfc] Sort OutputSections based on explicit order of command-line inputs

This diff paves the way for {D102964} which adds a new kind of
InputSection.

We previously maintained section ordering implicitly: we created
InputSections as we parsed each file in command-line order, and passed
on this ordering when we created OutputSections and OutputSegments by
iterating over these InputSections. The implicitness of the ordering
made it difficult to refactor the code to e.g. handle a new type of
InputSection. As such, I've codified the ordering explicitly via
`inputOrder` fields. This also allows us to use `sort` instead of
`stable_sort`.
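
A minimal sketch of why an explicit order key makes `sort` sufficient, assuming each section's `inputOrder` is unique. `Section` and `sortByInputOrder` are hypothetical stand-ins, not lld's classes.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for an output section.
struct Section {
  std::string name;
  int inputOrder; // assumed unique: position in command-line order
};

void sortByInputOrder(std::vector<Section> &sections) {
  // With unique keys no two elements compare equal, so there are no ties
  // whose relative order a stable sort would need to preserve.
  std::sort(sections.begin(), sections.end(),
            [](const Section &a, const Section &b) {
              return a.inputOrder < b.inputOrder;
            });
}
```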

Benchmarking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:

      N           Min           Max        Median           Avg        Stddev
  x  20          4.23          4.35          4.27         4.274   0.030157481
  +  20          4.24          4.38          4.27        4.2815   0.033759989
  No difference proven at 95.0% confidence

Reviewed By: #lld-macho, alexshap

Differential Revision: https://reviews.llvm.org/D102972

[lld-macho][nfc] Rename MergedOutputSection to ConcatOutputSection

The ELF format has the concept of merge sections (marked by SHF_MERGE),
which contain data that can be safely deduplicated. The Mach-O
equivalents are called literal sections (marked by S_CSTRING_LITERALS or
S_{4,8,16}BYTE_LITERALS). While the Mach-O format doesn't use the word
'merge', to avoid confusion, I've renamed our MergedOutputSection to
ConcatOutputSection. I believe it's a more descriptive name too.

This renaming sets the stage for {D102964}.

Reviewed By: #lld-macho, alexshap

Differential Revision: https://reviews.llvm.org/D102971

[lld-macho][nfc] clang-format everything

[lld-macho][nfc] Misc code cleanup

* Move `static_asserts` into cpp instead of header file. I noticed they
  had been separated from the main class definition in the header, so I
  set about to clean that up, then figured it made more sense as part of
  the cpp file so as not to incur unnecessary compile-time overhead.

* Remove unnecessary `virtual`s

* Remove unnecessary comment / reword another comment

Revert "[NFC][scudo] Let disableMemoryTagChecksTestOnly to fail"

This reverts commit 2c212db4ea42fbbc0e83647da4f62261f775388b.

It's not needed.

[CVP] Guard against poison in common phi value transform (PR50399)

The common phi value transform replaces constants with values that
have the same value as the constant on a given edge. However, LVI
generally only provides information that is correct up to poison,
so this can end up replacing a well-defined value with poison.
D69442 addressed an instance of this problem by clearing poison
flags on the generating instruction, which was sufficient at the
time. rGa917fb89dc28 made LVI's edge value analysis slightly more
powerful, and clearing poison flags is no longer sufficient.

This patch changes the transform to explicitly guard against a poison
value instead. This should be satisfied for most cases due to a prior
branch on poison.

Fixes https://bugs.llvm.org/show_bug.cgi?id=50399.

Differential Revision: https://reviews.llvm.org/D102966

[SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics.

- When memory intrinsics, such as memcpy, are lowered, the attached scoped
  AA metadata is not passed down to the backend. As a result, the backend
  cannot schedule relevant memory operations around them following that
  hint. In this patch, SelectionDAG is enhanced to propagate that
  metadata (scoped AA only) when they are lowered into loads and stores.

Differential Revision: https://reviews.llvm.org/D102215

Add pre-commit tests for [D102215](https://reviews.llvm.org/D102215).

[mlir] Use unique_function in AbstractOperation fields

Currently, AbstractOperation fields are function pointers.
Modifying them to unique_function allows them to contain
runtime information.

For instance, this allows operations to be defined at runtime.
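
The difference can be sketched with standard C++ alone. Here `std::function` stands in for `llvm::unique_function` (which is move-only), and `VerifyFnPtr`, `VerifyFn`, and `makeOperandCountVerifier` are hypothetical names, not MLIR's hooks.

```cpp
#include <functional>

// A plain function pointer hook cannot carry captured state.
using VerifyFnPtr = bool (*)(int numOperands);

// A callable wrapper can hold a lambda together with its captures.
using VerifyFn = std::function<bool(int numOperands)>;

// Builds a verifier whose expected operand count is only known at runtime,
// which is not expressible as a capture-less function pointer.
VerifyFn makeOperandCountVerifier(int expected) {
  return [expected](int numOperands) { return numOperands == expected; };
}
```

A verifier built with `makeOperandCountVerifier(2)` accepts 2 operands and rejects 3.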

Differential Revision: https://reviews.llvm.org/D103031

[AMDGPU] Lower kernel LDS into a sorted structure

Differential Revision: https://reviews.llvm.org/D102954

[InstSimplify] allow undef element match in vector select condition value

The semantics of select with undefined/poison condition
are not explicitly stated in the LangRef, but this matches
comments in the code and Alive2 appears to concur:
https://alive2.llvm.org/ce/z/KXytmd

We can find this pattern after demanded elements transforms.

As noted in D101191, fuzzers are finding infinite loops because
we may not account for this pattern in other passes.

[mlir][doc] Fix links and references in documentation of Tutorials

This patch is the third in a series of patches fixing markdown links and references inside the mlir documentation.

This patch addresses all broken references to other markdown files and sections inside the Tutorials folder.

Differential Revision: https://reviews.llvm.org/D103017

[Matrix] Factor and distribute transposes across multiplies

Now that we can fold some transposes into multiplies (CM: A * B^t and RM:
A^t * B), we want to move them around to create the optimal expressions:

* fold away double transposes while still using them to assert the shape
* sink transposes hoping they cancel out
* lift transposes when both operands are transposed

This also modifies the matrix remarks to include the number of exposed
transposes (i.e. transposes that we couldn't fold into a multiply).

The adjustment to the test remarks-inlining is a bit subtle: I am changing the
double transpose to a single transpose so that we don't remove it completely.
More importantly, this changes some of the total instruction counts, most
notably stores, because we can no longer use a vector store.

Differential Revision: https://reviews.llvm.org/D102733

[mlir] Add an optional distributionTypes attribute to TiledLoopOp.

Differential Revision: https://reviews.llvm.org/D103104

[LoopIdiom] 'arithmetic right-shift until zero': don't turn potentially infinite loops into finite ones

Nowadays LLVM does not assume that all loops are finite,
so if we want to produce a finite loop from a potentially-infinite one,
we must ensure that the original loop is known to be a finite one.

For this transform, it only matters for arithmetic right-shifts.
For them, either the function or the loop must be known to
be `mustprogress`, or the original value being shifted must be known
to be non-negative (because iff the sign bit was set,
it will never become zero, but will become `-1` in the "end").
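
The sign-bit argument can be checked with a small bounded loop. This is a sketch, not the pass's logic: `shiftsUntilZero` is a hypothetical helper, and it relies on `>>` being an arithmetic shift for signed integers, which is guaranteed since C++20 and is what mainstream compilers did before that.

```cpp
#include <cstdint>

// Counts right shifts until the value becomes zero. Gives up after 64
// steps and returns -1, which stands for "this loop would never exit".
int shiftsUntilZero(int32_t x) {
  for (int n = 0; n <= 64; ++n) {
    if (x == 0)
      return n;
    x >>= 1; // arithmetic shift: the sign bit is replicated
  }
  return -1;
}
```

A non-negative input such as 8 reaches zero after 4 shifts, while -8 settles at -1 and stays there, so the helper reports -1.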

It would be really good for alive2 to actually complain about this,
but it currently does not: https://github.com/AliveToolkit/alive2/issues/726

[scudo] Fix CHECK implementation

Cast of signed types to u64 breaks comparison.
Also remove double () around operands.
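
A minimal illustration of the failure mode, with hypothetical helper names: once both operands are cast to an unsigned 64-bit type, a negative value turns into a very large one and the comparison flips.

```cpp
#include <cstdint>

// Comparison on the original signed types.
bool lessNative(int a, int b) { return a < b; }

// Comparison after casting both sides to u64, as a check macro that
// funnels every operand through uint64_t would effectively do.
bool lessAfterU64Cast(int a, int b) {
  return static_cast<uint64_t>(a) < static_cast<uint64_t>(b);
}
```

For the operands -1 and 0, `lessNative` is true but `lessAfterU64Cast` is false, because -1 converts to 2^64 - 1.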

Reviewed By: cryptoad, hctim

Differential Revision: https://reviews.llvm.org/D103060

[scudo] Consistent setting of SCUDO_DEBUG

Make sure that if SCUDO_DEBUG=1 is set in tests,
then it is also set in the scudo
library itself.

Reviewed By: cryptoad, hctim

Differential Revision: https://reviews.llvm.org/D103061

[Hexagon] Improve argument packing in vector shuffle selection

[mlir][linalg] Update Linalg.md (NFC).

Update the paragraph on generic / indexed_generic to reflect the unification of these operations.

Differential Revision: https://reviews.llvm.org/D102775

[CSSPGO][llvm-profgen] Change default cold threshold for context merging

llvm-profgen uses profile summary based cold threshold to merge and trim cold context profile. This is to strike a good balance between profile size and performance.

We've been using 99.9% as the cutoff to save profile size without affecting performance. This change switches to using 99.9% instead of 99.9999% as the default cold threshold cutoff for llvm-profgen.

Redundant switch csprof-cold-thres is also removed and tests cleaned up.

Differential Revision: https://reviews.llvm.org/D103071