review.tizen.org Git - platform/upstream/llvm.git/log

[SCEV][NFC] Split out type balancing in implication engine

We plan to introduce more advanced ways of dealing with different types.

[RISCV] Fix -Wbraced-scalar-init after D89025

[RISCV] Add -mtune support

- The goal of this patch is improve option compatible with RISCV-V GCC,
   -mcpu support on GCC side will sent patch in next few days.

- -mtune only affect the pipeline model and non-arch/extension related
   target feature, e.g. instruction fusion; in td file it called
   TuneFeatures, which is introduced by X86 back-end[1].

- -mtune accept all valid option for -mcpu and extra alias processor
   option, e.g. `generic`, `rocket` and `sifive-7-series`, the purpose is
   option compatible with RISCV-V GCC.

- Processor alias for -mtune will resolve according the current target arch,
   rv32 or rv64, e.g. `rocket` will resolve to `rocket-rv32` or `rocket-rv64`.

- Interaction between -mcpu and -mtune:
   * -mtune has higher priority than -mcpu for pipeline model and
     TuneFeatures.

[1] https://reviews.llvm.org/D85165

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D89025

[llvm-cov] Fix test cases.
`/dev/null` is treated as regualar file on Windows.
native_separators.c line 11 used relative path which was not correct but worked before because when `SourceFiles` is empty, it add all source files into `SourceFiles`.

[SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs

We can sharpen the range of a AddRec if we know that it does not
self-wrap and know the symbolic iteration count in the loop. If we can
evaluate the value of AddRec on the last iteration and prove that at least
one its intermediate value lies between start and end, then no-wrap flag
allows us to conclude that all of them also lie between start and end. So
the estimate of range can be improved to union of ranges of start and end.

Differential Revision: https://reviews.llvm.org/D89381
Reviewed By: efriedma

Support ObjC in IncludeInserter

Update IncludeSorter/IncludeInserter to support objective-c google style (part 1):

1) Correctly consider .mm/.m extensions
2) Correctly categorize category headers.
3) Add support for generated files to go in a separate section of imports

Reviewed By: alexfh, gribozavr2

Patch by Joe Turner.

Differential Revision: https://reviews.llvm.org/D89276

[mlir][CAPI] Add mlirAttributeGetType function.

* Also fixes the const-ness of the various DenseElementsAttr construction functions.
* Both issues identified when trying to use the DenseElementsAttr functions.

Differential Revision: https://reviews.llvm.org/D89517

[llvm-cov] don't include all source files when provided source files are filtered out

When all provided source files are filtered out either due to `--ignore-filename-regex` or not part of binary, don't generate coverage reults for all source files. Because if users want to generate coverage results for all source files, they don't even need to provid selected source files or `--ignore-filename-regex`.

Differential Revision: https://reviews.llvm.org/D89359

[MLIR] Fix gcc5 in D89161

Missing .str() makes gcc5 unable to infer the template to use.

Differential Revision: https://reviews.llvm.org/D89516

Switch the default of VerifyIntegerConstantExpression from constant
folding to not constant folding.

Constant folding of ICEs is done as a GCC compatibility measure, but new
code was picking it up, presumably by accident, due to the bad default.

While here, also switch the flag from a bool to an enum to make it more
obvious what it means at call sites. This highlighted a couple of places
where our behavior is different between C++11 and C++14 due to switching
from checking for an ICE to checking for a converted constant
expression (where there is no 'fold' codepath).

[mlir] RewriterGen NativeCodeCall matcher with ConstantOp matcher

Added an underlying matcher for generic constant ops. This
included a rewriter of RewriterGen to make variable use more
clear.

Differential Revision: https://reviews.llvm.org/D89161

[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting

This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
the splitting pass to be toggled on/off. The current method of passing
`-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
correctly (say, with `-O0` or `-Oz`).

To implement the -fsplit-cold-code option, an attribute is applied to
functions to indicate that they may be considered for splitting. This
removes some complexity from the old/new PM pipeline builders, and
behaves as expected when LTO is enabled.

Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
Differential Revision: https://reviews.llvm.org/D57265
Reviewed By: Aditya Kumar, Vedant Kumar
Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar

[AArch64][GlobalISel] NFC: Refactor emitIntegerCompare

Simplify emitIntegerCompare and improve comments + asserts.

Mostly making the code a little easier to follow.

Also, this code is only used for G_ICMP. The legalizer ensures that the LHS/RHS
for every G_ICMP is either a s32 or s64. So, there's no need to handle anything
else. This lets us remove a bunch of checks for whether or not we successfully
emitted the compare.

Differential Revision: https://reviews.llvm.org/D89433

[GlobalISel] Remove scalar src from non-sequential fadd/fmul reductions.

It's probably better to split these into separate G_FADD/G_FMUL + G_VECREDUCE
operations in the translator rather than carrying the scalar around. The
majority of the time it'll get simplified away as the scalars are probably
identity values.

Differential Revision: https://reviews.llvm.org/D89150

[mlir][vector] Add unrolling patterns for Transfer read/write

Adding unroll support for transfer read and transfer write operation. This
allows to pick the ideal size for the memory access for a given target.

Differential Revision: https://reviews.llvm.org/D89289

Add missing 'override'

[CGBuiltin] Respect asm labels and redefine_extname for builtins with specialized emitting

rL131311 added `asm()` support for builtin functions, but `asm()` for builtins with
specialized emitting (e.g. memcpy, various math functions) still do not work.

This patch makes these functions work for `asm()` and `#pragma redefine_extname`.
glibc uses `asm()` to redirect internal libc function calls to hidden aliases.

Limitation: such a function is a builtin in clang, but will not be recognized as
a libcall in optimization passes because Clang does not annotate the renamed
function as a libcall.  In GCC -O1 or above, `abs` can be optimized out but we can't.
Additionally, we cannot redirect `__builtin_sin` to `real_sin` in the following example:

  double sin(double x) asm("real_sin");
  double f(double d) { return __builtin_sin(d); }

---

According to @rsmith, the following three statements cannot be simultaneously true:

(1) The frontend function foo has known, builtin semantics X.
(2) The symbol foo has known, builtin semantics X.
(3) It's not correct to lower a call to the frontend function foo to the symbol foo.

People do want (1) (if it is profitable to expand a memcpy, do it).
This also means that people do not want to add -fno-builtin-memcpy.
People do want (3): that is why they use asm("__GI_memcpy") in the first place.

So unfortunately we make a compromise by not refuting (2) (see the limitation above).
For most libcalls, there is a small loss because compilers don't synthesize them.
For the few glibc cares about, it uses `asm("memcpy = __GI_memcpy");` to make
the assembly level redirection.
(Changing function names (e.g. `__memcpy`) is a hit to ergonomics which is not acceptable).

Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D88712

[MS] Apply `inreg` to AArch64 sret parms on instance methods

The documentation rules indicate that instance methods should return
large, trivially copyable aggregates via X1/X0 and not X8 as is normally
done when returning such structs from free functions:
https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values

Fixes PR47836, a bug in the initial implementation of these rules.

I tried to simplify the logic a bit as well while I'm here.

Differential Revision: https://reviews.llvm.org/D89362

Add an SB API to get the SBTarget from an SBBreakpoint

Differential Revision: https://reviews.llvm.org/D89358

[VE] Add VGT/VSC/PFCHV instructions

Add VGT/VSC/PFCHV vector instructions and regression tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89471

[VE] Support fabs/fcos/fsin/fsqrt math functions

VE doesn't have instruction for fabs/fcos/fsin/fsqrt, so expand them.
Add regression tests also. Update fcopysign regression test, also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89457

Revert "[HIP] Change default --gpu-max-threads-per-block value to 1024"

This reverts commit 187658b8a6112446d9e7797d495bc7542ac83905 due to
AMDGPU backend issues.

Revert "[clang] Add -fc++-abi= flag for specifying which C++ ABI to use"

This reverts commits 683b308c07bf827255fe1403056413f790e03729 and
8487bfd4e9ae186f9f588ef989d27a96cc2438c9.

We will go for a more restricted approach that does not give freedom to
everyone to change ABIs on whichever platform.

See the discussion on https://reviews.llvm.org/D85802.

[WebAssembly] Prototype i8x16.popcnt

As proposed at https://github.com/WebAssembly/simd/pull/379. Use a target
builtin and intrinsic rather than normal codegen patterns to make the
instruction opt-in until it is merged to the proposal and stabilized in engines.

Differential Revision: https://reviews.llvm.org/D89446

fix symbol printing on windows

Similar to MCSymbol::print in 3d6c8ebb584375d01b1acead4c2056b3f0c501fc
(llvm-svn: 81682, PR4966), these symbols may need to be quoted to be handled by
the linker correctly.

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D87099

[LoopVersion] Unify SCEVChecks and alias check handling (NFC).

This is an initial cleanup of the way LoopVersioning interacts with LAA.

Currently LoopVersioning has 2 ways of initializing things:

1. Passing LAI and passing UseLAIChecks = true
2. Passing UseLAIChecks = false, followed by calling setSCEVChecks and
setAliasChecks.

Both ways of initializing lead to the same result and the duplication
seems more complicated than necessary.

This patch removes the UseLAIChecks flag from the constructor and the
setSCEVChecks & setAliasChecks helpers and move initialization
exclusively to the constructor.

This simplifies things, by providing a single way to initialize
LoopVersioning and reducing duplication.

Reviewed By: Meinersbur, lebedev.ri

Differential Revision: https://reviews.llvm.org/D84406

[libTooling] Change `after` range-selector to operate only on source ranges

Currently, `after` fails when applied to locations in macro arguments. This
change projects the subrange into a file source range and then applies `after`.

Differential Revision: https://reviews.llvm.org/D89468

PR47864: Fix assertion in pointer-to-member emission if there are
multiple declarations of the same base class.

[libc] Use entrypoints.txt as the single source of list of functions for a platform.

The function listings in api.td are removed. The same lists are now deduced using the information
in entrypoints.txt.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D89267

[AMDGPU] SILowerControlFlow::removeMBBifRedundant should not try to change MBB layout if it can fallthrough

removeMBBifRedundant normally tries to keep predecessors fallthrough when removing redundant MBB.
It has to change MBBs layout to keep the new successor to immediately follow the predecessor of removed MBB.
It only may be allowed in case the new successor itself has no successors to which it fall through.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89397

[NFC][IndVars] Autogenerate check lines in tests being affected by upcoming patch

[NFC][LSR] Autogenerate check lines in tests being affected by upcoming patch

[NFC][SCEV] Autogenerate check lines in tests being affected by upcoming patch

[gn bulid] Remove phantom WebAssembly tablegen() calls

Apparenlty I added these in https://reviews.llvm.org/rL350628 but
I'm not sure why. They never existed in the CMake build, and now
they're causing trouble.

[AArch64] Stack frame reordering.

Implement stack frame reordering in the AArch64 backend.

Unlike the X86 implementation, AArch64 does not seem to benefit from
"access density" based frame reordering, mainly because it has a much
smaller variety of addressing modes, and the fact that all instructions
are 4 bytes so each frame object is either in range of an instruction
(and then the access is "free") or not (and that has a code size cost
of 4 bytes).

This change improves Memory Tagging codegen by
* Placing an object that has been chosen as the base tagged pointer of
the function at SP + 0. This saves one instruction to setup the pointer
(IRG does not have an offset immediate), and more because that object
can now be referenced without materializing its tagged address in a
scratch register.
* Placing objects that go out of scope simultaneously together. This
exposes opportunities for instruction merging in tryMergeAdjacentSTG.

Differential Revision: https://reviews.llvm.org/D72366

[MTE] Pin the tagged base pointer to one of the stack slots.

Summary:
Pin the tagged base pointer to one of the stack slots, and (if
necessary) rewrite tag offsets so that an object that occupies that
slot has both address and tag offsets of 0. This allows ADDG
instructions for that object to be eliminated and their uses replaced
with the tagged base pointer itself.

This optimization must be done in machine instructions and not in the IR
instrumentation pass, because referring to a stack slot through an IRG
pointer would confuse the stack coloring pass.

The optimization makes a (pretty naive) attempt to find the slot that
would benefit the most by counting the uses of stack slots in the
function.

Reviewers: ostannard, pcc

Subscribers: merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72365

[AMDGPU] gfx1032 target

Differential Revision: https://reviews.llvm.org/D89487

Reland "[WebAssembly] v128.load{8,16,32,64}_lane instructions"

This reverts commit 7c8385a352ba21cb388046290d93b53dc273cd9f with a typing fix
to an instruction selection pattern.

[SemaObjC] Fix composite pointer type calculation for `void*` and pointer to lifetime qualified ObjC pointer type

Fixes a regression introduced in 9a6f4d451ca7. rdar://70101809

Differential revision: https://reviews.llvm.org/D89475

[mlir] Add std.tensor_to_memref op and teach the infra about it

The opposite of tensor_to_memref is tensor_load.

- Add some basic tensor_load/tensor_to_memref folding.
- Add source/target materializations to BufferizeTypeConverter.
- Add an example std bufferization pattern/pass that shows how the
  materialiations work together (more std bufferization patterns to come
  in subsequent commits).
  - In coming commits, I'll document how to write composable
  bufferization passes/patterns and update the other in-tree
  bufferization passes to match this convention. The populate* functions
  will of course continue to be exposed for power users.

The naming on tensor_load/tensor_to_memref and their pretty forms are
not very intuitive. I'm open to any suggestions here. One key
observation is that the memref type must always be the one specified in
the pretty form, since the tensor type can be inferred from the memref
type but not vice-versa.

With this, I've been able to replace all my custom bufferization type
converters in npcomp with BufferizeTypeConverter!

Part of the plan discussed in:
https://llvm.discourse.group/t/what-is-the-strategy-for-tensor-memref-conversion-bufferization/1938/17

Differential Revision: https://reviews.llvm.org/D89437

[mlir] Fix typo in LangRef

Reapply [LLD] [COFF] Implement a GNU/ELF like -wrap option

Add a simple forwarding option in the MinGW frontend, and implement
the private -wrap option in the COFF linker.

The feature in lld-link isn't gated by the -lldmingw option, but
the option is left as a private, undocumented option primarily
used by the MinGW driver.

The implementation is significantly based on the support for --wrap
in the ELF linker, but many small nuance details are different
between the ELF and COFF linkers, ending up with more than a few
implementation differences.

This fixes https://bugs.llvm.org/show_bug.cgi?id=47384.

Differential Revision: https://reviews.llvm.org/D89004

Reapplied with the bitfield member canInline fixed so it doesn't break
builds targeting windows.

[InstCombine] update tests for logic folds to exercise commuted patterns; NFC

This was the intent for D88551.
I also varied the types a bit for extra coverage
and tried to make better test/value names.

[NFC][CaptureTracking] Move static function isNonEscapingLocalObject to llvm namespace

Function isNonEscapingLocalObject is a static one within BasicAliasAnalysis.cpp.
It wraps around PointerMayBeCaptured of CaptureTracking, checking whether a pointer
is to a function-local object, which never escapes from the function.

Although at the moment, isNonEscapingLocalObject is used only by BasicAliasAnalysis,
its functionality can be used by other pass(es), one of which I will put up for review
very soon. Instead of copying the contents of this static function, I move it to llvm
scope, and place it amongst other functions with similar functionality in CaptureTracking.

The rationale for the location are:
- Pointer escape and pointer being captured are actually two sides of the same coin
- isNonEscapingLocalObject is wrapping around another function in CaptureTracking

Reviewed By: jdoerfert (Johannes Doerfert)

Differential Revision: https://reviews.llvm.org/D89465

Make sure both cc1 and cc1as process -m[no-]code-object-v3

Differential Revision: https://reviews.llvm.org/D89478

[libc++] Reduce dependencies on <iostream> from <random>

We included <istream> and <ostream> from <random>, but really it is
sufficient to include <iosfwd> if we make sure we access ios_base
members through a dependent type. This allows us to break a hard
dependency of <random> on locales.

[flang][msvc] Avoid a reinterpret_cast

The call to the binary->decimal formatter in real.cpp was cheating
by using a reinterpret_cast<> to extract its binary value.
Use a more principled and portable approach by extending the
API of evaluate::Integer<> to include ToUInt<>()/ToSInt<>()
member function templates that do the "right" thing. Retain
ToUInt64()/ToSInt64() for compatibility.

Differential revision: https://reviews.llvm.org/D89435

[mlir][Linalg] NFC - Rename test files s/_/-/g

Revert "[LLD] [COFF] Implement a GNU/ELF like -wrap option"

This reverts commit a012c704b5e5b60f9d2a7304d27cbc84a3619571.

Breaks Windows builds.

C:\src\llvm-mint\lld\COFF\Symbols.cpp(26,1): error: static_assert failed due to requirement 'sizeof(lld::coff::SymbolUnion) <= 48' "symbols should be optimized for memory usage"
static_assert(sizeof(SymbolUnion) <= 48,

[LV] Add a getRecurrenceBinOp and make use of it. NFC

[libc++][filesystem] Only include <fstream> when we actually need it in copy_file_impl

This allows building <filesystem> on systems that don't support <fstream>,
such as systems that don't support localization.

[CostModel] remove cost-kind predicate for ctlz/cttz intrinsics in basic TTI implementation

The cost modeling for intrinsics is a patchwork based on different
expectations from the callers, so it's a mess. I'm hoping to untangle
this to allow canonicalization to the new min/max intrinsics in IR.
The general goal is to remove the cost-kind restriction here in the
basic implementation class. Ie, if some intrinsic has throughput cost
of 104, assume that it has the same size, latency, and blended costs.
Effectively, an intrinsic with cost N is composed of N simple
instructions. If that's not correct, the target should provide a more
accurate override.

The x86-64 SSE2 subtarget cost diffs require explanation:

1. The scalar ctlz/cttz are assuming "BSR+XOR+CMOV" or
   "TEST+BSF+CMOV/BRANCH", so not cheap.
2. The 128-bit SSE vector width versions assume cost of 18 or 26
   (no explanation provided in the tables, but this corresponds to a
   bunch of shift/logic/compare).
3. The 512-bit vectors in the test file are scaled up by a factor of
   4 from the legal vector width costs.
4. The plain latency cost-kind is not affected in this patch because
   that calc is diverted before we get to getIntrinsicInstrCost().

Differential Revision: https://reviews.llvm.org/D89461

[PGO] Remove the old memop value profiling buckets.

Following up D81682 and D83903, remove the code for the old value profiling
buckets, which have been replaced with the new, extended buckets and disabled by
default.

Also syncing InstrProfData.inc between compiler-rt and llvm.

Differential Revision: https://reviews.llvm.org/D88838

[libc++] NFC: Remove unused include

[libc++] Allow building libc++ on platforms without a random device

Some platforms, like several embedded platforms, do not provide a source
of randomness through a random device. This commit makes it possible to
build and test libc++ for such platforms, i.e. without std::random_device.

Surprisingly, the only functionality that doesn't work on such platforms
is std::random_device itself -- everything else in <random> still works,
one just has to find alternative ways to seed the PRNGs.

[x86] add no 'unwind' to reduce test noise; NFC

I suggested this in D89412, but the comment was missed.

Revert "[WebAssembly] v128.load{8,16,32,64}_lane instructions"

This reverts commit 7c6bfd90ab2ddaa60de62878c8512db0645e8452.

[lldb] [Process/FreeBSDRemote] Initial multithreading support

Implement initial support for watching thread creation and termination.
Update ptrace() calls to correctly indicate requested thread.
Watchpoints are not supported yet.

This patch fixes at least multithreaded register tests.

Differential Revision: https://reviews.llvm.org/D89413

[LLD] [COFF] Implement a GNU/ELF like -wrap option

Add a simple forwarding option in the MinGW frontend, and implement
the private -wrap option in the COFF linker.

The feature in lld-link isn't gated by the -lldmingw option, but
the option is left as a private, undocumented option primarily
used by the MinGW driver.

The implementation is significantly based on the support for --wrap
in the ELF linker, but many small nuance details are different
between the ELF and COFF linkers, ending up with more than a few
implementation differences.

This fixes https://bugs.llvm.org/show_bug.cgi?id=47384.

Differential Revision: https://reviews.llvm.org/D89004

[LLD] [COFF] Fix a condition that was missed in 7f0e6c31c255. NFC.

This should fix cases when e.g. auto import is enabled without
mingw mode in total being enabled.

Differential Revision: https://reviews.llvm.org/D89006

[WebAssembly] v128.load{8,16,32,64}_lane instructions

Prototype the newly proposed load_lane instructions, as specified in
https://github.com/WebAssembly/simd/pull/350. Since these instructions are not
available to origin trial users on Chrome stable, make them opt-in by only
selecting them from intrinsics rather than normal ISel patterns. Since we only
need rough prototypes to measure performance right now, this commit does not
implement all the load and store patterns that would be necessary to make full
use of the offset immediate. However, the full suite of offset tests is included
to make it easy to track improvements in the future.

Since these are the first instructions to have a memarg immediate as well as an
additional immediate, the disassembler needed some additional hacks to be able
to parse them correctly. Making that code more principled is left as future
work.

Differential Revision: https://reviews.llvm.org/D89366

[RISCV] fix a mistake in RISCVInstrInfoV.td

A commit of VALUVVNoVm was wrong, fixed it.

Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D88142

[InstCombine] Use m_SpecificInt instead of m_APInt + comparison. NFCI.

[InstCombine] SimplifyDemandedUseBits - xor - refactor cast<ConstantInt> usage to PatternMatch. NFCI.

First step towards replacing these to add full vector support.

[InstCombine] InstCombineAndOrXor - refactor cast<ConstantInt> usages to PatternMatch. NFCI.

First step towards replacing these to add full vector support.

[openmp][libomptarget] Include header from LLVM source tree

[openmp][libomptarget] Include header from LLVM source tree

The change is to the amdgpu plugin so is unlikely to break anything.

The point of contention is whether libomptarget can depend on LLVM.
A community discussion was cautiously not opposed yesterday.

This introduces a compile time dependency on the LLVM source tree, in this case
expressed as skipping the building of the plugin if LLVM_MAIN_INCLUDE_DIR is not
set. One the source files will #include llvm/Frontend/OpenMP/OMPGridValues.h,
instead of copy&pasting the numbers across.

For users that download the monorepo, the llvm tree is already on disk. This will
inconvenience users who download only the openmp source as a tar, as they would
now also have to download (at least a file or two) from the llvm source, if they want
to build the parts of the openmp project that (post this patch) depend on llvm.

There was interest expressed in going further - using llvm tools as part of
building libomp, or linking against llvm libraries. That seems less clear cut
an improvement and worthy of further discussion. This patch seeks only to change
policy to support openmp depending on the llvm source tree. Including in the
other direction, or using libraries / tools etc, are purposefully out of scope.

Reviewers are a best guess at interested parties, please feel free to add others

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D87841

[mlir][standard] Fix parsing of scalar subview and canonicalize

Parsing of a scalar subview did not create the required static_offsets attribute.
This also adds support for folding scalar subviews away.

Differential Revision: https://reviews.llvm.org/D89467

[NFC] Fix license header from D87841

[TableGen] Add the !not and !xor operators.
Update the TableGen Programmer's Reference.

[RISCV] [TableGen] Modify RISCVCompressInstEmitter.cpp to use getAllDerivedDefinitions().

Add "not" to an llvm-symbolizer test that expects to fail

In a7b209a6d40d77b, llvm-symbolizer was adjusted to return a failure status
code when it produced an error, to flag up DWARF parsing problems. The
test for missing PDB file is analogous, and returns a failure status now
too.

This should fix the llvm-clang-win-x-armv7l buildbot croaking:

http://lab.llvm.org:8011/#/builders/60/builds/77

AMDGPU: Fix verifier error on killed spill of partially undef register

This does unfortunately end up with extra waitcnts getting inserted
that were avoided before. Ideally we would avoid the spills of these
undef components in the first place.

[InstCombine] visitXor - refactor ((X^C1)>>C2)^C3 -> (X>>C2)^((C1>>C2)^C3) fold. NFCI.

This is still ConstantInt-only (scalar) but is refactored to use PatternMatch to make adding vector support in the future relatively trivial.

[AMDGPU] Minimize number of s_mov generated by copyPhysReg

Generate the minimal set of s_mov instructions required when
expanding a SGPR copy operation in copyPhysReg.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D89187

[SVE]Fix implicit TypeSize casts in EmitCheckValue

Using TypeSize::getFixedSize() instead of relying upon the implicit
TypeSize->uint64_cast as the type is always fixed width.

Differential Revision: https://reviews.llvm.org/D89313

[Statepoints] Remove MI limit on number of tied operands.

After D87915 statepoint can have more than 15 tied operands.
Remove this restriction from statepoint lowering code.

[LLD][ELF] Improve ICF for relocations to ineligible sections via "aliases"

ICF was not able to merge equivalent sections because of relocations to
sections ineligible for ICF that use alternative symbols, e.g. symbol
aliases or section relative relocations.

Merging in this scenario has been enabled by giving the sections that
are ineligible for ICF a unique ID, i.e. an equivalence class of their
own. This approach also provides another benefit as it improves the
hashing that is used to perform the initial equivalance grouping for
ICF. This is because the ICF ineligible sections can now contribute a
unique value towards the hashes instead of the same value of zero. This
has been seen to reduce link time with ICF by ~68% for objects compiled
with -fprofile-instr-generate.

In order to facilitate this use of a unique ID, the existing
inconsistent approach to the setting of the InputSection eqClass in ICF
has been changed so that there is a clear distinction between the
eqClass values of ICF eligible sections and those of the ineligible
sections that have a unique ID. This inconsistency could have caused
incorrect equivalence class equality in the past, although it appears
that no issues were encountered in actual use.

Differential Revision: https://reviews.llvm.org/D88830

[flang] Fix build with BUILD_SHARED_LIBS=ON and FLANG_BUILD_NEW_DRIVER=ON

As usual, it's difficult to handle all different configuration in the first row,
but this one has been extensively tested

Differential Revision: https://reviews.llvm.org/D89452

Fix unused variable warning when compiling with asserts disabled.

Differential Revision: https://reviews.llvm.org/D89454

[DebugInstrRef] Support recording of instruction reference substitutions

Add a table recording "substitutions" between pairs of <instruction,
operand> numbers, from old pairs to new pairs. Post-isel optimizations are
able to record the outcome of an optimization in this way. For example, if
there were a divide instruction that generated the quotient and remainder,
and it were replaced by one that only generated the quotient:

  $rax, $rcx = DIV-AND-REMAINDER $rdx, $rsi, debug-instr-num 1
  DBG_INSTR_REF 1, 0
  DBG_INSTR_REF 1, 1

Became:

  $rax = DIV $rdx, $rsi, debug-instr-num 2
  DBG_INSTR_REF 1, 0
  DBG_INSTR_REF 1, 1

We could enter a substitution from <1, 0> to <2, 0>, and no substitution
for <1, 1> as it's no longer generated.

This approach means that if an instruction or value is deleted once we've
left SSA form, all variables that used the value implicitly become
"optimized out", something that isn't true of the current DBG_VALUE
approach.

Differential Revision: https://reviews.llvm.org/D85749

[AggressiveInstCombine] foldAnyOrAllBitsSet - add uniform vector support

Replace m_ConstantInt with m_APInt to support uniform vectors (with no undef elements)

Adding non-undef support would involve some refactoring of the MaskOps struct but this might still be worth it.

[AggressiveInstCombine] foldAnyOrAllBitsSet - add uniform vector tests

[CodeGen][X86] Emit fshl/fshr ir intrinsics for shiftleft128/shiftright128 ms intrinsics

Now that funnel shift handling is pretty good, we can use the intrinsics directly and avoid a lot of zext/trunc issues.

https://godbolt.org/z/YqhnnM

Differential Revision: https://reviews.llvm.org/D89405

[lldb] Explicitly test the template argument SB API

[AMDGPU] Add objdump invalid metadata testcase

Checks that metadata and invalid message are printed.

Differential Revision: https://reviews.llvm.org/D89375

[Statepoints] Unlimited tied operands.

Current limit on amount of tied operands (15) sometimes is too low
for statepoint. We may get couple dozens of gc pointer operands on
statepoint.
Review D87154 changed format of statepoint to list every gc pointer
only once, which makes it trivial to find tiedness relation between
statepoint operands: defs are mapped 1-1 to gc pointer operands passed
on registers.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D87915

[NFC] Correct name of profile function to Profile in APValue

Capitalize the profile function of APValue such that it can be used by FoldingSetNodeID

Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D88643

[yaml2obj] - Allow specifying no tags to create empty sections in few cases.

Currently we have a few sections that
does not support specifying no keys for them. E.g. it is required that one
of "Content", "Size" or "Entries" key is present. There is no reason to
have this restriction. We can allow this and emit an empty section instead.

This opens road for a simplification and generalization of the code in `validate()`
that is discussed in the D89039 thread.

Depends on D89039.

Differential revision: https://reviews.llvm.org/D89391

[libc][NFC] Add probability distributions for memory function sizes

This patch adds memory function size distributions sampled from different applications running in production.
This will be used to benchmark and compare memory functions implementations.

Differential Revision: https://reviews.llvm.org/D89401

[yaml2obj/obj2yaml] - Add support of 'Size' and 'Content' keys for all sections.

Many sections either do not have a support of `Size`/`Content` or support just a
one of them, e.g only `Content`.

`Section` is the base class for sections. This patch adds `Content` and `Size` members
to it and removes similar members from derived classes. This allows to cleanup and
generalize the code and adds a support of these keys for all sections (`SHT_MIPS_ABIFLAGS`
is a only exception, it requires unrelated specific changes to be done).

I had to update/add many tests to test the new functionality properly.

Differential revision: https://reviews.llvm.org/D89039

[TargetLowering] Replace Log2_32_Ceil with Log2_32 in SimplifySetCC ctpop combine.

This combine can look through (trunc (ctpop X)). When doing this
it tries to make sure the trunc doesn't lose any information
from the ctpop. It does this by checking that the truncated type
has more bits that Log2_32_Ceil of the ctpop type. The Ceil is
unnecessary and pessimizes non-power of 2 types.

For example, ctpop of i256 requires 9 bits to represent the max
value of 256. But ctpop of i255 only requires 8 bits to represent
the max result of 255. Log2_32_Ceil of 256 and 255 both return 8
while Log2_32 returns 8 for 256 and 7 for 255

The code with popcnt enabled is a regression for this test case,
but it does match what already happens with i256 truncated to i9.
Since power of 2 is more likely, I don't think it should block
this change.

Differential Revision: https://reviews.llvm.org/D89412

[SVE][NFC] Replace some TypeSize comparisons in non-AArch64 Targets

In most of lib/Target we know that we are not dealing with scalable
types so it's perfectly fine to replace TypeSize comparison operators
with their fixed width equivalents, making use of getFixedSize()
and so on.

Differential Revision: https://reviews.llvm.org/D89101

Increase timeout to find a dSYM in macos DownloadObjectAndSymbolFile

With a large dSYM over a slow home connection, the two minute timeout
would sometimes be exceeded, and we haven't seen instances of a
long timeout causing people any problems, so we're bumping it up.

640 seconds ought to be enough for anyone.

<rdar://problem/67759526>

[LLD] Set alignment as part of Characteristics in TLS table.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46473

LLD wasn't previously specifying any specific alignment in the TLS table's Characteristics field so the loader would just assume the default value (16 bytes). This works most of the time except if you have thread locals that want specific higher alignments (e.g. 32 as in the bug) *even* if they specify an alignment on the thread local. This change updates LLD to take the max alignment from tls section.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D88637

Revert "[LLD] Set alignment as part of Characteristics in TLS table."

Revert individual wip commits and will instead follow up with a
single commit with all the changes. Makes cherry-picking easier
and will contain all the right tags.

This reverts commit 32a4ad3b6ce6028a371b028cf06fa5feff9534bf.
This reverts commit 7fe13af676678815989a6d0ece684687953245e7.
This reverts commit 51fbc1bef657bb0f5808986555ec3517a84768c4.
This reverts commit f80950a8bb985c082b26534b0e157447bf803935.
This reverts commit 0778cad9f325df4d7b32b22f3dba201a16a0b8fe.
This reverts commit 8b70d527d7ec1c8b9e921177119a0d906ffad4f0.

Fix llvm-symbolizer assembly-based test to require x86 and specify x86 when assembling

llvm-symbolizer: Exit non-zero when DWARF parsing errors have been rendered

Fix typeo in attach failed error message.
<rdar://problem/70296751>

[AMDGPU] Pre-commit test for D89187

llvm-symbolizer: Ensure non-zero exit when an error is printed

(this doesn't cover all cases - libDebugInfoDWARF has a default error
handler that prints errors without any exit code handling - I'll be
following up with a patch for that after this)