review.tizen.org Git - platform/upstream/llvm.git/log

AMDGPU/llvm-readobj: Add missing tests for note parsing/displaying

This is a follow up review/change for https://reviews.llvm.org/D95638

Add valid note tests for code object v2 notes:
  - NT_AMD_HSA_CODE_OBJECT_VERSION (required yaml2obj update)
  - NT_AMD_HSA_HSAIL (required yaml2obj update)
  - NT_AMD_HSA_ISA_VERSION (required yaml2obj update)
  - NT_AMD_HSA_METADATA
  - NT_AMD_HSA_ISA_NAME
  - NT_AMD_PAL_METADATA

Add valid note tests for code object v3 notes:
  - NT_AMDGPU_METADATA

Add invalid note tests for code object v2 notes:
  - NT_AMD_HSA_CODE_OBJECT_VERSION (required yaml2obj update)
  - NT_AMD_HSA_HSAIL (required yaml2obj update)
  - NT_AMD_HSA_ISA_VERSION (required yaml2obj update)

Add invalid note tests for code object v3 notes:
  - NT_AMDGPU_METADATA

Differential Revision: https://reviews.llvm.org/D101304

[lldb] More tests for DumpDataExtractor

* Using a base address or skipping it with LLDB_INVALID_ADDRESS
* Using a data offset, which does not effect the printed addresses
* Not providing an output stream
* Formatting a double sized HexFloat
* Formatting over multiple lines

Since address printing now has its own test,
I've removed the base address from all the format
type tests.

The multi line tests still use a base address to check that
it's incremented correctly for each new line.

Reviewed By: teemperor

Differential Revision: https://reviews.llvm.org/D101627

[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch

Differential Revision: https://reviews.llvm.org/D99354

[PowerPC] Add new infrastructure to select load/store instructions, update P8/P9 load/store patterns.

This patch introduces a new infrastructure that is used to select the load and
store instructions in the PPC backend.

The primary motivation is that the current implementation of selecting load/stores
is dependent on the ordering of patterns in TableGen. Given this limitation, we
are not able to easily and reliably generate the P10 prefixed load and stores
instructions (such as when the immediates that fit within 34-bits). This
refactoring is meant to provide us with more control over the patterns/different
forms to exploit, as well as eliminating dependency of pattern declaration in TableGen.

The idea of this refactoring is that it introduces a set of addressing modes that
correspond to different instruction formats of a particular load and store
instruction, along with a set of common flags that describes a load/store.
Whenever a load/store instruction is being selected, we analyze the instruction
and compute a set of flags for it. The computed flags are then used to
select the most optimal load/store addressing mode.

This patch is the first of a series of patches to be committed - it contains the
initial implementation of the refactored load/store selection infrastructure and
also updates P8/P9 patterns to adopt this infrastructure. The idea is that
incremental patches will add more implementation and support, and eventually
the old implementation will be removed.

Differential Revision: https://reviews.llvm.org/D93370

[XCOFF][AIX] Add Global Variables Directly to TOC for 32 bit AIX

Summary:
This patch implements the backend implementation of adding global variables
directly to the table of contents (TOC), rather than adding the address of the
variable to the TOC.
Currently, this patch will look for the "toc-data" attribute on symbols in the
IR, and then add those symbols to the TOC.
ATM, this is implemented for 32 bit AIX.

Reviewers: sfertile
Differential Revision: https://reviews.llvm.org/D101178

[clang] Fix assert() crash when checking undeduced arg alignment

There already was a check for undeduced and incomplete types, but it
failed to trigger when outer type (SubstTemplateTypeParm in test) looked
fine, but inner type was not.

Differential Revision: https://reviews.llvm.org/D100667

[AMDGPU] Add test for set_gpr_idx removal with conditional branches

sanitizer_common: introduce kInvalidTid/kMainTid

Currently we have a bit of a mess related to tids:
- sanitizers re-declare kInvalidTid multiple times
- some call it kUnknownTid
- implicit assumptions that main tid is 0
- asan/memprof claim their tids need to fit into 24 bits,
but this does not seem to be true anymore
- inconsistent use of u32/int to store tids

Introduce kInvalidTid/kMainTid in sanitizer_common
and use them consistently.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D101428

[gn build] Port 43bc584dc05e

[VE] VP intrinsics are legal

[VP,Integer,#2] ExpandVectorPredication pass

This patch implements expansion of llvm.vp.* intrinsics
(https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).

VP expansion is required for targets that do not implement VP code
generation. Since expansion is controllable with TTI, targets can switch
on the VP intrinsics they do support in their backend offering a smooth
transition strategy for VP code generation (VE, RISC-V V, ARM SVE,
AVX512, ..).

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D78203

[AMDGPU] Add implicit negative check for the set_gpr_idx tests

The only effect of the optimization is to remove s_set_gpr_idx_*
instructions, and update_mir_test_checks.py always inserts CHECK: rather
than CHECK-NEXT: checks, so without this implicit negative check, the
tests would always pass even if the optimization did nothing.

Differential Revision: https://reviews.llvm.org/D101622

[OpenCL] Prevent adding vendor extensions for all targets

Removed extension begin/end pragma as it has no effect and
it is added unconditionally for all targets.

Differential Revision: https://reviews.llvm.org/D92244

[lld/mac] Remove unused -L%t flags from tests

No behavior change.

Differential Revision: https://reviews.llvm.org/D101623

[docs]Added llvm/bindings section

Added information about language bindings provided by LLVM.

Reviewed By: xgupta, gandhi21299

Differential Revision: https://reviews.llvm.org/D101295

[lld/mac] Tweak two comments and fix style on one variable name

Cosmetic, no behavior change.

[MCA] Fix CarryOver check in the DispatchStage (PR50174).

Early exit from method DispatchStage::isAvailable() if the dispatch group is
already full. Not all instructions declare at least one uOP.
Fixes PR50174.

[clang] Refactor mustprogress handling, add it to all loops in c++11+.

Currently Clang does not add mustprogress to inifinite loops with a
known constant condition, matching C11 behavior. The forward progress
guarantee in C++11 and later should allow us to add mustprogress to any
loop (http://eel.is/c++draft/intro.progress#1).

This allows us to simplify the code dealing with adding mustprogress a
bit.

Reviewed By: aaron.ballman, lebedev.ri

Differential Revision: https://reviews.llvm.org/D96418

[AMDGPU] Fix inconsistent ---/... in MIR tests and regenerate checks

In some cases the lack of --- or ... confused update_mir_test_checks.py
into not adding any checks for a function.

[libc++] [test] Run the clang-format and generated-output checks on the "service" queue

As these jobs only run in a couple seconds, and block starting of
other jobs, they can run on the "service" queue which doesn't get
blocked by other long-running jobs.

Differential Revision: https://reviews.llvm.org/D101437

[libc++] Minor cleanups in <iterator>. NFCI.

[ARM][MVE] vcreateq lane ordering for big endian

Use of bitcast resulted in lanes being swapped for vcreateq with big
endian. Fix this by using vreinterpret. No code change for little
endian. Adds IR lit test.

Differential Revision: https://reviews.llvm.org/D101606

[clangd][NFC] Remove unnecessary string captures in lambdas.

Due to a somewhat annoying, but necessary, shortfall in -Wunused-lambda-capture, These unused captures aren't warned about.

Reviewed By: kadircet

Differential Revision: https://reviews.llvm.org/D101611

Require shell for lld/test/MachO/reproduce.s

as a way of not running it on Windows, where the file paths when
extracting repro2.tar can become longer than the maximum file length
limit (depending on the build dir name) and cause the test to fail.

(See https://crbug.com/1204463 for example test failure.)

clang-format: [JS] handle "off" in imports

Previously, the JavaScript import sorter would ignore `// clang-format
off` and `on` comments. This change fixes that. It tracks whether
formatting is enabled for a stretch of imports, and then only sorts and
merges the imports where formatting is enabled, in individual chunks.

This means that there's no meaningful total order when module references are mixed
with blocks that have formatting disabled. The alternative approach
would have been to sort all imports that have formatting enabled in one
group. However that raises the question where to insert the
formatting-off block, which can also impact symbol visibility (in
particular for exports). In practice, sorting in chunks probably isn't a
big problem.

This change also simplifies the general algorithm: instead of tracking
indices separately and sorting them, it just sorts the vector of module
references. And instead of attempting to do fine grained tracking of
whether the code changed order, it just prints out the module references
text, and compares that to the previous text. Given that source files
typically have dozens, but not even hundreds of imports, the performance
impact seems negligible.

Differential Revision: https://reviews.llvm.org/D101515

[NARY] Don't optimize min/max if there are side uses (part2)

Previous attempt to fix infinite recursion in min/max reassociation was not fully successful (D100170). Newly discovered failing case is due to not properly handled when there is a single use. It should be processed separately from 2 uses case.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D101359

[Doc] Fix sphinx warnings about wrong code-block format

Differential Revision: https://reviews.llvm.org/D101549

[Passes] Run sinking/hoisting in SimplifyCFG earlier.

Hoisting and sinking instructions out of conditional blocks enables
additional vectorization by:

1. Executing memory accesses unconditionally.
2. Reducing the number of instructions that need predication.

After disabling early hoisting / sinking, we miss out on a few
vectorization opportunities. One of those is causing a ~10% performance
regression in one of the Geekbench benchmarks on AArch64.

This patch tires to recover the regression by running hoisting/sinking
as part of a SimplifyCFG run after LoopRotate and before LoopVectorize.

Note that in the legacy pass-manager, we run LoopRotate just before
vectorization again and there's no SimplifyCFG run in between, so the
sinking/hoisting may impact the later run on LoopRotate. But the impact
should be limited and the benefit of hosting/sinking at this stage
should outweigh the risk of not rotating.

Compile-time impact looks slightly positive for most cases.
http://llvm-compile-time-tracker.com/compare.php?from=2ea7fb7b1c045a7d60fcccf3df3ebb26aa3699e5&to=e58b4a763c691da651f25996aad619cb3d946faf&stat=instructions

NewPM-O3: geomean -0.19%
NewPM-ReleaseThinLTO: geoman -0.54%
NewPM-ReleaseLTO-g: geomean -0.03%

With a few benchmarks seeing a notable increase, but also some
improvements.

Alternative to D101290.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D101468

[AArch64][SVE] Lower index_vector to step_vector

As discussed in D100107, this patch first convert index_vector to
step_vector, and convert step_vector back to index_vector after LegalizeDAG.

Differential Revision: https://reviews.llvm.org/D100816

[InlineCost] CallAnalyzer: use TTI info for extractvalue - they are free (PR50099)

It seems incorrect to use TTI data in some places,
and override it in others. In this case, TTI says
that `extractvalue` are free, yet we bill them.

While this doesn't address https://bugs.llvm.org/show_bug.cgi?id=50099 yet,
it reduces the cost from 55 to 50 while the threshold is 45.

Differential Revision: https://reviews.llvm.org/D101228

Wrap edit line configuration calls into helper functions

Currently we call el_set directly to configure the editor in the libedit
wrapper.  There are some cases in which this causes extra casting, but we pass
captureless lambdas as function pointers, which should work out of the box.
Since el_set takes varargs, if the cast is incorrect or if the cast is not
present, it causes a run time failure rather than compile error.  This change
makes it so a few different types of configuration is done inside a helper
function to provide type safety and eliminate that casting.  I didn't do all
edit line configuration because I'm not sure how important it was in other cases
and it might require something more general keep up with libedit's signature.
I'm open to suggestions, though.

Reviewed By: teemperor, JDevlieghere

Differential Revision: https://reviews.llvm.org/D101250

[AMDGPU] Tidy up some simple expressions for clarity NFC

Slight refactor for clarity.

Change-Id: Ib25e7f4582c67a7c57f066cfd5382c1405d7d4c5

Differential Revision: https://reviews.llvm.org/D101610

[JITLink] Minor fix to avoid Windows compiler warning for static-cast

Change-Id: Id0c1d5535b53e2aebe314151c0efa585e763f3f6

Differential Revision: https://reviews.llvm.org/D100093

[AArch64] Change __ARM_FEATURE_FP16FML macro name to __ARM_FEATURE_FP16_FML

The "Arm C Language extensions" document (the current version can be
found at https://developer.arm.com/documentation/101028/0012/?lang=en)
states that the name of the feature test macro for the FP16 FML extension
is __ARM_FEATURE_FP16_FML.

Differential Revision: https://reviews.llvm.org/D101532

[lldb] Add tests for DumpDataExtractor formats

Covering basic cases where you have 1 item on 1 line.

Apart from eFormatCharArray, where using multiple lines
highlights the difference between it and eFormatVectorOfChar.

Reviewed By: #lldb, teemperor

Differential Revision: https://reviews.llvm.org/D101453

[RISCV][NFC] Merge RV32/RV64 test checks with a common prefix

[RISCV] Support STEP_VECTOR with a step greater than one

DAGCombiner was recently taught how to combine STEP_VECTOR nodes,
meaning the step value is no longer guaranteed to be one by the time it
reaches the backend for lowering.

This patch supports such cases on RISC-V by lowering to other step
values to a multiply following the vid.v instruction. It includes a
small optimization for common cases where the multiply can be expressed
as a shift left.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D100856

[llvm][Support][NFC] Fix fallthrough attribute indentation

The attribute does not belong to the if statement before and trips up
gcc's indentation checker.

tsan: fix fork syscall test

Arm64 builders failed with:
error: use of undeclared identifier 'SYS_fork'
https://lab.llvm.org/buildbot/#/builders/7/builds/2575

Indeed, not all arches have fork syscall.
Implement fork via clone on these arches.

Differential Revision: https://reviews.llvm.org/D101603

[GISel] Teach TableGen to check predicates of immediate operands in patterns

Reviewed By: dsanders

Differential Revision: https://reviews.llvm.org/D91703

[AMDGPU] Simplify getWaitStatesSince. NFC.

[cmake] Use -ffunction-sections and -Wl,--gc-sections on MinGW targets

If compiling with GCC or linking with ld.bfd, these options have little
effect, but if built with Clang and linked with LLD, they provide a
quite notable size decrease - this shrinks an entire llvm-mingw
distribution package by 22%.

If building with BUILD_SHARED_LIBS or LLVM_BUILD_LLVM_DYLIB with LLD,
this requires a version of LLD that contains a fix for auto exporting
symbols from comdats, 2b01a417d7ccb001ccc1185ef5fdc967c9fac8d7.

Differential Revision: https://reviews.llvm.org/D101568

Fix -fdebug-pass-structure test case

Pass structure can change when -O0 is given and extensions are used.

Reapply [llvm-readobj] [ARMWinEH] Fix handling of relocations and symbol offsets

When looking up data referenced from pdata/xdata structures, the
referenced data can be found in two different ways:
- For an unrelocated object file, it's located via a relocation
- For a relocated, linked image, the data is referenced with an
(image relative) absolute address

For the latter case, the absolute address can optionally be
described with a symbol.

For the case of an object file, there's two offsets involved; one
immediate offset encoded in the data location that is modified by
the relocation, and a section offset in the symbol.

Previously, for the ExceptionRecord field, we printed the offset
from the symbol (only) but used the immediate offset ignoring
the symbol's address (using only the symbol's section) for printing
the exception data.

Add a helper method for doing the lookup and address calculation,
for simplifying the calling code and making all the cases consistent.

This addresses an existing FIXME comment, fixing printing of the
exception data for cases where relocations point at individual
symbols in the xdata section (which is what MSVC generates) instead of
all relocations pointing at the start of the xdata section (which is
what LLVM generates).

This also fixes printing of the function name for packed entries in
linked images.

Relanded with a format string fix in the formatSymbol function; one
can't use %X as format string for an uint64_t. That bug has been
present since this code was added in e6971cab306cd.

Differential Revision: https://reviews.llvm.org/D100305

tsan: refactor fork handling

Commit efd254b6362 ("tsan: fix deadlock in pthread_atfork callbacks")
fixed another deadlock related to atfork handling.
But builders with DCHECKs enabled reported failures of
pthread_atfork_deadlock2.c and pthread_atfork_deadlock3.c tests
related to the fact that we hold runtime locks on interceptor exit:
https://lab.llvm.org/buildbot/#/builders/70/builds/6727
This issue is somewhat inherent to the current approach,
we indeed execute user code (atfork callbacks) with runtime lock held.

Refactor fork handling to not run user code (atfork callbacks)
with runtime locks held. This change does this by installing
own atfork callbacks during runtime initialization.
Atfork callbacks run in LIFO order, so the expectation is that
our callbacks run last, right before the actual fork.
This way we lock runtime mutexes around fork, but not around
user callbacks.

Extend tests to also install after fork callbacks just to cover
more scenarios. Some tests also started reporting real races
that we previously suppressed.

Also extend tests to cover fork syscall support.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D101517

[debugserver] Use add_lldb_library instead of add_library

Use add_lldb_library to ensure debugserver inherits the defines set by
llvm and lldb.

Differential revision: https://reviews.llvm.org/D101596

[msan] Add static to some msan allocator functions

This is to help review refactor the allocator code.
So it is easy to see which are the real public interfaces.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D101586

Pre-commit test for PPC vector extraction test

[InlineCost] Remove visitUnaryInstruction()

The simplifyInstruction() in visitUnaryInstruction() does not trigger
for all of check-llvm. Looking at all delegates to UnaryInstruction in
InstVisitor, the only instructions that either don't have a visitor in
CallAnalyzer, or redirect to UnaryInstruction, are VAArgInst and Alloca.
VAArgInst will never get simplified, and visitUnaryInstruction(Alloca)
would always return false anyway.

Reviewed By: mtrofin, lebedev.ri

Differential Revision: https://reviews.llvm.org/D101577

[AMDGPU] Skip promote-alloca for insertelement/insertvalue users

It is difficult to track the users of vector and aggregate types.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D101562

[RISCV] Fix StackOffset calculation when using sp to access the fixed stack object in the case of rvv vector objects existed

When rvv vector objects existed, using sp to access the fixed stack object will pass the rvv vector objects field. So the StackOffset needs add a scalable offset of the size of rvv vector objects field

Differential Revision: https://reviews.llvm.org/D100286

[RISCV] Precommit a test case that test accessing a fixed object when has rvv vector object existed

Differential Revision: https://reviews.llvm.org/D100284

[CMake][compiler-rt] avoid conflict with builtin check_linker_flag

Rename `check_linker_flag` in compiler_rt to avoid conflict. Follow up
as the fix in D100901.

Patched by radford.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D101581

[MS] Preserve base register %rbx around cpuid

This patch copies implementation from cpuid.h, which preserve base register %rbx around cpuid. It fixes PR50133.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D101338

[AArch64][GlobalISel] Fix width value for G_SBFX/G_UBFX

When creating G_SBFX/G_UBFX opcodes, the last operand is the
width instead of the bit position. The bit position is used
for the AArch64 SBFM and UBFM instructions. The bit position
is converted to a width if the SBFX/UBFX aliases are generated.
For other SBMF/UBFM aliases, such as shifts, the bit position
is used.

Differential Revision: https://reviews.llvm.org/D101543

VirtRegMap: Support partially allocated virtual registers

Don't assert if there are unassigned virtual registers. Maintain
LiveIntervals by removing the RegUnits for allocated registers, since
they should not longer be necessary.

One part I find somewhat questionable is the special handling
necessary for handleIdentityCopy. The LiveIntervals for the relevant
regunits needs to be removed.

[lldb-vscode] Follow up of D99989 - store some strings more safely

As a follow up of https://reviews.llvm.org/D99989#inline-953343, I'm now
storing std::string instead of char *. I know it might never break as char *,
but if it does, chasing that bug might be dauting.
Besides, I'm also checking of the strings gotten through the SB API are
null or not.

VirtRegMap: Add pass option to not clear virt regs

In a future change it will be possible to run register
allocation with a specific set of register classes,
so some of the remaining virtual registers will still
be meaningful.

AMDGPU: Add missing runline to test

There are checks for gfx908, but this wasn't actually running with it.

[AMDGPU][NFC] Refactor hazard recognition IsHazardFn and IsExpiredFn

Refactor IsHazardFn and IsExpiredFn to use constant references as these should not be mutating the instructions visited and the instruction can never be null.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D101430

[AMDGPU] Remove dead early-out in GCNHazardRecognizer

Remove an early-out in wait state counting which can never be
taken.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D101520

[Sema] Don't set BlockDecl's DoesNotEscape bit if the parameter type of
the function the block is passed to isn't a block pointer type

This patch fixes a bug where a block passed to a function taking a
parameter that doesn't have a block pointer type (e.g., id or reference
to a block pointer) was marked as noescape.

This partially fixes PR50043.

rdar://77030453

Differential Revision: https://reviews.llvm.org/D101097

[msan] Remove dead function/fields

To see how to extract a shared allocator interface for D101204,
found some unused code. Tests passed. Are they safe to remove?

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D101559

[ObjC][ARC] Don't enter the cleanup scope if the initializer expression
isn't an ExprWithCleanups

This patch fixes a bug where a temporary ObjC pointer is released before
the end of the full expression.

This fixes PR50043.

rdar://77030453

Differential Revision: https://reviews.llvm.org/D101502

Reland "[lld-link] Enable addrsig table in COFF lto"

This reverts commit a78fa73bcf986cf5912d665ecd9620535f480607.

The commit cab48e2f0e00648ef0494ce114f4e00a3ded330f fixes the issue on eabd55b1b2c5e322c3b36cb44348f178692890c8.

[mlir][sparse] migrate sparse operations into new sparse tensor dialect

This is the very first step toward removing the glue and clutter from linalg and
replace it with proper sparse tensor types. This revision migrates the LinalgSparseOps
into SparseTensorOps of a sparse tensor dialect. This also provides a new home for
sparse tensor related transformation.

NOTE: the actual replacement with sparse tensor types (and removal of linalg glue/clutter)
will follow but I am trying to keep the amount of changes per revision manageable.

Differential Revision: https://reviews.llvm.org/D101573

[CodeGen] don't emit addrsig symbol if it's used only by metadata

Value only used by metadata can be removed from .addrsig table.
This solves the undefined symbol error when enabling addrsig table on COFF LTO.

Differential Revision: https://reviews.llvm.org/D101512

[mlir][tosa] Remove constant-0 dim expr values from TOSA lowerings

Constant-0 dim expr values should be avoided for linalg as it can prevent
fusion. This includes adding support for rank-0 reshapes.

Differential Revision: https://reviews.llvm.org/D101418

[XCOFF] Handle the case when personality routine is an alias

Summary:
Personality routine could be an alias to another personality routine.
Fix the situation when we compile the file that contains the personality
routine and the file also have functions that need to refer to the
personality routine.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D101401

Recommit "[clang][driver] Use the provided arch name for a Darwin target triple

This ensures that the Darwin driver uses a consistent target triple
representation when the triple is printed out to the user.

This reverts the revert commit ab0df6c0346e515291a381467527621ab0ccf953.

Differential Revision: https://reviews.llvm.org/D100807

[libcxx][ranges] Fix tests for stdlib types that conform to sized_sentinel_for.

Differential Revision: https://reviews.llvm.org/D101371

[GlobalISel][Legalizer] Bump up a smallvector size that was found to be too small. NFC.

[CMake] Stop using c++ subdirectory for libc++ on Win to ARM Linux cross builds. NFC

Updated cross Win-x-ARM Linux toolchain cmake cache file in according of
the following changes: https://reviews.llvm.org/D100869

Stop using use c++ subdirectory for libc++ library

[ORC] JITDylib::addDependencies should be run under the session lock.

Revert "[llvm-readobj] [ARMWinEH] Fix handling of relocations and symbol offsets"

This reverts commit 37789240882bfacd951767acdb4c088fcbf53385.

The added test fails on at least one buildbot, by printing a reversed
combination, printing "func3_xdata +0x18 (0x8)" while it's supposed to
be "func3_xdata +0x8 (0x18)", see e.g.
https://lab.llvm.org/buildbot/#/builders/107/builds/7269. Currently
no idea how that could happen, but reverting until it can be figured
out.

[AArch64][GlobalISel] Simplify out of range rotate amount.

Differential Revision: https://reviews.llvm.org/D101005

Revert "[mlir][sparse] migrate sparse operations into new sparse tensor dialect"

This reverts commit a6d92a971175d727873a9e7644913ee02d7232a8.

The build with -DBUILD_SHARED_LIBS=ON is broken.

[llvm-readobj] [ARMWinEH] Fix handling of relocations and symbol offsets

When looking up data referenced from pdata/xdata structures, the
referenced data can be found in two different ways:
- For an unrelocated object file, it's located via a relocation
- For a relocated, linked image, the data is referenced with an
(image relative) absolute address

For the latter case, the absolute address can optionally be
described with a symbol.

For the case of an object file, there's two offsets involved; one
immediate offset encoded in the data location that is modified by
the relocation, and a section offset in the symbol.

Previously, for the ExceptionRecord field, we printed the offset
from the symbol (only) but used the immediate offset ignoring
the symbol's address (using only the symbol's section) for printing
the exception data.

Add a helper method for doing the lookup and address calculation,
for simplifying the calling code and making all the cases consistent.

This addresses an existing FIXME comment, fixing printing of the
exception data for cases where relocations point at individual
symbols in the xdata section (which is what MSVC generates) instead of
all relocations pointing at the start of the xdata section (which is
what LLVM generates).

This also fixes printing of the function name for packed entries in
linked images.

Differential Revision: https://reviews.llvm.org/D100305

[LLD] [COFF] Fix the mingw --export-all-symbols behaviour with comdat symbols

When looking for the "all" symbols that are supposed to be exported,
we can't look at the live flag - the symbols we mark as to be
exported will become GC roots even if they aren't yet marked as live.

With this in place, building an LLVM library with BUILD_SHARED_LIBS
produces the same set of symbols exported regardless of whether the
--gc-sections flag is specified, both with and without being built
with -ffunction-sections.

Differential Revision: https://reviews.llvm.org/D101522

[flang][OpenMP][FIX] Fix the worksharing nesting check with inclusion of more constructs to cover combined constructs.

Revert "Generalize getInvertibleOperand recurrence handling slightly"

This reverts commit 0c01b37eeb18a51a7e9c9153330d8009de0f600e while a problem reported is investigated.

[mlir] Fix lowering of multi-dimensional vector log1p to LLVM

This was using the untransformed operand, leading to invalid IR.

Differential Revision: https://reviews.llvm.org/D101531

[AMDGPU] Fix v_swap_b32 formation on physical registers

As explained in the comments, matchSwap matches:

// mov t, x
// mov x, y
// mov y, t

and turns it into:

// mov t, x (t is potentially dead and move eliminated)
// v_swap_b32 x, y

On physical registers we don't have full use-def chains so the check
for T being live-out was not working properly with subregs/superregs.

Differential Revision: https://reviews.llvm.org/D101546

[COST] Improve shuffle kind detection if shuffle mask is provided.

Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865

Revert "[COST] Improve shuffle kind detection if shuffle mask is provided."

This reverts commit 92399322217917e67c0d72a55ec51ddc82251cf6 to fix
a compiler crash on mask checks.

[lld-macho] Remove stray file

Basic block sections for functions with implicit-section-name attribute

Functions can have section names set via #pragma or section attributes,
basic block sections should be correctly named for such functions.

With #pragma, the expectation is that all functions in that file are placed
in the same section in the final binary. Basic block sections should be
correctly named with the unique flag set so that the final binary has all the
basic blocks of the function in that named section. This patch fixes the bug
by calling getExplictSectionGlobal when implicit-section-name attribute is set
to make sure the function's basic blocks get the correct section name.

Differential Revision: https://reviews.llvm.org/D101311

[lld-macho][nfc] Clean up header.s test

I don't think it's super worthwhile to test the dylib headers outputs of
all the different archs when x86_64 is the only one that has interesting
behavior.

Motivated by my upcoming addition of arm32...

[lld-macho] Make everything PIE by default

Modern versions of macOS (>= 10.7) and in general all modern Mach-O
target archs want PIEs by default. ld64 defaults to PIE for iOS >= 4.3,
as well as for all versions of watchOS and simulators. Basically all the
platforms LLD is likely to target want PIE. So instead of cluttering LLD's
code with legacy version checks, I think it's simpler to just default to
PIE for everything.

Note that `-no_pie` still works, so users can still opt out of it.

Reviewed By: #lld-macho, thakis, MaskRay

Differential Revision: https://reviews.llvm.org/D101513

[mlir][sparse] migrate sparse operations into new sparse tensor dialect

This is the very first step toward removing the glue and clutter from linalg and
replace it with proper sparse tensor types. This revision migrates the LinalgSparseOps
into SparseTensorOps of a sparse tensor dialect. This also provides a new home for
sparse tensor related transformation.

NOTE: the actual replacement with sparse tensor types (and removal of linalg glue/clutter)
will follow but I am trying to keep the amount of changes per revision manageable.

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D101488

[InstCombine] narrow popcount with zext operand

https://llvm.org/PR50141

[InstCombine] add tests for popcount with zext operand; NFC

PR50141

Revert "RegAlloc: do not consider liveins to EH-pad successors as liveout."

Some liveins *can* come from this block (e.g. any SSA value except the call),
it's only the ones that produce `landingpad` values that can't and I didn't
think it through properly.

AMDGPU/GlobalISel: Fix selection of image intrinsics with unused return

When atomic image intrinsic return value is unused, register class for
destination of a sub-register copy of return value ends up not being set.
This copy then hits 'Register class not set' assert later.
If return value has uses, register class is determined by use instruction.
Fix is to not create sub-register copy when image intrinsic destination has
no uses because it would be deleted by dead-mi-elimination later anyway.

Differential Revision: https://reviews.llvm.org/D101448

[ASan] Rename `-fsanitize-address-destructor-kind=` to drop the `-kind` suffix.

Renaming the option is based on discussions in https://reviews.llvm.org/D101122.

It is normally not a good idea to rename driver flags but this flag is
new enough and obscure enough that it is very unlikely to have adopters.

While we're here also drop the `<kind>` metavar. It's not necessary and
is actually inconsistent with the documentation in
`clang/docs/ClangCommandLineReference.rst`.

Differential Revision: https://reviews.llvm.org/D101491

RegAlloc: do not consider liveins to EH-pad successors as liveout.

These registers get defined by the runtime, not the block being allocated, and
treating them as preassigned in RegAllocFast adds extra pressure, sometimes
enough to make the function unallocatable.

[AIX][TLS] Add ASM portion changes to support TLSGD relocations to XCOFF objects

- Add new variantKinds for the symbol's variable offset and region handle
- Print the proper relocation specifier @gd in the asm streamer when emitting
the TC Entry for the variable offset for the symbol
- Fix the switch section failure between the TC Entry of variable offset and
region handle
- Put .__tls_get_addr symbol in the ProgramCodeSects with XTY_ER property

Reviewed by: sfertile

Differential Revision: https://reviews.llvm.org/D100956

[SimplifyCFG] Common code sinking: fix application of profitability check

The profitability check is: we don't want to create more than a single PHI
per instruction sunk. We need to create the PHI unless we'll sink
all of it's would-be incoming values.

But there is a caveat there.
This profitability check doesn't converge on the first iteration!
If we first decide that we want to sink 10 instructions,
but then determine that 5'th one is unprofitable to sink,
that may result in us not sinking some instructions that
resulted in determining that some other instruction
we've determined to be profitable to sink becoming unprofitable.

So we need to iterate until we converge, as in determine
that all leftover instructions are profitable to sink.

But, the direct approach of just re-iterating seems dumb,
because in the worst case we'd find that the last instruction
is unprofitable, which would result in revisiting instructions
many many times.

Instead, i think we can get away with just two passes - forward and backward.
However then it isn't obvious what is the most performant way to update
InstructionsToSink.

[lld][WebAssembly] Add `--export-if-defined`

Unlike the existing `--export` option this will not causes errors
or warnings if the specified symbol is not defined.

See: https://github.com/emscripten-core/emscripten/issues/13736

Differential Revision: https://reviews.llvm.org/D99887

[libc++] Fixes std::to_chars for bases != 10.

While working on D70631, Microsoft's unit tests discovered an issue.
Our `std::to_chars` implementation for bases != 10 uses the range
`[first,last)` as temporary buffer. This violates the contract for
to_chars:
[charconv.to.chars]/1 http://eel.is/c++draft/charconv#to.chars-1
`to_chars_result to_chars(char* first, char* last, see below value, int base = 10);`
"If the member ec of the return value is such that the value is equal to
the value of a value-initialized errc, the conversion was successful and
the member ptr is the one-past-the-end pointer of the characters
written."

Our implementation modifies the range `[member ptr, last)`, which causes
Microsoft's test to fail. Their test verifies the buffer
`[member ptr, last)` is unchanged. (The test is only done when the
conversion is successful.)

While looking at the code I noticed the performance for bases != 10 also
is suboptimal. This is tracked in D97705.

This patch fixes the issue and adds a benchmark. This benchmark will be
used as baseline for D97705.

Reviewed By: #libc, Quuxplusone, zoecarver

Differential Revision: https://reviews.llvm.org/D100722