review.tizen.org Git - platform/upstream/llvm.git/log

[InstCombine] Return replaceInstUsesWith() result (NFC)

Follow the usual usage pattern for this function and return the
result.

[AArch64] Generate and parse SEH assembly directives

This ensures that you get the same output regardless if generating
code directly to an object file or if generating assembly and
assembling that.

Add implementations of the EmitARM64WinCFI*() methods in
AArch64TargetAsmStreamer, and fill in one blank in MCAsmStreamer.

Add corresponding directive handlers in AArch64AsmParser and
COFFAsmParser.

Some SEH directive names have been picked to match the prior art
for SEH assembly directives for x86_64, e.g. the spelling of
".seh_startepilogue" matching the preexisting ".seh_endprologue".

For the directives for saving registers, the exact spelling
from the arm64 documentation is picked, e.g. ".seh_save_reg" (to follow
that naming for all the other ones, e.g. ".seh_save_fregp_x"), while
the corresponding one for x86_64 is plain ".seh_savereg" without the
second underscore.

Directives in the epilogues have the same names as in prologues,
e.g. .seh_savereg, even though the registers are restored, not
saved, at that point.

Differential Revision: https://reviews.llvm.org/D86529

[MC] [Win64EH] Fill in FuncletOrFuncEnd if missing

This can happen e.g. for code that declare .seh_proc/.seh_endproc
in assembly, or for code that use .seh_handlerdata (which triggers
the unwind info to be emitted before the end of the function).

The TextSection field must be made non-const to be able to use it
with Streamer.SwitchSection().

Differential Revision: https://reviews.llvm.org/D86528

[InstCombine] foldAggregateConstructionIntoAggregateReuse(): use InstCombiner::replaceInstUsesWith() instead of RAUW

We really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.

[InstCombine] canonicalizeICmpPredicate(): use InstCombiner::replaceInstUsesWith() instead of RAUW

We really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.

[NFC][InstCombine] Fix some comments: the code already uses IC::replaceInstUsesWith()

[NFC] Instruction::isIdenticalToWhenDefined(): s/nessesairly/necessarily/

[NFC][InstCombine] Add STATISTIC() for how many iterations we did

As we've established, if it takes more than two iterations
(one to perform folding and one to ensure that no folding opportunities
remain) per function, then there are worklist management issues.
So it may be interesting to keep track of it.

[NFC][InstCombine] select.ll: remove outdated TODO comment

Fixed by 3e69871ab5a66fb55913a2a2f5e7f5b42899a4c9

[InstCombine] visitPHINode(): use InstCombiner::replaceInstUsesWith() instead of RAUW

As noted in post-commit review, we really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.

[InstCombine] Take 2: Perform trivial PHI CSE

The original take was 6102310d814ad73eab60a88b21dd70874f7a056f,
which taught InstSimplify to do that, which seemed better at time,
since we got EarlyCSE support for free.

However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb40d26b5d3ead3bb8e9e284218e0186
that reverted it :
> It appears to cause compilation non-determinism and caused stage3 mismatches.

However InstCombine already does many different optimizations,
so it should be a safe place to do it here.

Note that we still can't just compare incoming values ranges,
because there is no guarantee that these PHI's we'd simplify to
were already re-visited and sorted.
However coming up with a test is problematic.

Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |      |%| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instcombine.NumPHICSEs                             | 0         | 22228     |  22228 |    0.00% |    0.00% |
| asm-printer.EmittedInsts                           | 7942329   | 7942456   |    127 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 254295632 | 254313792 |  18160 |    0.01% |    0.01% |
| early-cse.NumCSE                                   | 2183283   | 2183272   |    -11 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 541842    |  -8263 |   -1.50% |    1.50% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640311   | 3666911   |  26600 |    0.73% |    0.73% |
| instcombine.NumDeadInst                            | 1778204   | 1783318   |   5114 |    0.29% |    0.29% |
| instcount.NumCallInst                              | 1758395   | 1758804   |    409 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330549    |     -8 |    0.00% |    0.00% |
| instcount.TotalBlocks                              | 1077138   | 1077221   |     83 |    0.01% |    0.01% |
| instcount.TotalFuncs                               | 101442    | 101441    |     -1 |    0.00% |    0.00% |
| instcount.TotalInsts                               | 8831946   | 8832611   |    665 |    0.01% |    0.01% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019813   | 999740    | -20073 |   -1.97% |    1.97% |
```
So it fires ~22k times, which is less than ~24k the take 1 did.
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).

All in all, expectedly, this catches less things overall,
but all the motivational cases are still caught, so all good.

[NFC][InstSimplify] Add a note to PHI CSE tests that they are all negative tests

As discussed in https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200824/824235.html
even though it seems worthwhile doing so in InstSimplify,
we really can't do that there, because the other PHI wouldn't be
def-reachable from the original PHI.

[NFC][InstCombine] Add tests for PHI CSE

As discussed in https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200824/824235.html
even though it seems worthwhile doing so in InstSimplify,
we really can't do that there, because the other PHI wouldn't be
def-reachable from the original PHI.

So ignoring whether or not EarlyCSE should also know to do this,
InstCombine is the place.

[PPC] Fix platform definitions when compiling FreeBSD powerpc64 as LE

As a prerequisite to doing experimental buids of pieces of FreeBSD PowerPC64 as little-endian, allow actually targeting it.

This is needed so basic platform definitions are pulled in. Without it, the compiler will only run freestanding.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D73425

[InstCombine] Fix typo in comment (NFC)

As pointed out in post-commit review of D63060.

[Target][AArch64] Allow for char as int8_t in AArch64AsmParser.cpp

A couple of AArch64 tests were failing on Solaris, both sparc and x86:

  LLVM :: MC/AArch64/SVE/add-diagnostics.s
  LLVM :: MC/AArch64/SVE/cpy-diagnostics.s
  LLVM :: MC/AArch64/SVE/cpy.s
  LLVM :: MC/AArch64/SVE/dup-diagnostics.s
  LLVM :: MC/AArch64/SVE/dup.s
  LLVM :: MC/AArch64/SVE/mov-diagnostics.s
  LLVM :: MC/AArch64/SVE/mov.s
  LLVM :: MC/AArch64/SVE/sqadd-diagnostics.s
  LLVM :: MC/AArch64/SVE/sqsub-diagnostics.s
  LLVM :: MC/AArch64/SVE/sub-diagnostics.s
  LLVM :: MC/AArch64/SVE/subr-diagnostics.s
  LLVM :: MC/AArch64/SVE/uqadd-diagnostics.s
  LLVM :: MC/AArch64/SVE/uqsub-diagnostics.s

For example, reduced from `MC/AArch64/SVE/add-diagnostics.s`:

  add     z0.b, z0.b, #0, lsl #8

missed the expected diagnostics

  $ ./bin/llvm-mc -triple=aarch64 -show-encoding -mattr=+sve add.s
  add.s:1:21: error: immediate must be an integer in range [0, 255] with a shift amount of 0
  add     z0.b, z0.b, #0, lsl #8
                      ^

The message is `Match_InvalidSVEAddSubImm8`, emitted in the generated
`lib/Target/AArch64/AArch64GenAsmMatcher.inc` for `MCK_SVEAddSubImm8`.
When comparing the call to `::AArch64Operand::isSVEAddSubImm<char>` on both
Linux/x86_64 and Solaris, I find

  875     bool IsByte = std::is_same<int8_t, std::make_signed_t<T>>::value;

is `false` on Solaris, unlike Linux.

The problem boils down to the fact that `int8_t` is plain `char` on
Solaris: both the sparc and i386 psABIs have `char` as signed.  However,
with

  9887     DiagnosticPredicate DP(Operand.isSVEAddSubImm<int8_t>());

in `lib/Target/AArch64/AArch64GenAsmMatcher.inc`, `std::make_signed_t<int8_t>`
above yieds `signed char`, so `std::is_same<int8_t, signed char>` is `false`.

This can easily be fixed by also allowing for `int8_t` here and in a few
similar places.

Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and
`x86_64-pc-linux-gnu`.

Differential Revision: https://reviews.llvm.org/D85225

[Attributes] Merge calls to getFnAttribute/hasFnAttribute using Attribute::isValid. NFC

Rather than calling hasFnAttribute and then calling getFnAttribute
if the attribute exists, its better to just call getFnAttribute and
then check if we got a valid attribute back.

[NFC][InstructionSimplify] Add a warning about not simplifying to not def-reachable

See
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200824/824235.html
and
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200824/824967.html

InstSimply is not allowed to perform simplifications to instructions
that are not def-reachable from the original instruction.

[NFC][STLExtras] Add make_first_range(), similar to existing make_second_range()

Having just one of the two seens weird.
I wanted to use it a few times, but it wasn't there.

[DWARFYAML] Make the debug_abbrev_offset field optional.

This patch helps make the debug_abbrev_offset field optional. We don't
need to calculate the value of this field in the future.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D86614

[LLD][PowerPC][test] Disable ELF/ppc64-pcrel-long-branch-error.s

Following 0becc27ebfec, `ppc64-pcrel-long-branch-error.s` fails in some
environments with out-of-memory errors associated with buffering the
output in-memory. Since the alternative of writing to an allocated file
is also known to cause problems, we will disable the test
unconditionally (pending a mechanism to disable the test selectively).

[DAGCombiner] Enhance (zext(setcc))

Current `v:t = zext(setcc x,y,cc)` will be transformed to `select x, y, 1:t, 0:t, cc`. It misses some opportunities if x's type size is less than `t`'s size. This patch enhances the above transformation.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86687

[compiler-rt][tsan] Remove unnecesary typedefs

These typedefs are not used anywhere else in this compilation unit.

Differential Revision: https://reviews.llvm.org/D86826

[lldb] Hoist --framework argument out of LLDB_TEST_COMMON_ARGS (NFC)

Give the framework argument its own variable (LLDB_FRAMEWORK_DIR) so
that we can configure it in lit.site.cfg.py if we so desire.

[lldb] Make the lit configuration values optional for the API tests

LIT uses a model where the test suite is configurable trough a
lit.site.cfg file. Most of the time we use the lit.site.cfg with values
that match the current build configuration, generated by CMake.

Nothing prevents you from running the test suite with a different
configuration, either by overriding some of these values from the
command line, or by passing a different lit.site.cfg.

The latter is currently tedious. Many configuration values are optional
but they still need to be set because lit.cfg.py is accessing them
directly. This patch changes the code to use getattr to return the
attribute if it exists. This makes it possible to specify a minimal
lit.site.cfg with only the mandatory/desired configuration values.

Differential revision: https://reviews.llvm.org/D86821

[ObjC][ARC] In HandlePotentialAlterRefCount, check whether an
instruction can decrement the reference count, not whether it can alter
it

This prevents the state transition from S_Use to S_CanRelease when doing
a bottom-up traversal and the transition from S_Retain to S_CanRelease
when doing a top-down traversal when the visited instruction can
increment the ref count but cannot decrement it. This allows the ARC
optimizer to remove retain/release pairs which were previously not
removed.

rdar://problem/21793154

Use report_fatal_error instead of llvm::errs() + abort() (NFC)

This is making the error reporting in line with other fatal errors.

[gcov][test] Don't write a.gcno in CWD

[clang] Enable -fsanitize=thread on Fuchsia.

This CL modifies clang enabling using -fsanitize=thread on fuchsia. The
change doesn't build the runtime for fuchsia, it just enables the
instrumentation to be used.

pair-programmed-with: mdempsky@google.com
Change-Id: I816c4d240d1f15e9eae2803fb8ba3a7bf667ed51

Reviewed By: mcgrathr, phosek

Differential Revision: https://reviews.llvm.org/D86822

[ARC] Update brcc test.

Revert "[InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block"

This reverts commit 6102310d814ad73eab60a88b21dd70874f7a056f. It
appears to cause compilation non-determinism and caused stage3
mismatches.

[gcov] Increment counters with atomicrmw if -fsanitize=thread

Without this patch, `clang --coverage -fsanitize=thread` may fail spuriously
because non-atomic counter increments can be detected as data races.

[lldb] Get rid of LLDB_LIB_DIR and LLDB_IMPLIB_DIR in dotest

This patch removes the rather confusing LLDB_LIB_DIR and LLDB_IMPLIB_DIR
environment variables. They are confusing because LLDB_LIB_DIR would
point to the bin subdirectory in the build root while LLDB_IMPLIB_DIR
would point to the lib subdirectory. The reason far this was
LLDB.framework, which gets build under bin.

This patch replaces their uses with configuration.lldb_framework_path
and configuration.lldb_libs_dir respectively.

Differential revision: https://reviews.llvm.org/D86817

[lldb] Dervice dotest.py path from config.lldb_src_root (NFC)

Add Location, Region and Block to MLIR Python bindings.

* This is just enough to create regions/blocks and iterate over them.
* Does not yet implement the preferred iteration strategy (python pseudo containers).
* Refinements need to come after doing basic mappings of operations and values so that the whole hierarchy can be used.

Differential Revision: https://reviews.llvm.org/D86683

GlobalISel: Combine out redundant sext_inreg

The scalar tests don't work yet, since computeNumSignBits apparently
doesn't handle sextload yet, and sext folds into the load first.

[early-ifcvt] Add OptRemarks

AMDGPU: Fix incorrectly deleting copies after spilling SGPR tuples

The implicit def of the super register would appear to kill any live
uses of components before the spill, and would be deleted by
MachineCopyPropagation. We need to add implicit uses of the super
register, similarly to what copyPhysReg does. VGPR tuples appear to be
correctly handled already. I need to double check the SGPR->memory
path.

[UpdateTestChecks] include { in function signature check line

After D85099, if we have attribute group in the function signature that hasn't
been seen before, and later a callsite with the same attribute group, filecheck will evaluate
the first attribute group to for example '#0 {'. We now include { in the args_and_sig group to avoid this.

Differential Revision: https://reviews.llvm.org/D86769

Reland "[test] Exit with an error if no tests are run."

This reverts commit a06c28df3e8c85ceb665d3d9a1ebc2853dfd87a9 (reland adb5c23f8c0d60eeec41dcbe21d1b26184e1c97d).

The issue with PExpect tests on Windows should be fixed with e5e05ecf65aba836802154279efbc8cbce6fe2ea.

[test] Pin some RUNs in potential.ll to legacy PM

There are corresponding NPM RUNs.

Rename AnalysisManager::slice in AnalysisManager::nest (NFC)

The naming wasn't reflecting the intent of this API, "nest" is aligning
it with the pass manager API.

Add new warning for compound punctuation tokens that are split across macro expansions or split by whitespace.

For example:

#define FOO(x) (x)
FOO({});

... forms a statement-expression after macro expansion. This warning
applies to '({' and '})' delimiting statement-expressions, '[[' and ']]'
delimiting attributes, and '::*' introducing a pointer-to-member.

The warning for forming these compound tokens across macro expansions
(or across files!) is enabled by default; the warning for whitespace
within the tokens is not, but is included in -Wall.

Differential Revision: https://reviews.llvm.org/D86751

[Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None)

There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.

So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.

This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.

Differential Revision: https://reviews.llvm.org/D86744

[LLD][PowerPC] Remove redundant file write out in the test cases

Fix the time out test failure on lld-x86_64-freebsd build bot.

Peer reviewed by: stefanp, nemanjai

[ObjCARCOpt] Port objc-arc to NPM

Since doInitialization() in the legacy pass modifies the module, the NPM
pass is a Module pass.

Reviewed By: ahatanak, ychen

Differential Revision: https://reviews.llvm.org/D86178

[SROA] Improve handleling of assumes bundles by SROA

This patch fixes this crash https://gcc.godbolt.org/z/Ps8d1e
And gives SROA the ability to remove assumes if it allows promoting an alloca to register
Without removing assumes when it can't promote to register.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86570

[InstCombine] usub.sat(a, b) + b => umax(a, b) (PR42178)

Fixes https://bugs.llvm.org/show_bug.cgi?id=42178 by folding
usub.sat(a, b) + b to umax(a, b). The backend will expand umax
back to usubsat if that is profitable.

We may also want to handle uadd.sat(a, b) - b in the future.

Differential Revision: https://reviews.llvm.org/D63060

[LIBOMPTARGET]Do not try to optimize bases for the next parameters.

PrivateArgumentManager shall immediately allocate firstprivates if they
are bases for the next parameters and the next paramaters rely on the
fact that the base musst be allocated already.

Differential Revision: https://reviews.llvm.org/D86781

Skip analysis re-computation when no changes are reported

This is a follow-up to https://reviews.llvm.org/D80707, generalized to
CallGraphSCC, Loop and Region

Differential Revision: https://reviews.llvm.org/D86442

[ARM] Skip combining base updates for vld1x NEON intrinsics

Skip this for now, to avoid a backend crash in:

UNREACHABLE executed at llvm/lib/Target/ARM/ARMISelLowering.cpp:13412

This should fix PR45824.

Differential Revision: https://reviews.llvm.org/D86784

Strength-reduce SmallVectors to arrays. NFCI.

[CodeGenPrepare] Zap the argument of llvm.assume when deleting it

We know that the argument is mostly likely dead, so we can purge it
early. Otherwise it would make it to codegen, and can block further
optimizations.

[lldb/test] Use shorter test case names in TestStandardUnwind

TestStandardUnwind uses the full absolute path to a set of C/C++ files as the test case name, which in turn is used in the name of a log file. When the source file is long, and the directory where log files are stored is also long, this causes an OSError because the log filename is too long.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D86752

[lldb] Fix typo in disassemble_options_line description

[lldb/test] Use @skipIfWindows for PExpectTest

Annotating `PExpectTest` with `@skipIfWindows` instead of marking it as an empty class will make the test runner recognize it as a test class, which should allow me to reland adb5c23f8c0d60eeec41dcbe21d1b26184e1c97d.

I don't have a windows machine to verify this works, but I did some tests using `@skipIfLinux` and they all worked as expected. In case the `pexpect` import is not at all available on windows, I moved it to within the method where it's used.

Reviewed By: labath

Differential Revision: https://reviews.llvm.org/D86745

[gn build] Port 94faadaca4e

[llvm][CodeGen] Machine Function Splitter

We introduce a codegen optimization pass which splits functions into hot and cold
parts. This pass leverages the basic block sections feature recently
introduced in LLVM from the Propeller project. The pass targets
functions with profile coverage, identifies cold blocks and moves them
to a separate section. The linker groups all cold blocks across
functions together, decreasing fragmentation and improving icache and
itlb utilization.

We evaluated the Machine Function Splitter pass on clang bootstrap and
SPECInt 2017.

For clang bootstrap we observe a mean 2.33% runtime improvement with a
~32% reduction in itlb and stlb misses. Additionally, L1 icache misses
reduced by 9.5% while L2 instruction misses reduced by 20%.

For SPECInt we report the change in IntRate the C/C++
benchmarks. All benchmarks apart from mcf and x264 improve, on average
by 0.6% with the max for deepsjeng at 1.6%.

Benchmark % Change
500.perlbench_r 0.78
502.gcc_r 0.82
505.mcf_r -0.30
520.omnetpp_r 0.18
523.xalancbmk_r 0.37
525.x264_r -0.46
531.deepsjeng_r 1.61
541.leela_r 0.83
557.xz_r 0.15

Differential Revision: https://reviews.llvm.org/D85368

[ARM][MVE] Enable MVE gathers and scatters by default

Enable MVE gather/scatters by default, which requires some
minor adaptations in some tests.

Differential revision: https://reviews.llvm.org/D86776

[flang][NFC] Change how error symbols are recorded

When an error is associated with a symbol, it was marked with a flag
from Symbol::Flag. The problem with that is that you need a mutable
symbol to do that. Instead, store the set of error symbols in the
SemanticsContext. This allows for some const_casts to be eliminated.

Also, improve the internal error that occurs if SetError is called
but no fatal error has been reported.

Differential Revision: https://reviews.llvm.org/D86740

[libc++] Un-deprecate and un-remove some members of std::allocator

This implements the part of P0619R4 related to the default allocator.
This is incredibly important, since otherwise there is an ABI break
between C++17 and C++20 w.r.t. the default allocator's size_type on
platforms where std::size_t is not the same as std::make_unsigned<std::ptrdiff_t>.

[ARM] Correct predicate operand for offset gather/scatter

These arm_mve_vldr_gather_offset_predicated and
arm_mve_vstr_scatter_offset_predicated have some extra parameters
meaning the predicate is at a later operand. If a loop contains _only_
those masked instructions, we would miss transforming the active lane
mask.

Differential Revision: https://reviews.llvm.org/D86791

[ARM] Extra gather scatter tailpred test. NFC

[PowerPC] Implemented Vector Load with Zero and Signed Extend Builtins

This patch implements the builtins for Vector Load with Zero and Signed Extend Builtins (lxvr_x for b, h, w, d), and adds the appropriate test cases for these builtins. The builtins utilize the vector load instructions itnroduced with ISA 3.1.

Differential Revision: https://reviews.llvm.org/D82502#inline-797941

[Statepoint] Always spill base pointer.

There is a subtle problem with new statepoint lowering scheme
when base and pointers are the same (see PR46917 for more context):

%1 = STATEPOINT ... %0, %0(tied-def 0)...

if, for some reason, register allocator desides to put two instances
of %0 into two different objects (registers or spill slots), we may
end up with

$reg3 = STATEPOINT ... $reg2, $reg1(tied-def 0)...

and nothing will prevent later passes to sink uses of $reg2 below
statepoint, which is incorrect.

As a short term solution, always put base pointers on stack during
lowering.
A longer term solution may be to rework MIR statepoint format to
avoid GC pointer duplication in statepoint argument list.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D86712

[OpenMP] Fixed wrong test command in the test private_mapping.c

The test command in `private_mapping.c` was set to expect failure by mistake. It is fixed in this patch.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D86758

[GlobalISel] fix a compilation error with gcc 6.3.0

With gcc 6.3.0, I hit the following compilation error:
  ../lib/CodeGen/GlobalISel/Combiner.cpp: In member function
      ‘bool llvm::Combiner::combineMachineInstrs(llvm::MachineFunction&,
       llvm::GISelCSEInfo*)’:
  ../lib/CodeGen/GlobalISel/Combiner.cpp:156:54: error: suggest parentheses
       around ‘&&’ within ‘||’ [-Werror=parentheses]
     assert(!CSEInfo || !errorToBool(CSEInfo->verify()) &&
                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
                            "CSEInfo is not consistent. Likely missing calls to "
                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                            "observer on mutations");

Fix the code as suggested by the compiler.

[DAGCombine] Don't delete the node if it has uses immediately

This is the follow up patch for https://reviews.llvm.org/D86183 as we miss to delete the node if NegX == NegY, which has use after we create the node.
```
    if (NegX && (CostX <= CostY)) {
      Cost = std::min(CostX, CostZ);
      RemoveDeadNode(NegY);
      return DAG.getNode(Opcode, DL, VT, NegX, Y, NegZ, Flags);  #<-- NegY is used here if NegY == NegX.
    }
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86689

Reland "[CodeGen][AArch64] Support arm_sve_vector_bits attribute"

This relands D85743 with a fix for test
CodeGen/attr-arm-sve-vector-bits-call.c that disables the new pass
manager with '-fno-experimental-new-pass-manager'. Test was failing due
to IR differences with the new pass manager which broke the Fuchsia
builder [1]. Reverted in 2e7041f.

[1] http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10375

Original summary:

This patch implements codegen for the 'arm_sve_vector_bits' type
attribute, defined by the Arm C Language Extensions (ACLE) for SVE [1].
The purpose of this attribute is to define vector-length-specific (VLS)
versions of existing vector-length-agnostic (VLA) types.

VLSTs are represented as VectorType in the AST and fixed-length vectors
in the IR everywhere except in function args/return. Implemented in this
patch is codegen support for the following:

  * Implicit casting between VLA <-> VLS types.
  * Coercion of VLS types in function args/return.
  * Mangling of VLS types.

Casting is handled by the CK_BitCast operation, which has been extended
to support the two new vector kinds for fixed-length SVE predicate and
data vectors, where the cast is implemented through memory rather than a
bitcast which is unsupported. Implementing this as a normal bitcast
would require relaxing checks in LLVM to allow bitcasting between
scalable and fixed types. Another option was adding target-specific
intrinsics, although codegen support would need to be added for these
intrinsics. Given this, casting through memory seemed like the best
approach as it's supported today and existing optimisations may remove
unnecessary loads/stores, although there is room for improvement here.

Coercion of VLSTs in function args/return from fixed to scalable is
implemented through the AArch64 ABI in TargetInfo.

The VLA and VLS types are defined by the ACLE to map to the same
machine-level SVE vectors. VLS types are mangled in the same way as:

  __SVE_VLS<typename, unsigned>

where the first argument is the underlying variable-length type and the
second argument is the SVE vector length in bits. For example:

  #if __ARM_FEATURE_SVE_BITS==512
  // Mangled as 9__SVE_VLSIu11__SVInt32_tLj512EE
  typedef svint32_t vec __attribute__((arm_sve_vector_bits(512)));
  // Mangled as 9__SVE_VLSIu10__SVBool_tLj512EE
  typedef svbool_t pred __attribute__((arm_sve_vector_bits(512)));
  #endif

The latest ACLE specification (00bet5) does not contain details of this
mangling scheme, it will be specified in the next revision.  The
mangling scheme is otherwise defined in the appendices to the Procedure
Call Standard for the Arm Architecture, see [2] for more information.

[1] https://developer.arm.com/documentation/100987/latest
[2] https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85743

[LLD][PowerPC] Add a pc-rel based long branch thunk

In this patch, a pc-rel based long branch thunk is added for the local
call protocol that caller and callee does not use TOC.

Reviewed By: sfertile, nemanjai

Differential Revision: https://reviews.llvm.org/D86706

Fix Windows x86 compilation after a6a37a2fcd2a8048a75bd0d8280497ed89d73224

Fix more build failures caused by f4257c5832aa51e960e7351929ca3d37031985b7

MLIR build failed after ElementCount refactoring - updated code to
call isScalable() and getKnownMinValue().

Fix build failures caused by f4257c5832aa51e960e7351929ca3d37031985b7

[SVE] Make ElementCount members private

This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().

Differential Revision: https://reviews.llvm.org/D86065

[DWARFYAML] Abbrev codes in a new abbrev table should start from 1 (by default).

The abbrev codes in a new abbrev table should start from 1 (by default),
rather than inherit the value from the code in the previous table.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D86545

[Statepoint] Turn assert into check in foldPatchpoint.

Original D81646 had check for tied regs in foldPatchpoint().
Due to unfortunate miscommunication with review comments and
adressing some comments post commit, it turned into assertion.

We had an offline talk and agreed that with current implementation
this path is possible, so I'm changing it back to check.

Note that this is workaround until ussues described in PR46917 are
resolved.

[ARM][LowOverheadLoops] Liveouts and reductions

Remove the code that tried to look for reduction patterns, since the
vectorizer and isel can now produce predicated arithmetic instructios
within the loop body. This has required some reorganisation and fixes
around live-out and predication checks, as well as looking for cases
where an input/output is initialised to zero.

Differential Revision: https://reviews.llvm.org/D86613

[NFC][ARM] Add tail predication test

[SCCP] Use bulk-remove API to bulk-remove attributes. NFCI.

[SyntaxTree] Add coverage for declarators and init-declarators

[SyntaxTree][NFC] Refactor function templates into functions taking base class

The refactored functions were
* `isReponsibleForCreatingDeclaration`
* `getQualifiedNameStart`

Differential Revision: https://reviews.llvm.org/D86719

[FunctionAttrs] Bulk remove attributes. NFC.

[AArch64][CodeGen] Restrict bfloat vector operations to what's actually supported

Previously in addTypeForNeon, we would set the operations for bfloat vectors
like other generic types. But as bfloat is a storage-only type a number of
operations shouldn't be set. This patch fixes that.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D85101

[clang-format] Detect pointer qualifiers in cast expressions

When guessing whether a closing paren is then end of a cast expression also
skip over pointer qualifiers while looking for TT_PointerOrReference.
This prevents some address-of and dereference operators from being parsed
as a binary operator.

Before:
x = (foo *const) * v;
x = (foo *const volatile restrict __attribute__((foo)) _Nonnull _Null_unspecified _Nonnull) & v;

After:
x = (foo *const)*v;
x = (foo *const volatile restrict __attribute__((foo)) _Nonnull _Null_unspecified _Nonnull)&v;

Reviewed By: MyDeveloperDay

Differential Revision: https://reviews.llvm.org/D86716

[clang-format] Parse nullability attributes as a pointer qualifier

Before:
void f() { MACRO(A * _Nonnull a); }
void f() { MACRO(A * _Nullable a); }
void f() { MACRO(A * _Null_unspecified a); }

After:
void f() { MACRO(A *_Nonnull a); }
void f() { MACRO(A *_Nullable a); }
void f() { MACRO(A *_Null_unspecified a); }

Reviewed By: JakeMerdichAMD

Differential Revision: https://reviews.llvm.org/D86713

[clang-format] Parse __attribute((foo)) as a pointer qualifier

Before: void f() { MACRO(A * __attribute((foo)) a); }
After: void f() { MACRO(A *__attribute((foo)) a); }

Also check that the __attribute__ alias is handled.

Reviewed By: MyDeveloperDay

Differential Revision: https://reviews.llvm.org/D86711

[clang-format] Parse restrict as a pointer qualifier

Before: void f() { MACRO(A * restrict a); }
After: void f() { MACRO(A *restrict a); }

Also check that the __restrict and __restrict__ aliases are handled.

Reviewed By: JakeMerdichAMD

Differential Revision: https://reviews.llvm.org/D86710

[clang-format] Parse volatile as a pointer qualifier

Before: void f() { MACRO(A * volatile a); }
After: void f() { MACRO(A *volatile a); }

Also check that the __volatile and __volatile__ aliases are handled.

Reviewed By: JakeMerdichAMD

Differential Revision: https://reviews.llvm.org/D86708

[DSE,MemorySSA] Check if Current is valid for elimination first.

This changes getDomMemoryDef to check if a Current is a valid
candidate for elimination before checking for reads. Before the change,
we were spending a lot of compile-time in checking for read accesses for
Current that might not even be removable.

This patch flips the logic, so we skip Current if they cannot be
removed before checking all their uses. This is much more efficient in
practice.

It also adds a more aggressive limit for checking partially overlapping
stores. The main problem with overlapping stores is that we do not know
if they will lead to elimination until seeing all of them. This patch
limits adds a new limit for overlapping store candidates, which keeps
the number of modified overlapping stores roughly the same.

This is another substantial compile-time improvement (while also
increasing the number of stores eliminated). Geomean -O3 -0.67%,
ReleaseThinLTO -0.97%.

http://llvm-compile-time-tracker.com/compare.php?from=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&to=2e630629b43f64b60b282e90f0d96082fde2dacc&stat=instructions

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86487

[lldb/Utility] Polish the Scalar class

This patch is mostly about removing the "Category" enum, which was
very useful when the Type enum contained a large number of types, but
now the two are completely identical.

It also removes some other artifacts like unused typedefs and macros.

[lldb] Reduce intentation in SymbolFileDWARF::ParseVariableDIE

using early exits. NFC.

[doxygen] Fix bad doxygen results for BugReporterVisitors.h

`{@code xxxxx}` triggers a Doxygen bug. The bug may be matching the
close brace with the open brace of the namespace
declaration (`namespace clang {` or `namespace ento {`).

Differential Revision: https://reviews.llvm.org/D85105

[cmake] Don't build with -O3 -fPIC on Solaris/sparcv9

Tests on Solaris/sparcv9 currently show about 250 failures when building
with gcc, most of them like the following:

  FAIL: LLVM-Unit :: Support/./SupportTests/TaskQueueTest.UnOrderedFutures (4269 of 67884)
  ******************** TEST 'LLVM-Unit :: Support/./SupportTests/TaskQueueTest.UnOrderedFutures' FAILED ********************
  Note: Google Test filter = TaskQueueTest.UnOrderedFutures
  [==========] Running 1 test from 1 test case.
  [----------] Global test environment set-up.
  [----------] 1 test from TaskQueueTest
  [ RUN      ] TaskQueueTest.UnOrderedFutures
  0  SupportTests        0x0000000100753b20 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 32
  1  SupportTests        0x0000000100752974 llvm::sys::RunSignalHandlers() + 68
  2  SupportTests        0x0000000100752b18 SignalHandler(int) + 372
  3  libc.so.1           0xffffffff7eedc800 __sighndlr + 12
  4  libc.so.1           0xffffffff7eecf23c call_user_handler + 852
  5  libc.so.1           0xffffffff7eecf594 sigacthandler + 84
  6  SupportTests        0x00000001006f8cb8 std::thread::_State_impl<std::thread::_Invoker<std::tuple<llvm::ThreadPool::ThreadPool(llvm::ThreadPoolStrategy)::'lambda'()> > >::_M_run() + 512
  7  libstdc++.so.6.0.28 0xfffffffc628117cc execute_native_thread_routine + 16
  8  libc.so.1           0xffffffff7eedc6a0 _lwp_start + 0

Since it's effectively impossible to debug such a `SEGV` in a `Release`
build, I tried a `Debug` build instead, only to find that the failures had
gone away.

Further investigation revealed that most of the issue centers around
`llvm/lib/Support/ThreadPool.cpp`.  That file is built with `-O3 -fPIC` in
a `Release` build.  The failure vanishes if

- compiling without `-fPIC`
- compiling with `-O -fPIC`
- linking with GNU `ld` instead of Solaris `ld`

It has meanwhile been determined that `gcc` doesn't correctly heed some TLS
code sequences.  To make things worse, Solaris `ld` doesn't properly
validate its assumptions against the input, generating wrong code.

`gld` like `gcc` is more liberal here and correctly deals with the code it
gets fed from `gcc`.

There's PR target/96607: GCC feeds SPARC/Solaris linker with unrecognized
TLS sequences <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96607> now.

An attempt to build with `-DLLVM_ENABLE_PIC=Off` initially failed since
neither `libRemarks.so` (D85626 <https://reviews.llvm.org/D85626>) nor
`LLVMPolly.so` (D85627 <https://reviews.llvm.org/D85627>) heed that option.
Even with that fixed, a few codegen failures remain.

Next I tried to build just `ThreadPool.cpp` with `-O -fPIC`.  While that
fixed the vast majority of the failures, 16 `LLVM :: CodeGen/X86` failures
remained.

Given that that solution was both incomplete and fragile, I went for
building the whole tree with `-O -fPIC` for `Release` and `RelWithDebInfo`
builds.

As detailed in Bug 47304, 2-stage builds also show large numbers of
failures when building with `-O3` or `-O2`, which are likewise worked
around by building with `-O` until they are sufficiently analyzed and
fixed.

This way, all failures relative to a `Debug` build go away.

Tested on `sparcv9-sun-solaris2.11`.

Differential Revision: https://reviews.llvm.org/D85630

[MemLoc] Support memcmp in MemoryLocation::getForArgument.

This patch adds support for memcmp in MemoryLocation::getForArgument.
memcmp reads from the first 2 arguments up to the number of bytes of the
third argument.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86725

[BasicAA] Add first libfunc tests with memcmp.

[DSE,MemorySSA] Add memcmp test case.

[NFC][asan] Don't unwind stack before pool check

[mlir][Linalg] Enhance Linalg fusion on generic op and tensor_reshape op.

The tensor_reshape op was only fusible only if it is a collapsing case. Now we
propagate the op to all the operands so there is a further chance to fuse it
with generic op. The pre-conditions are:

1) The producer is not an indexed_generic op.
2) All the shapes of the operands are the same.
3) All the indexing maps are identity.
4) All the loops are parallel loops.
5) The producer has a single user.

It is possible to fuse the ops if the producer is an indexed_generic op. We
still can compute the original indices. E.g., if the reshape op collapses the d0
and d1, we can use DimOp to get the width of d1, and calculate the index
`d0 * width + d1`. Then replace all the uses with it. However, this pattern is
not implemented in the patch.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D86314

[BuildLibCalls] Add argmemonly to more lib calls.

strspn, strncmp, strcspn, strcasecmp, strncasecmp, memcmp, memchr,
memrchr, memcpy, memmove, memcpy, mempcpy, strchr, strrchr, bcmp
should all only access memory through their arguments.

I broke out strcoll, strcasecmp, strncasecmp because the result
depends on the locale, which might get accessed through memory.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86724

[llvm-readobj] - Simplify the code that creates dumpers. NFCI.

We have a few helper functions like the following:
```
std::error_code create*Dumper(...)
```

In fact we do not need or want to use `std::error_code` and the code
can be simpler if we just return `std::unique_ptr<ObjDumper>`.

This patch does this change and refines the signature of `createDumper`
as well.

Differential revision: https://reviews.llvm.org/D86718