review.tizen.org Git - platform/upstream/llvm.git/log

WIP: Verify -gsimple-template-names=mangled values

Clang will encode names that should be able to be simplified as
"_STNname|<template, args>" (eg: "_STNt1|<int>") - this verification
mode will detect these names, decode them, create the original name
("t1<int>") and the simple name ("t1") - letting the simple name run
through the usual rebuilding logic - then compare the two sources of the
full name - the rebuilt and the _STN encoding.

This helps ensure that -gsimple-template-names is lossless.

[dsymutil] Track incompleteness across unions

When determining the incompleteness of a DIE based on its children, make
sure we propagate it across union types. See test case for an example.
Without this patch we never emit the definition of Container_ivars.

Differential revision: https://reviews.llvm.org/D110443

[llvm-profgen] Unify output format of different unsymbolized profiles

Differential Revision: https://reviews.llvm.org/D110080

[AutoFDO][llvm-profgen] Report zero count for unexecuted part of function code

In order to be consistent with compiler that interprets zero count as unexecuted(cold), this change reports zero-value count for unexecuted part of function code. For the implementation, it leverages the range counter, initializes all the executed function range with the zero-value. After all ranges are merged and converted into disjoint ranges, the remaining zero count will indicates the unexecuted(cold) part of the function.

This change also extends the current `findDisjointRanges` method which now can support adding zero-value range.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D109713

[mlir][tosa] Do not fold transpose with quantized types

For such cases, the type of the constant DenseElementsAttr is
different from the transpose op return type.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D110446

[AutoFDO][llvm-profgen] Profile generation for LBR(non-CS) sample

This patch introduces non-CS AutoFDO profile generation into LLVM. The profile is supposed to be well consumed by compiler using `-fprofile-sample-use=[profile]`.

After range and branch counters are extracted from the LBR sample, here we go through each addresses for symbolization, create FunctionSamples and populate its sub fields like TotalSamples, BodySamples and HeadSamples etc. For inlined code, as we need to map back to original code, so we always add body samples to the leaf frame's function sample.

Reviewed By: wenlei, hoy

Differential Revision: https://reviews.llvm.org/D109551

[mlir] Create a generic reduction detection utility

This patch introduces a generic reduction detection utility that works
across different dialecs. It is mostly a generalization of the reduction
detection algorithm in Affine. The reduction detection logic in Affine,
Linalg and SCFToOpenMP have been replaced with this new generic utility.

The utility takes some basic components of the potential reduction and
returns: 1) the reduced value, and 2) a list with the combiner operations.
The logic to match reductions involving multiple combiner operations disabled
until we can properly test it.

Reviewed By: ftynse, bondhugula, nicolasvasilache, pifon2a

Differential Revision: https://reviews.llvm.org/D110303

[llvm-profgen] Ignore invalid perf line in LBR record

Similar to https://reviews.llvm.org/D109637, there is a whole invalid line of message in perfscript.

```
warning: Invalid address in LBR record at line 14118674: Processed 14138923 events and lost 1 chunks!
warning: Invalid address in LBR record at line 14118676: Check IO/CPU overload!
```

This only happened for LBR only perfscript, hybridperfscript have a check of " 0x" to make sure it's the LBR perf line.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D110424

[AMDGPU] Limit promote alloca max size in functions

Non-entry functions have 32 caller saved VGPRs available. If we
promote alloca to consume more registers we will have to spill
CSRs. There is no reason to eliminate scratch access to get
another scratch access instead.

Differential Revision: https://reviews.llvm.org/D110372

[gn build] Port c0d889995e70

[gn build] Port a9ae2436fc0d

[ORC] Add 'contains' and 'overlaps' operations to ExecutorAddrRange.

Also includes unit tests for not-yet tested operations like comparison and
to/from pointer conversion.

[SystemZ][z/OS] Introduce the GOFFMCAsmInfo Interface for z/OS

- This patch adds in the GOFFMCAsmInfo interfaces for the z/OS target.
- This patch decouples the previously existing SystemZMCAsmInfo interface for the ELF target and the z/OS target.
- This patch also removes a small test in the SystemZAsmLexerTest.cpp. The reason for this is because, the test is set up for the s390x-ibm-linux (SystemZ ELF triple), and the test checks a function which is overridden only for the z/OS target. The reason we can't change the test to use a z/OS triple outright is because there is still missing support which prevents the successful running of a test (assert in AsmParser.cpp due to missing GOFFAsmParser support)

Reviewed By: uweigand, abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D110077

[IR] Handle large element size when calculating GEP indices

This is a fix for the issue reported at
https://reviews.llvm.org/D110043#3019942:
The ElementSize is a uint64_t and as such may be larger than the
index space, or be negative in the index space. This is UB, but
shouldn't cause assertion failures.

We address this by detecting whether the size is too large and
use a zero index in that case (which is always conservatively
correct).

Differential Revision: https://reviews.llvm.org/D110437

[mlir:OpAsm] Factor out the common bits of (Op/Dialect)Asm(Parser/Printer)

This has a few benefits:
* It allows for defining parsers/printer code blocks that
can be shared between operations and attribute/types.
* It removes the weird duplication of generic parser/printer hooks,
which means that newly added hooks only require touching one class.

Differential Revision: https://reviews.llvm.org/D110375

[flang][fir] Add support to mangle/deconstruct namelist group name

Add support to create unique name for namelist group and be able to
deconstruct them.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D110331

Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>

[Polly] Fix wrong redirect in test case.

[InstCombine] fold lshr(trunc(lshr X, C1)) C2

Only the multi-use cases are changing here because there's
another fold that catches the simpler patterns.

But that other fold is the source of infinite loops when we
try to add D110170, so removing that is planned as a follow-up.

Attempt to show the general proof in Alive2:
https://alive2.llvm.org/ce/z/Ns1uS2

Note that the overshift fold-to-zero tests are not
currently handled by instsimplify. If they were, we
could assert that the shift amount sum is less than
the source bitwidth.

[InstCombine] match variable names and code comments; NFC

Fix bot failure by adding needed dependence

Fix bot failure from 96cb97c4533a0a02c2d62ffb1121cd275aa43dd5, e.g.:
https://lab.llvm.org/buildbot/#/builders/61/builds/15203

llvm-lto now needs to link in IPO.

[mlir:MemRef] Move DmaStartOp/DmaWaitOp to ODS

These are among the last operations still defined explicitly in C++. I've
tried to keep this commit as NFC as possible, but these ops
definitely need a non-NFC cleanup at some point.

Differential Revision: https://reviews.llvm.org/D110440

[ThinLTO] Update combined index for SamplePGO indirect calls to locals

In ThinLTO for locals we normally compute the GUID from the name after
prepending the source path to get a unique global id. SamplePGO indirect
call profiles contain the target GUID without this uniquification,
however (unless compiling with -funique-internal-linkage-names).

In order to correctly handle the call edges added to the combined index
for these indirect calls, during importing and bitcode writing we
consult a map of original to full GUID to identify the actual callee.
However, for a large application this was consuming a lot of compile
time as we need to do this repeatedly (especially during importing where
we may traverse call edges multiple times).

To fix this implement a suggestion in one of the FIXME comments, and
actually modify the call edges during a single traversal after the index
is built to perform the fixups once. I combined this fixup with the dead
code analysis performed on the index in order to avoid adding an
additional walk of the index. The dead code analysis is the first
analysis performed on the index.

This reduced the time required for a large thin link with SamplePGO by
about 20%.

No new test added, but I confirmed that there are existing tests that
will fail when no fixup is performed.

Differential Revision: https://reviews.llvm.org/D110374

[mlir][tosa] Add some transpose folders

* If the input is a constant splat value, we just
  need to reshape it.
* If the input is a general constant with one user,
  we can also constant fold it, without bloating
  the IR.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D110439

[libc] Add an implementation of qsort.

A fuzzer for qsort has also been added.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D110382

[NFC] Replace hard-coded usages of SystemZ::R15D with SpecialRegisters API

This patch changes hard-coded usages of SystemZ::R15D with calls to the getStackPointerRegister function. Uses in the LowerCall function are avoided to avoid merge conflicts with an expected upcoming patch.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D109702

[DSE] Add additional capture tests (NFC)

These test other escape sources and the case of multiple
underlying objects.

[TargetLibraryInfo] Correctly handle sqrt*_finite

Other <math>_finite calls are marked as unavailable except on GNU/Linux;
it looks like the sqrt set was just overlooked.

Differential Revision: https://reviews.llvm.org/D110418

[compiler-rt] Add shared_cxxabi requirement to some tests

This adds REQUIRES: shared_cxxabi to a bunch of tests that would fail if this
weak reference in sanitizer common was undefined. This is necessary in cases
where libc++abi.a is statically linked in. Because there is no strong reference
to __cxa_demangle in compiler-rt, then if libc++abi is linked in via a static
archive, then the linker will not extract the archive member that would define
that weak symbol. This causes a handful of tests to fail because this leads to
the symbolizer printing mangled symbols where tests expect them demangled.

Technically, this feature is WAI since sanitizer runtimes shouldn't fail if
this symbol isn't resolved, and linking statically means you wouldn't need to
link in all of libc++abi. As a workaround, we can simply make it a requirement
that these tests use shared libc++abis.

Differential Revision: https://reviews.llvm.org/D109639

[ARM] Addition jump table plus while loop block placement pass test.

Also regenerated the file, whilst here.

[libc++][NFC] Update status of old issue LWG2560 -- we implement it properly

DebugInfo: Move the '=' version of -gsimple-template-names to the frontend

Based on feedback from Paul Robinson on 38c09ea that the 'mangled' mode
is only useful as an LLVM-developer-internal tool in combination with
llvm-dwarfdump --verify, so demote that to a frontend-only (not driver)
option. The driver support is simply -g{no-,}simple-template-names to
switch on simple template names, without the option to use the mangled
template name scheme there.

[LiveIntervals] Fix asan debug build failures

Call RemoveMachineInstrFromMaps before erasing instrs.
repairIntervalsInRange will do this for you after erasing the
instruction, but it's not safe to rely on it because assertions in
SlotIndexes::removeMachineInstrFromMaps refer to fields in the erased
instruction.

This fixes asan buildbot failures caused by D110328.

[libc++][NFC] Mark LWG3158 as implemented

It has been implemented in 59e26308e60a.

[SystemZ][z/OS] Add GOFF Support to the DataLayout

- This patch adds in the GOFF mangling support to the LLVM data layout string. A corresponding additional line has been added into the data layout section in the language reference documentation.
- Furthermore, this patch also sets the right data layout string for the z/OS target in the SystemZ backend.

Reviewed By: uweigand, Kai, abhina.sreeskantharajan, MaskRay

Differential Revision: https://reviews.llvm.org/D109362

[mlir:OpConversion] Remove the remaing usages of the deprecated matchAndRewrite methods

This commits updates the remaining usages of the ArrayRef<Value> based
matchAndRewrite/rewrite methods in favor of the new OpAdaptor
overload.

Differential Revision: https://reviews.llvm.org/D110360

[mlir:OpConversionPattern] Add overloads for taking an Adaptor instead of ArrayRef

This has been a TODO for a long time, and it brings about many advantages (namely nice accessors, and less fragile code). The existing overloads that accept ArrayRef are now treated as deprecated and will be removed in a followup (after a small grace period). Most of the upstream MLIR usages have been fixed by this commit, the rest will be handled in a followup.

Differential Revision: https://reviews.llvm.org/D110293

[NFC][libc++] Update clang-format style.

Changes the style as requested by @ldionne in D103368.

Reviewed By: ldionne, #libc, Quuxplusone

Differential Revision: https://reviews.llvm.org/D109835

Revert "Allow rematerialization of virtual reg uses"

Reverted due to two distcint performance regression reports.

This reverts commit 92c1fd19abb15bc68b1127a26137a69e033cdb39.

Fix test from 8dd42f, capitalization in test

Add test for DR1307, which we have already implemented.

Also regenerated cxx_dr_status.html

[libc++] Refactor the tests for common_view to reduce duplication

[ConstantFold] ConstantFoldGetElementPtr - use APInt::isNegative() instead of getSExtValue() to support big ints

Fixes fuzz test: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39197

[SCCP] Regenerate bigint test checks

[PowerPC][NFC] Add test case in preparation for codegen change

This test case tests doubles inserted into vector ints,
and help make apparent the optimizations a future patch
will make.

Fix wrong FixIt about union in cppcoreguidelines-pro-type-member-init

At most one variant member of a union may have a default member
initializer. The case of anonymous records with multiple levels of
nesting like the following also needs to meet this rule. The original
logic is to horizontally obtain all the member variables in a record
that need to be initialized and then filter to the variables that need
to be fixed. Obviously, it is impossible to correctly initialize the
desired variables according to the nesting relationship.

See Example 3 in class.union

union U {
  U() {}
  int x;  // int x{};
  union {
    int k;  // int k{};  <==  wrong fix
  };
  union {
    int z;  // int z{};  <== wrong fix
    int y;
  };
};

Write test for CWG1772/CWG1762/CWG1779, mark them 'done', and update
cxx_dr_status.html

I noticed that these two DRs are currently working correctly, so I
added a pair of lit tests that check the AST (which is most useful for
CWG1779, since 'dependent' is really only observable in an ast dump) to
make sure __func__ works correctly in dependent cases, and in lambda
operator().

Also noticed that CWG1762, mostly an 'example' change, works correctly,
so added a test so that it gets marked 'done' as well.

Additionally, I regenerated cxx_dr_status.html, updating it for Clang
13's release, based on the cwg_status.html from August 12, 2021.

Differential Revision: https://reviews.llvm.org/D109956

[PowerPC] Mark splat immediate instructions as rematerializable

This patch marks splat immediate instructions XXSPLTIW and XXSPLTIDP as
rematerializable to prevent MachineLICM from moving them out of loops.

Reviewed By: lei, amy

Differential revision: https://reviews.llvm.org/D108823

Re-apply "[JumpThreading] Ignore free instructions"

It seems the crashes we saw wasn't caused by this (see comments on the review).

> This is basically D108837 but for jump threading. Free instructions
> should be ignored for the threading decision. JumpThreading already
> skips some free instructions (like pointer bitcasts), but does not
> skip various free intrinsics -- in fact, it currently gives them a
> fairly large cost of 2.
>
> Differential Revision: https://reviews.llvm.org/D110290

This reverts commit 4604695d7c20e72b551a1a5224f3de877cb41bd3.

Revert "[flang][fir] Add support to mangle/deconstruct namelist group name"

This reverts commit 3593ae4312f6156c9ca50d46cdb55a8dfad782d0.

[AMDGPU] Always reserve flat scratch SGPR for architected flat scratch

With architected flat scratch it becomes readonly. We must always
reserve SGPR pair for it even if we do not use scratch at all since
an attempt to write to SGPRs mapped to FLAT_SCRATCH results in
memory violation.

This is not needed since GFX10 with architected flat scratch though
since special SGPRs are not carving space from normal SGPRs.

Differential Revision: https://reviews.llvm.org/D110376

[mlir] Linalg: ensure tile-and-pad always creates padding as requested

Initially, the padding transformation and the related operation were only used
to guarantee static shapes of subtensors in tiled operations. The
transformation would not insert the padding operation if the shapes were
already static, and the overall code generation would actively remove such
"noop" pads. However, this transformation can be also used to pack data into
smaller tensors and marshall them into faster memory, regardless of the size
mismatches. In context of expert-driven transformation, we should assume that,
if padding is requested, a potentially padded tensor must be always created.
Update the transformation accordingly. To do this, introduce an optional
`packing` attribute to the `pad_tensor` op that serves as an indication that
the padding is an intentional choice (as opposed to side effect of type
normalization) and should be left alone by cleanups.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D110425

[PowerPC] Add range check for vec_genpcvm builtins

This patch adds range checking for some Power10 altivec builtins. Range
checking is done in SemaChecking.

Reviewed By: #powerpc, lei, Conanap

Differential Revision: https://reviews.llvm.org/D109780

Recommit "[DSE] Track earliest escape, use for loads in isReadClobber."

This reverts the revert commit df56fc6ebbee6c458b0473185277b7860f7e3408.

This version of the patch adjusts the location where the EarliestEscapes
cache is cleared when an instruction gets removed. The earliest escaping
instruction does not have to be a memory instruction.

It could be a ptrtoint instruction like in the added test
@earliest_escape_ptrtoint, which subsequently gets removed. We need to
invalidate the EarliestEscape entry referring to the ptrtoint when
deleting it.

This fixes the crash mentioned in
https://bugs.chromium.org/p/chromium/issues/detail?id=1252762#c6

[MC][NFC] Add end-of-namespace comments

tsan: don't use pipe2 in tests

MacOS buildbots failed:
stress.cpp:57:7: error: use of undeclared identifier 'pipe2'
https://green.lab.llvm.org/green//job/clang-stage1-RA/24209/consoleFull#-3468768778254eaf0-7326-4999-85b0-388101f2d404

Fix the test to not use pipe2.

Differential Revision: https://reviews.llvm.org/D110423

[X86][SSE] combineMulToPMADDWD - replace sext(v8i16) -> zext(v8i16)

As suggested on D108522, if we're sign extending a v4i16 source before multiplying as a v4i32, then we can replace that with a zero extension and rely on the implicit sign-extension of PMADDWD.

[libc++] Require a C++20 capable compiler.

This enforces libcxx and its benchmarks are compiled by a C++20 capable
compiler. Based on review comments in D103413.

Differential Revision: https://reviews.llvm.org/D110338

[x86] convert logic-of-FP-compares to FP logic-of-vector-compares

This is motivated by the examples and discussion in:
https://llvm.org/PR51245
...and related bugs.

By using vector compares and vector logic, we can convert 2 'set'
instructions into 1 'movd' or 'movmsk' and generally improve
throughput/reduce instructions.

Unfortunately, we don't have a complete vector compare ISA before
AVX, so I left SSE-only out of this patch. Ie, we'd need extra logic
ops to simulate the missing predicates for SSE 'cmpp*', so it's not
as clearly a win.

Differential Revision: https://reviews.llvm.org/D110342

[InstCombine] add tests for lshr-trunc-lshr; NFC

[libc++][NFC] Add missing link to a ranges review

[Transforms/Utils] Remove redundant declaration computeSyntheticCounts (NFC)

[llvm-objcopy][NFC] Add a helper method RelocationSectionBase::getNamePrefix()

Refactor handleArgs() to use that method.

Differential Revision: https://reviews.llvm.org/D110350

[TargetLibraryInfo][AMDGPU] Minor cleanup, NFC

Revert "[InstCombine] fold cast of right-shift if high bits are not demanded (2nd try)"

This reverts commit bb9333c3504a4a02b982526ad8264d14c6ec1ad4.

This exposes another existing bug that causes an infinite loop as shown in
D110170
...so reverting while I look at another fix.

tsan: add a stress test

The stress test does various assorted things
(memory accesses, function calls, atomic operations,
thread creation/join, intercepted libc calls)
in multiple threads just to stress various parts
of the runtime.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D110416

Revert "[JumpThreading] Ignore free instructions"

It caused compiler crashes, see comment on the code review for repro.

> This is basically D108837 but for jump threading. Free instructions
> should be ignored for the threading decision. JumpThreading already
> skips some free instructions (like pointer bitcasts), but does not
> skip various free intrinsics -- in fact, it currently gives them a
> fairly large cost of 2.
>
> Differential Revision: https://reviews.llvm.org/D110290

This reverts commit 1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff.

tsan: add a test for flushing memory

Add a test for __tsan_flush_memory() and for background
flushing of the runtime memory.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D110409

Revert "[DSE] Track earliest escape, use for loads in isReadClobber."

This reverts commit 5ce89279c0986d0bcbe526dce52f91dd0c16427c.
Makes clang crash, see comments on https://reviews.llvm.org/D109844

[compiler-rt] Use portable "#!/usr/bin/env bash" shebang for tests.

In build_symbolizer.sh we can safely remove the -eu argument from the shebang (which is an unportable construct), as the scripts sets **-e** and **-u** already.

Differential Revision: https://reviews.llvm.org/D110039

[SystemZ] NFC: Remove unused intrinsic template arg 'name'

Identified in D109359.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D109598

[lldb] [Host] Refactor Socket::DecodeHostAndPort() to use LLVM API

Refactor Socket::DecodeHostAndPort() to use LLVM API over redundant
LLDB API. In particular, this means llvm::Regex, llvm::Error return
type and llvm::to_integer().

While at it, change the port type from int32_t to uint16_t. The method
never returns any value outside this range, and using the correct type
allows us to rely on getAsInteger()'s implicit overflow check.

Differential Revision: https://reviews.llvm.org/D110391

[Analysis] Fix another issue when querying vscale attributes on functions

There are several places in the code that are currently broken where
we assume an Instruction is always a member of a BasicBlock that
lives in a Function. This is a problem specifically when
attempting to get the vscale_range attribute. This patch adds checks
that an Instruction's parent also has a parent!

I've added a test for a function-less @llvm.vscale intrinsic call here:

unittests/Analysis/ValueTrackingTest.cpp

[flang][fir] Add support to mangle/deconstruct namelist group name

Add support to create unique name for namelist group and be able to
deconstruct them.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D110331

Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>

Revert "[lldb] [Host] Refactor Socket::DecodeHostAndPort() to use LLVM API"

This reverts commit a6daf99228bc16fb7f2596d67a0d00fef327ace5. It causes
buildbot regressions, I'll investigate.

[lldb] [Host] Refactor Socket::DecodeHostAndPort() to use LLVM API

Refactor Socket::DecodeHostAndPort() to use LLVM API over redundant
LLDB API. In particular, this means llvm::Regex, llvm::Error return
type and llvm::to_integer().

While at it, change the port type from int32_t to uint16_t. The method
never returns any value outside this range, and using the correct type
allows us to rely on getAsInteger()'s implicit overflow check.

Differential Revision: https://reviews.llvm.org/D110391

[LiveIntervals] Repair live intervals that gain subranges

In repairIntervalsInRange, if the new instructions refer to subregs but
the old instructions did not, make sure any existing live interval for
the superreg is updated to have subranges. Also skip repairing any range
that we have recalculated from scratch, partly for efficiency but also
to avoids some cases that repairOldRegInRange can't handle.

The existing test/CodeGen/AMDGPU/twoaddr-regsequence.mir provides some
test coverage for this change: when TwoAddressInstructionPass converts
REG_SEQUENCE into subreg copies, the live intervals will now get
subranges and MachineVerifier will verify that the subranges are
correct. Unfortunately MachineVerifier does not complain if the
subranges are not present, so the test also passed before this patch.

This patch also fixes ~800 of the ~1500 failures in the whole CodeGen
lit test suite when -early-live-intervals is forced on.

Differential Revision: https://reviews.llvm.org/D110328

[LiveIntervals] Fix repairOldRegInRange for simple def cases

The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange"
was over-zealous. It would bail out when the end of the range to be
repaired was in the middle of the first segment of the live range of
Reg, which was always the case when the range contained a single def of
Reg.

This patch fixes it as suggested by Matthias Braun in post-commit review
on the original patch, and tests it by adding -early-live-intervals to
a selection of existing lit tests that now pass.

(Note that D23303 was originally applied to fix a crash in
SILoadStoreOptimizer, but that is now moot since D23814 updated
SILoadStoreOptimizer to run before scheduling so it no longer has to
update live intervals.)

Differential Revision: https://reviews.llvm.org/D110238

Unrevert with some changes to the tests:
- Add -verify-machineinstrs to check for remaining problems in live
interval support in TwoAddressInstructionPass.
- Drop test/CodeGen/AMDGPU/extract-load-i1.ll since it suffers from
some of those remaining problems.

[NFC] Mark LI.getLoopsInPreorder and LI.getLoopsInReverseSiblingPreorder const.

They create a new vector with the list, so they can be const.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D110394

[MLIR] PresburgerSet: support divisions in operations

Add support for intersecting, subtracting, complementing and checking equality of sets having divisions.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D110138

[mlir] add pad_tensor(tensor.cast) -> pad_tensor canonicalizer

This canonicalization pattern complements the tensor.cast(pad_tensor) one in
propagating constant type information when possible. It contributes to the
feasibility of pad hoisting.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D110343

[CodeMoverUtils] Enhance isSafeToMoveBefore() when moving BBs

When moving an entire basic block BB before InsertPoint, currently
we check for all instructions whether the operands dominates
InsertPoint, however, this can be improved such that even an
operand does not dominate InsertPoint, as long as it appears as
a previous instruction in the same BB, it is safe to move.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D110378

Reapply "[Dexter] Improve performance by evaluating expressions only when needed"

Fixes issue found on greendragon buildbot, in which an incorrectly
indented statement following an if block led to entire frames being
dropped instead of simply filtering unneeded watches.

This reverts commit 1f44fa3ac17ceacc753019092bc50436c77ddcfa.

[analyzer] Retrieve a value from list initialization of constant array declaration in a global scope.

Summary: Fix the point that we didn't take into account array's dimension. Retrieve a value of global constant array by iterating through its initializer list.

Differential Revision: https://reviews.llvm.org/D104285

Fixes: https://bugs.llvm.org/show_bug.cgi?id=50604

Revert "[libcxx][pretty printers] Import gdb module in gdb feature check"

This reverts commit 0c2a4548455c6c943ac5e2b5c51ed5c2c91e34e7.

This was my mistake. When gdb can find its data directory it'll
import it automatically. If it can't (like when you're using a
version from a build folder) you need to give it the data
directory path.

We're safe to assume gdb is installed for testing purposes
so it'll import it for us.

[RISCV] (2/2) Add the tail policy argument to builtins/intrinsics.

Add the tail policy argument to Clang builtins. There
are two policies for tail elements. Tail agnostic means users do not
care about the values in the tail elements and tail undisturbed means
the values in the tail elements need to be kept after the operation. In
order to let users control the tail policy, we add an additional
argument at the end of the argument list.

For unmasked operations, we have no maskedoff and the tail policy is
always tail agnostic. If users want to keep tail elements under unmasked
operations, they could use all one mask in the masked operations to do
it. So, we only add the additional argument for masked operations for
most cases. There are exceptions listed below.

In this patch, we do not handle the following cases to reduce the
complexity of the patch. There could be two separate patches for them.

Use dest argument to control tail policy
vmerge.vvm/vmerge.vxm/vmerge.vim (add _t builtins with additional dest
argument)
vfmerge.vfm (add _t builtins with additional dest argument)
vmv.v.v (add _t builtins with additional dest argument)
vmv.v.x (add _t builtins with additional dest argument)
vmv.v.i (add _t builtins with additional dest argument)
vfmv.v.f (add _t builtins with additional dest argument)
vadc.vvm/vadc.vxm/vadc.vim (add _t builtins with additional dest
argument)
vsbc.vvm/vsbc.vxm (add _t builtins with additional dest argument)

Always has tail argument for masked/unmasked intrinsics
Vector Single-Width Integer Multiply-Add Instructions (add _t and _mt
builtins)
Vector Widening Integer Multiply-Add Instructions (add _t and _mt
builtins)
Vector Single-Width Floating-Point Fused Multiply-Add Instructions (add
_t and _mt builtins)
Vector Widening Floating-Point Fused Multiply-Add Instructions (add _t
and _mt builtins)
Vector Reduction Operations (add _t and _mt builtins)
Vector Slideup Instructions (add _t and _mt builtins)
Vector Slidedown Instructions (add _t and _mt builtins)

Discussion: https://github.com/riscv/rvv-intrinsic-doc/pull/101

Differential Revision: https://reviews.llvm.org/D109322

[RISCV] (1/2) Add the tail policy argument to builtins/intrinsics.

Add the tail policy argument to LLVM IR intrinsics. There are two policies for tail elements. Tail agnostic means users do not care about the values in the tail elements and tail undisturbed means the values in the tail elements need to be kept after the operation. In order to let users control the tail policy, we add an additional argument at the end of the argument list.

For unmasked operations, we have no maskedoff and the tail policy is always tail agnostic. If users want to keep tail elements under unmasked operations, they could use all one mask in the masked operations to do it. So, we only add the additional argument for masked operations for most cases. There are exceptions listed below.

In this patch, we do not handle the following cases to reduce the complexity of the patch. There could be two separate patches for them.

* Use dest argument to control tail policy
vmerge.vvm/vmerge.vxm/vmerge.vim (add _t builtins with additional dest argument)
vfmerge.vfm (add _t builtins with additional dest argument)
vmv.v.v (add _t builtins with additional dest argument)
vmv.v.x (add _t builtins with additional dest argument)
vmv.v.i (add _t builtins with additional dest argument)
vfmv.v.f (add _t builtins with additional dest argument)
vadc.vvm/vadc.vxm/vadc.vim (add _t builtins with additional dest argument)
vsbc.vvm/vsbc.vxm (add _t builtins with additional dest argument)

* Always has tail argument for masked/unmasked intrinsics
Vector Single-Width Integer Multiply-Add Instructions (add _t and _mt builtins)
Vector Widening Integer Multiply-Add Instructions (add _t and _mt builtins)
Vector Single-Width Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins)
Vector Widening Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins)
Vector Reduction Operations (add _t and _mt builtins)
Vector Slideup Instructions (add _t and _mt builtins)
Vector Slidedown Instructions (add _t and _mt builtins)

Discussion: https://github.com/riscv/rvv-intrinsic-doc/pull/101

Differential Revision: https://reviews.llvm.org/D105092

[X86][SLM] Fix ADDQ/SUBQ/CMPEQQ throughput to account for running on either port.

Testing on a SLM box suggests these can run on either port, but the throughput is 4cy on either (inc MMX versions). Confirmed with Intel AoM / Agner / InstLatX64.

[clang-doc] Pass Record argument by const-ref. NFCI.

Record is a SmallVector<uint64_t, 1024> - we really need to avoid passing this by value.

Avoid unnecessary big copies, reported by coverity.

[libcxx][pretty printers] Import gdb module in gdb feature check

Earlier versions of GDB do not do this automatically.
(from my checks 8.3 does not and 9.2 does)

[Analysis] Fix issues when querying vscale attributes on functions

There are several places in the code that are currently broken as
they assume an Instruction always has a parent Function when
attempting to get the vscale_range attribute. This patch adds checks
that an Instruction has a parent.

I've added a test for a parentless @llvm.vscale intrinsic call here:

unittests/Analysis/ValueTrackingTest.cpp

Differential Revision: https://reviews.llvm.org/D110158

[llvm-objcopy][docs] Add missing options to the help output and the command guide

This change is to keep the help text and command guide of objcopy in
tandem.

- In the help output the options --rename-section and
  --set-section-flags were missing the flag exclude, which is found in
  the command guide.
- In the command guide the alias -G for --keep-global-symbol was
    missing, which is found in the help output.

Differential Revision: https://reviews.llvm.org/D110340

[clang-format] Fixed an unused variable warning

[libcxx][pretty printers] Check GDB Python scripting support

I found this after upgrading from Ubuntu bionic (gdb 8.1.1) to
Focal (gdb 9.2). (where this test fails, but that's for a
different patch)

9.2 allows you to set breakpoint commands from
Python, which was added in 8.3.
(bintutils a913fffbdee21fdd50e8de0596358be425775678
"Allow breakpoint commands to be set from Python")

The reason this test never failed before was because it did so
silently. "source <python file>" doesn't fail even if that script
raises an Exception.

To fix this extend the gdb lit feature to check that:
* gdb exists
* has Python support
* allows you to set breakpoint commands

Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D110334

[SystemZ] Implement ISD::BITCAST for fp128 -> i128.

The type legalizer has by default no method of doing this bitcast other than
storing and reloading the value from stack.

This patch implements a custom lowering of this operation using extractions
of subregs (z13 and earlier using FP128 register pairs), or of vector
elements (with 'vector enhancements 1' using VR128 FP registers).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D110346

[AArch64] Rewrite ldst-unsignedimm.ll codegen test.

Instead of relying on many volatile loads/stores in a single function,
rewrite the test to use separate functions as any other test would.

[mlir][linalg] Fix result type in FoldSourceTensorCast

* Do not discard static result type information that cannot be inferred from lower/upper padding.
* Add optional argument to `PadTensorOp::inferResultType` for specifying known result dimensions.

Differential Revision: https://reviews.llvm.org/D110380

[Driver] Correctly handle static C++ standard library

When statically linking C++ standard library, we shouldn't add -Bdynamic
after including the library on the link line because that might override
user settings like -static and -static-pie. Rather, we should surround
the library with --push-state/--pop-state to make sure that -Bstatic
only applies to C++ standard library and nothing else. This has been
supported since GNU ld 2.25 (2014) so backwards compatibility should
no longer be a concern.

Differential Revision: https://reviews.llvm.org/D110128

[GlobalISel][IRTranslator] Fix crash during bit-test switch optimization with odd types.

Odd switch case types cause a crash in the conversion to MVT. Instead use a pointer sized
scalar type which is what SDAG does in these cases.

[clang-format][docs] Fix documentation of clang-format BasedOnStyle type

Fix little inconsistency and use `std::string` (which is used everywhere
else) instead of `string`

Reviewed By: MyDeveloperDay, HazardyKnusperkeks

Differential Revision: https://reviews.llvm.org/D108765

[clang-format] ensure clang-format command-line argument sets up the default left/right qualifier ordering

When specifying the alignment direction on the command line ensure
we set up the default ordering.

Fix spelling mistakes in the command-line argument

Reviewed By: HazardyKnusperkeks

Differential Revision: https://reviews.llvm.org/D110359