review.tizen.org Git - platform/upstream/llvm.git/log

[gn build] Port d38be2ba0e4e

[libc++abi] Simplify scan_eh_tab

1.
All `_URC_HANDLER_FOUND` return values need to set `landingPad`
and its value does not matter for `_URC_CONTINUE_UNWIND`. So we
can always set `landingPad` to unify code.

2.
For an exception specification (`ttypeIndex < 0`), we can check `_UA_FORCE_UNWIND` first.

3.
The so-called type 3 search (`actions & _UA_CLEANUP_PHASE && !(actions & _UA_HANDLER_FRAME)`)
is actually conceptually wrong. For a catch handler or an unmatched dynamic
exception specification, `_UA_HANDLER_FOUND` should be returned immediately. It
still appeared to work because the `ttypeIndex==0` case would return
`_UA_HANDLER_FOUND` at a later time.

This patch fixes the conceptual error and simplifies the code by handling type 3
the same way as type 2 (which is also what libsupc++ does).
The only difference between phase 1 and phase 2 is what to do with a cleanup
(`actionEntry==0`, or a `ttypeIndex==0` is found in the action record chain):
phase 1 returns `_URC_CONTINUE_UNWIND` while phase 2 returns `_URC_HANDLER_FOUND`.

Reviewed By: #libc_abi, compnerd

Differential Revision: https://reviews.llvm.org/D93190

[llvm-mca] Initial implementation of serialization using JSON. The views
implemented at this time are Summary, Timeline, ResourcePressure and InstructionInfo.
Use --json on the command line to obtain JSON output.

Add Python bindings for the builtin dialect

This includes some minor customization for FuncOp and ModuleOp.

Differential Revision: https://reviews.llvm.org/D95022

Fix crash when emitting NullReturn guards for functions returning BOOL

CodeGenModule::EmitNullConstant() creates constants with their "in memory"
type, not their "in vregs" type. The one place where this difference matters is
when the type is _Bool, as that is an i1 when in vregs and an i8 in memory.

Fixes: rdar://73361264

[WebAssembly] Test that invalid symbol/relocation types generate errors

See https://bugs.llvm.org/show_bug.cgi?id=48827

Differential Revision: https://reviews.llvm.org/D95163

Revert [mlir] Link mlir_runner_utils statically into cuda/rocm-runtime-wrappers (cf50f4f76456)

There are cmake failures that I do not know how to fix.

Differential Revision: https://reviews.llvm.org/D95162

[libc++abi] Add an option to avoid demangling in terminate.

We've been using this patch in Android so we can avoid including the
demangler in libc++.so. It comes with a rather large cost in RSS and
isn't commonly needed.

Reviewed By: #libc_abi, compnerd

Differential Revision: https://reviews.llvm.org/D88189

[lldb-vscode] improve modules request

lldb-vsdode was communicating the list of modules to the IDE with events, which in practice ended up having some drawbacks
- when debugging large targets, the number of these events were easily 10k, which polluted the messages being transmitted, which caused the following: a harder time debugging the messages, a lag after terminated the process because of these messages being processes (this could easily take several seconds). The latter was specially bad, as users were complaining about it even when they didn't check the modules view.
- these events were rarely used, as users only check the modules view when something is wrong and they try to debug things.

After getting some feedback from users, we realized that it's better to not used events but make this simply a request and is triggered by users whenever they needed.

This diff achieves that and does some small clean up in the existing code.

Differential Revision: https://reviews.llvm.org/D94033

[LV][ARM] Inloop reduction cost modelling

This adds cost modelling for the inloop vectorization added in
745bf6cf4471. Up until now they have been modelled as the original
underlying instruction, usually an add. This happens to works OK for MVE
with instructions that are reducing into the same type as they are
working on. But MVE's instructions can perform the equivalent of an
extended MLA as a single instruction:

  %sa = sext <16 x i8> A to <16 x i32>
  %sb = sext <16 x i8> B to <16 x i32>
  %m = mul <16 x i32> %sa, %sb
  %r = vecreduce.add(%m)
  ->
  R = VMLADAV A, B

There are other instructions for performing add reductions of
v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64
(VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV).
The i64 are particularly interesting as there are no native i64 add/mul
instructions, leading to the i64 add and mul naturally getting very
high costs.

Also worth mentioning, under NEON there is the concept of a sdot/udot
instruction which performs a partial reduction from a v16i8 to a v4i32.
They extend and mul/sum the first four elements from the inputs into the
first element of the output, repeating for each of the four output
lanes. They could possibly be represented in the same way as above in
llvm, so long as a vecreduce.add could perform a partial reduction. The
vectorizer would then produce a combination of in and outer loop
reductions to efficiently use the sdot and udot instructions. Although
this patch does not do that yet, it does suggest that separating the
input reduction type from the produced result type is a useful concept
to model. It also shows that a MLA reduction as a single instruction is
fairly common.

This patch attempt to improve the costmodelling of in-loop reductions
by:
- Adding some pattern matching in the loop vectorizer cost model to
   match extended reduction patterns that are optionally extended and/or
   MLA patterns. This marks the cost of the reduction instruction correctly
   and the sext/zext/mul leading up to it as free, which is otherwise
   difficult to tell and may get a very high cost. (In the long run this
   can hopefully be replaced by vplan producing a single node and costing
   it correctly, but that is not yet something that vplan can do).
- getExtendedAddReductionCost is added to query the cost of these
   extended reduction patterns.
- Expanded the ARM costs to account for these expanded sizes, which is a
   fairly simple change in itself.
- Some minor alterations to allow inloop reduction larger than the highest
   vector width and i64 MVE reductions.
- An extra InLoopReductionImmediateChains map was added to the vectorizer
   for it to efficiently detect which instructions are reductions in the
   cost model.
- The tests have some updates to show what I believe is optimal
   vectorization and where we are now.

Put together this can greatly improve performance for reduction loop
under MVE.

Differential Revision: https://reviews.llvm.org/D93476

[SLP] rename reduction variable to avoid shadowing; NFC

The code structure can likely be improved now that
'OperationData' is gone.

Scalar: Don't visit constants in findInnerReductionPhi in LoopInterchange

In LoopInterchange, `findInnerReductionPhi()` looks for reduction
variables, which cannot be constants. Update it to return early in that
case.

This also addresses a blocker for removing use-lists from ConstantData,
whose users could be spread across arbitrary modules in the same
LLVMContext.

Differential Revision: https://reviews.llvm.org/D94712

Remove deprecated methods from OpState.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D95123

ADT: Fix reference invalidation in SmallVector::emplace_back and assign(N,V)

This fixes the final (I think?) reference invalidation in `SmallVector`
that we need to fix to align with `std::vector`. (There is still some
left in the range insert / append / assign, but the standard calls that
UB for `std::vector` so I think we don't care?)

For POD-like types, reimplement `emplace_back()` in terms of
`push_back()`, taking a copy even for large `T` rather than lose the
realloc optimization in `grow_pod()`.

For other types, split the grow operation in three and construct the new
element in the middle.

- `mallocForGrow()` calculates the new capacity and returns the result
  of `safe_malloc()`. We only need a single definition per
  `SmallVectorBase` so this is defined in SmallVector.cpp to avoid code
  size bloat. Moving this part of non-POD grow to the source file also
  allows the logic to be easily shared with `grow_pod`, and
  `report_size_overflow()` and `report_at_maximum_capacity()` can move
  there too.
- `moveElementsForGrow()` moves elements from the old to the new
  allocation.
- `takeAllocationForGrow()` frees the old allocation and saves the
  new allocation and capacity .

`SmallVector:assign(size_type, const T&)` also uses the split-grow
operations for non-POD, but it also has a semantic change when not
growing. Previously, assign would start with `clear()`, and so the old
elements were destructed and all elements of the new vector were
copy-constructed (potentially invalidating references). The new
implementation skips destruction and uses copy-assignment for the prefix
of the new vector that fits. The new semantics match what libc++ does
for `std::vector::assign()`.

Note that the following is another possible implementation:
```
  void assign(size_type NumElts, ValueParamT Elt) {
    std::fill_n(this->begin(), std::min(NumElts, this->size()), Elt);
    this->resize(NumElts, Elt);
  }
```
The downside of this simpler implementation is that if the vector has to
grow there will be `size()` redundant copy operations.

(I had planned on splitting this patch up into three for committing
(after getting performance numbers / initial review), but I've realized
that if this does for some reason need to be reverted we'll probably
want to revert the whole package...)

Differential Revision: https://reviews.llvm.org/D94739

Recommit "[RISCV] Legalize select when Zbt extension available"

This recommits 71ed4b6ce57d8843ef705af8f98305976a8f107a with
the polarity of some of the pattern corrected.

Original commit message:
The custom expansion of select operations in the RISC-V backend
interferes with the matching of cmov instructions. Legalizing
select when the Zbt extension is available solves that problem.

Reviewed By: luismarques, craig.topper

Differential Revision: https://reviews.llvm.org/D93767

[SLP] simplify reduction matching

This is NFC-intended and removes the "OperationData"
class which had become nothing more than a recurrence
(reduction) type.

I adjusted the matching logic to distinguish
instructions from non-instructions - that's all that
the "IsLeafValue" member was keeping track of.

[ELF] report section sizes when output file too large

Fixes PR48523. When the linker errors with "output file too large",
one question that comes to mind is how the section sizes differ from
what they were previously. Unfortunately, this information is lost
when the linker exits without writing the output file. This change
makes it so that the error message includes the sizes of the largest
sections.

Reviewed By: MaskRay, grimar, jhenderson

Differential Revision: https://reviews.llvm.org/D94560

[FunctionAttrs] Infer willreturn for functions without loops

If a function doesn't contain loops and does not call non-willreturn
functions, then it is willreturn. Loops are detected by checking
for backedges in the function. We don't attempt to handle finite
loops at this point.

Differential Revision: https://reviews.llvm.org/D94633

X86: Fix use-after-realloc in X86AsmParser::ParseIntelExpression

`X86AsmParser::ParseIntelExpression` has a while loop. In the body,
calls to MCAsmLexer::UnLex can force a reallocation in the MCAsmLexer's
`CurToken` SmallVector, invalidating saved references to
`MCAsmLexer::getTok()`.

`const MCAsmToken &Tok` is such a saved reference, and this moves it
from outside the while loop to inside the body, fixing a
use-after-realloc.

`Tok` will still be reused across calls to `Lex()`, each of which
effectively destroys and constructs the pointed-to token. I'm a bit
skeptical of this usage pattern, but it seems broadly used in the
X86AsmParser (and others) so I'm leaving it alone (for now).

Somehow this bug was exposed by https://reviews.llvm.org/D94739,
resulting in test failures in dot-operator related tests in
llvm/test/tools/llvm-ml. I suspect the exposure path is related to
optimizer changes from splitting up the grow operation, but I haven't
dug all the way in. Regardless, there are already tests in tree that
cover this; they might fail consistently if we added ASan
instrumentation to SmallVector.

Differential Revision: https://reviews.llvm.org/D95112

[OpenMP] Fix failing test due to change in offloading flags

Summary:
Prior to D91261 the information checked the OMP_MAP_TARGET_PARAM flag, change this as it has been removed. The INFO macro was changed to accept a flag as input to make conditionally printing information easier.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95133

[CUDA] Normalize handling of defauled dtor.

Defaulted destructor was treated inconsistently, compared to other
compiler-generated functions.

When Sema::IdentifyCUDATarget() got called on just-created dtor which didn't
have implicit __host__ __device__ attributes applied yet, it would treat it as a
host function. That happened to (sometimes) hide the error when dtor referred
to a host-only functions.

Even when we had identified defaulted dtor as a HD function, we still treated it
inconsistently during selection of usual deallocators, where we did not allow
referring to wrong-side functions, while it is allowed for other HD functions.

This change brings handling of defaulted dtors in line with other HD functions.

Differential Revision: https://reviews.llvm.org/D94732

[flang] Better C_LOC and C_ASSOCIATED in flang/module

The place-holding implementation of C_LOC just didn't work
when used with our more complete semantic checking, specifically
in the case of a polymorphic argument; convert it to an external
function with an implicit interface. C_ASSOCIATED needs to be
a generic interface with specific implementations for C_PTR and
C_FUNPTR.

Differential Revision: https://reviews.llvm.org/D94714

[NFC][Doc] Mention SystemZ supports StackMap generation

Support available as of commit 5eb64110d241cf2506f54ade3c2693beed42dd8f.

Differential Revision: https://reviews.llvm.org/D95040

[RISCV] Update V instructions constraints to conform to v1.0

Upgrade RISC-V V extension to v1.0-08a0b46.
Update instruction constraints to conform to v1.0.

Differential Revision: https://reviews.llvm.org/D93612

[OpenMP] Add time profiling support in libomp

Profiling has been recently implemented in libomptarget (D93055). This patch enables time profiling support for libomptarget in libomp, to support profiling of multi-threaded execution of offloaded regions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94855

Revert "[AMDGPU] Implement mir parseCustomPseudoSourceValue"

This reverts commit ba7dcd8542abfc784255efcb0767701dec42fe83.

(caused memory leaks)

[libc++] Use ioctl when available to get random_device entropy.

Implemented the idea from D94571 to improve entropy on Linux.

Reviewed By: ldionne, #libc

Differential Revision: https://reviews.llvm.org/D94953

[RISCV] Add new V instructions in v1.0-08a0b46.

Add new V instructions.
vfrsqrte7.v
vfrece7.v
vrgatherei16.vv
vneg.v
vncvt.x.x.w
vfneg.v

[flang][driver] Move fixed/free from detection out of FrontendAction API

All Fortran options should be set in `CompilerInstance` (via its
`CompilerInvocation`) before any of `FrontendAction` is entered -
that's one of the tasks of the driver. However, this is a bit tricky
with fixed and free from detection introduced in
https://reviews.llvm.org/D94228.

Fixed-free form detection needs to happen:
  * before any frontend action (we need to specify `isFixedForm` in
    `Fortran::parser::Options` before running any actions)
  * separately for every input file (we might be compiling multiple
    Fortran files, some in free form, some in fixed form)
In other words, we need this to happen early (before any
`FrontendAction`), but not too early (we need to know what the current
input file is). In practice, `isFixedForm` can only be set later
than other options (other options are inferred from compiler flags). So
we can't really set all of them in one place, which is not ideal.

All changes in this patch are NFCs (hence no new tests). Quick summary:
  * move fixed/free form detection from `FrontendAction::ExecuteAction` to
    `CompilerInstance::ExecuteAction`
  * add a bool flag in `FrontendInputFile` to mark a file as fixed/free
    form
  * updated a few comments

Differential Revision: https://reviews.llvm.org/D95042

[RISCV] Make LMUL field in VTYPE continuous.

Upgrade RISC-V V extension to v1.0-08a0b46.
Update the VTYPE encoding. Make LMUL encoding in a continuous field.

[mlir]][SPIRV] Define OrderedOp and UnorderedOp and add lowerings from Standard.

Define OrderedOp and UnorderedOp instructions in SPIR-V and convert
cmpf operations with `ord` and `uno` tag to these instructions
respectively.

Differential Revision: https://reviews.llvm.org/D95098

[mlir][SPIRV] Rename OpSpecConstantOperation -> OpSpecConstantOp

The SPIR-V spec uses OpSpecConstantOp. Using an inconsistent name
makes the dialect generation scripts fail. Update to use the right
operation name, and fix the auto generation scripts as well.

Differential Revision: https://reviews.llvm.org/D95097

[AMDGPU][GlobalISel] Run SIAddImgInit

This pass is required to get correct codegen for image instructions with
the tfe or lwe bits set.

Differential Revision: https://reviews.llvm.org/D95132

AMDGPU: Remove v_rsq_f64 patterns

This isn't accurate enough without correction

AMDGPU: Use more accurate fast f64 fdiv

A raw v_rcp_f64 isn't accurate enough, so start applying correction.

[OpenMP][NVPTX] Added forward declaration for atomic operations

Pretty similar to D95058, this patch added forward declaration for
CUDA atomic functions. We already have definitions with right mangled names in
internal CUDA headers so the forward declaration here can work properly.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D95085

AArch64/GlobalISel: Factor out parametersInCSRMatch

Make this look more like the DAG handling and move to common code.

I also noticed AArch64 seems to not be properly adding the
physreg:virtreg mapping to the function live ins.

[AMDGPU] Implement mir parseCustomPseudoSourceValue

Allow parsing generated mir with custom pseudo source value tokens.
Also rename pseudo source values to have more meaningful names.

Differential Revision: https://reviews.llvm.org/D94768

[ARM] Fix vector saddsat costs.

It turns out the vectorizer calls the getIntrinsicInstrCost functions
with a scalar return type and vector VF. This updates the costmodel to
handle that, still producing the correct vector costs.

A vectorizer test is added to show it vectorizing at the correct factor
again.

[flang][driver] Make the driver report diagnostics from the prescanner

This patch makes sure that diagnostics from the prescanner are reported
when running `flang-new -E` (i.e. only the preprocessor phase is
requested). More specifically, the `PrintPreprocessedAction` action is
updated.

With this patch we make sure that the `f18` and `flang-new` provide
identical output when running the preprocessor and the prescanner
generates diagnostics.

Differential Revision: https://reviews.llvm.org/D94782

[OpenMP] Add support for mapping names in mapper API

Summary:
The custom mapper API did not previously support the mapping names added previously. This means they were not present if a user requested debugging information while using the mapper functions. This adds basic support for passing the mapped names to the runtime library.

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D94806

AMDGPU: Add occupancy to serialized MachineFunctionInfo

Not sure about the default value handling, but also not sure
defaulting to a theoretically subtarget dependent value.

[lldb][NFC] Fix build with GCC<6

GCC/libstdc++ before 6.1 can't handle scoped enums as unordered_map keys. LLVM
(and some build) bots officially support some GCC 5.x versions, so this patch
just makes the enum unscoped until we can require GCC 6.x.

[clang][AST] Add get functions for CXXFoldExpr paren locations.

Reviewed By: hokein

Differential Revision: https://reviews.llvm.org/D94787

[InstCombine] avoid crashing on attribute propagation

In https://llvm.org/PR48810 , we are crashing while trying to
propagate attributes from mempcpy (returns void*) to memcpy
(returns nothing - void).

We can avoid the crash by removing known incompatible
attributes for the void return type.

I'm not sure if this goes far enough (should we just drop all
attributes since this isn't the same function?). We also need
to audit other transforms in LibCallSimplifier to make sure
there are no other cases that have the same problem.

Differential Revision: https://reviews.llvm.org/D95088

[MC] Use std::make_tuple to make some toolchains happy again

My toolchain (LLVM 8.0, libstdc++ 5.4.0) complained with:

12:27:43 ../lib/MC/MCDwarf.cpp:814:10: error: chosen constructor is explicit in copy-initialization
12:27:43   return {Offset, Size, SetDelta};
12:27:43          ^~~~~~~~~~~~~~~~~~~~~~~~
12:27:43 /proj/flexasic/app/llvm/8.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here
12:27:43         constexpr tuple(_UElements&&... __elements)
12:27:43                   ^
12:27:43 1 error generated.

This commit adds explicit calls to std::make_tuple to work around
the problem.

Add log1p lowering from standard to ROCDL intrinsics

Differential Revision: https://reviews.llvm.org/D95129

[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (REAPPLIED).

Add DemandedElts support inside the TRUNCATE analysis.

REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724 and an additional test case added at rG0ca81b90d19d

Differential Revision: https://reviews.llvm.org/D56387

Add log1p lowering from standard to NVVM intrinsics

Differential Revision: https://reviews.llvm.org/D95130

[X86][SSE] Add uitofp(trunc(and(lshr(x,c)))) vector test

Reduced from regression reported by @hans on D56387

[DAG] SimplifyDemandedBits - correctly adjust truncated shift amount type

As noticed on D56387, for vectors we must always correctly adjust the shift amount type during truncation (not just after legalization). We were getting away with it as we currently only accepted scalars via the dyn_cast<ConstantSDNode>.

Reland [lldb] Fix TestThreadStepOut.py after "Flush local value map on every instruction"

The original patch got reverted as a dependency of cf1c774d6ace59c5adc9ab71b31e .
That patch got relanded so it's also necessary to reland this patch.

Original summary:

After cf1c774d6ace59c5adc9ab71b31e762c1be695b1, Clang seems to generate code
that is more similar to icc/Clang, so we can use the same line numbers for
all compilers in this test.

[lldb] Make TestBSDArchives a no-debug-info-test

The DSYM variant of this test is failing since D94890. But as we explicitly
try to disable the DSYM generation in the makefile and build the archive on
our own, I don't see why we even need to run the DSYM version of the test.

This patch disables the generated derived versions of this test for the
different debug information containers (which includes the failing DSYM one).

[lldb][import-std-module] Do some basic file checks before trying to import a module

Currently when LLDB has enough data in the debug information to import the `std` module,
it will just try to import it. However when debugging libraries where the sources aren't
available anymore, importing the module will generate a confusing diagnostic that
the module couldn't be built.

For the fallback mode (where we retry failed expressions with the loaded module), this
will cause the second expression to fail with a module built error instead of the
actual parsing issue in the user expression.

This patch adds checks that ensures that we at least have any source files in the found
include paths before we try to import the module. This prevents the module from being
loaded in the situation described above which means we don't emit the bogus 'can't
import module' diagnostic and also don't waste any time retrying the expression in the
fallback mode.

For the unit tests I did some refactoring as they now require a VFS with the files in it
and not just the paths. The Python test just builds a binary with a fake C++ module,
then deletes the module before debugging.

Fixes rdar://73264458

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D95096

MC: AArch64: Add support for gotpage_lo15

It is not used bt LLVM itself, but it would be used on lld tests
to implement R_AARCH64_LD64_GOTPAGE_LO15 support.

[DAG] CombineToPreIndexedLoadStore - use const APInt& for getAPIntValue(). NFCI.

Cleanup some code to use auto* properly from cast, and use const APInt& for getAPIntValue() to avoid an unnecessary copy.

[X86] Avoid a std::string copy by replacing auto with const auto&. NFC.

Fixes msvc analyzer warning.

Revert "[X86][AMX] Fix tile config register spill issue."

This reverts commit 20013d02f3352a88d0838eed349abc9a2b0e9cc0.

[clangd] Fix a missing override keyword, NFC.

[LoopUnswitch] Implement first version of partial unswitching.

This patch applies the idea from D93734 to LoopUnswitch.

It adds support for unswitching on conditions that are only
invariant along certain paths through a loop.

In particular, it targets conditions in the loop header that
depend on values loaded from memory. If either path from
the true or false successor through the loop does not modify
memory, perform partial loop unswitching.

That is, duplicate the instructions feeding the condition in the pre-header.
Then unswitch on the duplicated condition. The condition is now known
in the unswitched version for the 'invariant' path through the original loop.

On caveat of this approach is that one of the loops created can be partially
unswitched again. To avoid this behavior, `llvm.loop.unswitch.partial.disable`
metadata is added to the unswitched loops, to avoid subsequent partial
unswitching.

If that's the approach to go, I can move the code handling the metadata kind
into separate functions.

This increases the cases we unswitch quite a bit in SPEC2006/SPEC2000 &
MultiSource. It also allows us to eliminate a dead loop in SPEC2017's omnetpp

```
Tests: 236
Same hash: 170 (filtered out)
Remaining: 66
Metric: loop-unswitch.NumBranches

Program                                        base   patch  diff
test-suite...000/255.vortex/255.vortex.test     2.00  23.00 1050.0%
test-suite...T2006/401.bzip2/401.bzip2.test     7.00  55.00 685.7%
test-suite :: External/Nurbs/nurbs.test         5.00  26.00 420.0%
test-suite...s-C/unix-smail/unix-smail.test     1.00   3.00 200.0%
test-suite.../Prolangs-C++/ocean/ocean.test     1.00   3.00 200.0%
test-suite...tions/lambda-0.1.3/lambda.test     1.00   3.00 200.0%
test-suite...yApps-C++/PENNANT/PENNANT.test     2.00   5.00 150.0%
test-suite...marks/Ptrdist/yacr2/yacr2.test     1.00   2.00 100.0%
test-suite...lications/viterbi/viterbi.test     1.00   2.00 100.0%
test-suite...plications/d/make_dparser.test    12.00  24.00 100.0%
test-suite...CFP2006/433.milc/433.milc.test    14.00  27.00 92.9%
test-suite.../Applications/lemon/lemon.test     7.00  12.00 71.4%
test-suite...ce/Applications/Burg/burg.test     6.00  10.00 66.7%
test-suite...T2006/473.astar/473.astar.test    16.00  26.00 62.5%
test-suite...marks/7zip/7zip-benchmark.test    78.00 121.00 55.1%
```

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D93764

[mlir] Remove complex ops from Standard dialect.

`complex` dialect should be used instead.
https://llvm.discourse.group/t/rfc-split-the-complex-dialect-from-std/2496/2

Differential Revision: https://reviews.llvm.org/D95077

MCDwarf: Delete uneeded parameter

And change signature

[llvm-nm][ELF] - Make -D display symbol versions.

This fixes https://bugs.llvm.org/show_bug.cgi?id=48670.

Since binutils 2.35, nm -D displays symbol versions by default.
This patch teaches llvm-nm to do the same.

Differential revision: https://reviews.llvm.org/D94907

[X86][AMX] Fix tile config register spill issue.

Previous code build the model that tile config register is the user of
each AMX instruction. There is a problem for the tile config register
spill. When across function, the ldtilecfg instruction may be inserted
on each AMX instruction which use tile config register. This cause all
tile data register clobber.
To fix this issue, we remove the model of tile config register. We
analyze the regmask of call instruction and insert ldtilecfg if there is
any tile data register live across the call. Inserting the sttilecfg
before the call is unneccessary, because the tile config doesn't change
and we can just reload the config.
Besides we also need check tile config register interference. Since we
don't model the config register we should check interference from the
ldtilecfg to each tile data register def.
             ldtilecfg
             /       \
            BB1      BB2
            /         \
           call       BB3
           /           \
       %1=tileload   %2=tilezero
We can start from the instruction of each tile def, and backward to
ldtilecfg. If there is any call instruction, and tile data register is
not preserved, we should insert ldtilecfg after the call instruction.

Differential Revision: https://reviews.llvm.org/D94155

[yaml2obj/obj2yaml] - Improve dumping/creating of ELF versioning sections.

This makes the following improvements.

For `SHT_GNU_versym`:
* yaml2obj: set `sh_link` to index of `.dynsym` section automatically.
For `SHT_GNU_verdef`:
* yaml2obj: set `sh_link` to index of `.dynstr` section automatically.
* yaml2obj: set `sh_info` field automatically.
* obj2yaml: don't dump the `Info` field when its value matches the number of version definitions.
For `SHT_GNU_verneed`:
* yaml2obj: set `sh_link` to index of `.dynstr` section automatically.
* yaml2obj: set `sh_info` field automatically.
* obj2yaml: don't dump the `Info` field when its value matches the number of version dependencies.

Also, simplifies few test cases.

Differential revision: https://reviews.llvm.org/D94956

[IndirectFunctions] Skip propagating attributes to address taken functions

In case of indirect calls or address taken functions,
skip propagating any attributes to them. We just
propagate features to such functions.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D94585

[llvm] Use hasSingleElement (NFC)

[Transforms] Use llvm::append_range (NFC)

[llvm] Construct SmallVector with iterator ranges (NFC)

[X86] Add experimental option to separately tune alignment of innermost loops

We already have an experimental option to tune loop alignment. Its impact
is very wide (and there is a suspicion that it's not always profitable). We want
to have something more narrow to play with. This patch adds similar option that
overrides preferred alignment for innermost loops. This is for experimental
purposes, default values do not change the existing behavior.

Differential Revision: https://reviews.llvm.org/D94895
Reviewed By: pengfei

[RISCV] Implement vssseg intrinsics.

Define vlsseg intrinsics and pseudo instructions. Lower vlsseg
intrinsics to pseudo instructions in RISCVDAGToDAGISel.

Differential Revision: https://reviews.llvm.org/D94863

[RISCV] Implement vlsseg intrinsics.

Define vlsseg intrinsics and pseudo instructions. Lower vlsseg intrinsics
to pseudo instructions in RISCVDAGToDAGISel.

Differential Revision: https://reviews.llvm.org/D94763

[RISCV] Implement vsseg intrinsics.

Define vsseg intrinsics and pseudo instructions. Lower vsseg intrinsics
to pseudo instructions in RISCVDAGToDAGISel.

Differential Revision: https://reviews.llvm.org/D94688

[lldb] Upstream eCore_arm_arm64e enum value in ArchSpec

Upstream the eCore_arm_arm64e enum value in ArchSpec. All the other
arm64e triple changes already landed in LLVM.

Differential revision: https://reviews.llvm.org/D95110

[RISCV] Use update_llc_test_checks.py to regenerate check lines in vleff-rv32.ll and vleff-rv64.ll.

This should minimize change in a future patch.

[dsymutil] Compare object modification times using second precision

The modification time in the debug map is expressed using second
precision, while the modification time returned by the filesystem could
be more precise. Avoid spurious warnings about timestamp mismatches by
truncating the modification time reported by the system to seconds.

Use CXX_SOURCES and point to the right source file.

Copy paste error, but the test still built on macOS. Weird.
It failed on debian linux with an error about -fno-limit-debug-info
not being a supported flag??? Not sure how this goof would cause
that error, but let's see if it did...

[MSan] Move origins for overlapped memory transfer

Reviewed-by: eugenis
Differential Revision: https://reviews.llvm.org/D94572

Fix a bug with setting breakpoints on C++11 inline initialization statements.

If they occurred before the constructor that used them, we would refuse to
set the breakpoint because we thought they were crossing function boundaries.

Differential Revision: https://reviews.llvm.org/D94846

[lld-macho] Add dependency on ObjCARC to fix shared build

[Clang][OpenMP] Use `clang_cc1` test for `declare_target_device_only_compilation.cpp`

Use `clang_cc1` test for `declare_target_device_only_compilation.cpp`

Reviewed By: echristo

Differential Revision: https://reviews.llvm.org/D95089

[DAGCombiner] Precommit test case for D95086

This is the test case for D95086 with worse result.

Differential Revision: https://reviews.llvm.org/D95103

[mlir][OpFormatGen] Fix incorrect kind used for RegionsDirective

I attempted to write a test case for this, but the situations in which the kind is used for RegionDirective and ResultsDirective have zero overlap; meaning that there isn't a situation in which sharing the kind creates a conflict.

Differential Revision: https://reviews.llvm.org/D94988

[mlir] Make MLIRContext::getOrLoadDialect(StringRef, TypeID, ...) public

Having this function in a public scope is helpful to register dialects that are
defined at runtime, and thus that need a runtime-defined TypeID.

Also, a similar function in DialectRegistry, insert(TypeID, StringRef, ...), has
a public scope.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D95091

[mlir] Add a new builtin `unrealized_conversion_cast` operation

An `unrealized_conversion_cast` operation represents an unrealized conversion
from one set of types to another, that is used to enable the inter-mixing of
different type systems. This operation should not be attributed any special
representational or execution semantics, and is generally only intended to be
used to satisfy the temporary intermixing of type systems during the conversion
of one type system to another.

This operation was discussed in the following RFC(and ODM):

https://llvm.discourse.group/t/open-meeting-1-14-dialect-conversion-and-type-conversion-the-question-of-cast-operations/

Differential Revision: https://reviews.llvm.org/D94832

[mlir] Add an interface for Cast-Like operations

A cast-like operation is one that converts from a set of input types to a set of output types. The arity of the inputs may be from 0-N, whereas the arity of the outputs may be anything from 1-N. Cast-like operations are removable in cases where they produce a "no-op", i.e when the input types and output types match 1-1.

Differential Revision: https://reviews.llvm.org/D94831

[NFC] Minor cleanup for ValueHandle code.

Based on feedback in https://reviews.llvm.org/D93433.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D94238

[libc][NFC][obvious] fix the names of MPFR tests

I missed the MPFR tests in my previous commit. They have now been fixed
to not fail the prefix check in the test macro.

[libc][NFC] add "LlvmLibc" as a prefix to all test names

Summary:
Having a consistent prefix makes selecting all of the llvm libc tests
easier on any platform that is also using the gtest framework.
This also modifies the TEST and TEST_F macros to enforce this change
moving forward.

Reviewers: sivachandra

Subscribers:

[BuildLibcalls, Attrs] Support more variants of C++'s new, add attributes for C++'s delete

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95095

[RISCV] Add another isel pattern for slliu.w.

Previously we only matched (and (shl X, C1), 0xffffffff << C1)
which matches the InstCombine canonicalization order. But its
possible to see (shl (and X, 0xffffffff), C1) if the pattern
is introduced in SelectionDAG. For example, through expansion of
a GEP.

[RISCV] Add addu.w and slliu.w test that uses getelementptr with zero extended indices.

This is closer to the kind of code that these intrinsics are
targeted at. Note we fail to match slliu.w here because our pattern
looks for (and (shl X, C1), 0xffffffff << C1) rather than
(shl (and X, 0xffffffff), C1). I'll fix this in a follow up
commit.

Revert "[mlir][Affine] Add support for multi-store producer fusion"

This reverts commit 7dd198852b4db52ae22242dfeda4eccda83aa8b2.

ASAN issue.

[mlir][sparse] add asserts on reading in tensor data

Rationale:
Since I made the argument that metadata helps with extra
verification checks, I better actually do that ;-)

Reviewed By: penpornk

Differential Revision: https://reviews.llvm.org/D95072

D94954: Fixes Snapdragon Kryo CPU core detection

All of these families were claiming to be a73 based, which was causing
-mcpu/mtune=native to never use the newer features available to these
cores.

Goes through each and bumps the individual cores to their respective Big
counterparts. Since this code path doesn't support big.little detection,
there was already a precedent set with the Qualcomm line to choose the
big cores only.

Adds a comment on each line for the product's name that the part number
refers to. Confirmed on-device and through Linux header naming
convections.

Additionally newer SoCs mix CPU implementer parts from multiple
implementers. Both 0x41 (ARM) and 0x51 (Qualcomm) in the Snapdragon case

This was causing a desync in information where the scan at the start to
find the implementer would mismatch the part scan later on.
Now scan for both implementer and part at the start so these stay in
sync.

Differential Revision: https://reviews.llvm.org/D94954

Makefile.rules: Avoid redundant .d generation (make restart) and inline archive rule to the only test

Take an example when `CXX_SOURCES` is main.cpp.

main.d is an included file. make will rebuild main.d, re-executes itself [1] to read
in the new main.d file, then rebuild main.o, finally link main.o into a.out.
main.cpp is parsed twice in this process.

This patch merges .d generation into .o generation [2], writes explicit rules
for .c/.m and deletes suffix rules for %.m and %.o. Since a target can be
satisfied by either of .c/.cpp/.m/.mm, we use multiple pattern rules. The
rule with the prerequisite (with VPATH considered) satisfied is used [3].

Since suffix rules are disabled, the implicit rule for archive member targets is
no long available [4]. Rewrite, simplify the archive rule and inline it into the
only test `test/API/functionalities/archives/Makefile`.

[1]: https://www.gnu.org/software/make/manual/html_node/Remaking-Makefiles.html
[2]: http://make.mad-scientist.net/papers/advanced-auto-dependency-generation/
[3]: https://www.gnu.org/software/make/manual/html_node/Pattern-Match.html
[4]: https://www.gnu.org/software/make/manual/html_node/Archive-Update.html

ObjC/ObjCXX tests only run on macOS. I don't have testing environment. Hope
someone can do it for me.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D94890

[mlir] NFC - Fix unused variable in non-debug mode

[NFC][AMDGPU] Document target ID syntax for code object V2 to V3

Differential Revision: https://reviews.llvm.org/D95018

[hip] Fix `<complex>` compilation on Windows with VS2019.

Differential Revision: https://reviews.llvm.org/D95075

Reland "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor"

This reverts commit d97f776be5f8cd3cd446fe73827cd355f6bab4e1.

The original problem was due to build failures in shared lib builds. D95079
moved ImportedFunctionsInliningStatistics under Analysis, unblocking
this.