review.tizen.org Git - platform/upstream/llvm.git/log

[DAGCombine] Remove unused param in combineCarryDiamond(). NFC

[AMDGPU] Make v8i16/v8f16 legal

Differential Revision: https://reviews.llvm.org/D117721

[lldb/Interpreter] Make `ScriptedInterface::ErrorWithMessage` static (NFC)

This patch changes the `ScriptedInterface::ErrorWithMessage` method to
make it `static` which makes it easier to call.

The patch also updates its various call sites to reflect this change.

Differential Revision: https://reviews.llvm.org/D117374

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>

[lldb/Plugins] Fix ScriptedThread IndexID reporting

When listing all the Scripted Threads of a ScriptedProcess, we can see that all
have the thread index set to 1. This is caused by the lldb_private::Thread
constructor, which sets the m_index_id member using the provided thread id `tid`.

Because the call to the super constructor is done before instantiating
the `ScriptedThreadInterface`, lldb can't fetch the thread id from the
script instance, so it uses `LLDB_INVALID_THREAD_ID` instead.

To mitigate this, this patch takes advantage of the `ScriptedThread::Create`
fallible constructor idiom to defer calling the `ScriptedThread` constructor
(and the `Thread` super constructor with it), until we can fetch a valid
thread id `tid` from the `ScriptedThreadInterface`.

rdar://87432065

Differential Revision: https://reviews.llvm.org/D117076

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>

[lldb/Plugins] Enrich ScriptedThreads Stop Reasons with Exceptions

This patch adds Exceptions to the list of supported stop reasons for
Scripted Threads.

The main motivation for this is that breakpoints are triggered as a
special exception class on ARM platforms, so adding it as a stop reason
allows the ScriptedProcess to selected the ScriptedThread that stopped at
a breakpoint (or crashed :p).

rdar://87430376

Differential Revision: https://reviews.llvm.org/D117074

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>

[lldb/Plugins] Add support of multiple ScriptedThreads in a ScriptedProcess

This patch adds support of multiple Scripted Threads in a ScriptedProcess.

This is done by fetching the Scripted Threads info dictionary at every
ScriptedProcess::DoUpdateThreadList and iterate over each element to
create a new ScriptedThread using the object instance, if it was not
already available.

This patch also adds the ability to pass a pointer of a script interpreter
object instance to initialize a ScriptedInterface instead of having to call
the script object initializer in the ScriptedInterface constructor.

This is used to instantiate the ScriptedThreadInterface from the
ScriptedThread constructor, to be able to perform call on that script
interpreter object instance.

Finally, the patch also updates the scripted process test to check for
multiple threads.

rdar://84507704

Differential Revision: https://reviews.llvm.org/D117071

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>

[lldb/Plugins] Move ScriptedThreadInterface to ScriptedThread

Since we can have multiple Scripted Threads per Scripted Process, having
only a single ScriptedThreadInterface (with a single object instance)
will cause the method calls to be done on the wrong object.

Instead, this patch creates a separate ScriptedThreadInterface for each
new lldb_private::ScriptedThread to make sure we interact with the right
instance.

rdar://87427911

Differential Revision: https://reviews.llvm.org/D117070

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>

[lldb/Plugins] Add ScriptedProcess::GetThreadsInfo interface

This patch adds a new method to the Scripted Process interface to
retrive a dictionary of Scripted Threads. It uses the thread ID as a key
and the Scripted Thread instance as the value.

This dictionary will be used to create Scripted Threads in lldb and
perform calls to the python scripted thread object.

rdar://87427126

Differential Revision: https://reviews.llvm.org/D117068

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>

[AMDGPU] Remove cndmask from readsExecAsData

Differential Revision: https://reviews.llvm.org/D117909

[NFC][MLGO] Simplify conditional compilation

Most of the code that's shared between 'release' and 'development'
modes doesn't depend on anything special.

[libc++][format] Finish P0645 Text Formatting.

This adjust the version macro and sets it as completed. All parts of the paper
have been implemented, except for the parts replaced by later papers and
LWG-issues.

Adjusted the synopsis to match the synopsis in the Standard. Not yet
implemented parts of P2216 and P2418 still use the P0645 wording.

Completes:
- P0645 Text Formatting

Depends on D115991

Reviewed By: ldionne, #libc

Differential Revision: https://reviews.llvm.org/D115999

[libc++] Fix bugs in common_iterator; add test coverage.

Differential Revision: https://reviews.llvm.org/D117400

[libc][cmake] Make `add_tablegen` calls match others

in all the other `add_tablegen` calls, the project name is so transformed so it
can be a prefix of a CMake variable. I think it is better to do do that here
too for consistency.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D117979

[ConstraintElimination] Fix sign of sub decomposition.

Update the decomposition code to make sure the right coefficient (-1) is
used for the second operand of the subtract.

Fixes PR53123.

[ConstraintElimination] Add test from PR53123.

[NFC][DebugInfo] Strip out an undesired #if 0 block

As mentioned in discussion of D116821, it's better to just delete this
block than keep it hanging around.

Use -gdwarf-4 in compiler-rt/test/profile/Linux/instrprof-debug-info-correlate.c

otherwise the test fails after the recent DWARF 4 -> 5 default change,
see https://github.com/llvm/llvm-project/issues/53387

[libc] Add bazel definition for hypot/hypotf.

Patch by Clint Caywood.

Differential Revision: https://reviews.llvm.org/D118053

[AArch64] NFC: Clarify and auto-generate some CodeGen tests.

* For ext-narrow-index.ll, move vscale_range attribute closer to the
  function definition, rather than through indirect #<num> attribute. This
  makes the test a bit easier to read.
* auto-generated CHECK lines for sve-cmp-select.ll and
  named-vector-shuffles-sve.ll.
* re-generated CHECK lines for tests that had a mention they were
  auto-generated, but where the CHECK lines were out of date.

[llvm] Do not replace dead constant references in metadata with undef

This patch removes an incorrect behaviour in Constants.cpp, which would
replace dead constant references in metadata with an undef value. This
blanket replacement resulted in undef values being inserted into
metadata that would not accept them. The replacement was intended for
debug info metadata, but this is now instead handled in the RAUW
handler.

Differential Revision: https://reviews.llvm.org/D117300

[gn build] Port db2944e34b16

[gn build] Port 787ccd345cbb

[libc++][format] Adds formatter handle.

This implements the handler according to P0645. P2418 changes the wording
in the Standard. That isn't implemented and requires changes in more
places. LWG3631 applies modifications to P2418, but is currently
unresolved.

Implements parts of:
* P0645 Text Formatting

Depends on D115989

Reviewed By: ldionne, #libc

Differential Revision: https://reviews.llvm.org/D115991

[libc++][format] Disable default formatter.

[format.formatter.spec]/5 lists the requirements for the default
formatter. The original implementation didn't implement this. This
implements the default formatter according to the Standard.

This adds additional test to validate the default formatter is disabled
and the required standard formatters are enabled.

While adding the tests it seems the formatters needed a constraint for the
character types they were valid for.

Implements parts of:
- P0645 Text Formatting

Depends on D115988

Reviewed By: ldionne, #libc

Differential Revision: https://reviews.llvm.org/D115989

[libc++][format] Adds formatter pointer.

This implements the last required formatter specialization.

Completes:
- LWG 3251 Are std::format alignment specifiers applied to string arguments?
- LWG 3340 Formatting functions should throw on argument/format string mismatch in §[format.functions]
- LWG 3540 §[format.arg] There should be no const in basic_format_arg(const T* p)

Implements parts of:
- P0645 Text Formatting

Depends on D114001

Reviewed By: ldionne, vitaut, #libc

Differential Revision: https://reviews.llvm.org/D115988

[libc++][format] Adds formatter floating-point.

This properly implements the formatter for floating-point types.

Completes:
- P1652R1 Printf corner cases in std::format
- LWG 3250 std::format: # (alternate form) for NaN and inf
- LWG 3243 std::format and negative zeroes

Implements parts of:
- P0645 Text Formatting

Reviewed By: #libc, ldionne, vitaut

Differential Revision: https://reviews.llvm.org/D114001

[clang-format] Space between attribute closing parenthesis and qualified type colon.

Fixes https://github.com/llvm/llvm-project/issues/35711.

Reviewed By: MyDeveloperDay, HazardyKnusperkeks, owenpan

Differential Revision: https://reviews.llvm.org/D117894

Revert rG6a605b97a200 due to excessive memory use

Over in the comments for D116821, some use-cases have cropped up where
there's a substantial increase in memory usage. A quick inspection
shows that a) it's a lot of memory and b) there are several things to
be done to reduce it. Reverting (via disabling this feature by default)
to avoid bothering people in the meantime.

[AMDGPU][InstCombine] Remove zero image offset

Remove the offset parameter if it is zero.

Differential Revision: https://reviews.llvm.org/D117876

[clang][NFC] Wrap TYPE_SWITCH in "do while (0)" in the interpreter

Wraps the expansions of TYPE_SWITCH and COMPOSITE_TYPE_SWITCH in
the constexpr interpreter with "do { ... } while (0)" so that these
macros can be used like this:

if (llvm::Optional<PrimType> T = Ctx.classify(FieldTy))
TYPE_SWITCH(*T, Ok &= ReturnValue<T>(FP.deref<T>(), Value));
else
Ok &= Composite(FieldTy, FP, Value);

This bug was found while testing D116316. See also review comment:
https://reviews.llvm.org/D64146?id=208520#inline-584131

Also cleaned up the macro definitions by removing the superfluous
do-while statements and removed the unused INT_TPYE_SWITCH macro.

Differential Revision: https://reviews.llvm.org/D117301

[SLP][NFC] Add debug logs for entry.

Tell the users they are specifying something without vector register.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D117980

[libc++] Fix benchmark failure

[ELF] Fix the branch range computation when reusing a thunk

Notation: dst is `t->getThunkTargetSym()->getVA()`

On AArch64, when `src-0x8000000-r_addend <= dst < src-0x8000000`, the condition
`target->inBranchRange(rel.type, src, rel.sym->getVA(rel.addend))` may
incorrectly consider a thunk reusable.
`rel.addend = -getPCBias(rel.type)` resets the addend to 0 for AArch64/PPC
and the zero addend is used by `rel.sym->getVA(rel.addend)` to check
out-of-range relocations.

See the test for a case this computation is wrong:
`error: a.o:(.text_high+0x4): relocation R_AARCH64_JUMP26 out of range: -134217732 is not in [-134217728, 134217727]`
I have seen a real world case with r_addend=19960.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D117734

[X86] combineSetCCMOVMSK - fold allof(cmpeq(x,y)) -> ptest(sub(x,y)) (PR53379)

As suggested on PR53379, for all-of icmp-eq patterns, we can use ptest(sub(x,y)) on SSE41+ targets

This is a generalization of the existing allof(cmpeq(x,0)) -> ptest(x) pattern

We can probably extend this further, in particularly to handle 256-bit cases on pre-AVX2 targets, but this part of the generalization is pretty trivial

Fixes Issue #53379

[ISEL] Move trivial step_vector folds to FoldConstantArithmetic.

Given that step_vector is practically a constant, doing this early
helps with DAGCombine folds that happen before type legalization.

There is currently no way to test this happens earlier, although existing
tests for step_vector folds continue protect the folds happening at all.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D117863

[libcxx][test] {move,reverse}_iterator cannot be instantiated for a type with no `operator*`

Since their nested reference types are defined in terms of `iter_reference_t<T>`, which examines `decltype(*declval<T>())`.

Differential Revision: https://reviews.llvm.org/D117371

[mlir][linalg] Add transpose support to hoist padding.

Add a transpose option to hoist padding to transpose the padded tensor before storing it into the packed tensor. The early transpose improves the memory access patterns of the actual compute kernel. The patch introduces a transpose right after the hoisted pad tensor and a second transpose inside the compute loop. The second transpose can either be fused into the compute operation or will canonicalize away when lowering to vector instructions.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D117893

[DAGCombiner][RISCV] Canonicalize (bswap(bitreverse(x))->bitreverse(bswap(x)).

If the bitreverse gets expanded, it will introduce a new bswap. By
putting a bswap before the bitreverse, we can ensure it gets cancelled
out when this happens.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D118012

[mlir][bufferize][NFC] Implement BufferizableOpInterface on bufferization ops directly

No longer go through an external model. Also put BufferizableOpInterface into the same build target as the BufferizationDialect. This allows for some code reuse between BufferizationOps canonicalizers and BufferizableOpInterface implementations.

Differential Revision: https://reviews.llvm.org/D117987

[SelectionDAG][RISCV] Teach getNode to fold bswap(bswap(x))->x.

This can show up during when bitreverse is expanded to bswap and
swap of bits within a byte. If the input is already a bswap, we
should cancel them out before we further transform them in a way
that makes it harder to see the redundancy.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D118007

[RISCV] Select int_riscv_vsll with shift of 1 to vadd.vv.

Add might be faster than shift. We can't do this earlier without
using a Freeze instruction.

This is the intrinsic version of D106689.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118013

Don't run test/ClangScanDeps/modules-symlink.c on Windows

'ln -s' isn't Windows friendly.

[MLIR][OpenMP] Suppress -Wreturn-type warnings (NFC)

[mlir][bufferize] Change insertion point for ToTensorOps

Both insertion points are valid. This is to make BufferizableOpInteface-based bufferization compatible with existing partial bufferization test cases. (So less changes are necessary to unit tests.)

Differential Revision: https://reviews.llvm.org/D117986

[Templight] Don't display empty strings for names of unnamed template parameters

Patch originally by oktal3000: https://github.com/mikael-s-persson/templight/pull/40

When a template parameter is unnamed, the name of -templight-dump might return
an empty string. This is fine, they are unnamed after all, but it might be more
user friendly to at least describe what entity is unnamed.

Differential Revision: https://reviews.llvm.org/D115521

[flang] Add MemoryAllocation pass to the pipeline

Add the MemoryAllocation pass into the pipeline. Add
the possibilty to pass the options directly within the tool (tco).

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D117886

[mlir][tensor][bufferize] Support tensor.rank in BufferizableOpInterfaceImpl

This is the only op that is not supported via BufferizableOpInterfaceImpl bufferization. Once this op is supported we can switch `tensor-bufferize` over to the new unified bufferization.

Differential Revision: https://reviews.llvm.org/D117985

[libc++][AIX] Do not assert chmod return value is non-zero.

A number of the filesystem tests create a directory that contains a bad
symlink. On AIX recursively setting permissions on said directory will
return a non-zero value because of the bad symlink, however the
following rm -r still completes successfully. Avoid the assertion on
AIX, and rely on the return value of the remove command to detect
problems.

Differential Revision: https://reviews.llvm.org/D112086

[mlir] Fix broken __repr__ implementation in Linalg OpDSL

Reviewed By: gysit

Differential Revision: https://reviews.llvm.org/D118027

[llvm][docs] Fix code-block in the testing guide

Without a langauge name it's an error (with some verisons of Sphinx
it seems) or the block is simply missing in the output.

[mlir][tensor] Move BufferizableOpInterface impl to tensor dialect

This is in preparation of unifying the existing bufferization with One-Shot bufferization.

A subsequent commit will replace `tensor-bufferize`'s implementation with the BufferizableOpInterface-based implementation and move over missing test cases.

Differential Revision: https://reviews.llvm.org/D117984

AMDGPU: Fix assertion on fixed stack objects with VGPR->AGPR spills

These have negative / out of bounds frame index values and would
assert when trying to set the BitVector. Fixed stack objects can't be
colored away so ignore them.

Pre-commit test cases for (sra (load)) -> (sextload) folds. NFC

Add test case to show missing folds for (sra (load)) -> (sextload).

Differential Revision: https://reviews.llvm.org/D116929

Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction"

This reverts commit a97e20a3a8a58be751f023e610758310d5664562.

Reapply "IR: Make getRetAlign check callee function attributes"

Reapply 3d2d208f6a0a421b23937c39b9d371183a5913a3, reverted in
a97e20a3a8a58be751f023e610758310d5664562

[clang-format] Fix SeparateDefinitionBlocks issues

- Fixes https://github.com/llvm/llvm-project/issues/53227 that wrongly
  indents multiline comments
- Fixes wrong detection of single-line opening braces when used along
  with those only opening scopes, causing crashes due to duplicated
  replacements on the same token:
    void foo()
    {
      {
        int x;
      }
    }
- Fixes wrong recognition of first line of definition when the line
  starts with block comment, causing crashes due to duplicated
  replacements on the same token for this leads toward skipping the line
  starting with inline block comment:
    /*
      Some descriptions about function
    */
    /*inline*/ void bar() {
    }
- Fixes wrong recognition of enum when used as a type name rather than
  starting definition block, causing crashes due to duplicated
  replacements on the same token since both actions for enum and for
  definition blocks were taken place:
    void foobar(const enum EnumType e) {
    }
- Change to use function keyword for JavaScript instead of comparing
  strings
- Resolves formatting conflict with options EmptyLineAfterAccessModifier
  and EmptyLineBeforeAccessModifier (prompts with --dry-run (-n) or
  --output-replacement-xml but no observable change)
- Recognize long (len>=5) uppercased name taking a single line as return
  type and fix the problem of adding newline below it, with adding new
  token type FunctionLikeOrFreestandingMacro and marking tokens in
  UnwrappedLineParser:
    void
    afunc(int x) {
      return;
    }
    TYPENAME
    func(int x, int y) {
      // ...
    }
- Remove redundant and repeated initialization
- Do no change to newlines before EOF

Reviewed By: MyDeveloperDay, curdeius, HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D117520

[AArch64] Regenerate CHECK lines for llvm/test/CodeGen/AArch64/sve2-int-mul.ll

[X86] Add cmp-equality bool reductions

PR53379 test coverage

[X86] Rename cmp-with-zero bool reductions

Explicitly name them icmp0_* - I'm intending to add PR53379 test coverage shortly

[X86] Fix v8i8 -> v8i16 typo in bool reductions

We were supposed to be testing <8 x i16> reductions

Add missing STLExtras.h include from lldb/unittests/TestingSupport/MockTildeExpressionResolver.cpp

[RISCV] Add side-effect-free vsetvli intrinsics

This patch introduces new intrinsics that enable the use of vsetvli in
contexts where only the returned vector length is of interest. The
pre-existing intrinsics are marked with side-effects, which prevents
even trivial optimizations on/across them.

These intrinsics are intended to be used in situations where the vector
length is fed in turn to RVV intrinsics or to vector-predication
intrinsics during loop vectorization, for example. Those codegen paths
ensure that instructions are generated with their own implicit vsetvli,
so the vector length and vtype can be relied upon to be correct.

No corresponding C builtins are planned at this stage, though that is a
possibility for the future if the need arises.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117910

[LoopFlatten] Address FIXME about getTripCountFromExitCount. NFC.

Together with the previous commit which mainly documents better LoopFlatten's
overall strategy, this addresses a concern added as a FIXME comment in D110587;
the code refactoring (NFC) introduces functions (also for the SCEV usage) to
make this clearer.

[LoopFlatten] Added comments about usage of various Loop APIs. NFC.

Add missing include llvm/ADT/STLExtras

Add `isConstinit` matcher

Support C++20 constinit variables for AST Matchers.

[demangler][NFC] Refactor some parsing

There's some unnecessary code duplication in the parser.  This
refactors that and deploys boolean variables to avoid the duplication.
These also happen to help adding module demangling (with an updated
mangling scheme).

1a) The grammar requires some lookahead concerning <template-args>. We
may discover an <unscoped-name> is actually <unscoped-template-name>
<template-args>.  (When <unscoped-name> was a substitution, there must
be a following <template-args>.)  Refactor parseName to only have one
code path looking for the 'I' indicating <template-args>.

1b) While there I altered the control flow to hold the result in a
variable, rather than tail call.  Made it easier to debug (and of
course an optimizer will DTRT here anyway).

2a) An <unscoped-name> can have an St or StL prefix.  No need for
completely separate code paths handling the following unqualified-name
though.

2b) Also no need to look for both 'St' and 'StL' separately.  Look for
'St' and then conditionally swallow an 'L'.

3) We get a similar issue as #1a when parsing a typeName.  Here I just
change the control flow slightly to bring the 'break' out to the end
of the 'S' block and embed the early return inside an if.  That's more
in keeping with the code style.

4) Although NFC, there's a new testcase as that's not covered by the
existing demangler tests and is significant in the #1a case above.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D117879

[demangler] write-protect non-canonical source

To try and avoid undesired changes to the non-canonical demangler
sources, change the cp-to-llvm script to (a) write-protect the target
files and (b) prepend 'do not edit' comments that are significant to
emacs[*], and hopefully humans.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D118008

[demangler] Resync demangler sources

Recent commits changed llvm/include/llvm/Demangle without also
changing libcxxabi/src/Demangle, which is the canonical source
location. This resyncs those commits to the libcxxabi directory.

Commits:
* 1f9e18b6565fd1bb69c4b649b9efd3467b3c7c7d
* f53d359816e66a107195e1e4b581e2a33bbafaa4
* 065044c443f4041f32e0a8d6e633f9d92580fbca

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D117990.diff

[flang] Update tco tool pipline and add translation to LLVM IR

tco is a tool to test the FIR to LLVM IR pipeline of the Flang compiler.

This patch update tco pipelines and adds the translation to LLVM IR.

A simple test is added to make sure the tool is working with a simple
FIR program.
More tests will be upstream in follow up patch from the fir-dev branch.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: kiranchandramohan, awarzynski, schweitz, mehdi_amini

Differential Revision: https://reviews.llvm.org/D117781

Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Andrzej Warzynski <andrzej.warzynski@arm.com>

Move STLFunctionalExtras out of STLExtras

Only using that change in StringRef already decreases the number of
preoprocessed lines from 7837621 to 7776151 for LLVMSupport

Perhaps more interestingly, it shows that many files were relying on the
inclusion of StringRef.h to have the declaration from STLExtras.h. This
patch tries hard to patch relevant part of llvm-project impacted by this
hidden dependency removal.

Potential impact:
- "llvm/ADT/StringRef.h" no longer includes <memory>,
"llvm/ADT/Optional.h" nor "llvm/ADT/STLExtras.h"

Related Discourse thread:
https://llvm.discourse.group/t/include-what-you-use-include-cleanup/5831

[LV] Make some tests more robust by adding missing users.

[X86] Add PR46249 test case showing poorly widened select predicate mask

[AMDGPU][NFC] Fix debug prints

Print the instructions instead of pointers.

[NFC] New test case for BasicAA and memcy/memmove with deopt

New test checks results of BasicAA for llvm.memcpy.*/llvm.memmove.* intrinsics in presence of deopt bundle. By specification expected result for unrelated global memory should be Ref. Currently this is not the case and will be fixed in upcoming patches.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D118031

[MLIR][Presburger] Refactor duplicate division merging to Utils

This patch moves merging of duplicate divisions to presburger utility
functions. This is required to support division merging in structures other
than IntegerPolyhedron.

Reviewed By: arjunp

Differential Revision: https://reviews.llvm.org/D118001

[RISCV] add support for zbkx subextension in MC layer.

This patch adds support for zbkx extension from K extension(v1.0.0) in MC layer.
Instructions with same functionality and same encoding is defined in the bitmanip extension.
It defines {Xperm8, Xperm4} as instruction aliases for xperm.* in Zbp extension. When Zbkx is enabled while Zbp is not, xperm.h will not be available. When Zbkx and Zbp are both enabled, the instructions will be decoded in Zbp format.

[[ https://reviews.llvm.org/D94999 | D94999 ]] this is the patch that introduces xperm.* instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117889

[LV] Name values and blocks in same induction tests (NFC).

This reduces the churn in the test in future updates due to numbering
changes.

[gn build] Port 3696c70e67d9

[LoopVectorize] Support epilogue vectorisation of loops with reductions

isCandidateForEpilogueVectorization will currently return false for loops
which contain reductions. This patch removes this restriction and makes
the following changes to support epilogue vectorisation with reductions:

- `fixReduction`: If fixReduction is being called during vectorisation of the
    epilogue, the phi node it creates will need to additionally carry incoming
     values from the middle block of the main loop.

- `createEpilogueVectorizedLoopSkeleton`: The incoming values of the phi
    created by fixReduction are updated after the vec.epilog.iter.check block
    is added. The phi is also moved to the preheader of the epilogue.

- `processLoop`: The start value of any VPReductionPHIRecipes are updated before
    vectorising the epilogue loop. The getResumeInstr function added to the ILV
    will return the resume instruction associated with the recurrence descriptor.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D116928

[X86] Remove avx512f integer and/or/xor/min/max reduction intrinsics and use generic equivalents

None of these have any reordering issues, and they still emit the same reduction intrinsics without any change in the existing test coverage:

llvm-project\clang\test\CodeGen\X86\avx512-reduceIntrin.c
llvm-project\clang\test\CodeGen\X86\avx512-reduceMinMaxIntrin.c

Differential Revision: https://reviews.llvm.org/D117881

[clang-tidy] Add `readability-container-contains` check

This commit introduces a new check `readability-container-contains` which finds
usages of `container.count()` and `container.find() != container.end()` and
instead recommends the `container.contains()` method introduced in C++20.

For containers which permit multiple entries per key (`multimap`, `multiset`,
...), `contains` is more efficient than `count` because `count` has to do
unnecessary additional work.

While this this performance difference does not exist for containers with only
a single entry per key (`map`, `unordered_map`, ...), `contains` still conveys
the intent better.

Reviewed By: xazax.hun, whisperity

Differential Revision: http://reviews.llvm.org/D112646

[X86] Remove `__builtin_ia32_pmax/min` intrinsics and use generic `__builtin_elementwise_max/min`

D111985 added the generic `__builtin_elementwise_max` and `__builtin_elementwise_min` intrinsics with the same integer behaviour as the SSE/AVX instructions

This patch removes the `__builtin_ia32_pmax/min` intrinsics and just uses `__builtin_elementwise_max/min` - the existing tests see no changes:
```
__m256i test_mm256_max_epu32(__m256i a, __m256i b) {
  // CHECK-LABEL: test_mm256_max_epu32
  // CHECK: call <8 x i32> @llvm.umax.v8i32(<8 x i32> %{{.*}}, <8 x i32> %{{.*}})
  return _mm256_max_epu32(a, b);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).

Sibling patch to D117791

Differential Revision: https://reviews.llvm.org/D117798

[mlir][bufferize][NFC] Refactor createAlloc function signature

Pass a ValueRange instead of an ArrayRef<Value> for better compatibility. Also provide an additional function overload that automatically deallocates the buffer if specified.

Differential Revision: https://reviews.llvm.org/D118025

[X86] Remove __builtin_ia32_pabs intrinsics and use generic __builtin_elementwise_abs

D111986 added the generic `__builtin_elementwise_abs()` intrinsic with the same integer absolute behaviour as the SSE/AVX instructions (abs(INT_MIN) == INT_MIN)

This patch removes the `__builtin_ia32_pabs*` intrinsics and just uses `__builtin_elementwise_abs` - the existing tests see no changes:
```
__m256i test_mm256_abs_epi8(__m256i a) {
  // CHECK-LABEL: test_mm256_abs_epi8
  // CHECK: [[ABS:%.*]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> %{{.*}}, i1 false)
  return _mm256_abs_epi8(a);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).

Differential Revision: https://reviews.llvm.org/D117791

[DAGCombiner] Adjust some checks in DAGCombiner::reduceLoadWidth

In code review for D117104 two slightly weird checks were found
in DAGCombiner::reduceLoadWidth. They were typically checking
if BitsA was a mulitple of BitsB by looking at (BitsA & (BitsB - 1)),
but such a comparison actually only make sense if BitsB is a power
of two.

The checks were related to the code that attempted to shrink a load
based on the fact that the loaded value would be right shifted.

Afaict the legality of the value types is checked later (typically in
isLegalNarrowLdSt), so the existing checks were both overly
conservative as well as being wrong whenever ExtVTBits wasn't a
power of two. The latter was a situation triggered by a number of
lit tests so we could not just assert on ExtVTBIts being a power of
two).

When attempting to simply remove the checks I found some problems,
that seems to have been guarded by the checks (maybe just out of
luck). A typical example would be a pattern like this:

  t1 = load i96* ptr
  t2 = srl t1, 64
  t3 = truncate t2 to i64

When DAGCombine is visiting the truncate reduceLoadWidth is called
attempting to narrow the load to 64 bits (ExtVT := MVT::i64). Then
the SRL is detected and we set ShAmt to 64.

In the past we've bailed out due to i96 not being a multiple of 64.
If we simply remove that check then we would end up replacing the
load with a new load that would read 64 bits but with a base pointer
adjusted by 64 bits. So we would read 32 bits the wasn't accessed by
the original load.
This patch will instead utilize the fact that the logical left shift
can be folded away by using a zextload. Thus, the pattern above will
now be combined into

  t3 = load i32* ptr+offset, zext to i64

Another case is shown in the X86/shift-folding.ll test case:

  t1 = load i32* ptr
  t2 = srl i32 t1, 8
  t3 = truncate t2 to i16

In the past we bailed out due to the shift count (8) not being a
multiple of 16. Now the narrowing kicks in and we get

  t3 = load i16* ptr+offset

Differential Revision: https://reviews.llvm.org/D117406

Pre-commit test case for trunc+lshr+load folds

This is a pre-commit of test cases relevant for D117406.

@srl_load_narrowing1 is showing a pattern that could be folded into
a more narrow load.

@srl_load_narrowing2 is showing a similar pattern that happens to
be optimized already, but that happens in two steps (first triggering
a combine based on SRL and later another combine based on TRUNCATE).

Differential Revision: https://reviews.llvm.org/D117588

[lldb] Update release notes with non-address bit handling changes

This adds the "memory find" (https://reviews.llvm.org/D117299)
and "memory tag" (https://reviews.llvm.org/D117672) commands
and puts them all in one list.

[RISCV][VP] Lower VP_MERGE to RVV instructions

This patch adds lowering of the llvm.vp.merge.* intrinsic
(ISD::VP_MERGE) to RVV vmerge/vfmerge instructions. It introduces a
special pseudo form of vmerge which allows a tied merge operand,
allowing us to specify the tail elements as being equal to the "on
false" operand, using a tied-def constraint and a "tail undisturbed"
policy.

While this strategy allows us to often lower the intrinsic to just one
instruction, it may be less efficient in fixed-vector types as the
number of tail elements may extend far beyond the length of the fixed
vector. Another strategy could be to use a vmerge/vfmerge instruction
with an AVL equal to the length of the vector type, and manipulate the
condition operand such that mask elements greater than the operation's
EVL are false.

I've also observed inefficient codegen in which our 'VF' patterns don't
match raw floating-point SPLAT_VECTORs, which occur in scalable-vector
code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117561

[RISCV] Match VF variants for masked VFRDIV/VFRSUB

This patch follows up on D117697 to help the simple binary operations
behave similarly in the presence of masks.

It also enables CGP sinking support for vp.fdiv and vp.fsub intrinsics,
now that VFRDIV and VFRSUB are consistently matched with a LHS splat for
masked and unmasked variants.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117783

[X86] getVectorMaskingNode - fix indentation. NFC.

clang-format

[AMDGPU][GlobalISel] Remove the post ':' part of vreg operands in fsh combine tests.

[flang] Update the description of `!fir.coordinate_of`

This change was suggested in one of the comments for
https://reviews.llvm.org/D115333. Basically, the following usage is
valid, but the current wording suggests otherwise:
```
%1 = fir.coordinate_of %a, %k : (!fir.ref<!fir.array<10 x 10 x i32>>, index) -> !fir.ref<!fir.array<10 x i32>>
```
A test is also added to better document this particular case.

Differential revision: https://reviews.llvm.org/D115929

[lldb] Ignore non-address bits in "memory find" arguments

This removes the non-address bits before we try to use
the addresses.

Meaning that when results are shown, those results won't
show non-address bits either. This follows what "memory read"
has done. On the grounds that non-address bits are a property
of a pointer, not the memory pointed to.

I've added testing and merged the find and read tests into one
file.

Note that there are no API side changes because "memory find"
does not have an equivalent API call.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D117299

[AMDGPU][GlobalISel] Add more sign/zero/any-extension tests

Add s1 to s16 cases, and for sgprs s1 to s64 and s32 to s64.

[AMDGPU][GlobalISel] Regenerate checks in inst-select-*ext.mir

[LLD][ELF][AArch64] Update test with incorrect REQUIRES line [NFC]

D54759 introduced aarch64-combined-dynrel.s and
aarch64-combined-dynrel-ifunc.s . Unfortunately the requires line
at the top was AArch64 instead of aarch64 which means they were never
run. Update the tests to use aarch64 and fix to match current lld output.

Differential Revision: https://reviews.llvm.org/D117896

[AArch64][GlobalISel] Support returned argument with multiple registers

The call lowering code assumed that a returned argument could only
consist of one register. Pass an ArrayRef<Register> instead of
Register to make sure that all parts get assigned.

Fixes https://github.com/llvm/llvm-project/issues/53315.

Differential Revision: https://reviews.llvm.org/D117866

[SDAG] Don't move DBG_VALUE instructions after insertion point during scheduling (PR53243)

EmitSchedule() shouldn't be touching instructions after the provided
insertion point. The change introduced in D83561 performs a scan to
the end of the block, and thus may move unrelated instructions. In
particular, this ends up moving instructions that have been produced
by FastISel and will later be deleted. Moving them means that more
instructions than intended are removed.

Fix this by stopping the iteration when the insertion point is
reached.

Fixes https://github.com/llvm/llvm-project/issues/53243.

Differential Revision: https://reviews.llvm.org/D117489

[ISEL] Canonicalise constant splats to RHS.

SelectionDAG::getNode() canonicalises constants to the RHS if the
operation is commutative, but it doesn't do so for constant splat
vectors. Doing this early helps making certain folds on vector types,
simplifying the code required for target DAGCombines that are enabled
before Type legalization.

Somewhat to my surprise, DAGCombine doesn't seem to traverse the
DAG in a post-order DFS, so at the time of doing some custom fold where
the input is a MUL, DAGCombiner::visitMUL hasn't yet reordered the
constant splat to the RHS.

This patch leads to a few improvements, but also a few minor regressions,
which I traced down to D46492. When I tried reverting this change to see
if the changes were still necessary, I ran into some segfaults. Not sure
if there is some latent bug there.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117794