review.tizen.org Git - platform/upstream/llvm.git/log

[AArch64][GlobalISel] combine and + [la]sr => ubfx

https://godbolt.org/z/h8ejrG4hb

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111839

[LV] Update test that was missed in e844f05397b72.

BPF: fix a bug in IRPeephole pass

Commit 009f3a89d833 ("BPF: remove intrindics @llvm.stacksave()
and @llvm.stackrestore()") implemented IRPeephole pass to remove
llvm.stacksave()/stackrestore() instrinsics.
Buildbot reported a failure:
UNREACHABLE executed at ../lib/IR/LegacyPassManager.cpp:1445!
which is:
llvm_unreachable("Pass modifies its input and doesn't report it");

The code has changed but the implementation didn't return true
for changing. This patch fixed this problem.

Fix a comment in SemaSYCL to make sure I can commit

[AIX] Disable tests failing due to lack of 64-bit XCOFF object file support

The following tests are failing because 64-bit XCOFF object files are not currently supported on AIX. This patch disables these tests on AIX for now.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D111887

[JITLink][NFC] Fix Wdangling-else warning in LinkGraphTests

Fix a dangling else that gcc-11 warned about. The EXPECT_EQ macro
expands to an if-else, so the whole construction contains a hidden
hangling else.

[LoopUtils] Simplify addRuntimeCheck to return a single value.

This simplifies the return value of addRuntimeCheck from a pair of
instructions to a single `Value *`.

The existing users of addRuntimeChecks were ignoring the first element
of the pair, hence there is not reason to track FirstInst and return
it.

Additionally all users of addRuntimeChecks use the second returned
`Instruction *` just as `Value *`, so there is no need to return an
`Instruction *`. Therefore there is no need to create a redundant
dummy `and X, true` instruction any longer.

Effectively this change should not impact the generated code because the
redundant AND will be folded by later optimizations. But it is easy to
avoid creating it in the first place and it allows more accurately
estimating the cost of the runtime checks.

[mlir] Flipping Test dialect to prefixed form _Both

Starting with a mostly NFC change to be able to differentiate between
mechanical changes from ones that require more detailed review.

This will be used to flush out flow before flipping dialects used
outside local testing. As this dialect is not intended to be used
generally rather than in tests in core, I will not be following 2 week
staging approach here.

[RISCV] Rewrite forwardCopyWillClobberTuple to not assume that there are exactly 32 registers. NFC

This function was copied from ARM where register pairs/triples/quads can wrap around the 32 encoding space. So register 31 can pair with register 0. This is not true for RISCV vectors. The spec specifically mentions the possibility of a future encoding that has more than 32 registers.

This patch removes the modulo from the code and directly checks that destination register is in the source register range and not the beginning of the range. Though I don't expect an identity copy will occur.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D111467

[gn build] Port 009f3a89d833

BPF: remove intrindics @llvm.stacksave() and @llvm.stackrestore()

Paul Chaignon reported a bpf verifier failure ([1]) due to using
non-ABI register R11. For the test case, llvm11 is okay while
llvm12 and later generates verifier unfriendly code.

The failure is related to variable length array size.
The following mimics the variable length array definition
in the test case:

struct t { char a[20]; };
void foo(void *);
int test() {
   const int a = 8;
   char tmp[AA + sizeof(struct t) + a];
   foo(tmp);
   ...
}

Paul helped bisect that the following llvm commit is
responsible:

552c6c232872 ("PR44406: Follow behavior of array bound constant
              folding in more recent versions of GCC.")

Basically, before the above commit, clang frontend did constant
folding for array size "AA + sizeof(struct t) + a" to be 68,
so used alloca for stack allocation. After the above commit,
clang frontend didn't do constant folding for array size
any more, which results in a VLA and llvm.stacksave/llvm.stackrestore
is generated.

BPF architecture API does not support stack pointer (sp) register.
The LLVM internally used R11 to indicate sp register but it should
not be in the final code. Otherwise, kernel verifier will reject it.

The early patch ([2]) tried to fix the issue in clang frontend.
But the upstream discussion considered frontend fix is really a
hack and the backend should properly undo llvm.stacksave/llvm.stackrestore.
This patch implemented a bpf IR phase to remove these intrinsics
unconditionally. If eventually the alloca can be resolved with
constant size, r11 will not be generated. If alloca cannot be
resolved with constant size, SelectionDag will complain, the same
as without this patch.

[1] https://lore.kernel.org/bpf/20210809151202.GB1012999@Mem/
[2] https://reviews.llvm.org/D107882

Differential Revision: https://reviews.llvm.org/D111897

[OpenMP] libomp: add check of task function pointer for NULL.

This patch allows to simplify compiler implementation on "taskwait nowait"
construct. The "taskwait nowait" is semantically equivalent to the empty task.
Instead of creating an empty routine as a task entry, compiler can just send
NULL pointer to the runtime. Then the runtime will make all the work with
dependences and return because of the absent task routine.

Differential Revision: https://reviews.llvm.org/D112015

Use llvm::erase_if (NFC)

[Sanitizers] Replaced getMaxPointerSizeInBits with getPointerSizeInBits, which was causing failures for 32bit x86.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D111829

Reland [clang] Pass -clear-ast-before-backend in Clang::ConstructJob()

This clears the memory used for the Clang AST before we run LLVM passes.

https://llvm-compile-time-tracker.com/compare.php?from=d0a5f61c4f6fccec87fd5207e3fcd9502dd59854&to=b7437fee79e04464dd968e1a29185495f3590481&stat=max-rss
shows significant memory savings with no slowdown (in fact -O0 slightly speeds up).

For more background, see
https://lists.llvm.org/pipermail/cfe-dev/2021-September/068930.html.

Turn this off for the interpreter since it does codegen multiple times.

Relanding with fix for -print-stats: D111973

Differential Revision: https://reviews.llvm.org/D111270

[mlir][docs] Fix name of get arith->LLVM patterns in docs

[ADT] Fix Wshift-overflow gcc warning in isPowerOf2 unit test

[NFC] ProfileSummary: const a bunch of members and fields.

It helps readability and maintainability (don't need to chase down
writes to a field I see is const, for example)

[mlir][NFC] Provide accessor for TableGen record for constraints

Besides accessing the record, there is currently no way to access all possible
constraint informations, such as the base constraint of a variadic constraint
for example.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D111719

[mlir] Add AnyAttrOf tablegen attribute constraint

AnyAttrOf, similar to AnyTypeOf, expects the attribute to be one of the
given attributes.
For instance, `AnyAttrOf<[I32Attr, StrAttr]>` expects either a `I32Attr`,
or a `StrAttr`.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D111739

[clang] Disable -clear-ast-before-backend with -print-stats

We still need access to various things in the ASTContext when printing stats.

Differential Revision: https://reviews.llvm.org/D111973

[LLD][TEST] Add testing for negative addends for R_X86_64_32 and R_X86_64_PC32 relocations

This change is derived from a test case we have locally but I could not
see an equivalent in LLD's testing.

Differential Revision: https://reviews.llvm.org/D111803

[mlir] Fix tsan failure in PassCrashRecovery

Don't set printOpOnDiagnostic, as this is not safe to call from a threaded context.

Differential Revision: https://reviews.llvm.org/D111752

[libc++][NFC] Fix typo in test

[mlir] Add support for specifying printing flags when adding an op to a Diagnostic

This removes edge cases where the default flags we want to use
during printing (e.g. local scope, eliding attributes, etc.)
get missed/dropped.

Differential Revision: https://reviews.llvm.org/D111761

[LV] Record memory widening decisions (NFCI)

Record widening decisions for memory operations within the planned recipes and
use the recorded decisions in code-gen rather than querying the cost model.

Differential Revision: https://reviews.llvm.org/D110479

[libomptarget] Pass OMP_TARGET_OFFLOAD env variable through to tests

Useful for OMP_TARGET_OFFLOAD=MANDATORY when testing

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D111995

Allow only valid vector.shape_cast transitive folding

When folding A->B->C => A->C only accept A->C that is valid shape cast

Reviewed By: ThomasRaoux, nicolasvasilache

Differential Revision: https://reviews.llvm.org/D111473

Revert D105169 due to the two-stage failure in ASAN

This reverts the following commits:
37ca7a795b277c20c02a218bf44052278c03344b
9aa6c72b92b6c89cc6d23b693257df9af7de2d15
705387c5074bcca36d626882462ebbc2bcc3bed4
8ca4b3ef19fe82d7ad6a6e1515317dcc01b41515
80dba72a669b5416e97a42fd2c2a7bc5a6d3f44a

[libc++] Add the std::views::reverse range adaptor

Differential Revision: https://reviews.llvm.org/D110426

[Mips] Add glue between CopyFromReg, CopyToReg and RDHWR nodes for TLS

The MIPS ABI requires the thread pointer be accessed via rdhwr $3, $r29.
This is currently represented by (CopyToReg $3, (RDHWR $29)) followed by
a (CopyFromReg $3). However, there is no glue between these, meaning
scheduling can break those apart. In particular, PR51691 is a report
where PseudoSELECT_I was moved to between the CopyToReg and CopyFromReg,
and since its expansion uses branches, it split the def and use of the
physical register between two basic blocks, resulting in the def being
eliminated and the use having no def. It also seems possible that a
similar situation could arise splitting up the CopyToReg from the RDHWR,
causing the RDHWR to use a destination register other than $3, violating
the ABI requirement.

Thus, add glue between all three nodes to ensure they aren't split up
during instruction selection. No regression test is added since any test
would be implictly relying on specific scheduling behaviour, so whilst
it might be testing that glue is preventing reordering today, changes to
scheduling behaviour could result in the test no longer being able to
catch a regression here, as the reordering might no longer happen for
other unrelated reasons.

Fixes PR51691.

Reviewed By: atanasyan, dim

Differential Revision: https://reviews.llvm.org/D111967

[ADT] Add some basic APInt::isPowerOf2() unit test coverage

[mlir][python] Add 'loc' property to ops

Add a read-only `loc` property to Operation and OpView

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D111972

[lldb] Fix missing dependency on libc++ from LLDB test suite on non-Darwin platforms

Right now we only set a dependency on libc++ when the host is Darwin, which
means that libc++ in the build directory is in some undefined state when running
the test suite (it can be fully built, out-of-date or missing). Depending on
whether we have a system libc++ (which LLDB also supports running the libc++
tests against), the outcome is that we sometimes skip the libc++ tests or we run
the tests against a mix of ToT-libc++/system-libc++ (e.g., we compile against
the ToT-libc++ headers and link against the system libc++ library).

This can be demonstrated via `export LIT_FILTER=TestDataFormatterLibcxxSet ninja
check-lldb-api` (or any other libc++ test) and then gradually building parts of
libc++ in the same build (which will slowly change the test behaviour from
`UNSUPPORTED` to various failures to passing depending on how much of libcxx is
built at test time).

Note that this effectively re-enables the (unintentionally) disabled libc++
formatter tests on Linux. Don't revert this if it breaks a libc++ LLDB test,
instead please @skipIf decorate the failing test (as it was probably already
failing before this commit).

Reviewed By: labath

Differential Revision: https://reviews.llvm.org/D111981

[InstCombine][DebugInfo] Remove superflous assertion, add test [2/2]

Accidentally committed a prior version of this patch. This is the
correct version.

[SVE][CodeGen] Fix predicate for add/sub + element count patterns

The patterns added in D111441 should use the HasSVEorStreamingSVE
predicate. This changes one incorrect use of HasSVE with the new
patterns.

[AArch64] Improve shuffle vector by using wider types

Try to widen element type to get a new mask value for a better permutation
sequence, so that we can use NEON shuffle instructions, such as zip1/2,
UZP1/2, TRN1/2, REV, INS, etc.
For example:
  shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 6, i32 7, i32 2, i32 3>
is equivalent to:
  shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 3, i32 1>
Finally, we can get:
  mov     v0.d[0], v1.d[1]

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D111619

[lldb] Delete TestStandardUnwind

It's been broken (not failing, but not testing anything either) for
quite some time now, and nobody noticed. It also (by design) tests
stepping through libc code, which makes it completely non-hermetic.

It's not worth reviving such a test.

[Analysis] add utility function for unary shuffle mask creation

This is NFC-intended for the callers. Posting in case there are
other potential users that I missed.
I would also use this from VectorCombine in a patch for:
https://llvm.org/PR52178 ( D111901 )

Differential Revision: https://reviews.llvm.org/D111891

[analyzer] Allow matching non-CallExprs using CallDescriptions

Fallback to stringification and string comparison if we cannot compare
the `IdentifierInfo`s, which is the case for C++ overloaded operators,
constructors, destructors, etc.

Examples:
  { "std", "basic_string", "basic_string", 2} // match the 2 param std::string constructor
  { "std", "basic_string", "~basic_string" }  // match the std::string destructor
  { "aaa", "bbb", "operator int" } // matches the struct bbb conversion operator to int

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D111535

[analyzer][NFC] Refactor CallEvent::isCalled()

Refactor the code to make it more readable.

It will set up further changes, and improvements to this code in
subsequent patches.
This is a non-functional change.

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D111534

[analyzer][NFC] Add unittests for CallDescription and split the old ones

This NFC change accomplishes three things:
1) Splits up the single unittest into reasonable segments.
2) Extends the test infra using a template to select the AST-node
   from which it is supposed to construct a `CallEvent`.
3) Adds a *lot* of different tests, documenting the current
   capabilities of the `CallDescription`. The corresponding tests are
   marked with `FIXME`s, where the current behavior should be different.

Both `CXXMemberCallExpr` and `CXXOperatorCallExpr` are derived from
`CallExpr`, so they are matched by using the default template parameter.
On the other hand, `CXXConstructExpr` is not derived from `CallExpr`.
In case we want to match for them, we need to pass the type explicitly
to the `CallDescriptionAction`.

About destructors:
They have no AST-node, but they are generated in the CFG machinery in
the analyzer. Thus, to be able to match against them, we would need to
construct a CFG and walk on that instead of simply walking the AST.

I'm also relaxing the `EXPECT`ation in the
`CallDescriptionConsumer::performTest()`, to check the `LookupResult`
only if we matched for the `CallDescription`.
This is necessary to allow tests in which we expect *no* matches at all.

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D111794

Remove also Block-ABI-Apple.txt from the Makefile

[X86][Costmodel] Add SSE2 sub-128bit vXi32/f32 stride 2 interleaved store costs

Differential Revision: https://reviews.llvm.org/D111941

[X86][Costmodel] Add SSE2 sub-128bit vXi8/16 stride 2 interleaved store costs

Differential Revision: https://reviews.llvm.org/D111941

[lldb] Fix PDB/compilands.test for a3939e1

[OpenMP][OMPT] thread_num determination for programs with explicit tasks

__ompt_get_task_info_internal is now able to determine the right value of the
“thread_num” argument during the execution of an explicit task.

During the execution of a while loop that iterates over the ancestor tasks
hierarchy, the “prev_team” variable was always set to “team” variable at the
beginning of each loop iteration.

Assume that the program contains a parallel region which encloses an explicit
task executed by the worker thread of the region. Also assume that the tool
inquires the “thread_num” of a worker thread for the implicit task that
corresponds to the region (task at “ancestor_level == 1”) and expects to
receive the value of “thread_num > 0”.
After the loop finishes, both “team” and “prev_team” variables are equal and
point to the team information of the parallel region.
The “thread_num” is set to “prev_team->t.t_master_tid”, that is equal to
“team->t.t_master_tid”. In this case, “team->t.t_master_tid” is 0, since
the master thread of the region is the initial master thread of the program.
This leads to a contradiction.

To prevent this, “prev_team” variable is set to “team” variable only at the
time when the loop that has already encountered the implicit task (“taskdata”
variable contains the information about an implicit task) continues iterating
over the implicit task’s ancestors, if any.

After the mentioned loop finishes, the “prev_team” variable might be equal to
NULL. This means that the task at requested “ancestor_level” belongs to the
innermost parallel region, so the “thread_num” will be determined by calling
the “__kmp_get_tid”.

To prove that this patch works, the test case “explicit_task_thread_num.c” is
provided.
It contains the example of the program explained earlier in the summary.

Differential Revision: https://reviews.llvm.org/D110473

[OpenMP][Tests][NFC] Work around ICC bug
Older intel compilers miss the privatization of nested loop variables for
doacross loops. Declaring the variable in the loop makes the test more
robust.

[OpenMP][Tests][NFC] Flagging OMPT tests as XFAIL for Intel compilers

With Intel 19 compiler the teams tests fail to link while trying to link
liboffload.

[Sema] haveSameParameterTypes - replace repeated isNull() test with assertions

As reported on https://pvs-studio.com/en/blog/posts/cpp/0771/ (Snippet 2) - (and mentioned on rGdc4259d5a38409) we are repeating the T1.isNull() check instead of checking T2.isNull() as well, and at this point neither should be null - so we're better off with an assertion.

Differential Revision: https://reviews.llvm.org/D107347

[DebugInfo] Correctly handle arrays with 0-width elements in GEP salvaging

Fixes an issue where GEP salvaging did not properly account for GEP
instructions which stepped over array elements of width 0 (effectively a
no-op). This unnecessarily produced long expressions by appending
`... + (x * 0)` and potentially extended the number of SSA values used
in the dbg.value. This also erroneously triggered an assert in the
salvage function that the element width would be strictly positive.
These issues are resolved by simply ignoring these useless operands.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D111809

[AArch64][SVE][CodeGen] Add tests for RSHRN{T,B} instructions

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D111735

[InstCombine][DebugInfo] Remove superflous assertion, add test

When this code was added, an unnecessary assertion slipped in which we
now hit in real code.

Add a test to defend against it firing again.

[AMDGPU] Remove unused VirtRegMap analysis. NFC.

[DebugInfo][InstrRef] Avoid a crash during DBG_PHI maintenence

With D110105, the isDebug flag for register uses is now a proxy for whether
the instruction is a debug instruction; that causes DBG_PHIs to have their
operands updated by calls to updateDbgUsersToReg, which is the correct
behaviour. However: that function only expects to receive DBG_VALUE
instructions and asserts such.

This patch splits the updating-action into a lambda, and applies it to the
appropriate operands for each kind of debug instruction. Tested with an
ARM test that stimulates this function: I've added some DBG_PHI
instructions that should be updated in the same way as DBG_VALUEs.

Differential Revision: https://reviews.llvm.org/D108641

[lldb] [lldb-server] Refactor ConnectToRemote()

Refactor ConnectToRemote() to improve readability and make future
changes easier:

1. Replace static buffers with std::string.
2. When handling errors, prefer reporting the actual error over dumb
   'connection status is not success'.
3. Move host/port parsing directly into reverse_connection condition
   that is its only user, and simplify it to make its purpose (verifying
   that a valid port is provided) clear.
4. Use llvm::errs() and llvm::outs() instead of fprintf().

Differential Revision: https://reviews.llvm.org/D11196

Revert "[NFC] [LoopPeel] Change the way DT is updated for loop exits"

This reverts commit fa16329ae0721023376f24c7577b9020d438df1a.

See comments in discussion. Merged by mistake, not entirely getting what
the problem was.

[NFC] Remove Block-ABI-Apple.txt

This file was rewritten in rst format in clang/docs/Block-ABI-Apple.rst

[lldb][NFC] clang format change

clang format on some demangling files

Reviewed By: teemperor

Differential Revision: https://reviews.llvm.org/D111934

[lldb] Fix SymbolFilePDBTests for a3939e1

[clang][modules] Delay creating `IdentifierInfo` for names of explicit modules

When using explicit Clang modules, some declarations might unexpectedly become invisible.

This is caused by the mechanism that loads PCM files passed via `-fmodule-file=<path>` and creates an `IdentifierInfo` for the module name. The `IdentifierInfo` creation takes place when the `ASTReader` is in a weird state, with modules that are loaded but not yet set up properly. This patch delays the creation of `IdentifierInfo` until the `ASTReader` is done with reading the PCM.

Note that the `-fmodule-file=<name>=<path>` form of the argument doesn't suffer from this issue, since it doesn't create `IdentifierInfo` for the module name.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D111543

[AMDGPU] Add link to bug

Fix signed/unsigned comparison after b5426ced71280

gcc11 warns that this counter causes a signed/unsigned comaprison when it's
later compared with a SmallVector::difference_type. gcc appears to be
correct, clang does not warn one way or the other.

Remove the verifyAfter mechanism that was replaced by D111397

Differential Revision: https://reviews.llvm.org/D111872

Add new MachineFunction property FailsVerification

TargetPassConfig::addPass takes a "bool verifyAfter" argument which lets
you skip machine verification after a particular pass. Unfortunately
this is used in generic code in TargetPassConfig itself to skip
verification after a generic pass, only because some previous target-
specific pass damaged the MIR on that specific target. This is bad
because problems in one target cause lack of verification for all
targets.

This patch replaces that mechanism with a new MachineFunction property
called "FailsVerification" which can be set by (usually target-specific)
passes that are known to introduce problems. Later passes can reset it
again if they are known to clean up the previous problems.

Differential Revision: https://reviews.llvm.org/D111397

[AMDGPU] Add patterns for i8/i16 local atomic load/store

Add patterns for i8/i16 local atomic load/store.

Added tests for new patterns.

Copied atomic_[store/load]_local.ll to GlobalISel directory.

Differential Revision: https://reviews.llvm.org/D111869

[AIX][cmake] Set atomics related macros when build with xlclang

Set `HAVE_CXX_ATOMICS_WITHOUT_LIB` or `HAVE_LIBATOMIC` when build LLVM with xlclang. With these macros set, libraries like libLLVMSupport are able to know whether it's necessary to add `-latomic` to dependent system libs. If `HAVE_LIBATOMIC` is set, `llvm-config --system-libs` appends `-latomic` to its output.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D111782

[SelectionDAG] Fix illegal widening of scalable-vector loads

The process of widening simple vector loads attempts to use a load of a
wider vector type if the original load is sufficiently aligned to avoid
memory faults.

However this optimization is only legal when performed on fixed-length
vector types. For scalable vector types this is invalid (unless vscale
happens to be 1).

This patch does increase the likelihood of compiler crashes (from
`FindMemType` failing to find a suitable type) but this now better
matches how widening non-simple loads, insufficiently-aligned loads, and
scalable-vector stores are handled.

Patches will be introduced later by which loads and stores can be
widened on targets with support for masked or predicated operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111885

[X86] Prefer VEX encoding in X86 assembler.

This patch is to order the AVX instructions ahead of AVX512 instructions
in the matching table so that the AVX instructions can be matched first.
Thanks Craig and Shengchen for the idea.

Differential Revision: https://reviews.llvm.org/D111538

[lldb] [Utility] Remove Status::WasInterrupted() along with its only use

Remove Status::WasInterrupted() that checks whether the underlying error
code matches EINTR.  ProcessGDBRemote::ConnectToDebugserver() is its
only call site, and it does not seem correct there.  After all, EINTR
is precisely when we want to retry, not stop retrying.  Furthermore,
it should not really matter since we should be catching EINTR
immediately via llvm::sys::RetryAfterSignal() but that's another story.

Differential Revision: https://reviews.llvm.org/D111908

[AArch64][GISel] Add 8/16 bit uaddo lowering tests.

Precommit tests for D111888.

[AMDGPU] Divergence driven selection for fused bitlogic

The change adds divergence predicates for fused logical operations.
The problem with selecting a scalar fused op such as S_NOR_B32 is
that it does not have a VALU counterpart and will be split in
moveToVALU. At the same time it prevents selection of a better
opcode on the VALU side (such as V_OR3_B32) which does not have a
counterpart on SALU side.

XNOR opcodes are left as is and selected as scalar to get advantage
of the SIInstrInfo::lowerScalarXnor() code which can commute
operations to keep one of two opcodes on SALU if possible. See
xnor.ll test for this.

Differential Revision: https://reviews.llvm.org/D111907

Fix bazel build.

This is a temporary fix, better would be to avoid including
llvm/Option/ArgList.h from a Support source file.

Differential Revision: https://reviews.llvm.org/D111974

[lldb] Return StringRef from PluginInterface::GetPluginName

There is no reason why this function should be returning a ConstString.

While modifying these files, I also fixed several instances where
GetPluginName and GetPluginNameStatic were returning different strings.

I am not changing the return type of GetPluginNameStatic in this patch, as that
would necessitate additional changes, and this patch is big enough as it is.

Differential Revision: https://reviews.llvm.org/D111877

Fix cyclic header dependency between Support<->Option due to RISCVISAInfo

This was introduced in D105168 which added RISCVISAInfo.h.

[Parse] Improve diagnostic and recovery when there is an extra override in the outline method definition.

The clang behavior was poor before this patch:

```
void B::foo() override {}
// Before: clang emited "expcted function body after function
// declarator", and skiped all contents until it hits a ";", the
// following function f() is discarded.

// VS

// Now "override is not allowed" with a remove fixit, and following f()
// is retained.
void f();
```

Differential Revision: https://reviews.llvm.org/D111883

[AArch64] Fixed a bug on AArch64MIPeepholeOpt

Create new virtual register for the definition of new AND instruction and
replace old register by the new one to keep SSA form.

Differential Revision: https://reviews.llvm.org/D109963

[MachineSink] Compile time improvement for large testcases which has many kill flags

We did a experiment and observed dramatic decrease on compilation time which spent on clearing kill flags.
Before:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags(ms):1.607509e+05

After:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags:3.987371e+03

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D111688

[PowerPC] Implement scheduling model for Power10

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D110855

[JITLink] Add comments, rename types for visitExistingEdges utility.

The "Fixers" name was a hangover from an earlier draft of the patch. "Visitors"
fits the function name(s).

[NFC] [LoopPeel] Change the way DT is updated for loop exits

When peeling a loop, we assume that the latch has a `br` terminator and
that all loop exits are either terminated with an `unreachable` or have
a terminating deoptimize call. So when we peel off the 1st iteration, we
change the IDom of all loop exits to the peeled copy of
`NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support
loops with exits that are followed by a block with an `unreachable` or a
terminating deoptimize call, changing the exit's idom wouldn't be enough
and DT would be broken.

For example, let `Exit1` and `Exit2` are loop exits, and each of them
unconditionally branches to the same `unreachable` terminated block. So
neither of the exits dominates this unreachable block. If we change the
IDoms of the exits to some peeled loop block, we don't update the
dominators of the unreachable block. Currently we just don't get to the
peeling logic, saying that we can't peel such loops.

With this NFC we just insert edges from cloned exiting blocks to their
exits after peeling each iteration (we accumulate the insertion updates
and then after peeling apply the updates to DT).

This patch was a part of D110922.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D111611
Reviewed By: mkazantsev

[lldb] Skip target variable test on AS

[clang] Use llvm::erase_if (NFC)

[CostModel][X86] Add mul by positive/negative power-of-2 constants tests

We have backend optimizations for these, but currently the costmodel doesn't match them

[fir] Add IfBuilder and utility functions

In order to reduct the size of D111337. The IfBuilder and the two
utility functions genIsNotNull and genIsNull have been extracted in
a separate patch with dedicated unittests.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: Leporacanthicus

Differential Revision: https://reviews.llvm.org/D111796

Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>

[CostModel][X86] Add div/rem by negative power-of-2 constants

We have backend optimizations for these (like we do for power-of-2 divisions), but currently the costmodel doesn't match them

[X86][SLM] Fix BitTest+Set uops + port usage

Both ports are required for BitTest ops. Update the uops counts + port usage based off the most recent llvm-exegesis captures and what Intel AoM / Agner reports as well.

[X86][SLM] Fix uops for PCMPISTR/PCMPISTR instructions

Based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.

[X86][SLM] Fix uops for PCLMULQDQ

Based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.

[X86][SLM] +1uop for PSHUFBrm xmm

Extra 1uop for folded pshufb ops, based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.

[ConstantRange] Add fast signed multiply

The multiply() implementation is very slow -- it performs six
multiplications in double the bitwidth, which means that it will
typically work on allocated APInts and bypass fast-path
implementations. Add an additional implementation that doesn't
try to produce anything better than a full range if overflow is
possible. At least for the BasicAA use-case, we really don't care
about more precise modeling of overflow behavior. The current
use of multiply() is fine while the implementation is limited to
a single index, but extending it to the multiple-index case makes
the compile-time impact untenable.

[X86][Costmodel] Load/store i64 Stride=4 VF=16 interleaving costs

A few more tuples are being queried after D111546. Might be good to model them,
They all require a lot of manual assembly surgery.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/9bnKrefcG - for intels `Block RThroughput: =40.0`; for ryzens, `Block RThroughput: =16.0`
So could pick cost of `40`

For store we have:
https://godbolt.org/z/5s3s14dEY - for intels `Block RThroughput: =40.0`; for ryzens, `Block RThroughput: =16.0`
So we could pick cost of `40`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111945

[X86][Costmodel] Load/store i64 Stride=2 VF=32 interleaving costs

A few more tuples are being queried after D111546. Might be good to model them,
They all require a lot of manual assembly surgery.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/MTaKboejM - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=16.0`
So could pick cost of `32`

For store we have:
https://godbolt.org/z/v7xPj3Wd4 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=32.0`
So we could pick cost of `32`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111944

[X86][Costmodel] Load/store i32 Stride=4 VF=32 interleaving costs

A few more tuples are being queried after D111546. Might be good to model them,
They all require a lot of manual assembly surgery.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/11rcvdreP - for intels `Block RThroughput: <=68.0`; for ryzens, `Block RThroughput: <=48.0`
So could pick cost of `68`

For store we have:
https://godbolt.org/z/6aM11fWcP - for intels `Block RThroughput: <=64.0`; for ryzens, `Block RThroughput: <=32.0`
So we could pick cost of `64`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111943

[X86][Costmodel] Load/store i32 Stride=3 VF=32 interleaving costs

A few more tuples are being queried after D111546. Might be good to model them,
They all require a lot of manual assembly surgery.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/s5b6E6jsP - for intels `Block RThroughput: <=32.0`; for ryzens, `Block RThroughput: <=24.0`
So could pick cost of `32`

For store we have:
https://godbolt.org/z/efh99d93b - for intels `Block RThroughput: <=48.0`; for ryzens, `Block RThroughput: <=32.0`
So we could pick cost of `48`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111942

[X86][Costmodel] Load/store i16 Stride=6 VF=32 interleaving costs

A few more tuples are being queried after D111546. Might be good to model them,
They all require a lot of manual assembly surgery.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/YTeT9M7fW - for intels `Block RThroughput: <=212.0`; for ryzens, `Block RThroughput: <=64.0`
So could pick cost of `212`

For store we have:
https://godbolt.org/z/vc954KEGP - for intels `Block RThroughput: <=90.0`; for ryzens, `Block RThroughput: <=24.0`
So we could pick cost of `90`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111940

This patch supports the following checks for THREADPRIVATE Directive:
```
[5.1] 2.21.2 THREADPRIVATE Directive
A variable that appears in a threadprivate directive must be declared in
the scope of a module or have the SAVE attribute, either explicitly or
implicitly.
A variable that appears in a threadprivate directive must not be an
element of a common block or appear in an EQUIVALENCE statement.
```

This patch supports the following checks for DECLARE TARGET Directive:
```
[5.1] 2.14.7 Declare Target Directive
A variable that is part of another variable (as an array, structure
element or type parameter inquiry) cannot appear in a declare
target directive.
A variable that appears in a declare target directive must be declared
in the scope of a module or have the SAVE attribute, either explicitly
or implicitly.
A variable that appears in a declare target directive must not be an
element of a common block or appear in an EQUIVALENCE statement.
```

As Fortran 2018 standard [8.5.16] states, a variable, common block, or
procedure pointer declared in the scoping unit of a main program,
module, or submodule implicitly has the SAVE attribute, which may be
confirmed by explicit specification.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D109864

Bump the value of __STDC_VERSION__ in -std=c2x mode

Previously, we reported the same value as for C17, now we report 202000L, which
is the same value currently used by GCC.

Once C23 ships, this value will be bumped to the correct date.

[InstCombine] Add some extra tests for truncated saturates. NFC

Lex arguments for __has_cpp_attribute and friends as expanded tokens

The C and C++ standards require the argument to __has_cpp_attribute and
__has_c_attribute to be expanded ([cpp.cond]p5). It would make little sense
to expand the argument to those operators but not expand the argument to
__has_attribute and __has_declspec, so those were both also changed in this
patch.

Note that it might make sense for the other builtins to also expand their
argument, but it wasn't as clear to me whether the behavior would be correct
there, and so they were left for a future revision.