review.tizen.org Git - platform/upstream/llvm.git/log

[OpenMP][Tests][NFC] Work around ICC bug
Older Intel compilers miss the privatization of nested loop variables for
doacross loops. Declaring the variable in the loop makes the test more
robust.

[OpenMP][Tests][NFC] Flagging OMPT tests as XFAIL for Intel compilers

With Intel 19 compiler the teams tests fail to link while trying to link
liboffload.

[Sema] haveSameParameterTypes - replace repeated isNull() test with assertions

As reported on https://pvs-studio.com/en/blog/posts/cpp/0771/ (Snippet 2) - (and mentioned on rGdc4259d5a38409) we are repeating the T1.isNull() check instead of checking T2.isNull() as well, and at this point neither should be null - so we're better off with an assertion.
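
The copy-paste pattern described above can be illustrated with a small, self-contained sketch. `TypeHandle`, `bothValidBuggy`, `bothValid` and `sameType` are hypothetical names invented for this illustration; they are not clang's `QualType` or the actual `haveSameParameterTypes` code.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a nullable type handle (not clang's QualType).
struct TypeHandle {
  std::optional<int> id;
  bool isNull() const { return !id.has_value(); }
};

// Buggy shape: the second operand re-tests T1, so a null T2 slips through.
bool bothValidBuggy(const TypeHandle &T1, const TypeHandle &T2) {
  (void)T2; // never inspected, which is exactly the bug
  return !(T1.isNull() || T1.isNull());
}

// What the duplicated check was presumably meant to be.
bool bothValid(const TypeHandle &T1, const TypeHandle &T2) {
  return !(T1.isNull() || T2.isNull());
}

// Assertion style: when neither operand can be null at this point,
// state the invariant once instead of branching on it.
bool sameType(const TypeHandle &T1, const TypeHandle &T2) {
  assert(!T1.isNull() && !T2.isNull() && "unexpected null type");
  return *T1.id == *T2.id;
}
```

With a valid `T1` and a null `T2`, the buggy helper reports both operands as valid, while the intended check rejects the pair.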

Differential Revision: https://reviews.llvm.org/D107347

[DebugInfo] Correctly handle arrays with 0-width elements in GEP salvaging

Fixes an issue where GEP salvaging did not properly account for GEP
instructions which stepped over array elements of width 0 (effectively a
no-op). This unnecessarily produced long expressions by appending
`... + (x * 0)` and potentially extended the number of SSA values used
in the dbg.value. This also erroneously triggered an assert in the
salvage function that the element width would be strictly positive.
These issues are resolved by simply ignoring these useless operands.
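
As a rough model of the idea (not the actual salvaging code), the sketch below builds an offset expression from variable indices and skips any index whose element width is 0. `VariableOffset` and `buildOffsetExpr` are hypothetical names, and the string form only stands in for a real debug expression.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model: each variable index contributes value * elementSize bytes.
struct VariableOffset {
  std::string value;    // name of the SSA value used as the index
  uint64_t elementSize; // width in bytes of the element being stepped over
};

// Builds "base + (v * size) ..." and records which SSA values are referenced.
std::string buildOffsetExpr(const std::vector<VariableOffset> &offsets,
                            std::vector<std::string> &usedValues) {
  std::string expr = "base";
  for (const VariableOffset &off : offsets) {
    if (off.elementSize == 0)
      continue; // stepping over 0-width elements is a no-op: emit nothing
    usedValues.push_back(off.value);
    expr += " + (" + off.value + " * " + std::to_string(off.elementSize) + ")";
  }
  return expr;
}
```

Skipping the operand keeps the expression short and avoids adding an SSA value that contributes nothing to the address.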

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D111809

[AArch64][SVE][CodeGen] Add tests for RSHRN{T,B} instructions

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D111735

[InstCombine][DebugInfo] Remove superfluous assertion, add test

When this code was added, an unnecessary assertion slipped in which we
now hit in real code.

Add a test to defend against it firing again.

[AMDGPU] Remove unused VirtRegMap analysis. NFC.

[DebugInfo][InstrRef] Avoid a crash during DBG_PHI maintenance

With D110105, the isDebug flag for register uses is now a proxy for whether
the instruction is a debug instruction; that causes DBG_PHIs to have their
operands updated by calls to updateDbgUsersToReg, which is the correct
behaviour. However: that function only expects to receive DBG_VALUE
instructions and asserts such.

This patch splits the updating-action into a lambda, and applies it to the
appropriate operands for each kind of debug instruction. Tested with an
ARM test that stimulates this function: I've added some DBG_PHI
instructions that should be updated in the same way as DBG_VALUEs.

Differential Revision: https://reviews.llvm.org/D108641

[lldb] [lldb-server] Refactor ConnectToRemote()

Refactor ConnectToRemote() to improve readability and make future
changes easier:

1. Replace static buffers with std::string.
2. When handling errors, prefer reporting the actual error over dumb
   'connection status is not success'.
3. Move host/port parsing directly into reverse_connection condition
   that is its only user, and simplify it to make its purpose (verifying
   that a valid port is provided) clear.
4. Use llvm::errs() and llvm::outs() instead of fprintf().

Differential Revision: https://reviews.llvm.org/D11196

Revert "[NFC] [LoopPeel] Change the way DT is updated for loop exits"

This reverts commit fa16329ae0721023376f24c7577b9020d438df1a.

See comments in discussion. Merged by mistake, not entirely getting what
the problem was.

[NFC] Remove Block-ABI-Apple.txt

This file was rewritten in rst format in clang/docs/Block-ABI-Apple.rst

[lldb][NFC] clang format change

clang format on some demangling files

Reviewed By: teemperor

Differential Revision: https://reviews.llvm.org/D111934

[lldb] Fix SymbolFilePDBTests for a3939e1

[clang][modules] Delay creating `IdentifierInfo` for names of explicit modules

When using explicit Clang modules, some declarations might unexpectedly become invisible.

This is caused by the mechanism that loads PCM files passed via `-fmodule-file=<path>` and creates an `IdentifierInfo` for the module name. The `IdentifierInfo` creation takes place when the `ASTReader` is in a weird state, with modules that are loaded but not yet set up properly. This patch delays the creation of `IdentifierInfo` until the `ASTReader` is done with reading the PCM.

Note that the `-fmodule-file=<name>=<path>` form of the argument doesn't suffer from this issue, since it doesn't create `IdentifierInfo` for the module name.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D111543

[AMDGPU] Add link to bug

Fix signed/unsigned comparison after b5426ced71280

gcc11 warns that this counter causes a signed/unsigned comparison when it's
later compared with a SmallVector::difference_type. gcc appears to be
correct, clang does not warn one way or the other.

Remove the verifyAfter mechanism that was replaced by D111397

Differential Revision: https://reviews.llvm.org/D111872

Add new MachineFunction property FailsVerification

TargetPassConfig::addPass takes a "bool verifyAfter" argument which lets
you skip machine verification after a particular pass. Unfortunately
this is used in generic code in TargetPassConfig itself to skip
verification after a generic pass, only because some previous target-
specific pass damaged the MIR on that specific target. This is bad
because problems in one target cause lack of verification for all
targets.

This patch replaces that mechanism with a new MachineFunction property
called "FailsVerification" which can be set by (usually target-specific)
passes that are known to introduce problems. Later passes can reset it
again if they are known to clean up the previous problems.
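
The flow can be pictured with a toy model. `FunctionModel` and the three functions below are hypothetical and only mimic the idea of a per-function flag that the verifier consults; they are not the real `MachineFunctionProperties` interface.

```cpp
#include <cassert>

// Hypothetical model of a per-function property consulted by the verifier.
struct FunctionModel {
  bool failsVerification = false;
  int verifierRuns = 0;
};

// Verification is skipped only while the function is flagged as broken.
void verifyAfterPass(FunctionModel &F) {
  if (F.failsVerification)
    return;
  ++F.verifierRuns;
}

// A (usually target-specific) pass known to leave the function in a bad state.
void targetPassThatBreaksInvariants(FunctionModel &F) {
  F.failsVerification = true;
}

// A later pass known to repair the damage resets the property.
void cleanupPass(FunctionModel &F) { F.failsVerification = false; }
```

Because the flag lives on the function rather than in the generic pass pipeline, targets that never set it keep full verification after every pass.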

Differential Revision: https://reviews.llvm.org/D111397

[AMDGPU] Add patterns for i8/i16 local atomic load/store

Add patterns for i8/i16 local atomic load/store.

Added tests for new patterns.

Copied atomic_[store/load]_local.ll to GlobalISel directory.

Differential Revision: https://reviews.llvm.org/D111869

[AIX][cmake] Set atomics related macros when building with xlclang

Set `HAVE_CXX_ATOMICS_WITHOUT_LIB` or `HAVE_LIBATOMIC` when building LLVM with xlclang. With these macros set, libraries like libLLVMSupport are able to know whether it's necessary to add `-latomic` to dependent system libs. If `HAVE_LIBATOMIC` is set, `llvm-config --system-libs` appends `-latomic` to its output.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D111782

[SelectionDAG] Fix illegal widening of scalable-vector loads

The process of widening simple vector loads attempts to use a load of a
wider vector type if the original load is sufficiently aligned to avoid
memory faults.

However this optimization is only legal when performed on fixed-length
vector types. For scalable vector types this is invalid (unless vscale
happens to be 1).

This patch does increase the likelihood of compiler crashes (from
`FindMemType` failing to find a suitable type) but this now better
matches how widening non-simple loads, insufficiently-aligned loads, and
scalable-vector stores are handled.

Patches will be introduced later by which loads and stores can be
widened on targets with support for masked or predicated operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111885

[X86] Prefer VEX encoding in X86 assembler.

This patch is to order the AVX instructions ahead of AVX512 instructions
in the matching table so that the AVX instructions can be matched first.
Thanks Craig and Shengchen for the idea.

Differential Revision: https://reviews.llvm.org/D111538

[lldb] [Utility] Remove Status::WasInterrupted() along with its only use

Remove Status::WasInterrupted() that checks whether the underlying error
code matches EINTR.  ProcessGDBRemote::ConnectToDebugserver() is its
only call site, and it does not seem correct there.  After all, EINTR
is precisely when we want to retry, not stop retrying.  Furthermore,
it should not really matter since we should be catching EINTR
immediately via llvm::sys::RetryAfterSignal() but that's another story.
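
For readers unfamiliar with the convention, here is a minimal sketch of "retry on EINTR". `retryOnEintr` is a hypothetical helper written for this note; it is not `llvm::sys::RetryAfterSignal` and makes no claim about how that function is implemented.

```cpp
#include <cassert>
#include <cerrno>

// Calls fn() until it succeeds, fails with something other than EINTR,
// or maxAttempts is exhausted. fn() is assumed to follow the POSIX
// convention of returning -1 and setting errno on failure.
template <typename Fn>
int retryOnEintr(Fn &&fn, int maxAttempts) {
  int result = -1;
  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    errno = 0;
    result = fn();
    if (result != -1 || errno != EINTR)
      break; // success, or a real error: stop retrying
    // EINTR: the call was interrupted by a signal, so try again.
  }
  return result;
}
```

The point matching the commit message is the direction of the test: an EINTR result is the reason to loop again, not a reason to give up.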

Differential Revision: https://reviews.llvm.org/D111908

[AArch64][GISel] Add 8/16 bit uaddo lowering tests.

Precommit tests for D111888.

[AMDGPU] Divergence driven selection for fused bitlogic

The change adds divergence predicates for fused logical operations.
The problem with selecting a scalar fused op such as S_NOR_B32 is
that it does not have a VALU counterpart and will be split in
moveToVALU. At the same time it prevents selection of a better
opcode on the VALU side (such as V_OR3_B32) which does not have a
counterpart on SALU side.

XNOR opcodes are left as is and selected as scalar to take advantage
of the SIInstrInfo::lowerScalarXnor() code which can commute
operations to keep one of two opcodes on SALU if possible. See
xnor.ll test for this.

Differential Revision: https://reviews.llvm.org/D111907

Fix bazel build.

This is a temporary fix, better would be to avoid including
llvm/Option/ArgList.h from a Support source file.

Differential Revision: https://reviews.llvm.org/D111974

[lldb] Return StringRef from PluginInterface::GetPluginName

There is no reason why this function should be returning a ConstString.

While modifying these files, I also fixed several instances where
GetPluginName and GetPluginNameStatic were returning different strings.

I am not changing the return type of GetPluginNameStatic in this patch, as that
would necessitate additional changes, and this patch is big enough as it is.

Differential Revision: https://reviews.llvm.org/D111877

Fix cyclic header dependency between Support<->Option due to RISCVISAInfo

This was introduced in D105168 which added RISCVISAInfo.h.

[Parse] Improve diagnostic and recovery when there is an extra override in the outline method definition.

The clang behavior was poor before this patch:

```
void B::foo() override {}
// Before: clang emitted "expected function body after function
// declarator", and skipped all contents until it hit a ";", so the
// following function f() was discarded.

// VS

// Now "override is not allowed" with a remove fixit, and following f()
// is retained.
void f();
```

Differential Revision: https://reviews.llvm.org/D111883

[AArch64] Fixed a bug on AArch64MIPeepholeOpt

Create a new virtual register for the definition of the new AND instruction
and replace the old register with the new one to preserve SSA form.

Differential Revision: https://reviews.llvm.org/D109963

[MachineSink] Compile time improvement for large testcases which have many kill flags

We ran an experiment and observed a dramatic decrease in the compilation time spent on clearing kill flags.
Before:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags(ms):1.607509e+05

After:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags(ms):3.987371e+03

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D111688

[PowerPC] Implement scheduling model for Power10

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D110855

[JITLink] Add comments, rename types for visitExistingEdges utility.

The "Fixers" name was a hangover from an earlier draft of the patch. "Visitors"
fits the function name(s).

[NFC] [LoopPeel] Change the way DT is updated for loop exits

When peeling a loop, we assume that the latch has a `br` terminator and
that all loop exits are either terminated with an `unreachable` or have
a terminating deoptimize call. So when we peel off the 1st iteration, we
change the IDom of all loop exits to the peeled copy of
`NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support
loops with exits that are followed by a block with an `unreachable` or a
terminating deoptimize call, changing the exit's idom wouldn't be enough
and DT would be broken.

For example, let `Exit1` and `Exit2` be loop exits, and each of them
unconditionally branches to the same `unreachable` terminated block. So
neither of the exits dominates this unreachable block. If we change the
IDoms of the exits to some peeled loop block, we don't update the
dominators of the unreachable block. Currently we just don't get to the
peeling logic, saying that we can't peel such loops.

With this NFC we just insert edges from cloned exiting blocks to their
exits after peeling each iteration (we accumulate the insertion updates
and then after peeling apply the updates to DT).

This patch was a part of D110922.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D111611
Reviewed By: mkazantsev

[lldb] Skip target variable test on AS

[clang] Use llvm::erase_if (NFC)

[CostModel][X86] Add mul by positive/negative power-of-2 constants tests

We have backend optimizations for these, but currently the costmodel doesn't match them

[fir] Add IfBuilder and utility functions

In order to reduce the size of D111337, the IfBuilder and the two
utility functions genIsNotNull and genIsNull have been extracted into
a separate patch with dedicated unit tests.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: Leporacanthicus

Differential Revision: https://reviews.llvm.org/D111796

Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>

[CostModel][X86] Add div/rem by negative power-of-2 constants

We have backend optimizations for these (like we do for power-of-2 divisions), but currently the costmodel doesn't match them

[X86][SLM] Fix BitTest+Set uops + port usage

Both ports are required for BitTest ops. Update the uops counts + port usage based off the most recent llvm-exegesis captures and what Intel AoM / Agner reports as well.

[X86][SLM] Fix uops for PCMPISTR/PCMPISTR instructions

Based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.

[X86][SLM] Fix uops for PCLMULQDQ

Based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.

[X86][SLM] +1uop for PSHUFBrm xmm

Extra 1uop for folded pshufb ops, based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.

[ConstantRange] Add fast signed multiply

The multiply() implementation is very slow -- it performs six
multiplications in double the bitwidth, which means that it will
typically work on allocated APInts and bypass fast-path
implementations. Add an additional implementation that doesn't
try to produce anything better than a full range if overflow is
possible. At least for the BasicAA use-case, we really don't care
about more precise modeling of overflow behavior. The current
use of multiply() is fine while the implementation is limited to
a single index, but extending it to the multiple-index case makes
the compile-time impact untenable.
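
The trade-off can be sketched on plain integers. This is an illustration under simplifying assumptions, not the `ConstantRange` implementation: ranges are inclusive, non-wrapping, and at most 32 bits wide so every product fits in `int64_t`. `SignedRange`, `fullRange` and `fastSignedMul` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Inclusive signed interval [lo, hi] of values of a given bit width.
struct SignedRange {
  int64_t lo;
  int64_t hi;
};

// All signed values representable in `bits` bits.
SignedRange fullRange(unsigned bits) {
  const int64_t max = (int64_t{1} << (bits - 1)) - 1;
  return {-max - 1, max};
}

// Multiplies two ranges. If any corner product overflows the bit width,
// give up and return the full range instead of modelling the wrap-around.
SignedRange fastSignedMul(SignedRange a, SignedRange b, unsigned bits) {
  assert(bits >= 1 && bits <= 32 && "products must fit in int64_t");
  const SignedRange full = fullRange(bits);
  const int64_t products[4] = {a.lo * b.lo, a.lo * b.hi,
                               a.hi * b.lo, a.hi * b.hi};
  for (int64_t p : products)
    if (p < full.lo || p > full.hi)
      return full; // overflow possible: no attempt at a tighter answer
  return {*std::min_element(products, products + 4),
          *std::max_element(products, products + 4)};
}
```

For example, `[-2, 3] * [4, 5]` in 8 bits yields `[-10, 15]`, while `[100, 100] * [2, 2]` overflows and falls back to the full 8-bit range.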

[X86][Costmodel] Load/store i64 Stride=4 VF=16 interleaving costs

A few more tuples are being queried after D111546. It might be good to model them.
They all require a lot of manual assembly surgery.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/9bnKrefcG - for intels `Block RThroughput: =40.0`; for ryzens, `Block RThroughput: =16.0`
So we could pick cost of `40`.

For store we have:
https://godbolt.org/z/5s3s14dEY - for intels `Block RThroughput: =40.0`; for ryzens, `Block RThroughput: =16.0`
So we could pick cost of `40`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111945

[X86][Costmodel] Load/store i64 Stride=2 VF=32 interleaving costs

A few more tuples are being queried after D111546. It might be good to model them.
They all require a lot of manual assembly surgery.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/MTaKboejM - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=16.0`
So we could pick cost of `32`.

For store we have:
https://godbolt.org/z/v7xPj3Wd4 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=32.0`
So we could pick cost of `32`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111944

[X86][Costmodel] Load/store i32 Stride=4 VF=32 interleaving costs

A few more tuples are being queried after D111546. It might be good to model them.
They all require a lot of manual assembly surgery.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/11rcvdreP - for intels `Block RThroughput: <=68.0`; for ryzens, `Block RThroughput: <=48.0`
So we could pick cost of `68`.

For store we have:
https://godbolt.org/z/6aM11fWcP - for intels `Block RThroughput: <=64.0`; for ryzens, `Block RThroughput: <=32.0`
So we could pick cost of `64`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111943

[X86][Costmodel] Load/store i32 Stride=3 VF=32 interleaving costs

A few more tuples are being queried after D111546. It might be good to model them.
They all require a lot of manual assembly surgery.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/s5b6E6jsP - for intels `Block RThroughput: <=32.0`; for ryzens, `Block RThroughput: <=24.0`
So we could pick cost of `32`.

For store we have:
https://godbolt.org/z/efh99d93b - for intels `Block RThroughput: <=48.0`; for ryzens, `Block RThroughput: <=32.0`
So we could pick cost of `48`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111942

[X86][Costmodel] Load/store i16 Stride=6 VF=32 interleaving costs

A few more tuples are being queried after D111546. It might be good to model them.
They all require a lot of manual assembly surgery.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/YTeT9M7fW - for intels `Block RThroughput: <=212.0`; for ryzens, `Block RThroughput: <=64.0`
So we could pick cost of `212`.

For store we have:
https://godbolt.org/z/vc954KEGP - for intels `Block RThroughput: <=90.0`; for ryzens, `Block RThroughput: <=24.0`
So we could pick cost of `90`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111940

This patch supports the following checks for THREADPRIVATE Directive:
```
[5.1] 2.21.2 THREADPRIVATE Directive
A variable that appears in a threadprivate directive must be declared in
the scope of a module or have the SAVE attribute, either explicitly or
implicitly.
A variable that appears in a threadprivate directive must not be an
element of a common block or appear in an EQUIVALENCE statement.
```

This patch supports the following checks for DECLARE TARGET Directive:
```
[5.1] 2.14.7 Declare Target Directive
A variable that is part of another variable (as an array, structure
element or type parameter inquiry) cannot appear in a declare
target directive.
A variable that appears in a declare target directive must be declared
in the scope of a module or have the SAVE attribute, either explicitly
or implicitly.
A variable that appears in a declare target directive must not be an
element of a common block or appear in an EQUIVALENCE statement.
```

As Fortran 2018 standard [8.5.16] states, a variable, common block, or
procedure pointer declared in the scoping unit of a main program,
module, or submodule implicitly has the SAVE attribute, which may be
confirmed by explicit specification.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D109864

Bump the value of __STDC_VERSION__ in -std=c2x mode

Previously, we reported the same value as for C17, now we report 202000L, which
is the same value currently used by GCC.

Once C23 ships, this value will be bumped to the correct date.

[InstCombine] Add some extra tests for truncated saturates. NFC

Lex arguments for __has_cpp_attribute and friends as expanded tokens

The C and C++ standards require the argument to __has_cpp_attribute and
__has_c_attribute to be expanded ([cpp.cond]p5). It would make little sense
to expand the argument to those operators but not expand the argument to
__has_attribute and __has_declspec, so those were both also changed in this
patch.

Note that it might make sense for the other builtins to also expand their
argument, but it wasn't as clear to me whether the behavior would be correct
there, and so they were left for a future revision.

[llvm][AArch64][SVE] Fold literals into math instructions

SVE has predicated literal forms of some instructions for specific
literals, which currently are generated correctly when using ACLE
but not when those instructions are generated directly.

This adds the patterns to generate those instructions when
generating from standard LLVM IR instructions.

Differential Revision: https://reviews.llvm.org/D99074

tsan: refactor trace tests

Instead of creating real threads for trace tests
create a new ThreadState in the main thread.
This makes the tests more unit-testy and will also
help with future trace tests that will need
more than 1 thread. Creating more than 1 real thread and
dispatching test actions across multiple threads in the
required deterministic order is painful.

This is a resubmit of the reverted D110546 with 2 changes:
1. The previous version patched ImitateTlsWrite to not
expect ThreadState to be allocated in TLS (the CHECK
failed for the fake test threads).
This added an ugly hack into production code and was still
logically wrong because we imitated write to the main
thread TLS/stack when we started the fake test thread
(which has nothing to do with the main thread TLS/stack).
This version uses ThreadType::Fiber instead of ThreadType::Regular
for the fake threads. This naturally makes ThreadStart skip
obtaining stack/tls and imitating writes to them.

2. This version still skips the tests on Darwin and PowerPC
to be on the safer side. Build bots reported failures for PowerPC
for the previous version.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D111156

[gn build] Port ff13189c5d0d

[RISCV][NFC] Fix build error

[RISCV] Unify the arch string parsing logic to RISCVISAInfo.

How many places do you need to modify when implementing a new extension for RISC-V?

At least 7 places, as far as I know:

- Add new SubtargetFeature at RISCV.td
- -march parser in RISCV.cpp
- RISCVTargetInfo::initFeatureMap@RISCV.cpp for handling feature vector.
- RISCVTargetInfo::getTargetDefines@RISCV.cpp for pre-defined macros.
- Arch string parser for ELF attribute in RISCVAsmParser.cpp
- ELF attribute emission in RISCVAsmParser.cpp, and make sure it's in
  canonical order...
- ELF attribute emission in RISCVTargetStreamer.cpp, and again, it must be in
  canonical order...

And now, this patch provides a unified infrastructure for handling (almost)
everything related to the RISC-V arch string.

After this patch, you only need to update 2 places to implement an extension
for RISC-V:
- Add new SubtargetFeature at RISCV.td, hmmm, it's hard to avoid.
- Add new entry to RISCVSupportedExtension@RISCVISAInfo.cpp or
  SupportedExperimentalExtensions@RISCVISAInfo.cpp .

Most of the code comes from the existing -march parser, but with a few new
features/bug fixes:
- Accept version for -march, e.g. -march=rv32i2p0.
- Reject version info with `p` but without minor version number like `rv32i2p`.
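
The two version rules above can be sketched as a tiny standalone parser for the version suffix alone (the `2p0` in `rv32i2p0`). `ExtVersion`, `takeNumber` and `parseVersionSuffix` are hypothetical names, not the `RISCVISAInfo` API. The sketch only accepts the `<major>p<minor>` form; how the real parser treats other spellings, such as a bare major number, is not modelled here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

struct ExtVersion {
  unsigned majorVer;
  unsigned minorVer;
};

// Consumes leading decimal digits from s; returns nullopt if there are none.
std::optional<unsigned> takeNumber(std::string_view &s) {
  std::size_t n = 0;
  unsigned value = 0;
  while (n < s.size() && std::isdigit(static_cast<unsigned char>(s[n]))) {
    value = value * 10 + static_cast<unsigned>(s[n] - '0');
    ++n;
  }
  if (n == 0)
    return std::nullopt;
  s.remove_prefix(n);
  return value;
}

// Accepts "<major>p<minor>"; rejects a trailing 'p' with no minor number.
std::optional<ExtVersion> parseVersionSuffix(std::string_view s) {
  std::optional<unsigned> majorPart = takeNumber(s);
  if (!majorPart || s.empty() || s.front() != 'p')
    return std::nullopt;
  s.remove_prefix(1); // drop the 'p' separator
  std::optional<unsigned> minorPart = takeNumber(s);
  if (!minorPart || !s.empty())
    return std::nullopt; // e.g. "2p": minor version number is missing
  return ExtVersion{*majorPart, *minorPart};
}
```

Under these assumptions `2p0` parses to major 2, minor 0, and `2p` is rejected.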

Differential Revision: https://reviews.llvm.org/D105168

Use llvm::erase_value (NFC)

Fix a few warnings (signed/unsigned comparison in gtest, and missing field initializers)

[MLIR][LLVM] Add memset intrinsic

Add memset intrinsic into LLVM dialect

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D111906

Revert "[AArch64] Optimize add/sub with immediate"

This reverts commit 9bf6bef9951a1c230796ccad2c5c0195ce4c4dff.

[Object] Simplify RELR decoding

[NFC][sanitizer] Add StackDepotTestOnlyUnmap

[NFC][sanitizer] Rename stack depot tests

[X86] Add DAG combine for negation of CMOV absolute value pattern.

This patch detects the absolute value pattern on the RHS of a
subtract. If we find it we swap the CMOV true/false values and
replace the subtract with an ADD.
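
The arithmetic identity behind the combine can be checked on scalars. The sketch uses wrapping `uint32_t` arithmetic to mimic machine integers and a ternary in place of CMOV; `subAbs` and `addNegAbs` are hypothetical names, and this is a model of the identity rather than of the DAG combine itself.

```cpp
#include <cassert>
#include <cstdint>

// Two's-complement helpers on 32-bit values (wrapping, no undefined behaviour).
uint32_t negate(uint32_t x) { return 0u - x; }
bool isNegative(uint32_t x) { return (x >> 31) != 0; }

// Original shape: y - abs(x), with abs(x) expressed as a select.
uint32_t subAbs(uint32_t y, uint32_t x) {
  uint32_t absX = isNegative(x) ? negate(x) : x;
  return y - absX;
}

// Rewritten shape: swap the two select arms, then add instead of subtract.
uint32_t addNegAbs(uint32_t y, uint32_t x) {
  uint32_t negAbsX = isNegative(x) ? x : negate(x);
  return y + negAbsX;
}
```

Swapping the arms turns the select into `-abs(x)`, so `y - abs(x)` and `y + (-abs(x))` agree for every input, including the minimum signed value.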

There may be a more generic way to do this, but I'm not sure.
Targets that don't have legal or custom ISD::ABS use a generic
expand in DAG combiner already when it sees (neg (abs(x))). I
haven't checked what happens if the neg is a more general subtract.

Fixes PR50991 for X86.

Reviewed By: RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D111858

[Builders.h] Silence a warning by adding a cast.

The no-result version of createOrFold calls 'tryFold' but
ignores the result since it doesn't matter what it produced.
Explicitly cast to void to silence this warning:

../llvm/mlir/include/mlir/IR/Builders.h:454:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
tryFold(op.getOperation(), unused);
^~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~
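
A minimal standalone reproduction of the idiom, using a hypothetical `tryFoldModel` function in place of MLIR's `tryFold`:

```cpp
#include <cassert>

// Hypothetical stand-in for a helper whose result some callers do not need.
// It halves an even value in place and reports whether it did so.
[[nodiscard]] bool tryFoldModel(int &value) {
  if (value % 2 != 0)
    return false;
  value /= 2;
  return true;
}

void foldIfPossible(int &value) {
  // Writing `tryFoldModel(value);` here would discard a nodiscard result
  // and draw a warning. The cast to void states that this is intentional.
  (void)tryFoldModel(value);
}
```

The cast changes nothing at run time; it only documents to the compiler and to readers that the result is deliberately ignored.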

Differential Revision: https://reviews.llvm.org/D111951

Improve fatal error message when an Attribute or Type wasn't initialized by a dialect (NFC)

The existing message hints that the dialect may not be loaded, but there
is also the possibility that the dialect was loaded and the initialize()
method didn't include the Type/Attribute.

Revert "[clang] Pass -clear-ast-before-backend in Clang::ConstructJob()"

This reverts commit 47eb99aa44ab1d20327d67a49d6c47163de76387.

This causes crashes with -print-stats: PR52193.

[APInt] Fix 1-bit edge case in smul_ov()

The sdiv used to check for overflow can itself overflow if the
LHS is signed min and the RHS is -1. The code tried to account for
this by also checking the commuted version. However, for 1-bit
values, signed min and -1 are the same value, so both divisions
overflow. As such, the overflow for -1 * -1 was not detected
(which results in -1 rather than 1 for 1-bit values). Fix this by
explicitly checking for this case instead.
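
The 1-bit edge case is easy to see with a small model of N-bit signed multiplication. The sketch below computes the reference answer in a wider type instead of using the division-based check that `smul_ov()` relies on, so it illustrates the expected result, not the fix. All names are hypothetical and bit widths are assumed to be between 1 and 31.

```cpp
#include <cassert>
#include <cstdint>

int64_t signedMin(unsigned bits) { return -(int64_t{1} << (bits - 1)); }
int64_t signedMax(unsigned bits) { return (int64_t{1} << (bits - 1)) - 1; }

// Truncates v to `bits` bits and sign-extends the result.
int64_t wrapSigned(int64_t v, unsigned bits) {
  const uint64_t mask = (uint64_t{1} << bits) - 1;
  const uint64_t u = static_cast<uint64_t>(v) & mask;
  if (u >> (bits - 1)) // sign bit set
    return static_cast<int64_t>(u) - (int64_t{1} << bits);
  return static_cast<int64_t>(u);
}

struct MulResult {
  int64_t value; // wrapped N-bit product
  bool overflow; // true if the exact product is not representable
};

MulResult smulOvModel(int64_t a, int64_t b, unsigned bits) {
  const int64_t wide = a * b; // exact, since the operands are at most 31 bits
  return {wrapSigned(wide, bits),
          wide < signedMin(bits) || wide > signedMax(bits)};
}
```

For 1-bit values the only representable numbers are 0 and -1, so signed min and -1 coincide, and `-1 * -1` has the exact result 1, which wraps to -1 and must be reported as overflow.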

Noticed while adding exhaustive test coverage for smul_ov(),
which is also part of this commit.

[OpenMP][deviceRTLs] Fix wrong return value of `__kmpc_is_spmd_exec_mode`

D110279 introduced a bug to the device runtime. In `__kmpc_parallel_51`, we detect
whether we are already in parallel region by `__kmpc_parallel_level() > __kmpc_is_spmd_exec_mode()`.
It is based on the assumption that:
- In SPMD mode, parallel level is initialized to 1.
- In generic mode, parallel level is initialized to 0.
- `__kmpc_is_spmd_exec_mode` returns `1` for SPMD mode, 0 otherwise.

Because the return value type of `__kmpc_is_spmd_exec_mode` is `int8_t`, there
was an implicit cast from `bool` to `int8_t`. We can make sure it is either 0 or
1 since C++14. In D110279, the return value is the result of an `and` operation,
which is 2 in SPMD mode. This breaks the assumption in `__kmpc_parallel_51`.
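
A small sketch of the failure mode. The flag value 2 is taken from the description above; the function names and the shape of the check are simplified, hypothetical stand-ins for the device runtime code.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit used to mark SPMD mode, consistent with "which is 2 in SPMD mode".
constexpr int8_t ModeSPMDFlag = 2;

// Buggy shape: returns the raw result of the bitwise and (0 or 2).
int8_t isSpmdBuggy(int8_t execMode) {
  return static_cast<int8_t>(execMode & ModeSPMDFlag);
}

// Fixed shape: go through bool first, so the result is 0 or 1.
int8_t isSpmdFixed(int8_t execMode) {
  return static_cast<int8_t>((execMode & ModeSPMDFlag) != 0);
}

// Simplified form of the check: parallel level starts at 1 in SPMD mode
// and at 0 in generic mode, so "level > isSpmd" means a nested region.
bool alreadyInParallel(int parallelLevel, int8_t isSpmd) {
  return parallelLevel > isSpmd;
}
```

With the buggy return value, a nested region at level 2 in SPMD mode evaluates `2 > 2` and is not detected, while the fixed version evaluates `2 > 1`.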

Reviewed By: carlo.bertolli, dpalermo

Differential Revision: https://reviews.llvm.org/D111905

[TTI][X86] Add v8i16 -> 2 x v4i16 stride 2 interleaved load costs

Split SSE2 and SSSE3 costs to correctly handle PSHUFB lowering - as was noted on D111938

[libc++][doc] Adds more issue status labels.

A followup to D111458 adding more labels to LWG-issues. This should add
the labels for the not completed chrono, format, ranges, and spaceship
issues.

Some minor formatting cleanups along the way.

Reviewed By: #libc, Quuxplusone

Differential Revision: https://reviews.llvm.org/D111935

[TTI][X86] Add SSE2 sub-128bit vXi16/32 and v2i64 stride 2 interleaved load costs

These cases use the same codegen as AVX2 (pshuflw/pshufd) for the sub-128bit vector deinterleaving, and unpcklqdq for v2i64.

It's going to take a while to add full interleaved cost coverage, but since these are the same for SSE2 -> AVX2 it should be an easy win.

Fixes PR47437

Differential Revision: https://reviews.llvm.org/D111938

[NFC][X86][Codegen] Add missing interleaving tests after D111546

Use llvm::is_contained (NFC)

[Support] Add more Windows error codes to mapWindowsError

Also sort ERROR_BAD_NETPATH correctly.

Compared with the similar error code mapping in
libcxx/src/filesystem/operations.cpp, I'm leaving out
mappings for ERROR_NOT_SAME_DEVICE and ERROR_OPERATION_ABORTED.
They map nicely to std::errc::cross_device_link and
std::errc::operation_canceled, but those aren't available in
llvm::errc, as they aren't available across all platforms.

Also, the libcxx version maps ERROR_INVALID_NAME to
no_such_file_or_directory instead of invalid_argument.

Differential Revision: https://reviews.llvm.org/D111874

[LV][X86] Add PR47437 test case

[lldb] Split ParseSingleMember into Obj-C property and normal member/ivar parsing code.

Right now DWARFASTParserClang::ParseSingleMember has two parts: One part parses
Objective-C properties and the other part parses C/C++ members/Objective-C
ivars. These parts are pretty much independent of each other (with one
historical exception, see below) and in practice they parse DIEs with different
tags/attributes: `DW_TAG_APPLE_property` and `DW_TAG_member`.

I don't see a good reason for keeping the different parsing code intertwined in
a single function, so instead split out the Objective-C property parser into its
own function.

Note that 90% of this commit is just unindenting nearly all of
`ParseSingleMember` which was inside a `if (tag == DW_TAG_member)` block. I.e.,
think of the old `ParseSingleMember` function as:

```
lang=c++
void DWARFASTParserClang::ParseSingleMember(...) {
  [...]
  if (tag == DW_TAG_member) {
    [...] // This huge block got unindented in this patch as the `if` above is gone.
  }
  if (property) {
    [...] // This is the property parsing code that is now its own function.
  }
}
```

The rest is just moving the property parsing code into its own function and I
added the ReportError implementation in case we fail to resolve the property
type (which before was just a silent failure).

There is one exception to the rule that the parsers are independent. Before 2012
Objective-C properties were encoded as `DW_TAG_member` with
`DW_AT_APPLE_property*` attributes describing the property. In 2012 this has
changed in a series of commits (see for example
c0449635b35b057c5a877343b0c5f14506c7cf02 which updates the docs) so that
`DW_TAG_APPLE_property` is now used for properties. With the old format we first
created an ivar and afterwards used the `DW_AT_APPLE_property*` attributes to
create the respective property, but there doesn't seem to be any way to create
such debug info with any clang from the last 9 years. So this is technically not
NFC in case someone finds debug info from that time and tries to use properties.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D111632

[Symbolize] Demangle Rust symbols

Add support for demangling Rust v0 symbols to LLVM symbolizer by reusing
nonMicrosoftDemangle which supports both Itanium and Rust mangling.

Reviewed By: dblaikie, jhenderson

Part of https://reviews.llvm.org/D110664

[llvm-cxxfilt] Use nonMicrosoftDemangle for demangling NFC

Reviewed By: dblaikie, jhenderson

Part of https://reviews.llvm.org/D110664

[Demangle] Extract nonMicrosoftDemangle from llvm::demangle

Introduce a new demangling function that supports symbols using Itanium
mangling and Rust v0 mangling, and is expected in the near future to
include support for D mangling as well.

Unlike llvm::demangle, the function does not accept extra underscore
decoration. The callers generally know exactly when symbols should
include the extra decoration and so they should be responsible for
stripping it.

Functionally the only intended change is to allow demangling Rust
symbols with an extra underscore decoration through llvm::demangle,
which matches the existing behaviour for Itanium symbols.

Reviewed By: dblaikie, jhenderson

Part of https://reviews.llvm.org/D110664

[docs] Mention DragonFlyBSD as a supported platform for LLVM.

Differential Revision: https://reviews.llvm.org/D111758

[Analysis] Replace assert(isa)/dyn_cast with cast. NFC.

cast<> will perform the assertion for us.

Removes a static analysis null dereference warning.

[LazyValueInfo] getPredicateAt - remove unnecessary null pointer check. NFC.

We already dereference the CxtI pointer several times before reaching the "if(CxtI)", so there is no need to check it again.

Fixes a coverity warning.

[ConstantFolding] ConstantFoldScalarCall2 - early-out if getLibFunc fails. NFC.

[ConstantFolding] Use getValueAPF const ref value where possible. NFC.

Don't copy the value if we can avoid it.

[ConstantFolding] ConstantFoldScalarCall1 - early-out if getLibFunc fails. NFC.

[X86][LV] X86 does *not* prefer vectorized addressing

And another attempt to start untangling this ball of threads around gather.
There's a `TTI::prefersVectorizedAddressing()` hook, which confusingly defaults to `true`,
which tells LV to try to vectorize the addresses that lead to loads,
but X86 generally cannot deal with vectors of addresses.
The only instructions that support that are GATHER/SCATTER,
but even those aren't available until AVX2, and aren't really usable until AVX512.

This specializes the hook for X86, to return true only if we have AVX512 or AVX2 w/ fast gather.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111546

[AArch64] Optimize add/sub with immediate

Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.
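
The split described by the two rules above can be sketched in plain C++. This is only an illustration of the arithmetic, not the code from the patch: the `SplitImm` type and the `splitAddSubImm` function are hypothetical names, and the real transform may apply further conditions that are not modelled here.

```cpp
#include <cstdint>

// Hypothetical result type; the names are illustrative, not from the patch.
struct SplitImm {
  bool valid;     // true if the two-instruction form described above applies
  bool negated;   // true if imm was negative, so ADD and SUB swap roles
  uint32_t imm0;  // high 12 bits, used with "lsl #12"
  uint32_t imm1;  // low 12 bits
};

// Splits imm so that |imm| == (imm0 << 12) + imm1, requiring both parts to be
// non-zero 12-bit unsigned values.
constexpr SplitImm splitAddSubImm(int64_t imm) {
  const bool negated = imm < 0;
  // Compute the magnitude in unsigned arithmetic so INT64_MIN cannot overflow.
  const uint64_t mag = negated ? 0 - static_cast<uint64_t>(imm)
                               : static_cast<uint64_t>(imm);
  if (mag >> 24)  // needs more than 24 bits, so two 12-bit parts cannot hold it
    return {false, negated, 0, 0};
  const uint32_t imm0 = static_cast<uint32_t>(mag >> 12);
  const uint32_t imm1 = static_cast<uint32_t>(mag & 0xFFF);
  if (imm0 == 0 || imm1 == 0)  // one part is zero, so the rule does not apply
    return {false, negated, 0, 0};
  return {true, negated, imm0, imm1};
}
```

For example, `splitAddSubImm(0x123456)` yields `imm0 == 0x123` and `imm1 == 0x456`, which corresponds to adding `#0x123, lsl #12` followed by `#0x456`. For `-0x123456` the same parts come back with `negated` set, matching the second rule where the opcodes are swapped.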

Reviewed By: jaykang10, dmgreen

Differential Revision: https://reviews.llvm.org/D111034

[clang-tidy] Fix false positive in cppcoreguidelines-virtual-class-destructor

The check incorrectly triggered for template classes that inherit
from a base class that has a virtual destructor.

Any class inheriting from a base that has a virtual destructor
also has a virtual destructor itself, as per the Standard:

https://timsong-cpp.github.io/cppwp/n4140/class.dtor#9

> If a class has a base class with a virtual destructor,
> its destructor (whether user- or implicitly-declared) is virtual.

Added unit tests to prevent regression.
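
A minimal sketch of the situation, assuming hypothetical class names `Base` and `Derived` (these are not taken from the patch or its tests): the template class declares no destructor of its own, yet its implicit destructor is virtual because of the base class, so the check should stay silent.

```cpp
#include <type_traits>

// Hypothetical base class with a public virtual destructor.
struct Base {
  virtual ~Base() = default;
  virtual int id() const { return 0; }
};

// Hypothetical template class that declares no destructor itself. Its
// implicitly-declared destructor is virtual because Base's destructor is.
template <typename T>
struct Derived : Base {
  T value{};
  int id() const override { return 1; }
};

// Compile-time confirmation of the rule quoted from the Standard.
static_assert(std::has_virtual_destructor<Base>::value,
              "Base declares a virtual destructor");
static_assert(std::has_virtual_destructor<Derived<int>>::value,
              "Derived<int> inherits a virtual destructor");
```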

Fixes bug https://bugs.llvm.org/show_bug.cgi?id=51912

Differential Revision: https://reviews.llvm.org/D110614

[mlir][linalg][bufferize] Relax rules for extract_slice/insert_slice matching

The rules were too restrictive, causing out-of-place bufferization when the results of two ExtractSliceOps are fed into an InsertSliceOp.

Differential Revision: https://reviews.llvm.org/D111861

[TableGen] Replace static_cast with llvm's cast. NFC

These all appear next to an isa<> and cast<> is much more
common in these cases.

Fix missing failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc

[MLIR] Generalize Affine dependence analysis using Affine Relations

This patch removes code very specific to affine dependence analysis and
refactors it as a FlatAffineRelation.

A FlatAffineRelation represents a set of ordered pairs (domain -> range) where
"domain" and "range" are tuples of identifiers. These relations are used to
represent an "access relation" for memory access on a memref. An access
relation maps elements of an iteration domain to the element(s) of an array
domain accessed by that iteration of the associated statement through some
array reference. The dependence relation representing the dependence
constraints between two memory accesses can be built by composing the access
relation of the destination access with the inverse of the access relation of
the source access.
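
As a rough illustration of that composition, the sketch below uses small finite relations stored as sets of pairs. This is an assumption made purely for readability: the real FlatAffineRelation is a symbolic, constraint-based representation, and the `Relation`, `inverse`, `compose`, `accessRelation` and `dependence` names here are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Hypothetical finite stand-in for a relation: a set of (domain, range) pairs.
using Relation = std::set<std::pair<int, int>>;

// Swaps domain and range of every pair.
Relation inverse(const Relation &r) {
  Relation out;
  for (const auto &p : r)
    out.insert({p.second, p.first});
  return out;
}

// Returns {(a, c) : there is some b with (a, b) in first and (b, c) in second}.
Relation compose(const Relation &first, const Relation &second) {
  Relation out;
  for (const auto &p : first)
    for (const auto &q : second)
      if (p.second == q.first)
        out.insert({p.first, q.second});
  return out;
}

// Access relation of a 1-d loop: iteration i (0 <= i < n) touches element
// i + offset.
Relation accessRelation(int n, int offset) {
  assert(n >= 0 && "iteration count must be non-negative");
  Relation out;
  for (int i = 0; i < n; ++i)
    out.insert({i, i + offset});
  return out;
}

// Pairs (source iteration, destination iteration) that touch the same element.
Relation dependence(const Relation &srcAccess, const Relation &dstAccess) {
  return compose(srcAccess, inverse(dstAccess));
}
```

With a source access `A[i]` and a destination access `A[j - 1]` over four iterations each, `dependence(accessRelation(4, 0), accessRelation(4, -1))` contains exactly the pairs (0, 1), (1, 2) and (2, 3): each destination iteration depends on the source iteration just before it.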

This patch does not change the functionality of the existing dependence
analysis in checkMemrefAccessDependence, but refactors it to use
FlatAffineRelations to deduplicate code and enable code reuse for future
development of features like scheduling, value-based dependence analysis, etc.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D110563

Fix lit test failures in clang-ppc* and clang-x64-windows-msvc

Resolve lit failures in clang after 8ca4b3e landed

[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)

This patch updates test files after D105169.
Autogenerated tests are updated by `utils/update_cc_test_checks.py`, and non-autogenerated tests are updated as follows:

(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.

(2) The remaining tests are updated manually.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108453

[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default

Turning on the `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming the `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169

[Polly][docs] Fix Sphinx warning.

reStructuredText is not Markdown.