review.tizen.org Git - platform/upstream/llvm.git/log

[OpenMP][libomp] Allow users to specify KMP_HW_SUBSET in any order

Remove restriction forcing users to specify the KMP_HW_SUBSET value in
topology order. This patch sorts the user KMP_HW_SUBSET value before
trying to apply it. For example: 1s,4c,2t is equivalent to 2t,1s,4c

Differential Revision: https://reviews.llvm.org/D112027

[lldb] remove usage of distutils, fix python path on debian/ubuntu

distutils is deprecated and will be removed, so we shouldn't be
using it.

We were using it to compute LLDB_PYTHON_RELATIVE_PATH.

Discussing a similar issue
[at python.org](https://bugs.python.org/issue41282), Filipe Laíns said:

    If you are relying on the value of distutils.sysconfig.get_python_lib()
    as you shown in your system, you probably don't want to. That
    directory (dist-packages) should be for Debian provided packages
    only, so moving to sysconfig.get_path() would be a good thing,
    as it has the correct value for user installed packages on your
    system.

So I propose using a relative path from `sys.prefix` to
`sysconfig.get_path("platlib")` instead.

On Mac and windows, this results in the same paths as we had before,
which are `lib/python3.9/site-packages` and `Lib\site-packages`,
respectively.

On ubuntu however, this will change the path from
`lib/python3/dist-packages` to `lib/python3.9/site-packages`.

This change seems to be correct, as Filipe said above, `dist-packages`
belongs to the distribution, not us.

Reviewed By: labath

Differential Revision: https://reviews.llvm.org/D114106

[libc++][NFC] Re-indent and re-order includes in uses_alloc_types.h

[NFC] Update comments to refer to unique_ptr instead of raw pointers.

[clang] Fix typo in 36873fb768dbe

[clang] Try to fix test more after ae98182cf7341181e

We need to use the td-based marshalling instead of doing this manually,
else the setting gets lost on the way to codegen in most build configs.

[OpenMP][libomp][NFC] Remove non-ASCII apostrophe in comment

[SCEVAA] Avoid forming malformed pointer diff expressions

This solves the same crash as in D104503, but with a different approach.

The test case test_non_dom demonstrates a case where scev-aa crashes today. (If exercised either by -eval-aa or -licm.) The basic problem is that SCEV-AA expects to be able to compute a pointer difference between two SCEVs for any two pair of pointers we do an alias query on. For (valid, but out of scope) reasons, we can end up asking whether expressions in different sub-loops can alias each other. This results in a subtraction expression being formed where neither operand dominates the other.

The approach this patch takes is to leverage the "defining scope" notion we introduced for flag semantics to detect and disallow the formation of the problematic SCEV. This ends up being relatively straight forward on that new infrastructure. This change does hint that we should probably be verifying a similar property for all SCEVs somewhere, but I'll leave that to a follow on change.

Differential Revision: D114112

Fix -Wparentheses warnings. NFC.

[SystemZ] [Sanitizer] Bugfixes in internal_clone().

The __flags variable needs to be of type 'long' in order to get sign extended
properly.

internal_clone() uses an svc (Supervisor Call) directly (as opposed to
internal_syscall), and therefore needs to take care to set errno and return
-1 as needed.

Review: Ulrich Weigand

[X86] splitVector - only extract lower half subvector from splats

If we're splitting a source vector that is a splat (with no undefs), just extract (for free) the lower half subvector and use it for both halfs.

[clang] Try to fix test after ae98182cf7341181e

The test assumes an integrated assembler, so use a triple where
that's the default.

[lldb] Port PlatformWindows, PlatformOpenBSD and PlatformRemoteGDBServer to GetSupportedArchitectures

[clang] Address review comments on https://reviews.llvm.org/D113707

- Drop a needless `l` size suffix on a mov instruction in AT&T mode
- Move varying bits of test flags to front
- Add a comment about MS mode test

[libc] fix strtof/d/ld NaN parsing

Fix the fact that previously strtof/d/ld would only accept a NaN as
having parentheses if the thing in the parentheses was a valid number,
now it will accept any combination of letters and numbers, but will only
put valid numbers in the mantissa.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D113790

Fix MSVC signed/unsigned mismatch warning. NFC.

[X86] LowerRotate - improve vXi8 rotate-by-scalar lowering with direct use of (extended) shift-by-scalar helpers.

If we're rotating vXi8 by a splatted amount, then unpack to vXi16, perform a SHL by the (extended) scalar, and then pack the results.

This is a vector equivalent to the "rotl(x,y) -> (((aext(x) << bw) | zext(x)) << (y & (bw-1))) >> bw" style expansion we do for scalars in LowerFunnelShift.

I think we can usefully use this for other vector types and vector funnel-shifts in the future, depending how we expand beyond D113192 for matching rotations/funnel-shifts for more type/ops.

[OpenMP] Add version macro support for 5.1 and 5.2

Differential Revision: https://reviews.llvm.org/D114102

[InstCombine] Generalize complex OR patterns to AND

For every pattern with only NOT, OR, and AND operations there is
always a symmetrical attern with AND and OR swapped.

This adds 2 transformations: https://reviews.llvm.org/D113526

```
(~(a & b) | c) & (~(a & c) | b) --> ~((b ^ c) & a)
(~(a & b) | c) & ~(a & c) --> ~((b | c) & a)
```

```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %and1 = and i4 %b, %a
  %not1 = xor i4 %and1, 15
  %and2 = and i4 %a, %c
  %not2 = xor i4 %and2, 15
  %or = or i4 %not2, %b
  %r = and i4 %or, %not1
  ret i4 %r
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %or = or i4 %b, %c
  %and = and i4 %or, %a
  %r = xor i4 %and, 15
  ret i4 %r
}
Transformation seems to be correct!

----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %and1 = and i4 %a, %b
  %not1 = xor i4 %and1, 15
  %or1 = or i4 %not1, %c
  %and2 = and i4 %a, %c
  %not2 = xor i4 %and2, 15
  %or2 = or i4 %not2, %b
  %and3 = and i4 %or1, %or2
  ret i4 %and3
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %xor = xor i4 %b, %c
  %and = and i4 %xor, %a
  %not = xor i4 %and, 15
  ret i4 %not
}
Transformation seems to be correct!
```

Differential Revision: https://reviews.llvm.org/D113526

[llvm-objcopy] Fix some comment typos

[clang] Make -masm=intel affect inline asm style

With this,

  void f() {  __asm__("mov eax, ebx"); }

now compiles with clang with -masm=intel.

This matches gcc.

The flag is not accepted in clang-cl mode. It has no effect on
MSVC-style `__asm {}` blocks, which are unconditionally in intel
mode both before and after this change.

One difference to gcc is that in clang, inline asm strings are
"local" while they're "global" in gcc. Building the following with
-masm=intel works with clang, but not with gcc where the ".att_syntax"
from the 2nd __asm__() is in effect until file end (or until a
".intel_syntax" somewhere later in the file):

  __asm__("mov eax, ebx");
  __asm__(".att_syntax\nmovl %ebx, %eax");
  __asm__("mov eax, ebx");

This also updates clang's intrinsic headers to work both in
-masm=att (the default) and -masm=intel modes.
The official solution for this according to "Multiple assembler dialects in asm
templates" in gcc docs->Extensions->Inline Assembly->Extended Asm
is to write every inline asm snippet twice:

    bt{l %[Offset],%[Base] | %[Base],%[Offset]}

This works in LLVM after D113932 and D113894, so use that.

(Just putting `.att_syntax` at the start of the snippet works in some but not
all cases: When LLVM interpolates in parameters like `%0`, it uses at&t or
intel syntax according to the inline asm snippet's flavor, so the `.att_syntax`
within the snippet happens to late: The interpolated-in parameter is already
in intel style, and then won't parse in the switched `.att_syntax`.)

It might be nice to invent a `#pragma clang asm_dialect push "att"` /
`#pragma clang asm_dialect pop` to be able to force asm style per snippet,
so that the inline asm string doesn't contain the same code in two variants,
but let's leave that for a follow-up.

Fixes PR21401 and PR20241.

Differential Revision: https://reviews.llvm.org/D113707

[llvm-objcopy][MachO] Add llvm-strip support for newer load commands

Previously llvm-strip would fail because of unknown commands.

Fixes https://bugs.llvm.org/show_bug.cgi?id=50044

Differential Revision: https://reviews.llvm.org/D113734

[libc++] Refactor tests for trivially copyable atomics

- Replace irrelevant synopsis by a comment
- Use a .verify.cpp test instead of .compile.fail.cpp
- Remove unnecessary includes in one of the tests (was a copy-paste error)

Differential Revision: https://reviews.llvm.org/D114094

[x86/asm] Let EmitMSInlineAsmStr() handle variants too

This is preparation for D113707, where I want to make `-masm=intel`
emit `asm inteldialect` instructions.

`{movq %rbx, %rax|mov rax, rbx}` is supposed to evaluate to the bit
between { and | for att and to the bit between | and } for intel.
Since intel will become `asm inteldialect`, which alls EmitMSInlineAsmStr(),
EmitMSInlineAsmStr() has to support variants as well.

(clang translates `{...|...}` to `$(...$|...$)`. I'm not sure why
it doesn't just send along only the first `...` or the second `...`
to LLVM, but given the notes in PR23933 let's not do a big
reorganization in this codepath.)

Differential Revision: https://reviews.llvm.org/D113932

[RISCV] Lower vector CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF by converting to FP and extracting the exponent.

If we have a large enough floating point type that can exactly
represent the integer value, we can convert the value to FP and
use the exponent to calculate the leading/trailing zeros.

The exponent will contain log2 of the value plus the exponent bias.
We can then remove the bias and convert from log2 to leading/trailing
zeros.

This doesn't work for zero since the exponent of zero is zero so we
can only do this for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF. If we need
a value for zero we can use a vmseq and a vmerge to handle it.

We need to be careful to make sure the floating point type is legal.
If it isn't we'll continue using the integer expansion. We could split the vector
and concatenate the results but that needs some additional work and evaluation.

Differential Revision: https://reviews.llvm.org/D111904

[x86/asm] Make variants work when converting at&t inline asm input to intel asm output

`asm` always has AT&T-style input (`asm inteldialect` has Intel-style asm
input), so EmitGCCInlineAsmStr() always has to pick the same variant since it
cares about the input asm string, not the output asm string.

For PowerPC, that default variant is 1. For other targets, it's 0.

Without this, the included test case errors out with

error: unknown use of instruction mnemonic without a size suffix
mov rax, rbx

since it picks the intel branch and then tries to interpret it as AT&T
when selecting intel-style output with `-x86-asm-syntax=intel`.

Differential Revision: https://reviews.llvm.org/D113894

[clangd] Dont include file version in task name

This will drop file version information from span names, reducing
overall cardinality and also effect logging when skipping actions in scheduler.

Differential Revision: https://reviews.llvm.org/D113390

[llvm-objdump/mac] Add support for new load commands

Differential Revision: https://reviews.llvm.org/D113733

[flang] Deal with negative character lengths in semantics

Fortran defines LEN(X) = 0 after CHARACTER(LEN=-1)::X so
apply MAX(0, ...) to character length expressions.

Differential Revision: https://reviews.llvm.org/D114030

Fix the side effect of outlined function when the register is implicit use and implicit-def in the same instruction.

This is the diff associated with {D95267}, and we need to mark $x0 as live whether or not $x0 is dead.

The compiler also needs to mark register $x0 as live in for the following case.

```
$x1 = ADDXri $sp, 16, 0
BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def $x0
```

This change fixes an issue where the wrong registers were used when -machine-outliner-reruns>0.
As an example:

```
lang=c
typedef struct {
    double v1;
    double v2;
} D16;

typedef struct {
    D16 v1;
    D16 v2;
} D32;

typedef long long LL8;
typedef struct {
    long long v1;
    long long v2;
} LL16;
typedef struct {
    LL16 v1;
    LL16 v2;
} LL32;

typedef struct {
    LL32 v1;
    LL32 v2;
} LL64;

LL8 needx0(LL8 v0, LL8 v1);

void bar(LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8);

LL8 foo(LL8 v0, LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8)
{
  LL8 result = needx0(v0, 0);
  bar(v1, v2, v3, v4, v5, v6, v7, v8);
  return result + 1;
}
```

As you can see from the `foo` function, we should not modify the value of `x0` until we call `needx0`.
This code is compiled to give the following instruction MIR code.

```
$sp = frame-setup SUBXri $sp, 256, 0
frame-setup STPDi killed $d13, killed $d12, $sp, 16
frame-setup STPDi killed $d11, killed $d10, $sp, 18
frame-setup STPDi killed $d9, killed $d8, $sp, 20

frame-setup STPXi killed $x26, killed $x25, $sp, 22
frame-setup STPXi killed $x24, killed $x23, $sp, 24
frame-setup STPXi killed $x22, killed $x21, $sp, 26
frame-setup STPXi killed $x20, killed $x19, $sp, 28
...
$x1 = MOVZXi 0, 0
BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0
...
```

Since there are some other instruction sequences that duplicate `foo`, after the first execution of Machine Outliner you will get:
```
$sp = frame-setup SUBXri $sp, 256, 0
frame-setup STPDi killed $d13, killed $d12, $sp, 16
frame-setup STPDi killed $d11, killed $d10, $sp, 18
frame-setup STPDi killed $d9, killed $d8, $sp, 20

$x7 = ORRXrs $xzr, $lr, 0
BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26
$lr = ORRXrs $xzr, $x7, 0
...
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
...
```

For the first time we outlined the following sequence:
```
frame-setup STPXi killed $x26, killed $x25, $sp, 22
frame-setup STPXi killed $x24, killed $x23, $sp, 24
frame-setup STPXi killed $x22, killed $x21, $sp, 26
frame-setup STPXi killed $x20, killed $x19, $sp, 28
```
and
```
$x1 = MOVZXi 0, 0
BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0
```

When we execute the outline again, we will get:
```
$x0 = ORRXrs $xzr, $lr, 0 <---- here
BL @OUTLINED_FUNCTION_2_0, implicit-def $lr, implicit $sp, implicit-def $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $d8, implicit $d9, implicit $d10, implicit $d11, implicit $d12, implicit $d13, implicit $x0
$lr = ORRXrs $xzr, $x0, 0

$x7 = ORRXrs $xzr, $lr, 0
BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26
$lr = ORRXrs $xzr, $x7, 0
...
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
```

When calling `OUTLINED_FUNCTION_2_0`, we used `x0` to save the `lr` register.
The reason for the above error appears to be that:
```
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
```
should be:
```
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp, implicit $x0
```

When processing the same instruction with both `implicit-def $x0` and `implicit $x0` we should keep `implicit $x0`.
A reproducible demo is available at: [https://github.com/DianQK/reproduce_outlined_function_use_live_x0](https://github.com/DianQK/reproduce_outlined_function_use_live_x0).

Reviewed By: jinlin

Differential Revision: https://reviews.llvm.org/D112911

[JITLink] Allow duplicate symbol names for locals

Local symbols can have the same name. I ran into this with JITLink
while working with an object file that had been run through `strip -S`
that had many "func.eh" symbols, but it can also happen using `ld -r`.

rdar://85352156

Differential Revision: https://reviews.llvm.org/D114042

[lldb] build failure for LLDB_PYTHON_EXE_RELATIVE_PATH on greendragon

see: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/38387/console

```
Could not find a relative path to sys.executable under sys.prefix
tried: /usr/local/opt/python/bin/python3.7
tried: /usr/local/opt/python/bin/../Frameworks/Python.framework/Versions/3.7/bin/python3.7
sys.prefix: /usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7
```

It was unable to find LLDB_PYTHON_EXE_RELATIVE_PATH because it was not resolving
the real path of sys.prefix.

caused by: https://reviews.llvm.org/D113650

[flang] Check ArrayRef base for contiguity in IsSimplyContiguousHelper

Previous code was returning true for `x(:)` where x is a pointer without
the contiguous attribute.
In case the array ref is a whole array section, check the base for contiguity
to solve the issue.

Differential Revision: https://reviews.llvm.org/D114084

[gn build] Add missed comma

[NewPM] Add option to prevent rerunning function pipeline on functions in CGSCC adaptor

In a CGSCC pass manager, we may visit the same function multiple times
due to SCC mutations. In the inliner pipeline, this results in running
the function simplification pipeline on a function multiple times even
if it hasn't been changed since the last function simplification
pipeline run.

We use a newly introduced analysis to keep track of whether or not a
function has changed since the last time the function simplification
pipeline has run on it. If we see this analysis available for a function
in a CGSCCToFunctionPassAdaptor, we skip running the function passes on
the function. The analysis is queried at the end of the function passes
so that it's available after the first time the function simplification
pipeline runs on a function. This is a per-adaptor option so it doesn't
apply to every adaptor.

The goal of this is to improve compile times. However, currently we
can't turn this on by default at least for the higher optimization
levels since the function simplification pipeline is not robust enough
to be idempotent in many cases, resulting in performance regressions if
we stop running the function simplification pipeline on a function
multiple times. We may be able to turn this on for -O1 in the near
future, but turning this on for higher optimization levels would require
more investment in the function simplification pipeline.

Heavily inspired by D98103.

Example compile time improvements with flag turned on:
https://llvm-compile-time-tracker.com/compare.php?from=998dc4a5d3491d2ae8cbe742d2e13bc1b0cacc5f&to=5c27c913687d3d5559ef3ab42b5a3d513531d61c&stat=instructions

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D113947

[SLP][NFC]Add a test for multiple alternate nodes with cost estimation,
NFC.

[Format, Sema] Use range-based for loops with llvm::reverse (NFC)

[OpenMP] Silence build warnings when built with MinGW

There's an attempt to upstream this change in
https://github.com/intel/ittapi/pull/25 too.

Differential Revision: https://reviews.llvm.org/D114069

[libc++] Remove _LIBCPP_HAS_NO_SPACESHIP_OPERATOR

All supported compilers support spaceship in C++20 nowadays.

Differential Revision: https://reviews.llvm.org/D113938

[libc++abi] Don't re-define _LIBCPP_HAS_NO_THREADS in single-threaded mode

Libc++ already defines the macro inside its __config_site header, so
libc++abi doesn't need to do it. Doing it just leads to -Wmacro-redefined
warnings when building libc++abi.

[NFC][gn build] Inclusive language: replace master with main in sync_source_lists_from_cmake.py

[NFC] As part of using inclusive language within the llvm project and to
match the renamed master branch, this patch replaces master with main in
sync_source_lists_from_cmake.py.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D113926

[libc] Use more consistent if defined syntax

[libc] Fix missing restricts

[libc] Fix documentation typo

[mlir][Vector] First step for 0D vector type

There seems to be a consensus that we should allow 0D vectors:
https://llvm.discourse.group/t/should-we-have-0-d-vectors/3097

This commit is only the first step: it changes the verifier and the parser to
allow vectors like `vector<f32>` (but does not allow explicit 0 dimensions,
i.e., `vector<0xf32>` is not allowed).

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114086

[analyzer][NFC] Make the API of CallDescription safer slightly

The new //deleted// constructor overload makes sure that no implicit
conversion from `0` would happen to `ArrayRef<const char*>`.

Also adds nodiscard to the `CallDescriptionMap::lookup()`

[NFC][clang] Inclusive terms: replace uses of blacklist in clang/test/

Replace filenames, variable names, check prefixes uses of blacklist with ignore list.

Reviewed By: jkorous

Differential Revision: https://reviews.llvm.org/D113211

[NFC][clangd] cleanup llvm-else-after-return findings

Cleanup of clang-tidy findings: removing "else" after a return statement
to improve readability of the code.

This patch was created by applying the clang-tidy fixes automatically.

Differential Revision: https://reviews.llvm.org/D113892

[clangd] Fix assertion crashes on unmatched NOLINTBEGIN comments.

The overload shouldSuppressDiagnostic seems unnecessary, and it is only
used in clangd.

This patch removes it and use the real one (suppression diagnostics are
discarded in clangd at the moment).

Fixes https://github.com/clangd/clangd/issues/929

Differential Revision: https://reviews.llvm.org/D113999

asan: don't use thread user_id

asan does not use user_id for anything,
so don't pass it to ThreadCreate.
Passing a random uninitialized field of AsanThread
as user_id does not make much sense anyway.

Depends on D113921.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D113922

memprof: don't use thread user_id

memprof does not use user_id for anything,
so don't pass it to ThreadCreate.
Passing a random field of MemprofThread as user_id
does not make much sense anyway.

Depends on D113920.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D113921

lsan: remove pthread_detach/join interceptors

They don't seem to do anything useful in lsan.
They are needed only if a tools needs to execute
some custom logic during detach/join, or if it uses
thread registry quarantine. Lsan does none of this.
And if a tool cares then it would also need to intercept
pthread_tryjoin_np and pthread_timedjoin_np, otherwise
it will mess thread states.
Fwiw, asan does not intercept these functions either.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D113920

tsan: don't consider debug calls as calls

Tsan pass does 2 optimizations based on presence of calls:
1. Don't emit function entry/exit callbacks if there are no calls
and no memory accesses.
2. Combine read/write of the same variable if there are no
intervening calls.
However, all debug info is represented as CallInst as well
and thus effectively disables these optimizations.
Don't consider debug info calls as calls.

Reviewed By: glider, melver

Differential Revision: https://reviews.llvm.org/D114079

Add a clang-transformer tutorial

Differential Revision: https://reviews.llvm.org/D114011

[NFC][AMDGPU][GlobalISel] Fix some legalizer tests

Instructions being tested were accidentally left dead.

[AMDGPU][GlobalISel] Fold G_FNEG above when users cannot fold mods

If possible fold fneg into instruction above if users cannot fold mods and we
know it will decrease instruction count.
Follows same logic as SDAG combiner in choosing opportunities to combine.

Differential Revision: https://reviews.llvm.org/D112827

Improve docs & test for #pragma clang attribute's any clause; NFC

There was some confusion during the discussion of a patch as to whether
`any` can be used to blast an attribute with no subject list onto
basically everything in a program by not specifying a subrule. This
patch adds documentation and tests to make it clear that this situation
is not supported and will be diagnosed.

[Analysis] Ensure getTypeLegalizationCost returns a simple VT for TypeScalarizeScalableVector

When getTypeConversion returns TypeScalarizeScalableVector we were
sometimes returning a non-simple type from getTypeLegalizationCost.
However, many callers depend upon this being a simple type and will
crash if not. This patch changes getTypeLegalizationCost to ensure
that we always a return sensible simple VT. If the vector type
contains unusual integer types, e.g. <vscale x 2 x i3>, then we just
set the type to MVT::i64 as a reasonable default.

A test has been added here that demonstrates the vectoriser can
correctly calculate the cost of vectorising a "zext i3 to i64"
instruction with a VF=vscale x 1:

Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll

Differential Revision: https://reviews.llvm.org/D113777

[AMDGPU] Generate test checks for mad_64_32.ll

Differential Revision: https://reviews.llvm.org/D113985

[DAG] SimplifyDemandedVectorElts - zero_extend_vector_inreg(and(x,c)) -> and(x,c')

If we've only demanded the 0'th element, and it comes from a (one-use) AND, try to convert the zero_extend_vector_inreg into a mask and constant fold it with the AND.

[fir] !fir.tdesc type conversion

Add !fir.tdesc type conversion.
!fir.tdesc is converted to a llvm.ptr<i8>.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D113769

Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>

[Analysis] Fix getNumberOfParts to return 0 when the answer is unknown

When asking how many parts are required for a scalable vector type
there are occasions when it cannot be computed. For example, <vscale x 1 x i3>
is one such vector for AArch64+SVE because at the moment no matter how we
promote the i3 type we never end up with a legal vector. This means
that getTypeConversion returns TypeScalarizeScalableVector as the
LegalizeKind, and then getTypeLegalizationCost returns an invalid cost.
This then causes BasicTTImpl::getNumberOfParts to dereference an invalid
cost, which triggers an assert. This patch changes getNumberOfParts to
return 0 for such cases, since the definition of getNumberOfParts in
TargetTransformInfo.h states that we can use a return value of 0 to represent
an unknown answer.

Currently, LoopVectorize.cpp is the only place where we need to check for
0 as a return value, because all other instances will not currently
ask for the number of parts for <vscale x 1 x iX> types.

In addition, I have changed the target-independent interface for
getNumberOfParts to return 1 and assume there is a single register
that can fit the type. The loop vectoriser has lots of tests that are
target-independent and they relied upon the 0 value to mean the
answer is known and that we are not scalarising the vector.

I have added tests here that show we correctly return an invalid cost
for VF=vscale x 1 when the loop contains unusual types such as i7:

Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll

Differential Revision: https://reviews.llvm.org/D113772

[DebugInfo][NFC] Force some tests to not use instruction-referencing

There are various tests that need to be adjusted to test the right
thing with instruction referencing -- usually because the internal
representation of variables is different, sometimes that location lists
change. This patch makes a bunch of tests explicitly not use
instruction referencing, so that a check-llvm test with instruction
referencing on for x86_64 doesn't fail. I'll then convert the tests
to have instr-ref CHECK lines, and similar.

Differential Revision: https://reviews.llvm.org/D113194

[Thumb2] Regenerate test impacted by e8b55cf7b70a695d158d.

[fir] Add fir.box_tdesc conversion

This patch adds the conversion pattern for
`fir.box_tdes`.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D113931

Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>

[SCEV] Apply loop guards when computing max BTC for arbitrary steps.

Similar other cases in the current function (e.g. when the step is 1 or
-1), applying loop guards can lead to tighter upper bounds for the
backedge-taken counts.

Fixes PR52464.

Reviewed By: reames, nikic

Differential Revision: https://reviews.llvm.org/D113578

[lldb/test] TestRegisterVariables test fix

Revert "[runtimes] Fix building initial libunwind+libcxxabi+libcxx with compiler implied -lunwind"

This reverts commit 7c3d19ab7bcb79636bd65ee55a0fefef224fcb25.

This commit was reported as causing build problems for the amdgpu
buildbot in https://reviews.llvm.org/D113253#3137097.

[IR] Change CreateStepVector to work with element types smaller than i8

Currently the stepvector intrinsic only supports element types that
are integers of size 8 bits or more. This patch adds support for the
creation of stepvectors with smaller element types by creating
the intrinsic with i8 elements that we then truncate to the requested
size.

It's not currently possible to write a vectoriser test to exercise
this code path so I have added a unit test here:

llvm/unittests/IR/IRBuilderTest.cpp

Differential Revision: https://reviews.llvm.org/D113767

[RISCV] Add extra -early-live-intervals test coverage

Add test coverage for a problem that was fixed by D113493: when updating
live intervals, fix handling of live ranges that were previously tied to
an early-clobber def but no longer are.

[CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress

Delegate updating of LiveIntervals to each target's
convertToThreeAddress implementation, instead of repairing LiveIntervals
after the fact in TwoAddressInstruction::convertInstTo3Addr.

Differential Revision: https://reviews.llvm.org/D113493

[fir] Add conversion patterns for slice, shape, shapeshift and shift ops

The information in these perations is used by other operation.
At this point they should not have anymore uses.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D113971

Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>

[libc][benchmark] add memmove to size distribution, also update other distributions

Differential Revision: https://reviews.llvm.org/D113260

[InstCombine] Use SpecificBinaryOp_match in two more places

Differential Revision: https://reviews.llvm.org/D114038

[NFC][X86][Costmodel] Improve test coverage for i32->i64 vector *ext

[X86][Costmodel] `*ext v64i1 to v32i16` can appear after legalization, cost is same as for `*ext v32i1 to v32i16`

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113914

[X86][Costmodel] `trunc v32i16 to v64i1` can appear after legalization, cost is same as for `trunc v32i16 to v32i1`

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113913

[SelectionDAG] Make WidenVecRes_SELECT work for scalable vectors

This change make WidenVecRes_SELECT work for scalable vectors.

This patch is split from [D110319](https://reviews.llvm.org/D110319)

Signed-off-by: Eric Tang <tangxingxin1008@gmail.com>
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D110388

[lldb/test] Added lldbutil function to test a breakpoint

Testing the breakpoint itself rather than the lldb string.

Differential Revision: https://reviews.llvm.org/D111899

[libcxx] [ci] Add CI configurations for MinGW

Mention support for MinGW in the docs. Rename the existing windows
CI jobs to Clang-cl, as both Clang-cl and MinGW are equally much
"Windows", just different toolchain environments.

Add an XFAIL for a recently added test that fails in the MinGW DLL
configuration (with an explanation of what's causing the failure).

Differential Revision: https://reviews.llvm.org/D112215

[libc] Fix incorrect revert of 1ee3205

The revert, b2fbd45d2395f1f6ef39db72b7156724fc101e40, incorrectly
re-introduced a few lines removed in 7c3d19ab7bcb79636bd65ee55a0fefef224fcb25

[flang] Remove default argument from function template specialization. NFC.

Patch D113697 added default function arguments to template specializations of `ConvertToBinary`.

According to https://en.cppreference.com/w/cpp/language/template_specialization this not allowed:
> Default function arguments cannot be specified in explicit specializations of function templates, member function templates, and member functions of class templates when the class is implicitly instantiated.

It happens to compile with gcc, clang and msvc 14.30 (Visual Studio 2022), but not msvc 14.29 (Visual Studio 2020). Even for the compilers that syntactically accept it, the default argument will never be used (only the default argument of the template declaration). From https://en.cppreference.com/w/cpp/language/function_template
> Note that only non-template and primary template overloads participate in overload resolution.

That is, the explicit function template specialization is not added to the overload candidate set. Only after all the parameter types are known, are the explicit specializations chosen, at which point the default function argument is ignored.

Also see D85657.

Reviewed By: klausler

Differential Revision: https://reviews.llvm.org/D114032

[X86][FP16] add alias for f*mul_*ch intrinsics

*_mul_*ch is to align with *_mul_*s, *_mul_*d and *_mul_*h.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D112777

[scudo] Handle mallinfo2

mallinfo is deprecated by GLIBC

Reviewed By: cryptoad

Differential Revision: https://reviews.llvm.org/D113951

ADT: Adding a key_type definition to MapVector

The key_type type definition for map containers is useful in some
generic, template-based programming scenarios. The addition of key_type
to MapVector is consistent with other map types like DenseMap.

Differential Revision: https://reviews.llvm.org/D113242

[MachO] Move type size asserts to source files. NFC

As discussed in https://reviews.llvm.org/D113809#3128636. It's a bit
unfortunate to move the asserts away from the structs whose sizes
they're checking, but it's a far better developer experience when one of
the asserts is violated, because you get a single error instead of every
single source file including the header erroring out.

Revert "[gn build] (manually) port 1ee32055ea1d (benchmark move)"

1ee32055ea1d was reverted in 67de95b8c955.

This reverts commit a8e8e2d5a22636dba2f38d5eda7440a31933e8be
and follow-up a0dc6001df20fcd92e8eb9e196d403859f1b5192

[lld-macho][nfc] Sanity check on template type

Differential Revision: https://reviews.llvm.org/D114044

[mlir] Fix formatting in Ops.td files (NFC)

MemRefOps.td has some inconsistencies in its formatting of argument
lists.

Revert "[libc][NFC][Obvious] Fix the benchmarks after the switch to llvm/third-party"

This reverts commit 39e9f5d3685f3cfca0df072928ad96d973704dff.

Reverting, as we needed to re-revert the benchmarks move because it was
causing a build failure in the Fuchsia bots due to the way they consume
libcxx's CMakeLists. I want to make sure I understand where the fix
should be for that. After that, I'll incorporate the change here in the
re-reland.

[Bazel] Ignore both old and new benchmark directories

This is getting reverted and relanded a lot, breaking the build each
time.

Differential Revision: https://reviews.llvm.org/D114043

Revert "Make it possible for lldb to launch a remote binary with no local file."

The reworking of the gdb client tests into the PlatformClientTestBase broke
the test for this. I did the mutatis mutandis for the move, but the test
still fails. Reverting till I have time to figure out why.

This reverts commit b715b79d54d5ca2d4e8c91089b8f6a9389d9dc48.

[mlir][lsp] Use ResultGroupDefinition struct

This struct was added and was intended to be used, but it was missed in the original patch.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D114041

Revert "Reland "[benchmarks] Move libcxx's fork of google/benchmark and llvm/utils'"""

This reverts commit 1ee32055ea1dd4db70d1939cbd4f5105c2dce160.

We hit additional bot failures; in particular, Fuchsia's seems to be
related to how CMakeLists are ingested, see https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8830380874445931681/overview

[MachO] Shrink reloc from 32 bytes to 24 bytes

The `r_address` field of `relocation_info` is only 4 bytes, so our
offset field (which is the `r_address` field adjusted for subsection
splitting) also only needs to be 4 bytes. This reduces the structure
size from 32 bytes to 24 bytes.

Combined with https://reviews.llvm.org/D113813, this is a minor perf
improvement for linking an internal app, tested on two machines:

```
           smol-relocs     baseline        difference (95% CI)
sys_time   7.367 ± 0.138   7.543 ± 0.157   [  +0.9% ..   +3.8%]
user_time  21.843 ± 0.351  21.861 ± 0.450  [  -1.3% ..   +1.4%]
wall_time  20.301 ± 0.307  20.556 ± 0.324  [  +0.1% ..   +2.4%]
samples    16              16

           smol-relocs     baseline        difference (95% CI)
sys_time   2.923 ± 0.050   2.992 ± 0.018   [  +1.4% ..   +3.4%]
user_time  10.345 ± 0.039  10.448 ± 0.023  [  +0.8% ..   +1.2%]
wall_time  12.068 ± 0.071  12.229 ± 0.021  [  +1.0% ..   +1.7%]
samples    15              12
```

More importantly though, this change by itself reduces our maximum
resident set size by 220 MB (2.75%, from 7.85 GB to 7.64 GB) on the
first machine. On the second machine, it reduces it by 125 MB (1.94%,
from 6.31 GB to 6.19 GB).

Reviewed By: #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D113818

[MachO] Reduce size of Symbol and Defined

We can lay out Symbol more optimally to reduce its size from 56 bytes to
48 bytes by eliminating unnecessary padding, and we can lay out Defined
such that its bitfield members are placed in the tail padding of Symbol
(on ABIs which support this), to reduce it from 96 bytes to 80 bytes (8
bytes from the Symbol reduction, and 8 bytes from the tail padding
reuse).

This is perf-neutral for an internal app (results from two different
machines):

```
           smol-syms       baseline        difference (95% CI)
sys_time   7.430 ± 0.202   7.440 ± 0.193   [  -2.6% ..   +2.9%]
user_time  21.443 ± 0.513  21.206 ± 0.396  [  -3.3% ..   +1.1%]
wall_time  20.453 ± 0.534  20.222 ± 0.488  [  -3.7% ..   +1.5%]
samples    9               8

           smol-syms       baseline        difference (95% CI)
sys_time   3.011 ± 0.050   3.040 ± 0.052   [  -0.4% ..   +2.3%]
user_time  10.416 ± 0.075  10.496 ± 0.091  [  +0.1% ..   +1.4%]
wall_time  12.229 ± 0.144  12.354 ± 0.192  [  -0.1% ..   +2.1%]
samples    14              13
```

However, on the first machine, it reduces maximum resident set size by
65.9 MB (0.8%, from 7.92 GB to 7.85 GB). On the second machine, it
reduces it by 92 MB (1.4%, from 6.40 GB to 6.31 GB).

Reviewed By: #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D113813

[MachO] Fix struct size assertion

It was checking for 64-bit builds incorrectly. Unfortunately,
ConcatInputSection has grown a bit in the meantime, and I don't see any
obvious way to shrink it. Perhaps icfEqClass could use 32-bit hashes
instead of 64-bit ones, but xxHash64 is supposed to be much faster than
xxHash32 (https://github.com/Cyan4973/xxHash#benchmarks), so that sounds
like a loss. (Unrelatedly, we should really look at using XXH3 instead
of xxHash64 now.)

Reviewed By: #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D113809

Make it possible for lldb to launch a remote binary with no local file.

We don't actually need a local copy of the main executable to debug
a remote process. So instead of treating "no local module" as an error,
see if the LaunchInfo has an executable it wants lldb to use, and if so
use it. Then report whatever error the remote server returns.

Differential Revision: https://reviews.llvm.org/D113521

[gn build] (manually) port 1ee32055ea1d more (benchmark move)

The new benchmark dep apparently has more source files than the old one.

[gn build] (manually) port 1ee32055ea1d (benchmark move)