review.tizen.org Git - platform/upstream/llvm.git/log

Revert "[NFC][LSAN] Move ThreadCreate into child thread"

https://bugs.chromium.org/p/chromium/issues/detail?id=1445676

This reverts commit 6d7b26ae49b9273d9aea4e53a96901caeb09efe0.

Revert "[flang] Add check for constraints on event-stmts"

This reverts commit 9725c740fbe7841a7aed57ca35f83d28aac1814c.

[SVE ACLE] Change the lowering of SVE integer builtins

Change the lowering of SVE integer mla_x/mls_x and mad_x/msb_x
builtins to use dedicated undef (_u) intrinsics.

Differential Revision: https://reviews.llvm.org/D150553

Correct documentation for -fconstexpr-depth=

We were documenting that this was about recursive calls when it's
actually about arbitrary calls.

e.g., https://godbolt.org/z/en8sYd77E
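
As a rough illustration of the distinction, the sketch below nests three distinct constexpr functions with no recursion at all. The function names are hypothetical and this is not the code behind the godbolt link. Under the corrected wording, the nested calls here would count toward the -fconstexpr-depth= limit even though no function calls itself.

```cpp
#include <cassert>

// Hypothetical names; none of these functions calls itself.
constexpr int innermost(int x) { return x + 1; }
constexpr int middle(int x) { return innermost(x) * 2; }
constexpr int outermost(int x) { return middle(x) - 3; }

// Evaluating this nests three calls: outermost -> middle -> innermost.
static_assert(outermost(4) == 7, "evaluated at compile time");
```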

[Flang][OpenMP][Semantics] Added missing HostAssoc check for use_device_ptr test.

This check was missed in the previous commit, so it is added in a separate commit.

Reviewed By: raghavendhra

Differential Revision: https://reviews.llvm.org/D150626

[libc] Add optimized memcmp for RISCV

This patch adds two versions of `memcmp` optimized for architectures where unaligned accesses are either illegal or extremely slow.
It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.

Here is the before / after output of `libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_memcmp` on a quad core Linux starfive RISCV 64 board running at 1.5GHz.

Before
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
  L1 Instruction 32 KiB (x4)
  L1 Data 32 KiB (x4)
  L2 Unified 2048 KiB (x1)
----------------------------------------------------------------------
Benchmark            Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------
BM_Memcmp/0/0        110 ns         66.4 ns     10404864 bytes_per_cycle=0.107646/s bytes_per_second=153.989M/s items_per_second=15.071M/s __llvm_libc::memcmp,memcmp Google A
BM_Memcmp/1/0        318 ns          211 ns      3026944 bytes_per_cycle=0.131539/s bytes_per_second=188.167M/s items_per_second=4.73691M/s __llvm_libc::memcmp,memcmp Google B
BM_Memcmp/2/0        204 ns          115 ns      6118400 bytes_per_cycle=0.121675/s bytes_per_second=174.058M/s items_per_second=8.70241M/s __llvm_libc::memcmp,memcmp Google D
BM_Memcmp/3/0        143 ns         99.6 ns      7013376 bytes_per_cycle=0.117974/s bytes_per_second=168.763M/s items_per_second=10.0437M/s __llvm_libc::memcmp,memcmp Google L
BM_Memcmp/4/0       81.3 ns         58.2 ns     11426816 bytes_per_cycle=0.101125/s bytes_per_second=144.661M/s items_per_second=17.1805M/s __llvm_libc::memcmp,memcmp Google M
BM_Memcmp/5/0        177 ns          118 ns      5952512 bytes_per_cycle=0.120612/s bytes_per_second=172.537M/s items_per_second=8.45549M/s __llvm_libc::memcmp,memcmp Google Q
BM_Memcmp/6/0        342 ns          220 ns      3483648 bytes_per_cycle=0.132004/s bytes_per_second=188.834M/s items_per_second=4.54739M/s __llvm_libc::memcmp,memcmp Google S
BM_Memcmp/7/0        208 ns          130 ns      5681152 bytes_per_cycle=0.12468/s bytes_per_second=178.356M/s items_per_second=7.6674M/s __llvm_libc::memcmp,memcmp Google U
BM_Memcmp/8/0        123 ns         79.1 ns      8387584 bytes_per_cycle=0.110593/s bytes_per_second=158.204M/s items_per_second=12.6439M/s __llvm_libc::memcmp,memcmp Google W
BM_Memcmp/9/0      20707 ns        10643 ns        67584 bytes_per_cycle=0.142401/s bytes_per_second=203.707M/s items_per_second=93.9559k/s __llvm_libc::memcmp,uniform 384 to 4096
```

After
```
BM_Memcmp/0/0       80.4 ns         55.8 ns     12648448 bytes_per_cycle=0.132703/s bytes_per_second=189.834M/s items_per_second=17.9256M/s __llvm_libc::memcmp,memcmp Google A
BM_Memcmp/1/0        140 ns         80.5 ns      8230912 bytes_per_cycle=0.337273/s bytes_per_second=482.474M/s items_per_second=12.4165M/s __llvm_libc::memcmp,memcmp Google B
BM_Memcmp/2/0        101 ns         66.4 ns     10571776 bytes_per_cycle=0.208539/s bytes_per_second=298.317M/s items_per_second=15.0687M/s __llvm_libc::memcmp,memcmp Google D
BM_Memcmp/3/0        118 ns         67.6 ns     10533888 bytes_per_cycle=0.176822/s bytes_per_second=252.946M/s items_per_second=14.7946M/s __llvm_libc::memcmp,memcmp Google L
BM_Memcmp/4/0        106 ns         53.0 ns     12722176 bytes_per_cycle=0.111141/s bytes_per_second=158.988M/s items_per_second=18.8591M/s __llvm_libc::memcmp,memcmp Google M
BM_Memcmp/5/0        141 ns         70.2 ns     10436608 bytes_per_cycle=0.26032/s bytes_per_second=372.39M/s items_per_second=14.2458M/s __llvm_libc::memcmp,memcmp Google Q
BM_Memcmp/6/0        144 ns         79.3 ns      8932352 bytes_per_cycle=0.353168/s bytes_per_second=505.211M/s items_per_second=12.612M/s __llvm_libc::memcmp,memcmp Google S
BM_Memcmp/7/0        123 ns         71.7 ns      9945088 bytes_per_cycle=0.22143/s bytes_per_second=316.758M/s items_per_second=13.9421M/s __llvm_libc::memcmp,memcmp Google U
BM_Memcmp/8/0       97.0 ns         56.2 ns     12509184 bytes_per_cycle=0.160526/s bytes_per_second=229.635M/s items_per_second=17.7784M/s __llvm_libc::memcmp,memcmp Google W
BM_Memcmp/9/0       1840 ns          989 ns       676864 bytes_per_cycle=1.4894/s bytes_per_second=2.08067G/s items_per_second=1010.92k/s __llvm_libc::memcmp,uniform 384 to 4096
```

glibc
```
BM_Memcmp/0/0       72.6 ns         51.7 ns     12963840 bytes_per_cycle=0.141261/s bytes_per_second=202.075M/s items_per_second=19.3246M/s glibc::memcmp,memcmp Google A
BM_Memcmp/1/0        118 ns         75.2 ns      9280512 bytes_per_cycle=0.354054/s bytes_per_second=506.478M/s items_per_second=13.3046M/s glibc::memcmp,memcmp Google B
BM_Memcmp/2/0        114 ns         62.9 ns     11152384 bytes_per_cycle=0.222675/s bytes_per_second=318.539M/s items_per_second=15.8943M/s glibc::memcmp,memcmp Google D
BM_Memcmp/3/0       84.0 ns         63.5 ns     11030528 bytes_per_cycle=0.186353/s bytes_per_second=266.581M/s items_per_second=15.7378M/s glibc::memcmp,memcmp Google L
BM_Memcmp/4/0       93.5 ns         51.2 ns     13462528 bytes_per_cycle=0.119215/s bytes_per_second=170.539M/s items_per_second=19.5384M/s glibc::memcmp,memcmp Google M
BM_Memcmp/5/0        123 ns         61.7 ns     11376640 bytes_per_cycle=0.225262/s bytes_per_second=322.239M/s items_per_second=16.1993M/s glibc::memcmp,memcmp Google Q
BM_Memcmp/6/0        122 ns         71.6 ns      9967616 bytes_per_cycle=0.380844/s bytes_per_second=544.802M/s items_per_second=13.9579M/s glibc::memcmp,memcmp Google S
BM_Memcmp/7/0        118 ns         65.6 ns     10555392 bytes_per_cycle=0.238677/s bytes_per_second=341.43M/s items_per_second=15.2334M/s glibc::memcmp,memcmp Google U
BM_Memcmp/8/0       90.4 ns         54.0 ns     12920832 bytes_per_cycle=0.161987/s bytes_per_second=231.724M/s items_per_second=18.5169M/s glibc::memcmp,memcmp Google W
BM_Memcmp/9/0       1045 ns          601 ns      1195008 bytes_per_cycle=2.53677/s bytes_per_second=3.54383G/s items_per_second=1.66423M/s glibc::memcmp,uniform 384 to 4096
```

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D150663

[lldb][NFCI] Small adjustment to Breakpoint::AddName

m_name_list is a std::unordered_set<std::string>, we can insert the
string directly instead of grabbing the c_str and creating yet another
one.
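
A minimal sketch of the idea, using a hypothetical NameList class rather than the real Breakpoint code: when the container already stores std::string, the string can be inserted as-is.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a class that keeps a set of names.
class NameList {
public:
  // Inserts the string directly: no c_str() round trip, no extra temporary.
  // Returns true if the name was newly added.
  bool AddName(const std::string &name) { return m_names.insert(name).second; }

  bool HasName(const std::string &name) const {
    return m_names.count(name) != 0;
  }

private:
  std::unordered_set<std::string> m_names;
};
```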

[AArch64] Combine add(extract v1i64) into v1i64 add

This helps fix a regression from D148309 where a shift + add was no longer
combined into an ssra. It looks for adds with v1i64 extract operands and
converts them to v1i64 adds. The other operand needs to be something that is
easily converted to a v1i64, in this case it currently just checks for a load.

Some of the code in performAddSubCombine has been cleaned up whilst I was here.

Differential Revision: https://reviews.llvm.org/D148311

[flang] Parenthesize RHS arguments to defined assignments (bug #62599)

The right-hand sides of assignment statements are always expressions,
never variables. When an assignment statement is converted into a call
to a defined assignment subroutine, and the actual argument being associated
with the second dummy argument is a variable, and the dummy argument does
not have the VALUE attribute, wrap it with parentheses so that lowering
will pass it by means of a temporary.

Fixes https://github.com/llvm/llvm-project/issues/62599.

Differential Revision: https://reviews.llvm.org/D150331

[libc] Add optimized bcmp for RISCV

This patch adds two versions of bcmp optimized for architectures where unaligned accesses are either illegal or extremely slow.
It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.
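
The general technique can be sketched as follows. This is not the code from the patch: the function name is hypothetical, and the 32-bit word size and the head/body/tail split are assumptions made for illustration. Word-sized comparisons are used only once both pointers are aligned, and everything else is compared byte by byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: returns 0 if the buffers are equal, nonzero otherwise.
int bcmp_aligned_sketch(const void *lhs, const void *rhs, std::size_t n) {
  auto *a = static_cast<const unsigned char *>(lhs);
  auto *b = static_cast<const unsigned char *>(rhs);
  const std::size_t word = sizeof(std::uint32_t);
  // Word loads are only possible when both pointers reach alignment together.
  const bool same_phase = reinterpret_cast<std::uintptr_t>(a) % word ==
                          reinterpret_cast<std::uintptr_t>(b) % word;
  if (same_phase) {
    // Head: compare single bytes until the pointers are word aligned.
    while (n > 0 && reinterpret_cast<std::uintptr_t>(a) % word != 0) {
      if (*a != *b) return 1;
      ++a; ++b; --n;
    }
    // Body: both pointers are aligned, so these copies are aligned word loads.
    while (n >= word) {
      std::uint32_t x, y;
      std::memcpy(&x, a, word);
      std::memcpy(&y, b, word);
      if (x != y) return 1;
      a += word; b += word; n -= word;
    }
  }
  // Tail, or the whole buffer when the pointers are misaligned relative
  // to each other: compare the remaining bytes one at a time.
  for (; n > 0; ++a, ++b, --n)
    if (*a != *b) return 1;
  return 0;
}
```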

Here is the before / after output of libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Bcmp on a quad core Linux starfive RISCV 64 board running at 1.5GHz.

Before
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
  L1 Instruction 32 KiB (x4)
  L1 Data 32 KiB (x4)
  L2 Unified 2048 KiB (x1)
Load Average: 7.03, 5.98, 3.71
----------------------------------------------------------------------
Benchmark            Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------
BM_Bcmp/0/0        102 ns         60.5 ns     11662336 bytes_per_cycle=0.122696/s bytes_per_second=175.518M/s items_per_second=16.5258M/s __llvm_libc::bcmp,memcmp Google A
BM_Bcmp/1/0        328 ns          172 ns      3737600 bytes_per_cycle=0.15256/s bytes_per_second=218.238M/s items_per_second=5.80575M/s __llvm_libc::bcmp,memcmp Google B
BM_Bcmp/2/0        199 ns         99.7 ns      7019520 bytes_per_cycle=0.141897/s bytes_per_second=202.986M/s items_per_second=10.032M/s __llvm_libc::bcmp,memcmp Google D
BM_Bcmp/3/0        173 ns         86.5 ns      8361984 bytes_per_cycle=0.13863/s bytes_per_second=198.312M/s items_per_second=11.5669M/s __llvm_libc::bcmp,memcmp Google L
BM_Bcmp/4/0        105 ns         51.8 ns     13213696 bytes_per_cycle=0.116399/s bytes_per_second=166.51M/s items_per_second=19.2931M/s __llvm_libc::bcmp,memcmp Google M
BM_Bcmp/5/0        167 ns         93.9 ns      7853056 bytes_per_cycle=0.139432/s bytes_per_second=199.459M/s items_per_second=10.6503M/s __llvm_libc::bcmp,memcmp Google Q
BM_Bcmp/6/0        262 ns          165 ns      3931136 bytes_per_cycle=0.151516/s bytes_per_second=216.745M/s items_per_second=6.07091M/s __llvm_libc::bcmp,memcmp Google S
BM_Bcmp/7/0        168 ns          105 ns      6665216 bytes_per_cycle=0.143159/s bytes_per_second=204.791M/s items_per_second=9.52163M/s __llvm_libc::bcmp,memcmp Google U
BM_Bcmp/8/0        108 ns         68.0 ns     10175488 bytes_per_cycle=0.125504/s bytes_per_second=179.535M/s items_per_second=14.701M/s __llvm_libc::bcmp,memcmp Google W
BM_Bcmp/9/0      15371 ns         9007 ns        78848 bytes_per_cycle=0.166128/s bytes_per_second=237.648M/s items_per_second=111.031k/s __llvm_libc::bcmp,uniform 384 to 4096
```

After
```
BM_Bcmp/0/0       74.2 ns         49.7 ns     14306304 bytes_per_cycle=0.148927/s bytes_per_second=213.042M/s items_per_second=20.1101M/s __llvm_libc::bcmp,memcmp Google A
BM_Bcmp/1/0        108 ns         68.1 ns     10350592 bytes_per_cycle=0.411197/s bytes_per_second=588.222M/s items_per_second=14.6849M/s __llvm_libc::bcmp,memcmp Google B
BM_Bcmp/2/0       80.2 ns         56.0 ns     12386304 bytes_per_cycle=0.258588/s bytes_per_second=369.912M/s items_per_second=17.8585M/s __llvm_libc::bcmp,memcmp Google D
BM_Bcmp/3/0       92.4 ns         55.7 ns     12555264 bytes_per_cycle=0.206835/s bytes_per_second=295.88M/s items_per_second=17.943M/s __llvm_libc::bcmp,memcmp Google L
BM_Bcmp/4/0       79.3 ns         46.8 ns     14288896 bytes_per_cycle=0.125872/s bytes_per_second=180.061M/s items_per_second=21.3611M/s __llvm_libc::bcmp,memcmp Google M
BM_Bcmp/5/0       98.0 ns         57.9 ns     12232704 bytes_per_cycle=0.268815/s bytes_per_second=384.543M/s items_per_second=17.2711M/s __llvm_libc::bcmp,memcmp Google Q
BM_Bcmp/6/0        132 ns         65.5 ns     10474496 bytes_per_cycle=0.417246/s bytes_per_second=596.875M/s items_per_second=15.2673M/s __llvm_libc::bcmp,memcmp Google S
BM_Bcmp/7/0        101 ns         60.9 ns     11505664 bytes_per_cycle=0.253733/s bytes_per_second=362.968M/s items_per_second=16.4202M/s __llvm_libc::bcmp,memcmp Google U
BM_Bcmp/8/0       72.5 ns         50.2 ns     14082048 bytes_per_cycle=0.183262/s bytes_per_second=262.158M/s items_per_second=19.9271M/s __llvm_libc::bcmp,memcmp Google W
BM_Bcmp/9/0        852 ns          803 ns       854016 bytes_per_cycle=1.85028/s bytes_per_second=2.58481G/s items_per_second=1.24597M/s __llvm_libc::bcmp,uniform 384 to 4096
```

For comparison with glibc
```
BM_Bcmp/0/0        106 ns         52.6 ns     12906496 bytes_per_cycle=0.142072/s bytes_per_second=203.235M/s items_per_second=19.0271M/s glibc::bcmp,memcmp Google A
BM_Bcmp/1/0        132 ns         77.1 ns      8905728 bytes_per_cycle=0.365072/s bytes_per_second=522.239M/s items_per_second=12.9782M/s glibc::bcmp,memcmp Google B
BM_Bcmp/2/0        122 ns         62.3 ns     10909696 bytes_per_cycle=0.222667/s bytes_per_second=318.527M/s items_per_second=16.0563M/s glibc::bcmp,memcmp Google D
BM_Bcmp/3/0       99.5 ns         64.2 ns     11074560 bytes_per_cycle=0.185126/s bytes_per_second=264.825M/s items_per_second=15.5674M/s glibc::bcmp,memcmp Google L
BM_Bcmp/4/0       86.6 ns         50.2 ns     13488128 bytes_per_cycle=0.117941/s bytes_per_second=168.717M/s items_per_second=19.9053M/s glibc::bcmp,memcmp Google M
BM_Bcmp/5/0        106 ns         61.4 ns     11344896 bytes_per_cycle=0.248968/s bytes_per_second=356.151M/s items_per_second=16.284M/s glibc::bcmp,memcmp Google Q
BM_Bcmp/6/0        145 ns         71.9 ns     10046464 bytes_per_cycle=0.389814/s bytes_per_second=557.633M/s items_per_second=13.9019M/s glibc::bcmp,memcmp Google S
BM_Bcmp/7/0        119 ns         65.6 ns     10718208 bytes_per_cycle=0.243756/s bytes_per_second=348.696M/s items_per_second=15.2329M/s glibc::bcmp,memcmp Google U
BM_Bcmp/8/0       86.4 ns         54.5 ns     13250560 bytes_per_cycle=0.154831/s bytes_per_second=221.488M/s items_per_second=18.3532M/s glibc::bcmp,memcmp Google W
BM_Bcmp/9/0       1090 ns          604 ns      1186816 bytes_per_cycle=2.53848/s bytes_per_second=3.54622G/s items_per_second=1.65598M/s glibc::bcmp,uniform 384 to 4096
```

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D150567

[DWARFLinker][DWARFv5] Add handling of DW_OP_addrx and DW_OP_constx expression operands.

This patch adds handling of DW_OP_addrx and DW_OP_constx expression operands.
In --update case these operands are preserved as is. Otherwise they are
converted into the DW_OP_addr and DW_OP_const[*]u correspondingly.
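
A simplified sketch of the conversion, not the DWARFLinker implementation: the Op struct and resolve_indexed_op are hypothetical, the .debug_addr table is modelled as a plain vector, and choosing between DW_OP_const4u and DW_OP_const8u by address size is an assumption of this sketch. The opcode values are those given in the DWARF v5 specification.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Opcode values as listed in the DWARF v5 specification.
constexpr std::uint8_t DW_OP_addr = 0x03;
constexpr std::uint8_t DW_OP_const4u = 0x0c;
constexpr std::uint8_t DW_OP_const8u = 0x0e;
constexpr std::uint8_t DW_OP_addrx = 0xa1;
constexpr std::uint8_t DW_OP_constx = 0xa2;

// Hypothetical, simplified view of one expression operation.
struct Op {
  std::uint8_t opcode;
  std::uint64_t operand;
};

// Hypothetical helper. In update mode the operation is preserved as is;
// otherwise an indexed operand is replaced by the value it refers to.
Op resolve_indexed_op(Op op, const std::vector<std::uint64_t> &debug_addr,
                      unsigned addr_size, bool update) {
  if (update || (op.opcode != DW_OP_addrx && op.opcode != DW_OP_constx))
    return op;
  assert(op.operand < debug_addr.size() && "index outside the address table");
  const std::uint64_t value = debug_addr[op.operand];
  if (op.opcode == DW_OP_addrx)
    return Op{DW_OP_addr, value};
  // Assumption: the constant form is picked from the address size.
  return Op{addr_size == 4 ? DW_OP_const4u : DW_OP_const8u, value};
}
```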

Differential Revision: https://reviews.llvm.org/D147066

[clang] Convert a few OpenMP tests to opaque pointers

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D150680

[clang] Convert a few OpenMP tests to opaque pointers

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D150682

[mlir][sparse] Add a helper class to help lowering operations with/without function calls

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D150477

[flang] Apply default module accessibility rules a second time (bug#62598)

Apply the default PUBLIC/PRIVATE accessibility of a module to its symbols
a second time after it is known that all symbols, including implicitly typed
names from NAMELIST groups and specification expressions in module subprograms,
have been created in its scope.

Fixes https://github.com/llvm/llvm-project/issues/62598.

Differential Revision: https://reviews.llvm.org/D150307

Migrate {starts,ends}with_insensitive to {starts,ends}_with_insensitive (NFC)

This patch migrates uses of StringRef::{starts,ends}with_insensitive
to StringRef::{starts,ends}_with_insensitive so that we can use names
similar to those used in std::string_view.
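
For readers unfamiliar with these helpers, here is a standalone sketch of the same kind of check written against std::string_view (C++17). The free functions below are hypothetical stand-ins; llvm::StringRef provides the operations as member functions, and its implementation is not reproduced here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical free function: case-insensitive prefix test for ASCII text.
bool starts_with_insensitive(std::string_view text, std::string_view prefix) {
  if (prefix.size() > text.size())
    return false;
  for (std::size_t i = 0; i < prefix.size(); ++i) {
    // Cast to unsigned char first: std::tolower is undefined for negative values.
    if (std::tolower(static_cast<unsigned char>(text[i])) !=
        std::tolower(static_cast<unsigned char>(prefix[i])))
      return false;
  }
  return true;
}

// Hypothetical free function: case-insensitive suffix test, built on the above.
bool ends_with_insensitive(std::string_view text, std::string_view suffix) {
  if (suffix.size() > text.size())
    return false;
  return starts_with_insensitive(text.substr(text.size() - suffix.size()),
                                 suffix);
}
```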

Note that the llvm/ directory has migrated in commit
6c3ea866e93003e16fc55d3b5cedd3bc371d1fde.

I'll post a separate patch to deprecate
StringRef::{starts,ends}with_insensitive.

Differential Revision: https://reviews.llvm.org/D150506

[Hexagon] Fix HVX predicates on some intrinsic selection patterns

Instead of checking arch version, check HVX version when dealing with
HVX instructions.

[clangd][check] Print directory with compile flags

[flang] Don't mistakenly tokenize a Hollerith literal from "DO 100 H=..." (bug #58732)

After tokenizing an identifier, don't allow the next token to be a
Hollerith literal.

Fixes https://github.com/llvm/llvm-project/issues/58732.

Differential Revision: https://reviews.llvm.org/D150406

[Clang][Flang][OpenMP] Add loadOffloadInfoMetadata and createOffloadEntriesAndInfoMetadata into OMPIRBuilder's finalize and initialize

This allows the generation of OpenMP offload metadata for the OpenMP
dialect when lowering to LLVM-IR and moves some of the shared logic
between the OpenMP Dialect and Clang into the IRBuilder.

Reviewers: jsjodin, jdoerfert, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D148370

[flang] Fixed comparison for derived types constants.

The two constants should be equal only if their derived types
are the same. This fixes regression caused by D150380.

Differential Revision: https://reviews.llvm.org/D150634

[clangd] Fix test.

[RISCV] Rework how implied SP operands work in the disassembler. NFC

Previously we added the SP operands when an immediate operand was added
to certain opcodes.

This patch moves it to a post processing step using the information
in MCInstrDesc. This avoids an explicit opcode list in RISCVDisassembler.cpp.

I considered using a custom DecoderMethod, but the bit swizzling we
need to do for the immediates on these instructions made that
unattractive.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D149931

[bazel] Fix build after 0c4d7d14e94d

[lldb] Define lldbassert based on NDEBUG instead of LLDB_CONFIGURATION_DEBUG

Whether assertions are enabled or not is orthogonal to the build type
which could lead to surprising behavior for lldbassert. Previously, when
doing a debug build with assertions disabled, lldbassert would become a
NOOP, rather than printing an error like it does in a release build. By
defining lldbassert in terms of NDEBUG, it behaves like a regular
assert when assertions are enabled, and like a soft assert otherwise.
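
The pattern can be sketched with a hypothetical SOFT_ASSERT macro. The macro, the counter, and the reporting function are illustrative names only; the real lldbassert reporting path is not reproduced here.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical counter and reporter used only by this sketch.
static int soft_failure_count = 0;

inline void report_soft_failure(const char *expr, const char *file, int line) {
  ++soft_failure_count;
  std::fprintf(stderr, "soft assertion failed: %s (%s:%d)\n", expr, file, line);
}

#ifndef NDEBUG
// Assertions enabled: behave like a regular assert and abort on failure.
#define SOFT_ASSERT(expr) assert(expr)
#else
// Assertions disabled: report the failure and keep running.
#define SOFT_ASSERT(expr)                                                      \
  ((expr) ? static_cast<void>(0)                                               \
          : report_soft_failure(#expr, __FILE__, __LINE__))
#endif
```

Because the switch is NDEBUG rather than the build type, a debug build with assertions disabled takes the reporting branch instead of turning the check into a no-op.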

Differential revision: https://reviews.llvm.org/D150639

[llvm-objdump][X86] Add @plt symbols for .plt.got

If a symbol needs both JUMP_SLOT and GLOB_DAT relocations, there is a
minor linker optimization to keep just GLOB_DAT. This optimization
is only implemented by GNU ld's x86 port and mold.
https://maskray.me/blog/2021-08-29-all-about-global-offset-table#combining-.got-and-.got.plt

With the optimization, the PLT entry is placed in .plt.got and the
associated GOTPLT entry is placed in .got (ld.bfd -z now) or .got.plt (ld.bfd -z lazy).
The relocation is in .rel[a].dyn.

This patch synthesizes `symbol@plt` labels for these .plt.got entries.

Example:
```
cat > a.s <<e
.globl _start; _start:
mov combined0@gotpcrel(%rip), %rax; mov combined1@gotpcrel(%rip), %rax
call combined0@plt; call combined1@plt
call foo0@plt; call foo1@plt
e
cat > b.s <<e
.globl foo0, foo1, combined0, combined1
foo0: foo1: combined0: combined1:
e
gcc -fuse-ld=bfd -shared b.s -o b.so
gcc -fuse-ld=bfd -pie -nostdlib a.s b.so -o a
```

```
Disassembly of section .plt:

0000000000001000 <.plt>:
    1000: ff 35 ea 1f 00 00             pushq   0x1fea(%rip)            # 0x2ff0 <_GLOBAL_OFFSET_TABLE_+0x8>
    1006: ff 25 ec 1f 00 00             jmpq    *0x1fec(%rip)           # 0x2ff8 <_GLOBAL_OFFSET_TABLE_+0x10>
    100c: 0f 1f 40 00                   nopl    (%rax)

0000000000001010 <foo1@plt>:
    1010: ff 25 ea 1f 00 00             jmpq    *0x1fea(%rip)           # 0x3000 <_GLOBAL_OFFSET_TABLE_+0x18>
    1016: 68 00 00 00 00                pushq   $0x0
    101b: e9 e0 ff ff ff                jmp     0x1000 <.plt>

0000000000001020 <foo0@plt>:
    1020: ff 25 e2 1f 00 00             jmpq    *0x1fe2(%rip)           # 0x3008 <_GLOBAL_OFFSET_TABLE_+0x20>
    1026: 68 01 00 00 00                pushq   $0x1
    102b: e9 d0 ff ff ff                jmp     0x1000 <.plt>

Disassembly of section .plt.got:

0000000000001030 <combined0@plt>:
    1030: ff 25 a2 1f 00 00             jmpq    *0x1fa2(%rip)           # 0x2fd8 <foo1+0x2fd8>
    1036: 66 90                         nop

0000000000001038 <combined1@plt>:
    1038: ff 25 a2 1f 00 00             jmpq    *0x1fa2(%rip)           # 0x2fe0 <foo1+0x2fe0>
    103e: 66 90                         nop
```

For x86-32, with -z now, if we remove `foo0` and `foo1`, the absence of regular
PLT will cause GNU ld to omit .got.plt, and our code cannot synthesize @plt
labels. This is an extreme corner case that almost never happens in practice (to
trigger the case, ensure every PLT symbol has had its address taken). To fix it, we
can get the `_GLOBAL_OFFSET_TABLE_` symbol value, but the complexity is not
worth it.

Close https://github.com/llvm/llvm-project/issues/62537

Reviewed By: bd1976llvm

Differential Revision: https://reviews.llvm.org/D149817

[AArch64] Use correct IRBuilder in InstCombine hooks

These need to use the IRBuilder provided by InstCombine for proper
worklist management.

Add doc link to missing include diagnostics.

Differential Revision: https://reviews.llvm.org/D150668

[clang] Convert a couple of OpenMP tests to opaque pointers

This is a follow-up to D150608.

Revert "[libc] Add explicit constructor calls to fix compilation when using UInt<T>"

This reverts commit b663993067ffb5800632ad41ea7f2f92caab1093.

This caused a regression on aarch64:
https://lab.llvm.org/buildbot#builders/138/builds/43983

[libc] Add explicit constructor calls to fix compilation when using UInt<T>

This patch is similar to 86fe88c8d9 and adds several explicit
constructor calls (bool(...), uint64_t(...), uint8_t(...)) that are
needed when we use UInt<T> (in my case UInt<128> in riscv32).

This patch also adds two operators to UInt<T>:
* operator/= required by printf_core/float_hex_converter.h:148
* operator-- required by FPUtil/ManipulationFunctions.h:166

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D149594

[mlir][Linalg] Split vectorization tests

Split Linalg vectorization tests from "vectorization.mlir" across more
specialised test files:
  * vectorize-tensor-extract.mlir - tests for tensor.extract with no
    masking,
  * vectorize-tensor-extract-masked.mlir - tests for tensor.extract with
    masking,
  * vectorization-masked.mlir - all other tests that use masking,
  * vectorization.mlir - the remaining tests.

Differential Revision: https://reviews.llvm.org/D149843

[Driver] Support multi /guard: options

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D150645

[mlir][openacc] Add ReturnLike trait to acc.yield operation

Just add the trait as acc.yield is a return like op.

Reviewed By: razvanlupusoru, jeanPerier

Differential Revision: https://reviews.llvm.org/D150617

[AMDGPU][InferAddressSpaces] Only rewrite address-spaces that can be trivially casted to flat for llvm.amdgcn.flat.atomic.{fadd,fmax,fmin}

The intrinsic @llvm.amdgcn.flat.atomic.{fadd,fmax,fmin} can only be
selected for flat address spaces (constant, flat and global).

This patch restricts the cases over which GCNTTIImpl::rewriteIntrinsicWithAddressSpace
rewrites the intrinsic.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D149938

LangRef: Clarify behavior of minnum/maxnum

Make it clearer minnum(+0, +0) cannot return -0. Also remove
a note about the result always being quiet which is directly
contradicted by the following paragraph.
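
A small check of the signed-zero point, using std::fmin from the C++ standard library as a stand-in for minnum. That mapping is an assumption of this sketch, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// True when a value is zero with its sign bit set, i.e. -0.0.
bool is_negative_zero(double v) { return v == 0.0 && std::signbit(v); }

// std::fmin stands in for minnum here (an assumption of this sketch).
bool min_of_positive_zeros_is_positive() {
  // volatile keeps the operands from being folded away at compile time.
  volatile double a = 0.0;
  volatile double b = 0.0;
  return !is_negative_zero(std::fmin(a, b));
}
```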

GlobalOpt: Improve addrspacecast handling

Handle addrspacecast when looking at uses.

GlobalOpt: Add a test for addrspacecast coverage with alloc functions

AllUsesOfValueWillTrapIfNull could handle addrspacecast, but currently
doesn't.

Revert "[clang] Show line numbers in diagnostic code snippets"

This reverts commit e2917311f026cc445fa8aeefa0457b0c7a60824a.

This caused some problems with lldb testing the diagnostic output:
https://lab.llvm.org/buildbot/#/builders/68/builds/52754

[clang][docs] Fix sphinx bot

Breakage:
https://lab.llvm.org/buildbot/#/builders/92/builds/44222

[clang] Show line numbers in diagnostic code snippets

Show line numbers to the left of diagnostic code snippets and increase
the numbers of lines shown from 1 to 16.

Differential Revision: https://reviews.llvm.org/D147875

[LLD] Do not assume /guard:cf always set together with /guard:ehcont

MS link accepts *.obj with ehcont bit set only. LLD should match this
behavior too.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D150508

[DWARFLinker][DWARFv5] Add support for .debug_line_str table.

This patch adds support for DWARFv5 .debug_line_str table.
It replaces code generating line table. Instead of copying original
table and patching certain places this patch implements full line table
generation.

Differential Revision: https://reviews.llvm.org/D150554

[MemCpyOpt] Fix up debug loc for simplified memset in processMemSetMemCpyDependence

Make sure the code comments in processMemSetMemCpyDependence match
with the actual transform. They indicated that the memset being
rewritten was sunk to after a memcpy, while it actually is inserted
just before the memcpy.

Also make sure we use the debug location of the original memset
when creating the new simplified memset. In the past we've been
using the debug location for the memcpy which could be a bit
confusing.
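
At the source level, the shape of the transform can be sketched as below. The function names are hypothetical, and the real pass works on LLVM IR intrinsics rather than on library calls. The point illustrated is that the shortened memset is placed before the memcpy and only covers the bytes the memcpy does not overwrite.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Original sequence: fill all of dst, then overwrite its first src_size bytes.
void original_sequence(char *dst, std::size_t dst_size, const char *src,
                       std::size_t src_size, int c) {
  std::memset(dst, c, dst_size);
  std::memcpy(dst, src, src_size);
}

// Simplified sequence: the memset only fills the tail that the memcpy leaves
// untouched, and it is emitted just before the memcpy.
void simplified_sequence(char *dst, std::size_t dst_size, const char *src,
                         std::size_t src_size, int c) {
  const std::size_t tail = dst_size > src_size ? dst_size - src_size : 0;
  if (tail != 0)
    std::memset(dst + src_size, c, tail);
  std::memcpy(dst, src, src_size);
}
```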

Differential Revision: https://reviews.llvm.org/D135574

[AMDGPU] Avoid RegScavenger::forward in copyPhysReg/indirectCopyToAGPR

RegScavenger::backward is preferred because it does not rely on accurate
kill flags.

Differential Revision: https://reviews.llvm.org/D150571

Revert "[GlobalIsel][X86] Legalize G_BSWAP"

This reverts commit 5cafecf9f952818400fa32645695e79838f1bc2c.

Buildbots are not happy with the patch.

Lots of crashes and assertion failures such as

  llvm::LegalizeRuleSet &llvm::LegalizerInfo::getActionDefinitionsBuilder(
  std::initializer_list<unsigned int>): Assertion `Opcodes.size() >= 2 &&
  "Initializer list must have at least two opcodes"' failed.

tsan-rt: silence a -Wunused-const-variable

lsan-rt: silence a -Wformat-pedantic

asan-rt: silence some more -Wformat-pedantic's

[AIX] Fixed malformed big archive when total archive file size is larger than 4Gbytes

Summary:

1. We used the unsigned type for NextOffset, PrevOffset, GlobalSymbolOffset, and MemberTableSize, which caused a malformed big archive when the archive file size is larger than 4G.
2. Also fix an NFC comment on https://reviews.llvm.org/D142479#inline-1443927
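
The failure mode in item 1 can be reproduced in isolation. The sketch assumes the unsigned type is 32 bits wide, as it is on common targets; the helper names are hypothetical and not taken from the archive writer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: keep a file offset in a 32-bit or a 64-bit field.
std::uint32_t store_offset_32(std::uint64_t offset) {
  // Offsets of 4 GiB and above lose their upper bits here.
  return static_cast<std::uint32_t>(offset);
}

std::uint64_t store_offset_64(std::uint64_t offset) { return offset; }

constexpr std::uint64_t kFourGiB = 1ULL << 32;
```

An offset of kFourGiB + 16 stored through the 32-bit helper wraps around to 16, which is how a member offset in a large archive ends up pointing at the wrong place.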

Reviewers: James Henderson
Differential Revision: https://reviews.llvm.org/D150462

Remove some includes that shouldn't be needed any longer

This removes a bunch of #include statements in Scalar.cpp. I do not
think those should be needed any longer (assuming that they once
upon a time possibly were needed for legacy PM C bindings, but
that is not supported any longer).

Also removing some other #include statements not needed any longer
due to deprecation of legacy PM.

Differential Revision: https://reviews.llvm.org/D149438

[AArch64][SME2/SVE2p1] Add predicate-as-counter intrinsics for pext (multi)

These intrinsics are used to implement the pext intrinsics that extract
two predicates (mask) from a predicate-as-counter value, e.g.

__attribute__((arm_streaming))
svboolx2_t svpext_lane_c8_x2(svcount_t pnn, uint64_t imm);

As described in https://github.com/ARM-software/acle/pull/217

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D150442

[AArch64][SME2/SVE2p1] Add predicate-as-counter intrinsics for pext (single)

These intrinsics are used to implement the pext intrinsics that extract
a predicate (mask) from a predicate-as-counter value, e.g.

__attribute__((arm_streaming))
svbool_t svpext_lane_c8(svcount_t pnn, uint64_t imm);

As described in https://github.com/ARM-software/acle/pull/217

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D150441

[GlobalIsel][X86] Legalize G_BSWAP

remark: unable to legalize instruction: %95:_(s16) = G_BSWAP %94:_ (in function: _ZNK4llvm13DataExtractor6getU16EPyPtj) [-Rpass-missed=gisel-legalize]

check plan: ninja check-llvm-codegen-x86

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D150667

[clang][AIX] Adding Revised xcoff-roptr CodeGen Test Case

https://reviews.llvm.org/D150586 removed a problematic test case that caused failures on non-ppc buildbots. This patch revises the test case and adds it back.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D150597

[analyzer] Fix QTimer::singleShot NewDeleteLeaks false positive

Fixes #39713

Differential Revision: https://reviews.llvm.org/D150552

ValueTracking: Restore ordered negative handling for frem

In D148674, the negative condition was weakened to only
checking isKnownNever(fcNegative), instead of cannotBeOrderedLessThanZero().

This avoids a regression when CannotBeOrderedLessThanZero is
replaced with computeKnownFPClass.

[clang][NFC] Use llvm::count_if instead of manual loop
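
A generic before/after sketch of this kind of cleanup, not the code touched by the commit. It uses std::count_if so that it builds without LLVM headers; llvm::count_if is the range-based wrapper that takes the container directly. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Manual loop version.
int count_negative_loop(const std::vector<int> &values) {
  int n = 0;
  for (int v : values)
    if (v < 0)
      ++n;
  return n;
}

// Algorithm version: the predicate states the intent and the counter goes away.
int count_negative_algo(const std::vector<int> &values) {
  return static_cast<int>(std::count_if(values.begin(), values.end(),
                                        [](int v) { return v < 0; }));
}
```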

ValueTracking: fadd/fsub +0 cannot return -0

Copied from CannotBeNegativeZero and extended to cover fsub.
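
The fadd half of this can be checked with ordinary doubles, assuming the default rounding mode and no fast-math style flags. The helper name is hypothetical, and the fsub case is not covered by this sketch.

```cpp
#include <cassert>
#include <cmath>

// Adds +0.0 to x. volatile keeps the addition from being folded at compile time.
double add_positive_zero(double x) {
  volatile double zero = 0.0;
  return x + zero;
}
```

Even for x = -0.0 the sum is +0.0, so the result never has the sign bit set when it is zero.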

[clang] Regenerate checks in OpenMP tests with opaque-pointers enabled

[gn build] Port 7158fd381a0b

Revert "[clang-repl] Introduce Value to capture expression results"

This reverts commit a423b7f1d7ca8b263af85944f57a69aa08fc942c.
See https://lab.llvm.org/buildbot/#/changes/95083

[builtins][test] Use architecture specific float16 check

The COMPILER_RT_HAS_FLOAT16 cmake check is now set per architecture,
which needs to be reflected when building the tests.

Additionally added armhf to the architecture list.

Reviewed By: dim

Differential Revision: https://reviews.llvm.org/D150281

[mlir][llvm] Add is constant intrinsic.

The revision adds LLVM's is constant intrinsic.

Depends on D150643

Reviewed By: Dinistro

Differential Revision: https://reviews.llvm.org/D150660

[Clang][LoongArch] Pass the -mabi and -target-abi options to as and cc1as respectively

This change is necessary to set correct EFlags according to the
options (-m*-float and -mabi=) passed to clang when input is assembly.

Note: `-mabi=` is not documented by `as`.
```
$ as --version
GNU assembler (GNU Binutils) 2.40.50.20230316
...
$ as --target-help
LARCH options:
```

But we can see gcc invokes `as` and passes the `-mabi=` option when compiling C or assembly.
```
$ gcc -c a.c -v 2>&1 -msoft-float | grep "as -v"
as -v -mabi=lp64s -o a.o /tmp/ccFrxzZi.s
$ gcc -c a.s -v 2>&1 -msoft-float | grep "as -v"
as -v -mabi=lp64s -o a.o a.s
```

Reviewed By: xen0n

Differential Revision: https://reviews.llvm.org/D150537

[LoongArch] Move lp64s out of the unimplemented calling conv list

lp64s is the same as lp64d except that floating point arguments and return
values are always passed via GPRs or stack which means `UseGPRForFloat`
is always `true` in `CC_LoongArch` for lp64s.

One motivation of this change is to build linux which uses
`-msoft-float` and `-mabi=lp64s` [1].

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/loongarch/Makefile?h=v6.4-rc1#n49

Reviewed By: xen0n, hev

Differential Revision: https://reviews.llvm.org/D150417

[clangd] downgrade missing-includes diagnostic to Information level

In practice, a Warning on every occurrence is very unpopular, even on a codebase
with clear rules about direct inclusion & moderately good compliance.

This change has various practical effects (in vscode for concreteness):
  - makes the diagnostic decoration less striking (blue vs yellow)
  - makes these diagnostics visually distinct from others when reading
  - causes these diagnostics to sort last in the "problems" view
  - allows these diagnostics to be easily filtered from the "problems" view

Differential Revision: https://reviews.llvm.org/D149912

[AArch64] Change the type of i64 neon shifts to v1i64

This alters the lowering of shifts by a constant, so that the type is lowered
to a v1i64 instead of an i64. This helps communicate that the type will live in
a neon register, and can help clean up surrounding code. Note this is only
currently for the scalar shifts of a constant that go through the nodes in
tryCombineShiftImm.

ssra instructions are no longer being recognized in places, but that can be
cleaned up in a followup patch that combines the i64 add into a v1i64 add.

Differential Revision: https://reviews.llvm.org/D148309

[gn build] Port a423b7f1d7ca (ClangReplInterpreterTests -rdynamic)

[clang] Convert a few OpenMP tests to opaque pointers

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D150652

[clang][AST] Print name instead of type when diagnosing uninitialized subobject in constexpr variables

This patch improves the diagnostic on uninitialized subobjects in constexpr variables by modifying the diagnostic message to display the subobject's name instead of its type.

Fixes https://github.com/llvm/llvm-project/issues/58601
Differential Revision: https://reviews.llvm.org/D146358

[GlobalIsel][x86] Legalize G_AND, G_OR, and G_XOR for AVX2

check plan: ninja check-llvm-codegen-x86

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D150658

[mlir][GPU] Rename MLIRGPUOps CMake target to MLIRGPUDialect

This is for consistency with other dialects.

Differential Revision: https://reviews.llvm.org/D150659

[AArch64] Additional testing for uqshl and regenerate arm64-vshift.ll. NFC

This tries to fill in some missing testing for neon shift intrinsics, and
regenerates the existing tests. See D148309 and D148311.

[clang-repl] Introduce Value to capture expression results

This is the second part of the below RFC:
https://discourse.llvm.org/t/rfc-handle-execution-results-in-clang-repl/68493

This patch implements a Value class that can be used to carry expression
results in clang-repl. In other words, when we see a top-level expression
without a trailing semicolon, it is captured and stored in a Value object.
You can explicitly specify where you want to store the object, like:

```
Value V;
llvm::cantFail(Interp->ParseAndExecute("int x = 42;"));
llvm::cantFail(Interp->ParseAndExecute("x", &V));
```

`V` now stores some useful information about `x`: you can get its real
value (42), its `clang::QualType`, or anything else of interest.

However, if you don't specify the optional argument, the result is captured
in a local variable and `Value::dump` is called on it automatically;
`Value::dump` is not implemented yet in this patch.

Signed-off-by: Jun Zhang <jun@junz.org>

[clang] Add a new annotation token: annot_repl_input_end

This patch is the first part of the below RFC:
https://discourse.llvm.org/t/rfc-handle-execution-results-in-clang-repl/68493

It adds an annotation token which will replace the original EOF token
when we are in the incremental C++ mode. In addition, when we're
parsing an ExprStmt and there's a missing semicolon after the
expression, we set a marker in the annotation token and continue
parsing.

Eventually, we propagate this info in ParseTopLevelStmtDecl and are able
to mark this Decl as something we want to do value printing for. Below is
an example:

clang-repl> int x = 42;
clang-repl> x
// `x` is a TopLevelStmtDecl without a trailing semicolon, so we set
// its IsSemiMissing bit so we can do something interesting in
// ASTConsumer::HandleTopLevelDecl.

The idea of the annotation token was proposed by Richard Smith, thanks!

Signed-off-by: Jun Zhang <jun@junz.org>
Differential Revision: https://reviews.llvm.org/D148997

Fix regression after D150436

llvm-clang-x86_64-expensive-checks-debian fails after D150436 was merged.
The failure occurs on X86. I changed the sort rule in AsmMatcher in D150436, so X86 code now reaches line 633 first (other targets are not affected).
The logic there uses the order in which records are written in the source file so that AsmMatcher prefers AVX instructions; it relies on the field HasPositionOrder.
But the condition only ensures that one of the compared records is a subclass of Instruction with HasPositionOrder set to true; it does not check the other.

(Committing on behalf of @XinWang10 to unblock broken expensive-checks builds)

Differential Revision: https://reviews.llvm.org/D150651

Revert "[Clang] Fix parsing of `(auto(x))`."

This reverts commit ef47318ec3615e83c328b07341046dfb9d869414.

This patch breaks valid code https://reviews.llvm.org/D149276#4345620

[asan][test][win] Move MSVC-specific tests into a subdir

This moves all but one of the remaining tests which use clang-cl and test
MSVC-specific behaviour into their own subdirectory. The `dll_host.cpp` test
is excluded from the move because other tests also depend on its source
file, making it not MSVC-specific.

Differential Revision: https://reviews.llvm.org/D150271

[asan][test][win] Remove `REQUIRES: asan-rtl-heap-interception`

This appears to be a leftover from when these tests were first added in
D62927. Because of this, these tests had never run with `check-asan` or
`check-asan-dynamic`.

I've tested locally that these tests do pass on both i686 MSVC and MinGW
targets. They are disabled for 64-bit though, and I believe no LLVM
buildbots are testing for 32-bit Windows targets.

Differential Revision: https://reviews.llvm.org/D150270

[asan][test][win] Port more tests to not use clang-cl on MinGW (4)

This ports some tests that require dead stripping or ICF.

Differential Revision: https://reviews.llvm.org/D150269

[asan][test][win] Port more tests to not use clang-cl on MinGW (3)

This ports tests which require additional link flags.

Differential Revision: https://reviews.llvm.org/D150268

[asan][test][win] Port more tests to not use clang-cl on MinGW (2)

Continuation of D147432 and D147444.

Differential Revision: https://reviews.llvm.org/D150267

[asan][win][test] Disable interception_failure_test.cpp for static asan

This test checks that asan does not intercept user-provided libc
functions, but on Windows the static asan runtime does intercept static
copies of libc functions, so this test is invalid for said environment.
It used to fail from a different linker error, but this no longer
happens with newer WinSDK. Refer to comments on
https://reviews.llvm.org/D149549.

Differential Revision: https://reviews.llvm.org/D150349

ValueTracking: Implement computeKnownFPClass for sqrt

Could be slightly smarter in cases that are probably uninteresting.

[mlir] drop spurious PDL include

[InstSimplify] X == Y ? 0 : X - Y --> X - Y

Alive2: https://alive2.llvm.org/ce/z/rPN1GB
Fixes: https://github.com/llvm/llvm-project/issues/62238
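
As a quick sanity check of the fold itself (separate from the Alive2 proof
above, which is the authoritative one), the two forms can be compared
exhaustively on 8-bit values. This is only a sketch in plain C++ with
wrapping unsigned arithmetic; the helper names `selectForm`,
`simplifiedForm` and `formsAgreeOnAllI8` are made up for illustration and
the check does not model poison or undef.

```cpp
#include <cassert>
#include <cstdint>

// Original form: X == Y ? 0 : X - Y (hypothetical helper name).
static uint8_t selectForm(uint8_t X, uint8_t Y) {
  return X == Y ? uint8_t(0) : uint8_t(X - Y);
}

// Simplified form: X - Y with 8-bit wrap-around (hypothetical helper name).
static uint8_t simplifiedForm(uint8_t X, uint8_t Y) {
  return uint8_t(X - Y);
}

// Returns true if both forms agree for every pair of 8-bit inputs.
static bool formsAgreeOnAllI8() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y)
      if (selectForm(uint8_t(X), uint8_t(Y)) !=
          simplifiedForm(uint8_t(X), uint8_t(Y)))
        return false;
  return true;
}
```

The fold holds because X - Y is already 0 whenever X == Y, so the select
never changes the result.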

Depends on D150377

Signed-off-by: Jun Zhang <jun@junz.org>
Differential Revision: https://reviews.llvm.org/D150378

Add baseline tests for PR62238

Differential Revision: https://reviews.llvm.org/D150377

Signed-off-by: Jun Zhang <jun@junz.org>

[mlir][llvm] Add expect intrinsics.

The revision adds the LLVM expect and expect.with.probability
intrinsics.

Reviewed By: Dinistro, ftynse

Differential Revision: https://reviews.llvm.org/D150643

[mlir] Fix memory explosion when converting global variable bodies in ModuleTranslation

There is a memory explosion when converting the body or initializer region of a large global variable, e.g. a constant array.

For example, when translating a constant array of 100000 strings:
```
llvm.mlir.global internal constant @cats_strings() {addr_space = 0 : i32, alignment = 16 : i64} : !llvm.array<100000 x ptr<i8>> {
    %0 = llvm.mlir.undef : !llvm.array<100000 x ptr<i8>>
    %1 = llvm.mlir.addressof @om_1 : !llvm.ptr<array<1 x i8>>
    %2 = llvm.getelementptr %1[0, 0] : (!llvm.ptr<array<1 x i8>>) -> !llvm.ptr<i8>
    %3 = llvm.insertvalue %2, %0[0] : !llvm.array<100000 x ptr<i8>>
    %4 = llvm.mlir.addressof @om_2 : !llvm.ptr<array<1 x i8>>
    %5 = llvm.getelementptr %4[0, 0] : (!llvm.ptr<array<1 x i8>>) -> !llvm.ptr<i8>
    %6 = llvm.insertvalue %5, %3[1] : !llvm.array<100000 x ptr<i8>>
    %7 = llvm.mlir.addressof @om_3 : !llvm.ptr<array<1 x i8>>
    %8 = llvm.getelementptr %7[0, 0] : (!llvm.ptr<array<1 x i8>>) -> !llvm.ptr<i8>
    %9 = llvm.insertvalue %8, %6[2] : !llvm.array<100000 x ptr<i8>>
    %10 = llvm.mlir.addressof @om_4 : !llvm.ptr<array<1 x i8>>
    %11 = llvm.getelementptr %10[0, 0] : (!llvm.ptr<array<1 x i8>>) -> !llvm.ptr<i8>
    %12 = llvm.insertvalue %11, %9[3] : !llvm.array<100000 x ptr<i8>>

    ... (ignore the remaining part)
}
```

where `@om_1`, `@om_2`, ... are string global constants.

Each time an operation is converted to LLVM, a new constant is created.
When it comes to `llvm.insertvalue`, a new constant array of 100000 elements is created and the old constant array (the input) is not destroyed.
This causes the memory explosion. We observed that, on a system with 128 GB of memory, the translation of 100000 elements was killed after using up all the memory.
On a system with 64 GB, 65536 elements were enough to get the translation killed.

This patch fixes the issue by checking the generated constants and destroying them if they have no uses.
With the fix, the translation of 100000 elements takes only about 1.6 GB of memory and finishes without any error.
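
The effect can be illustrated with a toy model. This is not the
ModuleTranslation code and uses no LLVM API: `ToyConstant` and
`peakLiveConstants` are hypothetical names, and the model assumes that each
insert step copies the whole array and leaves the input without any
remaining use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a constant array; not an LLVM class.
struct ToyConstant {
  std::vector<int> elements;
};

// Builds an n-element array through n successive "insertvalue" steps and
// returns the peak number of constants alive at the same time.
static std::size_t peakLiveConstants(std::size_t n, bool destroyUnused) {
  std::vector<std::unique_ptr<ToyConstant>> live;
  live.push_back(
      std::make_unique<ToyConstant>(ToyConstant{std::vector<int>(n, 0)}));
  std::size_t peak = live.size();
  for (std::size_t i = 0; i < n; ++i) {
    // Each step creates a full copy of the previous array with one slot set.
    auto next = std::make_unique<ToyConstant>(*live.back());
    next->elements[i] = static_cast<int>(i) + 1;
    live.push_back(std::move(next));
    peak = std::max(peak, live.size());
    // The input array has no remaining uses, so it can be destroyed.
    if (destroyUnused)
      live.erase(live.end() - 2);
  }
  return peak;
}
```

Without cleanup the number of live arrays grows linearly with n (and total
memory quadratically, since every array has n elements); with cleanup at
most two arrays are alive at any time.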

Reviewed By: ftynse, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D148487

[clang] Convert several OpenMP tests to opaque pointers

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D150608

[lldb][DWARFASTParserClang] Don't create unnamed bitfields to account for vtable pointer

**Summary**

When filling out the LayoutInfo for a structure with the offsets
from DWARF, LLDB fills gaps in the layout by creating unnamed
bitfields and adding them to the AST. If we don't do this correctly
and our layout has overlapping fields, we will hit an assertion
in `clang::CGRecordLowering::lower()`. Specifically, if we have
a derived class with a VTable and a bitfield immediately following
the vtable pointer, we create a layout with overlapping fields.

This is an oversight in some of the previous cleanups done around this
area.

In `D76808`, we prevented LLDB from creating unnamed bitfields if there
was a gap between the last field of a base class and the start of a bitfield
in the derived class.

In `D112697`, we started accounting for the vtable pointer. The intention
there was to make sure the offset bookkeeping accounted for the
existence of a vtable pointer (but we didn't actually want to create
any AST nodes for it). Now that `last_field_info.bit_size` was being
set even for artificial fields, the previous fix `D76808` broke
specifically for cases where the bitfield was the first member of a
derived class with a vtable (this scenario wasn't tested so we didn't
notice it). I.e., we started creating redundant unnamed bitfields for
where the vtable pointer usually sits. This confused the lowering logic
in clang.

This patch adds a condition to `ShouldCreateUnnamedBitfield` which
checks whether the first field in the derived class is a vtable ptr.
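
A rough sketch of what such a check could look like is given below. It is
a simplified illustration, not the code from this patch: `ToyFieldInfo`,
its members and the free function `shouldCreateUnnamedBitfield` are
hypothetical stand-ins, and treating "artificial field at bit offset 0" as
"vtable pointer" is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the per-field bookkeeping.
struct ToyFieldInfo {
  uint64_t bit_offset;
  uint64_t bit_size;
  bool is_artificial; // e.g. a compiler-generated vtable pointer
};

// Decides whether the gap before `this_field` should be filled with an
// unnamed bitfield (sketch only, under the assumptions stated above).
static bool shouldCreateUnnamedBitfield(const ToyFieldInfo &last_field,
                                        uint64_t last_field_end,
                                        const ToyFieldInfo &this_field,
                                        bool have_base) {
  // No gap between the previous field and this one: nothing to fill.
  if (this_field.bit_offset <= last_field_end)
    return false;

  // No field has been recorded yet in the derived class.
  const bool this_is_first_field =
      last_field.bit_offset == 0 && last_field.bit_size == 0;
  // The first recorded field is assumed to be the vtable pointer.
  const bool first_field_is_vptr =
      last_field.bit_offset == 0 && last_field.is_artificial;

  // With a base class, attribute the gap to the base or to the vptr.
  if (have_base && (this_is_first_field || first_field_is_vptr))
    return false;

  return true;
}
```

In this sketch a gap that follows an artificial first field is never
filled, which corresponds to the scenario described in the summary.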

**Testing**

* Added API test case

Differential Revision: https://reviews.llvm.org/D150591

[lldb][DWARFASTParserClang][NFC] Extract condition for unnamed bitfield creation into helper function

This patch adds a new private helper
`DWARFASTParserClang::ShouldCreateUnnamedBitfield`, which
`ParseSingleMember` uses to decide whether we should fill the current gap
in a structure layout with unnamed bitfields.

Extracting this logic will allow us to add additional
conditions in upcoming patches without jeopardizing readability
of `ParseSingleMember`.

We also store some of the boolean conditions in local variables
to make the intent more obvious.

Differential Revision: https://reviews.llvm.org/D150590

[lldb][DWARFASTParserClang][NFC] Simplify unnamed bitfield condition

Minor cleanup of redundant variable initialization and
if-condition. These are leftovers/oversights from previous
cleanup in this area:
* https://reviews.llvm.org/D72953
* https://reviews.llvm.org/D76808

Differential Revision: https://reviews.llvm.org/D150589

[mlir][nfc] Remove unnecessary `-split-input-file`

[InstSimplify] Clarify simplifyWithOpReplaced() refinement requirement (NFC)

In order to justify some of the special cases we have, we need to
assume that Op/RepOp are non-poison. For the places where this
function is used, if one of these is poison, then the select result
is poison anyway.

[SCEV] Regenerate test checks (NFC)

[KnownBits] Handle shifts over wide types

Do not assert if the bit width is larger than 64 bits. This case
is currently hidden from the IR layer by other checks, but gets
exposed with future changes.

[MemRefToLLVM][NFC] Use early exit for the getter of the buffer ptr

Address review comment from https://reviews.llvm.org/D148947

[mlir] [mem2reg] Adapt to be pattern-friendly.

This revision modifies the mem2reg interfaces and algorithm to be more
comfortable to use as a pattern. The motivation behind this is that
currently the pattern needs to be applied to the scope op of the region
in which allocators should be promoted. However, a more natural way to
apply the pattern would be to apply it on the allocator directly. This
is not only clearer but easier to parallelize.

This revision changes the mem2reg pattern to operate this way. This
required restricting the interfaces to only mutate IR using
RewriterBase, as the previously used escape hatch is not granular enough
to match only on the region that is modified. This has the unfortunate
cost of preventing batching allocator promotion and making the block
argument adding logic more complex. Because batching no longer made any
sense, I made the internal analyzer/promoter decoupling private again.

This also adds statistics to the mem2reg infrastructure.

Reviewed By: gysit

Differential Revision: https://reviews.llvm.org/D150432