review.tizen.org Git - platform/upstream/llvm.git/log

[lld-macho][nfc] Put back shouldOmitFromOutput() asserts

I removed them in rG5de7467e982 but @thakis pointed out that
they were useful to keep, so here they are again. I've also converted
the `!isCoalescedWeak()` asserts into `!shouldOmitFromOutput()` asserts,
since the latter check subsumes the former.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D104169

[llvm-objcopy][MachO] Copy LC_LINKER_OPTIMIZATION_HINT

This fixes `error: unsupported load command (cmd=0x2e)`

[CSSPGO] Report zero-count probe in profile instead of dangling probes.

Previously, dangling samples were represented by INT64_MAX in the sample profile, while probes that never executed were not reported. This was based on the observation that dangling probes made up a smaller portion of probes than zero-count probes. However, with compiler optimizations, dangling probes end up making up a large portion of all probes, and reporting them does not make sense from a profile size point of view. This change flips sample reporting by reporting zero-count probes instead. This allows a dangling probe to be represented by nothing (a missing entry in the profile). This has several benefits:

1. Reducing sample profile size in optimize mode, even when the number of non-executed probes exceeds the number of dangling probes, since INT64_MAX takes more space than 0 to encode.

2. Binary size savings. No need to encode dangling probe anymore, since missing probes are treated as dangling in the profile reader.

3. Reducing compiler work to track dangling probes. However, for probes that are really dead and removed, we still need the compiler to identify them so that they can be reported as zero-count, instead of being mistreated as dangling probes.

4. Improving counts quality by respecting the counts already collected on the non-dangling copy of a probe. A probe, when duplicated, gets two copies at runtime. If one of them is dangling while the other is not, merging the two probes at profile generation time will cause the real samples collected on the non-dangling one to be discarded. Not reporting the dangling counterpart will keep the real samples.

5. Better readability.

6. Be consistent with non-CS dwarf line number based profile. Zero counts are trusted by the compiler counts inferencer while missing counts will be inferred by the compiler.

Note that the current patch does not include any work for #3. There will be follow-up changes.

For #1, I've seen for a large Facebook service, the text profile is reduced by 7%. For extbinary profile, the size of LBRProfileSection is reduced by 35%.

For #4, I have seen general counts quality for SPEC2017 is improved by 10%.
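
The effect described in #4 and #6 can be sketched as follows. This is a minimal illustration, not the profile generator's actual code: `ProbeCounts`, `mergeCopy` and `lookupCount` are hypothetical names, and the profile is modelled as a plain map from probe id to sample count. A dangling copy has no entry, so merging it cannot discard the samples collected on the other copy, while a reported zero stays distinguishable from a missing entry.

```cpp
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical model of a profile: probe id -> collected samples.
using ProbeCounts = std::map<unsigned, std::uint64_t>;

// Adds one copy of a set of (possibly duplicated) probes into the merged
// profile. A dangling copy simply has no entry, so it contributes nothing.
inline void mergeCopy(ProbeCounts &Merged, const ProbeCounts &Copy) {
  for (const auto &[Id, Count] : Copy)
    Merged[Id] += Count;
}

// A present entry (including 0) is a reported count; a missing entry is
// treated as dangling, i.e. left for the consumer to infer.
inline std::optional<std::uint64_t> lookupCount(const ProbeCounts &Profile,
                                                unsigned Id) {
  auto It = Profile.find(Id);
  if (It == Profile.end())
    return std::nullopt;
  return It->second;
}
```

With this representation, merging a copy in which probe 1 is dangling into a copy in which probe 1 has 120 samples leaves the 120 samples intact.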

Reviewed By: wenlei, wlei, wmi

Differential Revision: https://reviews.llvm.org/D104129

[mlir][sparse] support new kind of scalar in sparse linalg generic op

We have several ways of introducing a scalar invariant value into
linalg generic ops (should we limit this somewhat?). This revision
makes sure we handle all of them correctly in the sparse compiler.

Reviewed By: gysit

Differential Revision: https://reviews.llvm.org/D104335

[ValueTracking] add FP intrinsics to test for propagatesPoison; NFC

I'm not sure what behavior we want if the FP environment is
not default (also not sure if there's a way to enumerate
the full list of intrinsics programmatically), but currently
these are all defaulting to 'false' (doesn't propagate).

RISCVFixupKinds.h: Don’t duplicate function or class name at the beginning of the comment && fix some comments

[flang] Correct the subscripts used for arguments to character intrinsics

When chasing down another unrelated bug, I noticed that the
implementations of various character intrinsic functions assume
that the lower bounds of (some of) their arguments were 1.
This isn't necessarily the case, so I've cleaned them up, tweaked
the unit tests to exercise the fix, and regularized the allocation
pattern used for results to use SetBounds() before Allocate() rather
than the old original Descriptor::Allocate() wrapper around
CFI_allocate().

Since there were few other remaining uses of the old original
Descriptor::Allocate() wrapper, I also converted them to the
new one and deleted the old one.

Differential Revision: https://reviews.llvm.org/D104325

[index] Fix performance regression with indexing macros

When using FileIndexRecord with macros, symbol references can be seen
out of source order, which caused a performance regression when inserting
the symbols into a vector. Instead, we now lazily sort the vector. The
impact is small on most code, but in very large files with many macro
references (M) near the beginning of the file followed by many decl
references (D) it was O(M*D). A particularly bad protobuf-generated
header was observed with a 100% regression in practice.
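
As a rough sketch of the lazy-sort idea (not the actual FileIndexRecord code; `LazySortedOffsets` and its members are hypothetical), entries are appended in arrival order and sorted once on first read, so out-of-order arrivals no longer pay for a mid-vector insertion each time:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for an index record; only source offsets are kept.
class LazySortedOffsets {
  std::vector<unsigned> Offsets;
  bool Sorted = true;

public:
  // Appending is amortized O(1), even when offsets arrive out of order.
  void add(unsigned Offset) {
    if (!Offsets.empty() && Offset < Offsets.back())
      Sorted = false;
    Offsets.push_back(Offset);
  }

  // Sort at most once per batch of out-of-order insertions, on first read.
  const std::vector<unsigned> &get() {
    if (!Sorted) {
      std::stable_sort(Offsets.begin(), Offsets.end());
      Sorted = true;
    }
    return Offsets;
  }
};
```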

rdar://78628133

[llvm-objcopy] Make ihex writer similar to binary writer

There is no need to differentiate whether `UseSegments` is true or
false. Unifying the cases makes the behavior closer to BinaryWriter.

This improves compatibility with objcopy because SHF_ALLOC sections not in
a PT_LOAD will not be skipped. Such cases are usually erroneous input, though.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D104186

[M68k][GlobalISel] Adding initial GlobalISel infrastructure

Wiring up GlobalISel for the M68k backend

Differential Revision: https://reviews.llvm.org/D101819

Revert "Revert "[libcxx][module-map] creates submodules for private headers""

This reverts commit d9633f229c36f292dab0e5f510ac635cfaf3a798 as a
workaround was discovered.

Differential Revision: https://reviews.llvm.org/D104170

[ValueTracking] add tests for propagatesPoison with FP ops; NFC

Verify that this matches the behavior in InstSimplify:
D104383 / ce95200b7942

We still need to add code/tests for FP intrinsics.

[gn build] Port ef16c8eaa5cd

Reapply "[MCA] Adding the CustomBehaviour class to llvm-mca".

The original change was pushed in main as commit f7a23ecece52.
It was then reverted by commit a04f01bab2 because it caused linker failures
on buildbots that don't build the AMDGPU target.

--

Some instructions are not defined well enough within the target’s scheduling
model for llvm-mca to be able to properly simulate their behaviour. The ideal
solution to this situation is to modify the scheduling model, but that’s not
always a viable strategy. Maybe other parts of the backend depend on that
instruction being modelled the way that it is. Or maybe the instruction is quite
complex and it’s difficult to fully capture its behaviour with tablegen. The
CustomBehaviour class (which I will refer to as CB frequently) is designed to
provide intuitive scaffolding for developers to implement the correct modelling
for these instructions.

More details are available in the original commit log message (f7a23ecece52).

Differential Revision: https://reviews.llvm.org/D104149

[NFC][libomptarget] Reduce the dependency on libelf

This change-set removes libelf usage from elf_common part of the plugins.
libelf is still used in x86_64 generic plugin code and in some plugins
(e.g. amdgpu) - these will have to be cleaned up in separate checkins.

Differential Revision: https://reviews.llvm.org/D103545

[InstSimplify] propagate poison through FP ops

We already have this fold:
fadd float poison, 1.0 --> poison
...via ConstantFolding, so this makes the behavior consistent
if the other operand(s) are non-constant.

The fold for undef was added before poison existed as a
value/type in IR.

This came up in D102673 / D103169
because we're trying to sort out the more complicated handling
for constrained math ops.
We should have the handling for the regular instructions done
first, so we can build on that (or diverge as needed).

Differential Revision: https://reviews.llvm.org/D104383

[FuncSpec] Fixed prefix typo in test function-specialization-noexec.ll. NFC.

[libTooling][NFC] Refactor implementation of Transformer Stencils to use standard OOP

Currently, the implementation combines OOP and overloads, using a template to
tie the two together. In practice, this has proven confusing with no
benefits. This patch simplifies the code to use standard OOP design (a
collection of classes deriving from an interface).

Differential Revision: https://reviews.llvm.org/D104317

[lld-macho] Downgrade version mismatch to warning

It's a warning in ld64. While having LLD be stricter would be nice, it
makes it harder for it to be a drop-in replacement into existing builds.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D104333

[SVE] Selection failure with scalable insertelements

Reviewed By: efriedma, CarolineConcatto

Differential Revision: https://reviews.llvm.org/D104244

[obj2yaml] Address D104035 review comments

Accidentally missed from commit 5c1639fe064b.

Differential Revision: https://reviews.llvm.org/D104035

[AMDGPU] Set VOP3P flag on Real instructions

This does not affect codegen but might benefit llvm-mca.

[llvm][AArch64] Handle arrays of struct properly (from IR)

This only applies to FastIsel. GlobalIsel seems to sidestep
the issue.

This fixes https://bugs.llvm.org/show_bug.cgi?id=46996

One of the things we do in llvm is decide if a type needs
consecutive registers. Previously, we just checked if it
was an array or not.
(plus an SVE specific check that is not changing here)

This causes some confusion when you have arbitrary IR like:
```
%T1 = type { double, i1 };
define [ 1 x %T1 ] @foo() {
entry:
  ret [ 1 x %T1 ] zeroinitializer
}
```

We see it is an array so we call CC_AArch64_Custom_Block
which bails out when it sees the i1, a type we don't want
to put into a block.

This leaves the location of the double in some kind of
intermediate state and leads to odd codegen. Which then crashes
the backend because it doesn't know how to implement
what it's been asked for.

You get this:
```
  renamable $d0 = FMOVD0
  $w0 = COPY killed renamable $d0
```

Rather than this:
```
  $d0 = FMOVD0
  $w0 = COPY $wzr
```

The backend knows how to copy 64 bit to 64 bit registers,
but not 64 to 32. It can certainly be taught how but the real
issue seems to be us even trying to assign a register block
in the first place.

This change makes the logic of
AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters
a bit more in depth. If we find an array, also check that all the
nested aggregates in that array have a single member type.

Then CC_AArch64_Custom_Block's assumption of a type that looks
like [ N x type ] will be valid and we get the expected codegen.
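
The extra check can be illustrated with a toy type model. This is only a sketch under stated assumptions: `Ty`, `hasSingleLeafType` and `needsConsecutiveRegs` are hypothetical, and the real logic works on LLVM types and also includes the SVE-specific check mentioned above.

```cpp
#include <vector>

// Hypothetical toy type: a scalar, an array, or a struct.
struct Ty {
  enum Kind { Scalar, Array, Struct } K;
  int ScalarId;          // Identifies the scalar type (0 = double, 1 = i1 here).
  std::vector<Ty> Elems; // Struct members, or the single array element type.
};

// Returns true if every scalar reachable from T has the same ScalarId.
// Seen starts at -1 and records the first scalar type encountered.
inline bool hasSingleLeafType(const Ty &T, int &Seen) {
  if (T.K == Ty::Scalar) {
    if (Seen == -1)
      Seen = T.ScalarId;
    return Seen == T.ScalarId;
  }
  for (const Ty &E : T.Elems)
    if (!hasSingleLeafType(E, Seen))
      return false;
  return true;
}

// An array qualifies only if its nested aggregates share one member type.
inline bool needsConsecutiveRegs(const Ty &T) {
  if (T.K != Ty::Array)
    return false;
  int Seen = -1;
  return hasSingleLeafType(T, Seen);
}
```

In this model, `[ 1 x { double, i1 } ]` is rejected, while `[ 1 x { double, double } ]` is accepted.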

New tests have been added to exercise these situations. Note that
some of the output is not ABI compliant. The aim of this change is
to simply handle these situations and not to make our processing
of arbitrary IR ABI compliant.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D104123

[libc++] Undeprecate the std::allocator<void> specialization

While the std::allocator<void> specialization was deprecated by
https://wg21.link/p0174#2.2, the *use* of std::allocator<void> by users
was not. The intent was that std::allocator<void> could still be used
in C++17 and C++20, but starting with C++20 (with the removal of the
specialization), std::allocator<void> would use the primary template.
That intent was called out in wg21.link/p0619r4#3.9.

As a result of this patch, _LIBCPP_ENABLE_CXX20_REMOVED_ALLOCATOR_MEMBERS
will also not control whether the explicit specialization is provided or
not. It shouldn't matter, since in C++20, one can simply use the primary
template.
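
For illustration, a typical user-side use of std::allocator<void> is as a handle that gets rebound to a concrete type. The sketch below uses only standard library facilities; `roundTrip` is a hypothetical helper name.

```cpp
#include <memory>
#include <type_traits>

// std::allocator<void> used purely as a handle to rebind from.
using VoidAlloc = std::allocator<void>;
using IntAlloc = std::allocator_traits<VoidAlloc>::rebind_alloc<int>;

static_assert(std::is_same<IntAlloc, std::allocator<int>>::value,
              "rebinding yields the allocator for int");

// Allocates one int through the rebound allocator, stores Value, reads it
// back, and releases the storage.
inline int roundTrip(int Value) {
  VoidAlloc Handle;
  IntAlloc Alloc(Handle); // Converting constructor from another allocator.
  int *P = std::allocator_traits<IntAlloc>::allocate(Alloc, 1);
  std::allocator_traits<IntAlloc>::construct(Alloc, P, Value);
  int Result = *P;
  std::allocator_traits<IntAlloc>::destroy(Alloc, P);
  std::allocator_traits<IntAlloc>::deallocate(Alloc, P, 1);
  return Result;
}
```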

Fixes http://llvm.org/PR50299

Differential Revision: https://reviews.llvm.org/D104323

[MCA][InstrBuilder] Always check for implicit uses of resource units (PR50725).

When instructions are issued to the underlying pipeline resources, the
mca::ResourceManager should also check for the presence of extra uses induced by
the explicit consumption of multiple partially overlapping group resources.

Fixes PR50725

[mlir] NFC - Drop newline from BlockArgument printing.

Differential Revision: https://reviews.llvm.org/D104368

[X86][AVX] Regenerate pr15296.ll tests

Exposes some really bad shift lowering codegen in shiftInput___canonical

[llvm-symbolizer] improve test and fix doc example after recent --print-source-context-lines behaviour change

I believe that after https://reviews.llvm.org/D102355 the behaviour of --print-source-context-lines has changed.

Before: --print-source-context-lines=3 prints 4 lines.
After: --print-source-context-lines=3 prints 3 lines.

Adjust the example in the docs for this change and make the testing a little more robust.

Differential Revision: https://reviews.llvm.org/D104114

[AMDGPU] Set SALU, VALU and other instruction type flags on Real instructions

This does not affect codegen but might benefit llvm-mca.

[libcxx] Fix exception raised during downstream bare-metal libunwind tests

Fix for the following exception.

AttributeError: 'TestingConfig' object has no attribute 'target_triple'

Related revision: https://reviews.llvm.org/D102012

Reviewed By: #libunwind, miyuki, danielkiss, mstorsjo

Differential Revision: https://reviews.llvm.org/D103140

[libc] Add a set of elementary operations

Resubmission of D100646 now making sure that we handle cases where `__builtin_memcpy_inline` is not available.

Original commit message:
Each of these elementary operations can be assembled to support higher order constructs (Overlapping access, Loop, Aligned Loop).
The patch does not compile yet as it depends on other ones (D100571, D100631) but it allows to get the conversation started.

A self-contained version of this code is available at https://godbolt.org/z/e1x6xdaxM

[SVE] Fix PromoteIntRes_TRUNCATE not to call getVectorNumElements

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D104115

[lldb] Require Clang 8 for gpubnames test

This test is using -gpubnames which is only available since Clang 8. The
original Clang 7 requirement was based on the availability of
-accel-tables=Dwarf (which the test initially used before being changed to
-gpubnames in commit 15a6df52efaa7 ).

[OpenMP] libomp: fixed implementation of OMP 5.1 inoutset task dependence type

Refactored code of dependence processing and added new inoutset dependence type.
The compiler can set the dependence flag to 0x8 when calling __kmpc_omp_task_with_deps.
All dependence flags the library receives so far and the corresponding dependence types:
1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET.
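
The flag-to-type mapping listed above can be written out as a small lookup. The flag values come from the commit message; the enum and function names are hypothetical and do not mirror libomp's internal declarations.

```cpp
#include <string>

// Flag values as listed in the commit message; names are illustrative only.
enum DepFlag : unsigned {
  DepIn = 1,
  DepOut = 2,
  DepInOut = 3,
  DepMutexInOutSet = 4,
  DepInOutSet = 8
};

// Maps a raw dependence flag to the name of its dependence type.
inline std::string depKindName(unsigned Flag) {
  switch (Flag) {
  case DepIn:
    return "IN";
  case DepOut:
    return "OUT";
  case DepInOut:
    return "INOUT";
  case DepMutexInOutSet:
    return "MUTEXINOUTSET";
  case DepInOutSet:
    return "INOUTSET";
  default:
    return "UNKNOWN";
  }
}
```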

Differential Revision: https://reviews.llvm.org/D97085

[flang] Check there's no dependency on C++ libs. NFC

Add a test to make sure the flang runtime doesn't pull in the C++
runtime libraries.

This is achieved by adding a C file that calls some functions from the
runtime (currently only CpuTime, but we should probably add anything
complicated enough, e.g. IO-related things). We force the C compiler to
use -std=c90 to make sure it's really in C mode (we don't really care
which version of the standard, this one is probably more widely
available). We only enable this test if CMAKE_C_COMPILER is set to
something (which is probably always true in practice).

This is a recommit of 7ddbf26, with 2 fixes:
* Replace C++ comments with C comments
* Only enable the test if libFortranRuntime.a exists (this might not be
the case if e.g. BUILD_SHARED_LIBS=On)

Differential Revision: https://reviews.llvm.org/D104290

[AMDGPU] Set IsAtomicRet and IsAtomicNoRet on Real instructions

This does not affect codegen but might benefit llvm-mca.

Revert "[libc] Add a set of elementary operations"

This reverts commit 4694321fbe54628513b75a4395124cd7508581a6.

[libc] Add a set of elementary operations

Resubmission of D100646 now making sure that we handle cases where `__builtin_memcpy_inline` is not available.

Original commit message:
Each of these elementary operations can be assembled to support higher order constructs (Overlapping access, Loop, Aligned Loop).
The patch does not compile yet as it depends on other ones (D100571, D100631) but it allows to get the conversation started.

A self-contained version of this code is available at https://godbolt.org/z/e1x6xdaxM

[lldb] vwprintw -> vw_printw in IOHandlerCursesGUI

`vwprintw` is (in theory) using the `varargs.h` va_list while `vw_printw` is
using the `stdarg.h` va_list. It seems these days they can be used
interchangeably but `vwprintw` is marked as deprecated.

[AMDGPU] Set mayLoad and mayStore on Real instructions

This does not affect codegen but might benefit llvm-mca.

Revert "[flang] Check there's no dependency on C++ libs"

This reverts commit 7ddbf2633911a5c378ad6af01e250f6f252b9032.

This doesn't work if we're not building libFortranRuntime.a. I'll
recommit with a fix.

[lld/mac] Add support for -no_data_in_code_info flag

Differential Revision: https://reviews.llvm.org/D104345

[lld/mac] Put lld-only flags in "LLD-SPECIFIC:" --help section

Differential Revision: https://reviews.llvm.org/D104347

[ELF] Consider that NOLOAD sections should be placed in a PT_LOAD segment

During PHDR creation, the case where an output section does not require a
PT_LOAD header but still occupies memory in the current VMA region was not handled.

If such an output section interleaves two output sections that have the same
VMA and LMA regions set, we would previously re-use the existing PT_LOAD header
for the second output section.
However, since the memory region is not contiguous, we need to start a new PT_LOAD
segment.

This fixes https://bugs.llvm.org/show_bug.cgi?id=50558

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D103815

[mlir] ODS: temporarily disable external model in presence of extra class declarations

Default implementations of interfaces may rely on extra class
declarations, which aren't currently generated in the external model,
that in turn may rely on functions defined in the main Attribute/Type
class, which wouldn't be available on the external model.

[ARM] Correct type of setcc results for FP vectors

Under MVE v4f32 and v8f16 vectors should be using v4i1/v8i1 predicates
for the setcc result type, as they have predicated registers for those
types. Setting this correctly prevents some inefficient optimizations
from happening.

[ARM] Extra tests for sign extended floating point compares. NFC

[flang] Fixup 7ddbf2633911a5c378ad6af01e250f6f252b9032

Replace C++ comments with C-style comments (not sure why my C compiler
doesn't complain about this).

[FuncSpec] Remove other passes in a test RUN line. NFC.

[flang] Add clang-tidy check for braces around if

Flang diverges from the llvm coding style in that it requires braces
around the bodies of if/while/etc statements, even when the body is
a single statement.

This commit adds the readability-braces-around-statements check to
flang's clang-tidy config file. Hopefully the premerge bots will pick it
up and report violations in Phabricator.

We also explicitly disable the check in the directories corresponding to
the Lower and Optimizer libraries, which rely heavily on mlir and llvm
and therefore follow their coding style. Likewise for the tools
directory.

We also fix any outstanding violations in the runtime and in
lib/Semantics.

Differential Revision: https://reviews.llvm.org/D104100

[FuncSpec] Add test for a call site that will never be executed. NFC.

[yaml2obj][obj2yaml] Support custom ELF section header string table name

This patch adds support for a new field in the FileHeader, which states
the name to use for the section header string table. This also allows
combining the string table with another string table in the object, e.g.
the symbol name string table. The field is optional. By default,
.shstrtab will continue to be used.

This partially fixes https://bugs.llvm.org/show_bug.cgi?id=50506.

Reviewed by: Higuoxing

Differential Revision: https://reviews.llvm.org/D104035

[yaml2obj] Fix bug when referencing items in SectionHeaderTable

There was an off-by-one error caused by an index (which included an
index for the null section header) being used to check against the size
of a list of sections (which didn't include the null section header).
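
A minimal sketch of this class of off-by-one follows. The names are hypothetical, and the commit message does not say in which direction the original check was wrong; the sketch assumes it rejected the last valid index.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: Sections holds only the real sections, while section
// header indices also count the implicit null header at index 0.

// Assumed buggy form: a null-inclusive index is compared against a
// null-exclusive size, so the index of the last section is rejected.
inline bool isValidIndexBuggy(std::size_t Index,
                              const std::vector<std::string> &Sections) {
  return Index < Sections.size();
}

// Corrected form: account for the null section header on the size side.
inline bool isValidIndex(std::size_t Index,
                         const std::vector<std::string> &Sections) {
  return Index < Sections.size() + 1;
}
```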

This is a partial fix for https://bugs.llvm.org/show_bug.cgi?id=50506.

Reviewed by: MaskRay

Differential Revision: https://reviews.llvm.org/D104098

[AMDGPU] Set more flags on Real instructions

This does not affect codegen, which only tests these flags on Pseudo
instructions, but might help llvm-mca which has to work with Real
instructions. In particular setting LGKM_CNT on DS instructions helps
with the problem identified in D104149.

Differential Revision: https://reviews.llvm.org/D104293

[flang] Check there's no dependency on C++ libs

Add a test to make sure the flang runtime doesn't pull in the C++
runtime libraries.

This is achieved by adding a C file that calls some functions from the
runtime (currently only CpuTime, but we should probably add anything
complicated enough, e.g. IO-related things). We force the C compiler to
use -std=c90 to make sure it's really in C mode (we don't really care
which version of the standard, this one is probably more widely
available). We only enable this test if CMAKE_C_COMPILER is set to
something (which is probably always true in practice).

Differential Revision: https://reviews.llvm.org/D104290

[AMDGPU] Use defvar in SOPInstructions.td. NFC.

Factor out repeated !cast<SOP*_Pseudo>(NAME) into a new "defvar ps",
just to improve readability and maintainability.

Differential Revision: https://reviews.llvm.org/D104306

[OpenMP][NFC] Add back suppression of warning

Commit cff215565e9 did not fix all unused variables in different builds,
so adding back the suppression for now.

[FuncSpec] Statistics

Adds some bookkeeping for collecting the number of specialised functions and a
test for that.

Differential Revision: https://reviews.llvm.org/D104102

[ORC] Switch to WrapperFunction utility for calls to registration functions.

Addresses FIXMEs in TPC-based EH-frame and debug object registration code by
replacing manual argument serialization with WrapperFunction utility calls.

[flang][nfc] Move `external-hello-world` to flang/examples

As `external-hello-world` is not really a test, I am moving it from
`flang/unittest/Runtime` to `flang/examples` (it makes a lot of sense as
an example). I've not modified the source code (apart from adjusting the
include paths).

Differential Revision: https://reviews.llvm.org/D104320

[flang][driver] Add `-fdebug-dump-all`

The new option will run the semantic checks and then dump the parse tree
and all the symbols. This is equivalent to running the driver twice,
once with `-fdebug-dump-parse-tree` and then with
the `-fdebug-dump-symbols` action flag.

Currently we wouldn't be able to achieve the same by simply running:
```
flang-new -fc1 -fdebug-dump-parse-tree -fdebug-dump-symbols <input-file>
```
That's because the new driver will only run one frontend action per
invocation (both of the flags used here are action flags). Diverging
from this design would lead to costly compromises and it's best avoided.

We may want to consider re-designing our debugging actions (and action
options) in the future so that there's more code re-use. For now, I'm
focusing on making sure that we support all the major cases requested by
our users.

Differential Revision: https://reviews.llvm.org/D104305

[OpenMP] Remove unused variables from libomp code

Several variables were left unused as a result of different patches removing
their use.

Two variables have some use:
`poll_count` is used by the KMP_BLOCKING macro only under certain conditions.
Adding (void) to tell the compiler to ignore the unused variable.

`padding` is a dummy stack allocation with no intent to be used. Also adding
(void) to make the compiler ignore the unused variable.
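
The `(void)` idiom mentioned above looks roughly like this. `SKETCH_BLOCKING`, `SKETCH_COUNT_POLLS` and `waitStep` are hypothetical stand-ins; the real KMP_BLOCKING macro and its conditions are not reproduced here.

```cpp
// Hypothetical macro that reads its argument only in some configurations.
#ifdef SKETCH_COUNT_POLLS
#define SKETCH_BLOCKING(count) (++(count))
#else
#define SKETCH_BLOCKING(count) ((void)0)
#endif

// Returns true once the (empty) wait step has run.
inline bool waitStep() {
  int poll_count = 0;
  (void)poll_count; // Discarded use, so the variable is not flagged as unused.
  SKETCH_BLOCKING(poll_count);

  char padding[64];
  (void)padding; // Dummy stack allocation that is never read on purpose.
  return true;
}
```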

Differential Revision: https://reviews.llvm.org/D104303

[NFC][X86] lowerVECTOR_SHUFFLE(): drop FIXME about widening to i128 (YMM half) element type

As per the discussion in D103818, so far, this does not appear to be worthwhile.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D103818

[SCEV] PtrToInt on non-integral pointers is allowed

As per (committed without review) @reames's rGac81cb7e6dde9b0890ee1780eae94ab96743569b change,
we are now allowed to produce `ptrtoint` for non-integral pointers.
This will unblock further unbreaking of SCEV regarding int-vs-pointer type confusion.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D104322

[SLP] Incorrect handling of external scalar values

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D103954

[DFSan][NFC] Fix shadowing variable name.

[LLDB] Fix buildbots breakage due to TestGuessLanguage.py

Fix LLDB buildbot breakage due to D104291

Differential Revision: https://reviews.llvm.org/D104291

[SampleFDO] Place the discriminator flag variable into the used list.

We create flag variable "__llvm_fs_discriminator__" in the binary
to indicate that FSAFDO hierarchical discriminators are used.

This variable might be GC'ed by the linker since it is not explicitly
referenced. I initially added the variable to the used list in pass
MIRFSDiscriminator but it did not work. It turned out the used global
list is collected in lowering (before the MIR pass) and then emitted at
the end of the pass pipeline.

Here I add the variable to the used list in the IR-level AddDiscriminators
pass. The machine-level code is still kept in case the IR-level
AddDiscriminators pass is not invoked. If this is the case, just use
-Wl,--export-dynamic-symbol=__llvm_fs_discriminator__
to force the emit.

Differential Revision: https://reviews.llvm.org/D103988

[flang] Add semantic check for the RANDOM_SEED intrinsic

I added the only check that wasn't already tested along with tests for
many valid and invalid arguments.

Differential Revision: https://reviews.llvm.org/D104318

Revert "[SampleFDO] Using common linkage for the discriminator flag variable"

This reverts commit 434fed5aff5e62460e2e984c7cc2674c12779b1e.

Post-commit review suggested using another implementation.
Details can be found in the review.

[Driver] Delete -fsanitize-coverage-blocklist= in favor of -fsanitize-coverage-ignorelist=

We are settled with -fsanitize-coverage-ignorelist (D101832).
Just delete -fsanitize-coverage-blocklist which is also new.

[Debug-Info] guard DW_LANG_C_plus_plus_14 under strict dwarf

Reviewed By: stuart

Differential Revision: https://reviews.llvm.org/D104291

X86: balance the frame prologue and epilogue on Win64

This was broken in ba1509da7b89c850c89f0f98afbab375794cd3c8. The Win64
frame would not perform the setup of the Swift async context parameter
but would tear down the setup in the epilogue resulting in crashes.
This ensures that we do the full setup when we do the tear down.
Although this is non-conforming to the Win64 calling convention, it
corrects the setup and exposes the actual issue that the change
introduced: incorrect frame setup.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104246

[FuncSpec] Use std::pow instead of operator^

The original implementation calculating UserBonus uses operator ^, which means XOR in
C++.
At first glance during review, I thought it meant power; my bad.
It doesn't make sense to use XOR here, so I believe it was an
oversight, like mine.
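
The difference is easy to see in isolation. The helpers below are hypothetical and do not reproduce the actual UserBonus computation; they only contrast the two operators.

```cpp
#include <cmath>
#include <cstdint>

// In C++, ^ is bitwise XOR, so this computes Base XOR Exp, not a power.
inline std::uint64_t scaleWithXor(std::uint64_t Base, std::uint64_t Exp) {
  return Base ^ Exp;
}

// Intended form: raise Base to the power Exp.
inline double scaleWithPow(double Base, double Exp) {
  return std::pow(Base, Exp);
}
```

For example, `scaleWithXor(10, 2)` is 8 (binary 1010 XOR 0010), whereas `scaleWithPow(10.0, 2.0)` is 100.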

Test Plan: check-all

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D104282

[NFC][sanitizer] Remove calls to __asan_get_current_fake_stack

Unnecessary with -fsanitize-address-use-after-return=never.

for issue: https://github.com/google/sanitizers/issues/1394

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D104154

[DFSan] Handle landingpad inst explicitly as zero shadow.

Before this change, DFSan was relying on fallback cases when getting origin
address.

Differential Revision: https://reviews.llvm.org/D104266

[libc++] Promote GCC 11 to mandatory CI

Also, fix the last issue that prevented GCC 11 from passing the test
suite. Thanks to everyone else who fixed issues.

Differential Revision: https://reviews.llvm.org/D104315

Don't depend on the "run" alias doing shell expanding.
Instead dial it up explicitly.

This test started failing recently and I'm not sure why. It also
doesn't make sense to me that replacing "run" with "process launch -X 1 --"
should make any difference - run is an alias for the latter. But
it does pass with the change, and unless we are testing for the exact
run alias, it's better to ask for what we want explicitly.

CMake: allow overriding CMAKE_CXX_VISIBILITY_PRESET

This allows overriding the `CMAKE_CXX_VISIBILITY_PRESET` on the command line. For example, setting the value to `default` lets PIC LLVM static libraries be converted to DSOs, without the need to rebuild LLVM with BUILD_SHARED_LIBS=ON.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104168

[mlir][sparse] integration test for all-dense annotated "sparse" output

Reviewed By: gussmith23

Differential Revision: https://reviews.llvm.org/D104277

Missed a Windows use of ValidForThisThread in the changes for
cfb96d845a684a5c567823dbe2aa4392937ee979.

[docs] Exclude FlangOption and re-generate ClangCommandLineReference.rst

[mlir][SCF] Remove empty else blocks of `scf.if` operations.

Differential Revision: https://reviews.llvm.org/D104273

[OpaquePtr] Verify Opaque pointer in function parameter

Verifying opaque pointer as function parameter when using with `byval`, `byref`,
`inalloca`, `preallocated`.

Differential Revision: https://reviews.llvm.org/D104309

[mlir][sparse] allow all-dense annotated "sparse" tensor output

This is a very careful start with allowing sparse tensors at the
left-hand-side of tensor index expressions (viz. sparse output).
Note that there is a subtle difference between non-annotated tensors
(dense, remain n-dim, handled by classic bufferization) and all-dense
annotated "sparse" tensors (linearized to 1-dim without overhead
storage, bufferized by sparse compiler, backed by runtime support library).
This revision gently introduces some new IR to facilitate annotated outputs,
to be generalized to truly sparse tensors in the future.

Reviewed By: gussmith23, bixia

Differential Revision: https://reviews.llvm.org/D104074

[SampleFDO] Using common linkage for the discriminator flag variable

We create flag variable "__llvm_fs_discriminator__" in the binary
to indicate that FSAFDO hierarchical discriminators are used.

This variable might be GC'ed by the linker since it is not explicitly
referenced. I initially added the var to the used list in pass
MIRFSDiscriminator but it did not work. It turned out the used global
list is collected in lowering (before MIR pass) and then emitted in
the end of pass pipeline.

In this patch, we use a "common" linkage for this variable so that
it will not be GC'ed by the linker.

Differential Revision: https://reviews.llvm.org/D103988

Convert functions that were returning BreakpointOption * to BreakpointOption &.

This is an NFC cleanup.

Many of the APIs that returned BreakpointOptions always returned valid ones.
Internally the BreakpointLocations usually have null BreakpointOptions, since they
use their owner's options until an option is set specifically on the location.
So the original code used pointers & unique_ptr everywhere for consistency.
But that made the code hard to reason about from the outside.

This patch changes the code so that everywhere an API is guaranteed to
return a non-null BreakpointOption, it returns it as a reference to make
that clear.

It also changes the Breakpoint to hold a BreakpointOption
member where it previously had a UP. Since we were always filling the UP
in the Breakpoint constructor, having the UP wasn't helping anything.
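
The shape of the change can be sketched with simplified, hypothetical classes (`Options`, `Owner` and `Location` are not the LLDB types): the owner holds its options by value and hands out a reference, while a location keeps a possibly null override and falls back to the owner's options.

```cpp
#include <memory>

// Hypothetical, simplified option set.
struct Options {
  bool Enabled = true;
};

// The owner always has options, so it holds them by value and returns a
// reference instead of a pointer that callers would need to null-check.
class Owner {
  Options Opts;

public:
  Options &getOptions() { return Opts; }
};

// A location has no options of its own until one is set on it.
class Location {
  Owner &Parent;
  std::unique_ptr<Options> Override; // Null until set on the location.

public:
  explicit Location(Owner &P) : Parent(P) {}

  // Never null: falls back to the owner's options.
  const Options &getEffectiveOptions() const {
    return Override ? *Override : Parent.getOptions();
  }

  // Creates the location-specific copy on first use.
  Options &getLocationOptions() {
    if (!Override)
      Override = std::make_unique<Options>(Parent.getOptions());
    return *Override;
  }
};
```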

Differential Revision: https://reviews.llvm.org/D104162

[OpenMP] Add GOMP 5.0 version symbols to API

* Add GOMP versioned pause functions
* Add GOMP versioned affinity format functions

To do the affinity format functions, only attach versioned symbols
to the APPEND Fortran entries (e.g., omp_set_affinity_format_) since
GOMP only exports two symbols (one for Fortran, one for C). Our
affinity format functions have three symbols. For example, with
omp_set_affinity_format:
1) omp_set_affinity_format (Fortran interface)
2) omp_set_affinity_format_ (Fortran interface)
3) ompc_set_affinity_format (C interface)

Have the GOMP version of the C symbol alias the ompc_* 3) version
instead of the Fortran unappended version 1).

Differential Revision: https://reviews.llvm.org/D103647

[OpenMP] Fix affinity determine capable algorithm on Linux

Remove strange checks for syscall() arguments where mask is NULL.
Valgrind reports these as erroneous uses of the syscall.
Instead, just check if CACHE_LINE bytes is long enough. If not, then
search for the size. Also, by limiting the first size detection
attempt to CACHE_LINE bytes, instead of 1MB, we don't use more than one
cache line for the mask size. Before this patch, sometimes the returned
mask size was 640 bytes (10 cache lines) because the initial call to
getaffinity() was limited only by the internal kernel mask size
which can be very large.
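
A rough model of the size detection described above. FakeKernel is a
hypothetical stand-in for the getaffinity syscall (it rejects buffers that are
too small and otherwise reports min(buffer length, internal mask size)), and
the doubling search is an assumption; the patch's actual search may differ.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the kernel side of the getaffinity syscall.
struct FakeKernel {
  long min_len;    // smallest buffer length the kernel accepts
  long mask_bytes; // size of the kernel's internal mask
  long getaffinity(long len) const {
    if (len < min_len)
      return -1; // buffer too small
    return std::min(len, mask_bytes);
  }
};

// Try `first_try` bytes first; only if that fails, search larger sizes.
// Returns the detected mask size in bytes, or -1 if nothing up to `limit` works.
long detect_mask_size(const FakeKernel &k, long first_try, long limit) {
  assert(first_try > 0 && limit >= first_try);
  long got = k.getaffinity(first_try);
  if (got >= 0)
    return got;
  for (long size = first_try * 2; size <= limit; size *= 2) {
    got = k.getaffinity(size);
    if (got >= 0)
      return got;
  }
  return -1;
}
```

With a 640-byte internal mask, a first attempt of 64 bytes yields 64 in this
model, while a first attempt of 1MB yields 640, which mirrors the behaviour
change the message describes.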

Differential Revision: https://reviews.llvm.org/D103637

[OpenMP] Lazily assign root affinity

Lazily set affinity for root threads. Previously, the root thread
executing middle initialization would attempt to assign affinity
to other existing root threads. This was not working properly as the
set_system_affinity() function wasn't setting the affinity for the
target thread. Instead, the middle init thread was resetting
its own affinity using the target thread's affinity mask.

Differential Revision: https://reviews.llvm.org/D103625

AArch64 Linux and elf-core PAC stack unwinder support

This patch builds on D100521 and other related patches to add support
for unwinding the stack on AArch64 systems with the pointer authentication
feature enabled.

We override the FixCodeAddress and FixDataAddress functions in the
ABISysV_arm64 class. We now try to calculate and set code and data masks
after reading the data_mask and code_mask registers exposed by AArch64
targets running Linux.
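
A minimal sketch of what "masking off ignored bits" can look like. The function
name, the example mask value, and the use of bit 55 to pick between clearing
and setting the masked bits are assumptions for illustration; this is not the
code added by the patch.

```cpp
#include <cassert>
#include <cstdint>

// Bit 55 selects the upper or lower half of the AArch64 address space.
constexpr uint64_t kBit55 = 1ULL << 55;

// Hypothetical helper: `mask` has 1s in the bits that do not belong to the
// address (for example the pointer authentication signature).
uint64_t fixAddressSketch(uint64_t addr, uint64_t mask) {
  assert((mask & kBit55) == 0 && "bit 55 is part of the address, not the mask");
  // Upper-half addresses get the ignored bits set, lower-half addresses get
  // them cleared, so the result is a canonical address either way.
  return (addr & kBit55) ? (addr | mask) : (addr & ~mask);
}
```

For instance, with an assumed mask of 0x007f000000000000, the signed return
address 0x0012aaaaaaaa1234 becomes 0x0000aaaaaaaa1234.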

This patch uses the core file linux-aarch64-pac.core to test that
LLDB can successfully unwind stack frames in the presence of a signed
return address after masking off ignored bits.

This patch also includes an AArch64 Linux native test case to demonstrate
successful backtrace calculation in the presence of the pointer
authentication feature.

Differential Revision: https://reviews.llvm.org/D99944

[libc][NFC] Disable thrd_test as it is exhibiting flaky behavior on the bots.

Revert "[MCA] Adding the CustomBehaviour class to llvm-mca"

This reverts commit f7a23ecece524564a0c3e09787142cc6061027bb.

It appears to break buildbots that don't build the AMDGPU backend.

[MCA] Adding the CustomBehaviour class to llvm-mca

Some instructions are not defined well enough within the target’s scheduling
model for llvm-mca to be able to properly simulate their behaviour. The ideal
solution to this situation is to modify the scheduling model, but that’s not
always a viable strategy. Maybe other parts of the backend depend on that
instruction being modelled the way that it is. Or maybe the instruction is quite
complex and it’s difficult to fully capture its behaviour with tablegen. The
CustomBehaviour class (which I will refer to as CB frequently) is designed to
provide intuitive scaffolding for developers to implement the correct modelling
for these instructions.

Implementation details:

llvm-mca does its best to extract relevant register, resource, and memory
information from every MCInst when lowering them to an mca::Instruction. It then
uses this information to detect dependencies and simulate stalls within the
pipeline. For some instructions, the information that gets captured within the
mca::Instruction is not enough for mca to simulate them properly. In these
cases, there are two main possibilities:

1. The instruction has a dependency that isn’t detected by mca.
2. mca is incorrectly enforcing a dependency that shouldn’t exist.

For the rest of this discussion, I will be focusing on (1), but I have put some
thought into (2) and I may revisit it in the future.

So we have an instruction that has dependencies that aren’t picked up by mca.
The basic idea for both pipelines in mca is that when an instruction wants to be
dispatched, we first check for register hazards and then we check for resource
hazards. This is where CB is injected. If no register or resource hazards have
been detected, we make a call to CustomBehaviour::checkCustomHazard() to give
the target specific CB the chance to detect and enforce any custom dependencies.

The return value for checkCustomHazard() is an unsigned int representing the
(minimum) number of cycles that the instruction needs to stall for. It’s fine to
underestimate this value because when StallCycles gets down to 0, we’ll end up
checking for all the hazards again before the instruction is actually
dispatched. However, it’s important not to overestimate the value and the more
accurate your estimate is, the more efficient mca’s execution can be.
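
To illustrate why underestimating is safe while overestimating is not, here is
a toy model of the stall-and-recheck loop. The function dispatchCycle and its
parameters are hypothetical and are not part of llvm-mca; the hazard checker
stands in for the combined register, resource, and custom hazard checks.

```cpp
#include <cassert>
#include <functional>

// Hypothetical model. `checkHazards(cycle)` returns a lower bound on the
// number of cycles still to wait, or 0 when no hazard remains.
// Returns the cycle at which the instruction is finally dispatched and, if
// `Checks` is non-null, stores how many times the hazards were checked.
unsigned dispatchCycle(unsigned Start,
                       const std::function<unsigned(unsigned)> &checkHazards,
                       unsigned *Checks = nullptr) {
  assert(checkHazards && "a hazard checker is required");
  unsigned Cycle = Start;
  unsigned NumChecks = 0;
  for (;;) {
    unsigned StallCycles = checkHazards(Cycle);
    ++NumChecks;
    if (StallCycles == 0)
      break;              // no hazard left: dispatch now
    Cycle += StallCycles; // wait until StallCycles reaches 0, then re-check
  }
  if (Checks)
    *Checks = NumChecks;
  return Cycle;
}
```

If the real hazard clears at cycle 10, a checker that always answers "1 more
cycle" still dispatches at cycle 10 but re-checks every cycle, whereas one that
answers "15 cycles" dispatches too late.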

In general, for checkCustomHazard() to be able to detect these custom
dependencies, it needs information about the current instruction and also all of
the instructions that are still executing within the pipeline. The mca pipeline
uses mca::Instruction rather than MCInst and the current information encoded
within each mca::Instruction isn’t sufficient for my use cases. I had to add a
few extra attributes to the mca::Instruction class and have them get set by the
MCInst during instruction building. For example, the current mca::Instruction
doesn’t know its opcode, and it also doesn’t know anything about its immediate
operands (both of which I had to add to the class).

With information about the current instruction, a list of all currently
executing instructions, and some target specific objects (MCSubtargetInfo and
MCInstrInfo which the base CB class has references to), developers should be
able to detect and enforce most custom dependencies within checkCustomHazard. If
you need more information than is present in the mca::Instruction, feel free to
add attributes to that class and have them set during the lowering sequence from
MCInst.

Fortunately, in the in-order pipeline, it’s very convenient for us to pass these
arguments to checkCustomHazard. The hazard checking is taken care of within
InOrderIssueStage::canExecute(). This function takes a const InstRef as a
parameter (representing the instruction that currently wants to be dispatched)
and the InOrderIssueStage class maintains a SmallVector<InstRef, 4> which holds
all of the currently executing instructions. For the out-of-order pipeline, it’s
a bit trickier to get the list of executing instructions and this is why I have
held off on implementing it myself. This is the main topic I will bring up when
I eventually make a post to discuss and ask for feedback.

CB is a base class where targets implement their own derived classes. If a
target specific CB does not exist (or we pass in the -disable-cb flag), the base
class is used. This base class trivially returns 0 from its checkCustomHazard()
implementation (meaning that the current instruction needs to stall for 0 cycles
aka no hazard is detected). For this reason, targets or users who choose not to
use CB shouldn’t see any negative impacts to accuracy or performance (in
comparison to pre-patch llvm-mca).
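
The base/derived arrangement can be sketched as follows. Everything here is a
simplified, hypothetical stand-in: Instr replaces mca::Instruction, the
signature of checkCustomHazard is not the one in the patch, and ToyTargetCB,
its opcodes, and makeCB are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for mca::Instruction.
struct Instr {
  unsigned Opcode;
  unsigned CyclesLeft; // cycles until this instruction finishes executing
};

constexpr unsigned kAdd = 0, kLoad = 1, kWait = 2; // made-up opcodes

// Base class: trivially reports "no hazard".
class CustomBehaviour {
public:
  virtual ~CustomBehaviour() = default;
  virtual unsigned checkCustomHazard(const std::vector<Instr> & /*Executing*/,
                                     const Instr & /*Current*/) {
    return 0;
  }
};

// Toy target rule: a kWait must stall until every executing kLoad is done.
class ToyTargetCB : public CustomBehaviour {
public:
  unsigned checkCustomHazard(const std::vector<Instr> &Executing,
                             const Instr &Current) override {
    if (Current.Opcode != kWait)
      return 0;
    unsigned Stall = 0;
    for (const Instr &I : Executing) {
      assert(I.CyclesLeft > 0 && "finished instructions are not executing");
      if (I.Opcode == kLoad)
        Stall = std::max(Stall, I.CyclesLeft);
    }
    return Stall;
  }
};

// Fall back to the base class when no target CB exists or CB is disabled.
std::unique_ptr<CustomBehaviour> makeCB(bool HasTargetCB, bool DisableCB) {
  if (HasTargetCB && !DisableCB)
    return std::make_unique<ToyTargetCB>();
  return std::make_unique<CustomBehaviour>();
}
```

In this sketch, disabling CB or lacking a target implementation both yield the
base class, so every query returns 0 and the simulation proceeds as it would
without the custom check.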

Differential Revision: https://reviews.llvm.org/D104149

Adding the rest of the C11 papers to the status page.

[gn build] Port 6478ef61b1a4

[InstSimplify] Treat invariant group insts as bitcasts for load operands

We can look through invariant group intrinsics for the purposes of
simplifying the result of a load.

Since intrinsics can't be constants, but we also don't want to
completely rewrite load constant folding, we convert the load operand to
a constant. For GEPs and bitcasts we just treat them as constants. For
invariant group intrinsics, we treat them as a bitcast.

Relanding with a check for self-referential values.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D101103

[asan] Remove Asan, Ubsan support of RTEMS and Myriad

Differential Revision: https://reviews.llvm.org/D104279

[NFC] Fix "unused variable" warning

Support lowering of index-cast on vector types.

The index cast operation accepts vector types. Implement its lowering in this patch.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D104280