review.tizen.org Git - platform/upstream/llvm.git/log

[Clang][Driver] Default Generic_GCC::IsIntegratedAssemblerDefault to true

Invert the logic and have the default being true. Disable the few spots where
it looks like IAS is currently not used.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D147030

[crt] Remove __USER_LABEL_PREFIX__

The .init_array code is ELF specific. For ELF platforms,
`__USER_LABEL_PREFIX__` is defined as "". Make the simplification
so that downstream ELF targets can build this file even if
`__USER_LABEL_PREFIX__` is undefined.

Reviewed By: barannikov88

Differential Revision: https://reviews.llvm.org/D147093

Reland "[Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 2"

This reverts commit db6a979ae82410e42430e47afa488936ba8e3025.

Reland D102817 without any change. The previous revert was a mistake.

Differential Revision: https://reviews.llvm.org/D102817

[fuzzer] Limit big-file-copy.test to darwin only

This test has to be limited to darwin due to multiple failures on other
platforms for multiple reasons (timeouts, the puts() limit, etc.). This
commit modifies D146189.

Reviewed By: NoQ

Differential Revision: https://reviews.llvm.org/D147094

[mlir][bufferization] Use rewriter to erase ops in scf.forall bufferization.

Without this bufferization cannot track operations removed during bufferization.
Unfortunately there is currently no way to enforce that ops need to be erased through
the rewriter and this causes sporadic errors when tracking pointers in Bufferization pass.
Therefore there is no easy way to test that the pattern is doing the right thing.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D147095

[mlir][sparse] Fixing -Wignored-reference-qualifiers in MergerTest.cpp

These warnings were introduced by D146561.

Reviewed By: aartbik, Peiming

Differential Revision: https://reviews.llvm.org/D147090

[bazel] Port zstd support

Originally added in D128465. Used by `llvm:Support` and `lld:ELF`.

Enabled by default. Disable with `--@llvm_zstd//:llvm_enable_zstd=false`.

Reviewed By: MaskRay, GMNGeoffrey

Differential Revision: https://reviews.llvm.org/D143344

[CSSPGO][Preinliner] Trim cold call edges of the profiled call graph for a more stable profile generation.

I've noticed that for some services the CSSPGO profile is less stable than the non-CS AutoFDO profile from one profiling run to the next, without source changes. This shows up when comparing profile similarities. For example, in my experiments AutoFDO profiles are always 99+% similar over the same binary with different inputs (very similar dynamic traffic), while CSSPGO profile similarity is around 90%.

The main source of the profile instability is the top-down order computed on the profiled call graph in the llvm-profgen CS preinliner. The top-down order is used to guide the CS preinliner to pre-compute an inline decision that is later on fulfilled by the compiler. A subtle change in the top-down order from run to run could cause a different inline decision to be computed. A deeper look into the divergence of the top-down order revealed that:
- The topological sorting inside one SCC isn't quite right. This is fixed by {D130717}.
- The profiled call graphs of the two sides of the A/B run aren't 100% the same. The call edges in the two runs do not subsume each other, and edges that appear in both graphs may not have exactly the same weight. This is due to the dynamic nature of the graphs. However, I saw that the graphs can be made closer by removing the cold edges from them, and this bumped the CSSPGO profile stability up to the same level as the AutoFDO profile.

Removing cold call edges from the dynamic call graph may have an impact on cold inlining, but so far I haven't seen any performance issues since the CS preinliner mainly targets hot callsites, and cold inlining can always be done by the compiler CGSCC inliner.

Also fixing an issue where the largest weight instead of the accumulated weight for a call edge is used in the profiled call graph.
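
As a rough illustration of the two ideas above (accumulating edge weights and trimming cold edges), here is a minimal sketch. The `CallSample` type, the `buildTrimmedCallGraph` name and the threshold parameter are assumptions made up for this example; they are not llvm-profgen's actual data structures or API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not llvm-profgen's data structures.
struct CallSample {
  std::string Caller;
  std::string Callee;
  std::uint64_t Count;
};

using Edge = std::pair<std::string, std::string>;
using EdgeWeights = std::map<Edge, std::uint64_t>;

EdgeWeights buildTrimmedCallGraph(const std::vector<CallSample> &Samples,
                                  std::uint64_t ColdThreshold) {
  // Accumulate: every sample of the same caller/callee pair adds to the edge
  // weight, instead of keeping only the largest single sample.
  EdgeWeights Accumulated;
  for (const CallSample &S : Samples) {
    assert(!S.Caller.empty() && !S.Callee.empty() && "edge needs both ends");
    Accumulated[Edge(S.Caller, S.Callee)] += S.Count;
  }

  // Trim: keep only edges whose accumulated weight reaches the threshold.
  EdgeWeights Trimmed;
  for (const auto &Entry : Accumulated)
    if (Entry.second >= ColdThreshold)
      Trimmed.insert(Entry);
  return Trimmed;
}
```

With samples `main->hot` (60), `main->hot` (50) and `main->cold` (3) and a threshold of 10, only the `main->hot` edge survives, with weight 110 rather than 60.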

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D147013

[lldb] Support Universal Mach-O binaries with a fat64 header

Support universal Mach-O binaries with a fat64 header. After
4d683f7fa7d4, dsymutil can now generate such binaries when the offsets
would otherwise overflow the 32-bit offsets in the regular fat header.

rdar://107289570

Differential revision: https://reviews.llvm.org/D147012

[AMDGPU] Replace target feature for global fadd32

Change target feature of __builtin_amdgcn_global_atomic_fadd_f32
to atomic-fadd-rtn-insts. Enable atomic-fadd-rtn-insts for gfx90a,
gfx940 and gfx1100 as they all support the return variant of
`global_atomic_add_f32`.

Fixes https://github.com/llvm/llvm-project/issues/61331.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D146840

[bazel] Fix MLIR tests after 92c6946

Reviewed By: GMNGeoffrey

Differential Revision: https://reviews.llvm.org/D147088

[mlir][sparse] convert a sparse tensor slice to sparse tensor correctly.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D147074

[analyzer] Teach scan-build how to pass -analyzer-config to xcodebuild.

The scan-build tool assists various build systems with applying the Clang
static analyzer alongside compilation. It offers explicit integration with
Xcode's native build system aka `xcodebuild`; in this case it doesn't
substitute the compiler, but instead kindly asks xcodebuild to enable
the static analyzer, something that it already knows how to do.

Make sure scan-build's `-analyzer-config` flag (which translates to a
similar `clang -cc1 -analyzer-config` flag) is properly translated
to Xcode build system. This unbreaks a few related features such as
checker silencing.

No LIT tests because they'd require an Xcode installation on your system.

[fuzzer] Use puts() rather than printf() in CopyFileToErr()

CopyFileToErr() uses Printf("%s", ...), which fails with a negative size on
files >2GB. (Its path goes through var-args wrappers to an unnecessary "%s"
expansion and is subject to int overflows.) Using puts() in place of printf()
bypasses this path and writes the string directly to stderr. This avoids the
present loss of data when a crashed worker has generated >2GB of output.
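
A minimal sketch of the underlying limit, assuming only standard C library behaviour: printf-style functions report lengths as `int`, so a byte count above `INT_MAX` cannot be represented on that path. The helper names below are hypothetical and are not libFuzzer's code.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdio>

// True when a byte count cannot be represented as an int, the type that
// printf-style return values and precisions are limited to.
bool exceedsIntRange(std::size_t NumBytes) {
  return NumBytes > static_cast<std::size_t>(INT_MAX);
}

// Hypothetical helper: write a NUL-terminated buffer to a stream without
// going through printf-style formatting at all.
bool writeUnformatted(const char *Data, std::FILE *Out) {
  assert(Data && Out);
  return std::fputs(Data, Out) >= 0; // fputs returns non-negative on success
}
```

Any output larger than `INT_MAX` bytes (about 2 GiB) makes `exceedsIntRange` return true, which is the size range the commit is concerned with.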

rdar://99384640

Reviewed By: yln, rsundahl

Differential Revision: https://reviews.llvm.org/D146189

[clang][PowerPC] Remove remaining Darwin support

POWER Darwin support in the backend has been removed for some time: https://discourse.llvm.org/t/rfc-remove-darwin-support-from-power-backends
but Clang still has the TargetInfo and other remnants lying around.

This patch does some cleanup and removes those and other related frontend support still remaining. We adjust any tests using the triple to either remove
the test if unneeded or switch to another Power triple.

Reviewed By: MaskRay, nemanjai

Differential Revision: https://reviews.llvm.org/D146459

Disable resize_tls_dynamic test for HWASan

The test is not applicable because HWASan does not intercept __tls_get_addr.

This is pre-emptive cleanup, to get ready for Kirill's patch to enable sanitizer common tests for HWASan (https://reviews.llvm.org/D147067).

Note that there is an outstanding dynamic TLS bug for sanitizers - https://github.com/google/sanitizers/issues/1409
- but that isn't applicable here due to the lack of interception.

Test: LIT_FILTER=resize_tls_dynamic ninja check-sanitizer

Differential Revision: https://reviews.llvm.org/D147076

DebugInfo: Rebuild dwp debug_info index column from v5 indexes more robustly

The v4 rebuilding is best-effort because it's not possible to reliably
parse the DWO ID as it requires the abbrev section (& if the index isn't
trustworthy then there's no way to find the associated abbrev section
contribution for a given info section contribution).

But in v5 the DWO ID/type signature is in the header and can be rebuilt
losslessly (only at the cost of performance of rescanning/parsing the
headers of all the units), so let's implement that.

The testing isn't /ideal/ - I think the testing should've been
implemented as a hardcoded dwp file with a corrupted/incorrect index,
then the test could've demonstrated that reparsing the index produces
the right answer - but this is a quick port of the existing v5 test back
to v4 so that we don't lose coverage on the v4 codepath now that it's
separated from the v5 codepath.

Differential Revision: https://reviews.llvm.org/D146662

[clang-tidy] Add option to ignore capture default by reference in cppcoreguidelines-avoid-capture-default-when-capturing-this

The rule exists primarily for when using capture default
by copy "[=]", since member variables will be captured by
reference, which is against developer expectations.

However when the capture default is by reference, then there
is no doubt: everything will be captured by reference. Add
an option to allow just that.
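
To make the distinction concrete, here is a small sketch with a hypothetical `Counter` class. With either capture default, the member is reached through `this`, so both lambdas observe later changes; the surprise is only with `[=]`, which reads as if it took a copy.

```cpp
#include <cassert>
#include <functional>

// Hypothetical class for illustration.
struct Counter {
  int Value = 0;

  // [=] looks like a by-value capture, but it captures `this`, so Value is
  // read through the pointer when the lambda runs, not copied here.
  std::function<int()> readerWithCopyDefault() {
    return [=]() { return Value; };
  }

  // [&] makes no promise of a copy, so reading through `this` is expected.
  std::function<int()> readerWithRefDefault() {
    return [&]() { return Value; };
  }
};

void captureExample() {
  Counter C;
  auto ByCopy = C.readerWithCopyDefault();
  auto ByRef = C.readerWithRefDefault();
  C.Value = 42;
  assert(ByCopy() == 42); // not 0: the member was never copied
  assert(ByRef() == 42);
}
```

Note that implicit capture of `this` via `[=]` is deprecated in C++20, so newer compilers may warn on the first lambda.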

Note: Release Notes do not need update since this check
has been introduced in the current WIP release.

A ticket has been opened at the C++ Core Guidelines repo
to consider updating the rule such that this behavior
is the default one:
https://github.com/isocpp/CppCoreGuidelines/issues/2060

Differential Revision: https://reviews.llvm.org/D147062

Simplify index rebuilding test.

This isn't an ideal test - probably would be better if it had a
corrupted index (& was hardcoded - so it didn't depend on llvm-dwp) to
demonstrate that index rebuilding produces a distinct result.

But, ah well, this'll do for now.

[libc][NFC] Fix conversion warning

[clang-format] Handle '_' in ud-suffix for IntegerLiteralSeparator

Also, handle imaginary numbers, i.e., those with suffixes starting
with an 'i'.

Fixes #61676.

Differential Revision: https://reviews.llvm.org/D146844

[bazel] Fix mlir buildifier issues

[flang] Fine-tune NAN formatted input editing

Per Fortran 2018, "NAN" and "NAN()" are to be translated into quiet
NaNs, and the other forms are implementation-dependent; I've made
them quiet NaNs too. Also process signs on input NaNs, which seems
wrong but other compilers all do it, and fix some misleading template
argument names noticed along the way.

Differential Revision: https://reviews.llvm.org/D147071

[MLIR][MemRef] Add missing #include for FailureOr

FailureOr was used without including correct headers, so the code only works if the user of Transform.h includes the correct headers first.

Reviewed By: jyknight

Differential Revision: https://reviews.llvm.org/D147069

[libc] Install GPU headers to `gpu-none-llvm/` subfolder

The GPU support for the `libc` generates all its own headers. Since
these headers use the same names as the system headers we need to make
sure that they are separate. Currently, we either use the system headers
on the GPU or the GPU headers on the system. This patch makes them
explicitly separate. A follow-up patch will then make `clang` look in
this folder by default.

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D146970

[lldb] TestInferiorCrashing.py should check for crash reason

In a now-reverted series of patches, I inadvertently broke the ability
for lldb-server to explain a crash reason. To ensure that this feature
continues to work after future refactors, let's test the feature.

Differential Revision: https://reviews.llvm.org/D147001

[flang] Fix checking of pointer passed to assumed-rank

Don't check ranks when a pointer actual argument is associated with
a pointer assumed-rank dummy argument.

Differential Revision: https://reviews.llvm.org/D147052

[libc] Support setting 'native' GPU architecture for libc

We already use the `amdgpu-arch` and `nvptx-arch` tools to determine the
GPU architectures the user's system supports. We can provide
`LIBC_GPU_ARCHITECTURES=native` to allow users to easily build support
for only the one found on their system. This also cleans up the code
somewhat.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D146994

[mlir][sparse] Removing shared_ptr from the MergerTest.cpp unit test

This is a preliminary change to make way for converting the Merger's identifier types from mere typedefs to actual types (which causes some issues that this patch fixes).

Depends On D146676

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D146561

[BOLT] computing raw branch count for yaml profiles

`Function.RawBranchCount` is initialized for fdata profile but not for yaml one.
The diff adds the computation of the field for yaml profiles

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D144211

[NFC] Fix formatting for `enumerator_result::get`.

[Fuchsia][CMake] Always use multiple distribution model.

[flang] Normalize logical values during type conversions.

Flang was missing value normalization for logical<->integer conversions
which is required by Flang specification. The shrinking logical<->logical
conversions were also incorrectly truncating the input.
This change performs value normalization for all logical<->integer
conversions and logical<->logical conversions between different kinds.

Note that value normalization is not strictly required for
logical(kind=k1)->logical(kind=k2) conversions when k1 < k2.
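
The difference between truncating and normalizing can be sketched as follows. This is only a model: the function names are hypothetical, and representing "true" as 1 is an assumption made for the example, not a statement about Flang's lowering.

```cpp
#include <cassert>
#include <cstdint>

// Shrinking by truncation keeps only the low bits, so a nonzero ("true")
// value such as 0x100 becomes 0 ("false").
std::uint8_t truncateLogical(std::uint32_t Wide) {
  return static_cast<std::uint8_t>(Wide);
}

// Normalizing compares against zero first, so any nonzero input maps to 1.
std::uint8_t normalizeLogical(std::uint32_t Wide) {
  return Wide != 0 ? 1 : 0;
}

void logicalExample() {
  assert(truncateLogical(0x100) == 0);  // truth value lost
  assert(normalizeLogical(0x100) == 1); // truth value preserved
  assert(normalizeLogical(0) == 0);
}
```

Widening a value that is already 0 or 1 cannot lose bits, which is consistent with the note that normalization is not strictly required when k1 < k2.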

Differential Revision: https://reviews.llvm.org/D147019

[OpenMP][MLIR] Fix warning from getIsDevice OffloadModuleInterface function

The original implementation was missing the default return at the end
of the function. This produces a warning that causes subsequent build
failures (and regardless, it was incorrect behaviour that should
have been fixed).

Fix build failures with MSVC 14.x

[SLP][AArch64] Add test to check for the vectorization of fshl

Currently the cost for fshl is an overestimate causing SLP to vectorize when it is not necessary.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D147056

[llvm] Use pointer index type for more GEP offsets (pre-codegen)

Many uses of getIntPtrType() were using that type to calculate the
needed type for GEP offset arguments. However, some time ago,
DataLayout was extended to support pointers where the size of the
pointer is not equal to the size of the values used to index it.

Much code was already migrated to, for example, use getIndexSizeInBits
instead of getPtrSizeInBits, but some rewrites still used
getIntPtrType() to get the type for GEP offsets.

This commit changes uses of getIntPtrType() to getIndexType() where
they are involved in a GEP-related calculation.

In at least one case (bounds check insertion) this resolves a compiler
crash that the new test added here would previously trigger.

This commit does not impact
- C library-related rewriting (memcpy()), which are operating under
the assumption that intptr_t == size_t. While all the mechanisms for
breaking this assumption now exist, doing so is outside the scope of
this commit.
- Code generation and below. Note that the use of getIntPtrType() in
CodeGenPrepare will be changed in a future commit.
- Usage of getIntPtrType() in any backend

Depends on D143435

Reviewed By: arichardson

Differential Revision: https://reviews.llvm.org/D143437

[lldb] Fix value printing for a specific case

Fixes printing of spaces in cases where the following are true:

1. Persistent results are disabled
2. The type has a summary string

As reported by @jgorbe in D146783, two spaces were being printed before the summary
string, and no spaces were printed after.

Differential Revision: https://reviews.llvm.org/D147006

[MLIR] Refactor affine tilePerfectlyNestedLoops to drop validity check

The affine loop utility `tilePerfectlyNestedLoops` was checking for the
validity of tiling as well as performing the tiling. This is
inconsistent with how other similar utilities work. Move out the
analysis/check from the utility so that the latter only performs the
mechanics of IR manipulation.

This is NFC/pure move beyond the change in behavior of
tilePerfectlyNestedLoops.

Differential Revision: https://reviews.llvm.org/D147055

[mlir-cpu-runner] Add export_executable_symbols in CMake.

LLJIT needs access to symbols (e.g. llvm_orc_registerEHFrameSectionWrapper)
that will be defined in the executable when LLVM is linked statically.

This change is consistent with how other tools within LLVM use LLJIT. It
is required to make sure that `mlir-cpu-runner --host-supports-jit`
correctly returns `true` on platforms that do support JITting (in my
case that's AArch64 Linux).

See https://github.com/llvm/llvm-project/issues/61712 for more context.

Differential Revision: https://reviews.llvm.org/D146935

[clang][doc] Fix link to SYCL compiler design doc

[AArch64] Add v8.9a/v9.4a FEAT_ATS1A

FEAT_ATS1A adds three new AT system instruction aliases. This feature is
optional from v8.9a/v9.4a. FEAT_ATS1A is a very late addition to the
2022 A-profile VMSA extension, and has not yet been added to the public
docs available on developer.arm.com

These AT instructions are added without a command-line flag or feature,
because it is system-instruction only, and FEAT_S1PIE also has no
command-line flag.

Differential Revision: https://reviews.llvm.org/D146962

[RISCV] Add shuffle cost tests for general fixed vector permute [nfc]

[RISCV] Consolidate and extend fixed vector shuffle cost tests [nfc]

[AMDGPU] Avoid duplicated work in SIRegisterInfo::getReservedRegs

[OpenMP][Flang][MLIR] Implement OffloadModuleInterface for OpenMP Dialect and convert is_device to an Attribute

This commit adds the OffloadModuleInterface to the OpenMP dialect,
which will implement future module attribute get/set's for offloading.
Currently it implements set and get's for the omp.is_device attribute,
which is promoted to a real attribute in this commit as well (primarily
to allow switch cases to work nicely with it for future work and to keep
consistency with future module attributes).

This interface is attached to mlir::ModuleOp's on registration of the
OpenMPDialect and should be accessible anywhere the OpenMP
dialect is registered and initialized.

Reviewers: kiranchandramohan, awarzynski

Differential Revision: https://reviews.llvm.org/D146850

[NFC][OpenMP][libomptarget] Remove unnecessary AsyncInfoWrapperTy parameter

[mlir][llvm] Verify consistency of llvm.resume and llvm.landingpad types

Following the steps of the LLVM verifier
(https://github.com/llvm/llvm-project/blob/b2c48559c882fd558f91e471c4d23ea7b0c6e718/llvm/lib/IR/Verifier.cpp#L4195),
checks in llvm.func verifier are added to ensure consistency of
llvm.landingpad operations' result types and llvm.resume operations'
input types.

As in LLVM, we will allow llvm.resume operations with input values
defined by operations other than llvm.landingpad.

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Reviewed By: gysit, Dinistro

Differential Revision: https://reviews.llvm.org/D146968

[libc][NFC] Remove useless header guards from implementation file

Summary:
These were accidentally added and don't do anything.

[mlir][python] Mark operator== overloads as const

This resolves some warnings when building with C++20, e.g.:
```
llvm-project/mlir/lib/Bindings/Python/IRAffine.cpp:545:60: warning: ISO C++20 considers use of overloaded operator '==' (with operand types 'mlir::python::PyAffineExpr' and 'mlir::python::PyAffineExpr') to be ambiguous despite there being a unique best viable function [-Wambiguous-reversed-operator]
                        PyAffineExpr &other) { return self == other; })
                                                      ~~~~ ^  ~~~~~
llvm-project/mlir/lib/Bindings/Python/IRAffine.cpp:350:20: note: ambiguity is between a regular call to this operator and a call with the argument order reversed
bool PyAffineExpr::operator==(const PyAffineExpr &other) {
                   ^
```
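
A minimal sketch of the fix, using a hypothetical `Expr` class as a stand-in for the binding classes. In C++20, `a == b` also considers the reversed call `b == a`; a non-const member `operator==` treats its two operands differently, so neither candidate is better for both arguments. Marking it `const` removes that asymmetry.

```cpp
#include <cassert>

// Hypothetical stand-in for a class such as PyAffineExpr.
class Expr {
public:
  explicit Expr(int Id) : Id(Id) {}

  // const-qualified: the implicit object and the parameter are now both
  // const references, so the regular and reversed candidates convert their
  // arguments identically and the regular one is preferred.
  bool operator==(const Expr &Other) const { return Id == Other.Id; }

private:
  int Id;
};

void equalityExample() {
  Expr A(1), B(1), C(2);
  assert(A == B);
  assert(!(A == C));
}
```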

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D147018

[AIX] Update release notes regarding -mxcoff-build-id and the profile runtime

[RISCV][TTI] Extract getConstantPoolLoadCost helper routine [nfc]

We had 3 copies of this code, and I am about to add a fourth.

[RISCV] Cost model for general case of single vector permute

The cost model was not accounting for the fact that we can generate vrgather + an index expression.

Two cases to call out.
1) I did not model the difference between vrgather and vrgatherei16. The result is the constant pool cost can be slightly understated on RV32. I don't think we care, but if someone disagrees, this would be easy to add.
2) Our current codegen for i8 vectors longer than 256 (which is the limit of what this costs) has some room for improvement.

Differential Revision: https://reviews.llvm.org/D147000

[X86] MatchVectorAllZeroTest - return X86::CondCode instead of constant node. NFC.

Just return the X86::CondCode enum value instead of creating the target constant node in multiple locations, letting us use the getSETCC helper.

[clangd] Fix build by replacing unsigned long with std::vector::size_type.

[clangd] Show used symbols on #include line hover.

Differential Revision: https://reviews.llvm.org/D146244

[X86] emitFlagsForSetcc - pull out repeated isEquality condcode checks. NFC.

Most of the combines are for ISD::SETEQ/ISD::SETNE comparisons so do a single early-check for the condcode.

[libc++][NFC] Rename helper function for testing spaceship

The helper is mis-named, since it won't work as-is on ordered containers
like set and map, because they rely on being able to store keys that are
partial_ordering::unordered, and that's UB for an ordered container.

This was most likely a typo or an unintended naming mistake, since
the function is only used with sequence containers anyway.

Differential Revision: https://reviews.llvm.org/D146991

[libc++] Don't try to provide source_location on AppleClang 1403

AppleClang 1403 has some bugs that prevent std::source_location from
working properly on it. Consequently, we XFAILed the unit test for
source_location with that compiler. However, we should also avoid
advertising that the feature is supported on that compiler, otherwise
our feature-test macros lie. This was noticed to break Boost.Asio
when building with a recent libc++ and AppleClang 14.0.3.

rdar://106863087

Differential Revision: https://reviews.llvm.org/D146837

[mlir][MemRef] Move transform related functions in Transforms.h

NFC

[libc++] Also support target triples that end with .0 in backdeployment tests

Sometimes, a target can look like `<arch>-apple-macosx10.15.0` instead
of `<arch>-apple-macosx10.15`. This ensures that the test suite handles
those target triples properly as well.
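
One possible way to treat the two spellings as equivalent is to drop a trailing `.0` patch component before comparing. The sketch below is an assumption about how this could be done, with a hypothetical helper name; it is not the logic the libc++ test suite actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: turn "...macosx10.15.0" into "...macosx10.15" while
// leaving two-component versions such as "...macosx11.0" untouched.
std::string dropZeroPatchVersion(const std::string &Triple) {
  // Find where the trailing run of digits and dots (the version) starts.
  std::size_t Start = Triple.size();
  while (Start > 0 &&
         (std::isdigit(static_cast<unsigned char>(Triple[Start - 1])) ||
          Triple[Start - 1] == '.'))
    --Start;
  const std::string Version = Triple.substr(Start);

  const auto Dots = std::count(Version.begin(), Version.end(), '.');
  const bool EndsWithDotZero =
      Version.size() >= 2 && Version.compare(Version.size() - 2, 2, ".0") == 0;

  // Only strip when the version has the form major.minor.0.
  if (Dots == 2 && EndsWithDotZero)
    return Triple.substr(0, Triple.size() - 2);
  return Triple;
}

void tripleExample() {
  assert(dropZeroPatchVersion("arm64-apple-macosx10.15.0") ==
         "arm64-apple-macosx10.15");
  assert(dropZeroPatchVersion("arm64-apple-macosx11.0") ==
         "arm64-apple-macosx11.0");
}
```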

Differential Revision: https://reviews.llvm.org/D146365

[AMDGPU] Handle memset users in PromoteAlloca

Allows allocas with memset users to be promoted.

This is intended to prevent patterns such as `memset(&alloca, 0, sizeof(alloca))` (which I think can be emitted by frontends) from preventing a vectorization of allocas.
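
At the source level, the pattern in question looks roughly like the sketch below: a local array (which a frontend would typically lower to an alloca) is zeroed with memset before use. The function name is hypothetical and the example only illustrates the shape of the input, not the pass itself.

```cpp
#include <cassert>
#include <cstring>

// A local array zeroed with memset and then partially overwritten.
int sumOfZeroedLocal() {
  int Local[4];
  std::memset(&Local, 0, sizeof(Local));
  Local[2] = 7;
  return Local[0] + Local[1] + Local[2] + Local[3];
}

void memsetExample() { assert(sumOfZeroedLocal() == 7); }
```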

Fixes SWDEV-388784

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D146225

Downgrade reserved module identifier error into a warning

Any project that wants to import std; potentially needs to be able to
build a module that does export std;. We silenced the error diagnostic
if the module identified itself as a system header, but this isn't
quite good enough, what we really need is a way to identify a system
module. It would be nice for that feature to be shared among the major
implementations, so this downgrades the diagnostic from an error to a
warning temporarily to give implementers time to determine what that
mechanism will look like. We may convert this warning back into an
error in a future release of Clang, but it's not guaranteed we will do
so.

Fixes https://github.com/llvm/llvm-project/issues/61446
Differential Revision: https://reviews.llvm.org/D146986

[Sanitizers] Fix a memory leak.

Differential Revision: https://reviews.llvm.org/D146756

[mlir] Add missing STL include to 1:N conversion utils.

[mlir][MemRef] Add patterns to extract address computations

This patch adds patterns to rewrite memory accesses such that the resulting
accesses are only using a base pointer.
E.g.,
```mlir
memref.load %base[%off0, ...]
```

will be rewritten into:
```mlir
%new_base = memref.subview %base[%off0,...][1,...][1,...]
memref.load %new_base[%c0,...]
```

The idea behind these patterns is to offer a way to more gradually lower
address computations.

These patterns are the exact opposite of FoldMemRefAliasOps.
I've implemented the support of only five operations in this patch:
- memref.load
- memref.store
- nvgpu.ldmatrix
- vector.transfer_read
- vector.transfer_write

Going forward we may want to provide an interface for these rewritings (and
the ones in FoldMemRefAliasOps.)
One step at a time!

Differential Revision: https://reviews.llvm.org/D146724

[flang] Fix CONTIGUOUS attribute checking

A CONTIGUOUS entity must be an array pointer, assumed-shape dummy array,
or assumed-rank dummy argument (C752, C830). As currently implemented,
f18 only implements the array requirement if the entity is a pointer.
Combine these checks and start issuing citations to scalars.

Differential Revision: https://reviews.llvm.org/D146588

[clang] Fix consteval initializers of temporaries

When a potential immediate invocation is met, it is immediately wrapped by a
`ConstantExpr`. There is also a TreeTransform that removes this `ConstantExpr`
wrapper when corresponding expression evaluation context is popped.
So, in case initializer was an immediate invocation, `CXXTemporaryObjectExpr`
was wrapped by a `ConstantExpr`, and that caused additional unnecessary
`CXXFunctionalCastExpr` to be added, which later confused the TreeTransform
that rebuilds immediate invocations, so it was adding unnecessary
constructor call.

Fixes https://github.com/llvm/llvm-project/issues/60286

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D146801

[flang] Disallow scalar argument to SIZE/LBOUND/UBOUND

The compiler accepts arguments of any rank, or assumed rank, to a host
of intrinsic inquiry functions. For scalars, this is correct for most
of them, but the standard (and other compilers) prohibit scalar arguments
to SIZE, LBOUND, and UBOUND (without DIM=).

There are meaningful interpretations for these intrinsic inquiries
on scalars, but since there's no portability concern here, continuing
to support them would be an unjustifiable extension.

Differential Revision: https://reviews.llvm.org/D146587

[mlir][doc] Fix typos

It fixes some typos in the language reference.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D147028

[ComplexDeinterleaving] Propagate fast math flags to symmetric operations.

This is a simple patch to make sure fast math flags are propagated through to
the newly created symmetric operations, which can help with later
simplifications.

Differential Revision: https://reviews.llvm.org/D146409

[AArch64] Combine fadd into fcmla

This adds a target combine for `fadd(a, vcmla(b, c, d))` -> `vcmla(fadd(a, b), c, d)`,
pushing the fadd into the operands of the fcmla, which can help simplify away some
additions.

Differential Revision: https://reviews.llvm.org/D146407

[clang][dataflow][NFC] Put TransferVisitor in an unnamed namespace.

This avoids the risk of ODR violations.

Reviewed By: gribozavr2

Differential Revision: https://reviews.llvm.org/D147032

[mlir][Bazel] Adjust BUILD file for 586cebef271f627e80c3535e7cd201915f88b349

[C++20] [Modules] Don't create duplicated deduction guides for duplicated classes

Close https://github.com/llvm/llvm-project/issues/56916

Within C++20 modules, the same constructor may appear in multiple copies
of the same RecordDecl, and it makes no sense to create duplicated
deduction guides for the duplicated constructors.

[llvm][Bazel] Add missing dependency.

[mlir] support external named transform libraries

Introduce support for external definitions of named sequences in the
transform dialect by letting the TransformInterpreterPassBase read a
"library" MLIR file. This file is expected to contain definitions for
named sequences that are only declared in the main transformation
script. This allows for sharing non-trivial transform combinations
without duplication.

This patch provides only the minimal plumbing for a single textual IR
file. Further changes are possible to support multiple libraries and
bytecode files.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D146961

[llvm-windres] Fix a test that failed on Windows. NFC.

Skip this test on Windows (by requiring a posix shell), since we
want to test specific corner cases of quotes passed to the executable,
and llvm-lit/cmd don't seem to handle it correctly at the moment.

[Orc][AArch32] Polish Thumb symbol assertions in ObjectLinkingLayer

[mlir] Add another test case for 1:N type conversion facilities. (NFC)

This patch adds another test case for the new 1:N type conversion utils
testing that the proper user materializations are applied depending on
which of the ops are converted by the test pass.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D147027

[mlir][scf] Implement structural conversion for 1:N type conversions.

This patch implements patterns for the newly introduced 1:N type
conversion utils for several ops of the SCF dialect. It also adds an
option to the existing test pass as well as test cases that apply the
patterns through the test pass.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D146959

Revert "[CMake] Unify llvm_check_linker_flag and llvm_check_compiler_linker_flag"

This reverts commit 55e65ad876e3ac0b1cb0410a5cce3554c009af65.

[tsan] Derive the unmangled SP in longjmp with xor key on loongarch64

Use the xor key to derive the unmangled SP, following the way glibc
added support for pointer mangling on loongarch in commit
1c9bc1b6e50293a1b7037a7bfbf835868a55baed.

Reviewed By: SixWeining, wangleiat, xen0n

Differential Revision: https://reviews.llvm.org/D146716

[Clang][DebugInfo][AMDGPU] Emit zero size bitfields in the debug info to delimit bitfields in different allocation units.

Consider the following structures when targeting AMDGPU:

  struct foo {
    int space[4];
    char a : 8;
    char b : 8;
    char x : 8;
    char y : 8;
  };

  struct bar {
    int space[4];
    char a : 8;
    char b : 8;
    char : 0;
    char x : 8;
    char y : 8;
  };

Even if both structs have the same layout in memory, they are handled
differently by the AMDGPU ABI.

With the following code:

// clang --target=amdgcn-amd-amdhsa -g -O1 example.c -S
char use_foo(struct foo f) { return f.y; }
char use_bar(struct bar b) { return b.y; }

For use_foo, the 'y' field is passed in v4
; v_ashrrev_i32_e32 v0, 24, v4
; s_setpc_b64 s[30:31]

For use_bar, the 'y' field is passed in v5
; v_bfe_i32 v0, v5, 8, 8
; s_setpc_b64 s[30:31]

To make this distinction, we record a single 0-size bitfield for every member that is preceded
by it.
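
The claim that both structs share a memory layout can be checked on a host compiler with a sketch like the one below. `readYThroughBar` is a hypothetical helper, and the equal-layout assumption is only expected to hold on common host ABIs; it has not been verified for every target.

```cpp
#include <cassert>
#include <cstring>

struct foo {
  int space[4];
  char a : 8;
  char b : 8;
  char x : 8;
  char y : 8;
};

struct bar {
  int space[4];
  char a : 8;
  char b : 8;
  char : 0; // zero-size bitfield: starts a new allocation unit
  char x : 8;
  char y : 8;
};

// Copy the bytes of a foo into a bar and read 'y'. This only round-trips if
// both structs place 'y' at the same position.
char readYThroughBar(const foo &F) {
  assert(sizeof(foo) == sizeof(bar));
  bar B;
  std::memcpy(&B, &F, sizeof(bar));
  return B.y;
}
```

Since the bytes are identical, only the zero-size bitfield tells a debugger which calling convention applies, which is why it needs to be recorded in the debug info.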

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D144870

[clang][dataflow][NFC] Eliminate StmtToEnvMap interface.

Instead, we turn StmtToEnvMap into a concrete class with the implementation that used to live in StmtToEnvMapImpl.

The layering issue that originally required the indirection through the
`StmtToEnvMap` interface no longer exists.

Reviewed By: ymandel, xazax.hun, gribozavr2

Differential Revision: https://reviews.llvm.org/D146507

[llvm-windres] Try to match GNU windres regarding handling of unescaped quotes

Some background context: GNU windres invokes the preprocessor in
a subprocess. Some windres options are passed through to the
preprocessor, e.g. -D options for predefining defines.
When GNU windres passes these options onwards, it takes the options
in exactly the form they are received (in argv or similar) and
assembles them into a single preprocessor command string which gets
interpreted by a shell (IIRC via the popen() function, or similar).

When LLVM invokes subprocesses, it does so via APIs that take
properly split argument vectors, to avoid needing to worry about
shell quoting/escaping/unescaping. But in the case of LLVM windres,
we have to emulate the effect of the shell parsing done by popen().

Most of the relevant cases are already taken care of here, but this
patch fixes an uncommon case encountered in
https://github.com/llvm/llvm-project/issues/57334.
(This case is uncommon since it doesn't do what one would want to;
the quotes need to be escaped more to work as intended through
the popen() shell).
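
To illustrate the kind of shell parsing that has to be emulated, here is a
minimal sketch of POSIX-style quote removal on a single argument. It is not
the llvm-windres implementation; `shellUnquote` and `checkShellUnquote` are
hypothetical names, the inputs are made up rather than taken from the issue,
and word splitting and variable expansion are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: applies POSIX-shell-style quote removal to a single
// argument, roughly what a shell does to each word of a command string.
std::string shellUnquote(const std::string &arg) {
  std::string out;
  char quote = 0; // 0 when unquoted, otherwise the active quote character
  for (std::size_t i = 0; i < arg.size(); ++i) {
    const char c = arg[i];
    if (quote == '\'') {
      // Inside single quotes everything is literal up to the closing quote.
      if (c == '\'')
        quote = 0;
      else
        out += c;
    } else if (c == '\\' && i + 1 < arg.size()) {
      const char next = arg[i + 1];
      const bool special =
          next == '"' || next == '\\' || next == '$' || next == '`';
      if (quote == '"' && !special) {
        out += c; // inside double quotes this backslash stays literal
      } else {
        out += next; // the backslash escapes the next character
        ++i;
      }
    } else if (c == '"') {
      quote = (quote == '"') ? 0 : '"';
    } else if (c == '\'' && quote == 0) {
      quote = '\'';
    } else {
      out += c;
    }
  }
  return out;
}

void checkShellUnquote() {
  // Unescaped quotes are consumed by the shell, so the define loses them.
  assert(shellUnquote("-DFOO=\"bar\"") == "-DFOO=bar");
  // Escaped quotes survive and reach the preprocessor as literal quotes.
  assert(shellUnquote("-DFOO=\\\"bar\\\"") == "-DFOO=\"bar\"");
}
```

The first assert shows why unescaped quotes do not do what one would want:
the define ends up without quotes unless they are escaped once more.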

Differential Revision: https://reviews.llvm.org/D146848

[llvm-rc] Remove transitional preprocessing fallback logic

When preprocessing was integrated into llvm-rc in 2021, this was a
new requirement (previously one could execute llvm-rc without a
suitable preprocessing tool being available).

As a transitional helper, llvm-rc fell back on skipping preprocessing
if no suitable tool was found (with a warning printed), but users
could pass an llvm-rc specific option to silence the warning, if they
explicitly want to run the tool without preprocessing.

Now, two years later, remove the transitional helper and error out if
preprocessing fails. The option for disabling preprocessing remains.

Differential Revision: https://reviews.llvm.org/D146797

[llvm-rc] Fix the reference to the option for disabling preprocessing in a message

This was the original option name from the first iteration of the patch
that added the feature, but during review, a different name was suggested
and preferred - but the reference in the helpful message was missed.

Differential Revision: https://reviews.llvm.org/D146796

[llvm-rc] Look for "clang-<major>" when locating a suitable preprocessor

In some cases, there's no adjacent executable named "clang" or
"clang-cl", but one named "clang-<major>". This logic doesn't
cover every possible deployment setup of course, but should
cover more fairly common/reasonable cases.

See
https://github.com/curl/curl-for-win/commit/caaae171ac43af5b883403714dafd42030d8de61#commitcomment-105808524
for discussion about a case where this would have been helpful.
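
A rough sketch of such a lookup, assuming a caller-supplied `exists`
predicate in place of a real filesystem check. The function names and the
order in which candidates are tried are assumptions for illustration, not the
actual llvm-rc code.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical: names to try next to the running tool. The order shown here
// is an assumption for illustration only.
std::vector<std::string> preprocessorCandidates(unsigned major) {
  assert(major > 0 && "expected a real major version number");
  return {"clang", "clang-cl", "clang-" + std::to_string(major)};
}

// Returns the first candidate for which `exists` is true, or an empty string
// if none of the candidates is available.
std::string
findPreprocessor(unsigned major,
                 const std::function<bool(const std::string &)> &exists) {
  for (const std::string &name : preprocessorCandidates(major))
    if (exists(name))
      return name;
  return "";
}
```

With a setup that only ships a versioned binary, e.g. an `exists` predicate
that accepts nothing but "clang-17", `findPreprocessor(17, exists)` yields
"clang-17" instead of failing.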

Differential Revision: https://reviews.llvm.org/D146794

[llvm-rc] Respect the executable specified in the --preprocessor command

The arguments passed in this option were passed on to the child
process, but we still blindly passed the clang binary that we had
found to sys::ExecuteAndWait as the executable to run.

If the user hasn't specified any custom --preprocessor command,
Args[0] is equal to the variable Clang.

This doesn't affect any tests, since the tests only print the
arguments that would be executed (but not the first parameter to
sys::ExecuteAndWait), and there are no tests for actually executing it
(and validating that it did execute the right thing).

Differential Revision: https://reviews.llvm.org/D146793

[mlir][Linalg][Transform] Drop spurious assertion in packGreedilyOp

`transform.pack_greedily` supports skipping dimensions in which case we
may well end up with e.g. a matvec innermost.

We should not spuriously crash in such cases.

[mlir] Apply ClangTidy readability fix (NFC)

[SimpleLoopUnswitch] Fix SCEV invalidation for unswitchTrivialSwitch

When doing a trivial unswitch of a switch statement, the code needs
to "invalidate SCEVs for the outermost loop reached by any of the
exits", as indicated by code comments.

Depending on whether we find such an outermost loop or not, we can limit
the invalidation to some sub-loops or must invalidate the full loop nest.
As shown in the added test case, there seem to have been some bugs in the
code that was finding the "outermost loop", so we could end up invalidating
too few loops.

Seems like commit 1bf8ae17f5e2714c8c87978 introduced the bug by
moving the code that invalidates the loops above some of the code
that computed 'OuterL'. This patch fixes that by also moving that
computation of 'OuterL' so that we compute 'OuterL' properly before
we use it for the SCEV invalidation.
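
The intended ordering can be sketched on a toy loop model. Everything below
(the `Loop` struct, `contains`, `outermostLoopReachedByExits`,
`loopToInvalidate`) is a simplified stand-in written for this note, not the
SimpleLoopUnswitch code; a null entry in `exitLoops` is assumed to mean an
exit block that lies outside all loops.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a loop: only the parent link is modelled.
struct Loop {
  Loop *parent = nullptr;
};

// True if `outer` is `inner` itself or one of its ancestors.
bool contains(const Loop *outer, const Loop *inner) {
  for (const Loop *l = inner; l; l = l->parent)
    if (l == outer)
      return true;
  return false;
}

// Walks all exits and returns the outermost loop any of them lands in.
// Returns nullptr if some exit leaves every loop.
Loop *outermostLoopReachedByExits(Loop &l,
                                  const std::vector<Loop *> &exitLoops) {
  Loop *outer = &l;
  for (Loop *exitL : exitLoops) {
    if (!exitL)
      return nullptr;
    if (contains(exitL, outer))
      outer = exitL;
  }
  return outer;
}

// Compute the outermost loop first, then use it to pick what to invalidate:
// either that loop or, if an exit leaves all loops, the top of the nest.
Loop *loopToInvalidate(Loop &l, const std::vector<Loop *> &exitLoops) {
  Loop *outer = outermostLoopReachedByExits(l, exitLoops);
  if (outer)
    return outer;
  Loop *top = &l;
  while (top->parent)
    top = top->parent;
  return top;
}

void exampleNest() {
  Loop top, mid, inner;
  mid.parent = &top;
  inner.parent = &mid;
  const std::vector<Loop *> exits = {&mid, &top};
  assert(loopToInvalidate(inner, exits) == &top);
}
```

Using the result before all exits have been visited would be the analogue of
the bug described above: the invalidation would cover too few loops.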

Differential Revision: https://reviews.llvm.org/D146963

[AMDGPU] Break-up large PHIs for DAGISel

DAGISel uses CopyToReg/CopyFromReg to lower PHI nodes. With large PHIs, this can result in poor codegen.
This is because it introduces a need to have a build_vector before copying the PHI value, and that build_vector may have many undef elements. This can cause very high register pressure and abnormal stack usage in some cases.

This scalarization/phi "break-up" can be easily tuned/disabled through CL options in case it's not beneficial for some users.
It's also only enabled for DAGISel, as GlobalISel handles PHIs much better (since it works on the whole function).

This can both scalarize (break a vector into its elements) and simplify (break a vector into smaller, more manageable subvectors) PHIs.
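
To make the two modes concrete, here is a small sketch of how a vector PHI
could be cut into slices. The policy (one element per slice once elements are
at least `targetBits` wide, otherwise as many elements as fit into
`targetBits`), the default of 32 bits, and the name `sliceElementCounts` are
all assumptions for illustration; the patch's actual heuristics may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical policy: returns the number of elements in each slice that a
// vector of `numElts` elements of `eltBits` bits would be broken into.
std::vector<unsigned> sliceElementCounts(unsigned numElts, unsigned eltBits,
                                         unsigned targetBits = 32) {
  assert(numElts > 0 && eltBits > 0 && targetBits > 0);
  // Wide elements are scalarized; narrow ones are grouped into subvectors.
  const unsigned perSlice = eltBits >= targetBits ? 1 : targetBits / eltBits;
  std::vector<unsigned> slices;
  for (unsigned i = 0; i < numElts; i += perSlice)
    slices.push_back(std::min(perSlice, numElts - i));
  return slices;
}
```

Under these assumptions a 32 x float PHI is scalarized into 32 single-element
PHIs, while an 8 x i16 PHI is simplified into four 2-element subvector PHIs.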

Fixes SWDEV-321581

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D143731

[runtimes][CMake] Drop the check to see if linker works

This isn't needed anymore.

Differential Revision: https://reviews.llvm.org/D144440

[AMDGPU] Fold more AGPR copies/PHIs in SIFoldOperands

Generalize `tryFoldLCSSAPhi` into `tryFoldPhiAGPR` which works
on any kind of PHI node (not just LCSSA ones) and attempts to
create AGPR Phis more aggressively.

Also adds a GFX908-only "cleanup" function `tryOptimizeAGPRPhis`
which tries to minimize AGPR to AGPR copies on GFX908, which doesn't
have an ACCVGPR MOV instruction (so AGPR-AGPR copies become 2 or 3 instructions
as they need a VGPR temp). This is needed because D143731
+ the new `tryFoldPhiAGPR` may create a lot more PHIs (one 32xfloat PHI becomes
32 float phis), and if each PHI hits the same AGPR (like in `test_mfma_loop_agpr_init`)
they will be lowered to 32 copies from the same AGPR, which will each become 2-3 instructions.
Creating a VGPR cache in this case prevents all those copies from being generated
(we have AGPR-VGPR copies instead which are trivial).

This is a preparation patch intended to prevent regressions in D143731 when
AGPRs are involved.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144099

[Test] Add two tests showing unprofitable case of Guard Widening

Guard Widening ignores block frequency. As a result, it may
end up widening conditions from cold/effectively dead code into some
much hotter place, harming average performance.

Revert "[SLP] Check with target before vectorizing GEP Indices."

This reverts commit 1387a13e1d0bac94457626ef3e7427c84caf6e65.

This introduced performance regressions on AArch64, when the cost of a
vector GEP + extracts is offset by the benefits of vectorizing the rest
of the tree.

The test in llvm/test/Transforms/SLPVectorizer/AArch64/vector-getelementptr.ll
illustrates the issue. It was extracted from code that regressed a SPEC
benchmark by 15%.

[clang-tidy] Ignore unevaluated exprs in rvalue-reference-param-not-moved

Ignore unevaluated expressions in rvalue-reference-param-not-moved
check since they are not actual uses of a move().
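
A minimal example of the pattern in question, written for this note and not
taken from the patch's tests: the only `std::move` of the parameter appears
inside `decltype`, which is an unevaluated operand, so nothing is moved at
run time. Presumably such a parameter is no longer treated as moved by the
check. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical function: `s` is only "moved" inside decltype, which is never
// evaluated, so the argument is left untouched.
std::size_t lengthOf(std::string &&s) {
  using Moved = decltype(std::move(s)); // unevaluated: no move happens
  static_assert(std::is_same<Moved, std::string &&>::value,
                "std::move yields an rvalue reference");
  return s.size();
}

void exampleNotMoved() {
  std::string s = "abc";
  const std::size_t n = lengthOf(std::move(s));
  assert(n == 3);
  assert(s == "abc"); // still intact: lengthOf never moved from it
}
```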

Reviewed By: PiotrZSL

Differential Revision: https://reviews.llvm.org/D146929

[NVPTX] Enforce half type support is present for builtins

Differential Revision: https://reviews.llvm.org/D146715