review.tizen.org Git - platform/upstream/llvm.git/log

[lldb] Fix initialization of LazyBool/bool variables m_overwrite/m_overwrite_lazy. NFCI.

This silences a GCC warning after
1f7b58f2a50461493f083b2ed807b25e036286f6 / D122680:

lldb/source/Commands/CommandObjectCommands.cpp:1650:22: warning: enum constant in boolean context [-Wint-in-bool-context]
1650 | bool m_overwrite = eLazyBoolCalculate;
| ^~~~~~~~~~~~~~~~~~

Differential Revision: https://reviews.llvm.org/D123204
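
A minimal sketch of the problem, assuming `LazyBool` mirrors lldb's enumeration with `eLazyBoolCalculate = -1`. The `CommandOptions` struct and `ResolveOverwrite` helper are hypothetical and are not the actual CommandObjectCommands.cpp code. Initializing a `bool` from the enum silently turns the nonzero enumerator into `true`. Keeping the tri-state in a `LazyBool` member and initializing the `bool` with a real boolean avoids that.

```cpp
// Assumed to mirror lldb's LazyBool enumeration (eLazyBoolCalculate = -1).
enum LazyBool { eLazyBoolCalculate = -1, eLazyBoolNo = 0, eLazyBoolYes = 1 };

// Hypothetical options struct showing the corrected initializers.
struct CommandOptions {
  // Tri-state value: "not specified" is distinct from yes/no.
  LazyBool m_overwrite_lazy = eLazyBoolCalculate;
  // Plain flag initialized with a real boolean rather than an enum constant.
  // `bool m_overwrite = eLazyBoolCalculate;` would store true, because -1
  // converts to true; that is what -Wint-in-bool-context flags.
  bool m_overwrite = false;
};

// Hypothetical helper: resolve the tri-state, falling back to a default.
bool ResolveOverwrite(const CommandOptions &opts, bool default_value) {
  switch (opts.m_overwrite_lazy) {
  case eLazyBoolYes:
    return true;
  case eLazyBoolNo:
    return false;
  case eLazyBoolCalculate:
    break;
  }
  return default_value;
}
```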

Fix the encoding and decoding of UniqueCStringMap<T> objects when saved to cache files.

UniqueCStringMap<T> objects are std::vector<UniqueCStringMap::Entry> objects where each Entry contains a ConstString + T. The values in the vector are sorted first by ConstString and then by the T value. ConstString objects are simply uniqued "const char *" values, and when we compare them we use the actual string pointer as the sort key. This caused a problem when we saved the sorted symbol table name indexes and debug info indexes to disk in one process and then loaded them into another process by decoding them from the cache files. Why? Because the order in which the ConstString objects were created is now completely different, so the string pointers are no longer sorted in the process the cache was loaded into.

The unit tests created for the initial patch didn't catch the encoding and decoding issues of UniqueCStringMap<T> because encoding and decoding happened in the same process. There the constant string pool was exactly the same, so decoding still produced sorted UniqueCStringMap<T> objects.

This patch sorts the entries after decoding and also reserves the right number of entries in UniqueCStringMap::m_map before adding them, to avoid multiple allocations.

Added a unit test that loads an object file from YAML. I created a cache file for the original file and removed the cache file's signature mod time check, since we generate an object file from the YAML and use that as the object file for the Symtab object. The test then loads the cache data from the array of symtab cache bytes, so that the ConstString "const char *" values will not match the current process, and verifies that we can look up the 4 names from the object file in the symbol table.

Differential Revision: https://reviews.llvm.org/D124572
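
A simplified model of why decoding must re-sort. This is not lldb's UniqueCStringMap. `StringPool` is a hypothetical stand-in for the ConstString pool, and `PointerSortedMap` is a hypothetical container that sorts entries by string pointer and then by value, as the commit describes. Pointers handed out while interning in a new process need not follow the order the cache was written in. For that reason `Decode` reserves once, appends, and then sorts before any binary search.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for lldb's ConstString pool: each distinct string is
// stored once, so equal strings share a single `const char *`.
class StringPool {
public:
  const char *Intern(const std::string &s) {
    // unordered_set nodes never move on rehash, so c_str() stays valid.
    return pool_.insert(s).first->c_str();
  }

private:
  std::unordered_set<std::string> pool_;
};

// Hypothetical, simplified model of UniqueCStringMap<T>: entries sorted by
// string *pointer* first, then by value.
template <typename T> class PointerSortedMap {
public:
  struct Entry {
    const char *cstring;
    T value;
  };

  // Rebuild from cached (name, value) pairs. Interning in this process can
  // hand out pointers in any order, so the entries must be re-sorted.
  void Decode(const std::vector<std::pair<std::string, T>> &saved,
              StringPool &pool) {
    m_map.clear();
    m_map.reserve(saved.size()); // one allocation up front
    for (const auto &[name, value] : saved)
      m_map.push_back({pool.Intern(name), value});
    std::sort(m_map.begin(), m_map.end(), Less);
  }

  // Binary search by pointer; only correct because Decode() sorted m_map.
  const Entry *Find(const char *key) const {
    auto it = std::lower_bound(
        m_map.begin(), m_map.end(), key,
        [](const Entry &e, const char *k) { return PtrLess(e.cstring, k); });
    return (it != m_map.end() && it->cstring == key) ? &*it : nullptr;
  }

  bool IsSorted() const {
    return std::is_sorted(m_map.begin(), m_map.end(), Less);
  }

private:
  // std::less yields a total order even for unrelated pointers.
  static bool PtrLess(const char *a, const char *b) {
    return std::less<const char *>()(a, b);
  }
  static bool Less(const Entry &a, const Entry &b) {
    if (a.cstring != b.cstring)
      return PtrLess(a.cstring, b.cstring);
    return a.value < b.value;
  }

  std::vector<Entry> m_map;
};
```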

[AMDGPU][clang] Definition of gfx11 subtarget

Contributors:
Jay Foad <jay.foad@amd.com>
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

Patch 2/N for upstreaming of AMDGPU gfx11 architecture

Depends on D124536

Reviewed By: foad, kzhuravl, #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D124537

[AMDGPU] Produce waitcounts for LDS DMA

MUBUF and FLAT LDS DMA operations need a wait on vmcnt before the LDS
they write can be accessed. A load from LDS to VMEM does not need a wait.

Differential Revision: https://reviews.llvm.org/D124626

[flang] Fix build bot problem

A recent change is eliciting a valid warning from the out-of-tree
flang build bot; fix by using a reference in a range-based for().

Differential Revision: https://reviews.llvm.org/D124682
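
The commit does not name the warning or show the loop, so the following is only a generic illustration of the fix it describes. `Tracked` is a hypothetical copy-counting type. Iterating by value copies every element, while iterating by const reference binds to the elements in place.

```cpp
#include <vector>

// Hypothetical element type that counts how often it is copied.
struct Tracked {
  static inline int copies = 0;
  int id = 0;
  explicit Tracked(int i) : id(i) {}
  Tracked(const Tracked &other) : id(other.id) { ++copies; }
};

// Iterating by value: one copy per element.
int SumByValue(const std::vector<Tracked> &v) {
  int sum = 0;
  for (Tracked t : v)
    sum += t.id;
  return sum;
}

// Iterating by const reference: no copies.
int SumByReference(const std::vector<Tracked> &v) {
  int sum = 0;
  for (const Tracked &t : v)
    sum += t.id;
  return sum;
}
```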

Add a paragraph showing how to use container commands.

Differential Revision: https://reviews.llvm.org/D124028

[mlir] Prevent argStorage relocations

This fixes msan reports like https://reviews.llvm.org/P8285

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D124576

Add a mutex to the ThreadPlanStackMap class.
We've seen very occasional crashes that we can only explain by
simultaneous access to the ThreadPlanStackMap, so I'm adding a
mutex to protect it.

Differential Revision: https://reviews.llvm.org/D124029
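
A minimal sketch of the locking pattern, assuming a map from thread ID to plan stack. `PlanStackMap` and `PlanStack` are hypothetical and do not reproduce lldb's ThreadPlanStackMap. Every public method takes the same mutex, so a reader can never observe the `std::map` in the middle of an insert or erase from another thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a per-thread plan stack.
using PlanStack = std::vector<int>;

// Hypothetical thread-ID -> plan-stack map guarded by a single mutex.
class PlanStackMap {
public:
  void AddThread(std::uint64_t tid) {
    std::lock_guard<std::mutex> guard(m_mutex);
    m_plans.emplace(tid, PlanStack{});
  }

  void RemoveThread(std::uint64_t tid) {
    std::lock_guard<std::mutex> guard(m_mutex);
    m_plans.erase(tid);
  }

  void PushPlan(std::uint64_t tid, int plan) {
    std::lock_guard<std::mutex> guard(m_mutex);
    m_plans[tid].push_back(plan);
  }

  // Returns a copy so callers never hold a reference into the map
  // after the lock is released.
  PlanStack GetPlans(std::uint64_t tid) const {
    std::lock_guard<std::mutex> guard(m_mutex);
    auto it = m_plans.find(tid);
    return it == m_plans.end() ? PlanStack{} : it->second;
  }

  std::size_t Size() const {
    std::lock_guard<std::mutex> guard(m_mutex);
    return m_plans.size();
  }

private:
  mutable std::mutex m_mutex; // mutable so const readers can lock it
  std::map<std::uint64_t, PlanStack> m_plans;
};
```

Returning copies trades some overhead for simpler reasoning. A design that hands out pointers would instead have to keep the lock held while the caller uses them.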

[randstruct] Automatically randomize a structure of function pointers

Structures of function pointers are a good surface area for attacks. We
should therefore randomize them unless explicitly told not to.

Reviewed By: aaron.ballman, MaskRay

Differential Revision: https://reviews.llvm.org/D123544

Fix sphinx build error in AMDGPUUsage.rst

Corrects error from
813e521e55b11165138b071f446eda94b14570dc

Reapply [CodeGen][ARM] Enable Swing Modulo Scheduling for ARM

Fixed "private field is not used" warning when compiled
with clang.

original commit: 28d09bbbc3d09c912b54a4d5edb32cab7de32a6f
reverted in: fa49021c68ef7a7adcdf7b8a44b9006506523191

------

This patch permits Swing Modulo Scheduling for ARM targets and
turns it on by default for the Cortex-M7. The t2Bcc
instruction is recognized as a loop-ending branch.

MachinePipeliner is extended by adding support for
"unpipelineable" instructions. These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.

The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.

It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In addition, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D122672

Silence -Wstrict-prototypes diagnostics in C2x mode

This also disables the diagnostic when the user passes -fno-knr-functions.

[lldb] Define LLDB_VERSION_PATCH correctly

In commit ccf1469a4cdb lldb got its own generated Version.inc file, with
`LLDB_VERSION` macros. However, it used `LLDB_VERSION_PATCHLEVEL`
instead of the actually correct `LLDB_VERSION_PATCH`. Correct this.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D124672

[Clang][Docs] Add new offloading flags to the clang documentation

Summary:
Some previous patches introduced the `--offload-new-driver` flag, which
is a generic way to enable the new driver, and the `--offload-host-only`
and `--offload-device-only` flags which allow users to compile for one
side, making it easier to inspect intermediate code for offloading
compilations. This patch just documents them in the command line
reference.

[RISCV] Factor repeating code into getMaskTypeFor(VT) [nfc]

[AMDGPU] Add gfx11 subtarget ELF definition

This is the first patch of a series to upstream support for the new
subtarget.

Contributors:
Jay Foad <jay.foad@amd.com>
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

Patch 1/N for upstreaming AMDGPU gfx11 architectures.

Reviewed By: foad, kzhuravl, #amdgpu

Differential Revision: https://reviews.llvm.org/D124536

[SVE] Move reg+reg gather/scatter addressing optimisations from lowering into DAG combine.

This is essentially a refactoring patch but allows more cases to
be caught, hence the output changes to some tests.

Differential Revision: https://reviews.llvm.org/D122994

[RISCV] Extract getAllOnesMask helper [nfc]

[SLP][NFC]Fix a comment.

[RISCV] Improve constant materialization for cases that can use LUI+ADDI instead of LUI+ADDIW.

It's possible that we have a constant that isn't simm32 so we can't
use LUI+ADDIW, but we can use LUI+ADDI. Because ADDI uses a sign
extended constant, it's possible that after subtracting it out, we
end up with a simm32 that maps to LUI.

This patch detects this case after removing Lo12 and before shifting
the value for SLLI.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D124222
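
A worked sketch of the case described above. The helper names and the example constant are illustrative assumptions, not RISCVMatInt code. Take Val = -2147485696 (0xFFFFFFFF7FFFF800), which lies just below INT32_MIN, so it is not a simm32. Its low 12 bits sign-extend to -2048. Removing them leaves -2^31, which is exactly what `LUI 0x80000` produces on RV64, and `ADDI -2048` then yields Val. ADDIW would wrap the 32-bit sum instead.

```cpp
#include <cstdint>

// Sign-extend the low 12 bits of Val, as ADDI sign-extends its immediate.
int64_t SignExtendLo12(int64_t Val) {
  int64_t Lo12 = static_cast<int64_t>(static_cast<uint64_t>(Val) & 0xFFF);
  return Lo12 >= 2048 ? Lo12 - 4096 : Lo12;
}

bool IsSImm32(int64_t V) { return V >= INT32_MIN && V <= INT32_MAX; }

// Hypothetical check: Val is not a simm32 (so LUI+ADDIW cannot build it),
// but Val minus its sign-extended Lo12 is a simm32 with zero low 12 bits,
// i.e. a value a single LUI produces. Then LUI+ADDI materializes Val.
bool CanUseLuiAddi(int64_t Val, uint32_t &LuiImm, int64_t &AddiImm) {
  if (IsSImm32(Val))
    return false; // the existing LUI+ADDIW path handles this
  int64_t Lo12 = SignExtendLo12(Val);
  int64_t Hi = Val - Lo12;
  if (!IsSImm32(Hi) || (static_cast<uint64_t>(Hi) & 0xFFF) != 0)
    return false;
  LuiImm = static_cast<uint32_t>((static_cast<uint64_t>(Hi) >> 12) & 0xFFFFF);
  AddiImm = Lo12;
  return true;
}

// Model of RV64 LUI: imm20 goes into bits 31:12, sign-extended from bit 31.
int64_t SimulateLui(uint32_t Imm20) {
  return static_cast<int64_t>(static_cast<int32_t>(Imm20 << 12));
}
```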

[OpenMP] Allow CUDA to be linked with OpenMP using the new driver

After basic support for embedding and handling CUDA files was added to
the new driver, we should be able to call CUDA functions from OpenMP
code. This patch makes the necessary changes to successfully link in CUDA
programs that were compiled using the new driver. With this patch it
should be possible to compile device-only CUDA code (no kernels) and
call it from OpenMP as follows:

```
$ clang++ cuda.cu -fopenmp-new-driver -offload-arch=sm_70 -c
$ clang++ openmp.cpp cuda.o -fopenmp-new-driver -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_70
```

Currently this requires using a host variant to suppress the generation
of a CPU-side fallback call.

Depends on D120272

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120273

[InstCombine] Require LoopInfo in test (NFC)

This test case doesn't show what it was intended to without
require<loops>.

[OpenMP] Add options to only compile the host or device when offloading

OpenMP recently moved to the new offloading driver; this had the effect
of making it more difficult to inspect intermediate code for the device.
This patch adds `-foffload-host-only` and `-foffload-device-only` to
control which sides get compiled. This will allow users to more easily
inspect output without needing the temp files.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D124220

[InstCombine] Add additional tests for gep of minus ptrtoint (NFC)

[X86] lowerShuffleAsRepeatedMaskAndLanePermute - move the sublane split code into a lambda helper. NFC.

This is an NFC cleanup as part of the work on #55066 - the idea being that we will be able to check for multiple sub lane scales.

[COST]Fix crash for non-power-2 vector shuffle mask.

Need to normalize the mask to avoid possible crashes during attempts
to estimate cost of the very long shuffles with non-power-2 number of
elements in masks.

[SimplifyCFG] Avoid shifting by a too large exponent.

TI->getBitWidth can be > 64 and in those cases the shift will be UB due
to the exponent being too large.

To fix this, cap the shift at 63. I think this should work out fine,
because TableSize is itself a 64 bit type and the maximum table size
must fit in the type. Also, if we would underestimate the size here, at
most we get an extra ZExt.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D124608
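
A minimal sketch of the capped shift. `MaxDistinctValues` is a hypothetical helper, not the SimplifyCFG code. A BitWidth-bit value has 2^BitWidth possible values, but shifting a 64-bit integer by 64 or more is undefined behaviour. Capping the exponent at 63 keeps the shift defined, at the cost of understating the count for very wide types.

```cpp
#include <algorithm>
#include <cstdint>

// Number of distinct values of a BitWidth-bit integer, capped at 2^63 so
// the shift amount never reaches 64 (which would be UB for uint64_t).
uint64_t MaxDistinctValues(unsigned BitWidth) {
  unsigned Exponent = std::min(BitWidth, 63u);
  return uint64_t{1} << Exponent;
}
```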

Additionally set f32 mode with denormal-fp-math

When the denormal-fp-math option is used, this should set the
denormal handling mode for all floating point types. However,
currently 32-bit float types can ignore this setting as there is a
variant of the option, denormal-fp-math-f32, specifically for that type
which takes priority when checking the mode based on type and remains
at the default of IEEE. From the description, denormal-fp-math would
be expected to set the mode for floats unless overridden by the f32
variant, and code in the front end only emits the f32 option if it is
different to the general one, so setting just denormal-fp-math should
be valid.

This patch changes the denormal-fp-math option to also set the f32
mode. If denormal-fp-math-f32 is also specified, this is then
overridden as expected, but if it is absent floats will be set to the
mode specified by the former option, rather than remain on the default.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122589
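
A sketch of the resulting precedence only, not the actual attribute-parsing code. `DenormalOptions` and both functions are hypothetical. Each mode is simplified to a single value, although the real attributes carry an output and an input mode. The f32-specific option wins when present; otherwise f32 now follows the general option instead of staying at IEEE.

```cpp
#include <optional>

// Simplified modes; the real attributes take "output,input" pairs.
enum class DenormalMode { IEEE, PreserveSign, PositiveZero };

// Hypothetical model of the two function-level options.
struct DenormalOptions {
  std::optional<DenormalMode> General; // denormal-fp-math
  std::optional<DenormalMode> F32;     // denormal-fp-math-f32
};

// Mode for non-f32 floating-point types: the general option or IEEE.
DenormalMode ModeForOtherTypes(const DenormalOptions &O) {
  return O.General.value_or(DenormalMode::IEEE);
}

// Mode for f32: the f32 option overrides; otherwise use the general one.
DenormalMode ModeForF32(const DenormalOptions &O) {
  if (O.F32)
    return *O.F32;
  return ModeForOtherTypes(O);
}
```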

[CompileTime] [Passes] Avoid computing unnecessary analyses. NFC

Similar to c515b2f39e77, if there are no loops in the function as seen
through LI, we should avoid computing the remaining expensive analyses
(such as SCEV, BPI). Reorder the analysis requests and return early
if there are no loops.

The logic of avoiding expensive analyses is applied to LoopVectorizer,
LoopLoadElimination and LoopUnrollPass, i.e. all function passes which operate
on loops.

This is an NFC change that improves compile time.

Differential Revision: https://reviews.llvm.org/D124529

[PowerPC][NFC] Add a function to determine if a call needs to be NOTOC.

Add the isNoTOCCallInstr function to PPCInstrInfo to determine if a call opcode
does not need a TOC restore after the call. All call opcodes should be listed in
this function. A default unreachable in this function should force future call
opcodes to also be added.

This is a follow up patch to D122012

Reviewed By: jsji, shchenz

Differential Revision: https://reviews.llvm.org/D124415
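
A sketch of the "list every call opcode, abort on anything else" pattern. The `Opcode` enum and `IsNoTOCCall` are hypothetical; the real function switches over PPC's generated opcode numbers and uses `llvm_unreachable`. Because the default case aborts, a call opcode added later without being classified fails loudly instead of silently receiving a default answer.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical opcodes; NewCall stands for a call opcode added later
// that nobody has classified yet.
enum class Opcode { CallWithTOCRestore, CallNoTOC, CallIndirectNoTOC, NewCall };

bool IsNoTOCCall(Opcode Op) {
  switch (Op) {
  case Opcode::CallNoTOC:
  case Opcode::CallIndirectNoTOC:
    return true;
  case Opcode::CallWithTOCRestore:
    return false;
  default:
    // Stand-in for llvm_unreachable("Unknown call opcode").
    std::fprintf(stderr, "unknown call opcode\n");
    std::abort();
  }
}
```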

[clang] Eliminate TypeProcessingState::trivial.

This flag is redundant -- it's true iff `savedAttrs` is empty.

Querying `savedAttrs.empty()` should not take any more time than querying the
`trivial` flag, so this should not have a performance impact either.

I noticed this while working on https://reviews.llvm.org/D111548.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D123783

[DAGCombiner] Stop invalid sign conversion in refineIndexType.

When looking through extends of gather/scatter indices it's safe
to convert a known positive signed index to unsigned, but unsigned
indices must remain unsigned.

Depends On D123318

Differential Revision: https://reviews.llvm.org/D123326

[SVE][ISel] Ensure explicit gather/scatter offset extension isn't lost.

getGatherScatterIndexIsExtended currently looks through all
SIGN_EXTEND_INREG operations regardless of their input type. This
patch restricts the code to only look through i32->i64 extensions,
which are the ones supported implicitly by SVE addressing modes.

Differential Revision: https://reviews.llvm.org/D123318

[CUDA] Add driver support for compiling CUDA with the new driver

This patch adds the basic support for the clang driver to compile and link CUDA
using the new offloading driver. This requires handling the CUDA offloading kind
and embedding the generated files into the host. This will allow us to link
OpenMP code with CUDA code in the linker wrapper. More support will be required
to create functional CUDA / HIP binaries using this method.

Depends on D120270 D120271 D120934

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D120272

[Clang] Make enabling the new driver more generic

In preparation for allowing other offloading kinds to use the new driver
a new opt-in flag `-foffload-new-driver` is added. This is distinct from
the existing `-fopenmp-new-driver` because OpenMP will soon use the new
driver by default while the others should not.

Reviewed By: yaxunl, tra

Differential Revision: https://reviews.llvm.org/D123325

[OpenMP] Make clang argument handling for the new driver more generic

In preparation for accepting other offloading kinds with the new driver,
this patch makes the way we handle offloading actions more generic. A
new field to get the associated device action's toolchain is used rather
than manually iterating a list. This makes building the arguments easier
and makes sure that we don't rely on any implicit ordering.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D123313

[OpenMP] Make generating offloading entries more generic

This patch moves the logic for generating the offloading entries to the
OpenMPIRBuilder. This makes it easier to re-use in other places, such as
for OpenMP support in Flang or using the same method for generating
offloading entries for other languages like CUDA.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D123460

[InstCombine] Add test for unused atomic load from non-constant global (NFC)

[SelectionDAGBuilder] Don't create MGATHER/MSCATTER with Scale != ElemSize

This is an alternative to D124530. In getUniformBase() only create
scales that match the gather/scatter element size. If targets also
support other scales, then they can produce those scales in target
DAG combines. This is what X86 already does (as long as the
resulting scale would be 1, 2, 4 or 8).

This essentially restores the pre-opaque-pointer state of things.

Fixes https://github.com/llvm/llvm-project/issues/55021.

Differential Revision: https://reviews.llvm.org/D124605

[flang] Handle common block with different sizes in same file

Semantics does not prevent a named common block from appearing with
different sizes in the same file (a named common block should always have
the same storage size, see Fortran 2018 8.10.2.5, but accepting different
sizes is a common extension).

Lowering was not coping with this well, since it just used the first
common block appearance, starting with BLOCK DATAs, to define common
blocks (this was also an issue with the blank common block, which can
legally appear with different sizes in different scoping units).

Semantics also does not prevent a named common block from being initialized
outside of a BLOCK DATA, and lowering was dealing badly with this,
since it only gave an initial value to common block globals if the
first common block appearance, starting with BLOCK DATAs, had an initial
value.

Semantics also allows blank common to be initialized, while
lowering assumed this would never happen and never created
an initial value for it.

Lastly, semantics did not complain if a COMMON block was initialized
in several scoping units in the same file, while lowering can only generate
one of these initial values.

To fix this, add a structure to keep track of COMMON block properties
(biggest size, and initial value if any) at the Program level. Once the
size of a common block appearance is known, the common block appearance
is checked against this information. This allows semantics to emit an error
when the same common block is initialized in several scopes, and to warn
when a named common block appears with different sizes. Lastly, this allows
lowering to use the Program-level info about common blocks to emit the right
GlobalOp for a Common Block, regardless of the order of the COMMON Block
appearances: it emits a GlobalOp with the biggest size, whose lowest bytes
are initialized with the initial value if any is given in a scope where the
common block appears.

Lowering is updated to emit the common blocks before anything else so
that the related GlobalOps are available when lowering the scopes where
common blocks appear. It is also updated to not assume that blank common
blocks are never initialized.

Differential Revision: https://reviews.llvm.org/D124622
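
A hypothetical sketch of the Program-level bookkeeping described above. It is not flang's data structure, and the diagnostic texts are invented. Each appearance updates the biggest size and the initial value. A named block seen with a different size gets a warning, and a second initializing scope gets an error. The emitted global holds the biggest size, with the initial value in its lowest bytes. Zero-filling the remaining bytes is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical per-block record: biggest size and initial value, if any.
struct CommonBlockInfo {
  std::uint64_t biggestSize = 0;
  std::optional<Bytes> init;
  std::string initScope; // scope that provided the initial value
};

struct Diagnostics {
  std::vector<std::string> errors;
  std::vector<std::string> warnings;
};

// Hypothetical registry; the name "" stands for blank COMMON.
class CommonBlockRegistry {
public:
  void AddAppearance(const std::string &name, const std::string &scope,
                     std::uint64_t size, const std::optional<Bytes> &init,
                     Diagnostics &diags) {
    auto [it, inserted] = m_blocks.try_emplace(name);
    CommonBlockInfo &info = it->second;
    // Differing sizes: accepted with a warning for named blocks;
    // blank COMMON may legally differ in size.
    if (!inserted && !name.empty() && info.biggestSize != size)
      diags.warnings.push_back("COMMON /" + name + "/ has different sizes");
    info.biggestSize = std::max(info.biggestSize, size);
    if (!init)
      return;
    if (info.init && info.initScope != scope) {
      diags.errors.push_back("COMMON /" + name +
                             "/ initialized in several scopes");
      return;
    }
    info.init = init;
    info.initScope = scope;
  }

  // Bytes of the single global emitted for a block.
  Bytes GlobalBytes(const std::string &name) const {
    const CommonBlockInfo &info = m_blocks.at(name);
    Bytes bytes(info.biggestSize, 0);
    if (info.init)
      std::copy(info.init->begin(), info.init->end(), bytes.begin());
    return bytes;
  }

private:
  std::map<std::string, CommonBlockInfo> m_blocks;
};
```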

[InstCombine] Remove memset of undef value

This removes memset with undef char. We already do this for stores
of undef value.

This comes with the caveat that this optimization is not, strictly
speaking, legal for undef values, because we might be overwriting
a poison value. However, our entire load/store model currently still
operates on undef values, so we need to support undef here as well
for internal consistency.

Once https://github.com/llvm/llvm-project/issues/52930 is resolved,
these and related folds can be limited to poison -- I've added
FIXMEs to that effect.

Differential Revision: https://reviews.llvm.org/D124173

[LV] Rename CountRoundDown to VectorTripCount (NFC)

The name CountRoundDown is potentially misleading, as the number of
iterations can be rounded up when folding the tail.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D119681
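
Illustrative arithmetic behind the rename. `VectorTripCount` here is a hypothetical free function, not LoopVectorize code. With step VF * UF, the vector loop normally covers the trip count rounded down to a multiple of the step, leaving the rest to a scalar epilogue. When the tail is folded, the count is rounded up and the extra lanes are masked, which is why "CountRoundDown" could mislead.

```cpp
#include <cstdint>

// Scalar iterations covered by the vector loop for factor VF and
// interleave count UF.
uint64_t VectorTripCount(uint64_t TripCount, uint64_t VF, uint64_t UF,
                         bool FoldTail) {
  uint64_t Step = VF * UF;
  if (FoldTail)
    // Masked vector iterations also cover the remainder: round up.
    return (TripCount + Step - 1) / Step * Step;
  // A scalar epilogue handles the remainder: round down.
  return TripCount / Step * Step;
}
```

For example, a trip count of 10 with VF=4 and UF=1 gives 8 without tail folding and 12 with it.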

[InstCombine] Fold logical and/or of range icmps with nowrap flags

This is an edge-case where we don't convert to bitwise and/or based
on implies poison reasoning, so explicitly try to perform the fold
in logical form. The transform itself is poison-safe, as both icmps
are based on the same value and any nowrap flags are discarded as
part of the fold (https://alive2.llvm.org/ce/z/aCwC8b for the used
example).

[mlir][linalg][transform] Add TileOp to transform dialect

This commit adds a tiling op to the transform dialect as an external op.

Differential Revision: https://reviews.llvm.org/D124661

[VPlan] Simplify & adjust code as suggested in D123005.

Improve code as suggested in D123005. Applied separately, because the
comments were made on a diff that had not been rebased onto current main.

[lldb] Allow EXE or exe in toolchain-msvc.test

I suspect that one of link or cl is found by shutil.which
and one isn't, hence the case difference. It doesn't really
matter for what the test is looking for.

llvm/Support/Debug.h: Suppress warnings with -Asserts. [-Wunused-variable]

Re. setCurrentDebugTypes(X,N), the only user is llvm-ml.cpp (exc. DebugTests)
since llvmorg-15-init-8355-g82ecf9a0b1b3.

FIXME: X and N are evaluated regardless of NDEBUG.
Could we avoid evaluating (but w/o warnings) with NDEBUG?

AVRExpandPseudoInsts.cpp: Fix a warning. [-Wunused-but-set-variable]

It has been enabled since llvmorg-15-init-5683-g2af845a6519c, aka D122271.

[DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling

refineUniformBase and selectGatherScatterAddrMode both attempt the
transformation:

base(0) + index(A+splat(B)) => base(B) + index(A)

However, this is only safe when index is not implicitly scaled.

Differential Revision: https://reviews.llvm.org/D123222
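
Per-lane arithmetic showing why the rewrite needs an unscaled index. The functions are hypothetical and assume the address is base + index * scale. Before the rewrite, B is multiplied by the scale. After the rewrite it sits in the base and is not, so the two addresses agree only when the scale is 1.

```cpp
#include <cstdint>

// One lane of a gather/scatter address with an implicitly scaled index.
int64_t LaneAddress(int64_t Base, int64_t Index, int64_t Scale) {
  return Base + Index * Scale;
}

// base(0) + index(A + B): B is scaled along with A.
int64_t Before(int64_t A, int64_t B, int64_t Scale) {
  return LaneAddress(0, A + B, Scale);
}

// base(B) + index(A): B escapes the scaling.
int64_t AfterRewrite(int64_t A, int64_t B, int64_t Scale) {
  return LaneAddress(B, A, Scale);
}
```

With A=3, B=100 and scale 4, the original address is 412 but the rewritten one is 112. With scale 1 both are 103.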

Reland "[lldb] Use shutil.which in Shell tests find_executable"

This reverts commit d9247cc84825539d346c74eb1379c6cb948d3a71.

With the Windows tests updated to expect .EXE suffixes. This changed
because shutil.which uses PATHEXT which will contain, amongst others,
"EXE".

Also I noticed the "." in ".exe" was the wildcard dot, not a literal
dot, so I've escaped those.

[InstCombine] Add test for is_alpha check with logical or and nsw (NFC)

The combination of logical or and nsw prevents the fold from
happening.

[AMDGPU] Simplify the test case for D124450

[X86] SimplifyDemandedVectorEltsForTargetNode - fold (uniform) shift(0,x) -> 0

[include-cleaner] Add missing deps from unittests

Revert "[lldb] Use shutil.which in Shell tests find_executable"

This reverts commit 713752610edd3d8766f56e2704bb7241434cd15b.

Some test output needs updating for Windows builders:
https://lab.llvm.org/buildbot/#/builders/83/builds/18356

[InstCombine] Pass ICmpInsts to foldAndOrOfICmpsUsingRanges() (NFC)

Pass the whole instruction rather than unpacking it. This makes it
easier to reuse the function in another place, as the entire
logic is encapsulated.

[X86] SimplifyDemandedVectorEltsForTargetNode - fold shift(0,x) -> 0

[InstCombine] Remove foldAndOrOfEqualityCmpsWithConstants() fold

This fold handles a special subset of foldAndOrOfICmpsUsingRanges();
use the more generic implementation instead.

The result can differ if a representation using a range comparison
is possible, in which case that is preferred over masking. There is
a canonicalization opportunity here.

[OCaml] Remove add_loop_unswitch use in test.

[InstCombine] Fold and of two ranges differing by mask

This is the de Morgan conjugated variant of the existing fold for
ors. Implement this by switching the range code to always work
on ors and invert the operands at the start and end. This makes
reasoning easier and makes the extension more obviously correct.

[lldb] Use shutil.which in Shell tests find_executable

In build.py we have our own find_executable that looks
a lot like the distutils one that I switched to shutil.which.

This find_executable isn't quite the same as shutil.which
so I've refactored it to call that in the correct way.

Note that the path passed to shutil.which is in the form that
PATH would be, meaning separators are allowed.
```
>>> shutil.which("gcc", path="/home/david.spickett:/bin")
'/bin/gcc'
```

We just need to make sure it doesn't ignore the existing PATH
and normalise the result if it does find the binary.

The .exe extension is automatically added to the binary name
if we are on Windows.

Depends on D124601

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D124604

Remove loop-unswitch from various bindings.

LoopUnswitch has been removed in fb4113ef0c8b.

Also remove it from various bindings.

[gn build] Port fb4113ef0c8b

[InstCombine] Add tests for and of two ranges differing by mask (NFC)

[Passes] Remove legacy LoopUnswitch pass.

The legacy LoopUnswitch pass is only used in the legacy pass manager
pipeline, which is deprecated.

The NewPM replacement is SimpleLoopUnswitch and I think it is time to
remove the legacy LoopUnswitch code.

Fixes #31000.

Reviewed By: aeubanks, Meinersbur, asbirlea

Differential Revision: https://reviews.llvm.org/D124376

[X86] combineShuffle - reuse SDLoc. NFCI.

[X86] Fold ANDNP(undef,x)/ANDNP(x,undef) -> 0

Matches the fold in DAGCombiner::visitANDLike.

[InstCombine] Switch an or of icmps fold to use constant ranges

We can express this fold more naturally when working on the constant
range implementation. This change is not entirely NFC, because the
code now also handles cases that don't match the precise pattern
this previously looked for, e.g. we can omit an add on one of the
ranges.

[InstCombine] Add additional test for icmp of two ranges (NFC)

[include-cleaner] Include-cleaner library structure, and simplistic AST walking.

Include-cleaner is a library that uses the clang AST and preprocessor to
determine which headers are used. It will be used in clang-tidy, in
clangd, in a standalone tool at least for testing, and in out-of-tree tools.

Roughly, it walks the AST, finds referenced decls, maps these to
used SourceLocations, then to FileEntrys, and then matches these against #includes.
However there are many wrinkles: dealing with macros, standard library
symbols, umbrella headers, IWYU directives etc.

It is not built on the C++20 modules concept of usage, to allow:
- use with existing non-modules codebases
- a flexible API embeddable in clang-tidy, clangd, and other tools
- avoiding a chicken-and-egg problem where include cleanups are needed
   before modules can be adopted

This library is based on existing functionality in clangd that provides
an unused-include warning. However it has design changes:
- it accommodates diagnosing missing includes too (this means tracking
   where references come from, not just the set of targets)
- it more clearly separates the different mappings
   (symbol => location => header => include) for better testing
- it handles special cases like standard library symbols and IWYU directives
   more elegantly by adding unified Location and Header types instead of
   side-tables
- it will support some customization of policy where necessary (e.g.
   for style questions of what constitutes a use, or to allow
   both missing-include and unused-include modes to be conservative)

This patch adds the basic directory structure under clang-tools-extra
and a skeleton version of the AST traversal, which will be the central
piece.
A more end-to-end prototype is in https://reviews.llvm.org/D122677

RFC: https://discourse.llvm.org/t/rfc-lifting-include-cleaner-missing-unused-include-detection-out-of-clangd/61228

Differential Revision: https://reviews.llvm.org/D124164

[lldb] Use shutil.which instead of distutils find_executable

distutils is deprecated and shutil.which is the suggested
replacement for this function.

https://peps.python.org/pep-0632/#migration-advice
https://docs.python.org/3/library/shutil.html#shutil.which

It was added in Python 3.3, but given that we're already using
shutil.which elsewhere I think this is ok/no worse than before.

We do have our own find_executable in lldb/test/Shell/helper/build.py
but I'd rather leave that as is for now. Also we have our own versions
of which() but again, a change for another time.

This work is part of #54337.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D124601

[VecCombine] Fix sort comparator logic in foldShuffleFromReductions

I think this sort comparator was overly complex, and the windows
expensive check bot agreed, failing as it was not giving a strict weak
ordering. Change it to use the comparison of the mask values as unsigned
integers. This should sort the undef elements to the end whilst keeping
X<Y otherwise.
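
A minimal sketch of the comparator described above, assuming undef mask elements are encoded as -1 (LLVM's shuffle-mask convention). `SortMaskUndefLast` is a hypothetical helper, not the VectorCombine code. Casting to unsigned turns -1 into the largest value, so undef lanes sort last. Plain `<` on unsigned values is a strict weak ordering, which `std::sort` requires.

```cpp
#include <algorithm>
#include <vector>

// Sort shuffle-mask elements, with undef (-1) lanes at the end.
void SortMaskUndefLast(std::vector<int> &Mask) {
  std::sort(Mask.begin(), Mask.end(), [](int X, int Y) {
    return static_cast<unsigned>(X) < static_cast<unsigned>(Y);
  });
}
```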

[SimplifyCFG] Replace condition value when threading

Replace the condition value with the known constant value on the
threaded edge. This happens implicitly with phi threading because
we replace with the incoming value, but not for non-phi threading.

Disable test for Android/Bionic.

Test depends on pthread_cancel which is not supported on Android.

[SimplifyCFG] Thread branches on same condition in more cases (PR54980)

SimplifyCFG implements basic jump threading if a branch is
performed on a phi node with constant operands. However,
InstCombine canonicalizes such phis to the condition value of a
previous branch, if possible. SimplifyCFG does support this as
well, but only in the very limited case where the same condition
is used in a direct predecessor -- notably, this does not include
the common diamond pattern (i.e. two consecutive if/elses on the
same condition).

This patch extends the code to look back a limited number of
blocks to find a branch on the same value, rather than only
looking at the direct predecessor.

Fixes https://github.com/llvm/llvm-project/issues/54980.

Differential Revision: https://reviews.llvm.org/D124159

[X86] Fix CodeGen Module Flag for -mibt-seal

When assertions are enabled, clang will perform RoundTrip for CompilerInvocation argument generation. ibt-seal flags are currently missing in this argument generation, and because of that, the feature doesn't get enabled for these cases. Performing RoundTrip is the default for assert builds, rendering the feature broken in these scenarios.

This patch fixes this and adds a test to properly verify that modules are being generated with the flag when -mibt-seal is used.


[1] - https://reviews.llvm.org/D116070

Reviewed By: pengfei, gftg, aaron.ballman, nickdesaulniers

Differential Revision: https://reviews.llvm.org/D118052

[SystemZ] Custom lowering of llvm.is_fpclass

Differential Revision: https://reviews.llvm.org/D114695

[ELF][test] Improve data-segment-relro.test

[RISCV] Add cost model for SK_Broadcast

Add cost model for broadcast shuffle in RISCVTTIImpl::getShuffleCost
with scalable vector. The cost model might not be the best.

For scalable vector, BasicTTIImpl::getShuffleCost returns an invalid cost,
so this patch relies on the existing cost model in BasicTTIImpl.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124101

[Clang][CodeGen]Fix __builtin_dump_struct missing record type field name

Thanks to @rsmith for pointing this out. I'm sorry for introducing this bug.
See @rsmith's comment in https://reviews.llvm.org/D122248
E.g. (by @rsmith): https://godbolt.org/z/o7vcbWaEf
I have added a test case with the
struct:
```
struct U19A {
  int a;
};
struct U19B {
  struct U19A a;
};

struct U19B a = {
  .a.a = 2022
};
```
Dump result:
```
struct U19B {
    struct U19A a = {
        int a = 2022
    }
}
```

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D122920

[RISCV] Merge addi into load/store when there is an ADD between them

This patch adds peephole optimizations for the following patterns:

(load (add base, (addi src, off1)), off2)
-> (load (add base, src), off1+off2)
(store val, (add base, (addi src, off1)), off2)
-> (store val, (add base, src), off1+off2)

Differential Revision: https://reviews.llvm.org/D124231
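
A sketch of the offset arithmetic behind the peephole. `MergedOffset` is a hypothetical helper, and the range check is an assumption of this sketch: RISC-V loads and stores take a 12-bit signed immediate, so a fold that moves off1 into the memory instruction can only apply when off1 + off2 still fits that field. The address itself, base + src + off1 + off2, is the same either way.

```cpp
#include <cstdint>
#include <optional>

bool IsSImm12(int64_t V) { return V >= -2048 && V <= 2047; }

// Combined immediate for (load (add base, src), off1+off2), or nullopt
// when it no longer fits the 12-bit signed load/store offset.
std::optional<int64_t> MergedOffset(int64_t Off1, int64_t Off2) {
  int64_t Sum = Off1 + Off2;
  if (!IsSImm12(Sum))
    return std::nullopt;
  return Sum;
}
```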

[PowerPC] Support of ppc_fp128 in lowering of llvm.is_fpclass

PowerPC supports `ppc_fp128`, which is not an IEEE floating point
type. The generic lowering of llvm.is_fpclass could not handle it
properly. This change extends the generic lowering code to
support `ppc_fp128`.

The change was tested on an emulator using runtime tests from
https://reviews.llvm.org/D112933 and the patch for clang
https://reviews.llvm.org/D112932.

Differential Revision: https://reviews.llvm.org/D113908

[asan] Enable detect_stack_use_after_return=1 by default on Linux

By default, -fsanitize=address already compiles with this check, so why not use it.
For compatibility, it can be disabled with env ASAN_OPTIONS=detect_stack_use_after_return=0.

Reviewed By: eugenis, kda, #sanitizers, hans

Differential Revision: https://reviews.llvm.org/D124057

Reland: [clang] Adding Platform/Architecture Specific Resource Header Installation Targets

The goal of this patch is to let distribution builds include only the applicable header files.

Currently, the clang-resource-headers target contains nearly all the files in clang/lib/Headers. Most of these files are platform-specific (e.g. immintrin.h is x86-specific). A distribution build has to either include all the headers for all platforms, or no headers at all. For example, if a distribution build for PowerPC includes the clang-resource-headers target, it will include all the x86-specific headers, even though they cannot be used there.

This patch breaks up the clang-resource-headers list into a core list and platform-specific lists. With the patch, a distribution build can now include the ppc-resource-headers target to get only the headers applicable to the PowerPC platform.

Specifically, one can now configure with

    cmake ... LLVM_DISTRIBUTION_COMPONENTS="clang;ppc-resource-headers" ... ../llvm

and `ninja install-distribution` then installs the PowerPC headers.

Similarly, one can configure with

    cmake ... LLVM_DISTRIBUTION_COMPONENTS="clang;x86-resource-headers" ... ../llvm

to include the headers applicable to the x86 platform in a distribution installation.

To implement this behaviour, the patch does two things:

- It breaks up the long header file list into a core list and platform-specific lists.
- It adds numerous platform-specific installation targets.

Differential Revision: https://reviews.llvm.org/D123498

[CMake] Ensure correct extension for llvm-lit is used on Windows when LLVM_INSTALL_UTILS is enabled.

D77110 initially added support for setting LLVM_CONFIG_DEFAULT_EXTERNAL_LIT
to llvm-lit in the install directory if LLVM_INSTALL_UTILS is on.

D79144 ensured that, on Windows, llvm-lit.py is correctly set for
LLVM_CONFIG_DEFAULT_EXTERNAL_LIT within the build area. However,
it did not account for the install area, which is handled by the later
set directive for the same variable.

This patch ensures that LLVM_CONFIG_DEFAULT_EXTERNAL_LIT for the install
area uses llvm-lit.py on Windows, since llvm-lit without the extension
is not created there.

Differential Revision: https://reviews.llvm.org/D124197

[NFC] Prevent shadowing a variable declared in `if`

Prevents confusion over which `S` is referenced in the final `else`
branch if such a use is added.
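The underlying C++ rule: a variable declared in an `if` condition stays in scope in the `else` branch, including inside a nested `else if`. If the nested condition also declares an `S`, a use in the final `else` silently binds to the inner one. A minimal sketch with hypothetical names (`findOuter`, `pick`), not the code from the patch, showing the renamed form:

```cpp
#include <cassert>
#include <optional>

// Hypothetical lookup standing in for whatever the real condition calls.
std::optional<int> findOuter(int Key) {
  return Key > 0 ? std::optional<int>(Key) : std::nullopt;
}

int pick(int Key) {
  if (std::optional<int> S = findOuter(Key)) {
    return *S;
  } else if (std::optional<int> Inner = findOuter(-Key)) {  // not named `S`
    return *Inner * 10;
  } else {
    // Both `S` and `Inner` are in scope here; with distinct names there is
    // no doubt which variable a future use refers to.
    return S.has_value() + Inner.has_value();  // both empty here: 0
  }
}
```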

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D124556

[msan][libcxx] Enable -fsanitize-memory-param-retval

We are considering enabling -fsanitize-memory-param-retval by default, in which case this patch will probably become unneeded.

Reviewed By: #libc, EricWF

Differential Revision: https://reviews.llvm.org/D123979

Frontend: Delete output streams before closing CompilerInstance outputs

Delete the output streams coming from
CompilerInstance::createOutputFile() and friends once writes are
finished. Concretely, replacing `OS->flush()` with `OS.reset()` in:

- `ExtractAPIAction::EndSourceFileAction()`
- `PrecompiledPreambleAction::setEmittedPreamblePCH()`
- `cc1_main()`'s support for `-ftime-trace`

This fixes theoretical bugs related to proxy streams, which may have
cleanups to run in their destructor. For example, a proxy that
CompilerInstance sometimes uses is `buffer_ostream`, which wraps a
`raw_ostream` lacking pwrite support and adds it. `flush()` does not
promise that output is complete; `buffer_ostream` needs to wait until
the destructor to forward anything so that it can service later calls to
`pwrite()`. If the destructor isn't called then the proxied stream
hasn't received any content.
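A self-contained sketch of why `flush()` is not enough for such a proxy. The classes below are hypothetical stand-ins written for illustration, not LLVM's `raw_ostream` or `buffer_ostream`: the proxy keeps all bytes in memory so that `pwrite()` can still patch earlier output, and forwards them only in its destructor, so only `reset()` on the owning `std::unique_ptr` delivers the content.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical append-only sink (no pwrite support).
struct AppendOnlySink {
  std::string Data;
};

// Hypothetical buffering proxy in the spirit of buffer_ostream.
class BufferingProxy {
public:
  explicit BufferingProxy(AppendOnlySink &Sink) : Sink(Sink) {}
  // Everything is forwarded only here.
  ~BufferingProxy() { Sink.Data += Buffer; }

  void write(const std::string &S) { Buffer += S; }
  // Patch earlier bytes; possible only because nothing was forwarded yet.
  void pwrite(const std::string &S, std::size_t Offset) {
    Buffer.replace(Offset, S.size(), S);
  }
  // Cannot forward anything: a later pwrite() might still change the bytes.
  void flush() {}

private:
  AppendOnlySink &Sink;
  std::string Buffer;
};
```

Calling `flush()` leaves the sink empty; only destroying the proxy (for example via `OS.reset()`) makes the sink see the final, patched bytes.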

This also protects against some logic bugs: a later attempt to write to
the stream now triggers a null dereference instead of going unnoticed.

No tests, since in practice these particular code paths never use
`buffer_ostream`; you need to be writing a binary file to a
pipe (such as stdout) to hit it, but `-extract-api` writes a text file
and the other two use computed filenames that will never (in practice)
be a pipe. This is effectively NFC, for now.

But I have some other patches in the works that add guard rails,
crashing if the stream hasn't been destructed by the time the
CompilerInstance is told to keep the output file, since in most cases
this is a problem.

Differential Revision: https://reviews.llvm.org/D124635

[analyzer] Add path note tags to standard library function summaries.

The patch is straightforward except for the tiny fix in BugReporterVisitors.cpp
that suppresses a default note for "Assuming pointer value is null" when
a note tag from the checker is present. This is probably the right thing to do
but also definitely not a complete solution to the problem of different sources
of path notes being unaware of each other, which is a large and annoying issue
that we have to deal with. Note tags really help there because they're nicely
introspectable. The problem is demonstrated by the newly added getenv() test.

Differential Revision: https://reviews.llvm.org/D122285

[CUDA][HIP] Fix mangling number for local struct

MSVC and Itanium mangling use different mangling numbers
for function-scope structs, which causes inconsistent
mangled kernel names in device and host compilations.

This patch uses the Itanium mangling number for structs
when mangling device-side names in CUDA/HIP host
compilation on Windows to fix this issue.

A state is added to ASTContext to indicate whether the
current name mangling is for device-side names in host
compilation. The device and host mangling numbers
are encoded as the upper and lower halves of a 32-bit
unsigned integer so that they fit into the existing mangling
number field in the AST. A diagnostic is emitted if a
mangling number exceeds the limit.
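The packing described above can be sketched as follows. The 16-bit limit per half follows from splitting 32 bits evenly; the function names are hypothetical (not ASTContext APIs), and the diagnostic is modelled as an empty `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Pack device (upper 16 bits) and host (lower 16 bits) mangling numbers.
// Returns nullopt where the real change would emit a diagnostic.
std::optional<std::uint32_t> encodeManglingNumberSketch(std::uint32_t Device,
                                                        std::uint32_t Host) {
  if (Device > 0xFFFF || Host > 0xFFFF)
    return std::nullopt;
  return (Device << 16) | Host;
}

// Unpack into {device, host}.
std::pair<std::uint32_t, std::uint32_t>
decodeManglingNumberSketch(std::uint32_t Packed) {
  return {Packed >> 16, Packed & 0xFFFF};
}
```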

Reviewed by: Artem Belevich, Reid Kleckner

Differential Revision: https://reviews.llvm.org/D122734

Fixes: SWDEV-328515

[NFC][SCEV] Refactor `createNodeForSelectViaUMinSeq()` out of `createNodeForSelectOrPHIViaUMinSeq()`

[NFC][SCEV] Tests with modellable pointer `select`s

Revert "[DebugInfo][InstrRef] Describe value sizes when spilt to stack"

This reverts commit a15b66e76d1ecff625a4bbb4a46ff83a43138f49.

This causes the linker to crash with an assertion failure: `Assertion failed: !Expr->isComplex(), file C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\lib\CodeGen\LiveDebugValues\InstrRefBasedImpl.cpp, line 907`.

[llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues

This patch fixes two issues affecting both CS and non-CS profiles:
1) For external-call-internal, the head samples of the internal function should be recorded.
2) For CS profiles, avoid ignoring LBRs after encountering an interrupt branch.

The LBR parser is shared between CS and non-CS, and we found handling artificial branches inside it error-prone. Since artificial branches are mainly used for CS profile unwinding, this patch simplifies the LBR parser by decoupling the artificial-branch code from it: the concept of an artificial branch is removed and split into two transitional branches (internal-to-external, external-to-internal). All processing of external branches is then left to the unwinder.

For the unwinder specifically, recall that we introduced external frames in https://reviews.llvm.org/D115550. We can therefore treat an external address as a regular address and reuse the current unwind functions (unwindCall, unwindReturn). In the normal case, the external frame matches an external LBR and is filtered out by `unwindLinear` without losing any context.

The data also shows that the interrupt or standalone LBR pattern (the unpaired case) does occur; we handle it by clearing the call stack and continuing to unwind. We detect it with the check in `unwindLinear`: a standalone LBR, whatever its type, has no counterpart to pair with, so it eventually produces a wrong linear range such as [external, internal] or [internal, external], and the state is set to invalid there.
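The range check described above can be sketched as follows. Everything here is hypothetical and simplified (a sentinel value for "external address", a toy state struct); it is not llvm-profgen's actual `unwindLinear`, only an illustration of the three cases the message distinguishes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sentinel for an address outside the profiled binary.
constexpr std::uint64_t ExternalAddr = 0;

// Hypothetical, simplified unwinder state.
struct UnwindStateSketch {
  std::vector<std::uint64_t> CallStack;  // current frames
  bool Valid = true;                     // cleared once an unpaired LBR is seen
};

bool isExternal(std::uint64_t Addr) { return Addr == ExternalAddr; }

// Classify a linear range [Start, End].
void unwindLinearSketch(UnwindStateSketch &State, std::uint64_t Start,
                        std::uint64_t End) {
  bool StartExt = isExternal(Start), EndExt = isExternal(End);
  if (StartExt && EndExt)
    return;  // external frame matched by an external LBR: filtered out
  if (StartExt != EndExt) {
    // [external, internal] or [internal, external]: an unpaired LBR.
    State.CallStack.clear();
    State.Valid = false;
    return;
  }
  // Both ends internal: a real unwinder would record the range here.
}
```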

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D118177

[lldb] Remove ConnectionFileDescriptorTest.Connectv(4|6)

These tests are timing out on the bots.

[Tooling/DependencyScanning] Make skipping excluded PP ranges during dependency scanning the default

This improves maintainability a bit and removes the need to maintain the additional option and related code paths.

Differential Revision: https://reviews.llvm.org/D124558

[NFC] remove const from FunctionPropertiesAnalysis::run, keep on Result

The goal in 75881d8b023e was only to change what `Result` is; there was
no need to also modify `::run`.

[lldb] Fix crash when launching in terminal

This patch fixes a crash when using `process launch -t` to launch the
inferior from a TTY. The issue is that on Darwin, Host.mm calls
ConnectionFileDescriptor::Connect without a socket_id_callback_type. That
overload passes nullptr as the function ref, which is then called
unconditionally as the socket_id_callback.

One potential way to fix this is to change all the lambdas to include a
null check, but instead I went with an empty lambda.
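The chosen fix, in miniature: the convenience overload forwards a no-op lambda instead of a null callback, so the full overload can keep calling it unconditionally. This sketch swaps in `std::function` for LLVM's `function_ref` so it has no LLVM dependency (calling an empty `std::function` throws `std::bad_function_call` rather than crashing), and `connectSketch` is a hypothetical name, not the LLDB API.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for socket_id_callback_type.
using SocketIdCallbackSketch = std::function<void(const std::string &)>;

// Full overload: invokes the callback unconditionally.
int connectSketch(const std::string &Path,
                  const SocketIdCallbackSketch &Callback) {
  Callback(Path);
  return 0;
}

// Convenience overload: pass an empty (no-op) lambda, never a null callback.
int connectSketch(const std::string &Path) {
  return connectSketch(Path, [](const std::string &) {});
}
```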

Differential revision: https://reviews.llvm.org/D124535

[MC][AArch64] Enable '+v8a' when nothing specified for MCSubtargetInfo

D110065 added support for the 'R' profile to LLVM. It turns the
`generic` CPU into the intersection of v8-a and v8-r, which causes some
backward compatibility problems. The original patch makes the clang
driver implicitly pass -march=armv8-a when only the triple is specified.
Since that only applies to clang, other tools such as llvm-objdump
still face the backward compatibility problem.

This patch applies the same idea to MC related tools by enabling '+v8a'
feature when nothing is specified (both CPU and FS are empty) for
MCSubtargetInfo creation.
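The defaulting rule, reduced to its condition: `+v8a` is used only when both the CPU and the feature string are empty, and anything explicit is left untouched. The function name is hypothetical and this is not the actual AArch64 MC code.

```cpp
#include <cassert>
#include <string>

// Pick the feature string used when creating the subtarget info.
std::string pickFeatureStringSketch(const std::string &CPU,
                                    const std::string &FS) {
  if (CPU.empty() && FS.empty())
    return "+v8a";  // make the `generic` CPU behave like Armv8-A
  return FS;        // explicit CPU or features: keep as given
}
```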

This patch should fix PR53956.

Reviewed by: labrinea

Differential Revision: https://reviews.llvm.org/D124319

[BOLT] Fix r_aarch64_prelxx test

The relocation value is calculated using the formula S + A - P;
the test verifies the value by inverting the formula to recover the location address P.
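The arithmetic behind that check, as a sketch with hypothetical function names: the stored value is S + A - P, so P can be recovered as S + A - value. Unsigned 64-bit wraparound keeps both directions exact for negative values. The narrower PREL16/PREL32 forms additionally truncate the stored field, which this sketch does not model.

```cpp
#include <cassert>
#include <cstdint>

// Relocated value for a PC-relative (PREL) relocation: S + A - P.
std::int64_t prelValue(std::uint64_t S, std::int64_t A, std::uint64_t P) {
  return static_cast<std::int64_t>(S + static_cast<std::uint64_t>(A) - P);
}

// Inverse: recover the location P from S, A and the stored value.
std::uint64_t prelLocation(std::uint64_t S, std::int64_t A, std::int64_t Value) {
  return S + static_cast<std::uint64_t>(A) - static_cast<std::uint64_t>(Value);
}
```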

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D124270