review.tizen.org Git - platform/upstream/llvm.git/log

[CLANG][PATCH][FPEnv] Add support for option -ffp-eval-method and extend #pragma float_control similarly

The Intel compiler ICC supports the option "-fp-model=(source|double|extended)"
which causes the compiler to use a wider type for intermediate floating point
calculations. Also supported is a way to embed this effect in the source
program with #pragma float_control(source|double|extended).
This patch extends pragma float_control syntax, and also adds support
for a new floating point option "-ffp-eval-method=(source|double|extended)".
source: intermediate results use source precision
double: intermediate results use double precision
extended: intermediate results use extended precision

Reviewed By: Aaron Ballman

Differential Revision: https://reviews.llvm.org/D93769

[mlir][spirv] Fix a few issues in ModuleCombiner

- Fixed symbol insertion into `symNameToModuleMap`. Insertion
needs to happen whether symbols are renamed or not.
- Added check for the VCE triple and avoid dropping it.
- Disabled function deduplication. It requires more careful
rules. Right now it can remove different functions.
- Added tests for symbol rename listener.
- And some other code/comment cleanups.

Reviewed By: ergawy

Differential Revision: https://reviews.llvm.org/D106886

[AsmParser] Remove unused declaration parseOptionalCommaInAlloca (NFC)

[InstSimplify] Simplify llvm.vscale when vscale_range attribute exists

Reduce llvm.vscale to constant based on vscale_range attribute.

Differential Revision: https://reviews.llvm.org/D106850

[SLP]Fix build on MacOS, NFC.

[x86] improve CMOV codegen by pushing add into operands, part 3

In this episode, we are trying to avoid an x86 micro-arch quirk where complex
(3 operand) LEA potentially costs significantly more than simple LEA. So we
simultaneously push and pull the math around the CMOV to balance the operations.

I looked at the debug spew during instruction selection and decided against
trying a later DAGToDAG transform -- it seems very difficult to match if the
trailing memops are already selected and managing the creation of extra
instructions at that level is always tricky.

Differential Revision: https://reviews.llvm.org/D106918

sanitizer_common: replace RWMutex/BlockingMutex with Mutex

Mutex supports reader access, OS blocking, spinning,
portable and smaller than BlockingMutex.
Overall it's supposed to be better than RWMutex/BlockingMutex.
Replace RWMutex/BlockingMutex with Mutex.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D106936

sanitizer_common: prohibit Mutex(LINKER_INITIALIZED)

Mutex does not support LINKER_INITIALIZED ctor.
But we used to support it with BlockingMutex.
To prevent potential bugs delete LINKER_INITIALIZED Mutex ctor.
Also mark existing ctor as explicit.

Depends on D106944.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D106945

sanitizers: switch BlockingMutex(LINKER_INITIALIZED) to Mutex

Mutex does not support LINKER_INITIALIZED support.
As preparation to switching BlockingMutex to Mutex,
proactively replace all BlockingMutex(LINKER_INITIALIZED) to Mutex.
All of these are objects with static storage duration and Mutex ctor
is constexpr, so it should be equivalent.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D106944

[lldb] Add "memory tag write" --end-addr option

The default mode of "memory tag write" is to calculate the
range from the start address and the number of tags given.
(just like "memory write" does)

(lldb) memory tag write mte_buf 1 2
(lldb) memory tag read mte_buf mte_buf+48
Logical tag: 0x0
Allocation tags:
[0xfffff7ff9000, 0xfffff7ff9010): 0x1
[0xfffff7ff9010, 0xfffff7ff9020): 0x2
[0xfffff7ff9020, 0xfffff7ff9030): 0x0

This new option allows you to set an end address and have
the tags repeat until that point.

(lldb) memory tag write mte_buf 1 2 --end-addr mte_buf+64
(lldb) memory tag read mte_buf mte_buf+80
Logical tag: 0x0
Allocation tags:
[0xfffff7ff9000, 0xfffff7ff9010): 0x1
[0xfffff7ff9010, 0xfffff7ff9020): 0x2
[0xfffff7ff9020, 0xfffff7ff9030): 0x1
[0xfffff7ff9030, 0xfffff7ff9040): 0x2
[0xfffff7ff9040, 0xfffff7ff9050): 0x0

This is implemented using the QMemTags packet previously
added. We skip validating the number of tags in lldb and send
them on to lldb-server, which repeats them as needed.

Apart from the number of tags, all the other client side checks
remain. Tag values, memory range must be tagged, etc.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D105183

[X86][AVX] Move VPERM2F128 defs above VINSERTF128 defs. NFC.

This will be necessary for a future patch to lower VINSERTF128 custom folds to VPERM2F128

[SLP]Improve graph reordering.

Reworked reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reordarable nodes (loads, stores,
extractelements,extractvalues) and then fully rebuilding the graph in
the best order. This was not effecient, since it required an extra
memory and time for building/rebuilding tree, double the use of the
scheduling budget, which could lead to missing vectorization due to
exausted scheduling resources.

Patch provide 2-way approach for graph reodering problem. At first, all
reordering is done in-place, it doe not required tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.

The first step (top-to bottom) rotates the whole graph, similarly to the previous
implementation. Compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle smaller subgraph when buildiong operands for the graph
nodes with lasrger vectorization factor, we can rotate just subgraph,
not the whole graph.

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reorder in the best fashion, they are reordered and their user too.
It allows to remove double shuffles to the same ordering of the operands in
many cases and just reorder the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows to remove extra shuffle because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020

[clang-tidy] Fix crash on "reference-to-array" parameters in 'bugprone-easily-swappable-parameters'

An otherwise unexercised code path related to trying to model
"array-to-pointer decay" resulted in a null pointer dereference crash
when parameters of type "reference to array" were encountered.

Fixes crash report http://bugs.llvm.org/show_bug.cgi?id=50995.

Reviewed By: aaron.ballman

Differential Revision: http://reviews.llvm.org/D106946

[LTO][Legacy] Add new API to check presence of ctor/dtor functions.

On AIX, the linker needs to check whether a given lto_module_t contains
any constructor/destructor functions, in order to implement the behavior
of the -bcdtors:all flag. See
https://www.ibm.com/docs/en/aix/7.2?topic=l-ld-command for the flag's
documentation.
In llvm IR, constructor (destructor) functions are added to a special
global array @llvm.global_ctors (@llvm.global_dtors).
However, because these two symbols are artificial, they are not visited
during the symbol traversal (using the
lto_module_get_[num_symbols|symbol_name|symbol_attribute] API).

This patch adds a new function to the libLTO interface that checks the
presence of one or both of these two symbols.

Reviewed By: steven_wu

Differential Revision: https://reviews.llvm.org/D106887

[LV] Move recurrence backedge fixup code to VPlan::execute (NFC).

As suggested in D105008, move the code that fixes up the backedge value
for first order recurrences to VPlan::execute.

Now all that remains in fixFirstOrderRecurrences is the code responsible
for creating the exit values in the middle block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D106244

[LV][ARM] Tighten up MLA reduction costing

This makes a couple of changes to the costing of MLA reduction patterns,
to more accurately cost various patterns that can come up from
vectorization.

- The Arm implementation of getExtendedAddReductionCost is altered to
   only provide costs for legal or smaller types. Larger than legal types
   need to be split, which currently does not work very well, especially
   for predicated reductions where the predicate may be legal but needs to
   be split. Currently we limit it to legal or smaller input types.
- The getReductionPatternCost has learnt that reduce(ext(mul(ext, ext))
   is a pattern that can come up, and can be treated the same as
   reduce(mul(ext, ext)) providing the extension types match.
- And it has been adjusted to not count the ext in reduce(mul(ext, ext))
   as part of a reduce(mul) pattern.

Together these changes help to more accurately cost the mla reductions
in cases such as where the extend types don't match or the extend
opcodes are different, picking better vector factors that don't result
in expanded reductions.

Differential Revision: https://reviews.llvm.org/D106166

[mlir][linalg] Specialize LinalgOp canonicalization patterns (NFC).

Specialize the DeduplicateInputs and RemoveIdentityLinalgOps patterns for GenericOp instead of implementing them for the LinalgOp interface.

This revsion is based on https://reviews.llvm.org/D105622 that moves the logic to erase identity CopyOps in a separate pattern.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D105291

Allow #pragma float_control(push|pop) within a language linkage specification

Currently, we prohibit this pragma from appearing within a language
linkage specification, but this is useful functionality that is
supported by MSVC (which is where we inherited this feature from).
This patch allows you to use the pragma within an extern "C" {} (etc)
block.

[mlir][linalg] Introduce a separate EraseIdentityCopyOp Pattern.

Split out an EraseIdentityCopyOp from the existing RemoveIdentityLinalgOps pattern. Introduce an additional check to ensure the pattern checks the permutation maps match. This is a preparation step to specialize RemoveIdentityLinalgOps to GenericOp only.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D105622

[libcxx] Bump __libcpp_version to 14 after branch

This was missed in 08c766a7318ab37bf1d77e0c683cd3b00e700877
and caused test failures in the buildkite bots:
libcpp_version.pass.cpp:22:1:
error: static_assert failed due to requirement '14000 == libcpp_version'
"_LIBCPP_VERSION doesn't match __libcpp_version

[LLDB] Skip TestGuiBasicDebug.py on Arm/AArch64 Linux

TestGuiBasicDebug.py randomly fails due to timeouts sending out false
negatives on LLDB Arm and AArch64 Linux buildbots. I havnt found a
reliable wayy to set pexpect timeout for this test to pass regularly.

Skipping it on Arm and AArch64 Linux to silence buildbot failures.

[LLDB] Skip HW breakpoints test_step_until on Arm/Linux

test_step_until xpasses on some machines while fails on others.
Marking it as skipped for now.

[mlir][memref] Fix collapsed shape ops memref.cast folding with changed type

`memref.collapse_shape` has verification logic to make sure
result dim must be static if all the collapsing src dims are static.
Cast folding might add more static information for the src operand
of `memref.collapse_shape` which might change a valid collapsing
operation to be invalid. Add `CollapseShapeOpMemRefCastFolder` pattern
to fix this.

Minor changes to `convertReassociationIndicesToExprs` to use `context`
instead of `builder` to avoid extra steps to construct temporary
builders.

Reviewed By: nicolasvasilache, mravishankar

Differential Revision: https://reviews.llvm.org/D106670

[ARM] Extra MVE reduction vectorizer tests. NFC

[lldb] Temporarily bump the max length of the pexpect error message to diagnose an lldb-aarch64 test failure

This is only temporarily to gather some logs before this gets reverted. See
D106873 for a discussion about how/if we can make this change permanent.

[lldb] Add "memory tag write" command

This adds a new command for writing memory tags.
It is based on the existing "memory write" command.

Syntax: memory tag write <address-expression> <value> [<value> [...]]
(where "value" is a tag value)

(lldb) memory tag write mte_buf 1 2
(lldb) memory tag read mte_buf mte_buf+32
Logical tag: 0x0
Allocation tags:
[0xfffff7ff9000, 0xfffff7ff9010): 0x1
[0xfffff7ff9010, 0xfffff7ff9020): 0x2

The range you are writing to will be calculated by
aligning the address down to a granule boundary then
adding as many granules as there are tags.

(a repeating mode with an end address will be in a follow
up patch)

This is why "memory tag write" uses MakeTaggedRange but has
some extra steps to get this specific behaviour.

The command does all the usual argument validation:
* Address must evaluate
* You must supply at least one tag value
(though lldb-server would just treat that as a nop anyway)
* Those tag values must be valid for your tagging scheme
(e.g. for MTE the value must be > 0 and < 0xf)
* The calculated range must be memory tagged

That last error will show you the final range, not just
the start address you gave the command.

(lldb) memory tag write mte_buf_2+page_size-16 6
(lldb) memory tag write mte_buf_2+page_size-16 6 7
error: Address range 0xfffff7ffaff0:0xfffff7ffb010 is not in a memory tagged region

(note that we do not check if the region is writeable
since lldb can write to it anyway)

The read and write tag tests have been merged into
a single set of "tag access" tests as their test programs would
have been almost identical.
(also I have renamed some of the buffers to better
show what each one is used for)

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D105182

Revert "[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR"

Crashes were reported on the upstreamm revision:
https://reviews.llvm.org/D105207

This reverts commit 796b84d26f4d461fb50e7b4e84e15a10eaca88fc.

[clang-format] Correctly attach enum braces with ShortEnums disabled

Previously, with AllowShortEnumsOnASingleLine disabled, enums that would have otherwise fit on a single line would always put the opening brace on its own line.
This patch ensures that these enums will only put the brace on its own line if the existing attachment rules indicate that it should.

Reviewed By: HazardyKnusperkeks, curdeius

Differential Revision: https://reviews.llvm.org/D99840

Revert "[LLDB] Skip HW breakpoints test_step_until on Arm/Linux"

This reverts commit ab5b8ee1a7a18fe097419e21224ac4f15591bcd7.

This caused some failure on buildbots so reverting it for now.

[LLDB] Skip HW breakpoints test_step_until on Arm/Linux

test_step_until xpasses on some machines while fails on others. I am
marking it as skipped for now.

[ORC] Fix missing include.

Aims to fix bot failures for some module builds, e.g.
https://green.lab.llvm.org/green/blue/organizations/jenkins/lldb-cmake/detail/lldb-cmake/33934/pipeline/

[SLP][X86] Fix naming consistency of dot product tests. NFC.

Revert "sanitizers: increase .clang-format columns to 100"

This reverts commit 5d1df6d220f1d6f726d9643848679d781750db64.

There is a strong objection to this change:
https://reviews.llvm.org/D106436#2905618

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D106847

Revert "Emit strong definition for TypeID storage in Op/Type/Attributes definition"

This reverts commit b349d4c5e1852091aad97d3750e286493cac7178.
This broke a bot that exposes some missing CMake dependencies that need
to be fixed first.

[AMDGPU] We would need FP if there is call and caller save VGPR spills

Since https://reviews.llvm.org/D98319, determineCalleeSavesSGPR() needs
to consider caller save VGPR spills as well while anticipating if we
require FP.

Fixes: SWDEV-295978
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106758

Emit strong definition for TypeID storage in Op/Type/Attributes definition

By making an explicit template specialization for the TypeID provided by these classes,
the compiler will not emit an inline weak definition and rely on the linker to unique it.
Instead a single definition will be emitted in the C++ file alongside the implementation
for these classes. That will turn into a linker error what is now a hard-to-debug runtime
behavior where instances of the same class may be using a different TypeID inside of
different DSOs.

Differential Revision: https://reviews.llvm.org/D105903

Bump the trunk major version to 14

and clear the release notes.

[OpenMP] Fixing missing variables when CUDA SDK not in system

This patch fixes the error reported in D106751. When there is no CUDA SDK
installed in the system, the build fails due to missing `CU_DEVICE_ATTRIBUTE`
variables.

Using @zsrkmyn sugested fix

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106933

[OpenMP][Tool] Introducing the `llvm-omp-device-info` tool

This patch introduces the `llvm-omp-device-info` tool, which uses the
omptarget library and interface to query the device info from all the
available devices as seen by OpenMP. This is inspired by PGI's `pgaccelinfo`

Since omptarget usually requires a description structure with executable
kernels, I split the initialization of the RTLs and Devices to be able to
initialize all possible devices and query each of them.

This revision relies on the patch that introduces the print device info.

A limitation is that the order in which the devices are initialized, and the
corresponding device ID is not necesarily the one seen by OpenMP.

The changes are as follows:
1. Separate the RTL initialization that was performed in `RegisterLib` to its own `initRTLonce` function
2. Create an `initAllRTLs` method that initializes all available RTLs at runtime
3. Created the `llvm-deviceinfo.cpp` tool that uses `omptarget` to query each device and prints its information.

Example Output:
```
Device (0):
    print_device_info not implemented

Device (1):
    print_device_info not implemented

Device (2):
    print_device_info not implemented

Device (3):
    print_device_info not implemented

Device (4):
    CUDA Driver Version:                11000
    CUDA Device Number:                 0
    Device Name:                        Quadro P1000
    Global Memory Size:                 4236312576 bytes
    Number of Multiprocessors:          5
    Concurrent Copy and Execution:      Yes
    Total Constant Memory:              65536 bytes
    Max Shared Memory per Block:        49152 bytes
    Registers per Block:                65536
    Warp Size:                          32 Threads
    Maximum Threads per Block:          1024
    Maximum Block Dimensions:           1024, 1024, 64
    Maximum Grid Dimensions:            2147483647 x 65535 x 65535
    Maximum Memory Pitch:               2147483647 bytes
    Texture Alignment:                  512 bytes
    Clock Rate:                         1480500 kHz
    Execution Timeout:                  Yes
    Integrated Device:                  No
    Can Map Host Memory:                Yes
    Compute Mode:                       DEFAULT
    Concurrent Kernels:                 Yes
    ECC Enabled:                        No
    Memory Clock Rate:                  2505000 kHz
    Memory Bus Width:                   128 bits
    L2 Cache Size:                      1048576 bytes
    Max Threads Per SMP:                2048
    Async Engines:                      Yes (2)
    Unified Addressing:                 Yes
    Managed Memory:                     Yes
    Concurrent Managed Memory:          Yes
    Preemption Supported:               Yes
    Cooperative Launch:                 Yes
    Multi-Device Boars:                 No
    Compute Capabilities:               61
```

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D106752

[mlir][openacc] Initial translation for DataOp to LLVM IR

Add basic translation of acc.data to LLVM IR with runtime calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104301

Fix a thinko in the parsing of substitutions in CommandObjectRegexCommand.

The old code incorrectly calculated the start position for the search
for the third (and subsequent) instance of a particular substitution
pattern (e.g. %1).

I also added a few test cases for this parsing covering this failure.

[mlir] Replace LLVM_ATTRIBUTE_NORETURN with C++11 [[noreturn]]

[[noreturn]] can be used since 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.

[lld] Replace LLVM_ATTRIBUTE_NORETURN with [[noreturn]]

[[noreturn]] can be used since 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.

[OpenMP][Libomptarget] Adding `print_device_info` to RTL and `omptarget`

This patch introduces a function in the device's plugin to print the
device information. This patch relates to another patch that introduces
a CLI tool to obtain the device information from the omplibrary directly.
It is inspired by PGI's pgaccelinfo.

The modifications are as follows:
1. Introduce the optional `void __tgt_rtl_print_device_info(RTLdevID)` function into the RTL.
2. Introduce the `bool __tgt_print_device_info(devID)` function into `omptarget` interface. Returns false if the RTL is not implemented
3. Added `bool printDeviceInfo(RTLDevID)` to the `DeviceTy`
4. Implement the `__tgt_rtl_print_device_info` for CUDA. Added additional CUDA Runtime calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106751

[OpenMP] Folding threadLimit and numThreads when single value in kernels

The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block`
and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.

In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and
`NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions.
The code checks all the kernels, and if their attributes match, the functions are folded.

In the future we will explore specializing for multiple values of NumThreads and NumTeams.

Depends on D106390

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D106033

[clang] NFC: change uses of `Expr->getValueKind` into `is?Value`

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D100733

llvm/utils: guarantee revert_checker's revert ordering

At the moment, the revert ordering from this tool is unspecified (though
it happens to be in `git log` order, so newest reverts come first).

From the standpoint of tooling and users, this seems to be the opposite
of what we want by default: tools and users will generally try to apply
these reverts as cherry-picks. If two reverts in the list are close
enough to each other, if the reverts get applied out of order, we'll get
a merge conflict.

Rather than having `reverse`s for all tools (and mental reverses for
manual users), just guarantee an oldest-first output ordering for this
function.

Differential Revision: https://reviews.llvm.org/D106838

[DAGCombiner] Fold SETCC(FREEZE(x),const) to FREEZE(SETCC(x,const)) if SETCC is used by BRCOND

This patch adds a peephole optimization `SETCC(FREEZE(x),const)` => `FREEZE(SETCC(x,const))`
if the SETCC is only used by BRCOND.

Combined with `BRCOND(FREEZE(X)) => BRCOND(X)`, this leads to a nice improvement in the generated assembly when x is a masked loaded value.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D105344

Precommit test files for D105344 (NFC)

[X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT

Differential Revision: https://reviews.llvm.org/D106780

Reapply "[Attributor] Disable simplification AAs if a callback is present""

This reapplies commit cbb709e25124dc38ee593882051fc88c987fe591 and
includes the use of the lookup method instead of operator[] to avoid
accidentally setting (empty) simplification callbacks.

This reverts commit aa27430a625b2fd059707a87f8ba2df8f480ff11.

Revert "[X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT"

This reverts commit 6ff73efea94621e74642e4d7a15cc86a5fb6d411.

Add llvm::equal convenient wrapper for ranges around std::equal

Differential Revision: https://reviews.llvm.org/D106913

[libc++] Fix a few warnings in system headers with GCC

This isn't fixing all of them, but at least it's making some progress.

Differential Revision: https://reviews.llvm.org/D106283

[X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT

[PDL] Mark PatternOp as SingleBlock

This provides access to the SingleBlock accessor methods, e.g. getBody().

[PDL] Fix the builders for OperationOp and PatternOp

Create synthetic symbol names on demand to improve memory consumption and startup times.

This is a resubmission of https://reviews.llvm.org/D105160 after fixing testing issues.

This fix was created after profiling the target creation of a large C/C++/ObjC application that contained almost 4,000,000 redacted symbol names. The symbol table parsing code was creating names for each of these synthetic symbols and adding them to the name indexes. The code was also adding the object file basename to the end of the symbol name which doesn't allow symbols from different shared libraries to share the names in the constant string pool.

Prior to this fix this was creating 180MB of "___lldb_unnamed_symbol" symbol names and was taking a long time to generate each name, add them to the string pool and then add each of these names to the name index.

This patch fixes the issue by:

not adding a name to synthetic symbols at creation time, and allows name to be dynamically generated when accessed
doesn't add synthetic symbol names to the name indexes, but catches this special case as name lookup time. Users won't typically set breakpoints or lookup these synthetic names, but support was added to do the lookup in case it does happen
removes the object file baseanme from the generated names to allow the names to be shared in the constant string pool
Prior to this fix the startup times for a large application was:
12.5 seconds (cold file caches)
8.5 seconds (warm file caches)

After this fix:
9.7 seconds (cold file caches)
5.7 seconds (warm file caches)

The names of the symbols are auto generated by appending the symbol's UserID to the end of the "___lldb_unnamed_symbol" string and is only done when the name is requested from a synthetic symbol if it has no name.

Differential Revision: https://reviews.llvm.org/D106837

[Hexagon] Fix resetting dead registers in DBG_VALUE_LISTs

This fixes https://llvm.org/PR51229.

Revert "[ELF] --gc-sections: allow GC on reserved sections in a group"

clang may place dynamic initializations for explicitly specialized class
template static data members in comdat.
Such in-comdat SHT_INIT_ARRAY was an abuse but we have to work around it for a while.

[gn build] Port 8a48e6dda9f7

Revert "[Attributor] Disable simplification AAs if a callback is present"

This reverts commit cbb709e25124dc38ee593882051fc88c987fe591 as it
breaks the tests, which was not supposed to happen. Investigating now.

[Attributor] Verify `checkForAllUses` return value properly

Also do not emit more than one remark after Heap2Stack failed.

[OpenMP] Improve alignment handling in the new device runtime

[Attributor] Disable simplification AAs if a callback is present

AAValueSimplify, AAValueConstantRange, and AAPotentialValues all look at
the IR by default. If queried for a IR position which has a
simplification callback we should either look at the callback return, or
give up. We do the latter for now.

[libcxx][ranges] Add `counted_iterator`.

Differential Revision: https://reviews.llvm.org/D106205

[libcxx][nfc] Delete `cpp20_input_iterator`'s default constructor.

This will make it conform only to the minimum requirements for an `input_iterator`.

[compiler-rt][hwasan][Fuchsia] Do not emit FindDynamicShadowStart for Fuchsia

This function is unused because fuchsia does not support a dynamic shadow.

Differential Revision: https://reviews.llvm.org/D105735

Fix test/Transforms/LoopVectorize/AArch64/strict-fadd-vf1.ll.

It was writing to the source directory (which may not be writeable),
rather than using %t.

Fixes: a5dd6c6cf935 ("[LoopVectorize] Don't interleave scalar ordered reductions for inner loops")

[lld][ELF] remove empty SyntheticSections from inputSections

Change removeUnusedSyntheticSections() to actually remove empty
SyntheticSections in inputSections.

In addition to doing what removeUnusedSyntheticSections() was meant
to do, this will also make the shuffle-sections tests, which shuffles
inputSections, less sensitive to empty Synthetic Sections that
will not appear in the final image.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D106427

Change-Id: I589eaf596472161a4395fb658aea0fad73318088

[gn build] manually port 71909de37495

[Libomptarget] Revert new variable sharing to use the old method

The new method of sharing variables introduces a `__kmpc_alloc_shared` call
that cannot be removed in the middle end because of its non-constant argument
and unconnected free. This patch reverts this to the old method that used a
static amount of shared memory for sharing variables.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106905

[AIX] Update fetch_and_add type

It turns out that the AIX kernel is defining int instead of unsigned int for fetch_and_add.

Legacy XL also defines this to be signed.

https://www.ibm.com/docs/en/aix/7.2?topic=f-fetch-add-kernel-services

So update the type for compat.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D106920

[MLGO] fix silly LLVM_DEBUG misuse

[OpenMP][Tests] Fix test compatibility

gcc and clang disagree in how the event handle needs to be handled.
According to OpenMP LC, gcc is right. Will open clang bug report

[NFC][MLGO] Debug messages for what inline advisor is selected

We already have an indication (error) if the desired inline advisor
cannot be enabled, but we don't have a positive indication. Added
LLVM_DEBUG messages for the latter.

[OpenMP] Fix deadlock for detachable task with child tasks

This patch fixes https://bugs.llvm.org/show_bug.cgi?id=49066.

For detachable tasks, the assumption breaks that the proxy task cannot have
remaining child tasks when the proxy completes.
In stead of increment/decrement the incomplete task count, a high-order bit
is flipped to mark and wait for the incomplete proxy task.

Differential Revision: https://reviews.llvm.org/D101082

[libc] Fix strtok_r crash when src and *saveptr are both nullptr

While working and testing my refactoring of multiple string functions in libc, I came across a bug that needs to be addressed in a patch on its own: src is checked for nullptr and assigned to *saveptr if it is nullptr. However, saveptr is initially nullptr when it comes to reentry. This could cause a problem if both saveptr and src are null; we need to do the check first and return nullptr if both are nullptr.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D106885

[lldb][NFC] Fix incorrect log and comment

Likely copy & paste issue that was overlooked years ago

[OpenMP] Creating the `omp_target_num_teams` and `omp_target_thread_limit` attributes to outlined functions

The device runtime contains several calls to __kmpc_get_hardware_num_threads_in_block
and __kmpc_get_hardware_num_blocks. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.

In commit D106033 we have the optimization phase. This commit adds the attributes to
the outlined function for the grid size. the two attributes are `omp_target_num_teams` and
`omp_target_thread_limit`. These values are added as long as they are constant.

Two functions are created `getNumThreadsExprForTargetDirective` and
`getNumTeamsExprForTargetDirective`. The original functions `emitNumTeamsForTargetDirective`
and `emitNumThreadsForTargetDirective` identify the expresion and emit the code.
However, for the Device version of the outlined function, we cannot emit anything.
Therefore, this is a first attempt to separate emision of code from deduction of the
values.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106298

[dfsan][NFC] Describe how origin trace tracking works

Reviewed By: gbalats

Differential Revision: https://reviews.llvm.org/D106903

[libc] Fix x86_64 fenv implementation for windows

All fenv functions are also enabled for windows. Since two tests,
enabled_exceptions_test and feholdexcept_test are still failing on
windows, they have been disabled.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D106808

[PowerPC] Turn deprecated altivec prefetch instrs to nops on AIX

The dst/dstt/dstst/dststt instructions are nop's on all PowerPC
cores that AIX supports. The AIX assembler also does not accept
these mnemonics. Turn them into nop's on AIX (similar to dstall).

[x86] update stale code comment; NFC

The transform was generalized with:
1ce05ad619a5

[x86] add more tests for cmov and lea; NFC

[mlir] Add a FailureOr copy constructor from a FailureOr of a convertible type.

[PDL] Remove RewriteEndOp and mark RewriteOp as NoTerminator

RewriteEndOp was a fake terminator operation that is no longer needed now that blocks are not required to have terminators.

Differential Revision: https://reviews.llvm.org/D106911

[libc] Enable MPFR library for math functions test

Included more math functions to Windows's entrypoints
and made a cmake option (-DLLVM_LIBC_MPFR_INSTALL_PATH)
where the user can specify the install path where the MPFR
library was built so it can be linked. The try_compile was
moved to LLVMLibCCheckMPFR.cmake, so the variable that is
set after this process can retain its value in other files
of the same parent file. A direct reason for this is for
LIBC_TESTS_CAN_USE_MPFR to be true when the user specifies
MPFR's path and retain its value even after leaving the file.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D106894

Add a test for top-level expressions using "expr --top-level".

This was broken for a while even though the Python version
continued to work. This adds a test so it doesn't regress.

When calculating the "currently selected thread" in
Process::HandleStateChangedEvent, we check whether a thread stopped
for eStopReasonSignal is stopped for a signal that's currently set to
"no-stop". If it is, then we don't set that thread as the currently
selected thread.

But that only happens in the part of the algorithm that's handling the
case where the previously selected thread has no stop reason. Since we
want to keep on a thread as long as it is doing something interesting,
we always prefer the current thread. That's almost right, but we
forgot to check whether the previously selected thread stopped with an
eStopReasonSignal for a "no-stop" signal. If it did, then we shouldn't
select it.

This patch adds that check. I can't figure out a good way to test
this. This is the sort of thing that Ismail's scripted process plugin
will make easy once it is a real boy. But figuring out how to do this
in a real process is not trivial.

Differential Revision: https://reviews.llvm.org/D106712

Fix "break delete --disabled" with no arguments.

The code that figured out which breakpoints to delete was supposed
to set the result status if it found breakpoints, and then the code
that actually deleted them checked that the result's status was set.

The code for "break delete --disabled" failed to set the status if
no "protected" breakpoints were provided. This was a confusing way
to implement this, so I reworked it with early returns so it was less
error prone, and added a test case for the no arguments case.

Differential Revision: https://reviews.llvm.org/D106623

[libc++] Disable incomplete library features.

Adds a new CMake option to disable the usage of incomplete headers.
These incomplete headers are not guaranteed to be ABI stable. This
option is intended to be used by vendors so they can avoid their users
from code that's not ready for production usage.

The option is enabled by default.

Differential Revision: https://reviews.llvm.org/D106763

[mlir][bzl] Fix typo

[libclang] Check LLVM_HAVE_LINK_VERSION_SCRIPT

There are some platform that might not have version script support,
don't try to use version script on those.

Reviewed By: MaskRay, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D106914

[flang] Disallow BOZ literal constants as output list items

According to C7109, "A boz-literal-constant shall appear only as a
data-stmt-constant in a DATA statement, or where explicitly allowed in
16.9 as an actual argument of an intrinsic procedure." This change
enforces that constraint for output list items.

I also added a general interface to determine if an expression is a BOZ
literal constant and changed all of the places I could find where it
could be used.

I also added a test.

This change stemmed from the following issue --
https://gitlab-master.nvidia.com/fortran/f18-stage/issues/108

Differential Revision: https://reviews.llvm.org/D106893

AMDGPU/GlobalISel: Fix selecting G_SEXTLOAD/G_ZEXTLOAD pre-gfx9

The patterns for the m0 glue patterns were failing to import.

AMDGPU/GlobalISel: Fix wrong addrspace in test MMOs

AMDGPU/GlobalISel: Add a few tests for unaligned truncating stores

[hwasan] Fix stack safety test for old PM.

With the old PM, the stub for __hwasan_generate_tag is still generated
in the IR, but never called.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D106858

[z/OS] Make MinGlobalAlign consistent with SystemZ

Remove overriding MinGlobalAlign to 0 for z/OS target to be consistent with SystemZ.

Reviewed By: abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D106890