review.tizen.org Git - platform/upstream/llvm.git/log

[AArch64][GlobalISel] More widenToNextPow2 changes, this time for arithmetic/bitwise ops.

NFC: Change quotes from Unicode to ASCII

This was causing some problems for Python scripts that we have.

Context: https://reviews.llvm.org/D106792

[lldb][AArch64] Annotate synchronous tag faults

In the latest Linux kernels synchronous tag faults
include the tag bits in their address.
This change adds logical and allocation tags to the
description of synchronous tag faults.
(asynchronous faults have no address)

Process 1626 stopped
* thread #1, name = 'a.out', stop reason = signal SIGSEGV: sync tag check fault (fault address: 0x900fffff7ff9010 logical tag: 0x9 allocation tag: 0x0)

This extends the existing description and will
show as much as it can on the rare occasion something
fails.

This change supports AArch64 MTE only but other
architectures could be added by extending the
switch at the start of AnnotateSyncTagCheckFault.
The rest of the function is generic code.

Tests have been added for synchronous and asynchronous
MTE faults.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D105178

[AMDGPU][GlobalISel] Insert an and with exec before s_cbranch_vccnz if necessary

While v_cmp will AND inactive lanes with 0, that is not the case for logical
operations.

This fixes a Vulkan CTS test that would hang otherwise.

Differential Revision: https://reviews.llvm.org/D105709

[mlir] Put back virtual ~ConversionTarget(), some users started relying on it

[mlir] Remove the default isDynamicallyLegal hook

This is redundant with the callback variant and untested. Also remove
the callback-less methods for adding a dynamically legal op, as they
are no longer useful.

Differential Revision: https://reviews.llvm.org/D106786

Fix FindZ3.cmake to support static libraries and Windows

Use absolute path to link z3 to allow builds both on windows and linux
since the library name is platform dependent for Z3 (libz3 on Windows
and z3 on Linux) and MSVC does not recognized -L and -l options.
Fix CMAKE_CROSSCOMPILING that does not work correctly since it uses
Z3_BUILD_VERSION instead of Z3_BUILD_NUMBER
Fix building with the static version of z3 library (supersedes D80227).

- Build the Z3 version detection code as C++, since the static
library brings in libstdc++ symbols
- Detect threading support and link against threading, in the
(likely) case Z3 was built with threads

Exposed compilation error from building a program that is used to detect
z3 version in the warning message, to simplify troubleshooting.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D106131

[LoopFlatten] Fix missed LoopFlatten opportunity

When the trip count of the inner loop is a constant, the InstCombine
pass now causes the transformation e.g. imcp ult i32 %inc, tripcount ->
icmp ult %j, tripcount-step (where %j is the inner loop induction
variable and %inc is add %j, step), which is now accounted for when
identifying the trip count of the loop. This is also an acceptable use
of %j (provided the step is 1) so is ignored as long as the compare
that it's used in is also the condition of the inner branch.

Differential Revision: https://reviews.llvm.org/D105802

[RISCV] Optimize floating-point "dominant value" BUILD_VECTORs

This patch aims to improve the performance of BUILD_VECTORs which are
identified as containing a dominant element. Given that most
floating-point constants themselves require a load from the constant
pool, it was possible for the optimization to actually increase the
number of individual loads on small vectors. The exception is the zero
constant -- +0.0 -- which can be materialized efficiently.

While this optimization could do with a proper cost model to weigh the
benfits of a single vector load vs. the manipulation of individual
elements -- even for integer vectors which often require several
instructions to materialize -- without a concrete RVV implementation to
work with any heuristic is likely to be both more obtuse and inaccurate.

Until then, this patch fixes at least one known obvious deficiency.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106963

[RISCV] Add test case showing suboptimal BUILD_VECTOR lowering

The second test case added here was pointed out to me by @craig.topper
and shows how we "optimize" a two-element BUILD_VECTOR from being one
load from the constant pool to two loads from the constant pool.

The first test case shows that since materialization for the
floating-point +0.0 value is cheap and doesn't involve a load, the
optimization is more clearly beneficial here.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106962

test-release.sh: Kill python2

Don't prefer python2's virtualenv when setting up the test-suite.
Always use python3 instead, since that's what we support everywhere else
anyway.

Differential Revision: https://reviews.llvm.org/D106941

[clang-format] Fix aligning with linebreaks #2

This amends c5243c63cda3c740d6e9c7e501f6518c21688da3 to fix formatting
continued function calls with BinPacking = false.

Differential Revision: https://reviews.llvm.org/D106773

[libcxx][doc] Update the build documentation.

These are the hunks of
D106770 [libc++][doc] Update the release notes
that are relevant for main.

[NFC][InstSimplify] Use more intuitive variable names.

tsan: switch from SSE3 to SSE4.2

Switch x86_64 requirement for optimized code from SSE3 to SSE4.2.
The new tsan runtime will need few instructions that are only
supported by SSE4:

_mm_max_epu32
_mm_extract_epi8
_mm_insert_epi32

SSE3 was introcued in 2004 and SSE4 in 2006:
https://en.wikipedia.org/wiki/SSE3
https://en.wikipedia.org/wiki/SSE4

We are still providing non-optimized C++ version of the code,
so either way it's possible to build tsan runtime for any CPU.

But for Go this will bump strict requirement for -race because
Go contains prebuilt versions and these will be built with -msse4.2.
But requiring a CPU produced at least in 2006 looks reasonable for
a debugging tool (more reasonable than disabling optimizations
for everybody).

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D106948

tsan: remove /**/ at the of multi-line macros

Prefer code readability over writeability.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D106982

tsan: fix java_symbolization test

We reliably remove bottom libc-guts frames only on linux/glibc.
Some bots failed on this test showing other bottom frames:

.annobin_libc_start.c libc-start.c (libc.so.6+0x249f4)
generic_start_main.isra.0 libc-start.c (libc.so.6+0x45b0c)

We can't reliably remove all of possible bottom frames.
So remove the assertion for that.

Differential Revision: https://reviews.llvm.org/D107037

[libc++] Remove excess whitespace in synopsis comment. NFCI.

[NFC][X86] add missing tests in clang/test/CodeGen/attr-target-mv.c

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D106849

Implement recursive support into OperationEquivalence::isEquivalentTo()

This allows to use OperationEquivalence to track structural comparison for equality
between two operations.

Differential Revision: https://reviews.llvm.org/D106422

Add `all_of_zip` to STLExtras

This takes two ranges and invokes a predicate on the element-wise pair in the
ranges. It returns true if all the pairs are matching the predicate and the ranges
have the same size.
It is useful with containers that aren't random iterator where we can't check the
sizes in O(1).

Differential Revision: https://reviews.llvm.org/D106605

[test] Fix tools/gold/X86/comdat-nodeduplicate.ll on non-X86 hosts

When running this test on an aarch64 machine, it fails:

```
/usr/bin/ld.gold: error: .../test/tools/gold/X86/Output/comdat-nodeduplicate.ll.tmp/ab.lto.o: incompatible target
```

Specify the elf_x86_64 emulation as all of the other gold plugin tests
do.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D107020

[libc][NFC] Add noreturn and constexpr qualifiers where appropriate

These functions make it clear to the compiler and user what the intended
behavior is so llvm can make them go as fast as possible.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D106807

libcang: Add missing function to libclang.map

This function is marked with CINDEX_LINKAGE, but was never added to the
export list / linker script.

Reviewed By: jrtc27

Differential Revision: https://reviews.llvm.org/D106974

[Preprocessor] -E -P: Ensure newline after 8 skipped lines.

The implementation of -fminimize-whitespace (D104601) revised the logic
when to emit newlines. There was no case to handle when more than
8 lines were skippped in -P (DisableLineMarkers) mode and instead fell
through the case intended for -fminimize-whitespace, i.e. emit nothing.
This patch will emit one newline in this case.

The newline logic is slightly reorganized. The `-P -fminimize-whitespace`
case is handled explicitly and emitting at least one newline is the new
fallback case. The choice between emitting a line marker or up to
7 empty lines is now a choice only with enabled line markers. The up to
8 newlines likely are fewer characters than a line directive, but
in -P mode this had the paradoxic effect that it would print up to
7 empty lines, but none at all if more than 8 lines had to be skipped.
Now with DisableLineMarkers, we don't consider printing empty lines
(just start a new line) which matches gcc's behavior.

The line-directive-output-mincol.c test is replaced with a more
comprehensive test skip-empty-lines.c also testing the more than
8 skipped lines behaviour with all flag combinations.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D106924

Update file names and extensions for MLIR Python execution engine changes.

NFC: Add missing import to integration test.

[gn build] Port 61c35fb0c2c9

[libc++][modularisation] Split <compare> into internal headers.

Differential Revision: https://reviews.llvm.org/D106107

[libc++] Remove unused variables in generate_private_header_tests.py. NFCI.

[libc++] money_get::do_get() set failbit and eofbit if iterator begin equals end

Summary:
Currently, if we pass in the same iterator for begin and end,
the long double version of do_get would throw a runtime error.

However, according to standard (https://eel.is/c++draft/locale.money.get#virtuals-1),
we should set the failbit and eofbit when no more characters are available.

Differential Revision: https://reviews.llvm.org/D100510

[MBP] findBestLoopTopHelper should exit if OldTop is not a chain header

Function findBestLoopTopHelper tries to find a new loop top block which can also
fall through to OldTop, but it's impossible if OldTop is not a chain header, so
it should exit immediately.

Differential Revision: https://reviews.llvm.org/D106329

[RISCV] Optimize mul in the zba extension with SH*ADD

This patch makes the following optimization, if the
immediate multiplier is not a simm12.

(mul x, (power_of_2 + 2)) => (SH1ADD x, (SLLI x, bits))
(mul x, (power_of_2 + 4)) => (SH2ADD x, (SLLI x, bits))
(mul x, (power_of_2 + 8)) => (SH3ADD x, (SLLI x, bits))

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106648

[RISCV][test] Add new tests for mul optimization in the zba extension with SH*ADD

These test will show the following optimization by future patches.

(mul x, (power_of_2 + 2)) => (SH1ADD x, (SLLI x, bits))
(mul x, (power_of_2 + 4)) => (SH2ADD x, (SLLI x, bits))
(mul x, (power_of_2 + 8)) => (SH3ADD x, (SLLI x, bits))

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106647

[libc++] Implement the resolutions of LWG3506 and LWG3522.

Implement the changes in all language modes.

LWG3506 "Missing allocator-extended constructors for priority_queue"
makes the following changes:
- New allocator-extended constructors for priority_queue.
- New deduction guides targeting those constructors.

LWG3522: "Missing requirement on InputIterator template parameter
for priority_queue constructors". The iterator parameter should be
constrained to actually be an iterator type. `priority_queue{1,2}`
should be SFINAE-friendly ill-formed.

Also, do a drive-by fix in the allocator-extended move constructor:
there's no need to do a `make_heap` after moving from `__q.c` into
our own `c`, because that container was already heapified when it
was part of `__q`. [priqueue.cons.alloc] actually specifies the
behavior and does *not* mention calling `make_heap`. I think this
was just a copy-paste thinko. It dates back to the initial import
of libc++.

Differential Revision: https://reviews.llvm.org/D106824
Differential Revision: https://reviews.llvm.org/D106827

[ThinLTO] Disallow importing for functions with indir branch to block address

We don't allowing inlining for functions with blockaddress with uses other than strictly callbr. This is because if the blockaddress escapes the function via a global variable, inlining may lead to an invalid cross-function reference.

We check against such cases during inlining, however the check can fail for ThinLTO post-link because CFG simplification can incorrectly removes blocks based on wrong block reachability.

When we import a function with blockaddress taken in a global variable but without importing that variable, we won't go through value mapping to reflect the real address-taken-ness of the cloned blocks. For the imported clone, this leads to blocks reachable from indirect branch through global variable being incorrectly treated as unreachable and removed by SimplifyCFG.

Since inlining for such cases shouldn't be allowed in the first place, I'm marking them as ineligible for importing during pre-link to save the problem of missing address-taken-ness of imported clone as well as bad DCE and inlining.

Differential Revision: https://reviews.llvm.org/D106930

[asan][fuchsia] Implement PlatformUnpoisonStacks

This CL modifies the PlatformUnpoisonStacks so that fuchsia can
implement its own logic for unpoisoning the stacks.

For the general case, the behavior is the same as with regular asan: it
will unpoison everything from the current stack pointer until the base
of the stack (stack top).

In some situations, the current stack might not be the same as the
default stack. In those cases, the code will now unpoison the entire
default stack, and will also unpoison the current page of the stack.

Reviewed By: mcgrathr

Differential Revision: https://reviews.llvm.org/D106835

[libFuzzer] Fix CFI Directives for fuchsia

This commit fixes the CFI directives in the crash trampoline so
libunwind can get a backtrace during a crash.

In order to get a backtrace from a libfuzzer crash in fuchsia, we
resume execution in the crashed thread, forcing it to call the
StaticCrashHandler. We do this by setting a "crash trampoline" that has
all the necessary cfi directives for an unwinder to get full backtrace
for that thread.

Due to a bug in libunwind, it was not possible to restore the RSP
pointer, as it was always set to the call frame address (CFA). The
previous version worked around this issue by setting the CFA to the
value of the stack pointer at the point of the crash.

The bug in libunwind is now fixed[0], so I am correcting the CFI
annotations so that the CFA correctly points to the beginning of the
trampoline's call frame.

[0]: https://reviews.llvm.org/D106626

Reviewed By: mcgrathr

Differential Revision: https://reviews.llvm.org/D106725

[llvm-objcopy][MachO] Ignore all LC_SUB_* commands.

The LC_SUB_FRAMEWORK, LC_SUB_UMBRELLA, LC_SUB_CLIENT, and LC_SUB_LIBRARY
are used to indicate related libraries, binaries or framework names.
Their only payload is the string with the name of the object. Adding
those commands to the list of ignored/skipped load commands will avoid
an error that stop the process of copying/stripping and will copy their
contents verbatim.

Additionally, in order to have a test for this case, `yaml2obj` now
allows those four commands to contain a `Content`.

Differential Revision: https://reviews.llvm.org/D106412

[AArch64][GlobalISel] Improve legalization for odd-type G_LOAD

Swap the order of widening so that we widen to the next power-of-2 first when
legalizing G_LOAD.

Also, provide a minimum type for the power of 2 to disallow s2 + s1. Clamping
ought to disallow s2 and s1, but I think it's better to be explicit about the
expected minimum size.

We probably need a similar change for G_STORE, but it seems to be a bit more
finnicky. So, let's just handle G_LOAD for now.

Differential Revision: https://reviews.llvm.org/D107013

Break apart the MLIR ExecutionEngine from core python module.

* For python projects that don't need JIT/ExecutionEngine, cuts the number of files to compile roughly in half (with similar reduction in end binary size).

Differential Revision: https://reviews.llvm.org/D106992

Emit strong definition for TypeID storage in Op/Type/Attributes definition

By making an explicit template specialization for the TypeID provided by these classes,
the compiler will not emit an inline weak definition and rely on the linker to unique it.
Instead a single definition will be emitted in the C++ file alongside the implementation
for these classes. That will turn into a linker error what is now a hard-to-debug runtime
behavior where instances of the same class may be using a different TypeID inside of
different DSOs.

Recommit 660a56956c32b0bcd850fc12fa8ad0225a6bb880 after fixing gcc5
build.

Differential Revision: https://reviews.llvm.org/D105903

NFC: Adapt operation.py to builtin operation print format changes.

[tsan] Fix Darwin build after D106973

Revert "[tsan] Fix Darwin build after D106973"

It was invalid fix.

This reverts commit 6a0fe68844150f16e16fe64d050509e4ba740d98.

[tsan] Fix Darwin build after D106973

[Attributor] Don't test internalization in the CGSCC pass.

Summary:
Enabling internalization in the Attributor's CGSCC pass does something
different that we don't expect. Ignore this for now to pass the tests.

[gn build] Port 0f4b41e03853

[Attributor] Change function internalization to not replace uses in internalized callers

The current implementation of function internalization creats a copy of each
function and replaces every use. This has the downside that the external
versions of the functions will call into the internalized versions of the
functions. This prevents them from being fully independent of eachother. This
patch replaces the current internalization scheme with a method that creates
all the copies of the functions intended to be internalized first and then
replaces the uses as long as their caller is not already internalized.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106931

[lld-macho] Downgrade "cannot export hidden symbol" to warning

This matches ld64's behavior, and makes it easier to fit LLD
into existing build systems.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D107011

[gn build] Manually port dbed061b

[Bazel] Fix digest for bazel-skylib 1.0.3

I apparently left in the old digest when updating the version, so for my
local build Bazel just happily used the cached version, but anyone
attempting a fresth build would get a mismatch.

Differential Revision: https://reviews.llvm.org/D107010

[AArch64][GlobalISel] Improve legalization for odd-sized G_ICMP/G_CONSTANT

We were handing types like s88 like

1) clamp to the range
2) widen to the next power of 2

This isn't desirable because it causes an odd breakdown for types like s88.
If we widen to the next power of 2 (s128) first, then we get a clean breakdown
when we clamp back to s64.

Differential Revision: https://reviews.llvm.org/D106998

[NFC][Codegen][X86] Autogenerate check lines in avx.ll test

[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR

Reapply commit d675b594f4f1e1f6a195fb9a4fd02cf3de92292d that was
reverted due to buildbot failures. A simple fix has been applied to
remove an assertion.

Differential Revision: https://reviews.llvm.org/D105207

[gn build] Add support for Win/x86 compiler-rt

This allows us to build the x86 profile runtime.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D106972

[profile] Add underscore to /alternatename for Win/x86

/alternatename should use the mangled name. On x86 we need an extra
underscore.

Copied from sanitizer_win_defs.h

Fixes https://crbug.com/1233589.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D107000

[libc] add strncmp to strings

Add strncmp as a function to strings.h. Also adds unit tests, and adds
strncmp as an entrypoint for all current platforms.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D106901

[clang] fix concepts crash on substitution failure during normalization

When substitution failed on the first constrained template argument (but
only the first), we would assert / crash. Checking for failure was only
being performed from the second constraint on.

This changes it so the checking is performed in that case,
and the code is also now simplified a little bit to hopefully
avoid this confusion.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D106907

[clang] NFC: refactor multiple implementations of getDecltypeForParenthesizedExpr

This cleanup patch refactors a bunch of functional duplicates of
getDecltypeForParenthesizedExpr into a common implementation.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Reviewed By: aaronpuchert

Differential Revision: https://reviews.llvm.org/D100713

Revert "Emit strong definition for TypeID storage in Op/Type/Attributes definition"

This reverts commit 660a56956c32b0bcd850fc12fa8ad0225a6bb880.

This broke the GCC5 build

[mlir] Set the namespace of the BuiltinDialect to 'builtin'

Historically the builtin dialect has had an empty namespace. This has unfortunately created a very awkward situation, where many utilities either have to special case the empty namespace, or just don't work at all right now. This revision adds a namespace to the builtin dialect, and starts to cleanup some of the utilities to no longer handle empty namespaces. For now, the assembly form of builtin operations does not require the `builtin.` prefix. (This should likely be re-evaluated though)

Differential Revision: https://reviews.llvm.org/D105149

[trace] Introduce Hierarchical Trace Representation (HTR) and add  command for Intel PT trace visualization

This diff introduces Hierarchical Trace Representation (HTR) and creates the `thread trace export ctf  -f <filename> -t <thread_id>` command to export an Intel PT trace's HTR to Chrome Trace Format (CTF) for visualization.

See `lldb/docs/htr.rst` for context/documentation on HTR.

**Overview of Changes**
    - Add HTR documentation (see `lldb/docs/htr.rst`)
    - Add HTR structures (layer, block, block metadata)
    - Implement "Basic Super Block" HTR pass
    - Add 'thread trace export ctf' command to export the HTR of an Intel PT
      trace to Chrome Trace Format (CTF)

As this diff is the first iteration of HTR and trace visualization, future diffs will build on this work by generalizing the internal design of HTR and implementing new HTR passes that provide better trace summarization/visualization.

See attached video for an example of Intel PT trace visualization:
{F17851042}

Original Author: jj10306

Submitted by: wallace

Reviewed By: wallace, clayborg

Differential Revision: https://reviews.llvm.org/D105741

[clang] Evaluate strlen of strcpy argument for -Wfortify-source.

Also introduce Expr::tryEvaluateStrLen.

Differential Revision: https://reviews.llvm.org/D104887

[libc++] Add UNSUPPORTED for clang-14 since the underlying bug hasn't been fixed yet

This started breaking in the CI because we bumped the Clang version to 14,
which requires adjusting the markup in the test suite. I think it's actually
nice the we need to do that and that it doesn't happen automatically, since
it serves as a reminder that this is broken in Clang.

Revert "[trace] Introduce Hierarchical Trace Representation (HTR) and add command for Intel PT trace visualization"

This reverts commit aad17c55a8116cd3831d4392d909139702019d65.

Breaks LLDB build on 32 bit Arm/Linux bot:
https://lab.llvm.org/buildbot/#/builders/17/builds/9497

Differential Revision: https://reviews.llvm.org/D105741

[Bazel] Added missing targets to LLVM bazel rules.

Added the following targets to the LLVM Bazel overlay:

AVR
Mips
MPS430
SystemZ
XCore

Reviewed By: GMNGeoffrey

Differential Revision: https://reviews.llvm.org/D106921

[Bazel] Update for dbed061bf1

This adds Bazel configuration for the TargetMCA targets, which currently
only includes AMDGPU.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D106996

[ELF][test] Convert --start-address= and --stop-address= values to hexadecimal

so that readers can connect them with the hexadecimal addresses in the output.

[ELF][test] Delete unneeded --triple=thumb* from llvm-objdump RUN lines

Optionally eliminate blocking runtime.await calls by converting functions to coroutines.

Interop parallelism requires needs awaiting on results. Blocking awaits are bad for performance. TFRT supports lightweight resumption on threads, and coroutines are an abstraction than can be used to lower the kernels onto TFRT threads.

Reviewed By: ezhulenev

Differential Revision: https://reviews.llvm.org/D106508

[libcxx][ranges] Add ranges::take_view.

Differential Revision: https://reviews.llvm.org/D106507

[DebugInfo][docs] Fix DISubprogram fields

D45024 renamed the field in `DISubprogram` from `variables:` to
`retainedNodes:`. Some of the docs were updated in D89082 but this
updates the rest.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D106926

COFF/ELF: Place llvm.global_ctors elements in llvm.used if comdat is used

On ELF, an SHT_INIT_ARRAY outside a section group is a GC root. The current
codegen abuses SHT_INIT_ARRAY in a section group to mean a GC root.

On PE/COFF, the dynamic initialization for `__declspec(selectany)` in a comdat
can be garbage collected by `-opt:ref`.

Call `addUsedGlobal` for the two cases to fix the abuse/bug.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D106925

[ARM] Fix llvm-objdump disassembly of armv7m object files.

Apparently, the features were getting mixed up, so we'd try to
disassemble in ARM mode. Fix sub-architecture detection to compute the
correct triple if we're detecting it automatically, so the user doesn't
need to pass --triple=thumb etc.

It's possible we should be somehow tying the "+thumb-mode" target
feature more directly to Tag_CPU_arch_profile? But this seems to work
reasonably well, anyway.

While I'm here, fix up the other llvm-objdump tests that were explicitly
specifying an ARM triple; that shouldn't be necessary.

Differential Revision: https://reviews.llvm.org/D106912

Emit strong definition for TypeID storage in Op/Type/Attributes definition

By making an explicit template specialization for the TypeID provided by these classes,
the compiler will not emit an inline weak definition and rely on the linker to unique it.
Instead a single definition will be emitted in the C++ file alongside the implementation
for these classes. That will turn into a linker error what is now a hard-to-debug runtime
behavior where instances of the same class may be using a different TypeID inside of
different DSOs.

Differential Revision: https://reviews.llvm.org/D105903

Add some missing CMake dependencies between MLIR dialects (NFC)

tsan: don't print __tsan_atomic* functions in report stacks

Currently __tsan_atomic* functions do FuncEntry/Exit using caller PC
and then use current PC (pointing to __tsan_atomic* itself) during
memory access handling. As the result the top function in reports
involving atomics is __tsan_atomic* and the next frame points to user code.

Remove FuncEntry/Exit in atomic functions and use caller PC
during memory access handling. This removes __tsan_atomic*
from the top of report stacks, so that they point right to user code.

The motivation for this is performance.
Some atomic operations are very hot (mostly loads),
so removing FuncEntry/Exit is beneficial.
This also reduces thread trace consumption (1 event instead of 3).

__tsan_atomic* at the top of the stack is not necessary
and does not add any new information. We already say
"atomic write of size 4", "__tsan_atomic32_store" does not add
anything new.

It also makes reports consistent between atomic and non-atomic
accesses. For normal accesses we say "previous write" and point
to user code; for atomics we say "previous atomic write" and now
also point to user code.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D106966

sanitizer_common: avoid compiler-interted memset in deadlock detector

Compilers tends to insert memset/memcpy for some struct/array operations,
and these don't play well inside of sanitizer runtimes.
Avoiding these calls was the intention behind internal_memset.
Remove the leftover ={} that can result in memset call.

Reviewed By: vitalybuka, pgousseau

Differential Revision: https://reviews.llvm.org/D106978

tsan: strip __libc_start_main frame

We strip all frames below main but in some cases it may be not enough.
Namely, when main is instrumented but does not call any other instrumented code.
In this case __tsan_func_entry in main obtains PC pointing to __libc_start_main
(as we pass caller PC to __tsan_func_entry), but nothing obtains PC pointing
to main itself (as main does not call any instrumented code).
In such case we will not have main in the stack, and stripping everything
below main won't work.
So strip __libc_start_main explicitly as well.
But keep stripping of main because __libc_start_main is glibc/linux-specific,
so looking for main is more reliable (and usually main is present in stacks).

Depends on D106957.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D106958

tsan: don't use caller/current PC in Java interfaces

Caller PC is plain harmful as native caller PC has nothing to do with Java code.
Current PC is not particularly useful and may be somewhat confusing for Java users
as it makes top frame to point to some magical __tsan function.
But obtaining and using these PCs adds runtime cost for every java event.
Remove these PCs. Rely only on official Java frames.
It makes execution faster, code simpler and reports better.

Depends on D106956.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D106957

tsan: print alloc stack for Java objects

We maintain information about Java allocations,
but for some reason never printed it in reports.
Print it.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D106956

[MCA] Moving the target specific CustomBehaviour impl. from /tools/llvm-mca/ to /lib/Target/.

Differential Revision: https://reviews.llvm.org/D106775

tsan: add more micro benchmarks

1. Add a set of micro benchmarks for memory accesses,
   mem* functions and unaligned accesses.
2. Add support for multiple benchmarks in a single binary
   (or it would require 12 new benchmark binaries).
3. Remove the "clock growth" machinery,
   it affects the current tsan runtime by increasing size of
   all vector clocks, but this won't be relevant for the new
   tsan runtime.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D106961

tsan: remove mblock types

We used to count number of allocations/bytes based on the type
and maybe record them in heap block headers.
But that's all in the past, now it's not used for anything.
Remove the mblock type.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D106971

tsan: remove unused pc arguments

Remove pc argument of ThreadIgnoreEnd, ThreadIgnoreSyncEnd
and AcquireGlobal functions. It's unused and in some places
we don't even have a pc and pass 0 anyway.
Don't confuse readers and don't pretend that pc is needed
and that passing 0 is somehow deficient.

Use simpler convention for ThreadIgnoreBegin and ThreadIgnoreSyncBegin:
accept only pc instread of pc+save_stack. 0 pc means "don't save stack".

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D106973

[trace] Introduce Hierarchical Trace Representation (HTR) and add  command for Intel PT trace visualization

This diff introduces Hierarchical Trace Representation (HTR) and creates the `thread trace export ctf  -f <filename> -t <thread_id>` command to export an Intel PT trace's HTR to Chrome Trace Format (CTF) for visualization.

See `lldb/docs/htr.rst` for context/documentation on HTR.

**Overview of Changes**
    - Add HTR documentation (see `lldb/docs/htr.rst`)
    - Add HTR structures (layer, block, block metadata)
    - Implement "Basic Super Block" HTR pass
    - Add 'thread trace export ctf' command to export the HTR of an Intel PT
      trace to Chrome Trace Format (CTF)

As this diff is the first iteration of HTR and trace visualization, future diffs will build on this work by generalizing the internal design of HTR and implementing new HTR passes that provide better trace summarization/visualization.

See attached video for an example of Intel PT trace visualization:
{F17851042}

Original Author: jj10306

Submitted by: wallace

Reviewed By: wallace, clayborg

Differential Revision: https://reviews.llvm.org/D105741

[LoopFlatten] Fix bug where SCEVCouldNotCompute object is used

The SCEV method getBackedgeTakenCount() returns a SCEVCouldNotCompute
object if the backedge-taken count is unpredictable. This fix ensures
there is no longer an attempt to use such an object to find the trip
count.

Patch by: Rosie Sumpter.

Differential Revision: https://reviews.llvm.org/D106970

[PredicateInfo] Use Intrinsic::getDeclaration now that it handles unnamed types.

This is a second attempt to fix the EXPENSIVE_CHECKS issue that was mentioned In D91661#2875179 by @jroelofs.

(The first attempt was in D105983)

D91661 more or less completely reverted D49126 and by doing so also removed the cleanup logic of the created declarations and calls.
This patch is a replacement for D91661 (which must itself be reverted first). It replaces the custom declaration creation with the
generic version and shows the test impact. It also tracks the number of NamedValues to detect if a new prototype was added instead
of looking at the available users of a prototype.

Reviewed By: jroelofs

Differential Revision: https://reviews.llvm.org/D106147

Revert "Revert of D49126 [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types."

This reverts commit 77080a1eb6061df2dcfae8ac84b85ad4d1e02031.

This change introduced issues detected with EXPENSIVE_CHECKS. Reverting to restore the
needed function cleanup. A next patch will then just improve on the name mangling.

Simplify allowing pragma float_control in a linkage specification

This amends b0ef3d8f666fa6008abb09340b73d9340d442569 based on a suggestion from James Y Knight.

[mlir][sparse] use proper type alias for filename ptr

Reviewed By: gussmith23

Differential Revision: https://reviews.llvm.org/D106904

[NFC] Test commit to verify commit access

[llvm] Replace LLVM_ATTRIBUTE_NORETURN with C++11 [[noreturn]]

[[noreturn]] can be used since Oct 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.

Note: the definition of LLVM_ATTRIBUTE_NORETURN is kept for now.

[mlir] run the verifier before translating a module

In translation from MLIR to another IR, run the MLIR verifier on the parsed
module to ensure only valid modules are given to the translation. Previously,
we would send any module that could be parsed to the translation, including
semantically invalid modules, leading to surprising errors or lack thereof down
the pipeline.

Depends On D106937

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D106938

[mlir] harden result type verification in llvm.call

The verifier of the llvm.call operation was not checking for mismatches between
the number of operation results and the number of results in the signature of
the callee. Furthermore, it was possible to construct an llvm.call operation
producing an SSA value of !llvm.void type, which should not exist. Add the
verification and treat !llvm.void result type as absence of call results.
Update the GPU conversions to LLVM that were mistakenly assuming that it was
fine for llvm.call to produce values of !llvm.void type and ensure these calls
do not produce results.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D106937

[NFC][PowerPC] Fix spe.ll to work with update_llc_test_checks.py again

Using split-file does not work with update_llc_test_checks.py. It's also
mostly redundant, as the single and double tests can just use a single
llc and FileCheck invocation for each FPU type using -check-prefixes
rather than -check-prefix, and update_llc_test_checks.py will merge what
it can. Only test_dasmconst needs to be SPE-only and so is pulled out
into its own mall file (rather than using sed to preprocess the file and
keep it commented out for EFPU2, which would work, but is ugly).

As well as cutting down on the number of RUN lines, this also results in
test_fma's CHECK lines being merged for both FPUs.

Reviewed By: kiausch

Differential Revision: https://reviews.llvm.org/D106969

Revert "[lldb] Temporarily bump the max length of the pexpect error message to diagnose an lldb-aarch64 test failure"

This reverts commit 5db8e232126fc4c0f5d5b0339bdc5a49830268d1. The test has
been disabled since then on the bot and we got the logs we wanted.

[RISCV] Fix grammar in a comment. NFC

[RISCV] Restrict performANY_EXTENDCombine to prevent an infinite loop.

The sign_extend we insert here can get turned into a zero_extend if
the sign bit is known zero. This can enable a setcc combine that
shrinks compares with zero_extend. This reduces the use count of
the zero_extend allowing other combines to turn it back into an
any_extend.

This restricts the combine to only cases where the result is used
by a CopyToReg. This works for my original motivating case. I
hope the CopyToReg use will prevent any converted extends from
turning back into an any_extend.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106754