review.tizen.org Git - platform/upstream/llvm.git/log

Recommit: [lldb] Remove "dwarf dynamic register size expressions" from RegisterInfo

The previous version of the patch did not update the definitions in
conditionally compiled code. This patch includes changes to ARC and
windows targets.

Original commit message was:

These were added to support some mips registers on linux, but linux mips
support has now been removed due.

They are still referenced in the freebds mips implementation, but the
completeness of that implementation is also unknown. All other
architectures just set these fields to zero, which is a cause of
significant bloat in our register info definitions.

Arm also has registers with variable sizes, but they were implemented in
a more gdb-compatible fashion and don't use this feature.

Differential Revision: https://reviews.llvm.org/D110914

[TwoAddressInstruction] Fix ReplacedAllUntiedUses in processTiedPairs

Fix the calculation of ReplacedAllUntiedUses when any of the tied defs
are early-clobber. The effect of this is to fix the placement of kill
flags on an instruction like this (from @f2 in
test/CodeGen/SystemZ/asm-18.ll):

  INLINEASM &"stepb $1, $2" [attdialect], $0:[regdef-ec:GRH32Bit], def early-clobber %3:grh32bit, $1:[reguse tiedto:$0], killed %4:grh32bit(tied-def 3), $2:[reguse:GRH32Bit], %4:grh32bit

After TwoAddressInstruction without this patch:

  %3:grh32bit = COPY killed %4:grh32bit
  INLINEASM &"stepb $1, $2" [attdialect], $0:[regdef-ec:GRH32Bit], def early-clobber %3:grh32bit, $1:[reguse tiedto:$0], %3:grh32bit(tied-def 3), $2:[reguse:GRH32Bit], %4:grh32bit

Note that the COPY kills %4, even though there is a later use of %4 in
the INLINEASM. This fails machine verification if you force it to run
after TwoAddressInstruction (currently it is disabled for other
reasons).

After TwoAddressInstruction with this patch:

  %3:grh32bit = COPY %4:grh32bit
  INLINEASM &"stepb $1, $2" [attdialect], $0:[regdef-ec:GRH32Bit], def early-clobber %3:grh32bit, $1:[reguse tiedto:$0], %3:grh32bit(tied-def 3), $2:[reguse:GRH32Bit], %4:grh32bit

Differential Revision: https://reviews.llvm.org/D110848

[TwoAddressInstruction] Pre-commit a test case for D110848

[mlir] Don't allow dynamic extent tensor types for ConstShapeOp.

ConstShapeOp has a constant shape, so its type can always be static.
We still allow it to have ShapeType though.

Differential Revision: https://reviews.llvm.org/D111139

[AArch64][SME] Update DUP (predicate) instruction

Changes in architecture revision 00eac1:
  * Renamed to PSEL.
  * Copies whole source register.
  * Element type suffix removed from destination.
  * Element index no longer optional and '#' prefix has been removed.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-09

Depends on D111212.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D111213

[AArch64][SME] Update tile slice index offset

Changes in architecture revision 00eac1:
  * Tile slice index offset no longer prefixed with '#'.
  * The syntax for 128-bit (.Q) ZA tile slice accesses must now include
    an explicit zero index.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-09

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D111212

[VPlan] Replace hard-coded VPValue ids with patterns in tests.

This makes the tests a bit more robust with respect to small changes in
the value numbering.

[libcxx][pretty printers] Correct locale for u16/u32 string tests

When the locale is not some UTF-8 these tests fail.
(different results for python2 linked gdbs vs. python3
but same issue)

Setting the locale just for the test works around this.
By default Ubuntu comes with just C.UTF-8. I've chosen
to use en_US.UTF-8 instead given that my Mac doesn't have
the former and there's a slim chance this test might run there.

This also enables the u16string tests which are now passing.

Reviewed By: #libc, ldionne, saugustine

Differential Revision: https://reviews.llvm.org/D111138

[libcxx][CI] Install all locales used by the test suite

Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D111235

[mlir][linalg] Add unsigned min/max/cast function to OpDSL.

Update OpDSL to support unsigned integers by adding unsigned min/max/cast signatures. Add tests in OpDSL and on the C++ side to verify the proper signed and unsigned operations are emitted.

The patch addresses an issue brought up in https://reviews.llvm.org/D111170.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D111230

[Clang][OpenMP] Fix windows buildbot failure for D105191

Fixes 4c4117089599cb5b6c6fa5635c28462ffd1bddf4.

[GlobalISel] Silence gcc warning about unused variable

[Clang][OpenMP] Add partial support for Static Device Libraries

An archive containing device code object files can be passed to
clang command line for linking. For each given offload target
it creates a device specific archives which is either passed to llvm-link
if the target is amdgpu, or to clang-nvlink-wrapper if the target is
nvptx. -L/-l flags are used to specify these fat archives on the command
line. E.g.
clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -L. -lmylib

It currently doesn't support linking an archive directly, like:
clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp libmylib.a

Linking with x86 offload also does not work.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D105191

[CFE][Codegen] Update auto-generated check lines for few GPU lit tests

which is essentially required as a pre-commit for https://reviews.llvm.org/D110257.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D110676

Don't register mem segments that aren't present in a corefile

A Mach-O corefile has an array of memory segments, representing
the memory of the process at the point of the capture. Each segment
has a virtual address + size, and a file offset + size. The file
size may be less than the virtual address size, indicating that
the memory was unavailable. When ProcessMachCore::DoLoadCore scans
this array of memory segments, it builds up a table to translate
virtual addresses to file offsets, for memory read requests.
This lookup table combines contiguous memory segments into a single
entry, to reduce the number of entries (some corefile writers will
emit a separate segement for each virtual meory page).

This contiguous check wasn't taking into account a segment that
isn't present in the corefile, e.g. filesize==0, and every contiguous
memory segment after that point would result in lldb reading the
wrong offset of the file because it didn't account for this.

I'd like to have an error message when someone tries to read memory from
one of these segments, instead of returning all zeroes, so this patch
intentionally leaves these out of the vmaddr -> fileoff table (and
avoids combining them with segments that actually do exist in the
corefile).

I'm a little unsure of writing a test for this one; I'd have to do
a yaml2obj of a corefile with the problem, or add an internal mode
to the Mach-O process save-core where it could write a filesize==0
segment while it was writing one.

rdar://83382487

Update TODO noting that DriverKit should be added too

When BridgeOS and DriverKit are added to llvm Triple.

[MLIR] Add OrOp folding rule for constant one operand

Add folding rule for std.or op when an operand has all bits set.

or(x, <all bits set>) -> <all bits set>

Differential Revision: https://reviews.llvm.org/D111206

[IR][NFC] Rename getBaseObject to getAliaseeObject

To better reflect the meaning of the now-disambiguated {GlobalValue,
GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction
(D109792), the function is renamed to getAliaseeObject.

[CUDA] remove unneeded includes from CUDA-related headers.

This should fix bot failures on PPC and windows.

[MLIR] fix arith dialect build failure

Missing function defs causes errors on some build configs.

DebugInfo: Use clang's preferred names for integer types

This reverts c7f16ab3e3f27d944db72908c9c1b1b7366f5515 / r109694 - which
suggested this was done to improve consistency with the gdb test suite.
Possible that at the time GCC did not canonicalize integer types, and so
matching types was important for cross-compiler validity, or that it was
only a case of over-constrained test cases that printed out/tested the
exact names of integer types.

In any case neither issue seems to exist today based on my limited
testing - both gdb and lldb canonicalize integer types (in a way that
happens to match Clang's preferred naming, incidentally) and so never
print the original text name produced in the DWARF by GCC or Clang.

This canonicalization appears to be in `integer_types_same_name_p` for
GDB and in `TypeSystemClang::GetBasicTypeEnumeration` for lldb.

(I tested this with one translation unit defining 3 variables - `long`,
`long (*)()`, and `int (*)()`, and another translation unit that had
main, and a function that took `long (*)()` as a parameter - then
compiled them with mismatched compilers (either GCC+Clang, or
Clang+(Clang with this patch applied)) and no matter the combination,
despite the debug info for one CU naming the type "long int" and the
other naming it "long", both debuggers printed out the name as "long"
and were able to correctly perform overload resolution and pass the
`long int (*)()` variable to the `long (*)()` function parameter)

Did find one hiccup, identified by the lldb test suite - that CodeView
was relying on these names to map them to builtin types in that format.
So added some handling for that in LLVM. (these could be split out into
separate patches, but seems small enough to not warrant it - will do
that if there ends up needing any reverti/revisiting)

Differential Revision: https://reviews.llvm.org/D110455

[GlobalDCE] In VFE, replace the whole 'sub' expression of unused relative-pointer-based vtable slots

Differential Revision: https://reviews.llvm.org/D109114

Return failure on failure in convertBlockSignature.

This was causing a subsequent assert/crash when a type converter failed to convert a block argument.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D110985

[gn build] Port ccfb0555f76b

[CUDA] Implement experimental support for texture lookups.

The patch implements header-only support for testure lookups.

The patch has been tested on a source file with all possible combinations of
argument types supported by CUDA headers, compiled and verified that the
generated instructions and their parameters match the code generated by NVCC.
Unfortunately, compiling texture code requires CUDA headers and can't be tested
in clang itself. The test will need to be added to the test-suite later.

While generated code compiles and seems to match NVCC, I do not have any code
that uses textures that I could test correctness of the implementation. Hence
the experimental status.

Differential Revision: https://reviews.llvm.org/D110089

More size_t -> uint64_t fixes after 05392466

Fixes some bots where the two differ.

Add missing diagnostic for use of _reserved name in extern "C"
declaration.

Names starting with an underscore are reserved at the global scope, so
cannot be used as the name of an extern "C" symbol in any scope because
such usages conflict with a name at global scope.

PR50644: Do not warn on a declaration of `operator"" _foo`.

Also do not warn on `#define _foo` or `#undef _foo`.

Only global scope names starting with _[a-z] are reserved, not the use
of such an identifier in any other context.

PR50641: Properly handle AttributedStmts when checking for a valid
constexpr function body.

[SCEV] Search operand tree for scope bound when inferring flags from IR

When checking to see if we can apply IR flags to a SCEV, we need to identify a bound on the defining scope of the SCEV to be produced. We'd previously added support for a couple SCEVExpr types which trivially imply bounds, but hadn't handled types such as umax where the bounds come from the bounds of the operands. This does the obvious thing, and recurses through operands searching for a tighter bound on the defining scope.

I'm honestly surprised by how little this seems to mater on existing tests, but it's worth doing for completeness sake alone.

Differential Revision: https://reviews.llvm.org/D111191

Revert "Reland "[clang][Fuchsia] Re-enable compiler-rt tests in runtimes build""

This reverts commit c52d60ec3b9202eaa9e109be539f1d2a03fbad70.

The failing scudo test came up again.

Add a unit test for llvm-gcc producer strings and cleanup code. (NFC)

Simplify control flow (NFC)

Use llvm::VersionTuple to store DWARF producer info (NFC)

This has the nice side-effect that it can actually store the quadruple version numbers that Apple's tools are using nowadays.

rdar://82982162

Differential Revision: https://reviews.llvm.org/D111200

[X86] Don't use popcnt for parity if only bits 7:0 of the input can be non-zero.

Without popcnt we had a special case for using the parity flag from a single test i8 test instruction if only bits 7:0 could be non-zero. That special case is still useful when we have popcnt.

To reach this special case, we enable custom lowering of parity for i16/i32/i64 even when popcnt is enabled. The check for POPCNT being enabled is now after the special case in LowerPARITY.

Fixes PR52093

Differential Revision: https://reviews.llvm.org/D111249

Fix assert of "Unable to find base lambda address" from
adjustMemberOfForLambdaCaptures.

The problem is happening when user passes lambda function with reference
type in the map clause.

The natural of the problem when processing generateInfoForCapture,
the BasePointer is generated with new load for a lambda variable with
reference type. It is not expected in adjustMemberOfForLambdaCaptures.

One way to fix this is to skipping call to generateInfoForCapture for
map(to:lambda). The map info will be generated later in the call to
generateDefaultMapInfo samiler as firsprivate clase.

This to fix https://bugs.llvm.org/show_bug.cgi?id=52071

Differential Revision:https://reviews.llvm.org/D111115

[clang][Fuchsia] Add -static-libgcc to TSAN tests

This will ensure that tsan tests are built with the built libunwind.a
rather than the host libunwind.so.

[flang] Catch mismatched parentheses in prescanner

Source lines with mismatched parentheses are hard cases for error
recovery in parsing, and the best error message (viz.,
"here's an unmatched parenthesis") can be emitted from the
prescanner.

Differential Revision: https://reviews.llvm.org/D111254#3046173

Reland "[clang][Fuchsia] Re-enable compiler-rt tests in runtimes build"

This reverts commit 95f824ad7c2da1e8142dbade4e4e12f389f1eb78.

We should've addressed the remaining issues on our CI builders.

[compiler-rt][memprof] Disambiguate checks for __tls_get_addr in output

TestCases/stress_dtls.c was failing when we ran memprof tests for the first
time. The test checks that __tls_get_addr is not in the output for the last
run when it is possible for the interceptor __interceptor___tls_get_addr to
be in the output from stack dumps. The test actually intends to check that
the various __tls_get_addr reports don't get emitted when intercept_tls_get_addr=0.
This updates the test to also check for the following `:` and preceding `==`
which should ignore the __interceptor___tls_get_addr interceptor.

Differential Revision: https://reviews.llvm.org/D111192

size_t -> uint64_t after 05392466

Fixes some bots where the two differ.

[libc++] [test] s/ContiguousView/MoveOnlyView/g. NFCI.

The unique (ha!) thing about this range type is that it's move-only.
Its contiguity is unsurprising (most of our test ranges are contiguous).
Discussed in D111231 but committed separately for clarity.

[libc++] [test] Change a lot of free begin/end pairs to members. NFCI.

If you have a `begin() const` member, you don't need a `begin()` member
unless you want it to do something different (e.g. have a different return
type). So in general, //view// types don't need `begin()` non-const members.

Also, static_assert some things about the types in "types.h", so that we
don't accidentally break those properties under refactoring.

Differential Revision: https://reviews.llvm.org/D111231

[clang] Add option to clear AST memory before running LLVM passes

This is to save memory for Clang compiles.
Measuring building PassBuilder.cpp under /usr/bin/time, max rss goes from 0.93GB to 0.7GB.

This does not turn it by default yet.

I've turned on the option locally and run it over a good amount of files without any issues.

For more background, see
https://lists.llvm.org/pipermail/cfe-dev/2021-September/068930.html.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D111105

Reland [IR] Increase max alignment to 4GB

Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451

[gn build] Port 67231650e6ef

[lldb] [ABI/X86] Split base x86 and i386 classes

Split the ABIX86 class into two classes: base ABIX86 class that is
common to 32-bit and 64-bit ABIs, and ABIX86_i386 class that is the base
for 32-bit ABIs. This removes the confusing concept that ABIX86
initializes 64-bit ABIs but is only the base for 32-bit ABIs.

Differential Revision: https://reviews.llvm.org/D111216

[SCEV] Avoid unnecessary domination checks (NFC)

When determining the defining scope, avoid repeatedly querying
dominationg against the function entry instruction. This ends up
begin a very common case that we can handle more efficiently.

[NFC][X86][LoopVectorize] Autogenerate check lines in a few tests for ease of updating

For D111220

[libc++] Use init_priority(100) when possible

Priorities below 101 are reserved for the implementation, so that's what
we should be using here. That is unfortunately only supported on more
recent versions of Clang. See https://reviews.llvm.org/D31413 for details.

Differential Revision: https://reviews.llvm.org/D95972

[libc++] Simplify writing testing config files

Reduce code duplication by sharing most of the test suite setup across
the different from-scratch configs.

Differential Revision: https://reviews.llvm.org/D111196

[gn build] (manually) port 77d5ccdc6f460

(similar to 64f623d4c37c6)

[APInt] Fix isAllOnes and extractBits for zero width values.

isAllOnes() should return true for zero bit values because
there are no zeros in it.

Thanks to Jay Foad for pointing this out.

Differential Revision: https://reviews.llvm.org/D111241

[MLIR] Split arith dialect from the std dialect

Create the Arithmetic dialect that contains basic integer and floating
point arithmetic operations. Ops that did not meet this criterion were
moved to the Math dialect.

First of two atomic patches to remove integer and floating point
operations from the standard dialect. Ops will be removed from the
standard dialect in a subsequent patch.

Reviewed By: ftynse, silvas

Differential Revision: https://reviews.llvm.org/D110200

[scev] minor style improvement [nfc]

[tests] precommit test changes for D111191

[flang] Define IEEE_SCALB, IEEE_NEXT_AFTER, IEEE_NEXT_DOWN, IEEE_NEXT_UP

These functions were missing from the standard intrinsic module
IEEE_ARITHMETIC. IEEE_SCALB is an alias for the standard intrinsic
function SCALE(), and the others are defined as new builtin intrinsic
functions.

Differential Revision: https://reviews.llvm.org/D111253

Disable SANITIZER_CHECK_DEADLOCKS on Darwin platforms.

Although THREADLOCAL variables are supported on Darwin they cannot be
used very early on during process init (before dyld has set it up).

Unfortunately the checked lock is used before dyld has setup TLS leading
to an abort call (`_tlv_boostrap()` is never supposed to be called at
runtime).

To avoid this problem `SANITIZER_CHECK_DEADLOCKS` is now disabled on
Darwin platforms. This fixes running TSan tests (an possibly other
Sanitizers) when `COMPILER_RT_DEBUG=ON`.

For reference the crashing backtrace looks like this:

```
* thread #1, stop reason = signal SIGABRT
  * frame #0: 0x00000002044da0ae dyld`__abort_with_payload + 10
    frame #1: 0x00000002044f01af dyld`abort_with_payload_wrapper_internal + 80
    frame #2: 0x00000002044f01e1 dyld`abort_with_payload + 9
    frame #3: 0x000000010c989060 dyld_sim`abort_with_payload + 26
    frame #4: 0x000000010c94908b dyld_sim`dyld4::halt(char const*) + 375
    frame #5: 0x000000010c988f5c dyld_sim`abort + 16
    frame #6: 0x000000010c96104f dyld_sim`dyld4::APIs::_tlv_bootstrap() + 9
    frame #7: 0x000000010cd8d6d2 libclang_rt.tsan_iossim_dynamic.dylib`__sanitizer::CheckedMutex::LockImpl(this=<unavailable>, pc=<unavailable>) at sanitizer_mutex.cpp:218:58 [opt]
    frame #8: 0x000000010cd8a0f7 libclang_rt.tsan_iossim_dynamic.dylib`__sanitizer::Mutex::Lock() [inlined] __sanitizer::CheckedMutex::Lock(this=0x000000010d733c90) at sanitizer_mutex.h:124:5 [opt]
    frame #9: 0x000000010cd8a0ee libclang_rt.tsan_iossim_dynamic.dylib`__sanitizer::Mutex::Lock(this=0x000000010d733c90) at sanitizer_mutex.h:162:19 [opt]
    frame #10: 0x000000010cd8a0bf libclang_rt.tsan_iossim_dynamic.dylib`__sanitizer::GenericScopedLock<__sanitizer::Mutex>::GenericScopedLock(this=0x000000030c7479a8, mu=<unavailable>) at sanitizer_mutex.h:364:10 [opt]
    frame #11: 0x000000010cd89819 libclang_rt.tsan_iossim_dynamic.dylib`__sanitizer::GenericScopedLock<__sanitizer::Mutex>::GenericScopedLock(this=0x000000030c7479a8, mu=<unavailable>) at sanitizer_mutex.h:363:67 [opt]
    frame #12: 0x000000010cd8985b libclang_rt.tsan_iossim_dynamic.dylib`__sanitizer::LibIgnore::OnLibraryLoaded(this=0x000000010d72f480, name=0x0000000000000000) at sanitizer_libignore.cpp:39:8 [opt]
    frame #13: 0x000000010cda7aaa libclang_rt.tsan_iossim_dynamic.dylib`__tsan::InitializeLibIgnore() at tsan_interceptors_posix.cpp:219:16 [opt]
    frame #14: 0x000000010cdce0bb libclang_rt.tsan_iossim_dynamic.dylib`__tsan::Initialize(thr=0x0000000110141400) at tsan_rtl.cpp:403:3 [opt]
    frame #15: 0x000000010cda7b8e libclang_rt.tsan_iossim_dynamic.dylib`__tsan::ScopedInterceptor::ScopedInterceptor(__tsan::ThreadState*, char const*, unsigned long) [inlined] __tsan::LazyInitialize(thr=0x0000000110141400) at tsan_rtl.h:665:5 [opt]
    frame #16: 0x000000010cda7b86 libclang_rt.tsan_iossim_dynamic.dylib`__tsan::ScopedInterceptor::ScopedInterceptor(this=0x000000030c747af8, thr=0x0000000110141400, fname=<unavailable>, pc=4568918787) at tsan_interceptors_posix.cpp:247:3 [opt]
    frame #17: 0x000000010cda7bb9 libclang_rt.tsan_iossim_dynamic.dylib`__tsan::ScopedInterceptor::ScopedInterceptor(this=0x000000030c747af8, thr=<unavailable>, fname=<unavailable>, pc=<unavailable>) at tsan_interceptors_posix.cpp:246:59 [opt]
    frame #18: 0x000000010cdb72b7 libclang_rt.tsan_iossim_dynamic.dylib`::wrap_strlcpy(dst="\xd2", src="0xd1d398d1bb0a007b", size=20) at sanitizer_common_interceptors.inc:7386:3 [opt]
    frame #19: 0x0000000110542b03 libsystem_c.dylib`__guard_setup + 140
    frame #20: 0x00000001104f8ab4 libsystem_c.dylib`_libc_initializer + 65
    ...
```

rdar://83723445

Differential Revision: https://reviews.llvm.org/D111243

Returning poison from a function w/ noundef return attribute is UB

This does for readability of returns within said function as what we do for the caller side when reasoning about what might be poison.

Differential Revision: https://reviews.llvm.org/D111180

[UBSAN][PS4] For the PS4 target, emit the ud2 ocpode for ubsan traps.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D111172

[mlir] Fix redundant return in the void method.

clang-tidy, fix redundant return statement at the end of the void method.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D111251

[llvm-profgen] Support symbol list for accurate profile

Differential Revision: https://reviews.llvm.org/D110859

Revert "Reland [IR] Increase max alignment to 4GB"

This reverts commit 8d64314ffea55f2ad94c1b489586daa8ce30f451.

Update some types after D110451

To fix mismatched size_t vs uint64_t on some platforms.

[PowerPC] Fix issue with lowering byval parameters.

Lowering of byval parameters with sizes that are not represented by a single
store require multiple stores to properly address the correct size of the
parameter.

Sizes that cannot be done with a single store are 3 bytes, 5 bytes, 6 bytes,
7 bytes. It is not correct to simply perform an 8 byte store and for these
elements because then the store would be larger than the element and alias
analysis would assume that this is undefined behaivour and return NoAlias
for them.

This patch adds the correct stores so that the size of the store is not larger
than the size of the element.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D108795

[libc++] Implement P1391 for string_view

Implement P1391 (https://wg21.link/p1391) which allows
`std::string_view` to be constructible from any contiguous range of
characters.

Note that a different paper (http://wg21.link/P1989) handles the generic
range constructor for `std::string_view`.

Reviewed By: ldionne, Quuxplusone, Mordante, #libc

Differential Revision: https://reviews.llvm.org/D110718

[SCEV] Infer flags from add/gep in any block

This patch removes a compile time restriction from isSCEVExprNeverPoison. We've strengthened our ability to reason about flags on scopes other than addrecs, and this bailout prevents us from using it. The comment is also suspect as well in that we're in the middle of constructing a SCEV for I. As such, we're going to visit all operands *anyways*.

Differential Revision: https://reviews.llvm.org/D111186

[CostModel][TTI] Replace BAD_ICMP_PREDICATE with ICMP_NE for generic smulo/umulo cost expansion

Match the predicate used in TargetLowering::expandMULO to detect overflow

[CostModel][TTI] Fix ops used for generic smulo/umulo cost expansion

Fix copy+pasta that was checking for smul_fix instead of smul_with_overflow to detected signed values.

The LShr is performed on the extended type as we use it to truncate+extract the upper/hi bits of the extended multiply.

More closely matches the default expansion from TargetLowering::expandMULO

[CostModel][TTI] Replace BAD_ICMP_PREDICATE with ICMP_ULT/UGT for generic uadd/usubo cost expansion

Match the predicates used in TargetLowering::expandUADDSUBO

[compiler-rt][test] Add shared_unwind requirement

When using a static libunwind, the check_memcpy.c can fail because it checks
that tsan intercepted all memcpy/memmoves in the final binary. Though if the
static libunwind is not instrumented, then this will fail because it may contain
regular memcpy/memmoves.

This adds a new REQUIRES check for ensuring that this test won't run unless a
dynamic libunwind.so is provided.

Differential Revision: https://reviews.llvm.org/D111194

Reland [IR] Increase max alignment to 4GB

Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451

[X86] Add X86 and X64 prefixes to parity.ll to reduce duplicate check lines. NFC

[gn build] Port 10f16bc7b2bf

Revert "[lldb] [ABI/X86] Split base x86 and i386 classes"

This change broke the windows lldb bot.

This reverts commit a30a36f66aea459337999a000c7997b220b25227.

[FPEnv][InstSimplify] Fold constrained X + -0.0 ==> X

Currently the fadd optimizations in InstSimplify don't know how to do this
"X + -0.0 ==> X" fold when using the constrained intrinsics. This adds the
support.

This commit is derived from D106362 with some improvements from D107285.

Differential Revision: https://reviews.llvm.org/D111085

[Driver][test] Add Debian multiarch lib/clang/14.0.0/x86_64-linux-gnu and include/x86_64-linux-gnu/c++/v1 tests

[sanitizer] Switch to StackDepotNode to 64bit hash

Now we can avoid scanning the stack on fast path.
The price is the false stack trace with probability of the hash collision.
This increase performance of lsan by 6% and pre-requirement for stack compression.

Depends on D111182.

Reviewed By: morehouse, dvyukov

Differential Revision: https://reviews.llvm.org/D111183

[mlir][tosa] Create basic dynamic shape support for several ops.

Transpose, Matmul and Fully-connected dynamic shape support

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D111167

[X86] Add test cases for PR52093. NFC

[MLIR] Update DRR doc with returnType directive

Add missing documentation.

Reviewed By: Chia-hungDuan, jpienaar

Differential Revision: https://reviews.llvm.org/D110964

Revert "[IR] Increase max alignment to 4GB"

This reverts commit df84c1fe78130a86445d57563dea742e1b85156a.

Breaks some bots

[Clang][OpenMP] Allow loop-transformations with template parameters.

Clang would reject

    #pragma omp for
    #pragma omp tile sizes(P)
    for (int i = 0; i < 128; ++i) {}

where P is a template parameter, but the loop itself is not
template-dependent. Because P context-dependent, the TransformedStmt
cannot be generated and therefore is nullptr (until the template is
instantiated by TreeTransform). The OMPForDirective would still expect
the a loop is the dependent context and trigger an error.

Fix by introducing a NumGeneratedLoops field to OMPLoopTransformation.
This is used to distinguish the case where no TransformedStmt will be
generated at all (e.g. #pragma omp unroll full) and template
instantiation is needed. In the latter case, delay resolving the
iteration space like when the for-loop itself is template-dependent
until the template instatiation.

A more radical solution would always delay the iteration space analysis
until template instantiation, but would also break many test cases.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D111124

[sanitizer] Support Intel CET

1. Include <cet.h> in sanitizer_common/sanitizer_asm.h to mark Intel CET
support when Intel CET is enabled.
2. Add _CET_ENDBR to function entries in assembly codes so that ENDBR
instruction will be generated when Intel CET is enabled.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D111185

[CMake] Fix typo in error message for LLD in bootstrap builds.

Reviewed By: xgupta

Differential Revision: https://reviews.llvm.org/D110836

[NFC] Add doxygen comment for hasFp in RISCVFrameLowering.cpp

[ARM] Fix a bug in finding a pair of extracts to create VMOVRRD

D100244 missed a check on the ResNo of the extract's operand 0 when finding a
pair of extracts to combine into a VMOVRRD (extract(x, n); extract(x, n+1) ->
VMOVRRD(extract x, n/2)). As a result, it can incorrectly pair an extract(x, n)
with another extract(x:3, n+1) for example. This patch fixes the bug by adding
the proper check on ResNo.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D111188

[IR] Increase max alignment to 4GB

Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451

[clang] Allow printing 64 bit ints in diagnostics

Currently we're limited to 32 bit ints in diagnostics.
With support for 4GB alignments coming soon, we need to report 4GB as the max alignment allowed.
I've tested that this does indeed properly print 2^32.

Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D111184

[Test] Add LoopPeel test for loops with profile data available

Patch by Dmitry Makogon!

[analyzer][NFC] Add RangeSet::dump

This tiny change improves the debugging experience of the solver a lot!

Differential Revision: https://reviews.llvm.org/D110911

[MLIR] Improve debug messages in BuiltinTypes

It's nice for users to have more information when debugging failures and
these are only triggered in a failure path.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D107676

[MLIR] Rename Shape dialect's `join` to `meet`.

For the type lattice, we (now) use the "less specialized or equal" partial
order, leading to the bottom representing the empty set, and the top
representing any type.

This naming is more in line with the generally used conventions, where the top
of the lattice is the full set, and the bottom of the lattice is the empty set.
A typical example is the powerset of a finite set: generally, meet would be the
intersection, and join would be the union.

```
top:  {a,b,c}
     /   |   \
{a,b} {a,c} {b,c}
   |  X     X  |
   {a} { b } {c}
      \  |  /
bottom: { }
```

This is in line with the examined lattice representations in LLVM:
* lattice for `BitTracker::BitValue` in `Hexagon/BitTracker.h`
* lattice for constant propagation in `HexagonConstPropagation.cpp`
* lattice in `VarLocBasedImpl.cpp`
* lattice for address space inference code in `InferAddressSpaces.cpp`

Reviewed By: silvas, jpienaar

Differential Revision: https://reviews.llvm.org/D110766

[BasicAA] Don't unnecessarily extend pointer size

BasicAA GEP decomposition currently performs all calculation on the
maximum pointer size, but at least 64-bit, with an option to double
the size. The code comment claims that this improves analysis power
when working with uint64_t indices on 32-bit systems. However, I don't
see how this can be, at least while maintaining correctness:

When working on canonical code, the GEP indices will have GEP index
size. If the original code worked on uint64_t with a 32-bit size_t,
then there will be truncs inserted before use as a GEP index. Linear
expression decomposition does not look through truncs, so this will
be an opaque value as far as GEP decomposition is concerned. Working
on a wider pointer size does not help here (or have any effect at all).

When working on non-canonical code (before first InstCombine), the
GEP indices are implicitly truncated to GEP index size. The BasicAA
code currently just ignores this fact completely, and pretends that
this truncation doesn't happen. This is incorrect and will be
addressed by D110977.

I believe that for correctness reasons, it is important to work on
the actual GEP index size to properly model potential overflow.
BasicAA tries to patch over the fact that it uses the wrong size
(see adjustToPointerSize), but it only does that in limited cases
(only for constant values, and not all of them either). I'd like to
move this code towards always working on the correct size, and
dropping these artificial pointer size adjustments is the first step
towards that.

Differential Revision: https://reviews.llvm.org/D110657

[InstSimplify] (x | y) & (x | !y) --> x

https://alive2.llvm.org/ce/z/QagQMn

This fold is handled by instcombine via SimplifyUsingDistributiveLaws(),
but we are missing the sibliing fold for 'logical and' (implemented with
'select'). Retrofitting the code in instcombine looks much harder
than just adding a small adjustment here, and this is potentially more
efficient and beneficial to other passes.

[InstSimplify] add tests for bitwise logic fold of 'and'; NFC

[analyzer][solver] Fix CmpOpTable handling bug

There is an error in the implementation of the logic of reaching the `Unknonw` tristate in CmpOpTable.

```
void cmp_op_table_unknownX2(int x, int y, int z) {
  if (x >= y) {
                    // x >= y    [1, 1]
    if (x + z < y)
      return;
                    // x + z < y [0, 0]
    if (z != 0)
      return;
                    // x < y     [0, 0]
    clang_analyzer_eval(x > y);  // expected-warning{{TRUE}} expected-warning{{FALSE}}
  }
}
```
We miss the `FALSE` warning because the false branch is infeasible.

We have to exploit simplification to discover the bug. If we had `x < y`
as the second condition then the analyzer would return the parent state
on the false path and the new constraint would not be part of the State.
But adding `z` to the condition makes both paths feasible.

The root cause of the bug is that we reach the `Unknown` tristate
twice, but in both occasions we reach the same `Op` that is `>=` in the
test case. So, we reached `>=` twice, but we never reached `!=`, thus
querying the `Unknonw2x` column with `getCmpOpStateForUnknownX2` is
wrong.

The solution is to ensure that we reached both **different** `Op`s once.

Differential Revision: https://reviews.llvm.org/D110910

Revert "[lldb] Remove "dwarf dynamic register size expressions" from RegisterInfo"

This reverts commit 00e704bf080ffeeb9e334fb3ab71594f9aa50969.

This commit should should have updated
llvm/llvm-project/lldb/source/Plugins/ABI/ARC/ABISysV_arc.cpp like the other
architectures.

[Clang][OpenMP] Infix OMPLoopTransformationDirective abstract class. NFC.

Insert OMPLoopTransformationDirective between OMPLoopBasedDirective and the loop transformations OMPTileDirective and OMPUnrollDirective. This simplifies handling of loop transformations not requiring distinguishing between OMPTileDirective and OMPUnrollDirective anymore.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D111119

Revert "[IR] Remove arg_operands and getNumArgOperands (NFC)"

This reverts commit c72722f45ef1aa6d78e1e6fee07ab6bd86980da8.