review.tizen.org Git - platform/upstream/llvm.git/log

[ELF] Allow mixed SHF_LINK_ORDER & non-SHF_LINK_ORDER sections and sort within InputSectionDescription

LLD currently does not allow non-contiguous SHF_LINK_ORDER components in an
output section. This makes it infeasible to add SHF_LINK_ORDER to an existing
metadata section if backward compatibility with older object files are
concerned.

We did not allow mixed components (like GNU ld) and D77007 relaxed to allow
non-contiguous SHF_LINK_ORDER components. This patch allows arbitrary mix, with
sorting performed within an InputSectionDescription. For example,
`.rodata : {*(.rodata.foo) *(.rodata.bar)}`, has two InputSectionDescription's.
If there is at least one SHF_LINK_ORDER and at least one non-SHF_LINK_ORDER in
.rodata.foo, they are ordered within `*(.rodata.foo)`: we arbitrarily place
SHF_LINK_ORDER components before non-SHF_LINK_ORDER components (like Solaris ld).

`*(.rodata.bar)` is ordered similarly, but the two InputSectionDescription's
don't interact. It can be argued that this is more reasonable than the previous
behavior where written order was not respected.

It would be nice if the two different semantics (ordering requirement & garbage
collection) were not overloaded on one section flag, however, it is probably
difficult to obtain a generic flag at this point
(https://groups.google.com/forum/#!topic/generic-abi/hgx_m1aXqUo
"SHF_LINK_ORDER's original semantics make upgrade difficult").

(Actually, without the GC semantics, SHF_LINK_ORDER would still have the
sh_link!=0 & sh_link=0 issue. It is just that people find the GC semantics more
useful and tend to use the feature more often.)

GNU ld feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=16833

Differential Revision: https://reviews.llvm.org/D84001

[DFSan] Support fast16labels mode in dfsan_union.

While the instrumentation never calls dfsan_union in fast16labels mode,
the custom wrappers do. We detect fast16labels mode by checking whether
any labels have been created. If not, we must be using fast16labels
mode.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D86012

[flang][directives] Use TableGen to generate clause unparsing

Use the TableGen directive back-end to generate code for the clauses unparsing.

Reviewed By: sscalpone, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D85851

[llvm] Don't create the directory hierarchy in the FileCollector...

... if the collected file doesn't exists.

This fixes the situation where LLDB can't create a file when capturing a
reproducer because the parent path doesn't exist, but can during replay
because the file collector created the directory hierarchy even though
the file doesn't exist.

This is covered by the lldb reproducer test suite.

GlobalISel: Fix parameter name in doxygen comment

GlobalISel: Revisit users of other merge opcodes in artifact combiner

The artifact combiner searches for the uses of G_MERGE_VALUES for
unmerge/trunc that need further combining. This also needs to handle
the vector merge opcodes the same way. This fixes leaving behind some
pairs I expected to be removed, that were if the legalizer is run a
second time.

[libcxx/variant] Correctly propagate return type of the visitor.

The tests for it were missing so I've added them.

Reviewed By: #libc, EricWF

Differential Revision: https://reviews.llvm.org/D86006

[DSE,MemorySSA] Check for underlying objects first.

isWriteAtEndOfFunction needs to check all memory uses of Def, which is
much more expensive than getting the underlying objects in practice.
Switch the call order, as recommended by the TODO, which was added as
per an earlier review.

This shaves off a bit of compile-time.

GlobalISel: Early continue to reduce loop indentation

[lldb] Skip TestError.test with reproducers

This tests the driver, which is bypassed by the reproducer during
replay.

[lldb] Add missing LLDB_REGISTER for GarbageCollectAllocatedModules

Add the missing LLDB_REGISTER_STATIC_METHOD macro and format the file.

[test] Fix aggregate-assign-call.c in preparation for -enable-npm-optnone

Pin the test to use -enable-npm-optnone.
Before, optnone wasn't implemented under NPM, so the LPM and NPM runs produced different IR. Now with -enable-npm-optnone, that is no longer necessary.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D86008

[test] Fix thinlto-debug-pm.c in preparation for -enable-npm-optnone

This fails due to the clang invocation running at -O0, producing an optnone function.
Then even with -O2 in the later invocations, LoopVectorizePass doesn't run on the optnone function.
So split this into an -O0 run and an -O2 run.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86011

[lldb] Replace unittest2.expectedFailure with expectedFailure (NFC)

Rename the existing expectedFailure to expectedFailureIfFn to better
describe its purpose and provide an overload for
unittest2.expectedFailure in decorators.py.

[lldb] Use os.path.sep in TestInvalidArgsLog.py to fix Windows bot

Revert "Make compiler-rt/asan tests run with llvm-lit."

This reverts commit 7f84f62ef07a0a540a7dd751d08aae583d5e2472.

Seems to be causing a bunch of compiler-rt test failures on
ppc64-linux bots.

[ELF] Enforce two-dash form for some LLD specific options and the newer --[no-]pcrel-optimize

Since -[no-]toc-optimize has not ever been used, we can enforce the two-dash form as well.

[DSE,MemorySSA] Account for ScanLimit == 0 on entry.

Currently the code does not account for the fact that getDomMemoryDef
can be called with ScanLimit == 0, if we reached the limit while
processing an earlier access. Also tighten the check a bit more and bump
the scan limit now that it is handled properly.

In some cases, this brings a 2x speedup in terms of compile-time.

Adds __str__ support to python mlir.ir.MlirModule.

* Also raises an exception on parse error.
* Removes placeholder smoketest.
* Adds docstrings.

Differential Revision: https://reviews.llvm.org/D86046

NFC: [GVNHoist] Hoist loop invariant code and rename variables for readability

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D86031

[lldb] Add missing signal include for TestMultipleDebuggers.py

Fixes multi-process-driver.cpp:221:19: error: use of undeclared
identifier 'SIG_IGN'

[lldb] Only link against Python 3 when LLDB_ENABLE_PYTHON is set.

Only link against Python3_LIBRARY when LLDB_ENABLE_PYTHON is true. We
have to be more strict now becuase Python3_LIBRARY might be set to
NOTFOUND instead of being not set at all.

AMDGPU/GlobalISel: Look through copies in getPtrBaseWithConstantOffset

We may have an SGPR->VGPR copy if a totally uniform pointer
calculation is used for a VGPR pointer operand.

Also hack around a bug in MUBUF matching which would incorrectly use
MUBUF for global when flat was requested. This should really be a
predicate on the parent pattern, but the DAG always checked this
manually inside the complex pattern.

Make compiler-rt/asan tests run with llvm-lit.

This sets some config parameters so we can run the asan tests with
llvm-lit,
e.g. `./bin/llvm-lit [...]/compiler-rt/test/asan`

Differential Revision: https://reviews.llvm.org/D83821

[lldb] Fix and re-enable TestMultipleDebuggers

The comment says that TestMultipleDebuggers was XFAILed because it was
failing nondeterministically in which case it should be skipped not
failed (as XPASS will cause the test suite to fail).

The reason it fails is because it was not marked as a no-debug-info test
case. I've ran the test in a loop and it has been passing consistently.
Let's enable it and see if the bots agree, if not we can skip it.

[lldb-vscode] NFC: clang format

Run clang-format on all the c++ file of this project.

[mlir] Provide LLVMType::getPrimitiveSizeInBits

This function is available on llvm::Type and has been used by some clients of
the LLVM dialect before the transition. Implement the MLIR counterpart.

Reviewed By: schweitz

Differential Revision: https://reviews.llvm.org/D85847

[MLIR] Add support for defining and using Op specific analysis

- Add variants of getAnalysis() and friends that operate on a specific derived
  operation types.
- Add OpPassManager::getAnalysis() to always call the base getAnalysis() with OpT.
- With this, an OperationPass can call getAnalysis<> using an analysis type that
  is generic (works on Operation *) or specific to the OpT for the pass. Anything
  else will fail to compile.
- Extend AnalysisManager unit test to test this, and add a new PassManager unit
  test to test this functionality in the context of an OperationPass.

Differential Revision: https://reviews.llvm.org/D84897

[lldb] Get rid of helper CMake variables for Python

This patch is a big sed to rename the following variables:

  s/PYTHON_LIBRARIES/Python3_LIBRARIES/g
  s/PYTHON_INCLUDE_DIRS/Python3_INCLUDE_DIRS/g
  s/PYTHON_EXECUTABLE/Python3_EXECUTABLE/g
  s/PYTHON_RPATH/Python3_RPATH/g

I've also renamed the CMake module to better express its purpose and for
consistency with FindLuaAndSwig.

Differential revision: https://reviews.llvm.org/D85976

[libomptarget][NFC] Sort list of plugins in chronological order

Differential Revision: https://reviews.llvm.org/D86082

Reset PAL metadata when AMDGPU traget stream finishes

If the same stream object is used for multiple compiles, the PAL metadata from eariler compilations will leak into later one. See https://github.com/GPUOpen-Drivers/llpc/issues/882 for how this is happening in LLPC.

No tests were added because multiple compiles will have to happen using the same pass manager, and I do not see a setup for that on the LLVM side. Let me know if there is a good way to test this.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D85667

[LLD][PowerPC] Implement GOT to PC-Rel relaxation

This patch implements the handling for the R_PPC64_PCREL_OPT relocation as well
as the GOT relocation for the associated R_PPC64_GOT_PCREL34 relocation.

On Power10 targets with PC-Relative addressing, the linker can relax
GOT-relative accesses to PC-Relative under some conditions. Since the sequence
consists of a prefixed load, followed by a non-prefixed access (load or store),
the linker needs to replace the first instruction (as the replacement
instruction will be prefixed). The compiler communicates to the linker that
this optimization is safe by placing the two aforementioned relocations on the
GOT load (of the address).
The linker then does two things:

- Convert the load from the got into a PC-Relative add to compute the address
relative to the PC
- Find the instruction referred to by the second relocation (R_PPC64_PCREL_OPT)
and replace the first with the PC-Relative version of it

It is important to synchronize the mapping from legacy memory instructions to
their PC-Relative form. Hence, this patch adds a file to be included by both
the compiler and the linker so they're always in agreement.

Differential revision: https://reviews.llvm.org/D84360

[libc] Make benchmark boxplots transparent.

So that the configuration box does not make a part of the plot invisible.

Reviewers: sivachandra

Differential Revision: https://reviews.llvm.org/D85953

[PowerPC] Fix thunk alignment issue when using pc-rel instruction

Thunk alignment is added in thie patch when using pc-rel instructions
to avoid crossing the 64 byte boundary.

Patched by: nemanjai, NeHuang
Reviewed By: sfertile, MaskRay

Differential Revision: https://reviews.llvm.org/D85973

DAG: Add missing comment for transform

[flang] Copy attributes and flags onto host-assoc symbols

As with use-associated symbols, copy the attributes and flags from the
original symbol onto host-associated symbols when they are created.

This was showing up as an error on a deallocate of a host-associated
name. We reported an error because the symbol didn't have the POINTER
or ALLOCATABLE attribute.

Differential Revision: https://reviews.llvm.org/D85763

[clangd] Fix Windows build when remote index is enabled.

CMake log:
```
CMake Error at D:/llvm-project/llvm/cmake/modules/AddLLVM.cmake:823 (add_executable):
  Target "clangd" links to target "Threads::Threads" but the target was not
  found.  Perhaps a find_package() call is missing for an IMPORTED target, or
  an ALIAS target is missing?
Call Stack (most recent call first):
  D:/llvm-project/clang/cmake/modules/AddClang.cmake:150 (add_llvm_executable)
  D:/llvm-project/clang/cmake/modules/AddClang.cmake:160 (add_clang_executable)
  D:/llvm-project/clang-tools-extra/clangd/tool/CMakeLists.txt:4 (add_clang_tool)

CMake Error at D:/llvm-project/llvm/cmake/modules/AddLLVM.cmake:821 (add_executable):
  Target "ClangdTests" links to target "Threads::Threads" but the target was
  not found.  Perhaps a find_package() call is missing for an IMPORTED
  target, or an ALIAS target is missing?
Call Stack (most recent call first):
  D:/llvm-project/llvm/cmake/modules/AddLLVM.cmake:1417 (add_llvm_executable)
  D:/llvm-project/clang-tools-extra/clangd/unittests/CMakeLists.txt:32 (add_unittest)

CMake Error at D:/llvm-project/llvm/cmake/modules/AddLLVM.cmake:527 (add_library):
  Target "RemoteIndexProtos" links to target "Threads::Threads" but the
  target was not found.  Perhaps a find_package() call is missing for an
  IMPORTED target, or an ALIAS target is missing?
Call Stack (most recent call first):
  D:/llvm-project/clang/cmake/modules/AddClang.cmake:103 (llvm_add_library)
  D:/llvm-project/llvm/cmake/modules/FindGRPC.cmake:105 (add_clang_library)
  D:/llvm-project/clang-tools-extra/clangd/index/remote/CMakeLists.txt:2 (generate_grpc_protos)

CMake Error at D:/llvm-project/llvm/cmake/modules/AddLLVM.cmake:527 (add_library):
  Target "clangdRemoteIndex" links to target "Threads::Threads" but the
  target was not found.  Perhaps a find_package() call is missing for an
  IMPORTED target, or an ALIAS target is missing?
Call Stack (most recent call first):
  D:/llvm-project/clang/cmake/modules/AddClang.cmake:103 (llvm_add_library)
  D:/llvm-project/clang-tools-extra/clangd/index/remote/CMakeLists.txt:11 (add_clang_library)

CMake Error at D:/llvm-project/llvm/cmake/modules/AddLLVM.cmake:527 (add_library):
  Target "clangdRemoteMarshalling" links to target "Threads::Threads" but the
  target was not found.  Perhaps a find_package() call is missing for an
  IMPORTED target, or an ALIAS target is missing?
Call Stack (most recent call first):
  D:/llvm-project/clang/cmake/modules/AddClang.cmake:103 (llvm_add_library)
  D:/llvm-project/clang-tools-extra/clangd/index/remote/marshalling/CMakeLists.txt:1 (add_clang_library)

CMake Error at D:/llvm-project/llvm/cmake/modules/AddLLVM.cmake:823 (add_executable):
  Target "clangd-index-server" links to target "Threads::Threads" but the
  target was not found.  Perhaps a find_package() call is missing for an
  IMPORTED target, or an ALIAS target is missing?
```

Reviewed By: kbobyrev

Differential Revision: https://reviews.llvm.org/D86052

AMDGPU/GlobalISel: Fix missing 256-bit AGPR mapping

AMDGPU/GlobalISel: Fix using readfirstlane with ballot intrinsics

This should use the default mapping and insert a copy to the vcc bank,
and not try to insert a readfirstlane.

AMDGPU: Don't look at dbg users for foldable operands

These would have always failed to fold, so checking them or adding
them to the fold candidates is useless.

[mlir] do not use llvm.cmpxchg with floats

According to the LLVM Language Reference, 'cmpxchg' accepts integer or pointer
types. Several MLIR tests were using it with floats as it appears possible to
programmatically construct and print such an instruction, but it cannot be
parsed back. Use integers instead.

Depends On D85899

Reviewed By: flaub, rriddle

Differential Revision: https://reviews.llvm.org/D85900

[flang] Add preprocessor test for defines passed on the command line

This adds a test for D85862 to ensure that preprocessor definitions
passed on command lines don't regress in future.

Reviewed By: tskeith

Differential Revision: https://reviews.llvm.org/D85967

GlobalISel: Remove unnecessary check for copy type

COPY isn't allowed to change the type, but can mix no type with type.

AMDGPU/GlobalISel: Fix using post-legal combiner without LegalizerInfo

AMDGPU: Fix using wrong offsets for global atomic fadd intrinsics

Global instructions have the signed offsets.

[mlir] Move data layout from LLVMDialect to module Op attributes

Legacy implementation of the LLVM dialect in MLIR contained an instance of
llvm::Module as it was required to parse LLVM IR types. The access to the data
layout of this module was exposed to the users for convenience, but in practice
this layout has always been the default one obtained by parsing an empty layout
description string. Current implementation of the dialect no longer relies on
wrapping LLVM IR types, but it kept an instance of DataLayout for
compatibility. This effectively forces a single data layout to be used across
all modules in a given MLIR context, which is not desirable. Remove DataLayout
from the LLVM dialect and attach it as a module attribute instead. Since MLIR
does not yet have support for data layouts, use the LLVM DataLayout in string
form with verification inside MLIR. Introduce the layout when converting a
module to the LLVM dialect and keep the default "" description for
compatibility.

This approach should be replaced with a proper MLIR-based data layout when it
becomes available, but provides an immediate solution to compiling modules with
different layouts, e.g. for GPUs.

This removes the need for LLVMDialectImpl, which is also removed.

Depends On D85650

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D85652

[llvm] support graceful failure of DataLayout parsing

Existing implementation always aborts on syntax errors in a DataLayout
description. While this is meaningful for consuming textual IR modules, it is
inconvenient for users that may need fine-grained control over the layout from,
e.g., command-line options. Propagate errors through the parsing functions and
only abort in the top-level parsing function instead.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D85650

[lldb] Skip TestSimulatorPlatform with sanitized builds

The test executable crashes when ran on a simulator. Skipping until this is
fixed.

rdar://67238668

[SystemZ/ZOS]__(de)register_frame are not available on z/OS.

The functions `__register_frame`/`__deregister_frame` are not
available on z/OS, so add a guard to not use them.

Reviewed By: lhames, abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D84787

[NFC] run update test script

On Transforms/LoopUnroll/runtime-small-upperbound.ll

[NFC] Tweak a comment about the lock-free builtins

[lldb][NFC] Use StringRef in CreateFunctionDeclaration/GetDeclarationName

CreateFunctionDeclaration should just take a StringRef. GetDeclarationName is
(only) used by CreateFunctionDeclaration so that's why now also takes a
StringRef.

[llvm-readobj] - Remove unwrapOrError calls from GNUStyle<ELFT>::printRelocations.

This fixes existent FIXMEs: we should not error out when unable to
find the number of relocations.

Differential revision: https://reviews.llvm.org/D85891

[RISCV] Enable the use of the old mucounteren name

The RISC-V Privileged Specification 1.11 defines `mcountinhibit`, which
has the same numeric CSR value as `mucounteren` from 1.09.1. This patch
enables the use of the old `mucounteren` name.

Patch by Yuichi Sugiyama.

Reviewed By: lenary, jrtc27, pzheng

Differential Revision: https://reviews.llvm.org/D85067

[RISCV] Indirect branch generation in position independent code

This fixes the "Unable to insert indirect branch" fatal error sometimes
seen when generating position-independent code.

Patch by msizanoen1

Reviewed By: jrtc27

Differential Revision: https://reviews.llvm.org/D84833

[gn build] Port c1f6ce0c732

[InstCombine] fold abs(X)/X to cmp+select

The backend can convert the select-of-constants to
bit-hack shift+logic if desirable.

https://alive2.llvm.org/ce/z/pgJT6E

  define i8 @src(i8 %x) {
  %0:
    %a = abs i8 %x, 1
    %d = sdiv i8 %x, %a
    ret i8 %d
  }
  =>
  define i8 @tgt(i8 %x) {
  %0:
    %cond = icmp sgt i8 %x, 255
    %r = select i1 %cond, i8 1, i8 255
    ret i8 %r
  }
  Transformation seems to be correct!

[InstCombine] add tests for sdiv-of-abs; NFC

[InstCombine] reduce code duplication; NFC

[llvm-readobj/elf] - Refine the warning about the broken PT_DYNAMIC segment.

Splitted out from D85519.

Currently we report "PT_DYNAMIC segment offset + size exceeds the size of the file",
this changes it to
"PT_DYNAMIC segment offset (0x1234) + file size (0x5678) exceeds the size of the file (0x68ab)"

Differential revision: https://reviews.llvm.org/D85654

[DemandedBits] Improve accuracy of Add propagator

The current demand propagator for addition will mark all input bits at and right of the alive output bit as alive. But carry won't propagate beyond a bit for which both operands are zero (or one/zero in the case of subtraction) so a more accurate answer is possible given known bits.

I derived a propagator by working through truth tables and using a bit-reversed addition to make demand ripple to the right, but I'm not sure how to make a convincing argument for its correctness in the comments yet. Nevertheless, here's a minimal implementation and test to get feedback.

This would help in a situation where, for example, four bytes (<128) packed into an int are added with four others SIMD-style but only one of the four results is actually read.

Known A:     0_______0_______0_______0_______
Known B:     0_______0_______0_______0_______
AOut:        00000000001000000000000000000000
AB, current: 00000000001111111111111111111111
AB, patch:   00000000001111111000000000000000

Committed on behalf of: @rrika (Erika)

Differential Revision: https://reviews.llvm.org/D72423

[DemandedBits] Reorder addition test checks. NFC.

As suggested on D72423 we should try to keep the same order as the original IR

[NFC] Run update script on test

Update IndVarSimplify/no-iv-rewrite.ll

[LLD][ELF] - Do not produce an invalid dynamic relocation order with --shuffle-sections.

Normally (when not on android with android relocation packing enabled),
we put IRelative relocations to ".rel[a].dyn", after other relocations,
to ensure that IRelatives are processed last by the dynamic loader.

To achieve that we add the `in.relaIplt` after the `part.relaDyn`:
https://github.com/llvm/llvm-project/blob/master/lld/ELF/Writer.cpp#L540

The problem is that `--shuffle-sections` might break the sections order.
This patch fixes it.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47056.

Differential revision: https://reviews.llvm.org/D85651

[lldb][NFC] Remove name parameter from CreateFunctionTemplateDecl

It's unused and not documented.

[lldb][NFC] Use expect_expr in more tests

[X86][AVX] Move lowerShuffleWithVPMOV inside explicit shuffle lowering cases

Perform lowerShuffleWithVPMOV as part of the v16i8/v8i16 shuffle lowering stages, which are the only types that are currently supported.

We need to expand support for lowering shuffles as truncations to fix the remaining regressions in D66004

[lldb][NFC] Use the proper type for the 'storage' parameter of CreateFunctionDeclaration

All the callers pass an enum and we cast the int anyway back to the actual type,
so we might as well just use the type for the parameter.

[InlineCost] Fix scalable vectors in visitAlloca

Discovered as part of the VLS type work (see D85128).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85848

[NFC][StackSafety] Move out sort from the loop

[lldb] Remove OS-specific string from TestInvalidArgsLog

This is the error message from the OS, so we shouldn't check against the
OS-specific part of the string.

Fixes the test on Windows which returns a different error message.

[lldb] Don't delete orphaned shared modules in SBDebugger::DeleteTarget

In D83876 the consensus seems that LLDB should never deleted orphaned modules
implicitly. However, SBDebugger::DeleteTarget is currently doing exactly that.
This code was added in 753406221b55b95141c8c1239660dc4db4e35ea5 but I don't see
any explanation in the commit, so I think we should delete it.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D83933

[lldb/Utility] Simplify and generalize Scalar class

The class contains an enum listing all host integer types as well as
some non-host types. This setup is a remnant of a time when this class
was actually implemented in terms of host integer types. Now that we are
using llvm::APInt, they are mostly useless and mean that each function
needs to enumerate all of these cases even though it treats most of them
identically.

I only leave e_sint and e_uint to denote the integer signedness, but I
want to remove that in a follow-up as well.

Removing these cases simplifies most of these functions, with the only
exception being PromoteToMaxType, which can no longer rely on a simple
enum comparison to determine what needs to be promoted.

This also makes the class ready to work with arbitrary integer sizes, so
it does not need to be modified when someone needs to add a larger
integer size.

Differential Revision: https://reviews.llvm.org/D85836

[lldb] Forcefully complete a type when adding nested classes

With -flimit-debug-info, we can run into cases when we only have a class
as a declaration, but we do have a definition of a nested class. In this
case, clang will hit an assertion when adding a member to an incomplete
type (but only if it's adding a c++ class, and not C struct).

It turns out we already had code to handle a similar situation arising
in the -gmodules scenario. This extends the code to handle
-flimit-debug-info as well, and reorganizes bits of other code handling
completion of types to move functions doing similar things closer
together.

Differential Revision: https://reviews.llvm.org/D85968

[lldb] Add SBModule::GarbageCollectAllocatedModules and clear modules after each test run

Right now the only places in the SB API where lldb:: ModuleSP instances are
destroyed are in SBDebugger::MemoryPressureDetected (where it's just attempted
but not guaranteed) and in SBDebugger::DeleteTarget (which will be removed in
D83933). Tests that directly create an lldb::ModuleSP and never create a target
therefore currently leak lldb::Module instances. This triggers the sanity checks
in lldbtest that make sure that the global module list is empty after a test.

This patch adds SBModule::GarbageCollectAllocatedModules as an explicit way to
clean orphaned lldb::ModuleSP instances. Also we now start calling this method
at the end of each test run and move the sanity check behind that call to make
this work. This way even tests that don't create targets can pass the sanity
check.

This fixes TestUnicodeSymbols.py when D83865 is applied (which makes that the
sanity checks actually fail the test).

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D83876

[lldb] Fix that log enable's -f parameter causes LLDB to crash when it can't open the log file

We didn't do anything with the llvm::Error we get from `Open`, so when we end up in the
error case we just crash due to the llvm::Error sanity check. Also add the missing newline
behind the error message so it no longer messes with the next (lldb) prompt.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D85970

[lldb] Get lldb-server platform's --socket-file working again

`lldb-server platform --socket-file /any/path` currently always fails to create
the socket file. This stopped working after D67424 which changed the
input variables of `writeFileAtomically` slightly. We're expected to
pass in a temporary path template (`/tmp/foo-%%%%%`) and the final
path we want to write. Instead we currently pass in the never set
`temp_file_path` as the temporary path (which will make this function always
fail) and pass in the temp_file_spec's path as the final path (which is actually
the template path such as `/tmp/foo-%%%%%`) instead of the actual path
we want to write (e.g. `/tmp/foo`).

Reviewed By: labath

Differential Revision: https://reviews.llvm.org/D85890

[VE] Support f128

Support f128 using VE instructions. Update regression tests.
I've noticed there is no load or store i128 test, so I add them too.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D86035

[lldb][NFC] Remove stride parameter from GetArrayElementType

This parameter isn't used anywhere in LLDB nor the Swift downstream branch. It
also doesn't really fit into the TypeSystem APIs that usually don't return
additional related functionality via some output parameters. Also the
implementations already states that the calculated value there is wrong.

Let's remove it. If we need this functionality at some point then Swift's much
nicer `GetByteStride` function seems like the way to go.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D84299

[clang] Make signature help work with dependent args

Fixes https://github.com/clangd/clangd/issues/490

Differential Revision: https://reviews.llvm.org/D85826

[lldb] Print the exception traceback when hitting cleanup errors

Right now if the test suite encounters a cleanup error it just prints "CLEANUP
ERROR:" but not any additional information.

This patch just prints the exception that caused the cleanup error. This should
make debugging the failing tests for D83865 easier (and seems in general nice to
have).

Reviewed By: labath

Differential Revision: https://reviews.llvm.org/D83874

[X86] Reject dirflag in inline asm constraints other than clobber.

Fixes the crash from PR47195.

[PowerPC] Make StartMI ignore COPY like instructions.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D85659

[InstCombine] Fix a compilation bug

With gcc 6.3.0, I hit the following compilation bug.
  ../lib/Transforms/InstCombine/InstCombineVectorOps.cpp:937:2: error: extra ‘;’ [-Werror=pedantic]
   };
    ^
  cc1plus: all warnings being treated as errors

The error is introduced by Commit ae7f08812e09 ("[InstCombine]
Aggregate reconstruction simplification (PR47060)")

[clang] fix a compilation bug

With gcc 6.3.0, I hit the following compilation bug:
  /home/yhs/work/llvm-project/clang/lib/Frontend/CompilerInvocation.cpp:
  In function ‘bool ParseCodeGenArgs(clang::CodeGenOptions&, llvm::opt::ArgList&,
  clang::InputKind, clang::DiagnosticsEngine&, const clang::TargetOptions&,
  const clang::FrontendOptions&)’:
  /home/yhs/work/llvm-project/clang/lib/Frontend/CompilerInvocation.cpp:780:12:
    error: unused variable ‘A’ [-Werror=unused-variable]
     if (Arg *A = Args.getLastArg(OPT_fuse_ctor_homing))
              ^
  cc1plus: all warnings being treated as errors

The bug is introduced by Commit ae6523cd62a4 ("[DebugInfo] Add
-fuse-ctor-homing cc1 flag so we can turn on constructor homing only
if limited debug info is already on.")

Initial MLIR python bindings based on the C API.

* Basic support for context creation, module parsing and dumping.

Differential Revision: https://reviews.llvm.org/D85481

[StackSafety] Skip ambiguous lifetime analysis

If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D84630

Replace setter named 'getAsOpaqueInt' with a real getter.

Clean up a bunch of places where the opaque forms of FPOptions and
FPOptionsOverride were being used inappropriately.

Always keep unset fields in FPOptionsOverride zeroed.

There are three fields that the FPOptions default constructor sets to
non-zero values; those fields previously could have been zero or
non-zero depending on whether they'd been explicitly removed from the
FPOptionsOverride set. However, that doesn't seem to ever actually
happen, so this is NFC, except that it makes the AST file representation
of FPOptionsOverride make more sense.

Use consistent code for setting FPFeatures from operator constructors.

Don't leave the FPOptions in a UnaryOperator uninitialized.

We don't appear to use these FPOptions for anything right now, but
they shouldn't be uninitialized because that makes our AST file output
nondeterministic.

Add missing parsing for attributes to std.generic_atomic_rmw op

Fix llvm.org/pr47182

Differential Revision: https://reviews.llvm.org/D86030

[NFCI][InstCombine] Pacify GCC builds - don't name variable and enum class identically

[InstCombine] Aggregate reconstruction simplification (PR47060)

This pattern happens in clang C++ exception lowering code, on unwind branch.
We end up having a `landingpad` block after each `invoke`, where RAII
cleanup is performed, and the elements of an aggregate `{i8*, i32}`
holding exception info are `extractvalue`'d, and we then branch to common block
that takes extracted `i8*` and `i32` elements (via `phi` nodes),
form a new aggregate, and finally `resume`'s the exception.

The problem is that, if the cleanup block is effectively empty,
it shouldn't be there, there shouldn't be that `landingpad` and `resume`,
said `invoke` should be a  `call`.

Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`.
But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft,
while it is pointless, does not look like "empty cleanup block".
So the `SimplifyCFGOpt::simplifyResume()` fails, and the exception is has
higher cost than it could have on unwind branch :S

This doesn't happen *that* often, but it will basically happen once per C++
function with complex CFG that called more than one other function
that isn't known to be `nounwind`.

I think, this is a missing fold in InstCombine, so i've implemented it.

I think, the algorithm/implementation is rather self-explanatory:
1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate.
2. For each element, try to find from which aggregate it was extracted.
   If it was extracted from the aggregate with identical type,
   from identical element index, great.
3. If all elements were found to have been extracted from the same aggregate,
   then we can just use said original source aggregate directly,
   instead of re-creating it.
4. If we fail to find said aggregate when looking only in the current block,
   we need be PHI-aware - we might have different source aggregate when coming
   from each predecessor.

I'm not sure if this already handles everything, and there are some FIXME's,
i'll deal with all that later in followups.

I'd be fine with going with post-commit review here code-wise,
but just in case there are thoughts, i'm posting this.

On RawSpeed, for example, this has the following effect:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |     1253 |  1253 |   0.00% |  0.00% |
| simplifycfg.NumInvokes                            |      948 |     1355 |   407 |  42.93% | 42.93% |
| instcount.NumInsertValueInst                      |     4382 |     3210 | -1172 | -26.75% | 26.75% |
| simplifycfg.NumSinkCommonCode                     |      574 |      458 |  -116 | -20.21% | 20.21% |
| simplifycfg.NumSinkCommonInstrs                   |     1154 |      921 |  -233 | -20.19% | 20.19% |
| instcount.NumExtractValueInst                     |    29017 |    26397 | -2620 |  -9.03% |  9.03% |
| instcombine.NumDeadInst                           |   166618 |   174705 |  8087 |   4.85% |  4.85% |
| instcount.NumPHIInst                              |    51526 |    50678 |  -848 |  -1.65% |  1.65% |
| instcount.NumLandingPadInst                       |    20865 |    20609 |  -256 |  -1.23% |  1.23% |
| instcount.NumInvokeInst                           |    34023 |    33675 |  -348 |  -1.02% |  1.02% |
| simplifycfg.NumSimpl                              |   113634 |   114708 |  1074 |   0.95% |  0.95% |
| instcombine.NumSunkInst                           |    15030 |    14930 |  -100 |  -0.67% |  0.67% |
| instcount.TotalBlocks                             |   219544 |   219024 |  -520 |  -0.24% |  0.24% |
| instcombine.NumCombined                           |   644562 |   645805 |  1243 |   0.19% |  0.19% |
| instcount.TotalInsts                              |  2139506 |  2135377 | -4129 |  -0.19% |  0.19% |
| instcount.NumBrInst                               |   156988 |   156821 |  -167 |  -0.11% |  0.11% |
| instcount.NumCallInst                             |  1206144 |  1207076 |   932 |   0.08% |  0.08% |
| instcount.NumResumeInst                           |     5193 |     5190 |    -3 |  -0.06% |  0.06% |
| asm-printer.EmittedInsts                          |   948580 |   948299 |  -281 |  -0.03% |  0.03% |
| instcount.TotalFuncs                              |    11509 |    11507 |    -2 |  -0.02% |  0.02% |
| inline.NumDeleted                                 |    97595 |    97597 |     2 |   0.00% |  0.00% |
| inline.NumInlined                                 |   210514 |   210522 |     8 |   0.00% |  0.00% |
```
So we manage to increase the amount of `invoke` -> `call` conversions in SimplifyCFG by almost a half,
and there is a very apparent decrease in instruction and basic block count.

On vanilla llvm-test-suite:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |      744 |   744 |   0.00% |  0.00% |
| instcount.NumInsertValueInst                      |     2705 |     2053 |  -652 | -24.10% | 24.10% |
| simplifycfg.NumInvokes                            |     1212 |     1424 |   212 |  17.49% | 17.49% |
| instcount.NumExtractValueInst                     |    21681 |    20139 | -1542 |  -7.11% |  7.11% |
| simplifycfg.NumSinkCommonInstrs                   |    14575 |    14361 |  -214 |  -1.47% |  1.47% |
| simplifycfg.NumSinkCommonCode                     |     6815 |     6743 |   -72 |  -1.06% |  1.06% |
| instcount.NumLandingPadInst                       |    14851 |    14712 |  -139 |  -0.94% |  0.94% |
| instcount.NumInvokeInst                           |    27510 |    27332 |  -178 |  -0.65% |  0.65% |
| instcombine.NumDeadInst                           |  1438173 |  1443371 |  5198 |   0.36% |  0.36% |
| instcount.NumResumeInst                           |     2880 |     2872 |    -8 |  -0.28% |  0.28% |
| instcombine.NumSunkInst                           |    55187 |    55076 |  -111 |  -0.20% |  0.20% |
| instcount.NumPHIInst                              |   321366 |   320916 |  -450 |  -0.14% |  0.14% |
| instcount.TotalBlocks                             |   886816 |   886493 |  -323 |  -0.04% |  0.04% |
| instcount.TotalInsts                              |  7663845 |  7661108 | -2737 |  -0.04% |  0.04% |
| simplifycfg.NumSimpl                              |   886791 |   887171 |   380 |   0.04% |  0.04% |
| instcount.NumCallInst                             |   553552 |   553733 |   181 |   0.03% |  0.03% |
| instcombine.NumCombined                           |  3200512 |  3201202 |   690 |   0.02% |  0.02% |
| instcount.NumBrInst                               |   741794 |   741656 |  -138 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                  |    14443 |    14445 |     2 |   0.01% |  0.01% |
| asm-printer.EmittedInsts                          |  7978085 |  7977916 |  -169 |   0.00% |  0.00% |
| inline.NumDeleted                                 |    73188 |    73189 |     1 |   0.00% |  0.00% |
| inline.NumInlined                                 |   291959 |   291968 |     9 |   0.00% |  0.00% |
```
Roughly similar effect, less instructions and blocks total.

See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e.

Compile-time wise, this appears to be roughly geomean-neutral:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions

And this is a win size-wize in general:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text

See https://bugs.llvm.org/show_bug.cgi?id=47060

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D85787

[OpenMP][CUDA] Keep one kernel list per device, not globally.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D86039

[OpenMP][CUDA] Cache the maximal number of threads per block (per kernel)

Instead of calling `cuFuncGetAttribute` with
`CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` for every kernel invocation,
we can do it for the first one and cache the result as part of the
`KernelInfo` struct. The only functional change is that we now expect
`cuFuncGetAttribute` to succeed and otherwise propagate the error.
Ignoring any error seems like a slippery slope...

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D86038

[OpenMP][FIX] Do not use TBAA in type punning reduction GPU code PR46156

When we implement OpenMP GPU reductions we use type punning a lot during
the shuffle and reduce operations. This is not always compatible with
language rules on aliasing. So far we generated TBAA which later allowed
to remove some of the reduce code as accesses and initialization were
"known to not alias". With this patch we avoid TBAA in this step,
hopefully for all accesses that we need to.

Verified on the reproducer of PR46156 and QMCPack.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D86037

[ARM] Tests for tail predicated loads. NFC

[Sema] Use the proper cast for a fixed bool enum.

When casting an enumerate with a fixed bool type the casting should use
an IntegralToBoolean instead of an IntegralCast as is required per Core
Issue 2338.

Fixes PR47055: Incorrect codegen for enum with bool underlying type

Differential Revision: https://reviews.llvm.org/D85612

[Sema] Validate calls to GetExprRange.

When a conditional expression has a throw expression it called
GetExprRange with a void expression, which caused an assertion failure.

This approach was suggested by Richard Smith.

Fixes PR46484: Clang crash in clang/lib/Sema/SemaChecking.cpp:10028

Differential Revision: https://reviews.llvm.org/D85601