review.tizen.org Git - platform/upstream/llvm.git/log

[CSSPGO][llvm-profgen] Handle return to external transition.

In a callback case, a return from internal code, say A, to external runtime can happen. The external runtime can then call back to another internal routine, say B. Making an artificial branch that looks like a return from A to B can confuse the unwinder to treat the instruction before B as the call instruction.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D104546

Revert "Revert "[cmake] [compiler-rt] Call llvm_setup_rpath() when adding shared libraries.""

This reverts commit 21c008d5a5b1e0c2ec3c1659cff961f4b0ccea2c since
it broke the build on macOS and Windows with the following error:

  The install of the clang_rt.<na,e> target requires changing an
  RPATH from the build tree, but this is not supported with the Ninja
  generator unless on an ELF-based platform.  The
  CMAKE_BUILD_WITH_INSTALL_RPATH variable may be set to avoid this relinking
  step.

precommit test for D104665

[ELF] Optimize ScriptLexer::getLineNumber by caching the previous line number and offset

getLineNumber() was counting the number of line feeds from the start of
the buffer to the current token. For large linker scripts this became a
performance bottleneck. For one 4MB linker script over 4 minutes was
spent in getLineNumber's StringRef::count.

Store the line number from the last token, and only count the additional
line feeds since the last token.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D104137

[Attributor] Fix AAExecutionDomain returning true on invalid states

This patch fixes a problem with the AAExecutionDomain attributor not
checking if it is in a valid state. This can cause it to incorrectly
return that a block is executed in a single threaded context after the
attributor failed for any reason.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103186

[clang] unbreak Index/preamble-reparse-changed-module.m with LLVM_APPEND_VC_REV=NO after 7942ebdf01b3

See revision b8b7a9dcdcbc for prior art.

gn build: Rebase clang-tblgen include path against root_build_dir instead of root_out_dir.

Fixes clang cross-compilation.

Also remove some redundant include path arguments.

[mlir][sparse] integration test for "simply dynamic" sparse output tensors

Reviewed By: gussmith23

Differential Revision: https://reviews.llvm.org/D104583

[compiler-rt] Make use of undefined symbols configurable

We want to disable the use of undefined symbols on Fuchsia, but there
are cases where it might be desirable so may it configurable.

Differential Revision: https://reviews.llvm.org/D104728

[mlir] Fix build on gcc-5 after D104167

[PowerPC][NFC] Clean up builtin sema checks

Cleanup sema checking for 64bit builtins or builtins that require
specific feature support.

Reviewed By: NeHuang

Differential Revision: https://reviews.llvm.org/D104664

[flang] [NFC] Repair build with GCC 7.3

Work around two problems with GCC 7.3.
One is its inability to implement "constexpr operator=(...) = default;"
in a class with a std::optional<> component; another is a legitimate-
looking warning about an unused variable.

Differential Revision: https://reviews.llvm.org/D104731

[OpenMP] Change remaining globalization from an analysis remark to missed

After landing the globalization optimizations, the precense of globalization on
the device that was not put in shared or stack memory is a failed optimization
with performance consequences so it should indicate a missed remark.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104735

[clangd] Dont index ObjCCategoryDecls for completion

They are already provided by Sema, deserializing from preamble if need
be. Moreover category names are meaningless outside interface/implementation
context, hence they were only causing noise.

Differential Revision: https://reviews.llvm.org/D104540

[mlir][sparse] add support for "simply dynamic" sparse tensor expressions

Slowly we are moving toward full support of sparse tensor *outputs*. First
step was support for all-dense annotated "sparse" tensors. This step adds
support for truly sparse tensors, but only for operations in which the values
of a tensor change, but not the nonzero structure (this was refered to as
"simply dynamic" in the [Bik96] thesis).

Some background text was posted on discourse:
https://llvm.discourse.group/t/sparse-tensors-in-mlir/3389/25

Reviewed By: gussmith23

Differential Revision: https://reviews.llvm.org/D104577

[clang] Add cc1 option for dumping layout for all complete types

This change adds an option which, in addition to dumping the record
layout as is done by -fdump-record-layouts, causes us to compute the
layout for all complete record types (rather than the as-needed basis
which is usually done by clang), so that we will dump them as well.
This is useful if we are looking for layout differences across large
code bases without needing to instantiate every type we are interested in.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D104484

[OpenMP] Add thread limit environment variable support to plugins

The OpenMP 5.1 standard defines the environment variable
`OMP_TEAMS_THREAD_LIMIT` to limit the number of threads that will be run in a
single block. This patch adds support for this into the AMDGPU and CUDA
plugins.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103923

[libc++] NFC: Remove unused c++98 Lit feature

[mlir] Remove the Identifier ThreadLocalCache from MLIRContext

This used to be important for reducing lock contention when accessing identifiers, but
the cost of the cache can be quite large if parsing in a multi-threaded context. After
D104167, the win of keeping a cache is not worth the cost.

Differential Revision: https://reviews.llvm.org/D104737

[mlir][OpGen] Cache Identifiers for known attribute names in AbstractOperation.

Operations currently rely on the string name of attributes during attribute lookup/removal/replacement, in build methods, and more. This unfortunately means that some of the most used APIs in MLIR require string comparisons, additional hashing(+mutex locking) to construct Identifiers, and more. This revision remedies this by caching identifiers for all of the attributes of the operation in its corresponding AbstractOperation. Just updating the autogenerated usages brings up to a 15% reduction in compile time, greatly reducing the cost of interacting with the attributes of an operation. This number can grow even higher as we use these methods in handwritten C++ code.

Methods for accessing these cached identifiers are exposed via `<attr-name>AttrName` methods on the derived operation class. Moving forward, users should generally use these methods over raw strings when an attribute name is necessary.

Differential Revision: https://reviews.llvm.org/D104167

Add regression test for maybeMangle issue

This was crbug.com/1222724, which caused D104529 to be reverted. The
new test fails when D104529 is reapplied locally.

Introduce a Bazel build configuration

This patch introduces configuration for a Bazel BUILD in a side
directory in the monorepo.

This is following the approval of
https://github.com/llvm/llvm-www/blob/main/proposals/LP0002-BazelBuildConfiguration.md

As detailed in the README, the Bazel BUILD is not supported
by the community in general, and is maintained only by interested
parties. It follows the requirements of the LLVM peripheral tier:
https://llvm.org/docs/SupportPolicy.html#peripheral-tier.

This is largely copied from https://github.com/google/llvm-bazel,
with a few filepath tweaks and the addition of the README.

Reviewed By: echristo, keith, dblaikie, kuhar

Differential Revision: https://reviews.llvm.org/D90352

[clang-format] Add new LambdaBodyIndentation option

Currently the lambda body indents relative to where the lambda signature is located. This instead lets the user
choose to align the lambda body relative to the parent scope that contains the lambda declaration. Thus:

someFunction([] {
  lambdaBody();
});

will always have the same indentation of the body even when the lambda signature goes on a new line:

someFunction(
    [] {
  lambdaBody();
});

whereas before lambdaBody would be indented 6 spaces.

Differential Revision: https://reviews.llvm.org/D102706

Revert "[cmake] [compiler-rt] Call llvm_setup_rpath() when adding shared libraries."

This reverts commit 78fd93e0396a19cb89d4b874c7cc42255888df56 as
a follow up to D91099.

[gn build] manually port c747b7d1d9a2 more (config.osx_sysroot)

Make lit configs relocatable again after c747b7d1d9a

See https://reviews.llvm.org/D77184 for background.

[llvm-diff] Explicitly check ConstantArrays

Global initializers may be ConstantArrays. They need to be checked
explicitly, because different-yet-still-equivalent type names may be
used for each, and/or a GEP instruction may appear in one.

[llvm-diff] Add support for diffing the callbr instruction

The only wrinkle is that we can't process the "blockaddress" arguments
of the callbr until the blocks have been equated. So we force them to be
"unified" before checking.

This was left out when the callbr instruction was added.

Differential Revision: https://reviews.llvm.org/D104606

Revert "[compiler-rt] Make use of undefined symbols configurable"

This reverts commit ed7086ad46f99f639b85ea6c8bda7c1a71be7c53.
This reverts commit b9792638b0bfb308e0c7c125ac78f4ebf910c11b.

This breaks cmake with message:

CMake Error at llvm-project/compiler-rt/CMakeLists.txt:449:
Parse error. Expected "(", got newline with text "

[OpaquePtr] Support changing load type in InstCombine

When the load type is changed to ptr, we need the load pointer type
to also be ptr, because it's not allowed to create a pointer to an
opaque pointer. This is achieved by adjusting the getPointerTo() API
to return an opaque pointer for an opaque pointer base type.

Differential Revision: https://reviews.llvm.org/D104718

Revert "ThinLTO: Fix inline assembly references to static functions with CFI"

This reverts commit 4474958d3a97dede2caa0920f7c4a4dc7aac57d3.

Breaks check-llvm on Mac.

[OpenMP] Remove OpenMP CUDA Target Parallel compiler flag

Summary:
The changes introduced in D97680 turns this command line option into a no-op so
it can be removed entirely.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D102940

[libcxx][doc] corrects LWG links in the One Ranges section

[CMake] Fix the option declaration

This addresses build issue introduced in
b9792638b0bfb308e0c7c125ac78f4ebf910c11b.

[libcxx][docs] updates the ranges status paper

* indicates whether work has been started or completed
* consolidates content that was split for dependency reasons (iff
everything has been merged)
* makes things a lot more fine-grained
* turns sub-CSVs into lists
* puts links into description section and removes patch column
* adds links to c++draft on occasion

These changes heavily prioritise the the reader of the generated HTML
file, not the source.

Differential Revision: https://reviews.llvm.org/D103295

[compiler-rt] Make use of undefined symbols configurable

We want to disable the use of undefined symbols on Fuchsia, but there
are cases where it might be desirable so may it configurable.

Differential Revision: https://reviews.llvm.org/D104728

[compiler-rt][CMake] Drop flags that are set by default for Fuchsia

-Wl,-z,now is set by the Fuchsia driver, -Wl,-z,relro is the default
in LLD.

[CodeGen] Don't create fake FunctionDecls when generating block/byref
copy/dispose helper functions

We found out that these fake functions would cause clang to crash if the
changes proposed in https://reviews.llvm.org/D98799 were made.

Differential Revision: https://reviews.llvm.org/D104082

[OpenMP][NFC] Add new optimizations to OpenMPOpt comment header

Summary:
Adds mentions to the new globalization optimizations added to the OpenMPOpt comment header.

[Attributor] Add an option to increase the max number of iterations

Right now the Attributor defaults to 32 fixed point iterations unless it is set
explicitly by a command line flag. This patch allows this to be configured when
the attributor instance is created. The maximum is then increased in OpenMPOpt
if the target is a kernel. This is because the globalization analysis can result
in larger iteration counts due to many dependent instances running at once.

Depends on D102444

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104416

Revert "[LLD] [COFF] Avoid doing repeated fuzzy symbol lookup for each iteration. NFC."

This reverts commit e1adf90826a57b674eee79b071fb46c1f5683cd0.

This appears to affect the way that C++ mangled symbols appear in the
import library when using a .def file that names a C++ free function
with no name decoration. I will follow up with a reduced test case
shortly.

Improve clang -Wframe-larger-than= diagnostic

Match the style in D104667.

This commit is for non-LTO diagnostics, while D104667 is for LTO and llc diagnostics.

[InstCombine] reduce code duplication for FP min/max with casts fold; NFC

[InstCombine][test] add tests for FP min/max with negated op; NFC

[Attributor] Add interface to emit remarks in Attributor

Summary:
This patch adds support for the Attributor to emit remarks on behalf of some
other pass. The attributor can now optionally take a callback function that
returns an OptimizationRemarkEmitter object when given a Function pointer. If
this is availible then a remark will be emitted for the corresponding pass
name.

Depends on D102197

Reviewed By: sstefan1 thegameg

Differential Revision: https://reviews.llvm.org/D102444

[ARM] Change some Gather/Scatter interface types to Instructions. NFC

These returned Values are cast to an Instruction already, this just
cleans up the interface a little to match the expected types.

[lldb] Add missing string include to lldb-server's main

[libc++] NFC: Add missing all.h to the modulemap

AMDGPU: Try to eliminate clearing of high bits of 16-bit instructions

These used to consistently be zeroed pre-gfx9, but gfx9 made the
situation complicated since now some still do and some don't. This
also manages to pick up a few cases that the pattern fails to optimize
away.

We handle some cases with instruction patterns, but some get
through. In particular this improves the integer cases.

[libc++] Enable `explicit` conversion operators, even in C++03 mode.

C++03 didn't support `explicit` conversion operators;
but Clang's C++03 mode does, as an extension, so we can use it.
This lets us make the conversion explicit in `std::function` (even in '03),
and remove some silly metaprogramming in `std::basic_ios`.

Drive-by improvements to the tests for these operators, in addition
to making sure all these tests also run in `c++03` mode.

Differential Revision: https://reviews.llvm.org/D104682

AMDGPU: Add baseline test for instructions zeroing high bits

[OpenMP] Enable HeapToStack conversion in OpenMPOpt for new RTL globalization calls

Summary:
The changes to globalization introduced in D97680 introduce a large amount of overhead by default. The old globalization method would always ignore globalization code if executing in SPMD mode. This wasn't strictly correct as data sharing is still possible in SPMD mode. The new interface is correct but introduces globalization code even when unnecessary. This optimization will use the existing HeapToStack transformation in the attributor to allow for unneeded globalization to be replaced with thread-private stack memory. This is done using the newly introduced library instances for the RTL functions added in D102087.

Depends on D97818

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102197

[OpenMP] Add new OpenMP globalization functions to library info

Summary:
The changes to globalization introduced in D97680 created two new functions to
push / pop shareably memory on the GPU, __kmpc_alloc_shared and
__kmpc_free_shared. This patch adds these new runtime functions to the
library info so they can be used by the HeapToStack attributor interface. This
optimization replaces malloc / free pairs with stack memory if legal.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D102087

[MCA] [In-order pipeline] Fix for 0 latency instruction causing assertion to fail.

0 latency instructions now get processed and retired properly within the in-order pipeline. Had to fix a bug within TimelineView.cpp as well that would show up when a 0 latency instruction was the first instruction in the source.

Differential Revision: https://reviews.llvm.org/D104675

AMDGPU: Fix high 16-bit optimization on gfx9

We can do this optimization in the majority of cases, but we currently
don't have a way to do it. We do not track/model which instructions
have which behavior, the control bit to change the high bit behavior,
or making use of preserved bits at all. This is a bit fuzzy since we
don't know precisely how the source instruction will be lowered, but
that only really matters in one case (for fma_mixlo).

We do need to fixup some of these cases after selection, but the
pattern helps eliminate many of these zexts.

[gn build] Port 40d6d2c49dd1

ThinLTO: Fix inline assembly references to static functions with CFI

Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354
Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D104058

[AIX][XCOFF] generate eh_info when vector registers are saved according to the traceback table.

Summary:

generate eh_info when vector registers are saved according to the traceback table.

struct eh_info_t {

unsigned version;       /* EH info version 0 */
#if defined(64BIT)

char _pad[4];           /* padding */
#endif

unsigned long lsda;     /* Pointer to Language Specific Data Area */
unsigned long personality; /* Pointer to the personality routine */
};

the value of lsda and personality is zero when the number of vector registers saved is large zero and there is not personality of the function

Reviewers: Jason Liu
Differential Revision: https://reviews.llvm.org/D103651

[AMDGPU] Use performOptimizedStructLayout for LDS sort

This gives better packing.

Differential Revision: https://reviews.llvm.org/D104331

Improve the diagnostic of DiagnosticInfoResourceLimit (and warn-stack-size in particular)

Before: `warning: stack size limit exceeded (888) in main`
After: `warning: stack frame size (888) exceeds limit (100) in function 'main'` (the -Wframe-larger-than limit will be mentioned)

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D104667

[libcxx][ranges] Add `ranges::iter_swap`.

Differential Revision: https://reviews.llvm.org/D102809

[gn build] manually port c747b7d1d9a2 (config.osx_sysroot)

AMDGPU: Move zeroed FP high bits optimization to patterns

[libc++] Change forward_list::swap to use propagate_on_container_swap for noexcept specification

The current implementation of `std::forward_list::swap` uses
`propagate_on_container_move_assignment` for `noexcept` specification.
This patch changes it to use `propagate_on_container_swap`, as it should.

Fixes https://llvm.org/PR50224.

Differential Revision: https://reviews.llvm.org/D101899

[clang][c++20] Fix false warning for unused private fields when a class has only defaulted comparison operators.

Fixes bug 50263

When "unused-private-field" flag is on if you have a struct with private
members and only defaulted comparison operators clang will warn about
unused private fields.

If you where to write the comparison operators by hand no warning is
produced.

This is a bug since defaulting a comparison operator uses all private
members .

The fix is simple, in CheckExplicitlyDefaultedFunction just clear the
list of unused private fields if the defaulted function is a comparison
function.

Differential revision: https://reviews.llvm.org/D102186

[NFC][OpenMP][Offloading] Unified the construction of mapping table entry

This patch unifies construction of mapping table entry to use `emplace`.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D104580

[OpenMP] Internalize functions in OpenMPOpt to improve IPO passes

Summary:
Currently the attributor needs to give up if a function has external linkage.
This means that the optimization introduced in D97818 will only apply to static
functions. This change uses the Attributor to internalize OpenMP device
routines by making a copy of each function with private linkage and replacing
the uses in the module with it. This allows for the optimization to be applied
to any regular function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102824

[llvm] Fix lto tests that requires ld64

Since Xcode 13, ld64 requires linking libSystem for all the executable.
Fix the tests that needs to run ld64 by linking libSystem from sysroot.

rdar://77332728

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D104332

[llvm-objcopy] Fix some namespace style issues

https://llvm.org/docs/CodingStandards.html#use-namespace-qualifiers-to-implement-previously-declared-functions

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D104693

[llvm-diff] Constify APIs so that there aren't conflicts

Some APIs work with const variables while others don't. This can cause
conflicts when calling one from the other.

This is NFC.

Differential Revision: https://reviews.llvm.org/D104719

[OpenMP] Replace GPU globalization calls with shared memory in the middle-end

Summary:
The changes introduced in D97680 create a simpler interface to code that needs
to be globalized. This interface is used to simplify the globalization calls in
the middle end. We can check any globalization call that is only called by a
single thread in the team and replace it with a static shared memory buffer.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97818

[Libomptarget] Improve device runtime implementation for globalized variables.

Currently the runtime implementation of `__kmpc_alloc_shared` is extremely slow because it allocated memory for each thread individually. This patch adds a small buffer for the threads to share data and will greatly improve performance for builds where all globalization could not be optimized out. If the shared buffer is full, then memory will not only be allocated per-warp rather than per-thread.

Depends on D97680

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104666

[OpaquePtr] Handle addrspacecasts in InstCombine

This adds support for addrspace casts involving opaque pointers to
InstCombine, as well as the isEliminableCastPair() helper
(otherwise the assertion failure would just move there).

Add PointerType::hasSameElementTypeAs() to hide the element type
details.

Differential Revision: https://reviews.llvm.org/D104668

[AArch64LoadStoreOptimizer] Recommit: Generate more STPs by renaming registers earlier

This is a recommit that fixes unwanted STP generation by checking that
the base register has not been modified or used elsewhere.

Our initial motivating case was memcpy's with alignments > 16. The
loads/stores, to which small memcpy's expand, are kept together in
several places so that we get a sequence like this for a 64 bit copy:
LD w0
LD w1
ST w0
ST w1
The load/store optimiser can generate a LDP/STP w0, w1 from this because
the registers read/written are consecutive. In our case however, the
sequence is optimised during ISel, resulting in:
LD w0
ST w0
LD w0
ST w0
This instruction reordering allows reuse of registers. Since the registers
are no longer consecutive (i.e. they are the same), it inhibits LDP/STP
creation. The approach here is to perform renaming:
LD w0
ST w0
LD w1
ST w1
to enable the folding of the stores into a STP. We do not yet generate
the LDP due to a limitation in the renaming implementation, but plan to
look at that in a follow-up so that we fully support this case. While
this was initially motivated by certain memcpy's, this is a general
approach and thus is beneficial for other cases too, as can be seen
in some test changes.

Differential Revision: https://reviews.llvm.org/D103597

[lldb] Remove more redundant SetStatus(eReturnStatusFailed)

Mostly by converting uses of GetErrorStream to AppendError,
so that the call to SetStatus is implicit.

Some remain where it isn't certain that you'll have a message
to set, or you want the output to be on stdout.

One place in CommandObjectWatchpoint previously didn't set
the status to failed at all. However it's pretty obvious
that it should do so.

Reviewed By: teemperor

Differential Revision: https://reviews.llvm.org/D104697

[SimpleLoopUnswich] Fixa a bug on ComputeUnswitchedCost with partial unswitch

There was a bug from cost calculation for partially invariant unswitch.

The costs of non-duplicated blocks are substracted from the total LoopCost, so
anything that is duplicated should not be counted.

Differential Revision: https://reviews.llvm.org/D103816

[SCEV] Reduce code to handle predicates in applyLoopGuards (NFC).

Hoist out common recurrence check and sink updating the map, to reduce
the code required to support additional predicates.

[OpenMP] Simplify GPU memory globalization

Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.

This patch replaces this functinality with a single allocation command, effectively mimicing an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D97680

[lldb][NFC] Remove an outdated comment in HostInfoBase

We should *never* use static local variables in this file as this makes
unittesting the plugin code impossible (and this whole 'testing' thing has
turned out to be rather useful so far).

[SLP][AArch64] Add SLP vectorizer tests for XOR and AND reductions. NFC

These regression tests show missed SLP vectorization opportunities,
which will be fixed in a future commit (see:
https://reviews.llvm.org/D104538).

Differential Revision: https://reviews.llvm.org/D104708

[clang] Remove unused capture in closure

c6a91ee6aaaa removed uses of IsMonotonic from OpenMP SIMD codegen,
but that left a capture of the variable unused which upset buildbots
using -Werror.

[Libomptarget] Introduce new globalization runtime calls

Summary:
This patch introduces the new globalization runtime to be used by D97680. These
runtime calls will replace the __kmpc_data_sharing_push_stack and
__kmpc_data_sharing_pop_stack functions.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D102532

[BitcodeReader] Validate Strtab before accessing.

This fixes a crash with invalid bitcode files that have records
referencing names in Strtab, but Strtab is not present or the index is
out-of-bounds.

This fixes the following clusterfuzz issue:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29895

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D95554

[ConstantFold] Delay fetching pointer element type

Don't do this while stipping pointer casts, instead fetch it at
the end. This improves compatibility with opaque pointers for the
case where the base object is not opaque.

[ConstantFold] Skip bitcast -> GEP transform for opaque pointers

Same as with the InstCombine transform, this is not possible for
bitcasts involving opaque pointers, as GEP preserves opaqueness.

[OpenMP] libomp: fix dynamic loop dispatcher

Restructured dynamic loop dispatcher code.
Fixed use of dispatch buffers for nonmonotonic dynamic (static_steal) schedule:
- eliminated possibility of stealing iterations of the wrong loop when victim
thread changed its buffer to work on another loop;
- fixed race when victim thread changed its buffer to work in nested parallel;
- eliminated "static" property of the schedule, that is now a single thread can
execute whole loop.

Differential Revision: https://reviews.llvm.org/D103648

[mlir] Fix invalid handling of AllocOp symbolOperands by SimplifyAllocConst.

symbolOperands were completely ignored by SimplifyAllocConst. Also, slightly improved diagnostic message for verifyAllocLikeOp.

Differential Revision: https://reviews.llvm.org/D104260

[AMDGPU][Libomptarget] Move allow_access_to_all_gpu_agents to rtl.cpp

Moving this method helps eliminate a use of g_atl_machine.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104691

[lldb][NFC] Use SubsystemRAII in XcodeSDKModuleTests

Add norm sub-target feature to table gen for ARC

This adds the `norm` sub-target feature (without backing implementation for now) to table gen.

Differential Revision: https://reviews.llvm.org/D104558

[LLDB] Skip TestExitDuringExpression on aarch64/linux buildbot

TestExitDuringExpression test_exit_before_one_thread_no_unwind fails
sporadically on both Arm and AArch64 linux buildbots. This seems like
manifesting itself on a fully loaded machine. I have not found a reliable
timeout value so marking it skip for now.

[mlir][memref] Add memref.copy operation

As the name suggests, it copies from one memref to another.

Differential Revision: https://reviews.llvm.org/D104657

[SCEV] Retain AddExpr flags when subtracting a foldable constant.

Currently we drop wrapping flags for expressions like (A + C1)<flags> - C2.

But we can retain flags under certain conditions:

* Adding a smaller constant is NUW if the original AddExpr was NUW.

* Adding a constant with the same sign and small magnitude is NSW, if the
original AddExpr was NSW.

This can improve results after using `SimplifyICmpOperands`, which may
subtract one in order to use stricter predicates, as is the case for
`isKnownPredicate`.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D104319

[lldb] Adjust Clang version requirements for tail_call_frames tests

Those tests are all failing for older Clang versions. This is adding the
respective test decorators for the passing Clang versions to get the recently
revived matrix bot green.

[lld/mac] Add explicit "no unwind info" entries for functions without unwind info

Fixes PR50529. With this, lld-linked Chromium base_unittests passes on arm macs.

Surprisingly, no measurable impact on link time.

Differential Revision: https://reviews.llvm.org/D104681

[lldb] Bumb Clang version requirement for TestBasicEntryValues.py to 11

The test only passes with Clang>=11 so adjust the decorator.

Failure output for Clang 10 is:

--- FileCheck trace (code=1) ---
FileCheck main.cpp -check-prefix=FUNC1-GNU

FileCheck input:
      Address: a.out[0x0000000000401127] (a.out.PT_LOAD[1]..text + 263)
      Summary: a.out`func1(int&) + 23 at main.cpp:25:1
       Module: file = "functionalities/param_entry_vals/basic_entry_values/BasicEntryValues_GNU.test_dwo/a.out", arch = "x86_64"
  CompileUnit: id = {0x00000000}, file = "functionalities/param_entry_vals/basic_entry_values/main.cpp", language = "c++11"
     Function: id = {0x400000000000010a}, name = "func1(int&)", mangled = "_Z5func1Ri", range = [0x0000000000401110-0x0000000000401129)
     FuncType: id = {0x400000000000010a}, byte-size = 0, decl = main.cpp:13, compiler_type = "void (int &)"
       Blocks: id = {0x400000000000010a}, range = [0x00401110-0x00401129)
    LineEntry: [0x0000000000401127-0x0000000000401130): functionalities/param_entry_vals/basic_entry_values/main.cpp:25:1
       Symbol: id = {0x0000002c}, range = [0x0000000000401110-0x0000000000401129), name="func1(int&)", mangled="_Z5func1Ri"

FileCheck output:

functionalities/param_entry_vals/basic_entry_values/main.cpp:23:16: error: FUNC1-GNU: expected string not found in input
// FUNC1-GNU: name = "sink", type = "int &", location = DW_OP_GNU_entry_value

[ADT] Add StringRef consume_front_lower and consume_back_lower

These serve as a convenient combination of consume_front/back and
startswith_lower/endswith_lower, consistent with other existing
case insensitive methods named <operation>_lower.

Differential Revision: https://reviews.llvm.org/D104218

[Clang][OpenMP] Monotonic does not apply to SIMD

The codegen for simd constructs was affected by the presence (or
absence) of the 'monotonic' schedule modifier for worksharing
loops. The modifier is only intended to apply to the scheduling of
chunks for a thread, not iterations of a loop inside a chunk.

In addition, the monotonic modifier was applied to worksharing loops
by default if no schedule clause was present; the referenced part of
the OpenMP 4.5 spec in the code (section 2.7.1) only applies if the
user specified a schedule clause with a static kind but no modifier.
Without a user-specified schedule clause we should default to
nonmonotonic scheduling.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D103793

[ConstantFolding] Separate conditions in GEP evaluation (NFC)

Handle to gep p, 0-v case separately, and not as part of the loop
that ensures all indices are constant integers. Those two things
are not really related.