review.tizen.org Git - platform/upstream/llvm.git/log

profi - a flow-based profile inference algorithm: Part III (out of 3)

This is a continuation of D109860 and D109903.

An important challenge for profile inference is caused by the fact that the
sample profile is collected on a fully optimized binary, while the block and
edge frequencies are consumed on an early stage of the compilation that operates
with a non-optimized IR. As a result, some of the basic blocks may not have
associated sample counts, and it is up to the algorithm to deduce missing
frequencies. The problem is illustrated in the figure where three basic
blocks are not present in the optimized binary and hence, receive no samples
during profiling.

We found that it is beneficial to treat all such blocks equally. Otherwise the
compiler may decide that some blocks are “cold” and apply undesirable
optimizations (e.g., hot-cold splitting) regressing the performance. Therefore,
we want to distribute the counts evenly along the blocks with missing samples.
This is achieved by a post-processing step that identifies "dangling" subgraphs
consisting of basic blocks with no sampled counts; once the subgraphs are
found, we rebalance the flow so as every branch probability is 50:50 within the
subgraphs.

Our experiments indicate up to 1% performance win using the optimization on
some binaries and a significant improvement in the quality of profile counts
(when compared to ground-truth instrumentation-based counts)

{F19093045}

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D109980

[gn build] (manually) port 9e3552523ebd (no more old mach-o lld)

[ELF] Hint -z nostart-stop-gc for __start_ undefined references

Make users aware what to do with ld.lld 13.0.0 / GNU ld<2015-10 --gc-sections
behavior.

Differential Revision: https://reviews.llvm.org/D114830

Reapply "OpenMP: Start calling setTargetAttributes for generated kernels"

This reverts commit 25eb7fa01d7ebbe67648ea03841cda55b4239ab2.

Previous buildbot failures appear to have been a fluke from a dirty
build.

[NFC][sanitizer] Use more bytes of sanitizer_stack_store_test pointers

[compiler-rt] Fix incorrect variable names used

[sanitizer] Start background thread once

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D114933

[Bazel] Remove old macho lld port

This code and cmake was removed in https://reviews.llvm.org/D114842

Differential Revision: https://reviews.llvm.org/D114976

[asan] Remove confusing workaround

The goal is to identify the bot and try to fix it.

SetSoftRssLimitExceededCallback is AsanInitInternal as I assume
that only MaybeStartBackgroudThread needs to be delayed to constructors.
Later I want to move MaybeStartBackgroudThread call into sanitizer_common.

If it needs to be reverted please provide to more info, like bot, or details about setup.

Reviewed By: kstoimenov

Differential Revision: https://reviews.llvm.org/D114934

Simplify the libcxx std::string_view gdb pretty printer

Seems better to rely on the existing formatting, makes the output
smaller/simpler - this is consistent with libstdc++'s std::string_view
pretty printing too.

Differential Revision: https://reviews.llvm.org/D113244

[Bazel] Remove old MachO LLD from the Bazel build

Updates Bazel files for 9e3552523ebd3385487e01e3e7af37b8c0efaf57

[lld-macho] Remove old macho darwin lld

During the llvm round table it was generally agreed that the newer macho
lld implementation is feature complete enough to replace the old
implementation entirely. This will reduce confusion for new users who
aren't aware of the history.

Differential Revision: https://reviews.llvm.org/D114842

profi - a flow-based profile inference algorithm: Part II (out of 3)

This is a continuation of D109860.

Traditional flow-based algorithms cannot guarantee that the resulting edge
frequencies correspond to a *connected* flow in the control-flow graph. For
example, for an instance in the attached figure, a flow-based (or any other)
inference algorithm may produce an output in which the hot loop is disconnected
from the entry block (refer to the rightmost graph in the figure). Furthermore,
creating a connected minimum-cost maximum flow is a computationally NP-hard
problem. Hence, we apply a post-processing adjustments to the computed flow
by connecting all isolated flow components ("islands").

This feature helps to keep all blocks with sample counts connected and results
in significant performance wins for some binaries.
{F19077343}

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D109903

[mlir][bufferization] fixed typo in to_memref doc

Reviewed By: pifon2a

Differential Revision: https://reviews.llvm.org/D114824

[SLP]Fix reused extracts cost.

If the extractelement instruction is used multiple times in the
different tree entries (either vectorized, or gathered), need to
compensate the scalar cost of such instructions. They are completely
removed if all users are part of the tree but we need to compensate the
cost only once for each instruction.

Differential Revision: https://reviews.llvm.org/D114958

[sanitizer] Add delta compression stack depot

Compress by factor 4x, takes about 10ms per 8 MiB block.

Depends on D114498.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D114503

[sanitizer] Add compress_stack_depot flag

Depends on D114494.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D114495

[ELF] Fix driver.test after 8c3641d0 when cwd is readonly

[sanitizer] DEFINE_REAL_PTHREAD_FUNCTIONS for hwasan, lsan, msan

It should be NFC, as they already intercept pthread_create.

This will let us to fix BackgroundThread for these sanitizerts.
In in followup patches I will fix MaybeStartBackgroudThread for them
and corresponding tests.

Reviewed By: kstoimenov

Differential Revision: https://reviews.llvm.org/D114935

[lldb] Skip two lldb tests on Windows because they are flaky

These tests work fine with VS2017, but become more flaky with VS2019 and the buildbot is about to get upgraded.

Differential Revision: https://reviews.llvm.org/D114907

[bazel][mlgo] Remove the mlgo-related build excludes

They aren't needed anymore, we handle conditional compilation in those
files.

Reviewed By: GMNGeoffrey

Differential Revision: https://reviews.llvm.org/D114970

[BasicAA] Add tests for strcat/strncat/strcpy.

[DSE] Read after strcpy test.

[OpenMP] Remove the new runtime default for AMDGPU

The new runtime is currently broken for AMD offloading. This patch makes
the default the old runtime only for the AMD target.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D114965

[llvm] Use range-based for loops (NFC)

[llvm] Fix "unused variable" warnings

[SLP]Outline and fix code for finding common insertelement vectors.

Need to outline the code for finding common vectors in insertelement
instructions into a separate function for future patches. It also
improves the process by adding some extra checks for early exit and
fixes a bug where it always finds the match because of erroneous compare
of the same values.

Differential Revision: https://reviews.llvm.org/D114909

[ARM] Introduce i8neg and i8pos addressing modes

Some instructions with i8 immediate ranges can only hold negative values
(like t2LDRHi8), only hold positive values (like t2STRT) or hold +/-
depending on the U bit (like the pre/post inc instructions. e.g
t2LDRH_POST). This patch splits the AddrModeT2_i8 into AddrModeT2_i8,
AddrModeT2_i8pos and AddrModeT2_i8neg to make this clear.

This allows us to get the offset ranges of t2LDRHi8 correct in the
load/store optimizer, fixing issues where we could end up creating
instructions with positive offsets (which may then be encoded as ldrht).

Differential Revision: https://reviews.llvm.org/D114638

[clang-cl] Define _MSVC_LANG for -std=c++2b

This matches the value that msvc v19.29 VS16.11 uses for
_MSVC_LANG with /std:c++latest.

Differential Revision: https://reviews.llvm.org/D114952

Reapply "[TLI checker] Add more tests"

This reverts commit 8cd61aac0030b8add686a98b8902ea49ec9c1deb.
I had missed one place in a test that needed updating; it passed on my
dirty build tree but not on a clean one.

Original commit message:

D114478 identified testing gaps; this patch fills them.

Differential Revision: https://reviews.llvm.org/D114913

tsan: tolerate munmap with invalid arguments

We call UnmapShadow before the actual munmap, at that point we don't yet
know if the provided address/size are sane. We can't call UnmapShadow
after the actual munmap becuase at that point the memory range can
already be reused for something else, so we can't rely on the munmap
return value to understand is the values are sane.
While calling munmap with insane values (non-canonical address, negative
size, etc) is an error, the kernel won't crash. We must also try to not
crash as the failure mode is very confusing (paging fault inside of the
runtime on some derived shadow address).

Such invalid arguments are observed on Chromium tests:
https://bugs.chromium.org/p/chromium/issues/detail?id=1275581

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114944

[SLP]Improve registering and merging of compatible shuffles.

If several shuffle instructions are emitted, some of them might
same/compatible (less defined) with the previously emitted ones. Such
shuffles can be removed safely, improving the total cost of the
vectorized code.

Differential Revision: https://reviews.llvm.org/D114087

tsan: fix false positives in dynamic libs with static tls

The added test demonstrates  loading a dynamic library with static TLS.
Such static TLS is a hack that allows a dynamic library to have faster TLS,
but it can be loaded only iff all threads happened to allocate some excess
of static TLS space for whatever reason. If it's not the case loading fails with:

dlopen: cannot load any more object with static TLS

We used to produce a false positive because dlopen will write into TLS
of all existing threads to initialize/zero TLS region for the loaded library.
And this appears to be racing with initialization of TLS in the thread
since we model a write into the whole static TLS region (we don't what part
of it is currently unused):

WARNING: ThreadSanitizer: data race (pid=2317365)
  Write of size 1 at 0x7f1fa9bfcdd7 by main thread:
    0 memset
    1 init_one_static_tls
    2 __pthread_init_static_tls
    [[ this is where main calls dlopen ]]
    3 main
  Previous write of size 8 at 0x7f1fa9bfcdd0 by thread T1:
    0 __tsan_tls_initialization

Fix this by ignoring accesses during dlopen.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114953

[lld][WebAssembly] Fix for debug relocations against undefined function symbols

This is very similar to https://reviews.llvm.org/D103557 but applies to
symbols which are undefined at link time rather than compile time.

We already have code that handles symbols which were defined at link
time but dead stripped by `--gc-sections` (See
`test/wasm/debug-removed-fn.ll`). In that case the symbols are not live
(!isLive()). However, we can also have live symbols (which are
references by the program) but which are undefined at link time and are
imported by the linker.

In the test case here the symbol `undef` is used but is not defined
in the program but is imported by the linker due to the
`--import-undefined` flag.

Fixes: https://github.com/emscripten-core/emscripten/issues/15528

Differential Revision: https://reviews.llvm.org/D114921

Revert "[TLI checker] Add more tests"

This reverts commit 2778554971dada8ef7df9ee6954c52a753d90c22.

Some bots are failing on the updated tests.

[clang] Do not duplicate "EnableSplitLTOUnit" module flag

If clang's output is set to bitcode and LTO is enabled, clang would
unconditionally add the flag to the module.  Unfortunately, if the input were a
bitcode or IR file and had the flag set, this would result in two copies of the
flag, which is illegal IR.  Guard the setting of the flag by checking whether it
already exists.  This follows existing practice for the related "ThinLTO" module
flag.

Differential Revision: https://reviews.llvm.org/D112177

[TLI checker] Add more tests

D114478 identified testing gaps; this patch fills them.

Differential Revision: https://reviews.llvm.org/D114913

[OpenMP] Make the new device runtime the default

This patch changes the `-fopenmp-target-new-runtime` option which controls if
the new or old device runtime is used to be true by default. Disabling this to
use the old runtime now requires using `-fno-openmp-target-new-runtime`.

Reviewed By: JonChesterfield, tianshilei1992, gregrodgers, ronlieb

Differential Revision: https://reviews.llvm.org/D114890

[InstCombine] add tests for icmp with mul op; NFC

[clangd] cleanup of header guard names

Renaming header guards to match the LLVM convention.
This patch was created by automatically applying the fixes from
clang-tidy.

I've removed the [NFC] tag from the title, as we're adding header guards in some files and thus might trigger behavior changes.

Differential Revision: https://reviews.llvm.org/D113896

[Clang] Fix LTO pipeline test after 770a50b28c00211f9a.

[AnnotationRemarks] Support generating annotation remarks with -O0.

This matches the legacy pass manager behavior. If remarks are not
enabled the pass is effectively a no-op.

[clang-tidy] Fix build broken by commit 6a9487df73e917c4faf5e060f2bb33c6ade3f967 (D113148)

[SLP][NFC]Add a test for extractelements with many uses vectorization, NFC.

[AMDGPU] Add support for in-order bvh in waitcnt pass

bvh should be handled separately from vmem and vmem with sampler instructions
for waitcnt handling.

Differential Revision: https://reviews.llvm.org/D114794

[AMDGPU] Test for in-order waitcnt insertion for bvh instructions

In-order bvh instructions don't require a waitcnt as order is
guaranteed.

However, waitcnt IS required for other image instruction types vs
bvh.

Pre-commit test for new functionality in https://reviews.llvm.org/D114794

Differential Revision: https://reviews.llvm.org/D114792

[VE][NFC] Cleanup redundant namespace wrapper

[MemoryLocation] Support strncpy in getForArgument.

The size argument of strncpy can be used as bound for the size of
its pointer arguments.

strncpy is guaranteed to write N bytes and reads up to N bytes.

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D114871

[libc] Fix a bug in MPFRUtils making ULP values off by 2^(-mantissaWidth).

Fix a bug in MPFRUtils making ULP values off by 2^(-mantissaWidth) and incorrect eps for denormal numbers.

Differential Revision: https://reviews.llvm.org/D114878

[PatternMatch] create and use matcher for 'not' that excludes undef elements

We needed a stricter version of m_Not for D114462, but I wasn't
sure if that was going to be required anywhere else, so I didn't bother
to make that reusable.

It turns out we have one more existing simplification that needs
this (currently miscompiles):
https://alive2.llvm.org/ce/z/9-nTKi

And there's at least one more fold in that family that we could add.

Differential Revision: https://reviews.llvm.org/D114882

[MemoryLocation] Support memset_chk in getForArgument.

The size argument for memset_chk is an upper bound for the size of the
pointer argument. memset_chk may write less than the specified length,
if it exceeds the specified max size and aborts.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114870

[gn build] Port 6a9487df73e9

[flang] GettingInvolved: update LLVM Alias Analysis Technical Call info

The google doc was changed and the calls are now using teams.

Reviewed By: sameeranjoshi

Differential Revision: https://reviews.llvm.org/D114145

[HIPSPV] Add CUDA->SPIR-V address space mapping

Add mapping for CUDA address spaces for HIP to SPIR-V
translation. This change allows HIP device code to be
emitted as valid SPIR-V by mapping unqualified pointers
to generic address space and by mapping __device__ and
__shared__ AS to their equivalent AS in SPIR-V
(CrossWorkgroup and Workgroup, respectively).

Cuda's __constant__ AS is handled specially. In HIP
unqualified pointers (aka "flat" pointers) can point to
__constant__ objects. Mapping this AS to ConstantMemory
would produce to illegal address space casts to
generic AS. Therefore, __constant__ AS is mapped to
CrossWorkgroup.

Patch by linjamaki (Henry Linjamäki)!

Differential Revision: https://reviews.llvm.org/D108621

Fix documentation for `forEachLambdaCapture` and `hasAnyCapture`

Updates the return types of these matchers' definitions to use
`internal::Matcher<LambdaCapture>` instead of `LambdaCaptureMatcher`. This
ensures that they are categorized as traversal matchers, instead of narrowing
matchers.

Reviewed By: ymandel, tdl-g, aaron.ballman

Differential Revision: https://reviews.llvm.org/D114809

Add new clang-tidy check for string_view(nullptr)

Checks for various ways that the `const CharT*` constructor of `std::basic_string_view` can be passed a null argument and replaces them with the default constructor in most cases. For the comparison operators, braced initializer list does not compile so instead a call to `.empty()` or the empty string literal are used, where appropriate.

This prevents code from invoking behavior which is unconditionally undefined. The single-argument `const CharT*` constructor does not check for the null case before dereferencing its input. The standard is slated to add an explicitly-deleted overload to catch some of these cases: wg21.link/p2166

https://reviews.llvm.org/D114823 is a companion change to prevent duplicate warnings from the `bugprone-string-constructor` check.

Reviewed By: ymandel

Differential Revision: https://reviews.llvm.org/D113148

[fir] Declare test function inline

Declare functions checkCallOp and checkCallOpFromResultBox inline due to buildbot failure flang-aarch64-latest-clang

Expand testing of necessary features for print-changed=dot-cfg.

Summary:
Expand the testing for whether the lit tests for print-changed=dot-cfg
are supported to include checking whether dot supports pdf output.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: hvdijk (Harald van Dijk)
Differential Revision: https://reviews.llvm.org/D113187

[AArch64][SVE] Enable bf16 vector.insert

Allow passthrough bf16 registers for vector.insert

Differential revision: https://reviews.llvm.org/D114858

[VE][Clang][NFC] Disable VE toolchain tests on Windows

VE hardware is unsupported under Windows. Disable the clang VE toolchain
tests here. Tests breaking because of non-POSIX path separators.

Reland "[LICM] Hoist LOAD without sinking the STORE"

When doing load/store promotion within LICM, if we
cannot prove that it is safe to sink the store we won't
hoist the load, even though we can prove the load could
be dereferenced and moved outside the loop. This patch
implements the load promotion by moving it in the loop
preheader by inserting proper PHI in the loop. The store
is kept as is in the loop. By doing this, we avoid doing
the load from a memory location in each iteration.

Please consider this small example:

loop {
  var = *ptr;
  if (var) break;
  *ptr= var + 1;
}
After this patch, it will be:

var0 = *ptr;
loop {
  var1 = phi (var0, var2);
  if (var1) break;
  var2 = var1 + 1;
  *ptr = var2;
}
This addresses some problems from [0].

[0] https://bugs.llvm.org/show_bug.cgi?id=51193

Differential revision: https://reviews.llvm.org/D113289

[BasicAA] Add tests for memset_pattern{4,8,16}.

This also removes the existing memset_pattern.ll test, which was relying
on GVN. It is also covered by the new test directly.

[DAG][PowerPC] Enable initial ISD::BITCAST SimplifyDemandedBits/SimplifyMultipleUseDemandedBits big-endian handling

This patch begins extending handling for peeking through bitcast nodes to big-endian targets as well as the existing little-endian case.

Differential Revision: https://reviews.llvm.org/D114676

[LICM] Adding the test as a precommit for the D113289

[ARM] Correct range in isLegalAddressImm

The ranges in isLegalAddressImm were off by one, not allowing the
maximum values for unscaled offsets.

Differential Revision: https://reviews.llvm.org/D114636

[llvm-readobj] Add support for machine-independent NetBSD ELF core notes.

Notes generated in NetBSD core files provide additional information about
processes. These notes are described in core.5, which can be viewed here:
https://man.netbsd.org/core.5

Differential Revision: https://reviews.llvm.org/D114635

[BuildLibCalls] Add support for memset_pattern{4,8}.

Add support for memset_pattern{4,8} similar to the existing
memset_pattern16 handling.

Reviewed By: ab

Differential Revision: https://reviews.llvm.org/D114883

[GlobalOpt] Fix assertion failure during instruction deletion

This fixes the assertion failure reported in https://reviews.llvm.org/D114889#3166417,
by making RecursivelyDeleteTriviallyDeadInstructionsPermissive()
more permissive. As the function accepts a WeakTrackingVH, even if
originally only Instructions were inserted, we may end up with
different Value types after a RAUW operation. As such, we should
not assume that the vector only contains instructions.

Notably this matches the behavior of the
RecursivelyDeleteTriviallyDeadInstructions() function variant which
accepts a single value rather than vector.

Use cc/c++ instead of gcc/g++ on FreeBSD.

All supported FreeBSD platforms do not have GCC in base anymore.

Differential Revision: https://reviews.llvm.org/D114530

[ARM] Add additional postinc distribute tests and regenerate tests. NFC

[Flang] Replace notifyMatchFailure with TODO hard failures

For unimplemented patterns we revert to using TODO hard failures instead of
notifyMatchFailure.

For fir.select_type revert to using mlir::emiterror.
For the fir.embox TODO on a type with len params we cannot add a test since the type cannot be converted to llvm.

Adding negative tests using not and checking for the error message.
TODO exits with an error in a build without assertion but aborts in a
build with assertions. Abort requires using not with the --crash
option. The two different usages of not is handled by using a custom
command %not_todo_cmd which is converted to not or not --crash
depending on the presence or absence of assertions. Using llvm-config
to check the presence of assertions.

Reviewed By: clementval, awarzynski

Differential Revision: https://reviews.llvm.org/D114371

Revert "Revert "[VE] Make VE official""

This reverts commit 27c9e8b45b25614a92539ac6787dbb5670d950b3.

Bugs exposed by AddressSanitizer have been reproduced and fixed locally:
* commit e37000f3bff384
* commit 435d44bf8ab392

[InferAttrs] Add memset_pattern{4,8} declarations to test.

[ELF] Discard input .note.gnu.build-id even with default --build-id=none

binutils 2.38 will adopt this behavior
https://sourceware.org/bugzilla/show_bug.cgi?id=28639

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D114910

[BuildLibCalls] Add additional attrs to memcpy_chk.

`memcpy_chk` can be treated like `memcpy`, with the exception that it
may not return (if it aborts the program).

See D114793 for a similar patch for `memset_chk`.

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D114863

[VE][NFC] Fix use-after-free in PVFMK expansion

There is custom expansion code for packed VFMK Pseudos in the VE
backend. This code erased the Pseudo without telling
ExpandPostRAPseudos about it, causing the generic expansion function to
access the erased Pseudo. This bug triggered in the
test/CodeGen/VE/VELIntrinsics/vfmk.ll test with asan-enabled builds.

Detected by:
sanitizer-x86_64-linux-fast
(https://lab.llvm.org/buildbot/#/builders/5/builds/15393)

[ORC] Fix ambiguous call to overloaded function.

This should fix the build failure at
https://lab.llvm.org/buildbot#builders/110/builds/8359

[clangd] IncludeClenaer: Don't mark forward declarations of a class if it's declared in the main file

This will mark more headers that are unrelated to used symbol but contain its
forawrd declaration. E.g. the following are examples of headers forward
declaring `llvm::StringRef`:

- clang/include/clang/Basic/Cuda.h
- llvm/include/llvm/Support/SHA256.h
- llvm/include/llvm/Support/TrigramIndex.h
- llvm/include/llvm/Support/RandomNumberGenerator.
- ... and more (~50 in total)

This patch is a reduced version of D112707 which was controversial.

Reviewed By: kadircet

Differential Revision: https://reviews.llvm.org/D114864

[fir] Add fir numeric intrinsic runtime call builder

This patch adds the FIR builder to generate the numeric intrinsic
runtime call.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D114477

Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: mleair <leairmark@gmail.com>

[llvm-c] Make LLVMAddAlias opaque pointer compatible

Deprecate LLVMAddAlias in favor of LLVMAddAlias2, which accepts a
value type and an address space. Previously these were extracted
from the pointer type.

Differential Revision: https://reviews.llvm.org/D114860

[GlobalOpt] Add test for PR39751 (NFC)

This has been fixed by D114889, as noted in the comments.

[clang-format] Add better support for co-routinues

Responding to a Discord call to help {D113977} and heavily inspired by the unlanded {D34225} add some support to help coroutinues from not being formatted from

```for co_await(auto elt : seq)```

to

```
for
co_await(auto elt : seq)
```

Because of the dominance of clang-format in the C++ community, I don't think we should make it the blocker that prevents users from embracing the newer parts of the standard because we butcher the layout of some of the new constucts.

Reviewed By: HazardyKnusperkeks, Quuxplusone, ChuanqiXu

Differential Revision: https://reviews.llvm.org/D114859

[NFC][sanitizer] Check &real_pthread_join

It's a weak function which may be undefined.

[openmp][amdgpu] Disable three tests in preparation for new runtime

[ARM] Teach getIntImmCostInst about the cost of saturating fp converts

Given a min(max(fptosi, INT_MIN), INT_MAX) with the correct constants,
we can now generate a fptosi.sat. But in the arm backend, the constant
can be treated as high cost, pulling it out of the basic block in a way
that the DAG combine can no longer see it. This teaches it again that it
is a low cost constant, not worth hoisting out.

Recommitted from 0e98659ea1193c with a fix for APInt comparison.

Differential Revision: https://reviews.llvm.org/D114380

[ORC] Add support for removing JITDylibs.

This allows JITDylibs to be removed from the ExecutionSession. Calling
ExecutionSession::removeJITDylib will disconnect the JITDylib from the
ExecutionSession and clear it (removing all trackers associated with it). The
JITDylib object will then be destroyed as soon as the last JITDylibSP pointing
at it is destroyed.

[ORC] Only use JITDylib::GeneratorsMutex while running generators.

GeneratorsMutex should prevent lookups from proceeding through the
generators of a single JITDylib concurrently (since this could
result in redundant attempts to generate definitions). Mutation of
the generators list itself should be done under the session lock.

[ORC] Hold ResourceTracker in MaterializationResponsibility.

This keeps the tracker alive for the lifetime of the MR. This is needed so that
we can check whether the tracker has become defunct before posting results (or
failure) for the MR.

[AMDGPU] Set most sched model resource's BufferSize to one

Using a BufferSize of one for memory ProcResources will result in better
ILP since it more accurately models the dependencies between memory ops
and their consumers on an in-order processor. After this change, the
scheduler will treat the data edges from loads as blocking so that
stalls are guaranteed when waiting for data to be retreaved from memory.
Since we don't actually track waitcnt here, this should do a better job
at modeling their behavior.

Practically, this means that the scheduler will trigger the 'STALL'
heuristic more often.

This type of change needs to be evaluated experimentally. Preliminary
results are positive.

Fixes: SWDEV-282962

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D114777

[AMDGPU][clang] Fix __builtin_nontemporal_store() failure on AMDGPU

Reviewed By: yaxunl, sameerds

Differential Revision: https://reviews.llvm.org/D114849

[X86][FP16] Only generate approximate rsqrt when Reciprocal is true for half type

We have reasonable fast sqrt and accurate rsqrt for half type due to the
limited fractions. So neither do we need multi steps refinement for
rsqrt nor replace sqrt by rsqrt.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114844

[X86] Insert FMUL for estimated non reciprocal SQRT when `RefinementSteps` = 0

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D114843

[lldb] Skip test_launch_scripted_process_stack_frames with ASan

This test is failing on the sanitized bot because of a
heap-use-after-free. Disabling the test to turn the bot
green again.

rdar://85954489.

[ELF] Prevent internalizing used comdat symbol

When a comdat symbol is defined in both bitcode and regular object
files, which are contained in the same archive, the linker could lose
the flag that the symbol is used in the regular object file and allow
LTO to internalize it, which led to "error: undefined symbol".

The issue was introduced in D79300.

Differential Revision: https://reviews.llvm.org/D114801

[mlir][drr] Simple heuristic to reduce chance of accidental nullptr dereference

When an attribute is optional & is given an additional constraint in
rewrite pattern that could lead to dereferencing null Attribute. Avoid
cases where the constraints checks attribute but has no check if null.

This should be improved to be more uniformly guarded.

[AMDGPU] Add a regclass flag for scalar registers

Along with vector RC flags, this scalar flag will
make various regclass queries like `isVGPR` more
accurate.

Regclasses other than vectors are currently set
with the new flag even though certain unallocatable
classes aren't truly scalars. It would be ok as long
as they remain unallocatable.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D110053

[libc++] Implement P1989R2: range constructor for string_view

Implement P1989R2 which adds a range constructor for `string_view`.

Adjust `operator/=` in `path` to avoid atomic constraints caching issue
getting provoked from this PR.

Add defaulted template argument to `string_view`'s "sufficient
overloads" to avoid mangling issues in `clang-cl` builds. It is a
MSVC mangling bug that this works around.

Differential Revision: https://reviews.llvm.org/D113161

[NFC][sanitizer] Fix "not used" warning in test

[lldb] Fix DYLD_INSERT_LIBRARIES on AS

Don't make DYLD_INSERT_LIBRARIES conditional on the host triple
containing x86.

[tests] Precommit tests for writeonly argument attribute inference