review.tizen.org Git - platform/upstream/llvm.git/log

[clang-cl] Support the /HOTPATCH flag

This patch adds support for the MSVC /HOTPATCH flag: https://docs.microsoft.com/sv-se/cpp/build/reference/hotpatch-create-hotpatchable-image?view=msvc-170&viewFallbackFrom=vs-2019

The flag is translated to a new -fms-hotpatch flag, which in turn adds a 'patchable-function' attribute for each function in the TU. This is then picked up by the PatchableFunction pass which would generate a TargetOpcode::PATCHABLE_OP of minsize = 2 (which means the target instruction must resolve to at least two bytes). TargetOpcode::PATCHABLE_OP is only implemented for x86/x64. When targetting ARM/ARM64, /HOTPATCH isn't required (instructions are always 2/4 bytes and suitable for hotpatching).

Additionally, when using /Z7, we generate a 'hot patchable' flag in the CodeView debug stream, in the S_COMPILE3 record. This flag is then picked up by LLD (or link.exe) and is used in conjunction with the linker /FUNCTIONPADMIN flag to generate extra space before each function, to accommodate for live patching long jumps. Please see: https://github.com/llvm/llvm-project/blob/d703b922961e0d02a5effdd4bfbb23ad50a3cc9f/lld/COFF/Writer.cpp#L1298

The outcome is that we can finally use Live++ or Recode along with clang-cl.

NOTE: It seems that MSVC cl.exe always enables /HOTPATCH on x64 by default, although if we did the same I thought we might generate sub-optimal code (if this flag was active by default). Additionally, MSVC always generates a .debug$S section and a S_COMPILE3 record, which Clang doesn't do without /Z7. Therefore, the following MSVC command-line "cl /c file.cpp" would have to be written with Clang such as "clang-cl /c file.cpp /HOTPATCH /Z7" in order to obtain the same result.

Depends on D43002, D80833 and D81301 for the full feature.

Differential Revision: https://reviews.llvm.org/D116511

AMDGPU: Fix asm in test using wrong IR type for physical register

AMDGPU/GlobalISel: Try to use s_and_b64 in ptrmask selection

Avoids a test diff with SDAG.

AMDGPU/GlobalISel: Regenerate test checks with -NEXT

AMDGPU/GlobalISel: Explicitly set -global-isel-abort in failure tests

If the default mode is the fallback, this would fail since it would
end up seeing the DAG failure message instead.

add tsan shared library

Add tsan shared library on Android. Only build tsan when minSdkVersion is above 23.

Reviewed By: danalbert, vitalybuka

Differential Revision: https://reviews.llvm.org/D108394

AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics

Emit an error if the return value is used on subtargets that do not
support them. Previously we were falling back to the DAG on selection
failure, where it would emit this error and then fail again.

[libc++] basic_string::resize_and_overwrite: Adopt LWG3645 (Not voted in yet)

Adopt LWG3645, which fixes the value categories of basic_string::resize_and_overwrite
https://timsong-cpp.github.io/lwg-issues/3645

Reviewed By: ldionne, #libc

Spies: libcxx-commits

Differential Revision: https://reviews.llvm.org/D116815

[NFC][SimplifyCFG] Add some tests for `invoke` merging

[lld][WebAssemlby] Convert test to check disassembly output. NFC

Differential Revision: https://reviews.llvm.org/D117739

optimize icmp-ugt-ashr

This diff optimizes the sequence icmp-ugt(ashr,C_1) C_2. InstCombine
already implements this optimization for sgt, and this patch adds
support ugt. This patch adds the check for UGT.

@craig.topper came up with the idea and proof:

  define i1 @src(i8 %x, i8 %y, i8 %c) {
    %cp1 = add i8 %c, 1
    %i = shl i8 %cp1, %y
    %i.2 = ashr i8 %i, %y
    %cmp = icmp eq i8 %cp1, %i.2
    ;Assume: C + 1 == (((C + 1) << y) >> y)
    call void @llvm.assume(i1 %cmp)

    ; uncomment for the sgt case
    %j = shl i8 %cp1, %y
    %j.2 = sub i8 %j, 1
    %cmp2 = icmp ne i8 %j.2, 127
    ;Assume (((c + 1 ) << y) - 1) != 127
    call void @llvm.assume(i1 %cmp2)

    %s = ashr i8 %x, %y
    %r = icmp sgt i8 %s, %c
    ret i1 %r
  }

  define i1 @tgt(i8 %x, i8 %y, i8 %c) {
    %cp1 = add i8 %c, 1
    %j = shl i8 %cp1, %y
    %j.2 = sub i8 %j, 1

    %r = icmp sgt i8 %x, %j.2
    ret i1 %r
  }

  declare void @llvm.assume(i1)

  This change is related to the optimizations in D117252.

  Differential Revision: https://reviews.llvm.org/D117365

[flang][NFC] Remove unused/duplicated kStridePosInDim

kStridePosInDim is a duplicate of kDimStridePos and is not used. Just
remove it.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D117784

[RISCV] Add tests for commuted vector/scalar VP patterns

This patch adds a variety of tests checking that we can match
vector/scalar instructions against masked VP intrinsics when the splat
is on the LHS. At this stage, we can't, despite us having
ostensibly-commutable ISel patterns for them. The use of V0 as the mask
operand interferes with the auto-generated ISel table.

AMDGPU/GlobalISel: Stop handling llvm.amdgcn.buffer.atomic.fadd

This code is not structured to handle the legacy buffer intrinsics and
was miscompiling them.

AMDGPU/GlobalISel: Fix selection of gfx90a FP atomics

The struct/raw forms for the buffer atomics now work as
expected. However, we're incorrectly handling the legacy form (which
we probably shouldn't handle at all). We also are not diagnosing the
use of the return value on gfx908. These will be addressed separately.

AMDGPU: Stop reserving 36-bytes before kernel arguments for amdpal

This was inheriting the mesa behavior, and as far as I know nobody is
using opencl kernels with amdpal. The isMesaKernel check was
irrelevant because this property needs to be held for all functions.

[mips] Improve vr4300 mulmul bugfix pass

When compiling with dwarf info, the mfix4300 flag introduced in
https://reviews.llvm.org/D116238 can miss some occurrences of the vr4300
mulmul bug if a debug instruction happens to be between two `muls`
instructions. This change skips debug instructions in order to fix
the mulmul bug detection.

Fixes https://github.com/llvm/llvm-project/issues/53094

Differential Revision: https://reviews.llvm.org/D117615

[GlobalISel] Fix incorrect sign extension when combining G_INTTOPTR and G_PTR_ADD

The GlobalISel combiner currently uses sign extension when manipulating
the LHS constant when combining a sequence of the following sequence of
machine instructions into a single constant:
```
  %0:_(s32) = G_CONSTANT i32 <CONSTANT>
  %1:_(p0) = G_INTTOPTR %0:_(s32)
  %2:_(s64) = G_CONSTANT i64 <CONSTANT>
  %3:_(p0) = G_PTR_ADD %1:_, %2:_(s64)
```

This causes an issue when the bit width of the first contant and the
target pointer size are different, as G_INTTOPTR has no sign extension
semantics.

This patch fixes this by capture an arbitrary precision in when matching
the constant, allowing the matching function to correctly zero extend
it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D116941

[FuncSpec] Add a reference, and some other clarifying comments. NFC.

{SLP] Delete dead code in favor of proper assert [NFC]

[SLP] Reduce nesting depth in calculateDependencies via for loop and early continue [NFC]

[ScalableVectors] Warn instead of error for invalid size requests.

This was intended to be fixed by D98856, but that only seemed to have
the desired behaviour when compiling to assembly using `-S`, not when
compiling into an object file or executable. Given that this was not
the intention of D98856, this patch fixes the behaviour.

Add missing include to fix modular build

[SLP] Add an asser to make a non-obvious precondition clear [NFC]

[OpenMPIRBuilder] Detect and fix ambiguous InsertPoints for createParallel.

When a Builder methods accepts multiple InsertPoints, when both point to
the same position, inserting instructions at one position will "move" the
other after the inserted position since the InsertPoint is pegged to the
instruction following the intended InsertPoint. For instance, when
creating a parallel region at Loc and passing the same position as AllocaIP,
creating instructions at Loc will "move" the AllocIP behind the Loc
position.

To avoid this ambiguity, add an assertion checking this condition and
fix the unittests.

In case of AllocaIP, an alternative solution could be to implicitly
split BasicBlock at InsertPoint, using the first as AllocaIP, the second
for inserting the instructions themselves. However, this solution is
specific to AllocaIP since AllocaIP will always have to be first. Hence,
this is an argument to generally handling ambiguous InsertPoints as API
sage error.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D117226

[gn build] (manually) port f29256a64a

[InstSimplify] Add test for load of non-integral pointer (NFC)

[Clang][AArch64][ARM] Unaligned Access Test Fix

Test fixed for the unaligned access warning.

[InstSimplify] Add test for reinterpret load of pointer type (NFC)

[X86] lowerToAddSubOrFMAddSub - lower 512-bit ADDSUB patterns to blend(fsub,fadd)

AVX512 doesn't provide a ADDSUB instruction, but if we've built this from a build vector of scalar fsub/fadd elements we can still lower to blend(fsub,fadd)

[MLGO] Improved support for AOT cross-targeting scenarios

The tensorflow AOT compiler can cross-target, but it can't run on (for
example) arm64. We added earlier support where the AOT-ed header and object
would be built on a separate builder and then passed at build time to
a build host where the AOT compiler can't run, but clang can be otherwise
built.

To simplify such scenarios given we now support more than one AOT-able
case (regalloc and inliner), we make the AOT scenario centered on whether
files are generated, case by case (this includes the "passed from a
different builder" scenario).
This means we shouldn't need an 'umbrella' LLVM_HAVE_TF_AOT, in favor of
case by case control. A builder can opt out of an AOT case by passing that case's
model path as `none`. Note that the overrides still take precedence.

This patch controls conditional compilation with case-specific flags,
which can be enabled locally, for the component where those are
available. We still keep an overall flag for some tests.

The 'development/training' mode is unchanged, because there the model is
passed from the command line and interpreted.

Differential Revision: https://reviews.llvm.org/D117752

[DebugInstrRef] Memoize variable order during sorting (NFC)

Instead of constructing DebugVariables and looking up the order
in the comparison function, compute the order upfront and then sort
a vector of (order, instr).

This improves compile-time by -0.4% geomean on CTMark ReleaseLTO-g.

Differential Revision: https://reviews.llvm.org/D117575

[X86] Fix v16f32 ADDSUB test

This was supposed to ensure we're not generating 512-bit ADDSUB nodes, but cut+paste typos meant we weren't generating a full 512-bit pattern

[Clang][RISCV] Change TARGET_BUILTIN to require zve32x for vector instruction

According to v-spec v1.0, `zve-32x` is the new minimum extension to include
to have vector instructions.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D112613

[llvm][vfs] Abstract in-memory node creation

The creation of in-memory VFS nodes happens in a single function that deduces what kind of node to create from the arguments. This leads to complicated if-then-else logic that's difficult to cleanly extend.

This patch abstracts away in-memory node creation via a type-erased factory function that's passed instead.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D117648

[llvm][vfs] NFC: Virtualize in-memory `getStatus`

This patch virtualizes the `getStatus` function on `InMemoryNode` in LLVM VFS. Currently, this is implemented via top-level function `getNodeStatus` that tries to cast `InMemoryNode *` into each subtype. Virtual functions seem to be the simpler solution here.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D117649

[mlir][memref] Add better support for identity layouts in memref.collapse_shape canonicalizer

When computing the new type of a collapse_shape operation, we need to at least
take into account whether the type has an identity layout, in which case we can
easily support dynamic strides. Otherwise, the canonicalizer creates invalid
IR.

Longer term, both the verifier and the canoncializer need to be extended to
support the general case.

Differential Revision: https://reviews.llvm.org/D117772

[clang][dataflow] Intersect ExprToLoc when joining environments

This is part of the implementation of the dataflow analysis framework.
See "[RFC] A dataflow analysis framework for Clang AST" on cfe-dev.

Reviewed-by: xazax.hun
Differential Revision: https://reviews.llvm.org/D117754

[flang][NFC] Remove extra braces

Noticed during the upstreaming process.

[Clang][AArch64][ARM] Unaligned Access Warning Added

Added warning for potential cases of
unaligned access when option
-mno-unaligned-access has been specified

Differential Revision: https://reviews.llvm.org/D116221

[EarlyCSE] Regenerate test checks (NFC)

[IRGen] Do not overwrite existing attributes in CGCall.

When adding new attributes, existing attributes are dropped. While
this appears to be a longstanding issue, this was highlighted by D105169
which dropped a lot of attributes due to adding the new noundef
attribute.

Ahmed Bougacha (@ab) tracked down the issue and provided the fix in
CGCall.cpp. I bundled it up and updated the tests.

[AArch64] Remove PRBAR0_ELn and PRLAR0_ELn sysregs.

The Armv8-R.64 architecture defines numbered MPU region registers with
indices 1-15, not 0-15. So there's no such register as PRBAR0_EL2 or
PRLAR0_EL1 (for example). The encodings that they would occupy are
used for the unnumbered PRBAR_ELn and PRLAR_ELn registers.

Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D117755

[MC] Add a disassembly test for Armv8-R sysregs.

This is the counterpart to llvm/test/MC/AArch64/armv8r-sysreg.s,
checking all the same encodings when fed to the disassembler.

[AMDGPU] Set MemoryVT for truncstores in tblgen.

GlobalISelEmitter was skipping these patterns when its predicates were
checked. This patch should allow us to select d16_hi stores in
GlobalISel.

Differential Revision: https://reviews.llvm.org/D117762

[clang-format] Indicate source location on test failure. NFC.

[X86] combineConcatVectorOps - remove superfluous Subtarget.hasAVX() check

This function only ever gets called by AVX targets, and we already assert for this at the top of the function

[X86] combineConcatVectorOps - add handling for X86ISD::VSHL/VSRL/VSRA

These can be handled the same as the vector shift by immediate variants that are already handled.

[AMDGPU] Regenerate some MIR checks

[flang][NFC] Move current inliner files in Dialect directory

This patch just move the files from the Transforms directory to
the Dialect directory.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D117661

[flang][NFC] Cleanup dependent dialects and make def homogenous

Remove unnecessary dependent dialect and make the definition of the
pass more homogenous with the two others.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D117688

[RISCV] Match RVV VF variants also through masked operations

This brings floating-point RVV vector/scalar support more in line with
the integer vector patterns, which can already match '.vx' instructions
with masked operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117697

Revert "[AArch64][SVE][VLS] Move extends into arguments of comparisons"

This reverts commit db04d3e30b3878ae39ef64eb0b0a1538644c7f6a, which
causes a buildbot failure.

[RISCV] Optimize lowering of floating-point -0.0

This idea has come up in several reviews -- D115978 and D105902 -- so I
can't take any credit for the idea. Instead of using a constant pool to
lower -0.0, we can emit a sequence of two instructions:

fmv.[hwd].x freg, zero
fsgnjn.[hsd] freg, freg, freg

This is only done when the floating-point type is legal.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117687

[MLIR] The return type in the `computeSingleVarRepr` function is modified to include equality expressions.

Earlier `computeSingleVarRepr` was returning a pair of upper bound and
lower bound indices of the inequality contraints that can be expressed
as a floordiv of an affine function. The equality expression can also be
expressed as a floordiv but contains only one index and hence the `LocalRepr`
class is introduced to facilitate this.

Reviewed By: Groverkss

Differential Revision: https://reviews.llvm.org/D117430

[lldb] Remove non-address bits from addresses given to memory tag commands

Although the memory tag commands use a memory tag manager to handle
addresses, that only removes the top byte.

That top byte is 4 bits of memory tag and 4 free bits, which is more
than it should strictly remove but that's how it is for now.

There are other non-address bit uses like pointer authentication.
To ensure the memory tag manager only has to deal with memory tags,
use the ABI plugin to remove the rest.

The tag access test has been updated to sign all the relevant pointers
and require that we're running on a system with pointer authentication
in addition to memory tagging.

The pointers will look like:
<4 bit user tag><4 bit memory tag><signature><bit virtual address>

Note that there is currently no API for reading memory tags. It will
also have to consider this when it arrives.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D117672

[lldb] Rename MemoryTagManager RemoveNonAddressBits to RemoveTagBits

This better describes the intent of the method. Which for AArch64
is removing the top byte which includes the memory tags.

It does not include pointer signatures, for those we need to use
the ABI plugin. The rename makes this a little more clear.

It's a bit awkward that the memory tag manager is removing the whole
top byte not just the memory tags but it's an improvement for now.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D117671

[IRBuilder] Migrate and-folding to value-based FoldAnd.

Similar to the migration of or-folding to FoldOr, there are a few cases
where the fold in IRBuilder::CreateAnd triggered directly. Those have
been updated.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D117431

[flang][NFC] Fix header guard and comment

[libcxx][test] Use TEST_HAS_BUILTIN in test code

... rather than using `__has_builtin` directly. This both (1) allows a compiler that doesn't speak `__has_builtin` to workaround with preprocessor magic, and (2) avoids diagnostics about things that look like function like macros after `#if` but are not.

[RISCV] Imply extensions in RISCVTargetInfo::initFeatureMap

Under ASTContext, clang only copies the features from the options with
Target->initFeatureMap, and no implications is done there. This makes
clang_cc1 fail to imply into `zve32x` for the vector extension, and test
cases will have to add ` -target-feature +experimental-zve32x` in order
to work.

This patch fixes it.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D113336

[flang][NFC] Fix header guard

Fix header guard to fit other files.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D117662

[mlir][bufferization] Move one-shot bufferization to Bufferization dialect

This commit is the first step towards unifying core bufferization and One-Shot Bufferize.

This commit does not move over the implementations of BufferizableOpInterface yet. This will be done in separate commits. This change does also not move the unit tests yet. The tests will be moved together with op interface implementations and split into separate files.

Differential Revision: https://reviews.llvm.org/D117641

[llvm][AArch64] Accept armv8.8 "hbc" and "mops" in .arch_extension directive

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D117693

[LegalizeTypes][VP] Add widening support for vp.gather and vp.scatter

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117557

[clang-format] Add tests for aligning `operator=` with `=delete`. NFC.

Also, add test case from https://github.com/llvm/llvm-project/issues/33044.
This was actually fixed in https://github.com/llvm/llvm-project/commit/480a1fab72f4e367a5986d572914d252c318431d, but there were no tests for delete.

Reapply [MemCpyOpt] Look through pointer casts when checking capture

This is a recommit of the patch without changes. The reason for
the revert has been addressed in D117679.

-----

The user scanning loop above looks through pointer casts, so we
also need to strip pointer casts in the capture check. Previously
the source was incorrectly considered not captured if a bitcast
was passed to the call.

Reapply [MemCpyOpt] Make capture check during call slot optimization more precise

This is a recommit of the patch without changes. The reason for
the revert has been addressed in D117679.

-----

Call slot optimization is currently supposed to be prevented if
the call can capture the source pointer. Due to an implementation
bug, this check currently doesn't trigger if a bitcast of the source
pointer is passed instead. I'm somewhat afraid of the fallout of
fixing this bug (due to heavy reliance on call slot optimization
in rust), so I'd like to strengthen the capture reasoning a bit first.

In particular, I believe that the capture is fine as long as a)
the call itself cannot depend on the pointer identity, because
neither dest has been captured before/at nor src before the
call and b) there is no potential use of the captured pointer
before the lifetime of the source alloca ends, either due to
lifetime.end or a return from a function. At that point the
potentially captured pointer becomes dangling.

Differential Revision: https://reviews.llvm.org/D115615

[clang-tidy] Revert documentation change (NFC)

Restore a fix to the list of checks that was undone by a recent commit.

[RISCV] Add intrinsic for Zbt extension

RV32: fsl, fsr, fsri
RV64: fsl, fsr, fsri, fslw, fsrw, fsriw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117468

[MemCpyOpt] Fix metadata merging during call slot optimization

Call slot optimization currently merges the metadata between the
call and the load. However, we also need to merge in the metadata
of the store.

Part of the reason why we might have gotten away with this
previously is that usually the load and the store are the same
instruction (a memcpy), this can only happen if call slot
optimization occurs on an actual load/store pair.

This addresses the issue reported in
https://reviews.llvm.org/D115615#3251386.

Differential Revision: https://reviews.llvm.org/D117679

[Support] Remove incorrect noalias return attribute in BumpPtrAllocator

The memory returned by the Allocate() function is also otherwise
accessible -- and is indeed accessed by the DestroyAll() method of
SpecificBumpPtrAlloactor. This is a violation of the noalias return
contract.

This should address the issue reported in
https://reviews.llvm.org/D116728#3252464.

Differential Revision: https://reviews.llvm.org/D117664

[lld-macho] Fix grammar in doc

[clang-format] Fix bug in parsing `operator<` with template

Fixes https://github.com/llvm/llvm-project/issues/44601.

This patch handles a bug when parsing a below example code :

```
template <class> class S;

template <class T> bool operator<(S<T> const &x, S<T> const &y) {
  return x.i < y.i;
}

template <class T> class S {
  int i = 42;
  friend bool operator< <>(S const &, S const &);
};

int main() { return S<int>{} < S<int>{}; }
```
which parse `< <>` as `<< >`, not `< <>` in terms of tokens as discussed in discord.

1. Add a condition in `tryMergeLessLess()` considering `operator` keyword and `>`
2. Force to leave a whitespace between `tok::less` and a template opener
3. Add unit test

Reviewed By: MyDeveloperDay, curdeius

Differential Revision: https://reviews.llvm.org/D117398

[RISCV] Add the zve extension according to the v1.0 spec

`zve` is the new standard vector extension to specify varying degrees of
vector support for embedding processors. The `zve` extension is related
to the `zvl` extension and other updates that are added in v1.0.

According to https://github.com/riscv-non-isa/riscv-c-api-doc/pull/21,
Clang defines macro `__riscv_v_max_elen`, `__riscv_v_max_elen_fp` for
`zve` and it can be used by applications that uses the vector extension.

Authored by: Zakk Chen <zakk.chen@sifive.com> @khchen
Co-Authored by: Eop Chen <eop.chen@sifive.com> @eopXD

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D112408

[ORC] Fix another missing std::move from 9eb4939b862.

[ORC] Fix missing std::move from 9eb4939b862.

[ORC] Allow JITDylib::getDFSLinkOrder and friends to fail for defunct JITDylibs.

Calls to JITDylib's getDFSLinkOrder and getReverseDFSLinkOrder methods (both
static an non-static versions) are now valid to make on defunct JITDylibs, but
will return an error if any JITDylib in the link order is defunct.

This means that platforms can safely lookup link orders by name in response to
jit-dlopen calls from the ORC runtime, even if the call names a defunct
JITDylib -- the call will just fail with an error.

[ELF] Replace .zdebug string comparison with SHF_COMPRESSED check. NFC

[dsymutil] Don't print timestamp warning if we have no timestamp

If we have no timestamp (0), don't print the corresponding warning. The
binary holder already successfully ignores these cases, but the warning
for swift interface files was lacking it.

rdar://86036385

Differential revision: https://reviews.llvm.org/D117333

[OpenMP][AMDGPU] Optimize the linked in math libraries

Once we linked in math files, potentially even if we link in only other
"system libraries", we want to optimize the code again. This is not only
reasonable but also helps to hide various problems with the missing
attribute annotations in the math libraries.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D116906

[M68k][NFC] Rename Bt(BT) to Btst(BTST)

It seems that implementation of Bt refered from x86.
In M68k, Bt(BT) should be renamed to Btst(BTST).

Reviewed By: myhsu

Differential Revision: https://reviews.llvm.org/D117534

[OpenMP] Avoid costly shadow map traversals whenever possible

In the OpenMC app we saw `omp target update` spending an awful lot of
time in the shadow map traversal without ever doing any update there.
There are two cases that allow us to avoid the traversal completely.
The simplest thing is that small updates cannot (reasonably) contain
an attached pointer part. The other case requires to track in the
mapping table if an entry might contain an attached pointer as part.
Given that we have a single location shadow map entries are created,
the latter is actually fairly easy as well.

Differential Revision: https://reviews.llvm.org/D113124

[OpenMP] Introduce an environment variable to disable atomic map clauses

Atomic handling of map clauses was introduced to comply with the OpenMP
standard (see D104418). However, many apps won't need this feature which
can be costly in certain situations. To allow for applications to
opt-out we now introduce the `LIBOMPTARGET_MAP_FORCE_ATOMIC` environment
flag that voids the atomicity guarantee of the standard for map clauses
again, shifting the burden to the user.

This patch also de-duplicates the code that introduces the events used
to enforce atomicity as a cleanup.

Differential Revision: https://reviews.llvm.org/D117627

[WebAssembly] Support Wasm EH + Wasm SjLj

D108960 added support for SjLj using Wasm EH instructions, which we call
Wasm SjLj going forward. (We call the old SjLj Emscripten SjLj) But it
did not support using Wasm EH and Wasm SjLj together. So far users of
Wasm EH had to use Wasm EH with Emscripten SjLj, which had a certain
limitation and it suffered from bigger code size increases as well.

This enables using Wasm EH and Wasm SjLj together.
1. This redirects `catchswitch` and `cleanupret` that unwind to caller
   to `catch.dispatch.longjmp` BB, which is a `catchswitch` BB that
   handles longjmps.
2. D108960 converted all longjmpable `call`s to `invokes` that unwind to
   `catch.dispatch.longjmp`. This CL checks if the `call` is embedded
   within another `catchpad`, and if so, makes it unwind to its nearest
   parent's unwind destination, rather than `catch.dispatch.longjmp`.
   This is necessary to preserve the scoping structure.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D117610

[ELF] Remove StringRefZ

StringRefZ does not improve performance. Non-local symbols always have eagerly
computed nameSize. Most local symbols's lengths will be updated in either:

* shouldKeepInSymtab
* SymbolTableBaseSection::addSymbol

Its benefit is offsetted by strlen in every call site (sums up to 5KiB code in a
release x86-64 build), so using StringRefZ may be slower.

In a -s link (uncommon) there is minor speedup, like ~0.3% for clang and chrome.

Reviewed By: alexander-shaposhnikov

Differential Revision: https://reviews.llvm.org/D117644

[mlir] Fix GCC5 build broken by improper name redefinition

Combine to vpdpbusd when operand is constant and small enough.

Differential Revision: https://reviews.llvm.org/D116363

[AVR] Generate ELPM for loading byte/word from extended program memory

Reviewed By: aykevl

Differential Revision: https://reviews.llvm.org/D116493

[AVR][MC] Generate section '.progmemX.data' for extended flash banks

Reviewed By: aykevl

Differential Revision: https://reviews.llvm.org/D115987

[RISCV] Add a test to show the bug in the RA caused by reserved BP.

The bug is reported in https://github.com/llvm/llvm-project/issues/53016.

Authored-by: Kito Cheng <kito.cheng@sifive.com>
Differential Revision: https://reviews.llvm.org/D117738

[gn build] (manually) port 30c17e70a4d7

[llvm-dis] Add an option `dump-thinlto-index-only` in llvm-dis to read ThinLTO minimized code only.

[MLGO] Don't run the 'release' mode tests in non-autogenerated cases

Optimize shift and accumulate pattern in AArch64.

AArch64 supports unsigned shift right and accumulate. In case we see a
unsigned shift right followed by an OR. We could turn them into a USRA
instruction, given the operands of the OR has no common bits.

Differential Revision: https://reviews.llvm.org/D114405

[LoopPeel] Pass TripCount to computePeelCount by value instead of by reference. NFC

The TripCount is not modified by the function so it doesn't need
to be passed by reference. Verified by passing it as const reference
before changing to value.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D117735

[flang] Accept INDEX(..., BACK=array)

The intrinsic table entry for INDEX mistakenly required
the optional BACK= argument to be scalar, but it's an
elemental intrinsic that can accept a conforming array.

Differential Revision: https://reviews.llvm.org/D117700

[flang] Accept sparse argument keyword names for MAX/MIN

Accept any keyword argument names of the form "An" for
values of n >= 3 in calls to the intrinsic functions MAX, MIN,
and their variants, so long as "n" has no leading zero and
all the keywords are distinct. Previously, f18 was needlessly
requiring the names to be contiguous. When synthesizing keywords
to characterize the procedure's interface, don't conflict with
the program's keywords.

Differential Revision: https://reviews.llvm.org/D117701

[mlir] Make locations required when adding/creating block arguments

BlockArguments gained the ability to have locations attached a while ago, but they
have always been optional. This goes against the core tenant of MLIR where location
information is a requirement, so this commit updates the API to require locations.

Fixes #53279

Differential Revision: https://reviews.llvm.org/D117633