review.tizen.org Git - platform/upstream/llvm.git/log

[lld/mac] Make comment style uniform in start-end.s test

[lld/mac] Add support for segment$start$ and segment$end$ symbols

These symbols are somewhat interesting in that they create non-existing
segments, which as far as I know is the only way to create segments
that don't contain any sections.

Final part of part of PR50760. Like D106629, but for segments instead
of sections. I'm not aware of anything that needs this in practice.

Differential Revision: https://reviews.llvm.org/D106767

[lld/mac] Move output segment rename logic into OutputSegment

Fixes the output segment name if both -rename_section and
-rename_segment are used and the post-section-rename segment
name is the same as the pre-segment-rename segment name to
match ld64's behavior.

The motivation is that segment$start$ can create section-less segments,
and this makes a corner case in the interaction between segment$start and
-rename_segment in the upcoming segment$start patch.

Differential Revision: https://reviews.llvm.org/D106766

[lld/mac] Reland: Add tests for the interaction between -rename_section and -rename_segment

No behavior change.

Differential Revision: https://reviews.llvm.org/D106765

[libomptarget][amdgpu] More robust handling of failure to init HSA

If hsa_init fails, subsequent calls into hsa are not safe. Except for
hsa_init, but we don't retry on failure.

This patch:
- deletes a print that called into hsa to ask why it can't call into hsa
- drops a merge conflict block next to that print
- reliably initializes number of devices to zero
- skips the plugin destructor contents if the constructor failed to init hsa

Tested by making hsa_init return error, and by forcing the dynamic library
use which was then deleted from disk. Before this patch, both segv. After it,
friendly message about offloading being unavailable.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106774

Revert "[lld/mac] Add tests for the interaction between -rename_section and -rename_segment"

This reverts commit a6eb34624dcfa5a33caa0211f4a16710b22079c2.
The test fails, I screwed something up.

[lld/mac] Add tests for the interaction between -rename_section and -rename_segment

No behavior change.

Differential Revision: https://reviews.llvm.org/D106765

[docs] Update release notes to mention lli JIT engine switch

Revert "[VPlan] Add recipe for first-order rec phis, make splicing explicit."

Makes clang crash: https://reviews.llvm.org/D105008#2903350
This reverts commit d2a73fb44ea0b8c981e4b923f811f18793fc4770.

Also revert a minor formatting follow-up:
This reverts commit 82834a673246f27a541ffcc57e0eb65b008102ef.

Revert "[libomptarget] Build amdgpu plugin without hsa"

Inaccurate error handling around hsa_init

This reverts commit e30b3b23a4eddbc08b5648e643f0a0b456a57832.

[LangRef] Reorder two paragraphs for comdat

so that IMAGE_COMDAT_SELECT_LARGEST refers to the correct example.

[X86][AVX] Add getBROADCAST_LOAD helper function. NFCI.

Begin replacing individual getMemIntrinsicNode calls and setup (for X86ISD::VBROADCAST_LOAD + X86ISD::SUBV_BROADCAST_LOAD opcodes) with this getBROADCAST_LOAD helper.

[libomptarget] Build amdgpu plugin without hsa

Default to building the amdgpu plugin to use dlopen when hsa is
not found instead of disabling it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106600

[OpenMP] Introduce RAII to protect certain RTL calls from DCE

This patch introduces a new RAII struct that will temporarily make an OpenMP
RTL function have external linkage. This is done before the attributor is
invoked to prevent it from incorrectly removing some function definitions that
we will use later. For example, if we determine all calls to one function are
dead, because it has internal linkage it can safely be removed. Later when we
try to get an instance to that function to modify the source using
`getOrCreateRuntimeFunction` we will then get an empty declaration for that
function that won't be defined anywhere. This patch prevents this from
occurring.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106707

[NFC][Codegen][X86] Improve test coverage for insertions into XMM vector

[AArch64] Fix Local Deallocation for Homogeneous Prolog/Epilog

The stack adjustment for local deallocation was incorrectly ported.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D106760

[OpenMP][tests][NFC] Update test status for gcc 11 and 12

gcc 11 introduced support for depend clause, but the gomp interface of libomp
does not yet handle the information.
Also remove -fopenmp-version=50, which is no longer needed for clang, but not
supported by gcc.

[X86][SSE] LowerRotate - perform modulo on the amount splat source directly.

If the rotation amount is a known splat, perform the modulo on the splat source, and then perform the splat. That way the amount-extension performed later by LowerScalarVariableShift can fold the splats away without any multiple-use issues.

Fixes one of the concerns raised on D104156

[Attributes] Clean up handling of UB implying attributes (NFC)

Rather than adding methods for dropping these attributes in
various places, add a function that returns an AttrBuilder with
these attributes, which can then be used with existing methods
for dropping attributes. This is with an eye on D104641, which
also needs to drop them from returns, not just parameters.

Also be more explicit about the semantics of the method in the
documentation. Refer to UB rather than Undef, which is what this
is actually about.

[Attributes] Remove nonnull from UB-implying attributes

From LangRef:

> if the parameter or return pointer is null, poison value is
> returned or passed instead. The nonnull attribute should be
> combined with the noundef attribute to ensure a pointer is not
> null or otherwise the behavior is undefined.

Dropping noundef is sufficient to prevent UB. Including nonnull
in this method just muddies the semantics.

Revert rG939291041bb35b8088e3b61be2b8b3bc950f64a7 "[AMDGPU] Regenerate wave32.ll test checks"

This still breaks buildbots

[JITLink][RISCV] Run new test from 0ad562b48 only if the RISCV backend is enabled

[InstCombine] Fix PR47960 - Incorrect transformation of fabs with nnan flag

Bug Fix for PR: https://llvm.org/PR47960

This patch makes sure that the fast math flag used in the 'select'
instruction is the same as the 'fabs' instruction after the transformation.

Differential Revision: https://reviews.llvm.org/D101727

[OpenMP][NVPTX] Disable OpenMPOpt when building deviceRTLs

We build `deviceRTLs` with `-O1` by default, which also triggers OpenMPOpt. When
the info cache is created, some attributes are removed. As a result, although we
mark a few functions `noinline`, they are still inlined when the bitcode library
is generated. This can cause an issue in middle end optimization.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106710

[NFC][Codegen][X86] Improve test coverage for repeated insertions of the same scalar into different elements

[AMDGPU] Regenerate wave32.ll test checks

To simplify diff in future patch

[AMDGPU] Regenerate mul24 test checks

To simplify diffs in future patch

[x86] improve CMOV codegen by pushing add into operands, part 2

This is a minimum extension of D106607 to allow folding for
2 non-zero constantsi that can be materialized as immediates..

In the reduced test examples, we save 1 instruction by rolling
the constants into LEA/ADD. In the motivating test from the bullet
benchmark, we absorb both of the constant moves into add ops via
LEA magic, so we reduce by 2 instructions.

Differential Revision: https://reviews.llvm.org/D106684

[GlobalISel] Remove FlagsOp (NFC)

The class was introduced without a use on Dec 11, 2018 in commit
cef44a234219e38e1c28c902ff24586150eef682.

[Inline] Fix a warning by removing an explicit copy constructor

This patches fixes the warning:

  llvm/include/llvm/Analysis/InlineCost.h:62:3: error: definition of
  implicit copy assignment operator for 'CostBenefitPair' is
  deprecated because it has a user-declared copy constructor
  [-Werror,-Wdeprecated-copy]

by removing the explicit copy constructor.

[X86][AVX] Adjust AllowBWIVPERMV3 tolerance to account for VariableCrossLaneShuffleDepth

As noticed on D105390 - we were hardwiring the depth limit for combining to VPERMI2W/VPERMI2B instructions. Not only had we made the limit too low, we hadn't accounted for slow/fast shuffles via the VariableCrossLaneShuffleDepth control

[AMDGPU] Regenerate global-load-saddr-to-vaddr test checks

To simplify diff in future patch

[AMDGPU] Regenerate ctpop16 test checks

To simplify diff in future patch

[AMDGPU] Regenerate half test checks

To simplify diff in future patch

[AMDGPU] Regenerate anyext test checks

To simplify diff in future patch

[llvm][Inline] Add interface to return cost-benefit stuff

Return cost-benefit stuff which is computed by cost-benefit analysis.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D105349

[AArch64][GlobalISel] Widen non-pow-2 types for shifts before clamping.

For types like s96, we don't want to clamp to s64, we want to first widen to
s128 and then narrow it. Otherwise we end up with impossible to legalize types.

[mlir] Async: lower SCF operations into CFG inside coroutines

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D106747

[RISCV] Custom lower (i32 (fptoui/fptosi X)).

I stumbled onto a case where our (sext_inreg (assertzexti32 (fptoui X)), i32)
isel pattern can cause an fcvt.wu and fcvt.lu to be emitted if
the assertzexti32 has an additional user. If we add a one use check
it would just cause a fcvt.lu followed by a sext.w when only need
a fcvt.wu to satisfy both users.

To mitigate this I've added custom isel and new ISD opcodes for
fcvt.wu. This allows us to keep know it started life as a conversion
to i32 without needing to match multiple nodes. ComputeNumSignBits
has been taught that this new nodes produces 33 sign bits. To
prevent regressions when we need to zero extend the result of an
(i32 (fptoui X)), I've added a DAG combine to convert it to an
(i64 (fptoui X)) before type legalization. In most cases this would
happen in InstCombine, but a zero_extend can be created for function
returns or arguments.

To keep everything consistent I've added new nodes for fptosi as well.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106346

[Tests] Add additional tests for incorrect willreturn handling (NFC)

Highlight a few of the places that don't handle non-willreturn
calls correctly right now.

[Tests] Add missing willreturn attributes (NFC)

To retain the spirit of these tests after an upcoming change
to mayHaveSideEffect(), add willreturn attributes to a number
of functions.

[LICM] Extract debugify test (NFC)

Only one of the tests in the file wants to check debug info, so
move it into a separate file. This allows update_test_checks to
work.

[ADT] Remove WrappedPairNodeDataIterator (NFC)

The last use was removed on Jul 16, 2020 in commit
f1d4db4f0cdcbfeaee0840bf8a4fb5dc1b9b56fd.

[X86] Add additional div-mod-pair negative test coverage

As suggested on D106745

[mlir] Restore markUnknownOpDynamicallyLegal to call isDynamicallyLegal by default

Looks like an oversight from b7a464989955e6374b39b518e317b59b510d4dc5

This should probably have a test case ...

[BasicTTI] Set scalarization cost of scalable vector casts to Invalid.

When BasicTTIImpl::getCastInstrCost can't determine the cost of a
vector cast operation when the types need legalization, it falls
back to calculating scalarization costs. Instead of crashing on
`cast<FixedVectorType>(DstVTy)` when the type is a scalable vector,
return an Invalid cost.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106655

[X86] Add i128 div-mod-pair test coverage

[SVE][NFC] Cleanup fixed length code gen tests to make them more resilient.

Many of the tests have used NEXT when DAG is more approprite. In
some cases single DAG lines have been used. Note that these are
manual tests because they're to complex for update_llc_test_checks.py
and so it's worth not relying too much on the ordered output.

I've also made the CHECK lines more uniform when it comes to the
ordering of things like LO/HI.

[CGP] despeculateCountZeros - Don't create is-zero branch if cttz/ctlz source is known non-zero

If value tracking can confirm that the cttz/ctlz source is known non-zero then we don't need to create a branch (which DAG will struggle to recover from).

Differential Revision: https://reviews.llvm.org/D106685

[gn build] Port 6aa9e746ebde

[AVR] Only support sp, r0 and r1 in llvm.read_register

Most other registers are allocatable and therefore cannot be used.

This issue was flagged by the machine verifier, because reading other
registers is considered reading from an undefined register.

Differential Revision: https://reviews.llvm.org/D96969

[AVR] Fix rotate instructions

This patch fixes some issues with the RORB pseudo instruction.

  - A minor issue in which the instructions were said to use the SREG,
    which is not true.
  - An issue with the BLD instruction, which did not have an output operand.
  - A major issue in which invalid instructions were generated. The fix
    also reduce RORB from 4 to 3 instructions, so it's also a small
    optimization.

These issues were flagged by the machine verifier.

Differential Revision: https://reviews.llvm.org/D96957

[AVR] Expand large shifts early in IR

This patch makes sure shift instructions such as this one:

%result = shl i32 %n, %amount

are expanded just before the IR to SelectionDAG conversion to a loop so
that calls to non-existing library functions such as __ashlsi3 are
avoided. The generated code is currently pretty bad but there's a lot of
room for improvement: the shift itself can be done in just four
instructions.

Differential Revision: https://reviews.llvm.org/D96677

[AVR] Improve 8/16 bit atomic operations

There were some serious issues with atomic operations. This patch should
fix the biggest issues.

For details on the issue take a look at this Compiler Explorer sample:
https://godbolt.org/z/n3ndhn

Code:

    void atomicadd(_Atomic char *val) {
        *val += 5;
    }

Output:

    atomicadd:
        movw    r26, r24
        ldi     r24, 5     ; 'operand' register
        in      r0, 63
        cli
        ld      r24, X     ; load value
        add     r24, r26   ; value += X
        st      X, r24     ; store value back
        out     63, r0
        ret                ; return the wrong value (in r24)

There are various problems with this.

- The value to add (5) is stored in r24. However, the value to add to
   is loaded in the same register: r24.
- The `add` instruction adds half of the pointer to the loaded value,
   instead of (attempting to) add the operand with value 5.
- The output value of the cmpxchg instruction (which is not used in
   this code sample) is the new value with 5 added, not the old value.
   The LangRef specifies that it has to be the old value, before the
   operation.

This patch fixes the first two and leaves the third problem to be fixed
at a later date. I believe atomics were mostly broken before this patch,
with this patch they should become usable as long as you ignore the
output of the atomic operation. In particular it fixes the following
things:

- It sets the earlyclobber flag for the input ('$operand' operand) so
   that the register allocator puts it in a different register than the
   output value.
- It fixes a number of issues with the pseudo op expansion pass, for
   example now it adds the $operand field instead of the pointer. This
   fixes most machine instruction verifier issues (other flagged issues
   are unrelated to atomics).

Differential Revision: https://reviews.llvm.org/D97127

[AVR] Set R31R30 as clobbered after ADJCALLSTACKDOWN

In most cases, using R31R30 is fine because the call (which always
precedes ADJCALLSTACKDOWN) will clobber R31R30 anyway. However, in some
rare cases the register allocator might insert an instruction between
the call and the ADJCALLSTACKDOWN instruction and expect the register
pair to be live afterwards. I think this happens as a result of
rematerialization. Therefore, to fix this, the instruction needs to have
Defs set to R31R30.

Setting the Defs field does have the effect of making the instruction
look dead, which it certainly is not. This is fixed by setting
hasSideEffects to true.

Differential Revision: https://reviews.llvm.org/D97745

[AVR] Do not chain stores in call frame setup

Previously, AVRTargetLowering::LowerCall attempted to keep stack stores
in order with chains. Perhaps this worked in the past, but it does not
work now: it appears that the SelectionDAG legalization phase removes
these chains. Therefore, I've removed these chains entirely to match
X86 (which, similar to AVR, also prefers to use push instructions over
stack-relative stores to set up a call frame). With this change, all the
stack stores are in a somewhat reasonable order.

Differential Revision: https://reviews.llvm.org/D97853

[lld][WebAssembly] Align __heap_base

__heap_base was not aligned. In practice, it will often be aligned
simply because it follows the stack, but when the stack is placed at the
beginning (with the --stack-first option), the __heap_base might be
unaligned. It could even be byte-aligned.

At least wasi-libc appears to expect that __heap_base is aligned:
https://github.com/WebAssembly/wasi-libc/blob/659ff414560721b1660a19685110e484a081c3d4/dlmalloc/src/malloc.c#L5224

While WebAssembly itself does not appear to require any alignment for
memory accesses, it is sometimes required when sharing a pointer
externally. For example, WASI might expect alignment up to 8:
https://github.com/WebAssembly/WASI/blob/main/phases/snapshot/docs.md#-timestamp-u64

This issue got introduced with the addition of the --stack-first flag:
https://reviews.llvm.org/D46141
I suspect the lack of alignment wasn't intentional here.

Differential Revision: https://reviews.llvm.org/D106499

[mlir] ConversionTarget legality callbacks refactoring

* Get rid of Optional<std::function> as std::function already have a null state
* Add private setLegalityCallback function to set legality callback for unknown ops
* Get rid of unknownOpsDynamicallyLegal flag, use unknownLegalityFn state insted. This causes behavior change when user first calls markUnknownOpDynamicallyLegal with callback and then without but I am not sure is the original behavior was really a 'feature', or just oversignt in the original implementation.

Differential Revision: https://reviews.llvm.org/D105496

[clang][patch] Remove test artifact before running test for consistent results

Fix non-deterministic test behavior by removing previously-created
test directory, see comments in D95159

[DAG] Add initial SelectionDAG::isGuaranteedNotToBeUndefOrPoison framework (PR51129)

I've setup the basic framework for the isGuaranteedNotToBeUndefOrPoison call and updated DAGCombiner::visitFREEZE to use it, further Opcodes can be handled when we have test coverage.

I'm not aware of any vector test freeze coverage so the DemandedElts (and the Depth) args are not being used yet - but they are in place.

SelectionDAG::isGuaranteedNotToBePoison wrappers have also been added.

Differential Revision: https://reviews.llvm.org/D106668

[x86] add more tests for add with CMOV of constants; NFC

See D106607 / https://llvm.org/PR51069 for details.

[llvm] Inline getAssociatedFunction() in LLVM_DEBUG.

Function* F is used only inside LLVM_DEBUG, so that it causes unused
variable warning.

[InstCombine] Add freezeAllUsesOfArgument to visitFreeze

In D106041, a freeze was added before the branch condition to solve the miscompilation problem of SimpleLoopUnswitch.
However, I found that the added freeze disturbed other optimizations in the following situations.
```
arg.fr = freeze(arg)
use(arg.fr)
...
use(arg)
```
It is a problem that occurred when arg and arg.fr were recognized as different values.
Therefore, changing to use arg.fr instead of arg throughout the function eliminates the above problem.
Thus, I add a function that changes all uses of arg to freeze(arg) to visitFreeze of InstCombine.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D106233

Revert D106195 "[dfsan] Add wrappers for v*printf functions"

This reverts commit bf281f364757d6af8d9d8456f26d334d1eeaf575.

This commit causes dfsan to segfault.

[SimplifyCFG] Add additional if conversion tests (NFC)

Test a readonly call in between, as well as the combination of
an atomic and simple store.

[CMake] Add LIBXML2_DEFINITIONS when testing for symbol existance

Currently when linking LLVM against Libxml2, a simple check is performed to check whether it can be linked successfully. This check currently adds the include directories and the libraries for libxml2, but not definitions found by the config.

This causes issues on Windows when trying to link against a static libxml2. Libxml2 requires LIBXML_STATIC to be defined in the preprocessor to be able to link statically. This definition is put into LIBXML2_DEFINITIONS in the cmake config, but not properly forwarded to check_symbol_exists leading to it failing as it could not find xmlReadMemory in a DLL.

This patch simply appends the content of LIBXML2_DEFINITIONS to the symbol check definitions, fixing the issue.

Differential Revision: https://reviews.llvm.org/D106740

[CMake] Don't LTO optimize targets on Darwin, but only if its not ThinLTO

This is just a workaround. Pass the `-mllvm,-O0` link flags only if its
not ThinLTO. Doing that with ThinLTO currently results in an error:

```
Remaining virtual register operands
UNREACHABLE executed at .../llvm/lib/CodeGen/MachineRegisterInfo.cpp:209!
```

[GlobalISel] Add GUnmerge, GMerge, GConcatVectors, GBuildVector abstractions. NFC.

Use these to slightly simplify some code in the artifact combiner.

Re-re-re-apply "[ORC][ORC-RT] Add initial native-TLV support to MachOPlatform."

The ccache builders have recevied a config update that should eliminate the
build issues seen previously.

[gn build] Port 96709823ec37

[AMDGPU] Deduce attributes with the Attributor

This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.

Reviewed By: jdoerfert, arsenm

Differential Revision: https://reviews.llvm.org/D104997

[flang] runtime: fix problems with I/O around EOF & delimited characters

When a WRITE overwrites an endfile record, we need to forget
that there was an endfile record.  When doing a BACKSPACE
after an explicit ENDFILE statement, the position afterwards
must be upon the endfile record.

Attempts to join list-directed delimited character input across
record boundaries was due to a bad reading of the standard
and has been deleted, now that the requirements are better understood.
This problem would cause a read attempt past EOF if a delimited
character input value was at the end of a record.

It turns out that delimited list-directed (and NAMELIST) character
output is required to emit contiguous doubled instances of the
delimiter character when it appears in the output value.  When
fixed-size records are being emitted, as is the case with internal
output, this is not possible when the problematic character falls
on the last position of a record.  No two other Fortran compilers
do the same thing in this situation so there is no good precedent
to follow.

Because it seems least wrong, with this patch we now emit one copy
of the delimiter as the last character of the current record and
another as the first character of the next record.  (The
second-least-wrong alternative might be to flag a runtime error,
but that seems harsh since it's not an explicit error in the standard,
and the output may not have to be usable later as input anyway.)
Consequently, the output is not suitable for use as list-directed or
NAMELIST input.

If a later standard were to clarify this case, this behavior will of
course change as needed to conform.

Differential Revision: https://reviews.llvm.org/D106695

[flang] Runtime: Reset list-directed input state for each NAMELIST item

NAMELIST I/O formatting uses the runtime infrastructure for
list-directed I/O. List-directed input processing has same state
that requires reinitialization for each successive NAMELIST input
item. This patch fixes bugs with "null" items and repetition counts
on NAMELIST input items after the first in the group.

Differential Revision: https://reviews.llvm.org/D106694

[LLDB][GUI] Check fields validity in actions

This patch adds a virtual method HasError to fields, it can be
overridden by fields that have errors. Additionally, a form method
CheckFieldsValidity was added to be called by actions that expects all
the field to be valid.

Differential Revision: https://reviews.llvm.org/D106459

[LLDB][GUI] Add Platform Plugin Field

This patch adds a new Platform Plugin Field. It is a choices field that
lists all the available platform plugins and can retrieve the name of the
selected plugin. The default selected plugin is the currently selected
one. This patch also allows for arbitrary scrolling to make scrolling
easier when setting choices.

Differential Revision: https://reviews.llvm.org/D106483

[source maps] fix source mapping when there are multiple matching rules

D104406 introduced an error in which, if there are multiple matchings rules for a given path, lldb was only checking for the validity in the filesystem of the first match instead of looking exhaustively one by one until a valid file is found.

Besides that, a call to consume_front was being done incorrectly, as it was modifying the input, which renders subsequent matches incorrect.

I added a test that checks for both cases.

Differential Revision: https://reviews.llvm.org/D106723

[tests] SCEV trip count w/ neg step and varying rhs

Style tweaks for SCEV's computeMaxBECountForLT [NFC]

[LangRef] Clarify comdat

* ELF supports `nodeduplicate`.
* ELF calls the concept "section group". `GRP_COMDAT` emulates the PE COMDAT deduplication feature.
* "COMDAT group" is an ELF term. Avoid it for PE/COFF.
* WebAssembly supports comdat but only supports the `any` selection kind. https://bugs.llvm.org/show_bug.cgi?id=50531
* A comdat must be included or omitted as a unit. Both the compiler and the linker must obey this rule.
* A global object can be a member of at most one comdat.
* COFF requires a non-local linkage for non-`nodeduplicate` selection kinds.
* llvm.global_ctors/.llvm.global_dtors: if the third field is used on ELF, it must reference a global variable or function in a comdat

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D106300

[Attributor][FIX] checkForAllInstructions, correctly handle declarations

checkForAllInstructions was not handling declarations correctly.
It should have been returning false when it gets called on a declaration

The patch also fixes a test case for AAFunctionReachability for it to be able
to pass after the changes to the checkForAllinstructions.

Differential Revision: https://reviews.llvm.org/D106625

[cmake] Export LLVM_HOST_TRIPLE in the LLVMConfig.cmake

This is referenced in several of the cmake files that are part of an llvm install and it is also useful by downstream components such as onnx-mlir.

Differential Revision: https://reviews.llvm.org/D106686

[SCEV] Fix bug involving zero step and non-invariant RHS in trip count logic

Eli pointed out the issue when reviewing D104140. The max trip count logic makes an assumption that the value of IV changes. When the step is zero, the nowrap fact becomes trivial, and thus there's nothing preventing the loop from being nearly infinite. (The "nearly" part is because mustprogress may disallow an infinite loop while still allowing 999999999 iterations before RHS happens to allow an exit.)

This is very difficult to see in practice. You need a means to produce a loop varying RHS in a mustprogress loop which doesn't allow the loop to be infinite. In most cases, LICM or SCEV are smart enough to remove the loop varying expressions.

Differential Revision: https://reviews.llvm.org/D106327

[libc] Accommodate Fuchsia's death test framework in fenv tests.

Fuchsia's death test framework runs the closure which can die in a
different thread. Hence, the FP exceptions which cause the closure to
die should be enalbed in the closure.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D106683

[lld/mac] Fix comment typo in new start-end.s test

[Bazel] Swap stray td_srcs to deps

This is the last instance of td_srcs in MLIR core build files. `deps` is
generally preferred. There are still some cases where `td_srcs` is
useful where creating a td_library would just be another layer of
indirection, so not (yet) dropping it entirely.

Differential Revision: https://reviews.llvm.org/D106716

[NFC][SimplifyCFG] Add tests for `FoldTwoEntryPHINode()` with prof md

[ConstantFold] Fix GEP of GEP fold with opaque pointers

This was previously combining indices even though they operate on
different types. For non-opaque pointers, the condition is
automatically satisfied based on the pointer types being equal.

[ConstantFold] Extract GEP of GEP fold (NFCI)

Move this fold into a separate function and clean up the control
flow a bit.

[WebAssembly] Codegen for pmin and pmax

Replace the clang builtins and LLVM intrinsics for {f32x4,f64x2}.{pmin,pmax}
with standard codegen patterns. Since wasm_simd128.h uses an integer vector as
the standard single vector type, the IR for the pmin and pmax intrinsic
functions contains bitcasts that would not be there otherwise. Add extra codegen
patterns that can still select the pmin and pmax instructions in the presence of
these bitcasts.

Differential Revision: https://reviews.llvm.org/D106612

[WebAssembly][NFC] Simplify SIMD bitconvert pattern

Differential Revision: https://reviews.llvm.org/D106680

[OpenMP] always compile with c++14 instead of gnu++14

Fixes PR 51174. c++14 should be a more portable option than gnu++14.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D106632

[NFC][SimplifyCFG] Make 'conditional block' handling more straight-forward

This will simplify making use of profile weights
to not perform the speculation when obviously unprofitable.

[NFC][SimplifyCFG] FoldTwoEntryPHINode(): make better use of GetIfCondition() returning dom block

[NFC][BasicBlockUtils] Refactor GetIfCondition() to return the branch, not it's condition

Otherwise e.g. the FoldTwoEntryPHINode() has to do a lot of legwork
to re-deduce what is the dominant block (i.e. for which block
is this branch the terminator).

[NewPM] Add CrossDSOCFI pass irrespective of LTO optimization level

This pass is not an optimization and is needed for CFI functionality
(cross-dso verification).

Differential Revision: https://reviews.llvm.org/D106699

[libc] Clean up Windows macros

This clean-up removes checks for _WIN64, as the _WIN32 macro returns 1
whenever the compilation targe is 32- or 64-bit ARM.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D106706

[lld/mac] Fix start-stop.s test with expensive checks enabled

See e.g. https://lab.llvm.org/buildbot/#/builders/16/builds/14317
Not 100% sure why this fails yet, but this fixes it. Let's get
the bots green again first :)

Differential Revision: https://reviews.llvm.org/D106711

[OpenMP] Fix bug 50022

Bug 50022 [0] reports target nowait fails in certain case, which is added in this
patch. The root cause of the failure is, when the second task is created, its
parent's `td_incomplete_child_tasks` will not be incremented because there is no
parallel region here thus its team is serialized. Therefore, when the initial
thread is waiting for its unfinished children tasks, it thought there is only
one, the first task, because it is hidden helper task, so it is tracked. The
second task will only be pushed to the queue when the first task is finished.
However, when the first task finishes, it first decrements the counter of its
parent, and then release dependences. Once the counter is decremented, the thread
will move on because its counter is reset, but actually, the second task has not
been executed at all. As a result, since in this case, the main function finishes,
then `libomp` starts to destroy. When the second task is pushed somewhere, all
some of the structures might already have already been destroyed, then anything
could happen.

This patch simply moves `__kmp_release_deps` ahead of decrement of the counter.
In this way, we can make sure that the initial thread is aware of the existence
of another task(s) so it will not move on. In addition, in order to tackle
dependence chain starting with hidden helper thread, when hidden helper task is
encountered, we force the task to release dependences.

Reference:
[0] https://bugs.llvm.org/show_bug.cgi?id=50022

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D106519

[Libomptarget] Add unroll flag to shared variables loop

Unrolling this loop provides better performance in practice because it is
executed on the device and is likely to be very small.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D106692

[compiler-rt][NFC] add debugging options to iossim_run

Add the ability to:
1. tell simctl to wait for debugger when spawning process
2. print the command that is called to launch the process

Reviewed By: delcypher

Differential Revision: https://reviews.llvm.org/D106700