review.tizen.org Git - platform/upstream/llvm.git/log

[CMake] Link to compiler-rt if LIBUNWIND_USE_COMPILER_RT is ON.

Summary:
If `-nodefaultlibs` is given, we weren't actually linking to it. This
was true irrespective of passing `-rtlib=compiler-rt` (see previous
patch). Now we explicitly link it to handle that case.

I wonder if we should be linking these libraries only if we're using
`-nodefaultlibs`...

Reviewers: beanz

Subscribers: dberris, mgorny, christof, chrib, cfe-commits

Differential Revision: https://reviews.llvm.org/D51657

llvm-svn: 343990

[x86] make horizontal binop matching clearer; NFCI

The instructions are complicated, so this code will
probably never be very obvious, but hopefully this
makes it better.

As shown in PR39195:
https://bugs.llvm.org/show_bug.cgi?id=39195
...we need to improve the matching to not miss cases
where we're h-opping on 1 source vector, and that
should be a small patch after this rearranging.

llvm-svn: 343989

Remove remnant code of using indirect syscall on NetBSD

Summary:
The NetBSD version of internal routines no longer call
the indirect syscall interfaces, as these functions were
switched to lib calls.

Remove the remnant code complication that is no
longer needed after this change. Remove the variations
of internal_syscall, as they were NetBSD specific.

No functional change intended.

Reviewers: vitalybuka, joerg, javed.absar

Reviewed By: vitalybuka

Subscribers: kubamracek, fedor.sergeev, llvm-commits, #sanitizers

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D52955

llvm-svn: 343988

Don't hardcode -ldl in test/sanitizer_common/TestCases

Summary:
The dl library does not exist on all systems and in particular
this breaks the build on NetBSD. Make it conditional and
enable only for Linux, following the approach from other
test suites in the same repository.

Reviewers: joerg, vitalybuka

Reviewed By: vitalybuka

Subscribers: kubamracek, llvm-commits, #sanitizers

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D52994

llvm-svn: 343987

[TailCallElim] Enable marking of calls with byval as tails

In r339636 the alias analysis rules were changed with regards to tail calls
and byval arguments. Previously, tail calls were assumed not to alias
allocas from the current frame. This has been updated, to not assume this
for arguments with the byval attribute.

This patch aligns TailCallElim with the new rule. Tail marking can now be
more aggressive and mark more calls as tails, e.g.:

define void @test() {
  %f = alloca %struct.foo
  call void @bar(%struct.foo* byval %f)
  ret void
}

define void @test2(%struct.foo* byval %f) {
  call void @bar(%struct.foo* byval %f)
  ret void
}

define void @test3(%struct.foo* byval %f) {
  %agg.tmp = alloca %struct.foo
  %0 = bitcast %struct.foo* %agg.tmp to i8*
  %1 = bitcast %struct.foo* %f to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 40, i1 false)
  call void @bar(%struct.foo* byval %agg.tmp)
  ret void
}

The problematic case where a byval parameter is captured by a call is still
handled correctly, and will not be marked as a tail (see PR7272).

llvm-svn: 343986

AMDGPU/GlobalISel: Select amdgcn.cvt.pkrtz to 64-bit instructions

Summary: The 32-bit variants do not exist on VI+.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52958

llvm-svn: 343985

Fix incorrect Twine usage in CFGPrinter

CFGPrinter (-view-cfg, -dot-cfg) invokes undefined behaviour (dangling
pointer to rvalue) on IR files with branch weights. This patch fixes the
problem caused by Twine initialization and string conversion split into
two statements.
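
A minimal sketch of this bug class follows. It does not use the real
`llvm::Twine`; `LazyConcat`, `makeWeight` and `labelFor` are hypothetical
names chosen for illustration. The stand-in keeps references to its operands,
so it must be converted to a string within the same full-expression that
creates any temporary operand.

```cpp
#include <string>

// Hypothetical stand-in for a Twine-like lazy concatenation: it stores
// references to its operands instead of copying them.
class LazyConcat {
  const std::string &LHS;
  const std::string &RHS;

public:
  LazyConcat(const std::string &L, const std::string &R) : LHS(L), RHS(R) {}
  std::string str() const { return LHS + RHS; }
};

// Hypothetical helper that returns a temporary string.
std::string makeWeight(unsigned W) { return std::to_string(W); }

std::string labelFor(const std::string &Prefix, unsigned W) {
  // Broken pattern (left as a comment on purpose): the temporary returned by
  // makeWeight() is destroyed at the end of the first statement, so the
  // reference held by C dangles when str() runs in the second one.
  //   LazyConcat C(Prefix, makeWeight(W));
  //   return C.str();

  // Safe pattern: build and convert in one full-expression, while the
  // temporary is still alive.
  return LazyConcat(Prefix, makeWeight(W)).str();
}
```

For example, `labelFor("label=", 42)` yields `"label=42"`.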

This change fixes bug 37019. A similar patch for this problem was
provided in the llvmlite project.

Patch by mcopik (Marcin Copik).

Differential Revision: https://reviews.llvm.org/D52933

llvm-svn: 343984

Fix a broken buildbot.

llvm-svn: 343983

[clang-move] Dump whether a declaration is templated.

llvm-svn: 343982

Disable TestCases/pthread_mutexattr_get on NetBSD

The pshared feature is unsupported on NetBSD as of today.

llvm-svn: 343981

Fix Posix/devname_r for NetBSD

NetBSD returns a different type as a return value of
devname_r(3) than FreeBSD and Darwin (int vs char*).

This implies that checking for successful completion of this
function has to be handled differently.

This test used to work well, but was switched to fix Darwin,
which broke NetBSD.

Add a dedicated ifdef for NetBSD and make it functional again
for this OS.

llvm-svn: 343980

Avoid unnecessary buffer allocation and memcpy for compressed sections.

Previously, we uncompressed all compressed sections before doing anything.
That works and is conceptually simple, but it could result in
a waste of CPU time and memory if the uncompressed sections are then
discarded or just copied to the output buffer.

In particular, if .debug_gnu_pub{names,types} are compressed and if no
-gdb-index option is given, we wasted CPU and memory because we
uncompressed them into newly allocated buffers and then memcpy'd the buffers
to the output buffer. That temporary buffer was redundant.

This patch changes how sections are uncompressed. Now, compressed sections
are uncompressed lazily. To do that, the `Data` member of `InputSectionBase`
is now hidden from outside, and the `data()` accessor automatically expands
a compressed buffer if necessary.

If no one calls `data()`, then `writeTo()` directly uncompresses
compressed data into the output buffer. That eliminates the redundant
memory allocation and redundant memcpy.
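
The lazy scheme can be sketched roughly as below. This is not the lld code:
`LazySection` and `expandInto` are hypothetical, and a toy run-length
expansion stands in for real decompression so that the example stays
self-contained.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decompressor: expands (count, byte) pairs
// and appends the result to Out.
static void expandInto(const std::vector<unsigned char> &Raw,
                       std::string &Out) {
  for (std::size_t I = 0; I + 1 < Raw.size(); I += 2)
    Out.append(static_cast<std::size_t>(Raw[I]),
               static_cast<char>(Raw[I + 1]));
}

class LazySection {
  std::vector<unsigned char> Raw; // compressed bytes
  std::string Cache;              // filled on the first data() call
  bool Expanded = false;

public:
  explicit LazySection(std::vector<unsigned char> R) : Raw(std::move(R)) {}

  // Accessor: expands into a cached buffer only when someone asks for it.
  const std::string &data() {
    if (!Expanded) {
      expandInto(Raw, Cache);
      Expanded = true;
    }
    return Cache;
  }

  // Writes to the output. If data() was never called, the bytes are expanded
  // directly into Out and no intermediate buffer is allocated.
  void writeTo(std::string &Out) const {
    if (Expanded)
      Out += Cache;
    else
      expandInto(Raw, Out);
  }

  bool expanded() const { return Expanded; }
};
```

In this sketch, a section that is only ever written out never fills `Cache`,
which mirrors the allocation and memcpy that the patch removes.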

This patch significantly reduces memory consumption (20 GiB max RSS to
15 GiB) for an executable whose .debug_gnu_pub{names,types} are in total
5 GiB in an uncompressed form.

Differential Revision: https://reviews.llvm.org/D52917

llvm-svn: 343979

AMDGPU: Future-proof {raw,struct}.buffer.atomic intrinsics

Summary:
The ISA is really supposed to support 64-bit atomics as well,
so the data type should be an overload.

Mesa doesn't use these atomics yet, in fact I noticed this
issue while trying to use the atomics from Mesa.

Change-Id: I77f58317a085a0d3eb933cc7e99308c48a19f83e

Reviewers: tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D52291

llvm-svn: 343978

TableGen/CodeGenDAGPatterns: addPredicateFn only once

Summary:
The predicate function is added in InlinePatternFragments, no need to
do it here. As a result, all uses of addPredicateFn are located in
InlinePatternFragments.

Test confirmed that there are no changes to generated files when
building all (non-experimental) targets.

Change-Id: I720e42e045ca596eb0aa339fb61adf6fe71034d5

Reviewers: arsenm, rampitec, RKSimon, craig.topper, hfinkel, uweigand

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D51993

llvm-svn: 343977

Fix test case for @r343970

op2 for weakodr symbols is 101 from bcanalyzer.

llvm-svn: 343976

[x86] add hadd test with no undefs, remove duplicate tests; NFC

llvm-svn: 343975

[x86] simplify hadd tests; NFC

The tests from PR39195 don't use 2 parameters. That's the
root problem for the pattern matching in isHorizontalBinOp().

llvm-svn: 343974

[AMDGPU] Add an AMDGPU specific atomic optimizer.

This commit adds a new IR level pass to the AMDGPU backend to perform
atomic optimizations. It works by:

- Running through a function and finding atomicrmw add/sub or uses of
  the atomic buffer intrinsics for add/sub.
- If all arguments except the value to be added/subtracted are uniform,
  record the value to be optimized.
- Run through the atomic operations we can optimize and, depending on
  whether the value is uniform/divergent use wavefront wide operations
  (DPP in the divergent case) to calculate the total amount to be
  atomically added/subtracted.
- Then let only a single lane of each wavefront perform the atomic
  operation, reducing the total number of atomic operations in flight.
- Lastly we recombine the result from the single lane to each lane of
  the wavefront, and calculate our individual lanes offset into the
  final result.
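
The steps above can be illustrated with a scalar simulation. This is only a
sketch of the idea, not the pass itself: `wavefrontAtomicAdd` is a
hypothetical name, the lanes are modelled as vector elements, and a plain
loop stands in for the wavefront-wide (DPP) reduction.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Returns, for each lane, the value it would have read back from its own
// atomic add, while issuing a single atomic operation for the whole group.
std::vector<unsigned>
wavefrontAtomicAdd(std::atomic<unsigned> &Counter,
                   const std::vector<unsigned> &LaneValues) {
  // Exclusive prefix sum: each lane's offset into the combined add.
  std::vector<unsigned> Offsets(LaneValues.size());
  unsigned Total = 0;
  for (std::size_t Lane = 0; Lane < LaneValues.size(); ++Lane) {
    Offsets[Lane] = Total;
    Total += LaneValues[Lane];
  }

  // Only one "lane" performs the atomic, adding the wavefront total.
  const unsigned Base = Counter.fetch_add(Total);

  // Recombine: broadcast the old value and add each lane's own offset.
  std::vector<unsigned> Results(LaneValues.size());
  for (std::size_t Lane = 0; Lane < LaneValues.size(); ++Lane)
    Results[Lane] = Base + Offsets[Lane];
  return Results;
}
```

With a counter at 100 and lane values 1, 2, 3, 4, the lanes see 100, 101,
103, 106 and the counter ends at 110, which matches one valid ordering of
four separate atomic adds.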

Differential Revision: https://reviews.llvm.org/D51969

llvm-svn: 343973

[ELF][HEXAGON] Add R_HEX_GOT_16_X support

Differential Revision: https://reviews.llvm.org/D52909

llvm-svn: 343972

Don't use back-quotes in a run line.

This works on Windows, but seems to be breaking tests that
use an external shell (e.g. bash) because backquote has special
meaning.

This particular argument wasn't crucial for the test, so I've
just removed it.

llvm-svn: 343971

[ThinLTO] Keep non-prevailing (linkonce|weak)_odr symbols live

Summary:
If we have a symbol with (linkonce|weak)_odr linkage, we do not want
to dead strip it even if it is not prevailing.

IR level (linkonce|weak)_odr symbol can become non-prevailing when we mix
ELF objects and IR objects where the (linkonce|weak)_odr symbol in the ELF
object is prevailing and the ones in the IR objects are not. Stripping
them will prevent us from doing optimizations with them.

By not dead stripping them, we will convert these symbols to
available_externally linkage because they are non-prevailing, and eventually
drop them after inlining.

I modified cache-prevailing.ll to use linkonce linkage, as it is
testing whether the cache prevailing bit is effective, not whether
we should treat linkonce_odr symbols as alive.

Reviewers: tejohnson, pcc

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D52893

llvm-svn: 343970

[AArch64][v8.5A] Don't create BR instructions in outliner when BTI enabled

When branch target identification is enabled, we can only do indirect
tail-calls through x16 or x17. This means that the outliner can't
transform a BLR instruction at the end of an outlined region into a BR.

Differential revision: https://reviews.llvm.org/D52869

llvm-svn: 343969

[AArch64][v8.5A] Restrict indirect tail calls to use x16/17 only when using BTI

When branch target identification is enabled, all indirectly-callable
functions start with a BTI C instruction. This instruction can only be
the target of certain indirect branches (direct branches and
fall-through are not affected):
- A BLR instruction, in either a protected or unprotected page.
- A BR instruction in a protected page, using x16 or x17.
- A BR instruction in an unprotected page, using any register.

Without BTI, we can use any non call-preserved register to hold the
address for an indirect tail call. However, when BTI is enabled, then
the code being compiled might be loaded into a BTI-protected page, where
only x16 and x17 can be used for indirect tail calls.

Legacy code without this restriction can still indirectly tail-call
BTI-protected functions, because they will be loaded into an unprotected
page, so any register is allowed.

Differential revision: https://reviews.llvm.org/D52868

llvm-svn: 343968

[AArch64][v8.5A] Branch Target Identification code-generation pass

The Branch Target Identification extension, introduced to AArch64 in
Armv8.5-A, adds the BTI instruction, which is used to mark valid targets
of indirect branches. When enabled, the processor will trap if an
instruction in a protected page tries to perform an indirect branch to
any instruction other than a BTI. The BTI instruction uses encodings
which were NOPs in earlier versions of the architecture, so BTI-enabled
code will still run on earlier hardware, just without the extra
protection.

There are 3 variants of the BTI instruction, which are valid targets for
different kinds of branches:
- BTI C can be targeted by call instructions, and is intended to be
  used at function entry points. These are the BLR instruction, as well
  as BR with x16 or x17. These BR instructions are allowed for use in
  PLT entries, and we can also use them to allow indirect tail-calls.
- BTI J can be targeted by BR only, and is intended to be used by jump
  tables.
- BTI JC acts as both a BTI C and a BTI J instruction, and can be
  targeted by any BLR or BR instruction.
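
The compatibility rules listed above can be written as a small predicate.
This is an illustrative sketch only: the enum and function names are
hypothetical, and it models an indirect branch executed from a protected
page, where a target that is not a BTI instruction traps.

```cpp
// Kinds of instruction an indirect branch may land on.
enum class Landing { BTI_C, BTI_J, BTI_JC, Other };

// Kinds of indirect branch, split by the register used for BR.
enum class Branch { BLR, BR_X16_X17, BR_OtherReg };

// True if the branch may land on the given instruction without trapping.
bool isAllowedTarget(Landing L, Branch B) {
  switch (L) {
  case Landing::BTI_C:
    // Calls: BLR, plus BR through x16 or x17.
    return B == Branch::BLR || B == Branch::BR_X16_X17;
  case Landing::BTI_J:
    // Jumps: BR only.
    return B == Branch::BR_X16_X17 || B == Branch::BR_OtherReg;
  case Landing::BTI_JC:
    // Accepts any BLR or BR.
    return true;
  case Landing::Other:
    // Not a BTI instruction, so the processor traps.
    return false;
  }
  return false;
}
```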

Note that RET instructions are not restricted by branch target
identification; the reason for this is that return addresses can be
protected more effectively using return address signing. Direct branches
and calls are also unaffected, as it is assumed that an attacker cannot
modify executable pages (if they could, they wouldn't need to do a
ROP/JOP attack).

This patch adds a MachineFunctionPass which:
- Adds a BTI C at the start of every function which could be indirectly
  called (either because it is address-taken, or externally visible so
  could be address-taken in another translation unit).
- Adds a BTI J at the start of every basic block which could be
  indirectly branched to. This could be either done by a jump table, or
  by taking the address of the block (e.g. using the GCC label values
  extension).

We only need to use BTI JC when a function is indirectly-callable, and
takes the address of the entry block. I've not been able to trigger this
from C or IR, but I've included a MIR test just in case.

Using BTI C at function entries relies on the fact that no other code in
BTI-protected pages uses indirect tail-calls, unless they use x16 or x17
to hold the address. I'll add that code-generation restriction as a
separate patch.

Differential revision: https://reviews.llvm.org/D52867

llvm-svn: 343967

[GlobalISel][X86] Support G_UDIV/G_UREM/G_SREM

Support G_UDIV/G_UREM/G_SREM. The instruction selection
code is taken from FastISel with only minor tweaks to adapt
for GlobalISel.

Differential Revision: https://reviews.llvm.org/D49781

llvm-svn: 343966

[x86] add 16 missed hadd patterns (PR39195); NFC

llvm-svn: 343965

[Sanitizer] fix internal_sysctlbyname build for FreeBSD.

llvm-svn: 343964

[clangd] Update the out-of-date yaml-symbol-file flag in clangd.

Summary:
The flag is stale due to the recent changes of clangd indexer, this
patch renames the flag to "index-file".

Reviewers: sammccall

Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D52976

llvm-svn: 343963

[IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.

The IRBuilder CreateIntrinsic method wouldn't allow you to specify the
types that you wanted the intrinsic to be mangled with. To fix this
I've:

- Added an ArrayRef<Type *> member to both CreateIntrinsic overloads.
- Used that array to pass into the Intrinsic::getDeclaration call.
- Added a CreateUnaryIntrinsic to replace the most common use of
CreateIntrinsic where the type was auto-deduced from operand 0.
- Added a bunch more unit tests to test Create*Intrinsic calls that
weren't being tested (including the FMF flag that wasn't checked).

This was suggested as part of the AMDGPU specific atomic optimizer
review (https://reviews.llvm.org/D51969).

Differential Revision: https://reviews.llvm.org/D52087

llvm-svn: 343962

[AsmParser] Return an error in the case of empty symbol ref in an expression

The following instruction:

> str q28, [x0, #1*6*4*@]

contains a @ which is parsed as an empty symbol. The parser returns true
but has no error, so the assembler continues by ignoring the
instruction.

Differential Revision: https://reviews.llvm.org/D52645

llvm-svn: 343961

[ARM] Account for implicit IT when calculating inline asm size

When deciding if it is safe to optimize a conditional branch to a CBZ or
CBNZ the offsets of the BasicBlocks from the start of the function are
estimated. For inline assembly the generic getInlineAsmLength() function is
used to get a worst case estimate of the inline assembly by multiplying the
number of instructions by the max instruction size of 4 bytes. This
unfortunately doesn't take into account the generation of Thumb implicit IT
instructions. In edge cases such as when all the instructions in the block
are 4-bytes in size and there is an implicit IT then the size is
underestimated. This can cause an out of range CBZ or CBNZ to be generated.

The patch takes a conservative approach and assumes that every instruction
in the inline assembly block may have an implicit IT.
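
As a rough arithmetic sketch of the difference between the two estimates: the
function names below are hypothetical, and the 2-byte size assumed for an IT
instruction and the exact formula are assumptions for illustration, not taken
from the patch.

```cpp
#include <cstdint>

// Widest instruction size used by the generic estimate (from the text above).
constexpr std::uint32_t MaxInstBytes = 4;
// Assumed size of a Thumb IT instruction.
constexpr std::uint32_t ITBytes = 2;

// Generic worst case: every instruction is 4 bytes.
constexpr std::uint32_t genericEstimate(std::uint32_t NumInsts) {
  return NumInsts * MaxInstBytes;
}

// Conservative worst case: every instruction may also carry an implicit IT.
constexpr std::uint32_t conservativeEstimate(std::uint32_t NumInsts) {
  return NumInsts * (MaxInstBytes + ITBytes);
}
```

For a block of three instructions, the generic estimate is 12 bytes, while
the conservative one is 18 bytes, so a block that really contains implicit IT
instructions is no longer underestimated.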

Fixes pr31805

Differential Revision: https://reviews.llvm.org/D52834

llvm-svn: 343960

[AArch64] Fix verifier error when outlining indirect calls

The MachineOutliner for AArch64 transforms indirect calls into indirect
tail calls, replacing the call with the TCRETURNri pseudo-instruction.
This pseudo lowers to a BR, but has the isCall and isReturn flags set.

The problem is that TCRETURNri takes a tcGPR64 as the register argument,
to prevent indirect tail-calls from using caller-saved registers. The
indirect calls transformed by the outliner could use caller-saved
registers. This is fine, because the outliner ensures that the register
is available at all call sites. However, this causes a verifier failure
when the register is not in tcGPR64. The fix is to add a new
pseudo-instruction like TCRETURNri, but which accepts any GPR.

Differential revision: https://reviews.llvm.org/D52829

llvm-svn: 343959

[RISCV] Update alu8.ll and alu16.ll test cases

The srli test in alu8.ll was a no-op, as it shifted by 8 bits. Fix this, and
also change the immediate in alu16.ll, as shifting by something other than a
power of 8 is more interesting.

llvm-svn: 343958

[DebugInfo][PDB] Fix a signed/unsigned conversion warning

Fix the following warning when compiling with clang (caused by commit
rL343951):

GlobalsStream.cpp:61:33: warning: comparison of integers of different
signs: 'int' and 'uint32_t'

This also avoids double evaluation of `GlobalsTable.HashBuckets.size()`.

llvm-svn: 343957

[InstCombine] Fix incongruous GEP type addrspace

Currently, running the @insertelem_after_gep function below through the InstCombine pass with opt produces invalid IR.

Input:
```
define void @insertelem_after_gep(<16 x i32>* %t0) {
   %t1 = bitcast <16 x i32>* %t0 to [16 x i32]*
   %t2 = addrspacecast [16 x i32]* %t1 to [16 x i32] addrspace(3)*
   %t3 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(3)* %t2, i64 0, i64 0
   %t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
   call void @extern_vec_pointers_func(<16 x i32 addrspace(3)*> %t4)
   ret void
}
```

Output:

```
define void @insertelem_after_gep(<16 x i32>* %t0) {
  %t3 = getelementptr inbounds <16 x i32>, <16 x i32>* %t0, i64 0, i64 0
  %t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
  call void @my_extern_func(<16 x i32 addrspace(3)*> %t4)
  ret void
}
```

Although this causes no complaints when produced, it isn't valid IR, as the insertelement use of the %t3 GEP expects an address space.

```
opt: /tmp/bad.ll:52:73: error: '%t3' defined with type 'i32*' but expected 'i32 addrspace(3)*'
  %t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
```

I've fixed this by adding an addrspacecast after the GEP in the InstCombine pass, and including a check for this type mismatch to the verifier.

Reviewers: spatel, lebedev.ri
Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52294

llvm-svn: 343956

[SelectionDAGBuilder][NFC] Pass LHSTy to getShiftAmountTy rather than RHSTy

r126518 introduced a type parameter to the getShiftAmountTy target hook. It
produces the type of the shift (RHSTy), parameterised by the type of the value
being shifted (LHSTy). SelectionDAGBuilder::visitShift passed RHSTy rather
than LHSTy and this patch corrects this. The change is a no-op because in LLVM
IR the LHS and RHS types for a shift must be equal anyway.

llvm-svn: 343955

[LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160

At the point when we perform `emitTransformedIndex`, we have a broken IR (in
particular, we have Phis for which not every incoming value is properly set). On
such IR, it is illegal to create SCEV expressions, because their internal
simplification process may try to prove some predicates and break when it
stumbles across some broken IR.

The only purpose of using SCEV in this particular place is to attempt to simplify
the generated code slightly. It seems that the result isn't worth it, because
some trivial cases (like addition of zero and multiplication by 1) can be
handled separately if needed, but more generally InstCombine is able to achieve
the goals we want to achieve by using SCEV.

This patch fixes a functional crash described in PR39160, and as side-effect it
also generates a bit smarter code in some simple cases. It also may cause some
optimality loss (i.e. we will now generate `mul` by power of `2` instead of
shift etc), but there is nothing that InstCombine could not handle later. In
case of dire need, we can support more trivial cases just in place.

Note that this patch only fixes one particular case of the general problem that
LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR.
The general solution, however, seems complex enough.

Differential Revision: https://reviews.llvm.org/D52881
Reviewed By: fhahn, hsaito

llvm-svn: 343954

Fix a -Wsign-compare warning.

llvm-svn: 343953

Fix a compilation failure on non-MSVC compilers.

llvm-svn: 343952

[PDB] Add the ability to lookup global symbols by name.

The Globals table is a hash table keyed on symbol name, so
it's possible to lookup symbols by name in O(1) time. Add
a function to the globals stream to do this, and add an option
to llvm-pdbutil to exercise this, then use it to write some
tests to verify correctness.

llvm-svn: 343951

Revert r343948 "[LegalizeDAG] Make one of the ReplaceNode signatures take an ArrayRef instead a pointer to an array. Add assert on size of array. NFC"

The assert is failing some asan tests on the bots.

llvm-svn: 343950

[coro]Pass rvalue reference for named local variable to return_value

Summary:
Addressing https://bugs.llvm.org/show_bug.cgi?id=37265.

Implements [class.copy]/33 of coroutines TS.

When the criteria for elision of a copy/move operation are met, but not
for an exception-declaration, and the object to be copied is designated by an
lvalue, or when the expression in a return or co_return statement is a
(possibly parenthesized) id-expression that names an object with automatic
storage duration declared in the body or parameter-declaration-clause of the
innermost enclosing function or lambda-expression, overload resolution to select
the constructor for the copy or the return_value overload to call is first
performed as if the object were designated by an rvalue.

Patch by Tanoy Sinha!

Reviewers: modocache, GorNishanov

Reviewed By: modocache, GorNishanov

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D51741

llvm-svn: 343949

[LegalizeDAG] Make one of the ReplaceNode signatures take an ArrayRef instead a pointer to an array. Add assert on size of array. NFC

llvm-svn: 343948

[LegalizeDAG] Move legalization of scatter and masked store from LegalizeVectorOps to LegalizeDAG.

This is where we legalize gather and masked load so this is consistent.

Since these ops are always on vectors I've chosen to go with LegalizeDAG since that's what we do for other vector only ops like BUILD_VECTOR, VECTOR_SHUFFLE, etc. The ScalarizeMaskedMemIntrinsic pass should take care of scalarizing these before SelectionDAG so hopefully we don't need to worry about illegally typed scalar ops being emitted in the legalizing. If we did we would need to do this in LegalizeVectorOps so we could get the second type legalization that runs between LegalizeVectorOps and LegalizeDAG.

llvm-svn: 343947

[clangd] Migrate to LLVM STLExtras range API

llvm-svn: 343946

[DAGCombiner] allow undef elts in vector fadd matching

llvm-svn: 343945

[x86] add vector fadd with undef elts test; NFC

llvm-svn: 343944

[x86] remove redundant tests; NFC

The equivalent tests were added to the file with related folds in rL343941.

llvm-svn: 343943

[DAGCombiner] allow undefs when matching vector splats for fmul folds

llvm-svn: 343942

[x86] add vector fmul with undef elts tests; NFC

llvm-svn: 343941

[DAGCombiner] allow undef elts in vector fabs/fneg matching

This change is proposed as a part of D44548, but we
need this independently to avoid regressions from improved
undef propagation in SimplifyDemandedVectorElts().

llvm-svn: 343940

[DAGCombiner] shorten code for bitcast+fabs fold; NFC

llvm-svn: 343939

[x86] add tests for FP logic folding for vectors with undefs; NFC

llvm-svn: 343938

[clangd] NFC: Migrate to LLVM STLExtras API where possible

This patch improves readability by migrating `std::function(ForwardIt
start, ForwardIt end, ...)` to LLVM's STLExtras range-based equivalent
`llvm::function(RangeT &&Range, ...)`.
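
To illustrate the shape of this migration without depending on LLVM headers,
the sketch below defines a hypothetical `sketch::find` wrapper in the same
spirit as the range-based helpers. It is a stand-in for illustration, not the
STLExtras implementation.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

namespace sketch {
// Range-based wrapper: takes the whole container instead of two iterators.
template <typename RangeT, typename T>
auto find(RangeT &&Range, const T &Value) {
  return std::find(std::begin(Range), std::end(Range), Value);
}
} // namespace sketch

bool containsIteratorStyle(const std::vector<int> &V, int X) {
  // Before: the iterator pair is spelled out at every call site.
  return std::find(V.begin(), V.end(), X) != V.end();
}

bool containsRangeStyle(const std::vector<int> &V, int X) {
  // After: the range is passed once.
  return sketch::find(V, X) != V.end();
}
```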

Similar change in Clang: D52576.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D52650

llvm-svn: 343937

[InstSimplify] add vector test for fneg+fdiv; NFC

This should be fixed with D52934.

llvm-svn: 343936

[SelectionDAG] Respect multiple uses in SimplifyDemandedBits to SimplifyDemandedVectorElts simplification

rL343913 was using SimplifyDemandedBits's original demanded mask instead of the adjusted 'NewMask' that accounts for multiple uses of the op (those variable names really need improving....).

Annoyingly many of the test changes (back to pre-rL343913 state) are actually safe - but only because their multiple uses are all by PMULDQ/PMULUDQ.

Thanks to Jan Vesely (@jvesely) for bisecting the bug.

llvm-svn: 343935

[AARCH64][X86] Remove _nonsplat from test names

As discussed on D50222

llvm-svn: 343934

[LegalizeVectorOps] Make ExpandStrictFPOp return the result corresponding to the result number of the SDValue passed in.

It was always returning the chain which seems to be the result number of the SDValue in the lit tests we have. But I don't know if that's guaranteed.

llvm-svn: 343933

[IAI,LV] Avoid creating interleave-groups for predicated accesses

This patch fixes PR39099.

When strided loads are predicated, each of them will form an interleaved-group
(with gaps). However, subsequent stages of vectorization (planning and
transformation) assume that if a load is part of an Interleave-Group it is not
predicated, resulting in wrong code - unmasked wide loads are created.

The Interleaving Analysis does take care not to have conditional interleave
groups of size > 1, but until we extend the planning and transformation stages
to support masked-interleave-groups we should also avoid having them for
size == 1.

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D52682

llvm-svn: 343931

[RISCV] Introduce alu8.ll and alu16.ll tests

These track the quality of generated code for simple arithmetic operations
that were legalised from non-native types.

llvm-svn: 343930

[ORC] Consume unhandled errors in unit test.

This should fix the failures on the debug buildbots.

llvm-svn: 343929

[ORC] Add a 'remove' method to JITDylib to remove symbols.

Symbols can be removed provided that all are present in the JITDylib and none
are currently in the materializing state. On success all requested symbols are
removed. On failure an error is returned and no symbols are removed.
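
The all-or-nothing behaviour described above can be sketched as a
validate-then-commit loop. The types and the `removeSymbols` function below
are hypothetical and much simpler than the real JITDylib interface; an empty
string stands in for success.

```cpp
#include <map>
#include <string>
#include <vector>

enum class SymbolState { Ready, Materializing };

// Hypothetical symbol table: name -> state.
using SymbolTable = std::map<std::string, SymbolState>;

// Returns an empty string on success, or a description of the failure.
std::string removeSymbols(SymbolTable &Table,
                          const std::vector<std::string> &Names) {
  // Phase 1: check every requested symbol before touching the table.
  for (const std::string &Name : Names) {
    auto It = Table.find(Name);
    if (It == Table.end())
      return "missing symbol: " + Name;
    if (It->second == SymbolState::Materializing)
      return "symbol is materializing: " + Name;
  }
  // Phase 2: all checks passed, so remove everything.
  for (const std::string &Name : Names)
    Table.erase(Name);
  return std::string();
}
```

Because validation finishes before any erase happens, a failed request leaves
the table untouched.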

llvm-svn: 343928

[ORC] Pass symbol name to discard by const reference.

This saves some unnecessary atomic ref-counting operations.

llvm-svn: 343927

[X86] getFauxShuffleMask - Handle undef + sentinel values in subvector insertion

llvm-svn: 343926

[X86][SSE] Add SSE41 vector int2fp tests

llvm-svn: 343925

[X86][AVX] Ensure resolveTargetShuffleInputs shuffle masks are the correct width

Don't handle ZERO_EXTEND style shuffles until we support bitcasts. Found by inspection.

llvm-svn: 343924

Papers and Issues for San Diego

llvm-svn: 343923

[X86] combinePMULDQ - add op back to worklist if SimplifyDemandedBits succeeds on either operand

Prevents missing other simplifications that may occur deep in the operand chain where CommitTargetLoweringOpt won't add the PMULDQ back to the worklist itself

llvm-svn: 343922

[X86] Regenerate LSR loop iteration test

llvm-svn: 343921

[x86] add test for masked store with extra shift op; NFC

llvm-svn: 343920

[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - simplify PSHUFB masks

Attempt to simplify PSHUFB masks (even non-constant ones) - we should probably be able to simplify other variable shuffles as well as the need arises.

llvm-svn: 343919

[X86] Use the SimplifyDemandedBits wrappers where possible. NFCI.

Leave the wrapper to handle TargetLowering::TargetLoweringOpt and CommitTargetLoweringOpt.

llvm-svn: 343918

Revert rL343916: Fix -Wmissing-braces warning. NFCI.

llvm-svn: 343917

Fix -Wmissing-braces warning. NFCI.

llvm-svn: 343916

Wdocumentation fix

llvm-svn: 343915

Wdocumentation fix

llvm-svn: 343914

[SelectionDAG] Add SimplifyDemandedBits to SimplifyDemandedVectorElts simplification

This patch enables SimplifyDemandedBits to call SimplifyDemandedVectorElts in cases where the demanded bits mask covers entire elements of a bitcasted source vector.

There are a couple of cases here where simplification at a deeper level (such as through bitcasts) prevents further simplification - CommitTargetLoweringOpt only adds immediate uses/users back to the worklist when we might want to combine the original caller again to see what else it can simplify.

As well as that I had to disable handling of bool vector until SimplifyDemandedVectorElts better supports some of their opcodes (SETCC, shifts etc.).

Fixes PR39178

Differential Revision: https://reviews.llvm.org/D52935

llvm-svn: 343913

[clangd] Remove unused headers from CodeComplete.cpp

queue is not used after index-provided completions' merge with those from Sema
USRGeneration.h is not used after introduction of getSymbolID

llvm-svn: 343912

[RISCV] Compress addiw rd, x0, simm6 to c.li rd, simm6

A pattern was present for addi rd, x0, simm6 but not addiw which is
semantically identical when the source register is x0. This patch addresses
that, and the benefit can be seen in rv64c-aliases-valid.s.
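
A small sketch of the condition involved: the helper names are hypothetical,
registers are modelled as plain numbers with 0 meaning x0, and the `rd != x0`
requirement is an assumption about c.li that is not stated in this commit
message.

```cpp
#include <cstdint>

// A signed 6-bit immediate covers the range [-32, 31].
constexpr bool isSImm6(std::int64_t Imm) { return Imm >= -32 && Imm <= 31; }

// True if `addi rd, rs1, imm` or `addiw rd, rs1, imm` could be written as
// `c.li rd, imm` under the assumptions above.
constexpr bool canCompressToCLI(unsigned Rd, unsigned Rs1, std::int64_t Imm) {
  return Rd != 0 && Rs1 == 0 && isSImm6(Imm);
}
```

Under these assumptions `addiw x5, x0, -32` qualifies, while
`addiw x5, x0, 32` and `addiw x5, x6, 1` do not.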

llvm-svn: 343911

AMDGPU: Consolidate SMRD TableGen patterns

Summary:
Merge the SMRD patterns for CI into the same multiclass as the
patterns for other sub-targets.

This removes some duplicate code and will make it easier for some
future GlobalISel changes I would like to do.

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52557

llvm-svn: 343909

Thread safety analysis: Handle conditional expression in getTrylockCallExpr

Summary:
We unwrap conditional expressions containing try-lock functions.

Additionally we don't acquire on conditional expression branches, since
that is usually not helpful. When joining the branches we would almost
certainly get a warning then.

Hopefully fixes an issue that was raised in D52398.

Reviewers: aaron.ballman, delesley, hokein

Reviewed By: aaron.ballman

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D52888

llvm-svn: 343902

[llvm-ar] Use POSIX-specified timestamps for 'tv'.

Summary:
The POSIX spec says:

```
If the −t option is used with the −v option, the standard output format shall be:
"%s %u/%u %u %s %d %d:%d %d %s\n", <member mode>, <user ID>,
<group ID>, <number of bytes in member>,
<abbreviated month>, <day-of-month>, <hour>,
<minute>, <year>, <file>

where:

...
<abbreviated month>
Equivalent to the format of the %b conversion specification format in date.
<day-of-month>
Equivalent to the format of the %e conversion specification format in date.
<hour> Equivalent to the format of the %H conversion specification format in date.
<minute> Equivalent to the format of the %M conversion specification format in date.
<year> Equivalent to the format of the %Y conversion specification format in date.
```

This actually used to be the format printed by llvm-ar. It was apparently accidentally changed (see r207385 followed by comments in r207387). This makes it conform to GNU ar for easier replacement.
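
As a rough illustration of the quoted format string, the sketch below builds one such line in Python. The function name `format_tv_line`, the use of UTC, and the fixed English month table are assumptions made for the example. It applies the printf-style string literally and does not try to reproduce the exact padding of llvm-ar or GNU ar.

```python
import time

# Fixed English abbreviations so the result does not depend on the locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_tv_line(mode, uid, gid, size, mtime, name):
    """Format one archive member using the POSIX 'ar -tv' format string."""
    t = time.gmtime(mtime)  # assumption: UTC, to keep the example deterministic
    return "%s %u/%u %u %s %d %d:%d %d %s\n" % (
        mode, uid, gid, size,
        _MONTHS[t.tm_mon - 1], t.tm_mday, t.tm_hour, t.tm_min, t.tm_year,
        name,
    )


line = format_tv_line("rw-r--r--", 1000, 1000, 42, 0, "a.o")
```

The spec ties the date fields to date's %e, %H and %M conversions, which pad their output, so a real tool would typically print `00:00` where this literal reading prints `0:0`.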

Reviewers: MaskRay

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52940

llvm-svn: 343901

Add support for artificial tail call frames

This patch teaches lldb to detect when there are missing frames in a
backtrace due to a sequence of tail calls, and to fill in the backtrace
with artificial tail call frames when this happens. This is only done
when the execution history can be determined from the call graph and
from the return PC addresses of calls on the stack. Ambiguous sequences
of tail calls (e.g. anything involving tail calls and recursion) are
detected and ignored.
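
The idea can be sketched roughly as a search over tail-call edges. This is not lldb's implementation: `infer_tail_call_frames` and the dictionary-based call graph are hypothetical. `begin` stands for the function that the older frame is known to have called (from its return PC), and `end` for the function of the next frame actually on the stack.

```python
def infer_tail_call_frames(tail_calls, begin, end):
    """Return the missing frames between begin and end, or None if ambiguous.

    tail_calls maps a function name to the set of functions it tail-calls.
    """
    if begin == end:
        return []  # nothing is missing
    found = []

    def dfs(node, path, on_path):
        # Returns False as soon as the history is known to be ambiguous.
        for callee in sorted(tail_calls.get(node, ())):
            if callee in on_path:
                return False  # recursion through tail calls
            if callee == end:
                found.append(path)
                if len(found) > 1:
                    return False  # more than one possible history
                continue
            if not dfs(callee, path + [callee], on_path | {callee}):
                return False
        return True

    if not dfs(begin, [begin], {begin}) or len(found) != 1:
        return None
    return found[0]


chain = {"a": {"b"}, "b": {"c"}}
diamond = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}}
recursive = {"a": {"a", "b"}}
```

For `chain`, asking for the frames between `a` and `c` yields `["a", "b"]`. The `diamond` and `recursive` graphs yield `None`, which corresponds to the ambiguous sequences that are ignored.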

Depends on D49887.

Differential Revision: https://reviews.llvm.org/D50478

llvm-svn: 343900

Relax a data formatter test

Before inspecting the contents of a list, make sure that we've stepped
past the push_back() that inserts the element we're interested in.

llvm-svn: 343899

[New PM][PassTiming] implement -time-passes for the new pass manager

Enable time-passes functionality through PassInstrumentation callbacks
for passes and analyses.

TimePassesHandler class keeps all the callbacks, the timing data as it
is being collected as well as the stack of currently active timers.

Parts of the change that might not be obvious:
  - mapping of passes to Timers (TimingData) cannot be done per-instance.
    The PassID name passed to the callback is the same for all invocations of
    a pass. Thus the only way to get timing with reasonable granularity is to
    collect timing data per pass invocation, getting a new timer for each
    BeforePass. Hence the key for TimingData is a pair of
    <StringRef/unsigned count> that uniquely identifies a pass invocation.

  - consequently, this new-pass-manager implementation performs no aggregation
    of timing data, reporting timings for each pass invocation separately.
    In that it differs from the legacy-pass-manager time-passes implementation,
    which reports timing data aggregated per pass instance.

  - pass managers and adaptors are not tracked, similar to how pass managers are
    not tracked in legacy time-passes.

  - TimerStack tracks the timers that are active: each BeforePass pushes a new
    timer onto the stack, and each AfterPass pops the active timer and stops it.
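
The bookkeeping described in the list above can be sketched in a few lines. This is a simplified model, not the TimePassesHandler code: the class `TimePassesSketch`, its method names and the injectable `clock` are assumptions for illustration, and the sketch records inclusive wall-clock time only.

```python
import time
from collections import defaultdict


class TimePassesSketch:
    """Per-invocation pass timing keyed by (pass name, invocation count)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._counts = defaultdict(int)  # pass name -> invocations seen
        self._stack = []                 # active timers: (key, start time)
        self.timings = {}                # (pass name, count) -> elapsed

    def before_pass(self, pass_id):
        # The pass name alone is not unique, so pair it with a counter.
        self._counts[pass_id] += 1
        key = (pass_id, self._counts[pass_id])
        self._stack.append((key, self._clock()))

    def after_pass(self, pass_id):
        key, start = self._stack.pop()
        if key[0] != pass_id:
            raise ValueError("after_pass does not match the active timer")
        self.timings[key] = self._clock() - start


# Usage with a fake clock so the result is deterministic.
ticks = iter([0.0, 1.0, 3.0, 6.0])
handler = TimePassesSketch(clock=lambda: next(ticks))
for _ in range(2):
    handler.before_pass("InstCombinePass")
    handler.after_pass("InstCombinePass")
```

Two runs of the same pass end up as two separate entries rather than one aggregated total, which mirrors the reporting difference from the legacy pass manager noted above.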

Reviewers: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D51276

llvm-svn: 343898

[AArch64] -mcpu=native CPU detection for Cavium processors

This small patch updates the CPU detection for Cavium processors when
-mcpu=native is passed on the compile line.

Patch by Stefan Teleman
Differential Revision: https://reviews.llvm.org/D51939

llvm-svn: 343897

[llvm-nm] Update all tests to redirect stderr to stdout

This addresses the breakage introduced in r343887.

llvm-svn: 343896

X86, AArch64, ARM: Do not attach debug location to spill/reload instructions

This rebases and recommits r343520. hwasan should be fixed now and this
shouldn't break the tests anymore.

Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).

Differential Revision: https://reviews.llvm.org/D52125

llvm-svn: 343895

[COFF, ARM64] Add _InterlockedAdd intrinsic

Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, javed.absar, efriedma

Reviewed By: efriedma

Subscribers: efriedma, kristof.beyls, chrib, jfb, cfe-commits

Differential Revision: https://reviews.llvm.org/D52811

llvm-svn: 343894

Specify -mtriple=x86_64 in an X86-specific dwarf test

On the PPC bot, the %llc_dwarf substitution does not contain an -mtriple
argument. This can cause the wrong backend to be exercised.

This causes issues because the backends differ in when they decide to
emit tail calls:

http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/12440

This is mostly a speculative fix as I don't have a PPC machine to test
with.

llvm-svn: 343893

Emit CK_NoOp casts in C mode, not just C++.

Previously, Clang used CK_BitCast in C mode even for casts that only
change const/restrict/volatile. Now it will use CK_NoOp where
appropriate.

This is an alternate solution to r336746.

Differential Revision: https://reviews.llvm.org/D52918

llvm-svn: 343892

[X86][AVX] Limit getFauxShuffleMask INSERT_SUBVECTOR support to 2 inputs

rL343853 didn't limit the number of subinputs, but we don't currently support faux shuffles with more than 2 total inputs, so put a limiter in place until this is fixed.

Found by Artem Dergachev.

llvm-svn: 343891

[LiveDebugValues] Extend var ranges through artificial blocks

ASan often introduces basic blocks consisting exclusively of
instructions without debug locations, or with line 0 debug locations.

LiveDebugValues needs to extend variable ranges through these artificial
blocks. Otherwise, a lot of variables disappear -- even at -O0.

Typically, LiveDebugValues does not extend a variable's range into a
block unless the block is essentially "part of" the variable's scope
(for a precise definition, see LexicalScopes::dominates). This patch
relaxes the lexical dominance check for artificial blocks.
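
A minimal sketch of the relaxed check, assuming each instruction is represented only by its debug line (`None` for no debug location). The helpers `is_artificial_block` and `may_extend_range_into` are hypothetical and stand in for the real logic in LiveDebugValues.

```python
def is_artificial_block(debug_lines):
    """True if every instruction has no debug location or a line 0 location.

    An empty block also counts as artificial in this sketch.
    """
    return all(line is None or line == 0 for line in debug_lines)


def may_extend_range_into(debug_lines, scope_dominates_block):
    """Allow a variable range into a block if its scope dominates the block,
    or if the block is artificial (the relaxation described above)."""
    return scope_dominates_block or is_artificial_block(debug_lines)


asan_block = [None, 0, 0]  # e.g. instrumentation-only instructions
user_block = [0, 12, 13]   # contains real source lines
```

With these inputs, the range may be extended into `asan_block` even when the lexical dominance check fails, while `user_block` still requires it.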

This makes the following Swift program debuggable at -O0:
```
1| var x = 100
2| print("x = \(x)")
```

rdar://39127144

Differential Revision: https://reviews.llvm.org/D52921

llvm-svn: 343890

Clarify debug output in LiveDebugValues

MachineBasicBlocks often do not have names, so it helps to refer to them
by block number when printing debug messages.

llvm-svn: 343889

Disable the dwarf callsite attrs test on Windows

The Windows formats don't understand relocations inside of AT_return_pc.

http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/270

llvm-svn: 343888

[llvm-nm] Write "no symbol" output to stderr

This matches the output of binutils' nm and ensures that any scripts
or tools that use nm and expect empty output in case there are no symbols
don't break.

Differential Revision: https://reviews.llvm.org/D52943

llvm-svn: 343887

Avoid hardcoding PC addresses in a dwarf test

The PCs appear to vary from builder-to-builder:

http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/20053

llvm-svn: 343886

[GlobalISel] Add llvm.invariant.start and llvm.invariant.end

Port over the implementation in SelectionDAGBuilder.cpp into the IRTranslator
and update the arm64-irtranslator test.

These were causing fallbacks in CTMark/Bullet (-Rpass-missed=gisel-select),
and this patch fixes that.

https://reviews.llvm.org/D52945

llvm-svn: 343885

dwarfdump: Avoid parsing units unnecessarily

NFC-ish (the parsing of the units is not a functional change - no
errors/warnings are emitted during the shallow parsing - though without
parsing them here, the "max version" would be wrong (still zero) later
on, so in those cases the units do need to be parsed)

llvm-svn: 343884

[DebugInfo] Add support for DWARF5 call site-related attributes

DWARF v5 introduces DW_AT_call_all_calls, a subprogram attribute which
indicates that all calls (both regular and tail) within the subprogram
have call site entries. The information within these call site entries
can be used by a debugger to populate backtraces with synthetic tail
call frames.

Tail calling frames go missing in backtraces because the frame of the
caller is reused by the callee. Call site entries allow a debugger to
reconstruct a sequence of (tail) calls which led from one function to
another. This improves backtrace quality. There are limitations: tail
recursion isn't handled, variables within synthetic frames may not
survive to be inspected, etc. This approach is not novel, see:

https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=get&target=jelinek.pdf

This patch adds an IR-level flag (DIFlagAllCallsDescribed) which lowers
to DW_AT_call_all_calls. It adds the minimal amount of DWARF generation
support needed to emit standards-compliant call site entries. For easier
deployment, when the debugger tuning is LLDB, the DWARF requirement is
adjusted to v4.

Testing: Apart from check-{llvm, clang}, I built a stage2 RelWithDebInfo
clang binary. Its dSYM passed verification and grew by 1.4% compared to
the baseline. 151,879 call site entries were added.

rdar://42001377

Differential Revision: https://reviews.llvm.org/D49887

llvm-svn: 343883