review.tizen.org Git - platform/upstream/llvm.git/log

[ThinLTO] Keep non-prevailing (linkonce|weak)_odr symbols live

Summary:
If we have a symbol with (linkonce|weak)_odr linkage, we do not want
to dead strip it even it is not prevailing.

IR level (linkonce|weak)_odr symbol can become non-prevailing when we mix
ELF objects and IR objects where the (linkonce|weak)_odr symbol in the ELF
object is prevailing and the ones in the IR objects are not. Stripping
them will prevent us from doing optimizations with them.

By not dead stripping them, We will convert these symbols to
available_externally linkage as a result of non-prevailing and eventually
dropping them after inlining.

I modified cache-prevailing.ll to use linkonce linkage as it is
testing whether cache prevailing bit is effective or not, not
we should treat linkonce_odr alive or not

Reviewers: tejohnson, pcc

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D52893

llvm-svn: 343970

[AArch64][v8.5A] Don't create BR instructions in outliner when BTI enabled

When branch target identification is enabled, we can only do indirect
tail-calls through x16 or x17. This means that the outliner can't
transform a BLR instruction at the end of an outlined region into a BR.

Differential revision: https://reviews.llvm.org/D52869

llvm-svn: 343969

[AArch64][v8.5A] Restrict indirect tail calls to use x16/17 only when using BTI

When branch target identification is enabled, all indirectly-callable
functions start with a BTI C instruction. this instruction can only be
the target of certain indirect branches (direct branches and
fall-through are not affected):
- A BLR instruction, in either a protected or unprotected page.
- A BR instruction in a protected page, using x16 or x17.
- A BR instruction in an unprotected page, using any register.

Without BTI, we can use any non call-preserved register to hold the
address for an indirect tail call. However, when BTI is enabled, then
the code being compiled might be loaded into a BTI-protected page, where
only x16 and x17 can be used for indirect tail calls.

Legacy code withiout this restriction can still indirectly tail-call
BTI-protected functions, because they will be loaded into an unprotected
page, so any register is allowed.

Differential revision: https://reviews.llvm.org/D52868

llvm-svn: 343968

[AArch64][v8.5A] Branch Target Identification code-generation pass

The Branch Target Identification extension, introduced to AArch64 in
Armv8.5-A, adds the BTI instruction, which is used to mark valid targets
of indirect branches. When enabled, the processor will trap if an
instruction in a protected page tries to perform an indirect branch to
any instruction other than a BTI. The BTI instruction uses encodings
which were NOPs in earlier versions of the architecture, so BTI-enabled
code will still run on earlier hardware, just without the extra
protection.

There are 3 variants of the BTI instruction, which are valid targets for
different kinds or branches:
- BTI C can be targeted by call instructions, and is inteneded to be
  used at function entry points. These are the BLR instruction, as well
  as BR with x16 or x17. These BR instructions are allowed for use in
  PLT entries, and we can also use them to allow indirect tail-calls.
- BTI J can be targeted by BR only, and is intended to be used by jump
  tables.
- BTI JC acts ab both a BTI C and a BTI J instruction, and can be
  targeted by any BLR or BR instruction.

Note that RET instructions are not restricted by branch target
identification, the reason for this is that return addresses can be
protected more effectively using return address signing. Direct branches
and calls are also unaffected, as it is assumed that an attacker cannot
modify executable pages (if they could, they wouldn't need to do a
ROP/JOP attack).

This patch adds a MachineFunctionPass which:
- Adds a BTI C at the start of every function which could be indirectly
  called (either because it is address-taken, or externally visible so
  could be address-taken in another translation unit).
- Adds a BTI J at the start of every basic block which could be
  indirectly branched to. This could be either done by a jump table, or
  by taking the address of the block (e.g. the using GCC label values
  extension).

We only need to use BTI JC when a function is indirectly-callable, and
takes the address of the entry block. I've not been able to trigger this
from C or IR, but I've included a MIR test just in case.

Using BTI C at function entries relies on the fact that no other code in
BTI-protected pages uses indirect tail-calls, unless they use x16 or x17
to hold the address. I'll add that code-generation restriction as a
separate patch.

Differential revision: https://reviews.llvm.org/D52867

llvm-svn: 343967

[GlobalIsel][X86] Support G_UDIV/G_UREM/G_SREM

Support G_UDIV/G_UREM/G_SREM. The instruction selection
code is taken from FastISel with only minor tweaks to adapt
for GlobalISel.

Differential Revision: https://reviews.llvm.org/D49781

llvm-svn: 343966

[x86] add 16 missed hadd patterns (PR39195); NFC

llvm-svn: 343965

[Sanitizer] fix internal_sysctlbyname build for FreeBSD.

llvm-svn: 343964

[clangd] Update the out-of-date yaml-symbol-file flag in clangd.

Summary:
The flag is stale due to the recent changes of clangd indexer, this
patch renames the flag to "index-file".

Reviewers: sammccall

Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D52976

llvm-svn: 343963

[IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.

The IRBuilder CreateIntrinsic method wouldn't allow you to specify the
types that you wanted the intrinsic to be mangled with. To fix this
I've:

- Added an ArrayRef<Type *> member to both CreateIntrinsic overloads.
- Used that array to pass into the Intrinsic::getDeclaration call.
- Added a CreateUnaryIntrinsic to replace the most common use of
CreateIntrinsic where the type was auto-deduced from operand 0.
- Added a bunch more unit tests to test Create*Intrinsic calls that
weren't being tested (including the FMF flag that wasn't checked).

This was suggested as part of the AMDGPU specific atomic optimizer
review (https://reviews.llvm.org/D51969).

Differential Revision: https://reviews.llvm.org/D52087

llvm-svn: 343962

[AsmParser] Return an error in the case of empty symbol ref in an expression

The following instruction:

> str q28, [x0, #1*6*4*@]

contains a @ which is parsed as an empty symbol. The parser returns true
but has no error, so the assembler continues by ignoring the
instruction.

Differential Revision: https://reviews.llvm.org/D52645

llvm-svn: 343961

[ARM] Account for implicit IT when calculating inline asm size

When deciding if it is safe to optimize a conditional branch to a CBZ or
CBNZ the offsets of the BasicBlocks from the start of the function are
estimated. For inline assembly the generic getInlineAsmLength() function is
used to get a worst case estimate of the inline assembly by multiplying the
number of instructions by the max instruction size of 4 bytes. This
unfortunately doesn't take into account the generation of Thumb implicit IT
instructions. In edge cases such as when all the instructions in the block
are 4-bytes in size and there is an implicit IT then the size is
underestimated. This can cause an out of range CBZ or CBNZ to be generated.

The patch takes a conservative approach and assumes that every instruction
in the inline assembly block may have an implicit IT.

Fixes pr31805

Differential Revision: https://reviews.llvm.org/D52834

llvm-svn: 343960

[AArch64] Fix verifier error when outlining indirect calls

The MachineOutliner for AArch64 transforms indirect calls into indirect
tail calls, replacing the call with the TCRETURNri pseudo-instruction.
This pseudo lowers to a BR, but has the isCall and isReturn flags set.

The problem is that TCRETURNri takes a tcGPR64 as the register argument,
to prevent indiret tail-calls from using caller-saved registers. The
indirect calls transformed by the outliner could use caller-saved
registers. This is fine, because the outliner ensures that the register
is available at all call sites. However, this causes a verifier failure
when the register is not in tcGPR64. The fix is to add a new
pseudo-instruction like TCRETURNri, but which accepts any GPR.

Differential revision: https://reviews.llvm.org/D52829

llvm-svn: 343959

[RISCV] Update alu8.ll and alu16.ll test cases

The srli test in alu8.ll was a no-op, as it shifted by 8 bits. Fix this, and
also change the immediate in alu16.ll as shifted by something other than a
poewr of 8 is more interesting.

llvm-svn: 343958

[DebugInfo][PDB] Fix a signed/unsigned coversion warning

Fix the following warning when compiling with clang (caused by commit
rL343951):

GlobalsStream.cpp:61:33: warning: comparison of integers of different
signs: 'int' and 'uint32_t'

This also avoids double evaluation of `GlobalsTable.HashBuckets.size()`.

llvm-svn: 343957

[InstCombine] Fix incongruous GEP type addrspace

Currently running the @insertelem_after_gep function below through the InstCombine pass with opt produces invalid IR.

Input:
```
define void @insertelem_after_gep(<16 x i32>* %t0) {
   %t1 = bitcast <16 x i32>* %t0 to [16 x i32]*
   %t2 = addrspacecast [16 x i32]* %t1 to [16 x i32] addrspace(3)*
   %t3 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(3)* %t2, i64 0, i64 0
   %t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
   call void @extern_vec_pointers_func(<16 x i32 addrspace(3)*> %t4)
   ret void
}
```

Output:

```
define void @insertelem_after_gep(<16 x i32>* %t0) {
  %t3 = getelementptr inbounds <16 x i32>, <16 x i32>* %t0, i64 0, i64 0
  %t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
  call void @my_extern_func(<16 x i32 addrspace(3)*> %t4)
  ret void
}
```

Which although causes no complaints when produced, isn't valid IR as the insertelement use of the %t3 GEP expects an address space.

```
opt: /tmp/bad.ll:52:73: error: '%t3' defined with type 'i32*' but expected 'i32 addrspace(3)*'
  %t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
```

I've fixed this by adding an addrspacecast after the GEP in the InstCombine pass, and including a check for this type mismatch to the verifier.

Reviewers: spatel, lebedev.ri
Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52294

llvm-svn: 343956

[SelectionDAGBuilder][NFC] Pass LHSTy to getShiftAmountTy rather than RHSTy

r126518 introduced a a type parameter to the getShiftAmountTy target hook. It
produces the type of the shift (RHSTy), parameterised by the type of the value
being shifted (LHSTy). SelectionDAGBuilder::visitShift passed RHSTy rather
than LHSTy and this patch corrects this. The change is a no-op because in LLVM
IR the LHS and RHS types for a shift must be equal anyway.

llvm-svn: 343955

[LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160

At the point when we perform `emitTransformedIndex`, we have a broken IR (in
particular, we have Phis for which not every incoming value is properly set). On
such IR, it is illegal to create SCEV expressions, because their internal
simplification process may try to prove some predicates and break when it
stumbles across some broken IR.

The only purpose of using SCEV in this particular place is attempt to simplify
the generated code slightly. It seems that the result isn't worth it, because
some trivial cases (like addition of zero and multiplication by 1) can be
handled separately if needed, but more generally InstCombine is able to achieve
the goals we want to achieve by using SCEV.

This patch fixes a functional crash described in PR39160, and as side-effect it
also generates a bit smarter code in some simple cases. It also may cause some
optimality loss (i.e. we will now generate `mul` by power of `2` instead of
shift etc), but there is nothing what InstCombine could not handle later. In
case of dire need, we can support more trivial cases just in place.

Note that this patch only fixes one particular case of the general problem that
LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR.
The general solution, however, seems complex enough.

Differential Revision: https://reviews.llvm.org/D52881
Reviewed By: fhahn, hsaito

llvm-svn: 343954

Fix a -Wsign-compare warning.

llvm-svn: 343953

Fix a compilation failure on non-MSVC compilers.

llvm-svn: 343952

[PDB] Add the ability to lookup global symbols by name.

The Globals table is a hash table keyed on symbol name, so
it's possible to lookup symbols by name in O(1) time. Add
a function to the globals stream to do this, and add an option
to llvm-pdbutil to exercise this, then use it to write some
tests to verify correctness.

llvm-svn: 343951

Revert r343948 "[LegalizeDAG] Make one of the ReplaceNode signatures take an ArrayRef instead a pointer to an array. Add assert on size of array. NFC"

The assert is failing some asan tests on the bots.

llvm-svn: 343950

[coro]Pass rvalue reference for named local variable to return_value

Summary:
Addressing https://bugs.llvm.org/show_bug.cgi?id=37265.

Implements [class.copy]/33 of coroutines TS.

When the criteria for elision of a copy/move operation are met, but not
for an exception-declaration, and the object to be copied is designated by an
lvalue, or when the expression in a return or co_return statement is a
(possibly parenthesized) id-expression that names an object with automatic
storage duration declared in the body or parameter-declaration-clause of the
innermost enclosing function or lambda-expression, overload resolution to select
the constructor for the copy or the return_value overload to call is first
performed as if the object were designated by an rvalue.

Patch by Tanoy Sinha!

Reviewers: modocache, GorNishanov

Reviewed By: modocache, GorNishanov

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D51741

llvm-svn: 343949

[LegalizeDAG] Make one of the ReplaceNode signatures take an ArrayRef instead a pointer to an array. Add assert on size of array. NFC

llvm-svn: 343948

[LegalizeDAG] Move legalization of scatter and masked store from LegalizeVectorOps to LegalizeDAG.

This is where we legalize gather and masked load so this is consistent.

Since these ops are always on vectors I've chosen to go with LegalizeDAG since that's what we do for other vector only ops like BUILD_VECTOR, VECTOR_SHUFFLE, etc. The ScalarizeMaskedMemIntrinsic pass should take care of scalarizing these before SelectionDAG so hopefully we don't need to worry about illegally typed scalar ops being emitted in the legalizing. If we did we would need to do this in LegalizeVectorOps so we could get the second type legalization that runs between LegalizeVectorOps and LegalizeDAG.

llvm-svn: 343947

[clangd] Migrate to LLVM STLExtras range API

llvm-svn: 343946

[DAGCombiner] allow undef elts in vector fadd matching

llvm-svn: 343945

[x86] add vector fadd with undef elts test; NFC

llvm-svn: 343944

[x86] remove redundant tests; NFC

The equivalent tests were added to the file with related folds in rL343941.

llvm-svn: 343943

[DAGCombiner] allow undefs when matching vector splats for fmul folds

llvm-svn: 343942

[x86] add vector fmul with undef elts tests; NFC

llvm-svn: 343941

[DAGCombiner] allow undef elts in vector fabs/fneg matching

This change is proposed as a part of D44548, but we
need this independently to avoid regressions from improved
undef propagation in SimplifyDemandedVectorElts().

llvm-svn: 343940

[DAGCombiner] shorten code for bitcast+fabs fold; NFC

llvm-svn: 343939

[x86] add tests for FP logic folding for vectors with undefs; NFC

llvm-svn: 343938

[clangd] NFC: Migrate to LLVM STLExtras API where possible

This patch improves readability by migrating `std::function(ForwardIt
start, ForwardIt end, ...)` to LLVM's STLExtras range-based equivalent
`llvm::function(RangeT &&Range, ...)`.

Similar change in Clang: D52576.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D52650

llvm-svn: 343937

[InstSimplify] add vector test for fneg+fdiv; NFC

This should be fixed with D52934.

llvm-svn: 343936

[SelectionDAG] Respect multiple uses in SimplifyDemandedBits to SimplifyDemandedVectorElts simplification

rL343913 was using SimplifyDemandedBits's original demanded mask instead of the adjusted 'NewMask' that accounts for multiple uses of the op (those variable names really need improving....).

Annoyingly many of the test changes (back to pre-rL343913 state) are actually safe - but only because their multiple uses are all by PMULDQ/PMULUDQ.

Thanks to Jan Vesely (@jvesely) for bisecting the bug.

llvm-svn: 343935

[AARCH64][X86] Remove _nonsplat from test names

As discussed on D50222

llvm-svn: 343934

[LegalizeVectorOps] Make ExpandStrictFPOp return the result corresponding to the result number of the SDValue passed in.

It was always returning the chain which seems to be the result number of the SDValue in the lit tests we have. But I don't know if that's guaranteed.

llvm-svn: 343933

[IAI,LV] Avoid creating interleave-groups for predicated accesse

This patch fixes PR39099.

When strided loads are predicated, each of them will form an interleaved-group
(with gaps). However, subsequent stages of vectorization (planning and
transformation) assume that if a load is part of an Interleave-Group it is not
predicated, resulting in wrong code - unmasked wide loads are created.

The Interleaving Analysis does take care not to have conditional interleave
groups of size > 1, but until we extend the planning and transformation stages
to support masked-interleave-groups we should also avoid having them for
size == 1.

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D52682

llvm-svn: 343931

[RISCV] Introduce alu8.ll and alu16.ll tests

These track the quality of generated code for simple arithmetic operations
that were legalised from non-native types.

llvm-svn: 343930

[ORC] Consume unhandled errors in unit test.

This should fix the failures on the debug buildbots.

llvm-svn: 343929

[ORC] Add a 'remove' method to JITDylib to remove symbols.

Symbols can be removed provided that all are present in the JITDylib and none
are currently in the materializing state. On success all requested symbols are
removed. On failure an error is returned and no symbols are removed.

llvm-svn: 343928

[ORC] Pass symbol name to discard by const reference.

This saves some unnecessary atomic ref-counting operations.

llvm-svn: 343927

[X86] getFauxShuffleMask - Handle undef + sentinel values in subvector insertion

llvm-svn: 343926

[X86][SSE] Add SSE41 vector int2fp tests

llvm-svn: 343925

[X86][AVX] Ensure resolveTargetShuffleInputs shuffle masks are the correct width

Don't handle ZERO_EXTEND style shuffles until we support bitcasts. Found by inspection.

llvm-svn: 343924

Papers and Issues for San Diego

llvm-svn: 343923

[X86] combinePMULDQ - add op back to worklist if SimplifyDemandedBits succeeds on either operand

Prevents missing other simplifications that may occur deep in the operand chain where CommitTargetLoweringOpt won't add the PMULDQ back to the worklist itself

llvm-svn: 343922

[X86] Regenerate LSR loop iteration test

llvm-svn: 343921

[x86] add test for masked store with extra shift op; NFC

llvm-svn: 343920

[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - simplify PSHUFB masks

Attempt to simplify PSHUFB masks (even non-constant ones) - we should probably be able to simplify other variable shuffles as well as the need arises.

llvm-svn: 343919

[X86] Use the SimplifyDemandedBits wrappers where possible. NFCI.

Leave the wrapper to handle TargetLowering::TargetLoweringOpt and CommitTargetLoweringOpt.

llvm-svn: 343918

Revert rL343916: Fix -Wmissing-braces warning. NFCI.

llvm-svn: 343917

Fix -Wmissing-braces warning. NFCI.

llvm-svn: 343916

Wdocumentation fix

llvm-svn: 343915

Wdocumentation fix

llvm-svn: 343914

[SelectionDAG] Add SimplifyDemandedBits to SimplifyDemandedVectorElts simplification

This patch enables SimplifyDemandedBits to call SimplifyDemandedVectorElts in cases where the demanded bits mask covers entire elements of a bitcasted source vector.

There are a couple of cases here where simplification at a deeper level (such as through bitcasts) prevents further simplification - CommitTargetLoweringOpt only adds immediate uses/users back to the worklist when we might want to combine the original caller again to see what else it can simplify.

As well as that I had to disable handling of bool vector until SimplifyDemandedVectorElts better supports some of their opcodes (SETCC, shifts etc.).

Fixes PR39178

Differential Revision: https://reviews.llvm.org/D52935

llvm-svn: 343913

[clangd] Remove unused headers from CodeComplete.cpp

queue is not used after index-provided completions' merge with those from Sema
USRGeneration.h is not used after introduction of getSymbolID

llvm-svn: 343912

[RISCV] Compress addiw rd, x0, simm6 to c.li rd, simm6

A pattern was present for addi rd, x0, simm6 but not addiw which is
semantically identical when the source register is x0. This patch addresses
that, and the benefit can be seen in rv64c-aliases-valid.s.

llvm-svn: 343911

AMDGPU: Consolidate SMRD TableGen patterns

Summary:
Merge the SMRD patterns for CI into the same multiclass as the
patterns for other sub-targets.

This removes some duplicate code and will make it easier for some
future GlobalISel changes I would like to do.

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52557

llvm-svn: 343909

Thread safety analysis: Handle conditional expression in getTrylockCallExpr

Summary:
We unwrap conditional expressions containing try-lock functions.

Additionally we don't acquire on conditional expression branches, since
that is usually not helpful. When joining the branches we would almost
certainly get a warning then.

Hopefully fixes an issue that was raised in D52398.

Reviewers: aaron.ballman, delesley, hokein

Reviewed By: aaron.ballman

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D52888

llvm-svn: 343902

[llvm-ar] Use POSIX-specified timestamps for 'tv'.

Summary:
The POSIX spec says:

```
If the −t option is used with the −v option, the standard output format shall be:
"%s %u/%u %u %s %d %d:%d %d %s\n", <member mode>, <user ID>,
<group ID>, <number of bytes in member>,
<abbreviated month>, <day-of-month>, <hour>,
<minute>, <year>, <file>

where:

...
<abbreviated month>
Equivalent to the format of the %b conversion specification format in date.
<day-of-month>
Equivalent to the format of the %e conversion specification format in date.
<hour> Equivalent to the format of the %H conversion specification format in date.
<minute> Equivalent to the format of the %M conversion specification format in date.
<year> Equivalent to the format of the %Y conversion specification format in date.
```

This actually used to be the format printed by llvm-ar. It was apparently accidentally changed (see r207385 followed by comments in r207387). This makes it conform to GNU ar for easier replacement.

Reviewers: MaskRay

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52940

llvm-svn: 343901

Add support for artificial tail call frames

This patch teaches lldb to detect when there are missing frames in a
backtrace due to a sequence of tail calls, and to fill in the backtrace
with artificial tail call frames when this happens. This is only done
when the execution history can be determined from the call graph and
from the return PC addresses of calls on the stack. Ambiguous sequences
of tail calls (e.g anything involving tail calls and recursion) are
detected and ignored.

Depends on D49887.

Differential Revision: https://reviews.llvm.org/D50478

llvm-svn: 343900

Relax a data formatter test

Before inspecting the contents of a list, make sure that we've stepped
past the push_back() that inserts the element we're interested in.

llvm-svn: 343899

[New PM][PassTiming] implement -time-passes for the new pass manager

Enable time-passes functionality through PassInstrumentation callbacks
for passes and analyses.

TimePassesHandler class keeps all the callbacks, the timing data as it
is being collected as well as the stack of currently active timers.

Parts of the fix that might be somewhat unobvious:
  - mapping of passes into Timer (TimingData) can not be done per-instance.
    PassID name provided into the callback is common for all the pass invocations.
    Thus the only way to get a timing with reasonable granularity is to collect
    timing data per pass invocation, getting a new timer for each BeforePass.
    Hence the key for TimingData uses a pair of <StringRef/unsigned count> to
    uniquely identify a pass invocation.

  - consequently, this new-pass-manager implementation performs no aggregation
    of timing data, reporting timings for each pass invocation separately.
    In that it differs from legacy-pass-manager time-passes implementation that
    reports timing data aggregated per pass instance.

  - pass managers and adaptors are not tracked, similar to how pass managers are
    not tracked in legacy time-passes.

  - TimerStack tracks timers that are active, each BeforePass pushes the new timer
    on stack, each AfterPass pops active timer from stack and stops it.

Reviewers: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D51276

llvm-svn: 343898

[AArch64] -mcpu=native CPU detection for Cavium processors

This small patch updates the CPU detection for Cavium processors when
-mcpu=native is passed on compile-line.

Patch by Stefan Teleman
Differential Revision: https://reviews.llvm.org/D51939

llvm-svn: 343897

[llvm-nm] Update all tests to redirect stderr to stdout

This addresses the breakage introduced in r343887.

llvm-svn: 343896

X86, AArch64, ARM: Do not attach debug location to spill/reload instructions

This rebases and recommits r343520. hwasan should be fixed now and this
shouldn't break the tests anymore.

Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).

Differential Revision: https://reviews.llvm.org/D52125

llvm-svn: 343895

[COFF, ARM64] Add _InterlockedAdd intrinsic

Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, javed.absar, efriedma

Reviewed By: efriedma

Subscribers: efriedma, kristof.beyls, chrib, jfb, cfe-commits

Differential Revision: https://reviews.llvm.org/D52811

llvm-svn: 343894

Specify -mtriple=x86_64 in an X86-specific dwarf test

On the PPC bot, the %llc_dwarf substitution does not contain an -mtriple
argument. This can cause the wrong backend to be exercised.

This causes issues because the backends differ in when they decide to
emit tail calls:

http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/12440

This is mostly a speculative fix as I don't have a PPC machine to test
with.

llvm-svn: 343893

Emit CK_NoOp casts in C mode, not just C++.

Previously, it had been using CK_BitCast even for casts that only
change const/restrict/volatile. Now it will use CK_Noop where
appropriate.

This is an alternate solution to r336746.

Differential Revision: https://reviews.llvm.org/D52918

llvm-svn: 343892

[X86][AVX] Limit getFauxShuffleMask INSERT_SUBVECTOR support to 2 inputs

rL343853 didn't limit the number of subinputs, but we don't currently support faux shuffles with more than 2 total inputs, so put a limiter in place until this is fixed.

Found by Artem Dergachev.

llvm-svn: 343891

[LiveDebugValues] Extend var ranges through artificial blocks

ASan often introduces basic blocks consisting exclusively of
instructions without debug locations, or with line 0 debug locations.

LiveDebugValues needs to extend variable ranges through these artificial
blocks. Otherwise, a lot of variables disappear -- even at -O0.

Typically, LiveDebugValues does not extend a variable's range into a
block unless the block is essentially "part of" the variable's scope
(for a precise definition, see LexicalScopes::dominates). This patch
relaxes the lexical dominance check for artificial blocks.

This makes the following Swift program debuggable at -O0:
```
1| var x = 100
2| print("x = \(x)")
```

rdar://39127144

Differential Revision: https://reviews.llvm.org/D52921

llvm-svn: 343890

Clarify debug output in LiveDebugValues

MachineBasicBlocks often do not have names, so it helps to refer to them
by block number when printing debug messages.

llvm-svn: 343889

Disable the dwarf callsite attrs test on Windows

The Windows formats don't understand relocations inside of AT_return_pc.

http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/270

llvm-svn: 343888

[llvm-nm] Write "no symbol" output to stderr

This matches the output of binutils' nm and ensures that any scripts
or tools that use nm and expect empty output in case there no symbols
don't break.

Differential Revision: https://reviews.llvm.org/D52943

llvm-svn: 343887

Avoid hardcoding PC addresses in a dwarf test

The PCs appear to vary from builder-to-builder:

http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/20053

llvm-svn: 343886

[GlobalIsel] Add llvm.invariant.start and llvm.invariant.end

Port over the implementation in SelectionDAGBuilder.cpp into the IRTranslator
and update the arm64-irtranslator test.

These were causing fallbacks in CTMark/Bullet (-Rpass-missed=gisel-select),
and this patch fixes that.

https://reviews.llvm.org/D52945

llvm-svn: 343885

dwarfdump: Avoid parsing units unnecessarily

NFC-ish (the parsing of the units is not a functional change - no
errors/warnings are emitted during the shallow parsing - though without
parsing them here, the "max version" would be wrong (still zero) later
on, so in those cases the units do need to be parsed)

llvm-svn: 343884

[DebugInfo] Add support for DWARF5 call site-related attributes

DWARF v5 introduces DW_AT_call_all_calls, a subprogram attribute which
indicates that all calls (both regular and tail) within the subprogram
have call site entries. The information within these call site entries
can be used by a debugger to populate backtraces with synthetic tail
call frames.

Tail calling frames go missing in backtraces because the frame of the
caller is reused by the callee. Call site entries allow a debugger to
reconstruct a sequence of (tail) calls which led from one function to
another. This improves backtrace quality. There are limitations: tail
recursion isn't handled, variables within synthetic frames may not
survive to be inspected, etc. This approach is not novel, see:

https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=get&target=jelinek.pdf

This patch adds an IR-level flag (DIFlagAllCallsDescribed) which lowers
to DW_AT_call_all_calls. It adds the minimal amount of DWARF generation
support needed to emit standards-compliant call site entries. For easier
deployment, when the debugger tuning is LLDB, the DWARF requirement is
adjusted to v4.

Testing: Apart from check-{llvm, clang}, I built a stage2 RelWithDebInfo
clang binary. Its dSYM passed verification and grew by 1.4% compared to
the baseline. 151,879 call site entries were added.

rdar://42001377

Differential Revision: https://reviews.llvm.org/D49887

llvm-svn: 343883

[x86] make blend tests resistant to demanded elements improvements; NFC

Similar to rL343858 - we don't want these tests to lose value with D52912.

llvm-svn: 343882

[COFF, ARM64] Add _InterlockedCompareExchangePointer_nf intrinsic

Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, efriedma

Reviewed By: efriedma

Subscribers: efriedma, kristof.beyls, chrib, jfb, cfe-commits

Differential Revision: https://reviews.llvm.org/D52807

llvm-svn: 343881

Fix dwarf-no-source-loc.ll path separator on Windows

llvm-svn: 343880

[COFF] Do MinGW specific entry/subsystem inference

ld.bfd doesn't do any inference of subsystem; unless the windows
subsystem is specified, the console subsystem is used.

For the console subsystem, the entry point is called mainCRTStartup,
regardless of whether the the user code entry point is main or wmain.
The same goes for the windows subsystem, where the entry point always
is WinMainCRTStartup, for both WinMain and wWinMain in user code.

One detail that we don't emulate, is that if the inferred entry point
is undefined, ld.bfd silently just sets the entry point to the start
of the image. And if an explicit entry point is set, but it is
undefined, the link still succeeds but the linker warns about the
entry point not being found.

Differential Revision: https://reviews.llvm.org/D52931

llvm-svn: 343879

[docs] Mention some notable feature in the release notes

Differential Revision: https://reviews.llvm.org/D52908

llvm-svn: 343878

[COFF] Cope with GCC produced weak aliases referring to comdat functions

For certain cases of inline functions written to comdat sections,
GCC 5.x produces a weak symbol in addition, which would end up
undefined in some cases.

This no longer seems to happen with GCC 6.x or newer though.

Differential Revision: https://reviews.llvm.org/D52602

llvm-svn: 343877

Revert r343606/r342652 "[winasan] Unpoison the stack in NtTerminateThread""

This still seems to be causing pnacl + asan to crash.

llvm-svn: 343876

[CUDA] Use all 64 bits of GUID in __nv_module_id

getGUID() returns an uint64_t and "%x" only prints 32 bits of it.
Use PRIx64 format string to print all 64 bits.

Differential Revision: https://reviews.llvm.org/D52938

llvm-svn: 343875

DwarfDebug: Pick next location in case of missing location at block begin

Context: Compiler generated instructions do not have a debug location
assigned to them. However emitting 0-line records for all of them bloats
the line tables for very little benefit so we usually avoid doing that.

Not emitting anything will lead to the previous debug location getting
applied to the locationless instructions. This is not desirable for
block begin and after labels. Previously we would emit simply emit
line-0 records in this case, this patch changes the behavior to do a
forward search for a debug location in these cases before emitting a
line-0 record to further reduce line table bloat.

Inspired by the discussion in https://reviews.llvm.org/D52862

llvm-svn: 343874

[RISCV] Regenerate several tests now enableMultipleCopyHints is enabled by default

r343851 caused codegen changes in several tests. This patch regenerates them.

llvm-svn: 343873

clang-format: Don't insert spaces in front of :: for Java 8 Method References.

The existing code kept the space if it was there for identifiers, and it didn't
handle `this`. After this patch, for Java `this` is handled in addition to
identifiers, and existing space is always stripped between identifier and `::`.

Also accept `::` in addition to `.` in front of `<` in `foo::<T>bar` generic
calls.

Differential Revision: https://reviews.llvm.org/D52842

llvm-svn: 343872

[X86] Don't promote i16 compares to i32 if the immediate will fit in 8 bits.

The comments in this code say we were trying to avoid 16-bit immediates, but if the immediate fits in 8-bits this isn't an issue. This avoids creating a zero extend that probably won't go away.

The movmskb related changes are interesting. The movmskb instruction writes a 32-bit result, but fills the upper bits with 0. So the zero_extend we were previously emitting was free, but we turned a -1 immediate that would fit in 8-bits into a 32-bit immediate so it was still bad.

llvm-svn: 343871

Unwind local macro DEFINE_INTERNAL()

No functional change intended.

This is a follow up of a suggestion from D52793.

llvm-svn: 343870

[OpenMP] Convert KMP_DYNAMIC_LIB to a 0 or 1 guard everywhere

llvm-svn: 343869

[X86] Move ReadAfterLd functionality into X86FoldableSchedWrite (PR36957)

Currently we hardcode instructions with ReadAfterLd if the register operands don't need to be available until the folded load has completed. This doesn't take into account the different load latencies of different memory operands (PR36957).

This patch adds a ReadAfterFold def into X86FoldableSchedWrite to replace ReadAfterLd, allowing us to specify the load latency at a scheduler class level.

I've added ReadAfterVec*Ld classes that match the XMM/Scl, XMM and YMM/ZMM WriteVecLoad classes that we currently use, we can tweak these values in future patches once this infrastructure is in place.

Differential Revision: https://reviews.llvm.org/D52886

llvm-svn: 343868

Emit diagnostic note when calling an invalid function declaration.

The comment said it was intentionally not emitting any diagnostic
because the declaration itself was already diagnosed. However,
everywhere else that wants to not emit a diagnostic without an extra
note emits note_invalid_subexpr_in_const_expr instead, which gets
suppressed later.

This was the only place which did not emit a diagnostic note.

Differential Revision: https://reviews.llvm.org/D52919

llvm-svn: 343867

[OpenMP] Fix KMP_DYNAMIC_LIB to be dependent on LIBOMP_ENABLE_SHARED

The KMP_DYNAMIC_LIB guard was hard set to 1. This patch has the guard depend
on CMake variable LIBOMP_ENABLE_SHARED.

llvm-svn: 343866

[SelectionDAG] allow undefs when matching splat constants

And use that to transform fsub with zero constant operands.
The integer part isn't used yet, but it is proposed for use in
D44548, so adding both enhancements here makes that
patch simpler.

llvm-svn: 343865

Format the dwarfdump --statistics version as an integer instead of a string.

llvm-svn: 343864

[x86] add test for (X - 0.0) vector with undef elts; NFC

llvm-svn: 343863