review.tizen.org Git - platform/upstream/llvm.git/log

[Frontend] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).

llvm-svn: 328584

Fix newlines. NFCI.

llvm-svn: 328583

[X86] Add WriteCRC32 scheduler class

Currently CRC32 instructions use the WriteFAdd class. This patch splits them off into their own class; at the moment it is still mostly a duplicate of WriteFAdd, but it can now be tweaked on a target-by-target basis.

Differential Revision: https://reviews.llvm.org/D44647

llvm-svn: 328582

Use local symbols for creating .stack-size.

llvm-svn: 328581

Fix go bindings test when using goma distributed build tool

Goma[1] is a distributed build system similar to distcc and icecc
primarily used to compile Chromium. The client is open source, and
hopefully soon the server will be as well. The intended usage model is
similar to most distributed build systems: prefix gomacc onto your
compiler command line, and it transparently distributes compilation.

The go lit config wants to determine the host compiler binary, so it
needs some extra logic to avoid looking at these prefixes.
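
A minimal sketch of that kind of prefix skipping, written in Python because lit configs are Python. The wrapper set and the helper names (strip_compiler_wrappers, host_compiler) are illustrative assumptions, not the code from this patch:

```python
import os
import shlex

# Hypothetical set of launcher prefixes; the real config may check others.
WRAPPERS = {"gomacc", "ccache", "distcc", "icecc"}


def strip_compiler_wrappers(command):
    """Return the argv of `command` with leading launcher prefixes removed."""
    argv = shlex.split(command)
    while argv and os.path.basename(argv[0]) in WRAPPERS:
        argv = argv[1:]
    return argv


def host_compiler(command):
    """Return the compiler binary named by `command`, or None if it is empty."""
    argv = strip_compiler_wrappers(command)
    return argv[0] if argv else None
```

With this sketch, host_compiler("gomacc /usr/bin/clang -O2") yields "/usr/bin/clang".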

[1] https://chromium.googlesource.com/infra/goma/client/

llvm-svn: 328580

Refactor SharedFile::parseRest. NFC.

SharedFile::parseRest function grew organically and got a bit hard to
understand. This patch refactors it. This patch also adds comments.

Differential Revision: https://reviews.llvm.org/D44860

llvm-svn: 328579

Use correct format specifier.
Review comment on r328235 by James Henderson.

llvm-svn: 328578

[MemorySSA] Fix exponential compile-time updating MemorySSA.

MemorySSAUpdater::getPreviousDefRecursive is a recursive algorithm: for
each block, it computes the previous definition for each predecessor,
then takes those definitions and combines them. But currently it doesn't
remember results which it already computed; this means it can visit the
same block multiple times, which adds up to exponential time overall.

To fix this, this patch adds a cache. If we computed the result for a
block already, we don't need to visit it again because we'll come up
with the same result. Well, unless we RAUW a MemoryPHI; in that case,
the TrackingVH will be updated automatically.

This matches the original source paper for this algorithm.
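
The effect of the cache can be illustrated with a toy model. This is a sketch in Python, not the MemorySSAUpdater code: it assumes an acyclic CFG given as a predecessor map, and the "previous definition" is replaced by a stand-in (the set of blocks reached). The names count_visits and diamond_chain are hypothetical.

```python
def count_visits(preds, block, cache=None):
    """Walk predecessors recursively; return (result, number of block visits)."""
    visits = 0

    def walk(b):
        nonlocal visits
        if cache is not None and b in cache:
            return cache[b]  # already computed: do not visit the block again
        visits += 1
        # Stand-in for "combine the previous definitions of all predecessors".
        result = frozenset({b}).union(*(walk(p) for p in preds.get(b, ())))
        if cache is not None:
            cache[b] = result
        return result

    return walk(block), visits


def diamond_chain(n):
    """Build n stacked diamonds: join i has two predecessors sharing join i-1."""
    preds = {}
    for i in range(1, n + 1):
        preds[f"l{i}"] = [f"j{i - 1}"]
        preds[f"r{i}"] = [f"j{i - 1}"]
        preds[f"j{i}"] = [f"l{i}", f"r{i}"]
    return preds
```

On a chain of 10 diamonds the uncached walk visits 4093 blocks, while the cached walk visits each of the 31 blocks once.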

The testcase isn't really a test for the bug, but it adds coverage for
the case where tryRemoveTrivialPhi erases an existing PHI node. (It's
hard to write a good regression test for a performance issue.)

Differential Revision: https://reviews.llvm.org/D44715

llvm-svn: 328577

[libFuzzer] Do not optimize minimize_two_crashes.test.

Speculative fix for build bot breakage on Mac.

llvm-svn: 328576

Move blocktime_str variable right before its first use

llvm-svn: 328575

[Hexagon] Assertion failure in HexagonSubtarget.cpp

In restoreLatency, replace range-for loop with std::find.

Patch by Jyotsna Verma.

llvm-svn: 328574

[X86][Btver2] Add (U)COMISD/(U)COMISS scheduler costs

Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write)

llvm-svn: 328573

[SLP] Add more checks to a test case. NFC.

llvm-svn: 328572

Reduce code duplication a bit.

Thanks to George Rimar for pointing it out.

llvm-svn: 328571

[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32

Summary:
Re-lands r328386 and r328443, reverting r328482.

Incorporates fixes from @mstorsjo in D44876 (thanks!) so that small
parameters in i8 and i16 do not end up in the SysV register parameters
(EDI, ESI, etc).

I added tests for how we receive small parameters, since that is the
important part. It's always safe to store more bytes than will be read,
but the assumptions you make when loading them are what really matter.

I also tested this by self-hosting clang and it passed tests on win64.

Reviewers: mstorsjo, hans

Subscribers: hiraditya, mstorsjo, llvm-commits

Differential Revision: https://reviews.llvm.org/D44900

llvm-svn: 328570

Reduce code duplication a bit. NFC

llvm-svn: 328569

Add summarizeStats.py to tools directory

The summarizeStats.py script processes raw data provided by the
instrumented (stats-gathering) OpenMP* runtime library. It provides:

1) A radar chart which plots counters as frequency (per GigaTick) of use within
   the program. The frequencies are plotted as log10; however, values less than
   one are kept as is and shown in red. This was done to help
   visualize the differences better.
2) Pie charts separating total time as compute and non-compute. The compute and
   non-compute times have their own pie charts showing the constructs that
   contributed to them. The percentages listed are with respect to the total
   time.
3) '.csv' file with percentage of time spent within the different constructs.

The script can be used as:
$ python $PATH_TO_SCRIPT/summarizeStats.py instrumented1.csv instrumented2.csv
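
The scaling rule from item 1 can be sketched as follows. This is an illustration only, not code from summarizeStats.py; the function name radar_value and the "default" colour label are assumptions.

```python
import math


def radar_value(freq):
    """Return (plotted value, colour) for one counter frequency."""
    if freq < 1:
        # Values below one are kept as is and flagged in red.
        return freq, "red"
    return math.log10(freq), "default"
```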

Patch by Taru Doodi

Differential Revision: https://reviews.llvm.org/D41838

llvm-svn: 328568

[MS] Fix late-parsed template infinite loop in eager instantiation

Summary:
This fixes PR33561 and PR34185.

Don't store pending template instantiations for late-parsed templates in
the normal PendingInstantiations queue. Instead, use a separate list
that will only be parsed and instantiated at end of TU when late
template parsing actually works and doesn't infinite loop.

Reviewers: rsmith, thakis, hans

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D44846

llvm-svn: 328567

[X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes (PR36881)

Give the bit count instructions their own scheduler classes instead of forcing them into existing classes.

These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar).

Differential Revision: https://reviews.llvm.org/D44879

llvm-svn: 328566

Remove unused file, ExecutionEngine/MCJIT/ObjectBuffer.h

This header also wasn't self contained/modular - but with no users, it
didn't seem worth fixing because it'd break so easily again.

llvm-svn: 328565

[XCore] Change std::sort to llvm::sort in response to r327219

Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused by undefined sorting
order of objects having the same key.

To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.

Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer to the comments section in D44363 for a list of all the required patches.
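
The idea behind the shuffle can be sketched in Python. This is an illustration of the technique only, not the llvm::sort implementation; shuffled_sort and depends_on_tie_order are hypothetical names.

```python
import random


def shuffled_sort(items, key, rng=random):
    """Sort `items` by `key` after shuffling, so ties land in arbitrary order."""
    items = list(items)
    rng.shuffle(items)
    return sorted(items, key=key)


def depends_on_tie_order(items, key, trials=20, seed=0):
    """Report whether the sorted output changes between shuffled runs."""
    rng = random.Random(seed)
    first = shuffled_sort(items, key, rng)
    return any(shuffled_sort(items, key, rng) != first for _ in range(trials))
```

If the key is unique per element the output never changes; if several elements share a key, repeated runs disagree, which points at code that silently relied on the input order.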

Reviewers: dblaikie, RKSimon, robertlytton

Reviewed By: robertlytton

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44875

llvm-svn: 328564

[lit] Implement 'cat' command for internal shell

Fixes PR36449

Patch by Chamal de Silva

Differential Revision: https://reviews.llvm.org/D43501

llvm-svn: 328563

Delete pdbutil diff mode.

This has been made obsolete by the fact that almost all of the
things it previously checked for are no longer relevant since
we can just compare bytes in a lot of places.

llvm-svn: 328562

[Hexagon] Add more lit tests

llvm-svn: 328561

[InstCombine] improve code comment; NFC

llvm-svn: 328560

[ELF] GotSection increment NumEntries when Target saves GlobalOffsetTable in the .got

When the target saves ElfSym::GlobalOffsetTable in the .got rather than
.got.plt, Target->GotHeaderEntriesNum states the number of extra entries
required in the .got. Rather than having to add Target->GotHeaderEntriesNum to
NumEntries in every function which refers to NumEntries, this patch changes the
initial value of NumEntries in the constructor.
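
A toy model of the change, in Python rather than the linker's C++. The class below is a sketch under assumptions: header_entries stands in for Target->GotHeaderEntriesNum, and the method names are hypothetical.

```python
class GotSection:
    """Toy model: count .got slots, reserving header entries up front."""

    def __init__(self, header_entries, got_holds_global_offset_table):
        # Reserve the header slots once in the constructor, instead of adding
        # them in every function that reads num_entries.
        self.num_entries = header_entries if got_holds_global_offset_table else 0

    def add_entry(self):
        """Allocate the next slot and return its index."""
        index = self.num_entries
        self.num_entries += 1
        return index

    def size(self, word_size):
        """Return the section size in bytes."""
        return self.num_entries * word_size
```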

Differential Revision: https://reviews.llvm.org/D44744

llvm-svn: 328559

[Power9]Legalize and emit code for quad-precision convert from double-precision

Legalize and emit code for quad-precision floating point operation xscvdpqp
and add option to guard the quad precision operation support.

Differential Revision: https://reviews.llvm.org/D44746

llvm-svn: 328558

Fix check for verbose logging.

Thanks to Pavel for pointing this out!

llvm-svn: 328557

[PowerPC] Infrastructure work. Implement getting the opcode for a spill in one place.

A new function getOpcodeForSpill should now be the only place to get
the opcode for a given spilled register.

Differential Revision: https://reviews.llvm.org/D43086

llvm-svn: 328556

Disable [MachineLICM] Add functions to MachineLICM to hoist invariant stores

Disable https://reviews.llvm.org/D40196 by setting the option
hoist-const-stores to false, since it is failing on the s390 buildbot.

llvm-svn: 328555

[Pipeliner] Several node-ordering fixes

First, we change the heuristic that is used to ignore the recurrent
node-sets in the node ordering. In certain cases it's not important
to focus on the recurrent node-sets. Instead, the algorithm begins
by considering all the instructions in the node ordering step.

Second, a minor change to the bottom up traversal, which needs to
consider loop carried dependences (modeled as anti dependences).
Previously, these instructions were skipped, which caused problems
because the instruction ends up having both predecessors and
successors in the schedule.

Third, consider anti-dependences as a tie breaker when choosing
between instructions in the node ordering. We want to make sure
that the source of the anti-dependence does not end up with both
predecessors and successors in the final node ordering.

Patch by Brendon Cahoon.

llvm-svn: 328554

[AMDGPU] Improve disassembler error handling

Summary:
llvm-objdump now disassembles unrecognised opcodes as data, using
the .long directive. We treat unrecognised opcodes as being 32 bit
values, so move along 4 bytes rather than the single byte which
previously resulted in a cascade of bogus disassembly following an
unrecognised opcode.

While no solution can always disassemble code that contains
embedded data correctly this provides a significant improvement.

The disassembler will now cope with an arbitrary length section
as it no longer truncates it to a multiple of 4 bytes, and will
use the .byte directive for trailing bytes.
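
The recovery strategy can be sketched as follows. This is a simplified Python model, not the disassembler code: it assumes every instruction is one little-endian 32-bit word, and decode is a hypothetical callback that returns None for an unrecognised opcode.

```python
import struct


def disassemble(data, decode):
    """Return assembly lines for `data`.

    `decode(word)` is a hypothetical callback that returns the text for a
    32-bit word, or None if the opcode is unrecognised.
    """
    lines = []
    offset = 0
    while offset + 4 <= len(data):
        (word,) = struct.unpack_from("<I", data, offset)
        text = decode(word)
        if text is None:
            # Emit the unrecognised word as data and keep going.
            text = ".long 0x%08x" % word
        lines.append(text)
        offset += 4  # always step a whole 32-bit word, never a single byte
    for byte in data[offset:]:
        # Trailing bytes that do not fill a word are no longer dropped.
        lines.append(".byte 0x%02x" % byte)
    return lines
```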

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D44685

llvm-svn: 328553

[CodeGen] Mark fma as const for Android

Summary:
r318093 sets fma, fmaf, fmal as const for Gnu and MSVC. Android also
does not set errno for these functions. So mark these const for
Android.

Reviewers: spatel, efriedma, srhines, chh, enh

Subscribers: cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D44852

llvm-svn: 328552

[X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costs

We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR.....

llvm-svn: 328551

[Pipeliner] Check for affine expression in isLoopCarriedOrder

The pipeliner must add a loop carried dependence between two memory
operations if the base register is not an affine (linear) expression.
The current implementation doesn't check how the base register is
defined, which allows non-affine expressions, and then the pipeliner
does not add a loop carried dependence when one is needed.

This patch adds code to isLoopCarriedOrder that checks if the base
register of the memory operations is defined by a phi, and the loop
definition for the phi is a constant increment value. This is a very
simple check for a linear expression.

Patch by Brendon Cahoon.

llvm-svn: 328550

Remove an unneeded (& mislayered) include from Target/TargetLoweringObjectFile on a CodeGen header

llvm-svn: 328549

Remove unneeded (& mislayered) include from TargetMachine.cpp on a CodeGen header

llvm-svn: 328548

[Pipeliner] Add missing loop carried dependences

The pipeliner is not adding a dependence edge for a loop carried
dependence, and ends up scheduling a load from iteration n prior
to an aliased store in iteration n-1.

The code that adds the loop carried dependences in the pipeliner
doesn't check if the memory objects for loads and stores are
"identified" (i.e., distinct) objects. If they are not, then the
code that adds the dependences needs to be conservative. The
objects can be used to check dependences only when they are
distinct objects.

The code that checks for loop carried dependences has been updated
to classify loads and stores that are not identified as "unknown"
values. A store with an "unknown" value can potentially create
a loop carried dependence with any pending load.

Patch by Brendon Cahoon.

llvm-svn: 328547

[SLP] Add a test case. NFC.

llvm-svn: 328546

[Pipeliner] Fix renaming in pipeliner when eliminating phis

The phi renaming code in the pipeliner uses the wrong value when
rewriting phi uses, which results in an undefined value. In this
case, the original phi is no longer needed due to the order of
instruction in the pipelined loop. The pipeliner was assuming, in
this case, that the phi loop definition should be used to
rewrite the uses. However, the pipeliner needs to check to make
sure that the loop definition has already been scheduled. If not,
then the phi initial value needs to be used instead.

Patch by Brendon Cahoon.

llvm-svn: 328545

[OPENMP] Codegen for declare target with link clause.

If the link clause is used on the declare target directive, the object
should be linked on target or target data directives, not during the
codegen. Patch adds support for this clause.

llvm-svn: 328544

[Pipeliner] Fix number of phis to generate in the epilog

The pipeliner was generating too many phis in the epilog blocks, which
caused incorrect code generation when rewriting an instruction that uses
the phi.

In this case, there are 3 prolog and epilog stages. An existing phi was
scheduled at stage 1. When generating the code for the 2nd epilog an
extra new phi was generated.

To fix this, we need to update the code that calculates the maximum
number of phis that can be generated, which is based upon the current
prolog stage and the stage of the original phi. In this case, when the
prolog stage is 1 and the original phi stage is 1, the maximum number
of phis to generate is 2.

Patch by Brendon Cahoon.

llvm-svn: 328543

[Pipeliner] Use latency to compute RecMII

The patch contains several changes needed to pipeline an example
that was transformed so that a Phi with a subreg is converted to
copies.

The pipeliner wasn't working for a couple of reasons.
- The RecMII was 3 instead of 2 due to the extra copies.
- Copy instructions contained a latency of 1.
- The node order algorithm was not choosing the best "bottom"
node, which caused an instruction to be scheduled that had a
predecessor and successor already scheduled.
- Updated the Hexagon Machine Scheduler to check if the node is
latency bound when adding the cost for a 0-latency dependence.

The RecMII was 3 because the computation looks at the number of
nodes in the recurrence. The extra copy is an extra node but
it shouldn't increase the latency. The new RecMII computation
looks at the latency of the instructions in the recurrence. We
changed the latency of the dependence of a copy to 0. The latency
computation for the copy also checks the use of the copy (similar
to a reg_sequence).
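
The difference between the two computations can be sketched in Python. This is an illustration, not the pipeliner code: the latency-based version uses the textbook form (total latency divided by iteration distance, rounded up), and the recurrence below is a hypothetical example in which the copy has latency 0.

```python
import math


def rec_mii_by_node_count(recurrence):
    """Node-count estimate: one cycle per node in the recurrence."""
    return len(recurrence)


def rec_mii_by_latency(recurrence, distance=1):
    """Latency-based estimate: total latency over the iteration distance."""
    total_latency = sum(latency for _name, latency in recurrence)
    return math.ceil(total_latency / distance)


# Hypothetical recurrence of (instruction, latency); the copy costs nothing.
RECURRENCE = [("load", 1), ("add", 1), ("copy", 0)]
```

For this recurrence the node count gives 3, while the latency-based estimate gives 2 because the extra copy no longer contributes.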

The node order algorithm was not choosing the last instruction
in the recurrence for a bottom up traversal. This was when the
last instruction is a copy. A check was added when choosing the
instruction to check for NodeNum if the maxASAP is the same. This
means that the scheduler will not end up with another node in
the recurrence that has both a predecessor and successor already
scheduled.

The cost computation in Hexagon Machine Scheduler adds cost when
an instruction can be packetized with a zero-latency instruction.
We should only do this if the schedule is latency bound.

Patch by Brendon Cahoon.

llvm-svn: 328542

[X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costs

llvm-svn: 328541

[Pipeliner] Fix assert caused by pipeliner serialization

The pipeliner is asserting because the serialization step that
occurs at the end is deleting an instruction. The assert
occurs later on because there is a use without a definition.

The problem occurs when an instruction defines a value used
by a REG_SEQUENCE and that value is used by a COPY instruction.
The latencies between these instructions are zero, so they are
put in to the same packet. The serialization code is unable to
handle this correctly, and ends up putting the REG_SEQUENCE
before its definition.

There is special code in the serialization step that attempts
to handle zero-cost instructions (phis, copy, reg_sequence)
differently than regular instructions. Unfortunately, this means
the order does not come out correct.

This patch simplifies the code by changing the separate steps for
handling zero-cost and regular instructions. Only phis are
handled separately now, since they should occur first. Then, this
patch adds checks to make sure the MoveUse is set to the smallest
value if there are multiple uses in a cycle.

Patch by Brendon Cahoon.

llvm-svn: 328540

[InstCombine] reassociate loop invariant GEP chains to enable LICM

This change brings performance of zlib up by 10%. The example below is from a
hot loop in longest_match() from zlib.

do.body:
  %cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ]
  %idx.ext = zext i32 %cur_match.addr.0 to i64
  %add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext
  %add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 %idx.ext1
  %add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 -1

In this example %idx.ext1 is a loop invariant. It will be moved above the use of
loop induction variable %idx.ext such that it can be hoisted out of the loop by
LICM. The operands that have dependences carried by the loop will be sunk down
in the GEP chain. This patch will produce the following output:

do.body:
  %cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ]
  %idx.ext = zext i32 %cur_match.addr.0 to i64
  %add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext1
  %add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 -1
  %add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 %idx.ext

llvm-svn: 328539

[Pipeliner] Enable more base+offset dependence changes in pipeliner

The pipeliner changes dependences between base+offset instructions
(loads and stores) so that the instructions have more flexibility
to be scheduled with respect to each other. This occurs when the
pipeliner is able to compute that the instructions will not alias
if their order is changed. The previous code enforced the alias
property by checking if the base register is the same, and that the
offset values are either both positive or negative.

This patch improves the alias check by using the API
areMemAccessesTriviallyDisjoint instead. This enables more cases,
especially if the offset is a negative value. The pipeliner uses
the function by creating a new instruction with the offset used
in the next iteration.

Patch by Brendon Cahoon.

llvm-svn: 328538

[Pipeliner] Fix calculation when reusing phis

A schedule may require that a phi from the original loop is used in
multiple iterations in the scheduled loop. When this occurs, we generate
multiple phis in the pipelined loop to save the value across iterations.

When we generate the new phis and update the register names in the
pipelined loop, the pipeliner attempts to reuse a previously generated
phi, when possible. The calculation for the name of the new phi needs
to account for the version/iteration of the original phi. Also, in the
epilog, the code only needs to check backwards for a previous iteration
until reaching the first prolog block.

Patch by Brendon Cahoon.

llvm-svn: 328537

[X86][Btver2] Account for the "+i" integer pipe transfer costs (1cy use of JALU0 for GPR PRF write)

llvm-svn: 328536

[Pipeliner] Fix check for order dependences when finalizing instructions

The code in orderDependence that looks at the order dependences between
instructions was processing all the successor and predecessor order
dependences. However, we really only want to check for an order dependence
for instructions scheduled in the same cycle.

Also, fixed how the pipeliner handles output dependences. An output
dependence is also a potential loop carried dependence. The pipeliner
didn't handle this case properly so an invalid schedule could be created
that allowed an output dependence to be scheduled in the next iteration
at the same cycle.

Patch by Brendon Cahoon.

llvm-svn: 328516

[Pipeliner] Fix in the pipeliner phi reuse code

When the definition of a phi is used by a phi in the next iteration,
the pipeliner was assuming that the definition is processed first.
Because of the assumption, an incorrect phi name was used. This patch
has a check to see if the phi definition has been processed already.

Patch by Brendon Cahoon.

llvm-svn: 328510

[Pipeliner] Pipeliner should mark physical registers as used

The software pipeliner attempts to delete dead instructions after
generating the pipelined loop. The code looks for uses of each
instruction. Physical registers should be treated differently because
the use chains do not exist. The code that checks for dead
instructions should assume that definitions of physical registers
are used if the operand doesn't contain the dead flag.

Patch by Brendon Cahoon.

llvm-svn: 328509

[Pipeliner] Correctly update memoperands in the epilog

The pipeliner needs to be conservative when updating the memoperands
of instructions in the epilog. Previously, the pipeliner was changing
the offset of the memoperand based upon the scheduling stage. However,
that is incorrect when control flow branches around the kernel code.
The bug enabled a load and store to the same stack offset to be swapped.

This patch fixes the bug by updating the size of the memoperands to be
UINT_MAX. This conservative value means that dependences will be created
between other loads and stores.

Patch by Brendon Cahoon.

llvm-svn: 328508

[demangler] Fix a bug in r328464 found by oss-fuzz.

llvm-svn: 328507

[Hexagon] Give priority to post-incrementing memory accesses in LSR

llvm-svn: 328506

[X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costs

Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write)

This also adds missing vcvttss2si tests

llvm-svn: 328505

Fix TestDisassembleBreakpoint broken by r328488

The first issue was that the test was capturing the "before" disassembly
before launching, and the "after" after. This is a problem because some
of the disassembly will change after we know the load address (e.g. PCs
in call instructions). I fix this by capturing both disassemblies with
the process running.

The second issue was that the refactor in r328488 accidentally changed
the meaning of the test, as it was no longer disassembling the function
which contained the breakpoint.

While inside, I also modernize the test to use
lldbutil.run_to_source_breakpoint and prevent debug-info replication.

llvm-svn: 328504

Migrate dockerfiles to use multi-stage builds.

Summary:
We previously emulated multi-stage builds using two dockerfiles.
Native support from Docker allows us to merge them into one,
simplifying our scripts.

For more details about multi-stage builds, see:
https://docs.docker.com/develop/develop-images/multistage-build/

Reviewers: mehdi_amini, klimek, sammccall

Reviewed By: sammccall

Subscribers: llvm-commits, ioeric, cfe-commits

Differential Revision: https://reviews.llvm.org/D44787

llvm-svn: 328503

[InstCombine] distribute fmul over fadd/fsub

This replaces a large chunk of code that was looking for compound
patterns that include these sub-patterns. Existing tests ensure that
all of the previous examples are still folded as expected.

We still need to loosen the FMF check.

llvm-svn: 328502

[X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKPS instruction costs

These should match the YMM MOVDUP/ PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults.

llvm-svn: 328501

[clangd] Support incremental document syncing

Summary:
This patch adds support for incremental document syncing, as described
in the LSP spec.  The protocol specifies ranges in terms of Position (a
line and a character), and our drafts are stored as plain strings.  So I
see two things that may not be super efficient for very large files:

- Converting a Position to an offset (the positionToOffset function)
  requires searching for end of lines until we reach the desired line.
- When we update a range, we construct a new string, which implies
  copying the whole document.

However, for the typical size of a C++ document and the frequency of
update (at which a user types), it may not be an issue.  This patch aims
at getting the basic feature in, and we can always improve it later if
we find it's too slow.
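
The two operations described above can be sketched in Python. This is an illustration, not the clangd code: position_to_offset mirrors the name used above, apply_change is a hypothetical helper, and the sketch counts characters as Python code points, whereas LSP positions count UTF-16 code units.

```python
def position_to_offset(text, line, character):
    """Convert a zero-based (line, character) position to a string offset."""
    offset = 0
    for _ in range(line):
        # Search for end of lines until the desired line is reached.
        newline = text.find("\n", offset)
        if newline == -1:
            raise ValueError("line out of range")
        offset = newline + 1
    end_of_line = text.find("\n", offset)
    if end_of_line == -1:
        end_of_line = len(text)
    if offset + character > end_of_line:
        raise ValueError("character out of range")
    return offset + character


def apply_change(text, start, end, new_text):
    """Replace the range [start, end) with new_text, copying the document."""
    begin = position_to_offset(text, *start)
    finish = position_to_offset(text, *end)
    if begin > finish:
        raise ValueError("range start is after range end")
    return text[:begin] + new_text + text[finish:]
```

For example, apply_change("int a;\nint b;\n", (1, 4), (1, 5), "c") returns "int a;\nint c;\n". Both costs mentioned above are visible here: the line search is linear in the document size, and every edit builds a new string.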

Signed-off-by: Simon Marchi <simon.marchi@ericsson.com>
Reviewers: malaperle, ilya-biryukov

Reviewed By: ilya-biryukov

Subscribers: MaskRay, klimek, mgorny, ilya-biryukov, jkorous-apple, ioeric, cfe-commits

Differential Revision: https://reviews.llvm.org/D44272

llvm-svn: 328500

[llvm-mca] Fix how views are added to the InstructionTables.

This should fix the stack-use-after-scope reported by the asan buildbots after
revision 328493.

llvm-svn: 328499

[InstCombine] check uses before creating instructions for fmul distribution

As the tests show, we could create extra instructions without any obvious benefit.

llvm-svn: 328498

[X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costs

The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps

llvm-svn: 328497

AMDGPU: Introduce common SOP_Pseudo and VOP_Pseudo TableGen base classes

Differential revision: https://reviews.llvm.org/D44820

Change-Id: I732979e2964006aa15d78a333d8886e6855f319a
llvm-svn: 328496

[clang-format] Wildcard expansion on Windows.

Summary:
Add support for wildcard expansion in command line arguments on Windows.
See https://docs.microsoft.com/en-us/cpp/c-language/expanding-wildcard-arguments

Fixes https://bugs.llvm.org/show_bug.cgi?id=17217

Reviewers: klimek, djasper, rnk

Reviewed By: rnk

Subscribers: rnk, smeenai, zturner, alexfh, mgorny, cfe-commits

Differential Revision: https://reviews.llvm.org/D44778

llvm-svn: 328495

[SemaCXX] _Pragma("clang optimize off") not affecting lambda.

Declaring "_Pragma("clang optimize off")" before the body of a
function with a lambda leads to the lambda functions in the body
not being affected.

Differential Revision: https://reviews.llvm.org/D43821

llvm-svn: 328494

[llvm-mca] Add a flag -instruction-info to enable/disable the instruction info view.

llvm-svn: 328493

[llvm-mca] Update the commandline docs after r328305.

Document that flag -resource-pressure can be used to enable/disable the resource
pressure view. This change should have been part of r328305.

llvm-svn: 328492

[X86][Btver2] Double the AGU and schedule pipe resources for YMM

Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model.

llvm-svn: 328491

[LSR] Allow giving priority to post-incrementing addressing modes

Implement TTI interface for targets to indicate that the LSR should give
priority to post-incrementing addressing modes.

Combination of patches by Sebastian Pop and Brendon Cahoon.

Differential Revision: https://reviews.llvm.org/D44758

llvm-svn: 328490

Make @skipUnlessSupportedTypeAttribute windows-compatible

- close_fds is not compatible with stdin/out redirection on windows. I
just remove it, as this is not required for correct operation.
- the command string was assuming a posix shell. I rewrite the Popen
invocation to avoid the need for passing the arguments through a shell.
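
A minimal sketch of a shell-free Popen invocation with redirected stdin/stdout. It is not the decorator's code: run_with_stdin is a hypothetical helper, and ECHO_UPPER is a stand-in command used only so the sketch runs without a compiler installed.

```python
import subprocess
import sys


def run_with_stdin(argv, source):
    """Run argv without a shell, feed `source` on stdin, return (rc, out, err)."""
    process = subprocess.Popen(
        argv,  # a list: no shell quoting is involved on any platform
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        # close_fds is deliberately left at its default.
    )
    stdout, stderr = process.communicate(source)
    return process.returncode, stdout, stderr


# Stand-in for a compiler command: upper-cases whatever arrives on stdin.
ECHO_UPPER = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
```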

llvm-svn: 328489

Add and fix some tests for PPC64

Summary:
TestExprsChar.py
Char is unsigned char by default in PowerPC.

TestDisassembleBreakpoint.py
Modify disassemble testcase to consider multiple architectures.

TestThreadJump.py
Jumping directly to the return line on the PowerPC architecture does not
mean returning the value that is seen in the code. The last test fails
because it needs the execution of some assembly at the beginning of the
function. This test is skipped for this architecture.

TestEhFrameUnwind.py
Implement func for ppc64le test case.

TestWatchLocation.py
TestStepOverWatchpoint.py
PowerPC currently supports only one H/W watchpoint.

TestDisassembleRawData.py
Add PowerPC opcode and instruction for disassemble testcase.

Reviewers: labath

Reviewed By: labath

Subscribers: davide, labath, alexandreyy, lldb-commits, luporl, lbianc

Differential Revision: https://reviews.llvm.org/D44472
Patch by Alexandre Yukio Yamashita <alexandre.yamashita@eldorado.org.br>.

llvm-svn: 328488

[llvm-mca] Add flag -instruction-tables to print the theoretical resource pressure distribution for instructions (PR36874)

The goal of this patch is to address most of PR36874. To fully fix PR36874 we
need to split the "InstructionInfo" view from the "SummaryView". That would make
it easy to check the latency and rthroughput as well.

The patch reuses all the logic from ResourcePressureView to print out the
"instruction tables".

We have an entry for every instruction in the input sequence. Each entry reports
the theoretical resource pressure distribution. Resource pressure is uniformly
distributed across all the processor resource units of a group.
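
The uniform distribution can be sketched in a few lines of Python. This is an illustration only; resource_pressure is a hypothetical name and the unit names in the usage note are example labels.

```python
from fractions import Fraction


def resource_pressure(cycles, group):
    """Spread `cycles` uniformly over the resource units in `group`."""
    share = Fraction(cycles, len(group))
    return {unit: share for unit in group}
```

For instance, one cycle on a group of two units such as ["JALU0", "JALU1"] is reported as 1/2 on each unit.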

At the moment, the backend pipeline is not configurable, so the only way to fix
this is by creating a different driver that simply sends instruction events to
the resource pressure view. That means, we don't use the Backend interface.
Instead, it is simpler to just have a different code-path for when flag
-instruction-tables is specified.

Once Clement addresses bug 36663, then we can port the "instruction tables"
logic into a stage of our configurable pipeline.

Updated the BtVer2 test cases (thanks Simon for the help). Now we pass flag
-instruction-tables to each modified test.

Differential Revision: https://reviews.llvm.org/D44839

llvm-svn: 328487

[LLDB][PPC64] Fix TestGdbRemoteAuxvSupport

Summary: PPC64's auxvec has a special key that must be ignored.

Reviewers: clayborg, labath

Reviewed By: clayborg, labath

Subscribers: alexandreyy, lbianc

Differential Revision: https://reviews.llvm.org/D43771
Patch by Leandro Lupori <leandro.lupori@gmail.com>.

llvm-svn: 328486

Add a test for setting the load address of a module with differing physical/virtual addresses

Summary:
First attempt at landing D42145 was reverted because it caused test
failures on some android devices. It turned out this was because these
devices had vdso modules with differing physical and virtual addresses.
This was not caught earlier because all of the modules in our tests
either lack physical addresses or have them identical to virtual ones.

In the discussion on the patch, we came to the conclusion that in the
scenario where we are merely setting a load address of a module (for
example from a dynamic loader plugin), we should always use virtual
addresses (i.e., preserve status quo). This patch adds a test to make
sure we don't regress in that direction.

Reviewers: owenpshaw

Subscribers: lldb-commits

Differential Revision: https://reviews.llvm.org/D44738

llvm-svn: 328485

Test commit - adding a new line.

llvm-svn: 328484

[LoopUnroll] Fix dangling pointers in SCEV

The current logic of loop SCEV invalidation in the Loop Unroller implicitly relies on
the fact that the exit count of outer loops cannot depend on exiting blocks of
inner loops, which is true in the current implementation of backedge taken count
calculation but is wrong in general. As a result, when we only forget the loop that
we have just unrolled, we may still have cached data for its outer loops (in particular,
exit counts) which keeps references to blocks of the inner loop that could have been
changed or even deleted.

The attached test demonstrates a situation where, after unrolling the innermost loop,
the outermost loop contains a dangling pointer to a non-existent block. The problem
shows up when we apply patch https://reviews.llvm.org/D44677 that makes SCEV
smarter about exit count calculation. I am not sure if the bug exists without this patch;
it appears that now it is accidentally correct just because in practice the exact backedge
taken count for outer loops with complex control flow inside is never calculated.
But when SCEV learns to do so, this problem shows up.

This patch replaces the existing logic of SCEV loop invalidation with a correct one, which
happens to be invalidation of the outermost loop (which also leads to invalidation of all
loops inside of it). It is the only way to ensure that no outer loop keeps dangling pointers
to removed blocks, or just outdated information that has changed after unrolling.

Differential Revision: https://reviews.llvm.org/D44818
Reviewed By: samparker

llvm-svn: 328483

Revert r328386 "[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32"

This broke Chromium (see crbug.com/825748). It looks like mstorsjo's follow-up
patch at D44876 fixes this, but let's revert back to green for now until that's
ready to land.

(Also reverts r328443.)

> Both GCC and MSVC only look at the low byte of a boolean when it is
> passed.

llvm-svn: 328482

[DeadArgElim] Strip allocsize attributes when deleting an argument.

Since allocsize refers to the argument number it gets invalidated when
an argument is removed and the numbers shift.
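
The index shift described above can be illustrated with a toy model. This is
only a sketch: ToyFunction and removeParam are hypothetical names and do not
correspond to LLVM's data structures or to the DeadArgElim code. It assumes a
function that records the position of its size parameter, and it drops that
record whenever a parameter is deleted, since the stored position may no
longer name the same parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy model of a function signature (not an LLVM type).
struct ToyFunction {
  std::vector<std::string> Params;
  // Position of the parameter that holds the allocation size, if any.
  std::optional<std::size_t> AllocSizeArg;
};

// Delete the parameter at Index. Every later parameter moves down by one,
// so a stored position could now point at the wrong parameter. The sketch
// takes the conservative route and strips the annotation.
void removeParam(ToyFunction &F, std::size_t Index) {
  assert(Index < F.Params.size() && "parameter index out of range");
  F.Params.erase(F.Params.begin() + static_cast<std::ptrdiff_t>(Index));
  F.AllocSizeArg.reset();
}
```

For a function with parameters ("unused", "size") and the size recorded at
position 1, removing position 0 leaves "size" at position 0, so the old
record would be stale if it were kept.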

llvm-svn: 328481

[IRCE] Enable increasing loops of variable bounds

CanBeMin is currently used which will report true for any unknown
values, but often a check is performed outside the loop which covers
this situation:

for (int i = 0; i < N; ++i)
  ...

if (N > 0)
  for (int i = 0; i < N; ++i)
    ...

So I've added 'LoopGuardedAgainstMin', which reports whether N is
greater than the minimum value, which then allows loops with a variable
loop count to be optimised. I've also moved the increasing bound
checking into its own function and replaced SumCanReachMax with another
isLoopEntryGuardedByCond function.

llvm-svn: 328480

This is PR36799.

Currently, we might have a bug with scripts like the one below:

.foo : ALIGN(8)
{
*(.foo)
} > ram

because we do not expand the memory region when doing ALIGN.

This might result in file range overlaps. The patch fixes the issue.

Differential revision: https://reviews.llvm.org/D44730

llvm-svn: 328479

[ARM] Simplify constructing the ARMArchFeature string. NFC.

Differential Revision: https://reviews.llvm.org/D44819

llvm-svn: 328478

Fix test case initialization issues in permissions test

llvm-svn: 328477

Implement filesystem::perm_options specified in NB comments.

The NB comments for filesystem changed permissions and added
a new enum `perm_options` which controls how the permissions
are applied.

This implements that NB resolution.
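
As an illustration of how `perm_options` controls the way permissions are
applied, here is a minimal sketch against the C++17 std::filesystem
interface. It assumes a POSIX-like filesystem and a C++17 toolchain; the
helper name demoPermOptions is hypothetical and is not taken from the patch
or its tests.

```cpp
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Hypothetical helper: creates the file at P, then applies three permission
// changes, one per perm_options mode, and returns the resulting permissions.
fs::perms demoPermOptions(const fs::path &P) {
  std::ofstream(P) << "x"; // create the file; the temporary stream closes here

  // replace: the file's permission bits become exactly the given set.
  fs::permissions(P, fs::perms::owner_read | fs::perms::owner_write,
                  fs::perm_options::replace);
  // add: the given bits are OR-ed into the current permissions.
  fs::permissions(P, fs::perms::group_read, fs::perm_options::add);
  // remove: the given bits are cleared from the current permissions.
  fs::permissions(P, fs::perms::owner_write, fs::perm_options::remove);

  return fs::status(P).permissions();
}
```

Under those assumptions the file ends up readable by owner and group only.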

llvm-svn: 328476

Make filesystem tests generic between experimental and std versions.

As I move towards implementing std::filesystem, there is a need to
make the existing tests run against both the std and experimental versions.
Additionally, it's helpful to allow running the tests against other
implementations of filesystem.

This patch converts the tests to easily target either. First, it
adds a filesystem_include.hpp header which is solely responsible
for selecting and including the correct implementation. Second,
it converts existing tests to use this header instead of including
filesystem directly.
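
One way such a selecting header could look is sketched below. This is not
the contents of the actual filesystem_include.hpp; the macro name
TOY_USE_STD_FILESYSTEM and the selection logic are assumptions made for
illustration. The idea is that tests refer only to the `fs` alias and never
include a filesystem header themselves.

```cpp
// Hypothetical selecting header: choose the std implementation when it is
// available in C++17 mode, otherwise fall back to the experimental one.
#if defined(__has_include)
#  if __has_include(<filesystem>) && __cplusplus >= 201703L
#    define TOY_USE_STD_FILESYSTEM 1
#  endif
#endif

#ifdef TOY_USE_STD_FILESYSTEM
#  include <filesystem>
namespace fs = std::filesystem;
#else
#  include <experimental/filesystem>
namespace fs = std::experimental::filesystem;
#endif
```

A test written against this header would use names such as fs::path and
would compile unchanged against either implementation.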

llvm-svn: 328475

[X86] Fix the SchedRW for intrinsic register form of SQRT/RCP/RSQRT.

llvm-svn: 328474

[X86] Merge the SSE and AVX versions of fp divs and sqrts in the SandyBridge/Haswell/Broadwell/Skylake scheduler models.

I've used Agner's data as best I could to get the values to converge on.

llvm-svn: 328473

[X86] Add itinerary to intrinsic version of sqrtss, rcpss, and rsqrtss instructions.

llvm-svn: 328472

[X86] Correct the itineraries for the dot product instructions.

llvm-svn: 328471

[X86] Use the same itinerary for VCVTDQ2PD as the SSE version so that the generated scheduler classes will merge.

llvm-svn: 328470

[X86] Swap the itineraries on the memory and register forms of CVTDQ2PD.

They were backwards.

llvm-svn: 328469

[X86] Give VMOVSX/ZX the same itinerary as the SSE version so they'll reuse the same generated scheduler class.

llvm-svn: 328468

[sanitizer] Make test compatible with Darwin

llvm-svn: 328467

[X86] Give vpmsadbw the same itinerary as the SSE version so they'll be able to share the same generated scheduler class.

llvm-svn: 328466

[X86] Move (v)movss to port 5 only for Skylake. Move (v)movups/d to port 015 for Skylake.

This matches Agner's data and is consistent with what the EVEX instructions were doing on SKX.

llvm-svn: 328465

[demangler] Use a back-patching scheme to resolve forward references.

Strictly in a conversion operator's type, a <template-param> refers to a
<template-arg> that is further ahead in the mangled name. Instead of
doing a second parse to resolve these, introduce a
ForwardTemplateReference Node and back-patch the referenced
<template-arg> when we're in the right context.

This is also a correctness fix: previously we would only do a second
parse if the <template-param> was out of bounds in the current set of
<template-args>. This led to misdemangles (gasp!) when the conversion
operator was a member of a templated struct, for instance.
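
The back-patching idea can be sketched with a toy model. The names
ToyForwardRef and backPatch are hypothetical and the code does not mirror the
demangler's Node classes; it only assumes that a reference is recorded by
index while parsing and is pointed at its target once the template arguments
are known, so no second parse is needed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical placeholder for a <template-param> seen before its
// <template-arg> has been parsed.
struct ToyForwardRef {
  std::size_t Index;                      // which template argument is meant
  const std::string *Resolved = nullptr;  // filled in later by backPatch

  explicit ToyForwardRef(std::size_t I) : Index(I) {}

  std::string print() const {
    return Resolved ? *Resolved : std::string("<unresolved>");
  }
};

// Resolve every pending reference against the now-complete argument list.
// Returns false if a reference is out of range (a malformed name in this toy
// model). TemplateArgs must outlive the references that point into it.
bool backPatch(std::vector<ToyForwardRef> &Pending,
               const std::vector<std::string> &TemplateArgs) {
  for (ToyForwardRef &Ref : Pending) {
    if (Ref.Index >= TemplateArgs.size())
      return false;
    Ref.Resolved = &TemplateArgs[Ref.Index];
  }
  return true;
}
```

In this model a reference prints as "<unresolved>" until backPatch has run,
after which it prints the text of the argument it refers to.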

llvm-svn: 328464

[demangler] Tweak how parameter pack sizes are determined.

Rather than eagerly propagating up parameter pack sizes in Node ctors,
find the parameter pack size during printing. This is being done to
support back-patching forward referencing <template-param>s.

llvm-svn: 328463

[demangler] Support for clang's enable_if attribute.

Fixes PR33569.

llvm-svn: 328462

[PatternMatch] allow undef elements when matching vector FP +0.0

This continues the FP constant pattern matching improvements from:
https://reviews.llvm.org/rL327627
https://reviews.llvm.org/rL327339
https://reviews.llvm.org/rL327307

Several integer constant matchers also have this ability. I'm
separating matching of integer/pointer null from FP positive zero
and renaming/commenting to make the functionality clearer.

llvm-svn: 328461