review.tizen.org Git - platform/upstream/llvm.git/log

[Attr] Add missing header for clang example.

The examples are easy to miss.

[lldb/DWARF] Switch to llvm location list parser

Summary:
This patch deletes the lldb location list parser and teaches the
DWARFExpression class to use the parser in llvm instead. I have
centralized all the places doing the parsing into a single
GetLocationExpression function.

In theory the the actual location list parsing should be covered by llvm
tests, and this glue code by our existing location list tests, but since
we don't have that many location list tests, I've tried to extend the
coverage a bit by adding some explicit dwarf5 loclist handling and a
test of the dumping code.

For DWARF4 location lists this should be NFC (modulo small differences
in error handling which should only show up on invalid inputs). In case
of DWARF5, this fixes various missing bits of functionality, most
notably, the lack of support for DW_LLE_offset_pair.

Reviewers: JDevlieghere, aprantl, clayborg

Subscribers: lldb-commits, dblaikie

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D71003

[lldb] Improve/fix base address selection in location lists

Summary:
Lldb support base address selection entries in location lists was broken
for a long time. This wasn't noticed until llvm started producing these
kinds of entries more frequently with r374600.

In r374769, I made a quick patch which added sufficient support for them
to get the test suite to pass. However, I did not fully understand how
this code operates, and so the fix was not complete. Specifically, what
was lacking was the ability to handle modules which were not loaded at
their preferred load address (for instance, due to ASLR).

Now that I better understand how this code works, I've come to the
conclusion that the current setup does not provide enough information
to correctly process these entries. In the current setup the location
lists were parameterized by two addresses:
- the distance of the function start from the start of the compile unit.
  The purpose of this was to make the location ranges relative to the
  start of the function.
- the actual address where the function was loaded at. With this the
  function-start-relative ranges can be translated to actual memory
  locations.

The reason for the two values, instead of just one (the load bias) is (I
think) MachO, where the debug info in the object files will appear to be
relative to the address zero, but the actual code it refers to
can be moved and reordered by the linker. This means that the location
lists need to be "linked" to reflect the locations in the actual linked
file.

These two bits of information were enough to correctly process location
lists which do not contain base address selection entries (and so all
entries are relative to the CU base). However, they don't work with
them because, in theory two base address can be completely unrelated (as
can happen for instace with hot/cold function splitting, where the
linker can reorder the two pars arbitrarily).

To fix that, I split the first parameter into two:
- the compile unit base address
- the function start address, as is known in the object file

The new algorithm becomes:
- the location lists are processed as they were meant to be processed.
  The CU base address is used as the initial base address value. Base
  address selection entries can set a new base.
- the difference between the "file" and "load" function start addresses
  is used to compute the load bias. This value is added to the final
  ranges to get the actual memory location.

This algorithm is correct for non-MachO debug info, as there the
location lists correctly describe the code in the final executable, and
the dynamic linker can just move the entire module, not pieces of it. It
will also be correct for MachO if the static linker preserves relative
positions of the various parts of the location lists -- I don't know
whether it actually does that, but judging by the lack of base address
selection support in dsymutil and lldb, this isn't something that has
come up in the past.

I add a test case which simulates the ASLR scenario and demonstrates
that base address selection entries now work correctly here.

Reviewers: JDevlieghere, aprantl, clayborg

Subscribers: dblaikie, lldb-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D70532

[test][tools] Add missing and improve testing

Mostly this adds testing for certain aliases in more explicit ways.
There are also a few tidy-ups, and additions of missing testing, where
the feature was either not tested at all, or not tested explicitly and
sufficiently.

Reviewed by: MaskRay, rupprecht, grimar

Differential Revision: https://reviews.llvm.org/D71116

[ARM][MVE] Add complex vector intrinsics

Summary:
This patch adds intrinsics for the following MVE instructions:
* VCADD, VHCADD
* VCMUL
* VCMLA

Each of the above 3 groups has a corresponding new LLVM IR intrinsic.

Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen

Reviewed By: MarkMurrayARM

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71190

[ARM] Add missing REQUIRES: asserts to test. NFC

[lldb] Xfail TestCallOverriddenMethod.py for aarch64/linux

This test still fails on Linux aarch64.
Tested by buildbot running Ubuntu Bionic

Differential Revision: https://reviews.llvm.org/D70722

[CommandLine] Add missing Callbacks

It appears that the cl::bits options are not used anywhere in-tree. In
the recent addition to add Callback's to the options, the Callback was
missing from this one. This fixes it by adding the same code from the
other classes.

It also adds a simple test, of sorts, just to make sure these continue
compiling.

[ARM] Enable MVE masked loads and stores

With the extra optimisations we have done, these should now be fine to
enable by default. Which is what this patch does.

Differential Revision: https://reviews.llvm.org/D70968

[clang][Tooling] Fix potential UB in ExpandResponseFilesCompilationDatabase

Summary:
`vector::assign` will cause UB at here.

fixes: https://github.com/clangd/clangd/issues/223

Reviewers: kadircet, sammccall, hokein

Reviewed By: sammccall

Subscribers: merge_guards_bot, ilya-biryukov, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D71172

[OpenCL] Handle address space conversions for constexpr (PR44177)

The AST for the constexpr.cl test contains address space conversion
nodes to cast through the implicit generic address space. These
caused the evaluator to reject the input as constexpr in C++ for
OpenCL mode, whereas the input was considered constexpr in plain C++
mode as the AST won't have address space cast nodes then.

Fixes PR44177.

Differential Revision: https://reviews.llvm.org/D71015

gn build: Merge 6d5c273500a

[ARM] Teach the Arm cost model that a Shift can be folded into other instructions

This attempts to teach the cost model in Arm that code such as:
  %s = shl i32 %a, 3
  %a = and i32 %s, %b
Can under Arm or Thumb2 become:
  and r0, r1, r2, lsl #3

So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.

We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.

Differential Revision: https://reviews.llvm.org/D70966

[ARM] Additional tests and minor formatting. NFC

This adds some extra cost model tests for shifts, and does some minor
adjustments to some Neon code to make it clear as to what it applies to.
Both NFC.

Reland "[AST] Traverse the class type loc inside the member type loc.""

Summary: added a unittest which causes "TL.getClassTInfo" is null.

Reviewers: ilya-biryukov

Subscribers: mgorny, jkorous, arphaman, kadircet, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D71186

[cmake] Disable GCC 9's -Wredundant-move

Summary:
This new warning (enabled by -Wextra) fires when a std::move is
redundant, as the default compiler behavior would be to select a move
operation anyway (e.g., when returning a local variable). Unlike
-Wpessimizing-move, it has no performance impact -- it just adds noise.

Currently llvm has about 1500 of these warnings. Unfortunately, the
suggested fix -- removing std::move -- does not work because of some
older compilers we still support. Specifically clang<=3.8 will not use a
move operation if an implicit conversion is needed (Core issue 1579). In
code like "A f(ConvertibleToA a) { return a; }" it will prefer a copy,
or fail to compile if a copy is not possible.

This patch disables that warning to get a meaningful signal out of a GCC
9 build.

Reviewers: rnk, aaron.ballman, xbolva00

Subscribers: mgorny, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70963

[DebugInfo] Make describeLoadedValue() reg aware

Summary:
Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.

This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.

This will allow us to for example describe instructions which produce
more than two parameters' values; e.g. Hexagon's various combine
instructions.

This also fixes situations in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register that should be used.

This also allows us to handle cases like this:

  $ebx = [...]
  $rdi = MOVSX64rr32 $ebx
  $esi = MOV32rr $edi
  CALL64pcrel32 @call

The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.

This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.

I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.

Reviewers: djtodoro, NikolaPrica, aprantl, vsk

Reviewed By: djtodoro, vsk

Subscribers: ormris, hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D70431

[lldb] Support for DWARF-5 atomic types

Summary:
This patch adds support for atomic types (DW_TAG_atomic_type) to LLDB. It's mostly just filling out all the switch-statements that didn't implement Atomic case with the usual boilerplate.

Thanks Pavel for writing the test case.

Reviewers: labath, aprantl, shafik

Reviewed By: labath

Subscribers: jfb, abidh, JDevlieghere, lldb-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D71183

Revert "[DebugInfo] Make describeLoadedValue() reg aware"

This reverts commit 3cd93a4efcdeabeb20cb7bec9fbddcb540d337a1.
I'll recommit with a well-formatted arcanist commit message.

[DebugInfo] Make describeLoadedValue() reg aware

Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.

This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.

This will allow us to for example describe instructions which produce
more than two parameters' values; e.g. Hexagon's various combine
instructions.

This also fixes a case in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register that should be used.

This also allows us to handle cases like this:

  $ebx = [...]
  $rdi = MOVSX64rr32 $ebx
  $esi = MOV32rr $edi
  CALL64pcrel32 @call

The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.

This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.

I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.

[compiler-rt] Add a critical section when flushing gcov counters

Summary:
Counters can be flushed in a multi-threaded context for example when the process is forked in different threads (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Instrumentation/GCOVProfiling.cpp#L632-L663).
In order to avoid pretty bad things, a critical section is needed around the flush.
We had a lot of crashes in this code in Firefox CI when we switched to clang for linux ccov builds and those crashes disappeared with this patch.

Reviewers: marco-c, froydnj, dmajor, davidxl

Reviewed By: marco-c, dmajor

Subscribers: froydnj, dmajor, dberris, jfb, #sanitizers, llvm-commits, sylvestre.ledru

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D70910

[lldb] Add a test for how we lazily create Clang AST nodes

Summary:
One of the ways we try to make LLDB faster is by only creating the Clang declarations (and loading the associated types)
when we actually need them for something. For example an evaluated expression might need to load types to
type check and codegen the expression.

Currently this mechanism isn't really tested, so we currently have no way to know how many Clang nodes we load and
when we load them. In general there seems to be some confusion when and why certain Clang nodes are created.
As we are about to make some changes to the code which is creating Clang AST nodes we probably should have
a test that at least checks that the current behaviour doesn't change. It also serves as some kind of documentation
on the current behaviour.

The test in this patch is just evaluating some expressions and checks which Clang nodes are created due to this in the
module AST. The check happens by looking at the AST dump of the current module and then scanning it for the
declarations we are looking for.

I'm aware that there are things missing in this test (inheritance, template parameters, non-expression evaluation commands)
but I'll expand it in follow up patches.

Also this test found two potential bugs in LLDB which are documented near the respective asserts in the test:

1. LLDB seems to always load all types of local variables even when we don't reference them in the expression. We had patches
that tried to prevent this but it seems that didn't work as well as it should have (even though we don't complete these
types).
2. We always seem to complete the first field of any record we run into. This has the funny side effect that LLDB is faster when
all classes in a project have an arbitrary `char unused;` as their first member. We probably want to fix this.

Reviewers: shafik

Subscribers: abidh, JDevlieghere, lldb-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D71056

Revert 393dacacf7e7 "[ARM] Enable TypePromotion by default"

This caused "Too many bits for uint64_t" asserts when building Chromium. See
https://crbug.com/1031978#c2 for a reproducer. I'll follow up on the
llvm-commits thread with a creduced version.

> ARMCodeGenPrepare has already been generalized and renamed to
> TypePromotion. We've had it enabled and tested downstream for a
> while, so enable it by default.
>
> Differential Revision: https://reviews.llvm.org/D70998

[c++20] Synthesis of defaulted comparison functions.

Array members are not yet handled. In addition, defaulted comparisons
can't yet find comparison operators by unqualified lookup (only by
member lookup and ADL). These issues will be fixed in follow-on changes.

Fix for build bot failure. For more details see:
https://reviews.llvm.org/D70691
Upated LIT test.

[PowerPC] Automatically generate store-constant.ll . NFC

Fix a few doc typos, to cycle bots.

[lldb/SWIG] Guard embedded Python code in SWIG interfaces by SWIGPYTHON

Guard the embedded Python code in LLDB's interface files by the
SWIGPYTHON define to ensures they can be reused for other languages
supported by SWIG.

[NFC][LivePhysRegs] Fix incorrect comment

Reviewers: #llvm, tellenbach

Reviewed By: tellenbach

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71051

Patch by: rollrat <rollrat.cse@gmail.com>

[Frontend] Allow OpenMP offloading to aarch64

Summary:
D30644 added OpenMP offloading to AArch64 targets, then D32035 changed the
frontend to throw an error when offloading is requested for an unsupported
target architecture. However the latter did not include AArch64 in the list
of supported architectures, causing the following unit tests to fail:

    libomptarget :: api/omp_get_num_devices.c
    libomptarget :: mapping/pr38704.c
    libomptarget :: offloading/offloading_success.c
    libomptarget :: offloading/offloading_success.cpp

Reviewers: pawosm01, gtbercea, jdoerfert, ABataev

Subscribers: kristof.beyls, guansong, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D70804

[InstSimplify] fold copysign with negated operand, part 2

This is another transform suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153

Unlike rG12f39e0fede9, it doesn't look like the
backend matches this variant.

Fix typo in the AST Matcher Reference doc Closes: #54

[InstSimplify] fold copysign with negated operand

This is another transform suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153

The backend for some targets already manages to get
this if it converts copysign to bitwise logic.

Revert "Driver: Don't look for libc++ headers in the install directory on Android."

This reverts commit 198fbcb817492ff45946e3f7517de15e8cdf0607.

This breaks Fedora 31.

[llvm-dwarfdump][Statistics] Unify coverage statistic computation

Summary:
The patch removes OffsetToFirstDefinition in the 'scope bytes total'
statistic computation. Thus it unifies the way the scope and the coverage
buckets are computed. The rationals behind that are the following:

1. OffsetToFirstDefinition was used to calculate the variable's life range.
However, there is no simple way to do it accurately, so the scope calculated
this way might be misleading. See D69027 for more details on the subject.
2. Both 'scope bytes total' and coverage buckets seem to be intended
to represent the same data in different ways. Otherwise, the statistics
might be controversial and confusing.

Note that the approach gives up a thorough evaluation of debug information
completeness (i.e. coverage buckets by themselves doesn't tell how good
the debug information is). Only changes in coverage over time make
a 'physical' sense.

Reviewers: djtodoro, aprantl, vsk, dblaikie, avl

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70548

[ARM] Attempt to use whole register vmovs for MVE shuffles.

MVE doesn't have the range of shuffle instructions available in Neon. We
also cannot use the trick of cutting a difficult vector shuffle in half
to simplify things. Instead we need to be more careful about how we
lower shuffles.

This patch adds an extra combine that attempts to find "whole lane"
vmovs when lowering shuffles of smaller types. This helps us make some
shuffles a lot simpler, generating single lane movs for the parts that
can make use of it, falling back to the original shuffle for the rest.

Differential Revision: https://reviews.llvm.org/D69509

[ARM] Disable VLD4 under MVE

Alas, using half the available vector registers in a single instruction
is just too much for the register allocator to handle. The mve-vldst4.ll
test here fails when these instructions are enabled at present. This
patch disables the generation of VLD4 and VST4 by adding a
mve-max-interleave-factor option, which we currently default to 2.

Differential Revision: https://reviews.llvm.org/D71109

[clangd] Navigation from definition of template specialization to primary template

Fixes https://github.com/clangd/clangd/issues/212.

Reviewers: sammccall

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D71090

[LV] Pick correct BB as insert point when fixing PHI for FORs.

Currently we fail to pick the right insertion point when
PreviousLastPart of a first-order-recurrence is a PHI node not in the
LoopVectorBody. This can happen when PreviousLastPart is produce in a
predicated block. In that case, we should pick the insertion point in
the BB the PHI is in.

Fixes PR44020.

Reviewers: hsaito, fhahn, Ayal, dorit

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D71071

Symbol: use elaborated types for `DataExtractor`

Use type elaborated spellings for the parameter to avoid the ambiguity
between `llvm` and `lldb_private` names. This is needed for building
with Visual Studio.

[SystemZ] Fix build bot failures

My patch 9db13b5a7d43096a9ab5f7cef6e1b7e2dc9c9c63 seems to have
caused some build bots to fail due to warnings that appear only
when using -Wcovered-switch-default.

This patch is an attempt to fix this by trying to avoid both the warning
"default label in switch which covers all enumeration values"
for the inner switch statements and at the same time the warning
"this statement may fall through"
for the outer switch statement in getVectorComparison
(SystemZISelLowering.cpp).

Optionally exclude bitfield definitions from magic numbers check

Adds the IgnoreBitFieldsWidths option to readability-magic-numbers.

[SimplifyCFG] Account for N being null.

Fixes a crash, e.g.
http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15119/

[BPF] Support weak global variables for BTF

Generate types for global variables with "weak" attribute.
Keep allocation scope the same for both weak and non-weak
globals as ELF symbol table can determine whether a global
symbol is weak or not.

Differential Revision: https://reviews.llvm.org/D71162

[SimplifyCFG] Handle AssumptionCache being null.

AssumptionCache can be null in SimplifyCFGOptions. However, FoldCondBranchOnPHI() was not properly handling that when passing a null AssumptionCache to simplifyCFG.

Patch by Rodrigo Caetano Rocha <rcor.cs@gmail.com>

Reviewers: fhahn, lebedev.ri, spatel

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D69963

[OpenMP] Require trivially copyable type for mapping

A trivially copyable type provides a trivial copy constructor and a trivial
copy assignment operator. This is enough for the runtime to memcpy the data
to the device. Additionally there must be no virtual functions or virtual
base classes and the destructor is guaranteed to be trivial, ie performs
no action.
The runtime does not require trivial default constructors because on alloc
the memory is undefined. Thus, weaken the warning to be only issued if the
mapped type is not trivially copyable.

Differential Revision: https://reviews.llvm.org/D71134

[FPEnv] Constrained FCmp intrinsics

This adds support for constrained floating-point comparison intrinsics.

Specifically, we add:

      declare <ty2>
      @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
                                          metadata <condition code>,
                                          metadata <exception behavior>)
      declare <ty2>
      @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
                                           metadata <condition code>,
                                           metadata <exception behavior>)

The first variant implements an IEEE "quiet" comparison (i.e. we only
get an invalid FP exception if either argument is a SNaN), while the
second variant implements an IEEE "signaling" comparison (i.e. we get
an invalid FP exception if either argument is any NaN).

The condition code is implemented as a metadata string.  The same set
of predicates as for the fcmp instruction is supported (except for the
"true" and "false" predicates).

These new intrinsics are mapped by SelectionDAG codegen onto two new
ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again
representing quiet vs. signaling comparison operations.  Otherwise
those nodes look like SETCC nodes, with an additional chain argument
and result as usual for strict FP nodes.  The patch includes support
for the common legalization operations for those nodes.

The patch also includes full SystemZ back-end support for the new
ISD nodes, mapping them to all available SystemZ instruction to
fully implement strict semantics (scalar and vector).

Differential Revision: https://reviews.llvm.org/D69281

gn build: Merge e60b36cf92e

[VPlan] Rename VPlanHCFGTransforms to VPlanTransforms (NFC).

The file is intended to gather various VPlan transformations, not only
CFG related transforms. Actually, the only transformation there is not
CFG related.

Reviewers: Ayal, gilr, hsaito, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D70732

[PowerPC] Fix MI peephole optimization for splats

Summary:
This patch fixes an issue where the PPC MI peephole optimization pass incorrectly remove a vector swap.

Specifically, the pass can combine a splat/swap to a splat/copy. It uses `TargetRegisterInfo::lookThruCopyLike` to determine that the operands to the splat are the same. However, the current logic only compares the operands based on register numbers. In the case where the splat operands are ultimately feed from the same physical register, the pass can incorrectly remove a swap if the feed register for one of the operands has been clobbered.

This patch adds a check to ensure that the registers feeding are both virtual registers or the operands to the splat or swap are both the same register.

Here is an example in pseudo-MIR of what happens in the test cased added in this patch:

Before PPC MI peephole optimization:
```
%arg = XVADDDP %0, %1

$f1 = COPY %arg.sub_64
call double rint(double)
%res.first = COPY $f1
%vec.res.first = SUBREG_TO_REG 1, %res.first, %subreg.sub_64

%arg.swapped = XXPERMDI %arg, %arg, 2
$f1 = COPY %arg.swapped.sub_64
call double rint(double)
%res.second = COPY $f1

%vec.res.second = SUBREG_TO_REG 1, %res.second, %subreg.sub_64
%vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0
%vec.res = XXPERMDI %vec.res.splat, %vec.res.splat, 2
; %vec.res == [ %vec.res.second[0], %vec.res.first[0] ]
```

After optimization:
```
; ...
%vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0
; lookThruCopyLike(%vec.res.first) == lookThruCopyLike(%vec.res.second) == $f1
; so the pass replaces the swap with a copy:
%vec.res = COPY %vec.res.splat
; %vec.res == [ %vec.res.first[0], %vec.res.second[0] ]
```

As best as I can tell, this has occurred since r288152, which added support for lowering certain vector operations to direct moves in the form of a splat.

Committed for vddvss (Colin Samples). Thanks Colin for the patch!
Differential Revision: https://reviews.llvm.org/D69497

export.sh: Fetch sources from GitHub instead of SVN

Reviewers: hansw, jdoerfert

Subscribers: sylvestre.ledru, mgorny, hans, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70460

Driver: Don't look for libc++ headers in the install directory on Android.

The NDK uses a separate set of libc++ headers in the sysroot. Any headers
in the installation directory are not going to work on Android, not least
because they use a different name for the inline namespace (std::__1 instead
of std::__ndk1).

This effectively makes it impossible to produce a single toolchain that is
capable of targeting both Android and another platform that expects libc++
headers to be installed in the installation directory, such as Mac.

In order to allow this scenario to work, stop looking for headers in the
install directory on Android.

Differential Revision: https://reviews.llvm.org/D71154

[AArch64][GlobalISel] Add missing default statement to a switch in the selector.

Move variable only used in an assert into the assert itself.

This prevents unused variable warnings from breaking the build.

[c++20] Determine whether a defaulted comparison should be deleted or
constexpr.

[AArch64][GlobalISel] Add support for selection of vector G_SHL with immediates.

Only implemented for the type combinations already supported for G_SHL.

Differential Revision: https://reviews.llvm.org/D71153

[lldb/Reproducer] Disable test on Windows to unblock the bot.

http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja

Add matchDynamic convenience functions

Summary: These correspond to the existing match() free functions.

Reviewers: aaron.ballman

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D54406

gn build: Change scudo's list of supported platforms to a whitelist.

Scudo only supports building for android/linux/fuchsia, so require target_os to
be one of linux/fuchsia to do a stage2_unix scudo build. Android is already
covered by the stage2_android* toolchains below.

Differential Revision: https://reviews.llvm.org/D71131

Revert "[Sema][X86] Consider target attribute into the checks in validateOutputSize and validateInputSize."

This reverts commit e1578fd2b79fe5af5f80c0c166a8abd0f816c022.

It introduces a dependency on Attr.h which I am removing from
ASTContext.h.

Use ASTDumper to dump the AST from clang-query

Summary:
This way, the output is not limited by the various API differences
between the dump() member functions. For example, all dumps are now in
color, while that used to be the case only for Decls and Stmts, but not
Types.

Additionally, while DynTypedNode::dump (which was used up to now) was
limited to dumping only Decls, Stmts and Types, this makes clang-query
support everything ASTNodeTraverser supports.

Reviewers: aaron.ballman

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D62056

[Sema][X86] Consider target attribute into the checks in validateOutputSize and validateInputSize.

The validateOutputSize and validateInputSize need to check whether
AVX or AVX512 are enabled. But this can be affected by the
target attribute so we need to factor that in.

This patch copies some of the code from CodeGen to create an
appropriate feature map that we can pass to the function. Probably
need some refactoring here to share more code with Codegen. Is
there a good place to do that? Also need to support the cpu_specific
attribute as well.

Differential Revision: https://reviews.llvm.org/D68627

Remove Expr.h include from ASTContext.h, NFC

ASTContext.h is popular, prune its includes. Expr.h brings in Attr.h,
which is also expensive.

Move BlockVarCopyInit to Expr.h to accomplish this.

[CommandLine] Add callbacks to Options

Summary:
Add a new cl::callback attribute to Option.

This attribute specifies a callback function that is called when
an option is seen, and can be used to set other options, as in
option A implies option B. If the option is a `cl::list`, and
`cl::CommaSeparated` is also specified, the callback will fire
once for each value. This could be used to validate combinations
or selectively set other options.

Reviewers: beanz, thomasfinch, MaskRay, thopre, serge-sans-paille

Reviewed By: beanz

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70620

Make it possible control matcher traversal kind with ASTContext

Summary:
This will eventually allow traversal of an AST while ignoring invisible
AST nodes. Currently it depends on the available enum values for
TraversalKinds. That can be extended to ignore all invisible nodes in
the future.

Reviewers: klimek, aaron.ballman

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D61837

[WebAssebmly][MC] Support .import_name/.import_field asm directives

Convert the MC test to use asm rather than bitcode.

This is a precursor to https://reviews.llvm.org/D70520.

Differential Revision: https://reviews.llvm.org/D70877

[MC] Rewrite tablegen for printInstrAlias to comiple faster, NFC

Before this change, the *InstPrinter.cpp files of each target where some
of the slowest objects to compile in all of LLVM. See this snippet produced by
ClangBuildAnalyzer:
https://reviews.llvm.org/P8171$96
Search for "InstPrinter", and see that it shows up in a few places.

Tablegen was emitting a large switch containing a sequence of operand checks,
each of which created many conditions and many BBs. Register allocation and
jump threading both did not scale well with such a large repetitive sequence of
basic blocks.

So, this change essentially turns those control flow structures into
data. The previous structure looked like:

  switch (Opc) {
  case TGT::ADD:
    // check alias 1
    if (MI->getOperandCount() == N && // check num opnds
        MI->getOperand(0).isReg() && // check opnd 0
        ...
        MI->getOperand(1).isImm() && // check opnd 1
     AsmString = "foo";
     break;
   }
   // check alias 2
   if (...)
     ...
   return false;

The new structure looks like:

  OpToPatterns: Sorted table of opcodes mapping to pattern indices.
   \->
     Patterns: List of patterns. Previous table points to subrange of
               patterns to match.
      \->
        Conds: The if conditions above encoded as a kind and 32-bit value.

See MCInstPrinter.cpp for the details of how the new data structures are
interpreted.

Here are some before and after metrics.
Time to compile AArch64InstPrinter.cpp:
  0m29.062s vs. 0m2.203s
size of the obj:
  3.9M vs. 676K
size of clang.exe:
  97M vs. 96M

I have not benchmarked disassembly performance, but typically
disassemblers are bottlenecked on IO and string processing, not alias
matching, so I'm not sure it's interesting enough to be worth doing.

Reviewers: RKSimon, andreadb, xbolva00, craig.topper

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D70650

[X86] Fix prolog/epilog mismatch for stack protectors on win32-macho.

The xor'ing behaviour is only used for msvc/crt environments, when we're targeting
macho the guard load code doesn't know about the xor in the epilog. Disable xor'ing
when targeting win32-macho to be consistent.

Differential Revision: https://reviews.llvm.org/D71095

[ObjC] Make sure that the implicit arguments for direct methods have been setup

This commit sets the Self and Imp declarations for ObjC method declarations,
in addition to the definitions. It also fixes
a bunch of code in clang that had wrong assumptions about when getSelfDecl() would be set:

- CGDebugInfo::getObjCMethodName and AnalysisConsumer::getFunctionName would assume that it was
set for method declarations part of a protocol, which they never were,
and that self would be a Class type, which it isn't as it is id for a protocol.

Also use the Canonical Decl to index the set of Direct methods so that
when calls and implementations interleave, the same llvm::Function is
used and the same symbol name emitted.

Radar-Id: rdar://problem/57661767

Patch by: Pierre Habouzit

Differential Revision: https://reviews.llvm.org/D71091

wrap an rst file to 80 cols, to cycle bots

[TargetLowering] Fix another potential FPE in expandFP_TO_UINT

D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence:

// Sel = Src < 0x8000000000000000
// Val = select Sel, Src, Src - 0x8000000000000000
// Ofs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val) ^ Ofs
The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.)

Instead, I'd suggest to use the following sequence:

// Sel = Src < 0x8000000000000000
// FltOfs = select Sel, 0, 0x8000000000000000
// IntOfs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val - FltOfs) ^ IntOfs
In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway).

In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit.

There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.)

Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper

Differential Revision: https://reviews.llvm.org/D67105

[analyzer] Fix false positive on introspection of a block's internal layout.

When implementation of the block runtime is available, we should not
warn that block layout fields are uninitialized simply because they're
on the stack.

[InstSimplify] add tests for copysign with fneg operand; NFC

[WPD] Remove unused parameter (NFC)

Remove unused parameter.

[X86] Don't setup and teardown memory for a musttail call

Summary:
musttail calls should not require allocating extra stack for arguments.
Updates to arguments passed in memory should happen in place before the
epilogue.

This bug was mostly a missed optimization, unless inalloca was used and
store to push conversion fired.

If a reserved call frame was used for an inalloca musttail call, the
call setup and teardown instructions would be deleted, and SP
adjustments would be inserted in the prologue and epilogue. You can see
these are removed from several test cases in this change.

In the case where the stack frame was not reserved, i.e. call frame
optimization fires and turns argument stores into pushes, then the
imbalanced call frame setup instructions created for inalloca calls
become a problem. They remain in the instruction stream, resulting in a
call setup that allocates zero bytes (expected for inalloca), and a call
teardown that deallocates the inalloca pack. This deallocation was
unbalanced, leading to subsequent crashes.

Reviewers: hans

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71097

[clang-tidy] Pass -faligned-allocation on the compiler command line to
fix compile error

The test was failing when run on OSes older than MacOSX10.14 because
aligned deallocation functions are unavailable on older OSes.

rdar://problem/57706710

Revert "[PGO][PGSO] Instrument the code gen / target passes."

This reverts commit 9a0b5e14075a1f42a72eedb66fd4fde7985d37ac.

This seems to break buildbots.

[OPENMP50]Add if clause in distribute simd directive.

According to OpenMP 5.0, if clause can be used in for simd directive. If
condition in the if clause if false, the non-vectorized version of the
loop must be executed.

[AutoFDO] Inline replay for cold/small callees from sample profile loader

Summary:
Sample profile loader of AutoFDO tries to replay previous inlining using context sensitive profile. The replay only repeats inlining if the call site block is hot. As a result it punts inlining of small functions, some of which can be beneficial for size, and will still be inlined by CSGCC inliner later. The oscillation between sample profile loader's inlining and regular CGSSC inlining cause unnecessary loss of context-sensitive profile. It doesn't have much impact for inline decision itself, but it negatively affects post-inline profile quality as CGSCC inliner have to scale counts which is not as accurate as the original context sensitive profile, and bad post-inline profile can misguide code layout.

This change added regular Inline Cost calculation for sample profile loader, so we can inline small functions upfront under switch -sample-profile-inline-size. In addition -sample-profile-cold-inline-threshold is added so we can tune the separate size threshold - currently the default is chosen to be the same as regular inliner's cold call-site threshold.

Reviewers: wmi, davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70750

Stop checking whether std::strong_* has ::equivalent members.

Any attempt to use these would be a bug, so we shouldn't even look for
them.

Avoid naming variable after type to fix GCC 5.3 build

GCC says:
.../llvm/lib/DebugInfo/GSYM/FunctionInfo.cpp:195:12:
error: ‘InfoType’ is not a class, namespace, or enumeration
case InfoType::EndOfList:
^

Presumably, GCC thinks InfoType is a variable here. Work around it by
using the name IT as is done above.

Revert "[InstCombine] reduce code duplication; NFC"

This reverts commit db5739658467e20a52f20e769d3580412e13ff87.
At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.

Revert "[InstCombine] improve readability; NFC"

This reverts commit 7250ef3613cc6b81145b9543bafb86d7f9466cde.
At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.

Revert "[InstCombine] reduce indentation; NFC"

This reverts commit 8bf8ef7116bd0daec570b35480ca969b74e66c6e.
At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.

[libcxx{,abi}] Don't link libpthread and libdl on Fuchsia

These are a part of the libc so linking these explicitly isn't necessary
and embedding these as deplibs causes link time error.

This issues was introduced in a9b5fff which changed how we emit deplibs.

Differential Revision: https://reviews.llvm.org/D71135

Revert "ARM-Darwin: keep the frame register reserved even if not updated."

This reverts commit a7d90af1be48234ce583e00fb16e33633d44ae38.

This revision came back as the root-cause for crashes in internal
ARM-IOS apps.
Reproducer in https://bugs.llvm.org/show_bug.cgi?id=44231.

[x86] add cost model special-case for insert/extract from element 0

This is a follow-up to D70607 where we made any
extract element on SLM more costly than default. But that is
pessimistic for extract from element 0 because that corresponds
to x86 movd/movq instructions. These generally have >1 cycle
latency, but they are probably implemented as single uop
instructions.

Note that no vectorization tests are affected by this change.
Also, no targets besides SLM are affected because those are
falling through to the default cost of 1 anyway. But this will
become visible/important if we add more specializations via cost
tables.

Differential Revision: https://reviews.llvm.org/D71023

[PGO][PGSO] Instrument the code gen / target passes.

Summary:
Split off of D67120.

Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71072

clang/AMDGPU: Fix default for frame-pointer attribute

Enabling optimization should allow frame pointer elimination.

[OPENMP]Reorganize OpenMP warning groups.

openmp-mapping group is a subgroup of openmp-target warning group. Also,
added global openmp group to control all other OpenMP warning groups.

[InstCombine] reduce indentation; NFC

[InstCombine] improve readability; NFC

CreateIntCast returns the input if its type matches, so need to duplicate that check.

[InstCombine] reduce code duplication; NFC

[InstCombine] improve readability; NFC

Add `QualType::hasAddressSpace`. NFC.

- Add that as a shorthand of <T>.getQualifiers().hasAddressSpace().
- Simplify related code.

[MBP] Avoid tail duplication if it can't bring benefit

Current tail duplication integrated in bb layout is designed to increase the fallthrough from a BB's predecessor to its successor, but we have observed cases that duplication doesn't increase fallthrough, or it brings too much size overhead.

To overcome these two issues in function canTailDuplicateUnplacedPreds I add two checks:

make sure there is at least one duplication in current work set.
the number of duplication should not exceed the number of successors.

The modification in hasBetterLayoutPredecessor fixes a bug that potential predecessor must be at the bottom of a chain.

Differential Revision: https://reviews.llvm.org/D64376

[ASTImporter] Implicitly declare parameters for imported ObjCMethodDecls

Summary:
When Sema encounters a ObjCMethodDecl definition it declares the implicit parameters for the ObjCMethodDecl.
When importing such a method with the ASTImporter we need to do the same for the imported method
otherwise we will crash when generating code (where CodeGen expects that this was called by Sema).

Note I had to implement Objective-C[++] support in Language.cpp as this is the first test for Objective-C and this
would otherwise just hit this 'not implemented' assert when running the unit test.

Reviewers: martong, a.sidorin, shafik

Reviewed By: martong

Subscribers: rnkovacs, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D71112

[NFC][AIX][XCOFF] if the size of Csect is zero, the Csect do not need write any data into sections

SUMMARY:

if the size of Csect is zero, the Csect do not need write any data into sections
for example, the TOC Csect has zero size, it do not need invoke a
Asm.writeSectionData(W.OS, Csect.MCCsect, Layout);

Reviewers: daltenty
Subscribers: rupprecht, seiyai,hiraditya

Differential Revision: https://reviews.llvm.org/D71120

update string comparison in clang-format.py

Summary: Python 3.8 introduces a SyntaxWarning about string comparisons with 'is'. This commit updates the string comparison in clang-format.py that is done with 'is not' to '!='. This should not break compatibility with older python versions (tested 3.4.9, 2.7.17, 2.7.5, 3.8.0).

Reviewers: MyDeveloperDay, klimek, llvm-commits, cfe-commits

Reviewed By: MyDeveloperDay, klimek

Patch By: pseyfert

Tags: #clang-format, #clang

Differential Revision: https://reviews.llvm.org/D70664

[clang-format] update trailing newline treatment in clang-format.py

Summary:
The current clang-format.py does not handle trailing newlines at the end of a file correctly.
Trailing empty lines get removed except one.
As far as I understand this is because clang-format gets fed from stdin and writes to stdout when called from clang-format.py.
In a "normal" file (with no trailing empty lines) the string that gets passed to clang-format does not contain a trailing '\n' after the '\n'.join from python.
The clang-format binary does not add a trailing newline to input from stdin, but (if there are multiple trailing '\n', all except one get removed).

When reading back this means that we see in python from a "normal" file a string with no trailing '\n'. From a file with (potentially multiple) empty line(s) at the end, we get a string with one trailing '\n' back in python. In the former case all is fine, in the latter case split('\n') makes one empty line at the end of the file out of the clang-format output. Desired would be instead that the **file** ends with a newline, but not with an empty line.

For the case that a user specifies a range to format (and wants to keep trailing empty lines) I did **not** try to fix this by simply removing all trailing newlines from the clang-format output. Instead, I add a '\n' to the unformatted file content (i.e. newline-terminate what is passed to clang-format) and then strip off the last newline from the output (which itself is now for sure the newline termination of the clang-format output).

(Should this get approved, I'll need someone to help me land this.)

Reviewers: klimek, MyDeveloperDay

Reviewed By: MyDeveloperDay

Patch By: pseyfert

Subscribers: cfe-commits, llvm-commits

Tags: #clang-format, #clang

Differential Revision: https://reviews.llvm.org/D70864
update trailing newline treatment in clang-format.py