review.tizen.org Git - platform/upstream/llvm.git/log

[XRay] Guard call to postCurrentThreadFCT()

Summary:
Some cases where `postCurrentThreadFCT()` are not guarded by our
recursion guard. We've observed that sometimes these can lead to
deadlocks when some functions (like memcpy()) gets outlined and the
version of memcpy is XRay-instrumented, which can be materialised by the
compiler in the implementation of lower-level components used by the
profiling runtime.

This change ensures that all calls to `postCurrentThreadFCT` are guarded
by our thread-recursion guard, to prevent deadlocks.

Reviewers: mboerger, eizan

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53805

llvm-svn: 345489

[X86] Force floating point values in constant pool decoding to print in scientific notation so they can't be confused with integers.

When the floating point constants are whole numbers they have no decimal point so look like integers, but mean something very different in something like an 'and' instruction.

Ideally we would just print a decimal point and a 0, but I couldn't see how to make APFloat::toString do that.

llvm-svn: 345488

Revert "Support for groups of attributes in #pragma clang attribute"

This reverts commit r345486.

Looks like it causes some old versions of GCC to crash, I'll see if I can
work around it and recommit...

llvm-svn: 345487

Support for groups of attributes in #pragma clang attribute

This commit enables pushing an empty #pragma clang attribute push, then adding
multiple attributes to it, then popping them all with #pragma clang attribute
pop, just like #pragma clang diagnostic. We still support the current way of
adding these, #pragma clang attribute push(__attribute__((...))), by treating it
like a combined push/attribute. This is needed to create macros like:

DO_SOMETHING_BEGIN(attr1, attr2, attr3)
// ...
DO_SOMETHING_END

rdar://45496947

Differential revision: https://reviews.llvm.org/D53621

llvm-svn: 345486

[XRay] Use more portable control block

Summary:
In D53560, we assumed a specific layout for memory without using an
explicit structure. This follow-up change uses more portable layout
control by using unions in a struct, and consolidating the memory
management code in the buffer queue.

We also take the opportunity to improve the documentation on the types
and operations, along with simplifying some of the logic in the buffer
queue implementation.

Reviewers: mboerger, eizan

Subscribers: jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D53802

llvm-svn: 345485

[X86] Recognize constant splats in LowerFCOPYSIGN.

llvm-svn: 345484

[X86] Add test case to show failure to handle splat vectors in the constant check in LowerFCOPYSIGN.

llvm-svn: 345483

Revert "Revert "DebugInfo: reduce DIE range verification on object files""

This reverts commit 836c763dadbd9478fa35b1a291a38bf17aa206ba. Default
initialize the values that MSAN caught.

llvm-svn: 345482

[SelectionDAG] Fix bad indentation. NFC

llvm-svn: 345481

[llvm-exegesis] Fix SNB counter definition and handling.

Summary: SNB is the only one that has P23 as a single proc res.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53766

llvm-svn: 345480

AST: extend MS decoration handling for extended vectors

We correctly handled extended vectors of non-floating point types.
However, we have the Intel style builtins which MSVC also supports which
do overlap in sizes with the floating point extended vectors.  This
would result in overloading of floating point extended vector types
which matched sizes (e.g. <3 x float> would be backed by a <4 x float>
and thus match sizes) to be mangled similarly.  Extended vectors are a
clang extension which live outside of the builtins, so mangle them all
similarly.  This change just extends the current scheme to treat
floating point types similar to the way that we treat other types
currently.

This now allows the swift runtime to be built for Windows again.

llvm-svn: 345479

[TargetLowering] Move i64/vXi64 to f32/vXf32 UINT_TO_FP handling to TargetLowering::expandUINT_TO_FP.

llvm-svn: 345478

[AST] Fix an use-of-uninitialized bug introduced in CaseStmt

SwitchCaseBits.CaseStmtIsGNURange needs to be initialized first.

llvm-svn: 345477

[X86][NFC] sse42-schedule.ll: disable XOP for BdVer2 tests

Else we are clearly testing the wrong instruction.

llvm-svn: 345476

[X86][NFC] sse41-schedule.ll: disable XOP for BdVer2 tests

Else we are clearly testing the wrong instruction.

llvm-svn: 345475

[X86][NFC] sse2-schedule.ll: disable XOP for BdVer2 tests

Else we are clearly testing the wrong instruction.

llvm-svn: 345474

[VectorLegalizer] Enable TargetLowering::expandFP_TO_UINT support.

Add vector support to TargetLowering::expandFP_TO_UINT.

This exposes an issue in X86TargetLowering::LowerVSELECT which was assuming that the select mask was the same width as the LHS/RHS ops - as long as the result is a sign splat we can easily sext/trunk this.

llvm-svn: 345473

[AST] Don't store data for GNU range case statement if not needed

Don't store the data for case statements of the form LHS ... RHS if not
needed. This cuts the size of CaseStmt by 1 pointer + 1 SourceLocation in
the common case.

Also use the newly available space in the bit-fields of Stmt to store the
keyword location of SwitchCase and move the small accessor
SwitchCase::getSubStmt to the header.

Differential Revision: https://reviews.llvm.org/D53609

Reviewed By: rjmccall

llvm-svn: 345472

[XRay] Refcount backing store for buffers

Summary:
This change implements the ref-counting for backing stores associated
with generational buffer management. We do this as an implementation
detail of the buffer queue, instead of exposing this to the interface.

This change allows us to keep the buffer queue interface and usage model
the same.

Depends on D53551.

Reviewers: mboerger, eizan

Subscribers: jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D53560

llvm-svn: 345471

Reapply Pass the nopie flag to the linker when linking with -pg.

llvm-svn: 345470

[DAGCombiner] Better constant vector support for FCOPYSIGN.

Enable constant folding when both operands are vectors of constants.

Turn into FNEG/FABS when the RHS is a splat constant vector.

llvm-svn: 345469

[X86] Add test cases showing missed opportunities for optimizing vector fcopysign when the RHS is a splat constant.

llvm-svn: 345468

[utils] collect_and_build_with_pgo.py: revert part already fixed in rL345461

The change was inadvertently included in my last commit.

llvm-svn: 345467

[utils] Fix _run_benchmark in collect_and_build_with_pgo.py

Summary: Also fix a FIXME in _build_stage1_clang: clang llvm-profdata profile are sufficient

Reviewers: george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53795

llvm-svn: 345466

Revert r344172: [LV] Add a new reduction pattern match

This patch has caused fast-math issues in the reduction pattern.

Will re-work and land again.

llvm-svn: 345465

[AST] Only store the needed data in IfStmt

Only store the needed data in IfStmt. This cuts the size of IfStmt
by up to 3 pointers + 1 SourceLocation. The order of the children
is intentionally kept the same even though it would be more
convenient to put the optional trailing objects last. Additionally
use the newly available space in the bit-fields of Stmt to store
the location of the "if".

The result of this is that for the common case of an
if statement of the form:

if (some_cond)
some_statement

the size of IfStmt is brought down to 8 bytes + 2 pointers,
instead of 8 bytes + 5 pointers + 2 SourceLocation.

Differential Revision: https://reviews.llvm.org/D53607

Reviewed By: rjmccall

llvm-svn: 345464

AMD BdVer2 (Piledriver) Initial Scheduler model

Summary:
# Overview
This is somewhat partial.
* Latencies are good {F7371125}
  * All of these remaining inconsistencies //appear// to be noise/noisy/flaky.
* NumMicroOps are somewhat good {F7371158}
  * Most of the remaining inconsistencies are from `Ld` / `Ld_ReadAfterLd` classes
* Actual unit occupation (pipes, `ResourceCycles`) are undiscovered lands, i did not really look there.
  They are basically verbatum copy from `btver2`
* Many `InstRW`. And there are still inconsistencies left...

To be noted:
I think this is the first new schedule profile produced with the new next-gen tools like llvm-exegesis!

# Benchmark
I realize that isn't what was suggested, but i'll start with some "internal" public real-world benchmark i understand - [[ https://github.com/darktable-org/rawspeed | RawSpeed raw image decoding library ]].
Diff (the exact clang from trunk without/with this patch):
```
Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Benchmark                                                                                        Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_pvalue                             0.0000          0.0000      U Test, Repetitions: 25 vs 25
Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_mean                              -0.0607         -0.0604           234           219           233           219
Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_median                            -0.0630         -0.0626           233           219           233           219
Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_stddev                            +0.2581         +0.2587             1             2             1             2
Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_pvalue                             0.0000          0.0000      U Test, Repetitions: 25 vs 25
Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_mean                              -0.0770         -0.0767           144           133           144           133
Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_median                            -0.0767         -0.0763           144           133           144           133
Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_stddev                            -0.4170         -0.4156             1             0             1             0
Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_pvalue                                          0.0000          0.0000      U Test, Repetitions: 25 vs 25
Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_mean                                           -0.0271         -0.0270           463           450           463           450
Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_median                                         -0.0093         -0.0093           453           449           453           449
Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_stddev                                         -0.7280         -0.7280            13             4            13             4
Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_pvalue                                          0.0004          0.0004      U Test, Repetitions: 25 vs 25
Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_mean                                           -0.0065         -0.0065           569           565           569           565
Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_median                                         -0.0077         -0.0077           569           564           569           564
Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_stddev                                         +1.0077         +1.0068             2             5             2             5
Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_pvalue                                          0.0220          0.0199      U Test, Repetitions: 25 vs 25
Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_mean                                           +0.0006         +0.0007           312           312           312           312
Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_median                                         +0.0031         +0.0032           311           312           311           312
Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_stddev                                         -0.7069         -0.7072             4             1             4             1
Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_pvalue                                          0.0004          0.0004      U Test, Repetitions: 25 vs 25
Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_mean                                           -0.0015         -0.0015           141           141           141           141
Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_median                                         -0.0010         -0.0011           141           141           141           141
Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_stddev                                         -0.1486         -0.1456             0             0             0             0
Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_pvalue                                          0.6139          0.8766      U Test, Repetitions: 25 vs 25
Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_mean                                           -0.0008         -0.0005            60            60            60            60
Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_median                                         -0.0006         -0.0002            60            60            60            60
Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_stddev                                         -0.1467         -0.1390             0             0             0             0
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_pvalue                                          0.0137          0.0137      U Test, Repetitions: 25 vs 25
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_mean                                           +0.0002         +0.0002           275           275           275           275
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_median                                         -0.0015         -0.0014           275           275           275           275
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_stddev                                         +3.3687         +3.3587             0             2             0             2
Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_pvalue                                     0.4041          0.3933      U Test, Repetitions: 25 vs 25
Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_mean                                      +0.0004         +0.0004            67            67            67            67
Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_median                                    -0.0000         -0.0000            67            67            67            67
Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_stddev                                    +0.1947         +0.1995             0             0             0             0
Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_pvalue                              0.0074          0.0001      U Test, Repetitions: 25 vs 25
Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_mean                               -0.0092         +0.0074           547           542            25            25
Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_median                             -0.0054         +0.0115           544           541            25            25
Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_stddev                             -0.4086         -0.3486             8             5             0             0
Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_pvalue                                        0.3320          0.0000      U Test, Repetitions: 25 vs 25
Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_mean                                         +0.0015         +0.0204           218           218            12            12
Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_median                                       +0.0001         +0.0203           218           218            12            12
Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_stddev                                       +0.2259         +0.2023             1             1             0             0
GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_pvalue                                      0.0000          0.0001      U Test, Repetitions: 25 vs 25
GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_mean                                       -0.0209         -0.0179            96            94            90            88
GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_median                                     -0.0182         -0.0155            95            93            90            88
GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_stddev                                     -0.6164         -0.2703             2             1             2             1
Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_pvalue                                     0.0000          0.0000      U Test, Repetitions: 25 vs 25
Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_mean                                      -0.0098         -0.0098           176           175           176           175
Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_median                                    -0.0126         -0.0126           176           174           176           174
Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_stddev                                    +6.9789         +6.9157             0             2             0             2
Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 25 vs 25
Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_mean                  -0.0237         -0.0238           474           463           474           463
Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_median                -0.0267         -0.0267           473           461           473           461
Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_stddev                +0.7179         +0.7178             3             5             3             5
Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_pvalue                   0.6837          0.6554      U Test, Repetitions: 25 vs 25
Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_mean                    -0.0014         -0.0013          1375          1373          1375          1373
Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_median                  +0.0018         +0.0019          1371          1374          1371          1374
Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_stddev                  -0.7457         -0.7382            11             3            10             3
Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_pvalue                                        0.0000          0.0000      U Test, Repetitions: 25 vs 25
Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_mean                                         -0.0080         -0.0289            22            22            10            10
Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_median                                       -0.0070         -0.0287            22            22            10            10
Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_stddev                                       +1.0977         +0.6614             0             0             0             0
Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_pvalue                                       0.0000          0.0000      U Test, Repetitions: 25 vs 25
Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_mean                                        +0.0132         +0.0967            35            36            10            11
Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_median                                      +0.0132         +0.0956            35            36            10            11
Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_stddev                                      -0.0407         -0.1695             0             0             0             0
Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_pvalue                                      0.0000          0.0000      U Test, Repetitions: 25 vs 25
Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_mean                                       +0.0331         +0.1307            13            13             6             6
Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_median                                     +0.0430         +0.1373            12            13             6             6
Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_stddev                                     -0.9006         -0.8847             1             0             0             0
Pentax/645Z/IMGP2837.PEF/threads:8/real_time_pvalue                                            0.0016          0.0010      U Test, Repetitions: 25 vs 25
Pentax/645Z/IMGP2837.PEF/threads:8/real_time_mean                                             -0.0023         -0.0024           395           394           395           394
Pentax/645Z/IMGP2837.PEF/threads:8/real_time_median                                           -0.0029         -0.0030           395           394           395           393
Pentax/645Z/IMGP2837.PEF/threads:8/real_time_stddev                                           -0.0275         -0.0375             1             1             1             1
Phase One/P65/CF027310.IIQ/threads:8/real_time_pvalue                                          0.0232          0.0000      U Test, Repetitions: 25 vs 25
Phase One/P65/CF027310.IIQ/threads:8/real_time_mean                                           -0.0047         +0.0039           114           113            28            28
Phase One/P65/CF027310.IIQ/threads:8/real_time_median                                         -0.0050         +0.0037           114           113            28            28
Phase One/P65/CF027310.IIQ/threads:8/real_time_stddev                                         -0.0599         -0.2683             1             1             0             0
Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_pvalue                          0.0000          0.0000      U Test, Repetitions: 25 vs 25
Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_mean                           +0.0206         +0.0207           405           414           405           414
Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_median                         +0.0204         +0.0205           405           414           405           414
Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_stddev                         +0.2155         +0.2212             1             1             1             1
Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_pvalue                         0.0000          0.0000      U Test, Repetitions: 25 vs 25
Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_mean                          -0.0109         -0.0108           147           145           147           145
Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_median                        -0.0104         -0.0103           147           145           147           145
Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_stddev                        -0.4919         -0.4800             0             0             0             0
Samsung/NX3000/_3184416.SRW/threads:8/real_time_pvalue                                         0.0000          0.0000      U Test, Repetitions: 25 vs 25
Samsung/NX3000/_3184416.SRW/threads:8/real_time_mean                                          -0.0149         -0.0147           220           217           220           217
Samsung/NX3000/_3184416.SRW/threads:8/real_time_median                                        -0.0173         -0.0169           221           217           220           217
Samsung/NX3000/_3184416.SRW/threads:8/real_time_stddev                                        +1.0337         +1.0341             1             3             1             3
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_pvalue                                         0.0001          0.0001      U Test, Repetitions: 25 vs 25
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_mean                                          -0.0019         -0.0019           194           193           194           193
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_median                                        -0.0021         -0.0021           194           193           194           193
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_stddev                                        -0.4441         -0.4282             0             0             0             0
Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_pvalue                                0.0000          0.4263      U Test, Repetitions: 25 vs 25
Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_mean                                 +0.0258         -0.0006            81            83            19            19
Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_median                               +0.0235         -0.0011            81            82            19            19
Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_stddev                               +0.1634         +0.1070             1             1             0             0
```
{F7443905}
If we look at the `_mean`s, the time column, the biggest win is `-7.7%` (`Canon/EOS 5D Mark II/10.canon.sraw2.cr2`),
and the biggest loose is `+3.3%` (`Panasonic/DC-GH5S/P1022085.RW2`);
Overall: mean `-0.7436%`, median `-0.23%`, `cbrt(sum(time^3))` = `-8.73%`
Looks good so far i'd say.

llvm-exegesis details:
{F7371117} {F7371125}
{F7371128} {F7371144} {F7371158}

Reviewers: craig.topper, RKSimon, andreadb, courbet, avt77, spatel, GGanesh

Reviewed By: andreadb

Subscribers: javed.absar, gbedwell, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D52779

llvm-svn: 345463

[NFC][X86] Baseline tests for AMD BdVer2 (Piledriver) Scheduler model

Adding the baseline tests in a preparatory NFC commit,
so that the actual commit shows the *diff*.

Yes, i'm aware that a few of these codegen-based sched tests
are testing wrong instructions, i will fix that afterwards.

For https://reviews.llvm.org/D52779

llvm-svn: 345462

[utils] Run tests in the proper directory.

The intent here was to run check-llvm/check-clang in the instrumented
clang's build directory, not the maybe-not-yet-created uninstrumented
clang's. Oops. :)

llvm-svn: 345461

[AST] Refactor PredefinedExpr

Make the following changes to PredefinedExpr:

1. Move PredefinedExpr below StringLiteral so that it can use its definition.
2. Rename IdentType to IdentKind to be more in line with clang's conventions,
   and propagate the change to its users.
3. Move the location and the IdentKind into the newly available space of
   the bit-fields of Stmt.
4. Only store the function name when needed. When parsing all of Boost,
   of the 1357 PredefinedExpr 919 have no function name.

Differential Revision: https://reviews.llvm.org/D53605

Reviewed By: rjmccall

llvm-svn: 345460

[AST] Widen the bit-fields of Stmt to 8 bytes.

Although some classes are using the tail padding of Stmt, most of
them are not. In particular the expression classes are not using it
since there is Expr in between, and Expr contains a single pointer.

This patch widen the bit-fields to Stmt to 8 bytes and move some
data from NullStmt, CompoundStmt, LabelStmt, AttributedStmt, SwitchStmt,
WhileStmt, DoStmt, ForStmt, GotoStmt, ContinueStmt, BreakStmt
and ReturnStmt to the newly available space.

In itself this patch do not achieve much but I plan to go through each of
the classes in the statement/expression hierarchy and use this newly
available space. A quick estimation gives me that this should shrink the
size of the statement/expression hierarchy by >10% when parsing all of Boost.

Differential Revision: https://reviews.llvm.org/D53604

Reviewed By: rjmccall

llvm-svn: 345459

[X86][SSE] LowerVSELECT - pull out repeated getOperand(). NFCI.

llvm-svn: 345458

Revert "DebugInfo: reduce DIE range verification on object files"

This reverts commits r345441 and r345444, they were causing msan
buildbot failures.

llvm-svn: 345457

[Local] Keep K's range if K does not move when combining metadata.

As K has to dominate I, IIUC I's range metadata must be a subset of
K's. After Eli's recent clarification to the LangRef, loading a value
outside of the range is undefined behavior.
Therefore if I's range contains elements outside of K's range and we would load
one such value, K would cause undefined behavior.

In cases like hoisting/sinking, we still want the most generic range
over all code paths to/from the hoist/sink point. As suggested in the
patches related to D47339, I will refactor the handling of those
scenarios and try to decouple it from this function as follow up, once
we switched to a similar handling of metadata in most of
combineMetadata.

I updated some tests checking mostly the merging of metadata to keep the
metadata of to dominating load. The most interesting one is probably test8 in
test/Transforms/JumpThreading/thread-loads.ll. It contained a comment
about the alias metadata preventing us to eliminate the branch, but it
seem like the actual problem currently is that we merge the ranges of
both loads and cannot eliminate the icmp afterwards. With this patch, we
manage to eliminate the icmp, as the range of the first load excludes 8.

Reviewers: efriedma, nlopes, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D51629

llvm-svn: 345456

[x86] make test immune to improved extraction in D53784; NFC

llvm-svn: 345455

Fix -Wdocumentation warning. NFCI.

llvm-svn: 345454

Regenerate FP_TO_INT tests.

Precursor to fix for PR17686

llvm-svn: 345453

[TargetLowering] Move LegalizeDAG FP_TO_UINT handling to TargetLowering::expandFP_TO_UINT. NFCI.

First step towards fixing PR17686 and adding vector support.

llvm-svn: 345452

Revert rL345395: [X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs
Makes no difference to actual shuffle decoding yet, but merges all the existing limits in one place for when proper support is fixed.
........
Its been reported that this is causing out of trunk failures.

llvm-svn: 345451

[ARM64][Windows] MCLayer support for exception handling

Add ARM64 unwind codes to MCLayer, as well SEH directives that will be emitted
by the frame lowering patch to follow. We only emit unwind codes into object
object files for now.

Differential Revision: https://reviews.llvm.org/D50166

llvm-svn: 345450

AST: fix a typo in a comment (NFC)

Fix a typo spotted by Akira! NFC

llvm-svn: 345449

[X86] Add some isel patterns for scalar_to_vector/extract_vector_element that use the avx512 extended register classes when they are available.

llvm-svn: 345448

Revert r345169 [along with its llvm counterpart r345170] as it makes Halide builds timeout.

llvm-svn: 345447

Revert r345170 [along with its llvm counterpart r345169] as it makes Halide builds timeout.

llvm-svn: 345446

[XRay] Support generational buffers in FDR controller

Summary:
This is an intermediary step in the full support for generational buffer
management in the FDR runtime. This change makes the FDR controller
aware of the new generation number in the buffers handed out by the
BufferQueue type.

In the process of making this change, we've realised that the cleanest
way of ensuring that the backing store per generation is live while all
the threads that need access to it will need reference counting to tie
the backing store to the lifetime of all threads that have a handle on
buffers associated with the memory.

We also learn that we're missing the edge-case in the function exit
handler's implementation where the first record being written into the
buffer is a function exit, which is caught/fixed by the test for
generational buffer management.

We still haven't wired the controller into the FDR mode runtime, which
will need the reference counting on the backing store implemented to
ensure that we're being conservatively thread-safe with this approach.

Depends on D52974.

Reviewers: mboerger, eizan

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53551

llvm-svn: 345445

test: add missing -triple

Ensure that the test builds for x86_64 as it is an assembly test. This
should repair the buildbots.

llvm-svn: 345444

Revert "Pass the nopie flag to the linker when linking with -pg." until
one of the tests can be fixed on !OpenBSD hosts.

llvm-svn: 345443

[Hexagon] Add missing assignment to Itinerary in Call_nr

The class definition for Call_nr has the itinerary as a
parameter, but the value is never assigned to the Itinerary
field for the instruction. This means the compiler is unable
to schedule and packetize the instruction correctly because
these instrution will not have any resource descritions.
I don't have a specific test case, but the ps_call_nr.ll
test failed with a proposed patch.

llvm-svn: 345442

DebugInfo: reduce DIE range verification on object files

Relocatable content may have overlapping ranges until the sections are
finalized. This reduces the amount of verification that is done on an object
file so that invalid errors are not raised.

llvm-svn: 345441

Update the other test.

llvm-svn: 345440

Pass the nopie flag to the linker when linking with -pg.

llvm-svn: 345439

Further split cpus test

On GreenDragon, CodeGen/X86/cpus-no-x86_64.ll was still timing out even
after breaking up the original test. I further split off the intel and
AMD cpus which hopefully resolves this.

http://green.lab.llvm.org/green/job/clang-stage2-cmake-RgSan/

llvm-svn: 345438

[x86] adjust tests to preserve behavior; NFC

I'm planning a binop optimization that would subvert the
domain forcing ops in these tests, so turning them into
zexts.

llvm-svn: 345437

[llvm-readobj] Fix bugs with unrecognized types in switch statements

Summary:
Add missing breaks. Several functions used nested switch statements,
where the outer switch branches based on the architecture, and the inner
switch handles architecture-specific types. If the type isn't
architecture-specific, break out to the generic types rather than fall
through.

getElfPtType: For GNU-style output, llvm-readobj prints
"<unknown>: 0xnnnnnnnn" for an unrecognized segment type, unless the
architecture is EM_ARM, EM_MIPS, or EM_MIPS_RS3_LE, in which case it
prints "". This behavior appears accidental, so instead, always print
the "<unknown>: 0xnnnnnnnn" string.

Reviewers: pcc, grimar

Reviewed By: grimar

Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D53730

llvm-svn: 345436

Fix and rename broken test for `settings write`.

I committed this test without updating the old `settings export` to
settings write. Since the functionality was renamed I also renamed the
test case.

llvm-svn: 345435

Fix PR39458 _LIBCPP_DEBUG breaks heterogeneous compare.

The types/comparators passed to std::upper_bound and std::lower_bound
are not required to provided to provide an operator</comp(...) which
accepts the arguments in reverse order. Nor are the ranges required
to have a strict weak ordering.

However, in debug mode we attempted to check the result of a comparison
with the arguments reversed, which may not compiler.

This patch removes the use of the debug comparator for upper_bound
and lower_bound.

equal_range et al still use debug comparators when they call
__upper_bound and __lower_bound.

See llvm.org/PR39458

llvm-svn: 345434

Revert "[PassManager/Sanitizer] Enable usage of ported AddressSanitizer passes with -fsanitize=address"

This reverts commit 8d6af840396f2da2e4ed6aab669214ae25443204 and commit
b78d19c287b6e4a9abc9fb0545de9a3106d38d3d which causes slower build times
by initializing the AddressSanitizer on every function run.

The corresponding revisions are https://reviews.llvm.org/D52814 and
https://reviews.llvm.org/D52739.

llvm-svn: 345433

[VFS] Add property 'fallthrough' that controls fallback to real file system.

Default property value 'true' preserves current behavior. Value 'false' can be
used to create VFS "root", file system that gives better control over which
files compiler can use during compilation as there are no unpredictable
accesses to real file system.

Non-fallthrough use case changes how we treat multiple VFS overlay
files. Instead of all of them being at the same level just above a real
file system, now they are nested and subsequent overlays can refer to
files in previous overlays.

Change is done both in LLVM and Clang, corresponding LLVM commit is r345431.

rdar://problem/39465552

Reviewers: bruno, benlangmuir

Reviewed By: bruno

Subscribers: dexonsmith, cfe-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D50539

llvm-svn: 345432

[VFS] Add property 'fallthrough' that controls fallback to real file system.

Default property value 'true' preserves current behavior. Value 'false' can be
used to create VFS "root", file system that gives better control over which
files compiler can use during compilation as there are no unpredictable
accesses to real file system.

Non-fallthrough use case changes how we treat multiple VFS overlay
files. Instead of all of them being at the same level just above a real
file system, now they are nested and subsequent overlays can refer to
files in previous overlays.

rdar://problem/39465552

Reviewers: bruno, benlangmuir

Reviewed By: bruno

Subscribers: dexonsmith, cfe-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D50539

llvm-svn: 345431

[DAGCombiner] rearrange code in narrowExtractedVectorBinOp(); NFC

We can extend this code to handle many more cases
if an extract is cheap, so prepping for that change.

llvm-svn: 345430

[ValueTracking] peek through shuffles in ComputeNumSignBits (PR37549)

The motivating case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

The analysis improvement allows us to form a vector 'select' out of
bitwise logic (the use of ComputeNumSignBits was added at rL345149).

The smaller test shows another InstCombine improvement - we use
ComputeNumSignBits to add 'nsw' to shift-left. But the negative
test shows an example where we must not add 'nsw' - when the shuffle
mask contains undef elements.

Differential Revision: https://reviews.llvm.org/D53659

llvm-svn: 345429

[LegalizeTypes] Stop DAGTypeLegalizer::getSETCCWidenedResultTy from creating illegal setccs. Add checks for valid setccs

The DAGTypeLegalizer::getSETCCWidenedResultTy was widening the MaskVT, but the code in convertMask called after getSETCCWidenedResultTy had no idea this widening had occurred. So none of the operands were widened when convertMask created new setccs with the widened VT.

This patch removes the widening and adds some asserts to getNode to validate the types of setccs to prevent issues like this in the future.

Differential Revision: https://reviews.llvm.org/D53743

llvm-svn: 345428

Add docs+a script for building clang/LLVM with PGO

Depending on who you ask, PGO grants a 15%-25% improvement in build
times when using clang. Sadly, hooking everything up properly to
generate a profile and apply it to clang isn't always straightforward.
This script (and the accompanying docs) aim to make this process easier;
ideally, a single invocation of the given script.

In terms of testing, I've got a cronjob on my Debian box that's meant to
run this a few times per week, and I tried manually running it on a puny
Gentoo box I have (four whole Atom cores!). Nothing obviously broke.
¯\_(ツ)_/¯

I don't know if we have a Python style guide, so I just shoved this
through yapf with all the defaults on.

Finally, though the focus is clang at the moment, the hope is that this
is easily applicable to other LLVM-y tools with minimal effort (e.g.
lld, opt, ...). Hence, this lives in llvm/utils and tries to be somewhat
ambiguous about naming.

Differential Revision: https://reviews.llvm.org/D53598

llvm-svn: 345427

[Spectre] Fix MIR verifier errors in retpoline thunks

Summary:
The main challenge here is that X86InstrInfo::AnalyzeBranch doesn't
understand the way we're using a CALL instruction as a branch, so we
can't list the CallTarget MBB as a successor of the entry block. If we
don't list it as a successor, then the AsmPrinter doesn't print a label
for the MBB.

Fix the issue by inserting our own label at the beginning of the call
target block. We can rely on the AsmPrinter to always emit it, even
though the block appears to be unreachable, but address-taken.

Fixes PR38391.

Reviewers: thegameg, chandlerc, echristo

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D53653

llvm-svn: 345426

Work around gcc.gnu.org/PR87766

llvm-svn: 345425

[NFC] Update comment in libc++ ABI changelog

llvm-svn: 345424

Fix test expectation to match reality.

llvm-svn: 345423

Remove an early-return from Driver::ParseArgs that
was added as a part of D52604 / r343348. If the
lldb driver is run without any arguments, .lldbinit
file reading was not enabled.

<rdar://problem/45570242>

llvm-svn: 345422

Fix typo.

llvm-svn: 345421

[ARM] Make InstrEmitter mark CPSR defs dead for Thumb1.

The "dead" markings allow existing target-independent optimizations,
like MachineSink, to trigger more frequently. The CPSR defs would have
eventually been marked dead by LiveVariables, so this only affects
optimizations before regalloc.

The ARMBaseInstrInfo.cpp change is fixing a bug which is only visible
with this change: the transform adds a use to an otherwise dead def
of CPSR. This is covered by existing regression tests.

thumb2-tbh.ll breaks for Thumb1 due to MachineLICM changing the
generated code; I'll fix it in D53452.

Differential Revision: https://reviews.llvm.org/D53453

llvm-svn: 345420

PR26547: alignof should return ABI alignment, not preferred alignment

Summary:
- Add `UETT_PreferredAlignOf` to account for the difference between `__alignof` and `alignof`
- `AlignOfType` now returns ABI alignment instead of preferred alignment iff clang-abi-compat > 7, and one uses _Alignof or alignof

Patch by Nicole Mazzuca!

Differential Revision: https://reviews.llvm.org/D53207

llvm-svn: 345419

[clang-doc] Switch to default to all-TUs executor

Since we generally want to document a whole project, not just one file.

Differential Revision: https://reviews.llvm.org/D53170

llvm-svn: 345418

[NFC][OpenMP] Add new test for parallel for code generation.

Summary:
This is a simple test of the parallel for code generation. It will be used to showcase the change introduced by patch D53443.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D53772

llvm-svn: 345417

[XRay] Use std::errc::invalid_argument instead of std::errc::bad_message

This change should appease the mingw32 builds.

Similar to r293725.

Differential Revision: https://reviews.llvm.org/D53742

llvm-svn: 345416

[PowerPC] Improve BUILD_VECTOR of 4 i32s

Currently, for this node:
  vector int test(int a, int b, int c, int d) {
    return (vector int) { a, b, c, d };
  }

we get this on Power9:
  mtvsrdd 34, 5, 3
  mtvsrdd 35, 6, 4
  vmrgow 2, 3, 2

and this on Power8:
  mtvsrwz 0, 3
  mtvsrwz 1, 5
  mtvsrwz 2, 4
  mtvsrwz 3, 6
  xxmrghd 34, 1, 0
  xxmrghd 35, 3, 2
  vmrgow 2, 3, 2

This can be improved to this on LE Power9:
  rldimi 3, 4, 32, 0
  rldimi 5, 6, 32, 0
  mtvsrdd 34, 5, 3

and this on LE Power8
  rldimi 3, 4, 32, 0
  rldimi 5, 6, 32, 0
  mtvsrd 34, 3
  mtvsrd 35, 5
  xxpermdi 34, 35, 34, 0

This patch updates the TD pattern to generate the optimized sequence for both
Power8 and Power9 on LE and BE.

Differential Revision: https://reviews.llvm.org/D53494

llvm-svn: 345414

Pointer types were treated as zero-size by MergeICmps

Summary:
The visitICmp analysis function would record compares of pointer types, as size 0. This causes the resulting memcmp() call to have the wrong total size.
Found with "self-build" of clang/LLVM on Windows.

Reviewers: christylee, trentxintong, courbet

Reviewed By: courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53536

llvm-svn: 345413

[ADT] Use explicit constructors for DenseMapPair to work around compiler issues.

Inheriting constructors from std::pair caused clang-3.8 to treat some DenseMap
initializer_list constructor calls as ambiguous, which broke several bots. This
commit explicitly defines DenseMapPair's constructos to work around the issue.

https://reviews.llvm.org/D53726

llvm-svn: 345411

[llvm-ar] Strip trailing \r and format

Reviewers: mstorsjo, rupprecht, gbreynoo

Reviewed By: rupprecht

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53769

llvm-svn: 345410

[X86] Stop promoting vector and/or/xor/andn to vXi64.

These promotions add additional bitcasts to the SelectionDAG that can pessimize computeKnownBits/computeNumSignBits. It also seems to interfere with broadcast formation.

This patch removes the promotion and adds isel patterns instead.

The increased table size is more than I would like, but hopefully we can find some canonicalizations or other tricks to start pruning out patterns going forward.

Differential Revision: https://reviews.llvm.org/D53268

llvm-svn: 345408

[X86] Add -LABEL to some FileCheck checks. NFC

llvm-svn: 345407

[sanitizer] Improve macOS version detection

Part of <https://reviews.llvm.org/D48445>.

llvm-svn: 345406

[llvm-ar] Add a dependency to BinaryFormat after rL345383

llvm-svn: 345405

[DWARF][NFC] cleanup (mostly leftovers from the implementation of string offsets tables)
Majority of the patch by David Blaikie.

Differential Revision: https://reviews.llvm.org/D53741

llvm-svn: 345404

Fix incorrect use of aligned allocation in get_temporary_buffer.

llvm-svn: 345403

[DataFormatters] Adding formatters for libc++ std::u16string and std::u32string

rdar://problem/41302849

Differential Revision: https://reviews.llvm.org/D53656

llvm-svn: 345402

XFAIL sized deallocation test with GCC

llvm-svn: 345400

[tblgen] Improve comments in TargetInstrPredicate.td. NFC

llvm-svn: 345399

[Fixed Point Arithmetic] Refactor fixed point casts

Summary:
- Added names for some emitted values (such as "tobool" for
  the result of a cast to boolean).
- Replaced explicit IRBuilder request for doing sext/zext/trunc
  by using CreateIntCast instead.
- Simplify code for emitting satuation into one if-statement
  for clamping to max, and one if-statement for clamping to min.

Reviewers: leonardchan, ebevhan

Reviewed By: leonardchan

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D53707

llvm-svn: 345398

Revert "UBSan blacklist workaround for bot timeouts"

This reverts commit r335525. This workaround is no longer necessary
because PR37929 has been fixed.

llvm-svn: 345397

[MIR] Simplify and move MIR test

Also fixes a Machine Verifier issue.

llvm-svn: 345396

[X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs

Makes no difference to actual shuffle decoding yet, but merges all the existing limits in one place for when proper support is fixed.

llvm-svn: 345395

[sanitizer] Fix mallopt test on Android.

There is not a single common mallopt option between gnu/linux and
android, so simply use a random number there.

llvm-svn: 345394

Rename warnUnorderableSymbol maybeWarnUnorderableSymbol because the function doesn't always emit a warning.

llvm-svn: 345393

Refactor readCallGraph() and readCallGraphFromObjectFiles(). NFC.

llvm-svn: 345392

[x86] commute blendvb with constant condition op to allow load folding

This is a narrow fix for 1 of the problems mentioned in PR27780:
https://bugs.llvm.org/show_bug.cgi?id=27780

I looked at more general solutions, but it's a mess. We canonicalize shuffle masks
based on the number of elements accessed from each operand, and that's not optional.
If you remove that, we'll crash because we fail to match isel patterns. So I'm
waiting until we're sure that we have blendvb with constant condition and then
commuting based on the load potential. Other cases like blend-with-immediate are
already handled elsewhere, so this is probably not a common problem anyway.

I didn't use "MayFoldLoad" because that checks for one-use and in these cases, we've
screwed that up by creating a temporary PSHUFB using these operands that we're counting
on to be killed later. Undoing that didn't look like a simple task because it's
intertwined with determining if we actually use both operands of the shuffle or not.a

Differential Revision: https://reviews.llvm.org/D53737

llvm-svn: 345390

[X86] Use existing pulled out VT variables. NFCI.

llvm-svn: 345388

[SimpleLoopUnswitch] Unswitch by experimental.guard intrinsics

This patch adds support of `llvm.experimental.guard` intrinsics to non-trivial
simple loop unswitching. These intrinsics represent implicit control flow which
has pretty much the same semantics as usual conditional branches. The
algorithm of dealing with them is following:

- Consider guards as unswitching candidates;
- If a guard is considered the best candidate, turn it into a branch;
- Apply normal unswitching algorithm on this branch.

The patch has no compile time effect on code that does not contain any guards.

Differential Revision: https://reviews.llvm.org/D53744
Reviewed By: chandlerc

llvm-svn: 345387

[ARM] Fix ARMCodeGenPrepare test cases

While working on FileCheck producing better diagnostics in D53710, I noticed
that our test case is broken in a few different ways. The test was running, but
results were not checked as prefix CHECK-COMMON wasn't defined (which is what
FileCheck should warn about). Also, the output was different in 2 cases because
of recent changes in ARMCodeGenPrepare.

Differential Revision: https://reviews.llvm.org/D53746

llvm-svn: 345386

[Windows] Define generic arguments registers for Windows x64

Summary:
When evaluating expressions the generic arguments registers are required by ABI.
This patch defines them.

Reviewers: zturner, stella.stamenova, labath

Subscribers: aleksandr.urakov, lldb-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D53753

llvm-svn: 345385

[CodeGen] Remove out operands from PATCHABLE_OP

The current model requires 1 out operand, but it is not used nor created.

This fixed an x86 machine verifier issue.

Part of PR27481.

llvm-svn: 345384