platform/upstream/llvm.git
4 years ago[lldb] Correct wording of EXP_MSG
David Spickett [Tue, 25 Aug 2020 15:50:03 +0000 (16:50 +0100)]
[lldb] Correct wording of EXP_MSG

EXP_MSG generates a message to show on assert
failure. Currently it looks like:
AssertionError: False is not True : '<cmd>'
returns expected result, got '<actual output>'

Which seems to say that the test failed but
also got the expected result.

It should say:
AssertionError: False is not True : '<cmd>'
returned unexpected result, got '<actual output>'

Reviewed By: teemperor, #lldb

Differential Revision: https://reviews.llvm.org/D86603

4 years ago[X86] Make sure we do not clobber RBX with mwaitx when used as a base
Pierre Gousseau [Wed, 15 Jan 2020 18:33:28 +0000 (18:33 +0000)]
[X86] Make sure we do not clobber RBX with mwaitx when used as a base
pointer.

mwaitx uses EBX as one of its arguments.
Using this instruction clobbers RBX, as it is defined to hold one of the
inputs. When the backend uses a dynamically allocated stack, RBX is used as
a reserved register for the base pointer.

This patch is adapted from @qcolombet's patch for cmpxchg at r263325.

This fixes PR43528.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D73475

4 years ago[GlobalISel] Fix and tidy up documentation for ValueMapping class (NFC)
Gabriel Hjort Åkerlund [Wed, 26 Aug 2020 08:53:02 +0000 (10:53 +0200)]
[GlobalISel] Fix and tidy up documentation for ValueMapping class (NFC)

The documentation was missing a '*/' in '/*<2x32-bit> vadd {0, 64, VPR}',
and the example code is now aligned to improve readability.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D86201

4 years ago[TableGen][GlobalISel] Fix tblgen optimization bug
Gabriel Hjort Åkerlund [Wed, 26 Aug 2020 08:48:15 +0000 (10:48 +0200)]
[TableGen][GlobalISel] Fix tblgen optimization bug

When optimizing the table, PointerToAnyOperandMatchers would be
incorrectly reported as identical even though they have different
SizeInBits values. This bug was due to failing to overload the
isIdentical() method, which this patch addresses.
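The class of bug described above can be reproduced in miniature. The sketch below uses hypothetical `Matcher` and `PointerMatcher` classes (they are not the actual TableGen matcher hierarchy) and assumes the base comparison only looks at a kind tag. Without the override, two pointer matchers with different `SizeInBits` would compare as identical.

```cpp
#include <cassert>

// Hypothetical stand-ins for the TableGen matcher classes; the names and the
// kind-only base comparison are assumptions made for illustration.
struct Matcher {
  enum Kind { K_Pointer, K_Other };
  explicit Matcher(Kind K) : TheKind(K) {}
  virtual ~Matcher() = default;

  // Base comparison: only the kind is checked.
  virtual bool isIdentical(const Matcher &Other) const {
    return TheKind == Other.TheKind;
  }

  Kind TheKind;
};

struct PointerMatcher : Matcher {
  explicit PointerMatcher(unsigned Bits)
      : Matcher(K_Pointer), SizeInBits(Bits) {}

  // Without this override, the base version would report a 32-bit and a
  // 64-bit pointer matcher as identical. The cast is only reached when the
  // kinds already match, so Other is known to be a PointerMatcher.
  bool isIdentical(const Matcher &Other) const override {
    return Matcher::isIdentical(Other) &&
           SizeInBits == static_cast<const PointerMatcher &>(Other).SizeInBits;
  }

  unsigned SizeInBits;
};
```

Calling the base version explicitly (`A.Matcher::isIdentical(B)`) shows what the comparison looks like when the override is missing.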

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D86199

4 years agoReland [IR] Intrinsics default attributes and opt-out flag
sstefan1 [Wed, 26 Aug 2020 08:37:48 +0000 (10:37 +0200)]
Reland [IR] Intrinsics default attributes and opt-out flag

Intrinsic properties can now be set to default and applied to all
intrinsics. If the attributes are not needed, the user can opt-out by
setting the DisableDefaultAttributes flag to true.

Differential Revision: https://reviews.llvm.org/D70365

4 years ago[AArch64][AsmParser] Fix bug in operand printer
Cullen Rhodes [Tue, 25 Aug 2020 12:27:15 +0000 (12:27 +0000)]
[AArch64][AsmParser] Fix bug in operand printer

The switch in AArch64Operand::print was changed in D45688 so the shift
can be printed after printing the register. This is implemented with
LLVM_FALLTHROUGH and was broken in D52485 when BTIHint was put between
the register and shift operands.
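A minimal sketch of why case ordering matters with a fallthrough, using a hypothetical `Operand` type and `print` function rather than the real `AArch64Operand::print`. The register case falls into the shift case, so the two must stay adjacent; placing another case such as the BTI hint between them would print the wrong text for a shifted register.

```cpp
#include <cassert>
#include <string>

// Hypothetical operand model; field names and output format are illustrative.
enum class Kind { Register, ShiftExtend, BTIHint };

struct Operand {
  Kind K;
  unsigned Reg;
  unsigned Shift; // 0 means "no shift"
};

std::string print(const Operand &Op) {
  std::string Out;
  switch (Op.K) {
  case Kind::Register:
    Out += "<register " + std::to_string(Op.Reg) + ">";
    if (Op.Shift == 0)
      break;
    [[fallthrough]]; // relies on ShiftExtend being the very next case
  case Kind::ShiftExtend:
    Out += "<lsl #" + std::to_string(Op.Shift) + ">";
    break;
  case Kind::BTIHint: // must not sit between the two cases above
    Out += "<bti>";
    break;
  }
  return Out;
}
```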

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D86535

4 years ago[analyzer] Add modeling of assignment operator in smart ptr
Nithin Vadukkumchery Rajendrakumar [Wed, 26 Aug 2020 09:22:55 +0000 (11:22 +0200)]
[analyzer] Add modeling of assignment operator in smart ptr

Summary: Support for 'std::unique_ptr<>::operator=' in SmartPtrModeling

Reviewers: NoQ, Szelethus, vsavchenko, xazax.hun

Reviewed By: NoQ, vsavchenko, xazax.hun

Subscribers: martong, cfe-commits
Tags: #clang

Differential Revision: https://reviews.llvm.org/D86293

4 years ago[AArch64][SVE] Fix calculation restore point for SVE callee saves.
Sander de Smalen [Wed, 19 Aug 2020 10:06:51 +0000 (11:06 +0100)]
[AArch64][SVE] Fix calculation restore point for SVE callee saves.

This fixes an issue where the restore point of callee-saves in the
function epilogues was incorrectly calculated when the basic block
consisted of only a RET instruction. This caused dealloc instructions
to be inserted in between the block of callee-save restore instructions,
rather than before it.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D86099

4 years ago[NFC] Fix some spelling errors in clang Driver Options.td
Yang Zhihui [Wed, 26 Aug 2020 08:28:50 +0000 (04:28 -0400)]
[NFC] Fix some spelling errors in clang Driver Options.td

Differential Revision: https://reviews.llvm.org/D86427

4 years ago[clangd] Compute the inactive code range for semantic highlighting.
Haojian Wu [Wed, 26 Aug 2020 08:50:31 +0000 (10:50 +0200)]
[clangd] Compute the inactive code range for semantic highlighting.

Differential Revision: https://reviews.llvm.org/D85635

4 years ago[Support] Speedup llvm-dwarfdump 3.9x
Jan Kratochvil [Wed, 26 Aug 2020 07:57:53 +0000 (09:57 +0200)]
[Support] Speedup llvm-dwarfdump 3.9x

Currently `strace llvm-dwarfdump x.debug >/tmp/file`:

  ioctl(1, TCGETS, 0x7ffd64d7f340)        = -1 ENOTTY (Inappropriate ioctl for device)
  write(1, "           DW_AT_decl_line\t(89)\n"..., 4096) = 4096
  ioctl(1, TCGETS, 0x7ffd64d7f400)        = -1 ENOTTY (Inappropriate ioctl for device)
  ioctl(1, TCGETS, 0x7ffd64d7f410)        = -1 ENOTTY (Inappropriate ioctl for device)
  ioctl(1, TCGETS, 0x7ffd64d7f400)        = -1 ENOTTY (Inappropriate ioctl for device)

After this patch:

  write(1, "0000000000001102 \"strlen\")\n     "..., 4096) = 4096
  write(1, "site\n                  DW_AT_low"..., 4096) = 4096
  write(1, "d53)\n\n0x000e4d4d:       DW_TAG_G"..., 4096) = 4096

The same speedup can be achieved with `--color=0`, but that is not very convenient.
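The commit message does not spell out the mechanism. On Linux, `TCGETS` is the ioctl issued by a terminal check such as `isatty()`, so the trace suggests that check was being repeated between writes. One general way to avoid that, sketched here with a hypothetical `CachedProbe` helper (not the actual change to the Support library), is to run the probe once and remember the answer.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper: remembers the answer of an expensive yes/no probe so
// that the underlying check runs at most once.
class CachedProbe {
public:
  explicit CachedProbe(std::function<bool()> P) : Probe(std::move(P)) {}

  bool get() {
    if (!Known) {
      Value = Probe(); // the only place the real check is performed
      Known = true;
    }
    return Value;
  }

private:
  std::function<bool()> Probe;
  bool Known = false;
  bool Value = false;
};
```

In a real tool the probe could be a lambda wrapping `isatty(1)`; here it is left injectable so the number of calls can be observed.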

This implementation has been suggested by Joerg Sonnenberger.

Differential Revision: https://reviews.llvm.org/D86406

4 years ago[lldb] XFAIL TestMemoryHistory on Linux
Raphael Isemann [Wed, 26 Aug 2020 08:22:04 +0000 (10:22 +0200)]
[lldb] XFAIL TestMemoryHistory on Linux

This test appears to have never worked on Linux but it seems none of the current
bots ever ran this test as it required enabling compiler-rt (otherwise it
would have just been skipped).

This just copies over the XFAIL decorator that is already on all other sanitizer
tests.

4 years ago[SelectionDAG] Handle non-power-of-2 bitwidths in expandROT
Jay Foad [Mon, 24 Aug 2020 12:24:21 +0000 (13:24 +0100)]
[SelectionDAG] Handle non-power-of-2 bitwidths in expandROT

Differential Revision: https://reviews.llvm.org/D86449

4 years ago[mlir] Fix bug in block merging when the types of the operands differ
River Riddle [Wed, 26 Aug 2020 08:13:01 +0000 (01:13 -0700)]
[mlir] Fix bug in block merging when the types of the operands differ

The merging algorithm was previously not checking for type equivalence.

Fixes PR47314

Differential Revision: https://reviews.llvm.org/D86594

4 years ago[Attributor] Provide an edge-based interface in AAIsDead
Shinji Okumura [Wed, 26 Aug 2020 07:57:52 +0000 (16:57 +0900)]
[Attributor] Provide an edge-based interface in AAIsDead

This patch provides an edge-based interface in AAIsDead.
With it, we can query the set of basic blocks that are directly reachable from a given basic block.
This is specifically useful for the implementation of AAReachability.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85547

4 years ago[SyntaxTree] Fix C++ versions on tests of `BuildTreeTest.cpp`
Eduardo Caldas [Wed, 26 Aug 2020 07:18:01 +0000 (07:18 +0000)]
[SyntaxTree] Fix C++ versions on tests of `BuildTreeTest.cpp`

Differential Revision: https://reviews.llvm.org/D86591

4 years ago[SyntaxTree] Add support for `CallExpression`
Eduardo Caldas [Tue, 25 Aug 2020 07:47:52 +0000 (07:47 +0000)]
[SyntaxTree] Add support for `CallExpression`

* Generate `CallExpression` syntax node for all semantic nodes inheriting from
`CallExpr` with call-expression syntax - except `CUDAKernelCallExpr`.
* Implement all the accessors
* Arguments of `CallExpression` have their own syntax node which is based on
the `List` base API

Differential Revision: https://reviews.llvm.org/D86544

4 years ago[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad
Roman Lebedev [Wed, 26 Aug 2020 06:08:24 +0000 (09:08 +0300)]
[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad

While since D86306 we do its sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.

And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:

```
| statistic name                                     | baseline  | proposed  |       Δ |       % |    |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts                           | 7945095   | 7942507   |   -2588 |  -0.03% |  0.03% |
| assembler.ObjectBytes                              | 273209920 | 273069800 | -140120 |  -0.05% |  0.05% |
| early-cse.NumCSE                                   | 2183363   | 2183398   |      35 |   0.00% |  0.00% |
| early-cse.NumSimplify                              | 541847    | 550017    |    8170 |   1.51% |  1.51% |
| instcombine.NumAggregateReconstructionsSimplified  | 2139      | 108       |   -2031 | -94.95% | 94.95% |
| instcombine.NumCombined                            | 3601364   | 3635448   |   34084 |   0.95% |  0.95% |
| instcombine.NumConstProp                           | 27153     | 27157     |       4 |   0.01% |  0.01% |
| instcombine.NumDeadInst                            | 1694521   | 1765022   |   70501 |   4.16% |  4.16% |
| instcombine.NumPHIsOfExtractValues                 | 0         | 37546     |   37546 |   0.00% |  0.00% |
| instcombine.NumSunkInst                            | 63158     | 63686     |     528 |   0.84% |  0.84% |
| instcount.NumBrInst                                | 874304    | 871857    |   -2447 |  -0.28% |  0.28% |
| instcount.NumCallInst                              | 1757657   | 1758402   |     745 |   0.04% |  0.04% |
| instcount.NumExtractValueInst                      | 45623     | 11483     |  -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst                       | 4983      | 580       |   -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst                            | 61018     | 59478     |   -1540 |  -2.52% |  2.52% |
| instcount.NumLandingPadInst                        | 35334     | 34215     |   -1119 |  -3.17% |  3.17% |
| instcount.NumPHIInst                               | 344428    | 331116    |  -13312 |  -3.86% |  3.86% |
| instcount.NumRetInst                               | 100773    | 100772    |      -1 |   0.00% |  0.00% |
| instcount.TotalBlocks                              | 1081154   | 1077166   |   -3988 |  -0.37% |  0.37% |
| instcount.TotalFuncs                               | 101443    | 101442    |      -1 |   0.00% |  0.00% |
| instcount.TotalInsts                               | 8890201   | 8833747   |  -56454 |  -0.64% |  0.64% |
| instsimplify.NumSimplified                         | 75822     | 75707     |    -115 |  -0.15% |  0.15% |
| simplifycfg.NumHoistCommonCode                     | 24203     | 24197     |      -6 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                   | 48201     | 48195     |      -6 |  -0.01% |  0.01% |
| simplifycfg.NumInvokes                             | 2785      | 4298      |    1513 |  54.33% | 54.33% |
| simplifycfg.NumSimpl                               | 997332    | 1018189   |   20857 |   2.09% |  2.09% |
| simplifycfg.NumSinkCommonCode                      | 7088      | 6464      |    -624 |  -8.80% |  8.80% |
| simplifycfg.NumSinkCommonInstrs                    | 15117     | 14021     |   -1096 |  -7.25% |  7.25% |
```
... which tells us that this new fold fires a whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).
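The percentages and "times less" figures quoted above follow directly from the baseline and proposed columns of the table. A small sketch for checking them (the helper names are made up for this example):

```cpp
#include <cassert>
#include <cmath>

// Relative change in percent when going from Baseline to Proposed.
double percentChange(double Baseline, double Proposed) {
  return (Proposed - Baseline) / Baseline * 100.0;
}

// How many times smaller Proposed is than Baseline.
double reductionFactor(double Baseline, double Proposed) {
  return Baseline / Proposed;
}
```

For `insertvalue`, `reductionFactor(4983, 580)` is roughly 8.6, which the message rounds to "9 times less"; for `extractvalue`, `reductionFactor(45623, 11483)` is just under 4.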

This causes a geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions

The other thing this tells us is that, while this is a massive win for the `invoke`->`call` transform,
the `InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
   We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
   it also sees that the extract-insert chain recreates the original aggregate,
   and replaces it with the original aggregate.

The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check what other patterns remain, and how they can be resolved.
   (i.e. I wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)

This is a reland of the original commit fcb51d8c2460faa23b71e06abb7e826243887dd6,
because originally I forgot to ensure that the base aggregate types match.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86530

4 years ago[NFC][InstCombine] Add a PHI-of-insertvalues test with different base aggregate types
Roman Lebedev [Wed, 26 Aug 2020 06:28:22 +0000 (09:28 +0300)]
[NFC][InstCombine] Add a PHI-of-insertvalues test with different base aggregate types

4 years agoAdd assertion in PatternRewriter::create<> to defend the same way as OpBuilder::creat...
Mehdi Amini [Wed, 26 Aug 2020 05:43:43 +0000 (05:43 +0000)]
Add assertion in PatternRewriter::create<> to defend the same way as OpBuilder::create<> against missing dialect registration (NFC)

The code would have failed a few lines later, but this way the error
message is clearer and friendlier to debug.

4 years agoAdjust assertion when casting to an unregistered operation
Mehdi Amini [Wed, 26 Aug 2020 05:08:59 +0000 (05:08 +0000)]
Adjust assertion when casting to an unregistered operation

This assertion does not achieve what it was originally meant to do, as it
would fire only when applied to an unregistered operation, which is a
fairly rare circumstance (it needs a dialect or context allowing
unregistered operation in the input in the first place).
Instead we relax it to only fire when it should have matched but didn't
because of the misconfiguration.

Differential Revision: https://reviews.llvm.org/D86588

4 years ago[libc][NFC] For remquo quotient, compare only 3 bits of MPFR and libc results.
Siva Chandra Reddy [Wed, 26 Aug 2020 06:32:55 +0000 (23:32 -0700)]
[libc][NFC] For remquo quotient, compare only 3 bits of MPFR and libc results.

4 years ago[LLD][MinGW] Handle allow-multiple-definition flag
Mateusz Mikuła [Wed, 26 Aug 2020 06:25:52 +0000 (09:25 +0300)]
[LLD][MinGW] Handle allow-multiple-definition flag

Basically copied from ELF driver.

Differential Revision: https://reviews.llvm.org/D86512

4 years ago[LLD][MinGW] Cleanup Options.td file. NFC.
Mateusz Mikuła [Wed, 26 Aug 2020 06:24:37 +0000 (09:24 +0300)]
[LLD][MinGW] Cleanup Options.td file. NFC.

Based on ELF driver Options.td.

Differential Revision: https://reviews.llvm.org/D86509

4 years ago[MC] [Win64EH] Update the AArch64/seh.s test slightly. NFC.
Martin Storsjö [Thu, 20 Aug 2020 08:53:56 +0000 (11:53 +0300)]
[MC] [Win64EH] Update the AArch64/seh.s test slightly. NFC.

Update the comment stating the aim of the test - this is currently
only checking that these assembler directives don't cause the
assembler to fail, but the results of the testcase aren't particularly
correct yet.

Remove bits of the testcase that are even less likely to be found in
the wild (the .seh_startchained/.seh_endchained block), where the
testcase currently doesn't really generate anything interesting
anyway.

Differential Revision: https://reviews.llvm.org/D86524

4 years ago[llvm-readobj] Fix arm64 unwind opcode disassembly printing
Martin Storsjö [Mon, 24 Aug 2020 09:29:56 +0000 (12:29 +0300)]
[llvm-readobj] Fix arm64 unwind opcode disassembly printing

Add a missing minus, fix vertical alignment of instructions for one opcode.

Differential Revision: https://reviews.llvm.org/D86523

4 years ago[mlir][spirv] Infer converted type of scf.for from the init value
Thomas Raoux [Wed, 26 Aug 2020 06:19:09 +0000 (23:19 -0700)]
[mlir][spirv] Infer converted type of scf.for from the init value

Instead of using the TypeConverter, infer the value of the alloca created based
on the init value. This will allow some ambiguous types like multidimensional
vectors to be converted correctly.

Differential Revision: https://reviews.llvm.org/D86582

4 years agoRevert "[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are...
Roman Lebedev [Wed, 26 Aug 2020 06:23:22 +0000 (09:23 +0300)]
Revert "[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad"

This reverts commit fcb51d8c2460faa23b71e06abb7e826243887dd6.

As buildbots report, there's apparently some missing check to ensure
that the types of incoming values match the type of PHI.
Let's revert for a moment.

4 years ago[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad
Roman Lebedev [Wed, 26 Aug 2020 06:08:24 +0000 (09:08 +0300)]
[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad

While since D86306 we do its sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.

And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:

```
| statistic name                                     | baseline  | proposed  |       Δ |       % |    |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts                           | 7945095   | 7942507   |   -2588 |  -0.03% |  0.03% |
| assembler.ObjectBytes                              | 273209920 | 273069800 | -140120 |  -0.05% |  0.05% |
| early-cse.NumCSE                                   | 2183363   | 2183398   |      35 |   0.00% |  0.00% |
| early-cse.NumSimplify                              | 541847    | 550017    |    8170 |   1.51% |  1.51% |
| instcombine.NumAggregateReconstructionsSimplified  | 2139      | 108       |   -2031 | -94.95% | 94.95% |
| instcombine.NumCombined                            | 3601364   | 3635448   |   34084 |   0.95% |  0.95% |
| instcombine.NumConstProp                           | 27153     | 27157     |       4 |   0.01% |  0.01% |
| instcombine.NumDeadInst                            | 1694521   | 1765022   |   70501 |   4.16% |  4.16% |
| instcombine.NumPHIsOfExtractValues                 | 0         | 37546     |   37546 |   0.00% |  0.00% |
| instcombine.NumSunkInst                            | 63158     | 63686     |     528 |   0.84% |  0.84% |
| instcount.NumBrInst                                | 874304    | 871857    |   -2447 |  -0.28% |  0.28% |
| instcount.NumCallInst                              | 1757657   | 1758402   |     745 |   0.04% |  0.04% |
| instcount.NumExtractValueInst                      | 45623     | 11483     |  -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst                       | 4983      | 580       |   -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst                            | 61018     | 59478     |   -1540 |  -2.52% |  2.52% |
| instcount.NumLandingPadInst                        | 35334     | 34215     |   -1119 |  -3.17% |  3.17% |
| instcount.NumPHIInst                               | 344428    | 331116    |  -13312 |  -3.86% |  3.86% |
| instcount.NumRetInst                               | 100773    | 100772    |      -1 |   0.00% |  0.00% |
| instcount.TotalBlocks                              | 1081154   | 1077166   |   -3988 |  -0.37% |  0.37% |
| instcount.TotalFuncs                               | 101443    | 101442    |      -1 |   0.00% |  0.00% |
| instcount.TotalInsts                               | 8890201   | 8833747   |  -56454 |  -0.64% |  0.64% |
| instsimplify.NumSimplified                         | 75822     | 75707     |    -115 |  -0.15% |  0.15% |
| simplifycfg.NumHoistCommonCode                     | 24203     | 24197     |      -6 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                   | 48201     | 48195     |      -6 |  -0.01% |  0.01% |
| simplifycfg.NumInvokes                             | 2785      | 4298      |    1513 |  54.33% | 54.33% |
| simplifycfg.NumSimpl                               | 997332    | 1018189   |   20857 |   2.09% |  2.09% |
| simplifycfg.NumSinkCommonCode                      | 7088      | 6464      |    -624 |  -8.80% |  8.80% |
| simplifycfg.NumSinkCommonInstrs                    | 15117     | 14021     |   -1096 |  -7.25% |  7.25% |
```
... which tells us that this new fold fires a whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).

This causes a geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions

The other thing this tells us is that, while this is a massive win for the `invoke`->`call` transform,
the `InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
   We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
   it also sees that the extract-insert chain recreates the original aggregate,
   and replaces it with the original aggregate.

The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check what other patterns remain, and how they can be resolved.
   (i.e. I wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86530

4 years agoFix a 32-bit overflow issue when reading LTO-generated bitcode files whose strtab...
Jianzhou Zhao [Tue, 25 Aug 2020 00:36:24 +0000 (00:36 +0000)]
Fix a 32-bit overflow issue when reading LTO-generated bitcode files whose strtab are of size > 2^29

This happens when using -flto and -Wl,--plugin-opt=emit-llvm to create a linked LTO bitcode file, and the bitcode file has a strtab with size > 2^29.

All the issues relate to a pattern like this
  size_t x64 = y64 + z32 * C
  When z32 is >= (2^32)/C, z32 * C overflows.
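A minimal sketch of the pattern and of the usual remedy, which is to widen the 32-bit operand before multiplying. The function names are made up for this example, and `uint64_t` is used in place of `size_t` so the result does not depend on the host's pointer width.

```cpp
#include <cassert>
#include <cstdint>

// Problematic shape: the multiplication is carried out in 32 bits and wraps
// around before the result is widened for the addition.
uint64_t offsetWrapping(uint64_t Y64, uint32_t Z32, uint32_t C) {
  return Y64 + Z32 * C;
}

// Corrected shape: widen first, then multiply in 64 bits.
uint64_t offsetWidened(uint64_t Y64, uint32_t Z32, uint32_t C) {
  return Y64 + static_cast<uint64_t>(Z32) * C;
}
```

With `Z32 = 2^29` and `C = 8` the product is exactly 2^32, so the first version silently drops it while the second keeps it.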

Reviewed-by: MaskRay
Differential Revision: https://reviews.llvm.org/D86500

4 years agoRemove the use of global dialect registration from the standalone-translate.cpp examp...
Mehdi Amini [Wed, 26 Aug 2020 05:08:32 +0000 (05:08 +0000)]
Remove the use of global dialect registration from the standalone-translate.cpp example (NFC)

4 years ago[libc][obvious] Add back the accidentally removed MPFRNumber destructor.
Siva Chandra Reddy [Wed, 26 Aug 2020 04:57:46 +0000 (21:57 -0700)]
[libc][obvious] Add back the accidentally removed MPFRNumber destructor.

4 years ago[libc] Extend MPFRMatcher to handle multiple-input-multiple-output functions.
Siva Chandra Reddy [Fri, 21 Aug 2020 05:36:53 +0000 (22:36 -0700)]
[libc] Extend MPFRMatcher to handle multiple-input-multiple-output functions.

Tests for frexp[f|l] now use the new capability. Not all input-output
combinations have been addressed by this change. Support for newer combinations
can be added in future as needed.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D86506

4 years ago[DWARFYAML] Use writeDWARFOffset() to write the prologue_length field. NFC.
Xing GUO [Wed, 26 Aug 2020 04:30:33 +0000 (12:30 +0800)]
[DWARFYAML] Use writeDWARFOffset() to write the prologue_length field. NFC.

Use writeDWARFOffset() to simplify the logic. NFC.

4 years ago[llvm-lipo] Add support for bitcode files
Adrien Guinet [Wed, 26 Aug 2020 01:55:50 +0000 (18:55 -0700)]
[llvm-lipo] Add support for bitcode files

A Mach-O universal binary may contain bitcode as a slice.
This diff adds proper handling of such binaries to llvm-lipo.

Test plan: make check-all

Differential revision: https://reviews.llvm.org/D85740

4 years agoAh, one test too many updated. This one should be unmodified.
Jason Molenda [Wed, 26 Aug 2020 04:03:39 +0000 (21:03 -0700)]
Ah, one test too many updated.  This one should be unmodified.

4 years agoUpdate UnwindPlan dump to list if it is a trap handler func; also Command
Jason Molenda [Wed, 26 Aug 2020 03:53:01 +0000 (20:53 -0700)]
Update UnwindPlan dump to list if it is a trap handler func; also Command

Update the "image show-unwind" command output to show if the function
being shown is listed as a user-setting or platform trap handler.

Update the individual UnwindPlan dumps to show whether the unwind plan
is registered as a trap handler.

4 years ago[Docs] Document --lto-whole-program-visibility
Teresa Johnson [Wed, 4 Mar 2020 23:38:45 +0000 (15:38 -0800)]
[Docs] Document --lto-whole-program-visibility

Summary:
Documents interaction of linker option added in D71913 with LTO
visibility.

Reviewers: pcc

Subscribers: inglorion, hiraditya, steven_wu, dexonsmith, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75655

4 years agoAdd Z3 to system libraries list if enabled
Mikhail R. Gadelha [Tue, 25 Aug 2020 23:19:58 +0000 (19:19 -0400)]
Add Z3 to system libraries list if enabled

Without this, trying to link static LLVM libraries (built with Z3 enabled) fails because `llvm-config` doesn't print `-lz3`.
We are already using this patch at MSYS2: https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-clang/0013-Add-Z3-to-system-libraries-list-if-enabled.patch

Reviewed By: mikhail.ramalho

Differential Revision: https://reviews.llvm.org/D85195

4 years ago[X86] Add an isel pattern for (i8 (trunc (i16 (bitconvert (v16i1 X))))) to avoid...
Craig Topper [Wed, 26 Aug 2020 01:20:41 +0000 (18:20 -0700)]
[X86] Add an isel pattern for (i8 (trunc (i16 (bitconvert (v16i1 X))))) to avoid an extra EXTRACT_SUBREG

Since we can only copy to GR32 we had to EXTRACT from GR32, but
we would first go to GR16 and then the truncate would extract again
to GR8. This adds a special case to go directly from GR32 to GR8.
This would eventually get cleaned up, but I thought maybe we should
avoid doing it in the first place. Our k-register handling is weird
and we could probably stand to have some more special ISD nodes
for the conversions so the i32 type would be explicit.

4 years ago[Modules] Improve error message when cannot find parent module for submodule definition.
Volodymyr Sapsai [Thu, 23 Jul 2020 19:47:16 +0000 (12:47 -0700)]
[Modules] Improve error message when cannot find parent module for submodule definition.

Before the change the diagnostic for

    module unknown.submodule {}

was "error: expected module name" which is incorrect and misleading
because both "unknown" and "submodule" are valid module names.

We already have a better error message when a parent module is a
submodule itself and is missing. Make the error for a missing top-level
module more like the one for a submodule.

rdar://problem/64424407

Reviewed By: bruno

Differential Revision: https://reviews.llvm.org/D84458

4 years agoRemove global registration from the test dialect in MLIR (NFC)
Mehdi Amini [Tue, 25 Aug 2020 23:21:15 +0000 (23:21 +0000)]
Remove global registration from the test dialect in MLIR (NFC)

4 years ago[X86] Remove extra getOperand(0) call from recently introduced store(extract_element...
Craig Topper [Tue, 25 Aug 2020 23:00:12 +0000 (16:00 -0700)]
[X86] Remove extra getOperand(0) call from recently introduced store(extract_element(vtrunc)) to truncated store combine.

The IsExtractedElement already called getOperand(0) so Extract
here is the source vector. We shouldn't call getOperand(0). This
worked for the original test cases because the result was a
bitcast so the getOperand(0) accidentally peeked through the bitcast
which is what we wanted.

In the failing case here, the operand turns out to be undef so
the getOperand(0) asserts because undef has no operands.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=25184

Differential Revision: https://reviews.llvm.org/D86428

4 years agoAdd llvm_unreachable after fully covered switch to silence some warnings from GCC...
Mehdi Amini [Tue, 25 Aug 2020 23:08:43 +0000 (23:08 +0000)]
Add llvm_unreachable after fully covered switch to silence some warnings from GCC (NFC)

4 years agoRevert "[Coverage] Enable emitting gap area between macros"
Zequan Wu [Tue, 25 Aug 2020 22:04:33 +0000 (15:04 -0700)]
Revert "[Coverage] Enable emitting gap area between macros"

This reverts commit a31c89c1b7a0a2fd3e2c0b8a587a60921abf4abd.

4 years ago[X86] Remove a redundant COPY_TO_REGCLASS for VK16 after a KMOVWkr in an isel output...
Craig Topper [Tue, 25 Aug 2020 22:16:50 +0000 (15:16 -0700)]
[X86] Remove a redundant COPY_TO_REGCLASS for VK16 after a KMOVWkr in an isel output pattern.

KMOVWkr produces VK16, there's no reason to copy it to VK16 again.

Test changes are presumably because we were scheduling based on
the COPY that is no longer there.

4 years ago[llvm-libtool-darwin] Address post-commit feedback
Shoaib Meenai [Tue, 25 Aug 2020 22:03:37 +0000 (15:03 -0700)]
[llvm-libtool-darwin] Address post-commit feedback

Address James Henderson's comments on https://reviews.llvm.org/D86359.

4 years agoRemove unused/misnamed SetObjectModificationTime
Dave Lee [Mon, 24 Aug 2020 20:46:23 +0000 (13:46 -0700)]
Remove unused/misnamed SetObjectModificationTime

Remove `SetObjectModificationTime` which is not currently used, and assigns to the wrong member.

Differential Revision: https://reviews.llvm.org/D86493

4 years ago[MLInliner] Simplify TFUTILS_SUPPORTED_TYPES
Mircea Trofin [Tue, 25 Aug 2020 16:58:49 +0000 (09:58 -0700)]
[MLInliner] Simplify TFUTILS_SUPPORTED_TYPES

We only need the C++ type and the corresponding TF Enum. The other
parameter was used for the output spec json file, but we can just
standardize on the C++ type name there.

Differential Revision: https://reviews.llvm.org/D86549

4 years ago[AMDGPU] Remove unsound dependency on ISA version in waitcnt
Stanislav Mekhanoshin [Tue, 25 Aug 2020 18:58:23 +0000 (11:58 -0700)]
[AMDGPU] Remove unsound dependency on ISA version in waitcnt

Differential Revision: https://reviews.llvm.org/D86566

4 years ago[TargetLoweringObjectFileImpl] Make .llvmbc and .llvmcmd non-SHF_ALLOC
Fangrui Song [Tue, 25 Aug 2020 20:37:29 +0000 (13:37 -0700)]
[TargetLoweringObjectFileImpl] Make .llvmbc and .llvmcmd non-SHF_ALLOC

There are two ways .llvmbc can be produced:

* clang -c -fembed-bitcode=all (which also produces .llvmcmd)
* LTO backend: ld.lld -mllvm -lto-embed-bitcode or -plugin-opt=-lto-embed-bitcode

.llvmbc and .llvmcmd have the SHF_ALLOC flag, so they can be dropped by
--gc-sections.

This patch sets SectionKind::Metadata to drop the SHF_ALLOC flag. This
is conceptually correct: the two sections are not part of the process
image, so SHF_ALLOC is not appropriate.

`test/LTO/X86/embed-bitcode.ll`: changed `llvm-objcopy -O binary --only-section` to
`llvm-objcopy --dump-section`. `-O binary` does not dump non-SHF_ALLOC sections.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D86374

4 years ago[SDAG] Improve MemSDNode::getBasePtr
Krzysztof Parzyszek [Mon, 24 Aug 2020 18:25:06 +0000 (13:25 -0500)]
[SDAG] Improve MemSDNode::getBasePtr

It returned getOperand(1), except for STORE for which it returned
getOperand(2). Handle MSTORE, MGATHER, and MSCATTER as well.

4 years ago[mlir] [LLVMIR] Mark reductions as side-effect free
aartbik [Tue, 25 Aug 2020 19:26:28 +0000 (12:26 -0700)]
[mlir] [LLVMIR] Mark reductions as side-effect free

Attribute was missing from original base class.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D86569

4 years ago[OpenMP] Pack first-private arguments to improve efficiency of data transfer
Shilei Tian [Tue, 25 Aug 2020 20:04:59 +0000 (16:04 -0400)]
[OpenMP] Pack first-private arguments to improve efficiency of data transfer

In this patch, we pack all small first-private arguments, then allocate and transfer them all at once to reduce the number of data transfers, which are very expensive.

Let's take the test case as an example.
```
int main() {
  int data1[3] = {1}, data2[3] = {2}, data3[3] = {3};
  int sum[16] = {0};
#pragma omp target teams distribute parallel for map(tofrom: sum) firstprivate(data1, data2, data3)
  for (int i = 0; i < 16; ++i) {
    for (int j = 0; j < 3; ++j) {
      sum[i] += data1[j];
      sum[i] += data2[j];
      sum[i] += data3[j];
    }
  }
}
```
Here `data1`, `data2`, and `data3` are three first-private arguments of the target region. Previously, `libomptarget` called data allocation and data transfer three times, each of which allocated and transferred 12 bytes. With this patch, it calls allocation and transfer only once. The size is `(12+4)*3=48`, where 12 is the size of each array and 4 is the padding that keeps each address aligned to 8 bytes. It is implemented as follows:
1. First, collect all information for the *first*-private arguments. _private_ arguments are excluded because they don't need to be mapped to the target device; they only need a data allocation. With the patch for the memory manager, data allocation can be very cheap, especially for small sizes. For each qualifying argument, push a placeholder pointer `nullptr` to the `vector` of kernel arguments; these are updated later.
2. Once all the information is available, create a buffer that can accommodate all arguments plus their padding. Copy the arguments into the buffer at the right places, i.e. at aligned addresses.
3. Allocate target memory of the same size as the host buffer, transfer the host buffer to the target device, and finally update all placeholder pointers in the arguments `vector`.

The reason we only consider small arguments is that the data transfer is asynchronous. For a large argument, we can therefore continue doing work on the host side while, hopefully, the data is being transferred. An argument is "small" if its size is less than a predefined value, currently 1024. I'm not sure whether that is a good value; it is an open question. Another question is whether we need to make it configurable via an environment variable.
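
The packing steps above can be sketched roughly as follows. This is a minimal illustration, not the libomptarget code: `FirstPrivateArg`, `alignUp`, `packFirstPrivates`, `kAlignment`, and `kSmallThreshold` are hypothetical names, and the target allocation and transfer are left out so that only the host-side buffer layout is shown.
```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical names; this is not the libomptarget implementation.
constexpr std::size_t kAlignment = 8;         // alignment from the example above
constexpr std::size_t kSmallThreshold = 1024; // "small" cutoff mentioned above

struct FirstPrivateArg {
  const void *HostPtr; // host address of the argument
  std::size_t Size;    // size of the argument in bytes
};

// Round Size up to the next multiple of kAlignment.
constexpr std::size_t alignUp(std::size_t Size) {
  return (Size + kAlignment - 1) / kAlignment * kAlignment;
}

// Pack all arguments into one host buffer. Offsets[I] receives the offset of
// Args[I] inside the returned buffer; a real implementation would add it to
// the target base address to replace the placeholder pointer.
std::vector<char> packFirstPrivates(const std::vector<FirstPrivateArg> &Args,
                                    std::vector<std::size_t> &Offsets) {
  std::size_t Total = 0;
  Offsets.clear();
  for (const FirstPrivateArg &A : Args) {
    assert(A.Size < kSmallThreshold && "only small arguments are packed");
    Offsets.push_back(Total);
    Total += alignUp(A.Size);
  }
  std::vector<char> Buffer(Total, 0);
  for (std::size_t I = 0; I < Args.size(); ++I)
    std::memcpy(Buffer.data() + Offsets[I], Args[I].HostPtr, Args[I].Size);
  return Buffer;
}
```
For the three 12-byte arrays in the test case (assuming a 4-byte `int`), each argument occupies 16 bytes in the buffer, which gives the 48-byte total quoted above.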

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D86307

4 years ago[AMDGPU] Switch to named simm16 in vscnt insertion
Stanislav Mekhanoshin [Tue, 25 Aug 2020 19:13:24 +0000 (12:13 -0700)]
[AMDGPU] Switch to named simm16 in vscnt insertion

Differential Revision: https://reviews.llvm.org/D86568

4 years ago[Hexagon] Check if EVT is simple type in HVX lowering
Ankit Aggarwal [Tue, 25 Aug 2020 19:47:44 +0000 (14:47 -0500)]
[Hexagon] Check if EVT is simple type in HVX lowering

4 years ago[lldb] Make Reproducer compatible with SubsystemRAII (NFC)
Jonas Devlieghere [Tue, 25 Aug 2020 19:49:30 +0000 (12:49 -0700)]
[lldb] Make Reproducer compatible with SubsystemRAII (NFC)

Make Reproducer compatible with SubsystemRAII and use it in
LocateSymbolFileTest.

4 years ago[SystemZ][z/OS] Add z/OS Target and define macros
Abhina Sreeskantharajan [Tue, 25 Aug 2020 17:50:17 +0000 (13:50 -0400)]
[SystemZ][z/OS] Add z/OS Target and define macros

This patch adds the z/OS target and defines macros as a stepping stone
towards enabling a native build on z/OS.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D85324

4 years ago[ValueTracking] Let getGuaranteedNonPoisonOp find multiple non-poison operands
Juneyoung Lee [Sun, 23 Aug 2020 17:37:10 +0000 (02:37 +0900)]
[ValueTracking] Let getGuaranteedNonPoisonOp find multiple non-poison operands

This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands.

Instead of special-casing llvm.assume, I think it is also a viable option to
add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86477

4 years ago[ValueTracking] Add a noundef test for D86477; NFC
Juneyoung Lee [Mon, 24 Aug 2020 17:34:17 +0000 (02:34 +0900)]
[ValueTracking] Add a noundef test for D86477; NFC

4 years agoReland "[DebugInfo] Move constructor homing case in shouldOmitDefinition."
Amy Huang [Tue, 25 Aug 2020 17:36:03 +0000 (10:36 -0700)]
Reland "[DebugInfo] Move constructor homing case in shouldOmitDefinition."

For some reason the ctor homing case was before the template
specialization case, and could have returned false too early.
I moved the code out into a separate function to avoid this.

This reverts commit 05777ab941063192b9ccb1775358a83a2700ccc1.

4 years ago[MemDep] Use BatchAA when computing pointer dependencies
Nikita Popov [Thu, 25 Jun 2020 19:35:41 +0000 (21:35 +0200)]
[MemDep] Use BatchAA when computing pointer dependencies

We're not changing IR while running a single MemDep query, so it's
safe to cache alias analysis results using BatchAA. This adds BatchAA
usage to getSimplePointerDependencyFrom(), which is non-intrusive --
covering larger parts (like a whole processNonLocalLoad query) is
also possible, but requires threading BatchAA through a bunch of APIs.

For the ThinLTO configuration, this is a 1% geomean improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D85583

4 years ago[mlir] [LLVMIR] Add get active lane mask intrinsic
aartbik [Tue, 25 Aug 2020 18:55:45 +0000 (11:55 -0700)]
[mlir] [LLVMIR] Add get active lane mask intrinsic

Provides fast, generic way of setting a mask up to a certain
point. Potential use cases that may benefit are create_mask
and transfer_read/write operations in the vector dialect.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D86501

4 years ago[llvm-mca][NFC] Refactor handling of views that examine individual instructions,
Wolfgang Pieb [Fri, 21 Aug 2020 22:52:02 +0000 (15:52 -0700)]
[llvm-mca][NFC] Refactor handling of views that examine individual instructions,
including printing them.

Reviewers: andreadb, lebedev.ri

Differential Review: https://reviews.llvm.org/D86390

Introduces a new base class "InstructionView" that such views derive from.
Other views still use the "View" base class.

4 years ago[mlir][openacc][NFC] Fix comment about OpenACCExecMapping
clementval [Tue, 25 Aug 2020 19:11:05 +0000 (15:11 -0400)]
[mlir][openacc][NFC] Fix comment about OpenACCExecMapping

4 years ago[flang] Check that various variables referenced in I/O statements may be defined
peter klausler [Tue, 25 Aug 2020 17:34:33 +0000 (10:34 -0700)]
[flang] Check that various variables referenced in I/O statements may be defined

A number of I/O syntax rules involve variables that will be written to,
and must therefore be definable.  This includes internal file variables,
IOSTAT= and IOMSG= specifiers, most INQUIRE statement specifiers, a few
other specifiers, and input variables.  This patch checks for
these violations, and implements several additional I/O TODO constraint
checks.

Differential Revision: https://reviews.llvm.org/D86557

4 years ago[tsan] On arm64e, strip out ptrauth bits from incoming PCs
Kuba Mracek [Tue, 25 Aug 2020 18:59:05 +0000 (11:59 -0700)]
[tsan] On arm64e, strip out ptrauth bits from incoming PCs

Differential Revision: https://reviews.llvm.org/D86378

4 years ago[X86] Mention -march=sapphirerapids in the release notes.
Craig Topper [Tue, 25 Aug 2020 18:56:43 +0000 (11:56 -0700)]
[X86] Mention -march=sapphirerapids in the release notes.

This was just added in e02d081f2b60b61eb60ef6a49b1a9f907e432d4c.

4 years ago[test] Add -inject-tli-mapping to -loop-vectorize -vector-library tests
Arthur Eubanks [Tue, 25 Aug 2020 17:59:12 +0000 (10:59 -0700)]
[test] Add -inject-tli-mapping to -loop-vectorize -vector-library tests

The legacy LoopVectorize has a dependency on InjectTLIMappingsLegacy.
That cannot be expressed in the new PM since they are both normal
passes. Explicitly add -inject-tli-mappings as a pass.

Follow-up to https://reviews.llvm.org/D86492.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86561

4 years ago[examples] Fix dependencies for OrcV2Examples/LLJITWithThinLTOSummaries.
Lang Hames [Tue, 25 Aug 2020 18:50:32 +0000 (11:50 -0700)]
[examples] Fix dependencies for OrcV2Examples/LLJITWithThinLTOSummaries.

4 years ago[ORC] Fix an endif comment.
Lang Hames [Mon, 24 Aug 2020 03:34:57 +0000 (20:34 -0700)]
[ORC] Fix an endif comment.

4 years ago[flang] Improve error handling for bad characters in source
peter klausler [Tue, 25 Aug 2020 16:38:41 +0000 (09:38 -0700)]
[flang] Improve error handling for bad characters in source

When an illegal character appears in Fortran source (after
preprocessing), catch and report it in the prescanning phase
rather than leaving it for the parser to cope with.

Differential Revision: https://reviews.llvm.org/D86553

4 years ago[flang] Parse global compiler directives
peter klausler [Tue, 25 Aug 2020 16:39:52 +0000 (09:39 -0700)]
[flang] Parse global compiler directives

Accept and represent "global" compiler directives that appear
before and between program units in a source file.

Differential Revision: https://reviews.llvm.org/D86555

4 years ago[lldb] Initialize reproducers in LocateSymbolFileTest
Raphael Isemann [Tue, 25 Aug 2020 18:25:15 +0000 (20:25 +0200)]
[lldb] Initialize reproducers in LocateSymbolFileTest

Since a842950b62b6d029a392c3c312c6495d6368c2a4 this test started using
the reproducer subsystem but we never initialized it in the test. The
Subsystem takes an argument, so we can't use the usual SubsystemRAII at the
moment to do this for us.

This just adds the initialize/terminate calls to get the test passing again.

4 years ago[lldb] Don't ask for QOS_CLASS_UNSPECIFIED queue in TestQueues
Raphael Isemann [Tue, 25 Aug 2020 17:53:48 +0000 (19:53 +0200)]
[lldb] Don't ask for QOS_CLASS_UNSPECIFIED queue in TestQueues

TestQueues is curiously failing for me as my queue for QOS_CLASS_UNSPECIFIED
is named "Utility" and not "User Initiated" or "Default". While debugging this,
I noticed that this test isn't actually using this API correctly, from what I understand. The API documentation
for `dispatch_get_global_queue` specifies for the parameter: "You may specify the value
QOS_CLASS_USER_INTERACTIVE, QOS_CLASS_USER_INITIATED, QOS_CLASS_UTILITY, or QOS_CLASS_BACKGROUND."

QOS_CLASS_UNSPECIFIED isn't listed as one of the supported values. swift-corelibs-libdispatch
even checks for this value and returns a DISPATCH_BAD_INPUT. The
libdispatch shipped on macOS seems to also check for QOS_CLASS_UNSPECIFIED and seems to
instead cause a "client crash", but somehow this doesn't trigger in this test and instead we just
get whatever queue

This patch just removes that part of the test as it appears the code is just incorrect.

Reviewed By: jasonmolenda

Differential Revision: https://reviews.llvm.org/D86211

4 years ago[FIX] Avoid creating BFI when emitting remarks for dead functions
Wei Wang [Tue, 25 Aug 2020 17:42:54 +0000 (10:42 -0700)]
[FIX] Avoid creating BFI when emitting remarks for dead functions

A dead function has its body stripped away, which can cause various
analyses to panic. It also does not make sense to run analyses on
such a function.

Reviewed By: xazax.hun, MaskRay, wenlei, hoy

Differential Revision: https://reviews.llvm.org/D84715

4 years ago[mlir] NFC: fix typo in FileCheck prefix
Kazuaki Ishizaki [Tue, 25 Aug 2020 18:11:48 +0000 (03:11 +0900)]
[mlir] NFC: fix typo in FileCheck prefix

CHECL -> CHECK

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D86550

4 years agoAArch64: Fix hardcoded register in test
Matt Arsenault [Wed, 3 Jun 2020 21:53:15 +0000 (17:53 -0400)]
AArch64: Fix hardcoded register in test

4 years ago[flang] Don't completely left-justify fixed-form tokenization
peter klausler [Tue, 25 Aug 2020 16:39:09 +0000 (09:39 -0700)]
[flang] Don't completely left-justify fixed-form tokenization

If the label field is empty, and macro replacement occurs,
the rescanned text might be misclassified as a comment card
if it happens to begin with a C or a D.  Insert a leading
space into these otherwise empty label fields.
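
A minimal sketch of the misclassification, assuming a simplified rule in which a fixed-form line counts as a comment card when column 1 holds C or D (the real prescanner handles more cases, and `looksLikeCommentCard` and `protectEmptyLabelField` are hypothetical names):
```cpp
#include <string>

// Simplified rule for this sketch only: column 1 holding 'C' or 'D' marks a
// comment card. Other comment markers and lower case are deliberately ignored.
bool looksLikeCommentCard(const std::string &Line) {
  return !Line.empty() && (Line[0] == 'C' || Line[0] == 'D');
}

// Stand-in for the fix: keep an otherwise empty label field from collapsing
// by putting a space in front of the rescanned text.
std::string protectEmptyLabelField(const std::string &Rescanned) {
  return " " + Rescanned;
}
```
With this rule, rescanned text such as `CALL FOO()` is taken for a comment when fully left-justified, but not once the leading space is inserted.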

Fixes https://bugs.llvm.org/show_bug.cgi?id=47173

4 years ago[NewPM][test] Fix accelerate-vector-functions.ll under NPM
Arthur Eubanks [Mon, 24 Aug 2020 22:38:00 +0000 (15:38 -0700)]
[NewPM][test] Fix accelerate-vector-functions.ll under NPM

The legacy SLPVectorizer has a dependency on InjectTLIMappingsLegacy.
That cannot be expressed in the new PM since they are both normal
passes. Explicitly add -inject-tli-mappings as a pass.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86492

4 years ago[ARM] Additional test for tailpred reductions. NFC
David Green [Tue, 25 Aug 2020 17:29:15 +0000 (18:29 +0100)]
[ARM] Additional test for tailpred reductions. NFC

4 years ago[Hexagon] Remove (redundant) HexagonISelLowering::isHvxOperation(SDValue)
Krzysztof Parzyszek [Tue, 25 Aug 2020 16:42:42 +0000 (11:42 -0500)]
[Hexagon] Remove (redundant) HexagonISelLowering::isHvxOperation(SDValue)

Use isHvxOperation(SDNode*) instead.

4 years ago[x86] add AVX shuffle test for PR47262; NFC
Sanjay Patel [Tue, 25 Aug 2020 16:42:20 +0000 (12:42 -0400)]
[x86] add AVX shuffle test for PR47262; NFC

Goes with D86429

4 years ago[LoopNest] False negative of `arePerfectlyNested` with LCSSA loops
Ta-Wei Tu [Tue, 25 Aug 2020 16:17:01 +0000 (16:17 +0000)]
[LoopNest] False negative of `arePerfectlyNested` with LCSSA loops

Summary: The LCSSA pass (required for all loop passes) sometimes adds
additional blocks containing LCSSA variables, and checkLoopsStructure
may return false even when the loops are perfectly nested in this case.
This is because the successor of the exit block of the inner loop now
points to the LCSSA block instead of the latch block of the outer loop.
Examples are shown in the test nests-with-lcssa.ll.

To fix the issue, the successor of the exit block of the inner loop can
now point to a block in which all instructions are LCSSA phi nodes
(except the terminator), and the sole successor of that block should
point to the latch block of the outer loop.

Reviewed By: Whitney, etiotto

Differential Revision: https://reviews.llvm.org/D86133

4 years ago[AIX][compiler-rt][builtins] Don't add ppc builtin implementations that require __int...
David Tenty [Tue, 25 Aug 2020 15:30:17 +0000 (11:30 -0400)]
[AIX][compiler-rt][builtins] Don't add ppc builtin implementations that require __int128 on AIX

since __int128 currently isn't supported on AIX.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D85972

4 years ago[LangRef] Revise semantics of intrinsic get.active.lane.mask
Sjoerd Meijer [Tue, 25 Aug 2020 15:14:49 +0000 (16:14 +0100)]
[LangRef] Revise semantics of intrinsic get.active.lane.mask

A first version of get.active.lane.mask was committed in rG7fb8a40e5220. One of
the main purposes and uses of this intrinsic is to communicate information from
the middle-end to the back-end, but its current definition and semantics make
this actually very difficult. The intrinsic was defined as:

  @llvm.get.active.lane.mask(%IV, %BTC)

where %BTC is the Backedge-Taken Count (variable names are different in the
LangRef spec). This implicitly communicates the loop tripcount, which
can be reconstructed by calculating BTC + 1. But it has been very difficult to
prove that calculating BTC + 1 is safe and doesn't overflow. We need
complicated range and SCEV analysis, and thus the problem is that this
intrinsic isn't really solving the problem it was supposed to solve. Examples of the
overflow checks that are required in the (ARM) back-end are D79175 and D86074,
which aren't even complete/correct yet.

To solve this problem, we are revising the definitions/semantics for
get.active.lane.mask to avoid all the complicated overflow analysis. This means
that instead of communicating the BTC, we are now using the loop tripcount. Now
using LangRef's variable names, its semantics is changed from:

  icmp ule (%base + i), %n

to:

  icmp ult (%base + i), %n

with %n > 0 and corresponding to the loop tripcount. The intrinsic signature
remains the same.
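
As a scalar illustration of the two definitions (a sketch only, using 32-bit values; `laneMaskULE`, `laneMaskULT`, and `tripCountFromBTC` are hypothetical helper names, not LLVM APIs):
```cpp
#include <cstdint>
#include <vector>

// Old definition: lane i is active when (Base + i) <= BTC.
std::vector<bool> laneMaskULE(uint32_t Base, uint32_t BTC, unsigned Lanes) {
  std::vector<bool> Mask(Lanes);
  for (unsigned I = 0; I < Lanes; ++I)
    Mask[I] = (uint64_t(Base) + I) <= BTC; // widened so Base + I cannot wrap
  return Mask;
}

// New definition: lane i is active when (Base + i) < N, N being the tripcount.
std::vector<bool> laneMaskULT(uint32_t Base, uint32_t N, unsigned Lanes) {
  std::vector<bool> Mask(Lanes);
  for (unsigned I = 0; I < Lanes; ++I)
    Mask[I] = (uint64_t(Base) + I) < N; // widened so Base + I cannot wrap
  return Mask;
}

// Reconstructing the tripcount from the BTC is what needed overflow checks:
// in 32-bit unsigned arithmetic, BTC + 1 wraps to 0 when BTC is UINT32_MAX.
uint32_t tripCountFromBTC(uint32_t BTC) { return BTC + 1; }
```
For a tripcount of 6 (BTC of 5), base 4, and 4 lanes, both forms activate only the first two lanes; the difference is that the new form never needs BTC + 1 to be computed.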

Differential Revision: https://reviews.llvm.org/D86147

4 years ago[InstCombine] improve demanded element analysis for vector insert-of-extract (2nd...
Sanjay Patel [Tue, 25 Aug 2020 15:11:43 +0000 (11:11 -0400)]
[InstCombine] improve demanded element analysis for vector insert-of-extract (2nd try)

The 1st attempt (rG557b890) was reverted because it caused miscompiles.
That bug is avoided here by changing the order of folds and as verified
in the new tests.

Original commit message:
InstCombine currently has odd rules for folding insert-extract chains to shuffles,
so we miss collapsing seemingly simple cases as shown in the tests here.

But poison makes this not quite as easy as we might have guessed. Alive2 tests to
show the subtle difference (similar to the regression tests):
https://alive2.llvm.org/ce/z/hp4hv3 (this is ok)
https://alive2.llvm.org/ce/z/ehEWaN (poison leakage)

SLP tends to create these patterns (as shown in the SLP tests), and this could
help with solving PR16739.

Differential Revision: https://reviews.llvm.org/D86460

4 years ago[InstCombine] add vector demanded elements tests with shuffles; NFC
Sanjay Patel [Tue, 25 Aug 2020 14:57:13 +0000 (10:57 -0400)]
[InstCombine] add vector demanded elements tests with shuffles; NFC

The 1st draft of D86460 (reverted) would show miscompiles with these tests
because the undef element tracking went wrong and became visible in the
shuffle masks.

4 years ago[ELF] .note.gnu.property: error for invalid pr_datasize
Fangrui Song [Tue, 25 Aug 2020 15:05:38 +0000 (08:05 -0700)]
[ELF] .note.gnu.property: error for invalid pr_datasize

A n_type==NT_GNU_PROPERTY_TYPE_0 note encodes a program property.
If pr_datasize is invalid, LLD may crash
(https://github.com/ClangBuiltLinux/linux/issues/1141)

This patch adds some error checking, supports big-endian, and adds some tests
for invalid n_descsz.
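
The kind of check involved can be sketched as below. This is not LLD's code: it assumes a little-endian ELF64 note whose descriptor is a sequence of (pr_type, pr_datasize, data padded to 8 bytes) entries, and `read32le` and `propertiesAreWellFormed` are hypothetical helpers.
```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian 32-bit value (big-endian inputs are ignored here).
static uint32_t read32le(const uint8_t *P) {
  return uint32_t(P[0]) | uint32_t(P[1]) << 8 | uint32_t(P[2]) << 16 |
         uint32_t(P[3]) << 24;
}

// Walk the properties in a note descriptor and reject any entry whose header
// is truncated or whose pr_datasize exceeds the bytes that remain.
bool propertiesAreWellFormed(const std::vector<uint8_t> &Desc) {
  std::size_t Pos = 0;
  while (Pos < Desc.size()) {
    if (Desc.size() - Pos < 8)
      return false; // not enough room for pr_type and pr_datasize
    uint64_t DataSize = read32le(&Desc[Pos + 4]);
    Pos += 8;
    uint64_t Remaining = Desc.size() - Pos;
    if (DataSize > Remaining)
      return false; // pr_datasize points past the end of the descriptor
    uint64_t Padded = (DataSize + 7) & ~uint64_t(7); // 8-byte padding (ELF64)
    Pos += Padded < Remaining ? Padded : Remaining;
  }
  return true;
}
```
Checking the size before reading the data is what turns a potential out-of-bounds read into a reportable error.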

Differential Revision: https://reviews.llvm.org/D86422

4 years agoAMDGPU/GlobalISel: re-auto-generate some test checks
Jay Foad [Tue, 25 Aug 2020 14:52:11 +0000 (15:52 +0100)]
AMDGPU/GlobalISel: re-auto-generate some test checks

4 years ago[Verifier] Additional check for intrinsic get.active.lane.mask
Sjoerd Meijer [Tue, 25 Aug 2020 14:13:51 +0000 (15:13 +0100)]
[Verifier] Additional check for intrinsic get.active.lane.mask

This adapts the verifier checks for intrinsic get.active.lane.mask to the new
semantics of it as described in D86147. I.e., the second argument %n, which
corresponds to the loop tripcount, must be greater than 0 if it is a constant,
so check that.

Differential Revision: https://reviews.llvm.org/D86301

4 years ago[scudo][standalone] Skip irrelevant regions during release
Kostya Kortchinsky [Mon, 24 Aug 2020 21:13:12 +0000 (14:13 -0700)]
[scudo][standalone] Skip irrelevant regions during release

With the 'new' way of releasing on 32-bit, we iterate through all the
regions in between `First` and `Last`, which covers regions that do not
belong to the class size we are working with. This is effectively wasted
cycles.

With this change, we add a `SkipRegion` lambda to `releaseFreeMemoryToOS`
that will allow the release function to know when to skip a region.
For the 64-bit primary, since we are only working with 1 region, we never
skip.
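
The idea can be sketched with a toy model (not scudo's actual `releaseFreeMemoryToOS` signature; `releaseRegions` is a hypothetical name and the release work is reduced to recording the region index):
```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy model. Visits region indices in [First, Last] and returns
// the ones that were processed, consulting SkipRegion before doing any work.
template <typename SkipRegionT>
std::vector<std::size_t> releaseRegions(std::size_t First, std::size_t Last,
                                        SkipRegionT SkipRegion) {
  std::vector<std::size_t> Processed;
  for (std::size_t I = First; I <= Last; ++I) {
    if (SkipRegion(I))
      continue;             // region belongs to another size class
    Processed.push_back(I); // stand-in for the real release work
  }
  return Processed;
}
```
A 32-bit-style caller would pass a lambda that compares each region's class with the class being released, while a 64-bit-style caller, working with a single region, would pass a lambda that always returns false.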

Reviewed By: hctim

Differential Revision: https://reviews.llvm.org/D86399

4 years ago[DWARFYAML] Make the 'Attributes' field optional.
Xing GUO [Tue, 25 Aug 2020 14:37:40 +0000 (22:37 +0800)]
[DWARFYAML] Make the 'Attributes' field optional.

This patch makes the 'Attributes' field optional. We don't need to
explicitly specify the 'Attributes' field in the future.

Reviewed By: jhenderson, grimar

Differential Revision: https://reviews.llvm.org/D86537

4 years ago[SelectionDAG] Legalize intrinsic get.active.lane.mask
Sjoerd Meijer [Tue, 25 Aug 2020 13:41:53 +0000 (14:41 +0100)]
[SelectionDAG] Legalize intrinsic get.active.lane.mask

This adapts legalization of intrinsic get.active.lane.mask to the new semantics
as described in D86147. Because the second argument is now the loop tripcount,
we legalize this intrinsic to an 'icmp ULT' instead of the 'icmp ULE' used when it was the
backedge-taken count.

Differential Revision: https://reviews.llvm.org/D86302

4 years ago[LiveDebugValues] Add switches for using instr-ref variable locations
Jeremy Morse [Tue, 25 Aug 2020 13:23:14 +0000 (14:23 +0100)]
[LiveDebugValues] Add switches for using instr-ref variable locations

This patch adds the -Xclang option
"-fexperimental-debug-variable-locations" and the same LLVM CodeGen option,
to pick which variable location tracking solution to use.

Right now all the switch does is pick which LiveDebugValues
implementation to use, the normal VarLoc one or the instruction
referencing one in rGae6f78824031. Over time, the aim is to add fragments
of support in aid of the value-tracking RFC:

  http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html

also controlled by this command line switch. That will slowly move
variable locations to be defined by an instruction calculating a value,
and a DBG_INSTR_REF instruction referring to that value. Thus, this is
going to grow into a "use the new kind of variable locations" switch,
rather than just "use the new LiveDebugValues implementation".

Differential Revision: https://reviews.llvm.org/D83048

4 years agoAMDGPU/GlobalISel: Use more accurate legality rules for merge/unmerge
Matt Arsenault [Sat, 22 Aug 2020 22:00:08 +0000 (18:00 -0400)]
AMDGPU/GlobalISel: Use more accurate legality rules for merge/unmerge

Most notably, we were incorrectly reporting <3 x s16> as a legal type
for these. Make sure these aren't legal to help make progress on
fixing the artifact combiner and vector legalizer
rules. Unfortunately, this means spreading the -global-isel-abort=0
hack, although this doesn't change the legalizer result in any
situation.

4 years agoAMDGPU/GlobalISel: Fix using unlegalizable values in tests
Matt Arsenault [Sat, 22 Aug 2020 21:24:47 +0000 (17:24 -0400)]
AMDGPU/GlobalISel: Fix using unlegalizable values in tests

Implicit uses of non-register value types place impossible-to-satisfy
constraints on the legalizer / artifact combiner. These prevent
writing sensible legalize rules for the artifacts without triggering
infinite loops in the legalizer.

The verifier really needs to enforce this, but I'm not sure what the
exact conditions would look like yet.

4 years ago[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks
Sjoerd Meijer [Tue, 25 Aug 2020 12:53:26 +0000 (13:53 +0100)]
[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks

This adapts tail-predication to the new semantics of get.active.lane.mask as
defined in D86147. This means that:
- we can remove the BTC + 1 overflow checks because now the loop tripcount is
  passed in to the intrinsic,
- we can immediately use that value to set up a counter for the number of
  elements processed by the loop and don't need to materialize BTC + 1.

Differential Revision: https://reviews.llvm.org/D86303

4 years agoAMDGPU/GlobalISel: Apply bitcast load/store hack to pointer vectors
Matt Arsenault [Sun, 16 Aug 2020 16:51:31 +0000 (12:51 -0400)]
AMDGPU/GlobalISel: Apply bitcast load/store hack to pointer vectors

The selection patterns will currently fail on these.

4 years ago[Utils] Add highlighting definition for byref IR attribute
Anatoly Trosinenko [Tue, 25 Aug 2020 12:52:26 +0000 (15:52 +0300)]
[Utils] Add highlighting definition for byref IR attribute

This patch assumes `byref` can be handled identically to `byval`.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D85768