platform/upstream/llvm.git
4 years ago[DemandedBits] Reorder addition test checks. NFC.
Simon Pilgrim [Mon, 17 Aug 2020 11:46:31 +0000 (12:46 +0100)]
[DemandedBits] Reorder addition test checks. NFC.

As suggested on D72423 we should try to keep the same order as the original IR

4 years ago[NFC] Run update script on test
Sam Parker [Mon, 17 Aug 2020 11:50:30 +0000 (12:50 +0100)]
[NFC] Run update script on test

Update IndVarSimplify/no-iv-rewrite.ll

4 years ago[LLD][ELF] - Do not produce an invalid dynamic relocation order with --shuffle-sections.
Georgii Rymar [Mon, 10 Aug 2020 14:00:53 +0000 (17:00 +0300)]
[LLD][ELF] - Do not produce an invalid dynamic relocation order with --shuffle-sections.

Normally (when not on android with android relocation packing enabled),
we put IRelative relocations to ".rel[a].dyn", after other relocations,
to ensure that IRelatives are processed last by the dynamic loader.

To achieve that we add the `in.relaIplt` after the `part.relaDyn`:
https://github.com/llvm/llvm-project/blob/master/lld/ELF/Writer.cpp#L540

The problem is that `--shuffle-sections` might break the sections order.
This patch fixes it.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47056.

Differential revision: https://reviews.llvm.org/D85651

4 years ago[lldb][NFC] Remove name parameter from CreateFunctionTemplateDecl
Raphael Isemann [Mon, 17 Aug 2020 11:38:21 +0000 (13:38 +0200)]
[lldb][NFC] Remove name parameter from CreateFunctionTemplateDecl

It's unused and not documented.

4 years ago[lldb][NFC] Use expect_expr in more tests
Raphael Isemann [Mon, 17 Aug 2020 11:08:40 +0000 (13:08 +0200)]
[lldb][NFC] Use expect_expr in more tests

4 years ago[X86][AVX] Move lowerShuffleWithVPMOV inside explicit shuffle lowering cases
Simon Pilgrim [Mon, 17 Aug 2020 10:17:20 +0000 (11:17 +0100)]
[X86][AVX] Move lowerShuffleWithVPMOV inside explicit shuffle lowering cases

Perform lowerShuffleWithVPMOV as part of the v16i8/v8i16 shuffle lowering stages, which are the only types that are currently supported.

We need to expand support for lowering shuffles as truncations to fix the remaining regressions in D66004

4 years ago[lldb][NFC] Use the proper type for the 'storage' parameter of CreateFunctionDeclaration
Raphael Isemann [Mon, 17 Aug 2020 10:47:12 +0000 (12:47 +0200)]
[lldb][NFC] Use the proper type for the 'storage' parameter of CreateFunctionDeclaration

All the callers pass an enum and we cast the int anyway back to the actual type,
so we might as well just use the type for the parameter.

4 years ago[InlineCost] Fix scalable vectors in visitAlloca
Cullen Rhodes [Wed, 12 Aug 2020 18:03:46 +0000 (18:03 +0000)]
[InlineCost] Fix scalable vectors in visitAlloca

Discovered as part of the VLS type work (see D85128).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85848

4 years ago[NFC][StackSafety] Move out sort from the loop
Vitaly Buka [Fri, 14 Aug 2020 11:17:08 +0000 (04:17 -0700)]
[NFC][StackSafety] Move out sort from the loop

4 years ago[lldb] Remove OS-specific string from TestInvalidArgsLog
Raphael Isemann [Mon, 17 Aug 2020 09:53:03 +0000 (11:53 +0200)]
[lldb] Remove OS-specific string from TestInvalidArgsLog

This is the error message from the OS, so we shouldn't check against the
OS-specific part of the string.

Fixes the test on Windows which returns a different error message.

4 years ago[lldb] Don't delete orphaned shared modules in SBDebugger::DeleteTarget
Raphael Isemann [Mon, 17 Aug 2020 09:03:36 +0000 (11:03 +0200)]
[lldb] Don't delete orphaned shared modules in SBDebugger::DeleteTarget

In D83876 the consensus seems to be that LLDB should never delete orphaned modules
implicitly. However, SBDebugger::DeleteTarget is currently doing exactly that.
This code was added in 753406221b55b95141c8c1239660dc4db4e35ea5 but I don't see
any explanation in the commit, so I think we should delete it.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D83933

4 years ago[lldb/Utility] Simplify and generalize Scalar class
Pavel Labath [Mon, 27 Jul 2020 13:06:48 +0000 (15:06 +0200)]
[lldb/Utility] Simplify and generalize Scalar class

The class contains an enum listing all host integer types as well as
some non-host types. This setup is a remnant of a time when this class
was actually implemented in terms of host integer types. Now that we are
using llvm::APInt, they are mostly useless and mean that each function
needs to enumerate all of these cases even though it treats most of them
identically.

I only leave e_sint and e_uint to denote the integer signedness, but I
want to remove that in a follow-up as well.

Removing these cases simplifies most of these functions, with the only
exception being PromoteToMaxType, which can no longer rely on a simple
enum comparison to determine what needs to be promoted.

This also makes the class ready to work with arbitrary integer sizes, so
it does not need to be modified when someone needs to add a larger
integer size.

Differential Revision: https://reviews.llvm.org/D85836

4 years ago[lldb] Forcefully complete a type when adding nested classes
Pavel Labath [Fri, 14 Aug 2020 12:21:09 +0000 (14:21 +0200)]
[lldb] Forcefully complete a type when adding nested classes

With -flimit-debug-info, we can run into cases when we only have a class
as a declaration, but we do have a definition of a nested class. In this
case, clang will hit an assertion when adding a member to an incomplete
type (but only if it's adding a c++ class, and not C struct).

It turns out we already had code to handle a similar situation arising
in the -gmodules scenario. This extends the code to handle
-flimit-debug-info as well, and reorganizes bits of other code handling
completion of types to move functions doing similar things closer
together.

Differential Revision: https://reviews.llvm.org/D85968

4 years ago[lldb] Add SBModule::GarbageCollectAllocatedModules and clear modules after each...
Raphael Isemann [Mon, 17 Aug 2020 08:56:02 +0000 (10:56 +0200)]
[lldb] Add SBModule::GarbageCollectAllocatedModules and clear modules after each test run

Right now the only places in the SB API where lldb::ModuleSP instances are
destroyed are in SBDebugger::MemoryPressureDetected (where it's just attempted
but not guaranteed) and in SBDebugger::DeleteTarget (which will be removed in
D83933). Tests that directly create an lldb::ModuleSP and never create a target
therefore currently leak lldb::Module instances. This triggers the sanity checks
in lldbtest that make sure that the global module list is empty after a test.

This patch adds SBModule::GarbageCollectAllocatedModules as an explicit way to
clean orphaned lldb::ModuleSP instances. Also we now start calling this method
at the end of each test run and move the sanity check behind that call to make
this work. This way even tests that don't create targets can pass the sanity
check.

This fixes TestUnicodeSymbols.py when D83865 is applied (which makes the
sanity checks actually fail the test).

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D83876

4 years ago[lldb] Fix that log enable's -f parameter causes LLDB to crash when it can't open...
Raphael Isemann [Mon, 17 Aug 2020 08:30:23 +0000 (10:30 +0200)]
[lldb] Fix that log enable's -f parameter causes LLDB to crash when it can't open the log file

We didn't do anything with the llvm::Error we get from `Open`, so when we end up in the
error case we just crash due to the llvm::Error sanity check. Also add the missing newline
behind the error message so it no longer messes with the next (lldb) prompt.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D85970

4 years ago[lldb] Get lldb-server platform's --socket-file working again
Raphael Isemann [Mon, 17 Aug 2020 08:25:20 +0000 (10:25 +0200)]
[lldb] Get lldb-server platform's --socket-file working again

`lldb-server platform --socket-file /any/path` currently always fails to create
the socket file.  This stopped working after D67424 which changed the
input variables of `writeFileAtomically` slightly. We're expected to
pass in a temporary path template (`/tmp/foo-%%%%%`) and the final
path we want to write. Instead we currently pass in the never set
`temp_file_path` as the temporary path (which will make this function always
fail) and pass in the temp_file_spec's path as the final path (which is actually
the template path such as `/tmp/foo-%%%%%`) instead of the actual path
we want to write (e.g. `/tmp/foo`).
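
The contract described above (a temporary path template plus the final path) can be sketched in standard C++. This is only an illustration: `expandTemplate` and `writeFileAtomicallySketch` are hypothetical names, not LLVM's `writeFileAtomically`, and the `%` expansion here is a simplified assumption about how such a template might be filled in.

```cpp
#include <filesystem>
#include <fstream>
#include <random>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: replace every '%' in the template with a random hex digit.
std::string expandTemplate(std::string tmpl) {
  static const char digits[] = "0123456789abcdef";
  std::random_device rd;
  std::uniform_int_distribution<int> pick(0, 15);
  for (char &c : tmpl)
    if (c == '%')
      c = digits[pick(rd)];
  return tmpl;
}

// Hypothetical stand-in for an atomic write: write to a temporary file created
// from the template, then rename it onto the final path.
bool writeFileAtomicallySketch(const std::string &tempTemplate,
                               const std::string &finalPath,
                               const std::string &contents) {
  // An unset temporary path (the bug described above) can never succeed.
  if (tempTemplate.empty() || finalPath.empty())
    return false;
  const std::string tempPath = expandTemplate(tempTemplate);
  {
    std::ofstream out(tempPath, std::ios::binary | std::ios::trunc);
    if (!out)
      return false;
    out << contents;
    if (!out.flush())
      return false;
  }
  std::error_code ec;
  fs::rename(tempPath, finalPath, ec); // replaces finalPath in one step
  if (ec) {
    fs::remove(tempPath, ec); // best-effort cleanup of the temporary file
    return false;
  }
  return true;
}
```

With this shape, passing `/tmp/foo-%%%%%` as the template and `/tmp/foo` as the final path matches the intended usage, while passing an empty template reproduces the "always fails" behaviour.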

Reviewed By: labath

Differential Revision: https://reviews.llvm.org/D85890

4 years ago[VE] Support f128
Kazushi (Jam) Marukawa [Sat, 27 Jun 2020 10:08:09 +0000 (19:08 +0900)]
[VE] Support f128

Support f128 using VE instructions.  Update regression tests.
I've noticed there is no load or store i128 test, so I add them too.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D86035

4 years ago[lldb][NFC] Remove stride parameter from GetArrayElementType
Raphael Isemann [Mon, 17 Aug 2020 08:18:38 +0000 (10:18 +0200)]
[lldb][NFC] Remove stride parameter from GetArrayElementType

This parameter isn't used anywhere in LLDB nor the Swift downstream branch. It
also doesn't really fit into the TypeSystem APIs that usually don't return
additional related functionality via some output parameters. Also, the
implementation already states that the value calculated there is wrong.

Let's remove it. If we need this functionality at some point then Swift's much
nicer `GetByteStride` function seems like the way to go.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D84299

4 years ago[clang] Make signature help work with dependent args
Kadir Cetinkaya [Tue, 11 Aug 2020 19:10:57 +0000 (21:10 +0200)]
[clang] Make signature help work with dependent args

Fixes https://github.com/clangd/clangd/issues/490

Differential Revision: https://reviews.llvm.org/D85826

4 years ago[lldb] Print the exception traceback when hitting cleanup errors
Raphael Isemann [Mon, 17 Aug 2020 07:53:25 +0000 (09:53 +0200)]
[lldb] Print the exception traceback when hitting cleanup errors

Right now if the test suite encounters a cleanup error it just prints "CLEANUP
ERROR:" but not any additional information.

This patch just prints the exception that caused the cleanup error. This should
make debugging the failing tests for D83865 easier (and seems in general nice to
have).

Reviewed By: labath

Differential Revision: https://reviews.llvm.org/D83874

4 years ago[X86] Reject dirflag in inline asm constraints other than clobber.
Craig Topper [Mon, 17 Aug 2020 06:27:41 +0000 (23:27 -0700)]
[X86] Reject dirflag in inline asm constraints other than clobber.

Fixes the crash from PR47195.

4 years ago[PowerPC] Make StartMI ignore COPY like instructions.
Chen Zheng [Mon, 10 Aug 2020 14:55:22 +0000 (10:55 -0400)]
[PowerPC] Make StartMI ignore COPY like instructions.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D85659

4 years ago[InstCombine] Fix a compilation bug
Yonghong Song [Mon, 17 Aug 2020 04:56:42 +0000 (21:56 -0700)]
[InstCombine] Fix a compilation bug

With gcc 6.3.0, I hit the following compilation bug.
  ../lib/Transforms/InstCombine/InstCombineVectorOps.cpp:937:2: error: extra ‘;’ [-Werror=pedantic]
   };
    ^
  cc1plus: all warnings being treated as errors

The error is introduced by Commit ae7f08812e09 ("[InstCombine]
Aggregate reconstruction simplification (PR47060)")

4 years ago[clang] fix a compilation bug
Yonghong Song [Mon, 17 Aug 2020 04:49:13 +0000 (21:49 -0700)]
[clang] fix a compilation bug

With gcc 6.3.0, I hit the following compilation bug:
  /home/yhs/work/llvm-project/clang/lib/Frontend/CompilerInvocation.cpp:
  In function ‘bool ParseCodeGenArgs(clang::CodeGenOptions&, llvm::opt::ArgList&,
  clang::InputKind, clang::DiagnosticsEngine&, const clang::TargetOptions&,
  const clang::FrontendOptions&)’:
  /home/yhs/work/llvm-project/clang/lib/Frontend/CompilerInvocation.cpp:780:12:
    error: unused variable ‘A’ [-Werror=unused-variable]
     if (Arg *A = Args.getLastArg(OPT_fuse_ctor_homing))
              ^
  cc1plus: all warnings being treated as errors

The bug is introduced by Commit ae6523cd62a4 ("[DebugInfo] Add
-fuse-ctor-homing cc1 flag so we can turn on constructor homing only
if limited debug info is already on.")

4 years agoInitial MLIR python bindings based on the C API.
zhanghb97 [Mon, 17 Aug 2020 01:49:28 +0000 (18:49 -0700)]
Initial MLIR python bindings based on the C API.

* Basic support for context creation, module parsing and dumping.

Differential Revision: https://reviews.llvm.org/D85481

4 years ago[StackSafety] Skip ambiguous lifetime analysis
Vitaly Buka [Fri, 7 Aug 2020 02:10:02 +0000 (19:10 -0700)]
[StackSafety] Skip ambiguous lifetime analysis

If we can't identify the alloca used in a lifetime marker, we
need to assume the worst-case scenario.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D84630

4 years agoReplace setter named 'getAsOpaqueInt' with a real getter.
Richard Smith [Sun, 16 Aug 2020 23:36:10 +0000 (16:36 -0700)]
Replace setter named 'getAsOpaqueInt' with a real getter.

Clean up a bunch of places where the opaque forms of FPOptions and
FPOptionsOverride were being used inappropriately.

4 years agoAlways keep unset fields in FPOptionsOverride zeroed.
Richard Smith [Sun, 16 Aug 2020 22:44:51 +0000 (15:44 -0700)]
Always keep unset fields in FPOptionsOverride zeroed.

There are three fields that the FPOptions default constructor sets to
non-zero values; those fields previously could have been zero or
non-zero depending on whether they'd been explicitly removed from the
FPOptionsOverride set. However, that doesn't seem to ever actually
happen, so this is NFC, except that it makes the AST file representation
of FPOptionsOverride make more sense.

4 years agoUse consistent code for setting FPFeatures from operator constructors.
Richard Smith [Sun, 16 Aug 2020 22:40:38 +0000 (15:40 -0700)]
Use consistent code for setting FPFeatures from operator constructors.

4 years agoDon't leave the FPOptions in a UnaryOperator uninitialized.
Richard Smith [Sun, 16 Aug 2020 22:15:16 +0000 (15:15 -0700)]
Don't leave the FPOptions in a UnaryOperator uninitialized.

We don't appear to use these FPOptions for anything right now, but
they shouldn't be uninitialized because that makes our AST file output
nondeterministic.

4 years agoAdd missing parsing for attributes to std.generic_atomic_rmw op
Mehdi Amini [Sat, 15 Aug 2020 23:58:32 +0000 (23:58 +0000)]
Add missing parsing for attributes to std.generic_atomic_rmw op

Fix llvm.org/pr47182

Differential Revision: https://reviews.llvm.org/D86030

4 years ago[NFCI][InstCombine] Pacify GCC builds - don't name variable and enum class identically
Roman Lebedev [Sun, 16 Aug 2020 20:37:36 +0000 (23:37 +0300)]
[NFCI][InstCombine] Pacify GCC builds - don't name variable and enum class identically

4 years ago[InstCombine] Aggregate reconstruction simplification (PR47060)
Roman Lebedev [Sun, 16 Aug 2020 20:27:56 +0000 (23:27 +0300)]
[InstCombine] Aggregate reconstruction simplification (PR47060)

This pattern happens in clang C++ exception lowering code, on unwind branch.
We end up having a `landingpad` block after each `invoke`, where RAII
cleanup is performed, and the elements of an aggregate `{i8*, i32}`
holding exception info are `extractvalue`'d, and we then branch to common block
that takes extracted `i8*` and `i32` elements (via `phi` nodes),
form a new aggregate, and finally `resume`'s the exception.

The problem is that, if the cleanup block is effectively empty,
it shouldn't be there, there shouldn't be that `landingpad` and `resume`,
said `invoke` should be a  `call`.

Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`.
But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft,
while it is pointless, does not look like "empty cleanup block".
So the `SimplifyCFGOpt::simplifyResume()` fails, and the exception has a
higher cost than it could have on the unwind branch :S

This doesn't happen *that* often, but it will basically happen once per C++
function with complex CFG that called more than one other function
that isn't known to be `nounwind`.

I think, this is a missing fold in InstCombine, so i've implemented it.

I think, the algorithm/implementation is rather self-explanatory:
1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate.
2. For each element, try to find from which aggregate it was extracted.
   If it was extracted from the aggregate with identical type,
   from identical element index, great.
3. If all elements were found to have been extracted from the same aggregate,
   then we can just use said original source aggregate directly,
   instead of re-creating it.
4. If we fail to find said aggregate when looking only in the current block,
   we need to be PHI-aware - we might have a different source aggregate when coming
   from each predecessor.
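
Steps 1 to 3 above can be sketched on a toy model. This is not the InstCombine implementation: `Element` and `findSourceAggregate` are hypothetical names, aggregates are identified by plain integer ids, all aggregates are assumed to have the same type, and the PHI-aware step 4 is not modelled.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Toy model: where an element written by an `insertvalue` came from.
struct Element {
  int sourceAgg;   // id of the aggregate it was extracted from, or -1 if unknown
  int sourceIndex; // element index used by the `extractvalue`
};

// `inserted[i]` describes the value inserted at index i by the `insertvalue`
// chain, or is empty if the chain never sets that index.
// Returns the id of the source aggregate that could be reused directly.
std::optional<int>
findSourceAggregate(const std::vector<std::optional<Element>> &inserted) {
  std::optional<int> common;
  for (std::size_t i = 0; i < inserted.size(); ++i) {
    // Step 1: the chain must fully describe the aggregate's initializer.
    if (!inserted[i])
      return std::nullopt;
    const Element &e = *inserted[i];
    // Step 2: the element must come from a known aggregate, at the same index.
    if (e.sourceAgg < 0 || e.sourceIndex != static_cast<int>(i))
      return std::nullopt;
    // Step 3: every element must come from the same source aggregate.
    if (common && *common != e.sourceAgg)
      return std::nullopt;
    common = e.sourceAgg;
  }
  return common;
}
```

For a two-element aggregate such as `{i8*, i32}`, the fold applies only when both elements were extracted from the same aggregate at indices 0 and 1 respectively.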

I'm not sure if this already handles everything, and there are some FIXME's,
i'll deal with all that later in followups.

I'd be fine with going with post-commit review here code-wise,
but just in case there are thoughts, i'm posting this.

On RawSpeed, for example, this has the following effect:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |     1253 |  1253 |   0.00% |  0.00% |
| simplifycfg.NumInvokes                            |      948 |     1355 |   407 |  42.93% | 42.93% |
| instcount.NumInsertValueInst                      |     4382 |     3210 | -1172 | -26.75% | 26.75% |
| simplifycfg.NumSinkCommonCode                     |      574 |      458 |  -116 | -20.21% | 20.21% |
| simplifycfg.NumSinkCommonInstrs                   |     1154 |      921 |  -233 | -20.19% | 20.19% |
| instcount.NumExtractValueInst                     |    29017 |    26397 | -2620 |  -9.03% |  9.03% |
| instcombine.NumDeadInst                           |   166618 |   174705 |  8087 |   4.85% |  4.85% |
| instcount.NumPHIInst                              |    51526 |    50678 |  -848 |  -1.65% |  1.65% |
| instcount.NumLandingPadInst                       |    20865 |    20609 |  -256 |  -1.23% |  1.23% |
| instcount.NumInvokeInst                           |    34023 |    33675 |  -348 |  -1.02% |  1.02% |
| simplifycfg.NumSimpl                              |   113634 |   114708 |  1074 |   0.95% |  0.95% |
| instcombine.NumSunkInst                           |    15030 |    14930 |  -100 |  -0.67% |  0.67% |
| instcount.TotalBlocks                             |   219544 |   219024 |  -520 |  -0.24% |  0.24% |
| instcombine.NumCombined                           |   644562 |   645805 |  1243 |   0.19% |  0.19% |
| instcount.TotalInsts                              |  2139506 |  2135377 | -4129 |  -0.19% |  0.19% |
| instcount.NumBrInst                               |   156988 |   156821 |  -167 |  -0.11% |  0.11% |
| instcount.NumCallInst                             |  1206144 |  1207076 |   932 |   0.08% |  0.08% |
| instcount.NumResumeInst                           |     5193 |     5190 |    -3 |  -0.06% |  0.06% |
| asm-printer.EmittedInsts                          |   948580 |   948299 |  -281 |  -0.03% |  0.03% |
| instcount.TotalFuncs                              |    11509 |    11507 |    -2 |  -0.02% |  0.02% |
| inline.NumDeleted                                 |    97595 |    97597 |     2 |   0.00% |  0.00% |
| inline.NumInlined                                 |   210514 |   210522 |     8 |   0.00% |  0.00% |
```
So we manage to increase the amount of `invoke` -> `call` conversions in SimplifyCFG by almost a half,
and there is a very apparent decrease in instruction and basic block count.

On vanilla llvm-test-suite:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |      744 |   744 |   0.00% |  0.00% |
| instcount.NumInsertValueInst                      |     2705 |     2053 |  -652 | -24.10% | 24.10% |
| simplifycfg.NumInvokes                            |     1212 |     1424 |   212 |  17.49% | 17.49% |
| instcount.NumExtractValueInst                     |    21681 |    20139 | -1542 |  -7.11% |  7.11% |
| simplifycfg.NumSinkCommonInstrs                   |    14575 |    14361 |  -214 |  -1.47% |  1.47% |
| simplifycfg.NumSinkCommonCode                     |     6815 |     6743 |   -72 |  -1.06% |  1.06% |
| instcount.NumLandingPadInst                       |    14851 |    14712 |  -139 |  -0.94% |  0.94% |
| instcount.NumInvokeInst                           |    27510 |    27332 |  -178 |  -0.65% |  0.65% |
| instcombine.NumDeadInst                           |  1438173 |  1443371 |  5198 |   0.36% |  0.36% |
| instcount.NumResumeInst                           |     2880 |     2872 |    -8 |  -0.28% |  0.28% |
| instcombine.NumSunkInst                           |    55187 |    55076 |  -111 |  -0.20% |  0.20% |
| instcount.NumPHIInst                              |   321366 |   320916 |  -450 |  -0.14% |  0.14% |
| instcount.TotalBlocks                             |   886816 |   886493 |  -323 |  -0.04% |  0.04% |
| instcount.TotalInsts                              |  7663845 |  7661108 | -2737 |  -0.04% |  0.04% |
| simplifycfg.NumSimpl                              |   886791 |   887171 |   380 |   0.04% |  0.04% |
| instcount.NumCallInst                             |   553552 |   553733 |   181 |   0.03% |  0.03% |
| instcombine.NumCombined                           |  3200512 |  3201202 |   690 |   0.02% |  0.02% |
| instcount.NumBrInst                               |   741794 |   741656 |  -138 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                  |    14443 |    14445 |     2 |   0.01% |  0.01% |
| asm-printer.EmittedInsts                          |  7978085 |  7977916 |  -169 |   0.00% |  0.00% |
| inline.NumDeleted                                 |    73188 |    73189 |     1 |   0.00% |  0.00% |
| inline.NumInlined                                 |   291959 |   291968 |     9 |   0.00% |  0.00% |
```
Roughly similar effect, fewer instructions and blocks in total.

See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e.

Compile-time wise, this appears to be roughly geomean-neutral:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions

And this is a win size-wise in general:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text

See https://bugs.llvm.org/show_bug.cgi?id=47060

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D85787

4 years ago[OpenMP][CUDA] Keep one kernel list per device, not globally.
Johannes Doerfert [Sun, 16 Aug 2020 16:00:33 +0000 (11:00 -0500)]
[OpenMP][CUDA] Keep one kernel list per device, not globally.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D86039

4 years ago[OpenMP][CUDA] Cache the maximal number of threads per block (per kernel)
Johannes Doerfert [Sun, 16 Aug 2020 15:49:37 +0000 (10:49 -0500)]
[OpenMP][CUDA] Cache the maximal number of threads per block (per kernel)

Instead of calling `cuFuncGetAttribute` with
`CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` for every kernel invocation,
we can do it for the first one and cache the result as part of the
`KernelInfo` struct. The only functional change is that we now expect
`cuFuncGetAttribute` to succeed and otherwise propagate the error.
Ignoring any error seems like a slippery slope...
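
The caching pattern described above might look roughly like the following sketch. The `KernelInfo` here is a hypothetical one-field stand-in for the plugin's struct, and `query` is a placeholder for the driver call (for example `cuFuncGetAttribute`), so the example compiles without CUDA.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Hypothetical stand-in for the per-kernel record; only the cached value is modelled.
struct KernelInfo {
  std::optional<int> maxThreadsPerBlock; // empty until the first launch
};

// `query` stands in for the driver call: it returns true on success and
// writes the limit into its argument.
bool getMaxThreadsPerBlock(KernelInfo &info,
                           const std::function<bool(int &)> &query, int &out) {
  assert(query && "query callback must be set");
  if (!info.maxThreadsPerBlock) {
    int value = 0;
    if (!query(value))
      return false; // propagate the failure instead of ignoring it
    info.maxThreadsPerBlock = value; // cache for all later launches
  }
  out = *info.maxThreadsPerBlock;
  return true;
}
```

Only the first launch of a kernel pays for the query; a failed query leaves the cache empty, so the error surfaces to the caller.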

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D86038

4 years ago[OpenMP][FIX] Do not use TBAA in type punning reduction GPU code PR46156
Johannes Doerfert [Sat, 15 Aug 2020 22:27:14 +0000 (17:27 -0500)]
[OpenMP][FIX] Do not use TBAA in type punning reduction GPU code PR46156

When we implement OpenMP GPU reductions we use type punning a lot during
the shuffle and reduce operations. This is not always compatible with
language rules on aliasing. So far we generated TBAA which later allowed
to remove some of the reduce code as accesses and initialization were
"known to not alias". With this patch we avoid TBAA in this step,
hopefully for all accesses that we need to.

Verified on the reproducer of PR46156 and QMCPack.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D86037

4 years ago[ARM] Tests for tail predicated loads. NFC
David Green [Sun, 16 Aug 2020 18:46:37 +0000 (19:46 +0100)]
[ARM] Tests for tail predicated loads. NFC

4 years ago[Sema] Use the proper cast for a fixed bool enum.
Mark de Wever [Sun, 16 Aug 2020 16:40:08 +0000 (18:40 +0200)]
[Sema] Use the proper cast for a fixed bool enum.

When casting to an enumeration with a fixed bool underlying type, the cast
should use an IntegralToBoolean instead of an IntegralCast, as required by Core
Issue 2338.

Fixes PR47055: Incorrect codegen for enum with bool underlying type

Differential Revision: https://reviews.llvm.org/D85612

4 years ago[Sema] Validate calls to GetExprRange.
Mark de Wever [Sun, 16 Aug 2020 16:32:38 +0000 (18:32 +0200)]
[Sema] Validate calls to GetExprRange.

When a conditional expression has a throw expression it called
GetExprRange with a void expression, which caused an assertion failure.

This approach was suggested by Richard Smith.

Fixes PR46484: Clang crash in clang/lib/Sema/SemaChecking.cpp:10028

Differential Revision: https://reviews.llvm.org/D85601

4 years ago[X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for float...
Simon Pilgrim [Sun, 16 Aug 2020 13:52:07 +0000 (14:52 +0100)]
[X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for float types

We can now enable this for AVX1 targets, since canonicalizeShuffleMaskWithHorizOp can now assist with the cleanup.

There's still a few missed opportunities for merging subvector insert/extracts into shuffles, but they shouldn't cause any regressions now.
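
Why the fold in the title is valid can be checked on a scalar model of the horizontal add. The sketch below assumes HADDPS-style semantics, where the 256-bit operation works independently on each 128-bit lane; `hadd128`, `hadd256` and `concat` are illustrative helpers, not LLVM functions.

```cpp
#include <array>

using V4 = std::array<float, 4>;
using V8 = std::array<float, 8>;

// Model of a 128-bit horizontal add: pairwise sums of a, then of b.
V4 hadd128(const V4 &a, const V4 &b) {
  return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

// Model of concatenating two 128-bit vectors into one 256-bit vector.
V8 concat(const V4 &lo, const V4 &hi) {
  return {lo[0], lo[1], lo[2], lo[3], hi[0], hi[1], hi[2], hi[3]};
}

// Model of a 256-bit horizontal add: each 128-bit lane is processed
// separately, taking pairwise sums from the matching lane of a and b.
V8 hadd256(const V8 &a, const V8 &b) {
  return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3],
          a[4] + a[5], a[6] + a[7], b[4] + b[5], b[6] + b[7]};
}
```

Under this model, `concat(hadd128(X, Y), hadd128(Z, W))` and `hadd256(concat(X, Z), concat(Y, W))` produce the same eight elements, which is the CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) rewrite.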

4 years agoRevert "[PhaseOrdering] add test for memcpy removal (PR47114); NFC"
Sanjay Patel [Sun, 16 Aug 2020 13:52:33 +0000 (09:52 -0400)]
Revert "[PhaseOrdering] add test for memcpy removal (PR47114); NFC"

This reverts commit babb59496b540583c6951813d1e0b3abdea97e7d.

This test addition was queued up with some unrelated changes,
but it seems more likely that we need to fix something internal
to -memcpyopt. Also, I'm not sure if including target-specific
attributes in a generic regression test dir will cause bot
problems.

4 years ago[InstCombine] fold copysign with fabs/fneg operand
Sanjay Patel [Sat, 15 Aug 2020 18:21:24 +0000 (14:21 -0400)]
[InstCombine] fold copysign with fabs/fneg operand

We already get this in the backend, but we need to do
it in IR too to consistently get yet more copysign
transforms.
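
The identity behind this fold can be illustrated in plain C++, assuming the fabs/fneg is applied to the magnitude operand: `copysign` only takes the magnitude from its first argument, so that argument's own sign is irrelevant. `copysignIgnoresMagnitudeSign` is a hypothetical helper for the check, and NaN inputs are not covered by this comparison.

```cpp
#include <cmath>

// Checks that copysign(fabs(x), s) and copysign(-x, s) equal copysign(x, s),
// including the sign of zero. NaN values of x are outside this sketch.
bool copysignIgnoresMagnitudeSign(double x, double s) {
  const double direct = std::copysign(x, s);
  const double viaFabs = std::copysign(std::fabs(x), s);
  const double viaFneg = std::copysign(-x, s);
  auto same = [](double a, double b) {
    return a == b && std::signbit(a) == std::signbit(b);
  };
  return same(direct, viaFabs) && same(direct, viaFneg);
}
```

Since the result never depends on the sign of the first operand, dropping a fabs or fneg on that operand leaves the value unchanged.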

4 years ago[InstCombine] reduce code duplication; NFC
Sanjay Patel [Sat, 15 Aug 2020 17:59:13 +0000 (13:59 -0400)]
[InstCombine] reduce code duplication; NFC

4 years ago[InstCombine] add tests for copysign; NFC
Sanjay Patel [Sat, 15 Aug 2020 17:33:35 +0000 (13:33 -0400)]
[InstCombine] add tests for copysign; NFC

4 years ago[PhaseOrdering] add test for memcpy removal (PR47114); NFC
Sanjay Patel [Fri, 14 Aug 2020 20:46:37 +0000 (16:46 -0400)]
[PhaseOrdering] add test for memcpy removal (PR47114); NFC

4 years ago[StackSafety] Change how callee searched in index
Vitaly Buka [Thu, 6 Aug 2020 12:28:03 +0000 (05:28 -0700)]
[StackSafety] Change how callee searched in index

Handle linkage types other than local.

4 years ago[X86][SSE] Replace combineShuffleWithHorizOp with canonicalizeShuffleMaskWithHorizOp
Simon Pilgrim [Sun, 16 Aug 2020 11:26:09 +0000 (12:26 +0100)]
[X86][SSE] Replace combineShuffleWithHorizOp with canonicalizeShuffleMaskWithHorizOp

Instead of just attempting to fold shuffle(HOP,HOP) for a specific target shuffle, make this part of combineX86ShufflesRecursively so we can perform this on the combined shuffle chain, which is particularly useful for recognising more cases of where we're performing multiple HOPs that can be merged and pre-AVX where we don't have good blend/unary target shuffle support.

4 years agoCreate strict aligned code for OpenBSD/arm64.
Brad Smith [Sun, 16 Aug 2020 10:50:50 +0000 (06:50 -0400)]
Create strict aligned code for OpenBSD/arm64.

4 years ago[X86] isRepeatedTargetShuffleMask - don't require specific MVT type. NFC.
Simon Pilgrim [Sun, 16 Aug 2020 10:51:44 +0000 (11:51 +0100)]
[X86] isRepeatedTargetShuffleMask - don't require specific MVT type. NFC.

Split the isRepeatedTargetShuffleMask into a wrapper variant that takes a MVT describing the mask width, and an internal version that just needs the raw mask element bit size.

This will be necessary for an upcoming change where the horizontal ops element width might not match the shuffle mask element width.

4 years ago[llvm-libtool-darwin] Fix test on all host architectures
Shoaib Meenai [Sun, 16 Aug 2020 07:16:35 +0000 (00:16 -0700)]
[llvm-libtool-darwin] Fix test on all host architectures

By default, if a universal binary has a slice matching the host
architecture, llvm-objdump will only print that slice, otherwise it'll
print all architectures. Explicitly pass `--arch all` to force it to
always print all architectures, as we want for this test.

4 years ago[OpenMP][OMPBuilder] Adding support for `omp single`
Fady Ghanim [Sun, 9 Aug 2020 18:46:21 +0000 (14:46 -0400)]
[OpenMP][OMPBuilder] Adding support for `omp single`

This adds support for generating `omp single`, and necessary calls for
`copyprivate` clause.

Differential Revision: https://reviews.llvm.org/D85617

4 years ago[llvm-libtool-darwin] Speculative buildbot fix
Shoaib Meenai [Sun, 16 Aug 2020 04:32:09 +0000 (21:32 -0700)]
[llvm-libtool-darwin] Speculative buildbot fix

http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l is failing
this test. Attempt to explicitly use the Mach-O dump format as a
speculative fix.

4 years ago[gn build] Port 577e58bcc75
LLVM GN Syncbot [Sun, 16 Aug 2020 03:17:58 +0000 (03:17 +0000)]
[gn build] Port 577e58bcc75

4 years ago[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks
Wenlei He [Sat, 15 Aug 2020 21:52:10 +0000 (14:52 -0700)]
[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks

This change adds a new inline advisor that takes optimization remarks from a previous inlining as input and provides the decisions as advice, so the current inlining can replay the inline decisions of a different compilation. The DWARF inline stack with line and discriminator is used as the anchor for call sites, including call context. The change can be useful for inliner tuning, as it provides a channel for external input to tweak inline decisions. Existing alternatives like the alwaysinline attribute are per-function, not per-callsite. A per-callsite inline intrinsic could be another solution (it does not exist yet), but it is intrusive to implement and also does not differentiate call context.

A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose.

This is a resubmit of https://reviews.llvm.org/D83743

4 years ago[ARC] Fix CodeGen/ARC/brcc.ll
Fangrui Song [Sun, 16 Aug 2020 02:33:35 +0000 (19:33 -0700)]
[ARC] Fix CodeGen/ARC/brcc.ll

4 years ago[libomptarget] Implement host plugin for amdgpu
Jon Chesterfield [Sat, 15 Aug 2020 22:52:19 +0000 (23:52 +0100)]
[libomptarget] Implement host plugin for amdgpu

Replacement for D71384. Primary difference is inlining the dependency on atmi
followed by extensive simplification and bugfixes. This is the latest version
from https://github.com/ROCm-Developer-Tools/amd-llvm-project/tree/aomp12 with
minor patches and a rename from hsa to amdgpu, on the basis that this can't be
used by other implementations of hsa without additional work.

This will not build unless the ROCM_DIR variable is passed so won't break other
builds. That variable is used to locate two amdgpu specific libraries that ship
as part of rocm:
libhsakmt at https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface
libhsa-runtime64 at https://github.com/RadeonOpenCompute/ROCR-Runtime
These libraries build from source. The build scripts in those repos are for
shared libraries, but can be adapted to statically link both into this plugin.

There are caveats.
- This works well enough to run various tests and benchmarks, and will be used
  to support the current clang bring up
- It is adequately thread safe for the above but there will be races remaining
- It is not stylistically correct for llvm, though has had clang-format run
- It has suboptimal memory management and locking strategies
- The debug printing / error handling is inconsistent

I would like to contribute this pretty much as-is and then improve it in-tree.
This would be advantageous because the aomp12 branch that was in use for fixing
this codebase has just been joined with the amd internal rocm dev process.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85742

4 years ago[JITLink][MachO] Use correct symbol scope when N_PEXT is set and N_EXT unset.
Lang Hames [Sat, 15 Aug 2020 22:45:49 +0000 (15:45 -0700)]
[JITLink][MachO] Use correct symbol scope when N_PEXT is set and N_EXT unset.

MachOLinkGraphBuilder has been treating these as hidden, but they should be
treated as local.

Symbols with N_PEXT set and N_EXT unset are produced when hidden symbols are
run through 'ld -r' without passing -keep_private_externs. They will show up
under 'nm -m' as "was private extern", hence the name of the test cases.

Testcase committed as relocatable object to ensure that the test suite doesn't
depend on having 'ld -r' available.

4 years agoSlightly relax the regex on lld version in test (NFC)
Mehdi Amini [Sat, 15 Aug 2020 21:37:11 +0000 (21:37 +0000)]
Slightly relax the regex on lld version in test (NFC)

This makes the test introduced in 537f5483fe4e more robust with respect
to the actual version number. The previous regex restricted the version
to start with a leading `1` which was overly restrictive.

4 years ago[GlobalISel] Enable copy-propagation in post-legalizer combiner.
Amara Emerson [Fri, 14 Aug 2020 08:37:49 +0000 (01:37 -0700)]
[GlobalISel] Enable copy-propagation in post-legalizer combiner.

This cleans up copies that the legalizer or other combines leave around. They
can occasionally end up escaping as moves.

Differential Revision: https://reviews.llvm.org/D85964

4 years agoRefactor mlir-opt setup in a new helper function (NFC)
Mehdi Amini [Sat, 15 Aug 2020 19:02:56 +0000 (19:02 +0000)]
Refactor mlir-opt setup in a new helper function (NFC)

This will help refactoring some of the tools to prepare for the explicit registration of
Dialects.

Differential Revision: https://reviews.llvm.org/D86023

4 years ago[llvm-libtool-darwin] Use Optional operator overloads. NFC
Shoaib Meenai [Sat, 15 Aug 2020 18:41:57 +0000 (11:41 -0700)]
[llvm-libtool-darwin] Use Optional operator overloads. NFC

Use operator bool instead of hasValue and operator* instead of getValue
to simplify the code slightly.
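
A minimal sketch of the two spellings, assuming std::optional (C++17) as a
stand-in for llvm::Optional. The helper names are hypothetical and are not
taken from llvm-libtool-darwin.

```cpp
#include <cassert>
#include <optional>
#include <string>

// std::optional stands in for llvm::Optional; both helpers are hypothetical.

// Before: explicit has_value()/value() calls (hasValue()/getValue() in LLVM).
std::string describeVerbose(const std::optional<int> &Value) {
  if (Value.has_value())
    return "value=" + std::to_string(Value.value());
  return "none";
}

// After: contextual conversion to bool and operator*.
std::string describeTerse(const std::optional<int> &Value) {
  if (Value)
    return "value=" + std::to_string(*Value);
  return "none";
}

// Usage: both spellings agree for engaged and empty optionals.
void checkOptionalSpellings() {
  assert(describeVerbose(7) == describeTerse(7));
  assert(describeTerse(std::nullopt) == "none");
}
```

The behavior is the same in both forms; only the spelling gets shorter, which
matches the NFC tag on the commit.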

4 years ago[gn build] Port 79298a50670
LLVM GN Syncbot [Sat, 15 Aug 2020 16:24:37 +0000 (16:24 +0000)]
[gn build] Port 79298a50670

4 years agoGlobalISel: Remove unnecessary llvm::
Matt Arsenault [Sat, 15 Aug 2020 15:09:21 +0000 (11:09 -0400)]
GlobalISel: Remove unnecessary llvm::

4 years agoAMDGPU: Remove register class params from flat memory patterns
Matt Arsenault [Sat, 15 Aug 2020 00:22:04 +0000 (20:22 -0400)]
AMDGPU: Remove register class params from flat memory patterns

4 years agoAMDGPU: Fix global atomic saddr operand class
Matt Arsenault [Sat, 15 Aug 2020 00:01:51 +0000 (20:01 -0400)]
AMDGPU: Fix global atomic saddr operand class

4 years agoAMDGPU: Remove slc from flat offset complex patterns
Matt Arsenault [Fri, 14 Aug 2020 20:42:18 +0000 (16:42 -0400)]
AMDGPU: Remove slc from flat offset complex patterns

This was always set to 0. Use a default value of 0 in this context to
satisfy the instruction definition patterns. We can't unconditionally
use SLC with a default value of 0 due to limitations in TableGen's
handling of defaulted operands when followed by non-default operands.

4 years agoAMDGPU: Fix matching wrong offsets for global atomic loads
Matt Arsenault [Thu, 13 Aug 2020 22:51:58 +0000 (18:51 -0400)]
AMDGPU: Fix matching wrong offsets for global atomic loads

These used signed offsets with a different size.

4 years agoAMDGPU: Remove redundant FLAT complex patterns
Matt Arsenault [Thu, 13 Aug 2020 22:57:06 +0000 (18:57 -0400)]
AMDGPU: Remove redundant FLAT complex patterns

These were identical to the non-atomic cases. I'm not sure why these
were ever separated.

4 years agoAMDGPU: Correct definitions for global saddr instructions
Matt Arsenault [Wed, 12 Aug 2020 00:38:40 +0000 (20:38 -0400)]
AMDGPU: Correct definitions for global saddr instructions

The VGPR component is a 32-bit offset, not 64-bits.

I'm not sure what the correct syntax is for this. This maintains the
vaddr position and leaves saddr in the end "off" position. This is
particularly terrible for stores, since the operand order is now <vgpr
offset>, <data>, <sgpr base>, splitting the pointer operands. I
suppose this is a logical consequence from the mistake of not putting
the data operand first. I'm not sure what sp3 does.

4 years agoAMDGPU: Remove SIFixupVectorISel pass
Matt Arsenault [Thu, 13 Aug 2020 20:17:42 +0000 (16:17 -0400)]
AMDGPU: Remove SIFixupVectorISel pass

This was only used for matching the saddr addressing mode of global
instructions, but this was not implemented correctly. The instruction
definitions aren't even correct, and are defined as using a 64-bit
VGPR component. Eliminate this pass to enable correcting the
instruction definitions. A new matching implementation can work in
GlobalISel or rely on DAG divergence information for the base
address.

4 years ago[NFC] Fix typo and variable names
Aditya Kumar [Sat, 15 Aug 2020 15:51:48 +0000 (08:51 -0700)]
[NFC] Fix typo and variable names

4 years ago[Attributor][NFC] Format code
Luofan Chen [Sat, 15 Aug 2020 15:53:11 +0000 (23:53 +0800)]
[Attributor][NFC] Format code

4 years ago[Attributor][NFC] Use indexes instead of iterator
Luofan Chen [Sat, 15 Aug 2020 15:04:11 +0000 (23:04 +0800)]
[Attributor][NFC] Use indexes instead of iterator

When adding elements while iterating, the iterator will become
invalid, which could cause errors. This fixes the issue by using
indexes instead of iterators.
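
A minimal sketch of the index-based pattern, assuming a std::vector worklist.
The function and its "enqueue half of each even item" rule are hypothetical
and only illustrate growing a container while walking it; this is not the
Attributor code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical worklist expansion: every positive even item N enqueues N / 2.
std::vector<int> expandWorklist(std::vector<int> Worklist) {
  // push_back may reallocate and invalidate iterators, but an index stays
  // meaningful. size() is re-read on every iteration, so items appended
  // during the loop are visited as well.
  for (std::size_t I = 0; I < Worklist.size(); ++I) {
    int Item = Worklist[I]; // Copy first; a reference could dangle after push_back.
    if (Item > 0 && Item % 2 == 0)
      Worklist.push_back(Item / 2);
  }
  return Worklist;
}

// Usage: 8 enqueues 4, which enqueues 2, which enqueues 1.
void checkExpandWorklist() {
  const std::vector<int> Expected = {8, 3, 4, 2, 1};
  assert(expandWorklist({8, 3}) == Expected);
}
```

A range-based for loop over the same vector would hold iterators across the
push_back call, which is the hazard the commit message describes.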

4 years agoAdd support for C++20 concepts and decltype to modernize-use-trailing-return-type.
Bernhard Manfred Gruber [Sat, 15 Aug 2020 14:40:22 +0000 (10:40 -0400)]
Add support for C++20 concepts and decltype to modernize-use-trailing-return-type.

4 years ago[TextAPI] update DriverKit string value
Cyndy Ishida [Sat, 15 Aug 2020 13:37:06 +0000 (06:37 -0700)]
[TextAPI] update DriverKit string value

String value differed from downstream, where upstream doesn't depend on
casing difference.
<rdar://problem/67106257>

4 years ago[MachOYAML] Move EmitFunc to an inner scope. NFC.
Xing GUO [Sat, 15 Aug 2020 13:08:17 +0000 (21:08 +0800)]
[MachOYAML] Move EmitFunc to an inner scope. NFC.

4 years ago[Attributor] Use internalized version of non-exact functions
Luofan Chen [Sat, 15 Aug 2020 11:17:44 +0000 (19:17 +0800)]
[Attributor] Use internalized version of non-exact functions

This patch internalizes non-exact functions and replaces their uses
with the internalized version. Doing this enables the analysis of
non-exact functions.

We can do this because some non-exact functions with the same name
whose linkage is `linkonce_odr` or `weak_odr` should have the same
semantics, so we can safely internalize them and replace their uses (the
result of the other version of this function should be the same).
Note that not all functions can be internalized, e.g., functions with
`linkonce` or `weak` linkage.

For now, when specified on the command line, we internalize all functions
that meet the requirements without calculating the cost of such
internalization.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84167

4 years ago[DWARFYAML] Simplify isEmpty(). NFC.
Xing GUO [Sat, 15 Aug 2020 12:09:37 +0000 (20:09 +0800)]
[DWARFYAML] Simplify isEmpty(). NFC.

4 years agoOn FreeBSD, add -pthread to ASan dynamic compile flags for tests
Dimitry Andric [Sat, 1 Aug 2020 14:26:36 +0000 (16:26 +0200)]
On FreeBSD, add -pthread to ASan dynamic compile flags for tests

Otherwise, lots of these tests fail with a CHECK error similar to:

==12345==AddressSanitizer CHECK failed: compiler-rt/lib/asan/asan_posix.cpp:120 "((0)) == ((pthread_key_create(&tsd_key, destructor)))" (0x0, 0x4e)

This is because the default pthread stubs in FreeBSD's libc always
return failures (such as ENOSYS for pthread_key_create) in case the
pthread library is not linked in.

Reviewed By: arichardson

Differential Revision: https://reviews.llvm.org/D85082

4 years agoReland "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)"
Dávid Bolvanský [Sat, 15 Aug 2020 10:07:45 +0000 (12:07 +0200)]
Reland "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)"

4 years agoRevert "Separate the Registration from Loading dialects in the Context"
Mehdi Amini [Sat, 15 Aug 2020 09:21:47 +0000 (09:21 +0000)]
Revert "Separate the Registration from Loading dialects in the Context"

This reverts commit 20563933875a9396c8ace9c9770ecf6a988c4ea6.

Build is broken on a few bots

4 years agoSeparate the Registration from Loading dialects in the Context
Mehdi Amini [Sat, 15 Aug 2020 06:40:18 +0000 (06:40 +0000)]
Separate the Registration from Loading dialects in the Context

This changes the behavior of constructing MLIRContext to no longer load globally registered dialects on construction. Instead Dialects are only loaded explicitly on demand:
- The Parser lazily loads Dialects into the context as it encounters them during parsing. This is the only purpose of registering dialects without loading them in the context.
- Passes are expected to declare the dialects they will create entities from (Operations, Attributes, or Types), and the PassManager loads Dialects into the Context when starting a pipeline.

This change simplifies the configuration of the registration: a compiler only needs to load the dialect for the IR it will emit, and the optimizer is self-contained and loads the required Dialects. For example, in the Toy tutorial the compiler only needs to load the Toy dialect in the Context; all the others (linalg, affine, std, LLVM, ...) are automatically loaded depending on the optimization pipeline enabled.

Differential Revision: https://reviews.llvm.org/D85622

4 years agoRevert "Separate the Registration from Loading dialects in the Context"
Mehdi Amini [Sat, 15 Aug 2020 07:33:59 +0000 (07:33 +0000)]
Revert "Separate the Registration from Loading dialects in the Context"

This was landed by accident, will reland with the right comments
addressed from the reviews.
Also revert dependent build fixes.

4 years agoRevert "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)"
Martin Storsjö [Sat, 15 Aug 2020 06:19:54 +0000 (09:19 +0300)]
Revert "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)"

This reverts commit 6dbf0cfcf789365493f70ae69df8a7a59be41c75.

That commit caused failed assertions, e.g. like this:

$ cat sprintf-strcpy.c
char *ptr; void func(void) { ptr += sprintf(ptr, "%s", ""); }

$ clang -c sprintf-strcpy.c -O2 -target x86_64-linux-gnu
clang: ../lib/IR/Value.cpp:473: void llvm::Value::doRAUW(llvm::Value*,
llvm::Value::ReplaceMetadataUses): Assertion `New->getType() ==
getType() && "replaceAllUses of value with new value of different
type!"' failed.
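
A minimal sketch of why the two calls are not drop-in replacements when the
result is used, as it is in the reproducer above. sprintf returns the number
of characters written (an int), while strcpy returns the destination pointer.
Reading the assertion as a consequence of that mismatch is an assumption; the
helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// sprintf(dst, "%s", src) returns the number of characters written.
int copyWithSprintf(char *Dst, const char *Src) {
  return std::sprintf(Dst, "%s", Src);
}

// strcpy returns Dst, not a count, so a rewrite that still needs the count
// has to compute it separately, here with strlen.
int copyWithStrcpy(char *Dst, const char *Src) {
  std::strcpy(Dst, Src);
  return static_cast<int>(std::strlen(Dst));
}

// Usage: both helpers copy the same bytes and report the same length.
void checkCopies() {
  char A[8], B[8];
  assert(copyWithSprintf(A, "abc") == 3);
  assert(copyWithStrcpy(B, "abc") == 3);
  assert(std::strcmp(A, B) == 0);
}
```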

4 years ago[lldb] Remove XFAIL from now passing TestPtrRefs/TestPtrRefsObjC
Raphael Isemann [Sat, 15 Aug 2020 06:14:42 +0000 (08:14 +0200)]
[lldb] Remove XFAIL from now passing TestPtrRefs/TestPtrRefsObjC

8fcfe2862fd4fde4793e232cfeebe6c5540c80a5 and
0cceb54366b406649fdfe7bb11b133ab96f3cd70 fixed those tests.

4 years ago[Tests] Be consistent w/definition of statepoint-example
Philip Reames [Sat, 15 Aug 2020 03:45:48 +0000 (20:45 -0700)]
[Tests] Be consistent w/definition of statepoint-example

These tests use the statepoint-example builtin gc which expects address space #1 to be the only non-integral address space.  The fact the test used as=0 happened to work, but was caught by a downstream assert.  (Literally years ago, I just happened to notice the XFAIL and fix it now.)

4 years ago[Statepoint] Remove code related to inline operand bundles
Philip Reames [Sat, 15 Aug 2020 03:29:41 +0000 (20:29 -0700)]
[Statepoint] Remove code related to inline operand bundles

This code becomes dead for valid IR after 48f4312 and a96fc46.  The reason for the test change is that the verifier reports the first verification error encountered, in some non-specified visit order.  By removing the verification code in gc.relocates for a statepoint with inline gc operands, I change the error the verifier reports.  And in one case, the checked for error is no longer possible with the bundle representation, so I simply delete the file.

4 years agoRemove inline gc arguments from statepoints
Philip Reames [Sat, 15 Aug 2020 02:42:18 +0000 (19:42 -0700)]
Remove inline gc arguments from statepoints

The "gc-live" operand bundles were recently added, and all tests have been updated to use that format.  A migration period was provided, though it's worth noting these intrinsics are experimental, so formally there is no compatibility requirement.

This is an extension to a96fc46.  "gc-live" hadn't been implemented at the point that patch was initially posted.

4 years ago[AMDGPU] Fix MAI ld/st hazard handling
Stanislav Mekhanoshin [Fri, 14 Aug 2020 22:38:13 +0000 (15:38 -0700)]
[AMDGPU] Fix MAI ld/st hazard handling

It did not process hazard for ds_permute because it does not
load or store even though it is DS.

Differential Revision: https://reviews.llvm.org/D86003

4 years ago[SLC] Transform strncpy(dst, "text", C) to memcpy(dst, "text\0\0\0", C) for C <=...
Dávid Bolvanský [Fri, 14 Aug 2020 23:49:02 +0000 (01:49 +0200)]
[SLC] Transform strncpy(dst, "text", C) to memcpy(dst, "text\0\0\0", C) for C <= 128 only

Transformation creates big strings for big C values, so bail out for C > 128.
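
A minimal sketch of the equivalence behind this transform, with a buffer size
of 8 standing in for C. The helper names and the chosen size are hypothetical.
The padded literal grows with C, which is why large values are costly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stands in for the constant C from the commit title; 8 is an arbitrary choice.
constexpr std::size_t BufSize = 8;

// Original form: strncpy copies "text" and zero-fills the remaining bytes.
void fillWithStrncpy(char (&Dst)[BufSize]) {
  std::strncpy(Dst, "text", BufSize);
}

// Rewritten form: the literal carries the padding itself. "text\0\0\0" is
// exactly 8 bytes: 4 characters, 3 explicit NULs, 1 implicit terminator.
void fillWithMemcpy(char (&Dst)[BufSize]) {
  static const char Padded[BufSize] = "text\0\0\0";
  std::memcpy(Dst, Padded, BufSize);
}

// Usage: both forms leave identical bytes in the destination buffer.
void checkFills() {
  char A[BufSize], B[BufSize];
  std::memset(A, 'x', BufSize);
  std::memset(B, 'y', BufSize);
  fillWithStrncpy(A);
  fillWithMemcpy(B);
  assert(std::memcmp(A, B, BufSize) == 0);
}
```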

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86004

4 years ago[MSAN] Avoid dangling ActualFnStart when replacing instruction
Gui Andrade [Fri, 14 Aug 2020 23:34:16 +0000 (23:34 +0000)]
[MSAN] Avoid dangling ActualFnStart when replacing instruction

This would be a problem if the entire instrumented function was a call
to e.g. memcpy.

Use FnPrologueEnd Instruction* instead of ActualFnStart BB*

Differential Revision: https://reviews.llvm.org/D86001

4 years ago[SVE] Lower fixed length vXi32/vXi64 SDIV to scalable vectors.
Cameron McInally [Fri, 14 Aug 2020 23:36:16 +0000 (18:36 -0500)]
[SVE] Lower fixed length vXi32/vXi64 SDIV to scalable vectors.

Differential Revision: https://reviews.llvm.org/D85982

4 years ago[SVE] Remove calls to VectorType::getNumElements from AggressiveInstCombine
Christopher Tetreault [Fri, 14 Aug 2020 22:54:16 +0000 (15:54 -0700)]
[SVE] Remove calls to VectorType::getNumElements from AggressiveInstCombine

Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D82218

4 years ago[libcxx/variant] Avoided variable name shadowing.
Michael Park [Fri, 14 Aug 2020 23:30:10 +0000 (16:30 -0700)]
[libcxx/variant] Avoided variable name shadowing.

4 years agoRemove deopt and gc transition arguments from gc.statepoint intrinsic
Philip Reames [Fri, 14 Aug 2020 23:06:19 +0000 (16:06 -0700)]
Remove deopt and gc transition arguments from gc.statepoint intrinsic

(Forgot to land this a couple of weeks back.)

In a recent series of changes, I've introduced support for using the respective operand bundle kinds on the statepoint. At the moment, code supports either/or, but there's no need to keep the old support around. For the moment, I am simply changing the specification and verifier to require zero length argument sets in the intrinsic.

The intrinsic itself is experimental. Given that, there's no forward serialization needed. The in-tree uses and generation have already been updated to use the new operand-bundle-based forms; the only folks broken by the change will be those with frontends generating statepoints directly, and the updates should be easy.

Why not go ahead and just remove the arguments entirely? Well, I plan to. But while working on this I've found that almost all of the arguments to the statepoint can be expressed via operand bundles or attributes. Given that, I'm planning a radical simplification of the arguments and figured I'd do one update not several small ones.

Differential Revision: https://reviews.llvm.org/D80892

4 years ago[test][LoopUnroll] Cleanup FullUnroll.ll
Arthur Eubanks [Sat, 8 Aug 2020 00:56:31 +0000 (17:56 -0700)]
[test][LoopUnroll] Cleanup FullUnroll.ll

This is in preparation for enabling proper handling of optnone under the
NPM. Most optimizations won't run on an optnone function.

Previously the test would rely on lots of optimizations to optimize the
IR into a simple infinite loop. This is an optnone function, so clearly
that shouldn't be the case.

This IR was found by printing the module before the LoopFullUnrollerPass ran.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D85578

4 years ago[NewPM][optnone] Mark various passes as required
Arthur Eubanks [Thu, 6 Aug 2020 18:10:14 +0000 (11:10 -0700)]
[NewPM][optnone] Mark various passes as required

This was done by turning on -enable-npm-optnone and fixing failures.
That will be enabled in a follow-up change for ease of reverting.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D85457

4 years agoFix TargetSubtargetInfo derivatives after D85165
Fangrui Song [Fri, 14 Aug 2020 22:50:52 +0000 (15:50 -0700)]
Fix TargetSubtargetInfo derivatives after D85165

4 years ago[ELF] Re-initialize InputFile::isInGroup so that elf::link can be called more than...
Fangrui Song [Fri, 14 Aug 2020 22:38:05 +0000 (15:38 -0700)]
[ELF] Re-initialize InputFile::isInGroup so that elf::link can be called more than once

4 years ago[X86][MC][Target] Initial backend support a tune CPU to support -mtune
Craig Topper [Fri, 14 Aug 2020 21:56:54 +0000 (14:56 -0700)]
[X86][MC][Target] Initial backend support a tune CPU to support -mtune

This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.

This patch adds MC layer support for a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables. These feature lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all targets, as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.

One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.

I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.

Differential Revision: https://reviews.llvm.org/D85165