Artem Belevich [Tue, 17 Aug 2021 19:32:05 +0000 (12:32 -0700)]
[CUDA] Improve CUDA version detection and diagnostics.
Always use cuda.h to detect CUDA version. It's a more universal approach
than version.txt, which is no longer present in recent CUDA versions.
Split the 'unknown CUDA version' warning in two:
* when the detected CUDA version is partially supported by clang. It's expected to
work in general, at feature parity with the latest supported CUDA
version, but may be missing support for new features/instructions/GPU
variants. Clang will issue a warning.
* when the detected version is new. Recent CUDA versions have been working with
clang reasonably well, and will likely work similarly to the partially
supported ones above. Or they may not work at all. Clang will issue a warning and
proceed as if the latest known CUDA version was detected.
Differential Revision: https://reviews.llvm.org/D108247
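A minimal sketch of the detection and the two-way split described above, assuming cuda.h contains a line of the form `#define CUDA_VERSION 11040` (cuda.h encodes the version as major * 1000 + minor * 10, so 11040 is 11.4). The helper names, the `Support` enum and the `LatestFull`/`LatestKnown` parameters are hypothetical and do not correspond to clang's actual tables or functions.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

struct CudaVersion {
  int Major;
  int Minor;
};

// Hypothetical helper: scan the text of cuda.h for "#define CUDA_VERSION N".
std::optional<CudaVersion> parseCudaHeader(const std::string &Text) {
  std::istringstream In(Text);
  std::string Line;
  while (std::getline(In, Line)) {
    std::istringstream Words(Line);
    std::string Directive, Name;
    long Value = 0;
    // Lines such as "#define CUDA_VERSION_MAJOR 11" do not match the name.
    if (Words >> Directive >> Name >> Value && Directive == "#define" &&
        Name == "CUDA_VERSION")
      return CudaVersion{static_cast<int>(Value / 1000),
                         static_cast<int>((Value % 1000) / 10)};
  }
  return std::nullopt;
}

enum class Support { Full, Partial, New };

// Hypothetical classification: Partial = known but only partially supported
// (warn); New = newer than anything known (warn, treat as LatestKnown).
Support classify(CudaVersion V, CudaVersion LatestFull, CudaVersion LatestKnown) {
  auto Key = [](CudaVersion X) { return X.Major * 100 + X.Minor; };
  if (Key(V) <= Key(LatestFull))
    return Support::Full;
  if (Key(V) <= Key(LatestKnown))
    return Support::Partial;
  return Support::New;
}
```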
Artem Belevich [Tue, 17 Aug 2021 19:27:37 +0000 (12:27 -0700)]
[CUDA] Add support for CUDA-11.4
Differential Revision: https://reviews.llvm.org/D108239
Artem Belevich [Tue, 17 Aug 2021 18:51:12 +0000 (11:51 -0700)]
[CUDA] Bump default GPU architecture to sm_35.
It's the oldest GPU architecture currently supported by all CUDA versions clang
can use.
Differential Revision: https://reviews.llvm.org/D108235
Simon Pilgrim [Mon, 23 Aug 2021 20:06:06 +0000 (21:06 +0100)]
Revert rG1c9bec727ab5c53fa060560dc8d346a911142170 : [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069)
Reverted (manually due to merge conflicts) while regressions reported on PR51540 are investigated
As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead.
I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst.
https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!)
Differential Revision: https://reviews.llvm.org/D106450
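For illustration only, the reverted fold corresponds to the following C++ pointer-arithmetic identity for a simple single-index GEP: two chained offsets equal one offset by the summed index. This is an analogy under the assumption that every intermediate pointer stays in bounds (as C++ requires); it says nothing about the IR-level regressions that prompted the revert. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// gep (gep Ptr, Idx0), Idx1 : two separate offsets.
template <typename T>
const T *twoGeps(const T *Ptr, std::ptrdiff_t Idx0, std::ptrdiff_t Idx1) {
  const T *Inner = Ptr + Idx0; // inner gep
  return Inner + Idx1;         // outer gep
}

// gep Ptr, (add Idx0, Idx1) : one offset by the summed index.
template <typename T>
const T *foldedGep(const T *Ptr, std::ptrdiff_t Idx0, std::ptrdiff_t Idx1) {
  return Ptr + (Idx0 + Idx1);
}
```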
David Green [Mon, 23 Aug 2021 20:07:55 +0000 (21:07 +0100)]
[AArch64] Correct store ReadAdrBase operand
It appears that the Read operand for stores was being placed on the
first operand (the stored value) not the address base. This adds a
ReadST for the stored value operand, allowing the ReadAdrBase to
correctly act upon the address.
Differential Revision: https://reviews.llvm.org/D108287
David Green [Mon, 23 Aug 2021 09:49:26 +0000 (10:49 +0100)]
[AArch64] Add Scheduling tests for Load/Store ReadAdv operands.
MaheshRavishankar [Mon, 23 Aug 2021 17:15:35 +0000 (10:15 -0700)]
[mlir][Linalg] Allow all build methods of Structured ops to specify additional attributes.
Differential Revision: https://reviews.llvm.org/D108338
Nikita Popov [Sun, 22 Aug 2021 16:15:55 +0000 (18:15 +0200)]
[MergeICmps] Allow sinking past non-load/store
This is a followup to D106591. MergeICmps currently only allows
sinking the loads past either instructions that don't write to
memory at all, or simple loads/stores that don't modify the memory
the loads access.
The "simple loads/stores" part of this check doesn't seem necessary
to me -- AA isModRef() already accurately models any operation
that may clobber the memory. For example, in the adjusted test case
the transform is still fine if the call to @foo() isn't readonly,
but inaccessiblememonly -- in both cases, the call cannot modify
the loaded memory.
Differential Revision: https://reviews.llvm.org/D108517
River Riddle [Mon, 23 Aug 2021 19:49:38 +0000 (19:49 +0000)]
[mlir][NFC] Add inlineRegion overloads that take a block iterator insert position
This allows for inlining into an empty block or to the beginning of a block. NFC as the existing implementations now forward to this overload.
Differential Revision: https://reviews.llvm.org/D108572
Alina Sbirlea [Fri, 20 Aug 2021 17:33:33 +0000 (10:33 -0700)]
[DSE] Check post-dominance for malloc+memset->calloc transform.
Aiming to address the regression discussed in
https://reviews.llvm.org/D103009.
Differential Revision: https://reviews.llvm.org/D108485
Louis Dionne [Mon, 23 Aug 2021 19:34:40 +0000 (15:34 -0400)]
[libc++][NFC] Reindent error message
Andrei Elovikov [Mon, 23 Aug 2021 19:05:15 +0000 (12:05 -0700)]
[NFC][clang] Use X86 Features declaration from X86TargetParser
...instead of redeclaring them in clang's own X86Target.def. They were already
required to be in sync (IIUC), so no reason to maintain two identical lists.
Reviewed By: erichkeane, craig.topper
Differential Revision: https://reviews.llvm.org/D108151
Jon Chesterfield [Mon, 23 Aug 2021 19:25:23 +0000 (20:25 +0100)]
[openmp] Use llvm GridValues from devicertl
Add include path to the cmakefiles and set the target_impl enums
from the llvm constants instead of copying the values.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108391
Stanislav Mekhanoshin [Fri, 20 Aug 2021 17:14:31 +0000 (10:14 -0700)]
Fix late rematerialization operands check
D106408 enables rematerialization of instructions with virtual
register uses. That has uncovered a bug in the allUsesAvailableAt
implementation: https://bugs.llvm.org/show_bug.cgi?id=51516.
In the majority of cases canRematerializeAt() is called to check if
an instruction can be rematerialized before the given UseIdx.
However, SplitEditor::enterIntvAtEnd() calls it to rematerialize
an instruction at the end of a block, passing LIS.getMBBEndIdx()
into the check. In the testcase from the bug, it attempted to
rematerialize ADDXri after STRXui in bb.17. The use operand %55
of the ADD is killed by the STRX, but that goes undetected by the check
because it adjusts the passed UseIdx to the reg slot, before the kill.
The value is, however, dead at the index passed to the check.
This change uses the later of the passed UseIdx and its reg slot. This
should be correct because if we are checking the availability of operands
before an instruction, that instruction cannot be the one defining
these operands. If we are checking for late rematerialization, we
are really interested in whether the operands live past the instruction.
The bug is not exposed without D106408, but the fix is needed to reland
the reverted D106408.
Differential Revision: https://reviews.llvm.org/D108475
Zarko Todorovski [Mon, 23 Aug 2021 18:34:35 +0000 (14:34 -0400)]
[PowerPC][AIX] Set the HasAlloca flag in the AIX Traceback Table only if R31 is used as a frame pointer
After
c063946476e083a9a0c5bd397337d1ece4742ec6, usage of R31 doesn't necessarily mean
that alloca is used. The `TracebackTable::IsAllocaUsedMask` flag should be set only
when R31 is used as a frame pointer.
On AIX the `function calls alloca` bit seems to be set whenever R31 is
set up as a frame pointer, even when there is no alloca call.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D108141
Sanjay Patel [Mon, 23 Aug 2021 18:51:58 +0000 (14:51 -0400)]
[InstCombine] improve efficiency of isFreeToInvert
This is NFC-intended when viewed from outside the pass.
I was trying to make sure that we don't infinite loop
in subtract combines and noticed that we handle the
non-canonical forms of add/sub here, but it should
not be necessary. Coding it this way seems slightly
clearer than mixing all 4 patterns as before.
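The algebra behind treating the canonical forms `add X, C` and `sub C, X` as free to invert can be checked directly: inverting either produces the other shape with a folded constant, so neither a new instruction nor an inverted X is needed. The sketch below verifies the two identities exhaustively on 8-bit wrap-around arithmetic; it illustrates the reasoning and is not InstCombine's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Identities (wrap-around arithmetic):
//   ~(X + C) == ~C - X
//   ~(C - X) ==  X + ~C
bool identitiesHold8Bit() {
  for (unsigned X = 0; X < 256; ++X) {
    for (unsigned C = 0; C < 256; ++C) {
      const uint8_t x = static_cast<uint8_t>(X), c = static_cast<uint8_t>(C);
      const uint8_t notC = static_cast<uint8_t>(~c);
      const uint8_t notAdd = static_cast<uint8_t>(~(x + c));
      const uint8_t notSub = static_cast<uint8_t>(~(c - x));
      if (notAdd != static_cast<uint8_t>(notC - x))
        return false;
      if (notSub != static_cast<uint8_t>(x + notC))
        return false;
    }
  }
  return true;
}
```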
River Riddle [Mon, 23 Aug 2021 18:14:18 +0000 (18:14 +0000)]
[mlir][FoldUtils] Ensure the created constant dominates the replaced op
This revision fixes a bug where an operation would get replaced with
a pre-existing constant that didn't dominate it. This can occur when
a pattern inserts operations to be folded at the beginning of the
constants insertion block. This revision fixes the bug by moving the
existing constant before the replaced operation in such cases. This is
fine because if a constant didn't already exist, a new one would have
been inserted before this operation anyway.
Differential Revision: https://reviews.llvm.org/D108498
Alex Langford [Mon, 23 Aug 2021 18:31:36 +0000 (11:31 -0700)]
[lldb][NFC] Remove unused method RichManglingContext::IsFunction
Krzysztof Drewniak [Mon, 16 Aug 2021 15:21:02 +0000 (15:21 +0000)]
[MLIR][Docs] Fix broken link to tuple type rationale
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D108135
Alfonso Gregory [Mon, 23 Aug 2021 18:15:14 +0000 (18:15 +0000)]
[libc][NFC] Add explicit casts to ctype functions
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D106902
Greg Clayton [Fri, 20 Aug 2021 23:36:06 +0000 (16:36 -0700)]
Fix fallback code that gets decl file + line.
When a function has no line table, but does have debug info (DW_TAG_subprogram), we fall back to creating a line table with a single line entry that has the start address of the function and the source file and line of the function declaration. The bug in this code was that we might have a DW_TAG_subprogram that uses a DW_AT_specification or DW_AT_abstract_origin pointing to another DIE, and that DIE might be in another compile unit. We were grabbing the file index value from that other DIE, and the index could refer to the other compile unit's own, completely different file table, so we might end up using a file index from one compile unit with the file table from another. This was causing a crash in llvm-gsymutil when run against dSYM files. dsymutil, the Apple DWARF linker, often uniques types and can end up with more absolute references across different compile units.
The fix is to use the DWARFDie::getDeclFile(...) accessor, which fetches this information correctly.
Differential Revision: https://reviews.llvm.org/D108497
Jessica Paquette [Mon, 23 Aug 2021 16:55:52 +0000 (09:55 -0700)]
[AArch64][GlobalISel] Add regbankselect support for G_LLROUND
Same as G_LROUND: destination should always be a GPR, source should always be
an FPR.
Differential Revision: https://reviews.llvm.org/D108566
Chris Bieneman [Thu, 29 Jul 2021 14:37:49 +0000 (09:37 -0500)]
Implement #pragma clang restrict_expansion
This patch adds `#pragma clang restrict_expansion` to enable flagging
macros as unsafe for header use. This is to allow macros that may have
ABI implications to be avoided in headers that have ABI stability
promises.
Using macros in headers (particularly public headers) can cause a
variety of issues relating to ABI and modules. This new pragma logs
warnings when annotated macros are used outside the main source file.
This warning is added under a new diagnostics group, -Wpedantic-macros.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D107095
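A minimal sketch of annotating a macro, assuming the `restrict_expansion(MACRO, "message")` form shown in clang's language-extension documentation; check the exact syntax against the clang version in use. The pragma is guarded so other compilers never see it. Per the description above, only expansions outside the main source file (for example inside an included header) are diagnosed, so the use below in the main file is accepted silently. `TARGET_ARM` and `targetIsArm` are illustrative names.

```cpp
#include <cassert>

#define TARGET_ARM 1
#if defined(__clang__)
// Expanding TARGET_ARM from a header after this point would be diagnosed.
#pragma clang restrict_expansion(TARGET_ARM, "ABI-sensitive; do not use in public headers")
#endif

// Expansion in the main source file: no warning expected.
int targetIsArm() { return TARGET_ARM; }
```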
Jessica Paquette [Mon, 23 Aug 2021 16:33:12 +0000 (09:33 -0700)]
[AArch64][GlobalISel] Legalize G_LLROUND for s64 + s32
Same as G_LROUND.
Also add a TODO for full fp16 legalization.
Differential Revision: https://reviews.llvm.org/D108564
Jessica Paquette [Mon, 23 Aug 2021 16:16:20 +0000 (09:16 -0700)]
[GlobalISel] Translate @llvm.llround.* -> G_LLROUND
Translate it using `IRTranslator::translateSimpleIntrinsic`.
Differential Revision: https://reviews.llvm.org/D108563
Jon Chesterfield [Mon, 23 Aug 2021 15:19:10 +0000 (16:19 +0100)]
[openmp][nfc] Refactor GridValues
Remove redundant fields and replace pointer with virtual function
Of fourteen fields, three are dead and four can be computed from the
remainder. This leaves a couple of currently dead fields in place as
they are expected to be used from the deviceRTL shortly. Two of the
fields that can be computed are only used from codegen and require a
log2() implementation so are inlined into codegen instead.
This change leaves the new methods in the same location in the struct
as the previous fields for convenience at review.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108380
Florian Hahn [Mon, 23 Aug 2021 14:45:53 +0000 (15:45 +0100)]
[InstCombine] Add reduced sub/negate test from PR51584.
Florian Hahn [Mon, 23 Aug 2021 14:31:48 +0000 (15:31 +0100)]
Revert "[InstCombine] generalize subtract with 'not' operands"
This reverts commit
3aa009cc87e3789ac44bbb98b04846736373e08f.
The reverted commit causes an infinite loop in instcombine. See PR51584.
Jinsong Ji [Mon, 23 Aug 2021 13:16:48 +0000 (13:16 +0000)]
[InstrProfiling] Add AIX triple to platform test
We found that AIX was not covered in most of the InstrProfiling tests.
So we are trying to enable the tests gradually.
This is to add AIX triple to platform tests to make sure the
registrations are OK.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D108490
Alexander Potapenko [Mon, 23 Aug 2021 13:46:28 +0000 (15:46 +0200)]
[tsan] Do not include <stdatomic.h> from sanitize-thread-disable.c
Looks like non-x86 bots are unhappy with inclusion of <stdatomic.h>
e.g.:
clang-armv7-vfpv3-2stage - https://lab.llvm.org/buildbot/#/builders/182/builds/626
clang-ppc64le-linux - https://lab.llvm.org/buildbot/#/builders/76/builds/3619
llvm-clang-win-x-armv7l - https://lab.llvm.org/buildbot/#/builders/60/builds/4514
It seems to be unnecessary, just remove it and replace atomic_load()
calls with dereferences of _Atomic*.
Differential Revision: https://reviews.llvm.org/D108555
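Why the explicit `atomic_load()` calls were unnecessary: in C11, reading through an `_Atomic int *` is already a sequentially consistent atomic load. The sketch below shows the C++ analogue with `std::atomic`, where the implicit conversion performs `load(std::memory_order_seq_cst)`; it illustrates the equivalence and is not the test file itself.

```cpp
#include <atomic>
#include <cassert>

int loadExplicit(const std::atomic<int> *P) {
  return P->load(std::memory_order_seq_cst);
}

int loadByDereference(const std::atomic<int> *P) {
  // std::atomic<int>::operator int() performs a seq_cst load.
  return *P;
}
```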
Krasimir Georgiev [Mon, 23 Aug 2021 13:51:39 +0000 (15:51 +0200)]
[clang-format] break after the closing paren of a TypeScript decoration
This fixes up a regression we found from
https://reviews.llvm.org/D107267: in specific contexts, clang-format
stopped breaking after the `)` in TypeScript decorations. There were no test cases covering this, so I added one.
Reviewed By: MyDeveloperDay
Differential Revision: https://reviews.llvm.org/D108538
Peyton, Jonathan L [Fri, 20 Aug 2021 21:06:13 +0000 (16:06 -0500)]
[OpenMP][test] fix omp_get_wtime.c test to be more accommodating
The omp_get_wtime.c test fails intermittently if the recorded times are
off by too much, which can happen when many tests are run in parallel.
Instead of failing if one timing is a little off, take the average of 100
timings, excluding the 10 worst.
Differential Revision: https://reviews.llvm.org/D108488
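A sketch of the more tolerant check, written in C++ rather than the test's own code. It assumes "worst" means "furthest from the expected duration"; the real test may rank timings differently. `trimmedAverage` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double trimmedAverage(std::vector<double> Timings, double Expected,
                      std::size_t DropWorst) {
  assert(Timings.size() > DropWorst);
  // Order by distance from the expected value, closest first.
  std::sort(Timings.begin(), Timings.end(), [Expected](double A, double B) {
    return std::fabs(A - Expected) < std::fabs(B - Expected);
  });
  Timings.resize(Timings.size() - DropWorst); // discard the worst outliers
  double Sum = 0.0;
  for (double T : Timings)
    Sum += T;
  return Sum / static_cast<double>(Timings.size());
}
```

With 100 samples and `DropWorst = 10`, a handful of timings delayed by a busy machine no longer decides the outcome on their own.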
Simon Pilgrim [Mon, 23 Aug 2021 12:59:54 +0000 (13:59 +0100)]
[X86] Add unaligned partial load test
Shows LoadedSlice::canMergeExpensiveCrossRegisterBankCopy failure to merge unaligned dereferenceable loads.
Another candidate for PR45116
Andy Wingo [Wed, 4 Aug 2021 13:49:13 +0000 (15:49 +0200)]
[clang][CodeGen] GetDefaultAlignTempAlloca uses preferred alignment
This function was defaulting to use the ABI alignment for the LLVM
type. Here we change to use the preferred alignment. This will allow
unification with GetTempAlloca, which if alignment isn't specified, uses
the preferred alignment.
Differential Revision: https://reviews.llvm.org/D108450
Andy Wingo [Wed, 4 Aug 2021 09:32:51 +0000 (11:32 +0200)]
[clang][NFC] Tighten up code for GetGlobalVarAddressSpace
The LangAS local is only used in the OpenCL case; move its decl
inwards.
Differential Revision: https://reviews.llvm.org/D108449
Andy Wingo [Wed, 4 Aug 2021 09:27:02 +0000 (11:27 +0200)]
[clang][NFC] GetOrCreateLLVMGlobal takes LangAS
Pass a LangAS instead of a target address space to
GetOrCreateLLVMGlobal, to remove a place where the frontend assumes that
target address space 0 is special.
Differential Revision: https://reviews.llvm.org/D108445
Matthias Springer [Mon, 23 Aug 2021 12:33:56 +0000 (21:33 +0900)]
[mlir][SCF] Do not peel loops inside partial iterations
Do not apply loop peeling to loops that are contained in the partial iteration of an already peeled loop. This is to avoid code explosion when dealing with large loop nests. Can be controlled with a new pass option `skip-partial`.
Differential Revision: https://reviews.llvm.org/D108542
Chuanqi Xu [Mon, 23 Aug 2021 11:20:07 +0000 (19:20 +0800)]
[FuncSpec] Don't specialize functions which are easy to inline
Specializing a function that would eventually be inlined wastes time.
This patch does two things:
- Don't specialize functions which are always-inline.
- Don't specialize functions whose lines of code are fewer than a threshold
(100 by default).
For spec2017int, this patch could reduce the number of specialized
functions by 33%, and compile time did not increase for any
benchmark.
Reviewed By: SjoerdMeijer, xbolva00, snehasish
Differential Revision: https://reviews.llvm.org/D107897
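The two early-outs above can be sketched as a simple predicate. `CandidateInfo` and `worthSpecializing` are hypothetical names, not LLVM's FunctionSpecialization data structures; `CodeSize` stands in for whatever "lines of code" measure the pass actually uses. The default of 100 comes from the commit message.

```cpp
#include <cassert>

struct CandidateInfo {
  bool AlwaysInline;
  unsigned CodeSize; // stand-in for the pass's "lines of code" measure
};

bool worthSpecializing(const CandidateInfo &F, unsigned MinSize = 100) {
  if (F.AlwaysInline)
    return false; // will be inlined anyway
  if (F.CodeSize < MinSize)
    return false; // small enough that inlining is likely
  return true;
}
```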
Alexander Potapenko [Tue, 17 Aug 2021 11:19:15 +0000 (13:19 +0200)]
[tsan] Add support for disable_sanitizer_instrumentation attribute
Unlike __attribute__((no_sanitize("thread"))), this one will cause TSan
to skip the entire function during instrumentation.
Depends on https://reviews.llvm.org/D108029
Differential Revision: https://reviews.llvm.org/D108202
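A sketch of opting a function out of instrumentation with the new attribute. The `NO_SANITIZER_INSTRUMENTATION` macro is a hypothetical convenience; it expands to the attribute only when the compiler reports support through `__has_attribute`, and to nothing otherwise.

```cpp
#include <cassert>

#if defined(__has_attribute)
#if __has_attribute(disable_sanitizer_instrumentation)
#define NO_SANITIZER_INSTRUMENTATION __attribute__((disable_sanitizer_instrumentation))
#endif
#endif
#ifndef NO_SANITIZER_INSTRUMENTATION
#define NO_SANITIZER_INSTRUMENTATION
#endif

// Under -fsanitize=thread, TSan is expected to skip this entire function.
NO_SANITIZER_INSTRUMENTATION int readRaw(const int *P) { return *P; }
```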
Simon Pilgrim [Mon, 23 Aug 2021 10:34:00 +0000 (11:34 +0100)]
[X86][AVX] Add PR13310 test coverage
Show failure to fold scaled-index into gather/scatter scale operands
Florian Hahn [Mon, 23 Aug 2021 10:17:37 +0000 (11:17 +0100)]
Recommit "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"
This reverts the revert
ab9296f13be45cd190608f54a69bdd5c7c561b16.
The issue causing the revert should be fixed in
9baed023b4b5.
Jay Foad [Mon, 23 Aug 2021 09:49:01 +0000 (10:49 +0100)]
[AMDGPU] Try to fix a GCC 11 warning
Apparently GCC 11 was warning:
AMDGPURegisterBankInfo.cpp:2543:33: warning: enumerated and non-enumerated type in conditional expression [-Wextra]
Cullen Rhodes [Fri, 13 Aug 2021 07:40:39 +0000 (07:40 +0000)]
[AArch64][SME] Support NEON scalar FP instructions in streaming mode
The following scalar FP instructions are legal in streaming mode:
0101 1110 xx1x xxxx 11x1 11xx xxxx xxxx # FMULX/FRECPS/FRSQRTS (scalar)
0101 1110 x10x xxxx 00x1 11xx xxxx xxxx # FMULX/FRECPS/FRSQRTS (scalar, FP16)
01x1 1110 1x10 0001 11x1 10xx xxxx xxxx # FRECPE/FRSQRTE/FRECPX (scalar)
01x1 1110 1111 1001 11x1 10xx xxxx xxxx # FRECPE/FRSQRTE/FRECPX (scalar, FP16)
Predicate them on `HasNEONorStreamingSVE`. Full list of affected
instructions:
FMULX16, FMULX32, FMULX64, FRECPS16, FRECPS32, FRECPS64, FRSQRTS16,
FRSQRTS32, FRSQRTS64, FRECPEv1f16, FRECPEv1i32, FRECPEv1i64, FRECPXv1f16,
FRECPXv1i32, FRECPXv1i64, FRSQRTEv1f16, FRSQRTEv1i32, FRSQRTEv1i64
Depends on D107902.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06/SIMD-FP-Instructions
Execution of NEON instructions that are illegal in streaming mode will
cause a trap or exception. Using FMULX [1] as an example, this check is
at the top of the pseudocode:
    if elements == 1 then
        CheckFPEnabled64();
    else
        CheckFPAdvSIMDEnabled64();
For the legal scalar variants it calls `CheckFPEnabled64`, whereas for the
illegal vector variants it calls `CheckFPAdvSIMDEnabled64` which traps.
This is useful for observing which instructions are/aren't legal
in streaming mode.
[1] https://developer.arm.com/documentation/ddi0602/2021-06/SIMD-FP-Instructions/FMULX--Floating-point-Multiply-extended-
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D108039
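The encoding patterns listed above (for example `0101 1110 xx1x xxxx 11x1 11xx xxxx xxxx`) can be read as a mask/value pair: a 32-bit word matches when `(Word & Mask) == Value`, with `x` marking don't-care bits. This is a common way to work with such tables and is shown only to make the patterns concrete. It is not how LLVM's generated disassembler is implemented, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct EncodingPattern {
  uint32_t Mask = 0;
  uint32_t Value = 0;
};

// The first non-space character describes bit 31; spaces are ignored.
EncodingPattern parsePattern(const std::string &Text) {
  EncodingPattern P;
  int Bit = 31;
  for (char Ch : Text) {
    if (Ch == ' ')
      continue;
    assert(Bit >= 0 && "pattern longer than 32 bits");
    if (Ch != 'x') {
      P.Mask |= 1u << Bit;
      if (Ch == '1')
        P.Value |= 1u << Bit;
    }
    --Bit;
  }
  assert(Bit == -1 && "pattern must describe exactly 32 bits");
  return P;
}

bool matches(const EncodingPattern &P, uint32_t Word) {
  return (Word & P.Mask) == P.Value;
}
```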
Cullen Rhodes [Wed, 18 Aug 2021 12:46:22 +0000 (12:46 +0000)]
[AArch64][SME] Add predicate for NEON support in streaming mode
Split out from D107903 to remove dependency for D108039 and D108279.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D108293
Michael Kruse [Mon, 23 Aug 2021 00:23:51 +0000 (19:23 -0500)]
[Polly] Never consider non-SCoP blocks as error blocks.
Code outside the SCoP will be executed regardless of the code versioning
runtime check introduced by CodeGeneration. Assumptions based on these
blocks never being executed do not hold in Polly-optimized code.
This fixes the miscompilation of MultiSource/Applications/lambda-0.1.3.
Siva Chandra Reddy [Sat, 21 Aug 2021 04:46:32 +0000 (04:46 +0000)]
[libc] Add a multi-waiter mutex test.
A corresponding adjustment to mtx_lock has also been made.
Min-Yih Hsu [Mon, 23 Aug 2021 05:43:02 +0000 (22:43 -0700)]
[M68k][NFC] Tidy up the just-migrated MC tests
Clean up the formatting of the MC tests that were just migrated. NFC
Min-Yih Hsu [Mon, 23 Aug 2021 05:24:31 +0000 (22:24 -0700)]
[M68k][test] Migrate some MOVE instruction MC tests
Migrate some MOVE instruction MC tests from test/CodeGen/M68k.
Unfortunately, the tests touched in this commit fail due to the
lack of the `abs.W` operand, which forces any memory address parsed
from assembly to be represented in 32 bits.
We're temporarily allowing this unwanted widening in the tests until
support for `abs.W` is available.
Siva Chandra Reddy [Fri, 4 Jun 2021 06:12:50 +0000 (06:12 +0000)]
[libc] Add range reduction functions based on Payne and Hanek algorithm.
These functions will be used in a future patch to implement
trigonometric functions. Unit tests have been added, but to the
libc-long-running-tests suite. The unit tests are long running because they
compare against MPFR computations performed at 1280 bits of precision.
Some cleanups or elimination of repeated patterns can be done as follow-up
changes.
Differential Revision: https://reviews.llvm.org/D104817
Shilei Tian [Mon, 23 Aug 2021 02:57:05 +0000 (22:57 -0400)]
[NFC] clang-format -i clang/lib/CodeGen/CGStmtOpenMP.cpp
Kai Luo [Mon, 23 Aug 2021 02:02:36 +0000 (02:02 +0000)]
[PowerPC] Use int64_t to represent stack object offset and frame size
This is the first step toward supporting huge frame sizes (>2G) on PPC64. It also fixes a broken assertion on the frame size: for `int x;`, `!isInt<32>(x)` always evaluates to false, so the guard code for the frame size could never be hit.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D107435
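Why the old guard could never fire: any value already stored in a 32-bit `int` fits in 32 signed bits by construction. The sketch below uses `fitsSigned<N>`, a hypothetical stand-in for `llvm::isInt<N>`, and assumes a 32-bit `int` as on the usual PowerPC hosts. Only once the frame size is carried as `int64_t` can the check reject anything.

```cpp
#include <cassert>
#include <cstdint>

// Does X fit in an N-bit signed integer?
template <unsigned N> constexpr bool fitsSigned(int64_t X) {
  static_assert(N > 0 && N < 64, "N must be in (0, 64)");
  return X >= -(int64_t(1) << (N - 1)) && X < (int64_t(1) << (N - 1));
}

// With a 32-bit int parameter this always returns false.
constexpr bool guardFires(int FrameSize) { return !fitsSigned<32>(FrameSize); }

// With a 64-bit parameter, oversized frames are actually detected.
constexpr bool guardFires64(int64_t FrameSize) { return !fitsSigned<32>(FrameSize); }
```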
Michael Kruse [Mon, 23 Aug 2021 01:35:14 +0000 (20:35 -0500)]
[Polly] Add support for -polly-dump-before/after with NPM.
The new pass manager does not allow adding module passes at the
-polly-position=before-vectorizer extension point. Introduce a
DumpFunctionPass that dumps only the current function. In contrast to the
legacy pass manager's -polly-dump-before, each function will be dumped
into its own file. -polly-dump-before-file is still not supported.
The DumpFunctionPass uses llvm::CloneModule to copy the current function
into a new module and then writes it to a file.
Stella Laurenzo [Sun, 22 Aug 2021 22:11:42 +0000 (15:11 -0700)]
[mlir] Add op for NCHW conv2d.
* This is the native data layout for PyTorch and npcomp was using the prior version before cleanup.
Differential Revision: https://reviews.llvm.org/D108527
Stella Laurenzo [Sun, 22 Aug 2021 23:54:10 +0000 (16:54 -0700)]
[mlir][linalg] Add script to update the LinalgNamedStructuredOps.yaml. nfc
Also adds banners to the files with update instructions.
Differential Revision: https://reviews.llvm.org/D108529
Stella Laurenzo [Sun, 22 Aug 2021 20:43:55 +0000 (13:43 -0700)]
[mlir][python] Makes C++ extension code relocatable by way of a macro.
* Resolves a TODO by making this configurable by downstreams.
* This seems to be the last thing allowing full use of the Python bindings as a library within another project (i.e. by embedding them).
Differential Revision: https://reviews.llvm.org/D108523
Nikita Popov [Sun, 22 Aug 2021 20:28:53 +0000 (22:28 +0200)]
[GVN] Fix test for loop load PRE on alloca (NFC)
This test was not modifying the pointer in the loop, so the loads
just ended up as undef, without relation to loop load PRE.
Pass the alloca to the called function, so the memory is
potentially modified.
Nikita Popov [Sun, 22 Aug 2021 19:08:58 +0000 (21:08 +0200)]
[GVN] Don't short-circuit load PRE
4ad41902e8c7481ccf3cdf6e618dfcd1e1fc10fc changed this code to
propagate Changed if scalar GEP PRE is performed. However, as
implemented this would skip the load PRE entirely if GEP indices
were PREd. Make sure load PRE runs even if Changed is already
true.
This likely has no functional effect as load PRE would then
occur on a later GVN iteration.
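One common way a later transform gets silently skipped is short-circuit evaluation when folding results into a `Changed` flag. The sketch below shows that pitfall in isolation. It is not the actual GVN code, whose exact control flow is not reproduced here, and all names are hypothetical.

```cpp
#include <cassert>

struct Counters {
  int LoadPRERuns = 0;
};

// Stand-in transform: always runs, reports no change.
bool performLoadPRE(Counters &C) {
  ++C.LoadPRERuns;
  return false;
}

bool buggy(bool ChangedByGEPPRE, Counters &C) {
  bool Changed = ChangedByGEPPRE;
  Changed = Changed || performLoadPRE(C); // skipped once Changed is true
  return Changed;
}

bool fixed(bool ChangedByGEPPRE, Counters &C) {
  bool Changed = ChangedByGEPPRE;
  Changed |= performLoadPRE(C); // always evaluated
  return Changed;
}
```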
Amy Kwan [Sun, 22 Aug 2021 18:46:52 +0000 (13:46 -0500)]
[scudo][standalone] Link tests against libatomic if libatomic exists
It is possible that libatomic does not exist on some systems. This patch updates
the scudo standalone tests to link against libatomic if the library exists.
This is an update to the original patch: https://reviews.llvm.org/D64134 and
aims to resolve https://bugs.llvm.org/show_bug.cgi?id=51431.
Differential Revision: https://reviews.llvm.org/D108503
Philip Reames [Sun, 22 Aug 2021 18:34:50 +0000 (11:34 -0700)]
[runtimeunroll] Use early return to reduce nesting [nfc]
Philip Reames [Sun, 22 Aug 2021 17:40:05 +0000 (10:40 -0700)]
Special case common branch patterns in breakLoopBackedge
This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability. I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.
Simon Pilgrim [Sun, 22 Aug 2021 17:27:03 +0000 (18:27 +0100)]
[X86] combineMul - move MUL_IMM comment inside function. NFC.
combineMul is now used for other things as well as the mul-with-constant expansion - move the comment to where it's actually relevant.
Alexey Lapshin [Wed, 4 Aug 2021 16:17:33 +0000 (19:17 +0300)]
[DWARF][Verifier] Do not add child DieRangeInfo with empty address range to the parent.
The verifyDieRanges function checks for intersecting address ranges.
It adds child DieRangeInfo into parent DieRangeInfo to check
whether children have overlapping address ranges. It is safe not to add
DieRangeInfo with an empty address range into the parent's children list.
This decreases the number of children which have to be navigated and, as a result,
decreases execution time (parents with many children that have empty ranges
spend much time navigating them). For this command: "llvm-dwarfdump --verify clang-repl"
execution time decreased from 220 sec to 75 sec.
Differential Revision: https://reviews.llvm.org/D107554
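A simplified model of the change: a child with no address ranges cannot overlap anything, so it need not be recorded for the later overlap search. `Range`, `RangeInfo` and `addChild` are hypothetical types, not the verifier's actual `DieRangeInfo` interface.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Range {
  uint64_t Low, High; // [Low, High)
};

struct RangeInfo {
  std::vector<Range> Ranges;
  std::vector<RangeInfo> Children;
};

void addChild(RangeInfo &Parent, const RangeInfo &Child) {
  if (Child.Ranges.empty())
    return; // nothing that could overlap; keep the children list short
  Parent.Children.push_back(Child);
}
```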
Kazu Hirata [Sun, 22 Aug 2021 16:08:21 +0000 (09:08 -0700)]
[Transforms] Remove unused declaration emitStrNLen (NFC)
The corresponding definition has been missing for at least 5 years.
Arthur O'Dwyer [Thu, 19 Aug 2021 19:06:21 +0000 (15:06 -0400)]
[libc++] Eliminate needless `add_lvalue_reference` from <algorithm> helpers. NFCI.
When `_Compare` is a function parameter already (so it's not `void`
and it's not an abominable function type), `add_lvalue_reference_t<_Compare>`
is simply a synonym for `_Compare&`. We don't need to pull in `<type_traits>`
and instantiate a template trait to figure that out.
Differential Revision: https://reviews.llvm.org/D108400
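The claim can be checked with `static_assert`: for an ordinary object type such as a comparator, `add_lvalue_reference_t<T>` is exactly `T&`; the trait differs only for non-referenceable types like `void` or abominable function types, which cannot be function parameter types. `lessVia` is a hypothetical helper written the simpler way.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

static_assert(std::is_same_v<std::add_lvalue_reference_t<std::less<int>>,
                             std::less<int> &>);
// Non-referenceable types are returned unchanged.
static_assert(std::is_same_v<std::add_lvalue_reference_t<void>, void>);
static_assert(std::is_same_v<std::add_lvalue_reference_t<void() const>,
                             void() const>);

template <class Compare, class T>
bool lessVia(Compare &Comp, const T &A, const T &B) {
  return Comp(A, B);
}
```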
Nikita Popov [Sun, 22 Aug 2021 14:55:53 +0000 (16:55 +0200)]
[InstCombine] Perform "eq of parts" fold with logical ops
The pattern matched here is too complex for the general logical
and/or to bitwise and/or conversion to trigger. However, the
fold is poison-safe, so match it with a select root as well:
https://alive2.llvm.org/ce/z/vNzzSg
https://alive2.llvm.org/ce/z/Beyumt
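The equivalence behind the "eq of parts" fold, sketched in C++: comparing two adjacent parts of a value separately is the same as comparing the combined part. Using `&&` mirrors the shape of the logical (select-based) `and`. C++ has no poison, so this illustrates only the arithmetic and not the poison-safety argument verified by the alive2 links. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Equal low bytes and equal high bytes <=> equal 16-bit values.
bool eqByParts(uint16_t X, uint16_t Y) {
  return static_cast<uint8_t>(X) == static_cast<uint8_t>(Y) &&
         static_cast<uint8_t>(X >> 8) == static_cast<uint8_t>(Y >> 8);
}

// Check every 251st X against all Y.
bool foldIsSound() {
  for (uint32_t X = 0; X <= 0xFFFF; X += 251)
    for (uint32_t Y = 0; Y <= 0xFFFF; ++Y)
      if (eqByParts(uint16_t(X), uint16_t(Y)) != (uint16_t(X) == uint16_t(Y)))
        return false;
  return true;
}
```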
Nikita Popov [Sun, 22 Aug 2021 14:43:27 +0000 (16:43 +0200)]
[InstCombine] Add tests for "eq of parts" with logical op (NFC)
We currently only handle this with a bitwise and/or instruction,
but not with a logical one.
Simon Pilgrim [Sun, 22 Aug 2021 14:26:17 +0000 (15:26 +0100)]
[X86][AVX] matchShuffleAsBlend - use isElementEquivalent to help match broadcast/repeated elements
Extend matchShuffleAsBlend to not only match against known in-place elements for BLEND shuffles, but use isElementEquivalent to determine if the shuffle mask's referenced element is the same as the in-place element.
This allows us to replace a number of insertps instructions with more general blendps instructions (better opportunities for commutation, concatenation etc.).
Simon Pilgrim [Sun, 22 Aug 2021 14:02:19 +0000 (15:02 +0100)]
Fix signed/unsigned comparison warning. NFCI.
Simon Pilgrim [Sun, 22 Aug 2021 13:54:36 +0000 (14:54 +0100)]
[X86] Expose memory codegen in element insert load tests to improve accuracy of checks
Also replace X32 with X86 check prefixes for i686 tests (we tend to try to use X32 for gnux32 targets)
Simon Pilgrim [Sun, 22 Aug 2021 13:17:39 +0000 (14:17 +0100)]
[X86][SSE] lowerVECTOR_SHUFFLE - canonicalize with horizontal ops.
Before lowering shuffles, see if we can merge horizontal ops or canonicalize the shuffle mask to point to the same LHS/RHS of the HOps when an HOp's args are repeated.
Sanjay Patel [Sun, 22 Aug 2021 13:13:59 +0000 (09:13 -0400)]
[InstSimplify] fold rotate of -1 to -1
This is part of solving more general rotate patterns seen in
bugs related to:
https://llvm.org/PR51575
https://alive2.llvm.org/ce/z/GpkFCt
Sanjay Patel [Sun, 22 Aug 2021 13:10:52 +0000 (09:10 -0400)]
[InstSimplify] fold rotate of zero to zero
This is part of solving more general rotate patterns seen in
bugs related to:
https://llvm.org/PR51575
https://alive2.llvm.org/ce/z/fjKwqv
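Why both rotate folds are safe for any (even unknown) rotate amount: rotating an all-zeros or all-ones value only moves identical bits around. The sketch uses a 32-bit rotate-left that takes the amount modulo the bit width, like LLVM's funnel-shift rotates (`fshl X, X, Amt`). `rotl32` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

uint32_t rotl32(uint32_t X, uint32_t Amt) {
  Amt &= 31;
  if (Amt == 0)
    return X; // avoid the undefined shift by 32
  return (X << Amt) | (X >> (32 - Amt));
}

bool rotateFoldsHold() {
  for (uint32_t Amt = 0; Amt < 64; ++Amt) {
    if (rotl32(0u, Amt) != 0u)
      return false;
    if (rotl32(~0u, Amt) != ~0u)
      return false;
  }
  return true;
}
```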
Sanjay Patel [Sun, 22 Aug 2021 13:09:49 +0000 (09:09 -0400)]
[InstSimplify] add tests for rotates of 0/-1; NFC
Simon Pilgrim [Sun, 22 Aug 2021 12:02:51 +0000 (13:02 +0100)]
[X86] Try to sync HSW + BDW model class defs to simplify comparisons. NFC.
Broadwell is mainly a die shrink of Haswell, but the model had many of the scheduling classes in different orders, making side-by-side comparisons very difficult.
The InstRW overrides are still quite different, but at least that part of the side-by-side diff is now in the same position.
This was noticed while I was trying to investigate diffs between llvm-mca and other perf analyzers in https://uica.uops.info/ - we used to be able to do diffs between most of the models very easily, but we seem to have lost that simplicity as classes have been altered, models have been refined and other models have rotted.
Sanjay Patel [Sat, 21 Aug 2021 17:05:35 +0000 (13:05 -0400)]
[InstCombine] generalize subtract with 'not' operands
The motivation was to get min/max intrinsics to parity
with cmp+select idioms, but this unlocks a few more
folds because isFreeToInvert recognizes add/sub with
constants too.
In the min/max example, we have too many extra uses
for smaller folds to improve things, but this fold
is able to eliminate uses even though we can't reduce
the number of instructions.
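The identity this fold builds on: since `~V == -V - 1`, `~X - ~Y == Y - X`, so `sub (not X), (not Y)` can become `sub Y, X`. More generally, `X - Y == ~Y - ~X`, which is why operands that are free to invert qualify. The check below is exhaustive on 8-bit wrap-around arithmetic. The identity is sound; the later revert of this commit was due to an infinite loop in InstCombine (PR51584), not the algebra.

```cpp
#include <cassert>
#include <cstdint>

bool subOfNotsHolds8Bit() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y) {
      const uint8_t NotX = static_cast<uint8_t>(~X);
      const uint8_t NotY = static_cast<uint8_t>(~Y);
      if (static_cast<uint8_t>(NotX - NotY) != static_cast<uint8_t>(Y - X))
        return false;
    }
  return true;
}
```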
Simon Pilgrim [Sat, 21 Aug 2021 19:46:12 +0000 (20:46 +0100)]
CGBuiltin.cpp - pass SVETypeFlags by const reference. NFC.
Don't pass the struct by value.
Florian Hahn [Sun, 22 Aug 2021 09:45:20 +0000 (10:45 +0100)]
[LV] Adjust reduction recipes before recurrence handling.
Adjusting the reduction recipes still relies on references to the
original IR, which can become outdated by the first-order recurrence
handling. Until reduction recipe construction no longer requires IR
references, move it before first-order recurrence handling to prevent a
crash exposed by D106653.
Ben Shi [Thu, 19 Aug 2021 13:51:09 +0000 (21:51 +0800)]
[DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2)
Reviewed by: lebedev.ri, spatel, craig.topper, luismarques, jrtc27
Differential Revision: https://reviews.llvm.org/D107711
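The rewrite in question is `(x + c1) * c2 ==> x * c2 + c1 * c2`, which is exact under wrap-around arithmetic, so the decision a target hook makes is about cost rather than correctness. The sketch checks the identity and pairs it with a hypothetical profitability rule. The 12-bit signed immediate limit is an assumption chosen for illustration, not the hook's actual logic for any target.

```cpp
#include <cassert>
#include <cstdint>

uint32_t original(uint32_t X, uint32_t C1, uint32_t C2) { return (X + C1) * C2; }
uint32_t folded(uint32_t X, uint32_t C1, uint32_t C2) { return X * C2 + C1 * C2; }

// Hypothetical rule: fold only if c1 * c2 still fits an add-immediate.
bool hypotheticalFoldIsProfitable(int64_t C1, int64_t C2) {
  const int64_t Product = C1 * C2;
  return Product >= -2048 && Product <= 2047;
}
```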
luxufan [Sun, 22 Aug 2021 08:43:02 +0000 (16:43 +0800)]
[JITLink] Add support of R_X86_64_32S relocation
This patch adds support for the R_X86_64_32S relocation and adds the Pointer32Signed generic edge kind.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D108446
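What an R_X86_64_32S fixup computes, per the x86-64 psABI: the value S + A is stored in a 32-bit field that is sign-extended back to 64 bits, so it must fit in `int32_t`. The function below is a hypothetical sketch, not JITLink's edge-application code.

```cpp
#include <cassert>
#include <cstdint>

bool applyPointer32Signed(uint8_t *FixupPtr, uint64_t Target, int64_t Addend) {
  // S + A with 64-bit wrap-around, interpreted as signed.
  const int64_t Value =
      static_cast<int64_t>(Target + static_cast<uint64_t>(Addend));
  if (Value < INT32_MIN || Value > INT32_MAX)
    return false; // would not survive sign extension back to 64 bits
  const uint32_t Field = static_cast<uint32_t>(Value);
  for (int I = 0; I < 4; ++I) // little-endian store
    FixupPtr[I] = static_cast<uint8_t>(Field >> (8 * I));
  return true;
}
```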
Lang Hames [Sun, 22 Aug 2021 00:43:06 +0000 (10:43 +1000)]
[ORC] Add std::tuple support to SimplePackedSerialization.
Lang Hames [Sun, 22 Aug 2021 00:58:58 +0000 (10:58 +1000)]
[ORC] Rename blobSerializationRoundTrip, drop explicit arg types on calls.
Renames the blobSerializationRoundTrip test helper function to
spsSerializationRoundTrip ('blob' was the placeholder name for the serialization
scheme during prototyping, this function was missed when renaming everything
for the mainline). Also drops explicit template arguments at call sites where
they can be inferred (and are obvious) from the call argument type.
Wang, Pengfei [Sun, 22 Aug 2021 00:24:20 +0000 (08:24 +0800)]
[X86] AVX512FP16 instructions enabling 4/6
Enable FP16 unary operator instructions.
Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D105267
Lang Hames [Sun, 22 Aug 2021 00:34:38 +0000 (10:34 +1000)]
[ORC] Add missing header.
Should fix bot failure at
https://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/4367
Fangrui Song [Sat, 21 Aug 2021 23:41:48 +0000 (16:41 -0700)]
[TargetCallingConv] Change OutputArg ctor to match its members
This avoids unneeded MVT->EVT conversion.
Fangrui Song [Sat, 21 Aug 2021 23:33:29 +0000 (16:33 -0700)]
[AArch64] Replace unneeded CCAssignToRegWithShadow with CCAssignToReg
CCState::AllocateReg handles aliased registers.
Fangrui Song [Sat, 21 Aug 2021 20:59:17 +0000 (13:59 -0700)]
[TargetMachine] Drop special case for *-win32-macho
Clang's CodeGenModule already sets dso_local via shouldAssumeDSOLocal.
Fangrui Song [Sat, 21 Aug 2021 19:37:29 +0000 (12:37 -0700)]
[TargetMachine] Simplify shouldAssumeDSOLocal. NFC
Kazu Hirata [Sat, 21 Aug 2021 19:17:58 +0000 (12:17 -0700)]
[clang] Fix typos in documentation (NFC)
Sanjay Patel [Fri, 20 Aug 2021 22:34:09 +0000 (18:34 -0400)]
[InstCombine] combine constants by reassociating add/sub/add
This may overlap partially with the reassociate pass,
but it seems simple enough that we should try it here
in InstCombine to enable other folds.
This shows up as an opportunity and potential regression
if we improve a subtract fold with 'not' ops to be more
general.
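The commit does not spell out the exact pattern. As an assumed instance, ((X + C1) - Y) + C2 can be rewritten as (X - Y) + (C1 + C2), folding two constants into one. This sketch only checks that such an identity holds in wrapping 32-bit arithmetic; it ignores nsw/nuw flags, which a reassociation like this would typically have to drop.

```cpp
#include <cassert>
#include <cstdint>

// Before: add, sub, add, with two separate constant operands.
constexpr uint32_t beforeFold(uint32_t x, uint32_t y, uint32_t c1, uint32_t c2) {
  return ((x + c1) - y) + c2;
}

// After: the constants are combined (InstCombine would do this at compile
// time), leaving one sub and one add with a single constant operand.
constexpr uint32_t afterFold(uint32_t x, uint32_t y, uint32_t c1, uint32_t c2) {
  const uint32_t c = c1 + c2; // stands for the single folded constant
  return (x - y) + c;
}

// Unsigned arithmetic wraps modulo 2^32, like IR add/sub without nsw/nuw,
// so the rewrite is exact even when intermediate values overflow.
static_assert(beforeFold(5, 9, 0xFFFFFFF0u, 20) == afterFold(5, 9, 0xFFFFFFF0u, 20),
              "reassociation preserves the value");
```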
Sanjay Patel [Fri, 20 Aug 2021 21:50:18 +0000 (17:50 -0400)]
[InstCombine] add tests for add/sub/add combines; NFC
Sanjay Patel [Fri, 20 Aug 2021 21:33:19 +0000 (17:33 -0400)]
[InstCombine] add tests for min/max with nots and sub; NFC
David Green [Sat, 21 Aug 2021 15:33:18 +0000 (16:33 +0100)]
[ARM] Fix VQDMULH fold for scalar smin
Add a variant of mve-vqdmulh tests that uses min/max intrinsics
directly, including a scalar test that shows it misbehaving for min
intrinsics and a fix for the combine to prevent it from misbehaving.
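For 16-bit lanes, VQDMULH computes saturate((2 * a * b) >> 16). Since (2p) >> 16 equals p >> 15, and only -32768 * -32768 overflows the result range, the same value can be written as smin((sext(a) * sext(b)) >> 15, 32767), presumably the shape the combine looks for. This scalar model only illustrates the arithmetic, not the DAG combine itself, and it assumes arithmetic right shift of negative values (guaranteed since C++20).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Reference semantics: saturating doubling multiply, returning the high half.
// The smallest possible result is -32767, so only the upper bound is clamped.
int16_t vqdmulhRef(int16_t a, int16_t b) {
  int64_t doubled = 2 * static_cast<int64_t>(a) * b;
  int64_t high = doubled >> 16;
  return static_cast<int16_t>(std::min<int64_t>(high, INT16_MAX));
}

// Pattern form: widen, multiply, shift right by 15, clamp with smin.
int16_t vqdmulhPattern(int16_t a, int16_t b) {
  int32_t prod = static_cast<int32_t>(a) * static_cast<int32_t>(b);
  return static_cast<int16_t>(std::min(prod >> 15, int32_t{INT16_MAX}));
}
```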
Andrzej Warzynski [Fri, 20 Aug 2021 10:25:11 +0000 (10:25 +0000)]
[flang] Refine output file generation
This patch cleans-up the file generation code in Flang's frontend
driver. It improves the layering between
`CompilerInstance::CreateDefaultOutputFile`,
`CompilerInstance::CreateOutputFile` and their various clients.
* Rename `CreateOutputFile` to `CreateOutputFileImpl` and make it
private. This method is an implementation detail.
* Instead of passing a `std::error_code` out parameter into
`CreateOutputFileImpl`, have it return `Expected<>`. This is a bit shorter
and more idiomatic LLVM.
* Make `CreateDefaultOutputFile` (which calls `CreateOutputFileImpl`)
issue an error when file creation fails. The error code from
`CreateOutputFileImpl` is used to generate a meaningful diagnostic
message.
* Remove error reporting for failed output file generation from
`PrintPreprocessedAction::ExecuteAction`. This is now handled in
`CreateDefaultOutputFile` instead (see the previous point).
* Inline `AddOutputFile` into its only caller,
`CreateDefaultOutputFile`.
* Switch from `llvm::buffer_ostream` to `llvm::buffer_unique_ostream`
for non-seekable output streams. This simplifies the logic in the driver
and was introduced for this very reason in [1].
* Make sure that the diagnostics from the prescanner when running `-E`
(`PrintPreprocessedAction::ExecuteAction`) are printed before the actual
output is generated.
* Update comments, add test.
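The move from an out parameter to a returned `Expected<>` can be illustrated with standard C++ alone. In this sketch `std::variant<T, std::error_code>` stands in for `llvm::Expected<T>`, and `createOutputFileImpl` / `createDefaultOutputFile` are hypothetical, simplified analogues of the flang methods; the diagnostic wording is made up.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <memory>
#include <string>
#include <system_error>
#include <utility>
#include <variant>

// Stand-in for llvm::Expected<T>: either a value or the error that prevented it.
template <typename T> using Expected = std::variant<T, std::error_code>;

// Closes the FILE when the owning pointer is destroyed.
struct FileCloser {
  void operator()(std::FILE *f) const { std::fclose(f); }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Before (out parameter):
//   FilePtr createOutputFileImpl(const std::string &, std::error_code &);
// After: the error travels in the return value.
Expected<FilePtr> createOutputFileImpl(const std::string &path) {
  std::FILE *f = std::fopen(path.c_str(), "wb");
  if (!f) // POSIX fopen sets errno on failure (ISO C does not require it).
    return std::error_code(errno, std::generic_category());
  return FilePtr(f);
}

// The caller turns a failure into a diagnostic, mirroring the commit's
// description of CreateDefaultOutputFile.
FilePtr createDefaultOutputFile(const std::string &path, std::string &diag) {
  Expected<FilePtr> file = createOutputFileImpl(path);
  if (const auto *ec = std::get_if<std::error_code>(&file)) {
    diag = "cannot open output file '" + path + "': " + ec->message();
    return nullptr;
  }
  return std::move(std::get<FilePtr>(file));
}
```

Unlike this variant, the real `llvm::Expected` also asserts in debug builds if an error is never checked, which is part of why it is preferred in LLVM code.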
NOTE: This patch relands [2]. As suggested by Michael Kruse in the
post-commit/post-revert review, I've added the following:
```
config.errc_messages = "@LLVM_LIT_ERRC_MESSAGES@"
```
in Flang's `lit.site.cfg.py.in`. This way, `%errc_ENOENT` in
output-paths.f90 gets the correct value on Windows as well as on Linux.
[1] https://reviews.llvm.org/D93260
[2] fd21d1e198e381a2b9e7af1701044462b2d386cd
Reviewed By: ashermancinelli
Differential Revision: https://reviews.llvm.org/D108390
Kirill Shmakov [Fri, 20 Aug 2021 12:27:37 +0000 (15:27 +0300)]
[lldb] Fix typo in the description of breakpoint options
LLVM GN Syncbot [Sat, 21 Aug 2021 09:44:22 +0000 (09:44 +0000)]
[gn build] Port 7f99337f9bcf
Lang Hames [Fri, 20 Aug 2021 05:52:42 +0000 (15:52 +1000)]
[ORC] Add EPCGenericMemoryAccess: generic executor memory access via EPC calls.
All ExecutorProcessControl subclasses must provide an
ExecutorProcessControl::MemoryAccess object that can be used to access executor
memory from the JIT process. The EPCGenericMemoryAccess class provides an
off-the-shelf MemoryAccess implementation for JITs that do not need (or cannot
provide) a specialized MemoryAccess implementation. This simplifies the process
of creating new ExecutorProcessControl implementations.
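The layering can be sketched with hypothetical types; these are not ORC's actual class names or method signatures. An abstract memory-access interface is implemented once, generically, by forwarding each batch of writes through a caller-supplied hook that stands in for an EPC call into the executor, so a new ExecutorProcessControl implementation only has to provide the call mechanism.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// A batch of (executor address, value) writes.
using WriteBatch = std::vector<std::pair<uint64_t, uint32_t>>;

// Hypothetical interface: how the JIT process writes executor memory.
struct MemoryAccess {
  virtual ~MemoryAccess() = default;
  virtual void writeUInt32s(const WriteBatch &writes) = 0;
};

// Hypothetical generic implementation: it touches no memory itself and
// forwards every batch to a hook representing a call into the executor.
class GenericMemoryAccess : public MemoryAccess {
public:
  using CallFn = std::function<void(const WriteBatch &)>;
  explicit GenericMemoryAccess(CallFn call) : call_(std::move(call)) {}
  void writeUInt32s(const WriteBatch &writes) override { call_(writes); }

private:
  CallFn call_;
};
```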
eopXD [Thu, 19 Aug 2021 06:15:38 +0000 (23:15 -0700)]
[NFC][LoopIdiom] Let processLoopStoreOfLoopLoad take StoreSize as SCEV instead of unsigned
Taking a SCEV allows the function to be extended later to optimize loops
whose StoreSize / Stride is determined at runtime.
The plan is to let memcpy / memmove handle runtime-determined sizes, just
as D107353 did for memset.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D108289
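As an illustration of a runtime-determined store size (an assumption about the kind of loop meant, not code from the patch), the loop below copies `size` bytes per iteration with a stride of `size`. Once StoreSize can be a SCEV, a loop idiom pass could in principle replace it with a single memcpy of `n * size` bytes; both functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Loop form: each iteration copies `size` bytes, advancing by `size`,
// where `size` is only known at runtime.
void copyLoop(char *dst, const char *src, std::size_t n, std::size_t size) {
  for (std::size_t i = 0; i < n; ++i)
    std::memcpy(dst + i * size, src + i * size, size);
}

// Idiom form: one call covering all iterations. Valid only when the source
// and destination ranges do not overlap; otherwise memmove would be needed.
void copyIdiom(char *dst, const char *src, std::size_t n, std::size_t size) {
  std::memcpy(dst, src, n * size);
}
```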
Siva Chandra Reddy [Mon, 21 Jun 2021 06:05:29 +0000 (06:05 +0000)]
[libc] Add a new suite called "libc-long-running-tests".
This suite is helpful for adding long-running tests, which take too long
to finish to be run on the public builders. They will probably be run on
special builders in the future.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D104816
Kazu Hirata [Sat, 21 Aug 2021 02:19:54 +0000 (19:19 -0700)]
[CodeGen] Remove unused declaration setLiveInsUsed (NFC)
The corresponding definition was removed on Jan 20, 2017 in commit
710a4c1f3ddba3aa9313c72c43f9619afbc3e259.
Joseph Huber [Fri, 20 Aug 2021 20:43:31 +0000 (16:43 -0400)]
[OpenMP] Correctly add member expressions to OpenMP info
Mapping expressions that have `this` as their base expression aren't
considered a valid base variable, and the rest of the runtime expects
this. However, if an expression has no value declaration, we can try to
extract it manually to provide more helpful debugging information.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108483