Guillaume Chatelet [Mon, 14 Oct 2019 13:14:34 +0000 (13:14 +0000)]
[Alignment][NFC] Move and type functions from MathExtras to Alignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68942
llvm-svn: 374773
Sander de Smalen [Mon, 14 Oct 2019 13:11:34 +0000 (13:11 +0000)]
[AArch64] Stackframe accesses to SVE objects.
Materialize accesses to SVE frame objects from SP or FP, whichever is
available and beneficial.
This patch still assumes the objects are pre-allocated. The automatic
layout of SVE objects within the stackframe will be added in a separate
patch.
Reviewers: greened, cameron.mcinally, efriedma, rengolin, thegameg, rovka
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D67749
llvm-svn: 374772
Fangrui Song [Mon, 14 Oct 2019 12:51:47 +0000 (12:51 +0000)]
[llvm-size] Tidy up error messages (PR42970)
Clean up some formatting inconsistencies in the error messages and correctly exit with non-zero in all error cases.
Differential Revision: https://reviews.llvm.org/D68906
Patch by Alex Cameron
llvm-svn: 374771
David Stenberg [Mon, 14 Oct 2019 12:49:58 +0000 (12:49 +0000)]
[DebugInfo] Fix truncation of call site immediates
Summary:
This addresses a bug in collectCallSiteParameters() where call site
immediates would be truncated from int64_t to unsigned.
This fixes PR43525.
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: aprantl
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D68869
llvm-svn: 374770
Pavel Labath [Mon, 14 Oct 2019 12:49:06 +0000 (12:49 +0000)]
DWARFExpression: Fix/add support for (v4) debug_loc base address selection entries
The DWARFExpression is parsing the location lists in about five places.
Of those, only one actually had proper support for base address
selection entries.
Since r374600, llvm has started to produce location expressions with
base address selection entries more aggresively, which caused some tests
to fail.
This patch adds support for these entries to the places which had it
missing, fixing the failing tests. It also adds a targeted test for the
two of the three fixes, which should continue testing this functionality
even if the llvm output changes. I am not aware of a way to write a
targeted test for the third fix (DWARFExpression::Evaluate).
llvm-svn: 374769
Dmitri Gribenko [Mon, 14 Oct 2019 12:22:48 +0000 (12:22 +0000)]
Revert "Add a pass to lower is.constant and objectsize intrinsics"
This reverts commit r374743. It broke the build with Ocaml enabled:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19218
llvm-svn: 374768
Alexander Timofeev [Mon, 14 Oct 2019 12:01:10 +0000 (12:01 +0000)]
[AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.'
Detailed description:
After https://reviews.llvm.org/D59990 submit several issues were discovered.
Changes in common code were preserved but AMDGPU specific part was reverted to keep the backend working correctly.
Discovered issues were addressed in the following commits:
https://reviews.llvm.org/D67662
https://reviews.llvm.org/D67101
https://reviews.llvm.org/D63953
https://reviews.llvm.org/D63731
This change brings back AMDGPU specific changes.
Reviewed by: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D68635
llvm-svn: 374767
Victor Campos [Mon, 14 Oct 2019 11:12:23 +0000 (11:12 +0000)]
Fixing typo in llvm/IR/Intrinsics.td
Fixing typo in comment line.
llvm-svn: 374766
Andrea Di Biagio [Mon, 14 Oct 2019 11:12:18 +0000 (11:12 +0000)]
[X86][BtVer2] Improved latency and throughput of float/vector loads and stores.
This patch introduces the following changes to the btver2 scheduling model:
- The number of micro opcodes for YMM loads and stores is now 2 (it was
incorrectly set to 1 for both aligned and misaligned loads/stores).
- Increased the number of AGU resource cycles for YMM loads and stores
to 2cy (instead of 1cy).
- Removed JFPU01 and JFPX from the list of resources consumed by pure
float/vector loads (no MMX).
I verified with llvm-exegesis that pure XMM/YMM loads are no-pipe. Those
are dispatched to the FPU but not really issues on JFPU01.
Differential Revision: https://reviews.llvm.org/D68871
llvm-svn: 374765
Sam Parker [Mon, 14 Oct 2019 10:00:21 +0000 (10:00 +0000)]
[NFC][TTI] Add Alignment for isLegalMasked[Load/Store]
Add an extra parameter so the backend can take the alignment into
consideration.
Differential Revision: https://reviews.llvm.org/D68400
llvm-svn: 374763
Guillaume Chatelet [Mon, 14 Oct 2019 09:31:00 +0000 (09:31 +0000)]
Fix D68936
llvm-svn: 374761
Hans Wennborg [Mon, 14 Oct 2019 09:08:57 +0000 (09:08 +0000)]
build_llvm_package.bat: Run check-clang-tools and check-clangd tests.
llvm-svn: 374759
Guillaume Chatelet [Mon, 14 Oct 2019 09:04:15 +0000 (09:04 +0000)]
[Alignment][NFC] Support compile time constants
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68936
llvm-svn: 374758
Sjoerd Meijer [Mon, 14 Oct 2019 07:40:36 +0000 (07:40 +0000)]
[docs] loop pragmas: options implying transformations
Following our discussion on the cfe dev list:
http://lists.llvm.org/pipermail/cfe-dev/2019-August/063054.html,
I have added a paragraph that is explicit about loop pragmas, and
transformation options implying the corresponding transformation.
Differential Revision: https://reviews.llvm.org/D66199
llvm-svn: 374756
Craig Topper [Mon, 14 Oct 2019 06:47:56 +0000 (06:47 +0000)]
[X86] Teach EmitTest to handle ISD::SSUBO/USUBO in order to use the Z flag from the subtract directly during isel.
This prevents isel from emitting a TEST instruction that
optimizeCompareInstr will need to remove later.
In some of the modified tests, the SUB gets duplicated due to
the flags being needed in two places and being clobbered in
between. optimizeCompareInstr was able to optimize away the TEST
that was using the result of one of them, but optimizeCompareInstr
doesn't know to turn SUB into CMP after removing the TEST. It
only knows how to turn SUB into CMP if the result was already
dead.
With this change the TEST never exists, so optimizeCompareInstr
doesn't have to remove it. Then it can just turn the SUB into
CMP immediately.
Fixes PR43649.
llvm-svn: 374755
Michal Gorny [Mon, 14 Oct 2019 05:33:23 +0000 (05:33 +0000)]
[clang] [clang-offload-bundler] Fix finding installed llvm-objcopy
Allow finding installed llvm-objcopy in PATH if it's not present
in the directory containing clang-offload-bundler. This is the case
if clang is being built stand-alone, and llvm-objcopy is already
installed while the c-o-b tool is still present in build directory.
This is consistent with how e.g. llvm-symbolizer is found in LLVM.
However, most of similar searches in LLVM and Clang are performed
without special-casing the program directory.
Fixes r369955.
Differential Revision: https://reviews.llvm.org/D68931
llvm-svn: 374754
Nico Weber [Mon, 14 Oct 2019 03:44:47 +0000 (03:44 +0000)]
clangd tests: use extended regex with sed
The escaped parens seem to confuse the combination of lit, cygwin
quoting, and cygwin's sed. unxutils sed in cmd.exe is fine with both
forms, so use the extended regex form that doesn't need an escaped
paren.
llvm-svn: 374753
Nico Weber [Mon, 14 Oct 2019 02:21:12 +0000 (02:21 +0000)]
convert another test to unix line endings
llvm-svn: 374752
Nico Weber [Mon, 14 Oct 2019 02:14:18 +0000 (02:14 +0000)]
convert a test to unix line endings
llvm-svn: 374751
Nico Weber [Mon, 14 Oct 2019 01:44:29 +0000 (01:44 +0000)]
fix typo in 374747
llvm-svn: 374750
Nico Weber [Mon, 14 Oct 2019 01:41:56 +0000 (01:41 +0000)]
Prefer 'env not' over 'not env' in tests.
That way, lit's builtin 'env' command can be used for the 'env' bit.
Also it's clearer that way that the 'not' shouldn't cover 'env'
failures.
llvm-svn: 374749
Craig Topper [Mon, 14 Oct 2019 01:41:04 +0000 (01:41 +0000)]
[X86] Autogenerate complete checks. NFC
llvm-svn: 374748
Nico Weber [Mon, 14 Oct 2019 01:19:53 +0000 (01:19 +0000)]
Make symbols.test pass on Windows.
See commit message of r374746 for details.
Hopefully final bit of PR43592.
llvm-svn: 374747
Nico Weber [Mon, 14 Oct 2019 01:00:33 +0000 (01:00 +0000)]
Make code-action-request.test and request-reply.test pass on Windows.
clangd's test:// scheme expands to /clangd-test on non-Win and to
C:/clang-test on Win, so it can't be mixed freely with
file:///clangd-test since that's wrong on Windows. This makes both
tests consistenly use the test:// scheme. (Alternatively, we could use
the //INPUT_DIR pattern used in a few other tests.)
Part of PR43592.
llvm-svn: 374746
Nico Weber [Mon, 14 Oct 2019 00:45:02 +0000 (00:45 +0000)]
Don't run background-index.test on Windows.
The test had a "UNSUPPORTED: win32" line, but the spelling of that
changed in r339307 a year ago. Finally update this test too.
Part of PR43592.
llvm-svn: 374745
Florian Hahn [Sun, 13 Oct 2019 23:34:13 +0000 (23:34 +0000)]
[NewGVN] Use m_Br to simplify code a bit. (NFC)
llvm-svn: 374744
Joerg Sonnenberger [Sun, 13 Oct 2019 23:00:15 +0000 (23:00 +0000)]
Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374743
Joerg Sonnenberger [Sun, 13 Oct 2019 22:33:46 +0000 (22:33 +0000)]
Improve __builtin_constant_p lowering
__builtin_constant_p used to be short-cut evaluated to false when
building with -O0. This is undesirable as it means that constant folding
in the front-end can give different results than folding in the back-end.
It can also create conditional branches on constant conditions that don't
get folded away. With the pending improvements to the llvm.is.constant
handling on the LLVM side, the short-cut is no longer useful.
Adjust various codegen tests to not depend on the short-cut or the
backend optimisations.
Differential Revision: https://reviews.llvm.org/D67638
llvm-svn: 374742
Simon Atanasyan [Sun, 13 Oct 2019 22:10:06 +0000 (22:10 +0000)]
merge-request.sh: Update 9.0 metabug for 9.0.1
llvm-svn: 374741
Johannes Doerfert [Sun, 13 Oct 2019 21:25:53 +0000 (21:25 +0000)]
[Attributor] Shortcut no-return through will-return
No-return and will-return are exclusive, assuming the latter is more
prominent we can avoid updates of the former unless will-return is not
known for sure.
llvm-svn: 374739
Johannes Doerfert [Sun, 13 Oct 2019 20:48:26 +0000 (20:48 +0000)]
[Attributor][FIX] NullPointerIsDefined needs the pointer AS (AANonNull)
Also includes a shortcut via AADereferenceable if possible.
llvm-svn: 374737
Johannes Doerfert [Sun, 13 Oct 2019 20:47:16 +0000 (20:47 +0000)]
[Attributor][MemBehavior] Fallback to the function state for arguments
Even if an argument is captured, we cannot have an effect the function
does not have. This is fine except for the special case of `inalloca` as
it does not behave by the rules.
TODO: Maybe the special rule for `inalloca` is wrong after all.
llvm-svn: 374736
Johannes Doerfert [Sun, 13 Oct 2019 20:40:10 +0000 (20:40 +0000)]
[Attributor][FIX] Use check prefix that is actually tested
Summary:
This changes "CHECK" check lines to "ATTRIBUTOR" check lines where
necessary and also fixes the now exposed, mostly minor, problems.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68929
llvm-svn: 374735
Roman Lebedev [Sun, 13 Oct 2019 20:15:00 +0000 (20:15 +0000)]
[NFC][InstCombine] Some preparatory cleanup in dropRedundantMaskingOfLeftShiftInput()
llvm-svn: 374734
DeForest Richards [Sun, 13 Oct 2019 20:05:22 +0000 (20:05 +0000)]
[Docs] Moves Control Flow Document to User Guides
Moves Control Flow document from Reference docs page to User guides page.
llvm-svn: 374733
Simon Pilgrim [Sun, 13 Oct 2019 19:35:35 +0000 (19:35 +0000)]
[X86] getTargetShuffleInputs - Control KnownUndef mask element resolution as well as KnownZero.
We were already controlling whether the KnownZero elements were being written to the target mask, this extends it to the KnownUndef elements as well so we can prevent the target shuffle mask being manipulated at all.
llvm-svn: 374732
Craig Topper [Sun, 13 Oct 2019 19:07:28 +0000 (19:07 +0000)]
[X86] Enable use of avx512 saturating truncate instructions in more cases.
This enables use of the saturating truncate instructions when the
result type is less than 128 bits. It also enables the use of
saturating truncate instructions on KNL when the input is less
than 512 bits. We can do this by widening the input and then
extracting the result.
llvm-svn: 374731
Nico Weber [Sun, 13 Oct 2019 17:43:16 +0000 (17:43 +0000)]
Add missing "REQUIRES: shell" to system-include-extractor.test
Part of PR43592.
llvm-svn: 374730
Sanjay Patel [Sun, 13 Oct 2019 17:34:08 +0000 (17:34 +0000)]
[ConstantFold] fix inconsistent handling of extractelement with undef index (PR42689)
Any constant other than zero was already folded to undef if the index is undef.
https://bugs.llvm.org/show_bug.cgi?id=42689
llvm-svn: 374729
Sanjay Patel [Sun, 13 Oct 2019 17:19:08 +0000 (17:19 +0000)]
[InstCombine] don't assume 'inbounds' for bitcast deref or null pointer in non-default address space
Follow-up to D68244 to account for a corner case discussed in:
https://bugs.llvm.org/show_bug.cgi?id=43501
Add one more restriction: if the pointer is deref-or-null and in a non-default
(non-zero) address space, we can't assume inbounds.
Differential Revision: https://reviews.llvm.org/D68706
llvm-svn: 374728
Nico Weber [Sun, 13 Oct 2019 17:19:00 +0000 (17:19 +0000)]
Make the last to clangd unit tests pass on Windows.
(Some lit tests still fail.)
See FIXME in diff for details.
Part of PR43592.
llvm-svn: 374727
Roman Lebedev [Sun, 13 Oct 2019 17:11:16 +0000 (17:11 +0000)]
[NFC][InstCombine] More test for "sign bit test via shifts" pattern (PR43595)
While that pattern is indirectly handled via
reassociateShiftAmtsOfTwoSameDirectionShifts(),
that incursme one-use restriction on truncation,
which is pointless since we know that we'll produce a single instruction.
Additionally, *if* we are only looking for sign bit,
we don't need shifts to be identical,
which isn't the case in general,
and is the blocker for me in bug in question:
https://bugs.llvm.org/show_bug.cgi?id=43595
llvm-svn: 374726
Simon Pilgrim [Sun, 13 Oct 2019 17:03:11 +0000 (17:03 +0000)]
[X86] SimplifyMultipleUseDemandedBitsForTargetNode - use getTargetShuffleInputs with KnownUndef/Zero results.
llvm-svn: 374725
Simon Pilgrim [Sun, 13 Oct 2019 17:03:02 +0000 (17:03 +0000)]
[X86] getTargetShuffleInputs - add KnownUndef/Zero output support
Adjust SimplifyDemandedVectorEltsForTargetNode to use the known elts masks instead of recomputing it locally.
llvm-svn: 374724
Casey Carter [Sun, 13 Oct 2019 16:46:16 +0000 (16:46 +0000)]
[libc++][test] std::variant test cleanup
* Add the conventional `return 0` to `main` in `variant.assign/conv.pass.cpp` and `variant.ctor/conv.pass.cpp`
* Fix some MSVC signed-to-unsigned conversion warnings by replacing `int` literarls with `unsigned int` literals
llvm-svn: 374723
Casey Carter [Sun, 13 Oct 2019 16:46:12 +0000 (16:46 +0000)]
[libc++][test] <=> now has a feature-test macro
...which `test/support/test_macros.h` can use to detect compiler support.
llvm-svn: 374722
Nico Weber [Sun, 13 Oct 2019 15:25:13 +0000 (15:25 +0000)]
gn build: (manually) merge r374720
llvm-svn: 374721
Paul Hoad [Sun, 13 Oct 2019 14:51:45 +0000 (14:51 +0000)]
[clang-format] Proposal for clang-format to give compiler style warnings
relanding {D68554} with fixed lit tests, checked on Windows and MacOS
llvm-svn: 374720
Simon Pilgrim [Sun, 13 Oct 2019 13:18:07 +0000 (13:18 +0000)]
[X86][AVX] Add i686 avx splat tests
llvm-svn: 374719
Nico Weber [Sun, 13 Oct 2019 13:15:27 +0000 (13:15 +0000)]
Make most clangd unittests pass on Windows
The Windows triple currently turns on delayed template parsing, which
confuses several unit tests that use templates.
For now, just explicitly disable delayed template parsing. This isn't
ideal, but:
- the Windows triple will soon no longer use delayed template parsing
by default
- there's precedent for this in the clangd unit tests already
- let's get the clangd tests pass on Windows first before making
behavioral changes
Part of PR43592.
llvm-svn: 374718
Simon Pilgrim [Sun, 13 Oct 2019 11:30:06 +0000 (11:30 +0000)]
BlockInCriticalSectionChecker - silence static analyzer dyn_cast null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<> directly and if not assert will fire for us.
llvm-svn: 374717
Simon Pilgrim [Sun, 13 Oct 2019 11:29:35 +0000 (11:29 +0000)]
IRTranslator - silence static analyzer null dereference warnings. NFCI.
The CmpInst::getType() calls can be replaced by just using User::getType() that it was dyn_cast from, and we then need to assert that any default predicate cases came from the CmpInst.
llvm-svn: 374716
Csaba Dabis [Sun, 13 Oct 2019 10:59:30 +0000 (10:59 +0000)]
[clang-tidy] bugprone-not-null-terminated-result: checker adjustments 4
llvm-svn: 374715
Csaba Dabis [Sun, 13 Oct 2019 10:41:13 +0000 (10:41 +0000)]
[clang-tidy] bugprone-not-null-terminated-result: checker adjustments 3
On Windows the signed/unsigned int conversions of APInt seems broken, so that
two of the test files marked as unsupported on Windows, as a hotfix.
llvm-svn: 374713
Csaba Dabis [Sun, 13 Oct 2019 10:20:58 +0000 (10:20 +0000)]
[clang-tidy] bugprone-not-null-terminated-result: checker adjustments 2
llvm-svn: 374712
Csaba Dabis [Sun, 13 Oct 2019 09:46:56 +0000 (09:46 +0000)]
[clang-tidy] bugprone-not-null-terminated-result: checker adjustments
llvm-svn: 374711
Csaba Dabis [Sun, 13 Oct 2019 08:49:43 +0000 (08:49 +0000)]
[clang-tidy] bugprone-not-null-terminated-result: Sphinx adjustments 2
llvm-svn: 374710
Csaba Dabis [Sun, 13 Oct 2019 08:41:24 +0000 (08:41 +0000)]
[clang-tidy] bugprone-not-null-terminated-result: Sphinx adjustments
llvm-svn: 374709
GN Sync Bot [Sun, 13 Oct 2019 08:33:14 +0000 (08:33 +0000)]
gn build: Merge r374707
llvm-svn: 374708
Csaba Dabis [Sun, 13 Oct 2019 08:28:27 +0000 (08:28 +0000)]
[clang-tidy] New checker for not null-terminated result caused by strlen(), size() or equal length
Summary:
New checker called bugprone-not-null-terminated-result. This checker finds
function calls where it is possible to cause a not null-terminated result.
Usually the proper length of a string is `strlen(src) + 1` or equal length
of this expression, because the null terminator needs an extra space.
Without the null terminator it can result in undefined behaviour when the
string is read.
The following and their respective `wchar_t` based functions are checked:
`memcpy`, `memcpy_s`, `memchr`, `memmove`, `memmove_s`, `strerror_s`,
`strncmp`, `strxfrm`
The following is a real-world example where the programmer forgot to
increase the passed third argument, which is `size_t length`.
That is why the length of the allocated memory is not enough to hold the
null terminator.
```
static char *stringCpy(const std::string &str) {
char *result = reinterpret_cast<char *>(malloc(str.size()));
memcpy(result, str.data(), str.size());
return result;
}
```
In addition to issuing warnings, fix-it rewrites all the necessary code.
It also tries to adjust the capacity of the destination array:
```
static char *stringCpy(const std::string &str) {
char *result = reinterpret_cast<char *>(malloc(str.size() + 1));
strcpy(result, str.data());
return result;
}
```
Note: It cannot guarantee to rewrite every of the path-sensitive memory
allocations.
Reviewed By: JonasToth, aaron.ballman, whisperity, alexfh
Tags: #clang-tools-extra, #clang
Differential Revision: https://reviews.llvm.org/D45050
llvm-svn: 374707
Craig Topper [Sun, 13 Oct 2019 06:48:05 +0000 (06:48 +0000)]
[X86] Add a one use check on the setcc to the min/max canonicalization code in combineSelect.
This seems to improve std::midpoint code where we have a min and
a max with the same condition. If we split the setcc we can end
up with two compares if the one of the operands is a constant.
Since we aggressively canonicalize compares with constants.
For non-constants it can interfere with our ability to share
control flow if we need to expand cmovs into control flow.
I'm also not sure I understand this min/max canonicalization code.
The motivating case talks about comparing with 0. But we don't
check for 0 explicitly.
Removes one instruction from the codegen for PR43658.
llvm-svn: 374706
Craig Topper [Sun, 13 Oct 2019 05:47:47 +0000 (05:47 +0000)]
[X86] Enable v4i32->v4i16 and v8i16->v8i8 saturating truncates to use pack instructions with avx512.
llvm-svn: 374705
Craig Topper [Sun, 13 Oct 2019 05:47:42 +0000 (05:47 +0000)]
[X86] Add v2i64->v2i32/v2i16/v2i8 test cases to the trunc packus/ssat/usat tests. NFC
llvm-svn: 374704
Johannes Doerfert [Sun, 13 Oct 2019 05:27:09 +0000 (05:27 +0000)]
[Attributor][FIX] Avoid splitting blocks if possible
Before, we eagerly split blocks even if it was not necessary, e.g., they
had a single unreachable instruction and only a single predecessor.
llvm-svn: 374703
Johannes Doerfert [Sun, 13 Oct 2019 05:19:17 +0000 (05:19 +0000)]
[Attributor][FIX] Remove leftover, now unused, variable
llvm-svn: 374702
Johannes Doerfert [Sun, 13 Oct 2019 05:07:00 +0000 (05:07 +0000)]
[Attributor] Remove unused verification flag
We use the verify max iteration now which is more reliable.
llvm-svn: 374701
Johannes Doerfert [Sun, 13 Oct 2019 04:16:02 +0000 (04:16 +0000)]
[Attributor][NFC] Expose call site traversal without QueryingAA
llvm-svn: 374700
Johannes Doerfert [Sun, 13 Oct 2019 04:14:15 +0000 (04:14 +0000)]
[Attributor][FIX] Ensure h2s doesn't trigger on escaped pointers
We do not yet perform h2s because we know something is free'ed but we do
it because we know the pointer does not escape. Storing the pointer
allows it to escape so we have to prevent that.
llvm-svn: 374699
Johannes Doerfert [Sun, 13 Oct 2019 03:54:08 +0000 (03:54 +0000)]
[Attributor][FIX] Do not apply h2s for arbitrary mallocs
H2S did apply to mallocs of non-constant sizes if the uses were OK. This
is now forbidden through reording of the "good" and "bad" cases in the
conditional.
llvm-svn: 374698
Johannes Doerfert [Sun, 13 Oct 2019 02:42:09 +0000 (02:42 +0000)]
[Attributor][FIX] Add missing function declaration in test case
llvm-svn: 374696
Johannes Doerfert [Sun, 13 Oct 2019 02:24:02 +0000 (02:24 +0000)]
[Attributor][FIX] Avoid modifying naked/optnone functions
The check for naked/optnone was insufficient for different reasons. We
now check before we initialize an abstract attribute and we do it for
all abstract attributes.
llvm-svn: 374694
Johannes Doerfert [Sun, 13 Oct 2019 02:21:23 +0000 (02:21 +0000)]
[SROA] Reuse existing lifetime markers if possible
Summary:
If the underlying alloca did not change, we do not necessarily need new
lifetime markers. This patch adds a check and reuses the old ones if
possible.
Reviewers: reames, ssarda, t.p.northover, hfinkel
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68900
llvm-svn: 374692
Nico Weber [Sat, 12 Oct 2019 22:58:34 +0000 (22:58 +0000)]
Revert r374663 "[clang-format] Proposal for clang-format to give compiler style warnings"
The test fails on macOS and looks a bit wrong, see comments on the review.
Also revert follow-up r374686.
llvm-svn: 374688
Nico Weber [Sat, 12 Oct 2019 22:24:56 +0000 (22:24 +0000)]
gn build: (manually) merge r374663
llvm-svn: 374686
Casey Carter [Sat, 12 Oct 2019 19:01:46 +0000 (19:01 +0000)]
[libc++][test] Silence MSVC warning in std::optional test
`make_optional<string>(4, 'X')` passes `4` (an `int`) as the first argument to `string`'s `(size_t, charT)` constructor, triggering a signed/unsigned mismatch warning when compiling with MSVC at `/W4`. The incredibly simple fix is to instead use an unsigned literal (`4u`).
llvm-svn: 374684
Joel E. Denny [Sat, 12 Oct 2019 18:52:46 +0000 (18:52 +0000)]
Revert r374648: "Reland r374388: [lit] Make internal diff work in pipelines"
This series of patches still breaks a Windows bot.
llvm-svn: 374683
Joel E. Denny [Sat, 12 Oct 2019 18:52:31 +0000 (18:52 +0000)]
Revert r374649: "Reland r374389: [lit] Clean up internal diff's encoding handling"
This series of patches still breaks a Windows bot.
llvm-svn: 374682
Joel E. Denny [Sat, 12 Oct 2019 18:52:18 +0000 (18:52 +0000)]
Revert r374650: "Reland r374390: [lit] Extend internal diff to support `-` argument"
This series of patches still breaks a Windows bot.
llvm-svn: 374681
Joel E. Denny [Sat, 12 Oct 2019 18:52:05 +0000 (18:52 +0000)]
Revert 374651: "Reland r374392: [lit] Extend internal diff to support -U"
This series of patches still breaks a Windows bot.
llvm-svn: 374680
Joel E. Denny [Sat, 12 Oct 2019 18:51:51 +0000 (18:51 +0000)]
Revert r374652: "[lit] Fix internal diff's --strip-trailing-cr and use it"
This series of patches still breaks a Windows bot.
llvm-svn: 374679
Joel E. Denny [Sat, 12 Oct 2019 18:51:34 +0000 (18:51 +0000)]
Revert r374653: "[lit] Fix a few oversights in r374651 that broke some bots"
This series of patches still breaks a Windows bot.
llvm-svn: 374678
Joel E. Denny [Sat, 12 Oct 2019 18:51:18 +0000 (18:51 +0000)]
Revert r374665: "[lit] Try yet again to fix new tests that fail on Windows bots"
This series of patches still breaks a Windows bot.
llvm-svn: 374677
Joel E. Denny [Sat, 12 Oct 2019 18:51:08 +0000 (18:51 +0000)]
Revert r374666: "[lit] Adjust error handling for decode introduced by r374665"
This series of patches still breaks a Windows bot.
llvm-svn: 374676
Joel E. Denny [Sat, 12 Oct 2019 18:50:57 +0000 (18:50 +0000)]
Revert r374671: "[lit] Try errors="ignore" for decode introduced by r374665"
This series of patches still breaks a Windows bot.
llvm-svn: 374675
Simon Pilgrim [Sat, 12 Oct 2019 18:33:47 +0000 (18:33 +0000)]
[X86] scaleShuffleMask - use size_t Scale to avoid overflow warnings
llvm-svn: 374674
Simon Pilgrim [Sat, 12 Oct 2019 17:55:09 +0000 (17:55 +0000)]
SymbolRecord - consistently use explicit for single operand constructors
llvm-svn: 374673
Simon Pilgrim [Sat, 12 Oct 2019 17:55:01 +0000 (17:55 +0000)]
SymbolRecord - fix uninitialized variable warnings. NFCI.
llvm-svn: 374672
Joel E. Denny [Sat, 12 Oct 2019 17:23:25 +0000 (17:23 +0000)]
[lit] Try errors="ignore" for decode introduced by r374665
Still trying to fix the same error as in r374666.
llvm-svn: 374671
Roman Lebedev [Sat, 12 Oct 2019 16:48:16 +0000 (16:48 +0000)]
[NFC][LoopIdiom] Adjust FIXME to be self-explanatory
llvm-svn: 374670
Simon Pilgrim [Sat, 12 Oct 2019 16:37:02 +0000 (16:37 +0000)]
Replace for-loop of SmallVector::push_back with SmallVector::append. NFCI.
llvm-svn: 374669
Simon Pilgrim [Sat, 12 Oct 2019 16:36:52 +0000 (16:36 +0000)]
Fix cppcheck shadow variable name warnings. NFCI.
llvm-svn: 374668
Simon Pilgrim [Sat, 12 Oct 2019 16:36:44 +0000 (16:36 +0000)]
[X86] Use any_of/all_of patterns in shuffle mask pattern recognisers. NFCI.
llvm-svn: 374667
Joel E. Denny [Sat, 12 Oct 2019 16:25:46 +0000 (16:25 +0000)]
[lit] Adjust error handling for decode introduced by r374665
On that decode, Windows bots fail with:
```
UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-8: ordinal not in range(128)
```
That's the same error as before r374665 except it's now at the decode
before the write to stdout.
llvm-svn: 374666
Joel E. Denny [Sat, 12 Oct 2019 16:00:35 +0000 (16:00 +0000)]
[lit] Try yet again to fix new tests that fail on Windows bots
I seem to have misread the bot logs on my last attempt. When lit's
internal diff runs on Windows under Python 2.7, it's text diffs not
binary diffs that need decoding to avoid this error when writing the
diff to stdout:
```
UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-8: ordinal not in range(128)
```
There is no `decode` attribute in this case under Python 3.6.8 under
Ubuntu, so this patch checks for the `decode` attribute before using
it here. Hopefully nothing else is needed when `decode` isn't
available.
It might take a couple more attempts to figure out what error
handling, if any, is needed for this decoding.
llvm-svn: 374665
Joel E. Denny [Sat, 12 Oct 2019 16:00:25 +0000 (16:00 +0000)]
Revert r374657: "[lit] Try again to fix new tests that fail on Windows bots"
llvm-svn: 374664
Paul Hoad [Sat, 12 Oct 2019 15:36:05 +0000 (15:36 +0000)]
[clang-format] Proposal for clang-format to give compiler style warnings
Summary:
Related somewhat to {D29039}
On seeing a quote on twitter by @invalidop
> If it's not formatted with clang-format it's a build error.
This made me want to change the way I use clang-format into a tool that could optionally show me where my source code violates clang-format syle.
When I'm making a change to clang-format itself, one thing I like to do to test the change is to ensure I didn't cause a huge wave of changes, what I want to do is simply run this on a known formatted directory and see if any new differences arrive in a manner I'm used to.
This started me thinking that we should allow build systems to run clang-format on a whole tree and emit compiler style warnings about files that fail clang-format in a form that would make them as a warning in most build systems and because those build systems range in their construction I don't think its unreasonable to NOT expect them to have to do the directory searching or parsing the output replacements themselves, but simply transform that into an error code when there are changes required.
I am starting this by suggesing adding a -n or -dry-run command line argument which would emit a warning/error of the form
Support for various common compiler command line argumuments like '-Werror' and '-ferror-limit' could make this very flexible to be integrated into build systems and CI systems.
```
> $ /usr/bin/clang-format --dry-run ClangFormat.cpp -ferror-limit=3 -fcolor-diagnostics
> ClangFormat.cpp:54:29: warning: code should be clang-formatted [-Wclang-format-violations]
> static cl::list<std::string>
> ^
> ClangFormat.cpp:55:20: warning: code should be clang-formatted [-Wclang-format-violations]
> LineRanges("lines", cl::desc("<start line>:<end line> - format a range of\n"
> ^
> ClangFormat.cpp:55:77: warning: code should be clang-formatted [-Wclang-format-violations]
> LineRanges("lines", cl::desc("<start line>:<end line> - format a range of\n"
> ^
```
Reviewers: mitchell-stellar, klimek, owenpan
Reviewed By: klimek
Subscribers: mgorny, cfe-commits
Tags: #clang-format, #clang-tools-extra, #clang
Differential Revision: https://reviews.llvm.org/D68554
llvm-svn: 374663
Roman Lebedev [Sat, 12 Oct 2019 15:35:32 +0000 (15:35 +0000)]
[LoopIdiomRecognize] Recommit: BCmp loop idiom recognition
Summary:
This is a recommit, this originally landed in rL370454 but was
subsequently reverted in rL370788 due to
https://bugs.llvm.org/show_bug.cgi?id=43206
The reduced testcase was added to bcmp-negative-tests.ll
as @pr43206_different_loops - we must ensure that the SCEV's
we got are both for the same loop we are currently investigating.
Original commit message:
@mclow.lists brought up this issue up in IRC.
It is a reasonably common problem to compare some two values for equality.
Those may be just some integers, strings or arrays of integers.
In C, there is `memcmp()`, `bcmp()` functions.
In C++, there exists `std::equal()` algorithm.
One can also write that function manually.
libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for
various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ
libc++ does not do anything like that, it simply relies on simple C++'s
`operator==()`. https://godbolt.org/z/er0Zwf (GOOD!)
So likely, there exists a certain performance opportunities.
Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that
is using `memcmp()` (in this case, compiled with modified compiler). {
F8768213}
```
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <limits>
#include <random>
#include <type_traits>
#include <utility>
#include <vector>
#include "benchmark/benchmark.h"
template <class T>
bool equal(T* a, T* a_end, T* b) noexcept {
for (; a != a_end; ++a, ++b) {
if (*a != *b) return false;
}
return true;
}
template <typename T>
std::vector<T> getVectorOfRandomNumbers(size_t count) {
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
std::numeric_limits<T>::max());
std::vector<T> v;
v.reserve(count);
std::generate_n(std::back_inserter(v), count,
[&dis, &gen]() { return dis(gen); });
assert(v.size() == count);
return v;
}
struct Identical {
template <typename T>
static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
auto Tmp = getVectorOfRandomNumbers<T>(count);
return std::make_pair(Tmp, std::move(Tmp));
}
};
struct InequalHalfway {
template <typename T>
static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
auto V0 = getVectorOfRandomNumbers<T>(count);
auto V1 = V0;
V1[V1.size() / size_t(2)]++; // just change the value.
return std::make_pair(std::move(V0), std::move(V1));
}
};
template <class T, class Gen>
void BM_bcmp(benchmark::State& state) {
const size_t Length = state.range(0);
const std::pair<std::vector<T>, std::vector<T>> Data =
Gen::template Gen<T>(Length);
const std::vector<T>& a = Data.first;
const std::vector<T>& b = Data.second;
assert(a.size() == Length && b.size() == a.size());
benchmark::ClobberMemory();
benchmark::DoNotOptimize(a);
benchmark::DoNotOptimize(a.data());
benchmark::DoNotOptimize(b);
benchmark::DoNotOptimize(b.data());
for (auto _ : state) {
const bool is_equal = equal(a.data(), a.data() + a.size(), b.data());
benchmark::DoNotOptimize(is_equal);
}
state.SetComplexityN(Length);
state.counters["eltcnt"] =
benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
state.counters["eltcnt/sec"] =
benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
const size_t BytesRead = 2 * sizeof(T) * Length;
state.counters["bytes_read/iteration"] =
benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
benchmark::Counter::OneK::kIs1024);
state.counters["bytes_read/sec"] = benchmark::Counter(
BytesRead, benchmark::Counter::kIsIterationInvariantRate,
benchmark::Counter::OneK::kIs1024);
}
template <typename T>
static void CustomArguments(benchmark::internal::Benchmark* b) {
const size_t L2SizeBytes = []() {
for (const benchmark::CPUInfo::CacheInfo& I :
benchmark::CPUInfo::Get().caches) {
if (I.level == 2) return I.size;
}
return 0;
}();
// What is the largest range we can check to always fit within given L2 cache?
const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
/*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
}
BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical)
->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical)
->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical)
->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical)
->Apply(CustomArguments<uint64_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway)
->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway)
->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway)
->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway)
->Apply(CustomArguments<uint64_t>);
```
{
F8768210}
```
$ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench
RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx
2019-04-25 21:17:11
Running build-old/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
L1 Data 16K (x8)
L1 Instruction 64K (x4)
L2 Unified 2048K (x4)
L3 Unified 8192K (x1)
Load Average: 0.65, 3.90, 4.14
---------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000 432131 ns 432101 ns 1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s
BM_bcmp<uint8_t, Identical>_BigO 0.86 N 0.86 N
BM_bcmp<uint8_t, Identical>_RMS 8 % 8 %
<...>
BM_bcmp<uint16_t, Identical>/256000 161408 ns 161409 ns 4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s
BM_bcmp<uint16_t, Identical>_BigO 0.67 N 0.67 N
BM_bcmp<uint16_t, Identical>_RMS 25 % 25 %
<...>
BM_bcmp<uint32_t, Identical>/128000 81497 ns 81488 ns 8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s
BM_bcmp<uint32_t, Identical>_BigO 0.71 N 0.71 N
BM_bcmp<uint32_t, Identical>_RMS 42 % 42 %
<...>
BM_bcmp<uint64_t, Identical>/64000 50138 ns 50138 ns 10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s
BM_bcmp<uint64_t, Identical>_BigO 0.84 N 0.84 N
BM_bcmp<uint64_t, Identical>_RMS 27 % 27 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000 192405 ns 192392 ns 3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO 0.38 N 0.38 N
BM_bcmp<uint8_t, InequalHalfway>_RMS 3 % 3 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000 127858 ns 127860 ns 5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO 0.50 N 0.50 N
BM_bcmp<uint16_t, InequalHalfway>_RMS 0 % 0 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000 49140 ns 49140 ns 14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO 0.40 N 0.40 N
BM_bcmp<uint32_t, InequalHalfway>_RMS 18 % 18 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000 32101 ns 32099 ns 21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO 0.50 N 0.50 N
BM_bcmp<uint64_t, InequalHalfway>_RMS 1 % 1 %
RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0
2019-04-25 21:19:29
Running build-new/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
L1 Data 16K (x8)
L1 Instruction 64K (x4)
L2 Unified 2048K (x4)
L3 Unified 8192K (x1)
Load Average: 1.01, 2.85, 3.71
---------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000 18593 ns 18590 ns 37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s
BM_bcmp<uint8_t, Identical>_BigO 0.04 N 0.04 N
BM_bcmp<uint8_t, Identical>_RMS 37 % 37 %
<...>
BM_bcmp<uint16_t, Identical>/256000 18950 ns 18948 ns 37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s
BM_bcmp<uint16_t, Identical>_BigO 0.08 N 0.08 N
BM_bcmp<uint16_t, Identical>_RMS 34 % 34 %
<...>
BM_bcmp<uint32_t, Identical>/128000 18627 ns 18627 ns 37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s
BM_bcmp<uint32_t, Identical>_BigO 0.16 N 0.16 N
BM_bcmp<uint32_t, Identical>_RMS 35 % 35 %
<...>
BM_bcmp<uint64_t, Identical>/64000 18855 ns 18855 ns 37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s
BM_bcmp<uint64_t, Identical>_BigO 0.32 N 0.32 N
BM_bcmp<uint64_t, Identical>_RMS 33 % 33 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000 9570 ns 9569 ns 73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO 0.02 N 0.02 N
BM_bcmp<uint8_t, InequalHalfway>_RMS 29 % 29 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000 9547 ns 9547 ns 74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO 0.04 N 0.04 N
BM_bcmp<uint16_t, InequalHalfway>_RMS 29 % 29 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000 9396 ns 9394 ns 73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO 0.08 N 0.08 N
BM_bcmp<uint32_t, InequalHalfway>_RMS 30 % 30 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000 9499 ns 9498 ns 73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO 0.16 N 0.16 N
BM_bcmp<uint64_t, InequalHalfway>_RMS 28 % 28 %
Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench
Benchmark Time CPU Time Old Time New CPU Old CPU New
---------------------------------------------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000 -0.9570 -0.9570 432131 18593 432101 18590
<...>
BM_bcmp<uint16_t, Identical>/256000 -0.8826 -0.8826 161408 18950 161409 18948
<...>
BM_bcmp<uint32_t, Identical>/128000 -0.7714 -0.7714 81497 18627 81488 18627
<...>
BM_bcmp<uint64_t, Identical>/64000 -0.6239 -0.6239 50138 18855 50138 18855
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000 -0.9503 -0.9503 192405 9570 192392 9569
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000 -0.9253 -0.9253 127858 9547 127860 9547
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000 -0.8088 -0.8088 49140 9396 49140 9394
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000 -0.7041 -0.7041 32101 9499 32099 9498
```
What can we tell from the benchmark?
* Performance of naive equality check somewhat improves with element size,
maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s
for uint64_t. I think, that instability implies performance problems.
* Performance of `memcmp()`-aware benchmark always maxes out at around
bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the
naive variant!
* eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at
eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and
linearly decreases with element size.
For uint64_t, it's ~4x+ the elements/second.
* The call obvious is more pricey than the loop, with small element count.
As it can be seen from the full output {
F8768210}, the `memcmp()` is almost
universally worse, independent of the element size (and thus buffer size) when
element count is less than 8.
So all in all, bcmp idiom does indeed pose untapped performance headroom.
This diff does implement said idiom recognition. I think a reasonable test
coverage is present, but do tell if there is anything obvious missing.
Now, quality. This does succeed to build and pass the test-suite, at least
without any non-bundled elements. {
F8768216} {
F8768217}
This transform fires 91 times:
```
$ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json
Tests: 1149
Metric: loop-idiom.NumBCmp
Program result-new
MultiSourc...Benchmarks/7zip/7zip-benchmark 79.00
MultiSource/Applications/d/make_dparser 3.00
SingleSource/UnitTests/vla 2.00
MultiSource/Applications/Burg/burg 1.00
MultiSourc.../Applications/JM/lencod/lencod 1.00
MultiSource/Applications/lemon/lemon 1.00
MultiSource/Benchmarks/Bullet/bullet 1.00
MultiSourc...e/Benchmarks/MallocBench/gs/gs 1.00
MultiSourc...gs-C/TimberWolfMC/timberwolfmc 1.00
MultiSourc...Prolangs-C/simulator/simulator 1.00
```
The size changes are:
I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look.
```
$ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash
Tests: 1149
Same hash: 907 (filtered out)
Remaining: 242
Metric: size..text
Program result-old result-new diff
test-suite...ingleSource/UnitTests/vla.test 753.00 833.00 10.6%
test-suite...marks/7zip/7zip-benchmark.test 1001697.00 966657.00 -3.5%
test-suite...ngs-C/simulator/simulator.test 32369.00 32321.00 -0.1%
test-suite...plications/d/make_dparser.test 89585.00 89505.00 -0.1%
test-suite...ce/Applications/Burg/burg.test 40817.00 40785.00 -0.1%
test-suite.../Applications/lemon/lemon.test 47281.00 47249.00 -0.1%
test-suite...TimberWolfMC/timberwolfmc.test 250065.00 250113.00 0.0%
test-suite...chmarks/MallocBench/gs/gs.test 149889.00 149873.00 -0.0%
test-suite...ications/JM/lencod/lencod.test 769585.00 769569.00 -0.0%
test-suite.../Benchmarks/Bullet/bullet.test 770049.00 770049.00 0.0%
test-suite...HMARK_ANISTROPIC_DIFFUSION/128 NaN NaN nan%
test-suite...HMARK_ANISTROPIC_DIFFUSION/256 NaN NaN nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/64 NaN NaN nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/32 NaN NaN nan%
test-suite...ENCHMARK_BILATERAL_FILTER/64/4 NaN NaN nan%
Geomean difference nan%
result-old result-new diff
count 1.000000e+01 10.00000 10.000000
mean 3.152090e+05 311695.40000 0.006749
std 3.790398e+05 372091.42232 0.036605
min 7.530000e+02 833.00000 -0.034981
25% 4.243300e+04 42401.00000 -0.000866
50% 1.197370e+05 119689.00000 -0.000392
75% 6.397050e+05 639705.00000 -0.000005
max 1.001697e+06 966657.00000 0.106242
```
I don't have timings though.
And now to the code. The basic idea is to completely replace the whole loop.
If we can't fully kill it, don't transform.
I have left one or two comments in the code, so hopefully it can be understood.
Also, there is a few TODO's that i have left for follow-ups:
* widening of `memcmp()`/`bcmp()`
* step smaller than the comparison size
* Metadata propagation
* more than two blocks as long as there is still a single backedge?
* ???
Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet
Reviewed By: courbet
Subscribers: miyuki, hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61144
llvm-svn: 374662
Roman Lebedev [Sat, 12 Oct 2019 15:35:16 +0000 (15:35 +0000)]
[NFC][LoopIdiom] Add bcmp loop idiom miscompile test from PR43206.
The transform forgot to check SCEV loop scopes.
https://bugs.llvm.org/show_bug.cgi?id=43206
llvm-svn: 374661
Roman Lebedev [Sat, 12 Oct 2019 15:35:09 +0000 (15:35 +0000)]
[NFC][LoopIdiom] Move one bcmp test into the proper place
llvm-svn: 374660
Sylvestre Ledru [Sat, 12 Oct 2019 15:24:00 +0000 (15:24 +0000)]
remove an useless allocation found by scan-build - the new Dead nested assignment check
llvm-svn: 374659