Melanie Blower [Wed, 28 Jul 2021 14:50:14 +0000 (10:50 -0400)]
[CLANG][PATCH][FPEnv] Add support for option -ffp-eval-method and extend #pragma float_control similarly
The Intel compiler ICC supports the option "-fp-model=(source|double|extended)"
which causes the compiler to use a wider type for intermediate floating point
calculations. Also supported is a way to embed this effect in the source
program with #pragma float_control(source|double|extended).
This patch extends pragma float_control syntax, and also adds support
for a new floating point option "-ffp-eval-method=(source|double|extended)".
source: intermediate results use source precision
double: intermediate results use double precision
extended: intermediate results use extended precision
Reviewed By: Aaron Ballman
Differential Revision: https://reviews.llvm.org/D93769
Lei Zhang [Wed, 28 Jul 2021 14:30:54 +0000 (10:30 -0400)]
[mlir][spirv] Fix a few issues in ModuleCombiner
- Fixed symbol insertion into `symNameToModuleMap`. Insertion
needs to happen whether symbols are renamed or not.
- Added check for the VCE triple and avoid dropping it.
- Disabled function deduplication. It requires more careful
rules. Right now it can remove different functions.
- Added tests for symbol rename listener.
- And some other code/comment cleanups.
Reviewed By: ergawy
Differential Revision: https://reviews.llvm.org/D106886
Kazu Hirata [Wed, 28 Jul 2021 13:49:28 +0000 (06:49 -0700)]
[AsmParser] Remove unused declaration parseOptionalCommaInAlloca (NFC)
Jun Ma [Tue, 27 Jul 2021 03:29:46 +0000 (11:29 +0800)]
[InstSimplify] Simplify llvm.vscale when vscale_range attribute exists
Reduce llvm.vscale to constant based on vscale_range attribute.
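As an illustration only, the fold can be modelled as follows. The sketch assumes the fold applies when the attribute pins vscale to a single nonzero value (minimum equal to maximum); that condition and the `fold_vscale` helper are assumptions made for this note, not code from the patch.

```python
from typing import Optional, Tuple


def fold_vscale(vscale_range: Optional[Tuple[int, int]]) -> Optional[int]:
    """Return the constant llvm.vscale would fold to, or None if unknown.

    vscale_range is a hypothetical (min, max) pair standing in for the
    function attribute; None means the attribute is absent.
    """
    if vscale_range is None:
        return None
    lo, hi = vscale_range
    # Assumed condition: only a range with a single nonzero value is foldable.
    if lo == hi and lo != 0:
        return lo
    return None
```

With a range such as (1, 16) the value stays unknown, so no constant is produced.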
Differential Revision: https://reviews.llvm.org/D106850
Alexey Bataev [Wed, 28 Jul 2021 13:02:51 +0000 (06:02 -0700)]
[SLP]Fix build on MacOS, NFC.
Sanjay Patel [Wed, 28 Jul 2021 13:07:45 +0000 (09:07 -0400)]
[x86] improve CMOV codegen by pushing add into operands, part 3
In this episode, we are trying to avoid an x86 micro-arch quirk where complex
(3 operand) LEA potentially costs significantly more than simple LEA. So we
simultaneously push and pull the math around the CMOV to balance the operations.
I looked at the debug spew during instruction selection and decided against
trying a later DAGToDAG transform -- it seems very difficult to match if the
trailing memops are already selected and managing the creation of extra
instructions at that level is always tricky.
Differential Revision: https://reviews.llvm.org/D106918
Dmitry Vyukov [Wed, 28 Jul 2021 07:45:43 +0000 (09:45 +0200)]
sanitizer_common: replace RWMutex/BlockingMutex with Mutex
Mutex supports reader access, OS blocking, and spinning;
it is portable and smaller than BlockingMutex.
Overall it's supposed to be better than RWMutex/BlockingMutex.
Replace RWMutex/BlockingMutex with Mutex.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D106936
Dmitry Vyukov [Wed, 28 Jul 2021 11:08:49 +0000 (13:08 +0200)]
sanitizer_common: prohibit Mutex(LINKER_INITIALIZED)
Mutex does not support LINKER_INITIALIZED ctor.
But we used to support it with BlockingMutex.
To prevent potential bugs delete LINKER_INITIALIZED Mutex ctor.
Also mark existing ctor as explicit.
Depends on D106944.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D106945
Dmitry Vyukov [Wed, 28 Jul 2021 11:12:24 +0000 (13:12 +0200)]
sanitizers: switch BlockingMutex(LINKER_INITIALIZED) to Mutex
Mutex does not support the LINKER_INITIALIZED ctor.
As preparation for switching BlockingMutex to Mutex,
proactively replace all BlockingMutex(LINKER_INITIALIZED) with Mutex.
All of these are objects with static storage duration and Mutex ctor
is constexpr, so it should be equivalent.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D106944
David Spickett [Tue, 30 Mar 2021 12:36:26 +0000 (13:36 +0100)]
[lldb] Add "memory tag write" --end-addr option
The default mode of "memory tag write" is to calculate the
range from the start address and the number of tags given.
(just like "memory write" does)
(lldb) memory tag write mte_buf 1 2
(lldb) memory tag read mte_buf mte_buf+48
Logical tag: 0x0
Allocation tags:
[0xfffff7ff9000, 0xfffff7ff9010): 0x1
[0xfffff7ff9010, 0xfffff7ff9020): 0x2
[0xfffff7ff9020, 0xfffff7ff9030): 0x0
This new option allows you to set an end address and have
the tags repeat until that point.
(lldb) memory tag write mte_buf 1 2 --end-addr mte_buf+64
(lldb) memory tag read mte_buf mte_buf+80
Logical tag: 0x0
Allocation tags:
[0xfffff7ff9000, 0xfffff7ff9010): 0x1
[0xfffff7ff9010, 0xfffff7ff9020): 0x2
[0xfffff7ff9020, 0xfffff7ff9030): 0x1
[0xfffff7ff9030, 0xfffff7ff9040): 0x2
[0xfffff7ff9040, 0xfffff7ff9050): 0x0
This is implemented using the QMemTags packet previously
added. We skip validating the number of tags in lldb and send
them on to lldb-server, which repeats them as needed.
Apart from the number of tags, all the other client side checks
remain. Tag values, memory range must be tagged, etc.
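The repeating behaviour can be sketched as below. This is a model, not lldb-server's code: the 16-byte granule is taken from the 0x10-byte ranges in the output above, and the rounding of the end address up to a granule boundary, as well as the `repeat_tags` name, are assumptions.

```python
GRANULE_SIZE = 16  # assumed: matches the 0x10-byte ranges in the example output


def repeat_tags(start, end, tags, granule=GRANULE_SIZE):
    """Return (granule_start, granule_end, tag) for each granule in [start, end).

    The supplied tags are reused cyclically until the end address is reached.
    """
    if not tags:
        raise ValueError("at least one tag is required")
    aligned_start = start - (start % granule)  # align down
    aligned_end = end + (-end % granule)       # align up (assumption)
    result = []
    for i, addr in enumerate(range(aligned_start, aligned_end, granule)):
        result.append((addr, addr + granule, tags[i % len(tags)]))
    return result


# Mirrors "memory tag write mte_buf 1 2 --end-addr mte_buf+64".
for lo, hi, tag in repeat_tags(0xfffff7ff9000, 0xfffff7ff9000 + 64, [1, 2]):
    print(f"[{lo:#x}, {hi:#x}): {tag:#x}")
```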
Reviewed By: omjavaid
Differential Revision: https://reviews.llvm.org/D105183
Simon Pilgrim [Wed, 28 Jul 2021 13:01:59 +0000 (14:01 +0100)]
[X86][AVX] Move VPERM2F128 defs above VINSERTF128 defs. NFC.
This will be necessary for a future patch to lower VINSERTF128 custom folds to VPERM2F128
Alexey Bataev [Fri, 18 Jun 2021 17:40:00 +0000 (10:40 -0700)]
[SLP]Improve graph reordering.
Reworked reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reorderable nodes (loads, stores,
extractelements, extractvalues) and then fully rebuilt the graph in
the best order. This was not efficient, since it required extra
memory and time for building/rebuilding the tree and doubled the use of the
scheduling budget, which could lead to missed vectorization due to
exhausted scheduling resources.
Patch provides a 2-way approach to the graph reordering problem. First, all
reordering is done in-place; it does not require tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.
The first step (top-to-bottom) rotates the whole graph, similarly to the previous
implementation. The compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then it repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle the smaller subgraph when building operands for the graph
nodes with a larger vectorization factor, so we can rotate just the subgraph,
not the whole graph.
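A rough model of the counting part of the first step is shown below. It is a sketch under simplifying assumptions, not the SLP vectorizer's implementation: nodes are reduced to hypothetical `(vectorization_factor, order)` pairs, an empty tuple stands for "no reordering", and ties between orders are not handled specially.

```python
from collections import Counter, defaultdict


def most_used_order_per_vf(nodes):
    """Pick the most frequent order for each vectorization factor.

    nodes: iterable of (vf, order) pairs, where order is a tuple of indices
    and the empty tuple means the node needs no reordering.
    Returns a dict mapping vf -> most used order, largest vf first.
    """
    counts = defaultdict(Counter)
    for vf, order in nodes:
        counts[vf][order] += 1
    chosen = {}
    # Larger factors are handled before smaller ones, as in the description.
    for vf in sorted(counts, reverse=True):
        order, _ = counts[vf].most_common(1)[0]
        chosen[vf] = order
    return chosen


def should_rotate(order):
    """A subgraph is rotated only if the chosen order is not empty."""
    return len(order) > 0
```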
The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reordered in the best fashion, they are reordered and their users too.
This makes it possible to remove double shuffles to the same ordering of the operands in
many cases and just reorder the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows removing an extra shuffle, because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.
Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.
Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.
Differential Revision: https://reviews.llvm.org/D105020
Whisperity [Wed, 28 Jul 2021 12:01:45 +0000 (14:01 +0200)]
[clang-tidy] Fix crash on "reference-to-array" parameters in 'bugprone-easily-swappable-parameters'
An otherwise unexercised code path related to trying to model
"array-to-pointer decay" resulted in a null pointer dereference crash
when parameters of type "reference to array" were encountered.
Fixes crash report http://bugs.llvm.org/show_bug.cgi?id=50995.
Reviewed By: aaron.ballman
Differential Revision: http://reviews.llvm.org/D106946
Wael Yehia [Tue, 27 Jul 2021 13:26:38 +0000 (09:26 -0400)]
[LTO][Legacy] Add new API to check presence of ctor/dtor functions.
On AIX, the linker needs to check whether a given lto_module_t contains
any constructor/destructor functions, in order to implement the behavior
of the -bcdtors:all flag. See
https://www.ibm.com/docs/en/aix/7.2?topic=l-ld-command for the flag's
documentation.
In llvm IR, constructor (destructor) functions are added to a special
global array @llvm.global_ctors (@llvm.global_dtors).
However, because these two symbols are artificial, they are not visited
during the symbol traversal (using the
lto_module_get_[num_symbols|symbol_name|symbol_attribute] API).
This patch adds a new function to the libLTO interface that checks the
presence of one or both of these two symbols.
Reviewed By: steven_wu
Differential Revision: https://reviews.llvm.org/D106887
Florian Hahn [Wed, 28 Jul 2021 12:12:32 +0000 (13:12 +0100)]
[LV] Move recurrence backedge fixup code to VPlan::execute (NFC).
As suggested in D105008, move the code that fixes up the backedge value
for first order recurrences to VPlan::execute.
Now all that remains in fixFirstOrderRecurrences is the code responsible
for creating the exit values in the middle block.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D106244
David Green [Wed, 28 Jul 2021 11:50:58 +0000 (12:50 +0100)]
[LV][ARM] Tighten up MLA reduction costing
This makes a couple of changes to the costing of MLA reduction patterns,
to more accurately cost various patterns that can come up from
vectorization.
- The Arm implementation of getExtendedAddReductionCost is altered to
only provide costs for legal or smaller types. Larger than legal types
need to be split, which currently does not work very well, especially
for predicated reductions where the predicate may be legal but needs to
be split. Currently we limit it to legal or smaller input types.
- The getReductionPatternCost has learnt that reduce(ext(mul(ext, ext))
is a pattern that can come up, and can be treated the same as
reduce(mul(ext, ext)) providing the extension types match.
- And it has been adjusted to not count the ext in reduce(mul(ext, ext))
as part of a reduce(mul) pattern.
Together these changes help to more accurately cost the mla reductions
in cases such as where the extend types don't match or the extend
opcodes are different, picking better vector factors that don't result
in expanded reductions.
Differential Revision: https://reviews.llvm.org/D106166
Tobias Gysi [Wed, 28 Jul 2021 11:24:27 +0000 (11:24 +0000)]
[mlir][linalg] Specialize LinalgOp canonicalization patterns (NFC).
Specialize the DeduplicateInputs and RemoveIdentityLinalgOps patterns for GenericOp instead of implementing them for the LinalgOp interface.
This revision is based on https://reviews.llvm.org/D105622, which moves the logic to erase identity CopyOps into a separate pattern.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D105291
Aaron Ballman [Wed, 28 Jul 2021 11:37:56 +0000 (07:37 -0400)]
Allow #pragma float_control(push|pop) within a language linkage specification
Currently, we prohibit this pragma from appearing within a language
linkage specification, but this is useful functionality that is
supported by MSVC (which is where we inherited this feature from).
This patch allows you to use the pragma within an extern "C" {} (etc)
block.
Tobias Gysi [Wed, 28 Jul 2021 09:42:01 +0000 (09:42 +0000)]
[mlir][linalg] Introduce a separate EraseIdentityCopyOp Pattern.
Split out an EraseIdentityCopyOp from the existing RemoveIdentityLinalgOps pattern. Introduce an additional check to ensure the pattern checks the permutation maps match. This is a preparation step to specialize RemoveIdentityLinalgOps to GenericOp only.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D105622
David Spickett [Wed, 28 Jul 2021 11:07:06 +0000 (11:07 +0000)]
[libcxx] Bump __libcpp_version to 14 after branch
This was missed in 08c766a7318ab37bf1d77e0c683cd3b00e700877
and caused test failures in the buildkite bots:
libcpp_version.pass.cpp:22:1:
error: static_assert failed due to requirement '14000 == libcpp_version'
"_LIBCPP_VERSION doesn't match __libcpp_version
Muhammad Omair Javaid [Wed, 28 Jul 2021 10:28:53 +0000 (15:28 +0500)]
[LLDB] Skip TestGuiBasicDebug.py on Arm/AArch64 Linux
TestGuiBasicDebug.py randomly fails due to timeouts, producing false
negatives on the LLDB Arm and AArch64 Linux buildbots. I haven't found a
reliable way to set the pexpect timeout for this test to pass regularly.
Skipping it on Arm and AArch64 Linux to silence buildbot failures.
Muhammad Omair Javaid [Wed, 28 Jul 2021 10:24:52 +0000 (15:24 +0500)]
[LLDB] Skip HW breakpoints test_step_until on Arm/Linux
test_step_until xpasses on some machines while failing on others.
Marking it as skipped for now.
Yi Zhang [Wed, 28 Jul 2021 10:19:05 +0000 (10:19 +0000)]
[mlir][memref] Fix collapsed shape ops memref.cast folding with changed type
`memref.collapse_shape` has verification logic to ensure that a
result dim is static if all the collapsing src dims are static.
Cast folding might add more static information for the src operand
of `memref.collapse_shape` which might change a valid collapsing
operation to be invalid. Add `CollapseShapeOpMemRefCastFolder` pattern
to fix this.
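The verification rule, and why cast folding can break it, can be modelled as follows. This is an illustrative sketch, not MLIR code: shapes are plain lists with `None` standing for a dynamic dimension, and `collapsed_shape` and `is_valid_collapse` are hypothetical helpers.

```python
from math import prod


def collapsed_shape(src_shape, reassociation):
    """Compute the result shape implied by the source shape.

    A result dim is static (the product of its group) only if every source
    dim in the group is static; otherwise it is dynamic (None).
    """
    result = []
    for group in reassociation:
        dims = [src_shape[i] for i in group]
        result.append(None if any(d is None for d in dims) else prod(dims))
    return result


def is_valid_collapse(src_shape, reassociation, result_shape):
    """The declared result type must match what the source shape implies."""
    return collapsed_shape(src_shape, reassociation) == result_shape


groups = [[0, 1], [2]]
# Before folding: the source has a dynamic last dim, so [6, ?] is valid.
print(is_valid_collapse([2, 3, None], groups, [6, None]))  # True
# After folding a cast that makes the source fully static, the old result
# type [6, ?] no longer verifies; the result type has to be updated too.
print(is_valid_collapse([2, 3, 4], groups, [6, None]))     # False
```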
Minor changes to `convertReassociationIndicesToExprs` to use `context`
instead of `builder` to avoid extra steps to construct temporary
builders.
Reviewed By: nicolasvasilache, mravishankar
Differential Revision: https://reviews.llvm.org/D106670
David Green [Wed, 28 Jul 2021 09:55:06 +0000 (10:55 +0100)]
[ARM] Extra MVE reduction vectorizer tests. NFC
Raphael Isemann [Wed, 28 Jul 2021 09:13:23 +0000 (11:13 +0200)]
[lldb] Temporarily bump the max length of the pexpect error message to diagnose an lldb-aarch64 test failure
This is only temporary, to gather some logs before this gets reverted. See
D106873 for a discussion about how/if we can make this change permanent.
David Spickett [Wed, 31 Mar 2021 13:58:07 +0000 (14:58 +0100)]
[lldb] Add "memory tag write" command
This adds a new command for writing memory tags.
It is based on the existing "memory write" command.
Syntax: memory tag write <address-expression> <value> [<value> [...]]
(where "value" is a tag value)
(lldb) memory tag write mte_buf 1 2
(lldb) memory tag read mte_buf mte_buf+32
Logical tag: 0x0
Allocation tags:
[0xfffff7ff9000, 0xfffff7ff9010): 0x1
[0xfffff7ff9010, 0xfffff7ff9020): 0x2
The range you are writing to will be calculated by
aligning the address down to a granule boundary then
adding as many granules as there are tags.
(a repeating mode with an end address will be in a follow
up patch)
This is why "memory tag write" uses MakeTaggedRange but has
some extra steps to get this specific behaviour.
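The range calculation described above amounts to the following arithmetic. This is a sketch only; the 16-byte granule is assumed from the MTE examples in this message, and `tag_write_range` is a hypothetical name, not an lldb function.

```python
def tag_write_range(address, num_tags, granule=16):
    """Return (start, end) of the range that num_tags tags will cover.

    The address is aligned down to a granule boundary, then one granule is
    added per tag value supplied.
    """
    start = address - (address % granule)
    return start, start + num_tags * granule


# Two tags written 16 bytes before the end of a page spill into the next page.
start, end = tag_write_range(0xfffff7ffaff0, 2)
print(f"{start:#x}:{end:#x}")
```

The printed range is the one that the "not in a memory tagged region" check is applied to, rather than just the start address.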
The command does all the usual argument validation:
* Address must evaluate
* You must supply at least one tag value
(though lldb-server would just treat that as a nop anyway)
* Those tag values must be valid for your tagging scheme
(e.g. for MTE the value must be > 0 and < 0xf)
* The calculated range must be memory tagged
That last error will show you the final range, not just
the start address you gave the command.
(lldb) memory tag write mte_buf_2+page_size-16 6
(lldb) memory tag write mte_buf_2+page_size-16 6 7
error: Address range 0xfffff7ffaff0:0xfffff7ffb010 is not in a memory tagged region
(note that we do not check if the region is writeable
since lldb can write to it anyway)
The read and write tag tests have been merged into
a single set of "tag access" tests as their test programs would
have been almost identical.
(also I have renamed some of the buffers to better
show what each one is used for)
Reviewed By: omjavaid
Differential Revision: https://reviews.llvm.org/D105182
Chris Jackson [Wed, 28 Jul 2021 08:57:03 +0000 (09:57 +0100)]
Revert "[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR"
Crashes were reported on the upstream revision:
https://reviews.llvm.org/D105207
This reverts commit 796b84d26f4d461fb50e7b4e84e15a10eaca88fc.
Luna Kirkby [Wed, 28 Jul 2021 08:17:54 +0000 (10:17 +0200)]
[clang-format] Correctly attach enum braces with ShortEnums disabled
Previously, with AllowShortEnumsOnASingleLine disabled, enums that would have otherwise fit on a single line would always put the opening brace on its own line.
This patch ensures that these enums will only put the brace on its own line if the existing attachment rules indicate that it should.
Reviewed By: HazardyKnusperkeks, curdeius
Differential Revision: https://reviews.llvm.org/D99840
Muhammad Omair Javaid [Wed, 28 Jul 2021 08:26:06 +0000 (13:26 +0500)]
Revert "[LLDB] Skip HW breakpoints test_step_until on Arm/Linux"
This reverts commit ab5b8ee1a7a18fe097419e21224ac4f15591bcd7.
This caused some failure on buildbots so reverting it for now.
Muhammad Omair Javaid [Wed, 28 Jul 2021 08:17:10 +0000 (13:17 +0500)]
[LLDB] Skip HW breakpoints test_step_until on Arm/Linux
test_step_until xpasses on some machines while failing on others. I am
marking it as skipped for now.
Lang Hames [Wed, 28 Jul 2021 08:10:58 +0000 (18:10 +1000)]
[ORC] Fix missing include.
Aims to fix bot failures for some module builds, e.g.
https://green.lab.llvm.org/green/blue/organizations/jenkins/lldb-cmake/detail/lldb-cmake/33934/pipeline/
Simon Pilgrim [Wed, 28 Jul 2021 07:51:53 +0000 (08:51 +0100)]
[SLP][X86] Fix naming consistency of dot product tests. NFC.
Dmitry Vyukov [Tue, 27 Jul 2021 05:02:09 +0000 (07:02 +0200)]
Revert "sanitizers: increase .clang-format columns to 100"
This reverts commit 5d1df6d220f1d6f726d9643848679d781750db64.
There is a strong objection to this change:
https://reviews.llvm.org/D106436#2905618
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D106847
Mehdi Amini [Wed, 28 Jul 2021 06:04:27 +0000 (06:04 +0000)]
Revert "Emit strong definition for TypeID storage in Op/Type/Attributes definition"
This reverts commit b349d4c5e1852091aad97d3750e286493cac7178.
This broke a bot that exposes some missing CMake dependencies that need
to be fixed first.
RamNalamothu [Wed, 28 Jul 2021 04:14:07 +0000 (09:44 +0530)]
[AMDGPU] We would need FP if there is call and caller save VGPR spills
Since https://reviews.llvm.org/D98319, determineCalleeSavesSGPR() needs
to consider caller save VGPR spills as well while anticipating if we
require FP.
Fixes: SWDEV-295978
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D106758
Mehdi Amini [Wed, 28 Jul 2021 05:22:45 +0000 (05:22 +0000)]
Emit strong definition for TypeID storage in Op/Type/Attributes definition
By making an explicit template specialization for the TypeID provided by these classes,
the compiler will not emit an inline weak definition and rely on the linker to unique it.
Instead a single definition will be emitted in the C++ file alongside the implementation
for these classes. That will turn into a linker error what is now a hard-to-debug runtime
behavior where instances of the same class may be using a different TypeID inside of
different DSOs.
Differential Revision: https://reviews.llvm.org/D105903
Tom Stellard [Wed, 28 Jul 2021 04:51:07 +0000 (21:51 -0700)]
Bump the trunk major version to 14
and clear the release notes.
Jose M Monsalve Diaz [Wed, 28 Jul 2021 04:44:36 +0000 (23:44 -0500)]
[OpenMP] Fixing missing variables when CUDA SDK not in system
This patch fixes the error reported in D106751. When there is no CUDA SDK
installed in the system, the build fails due to missing `CU_DEVICE_ATTRIBUTE`
variables.
Using the fix suggested by @zsrkmyn.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106933
Jose M Monsalve Diaz [Wed, 28 Jul 2021 02:38:27 +0000 (22:38 -0400)]
[OpenMP][Tool] Introducing the `llvm-omp-device-info` tool
This patch introduces the `llvm-omp-device-info` tool, which uses the
omptarget library and interface to query the device info from all the
available devices as seen by OpenMP. This is inspired by PGI's `pgaccelinfo`
Since omptarget usually requires a description structure with executable
kernels, I split the initialization of the RTLs and Devices to be able to
initialize all possible devices and query each of them.
This revision relies on the patch that introduces the print device info.
A limitation is that the order in which the devices are initialized, and the
corresponding device ID, is not necessarily the one seen by OpenMP.
The changes are as follows:
1. Separate the RTL initialization that was performed in `RegisterLib` to its own `initRTLonce` function
2. Create an `initAllRTLs` method that initializes all available RTLs at runtime
3. Created the `llvm-deviceinfo.cpp` tool that uses `omptarget` to query each device and prints its information.
Example Output:
```
Device (0):
print_device_info not implemented
Device (1):
print_device_info not implemented
Device (2):
print_device_info not implemented
Device (3):
print_device_info not implemented
Device (4):
CUDA Driver Version: 11000
CUDA Device Number: 0
Device Name: Quadro P1000
Global Memory Size: 4236312576 bytes
Number of Multiprocessors: 5
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536 bytes
Max Shared Memory per Block: 49152 bytes
Registers per Block: 65536
Warp Size: 32 Threads
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647 bytes
Texture Alignment: 512 bytes
Clock Rate: 1480500 kHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: DEFAULT
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 2505000 kHz
Memory Bus Width: 128 bits
L2 Cache Size: 1048576 bytes
Max Threads Per SMP: 2048
Async Engines: Yes (2)
Unified Addressing: Yes
Managed Memory: Yes
Concurrent Managed Memory: Yes
Preemption Supported: Yes
Cooperative Launch: Yes
Multi-Device Boars: No
Compute Capabilities: 61
```
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D106752
Valentin Clement [Wed, 28 Jul 2021 02:03:45 +0000 (22:03 -0400)]
[mlir][openacc] Initial translation for DataOp to LLVM IR
Add basic translation of acc.data to LLVM IR with runtime calls.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D104301
Jim Ingham [Wed, 28 Jul 2021 01:55:17 +0000 (18:55 -0700)]
Fix a thinko in the parsing of substitutions in CommandObjectRegexCommand.
The old code incorrectly calculated the start position for the search
for the third (and subsequent) instance of a particular substitution
pattern (e.g. %1).
I also added a few test cases for this parsing covering this failure.
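The kind of loop involved can be sketched like this. It is not the lldb code: `substitute` is a hypothetical helper that replaces every `%N` occurrence and resumes each search just past the text it inserted, which is the position that matters once a third or later instance is present.

```python
def substitute(template, index, replacement):
    """Replace every occurrence of "%<index>" in template with replacement."""
    token = f"%{index}"
    out = template
    pos = out.find(token)
    while pos != -1:
        out = out[:pos] + replacement + out[pos + len(token):]
        # Resume after the inserted text so later instances are found at
        # their shifted positions and the replacement is never rescanned.
        pos = out.find(token, pos + len(replacement))
    return out


print(substitute("a %1 b %1 c %1 d %1", 1, "xyz"))
```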
Fangrui Song [Wed, 28 Jul 2021 01:57:33 +0000 (18:57 -0700)]
[mlir] Replace LLVM_ATTRIBUTE_NORETURN with C++11 [[noreturn]]
[[noreturn]] can be used since 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.
Fangrui Song [Wed, 28 Jul 2021 01:51:17 +0000 (18:51 -0700)]
[lld] Replace LLVM_ATTRIBUTE_NORETURN with [[noreturn]]
[[noreturn]] can be used since 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.
Jose M Monsalve Diaz [Wed, 28 Jul 2021 01:47:40 +0000 (21:47 -0400)]
[OpenMP][Libomptarget] Adding `print_device_info` to RTL and `omptarget`
This patch introduces a function in the device's plugin to print the
device information. This patch relates to another patch that introduces
a CLI tool to obtain the device information from the omplibrary directly.
It is inspired by PGI's pgaccelinfo.
The modifications are as follows:
1. Introduce the optional `void __tgt_rtl_print_device_info(RTLdevID)` function into the RTL.
2. Introduce the `bool __tgt_print_device_info(devID)` function into `omptarget` interface. Returns false if the RTL is not implemented
3. Added `bool printDeviceInfo(RTLDevID)` to the `DeviceTy`
4. Implement the `__tgt_rtl_print_device_info` for CUDA. Added additional CUDA Runtime calls.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106751
Jose M Monsalve Diaz [Wed, 28 Jul 2021 01:46:39 +0000 (21:46 -0400)]
[OpenMP] Folding threadLimit and numThreads when single value in kernels
The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block`
and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.
In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and
`NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions.
The code checks all the kernels, and if their attributes match, the functions are folded.
In the future we will explore specializing for multiple values of NumThreads and NumTeams.
Depends on D106390
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D106033
Matheus Izvekov [Sun, 18 Apr 2021 20:24:39 +0000 (22:24 +0200)]
[clang] NFC: change uses of `Expr->getValueKind` into `is?Value`
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D100733
George Burgess IV [Mon, 26 Jul 2021 23:56:45 +0000 (23:56 +0000)]
llvm/utils: guarantee revert_checker's revert ordering
At the moment, the revert ordering from this tool is unspecified (though
it happens to be in `git log` order, so newest reverts come first).
From the standpoint of tooling and users, this seems to be the opposite
of what we want by default: tools and users will generally try to apply
these reverts as cherry-picks. If two reverts in the list are close
enough to each other, if the reverts get applied out of order, we'll get
a merge conflict.
Rather than having `reverse`s for all tools (and mental reverses for
manual users), just guarantee an oldest-first output ordering for this
function.
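A minimal sketch of the guarantee, assuming the reverts were collected in `git log` (newest-first) order; the `Revert` tuple and `oldest_first` helper here are illustrative names, not necessarily those used in the tool.

```python
from typing import List, NamedTuple


class Revert(NamedTuple):
    sha: str           # commit that performs the revert
    reverted_sha: str  # commit being reverted


def oldest_first(reverts_in_git_log_order: List[Revert]) -> List[Revert]:
    """Turn a newest-first list into the oldest-first order callers want."""
    return list(reversed(reverts_in_git_log_order))
```

Callers that cherry-pick the results can then apply them in list order without reversing the list themselves.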
Differential Revision: https://reviews.llvm.org/D106838
Juneyoung Lee [Fri, 2 Jul 2021 12:00:43 +0000 (21:00 +0900)]
[DAGCombiner] Fold SETCC(FREEZE(x),const) to FREEZE(SETCC(x,const)) if SETCC is used by BRCOND
This patch adds a peephole optimization `SETCC(FREEZE(x),const)` => `FREEZE(SETCC(x,const))`
if the SETCC is only used by BRCOND.
Combined with `BRCOND(FREEZE(X)) => BRCOND(X)`, this leads to a nice improvement in the generated assembly when x is a masked loaded value.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D105344
Juneyoung Lee [Fri, 2 Jul 2021 11:59:30 +0000 (20:59 +0900)]
Precommit test files for D105344 (NFC)
Xiang1 Zhang [Wed, 28 Jul 2021 00:14:56 +0000 (08:14 +0800)]
[X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT
Differential Revision: https://reviews.llvm.org/D106780
Johannes Doerfert [Tue, 27 Jul 2021 23:13:56 +0000 (18:13 -0500)]
Reapply "[Attributor] Disable simplification AAs if a callback is present""
This reapplies commit cbb709e25124dc38ee593882051fc88c987fe591 and
includes the use of the lookup method instead of operator[] to avoid
accidentally setting (empty) simplification callbacks.
This reverts commit aa27430a625b2fd059707a87f8ba2df8f480ff11.
Xiang1 Zhang [Wed, 28 Jul 2021 00:09:20 +0000 (08:09 +0800)]
Revert "[X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT"
This reverts commit 6ff73efea94621e74642e4d7a15cc86a5fb6d411.
Mehdi Amini [Tue, 27 Jul 2021 20:06:27 +0000 (20:06 +0000)]
Add llvm::equal convenient wrapper for ranges around std::equal
Differential Revision: https://reviews.llvm.org/D106913
Louis Dionne [Tue, 27 Jul 2021 21:30:47 +0000 (17:30 -0400)]
[libc++] Fix a few warnings in system headers with GCC
This isn't fixing all of them, but at least it's making some progress.
Differential Revision: https://reviews.llvm.org/D106283
Xiang1 Zhang [Tue, 27 Jul 2021 00:15:43 +0000 (08:15 +0800)]
[X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT
River Riddle [Tue, 27 Jul 2021 23:58:41 +0000 (23:58 +0000)]
[PDL] Mark PatternOp as SingleBlock
This provides access to the SingleBlock accessor methods, e.g. getBody().
River Riddle [Tue, 27 Jul 2021 23:58:28 +0000 (23:58 +0000)]
[PDL] Fix the builders for OperationOp and PatternOp
Greg Clayton [Mon, 12 Jul 2021 17:03:46 +0000 (10:03 -0700)]
Create synthetic symbol names on demand to improve memory consumption and startup times.
This is a resubmission of https://reviews.llvm.org/D105160 after fixing testing issues.
This fix was created after profiling the target creation of a large C/C++/ObjC application that contained almost 4,000,000 redacted symbol names. The symbol table parsing code was creating names for each of these synthetic symbols and adding them to the name indexes. The code was also adding the object file basename to the end of the symbol name which doesn't allow symbols from different shared libraries to share the names in the constant string pool.
Prior to this fix this was creating 180MB of "___lldb_unnamed_symbol" symbol names and was taking a long time to generate each name, add them to the string pool and then add each of these names to the name index.
This patch fixes the issue by:
- not adding a name to synthetic symbols at creation time, and allowing the name to be dynamically generated when accessed
- not adding synthetic symbol names to the name indexes, but catching this special case at name lookup time. Users won't typically set breakpoints on or look up these synthetic names, but support was added to do the lookup in case it does happen
- removing the object file basename from the generated names to allow the names to be shared in the constant string pool
Prior to this fix the startup times for a large application were:
12.5 seconds (cold file caches)
8.5 seconds (warm file caches)
After this fix:
9.7 seconds (cold file caches)
5.7 seconds (warm file caches)
The names of the symbols are auto generated by appending the symbol's UserID to the end of the "___lldb_unnamed_symbol" string and is only done when the name is requested from a synthetic symbol if it has no name.
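The on-demand naming can be pictured with the sketch below. The `Symbol` class is a hypothetical stand-in, not lldb's class; only the "___lldb_unnamed_symbol" prefix and the rule of appending the symbol's UserID come from the description above.

```python
class Symbol:
    """Toy symbol whose synthetic name is built only when it is requested."""

    PREFIX = "___lldb_unnamed_symbol"

    def __init__(self, user_id, name=None, synthetic=False):
        self.user_id = user_id
        self.synthetic = synthetic
        self._name = name  # stays None for unnamed synthetic symbols

    @property
    def name(self):
        # Generate the name on access instead of storing one string per symbol.
        if self._name is None and self.synthetic:
            return f"{self.PREFIX}{self.user_id}"
        return self._name


sym = Symbol(user_id=42, synthetic=True)
print(sym.name)
```

Not caching the generated string is a deliberate choice in this sketch: it trades a little formatting work per access for not holding millions of names in memory.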
Differential Revision: https://reviews.llvm.org/D106837
Krzysztof Parzyszek [Tue, 27 Jul 2021 22:18:10 +0000 (17:18 -0500)]
[Hexagon] Fix resetting dead registers in DBG_VALUE_LISTs
This fixes https://llvm.org/PR51229.
Fangrui Song [Tue, 27 Jul 2021 23:34:32 +0000 (16:34 -0700)]
Revert "[ELF] --gc-sections: allow GC on reserved sections in a group"
clang may place dynamic initializations for explicitly specialized class
template static data members in comdat.
Such in-comdat SHT_INIT_ARRAY was an abuse but we have to work around it for a while.
LLVM GN Syncbot [Tue, 27 Jul 2021 23:10:20 +0000 (23:10 +0000)]
[gn build] Port 8a48e6dda9f7
Johannes Doerfert [Tue, 27 Jul 2021 23:06:33 +0000 (18:06 -0500)]
Revert "[Attributor] Disable simplification AAs if a callback is present"
This reverts commit cbb709e25124dc38ee593882051fc88c987fe591 as it
breaks the tests, which was not supposed to happen. Investigating now.
Johannes Doerfert [Tue, 27 Jul 2021 22:30:53 +0000 (17:30 -0500)]
[Attributor] Verify `checkForAllUses` return value properly
Also do not emit more than one remark after Heap2Stack failed.
Johannes Doerfert [Tue, 27 Jul 2021 17:54:04 +0000 (12:54 -0500)]
[OpenMP] Improve alignment handling in the new device runtime
Johannes Doerfert [Tue, 27 Jul 2021 18:44:21 +0000 (13:44 -0500)]
[Attributor] Disable simplification AAs if a callback is present
AAValueSimplify, AAValueConstantRange, and AAPotentialValues all look at
the IR by default. If queried for a IR position which has a
simplification callback we should either look at the callback return, or
give up. We do the latter for now.
zoecarver [Tue, 13 Jul 2021 18:06:10 +0000 (11:06 -0700)]
[libcxx][ranges] Add `counted_iterator`.
Differential Revision: https://reviews.llvm.org/D106205
zoecarver [Tue, 27 Jul 2021 22:46:10 +0000 (15:46 -0700)]
[libcxx][nfc] Delete `cpp20_input_iterator`'s default constructor.
This will make it conform only to the minimum requirements for an `input_iterator`.
Leonard Chan [Fri, 9 Jul 2021 22:13:04 +0000 (15:13 -0700)]
[compiler-rt][hwasan][Fuchsia] Do not emit FindDynamicShadowStart for Fuchsia
This function is unused because fuchsia does not support a dynamic shadow.
Differential Revision: https://reviews.llvm.org/D105735
James Y Knight [Tue, 27 Jul 2021 22:29:25 +0000 (18:29 -0400)]
Fix test/Transforms/LoopVectorize/AArch64/strict-fadd-vf1.ll.
It was writing to the source directory (which may not be writeable),
rather than using %t.
Fixes: a5dd6c6cf935 ("[LoopVectorize] Don't interleave scalar ordered reductions for inner loops")
Amilendra Kodithuwakku [Tue, 27 Jul 2021 21:29:25 +0000 (22:29 +0100)]
[lld][ELF] remove empty SyntheticSections from inputSections
Change removeUnusedSyntheticSections() to actually remove empty
SyntheticSections in inputSections.
In addition to doing what removeUnusedSyntheticSections() was meant
to do, this will also make the shuffle-sections tests, which shuffle
inputSections, less sensitive to empty Synthetic Sections that
will not appear in the final image.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D106427
Change-Id: I589eaf596472161a4395fb658aea0fad73318088
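As a rough illustration of removing empty synthetic sections from an input list, the sketch below uses the erase/remove_if idiom. `SketchSection` and `removeEmptySynthetic` are hypothetical names, and "empty" is assumed here to mean a size of zero; the criteria lld actually applies may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified section record (not lld's InputSectionBase).
struct SketchSection {
  bool synthetic;   // true for linker-generated sections
  std::size_t size; // size in bytes; 0 is treated as "empty" in this sketch
};

// Drop synthetic sections that would contribute nothing to the output,
// keeping the relative order of everything else.
void removeEmptySynthetic(std::vector<SketchSection> &inputSections) {
  inputSections.erase(
      std::remove_if(inputSections.begin(), inputSections.end(),
                     [](const SketchSection &s) {
                       return s.synthetic && s.size == 0;
                     }),
      inputSections.end());
}
```

With the empty entries gone, shuffling the remaining list only reorders sections that can appear in the final image.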
Nico Weber [Tue, 27 Jul 2021 22:23:28 +0000 (18:23 -0400)]
[gn build] manually port 71909de37495
Joseph Huber [Tue, 27 Jul 2021 19:00:09 +0000 (15:00 -0400)]
[Libomptarget] Revert new variable sharing to use the old method
The new method of sharing variables introduces a `__kmpc_alloc_shared` call
that cannot be removed in the middle end because of its non-constant argument
and unconnected free. This patch reverts this to the old method that used a
static amount of shared memory for sharing variables.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106905
Jinsong Ji [Tue, 27 Jul 2021 22:10:33 +0000 (22:10 +0000)]
[AIX] Update fetch_and_add type
It turns out that the AIX kernel is defining int instead of unsigned int for fetch_and_add.
Legacy XL also defines this to be signed.
https://www.ibm.com/docs/en/aix/7.2?topic=f-fetch-add-kernel-services
So update the type for compat.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D106920
Mircea Trofin [Tue, 27 Jul 2021 22:09:36 +0000 (15:09 -0700)]
[MLGO] fix silly LLVM_DEBUG misuse
Joachim Protze [Tue, 27 Jul 2021 22:08:32 +0000 (00:08 +0200)]
[OpenMP][Tests] Fix test compatibility
gcc and clang disagree in how the event handle needs to be handled.
According to OpenMP LC, gcc is right. Will open a clang bug report.
Mircea Trofin [Tue, 27 Jul 2021 22:03:47 +0000 (15:03 -0700)]
[NFC][MLGO] Debug messages for what inline advisor is selected
We already have an indication (error) if the desired inline advisor
cannot be enabled, but we don't have a positive indication. Added
LLVM_DEBUG messages for the latter.
Joachim Protze [Wed, 2 Jun 2021 15:39:22 +0000 (17:39 +0200)]
[OpenMP] Fix deadlock for detachable task with child tasks
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=49066.
For detachable tasks, the assumption breaks that the proxy task cannot have
remaining child tasks when the proxy completes.
Instead of incrementing/decrementing the incomplete task count, a high-order bit
is flipped to mark and wait for the incomplete proxy task.
Differential Revision: https://reviews.llvm.org/D101082
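The idea of marking an incomplete proxy task with a high-order bit, rather than by adjusting the child count, might look like the sketch below. All names and the bit position are hypothetical; the commit message does not say which bit the OpenMP runtime uses or how its counter is laid out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical flag bit; the value used by the real runtime is not given.
constexpr std::uint32_t kProxyIncompleteBit = 0x40000000u;

// Hypothetical counter: low bits count incomplete child tasks,
// one high-order bit records that the proxy task itself is incomplete.
struct SketchTaskCounter {
  std::atomic<std::uint32_t> state{0};

  void ChildStarted() { state.fetch_add(1); }
  void ChildFinished() { state.fetch_sub(1); }

  // Setting or clearing the flag leaves the child count untouched.
  void MarkProxyIncomplete() { state.fetch_or(kProxyIncompleteBit); }
  void MarkProxyComplete() { state.fetch_and(~kProxyIncompleteBit); }

  std::uint32_t ChildCount() const {
    return state.load() & ~kProxyIncompleteBit;
  }
  bool ProxyIncomplete() const {
    return (state.load() & kProxyIncompleteBit) != 0;
  }
  // A waiter can test a single word: zero means no children and no proxy.
  bool AllDone() const { return state.load() == 0; }
};
```

Because the flag and the count share one word, a waiter still only has to compare a single value against zero, while child tasks can come and go independently of the proxy's completion.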
Alfonso Gregory [Tue, 27 Jul 2021 21:33:06 +0000 (21:33 +0000)]
[libc] Fix strtok_r crash when src and *saveptr are both nullptr
While working on and testing my refactoring of multiple string functions in libc, I came across a bug that needs to be addressed in a patch of its own: src is checked for nullptr and assigned to *saveptr if it is nullptr. However, saveptr is initially nullptr when it comes to reentry. This could cause a problem if both saveptr and src are null; we need to do the check first and return nullptr if both are nullptr.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D106885
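A minimal sketch of the guard described above, assuming the check is on `src` and `*saveptr`. `sketch_strtok_r` is a hypothetical illustration built on the standard `strspn`/`strcspn` functions, not the LLVM libc implementation.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical reentrant tokenizer illustrating the early null check.
char *sketch_strtok_r(char *src, const char *delims, char **saveptr) {
  // Check first: with no new string and no saved position there is
  // nothing to scan, so return nullptr before touching either pointer.
  if (src == nullptr && *saveptr == nullptr)
    return nullptr;
  if (src == nullptr)
    src = *saveptr;

  // Skip leading delimiters.
  src += std::strspn(src, delims);
  if (*src == '\0') {
    *saveptr = src;
    return nullptr;
  }

  // Terminate the token and remember where to resume.
  char *token = src;
  src += std::strcspn(src, delims);
  if (*src != '\0') {
    *src = '\0';
    ++src;
  }
  *saveptr = src;
  return token;
}
```

Without the first check, a call with both pointers null would go on to dereference a null `src`.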
Alex Langford [Tue, 27 Jul 2021 21:43:40 +0000 (14:43 -0700)]
[lldb][NFC] Fix incorrect log and comment
Likely copy & paste issue that was overlooked years ago
Jose M Monsalve Diaz [Tue, 27 Jul 2021 21:20:47 +0000 (17:20 -0400)]
[OpenMP] Creating the `omp_target_num_teams` and `omp_target_thread_limit` attributes to outlined functions
The device runtime contains several calls to __kmpc_get_hardware_num_threads_in_block
and __kmpc_get_hardware_num_blocks. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.
In commit D106033 we have the optimization phase. This commit adds the attributes to
the outlined function for the grid size. The two attributes are `omp_target_num_teams` and
`omp_target_thread_limit`. These values are added as long as they are constant.
Two functions are created: `getNumThreadsExprForTargetDirective` and
`getNumTeamsExprForTargetDirective`. The original functions `emitNumTeamsForTargetDirective`
and `emitNumThreadsForTargetDirective` identify the expression and emit the code.
However, for the Device version of the outlined function, we cannot emit anything.
Therefore, this is a first attempt to separate emission of code from deduction of the
values.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106298
Jianzhou Zhao [Tue, 27 Jul 2021 18:54:55 +0000 (18:54 +0000)]
[dfsan][NFC] Describe how origin trace tracking works
Reviewed By: gbalats
Differential Revision: https://reviews.llvm.org/D106903
Siva Chandra Reddy [Mon, 26 Jul 2021 16:47:07 +0000 (16:47 +0000)]
[libc] Fix x86_64 fenv implementation for windows
All fenv functions are also enabled for windows. Since two tests,
enabled_exceptions_test and feholdexcept_test are still failing on
windows, they have been disabled.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D106808
Nemanja Ivanovic [Tue, 27 Jul 2021 20:47:44 +0000 (15:47 -0500)]
[PowerPC] Turn deprecated altivec prefetch instrs to nops on AIX
The dst/dstt/dstst/dststt instructions are nop's on all PowerPC
cores that AIX supports. The AIX assembler also does not accept
these mnemonics. Turn them into nop's on AIX (similar to dstall).
Sanjay Patel [Tue, 27 Jul 2021 20:42:51 +0000 (16:42 -0400)]
[x86] update stale code comment; NFC
The transform was generalized with: 1ce05ad619a5
Sanjay Patel [Tue, 27 Jul 2021 19:41:24 +0000 (15:41 -0400)]
[x86] add more tests for cmov and lea; NFC
River Riddle [Tue, 27 Jul 2021 20:27:37 +0000 (20:27 +0000)]
[mlir] Add a FailureOr copy constructor from a FailureOr of a convertible type.
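A constructor of this kind could be sketched as below, using `std::optional` as the storage. `SketchFailureOr` is a hypothetical, simplified class written for illustration (C++17 assumed); it is not MLIR's actual `FailureOr` definition.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>
#include <utility>

// Hypothetical, simplified result wrapper: empty means failure.
template <typename T>
class SketchFailureOr {
public:
  SketchFailureOr() = default;                           // failure state
  SketchFailureOr(T value) : value_(std::move(value)) {} // success state

  // Converting copy constructor: accepts a wrapper of any type U from
  // which T can be constructed, and preserves the failure state.
  template <typename U,
            typename = std::enable_if_t<std::is_constructible_v<T, const U &>>>
  SketchFailureOr(const SketchFailureOr<U> &other) {
    if (other.succeeded())
      value_.emplace(*other);
  }

  bool succeeded() const { return value_.has_value(); }
  const T &operator*() const { return *value_; }

private:
  std::optional<T> value_;
};
```

The `enable_if` constraint keeps the constructor out of overload resolution when the wrapped types are unrelated, so such conversions are rejected at compile time.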
River Riddle [Tue, 27 Jul 2021 20:27:24 +0000 (20:27 +0000)]
[PDL] Remove RewriteEndOp and mark RewriteOp as NoTerminator
RewriteEndOp was a fake terminator operation that is no longer needed now that blocks are not required to have terminators.
Differential Revision: https://reviews.llvm.org/D106911
Hedin Garca [Tue, 27 Jul 2021 17:14:18 +0000 (17:14 +0000)]
[libc] Enable MPFR library for math functions test
Added more math functions to the Windows entrypoints
and added a CMake option (-DLLVM_LIBC_MPFR_INSTALL_PATH)
that lets the user specify the install path where the MPFR
library was built so it can be linked. The try_compile was
moved to LLVMLibCCheckMPFR.cmake, so the variable that is
set after this process can retain its value in other files
of the same parent file. The direct reason for this is so that
LIBC_TESTS_CAN_USE_MPFR is true when the user specifies
MPFR's path and retains its value even after leaving the file.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D106894
Jim Ingham [Tue, 27 Jul 2021 20:36:21 +0000 (13:36 -0700)]
Add a test for top-level expressions using "expr --top-level".
This was broken for a while even though the Python version
continued to work. This adds a test so it doesn't regress.
Jim Ingham [Tue, 27 Jul 2021 20:33:49 +0000 (13:33 -0700)]
When calculating the "currently selected thread" in
Process::HandleStateChangedEvent, we check whether a thread stopped
for eStopReasonSignal is stopped for a signal that's currently set to
"no-stop". If it is, then we don't set that thread as the currently
selected thread.
But that only happens in the part of the algorithm that's handling the
case where the previously selected thread has no stop reason. Since we
want to keep on a thread as long as it is doing something interesting,
we always prefer the current thread. That's almost right, but we
forgot to check whether the previously selected thread stopped with an
eStopReasonSignal for a "no-stop" signal. If it did, then we shouldn't
select it.
This patch adds that check. I can't figure out a good way to test
this. This is the sort of thing that Ismail's scripted process plugin
will make easy once it is a real boy. But figuring out how to do this
in a real process is not trivial.
Differential Revision: https://reviews.llvm.org/D106712
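The selection rule described above can be sketched as follows. The types and function names are hypothetical simplifications, not the code in Process::HandleStateChangedEvent; the sketch only shows the same "no-stop" signal check being applied to the previously selected thread as well as to the candidates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, reduced set of stop reasons.
enum class StopReason { None, Breakpoint, Signal };

// Hypothetical thread record.
struct SketchThread {
  StopReason reason;
  bool signal_is_no_stop; // meaningful only when reason == Signal
};

// A stop is worth selecting unless there is no stop reason, or the
// thread stopped for a signal that is currently set to "no-stop".
bool IsInterestingStop(const SketchThread &t) {
  if (t.reason == StopReason::None)
    return false;
  if (t.reason == StopReason::Signal && t.signal_is_no_stop)
    return false;
  return true;
}

// Prefer the previously selected thread, but only if its own stop is
// interesting; otherwise pick the first interesting thread, if any.
std::size_t PickSelectedThread(const std::vector<SketchThread> &threads,
                               std::size_t current) {
  if (IsInterestingStop(threads[current]))
    return current;
  for (std::size_t i = 0; i < threads.size(); ++i)
    if (IsInterestingStop(threads[i]))
      return i;
  return current;
}
```

Routing both the current thread and the candidates through one predicate avoids the asymmetry the commit message describes.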
Jim Ingham [Fri, 23 Jul 2021 00:51:02 +0000 (17:51 -0700)]
Fix "break delete --disabled" with no arguments.
The code that figured out which breakpoints to delete was supposed
to set the result status if it found breakpoints, and then the code
that actually deleted them checked that the result's status was set.
The code for "break delete --disabled" failed to set the status if
no "protected" breakpoints were provided. This was a confusing way
to implement this, so I reworked it with early returns so it was less
error prone, and added a test case for the no arguments case.
Differential Revision: https://reviews.llvm.org/D106623
Mark de Wever [Sun, 25 Jul 2021 07:18:53 +0000 (09:18 +0200)]
[libc++] Disable incomplete library features.
Adds a new CMake option to disable the usage of incomplete headers.
These incomplete headers are not guaranteed to be ABI stable. This
option is intended to be used by vendors so they can keep their users
away from code that's not ready for production usage.
The option is enabled by default.
Differential Revision: https://reviews.llvm.org/D106763
Jacques Pienaar [Tue, 27 Jul 2021 20:32:09 +0000 (13:32 -0700)]
[mlir][bzl] Fix typo
Jinsong Ji [Tue, 27 Jul 2021 20:18:48 +0000 (20:18 +0000)]
[libclang] Check LLVM_HAVE_LINK_VERSION_SCRIPT
Some platforms might not have version script support;
don't try to use a version script on those.
Reviewed By: MaskRay, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D106914
Peter Steinfeld [Tue, 27 Jul 2021 17:40:34 +0000 (10:40 -0700)]
[flang] Disallow BOZ literal constants as output list items
According to C7109, "A boz-literal-constant shall appear only as a
data-stmt-constant in a DATA statement, or where explicitly allowed in
16.9 as an actual argument of an intrinsic procedure." This change
enforces that constraint for output list items.
I also added a general interface to determine if an expression is a BOZ
literal constant and changed all of the places I could find where it
could be used.
I also added a test.
This change stemmed from the following issue --
https://gitlab-master.nvidia.com/fortran/f18-stage/issues/108
Differential Revision: https://reviews.llvm.org/D106893
Matt Arsenault [Mon, 26 Jul 2021 22:10:28 +0000 (18:10 -0400)]
AMDGPU/GlobalISel: Fix selecting G_SEXTLOAD/G_ZEXTLOAD pre-gfx9
The m0 glue patterns were failing to import.
Matt Arsenault [Tue, 27 Jul 2021 17:06:38 +0000 (13:06 -0400)]
AMDGPU/GlobalISel: Fix wrong addrspace in test MMOs
Matt Arsenault [Tue, 27 Jul 2021 14:57:29 +0000 (10:57 -0400)]
AMDGPU/GlobalISel: Add a few tests for unaligned truncating stores
Florian Mayer [Tue, 27 Jul 2021 09:39:33 +0000 (10:39 +0100)]
[hwasan] Fix stack safety test for old PM.
With the old PM, the stub for __hwasan_generate_tag is still generated
in the IR, but never called.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D106858
Fanbo Meng [Tue, 27 Jul 2021 16:46:04 +0000 (12:46 -0400)]
[z/OS] Make MinGlobalAlign consistent with SystemZ
Remove overriding MinGlobalAlign to 0 for z/OS target to be consistent with SystemZ.
Reviewed By: abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D106890