Simon Pilgrim [Sat, 12 Nov 2022 13:45:16 +0000 (13:45 +0000)]
[MCA][X86][AVX512] Add test coverage for unsigned<->fp conversion instructions
Simon Pilgrim [Sat, 12 Nov 2022 12:39:59 +0000 (12:39 +0000)]
[X86] SkylakeServerModel - conversion instructions don't use Port015
Fixes a lot of throughput mismatches - the more complicated conversion instructions use SKXPort5+SKXPort01, not SKXPort5+SKXPort015 (SKXPort015 is mainly used for basic Logic + blend ops)
Fixing this should allow us to remove a lot of unnecessary scheduler overrides from SkylakeServerModel
Confirmed by both Agner + uops.info
Simon Pilgrim [Sat, 12 Nov 2022 12:15:56 +0000 (12:15 +0000)]
[X86] Replace unnecessary SKL CVTPD2DQ overrides with better base class defs
Also fixes some AVX missing folded instructions
Simon Pilgrim [Sat, 12 Nov 2022 10:37:33 +0000 (10:37 +0000)]
[X86] Tweak Alderlake instregex to match CodeGen-only and public scalar instruction ops
As detailed on #58792 the _Int postfix needs to be optional in the instregex to match both instructions - fixes mismatch warnings on a scheduler model verifier I'm working on
Simon Pilgrim [Sat, 12 Nov 2022 10:23:52 +0000 (10:23 +0000)]
[X86] Replace unnecessary SKL conversion overrides with better base class defs
Split various conversion instructions that use different scheduler pipes for the reg-reg and reg-mem variants (and not an additional Port23 uop for memory folding) - declare the classes separately instead of using the SKLWriteResPair helper
Brad Smith [Sat, 12 Nov 2022 09:54:41 +0000 (04:54 -0500)]
[Support/ELF] - Add OpenBSD PT_OPENBSD_MUTABLE constant.
OpenBSD commit for reference:
https://github.com/openbsd/src/commit/bd249b5664da50f0178adea78250a7a0d8ea6566
Michał Górny [Sat, 12 Nov 2022 09:34:57 +0000 (10:34 +0100)]
[lldb] [cmake] Fix another typo in third-party/unittest path
wanglei [Sat, 12 Nov 2022 08:35:59 +0000 (16:35 +0800)]
[LoongArch] Implement MCTargetExpr::fixELFSymbolsInTLSFixups hook
Reviewed By: SixWeining, MaskRay
Differential Revision: https://reviews.llvm.org/D137628
Micah Weston [Sat, 12 Nov 2022 08:51:35 +0000 (00:51 -0800)]
[clang-format] Treats &/&& as reference when followed by ',' or ')'
Ran into an issue where function declarations inside function
scopes or uses of sizeof inside a function would treat the && in
'sizeof(Type &&)' as a binary operator.
Attempt to fix this by assuming reference when followed by ',' or
')'. Also adds tests for these.
Also hit an edge case in another test that treated "and" the same
as "&&" since it parses as C++. Changed the "and" to "also" so it
is no longer a keyword.
Fixes #58923.
Differential Revision: https://reviews.llvm.org/D137755
Owen Pan [Sat, 5 Nov 2022 11:09:13 +0000 (04:09 -0700)]
[clang-format] Correctly annotate function names before attributes
Fixes #58827.
Differential Revision: https://reviews.llvm.org/D137486
Craig Topper [Sat, 12 Nov 2022 08:31:31 +0000 (00:31 -0800)]
[RISCV] Rename template parameter. NFC
Craig Topper [Sat, 12 Nov 2022 07:05:34 +0000 (23:05 -0800)]
[RISCV] Use template to reduce some code. NFC
Fangrui Song [Sat, 12 Nov 2022 06:59:51 +0000 (22:59 -0800)]
Add back single quotes when dontcall attribute was split into dontcall-error/dontcall-warn
Single quotes were accidentally dropped in D110364.
Haohai Wen [Sat, 12 Nov 2022 04:30:04 +0000 (12:30 +0800)]
[X86] Reduce unnecessary instregex for AlderlakeP schedule model
Using instregex for simple instruction opcode is much slower than
instrs. This patch replaces them with instrs.
Github issue: 35303
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D137841
Amara Emerson [Sat, 12 Nov 2022 03:54:39 +0000 (19:54 -0800)]
[AArch64][GlobalISel] Select TBZ for icmp sge x, 0.
This results in some nice size improvements on -Os CTMark:
Program                              size.__text
                                     sdag       gisel      diff
consumer-typeset/consumer-typeset    414124.00  414052.00  -0.0%
tramp3d-v4/tramp3d-v4                356840.00  356732.00  -0.0%
lencod/lencod                        427560.00  427396.00  -0.0%
7zip/7zip-benchmark                  568400.00  568172.00  -0.0%
Bullet/bullet                        455660.00  455428.00  -0.1%
mafft/pairlocalalign                 248236.00  248040.00  -0.1%
sqlite3/sqlite3                      284404.00  284176.00  -0.1%
ClamAV/clamscan                      381052.00  380604.00  -0.1%
SPASS/SPASS                          411932.00  411296.00  -0.2%
kimwitu++/kc                         439696.00  438992.00  -0.2%
Geomean difference                                         -0.1%
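A minimal sketch of why a single bit test can stand in for this compare: for a two's-complement value, signed `x >= 0` holds exactly when the sign bit is zero, which is the condition a test-bit-and-branch-if-zero on the top bit checks. The helper names below are made up for illustration and do not model the selector itself.

```python
def to_signed(x: int, bits: int = 64) -> int:
    """Interpret the low `bits` bits of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def icmp_sge_zero(x: int, bits: int = 64) -> bool:
    """Signed compare `x >= 0`, as in `icmp sge x, 0`."""
    return to_signed(x, bits) >= 0


def top_bit_is_zero(x: int, bits: int = 64) -> bool:
    """The condition a test of bit (bits - 1) against zero evaluates."""
    x &= (1 << bits) - 1
    return ((x >> (bits - 1)) & 1) == 0


# The two predicates agree on boundary values of a 64-bit register.
for value in (0, 1, 2**63 - 1, 2**63, 2**64 - 1):
    assert icmp_sge_zero(value) == top_bit_is_zero(value)
```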
Peiming Liu [Sat, 12 Nov 2022 01:00:44 +0000 (01:00 +0000)]
[mlir][sparse] fix incorrect coordinates ordering computed by the foreach operation.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D137877
Raman Tenneti [Sat, 12 Nov 2022 02:11:06 +0000 (18:11 -0800)]
Removed tabs.
Raman Tenneti [Sat, 12 Nov 2022 02:00:01 +0000 (18:00 -0800)]
[libc] Implement gettimeofday
Implement gettimeofday per
.../onlinepubs/9699919799/functions/gettimeofday.html.
This calls clock_gettime to implement the gettimeofday function.
Tested:
Limited unit test: This makes a call and checks that no error was
returned. Used nanosleep for 100 microseconds and verified it
returns a value that elapses more than 100 microseconds and less
than 300 microseconds.
Co-authored-by: Jeff Bailey <jeffbailey@google.com>
Differential Revision: https://reviews.llvm.org/D137881
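A rough sketch of the idea, not the libc code: read the realtime clock as seconds plus nanoseconds, then narrow the nanoseconds to microseconds. The function names are hypothetical, and the truncating division is an assumption about how the conversion is done.

```python
import time


def timespec_to_timeval(tv_sec: int, tv_nsec: int) -> tuple[int, int]:
    """Convert (seconds, nanoseconds) to (seconds, microseconds).

    Assumption: nanoseconds are truncated, not rounded.
    """
    return tv_sec, tv_nsec // 1000


def gettimeofday_sketch() -> tuple[int, int]:
    """gettimeofday expressed through clock_gettime (Unix only)."""
    total_ns = time.clock_gettime_ns(time.CLOCK_REALTIME)
    tv_sec, tv_nsec = divmod(total_ns, 1_000_000_000)
    return timespec_to_timeval(tv_sec, tv_nsec)
```

The microsecond field always stays below 1,000,000 because the nanosecond field is below 1,000,000,000.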
Dave Lee [Sat, 12 Nov 2022 00:19:03 +0000 (16:19 -0800)]
[lldb] Rewrite to assertEqual/assertNotEqual (NFC)
Using the more specific assert* methods results in more useful error messages.
Matt Arsenault [Fri, 26 Jun 2020 03:57:30 +0000 (23:57 -0400)]
AMDGPU: Fold llvm.amdgcn.sqrt(undef)
Louis Dionne [Thu, 10 Nov 2022 03:00:32 +0000 (17:00 -1000)]
[libc++] Add a libc++ CI pipeline specific to Clang changes
This will ensure that Clang changes get tested against libc++.
Differential Revision: https://reviews.llvm.org/D137759
Matt Arsenault [Mon, 19 Sep 2022 18:04:12 +0000 (14:04 -0400)]
Analysis: Reorder code in isDereferenceableAndAlignedPointer
GEPs should be the most common and basic case, so try that first.
Matt Arsenault [Wed, 2 Nov 2022 21:46:00 +0000 (14:46 -0700)]
WebAssembly: Remove MachineFunction reference from MFI
The MachineFunctionInfo here is a bit awkward because
WasmEHInfo is in the MachineFunction but handled from
the target code. Either everything should move into WebAssembly
or into the MachineFunction for MIR serialization.
Matt Arsenault [Sat, 12 Nov 2022 00:13:40 +0000 (16:13 -0800)]
clang: Fix unnecessary truncation of resource limit values
Ye Luo [Sat, 12 Nov 2022 00:14:33 +0000 (18:14 -0600)]
Disable OMPD tests. Causing CMake issue.
Dave Lee [Fri, 11 Nov 2022 22:32:04 +0000 (14:32 -0800)]
[lldb] Fix SBFileSpec.fullpath for Windows
Fix `fullpath` to not assume a `/` path separator. This was discovered when
D133130 failed on Windows. Use `os.path.join()` to fix the issue.
Reviewed By: mib
Differential Revision: https://reviews.llvm.org/D133366
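A small sketch of the difference, using the standard `ntpath` and `posixpath` modules so both behaviours can be seen on any host (`os.path` is one of these two, depending on the platform). The `fullpath` helper here is illustrative and is not the SBFileSpec property itself.

```python
import ntpath
import posixpath


def fullpath(dirname: str, basename: str, pathmod) -> str:
    """Join directory and file name with the separator of `pathmod`."""
    return pathmod.join(dirname, basename)


def naive_fullpath(dirname: str, basename: str) -> str:
    """The fragile variant: always assumes a `/` separator."""
    return dirname + "/" + basename


windows = fullpath(r"C:\Users\me", "a.out", ntpath)
posix = fullpath("/usr/bin", "lldb", posixpath)
mixed = naive_fullpath(r"C:\Users\me", "a.out")  # mixes `\` and `/`
```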
Michael Jones [Fri, 11 Nov 2022 21:50:01 +0000 (13:50 -0800)]
[libc] move fork into threads folder
Fork, as a thread function, should go in the threads folder.
Additionally, it depends on the thread mutex, and it was causing build
issues for targets where we don't support threads.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D137867
Lang Hames [Fri, 11 Nov 2022 21:34:29 +0000 (13:34 -0800)]
[ORC-RT][MachO] Unlock JDStatesMutex during push-initializers to avoid deadlock.
During __orc_rt_macho_jit_dlopen the ORC runtime will make a request to the JIT
to push any new initializers. Since this call may add new JD-state to the
runtime (and is expected to in general) we need to unlock the JDStatesMutex
during this operation (and similarly when running initializers and atexits, as
these may trigger push-initializers recursively).
No testcase yet: I haven't been able to reproduce the deadlock when running
llvm-jitlink in in-process mode, and we don't support out-of-process mode in
regression tests yet.
Lang Hames [Fri, 11 Nov 2022 05:32:31 +0000 (21:32 -0800)]
[ORC] Capture JD by value in MachOPlatform::pushInitializersLoop.
The lambda may run after pushInitializersLoop returns.
Tom Stellard [Fri, 11 Nov 2022 23:17:25 +0000 (15:17 -0800)]
docs: add instructions for stand-alone builds of lld
Reviewed By: kwk, MaskRay
Differential Revision: https://reviews.llvm.org/D124405
Mahesh Ravishankar [Fri, 11 Nov 2022 23:09:08 +0000 (23:09 +0000)]
[mlir][Linalg] Avoid using `tensor.cast` by default while folding `fill` with `pad`.
This is unnecessary if the generated operation type already matches
the type of the replaced value. Also use `OpFoldResult` to reduce the
number of cases the casts are needed.
Reviewed By: springerm, hanchung, antiagainst
Differential Revision: https://reviews.llvm.org/D137479
Alina Sbirlea [Fri, 11 Nov 2022 22:57:10 +0000 (14:57 -0800)]
Revert "[Hexagon] Use default attributes for intrinsics"
This reverts commit 8a8983b279dd5e4dceabe1fadbb8980b6adb88f9.
Uncovers existing regalloc issue in Hexagon backend - blocking for Halide
Hexagon users. Reverting to unblock, to be recommitted when underlying issue is resolved.
Reproducer available shortly.
Matt Arsenault [Sun, 23 Oct 2022 19:29:18 +0000 (12:29 -0700)]
llvm-diff: Add failing testcase for issue 58629
Jordan Rupprecht [Fri, 11 Nov 2022 22:47:57 +0000 (14:47 -0800)]
[NFC] Remove unused var Op
Matt Arsenault [Mon, 24 Oct 2022 17:56:18 +0000 (10:56 -0700)]
llvm-reduce: Minor code cleanups
Matt Arsenault [Mon, 24 Oct 2022 23:19:40 +0000 (16:19 -0700)]
llvm-reduce: Use DenseSet
Dave Lee [Fri, 11 Nov 2022 20:18:37 +0000 (12:18 -0800)]
[lldb] Allow flexible importing of in_call_stack
Allow `in_call_stack` to be imported in either of the following ways:
```
command script import path/to/in_call_stack.py
command script import lldb.utils.in_call_stack
```
rdar://102249295
Differential Revision: https://reviews.llvm.org/D137860
Mingming Liu [Wed, 9 Nov 2022 06:46:42 +0000 (22:46 -0800)]
[AArch64] Select BFI/BFXIL to ORR with shifted operand when one operand is the left or right shift of another operand
Use right shift [1] as an example
- Before, bfxil is generated (https://godbolt.org/z/EfzWMszPn)
- After, orr with right-shifted operand is generated (added test cases in `CodeGen/AArch64/bitfield-insert.ll`)
[1]
```
define i64 @test_orr_not_bfxil_i64(i64 %0) {
%2 = and i64 %0, 1044480 ; 0xff000
%3 = lshr i64 %2, 12
%4 = or i64 %2, %3
ret i64 %4
}
```
Differential Revision: https://reviews.llvm.org/D137689
Mingming Liu [Wed, 9 Nov 2022 06:28:07 +0000 (22:28 -0800)]
[NFC][AArch64]Precommit test cases to show ORR is better when one operand is a shift of the other operand
In `bfi-not-orr` tests, bfi/bfxil are better since they simplify away two instructions (extracting bits into destination directly)
In `orr-not-bfi` tests, orr is better: both orr and bfm would simplify away one instruction (the shl node), and orr has higher throughput and shorter latency than bfm.
Peiming Liu [Fri, 11 Nov 2022 18:51:25 +0000 (18:51 +0000)]
[mlir][sparse] fix crash when calling getTuple on non-sparse tensors.
This enables full sparse convolution codegen in D137298
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D137853
Matt Arsenault [Thu, 20 Oct 2022 21:19:59 +0000 (14:19 -0700)]
llvm-reduce: Report number of new chunks
Mengxuan Cai [Fri, 11 Nov 2022 17:42:45 +0000 (12:42 -0500)]
[LoopFuse] Ensure inner loops are in loop simplified form under new PM
LoopInfo doesn't give all loops in a loop nest, it gives top level loops
only, while isLoopSimplifyForm() only checks the outermost loop of a
loop nest. As a result, inner loops that are not in simplified form can
not be simplified with the original code.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D137672
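A toy model of the problem described above, with a hypothetical `Loop` class standing in for the real loop objects: checking only the top-level loops misses an inner loop that is not in simplified form, while walking the whole nest finds it.

```python
from dataclasses import dataclass, field


@dataclass
class Loop:
    """Hypothetical stand-in for a loop: a name, a flag, and child loops."""
    name: str
    simplified: bool = True
    sub_loops: list = field(default_factory=list)


def all_loops(top_level):
    """Yield every loop in every nest, outer loops before inner ones."""
    stack = list(reversed(top_level))
    while stack:
        loop = stack.pop()
        yield loop
        stack.extend(reversed(loop.sub_loops))


def not_simplified(loops):
    """Names of the given loops that are not in simplified form."""
    return [loop.name for loop in loops if not loop.simplified]


inner = Loop("inner", simplified=False)
outer = Loop("outer", simplified=True, sub_loops=[inner])

top_level_only = not_simplified([outer])         # misses the inner loop
whole_nest = not_simplified(all_loops([outer]))  # finds it
```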
Sanjay Patel [Fri, 11 Nov 2022 20:26:54 +0000 (15:26 -0500)]
[InstCombine] allow more folds for multi-use selects
The 'and' case showed up in a recent bug report and prevented
more follow-on transforms from happening.
We could handle more patterns (for example, the select arms
simplified, but not to constant values), but this seems
like a safe, conservative enhancement. The backend can
convert select-of-constants to math/logic in many cases
if it is profitable.
There is a lot of overlapping logic for these kinds of patterns
(see SimplifySelectsFeedingBinaryOp() and FoldOpIntoSelect()),
so there may be some opportunity to improve efficiency.
There are also optimization gaps/inconsistency because we do
not call this code for all bin-opcodes (see TODO for ashr test).
Sanjay Patel [Fri, 11 Nov 2022 16:17:07 +0000 (11:17 -0500)]
[InstCombine] add tests for binop with select operand; NFC
Jakub Kuderski [Fri, 11 Nov 2022 20:13:48 +0000 (15:13 -0500)]
[mlir][arith] Add `arith.cmpi` support to WIE
This includes both LIT tests over IR and runtime checks.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D137846
Daniel Rodríguez Troitiño [Fri, 11 Nov 2022 19:56:46 +0000 (11:56 -0800)]
[objcopy] Fix order of Mach-O LINKEDIT pieces during layout
The exports trie and the chained fixups were in the opposite order, and
function starts happened before them, instead of after them.
Restore the correct order and rewrite the code to make it easier to move
around in the future if needed by reusing the `Offset` variable and
keeping both the `StartOf...` and the size of each piece together.
This was found out while trying to use the system strip in a binary
already stripped by LLVM and receiving errors around chained fixups when
we enabled those in the linker.
Reviewed By: alexander-shaposhnikov
Differential Revision: https://reviews.llvm.org/D133974
Krzysztof Parzyszek [Tue, 8 Nov 2022 22:57:21 +0000 (14:57 -0800)]
[Hexagon] Place aligned loads closer to users
Vector alignment code was grouping all aligned loads together. In some
cases the groups could become quite large causing a lot of spill to be
generated. This will place the loads closer to where they are used,
reducing the register pressure.
Dave Lee [Fri, 11 Nov 2022 19:44:06 +0000 (11:44 -0800)]
[lldb] Don't assume name of libc++ inline namespace in LibCxxUnorderedMap
Follow up to D117383, fixing the assumption that libc++ always uses `__1` as
its inline namespace name.
Reviewed By: rupprecht
Differential Revision: https://reviews.llvm.org/D133259
Fangrui Song [Fri, 11 Nov 2022 19:53:05 +0000 (11:53 -0800)]
[LinkerWrapper] Fix -Wpessimizing-move
Joseph Huber [Tue, 25 Oct 2022 17:28:28 +0000 (12:28 -0500)]
[LinkerWrapper] Perform device linking steps in parallel
This patch changes the device linking steps to be performed in parallel
when multiple offloading architectures are being used. We use the LLVM
parallelism support to accomplish this by simply doing each individual
device linking job in a single thread. This change required re-parsing
the input arguments as these arguments have internal state that would
not be properly shared between the threads otherwise.
By default, the parallelism uses all threads available. But this can be
controlled with the `--wrapper-jobs=` option. This was required in a few
tests to ensure the ordering was still deterministic.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D136701
Dave Lee [Fri, 11 Nov 2022 19:07:37 +0000 (11:07 -0800)]
[lldb] Update regex to be less fragile in TestDataFormatterGenericUnordered
Follow up to D129386 where libc++ naming conventions were made consistent.
This changes the pattern to not rely on the internal name (`__cc` or `__cc_`),
and instead uses a pattern to check that the child has the form:
```
[0] = {
first = ...
```
Thanks to @rupprecht for pointing out this issue: https://reviews.llvm.org/D133259#3773120
Reviewed By: rupprecht
Differential Revision: https://reviews.llvm.org/D133395
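A sketch of what a structure-based pattern can look like, written with Python's `re` module. The regular expression and the sample text are assumptions made for illustration; they are not copied from the test.

```python
import re

# Match the first child by its shape: `[0] = {`, then whitespace,
# then a `first = ` member. No internal member name is involved.
CHILD_PATTERN = re.compile(r"\[0\] = \{\s+first = ")

# Hypothetical formatter output for one map element.
sample_output = '[0] = {\n    first = 1\n    second = "one"\n  }'

matches_child = CHILD_PATTERN.search(sample_output) is not None
matches_scalar = CHILD_PATTERN.search("[0] = 1") is not None
```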
Michał Górny [Fri, 11 Nov 2022 19:38:56 +0000 (20:38 +0100)]
[lldb] [cmake] Fix typo in unittest directory path
Fix a typo in a11cd0d94ed3cabf0998a0289aead05da94c86eb that resulted
in an additional "}" in the unittest directory path, e.g.:
CMake Error at cmake/modules/LLDBStandalone.cmake:104 (add_subdirectory):
add_subdirectory given source
"/var/tmp/portage/dev-util/lldb-16.0.0_pre20221111/work/lldb/../third-party}/utils/unittest"
which is not an existing directory.
Call Stack (most recent call first):
CMakeLists.txt:29 (include)
Alex Brachet [Fri, 11 Nov 2022 19:40:08 +0000 (19:40 +0000)]
Revert "[Clang][AArch64][Darwin] Enable GlobalISel by default for Darwin ARM64 platforms."
This reverts commit f64802e8d3e9db299cad913ffcb734c8d35dc5f0.
Philip Reames [Fri, 11 Nov 2022 19:10:29 +0000 (11:10 -0800)]
Add a const version of SDUse::getUser [nfc]
Sanjoy Das [Fri, 11 Nov 2022 05:31:33 +0000 (21:31 -0800)]
Model UB in integer division operations in the arith dialect
Before this commit `arith.{ceil}div{u|s}i` were marked `Pure` which is
incorrect because these operations invoke UB on certain inputs.
Fixes: https://github.com/llvm/llvm-project/issues/58700
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D137814
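For reference, a small model of the inputs in question, assuming 32-bit operands: integer division is undefined for a zero divisor, and signed division is also undefined for the minimum value divided by -1 because the result overflows. An operation with such inputs cannot safely be treated as free to hoist or speculate, which is why `Pure` does not fit. The function names are illustrative.

```python
def divsi_is_ub(lhs: int, rhs: int, bits: int = 32) -> bool:
    """True if signed division of these operands is undefined."""
    int_min = -(1 << (bits - 1))
    return rhs == 0 or (lhs == int_min and rhs == -1)


def divui_is_ub(lhs: int, rhs: int) -> bool:
    """True if unsigned division of these operands is undefined."""
    return rhs == 0


INT32_MIN = -(1 << 31)
```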
Daniel Rodríguez Troitiño [Fri, 11 Nov 2022 18:13:37 +0000 (10:13 -0800)]
[ObjectYAML] Basic support for chained fixups.
Add basic binary support for chained fixups. This allows basic tests
with chained fixups without trying to create a format for them until the
work on the Object library is considered finished.
Reviewed By: pete
Differential Revision: https://reviews.llvm.org/D134250
Aaron Ballman [Fri, 11 Nov 2022 17:22:56 +0000 (12:22 -0500)]
Fix typo; NFC
Co-authored-by: Guillot Tony <tony.guillot@protonmail.com>
Adrian Vogelsgesang [Fri, 11 Nov 2022 17:59:08 +0000 (09:59 -0800)]
Revert "[LLDB] Devirtualize coroutine promise types for `std::coroutine_handle`"
This reverts commit 558db7787005348e2efaabb628ec36f1c461a741 due to
buildbot failures on ARM:
* https://lab.llvm.org/buildbot/#/builders/96/builds/31416
* https://lab.llvm.org/buildbot/#/builders/17/builds/30086
Mingming Liu [Thu, 10 Nov 2022 20:12:21 +0000 (12:12 -0800)]
[NFC][AArch64]Call encoding functions for left-shift immediate (which is no-op in terms of value but better code style)
Call encoding functions for left-shift immediate for consistency (and
easier tracking if the encoding ever changes in the future).
Differential Revision: https://reviews.llvm.org/D137797
bixia1 [Fri, 11 Nov 2022 16:46:55 +0000 (08:46 -0800)]
[mlir][sparse] Extend more integration to run on the codegen path.
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D137850
Simon Pilgrim [Fri, 11 Nov 2022 17:39:14 +0000 (17:39 +0000)]
[X86] Split int2double and float2double scheduler classes on Haswell/Broadwell to remove overrides
Haswell/Broadwell have numerous conversion instructions that use different scheduler pipes for the reg-reg and reg-mem variants (and not an additional Port23 uop for memory folding) - so declare the classes separately instead of using the HWWriteResPair/BWWriteResPair helpers
Adrian Vogelsgesang [Wed, 24 Aug 2022 03:53:00 +0000 (20:53 -0700)]
[LLDB] Devirtualize coroutine promise types for `std::coroutine_handle`
This commit teaches the `std::coroutine_handle` pretty-printer to
devirtualize type-erased promise types. This is particularly useful to
reconstruct call stacks, either of asynchronous control flow or of
recursive invocations of `std::generator`. For the example recently
introduced by https://reviews.llvm.org/D132451, printing the `__promise`
variable now shows
```
(std::__coroutine_traits_sfinae<task, void>::promise_type) __promise = {
continuation = coro frame = 0x555555562430 {
resume = 0x0000555555556310 (a.out`task detail::chain_fn<1>() at llvm-nested-example.cpp:66)
destroy = 0x0000555555556700 (a.out`task detail::chain_fn<1>() at llvm-nested-example.cpp:66)
promise = {
continuation = coro frame = 0x5555555623e0 {
resume = 0x0000555555557070 (a.out`task detail::chain_fn<2>() at llvm-nested-example.cpp:66)
destroy = 0x0000555555557460 (a.out`task detail::chain_fn<2>() at llvm-nested-example.cpp:66)
promise = {
...
}
}
result = 0
}
}
result = 0
}
```
(shortened to keep the commit message readable) instead of
```
(std::__coroutine_traits_sfinae<task, void>::promise_type) __promise = {
continuation = coro frame = 0x555555562430 {
resume = 0x0000555555556310 (a.out`task detail::chain_fn<1>() at llvm-nested-example.cpp:66)
destroy = 0x0000555555556700 (a.out`task detail::chain_fn<1>() at llvm-nested-example.cpp:66)
}
result = 0
}
```
Note how the new debug output reveals the complete asynchronous call
stack: our own function resumes `chain_fn<1>` which in turn will resume
`chain_fn<2>` and so on. Thereby this change allows users of lldb to
inspect the logical coroutine call stack without using any custom debug
scripts, although the display is still a bit clumsy. It would be nicer
to also integrate this into lldb's backtrace feature, but I don't know
how to do so.
The devirtualization currently works by introspecting the function
pointed to by the `destroy` pointer. (The `resume` pointer is not worth
much, given that for the final suspend point `resume` is set to a
nullptr. We have to use the `destroy` pointer instead.) We then look
for a `__promise` variable inside the `destroy` function. This
`__promise` variable is synthetically generated by LLVM, and looking at
its type reveals the type-erased promise_type.
This approach only works for clang-generated code, though. While gcc
also adds a `_Coro_promise` variable to the `resume` function, it does
not do so for the `destroy` function. However, we can't use the `resume`
function, as it will be reset to a nullptr at the final suspension
point. For the time being, I am happy with de-virtualization only working
for clang. A follow-up commit will further improve devirtualization and
also expose the variables spilled to the coroutine frame. As part of
this, I will also revisit gcc support.
Differential Revision: https://reviews.llvm.org/D132624
Matt Arsenault [Fri, 11 Nov 2022 16:51:24 +0000 (08:51 -0800)]
AMDGPU: Disable some class simplifications for strictfp
bixia1 [Fri, 11 Nov 2022 07:13:20 +0000 (23:13 -0800)]
[mlir][sparse] Fix a bug in rewriting dense2dense convert op.
Permutation wasn't handled correctly. Add a test for the rewriting.
Extend an integration test to run with enable_runtime_library=false to
also test the rewriting.
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D137845
Sylvestre Ledru [Fri, 11 Nov 2022 16:36:07 +0000 (17:36 +0100)]
consistency: use spaces instead of tabs
Jordan Rupprecht [Fri, 11 Nov 2022 15:51:40 +0000 (07:51 -0800)]
[NFC] Remove unused OrigLoopID vars
Florian Hahn [Fri, 11 Nov 2022 15:39:07 +0000 (15:39 +0000)]
[LV] Remove unused OrigLoopID argument from completeLoopSkeleton (NFC).
The argument is not used any longer and can be removed.
Benjamin Maxwell [Tue, 8 Nov 2022 11:52:55 +0000 (11:52 +0000)]
Precommit for redundant and after SVE load
Zahira Ammarguellat [Mon, 7 Nov 2022 18:54:42 +0000 (13:54 -0500)]
The handling of 'funsafe-math-optimizations' doesn't update the 'MathErrno'
flag. But the driver checks for 'fno-math-errno' before passing
'funsafe-math-optimizations' to the FE. In GCC, the option
'funsafe-math-optimizations' doesn't affect the 'fmath-errno' flag.
This patch aligns clang with GCC.
'-ffast-math' sets the FPContract to 'fast'. But for 'funsafe-math-optimizations'
the driver doesn't consider the FPContract when handling the option.
Unfortunately there are places in the BE that interpret unsafe math
mode as allowing FMA. This patch makes '-ffast-math' and
'funsafe-math-optimizations' behave similarly in regard to the setting of the
FPContract.
Differential Revision: https://reviews.llvm.org/D137578
Simon Pilgrim [Fri, 11 Nov 2022 14:51:05 +0000 (14:51 +0000)]
[X86] Replace unnecessary CVTPS2DQ folded overrides with better base class defs
Broadwell just needed the load latency to be tweaked for the overrides to be unnecessary - I think this was due to Issue #38536 (underestimation of most Broadwell load latencies)
Sanjay Patel [Fri, 11 Nov 2022 13:51:13 +0000 (08:51 -0500)]
[InstSimplify] add test for fsub with inf operand; NFC
Verify that constant negation works with a partial undef vector.
Also, remove a bogus TODO comment on a related test.
Nikita Popov [Fri, 11 Nov 2022 14:05:11 +0000 (15:05 +0100)]
[MemCpyOpt] Avoid moving lifetime marker above def (PR58903)
This is unlikely to happen with opaque pointers, so just bail out
of the transform, rather than trying to move bitcasts/etc as well.
Fixes https://github.com/llvm/llvm-project/issues/58903.
Haojian Wu [Fri, 11 Nov 2022 13:54:20 +0000 (14:54 +0100)]
[include-cleaner] NFC, move the macro location fixme to findHeaders.
Sanjay Patel [Thu, 10 Nov 2022 22:35:36 +0000 (17:35 -0500)]
[InstSimplify] fold fsub nnan with Inf operand
Similar to fbc2c8f2fbbb, but if we have a non-canonical
fsub with constant operand 1, then flip the sign of the
Infinity:
https://alive2.llvm.org/ce/z/vKWfhW
If Infinity is operand 0, then the sign remains:
https://alive2.llvm.org/ce/z/73d97C
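A quick check of both directions with ordinary IEEE-754 doubles in Python. The `nnan` flag is modelled here only by restricting the other operand to values for which the result is not NaN; that restriction is an assumption of this sketch.

```python
import math

INF = math.inf


def fsub(a: float, b: float) -> float:
    """Plain IEEE-754 subtraction."""
    return a - b


for x in (0.0, -1.5, 1e308):
    # Infinity as operand 1: the sign of the Infinity is flipped.
    assert fsub(x, INF) == -INF
    assert fsub(x, -INF) == INF
    # Infinity as operand 0: the sign remains.
    assert fsub(INF, x) == INF
    assert fsub(-INF, x) == -INF

# The excluded case: Inf - Inf is NaN, which nnan rules out.
inf_minus_inf_is_nan = math.isnan(fsub(INF, INF))
```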
Haojian Wu [Fri, 11 Nov 2022 13:40:08 +0000 (14:40 +0100)]
[include-cleaner] NFC, correct a comment in PragmaIncludes::RecordPragma.
Matthias Springer [Fri, 11 Nov 2022 12:49:02 +0000 (13:49 +0100)]
[mlir][bufferize][NFC] Consolidate transform header files
Differential Revision: https://reviews.llvm.org/D137830
XingLi [Fri, 11 Nov 2022 13:22:45 +0000 (21:22 +0800)]
[compiler-rt] Mark $t* as clobbered for Linux/LoongArch syscalls
Linux/LoongArch doesn't preserve temporary registers across syscalls,
so we have to explicitly mark them as clobbered to avoid trashing local variables.
Reviewed By: xry111, xen0n, tangyouling, SixWeining
Differential Revision: https://reviews.llvm.org/D137396
Ron Lieberman [Fri, 11 Nov 2022 13:09:03 +0000 (07:09 -0600)]
for Vignesh: land changes to disable two recent ompd random fails
Differential Revision: https://reviews.llvm.org/D137831
Sam McCall [Fri, 11 Nov 2022 11:41:45 +0000 (12:41 +0100)]
[clang-include-cleaner] make SymbolLocation a real class, move FindHeaders
- replace SymbolLocation std::variant with enum-exposing version similar to
those in types.cpp. There's no appropriate implementation file, added
LocateSymbol.cpp in anticipation of locateDecl/locateMacro.
- FindHeaders is not part of the public Analysis interface, so should not
be implemented/tested there (just code organization)
- rename findIncludeHeaders->findHeaders to avoid confusion with Include concept
Differential Revision: https://reviews.llvm.org/D137825
wanglei [Fri, 11 Nov 2022 12:36:18 +0000 (20:36 +0800)]
[Clang][LoongArch] Remove duplicate declaration. NFC
Sam McCall [Fri, 11 Nov 2022 12:25:22 +0000 (13:25 +0100)]
[include-cleaner] Provide public to_string of RefType (for HTMLReport), clean up includes. NFC
Martin Storsjö [Thu, 10 Nov 2022 10:37:00 +0000 (10:37 +0000)]
[openmp] [test] Set the right calling convention for the Windows thread start function
This is required on i386 Windows; this fixes 99 testcases in that
build configuration.
Differential Revision: https://reviews.llvm.org/D137776
Martin Storsjö [Wed, 2 Nov 2022 13:35:50 +0000 (13:35 +0000)]
[openmp] [test] Use omp_testsuite.h instead of directly including pthread.h
OpenMP tests that use pthread functions include this header instead.
On Unix systems, this header includes pthread.h, while it provides
minimal implementations of the used pthread functions for Windows.
Differential Revision: https://reviews.llvm.org/D137746
Martin Storsjö [Wed, 2 Nov 2022 11:55:39 +0000 (11:55 +0000)]
[openmp] [test] Fix building the affinity/format/fields_values.c testcase on Windows
Add a missing <process.h> include for _getpid. Don't typedef the
pid_t type on mingw, as mingw headers already provide a typedef for
it.
Differential Revision: https://reviews.llvm.org/D137745
Martin Storsjö [Sun, 6 Nov 2022 22:57:07 +0000 (00:57 +0200)]
[openmp] Fix building in debug mode with mingw
Mingw doesn't provide the _malloc_dbg/_free_dbg functions.
Differential Revision: https://reviews.llvm.org/D137743
Sam McCall [Fri, 11 Nov 2022 11:10:01 +0000 (12:10 +0100)]
[include-cleaner] verbatimSpelling->verbatim, clean up some silly init-lists. NFC
Guray Ozen [Fri, 11 Nov 2022 10:57:00 +0000 (11:57 +0100)]
[mlir] Fix asan errors in gpu transform dialect
Matthias Springer [Fri, 11 Nov 2022 09:32:05 +0000 (10:32 +0100)]
[mlir][bufferize] Eliminate tensor.empty ops instead of bufferization.alloc_tensor ops
tensor.empty op elimination is an optimization that brings IR in a more bufferization-friendly form. E.g.:
```
%0 = tensor.empty()
%1 = linalg.fill(%cst, %0) {inplace = [true]}
%2 = tensor.insert_slice %1 into %t[10][20][1]
```
Is rewritten to:
```
%0 = tensor.extract_slice %t[10][20][1]
%1 = linalg.fill(%cst, %0) {inplace = [true]}
%2 = tensor.insert_slice %1 into %t[10][20][1]
```
This optimization used to operate on bufferization.alloc_tensor ops. This is not correct because the documentation of bufferization.alloc_tensor says that it always bufferizes to an allocation. Instead, this optimization should operate on tensor.empty ops, which can then be lowered to bufferization.alloc_tensor ops (if they don't get eliminated).
Differential Revision: https://reviews.llvm.org/D137162
wanglei [Fri, 11 Nov 2022 10:13:52 +0000 (18:13 +0800)]
[LoongArch] Generate PCALAU12I + JIRL instruction pair for medium codemodel
In LoongArch, when `CodeModel=Medium`, it just increases the jumping
ability of function calls relative to PC, from 2^28 to 2^32.
Depends on D137393
Reviewed By: SixWeining
Differential Revision: https://reviews.llvm.org/D137394
Dmitry Preobrazhensky [Fri, 11 Nov 2022 10:14:42 +0000 (13:14 +0300)]
[AMDGPU][MC] Disable SGPRs as src operands of VOP3 VINTRP instructions
Differential Revision: https://reviews.llvm.org/D137575
wanglei [Fri, 11 Nov 2022 01:52:26 +0000 (09:52 +0800)]
[LoongArch] Moved expansion of PseudoCALL to LoongArchPreRAExpandPseudo pass
This patch moves the expansion of the `PseudoCALL` instruction to
`LoongArchPreRAExpandPseudo` pass. This helps to expand into different
instruction sequences according to different CodeModels.
Reviewed By: SixWeining
Differential Revision: https://reviews.llvm.org/D137393
Nikita Popov [Tue, 8 Nov 2022 10:48:03 +0000 (11:48 +0100)]
[Hexagon] Use default attributes for intrinsics
This switches Hexagon intrinsics to use the default attributes
(nosync, nofree, nocallback and willreturn). Especially willreturn
is needed to prevent optimization regressions in the future.
The only intrinsics I've excluded here are the load/store locked
intrinsics, which presumably aren't nosync.
Differential Revision: https://reviews.llvm.org/D137623
Oleg Shyshkov [Thu, 10 Nov 2022 10:42:48 +0000 (11:42 +0100)]
Revert "Revert "[mlir][linalg] Replace "string" iterator_types attr with enums in LinalgInterface.""
With python code fixed.
This reverts commit 41280908e43d47903960c66237ab49caa5641b4d.
Alexander Belyaev [Fri, 11 Nov 2022 09:52:08 +0000 (10:52 +0100)]
[mlir] Fix forward the fix for incorrect Optional<ArrayAttr> usage.
Dmitry Makogon [Fri, 11 Nov 2022 09:45:22 +0000 (16:45 +0700)]
[Test] Add test for crash in IRCE when IV is AddRec for another loop
This adds a test for https://github.com/llvm/llvm-project/issues/58912.
IRCE crashes when it tries to check whether it is possible to safely
calculate the bounds of a loop with IV AddRec which is in another loop.
Alexander Belyaev [Fri, 11 Nov 2022 09:46:04 +0000 (10:46 +0100)]
[mlir] Fix incorrect access to the Optional<ArrayAttr> underlying values.
Haojian Wu [Fri, 11 Nov 2022 09:19:28 +0000 (10:19 +0100)]
[include-cleaner] Initial version for the "Location=>Header" step
This patch implements the initial version of "Location => Header" step:
- define the interface;
- integrate into the existing workflow, and use the PragmaIncludes;
Differential Revision: https://reviews.llvm.org/D137320
David Green [Fri, 11 Nov 2022 08:27:44 +0000 (08:27 +0000)]
[AArch64] Add smull sinking extract-and-splat tests and regenerate neon-vmull-high-p8.ll. NFC
Bjorn Pettersson [Tue, 8 Nov 2022 20:19:25 +0000 (21:19 +0100)]
[opt] Remove support for using -O[0|1|2|3|s|z] with legacy PM in opt
When running a default pipeline (for a specific O-level) in opt it is
now expected that the new PM should be used. Only reason to use the
legacy PM is when testing a pass that is locked to the legacy PM (or
when testing single passes, for example used by the llc backend).
If a test should run both a default pipeline plus some other passes,
the solution would be to invoke opt twice (separating the default
pipeline execution from the execution of individual passes).
Starting with this patch "opt -O0" etc. will result in an error.
Differential Revision: https://reviews.llvm.org/D137663
Guray Ozen [Thu, 10 Nov 2022 16:55:49 +0000 (17:55 +0100)]
[mlir] Introduce device mapper attribute for `thread_dim_map` and `mapped to dims`
`scf.foreach_thread` defines the mapping of its loops to processors via an integer array (see the example below). A lowering can use this mapping. However, expressing the mapping as an integer array is confusing, especially when there are multiple levels of parallelism. In addition, the op does not verify the integer array. This change introduces a device mapping attribute to make the mapping descriptive and verifiable, and makes the GPU transform dialect use it.
```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
scf.foreach_thread (%i2, %j2) in (%c1, %c2)
{...} { thread_dim_mapping = [0, 1]}
} { thread_dim_mapping = [0, 1]}
```
It first introduces `DeviceMappingInterface`, an attribute interface. `scf.foreach_thread` defines its mapping via this interface. A lowering must define its own attributes and implement this interface as well. This gives us clear validation.
The change also introduces two new attributes (`#gpu.thread<x/y/z>` and `#gpu.block<x/y/z>`). After this change, the above code prints as below, which clarifies the loop mappings. The change also implements consumption of these two new attributes by the transform dialect, which binds the outermost loops to the thread blocks and the innermost loops to threads.
```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
scf.foreach_thread (%i2, %j2) in (%c1, %c2)
{...} { thread_dim_mapping = [#gpu.thread<x>, #gpu.thread<y>]}
} { thread_dim_mapping = [#gpu.block<x>, #gpu.block<y>]}
```
Reviewed By: ftynse, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D137413