River Riddle [Wed, 4 Aug 2021 18:15:50 +0000 (18:15 +0000)]
[mlir-lsp-server] Only use one MLIRContext per MLIRTextFile
A text file may be comprised of many different "chunks", when
the input file contains the `// -----` split markers. We don't
need to use a unique MLIRContext per chunk, as having
separate contexts is intended to allow for easy unloading of
unused data and all chunks have the same lifetime (tied to the
input file). This commit uses one context for the entire file,
greatly reducing memory consumption in certain situations (up
to 70%).
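A minimal sketch of the ownership change, under stated assumptions: `Context`, `Chunk`, `TextFile`, and `splitIntoChunks` are hypothetical stand-ins rather than the mlir-lsp-server classes, and the splitting is a plain substring search for the `// -----` marker. It only illustrates that every chunk can share one context whose lifetime is tied to the file.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for mlir::MLIRContext; only its identity matters here.
struct Context {};

// One chunk of a split file. It shares the file-wide context.
struct Chunk {
  std::string text;
  std::shared_ptr<Context> context;
};

// Hypothetical stand-in for a text file managed by the server.
struct TextFile {
  std::shared_ptr<Context> context;
  std::vector<Chunk> chunks;
};

// Splits `contents` on the "// -----" marker and attaches the same
// context to every chunk instead of creating one context per chunk.
inline TextFile splitIntoChunks(const std::string &contents) {
  const std::string marker = "// -----";
  TextFile file;
  file.context = std::make_shared<Context>();
  std::size_t start = 0;
  while (true) {
    const std::size_t pos = contents.find(marker, start);
    const std::size_t len =
        pos == std::string::npos ? std::string::npos : pos - start;
    file.chunks.push_back(Chunk{contents.substr(start, len), file.context});
    if (pos == std::string::npos)
      break;
    start = pos + marker.size();
  }
  return file;
}
```

With this shape, dropping the `TextFile` releases the single context and all chunks together, which matches the "same lifetime" argument in the message.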
Differential Revision: https://reviews.llvm.org/D107488
Michael Jones [Tue, 27 Jul 2021 17:44:14 +0000 (17:44 +0000)]
[libc] add integration tests for scudo in libc
This change adds tests to make sure that SCUDO is being properly
included with llvm libc. This change also adds the toggles to properly
use SCUDO, as GWP-ASan is enabled by default and must be included for
SCUDO to function.
Reviewed By: sivachandra, hctim
Differential Revision: https://reviews.llvm.org/D106919
Fangrui Song [Wed, 4 Aug 2021 20:04:10 +0000 (13:04 -0700)]
[lld] Remove unused LLD_REPOSITORY
Remnant after D72803.
Distributions that want to customize the string can customize
LLD_VERSION_STRING instead.
Reviewed By: #lld-macho, mstorsjo, thakis
Differential Revision: https://reviews.llvm.org/D107416
Fangrui Song [Wed, 4 Aug 2021 19:45:17 +0000 (12:45 -0700)]
[CodeGen] Add -align-loops
to `lib/CodeGen/CommandFlags.cpp`. It can replace
-x86-experimental-pref-loop-alignment=.
The loop alignment is only used by MachineBlockPlacement.
The implementation uses a new `llvm::TargetOptions` for now, as
an IR function attribute/module flags metadata may be overkill.
This is the llvm part of D106701.
Michael Liao [Sun, 25 Jul 2021 03:42:26 +0000 (23:42 -0400)]
[amdgpu] Add an enhanced conversion from i64 to f32.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D107187
peter klausler [Mon, 2 Aug 2021 21:37:40 +0000 (14:37 -0700)]
[flang] runtime: For Fw.d formatting, don't oscillate forever
The algorithm for Fw.d output will drive binary to decimal conversion for
an initial fixed number of digits, then adjust that number based on the
result's exponent. For values close to a power of ten, this adjustment
process wouldn't terminate; e.g., formatting 9.999 as F10.2 would start
with 1e2, boost the digits to 2, get 9.99e1, decrease the digits, and loop.
Solve by refusing to boost the digits a second time.
Differential Revision: https://reviews.llvm.org/D107490
peter klausler [Wed, 28 Jul 2021 23:14:17 +0000 (16:14 -0700)]
[flang] Support DFLOAT legacy extension intrinsic function
Like the similar legacy extension FLOAT(), DFLOAT() represents a
conversion from default integer to DOUBLE PRECISION. Rewrite
into a conversion operation.
Differential Revision: https://reviews.llvm.org/D107489
Nikita Popov [Sun, 25 Jul 2021 15:34:17 +0000 (17:34 +0200)]
[MemCpyOpt] Relax libcall checks
Rather than blocking the whole MemCpyOpt pass if the libcalls are
not available, only disable creation of new memset/memcpy intrinsics
where only load/stores were used previously. This only affects the
store merging and load-store conversion optimization. Other
optimizations are derived from existing intrinsics, which are
well-defined in the absence of libcalls -- not having the libcalls
just means that call simplification won't convert them to intrinsics.
This is a weaker variation of D104801, which dropped these checks
entirely. Ideally we would not couple emission of intrinsics to
libcall availability at all, but as the intrinsics may be legalized
to libcalls we need to be a bit careful right now.
Differential Revision: https://reviews.llvm.org/D106769
Alfsonso Gregory [Wed, 4 Aug 2021 18:31:11 +0000 (18:31 +0000)]
[MLIR][NFC] Get DiagnosticEngine as a reference in doc
'mlir::DiagnosticEngine::DiagnosticEngine(const mlir::DiagnosticEngine&)' is implicitly deleted because the default definition would be ill-formed.
Reviewed By: rdzhabarov
Differential Revision: https://reviews.llvm.org/D107287
Arthur Eubanks [Tue, 3 Aug 2021 23:38:25 +0000 (16:38 -0700)]
[gn build] Add cfi ignorelist to compiler-rt/lib
So that building the compiler-rt target also copies the cfi ignorelist
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D107411
Alexey Bataev [Wed, 4 Aug 2021 18:49:17 +0000 (11:49 -0700)]
[SLP][NFC]Add tests for constants/undefs used in insertelements, NFC.
Giorgis Georgakoudis [Wed, 4 Aug 2021 00:15:37 +0000 (17:15 -0700)]
[OpenMPOpt] Expand SPMDization with guarding for target parallel regions
This patch expands SPMDization (converting generic execution mode to SPMD for target regions) by guarding code regions that should be executed only by the main thread. Specifically, it generates guarded regions, which only the main thread executes, and the synchronization with worker threads using simple barriers. For correctness, the patch aborts SPMDization for target regions if the same code executes in a parallel region and thus must not be guarded. This check is implemented using the ParallelLevels AA.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D106892
Craig Topper [Wed, 4 Aug 2021 18:39:21 +0000 (11:39 -0700)]
[DAGCombiner][AMDGPU] Canonicalize constants to the RHS of MULHU/MULHS.
This allows special constants like 0 to be recognized. It's also
expected by isel patterns if a target had mulh with immediate instructions.
The commuting done by tablegen won't commute patterns with immediates since it
expects DAGCombine to have done it.
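A minimal sketch of the canonicalization idea, assuming a toy node type: `Operand`, `MulHigh`, `canonicalizeConstantToRHS`, and `isMulHighByZero` are hypothetical names, not SelectionDAG APIs. Because the multiply is commutative, moving the constant to the right-hand side lets later checks look in only one place.

```cpp
#include <utility>

// Toy operand: either a constant with a known value or an opaque register.
struct Operand {
  bool isConstant;
  long value; // Meaningful only when isConstant is true.
};

// Toy commutative multiply-high node.
struct MulHigh {
  Operand lhs;
  Operand rhs;
};

// Moves a constant operand to the RHS. Returns true if the node changed.
inline bool canonicalizeConstantToRHS(MulHigh &node) {
  if (node.lhs.isConstant && !node.rhs.isConstant) {
    std::swap(node.lhs, node.rhs);
    return true;
  }
  return false;
}

// A later fold that only inspects the RHS, as pattern matchers commonly do.
inline bool isMulHighByZero(const MulHigh &node) {
  return node.rhs.isConstant && node.rhs.value == 0;
}
```

Before canonicalization a node with the zero on the left is missed by `isMulHighByZero`; after the swap it is recognized.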
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D107486
Craig Topper [Wed, 4 Aug 2021 15:49:17 +0000 (08:49 -0700)]
[RISCV] Add test cases for conditional add/sub. NFC
InstCombine canonicalizes c ? (x+y) : x to (c ? y : 0) + x. It
does the same for and/or/xor. We already reverse this transform
for those, but don't do add/sub yet.
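The identity behind the canonicalization can be checked with a small sketch. The function names are hypothetical, and unsigned arithmetic is used so that wraparound is well defined.

```cpp
#include <cstdint>

// Original form: c ? (x + y) : x
constexpr std::uint32_t selectOfAdd(bool c, std::uint32_t x, std::uint32_t y) {
  return c ? (x + y) : x;
}

// Canonical form produced by InstCombine: (c ? y : 0) + x
constexpr std::uint32_t addOfSelect(bool c, std::uint32_t x, std::uint32_t y) {
  return (c ? y : 0u) + x;
}
```

Both forms compute the same value for every input. The first needs one add on only one path, which is why a backend may prefer to reverse the transform.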
Jan Kratochvil [Wed, 4 Aug 2021 18:34:21 +0000 (20:34 +0200)]
[nfc] [lldb] Prevent needless copies of DataExtractor
lldb_private::DataExtractor contains DataBufferSP m_data_sp which is
relatively expensive to copy (due to multi-threading locking).
llvm::DataExtractor does not have this problem as it uses StringRef
instead.
The copy constructor is explicit as otherwise it is easy to make
unintended modification of a local copy instead of a caller's instance
(D107470 but that is llvm::DataExtractor).
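A minimal sketch of the explicit-copy-constructor idea, assuming a hypothetical `Extractor` class that holds a `std::shared_ptr` the way lldb_private::DataExtractor holds a DataBufferSP. It is not the lldb class; it only shows that implicit copies stop compiling while reference passing stays free of reference-count traffic.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

class Extractor {
public:
  explicit Extractor(std::shared_ptr<std::vector<std::uint8_t>> data)
      : data_(std::move(data)) {}

  // Explicit copy constructor: `Extractor b(a);` still works, but
  // `Extractor b = a;` and pass-by-value no longer compile.
  explicit Extractor(const Extractor &) = default;

  std::size_t size() const { return data_->size(); }
  long useCount() const { return data_.use_count(); }

private:
  std::shared_ptr<std::vector<std::uint8_t>> data_;
};

// Implicit (copy-initialization) copies are rejected at compile time.
static_assert(!std::is_convertible<const Extractor &, Extractor>::value,
              "implicit copies of Extractor should not compile");

// Taking a const reference avoids touching the shared reference count.
inline std::size_t byteCount(const Extractor &extractor) {
  return extractor.size();
}
```

Callers that really need a copy write `Extractor copy(original);`, which makes the cost visible at the call site.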
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D107485
Alexey Bataev [Wed, 4 Aug 2021 17:59:58 +0000 (10:59 -0700)]
Revert "[SLP]Do not emit extra shuffle for insertelements vectorization."
This reverts commit 871ea69803b1f231254ab0c560795a33b6ed0c77 to fix the
problem if the first vector is not just undef.
Mitch Phillips [Wed, 4 Aug 2021 18:03:24 +0000 (11:03 -0700)]
[hwasan] Add __hwasan_init constructor to runtime lib.
Found by an Android toolchain upgrade, inherited module constructors
(like init_have_lse_atomics from the builtins) can sneak into the hwasan
runtime. If these inherited constructors call hwasanified libc
functions, then the HWASan runtime isn't set up enough, and the code
crashes.
Mark the initializer as high-priority to fix this.
Reviewed By: pcc, yabinc
Differential Revision: https://reviews.llvm.org/D107391
Dimitry Andric [Wed, 4 Aug 2021 18:11:43 +0000 (20:11 +0200)]
Work around non-existence of ElfW(type) macro on FreeBSD
Fixes PR51331. On FreeBSD, the elf headers don't (yet) provide the
ElfW(type) macro. However, there is a similar set of macros in the
<sys/elf-generic.h> header, of which `__ElfN(type)` exactly matches the
intended purpose.
Reviewed By: gulfem
Differential Revision: https://reviews.llvm.org/D107388
Reshabh Sharma [Wed, 4 Aug 2021 18:03:31 +0000 (23:33 +0530)]
Revert "[AMDGPU] Handle functions in llvm's global ctors and dtors list"
This reverts commit d42e70b3d315645e37f3b1455d39e68678e69525.
Dawid Jurczak [Wed, 4 Aug 2021 15:39:51 +0000 (17:39 +0200)]
[DSE][NFC] Clean up DeadStoreElimination from unused variables
Differential Revision: https://reviews.llvm.org/D106446
Craig Topper [Wed, 4 Aug 2021 16:54:39 +0000 (09:54 -0700)]
[RISCV] Remove the _COMMUTABLE and _TA versions of FMA and wide FMA vector instructions.
Use a tail policy operand instead. Inspired by the work in D105092,
but without the intrinsic interface changes.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D106512
Aart Bik [Wed, 4 Aug 2021 16:54:27 +0000 (09:54 -0700)]
[mlir][sparse] add doc to sparse tensor dialect passes
completes my first pass of filling out missing doc parts on our webpage
Reviewed By: grosul1
Differential Revision: https://reviews.llvm.org/D107479
LLVM GN Syncbot [Wed, 4 Aug 2021 17:28:44 +0000 (17:28 +0000)]
[gn build] Port ee7d20e84675
Fangrui Song [Wed, 4 Aug 2021 17:28:27 +0000 (10:28 -0700)]
Jessica Paquette [Wed, 4 Aug 2021 00:40:24 +0000 (17:40 -0700)]
[AArch64][GlobalISel] Widen G_PHI before clamping it during legalization
This allows us to handle weird types like s88; we first widen to s128, then
clamp back down to s64.
https://godbolt.org/z/9xqbP46Mz
Also this makes it possible for GISel to legalize the case in pr48188.ll. It
now does the same thing as SDAG, although regalloc chooses different registers.
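The widen-then-clamp order can be sketched on bit widths alone. The helper names below are hypothetical and this is not the GlobalISel legalizer API; the 64-bit maximum is taken from the s64 mentioned above.

```cpp
// Rounds a scalar width up to the next power of two (s88 -> s128).
constexpr unsigned widenToNextPow2(unsigned bits) {
  unsigned result = 1;
  while (result < bits)
    result *= 2;
  return result;
}

// Limits a scalar width to at most `maxBits` (s128 -> s64).
constexpr unsigned clampMaxBits(unsigned bits, unsigned maxBits) {
  return bits > maxBits ? maxBits : bits;
}

// Widen first, then clamp: the clamp step only ever sees power-of-two widths.
constexpr unsigned legalizedScalarBits(unsigned bits) {
  return clampMaxBits(widenToNextPow2(bits), 64);
}
```

Widening first means the clamp divides a power-of-two width evenly (s128 into two s64 pieces), whereas clamping s88 directly would leave an awkward s24 remainder.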
Differential Revision: https://reviews.llvm.org/D107417
Jessica Paquette [Tue, 3 Aug 2021 23:42:22 +0000 (16:42 -0700)]
[AArch64][GlobalISel] Widen G_FPTO*I before clamping
Going through our legalization rules and doing some cleanup.
Widening and then clamping is usually easier than clamping and then widening.
This allows us to legalize some weird types like s88.
Differential Revision: https://reviews.llvm.org/D107413
Petr Hosek [Tue, 3 Aug 2021 17:39:24 +0000 (10:39 -0700)]
[InstrProfiling] Emit bias variable eagerly
Rather than emitting the bias variable lazily as needed, emit it
eagerly. This allows profile runtime to refer to this variable
unconditionally without having to use the weak reference. The bias
variable is in a COMDAT so there'll never be more than one instance,
and if it's not needed, linker should be able to GC it, so the overhead
should be minimal.
Differential Revision: https://reviews.llvm.org/D107377
Geoffrey Martin-Noble [Wed, 4 Aug 2021 17:01:43 +0000 (10:01 -0700)]
[Bazel] Update build for ee7d20e846
Updates the Bazel configuration for
https://github.com/llvm/llvm-project/commit/ee7d20e84675. We need to
drop the dependency from llvm-tblgen to avoid a dependency cycle:
```
.-> @llvm-project//llvm:llvm-tblgen
| @llvm-project//llvm:tblgen
| @llvm-project//llvm:MC
| @llvm-project//llvm:ProfileData
| @llvm-project//llvm:Core
| @llvm-project//llvm:attributes_gen
| @llvm-project//llvm:include/llvm/IR/Attributes.inc
| @llvm-project//llvm:attributes_gen__gen_attrs_genrule
`-- @llvm-project//llvm:llvm-tblgen
```
It appears this dep was not strictly necessary though. TableGen uses MC
headers but it can get those through Support, which also exports MC
headers due to layering issues.
Differential Revision: https://reviews.llvm.org/D107480
Andrea Di Biagio [Tue, 3 Aug 2021 16:10:42 +0000 (17:10 +0100)]
[X86][SchedModel] Add missing ReadAdvance for some arithmetic ops (PR51318 and PR51322).
This fixes a bug where implicit uses of EFLAGS were not marked as ReadAdvance in
the RM/MR variants of ADC/SBB (PR51318)
This also fixes the absence of ReadAdvance for the register operand of
RMW arithmetic instructions (PR51322).
Differential Revision: https://reviews.llvm.org/D107367
Shilei Tian [Wed, 4 Aug 2021 16:36:34 +0000 (12:36 -0400)]
[OpenMP] Clean up for hidden helper task
This patch cleans up the code for hidden helper tasks.
Reviewed By: protze.joachim
Differential Revision: https://reviews.llvm.org/D107008
Shilei Tian [Wed, 4 Aug 2021 16:34:37 +0000 (12:34 -0400)]
[OpenMP] Fix performance regression reported in bug #51235
This patch fixes the "performance regression" reported in https://bugs.llvm.org/show_bug.cgi?id=51235. In fact it has nothing to do with performance. The root cause is that the stolen task is not allowed to be executed by another thread because by default it is a tied task. Since a hidden helper task will always be executed by hidden helper threads, it should be untied.
Reviewed By: protze.joachim
Differential Revision: https://reviews.llvm.org/D107121
Fangrui Song [Wed, 4 Aug 2021 16:26:29 +0000 (09:26 -0700)]
[ELF] Fix typo. NFC
jamesluox [Wed, 4 Aug 2021 15:50:28 +0000 (08:50 -0700)]
[CSSPGO] Migrate and refactor the decoder of Pseudo Probe
Migrate the pseudo probe decoding logic from llvm-profgen to MC, so that other LLVM-based programs can reuse the existing code. Redesign the object layout of encoded and decoded pseudo probes.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D106861
Sander de Smalen [Wed, 4 Aug 2021 14:04:23 +0000 (15:04 +0100)]
[InstCombine] Fix vscale zext/sext optimization when vscale_range is unbounded.
According to the LangRef, a (vscale_range) value of 0 means unbounded.
This patch additionally cleans up the test file vscale_sext_and_zext.ll.
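A minimal sketch of treating a maximum of 0 as "unbounded", under stated assumptions: `maxVScale` and `vscaleKnownToFit` are hypothetical helpers and do not reproduce the exact InstCombine condition. The point is only that 0 must not be read as a tiny upper bound.

```cpp
#include <cstdint>
#include <optional>

// Interprets the maximum of a vscale_range attribute.
// Per the LangRef, 0 means there is no upper bound.
inline std::optional<std::uint64_t> maxVScale(std::uint64_t attrMax) {
  if (attrMax == 0)
    return std::nullopt;
  return attrMax;
}

// True only when a finite bound exists and fits in `bits` unsigned bits.
inline bool vscaleKnownToFit(std::uint64_t attrMax, unsigned bits) {
  const std::optional<std::uint64_t> max = maxVScale(attrMax);
  if (!max)
    return false; // Unbounded: nothing can be proven.
  if (bits >= 64)
    return true;
  return *max < (std::uint64_t{1} << bits);
}
```

A naive comparison such as `attrMax < 256` would wrongly accept the unbounded case, which is the kind of mistake this fix guards against.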
Bradley Smith [Mon, 26 Jul 2021 10:34:08 +0000 (10:34 +0000)]
[clang][AArch64][SVE] Avoid going through memory for fixed/scalable predicate casts
For fixed SVE types, predicates are represented using vectors of i8,
whereas for scalable types they are represented using vectors of i1. We
can avoid going through memory for casts between these by bitcasting the
i1 scalable vectors to/from a scalable i8 vector of matching size, which
can then use the existing vector insert/extract logic.
Differential Revision: https://reviews.llvm.org/D106860
Fangrui Song [Wed, 4 Aug 2021 16:06:04 +0000 (09:06 -0700)]
[ELF] Combine foo@v1 and foo with the same versionId if both are defined
Due to an assembler design flaw (IMO), `.symver foo,foo@v1` produces two symbols `foo` and `foo@v1` if `foo` is defined.
* `v1 {};` produces both `foo` and `foo@v1`, but GNU ld only produces `foo@v1`
* `v1 { foo; };` produces both `foo@@v1` and `foo@v1`, but GNU ld only produces `foo@v1`
* `v2 { foo; };` produces both `foo@@v2` and `foo@v1`, matching GNU ld. (Tested by symver.s)
This patch implements the GNU ld behavior by reusing the symbol redirection mechanism
in D92259. The new test symver-non-default.s checks the first two cases.
Without the patch, the second case will produce `foo@v1` and `foo@@v1` which
looks weird and makes foo unnecessarily default versioned.
Note: `.symver foo,foo@v1,remove` exists but the unfortunate `foo` will not go
away anytime soon.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D107235
Dmitry Vyukov [Wed, 4 Aug 2021 15:03:44 +0000 (17:03 +0200)]
tsan: remove non-existent MemoryAccessRangeStep
Probably was used for Go at some point...
Depends on D107466.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107467
Dmitry Vyukov [Wed, 4 Aug 2021 15:00:24 +0000 (17:00 +0200)]
tsan: move AccessType to tsan_defs.h
It will be needed in more functions like ReportRace
(the plan is to pass it through MemoryAccess to ReportRace)
and this move will allow splitting the huge tsan_rtl.h into parts
(e.g. move FastState/Shadow definitions to a separate header).
Depends on D107465.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107466
Dmitry Vyukov [Wed, 4 Aug 2021 14:56:22 +0000 (16:56 +0200)]
tsan: introduce kAccessExternalPC
Add the kAccessExternalPC memory access flag that denotes
memory accesses with PCs that may have kExternalPCBit set.
In preparation for MemoryAccess refactoring.
Currently unused, but will allow skipping a branch.
Depends on D107464.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107465
Dmitry Vyukov [Wed, 4 Aug 2021 14:42:05 +0000 (16:42 +0200)]
tsan: introduce kAccessFree
Add kAccessFree memory access flag (similar to kAccessVptr).
In preparation for MemoryAccess refactoring.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107464
Fangrui Song [Wed, 4 Aug 2021 16:02:11 +0000 (09:02 -0700)]
[ELF] Apply version script patterns to non-default version symbols
Currently version script patterns are ignored for .symver produced
non-default version (single @) symbols. This makes such symbols
not localizable by `local:`, e.g.
```
.symver foo3_v1,foo3@v1
.globl foo3_v1
foo3_v1:
ld.lld --version-script=a.ver -shared a.o
# In a.out, foo3@v1 is incorrectly exported.
```
This patch adds the support:
* Move `config->versionDefinitions[VER_NDX_LOCAL].patterns` to `config->versionDefinitions[versionId].localPatterns`
* Rename `config->versionDefinitions[versionId].patterns` to `config->versionDefinitions[versionId].nonLocalPatterns`
* Allow `findAllByVersion` to find non-default version symbols when `includeNonDefault` is true. (Note: `symtab` keys do not have `@@`)
* Make each pattern check both the unversioned `pat.name` and the versioned `${pat.name}@${v.name}`
* `localPatterns` can localize `${pat.name}@${v.name}`. `nonLocalPatterns` can prevent localization by assigning `verdefIndex` (before `parseSymbolVersion`).
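The "check both names" step in the list above can be sketched as follows. This is a simplification under stated assumptions: `patternMatchesSymbol` is a hypothetical helper, and it compares exact names, whereas real version script patterns may be globs.

```cpp
#include <string>

// Returns true if a version script pattern `patName` inside version node
// `versionName` applies to `symbolName`. Both the unversioned spelling and
// the non-default versioned spelling (single '@') are considered.
inline bool patternMatchesSymbol(const std::string &patName,
                                 const std::string &versionName,
                                 const std::string &symbolName) {
  return symbolName == patName ||
         symbolName == patName + "@" + versionName;
}
```

With this check, a `local:` pattern `foo3` inside `v1` also covers `foo3@v1`, while `foo3@v2` is left alone.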
---
If a user notices new `undefined symbol` errors with a version script containing
`local: *;`, the issue is likely due to a missing `global:` pattern.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D107234
Lechen Yu [Wed, 4 Aug 2021 15:17:10 +0000 (17:17 +0200)]
[openmp] Add OMPT initialization in libomptarget
When loading libomptarget, the init function in libomptarget/src/rtl.cpp
will search for the libomptarget_start_tool function using libdl.
libomptarget_start_tool will pass those OMPT callbacks related to target
constructs to libomptarget.
Differential Revision: https://reviews.llvm.org/D99803
Fangrui Song [Wed, 4 Aug 2021 15:58:50 +0000 (08:58 -0700)]
[ELF] Make dot in .tbss correct
GNU ld doesn't support multiple SHF_TLS SHT_NOBITS output sections (it restores
the address after an SHF_TLS SHT_NOBITS section, so consecutive SHF_TLS
SHT_NOBITS sections will have conflicting address ranges).
That said, `threadBssOffset` implements limited support for consecutive SHF_TLS
SHT_NOBITS sections. (SHF_TLS SHT_PROGBITS following a SHF_TLS SHT_NOBITS can still be
incorrect.)
`.` in an output section description of an SHF_TLS SHT_NOBITS section is
incorrect. (https://lists.llvm.org/pipermail/llvm-dev/2021-July/151974.html)
This patch saves the end address of the previous tbss section in
`ctx->tbssAddr`, changes `dot` in the beginning of `assignOffset` so
that `.` evaluation will be correct.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D107208
Aart Bik [Wed, 4 Aug 2021 15:53:47 +0000 (08:53 -0700)]
[mlir][sparse] fixed typo in sparse tensor type attribute alias
Reviewed By: grosul1, rriddle
Differential Revision: https://reviews.llvm.org/D107472
Bradley Smith [Thu, 22 Jul 2021 12:30:45 +0000 (12:30 +0000)]
[AArch64][SVE] Combine bitcasts of predicate types with vector inserts/extracts of loads/stores
An insert subvector that is inserting the result of a vector predicate
sized load into undef at index 0, whose result is casted to a predicate
type, can be combined into a direct predicate load. Likewise the same
applies to extract subvector but in reverse.
The purpose of this optimization is to clean up cases that will be
introduced in a later patch where casts to/from predicate types from i8
types will use insert subvector, rather than going through memory early.
This optimization is done in SVEIntrinsicOpts rather than InstCombine to
re-introduce scalable loads as late as possible, to give other
optimizations the best chance possible to do a good job.
Differential Revision: https://reviews.llvm.org/D106549
Aart Bik [Wed, 4 Aug 2021 01:49:52 +0000 (18:49 -0700)]
[mlir][amx] add doc to AMX dialect
Making sure the AMX dialect webpage reads better with a short introduction on the purpose of this dialect.
Reviewed By: grosul1, bondhugula
Differential Revision: https://reviews.llvm.org/D107419
Pushpinder Singh [Wed, 4 Aug 2021 15:10:15 +0000 (15:10 +0000)]
[AMDGPU][OpenMP] Wrap amdgcn declare variant inside ifdef
This fixes the issue https://bugs.llvm.org/show_bug.cgi?id=51337
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D107468
Simon Wallis [Wed, 4 Aug 2021 15:17:20 +0000 (16:17 +0100)]
[AArch64] Fix assert AArch64TargetLowering::ReplaceNodeResults
Don't know how to custom expand this
UNREACHABLE executed at llvm-project/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16788
The fix is to provide missing expansions for:
case ISD::STRICT_FP_TO_UINT:
case ISD::STRICT_FP_TO_SINT:
A test case is provided.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D107452
Sean Fertile [Wed, 4 Aug 2021 15:01:00 +0000 (11:01 -0400)]
[PowerPC][AIX] Packed zero-width bitfields do not affect alignment.
Zero-width bitfields on AIX pad out to the natural alignment boundary but
do not change the containing record's alignment.
Differential Revision: https://reviews.llvm.org/D106900
Jay Foad [Tue, 3 Aug 2021 16:13:02 +0000 (17:13 +0100)]
[AMDGPU] Add cttz tests and globalisel checks for ctlz
Chris Jackson [Wed, 4 Aug 2021 09:37:03 +0000 (10:37 +0100)]
[DebugInfo][LSR] Avoid crashes on large integer inputs
SCEV-based salvaging in LSR translates SCEVs to DIExpressions. SCEVs may
contain very large integers but the translation does not support
integers greater than 64 bits. This patch adds checks to ensure
conversion of these large integers is not attempted. A regression test
is added to ensure no such translation is attempted.
Reviewed by: StephenTozer
PR: https://bugs.llvm.org/show_bug.cgi?id=51329
Differential Revision: https://reviews.llvm.org/D107438
Jay Foad [Wed, 4 Aug 2021 07:39:12 +0000 (08:39 +0100)]
[AMDGPU] Generate checks for i64 to fp conversions
Differential Revision: https://reviews.llvm.org/D107429
Kazu Hirata [Wed, 4 Aug 2021 14:38:24 +0000 (07:38 -0700)]
[ADT] Drop unnecessary const from return types (NFC)
Identified with const-return-type-APInt.
Reshabh Sharma [Wed, 4 Aug 2021 14:17:07 +0000 (19:47 +0530)]
[AMDGPU] Handle functions in llvm's global ctors and dtors list
This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.
HSAStreamer will use function attributes, "device-init" and
"device-fini" to distinguish between init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.
To reduce the number of init and fini kernels, the ctors and
dtors present in the llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.
Reviewed by: yaxunl
Differential Revision: https://reviews.llvm.org/D105682
Roman Lebedev [Wed, 4 Aug 2021 14:14:57 +0000 (17:14 +0300)]
[NFC][X86] combineX86ShuffleChain(): hoist Mask variable higher up
Having `NewMask` outside of an if and rebinding `BaseMask` `ArrayRef`
to it is confusing. Instead, just move the `Mask` vector higher up,
and change the code that earlier had no access to it but now does
to use `Mask` instead of `BaseMask`.
This has no other intentional changes.
Roman Lebedev [Wed, 4 Aug 2021 14:05:05 +0000 (17:05 +0300)]
[NFC][X86] combineX86ShuffleChain(): rename inner Mask to avoid future shadowing
I want to hoist `Mask` variable higher up,
but then it would clash with this one.
So let's rename this one first.
There are no other intentional changes here other than said rename.
Tomas Matheson [Tue, 3 Aug 2021 15:15:21 +0000 (16:15 +0100)]
[ARM][atomicrmw] Fix CMP_SWAP_32 expand assert
This assert is intended to ensure that the high registers are not
selected when it is passed to one of the thumb UXT instructions. However
it was triggering even for 32 bit where no UXT instruction is emitted.
Fixes PR51313.
Differential Revision: https://reviews.llvm.org/D107363
Roman Lebedev [Wed, 4 Aug 2021 13:46:29 +0000 (16:46 +0300)]
[X86] combineX86ShuffleChain(): canonicalize mask elts picking from splats
Given a shuffle mask, if it is picking from an input that is splat
given the current granularity of the shuffle, then adjust the mask
to pick from the same lane of the input as the mask element is in.
This may result in a shuffle being simplified into a blend.
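A minimal sketch of the mask adjustment, under stated assumptions: the mask uses the usual concatenated-input convention (element `m` picks lane `m % N` of input `m / N`, negative means undef), and `canonicalizeSplatPicks` and `isBlendMask` are hypothetical helpers, not the X86 backend code.

```cpp
#include <cstddef>
#include <vector>

// For every mask element that picks from a splat input, pick the lane that
// matches the element's own position. All lanes of a splat hold the same
// value, so the shuffle result is unchanged.
inline std::vector<int>
canonicalizeSplatPicks(std::vector<int> mask,
                       const std::vector<bool> &inputIsSplat) {
  const int n = static_cast<int>(mask.size());
  for (int i = 0; i < n; ++i) {
    if (mask[static_cast<std::size_t>(i)] < 0)
      continue; // Undef element, nothing to adjust.
    const int input = mask[static_cast<std::size_t>(i)] / n;
    if (inputIsSplat[static_cast<std::size_t>(input)])
      mask[static_cast<std::size_t>(i)] = input * n + i;
  }
  return mask;
}

// A mask is a blend if every defined element stays in its own lane.
inline bool isBlendMask(const std::vector<int> &mask) {
  const int n = static_cast<int>(mask.size());
  for (int i = 0; i < n; ++i) {
    const int m = mask[static_cast<std::size_t>(i)];
    if (m >= 0 && m % n != i)
      return false;
  }
  return true;
}
```

For example, with four lanes and a splat second input, the mask `{0, 5, 2, 5}` becomes `{0, 5, 2, 7}`, which keeps every element in its own lane and is therefore a blend.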
I believe this is correct given that the splat detection matches the one
just above the new code.
My basic thought is that we might be able to get fewer regressions
by handling multiple insertions of the same value into a vector
if we form broadcasts+blend here, as opposed to D105390,
but I have not really thought this through,
and did not try implementing it yet.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D107009
Matthias Springer [Wed, 4 Aug 2021 13:48:34 +0000 (22:48 +0900)]
[mlir] Fix gcc-5 build in ViewOpGraph.cpp
Differential Revision: https://reviews.llvm.org/D107458
Matthias Springer [Wed, 4 Aug 2021 13:11:27 +0000 (22:11 +0900)]
[flang] Add missing FileSystem.h
This file was previously included transitively via `mlir/Transforms/Passes.h`, but the include has been removed from that file.
Differential Revision: https://reviews.llvm.org/D107455
David Green [Wed, 4 Aug 2021 13:21:32 +0000 (14:21 +0100)]
[RDA] Attempt to make RDA subreg aware
This attempts to make more of RDA aware of potentially overlapping
subregisters. Some of this was already in place, with it iterating
through MCRegUnitIterators. This also replaces calls to
LiveRegs.contains(..) with !LiveRegs.available(..), and updates the
isValidRegUseOf and isValidRegDefOf to search subregs.
Differential Revision: https://reviews.llvm.org/D107351
David Green [Wed, 4 Aug 2021 10:07:00 +0000 (11:07 +0100)]
[ARM] Test showing incorrect codegen when subreg liveness is enabled. NFC
Matthias Springer [Wed, 4 Aug 2021 12:37:10 +0000 (21:37 +0900)]
[mlir] Include llvm/Support/Debug.h in Transforms/Passes.h
There are many downstream users of llvm::dbgs, which is defined in Debug.h. Before D106342, many users included that dependency transitively via the now deleted ViewRegionGraph.h. Adding it back to Transforms/Passes.h for convenience.
Differential Revision: https://reviews.llvm.org/D107451
Simon Pilgrim [Wed, 4 Aug 2021 12:07:35 +0000 (13:07 +0100)]
[X86] Rename X86 tuning feature flag FeatureHasFastGather -> FeatureFastGather
Match the naming style used by the other 'FeatureFast/FeatureSlow' tuning flags.
Simon Pilgrim [Wed, 4 Aug 2021 11:41:34 +0000 (12:41 +0100)]
[X86] Move FeatureFastBEXTR from bdver2 features to tuning
Noticed while looking at the feature flag renaming suggested in D107370
Muhammad Omair Javaid [Wed, 4 Aug 2021 11:53:07 +0000 (16:53 +0500)]
[LLDB] Skip flaky tests on Arm/AArch64 Linux bots
The following LLDB tests fail randomly on LLDB Arm/AArch64 Linux buildbots.
We still do not have a reliable solution for these tests to pass
consistently. I am marking them skipped for now.
TestBreakpointCallbackCommandSource.py
TestIOHandlerResize.py
TestEditline.py
TestGuiViewLarge.py
TestGuiExpandThreadsTree.py
TestGuiBreakpoints.py
Dmitry Vyukov [Tue, 3 Aug 2021 15:18:06 +0000 (17:18 +0200)]
tsan: don't use spinning in __cxa_guard_acquire/pthread_once
Currently we use passive spinning with internal_sched_yield to wait
in __cxa_guard_acquire/pthread_once. Passive spinning tends to degrade
ungracefully under high load. Use FutexWait/Wake instead.
Depends on D107359.
Reviewed By: vitalybuka, melver
Differential Revision: https://reviews.llvm.org/D107360
Jan Svoboda [Wed, 4 Aug 2021 11:47:29 +0000 (13:47 +0200)]
[clang][deps] Substitute clang-scan-deps executable in lit tests
The lit tests for `clang-scan-deps` invoke the tool without going through the substitution system. While the test runner correctly picks up the `clang-scan-deps` binary from the build directory, it doesn't print its absolute path. When copying the invocations to reproduce test failures, this can result in `command not found: clang-scan-deps` errors or, worse yet, in picking up the system `clang-scan-deps`. This patch adds a new local `%clang-scan-deps` substitution.
Reviewed By: lxfind, dblaikie
Differential Revision: https://reviews.llvm.org/D107155
Dmitry Vyukov [Wed, 4 Aug 2021 11:29:38 +0000 (13:29 +0200)]
tsan: refactor guard_acquire/release
Introduce named consts for magic values we use.
Differential Revision: https://reviews.llvm.org/D107445
Jan Svoboda [Wed, 4 Aug 2021 11:27:25 +0000 (13:27 +0200)]
[clang][cli] Expose -fno-cxx-modules in cc1
For some use-cases, it might be useful to be able to turn off modules for C++ in `-cc1`. (The feature is implied by `-std=c++20`.)
This patch exposes the `-fno-cxx-modules` option in `-cc1`.
Reviewed By: arphaman
Differential Revision: https://reviews.llvm.org/D106864
Matthias Springer [Wed, 4 Aug 2021 11:20:48 +0000 (20:20 +0900)]
[mlir] Support drawing control-flow graphs in ViewOpGraph.cpp
* Add new pass option `print-data-flow-edges`, default value `true`.
* Add new pass option `print-control-flow-edges`, default value `false`.
* Remove `PrintCFGPass`. Same functionality now provided by
`PrintOpPass`.
Differential Revision: https://reviews.llvm.org/D106342
Dmitry Vyukov [Tue, 3 Aug 2021 14:30:08 +0000 (16:30 +0200)]
tsan: unify __cxa_guard_acquire and pthread_once implementations
Currently we effectively duplicate "once" logic for __cxa_guard_acquire
and pthread_once. Unify the implementations.
This is not a no-op change:
- constants used for pthread_once are changed to match __cxa_guard_acquire
(__cxa_guard_acquire constants are tied to ABI, but it does not seem
to be the case for pthread_once)
- pthread_once now also uses PotentiallyBlockingRegion annotations
- __cxa_guard_acquire checks thr->in_ignored_lib to skip user synchronization
It's unclear if these 2 differences are intentional or a mere sloppy inconsistency.
Since all tests still pass, let's assume the latter.
Reviewed By: vitalybuka, melver
Differential Revision: https://reviews.llvm.org/D107359
Dmitry Vyukov [Tue, 3 Aug 2021 17:01:38 +0000 (19:01 +0200)]
tsan: use DCHECK instead of CHECK in atomic functions
Atomic functions are semi-hot in profiles.
The CHECKs verify values passed by compiler
and they never fired, so replace them with DCHECKs.
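A minimal sketch of the CHECK versus DCHECK trade-off, under stated assumptions: the `SKETCH_` macros, `SKETCH_DEBUG`, and `loadSketch` are hypothetical and are not the sanitizer_common definitions. A debug-only check costs nothing in release builds because its condition is never evaluated.

```cpp
#include <cstdio>
#include <cstdlib>

// Always-on check: evaluates the condition and aborts on failure.
#define SKETCH_CHECK(cond)                                                     \
  do {                                                                         \
    if (!(cond)) {                                                             \
      std::fprintf(stderr, "CHECK failed: %s\n", #cond);                       \
      std::abort();                                                            \
    }                                                                          \
  } while (0)

// Debug-only check: identical to SKETCH_CHECK when SKETCH_DEBUG is defined,
// otherwise the condition is type-checked but never evaluated.
#ifdef SKETCH_DEBUG
#define SKETCH_DCHECK(cond) SKETCH_CHECK(cond)
#else
#define SKETCH_DCHECK(cond)                                                    \
  do {                                                                         \
    if (false) {                                                               \
      (void)(cond);                                                            \
    }                                                                          \
  } while (0)
#endif

// Hypothetical hot-path function: the memory order comes from the compiler,
// so validating it is only worth paying for in debug builds.
inline bool isValidMemoryOrder(int order) { return order >= 0 && order <= 5; }

inline int loadSketch(const int *addr, int order) {
  SKETCH_DCHECK(isValidMemoryOrder(order));
  return *addr;
}
```

In a build without `SKETCH_DEBUG`, `loadSketch` contains no validation branch at all, which is the saving the message describes for semi-hot functions.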
Reviewed By: vitalybuka, melver
Differential Revision: https://reviews.llvm.org/D107373
Dmitry Vyukov [Tue, 3 Aug 2021 16:42:52 +0000 (18:42 +0200)]
tsan: minor MetaMap tweaks
1. Add some comments.
2. Use kInvalidStackID instead of literal 0.
3. Add more LIKELY/UNLIKELY.
Reviewed By: vitalybuka, melver
Differential Revision: https://reviews.llvm.org/D107371
David Spickett [Wed, 4 Aug 2021 10:26:46 +0000 (10:26 +0000)]
[llvm][MC] Disable cfi-version test for Windows on Arm
Like Windows on x86-64, Windows on arm64 uses structured
exception handling, so we don't emit .debug_frame.
See:
https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling?view=msvc-160
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D107440
Tim Northover [Wed, 4 Aug 2021 11:09:51 +0000 (12:09 +0100)]
X86: add test for realignment fix committed earlier.
Forgot "git add" for a new file.
Jaroslav Sevcik [Wed, 4 Aug 2021 09:51:16 +0000 (11:51 +0200)]
Reland "[lldb/DWARF] Only match mangled name in full-name function lookup (with accelerators)"
Summary:
In the spirit of https://reviews.llvm.org/D70846, we only return functions with
matching mangled name from Apple/DebugNamesDWARFIndex::GetFunction if
eFunctionNameTypeFull is requested.
This speeds up lookup in the presence of a large number of class methods of the
same name (typical examples would be constructors of templates with many
instantiations or overloaded operators).
Reviewers: labath, teemperor
Reviewed By: labath, teemperor
Subscribers: aprantl, arphaman, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D73191
Matthias Springer [Wed, 4 Aug 2021 10:19:27 +0000 (19:19 +0900)]
[mlir] Fix CMake linker rules for ViewOpGraph.cpp
Differential Revision: https://reviews.llvm.org/D107439
Serge Pavlov [Wed, 4 Aug 2021 10:18:15 +0000 (17:18 +0700)]
Revert "Introduce intrinsic llvm.isnan"
This reverts commit
16ff91ebccda1128c43ff3cee104e2c603569fb2.
Several errors were reported, mainly in test-suite execution time. Reverted
for investigation.
Simon Pilgrim [Wed, 4 Aug 2021 10:16:23 +0000 (11:16 +0100)]
[X86] Split Subtarget ISA / Security / Tuning Feature Flags Definitions. NFC
Our list of slow/fast tuning feature flags has become pretty extensive and is randomly interleaved with ISA and Security (Retpoline etc.) flags, not even based on when the ISAs/flags were introduced, making it tricky to locate them. Plus we started treating tuning flags separately some time ago, so this patch tries to group the flags to match.
I've left them mostly in the same order within each group - I'm happy to rearrange them further if there are specific ISA or Tuning flags that you think should be kept closer together.
Differential Revision: https://reviews.llvm.org/D107370
Kim-Anh Tran [Wed, 4 Aug 2021 07:16:10 +0000 (09:16 +0200)]
[lldb] Fix lookup of .debug_loclists with split-dwarf
This patch fixes the lookup of locations in
.debug_loclists, if they are split in a .dwp file.
Mainly, we need to consider the cu index offsets.
Reviewed By: jankratochvil
Differential Revision: https://reviews.llvm.org/D107161
David Spickett [Tue, 3 Aug 2021 15:16:41 +0000 (15:16 +0000)]
[llvm][ExecutionEngine] Don't try to run tests on ARM64/Windows on Arm
We use CMAKE_SYSTEM_PROCESSOR to set the host_arch lit feature.
This is going to be the same value as CMAKE_HOST_SYSTEM_PROCESSOR,
which on Windows is set to the value of the PROCESSOR_ARCHITECTURE
environment variable.
https://cmake.org/cmake/help/latest/variable/CMAKE_HOST_SYSTEM_PROCESSOR.html#cmake-host-system-processor
On Windows on Arm this is "ARM64", not "AArch64" as we currently
look for.
https://docs.microsoft.com/en-us/windows/win32/winprog64/wow64-implementation-details#environment-variables
Add ARM64 to the unsupported list.
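The check described above can be sketched as follows. This is only an illustration: the helper name and the set contents are assumptions, and a real lit configuration file would typically set `config.unsupported` rather than return a value.

```python
# Hypothetical list of host_arch values for which the tests are skipped.
# "AArch64" is the spelling that was already looked for; "ARM64" is what
# PROCESSOR_ARCHITECTURE reports on Windows on Arm.
UNSUPPORTED_HOST_ARCHS = {"AArch64", "ARM64"}


def is_unsupported_host(host_arch):
    """Return True if tests should be skipped for this host_arch value."""
    return host_arch in UNSUPPORTED_HOST_ARCHS
```

With only "AArch64" in the set, a Windows on Arm host reporting "ARM64" would not be recognised and the tests would still run.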
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D107361
Raphael Isemann [Wed, 4 Aug 2021 08:59:53 +0000 (10:59 +0200)]
[lldb] Partly revert "Allow range-based for loops over DWARFDIE's children"
As pointed out in D107434 by Walter, D103172 also changed two for loops that
were actually not just iterating over some DIEs but also using the iteration
variable later on for some other things. This patch reverts the respective
faulty parts of D103172.
Tim Northover [Wed, 4 Aug 2021 08:24:39 +0000 (09:24 +0100)]
X86: fix frame offset calculation with mandatory tail calls
If there's a region of the stack reserved for potential tail call arguments
(only the case when we guarantee tail calls will be honoured), this is right
next to the incoming stored return address, not necessarily next to the
callee-saved area, so combining the two into a single figure leads to incorrect
offsets in some edge cases.
Serge Pavlov [Wed, 4 Aug 2021 08:27:49 +0000 (15:27 +0700)]
Introduce intrinsic llvm.isnan
Clang has the builtin function '__builtin_isnan', which implements the C
library function 'isnan'. This function is currently implemented entirely in
clang codegen, which expands the function into a set of IR operations.
There are three mechanisms by which the expansion can be made.
* The most common mechanism is using an unordered comparison made by
the instruction 'fcmp uno'. This simple solution is target-independent
and works well in most cases. It is however not suitable if floating
point exceptions are tracked. The corresponding IEEE 754 operation and C
function must never raise an FP exception, even if the argument is a
signaling NaN. Compare instructions usually do not have this
property; they raise an 'invalid' exception in such a case. So this
mechanism is unsuitable when exception behavior is strict. In
particular, it could result in unexpected trapping if the argument is an SNaN.
* Another solution was implemented in https://reviews.llvm.org/D95948.
It is used in the cases when raising FP exceptions by 'isnan' is not
allowed. This solution implements 'isnan' using integer operations.
It solves the problem of exceptions, but offers a single solution for all
targets, even though some can do the check in a more efficient way.
* The solution implemented by https://reviews.llvm.org/D96568 introduced a
hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
specific code into IR. Currently only SystemZ implements this hook, and it
generates a call to a target specific intrinsic function.
Although these mechanisms allow 'isnan' to be implemented with sufficient
efficiency, expanding 'isnan' in clang has drawbacks:
* The operation 'isnan' is hidden behind generic integer operations or
target-specific intrinsics. This complicates analysis and can prevent
some optimizations.
* IR can be created by tools other than clang; in this case the treatment
of 'isnan' has to be duplicated in that tool.
Another issue with the current implementation of 'isnan' comes from the
use of the options '-ffast-math' or '-fno-honor-nans'. If such an option is
specified, 'fcmp uno' may be optimized to 'false'. It is a valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':
std::isnan(std::numeric_limits<float>::quiet_NaN())
The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such a function returns the expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
at the expense of manual treatment of corner cases. If 'isnan' returns
the assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.
To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by the IEEE-754
and C standards in a target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where it is lowered in a target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies the requested semantics.
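As an illustration of the integer-based check mentioned above, here is a minimal sketch for IEEE-754 doubles. It is not the code from D95948 or from this change; the function name is hypothetical and the sketch only shows the general idea of inspecting the bit pattern instead of comparing floating-point values.

```python
import struct


def isnan_bits(x):
    """Check a double for NaN using only integer operations on its bits."""
    # Reinterpret the 64-bit IEEE-754 encoding as an unsigned integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    exponent = (bits >> 52) & 0x7FF       # 11 exponent bits
    mantissa = bits & ((1 << 52) - 1)     # 52 fraction bits
    # NaN: exponent is all ones and the fraction is non-zero.
    # (All ones with a zero fraction encodes an infinity.)
    return exponent == 0x7FF and mantissa != 0
```

Because no floating-point comparison is involved, a check of this shape cannot raise an 'invalid' exception for a signaling NaN, and it is not affected by an assumption that operands are never NaN.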
Differential Revision: https://reviews.llvm.org/D104854
Andre Vieira [Wed, 4 Aug 2021 08:17:12 +0000 (09:17 +0100)]
[libc] Fix Memory Benchmarks code after rename
Differential Revision: https://reviews.llvm.org/D107376
Sjoerd Meijer [Tue, 3 Aug 2021 19:42:09 +0000 (20:42 +0100)]
[FuncSpec] Support specialising recursive functions
This adds support for specialising recursive functions. For example:
int Global = 1;
void recursiveFunc(int *arg) {
  if (*arg < 4) {
    print(*arg);
    int tmp = *arg + 1;
    recursiveFunc(&tmp);
  }
}
void main() {
  recursiveFunc(&Global);
}
After 3 iterations of function specialisation, followed by inlining of the
specialised versions of recursiveFunc, the main function looks like this:
void main() {
  print(1);
  print(2);
  print(3);
}
To support this, the following has been added:
- Update the solver and state of the new specialised functions,
- An optimisation to propagate constant stack values after each iteration of
function specialisation, which is necessary for the next iteration to
recognise the constant values and trigger.
Specialising recursive functions is (at the moment) controlled by option
-func-specialization-max-iters and is opt-in for compile-time reasons. I.e.,
the default is -func-specialization-max-iters=1, but for the example above we
would need to use -func-specialization-max-iters=3. Future work is to see if we
can increase the default, or improve the cost-model/heuristics to control
compile-times.
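The equivalence claimed for the example can be checked with a small model. This is only a sketch of the program's behaviour, not of the pass itself: the Python names are hypothetical, the pointer argument is modelled as a plain integer, and `print` is modelled by appending to a list.

```python
def recursive_func(arg, out):
    # Mirrors the example: emit the value and recurse while it is below 4.
    if arg < 4:
        out.append(arg)
        recursive_func(arg + 1, out)


def main_original():
    out = []
    recursive_func(1, out)  # Global is 1 in the example
    return out


def main_specialised():
    # The straight-line form main is described as reaching after three
    # rounds of specialisation followed by inlining.
    return [1, 2, 3]
```

Each round makes one more call site see a constant argument, which is why the example needs three iterations rather than the default of one.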
Differential Revision: https://reviews.llvm.org/D106426
Senran Zhang [Wed, 4 Aug 2021 06:59:09 +0000 (23:59 -0700)]
[Support] Initialize common options in `getRegisteredOptions`
This allows users to access options in libSupport before invoking
`cl::ParseCommandLineOptions`, and also matches the behavior before
D105959.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D106334
Adrian Kuegel [Wed, 4 Aug 2021 06:49:30 +0000 (08:49 +0200)]
[mlir][Bazel] Adjust BUILD.bazel file.
The dependency is needed after
1b00b94ffc2d60
Differential Revision: https://reviews.llvm.org/D107426
Esme-Yi [Wed, 4 Aug 2021 06:28:26 +0000 (06:28 +0000)]
[llvm-readobj][XCOFF] dump the string table only if the size is bigger than 4.
Stephen Neuendorffer [Tue, 1 Jun 2021 05:32:49 +0000 (22:32 -0700)]
[mlir] Handle cases where transfer_read should turn into a scalar load
The existing vector transforms reduce the dimension of transfer_read
ops. However, beyond a certain point, the vector op actually has
to be reduced to a scalar load, since we can't load a zero-dimension
vector. This handles this case.
Note that in the longer term, it may be preferable to support
zero-dimension vectors. See
https://llvm.discourse.group/t/should-we-have-0-d-vectors/3097.
Differential Revision: https://reviews.llvm.org/D103432
hsmahesha [Wed, 4 Aug 2021 04:08:55 +0000 (09:38 +0530)]
[AMDGPU] Ignore call graph node which does not have function info.
While collecting reachable callees (from kernels), ignore any call graph node that
has no associated function or whose associated function is not a definition.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D107329
Senran Zhang [Wed, 4 Aug 2021 04:52:14 +0000 (21:52 -0700)]
[NFC][ConstantFold] Check getAggregateElement before getSplatValue call
Constant::getSplatValue has O(N) time complexity in the worst case,
where N is the # of elements in a vector. So we call
Constant::getAggregateElement first and return earlier if possible to
avoid unnecessary getSplatValue calls.
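The ordering idea, trying the cheap lookup before the linear scan, can be sketched with a toy model. None of this is LLVM code: the `Splat` class, the function names, and the rule for when the per-element lookup returns nothing are all assumptions made for illustration.

```python
class Splat:
    """Toy vector described only as one repeated value (no element storage)."""

    def __init__(self, value):
        self.value = value


def get_aggregate_element(vec, idx):
    """Cheap per-element lookup; returns None when it is not available."""
    if isinstance(vec, list):
        return vec[idx]
    return None


def get_splat_value(vec):
    """Possibly O(N): the common value if all elements agree, else None."""
    if isinstance(vec, Splat):
        return vec.value
    if vec and all(v == vec[0] for v in vec):
        return vec[0]
    return None


def fold_extract(vec, idx):
    # Try the cheap lookup first and only fall back to the scan if needed.
    elt = get_aggregate_element(vec, idx)
    if elt is not None:
        return elt
    return get_splat_value(vec)
```

For an ordinary list the scan is never reached, which is the saving the commit message describes.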
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D107252
Matthias Springer [Wed, 4 Aug 2021 04:10:42 +0000 (13:10 +0900)]
[mlir] Fix broken build in pass_manager.py
This test ensures that an error is generated from the Python side when running a module pass on a function. The test used to instantiate ViewOpGraph, however, this pass was changed into a general "any op" pass in D106253. Therefore, a different pass must be used in this test.
Differential Revision: https://reviews.llvm.org/D107424
Heejin Ahn [Mon, 2 Aug 2021 01:30:18 +0000 (18:30 -0700)]
[WebAssembly] Misc. cosmetic changes in EH (NFC)
- Rename `wasm.catch` intrinsic to `wasm.catch.exn`, because we are
planning to add a separate `wasm.catch.longjmp` intrinsic which
returns two values.
- Rename several variables
- Remove an unnecessary parameter from `canLongjmp` and `isEmAsmCall`
from LowerEmscriptenEHSjLj pass
- Add `-verify-machineinstrs` in a test as a safety measure
- Add more comments + fix some errors in comments
- Replace `std::vector` with `SmallVector` for cases likely to have a small
number of elements
- Rename `EnableEH`/`EnableSjLj` to `EnableEmEH`/`EnableEmSjLj`: We are
soon going to add `EnableWasmSjLj`, so this makes the distinction
clearer
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D107405
Arthur Eubanks [Mon, 2 Aug 2021 22:33:07 +0000 (15:33 -0700)]
[MC][CodeGen] Emit constant pools earlier
Previously we would emit constant pool entries for ldr inline asm at the
very end of AsmPrinter::doFinalization(). However, if we're emitting
dwarf aranges, that would end all sections with aranges. Then if we have
constant pool entries to be emitted in those same sections, we'd hit an
assert that the section has already been ended.
We want to emit constant pool entries before emitting dwarf aranges.
This patch splits out arm32/64's constant pool entry emission into its
own MCTargetStreamer virtual method.
Fixes PR51208
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D107314
Matthias Springer [Wed, 4 Aug 2021 02:57:44 +0000 (11:57 +0900)]
[mlir] Truncate/skip long strings in ViewOpGraph.cpp
* New pass option `max-label-len`: Truncate attributes/result types that have more #chars.
* New pass option `print-attrs`: Activate/deactivate rendering of attributes.
* New pass option `print-result-types`: Activate/deactivate rendering of result types.
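A minimal sketch of the truncation behaviour behind `max-label-len` might look like this. The function name and the "..." suffix are assumptions for illustration, not taken from ViewOpGraph.cpp.

```python
def truncate_label(text, max_label_len):
    """Keep the first max_label_len characters; append '...' if text was cut."""
    if len(text) <= max_label_len:
        return text
    return text[:max_label_len] + "..."
```

For example, `truncate_label("abcdef", 3)` gives `"abc..."`, while a string that already fits is returned unchanged.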
Differential Revision: https://reviews.llvm.org/D106337
Vitaly Buka [Tue, 3 Aug 2021 20:14:52 +0000 (13:14 -0700)]
[llvm-readobj][XCOFF] Warn about invalid offset
Followup for D105522
Differential Revision: https://reviews.llvm.org/D107398
Jacob Hegna [Wed, 4 Aug 2021 03:08:02 +0000 (03:08 +0000)]
[MLGO] Update the current model url for the Oz inliner model.
Matthias Springer [Wed, 4 Aug 2021 02:47:34 +0000 (11:47 +0900)]
[mlir] Improve Graphviz visualization in PrintOpPass
* Visualize blocks and regions as subgraphs.
* Generate DOT file directly instead of using `GraphTraits`. `GraphTraits` does not support subgraphs.
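Writing DOT text directly makes nested subgraphs straightforward. The sketch below is a hypothetical illustration rather than the pass's implementation: it takes a mapping from region name to node names and emits one `cluster` subgraph per region (Graphviz draws subgraphs whose names start with `cluster` as boxes).

```python
def emit_dot(regions):
    """Render {region_name: [node_names]} as DOT with one subgraph per region."""
    lines = ["digraph G {"]
    for i, (name, nodes) in enumerate(regions.items()):
        lines.append(f"  subgraph cluster_{i} {{")
        lines.append(f'    label = "{name}";')
        for node in nodes:
            lines.append(f'    "{node}";')
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)
```

The returned string can be written to a `.dot` file and rendered with the usual Graphviz tools.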
Differential Revision: https://reviews.llvm.org/D106253