review.tizen.org Git - platform/upstream/llvm.git/log

[mlir] LLVM import: handle function-typed constants

The current implementation of the LLVM-to-MLIR translation could not handle
functions used as constant values in instructions. The handling is added
trivially as `llvm.mlir.constant` can define constants of function type using
SymbolRef attributes, which works even for functions that have not been
declared yet.

GlobalISel: Implement lower for G_BITCAST

Bitcast only really applies between scalars and vectors. Implement as
an unmerge and remerge. The test needs to tolerate failure since one
of the unmerges currently fails to legalize.

AMDGPU: Partially directly select llvm.amdgcn.interp.p1.f16

The 16 bank LDS case is complicated due to using multiple
instructions. If I attempt to write a pattern for it, the generated
selector incorrectly places the copy to m0 after the first
instruction, so that needs to be separately addressed.

Also fix not gluing the copy to m0 to the second operation in the
second half of the 16 bank lowering.

GlobalISel: Fix narrowScalar for G_ANYEXT results

This is nearly the same as G_ZEXT.

TableGen: Delete some copy constructors

Some register related machinery relies on uniqued, static pointers for
register classes and subregisters, so try to make sure these are never
copied.

TableGen/GlobalISel: Don't take reference to temporary values

These return temporary Optional<> values which are immediately
destroyed. I'm not sure why no sanitizers seem to have caught this,
but I encountered crashes on these in a future patch.

TableGen/GlobalISel: Don't reconstruct CodeGenRegBank

The maps for dealing with the relationships between different register
classes and subregister indexes rely on unique pointers for every
class/index. By constructing a second copy of CodeGenRegBank, two
different pointer values existed for a given subregister depending on
where you were querying.

Use the existing CodeGenRegBank owned by the CodeGenTarget instead of
constructing a second copy. This avoids incorrectly failing map
lookups in a future change.

[RISCV] Fix test for inline asm z constraint modifier

Summary: Use an `i` constraint in the test, to correctly trigger the code for
handling the `z` constraint modifier.

Reviewers: asb, lenary, jrtc27
Reviewed By: lenary, jrtc27
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72134

Further implement CWG 2292

The core issue is that simple-template-id is ambiguous between class-name
and type-name. This fixes PR43966.

[llvm-locstats] Add the --compare option

Draw a plot showing the difference in debug loc coverage on two
files provided.

Differential Revision: https://reviews.llvm.org/D71870

[PowerPC] Legalize saturating vector add/sub

These intrinsics and the corresponding ISD nodes were recently added. PPC has
instructions that do this for vectors. Legalize them and add patterns to emit
the saturating instructions.

Differential revision: https://reviews.llvm.org/D71940

Bump the trunk major version to 11

and clear the release notes.

Revert rG6078f2fedcac5797ac39ee5ef3fd7a35ef1202d5 - "[AArch64][GlobalISel]: Support @llvm.{return,frame}address selection."

These intrinsics expand to a variable number of instructions so just like in
ISelLowering.cpp we use custom code to deal with them.

Committing Tim's original patch.

Differential Revision: https://reviews.llvm.org/D65656
----
Breaks EXPENSIVE_CHECKS builds.

[RISCV] Support ABI checking with per function target-features

If users don't specify -mattr, the default target features come
from the IR attribute.

Reviewers: lenary, asb

Reviewed By: lenary, asb

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70837

Revert "[RISCV] Support ABI checking with per function target-features"

This reverts commit 109e4d12edda07bdec139de36d9fdb6f73399f92.

Fix Wdocumentation warning. NFC.

RegisterClassInfo::computePSetLimit - assert that we actually find a register.

Fixes "pointer is null" clang static analyzer warning.

Fix "pointer is null" static analyzer warning. NFCI.

Use cast<> instead of dyn_cast<> since the pointer is always dereferenced and cast<> will perform the null assertion for us.

[yaml2obj/obj2yaml] - Add support for SHT_RELR sections.

Note: this is a reland with a trivial 2 lines fix in ELFState<ELFT>::writeSectionContent.
It adds a check similar to ones we already have for other sections to fix the case revealed
by bots, like http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/60744.

The encoded sequence of Elf*_Relr entries in a SHT_RELR section looks
like [ AAAAAAAA BBBBBBB1 BBBBBBB1 ... AAAAAAAA BBBBBBB1 ... ]
i.e. start with an address, followed by any number of bitmaps. The address
entry encodes 1 relocation. The subsequent bitmap entries encode up to 63(31)
relocations each, at subsequent offsets following the last address entry.
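
The encoding described above can be sketched as a small decoder. This is an illustrative Python sketch, not the code in ELF.cpp: the name `decode_relr` and the `word_bits` parameter are assumptions made for the example, and it assumes an entry with a clear low bit is an address and an entry with a set low bit is a bitmap.

```python
def decode_relr(entries, word_bits=64):
    """Expand Elf*_Relr entries into a list of relocation offsets."""
    word_bytes = word_bits // 8
    offsets = []
    base = 0  # address of the first word described by the next bitmap
    for entry in entries:
        if entry & 1 == 0:
            # Even entry: an address, which encodes exactly one relocation.
            offsets.append(entry)
            base = entry + word_bytes
        else:
            # Odd entry: a bitmap. After dropping the marker bit, bit i
            # marks the word at base + i * word_bytes.
            bits = entry >> 1
            offset = base
            while bits:
                if bits & 1:
                    offsets.append(offset)
                bits >>= 1
                offset += word_bytes
            # Each bitmap covers word_bits - 1 words (63 or 31).
            base += (word_bits - 1) * word_bytes
    return offsets
```

For example, `decode_relr([0x1000, 0x7])` yields the address entry itself plus the two words that follow it.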

More information is here:
https://github.com/llvm-mirror/llvm/blob/master/lib/Object/ELF.cpp#L272

This patch adds support for these sections.

Differential revision: https://reviews.llvm.org/D71872

[lldb] Add expect_expr function for testing expression evaluation in dotests.

Summary:
This patch adds a new function to lldbtest: `expect_expr`. This function is supposed to replace the current approach
of calling `expect`/`runCmd` with `expr`, `p` etc.

`expect_expr` allows evaluating expressions and matching their value/summary/type/error message without
having to do any string matching that might allow unintended passes (e.g., `self.expect("expr 3+4", substrs=["7"])`
can unexpectedly pass for results like `(Class7) $0 = 7`, `(int) $7 = 22`, `(int) $0 = 77` and so on).
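
The unintended passes mentioned above are easy to reproduce outside of lldb. The sketch below is plain Python and does not use or reproduce the lldbtest implementation: `substring_check` and `structured_check` are hypothetical helpers, and the `(type) $N = value` shape is taken from the example outputs in this message.

```python
import re


def substring_check(output, substrs):
    """Loose check: passes if every substring occurs anywhere in the output."""
    return all(s in output for s in substrs)


def structured_check(output, result_type, result_value):
    """Stricter check for output shaped like '(type) $N = value'."""
    match = re.fullmatch(r"\((?P<type>[^)]*)\) \$\d+ = (?P<value>.*)", output)
    return (
        match is not None
        and match.group("type") == result_type
        and match.group("value") == result_value
    )
```

With the three problematic outputs listed above, `substring_check(output, ["7"])` passes every time, while `structured_check(output, "int", "7")` only accepts `(int) $0 = 7`.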

This only uses the function in a few places to test and demonstrate it. I'll migrate the tests in follow up commits.

Reviewers: JDevlieghere, shafik, labath

Reviewed By: labath

Subscribers: christof, abidh, lldb-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D70314

[AArch64][SVE] Fold variable into assert to silence unused variable warnings in Release builds

[NFC] Adjust test cases numbering, test commit.

Summary:
Test case test14 is missing, adjust the numbering to have a consecutive range.
Also a test commit to verify commit access.

[llvm-locstats] Fix the docs

Add the missing picture for the documentation.

[Lexer] Allow UCN for dollar symbol '\u0024' in identifiers when using -fdollars-in-identifiers flag.

Summary:
Previously, the -fdollars-in-identifiers flag allowed the '$' symbol to be used
in an identifier, but the universal character name equivalent '\u0024' was not
allowed.
This patch changes this, so that \u0024 is valid in identifiers.

Reviewers: rsmith, jordan_rose

Reviewed By: rsmith

Subscribers: dexonsmith, simoncook, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D71758

Revert "[yaml2obj/obj2yaml] - Add support for SHT_RELR sections."

This reverts commit 46d11e30ee807accefd14e0b7f306647963a39b5.

It broke bots. E.g. http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/60744

[Support] Replace Windows __declspec(thread) with thread_local for LLVM_THREAD_LOCAL

Windows minimum host tools version is now VS2017, which supports C++11
thread_local so use this for LLVM_THREAD_LOCAL instead of
declspec(thread). According to [1], thread_local is implemented with
declspec(thread) so this should be NFC.

[1] https://docs.microsoft.com/en-us/cpp/cpp/thread?view=vs-2017

Differential Revision: https://reviews.llvm.org/D72399

[AArch64][SVE] Add ptest intrinsics

Summary:
Implements the following intrinsics:

    * @llvm.aarch64.sve.ptest.any
    * @llvm.aarch64.sve.ptest.first
    * @llvm.aarch64.sve.ptest.last

Reviewers: sdesmalen, efriedma, dancgr, mgudim, cameron.mcinally, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72398

[llvm-locstats] Add the --draw-plot option

When using the option, draw the histogram representing the debug
location buckets. The resulting histogram will be saved in a png
file.

Differential Revision: https://reviews.llvm.org/D71869

[yaml2obj/obj2yaml] - Add support for SHT_RELR sections.

The encoded sequence of Elf*_Relr entries in a SHT_RELR section looks
like [ AAAAAAAA BBBBBBB1 BBBBBBB1 ... AAAAAAAA BBBBBBB1 ... ]
i.e. start with an address, followed by any number of bitmaps. The address
entry encodes 1 relocation. The subsequent bitmap entries encode up to 63(31)
relocations each, at subsequent offsets following the last address entry.

More information is here:
https://github.com/llvm-mirror/llvm/blob/master/lib/Object/ELF.cpp#L272

This patch adds support for these sections.

Differential revision: https://reviews.llvm.org/D71872

Revert "[RISCV] Add Clang frontend support for Bitmanip extension"

This reverts commit 57cf6ee9c84434161088c39a6f8dd2aae14eb12d.

[llvm-locstats][NFC] Support OOP concept

With these changes, the code becomes more robust and it is easier to
add new features.

  -Introduce the LocationStats class representing the statistics
  -Add the pretty_print() method in the LocationStats class
  -Add additional '-' for the program options
  -Add the verify_program_inputs() function
  -Add the parse_locstats() function
  -Rename 'results' => 'opts'
  -Add more comments

Differential Revision: https://reviews.llvm.org/D71868

[RISCV] Support ABI checking with per function target-features

If users don't specify -mattr, the default target features come
from the IR attribute.

[DWARF] Fix DWARFDebugAranges to support 64-bit CU offsets.

DWARFContext, the only user of this class, can already handle such offsets.

Differential Revision: https://reviews.llvm.org/D71834

[gn build] Port 0dc6c249bff

[MachO] Add a test for detecting reserved unit length.

This is a follow-up for D71546 to add a corresponding unit test.

Differential Revision: https://reviews.llvm.org/D72695

[AMDGPU] Invert the handling of skip insertion.

The current implementation of skip insertion (SIInsertSkip) makes it a
mandatory pass required for correctness. Initially, the idea was to
have an optional pass. This patch inserts the s_cbranch_execz upfront
during SILowerControlFlow to skip over the sections of code when no
lanes are active. Later, SIRemoveShortExecBranches removes the skips
for short branches, unless there is a side effect and the skip branch is
really necessary.

This new pass will replace the handling of skip insertion in the
existing SIInsertSkip Pass.

Differential revision: https://reviews.llvm.org/D68092

[VE] Minimal codegen for empty functions

Summary:
This patch implements minimal VE code generation for empty function bodies (no args, no value return).

Contents

* empty function code generation test.
* Minimal function prologue & epilogue emission
* Instruction formats and instruction definitions as far as required for the empty function prologue & epilogue.
* I64 register class definitions.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D72598

[X86] Don't call LowerUINT_TO_FP_i32 for i32->f80 on 32-bit targets with sse2.

We were performing an emulated i32->f64 in the SSE registers, then
storing that value to memory and doing an extload into the X87
domain.

After this patch we'll now just store the i32 to memory along
with an i32 0. Then do a 64-bit FILD to f80 completely in the X87
unit. This matches what we do without SSE.
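
Why the store of an i32 plus an i32 0 is enough can be checked with a few lines of Python. This is only a model of the memory layout, assuming little-endian x86 byte order; `u32_via_signed_i64_load` is a hypothetical name, and the signed 64-bit read stands in for what a 64-bit FILD interprets.

```python
import struct


def u32_via_signed_i64_load(value):
    """Store an unsigned i32 next to an i32 zero, then reload as a signed i64."""
    # Little-endian layout: the low word holds the value, the high word is 0.
    memory = struct.pack("<II", value, 0)
    (as_signed_i64,) = struct.unpack("<q", memory)
    return as_signed_i64


def u32_via_signed_i32_load(value):
    """For contrast: reading the same 4 bytes back as a signed i32."""
    (as_signed_i32,) = struct.unpack("<i", struct.pack("<I", value))
    return as_signed_i32
```

Because the high word is zero, the signed 64-bit read is never negative, so values with the top bit of the i32 set come out unchanged, whereas a signed 32-bit read of the same value would be negative.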

[ARM] Regenerate MVE tests. NFC

The mve-phireg.ll test no longer really tests what it was added for,
but the original case was fairly complex. I've left the test in as a
general codegen test.

[Attributor] AAValueConstantRange: Value range analysis using constant range

Summary:
This patch introduces `AAValueConstantRange`, which computes a possible range for an integer value at a specific program point.
One of the motivations is propagating existing `range` metadata. (I think we need to change the fact that `range` metadata cannot be attached to an Argument.)

The state is a tuple of `ConstantRange` and it is initialized to (known, assumed) = ([-∞, +∞], empty).

Currently, AAValueConstantRange is created in `getAssumedConstant` method when `AAValueSimplify` returns `nullptr`(worst state).

Supported
- BinaryOperator(add, sub, ...)
- CmpInst(icmp eq, ...)
- !range metadata
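
A rough idea of how ranges combine for the supported cases can be given with plain interval arithmetic. The sketch below is a simplification under stated assumptions: `Range` is a hypothetical class over unbounded Python integers with inclusive bounds, so it ignores the bit widths and wrap-around that LLVM's `ConstantRange` models, and it is not the Attributor code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    lo: int  # inclusive lower bound
    hi: int  # inclusive upper bound

    def add(self, other: "Range") -> "Range":
        # x + y is smallest when both are smallest, largest when both are largest.
        return Range(self.lo + other.lo, self.hi + other.hi)

    def sub(self, other: "Range") -> "Range":
        # x - y is smallest when x is smallest and y is largest.
        return Range(self.lo - other.hi, self.hi - other.lo)

    def intersect(self, other: "Range") -> Optional["Range"]:
        # Narrow a computed range with extra knowledge, e.g. !range metadata.
        lo, hi = max(self.lo, other.lo), min(self.hi, other.hi)
        return Range(lo, hi) if lo <= hi else None
```

For instance, a value known to lie in `Range(0, 10)` plus one in `Range(1, 2)` lies in `Range(1, 12)`, and intersecting with metadata `Range(5, 20)` narrows `Range(0, 10)` to `Range(5, 10)`.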

`AAValueConstantRange` is not intended to extend to polyhedral range value analysis.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: phosek, davezarzycki, baziotis, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71620

[Scheduler] Adjust interface of CreateTargetMIHazardRecognizer to use ScheduleDAGMI. NFC

All the callers of this function will be ScheduleDAGMI from the
MachineScheduler. This allows us to use the extra info available in
ScheduleDAGMI without resorting to awkward casts.

[lldb/test] Add test for CMTime data formatter

Add a test for the CMTime data formatter. The coverage report showed
that this code path was untested.

[lldb/CommandInterpreter] Remove flag that's always true (NFC)

The 'asynchronously' argument to both GetLLDBCommandsFromIOHandler and
GetPythonCommandsFromIOHandler is true for all call sites. This commit
simplifies the API by dropping it and giving the baton a default
argument.

Fix up ms-pch-macro.c test to pass on non-Windows

[Driver][X86] Add -malign-branch* and -mbranches-within-32B-boundaries

These driver options perform some checking and delegate to MC options -x86-align-branch* and -x86-branches-within-32B-boundaries.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D72463

[ODRHash] Fix wrong error message with bitfields and mutable.

Add a check to the bitfield mismatch handling; without it, Clang could
give an error about the bitfield instead of about the field being mutable.

[PowerPC] Fix powerpcspe subtarget enablement in llvm backend

Summary:
As currently written, -target powerpcspe will enable SPE regardless of
disabling the feature later on in the command line. Instead, change
this to just set a default CPU to 'e500' instead of a generic CPU.

As part of this, add FeatureSPE to the e500 definition.

Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D72673

Relax the rules around objc_alloc and objc_alloc_init optimizations.

Today the optimization is limited to:
- `[ClassName alloc]`
- `[self alloc]` when within a class method

However it means that when code is written this way:

```
    @interface MyObject
    - (id)copyWithZone:(NSZone *)zone
    {
        return [[self.class alloc] _initWith...];
    }

    @end
```

... then the optimization doesn't kick in and `+[NSObject alloc]` ends
up in IMP caches where it could have been avoided. (It turns out that
`+alloc` -> `+[NSObject alloc]` is the most cached SEL/IMP pair in the
entire platform, which is rather silly.)

There are two theoretical risks in allowing this optimization:

1. if the receiver is nil (which it can't be today), but it turns out
   that `objc_alloc()`/`objc_alloc_init()` cope with a nil receiver,

2. if the `Class` type for the receiver is a lie. However, for such
   code to work today (and not fail with an unrecognized selector
   anyway) you'd have to have implemented the `-alloc` **instance
   method**.

   Fortunately, `objc_alloc()` doesn't assume that the receiver is a
   Class, it basically starts with a test that is similar to

       `if (receiver->isa->bits & hasDefaultAWZ) { /* fastpath */ }`.

   This bit is only set on metaclasses by the runtime, so if an instance
   is passed to this function by accident, its isa will fail this test,
   and `objc_alloc()` will gracefully fall back to `objc_msgSend()`.

   The one thing `objc_alloc()` doesn't support is tagged pointer
   instances. None of the tagged pointer classes implement an instance
   method called `'alloc'` (actually there's a single class in the
   entire Apple codebase that has such a method).

Differential Revision: https://reviews.llvm.org/D71682
Radar-Id: rdar://problem/58058316
Reviewed-By: Akira Hatanaka
Signed-off-by: Pierre Habouzit <phabouzit@apple.com>

CMake: Make most target symbols hidden by default

Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.

A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.

This patch reduces the number of public symbols in libLLVM.so by about
25%. This should improve load times for the dynamic library and also
make ABI checker tools, like abidiff, require less memory when analyzing
libLLVM.so.

One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.

Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278
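
The counts above can be turned into a percentage, and the `nm | grep | wc -l` pipeline mirrored, with a short Python sketch. The helper names and the sample nm lines in the usage note are made up for illustration; only the regular expression and the two totals come from this message.

```python
import re

# Same pattern as the grep above: a symbol type letter surrounded by spaces.
PUBLIC_SYMBOL = re.compile(r" [A-Zuvw] ")


def count_public_symbols(nm_output):
    """Count nm output lines whose symbol type matches the grep pattern."""
    return sum(1 for line in nm_output.splitlines() if PUBLIC_SYMBOL.search(line))


def percent_reduction(before, after):
    """Relative drop from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before
```

`percent_reduction(36221, 26278)` comes out a little above 27%, which is consistent with the "about 25%" figure quoted in the summary. On nm output such as a `T` line, a `t` line and a `U` line, `count_public_symbols` counts the `T` and `U` lines and skips the local `t` symbol.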

Reviewers: chandlerc, beanz, mgorny, rnk, hans

Reviewed By: rnk, hans

Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D54439

PR44540: Prefer an inherited default constructor over an initializer
list constructor when initializing from {}.

We would previously pick between calling an initializer list constructor
and calling a default constructor unstably in this situation, depending
on whether the inherited default constructor had already been used
elsewhere in the program.

Modify test to use -S instead of -c so that it works when an external assembler is used that is not present.

DWARFDebugLine.cpp: Restore LF line endings

rG7e02406f6cf180a8c89ce64665660e7cc9dbc23e switched the file to CRLF
line endings.

[BranchAlign] Add master --x86-branches-within-32B-boundaries flag

This flag was originally part of D70157, but was removed as we carved away pieces of the review. Since we have the nop support checked in, and it appears mature(*), I think it's time to add the master flag. For now, it will default to nop padding, but once the prefix padding support lands, we'll update the defaults.

(*) I can now confirm that downstream testing of the changes which have landed to date - nop padding and compiler support for suppressions - is passing all of the functional testing we've thrown at it. There might still be something lurking, but we've gotten enough coverage to be confident of the basic approach.

Note that the new flag can be used either when assembling an .s file, or when using the integrated assembler directly from the compiler. The latter will use all of the suppression mechanisms and should always generate correct code. We don't yet have assembly syntax for the suppressions, so passing this directly to the assembler with a raw .s file may result in broken code. Use at your own risk.

Also note that this isn't the wiring for the clang option. I think the most recent review for that is D72227, but I've lost track, so that might be off.

Differential Revision: https://reviews.llvm.org/D72738

[Concepts] Type Constraints

Add support for type-constraints in template type parameters.
Also add support for template type parameters as pack expansions (where the type constraint can now contain an unexpanded parameter pack).

Differential Revision: https://reviews.llvm.org/D44352

[X86] ABI compat bugfix for MSVC vectorcall

Summary:
Before this change, X86_32ABIInfo::classifyArgument would be called
twice on vector arguments to vectorcall functions. This function has
side effects to track GPR register usage, and this would lead to
incorrect GPR usage in some cases.  The specific case I noticed is from
running out of XMM registers with mixed FP and vector arguments and no
aggregates of any kind. Consider this prototype:

  void __vectorcall vectorcall_indirect_vec(
      double xmm0, double xmm1, double xmm2, double xmm3, double xmm4,
      __m128 xmm5,
      __m128 ecx,
      int edx,
      __m128 mem);

classifyArgument has no effects when called on a plain FP type, but when
called on a vector type, it modifies FreeRegs to model GPR consumption.
However, this should not happen during the vector call first pass.

I refactored the code to unify vectorcall HVA logic with regcall HVA
logic. The conventions pass HVAs in registers differently (expanded vs.
not expanded), but if they do not fit in registers, they both pass them
indirectly by address.

Reviewers: erichkeane, craig.topper

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D72110

Allow /D flags absent during PCH creation under msvc-compat

Summary:
Before this patch adding a new /D flag when compiling a source file that consumed a PCH with clang-cl would issue a diagnostic and then fail.  With the patch, the diagnostic is still issued but the definition is accepted.  This matches the msvc behavior.  The fuzzy-pch-msvc.c is a clone of the existing fuzzy-pch.c tests with some msvc specific rework.

msvc diagnostic:
  warning C4605: '/DBAR=int' specified on current command line, but was not specified when precompiled header was built

Output of the CHECK-BAR test prior to the code change:
  <built-in>(1,9): warning: definition of macro 'BAR' does not match definition in precompiled header [-Wclang-cl-pch]
  #define BAR int
          ^
  D:\repos\llvm\llvm-project\clang\test\PCH\fuzzy-pch-msvc.c(12,1): error: unknown type name 'BAR'
  BAR bar = 17;
  ^
  D:\repos\llvm\llvm-project\clang\test\PCH\fuzzy-pch-msvc.c(23,4): error: BAR was not defined
  #  error BAR was not defined
     ^
  1 warning and 2 errors generated.

Reviewers: rnk, thakis, hans, zturner

Subscribers: mikerice, aganea, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D72405

[Win64] Handle FP arguments more gracefully under -mno-sse

Pass small FP values in GPRs or stack memory according to the normal
convention. This is what gcc -mno-sse does on Win64.

I adjusted the conditions under which we emit an error to check if the
argument or return value would be passed in an XMM register when SSE is
disabled. This has a side effect of no longer emitting an error for FP
arguments marked 'inreg' when targeting x86 with SSE disabled. Our
calling convention logic was already assigning it to FP0/FP1, and then
we emitted this error. That seems unnecessary, we can ignore 'inreg' and
compile it without SSE.

Reviewers: jyknight, aemerson

Differential Revision: https://reviews.llvm.org/D70465

[amdgpu] Fix typos in a test case.

- There are typos introduced due to merge.

[X86] Drop an unneeded FIXME. NFC

The extload on X87 is free.

[X86] Swap the 0 and the fudge factor in the constant pool for the 32-bit mode i64->f32/f64/f80 uint_to_fp algorithm.

This allows us to generate better code for selecting the fixup
to load.

Previously when the sign was set we had to load offset 0. And
when it was clear we had to load offset 4. This required a testl,
setns, zero extend, and finally a mul by 4. By switching the offsets
we can just shift the sign bit into the lsb and multiply it by 4.
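
The effect of swapping the two constant pool entries can be sketched in Python. This is a model under stated assumptions, not the X86 lowering itself: the function names are hypothetical, `x` is assumed to satisfy 0 <= x < 2**64, the pool entries are treated as 4 bytes wide (per the offsets 0 and 4 above), and exact Python integers stand in for the floating-point add so that rounding does not obscure the offset selection.

```python
TWO_64 = 2**64  # the fudge factor: what a signed read is short by when the top bit is set


def old_fixup_offset(x):
    """Old layout [fudge, 0]: sign set -> offset 0, sign clear -> offset 4."""
    sign_clear = 1 if (x >> 63) == 0 else 0  # models testl + setns + zero extend
    return sign_clear * 4


def new_fixup_offset(x):
    """New layout [0, fudge]: shift the sign bit into the lsb and multiply by 4."""
    return (x >> 63) * 4


def uint64_via_signed_convert(x):
    """Convert as if signed, then add the fixup loaded from the new layout."""
    signed = x - TWO_64 if x >> 63 else x  # value a signed conversion would see
    pool = {0: 0, 4: TWO_64}               # byte offset -> constant
    return signed + pool[new_fixup_offset(x)]
```

With the new layout the offset is computed directly from the sign bit, and the fixed-up result equals the original unsigned value for inputs on both sides of 2**63.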

[mlir] : Fix ViewOp shape folder for identity affine maps

Summary: Fix the ViewOpShapeFolder for the case where no affine map is associated with a memref, by constructing an identity mapping.

Reviewers: nicolasvasilache

Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72735

[libcxx] Use C11 thread API on Fuchsia

On Fuchsia, pthread API is emulated on top of C11 thread API. Using C11
thread API directly is more efficient.

While this implementation is only used by Fuchsia at the moment, it's
not Fuchsia specific, and could be used by other platforms that use C11
threads rather than pthreads in the future.

Differential Revision: https://reviews.llvm.org/D64378

Fix windows bot failures in c410adb092c9cb51ddb0b55862b70f2aa8c5b16f
(clang diagnostic handler for IR input files)

[LIBOMPTARGET] Do not increment/decrement the refcount for "declare target" objects

The reference counter for global objects marked with declare target is INF. This patch prevents the runtime from incrementing/decrementing INF refcounts. Without it, the map(delete: global_object) directive actually deallocates the global on the device. With this patch, such a directive becomes a no-op.
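
The intended behavior can be modeled in a few lines of Python. This is a sketch, not the libomptarget code: `DeviceEntry`, `retain`, `release` and the `INF_REFCOUNT` sentinel value are hypothetical names chosen for the example, and `delete=True` stands in for a `map(delete: ...)` clause.

```python
INF_REFCOUNT = -1  # hypothetical sentinel for "declare target" globals


class DeviceEntry:
    """Tracks one host object mapped on the device."""

    def __init__(self, refcount):
        self.refcount = refcount
        self.allocated = True

    def retain(self):
        # INF refcounts are never incremented.
        if self.refcount != INF_REFCOUNT:
            self.refcount += 1

    def release(self, delete=False):
        # INF refcounts are never decremented, so delete becomes a no-op.
        if self.refcount == INF_REFCOUNT:
            return
        self.refcount = 0 if delete else self.refcount - 1
        if self.refcount == 0:
            self.allocated = False
```

An entry created with `INF_REFCOUNT` stays allocated no matter how it is retained or released, while an ordinary entry is deallocated once its count drops to zero or a delete is requested.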

Differential Revision: https://reviews.llvm.org/D72525

[codegen,amdgpu] Enhance MIR DIE and re-arrange it for AMDGPU.

Summary:
- `dead-mi-elimination` assumes MIR in the SSA form and cannot be
  arranged after phi elimination or DeSSA. It's enhanced to handle the
  dead register definition by skipping use check on it. Once a register
  def is `dead`, all its uses, if any, should be `undef`.
- Re-arrange the DIE in RA phase for AMDGPU by placing it directly after
  `detect-dead-lanes`.
- Many relevant tests are refined due to different register assignment.

Reviewers: rampitec, qcolombet, sunfish

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72709

[mlir][spirv] Properly support SPIR-V conversion target

This commit defines a new SPIR-V dialect attribute for specifying
a SPIR-V target environment. It is a dictionary attribute containing
the SPIR-V version, supported extension list, and allowed capability
list. A SPIRVConversionTarget subclass is created to take in the
target environment and set the proper dynamically legal ops by querying
the op availability interface of SPIR-V ops to make sure they are
available in the specified target environment. All existing conversions
targeting SPIR-V are changed to use this SPIRVConversionTarget. It
probes whether the input IR has a `spv.target_env` attribute,
otherwise, it uses the default target environment: SPIR-V 1.0 with
Shader capability and no extra extensions.

Differential Revision: https://reviews.llvm.org/D72256

[remark][diagnostics] Using clang diagnostic handler for IR input files

For IR input files, we currently use the LLVM diagnostic handler even when the
compilation is from clang. As a result, we are not able to use -Rpass
to get the transformation reports. Some warnings are not handled
properly either: We found many mysterious warnings in our ThinLTO backend
compilations in SamplePGO and CSPGO. An example of the warning:
"warning: net/proto2/public/metadata_lite.h:51:21: 0.02% (1 / 4999)"

This turns out to be a warning by Wmisexpect, which is supposed to be
filtered out by default. But since the filter is in clang's
diagnostic handler, we emit these incomplete warnings from LLVM's
diagnostic handler.

This patch uses clang diagnostic handler for IR input files. We create
a fake backendconsumer just to install the diagnostic handler.

With this change, we will have proper handling of all the warnings and we can
use -Rpass* options in IR input files compilation.
Also note that with this patch, LLVM's diagnostic options, like
"-mllvm -pass-remarks=*", are no longer able to get optimization remarks.

Differential Revision: https://reviews.llvm.org/D72523

[mlir] Refactor ModuleState into AsmState and expose it to users.

Summary:
This allows for users to cache printer state, which can be costly to recompute. Each of the IR print methods gain a new overload taking this new state class.

Depends On D72293

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D72294

[OPENMP]Do not use RTTI by default for NVPTX devices.

NVPTX does not support RTTI, so disable it by default.

[mlir] Enable printing of FuncOp in the generic form.

Summary:
This was previously disabled as FunctionType TypeAttrs could not be roundtripped in the IR. This has been fixed, so we can now generically print FuncOp.

Depends On D72429

Reviewed By: jpienaar, mehdi_amini

Differential Revision: https://reviews.llvm.org/D72642

make -fmodules-codegen and -fmodules-debuginfo work also with PCHs

Allow building PCHs (with -building-pch-with-obj and the extra .o file)
with -fmodules-codegen -fmodules-debuginfo to allow emitting shared code
into the extra .o file, similarly to how it works with modules. A bit of
a misnomer, but the underlying functionality is the same. This saves up
to 20% of build time here.

Differential Revision: https://reviews.llvm.org/D69778

fix recent -fmodules-codegen fix test

-fmodules-codegen should not emit extern templates

If a header contains 'extern template', then the template should be provided
somewhere by an explicit instantiation, so it is not necessary to generate
a copy. Worse, this can lead to an unresolved symbol, because the codegen's
object file will not actually contain functions from such a template
because of the GVA_AvailableExternally, but the object file for the explicit
instantiation will not contain them either because it will be blocked
by the information provided by the module.

Differential Revision: https://reviews.llvm.org/D69779

[mlir][Linalg] Update the semantics, verifier and test for Linalg with tensors.

Summary:
This diff fixes issues with the semantics of linalg.generic on tensors that appeared when converting directly from HLO to linalg.generic.
The changes are self-contained within MLIR and can be captured and tested independently of XLA.

The linalg.generic and indexed_generic are updated to:

To allow progressive lowering from the value world (a.k.a tensor values) to
the buffer world (a.k.a memref values), a linalg.generic op accepts
mixing input and output ranked tensor values with input and output memrefs.

```
%1 = linalg.generic #trait_attribute %A, %B {other-attributes} :
  tensor<?x?xf32>,
  memref<?x?xf32, stride_specification>
  -> (tensor<?x?xf32>)
```

In this case, the number of outputs (args_out) must match the sum of (1) the
number of output buffer operands and (2) the number of tensor return values.
The semantics is that the linalg.indexed_generic op produces (i.e.
allocates and fills) its return values.

Tensor values must be legalized by a buffer allocation pass before most
transformations can be applied. Such legalization moves tensor return values
into output buffer operands and updates the region argument accordingly.

Transformations that create control-flow around linalg.indexed_generic
operations are not expected to mix with tensors because SSA values do not
escape naturally. Still, transformations and rewrites that take advantage of
tensor SSA values are expected to be useful and will be added in the near
future.

Subscribers: bmahjour, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72555

[DAGCombine] Replace `getIntPtrConstant()` with `getVectorIdxTy()`.

- Prefer `getVectorIdxTy()` as the index operand type for
`EXTRACT_SUBVECTOR` as targets expect different types by overloading
`getVectorIdxTy()`.

[OPENMP]Do not emit special virtual function for NVPTX target.

There are no special virtual function handlers (like __cxa_pure_virtual)
defined for NVPTX target, so just emit such functions as null pointers
to prevent issues with linking and unresolved references.

[mlir] Use double format when parsing bfloat16 hexadecimal values

Summary: bfloat16 doesn't have a valid APFloat format, so we have to use double semantics when storing it. This change makes sure that hexadecimal values can be round-tripped properly given this fact.
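
As an illustration of why a wider format can hold bfloat16 losslessly, here is a minimal sketch. It is not the MLIR parser code; the helper names `bf16BitsToDouble` and `doubleToBf16Bits` are hypothetical. It relies only on the fact that bfloat16 is the upper 16 bits of an IEEE-754 binary32 value, and it ignores NaN payload subtleties.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen a raw bfloat16 bit pattern to double.
// bfloat16 shares its sign/exponent layout with binary32, so the bits
// become the upper half of a 32-bit float.
double bf16BitsToDouble(uint16_t bits) {
  uint32_t wide = static_cast<uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &wide, sizeof(f));
  return static_cast<double>(f); // float -> double conversion is exact
}

// Hypothetical helper: narrow a double holding a bfloat16-representable
// value back to its raw 16-bit pattern.
uint16_t doubleToBf16Bits(double d) {
  float f = static_cast<float>(d); // exact if d came from bf16BitsToDouble
  uint32_t wide;
  std::memcpy(&wide, &f, sizeof(wide));
  return static_cast<uint16_t>(wide >> 16);
}
```

For non-NaN inputs, `doubleToBf16Bits(bf16BitsToDouble(x))` returns `x`, which is the round-trip property the summary above refers to.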

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D72667

Remove trailing `;`. NFC.

[AArch64][GlobalISel]: Support @llvm.{return,frame}address selection.

These intrinsics expand to a variable number of instructions so just like in
ISelLowering.cpp we use custom code to deal with them.

Committing Tim's original patch.

Differential Revision: https://reviews.llvm.org/D65656

[Driver][test] Fix Driver/hexagon-toolchain-elf.c for -DCLANG_DEFAULT_LINKER=lld builds

Reviewed By: nathanchance, sidneym

Differential Revision: https://reviews.llvm.org/D72668

[LegalizeTypes] Remove untested code from ExpandIntOp_UINT_TO_FP

This code is untested in tree because the "APFloat::semanticsPrecision(sem) >= SrcVT.getSizeInBits() - 1" check is false for most combinations of int and fp types, except maybe i32 and f64. For that you would need i32 to be an illegal type, but f64 to be legal and have custom handling for legalizing the split sint_to_fp. The precision check itself was added in 2010 to fix a double rounding issue in the algorithm that would occur if the sint_to_fp was not able to do the conversion without rounding.
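
The arithmetic behind that check can be sketched outside of LLVM. The function `fitsWithoutRounding` below is a hypothetical stand-in for the quoted condition, and `std::numeric_limits<T>::digits` is assumed to match `APFloat::semanticsPrecision` for IEEE single and double (24 and 53 bits).

```cpp
#include <limits>

// Hypothetical stand-in for the quoted check: can a floating-point type
// with `precisionBits` of significand precision represent every value of
// a signed integer type that is `intBits` wide without rounding?
constexpr bool fitsWithoutRounding(int precisionBits, int intBits) {
  return precisionBits >= intBits - 1;
}

// Significand precision, including the implicit leading bit.
constexpr int kF32Precision = std::numeric_limits<float>::digits;  // 24
constexpr int kF64Precision = std::numeric_limits<double>::digits; // 53

static_assert(fitsWithoutRounding(kF64Precision, 32), "i32 -> f64 is exact");
static_assert(!fitsWithoutRounding(kF64Precision, 64), "i64 -> f64 may round");
static_assert(!fitsWithoutRounding(kF32Precision, 32), "i32 -> f32 may round");
```

Under these assumptions only the i32/f64 pairing passes, which is consistent with the observation that the removed path was rarely reachable.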

Differential Revision: https://reviews.llvm.org/D72728

[GVN] fix comment/argument name to match actual implementation. NFC

[clang][test][NFC] Use more widely supported sanitizer for file dependency tests

The tests do not depend on the specific sanitizer, only on the blacklist being reported as a dependency.
We're unfortunately limited by platform support for any particular sanitizer but we can at least use one that is widely supported.

Post-commit review:
https://reviews.llvm.org/D72729

[InstCombine] Fix worklist management when removing guard intrinsic

When multiple guard intrinsics are merged into one, currently the
result of eraseInstFromFunction() is returned -- however, this
should only be done if the current instruction is being removed.
In this case we're removing a different instruction and should
instead report that the current one has been modified by returning it.

For this test case, this reduces the number of instcombine iterations
from 5 to 2 (the minimum possible).

Differential Revision: https://reviews.llvm.org/D72558

[DebugInfo] Add option to clang to limit debug info that is emitted for classes.

Summary:
This patch adds an option to limit debug info by only emitting complete class
type information when its constructor is emitted. This applies to classes
that have nontrivial user defined constructors.

I implemented the option by adding another level to `DebugInfoKind`, and
a flag `-flimit-debug-info-constructor`.

Total object file size on Windows, compiling with RelWithDebInfo:
  before: 4,257,448 kb
  after:  2,104,963 kb

And on Linux
  before: 9,225,140 kb
  after:  4,387,464 kb

According to the Windows clang.pdb files, here is a list of types that are no
longer complete with this option enabled: https://reviews.llvm.org/P8182

Reviewers: rnk, dblaikie

Subscribers: aprantl, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D72427

[analyzer] Fix SARIF column locations

Differential revision: https://reviews.llvm.org/D70689

dotest.py: Add option to pass extra lldb settings to dotest

The primary motivation for this is to add another dimension to the
Swift LLDB test matrix, but this seems generally useful.

Differential Revision: https://reviews.llvm.org/D72662

[libcxx] [Windows] Make a more proper implementation of strftime_l for mingw with msvcrt.dll

This also makes this function consistent with the rest of the
libc++ provided fallbacks.

The locale support in msvcrt.dll is very limited anyway; it can
only be configured processwide, not per thread, and it only seems
to support the locales "C" and "" (the user set locale), so it's
hard to make any meaningful automatic test for it. But manually tested,
this change does make time formatting locale code in libc++ output
times in the user requested format, when using locale "".

Differential Revision: https://reviews.llvm.org/D69554

[SVE] Add patterns for MUL immediate instruction.

Summary: Add the missing MUL pattern for integer immediate instructions.

Reviewers: sdesmalen, huntergr, efriedma, c-rhodes, kmclaughlin

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits, amehsan

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72654

[Driver] Ignore -fno-semantic-interposition

Fedora wants to build projects with -fno-semantic-interposition (e.g.
https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup),
which is supported by GCC>=5.

Clang's current behavior is similar to -fno-semantic-interposition and
the end goal is to make it more so
(https://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html).
Ignore this option.

We should let users know -fsemantic-interposition is not currently
supported, so it should remain a hard error.

Reviewed By: serge-sans-paille

Differential Revision: https://reviews.llvm.org/D72724

[OpenMP][Tool] Runtime warning for missing TSan-option

For any OpenMP application, TSan spuriously reports a race on the initialization
of a runtime-internal mutex:

```
Atomic read of size 1 at 0x7b6800005940 by thread T4:
  #0 pthread_mutex_lock <null> (a.out+0x43f39e)
  #1 __kmp_resume_64 <null> (libomp.so.5+0x84db4)

Previous write of size 1 at 0x7b6800005940 by thread T7:
  #0 pthread_mutex_init <null> (a.out+0x424793)
  #1 __kmp_suspend_initialize_thread <null> (libomp.so.5+0x8422e)
```

According to @AndreyChurbanov this is a false positive report, as the control
flow of the runtime guarantees the ordering of the mutex initialization and
the lock:
https://software.intel.com/en-us/forums/intel-open-source-openmp-runtime-library/topic/530363

To suppress this report, I suggest the use of
TSAN_OPTIONS='ignore_uninstrumented_modules=1'.
With this patch, a runtime warning is provided in case an OpenMP application
is built with TSan and executed without this TSan option.
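
A check of this kind could look roughly like the sketch below. This is not the code from the patch; the function names and the warning text are hypothetical, and the substring match is deliberately simplistic.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns true if the given TSAN_OPTIONS string
// contains ignore_uninstrumented_modules=1. This is a plain substring
// match, so it would also accept e.g. "...=10"; a real parser would
// tokenize the option string.
bool hasIgnoreUninstrumentedModules(const char *tsanOptions) {
  return tsanOptions != nullptr &&
         std::strstr(tsanOptions, "ignore_uninstrumented_modules=1") != nullptr;
}

// Hypothetical helper: prints a warning to stderr when the option is
// missing from the environment. Returns true if a warning was printed.
bool warnIfTsanOptionMissing() {
  const char *opts = std::getenv("TSAN_OPTIONS");
  if (hasIgnoreUninstrumentedModules(opts))
    return false;
  std::fprintf(stderr,
               "Warning: please use "
               "TSAN_OPTIONS='ignore_uninstrumented_modules=1' to avoid "
               "false positive reports from the OpenMP runtime.\n");
  return true;
}
```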

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D70412

[NewPM] Port MergeFunctions pass

This ports the MergeFunctions pass to the NewPM. This was rather
straightforward, as no analyses are used.

Additionally MergeFunctions needs to be conditionally enabled in
the PassBuilder, but I left that part out of this patch.

Differential Revision: https://reviews.llvm.org/D72537

[OPENMP]Improve handling of possibly incorrectly mapped types.

Need to analyze the type of the expression for mapping, not the type of
the declaration.

[InstCombine] Fix infinite loop due to bitcast <-> phi transforms

Fix for https://bugs.llvm.org/show_bug.cgi?id=44245.

The optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up
fighting against each other, because optimizeBitCastFromPhi()
assumes that bitcasts of loads will get folded. This doesn't
happen here, because a dangling phi node prevents the one-use
fold in https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628 from triggering.

This patch fixes the issue by explicitly performing the load
combine as part of the bitcast of phi transform. Other attempts
to force the load to be combined first were ultimately too
unreliable.

Differential Revision: https://reviews.llvm.org/D71164

[InstCombine] Make combineLoadToNewType a method; NFC

So it can be reused as part of other combines.
In particular for D71164.

[InstCombine] Fix user iterator invalidation in bitcast of phi transform

This fixes the issue encountered in D71164. Instead of using a
range-based for, manually iterate over the users and advance the
iterator beforehand, so we do not skip any users due to iterator
invalidation.
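
The "advance the iterator beforehand" pattern can be shown with a standard container. The sketch below uses `std::list` as an analogy for a use list; it is not the InstCombine code, and `eraseEvens` is a hypothetical example function.

```cpp
#include <list>

// Erases every even value from the list and returns how many were erased.
// The iterator is advanced before the current element can be erased, so
// erasing `current` never invalidates the iterator used to continue.
int eraseEvens(std::list<int> &values) {
  int erased = 0;
  for (auto it = values.begin(), end = values.end(); it != end;) {
    auto current = it++; // step past the element before touching it
    if (*current % 2 == 0) {
      values.erase(current); // only `current` is invalidated
      ++erased;
    }
  }
  return erased;
}
```

A range-based for loop over the same list would increment an iterator that had just been erased, which is the kind of problem the commit describes.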

Differential Revision: https://reviews.llvm.org/D72657

[InstCombine] Add test for iterator invalidation bug; NFC

[nfc][libomptarget] Refactor nvptx/target_impl.cu

Summary:
[nfc][libomptarget] Refactor nvptx/target_impl.cu

Use __kmpc_impl_atomic_add instead of atomicAdd to match the rest of the file.
Alternatively, target_impl.cu could use the cuda functions directly. Using a mixture in this
file was an oversight, happy to resolve in either direction.

Removed some comments that look outdated.

Call __kmpc_impl_unset_lock directly to avoid a redundant diagnostic and remove an implicit
dependency on interface.h.

Reviewers: ABataev, grokos, jdoerfert

Reviewed By: jdoerfert

Subscribers: jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D72719

[nfc][libomptarget] Refactor amdgcn target_impl

Summary:
[nfc][libomptarget] Refactor amdgcn target_impl

- Removes references to internal libraries from the header
- Standardises on C++ mangling for all the target_impl functions
- Update comment block
- clang-format
- Move some functions into a new target_impl.hip source file

This lays the groundwork for implementing the remaining unresolved
symbols in the target_impl.hip source.

Reviewers: jdoerfert, grokos, ABataev, ronlieb

Reviewed By: jdoerfert

Subscribers: jvesely, mgorny, jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D72712

Fix NetBSD bot after b4a99a061f517e60985667e39519f60186cbb469 ([Clang][Driver] Re-use the calling process instead of creating a new process for the cc1 invocation)