review.tizen.org Git - platform/upstream/llvm.git/log

[X86] Restore selection of MULX on BMI2 targets.

Looking back over gcc and icc behavior it looks like icc does
use mulx32 on 32-bit targets and mulx64 on 64-bit targets. It's
also used when dividing i32 by constant on 32-bit targets and
i64 by constant on 64-bit targets.

gcc uses it multiplies producing a 64 bit result on 32-bit targets
and 128-bit results on a 64-bit target. gcc does not appear to use
it for division by constant.

After this patch clang is closer to the icc behavior. This
basically reverts d1c61861ddc94457b08a5a653d3908b7b38ebb22, but
there were no strong feelings at the time.

Fixes PR45518.

Differential Revision: https://reviews.llvm.org/D80498

[llvm]NFC] Simplify ProfileSummaryInfo state transitions

ProfileSummaryInfo is updated seldom, as result of very specific
triggers. This patch clearly demarcates state updates from read-only uses.
This, arguably, improves readability and maintainability.

[InstCombine] add tests for vector demanded elements of select condition; NFC

AMDGPU: Start adding MODE register uses to instructions

This is the groundwork required to implement strictfp. For now, this
should be NFC for regular instructoins (many instructions just gain an
extra use of a reserved register). Regalloc won't rematerialize
instructions with reads of physical registers, but we were suffering
from that anyway with the exec reads.

Should add it for all the related FP uses (possibly with some
extras). I did not add it to either the gpr index mode instructions
(or every single VALU instruction) since it's a ridiculous feature
already modeled as an arbitrary side effect.

Also work towards marking instructions with FP exceptions. This
doesn't actually set the bit yet since this would start to change
codegen. It seems nofpexcept is currently not implied from the regular
IR FP operations. Add it to some MIR tests where I think it might
matter.

[BPF] simplify zero extension with MOV_32_64

The current pattern matching for zext results in the following code snippet
being produced,

  w1 = w0
  r1 <<= 32
  r1 >>= 32

Because BPF implementations require zero extension on 32bit loads this
both adds a few extra unneeded instructions but also makes it a bit
harder for the verifier to track the r1 register bounds. For example in
this verifier trace we see at the end of the snippet R2 offset is unknown.
However, if we track this correctly we see w1 should have the same bounds
as r8. R8 smax is less than U32 max value so a zero extend load should keep
the same value. Adding a max value of 800 (R8=inv(id=0,smax_value=800)) to
an off=0, as seen in R7 should create a max offset of 800. However at the
end of the snippet we note the R2 max offset is 0xffffFFFF.

  R0=inv(id=0,smax_value=800)
  R1_w=inv(id=0,umax_value=2147483647,var_off=(0x0; 0x7fffffff))
  R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
  R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff))
  R9=inv800 R10=fp0 fp-8=mmmm????
58: (1c) w9 -= w8
59: (bc) w1 = w8
60: (67) r1 <<= 32
61: (77) r1 >>= 32
62: (bf) r2 = r7
63: (0f) r2 += r1
64: (bf) r1 = r6
65: (bc) w3 = w9
66: (b7) r4 = 0
67: (85) call bpf_get_stack#67
  R0=inv(id=0,smax_value=800)
  R1_w=ctx(id=0,off=0,imm=0)
  R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=4294967295,var_off=(0x0; 0xffffffff))
  R3_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
  R4_w=inv0 R6=ctx(id=0,off=0,imm=0)
  R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
  R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff))
  R9_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
  R10=fp0 fp-8=mmmm????

After this patch R1 bounds are not smashed by the <<=32 >>=32 shift and we
get correct bounds on R2 umax_value=800.

Further it reduces 3 insns to 1.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Differential Revision: https://reviews.llvm.org/D73985

[PowerPC] Add support for -mcpu=pwr10 in both clang and llvm

Summary:
This patch simply adds support for the new CPU in anticipation of
Power10. There isn't really any functionality added so there are no
associated test cases at this time.

Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc

Reviewed By: stefanp, nemanjai, amyk, #powerpc

Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo

Tags: #clang, #powerpc, #llvm

Differential Revision: https://reviews.llvm.org/D80020

[mlir] [VectorOps] Add 'vector.flat_transpose' operation

Summary:
Provides a representation of the linearized LLVM instrinsic.
With tests and lowering implementation to LLVM IR dialect.
Prepares better lowering for 2-D vector.transpose.

Reviewers: nicolasvasilache, ftynse, reidtatge, bkramer, dcaballe

Reviewed By: ftynse, dcaballe

Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, stephenneuendorffer, Joonsoo, grosul1, frgossen, Kayjukh, jurahul, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80419

[CodeMoverUtils] Use dominator tree level to decide the direction of
code motion

Summary: Currently isSafeToMoveBefore uses DFS numbering for determining
the relative position of instruction and insert point which is not
always correct. This PR proposes the use of Dominator Tree depth for the
same. If a node is at a higher level than the insert point then it is
safe to say that we want to move in the forward direction.
Authored By: RithikSharma
Reviewer: Whitney, nikic, bmahjour, etiotto, fhahn
Reviewed By: Whitney
Subscribers: fhahn, hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D80084

[Driver] Support -fsanitize=shadow-call-stack and cfi-icall on aarch64_be

D80647 did not fix https://bugs.llvm.org/show_bug.cgi?id=46076
This is the fix.

[NFC][XCOFF][AIX] Return function entry point symbol with dedicate function

Use getFunctionEntryPointSymbol whenever possible to enclose the
implementation detail and reduce duplicate logic.

Differential Revision: https://reviews.llvm.org/D80402

AMDGPU: Set StackPointerRegisterToSaveRestore

This will enable selecting non-entry block allocas. Skip the SP write
check in the base isSchedulingBoundary implementation to preserve the
previous scheduling behavior and avoid test churn. It's apparently for
compile time reasons, but if we were to use this more work would be
needed since in some of the failing tests, we seem to incorrectly get
hazard nops inserted.

[Driver] Support -fsanitize=shadow-call-stack on aarch64_be

Fixes https://bugs.llvm.org/show_bug.cgi?id=46076

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D80647

[clangd] Add access specifier information to hover contents

Summary:
For https://github.com/clangd/clangd/issues/382

This commit adds access specifier information to the hover
contents. For example, the hover information of a class field or
member function will now indicate if the field or member is private,
public, or protected. This can be particularly useful when a developer
is in the implementation file and wants to know if a particular member
definition is public or private.

Reviewers: kadircet

Reviewed By: kadircet

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D80472

[lldb/Reproducers] Skip API logging in the DUMMY macro

The purpose of the LLDB_RECORD_DUMMY macro is twofold: it is used in
functions that take arguments that we don't know how to serialize (e.g.
void*) and it's used by function where we want to avoid doing excessive
work because they can be called from a signal handler (e.g.
setTerminalWidth).

To support the latter case, I've disabled API logging form the Recorder
ctor used by the DUMMY macro. This ensures we don't allocate memory when
called from a signal handler.

AMDGPU: Fix dropping MI flags when rewriting instructions

All 3 passes that change instruction encodings were dropping MI
flags. This avoids scheduling regressions caused by setting
mayRaiseFPExceptions on FP instructions for non-strictfp functions.

[lldb] Make order of completions for expressions deterministic and sorted by Clang's priority values.

Summary:

It turns out that the order in which we provide completions for expressions is
nondeterministic. This leads to confusing user experience and also breaks the
reproducer tests (as two LLDB tests can go out of sync due to the
non-determinism in the completion lists)

The reason for the non-determinism is that the CompletionConsumer informs us
about decls in the order in which it finds declarations in the lookup store of
the DeclContexts it visits (mainly this snippet in SemaLookup.cpp):

``` lang=c++
    // Enumerate all of the results in this context.
    for (DeclContextLookupResult R :
         Load ? Ctx->lookups()
              : Ctx->noload_lookups(/*PreserveInternalState=*/false)) {
       [...]
```

This storage of the lookup is sorted by pointer values (see the hash of
`DeclarationName`) and can therefore be non-deterministic. The LLDB code
completion consumer that receives these calls originally expected that the order
of declarations is defined by Clang, but it seems the API expects the client to
provide an order to the completions.

This patch fixes the issue as follows:

* We sort the completions we get from Clang alphabetically and also by the
priority value we get from Clang (with priority value sorting having precedence
over the alphabetical sorting)

* We make all the functions/variables that touch a completion before the sorting
const-qualified. The idea is that this should prevent that we never have
observable side-effect from touching these declarations in a non-deterministic
order (e.g., we don't try to complete the type by accident).

This way we behave like the other parts of Clang which also sort the results by
some deterministic value (usually the name or something computed from a name,
e.g., edit distance to a given string).

We most likely also need to fix the Clang code to make the loop I listed above
deterministic to prevent these issues in the future (tracked in rdar://63442513
). This wouldn't replace the functionality provided in this patch though as we
would still need the priority and overall alphabetical sorting.

Note: I had to increase the lldb-vscode completion limit to 100 as the tests
look for strings that aren't in the first 50 results anymore due to variable
names starting with letters like 'v' (which are now always shown much further
down in the list due to the alphabetical sorting).

Fixes rdar://63200995

Reviewers: JDevlieghere, clayborg

Reviewed By: JDevlieghere

Subscribers: mgrang, abidh

Differential Revision: https://reviews.llvm.org/D80292

[X86] Assemble movzb 1280(%rbx, %r12), %r12 after D80608

ffmpeg/libavcodec/x86/h264_cabac.c inline assembly may produce
movzb 1280(%rbx, %r12), %r12

After D80608, llvm-mc errors:

error: unknown use of instruction mnemonic without a size suffix

[mlir][spirv] Lower allocation/deallocations of workgroup memory.

This allocation of a workgroup memory is lowered to a
spv.globalVariable. Only static size allocation with element type
being int or float is handled. The lowering does account for the
element type that are not supported in the lowered spv.module based on
the extensions/capabilities and adjusts the number of elements to get
the same byte length.

Differential Revision: https://reviews.llvm.org/D80411

[CodeGen] fix typo `def nxv1bf32` -> `def nxv1f32`

The `Add bfloat MVT type` patch introduced a typo in the nxv1f32 definition
in llvm/include/llvm/CodeGen/ValueTypes.td:
https://reviews.llvm.org/D79706/new/#inline-740433

This patch fixes that.

[gn build] Port 0d20ed664ff

[DDG] Data Dependence Graph - Add query function for memory dependencies between two nodes

Summary:
When working with the DDG it's useful to be able to query details of the
memory dependencies between two nodes connected by a memory edge. The DDG
does not hold a copy of the dependencies, but it contains a reference to a
DependenceInfo object through which dependence information can be queried.
This patch adds a query function to the DDG to obtain all the Dependence
objects that exist between instructions of two nodes.

Authored By: bmahjour

Reviewers: Meinersbur, Whitney, etiotto

Reviewed By: Whitney

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80529

[gn build] (manually) port dedaf3a2ac5

[MLIR] [OpenMP] Add basic OpenMP parallel operation

Summary:
This includes a basic implementation for the OpenMP parallel
operation without a custom pretty-printer and parser.
The if, num_threads, private, shared, first_private, last_private,
proc_bind and default clauses are included in this implementation.

Currently the reduction clause is omitted as it is more complex and
requires analysis to see if we can share implementation with the loop
dialect. The allocate clause is also omitted.

A discussion about the design of this operation can be found here:
https://llvm.discourse.group/t/openmp-parallel-operation-design-issues/686

The current OpenMP Specification can be found here:
https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf

Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
Reviewers: jdoerfert

Subscribers: mgorny, yaxunl, kristof.beyls, guansong, mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, grosul1, frgossen, Kayjukh, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79410

Start migrating away from statepoint's inline length prefixed argument bundles

In the current statepoint design, we have four distinct groups of operands to the call: call args, gc transition args, deopt args, and gc args. This format prexisted the support in IR for operand bundles and was in fact one of the inspirations for the extension. However, we never went back and rearchitected statepoints to fully leverage bundles.

This change is the first in a small sequence to do so. All this does is extend the SelectionDAG lowering code to allow deopt and gc transition operands to be specified in either inline argument bundles or operand bundles.

Differential Revision: https://reviews.llvm.org/D8059

[VFABI] Fix parsing of uniform parameters that shouldn't expect step or positional data.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80575

Fix warning `-Wpedantic`. NFC.

Fix Darwin 'constinit thread_local' variables.

Unlike other platforms using ItaniumCXXABI, Darwin does not allow the
creation of a thread-wrapper function for a variable in the TU of
users. Because of this, it can set the linkage of the thread-local
symbol to internal, with the assumption that no TUs other than the one
defining the variable will need it.

However, constinit thread_local variables do not require the use of
the thread-wrapper call, so users reference the variable
directly. Thus, it must not be converted to internal, or users will
get a link failure.

This was a regression introduced by the optimization in
00223827a952f66e7426c9881a2a4229e59bb019.

Differential Revision: https://reviews.llvm.org/D80417

CoverageFilters.h - reduce unnecessary includes to forward declarations. NFC.

[mlir] Add simple generator for return types

Take advantage of equality constrains to generate the type inference interface.
This is used for equality and trivially built types. The type inference method
is only generated when no type inference trait is specified already.

This reorders verification that changes some test error messages.

Differential Revision: https://reviews.llvm.org/D80484

[OPENMP50]Initial support for use_device_addr clause.

Summary:
Added parsing/sema analysis/serialization support for use_device_addr
clauses.

Reviewers: jdoerfert

Subscribers: yaxunl, guansong, arphaman, sstefan1, llvm-commits, cfe-commits, caomhin

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D80404

[FileCheck] Allow parenthesized expressions

With this change it is be possible to write FileCheck expressions such
as [[#(VAR+1)-2]]. Currently, the only supported arithmetic operators are
plus and minus, so this is not particularly useful yet. However, it our
CHERI fork we have tests that benefit from having multiplication in
FileCheck expressions. Allowing parenthesized expressions is the simplest
way for us to work around the current lack of operator precedence in
FileCheck expressions.

Reviewed By: thopre, jhenderson
Differential Revision: https://reviews.llvm.org/D77383

Add support for UnaryOperator in SyntaxTree

Reviewers: gribozavr2

Reviewed By: gribozavr2

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D80624

SpecialCaseList.h - reduce unnecessary includes to forward declarations. NFC.

Remove Regex forward declaration as we already require the Regex.h include.

Add missing VirtualFileSystem.h include to dependent source files.

Revert "[PowerPC] Add support for -mcpu=pwr10 in both clang and llvm"

This reverts commit 7eb666b1556b86503f2f386bf921186cdbb2d22a.

[AArch64][BFloat] add BFloat instruction support for AArch64

Summary:
Add support for lowering various BFloat related SelDAG nodes:
- load/store (ldrh/strh)
- concat
- dup/duplane
- bitconvert/bitcast
- insert_subvector/insert_subreg

This patch is part of a series implementing the Bfloat16 extension of the
Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type, and its properties are specified in the Arm Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

Reviewers: ab, t.p.northover, john.brawn, fpetrogalli, sdesmalen, LukeGeeson

Reviewed By: fpetrogalli

Subscribers: LukeGeeson, pbarrio, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79712

tsan: fix false positives in AcquireGlobal

Add ThreadClock:: global_acquire_ which is the last time another thread
has done a global acquire of this thread's clock.

It helps to avoid problem described in:
https://github.com/golang/go/issues/39186
See test/tsan/java_finalizer2.cpp for a regression test.
Note the failuire is _extremely_ hard to hit, so if you are trying
to reproduce it, you may want to run something like:
$ go get golang.org/x/tools/cmd/stress
$ stress -p=64 ./a.out

The crux of the problem is roughly as follows.
A number of O(1) optimizations in the clocks algorithm assume proper
transitive cumulative propagation of clock values. The AcquireGlobal
operation may produce an inconsistent non-linearazable view of
thread clocks. Namely, it may acquire a later value from a thread
with a higher ID, but fail to acquire an earlier value from a thread
with a lower ID. If a thread that executed AcquireGlobal then releases
to a sync clock, it will spoil the sync clock with the inconsistent
values. If another thread later releases to the sync clock, the optimized
algorithm may break.

The exact sequence of events that leads to the failure.
- thread 1 executes AcquireGlobal
- thread 1 acquires value 1 for thread 2
- thread 2 increments clock to 2
- thread 2 releases to sync object 1
- thread 3 at time 1
- thread 3 acquires from sync object 1
- thread 1 acquires value 1 for thread 3
- thread 1 releases to sync object 2
- sync object 2 clock has 1 for thread 2 and 1 for thread 3
- thread 3 releases to sync object 2
- thread 3 sees value 1 in the clock for itself
  and decides that it has already released to the clock
  and did not acquire anything from other threads after that
  (the last_acquire_ check in release operation)
- thread 3 does not update the value for thread 2 in the clock from 1 to 2
- thread 4 acquires from sync object 2
- thread 4 detects a false race with thread 2
  as it should have been synchronized with thread 2 up to time 2,
  but because of the broken clock it is now synchronized only up to time 1

The global_acquire_ value helps to prevent this scenario.
Namely, thread 3 will not trust any own clock values up to global_acquire_
for the purposes of the last_acquire_ optimization.

Reviewed-in: https://reviews.llvm.org/D80474
Reported-by: nvanbenschoten (Nathan VanBenschoten)

[AArch64][BFloat] basic AArch64 bfloat support

Summary:
This patch adds the bfloat type to the AArch64 backend:
- adds it as part of the FPR16 register class
- adds bfloat calling conventions
- as f16 is now not the only FPR16 type anymore, we need to constrain a number
of instruction patterns using FPR16Op to help out the TableGen type inferrer

This patch is part of a series implementing the Bfloat16 extension of the
Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type, and its properties are specified in the Arm Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

Reviewers: t.p.northover, c-rhodes, fpetrogalli, sdesmalen, ostannard, LukeGeeson, ab

Reviewed By: fpetrogalli

Subscribers: pbarrio, LukeGeeson, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79709

[mlir] SCF: provide function_ref builders for IfOp

Now that OpBuilder is available in `build` functions, it becomes possible to
populate the "then" and "else" regions directly when building the "if"
operation. This is desirable in more structured forms of builders, especially
in when conditionals are mixed with loops. Provide new `build` APIs taking
callbacks for body constructors, similarly to scf::ForOp, and replace more
clunky edsc::BlockBuilder uses with these. The original APIs remain available
and go through the new implementation.

Differential Revision: https://reviews.llvm.org/D80527

[compiler-rt][asan] Add noinline to use-after-scope testcases

Some testcases are unexpectedly passing with NPM.
This is because the target functions are inlined in NPM.

I think we should add noinline attribute to keep these test points.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D79648

[llvm-readobj] - Do not crash when an invalid .eh_frame_hdr is dumped using --unwind.

When the p_offset/p_filesz of the PT_GNU_EH_FRAME is invalid
(e.g larger than the file size) then llvm-readobj might crash.

This patch fixes the issue. I've introduced `ELFFile<ELFT>::getSegmentContent`
method, which is very similar to `ELFFile<ELFT>::getSectionContentsAsArray` one.

Differential revision: https://reviews.llvm.org/D80380

[IR][BFloat] add BFloat IR intrinsics support

Summary:
This patch is part of a series that adds support for the Bfloat16 extension of
the Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type, and its properties are specified in the Arm Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

Reviewers: scanon, fpetrogalli, sdesmalen, craig.topper, LukeGeeson

Reviewed By: fpetrogalli

Subscribers: LukeGeeson, pbarrio, kristof.beyls, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79707

[UnJ] Update LI for inner nested loops

This makes sure to correctly register the loop info of the children
of unroll and jammed loops. It re-uses some code from the unroller for
registering subloops.

Differential Revision: https://reviews.llvm.org/D80619

AMDGPU: Fix backwards s_cselect_* operands

The vector equivalent has backwards operands, but the scalar version
does not. The passes that use these hooks aren't enabled by default,
so this doesn't really change anything.

[IR] add set function for FMF 'contract'

This was missed when the flag was added with D31164.

ObjectFile.h - reduce unnecessary includes to forward declarations. NFC.

Fix SubtargetFeature.h include dependency in XCOFFObjectFile.cpp

ObjCARCInstKind.h - remove unused includes. NFC.

[CodeGen][BFloat] Add bfloat MVT type

Summary:
This patch adds BFloat MVT support. It also adds fixed and scalable vector MVT
types for BFloat.

This patch is part of a series that adds support for the Bfloat16 extension of the Armv8.6-a architecture, as
detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type, and its properties are specified in the Arm Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

Reviewers: aemerson, huntergr, craig.topper, fpetrogalli, sdesmalen, LukeGeeson, ostannard

Reviewed By: ostannard

Subscribers: LukeGeeson, pbarrio, dschuff, kristof.beyls, hiraditya, aheejin, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79706

Update release notes with porting guide for AST Matchers

[Alignment] Fix misaligned interleaved loads

Summary: Tentatively fixing https://bugs.llvm.org/show_bug.cgi?id=45957

Reviewers: craig.topper, nlopes

Subscribers: hiraditya, llvm-commits, RKSimon, jdoerfert, efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80276

[lldb] Tab completion for process plugin name

Summary:

1. Added tab completion to `process launch -p`, `process attach -P`, `process
connect -p`;

2. Bound the plugin name common completion as the default completion for
`eArgTypePlugin` arguments.

Reviewers: teemperor, JDevlieghere

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D79929

[ARM] Fix rewrite of frame index in Thumb2's address mode i8s4

Summary:
In Thumb2's frame index rewriting process, the address mode i8s4, which
is used by LDRD and STRD instructions, is handled by taking the
immediate offset operand and multiplying it by 4.

This behaviour is wrong, however. In this specific address mode, the
MachineInstr's immediate operand is already in the expected form. By
consequence of that, multiplying it once more by 4 yields a flawed
offset value, four times greater than it should be.

Differential Revision: https://reviews.llvm.org/D80557

[lldb] Fix a potential bug that may cause assert failure in CommandObject::CheckRequirements

Summary: `CommandObject::CheckRequirements` requires cleaning up `m_exe_ctx`
between commands. Function `HandleOptionCompletion` returns without cleaning up
`m_exe_ctx` could cause assert failure in later `CheckRequirements`.

Reviewers: teemperor, JDevlieghere

Reviewed By: teemperor

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D80447

[NFC] Updating tests

Summary:
Updating IR now that alignment is explicitly set.
This is a prerequisite to D80276.

Reviewers: efriedma

Subscribers: llvm-commits, craig.topper

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80549

[LAA] We only need pointer checks if there are non-zero checks (NFC).

If it turns out that we can do runtime checks, but there are no
runtime-checks to generate, set RtCheck.Need to false.

This can happen if we can prove statically that the pointers passed in
to canCheckPtrAtRT do not alias. This should not change any results, but
allows us to skip some work and assert that runtime checks are
generated, if LAA indicates that runtime checks are required.

Reviewers: anemet, Ayal

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D79969

Note: This is a recommit of 259abfc7cbc11cd98c05b1eb8e4b3fb6a9664bc0,
with some suggested renaming.

Revert "[LAA] We only need pointer checks if there are non-zero checks (NFC)."

This reverts commit 259abfc7cbc11cd98c05b1eb8e4b3fb6a9664bc0.

Reverting this, as I missed a case where we return without setting
RtCheck.Need.

[LAA] We only need pointer checks if there are non-zero checks (NFC).

If it turns out that we can do runtime checks, but there are no
runtime-checks to generate, set RtCheck.Need to false.

This can happen if we can prove statically that the pointers passed in
to canCheckPtrAtRT do not alias. This should not change any results, but
allows us to skip some work and assert that runtime checks are
generated, if LAA indicates that runtime checks are required.

Reviewers: anemet, Ayal

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D79969

[SimpleLoopUnswitch] Drop uses of instructions before block deletion

Currently if instructions defined in a block are used in unreachable
blocks and SimpleLoopUnswitch attempts deleting the block, it triggers
assertion "Uses remain when a value is destroyed!".
This patch fixes it by replacing all uses of instructions from BB with
undefs before BB deletion.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D80551

[llvm-readelf] - Split GNUStyle<ELFT>::printHashHistogram. NFC.

As was mentioned in review comments for D80204,
`printHashHistogram` has 2 lambdas that are probably too large
and deserves splitting into member functions.

This patch does it.

Differential revision: https://reviews.llvm.org/D80546

ArchiveWriter.h - remove unnecessary includes. NFC.

DOTGraphTraitsPass.h - remove unnecessary includes. NFC.

[llvm-readobj] - Do not skip building of the GNU hash table histogram.

When the `--elf-hash-histogram` is used, the code first tries to build
a histogram for the .hash table and then for the .gnu.hash table.

The problem is that dumper might return early when unable or do not need to
build a histogram for the .hash.

This patch reorders the code slightly to fix the issue and adds a test case.

Differential revision: https://reviews.llvm.org/D80204

[lldb] Don't complete ObjCInterfaceDecls in ClangExternalASTSourceCallbacks::FindExternalVisibleDeclsByName

Summary:
For ObjCInterfaceDecls, LLDB iterates over the `methods` of the interface in FindExternalVisibleDeclsByName
since commit ef423a3ba57045f80b0fcafce72121449a8b54d4 .
However, when LLDB calls `oid->methods()` in that function, Clang will pull in all declarations in the current
DeclContext from the current ExternalASTSource (which is again, `ClangExternalASTSourceCallbacks`). The
reason for that is that `methods()` is just a wrapper for `decls()` which is supposed to provide a list of *all*
(both currently loaded and external) decls in the DeclContext.

However, `ClangExternalASTSourceCallbacks::FindExternalLexicalDecls` doesn't implement support for ObjCInterfaceDecl,
so we don't actually add any declarations and just mark the ObjCInterfaceDecl as having no ExternalLexicalStorage.

As LLDB uses the ExternalLexicalStorage to see if it can complete a type with the ExternalASTSource, this causes
that LLDB thinks our class can't be completed any further by the ExternalASTSource
and will from on no longer make any CompleteType/FindExternalLexicalDecls calls to that decl. This essentially
renders those types unusable in the expression parser as they will always be considered incomplete.

This patch just changes the call to `methods` (which is just a `decls()` wrapper), to some ad-hoc `noload_methods`
call which is wrapping `noload_decls()`. `noload_decls()` won't trigger any calls to the ExternalASTSource, so
this prevents that ExternalLexicalStorage will be set to false.

The test for this is just adding a method to an ObjC interface. Before this patch, this unset the ExternalLexicalStorage
flag and put the interface into the state described above.

In a normal user session this situation was triggered by setting a breakpoint in a method of some ObjC class. This
caused LLDB to create the MethodDecl for that specific method and put it into the the ObjCInterfaceDecl.
Also `ObjCLanguageRuntime::LookupInCompleteClassCache` needs to be unable to resolve the type do
an actual definition when the breakpoint is set (I'm not sure how exactly this can happen, but we just
found no Type instance that had the `TypePayloadClang::IsCompleteObjCClass` flag set in its payload in
the situation where this happens. This however doesn't seem to be a regression as logic wasn't changed
from what I can see).

The module-ownership.mm test had to be changed as the only reason why the ObjC interface in that test had
it's ExternalLexicalStorage flag set to false was because of this unintended side effect. What actually happens
in the test is that ExternalLexicalStorage is first set to false in `DWARFASTParserClang::CompleteTypeFromDWARF`
when we try to complete the `SomeClass` interface, but is then the flag is set back to true once we add
the last ivar of `SomeClass` (see `SetMemberOwningModule` in `TypeSystemClang.cpp` which is called
when we add the ivar). I'll fix the code for that in a follow-up patch.

I think some of the code here needs some rethinking. LLDB and Clang shouldn't infer anything about the ExternalASTSource
and its ability to complete the current type form the `ExternalLexicalStorage` flag. We probably should
also actually provide any declarations when we get asked for the lexical decls of an ObjCInterfaceDecl. But both of those
changes are bigger (and most likely would cause us to eagerly complete more types), so those will be follow up patches
and this patch just brings us back to the state before commit ef423a3ba57045f80b0fcafce72121449a8b54d4 .

Fixes rdar://63584164

Reviewers: aprantl, friss, shafik

Reviewed By: aprantl, shafik

Subscribers: arphaman, abidh, JDevlieghere

Differential Revision: https://reviews.llvm.org/D80556

VPlanValue.h - reduce unnecessary includes to forward declarations. NFC.

[X86][SSE] Convert PTEST to MOVMSK for allsign bits vector results

If we are using PTEST to check 'allsign bits' vector elements we can use MOVMSK to extract the signbits directly and perform the comparison on the scalar value.

For vXi16 cases, as we don't have a MOVMSK for this type, we must mask each signbit out of a PMOVMSKB v2Xi8 result, which folds into the TEST comparison.

If this allows us to remove a vector op (via the SimplifyMultipleUseDemandedBits call) this is consistently faster than a PTEST (https://godbolt.org/z/ziJUst).

I'm investigating whether we ever get regressions without the SimplifyMultipleUseDemandedBits call, even if this means we don't remove a vector op, but that has exposed some other poor codegen issues that I'm still investigating and would have to wait for a later patch.

Suggested on PR42035 to avoid unnecessary ashr(x,bw-1)/pcmpgt(0,x) sign splat patterns feeding into ptest.

Differential Revision: https://reviews.llvm.org/D80563

[GlobalISel][InlineAsm] Add missing EarlyClobber flag to inline asm output operands

Summary:
Previously, we only added early-clobber flags to the 'group' immediate flag operand
of an inline asm operand.
However, we also have to add the EarlyClobber flag to the MachineOperand itself.

This fixes PR46028

Reviewers: arsenm, leonardchan

Reviewed By: arsenm, leonardchan

Subscribers: phosek, wdng, rovka, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80467

[StackSafety] Bailout on some function calls

Don't miss values used in calls outside regular argument list.

[StackSafety] Fix formatting in the test

[StackSafety] Ignore some use of values

We should ignore value used in MemTransferInst
as other then src/dst argument.

[DebugInfo] - Fix typo in comment. NFC.

I've forgot to address this bit when landed D80476.

[NFC][Debugify] Format the CheckModuleDebugify output

This fixes the output of the check-debugify option.
Without the patch an example of running the option:

$ opt -check-debugify test.ll -S -o testDebugify.ll
CheckModuleDebugifySkipping module without debugify metadata

After the patch:

$ opt -check-debugify test.ll -S -o testDebugify.ll
CheckModuleDebugify: Skipping module without debugify metadata

Differential Revision: https://reviews.llvm.org/D80553

[X86] Add helper function to reduce some code duplication when shrinking a vector load to a vzext_load.

There's more code for calling CombineTo and replacing the nodes
that I'd like to share, but its complicated by the getNode call
in the middle that needs to be specific to each opcode.

While there are also make sure we recursively delete the load
we're replacing. It eventually gets removed by a RemoveDeadNodes
call at the end of DAG combine, but we should be more eager about
it. We were inconsistently doing this in some places but not all.

[VE] Dynamic stack allocation

Summary:
This patch implements dynamic stack allocation for the VE target. Changes:
* compiler-rt: `__ve_grow_stack` to request stack allocation on the VE.
* VE: base pointer support, dynamic stack allocation.

Differential Revision: https://reviews.llvm.org/D79084

Add test exposing a bug in SimpleLoopUnswitch.

[OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1

Summary:
Allow AMDGCN as a GPU offloading target for OpenMP during compiler
invocation and allow setting CUDAMode for it.

Originally authored by Greg Rodgers (@gregrodgers).

Reviewers: ronlieb, yaxunl, b-sumner, scchan, JonChesterfield, jdoerfert, sameerds, msearles, hliao, arsenm

Reviewed By: sameerds

Subscribers: sstefan1, jvesely, wdng, arsenm, guansong, dexonsmith, cfe-commits, llvm-commits, gregrodgers

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D79754

Automatically configure MLIR when flang is enabled

This is more friendly than the "Unknown CMake command “mlir_tablegen”."
that would be issued instead.

Differential Revision: https://reviews.llvm.org/D80359

[PGO] Fix computation of function Hash

And bump its version number accordingly.

This is a patched recommit of 7c298c104bfe725d4315926a656263e8a5ac3054

Previous hash implementation was incorrectly passing an uint64_t, that got converted
to an uint8_t, to finalize the hash computation. This led to different functions
having the same hash if they only differ by the remaining statements, which is
incorrect.

Added a new test case that trivially tests that a small function change is
reflected in the hash value.

Not that as this patch fixes the hash computation, it would invalidate all hashes
computed before that patch applies, this is why we bumped the version number.

Update profile data hash entries due to hash function update, except for binary
version, in which case we keep the buggy behavior for backward compatibility.

Differential Revision: https://reviews.llvm.org/D79961

[X86] Lower sse_cmp_ss/sse2_cmp_sd intrinsics to X86ISD::FSETCC with vector types.

Isel match that instead of the intrinsic. Similar to what we do
for avx512.

Trying to move more intrinsics to target specific ISD opcodes.
Hoping to add DAG combines to shrink simple loads going into
scalar intrinsics that only read 32 or 64 bits.

[X86] Use SIMD_EXC to remove some let statements in tablegen. NFCI

[X86][llvm-mc] Make the suffix matcher more accurate.

Summary:
Some instruction like VPMULDQ is NOT the variant of VPMULD but a new
one.
So we should make sure the suffix matcher only works for memory variant
that has the same size with the suffix.
Currently we only check for SSE/AVX* instructions, because many legacy
instructions didn't declare the alias instructions of their variants.

Differential Revision: https://reviews.llvm.org/D80608

[StackSafety] Use SCEV to find mem operation length

[StackSafety] Use getSignedRange for offsets

[analyzer] Add support for IE of keyboard and mouse navigation in HTML report

IE throws errors while using key and mouse navigation through the error path tips.
querySelectorAll method returns NodeList. NodeList belongs to browser API. IE doesn't have forEach among NodeList's methods. At the same time Array is a JavaScript object and can be used instead. The fix is in the converting NodeList into Array and keeps using forEach method as before.

Checked in IE11, Chrome and Opera.

Differential Revision: https://reviews.llvm.org/D80444

[libc][NFC][Obvious] Convert the MPFR operations enum to an enum class.

This was suggested in https://reviews.llvm.org/D79149.

[mlir][linalg] Allow promotion to use callbacks for
alloc/dealloc/copies.

Add options to LinalgPromotion to use callbacks for implementating the
allocation, deallocation of buffers used for the promoted subviews,
and to copy data into and from the original subviews to the allocated
buffers.
Also some misc. cleanup of the code.

Differential Revision: https://reviews.llvm.org/D80365

[mlir][Linalg] Avoid using scf.parallel for non-parallel loops in Linalg ops.

Modifying the loop nest builder for generating scf.parallel loops to
not generate scf.parallel loops for non-parallel iterator types in
Linalg operations. The existing implementation incorrectly generated
scf.parallel for all tiled loops. It is rectified by refactoring logic
used while lowering to loops that accounted for this.

Differential Revision: https://reviews.llvm.org/D80188

[compiler-rt][NFC]Fix Wdeprecated warnings for fsanitize-coverage

A few testcases are still using deprecated options.

warning: argument '-fsanitize-coverage=[func|bb|edge]' is deprecated,
use '-fsanitize-coverage=[func|bb|edge],[trace-pc-guard|trace-pc]'
instead [-Wdeprecated]

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D79741

[NFC][PowerPC] Modify the test case two-address-crash.mir

Temporarily Revert "[Clang][AArch64] Capturing proper pointer alignment for Neon vld1 intrinsicts"
as it's causing crashes on code generation and https://bugs.llvm.org/show_bug.cgi?id=46084

This reverts commit 98cad555e29187a03e2bc3db5780762981913902.

GlobalISel: Add a clarification to G_STORE documentation

Mirror the note on G_LOAD. We probably do need to add an explicit
G_TRUNCSTORE opcode for the vector case, although I do not have a use
for it.

GlobalISel: Basic legalization for G_PTRMASK

[StackSafety] Simplify SCEVRewriteVisitor

Probably NFC.

[NFC, StackSafety] Add some missing includes

[NFC, StackSafety] Remove duplicate code

[NFC, StackSafety] Better names for internal stuff

Remove const from some parameters as upcoming changes in ScalarEvolution
calls will need non const pointers.

[AArch64][GlobalISel] Do not modify predicate when optimizing G_ICMP

This fixes a bug in `tryOptArithImmedIntegerCompare`.

It is unsafe to update the predicate on a MachineOperand when optimizing a
G_ICMP, because it may be used in more than one place.

For example, when we are optimizing G_SELECT, we allow compares which are used
in more than one G_SELECT. If we modify the G_ICMP, then we'll break one of
the G_SELECTs.

Since the compare is being produced to either

1) Select a G_ICMP
2) Fold a G_ICMP into an instruction when profitable

there's no reason to actually modify it. The change is local to the specific
compare.

Instead, pass a `CmpInst::Predicate` to `tryOptArithImmedIntegerCompare` which
can be modified by reference.

Differential Revision: https://reviews.llvm.org/D80585

Add self as code owner for SCEV and IndVars

This was discussed on llvm-dev thread "Transferring code ownership for SCEV and IndVars" a few months back. I just forgot to make the actual change.

[lldb/Docs] Add the application speicfic lldbinit to the man page

This used to be part of the man page but got lost when we moved to
generating it with Sphinx.

Split a test file so that most of it can be autogened

Autogen a couple of test files to make a future diff easier to read

[lldb][Core] Remove dead codepath in Mangled

Summary:
Objective-C names are stored in m_demangled, not in m_mangled. The
method in the condition will never return true.

Differential Revision: https://reviews.llvm.org/D79823