review.tizen.org Git - platform/upstream/llvm.git/log

[mlir][sparse] more test cases for linalg.index

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D121660

[X86] Fix AMD Znver3 model checks

While `-march=` is correctly detected as `znver3` for the cpu,
apparently the model check is incorrect:
```
$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  32
  On-line CPU(s) list:   0-31
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 9 5950X 16-Core Processor
    CPU family:          25
    Model:               33
    Thread(s) per core:  2
    Core(s) per socket:  16
    Socket(s):           1
    Stepping:            0
    Frequency boost:     disabled
    CPU max MHz:         6017.8462
    CPU min MHz:         2200.0000
    BogoMIPS:            8050.07
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse
                         3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_p
                         state ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbn
                         oinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
Virtualization features:
  Virtualization:        AMD-V
Caches (sum of all):
  L1d:                   512 KiB (16 instances)
  L1i:                   512 KiB (16 instances)
  L2:                    8 MiB (16 instances)
  L3:                    64 MiB (2 instances)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-31
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
  Srbds:                 Not affected
  Tsx async abort:       Not affected
```

Model is 33 (0x21), while the code was expecting it to be `0x00 .. 0x1F`.
https://github.com/torvalds/linux/blob/v5.17-rc8/drivers/hwmon/k10temp.c#L432-L453 agrees.
I'm not sure if other ranges listed here should also be accepted.

I noticed this while implementing CPU model detection
for halide (https://github.com/halide/Halide/pull/6648)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121708

Revert "[clang][driver] Emit a warning if -xc/-xc++ is after the last input file"

This reverts commit 1c1a4b9556db8579f7428605ac2c351bddde9ad5.

Some builders failed.

[clang][driver] Emit a warning if -xc/-xc++ is after the last input file

This follows the same warning GCC produces.

Differential Revision: https://reviews.llvm.org/D121683

[AMDGPU] Fix typo consecutive in GCNNSAReassign

Revert "[lldb] Synchronize output through the IOHandler"

This reverts commit 242c574dc03e4b90e992cc8d07436efc3954727f because it
breaks the following tests on the bots:

- TestGuiExpandThreadsTree.py
- TestBreakpointCallbackCommandSource.py

[llvm-cov gcov] Fix calculating coverage of template functions

Template functions share the same lines in source files, so the common
container of lines' properties cannot be used to calculate the coverage
statistics of individual functions.

> cat tmpl.cpp
template <int N> int test() { return N; }
int main() { return test<1>() + test<2>(); }
> clang++ --coverage tmpl.cpp -o tmpl
> ./tmpl
> llvm-cov gcov tmpl.cpp -f
...
Function '_Z4testILi1EEiv'
Lines executed:100.00% of 1

Function '_Z4testILi2EEiv'
Lines executed:-nan% of 0
...
> llvm-cov-patched gcov tmpl.cpp -f
...
Function '_Z4testILi1EEiv'
Lines executed:100.00% of 1

Function '_Z4testILi2EEiv'
Lines executed:100.00% of 1
...

Differential Revision: https://reviews.llvm.org/D121390

[LLDB] Fixing DWARFExpression handling of ValueType::FileAddress case for DW_OP_deref_size

Currently DW_OP_deref_size just drops the ValueType::FileAddress case and does
not attempt to handle it. This adds support for this case and a test that
verifies this support.

I did a little refactoring since DW_OP_deref and DW_OP_deref_size have some
overlap in code.

Also see: rdar://66870821

Differential Revision: https://reviews.llvm.org/D121408

[flang][lowering] Add support for lowering of the `ibits` intrinsic

This patch adds support for lowering of the `ibits` intrinsic from Fortran
to the FIR dialect of MLIR.

This is part of the upstreaming effort from the `fir-dev` branch in [1].

[1] https://github.com/flang-compiler/f18-llvm-project

Differential Revision: https://reviews.llvm.org/D121693

Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Valentin Clement <clementval@gmail.com>
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>

[lldb] Make the PlatformMacOSX unit test Apple specific

I thought that x86GetSupportedArchitectures would always return
x86_64-apple-macosx as a compatible architecture, regardless of the host
achitecture, but the Debian bot disagrees with that.

[lldb] Synchronize output through the IOHandler

Add synchronization to the IOHandler to prevent multiple threads from
writing concurrently to the output or error stream.

A scenario where this could happen is when a thread (the default event
thread for example) is using the debugger's asynchronous stream. We
would delegate this operation to the IOHandler which might be running on
another thread. Until this patch there was nothing to synchronize the
two at the IOHandler level.

Differential revision: https://reviews.llvm.org/D121500

[flang][lowering] Add support for lowering the `dim` intrinsic

This patch adds support for lowering of the `dim` intrinsic from Fortran
to the FIR dialect of MLIR.

This is part of the upstreaming effort from the `fir-dev` branch in [1].

[1] https://github.com/flang-compiler/f18-llvm-project

Differential Revision: https://reviews.llvm.org/D121689

Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Valentin Clement <clementval@gmail.com>
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>

[libc] Add implementation of POSIX lseek function.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D121676

[lldb] Fix platform selection on Apple Silicon (again)

This patch is another attempt to fix platform selection on Apple
Silicon. It partially undoes D117340 which tried to fix the issue by
always instantiating a remote-ios platform for "iPhone and iPad Apps on
Apple Silicon Macs".

While the previous patch worked for attaching, it broke launching and
everything else that expects the remote platform to be connected. I made
an attempt to work around that, but quickly found out that there were
just too may places that had this assumption baked in.

This patch takes a different approach and reverts back to marking the
host platform compatible with iOS triples. This brings us back to the
original situation where platform selection was broken for remote iOS
debugging on Apple Silicon. To fix that, we now look at the process'
host architecture to differentiate between iOS binaries running remotely
and iOS binaries running locally.

I tested the following scenarios, which now all uses the desired
platform:

  - Launching an iOS binary on macOS: uses the host platform
  - Attaching to an iOS binary on macOS: uses the host platform
  - Attaching to a remote iOS binary: uses the remote-ios platform

rdar://89840215

Differential revision: https://reviews.llvm.org/D121444

[flang][lowering] Add support for lowering the `dot_product` intrinsic

This patch adds support for lowering the `dot_product` intrinsic from
Fortran to the FIR dialect of MLIR.

This is part of the upstreaming effort from the `fir-dev` branch in [1].

[1] https://github.com/flang-compiler/f18-llvm-project

Differential Revision: https://reviews.llvm.org/D121684

Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Valentin Clement <clementval@gmail.com>
Co-authored-by: Mark Leair <leairmark@gmail.com>

[mlir][sparse][taco] Reorder a class.

Define IndexExpr before IndexVar. This is to prepare for the next change
to support the use of index values in tensor expressions.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D121649

[LegalizeTypes][RISCV][WebAssembly] Expand ABS in PromoteIntRes_ABS if it will expand to sra+xor+sub later.

If we promote the ABS and then Expand in LegalizeDAG, then both the
sra and the xor will have their inputs sign extended. This generates
extra code on RISCV which lacks an i8 or i16 sign extend instructon.
If we expand during type legalization, then only the sra will get its
input sign extended. RISCV is able to combine this with the sra by
doing a shift left followed by an sra.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D121664

[DAGCombiner][RISCV] Adjust (aext (and (trunc x), cst)) -> (and x, cst) to sext cst based on target preference

RISCV strong prefers i32 values be sign extended to i64. This combine
was always zero extending the constant using APInt methods.

This adjusts the code so that it calls getNode using ISD::ANY_EXTEND instead.
getNode will call TLI.isSExtCheaperThanZExt to decide how to handle
the constant.

Tests were copied from D121598 where I noticed that we were creating
constants that were hard to materialize.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D121650

Revert "[lldb/test] Make category-skipping logic "platform"-independent"

This reverts commit dddf4ce034a8e06cc1351492dceece3fa2344c14. It breaks
a couple of tests on macos.

[RISCV] Remove lowerSPLAT_VECTOR

This code handles fixed vector SPLAT_VECTOR, but is never called in
any tests.

We only form fixed vector splat vectors for vXi64 on RV32 as part
of DAGCombine. This will be type legalized to SPLAT_VECTOR_PARTS.
So the Custom handling for SPLAT_VECTOR is never needed.

This patch makes SPLAT_VECTOR for vXi64 'Legal' on RV32 so that
DAGCombine will create it, but there's no need for Custom handler.
It will still be type legalized to SPLAT_VECTOR_PARTS.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D121673

[libc] Fix exit not calling new handlers registered from a call to atexit in atexit handler

[libc][BlockStore] Add back, pop_back and empty methods

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D121656

[clang][dataflow] Allow disabling built-in transfer functions for CFG terminators

Terminators are handled specially in the transfer functions so we need an
additional check on whether the analysis has disabled built-in transfer
functions.

Differential Revision: https://reviews.llvm.org/D121694

[InstCombine] try harder to propagate 'nsz' through fneg-of-select

This can be viewed as swapping the select arms:
https://alive2.llvm.org/ce/z/jUvFMJ
...so we don't have the 'nsz' problem with the more general fold.

This unlocks other folds for the motivating fabs example.
This was discussed in issue #38828.

[InstCombine] add tests for fneg-of-select with FMF; NFC

[libc++] Overhaul all tests for assertions and debug mode

Prior to this patch, there was no distinction between tests that check
basic assertions and tests that check full-fledged iterator debugging
assertions. Both were disabled when support for the debug mode is not
provided in the dylib, which is stronger than it needs to be.

Furthermore, all of the tests using "debug_macros.h" that contain more
than one assertion in them were broken -- any code after the first
assertion would never be executed.

This patch refactors all of our assertion-related tests to:
1. Be enabled whenever they can, i.e. basic assertions tests are run
   even when the debug mode is disabled.
2. Use the superior `check_assertion.h` (previously `debug_mode_helper.h`)
   instead of `debug_macros.h`, which allows multiple assertions in the
   same program.
3. Coalesce some tests into the same file to make them more readable.
4. Use consistent naming for test files -- no more db{1,2,3,...,10} tests.

This is a large but mostly mechanical patch.

Differential Revision: https://reviews.llvm.org/D121462

[VE] strided v256.23 isel and tests

ISel for experimental.vp.strided.load|store for v256.32 types via
lowering to vvp_load|store SDNodes.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D121616

[mlir] Fix --convert-func-to-llvm=emit-c-wrappers argument and result attribute handling

When using `--convert-func-to-llvm=emit-c-wrappers` the attribute arguments of the wrapper would not be created correctly in some cases.
This patch fixes that and introduces a set of tests for (hopefully) all corner cases.

See https://github.com/llvm/llvm-project/issues/53503

Author: Sam Carroll <sam.carroll@lmns.com>
Co-Author: Laszlo Kindrat <laszlo.kindrat@lmns.com>

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D119895

[libc] Implement expm1f function that is correctly rounded for all rounding modes.

Implement expm1f function that is correctly rounded for all rounding modes.  This is based on expf implementation.

From exhaustive testings, using expf implementation, and subtract 1.0 before rounding the final result to single precision
gives correctly rounded results for all |x| > 2^-4 with 1 exception.  When |x| < 2^-25, we use x + x^2 (implemented with a
single fma).  And for 2^-25 <= |x| <= 2^-4, we use a single degree-8 minimax polynomial generated by Sollya.

Reviewed By: sivachandra, zimmermann6

Differential Revision: https://reviews.llvm.org/D121574

[InstCombine] Add general constant support to eq/ne icmp(add(X,C1),add(Y,C2)) -> icmp(add(X,C1-C2),Y) fold

A further extension for Issue #32161

For eq/ne comparisons - the sign mismatch and bounds constraints are redundant, so if the that fold fails, fallback and just fold the constants directly.

https://alive2.llvm.org/ce/z/cdodNQ

The loop rotation test change looks mostly benign - the backend doesn't seem to suffer? https://gcc.godbolt.org/z/dErMY78To

Differential Revision: https://reviews.llvm.org/D121551

[JITLink] Fix -Wparentheses warning in R_RISCV_SUB6 case.

Perform the mask inside parentheses before applying the offset

[AARCH64] ssbs should be enabled by default for cortex-x1, cortex-x1c, cortex-a77

Reviewed By: amilendra

Differential Revision: https://reviews.llvm.org/D121206

[MLIR][OpenMP] Add support for basic SIMD construct

Patch adds a new operation for the SIMD construct. The op is designed to be very similar to the existing `wsloop` operation, so that the `CanonicalLoopInfo` of `OpenMPIRBuilder` can be used.

Reviewed By: shraiysh

Differential Revision: https://reviews.llvm.org/D118065

[ASAN] Fix darwin-interface test

Fix darwin interface test after D121464. asan_rtl_x86_64.S is not
available on Darwin.

Reviewed By: kstoimenov

Differential Revision: https://reviews.llvm.org/D121636

[gn build] Port 7262eacd4199

Revert "Load pass plugins during option processing, so that plugin options are registered and live."

This reverts commit 5e8700ce8bf58bdf0a59eef99c85185a74177555.

Revert rG9c542a5a4e1ba36c24e48185712779df52b7f7a6 "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO"

Mane of the build bots are complaining: Unknown command line argument '-lower-global-dtors'

Remove a top-level "using namespace" in TargetTransformInfoImpl.h

Avoids polluting the namespace of all files including the header.

Load pass plugins during option processing, so that plugin options are registered and live.

[lldb/test] Make category-skipping logic "platform"-independent

The decision which categories are relevant for a particular test run
happen very early in the test setup process. They use the SBPlatform
object to determine which categories should be skipped. The platform
object created for this purpose transcends individual test runs.

This setup is not compatible with the direction discussed in
<https://discourse.llvm.org/t/multiple-platforms-with-the-same-name/59594>
-- when platform objects are tied to a specific (SB)Debugger, they need
to be created alongside it, which currently happens in the test setUp
method.

This patch is the first step in that direction -- it rewrites the
category skipping logic to avoid depending on a global SBPlatform
object. Fortunately, the skipping logic is fairly simple (and I believe
it outght to stay that way) and mainly consists of comparing the
platform name against some hardcoded lists. This patch bases this
comparison on the platform name instead of the os part of the triple (as
reported by the platform).

Differential Revision: https://reviews.llvm.org/D121605

[BasicAA] Add test showing incorrect noalias result with wrapping.

@mul_may_overflow_var_nonzero_minabsvarindex_one_index shows BasicAA
incorrectly determining noalias for (%gep.917, i8* %gep.idx).
If %v == 10581764700698480926, %idx == 917 and the GEPs alias.
https://alive2.llvm.org/ce/z/yzDgnn

[mlir][bufferize] Extract buffer hoisting into separate function

This improves the modularity of the bufferization.

From now on, all ops that do not implement BufferizableOpInterface are considered hoisting barriers. Previously, all ops that do not implement the interface were not considered barriers and such ops had to be marked as barriers explicitly. This was unsafe because we could've hoisted across unknown ops where it was not safe to hoist.

As a side effect, this allows for cleaning up AffineBufferizableOpInterfaceImpl. This build unit no longer needed and can be deleted.

Differential Revision: https://reviews.llvm.org/D121519

[clang-format] Correctly format variable templates.

Fixes https://github.com/llvm/llvm-project/issues/54257.

Reviewed By: MyDeveloperDay, HazardyKnusperkeks, owenpan

Differential Revision: https://reviews.llvm.org/D121456

[X86] combineSelect - canonicalize (vXi1 bitcast(iX Cond)) with combineToExtendBoolVectorInReg before legalization

This replaces the attempt in 20af71f8ec47319d375a871db6fd3889c2487cbd to use combineToExtendBoolVectorInReg to create X86ISD::BLENDV masks directly, instead we use it to canonicalize the iX bitcast to a sign-extended mask and then truncate it back to vXi1 prior to legalization breaking it apart.

Fixes #53760

[clang-format] Add regression tests for function ref qualifiers on operator definition. NFC.

Fixes https://github.com/llvm/llvm-project/issues/54374.

[LV] Make reduction-order.ll test independent of instruction naming.

Also update test to not use branch on undef.

[NFC] Add LazyValueInfo::clear method

This method just calls LazyValueInfoImpl::clear

[clang-format] Correctly recognize arrays in template parameter list.

Fixes https://github.com/llvm/llvm-project/issues/54245.

Reviewed By: MyDeveloperDay, HazardyKnusperkeks, owenpan

Differential Revision: https://reviews.llvm.org/D121584

[mlir][gpu] Introduce gpu.global_id op

Introduce OpenCL-style global_id op and corresponding spirv lowering.

Differential Revision: https://reviews.llvm.org/D121548

[mlir][spirv] Add AssumeTrueKHROp

Differential Revision: https://reviews.llvm.org/D121601

[mlir][bufferize][NFC] Deallocate all buffers at the end of bufferization

This makes bufferization more modular. This is in preparation of future refactorings.

Differential Revision: https://reviews.llvm.org/D121362

[OpenMPOpt] Avoid pointer element type access during region merging

Hardcode the function type as ParallelTask, which is the guaranteed
pointee type of this runtime function argument (if pointee types
exist). The elimination of the callee bitcast is left for InstCombine.

Differential Revision: https://reviews.llvm.org/D120885

[mlir][bufferize][NFC] Split BufferizationState into AnalysisState/BufferizationState

Differential Revision: https://reviews.llvm.org/D121361

[flang] fulfill -Msave/-fno-automatic in main programs too

`semantics::IsSaved()` was not applying -Msave/-fno-automatic for main programs.
This caused issues since lowering relies on it to allocate static
variables. This did not match nvfortran/gfortran behaviors where
-fno-automatic/-Msave control the static allocation of scalars in
main programs.

Some program may rely on main program scalars to be statically allocated in
bss (and therefore initialized to zero) with -Msave/-fno-automatic
flags.

Differential Revision: https://reviews.llvm.org/D121603

[mlir][bufferize] Fix config not passed to greedy rewriter

Also add a TODO to switch to a custom walk instead of the GreedyPatternRewriter, which should be more efficient. (The bufferization pattern is guaranteed to apply only a single time for every op, so a simple walk should suffice.)

We currently specify a top-to-bottom walk order. This is important because other walk orders could introduce additional casts and/or buffer copies. These canonicalize away again, but it is more efficient to never generate them in the first place.

Note: A few of these canonicalizations are not yet implemented.

Differential Revision: https://reviews.llvm.org/D121518

[libc][Obvious] Fix typo in CMake file.

[flang] Hanlde COMPLEX 2/3/10 in runtime TypeCode(cat, kind)

Type codes for COMPLEX kinds 2, 3, and 10 were added in https://reviews.llvm.org/D117336
but handling for these kinds in TypeCode(cat, kind) has not been added
yet.

Differential Revision: https://reviews.llvm.org/D121587

[MachineLICM] Simplify code and avoid adding nullptr values to ParentMap. NFC

[LV] Remove LoopVectorBody from InnerLoopVectorizer. (NFCI)

Update places still referencing LoopVectorBody to use the vector loop to
get the vector loop header. This is needed to move vector loop
code-generation to VPlan completely, which in turn is needed to model
pre-header & exit blocks in VPlan as well.

[mlir] Remove the deprecated ODS Op verifier/parser/printer code blocks

These have been deprecated for ~1 month now and can be removed.

Differential Revision: https://reviews.llvm.org/D121090

[clang][dataflow] Model the behavior of non-standard optional constructors

Model nullopt, inplace, value, and conversion constructors.

Reviewed-by: ymandel, xazax.hun, gribozavr2
Differential Revision: https://reviews.llvm.org/D121602

[mlir][Bazel] Adjust build file to account for new td files.

[PowerPC] Disable perfect shuffle by default

We are going to remove the old 'perfect shuffle' optimization since it
brings performance penalty in hot loop around vectors. For example, in
following loop sharing the same mask:

%v.1 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>
%v.2 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>

The generated instructions will be `vmrglw-vmrghw-vmrglw-vmrghw` instead
of `vperm-vperm`. In some large loop cases, this causes 20%+ performance
penalty.

The original attempt to resolve this is to pre-record masks of every
shufflevector operation in DAG, but that is somewhat complex and brings
unnecessary computation (to scan all nodes) in optimization. Here we
disable it by default. There're indeed some cases becoming worse after
this, which will be fixed in a more careful way in future patches.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D121082

[mlir] Refactor how parser/printers are specified for AttrDef/TypeDef

There is currently an awkwardly complex set of rules for how a
parser/printer is generated for AttrDef/TypeDef. It can change depending on if a
mnemonic was specified, if there are parameters, if using the assemblyFormat, if
individual parser/printer code blocks were specified, etc. This commit refactors
this to make what the attribute/type wants more explicit, and to better align
with how formats are specified for operations.

Firstly, the parser/printer code blocks are removed in favor of a
`hasCustomAssemblyFormat` bit field. This aligns with the operation format
specification (and is nice to remove code blocks from ODS).

This commit also adds a requirement to explicitly set `assemblyFormat` or
`hasCustomAssemblyFormat` when the mnemonic is set and the attr/type
has no parameters. This removes the weird implicit matrix of behavior,
and also encourages the author to make a conscious choice of either C++
or declarative format instead of implicitly opting them into the C++
format (we should be pushing towards declarative when possible).

Differential Revision: https://reviews.llvm.org/D121505

[mlir] Rewrite and modernize the documentation for defining Attributes/Types

The current documentation is super old, crusty, and at times wrong. This commit
rewrites the documentation to focus on the TableGen declarative definition,
expounds on various components, and moves the doc out of Tutorials/ and into
a new top level `AttributesAndTypes.md` doc. As part of this, the AttrDef/TypeDef
documentation in OpDefinitions.md is removed.

Differential Revision: https://reviews.llvm.org/D120011

[mlir] Split out AttrDef/TypeDef and pattern constructs from OpBase.td

OpBase.td has formed into a huge monolith of all ODS constructs. This
commits starts to rectify that by splitting out some constructs to their
own .td files.

Differential Revision: https://reviews.llvm.org/D118636

[mlir][ods] Add support for custom directive in attr/type formats

This patch adds support for custom directives in attribute and type formats. Custom directives dispatch calls to user-defined parser and printer functions.

For example, the assembly format "custom<Foo>($foo, ref($bar))" expects a function with the signature

```
LogicalResult parseFoo(AsmParser &parser, FailureOr<FooT> &foo, BarT bar);
void printFoo(AsmPrinter &printer, FooT foo, BarT bar);
```

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D120944

[NFC][XCOFF] Refactor and format XCOFFObjectWriter.cpp.

Reviewed By: jhenderson, DiggerLin

Differential Revision: https://reviews.llvm.org/D120858

[llvm-objcopy] Simplify CompressedSection creation. NFC

Remove Expected<CompressedSection> factory functions in favor of constructors
now that zlib::compress returns void (D121512).

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D121644

[MC][test] Add more .loc directives to improve portability with older zlib

Make .debug_line so larger so that MC will more assuredly compress .debug_line
(it doesn't compress a section if compressed content is not smaller).

Revert "[mlirTranslateMain] Add a customization callback."

This reverts commit f18d6af7e972ed0e2215ad098b4c5f52ccb68b5f.
This patch is a more controversial than I expected, it is
better to revert while the discussion continues. xref this
thread:
https://discourse.llvm.org/t/doc-mlir-translate-mlir-opt/60751/

xref this phab patch: https://reviews.llvm.org/D120970

Differential Revision: https://reviews.llvm.org/D121668

[mlir][nvvm] Fix bug in ldmatrix intrinsic conversion

The ldmatrix intrinsic trans option was inverted.

Bug found by @christopherbate!

Differential Revision: https://reviews.llvm.org/D121666

[lldb] Cleanup MacOSX platform headers (NFC)

While working on dde487e54782 I noticed that the MacOSX platforms were
in need of some love. This patch cleans up the headers:

- Move platforms into the lldb_private namespace.
- Remove lldb_private:: prefixes to improve readability.
- Fix header includes and use forward declarations (iwyu).
- Fix formatting

[clang] Fix DIFile directory root on Windows

On unix systems this logic would not separate the file and directory of
the DIFile unless they shared more components at the start than just the
root path character. The logic to do this was unix specific so it didn't
work on Windows. Now we check if the entire root_path is the same as
what you were going to set as the Dir and use the full filepath in that
case.

Differential Revision: https://reviews.llvm.org/D111579

[test] Add lit helper for windows paths

This adds 2 new lit helpers `%{fs-src-root}` and `%{fs-sep}`, these
allow writing tests that correctly handle slashes on Windows. In the
case of tests like clang/test/CodeGen/debug-prefix-map.c, these are
unable to correctly test behavior on both platforms, unless they fork
and add OS requirements, because the relevant logic hits host specific
codepaths like checking if paths are absolute.

Differential Revision: https://reviews.llvm.org/D111457

[WebAssembly] Fix asan issue from https://reviews.llvm.org/D121349

AMDGPU: Use removeAllRegUnitsForPhysReg()

I met the issue here when working on something else.
Actually we have already reserved EXEC, but it looks
like the register coalescer is causing the sub-register
of EXEC appears in LiveIntervals. I have not looked
deeper why register coalscer have such behavior, but
removeAllRegUnitsForPhysReg() is the right way.

Reviewed By: critson, foad, arsenm

Differential Revision: https://reviews.llvm.org/D117014

[CMake][Fuchsia] Use correct architecture for iossim

We should be building iossim for x86_64, not arm64.

Differential Revision: https://reviews.llvm.org/D121659

[lld-macho] -flat_namespace for dylibs should make all externs interposable

All references to interposable symbols can be redirected at runtime to
point to a different symbol definition (with the same name). For
example, if both dylib A and B define symbol _foo, and we load A before
B at runtime, then all references to _foo within dylib B will point to
the definition in dylib A.

ld64 makes all extern symbols interposable when linking with
`-flat_namespace`.

TODO 1: Support `-interposable` and `-interposable_list`, which should
just be a matter of parsing those CLI flags and setting the
`Defined::interposable` bit.

TODO 2: Set Reloc::FinalDefinitionInLinkageUnit correctly with this info
(we are currently not setting it at all, so we're erring on the
conservative side, but we should help the LTO backend generate more
optimal code.)

Reviewed By: modimo, MaskRay

Differential Revision: https://reviews.llvm.org/D119294

[lld-macho][nfc] Allow Defined symbols to be placed in binding sections

Previously, we only allowed this for DylibSymbols. However, in order to
properly support `-flat_namespace` as well as `-interposable`, we need
to allow this for Defined symbols too. Therefore we hoist the
`lazyBindOffset` and the `stubsHelperIndex` into the parent Symbol
class.

The actual change to support interposition under `-flat_namespace` is in
{D119294}; the NFC changes here have been split out for easier review.

Perf regression isn't stat sig on my 3.2 GHz 16-Core Intel Xeon W linking
chromium_framework:

             base           diff           difference (95% CI)
  sys_time   1.227 ± 0.021  1.234 ± 0.031  [  -0.3% ..   +1.5%]
  user_time  3.665 ± 0.036  3.674 ± 0.035  [  -0.2% ..   +0.7%]
  wall_time  4.596 ± 0.055  4.609 ± 0.064  [  -0.3% ..   +0.9%]
  samples    34             47

Max RSS regression is barely stat sig:

           base                           diff                           difference (95% CI)
  time     1003664356.324 ± 15404053.912  1010380403.613 ± 10578309.455  [  +0.0% ..   +1.3%]
  samples  37                             31

Reviewed By: modimo

Differential Revision: https://reviews.llvm.org/D121351

[clang-format] Don't unwrap lines preceded by line comments

Fixes #53495

Differential Revision: https://reviews.llvm.org/D121576

[OpenMP][Fix] Fix test failing after patch

[OpenMP][Fix] Add offloading kind to AMDGPU libraries

Summary:
A previous patch added the offloading kind to the triple format we used.
I forgot to update the line where we add the AMDGPU libraries.

[gn build] Port 9c542a5a4e1b

Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO

For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.

Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.

Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.

Differential Revision: https://reviews.llvm.org/D121327

[CUDA] Add CUDA fatbinary magic

Nvidia uses fatbinaries to bundle all of their device code. This patch
adds the magic number "0x50ed55ba" used in their propeitary format to
the list of magic identifies. This is technically undocumented and could
unlikely be changed by Nvidia in the future.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D120932

[OpenMP][NFC] Refactor new driver to be more general

This path refactors the new driver to be less dependent on OpenMP. This
is done in preparation for the new driver to be able to handle other
offloading kinds and compile them together.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120934

[Clang] Add offload kind to embedded offload object

This patch adds the offload kind to the embedded section name in
preparation for offloading to different kinda like CUDA or HIP.

Depends on D120288

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120271

[OpenMP] Implement dense map info for device file

This patch implements a DenseMap info struct for the device file type.
This is used to help grouping device files that have the same triple and
architecture. Because of this the filename, which will always be unique
for each file, is not used.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120288

[OpenMP] Try to embed offloading objects after codegen

Currently we use the `-fembed-offload-object` option to embed a binary
file into the host as a named section. This is currently only used as a
codegen action, meaning we only handle this option correctly when the
input is a bitcode file. This patch adds the same handling to embed an
offloading object after we complete code generation. This allows us to
embed the object correctly if the input file is source or bitcode.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120270

[AMDGPU] gfx940: disable OP_SEL on V_DOT instructions

Differential Revision: https://reviews.llvm.org/D121634

Reland "[lld-macho] Avoid using bump-alloc in TrieBuider""

This reverts commit ee7a286cd3e4364d2f7d5b6ba4b8a6fc0d524854.

[AMDGPU] Add symbolic names for gfx940 HWREGs

The namespaces of HWREGs is now overlapping with gfx10. Thus the
patch is longer than necessary to just support new names. It also
need to handle proper error messages, i.e. to issue a "specified
hardware register is not supported on this GPU" message.

This may need a major refactoring in the future.

Differential Revision: https://reviews.llvm.org/D121418

[DFSan] Remove trampolines to unblock opaque pointers. (Reland with fix)

https://github.com/llvm/llvm-project/issues/54172

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D121250

[AMDGPU] Support for gfx940 flat lds opcodes

Differential Revision: https://reviews.llvm.org/D121414

[AMDGPU] Support gfx940 v_lshl_add_u64 instruction

Differential Revision: https://reviews.llvm.org/D121401

[ARM] __cxa_end_cleanup: avoid clobbering r4

The fix for D111703 clobbered r4 both to:
- Save/restore the original lr.
- Load the address of _Unwind_Resume for LIBCXXABI_BAREMETAL.

This patch saves and restores lr without clobbering any extra
registers.

For LIBCXXABI_BAREMETAL, it is still necessary to clobber one extra
register to hold the address of _Unwind_Resume, but it seems better to
use ip/r12 (intended for linker veneers/trampolines) than r4 for this
purpose.

The function also clobbers r0 for the _Unwind_Resume function's
parameter, but that is unavoidable.

Reviewed By: danielkiss, logan, MaskRay

Differential Revision: https://reviews.llvm.org/D121432

[Clang] noinline stmt attribute - emit warnings rather than errors

Compatible behaviour with always_inline stmt attribute

Don't report memory return values on MacOS_arm64 of SysV_arm64 ABI's.

They don't require that the memory return address be restored prior to
function exit, so there's no guarantee the value is correct. It's better
to return nothing that something that's not accurate.

Differential Revision: https://reviews.llvm.org/D121348

[AMDGPU] flat scratch SVS addressing mode for gfx940

Both VADDR and SADDR are used in SVS mode.

Differential Revision: https://reviews.llvm.org/D121254