review.tizen.org Git - platform/upstream/llvm.git/log

Emit LLVM IR equivalent of sizeof when lowering alloc operations

Originally, the lowering of `alloc` operations computed the number of
bytes to allocate from the properties of the MLIR type. This does
not take into account type legalization that happens when compiling LLVM IR
down to target assembly. This legalization can widen the type, potentially
leading to out-of-bounds accesses to `alloc`ed data due to mismatches between
address computation that takes the widening into account and allocation that
does not. Use the LLVM IR's equivalent of `sizeof` to compute the number of
bytes to be allocated:

  %0 = getelementptr %type* null, %indexType 1
  %1 = ptrtoint %type* %0 to %indexType

adapted from
http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt
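
A minimal Python sketch of why the size has to come from the target layout rather than from the bit width of the type alone. The helper names and the (store size, ABI alignment) inputs are assumptions for illustration; in LLVM these values come from the data layout. Indexing a null pointer by one element yields the padded allocation size of that element.

```python
def alloc_size(store_size, abi_align):
    """Round the store size up to the ABI alignment (the stride used in arrays)."""
    return (store_size + abi_align - 1) // abi_align * abi_align


def sizeof_via_gep(store_size, abi_align):
    """Model `ptrtoint (getelementptr T* null, 1)`: address 0 plus one element."""
    null_address = 0
    return null_address + 1 * alloc_size(store_size, abi_align)


def naive_bytes(bit_width, num_elements):
    """Size derived from the bit width alone, ignoring any padding."""
    return (bit_width + 7) // 8 * num_elements


# Hypothetical element: 80 bits wide (10 bytes stored), aligned to 16 bytes.
naive = naive_bytes(80, 4)              # bytes if padding is ignored
padded = sizeof_via_gep(10, 16) * 4     # bytes the address computation assumes
```

With these assumed numbers the naive computation under-allocates, which is the kind of mismatch the commit describes.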

PiperOrigin-RevId: 274159900

LLVM Dialect: introduce llvm.mlir.null operation

Similarly to `llvm.mlir.undef`, this auxiliary operation creates an SSA value
that corresponds to `null` in LLVM IR. This operation is necessary to model
sizeof(<...>) behavior when allocating memory.

PiperOrigin-RevId: 274158760

Drop obsolete code from std to llvm memref lowering

- dropping what looks like outdated code post some of the previous
updates

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closes tensorflow/mlir#179

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/179 from bondhugula:llfix 2a72ea441fe1b3924802273ffbe9870afeb90f91
PiperOrigin-RevId: 274158273

Rename LLVM::exp and LLVM::fmuladd to LLVM::ExpOp and LLVM::FMulAddOp.

PiperOrigin-RevId: 274154655

Add unary ops and ExpOp to Standard Dialect.

PiperOrigin-RevId: 274152154

LLVM conversion: harden a test to check for LLVM funcs rather than any funcs

This test was not updated in the original commit that switched to using LLVM
functions since it wasn't broken by that change. FileCheck was able to match
the `func` part of `llvm.func` to the expected pattern and continue as usual.
Make sure the `llvm.` dialect prefix is included in the expected output.
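
The loophole can be sketched in a few lines of Python. This assumes only that a check pattern is satisfied when it occurs as a substring of an output line; `check_line_matches` is a hypothetical helper, not part of FileCheck.

```python
def check_line_matches(pattern, line):
    """Return True if the literal pattern occurs anywhere in the output line."""
    return pattern in line


# The old expectation still matches the new output through the `func` suffix.
old_pattern_passes = check_line_matches("func @foo", "llvm.func @foo() {")
# The hardened expectation no longer matches a plain `func`.
new_pattern_rejects = not check_line_matches("llvm.func @foo", "func @foo() {")
```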

PiperOrigin-RevId: 274127281

NFC: Print the generic op form after pass failure.

On failure, the IR is likely to be in an invalid state, meaning the custom printer for some operations may now crash. Using the generic op form prevents this from happening.

PiperOrigin-RevId: 274104146

Add support for generating reproducers on pass crash and failure.

This CL adds support for generating a .mlir file containing a reproducer for crashes and failures that happen during pass execution. The reproducer contains a comment detailing the configuration of the pass manager (e.g. the textual description of the pass pipeline that the pass manager was executing), along with the original input module.

Example Output:

// configuration: -pass-pipeline='func(cse, canonicalize), inline'
// note: verifyPasses=false

module {
...
}
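
A rough Python sketch of how such a file could be assembled, following the layout of the example output above. The function name and parameters are hypothetical and not the pass manager's API.

```python
def format_reproducer(pipeline, verify_passes, module_text):
    """Build reproducer text: configuration comments, blank line, then the module."""
    header = [
        f"// configuration: -pass-pipeline='{pipeline}'",
        f"// note: verifyPasses={'true' if verify_passes else 'false'}",
    ]
    return "\n".join(header) + "\n\n" + module_text + "\n"


text = format_reproducer("func(cse, canonicalize), inline", False, "module {\n}")
```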

PiperOrigin-RevId: 274088134

NFC: Initialize pass manager option fields inline instead of the class constructor.
PiperOrigin-RevId: 274087577

Standard-to-LLVM conversion: check that operands have LLVM types

In Standard to LLVM dialect conversion, the binary op conversion pattern
implicitly assumed some operands were of LLVM IR dialect type. This is not
necessarily true, for example if the Ops that produce those operands did not
match the existing conversion patterns. Check if all operands are of LLVM IR
dialect type and if not, fail to match the binary op pattern.

Closes tensorflow/mlir#168

PiperOrigin-RevId: 274063207

Translation to LLVM: check the validity of module-level Ops

Translation to LLVM expects the entry module to have only specific types of ops
that correspond to LLVM IR entities allowed in a module. Currently those are
restricted to functions and globals. Introduce an additional check at the
module level. Inside individual functions, the check for supported Ops is
already performed, but it accepts all LLVM dialect Ops and wouldn't be
immediately applicable at the module level.

PiperOrigin-RevId: 274058651

Add lowering of constant ops to SPIR-V.

The lowering is specified as a pattern and is done only if the result
is a SPIR-V scalar type or vector type.
ConstantOp with index return type needs special handling
since SPIR-V dialect does not have index types. Based on the bitwidth
of the attribute value, either i32 or i64 is chosen.
Other constant lowerings are left as a TODO.
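
The commit message does not spell out the exact selection rule. As a sketch, assuming "fits in a signed 32-bit integer" is the criterion, the choice could look like this in Python (the function name is hypothetical):

```python
def spirv_int_type_for_index(value):
    """Pick i32 when the value fits in signed 32 bits, otherwise i64 (assumed rule)."""
    if -(2**31) <= value < 2**31:
        return "i32"
    if -(2**63) <= value < 2**63:
        return "i64"
    raise ValueError("value does not fit in 64 bits")
```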

PiperOrigin-RevId: 274056805

Add trait for specified shapes matching

PiperOrigin-RevId: 274046434

Add support for canonicalizing callable regions during inlining.

This will allow for inlining newly devirtualized calls, as well as give a more accurate cost model (when we have one). Currently canonicalization will only run for nodes that have no child edges, as the child nodes may be erased during canonicalization. We can support this in the future, but it requires more intricate deletion tracking.

PiperOrigin-RevId: 274011386

Remove the need to convert operations in regions of operations that have been replaced.

When an operation with regions gets replaced, we currently require that all of the remaining nested operations are still converted even though they are going to be replaced when the rewrite is finished. This CL adds tracking for a minimal set of operations that are known to be "dead". This allows for ignoring the legalization of operations that won't survive after conversion.

PiperOrigin-RevId: 274009003

Python bindings: export index_cast

We are now properly enforcing the absence of index elements in memrefs and
tensors. Instead, users are expected to store sized integers and cast them to
index type if necessary. Expose the respective operation to Python bindings.

PiperOrigin-RevId: 273985856

Mark GPU dialect as illegal when lowering to NVVM.

PiperOrigin-RevId: 273948293

NFC: Cleanup of type checking tests

1. Rename test ops that reference operands to index from 0, consistent with how we index elsewhere.
2. Don't limit type checking functions that work for all shaped types to only tensors.
3. Don't limit (element) type checking functions and add tests for scalars.
4. Remove SSA values that don't do anything.

PiperOrigin-RevId: 273917608

Use llvm.func to define functions with wrapped LLVM IR function type

This function-like operation allows one to define functions that have wrapped
LLVM IR function type, in particular variadic functions. The operation was
added in parallel to the existing lowering flow, this commit only switches the
flow to use it.

Using a custom function type makes the LLVM IR dialect type system more
consistent and avoids complex conversion rules for functions that previously
had to use the built-in function type instead of a wrapped LLVM IR dialect type
and perform conversions during the analysis.

PiperOrigin-RevId: 273910855

Add test for fix to tablegen for custom folders for ops that return a single
variadic result.

Add missing test for single line fix to `void OpEmitter::genFolderDecls()`
entitled "Fold away reduction over 0 dimensions."

PiperOrigin-RevId: 273880337

Fix typo in QuantizedType method names

Closes tensorflow/mlir#172

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/172 from kiszk:quantops e27b57eac8f4c6ef7ee6a6f7b497d3e2f56f6798
PiperOrigin-RevId: 273879164

Pre-allocate space for results from a regex match that uses 3 match strings.

That space is 4 StringRefs, not 3, because element 0 of the match always
contains the entire source string.
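
The same off-by-one shows up in Python's `re` module, where group 0 is the whole match and the capture groups follow. The pattern and helper below are illustrative only.

```python
import re


def match_with_whole(pattern, text):
    """Return [whole match, group 1, group 2, ...], or [] when there is no match."""
    m = re.match(pattern, text)
    if m is None:
        return []
    return [m.group(0), *m.groups()]


# Three capture groups produce four entries.
entries = match_with_whole(r"(\w+)-(\w+)-(\w+)", "a-b-c")
```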

PiperOrigin-RevId: 273875606

minor spelling tweaks

--
f93661f2c25aab6cc5bf439606b0a1312998a575 by Kazuaki Ishizaki <ishizaki@jp.ibm.com>:

address @River707's comment

Closes tensorflow/mlir#176

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/176 from kiszk:spelling_tweaks_include_tools f93661f2c25aab6cc5bf439606b0a1312998a575
PiperOrigin-RevId: 273830689

Add ::printAsTextualPipeline to Pass and OpPassManager.

Allow printing out pipelines in a format that is as close as possible to the
textual pass pipeline format. Individual passes can override the print function
in order to format any options that may have been used to construct that pass.

PiperOrigin-RevId: 273813627

Guard rewriter insertion point during signature conversion.

Avoid unexpected side effect in rewriter insertion point.

PiperOrigin-RevId: 273785794

Make SPIR-V lowering infrastructure follow Vulkan SPIR-V validation.

The lowering infrastructure needs to be enhanced to lower into a
spv.Module that is consistent with the SPIR-V spec. The following
changes are needed:
1) The Vulkan/SPIR-V validation rules dictate that entry functions have a
signature of void(void). This requires changes to the function
signature conversion infrastructure within the dialect conversion
framework. When an argument is dropped from the original function
signature, a function can be specified that, when invoked, will return
the value to use as a replacement for the argument from the original
function.
2) Some changes to the type converter to make the converted type
consistent with the Vulkan/SPIR-V validation rules:
   a) Add support for converting dynamically shaped tensors to
   spv.rtarray type.
   b) Make the global variable of type !spv.ptr<!spv.struct<...>>.
3) Generate the entry point operation for the kernel functions and
automatically compute all the interface variables needed.

PiperOrigin-RevId: 273784229

Fix Windows linkage error

This CL fixes bad macro names usage in mlir_runner_utils.h.
The macro mlir_runner_utils_EXPORTS now matches what is defined in CMakeLists.txt.

PiperOrigin-RevId: 273773931

Add support for some multi-store cases in affine fusion

This PR is a stepping stone towards supporting generic multi-store
source loop nests in affine loop fusion. It extends the algorithm to
support fusion of multi-store loop nests that:
1. have only one store that writes to a function-local live out, and
2. the remaining stores are involved in loop nest self dependences
or no dependences within the function.

Closes tensorflow/mlir#162

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/162 from dcaballe:dcaballe/multi-output-fusion 7fb7dec6fe8b45f5ce176f018bfe37b256420c45
PiperOrigin-RevId: 273773907

Update the usage and comments in define_inst.sh.

PiperOrigin-RevId: 273723108

Add exp operation to LLVMOPs.td.

PiperOrigin-RevId: 273718958

Change to doxygen comments. NFC.

PiperOrigin-RevId: 273707610

Assert that region is not cloned into itself.

PiperOrigin-RevId: 273707291

NFC: Fully qualify use of std::string.
PiperOrigin-RevId: 273668957

Allow dynamic but ranked types in ops with SameOperandsAndResultShape and SameOperandsAndResultType traits

Currently SameOperandsAndResultShape trait allows operands to have tensor<*xf32> and tensor<2xf32> but doesn't allow tensor<?xf32> and tensor<10xf32>.

Also, use the updated shape compatibility helper function in TensorCastOp::areCastCompatible method.
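
A Python sketch of the relaxed compatibility rule, under the assumption that an unranked shape is compatible with anything and a dynamic dimension is compatible with any extent. The encoding (`None` for unranked, `-1` for `?`) and the function name are choices made for this example, not MLIR's API.

```python
DYNAMIC = -1  # stands in for a '?' dimension


def shapes_compatible(lhs, rhs):
    """lhs/rhs are None for unranked shapes, otherwise a list of dimension sizes."""
    if lhs is None or rhs is None:
        return True
    if len(lhs) != len(rhs):
        return False
    return all(a == b or DYNAMIC in (a, b) for a, b in zip(lhs, rhs))
```

Under this rule `tensor<?xf32>` and `tensor<10xf32>` are accepted, while `tensor<2xf32>` and `tensor<10xf32>` are still rejected.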

PiperOrigin-RevId: 273658336

Update the symbol utility methods to handle the case of unknown operations.

This enhances the symbol table utility methods to handle the case where an unknown operation may define a symbol table. When walking symbols, we now collect all symbol uses before allowing the user to iterate. This prevents the user from assuming that all symbols are actually known before performing a transformation.

PiperOrigin-RevId: 273651963

Add Instance Specific Pass Options.

This allows individual passes to define options structs and for these options to be parsed per instance of the pass while building the pass pipeline from the command line provided textual specification.

The user can specify these per-instance pipeline options like so:
```
struct MyPassOptions : public PassOptions<MyPassOptions> {
  Option<int> exampleOption{*this, "flag-name", llvm::cl::desc("...")};
  List<int> exampleListOption{*this, "list-flag-name", llvm::cl::desc("...")};
};

static PassRegistration<MyPass, MyPassOptions> pass("my-pass", "description");
```

PiperOrigin-RevId: 273650140

Add support for parsing/printing non bare-identifier SymbolRefs.

The restriction that symbols can only have identifier names is arbitrary, and artificially limits the names that a symbol may have. This change adds support for parsing and printing symbols that don't fit in the 'bare-identifier' grammar by printing the reference in quotes, e.g. @"0_my_reference" can now be used as a symbol name.
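
The printing rule can be sketched as follows in Python. The bare-identifier pattern follows the MLIR language reference (a letter or underscore, then letters, digits, `_`, `$` or `.`). The function name is hypothetical, and escaping of special characters inside the quotes is left out.

```python
import re

_BARE_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_$.]*\Z")


def print_symbol_ref(name):
    """Print @name for bare identifiers, @"name" otherwise (no escaping handled)."""
    if _BARE_ID.match(name):
        return "@" + name
    return '@"' + name + '"'
```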

PiperOrigin-RevId: 273644768

[ROCm] Fix the return type for the device function calls from i32 to i64.

This is matching what the runtime library is expecting.

Closes tensorflow/mlir#171

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/171 from deven-amd:deven-rocdl-device-func-i64 80762629a8c34e844ebdc542b34dd783990db9db
PiperOrigin-RevId: 273640767

[spirv] Add a pass to decorate the composite types with layout info.

Add a pass to decorate the composite types used by
composite objects in the StorageBuffer, PhysicalStorageBuffer,
Uniform, and PushConstant storage classes with layout information.

Closes tensorflow/mlir#156

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/156 from denis0x0D:sandbox/layout_info_decoration 7c50840fd38ca169a2da7ce9886b52b50c868b84
PiperOrigin-RevId: 273634140

Add a PatternRewriter hook for cloning a region into another.

This is similar to the `inlineRegionBefore` hook, except the original blocks are unchanged. The region to be cloned *must* not have been modified during the conversion process at the point of cloning, i.e. it must belong to an operation that has yet to be converted, or the operation that is currently being converted.

PiperOrigin-RevId: 273622533

unroll and jam: fix order of jammed bodies

- bodies would earlier appear in the order (i, i+3, i+2, i+1) instead of
(i, i+1, i+2, i+3) for example for factor 4.

- clean up hardcoded test cases
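
One way such an ordering can arise, shown as a Python sketch: inserting every new copy at a fixed point right after the original body reverses the copies, while advancing the insertion point keeps them in order. This only illustrates the symptom and is not a claim about the actual cause in the pass.

```python
def jam_fixed_insertion_point(factor):
    """Insert each new copy right after the original body (always at position 1)."""
    bodies = ["i"]
    for k in range(1, factor):
        bodies.insert(1, f"i+{k}")
    return bodies


def jam_advancing_insertion_point(factor):
    """Insert each new copy after the previously inserted one."""
    bodies = ["i"]
    for k in range(1, factor):
        bodies.append(f"i+{k}")
    return bodies
```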

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closes tensorflow/mlir#170

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/170 from bondhugula:ujam b66b405b2b1894a03b376952e32a9d0292042665
PiperOrigin-RevId: 273613131

Add support for walking the uses of a symbol.

MLIR uses symbol references to model references to many global entities, such as functions/variables/etc. Before this change, there was no way to actually reason about the uses of such entities. This change provides a walker for symbol references (via SymbolTable::walkSymbolUses), as well as 'use_empty' support (via SymbolTable::symbol_use_empty). It also resolves some deficiencies in the LangRef definition of SymbolRefAttr, namely the restrictions on where a SymbolRefAttr can be stored, ArrayAttr and DictionaryAttr, and the relationship with operations containing the SymbolTable trait.

PiperOrigin-RevId: 273549331

NFC: Remove unused default cl::opt value.

The default value is never used as the value of the elide option is only used if it has an occurrence.

PiperOrigin-RevId: 273545143

Linalg to LLVM lowering: decrease the reliance on symbol lookup in a module

During the conversion, both the original and the converted function may coexist
in the module and have the same symbol name. There is no guarantee which of the
two will be found by the symbol lookup. Avoid returning the result of the
library function lookup when lowering Linalg to Standard or LLVM. Use the
symbol reference instead. After the conversion completes, only one symbol will
remain and the Ops using SymbolRefAttrs will be referring to the correct one.

PiperOrigin-RevId: 273510079

GPUToCUDA: attach CUBIN to the nested module rather than to the function

Originally, we were attaching attributes containing CUBIN blobs to the kernel
function called by `gpu.launch_func`. This kernel is now contained in a nested
module that is used as a compilation unit. Attach compiled CUBIN blobs to the
module rather than to the function since we were compiling the module. This
also avoids duplication of the attribute on multiple kernels within the same
module.

PiperOrigin-RevId: 273497303

GPUToCUDA: emit addressof directly instead of wrapping it into a getter function

Originally, the CUBIN getter function was introduced as a mechanism to
circumvent the absence of globals in the LLVM dialect. It would allocate memory
and populate it with the CUBIN data. LLVM dialect now supports globals and they
are already used to store CUBIN data, making the getter function a trivial
address computation of a global. Emit the address computation directly at the
place of `gpu.launch_func` instead of putting it in a function and calling it.
This simplifies the conversion flow and prepares it for using the
DialectConversion infrastructure.

PiperOrigin-RevId: 273496221

Fuse GenerateCubinAccessors pass into LaunchFuncToCuda

Now that the accessor function is a trivial getter of the global variable, it
makes less sense to have the getter generation as a separate pass. Move the
getter generation into the lowering of `gpu.launch_func` to CUDA calls. This
change is mostly code motion, but the process can be simplified further by
generating the addressof in place instead of using a call. This will be done
in a follow-up.

PiperOrigin-RevId: 273492517

Use named modules for gpu.launch_func

The kernel function called by gpu.launch_func is now placed into an isolated
nested module during the outlining stage to simplify separate compilation.
Until recently, modules did not have names and could not be referenced. This
limitation was circumvented by introducing a stub kernel with the same name at
the same nesting level as the module containing the actual kernel. This
relation is only effective in one direction: from actual kernel function to its
launch_func "caller".

Leverage the recently introduced symbol name attributes on modules to refer to
a specific nested module from `gpu.launch_func`. This removes the implicit
connection between the identically named stub and kernel functions. It also
enables support for `gpu.launch_func`s to call different kernels located in the
same module.

PiperOrigin-RevId: 273491891

Upgrade some uses of mlir::interleave API to take container argument directly.

PiperOrigin-RevId: 273446814

Add a flag to the AsmPrinter for eliding large ElementsAttrs.

Some modules may have extremely large ElementsAttrs, which makes debugging involving IR dumping extremely slow and painful. This change adds a flag that will elide ElementsAttrs with a "large" (as defined by the user) number of elements by printing "..." instead of the element data.
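
The behavior can be sketched in Python as below. The `dense<...>` spelling and the helper name are assumptions made for the example; only the idea of replacing the element data with "..." above a user-chosen limit comes from the description above.

```python
def print_elements(values, elide_limit=None):
    """Print element data, or '...' when the element count exceeds elide_limit."""
    if elide_limit is not None and len(values) > elide_limit:
        return "dense<...>"
    return "dense<[" + ", ".join(str(v) for v in values) + "]>"
```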

PiperOrigin-RevId: 273413100

Print result types when dumping graphviz.

PiperOrigin-RevId: 273406833

Expose `fuseProducerOf` in Linalg/Utils/Utils.h.

PiperOrigin-RevId: 273384063

Do not add spirv::BitcastOp for cast from signed to unsigned type.

Since MLIR integer types don't make a distinction between signed vs
unsigned integers, during deserialization of SPIR-V binaries, the
OpBitcast might result in a cast from/to the same type. Do not add a
spv.Bitcast operation to the spv.module in these cases.

PiperOrigin-RevId: 273381887

[spirv] Disable a crashing spv.loop test

PiperOrigin-RevId: 273379318

Add a new class, OpPrintingFlags, to enable programmatic control of Operation::print behavior.

This allows for controlling the behavior of the AsmPrinter programmatically, instead of relying exclusively on cl::opt flags. This will also allow for more fine-tuned control of printing behavior per callsite, instead of being applied globally.

PiperOrigin-RevId: 273368361

Update UndefOp (de)serialization to generate OpUndef at module level.

The SPIR-V spec recommends all OpUndef instructions be generated at
module level. For the SPIR-V dialect it's better for UndefOp to produce
an SSA value for use with other instructions. If UndefOp is to be used
at module level, it cannot produce an SSA value (use of this SSA value
within FuncOp would need implicit capture). To satisfy needs of the
SPIR-V spec while making it simpler to represent UndefOp in the SPIR-V
dialect, the serialization is updated to create OpUndef instruction
at module scope.

PiperOrigin-RevId: 273355526

[spirv] Fix function entry block erase after moving to spv.selection

The structured selection/loop's entry block does not have arguments.
If the function's header block is also part of the structured control
flow, we cannot just simply erase it because it may contain arguments
matching the function signature and used by the cloned blocks. Instead,
turn it into a block only containing a spv.Branch op.

Also, we can directly emit instructions for the spv.selection header
block to the block containing the spv.selection op. This eliminates
unnecessary branches in the SPIR-V blob.

Added a test for nested spv.loop.

PiperOrigin-RevId: 273351424

fix simplify-affine-structures bug

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closes tensorflow/mlir#157

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/157 from bondhugula:quickfix bd1fcd79825fc0bd5b4a3e688153fa0993ab703d
PiperOrigin-RevId: 273316498

Change Block::getParent() to be a const function. This is only necessary because ilist_node_with_parent specifically requires a 'getParent() const' method. If/When ilist_node removes this constraint we should drop the const to fit the rest of the MLIR const model.

PiperOrigin-RevId: 273316153

Fix a comment in the OperationInterface example.

PiperOrigin-RevId: 273308494

Start a minimal mlir_utils runtime library for testing and debugging purposes

Now that MLIR has a standardized StridedMemRef descriptor, it becomes very easy to interact with external library functions and build utilities directly in C++.
This CL introduces basic printing support in a libmlir_utils.so.
Unit tests are rewritten using this feature and also to improve coverage.

For now, C mandates that we have a unique function for each MemRef element type and rank.
In the future, a simple unranked descriptor can be introduced to only require uniquing by element type.

PiperOrigin-RevId: 273304741

Support AllocOp terminal in Linalg::AliasAnalysis.

Now that linalg.view and strided memrefs are unified, there is no reason to
disallow AllocOp in alias analysis. This CL adds support for AllocOp, which
allows writing shorter tests that do not require explicitly creating a view for
each operation.

PiperOrigin-RevId: 273303060

Add DialectType and generate docs for dialect types

Add new `typeDescription` (description was already used by base constraint class) field to type to allow writing longer descriptions about a type being defined. This allows for providing additional information/rationale for a defined type. This currently uses `description` as the heading/name for the type in the generated documentation.

PiperOrigin-RevId: 273299332

Fix CMake build after adding TestOpaqueLoc.cpp

PiperOrigin-RevId: 273296399

Add OpaqueLoc to MLIR locations.

See RFC: https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/xE2IzfhE3Wg.

Opaque location stores two pointers, one of them points to some data structure that is external to MLIR, and the other one is unique for each type and represents type id of that data structure. OpaqueLoc also stores an optional location that can be used if the first one is not suitable.
OpaqueLoc is managed similarly to FileLineColLoc. It is passed around by MLIR transformations and can be used in compound locations like CallSiteLoc.

PiperOrigin-RevId: 273266510

Support reduction of partial warps.

gpu.all_reduce now supports block sizes that are not a multiple of 32.

PiperOrigin-RevId: 273255204

Enable emitting dialect summary & description during op generation

Sort ops per dialect and emit summary & description (if provided) of each dialect before emitting the ops of the dialect.

PiperOrigin-RevId: 273077138

Allow element type traits to operate on scalars

This allows confirming that a scalar argument has the same element type as a shaped one. It's easy to validate a type is shaped on its own if that's desirable, so this shouldn't make that use case harder. This matches the behavior of other traits that operate on element type (e.g. AllElementTypesMatch). Also this makes the code simpler because now we just use getElementTypeOrSelf.

Verified that all uses in core already check the type is shaped in another way.

PiperOrigin-RevId: 273068507

NFC: Cleanup test ops and traits tests

1. Rename a few ops to make it clear they operate on *element* types.
2. Remove unused and generic operand and result ODS names (e.g. $res, $arg, $input). These are just clutter and don't make the op definitions any clearer.
3. Give test cases with duplicate names clearer names.
4. Add missing test case for no operands in SameOperandAndResultElementType.

PiperOrigin-RevId: 273067933

[spirv] Allow return ops to be in control flow ops

Use `getParentOfType<FuncOp>()` instead of `cast<FuncOp>(getParentOp())`
to avoid crash when return ops are used inside spv.selection/spv.loop.

PiperOrigin-RevId: 273006041

Add missing dependency on the TypeInferOpInterface from the Test dialect

This is fixing a build failure, usually non-deterministic because of
parallelism in the build, but could be reliably reproduced:

ninja projects/mlir/test/lib/TestDialect/CMakeFiles/MLIRTestDialect.dir/TestPatterns.cpp.o

PiperOrigin-RevId: 272998436

Add spv.Undef op to support OpUndef instruction in SPIR-V.

Adding support for OpUndef instruction. Updating the dialect
generation script to fix a few bugs in the instruction spec
generation.

PiperOrigin-RevId: 272975685

Add some utility builder functions for SPIR-V operations.

Add builder functions for spv._address_of, spv.EntryPoint,
spv.ExecutionMode and spv.Load to make it easier to create these
operations.
Fix a minor bug in printing of spv.EntryPoint
Add a utility function to get the attribute name associated with a
decoration.

PiperOrigin-RevId: 272952846

Replace constexpr MemRefType::kDynamicStrideOrOffset by a MemRefType::getDynamicStrideOrOffset() method - NFC

This fixes global ODR-use issues, some of which manifest in Parser.cpp.

Fixes tensorflow/mlir#167.

PiperOrigin-RevId: 272886347

Add missing Linalg lowerings to allow roundtrip.mlir to lower to LLVM

Certain lowering patterns were reported as [missing](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/dkdmHa77sSQ).

This CL adds them and allows Linalg/roundtrip.mlir and Linalg/loops.mlir to lower to LLVM directly. Those 2 tests are updated to additionally check that the direct lowering to LLVM does not crash.

The following points, left as TODOs, still need to be addressed for correct end-to-end execution:
1. the lowering for ConvOp needs to pass attributes such as strides and dilations; the external library call needs to support it.
2. the lowering for GenericOp needs to support lowering to loops as a DialectConversion pattern. This is blocked on the DialectConversion infrastructure accepting an OperationFolder.

PiperOrigin-RevId: 272878131

Moving the GPUIndexIntrinsicOpLowering template to a common location

The GPUIndexIntrinsicOpLowering template is currently used by the code in both the GPUToNVVM and GPUToROCDL dirs.
Moving it to a common location to remove code duplication.

Closes tensorflow/mlir#163

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/163 from deven-amd:deven-refactor-gpu-index-ops-lowering b8dc2a5f5353df196039b6ff2ad42106028693ed
PiperOrigin-RevId: 272863297

Fix typos, NFC.

PiperOrigin-RevId: 272851237

Add support for inlining calls with different arg/result types from the callable.

Some dialects have implicit conversions inherent in their modeling, meaning that a call may have a different type than the type that the callable expects. To support this, a hook is added to the dialect interface that allows for materializing conversion operations during inlining when there is a mismatch. A hook is also added to the callable interface to allow for introspecting the expected result types.

PiperOrigin-RevId: 272814379

Update the Inliner pass to work on SCCs of the CallGraph.

This allows for the inliner to work on arbitrary call operations. The updated inliner will also work bottom-up through the callgraph enabling support for multiple levels of inlining.
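
Bottom-up traversal over SCCs can be sketched with Tarjan's algorithm, which emits each strongly connected component only after the components it calls into. The dictionary-based call graph and the function name are assumptions for this example and do not mirror the CallGraph API.

```python
def tarjan_sccs(call_graph):
    """call_graph maps caller -> list of callees. Returns SCCs, callees first."""
    index = {}
    lowlink = {}
    on_stack = set()
    stack = []
    result = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for callee in call_graph.get(node, []):
            if callee not in index:
                visit(callee)
                lowlink[node] = min(lowlink[node], lowlink[callee])
            elif callee in on_stack:
                lowlink[node] = min(lowlink[node], index[callee])
        if lowlink[node] == index[node]:
            # node is the root of an SCC: pop its members off the stack.
            scc = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                scc.append(member)
                if member == node:
                    break
            result.append(sorted(scc))

    for node in call_graph:
        if node not in index:
            visit(node)
    return result


order = tarjan_sccs({"main": ["a"], "a": ["b"], "b": ["a", "leaf"], "leaf": []})
```

Processing the components in the returned order means callees are seen before their callers, and mutually recursive functions are handled together as one unit.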

PiperOrigin-RevId: 272813876

Add `axis` attribute to the quant.stats op

The first dim length of the axisStats attribute should equal the slice size
of the input argument when split along the axis dimension.

PiperOrigin-RevId: 272798042

Add fpext and fptrunc to the Standard dialect and include conversion to LLVM

PiperOrigin-RevId: 272768027

Generalize parse/printBinaryOp to parse/printOneResultOp.

PiperOrigin-RevId: 272722539

Add syntactic sugar for strided memref parsing.

This CL implements the last remaining bit of the [strided memref proposal](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).

The syntax is a bit more explicit than what was originally proposed and resembles:
`memref<?x?xf32, offset: 0 strides: [?, 1]>`

Nonnegative strides and offsets are currently supported. Future extensions will include negative strides.

This also gives a concrete example of syntactic sugar for the [[RFC] Proposed Changes to MemRef and Tensor MLIR Types](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/-wKHANzDNTg).

The underlying implementation still uses AffineMap layout.
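
For reference, the addressing that an offset and a list of strides describe is a plain affine function of the indices. A small Python sketch with static values (the function name is illustrative):

```python
def linear_index(indices, offset, strides):
    """Element position in the underlying buffer: offset + sum(index * stride)."""
    if len(indices) != len(strides):
        raise ValueError("rank mismatch between indices and strides")
    return offset + sum(i * s for i, s in zip(indices, strides))


# A row-major 3x4 view: strides [4, 1].
position = linear_index([2, 3], offset=0, strides=[4, 1])
shifted = linear_index([2, 3], offset=5, strides=[4, 1])
```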

PiperOrigin-RevId: 272717437

Make Module::getName return Optional<StringRef>

Module names are optional so it makes more sense to take and return an optional
any time the name is involved. Also update the language reference to reflect
the module names.

PiperOrigin-RevId: 272684698

Give modules a name

Modules are now Ops and, as such, can be nested. They do not produce an SSA
value so there is no possibility to refer to them in the IR. Introduce support
for symbol names attached to the module Op so that it can be referred to using
SymbolRefAttrs. The name is optional, for example the implicit top-level module
does not have a name.

PiperOrigin-RevId: 272671600

Add parentheses around boolean operators in assert

This removes a warning and is generally a good practice.

PiperOrigin-RevId: 272613597

NFC: rename Conversion/ControlFlowToCFG to Conversion/LoopToStandard

This makes the name of the conversion pass more consistent with the naming
scheme, since it actually converts from the Loop dialect to the Standard
dialect rather than working with arbitrary control flow operations.

PiperOrigin-RevId: 272612112

Disallow index types in memrefs.

As specified in the MLIR language reference and rationale documents, `memref`
types should not be allowed to have `index` as element types. As observed in
https://groups.google.com/a/tensorflow.org/forum/#!msg/mlir/P49hVWqTMNc/nW89a4i_AgAJ
this restriction was lifted when canonicalization unit tests for affine
operations were introduced, without sufficient motivation to lift the
restriction itself. The test in question can be trivially rewritten (return
the value from a function instead of storing it to prevent DCE from removing
the producer operation) and the restriction put back in place.

If `memref<...x index>` is relevant for some use cases, the relaxation of the
type system can be implemented separately with appropriate modifications to the
documentation.

PiperOrigin-RevId: 272607043

Extract MemRefType::getStridesAndOffset as a free function and fix dynamic offset determination.

This also adds coverage with a missing test, which uncovered a bug in the conditional for testing whether an offset is dynamic or not.

PiperOrigin-RevId: 272505798

[spirv] Add support for spv.selection

Similar to spv.loop, spv.selection is another op for modelling
SPIR-V structured control flow. It covers both OpBranchConditional
and OpSwitch with OpSelectionMerge.

Instead of having a `spv.SelectionMerge` op to directly model
selection merge instruction for indicating the merge target,
we use regions to delimit the boundary of the selection: the
merge target is the next op following the `spv.selection` op.
This way it's easier to discover all blocks belonging to
the selection and it plays nicer with the MLIR system.

PiperOrigin-RevId: 272475006

Fix example in OpInterfaces documentation

The concept-based polymorphism structure was missing an inheritance link
between the concept and the model. The interface class did not re-export the
base class constructor, which made it unusable with llvm::isa calls. Fix these
and reformat the surrounding code.

PiperOrigin-RevId: 272452062

Replace spurious `long` stride type by int64_t - NFC

PiperOrigin-RevId: 272425434

[ROCm] Adding pass to lower GPU Dialect to ROCDL Dialect.

This is a follow-up to PR tensorflow/mlir#146, which introduced the ROCDL Dialect. This PR introduces a pass to lower the GPU Dialect to the ROCDL Dialect. As with the previous PR, this one builds on the work done by @whchung and addresses most of the review comments in the original PR.

Closes tensorflow/mlir#154

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/154 from deven-amd:deven-lower-gpu-to-rocdl 809893e08236da5ab6a38e3459692fa04247773d
PiperOrigin-RevId: 272390729

Show type even if elementsattr is elided in graph

The type is quite useful for debugging and shouldn't be too large.

PiperOrigin-RevId: 272390311

[spirv] Change enum case uniquing in gen_spirv_dialect.py

In SPIR-V we can have multiple symbols corresponding to the same
enum value. This is because when an extension is introduced into
the core spec, its suffix is typically removed, e.g., 'VulkanKHR'
memory model becomes 'Vulkan' memory model in SPIR-V 1.5.

Previously we just kept the first symbol for an enum value, which is
not necessarily the best one. This CL instead sorts the symbols for
each enum value alphabetically and keeps the first one, which is
typically shorter and lacks the extension suffix. We also fix up
certain ones like HlslSemanticGOOGLE.
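
The uniquing rule described above can be sketched in Python as follows. This is a minimal illustration, not the code in gen_spirv_dialect.py: the function name `unique_enum_cases` and the input format are assumptions, and the numeric values are for illustration only.

```python
from collections import defaultdict

def unique_enum_cases(cases):
    """Keep one symbol per enum value: the alphabetically first one.

    `cases` is a list of (symbol, value) pairs; the result is sorted by value.
    """
    by_value = defaultdict(list)
    for symbol, value in cases:
        by_value[value].append(symbol)
    return [(sorted(symbols)[0], value)
            for value, symbols in sorted(by_value.items())]

# Illustrative input only; two symbols share the same value.
cases = [("VulkanKHR", 3), ("Vulkan", 3), ("GLSL450", 1)]
unique = unique_enum_cases(cases)
```

Because a string sorts before any longer string it is a prefix of, 'Vulkan' is chosen over 'VulkanKHR' regardless of the order in which the two symbols appear in the input.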

PiperOrigin-RevId: 272290363

Add a pair of hooks to DominanceInfo.

This exposes hooks for accessing internal dominance nodes, and updating the internal DFS numbers.
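
The commit does not say how the DFS numbers are used. A common technique in dominator-tree implementations is to number each node on entry and exit of a depth-first walk so that dominance queries reduce to an interval check. The Python sketch below illustrates that technique under this assumption; the function names and the example tree are hypothetical and do not reflect the DominanceInfo API.

```python
def assign_dfs_numbers(tree, root):
    """Assign entry/exit DFS numbers to every node of a dominator tree.

    `tree` maps each node to the list of nodes it immediately dominates.
    """
    dfs_in, dfs_out = {}, {}
    counter = 0
    # Iterative DFS; each stack entry is (node, children_already_pushed).
    stack = [(root, False)]
    while stack:
        node, visited = stack.pop()
        if visited:
            dfs_out[node] = counter
            counter += 1
            continue
        dfs_in[node] = counter
        counter += 1
        stack.append((node, True))
        for child in reversed(tree.get(node, [])):
            stack.append((child, False))
    return dfs_in, dfs_out

def dominates(a, b, dfs_in, dfs_out):
    """a dominates b iff b's DFS interval is nested inside a's."""
    return dfs_in[a] <= dfs_in[b] and dfs_out[b] <= dfs_out[a]

# Hypothetical dominator tree: "entry" immediately dominates three blocks.
tree = {"entry": ["then", "else", "merge"], "then": [], "else": [], "merge": []}
dfs_in, dfs_out = assign_dfs_numbers(tree, "entry")
```

The numbers become stale whenever the tree changes, which is one plausible reason for exposing a hook that updates them.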

Closes tensorflow/mlir#151

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/151 from schweitzpgi:dominance_hooks 69d14214a423b816cbd59feffcacdd02f3b5f921
PiperOrigin-RevId: 272287352

Fix and simplify CallOp/CallIndirectOp to LLVM::CallOp conversion

A recent ABI compatibility change affected the conversion from standard
CallOp/CallIndirectOp to LLVM::CallOp by changing its signature. In order to
analyze the signature, the code was looking up the callee symbol in the module.
This is incorrect since, during the conversion, the module may contain both the
original and the converted function op that have the same symbol name. There is
no strict guarantee on which of the two symbols will be found by the lookup.
The conversion was not failing because the type legalizer converts the LLVM
types to themselves making the original and the converted function signatures
ultimately produce the same type.

Instead of looking up the function signature to get the list of result types,
use the types of the CallOp/CallIndirectOp results which must match those of
the function in valid IR. These types are guaranteed to be the original,
unconverted types when converting the operation. Furthermore, this avoids the
need to perform a lookup of a symbol name in the module which may be expensive.

Finally, propagate attributes as-is from the original op to the converted op
since they share the attribute name for the callee of direct calls and the rest
of attributes are not affected by the conversion. This removes the need for
additional contortions between direct and indirect calls to extract the name of
the optional callee attribute only to insert it back. This also prevents the
conversion from unintentionally dropping the other attributes of the op.
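
The approach can be pictured with a toy model in Python. Everything here is an assumption made for illustration: ops are plain dictionaries, and `convert_type`, `convert_call`, and the type spellings are hypothetical stand-ins rather than the actual conversion code.

```python
def convert_type(ty):
    """Toy type converter from standard type names to LLVM-like names."""
    mapping = {"i32": "!llvm.i32", "f32": "!llvm.float"}
    # Already-converted types map to themselves.
    return mapping.get(ty, ty)

def convert_call(call):
    """Convert a toy call op using only information stored on the op itself."""
    return {
        "name": "llvm.call",
        # Result types come from the op's own results, not a symbol lookup.
        "result_types": [convert_type(t) for t in call["result_types"]],
        # Attributes, including the optional callee, are propagated unchanged.
        "attributes": dict(call["attributes"]),
    }

# A direct call carries a callee attribute; an indirect call would simply
# not have one, and both go through the same code path.
call = {
    "name": "std.call",
    "result_types": ["i32", "f32"],
    "attributes": {"callee": "@foo", "custom": 1},
}
converted = convert_call(call)
```

Since no module-level lookup happens, it does not matter whether the original or the converted function with the same symbol name is present in the module.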

PiperOrigin-RevId: 272218871

Unify Linalg types by using strided memrefs

This CL finishes the implementation of the Linalg + Affine type unification of the [strided memref RFC](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
As a consequence, the !linalg.view type, linalg::DimOp, linalg::LoadOp and linalg::StoreOp can now disappear and Linalg can use standard types everywhere.

PiperOrigin-RevId: 272187165

[spirv] NFC: rename SPV_ArithmeticOp to SPV_ArithmeticBinaryOp

Also rename SPV_UnaryArithmeticOp to SPV_ArithmeticUnaryOp to be
consistent.

PiperOrigin-RevId: 272173974

Change all_reduce lowering to support 2D and 3D blocks.

Perform the second reduce only with the first warp. This requires an additional __syncthreads(), but doesn't need special handling when the last warp is small. This simplifies support for block sizes that are not a multiple of 32.

Supporting partial warp reduce will be done in a separate CL.
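
The two-stage scheme can be simulated on the host with a short Python sketch. This models only the data flow, assuming a sum reduction, a warp size of 32, and thread values listed in linearized thread order. It does not model barriers or the generated device code, and `block_all_reduce` is a hypothetical name.

```python
WARP_SIZE = 32

def block_all_reduce(values, block_dim):
    """Simulate a two-stage block-wide sum over a 1D, 2D, or 3D thread block."""
    num_threads = 1
    for extent in block_dim:
        num_threads *= extent
    if len(values) != num_threads:
        raise ValueError("expected one value per thread")
    # Stage 1: each warp reduces its own values; the last warp may be partial.
    partials = [sum(values[start:start + WARP_SIZE])
                for start in range(0, num_threads, WARP_SIZE)]
    # Stage 2: only the first warp reduces the per-warp partial results.
    # This needs at most one partial per lane of that warp.
    if len(partials) > WARP_SIZE:
        raise ValueError("too many warps for a single second-stage warp")
    total = sum(partials)
    # Every thread in the block observes the same reduced value.
    return [total] * num_threads

# A 5x3x3 block has 45 threads: one full warp and one partial warp of 13.
result = block_all_reduce(list(range(45)), (5, 3, 3))
```

The partial last warp needs no special case in the second stage because it contributes a single partial result like any other warp.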

PiperOrigin-RevId: 272168917