review.tizen.org Git - platform/upstream/llvm.git/log

[MLIR] Separate and split vectorization tests

These tests have become too bulky and unwieldy.
Splitting them simplifies modifications that will occur in the next CL.

PiperOrigin-RevId: 223874321

[MLIR] Add VectorTransferOps

This CL implements and uses VectorTransferOps in lieu of the former custom
call op. Tests are updated accordingly.

VectorTransferOps come in 2 flavors: VectorTransferReadOp and
VectorTransferWriteOp.

VectorTransferOps can be thought of as a backend-independent
pseudo op/library call that needs to be legalized to MLIR (whiteboxed) before
it can be lowered to backend-dependent IR.

Note that the current implementation does not yet support a real permutation
map. Proper support will come in a follow-up CL.

VectorTransferReadOp
====================
VectorTransferReadOp performs a blocking read from a scalar memref
location into a super-vector of the same elemental type. This operation is
called 'read' as opposed to 'load' because the super-vector granularity
is generally not representable with a single hardware register. As a
consequence, memory transfers will generally be required when lowering
VectorTransferReadOp. A VectorTransferReadOp is thus a mid-level abstraction
that supports super-vectorization with non-effecting padding for full-tile-only
code.

A vector transfer read has semantics similar to a vector load, with additional
support for:
  1. an optional value of the elemental type of the MemRef. This value
     supports non-effecting padding and is inserted in places where the
     vector read exceeds the MemRef bounds. If the value is not specified,
     the access is statically guaranteed to be within bounds;
  2. an attribute of type AffineMap to specify a slice of the original
     MemRef access and its transposition into the super-vector shape. The
     permutation_map is an unbounded AffineMap that must represent a
     permutation from the MemRef dim space projected onto the vector dim
     space.

Example:
```mlir
  %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
  ...
  %val = `ssa-value` : f32
  // let %i, %j, %k, %l be ssa-values of type index
  %v0 = vector_transfer_read %A, %i, %j, %k, %l
        {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
          (memref<?x?x?x?xf32>, index, index, index, index) ->
            vector<16x32x64xf32>
  %v1 = vector_transfer_read %A, %i, %j, %k, %l, %val
        {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
          (memref<?x?x?x?xf32>, index, index, index, index, f32) ->
            vector<16x32x64xf32>
```
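The read semantics above (a permuted slice plus padding for out-of-bounds lanes) can be modeled in plain C++. The sketch below is a hypothetical reference model, not the MLIR implementation. `MemRef` and `transferRead` are illustrative names. Unlike the op, where the padding value is optional, the sketch always takes one. `perm` holds the results of the permutation_map, so `(d0, d1, d2, d3) -> (d3, d1, d2)` becomes `{3, 1, 2}`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row-major MemRef of f32 (not an MLIR class).
struct MemRef {
  std::vector<std::size_t> shape;
  std::vector<float> data;

  float at(const std::vector<std::size_t>& idx) const {
    std::size_t linear = 0;
    for (std::size_t d = 0; d < shape.size(); ++d) linear = linear * shape[d] + idx[d];
    return data[linear];
  }
};

// Reads a super-vector of shape `vshape` starting at MemRef index `base`.
// perm[v] is the MemRef dimension walked by vector dimension v.
// Lanes that fall outside the MemRef bounds receive `pad`.
std::vector<float> transferRead(const MemRef& m, const std::vector<std::size_t>& base,
                                const std::vector<std::size_t>& perm,
                                const std::vector<std::size_t>& vshape, float pad) {
  std::size_t total = 1;
  for (std::size_t s : vshape) total *= s;
  std::vector<float> out(total);
  std::vector<std::size_t> vidx(vshape.size());
  for (std::size_t n = 0; n < total; ++n) {
    // Decode the flat lane number n into a row-major vector index.
    std::size_t rem = n;
    for (std::size_t v = vshape.size(); v-- > 0;) {
      vidx[v] = rem % vshape[v];
      rem /= vshape[v];
    }
    // Map the vector index back into MemRef coordinates through the permutation.
    std::vector<std::size_t> idx = base;
    for (std::size_t v = 0; v < vshape.size(); ++v) idx[perm[v]] += vidx[v];
    bool inBounds = true;
    for (std::size_t d = 0; d < idx.size(); ++d)
      if (idx[d] >= m.shape[d]) inBounds = false;
    out[n] = inBounds ? m.at(idx) : pad;
  }
  return out;
}
```

For a 2x3 MemRef, reading 4 lanes along d1 from `(0, 1)` yields two real elements followed by two padding values. Reading with `perm = {1, 0}` yields the transpose.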

VectorTransferWriteOp
=====================
VectorTransferWriteOp performs a blocking write from a super-vector to
a scalar memref of the same elemental type. This operation is
called 'write' as opposed to 'store' because the super-vector
granularity is generally not representable with a single hardware register. As
a consequence, memory transfers will generally be required when lowering
VectorTransferWriteOp. A VectorTransferWriteOp is thus a mid-level
abstraction that supports super-vectorization with non-effecting padding
for full-tile-only code.

A vector transfer write has semantics similar to a vector store, with
additional support for handling out-of-bounds situations.

Example:
```mlir
  %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
  %val = `ssa-value` : vector<16x32x64xf32>
  // let %i, %j, %k, %l be ssa-values of type index
  vector_transfer_write %val, %A, %i, %j, %k, %l
    {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
  (vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index)
```
PiperOrigin-RevId: 223873234

Fix two more getHashValues.

These were still returning the hash of the pointers, so the two getHashValue overloads produced different results.

PiperOrigin-RevId: 223862743

FlatAffineConstraints::composeMap: return failure instead of asserting on semi-affine maps

FlatAffineConstraints::composeMap should return false instead of asserting on
a semi-affine map. Make getMemRefRegion just propagate false when encountering
semi-affine maps (instead of crashing!).
PiperOrigin-RevId: 223828743

Minor fix for replaceAllMemRefUsesWith.

The check for whether the memref was used in a non-dereferencing context had to
be done inside, i.e., only for the op stmts that the replacement was specified
to be performed on (by the domStmtFilter arg if provided). As such, it is
completely fine, for example, for a function to return a memref while the
replacement is being performed only on a specific loop's body (as in the case of
DMA generation).

PiperOrigin-RevId: 223827753

Add a simple common subexpression elimination pass.

The algorithm collects defining operations within a scoped hash table. The scopes within the hash table correspond to nodes within the dominance tree for a function. This CL only adds support for simple, i.e. non-side-effecting, operations. Side-effecting operations, e.g. load/store/call, will be handled in later patches.
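The scoped-table walk described above can be sketched on a toy IR. The sketch makes several assumptions. `Op`, `DomNode`, and `cse` are hypothetical names. `std::map` stands in for the hash table. Redundant ops are only recorded, not erased. Only side-effect-free ops are modeled. Keys inserted while visiting a dominance-tree node are erased on exit, so an op can only be reused by the ops it dominates.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR: an op computes value `result` by applying `name` to `operands`.
struct Op {
  std::string name;
  std::vector<int> operands;
  int result;
};

// A dominance-tree node: the ops of one block plus the blocks it immediately dominates.
struct DomNode {
  std::vector<Op> ops;
  std::vector<DomNode*> children;
};

using OpKey = std::pair<std::string, std::vector<int>>;

// `table` behaves as a scoped table: keys inserted in this node are removed on exit.
// `replaced` maps each redundant result to the dominating value that replaces it.
void cse(DomNode* node, std::map<OpKey, int>& table, std::map<int, int>& replaced) {
  std::vector<OpKey> insertedHere;
  for (Op& op : node->ops) {
    // Rewrite operands that refer to already-eliminated values.
    for (int& v : op.operands) {
      auto r = replaced.find(v);
      if (r != replaced.end()) v = r->second;
    }
    OpKey key{op.name, op.operands};
    auto it = table.find(key);
    if (it != table.end()) {
      replaced[op.result] = it->second;  // an equivalent op dominates this one
    } else {
      table.emplace(key, op.result);
      insertedHere.push_back(key);
    }
  }
  for (DomNode* child : node->children) cse(child, table, replaced);
  for (const OpKey& k : insertedHere) table.erase(k);  // pop this node's scope
}
```

In the test, the `add` in a dominated block is replaced by the one in the root. The `mul` ops in two sibling blocks are not merged, because neither dominates the other.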

PiperOrigin-RevId: 223811328

Remove tfl.reshape op when possible

Remove tfl.reshape for the following two cases:

1. A tfl.reshape's input is from another tfl.reshape.
Then these two tfl.reshape ops can be merged.

2. A tfl.reshape's result type is the same as its input type.
Such a tfl.reshape is a no-op and can be removed.

These transformations are put in a new source file, Canonicalizer.cpp,
because they are TFLite op to TFLite op transformations that aim
to make TFLite ops more canonical.

Also added a hasCanonicalizationPatterns marker in TableGen Op class
to indicate whether an op has custom getCanonicalizationPatterns().

PiperOrigin-RevId: 223806921
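The combined effect of the two rewrites on a chain of reshapes can be sketched as follows. The sketch is hypothetical: it works on plain shapes and ignores element types, and `simplifyReshapeChain` is an illustrative name, not the TableGen pattern. Rule 1 collapses the chain to its final shape. Rule 2 then drops the result if it matches the input.

```cpp
#include <vector>

using Shape = std::vector<long>;

// `input` is the shape fed into a chain of tfl.reshape ops; `targets` lists each
// reshape's result shape in order. Returns the reshapes left after both rewrites.
std::vector<Shape> simplifyReshapeChain(const Shape& input, const std::vector<Shape>& targets) {
  if (targets.empty()) return {};
  // Rule 1: reshape(reshape(x, s1), s2) == reshape(x, s2), so consecutive
  // reshapes merge into one producing the final shape.
  const Shape& finalShape = targets.back();
  // Rule 2: a reshape whose result type equals its input type is a no-op.
  if (finalShape == input) return {};
  return {finalShape};
}
```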

Update getHashValue for ptr values stored in a DenseMap/Set to use getHashValue of KeyTy.

Ensures both hash values returned are the same. Tested by triggering resize of map/set and verifying failure before change.
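The invariant these hashing fixes restore is that a uniqued storage object and its key must hash identically. Otherwise an entry hashed one way cannot be found by a lookup hashed the other way. This is consistent with the failure only appearing once the map or set was resized. The sketch below uses hypothetical names (`TypeKey`, `TypeStorage`, `hashKey`, `hashStorage`); it does not reproduce the MLIR or LLVM `DenseMapInfo` API.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical key and uniqued storage for a ranked tensor type (illustrative only).
struct TypeKey {
  int rank;
  std::string elementType;
};

struct TypeStorage {
  int rank;
  std::string elementType;
  TypeKey getKey() const { return {rank, elementType}; }
};

// Hash used when looking up by key.
inline std::size_t hashKey(const TypeKey& k) {
  std::size_t h = std::hash<int>{}(k.rank);
  // Boost-style hash combine.
  h ^= std::hash<std::string>{}(k.elementType) + 0x9e3779b9 + (h << 6) + (h >> 2);
  return h;
}

// Buggy variant: hashes the pointer itself, so it disagrees with hashKey().
inline std::size_t hashStoragePointer(const TypeStorage* s) {
  return std::hash<const TypeStorage*>{}(s);
}

// Fixed variant: hash the storage through its key, so both paths agree.
inline std::size_t hashStorage(const TypeStorage* s) { return hashKey(s->getKey()); }
```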

PiperOrigin-RevId: 223651443

RankedTensorType: Use getHashValue(KeyTy) when calling getHashValue(RankedTensorTypeStorage*).

PiperOrigin-RevId: 223649958

Document SelectOp class

This was missing from the commit that introduced SelectOp although the
documentation was present in LangRef.md.

PiperOrigin-RevId: 223476888

Avoid failing when attempting to print null Attribute.

This avoids segfaulting when dumping during debugging of failures.

PiperOrigin-RevId: 223449494

Debug output / logging memref sizes in DMA generation + related changes

- Add method to get a memref's size in bytes
- clean up a loop tiling pass helper (NFC)

PiperOrigin-RevId: 223422077

[MLIR] Reenable materialize_vectors test

Re-enables one of the FileCheck'ed tests, which was mistakenly disabled.

PiperOrigin-RevId: 223401978

Add support for result type iteration in Operation/Instruction/OperationStmt.

PiperOrigin-RevId: 223264992

Split "rewrite" functionality out of Pattern into a new RewritePattern derived
class. This change is NFC, but allows for new kinds of patterns, specifically
LegalizationPatterns which will be allowed to change the types of things they
rewrite.

PiperOrigin-RevId: 223243783

Verify CmpIOp's result type to be bool-like

This CL added two new traits, SameOperandsAndResultShape and
ResultsAreBoolLike, and changed CmpIOp to embody these two
traits. As a consequence, CmpIOp's result type now is verified
to be bool-like.

PiperOrigin-RevId: 223208438

Add derived attribute support.

Derived attributes are attributes that are derived from other properties of the operation (e.g., the shape returned from the type). DerivedAttr is parameterized on the return type and function body.

PiperOrigin-RevId: 223180315

StandardOps: introduce 'select'.

The semantics of 'select' is conventional: return the second operand if the
first operand is true (1 : i1) and the third operand otherwise. It is
applicable to vectors and tensors element-wise, similarly to the LLVM instruction.
This operation is necessary to implement min/max to lower 'for' loops with
complex bounds to CFG functions and to support ternary operations in ML
functions. It is preferred to first-class min/max because of its simplicity,
e.g. it is not concerned with signedness.
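The point that select is signedness-agnostic can be illustrated by building min/max from a comparison plus a select. The sketch below is hypothetical C++, not the StandardOps implementation. The names `selectValue`, `selectElementwise`, `signedMin`, and `unsignedMax` are illustrative. Signedness lives entirely in the comparison that produces the condition.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar select: the second operand if the condition holds, the third otherwise.
template <typename T>
T selectValue(bool cond, T trueValue, T falseValue) {
  return cond ? trueValue : falseValue;
}

// Element-wise select on equally sized vectors, mirroring the vector/tensor case.
template <typename T>
std::vector<T> selectElementwise(const std::vector<bool>& cond, const std::vector<T>& t,
                                 const std::vector<T>& f) {
  std::vector<T> out(cond.size());
  for (std::size_t i = 0; i < cond.size(); ++i) out[i] = selectValue<T>(cond[i], t[i], f[i]);
  return out;
}

// min/max as compare + select: only the comparison knows about signedness.
inline std::int64_t signedMin(std::int64_t a, std::int64_t b) { return selectValue(a < b, a, b); }
inline std::uint64_t unsignedMax(std::uint64_t a, std::uint64_t b) {
  return selectValue(a > b, a, b);
}
```

A loop with an upper bound of `min(%a, %b)` could then be lowered as a comparison followed by a select.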

PiperOrigin-RevId: 223160860

LLVM IR lowering: support 'dim' operation.

Add support for translating the 'dim' operation on MemRefs to LLVM IR. For a
static size, this operation merely defines an LLVM IR constant value that may
not appear in the output IR if not used (and had not been removed before by
DCE). For a dynamic size, this operation is translated into an access to the
MemRef descriptor that contains the dynamic size.
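Under the descriptor layout described in the "simple MemRef types" commit below, 'dim' takes one of two paths. A static size is a constant. A dynamic size is read from the descriptor slot whose position equals the number of dynamic dimensions preceding it. This hypothetical model uses illustrative names (`kDynamic`, `MemRefDescriptor`, `lowerDim`) and does not emit LLVM IR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::int64_t kDynamic = -1;  // hypothetical marker for a '?' dimension

// Hypothetical descriptor: data pointer followed by one size per dynamic dimension,
// in dimension order (e.g. memref<?x42x?xf32> -> { f32*, i64, i64 }).
struct MemRefDescriptor {
  float* data;
  std::vector<std::int64_t> dynamicSizes;
};

// Models 'dim': a static size folds to a constant; a dynamic size is read from
// the descriptor slot that corresponds to this dimension.
std::int64_t lowerDim(const std::vector<std::int64_t>& shape, const MemRefDescriptor& desc,
                      std::size_t dim) {
  if (shape[dim] != kDynamic) return shape[dim];  // would become an LLVM IR constant
  std::size_t slot = 0;
  for (std::size_t i = 0; i < dim; ++i)
    if (shape[i] == kDynamic) ++slot;  // count dynamic dims before `dim`
  return desc.dynamicSizes[slot];
}
```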

PiperOrigin-RevId: 223160774

LLVM IR lowering: support simple MemRef types

Introduce initial support for MemRef types, including type conversion,
allocation and deallocation, read and write element-wise access, passing
MemRefs to and returning from functions.  Affine map compositions and
non-default memory spaces are NOT YET supported.

Lowered code needs to handle potentially dynamic sizes of the MemRef.  To do
so, it replaces a MemRef-typed value with a special MemRef descriptor that
carries the data and the dynamic sizes together.  A MemRef type is converted to
LLVM's first-class structure type with the first element being the pointer to
the data buffer with data laid out linearly, followed by as many integer-typed
elements as MemRef has dynamic sizes.  The type of these elements is that of
MLIR index lowered to LLVM.  For example, `memref<?x42x?xf32>` is converted to
`{ f32*, i64, i64 }` provided `index` is lowered to `i64`.  While it is
possible to convert MemRefs with fully static sizes to simple pointers to their
elemental types, we opted for consistency and convert them to the
single-element structure.  This makes the conversion code simpler and the
calling convention of the generated LLVM IR functions consistent.

Loads from and stores to a MemRef element are lowered to a sequence of LLVM
instructions that, first, computes the linearized index of the element in the
data buffer using the access indices and combining the static sizes with the
dynamic sizes stored in the descriptor, and then loads from or stores to the
buffer element indexed by the linearized subscript.  While some of the index
computations may be redundant (i.e., consecutive load and store to the same
location in the same scope could reuse the linearized index), we emit them for
every operation.  A subsequent optimization pass may eliminate them if
necessary.
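The linearized-index computation described above can be written as a row-major Horner scheme. Static sizes come from the type and dynamic sizes come from the descriptor, in order. This is a hypothetical C++ model of the emitted instruction sequence, not the translator code. `MemRefDescriptor`, `linearIndex`, `loadElement`, and `storeElement` are illustrative names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::int64_t kDynamic = -1;  // hypothetical marker for a '?' dimension

// Hypothetical descriptor: data pointer plus the dynamic sizes in dimension order.
struct MemRefDescriptor {
  float* data;
  std::vector<std::int64_t> dynamicSizes;
};

// Row-major linearization: linear = linear * size_d + index_d for each dimension d.
std::int64_t linearIndex(const std::vector<std::int64_t>& shape, const MemRefDescriptor& desc,
                         const std::vector<std::int64_t>& indices) {
  std::int64_t linear = 0;
  std::size_t nextDynamic = 0;
  for (std::size_t d = 0; d < shape.size(); ++d) {
    std::int64_t size = shape[d] == kDynamic ? desc.dynamicSizes[nextDynamic++] : shape[d];
    linear = linear * size + indices[d];
  }
  return linear;
}

// Each load/store recomputes the index, as the commit notes; no reuse is attempted.
float loadElement(const std::vector<std::int64_t>& shape, const MemRefDescriptor& desc,
                  const std::vector<std::int64_t>& indices) {
  return desc.data[linearIndex(shape, desc, indices)];
}

void storeElement(const std::vector<std::int64_t>& shape, const MemRefDescriptor& desc,
                  const std::vector<std::int64_t>& indices, float value) {
  desc.data[linearIndex(shape, desc, indices)] = value;
}
```

For `memref<?x42x?xf32>` with dynamic sizes 2 and 3, element `[1, 2, 1]` lands at offset `(1 * 42 + 2) * 3 + 1 = 133`.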

MemRef allocation and deallocation are performed using external functions
`__mlir_alloc(index) -> i8*` and `__mlir_free(i8*)` that must be implemented by
the caller.  These functions behave similarly to `malloc` and `free`, but can
be extended to support different memory spaces in the future.  Allocation and
deallocation instructions take care of casting the pointers.  Prior to calling
the allocation function, the emitted code creates an SSA Value for the
descriptor and uses it to store the dynamic sizes of the MemRef passed to the
allocation operation.  It further emits instructions that compute the dynamic
amount of memory to allocate in bytes.  Finally, the allocation stores the
result of calling the `__mlir_alloc` in the MemRef descriptor.  Deallocation
extracts the pointer to the allocated memory from the descriptor and calls
`__mlir_free` on it.  The descriptor itself is not modified and, being
stack-allocated, ceases to exist when it goes out of scope.
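The allocation sequence above (fill the descriptor, compute the byte size, call the allocator, store the pointer) can be modeled in C++. In this hypothetical sketch, `mlirAllocStub` and `mlirFreeStub` stand in for `__mlir_alloc` and `__mlir_free` and simply forward to `malloc`/`free`. That forwarding is an assumption; the commit leaves the implementation to the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

constexpr std::int64_t kDynamic = -1;  // hypothetical marker for a '?' dimension

struct MemRefDescriptor {
  float* data;
  std::vector<std::int64_t> dynamicSizes;
};

// Stand-ins for the external allocation functions (assumption: malloc/free).
void* mlirAllocStub(std::int64_t bytes) { return std::malloc(static_cast<std::size_t>(bytes)); }
void mlirFreeStub(void* ptr) { std::free(ptr); }

// Bytes to allocate: element size times the product of static and dynamic sizes.
std::int64_t sizeInBytes(const std::vector<std::int64_t>& shape,
                         const std::vector<std::int64_t>& dynamicSizes) {
  std::int64_t bytes = static_cast<std::int64_t>(sizeof(float));
  std::size_t nextDynamic = 0;
  for (std::int64_t s : shape) bytes *= (s == kDynamic ? dynamicSizes[nextDynamic++] : s);
  return bytes;
}

// Models 'alloc': record dynamic sizes, allocate, cast and store the pointer.
MemRefDescriptor allocMemRef(const std::vector<std::int64_t>& shape,
                             const std::vector<std::int64_t>& dynamicSizes) {
  MemRefDescriptor desc{nullptr, dynamicSizes};
  desc.data = static_cast<float*>(mlirAllocStub(sizeInBytes(shape, dynamicSizes)));
  return desc;
}

// Models 'dealloc': extract the pointer and free it; the descriptor is left unchanged.
void deallocMemRef(const MemRefDescriptor& desc) { mlirFreeStub(desc.data); }
```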

MLIR functions that access MemRef values as arguments or return them are
converted to LLVM IR functions that accept MemRef descriptors as LLVM IR
structure types by value.  This significantly simplifies the calling convention
at the LLVM IR level and avoids handling descriptors in dynamic memory,
but is not always compatible with LLVM IR functions emitted from C code
with similar signatures.  A separate LLVM pass may be introduced in the future
to provide C-compatible calling conventions for LLVM IR functions generated
from MLIR.

PiperOrigin-RevId: 223134883

Make operation names hashable.

PiperOrigin-RevId: 223104253

Create Passes.md.

Start the documentation file listing available MLIR passes.  Briefly describe
the `-convert-to-cfg` and the `-lower-affine-apply` passes.  These passes
serve as description templates for other passes.  In particular, they include
the dialect and operation restrictions in the pass input and output.

PiperOrigin-RevId: 223076894

Fix typo.

Tensor has as element type a tensor-memref-element-type rather than a vector-element-type.

PiperOrigin-RevId: 223062135

Convert tf.FusedBatchNorm into tfl primary math ops

* Added TF::FusedBatchNormOp
* Validated TF::FusedBatchNormOp's operands
* Added converter from tf.FusedBatchNorm to tfl math ops

In the converter, we additionally check that the 'is_training'
attribute in tf.FusedBatchNorm is false and the last 4 outputs
are all unused (true for inference). These requirements do
are all not used (true for inference). These requirements do
not exist in the original TOCO source code, which just silently
ignores the last 4 outputs.
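In inference mode, batch normalization computes `y = scale * (x - mean) / sqrt(variance + epsilon) + offset`. The expression can be rewritten as one multiply and one add per element. The per-channel terms `scale / sqrt(variance + epsilon)` and `offset - mean * multiplier` are constant-foldable. This is one plausible decomposition into primitive math ops; the exact op sequence the converter emits is not spelled out in this message. `BatchNormParams` and both functions are hypothetical names.

```cpp
#include <cmath>

// Hypothetical per-channel parameters of an inference-mode batch norm.
struct BatchNormParams {
  float scale, offset, mean, variance, epsilon;
};

// Direct formula: y = scale * (x - mean) / sqrt(variance + epsilon) + offset.
float batchNormReference(float x, const BatchNormParams& p) {
  return p.scale * (x - p.mean) / std::sqrt(p.variance + p.epsilon) + p.offset;
}

// Same computation as primitive ops: multiplier and bias are per-channel constants,
// leaving one multiply and one add per element.
float batchNormDecomposed(float x, const BatchNormParams& p) {
  float multiplier = p.scale / std::sqrt(p.variance + p.epsilon);
  float bias = p.offset - p.mean * multiplier;
  return x * multiplier + bias;
}
```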

PiperOrigin-RevId: 223027333

Add support for setting the location of an IROperandOwner.

PiperOrigin-RevId: 222995814

Tidy up the replaceOp hooks in PatternMatch, generalizing them to support any
number of result ops. Among other things, this results in shorter names.

PiperOrigin-RevId: 222685039

Minimal patch to allow patterns to rewrite multi-result instructions, related to b/119877155

PiperOrigin-RevId: 222597798

Rename Deaffinator to LowerAffineApply and patch it.

Several things were suggested in post-submission reviews. In particular, use
pointers in function interfaces instead of references (still use references
internally). Clarify the behavior of the pass in presence of MLFunctions.

PiperOrigin-RevId: 222556851

[MLIR] Fix opt build

PiperOrigin-RevId: 222491353

[MLIR][MaterializeVectors] Add a MaterializeVector pass via unrolling.

This CL adds an MLIR-MLIR pass which materializes super-vectors to
hardware-dependent sized vectors.

While the physical vector size is target-dependent, the pass is written in
a target-independent way: the target vector size is specified as a parameter
to the pass. This pass is thus a partial lowering that opens the "greybox"
that is the super-vector abstraction.

This first CL adds a materialization pass that iterates over vector_transfer_write operations and:
1. computes the program slice including the current vector_transfer_write;
2. computes the multi-dimensional ratio of super-vector shape to hardware
vector shape;
3. for each possible multi-dimensional value within the bounds of ratio, a new slice is
instantiated (i.e. cloned and rewritten) so that all operations in this instance operate on
the hardware vector type.

As a simple example, given:
```mlir
mlfunc @vector_add_2d(%M : index, %N : index) -> memref<?x?xf32> {
  %A = alloc (%M, %N) : memref<?x?xf32>
  %B = alloc (%M, %N) : memref<?x?xf32>
  %C = alloc (%M, %N) : memref<?x?xf32>
  for %i0 = 0 to %M {
    for %i1 = 0 to %N {
      %a1 = load %A[%i0, %i1] : memref<?x?xf32>
      %b1 = load %B[%i0, %i1] : memref<?x?xf32>
      %s1 = addf %a1, %b1 : f32
      store %s1, %C[%i0, %i1] : memref<?x?xf32>
    }
  }
  return %C : memref<?x?xf32>
}
```

and the following options:
```
-vectorize -virtual-vector-size 32 --test-fastest-varying=0 -materialize-vectors -vector-size=8
```

materialization emits:
```mlir
#map0 = (d0, d1) -> (d0, d1)
#map1 = (d0, d1) -> (d0, d1 + 8)
#map2 = (d0, d1) -> (d0, d1 + 16)
#map3 = (d0, d1) -> (d0, d1 + 24)
mlfunc @vector_add_2d(%arg0 : index, %arg1 : index) -> memref<?x?xf32> {
  %0 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %1 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %2 = alloc(%arg0, %arg1) : memref<?x?xf32>
  for %i0 = 0 to %arg0 {
    for %i1 = 0 to %arg1 step 32 {
      %3 = affine_apply #map0(%i0, %i1)
      %4 = "vector_transfer_read"(%0, %3#0, %3#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %5 = affine_apply #map1(%i0, %i1)
      %6 = "vector_transfer_read"(%0, %5#0, %5#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %7 = affine_apply #map2(%i0, %i1)
      %8 = "vector_transfer_read"(%0, %7#0, %7#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %9 = affine_apply #map3(%i0, %i1)
      %10 = "vector_transfer_read"(%0, %9#0, %9#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %11 = affine_apply #map0(%i0, %i1)
      %12 = "vector_transfer_read"(%1, %11#0, %11#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %13 = affine_apply #map1(%i0, %i1)
      %14 = "vector_transfer_read"(%1, %13#0, %13#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %15 = affine_apply #map2(%i0, %i1)
      %16 = "vector_transfer_read"(%1, %15#0, %15#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %17 = affine_apply #map3(%i0, %i1)
      %18 = "vector_transfer_read"(%1, %17#0, %17#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %19 = addf %4, %12 : vector<8xf32>
      %20 = addf %6, %14 : vector<8xf32>
      %21 = addf %8, %16 : vector<8xf32>
      %22 = addf %10, %18 : vector<8xf32>
      %23 = affine_apply #map0(%i0, %i1)
      "vector_transfer_write"(%19, %2, %23#0, %23#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> ()
      %24 = affine_apply #map1(%i0, %i1)
      "vector_transfer_write"(%20, %2, %24#0, %24#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> ()
      %25 = affine_apply #map2(%i0, %i1)
      "vector_transfer_write"(%21, %2, %25#0, %25#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> ()
      %26 = affine_apply #map3(%i0, %i1)
      "vector_transfer_write"(%22, %2, %26#0, %26#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> ()
    }
  }
  return %2 : memref<?x?xf32>
}
```
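Step 3 of the algorithm, instantiating one slice per point in the ratio space, explains the offsets in `#map0` to `#map3`. A 32-wide super-vector split into 8-wide hardware vectors gives a ratio of 4 and offsets 0, 8, 16, and 24. The hypothetical helper `instanceOffsets` below enumerates those offsets in row-major order over the ratio. It is an illustration, not the pass's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// For a super-vector split into hardware vectors, returns the element offset of each
// hardware-vector instance. ratio[d] = superShape[d] / hwShape[d]; instances are
// enumerated in row-major order over the ratio.
std::vector<std::vector<std::int64_t>> instanceOffsets(const std::vector<std::int64_t>& ratio,
                                                       const std::vector<std::int64_t>& hwShape) {
  std::int64_t total = 1;
  for (std::int64_t r : ratio) total *= r;
  std::vector<std::vector<std::int64_t>> offsets;
  std::vector<std::int64_t> idx(ratio.size());
  for (std::int64_t n = 0; n < total; ++n) {
    // Decode n into a multi-dimensional position within the ratio.
    std::int64_t rem = n;
    for (std::size_t d = ratio.size(); d-- > 0;) {
      idx[d] = rem % ratio[d];
      rem /= ratio[d];
    }
    // Scale the position by the hardware vector shape to get element offsets.
    std::vector<std::int64_t> off(ratio.size());
    for (std::size_t d = 0; d < ratio.size(); ++d) off[d] = idx[d] * hwShape[d];
    offsets.push_back(off);
  }
  return offsets;
}
```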

PiperOrigin-RevId: 222455351

[MLIR][Slicing] Apply cleanups

This CL applies a few last cleanups from a previous CL that were
missed during the previous submit.

PiperOrigin-RevId: 222454774

[MLIR][Slicing] Add utils for computing slices.

This CL adds tooling for computing slices as an independent CL.
The first consumer of this analysis will be super-vector materialization in a
followup CL.

In particular, this adds:
1. a getForwardStaticSlice function with documentation, example and a
standalone unit test;
2. a getBackwardStaticSlice function with documentation, example and a
standalone unit test;
3. a getStaticSlice function with documentation, example and a standalone unit
test;
4. a topologicalSort function that is exercised through the getStaticSlice
unit test.

The getXXXStaticSlice functions take an additional root (resp. terminators)
parameter which acts as a boundary that the transitive propagation algorithm
is not allowed to cross.
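A forward slice with a boundary, plus a topological sort, can be sketched on a plain def-use graph. The names `Graph`, `forwardSlice`, and `topologicalSort` are hypothetical, and the graph uses integer nodes rather than MLIR statements. The boundary is modeled as a set of nodes the propagation never enters, which is one reading of the root/terminator parameter. Kahn's algorithm stands in for whatever sort the CL uses.

```cpp
#include <map>
#include <set>
#include <vector>

using Graph = std::map<int, std::vector<int>>;  // node -> users (def-use edges)

// All nodes transitively reachable from `root` through uses, never entering `boundary`.
std::set<int> forwardSlice(const Graph& g, int root, const std::set<int>& boundary) {
  std::set<int> slice;
  std::vector<int> work{root};
  while (!work.empty()) {
    int n = work.back();
    work.pop_back();
    auto it = g.find(n);
    if (it == g.end()) continue;
    for (int user : it->second) {
      if (boundary.count(user) || slice.count(user)) continue;
      slice.insert(user);
      work.push_back(user);
    }
  }
  return slice;
}

// Kahn's algorithm restricted to `nodes`: every def precedes its users.
std::vector<int> topologicalSort(const Graph& g, const std::set<int>& nodes) {
  std::map<int, int> indegree;
  for (int n : nodes) indegree.emplace(n, 0);
  for (int n : nodes) {
    auto it = g.find(n);
    if (it == g.end()) continue;
    for (int u : it->second)
      if (nodes.count(u)) ++indegree[u];
  }
  std::vector<int> ready, order;
  for (const auto& [n, d] : indegree)
    if (d == 0) ready.push_back(n);
  while (!ready.empty()) {
    int n = ready.back();
    ready.pop_back();
    order.push_back(n);
    auto it = g.find(n);
    if (it == g.end()) continue;
    for (int u : it->second)
      if (nodes.count(u) && --indegree[u] == 0) ready.push_back(u);
  }
  return order;
}
```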

PiperOrigin-RevId: 222446208

Clean up parse_headers in mlir

Not having self-contained headers in LLVM is a constant pain. Don't make the
same mistake in mlir. The only interesting change here is moving setSuccessor
to Instructions.cpp, which breaks the cycle between Instructions.h and
BasicBlock.h.

PiperOrigin-RevId: 222440816

Fix bugs in DMA generation and FlatAffineConstraints; add more test
cases.

- fix bug in calculating index expressions for DMA buffers in certain cases
  (affected tiled loop nests); add more test cases for better coverage.
- introduce an additional optional argument to replaceAllMemRefUsesWith;
  additional operands to the index remap AffineMap can now be supplied by the
  client.
- FlatAffineConstraints::addBoundsForStmt - fix off-by-one upper bound,
  ::composeMap - fix position bug.
- Some clean up and more comments

PiperOrigin-RevId: 222434628

Introduce Deaffinator pass.

This function pass replaces affine_apply operations in CFG functions with
sequences of primitive arithmetic instructions that form the affine map.

The actual replacement functionality is located in LoweringUtils as a
standalone function operating on an individual affine_apply operation and
inserting the result at the location of the original operation. It is expected
to be useful for other, target-specific lowering passes that may start at the
MLFunction level, which Deaffinator does not support.
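The subtle part of replacing affine_apply with primitive arithmetic lies in `floordiv` and `mod`. Affine `floordiv` rounds toward negative infinity and affine `mod` by a positive constant is non-negative. Truncating integer division, as in C++ `/` and `%`, differs for negative operands. The sketch below shows one way to express the affine semantics with truncating division, subtraction, comparison, and select. The function names are hypothetical, and this is not necessarily the sequence LoweringUtils emits.

```cpp
#include <cstdint>

// Affine `floordiv` by a positive constant b, built from truncating division,
// subtraction, comparison and select (assumes b > 0).
std::int64_t lowerFloorDiv(std::int64_t a, std::int64_t b) {
  bool negative = a < 0;
  std::int64_t adjusted = negative ? -1 - a : a;  // non-negative in both cases
  std::int64_t q = adjusted / b;                  // truncation equals floor here
  return negative ? -1 - q : q;
}

// Affine `mod` by a positive constant b: result always in [0, b).
std::int64_t lowerMod(std::int64_t a, std::int64_t b) {
  std::int64_t r = a % b;  // negative when a < 0 and b does not divide a
  return r < 0 ? r + b : r;
}

// Example: the single-result map (d0, d1) -> (d0 floordiv 4 + d1 mod 4).
std::int64_t applyExampleMap(std::int64_t d0, std::int64_t d1) {
  return lowerFloorDiv(d0, 4) + lowerMod(d1, 4);
}
```

The `-1 - a` form avoids overflow even for the most negative `int64_t`.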

PiperOrigin-RevId: 222406692

Lower scalar parts of CFG functions to LLVM IR

Initial restricted implementation of the MLIR to LLVM IR translation.
Introduce a new flow into the mlir-translate tool taking an MLIR module
containing CFG functions only and producing an LLVM IR module.  The MLIR
features supported by the translator are as follows:
- primitive and function types;
- integer constants;
- cfg and ext functions with 0 or 1 return values;
- calls to these functions;
- translation of basic block arguments to phi nodes;
- conversion between arguments of the first basic block and function arguments;
- (conditional) branches;
- integer addition and comparison operations.

The following are NOT supported:
- vector and tensor types and operations on them;
- memrefs and operations on them;
- allocations;
- functions returning multiple values;
- LLVM Module triple and data layout (index type is hardcoded to i64).

Create a new MLIR library and place it under lib/Target/LLVMIR.  The "Target"
library group is similar to the one present in LLVM and is intended to contain
all future public MLIR translation targets.

The general flow of MLIR to LLVM IR conversion will include several lowering
and simplification passes on the MLIR itself in order to make the translation
as simple as possible.  In particular, ML functions should be transformed to
CFG functions by the recently introduced pass, operations on structured types
will be converted to sequences of operations on primitive types, complex
operations such as affine_apply will be converted into sequences of primitive
operations, and primitive operations themselves may eventually be converted to an
LLVM dialect that uses LLVM-like operations.

Introduce the first translation test so that further changes make sure the
basic translation functionality is not broken.

PiperOrigin-RevId: 222400112
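Translating block arguments to phi nodes amounts to one phi per argument. Its incoming values are the matching branch operands from every control-flow edge into the block. The first block is the exception, since its arguments map to function arguments instead. The structures and the `blockArgsToPhis` helper below are hypothetical and model the idea without any LLVM API.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical CFG edge: a branch from `from` to `to` passing `operands` as block arguments.
struct BranchEdge {
  std::string from;
  std::string to;
  std::vector<std::string> operands;
};

// One phi per block argument; each incoming entry is (value, predecessor block).
struct Phi {
  std::string result;
  std::vector<std::pair<std::string, std::string>> incoming;
};

// Argument i of `block` receives operand i of every edge that targets `block`.
std::vector<Phi> blockArgsToPhis(const std::string& block, const std::vector<std::string>& args,
                                 const std::vector<BranchEdge>& edges) {
  std::vector<Phi> phis;
  for (std::size_t i = 0; i < args.size(); ++i) {
    Phi phi{args[i], {}};
    for (const BranchEdge& e : edges)
      if (e.to == block) phi.incoming.emplace_back(e.operands[i], e.from);
    phis.push_back(phi);
  }
  return phis;
}
```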

Create the Support library.

This has been a long-standing TODO in the build system. Now that we need to
share the non-inlined implementation of file utilities for translators, create
a separate library for support functionality. Move Support/* headers to the
new library in the build system.

PiperOrigin-RevId: 222398880

Separate translators into "from MLIR" and "to MLIR".

Translations performed by mlir-translate only have MLIR on one end.
MLIR-to-MLIR conversions (including dialect changes) should be treated as
passes and run by mlir-opt.  Individual translations should not care about
reading or writing MLIR and should work on in-memory representation of MLIR
modules instead.  Split the TranslateFunction interface and the translate
registry into two parts: "from MLIR" and "to MLIR".

Update mlir-translate to handle both registries together by wrapping
translation functions into source-to-source conversions.  Remove MLIR parsing
and writing from individual translations and make them operate on Modules
instead.  This removes the need for individual translators to include
tools/mlir-translate/mlir-translate.h, which can now be safely removed.

Remove mlir-to-mlir translation that only existed as a registration example and
use mlir-opt instead for tests.

PiperOrigin-RevId: 222398707

Factor out translation registry.

The mlir-translate tool is expected to discover individual translations at link
time.  These translations must register themselves and may need the utilities
that are currently defined in mlir-translate.cpp for their entry point
functions.  Since mlir-translate is linking against individual translations,
the translations cannot link against mlir-translate themselves.  Extract out
the utilities into a separate "Translation" library to avoid the potential
dependency cycle.  Individual translations link to that library to access
TranslateRegistration. The mlir-translate tool links to individual translations
and to the "Translation" library because it needs the utilities as well.

The main header of the new library is located in include/mlir/Translation.h to
make it easily accessible by translators.  The rationale for putting it in
include/mlir rather than in one of its subdirectories is that its purpose is
similar to that of include/mlir/Pass.h so it makes sense to put them at the
same level.
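The "register at link time" pattern can be sketched with a global object whose constructor inserts into a registry. The registry lives in a separate library, so translations and the tool both depend on it without a cycle. This is a hypothetical illustration. `TranslateFn`, `translationRegistry`, and `TranslationRegistration` are made-up names, and MLIR's actual registration class and translation signatures differ.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical translation signature: input text -> output text.
using TranslateFn = std::function<std::string(const std::string&)>;

// Function-local static: safe to use from other static initializers.
std::map<std::string, TranslateFn>& translationRegistry() {
  static std::map<std::string, TranslateFn> registry;
  return registry;
}

// Constructing a global instance registers a translation before main() runs.
struct TranslationRegistration {
  TranslationRegistration(const std::string& name, TranslateFn fn) {
    translationRegistry().emplace(name, std::move(fn));
  }
};

// What an individual translation would contain (a toy "translation").
static TranslationRegistration registerToUpper("to-upper", [](const std::string& in) {
  std::string out = in;
  std::transform(out.begin(), out.end(), out.begin(),
                 [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
  return out;
});
```

With static libraries, the linker may drop object files that nothing references. Registrations like this therefore typically need whole-archive (or "alwayslink") linking to take effect.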

PiperOrigin-RevId: 222398617

Introduce TF WhileOp and lower it to MLIR CFG

Also, added iterators for VariadicResults class.

TESTED with unit tests

TODOs:
- Handle non-bool condition results (similar to the IfOp)
- Use PatternRewriter
PiperOrigin-RevId: 222340376

Add verifier check for integer constants to check that the value can fit within the type bit width.
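A bit-width check of this kind can be sketched as a range test. The commit does not say whether values are interpreted as signed or unsigned. The sketch therefore assumes a value is accepted if it fits either way, i.e. lies in `[-2^(w-1), 2^w - 1]`. `fitsInBitWidth` is a hypothetical name, not the verifier's code.

```cpp
#include <cstdint>

// Returns true if `value` fits in an integer of `width` bits.
// Assumption: accepted if representable as a signed OR an unsigned width-bit
// integer, i.e. value lies in [-2^(width-1), 2^width - 1].
bool fitsInBitWidth(std::int64_t value, unsigned width) {
  if (width == 0) return value == 0;
  if (width >= 64) return true;  // every int64_t fits
  std::int64_t minSigned = -(std::int64_t(1) << (width - 1));
  std::int64_t maxUnsigned = (std::int64_t(1) << width) - 1;
  return value >= minSigned && value <= maxUnsigned;
}
```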

PiperOrigin-RevId: 222335526

Remove unnecessary include from StandardOps.cpp.

PiperOrigin-RevId: 222316745

Remove allocations for memrefs that become dead as a result of double
buffering in the auto DMA overlap pass.

This is done online in the pass.

PiperOrigin-RevId: 222313640

Add iterators and size() helper method in ArrayAttr

PiperOrigin-RevId: 222312276

AffineExprVisitor: fix names of default visitation functions.

Existing default visitation function for dimension and symbols were called
"visitAffineDimExpr" and "visitAffineSymbolExpr".  However, generic CRTP-based
visit and walk methods were calling "visitDimExpr" and "visitSymbolExpr",
respectively, on derived classes.  This has not been discovered before because
all existing affine expression visitors (re)define functions for dimensions and
symbols.  Change the names of the default empty visitation functions to the
latter form.
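The naming mismatch matters because CRTP dispatch looks up the callee by name on the derived class. If the base's defaults are called `visitAffineDimExpr` but `visit()` calls `visitDimExpr`, any derived visitor that does not define `visitDimExpr` itself fails to compile. The simplified, hypothetical visitor below (`ExprVisitor`, `Expr`, and `DimCounter` are illustrative) gives the defaults the same names that `visit()` dispatches to.

```cpp
// Hypothetical, simplified affine expression kinds.
enum class ExprKind { Dim, Symbol, Constant };

struct Expr {
  ExprKind kind;
  unsigned position;
};

template <typename Derived, typename Ret = void>
class ExprVisitor {
 public:
  Ret visit(const Expr& e) {
    auto* self = static_cast<Derived*>(this);
    switch (e.kind) {
      case ExprKind::Dim: return self->visitDimExpr(e);
      case ExprKind::Symbol: return self->visitSymbolExpr(e);
      case ExprKind::Constant: return self->visitConstantExpr(e);
    }
    return Ret();
  }

  // Defaults use exactly the names visit() calls, so derived classes may omit them.
  Ret visitDimExpr(const Expr&) { return Ret(); }
  Ret visitSymbolExpr(const Expr&) { return Ret(); }
  Ret visitConstantExpr(const Expr&) { return Ret(); }
};

// Overrides only visitDimExpr and relies on the defaults for everything else.
struct DimCounter : ExprVisitor<DimCounter> {
  int dims = 0;
  void visitDimExpr(const Expr&) { ++dims; }
};
```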

PiperOrigin-RevId: 222312114

Adds ConstantFoldHook registry in MLIRContext

This reverts the previous method, which required creating a new dialect with the
constant fold hook from TensorFlow. This new method uses a function object in
the dialect to store the constant fold hook. Once a hook is registered to the
dialect, this function object will be assigned when the dialect is added to the
MLIRContext.

For the operations which are not registered, a new method getRegisteredDialects
is added to the MLIRContext to query the dialects which match their op name
prefixes.
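The prefix-based lookup for unregistered ops, combined with a per-dialect function object holding the fold hook, can be sketched as below. This is a hypothetical model with illustrative names (`ConstantFoldHook`, `Dialect`, `dialectsForOp`, `tryConstantFold`), not the MLIRContext API. The prefix is taken to be the text before the first `.` in the op name.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical hook: folds an op given constant integer operands, if it can.
using ConstantFoldHook = std::function<std::optional<long>(const std::string& opName,
                                                           const std::vector<long>& operands)>;

// Hypothetical dialect record: name prefix plus an optional hook (empty = none).
struct Dialect {
  std::string prefix;
  ConstantFoldHook constantFoldHook;
};

// Finds dialects whose prefix matches the op name prefix (e.g. "tf.Add" -> "tf").
std::vector<const Dialect*> dialectsForOp(const std::vector<Dialect>& dialects,
                                          const std::string& opName) {
  std::string prefix = opName.substr(0, opName.find('.'));
  std::vector<const Dialect*> matches;
  for (const Dialect& d : dialects)
    if (d.prefix == prefix) matches.push_back(&d);
  return matches;
}

// Tries each matching dialect's hook until one folds the op.
std::optional<long> tryConstantFold(const std::vector<Dialect>& dialects,
                                    const std::string& opName,
                                    const std::vector<long>& operands) {
  for (const Dialect* d : dialectsForOp(dialects, opName)) {
    if (d->constantFoldHook) {
      if (auto folded = d->constantFoldHook(opName, operands)) return folded;
    }
  }
  return std::nullopt;
}
```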

PiperOrigin-RevId: 222310149

Add functionality for erasing terminator successor operands and basic block arguments.

PiperOrigin-RevId: 222303233

Automated rollback of changelist 221863955.

PiperOrigin-RevId: 222299120

[MLIR][Vectorize] Refactor Vectorize use-def propagation.

This CL refactors a few things in Vectorize.cpp:
1. a clear distinction is made between:
  a. the LoadOps, which are the roots of vectorization and must be vectorized
  eagerly and propagate their value; and
  b. the StoreOps, which are the terminals of vectorization and must be
  vectorized late (i.e. they do not produce values that need to be
  propagated).
2. the StoreOp must be vectorized late because in general it can store a value
that is not reachable from the subset of loads defined in the
current pattern. One trivial such case is storing a constant defined at the
top level of the MLFunction, which needs to be turned into a splat.
3. a description of the algorithm is given;
4. the implementation matches the algorithm;
5. the last example is made parametric; in practice it will fully rely on the
implementation of vector_transfer_read/write, which will handle boundary
conditions and padding. This will happen by lowering to a lower-level
abstraction either:
  a. directly in MLIR (whether DMA or just loops or any async tasks in the
     future) (whiteboxing);
  b. in LLO/LLVM-IR/whatever blackbox library call/search + swizzle inventor
  one may want to use;
  c. a partial mix of a. and b. (grey-boxing)
6. minor cleanups are applied;
7. mistakenly disabled unit tests are re-enabled (oopsie).

With this CL, this MLIR snippet:
```
mlfunc @vector_add_2d(%M : index, %N : index) -> memref<?x?xf32> {
  %A = alloc (%M, %N) : memref<?x?xf32>
  %B = alloc (%M, %N) : memref<?x?xf32>
  %C = alloc (%M, %N) : memref<?x?xf32>
  %f1 = constant 1.0 : f32
  %f2 = constant 2.0 : f32
  for %i0 = 0 to %M {
    for %i1 = 0 to %N {
      // non-scoped %f1
      store %f1, %A[%i0, %i1] : memref<?x?xf32>
    }
  }
  for %i4 = 0 to %M {
    for %i5 = 0 to %N {
      %a5 = load %A[%i4, %i5] : memref<?x?xf32>
      %b5 = load %B[%i4, %i5] : memref<?x?xf32>
      %s5 = addf %a5, %b5 : f32
      // non-scoped %f1
      %s6 = addf %s5, %f1 : f32
      store %s6, %C[%i4, %i5] : memref<?x?xf32>
    }
  }
  return %C : memref<?x?xf32>
}
```

vectorized with these arguments:
```
-vectorize -virtual-vector-size 256 --test-fastest-varying=0
```

vectorization produces this standard innermost-loop vectorized code:
```
mlfunc @vector_add_2d(%arg0 : index, %arg1 : index) -> memref<?x?xf32> {
  %0 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %1 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %2 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %cst = constant 1.000000e+00 : f32
  %cst_0 = constant 2.000000e+00 : f32
  for %i0 = 0 to %arg0 {
    for %i1 = 0 to %arg1 step 256 {
      %cst_1 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32>
      "vector_transfer_write"(%cst_1, %0, %i0, %i1) : (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
    }
  }
  for %i2 = 0 to %arg0 {
    for %i3 = 0 to %arg1 step 256 {
      %3 = "vector_transfer_read"(%0, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32>
      %4 = "vector_transfer_read"(%1, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32>
      %5 = addf %3, %4 : vector<256xf32>
      %cst_2 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32>
      %6 = addf %5, %cst_2 : vector<256xf32>
      "vector_transfer_write"(%6, %2, %i2, %i3) : (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
    }
  }
  return %2 : memref<?x?xf32>
}
```

Of course, much more intricate n-D imperfectly-nested patterns can be emitted too in a fully declarative fashion, but this is enough for now.

PiperOrigin-RevId: 222280209

Convert TF::Conv2D into TFL::Conv2D

Added TF::Conv2D op and TFL::Conv2D op, and converted TF::Conv2D to
TFL::Conv2D, which required addressing the operand number mismatch
and attribute conversion.
PiperOrigin-RevId: 222277554

ConvertToCFG: handle 1D affine loop bounds.

In the general case, loop bounds can be expressed as affine maps of the outer
loop iterators and function arguments.  Relax the check for loop bounds to be
known integer constants and also accept one-dimensional affine bounds in
ConvertToCFG ForStmt lowering.  Emit affine_apply operations for both the upper
and the lower bound.  The semantics of MLFunctions guarantees that both bounds
can be computed before the loop starts iterating.  Constant bounds are merely a
short-hand notation for zero-dimensional affine maps and get supported
transparently.

Multidimensional affine bounds are not yet supported because the target IR
dialect lacks min/max operations necessary to implement the corresponding
semantics.

PiperOrigin-RevId: 222275801

Add support for getting the operand number from an IROperandImpl(InstOperand, BasicBlockOperand, StmtOperand).

PiperOrigin-RevId: 222274598

Add op stats pass to mlir-opt.

The op-stats pass currently returns the number of occurrences of different operations in a Module. It is useful for verifying transformation properties (e.g., 3 ops of a specific dialect, 0 of another), but probably not useful outside of that, so it is kept local to mlir-opt. It does not consider op attributes when counting.
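The counting itself is simple. The hypothetical sketch below counts ops by full name, ignoring attributes as the pass does. It then aggregates by the prefix before `.` to check properties such as "3 ops of a specific dialect, 0 of another". The per-dialect aggregation is an illustrative addition, and `countOps`/`countByDialect` are made-up names.

```cpp
#include <map>
#include <string>
#include <vector>

// Counts op occurrences by full op name; attributes are not part of the key.
std::map<std::string, int> countOps(const std::vector<std::string>& opNames) {
  std::map<std::string, int> counts;
  for (const std::string& name : opNames) ++counts[name];
  return counts;
}

// Aggregates per-op counts into per-dialect counts using the prefix before '.'.
std::map<std::string, int> countByDialect(const std::map<std::string, int>& opCounts) {
  std::map<std::string, int> perDialect;
  for (const auto& [name, count] : opCounts) perDialect[name.substr(0, name.find('.'))] += count;
  return perDialect;
}
```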

PiperOrigin-RevId: 222259727

Add support for Operation::moveBefore(Operation *).

PiperOrigin-RevId: 222252521

[MLIR][VectorAnalysis] Add a VectorAnalysis and standalone tests

This CL adds some vector support in anticipation of the upcoming vector
materialization pass. In particular, this CL adds 2 functions to:
1. compute the multiplicity of a subvector shape in a supervector shape;
2. help match operations on strict super-vectors. This is defined for a given
subvector shape as an operation that manipulates a vector type that is an
integral multiple of the subtype, with multiplicity at least 2.
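The multiplicity computation can be sketched in standalone C++. This is only an illustration under assumptions: the helper names `multiplicity` and `isStrictSuperVector` are hypothetical, shapes are plain `std::vector<int64_t>`, and the sub-shape is assumed to align with the trailing dimensions of the super-vector shape, with any extra leading dimensions counting in full. The actual VectorAnalysis functions may differ in signature and in these details.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: returns how many copies of `sub` tile `super`,
// aligning the two shapes on their trailing (minor) dimensions. Leading
// dimensions of `super` that `sub` lacks contribute their full size.
// Returns std::nullopt when `sub` does not evenly divide `super`.
std::optional<int64_t> multiplicity(const std::vector<int64_t> &super,
                                    const std::vector<int64_t> &sub) {
  if (sub.size() > super.size())
    return std::nullopt;
  const std::size_t offset = super.size() - sub.size();
  int64_t result = 1;
  for (std::size_t i = 0; i < super.size(); ++i) {
    const int64_t s = super[i];
    if (i < offset) {  // dimension absent from `sub`
      result *= s;
      continue;
    }
    const int64_t d = sub[i - offset];
    if (d <= 0 || s % d != 0)  // not an integral multiple
      return std::nullopt;
    result *= s / d;
  }
  return result;
}

// A strict super-vector of `sub` is an integral multiple of it with
// multiplicity at least 2.
bool isStrictSuperVector(const std::vector<int64_t> &super,
                         const std::vector<int64_t> &sub) {
  std::optional<int64_t> m = multiplicity(super, sub);
  return m && *m >= 2;
}
```

For example, a 4x8x16 super-vector shape holds 4 * 1 * 4 = 16 copies of an 8x4 sub-vector shape, while an 8-element vector is not a strict super-vector of itself (multiplicity 1).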

This CL also adds a TestUtil pass into which we can put arbitrary tests of
functions and analyses that operate at a much smaller granularity than a pass
(e.g. an analysis for which it is convenient to write a bit of artificial MLIR
and a custom test). This lets us keep using FileCheck for things that
essentially look and feel like C++ unit tests.

PiperOrigin-RevId: 222250910

Convert MLIR DiagnosticKind to LLVM DiagKind when emitting diagnostic via mlir-opt.

PiperOrigin-RevId: 222147297

Update 'return' statement syntax in LangRef to reflect the actual parsing syntax.

PiperOrigin-RevId: 222107722

Fix the implementation of PatternRewriter::createChecked. The current implementation has bit-rotted and won't compile. This CL updates the implementation to be similar to (CFGFuncBuilder/MLFuncBuilder)::createChecked.

PiperOrigin-RevId: 222014317

Import the "MLIR: The case for a simplified polyhedral form" proposal doc from
google docs into the codebase as a rationale doc, since this is an important
aspect of our design.

PiperOrigin-RevId: 221957444

Change pretty printing of constant so that the attributes precede the value.

This does create an inconsistency between the print formats (e.g., attributes are normally before operands) but fixes an invalid parse and keeps constant uniform with respect to itself (function or int attributes have the type at the same place). Specifying the type of an int/float attribute might get revised shortly.

Also add a test to verify that the printed output can be parsed again.

PiperOrigin-RevId: 221923893

Updates to transformation/analysis passes/utilities. Update DMA generation pass
and getMemRefRegion() to work with specified loop depths; add support for
outgoing DMAs, store op's.

- make getMemRefRegion support regions symbolic in the outer loops, and hence
  support DMAs that are symbolic in the outer surrounding loops.

- add DMA generation support for outgoing DMAs (store op's to lower memory
  space); extend getMemoryRegion to store op's. -memref-bound-check now works
  with store op's as well.

- fix dma-generate: references to the old memref in the dma_start op were also
  being replaced with the new buffer, so replaceAllMemRefUsesWith needs to work
  on only a subset of the uses. Update replaceAllMemRefUsesWith to take an
  optional 'operation' argument that serves as a filter: if provided, only those
  uses that are dominated by the filter are replaced.

- Add missing printing of attributes for the dma_start and dma_wait op's.

- update the FlatAffineConstraints API

PiperOrigin-RevId: 221889223

Mark AllocOp as being free of side effects

PiperOrigin-RevId: 221863955

Update LangRef to reflect int/float attribute specification changes.

PiperOrigin-RevId: 221802835

[MLIR] Rename OperationInst to Instruction.

PiperOrigin-RevId: 221795407

Implement IfOp verification

This also makes the CallOp and ExtractElementOp invocations in the eliminateIfOp function always valid and removes the need for error handling there.

Also verify the TensorFlowOp trait.

PiperOrigin-RevId: 221737192

Merge OperationInst functionality into Instruction.

This does some limited renaming but defines an alias for OperationInst so that a follow-up CL can perform the large-scale renaming on its own.

PiperOrigin-RevId: 221726963

Add Type to int/float attributes.

* Optionally attach the type of integer and floating point attributes to the attributes; this allows restricting an int/float to a specific width.
  - Currently this allows suffixing an int/float constant with its type [this might be revised in the future].
  - Default to i64 and f32 if not specified.
* For index types the APInt width used is 64.
* Change callers to request a specific attribute type.
* Store iN type with APInt of width N.
* This change does not handle the folding of constants of different types (e.g., doing int type promotions to support constant folding i3 and i32), and instead restricts the constant folding to only operate on the same types.

PiperOrigin-RevId: 221722699

[MLIR] Merge terminator and uses into BasicBlock operations list handling.

PiperOrigin-RevId: 221700132

Replace TerminatorInst with builtin terminator operations.

Note: Terminators will be merged into the operations list in a follow-up patch.

PiperOrigin-RevId: 221670037

Fix variables only used in assertions.

PiperOrigin-RevId: 221660580

Add functionality for parsing/managing operation terminator successors.

Follow-up patches will work toward removing TerminatorInst.

PiperOrigin-RevId: 221640621

Fix hasStaticShape() method on vectors and tensors to work correctly for unranked tensors and remove getShape() method for unranked tensors.

Unranked tensors used to return an empty list of dimensions as their shape. This is confusing since an empty list of dimensions is also returned for 0-D tensors. In particular, hasStaticShape() used to check whether any of the dimensions was -1; since an unranked tensor reported no dimensions at all, it was treated as having a static shape even though it does not.
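The distinction can be sketched with a hypothetical shape model in which an unranked tensor has no dimension list at all (`std::nullopt`), so it can no longer be confused with a 0-D tensor whose dimension list is empty. `TensorShape` and this free `hasStaticShape` are illustrative stand-ins, not the MLIR classes; -1 marks a dynamic dimension as in the commit message.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical shape model: an unranked tensor has no dimension list
// (std::nullopt), while a 0-D tensor has an empty dimension list.
// A dimension of -1 is dynamic.
struct TensorShape {
  std::optional<std::vector<int64_t>> dims;
};

bool hasStaticShape(const TensorShape &t) {
  // Unranked: not static, even though no dimension is -1.
  if (!t.dims)
    return false;
  // Ranked (including 0-D): static iff no dimension is dynamic.
  return std::none_of(t.dims->begin(), t.dims->end(),
                      [](int64_t d) { return d == -1; });
}
```

With this model a 0-D tensor is static, `4x?` is not, and an unranked tensor is not static.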

PiperOrigin-RevId: 221571138

ConvertToCFG: properly remap nested function attributes.

Array attributes can be nested, and function attributes can appear at any level
of nesting.  They should be remapped to point to the generated CFGFunction after
ML-to-CFG conversion, similarly to plain function attributes.  Extract the
nested attribute remapping functionality from the Parser to Utils.  Extract out
the remapping function for individual Functions from the module remapping
function.  Use these new functions in the ML-to-CFG conversion pass and in the
parser.

PiperOrigin-RevId: 221510997

Move definitions of loopUnroll* functions to LoopUtils.cpp.

These functions are declared in Transforms/LoopUtils.h (included in the
Transforms/Utils library) but were defined in the loop unrolling pass in
Transforms/LoopUnroll.cpp. As a result, targets depending only on
TransformUtils library but not on Transforms could get link errors. Move the
definitions to Transforms/Utils/LoopUtils.cpp where they should actually live.
This does not modify any code.

PiperOrigin-RevId: 221508882

Fix some minor typos pointed out by rxwei

PiperOrigin-RevId: 221474217

Mark mlir code snippets as being written in mlir

Forgot to add these in previous change :/

PiperOrigin-RevId: 221444322

Mark mlir code snippets as being written in mlir

Basic MLIR syntax highlighting is supported so use it.

PiperOrigin-RevId: 221443618

[MLIR] Support for vectorizing operations.

This CL adds support for vectorizing operations, along with a vectorization test for a scalar 2-D addf.

The support extension notably comprises:
1. extend the vectorizable test to exclude vector_transfer operations and
expose them to LoopAnalysis where they are needed. This is a temporary
solution until a concrete MLIR Op exists;
2. add some more functional sugar: mapKeys, apply and ScopeGuard (which became
relevant again);
3. fix improper shifting during coarsening;
4. rename unaligned load/store to vector_transfer_read/write and simplify the
design by removing the unnecessary AllocOps that were introduced prematurely:
vector_transfer_read currently has the form:
(memref<?x?x?xf32>, index, index, index) -> vector<32x64x256xf32>
vector_transfer_write currently has the form:
(vector<32x64x256xf32>, memref<?x?x?xf32>, index, index, index) -> ()
5. add vectorizeOperations, which traverses the operations in a ForStmt and
rewrites them to their vector form;
6. add support for vector splat from a constant.

The relevant tests are also updated.

PiperOrigin-RevId: 221421426

Pull duplicated build() in subclasses into root UnaryOp

PiperOrigin-RevId: 221326369

Start the plumbing for removing TerminatorInst.
* Add skeleton br/cond_br builtin ops.
* Add a terminator trait for operations.
* Mark ReturnOp as a Terminator.

The functionality for managing/parsing/verifying successors will be added in a follow-up CL.

PiperOrigin-RevId: 221283000

Update split marker for split-input-file option to be more restrictive

This allows comment blocks to be used alongside splits in test cases, for
example the "Function Control Flow Lowering" comment block in
raise-control-flow.mlir.

TESTED with existing unit tests

PiperOrigin-RevId: 221214451

Optionally emit errors from IntegerType factory functions.

Similarly to other types, introduce "get" and "getChecked" static member
functions for IntegerType.  The latter emits errors to the error handler
registered with the MLIR context and returns a null type for the caller to
handle errors gracefully.  This deduplicates type consistency checks between
the parser and the builder.  Update the parser to call IntegerType::getChecked
for error reporting instead of the builder that would simply assert.

This CL completes the type system error emission refactoring: the parser now
only emits syntax-related errors for types, while type factory functions may
emit type consistency errors.
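The get/getChecked split can be sketched in plain C++. Everything here is hypothetical: `IntType`, `ErrorHandler` and the width limit `kSketchMaxWidth` are illustration-only stand-ins for IntegerType, the context's error handler and the real validity rules. `getChecked` reports through the handler and returns a null result, while `get` simply asserts, matching the builder behavior described above.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for the error handler registered with the context.
using ErrorHandler = std::function<void(const std::string &)>;

// Hypothetical stand-in for an integer type, identified by its bit width.
struct IntType {
  unsigned width;
};

// Arbitrary limit chosen for this sketch only; not the real IntegerType rule.
constexpr unsigned kSketchMaxWidth = 1024;

// Single shared consistency check used by both factory functions.
bool isValidWidth(unsigned width) {
  return width > 0 && width <= kSketchMaxWidth;
}

// getChecked: validate, report through the handler, return null on failure.
// Suitable for the parser, which must recover gracefully from bad input.
std::optional<IntType> getChecked(unsigned width,
                                  const ErrorHandler &emitError) {
  if (!isValidWidth(width)) {
    emitError("invalid integer width: " + std::to_string(width));
    return std::nullopt;
  }
  return IntType{width};
}

// get: the caller promises valid input, so a failure is a programming error.
IntType get(unsigned width) {
  assert(isValidWidth(width) && "invalid integer width");
  return IntType{width};
}
```

Keeping `isValidWidth` in one place is what lets the parser and the builder share the same consistency check instead of duplicating it.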

PiperOrigin-RevId: 221165207

Homogenize branch instruction arguments.

Branch instruction arguments were defined and used inconsistently across
different instructions, in both the spec and the implementation.  In
particular, conditional and unconditional branch instructions were using
different syntax in the implementation.  This led to the IR we produce not
being accepted by the parser. Update the printer to use common syntax: `(`
list-of-SSA-uses `:` list-of-types `)`.  The motivation for choosing this
syntax as opposed to the one in the spec, `(` list-of-SSA-uses `)` `:`
list-of-types, is twofold.  First, it is tricky to differentiate the label
of the false branch from the type while parsing conditional branches (which is
what apparently motivated the implementation to diverge from the spec in the
first place).  Second, the ongoing convergence between terminator instructions
and other operations calls for consistent operand list syntax between them.
After this change, the only remaining difference between the two is the use of
parentheses.  Update the comment of the parser that did not correspond to the
code.  Remove the unused isParenthesized argument from parseSSAUseAndTypeList.

Update the spec accordingly.  Note that the examples in the spec were _not_
using the EBNF defined a couple of lines above them, but were using the current
syntax.  Add a supplementary example of a branch to a basic block with multiple
arguments.

PiperOrigin-RevId: 221162655

Basic conversion of MLFunctions to CFGFunctions.

Implement a pass converting a subset of MLFunctions to CFGFunctions.  Currently
supports arbitrarily complex imperfect loop nests with statically constant
(i.e., not affine map) bounds filled with operations.  Does NOT support
branches or non-constant loop bounds.

Conversion is performed per-function and the function names are preserved to
avoid breaking any external references to the current module.  In-memory IR is
updated to point to the right functions in direct calls and constant loads.
This behavior is tested via a really hidden flag that enables function
renaming.

Inside each function, the control flow conversion is based on single-entry
single-exit regions, i.e. subgraphs of the CFG that have exactly one incoming
and exactly one outgoing edge.  Since an MLFunction must have a single "return"
statement as per MLIR spec, it constitutes an SESE region.  Individual
operations are appended to this region.  Control flow statements are
recursively converted into such regions that are concatenated with the current
region.  Bodies of the compound statement also form SESE regions, which makes
it easy to nest control flow statements.  Note that SESE regions are not
materialized in the code.  It is sufficient to keep track of the end of the
region as the current instruction insertion point as long as all recursive
calls update the insertion point at the end.

The converter maintains a mapping between SSA values in ML functions and their
CFG counterparts.  The mapping is used to find the operands for each operation
and is updated to contain the results of each operation as the conversion
continues.

PiperOrigin-RevId: 221162602

Switch IntegerAttr to use APInt.

Change the storage type to APInt from int64_t for IntegerAttr (following the change to APFloat storage in FloatAttr). Effectively a direct change from int64_t to 64-bit APInt throughout (the bitwidth hardcoded). This change also adds a getInt convenience method to IntegerAttr and replaces previous getValue calls with getInt calls.

While this change updates the storage type, it does not update all constant folding calls.

PiperOrigin-RevId: 221082788

Change the index upper bound of the outer loop, since the comment says the array has 8 rows.

PiperOrigin-RevId: 221082461

- Simplify PatternMatch to *require* static benefits at pattern construction
  time.  The "Fast and Flexible Instruction Selection With Constraints" paper
  from CC2018 makes a credible argument that dynamic costs aren't actually
  necessary/important, and we are not using them.

- Check in my "MLIR Generic DAG Rewriter Infrastructure" design doc into the
  source tree.

PiperOrigin-RevId: 221017546

Add the "MLIR: Incremental Application to TensorFlow Graph Algorithms" document
I wrote last weekend.

PiperOrigin-RevId: 221017318

Handle VectorOrTensorType parse failure instead of crashing

This was unsafe after cr/219372163 and seems to be the only such case in the
change. All other uses of dyn_cast either handle the nullptr or are implicitly
safe, for example because the type is extracted from an operand or result
SSAValue.

TESTED with unit test

PiperOrigin-RevId: 220905942

Fall back to the dialect constant folding hook

PiperOrigin-RevId: 220861133

- Add support for fused locations.

These are locations that form a collection of other source locations with an optional metadata attribute.

- Add initial support for print/dump for locations.
Location Printing Examples:
* Unknown        : [unknown-location]
* FileLineColLoc : third_party/llvm/llvm/projects/google-mlir/test/TensorFlowLite/legalize.mlir:6:8
* FusedLoc       : <"tfl-legalize">[third_party/llvm/llvm/projects/google-mlir/test/TensorFlowLite/legalize.mlir:6:8, third_party/llvm/llvm/projects/google-mlir/test/TensorFlowLite/legalize.mlir:7:8]

- Add diagnostic support for fused locs.
* Prints the first location as the main location and the remaining as "fused from here" notes:
e.g.
third_party/llvm/llvm/projects/google-mlir/test/TensorFlowLite/legalize.mlir:6:8: error: This is an error.
  %1 = "tf.add"(%arg0, %0) : (i32, i32) -> i32
       ^
third_party/llvm/llvm/projects/google-mlir/test/TensorFlowLite/legalize.mlir:7:8: error: Fused from here.
  %2 = "tf.relu"(%1) : (i32) -> i32
       ^

PiperOrigin-RevId: 220835552

Adds support for returning the direction of the dependence between memref accesses (distance/direction vectors).
Updates MemRefDependenceCheck to check and report on all memref access pairs at all loop nest depths.
Updates old and adds new memref dependence check tests.
Resolves multiple TODOs.
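As an illustration of what a direction vector records, the sketch below maps a constant dependence distance vector (sink iteration minus source iteration, one entry per loop from outermost to innermost) to the classic '<', '=', '>' notation and finds the loop depth that carries the dependence. The helper names are hypothetical, and the real analysis may represent each component as a range rather than a single constant.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: maps each distance entry to a direction.
// Positive distance -> '<' (the sink runs in a later iteration),
// zero -> '=', negative -> '>'.
std::string directionVector(const std::vector<int64_t> &distance) {
  std::string dir;
  for (int64_t d : distance)
    dir += d > 0 ? '<' : (d == 0 ? '=' : '>');
  return dir;
}

// Hypothetical helper: returns the 1-based depth of the outermost loop with
// a non-zero distance, i.e. the loop that carries the dependence, or 0 if
// every entry is zero (a loop-independent dependence).
unsigned carryingDepth(const std::vector<int64_t> &distance) {
  for (std::size_t i = 0; i < distance.size(); ++i)
    if (distance[i] != 0)
      return static_cast<unsigned>(i + 1);
  return 0;
}
```

For example, the distance vector (1, 0, -2) has direction "<=>" and is carried by the outermost loop.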

PiperOrigin-RevId: 220816515

Automatic DMA generation for simple cases.
- constant bounded memory regions, static shapes, no handling of
  overlapping/duplicate regions (through union) for now; also, only load memory
  op's.
- add build methods for DmaStartOp, DmaWaitOp.
- move getMemoryRegion() into Analysis/Utils and expose it.
- fix addIndexSet and getMemoryRegion() after the switch to exclusive upper
  bounds; update test cases for memref-bound-check and memref-dependence-check
  for exclusive bounds (missed in a previous CL)

PiperOrigin-RevId: 220729810

Clean up TensorType construction.

This CL introduces the following related changes:
- move tensor element type validity checking to a static member function
  TensorType::isValidElementType
- introduce get/getChecked similarly to MemRefType, where the checked function
  emits errors and returns nullptrs;
- remove duplicate element type validity checking from the parser and rely on
  the type constructor to emit errors instead.

PiperOrigin-RevId: 220694831

Clean up VectorType construction.

This CL introduces the following related changes:
- factor out element type validity checking to a static member function
  VectorType::isValidElementType;
- introduce get/getChecked similarly to MemRefType, where the checked function
  emits errors and returns nullptrs;
- remove duplicate element type validity checking from the parser and rely on
  the type constructor to emit errors instead.

PiperOrigin-RevId: 220693828

Implement value type abstraction for locations.

Value type abstraction for locations differs from the others in that a Location can NOT be null. NOTE: dyn_cast returns an Optional<T>.

PiperOrigin-RevId: 220682078

Complete migration to exclusive upper bound

cl/220448963 had missed a part of the updates.

- while on this, clean up some of the test cases to use ops' custom forms.

PiperOrigin-RevId: 220675303

Add lookupPassInfo to enable querying the pass info for a pass.

The short term use would be in querying the pass name when reporting errors.

PiperOrigin-RevId: 220665532

Bug fixes in FlatAffineConstraints. The test cases that uncovered these are in a follow-up CL on memref dependence checks.

PiperOrigin-RevId: 220632386

Allow vector types to have index elements.

It is unclear why vector types were not allowed to have "index" as element
type. Index values are integers, although of unknown bit width, and should
behave as such. Vectors of integers are allowed and so are tensors of indices
(for indirection purposes); it is more consistent to also have vectors of
indices.

PiperOrigin-RevId: 220630123