Chris Lattner [Sun, 23 Dec 2018 16:27:55 +0000 (08:27 -0800)]
Give StmtBlocks a use-def list, and give OperationStmts the ability to have
optional successor operands when they are terminator operations.
This isn't used yet, but is part 2/n towards merging BasicBlock into StmtBlock
and Instruction into OperationStmt.
PiperOrigin-RevId:
226684636
Chris Lattner [Sun, 23 Dec 2018 16:17:48 +0000 (08:17 -0800)]
Refactor ForStmt: have it contain a StmtBlock instead of subclassing
StmtBlock. This is more consistent with IfStmt and also conceptually makes
more sense - a ForStmt "isn't" its body, it contains its body.
This is step 1/N towards merging BasicBlock and StmtBlock. This is required
because in the new regime StmtBlock will have a use list (just like BasicBlock
does) of operands, and ForStmt already has a use list for its induction
variable.
This is a mechanical patch, NFC.
PiperOrigin-RevId:
226684158
MLIR Team [Fri, 21 Dec 2018 19:06:23 +0000 (11:06 -0800)]
Computation slice update: adds parameters to insertBackwardComputationSlice which specify the source loop nest depth at which to perform iteration space slicing, and the destination loop nest depth at which to insert the computation slice.
Updates LoopFusion pass to take these parameters as command line flags for experimentation.
PiperOrigin-RevId:
226514297
River Riddle [Fri, 21 Dec 2018 18:18:03 +0000 (10:18 -0800)]
Unify type uniquing and construction.
This allows us to decouple type uniquing/construction from MLIRContext and paves the way for dialect-specific types.
To accomplish this we introduce two new classes, TypeUniquer and TypeStorageAllocator.
* TypeUniquer is now responsible for all construction and uniquing of types.
* TypeStorageAllocator is a utility used by derived type storage objects to allocate memory within an MLIRContext.
This cl also standardizes what a derived type storage class needs to provide:
- Define a type alias, KeyTy, to a type that uniquely identifies the
instance of the type within its kind.
* The key type must be constructible from the values passed into the
detail::TypeUniquer::get call after the type kind.
* The key type must have a llvm::DenseMapInfo specialization for
hashing.
- Provide a method, 'KeyTy getKey() const', to construct the key type
from an existing storage instance.
- Provide a construction method:
'DerivedStorage *construct(TypeStorageAllocator &, ...)'
that builds a unique instance of the derived storage. The arguments
after the TypeStorageAllocator must correspond with the values passed
into the detail::TypeUniquer::get call after the type kind.
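To make the contract concrete, here is a minimal sketch of a derived storage class satisfying the three requirements; the class name, the TypeStorage base, and the allocate<>() helper on TypeStorageAllocator are illustrative assumptions, not verbatim API:
```c++
// Sketch only: names other than KeyTy/getKey/construct are assumptions.
struct WidthTypeStorage : public TypeStorage {
  // Uniquely identifies an instance within its kind; unsigned already has
  // an llvm::DenseMapInfo specialization, satisfying the hashing requirement.
  using KeyTy = unsigned;

  WidthTypeStorage(unsigned width) : width(width) {}

  // Reconstruct the key from an existing storage instance.
  KeyTy getKey() const { return width; }

  // Build a unique instance; the arguments after the allocator mirror those
  // passed to detail::TypeUniquer::get after the type kind.
  static WidthTypeStorage *construct(TypeStorageAllocator &allocator,
                                     unsigned width) {
    return new (allocator.allocate<WidthTypeStorage>())
        WidthTypeStorage(width);
  }

  unsigned width;
};
```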
PiperOrigin-RevId:
226507184
Jacques Pienaar [Fri, 21 Dec 2018 11:45:39 +0000 (03:45 -0800)]
Expand rewriter gen to handle string attributes in output.
* Extend to handle rewrite patterns with output attributes;
- Constant attributes are defined with a value and a type;
- The type of the value is mapped to the corresponding attribute type (string -> StringAttr);
* Verifies that the types of operands in the result match the defined op's operands;
PiperOrigin-RevId:
226468908
Jacques Pienaar [Fri, 21 Dec 2018 06:54:07 +0000 (22:54 -0800)]
Add method to retrieve a pass's ID.
Add passID member to Pass and enable querying it.
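A hypothetical use of the query; the accessor name and ID type here are assumptions, not the actual API:
```c++
// Sketch: deduplicate passes in a pipeline by their ID.
void addPassOnce(Pass *pass, llvm::DenseSet<const void *> &seen,
                 std::vector<Pass *> &pipeline) {
  if (seen.insert(pass->getPassID()).second)  // assumed accessor name
    pipeline.push_back(pass);
}
```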
PiperOrigin-RevId:
226445431
MLIR Team [Thu, 20 Dec 2018 23:01:22 +0000 (15:01 -0800)]
Do proper indexing for local variables when building access function equality constraints (working on test cases).
PiperOrigin-RevId:
226399089
MLIR Team [Thu, 20 Dec 2018 17:50:32 +0000 (09:50 -0800)]
Pass loop depth 1 to memref dependence check when constructing dependence constraints used to calculate computation slice for loop fusion.
This is done so that the dominance check between ancestors of op statements from src/dst memref accesses will be run.
PiperOrigin-RevId:
226350443
Jacques Pienaar [Thu, 20 Dec 2018 10:57:32 +0000 (02:57 -0800)]
Change attribute to be input argument.
Change operands to arguments in Op and use it for both operands and arguments. This unifies the way that operands and attributes are specified and the intended way that matching/creating ops with attributes will look. Both can now be represented using the same dag structure (and also makes the ordering more explicit). Derived attributes are not considered as part of the arguments (as they are inferred from the created op, not something needed to create it).
* Generate named operand accessors;
* Simplified the way of specifying Attr and use ElementAttr for TFL_Const instead.
* Fix an incorrect assertion being generated;
The input parsing can be made more robust, I'll address that in a follow up.
PiperOrigin-RevId:
226307424
MLIR Team [Thu, 20 Dec 2018 04:47:06 +0000 (20:47 -0800)]
Address some issues from memref dependence check bug (b/121216762); adds test cases.
PiperOrigin-RevId:
226277453
MLIR Team [Thu, 20 Dec 2018 04:42:55 +0000 (20:42 -0800)]
Improve loop fusion algorithm by using a memref dependence graph.
Fixed TODO for reduction fusion unit test.
PiperOrigin-RevId:
226277226
Uday Bondhugula [Wed, 19 Dec 2018 00:38:24 +0000 (16:38 -0800)]
Simplify memref-dependence-check's meta data structures / drop duplication and
reuse existing ones.
- drop IterationDomainContext, redundant since FlatAffineConstraints has
MLValue information associated with its dimensions.
- refactor to use existing support
- leads to a reduction in LOC
- as a result of these changes, non-constant loop bounds get naturally
supported for dep analysis.
- update test cases to include a couple with non-constant loop bounds
- rename addBoundsFromForStmt -> addForStmtDomain
- complete TODO for getLoopIVs (handle 'if' statements)
PiperOrigin-RevId:
226082008
Uday Bondhugula [Tue, 18 Dec 2018 14:43:20 +0000 (06:43 -0800)]
Update / complete a TODO for addBoundsForForStmt
- when adding constraints from a 'for' stmt into FlatAffineConstraints,
correctly add bound operands of the 'for' stmt as a dimensional identifier or
a symbolic identifier depending on whether the bound operand is a valid
MLFunction symbol
- update test case to exercise this.
PiperOrigin-RevId:
225988511
Alex Zinenko [Tue, 18 Dec 2018 13:25:17 +0000 (05:25 -0800)]
Densify storage for f16, f32 and support f16 semantics in FloatAttrs
Existing implementation always uses 64 bits to store floating point values in
DenseElementsAttr. This was due to FloatAttrs always using a `double` for storage
independently of the actual type. Recent commits added support for FloatAttrs
with the proper f32 type and floating semantics and changed the bitwidth
reporting on FloatType.
Use the existing infrastructure for densely storing 16 and 32-bit values in
DenseElementsAttr storage to store f16 and f32 values. Move floating semantics
definition to the FloatType level. Properly support f16 / IEEEhalf semantics
at the FloatAttr level and in the builder.
Note that bf16 is still stored as a 64-bit value with IEEEdouble semantics
because APFloat does not have first-class support for bf16 types.
PiperOrigin-RevId:
225981289
Uday Bondhugula [Tue, 18 Dec 2018 04:16:37 +0000 (20:16 -0800)]
Refactor/update memref-dep-check's addMemRefAccessConstraints and
addDomainConstraints; add support for mod/div for dependence testing.
- add support for mod/div expressions in dependence analysis
- refactor addMemRefAccessConstraints to use getFlattenedAffineExprs (instead
of getFlattenedAffineExpr); update addDomainConstraints.
- rename AffineExprFlattener::cst -> localVarCst
PiperOrigin-RevId:
225933306
Alex Zinenko [Mon, 17 Dec 2018 22:11:31 +0000 (14:11 -0800)]
Refactor LowerVectorTransfersPass using pattern rewriters
This introduces a generic lowering pass for ML functions. The pass is
parameterized by template arguments defining individual pattern rewriters.
Concrete lowering passes define individual pattern rewriters and inherit from
the generic class that takes care of allocating rewriters, traversing ML
functions and performing the actual rewrite.
While this is similar to the greedy pattern rewriter available in
Transform/Utils, it requires adjustments due to the ML/CFG duality. In
particular, ML function rewriters must be able to create statements, not only
operations, and need access to an MLFuncBuilder. When we move to using the
unified function type, the ML-specific rewriting will become unnecessary.
Use LowerVectorTransfers as a testbed for the generic pass.
PiperOrigin-RevId:
225887424
Alex Zinenko [Mon, 17 Dec 2018 22:11:14 +0000 (14:11 -0800)]
LLVM IR lowering: support vector_type_cast
Introduce support for lowering vector_type_cast to LLVM IR. It consists of
creating a new MemRef descriptor with the base pointer with the type that
corresponds to the lowered element type of the target memref. Since
`vector_type_cast` does not support dynamic shapes in the target type, no
dynamic size conversion is necessary.
This commit goes in the opposite direction of what is expected of LLVM IR
lowering: it should not be aware of all the other dialects. Instead, we should
have separate definitions for conversions in a global lowering framework.
However, this requires LLVM dialect to be implemented, which is currently
blocked by the absence of user-defined types. Implement the lowering anyway to
unblock end-to-end vectorization experiments.
PiperOrigin-RevId:
225887368
Alex Zinenko [Mon, 17 Dec 2018 22:10:52 +0000 (14:10 -0800)]
Materialize vector_type_cast operation in the SuperVector dialect
This operation is produced and used by the super-vectorization passes and has
been emitted as an abstract unregistered operation until now. For end-to-end
testing purposes, it has to be eventually lowered to LLVM IR. Matching
abstract operation by name goes into the opposite direction of the generic
lowering approach that is expected to be used for LLVM IR lowering in the
future. Register vector_type_cast operation as a part of the SuperVector
dialect.
Arguably, this operation is a special case of the `view` operation from the
Standard dialect. The semantics of `view` is not fully specified at this point
so it is safer to rely on a custom operation. Additionally, using a custom
operation may help to achieve clear dialect separation.
PiperOrigin-RevId:
225887305
Uday Bondhugula [Mon, 17 Dec 2018 19:17:16 +0000 (11:17 -0800)]
Refactor / eliminate duplicate code in
memref-dep-check / getIterationDomainContext
PiperOrigin-RevId:
225857762
Alex Zinenko [Mon, 17 Dec 2018 18:05:56 +0000 (10:05 -0800)]
Type system: replace Type::getBitWidth with getIntOrFloatBitWidth
As MLIR moves towards dialect-specific types, a generic Type::getBitWidth does
not make sense for all of them. Even with the current type system, the bit
width is not defined (and causes the method in question to abort) for all
TensorFlow types.
This commit restricts the bit width definition to primitive standard types that
have a number of bits appearing verbatim in their type, i.e., integers and
floats. As a side effect, it delegates the decision on the bit width of the
`index` to the backends. Existing backends currently hardcode it to 64 bits.
The Type::getBitWidth method is replaced by Type::getIntOrFloatBitWidth that
only applies to integers and floats. The call sites are updated to use the new
method, where applicable, or rewritten so as not to rely on it. Incidentally,
this fixes a utility method that did not account for memrefs being allowed to
have vectors as element types in the size computation.
As an observation, several places in the code use Type in places where a more
specific type could be used instead. Some of those are fixed by this commit.
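A sketch of the resulting call-site pattern; the isa<> guard and the helper name are assumed, while getIntOrFloatBitWidth is the method this commit introduces:
```c++
// Only query the bit width for types where it is defined.
unsigned getElementBitWidth(Type elementType) {
  assert((elementType.isa<IntegerType>() || elementType.isa<FloatType>()) &&
         "bit width is only defined for integer and float types");
  return elementType.getIntOrFloatBitWidth();
}
```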
PiperOrigin-RevId:
225844792
Uday Bondhugula [Mon, 17 Dec 2018 17:58:57 +0000 (09:58 -0800)]
loop-unroll - add function callback argument for outside targets to
provide unroll factors, and a cmd line argument to specify number of
innermost loop unroll repetitions.
- add function callback parameter for outside targets to provide unroll factors
- add a cmd line parameter to repeatedly apply innermost loop unroll a certain
number of times (to avoid using -loop-unroll -loop-unroll ...; instead
-unroll-num-reps=2).
- implement the callback for a target
- update test cases / usage
PiperOrigin-RevId:
225843191
MLIR Team [Mon, 17 Dec 2018 17:57:14 +0000 (09:57 -0800)]
Loop Fusion pass update: introduce utilities to perform generalized loop fusion based on slicing; encompasses standard loop fusion.
*) Adds simple greedy fusion algorithm to drive experimentation. This algorithm greedily fuses loop nests with single-writer/single-reader memref dependences to improve locality.
*) Adds support for fusing slices of a loop nest computation: fusing one loop nest into another by adjusting the source loop nest's iteration bounds (after it is fused into the destination loop nest). This is accomplished by solving for the source loop nest's IVs in terms of the destination loop nest's IVs and symbols using the dependence polyhedron, then creating AffineMaps of these functions for the loop bounds of the fused source loop.
*) Adds utility function 'insertMemRefComputationSlice' which computes and inserts a computation slice from the loop nest surrounding a source memref access into the loop nest surrounding the destination memref access.
*) Adds FlatAffineConstraints::toAffineMap function which returns an AffineMap which represents an equality constraint where one dimension identifier is represented as a function of all others in the equality constraint.
*) Adds multiple fusion unit tests.
PiperOrigin-RevId:
225842944
Jacques Pienaar [Mon, 17 Dec 2018 15:19:53 +0000 (07:19 -0800)]
Fix builder getFloatAttr of double to use F64 type and use fltSemantics in FloatAttr.
Store FloatAttr using more appropriate fltSemantics (mostly fixing up F32/F64 storage, F16/BF16 pending). Previously the F32 type was used incorrectly for double (the storage was double). Also add a query method that returns fltSemantics for IEEE fp types and use that to verify that the APFloat given matches the type:
* FloatAttr created using APFloat is verified that the semantics of the type and APFloat matches;
* FloatAttr created using double has the APFloat created to match the semantics of the type;
Change parsing of tensor negative splat element to pass in the element type expected. Misc other changes to account for the storage type matching the attribute.
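The two creation paths, sketched with assumed signatures (not necessarily the exact builder API):
```c++
// From a double: an APFloat is created to match the type's semantics.
FloatAttr fromDouble = builder.getFloatAttr(builder.getF32Type(), 0.5);
// From an APFloat: its semantics must already match the type, otherwise
// verification fails.
FloatAttr fromAPFloat =
    builder.getFloatAttr(builder.getF64Type(), llvm::APFloat(0.5));
```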
PiperOrigin-RevId:
225821834
Lei Zhang [Mon, 17 Dec 2018 12:42:55 +0000 (04:42 -0800)]
Free the name symbol in TableGen
Renamed the name field in Op to opName since it is the opcode's name.
Renamed the name parameters in TFLite op templates to opSummary since
they are meant as a summary of the op's functionality.
We will use the name symbol later for the name given by users via TF.
PiperOrigin-RevId:
225807135
Uday Bondhugula [Mon, 17 Dec 2018 06:40:08 +0000 (22:40 -0800)]
Remove duplicate code / reuse right utilities from memref-dep-check / loop-tile
- use addBoundsForForStmt
- getLoopIVs can return a vector of ForStmt * instead of const ForStmt *; the
returned things aren't owned / part of the stmt on which it's being called.
- other minor API cleanup
PiperOrigin-RevId:
225774301
Uday Bondhugula [Fri, 14 Dec 2018 18:33:43 +0000 (10:33 -0800)]
'memref-bound-check': extend to store op's as well
- extend memref-bound-check to store op's
- make the bound check an analysis util and move to lib/Analysis/Utils.cpp (so that
one doesn't need to always create a pass to use it)
PiperOrigin-RevId:
225564830
Alex Zinenko [Fri, 14 Dec 2018 17:31:17 +0000 (09:31 -0800)]
Extract vector_transfer_* Ops into a SuperVectorDialect.
From the beginning, vector_transfer_read and vector_transfer_write operations
were intended as a mid-level vectorization abstraction. In particular, they
are lowered to the StandardOps dialect before further processing. As such, it
does not make sense to keep them at the same level as StandardOps. Introduce
the new SuperVectorOps dialect and move vector_transfer_* operations there.
This will be used as a testbed for the generic lowering/legalization pass.
PiperOrigin-RevId:
225554492
Jacques Pienaar [Fri, 14 Dec 2018 14:32:58 +0000 (06:32 -0800)]
Fix asan failures in mlir-op-gen.
PiperOrigin-RevId:
225532488
Jacques Pienaar [Fri, 14 Dec 2018 10:08:55 +0000 (02:08 -0800)]
Use dag instead of list for operands to allow named operands.
Named operands allow generating builders with more meaningful names + lay the groundwork for allowing the specification of attributes as part of the inputs pattern of an op (which allows the declarative pattern rewrite generator to define ops with attributes). This is a minimal change that just changes how input operands are represented; changes to attributes in a follow-up and returnTypes later.
PiperOrigin-RevId:
225509805
Uday Bondhugula [Fri, 14 Dec 2018 00:00:25 +0000 (16:00 -0800)]
Expression flattening improvement - reuse local expressions.
- if a local id already exists for a specific mod/div expression, just reuse it if
the expression repeats (instead of adding a new one).
- drastically reduces the number of local variables added during flattening for
real use cases - since the same div's and mod expressions often repeat.
- add getFlattenedAffineExprs for AffineMap, IntegerSet based on the above
As a natural result of the above:
- the FlatAffineConstraints(IntegerSet) ctor now deals with integer sets that have mod and div constraints as well, and these also get simplified by -simplify-affine-structures
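A minimal sketch of the reuse, assuming AffineExpr is usable as a DenseMap key and that an addLocalVariable() helper exists (both assumptions, not the actual flattener API):
```c++
// Remember the local id created for each mod/div expression; on a repeat,
// return the existing id instead of adding a fresh local variable.
llvm::DenseMap<AffineExpr, unsigned> localExprToId;

unsigned getOrCreateLocalId(AffineExpr modOrDivExpr) {
  auto it = localExprToId.find(modOrDivExpr);
  if (it != localExprToId.end())
    return it->second;
  unsigned id = addLocalVariable(modOrDivExpr);  // assumed helper
  localExprToId[modOrDivExpr] = id;
  return id;
}
```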
PiperOrigin-RevId:
225452174
Feng Liu [Thu, 13 Dec 2018 23:31:23 +0000 (15:31 -0800)]
Convert tf.FakeQuantWithMinMaxArgs/Vars to tfl.FakeQuant
- Define tf.FakeQuantWithMinMaxArgs and tf.FakeQuantWithMinMaxVars
- Add the unit tests for valid and invalid IRs
- Rewrite both to the tfl.FakeQuant op
- Add the unit tests for the rewriting
PiperOrigin-RevId:
225447109
Feng Liu [Thu, 13 Dec 2018 19:43:50 +0000 (11:43 -0800)]
Define TFLite Dequantize and FakeQuant ops
Besides the ops.td file changes to define both ops, this CL also changes the
mlir-op-gen to allow more flexible traits definition for "optional" operation
inputs.
Unit tests are added.
One TODO for the mlir-op-gen is to make attribute optional in the ops.
PiperOrigin-RevId:
225408349
Uday Bondhugula [Thu, 13 Dec 2018 18:47:09 +0000 (10:47 -0800)]
FlatAffineConstraints - complete TODOs: add method to remove duplicate /
trivially redundant constraints. Update projectOut to eliminate identifiers in
a more efficient order. Fix b/120801118.
- add method to remove duplicate / trivially redundant constraints from
FlatAffineConstraints (use a hashing-based approach with DenseSet)
- update projectOut to eliminate identifiers in a more efficient order
(A sequence of affine_apply's like this (from a real use case) finally exposed
the lack of the above trivial/low hanging simplifications).
for %ii = 0 to 64 {
  for %jj = 0 to 9 {
    %a0 = affine_apply (d0, d1) -> (d0 * (9 * 1024) + d1 * 128) (%ii, %jj)
    %a1 = affine_apply (d0) ->
        (d0 floordiv (2 * 3 * 3 * 128 * 128),
         (d0 mod 294912) floordiv (3 * 3 * 128 * 128),
         (((d0 mod 294912) mod 147456) floordiv 1152) floordiv 8,
         (((d0 mod 294912) mod 147456) mod 1152) floordiv 384,
         ((((d0 mod 294912) mod 147456) mod 1152) mod 384) floordiv 128,
         (((((d0 mod 294912) mod 147456) mod 1152) mod 384) mod 128)
           floordiv 128) (%a0)
    %v0 = load %in[%a1#0, %a1#1, %a1#3, %a1#4, %a1#2, %a1#5]
        : memref<2x2x3x3x16x1xi32>
  }
}
- update FlatAffineConstraints::print to print number of constraints.
PiperOrigin-RevId:
225397480
River Riddle [Thu, 13 Dec 2018 17:00:06 +0000 (09:00 -0800)]
Check if the operation is already in the worklist before adding it.
PiperOrigin-RevId:
225379496
Uday Bondhugula [Wed, 12 Dec 2018 16:55:21 +0000 (08:55 -0800)]
Fix loop unrolling test cases
- These test cases had to be updated post the switch to exclusive upper bound;
however, the test cases hadn't originally been written to check correctly; as
a result, they didn't fail and weren't updated. Update test case and fix
upper bound.
PiperOrigin-RevId:
225194016
Alex Zinenko [Wed, 12 Dec 2018 14:11:33 +0000 (06:11 -0800)]
LLVM IR lowering: support 1D vector operations
Introduce initial support for 1D vector operations. LLVM does not support
higher-dimensional vectors so the caller must make sure they don't appear in
the input MLIR. Handle the presence of higher-dimensional vectors by failing
gracefully.
Introduce the type conversion for 1D vector types and hook it up with the rest
of the type conversion system. Support "splat" constants for vector types. As
a side effect, this refactors constant operation emission by separating out
scalar integer constants into a separate case and by extracting out the helper
function for scalar float construction. Existing binary operations apply to
vectors transparently.
PiperOrigin-RevId:
225172349
Alex Zinenko [Wed, 12 Dec 2018 13:02:46 +0000 (05:02 -0800)]
ConvertToCFG: use affine_apply to implement loop steps
Originally, loop steps were implemented using `addi` and `constant` operations
because `affine_apply` was not handled in the first implementation. The
support for `affine_apply` has been added, use it to implement the update of
the loop induction variable. This is more consistent with the lower and upper
bounds of the loop that are also implemented as `affine_apply`, removes the
dependence of the converted function on the StandardOps dialect and makes it
clear from the CFG function that all operations on the loop induction variable
are purely affine.
PiperOrigin-RevId:
225165337
Jacques Pienaar [Wed, 12 Dec 2018 11:09:11 +0000 (03:09 -0800)]
Add rudimentary pattern rewrite matching generation.
* Start very basic (about as basic as possible) with the pattern rewrite generation by only
- Matching single node dags,
- Single output, single result,
- No constraints on inputs/outputs.
- No attributes (only operands)
* The matcher generates C++ code akin to what is currently manually written.
- This is very much not the final end state, and only intended for the short term;
* Always generate the default builder method to make it easier to generate calls;
- Also add additional builder method for TFL::Add as attributes are not yet supported;
* Replace TF Add -> TFL Add matching using this generation;
* Introduce a conceptual textual namespace in the op registry
- Will allow importing multiple dialect's op registry
- Avoids needing to do anything special with tablegen or define a custom DSL;
= I really want to do a custom DSL but this urge could just be because it's fun :) So defer for now. From this structure we can dump out another structured form if needed;
- Add a mapping from <namespace>_<op> in the op_gen and pattern rewrite gen
= This allows placing ops in different namespaces from the same op registry which is convenient, esp. if we want to consider subnamespaces in future;
* Update tfl namespace to TFL to match TF and XLA;
PiperOrigin-RevId:
225155164
Uday Bondhugula [Tue, 11 Dec 2018 22:49:58 +0000 (14:49 -0800)]
Remove dead code from FlatAffineConstraints
- getDimensionBounds() was added initially for quick experimentation - no
longer used (getConstantBoundOnDimSize is the more powerful/complete
replacement).
- FlatAffineConstraints::getConstantLower/UpperBound are incomplete,
functionality/naming-wise misleading, and not used currently. Removing these;
complete/fixed version will be added in an upcoming CL.
PiperOrigin-RevId:
225075061
Lei Zhang [Tue, 11 Dec 2018 21:59:29 +0000 (13:59 -0800)]
Generate another op builder with aggregated parameters
For each op, generate another builder with the following signature:
static void build(Builder* builder, OperationState* result,
                  ArrayRef<Type> resultTypes,
                  ArrayRef<SSAValue*> args,
                  ArrayRef<NamedAttribute> attributes);
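A hypothetical call through this builder; MyOp, the values, and the surrounding OperationState are placeholders:
```c++
// Sketch: build an op from pre-collected result types, operands, attributes.
SmallVector<Type, 1> resultTypes = {builder->getF32Type()};
SmallVector<SSAValue *, 2> args = {lhs, rhs};  // placeholder values
SmallVector<NamedAttribute, 1> attributes;
MyOp::build(builder, &state, resultTypes, args, attributes);
```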
PiperOrigin-RevId:
225066007
Alex Zinenko [Tue, 11 Dec 2018 21:49:43 +0000 (13:49 -0800)]
Disallow index types as elements of vector, memref and tensor types
An extensive discussion demonstrated that it is difficult to support `index`
types as elements of compound (vector, memref, tensor) types. In particular,
their size is unknown until the target-specific lowering takes place. MLIR may
need to store constants of the fixed-shape compound types (e.g.,
vector<4 x index>) internally and must know the size of the element type and
data layout constraints. The same information is necessary for target-specific
lowering and translation to reliably support compound types with `index`
elements, but MLIR does not have a dedicated target description mechanism yet.
The use cases for compound types with `index` elements, should they appear,
can be handled via an `index_cast` operation that converts between `index` and
fixed-size integer types at the SSA value level instead of the type level.
PiperOrigin-RevId:
225064373
Uday Bondhugula [Mon, 10 Dec 2018 23:17:25 +0000 (15:17 -0800)]
Update/Fix LoopUtils::stmtBodySkew to handle loop step.
- loop step wasn't handled and there wasn't a TODO or an assertion; fix this.
- rename 'delay' to shift for consistency/readability.
- other readability changes.
- remove duplicate attribute print for DmaStartOp; fix misplaced attribute
print for DmaWaitOp
- add build method for AddFOp (unrelated to this CL, but add it anyway)
PiperOrigin-RevId:
224892958
Uday Bondhugula [Mon, 10 Dec 2018 21:14:28 +0000 (13:14 -0800)]
Fix missing check for dependent DMAs in pipeline-data-transfer
- adding a conservative check for now (TODO: use the dependence analysis pass
once the latter is extended to deal with DMA ops). Resolve an existing bug on
a test case.
- update test cases
PiperOrigin-RevId:
224869526
Uday Bondhugula [Mon, 10 Dec 2018 20:59:53 +0000 (12:59 -0800)]
FlatAffineConstraints API cleanup; add normalizeConstraintsByGCD().
- add method normalizeConstraintsByGCD
- call normalizeConstraintsByGCD() and GCDTightenInequalities() at the end of
projectOut.
- remove call to GCDTightenInequalities() from getMemRefRegion
- change isEmpty() to check isEmptyByGCDTest() / hasInvalidConstraint() each
time an identifier is eliminated (to detect emptiness early).
- make FourierMotzkinEliminate, gaussianEliminateId(s),
GCDTightenInequalities() private
- improve / update stale comments
PiperOrigin-RevId:
224866741
Uday Bondhugula [Mon, 10 Dec 2018 19:39:31 +0000 (11:39 -0800)]
Update/fix -pipeline-data-transfer; fix b/120770946
- fix replaceAllMemRefUsesWith call to replace only inside loop body.
- handle the case where DMA buffers are dynamic; extend doubleBuffer() method
to handle dynamically shaped DMA buffers (pass the right operands to AllocOp)
- place alloc's for DMA buffers at the depth at which pipelining is being done
(instead of at top-level)
- add more test cases
PiperOrigin-RevId:
224852231
Alex Zinenko [Mon, 10 Dec 2018 17:55:10 +0000 (09:55 -0800)]
Properly namespace createLowerAffineApply
This was missing from the original commit. The implementation of
createLowerAffineApply was defined in the default namespace but declared in the
`mlir` namespace, which could lead to linking errors when it was used. Put the
definition in `mlir` namespace.
PiperOrigin-RevId:
224830894
Nicolas Vasilache [Sat, 8 Dec 2018 03:00:25 +0000 (19:00 -0800)]
[MLIR] Drop bug-prone global map indexed by MLFunction*
PiperOrigin-RevId:
224610805
Uday Bondhugula [Sat, 8 Dec 2018 01:35:49 +0000 (17:35 -0800)]
Extend loop tiling utility to handle non-constant loop bounds and bounds that
are a max/min of several expressions.
- Extend loop tiling to handle non-constant loop bounds and bounds that
are a max/min of several expressions, i.e., bounds using multi-result affine
maps
- also fix b/120630124 as a result (the IR was in an invalid state when tiled loop generation failed; SSA uses were created that weren't plugged into the IR).
PiperOrigin-RevId:
224604460
Uday Bondhugula [Fri, 7 Dec 2018 23:04:55 +0000 (15:04 -0800)]
Generate strided DMAs from -dma-generate
- generate DMAs correctly now using strided DMAs where needed
- add support for multi-level/nested strides; op still supports one level of
stride for now.
Other things
- add test case for symbolic lower/upper bound; cases where the DMA buffer
size can't be bounded by a known constant
- add test case for dynamic shapes where the DMA buffers are however bounded by
constants
- refactor some of the '-dma-generate' code
PiperOrigin-RevId:
224584529
Nicolas Vasilache [Fri, 7 Dec 2018 19:48:54 +0000 (11:48 -0800)]
[MLIR] Add LowerVectorTransfersPass
This CL adds a pass that lowers VectorTransferReadOp and VectorTransferWriteOp
to a simple loop nest via local buffer allocations.
This is an MLIR->MLIR lowering based on builders.
A few TODOs are left to address in particular:
1. invert the permutation map so the accesses to the remote memref are coalesced;
2. pad the alloc for bank conflicts in local memory (e.g. GPUs shared_memory);
3. support broadcast / avoid copies when permutation_map is not of full column rank
4. add a proper "element_cast" op
One notable limitation is this does not plan on supporting boundary conditions.
It should be significantly easier to use pre-baked MLIR functions to handle such paddings.
This is left for future consideration.
Therefore the current CL only works properly for full-tile cases atm.
This CL also adds 2 simple tests:
```mlir
for %i0 = 0 to %M step 3 {
  for %i1 = 0 to %N step 4 {
    for %i2 = 0 to %O {
      for %i3 = 0 to %P step 5 {
        vector_transfer_write %f1, %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : vector<5x4x3xf32>, memref<?x?x?x?xf32, 0>, index, index, index, index
```
lowers into:
```mlir
for %i0 = 0 to %arg0 step 3 {
  for %i1 = 0 to %arg1 step 4 {
    for %i2 = 0 to %arg2 {
      for %i3 = 0 to %arg3 step 5 {
        %1 = alloc() : memref<5x4x3xf32>
        %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>>
        store %cst, %2[%c0] : memref<1xvector<5x4x3xf32>>
        for %i4 = 0 to 5 {
          %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4)
          for %i5 = 0 to 4 {
            %4 = affine_apply (d0, d1) -> (d0 + d1) (%i1, %i5)
            for %i6 = 0 to 3 {
              %5 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6)
              %6 = load %1[%i4, %i5, %i6] : memref<5x4x3xf32>
              store %6, %0[%5, %4, %i2, %3] : memref<?x?x?x?xf32>
        dealloc %1 : memref<5x4x3xf32>
```
and
```mlir
for %i0 = 0 to %M step 3 {
  for %i1 = 0 to %N {
    for %i2 = 0 to %O {
      for %i3 = 0 to %P step 5 {
        %f = vector_transfer_read %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, 0, d0)} : (memref<?x?x?x?xf32, 0>, index, index, index, index) -> vector<5x4x3xf32>
```
lowers into:
```mlir
for %i0 = 0 to %arg0 step 3 {
  for %i1 = 0 to %arg1 {
    for %i2 = 0 to %arg2 {
      for %i3 = 0 to %arg3 step 5 {
        %1 = alloc() : memref<5x4x3xf32>
        %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>>
        for %i4 = 0 to 5 {
          %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4)
          for %i5 = 0 to 4 {
            for %i6 = 0 to 3 {
              %4 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6)
              %5 = load %0[%4, %i1, %i2, %3] : memref<?x?x?x?xf32>
              store %5, %1[%i4, %i5, %i6] : memref<5x4x3xf32>
        %6 = load %2[%c0] : memref<1xvector<5x4x3xf32>>
        dealloc %1 : memref<5x4x3xf32>
```
PiperOrigin-RevId:
224552717
Nicolas Vasilache [Fri, 7 Dec 2018 18:21:52 +0000 (10:21 -0800)]
[MLIR] Fix the name of the MaterializeVectorPass
PiperOrigin-RevId:
224536381
Nicolas Vasilache [Fri, 7 Dec 2018 18:21:26 +0000 (10:21 -0800)]
[MLIR] Add composeWithUnboundedMap
This CL adds a finer grain composition function between AffineExpr and an
unbounded map. This will be used in the next CL.
Also cleans up some comments remaining from a previous CL.
PiperOrigin-RevId:
224536314
Nicolas Vasilache [Fri, 7 Dec 2018 18:16:49 +0000 (10:16 -0800)]
[MLIR] Add LangRef entries for vector_transfer ops
PiperOrigin-RevId:
224535443
Smit Hinsu [Fri, 7 Dec 2018 17:30:25 +0000 (09:30 -0800)]
Return bool from all emitError methods similar to Operation::emitOpError
This simplifies call-sites returning true after emitting an error. After the
conversion, dropped braces around single statement blocks as that seems more
common.
Also, switched to emitError method instead of emitting Error kind using the
emitDiagnostic method.
TESTED with existing unit tests
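A sketch of the simplified call-site pattern; the op and message are placeholders, and this assumes the convention of the time that verify() returns true on error:
```c++
bool MyOp::verify() const {
  if (getNumOperands() != 2)
    return emitOpError("expected two operands");  // emits, then returns true
  return false;  // no error
}
```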
PiperOrigin-RevId:
224527868
Lei Zhang [Thu, 6 Dec 2018 20:06:00 +0000 (12:06 -0800)]
Auto-generate op builder with TableGen
If no custom builder is supplied for an op, TableGen now generates
a default builder for it with the following signature:
static void build(Builder *builder, OperationState* result,
                  <list-of-all-result-types>,
                  <list-of-all-operands>,
                  <list-of-all-attributes>);
PiperOrigin-RevId:
224382473
Nicolas Vasilache [Thu, 6 Dec 2018 19:39:00 +0000 (11:39 -0800)]
[MLIR] Drop assert for NYI in Vectorize.cpp
This CL adds proper error emission, removes NYI assertions and documents
assumptions that are required in the relevant functions.
PiperOrigin-RevId:
224377207
Nicolas Vasilache [Thu, 6 Dec 2018 19:38:44 +0000 (11:38 -0800)]
[MLIR] Drop assert for NYI in VectorAnalysis
This CL adds proper error emission, removes NYI assertions and documents
assumptions that are required in the relevant functions.
PiperOrigin-RevId:
224377143
Nicolas Vasilache [Thu, 6 Dec 2018 19:38:26 +0000 (11:38 -0800)]
[MLIR] Drop unnecessary mention of NYI.
This CL also documents the `substExpr` helper function assumptions.
The assumptions are properly propagated up already.
PiperOrigin-RevId:
224377072
Nicolas Vasilache [Thu, 6 Dec 2018 19:38:09 +0000 (11:38 -0800)]
[MLIR] Remove NYI assertions in LoopAnalysis.cpp
This CL also cleans up some loose ends and returns conservative answers while
emitting errors in the NYI cases.
PiperOrigin-RevId:
224377004
Nicolas Vasilache [Thu, 6 Dec 2018 19:37:53 +0000 (11:37 -0800)]
[MLIR] Error handling in MaterializeVectors
This removes assertions as a means to capture NYI behavior and propagates
errors up.
PiperOrigin-RevId:
224376935
Nicolas Vasilache [Thu, 6 Dec 2018 19:37:38 +0000 (11:37 -0800)]
[MLIR] Add AffineMap composition and use it in Materialization
This CL adds the following free functions:
```
/// Returns the AffineExpr e o m.
AffineExpr compose(AffineExpr e, AffineMap m);
/// Returns the AffineMap f o g.
AffineMap compose(AffineMap f, AffineMap g);
```
This addresses the issue that AffineMap composition is only available at a
distance via AffineValueMap and is thus unusable on Attributes.
This CL thus implements AffineMap composition in a more modular and composable
way.
This CL does not claim that it can be a good replacement for the
implementation in AffineValueMap, in particular it does not support bounded
maps atm.
Standalone tests are added that replicate some of the logic of the AffineMap
composition pass.
Lastly, affine map composition is used properly inside MaterializeVectors and
a standalone test is added that requires permutation_map composition with a
projection map.
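A hypothetical use on an attribute, which the AffineValueMap-based path could not offer; the attribute name and projectionMap are placeholders:
```c++
// Compose an op's permutation_map attribute with a projection directly.
AffineMap permutationMap =
    op->getAttrOfType<AffineMapAttr>("permutation_map").getValue();
AffineMap composed = compose(projectionMap, permutationMap);
```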
PiperOrigin-RevId:
224376870
Nicolas Vasilache [Thu, 6 Dec 2018 19:37:25 +0000 (11:37 -0800)]
[MLIR] Add support for permutation_map
This CL hooks up and uses permutation_map in vector_transfer ops.
In particular, when going into the nuts and bolts of the implementation, it
became clear that cases arose that required supporting broadcast semantics.
Broadcast semantics are thus added to the general permutation_map.
The verify methods and tests are updated accordingly.
Examples of interest include:
Example 1:
The following MLIR snippet:
```mlir
for %i3 = 0 to %M {
  for %i4 = 0 to %N {
    for %i5 = 0 to %P {
      %a5 = load %A[%i4, %i5, %i3] : memref<?x?x?xf32>
}}}
```
may vectorize with {permutation_map: (d0, d1, d2) -> (d2, d1)} into:
```mlir
for %i3 = 0 to %0 step 32 {
  for %i4 = 0 to %1 {
    for %i5 = 0 to %2 step 256 {
      %4 = vector_transfer_read %arg0, %i4, %i5, %i3
        {permutation_map: (d0, d1, d2) -> (d2, d1)} :
        (memref<?x?x?xf32>, index, index) -> vector<32x256xf32>
}}}
```
Meaning that vector_transfer_read will be responsible for reading the 2-D slice:
`%arg0[%i4, %i5:%i5+256, %i3:%i3+32]` into vector<32x256xf32>. This will
require a transposition when vector_transfer_read is further lowered.
Example 2:
The following MLIR snippet:
```mlir
%cst0 = constant 0 : index
for %i0 = 0 to %M {
  %a0 = load %A[%cst0, %cst0] : memref<?x?xf32>
}
```
may vectorize with {permutation_map: (d0) -> (0)} into:
```mlir
for %i0 = 0 to %0 step 128 {
  %3 = vector_transfer_read %arg0, %c0_0, %c0_0
    {permutation_map: (d0, d1) -> (0)} :
    (memref<?x?xf32>, index, index) -> vector<128xf32>
}
```
Meaning that vector_transfer_read will be responsible for reading the 0-D slice
`%arg0[%c0, %c0]` into vector<128xf32>. This will require a 1-D vector
broadcast when vector_transfer_read is further lowered.
Additionally, some minor cleanups and refactorings are performed.
One notable thing missing here is the composition with a projection map during
materialization. This is because I could not find an AffineMap composition
that operates on AffineMap directly: everything related to composition seems
to require going through SSAValue and only operates on AffineMap at a distance
via AffineValueMap. I have raised this concern a bunch of times already, the
followup CL will actually do something about it.
In the meantime, the projection is hacked at a minimum to pass verification
and materialization tests are temporarily incorrect.
PiperOrigin-RevId:
224376828
Alex Zinenko [Thu, 6 Dec 2018 19:34:27 +0000 (11:34 -0800)]
ConvertToCFG: support min/max in loop bounds.
The recently introduced `select` operation enables ConvertToCFG to support
min(max) in loop bounds. Individual min(max) is implemented as
`cmpi "lt"`(`cmpi "gt"`) followed by a `select` between the compared values.
Multiple results of an `affine_apply` operation extracted from the loop bounds
are reduced using min(max) in a sequential manner. While this may decrease the
potential for instruction-level parallelism, it is easier to recognize for the
following passes, in particular for the vectorizer.
PiperOrigin-RevId:
224376233
Alex Zinenko [Thu, 6 Dec 2018 18:56:21 +0000 (10:56 -0800)]
OpPointer: replace conversion operator to Operation* with conversion to OpType*.
The implementation of OpPointer<OpType> provides an implicit conversion to
Operation *, but not to the underlying OpType *. This has led to
awkward-looking code when an OpPointer needs to be passed to a function
accepting an OpType *. For example,
if (auto someOp = genericOp.dyn_cast<OpType>())
  someFunction(&*someOp);
where "&*" makes it harder to read. Arguably, one does not want to spell out
OpPointer<OpType> in the line with dyn_cast. More generally, OpPointer is now
being used as an owning pointer to OpType rather than to Operation.
Replace the implicit conversion to Operation* with the conversion to OpType*
taking into account const-ness of the type. An Operation* can be obtained from
an OpType with a simple call. Since an instance of OpPointer owns the OpType
value, the pointer to it is never null. However, the OpType value may not be
associated with any Operation*. In this case, return nullptr when conversion
is attempted to maintain consistency with the existing null checks.
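With the conversion in place, the earlier example reduces to this sketch:
```c++
if (auto someOp = genericOp.dyn_cast<OpType>())
  someFunction(someOp);  // OpPointer<OpType> now converts to OpType *.
```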
PiperOrigin-RevId:
224368103
Uday Bondhugula [Thu, 6 Dec 2018 04:34:23 +0000 (20:34 -0800)]
Fix cases where unsigned / signed arithmetic was being mixed (following up on
cl/224246657); eliminate repeated evaluation of exprs in loop upper bounds.
- while on this, sweep through and fix potential repeated evaluation of
expressions in loop upper bounds
PiperOrigin-RevId:
224268918
MLIR Team [Thu, 6 Dec 2018 01:00:28 +0000 (17:00 -0800)]
Fix bug in GCD calculation when flattening AffineExpr (adds unit test which triggers the bug and tests the fix).
PiperOrigin-RevId:
224246657
Tatiana Shpeisman [Thu, 6 Dec 2018 00:10:05 +0000 (16:10 -0800)]
Make examples semantically meaningful and fix miscellaneous typos. Thanks to @rocky for pointing out the bugs.
PiperOrigin-RevId:
224239160
Uday Bondhugula [Wed, 5 Dec 2018 23:30:25 +0000 (15:30 -0800)]
Strided DMA support for DmaStartOp
- add optional stride arguments for DmaStartOp
- add DmaStartOp::verify(), and missing test cases for DMA op's in
test/IR/memory-ops.mlir.
PiperOrigin-RevId:
224232466
Uday Bondhugula [Wed, 5 Dec 2018 23:14:25 +0000 (15:14 -0800)]
Complete multiple unhandled cases for DmaGeneration / getMemRefRegion;
update/improve/clean up API.
- update FlatAffineConstraints::getConstBoundDifference; return constant
differences between symbolic affine expressions, look at equalities as well.
- fix buffer size computation when generating DMAs symbolic in outer loops,
correctly handle symbols at various places (affine access maps, loop bounds,
loop IVs outer to the depth at which DMA generation is being done)
- bug fixes / complete some TODOs for getMemRefRegion
- refactor common code b/w memref dependence check and getMemRefRegion
- FlatAffineConstraints API update; added methods employ trivial checks /
detection - sufficient to handle hyper-rectangular cases in a precise way
while being fast / low complexity. Hyper-rectangular cases fall out as
trivial cases for these methods while other cases still do not cause failure
(either return conservative or return failure that is handled by the caller).
PiperOrigin-RevId:
224229879
Lei Zhang [Wed, 5 Dec 2018 22:34:25 +0000 (14:34 -0800)]
Clean up base TableGen definitions
* Removed unused builder field for type definitions
* Refined comments and reordered classes
PiperOrigin-RevId:
224223038
Jacques Pienaar [Wed, 5 Dec 2018 21:25:44 +0000 (13:25 -0800)]
Enable using bare attributes.
Useful for defining ops such as <dialect>.Const where multiple kinds of attributes are legal.
PiperOrigin-RevId:
224210511
Lei Zhang [Wed, 5 Dec 2018 12:31:59 +0000 (04:31 -0800)]
Add isIntOrIndex() and isIntOrIndexOrFloat() into Type
The checks for `isa<IndexType>() || isa<IntegerType>()` and
`isa<IndexType>() || isa<IntegerType>() || isa<FloatType>()`
are frequently used, so it's useful to have some helper
methods for them.
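For example, call sites collapse from the isa<> chains to the helper; the surrounding function name here is hypothetical:
```c++
// Before: t.isa<IndexType>() || t.isa<IntegerType>()
// After:
bool isValidDimensionType(Type t) { return t.isIntOrIndex(); }
```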
PiperOrigin-RevId:
224133596
Uday Bondhugula [Tue, 4 Dec 2018 23:09:52 +0000 (15:09 -0800)]
Remove duplicate FlatAffineConstraints::removeId - refactor to use
removeColumnRange
- remove functionally duplicate code in removeId.
- rename removeColumnRange -> removeIdRange - restrict valid input to just the
identifier columns (not the constant term column).
PiperOrigin-RevId:
224054064
Uday Bondhugula [Tue, 4 Dec 2018 21:09:45 +0000 (13:09 -0800)]
FlatAffineConstraints::removeId() fix.
This is an obvious bug, but none of the test cases exposed it since numIds was
correctly updated, and the dimensional identifiers were always eliminated
before the symbolic identifiers in all cases that removeId was getting
called from. However, other work in progress exercises the other scenarios and
exposes this bug.
Add a hasConsistentState() private method to move common assertion checks, and call it
from several base methods. Make hasInvalidConstraint() a private method as
well (from a file static one).
PiperOrigin-RevId:
224032721
Lei Zhang [Tue, 4 Dec 2018 20:37:28 +0000 (12:37 -0800)]
Change TFLite binary ops to support implicit broadcasting
As it turns out, the TFLite runtime already supports implicit broadcasting
for math binary ops. As the instruction set for TFLite runtime, the tfl
dialect should reflect that, instead of requiring both operands for binary
ops to be of the same type.
To support implicit broadcast means it's not suitable to provide the
short-form assembly for TFLite binary ops anymore. So by default, we should
just provide the canonical-form assembly parser/printer for base binary op.
It's subclasses' choices whether to opt in to short-form.
Added BroadcastableTwoOperandsOneResult as a new dialect trait for checking
the operand and result types for TFLite binary ops.
Also added SameOperandsAndResultType to several neural network ops.
PiperOrigin-RevId:
224027445
MLIR Team [Tue, 4 Dec 2018 19:40:37 +0000 (11:40 -0800)]
During forward substitution, merge symbols from input AffineMap with the symbol list of the target AffineMap.
Symbols can be used as dim identifiers and symbolic identifiers, and so we must preserve the symbolic identifiers from the input AffineMap during forward substitution, even if that same identifier is used as a dimension identifier in the target AffineMap.
Test case added.
Going forward, we may want to explore solutions where we do not maintain this split between dimensions and symbols, and instead verify the validity of each use of each AffineMap operand in the context where the AffineMap operand usage is required to be a symbol: in the denominator of floordiv/ceildiv/mod for semi-affine maps, and in instructions that can capture symbols (i.e., alloc)
PiperOrigin-RevId:
224017364
Jacques Pienaar [Tue, 4 Dec 2018 15:57:33 +0000 (07:57 -0800)]
Fix off-by-one in OpStats.
PiperOrigin-RevId:
223977444
Alex Zinenko [Tue, 4 Dec 2018 15:04:44 +0000 (07:04 -0800)]
ConvertToCFG: convert "if" statements.
The condition of the "if" statement is an integer set, defined as a conjunction
of affine constraints. An affine constraints consists of an affine expression
and a flag indicating whether the expression is strictly equal to zero or is
also allowed to be greater than zero. Affine maps, accepted by `affine_apply`
are also formed from affine expressions. Leverage this fact to implement the
checking of "if" conditions. Each affine expression from the integer set is
converted into an affine map. This map is applied to the arguments of the "if"
statement. The result of the application is compared with zero given the
equality flag to obtain the final boolean value. The conjunction of conditions
is tested sequentially with short-circuit branching to the "else" branch if any
of the condition evaluates to false.
Create an SESE region for the if statement (including its "then" and optional
"else" statement blocks) and append it to the end of the current region. The
conditional region consists of a sequence of condition-checking blocks that
implement the short-circuit scheme, followed by a "then" SESE region and an
"else" SESE region, and the continuation block that post-dominates all blocks
of the "if" statement. The flow of blocks that correspond to the "then" and
"else" clauses are constructed recursively, enabling easy nesting of "if"
statements and if-then-else-if chains.
Note that MLIR semantics does not require nor prohibit short-circuit
evaluation. Since affine expressions do not have side effects, there is no
observable difference in the program behavior. We may trade off extra
operations for operation-level parallelism opportunity by first performing all
`affine_apply` and comparison operations independently, and then performing a
tree pattern reduction of the resulting boolean values with the `muli i1`
operations (in absence of the dedicated bit operations). The pros and cons are
not clear, and since MLIR does not include parallel semantics, we prefer to
minimize the number of sequentially executed operations.
PiperOrigin-RevId:
223970248
Alex Zinenko [Tue, 4 Dec 2018 14:16:26 +0000 (06:16 -0800)]
LLVM IR Lowering: support multi-value returns.
Unlike MLIR, LLVM IR does not support functions that return multiple values.
Simulate this by packing values into the LLVM structure type in the same order
as they appear in the MLIR return. If the function returns only a single
value, return it directly without packing.
PiperOrigin-RevId:
223964886
Nicolas Vasilache [Mon, 3 Dec 2018 23:27:27 +0000 (15:27 -0800)]
[MLIR] Separate and split vectorization tests
These tests have become too bulky and unwieldy.
Splitting simplifies modifications that will occur in the next CL.
PiperOrigin-RevId:
223874321
Nicolas Vasilache [Mon, 3 Dec 2018 23:21:27 +0000 (15:21 -0800)]
[MLIR] Add VectorTransferOps
This CL implements and uses VectorTransferOps in lieu of the former custom
call op. Tests are updated accordingly.
VectorTransferOps come in 2 flavors: VectorTransferReadOp and
VectorTransferWriteOp.
VectorTransferOps can be thought of as a backend-independent
pseudo op/library call that needs to be legalized to MLIR (whiteboxed) before
it can be lowered to backend-dependent IR.
Note that the current implementation does not yet support a real permutation
map. Proper support will come in a followup CL.
VectorTransferReadOp
====================
VectorTransferReadOp performs a blocking read from a scalar memref
location into a super-vector of the same elemental type. This operation is
called 'read' as opposed to 'load' because the super-vector granularity
is generally not representable with a single hardware register. As a
consequence, memory transfers will generally be required when lowering
VectorTransferReadOp. A VectorTransferReadOp is thus a mid-level abstraction
that supports super-vectorization with non-effecting padding for full-tile
only code.
A vector transfer read has semantics similar to a vector load, with additional
support for:
1. an optional value of the elemental type of the MemRef. This value
supports non-effecting padding and is inserted in places where the
vector read exceeds the MemRef bounds. If the value is not specified,
the access is statically guaranteed to be within bounds;
2. an attribute of type AffineMap to specify a slice of the original
MemRef access and its transposition into the super-vector shape. The
permutation_map is an unbounded AffineMap that must represent a
permutation from the MemRef dim space projected onto the vector dim
space.
Example:
```mlir
%A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
...
%val = `ssa-value` : f32
// let %i, %j, %k, %l be ssa-values of type index
%v0 = vector_transfer_read %src, %i, %j, %k, %l
  {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
  (memref<?x?x?x?xf32>, index, index, index, index) ->
  vector<16x32x64xf32>
%v1 = vector_transfer_read %src, %i, %j, %k, %l, %val
  {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
  (memref<?x?x?x?xf32>, index, index, index, index, f32) ->
  vector<16x32x64xf32>
```
VectorTransferWriteOp
=====================
VectorTransferWriteOp performs a blocking write from a super-vector to
a scalar memref of the same elemental type. This operation is
called 'write' as opposed to 'store' because the super-vector
granularity is generally not representable with a single hardware register. As
a consequence, memory transfers will generally be required when lowering
VectorTransferWriteOp. A VectorTransferWriteOp is thus a mid-level
abstraction that supports super-vectorization with non-effecting padding
for full-tile only code.
A vector transfer write has semantics similar to a vector store, with
additional support for handling out-of-bounds situations.
Example:
```mlir
%A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>.
%val = `ssa-value` : vector<16x32x64xf32>
// let %i, %j, %k, %l be ssa-values of type index
vector_transfer_write %val, %src, %i, %j, %k, %l
  {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
  (vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index)
```
PiperOrigin-RevId:
223873234
Jacques Pienaar [Mon, 3 Dec 2018 22:27:24 +0000 (14:27 -0800)]
Fix two more getHashValues.
These were still returning the hash of the pointers, resulting in the two getHashValues being different.
PiperOrigin-RevId:
223862743
Uday Bondhugula [Mon, 3 Dec 2018 19:20:10 +0000 (11:20 -0800)]
FlatAffineConstraints::composeMap: return failure instead of asserting on semi-affine maps
FlatAffineConstraints::composeMap: should return false instead of asserting on
a semi-affine map. Make getMemRefRegion just propagate false when encountering
semi-affine maps (instead of crashing!)
PiperOrigin-RevId:
223828743
Uday Bondhugula [Mon, 3 Dec 2018 19:15:24 +0000 (11:15 -0800)]
Minor fix for replaceAllMemRefUsesWith.
The check for whether the memref was used in a non-dereferencing context had to
be done inside, i.e., only for the op stmt's that the replacement was specified
to be performed on (by the domStmtFilter arg if provided). As such, it is
completely fine for example for a function to return a memref while the replacement
is being performed only on a specific loop's body (as in the case of DMA
generation).
PiperOrigin-RevId:
223827753
River Riddle [Mon, 3 Dec 2018 17:45:35 +0000 (09:45 -0800)]
Add a simple common sub expression elimination pass.
The algorithm collects defining operations within a scoped hash table. The scopes within the hash table correspond to nodes within the dominance tree for a function. This cl only adds support for simple operations, i.e., non-side-effecting ones. Other operations, e.g. load/store/call, will be handled in later patches.
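A self-contained sketch of the scoped-hash-table idea, independent of MLIR's actual types: one scope per dominance-tree node, so operations recorded in a node are visible while visiting dominated children and forgotten afterwards.
```c++
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// A stack of scopes mapping an operation's key (e.g. opcode + operand ids)
// to the id of the value it defines.
struct ScopedTable {
  std::vector<std::unordered_map<std::string, int>> scopes;
  void push() { scopes.emplace_back(); }
  void pop() { scopes.pop_back(); }
  const int *lookup(const std::string &key) const {
    for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {
      auto found = it->find(key);
      if (found != it->end())
        return &found->second;
    }
    return nullptr;
  }
  void insert(const std::string &key, int value) { scopes.back()[key] = value; }
};

struct DomNode {  // toy dominance-tree node
  std::vector<std::pair<std::string, int>> ops;  // (key, defined value id)
  std::vector<DomNode *> children;
};

// CSE walk: reuse a dominating definition whenever the key matches.
void cse(DomNode *node, ScopedTable &table,
         std::unordered_map<int, int> &replacements) {
  table.push();
  for (auto &[key, value] : node->ops) {
    if (const int *existing = table.lookup(key))
      replacements[value] = *existing;  // duplicate: map to earlier value
    else
      table.insert(key, value);
  }
  for (DomNode *child : node->children)
    cse(child, table, replacements);
  table.pop();  // forget this node's definitions on the way out
}
```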
PiperOrigin-RevId:
223811328
Lei Zhang [Mon, 3 Dec 2018 17:16:59 +0000 (09:16 -0800)]
Remove tfl.reshape op when possible
Remove tfl.reshape for the following two cases:
1. A tfl.reshape's input is from another tfl.reshape.
Then these two tfl.reshape ops can be merged.
2. A tfl.reshape's result type is the same as its input type.
This tfl.reshape op does nothing, which can be removed.
These transformations are put in a new source file, Canonicalizer.cpp, because they are TFLite-op-to-TFLite-op transformations and aim at making TFLite ops more canonical.
Also added a hasCanonicalizationPatterns marker in TableGen Op class
to indicate whether an op has custom getCanonicalizationPatterns().
PiperOrigin-RevId:
223806921
Jacques Pienaar [Sat, 1 Dec 2018 19:38:20 +0000 (11:38 -0800)]
Update getHashValue for ptr values stored in a DenseMap/Set to use getHashValue of KeyTy.
Ensures both hash values returned are the same. Tested by triggering resize of map/set and verifying failure before change.
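The invariant being restored, sketched inside a DenseMapInfo-style specialization; KeyTy and StorageTy are placeholders:
```c++
// Both overloads must hash through the same key; hashing the pointer on one
// path and the key on the other makes lookups miss after a rehash.
static unsigned getHashValue(const KeyTy &key) {
  return llvm::hash_value(key);  // assumes the key is llvm-hashable
}
static unsigned getHashValue(const StorageTy *storage) {
  return getHashValue(storage->getKey());  // delegate; never hash the pointer
}
```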
PiperOrigin-RevId:
223651443
Jacques Pienaar [Sat, 1 Dec 2018 18:55:23 +0000 (10:55 -0800)]
RankedTensorType: Use getHashValue(KeyTy) when calling getHashValue(RankedTensorTypeStorage*).
PiperOrigin-RevId:
223649958
Alex Zinenko [Fri, 30 Nov 2018 08:45:02 +0000 (00:45 -0800)]
Document SelectOp class
This was missing from the commit that introduced SelectOp although the
documentation was present in the LangRef.md.
PiperOrigin-RevId:
223476888
Jacques Pienaar [Fri, 30 Nov 2018 02:47:39 +0000 (18:47 -0800)]
Avoid failing when attempting to print null Attribute.
This avoids segfaulting when dumping during debugging of failures.
PiperOrigin-RevId:
223449494
Uday Bondhugula [Thu, 29 Nov 2018 23:25:40 +0000 (15:25 -0800)]
Debug output / logging memref sizes in DMA generation + related changes
- Add method to get a memref's size in bytes
- clean up a loop tiling pass helper (NFC)
PiperOrigin-RevId:
223422077
Nicolas Vasilache [Thu, 29 Nov 2018 21:34:47 +0000 (13:34 -0800)]
[MLIR] Reenable materialize_vectors test
Fixes one of the FileCheck'd tests, which was mistakenly disabled.
PiperOrigin-RevId:
223401978
River Riddle [Thu, 29 Nov 2018 01:23:16 +0000 (17:23 -0800)]
Add support for result type iteration in Operation/Instruction/OperationStmt.
PiperOrigin-RevId:
223264992
Chris Lattner [Wed, 28 Nov 2018 23:09:39 +0000 (15:09 -0800)]
Split "rewrite" functionality out of Pattern into a new RewritePattern derived
class. This change is NFC, but allows for new kinds of patterns, specifically
LegalizationPatterns which will be allowed to change the types of things they
rewrite.
PiperOrigin-RevId:
223243783
Lei Zhang [Wed, 28 Nov 2018 19:49:26 +0000 (11:49 -0800)]
Verify CmpIOp's result type to be bool-like
This CL added two new traits, SameOperandsAndResultShape and
ResultsAreBoolLike, and changed CmpIOp to embody these two
traits. As a consequence, CmpIOp's result type now is verified
to be bool-like.
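Schematically, the traits attach at the op definition; this is a simplified sketch and CmpIOp's full trait list may differ:
```c++
class CmpIOp : public Op<CmpIOp, OpTrait::SameOperandsAndResultShape,
                         OpTrait::ResultsAreBoolLike> {
  // ... the trait-supplied verifiers now check that the result is bool-like
  // and that operand/result shapes agree ...
};
```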
PiperOrigin-RevId:
223208438
Jacques Pienaar [Wed, 28 Nov 2018 17:21:42 +0000 (09:21 -0800)]
Add derived attribute support.
Derived attributes are attributes that are derived from other properties of the operation (e.g., the shape returned from the type). DerivedAttr is parameterized on the return type and function body.
PiperOrigin-RevId:
223180315
Alex Zinenko [Wed, 28 Nov 2018 15:08:55 +0000 (07:08 -0800)]
StandardOps: introduce 'select'.
The semantics of 'select' is conventional: return the second operand if the
first operand is true (1 : i1) and the third operand otherwise. It is
applicable to vectors and tensors element-wise, similarly to LLVM instruction.
This operation is necessary to implement min/max to lower 'for' loops with
complex bounds to CFG functions and to support ternary operations in ML
functions. It is preferred to first-class min/max because of its simplicity,
e.g. it is not concerned with signedness.
PiperOrigin-RevId:
223160860
Alex Zinenko [Wed, 28 Nov 2018 15:07:56 +0000 (07:07 -0800)]
LLVM IR lowering: support 'dim' operation.
Add support for translating the 'dim' operation on MemRefs to LLVM IR. For a
static size, this operation merely defines an LLVM IR constant value that may
not appear in the output IR if not used (and had not been removed before by
DCE). For a dynamic size, this operation is translated into an access to the
MemRef descriptor that contains the dynamic size.
PiperOrigin-RevId:
223160774
Alex Zinenko [Wed, 28 Nov 2018 10:32:10 +0000 (02:32 -0800)]
LLVM IR lowering: support simple MemRef types
Introduce initial support for MemRef types, including type conversion,
allocation and deallocation, read and write element-wise access, passing
MemRefs to and returning from functions. Affine map compositions and
non-default memory spaces are NOT YET supported.
Lowered code needs to handle potentially dynamic sizes of the MemRef. To do
so, it replaces a MemRef-typed value with a special MemRef descriptor that
carries the data and the dynamic sizes together. A MemRef type is converted to
LLVM's first-class structure type with the first element being the pointer to
the data buffer with data laid out linearly, followed by as many integer-typed
elements as MemRef has dynamic sizes. The type of these elements is that of
MLIR index lowered to LLVM. For example, `memref<?x42x?xf32>` is converted to
`{ f32*, i64, i64 }` provided `index` is lowered to `i64`. While it is
possible to convert MemRefs with fully static sizes to simple pointers to their
elemental types, we opted for consistency and convert them to the
single-element structure. This makes the conversion code simpler and the
calling convention of the generated LLVM IR functions consistent.
Loads from and stores to a MemRef element are lowered to a sequence of LLVM
instructions that, first, computes the linearized index of the element in the
data buffer using the access indices and combining the static sizes with the
dynamic sizes stored in the descriptor, and then loads from or stores to the
buffer element indexed by the linearized subscript. While some of the index
computations may be redundant (i.e., consecutive load and store to the same
location in the same scope could reuse the linearized index), we emit them for
every operation. A subsequent optimization pass may eliminate them if
necessary.
MemRef allocation and deallocation is performed using external functions
`__mlir_alloc(index) -> i8*` and `__mlir_free(i8*)` that must be implemented by
the caller. These functions behave similarly to `malloc` and `free`, but can
be extended to support different memory spaces in future. Allocation and
deallocation instructions take care of casting the pointers. Prior to calling
the allocation function, the emitted code creates an SSA Value for the
descriptor and uses it to store the dynamic sizes of the MemRef passed to the
allocation operation. It further emits instructions that compute the dynamic
amount of memory to allocate in bytes. Finally, the allocation stores the
result of calling the `__mlir_alloc` in the MemRef descriptor. Deallocation
extracts the pointer to the allocated memory from the descriptor and calls
`__mlir_free` on it. The descriptor itself is not modified and, being
stack-allocated, ceases to exist when it goes out of scope.
MLIR functions that access MemRef values as arguments or return them are
converted to LLVM IR functions that accept MemRef descriptors as LLVM IR
structure types by value. This significantly simplifies the calling convention
at the LLVM IR level and avoids handling descriptors in the dynamic memory,
however it is not always compatible with LLVM IR functions emitted from C code
with similar signatures. A separate LLVM pass may be introduced in the future
to provide C-compatible calling conventions for LLVM IR functions generated
from MLIR.
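A plain-C++ sketch of the linearized subscript for the running example memref<?x42x?xf32>: with dynamic sizes s0 and s2 stored in the descriptor, element (i, j, k) sits at offset (i * 42 + j) * s2 + k, and the outermost size s0 never enters the offset computation.
```c++
#include <cstdint>

// Row-major linearization for memref<?x42x?xf32>; only the inner dynamic
// size s2 participates in the offset, mirroring the description above.
int64_t linearizedIndex(int64_t i, int64_t j, int64_t k, int64_t s2) {
  return (i * 42 + j) * s2 + k;
}
```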
PiperOrigin-RevId:
223134883
River Riddle [Wed, 28 Nov 2018 04:32:31 +0000 (20:32 -0800)]
Make operation names hashable.
PiperOrigin-RevId:
223104253