review.tizen.org Git - platform/upstream/llvm.git/log

[MLIR] More graceful failure in MaterializeVectors

Even though it is unexpected except in pathological cases, a nullptr clone may
be returned. This CL handles the nullptr return gracefuly.

PiperOrigin-RevId: 227764615

[MLIR] Drop strict super-vector requirement in MaterializeVector

The strict requirement (i.e. at least 2 HW vectors in a super-vector) was a
premature optimization to avoid interfering with other vector code potentially
introduced via other means.

This CL avoids this premature optimization and the spurious errors it causes
when super-vector size == HW vector size (which is a possible corner case).

This may be revisited in the future.

PiperOrigin-RevId: 227763966

[MLIR] Fix uninitialized value found with msan

The omission of an early exit created opportunities for unitialized memory
reads. This CL fixes the issue.

PiperOrigin-RevId: 227761814

[MLIR] Handle corner case in MaterializeVectors

This corner was found when stress testing with a functional end-to-end CPU
path. In the case where the hardware vector size is 1x...x1 the `keep` vector
is empty and would result a crash.

While there is no reason to expect a 1x...x1 HW vector in practice, this case
can just gracefully degrade to scalar, which is what this CL allows.

PiperOrigin-RevId: 227761097

Split the standard types from builtin types and move them into separate source files(StandardTypes.cpp/h). After this cl only FunctionType and IndexType are builtin types, but IndexType will likely become a standard type when the ml/cfgfunc merger is done. Mechanical NFC.

PiperOrigin-RevId: 227750918

Include both TF and TFL ops.td in legalize patterns.

Need to do some ifdef jumps with TableGen to avoid errors due to including the base multiple times. The way TableGen flags repeated includes is by way of checking the include directive this necessitates that the guards are on the includes as well as around the classes/defines.

PiperOrigin-RevId: 227692030

Match the op via isa instead of string compare.

* Match using isa
- This limits the rewrite pattern to ops defined in op registry but that is probably better end state (esp. for additional verification).

PiperOrigin-RevId: 227598946

Add tf.Add op

Add ops.td for TF dialect and start by adding tf.Add (op used in legalizer pattern). This CL does not add TensorOp trait (that's not part of OpTrait namespace).

Remove OpCode emission that is not currently used and can be added back later if needed.

PiperOrigin-RevId: 227590973

Implement initial support for dialect specific types.

Dialect specific types are registered similarly to operations, i.e. registerType<...> within the dialect. Unlike operations, there is no notion of a "verbose" type, that is *all* types must be registered to a dialect. Casting support(isa/dyn_cast/etc.) is implemented by reserving a range of type kinds in the top level Type class as opposed to string comparison like operations.

To support derived types a few hooks need to be implemented:

In the concrete type class:
    - static char typeID;
      * A unique identifier for the type used during registration.

In the Dialect:
    - typeParseHook and typePrintHook must be implemented to provide parser support.

The syntax for dialect extended types is as follows:
dialect-type:  '!' dialect-namespace '<' '"' type-specific-data '"' '>'

The 'type-specific-data' is information used to identify different types within the dialect, e.g:
- !tf<"variant"> // Tensor Flow Variant Type
- !tf<"string">  // Tensor Flow String Type

TensorFlow/TensorFlowControl types are now implemented as dialect specific types as a proof
of concept.

PiperOrigin-RevId: 227580052

Merge LowerAffineApplyPass into LowerIfAndForPass, rename to LowerAffinePass

This change is mechanical and merges the LowerAffineApplyPass and
LowerIfAndForPass into a single LowerAffinePass.  It makes a step towards
defining an "affine dialect" that would contain all polyhedral-related
constructs.  The motivation for merging these two passes is based on retiring
MLFunctions and, eventually, transforming If and For statements into regular
operations.  After that happens, LowerAffinePass becomes yet another
legalization.

PiperOrigin-RevId: 227566113

Add builderCall to Type and add constant attr class.

With the builder to construct the type on the Type, the appropriate mlir::Type can be constructed where needed. Also add a constant attr class that has the attribute and value as members.

PiperOrigin-RevId: 227564789

LowerForAndIf: expand affine_apply's inplace

Existing implementation was created before ML/CFG unification refactoring and
did not concern itself with further lowering to separate concerns.  As a
result, it emitted `affine_apply` instructions to implement `for` loop bounds
and `if` conditions and required a follow-up function pass to lower those
`affine_apply` to arithmetic primitives.  In the unified function world,
LowerForAndIf is mostly a lowering pass with low complexity.  As we move
towards a dialect for affine operations (including `for` and `if`), it makes
sense to lower `for` and `if` conditions directly to arithmetic primitives
instead of relying on `affine_apply`.

Expose `expandAffineExpr` function in LoweringUtils.  Use this function
together with `expandAffineMaps` to emit primitives that implement loop and
branch conditions directly.

Also remove tests that become unnecessary after transforming LowerForAndIf into
a function pass.

PiperOrigin-RevId: 227563608

Refactor LowerAffineApply

In LoweringUtils, extract out `expandAffineMap`.  This function takes an affine
map and a list of values the map should be applied to and emits a sequence of
arithmetic instructions that implement the affine map.  It is independent of
the AffineApplyOp and can be used in places where we need to insert an
evaluation of an affine map without relying on a (temporary) `affine_apply`
instruction.  This prepares for a merge between LowerAffineApply and
LowerForAndIf passes.

Move the `expandAffineApply` function to the LowerAffineApply pass since it is
the only place that must be aware of the `affine_apply` instructions.

PiperOrigin-RevId: 227563439

Update the g3docs to reflect the merging of CFG and ML functions.

PiperOrigin-RevId: 227562943

Eliminate extfunc/cfgfunc/mlfunc as a concept, and just use 'func' instead.
The entire compiler now looks at structural properties of the function (e.g.
does it have one block, does it contain an if/for stmt, etc) so the only thing
holding up this difference is round tripping through the parser/printer syntax.
Removing this shrinks the compile by ~140LOC.

This is step 31/n towards merging instructions and statements. The last step
is updating the docs, which I will do as a separate patch in order to split it
from this mostly mechanical patch.

PiperOrigin-RevId: 227540453

Rename OperationPrefix to Namespace in Dialect. This is important as dialects will soon be able to define more than just operations.

Moving forward dialect namespaces cannot contain '.' characters.

This cl also standardizes that operation names must begin with the dialect namespace followed by a '.'.

PiperOrigin-RevId: 227532193

LLVM IR Lowering: support "select"

This commit adds support for the "select" operation that lowers directly into
its LLVM IR counterpart. A simple test is included.

PiperOrigin-RevId: 227527893

Simplify FunctionPass to only have a runOnFunction hook, instead of having a
runOnCFG/MLFunction override locations. Passes that care can handle this
filtering if they choose. Also, eliminate one needless difference between
CFG/ML functions in the parser.

This is step 30/n towards merging instructions and statements.

PiperOrigin-RevId: 227515912

Indent auto-generated build method

TESTED manually

PiperOrigin-RevId: 227495854

[MLIR] Sketch a simple set of EDSCs to declaratively write MLIR

This CL introduces a simple set of Embedded Domain-Specific Components (EDSCs)
in MLIR components:
1. a `Type` system of shell classes that closely matches the MLIR type system. These
types are subdivided into `Bindable` leaf expressions and non-bindable `Expr`
expressions;
2. an `MLIREmitter` class whose purpose is to:
  a. maintain a map of `Bindable` leaf expressions to concrete SSAValue*;
  b. provide helper functionality to specify bindings of `Bindable` classes to
     SSAValue* while verifying comformable types;
  c. traverse the `Expr` and emit the MLIR.

This is used on a concrete example to implement MemRef load/store with clipping in the
LowerVectorTransfer pass. More specifically, the following pseudo-C++ code:
```c++
MLFuncBuilder *b = ...;
Location location = ...;
Bindable zero, one, expr, size;
// EDSL expression
auto access = select(expr < zero, zero, select(expr < size, expr, size - one));
auto ssaValue = MLIREmitter(b)
    .bind(zero, ...)
    .bind(one, ...)
    .bind(expr, ...)
    .bind(size, ...)
    .emit(location, access);
```
is used to emit all the MLIR for a clipped MemRef access.

This simple EDSL can easily be extended to more powerful patterns and should
serve as the counterpart to pattern matchers (and could potentially be unified
once we get enough experience).

In the future, most of this code should be TableGen'd but for now it has
concrete valuable uses: make MLIR programmable in a declarative fashion.

This CL also adds Stmt, proper supporting free functions and rewrites
VectorTransferLowering fully using EDSCs.

The code for creating the EDSCs emitting a VectorTransferReadOp as loops
with clipped loads is:

```c++
  Stmt block = Block({
    tmpAlloc = alloc(tmpMemRefType),
    vectorView = vector_type_cast(tmpAlloc, vectorMemRefType),
    ForNest(ivs, lbs, ubs, steps, {
      scalarValue = load(scalarMemRef, accessInfo.clippedScalarAccessExprs),
      store(scalarValue, tmpAlloc, accessInfo.tmpAccessExprs),
    }),
    vectorValue = load(vectorView, zero),
    tmpDealloc = dealloc(tmpAlloc.getLHS())});
  emitter.emitStmt(block);
```

where `accessInfo.clippedScalarAccessExprs)` is created with:

```c++
select(i + ii < zero, zero, select(i + ii < N, i + ii, N - one));
```

The generated MLIR resembles:

```mlir
    %1 = dim %0, 0 : memref<?x?x?x?xf32>
    %2 = dim %0, 1 : memref<?x?x?x?xf32>
    %3 = dim %0, 2 : memref<?x?x?x?xf32>
    %4 = dim %0, 3 : memref<?x?x?x?xf32>
    %5 = alloc() : memref<5x4x3xf32>
    %6 = vector_type_cast %5 : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
    for %i4 = 0 to 3 {
      for %i5 = 0 to 4 {
        for %i6 = 0 to 5 {
          %7 = affine_apply #map0(%i0, %i4)
          %8 = cmpi "slt", %7, %c0 : index
          %9 = affine_apply #map0(%i0, %i4)
          %10 = cmpi "slt", %9, %1 : index
          %11 = affine_apply #map0(%i0, %i4)
          %12 = affine_apply #map1(%1, %c1)
          %13 = select %10, %11, %12 : index
          %14 = select %8, %c0, %13 : index
          %15 = affine_apply #map0(%i3, %i6)
          %16 = cmpi "slt", %15, %c0 : index
          %17 = affine_apply #map0(%i3, %i6)
          %18 = cmpi "slt", %17, %4 : index
          %19 = affine_apply #map0(%i3, %i6)
          %20 = affine_apply #map1(%4, %c1)
          %21 = select %18, %19, %20 : index
          %22 = select %16, %c0, %21 : index
          %23 = load %0[%14, %i1, %i2, %22] : memref<?x?x?x?xf32>
          store %23, %5[%i6, %i5, %i4] : memref<5x4x3xf32>
        }
      }
    }
    %24 = load %6[%c0] : memref<1xvector<5x4x3xf32>>
    dealloc %5 : memref<5x4x3xf32>
```

In particular notice that only 3 out of the 4-d accesses are clipped: this
corresponds indeed to the number of dimensions in the super-vector.

This CL also addresses the cleanups resulting from the review of the prevous
CL and performs some refactoring to simplify the abstraction.

PiperOrigin-RevId: 227367414

Merge together the CFG/ML function paths in the CSE pass. I did a first pass
on this to merge together the classes, but there may be other simplification
possible. I'll leave that to riverriddle@ as future work.

This is step 29/n towards merging instructions and statements.

PiperOrigin-RevId: 227328680

Update and generalize various passes to work on both CFG and ML functions,
simplifying them in minor ways. The only significant cleanup here
is the constant folding pass. All the other changes are simple and easy,
but this is still enough to shrink the compiler by 45LOC.

The one pass left to merge is the CSE pass, which will be move involved, so I'm
splitting it out to its own patch (which I'll tackle right after this).

This is step 28/n towards merging instructions and statements.

PiperOrigin-RevId: 227328115

Simplify the remapFunctionAttrs logic, merging CFG/ML function handling.
Remove an unnecessary restriction in forward substitution. Slightly
simplify LLVM IR lowering, which previously would crash if given an ML
function, it should now produce a clean error if given a function with an
if/for instruction in it, just like it does any other unsupported op.

This is step 27/n towards merging instructions and statements.

PiperOrigin-RevId: 227324542

Simplify GreedyPatternRewriteDriver now that functions are merged into one
representation, shrinking by 70LOC. The PatternRewriter class can probably
also be simplified as well, but one step at a time.

This is step 26/n towards merging instructions and statements. NFC.

PiperOrigin-RevId: 227324218

Drop unusued HyperRectangularSet.h/.cpp, given the new design being worked on.

- drop these ununsed/incomplete sketches given the new design
@albertcohen is working on, and given that FlatAffineConstraints is now
stable and fast enough for all the analyses/transforms that depend on it.

PiperOrigin-RevId: 227322739

Introduce PostDominanceInfo, fix properlyDominates() for Instructions

- introduce PostDominanceInfo in the right/complete way and use that for post
  dominance check in store-load forwarding
- replace all uses of Analysis/Utils::dominates/properlyDominates with
  DominanceInfo::dominates/properlyDominates
- drop all redundant copies of dominance methods in Analysis/Utils/
- in pipeline-data-transfer, replace dominates call with a much less expensive
  check; similarly, substitute dominates() in checkMemRefAccessDependence with
  a simpler check suitable for that context
- fix a bug in properlyDominates
- improve doc for 'for' instruction 'body'

PiperOrigin-RevId: 227320507

Fix dominates() for block's.

- dominates() for blocks was assuming that there was only a single block at the
top level whenever there was a hierarchy of blocks (as in the case of 'for'/'if'
instructions).
- fix the comments as well

PiperOrigin-RevId: 227319738

Greatly simplify the ConvertToCFG pass, converting it from a module pass to a
function pass, and eliminating the need to copy over code and do
interprocedural updates.  While here, also improve it to make fewer empty
blocks, and rename it to "LowerIfAndFor" since that is what it does.  This is
a net reduction of ~170 lines of code.

As drive-bys, change the splitBlock method to *not* insert an unconditional
branch, since that behavior is annoying for all clients.  Also improve the
AsmPrinter to not crash when a block is referenced that isn't linked into a
function.

PiperOrigin-RevId: 227308856

Fix ASAN failure in memref-dataflow-opt

- memrefsToErase had duplicates inserted into it; switch to SmallPtrSet.

PiperOrigin-RevId: 227299306

Make PrintOpStatsPass a module pass

PrintOpStatsPass is maintaining state (op stats ) across functions and doing
per-module work - it should be a module pass.

PiperOrigin-RevId: 227294151

Introduce memref store to load forwarding - a simple memref dataflow analysis

- the load/store forwarding relies on memref dependence routines as well as
  SSA/dominance to identify the memref store instance uniquely supplying a value
  to a memref load, and replaces the result of that load with the value being
  stored. The memref is also deleted when possible if only stores remain.

- add methods for post dominance for MLFunction blocks.

- remove duplicated getLoopDepth/getNestingDepth - move getNestingDepth,
  getMemRefAccess, getNumCommonSurroundingLoops into Analysis/Utils (were
  earlier static)

- add a helper method in FlatAffineConstraints - isRangeOneToOne.

PiperOrigin-RevId: 227252907

Fix b/122139732; update FlatAffineConstraints::isEmpty() to eliminate IDs in a
better order.

- update isEmpty() to eliminate IDs in a better order. Speed improvement for
complex cases (for eg. high-d reshape's involving mod's/div's).
- minor efficiency update to projectOut (was earlier making an extra albeit
benign call to gaussianEliminateIds) (NFC).
- move getBestIdToEliminate further up in the file (NFC).
- add the failing test case.
- add debug info to checkMemRefAccessDependence.

PiperOrigin-RevId: 227244634

Extend InstVisitor and Walker to handle arbitrary CFG functions, expand the
Function::walk functionality into f->walkInsts/Ops which allows visiting all
instructions, not just ops. Eliminate Function::getBody() and
Function::getReturn() helpers which crash in CFG functions, and were only kept
around as a bridge.

This is step 25/n towards merging instructions and statements.

PiperOrigin-RevId: 227243966

Switch rewriters for relu, relu6, placeholder_input, softmax to patterns.

Add a constant F32 attribute for use with softmax legalization.

PiperOrigin-RevId: 227241643

Have the asmprinter take advantage of the new capabilities of the asmparser, by
printing the entry block in a CFG function's argument line. Since I'm touching
all of the testcases anyway, change the argument list from printing as
"%arg : type" to "%arg: type" which is more consistent with bb arguments.

In addition to being more consistent, this is a much nicer look for cfg functions.

PiperOrigin-RevId: 227240069

Clean up and improve the parser handling of basic block labels, now that we
have a designator. This improves diagnostics and merges handling between CFG
and ML functions more. This also eliminates hard coded parser knowledge of
terminator keywords, allowing dialects to define their own terminators.

PiperOrigin-RevId: 227239398

Introduce ^ as a basic block sigil, eliminating an ambiguity on the MLIR
syntax.

PiperOrigin-RevId: 227234174

Merge the verifier logic for all functions into a unified framework, this
requires enhancing DominanceInfo to handle the structure of an ML function,
which is required anyway. Along the way, this also fixes a const correctness
problem with Instruction::getBlock().

This is step 24/n towards merging instructions and statements.

PiperOrigin-RevId: 227228900

Enhance parsing of CFG and Ext functions to optionally allow named arguments in
the function signature, giving them common functionality to ml functions.  This
is a strictly additive patch that adds new capability without changing behavior
in a significant way (other than a few diagnostic cleanups).  A subsequent
patch will change the printer to use this behavior, which will require updating
a ton of testcases.  :)

This exposes the fact that we need to make a grammar change for block
arguments, as is tracked by b/122119779

This is step 23/n towards merging instructions and statements, and one of the
first steps towards eliminating the "cfg vs ml" distinction at a syntax and
semantic level.

PiperOrigin-RevId: 227228342

Match multiple pattern nodes as input to rewrite.

* Allow multi input node patterns in the rewrite;
* Use number of nodes matched as benefit;
* Rewrite relu(add(...)) matching using the new pattern;

To allow for undefined ops, do string compare - will address soon!

PiperOrigin-RevId: 227225425

Tidy up references to "basic blocks" that should refer to blocks now. NFC.

PiperOrigin-RevId: 227196077

Merge parser logic for CFG and ML functions, shrinking the code
by ~80 lines. This causes a slight change to diagnostics, but
is otherwise behavior preserving.

This is step 22/n towards merging instructions and statements, MFC.

PiperOrigin-RevId: 227187857

Standardize naming of statements -> instructions, revisting the code base to be
consistent and moving the using declarations over. Hopefully this is the last
truly massive patch in this refactoring.

This is step 21/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227178245

Extend/complete dependence tester to utilize local var info.

- extend/complete dependence tester to utilize local var info while adding
  access function equality constraints; one more step closer to get slicing
  based fusion working in the general case of affine_apply's involving mod's/div's.
- update test case to reflect more accurate dependence information; remove
  inaccurate comment on test case mod_deps.
- fix a minor "bug" in equality addition in addMemRefAccessConstraints (doesn't
  affect correctness, but the fixed version is more intuitive).
- some more surrounding code clean up
- move simplifyAffineExpr out of anonymous AffineExprFlattener class - the
  latter has state, and the former should reside outside.

PiperOrigin-RevId: 227175600

Rename BasicBlock and StmtBlock to Block, and make a pass cleaning it up. I did not make an effort to rename all of the 'bb' names in the codebase, since they are still correct and any specific missed once can be fixed up on demand.

The last major renaming is Statement -> Instruction, which is why Statement and
Stmt still appears in various places.

This is step 19/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227163082

Add convenience wrapper for operator in tblgen

Add convenience wrapper to make it easier to iterate over attributes and operands of operator defined in TableGen file. Use this class in RewriterGen (not used in the op generator yet, will do shortly). Change the RewriterGen to pass the bound arguments explicitly, this is in preparation for multi-op matching.

PiperOrigin-RevId: 227156748

Merge ext/cfg/ml function printing logic in the AsmPrinter (shrinking it
by about 100 LOC), without changing any existing behavior.

This is step 20/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227155000

Eliminate the using decls for MLFunction and CFGFunction standardizing on
Function.

This is step 18/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227139399

Fix incorrect names due to merging of tblgen tools.

PiperOrigin-RevId: 227131485

Rename BBArgument -> BlockArgument, Op::getOperation -> Op::getInst(),
StmtResult -> InstResult, StmtOperand -> InstOperand, and remove the old names.

This is step 17/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227121537

Merge Operation into OperationInst and standardize nomenclature around
OperationInst. This is a big mechanical patch.

This is step 16/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227093712

Update vim syntax file to highlight core ops

PiperOrigin-RevId: 227082502

Rework inherentance hierarchy: Operation now derives from Statement, and
OperationInst derives from it. This allows eliminating some forwarding
functions, other complex code handling multiple paths, and the 'isStatement'
bit tracked by Operation.

This is the last patch I think I can make before the big mechanical change
merging Operation into OperationInst, coming next.

This is step 15/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227077411

add a method to get FloatAttr value as double

Sometimes we have to get the raw value of the FloatAttr to invoke APIs from
non-MLIR libraries (i.e. in the tpu_ops.inc and convert_tensor.cc files). Using
`FloatAttr::getValue().convertToFloat()` and
`FloatAttr::getValue().convertToDouble()` is not safe because interally they
checke the semantics of the APFloat in the attribute, and the semantics is not
always specified (the default value is f64 then convertToFloat will fail) or
inferred incorrectly (for example, using 1.0 instead of 1.f for IEEEFloat).
Calling these convert methods without knowing the semantics can usually crash
the compiler.

This new method converts the value of a FloatAttr to double even if it loses
precision. Currently this method can be used to read in f32 data from arrays.

PiperOrigin-RevId: 227076616

Fix an ASAN detected bug introduced by cr/227067644. While MLFunctions always
have one block, CFG Functions don't necessarily have one (e.g. when they are
being first constructed by the parser).

PiperOrigin-RevId: 227075636

Delicately re-layer Operation, Statement, and OperationStmt, reworking
#includes so Statements.h includes Operation.h but nothing else does. This is
in preparation to eliminate the Operation class and the complexity it brings
with it. I split this patch off because it is just moving stuff around, the
next patch will be more complex.

This is step 14/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227071777

Minor renamings: Trim the "Stmt" prefix off
StmtSuccessorIterator/StmtSuccessorIterator, and rename and move the
CFGFunctionViewGraph pass to ViewFunctionGraph.

This is step 13/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227069438

Fix affine expr flattener bug introduced by cl/225452174.

- inconsistent local var constraint size when repeatedly using the same
flattener for all expressions in a map.

PiperOrigin-RevId: 227067836

Merge CFGFuncBuilder/MLFuncBuilder/FuncBuilder together into a single new
FuncBuilder class. Also rename SSAValue.cpp to Value.cpp

This is step 12/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227067644

Merge SSAValue, CFGValue, and MLValue together into a single Value class, which
is the new base of the SSA value hierarchy. This CL also standardizes all the
nomenclature and comments to use 'Value' where appropriate. This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing.

This is step 11/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227064624

Eliminate the Instruction, BasicBlock, CFGFunction, MLFunction, and ExtFunction classes, using the Statement/StmtBlock hierarchy and Function instead.

This *only* changes the internal data structures, it does not affect the user visible syntax or structure of MLIR code. Function gets new "isCFG()" sorts of predicates as a transitional measure.

This patch is gross in a number of ways, largely in an effort to reduce the amount of mechanical churn in one go. It introduces a bunch of using decls to keep the old names alive for now, and a bunch of stuff needs to be renamed.

This is step 10/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227044402

LoopAnalysis: isContiguousAccess fail gracefully

Existing implementation of isContiguousAccess asserts that one of the
function arguments is within certain range, depending on another parameter.
However, the value of this argument may come from outside, in particular in the
loop vectorization pass it may come from command line arguments.  This leads
to 'mlir-opt' crashing on an assertion depending on flags.  Handle the error
gracefully by reporting error returning a negative result instead.  This
negative result prevents any further transformation by the vectorizer so the IR
remains valid.

PiperOrigin-RevId: 227029496

Move print op stats pass to analysis.

Move PrintOpStatsPass out of tools and to other passes (moved to Analysis as it
doesn't modify the program but it is different than the other analysis passes
as it is only consumer at present is the user).

PiperOrigin-RevId: 227018996

Merge mlir-op-gen and mlir-rewriter-gen into mlir-tblgen.

Unify the two tools before factoring out helper classes.

PiperOrigin-RevId: 227015406

Rename findFunction from the ML side of the house to be named getFunction(),
making it more similar to the CFG side of things. It is true that in a deeply
nested case that this is not a guaranteed O(1) time operation, and that 'get'
could lead compiler hackers to think this is cheap, but we need to merge these
and we can look into solutions for this in the future if it becomes a problem
in practice.

This is step 9/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 226983931

Inline Instruction's operands as TrailingObjects

For performance/memory saving purpose, having the Instruction holding a
std::vector for the operands isn't a really good tradeoff. The only reason for
this was to support adding/removing easily BasicBlock arguments to Terminator.
Since this isn't the most common operation, we instead force a pre-allocated
list of operands on Instructions at creation time.
PiperOrigin-RevId: 226981227

Rename CFGFunctionGraphTraits.h -> FunctionGraphTraits.h and add
graph specializations for doing CFG traversals of ML Functions, making the two
sorts of functions have the same capabilities.

This is step 8/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 226968502

Eliminate the MLFuncArgument class representing arguments to MLFunctions: use the
BlockArgument arguments of the entry block instead. This makes MLFunctions and
CFGFunctions work more similarly.

This is step 7/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 226966975

Introduce a new StmtBlockList type to hold a list of StmtBlocks. Use it in
MLFunction, IfStmt, ForStmt even though they currently only contain exactly one
block in that list.

This is step 6/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 226960278

Support NameLoc and CallSiteLoc for mlir::Location

The NameLoc can be used to represent a variable, node or method. The
CallSiteLoc has two fields, one represents the concrete location and another
one represents the caller's location. Multiple CallSiteLocs can be chained as
a call stack.

For example, the following call stack
```
AAA
at file1:1
at file2:135
at file3:34
```

can be formed by call0:

```
auto name = NameLoc::get("AAA");
auto file1 = FileLineColLoc::get("file1", 1);
auto file2 = FileLineColLoc::get("file2", 135);
auto file3 = FileLineColLoc::get("file3", 34);
auto call2 = CallSiteLoc::get(file2, file3);
auto call1 = CallSiteLoc::get(file1, call2);
auto call0 = CallSiteLoc::get(name, call1);
```

PiperOrigin-RevId: 226941797

SuperVectorization: fix 'isa' assertion

Supervectorization uses null pointers to SSA values as a means of communicating
the failure to vectorize.  In operation vectorization, all operations producing
the values of operation arguments must be vectorized for the given operation to
be vectorized.  The existing check verified if any of the value "def"
statements was vectorized instead, sometimes leading to assertions inside `isa`
called on a null pointer.  Fix this to check that all "def" statements were
vectorized.

PiperOrigin-RevId: 226941552

LLVM IR lowering: support SubIOp and SubFOp

The binary subtraction operations were not supported by the lowering because
they were not essential for the testing flow. Add support for these
operations.

PiperOrigin-RevId: 226941463

Rename convenience methods to make type explicit.

PiperOrigin-RevId: 226939383

Refactor MLFunction to contain a StmtBlock for its body instead of inheriting
from it. This is necessary progress to squaring away the parent relationship
that a StmtBlock has with its enclosing if/for/fn, and makes room for functions
to have more than one block in the future. This also removes IfClause and ForStmtBody.

This is step 5/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 226936541

Eliminate the ability to add operands to an instruction, used in a narrow case
for SSA values in terminators, but easily worked around. At the same time,
move the StmtOperand list in a OperationStmt to the end of its trailing
objects list so we can *reduce* the number of operands, without affecting
offsets to the other stuff in the allocation.

This is important because we want OperationStmts to be consequtive, including
their operands - we don't want to use an std::vector of operands like
Instructions have.

This is patch 4/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 226865727

Implement StmtBlocks support for arguments and pred/succ iteration. This isn't
tested yet, but will when stuff starts switching over to it. This is part 3/n of merging CFGFunctions and MLFunctions.

PiperOrigin-RevId: 226794787

Per review on the previous CL, drop MLFuncBuilder::createOperation, changing
clients to use OperationState instead.  This makes MLFuncBuilder more similiar
to CFGFuncBuilder.  This whole area will get tidied up more when cfg and ml
worlds get unified.  This patch is just gardening, NFC.

PiperOrigin-RevId: 226701959

Give StmtBlocks a use-def list, and give OperationStmt's the ability to have
optional successor operands when they are terminator operations.

This isn't used yet, but is part 2/n towards merging BasicBlock into StmtBlock
and Instruction into OperationStmt.

PiperOrigin-RevId: 226684636

Refactor ForStmt: having it contain a StmtBlock instead of subclassing
StmtBlock. This is more consistent with IfStmt and also conceptually makes
more sense - a forstmt "isn't" its body, it contains its body.

This is step 1/N towards merging BasicBlock and StmtBlock. This is required
because in the new regime StmtBlock will have a use list (just like BasicBlock
does) of operands, and ForStmt already has a use list for its induction
variable.

This is a mechanical patch, NFC.

PiperOrigin-RevId: 226684158

Computation slice update: adds parameters to insertBackwardComputationSlice which specify the source loop nest depth at which to perform iteration space slicing, and the destination loop nest depth at which to insert the compution slice.
Updates LoopFusion pass to take these parameters as command line flags for experimentation.

PiperOrigin-RevId: 226514297

Unify type uniquing and construction.

This allows for us to decouple type uniquing/construction from MLIRContext and pave the way for dialect specific types.

To accomplish this we two new classes, TypeUniquer and TypeStorageAllocator.

* TypeUniquer is now responsible for all construction and uniquing of types.
* TypeStorageAllocator is a utility used by derived type storage objects to allocate memory within an MLIRContext.

This cl also standardizes what a derived type storage class needs to provide:
    - Define a type alias, KeyTy, to a type that uniquely identifies the
      instance of the type within its kind.
      * The key type must be constructible from the values passed into the
        detail::TypeUniquer::get call after the type kind.
      * The key type must have a llvm::DenseMapInfo specialization for
        hashing.

    - Provide a method, 'KeyTy getKey() const', to construct the key type
      from an existing storage instance.

    - Provide a construction method:
        'DerivedStorage *construct(TypeStorageAllocator &, ...)'
      that builds a unique instance of the derived storage. The arguments
      after the TypeStorageAllocator must correspond with the values passed
      into the detail::TypeUniquer::get call after the type kind.

PiperOrigin-RevId: 226507184

Expand rewriter gen to handle string attributes in output.

* Extend to handle rewrite patterns with output attributes;
- Constant attributes are defined with a value and a type;
- The type of the value is mapped to the corresponding attribute type (string -> StringAttr);
* Verifies the type of operands in the resultant matches the defined op's operands;

PiperOrigin-RevId: 226468908

Add method to retrieve a pass's ID.

Add passID member to Pass and enable querying it.

PiperOrigin-RevId: 226445431

Do proper indexing for local variables when building access function equality constraints (working on test cases).

PiperOrigin-RevId: 226399089

Pass loop depth 1 to memref dependence check when constructing dependence constraints used to calculate computation slice for loop fusion.
This done so that the dominance check between ancestors of op statements from src/dst memref accesses will be run.

PiperOrigin-RevId: 226350443

Change attribute to be input argument.

Change operands to arguments in Op and use it for both operands and arguments. This unifies the way that operands and attributes are specified and the intended way that matching/creating ops with attributes will look. Both can now be represented using the same dag structure (and also makes the ordering more explicit). Derived attributes are not considered as part of the arguments (as they are inferred from the created op, not something needed to created it).
* Generate named operand accessors;
* Simplified the way of specifying Attr and use ElementAttr for TFL_Const instead.
* Fix a incorrect assertion generated;

The input parsing can be made more robust, I'll address that in a follow up.

PiperOrigin-RevId: 226307424

Address some issues from memref dependence check bug (b/121216762), adds tests cases.

PiperOrigin-RevId: 226277453

Improve loop fusion algorithm by using a memref dependence graph.
Fixed TODO for reduction fusion unit test.

PiperOrigin-RevId: 226277226

Simplify memref-dependence-check's meta data structures / drop duplication and
reuse existing ones.

- drop IterationDomainContext, redundant since FlatAffineConstraints has
MLValue information associated with its dimensions.
- refactor to use existing support
- leads to a reduction in LOC
- as a result of these changes, non-constant loop bounds get naturally
supported for dep analysis.
- update test cases to include a couple with non-constant loop bounds
- rename addBoundsFromForStmt -> addForStmtDomain
- complete TODO for getLoopIVs (handle 'if' statements)

PiperOrigin-RevId: 226082008

Update / complete a TODO for addBoundsForForStmt

- when adding constraints from a 'for' stmt into FlatAffineConstraints,
  correctly add bound operands of the 'for' stmt as a dimensional identifier or
  a symbolic identifier depending on whether the bound operand is a valid
  MLFunction symbol
- update test case to exercise this.

PiperOrigin-RevId: 225988511

Densify storage for f16, f32 and support f16 semantics in FloatAttrs

Existing implementation always uses 64 bits to store floating point values in
DenseElementsAttr.  This was due to FloatAttrs always a `double` for storage
independently of the actual type.  Recent commits added support for FloatAttrs
with the proper f32 type and floating semantics and changed the bitwidth
reporting on FloatType.

Use the existing infrastructure for densely storing 16 and 32-bit values in
DenseElementsAttr storage to store f16 and f32 values.  Move floating semantics
definition to the FloatType level.  Properly support f16 / IEEEhalf semantics
at the FloatAttr level and in the builder.

Note that bf16 is still stored as a 64-bit value with IEEEdouble semantics
because APFloat does not have first-class support for bf16 types.

PiperOrigin-RevId: 225981289

Refactor/update memref-dep-check's addMemRefAccessConstraints and
addDomainConstraints; add support for mod/div for dependence testing.

- add support for mod/div expressions in dependence analysis
- refactor addMemRefAccessConstraints to use getFlattenedAffineExprs (instead
of getFlattenedAffineExpr); update addDomainConstraints.
- rename AffineExprFlattener::cst -> localVarCst

PiperOrigin-RevId: 225933306

Refactor LowerVectorTransfersPass using pattern rewriters

This introduces a generic lowering pass for ML functions.  The pass is
parameterized by template arguments defining individual pattern rewriters.
Concrete lowering passes define individual pattern rewriters and inherit from
the generic class that takes care of allocating rewriters, traversing ML
functions and performing the actual rewrite.

While this is similar to the greedy pattern rewriter available in
Transform/Utils, it requires adjustments due to the ML/CFG duality.  In
particular, ML function rewriters must be able to create statements, not only
operations, and need access to an MLFuncBuilder.  When we move to using the
unified function type, the ML-specific rewriting will become unnecessary.

Use LowerVectorTransfers as a testbed for the generic pass.

PiperOrigin-RevId: 225887424

LLVM IR lowering: support vector_type_cast

Introduce support for lowering vector_type_cast to LLVM IR.  It consists in
creating a new MemRef descriptor with the base pointer with the type that
corresponds to the lowered element type of the target memref.  Since
`vector_type_cast` does not support dynamic shapes in the target type, no
dynamic size conversion is necessary.

This commit goes in the opposite direction of what is expected of LLVM IR
lowering: it should not be aware of all the other dialects.  Instead, we should
have separate definitions for conversions in a global lowering framework.
However, this requires LLVM dialect to be implemented, which is currently
blocked by the absence of user-defined types.  Implement the lowering anyway to
unblock end-to-end vectorization experiments.
PiperOrigin-RevId: 225887368

Materialize vector_type_cast operation in the SuperVector dialect

This operation is produced and used by the super-vectorization passes and has
been emitted as an abstract unregistered operation until now.  For end-to-end
testing purposes, it has to be eventually lowered to LLVM IR.  Matching
abstract operation by name goes into the opposite direction of the generic
lowering approach that is expected to be used for LLVM IR lowering in the
future.  Register vector_type_cast operation as a part of the SuperVector
dialect.

Arguably, this operation is a special case of the `view` operation from the
Standard dialect.  The semantics of `view` is not fully specified at this point
so it is safer to rely on a custom operation.  Additionally, using a custom
operation may help to achieve clear dialect separation.

PiperOrigin-RevId: 225887305

Refactor / eliminate duplicate code in
memref-dep-check / getIterationDomainContext

PiperOrigin-RevId: 225857762

Type system: replace Type::getBitWidth with getIntOrFloatBitWidth

As MLIR moves towards dialect-specific types, a generic Type::getBitWidth does
not make sense for all of them.  Even with the current type system, the bit
width is not defined (and causes the method in question to abort) for all
TensorFlow types.

This commit restricts the bit width definition to primitive standard types that
have a number of bits appearing verbatim in their type, i.e., integers and
floats.  As a side effect, it delegates the decision on the bit width of the
`index` to the backends.  Existing backends currently hardcode it to 64 bits.

The Type::getBitWidth method is replaced by Type::getIntOrFloatBitWidth that
only applies to integers and floats.  The call sites are updated to use the new
method, where applicable, or rewritten so as not rely on it.  Incidentally,
this fixes a utility method that did not account for memrefs being allowed to
have vectors as element types in the size computation.

As an observation, several places in the code use Type in places where a more
specific type could be used instead.  Some of those are fixed by this commit.

PiperOrigin-RevId: 225844792

loop-unroll - add function callback argument for outside targets to
provide unroll factors, and a cmd line argument to specify number of
innermost loop unroll repetitions.

- add function callback parameter for outside targets to provide unroll factors
- add a cmd line parameter to repeatedly apply innermost loop unroll a certain
number of times (to avoid using -loop-unroll -loop-unroll ...; instead
-unroll-num-reps=2).
- implement the callback for a target
- update test cases / usage

PiperOrigin-RevId: 225843191

Loop Fusion pass update: introduce utilities to perform generalized loop fusion based on slicing; encompasses standard loop fusion.
*) Adds simple greedy fusion algorithm to drive experimentation. This algorithm greedily fuses loop nests with single-writer/single-reader memref dependences to improve locality.
*) Adds support for fusing slices of a loop nest computation: fusing one loop nest into another by adjusting the source loop nest's iteration bounds (after it is fused into the destination loop nest). This is accomplished by solving for the source loop nest's IVs in terms of the destination loop nests IVs and symbols using the dependece polyhedron, then creating AffineMaps of these functions for the loop bounds of the fused source loop.
*) Adds utility function 'insertMemRefComputationSlice' which computes and inserts computation slice from loop nest surrounding a source memref access into the loop nest surrounding the destingation memref access.
*) Adds FlatAffineConstraints::toAffineMap function which returns and AffineMap which represents an equality contraint where one dimension identifier is represented as a function of all others in the equality constraint.
*) Adds multiple fusion unit tests.

PiperOrigin-RevId: 225842944

Fix builder getFloatAttr of double to use F64 type and use fltSemantics in FloatAttr.

Store FloatAttr using more appropriate fltSemantics (mostly fixing up F32/F64 storage, F16/BF16 pending). Previously F32 type was used incorrectly for double (the storage was double). Also add query method that returns fltSemantics for IEEE fp types and use that to verify that the APfloat given matches the type:
* FloatAttr created using APFloat is verified that the semantics of the type and APFloat matches;
* FloatAttr created using double has the APFloat created to match the semantics of the type;

Change parsing of tensor negative splat element to pass in the element type expected. Misc other changes to account for the storage type matching the attribute.

PiperOrigin-RevId: 225821834