## Globally Applied Rules
-These transformation are applied to all levels of IR:
+These transformations are applied to all levels of IR:
* Elimination of operations that have no side effects and have no uses.
* **Being declarative**: The pattern creator just needs to state the rewrite
pattern declaratively, without worrying about the concrete C++ methods to
call.
-* **Removing boilerplate and showing the very essense the the rewrite**:
+* **Removing boilerplate and showing the very essence of the rewrite**:
`mlir::RewritePattern` is already good at hiding boilerplate for defining a
rewrite rule. But we still need to write the class and function structures
required by the C++ programming language, inspect ops for matching, and call
op `build()` methods for constructing. These statements are typically quite
simple and similar, so they can be further condensed with auto-generation.
Because we reduce the boilerplate to the bare minimum, the declarative
- rewrite rule will just contain the very essense of the rewrite. This makes
+ rewrite rule will just contain the very essence of the rewrite. This makes
it very easy to understand the pattern.
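+For instance, a minimal sketch (op name hypothetical) of a complete rule that
+folds two consecutive transposes:
+
+```tablegen
+// Replace transpose(transpose(x)) with x; this single line is the whole rule.
+def : Pat<(TransposeOp (TransposeOp $arg)), (replaceWithValue $arg)>;
+```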
## Strengths and Limitations
#### Binding op results
-In the result pattern, we can bind to the result(s) of an newly built op by
+In the result pattern, we can bind to the result(s) of a newly built op by
attaching symbols to the op. (But we **cannot** bind to op arguments given that
-they are referencing previously bound symbols.) This is useful for reusing
+they reference previously bound symbols.) This is useful for reusing
newly created results where suitable. For example,
transformations on the arguments by calling into C++ helper functions. This is
achieved by `NativeCodeCall`.
-For example, if we want to catpure some op's attributes and group them as an
+For example, if we want to capture some op's attributes and group them as an
array attribute to construct a new op:
```tblgen
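+// A sketch with hypothetical ops: `TwoAttrOp` carries two attributes that
+// are packed into one array attribute when building `OneAttrOp`.
+def : Pat<(TwoAttrOp $attr1, $attr2),
+          (OneAttrOp (NativeCodeCall<
+              "$_builder.getArrayAttr({$0, $1})"> $attr1, $attr2))>;
+```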
##### Customizing entire op building
-`NativeCodeCall` is not only limited to transforming arguments for building an
+`NativeCodeCall` is not limited to transforming arguments for building an
-op; it can also used to specify how to build an op entirely. An example:
+op; it can also be used to specify how to build an op entirely. An example:
If we have a C++ function for building an op:
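+For instance (function name, parameter types, and ops hypothetical):
+
+```c++
+// A helper that takes full control of creating the replacement op.
+static Operation *createMyOp(PatternRewriter &rewriter, Value *input,
+                             Attribute attr);
+```
+
+The rewrite rule can then delegate op construction to it entirely:
+
+```tablegen
+def : Pat<(SomeOp $input, $attr),
+          (NativeCodeCall<"createMyOp($_builder, $0, $1)"> $input, $attr)>;
+```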
### Supporting auxiliary ops
-A declarative rewrite rule supports multiple result patterns. One of the purpose
-is to allow generating _auxiliary ops_. Auxiliary ops are operations used for
-building the replacement ops; but they are not directly used for replacement
-themselves.
+A declarative rewrite rule supports multiple result patterns. One of the
+purposes is to allow generating _auxiliary ops_. Auxiliary ops are operations
+used for building the replacement ops; but they are not directly used for
+replacement themselves.
For the case of uni-result ops, if there are multiple result patterns, only the
value generated from the last result pattern will be used to replace the matched
Constraints can be placed on op arguments when matching. But sometimes we need
to also place constraints on the matched op's results or sometimes need to limit
-the matching with some constraints that cover both the arugments and the
+the matching with some constraints that cover both the arguments and the
results. The third parameter to `Pattern` (and `Pat`) is for this purpose.
For example, we can write
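+something like the following sketch (op and constraint names hypothetical):
+
+```tablegen
+// Match only when both inputs have the same type; the constraint list is
+// the third parameter to `Pat`.
+def HasSameType : Constraint<CPred<"$0->getType() == $1->getType()">,
+                             "values have the same type">;
+
+def : Pat<(AddOp $lhs, $rhs), (FastAddOp $lhs, $rhs),
+          [(HasSameType $lhs, $rhs)]>;
+```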
### Adjusting benefits
-The benefit of a `Pattern` is an integer value indicating the benfit of matching
+The benefit of a `Pattern` is an integer value indicating the benefit of matching
the pattern. It determines the priorities of patterns inside the pattern rewrite
driver. A pattern with a higher benefit is applied before one with a lower
benefit.
-* If a smaller one is applied first the larger one may not apply anymore.
+* If a smaller one is applied first, the larger one may not apply anymore.
-The forth parameter to `Pattern` (and `Pat`) allows to manually tweak a
+The fourth parameter to `Pattern` (and `Pat`) allows manually tweaking a
pattern's benefit. Just supply `(addBenefit N)` to add `N` to the benefit value.
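+For instance (op names hypothetical):
+
+```tablegen
+// The empty `[]` is the (unused) constraint list; `(addBenefit 10)` raises
+// this pattern's benefit by 10 so it is tried earlier.
+def : Pat<(ComplexOp $a, $b), (SimpleOp $a, $b), [], (addBenefit 10)>;
+```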
## Special directives
### Defining the type class
As described above, `Type` objects in MLIR are value-typed and rely on having an
-implicity internal storage object that holds the actual data for the type. When
+implicit internal storage object that holds the actual data for the type. When
defining a new `Type` it isn't always necessary to define a new storage class.
So before defining the derived `Type`, it's important to know which of the two
classes of `Type` we are defining. Some types are `primitives` meaning they do
```c++
struct MyDialect : public Dialect {
MyDialect(MLIRContext *context) : Dialect(/*name=*/"mydialect", context) {
- /// Add these types to the dialcet.
+ /// Add these types to the dialect.
addTypes<SimpleType, ComplexType>();
}
};
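+```
+
+For the "simple" case, a sketch of the type class itself (type name and kind
+enum hypothetical); no storage class is needed because the type carries no
+extra data:
+
+```c++
+class SimpleType : public Type::TypeBase<SimpleType, Type> {
+public:
+  /// Inherit constructors from the base class.
+  using Base::Base;
+
+  /// Support llvm-style casting: this type has the `Simple` kind.
+  static bool kindof(unsigned kind) { return kind == MyTypes::Simple; }
+
+  /// Get the uniqued instance of this type for the given context.
+  static SimpleType get(MLIRContext *context) {
+    return Base::get(context, MyTypes::Simple);
+  }
+};
+```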
* Adopts [camelBack](https://llvm.org/docs/Proposals/VariableNames.html);
* Except for IR units (Region, Block, and Operation), non-nullable output
- argument are passed by non-const reference in general.
+ arguments are passed by non-const reference in general.
* IR constructs are not designed for [const correctness](UsageOfConst.md).
* Do *not* use recursive algorithms if the recursion can't be bounded
-statically: that is avoid recursion if there is a possible IR input that can
+statically: that is, avoid recursion if there is a possible IR input that can
ceildiv, and (4) addition and subtraction. All of these operators associate from
left to right.
-A _multi-dimensional affine expression_ is a comma separated list of
+A _multidimensional affine expression_ is a comma-separated list of
one-dimensional affine expressions, with the entire list enclosed in
parentheses.
allow 'floordiv', 'ceildiv', and 'mod' with respect to positive integer
constants. Such extensions to affine functions have often been referred to as
quasi-affine functions by the polyhedral compiler community. MLIR uses the term
-'affine map' to refer to these multi-dimensional quasi-affine functions. As
+'affine map' to refer to these multidimensional quasi-affine functions. As
examples, $$(i+j+1, j)$$, $$(i \mod 2, j+i)$$, $$(j, i/4, i \mod 4)$$, $$(2i+1,
j)$$ are two-dimensional affine functions of $$(i, j)$$, but $$(i \cdot j,
i^2)$$, $$(i \mod j, i/j)$$ are not affine functions of $$(i, j)$$.
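+Written in MLIR's affine map syntax, the first of these examples would look
+roughly like the following sketch (map name arbitrary):
+
+```mlir {.mlir}
+#map = (i, j) -> (i + j + 1, j)
+```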
In these operations, `<size>` must be a value of wrapped LLVM IR integer type,
`<address>` must be a value of wrapped LLVM IR pointer type, and `<value>` must
be a value of wrapped LLVM IR type that corresponds to the pointee type of
`<address>`.
The `index` operands are integer values whose semantics is identical to the
-* Stay as the same semantic level and try to be a mechanical 1:1 mapping;
+* Stay at the same semantic level and try to be a mechanical 1:1 mapping;
* But deviate representationally if possible with MLIR mechanisms.
-* Be straightforward to serialize into and deserialize drom the SPIR-V binary
+* Be straightforward to serialize into and deserialize from the SPIR-V binary
format.
## Conventions
-* Requirements for capabilities, extensions, extended instruction sets,
-  addressing model, and memory model is conveyed using `spv.module`
-  attributes. This is considered better because these information are for the
-  exexcution environment. It's eaiser to probe them if on the module op
-  itself.
+* Requirements for capabilities, extensions, extended instruction sets,
+  addressing model, and memory model are conveyed using `spv.module`
+  attributes. This is considered better because this information is for the
+  execution environment. It's easier to probe them if on the module op
+  itself.
-* Annotations/decoration instrutions are "folded" into the instructions they
- decorate and represented as attributes on those ops. This elimiates
+* Annotations/decoration instructions are "folded" into the instructions they
+ decorate and represented as attributes on those ops. This eliminates
potential forward references of SSA values, improves IR readability, and
makes querying the annotations more direct.
* Types are represented using MLIR standard types and SPIR-V dialect specific
...
\ | /
v
- +-------------+ (may have mulitple incoming branches)
+ +-------------+ (may have multiple incoming branches)
| merge block |
+-------------+
```
On a GPU one could then map `i`, `j`, `k` to blocks and threads. Notice that the
temporary storage footprint is `3 * 5` values but `3 * 4 * 5` values are
-actually transferred betwen `%A` and `%tmp`.
+actually transferred between `%A` and `%tmp`.
Alternatively, if a notional vector broadcast operation were available, the
lowered code would resemble:
almost all cases: you can simply instantiate the same pattern one time for each
possible cost and use the predicate to guard the match.
-The two phase nature of this API (match separate from rewrite) is important for
+The two-phase nature of this API (match separate from rewrite) is important for
two reasons: 1) some clients may want to explore different ways to tile the
graph, and only rewrite after committing to one tiling. 2) We want to support
runtime extensibility of the pattern sets, but want to be able to statically
An MLIR Function is an operation with a name containing one [region](#regions).
The region of a function is not allowed to implicitly capture values defined
-outside of the function, and all external references must use Function arguments
-or attributes that establish a symbolic connection(e.g. symbols referenced by
+outside of the function, and all external references must use function arguments
+or attributes that establish a symbolic connection (e.g. symbols referenced by
name via a string attribute like [SymbolRefAttr](#symbol-reference-attribute)):
``` {.ebnf}
^bb2:
"accelerator.launch"() {
^bb0:
- // Region of code nested under "accelerator_launch", it can reference %a but
+ // Region of code nested under "accelerator.launch", it can reference %a but
// not %value.
%new_value = "accelerator.do_something"(%a) : (i64) -> ()
}
// %new_value cannot be referenced outside of the region
-...
+
+^bb3:
+ ...
}
```
// f32 elements.
%T = alloc(%M, %N) [%B1, %B2] : memref<?x?xf32, #tiled_dynamic>
-// A memref that has a two element padding at either end. The allocation size
+// A memref that has a two-element padding at either end. The allocation size
// will fit 16 * 68 float elements of data.
%P = alloc() : memref<16x64xf32, #padded>
integer-set-attribute ::= affine-map
```
-An integer-set attribute is an attribute that represents a integer-set object.
+An integer-set attribute is an attribute that represents an integer-set object.
#### String Attribute
The MLIR in-memory data structure has a human readable and writable format, as
well as [a specification](LangRef.md) for that format - built just like any
-other programming language. Important properties of this format is that it is
+other programming language. Important properties of this format are that it is
compact, easy to read, and lossless. You can dump an MLIR program out to disk
and munge around with it, then send it through a few more passes.
appear - because the verifier can be run at any time, either as a compiler pass
or with a single function call.
-While MLIR provides a well considered infrastructure for IR verification, and
+While MLIR provides a well-considered infrastructure for IR verification, and
has simple checks for existing TensorFlow operations, there is a lot that should
be added here and lots of opportunity to get involved!
The "CHECK" comments are interpreted by the
[LLVM FileCheck tool](https://llvm.org/docs/CommandGuide/FileCheck.html), which
-is sort of like a really advanced grep. This test is fully self contained: it
+is sort of like a really advanced grep. This test is fully self-contained: it
feeds the input into the [canonicalize pass](Canonicalization.md), and checks
that the output matches the CHECK lines. See the `test/Transforms` directory for
more examples. In contrast, standard unit testing exposes the API of the
tiles into other DAG tiles, using a declarative pattern format. DAG to DAG
rewriting is a generalized solution for many common compiler optimizations,
-lowerings, and other rewrites and having an IR enables us to invest in building
+lowerings, and other rewrites, and having an IR enables us to invest in building
-a single high quality implementation.
+a single high-quality implementation.
Declarative pattern rules are preferable to imperative C++ code for a number of
reasons: they are more compact, easier to reason about, can have checkers
MLIR has been designed to be memory and compile-time efficient in its algorithms
and data structures, using immutable and uniqued structures, low level
-bit-packing, and other well known techniques to avoid unnecessary heap
+bit-packing, and other well-known techniques to avoid unnecessary heap
allocations, and allow simple and safe multithreaded optimization of MLIR
programs. There are other reasons to believe that the MLIR implementations of
common transformations will be more efficient than the Python and C++
-`Confined` is provided as a general mechanism to help modelling further
+`Confined` is provided as a general mechanism to help model further
constraints on attributes beyond the ones brought by value types. You can use
`Confined` to compose complex constraints out of more primitive ones. For
-example, an 32-bit integer attribute whose minimal value must be 10 can be
+example, a 32-bit integer attribute whose minimal value must be 10 can be
expressed as `Confined<I32Attr, [IntMinValue<10>]>`.
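+For instance, a sketch of its use in an op definition (dialect and op names
+hypothetical):
+
+```tablegen
+def MyOp : Op<MyDialect, "my_op"> {
+  // `$threshold` must be a 32-bit integer attribute with value >= 10.
+  let arguments = (ins Confined<I32Attr, [IntMinValue<10>]>:$threshold);
+}
+```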
Right now, the following primitive constraints are supported:
### Custom builder methods
-For each operation, there are two builder automatically generated based on the
-arguments and returns types:
+For each operation, there are two builders automatically generated based on the
+argument and return types:
```c++
ArrayRef<NamedAttribute> attributes);
```
-The above cases makes sure basic uniformity so that we can create ops using the
+The above cases ensure basic uniformity so that we can create ops using the
same form regardless of the exact op. This is particularly useful for
implementing declarative pattern rewrites.
-Similarly, a set of `AttrConstraint`s are created for helping modelling
-constraints of common attribute kinds. They are the `Attr` subclass hierarchy.
+Similarly, a set of `AttrConstraint`s is created to help model constraints
+of common attribute kinds. They form the `Attr` subclass hierarchy.
-It includes `F32Attr` for the constraints of being an float attribute,
+It includes `F32Attr` for the constraints of being a float attribute,
`F32ArrayAttr` for the constraints of being a float array attribute, and so on.
### Multi-entity constraint
-For more complicated predicates, you can wrap it in a single `CPred`, or you
+For more complicated predicates, you can wrap them in a single `CPred`, or you
can use predicate combiners to combine them. For example, to write the
-constraint that an attribute `attr` is an 32-bit or 64-bit integer, you can
+constraint that an attribute `attr` is a 32-bit or 64-bit integer, you can
write it as
```tablegen
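+// A sketch: `And`/`Or` combine primitive predicates into the constraint
+// that the attribute is a 32-bit or 64-bit integer attribute.
+And<[CPred<"$_self.isa<IntegerAttr>()">,
+     Or<[CPred<"$_self.cast<IntegerAttr>().getType().isInteger(32)">,
+         CPred<"$_self.cast<IntegerAttr>().getType().isInteger(64)">]>]>
+```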
As to whether we should define the predicate using a single `CPred` wrapping
the whole expression, multiple `CPred`s with predicate combiners, or a single
`CPred` "invoking" a function, there are no clear-cut criteria. Defining using
-`CPred` and predicate combiners is preferrable since it exposes more information
+`CPred` and predicate combiners is preferable since it exposes more information
-(instead hiding all the logic behind a C++ function) into the op definition spec
+(instead of hiding all the logic behind a C++ function) in the op definition spec
-so that it can pontentially drive more auto-generation cases. But it will
+so that it can potentially drive more auto-generation cases. But it will
require a nice library of common predicates as the building blocks to avoid the
duplication, which is being worked on right now.
But shape functions are determined by attributes and could be arbitrarily
-complicated with a wide-range of specification possibilities. Equality
+complicated with a wide range of specification possibilities. Equality
-relationship are common (e.g., the elemental type of the output matches the
+relationships are common (e.g., the elemental type of the output matches the
primitive type of the inputs, both inputs have exactly the same type [primitive
type and shape]) and so these should be easy to specify. Algebraic relationships
would also be common (e.g., a concat of `[n,m]` and `[n,m]` matrix along axis 0
value, the zero point must be an integer between the minimum and maximum affine
value (inclusive). For example, given an affine value represented by an 8 bit
unsigned integer, we have: $$ 0 \leq zero\_point \leq 255$$. This is important,
-because in deep neural networks's convolution-like operations, we frequently
+because in deep neural networks' convolution-like operations, we frequently
need to zero-pad inputs and outputs, so zero must be exactly representable, or
the result will be biased.
In the above, we assume that $$real\_value$$ is a Single, $$scale$$ is a Single,
$$roundToNearestInteger$$ returns a signed 32 bit integer, and $$zero\_point$$
is an unsigned 8 or 16 bit integer. Note that bit depth and number of fixed
-point values is indicative of common types on typical hardware but is not
+point values are indicative of common types on typical hardware but are not
constrained to particular bit depths or a requirement that the entire range of
an N-bit integer is used.
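+As a concrete illustration (numbers chosen for exposition only), with
+$$scale = 0.1$$ and $$zero\_point = 128$$ on an 8-bit unsigned type, the
+stored value 130 represents $$0.1 \cdot (130 - 128) = 0.2$$, and real zero
+maps exactly to the stored value 128.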
structure of the IR, operations, etc. See
[Table-driven Operation Definition](OpDefinitions.md) and
[Declarative Rewrite Rule](DeclarativeRewrites.md) for the detailed explanation
-of all available mechansims for defining operations and rewrites in a
+of all available mechanisms for defining operations and rewrites in a
table-driven manner.
-## Adding operation
+## Adding an operation
There are multiple forms of graph rewrite that can be performed in MLIR. One of
the most common is DAG tile to DAG tile rewrite. Patterns provide a concise way
to express this transformation as a pair of source pattern to match and
-resultant pattern. There is both the C++ classes to represent this
+resultant pattern. There are both the C++ classes to represent this
transformation, as well as the patterns in TableGen from which these can be
generated.
MLIR uses ideas drawn from IRs of LLVM and Swift for lower level constructs
while combining them with ideas from the polyhedral abstraction to represent
-loop nests, multi-dimensional data (tensors), and transformations on these
+loop nests, multidimensional data (tensors), and transformations on these
entities as first class concepts in the IR.
MLIR is a multi-level IR, i.e., it represents code at a domain-specific
Maps, sets, and relations with affine constraints are the core structures
underlying a polyhedral representation of high-dimensional loop nests and
-multi-dimensional arrays. These structures are represented as textual
+multidimensional arrays. These structures are represented as textual
expressions in a form close to their mathematical form. These structures are
used to capture loop nests, tensor data structures, and how they are reordered
and mapped for a target architecture. All structured or "conforming" loops are
Dialect extended types are represented as string literals wrapped inside of the
dialect namespace. This means that the parser delegates to the dialect for
parsing specific type instances. This differs from the representation of dialect
-defined operations, of which have a identifier name that the parser uses to
+defined operations, which have an identifier name that the parser uses to
identify and parse them.
This representation was chosen for several reasons:
The current MLIR uses a representation of polyhedral schedules using a tree of
if/for loops. We extensively debated the tradeoffs involved in the typical
unordered polyhedral instruction representation (where each instruction has
-multi-dimensional schedule information), discussed the benefits of schedule tree
+multidimensional schedule information), discussed the benefits of schedule tree
forms, and eventually decided to go with a syntactic tree of affine if/else
conditionals and affine for loops. Discussion of the tradeoff was captured in
this document:
This representation is based on a simplified form of the domain/schedule
representation used by the polyhedral compiler community. Domains represent what
has to be executed while schedules represent the order in which domain elements
-are interleaved. We model domains as non piece-wise convex integer sets, and
+are interleaved. We model domains as non-piece-wise convex integer sets, and
schedules as affine functions; however, the former can be disjunctive, and the
latter can be piece-wise affine relations. In the schedule tree representation,
domain and schedules for instructions are represented in a tree-like structure
and also mutable: notably constants like `i32 0`. In LLVM, these constants are
-`Value*r`'s, which allow them to be used as operands to instructions, and that
+`Value*`'s, which allow them to be used as operands to instructions, and that
they also have SSA use lists. Because these things are uniqued, every `i32 0` in
-any function share a use list. This means that optimizing multiple functions in
+any function shares a use list. This means that optimizing multiple functions in
parallel won't work (at least without some sort of synchronization on the use
lists, which would be unbearably inefficient).
are defined in per-function pools, instead of being globally uniqued. 3)
functions themselves are not SSA values either, so they don't have the same
problem as constants. 4) FunctionPasses are copied (through their copy ctor)
-into one instances per thread, avoiding sharing of local state across threads.
+into one instance per thread, avoiding sharing of local state across threads.
This allows MLIR function passes to support efficient multithreaded compilation
and code generation.
that explored the tradeoffs of using this simplified form vs the traditional
polyhedral schedule list form. At some point, this document could be dusted off
-and written as a proper academic paper, but until now, it is better to included
+and written as a proper academic paper, but until now, it is better to include
it in this crufty form than not to. Beware that this document uses archaic
syntax and should not be considered a canonical reference to modern MLIR.
## Introduction
### Simplicity of code generation
-A key final stage of an mlfunc is its conversion to a cfg function, which is
+A key final stage of an mlfunc is its conversion to a CFG function, which is
required as part of lowering to the target machine. The simplified form has a
clear advantage here: the IR has a direct correspondence to the structure of the
generated code.
-FileCheck is an extremely useful utility, it allows for easily matching various
+FileCheck is an extremely useful utility; it allows for easily matching various
parts of the output. This ease of use means that it becomes easy to write
brittle tests that are essentially `diff` tests. FileCheck tests should be as
-self contained as possible and focus on testing the minimal set of functionality
-needed. Let's see an example:
+self-contained as possible and focus on testing the minimal set of
+functionalities needed. Let's see an example:
```mlir {.mlir}
// RUN: mlir-opt %s -cse | FileCheck %s
```
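+A slightly fuller sketch of such a self-contained test (function and check
+names are illustrative, not taken from the MLIR test suite):
+
+```mlir {.mlir}
+// RUN: mlir-opt %s -cse | FileCheck %s
+
+// CHECK-LABEL: func @simple_constant
+func @simple_constant() -> (i32, i32) {
+  // CSE should merge the two identical constants into a single one.
+  // CHECK-NEXT: %[[RESULT:.*]] = constant 1 : i32
+  // CHECK-NEXT: return %[[RESULT]], %[[RESULT]] : i32, i32
+  %0 = constant 1 : i32
+  %1 = constant 1 : i32
+  return %0, %1 : i32, i32
+}
+```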
Unlike more complex types, RangeType does not require a hashing key for
-unique'ing in the `MLIRContext`. Note that all MLIR types derive from
+uniquing in the `MLIRContext`. Note that all MLIR types derive from
`mlir::Type::TypeBase` and expose `using Base::Base` to enable generic hooks to
-work properly (in this instance for llvm-style casts. RangeType does not even
+work properly (in this instance for llvm-style casts). RangeType does not even
require an implementation file as the above represents the whole code for the
%2 = linalg.slice %1[*, *, %0, *] : !linalg.view<?x?x?xf32>
```
-In this particular case, %2 slices dimension `2` of the four dimensional view
+In this particular case, %2 slices dimension `2` of the four-dimensional view
%1. The returned `!linalg.view<?x?x?xf32>` indicates that the indexing is
rank-reducing and that %0 is an `index`.
PatternMatchResult match(Operation *op) const override;
// A "rewriting" function that takes an original operation `op`, a list of
- // already rewritten opreands, and a function builder `rewriter`. It can use
+ // already rewritten operands, and a function builder `rewriter`. It can use
// the builder to construct new operations and ultimately create new values
// that will replace those currently produced by the original operation. It
-// needs to define as many value as the original operation, but their types
+// needs to define as many values as the original operation, but their types
}
```
-The actual conversion function may become quite involved. First, Let us go over
+The actual conversion function may become quite involved. First, let us go over
the components of a view descriptor and see how they can be constructed to
represent a _complete_ view of a `memref`, e.g. a view that covers all its
elements.
return builder.getArrayAttr(attrs);
}
- // Emit instructions obtaining individual values from the decsriptor.
+ // Emit instructions obtaining individual values from the descriptor.
Value *ptr() { return intrinsics::extractvalue(elementPtrType(), d, pos(0)); }
Value *offset() { return intrinsics::extractvalue(indexType(), d, pos(1)); }
Value *size(unsigned dim) {
# reuse the previously specialized and inferred version and return `<2, 2>`
var d = multiply_transpose(b, a);
- # A new call with `<2, 2>` for both dimension will trigger another
+ # A new call with `<2, 2>` for both dimensions will trigger another
# specialization of `multiply_transpose`.
var e = multiply_transpose(c, d);