## High-Level Structure {#high-level-structure}
The top-level unit of code in MLIR is a [Module](#module). A module contains a
-list of [Functions](#functions), and there are two types of function
-definitions, a "[CFG Function](#cfg-functions)" and an
-"[ML Function](#ml-functions)". Both kinds of functions are represented as a
-composition of [operations](#operations), but represent control flow in
-different ways: A CFG Function control flow using a CFG of [Blocks](#blocks),
-which contain instructions and end with
-[control flow terminator instructions](#terminator-instructions) (like
-branches). ML Functions represents control flow with a nest of affine loops and
-if conditions. Both types of functions can call back and forth between each
-other arbitrarily.
+list of [Functions](#functions). Functions are represented as a composition of
+[operations](#operations) and contain a Control Flow Graph (CFG) of
+[Blocks](#blocks), which contain instructions and end with
+[terminator operations](#terminator-operations) (like branches).
MLIR is an
[SSA-based](https://en.wikipedia.org/wiki/Static_single_assignment_form) IR,
which means that values are defined before use and have scope defined by their
dominance relations. Operations may produce zero or more results, and each is a
distinct SSA value with its own type defined by the [type system](#type-system).
-MLIR incorporates polyhedral compiler concepts, including `MLFunctions` with
-affine loops and if conditions. It also includes affine maps integrated into the
-type system - they are key to the representation of data and
-[MemRefs](#memref-type), which are the representation for tensors in addressable
-memory. MLIR also supports a first-class Tensor type allowing it to concisely
-represent operations on N-dimensional arrays.
+MLIR incorporates polyhedral compiler concepts, including `for` and `if`
+instructions, which model affine loops and affine conditionals. It also includes
+affine maps integrated into the type system - they are key to the representation
+of data and [MemRefs](#memref-type), which are the representation for tensors in
+addressable memory. MLIR also supports a first-class Tensor type allowing it to
+concisely represent operations on N-dimensional arrays.
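+As a hedged illustration of the tensor/memref distinction (shapes and the
+"foo.producer" op are hypothetical), the two types look like:
+
+```mlir {.mlir}
+// A tensor value with one dynamic dimension ('?'); tensors are values, not
+// addressable memory. "foo.producer" is a hypothetical op shown in generic form.
+%t = "foo.producer"() : () -> tensor<4x?xf32>
+
+// A buffer of the same element type in addressable memory.
+%m = alloc() : memref<16x32xf32>
+```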
Finally, MLIR supports operations for allocating buffers, producing views to
transform them, representing target-independent arithmetic, target-specific
Here's an example of an MLIR module:
```mlir {.mlir}
-// Compute A*B using mlfunc implementation of multiply kernel and print the
+// Compute A*B using an implementation of multiply kernel and print the
// result using a TensorFlow op. The dimensions of A and B are partially
// known. The shapes are assumed to match.
-cfgfunc @mul(tensor<100x?xf32>, tensor<?x50xf32>) -> (tensor<100x50xf32>) {
-// Block ^bb0. %A and %B come from function arguments.
-^bb0(%A: tensor<100x?xf32>, %B: tensor<?x50xf32>):
+func @mul(%A: tensor<100x?xf32>, %B: tensor<?x50xf32>) -> (tensor<100x50xf32>) {
// Compute the inner dimension of %A using the dim operation.
%n = dim %A, 1 : tensor<100x?xf32>
return %C : tensor<100x50xf32>
}
-// ML function that multiplies two memrefs and returns the result.
-mlfunc @multiply(%A: memref<100x?xf32>, %B: memref<?x50xf32>)
+// A function that multiplies two memrefs and returns the result.
+func @multiply(%A: memref<100x?xf32>, %B: memref<?x50xf32>)
-> (memref<100x50xf32>) {
// Compute the inner dimension of %A.
%n = dim %A, 1 : memref<100x?xf32>
for %k = 0 to %n {
%a_v = load %A[%i, %k] : memref<100x?xf32>
%b_v = load %B[%k, %j] : memref<?x50xf32>
- %prod = "mulf"(%a_v, %b_v) : (f32, f32) -> f32
+ %prod = mulf %a_v, %b_v : f32
%c_v = load %C[%i, %j] : memref<100x50xf32>
- %sum = "addf"(%c_v, %prod) : (f32, f32) -> f32
- store %sum to %C[%i, %j] : memref<100x50xf32>
+ %sum = addf %c_v, %prod : f32
+ store %sum, %C[%i, %j] : memref<100x50xf32>
}
}
}
will give them anonymous names like `%42`.
MLIR guarantees identifiers never collide with keywords by prefixing identifiers
-with a sigil (e.g. `%`, `#`, `@`). In certain unambiguous contexts (e.g. affine
-expressions), identifiers are not prefixed, for brevity. New keywords may be
-added to future versions of MLIR without danger of collision with existing
+with a sigil (e.g. `%`, `#`, `@`, `^`). In certain unambiguous contexts (e.g.
+affine expressions), identifiers are not prefixed, for brevity. New keywords may
+be added to future versions of MLIR without danger of collision with existing
identifiers.
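+To illustrate (the identifier names are arbitrary), each kind of identifier
+carries its own sigil:
+
+```mlir {.mlir}
+%result = ...    // '%' introduces SSA values
+#map0 = ...      // '#' introduces attribute aliases such as affine maps
+@multiply        // '@' introduces function names
+^bb1             // '^' introduces block labels
+```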
The scope of SSA values is defined based on the standard definition of
dominance.
SSA values bound to dimensions and symbols must always have 'index' type.
-In a [CFG Function](#cfg-functions), any SSA value can be bound to dimensional
-and symbol identifiers.
-
-In an [ML Function](#ml-functions), a symbolic identifier can be bound to an SSA
-value that is either an argument to the function, a value defined at the top
-level of that function (outside of all loops and if instructions), the result of
-a [`constant` operation](#'constant'-operation), or the result of an
+A symbolic identifier can be bound to an SSA value that is either an argument to
+the function, a value defined at the top level of that function (outside of all
+loops and if instructions), the result of a
+[`constant` operation](#'constant'-operation), or the result of an
[`affine_apply`](#'affine_apply'-operation) operation that recursively takes as
arguments any symbolic identifiers. Dimensions may be bound not only to anything
that a symbol is bound to, but also to induction variables of enclosing
and memory buffers. MLIR does not include complex numbers, tuples, structures,
arrays, or dictionaries.
+TODO: Revisit this in light of dialect-extensible type systems.
+
``` {.ebnf}
type ::= integer-type
| index-type
have a designated width.
**Rationale:** low precision integers (like `i2`, `i4` etc) are useful for
-cutting edge inference work, and arbitrary precision integers are useful for
+low-precision inference chips, and arbitrary precision integers are useful for
hardware synthesis (where a 13 bit multiplier is a lot cheaper/smaller than a 16
bit one).
| function-id `:` function-type
```
-It is possible to attach attributes to operations, and the set of expected
-attributes, their structure, and the definition of that meaning is contextually
-dependent on the operation they are attached to.
-
-TODO: we should allow attaching attributes to functions.
+It is possible to attach attributes to instructions and functions, and the set
+of expected attributes, their structure, and the definition of that meaning is
+contextually dependent on the operation they are attached to.
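+For example, a sketch of an operation carrying an attribute dictionary (the
+"foo.reduce" op and its `axis` attribute are hypothetical):
+
+```mlir {.mlir}
+// The {axis: 0} dictionary is an attribute: compile-time data whose meaning
+// is defined by the (hypothetical) "foo.reduce" op it is attached to.
+%r = "foo.reduce"(%input) {axis: 0} : (tensor<4x8xf32>) -> tensor<8xf32>
+```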
## Module {#module}
## Functions {#functions}
-MLIR has three kinds of functions: external functions (functions with bodies
-that are defined outside of this module), [CFGFunctions](#cfg-functions) and
-[MLFunctions](#ml-functions), each of which are described below.
+MLIR functions have a signature (including argument and result types) and
+associated attributes according to the following grammar:
``` {.ebnf}
-function ::= ext-func | cfg-func | ml-func
-```
-
-The different kinds of function affect the structure and sort of code that can
-be represented inside of the function, but callers are unaffected by the choice
-of representation, and all functions have the same caller side capabilities.
+function ::= `func` function-signature function-attributes? function-body?
-### External Functions {#external-functions}
-
-External functions are a declaration that a function exists, without a
-definition of its body. It is used when referring to things defined outside of
-the current module.
-
-``` {.ebnf}
-argument-list ::= type (`,` type)* | /*empty*/
function-signature ::= function-id `(` argument-list `)` (`->` type-list)?
-ext-func ::= `extfunc` function-signature (`attributes` attribute-dict)?
-```
+argument-list ::= named-argument (`,` named-argument)* | /*empty*/
+argument-list ::= type (`,` type)* | /*empty*/
+named-argument ::= ssa-id `:` type
-Examples:
-
-```mlir {.mlir}
-extfunc @abort()
-extfunc @scribble(i32, i64, memref<? x 128 x f32, #layout_map0>) -> f64
+function-attributes ::= `attributes` attribute-dict
+function-body ::= `{` block+ `}`
```
-TODO: This will eventually be expanded include mod-ref sets
-(read/write/may_read/may_write) and attributes.
-
-### CFG Functions {#cfg-functions}
-
-Syntax:
-
-``` {.ebnf}
-cfg-func ::= `cfgfunc` function-signature
- (`attributes` attribute-dict)? `{` block+ `}`
-```
+An external function declaration (used when referring to a function declared in
+some other module) has no body. A function definition contains a control
+flow graph made up of one or more blocks. While the MLIR textual form provides a
+nice inline syntax for function arguments, they are internally represented as
+"block arguments" to the first block in the function.
-A simple CFG function that returns its argument twice looks like this:
+Examples:
```mlir {.mlir}
-cfgfunc @count(i64) -> (i64, i64)
+// External function declarations.
+func @abort()
+func @scribble(i32, i64, memref<? x 128 x f32, #layout_map0>) -> f64
+
+// A function that returns its argument twice:
+func @count(%x: i64) -> (i64, i64)
attributes {fruit: "banana"} {
-^bb0(%x: i64):
return %x, %x: i64, i64
}
```
-**Context:** CFG functions are capable of representing arbitrary computation at
-either a high- or low-level: for example, they can represent an entire
-TensorFlow dataflow graph, where the instructions are TensorFlow "ops" producing
-values of Tensor type. It can also represent scalar math, and can be used as a
-way to lower [ML Functions](#ml-functions) before late code generation.
-
#### Blocks {#blocks}
Syntax:
``` {.ebnf}
-block ::= bb-label operation* terminator-inst
+block ::= bb-label instruction+
bb-label ::= bb-id bb-arg-list? `:`
bb-id ::= caret-id
ssa-id-and-type ::= ssa-id `:` type
A [basic block](https://en.wikipedia.org/wiki/Basic_block) is a sequential list
of instructions without control flow (calls are not considered control flow for
this purpose) that are executed from top to bottom. The last instruction in a
-block is a [terminator instruction](#terminator-instructions), which ends the
-block.
+block is a [terminator operation](#terminator-operations), which ends the block.
-Blocks in MLIR take a list of arguments, which represent SSA PHI nodes in a
-functional notation. The arguments are defined by the block, and values are
+Blocks in MLIR take a list of block arguments, which represent SSA PHI nodes in
+a functional notation. The arguments are defined by the block, and values are
provided for these block arguments by branches that go to the block.
Here is a simple example function showing branches, returns, and block
arguments:
```mlir {.mlir}
-cfgfunc @simple(i64, i1) -> i64 {
+func @simple(i64, i1) -> i64 {
^bb0(%a: i64, %cond: i1): // Code dominated by ^bb0 may refer to %a
cond_br %cond, ^bb1, ^bb2
br ^bb3(%a: i64) // Branch passes %a as the argument
^bb2:
- %b = "add"(%a, %a) : (i64, i64) -> i64
+ %b = addi %a, %a : i64
br ^bb3(%b: i64) // Branch passes %b as the argument
// ^bb3 receives an argument, named %c, from predecessors
case: they become arguments to the entry block
[[more rationale](Rationale.md#block-arguments-vs-phi-nodes)].
-Control flow within a CFG function is implemented with unconditional branches,
-conditional branches, and a `return` instruction.
-
-TODO: We can add
-[switches](http://llvm.org/docs/LangRef.html#switch-instruction),
-[indirect gotos](http://llvm.org/docs/LangRef.html#indirectbr-instruction), etc
-if/when there is demand.
-
-#### Terminator instructions {#terminator-instructions}
-
-##### 'br' terminator instruction {#'br'-terminator-instruction}
-
-Syntax:
-
-``` {.ebnf}
-terminator-inst ::= `br` bb-id branch-use-list?
-branch-use-list ::= `(` ssa-use-list `:` type-list-no-parens `)`
-```
-
-The `br` terminator instruction represents an unconditional jump to a target
-block. The count and types of operands to the branch must align with the
-arguments in the target block.
-
-The MLIR branch instruction is not allowed to target the entry block for a
-function.
-
-##### 'cond_br' terminator instruction {#'cond_br'-terminator-instruction}
-
-Syntax:
-
-``` {.ebnf}
-terminator-inst ::=
- `cond_br` ssa-use `,` bb-id branch-use-list? `,` bb-id branch-use-list?
-```
-
-The `cond_br` terminator instruction represents a conditional branch on a
-boolean (1-bit integer) value. If the bit is set, then the first destination is
-jumped to; if it is false, the second destination is chosen. The count and types
-of operands must align with the arguments in the corresponding target blocks.
-
-The MLIR conditional branch instruction is not allowed to target the entry block
-for a function. The two destinations of the conditional branch instruction are
-allowed to be the same.
-
-The following example illustrates a CFG function with a conditional branch
-instruction that targets the same block:
-
-```mlir {.mlir}
-cfgfunc @select(%a : i32, %b :i32, %flag : i1) -> i32 {
-^bb0:
- // Both targets are the same, operands differ
- cond_br %flag, ^bb1(%a : i32), ^bb1(%b : i32)
-
-^bb1(%x : i32) :
- return %x : i32
-}
-```
-
-##### 'return' terminator instruction {#'return'-terminator-instruction}
-
-Syntax:
-
-``` {.ebnf}
-terminator-inst ::= `return` (ssa-use-list `:` type-list-no-parens)?
-```
+### Instruction Kinds {#instruction-kinds}
-The `return` terminator instruction represents the completion of a cfg function,
-and produces the result values. The count and types of the operands must match
-the result types of the enclosing function. It is legal for multiple blocks in a
-single function to return.
-
-### ML Functions {#ml-functions}
-
-Syntax:
+MLIR has three kinds of instructions: [dialect-defined operations](#operations),
+an affine [`for` instruction](#'for'-instruction), and an affine
+[`if` instruction](#'if'-instruction).
``` {.ebnf}
-ml-func ::= `mlfunc` ml-func-signature
- (`attributes` attribute-dict)? `{` inst* return-inst `}`
-
-ml-argument ::= ssa-id `:` type
-ml-argument-list ::= ml-argument (`,` ml-argument)* | /*empty*/
-ml-func-signature ::= function-id `(` ml-argument-list `)` (`->` type-list)?
-
inst ::= operation | for-inst | if-inst
```
-The body of an ML Function is made up of nested affine for loops, conditionals,
-and [operation](#operations) instructions, and ends with a return instruction.
-Each of the control flow instructions is made up a list of instructions and
-other control flow instructions.
-
-While ML Functions are restricted to affine loops and conditionals, they may
-freely call (and be called) by CFG Functions which do not have these
-restrictions. As such, the expressivity of MLIR is not restricted in general;
-one can choose to apply MLFunctions when it is beneficial.
-
-#### 'return' instruction {#'return'-instruction}
-
-Syntax:
-
-``` {.ebnf}
-return-inst ::= `return` (ssa-use-list `:` type-list-no-parens)?
-```
-
-The arity and operand types of the return instruction must match the result of
-the enclosing function.
-
#### 'for' instruction {#'for'-instruction}
Syntax:
shorthand-bound ::= ssa-id | `-`? integer-literal
```
-The `for` instruction in an ML Function represents an affine loop nest, defining
-an SSA value for its induction variable. This SSA value always has type
-[`index`](#index-type), which is the size of the machine word.
+The `for` instruction represents an affine loop nest, defining an SSA value for
+its induction variable. This SSA value always has type [`index`](#index-type),
+which is the size of the machine word.
The `for` instruction executes its body a number of times iterating from a lower
bound to an upper bound by a stride. The stride, represented by `step`, is a
```mlir {.mlir}
#map57 = (d0, d1)[s0] -> (d0, s0 - d1 - 1)
-mlfunc @simple_example(%A: memref<?x?xf32>, %B: memref<?x?xf32>) {
+func @simple_example(%A: memref<?x?xf32>, %B: memref<?x?xf32>) {
%N = dim %A, 0 : memref<?x?xf32>
for %i = 0 to %N step 1 {
for %j = 0 to %N { // implicitly steps by 1
| if-inst-head `else` `{` inst* `}`
```
-The `if` instruction in an ML Function restricts execution to a subset of the
-loop iteration space defined by an integer set (a conjunction of affine
-constraints). A single `if` may have a number of optional `else if` clauses, and
-may end with an optional `else` clause.
+The `if` instruction restricts execution to a subset of the loop iteration space
+defined by an integer set (a conjunction of affine constraints). A single `if`
+may have a number of optional `else if` clauses, and may end with an optional
+`else` clause.
The condition of the `if` is represented by an [integer set](#integer-sets) (a
conjunction of affine constraints), and the SSA values bound to the dimensions
```mlir {.mlir}
#set = (d0, d1)[s0]: (d0 - 10 >= 0, s0 - d0 - 9 >= 0,
d1 - 10 >= 0, s0 - d1 - 9 >= 0)
-mlfunc @reduced_domain_example(%A, %X, %N) : (memref<10xi32>, i32, i32) {
+func @reduced_domain_example(%A: memref<10xi32>, %X: i32, %N: i32) {
for %i = 0 to %N {
for %j = 0 to %N {
%0 = affine_apply #map42(%i, %j)
attribute-dict? `:` function-type
```
-While CFG and ML Functions have two different ways of representing control flow
-within their body, MLIR represents the computation within them with a uniform
-concept called _operations_. Operations in MLIR are fully extensible (there is
-no fixed list of operations), and have application-specific semantics. For
-example, MLIR supports [target-independent operations](#memory-operations),
+MLIR represents computation within functions with a uniform concept called
+_operations_. Operations in MLIR are fully extensible (there is no fixed list of
+operations), and have application-specific semantics. For example, MLIR supports
+[target-independent operations](#memory-operations),
[high-level TensorFlow ops](#tensorflow-operations), and
[target-specific machine instructions](#target-specific-operations).
TODO: rank, which returns an index.
+#### Terminator operations {#terminator-operations}
+
+##### 'br' terminator operation {#'br'-terminator-operation}
+
+Syntax:
+
+``` {.ebnf}
+operation ::= `br` bb-id branch-use-list?
+branch-use-list ::= `(` ssa-use-list `:` type-list-no-parens `)`
+```
+
+The `br` terminator operation represents an unconditional jump to a target
+block. The count and types of operands to the branch must align with the
+arguments in the target block.
+
+The MLIR branch instruction is not allowed to target the entry block for a
+function.
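+
+A minimal sketch (function and value names illustrative) of an unconditional
+branch passing a block argument:
+
+```mlir {.mlir}
+func @branch_example(%a : i64) -> i64 {
+  // Jump to ^bb1, passing %a; the operand count and type
+  // match ^bb1's block argument list.
+  br ^bb1(%a : i64)
+
+^bb1(%x : i64):
+  return %x : i64
+}
+```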
+
+##### 'cond_br' terminator operation {#'cond_br'-terminator-operation}
+
+Syntax:
+
+``` {.ebnf}
+operation ::=
+ `cond_br` ssa-use `,` bb-id branch-use-list? `,` bb-id branch-use-list?
+```
+
+The `cond_br` terminator operation represents a conditional branch on a
+boolean (1-bit integer) value. If the bit is set, then the first destination is
+jumped to; if it is false, the second destination is chosen. The count and types
+of operands must align with the arguments in the corresponding target blocks.
+
+The MLIR conditional branch instruction is not allowed to target the entry block
+for a function. The two destinations of the conditional branch instruction are
+allowed to be the same.
+
+The following example illustrates a function with a conditional branch
+instruction that targets the same block:
+
+```mlir {.mlir}
+func @select(%a : i32, %b :i32, %flag : i1) -> i32 {
+ // Both targets are the same, operands differ
+ cond_br %flag, ^bb1(%a : i32), ^bb1(%b : i32)
+
+^bb1(%x : i32) :
+ return %x : i32
+}
+```
+
+##### 'return' terminator operation {#'return'-terminator-operation}
+
+Syntax:
+
+``` {.ebnf}
+operation ::= `return` (ssa-use-list `:` type-list-no-parens)?
+```
+
+The `return` terminator operation represents the completion of a function, and
+produces the result values. The count and types of the operands must match the
+result types of the enclosing function. It is legal for multiple blocks in a
+single function to return.
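+
+For instance, a sketch of a function whose return operands match its result
+types (names and constants illustrative):
+
+```mlir {.mlir}
+func @pair() -> (i32, i32) {
+  %a = constant 1 : i32
+  %b = constant 2 : i32
+  // Operand count and types match the (i32, i32) result list.
+  return %a, %b : i32, i32
+}
+```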
+
### Core Operations {#core-operations}
#### 'affine_apply' operation {#'affine_apply'-operation}
specified with a tag memref and its indices. The operands include the tag memref
followed by its indices and the number of elements associated with the DMA being
waited on. The indices of the tag memref have the same restrictions as
-load/store indices when appearing in MLFunctions.
+load/store indices.
Example:
```mlir
// RUN: mlir-opt %s -canonicalize | FileCheck %s
- cfgfunc @test_subi_zero_cfg(i32) -> i32 {
- ^bb0(%arg0: i32):
+ func @test_subi_zero_cfg(%arg0: i32) -> i32 {
%y = subi %arg0, %arg0 : i32
return %y: i32
}
- // CHECK-LABEL: cfgfunc @test_subi_zero_cfg
- // CHECK-NEXT: bb0(%arg0: i32):
+ // CHECK-LABEL: func @test_subi_zero_cfg(%arg0: i32)
// CHECK-NEXT: %c0_i32 = constant 0 : i32
// CHECK-NEXT: return %c0
```
```mlir {.mlir}
// RUN: mlir-opt %s -memref-dependence-check -verify
- mlfunc @different_memrefs() {
+ func @different_memrefs() {
%m.a = alloc() : memref<100xf32>
%m.b = alloc() : memref<100xf32>
%c0 = constant 0 : index
[TOC]
-## ML to CFG conversion (`-convert-to-cfg`) {#convert-to-cfg}
+## Lower `if` and `for` (`-lower-if-and-for`) {#lower-if-and-for}
-Convert ML functions to equivalent CFG functions.
+Lower the `if` and `for` instructions to the CFG equivalent.
Individual operations are preserved. Loops are converted to a subgraph of blocks
(initialization, condition checking, subgraph of body blocks) with loop
induction variable being passed as the block argument of the condition checking
block.
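
As a hedged sketch (block names and the exact lowered ops are illustrative), a
loop `for %i = 0 to %N` might lower to:

```mlir
func @lowered(%N : index) {
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  br ^cond(%c0 : index)

^cond(%i : index):                    // induction variable as block argument
  %done = cmpi "sge", %i, %N : index  // condition checking block
  cond_br %done, ^exit, ^body

^body:
  // ... original loop body ...
  %next = addi %i, %c1 : index
  br ^cond(%next : index)

^exit:
  return
}
```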
-### Input IR
-
-MLIR with ML, CFG and External functions. The following restrictions apply to
-the input ML functions:
-
-- no `Tensor` types;
-
-These restrictions may be lifted in the future.
-
-### Output IR
-
-MLIR with CFG and External functions only. The CFG functions introduced by this
-pass can contain `affine_apply` operations from the BuiltinOps dialect in
-addition to the operations present in the source ML functions.
-
-### Invariants
-
-- The CFG and External functions are not modified.
-- The CFG functions introduced by this pass have the same names as the
- replaced ML functions.
-- Individual operations other than control flow from the source ML functions
- are replicated in the produced CFG functions; their arguments may be updated
- to capture the corresponding SSA values after conversion (e.g., loop
- iterators become block arguments).
-
## `affine_apply` lowering (`-lower-affine-apply`) {#lower-affine-apply}
-Convert `affine_apply` operations in CFG functions into arithmetic operations
-they comrise. Arguments and results of all operations are of the `index` type.
+Convert `affine_apply` operations into arithmetic operations they comprise.
+Arguments and results of all operations are of the `index` type.
For example, `%r = affine_apply (d0, d1)[s0] -> (d0 + 2*d1 + s0)(%d0, %d1)[%s0]`
-can be converted into
+can be converted into:
```mlir
%d0 = <...>
%r = addi %2, %s0
```
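
A fuller hedged sketch of that expansion (value names illustrative):

```mlir
%c2 = constant 2 : index
%0  = muli %c2, %d1 : index   // 2 * d1
%1  = addi %d0, %0 : index    // d0 + 2 * d1
%r  = addi %1, %s0 : index    // d0 + 2 * d1 + s0
```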
-### Input IR
-
-MLIR with CFG and External functions.
+### Input invariant
-ML functions are not allowed in the input since they *may* include syntactic
-constructs equivalent to `affine_apply` that cannot be replaced, in particular
-`for` loop bounds and `if` conditions. Lower ML functions to CFG functions to
-expose all `affine_apply` operations before using this pass.
+`if` and `for` instructions should be eliminated before this pass.
### Output IR
-MLIR with CFG and External functions. CFG functions do not contain any
-`affine_apply` operations. Consequently, named maps may be removed from the
-module. CFG functions may use any operations from the StandardOps dialect in
-addition to the already used dialects.
+Functions that do not contain any `affine_apply` operations. Consequently, named
+maps may be removed from the module. Functions may use any operations from the
+StandardOps dialect in addition to the already used dialects.
### Invariants
-- External functions are not modified.
-- The semantics of the CFG functions remains the same.
- Operations other than `affine_apply` are not modified.
single continuous design provides a framework to lower from dataflow graphs to
high performance target specific code.
-MLIR stands for one of "Multidimensional Loop IR" or "Machine Learning IR" or
-"Mid Level IR", we prefer the first. This document only provides the rationale
-behind MLIR -- its actual [specification document](LangRef.md),
+MLIR stands for one of "Multi-Level IR" or "Multi-dimensional Loop IR" or
+"Machine Learning IR" or "Mid Level IR", we prefer the first. This document only
+provides the rationale behind MLIR -- its actual
+[specification document](LangRef.md),
[system design documentation](https://docs.google.com/document/d/1yRqja94Da6NtKmPxSYtTx6xbUtughLANyeD7dZ7mOBM/edit#)
and other content is hosted elsewhere.
## Introduction and Motivation {#introduction-and-motivation}
-The Multidimensional loop intermediate representation (MLIR) is intended for
-easy expression and optimization of computations involving deep loop nests and
-dense matrices of high dimensionality. It is thus well-suited to deep learning
+The Multi-Level Intermediate Representation (MLIR) is intended for easy
+expression and optimization of computations involving deep loop nests and dense
+matrices of high dimensionality. It is thus well-suited to deep learning
computations in particular. Yet it is general enough to also represent arbitrary
sequential computation. The representation allows high-level optimization and
parallelization for a wide range of parallel architectures including those with
an element of a memref. These instructions take as arguments n+1 indices for an
n-ranked tensor. This disallows the equivalent of pointer arithmetic or the
ability to index into the same memref in other ways (something which C arrays
-allow for example). Furthermore, in an ML function, the compiler can follow
-use-def chains (e.g. through
+allow for example). Furthermore, in affine constructs, the compiler can follow
+use-def chains (e.g. through
[affine_apply instructions](https://docs.google.com/document/d/1lwJ3o6MrkAa-jiqEwoORBLW3bAI1f4ONofjRqMK1-YU/edit?ts=5b208818#heading=h.kt8lzanb487r))
to precisely analyze references at compile-time using polyhedral techniques.
This is possible because of the
-[restrictions on dimensions and symbols](https://docs.google.com/document/d/1lwJ3o6MrkAa-jiqEwoORBLW3bAI1f4ONofjRqMK1-YU/edit?ts=5b208818#heading=h.fnmv1awabfj)
-in ML functions. However, for CFG functions that are called from ML functions, a
-calling context sensitive analysis has to be performed for accesses in CFG
-functions in order to determine if they could be treated as affine accesses when
-analyzing or optimizing the calling ML function.
+[restrictions on dimensions and symbols](https://docs.google.com/document/d/1lwJ3o6MrkAa-jiqEwoORBLW3bAI1f4ONofjRqMK1-YU/edit?ts=5b208818#heading=h.fnmv1awabfj).
A scalar of element-type (a primitive type or a vector type) that is stored in
memory is modeled as a 0-d memref. This is also necessary for scalars that are
-live out of for loops and if conditionals in an ML function, for which we don't
-yet have an SSA representation (in an ML function) --
+live out of for loops and if conditionals in a function, for which we don't yet
+have an SSA representation --
[an extension](#mlfunction-extensions-for-"escaping-scalars") to allow that is
described later in this doc.
Example:
```mlir {.mlir}
-mlfunc foo(...) {
+func @foo(...) {
%A = alloc <8x?xf32, #lmap> (%N)
...
call @bar(%A) : (memref<8x?xf32, #lmap>)
}
-mlfunc bar(%A : memref<8x?xf32, #lmap>) {
+func @bar(%A : memref<8x?xf32, #lmap>) {
// Type of %A indicates that %A has dynamic shape with 8 rows
// and unknown number of columns. The number of columns is queried
// dynamically using dim instruction.
- %N = builtin "dim"(%A){index : 1} : (memref<8x?xf32, #lmap>) -> int
+ %N = dim %A, 1 : memref<8x?xf32, #lmap>
for %i = 0 to 8 {
for %j = 0 to %N {
### Block Arguments vs PHI nodes {#block-arguments-vs-phi-nodes}
-MLIR CFG Functions represent SSA using "[block arguments](LangRef.md#blocks)"
-rather than [PHI instructions](http://llvm.org/docs/LangRef.html#i-phi) used in
-LLVM. This choice is representationally identical (the same constructs can be
+MLIR Functions represent SSA using "[block arguments](LangRef.md#blocks)" rather
+than [PHI instructions](http://llvm.org/docs/LangRef.html#i-phi) used in LLVM.
+This choice is representationally identical (the same constructs can be
represented in either form) but block arguments have several advantages:
1. LLVM PHI nodes always have to be kept at the top of a block, and
[talk on youtube](https://www.youtube.com/watch?v=VeRaLPupGks) talking about
LLVM 2.0.
-### Index type disallowed in aggregate types {#index-type-disallowed-in-aggregate-types}
+### Index type disallowed in vector/tensor/memref types {#index-type-disallowed-in-aggregate-types}
Index types are not allowed as elements of `vector`, `tensor` or `memref` type.
Index types are intended to be used for platform-specific "size" values and may
## Examples {#examples}
-This section describes a few very simple examples that help understand how ML
-functions and CFG functions work together.
+This section describes a few very simple examples that help understand how MLIR
+represents computation.
### Non-affine control flow {#non-affine-control-flow}
}
```
-The presence of dynamic control flow leads to a CFG function nested in an ML
-function: an ML function captures the outer loop while the inner loop is
-represented in the CFG function.
+The presence of dynamic control flow leads to an inner non-affine function
+nested in an outer function that uses affine loops.
```mlir {.mlir}
-mlfunc @search(memref<?x?xi32 %A, <?xi32> %S, i32 %key) {
+func @search(%A : memref<?x?xi32>, %S : memref<?xi32>, %key : i32) {
%ni = dim %A, 0 : memref<?x?xi32>
// This loop can be parallelized
for %i = 0 to %ni {
return
}
-cfgfunc @search_body(memref<?x?xi32>, memref<?xi32>, i32) {
-^bb0(%A: memref<?x?xi32>, %S: memref<?xi32>, %key: i32)
+func @search_body(%A: memref<?x?xi32>, %S: memref<?xi32>, %key: i32) {
%nj = dim %A, 1 : memref<?x?xi32>
br ^bb1(0)
cond_br %p2, ^bb3(%j), ^bb4
^bb3(%j: i32):
- store %j to %S[%i] : memref<?xi32>
+ store %j, %S[%i] : memref<?xi32>
br ^bb5
^bb4:
As per the [MLIR spec](LangRef.md), the restrictions on dimensions and symbol
identifiers to be used with the affine_apply instruction only apply to accesses
-inside ML functions. However, an analysis of accesses inside the called CFG
-function (`@search_body`) is necessary to determine if the `%i` loop could be
-parallelized: such CFG function access analysis is calling context sensitive.
+inside `for` and `if` instructions. However, an analysis of accesses inside the
+called function (`@search_body`) is necessary to determine if the `%i` loop
+could be parallelized: such function access analysis is calling context
+sensitive.
### Non-affine loop bounds {#non-affine-loop-bounds}
-Loop bounds that are not affine lead to a nesting of ML functions and CFG
-functions as shown below.
+Loop bounds that are not affine lead to a nesting of functions as shown below.
```c
for (i=0; i <N; i++)
```
```mlir {.mlir}
-mlfunc @outer_nest(%n) : (i32) {
+func @outer_nest(%n : i32) {
for %i = 0 to %n {
for %j = 0 to %n {
call @inner_nest(%i, %j, %n)
}
}
+ return
}
-cfgfunc @inner_nest(i32, i32, i32) {
-^bb0(%i, %j, %n):
+func @inner_nest(%i: i32, %j: i32, %n: i32) {
%pow = call @pow(2, %j) : (f32, f32) -> f32
// TODO(missing cast from f32 to i32)
call @inner_nest2(%pow, %n)
+ return
}
-mlfunc @inner_nest2(%m, %n) : (i32) {
- for k = 0 to %m {
- for l = 0 to %n {
- ….
+func @inner_nest2(%m : i32, %n : i32) {
+ for %k = 0 to %m {
+ for %l = 0 to %n {
+ ...
}
}
return
### Reference 2D Convolution {#reference-2d-convolution}
The following example illustrates a reference implementation of a 2D
-convolution, which uses an integer set `@@domain` to represent valid input data
+convolution, which uses an integer set `#domain` to represent valid input data
in a dilated convolution.
```mlir {.mlir}
// Dilation factors S0 and S1 can be constant folded if constant at compile time.
-@@domain = (d0, d1)[S0,S1,S2,S3]: (d0 % S0 == 0, d1 % S1 == 0, d0 >= 0, d1 >= 0,
+#domain = (d0, d1)[S0,S1,S2,S3]: (d0 % S0 == 0, d1 % S1 == 0, d0 >= 0, d1 >= 0,
S2 - d0 - 1 >= 0, S3 - d1 - 1 >= 0)
// Identity map (shown here for illustration).
#map0 = (d0, d1, d2, d3, d4, d5, d6) -> (d0, d1, d2, d3, d4, d5, d6)
// input: [batch, input_height, input_width, input_feature]
// kernel: [kernel_height, kernel_width, input_feature, output_feature]
// output: [batch, output_height, output_width, output_feature]
-mlfunc @conv2d(memref<16x1024x1024x3xf32, #lm0, vmem> %input,
- memref<5x5x3x32xf32, #lm0, vmem> %kernel,
- memref<16x512x512x32xf32, #lm0, vmem> %output) {
- for %b = 0 … %batch {
- for %oh = 0 … %output_height {
- for %ow = 0 ... %output_width {
- for %of = 0 … %output_feature {
- for %kh = 0 … %kernel_height {
- for %kw = 0 … %kernel_width {
- for %if = 0 … %input_feature {
+func @conv2d(memref<16x1024x1024x3xf32, #lm0, vmem> %input,
+ memref<5x5x3x32xf32, #lm0, vmem> %kernel,
+ memref<16x512x512x32xf32, #lm0, vmem> %output) {
+ for %b = 0 to %batch {
+ for %oh = 0 to %output_height {
+ for %ow = 0 to %output_width {
+ for %of = 0 to %output_feature {
+ for %kh = 0 to %kernel_height {
+ for %kw = 0 to %kernel_width {
+ for %if = 0 to %input_feature {
%0 = affine_apply #map0 (%b, %oh, %ow, %of, %kh, %kw, %if)
// Calculate input indices.
%1 = affine_apply #map1 (%0#1, %0#2, %0#4, %0#5)
%h_pad_low, %w_pad_low]
// Check if access is not in padding.
- if @@domain(%1#0, %1#1)
+ if #domain(%1#0, %1#1)
[%h_base_dilation, %w_kernel_dilation, %h_bound, %w_bound] {
%2 = affine_apply #map2 (%1#0, %1#1)
// Compute: output[output_indices] += input[input_indices] * kernel[kernel_indices]
implementation experience and learn more about the challenges and limitations of
our current design in practice.
-### ML Function representation alternatives: polyhedral schedule lists vs polyhedral schedules trees vs affine loop/If forms {#mlfunction-representation-alternatives-polyhedral-schedule-lists-vs-polyhedral-schedules-trees-vs-affine-loop-if-forms}
+### Polyhedral code representation alternatives: schedule lists vs schedule trees vs affine loop/if forms {#mlfunction-representation-alternatives-polyhedral-schedule-lists-vs-polyhedral-schedules-trees-vs-affine-loop-if-forms}
The current MLIR uses a representation of polyhedral schedules using a tree of
if/for loops. We extensively debated the tradeoffs involved in the typical
At a high level, we have two alternatives here:
-1. Schedule tree representation for MLFunctions instead of an affine loop AST
- form: The current proposal uses an affine loop and conditional tree form,
- which is syntactic and with no separation of domains as sets and schedules
- as multidimensional affine functions. A schedule tree form however makes
- polyhedral domains and schedules a first class concept in the IR allowing
- compact expression of transformations through the schedule tree without
- changing the domains of instructions. Such a representation also hides
- prologues, epilogues, partial tiles, complex loop bounds and conditionals
- making loop nests free of "syntax". Cost models instead look at domains and
- schedules. In addition, if necessary such a domain schedule representation
- can be normalized to explicitly propagate the schedule into domains and
- model all the cleanup code. An example and more detail on the schedule tree
- form is in the next section.
+1. Schedule tree representation instead of an affine loop AST form: The current
+   proposal uses an affine loop and conditional tree form, which is syntactic,
+   with no separation of domains as sets and schedules as multidimensional
+   affine functions. A schedule tree form, however, makes polyhedral domains
+   and schedules first-class concepts in the IR, allowing compact expression of
+   transformations through the schedule tree without changing the domains of
+   instructions. Such a representation also hides prologues, epilogues, partial
+   tiles, complex loop bounds and conditionals, making loop nests free of
+   "syntax". Cost models instead look at domains and schedules. In addition, if
+   necessary, such a domain/schedule representation can be normalized to
+   explicitly propagate the schedule into domains and model all the cleanup
+   code. An example and more detail on the schedule tree form are in the next
+   section.
1. Having two different forms of MLFunctions: an affine loop tree form
(AffineLoopTreeFunction) and a polyhedral schedule tree form as two
different forms of MLFunctions. Or in effect, having four different forms
// A tiled matmul code (128x128x128) represented in schedule tree form
// #map0 = (d0, d1, d2, d3, d4, d5) -> (128*d0 + d3, 128*d1 + d4, 128*d2 + d5)
-@@intset_ij = (i, j) [M, N, K] : i >= 0, -i + N - 1 >= 0, j >= 0, -j + N-1 >= 0
-@@intset_ijk = (i, j, k) [M, N, K] : i >= 0, -i + N - 1 >= 0, j >= 0,
+#intset_ij = (i, j) [M, N, K] : i >= 0, -i + N - 1 >= 0, j >= 0, -j + N-1 >= 0
+#intset_ijk = (i, j, k) [M, N, K] : i >= 0, -i + N - 1 >= 0, j >= 0,
-j + M-1 >= 0, k >= 0, -k + N - 1 >= 0)
-mlfunc @matmul(%A, %B, %C, %M, %N, %K) : (...) { // %M, N, K are symbols
+func @matmul(%A, %B, %C, %M, %N, %K) : (...) { // %M, N, K are symbols
// t1, t2, t3, t4, t5, t6 are abstract polyhedral loops
mldim %t1 : {S1,S2,S3,S4,S5} floordiv (i, 128) {
mldim %t2 : {S1,S2,S3,S4,S5} floordiv (j, 128) {
mldim %t3 : {S2,S3,S4,S5} floordiv (k, 128) {
// (%i, %j, %k) = affine_apply (d0, d1, d2)
// -> (128*d0, 128*d1, 128*d2) (%t1, %t2, %t3)
- call dma_hbm_to_vmem(%A, ...) with @@inset_ijk (%i, %j, %k) [%M, %N, %K]
+      call dma_hbm_to_vmem(%A, ...) with #intset_ijk (%i, %j, %k) [%M, %N, %K]
// (%i, %j, %k) = affine_apply (d0, d1, d2)
// -> (128*d0, 128*d1, 128*d2) (%t1, %t2, %t3)
- call dma_hbm_to_vmem(%B, ...) with @@inset_ijk (%i, %j, %k) [%M, %N, %K]
+      call dma_hbm_to_vmem(%B, ...) with #intset_ijk (%i, %j, %k) [%M, %N, %K]
mldim %t4 : {S4} i mod 128 {
mldim %t5 : {S4} j mod 128 {
mldim %t6 : {S4} k mod 128 {
// (%i, %j, %k) = affine_apply #map0 (%t1, %t2, %t3, %t4, %t5, %t6)
call matmul_body(A, B, C, %i, %j, %k, %M, %N, %K)
- with @@inset_ijk(%i, %j, %k) [%M, %N, %K]
+ with #inset_ijk(%i, %j, %k) [%M, %N, %K]
} // end mldim t6
} // end mldim t5
} // end mldim t4
} // end mldim t3
// (%i, %j) = affine_apply (d0, d1) -> (128*d0, 128*d1) (%t1, %t2)
- call $dma_vmem_to_hbm_C ... with @@intset(%i, %j) [%M, %N, %K]
+    call $dma_vmem_to_hbm_C ... with #intset_ij(%i, %j) [%M, %N, %K]
} // end mldim t2
} // end mldim t1
return
// read relation: two elements ( d0 <= r0 <= d0+1 )
##aff_rel9 = (d0) -> (r0) : r0 - d0 >= 0, d0 - r0 + 1 >= 0
-cfgfunc @count ( memref<128xf32, (d0) -> (d0)> %A, i32 %pos) -> f32
+func @count(%A: memref<128xf32, (d0) -> (d0)>, %pos: i32) -> f32
reads: {%A ##aff_rel9 (%pos)}
writes: /* empty */
may_reads: /* empty */
^bb0(%0: memref<128xf32>, %1: i64):
%val = load %A [(d0) -> (d0) (%pos)]
%val2 = load %A [(d0) -> (d0 + 1) (%pos)]
- %p = builtin mulf(%val, %val) : (f32, f32) -> f32
+ %p = mulf %val, %val2 : f32
return %p
}
```
read/write/may_read/may_write sets could be provided a priori by a user as part
of the external function signature or they could be part of a database.
-TODO: figure out the right syntax.
+TODO: Design this, and update to use function attribute syntax.
Example:
```mlir {.mlir}
##rel9 ( ) [s0] -> (r0, r1) : 0 <= r0 <= 1023, 0 <= r1 <= s0 - 1
-extfunc @cblas_reduce_ffi(memref<1024 x ? x f32, #layout_map0, hbm> %M) -> f32 [
+func @cblas_reduce_ffi(memref<1024 x ? x f32, #layout_map0, hbm> %M) -> f32 [
reads: {%M, ##rel9() }
writes: /* empty */
may_reads: /* empty */
may_writes: /* empty */
]
-extfunc @dma_hbm_to_vmem(memref<1024 x f32, #layout_map0, hbm> %a,
+func @dma_hbm_to_vmem(memref<1024 x f32, #layout_map0, hbm> %a,
offset, memref<1024 x f32, #layout_map0, vmem> %b,
memref<1024 x f32, #layout_map0> %c
) [
representation. 2(b) requires no change, but impacts how cost models look at
index and layout maps.
-### ML Function Extensions for "Escaping Scalars" {#mlfunction-extensions-for-"escaping-scalars"}
+### `if` and `for` Extensions for "Escaping Scalars" {#extensions-for-"escaping-scalars"}
We considered providing a representation for SSA values that are live out of
-if/else conditional bodies or for loops of ML functions. We ultimately abandoned
-this approach due to its complexity. In the current design of MLIR, scalar
-variables cannot escape for loops or if instructions. In situations, where
-escaping is necessary, we use zero-dimensional tensors and memrefs instead of
-scalars.
+`if/else` conditional bodies or are loop-carried in `for` loops. We ultimately
+abandoned this approach due to its complexity. In the current design of MLIR,
+scalar variables cannot escape for loops or if instructions. In situations
+where escaping is necessary, we use zero-dimensional tensors and memrefs
+instead of scalars.
+
+**TODO**: This whole section is obsolete and should be updated to use block
+arguments and a yield-like terminator in for/if instructions.
The abandoned design of supporting escaping scalars is as follows:
```mlir {.mlir}
// Return sum of elements in 1-dimensional mref A
-mlfunc int32 @sum(%A : memref<?xi32>, %N : i32) -> (i32) {
+func @sum(%A : memref<?xi32>, %N : i32) -> (i32) {
%init = 0
%result = for %i = 0 to N with %tmp(%init) {
%value = load %A[%i]
```mlir {.mlir}
// Compute sum of half of the array
-mlfunc int32 @sum_half(%A, %N) {
+func @sum_half(%A, %N) -> i32 {
%s0 = 0
%s1 = for %i = 1 to N step 1 with %s2(%s0) {
%s3 = if (%i >= %N / 2) {
/// Function declarations.
///
-/// func ::= `func` function-signature function-attributes? `{` block+ `}`
+/// function ::= `func` function-signature function-attributes? function-body?
+/// function-body ::= `{` block+ `}`
/// function-attributes ::= `attributes` attribute-dict
///
ParseResult ModuleParser::parseFunc() {