review.tizen.org Git - platform/upstream/llvm.git/log

Introduce the notion of dialect attributes and dependent attributes. A dialect attribute derives its context from a specific dialect, whereas a dependent attribute derives context from what it is attached to. Following this, we now enforce that functions and function arguments may only contain dialect specific attributes. These are generic entities and cannot provide any specific context for a dependent attribute.

Dialect attributes are defined as:

        dialect-namespace `.` attr-name `:` attribute-value

Dialects can override any of the following hooks to verify the validity of a given attribute:
  * verifyFunctionAttribute
  * verifyFunctionArgAttribute
  * verifyInstructionAttribute

PiperOrigin-RevId: 236507970

Implement the initial AnalysisManagement infrastructure, with the introduction of the FunctionAnalysisManager and ModuleAnalysisManager classes. These classes provide analysis computation, caching, and invalidation for a specific IR unit. The invalidation is currently limited to either all or none, i.e. you cannot yet preserve specific analyses.

An analysis can be any class, but it must provide the following:
* A constructor for a given IR unit.

struct MyAnalysis {
// Compute this analysis with the provided module.
MyAnalysis(Module *module);
};

Analyses can be accessed from a Pass by calling either the 'getAnalysisResult<AnalysisT>' or 'getCachedAnalysisResult<AnalysisT>' methods. A FunctionPass may query for a cached analysis on the parent module with 'getCachedModuleAnalysisResult'. Similary, a ModulePass may query an analysis, it doesn't need to be cached, on a child function with 'getFunctionAnalysisResult'.

By default, when running a pass all cached analyses are set to be invalidated. If no transformation was performed, a pass can use the method 'markAllAnalysesPreserved' to preserve all analysis results. As noted above, preserving specific analyses is not yet supported.

PiperOrigin-RevId: 236505642

Add an assertion on the builder to ensure that a block is set before creating an operation

This is more friendly for the user than a raw segfault

PiperOrigin-RevId: 236504102

Set the namespace of the StandardOps dialect to "std", but add a special case to the parser to allow parsing standard operations without the "std" prefix. This will now allow for the standard dialect to be looked up dynamically by name.

PiperOrigin-RevId: 236493865

Remove hidden flag from fusion CL options

PiperOrigin-RevId: 236409185

Update addSliceBounds to deal with loops with floor's/mod's.

- This change only impacts the cost model for fusion, given the way
  addSliceBounds was being used. It so happens that the output in spite of this
  CL's fix is the same; however, the assertions added no longer fail. (an
  invalid/inconsistent memref region was being used earlier).

PiperOrigin-RevId: 236405030

NFC. Move all of the remaining operations left in BuiltinOps to StandardOps. The only thing left in BuiltinOps are the core MLIR types. The standard types can't be moved because they are referenced within the IR directory, e.g. in things like Builder.

PiperOrigin-RevId: 236403665

Use consistent names for dialect op source files

This CL changes dialect op source files (.h, .cpp, .td) to follow the following
convention:

<full-dialect-name>/<dialect-namespace>Ops.{h|cpp|td}

Builtin and standard dialects are specially treated, though. Both of them do
not have dialect namespace; the former is still named as BuiltinOps.* and the
latter is named as Ops.*.

Purely mechanical. NFC.

PiperOrigin-RevId: 236371358

A simple pass to detect and mark all parallel loops
- detect all parallel loops based on dep information and mark them with a
  "parallel" attribute
- add mlir::isLoopParallel(OpPointer<AffineForOp> ...), and refactor an existing method
  to use that (reuse some code from @andydavis (cl/236007073) for this)
- a simple/meaningful way to test memref dep test as well

Ex:

$ mlir-opt -detect-parallel test/Transforms/parallelism-detection.mlir
#map1 = ()[s0] -> (s0)
func @foo(%arg0: index) {
  %0 = alloc() : memref<1024x1024xvector<64xf32>>
  %1 = alloc() : memref<1024x1024xvector<64xf32>>
  %2 = alloc() : memref<1024x1024xvector<64xf32>>
  for %i0 = 0 to %arg0 {
    for %i1 = 0 to %arg0 {
      for %i2 = 0 to %arg0 {
        %3 = load %0[%i0, %i2] : memref<1024x1024xvector<64xf32>>
        %4 = load %1[%i2, %i1] : memref<1024x1024xvector<64xf32>>
        %5 = load %2[%i0, %i1] : memref<1024x1024xvector<64xf32>>
        %6 = mulf %3, %4 : vector<64xf32>
        %7 = addf %5, %6 : vector<64xf32>
        store %7, %2[%i0, %i1] : memref<1024x1024xvector<64xf32>>
      } {parallel: false}
    } {parallel: true}
  } {parallel: true}
  return
}

PiperOrigin-RevId: 236367368

Loop fusion for input reuse.
*) Breaks fusion pass into multiple sub passes over nodes in data dependence graph:
- first pass fuses single-use producers into their unique consumer.
- second pass enables fusing for input-reuse by fusing sibling nodes which read from the same memref, but which do not share dependence edges.
- third pass fuses remaining producers into their consumers (Note that the sibling fusion pass may have transformed a producer with multiple uses into a single-use producer).
*) Fusion for input reuse is enabled by computing a sibling node slice using the load/load accesses to the same memref, and fusion safety is guaranteed by checking that the sibling node memref write region (to a different memref) is preserved.
*) Enables output vector and output matrix computations from KFAC patches-second-moment operation to fuse into a single loop nest and reuse input from the image patches operation.
*) Adds a generic loop utilitiy for finding all sequential loops in a loop nest.
*) Adds and updates unit tests.

PiperOrigin-RevId: 236350987

Add support for parsing and printing affine.if and affine.for attributes. The attribute dictionaries are printed after the final block list for both operations:

  for %i = 0 to 10 {
     ...
  } {some_attr: true}

  if () : () () {
    ...
  } {some_attr: true}

  if () : () () {
    ...
  } else {
    ...
  } {some_attr: true}

PiperOrigin-RevId: 236346983

Analysis support for floordiv/mod's in loop bounds/

- handle floordiv/mod's in loop bounds for all analysis purposes
- allows fusion slicing to be more powerful
- add simple test cases based on -memref-bound-check
- fusion based test cases in follow up CLs

PiperOrigin-RevId: 236328551

Method to align/merge dimensional/symbolic identifiers between two FlatAffineConstraints

- add a method to merge and align the spaces (identifiers) of two
  FlatAffineConstraints (both get dimension-wise and symbol-wise unique
  columns)
- this completes several TODOs, gets rid of previous assumptions/restrictions
  in composeMap, unionBoundingBox, and reuses common code
- remove previous workarounds / duplicated funcitonality in
  FlatAffineConstraints::composeMap and unionBoundingBox, use mergeAlignIds
  from both

PiperOrigin-RevId: 236320581

EDSC bindings: expose generic Op construction interface

EDSC Expressions can now be used to build arbitrary MLIR operations identified
by their canonical name, i.e. the name obtained from
`OpClass::getOperationName()` for registered operations. Expose this
functionality to the C API and Python bindings. This exposes builder-level
interface to Python and avoids the need for experimental Python code to
implement EDSC free function calls for constructing each op type.

This modification required exposing mlir::Attribute to the C API and Python
bindings, which only supports integer attributes for now.

This is step 4/n to making EDSCs more generalizable.

PiperOrigin-RevId: 236306776

Use Instruction::isBeforeInBlock instead of a linear scan

- use Instruction::isBeforeInBlock instead of a linear scan in
AffineAnalysis.cpp

PiperOrigin-RevId: 236235824

Provide a Builder::getNamedAttr and (Instruction|Function)::setAttr(StringRef, Attribute) to simplify attribute manipulation.

PiperOrigin-RevId: 236222504

Remove PassResult and have the runOnFunction/runOnModule functions return void instead. To signal a pass failure, passes should now invoke the 'signalPassFailure' method. This provides the equivalent functionality when needed, but isn't an intrusive part of the API like PassResult.

PiperOrigin-RevId: 236202029

Change some of the debug messages to use emitError / emitWarning / emitNote - NFC

PiperOrigin-RevId: 236169676

Add support for named function argument attributes. The attribute dictionary is printed after the argument type:

func @arg_attrs(i32 {arg_attr: 10})

func @arg_attrs(%arg0: i32 {arg_attr: 10})

PiperOrigin-RevId: 236136830

LLVM IR Dialect: unify call and call0 operations

When the LLVM IR dialect was implemented, TableGen operation definition scheme
did not support operations with variadic results.  Therefore, the `call`
instruction was split into `call` and `call0` for the single- and zero-result
calls (LLVM does not support multi-result operations).  Unify `call` and
`call0` using the recently added TableGen support for operations with Variadic
results.  Explicitly verify that the new operation has 0 or 1 results.  As a
side effect, this change enables clean-ups in the conversion to the LLVM IR
dialect that no longer needs to rely on wrapped LLVM IR void types when
constructing zero-result calls.

PiperOrigin-RevId: 236119197

ExecutionEngine OptUtils: support -On flags in string-based initialization

Original implementation of OutUtils provided two different LLVM IR module
transformers to be used with the MLIR ExecutionEngine: OptimizingTransformer
parameterized by the optimization levels (similar to -O3 flags) and
LLVMPassesTransformer parameterized by the string formatted similarly to
command line options of LLVM's "opt" tool without support for -O* flags.
Introduce such support by declaring the flags inside the parser and by
populating the pass managers similarly to what "opt" does. Remove the
additional flags from mlir-cpu-runner as they can now be wrapped into
`-llvm-opts` together with other LLVM-related flags.

PiperOrigin-RevId: 236107292

When parsing, check that a region operation is not referencing any of the entry arguments to its block lists.

PiperOrigin-RevId: 236030438

Move the PassExecutor and ModuleToFunctionPassAdaptor classes from PassManager.h to Pass.cpp. This allows for us to remove a dependency on Pass.h from PassManager.h.

PiperOrigin-RevId: 236029339

Add a generic getValue to ElementsAttr for accessing a value at a given index.

PiperOrigin-RevId: 236013669

Remove the stubs for getValue from DenseIntElementsAttr and DenseFPElementsAttr as they aren't implemented. The type for the index is also wrong.

PiperOrigin-RevId: 236010720

Add support for registering pass pipelines to the PassRegistry. This is done by providing a static registration facility PassPipelineRegistration that works similarly to PassRegistration except for it also takes a function that will add necessary passes to a provided PassManager.

  void pipelineBuilder(PassManager &pm) {
      pm.addPass(new MyPass());
      pm.addPass(new MyOtherPass());
  }

  static PassPipelineRegistration Unused("unused", "Unused pass", pipelineBuilder);

This is also useful for registering specializations of existing passes:

  Pass *createFooPass10() { return new FooPass(10); }

  static PassPipelineRegistration Unused("unused", "Unused pass", createFooPass10);

PiperOrigin-RevId: 235996282

Fix incorrect line split in header guard.

PiperOrigin-RevId: 235994785

Detect more trivially redundant constraints better

- detect more trivially redundant constraints in
  FlatAffineConstraints::removeTrivialRedundantConstraints. Redundancy due to
  constraints that only differ in the constant part (eg., 32i + 64j - 3 >= 0, 32 +
  64j - 8 >= 0) is now detected.  The method is still linear-time and does
  a single scan over the FlatAffineConstraints buffer. This detection is useful
  and needed to eliminate redundant constraints generated after FM elimination.

- update GCDTightenInequalities so that we also normalize by the GCD while at
  it. This way more constraints will show up as redundant (232i - 203 >= 0
  becomes i - 1 >= 0 instead of 232i - 232 >= 0) without having to call
  normalizeConstraintsByGCD.

- In FourierMotzkinEliminate, call GCDTightenInequalities and
  normalizeConstraintsByGCD before calling removeTrivialRedundantConstraints()
  - so that more redundant constraints are detected. As a result, redundancy
    due to constraints like i - 5 >= 0, i - 7 >= 0, 2i - 5 >= 0, 232i - 203 >=
    0 is now detected (here only i >= 7 is non-redundant).

As a result of these, a -memref-bound-check on the added test case runs in 16ms
instead of 1.35s (opt build) and no longer returns a conservative result.

PiperOrigin-RevId: 235983550

Fix bug in memref region computation with slice loop bounds. Adds loop IV values to ComputationSliceState which are used in FlatAffineConstraints::addSliceBounds, to ensure that constraints are only added for loop IV values which are present in the constraint system.

PiperOrigin-RevId: 235952912

Port all of the existing passes over to the new pass manager infrastructure. This is largely NFC.

PiperOrigin-RevId: 235952357

Implement the initial pass management functionality.
The definitions of derived passes have now changed and passes must adhere to the following:

* Inherit from a CRTP base class FunctionPass/ModulePass.
   - This class provides several necessary utilities for the transformation:
       . Access to the IR unit being transformed (getFunction/getModule)
       . Various utilities for pass identification and registration.

* Provide a 'PassResult runOn(Function|Module)()' method to transform the IR.
   - This replaces the runOn* functions from before.

This patch also introduces the notion of the PassManager. This allows for simplified construction of pass pipelines and acts as the sole interface for executing passes. This is important as FunctionPass will no longer have a 'runOnModule' method.

PiperOrigin-RevId: 235952008

[TableGen] Use result names in build() methods if possible

This will make it clear which result's type we are expecting in the build() signature.

PiperOrigin-RevId: 235925706

[TableGen] Add more scalar integer and floating-point types

PiperOrigin-RevId: 235918286

EDSC: move FileCheck tests into the source file

EDSC provide APIs for constructing and modifying the IR.  These APIs are
currently tested by a "test" module pass that reads the dummy IR (empty
functions), recognizes certain function names and injects the IR into those
functions based on their name.  This situation is unsatisfactory because the
expected outcome of the test lives in a different file than the input to the
test, i.e. the API calls.

Create a new binary for tests that constructs the IR from scratch using EDSC
APIs and prints it.  Put FileCheck comments next to the printing.  This removes
the need to have a file with dummy inputs and assert on its contents in the
test driver.  The test source includes a simplistic test harness that runs all
functions marked as TEST_FUNC but intentionally does not include any
value-testing functionality.

PiperOrigin-RevId: 235886629

Adding an IREE type kind range definition.

PiperOrigin-RevId: 235849609

Add a new class NamedAttributeList to deduplicate named attribute handling between Function and Instruction.

PiperOrigin-RevId: 235830304

Temp change in FlatAffineConstraints::getSliceBounds() to deal with TODO in
LoopFusion

- getConstDifference in LoopFusion is pending a refactoring to handle bounds
  with min's and max's; it currently asserts on some useful test cases that we
  want to experiment with. This CL changes getSliceBounds to be more
  conservative so as to not trigger the assertion. Filed b/126426796 to track this.

PiperOrigin-RevId: 235826538

Allow function names to have a leading underscore. This matches what is already defined in the spec, but not supported in the implementation.

PiperOrigin-RevId: 235823663

Validate the names of attribute, dialect, and functions during verification. This essentially enforces the parsing rules upon their names.

PiperOrigin-RevId: 235818842

Loop fusion comand line options cleanup

- clean up loop fusion CL options for promoting local buffers to fast memory
space
- add parameters to loop fusion pass instantiation

PiperOrigin-RevId: 235813419

Add parser support for internal named attributes. These are attributes with names starting with ':'.

PiperOrigin-RevId: 235774810

[TableGen] Fix using rewrite()'s qualified name for a bound argument in match()

PiperOrigin-RevId: 235767304

Add a Function::isExternal utility to simplify checks for external functions.

PiperOrigin-RevId: 235746553

Rewrite the dominance info classes to allow for operating on arbitrary control flow within operation regions. The CSE pass is also updated to properly handle nested dominance.

PiperOrigin-RevId: 235742627

Unboxing for static memrefs.

When lowering to MLIR(LLVMDialect) we unbox the structs that result
from converting static memrefs, that is, singleton structs
that just contain a raw pointer. This allows us to get rid of all
"extractvalue" instructions in the common case where shapes are fully
known.

PiperOrigin-RevId: 235706021

LLVM IR dialect and translation: support conditional branches with arguments

Since the goal of the LLVM IR dialect is to reflect LLVM IR in MLIR, the
dialect and the conversion procedure must account for the differences betweeen
block arguments and LLVM IR PHI nodes. In particular, LLVM IR disallows PHI
nodes with different values coming from the same source. Therefore, the LLVM IR
dialect now disallows `cond_br` operations that have identical successors
accepting arguments, which would lead to invalid PHI nodes. The conversion
process resolves the potential PHI source ambiguity by injecting dummy blocks
if the same block is used more than once as a successor in an instruction.
These dummy blocks branch unconditionally to the original successors, pass them
the original operands (available in the dummy block because it is dominated by
the original block) and are used instead of them in the original terminator
operation.

PiperOrigin-RevId: 235682798

Update LLVM Dialect documentation

Addressing post-submit comments.  The `getelementptr` operation now supports
non-constant indexes, similarly to LLVM, and this functionality is exercised by
the lowering to the dialect.  Update the documentation accordingly.

List the values of integer comparison predicates, which currently correspond to
those of CmpIOp in MLIR.  Ideally, we would use strings instead, but it
requires additional support for argument conversion in both the dialect
lowering pass and the LLVM translator.

PiperOrigin-RevId: 235678877

Verify IR produced by TranslateToMLIR functions

TESTED with existing unit tests

PiperOrigin-RevId: 235623059

Cleanup post cl/235283610 - NFC

- remove stale comments + cleanup
- drop MLIRContext * field from expr flattener

PiperOrigin-RevId: 235621178

Convert the dialect type parse/print hooks into virtual functions on the Dialect class.

PiperOrigin-RevId: 235589945

Add support for constructing DenseIntElementsAttr with an array of APInt and
DenseFPElementsAttr with an array of APFloat.

PiperOrigin-RevId: 235581794

[TableGen] Use ArrayRef instead of SmallVectorImpl for suitable method

PiperOrigin-RevId: 235577399

Add a stripmineSink and imperfectly nested tiling primitives.

This CL adds a primitive to perform stripmining of a loop by a given factor and
sinking it under multiple target loops.
In turn this is used to implement imperfectly nested loop tiling (with interchange) by repeatedly calling the stripmineSink primitive.

The API returns the point loops and allows repeated invocations of tiling to achieve declarative, multi-level, imperfectly-nested tiling.

Note that this CL is only concerned with the mechanical aspects and does not worry about analysis and legality.

The API is demonstrated in an example which creates an EDSC block, emits the corresponding MLIR and applies imperfectly-nested tiling:

```cpp
    auto block = edsc::block({
      For(ArrayRef<edsc::Expr>{i, j}, {zero, zero}, {M, N}, {one, one}, {
        For(k1, zero, O, one, {
          C({i, j, k1}) = A({i, j, k1}) + B({i, j, k1})
        }),
        For(k2, zero, O, one, {
          C({i, j, k2}) = A({i, j, k2}) + B({i, j, k2})
        }),
      }),
    });
    // clang-format on
    emitter.emitStmts(block.getBody());

    auto l_i = emitter.getAffineForOp(i), l_j = emitter.getAffineForOp(j),
         l_k1 = emitter.getAffineForOp(k1), l_k2 = emitter.getAffineForOp(k2);
    auto indicesL1 = mlir::tile({l_i, l_j}, {512, 1024}, {l_k1, l_k2});
    auto l_ii1 = indicesL1[0][0], l_jj1 = indicesL1[1][0];
    mlir::tile({l_jj1, l_ii1}, {32, 16}, l_jj1);
```

The edsc::Expr for the induction variables (i, j, k_1, k_2) provide the programmatic hooks from which tiling can be applied declaratively.

PiperOrigin-RevId: 235548228

EDSC: support conditional branch instructions

Leverage the recently introduced support for multiple argument groups and
multiple destination blocks in EDSC Expressions to implement conditional
branches in EDSC.  Conditional branches have two successors and three argument
groups.  The first group contains a single expression of i1 type that
corresponds to the condition of the branch.  The two following groups contain
arguments of the two successors of the conditional branch instruction, in the
same order as the successors.  Expose this instruction to the C API and Python
bindings.

PiperOrigin-RevId: 235542768

EDSC: support branch instructions

The new implementation of blocks was designed to support blocks with arguments.
More specifically, StmtBlock can be constructed with a list of Bindables that
will be bound to block aguments upon construction.  Leverage this functionality
to implement branch instructions with arguments.

This additionally requires the statement storage to have a list of successors,
similarly to core IR operations.

Becauase successor chains can form loops, we need a possibility to decouple
block declaration, after which it becomes usable by branch instructions, from
block body definition.  This is achieved by creating an empty block and by
resetting its body with a new list of instructions.  Note that assigning a
block from another block will not affect any instructions that may have
designated this block as their successor (this behavior is necessary to make
value-type semantics of EDSC types consistent).  Combined, one can now write
generators like

    EDSCContext context;
    Type indexType = ...;
    Bindable i(indexType), ii(indexType), zero(indexType), one(indexType);
    StmtBlock loopBlock({i}, {});
    loopBlock.set({ii = i + one,
                   Branch(loopBlock, {ii})});
    MLIREmitter(&builder)
        .bindConstant<ConstantIndexOp>(zero, 0)
        .bindConstant<ConstantIndexOp>(one, 1)
.emitStmt(Branch(loopBlock, {zero}));

where the emitter will emit the statement and its successors, if present.

PiperOrigin-RevId: 235541892

Use dialect hook registration for constant folding hook.

Deletes specialized mechanism for registering constant folding hook and uses dialect hooks registration mechanism instead.

PiperOrigin-RevId: 235535410

Add constant folding for ExtractElementOp when the aggregate is an OpaqueElementsAttr.

PiperOrigin-RevId: 235533283

EDSC printing: handle integer attributes with bitwidth > 64

This came up in post-submit review. Use LLVM's support for outputting APInt
values directly instead of obtaining a 64-bit integer value from APInt, which
will not work for wider integers.

PiperOrigin-RevId: 235531574

[TableGen] Fix infinite loop in SubstLeaves substitution

Previously we have `auto pos = std::string::find(...) != std::string::npos` as
if condition to control substring substitution. Instead of the position for the
found substring, `pos` will be a boolean value indicating found nor not. Then
used as the replace start position, we were always replacing starting from 0 or
1. If the replaced substring also has the pattern to be matched, we'll see
an infinite loop.

PiperOrigin-RevId: 235504681

Refactor AffineExprFlattener and move FlatAffineConstraints out of IR into
Analysis - NFC

- refactor AffineExprFlattener (-> SimpleAffineExprFlattener) so that it
  doesn't depend on FlatAffineConstraints, and so that FlatAffineConstraints
  could be moved out of IR/; the simplification that the IR needs for
  AffineExpr's doesn't depend on FlatAffineConstraints
- have AffineExprFlattener derive from SimpleAffineExprFlattener to use for
  all Analysis/Transforms purposes; override addLocalFloorDivId in the derived
  class

- turn addAffineForOpDomain into a method on FlatAffineConstraints
- turn AffineForOp::getAsValueMap into an AffineValueMap ctor

PiperOrigin-RevId: 235283610

Spike to define real math ops and lowering of one variant of add to corresponding integer ops.

The only reason in starting with a fixedpoint add is that it is the absolute simplest variant and illustrates the level of abstraction I'm aiming for.

The overall flow would be:
  1. Determine quantization parameters (out of scope of this cl).
  2. Source dialect rules to lower supported math ops to the quantization dialect (out of scope of this cl).
  3. Quantization passes: [-quant-convert-const, -quant-lower-uniform-real-math, -quant-lower-unsupported-to-float] (the last one not implemented yet)
  4. Target specific lowering of the integral arithmetic ops (roughly at the level of gemmlowp) to more fundamental operations (i.e. calls to gemmlowp, simd instructions, DSP instructions, etc).

How I'm doing this should facilitate implementation of just about any kind of backend except TFLite, which has a very course, adhoc surface area for its quantized kernels. Options there include (I'm not taking an opinion on this - just trying to provide options):
  a) Not using any of this: just match q/dbarrier + tf math ops to the supported TFLite quantized op set.
  b) Implement the more fundamental integer math ops on TFLite and convert to those instead of the current op set.

Note that I've hand-waved over the process of choosing appropriate quantization parameters. Getting to that next. As you can see, different implementations will likely have different magic combinations of specific math support, and we will need the target system that has been discussed for some of the esoteric cases (i.e. many DSPs only support POT fixedpoint).

Two unrelated changes to the overall goal of this CL and can be broken out of desired:
  - Adding optional attribute support to TabelGen
  - Allowing TableGen native rewrite hooks to return nullptr, signalling that no rewrite has been done.

PiperOrigin-RevId: 235267229

NFC: Make DialectConversion not directly inherit from ModulePass. It is now just a utility class that performs dialect conversion on a provided module.

PiperOrigin-RevId: 235194067

Rewrite MLPatternLoweringPass to no longer inherit from FunctionPass and just provide a utility function that applies ML patterns.

PiperOrigin-RevId: 235194034

Internal change

PiperOrigin-RevId: 235191129

Document the conversion into the LLVM IR dialect

Add a documentation page on the key points of the conversion to LLVM IR. This
focuses on the aspects of conversion that are relevant for integration of the
LLVM IR dialect (and produced LLVM IR that is mostly a one-to-one translation)
into other projects. In particular, it describes the type conversion rules and
the memref model supporting dynamic sizes.

PiperOrigin-RevId: 235190772

Add a test example of calling a builtin function.

PiperOrigin-RevId: 235149430

Add documentation for the LLVM IR dialect

The LLVM IR pass was bootstrapped without user documentation, following LLVM's
language reference and existing conversions between MLIR standard operations
and LLVM IR instructions.  Provide concise documentation of the LLVM IR dialect
operations.  This documentation does not describe the semantics of the
operations, which should match that of LLVM IR, but highlights the structural
differences in operation definitions, in particular using attributes instead of
constant-only values.  It also describes pseudo-operations that exist only to
make the LLVM IR dialect self-contained within MLIR.

While it could have been possible to generate operation description from
TableGen, this opts for a more concise format where groups of related
operations are described together.

PiperOrigin-RevId: 235149136

Define a PassID class to use when defining a pass. This allows for the type used for the ID field to be self documenting. It also allows for the compiler to know the set alignment of the ID object, which is useful for storing pointer identifiers within llvm data structures.

PiperOrigin-RevId: 235107957

Lower standard DivF and RemF operations to the LLVM IR dialect

Add support for lowering DivF and RemF to LLVM::FDiv and LLMV::FRem
respectively. The lowering is a trivial one-to-one transformation.
The corresponding operations already existed in the LLVM IR dialect and can be
lowered to the LLVM IR proper. Add the necessary tests for scalar and vector
forms.

PiperOrigin-RevId: 234984608

Exposed division and remainder operations in EDSC

This change introduces three new operators in EDSC: Div (also exposed
via Expr.__div__ aka /) -- floating-point division, FloorDiv and CeilDiv
for flooring/ceiling index division.

The lowering to LLVM will be implemented in b/124872679.

PiperOrigin-RevId: 234963217

EDSC: support call instructions

Introduce support for binding MLIR functions as constant expressions.  Standard
constant operation supports functions as possible constant values.

Provide C APIs to look up existing named functions in an MLIR module and expose
them to the Python bindings.  Provide Python bindings to declare a function in
an MLIR module without defining it and to add a definition given a function
declaration.  These declarations are useful when attempting to link MLIR
modules with, e.g., the standard library.

Introduce EDSC support for direct and indirect calls to other MLIR functions.
Internally, an indirect call is always emitted to leverage existing support for
delayed construction of MLIR Values using EDSC Exprs.  If the expression is
bound to a constant function (looked up or declared beforehand), MLIR constant
folding will be able to replace an indirect call by a direct call.  Currently,
only zero- and one-result functions are supported since we don't have support
for multi-valued expressions in EDSC.

Expose function calling interface to Python bindings on expressions by defining
a `__call__` function accepting a variable number of arguments.

PiperOrigin-RevId: 234959444

Update / cleanup pass documentation + Langref alloc examples

PiperOrigin-RevId: 234866323

Fix unused errors in opt build.

PiperOrigin-RevId: 234841678

Print debug message better + switch a dma-generate cl opt to uint64_t

PiperOrigin-RevId: 234840316

Fix for getMemRefSizeInBytes: unsigned -> uint64_t

PiperOrigin-RevId: 234829637

Create OpTrait base class & allow operation predicate OpTraits.

* Introduce a OpTrait class in C++ to wrap the TableGen definition;
* Introduce PredOpTrait and rename previous usage of OpTrait to NativeOpTrait;
* PredOpTrait allows specifying a trait of the operation by way of predicate on the operation. This will be used in future to create reusable set of trait building blocks in the definition of operations. E.g., indicating whether to operands have the same type and allowing locally documenting op requirements by trait composition.
- Some of these building blocks could later evolve into known fixed set as LLVMs backends do, but that can be considered with more data.
* Use the modelling to address one verify TODO in a very local manner.

This subsumes the current custom verify specification which will be removed in a separate mechanical CL.

PiperOrigin-RevId: 234827169

Adding -mlir-print-internal-attributes to print attributes with ':' prefixes.
This enables lit testing of passes that add internal attributes.

PiperOrigin-RevId: 234809949

Allow Builder to create function-type constants

A recent change made ConstantOp::build accept a NumericAttr or assert that a
generic Attribute is in fact a NumericAttr.  The rationale behind the change
was that NumericAttrs have a type that can be used as the result type of the
constant operation.  FunctionAttr also has a type, and it is valid to construct
function-typed constants as exercised by the parser.mlir test.  Relax
ConstantOp::build back to take a generic Attribute.  In the overload that only
takes an attribute, assert that the Attribute is either a NumericAttr or a
FunctionAttr, because it is necessary to extract the type.  In the overload
that takes both type type and the attribute, delegate the attribute type
checking to ConstantOp::verify to prevent non-Builder-based Op construction
mechanisms from creating invalid IR.
PiperOrigin-RevId: 234798569

EDSC: emit composed affine maps again

The recent rework of MLIREmitter switched to using the generic call to
`builder.createOperation` from OperationState instead of individual customized
calls to `builder.create<>`. As a result, regular non-composed affine apply
operations where emitted. Introduce a special case in Expr::build to always
create composed affine maps instead, as it used to be the case before the
rework.

Such special-casing goes against the idea of EDSC generality and extensibility.
Instead, we should consider declaring the composed form canonical for
affine.apply operations and using the builder support for creating operations
and canonicalizing them immediately (ongoing effort).

PiperOrigin-RevId: 234790129

EDSC: introduce min/max only usable inside for upper/lower bounds of a loop

Introduce a type-safe way of building a 'for' loop with max/min bounds in EDSC.
Define new types MaxExpr and MinExpr in C++ EDSC API and expose them to Python
bindings.  Use values of these type to construct 'for' loops with max/min in
newly introduced overloads of the `edsc::For` factory function.  Note that in C
APIs, we still must expose MaxMinFor as a different function because C has no
overloads.  Also note that MaxExpr and MinExpr do _not_ derive from Expr
because they are not allowed to be used in a regular Expr context (which may
produce `affine.apply` instructions not expecting `min` or `max`).

Factory functions `Min` and `Max` in Python can be further overloaded to
produce chains of comparisons and selects on non-index types.  This is not
trivial in C++ since overloaded functions cannot differ by the return type only
(`MaxExpr` or `Expr`) and making `MaxExpr` derive from `Expr` defies the
purpose of type-safe construction.

PiperOrigin-RevId: 234786131

EDSC: support multi-expression loop bounds

MLIR supports 'for' loops with lower(upper) bound defined by taking a
maximum(minimum) of a list of expressions, but does not have first-class affine
constructs for the maximum(minimum).  All these expressions must have affine
provenance, similarly to a single-expression bound.  Add support for
constructing such loops using EDSC.  The expression factory function is called
`edsc::MaxMinFor` to (1) highlight that the maximum(minimum) operation is
applied to the lower(upper) bound expressions and (2) differentiate it from a
`edsc::For` that creates multiple perfectly nested loops (and should arguably
be called `edsc::ForNest`).

PiperOrigin-RevId: 234785996

EDSC: create constants as expressions

Introduce a functionality to create EDSC expressions from typed constants.
This complements the current functionality that uses "unbound" expressions and
binds them to a specific constant before emission. It comes in handy in cases
where we want to check if something is a constant early during construciton
rather than late during emission, for example multiplications and divisions in
affine expressions. This is also consistent with MLIR vision of constants
being defined by an operation (rather than being special kinds of values in the
IR) by exposing this operation as EDSC expression.

PiperOrigin-RevId: 234758020

[EDSC] Fix Stmt::operator= and allow DimOp in For loops

This CL fixes 2 recent issues with EDSCs:
1. the type of the LHS in Stmt::operator=(Expr rhs) should be the same as the (asserted unique) return type;
2. symbols coming from DimOp should be admissible as lower / upper bounds in For

The relevant tests are added.

PiperOrigin-RevId: 234750249

Extend/improve getSliceBounds() / complete TODO + update unionBoundingBox

- compute slices precisely where the destination iteration depends on multiple source
iterations (instead of over-approximating to the whole source loop extent)
- update unionBoundingBox to deal with input with non-matching symbols
- reenable disabled backend test case

PiperOrigin-RevId: 234714069

NFC: Refactor the files related to passes.

* PassRegistry is split into its own source file.
* Pass related files are moved to a new library 'Pass'.

PiperOrigin-RevId: 234705771

DMA placement update - hoist loops invariant DMAs

- hoist DMAs past all loops immediately surrounding the region that the latter
is invariant on - do this at DMA generation time itself

PiperOrigin-RevId: 234628447

[EDSC] Remove dead code in MLIREmitter.cpp

cl/234609882 made EDSCs typed on construction (instead of typed on emission).
This CL cleans up some leftover dead code.

PiperOrigin-RevId: 234627105

Update pass documentation + improve/fix some comments

- add documentation for passes
- improve / fix outdated doc comments

PiperOrigin-RevId: 234627076

Add a generic pattern matcher for matching constant values produced by an operation with zero operands and a single result.

PiperOrigin-RevId: 234616691

EDSC: clean up type casting mechanism

Originally, edsc::Expr had a long enum edsc::ExprKind with all supported types
of operations. Recent Expr extensibility support removed the need to specify
supported types in advance. Replace the no-longer-used blocks of enum values
reserved for unary/binary/ternary/variadic expressions with simple values (it
is still useful to know if an expression is, e.g., binary to access it through
a simpler API).

Furthermore, wrap string-comparison now used to identify specific ops into an
`Expr::is_op<>` function template, that acts similarly to `Instruction::isa<>`.
Introduce `{Unary,Binary,Ternary,Variadic}Expr::make<> ` function template that
creates a Expression emitting the MLIR Op specified as template argument.

PiperOrigin-RevId: 234612916

EDSC: make Expr typed and extensible

Expose the result types of edsc::Expr, which are now stored for all types of
Exprs and not only for the variadic ones.  Require return types when an Expr is
constructed, if it will ever have some.  An empty return type list is
interpreted as an Expr that does not create a value (e.g. `return` or `store`).

Conceptually, all edss::Exprs are now typed, with the type being a (potentially
empty) tuple of return types.  Unbound expressions and Bindables must now be
constructed with a specific type they will take.  This makes EDSC less
evidently type-polymorphic, but we can still write generic code such as

    Expr sumOfSquares(Expr lhs, Expr rhs) { return lhs * lhs + rhs * rhs; }

and use it to construct different typed expressions as

    sumOfSquares(Bindable(IndexType::get(ctx)), Bindable(IndexType::get(ctx)));
    sumOfSquares(Bindable(FloatType::getF32(ctx)),
                 Bindable(FloatType::getF32(ctx)));

On the positive side, we get the following.
1. We can now perform type checking when constructing Exprs rather than during
   MLIR emission.  Nevertheless, this is still duplicates the Op::verify()
   until we can factor out type checking from that.
2. MLIREmitter is significantly simplified.
3. ExprKind enum is only used for actual kinds of expressions.  Data structures
   are converging with AbstractOperation, and the users can now create a
   VariadicExpr("canonical_op_name", {types}, {exprs}) for any operation, even
   an unregistered one without having to extend the enum and make pervasive
   changes to EDSCs.

On the negative side, we get the following.
1. Typed bindables are more verbose, even in Python.
2. We lose the ability to do print debugging for higher-level EDSC abstractions
   that are implemented as multiple MLIR Ops, for example logical disjunction.

This is the step 2/n towards making EDSC extensible.

***

Move MLIR Op construction from MLIREmitter::emitExpr to Expr::build since Expr
now has sufficient information to build itself.

This is the step 3/n towards making EDSC extensible.

Both of these strive to minimize the amount of irrelevant changes.  In
particular, this introduces more complex pretty-printing for affine and binary
expression to make sure tests continue to pass.  It also relies on string
comparison to identify specific operations that an Expr produces.

PiperOrigin-RevId: 234609882

[TableGen] Support using Variadic<Type> in results

This CL extended TableGen Operator class to provide accessors for information on op
results.

In OpDefinitionGen, added checks to make sure only the last result can be variadic,
and adjusted traits and builders generation to consider variadic results.

PiperOrigin-RevId: 234596124

EDSC: introduce support for blocks

EDSC currently implement a block as a statement that is itself a list of
statements.  This suffers from two modeling problems: (1) these blocks are not
addressable, i.e. one cannot create an instruction where thus constructed block
is a successor; (2) they support block nesting, which is not supported by MLIR
blocks.  Furthermore, emitting such "compound statement" (misleadingly named
`Block` in Python bindings) does not actually produce a new Block in the IR.

Implement support for creating actual IR Blocks in EDSC.  In particular, define
a new StmtBlock EDSC class that is neither an Expr nor a Stmt but contains a
list of Stmts.  Additionally, StmtBlock may have (early-) typed arguments.
These arguments are Bindable expressions that can be used inside the block.
Provide two calls in the MLIREmitter, `emitBlock` that actually emits a new
block and `emitBlockBody` that only emits the instructions contained in the
block without creating a new block.  In the latter case, the instructions must
not use block arguments.

Update Python bindings to make it clear when instruction emission happens
without creating a new block.

PiperOrigin-RevId: 234556474

[TableGen] Fix discrepancy between parameter meaning and code logic

The parameter to emitStandaloneParamBuilder() was renamed from hasResultType to
isAllSameType, which is the opposite boolean value. The logic should be changed
to make them consistent.

Also re-ordered some methods in Operator. And few other tiny improvements.

PiperOrigin-RevId: 234478316

Misc. updates/fixes to analysis utils used for DMA generation; update DMA
generation pass to make it drop certain assumptions, complete TODOs.

- multiple fixes for getMemoryFootprintBytes
  - pass loopDepth correctly from getMemoryFootprintBytes()
  - use union while computing memory footprints

- bug fixes for addAffineForOpDomain
  - take into account loop step
  - add domains of other loop IVs in turn that might have been used in the bounds

- dma-generate: drop assumption of "non-unit stride loops being tile space loops
  and skipping those and recursing to inner depths"; DMA generation is now purely
  based on available fast mem capacity and memory footprint's calculated

- handle memory region compute failures/bailouts correctly from dma-generate

- loop tiling cleanup/NFC

- update some debug and error messages to use emitNote/emitError in
  pipeline-data-transfer pass - NFC

PiperOrigin-RevId: 234245969

Support fusing producer loop nests which write to a memref which is live out, provided that the write region of the consumer loop nest to the same memref is a super set of the producer's write region.

PiperOrigin-RevId: 234240958

EDSC: properly construct FunctionTypes

The existing implementation of makeFunctionType in EDSC contains a bug: the
array of input types is overwritten using output types passed as arguments and
the array of output types is never filled in. This leads to all sorts of
incorrect memory behavior. Fill in the array of output types using the proper
argument.

PiperOrigin-RevId: 234177221

ExecutionEngine: provide utils for running CLI-configured LLVM passes

A recent change introduced a possibility to run LLVM IR transformation during
JIT-compilation in the ExecutionEngine.  Provide helper functions that
construct IR transformers given either clang-style optimization levels or a
list passes to run.  The latter wraps the LLVM command line option parser to
parse strings rather than actual command line arguments.  As a result, we can
run either of

    mlir-cpu-runner -O3 input.mlir
    mlir-cpu-runner -some-mlir-pass -llvm-opts="-llvm-pass -other-llvm-pass"

to combine different transformations.  The transformer builder functions are
provided as a separate library that depends on LLVM pass libraries unlike the
main execution engine library.  The library can be used for integrating MLIR
execution engine into external frameworks.

PiperOrigin-RevId: 234173493

LoopFusion: perform a series of loop interchanges to increase the loop depth at which slices of producer loop nests can be fused into constumer loop nests.
*) Adds utility to LoopUtils to perform loop interchange of two AffineForOps.
*) Adds utility to LoopUtils to sink a loop to a specified depth within a loop nest, using a series of loop interchanges.
*) Computes dependences between all loads and stores in the loop nest, and classifies each loop as parallel or sequential.
*) Computes loop interchange permutation required to sink sequential loops (and raise parallel loop nests) while preserving relative order among them.
*) Checks each dependence against the permutation to make sure that dependences would not be violated by the loop interchange transformation.
*) Calls loop interchange in LoopFusion pass on consumer loop nests before fusing in producers, sinking loops with loop carried dependences deeper into the consumer loop nest.
*) Adds and updates related unit tests.

PiperOrigin-RevId: 234158370

[TableGen] Rename Operand to Value to prepare sharing between operand and result

We specify op operands and results in TableGen op definition using the same syntax.
They should be modelled similarly in TableGen driver wrapper classes.

PiperOrigin-RevId: 234153332