review.tizen.org Git - platform/upstream/llvm.git/log

Try to fold operations in DialectConversion when trying to legalize.

This change allows for DialectConversion to attempt folding as a mechanism to legalize illegal operations. This also expands folding support in OpBuilder::createOrFold to generate new constants when folding, and also enables it to work in the context of a PatternRewriter.

PiperOrigin-RevId: 285448440

Add a type range for the XLA HLO dialect.

PiperOrigin-RevId: 285437835

Fix maskAndClamp in gpu.all_reduce.

The clamp value determines the returned predicate. Previously, the clamp value was fixed to 31 and the predicate was therefore always true. This is incorrect for partial warp reductions, but went unnoticed because the returned values happened to be zero (but it could be anything).

PiperOrigin-RevId: 285343160

NFC: Cleanup the various Op::print methods.

This cleans up the implementation of the various operation print methods. This is done via a combination of code cleanup, adding new streaming methods to the printer(e.g. operand ranges), etc.

PiperOrigin-RevId: 285285181

Fix logic on when to emit collective type but separate arg builder

Got the comment right but the code wrong :/

PiperOrigin-RevId: 285270561

[VectorOps] Add lowering of vector.shuffle to LLVM IR

For example, a shuffle

%1 = vector.shuffle %arg0, %arg1 [0 : i32, 1 : i32] : vector<2xf32>, vector<2xf32>

becomes a direct LLVM shuffle

0 = llvm.shufflevector %arg0, %arg1 [0 : i32, 1 : i32] : !llvm<"<2 x float>">, !llvm<"<2 x float>">

but

%1 = vector.shuffle %a, %b[1 : i32, 0 : i32, 2: i32] : vector<1x4xf32>, vector<2x4xf32>

becomes the more elaborate (note the index permutation that drives
argument selection for the extract operations)

%0 = llvm.mlir.undef : !llvm<"[3 x <4 x float>]">
%1 = llvm.extractvalue %arg1[0] : !llvm<"[2 x <4 x float>]">
%2 = llvm.insertvalue %1, %0[0] : !llvm<"[3 x <4 x float>]">
%3 = llvm.extractvalue %arg0[0] : !llvm<"[1 x <4 x float>]">
%4 = llvm.insertvalue %3, %2[1] : !llvm<"[3 x <4 x float>]">
%5 = llvm.extractvalue %arg1[1] : !llvm<"[2 x <4 x float>]">
%6 = llvm.insertvalue %5, %4[2] : !llvm<"[3 x <4 x float>]">

PiperOrigin-RevId: 285268164

Add type inference variant for separate params builder generated

Add variant that does invoke infer type op interface where defined. Also add entry function that invokes that different separate argument builders for wrapped, unwrapped and inference variant.

PiperOrigin-RevId: 285220709

Retire !linalg.buffer type - NFC

This type is not used anymore now that Linalg view and subview have graduated to std and that alignment is supported on alloc.

PiperOrigin-RevId: 285213424

[Linalg] Add test for fusion of GenericOp with IndexedGenericOp.

PiperOrigin-RevId: 285211797

Added lowering of `std.tanh` to llvm function call to `tanh` and `tanhf`.

Closes tensorflow/mlir#312

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/312 from dfki-ehna:tanh 9e89b072ff91ff390ad739501745114feb3ac856
PiperOrigin-RevId: 285205674

Move cpu runner utils templates to .h

This allows reusing the implementation in various places by just including and permits more easily writing test functions without explicit template instantiations.

This also modifies UnrankedMemRefType to take a template type parameter since it cannot be type agnostic atm.

PiperOrigin-RevId: 285187711

Automated rollback of commit f68ac464d818629e0fe10c23b44ac782d64a12d2

PiperOrigin-RevId: 285162061

Switch from shfl.bfly to shfl.down.

Both work for the current use case, but the latter allows implementing
prefix sums and is a little easier to understand for partial warps.

PiperOrigin-RevId: 285145287

Make OpBuilder::insert virtual instead of OpBuilder::createOperation.

It is sometimes useful to create operations separately from the builder before insertion as it may be easier to erase them in isolation if necessary. One example use case for this is folding, as we will only want to insert newly generated constant operations on success. This has the added benefit of fixing some silent PatternRewriter failures related to cloning, as the OpBuilder 'clone' methods don't call createOperation.

PiperOrigin-RevId: 285086242

Add std.log* and llvm.intr.log* that correspond to the LLVMIR intrinsics

PiperOrigin-RevId: 285073483

Add missing CMake dependency for MLIRTestIR.

PiperOrigin-RevId: 285039153

Fix OSS build

PiperOrigin-RevId: 285036782

Expose a convenience function to add interface attributes to a function.

PiperOrigin-RevId: 285036647

[spirv] Add lowering for std.fdiv, std.frem, std.fsub

Closes tensorflow/mlir#313

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/313 from denis0x0D:sandbox/lowering_std_farith 41715070a74d13bfa9401957478978c1bb8006c0
PiperOrigin-RevId: 285023586

Continue refactoring StructuredOps utilities

This CL adds more common information to StructuredOpsUtils.h
The n_view attribute is retired in favor of args_in + args_out but the CL is otherwise NFC.

PiperOrigin-RevId: 285000621

NFC: Fix naming inconsistency: FuncOpLowering -> GPUFuncOpLowering.

Remove nested anonymous namespace.

PiperOrigin-RevId: 284987357

Roll-forward initial liveness analysis including test cases.

Fix the usage of the map size when appending to the map with [].

PiperOrigin-RevId: 284985916

Automated rollback of commit 98fbf41044d3364dbaf18db81b9e8d9520d14761

PiperOrigin-RevId: 284979684

Add a function to get lowering patterns from GPU to NVVM.

This enables combining the patterns with other patterns into larger lowerings.

PiperOrigin-RevId: 284979271

[Linalg] Add tiling for IndexedGenericOp with a region.

PiperOrigin-RevId: 284949355

Add initial liveness analysis including test cases.

Closes tensorflow/mlir#255

PiperOrigin-RevId: 284935454

[VectorOps] Add lowering of vector.insert to LLVM IR

For example, an insert

  %0 = vector.insert %arg0, %arg1[3 : i32] : f32 into vector<4xf32>

becomes

  %0 = llvm.mlir.constant(3 : i32) : !llvm.i32
  %1 = llvm.insertelement %arg0, %arg1[%0 : !llvm.i32] : !llvm<"<4 x float>">

A more elaborate example, inserting an element in a higher dimension
vector

  %0 = vector.insert %arg0, %arg1[3 : i32, 7 : i32, 15 : i32] : f32 into vector<4x8x16xf32>

becomes

  %0 = llvm.extractvalue %arg1[3 : i32, 7 : i32] : !llvm<"[4 x [8 x <16 x float>]]">
  %1 = llvm.mlir.constant(15 : i32) : !llvm.i32
  %2 = llvm.insertelement %arg0, %0[%1 : !llvm.i32] : !llvm<"<16 x float>">
  %3 = llvm.insertvalue %2, %arg1[3 : i32, 7 : i32] : !llvm<"[4 x [8 x <16 x float>]]">

PiperOrigin-RevId: 284882443

Add VectorOp transform pattern which splits vector TransferReadOps to target vector unroll size.

PiperOrigin-RevId: 284880592

More affine expr simplifications for floordiv and mod

Add one more simplification for floordiv and mod affine expressions.
Examples:
(2*d0 + 1) floordiv 2 is simplified to d0
(8*d0 + 4*d1 + d2) floordiv 4 simplified to 4*d0 + d1 + d2 floordiv 4.
etc.

Similarly, (4*d1 + 1) mod 2 is simplified to 1,
(2*d0 + 8*d1) mod 8 simplified to 2*d0 mod 8.

Change getLargestKnownDivisor to return int64_t to be consistent and
to avoid casting at call sites (since the return value is used in expressions
of int64_t/index type).

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closes tensorflow/mlir#202

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/202 from bondhugula:affine b13fcb2f1c00a39ca5434613a02408e085a80e77
PiperOrigin-RevId: 284866710

Move gpu.launch_func to ODS. NFC

Move the definition of gpu.launch_func operation from hand-rolled C++
implementation to the ODS framework. Also move the documentation. This only
performs the move and remains a non-functional change, a follow-up will clean
up the custom functions that can be auto-generated using ODS.

PiperOrigin-RevId: 284842252

Fold TestLinalgTilePermutePatterns into TestLinalgTransformPatterns - NFC

Centralize all patterns that test Linalg transforms in a single pass.

PiperOrigin-RevId: 284835938

Refactor the various operand/result/type iterators to use indexed_accessor_range.

This has several benefits:
* The implementation is much cleaner and more efficient.
* The ranges now have support for many useful operations: operator[], slice, drop_front, size, etc.
* Value ranges can now directly query a range for their types via 'getTypes()': e.g:
   void foo(Operation::operand_range operands) {
     auto operandTypes = operands.getTypes();
   }

PiperOrigin-RevId: 284834912

[Linalg] Add a Linalg iterator permutation transformation

This patch closes issue tensorflow/mlir#272
We add a standalone iterator permutation transformation to Linalg.
This transformation composes a permutation map with the maps in the
"indexing_maps" attribute. It also permutes "iterator_types"
accordingly.

Change-Id: I7c1e693b8203aeecc595a7c012e738ca1100c857

Closes tensorflow/mlir#307

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/307 from tetuante:issue272 f7908d58792f4111119721885e247045104f1131
PiperOrigin-RevId: 284824102

Uniformize Vector transforms as patterns on the model of Linalg - NFC

This reorganizes the vector transformations to be more easily testable as patterns and more easily composable into fused passes in the future.

PiperOrigin-RevId: 284817474

Add Py API for composing an affine expression with a map. Also allows extracting constant values for const expressions.

PiperOrigin-RevId: 284809623

More convenience build methods for SPIR-V ops.

Add some convenience build methods to SPIR-V ops and update the
lowering to use these methods where possible.

For SPIRV::CompositeExtractOp move the method to deduce type of
element based on base and indices into a convenience function. Some
additional functionality needed to handle differences between parsing
and verification methods.

PiperOrigin-RevId: 284794404

Add a doc on guidelines for contributing a new dialect to the MLIR core repo

Closes tensorflow/mlir#263

PiperOrigin-RevId: 284760931

Drop Markdown style annotations

These come from a non-standard extenion that is not available on Github, so it
only clutters the documentation source with {.mlir} or {.ebnf} tags.

PiperOrigin-RevId: 284733003

Fix build breakage on gcc-5

Avoid `error: could not convert ?(const char*)"reduction"? from ?const char*? to ?llvm::StringLiteral?`. Tested with gcc-5.5.

PiperOrigin-RevId: 284677810

[VectorOps] Add a ShuffleOp to the VectorOps dialect

For example

%0 = vector.shuffle %x, %y [3 : i32, 2 : i32, 1 : i32, 0 : i32] : vector<2xf32>, vector<2xf32>

yields a vector<4xf32> result with a permutation of the elements of %x and %y

PiperOrigin-RevId: 284657191

[VectorOps] Fix off-by-one error in insert/extract validation

PiperOrigin-RevId: 284652653

Refactor the Block support classes.

Each of the support classes for Block are now moved into a new header BlockSupport.h. The successor iterator class is also reimplemented as an indexed_accessor_range. This makes the class more efficient, and expands on its available functionality.

PiperOrigin-RevId: 284646792

Add new indexed_accessor_range_base and indexed_accessor_range classes that simplify defining index-able ranges.

Many ranges want similar functionality from a range type(e.g. slice/drop_front/operator[]/etc.), so these classes provide a generic implementation that may be used by many different types of ranges. This removes some code duplication, and also empowers many of the existing range types in MLIR(e.g. result type ranges, operand ranges, ElementsAttr ranges, etc.). This change only updates RegionRange and ValueRange, more ranges will be updated in followup commits.

PiperOrigin-RevId: 284615679

Fix minor spelling tweaks.

Closes tensorflow/mlir#306

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/306 from shanshanpt:master 11430c2131281d84a432f45e854e29917b336e8d
PiperOrigin-RevId: 284613648

[spirv] Add CompositeConstruct operation.

Closes tensorflow/mlir#308

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/308 from denis0x0D:sandbox/composite_construct 9ef7180f77f9374bcd05afc4f9e6c1d2d72d02b7
PiperOrigin-RevId: 284613617

[spirv] Add spv.IAdd, spv.ISub, and spv.IMul folders

The patterns to be folded away can be commonly generated
during lowering to SPIR-V.

PiperOrigin-RevId: 284604855

Factor out commonly reusable names across structured ops dialects

This CL starts extracting commonalities between dialects that use the structured ops abstractions. Also fixes an OSS build issue where StringRef were incorrectly used with constexpr.

PiperOrigin-RevId: 284591114

ODS: Generate named accessors for raw attributes

Currently named accessors are generated for attributes returning a consumer
friendly type. But sometimes the attributes are used while transforming an
existing op and then the returned type has to be converted back into an
attribute or the raw `getAttr` needs to be used. Generate raw named accessor
for attributes to reference the raw attributes without having to use the string
interface for better compile time verification. This allows calling
`blahAttr()` instead of `getAttr("blah")`.

Raw here refers to returning the underlying storage attribute.

PiperOrigin-RevId: 284583426

Add lowering for module with gpu.kernel_module attribute.

The existing GPU to SPIR-V lowering created a spv.module for every
function with gpu.kernel attribute. A better approach is to lower the
module that the function lives in (which has the attribute
gpu.kernel_module) to a spv.module operation. This better captures the
host-device separation modeled by GPU dialect and simplifies the
lowering as well.

PiperOrigin-RevId: 284574688

Unify vector op unrolling transformation.

Unifies vector op unrolling transformation, by using the same unrolling implementation for contraction and elementwise operations.
Removes fakefork/join operations which are non longer needed now that we have the InsertStridedSlice operation.

PiperOrigin-RevId: 284570784

Minor spelling tweaks

Closes tensorflow/mlir#304

PiperOrigin-RevId: 284568358

[StructuredOps][Linalg] Add a primitive pattern to rewrite the linalg.generic form of matmul to vector form.

This CL uses the newly expanded matcher support to easily detect when a linalg.generic has a multiply-accumulate body. A linalg.generic with such a body is rewritten as a vector contraction.
This CL additionally limits the rewrite to the case of matrix multiplication on contiguous and statically shaped memrefs for now.

Before expanding further, we should harden the infrastructure for expressing custom ops with the structured ops abstraction.

PiperOrigin-RevId: 284566659

Add RegionRange for when need to abstract over different region iteration

Follows ValueRange in representing a generic abstraction over the different
ways to represent a range of Regions. This wrapper is not as ValueRange and only
considers the current cases of interest: MutableArrayRef<Region> and
ArrayRef<std::unique_ptr<Region>> as occurs during op construction vs op region
querying.

Note: ArrayRef<std::unique_ptr<Region>> allows for unset regions, so this range
returns a pointer to a Region instead of a Region.
PiperOrigin-RevId: 284563229

Post-submit cleanups in RecursiveMatchers

This CL addresses leftover cleanups and adds a test mixing RecursiveMatchers and m_Constant
that captures properly.

PiperOrigin-RevId: 284551567

Replace spurious SmallVector constructions with ValueRange

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closes tensorflow/mlir#305

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/305 from bondhugula:value_range 21d1fae73f549e3c8e72b60876eff1b864cea39c
PiperOrigin-RevId: 284541027

Add a layer of recursive matchers that compose.

This CL adds support for building matchers recursively.
The following matchers are provided:

1. `m_any()` can match any value
2. `m_val(Value *)` binds to a value and must match it
3. `RecursivePatternMatcher<OpType, Matchers...>` n-arity pattern that matches `OpType` and whose operands must be matched exactly by `Matchers...`.

This allows building expression templates for patterns, declaratively, in a very natural fashion.
For example pattern `p9` defined as follows:
```
  auto mul_of_muladd = m_Op<MulFOp>(m_Op<MulFOp>(), m_Op<AddFOp>());
  auto mul_of_anyadd = m_Op<MulFOp>(m_any(), m_Op<AddFOp>());
  auto p9 = m_Op<MulFOp>(m_Op<MulFOp>(
                     mul_of_muladd, m_Op<MulFOp>()),
                   m_Op<MulFOp>(mul_of_anyadd, mul_of_anyadd));
```

Successfully matches `%6` in:
```
  %0 = addf %a, %b: f32
  %1 = addf %a, %c: f32 // matched
  %2 = addf %c, %b: f32
  %3 = mulf %a, %2: f32 // matched
  %4 = mulf %3, %1: f32 // matched
  %5 = mulf %4, %4: f32 // matched
  %6 = mulf %5, %5: f32 // matched
```

Note that 0-ary matchers can be used as leaves in place of n-ary matchers. This alleviates from passing explicit `m_any()` leaves.

In the future, we may add extra patterns to specify that operands may be matched in any order.

PiperOrigin-RevId: 284469446

NFC: Expose constFoldBinaryOp via a header

This allows other dialects to reuse the logic to support constant
folding binary operations and reduces code duplication.

PiperOrigin-RevId: 284428721

Update the builder API to take ValueRange instead of ArrayRef<Value *>

This allows for users to provide operand_range and result_range in builder.create<> calls, instead of requiring an explicit copy into a separate data structure like SmallVector/std::vector.

PiperOrigin-RevId: 284360710

Add a new ValueRange class.

This class represents a generic abstraction over the different ways to represent a range of Values: ArrayRef<Value *>, operand_range, result_range. This class will allow for removing the many instances of explicit SmallVector<Value *, N> construction. It has the same memory cost as ArrayRef, and only suffers cost from indexing(if+elsing the different underlying representations).

This change only updates a few of the existing usages, with more to be changed in followups; e.g. 'build' API.

PiperOrigin-RevId: 284307996

Improve Linalg documentation following the Structured Ops presentation.

PiperOrigin-RevId: 284291653

Add a flag to the IRPrinter instrumentation to only print after a pass if there is a change to the IR.

This adds an additional filtering mode for printing after a pass that checks to see if the pass actually changed the IR before printing it. This "change" detection is implemented using a SHA1 hash of the current operation and its children.

PiperOrigin-RevId: 284291089

NFC - update doc, comments, vim syntax file

- for the symbol rules, the code was updated but the doc wasn't.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closes tensorflow/mlir#284

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/284 from bondhugula:doc 9aad8b8a715559f7ce61265f3da3f8a3c11b45ea
PiperOrigin-RevId: 284283712

Fix langref code snippet - NFC

Closes tensorflow/mlir#294

PiperOrigin-RevId: 284281172

NFC: Separate implementation and definition in ConvertStandardToSPIRV.cpp
PiperOrigin-RevId: 284274326

Change inferReturnTypes to return LogicalResult and values

Previously the error case was using a sentinel in the error case which was bad. Also make the one `build` invoke the other `build` to reuse verification there.

And follow up on suggestion to use formatv which I missed during previous review.

PiperOrigin-RevId: 284265762

Replace custom getBody method with an ODS-generated in gpu::LaunchOp

PiperOrigin-RevId: 284262981

During serialization do a walk of ops in module to find spv.module.

During lowering, spv.module might be within other modules (for example
gpu kernel module). Walk the module op to find spirv module to
serialize.

PiperOrigin-RevId: 284262550

Move GPU::LaunchOp to ODS. NFC.

Move the definition of the GPU launch opreation from hand-rolled C++ code to
ODS framework. This only does the moves, a follow-up is necessary to clean up
users of custom functions that could be auto-generated by ODS.

PiperOrigin-RevId: 284261856

Use named traits in the ODS definition of LLVMFuncOp

The "FunctionLike" and "IsIsolatedFromAbove" op traits are now defined as named
records in base ODS file. Use those instead of NativeOpTrait referring to the
C++ class name in the ODS definition of LLVMFuncOp. NFC.

PiperOrigin-RevId: 284260891

[VecOps] Rename vector.[insert|extract]element to just vector.[insert|extract]

Since these operations lower to [insert|extract][element|value] at LLVM
dialect level, neither element nor value would correctly reflect the meaning.

PiperOrigin-RevId: 284240727

LLVM::GlobalOp: take address space as builder argument

Accept the address space of the global as a builder argument when constructing
an LLVM::GlobalOp instance. This decreases the reliance of LLVM::GlobalOp users
on the internal name of the attribute used for this purpose. Update several
uses of the address space in GPU to NVVM conversion.

PiperOrigin-RevId: 284233254

Move GPU::FuncOp definition to ODS - NFC

Move the definition of the GPU function opreation from hand-rolled C++ code to
ODS framework. This only does the moves, a follow-up is necessary to clean up
users of custom functions that could be auto-generated by ODS.

PiperOrigin-RevId: 284233245

Provide a way to get the type of a ValueHandle.

PiperOrigin-RevId: 284221337

[VectorOps] Add lowering of vector.broadcast to LLVM IR

For example, a scalar broadcast

    %0 = vector.broadcast %x : f32 to vector<2xf32>
    return %0 : vector<2xf32>

which expands scalar x into vector [x,x] by lowering
to the following LLVM IR dialect to implement the
duplication over the leading dimension.

    %0 = llvm.mlir.undef : !llvm<"<2 x float>">
    %1 = llvm.mlir.constant(0 : index) : !llvm.i64
    %2 = llvm.insertelement %x, %0[%1 : !llvm.i64] : !llvm<"<2 x float>">
    %3 = llvm.shufflevector %2, %0 [0 : i32, 0 : i32] : !llvm<"<2 x float>">, !llvm<"<2 x float>">
    return %3 : vector<2xf32>

In the trailing dimensions, the operand is simply
"passed through", unless a more elaborate "stretch"
is required.

For example

    %0 = vector.broadcast %arg0 : vector<1xf32> to vector<4xf32>
    return %0 : vector<4xf32>

becomes

    %0 = llvm.mlir.undef : !llvm<"<4 x float>">
    %1 = llvm.mlir.constant(0 : index) : !llvm.i64
    %2 = llvm.extractelement %arg0[%1 : !llvm.i64] : !llvm<"<1 x float>">
    %3 = llvm.mlir.constant(0 : index) : !llvm.i64
    %4 = llvm.insertelement %2, %0[%3 : !llvm.i64] : !llvm<"<4 x float>">
    %5 = llvm.shufflevector %4, %0 [0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm<"<4 x float>">, !llvm<"<4 x float>">
    llvm.return %5 : !llvm<"<4 x float>">

PiperOrigin-RevId: 284219926

Generate builder for ops that use InferTypeOpInterface trait in ODS

For ops with infer type op interface defined, generate version that calls the inferal method on build. This is intermediate step to removing special casing of SameOperandsAndResultType & FirstAttrDereivedResultType. After that would be generating the inference code, with the initial focus on shaped container types. In between I plan to refactor these a bit to reuse generated paths. The intention would not be to add the type inference trait in multiple places, but rather to take advantage of the current modelling in ODS where possible to emit it instead.

Switch the `inferReturnTypes` method to be static.

Skipping ops with regions here as I don't like the Region vs unique_ptr<Region> difference at the moment, and I want the infer return type trait to be useful for verification too. So instead, just skip it for now to avoid churn.

PiperOrigin-RevId: 284217913

Add conversions of GPU func with memory attributions to LLVM/NVVM

GPU functions use memory attributions, a combination of Op attributes and
region arguments, to specify function-wide buffers placed in workgroup or
private memory spaces. Introduce a lowering pattern for GPU functions to be
converted to LLVM functions taking into account memory attributions. Workgroup
attributions get transformed into module-level globals with unique names
derived from function names. Private attributions get converted into
llvm.allocas inside the function body. In both cases, we inject at the
beginning of the function the IR that obtains the raw pointer to the data and
populates a MemRef descriptor based on the MemRef type of buffer, making
attributions compose with the rest of the MemRef lowering and transparent for
use with std.load and std.store. While using raw pointers instead of
descriptors might have been more efficient, it is better implemented as a
canonicalization or a separate transformation so that non-attribution memrefs
could also benefit from it.

PiperOrigin-RevId: 284208396

fix examples in comments

Closes tensorflow/mlir#301

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/301 from AlexandreEichenberger:vect-doc-update 7e5418a9101a4bdad2357882fe660b02bba8bd01
PiperOrigin-RevId: 284202462

Use regex to fix failure when stats are disabled.

It would be nice if we could detect if stats were enabled or not and use 'Requires', but this isn't possible to do at configure time.

Fixes tensorflow/mlir#296

PiperOrigin-RevId: 284200271

Unroll vector masks along with their associated vector arguments.

Updates vector ContractionOp to use proper vector masks (produced by CreateMaskOp/ConstantMaskOp).
Leverages the following canonicalizations in unrolling unit test: CreateMaskOp -> ConstantMaskOp, StridedSliceOp(ConstantMaskOp) -> ConstantMaskOp
Removes IndexTupleOp (no longer needed now that we have vector mask ops).
Updates all unit tests.

PiperOrigin-RevId: 284182168

[spirv] Reorder `erase` and `emplace` to avoid "invalid iterator access".

The iterator should be erased before adding a new entry
into blockMergeInfo to avoid iterator invalidation.

Closes tensorflow/mlir#299

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/299 from denis0x0D:sandbox/reoder_erase 983be565809aa0aadfc7e92962e4d4b282f63c66
PiperOrigin-RevId: 284173235

DimOp folding for alloc/view dynamic dimensions

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closes tensorflow/mlir#253

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/253 from bondhugula:dimop a4b464f24ae63fd259114558d87e11b8ee4dae86
PiperOrigin-RevId: 284169689

minor spelling tweaks

Closes tensorflow/mlir#290

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/290 from kiszk:spelling_tweaks_201912 9d9afd16a723dd65754a04698b3976f150a6054a
PiperOrigin-RevId: 284169681

LLVM::AddressOfOp: properly take into account the address space

The AddressOf operation in the LLVM dialect return a pointer to a global
variable. The latter may be in a non-default address space as indicated by the
"addr_space" attribute. Check that the address space of the pointer returned by
AddressOfOp matches that of the referenced GlobalOp. Update the AddressOfOp
builder to respect this constraint.

PiperOrigin-RevId: 284138860

NFC: Add documentation for `-mlir-print-op-on-diagnostic` and `-mlir-print-stacktrace-on-diagnostic`.

This change adds proper documentation in Diagnostics.md, allowing for users to more easily find them.

PiperOrigin-RevId: 284092336

Add include path to the TestDialect to fix broken build.

PiperOrigin-RevId: 284067891

[Linalg] Add permutation information to tiling

This patch closes issue tensorflow/mlir#271.
It adds an optional permutation map to declarative tiling transformations.
The map is expressed as a list of integers.

Closes tensorflow/mlir#288

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/288 from tetuante:issue271 2df2938d6a1f01b3bc404ded08dea2dd1e10b588
PiperOrigin-RevId: 284064151

Refactor the IRPrinting instrumentation to take a derivable config.

This allows for more interesting behavior from users, e.g. enabling the ability to dump the IR to a separate file for each pass invocation.

PiperOrigin-RevId: 284059447

Add UnrankedMemRef Type

Closes tensorflow/mlir#261

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/261 from nmostafa:nmostafa/unranked 96b6e918f6ed64496f7573b2db33c0b02658ca45
PiperOrigin-RevId: 284037040

[spirv] Add CompositeInsertOp operation

A CompositeInsertOp operation make a copy of a composite object,
while modifying one part of it.

Closes tensorflow/mlir#292

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/292 from denis0x0D:sandbox/composite_insert 2200962b9057bda53cd2f2866b461e2797196380
PiperOrigin-RevId: 284036551

Add support for instance specific pass statistics.

Statistics are a way to keep track of what the compiler is doing and how effective various optimizations are. It is useful to see what optimizations are contributing to making a particular program run faster. Pass-instance specific statistics take this even further as you can see the effect of placing a particular pass at specific places within the pass pipeline, e.g. they could help answer questions like "what happens if I run CSE again here".

Statistics can be added to a pass by simply adding members of type 'Pass::Statistics'. This class takes as a constructor arguments: the parent pass pointer, a name, and a description. Statistics can be dumped by the pass manager in a similar manner to how pass timing information is dumped, i.e. via PassManager::enableStatistics programmatically; or -pass-statistics and -pass-statistics-display via the command line pass manager options.

Below is an example:

struct MyPass : public OperationPass<MyPass> {
  Statistic testStat{this, "testStat", "A test statistic"};

  void runOnOperation() {
    ...
    ++testStat;
    ...
  }
};

$ mlir-opt -pass-pipeline='func(my-pass,my-pass)' foo.mlir -pass-statistics

Pipeline Display:
===-------------------------------------------------------------------------===
                         ... Pass statistics report ...
===-------------------------------------------------------------------------===
'func' Pipeline
  MyPass
    (S) 15 testStat - A test statistic
  MyPass
    (S)  6 testStat - A test statistic

List Display:
===-------------------------------------------------------------------------===
                         ... Pass statistics report ...
===-------------------------------------------------------------------------===
MyPass
  (S) 21 testStat - A test statistic

PiperOrigin-RevId: 284022014

Allow specification of the workgroup size for GPUToSPIRV lowering.

SPIR-V/Vulkan spec requires the workgroups size to be specified with
the spv.ExecutionMode operation. This was hard-wired to be set to a
particular value. It is now changed to be configurable by clients of
the pass or of the patterns that implement the lowering from GPU to
SPIRV.

PiperOrigin-RevId: 284017482

Add spv.AtomicCompareExchangeWeak

PiperOrigin-RevId: 283997917

Add a flag to dump the current stack trace when emitting a diagnostic.

It is often desirable to know where within the program that a diagnostic was emitted, without reverting to assert/unreachable which crash the program. This change adds a flag `mlir-print-stacktrace-on-diagnostic` that attaches the current stack trace as a note to every diagnostic that gets emitted.

PiperOrigin-RevId: 283996373

[spirv] Fix nested loop (de)serialization

For serialization, when we have nested ops, the inner loop will create multiple
SPIR-V blocks. If the outer loop has block arguments (which corresponds to
OpPhi instructions), we defer the handling of OpPhi's parent block handling
until we serialized all blocks and then fix it up with the result <id>. These two
cases happening together was generating invalid SPIR-V blob because we
previously assume the parent block to be the block containing the terminator.
That is not true anymore when the block contains structured control flow ops.
If that happens, it should be fixed to use the structured control flow op's
merge block.

For deserialization, we record a map from header blocks to their corresponding
merge and continue blocks during the initial deserialization and then use the
info to construct spv.selection/spv.loop. The existing implementation will also
fall apart when we have nested loops. If so, we clone all blocks for the outer
loop, including the ones for the inner loop, to the spv.loop's region. So the map
for header blocks' merge info need to be updated; otherwise we are operating
on already deleted blocks.

PiperOrigin-RevId: 283949230

Fix MLIR Build after LLVM upstream JIT changes (getMainJITDylib removed)

The getMainJITDylib() method was removed in 4fc68b9b7f, replace it by creating a JITDylib on the fly.

PiperOrigin-RevId: 283948595

Move ModuleManager functionality into mlir::SymbolTable.

Note for broken code, the following transformations occurred:
ModuleManager::insert(Block::iterator, Operation*) - > SymbolTable::insert(Operation*, Block::iterator)
ModuleManager::lookupSymbol -> SymbolTable::lookup
ModuleManager::getModule() -> SymbolTable::getOp()
ModuleManager::getContext() -> SymbolTable::getOp()->getContext()
ModuleManager::* -> SymbolTable::*
PiperOrigin-RevId: 283944635

Add MLIRIR as a dependency to LLVM and related dialects

Fixes tensorflow/mlir#289

PiperOrigin-RevId: 283914472

Optimize operation ordering to support non-congruent indices.

This change adds support for non-congruent indices in the operation ordering within a basic block. This effect of this is that insertions are less likely to cause an invalidation of the ordering within a block. This has a big effect on modules that have very large basic blocks.

PiperOrigin-RevId: 283858136

Add emitOptional(Error|Warning|Remark) functions to simplify emission with an optional location.

In some situations a diagnostic may optionally be emitted by the presence of a location, e.g. attribute and type verification. These situations currently require extra 'if(loc) emitError(...); return failure()' wrappers that make verification clunky. These new overloads take an optional location and a list of arguments to the diagnostic, and return a LogicalResult. We take the arguments directly and return LogicalResult instead of returning InFlightDiagnostic because we cannot create a valid diagnostic with a null location. This creates an awkward situation where a user may try to treat the, potentially null, diagnostic as a valid one and encounter crashes when attaching notes/etc. Below is an example of how these methods simplify some existing usages:

Before:

  if (loc)
    emitError(*loc, "this is my diagnostic with argument: ") << 5;
  return failure();

After:

  return emitOptionalError(loc, "this is my diagnostic with argument: ", 5);

PiperOrigin-RevId: 283853599

Add a CL option to Standard to LLVM lowering to use alloca instead of malloc/free.

In the future, a more configurable malloc and free interface should be used and exposed via
extra parameters to the `createLowerToLLVMPass`. Until requirements are gathered, a simple CL flag allows generating code that runs successfully on hardware that cannot use the stdlib.

PiperOrigin-RevId: 283833424