[Matrix] Factor and distribute transposes across multiplies
author Adam Nemet <anemet@apple.com>
Tue, 18 May 2021 16:59:07 +0000 (09:59 -0700)
committer Adam Nemet <anemet@apple.com>
Tue, 25 May 2021 18:12:20 +0000 (11:12 -0700)
commit dfd1bbd00ac09b84c76cc5980cee1deb68475a04
tree 3e1b7180b771867795b39c3d76e67632df469623
parent 2ea6e13bf8189efb09cec89184b21f1db3de0d1c

Now that we can fold some transposes into multiplies (column-major: A * B^t,
row-major: A^t * B), we want to move them around to create the optimal
expressions:

* fold away double transposes while still using them to assert the shape
* sink transposes hoping they cancel out
* lift transposes when both operands are transposed
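
The three rewrites above rest on the identities (A^t)^t = A,
(A * B)^t = B^t * A^t and A^t * B^t = (B * A)^t. The following is a minimal
sketch of them on a toy expression tree, not the code in
LowerMatrixIntrinsics.cpp: the names Mat, T, Mul, sink and lift are
hypothetical, and shapes are not modeled, so the "assert the shape" part of
the first bullet is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mat:
    """A leaf matrix (hypothetical toy type, no shape information)."""
    name: str


@dataclass(frozen=True)
class T:
    """Transpose of a sub-expression."""
    arg: object


@dataclass(frozen=True)
class Mul:
    """Matrix multiply of two sub-expressions."""
    lhs: object
    rhs: object


def sink(e):
    """Push transposes toward the leaves, hoping they cancel out."""
    if isinstance(e, Mat):
        return e
    if isinstance(e, Mul):
        return Mul(sink(e.lhs), sink(e.rhs))
    inner = e.arg
    if isinstance(inner, T):
        # (X^t)^t -> X
        return sink(inner.arg)
    if isinstance(inner, Mul):
        # (A * B)^t -> B^t * A^t
        return Mul(sink(T(inner.rhs)), sink(T(inner.lhs)))
    return e  # transpose of a leaf stays where it is


def lift(e):
    """Pull transposes up when both multiply operands are transposed."""
    if isinstance(e, Mat):
        return e
    if isinstance(e, T):
        inner = lift(e.arg)
        # Lifting may expose a new double transpose; fold it away.
        return inner.arg if isinstance(inner, T) else T(inner)
    lhs, rhs = lift(e.lhs), lift(e.rhs)
    if isinstance(lhs, T) and isinstance(rhs, T):
        # A^t * B^t -> (B * A)^t
        return T(Mul(rhs.arg, lhs.arg))
    return Mul(lhs, rhs)


A, B = Mat("A"), Mat("B")
double_transpose = sink(T(T(A)))       # folds to A
sunk = sink(T(Mul(T(A), B)))           # becomes B^t * A
lifted = lift(Mul(T(A), T(B)))         # becomes (B * A)^t
```

In this sketch, sinking runs first so that transposes end up next to the
leaves, and lifting then merges the pairs that did not cancel into a single
transpose.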

This also modifies the matrix remarks to include the number of exposed
transposes (i.e. transposes that we couldn't fold into a multiply).
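
To illustrate what "exposed" means here, the sketch below counts transposes
that cannot be folded into a multiply on a toy expression tree. It is not the
remark-emission code from this change: the types and count_exposed are
hypothetical, and it assumes only the foldable forms named in this message
(A * B^t for column-major, A^t * B for row-major).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mat:
    """A leaf matrix (hypothetical toy type)."""
    name: str


@dataclass(frozen=True)
class T:
    """Transpose of a sub-expression."""
    arg: object


@dataclass(frozen=True)
class Mul:
    """Matrix multiply of two sub-expressions."""
    lhs: object
    rhs: object


def count_exposed(e, layout):
    """Count transposes that are not folded into a multiply.

    Assumption: with layout "CM" a transpose on the right operand folds,
    and with layout "RM" a transpose on the left operand folds.
    """
    if isinstance(e, Mat):
        return 0
    if isinstance(e, T):
        # A transpose reached here was not absorbed by a parent multiply.
        return 1 + count_exposed(e.arg, layout)
    lhs, rhs = e.lhs, e.rhs
    if layout == "CM" and isinstance(rhs, T):
        rhs = rhs.arg  # folded into the multiply, not counted
    elif layout == "RM" and isinstance(lhs, T):
        lhs = lhs.arg  # folded into the multiply, not counted
    return count_exposed(lhs, layout) + count_exposed(rhs, layout)


A, B = Mat("A"), Mat("B")
folded_cm = count_exposed(Mul(A, T(B)), "CM")         # 0
exposed_rm = count_exposed(Mul(A, T(B)), "RM")        # 1
nested_cm = count_exposed(T(Mul(T(A), B)), "CM")      # 2
```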

The adjustment to the remarks-inlining test is a bit subtle: I am changing the
double transpose to a single transpose so that it is not removed completely.
More importantly, this changes some of the total instruction counts, most
notably for stores, because we can no longer use a vector store.

Differential Revision: https://reviews.llvm.org/D102733
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
llvm/test/Transforms/LowerMatrixIntrinsics/remarks-inlining.ll
llvm/test/Transforms/LowerMatrixIntrinsics/remarks-shared-subtrees.ll
llvm/test/Transforms/LowerMatrixIntrinsics/transpose-opts.ll [new file with mode: 0644]