ARM Compute Library
17.04
NEON kernel which transposes the elements of a matrix in chunks of 1x4 if the input data type is F32 or in chunks of 1x8 if the input data type is F16.
#include <NEGEMMTranspose1xWKernel.h>
Public Member Functions
void configure (const ITensor *input, ITensor *output)
Initialise the kernel's input and output.
void run (const Window &window) override
Execute the kernel on the passed window.
Public Member Functions inherited from ICPPSimpleKernel
ICPPSimpleKernel ()
Constructor.
ICPPSimpleKernel (const ICPPSimpleKernel &)=delete
Prevent instances of this class from being copied (as this class contains pointers).
ICPPSimpleKernel & operator= (const ICPPSimpleKernel &)=delete
Prevent instances of this class from being copied (as this class contains pointers).
ICPPSimpleKernel (ICPPSimpleKernel &&)=default
Allow instances of this class to be moved.
ICPPSimpleKernel & operator= (ICPPSimpleKernel &&)=default
Allow instances of this class to be moved.
~ICPPSimpleKernel ()=default
Default destructor.
Public Member Functions inherited from ICPPKernel
virtual ~ICPPKernel ()=default
Default destructor.
Public Member Functions inherited from IKernel
IKernel ()
Constructor.
virtual ~IKernel ()=default
Destructor.
virtual bool is_parallelisable () const
Indicates whether or not the kernel is parallelisable.
virtual BorderSize border_size () const
The size of the border for that kernel.
const Window & window () const
The maximum window the kernel can be executed on.
NEON kernel which transposes the elements of a matrix in chunks of 1x4 if the input data type is F32 or in chunks of 1x8 if the input data type is F16.
The following example shows how the 1xW transposition works when the input data type is F32:
\[ \left( \begin{array}{cccc} a00 & a01 & a02 & a03 \\ a10 & a11 & a12 & a13 \\ a20 & a21 & a22 & a23 \\ a30 & a31 & a32 & a33 \\ \end{array} \right) \rightarrow \left( \begin{array}{cccccccccccccccc} a00 & a01 & a02 & a03 & a10 & a11 & a12 & a13 & a20 & a21 & a22 & a23 & a30 & a31 & a32 & a33 \\ \end{array} \right) \]
The following example shows how the 1xW transposition works when the input data type is F16:
\[ \left( \begin{array}{cccccccc} a00 & a01 & a02 & a03 & a04 & a05 & a06 & a07 \\ a10 & a11 & a12 & a13 & a14 & a15 & a16 & a17 \\ a20 & a21 & a22 & a23 & a24 & a25 & a26 & a27 \\ a30 & a31 & a32 & a33 & a34 & a35 & a36 & a37 \\ \end{array} \right) \rightarrow \left( \begin{array}{cccccccccccccccccccccccccccccccc} a00 & a01 & a02 & a03 & a04 & a05 & a06 & a07 & a10 & a11 & a12 & a13 & a14 & a15 & a16 & a17 & a20 & a21 & a22 & a23 & a24 & a25 & a26 & a27 & a30 & a31 & a32 & a33 & a34 & a35 & a36 & a37 \\ \end{array} \right) \]
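The rearrangement in the two examples above can be reproduced on a plain row-major buffer. The following is a minimal scalar sketch, not the NEON implementation: the helper name `transpose_1xW_reference` and its parameters are hypothetical and are not part of the ARM Compute Library. It assumes the number of columns is a multiple of W. When the input is wider than W, the sketch places each further chunk in a further output row; that placement is an assumption of the sketch, since the examples above only cover inputs exactly W columns wide.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference helper, NOT part of the ARM Compute Library.
// Input:  row-major matrix with `rows` rows and `cols` columns.
// Output: row-major matrix with (cols / W) rows and (rows * W) columns.
// The 1xW chunk of input row r that starts at column c (a multiple of W)
// is copied to output row c / W, starting at output column r * W.
// Assumption: cols is a multiple of W.
template <typename T>
std::vector<T> transpose_1xW_reference(const std::vector<T> &in,
                                       std::size_t rows,
                                       std::size_t cols,
                                       std::size_t W)
{
    const std::size_t out_cols = rows * W; // width of one output row
    std::vector<T>    out(in.size());

    for(std::size_t r = 0; r < rows; ++r)
    {
        for(std::size_t c = 0; c < cols; c += W)
        {
            for(std::size_t k = 0; k < W; ++k)
            {
                out[(c / W) * out_cols + r * W + k] = in[r * cols + c + k];
            }
        }
    }
    return out;
}
```

Calling the helper with rows = 4, cols = 4 and W = 4 yields the single 16-element row of the F32 example. With rows = 4, cols = 8 and W = 8 it yields the 32-element row of the F16 example.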
Definition at line 69 of file NEGEMMTranspose1xWKernel.h.
Initialise the kernel's input and output.
[in] | input | Input tensor. Data types supported: F32, F16. |
[out] | output | Output tensor. Data type supported: same as input. |
Execute the kernel on the passed window.
[in] | window | Region on which to execute the kernel. (Must be a region of the window returned by window()) |
Implements ICPPKernel.