OpenCL kernel used to quantize down the int32 accumulator values of GEMMLowp to QASYMM8.
| CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel () |
| Constructor.

| CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel (const CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel &)=delete |
| Prevent instances of this class from being copied (as this class contains pointers).

CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel & | operator= (const CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel &)=delete |
| Prevent instances of this class from being copied (as this class contains pointers).

| CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel (CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel &&)=default |
| Allow instances of this class to be moved.

CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel & | operator= (CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel &&)=default |
| Allow instances of this class to be moved.

void | configure (const ICLTensor *input, const ICLTensor *bias, ICLTensor *output, int result_fixedpoint_multiplier, int result_shift, int result_offset_after_shift, int min=0, int max=0) |
| Initialise the kernel's input and output.

void | run (const Window &window, cl::CommandQueue &queue) override |
| Enqueue the OpenCL kernel to process the given window on the passed OpenCL command queue.

| ICLKernel () |
| Constructor.

cl::Kernel & | kernel () |
| Returns a reference to the OpenCL kernel of this object.

template<typename T > |
void | add_1D_array_argument (unsigned int &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window) |
| Add the passed 1D array's parameters to the object's kernel's arguments starting from the index idx.

void | add_1D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window) |
| Add the passed 1D tensor's parameters to the object's kernel's arguments starting from the index idx.

void | add_2D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window) |
| Add the passed 2D tensor's parameters to the object's kernel's arguments starting from the index idx.

void | add_3D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window) |
| Add the passed 3D tensor's parameters to the object's kernel's arguments starting from the index idx.

void | add_4D_tensor_argument (unsigned int &idx, const ICLTensor *tensor, const Window &window) |
| Add the passed 4D tensor's parameters to the object's kernel's arguments starting from the index idx.

template<typename T > |
void | add_argument (unsigned int &idx, T value) |
| Add the passed parameters to the object's kernel's arguments starting from the index idx.

void | set_lws_hint (const cl::NDRange &lws_hint) |
| Set the Local-Workgroup-Size hint.

cl::NDRange | lws_hint () const |
| Return the Local-Workgroup-Size hint.

const std::string & | config_id () const |
| Get the configuration ID.

void | set_target (GPUTarget target) |
| Set the targeted GPU architecture.

void | set_target (cl::Device &device) |
| Set the targeted GPU architecture according to the CL device.

GPUTarget | get_target () const |
| Get the targeted GPU architecture.

size_t | get_max_workgroup_size () |
| Get the maximum workgroup size for the device the CLKernelLibrary uses.

template<typename T , unsigned int dimension_size> |
void | add_array_argument (unsigned &idx, const ICLArray< T > *array, const Strides &strides, unsigned int num_dimensions, const Window &window) |
| Add the passed array's parameters to the object's kernel's arguments starting from the index idx.

| IKernel () |
| Constructor.

virtual | ~IKernel ()=default |
| Destructor.

virtual bool | is_parallelisable () const |
| Indicates whether or not the kernel is parallelisable.

virtual BorderSize | border_size () const |
| The size of the border for that kernel.

const Window & | window () const |
| The maximum window the kernel can be executed on.

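The configure() arguments result_fixedpoint_multiplier and result_shift together encode a real-valued output scale. If the multiplier is read as a Q0.31 fixed-point number (as in gemmlowp, an assumption not stated on this page), the fixed-point multiplication and power-of-two division performed by this kernel scale each accumulator by roughly result_fixedpoint_multiplier / 2^31 / 2^result_shift. The sketch below shows one common way to derive both values from a real scale M with 0 < M < 1, modelled on the approach in gemmlowp's quantization example. The struct and function names are hypothetical and are not part of the Compute Library API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical result type: M ~= multiplier / 2^31 / 2^shift.
struct FixedPointScale {
    int32_t multiplier;  // Q0.31 value in [2^30, 2^31)
    int shift;           // right shift applied after the fixed-point multiply
};

// Hypothetical helper: decompose a real scale 0 < M < 1 into a Q0.31
// multiplier and a non-negative right shift.
FixedPointScale quantize_scale_less_than_one(double real_multiplier) {
    assert(real_multiplier > 0.0 && real_multiplier < 1.0);
    int exponent = 0;
    // frexp returns q in [0.5, 1) such that real_multiplier == q * 2^exponent.
    const double q = std::frexp(real_multiplier, &exponent);
    int shift = -exponent;  // exponent <= 0 because real_multiplier < 1
    int64_t q_fixed = static_cast<int64_t>(std::llround(q * static_cast<double>(int64_t{1} << 31)));
    // Rounding can push q up to exactly 1.0 (2^31), which does not fit in int32.
    if (q_fixed == (int64_t{1} << 31)) {
        q_fixed /= 2;
        --shift;
    }
    assert(shift >= 0);  // a negative shift cannot be expressed as a right shift
    return {static_cast<int32_t>(q_fixed), shift};
}
```

For M = 0.25 this yields a multiplier of 2^30 (0.5 in Q0.31) and a shift of 1, since 0.5 / 2 = 0.25. Normalising q into [0.5, 1) keeps as many significant bits as possible in the 31-bit multiplier.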
This kernel takes a final int32 accumulator value (the output of CLGEMMLowpMatrixMultiplyKernel) and processes it to obtain the final QASYMM8 value. The kernel performs the following computations:
- Compute the fixed-point multiplication of each input entry by result_fixedpoint_multiplier
- Add the bias to the result if the bias tensor is not a nullptr
- Divide by a power of two (2^result_shift), rounding to the nearest integer
- Add result_offset_after_shift to each result
- Clamp each value between the min and max bounds passed to configure()
- Clamp the resulting int32 values to the [0..255] range and cast to QASYMM8
Definition at line 45 of file CLGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPointKernel.h.
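As a rough model of the arithmetic described above, the C++ sketch below applies the steps to a single accumulator on the CPU. It assumes gemmlowp-style semantics for the fixed-point multiply (Q0.31, round to nearest, saturating) and for the rounding shift (ties away from zero); these are assumptions, not taken from the kernel source. The helper names are hypothetical, the optional bias step is omitted, and min/max are passed explicitly because this page does not say how the kernel interprets its defaults of min=0, max=0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Q0.31 fixed-point multiply modelled on gemmlowp's SaturatingRoundingDoublingHighMul:
// returns a * b / 2^31 rounded to nearest, saturating the one overflowing case.
int32_t saturating_rounding_doubling_high_mul(int32_t a, int32_t b) {
    if (a == b && a == std::numeric_limits<int32_t>::min()) {
        return std::numeric_limits<int32_t>::max();
    }
    const int64_t ab = static_cast<int64_t>(a) * static_cast<int64_t>(b);
    const int64_t nudge = ab >= 0 ? (int64_t{1} << 30) : (1 - (int64_t{1} << 30));
    return static_cast<int32_t>((ab + nudge) / (int64_t{1} << 31));
}

// Divide by 2^exponent, rounding to nearest with ties away from zero
// (gemmlowp's RoundingDivideByPOT). Relies on >> being an arithmetic shift
// for negative values, which C++20 guarantees.
int32_t rounding_divide_by_pot(int32_t x, int exponent) {
    assert(exponent >= 0 && exponent <= 31);
    const int32_t mask = static_cast<int32_t>((int64_t{1} << exponent) - 1);
    const int32_t remainder = x & mask;
    const int32_t threshold = (mask >> 1) + (x < 0 ? 1 : 0);
    return (x >> exponent) + (remainder > threshold ? 1 : 0);
}

// Hypothetical per-element model of the documented steps (bias omitted).
uint8_t quantize_down_one(int32_t acc, int32_t result_fixedpoint_multiplier, int result_shift,
                          int32_t result_offset_after_shift, int32_t min_bound, int32_t max_bound) {
    assert(min_bound <= max_bound);
    const int32_t scaled = rounding_divide_by_pot(
        saturating_rounding_doubling_high_mul(acc, result_fixedpoint_multiplier), result_shift);
    // Widen to 64 bits so adding the offset cannot overflow.
    const int64_t lo = min_bound;
    const int64_t hi = max_bound;
    int64_t r = static_cast<int64_t>(scaled) + result_offset_after_shift;
    r = std::clamp(r, lo, hi);                      // user-supplied [min, max] bounds
    r = std::clamp(r, int64_t{0}, int64_t{255});    // QASYMM8 range
    return static_cast<uint8_t>(r);
}
```

For example, with result_fixedpoint_multiplier = 1 << 30 (0.5 in Q0.31), result_shift = 2 and result_offset_after_shift = 10, an accumulator of 1000 maps to 1000 × 0.5 / 4 + 10 = 135.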