NEON kernel to accumulate the biases, if provided, or downscale in case of quantized input.

#include <NEDirectConvolutionLayerOutputStageKernel.h>

Public Member Functions
const char *	name () const override
	Name of the kernel.

	NEDirectConvolutionLayerOutputStageKernel ()
	Default constructor.

	NEDirectConvolutionLayerOutputStageKernel (const NEDirectConvolutionLayerOutputStageKernel &)=delete
	Prevent instances of this class from being copied (as this class contains pointers).

NEDirectConvolutionLayerOutputStageKernel &	operator= (const NEDirectConvolutionLayerOutputStageKernel &)=delete
	Prevent instances of this class from being copied (as this class contains pointers).

	NEDirectConvolutionLayerOutputStageKernel (NEDirectConvolutionLayerOutputStageKernel &&)=default
	Allow instances of this class to be moved.

NEDirectConvolutionLayerOutputStageKernel &	operator= (NEDirectConvolutionLayerOutputStageKernel &&)=default
	Allow instances of this class to be moved.

	~NEDirectConvolutionLayerOutputStageKernel ()=default
	Default destructor.

void	configure (ITensor *input, const ITensor *bias=nullptr, ITensor *output=nullptr, int result_fixedpoint_multiplier=0, int result_shift=0, int result_offset_after_shift=0)
	Set the accumulate buffer and the biases of the kernel.

void	run (const Window &window, const ThreadInfo &info) override
	Execute the kernel on the passed window.

Public Member Functions inherited from ICPPKernel
virtual	~ICPPKernel ()=default
	Default destructor.

Public Member Functions inherited from IKernel
	IKernel ()
	Constructor.

virtual	~IKernel ()=default
	Destructor.

virtual bool	is_parallelisable () const
	Indicates whether or not the kernel is parallelisable.

virtual BorderSize	border_size () const
	The size of the border for that kernel.

const Window &	window () const
	The maximum window the kernel can be executed on.

Static Public Member Functions
static Status	validate (const ITensorInfo *input, const ITensorInfo *bias=nullptr, const ITensorInfo *output=nullptr)
	Static function to check if given info will lead to a valid configuration of NEDirectConvolutionLayerOutputStageKernel.

Detailed Description

NEON kernel to accumulate the biases, if provided, or downscale in case of quantized input.

Note: We assume bias to be shared

Definition at line 36 of file NEDirectConvolutionLayerOutputStageKernel.h.
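
The bias accumulation described above can be illustrated with a minimal, library-independent sketch. It assumes that "shared" means one bias value per output channel, applied to every element of that channel, and that the accumulator is stored channel by channel in a flat buffer. The helper `add_shared_bias` and its `plane_size` parameter are hypothetical and are not part of the library API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not a library function): adds one bias value per
// channel to an accumulator laid out as [channel][plane_size] in a flat buffer.
// Assumption: "shared" bias means all elements of a channel use the same value.
void add_shared_bias(std::vector<float> &acc, const std::vector<float> &bias, std::size_t plane_size)
{
    if(plane_size == 0 || acc.size() != bias.size() * plane_size)
    {
        throw std::invalid_argument("accumulator size must equal channels * plane_size");
    }
    for(std::size_t c = 0; c < bias.size(); ++c)
    {
        for(std::size_t i = 0; i < plane_size; ++i)
        {
            // In-place accumulation, analogous to calling configure() without `output`.
            acc[c * plane_size + i] += bias[c];
        }
    }
}
```

The real kernel vectorises this loop with NEON intrinsics; the scalar version only shows the data flow.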

Constructor & Destructor Documentation

NEDirectConvolutionLayerOutputStageKernel ( )

Default constructor.

Referenced by NEDirectConvolutionLayerOutputStageKernel::name().

NEDirectConvolutionLayerOutputStageKernel ( const NEDirectConvolutionLayerOutputStageKernel & )

delete

Prevent instances of this class from being copied (as this class contains pointers).

NEDirectConvolutionLayerOutputStageKernel ( NEDirectConvolutionLayerOutputStageKernel && )

default

Allow instances of this class to be moved.

~NEDirectConvolutionLayerOutputStageKernel ( )

default

Default destructor.

Referenced by NEDirectConvolutionLayerOutputStageKernel::name().

Member Function Documentation

void configure	(	ITensor *	input,
		const ITensor *	bias = `nullptr`,
		ITensor *	output = `nullptr`,
		int	result_fixedpoint_multiplier = `0`,
		int	result_shift = `0`,
		int	result_offset_after_shift = `0`
	)

Set the accumulate buffer and the biases of the kernel.

Parameters

[in,out]	input	Input to add the bias to. If `output` is not specified then accumulation is done in-place. Data type supported: QS16/QS32/F16/F32
[in]	bias	(Optional) The shared bias tensor to add. It must be 1D Tensor. Data type supported: Same as `input`
[out]	output	(Optional) If the output tensor is specified the accumulation is done out-of-place. (Defaults to nullptr) Data type supported: QS8/QS16/F16/F32
[in]	result_fixedpoint_multiplier	(Optional) Fixed-point value by which each element of the input matrix is multiplied once the result_offset has been added
[in]	result_shift	(Optional) Integer exponent of the power of two by which the result of the fixed-point multiplication is divided, rounding to nearest
[in]	result_offset_after_shift	(Optional) Offset to be applied to the result before converting it back to QASYMM8

Referenced by NEDirectConvolutionLayerOutputStageKernel::name().
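
The three quantization parameters describe a multiply, shift, offset sequence. The scalar sketch below shows one plausible reading of that sequence; it is not taken from the kernel source. It assumes the multiplier is a Q0.31 fixed-point value applied with a rounding, doubling high multiply, that `result_shift` lies in [0, 30], and that the final value is clamped to the 8-bit range of QASYMM8. All function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Assumed semantics: returns round(a * b / 2^31), saturating the single
// overflow case (both operands equal to INT32_MIN).
std::int32_t rounding_doubling_high_mul(std::int32_t a, std::int32_t b)
{
    if(a == b && a == std::numeric_limits<std::int32_t>::min())
    {
        return std::numeric_limits<std::int32_t>::max();
    }
    const std::int64_t ab    = static_cast<std::int64_t>(a) * b;
    const std::int64_t nudge = ab >= 0 ? (1LL << 30) : (1 - (1LL << 30));
    return static_cast<std::int32_t>((ab + nudge) / (1LL << 31));
}

// Divides by 2^exponent, rounding to nearest. Relies on arithmetic right
// shift for negative values, which mainstream compilers provide.
std::int32_t rounding_divide_by_pow2(std::int32_t x, int exponent)
{
    const std::int32_t mask      = (std::int32_t{ 1 } << exponent) - 1;
    const std::int32_t remainder = x & mask;
    const std::int32_t threshold = (mask >> 1) + (x < 0 ? 1 : 0);
    return (x >> exponent) + (remainder > threshold ? 1 : 0);
}

// Hypothetical scalar output stage: multiply, shift, add offset, clamp to 8 bits.
std::uint8_t requantize(std::int32_t acc, std::int32_t multiplier, int shift, std::int32_t offset_after_shift)
{
    std::int32_t v = rounding_doubling_high_mul(acc, multiplier);
    v              = rounding_divide_by_pow2(v, shift);
    v += offset_after_shift;
    return static_cast<std::uint8_t>(std::min<std::int32_t>(255, std::max<std::int32_t>(0, v)));
}
```

For example, with a multiplier of `1 << 30` (0.5 in Q0.31), a shift of 1 and an offset of 10, an accumulator value of 100 becomes 50 after the multiply, 25 after the shift and 35 after the offset.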

const char* name ( ) const

inline override virtual

Name of the kernel.

Returns: Kernel name

Implements ICPPKernel.

Definition at line 39 of file NEDirectConvolutionLayerOutputStageKernel.h.

References NEDirectConvolutionLayerOutputStageKernel::configure(), arm_compute::test::validation::info(), NEDirectConvolutionLayerOutputStageKernel::NEDirectConvolutionLayerOutputStageKernel(), NEDirectConvolutionLayerOutputStageKernel::operator=(), NEDirectConvolutionLayerOutputStageKernel::run(), NEDirectConvolutionLayerOutputStageKernel::validate(), IKernel::window(), and NEDirectConvolutionLayerOutputStageKernel::~NEDirectConvolutionLayerOutputStageKernel().

     {
         return "NEDirectConvolutionLayerOutputStageKernel";
     }

NEDirectConvolutionLayerOutputStageKernel& operator= ( const NEDirectConvolutionLayerOutputStageKernel & )

delete

Prevent instances of this class from being copied (as this class contains pointers).

Referenced by NEDirectConvolutionLayerOutputStageKernel::name().

NEDirectConvolutionLayerOutputStageKernel& operator= ( NEDirectConvolutionLayerOutputStageKernel && )

default

Allow instances of this class to be moved.

void run	(	const Window &	window,
		const ThreadInfo &	info
	)

override virtual

Execute the kernel on the passed window.

Warning: If is_parallelisable() returns false then the passed window must be equal to window()

Note: The window has to be a region within the window returned by the window() method; The width of the window has to be a multiple of num_elems_processed_per_iteration().

Parameters

[in]	window	Region on which to execute the kernel. (Must be a region of the window returned by window())
[in]	info	Info about executing thread and CPU.

Implements ICPPKernel.

Referenced by NEDirectConvolutionLayerOutputStageKernel::name().
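
The two constraints on the window passed to run() can be expressed as a simple check. The sketch below reduces the window to a single dimension; `Range` and `is_valid_sub_range` are hypothetical stand-ins written for illustration and do not mirror the library's `Window` class.

```cpp
#include <cstddef>

// Hypothetical 1D stand-in for one dimension of a window: [start, end).
struct Range
{
    std::size_t start;
    std::size_t end;
};

// Returns true if `sub` lies inside `max` and its width is a multiple of the
// number of elements processed per iteration.
bool is_valid_sub_range(const Range &sub, const Range &max, std::size_t elems_per_iteration)
{
    if(elems_per_iteration == 0 || sub.start > sub.end)
    {
        return false;
    }
    const bool inside  = sub.start >= max.start && sub.end <= max.end;
    const bool aligned = (sub.end - sub.start) % elems_per_iteration == 0;
    return inside && aligned;
}
```

A scheduler that splits the maximum window across threads would need each slice to pass a check of this kind before handing it to run().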

static Status validate	(	const ITensorInfo *	input,
		const ITensorInfo *	bias = `nullptr`,
		const ITensorInfo *	output = `nullptr`
	)

static

Static function to check if given info will lead to a valid configuration of NEDirectConvolutionLayerOutputStageKernel.

Parameters

[in]	input	Input to add the bias to. If `output` is not specified then accumulation is done in-place. Data type supported: QS16/QS32/F16/F32
[in]	bias	(Optional) The shared bias tensor to add. It must be 1D Tensor. Data type supported: Same as `input`
[in]	output	(Optional) If the output tensor is specified the accumulation is done out-of-place. (Defaults to nullptr) Data type supported: QS8/QS16/F16/F32

Returns: a status

Referenced by NEDirectConvolutionLayerOutputStageKernel::name().

The documentation for this class was generated from the following file:

arm_compute/core/NEON/kernels/NEDirectConvolutionLayerOutputStageKernel.h
