From b9275a40034377a99f245cde36f63462a9dc0995 Mon Sep 17 00:00:00 2001
From: Raghuraman Krishnamoorthi
Date: Tue, 31 Aug 2021 09:45:28 -0700
Subject: [PATCH] [ao][docs] Add description of qconfig and qengine to quantization page (#63582)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63582

Current quantization docs do not define qconfig and qengine. Added text to define these concepts before they are used.

ghstack-source-id: 137051719

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D30658656

fbshipit-source-id: a45a0fcdf685ca1c3f5c3506337246a430f8f506
---
 docs/source/quantization.rst | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/docs/source/quantization.rst b/docs/source/quantization.rst
index eb6c74c..7053ca6 100644
--- a/docs/source/quantization.rst
+++ b/docs/source/quantization.rst
@@ -35,6 +35,13 @@ that perform all or part of the computation in lower precision. Higher-level
 APIs are provided that incorporate typical workflows of converting FP32 model
 to lower precision with minimal accuracy loss.
 
+Quantization requires users to be aware of three concepts:
+
+#. Quantization Config (QConfig): Specifies how weights and activations are to be quantized. A QConfig is needed to create a quantized model.
+#. Backend: Refers to the kernels that support quantization, usually with different numerics.
+#. Quantization engine (torch.backends.quantized.engine): When a quantized model is executed, the qengine specifies which backend is used for execution. It is important to ensure that the qengine is consistent with the QConfig.
+
+
 Natively supported backends
 ---------------------------
 
@@ -45,7 +52,8 @@ Today, PyTorch supports the following backends for running quantized operators e
 
 * ARM CPUs (typically found in mobile/embedded devices), via `qnnpack`
   (``_).
 
-The corresponding implementation is chosen automatically based on the PyTorch build mode.
+The corresponding implementation is chosen automatically based on the PyTorch build mode, though users
+have the option to override this by setting `torch.backends.quantized.engine` to `fbgemm` or `qnnpack`.
 
 .. note::
 
@@ -58,7 +66,7 @@ The corresponding implementation is chosen automatically based on the PyTorch bu
 
 When preparing a quantized model, it is necessary to ensure that qconfig
-and the qengine used for quantized computations match the backend on which
+and the engine used for quantized computations match the backend on which
 the model will be executed. The qconfig controls the type of observers used
 during the quantization passes. The qengine controls whether `fbgemm` or
 `qnnpack` specific packing function is used when packing weights for linear
-- 
2.7.4
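The qconfig/qengine pairing that the patched text describes can be illustrated with a short sketch. This is an editorial example, not part of the patch; it assumes a PyTorch build with `fbgemm` kernel support (typical on x86 servers) and uses the `torch.quantization.get_default_qconfig` and `torch.backends.quantized.engine` APIs:

```python
# Sketch: keeping the qconfig and the quantization engine consistent,
# both targeting the `fbgemm` backend (x86 server CPUs).
import torch

# QConfig: specifies the observers used to quantize weights and activations.
# Here we take the backend's default rather than building one by hand.
qconfig = torch.quantization.get_default_qconfig("fbgemm")

# Quantization engine: selects which backend's kernels execute the
# quantized model at runtime. It must match the qconfig's backend.
torch.backends.quantized.engine = "fbgemm"

print(torch.backends.quantized.engine)
```

For mobile/embedded ARM targets, both occurrences of `"fbgemm"` would be swapped for `"qnnpack"`; mixing them (e.g. an `fbgemm` qconfig with the `qnnpack` engine) is exactly the inconsistency the added doc text warns against.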