APIs are provided that incorporate typical workflows for converting an FP32 model
to lower precision with minimal accuracy loss.
+Quantization requires users to be aware of three concepts:
+
+#. Quantization Config (Qconfig): Specifies how weights and activations are to be quantized. Qconfig is needed to create a quantized model.
+#. Backend: Refers to kernels that support quantization, usually with different numerics.
+#. Quantization engine (torch.backends.quantized.engine): When a quantized model is executed, the qengine specifies which backend is to be used for execution. It is important to ensure that the qengine is consistent with the Qconfig, as in the sketch below.
+
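+For example, a minimal sketch of keeping these three pieces consistent for
+post-training static quantization on a server (x86) build; `model_fp32` and
+the calibration step are hypothetical placeholders, and the model is assumed
+to already contain the required `QuantStub`/`DeQuantStub` modules::
+
+    import torch
+
+    model_fp32.eval()
+
+    # Qconfig: how weights and activations are observed and quantized
+    model_fp32.qconfig = torch.quantization.get_default_qconfig('fbgemm')
+
+    # qengine: which backend kernels execute the quantized model;
+    # it must be consistent with the qconfig chosen above
+    torch.backends.quantized.engine = 'fbgemm'
+
+    prepared = torch.quantization.prepare(model_fp32)
+    # ... run representative calibration data through `prepared` ...
+    model_int8 = torch.quantization.convert(prepared)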
+
Natively supported backends
---------------------------
* ARM CPUs (typically found in mobile/embedded devices), via
`qnnpack` (`<https://github.com/pytorch/QNNPACK>`_).
-The corresponding implementation is chosen automatically based on the PyTorch build mode.
+The corresponding implementation is chosen automatically based on the PyTorch build mode, though users
+can override this by setting `torch.backends.quantized.engine` to `fbgemm` or `qnnpack`.
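+For instance, a sketch of forcing the mobile kernels on a desktop build, e.g.
+to reproduce mobile numerics during development::
+
+    import torch
+
+    # Use the qnnpack kernels for all subsequent quantized operations
+    torch.backends.quantized.engine = 'qnnpack'
+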
.. note::
When preparing a quantized model, it is necessary to ensure that the qconfig
-and the qengine used for quantized computations match the backend on which
+and the engine used for quantized computations match the backend on which
the model will be executed. The qconfig controls the type of observers used
during the quantization passes. The qengine controls whether the `fbgemm` or
`qnnpack` specific packing function is used when packing weights for linear