APIs are provided that incorporate typical workflows for converting an FP32 model
to lower precision with minimal accuracy loss.
+Quantization requires users to be aware of three concepts:
+
+#. Quantization Config (Qconfig): Specifies how weights and activations are to be quantized. Qconfig is needed to create a quantized model.
+#. Backend: Refers to kernels that support quantization, usually with different numerics.
+#. Quantization engine (torch.backends.quantized.engine): When a quantized model is executed, the qengine specifies which backend is to be used for execution. It is important to ensure that the qengine is consistent with the Qconfig, as in the sketch below.
+
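+For example, a minimal sketch of keeping these three pieces consistent for
+post-training static quantization on a server (x86) build; `model_fp32` and
+the calibration step are hypothetical placeholders, and the model is assumed
+to already contain the required `QuantStub`/`DeQuantStub` modules::
+
+    import torch
+
+    model_fp32.eval()
+
+    # Qconfig: how weights and activations are observed and quantized
+    model_fp32.qconfig = torch.quantization.get_default_qconfig('fbgemm')
+
+    # qengine: which backend kernels execute the quantized model;
+    # it must be consistent with the qconfig chosen above
+    torch.backends.quantized.engine = 'fbgemm'
+
+    prepared = torch.quantization.prepare(model_fp32)
+    # ... run representative calibration data through `prepared` ...
+    model_int8 = torch.quantization.convert(prepared)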
+
Natively supported backends
---------------------------
* ARM CPUs (typically found in mobile/embedded devices), via
`qnnpack` (`<https://github.com/pytorch/QNNPACK>`_).
-The corresponding implementation is chosen automatically based on the PyTorch build mode.
+The corresponding implementation is chosen automatically based on the PyTorch build mode, though users
+can override this by setting `torch.backends.quantized.engine` to `fbgemm` or `qnnpack`.
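+For instance, a sketch of forcing the mobile kernels on a desktop build, e.g.
+to reproduce mobile numerics during development::
+
+    import torch
+
+    # Use the qnnpack kernels for all subsequent quantized operations
+    torch.backends.quantized.engine = 'qnnpack'
+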
.. note::
When preparing a quantized model, it is necessary to ensure that the qconfig
-and the qengine used for quantized computations match the backend on which
+and the engine used for quantized computations match the backend on which
the model will be executed. The qconfig controls the type of observers used
during the quantization passes. The qengine controls whether the `fbgemm` or
`qnnpack` specific packing function is used when packing weights for linear