Fix formatting in caffe2/quantization/server/README.md

author Jongsoo Park <jongsoo@fb.com>

Tue, 22 Jan 2019 18:08:33 +0000 (10:08 -0800)

committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>

Tue, 22 Jan 2019 18:15:37 +0000 (10:15 -0800)
diff --git a/caffe2/quantization/server/README.md b/caffe2/quantization/server/README.md
index 35e94d6..54edd9e 100644
--- a/caffe2/quantization/server/README.md
+++ b/caffe2/quantization/server/README.md
@@ -24,7 +24,7 @@ The users could modify the code to use a different requantization implementation
  
  * 16-bit accumulation with outlier-aware quantization
  
-In current Intel processors, int8 multiplication with int32 accumulation doesn't provide very high speedup: 3 instructions vpmaddubsw + vpmaddwd + vpadd are needed. With 16-bit accumulation, we can use 2 instructions instead with up to 2x instruction throughput per cycle. However, 16-bit accumulation can lead to frequent saturation and hence a big accuracy drop. We minimize the saturation by splitting the weight matrix into two parts, W = W_main + W_outlier, where W_main contains values with small magnitude and W_outlier contains the residual. The matrix multiplication, X*W^T is calculated in two stages, where X*W_main^T uses 16-bit accumulation, and X*W_outlier^T uses 32-bit accumulation. W_outlier is typically sparse hence X*W_outlier^T accounts for a small fraction of the total time.
+In current Intel processors, int8 multiplication with int32 accumulation doesn't provide very high speedup: 3 instructions vpmaddubsw + vpmaddwd + vpadd are needed. With 16-bit accumulation, we can use 2 instructions instead with up to 2x instruction throughput per cycle. However, 16-bit accumulation can lead to frequent saturation and hence a big accuracy drop. We minimize the saturation by splitting the weight matrix into two parts, W = W_main + W_outlier, where W_main contains values with small magnitude and W_outlier contains the residual. The matrix multiplication, X x W^T is calculated in two stages, where X x W_main^T uses 16-bit accumulation, and X x W_outlier^T uses 32-bit accumulation. W_outlier is typically sparse hence X x W_outlier^T accounts for a small fraction of the total time.
  This implementation can be used by setting the Caffe2 operator engine to DNNLOWP_ACC16. Conv, ConvRelu, and FC support DNNLOWP_ACC16. The threshold for outlier can be controlled by nbits_in_non_outlier argument of the operator. For example, when nbits_in_non_outlier=7, a value is outlier if it needs more than 7-bit (e.g. the value is <= -65 or >= 64).
  
  * Dynamic quantization
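The outlier-aware 16-bit accumulation described in the hunk above can be illustrated with a small NumPy sketch. It is a simplified model, not the DNNLOWP/FBGEMM kernel. It assumes outlier entries are moved whole into W_outlier and zeroed in W_main, so W_main + W_outlier == W exactly. It also models 16-bit accumulation as a saturating running sum over K, whereas the real AVX2 path saturates pairwise vpmaddubsw sums and accumulates differently. The helper names `split_outliers`, `matmul_acc16`, `matmul_acc32`, and `outlier_aware_matmul` are hypothetical. Only the `nbits_in_non_outlier` threshold rule (7 bits means a value is an outlier if it is <= -65 or >= 64) comes from the README.

```python
import numpy as np

INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def split_outliers(w: np.ndarray, nbits_in_non_outlier: int):
    """Hypothetical helper: split int8 weights into W_main and W_outlier.

    W_main keeps values representable in `nbits_in_non_outlier` signed bits.
    Assumption: outliers are moved whole into W_outlier (zeroed in W_main).
    """
    lo = -(1 << (nbits_in_non_outlier - 1))     # -64 for 7 bits
    hi = (1 << (nbits_in_non_outlier - 1)) - 1  # 63 for 7 bits
    is_outlier = (w < lo) | (w > hi)
    w_main = np.where(is_outlier, 0, w).astype(np.int8)
    w_outlier = np.where(is_outlier, w, 0).astype(np.int8)
    return w_main, w_outlier


def matmul_acc16(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """X x W^T with a saturating int16 running sum (simplified model)."""
    m, k = x.shape
    n = w.shape[0]
    out = np.zeros((m, n), dtype=np.int32)
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += int(x[i, p]) * int(w[j, p])
                acc = max(INT16_MIN, min(INT16_MAX, acc))  # clamp like int16
            out[i, j] = acc
    return out


def matmul_acc32(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """X x W^T with exact int32 accumulation."""
    return x.astype(np.int32) @ w.astype(np.int32).T


def outlier_aware_matmul(x: np.ndarray, w: np.ndarray,
                         nbits_in_non_outlier: int = 7) -> np.ndarray:
    """X x W_main^T (16-bit acc) + X x W_outlier^T (32-bit acc)."""
    w_main, w_outlier = split_outliers(w, nbits_in_non_outlier)
    return matmul_acc16(x, w_main) + matmul_acc32(x, w_outlier)


# Usage: uint8 activations, int8 weights with one row full of outliers.
x = np.full((1, 4), 255, dtype=np.uint8)
w = np.array([[127, 127, 127, 127], [10, -20, 30, -40]], dtype=np.int8)
print(matmul_acc32(x, w))          # exact result
print(matmul_acc16(x, w))          # saturates on the first row
print(outlier_aware_matmul(x, w))  # matches the exact result
```

With every activation at 255 and a row of 127 weights, the plain 16-bit accumulator saturates at 32767, while the split version recovers the exact int32 result 129540. This happens because each 127 falls outside the 7-bit range [-64, 63] and is routed to the 32-bit path. In practice W_outlier is sparse, so that second path stays cheap.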