Fold col offsets into bias; optimize A symmetric quant (#16942)
authorJongsoo Park <jongsoo@fb.com>
Wed, 13 Feb 2019 01:00:33 +0000 (17:00 -0800)
committerFacebook Github Bot <facebook-github-bot@users.noreply.github.com>
Wed, 13 Feb 2019 01:33:06 +0000 (17:33 -0800)
commit92221ad8408c9e95d6e140b857225bd228907d1d
tree640c11d8851c5a208ca032857bf967d1e183f1ed
parent3e1e5d5a8bdbee579339144110e77124cf4e62b2
Fold col offsets into bias; optimize A symmetric quant (#16942)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16942

We can fold the column offsets into the bias when the zero point of the activation is constant.
fbgemm still needs to provide an option to pass column offsets separately for the case where the activation zero point keeps changing (e.g., dynamic quantization).
A trick to optimize the static quantization case is to set the A zero point to 0 after folding it into the bias.
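A minimal sketch of why the folding works, in plain Python with hypothetical shapes and quantization parameters (the names `A_zp`, `col_offsets`, etc. are illustrative, not fbgemm's API): the A_zp * col_offsets term in the quantized matmul is constant per output column when A_zp is fixed, so it can be precomputed into the bias.

```python
# Hypothetical small matrices: uint8 activations A, int8 weights B.
M, K, N = 2, 3, 2
A = [[130, 7, 200], [64, 255, 3]]
B = [[-5, 12], [30, -1], [7, 7]]
bias = [4, -9]
A_zp, B_zp = 13, 5  # constant zero points (static quantization)

# Reference: subtract zero points explicitly, then add the bias.
ref = [[sum((A[i][k] - A_zp) * (B[k][j] - B_zp) for k in range(K)) + bias[j]
        for j in range(N)] for i in range(M)]

# col_offsets[j] = sum_k B[k][j]; row_offsets[i] = sum_k A[i][k]
col_offsets = [sum(B[k][j] for k in range(K)) for j in range(N)]
row_offsets = [sum(A[i][k] for k in range(K)) for i in range(M)]

# Fold the A_zp-dependent terms into the bias once, ahead of time;
# at run time A_zp can then be treated as 0.
folded_bias = [bias[j] - A_zp * (col_offsets[j] - K * B_zp) for j in range(N)]
out = [[sum(A[i][k] * B[k][j] for k in range(K))
        - B_zp * row_offsets[i] + folded_bias[j]
        for j in range(N)] for i in range(M)]

assert out == ref
```

The identity behind it: sum_k (A_ik - A_zp)(B_kj - B_zp) = (A B)_ij - B_zp * row_offsets_i - A_zp * col_offsets_j + K * A_zp * B_zp; the last two terms depend only on j, so they move into the bias.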

This diff also optimizes the case where weights use symmetric quantization: when the B zero point is 0, we use PackAMatrix instead of PackAWithRowOffset.
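A sketch (plain Python, hypothetical numbers) of why symmetric weight quantization allows the cheaper A packing: with B_zero_point == 0 the row-offset term of the quantized matmul vanishes, so packing A no longer needs to compute per-row sums at all, which is the difference between PackAMatrix and PackAWithRowOffset.

```python
M, K, N = 2, 3, 2
A = [[130, 7, 200], [64, 255, 3]]
B = [[-5, 12], [30, -1], [7, 7]]
A_zp, B_zp = 13, 0  # B_zp == 0: symmetric weight quantization

ref = [[sum((A[i][k] - A_zp) * (B[k][j] - B_zp) for k in range(K))
        for j in range(N)] for i in range(M)]

# No row offsets needed: the -B_zp * row_offsets[i] term is zero.
col_offsets = [sum(B[k][j] for k in range(K)) for j in range(N)]
out = [[sum(A[i][k] * B[k][j] for k in range(K)) - A_zp * col_offsets[j]
        for j in range(N)] for i in range(M)]

assert out == ref
```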

TODO:
Ideally, PackAWithRowOffset should perform as fast as PackAMatrix when B_zero_point is 0, to keep client code simpler.
The same applies to PackAWithIm2Col and depth-wise convolution (group convolution already does this).

Reviewed By: csummersea

Differential Revision: D14013931

fbshipit-source-id: e4d313343e2a16a451eb910beed30e35de02a40c
caffe2/quantization/server/fully_connected_dnnlowp_acc16_op.cc
caffe2/quantization/server/fully_connected_dnnlowp_op.cc
caffe2/quantization/server/fully_connected_dnnlowp_op.h
caffe2/quantization/server/fully_connected_dnnlowp_op_test.py