compiler/circle-mpqsolver/README.md

# circle-mpqsolver
_circle-mpqsolver_ provides light-weight methods for finding a high-quality mixed-precision model
within a reasonable time.

## Methods

### Bisection
A model is split into two parts: front and back. One part is quantized to uint8 and the other to
int16. The precision of front and back is determined by our proxy metric, the upper bound of total
layer errors. (See https://github.com/Samsung/ONE/pull/10170#discussion_r1042246598 for more details.)

The boundary between front and back is decided by the depth of operators (depth: distance
from the input to the operator), i.e., given a depth d, layers with a depth less than d are included
in front, and the rest are included in back. Bisection performs a binary search to find a
depth that achieves a qerror less than target_qerror.
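
The depth-based split described above can be illustrated with a minimal Python sketch. It is not the solver's implementation: the toy graph, the helper names `operator_depths` and `split_by_depth`, and the choice of shortest-path (breadth-first) distance as the depth are all assumptions made for illustration.

```python
from collections import deque


def operator_depths(graph, inputs):
    """Return the breadth-first distance of every node from the model inputs.

    Assumption: "depth" is taken as the shortest distance from any input.
    `graph` maps a node name to the list of its successor node names.
    """
    depths = {name: 0 for name in inputs}
    queue = deque(inputs)
    while queue:
        node = queue.popleft()
        for successor in graph.get(node, []):
            if successor not in depths:  # first visit gives the shortest distance
                depths[successor] = depths[node] + 1
                queue.append(successor)
    return depths


def split_by_depth(depths, d):
    """Nodes with depth < d go to front, the rest go to back."""
    front = sorted(name for name, depth in depths.items() if depth < d)
    back = sorted(name for name, depth in depths.items() if depth >= d)
    return front, back


# Hypothetical toy graph: input -> conv1 -> {relu, conv2} -> add -> fc
graph = {
    "input": ["conv1"],
    "conv1": ["relu", "conv2"],
    "relu": ["add"],
    "conv2": ["add"],
    "add": ["fc"],
    "fc": [],
}
depths = operator_depths(graph, ["input"])
front, back = split_by_depth(depths, 2)
print("front:", front)
print("back:", back)
```

With d = 2 in this toy graph, only the nodes at depth 0 and 1 land in front; every node at depth 2 or more lands in back.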

When front is quantized to Q16, the pseudocode is as follows:
```
    until |depth_max - depth_min| <= 1 do
        current_depth = 0.5 * (depth_max + depth_min)
        if Loss(current_depth) < target_loss
            depth_max = current_depth
        else
            depth_min = current_depth
```
where Loss(current_depth) is the qerror of the mixed-precision model split at current_depth and
target_loss corresponds to target_qerror.
As every iteration halves the remaining range (|depth_max - depth_min|), the search converges in
_~log2(max_depth)_ iterations.
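
The search above can be sketched in Python as follows. This is an illustration only: `bisect_depth`, the toy `loss` function, and the choice to return `depth_max` as the result are assumptions, not the solver's actual code. The toy loss simply decreases as more layers fall into the Q16 front.

```python
def bisect_depth(loss, depth_min, depth_max, target_loss):
    """Binary search over the split depth, for the case where front is Q16.

    `loss(depth)` is a hypothetical callable returning the qerror of the
    mixed-precision model split at `depth`. Returns the final upper bound
    and the number of iterations performed.
    """
    iterations = 0
    while abs(depth_max - depth_min) > 1:
        current_depth = 0.5 * (depth_max + depth_min)
        if loss(current_depth) < target_loss:
            # Accurate enough: try a shallower split (fewer Q16 layers).
            depth_max = current_depth
        else:
            # Not accurate enough: more layers must be quantized to Q16.
            depth_min = current_depth
        iterations += 1
    return depth_max, iterations


MAX_DEPTH = 16


def loss(depth):
    """Toy stand-in for the qerror: shrinks linearly as the Q16 front grows."""
    return 1.0 - depth / MAX_DEPTH


depth, iterations = bisect_depth(loss, 0, MAX_DEPTH, target_loss=0.35)
print(depth, iterations)
```

In this toy setting the search visits depths 8, 12, 10 and 11, so it stops after 4 iterations, which matches log2(16).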

## Usage
Run _circle-mpqsolver_ with the following arguments.

--data: .h5 file with test data

--input_model: Input float model initialized with min-max (recorded model)

--output_model: Output quantized model

--qerror_ratio: Target quantization error ratio. It should be in [0, 1]. 0 indicates the qerror of the full int16 model, and 1 indicates the qerror of the full uint8 model. A lower `qerror_ratio` yields a more accurate solution.

--bisection _mode_: Whether input nodes should be at Q16 precision ['auto', 'true', 'false']

```
$ ./circle-mpqsolver \
  --data <.h5 data> \
  --input_model <input_recorded_model> \
  --output_model <output_model_path> \
  --qerror_ratio <optional value for reproducing target qerror; default is 0.5> \
  --bisection <whether input nodes should be quantized to Q16; default is 'auto'>
```

For example:
```
$ ./circle-mpqsolver \
    --data dataset.h5 \
    --input_model model.recorded.circle \
    --output_model model.q_opt.circle \
    --qerror_ratio 0.4 \
    --bisection true
```

It will produce _model.q_opt.circle_, which is _model.recorded.circle_ quantized to mixed precision
using _dataset.h5_, with input nodes set to _Q16_ precision. The quantization error (_qerror_) of
_model.q_opt.circle_ will be less than
```
qerror(full_q16) + qerror_ratio * (qerror(full_q8) - qerror(full_q16))
```
where _full_q16_ is the model quantized using Q16 precision and _full_q8_ is the model quantized
using Q8 precision.
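
As a worked illustration of the bound above, the following Python sketch evaluates the threshold for given endpoint errors. The function name `target_qerror` and the sample error values are hypothetical; only the formula itself comes from the text.

```python
def target_qerror(qerror_full_q16, qerror_full_q8, qerror_ratio):
    """Interpolate between the full-Q16 and full-Q8 quantization errors."""
    if not 0.0 <= qerror_ratio <= 1.0:
        raise ValueError("qerror_ratio should be in [0, 1]")
    return qerror_full_q16 + qerror_ratio * (qerror_full_q8 - qerror_full_q16)


# Hypothetical error values, chosen only to make the arithmetic easy to follow.
q16_error = 0.25
q8_error = 1.25

# A ratio of 0 gives the full-Q16 error, a ratio of 1 gives the full-Q8 error.
print(target_qerror(q16_error, q8_error, 0.0))
print(target_qerror(q16_error, q8_error, 1.0))
# The default ratio of 0.5 lands halfway between the two.
print(target_qerror(q16_error, q8_error, 0.5))
```

With these sample values, a ratio of 0.4 gives a threshold of 0.25 + 0.4 * (1.25 - 0.25) = 0.65.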