_circle-mpqsolver_ provides light-weight methods for finding a high-quality mixed-precision model
within a reasonable time.
A model is split into two parts: front and back. One of them is quantized in uint8 and the other in
int16. The precision of the front and the back is determined by our proxy metric, the upper bound of total
layer errors. (See https://github.com/Samsung/ONE/pull/10170#discussion_r1042246598 for more details.)
The boundary between the front and the back is decided by the depth of operators (depth: the distance
from the input to the operator); i.e., given a depth d, layers with a depth less than d are included
in the front, and the rest are included in the back. Bisection performs a binary search to find a
depth which achieves a qerror less than _target_qerror_.
In case the front is quantized into Q16, the pseudocode is the following:
```
until |depth_max - depth_min| <= 1 do
    current_depth = 0.5 * (depth_max + depth_min)
    if Loss(current_depth) < target_loss
        depth_max = current_depth
    else
        depth_min = current_depth
```
, where _Loss(current_depth)_ is the qerror of the mixed-precision model split at _current_depth_.
As every iteration halves the remaining range (|_depth_max_ - _depth_min_|), it converges in
_~log2(max_depth)_ iterations.
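The loop above can be sketched as a small Python routine. Everything here is illustrative: `loss` is a hypothetical stand-in for measuring the qerror of the mixed-precision model split at a given depth (the real solver quantizes and evaluates the model), and the toy loss function simply decreases as more layers are kept in Q16.

```python
def bisect_depth(loss, max_depth, target_loss):
    """Binary-search the split depth, assuming loss(d) decreases as d grows
    (a larger front means more layers stay in Q16)."""
    depth_min, depth_max = 0, max_depth
    while abs(depth_max - depth_min) > 1:
        current_depth = (depth_min + depth_max) // 2  # integer midpoint
        if loss(current_depth) < target_loss:
            depth_max = current_depth  # good enough: try a smaller front
        else:
            depth_min = current_depth  # too lossy: grow the front
    return depth_max

# Toy, monotonically decreasing loss with made-up numbers.
depth = bisect_depth(lambda d: 1.0 - d / 100.0, 100, 0.425)
print(depth)  # 58
```

The invariant maintained is that _loss(depth_max)_ is always below the target while _loss(depth_min)_ is not, so the returned depth is the smallest one meeting the target.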
Run _circle-mpqsolver_ with the following arguments.
--data: .h5 file with test data

--input_model: Input float model initialized with min-max (recorded model)

--output_model: Output quantized model

--qerror_ratio: Target quantization error ratio. It should be in [0, 1]. 0 indicates the qerror of the full int16 model, 1 indicates the qerror of the full uint8 model. A lower `qerror_ratio` yields a more accurate solution.

--bisection _mode_: whether input nodes should be at Q16 precision ['auto', 'true', 'false']

--visq_file: .visq.json file to be used in 'auto' mode

--save_intermediate: path to the directory where all intermediate results will be saved
--input_model <input_recorded_model>
--output_model <output_model_path>
--qerror_ratio <optional value for the target _qerror_; default is 0.5>
--bisection <whether input nodes should be quantized into Q16; default is 'auto'>
--visq_file <*.visq.json file with quantization errors>
--save_intermediate <intermediate_results_path>
--input_model model.recorded.circle
--output_model model.q_opt.circle
It will produce _model.q_opt.circle_, which is _model.recorded.circle_ quantized to mixed precision
using _dataset.h5_, with input nodes set to _Q16_ precision, and the quantization error (_qerror_) of
_model.q_opt.circle_ will be less than
_qerror(full_q16) + qerror_ratio * (qerror(full_q8) - qerror(full_q16))_
(_full_q16_ - the model quantized entirely in Q16 precision, _full_q8_ - the model quantized entirely in Q8 precision).
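As a quick sanity check with made-up numbers: if the fully Q16 model had a qerror of 0.01 and the fully Q8 model 0.05, the default ratio of 0.5 would bound the output model's qerror as follows.

```python
# Hypothetical qerror values, for illustration only.
qerror_q16 = 0.01    # qerror of the fully Q16-quantized model
qerror_q8 = 0.05     # qerror of the fully Q8-quantized model
qerror_ratio = 0.5   # default value of --qerror_ratio

# Upper bound on the qerror of the mixed-precision output model.
target = qerror_q16 + qerror_ratio * (qerror_q8 - qerror_q16)
print(f"{target:.3f}")  # 0.030
```

So the mixed-precision result is guaranteed to land halfway (for ratio 0.5) between the best-case Q16 error and the worst-case Q8 error.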