The int8 symmetric quantization model performs noticeably worse than the original model on this image: it detects only one target.
Convert to Mixed-Precision Model
After the int8 conversion, run the following commands.
Step 1: Generate the quantization table
Use run_qtable.py to generate the quantization table (qtable). Its parameters are listed below:
run_qtable.py parameters

| Name | Required? | Explanation |
|---|---|---|
| (None) | Y | The mlir file |
| dataset | N | Directory of input samples. Images, npz or npy files are placed in this directory |
| data_list | N | The sample list (cannot be used together with “dataset”) |
| calibration_table | Y | Name of the calibration table file |
| chip | Y | The platform that the model will run on. Supports bm1684x/bm1684/cv183x/cv182x/cv181x/cv180x |
| fp_type | N | The float type used for mixed precision. Supports auto, F16, F32, BF16. The default is auto, meaning the program selects the type automatically |
| input_num | N | The number of samples. The default is 10 |
| expected_cos | N | The minimum cos value expected for the final output layer of the network. The default is 0.99. The larger the value, the more layers may be set to floating point |
| min_layer_cos | N | The minimum cos expected for each layer. A layer whose cos falls below this value is tentatively switched to floating-point calculation. The default is 0.99 |
| debug_cmd | N | A debug command string for development. It is empty by default |
| o | Y | The output quantization table |
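The constraints in the parameter list (which options are required, and that dataset and data_list are mutually exclusive) can be checked before launching the tool. The following is a minimal Python sketch under stated assumptions: `build_run_qtable_cmd` is a hypothetical helper, not part of the toolchain, and it assumes that single-letter options take a `-` prefix and all others a `--` prefix, as seen in the example command in this step. It only assembles the command line and does not run anything.

```python
import shlex

# Options marked "Y" in the parameter list (the mlir file is positional).
REQUIRED = ("calibration_table", "chip", "o")


def build_run_qtable_cmd(mlir_file, **opts):
    """Return an argv list for run_qtable.py (hypothetical helper)."""
    if "dataset" in opts and "data_list" in opts:
        raise ValueError("dataset and data_list cannot be used together")
    missing = [name for name in REQUIRED if name not in opts]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    argv = ["run_qtable.py", mlir_file]
    for name, value in opts.items():
        # Assumption: "-o" style for one-letter names, "--name" otherwise.
        flag = ("-" if len(name) == 1 else "--") + name
        argv += [flag, str(value)]
    return argv


cmd = build_run_qtable_cmd(
    "yolov3_tiny.mlir",
    dataset="../COCO2017",
    calibration_table="yolov3_cali_table",
    min_layer_cos=0.999,
    expected_cos=0.9999,
    chip="bm1684x",
    o="yolov3_qtable",
)
print(shlex.join(cmd))  # one-line, shell-quoted form of the command
```

The resulting list can be passed to `subprocess.run` in an environment where run_qtable.py is on the PATH.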
The operation is as follows. Here min_layer_cos is raised to 0.999: if the default 0.99 were used, the program would detect that the original int8 model already meets the cos of 0.99 and simply stop searching.
$ run_qtable.py yolov3_tiny.mlir \
--dataset ../COCO2017 \
--calibration_table yolov3_cali_table \
--min_layer_cos 0.999 \
--expected_cos 0.9999 \
--chip bm1684x \
-o yolov3_qtable
The final output after execution is printed as follows:
int8 outputs_cos:0.999317
mix model outputs_cos:0.999739
Output mix quantization table to yolov3_qtable
total time:44 second
In this output, int8 outputs_cos is the cosine similarity between the network outputs of the int8 model and the fp32 model; mix model outputs_cos is the cosine similarity of the network output after some layers have been switched to mixed precision; total time is the search time, 44 seconds here.
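For reference, cosine similarity between two output tensors can be computed as below. This is a minimal sketch using only the Python standard library; the input values are made up for illustration, tensors are treated as flat lists of floats, and it is not claimed to match how run_qtable.py computes the value internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch only
    return dot / (norm_a * norm_b)


fp32_out = [0.12, -1.50, 3.25, 0.80]  # made-up reference output
int8_out = [0.10, -1.45, 3.30, 0.75]  # made-up quantized output
print(round(cosine_similarity(fp32_out, int8_out), 6))
```

A value close to 1 means the quantized output points in nearly the same direction as the reference output.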
In addition, the quantization table yolov3_qtable is generated, with the following content:
This table is sorted by cos in ascending order. Each entry gives the cos calculated for a layer after its predecessor layer has been changed to the corresponding floating-point mode. If the cos is still smaller than the min_layer_cos parameter described above, this layer and its immediate successor layer are set to floating-point calculation.
run_qtable.py calculates the output cos of the whole network each time two neighboring layers are set to floating point. If the cos is larger than the specified expected_cos, the search stops. Therefore, a larger expected_cos value causes the program to try setting more layers to floating point.
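The search described above can be sketched roughly as follows. This is a simplified toy model, not the actual run_qtable.py implementation: `search_float_layers`, the layer names, the per-layer cos values and the `toy_network_cos` function are all hypothetical, and the per-layer cos values are kept fixed here, whereas the tool recomputes them after earlier layers have been switched to floating point.

```python
def search_float_layers(layer_cos, successor, eval_network_cos,
                        min_layer_cos=0.99, expected_cos=0.99):
    """Toy search: pick layers to run in floating point.

    layer_cos        -- dict: layer name -> per-layer cos
    successor        -- dict: layer name -> immediate successor (or None)
    eval_network_cos -- callable: set of float layers -> whole-network cos
    """
    float_layers = set()
    # Visit layers from the lowest cos to the highest.
    for name in sorted(layer_cos, key=layer_cos.get):
        if layer_cos[name] >= min_layer_cos:
            break  # all remaining layers already meet min_layer_cos
        # Set this layer and its immediate successor to floating point.
        float_layers.add(name)
        if successor.get(name) is not None:
            float_layers.add(successor[name])
        # Stop as soon as the whole-network cos exceeds expected_cos.
        if eval_network_cos(float_layers) > expected_cos:
            break
    return float_layers


# Made-up four-layer chain A -> B -> C -> D.
layer_cos = {"A": 0.95, "B": 0.97, "C": 0.995, "D": 0.9995}
successor = {"A": "B", "B": "C", "C": "D", "D": None}


def toy_network_cos(float_layers):
    # Stand-in for a real inference run: each float layer helps a little.
    return 0.9993 + 0.0002 * len(float_layers)


low = search_float_layers(layer_cos, successor, toy_network_cos,
                          min_layer_cos=0.999, expected_cos=0.9996)
high = search_float_layers(layer_cos, successor, toy_network_cos,
                           min_layer_cos=0.999, expected_cos=0.99985)
print(sorted(low), sorted(high))
```

In this toy setup, the run with the larger expected_cos ends with more layers in floating point, which matches the behavior described above.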