Create a model_yolov5s directory at the same level as the tpu-mlir directory, and put both the model files and the image files into it.
$TPUC_ROOT is an environment variable, corresponding to the tpu-mlir_xxxx directory.
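For example, assuming the model and test images ship with the release package under $TPUC_ROOT/regression (the exact paths may differ between releases), the setup could look like this:

$ mkdir model_yolov5s && cd model_yolov5s
$ cp $TPUC_ROOT/regression/model/yolov5s.onnx .
$ cp -rf $TPUC_ROOT/regression/dataset/COCO2017 .
$ cp -rf $TPUC_ROOT/regression/image .
$ mkdir workspace && cd workspace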
ONNX to MLIR
If the input is an image, we need to know the model's preprocessing before converting it. If the model uses preprocessed npz files as input, no preprocessing needs to be considered.
The preprocessing process is formulated as follows (\(x\) represents the input): \(y = (x - mean) \times scale\)
The official yolov5 uses rgb images, and each pixel value is multiplied by 1/255. Converted into mean and scale, this corresponds to a mean of 0.0, 0.0, 0.0 and a scale of 0.0039216, 0.0039216, 0.0039216.
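With these values the ONNX model can be converted to MLIR with model_transform.py. A sketch of the command, run from the workspace directory, assuming the 640x640 input shape of the official yolov5s and the dog.jpg test image (check model_transform.py --help for the exact options in your release):

$ model_transform.py \
    --model_name yolov5s \
    --model_def ../yolov5s.onnx \
    --input_shapes [[1,3,640,640]] \
    --mean 0.0,0.0,0.0 \
    --scale 0.0039216,0.0039216,0.0039216 \
    --keep_aspect_ratio \
    --pixel_format rgb \
    --test_input ../image/dog.jpg \
    --test_result yolov5s_top_outputs.npz \
    --mlir yolov5s.mlir

The generated yolov5s.mlir file is then the input to model_deploy.py below.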
The relevant parameters of model_deploy.py are as follows:
Function of model_deploy parameters
Name               Required?  Explanation
mlir               Y          Mlir file
quantize           Y          Quantization type (F32/F16/BF16/INT8)
chip               Y          The platform that the model will use. Support bm1684x/bm1684/cv183x/cv182x/cv181x/cv180x.
calibration_table  N          The calibration table path. Required for INT8 quantization.
tolerance          N          Tolerance for the minimum similarity between MLIR quantized and MLIR fp32 inference results.
test_input         N          The input file for validation, which can be an image, npy or npz. No validation is carried out if it is not specified.
test_reference     N          Reference data for validating mlir tolerance (in npz format). It is the result of each operator.
compare_all        N          Compare all tensors, if set.
excepts            N          Names of network layers that need to be excluded from validation, separated by commas.
model              Y          Name of the output model file (including path).
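To compile the MLIR file into an F32 bmodel, a command along the following lines can be used (the test_input npz here is assumed to be the input file saved by model_transform.py in the previous step):

$ model_deploy.py \
    --mlir yolov5s.mlir \
    --quantize F32 \
    --chip bm1684x \
    --test_input yolov5s_in_f32.npz \
    --test_reference yolov5s_top_outputs.npz \
    --model yolov5s_1684x_f32.bmodel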
After compilation, a file named ${model_name}_1684x_f32.bmodel is generated.
MLIR to INT8 bmodel
Calibration table generation
Before converting to the INT8 model, you need to run calibration to get a calibration table. The number of input samples is about 100 to 1000, depending on the situation.
Then use the calibration table to generate a symmetric or asymmetric bmodel. If the symmetric one already meets the requirements, the asymmetric one is generally not recommended, because its performance will be slightly worse than that of the symmetric model.
Here is an example of using 100 existing images from COCO2017 to perform calibration:
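A sketch of the command, assuming the mlir file and COCO2017 directory prepared above:

$ run_calibration.py yolov5s.mlir \
    --dataset ../COCO2017 \
    --input_num 100 \
    -o yolov5s_cali_table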
After running the command above, a file named ${model_name}_cali_table will be generated, which is used as the input file for subsequent compilation of the INT8 model.
Compile to INT8 symmetric quantized model
Execute the following command to convert to the INT8 symmetric quantized model:
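A sketch, reusing the file names from the F32 step; the tolerance values here are illustrative and should be adjusted for your model:

$ model_deploy.py \
    --mlir yolov5s.mlir \
    --quantize INT8 \
    --calibration_table yolov5s_cali_table \
    --chip bm1684x \
    --test_input yolov5s_in_f32.npz \
    --test_reference yolov5s_top_outputs.npz \
    --tolerance 0.85,0.45 \
    --model yolov5s_1684x_int8_sym.bmodel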
After compilation, a file named ${model_name}_1684x_int8_sym.bmodel is generated.
Effect comparison
This release package contains a yolov5 use case written in python for object detection on images, with the source code at $TPUC_ROOT/python/samples/detect_yolov5.py. Reading the code shows how the model is used: first preprocess to get the model's input, then run inference to get the output, and finally do the post-processing.
Use the following commands to validate the inference results of the onnx/f32/int8 models respectively.
The onnx model is run as follows to get dog_onnx.jpg:
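A sketch, assuming detect_yolov5.py accepts --input, --model and --output arguments (check the script for its exact interface); the f32 and int8 bmodels are validated the same way by swapping the --model argument and the output name:

$ detect_yolov5.py \
    --input ../image/dog.jpg \
    --model ../yolov5s.onnx \
    --output dog_onnx.jpg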
Due to differences in operating environments, the final results may differ somewhat from Fig. 3.1.
Model performance test
The following operations need to be performed outside of Docker.
Install libsophon
Please refer to the libsophon manual for installation instructions.
Check the performance of BModel
After installing libsophon, you can use bmrt_test to test the accuracy and performance of the bmodel. You can choose a suitable model by estimating the maximum fps of the model based on the output of bmrt_test.
# Test the bmodels compiled above
# The --bmodel parameter is followed by the bmodel file
$ cd $TPUC_ROOT/../model_yolov5s/workspace
$ bmrt_test --bmodel yolov5s_1684x_f32.bmodel
$ bmrt_test --bmodel yolov5s_1684x_int8_asym.bmodel
$ bmrt_test --bmodel yolov5s_1684x_int8_sym.bmodel
Take the output of the last command as an example (the log is partially truncated here):
The following information can be learned from the output above:
Lines 05-08: the input and output information of bmodel
Line 19: running time, of which the TPU takes 4009us and the CPU takes 113us. The CPU time here mainly refers to the waiting time of the call on the HOST side.