Accuracy Validation

Introduction

Objects

The accuracy validation in TPU-MLIR mainly targets mlir models: F32 validation uses the mlir model of the top layer, while INT8 symmetric and asymmetric quantization validation uses the mlir model of the tpu layer.

Metrics

Currently, the validation is mainly used for classification and object detection networks. The metrics for classification networks are Top-1 and Top-5 accuracy, while object detection networks use the 12 COCO metrics shown below. In general, we record the Average Precision at IoU=0.50 (i.e., the PASCAL VOC metric).

\[\begin{split}
\textbf{Average Precision (AP):} & \\
AP\quad & \text{\% AP at IoU=.50:.05:.95 (primary challenge metric)} \\
AP^{IoU=.50}\quad & \text{\% AP at IoU=.50 (PASCAL VOC metric)} \\
AP^{IoU=.75}\quad & \text{\% AP at IoU=.75 (strict metric)} \\
\textbf{AP Across Scales:} & \\
AP^{small}\quad & \text{\% AP for small objects: $area < 32^2$} \\
AP^{medium}\quad & \text{\% AP for medium objects: $32^2 < area < 96^2$} \\
AP^{large}\quad & \text{\% AP for large objects: $area > 96^2$} \\
\textbf{Average Recall (AR):} & \\
AR^{max=1}\quad & \text{\% AR given 1 detection per image} \\
AR^{max=10}\quad & \text{\% AR given 10 detections per image} \\
AR^{max=100}\quad & \text{\% AR given 100 detections per image} \\
\textbf{AR Across Scales:} & \\
AR^{small}\quad & \text{\% AR for small objects: $area < 32^2$} \\
AR^{medium}\quad & \text{\% AR for medium objects: $32^2 < area < 96^2$} \\
AR^{large}\quad & \text{\% AR for large objects: $area > 96^2$}
\end{split}\]

Datasets

In addition, the dataset used for validation needs to be downloaded separately. Classification networks use the validation set of ILSVRC2012 (50,000 images, https://www.image-net.org/challenges/LSVRC/2012/). The images can be arranged in either of two ways. One is that the dataset directory contains 1000 subdirectories, corresponding to the 1000 classes, with 50 images in each; in this case, no additional label file is required. The other is that all images sit directly under the dataset directory together with an additional txt label file, in which each line gives a number from 1 to 1000 indicating the class of the image at the corresponding position in name order.
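
A minimal sketch of the two arrangements is shown below; the flat-directory name, the label file name, and the class subdirectory names are illustrative assumptions:

# Arrangement 1: one subdirectory per class, no label file needed
datasets/ILSVRC2012_img_val_with_subdir/
    n01440764/                       # class subdirectory (name is illustrative)
        ILSVRC2012_val_00000293.JPEG
        ...
    n01443537/
        ...

# Arrangement 2: all images in one directory plus a txt label file
datasets/ILSVRC2012_img_val_flat/    # illustrative name
    ILSVRC2012_val_00000001.JPEG
    ILSVRC2012_val_00000002.JPEG
    ...
datasets/val_labels.txt              # illustrative name; one line per image, in
                                     # image-name order, each a class id from 1 to 1000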

Object detection networks use the COCO2017 validation set (5000 images, https://cocodataset.org/#download). All images are under the same dataset directory. The corresponding json label file needs to be downloaded as well.
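
A corresponding layout sketch for COCO2017 is shown below; the directory names match those used in the examples later in this chapter, and the image filenames are illustrative:

datasets/
    val2017/                      # 5000 validation images
        000000000139.jpg
        ...
    instances_val2017.json        # annotation file passed via --coco_annotation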

Validation Interface

TPU-MLIR provides the command for accuracy validation:

$ model_eval.py \
    --model_file mobilenet_v2.mlir \
    --count 50 \
    --dataset_type imagenet \
    --postprocess_type topx \
    --dataset datasets/ILSVRC2012_img_val_with_subdir

The supported parameters are shown below:

Function of model_eval.py parameters

Name               Required?   Explanation
-----------------  ----------  ---------------------------------------------------------------------------
model_file         Y           Model file
dataset            N           Directory of the dataset
dataset_type       N           Dataset type. Currently mainly supports imagenet and coco; the default is imagenet
postprocess_type   Y           Metric. Currently supports topx and coco_mAP
label_file         N           txt label file, which may be needed when validating classification networks
coco_annotation    N           json label file, required when validating object detection networks
count              N           Number of images used for validation; the default is the entire dataset
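
For the flat-directory arrangement of ILSVRC2012 described above, the txt label file is passed via --label_file. A hypothetical invocation (the dataset directory and label file names are illustrative) could look like:

$ model_eval.py \
    --model_file mobilenet_v2.mlir \
    --count 50 \
    --dataset_type imagenet \
    --postprocess_type topx \
    --label_file datasets/val_labels.txt \
    --dataset datasets/ILSVRC2012_img_val_flat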

Validation Example

In this section, mobilenet_v2 and yolov5s are used as representatives of classification and object detection networks, respectively, for accuracy validation.

mobilenet_v2

  1. Dataset Downloading

    Download the ILSVRC2012 validation set to the datasets/ILSVRC2012_img_val_with_subdir directory. Images of the dataset are placed in class subdirectories, so no additional label file is required.

  2. Model Conversion

    Use the model_transform.py interface to convert the original model into mobilenet_v2.mlir, and obtain mobilenet_v2_cali_table through the run_calibration.py interface (a reference sketch of these two steps is given below); please refer to the "User Interface" chapter for specific usage. The INT8 model of the tpu layer is then obtained with the model_deploy.py command that follows. After running it, an intermediate file named mobilenet_v2_bm1684x_int8_sym_tpu.mlir will be generated, which we will use to validate the accuracy of the INT8 symmetric quantized model:
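
    A minimal sketch of these two steps is shown below; the original model file, the preprocessing arguments, and the number of calibration images are illustrative assumptions and should be adjusted to your model:

# Top-layer mlir model (preprocessing values are illustrative)
$ model_transform.py \
    --model_name mobilenet_v2 \
    --model_def mobilenet_v2.onnx \
    --input_shapes [[1,3,224,224]] \
    --mean 123.675,116.28,103.53 \
    --scale 0.0171,0.0175,0.0174 \
    --pixel_format rgb \
    --test_input cat.jpg \
    --test_result mobilenet_v2_top_outputs.npz \
    --mlir mobilenet_v2.mlir

# Calibration table (the number of calibration images is illustrative)
$ run_calibration.py mobilenet_v2.mlir \
    --dataset datasets/cali_images \
    --input_num 100 \
    -o mobilenet_v2_cali_table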

# INT8 Sym Model
$ model_deploy.py \
   --mlir mobilenet_v2.mlir \
   --quantize INT8 \
   --calibration_table mobilenet_v2_cali_table \
   --chip bm1684x \
   --test_input mobilenet_v2_in_f32.npz \
   --test_reference mobilenet_v2_top_outputs.npz \
   --tolerance 0.95,0.69 \
   --model mobilenet_v2_int8.bmodel

  3. Accuracy Validation

    Use the model_eval.py interface to validate:

# F32 model validation
$ model_eval.py \
    --model_file mobilenet_v2.mlir \
    --count 50000 \
    --dataset_type imagenet \
    --postprocess_type topx \
    --dataset datasets/ILSVRC2012_img_val_with_subdir

# INT8 sym model validation
$ model_eval.py \
    --model_file mobilenet_v2_bm1684x_int8_sym_tpu.mlir \
    --count 50000 \
    --dataset_type imagenet \
    --postprocess_type topx \
    --dataset datasets/ILSVRC2012_img_val_with_subdir

The accuracy validation results of the F32 model and the INT8 symmetric quantization model are as follows:

# mobilenet_v2.mlir validation result
2022/11/08 01:30:29 - INFO : idx:50000, top1:0.710, top5:0.899
INFO:root:idx:50000, top1:0.710, top5:0.899

# mobilenet_v2_bm1684x_int8_sym_tpu.mlir validation result
2022/11/08 05:43:27 - INFO : idx:50000, top1:0.702, top5:0.895
INFO:root:idx:50000, top1:0.702, top5:0.895

yolov5s

  1. Dataset Downloading

    Download the COCO2017 validation set to the datasets/val2017 directory, which contains 5,000 images for validation. Download the corresponding label file, instances_val2017.json, to the datasets directory.

  2. Model Conversion

    The conversion process is similar to that of mobilenet_v2; a reference sketch is given below.
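
    For reference, a deployment command analogous to the mobilenet_v2 one is sketched below; the calibration table name and the tolerance values are illustrative assumptions. Running it also generates the intermediate file yolov5s_bm1684x_int8_sym_tpu.mlir used for validation in the next step:

# INT8 Sym Model (tolerance values are illustrative)
$ model_deploy.py \
    --mlir yolov5s.mlir \
    --quantize INT8 \
    --calibration_table yolov5s_cali_table \
    --chip bm1684x \
    --test_input yolov5s_in_f32.npz \
    --test_reference yolov5s_top_outputs.npz \
    --tolerance 0.85,0.45 \
    --model yolov5s_int8.bmodel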

  3. Accuracy Validation

    Use the model_eval.py interface to validate:

# F32 model validation
$ model_eval.py \
    --model_file yolov5s.mlir \
    --count 5000 \
    --dataset_type coco \
    --postprocess_type coco_mAP \
    --coco_annotation datasets/instances_val2017.json \
    --dataset datasets/val2017

# INT8 sym model validation
$ model_eval.py \
    --model_file yolov5s_bm1684x_int8_sym_tpu.mlir \
    --count 5000 \
    --dataset_type coco \
    --postprocess_type coco_mAP \
    --coco_annotation datasets/instances_val2017.json \
    --dataset datasets/val2017

The accuracy validation results of the F32 model and the INT8 symmetric quantization model are as follows:

# yolov5s.mlir validation result
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.369
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.561
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.393
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.217
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.422
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.470
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.300
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.502
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.542
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.359
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.602
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.670

# yolov5s_bm1684x_int8_sym_tpu.mlir validation result
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.337
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.544
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.365
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.196
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.382
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.432
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.281
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.473
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.514
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.337
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.566
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.636