Accuracy Validation
=====================

Introduction
------------

Objects
~~~~~~~~~~~~

Accuracy validation in TPU-MLIR is performed on mlir models: the F32 model is validated with the Top-layer mlir model, while the INT8 symmetric and asymmetric quantized models are validated with the TPU-layer mlir model.

Metrics
~~~~~~~~~~~~

Currently, the validation mainly targets classification and object detection networks. Classification networks are measured by Top-1 and Top-5 accuracy, while object detection networks use the 12 COCO metrics shown below. Generally, we record the Average Precision at IoU=0.5 (i.e., the PASCAL VOC metric).

.. math::

   \textbf{Average Precision (AP):} & \\
   AP\quad & \text{\% AP at IoU=.50:.05:.95 (primary challenge metric)} \\
   AP^{IoU=.50}\quad & \text{\% AP at IoU=.50 (PASCAL VOC metric)} \\
   AP^{IoU=.75}\quad & \text{\% AP at IoU=.75 (strict metric)} \\
   \textbf{AP Across Scales:} & \\
   AP^{small}\quad & \text{\% AP for small objects: $area < 32^2$} \\
   AP^{medium}\quad & \text{\% AP for medium objects: $32^2 < area < 96^2$} \\
   AP^{large}\quad & \text{\% AP for large objects: $area > 96^2$} \\
   \textbf{Average Recall (AR):} & \\
   AR^{max=1}\quad & \text{\% AR given 1 detection per image} \\
   AR^{max=10}\quad & \text{\% AR given 10 detections per image} \\
   AR^{max=100}\quad & \text{\% AR given 100 detections per image} \\
   \textbf{AR Across Scales:} & \\
   AR^{small}\quad & \text{\% AR for small objects: $area < 32^2$} \\
   AR^{medium}\quad & \text{\% AR for medium objects: $32^2 < area < 96^2$} \\
   AR^{large}\quad & \text{\% AR for large objects: $area > 96^2$}

Datasets
~~~~~~~~~~~~

In addition, the dataset used for validation needs to be downloaded by yourself.

Classification networks use the validation set of ILSVRC2012 (50,000 images, https://www.image-net.org/challenges/LSVRC/2012/). The images can be organized in two ways. In the first, the dataset directory contains 1000 subdirectories corresponding to the 1000 classes, with 50 images per class; in this case no additional label file is required. In the second, all images sit directly under the dataset directory and an additional txt label file is provided: following the order of the image file names, each line of the file is a number from 1 to 1000 indicating the class of the corresponding image.

Object detection networks use the COCO2017 validation set (5,000 images, https://cocodataset.org/#download). All images are placed under the same dataset directory, and the corresponding json label file needs to be downloaded as well.

Validation Interface
--------------------

TPU-MLIR provides the command for accuracy validation:

.. code-block:: shell

   $ model_eval.py \
       --model_file mobilenet_v2.mlir \
       --count 50 \
       --dataset_type imagenet \
       --postprocess_type topx \
       --dataset datasets/ILSVRC2012_img_val_with_subdir

The supported parameters are shown below:

.. list-table:: Function of model_eval.py parameters
   :widths: 20 12 50
   :header-rows: 1

   * - Name
     - Required?
     - Explanation
   * - model_file
     - Y
     - Model file
   * - dataset
     - N
     - Directory of the dataset
   * - dataset_type
     - N
     - Dataset type. Currently imagenet and coco are mainly supported. The default is imagenet
   * - postprocess_type
     - Y
     - Metric. Currently topx and coco_mAP are supported
   * - label_file
     - N
     - txt label file, which may be needed when validating classification networks
   * - coco_annotation
     - N
     - json label file, required when validating object detection networks
   * - count
     - N
     - The number of images used for validation. The default is to use the entire dataset
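When the flat directory layout described above is used, the class labels are passed to model_eval.py through ``--label_file``. The following is a minimal sketch of such an invocation; the directory name ``ILSVRC2012_img_val_flat`` and the label file name ``val_labels.txt`` are hypothetical placeholders, not files shipped with TPU-MLIR:

.. code-block:: shell

   # Hypothetical flat layout: all validation images sit directly under
   # datasets/ILSVRC2012_img_val_flat, and datasets/val_labels.txt holds one
   # class number (1 to 1000) per line, ordered by image file name.
   $ model_eval.py \
       --model_file mobilenet_v2.mlir \
       --count 50 \
       --dataset_type imagenet \
       --postprocess_type topx \
       --label_file datasets/val_labels.txt \
       --dataset datasets/ILSVRC2012_img_val_flat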
Validation Example
------------------

In this section, mobilenet_v2 and yolov5s are used as representatives of classification and object detection networks, respectively, for accuracy validation.

mobilenet_v2
~~~~~~~~~~~~~

1. Dataset Downloading

Download the ILSVRC2012 validation set to the datasets/ILSVRC2012_img_val_with_subdir directory. The images of this dataset are placed in class subdirectories, so no additional label file is required.

2. Model Conversion

Use the model_transform.py interface to convert the original model to mobilenet_v2.mlir, and obtain mobilenet_v2_cali_table through the run_calibration.py interface. Please refer to the "User Interface" chapter for specific usage. The INT8 model of the TPU layer is obtained through the command below. After running it, an intermediate file named mobilenet_v2_bm1684x_int8_sym_tpu.mlir is generated; this intermediate file is used to validate the accuracy of the INT8 symmetric quantized model:

.. code-block:: shell

   # INT8 Sym Model
   $ model_deploy.py \
       --mlir mobilenet_v2.mlir \
       --quantize INT8 \
       --calibration_table mobilenet_v2_cali_table \
       --processor BM1684X \
       --test_input mobilenet_v2_in_f32.npz \
       --test_reference mobilenet_v2_top_outputs.npz \
       --tolerance 0.95,0.69 \
       --model mobilenet_v2_int8.bmodel

3. Accuracy Validation

Use the model_eval.py interface to validate:

.. code-block:: shell

   # F32 model validation
   $ model_eval.py \
       --model_file mobilenet_v2.mlir \
       --count 50000 \
       --dataset_type imagenet \
       --postprocess_type topx \
       --dataset datasets/ILSVRC2012_img_val_with_subdir

   # INT8 sym model validation
   $ model_eval.py \
       --model_file mobilenet_v2_bm1684x_int8_sym_tpu.mlir \
       --count 50000 \
       --dataset_type imagenet \
       --postprocess_type topx \
       --dataset datasets/ILSVRC2012_img_val_with_subdir

The accuracy validation results of the F32 model and the INT8 symmetric quantization model are as follows:

.. code-block:: shell

   # mobilenet_v2.mlir validation result
   2022/11/08 01:30:29 - INFO : idx:50000, top1:0.710, top5:0.899
   INFO:root:idx:50000, top1:0.710, top5:0.899

   # mobilenet_v2_bm1684x_int8_sym_tpu.mlir validation result
   2022/11/08 05:43:27 - INFO : idx:50000, top1:0.702, top5:0.895
   INFO:root:idx:50000, top1:0.702, top5:0.895

yolov5s
~~~~~~~~~~~~~

1. Dataset Downloading

Download the COCO2017 validation set to the datasets/val2017 directory, which contains the 5,000 images used for validation. Download the corresponding label file instances_val2017.json to the datasets directory.

2. Model Conversion

The conversion process is similar to that of mobilenet_v2; a sketch of the corresponding model_deploy.py command is shown below.
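This sketch only illustrates the step under assumptions: it presumes that yolov5s.mlir and yolov5s_cali_table have already been produced with model_transform.py and run_calibration.py, and the test input/reference file names and tolerance values are assumptions modeled on the mobilenet_v2 example rather than values taken from this document. By analogy with mobilenet_v2, running it also generates the intermediate file yolov5s_bm1684x_int8_sym_tpu.mlir used in the next step:

.. code-block:: shell

   # INT8 Sym Model (illustrative sketch; test files and tolerances are assumptions)
   $ model_deploy.py \
       --mlir yolov5s.mlir \
       --quantize INT8 \
       --calibration_table yolov5s_cali_table \
       --processor BM1684X \
       --test_input yolov5s_in_f32.npz \
       --test_reference yolov5s_top_outputs.npz \
       --tolerance 0.85,0.45 \
       --model yolov5s_int8.bmodel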
3. Accuracy Validation

Use the model_eval.py interface to validate:

.. code-block:: shell

   # F32 model validation
   $ model_eval.py \
       --model_file yolov5s.mlir \
       --count 5000 \
       --dataset_type coco \
       --postprocess_type coco_mAP \
       --coco_annotation datasets/instances_val2017.json \
       --dataset datasets/val2017

   # INT8 sym model validation
   $ model_eval.py \
       --model_file yolov5s_bm1684x_int8_sym_tpu.mlir \
       --count 5000 \
       --dataset_type coco \
       --postprocess_type coco_mAP \
       --coco_annotation datasets/instances_val2017.json \
       --dataset datasets/val2017

The accuracy validation results of the F32 model and the INT8 symmetric quantization model are as follows:

.. code-block:: shell

   # yolov5s.mlir validation result
   Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.369
   Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.561
   Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.393
   Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.217
   Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.422
   Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.470
   Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.300
   Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.502
   Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.542
   Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.359
   Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.602
   Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.670

   # yolov5s_bm1684x_int8_sym_tpu.mlir validation result
   Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.337
   Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.544
   Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.365
   Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.196
   Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.382
   Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.432
   Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.281
   Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.473
   Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.514
   Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.337
   Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.566
   Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.636