16. Appendix.06: TDB Guidance

This chapter introduces the Tensor Debugger (TDB) tool. TDB provides a debugging window similar to the pdb and gdb interfaces, which can be used to debug a BModel as it runs. Its functions include adding breakpoints, single-step execution, viewing memory data, and data comparison.

This tool currently supports BM1684, BM1684X, and BM1688.

16.1. Preparatory work

Environment configuration

First, refer to the Environment Setup chapter to complete the environment configuration, enter the TPU-MLIR Docker container, and install tpu_mlir in it.

If you have completed the environment configuration, you can ignore this step.

Generate bmodel

Before using TDB, you need to generate the bmodel file with TPU-MLIR. Refer to the Compile the ONNX model chapter to generate the bmodel file from the model.

You need to use the following two commands:

# Convert the ONNX model to top_mlir
$ model_transform
# Convert top_mlir to bmodel
$ model_deploy

Here, the model_deploy command must be given the --debug and --compare_all arguments so that it saves the tpu_output.npz file and keeps the intermediate data.

When the bmodel is built, a directory containing the compilation.bmodel and final.mlir files is generated automatically. This directory is called the Context directory.
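Before starting TDB it can be useful to confirm that a directory really is a Context directory. The following is a minimal sketch, not part of TPU-MLIR: the helper name missing_context_files is hypothetical, and it only checks for the two file names mentioned above.

```python
from pathlib import Path

# File names come from the text above; the helper itself is a hypothetical
# convenience and is not shipped with TPU-MLIR.
REQUIRED_FILES = ("compilation.bmodel", "final.mlir")


def missing_context_files(context_dir):
    """Return the required files that are absent from context_dir."""
    root = Path(context_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means both files are present, for example when calling missing_context_files(".") from inside the Context directory.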

16.2. Start TDB

$ tdb [-h]
      [--inputs [INPUTS]]
      [--ref_data [REF_DATA ...]]
      [--plugins [PLUGINS]]
      [--ddr_size [DDR_SIZE]] [-v]
      [context_dir]

The main parameters of the tdb command are described as follows:

Table 16.1 Function of tdb parameters
Name	Required?	Explanation
context_dir	N	The directory where the bmodel file resides, which is the current directory by default
-h, --help	N	Display help information
--inputs	N	Specify input data for the bmodel file
--ref_data	N	Specify reference data for the bmodel file
--plugins	N	Add extra plugins
--ddr_size	N	Specify the ddr_size of the cmodel
-v, --verbose	N	Use the progress bar

Example of starting TDB:

$ tdb
# equivalent to
$ tdb ./
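The equivalence of the two commands follows from context_dir being an optional positional argument. The sketch below mirrors the usage line with Python's argparse to show that behaviour. It is an illustrative reconstruction under that assumption, not TDB's actual source, and the option types and defaults other than the context_dir default are not specified in this chapter.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction of the usage line shown above.
    parser = argparse.ArgumentParser(prog="tdb")
    parser.add_argument("--inputs", nargs="?")
    parser.add_argument("--ref_data", nargs="*")
    parser.add_argument("--plugins", nargs="?")
    parser.add_argument("--ddr_size", nargs="?")
    parser.add_argument("-v", "--verbose", action="store_true")
    # nargs="?" makes the positional optional; "./" is the documented default.
    parser.add_argument("context_dir", nargs="?", default="./")
    return parser


args = build_parser().parse_args([])
print(args.context_dir)
```

With an empty argument list the parser falls back to the current directory, which is why tdb and tdb ./ behave the same.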

16.3. TDB command summary

After entering TDB, press Tab twice to list the available commands. The following commands can be used:

Table 16.2 TDB command summary
Command	Explanation
s/start	Load the bmodel and initialize it
r/run	Execute from the beginning to the end; the run command includes initialization
b/break	Add breakpoints in final.mlir
delete	Delete breakpoint
n/next	Execute the next instruction; use `n [num]` to execute more than one instruction
c/continue	Continue execution until a breakpoint is hit or the run ends
info	Print breakpoint information or instructions in different formats
p/print	Print the current instruction or the data corresponding to the instruction
w/watch	Monitor an input/output of the current or previous atomic instruction and report when the data at its address changes
q/quit	Quit TDB
py [py_cmd]	Execute python commands in TDB, integrated with pdb’s code completion function

Here num is a number and py_cmd is a Python command.

16.4. TDB usage process

# start TDB in the Context directory
$ cd path/to/context_dir
$ tdb
# initialize
(tdb) s
# execute one instruction
(tdb) n
# add a breakpoint
(tdb) b
# keep running
(tdb) c
# continue debugging with info, p, or w
(tdb) info/p/w
# quit
(tdb) q

16.5. TDB function description

16.5.1. next feature

# execute one instruction with next
(tdb) n
# execute multiple instructions
(tdb) n [num]
# execute 3 instructions
(tdb) n 3

The instruction displayed after the n command is the next unexecuted instruction.

16.5.2. breakpoint feature

The breakpoint feature includes viewing breakpoints, adding and deleting breakpoints, and enabling and disabling breakpoints. It is used as follows:

Table 16.3 breakpoint feature
Command	Explanation	Example
info b/break	View breakpoint information	info b; info break
b/break	Add breakpoint	b 1
enable	Enable breakpoint	enable 1; enable 1,2
disable	Disable breakpoint	disable 1; disable 1,2
delete	Delete breakpoint	delete 1

Currently supported breakpoint types are as follows:

value-id

The Operation prefix in the final.mlir file that corresponds to the bmodel, for example:

%140 = "tpu.Load"(%6) {do_bcast = false …

where %140 and %6 are value-ids. Breakpoints of this type are added as follows:

(tdb) b %140
(tdb) b %6

op-name

The Operation name in final.mlir. In the example above, tpu.Load is the op-name. A breakpoint of this type is added as follows:

(tdb) b tpu.Load

cmd-id

The cmd-id of a parsed asm instruction, for example D1 or B0. Breakpoints of this type are added as follows:

(tdb) b D2
(tdb) b B4
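The three breakpoint types can be told apart by the shape of the text passed to b. The sketch below classifies a breakpoint string with regular expressions inferred only from the examples in this section. The patterns and the helper name classify_breakpoint are assumptions for illustration and do not describe how TDB itself parses breakpoints.

```python
import re

# Patterns inferred from the examples in this section. They are assumptions
# for illustration, not the matching rules that TDB itself applies.
_PATTERNS = (
    ("value-id", re.compile(r"%\d+")),              # e.g. %140, %6
    ("cmd-id", re.compile(r"[BD]\d+")),             # e.g. D2, B4
    ("op-name", re.compile(r"[A-Za-z_]\w*\.\w+")),  # e.g. tpu.Load
)


def classify_breakpoint(text):
    """Return the breakpoint type that the text looks like, or "unknown"."""
    for kind, pattern in _PATTERNS:
        if pattern.fullmatch(text):
            return kind
    return "unknown"
```

For instance, classify_breakpoint("%140") yields "value-id" and classify_breakpoint("tpu.Load") yields "op-name".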

16.5.3. info feature

The info feature can print breakpoint information or instructions in different formats as follows:

info b

View breakpoint information.

(tdb) info b
index     type enable     text hit
    1  dialect      y tpu.load   0
    2     addr      y       R0   3
    3   cmd-id      y       D1   0
    4 value-id      y       %7   0

info asm

Show the current asm instruction.

(tdb) info asm
%R0, %B15 = "arith.add"(%R13, %C1.0, %D3) {round_mode = 0} : (memref<1x32x54x160xf32, strides: [8640, 8640, 160, 1]>, f32, none) -> (memref<1x32x54x160xf32, strides: [8640, 8640, 160, 1]>, none)

info mlir

Show the Operation in final.mlir that corresponds to the current instruction.

(tdb) info mlir
%137 = "tpu.Active"(%134) {ginfo = #tpu.lg<out_addr = 212992, out_size = 35456, buffer_addr = 0, buffer_size = 71040, eu_align = true, n_idx = [0], n_slice = [1], c_idx = [0], c_slice = [32], d_idx = [0], d_slice = [1], h_idx = [0, 53, 107, 161, 215, 267], h_slice = [54, 55, 55, 55, 53, 53], w_idx = [0, 159], w_slice = [160, 161], id = 6, stage = 1, group_type = 0>, mode = #tpu<active_mode SILU>} : (tensor<1x32x320x320xf32>) -> tensor<1x32x320x320xf32> loc(#loc19)

info reg

Show the value of each field after the current command has been parsed.

(tdb) info reg
{'cmd_short': 1, 'cmd_id': 15, 'cmd_id_dep': 3, 'tsk_typ': 3, 'tsk_eu_typ': 2, 'opd0_const': 0, 'opd1_const': 1, 'opd2_const': 0, 'tsk_opd_num': 2, 'cmd_id_en': 1, 'pwr_step': 0, 'intr_en': 0, 'res0_prec': 2, 'opd0_prec': 2, 'opd1_prec': 2, 'opd2_prec': 0, 'opd0_sign': 1, 'opd1_sign': 1, 'res0_str': 0, 'opd0_str': 0, 'opd1_str': 0, 'opd2_n_str': 0, 'rsvd0': 0, 'res0_n': 1, 'res0_c': 32, 'res0_h': 54, 'res0_w': 160, 'res0_addr': 0, 'opd0_addr': 212992, 'opd1_addr': 1065353216, 'opd2_addr': 0, 'res0_n_str': 0, 'res0_c_str': 0, 'opd0_n_str': 0, 'opd0_c_str': 0, 'opd1_n_str': 0, 'opd1_c_str': 0, 'res0_h_str': 0, 'res0_w_str': 0, 'opd0_h_str': 0, 'opd2_sign': 0, 'rsvd1': 0, 'opd0_w_str': 0, 'opd1_h_str': 0, 'opd1_w_str': 0, 'rsvd2': 0}
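In the output above, opd1_const is 1 and opd1_addr is 1065353216, while the info asm output shows the second operand as the constant %C1.0. One plausible reading, stated here as an assumption rather than documented behaviour, is that for a constant operand the address field carries the raw bits of a 32-bit float. The sketch below reinterprets the integer accordingly.

```python
import struct


def bits_to_float32(raw):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 float32."""
    return struct.unpack("<f", struct.pack("<I", raw))[0]


# 1065353216 is 0x3F800000, the bit pattern of 1.0 in float32.
print(bits_to_float32(1065353216))  # 1.0
```

Under this assumption the register dump and the asm form describe the same constant operand.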

info loc

Show the information for the corresponding Operation from tensor_location.json in the Context directory.

(tdb) info loc
{'core_id': 0,
'file_line': 27,
'loc_index': 4,
'opcode': 'tpu.Active',
'operands': [@163840({name=122_Conv, layout=eu_align, slice=[0:1, 0:32, 0:1, 0:54, 0:160], mlir_type=tensor<1x32x320x320xf32>, memory_type=<1x32x54x160xf32>})],
'results': [@212992({name=124_Mul, layout=eu_align, slice=[0:1, 0:32, 0:1, 0:54, 0:160], mlir_type=tensor<1x32x320x320xf32>, memory_type=<1x32x54x160xf32>})],
'slice_all': False,
'subnet_id': 0,
'tiu_dma_id_after': [17, 3],
'tiu_dma_id_before': [1, 3]}
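The same records can be inspected outside TDB. The sketch below assumes that tensor_location.json is a JSON array of records carrying the keys shown above (file_line, opcode, and so on). That layout is an assumption, and the helper name find_locations is hypothetical.

```python
import json


def find_locations(path, file_line):
    """Return the records in a tensor_location.json file for one file_line.

    Assumes the file holds a JSON array of objects with a "file_line" key.
    """
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    return [rec for rec in records if rec.get("file_line") == file_line]
```

Calling find_locations("tensor_location.json", 27) from the Context directory would then return the records that correspond to line 27 of final.mlir, if the assumed layout holds.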

16.5.4. print feature

The print feature prints not only the current asm instruction but also the input and output data of instructions. It is used as follows:

Table 16.4 print feature
Command	Explanation	Example
p op	Show the upcoming instruction	p op
p pre/next	Show the previous or next instruction	p pre; p next
p in	Show the input data for the next unexecuted instruction	p in; p in 0
p out	Show the output data of the previously executed instruction	p out; p out 0

16.5.5. watchpoint feature

The watchpoint feature monitors the input/output data of an instruction and raises an alert when the monitored data changes. It is used as follows:

w

Shows the currently added watchpoints. See the following example:

(tdb) w
index    cmd_type cmd_id core_id enabled                                                   value
    1 CMDType.dma      2       0       y %G0: memref<1x32x3x36xf32, strides: [3456, 108, 36, 1]>

w in

Adds one of the inputs of the next pending instruction as a watchpoint. See the following example:

(tdb) n
%R15.2688, %D2 = "dma.tensor"(%G0, %B0) {decompress = False} : (memref<1x32x3x36xf32, strides: [3456, 108, 36, 1]>, none) -> (memref<1x32x3x36xf32, strides: [108, 108, 36, 1]>, none)
(tdb) w in 0
(tdb) w
index    cmd_type cmd_id core_id enabled                                                   value
    1 CMDType.dma      2       0       y %G0: memref<1x32x3x36xf32, strides: [3456, 108, 36, 1]>

As you can see, w in 0 adds the first input, %G0, of the next pending instruction as a watchpoint.

w out

Adds one of the outputs of the last executed instruction as a watchpoint. See the following example:

(tdb) w out 0
(tdb) w
index    cmd_type cmd_id core_id enabled                                                         value
    1 CMDType.dma      2       0       y       %G0: memref<1x32x3x36xf32, strides: [3456, 108, 36, 1]>
    2 CMDType.dma      1       0       y %R0: memref<1x3x110x322xf32, strides: [35424, 35424, 322, 1]>

p w idx old/now

Prints the value of an added watchpoint. Here idx is the index of the watchpoint as listed by the w command, old shows the data from when the watchpoint was added, and now shows the current data of the watchpoint.

old/now can be omitted, in which case now is used and the current data of the watchpoint is shown. For example:

(tdb) w
index    cmd_type cmd_id core_id enabled                                                         value
    1 CMDType.dma      2       0       y       %G0: memref<1x32x3x36xf32, strides: [3456, 108, 36, 1]>
    2 CMDType.dma      1       0       y %R0: memref<1x3x110x322xf32, strides: [35424, 35424, 322, 1]>
(tdb) p w 1
(tdb) p w 1 old
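The old/now distinction can be pictured as a snapshot taken when the watchpoint is added, compared against the live data later. The sketch below is a toy model of that idea only. The Watchpoint class and its methods are hypothetical and say nothing about how TDB stores watchpoints internally.

```python
import copy


class Watchpoint:
    """Toy model of old/now: keep a snapshot and compare it to live data."""

    def __init__(self, read_value):
        # read_value is a callable that returns the current data.
        self._read_value = read_value
        # Snapshot of the data at the moment the watchpoint is added ("old").
        self.old = copy.deepcopy(read_value())

    @property
    def now(self):
        """Current data of the watched location ("now")."""
        return self._read_value()

    def changed(self):
        """True once the live data differs from the snapshot."""
        return self.now != self.old
```

In this model, a watchpoint created with Watchpoint(lambda: memory["G0"]) reports changed() as True after the watched list is modified, while old still holds the original values.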

w delete [idx]

Deletes added watchpoints. When idx is given, the corresponding watchpoint is deleted; when idx is omitted, all watchpoints are deleted. For example:

(tdb) w
index    cmd_type cmd_id core_id enabled                                                         value
    1 CMDType.dma      2       0       y       %G0: memref<1x32x3x36xf32, strides: [3456, 108, 36, 1]>
    2 CMDType.dma      1       0       y %R0: memref<1x3x110x322xf32, strides: [35424, 35424, 322, 1]>
    3 CMDType.tiu     11       0       y %R13: memref<1x32x54x160xsi16, strides: [8640, 8640, 160, 1]>
(tdb) w delete 1
(tdb) w
index    cmd_type cmd_id core_id enabled                                                         value
    2 CMDType.dma      1       0       y %R0: memref<1x3x110x322xf32, strides: [35424, 35424, 322, 1]>
    3 CMDType.tiu     11       0       y %R13: memref<1x32x54x160xsi16, strides: [8640, 8640, 160, 1]>
(tdb) w delete
(tdb) w
index cmd_type cmd_id core_id enabled value

16.5.6. py feature

The py feature executes Python commands directly in the TDB environment. It is used as follows:

(tdb) py a = 2
(tdb) py b = a + 2
(tdb) py print(b)
4

16.6. BModel Disassembler

BModel Disassembler disassembles a bmodel file into the assembly code of its atomic instructions in MLIR format, that is, asm instructions. This output is used to analyze the final runtime instructions of the model.

To use it, first enter the Context directory. The usage is as follows:

$ bmodel_dis [-h] [--format {mlir,reg,bits,bin,reg-set}] bmodels [bmodels ...]

where --format specifies the output format (mlir by default) and bmodels is the bmodel file or files to be parsed. Example usage is as follows:

$ bmodel_dis compilation.bmodel
$ bmodel_dis --format reg compilation.bmodel

The output can be saved to a file as follows:

$ bmodel_dis compilation.bmodel > dis_bmodel.mlir
$ bmodel_dis --format reg compilation.bmodel > dis_reg.json

16.7. BModel Checker

BModel Checker is used to find errors (codegen errors) in a bmodel. If during model_deploy you find that the generated bmodel cannot be aligned with the TPU reference data, you can use this tool to locate the error. BModels for the BM1684, BM1684X, and BM1688 processors are currently supported.

When generating the bmodel file, the model_deploy command must be given the --debug and --compare_all arguments, which save the tpu_output.npz file and retain the intermediate data.

The usage is as follows:

$ bmodel_checker [-h]
                 [--tolerance TOLERANCE]
                 [--report REPORT] [--fail_fast]
                 [--quiet] [--no_interactive]
                 [--dump_mode {failed,all,never}]
                 context_dir reference_data

The main parameters of bmodel_checker are described as follows:

Table 16.5 Function of bmodel_checker parameters
Name	Required?	Explanation
context_dir	Y	bmodel file directory
reference_data	Y	tpu_output.npz file location
--quiet	N	Do not display the execution progress bar
--fail_fast	N	Stop at the first error
--dump_mode	N	Specify the data saved by the dump command: failed (default), all, or never
--tolerance	N	Specify comparison tolerances, default is “0.99,0.90”
--report	N	Save the wrong data to file, default is `failed_bmodel_outputs.npz`
--no_interactive	N	Exit directly after bmodel_checker runs instead of entering TDB mode
--cache_mode	N	Cache mode, with three options: online, offline, generate. Default is online.

To use bmodel_checker you need to enter the Context directory, as shown in the following example:

$ bmodel_checker ./ ../yolov5s_bm1684x_f32_tpu_outputs.npz
$ bmodel_checker ./ ../yolov5s_bm1684x_f32_tpu_outputs.npz --fail_fast
$ bmodel_checker ./ ../yolov5s_bm1684x_f32_tpu_outputs.npz --tolerance 0.99,0.90
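The --tolerance value in the last command is a pair of thresholds: a tensor passes when cos exceeds the first number and eul exceeds the second. The sketch below shows one way such a check could look. The cosine similarity is the standard definition, but the Euclidean similarity formula used here (one minus the distance divided by the norm of the element-wise mean) is an assumption, and all function names are hypothetical; the checker's own implementation may differ.

```python
import math


def parse_tolerance(text):
    """Split a string such as "0.99,0.90" into (cos_min, eul_min)."""
    cos_min, eul_min = (float(part) for part in text.split(","))
    return cos_min, eul_min


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0.0:
        # Degenerate case: treat two identical inputs as a perfect match.
        return 1.0 if list(x) == list(y) else 0.0
    return dot / norm


def euclidean_similarity(x, y):
    # Assumed definition: 1 - ||x - y|| / ||(x + y) / 2||.
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    mean_norm = math.sqrt(sum(((a + b) / 2) ** 2 for a, b in zip(x, y)))
    if mean_norm == 0.0:
        return 1.0 if dist == 0.0 else 0.0
    return 1.0 - dist / mean_norm


def passes(x, y, tolerance="0.99,0.90"):
    """Apply both thresholds, as in cos > 0.99 and eul > 0.9."""
    cos_min, eul_min = parse_tolerance(tolerance)
    return cosine_similarity(x, y) > cos_min and euclidean_similarity(x, y) > eul_min
```

Identical tensors pass under any tolerance below 1, while a tensor whose direction differs strongly from the reference fails the cosine threshold first.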

After the bmodel_checker command runs, it outputs the checker report and saves the failing outputs to the failed_bmodel_outputs.npz file. The report uses the following marks:

A check mark means pass: the data was checked and its similarity satisfies cos > 0.99 and eul > 0.9 (the default thresholds, which can be changed with the tolerance parameter). A cross means an error: the data does not reach the required similarity. A question mark means unknown: no reference data was found, so the correctness of the data cannot be determined.

[Figure: complete checker report of a yolov5s model]

After outputting the check report, it automatically enters the interactive mode. The interactive mode provides a detailed view of the errors and also allows you to quickly jump between lines, as shown in the following example of a cswin_tiny model.

check summary

The check report can be reprinted by using the check summary command.

You can also aggregate inputs and outputs that share the same line number by using the check summary reduce command.

check data

(tdb) check data [file-line]

where file-line is the line number in the checker report, which corresponds to the line number in final.mlir. This command describes all the input and output data of the operation at file-line.

(tdb) check data [file-line] [index]

where index is the index of the data listed by the check data [file-line] command. This command gives detailed information about the data at that index, both for data that matches the reference and for data that fails the comparison.

SOC Devices

When executing on SOC devices, in order to perform comparisons without introducing an mlir dependency, it is necessary to first generate a cache within a Docker environment. Subsequently, the cached model can be used for comparison in the SOC device environment.

$ bmodel_checker ./ ../yolov5s_bm1684x_f32_tpu_outputs.npz --cache_mode generate # on docker
$ bmodel_checker ./ ../yolov5s_bm1684x_f32_tpu_outputs.npz --cache_mode offline # on soc