TPU-MLIR Technical Reference Manual

Release Record

Version

Release date

Explanation

v1.28.0

2026.04.14

Support for Qwen3.5, Qwen3-ASR, PaddleOCR-VL and lfm2-vl in llm_convert; LLM multi-device pipeline parallelism (PP) and batch-size support; New WeightDeduplicate pass to merge identical weights; yolov26 post-processing fusion; New llm_model memory analysis tool

v1.27.0

2026.02.04

FP8 CUDA inference; Mixed per-token/per-channel and group-size dynamic quantization; Multi-core support for LoRA and dynamic shapes; Module hash dump/load for cache reuse; TPULang adds RotPosEmb and cumsum interfaces

v1.26.0

2025.12.25

CUDA inference framework; Dynamic group quantization for BM1684X; 4K-aligned bmodel format (model_tool can refresh existing bmodels); Qwen3-VL multi-ViT and LightStereo support; LLM compressed-tensors mode

v1.25.0

2025.11.28

Qwen3-VL support; BM1690E support

v1.24.0

2025.10.30

MINICPMV4 support; Support for peak system and device memory estimation for bmodel

v1.23.0

2025.09.30

GLM4.1V support; BM1688 Conv and MatMul support W4A8 quantization

v1.22.0

2025.08.31

PPL supports dynamic compilation; llm_analyse supports performance estimation for large models

v1.21.0

2025.07.31

BM1688 supports yolov8 post-processing; bmodel_checker supports replacing wrong outputs with reference data

v1.20.0

2025.06.30

Support for IO_RELOC; Deconv3D INT8 bugfix; BatchNorm and Conv backward operators support training with batch size 128

v1.19.0

2025.05.30

Support for AWQ and GPTQ models; Deconv3D F16 and F32 bugfix

v1.18.0

2025.05.01

YOLO series adds automatic mixed precision setting; Added SmoothQuant option for run_calibration; New one-click compilation script for LLM

v1.17.0

2025.04.03

Significant improvement in LLM model compilation speed; TPULang supports PPL operator integration; Fixed intermittent errors with Trilu bf16 on CV184X

v1.16.0

2025.03.03

TPULang ROI_Extractor support; Einsum supports abcde,abfge->abcdfg pattern; LLMC adds Vila model support

v1.15.0

2025.02.05

Added LLMC quantization support; Address boundary check in codegen; Fixed several comparison issues

v1.14.0

2025.01.02

Added post-processing fusion for yolov8/v11; Support for Conv3D stride > 15; Improved FAttention accuracy

v1.13.0

2024.12.02

Streamlined release package; Performance optimization for MaxPoolWithMask training operator; Added support for large RoPE operators

v1.12.0

2024.11.06

tpuv7-runtime cmodel integration; Support for PPL backend operator development

v1.11.0

2024.09.27

Added SoC mode for BM1688 tdb; bmodel supports fine-grained merging; Fixed several performance degradation issues

v1.10.0

2024.08.15

Added yolov10 support; New quantization tuning section; Optimized tpu-perf log output

v1.9.0

2024.07.16

New quantization algorithms: octav, aciq_gauss and aciq_laplace

v1.8.0

2024.05.30

TPULang supports input/output order specification; tpu-perf removes patchelf dependency

v1.7.0

2024.05.15

CV186X dual-core changed to single-core; Support for gemma/llama/qwen models

v1.6.0

2024.02.23

Added PyPI release format; Support for user-defined Global operators; Added CV186X processor platform support

v1.5.0

2023.11.03

More Global Layer support for multi-core parallel processing

v1.4.0

2023.09.27

System dependencies upgraded to Ubuntu 22.04; Added BM1684 Winograd support

v1.3.0

2023.07.27

Added manual floating-point operation region specification; Added supported frontend framework operator list; Added NNTC vs TPU-MLIR quantization comparison

v1.2.0

2023.06.14

Adjusted mixed quantization examples

v1.1.0

2023.05.26

Added post-processing using intelligent deep learning processor

v1.0.0

2023.04.10

PyTorch support; Added section on PyTorch model conversion

v0.8.0

2023.02.28

Added pre-processing using intelligent deep learning processor

v0.6.0

2022.11.05

Added section for mixed precision operation process

v0.5.0

2022.10.20

Added model-zoo specification for testing all the models it contains

v0.4.0

2022.09.20

Caffe support; Added section on Caffe model conversion

v0.3.0

2022.08.24

TFLite support; Added section on TFLite model conversion

v0.2.0

2022.08.02

Added chapter for running test samples in SDK

v0.1.0

2022.07.29

Initial release; Supports resnet/mobilenet/vgg/ssd/yolov5s, with yolov5s as the example case

Table of Contents