Open source · Apache 2.0 · Built on MLIR · PyTorch · ONNX · TFLite · Caffe · HuggingFace LLMs

From any model
to a bmodel on TPU.

TPU-MLIR is an open-source machine-learning compiler that converts pre-trained networks from PyTorch, ONNX, TFLite, Caffe and HuggingFace LLMs into .bmodel files that run efficiently on Sophgo TPUs. A two-dialect MLIR pipeline keeps the path from algorithm to silicon clean, retargetable and easy to extend.

5+
Frontends
120+
Verified models
F32 → INT4
Precisions
Frontends → Top dialect → TPU dialect → bmodel → Sophgo TPU
Workflow

Three commands. From a trained model to a runnable bmodel.

A pragmatic, scriptable flow built around three small Python entry points.

Import to MLIR

Import the source network into the framework-agnostic Top dialect.

$ model_transform.py \
    --model_name yolov5s \
    --model_def  yolov5s.onnx \
    --input_shapes [[1,3,640,640]] \
    --mlir yolov5s.mlir
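
For real inputs the same step also takes preprocessing and verification flags. A sketch based on the upstream yolov5s example (the mean/scale values and test image are illustrative; check model_transform.py --help on your installed release):

$ model_transform.py \
    --model_name yolov5s \
    --model_def  yolov5s.onnx \
    --input_shapes [[1,3,640,640]] \
    --mean  0.0,0.0,0.0 \
    --scale 0.0039216,0.0039216,0.0039216 \
    --keep_aspect_ratio \
    --pixel_format rgb \
    --test_input  dog.jpg \
    --test_result yolov5s_top_outputs.npz \
    --mlir yolov5s.mlir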

Calibrate & quantize

Optional: build an INT8 calibration table from a small dataset slice.

$ run_calibration.py yolov5s.mlir \
    --dataset   ../COCO2017 \
    --input_num 100 \
    -o yolov5s_cali_table
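
If pure INT8 costs too much accuracy on sensitive layers, a mixed-precision table can keep them in floating point. A sketch using run_qtable.py (flag spelling varies across releases; older ones use --chip instead of --processor):

$ run_qtable.py yolov5s.mlir \
    --dataset   ../COCO2017 \
    --input_num 100 \
    --calibration_table yolov5s_cali_table \
    --processor bm1684x \
    -o yolov5s_qtable

The resulting table is then handed to model_deploy.py via --quantize_table.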

Lower & deploy

Lower to the TPU dialect and emit a .bmodel for the target chip.

$ model_deploy.py \
    --mlir      yolov5s.mlir \
    --quantize  INT8 \
    --calibration_table yolov5s_cali_table \
    --processor bm1684x \
    --model     yolov5s_int8.bmodel
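
Floating-point builds of the same model skip calibration entirely; only --quantize changes. For example, an F16 variant (the output name is illustrative):

$ model_deploy.py \
    --mlir      yolov5s.mlir \
    --quantize  F16 \
    --processor bm1684x \
    --model     yolov5s_f16.bmodel
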
Why TPU-MLIR

A compiler designed for production deployment.

Multiple frameworks in, multiple precisions out — on a clean MLIR foundation that's easy to extend.

Multi-framework frontends

PyTorch, ONNX, TFLite and Caffe work out of the box; models from other frameworks can be brought in through ONNX export.

HuggingFace LLMs

First-class support for HuggingFace LLMs — including AWQ, GPTQ and AutoRound variants — through a single llm_convert.py.

Full precision matrix

F32 / F16 / BF16 / INT8 / W4A16 / W8A16 — including post-training calibration for INT8.

Multi-target codegen

Targets Sophgo TPU processors such as bm1684x and bm1688 from the same MLIR pipeline.
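
Retargeting is a one-flag change: the same Top-level MLIR lowers to a different processor. A sketch for bm1688 (the output name is illustrative):

$ model_deploy.py \
    --mlir      yolov5s.mlir \
    --quantize  F16 \
    --processor bm1688 \
    --model     yolov5s_f16_bm1688.bmodel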

Layer-Group optimization

Aggressive graph and tensor-level optimizations — Layer Group, fused preprocess and postprocess — keep TPUs busy and memory traffic low.
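
Fused preprocessing, for instance, folds normalization and pixel-format conversion into the bmodel so the host can feed raw frames. A sketch using --fuse_preprocess (the customization format is illustrative; match it to your capture pipeline):

$ model_deploy.py \
    --mlir yolov5s.mlir \
    --quantize INT8 \
    --calibration_table yolov5s_cali_table \
    --processor bm1684x \
    --fuse_preprocess \
    --customization_format BGR_PLANAR \
    --model yolov5s_int8_fused.bmodel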

120+ models verified

Vision (YOLO family), speech (Whisper) and large language models (Qwen3.5, Qwen3-VL, MiniCPM-V-4, …) ship in the regression set.

Support matrix

Frameworks, precisions, and silicon — at a glance.

Pick a frontend, choose a precision, deploy to the chip you ship on.

Frameworks

PyTorch ONNX TFLite Caffe HuggingFace

Precisions

F32 F16 BF16 INT8 (PTQ + QAT) W4A16 W8A16 AWQ GPTQ

Targets

BM1684X BM1688 …and more Sophgo TPUs
Models

Vision · speech · LLMs — running on TPUs today.

A non-exhaustive snapshot of the families that ship through TPU-MLIR's regression and demo flow.

Vision

YOLOv5 YOLOv8 ResNet MobileNet SAM Stable Diffusion

Speech

Whisper Paraformer FunASR

Large Language Models

Qwen3.5 Qwen3-VL MiniCPM-V-4 …and more
YOLOv5s — output comparison across F32 / F16 / INT8 bmodels.
Qwen3.5-2B (INT4 / AutoRound) compiled with llm_convert.py and chatting on a Sophgo BM1684X.
Quick start

Pick your starting point.

Both flows assume the official sophgo/tpuc_dev Docker image with tpu_mlir installed via pip, as sketched below.
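
A minimal setup sketch (image tag, container name and mount path are illustrative; see the TPU-MLIR docs for the current recommendation):

# Pull the dev image and start a container with your working tree mounted
$ docker pull sophgo/tpuc_dev:latest
$ docker run --privileged --name tpu_mlir \
    -v $PWD:/workspace -it sophgo/tpuc_dev:latest

# Inside the container: install the compiler from PyPI
$ pip install tpu_mlir
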

LLM · Compile a HuggingFace LLM (Qwen3.5)
# 1. Pull a quantized build from HuggingFace
$ git lfs install
$ git clone https://huggingface.co/Intel/Qwen3.5-2B-int4-AutoRound

# 2. Convert to a bmodel for the BM1684X
$ llm_convert.py \
    -m /workspace/Qwen3.5-2B-int4-AutoRound \
    --max_input_length 1024 -s 2048 \
    -c bm1684x --max_pixels 768,768 \
    -o qwen3.5_2b
# → qwen3.5_2b/*.bmodel  ✔ ready for TPU
CNN · Compile a vision model (YOLOv5s)
# 1) Frontend → Top dialect MLIR
$ model_transform.py \
    --model_name yolov5s \
    --model_def  ../yolov5s.onnx \
    --input_shapes [[1,3,640,640]] \
    --mlir yolov5s.mlir

# 2) Calibrate for INT8
$ run_calibration.py yolov5s.mlir \
    --dataset ../COCO2017 --input_num 100 \
    -o yolov5s_cali_table

# 3) Lower to a bmodel
$ model_deploy.py \
    --mlir yolov5s.mlir --quantize INT8 \
    --calibration_table yolov5s_cali_table \
    --processor bm1684x \
    --model yolov5s_int8.bmodel
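
Before shipping, the emitted bmodel can be sanity-checked with the bundled runner. A sketch, assuming the yolov5s_in_f32.npz input dump that model_transform.py writes when --test_input is set:

$ model_runner.py \
    --input  yolov5s_in_f32.npz \
    --model  yolov5s_int8.bmodel \
    --output yolov5s_out.npz
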
Resources

Documentation, papers and walkthroughs.

Everything you need to go deeper, in English and Chinese.