We designed a general-purpose, MLIR-based TPU compiler with a two-layer dialect hierarchy, Top Dialect and TPU Dialect, to bridge high-level algorithms and low-level processors. At the algorithm level, it provides native support for deep-learning frameworks such as PyTorch and ONNX and enables conversion of HuggingFace LLMs; at the hardware level, it adapts to multiple processor targets.
Paper | Code | QuickStart | DevManual
English | 简体中文
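At the algorithm end, the compiler consumes framework models such as ONNX at the Top level. As a minimal, hedged sketch of how a PyTorch model might be prepared for that front end (the torchvision model and file names are illustrative assumptions, not part of this project):

```python
# Sketch: export a PyTorch model to ONNX so it can be handed to the
# compiler's framework-facing (Top-level) converter.
# The model choice and file names below are illustrative assumptions.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",            # ONNX file passed on to the Top-level front end
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)
```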
- Supports the F32, F16, BF16, INT8, W4A16, and W8A16 data types (see the W4A16 sketch after this list)
- Supports calibration-based quantization of models (see the calibration sketch after this list)
- Compatible with major deep learning frameworks such as PyTorch, ONNX, and Caffe
- Supports HuggingFace LLMs, including floating-point models as well as AWQ- and GPTQ-quantized variants
- Compatible with 120+ mainstream models, including:
  1) Vision models (e.g., the YOLO series)
  2) Speech models (e.g., Whisper)
  3) Large language models (e.g., Qwen3, Qwen2.5VL, Llama2, MiniCPM-V-2_6, ChatGLM3)
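As a rough illustration of what the W4A16 and W8A16 modes denote (4-bit or 8-bit integer weights paired with 16-bit floating-point activations), the sketch below performs generic per-group symmetric weight quantization and an FP16 matmul; the function names and group size are assumptions for illustration, not this compiler's API:

```python
# Illustrative sketch (not this project's API): "W4A16" means
# 4-bit integer weights with 16-bit floating-point activations.
import numpy as np

def quantize_w4(w: np.ndarray, group_size: int = 128):
    """Symmetric per-group 4-bit quantization of a 2-D weight matrix."""
    out_ch, in_ch = w.shape
    w = w.reshape(out_ch, in_ch // group_size, group_size)
    scale = np.abs(w).max(axis=-1, keepdims=True) / 7.0   # symmetric int4 range [-7, 7]
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale.astype(np.float16)

def w4a16_matmul(x_fp16: np.ndarray, q: np.ndarray, scale: np.ndarray):
    """Dequantize weights to FP16 on the fly, then multiply with FP16 activations."""
    w = (q.astype(np.float16) * scale).reshape(q.shape[0], -1)
    return x_fp16 @ w.T

x = np.random.randn(2, 256).astype(np.float16)   # FP16 activations (A16)
w = np.random.randn(512, 256).astype(np.float32)
q, s = quantize_w4(w)                            # INT4 weights (W4)
y = w4a16_matmul(x, q, s)
```

Because the activations stay in FP16, weight-only modes like this need no activation calibration, which is one reason they are common for LLM deployment.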
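For full INT8 quantization, by contrast, a calibration step gathers activation statistics on sample inputs and derives per-tensor scales. A generic, hedged sketch of min/max calibration follows; `run_model` and its hook-style output dict are hypothetical stand-ins, not this project's tooling:

```python
# Generic min/max calibration sketch for INT8 quantization.
# run_model(batch) -> {tensor_name: activation array} is an assumed hook.
import numpy as np

def calibrate(run_model, calib_batches):
    """Track per-tensor activation ranges over the calibration set."""
    ranges = {}
    for batch in calib_batches:
        for name, act in run_model(batch).items():
            lo, hi = ranges.get(name, (np.inf, -np.inf))
            ranges[name] = (min(lo, act.min()), max(hi, act.max()))
    # Symmetric scale mapping the observed range onto [-127, 127].
    return {name: max(abs(lo), abs(hi)) / 127.0
            for name, (lo, hi) in ranges.items()}
```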
Accuracy comparison across data types
Performance of Qwen2.5-VL-3B-AWQ INT4 on the Sophgo BM1684X