DriveMM: All-in-One Large Multimodal Model for Autonomous Driving

Zhijian Huang 1*, Chengjian Feng 2*, Feng Yan, Baihui Xiao 2, Zequn Jie, Yujie Zhong, Xiaodan Liang 1*, Lin Ma 2†

1Shenzhen Campus of Sun Yat-sen University, 2Meituan Inc.

[Figure: two radar charts, shown left and right]

Left: DriveMM outperforms all specialist SOTA models and other general large multimodal models across all 6 datasets, which together comprise 13 tasks. Right: In the zero-shot setting on an unseen dataset, DriveMM generalizes better than specialist models trained on individual datasets.