RoboTron-Sim: Improving Real-World Driving via Simulated Hard-Case

Figure 1. RoboTron-Sim achieves significant improvements in real-world driving capabilities by leveraging simulated hard-case data. We evaluate the benefits of simulated data using representative methods. Results show that the traditional method VAD yields minor gains, while LLaVA-OneVision struggles to improve performance in challenging scenarios. In contrast, our proposed RoboTron-Sim achieves over 50% improvement in hard-to-drive situations.

Abstract

Collecting real-world data for rare high-risk scenarios, long-tailed driving events, and complex interactions remains challenging, which leads to poor performance of existing autonomous driving systems in these critical situations. In this paper, we propose RoboTron-Sim, which improves real-world driving in critical situations by using simulated hard cases. First, we develop a simulated dataset called Hardcase Augmented Synthetic Scenarios (HASS), which covers 13 high-risk edge-case categories as well as balanced environmental conditions such as day/night and sunny/rainy. Second, we introduce Scenario-aware Prompt Engineering (SPE) and an Image-to-Ego Encoder (I2E Encoder), which enable multimodal large language models to learn challenging real-world driving skills from HASS by adapting to the environmental deviations and hardware differences between real and simulated scenarios. Extensive experiments on nuScenes show that RoboTron-Sim improves driving performance in challenging scenarios by approximately 50%, achieving state-of-the-art results in real-world open-loop planning. Qualitative results further demonstrate that RoboTron-Sim handles rare high-risk driving scenarios more effectively.

RoboTron-Sim Framework

Figure 2. The overall framework of our proposed end-to-end autonomous driving system. The framework leverages simulation data to enhance performance in real-world scenarios, by integrating videos, 3D transformation, location, data source, and instruction information.

Experiment

Table 1. Comparison of open-loop planning on the nuScenes dataset, with and without ego pose as input. We report the L2 distance (m), collision rate (%), and boundary violation rate (%) at the 1s, 2s, and 3s time horizons, along with their averages.
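As an illustration of how the L2 numbers in such a table can be read, the following is a minimal sketch of a per-horizon L2 computation. It is not the evaluation code behind the table. It assumes that trajectories are lists of (x, y) waypoints in metres sampled at 2 Hz, and that the error is taken at the waypoint reached at each horizon; some open-loop planning protocols instead average the error over all waypoints up to the horizon. The function name `l2_at_horizons` and its parameters are hypothetical.

```python
import math


def l2_at_horizons(pred, gt, hz=2, horizons=(1, 2, 3)):
    """Return the L2 error (m) at each horizon plus the mean over horizons.

    pred, gt: equal-length lists of (x, y) waypoints in metres.
    hz: assumed sampling rate of the waypoints (2 Hz is an assumption here).
    horizons: time horizons in seconds at which the error is reported.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of waypoints")

    # Euclidean distance between predicted and ground-truth waypoints.
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]

    result = {}
    for t in horizons:
        idx = t * hz - 1  # index of the waypoint reached after t seconds
        if idx >= len(errors):
            raise ValueError(f"trajectory too short for the {t}s horizon")
        result[f"{t}s"] = errors[idx]

    # Average over the reported horizons, as in the "Avg." column of such tables.
    result["avg"] = sum(result[f"{t}s"] for t in horizons) / len(horizons)
    return result


# Toy example: ground truth drives straight ahead, prediction drifts sideways.
gt_traj = [(0.0, float(i)) for i in range(1, 7)]
pred_traj = [(0.1 * i, float(i)) for i in range(1, 7)]
scores = l2_at_horizons(pred_traj, gt_traj)
```

Collision rate and boundary violation rate additionally require the ego footprint, other agents' boxes, and map geometry, so they are left out of this sketch.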

Table 2. Red arrows indicate improvement (lower values are better); gray denotes no significant change. "MLLM" denotes the baseline we use.

Table 3. L2 performance comparison of different datasets in easy-to-drive (E2D) and hard-to-drive (H2D) scenarios (lower values are better).

Visualizations

Green indicates the trajectory generated by RoboTron-Sim; red shows the ground truth (interference); yellow denotes the baseline.

BibTeX

@article{anonymous2025RoboTron-Sim,
  title={RoboTron-Sim: Improving Real-World Driving via Simulated Hard-Case},
  author={Baihui Xiao and Chengjian Feng and Zhijian Huang and Feng Yan and Yujie Zhong and Lin Ma},
  journal={arXiv preprint arXiv:0000.00000},
  year={2025}
}