We propose a novel problem setting that integrates an LLM into cooperative autonomous driving, together with the proposed Vehicle-to-Vehicle Question-Answering (V2V-QA) dataset and benchmark. We also propose a baseline method, Vehicle-to-Vehicle Large Language Model (V2V-LLM), which uses an LLM to fuse perception information from multiple connected autonomous vehicles (CAVs) and answer driving-related questions: grounding, notable object identification, and planning. Experimental results show that our proposed V2V-LLM can serve as a promising unified model architecture for performing various tasks in cooperative autonomous driving, and it outperforms baseline methods that use other fusion approaches. Our work also opens a new research direction that can improve the safety of future autonomous driving systems. For more details, please refer to our paper on arXiv.
The following table summarizes the differences between our V2V-QA dataset and recent related autonomous driving datasets. V2V-QA is a question-answering dataset that covers multiple vehicles in real-world cooperative driving scenarios.
Our V2V-QA includes grounding, notable object identification, and planning question-answer pairs, as illustrated in the following figures.
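To make the three question types more concrete, below is a hypothetical illustration of what a single V2V-QA record might look like. The field names and wording here are assumptions for illustration only, not the released dataset schema; please refer to the dataset files for the actual format.

```python
# Hypothetical V2V-QA record (illustrative only; not the released schema).
grounding_example = {
    "task": "grounding",  # other tasks: "notable_object_identification", "planning"
    "question": (
        "Ego vehicle: is there an object near the reference location "
        "shared by the other CAV? If so, where is it?"
    ),
    "answer": "Yes, there is a vehicle near the reference location in the ego frame.",
}
```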
Our V2V-LLM takes the individual perception features of every CAV as the vision input and a question as the language input, and it generates an answer as the language output.
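The sketch below shows one way such a fusion could be wired up: each CAV's perception features are projected into the language model's embedding space and concatenated into a single visual token sequence, which would then be prepended to the question's token embeddings before generation. This is a minimal, assumed sketch (the class name, feature dimensions, and shared-projector design are ours), not the authors' implementation.

```python
# Minimal sketch of LLM-based fusion of per-CAV perception features (assumed design).
import torch
import torch.nn as nn


class V2VFeatureFuser(nn.Module):
    """Projects each CAV's perception features into LLM token embeddings."""

    def __init__(self, perception_dim: int = 256, llm_dim: int = 4096):
        super().__init__()
        # Shared linear projector from perception feature size to LLM embedding size.
        self.projector = nn.Linear(perception_dim, llm_dim)

    def forward(self, cav_features: list) -> torch.Tensor:
        # cav_features: one (num_tokens, perception_dim) tensor per CAV.
        projected = [self.projector(f) for f in cav_features]
        # Concatenate all CAVs' visual tokens into a single sequence.
        return torch.cat(projected, dim=0)


# Example: two CAVs, each contributing 16 perception tokens.
fuser = V2VFeatureFuser()
visual_tokens = fuser([torch.randn(16, 256), torch.randn(16, 256)])
# visual_tokens has shape (32, 4096); these tokens would be prepended to the
# question's token embeddings, and the LLM would generate the answer text.
print(visual_tokens.shape)
```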
The following figures show V2V-LLM's qualitative results on V2V-QA's testing split. For more details, please refer to our paper on arXiv.
@ARTICLE{chiu2025v2vllm,
title={V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models},
author={Chiu, Hsu-kuang and Hachiuma, Ryo and Wang, Chien-Yi and Smith, Stephen F. and Wang, Yu-Chiang Frank and Chen, Min-Hung},
journal={arXiv preprint arXiv:2502.09980},
year={2025}
}