AUTOMOTIVE-ENV: Benchmarking Multimodal Agents in Vehicle Interface Systems

Junfeng Yan*¹, Biao Wu*¹, Meng Fang², Ling Chen¹

¹Australian Artificial Intelligence Institute, Sydney, Australia
²University of Liverpool, Liverpool, United Kingdom

System Overview

Figure 1a. Automotive-ENV task overview.

Figure 1b. Automotive-ENV architecture overview.

An automotive OS-based environment in which the agent observes the accessibility tree, the screen, and GPS; optionally consults GPS-contextualized web knowledge; and acts through screen taps and API calls. Task success is determined by low-level programmatic checks of system signals.
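The observe, act, and check loop described in this caption can be illustrated with a small toy sketch. It is not the Automotive-ENV API: every name here (`Observation`, `Action`, `ToyVehicleEnv`, `run_episode`, the `ac_on` and `fan_speed` signals, and the coordinates) is hypothetical and chosen only to show how success can be decided from system signals rather than from the agent's own report.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Observation:
    """Hypothetical observation bundle: accessibility tree, screenshot, GPS."""
    a11y_tree: dict
    screenshot: bytes
    gps: Tuple[float, float]  # (latitude, longitude)


@dataclass
class Action:
    """Hypothetical action: a screen tap or a named API call."""
    kind: str    # "tap" or "api"
    target: str  # UI element id or API name
    args: Dict[str, object] = field(default_factory=dict)


class ToyVehicleEnv:
    """Toy stand-in for the environment; system signals live in a dict."""

    def __init__(self) -> None:
        self.signals: Dict[str, object] = {"ac_on": False, "fan_speed": 0}

    def observe(self) -> Observation:
        # Expose the current signals through a minimal accessibility tree.
        tree = {"buttons": ["ac_toggle"], "state": dict(self.signals)}
        return Observation(a11y_tree=tree, screenshot=b"", gps=(-33.87, 151.21))

    def step(self, action: Action) -> None:
        if action.kind == "tap" and action.target == "ac_toggle":
            self.signals["ac_on"] = not self.signals["ac_on"]
        elif action.kind == "api" and action.target == "set_fan_speed":
            self.signals["fan_speed"] = int(action.args["level"])


def run_episode(
    env: ToyVehicleEnv,
    policy: Callable[[Observation], Optional[Action]],
    check: Callable[[Dict[str, object]], bool],
    max_steps: int = 5,
) -> bool:
    """Run the policy until the programmatic check passes or steps run out."""
    for _ in range(max_steps):
        if check(env.signals):
            return True
        action = policy(env.observe())
        if action is None:  # the policy has nothing left to do
            break
        env.step(action)
    return check(env.signals)


def scripted_policy(obs: Observation) -> Optional[Action]:
    """Scripted stand-in for an agent: turn the AC on, then set fan speed 3."""
    state = obs.a11y_tree["state"]
    if not state["ac_on"]:
        return Action("tap", "ac_toggle")
    if state["fan_speed"] != 3:
        return Action("api", "set_fan_speed", {"level": 3})
    return None


def ac_on_fan_3(signals: Dict[str, object]) -> bool:
    """Programmatic success check that reads system signals directly."""
    return signals["ac_on"] is True and signals["fan_speed"] == 3


success = run_episode(ToyVehicleEnv(), scripted_policy, ac_on_fan_3)
```

In this sketch the check function never inspects the agent's output, only the environment's signals, which is the property that makes evaluation reproducible.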

Abstract

Multimodal agents have demonstrated strong performance in general GUI interactions, but their application in automotive systems remains largely unexplored. In-vehicle GUIs present distinct challenges: drivers’ limited attention, strict safety requirements, and complex location-based interaction patterns. To address these challenges, we introduce Automotive-ENV, the first high-fidelity benchmark and interaction environment tailored for vehicle GUIs.

This platform defines 185 parameterized tasks spanning explicit control, implicit intent understanding, and safety-aware tasks, and provides structured multimodal observations with precise programmatic checks for reproducible evaluation. Building on this benchmark, we propose ASURADA, a geo-aware multimodal agent that integrates GPS-informed context to dynamically adjust actions based on location, environmental conditions, and regional driving norms.

Experiments show that geo-aware information significantly improves success on safety-aware tasks, highlighting the importance of location-based context in automotive environments. We will release Automotive-ENV, complete with all tasks and benchmarking tools, to further the development of safe and adaptive in-vehicle agents.

BibTeX

@misc{yan2025automotiveenvbenchmarkingmultimodalagents,
  title={Automotive-ENV: Benchmarking Multimodal Agents in Vehicle Interface Systems},
  author={Junfeng Yan and Biao Wu and Meng Fang and Ling Chen},
  year={2025},
  eprint={2509.21143},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2509.21143}
}
