Automotive-ENV: Benchmarking Multimodal Agents in Vehicle Interface Systems

Junfeng Yan*¹, Biao Wu*¹, Meng Fang², Ling Chen¹
¹Australian Artificial Intelligence Institute, Sydney, Australia
²University of Liverpool, Liverpool, United Kingdom


System Overview

Figure 1a. Automotive-ENV task overview.
Figure 1b. Automotive-ENV architecture overview.

Automotive-ENV is an Automotive OS-based environment in which the agent observes the accessibility tree, the screen, and GPS; optionally consults GPS-contextualized web knowledge; and acts through screen taps and API calls. Task success is determined by low-level programmatic checks of system signals.
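The evaluation idea above can be illustrated with a minimal Python sketch, assuming a toy in-memory vehicle: one task template is instantiated with different parameters, the agent acts through an API-call action, and success is decided by reading a system signal rather than the agent's reply. All names (`VehicleState`, `SetTemperatureTask`, `api_call`, the `hvac_temp_c` signal) are hypothetical and are not the Automotive-ENV API; the observation here omits the screenshot.

```python
from dataclasses import dataclass, field

# Hypothetical toy environment: none of these names come from Automotive-ENV.


@dataclass
class VehicleState:
    """Stand-in for low-level system signals exposed by the vehicle OS."""
    signals: dict = field(default_factory=lambda: {"hvac_temp_c": 22.0})
    gps: tuple = (-33.8688, 151.2093)  # (latitude, longitude), central Sydney

    def observe(self) -> dict:
        # A real observation would also include a screenshot; omitted here.
        label = f"{self.signals['hvac_temp_c']:g} C"
        return {"accessibility_tree": [{"id": "temp_label", "text": label}],
                "gps": self.gps}


@dataclass
class SetTemperatureTask:
    """One task template; each target value yields a separate task instance."""
    target_c: float

    @property
    def goal(self) -> str:
        # Natural-language instruction given to the agent.
        return f"Set the cabin temperature to {self.target_c:g} degrees Celsius."

    def is_successful(self, state: VehicleState) -> bool:
        # Success is read from the system signal, not from the agent's reply.
        return abs(state.signals["hvac_temp_c"] - self.target_c) < 1e-6


def api_call(state: VehicleState, signal: str, value: float) -> None:
    """Hypothetical API-call action that writes a signal directly."""
    state.signals[signal] = value


# Instantiate the same template with different parameters.
tasks = [SetTemperatureTask(t) for t in (18.0, 21.5, 25.0)]

state = VehicleState()
task = tasks[1]
print(task.goal)                  # Set the cabin temperature to 21.5 degrees Celsius.
print(task.is_successful(state))  # False
api_call(state, "hvac_temp_c", 21.5)
print(task.is_successful(state))  # True
print(state.observe()["accessibility_tree"][0]["text"])  # 21.5 C
```

Checking a signal instead of parsing the agent's answer is what makes such checks reproducible: the same final vehicle state always yields the same verdict.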

Abstract

Multimodal agents have demonstrated strong performance in general GUI interactions, but their application to automotive systems remains largely unexplored. In-vehicle GUIs present distinct challenges: drivers’ limited attention, strict safety requirements, and complex location-based interaction patterns. To address these challenges, we introduce Automotive-ENV, the first high-fidelity benchmark and interaction environment tailored for vehicle GUIs.

The platform defines 185 parameterized tasks spanning three categories (explicit control, implicit intent understanding, and safety awareness) and provides structured multimodal observations with precise programmatic checks for reproducible evaluation. Building on this benchmark, we propose ASURADA, a geo-aware multimodal agent that integrates GPS-informed context to adjust its actions dynamically according to location, environmental conditions, and regional driving norms.
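As a rough illustration of geo-aware context, here is a toy Python sketch, assuming a hand-written region table in place of the GPS-contextualized web knowledge the agent actually consults: GPS coordinates are mapped to a region, turned into a context string that could be added to the agent's prompt, and used to express a speed in the local unit. `Region`, `REGIONS`, `locate`, `geo_context`, and `cruise_speed_setting` are hypothetical names; this is not ASURADA's implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical sketch; a hand-written table stands in for GPS-contextualized
# web knowledge. Boxes are coarse: the UK box also covers Ireland (km/h).


@dataclass(frozen=True)
class Region:
    name: str
    lat_range: Tuple[float, float]
    lon_range: Tuple[float, float]
    speed_unit: str  # unit shown on the instrument cluster
    drives_on: str   # side of the road traffic keeps to


REGIONS = [
    Region("Australia", (-44.0, -10.0), (112.0, 154.0), "km/h", "left"),
    Region("United Kingdom", (49.9, 60.9), (-8.6, 1.8), "mph", "left"),
]


def locate(lat: float, lon: float) -> Optional[Region]:
    """Return the first region whose bounding box contains the point."""
    for r in REGIONS:
        if r.lat_range[0] <= lat <= r.lat_range[1] and r.lon_range[0] <= lon <= r.lon_range[1]:
            return r
    return None


def geo_context(lat: float, lon: float, weather: str) -> str:
    """Build a short context string that could be prepended to the agent prompt."""
    r = locate(lat, lon)
    if r is None:
        return f"Location unknown; weather: {weather}."
    return (f"Vehicle is in {r.name}; traffic drives on the {r.drives_on}; "
            f"speeds are in {r.speed_unit}; weather: {weather}.")


def cruise_speed_setting(requested_kmh: float, lat: float, lon: float) -> Tuple[int, str]:
    """Express a requested speed (given in km/h) in the local display unit."""
    r = locate(lat, lon)
    if r is not None and r.speed_unit == "mph":
        return round(requested_kmh / 1.609344), "mph"
    return round(requested_kmh), "km/h"


print(geo_context(-33.87, 151.21, "rain"))
# Vehicle is in Australia; traffic drives on the left; speeds are in km/h; weather: rain.
print(cruise_speed_setting(100, 53.41, -2.98))    # (62, 'mph')  Liverpool
print(cruise_speed_setting(100, -33.87, 151.21))  # (100, 'km/h')  Sydney
```

A static table keeps the sketch deterministic and testable; its coarseness is also a reminder of why location context is better drawn from geocoding or web sources, as the agent described here does.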

Experiments show that geo-aware information significantly improves success on safety-aware tasks, highlighting the importance of location-based context in automotive environments. We will release Automotive-ENV, complete with all tasks and benchmarking tools, to further the development of safe and adaptive in-vehicle agents.

BibTeX

@misc{yan2025automotiveenvbenchmarkingmultimodalagents,
  title={Automotive-ENV: Benchmarking Multimodal Agents in Vehicle Interface Systems},
  author={Junfeng Yan and Biao Wu and Meng Fang and Ling Chen},
  year={2025},
  eprint={2509.21143},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2509.21143}
}