level: research
closed-loop driving simulators often fill their worlds with traffic agents that all act the same. these agents come from rule-based systems or learned models trained on a single behavior. recent work tries to add style variety by using labels on recorded data or reward weights guessed by large language models. but these methods only approximate what a style should reward, instead of showing how real humans drive when asked to follow a specific style.
personadrive is a pipeline that conditions a vision-language-action driving agent on retrieved examples from a human driving dataset. in this dataset, people drove routes in the CARLA simulator under aggressive, neutral, and conservative instructions using a driver-in-the-loop setup. the pipeline has three stages. first, it mines triplets from the human data offline, using a score that combines image and text similarity. then, at runtime, it retrieves the most relevant human demonstrations for the current situation. finally, the agent uses these demonstrations to guide its own driving actions.
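the runtime retrieval step can be sketched roughly as follows. this is a minimal illustration, not the pipeline's actual implementation. the summary does not say how the image and text similarities are combined, so the weighted sum with `alpha`, the use of cosine similarity, the `Demo` record, and the `(steer, throttle)` action format are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Demo:
    """One stored human demonstration (hypothetical record layout)."""
    style: str                    # "aggressive", "neutral", or "conservative"
    image_emb: list[float]        # embedding of the camera frame (assumed precomputed)
    text_emb: list[float]         # embedding of a scene description (assumed precomputed)
    action: tuple[float, float]   # (steer, throttle), an assumed action format


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_score(q_img, q_txt, demo: Demo, alpha: float = 0.5) -> float:
    """Assumed scoring rule: convex combination of image and text similarity."""
    return alpha * cosine(q_img, demo.image_emb) + (1 - alpha) * cosine(q_txt, demo.text_emb)


def retrieve(q_img, q_txt, bank: list[Demo], style: str, k: int = 2, alpha: float = 0.5) -> list[Demo]:
    """Return the k demonstrations of the requested style that best match the query."""
    candidates = [d for d in bank if d.style == style]
    ranked = sorted(candidates, key=lambda d: combined_score(q_img, q_txt, d, alpha), reverse=True)
    return ranked[:k]
```

in use, the demonstrations returned by `retrieve` would be placed in the agent's context before it predicts an action. filtering by style before ranking keeps an aggressive query from pulling in conservative examples that happen to look similar.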
the approach moves beyond proxy signals by directly using human demonstrations of instructed styles. this allows the agent to produce more natural and varied behaviors in simulation. the work shows how retrieval-augmented generation can be applied to embodied agents, making simulated traffic more realistic and useful for testing self-driving systems.
why it matters: more human-like traffic agents in simulation can improve the safety testing of autonomous vehicles by exposing them to diverse and realistic behaviors.