source: arxiv artificial intelligence: aura: action-gated memory for robot policies at constant vram
level: technical
large language models use a kv-cache that grows with each token, which works for datacenters handling many short requests. robots instead run a single long episode on edge hardware with limited memory and bandwidth. on such hardware, memory writes can cost more than compute, and writes that reach flash also wear it out. aura-mem is designed for this setting: it pairs a frozen vision-language-action model with a small recurrent memory that stays the same size no matter how long the robot runs.
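to make the growth argument concrete, here is a minimal sketch of the standard kv-cache size arithmetic set against a fixed-size state. the function names and the model dimensions (12 layers, 8 kv heads, head size 64, 2 bytes per value) are hypothetical choices for illustration and are not taken from the paper; only the 4,224-byte figure comes from the text.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Size of a transformer kv-cache after `tokens` cached positions.

    The leading factor 2 counts one key and one value vector
    per layer, per kv head, per cached token.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


def fixed_state_bytes(tokens, state_bytes=4224):
    """A recurrent state has the same size regardless of episode length."""
    return state_bytes


if __name__ == "__main__":
    # Hypothetical small model; real dimensions will differ.
    for tokens in (1, 1_000, 100_000):
        cache = kv_cache_bytes(tokens, layers=12, kv_heads=8, head_dim=64)
        print(tokens, cache, fixed_state_bytes(tokens))
```

the cache size is linear in the number of cached tokens, so a long episode pays for every step it has ever seen, while the fixed state costs the same at step one and at step one hundred thousand.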
the key idea is a learned gate that decides when to write to memory. it is trained directly on a closed-loop action error signal, not on reconstructing observations. the gate writes only when the current observation would change the next action. as a result, the memory stays silent most of the time, which avoids unnecessary writes and keeps the state tiny. the inference state is fixed at 4,224 bytes, while a standard kv-cache can grow to over 6,000 bytes and keeps expanding.
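the inference-time behaviour described above can be sketched as a fixed-size buffer with a hard write gate. this is an illustrative toy, not the paper's implementation: the class `GatedMemory`, the logistic gate with hand-set weights, the blending update, and the layout of 1,056 float32 values (which happens to total 4,224 bytes) are all assumptions. in the paper the gate is learned from action error; here its parameters are simply given.

```python
import math
from array import array

STATE_LEN = 1056  # hypothetical layout: 1056 float32 values = 4,224 bytes


class GatedMemory:
    """Fixed-size recurrent state with a hard write gate (illustrative only)."""

    def __init__(self, gate_weights, gate_bias=0.0, threshold=0.5, mix=0.1):
        self.state = array("f", [0.0] * STATE_LEN)
        self.gate_weights = gate_weights  # stand-in for learned gate parameters
        self.gate_bias = gate_bias
        self.threshold = threshold
        self.mix = mix  # how strongly a write overwrites the old state
        self.writes = 0

    def gate_score(self, features):
        # Logistic score over observation features. The training signal
        # (closed-loop action error) is not modelled in this sketch.
        z = sum(w * x for w, x in zip(self.gate_weights, features)) + self.gate_bias
        return 1.0 / (1.0 + math.exp(-z))

    def step(self, features):
        """Write only if the gate opens; return True when a write happened."""
        if self.gate_score(features) <= self.threshold:
            return False  # memory stays silent, state untouched
        for i in range(STATE_LEN):
            x = features[i % len(features)]  # toy encoding of the observation
            self.state[i] = (1.0 - self.mix) * self.state[i] + self.mix * x
        self.writes += 1
        return True

    def nbytes(self):
        # Size depends only on STATE_LEN, never on the number of steps taken.
        return len(self.state) * self.state.itemsize


if __name__ == "__main__":
    mem = GatedMemory(gate_weights=[4.0, 0.0], gate_bias=-2.0)
    # 98 uninformative observations followed by 2 that open the gate.
    for obs in [[0.0, 1.0]] * 98 + [[1.0, 1.0]] * 2:
        mem.step(obs)
    print(mem.writes, mem.nbytes())
```

in this toy run the gate opens on only the last two of one hundred steps, and the buffer is the same size before and after, which is the pairing of sparse writes and constant state that the text describes.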
this approach suits robots that need to act continuously without resetting. by keeping memory constant and write operations sparse, aura-mem reduces wear on flash storage and fits within tight edge hardware limits. the method does not require retraining the base model, making it easier to add to existing systems. it shows that memory for embodied agents can be both compact and selective, focusing only on information that actually changes behavior.
why it matters: constant memory and selective writes let robots run long tasks on cheap edge hardware without running out of memory or wearing out storage.