source: arxiv artificial intelligence: dr-dci: scaling direct corpus interaction via dynamic workspace expansion

level: research

agentic search over large document collections usually depends on retriever-mediated interfaces like bm25 or colbert. these systems rank relevant documents but only show them as a list or limited views. agents cannot easily reorganize material or check facts across documents. direct corpus interaction (dci) tries to fix this by giving agents shell-like commands to search, filter, compare, and verify across the whole corpus. however, running these commands on the entire corpus becomes slow and unstable as the collection grows.

dr-dci is a new framework that uses a retriever to steer dci. instead of working on the full corpus, the agent calls the retriever to pull relevant documents into a local workspace. the workspace expands dynamically as the agent needs more evidence. dci operations then run inside this smaller, focused set of documents. this preserves the speed and stability of retrieval without giving up the flexibility of direct interaction.
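
the workspace idea can be illustrated with a minimal sketch. this is not the paper's implementation: the toy corpus, the `retrieve` function (a simple term-overlap ranker standing in for bm25 or colbert), and the `Workspace` class with its `expand` and `grep` methods are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# Toy corpus; ids and texts are invented for illustration only.
CORPUS = {
    "d1": "bm25 ranks documents by term overlap",
    "d2": "colbert uses late interaction over token embeddings",
    "d3": "shell commands like grep filter lines in a workspace",
    "d4": "dynamic workspace expansion adds documents on demand",
}


def retrieve(query: str, k: int = 2) -> list[str]:
    """Hypothetical stand-in retriever: rank by shared query terms (not real BM25)."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in CORPUS.items():
        overlap = len(terms & set(text.split()))
        if overlap:
            # Negative overlap so that sorting puts the best match first,
            # with the document id as a deterministic tie-breaker.
            scored.append((-overlap, doc_id))
    return [doc_id for _, doc_id in sorted(scored)[:k]]


@dataclass
class Workspace:
    """Local document set that grows only when the agent asks for more."""
    docs: dict[str, str] = field(default_factory=dict)

    def expand(self, query: str, k: int = 2) -> list[str]:
        """Pull retrieved documents into the workspace; return the newly added ids."""
        added = []
        for doc_id in retrieve(query, k):
            if doc_id not in self.docs:
                self.docs[doc_id] = CORPUS[doc_id]
                added.append(doc_id)
        return added

    def grep(self, pattern: str) -> list[str]:
        """grep-like operation that scans workspace documents only, never the full corpus."""
        return sorted(d for d, text in self.docs.items() if pattern in text)


ws = Workspace()
first = ws.expand("workspace expansion")   # initial workspace
second = ws.expand("bm25 documents")       # later expansion adds only unseen documents
hits = ws.grep("documents")                # runs over the workspace only
```

the cost of `grep` here depends on the workspace size rather than the corpus size, which is the property the summary attributes to dr-dci.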

the approach treats retrieval as an action the agent can take, blending it with workspace-based dci. the agent decides when to fetch more documents and when to run commands on what it already has. this lets it handle complex tasks that need cross-document reasoning without getting bogged down by corpus size. the method aims to make agentic search both scalable and thorough.
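
a rough sketch of retrieval as one action among several follows. it assumes a scripted rule in place of a learned or llm-driven policy: check the workspace first, and retrieve only when the check comes up empty. the corpus, `retrieve`, and `run_agent` are hypothetical and only illustrate the control flow, not the method from the paper.

```python
# Toy corpus; ids and texts are invented for illustration only.
CORPUS = {
    "a": "the model was released in 2021",
    "b": "the model has 7 billion parameters",
    "c": "unrelated note about cooking",
}


def retrieve(query, k=1):
    """Hypothetical stand-in retriever: rank documents by shared query terms."""
    terms = set(query.split())
    ranked = sorted(
        CORPUS, key=lambda d: (-len(terms & set(CORPUS[d].split())), d)
    )
    # Drop documents that share no term with the query.
    return [d for d in ranked[:k] if terms & set(CORPUS[d].split())]


def run_agent(needed_facts, max_steps=10):
    """Scripted policy (an assumption, not a learned agent).

    For each fact, search the current workspace; if nothing matches,
    take a retrieve action to expand the workspace, then try again.
    """
    workspace = {}
    trace = []
    pending = list(needed_facts)
    for _ in range(max_steps):
        if not pending:
            break
        fact = pending[0]
        hits = sorted(d for d, text in workspace.items() if fact in text)
        if hits:
            # Evidence is already local: record the workspace search and move on.
            trace.append(("grep", fact, hits))
            pending.pop(0)
        else:
            # Evidence is missing: expand the workspace through the retriever.
            new = [d for d in retrieve(fact) if d not in workspace]
            trace.append(("retrieve", fact, new))
            if not new:
                pending.pop(0)  # nothing more to fetch; give up on this fact
            for d in new:
                workspace[d] = CORPUS[d]
    return trace, workspace


trace, workspace = run_agent(["2021", "parameters"])
```

in this toy run the agent alternates between the two action types, and the unrelated document never enters the workspace.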

why it matters: it offers a practical way to build ai agents that can search and reason deeply over large document sets without the slowdown and instability of running commands on the entire corpus, which is useful for research assistants or enterprise search tools.

