research agents leak private data through web queries

source: hugging face blog: mosaicleaks: can your research agent keep a secret?

level: research

deep research agents that combine private documents with web tools can leak sensitive information through their search queries. an observer who sees the query log can piece together private facts from fragments spread across several queries, even when no single query reveals them, a problem called the mosaic effect. mosaicleaks measures this by giving agents multi-hop questions that require both local and web information. the benchmark includes 1,001 chains over synthetic enterprise documents and a controlled web corpus.
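to make the mosaic effect concrete, here is a minimal sketch, not the benchmark's actual detector. it assumes a private fact can be split into text fragments and that a fragment counts as exposed when it appears verbatim in a query. the function names, the fragments, and the query log are all hypothetical.

```python
def fragments_in(query, fragments):
    """Return the secret fragments that appear verbatim in one query."""
    text = query.lower()
    return {f for f in fragments if f.lower() in text}


def classify_leak(query_log, fragments):
    """Classify a query log as a direct leak, a mosaic leak, or neither.

    direct: one query on its own contains every fragment.
    mosaic: no single query does, but the log as a whole does.
    """
    fragments = set(fragments)
    per_query = [fragments_in(q, fragments) for q in query_log]
    direct = any(hit == fragments for hit in per_query)
    seen = set().union(*per_query)  # what an observer of the full log learns
    mosaic = (seen == fragments) and not direct
    return {"direct": direct, "mosaic": mosaic, "seen": seen}


# Hypothetical private fact, split into three fragments.
secret = {"acme corp", "project falcon", "solid-state battery"}

# Hypothetical query log: each query looks harmless in isolation.
log = [
    "acme corp quarterly filings",
    "project falcon solid-state battery suppliers",
]

result = classify_leak(log, secret)
```

substring matching is a deliberate simplification. paraphrased fragments would slip past it, which is one reason a learned detector is more realistic than exact matching.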

simply instructing the agent in its prompt not to leak helps a little, but the effect is inconsistent and can hurt task performance. training the agent only for task success raises chain success from 48.7% to 59.3%, but leakage jumps from 34.0% to 51.7%. the agent learns to pack more context into its queries, which helps retrieval but gives away more private fragments.
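the size of that trade-off is easy to check from the reported figures. the short calculation below uses only the four percentages quoted above; the variable names are illustrative.

```python
# Reported figures for task-only training (percent).
success_before, success_after = 48.7, 59.3
leakage_before, leakage_after = 34.0, 51.7

# Changes in percentage points, rounded to one decimal to avoid float noise.
success_gain = round(success_after - success_before, 1)
leakage_gain = round(leakage_after - leakage_before, 1)

# Points of leakage added per point of success gained.
leak_per_success_point = round(leakage_gain / success_gain, 2)

print(success_gain, leakage_gain, leak_per_success_point)
```

in other words, leakage rises faster than success: each point of added task success comes with more than one and a half points of added leakage.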

a new training method called privacy-aware deep research (pa-dr) combines a task reward with a learned privacy reward. the privacy reward penalizes queries that leak information directly or that create new mosaic leaks when combined with earlier queries. this approach lifts strict chain success to 58.7% while cutting answer and full-information leakage to 9.9%, lower than the untrained model's. pa-dr also uses situational rewards that are more sample-efficient, reaching the same task performance with far fewer training samples.
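the idea of combining a task reward with a privacy penalty can be sketched as follows. this is an illustration under stated assumptions, not pa-dr's implementation: the source describes a learned privacy reward, whereas this stand-in takes per-query leak judgements as given, and the penalty sizes, the averaging over queries, and `privacy_weight` are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class QueryJudgement:
    """Hypothetical per-query verdict from some leak detector."""
    direct_leak: bool      # the query itself exposes private information
    new_mosaic_leak: bool  # the query completes a leak with earlier queries


def privacy_reward(judgements, direct_penalty=1.0, mosaic_penalty=1.0):
    """Return a non-positive reward: the average penalty per query."""
    if not judgements:
        return 0.0
    total = sum(
        direct_penalty * j.direct_leak + mosaic_penalty * j.new_mosaic_leak
        for j in judgements
    )
    return -total / len(judgements)


def combined_reward(task_correct, judgements, privacy_weight=0.5):
    """Task reward (1.0 if the chain is solved) plus weighted privacy reward."""
    return float(task_correct) + privacy_weight * privacy_reward(judgements)


# A solved chain where one of two queries leaked directly.
episode = [QueryJudgement(False, False), QueryJudgement(True, False)]
reward = combined_reward(True, episode)
```

with these illustrative weights, a correct answer reached through leaky queries scores lower than a correct answer reached through clean ones, which is the pressure that task-only training lacks.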

why it matters: for ai and data science, this shows that privacy in agentic systems cannot be fixed with prompts alone. it requires training that explicitly penalizes information leakage, which is critical for enterprise deployments.