Self-evolving agent boosts legal case retrieval with BM25

Source: arXiv Artificial Intelligence: When Rules Learn: A Self-Evolving Agent for Legal Case Retrieval

Level: research

Legal case retrieval is hard because legal language is complex and retrieval depends on exact term matching between queries and cases. Dense retrieval models have improved, but BM25 remains a strong baseline. The researchers built a self-evolving framework in which an LLM agent writes query rewriting rules, automatically tests rule combinations, and drops rules that past results show to be unhelpful. The approach requires no parameter training and improves BM25 directly.
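To make the idea of rewriting a query before BM25 scoring concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the three toy cases, the `expand_synonyms` rule, the term mapping, and the `k1`/`b` values are all assumptions chosen for illustration, and the scorer is a compact version of the standard Okapi BM25 formula.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class BM25:
    """Minimal Okapi BM25 over whitespace tokens (k1 and b are assumed defaults)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(c.values()) for c in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        n = len(self.docs)
        # Document frequency: number of cases that contain each term.
        df = Counter(t for c in self.docs for t in c)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.docs[i], self.lengths[i]
        total = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            norm = tf[t] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * tf[t] * (self.k1 + 1) / norm
        return total

    def rank(self, query):
        return sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)

def expand_synonyms(query, mapping):
    """Hypothetical rewriting rule: append the legal term for each colloquial word."""
    tokens = tokenize(query)
    extra = [mapping[t] for t in tokens if t in mapping]
    return " ".join(tokens + extra)

# Toy corpus (invented for this sketch, not taken from any benchmark).
cases = [
    "contract dispute over late delivery of goods",
    "defendant charged with fraud in a loan application",
    "defendant convicted of theft after taking a vehicle",
]
bm25 = BM25(cases)

query = "suspect stole car"
rewritten = expand_synonyms(query, {"stole": "theft", "car": "vehicle"})

raw_score = bm25.score(query, 2)            # no shared terms with the theft case
rewritten_score = bm25.score(rewritten, 2)  # now matches "theft" and "vehicle"
top_case = bm25.rank(rewritten)[0]
```

The raw query shares no terms with the theft case, so BM25 cannot retrieve it. The rewritten query adds the terms the case actually uses, which is the kind of lexical gap a rewriting rule is meant to close.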

The framework gives the agent an automated evaluation environment. The agent creates rewriting rules, plans experiments to test rule combinations, and prunes rules that do not help, using historical feedback to keep only the useful ones. The method was evaluated on LeCaRDv2, a Chinese legal case retrieval benchmark, where it outperformed non-evolving baselines such as human-written rules and greedy rule selection.
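The create, test, and prune loop can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's algorithm: the three rules are hand-written stand-ins for LLM-generated ones, a simple token-overlap scorer stands in for BM25, the data is invented, and the pruning criterion (drop any rule whose average marginal gain in the feedback log is not positive) is one plausible choice among many.

```python
from itertools import combinations

# Toy corpus and labelled queries (hypothetical data, not from LeCaRDv2).
CASES = [
    "defendant breached contract over late delivery of goods",
    "defendant charged with fraud in a loan application",
    "defendant convicted of theft after taking a vehicle",
]
QUERIES = [("suspect stole car", 2), ("lied on loan paperwork", 1)]

# Hypothetical rewriting rules; in the described framework an LLM agent writes these.
def add_theft(q):
    return q + " theft" if "stole" in q.split() else q

def add_vehicle(q):
    return q + " vehicle" if "car" in q.split() else q

def add_contract(q):
    return q + " contract"  # overly broad on purpose, so it should be pruned

RULES = {"add_theft": add_theft, "add_vehicle": add_vehicle, "add_contract": add_contract}

def overlap(query, case):
    """Stand-in lexical scorer: number of shared distinct tokens (not BM25)."""
    return len(set(query.split()) & set(case.split()))

def evaluate(combo):
    """Top-1 accuracy: a hit needs the relevant case to score strictly highest."""
    hits = 0
    for query, relevant in QUERIES:
        for name in combo:
            query = RULES[name](query)
        scores = [overlap(query, c) for c in CASES]
        best = max(scores)
        hits += scores[relevant] == best and scores.count(best) == 1
    return hits / len(QUERIES)

def marginal_gain(rule, history):
    """Average score change from adding `rule`, over tested combos that contain it."""
    deltas = []
    for combo, score in history.items():
        without = tuple(r for r in combo if r != rule)
        if rule in combo and without in history:
            deltas.append(score - history[without])
    return sum(deltas) / len(deltas) if deltas else 0.0

def evolve(rule_names, max_size=2):
    history = {(): evaluate(())}  # feedback log: rule combination -> score
    active = sorted(rule_names)
    for size in range(1, max_size + 1):
        for combo in combinations(active, size):
            history[combo] = evaluate(combo)
        # Prune rules whose recorded marginal gain is not positive.
        active = [r for r in active if marginal_gain(r, history) > 0]
    return active, history

active, history = evolve(RULES)
```

In this toy run the overly broad `add_contract` rule lowers accuracy when tested alone, so it is dropped before any pair experiments are planned, while the two targeted rules survive. Pruning early also shrinks the number of combinations left to test, which matters once the rule pool grows.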

The work shows that rule-based query rewriting can still be effective for specialized search tasks. By letting an LLM agent evolve rules over time, the system adapts to the legal domain without expensive model training. This could be useful for other retrieval tasks where training data is scarce or where simple lexical methods are preferred. The approach combines the interpretability of rules with the adaptability of LLM-driven iteration.

Why it matters: It offers a training-free way to improve legal search with interpretable rules, which is useful for AI applications in law where data and compute are limited.
