source: arxiv artificial intelligence: learning transferable latent user preferences for human-aligned decision making

level: research

large language models often fail to produce solutions that match what people actually want. they can handle clear instructions but struggle with unspoken preferences that affect how decisions should be made in ambiguous situations. current methods either need many repeated interactions with users or cannot apply learned preferences to new tasks. this limits their use in real-world settings where quick, personalized decisions matter.

the clipr framework addresses this by learning transferable natural-language rules from just a few conversations. an llm acts as a high-level reasoner, inferring latent preferences during a limited number of interactions. these preferences are then distilled into actionable rules that guide downstream decision making across different tasks and contexts. the approach reduces the need for constant user feedback while keeping decisions aligned with individual values.
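the two-stage loop described above can be sketched in miniature. this is an illustrative assumption, not the paper's implementation: the real system would use an llm as the reasoner, and its prompts, rule format, and apis are not specified here, so every function and data structure below is hypothetical, with the llm calls replaced by trivial stand-ins so the sketch runs.

```python
# hypothetical sketch of a clipr-style preference loop: infer a
# natural-language rule from a short conversation, store it, then let the
# stored rule guide a decision on a task the user never gave feedback on.
from dataclasses import dataclass, field

@dataclass
class PreferenceStore:
    """holds natural-language rules inferred from past interactions."""
    rules: list = field(default_factory=list)

    def add(self, rule: str) -> None:
        # deduplicate so repeated interactions don't pile up copies
        if rule not in self.rules:
            self.rules.append(rule)

def infer_rule(conversation: list[str]) -> str:
    """stand-in for the llm 'high-level reasoner'. in the real framework an
    llm would read the conversation and verbalize the latent preference;
    here a trivial keyword heuristic keeps the sketch runnable."""
    text = " ".join(conversation).lower()
    if "short" in text or "brief" in text:
        return "prefer concise answers over exhaustive ones"
    return "no strong preference detected"

def decide(task: str, store: PreferenceStore, options: list[str]) -> str:
    """stand-in for rule-conditioned downstream decision making: pick the
    option most compatible with the stored rules (here, by length)."""
    if any("concise" in r for r in store.rules):
        return min(options, key=len)
    return options[0]

# learn from one brief conversation...
store = PreferenceStore()
store.add(infer_rule(["user: keep it short please", "assistant: sure"]))

# ...and transfer the rule to a new task without retraining
choice = decide("summarize this article", store,
                ["a one-paragraph summary", "an exhaustive ten-page report"])
```

the point of the sketch is the separation of concerns: the reasoner writes preferences down as portable text, and the decision step only consumes that text, which is what lets a rule learned in one context carry over to another.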

experiments show that clipr improves alignment with human preferences compared to baselines that do not transfer learned rules. the framework extracts generalizable patterns from user feedback and reuses them in new situations without retraining, which makes it practical for applications such as personalized assistants, adaptive tutoring, or any system where understanding subtle user needs is key. the method focuses on making llm-based reasoning more responsive to the people it serves.

why it matters: it enables ai systems to quickly adapt to individual user preferences across tasks, reducing the need for constant feedback and making personalized decision support more practical.
