source: arxiv artificial intelligence: embeddings for preferences, not semantics

level: research

modern ai lets groups express views as free-form text instead of picking from fixed options. a natural next step is to embed these opinions into a vector space, so that existing algorithms for facility location and fair clustering can find representative or fair outcomes. but standard text embeddings measure semantic similarity, not whether someone agrees with a statement. the mismatch bites when semantic closeness does not mean preferential closeness: "raise taxes" and "cut taxes" are about the same topic, and so embed nearby, while expressing opposite preferences.
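to make the facility-location connection concrete, here is a minimal sketch (the data and names are illustrative, not from the paper) of the simplest such algorithm: picking the 1-median, or medoid, of a set of opinion embeddings as the "representative" opinion.

```python
import numpy as np

# hypothetical opinion embeddings: two near-duplicate views and one outlier
opinions = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
])

# 1-median / medoid: the opinion minimizing total distance to all others --
# a standard facility-location notion of a representative point
dists = np.linalg.norm(opinions[:, None, :] - opinions[None, :, :], axis=-1)
medoid = int(np.argmin(dists.sum(axis=1)))
```

whether `medoid` is a *preferentially* representative opinion depends entirely on the embedding space these vectors live in, which is exactly the paper's concern.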

the paper calls the desired property preferential similarity: a person's agreement with a text should be inversely related to their distance from it in the embedding space. off-the-shelf embeddings sometimes work because semantics and preferences often correlate, but they fail when that correlation breaks. the authors frame this as an invariance problem: text embedding models encode both preference-relevant signals, like stance and values, and semantic information, and the goal is to isolate the preference signal.
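one way to read the definition operationally (a sketch with made-up numbers, not the paper's evaluation): for a given person, agreement scores and embedding distances should be strongly negatively correlated.

```python
import numpy as np

# hypothetical: one person's agreement (1-5 scale) with four statements,
# and that person's distance to each statement in some embedding space
agreement = np.array([5.0, 4.0, 2.0, 1.0])
distance = np.array([0.2, 0.5, 1.1, 1.4])

# preferential similarity asks these to be inversely related; a simple
# diagnostic is the correlation, which should be strongly negative
r = np.corrcoef(agreement, distance)[0, 1]
```

for a purely semantic embedding, `r` can be near zero, or even positive, whenever people disagree most with texts on the topics they write about.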

to fix this, the researchers propose learning embeddings that are invariant to semantic content but sensitive to preferences. they introduce a method that uses contrastive learning on pairs of texts with known agreement labels, pushing embeddings of agreed texts together and disagreed texts apart while ignoring semantic similarity. experiments show their preference embeddings outperform standard embeddings on tasks like selecting a representative opinion from a set of free-text responses, making collective decision tools more reliable.
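the training signal can be sketched with a generic margin-based contrastive loss; this is an assumption about the setup (the paper's exact objective may differ), but it shows the mechanic of pulling agreed pairs together and pushing disagreed pairs apart.

```python
import numpy as np

def contrastive_loss(za, zb, agree, margin=1.0):
    """generic contrastive loss on embedding pairs (an illustrative stand-in,
    not necessarily the authors' objective): agreed pairs (label 1) are pulled
    together, disagreed pairs (label 0) are pushed at least `margin` apart."""
    d = np.linalg.norm(za - zb, axis=-1)                # pairwise distances
    pos = agree * d**2                                  # agreed: shrink distance
    neg = (1 - agree) * np.maximum(0.0, margin - d)**2  # disagreed: enforce margin
    return float(np.mean(pos + neg))

# toy batch: the first pair agrees, the second disagrees
za = np.array([[0.0, 0.0], [0.0, 0.0]])
zb = np.array([[0.1, 0.0], [2.0, 0.0]])
agree = np.array([1.0, 0.0])
loss = contrastive_loss(za, zb, agree)
```

note that nothing in the loss references topic or wording: gradients move embeddings to respect agreement labels only, which is what makes the learned space (approximately) invariant to semantic content.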

why it matters: better preference embeddings can improve ai tools for collective decision-making, such as summarizing public feedback or finding consensus in online discussions, by aligning vector distances with actual agreement rather than just topic similarity.
