source: arxiv artificial intelligence: identifying and understanding human values in text: a tailorable llm-based architecture

level: research

as autonomous systems grow more common, researchers are looking beyond simple utility models to include ethics and morals in decision-making. a key challenge is checking how well these decisions match human values. recent work has turned to large language models to spot values in text, both stated and implied. this paper presents an llm-based architecture that detects and quantifies the strength of human values in writing. it avoids problems of earlier methods that relied on fixed value theories or complex prompt design.

the architecture has three modules that work together. the first creates structured value definitions from the core texts of any value theory. the second uses these definitions to label texts with the values they express. the third measures how strongly each labeled value appears. because value specification is separated from labeling, the system can adapt to a different ethical framework by regenerating the definitions, without retraining. the authors tested it on multiple datasets and report that it performs well across several value models.
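
the three-module split described above can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the prompt wording, the `ValueDefinition` record, and the 0 to 1 intensity scale are all hypothetical, and `llm` stands for any callable that maps a prompt string to a reply string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# hypothetical interface: any function that maps a prompt to a text reply
LLM = Callable[[str], str]


@dataclass
class ValueDefinition:
    name: str
    description: str


def build_definitions(theory_text: str, llm: LLM) -> List[ValueDefinition]:
    """module 1 (assumed design): turn a value theory's text into definitions."""
    reply = llm("list each value in this theory as 'name: description'\n" + theory_text)
    definitions = []
    for line in reply.splitlines():
        if ":" in line:
            name, description = line.split(":", 1)
            definitions.append(ValueDefinition(name.strip(), description.strip()))
    return definitions


def label_text(text: str, definitions: List[ValueDefinition], llm: LLM) -> List[str]:
    """module 2 (assumed design): ask which of the defined values the text expresses."""
    catalogue = "\n".join(f"{d.name}: {d.description}" for d in definitions)
    reply = llm(
        f"values:\n{catalogue}\ntext: {text}\n"
        "which values appear? answer with comma-separated names"
    )
    known = {d.name for d in definitions}
    # keep only names that exist in the definitions, dropping anything else
    return [n.strip() for n in reply.split(",") if n.strip() in known]


def score_intensity(text: str, labels: List[str], llm: LLM) -> Dict[str, float]:
    """module 3 (assumed design): rate each labeled value, clamped to [0, 1]."""
    scores = {}
    for name in labels:
        reply = llm(
            f"on a scale from 0 to 1, how strongly does this text express '{name}'?\n"
            f"text: {text}"
        )
        try:
            scores[name] = min(1.0, max(0.0, float(reply.strip())))
        except ValueError:
            scores[name] = 0.0  # unparseable reply: treat as no measurable intensity
    return scores


def analyze(text: str, theory_text: str, llm: LLM) -> Dict[str, float]:
    """run the three modules in sequence and return value -> intensity."""
    definitions = build_definitions(theory_text, llm)
    labels = label_text(text, definitions, llm)
    return score_intensity(text, labels, llm)
```

in this sketch, switching to another value theory only changes the `theory_text` passed to `build_definitions`; the labeling and scoring functions stay the same, which mirrors the separation of value specification from labeling.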

this approach makes value detection more flexible and scalable. it can be tailored to specific domains or cultural contexts by swapping in different foundational texts. the system outputs both the presence and intensity of values, giving a nuanced view of moral content in language. this could help align ai behavior with human norms in applications like content moderation, conversational agents, and decision support tools.

why it matters: it offers a practical way to audit ai outputs against diverse human values, which could help reduce bias and ethical risk.

