editing multimodal models fails when inputs split

source: arxiv machine learning: correct when paired, wrong when split: decoupling and editing modality-specific neurons in mllms

level: research

knowledge editing lets us update facts in multimodal large language models without retraining. but a new study finds a problem: edits that work for combined image-text queries often fail when the same entity is queried with just text or just an image. for example, after a fact about a person is edited using a photo and a question, the model might give the correct answer when shown the photo and question together again, but revert to the old, wrong answer when asked with text alone.

the researchers traced this to how knowledge is stored. entity facts are not kept in one place. instead, they are split across separate neural pathways for visual and language processing. current editing methods mainly adjust the multimodal pathway, leaving the unimodal ones unchanged. so when a query uses only one modality, the model falls back on outdated information from that pathway.
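the failure mode above can be mimicked with a toy sketch. this is an illustration only, not the paper's model: three dictionary slots stand in for the paired, text-only, and image-only pathways, and the class name, method names, and answers are all hypothetical.

```python
# toy illustration only: three dictionary slots stand in for neural pathways.
# all names here (ToyModel, "paired", "text", "image") are hypothetical.
class ToyModel:
    def __init__(self, fact):
        # the same fact is stored separately in each pathway
        self.pathways = {"paired": fact, "text": fact, "image": fact}

    def answer(self, has_image, has_text):
        # route the query to the pathway that matches its modalities
        if has_image and has_text:
            key = "paired"
        elif has_text:
            key = "text"
        elif has_image:
            key = "image"
        else:
            raise ValueError("query needs at least one modality")
        return self.pathways[key]

    def edit(self, new_fact, pathways=("paired",)):
        # by default only the paired pathway is updated,
        # mirroring the behaviour the study attributes to current editors
        for key in pathways:
            self.pathways[key] = new_fact


model = ToyModel("old answer")
model.edit("new answer")  # edit made with an image-text pair

paired_answer = model.answer(has_image=True, has_text=True)
text_answer = model.answer(has_image=False, has_text=True)
image_answer = model.answer(has_image=True, has_text=False)
```

in this sketch the paired query returns the edited fact, while the text-only and image-only queries still return the old one, because their slots were never touched.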

to fix this, the team created decode, a method that first identifies which neurons handle visual, language, and combined inputs for a given fact. it then applies targeted edits to all relevant groups. in tests, decode greatly reduced the gap between multimodal and unimodal editing success. this means a fact edited with an image-text pair stays corrected even when the entity is later queried in text-only or image-only form.
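the two steps described above, grouping neurons by modality and then editing every relevant group, can be sketched roughly as follows. this is a minimal sketch under loud assumptions, not decode's actual procedure: the activation scores, the 0.5 threshold, the additive update, and the function names group_neurons and edit_groups are all invented for illustration.

```python
# hypothetical sketch: none of these names or numbers come from the paper.

def group_neurons(activations, threshold=0.5):
    """assign each neuron to every modality where its score meets the threshold.

    activations: {neuron_id: {"image": float, "text": float, "paired": float}}
    """
    groups = {"image": [], "text": [], "paired": []}
    for neuron_id, scores in activations.items():
        for modality, score in scores.items():
            if score >= threshold:
                groups[modality].append(neuron_id)
    return groups


def edit_groups(weights, groups, delta, modalities):
    """return a copy of weights with delta added once to each neuron
    that belongs to any of the chosen modality groups."""
    touched = set()
    for modality in modalities:
        touched.update(groups[modality])
    edited = dict(weights)
    for neuron_id in touched:
        edited[neuron_id] = weights[neuron_id] + delta
    return edited


# made-up activation scores for four neurons and one fact
activations = {
    "n0": {"image": 0.9, "text": 0.1, "paired": 0.2},
    "n1": {"image": 0.1, "text": 0.8, "paired": 0.3},
    "n2": {"image": 0.2, "text": 0.1, "paired": 0.7},
    "n3": {"image": 0.1, "text": 0.1, "paired": 0.1},
}
weights = {neuron_id: 0.0 for neuron_id in activations}

groups = group_neurons(activations)
paired_only = edit_groups(weights, groups, 1.0, ["paired"])
all_groups = edit_groups(weights, groups, 1.0, ["image", "text", "paired"])
```

editing only the paired group changes n2 and leaves n0 and n1 at their old values, which corresponds to the gap the study reports. editing all three groups updates n0, n1, and n2 while leaving the unrelated neuron n3 alone.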

why it matters: reliable knowledge editing is key for keeping ai models up to date without full retraining, and this work ensures edits hold across all input types, not just paired ones.
