level: research
masked diffusion models generate sequences by iteratively filling in masked tokens, but they provide only a marginal conditional distribution for each masked position and do not explicitly model dependencies between variables. this work introduces a neural framework that estimates pairwise conditional mutual information directly from the hidden states of a pretrained model. the estimator is trained on ground-truth mutual information computed from the model's own conditional distributions, so it captures the model's internal belief about variable dependencies.
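to make the training target concrete, here is a minimal sketch of how a pairwise mutual information value could be computed from a model's conditionals for two masked positions i and j. the function name, the argument layout, and the choice to use the marginal of j implied by the joint are assumptions made for illustration, not details taken from the work itself.

```python
import numpy as np

def pairwise_mi(p_i, p_j_given_i):
    """Mutual information (in nats) between two discrete variables.

    p_i          : shape (V,), hypothetical model marginal p(x_i | context)
    p_j_given_i  : shape (V, V), row a is p(x_j | x_i = a, context),
                   e.g. obtained by filling position i with token a
                   and querying the model again (an assumed procedure).
    """
    p_i = np.asarray(p_i, dtype=float)
    p_j_given_i = np.asarray(p_j_given_i, dtype=float)

    # Joint distribution p(x_i = a, x_j = b | context).
    joint = p_i[:, None] * p_j_given_i
    # Marginal of x_j implied by the joint (assumption: used in place of
    # the model's direct marginal so the result is never negative).
    p_j = joint.sum(axis=0)
    # Product of marginals, i.e. the joint under independence.
    indep = p_i[:, None] * p_j[None, :]

    # Sum only over cells with nonzero probability (0 * log 0 = 0).
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))
```

under these assumptions, the value is zero when every row of p_j_given_i is identical (position j ignores position i) and reaches log 2 when two uniformly distributed binary variables are perfectly coupled.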
the estimator predicts the full pairwise mutual information matrix in a single forward pass. entries near zero mark pairs of variables that are conditionally independent given the current context. by identifying subsets of such variables, the method enables mutual-information-guided parallel decoding, where multiple tokens are generated simultaneously without violating the model's learned structure. experiments on sudoku and protein sequence generation with esm-c show that the mutual information maps recover known structural constraints.
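the selection step can be sketched as a greedy pass over the predicted matrix: a position joins the parallel batch only if its mutual information with every position already in the batch is below a threshold. the function name, the threshold value, and the greedy strategy are illustrative assumptions; the work may select subsets differently. note also that low pairwise mutual information does not by itself guarantee joint independence of the whole subset.

```python
import numpy as np

def select_parallel_set(mi, order, threshold=0.01):
    """Greedily pick positions that are pairwise near-independent.

    mi        : shape (L, L), predicted pairwise mutual information
    order     : masked positions in priority order (hypothetical,
                e.g. sorted by model confidence)
    threshold : assumed cutoff below which a pair counts as independent
    """
    mi = np.asarray(mi, dtype=float)
    chosen = []
    for pos in order:
        # Accept pos only if it is near-independent of everything chosen;
        # both directions are checked in case the matrix is not symmetric.
        if all(mi[pos, q] < threshold and mi[q, pos] < threshold for q in chosen):
            chosen.append(pos)
    return chosen

# Positions 0 and 1 are strongly coupled; position 2 is independent of both.
mi = [[0.0, 0.5, 0.0],
      [0.5, 0.0, 0.0],
      [0.0, 0.0, 0.0]]
print(select_parallel_set(mi, order=[0, 1, 2]))  # [0, 2]
```

in this toy case positions 0 and 2 would be decoded together in one forward pass, and position 1 would wait for a later step when position 0 is already part of its context.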
the approach reduces the number of inference-time forward passes by a factor of 3 to 5 compared to standard sequential decoding, while maintaining generation quality. the mutual information maps also provide interpretability, showing which positions the model treats as related. this work connects the implicit dependencies learned by masked diffusion models to an explicit dependency model, offering a practical tool for faster and more transparent sequence generation.
why it matters: faster decoding in masked diffusion models can make large-scale sequence generation more practical for applications like protein design, while the mutual information maps help verify that models learn correct structural constraints.