physically grounded diagram generation from text

source: arxiv artificial intelligence: phydrawgen: physically grounded diagram generation from natural language

level: research

a new method called phydrawgen creates physics diagrams from natural language descriptions. it combines a large language model (llm), a deterministic solver, and a vision-language model. the llm first extracts a typed scene graph from the problem text, capturing the objects and the relationships between them. the solver then converts this graph into a planar straight-line graph while enforcing physical constraints such as force balance, optical paths, and field topologies. because the constraints are enforced by the solver rather than left to the generative model, the resulting diagram follows the encoded physical laws exactly.
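the scene-graph-then-solver step can be illustrated with a minimal sketch. it is not the paper's implementation: the `Force` and `SceneGraph` classes, the `solve_force_balance` and `to_pslg` functions, and the block-on-an-incline example are all hypothetical, and the solver handles only one constraint type (static force balance with two unknown magnitudes).

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Force:
    label: str
    angle_deg: float                    # direction, counter-clockwise from +x
    magnitude: Optional[float] = None   # None means "unknown, solver fills it in"


@dataclass
class SceneGraph:                       # hypothetical, heavily simplified scene graph
    body: str
    forces: List[Force]


def solve_force_balance(scene: SceneGraph) -> None:
    """Fill in exactly two unknown magnitudes so the net force on the body is zero."""
    known = [f for f in scene.forces if f.magnitude is not None]
    unknown = [f for f in scene.forces if f.magnitude is None]
    if len(unknown) != 2:
        raise ValueError("this sketch handles exactly two unknown magnitudes")
    # Resultant of the forces whose magnitudes are already known.
    rx = sum(f.magnitude * math.cos(math.radians(f.angle_deg)) for f in known)
    ry = sum(f.magnitude * math.sin(math.radians(f.angle_deg)) for f in known)
    a, b = unknown
    ax, ay = math.cos(math.radians(a.angle_deg)), math.sin(math.radians(a.angle_deg))
    bx, by = math.cos(math.radians(b.angle_deg)), math.sin(math.radians(b.angle_deg))
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("unknown force directions are parallel")
    # Cramer's rule for A*(ax, ay) + B*(bx, by) = -(rx, ry).
    a.magnitude = (-rx * by + bx * ry) / det
    b.magnitude = (-ax * ry + ay * rx) / det


def to_pslg(scene: SceneGraph, scale: float = 0.1):
    """Emit vertices and edges: one straight segment per force, all sharing vertex 0."""
    vertices: List[Tuple[float, float]] = [(0.0, 0.0)]
    edges: List[Tuple[int, int]] = []
    for f in scene.forces:
        theta = math.radians(f.angle_deg)
        vertices.append((scale * f.magnitude * math.cos(theta),
                         scale * f.magnitude * math.sin(theta)))
        edges.append((0, len(vertices) - 1))
    return vertices, edges


# Example: a 2 kg block at rest on a 30 degree incline (assumed scenario).
gravity = Force("gravity", angle_deg=270.0, magnitude=2.0 * 9.81)
normal = Force("normal", angle_deg=120.0)      # perpendicular to the slope
friction = Force("friction", angle_deg=30.0)   # along the slope, pointing uphill
scene = SceneGraph(body="block", forces=[gravity, normal, friction])

solve_force_balance(scene)
vertices, edges = to_pslg(scene)
```

in this sketch the arrow lengths come out of the linear solve, so the drawn forces sum to zero by construction instead of being guessed by a generative model.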

the pipeline includes a propose-verify loop built on a fine-tuned qwen-vl model. the loop checks each generated diagram for remaining constraint violations and, if any are found, corrects them iteratively. this is meant to prevent common mistakes in ai-generated diagrams, such as hallucinated force vectors or ignored conservation laws. the system was tested on 1,449 problems covering mechanics, optics, and electromagnetism, where it outperformed baselines including gpt-5-image and gemini 2.5 flash.
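the control flow of a propose-verify loop can be sketched as follows. this is an assumption-laden toy, not the paper's method: in phydrawgen the verifier is a fine-tuned qwen-vl model, whereas here it is replaced by a plain numeric force-balance check, and the names `propose_verify`, `check_force_balance`, and `rebalance_normal` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = Tuple[float, float]
Diagram = Dict[str, Vector]   # hypothetical diagram: force label -> (fx, fy)


def net_force(diagram: Diagram) -> Vector:
    return (sum(v[0] for v in diagram.values()),
            sum(v[1] for v in diagram.values()))


def check_force_balance(diagram: Diagram, tol: float = 1e-9) -> List[str]:
    """Stand-in verifier: returns a list of violation messages (empty if none)."""
    fx, fy = net_force(diagram)
    if math.hypot(fx, fy) <= tol:
        return []
    return [f"net force is ({fx:.3g}, {fy:.3g}), expected zero"]


def rebalance_normal(diagram: Diagram, violations: List[str]) -> Diagram:
    """Stand-in repair step: adjust the normal force to cancel the residual."""
    fx, fy = net_force(diagram)
    fixed = dict(diagram)                 # leave the input diagram untouched
    nx, ny = fixed["normal"]
    fixed["normal"] = (nx - fx, ny - fy)
    return fixed


def propose_verify(
    diagram: Diagram,
    verify: Callable[[Diagram], List[str]],
    repair: Callable[[Diagram, List[str]], Diagram],
    max_rounds: int = 3,
) -> Tuple[Diagram, int]:
    """Verify, repair, and re-verify until clean or the round budget is spent."""
    for rounds_used in range(max_rounds):
        violations = verify(diagram)
        if not violations:
            return diagram, rounds_used
        diagram = repair(diagram, violations)
    violations = verify(diagram)
    if violations:
        raise RuntimeError(f"unresolved after {max_rounds} rounds: {violations}")
    return diagram, max_rounds


# Example: a draft whose normal force is too small to balance gravity.
draft = {"gravity": (0.0, -10.0), "normal": (0.0, 7.0)}
final, rounds = propose_verify(draft, check_force_balance, rebalance_normal)
```

passing the verifier and the repair step in as callables keeps the loop itself independent of how violations are detected, so a model-based checker could be swapped in without changing the control flow.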

the work addresses a key limitation of current generative models, which often produce diagrams that look plausible but are physically incorrect. by separating scene understanding from constraint satisfaction, phydrawgen hands physical correctness to a deterministic step instead of leaving it to the generative model. the neuro-symbolic design also makes the process more transparent, since the scene graph and the solved constraints can be inspected directly. this could be useful for educational tools, automated problem solving, and scientific communication. the reported benchmark results show significant improvements in generating physically grounded diagrams.

why it matters: it enables reliable generation of physics diagrams from text, reducing errors in educational and scientific applications.
