level: research
the open shop scheduling problem appears in manufacturing and services, but it becomes hard to solve as instance size grows. exact methods do not scale to large instances, and traditional heuristics need careful tuning. this work presents a transformer model with an encoder-decoder structure and multi-head attention. it takes only the processing-time matrix as input and outputs feasible schedules. training uses taillard benchmark instances of sizes 4x4, 5x5, 7x7, and 10x10. the resulting makespans are typically within 15 to 30 percent of the best-known values.
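one way to read the "within 15 to 30 percent" figure is as a relative gap between a schedule's makespan and a reference value. the sketch below assumes that reading. `lower_bound` and `gap_percent` are hypothetical helper names, and the bound shown is the standard maximum-load bound for open shop, which is not necessarily the reference value used in this work.

```python
from typing import List

Matrix = List[List[float]]  # p[j][m]: processing time of job j on machine m


def lower_bound(p: Matrix) -> float:
    """Max-load lower bound: no schedule can finish before the busiest
    job or the busiest machine has completed all of its work."""
    job_load = max(sum(row) for row in p)
    machine_load = max(sum(col) for col in zip(*p))
    return max(job_load, machine_load)


def gap_percent(makespan: float, reference: float) -> float:
    """Relative gap of a makespan to a reference value, in percent."""
    return 100.0 * (makespan - reference) / reference


p = [[2, 3],
     [4, 1]]
print(lower_bound(p))          # busiest machine carries 2 + 4 units
print(gap_percent(120, 100))   # a makespan of 120 against a reference of 100
```

under this reading, a gap of 15 to 30 percent means the model's makespan is 1.15 to 1.30 times the reference.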
to test scalability, the trained policy is applied without any retraining to randomly generated instances from 40x40 up to 100x100. its performance is compared against four classic dispatching rules: shortest processing time, longest processing time, most work remaining, and earliest start time. the transformer policy consistently outperforms these rules on the large instances, which indicates that it generalizes well beyond the sizes seen in training. because the approach needs no instance-specific tuning or retraining, it is practical for real-world use where problem sizes vary.
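the four baseline rules can be sketched as a single greedy loop that differs only in how it ranks the remaining operations. this is a minimal illustration, not the exact baseline implementation from this work: the function name `dispatch`, the rule codes, and the tie-breaking order are all assumptions.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # p[j][m]: processing time of job j on machine m


def dispatch(p: Matrix, rule: str) -> int:
    """Build an open shop schedule greedily and return its makespan.

    rule is one of "SPT", "LPT", "MWR", "EST" (hypothetical codes).
    """
    n_jobs, n_machines = len(p), len(p[0])
    job_free = [0] * n_jobs          # time at which each job is next available
    mach_free = [0] * n_machines     # time at which each machine is next available
    remaining = [sum(row) for row in p]  # unscheduled work per job
    pending = {(j, m) for j in range(n_jobs) for m in range(n_machines)}

    def key(op: Tuple[int, int]):
        j, m = op
        start = max(job_free[j], mach_free[m])
        if rule == "SPT":   # shortest processing time first
            return (p[j][m], start, j, m)
        if rule == "LPT":   # longest processing time first
            return (-p[j][m], start, j, m)
        if rule == "MWR":   # job with the most work remaining first
            return (-remaining[j], start, j, m)
        if rule == "EST":   # operation that can start earliest first
            return (start, p[j][m], j, m)
        raise ValueError(f"unknown rule: {rule}")

    while pending:
        j, m = min(pending, key=key)
        # An operation starts once both its job and its machine are free,
        # so no job or machine ever runs two operations at once.
        end = max(job_free[j], mach_free[m]) + p[j][m]
        job_free[j] = mach_free[m] = end
        remaining[j] -= p[j][m]
        pending.remove((j, m))
    return max(mach_free)


p = [[2, 3],
     [4, 1]]
for rule in ("SPT", "LPT", "MWR", "EST"):
    print(rule, dispatch(p, rule))
```

each rule is a fixed ranking with no learned parameters, which is what makes these rules cheap to run and a natural baseline for a learned policy.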
the method uses deep reinforcement learning to train the policy end-to-end. the model learns to construct schedules step by step, deciding which job to assign next. this data-driven approach captures complex patterns that simple rules miss. the results suggest that learning-based methods can handle combinatorial optimization problems at scales where traditional methods struggle. the work opens a path for applying similar architectures to other scheduling variants.
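the step-by-step construction can be sketched as a rollout loop: at each step a policy scores the operations that are still unscheduled, one is chosen, and the job and machine clocks advance. the sketch below is an assumption-laden illustration. `toy_score` is a hand-written stand-in for the transformer, actions are modeled as (job, machine) pairs, and the summed log-probability is returned only because policy-gradient methods typically need it; the source does not state which reinforcement learning algorithm is used.

```python
import math
import random
from typing import Callable, List, Tuple

Matrix = List[List[float]]  # p[j][m]: processing time of job j on machine m
Op = Tuple[int, int]        # (job, machine)


def log_softmax(scores: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def rollout(p: Matrix, score_fn: Callable, rng: random.Random,
            greedy: bool = False) -> Tuple[float, float]:
    """Construct one schedule; return (makespan, summed log-probability)."""
    n_jobs, n_machines = len(p), len(p[0])
    job_free = [0.0] * n_jobs
    mach_free = [0.0] * n_machines
    # Only unscheduled operations are ever offered to the policy,
    # so every completed rollout is a feasible schedule.
    pending: List[Op] = [(j, m) for j in range(n_jobs) for m in range(n_machines)]
    total_log_prob = 0.0
    while pending:
        scores = [score_fn(p, job_free, mach_free, op) for op in pending]
        log_probs = log_softmax(scores)
        if greedy:
            idx = max(range(len(pending)), key=log_probs.__getitem__)
        else:
            weights = [math.exp(lp) for lp in log_probs]
            idx = rng.choices(range(len(pending)), weights=weights)[0]
        total_log_prob += log_probs[idx]
        j, m = pending.pop(idx)
        end = max(job_free[j], mach_free[m]) + p[j][m]
        job_free[j] = mach_free[m] = end
    return max(mach_free), total_log_prob


def toy_score(p, job_free, mach_free, op):
    """Stand-in for a learned model: prefer operations that can start early."""
    j, m = op
    return -max(job_free[j], mach_free[m])


p = [[2, 3],
     [4, 1]]
makespan, log_prob = rollout(p, toy_score, random.Random(0))
print(makespan, log_prob)
```

in a training setup, the negative makespan would typically serve as the reward and the log-probability would weight the gradient, while `greedy=True` gives a deterministic schedule at inference time.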
why it matters: it shows that a single trained model can quickly produce good schedules for large open shop problems, reducing the need for manual tuning or problem-specific algorithms.