source: pytorch blog: vllm and pytorch work together to improve the developer experience on aarch64

level: technical

starting with pytorch 2.11.0, running pip install torch on aarch64 linux pulls a cuda-enabled wheel from pypi instead of the previous cpu-only wheel. this fixes a long-standing problem on systems such as nvidia gh200, gb200, and gb300, where torch.cuda.is_available() returned false even after a seemingly successful install. previously, users had to point pip at a custom pytorch download index to get gpu support, and even then a transitive dependency could silently replace the gpu build with a cpu wheel.
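
a quick way to tell which kind of wheel ended up in an environment is to ask torch itself. the sketch below is illustrative only: the helper name describe_torch_build and the report keys are made up for this example, while torch.__version__, torch.version.cuda, and torch.cuda.is_available() are standard pytorch attributes. it returns a partial report when torch is not installed.

```python
import importlib.util
import platform


def describe_torch_build():
    """Summarise whether the installed torch wheel can see a GPU.

    Hypothetical helper for illustration; not part of PyTorch or vLLM.
    """
    report = {
        "machine": platform.machine(),  # e.g. "aarch64" on Grace systems
        "torch_installed": False,
        "torch_version": None,
        "built_with_cuda": None,   # CUDA version the wheel was built against
        "cuda_available": None,    # whether a usable GPU is visible at runtime
    }
    if importlib.util.find_spec("torch") is None:
        return report
    try:
        import torch
    except ImportError:
        # torch is present on disk but could not be imported
        return report

    report["torch_installed"] = True
    report["torch_version"] = str(torch.__version__)
    # torch.version.cuda is None for a CPU-only build
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

a report with built_with_cuda set to None on an aarch64 gpu machine points to the cpu-only wheel described above, either from an older torch release or from a dependency that replaced the gpu build.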

the vllm project had maintained workarounds for over a year, including a script that stripped torch requirements from dependency files and a uv configuration setting to reuse a pre-installed torch. these were necessary because pypi did not host aarch64 gpu wheels, forcing users to manually manage torch installations. the fix came through collaboration between vllm and pytorch under the pytorch foundation, with the issue raised in a technical advisory committee meeting and tracked in a github issue.
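
to make the first workaround concrete, here is a minimal sketch of what stripping torch requirements from a dependency file can look like. it is not vllm's actual script: the function name strip_torch_requirements and the set of package names treated as torch packages are assumptions chosen for illustration.

```python
import re

# Assumed set of packages to drop; the real workaround may use a different rule.
TORCH_PACKAGES = {"torch", "torchvision", "torchaudio"}

# Captures the distribution name at the start of a requirement line.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def strip_torch_requirements(text):
    """Return requirements text with torch-family requirement lines removed.

    Comments, blank lines, and unrelated packages are kept unchanged.
    """
    kept = []
    for line in text.splitlines():
        match = _NAME.match(line)
        name = match.group(1).lower().replace("_", "-") if match else None
        if name in TORCH_PACKAGES:
            continue  # leave the pre-installed torch build untouched
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "numpy>=1.26\ntorch==2.11.0\ntorchvision\ntransformers\n"
    print(strip_torch_requirements(sample))
```

with the torch lines gone, an installer has no reason to touch the torch build already present in the environment, which is the effect the workaround was after.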

the new wheels dynamically link to libraries like nccl and cublas, keeping binary sizes small and avoiding hosting burdens. for vllm users on grace blackwell hardware, the standard install instructions now work without extra steps. the old workarounds remain for advanced users with custom pytorch builds, but ordinary users no longer need to know about them. this packaging improvement saves time and reduces friction for ai developers deploying on aarch64 gpu systems.
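
because the libraries are linked dynamically, they have to come from somewhere other than the torch wheel itself. the sketch below assumes they are delivered as separate pip packages whose names start with nvidia-, as is common for cuda-enabled pytorch wheels; that prefix and the helper name installed_nvidia_packages are assumptions, not details taken from the blog post. it uses only the standard library's importlib.metadata.

```python
from importlib import metadata


def installed_nvidia_packages(prefix="nvidia-"):
    """List installed distributions whose normalised name starts with `prefix`.

    Hypothetical helper; the "nvidia-" prefix is an assumption about how the
    dynamically linked libraries are packaged.
    """
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"] if dist.metadata else None
        if not name:
            continue  # skip distributions with missing metadata
        normalised = name.lower().replace("_", "-")
        if normalised.startswith(prefix):
            found[normalised] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    packages = installed_nvidia_packages()
    if not packages:
        print("no nvidia-* packages found in this environment")
    for name, version in packages.items():
        print(f"{name}=={version}")
```

an empty result next to a cuda-enabled torch build would suggest the libraries are provided some other way, for example by a system-wide cuda installation.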

why it matters: the change simplifies deployment of ai models on aarch64 gpu systems by removing a common installation pitfall, letting data scientists and engineers focus on building instead of debugging environment issues.

