Apple's MLX adding CUDA support (github.com)
548 points by nsagent | 14 July 2025 | 203 comments

Comments

If you're going "wait, no Apple platform has first-party CUDA support!", note that this set of patches also adds support for "Linux [platforms] with CUDA 12 and SM 7.0 (Volta) and up".
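For anyone checking whether their box clears that bar, the CUDA runtime reports the SM version. A minimal standalone sketch (my own illustration, not anything from the MLX patches):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
      cudaDeviceProp prop;
      cudaGetDeviceProperties(&prop, 0);  // properties of GPU 0
      // The patch notes want SM 7.0 (Volta) or newer.
      printf("SM %d.%d -> %s\n", prop.major, prop.minor,
             prop.major >= 7 ? "meets the bar" : "too old");
    }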
How does this work when one of the key features of MLX is using a unified memory architecture? (see the bullets in the repo README: https://github.com/ml-explore/mlx)
I would think that bringing that to all UMA APUs (of any vendor) would be interesting, but discrete GPUs would definitely need a different approach?
edit: reading the PR comments, it appears that CUDA exposes a unified (managed) memory API directly and will transparently copy pages as needed.
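For the curious, the API in question is CUDA managed memory. A minimal sketch of what "transparently copy as needed" looks like at the CUDA level (my own illustration, not MLX's actual code):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void scale(float* x, int n, float s) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) x[i] *= s;
    }

    int main() {
      const int n = 1 << 20;
      float* x = nullptr;
      // One allocation visible to both CPU and GPU; the driver migrates
      // pages on demand instead of requiring explicit cudaMemcpy calls.
      cudaMallocManaged(&x, n * sizeof(float));
      for (int i = 0; i < n; ++i) x[i] = 1.0f;      // touched on the CPU
      scale<<<(n + 255) / 256, 256>>>(x, n, 2.0f);  // touched on the GPU
      cudaDeviceSynchronize();                      // wait before the CPU reads again
      printf("%f\n", x[0]);  // prints 2.0; no copy-back anywhere in user code
      cudaFree(x);
    }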
Random aside: a lot of the people working on MLX don't seem to be officially affiliated with Apple, at least on a superficial review. See for example: https://x.com/prince_canuma
Idly wondering: is Apple bankrolling this but keeping it on the DL? There were also rumours that the team was looking to move at one point?
I wonder how much this is a result of Strix Halo. I had a fairly standard stipend for a work computer that I hadn't used in a while, so I recently cashed it in on the EVO-X2, and fuck me sideways: that thing is easily competitive with the mid-range znver5 EPYC machines I run substitors on. It mops the floor with any mere-mortal EC2 or GCE instance. Maybe some r1337.xxxxlarge.metal.metal has an edge, but it blows away the z1d.metal and c6.2xlarge class of thing (fast cores, good NIC, table stakes). And those run $3-10K a month with heavy provisioned IOPS. This thing has real NVMe and it cost $1,800.
I haven't done much local inference on it, but various YouTubers are starting to call the DGX Spark overkill / overpriced next to Strix Halo. The catch, of course, is that ROCm isn't there yet (though AMD seems serious about it now; it's a matter of time).
Flawless CUDA on Apple gear would make it really tempting in a way that isn't true today, with Strix this cheap and good.
This is exciting. So this is using CUDA's unified memory? I wonder how well that works. Is the behavior of CUDA's unified memory actually the same as on Apple silicon? On Apple silicon, as I understand it, memory is physically shared between GPU and CPU anyway. With a discrete CUDA GPU, it isn't. So when you have some tensor on the CPU, how does it end up on the GPU? That needs a copy somehow. Or does CUDA hide all of that?
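As far as I understand it: yes, it's hidden, but the copy still physically happens on a discrete GPU. With classic CUDA you issue the copy yourself; with managed memory the driver page-faults and migrates on first access, and you can optionally prefetch to avoid the faults. A sketch of the two paths (plain CUDA runtime calls, nothing MLX-specific):

    #include <cstdlib>
    #include <cstring>
    #include <cuda_runtime.h>

    int main() {
      const size_t bytes = 1 << 20;

      // Classic path: separate host and device buffers, explicit copy.
      float* host = (float*)malloc(bytes);
      memset(host, 0, bytes);
      float* dev = nullptr;
      cudaMalloc(&dev, bytes);
      cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);  // the visible copy
      cudaFree(dev);
      free(host);

      // Managed path: one pointer, no explicit copy in user code. On a
      // discrete GPU the pages still move over PCIe/NVLink, just lazily.
      float* managed = nullptr;
      cudaMallocManaged(&managed, bytes);
      memset(managed, 0, bytes);  // pages resident on the CPU for now
      // Optional hint: start migrating to device 0 before a kernel touches it.
      cudaMemPrefetchAsync(managed, bytes, 0 /*device*/, 0 /*stream*/);
      cudaDeviceSynchronize();
      cudaFree(managed);
    }

So the programming model matches Apple silicon, but the physics don't: on Apple silicon there is one physical pool and no migration at all, while on a discrete GPU unified memory emulates that model, so performance characteristics can differ.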
I’ve been very impressed with MLX models; I can open up local models to everyone in the house, something I wouldn’t dare do with my Nvidia computer for fear of burning the house down.
I’ve been hoping Apple Silicon becomes a serious contender to Nvidia chips; I wonder if the CUDA support is just embrace, extend, extinguish (EEE).
I wonder if Jensen is scared. If this opens up the door to other implementations this could be a real threat to Nvidia. CUDA on AMD, CUDA on Intel, etc. Might we see actual competition?
Edit: I had the details of the Google v. Oracle case wrong. SCOTUS held that Google's re-implementation of the Java API was fair use (it never squarely ruled on whether APIs are copyrightable); I was remembering the first and second appellate rulings.
Also, apparently this is not a re-implementation of CUDA; it's a CUDA backend for MLX.
1. Programs built against MLX -> Can take advantage of CUDA-enabled chips
but not:
2. CUDA programs -> Can now run on Apple Silicon.
Because #2 would be a copyright violation (specifically, with respect to Nvidia's famous moat).
Is this correct?
https://ml-explore.github.io/mlx/build/html/install.html
Edit: looks like it's "write once, use everywhere". Write MLX, run it on Linux with CUDA, and on Apple Silicon with Metal.
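Concretely, the portable layer looks roughly like this in MLX's C++ API (a sketch based on the MLX docs, not code from the PR). Nothing in it names Metal or CUDA; the backend is picked when MLX itself is built:

    #include "mlx/mlx.h"

    namespace mx = mlx::core;

    int main() {
      // Backend-agnostic MLX: built on macOS this runs on Metal; built with
      // the new CUDA backend it runs on an Nvidia GPU. Same source either way.
      auto a = mx::random::normal({1024, 1024});
      auto b = mx::random::normal({1024, 1024});
      auto c = mx::matmul(a, b);  // recorded lazily
      mx::eval({c});              // executed on the default device (the GPU)
    }

Which is why the #1/#2 framing upthread holds: the portability lives at the MLX source level, not at the CUDA binary level.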
Does this mean you can use MLX on Linux now?
Edit: Just tested it and it's working, but only the Python 3.12 build is available on PyPI right now: https://pypi.org/project/mlx-cuda/#files
Looks like it allows MLX code to compile and run on x86 + GeForce hardware, not the other way around.
Academia and companies continue to write proprietary code. It's as if we were still writing code for Adobe Flash or Microsoft Silverlight in 2025.
Honestly, I don't mind as an Nvidia shareholder.