According to Business Insider, the startup Modular, founded in 2022 by ex-Apple and Google engineers Chris Lattner and Tim Davis, is attempting to break Nvidia’s dominance in AI software. The company has raised $380 million from investors like Greylock and GV, reaching a $1.6 billion valuation in September. Its new software stack includes a programming language called Mojo and an inference engine called MAX, which the company claims made AMD’s latest MI355X GPUs perform roughly 50% better than they do on AMD’s own software. In a key test, customer Inworld AI saw a 60% cost reduction and 40% lower latency. The core promise is a portable platform that can run AI workloads across chips from Nvidia, AMD, and Apple, challenging the lock-in of Nvidia’s CUDA ecosystem.
The CUDA Moat And Why It’s A Problem
Here’s the thing about Nvidia’s CUDA: it’s genius. It started as a way to program graphics chips and basically became the de facto operating system for the AI boom. But that success created a massive problem for everyone else. The entire industry is now optimized around one company’s hardware. Want to use an AMD GPU or a Google TPU? You often need a completely different, and usually less mature, software stack. It’s a ton of work. So most developers just stick with CUDA and Nvidia GPUs because it’s the path of least resistance, even if it’s more expensive.
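To make the lock-in concrete, here is roughly what a trivial GPU computation looks like against the CUDA runtime. This is a minimal sketch, but the calls are real CUDA API, and that’s the point: the kernel qualifier, the launch syntax, and every runtime call below are Nvidia-specific, so even this toy has to be ported (to HIP’s hipMalloc, hipMemcpy, and the hipcc compiler) before it can touch an AMD GPU.

```cpp
#include <cuda_runtime.h>

// A trivial element-wise add. The __global__ qualifier, the <<<...>>>
// launch syntax, and every cuda* call below are Nvidia-only.
__global__ void vector_add(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

void add_on_gpu(const float* host_a, const float* host_b, float* host_out, int n) {
    float *d_a, *d_b, *d_out;
    size_t bytes = n * sizeof(float);

    // On AMD, each of these becomes a hip* call and the file is
    // compiled with hipcc instead of nvcc: same shape, different stack.
    cudaMalloc((void**)&d_a, bytes);
    cudaMalloc((void**)&d_b, bytes);
    cudaMalloc((void**)&d_out, bytes);
    cudaMemcpy(d_a, host_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, host_b, bytes, cudaMemcpyHostToDevice);

    vector_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_out, n);

    cudaMemcpy(host_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_out);
}
```

Now multiply that by the thousands of hand-tuned kernels in a real inference stack, and the switching cost stops being a rounding error.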
And that’s the crazy paradox Lattner spotted. Every chipmaker is incentivized to build software only for their own hardware. Nvidia has zero reason to make CUDA run well on a competitor’s chip—that would destroy their moat. But every AI developer wants portability. They crave the flexibility to mix, match, and shop around for the best price-performance. Modular is betting big that there’s a fortune to be made by being the neutral Switzerland in this hardware war.
Modular’s “Android For AI” Play
Lattner’s analogy is perfect: he says they’re trying to build “something like Android, but for AI hardware.” Think about it. Android didn’t kill iOS. It created a vibrant, competitive ecosystem for smartphone makers outside of Apple. iPhones still thrive at the high end. Lattner believes the same can happen in AI. Nvidia can continue to be the “Apple”: the premium, high-performance option. But Modular’s software could let AMD, Google, Amazon, and others compete more effectively on a level playing field.
The early results are seriously intriguing. Getting 50% better performance out of AMD chips on Modular’s stack versus AMD’s own software? That’s not a small tweak; that’s a game-changer. It suddenly makes the question “can AMD compete with Nvidia’s latest Blackwell chip?” a real one. For businesses building AI applications, the ability to easily benchmark across vendors without rewriting everything is a huge unlock. It turns hardware into more of a commodity, which is great for buyers but terrifying for incumbents used to walled gardens.
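What would that cross-vendor benchmarking look like in practice? Here is a hypothetical sketch: the Backend interface and every name in it are invented for illustration (this is not Modular’s actual MAX API), but it shows the shape of the unlock. You write the measurement harness once against an abstraction, then point it at whichever vendor’s implementation you want to price out.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Hypothetical portability layer. These names are invented for
// illustration; they are not Modular's actual MAX API.
struct Backend {
    virtual ~Backend() = default;
    virtual const char* name() const = 0;
    // Run one inference pass over a batch; a real implementation would
    // dispatch to CUDA, ROCm, a TPU runtime, etc. under the hood.
    virtual void infer(const std::vector<float>& batch) = 0;
};

// The benchmark is written once, against the interface. Comparing
// vendors means constructing a different Backend, not rewriting this.
void benchmark(Backend& backend, const std::vector<float>& batch, int iters) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) backend.infer(batch);
    auto end = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(end - start).count();
    std::printf("%s: %.3f ms/iter\n", backend.name(), ms / iters);
}

// CPU stub standing in for a real vendor backend, just so this runs.
struct StubBackend : Backend {
    const char* name() const override { return "stub-cpu"; }
    void infer(const std::vector<float>& batch) override {
        float acc = 0;
        for (float v : batch) acc += v;  // placeholder "work"
        volatile float sink = acc;       // keep the loop from optimizing away
        (void)sink;
    }
};

int main() {
    std::vector<float> batch(1 << 16, 1.0f);
    StubBackend stub;
    benchmark(stub, batch, 100);
    return 0;
}
```

The design choice that matters here is that vendor specifics live behind the interface, so a buyer’s application code, and their benchmark numbers, survive a hardware swap.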
The Real-World Bet And Nvidia’s Nuclear Option
The proof is in the pudding, and customers like Inworld AI are putting real money down. When a customer challenges you to cut its costs by 60% and you deliver in four weeks, you’ve got its attention. That’s performance you can’t ignore. But beyond the raw speed on Nvidia chips, the optionality is the killer feature. As Kylan Gibbs of Inworld said, if TPUs take off or some new hardware emerges, they want to be able to move without a massive software rewrite.
This leads to the billion-dollar question: what does Nvidia do? Gibbs pointed out the obvious nuclear option: “Nvidia could kill this in a day.” They could open up CUDA to run on AMD GPUs. But that would be like Apple licensing iOS to Samsung—it destroys the business model. It’s the ultimate innovator’s dilemma. Nvidia’s software dominance is the lock that keeps customers buying their (very profitable) hardware. Unlocking it feels unthinkable, even as pressure mounts from alternatives like Google’s TPUs gaining momentum with Gemini.
A Long Shot With A Real Chance
Look, trying to dethrone a software ecosystem as entrenched as CUDA is a monumental task. It’s the definition of a long shot. But if anyone has the technical chops and industry credibility to pull it off, it’s probably Chris Lattner and his team. They’re not just talking; they’re shipping code that delivers measurable, eye-popping results for early customers.
I don’t think Nvidia is going anywhere. They’ll likely remain the performance leader for the foreseeable future. But the dream of a more open, competitive, and portable AI hardware landscape isn’t just a dream anymore. It’s a working prototype. And in a world where every company is scrambling for AI compute, a solution that promises better performance today *and* freedom from vendor lock-in tomorrow is incredibly powerful. The industry needs this. Whether it needs it enough to overcome CUDA’s massive inertia is the drama we’re about to watch unfold. For anyone sourcing industrial computing power, from AI servers to integrated systems, this push for standardization and portability is worth watching closely. It promises more choice and better value across the board.
