According to ZDNet, it will take at least five years to get AI agents where they need to be, despite heavy promotion from giants like Microsoft, ServiceNow, and Salesforce. These companies have unveiled various AI agents over the past 18 months, hoping to automate tasks within their software suites. However, a recent study by Menlo Ventures shows the fastest-growing AI apps are simpler co-pilots like ChatGPT Enterprise and Microsoft Copilot, not agentic AI. Microsoft’s CEO for commercial business, Judson Althoff, noted an “extraordinarily high failure rate of AI projects, north of 80%,” hinting at the struggles with agents. Researchers from Stanford and IESE point out that LLM-based agents frequently fail at complex, multi-step planning, exhibiting constraint violations and brittle solutions. The core challenge is developing models that can operate over long time spans, interact with environments, and set new goals from scratch—a capability we’re currently far from achieving.
The Reasoning Problem
Here’s the thing: what vendors are calling “agents” right now are mostly glorified macros. Microsoft’s “agent” in its 365 suite, for instance, is basically just a way to auto-generate a Word document. That’s not an autonomous entity; it’s a triggered script. The real dream is an AI that can navigate a long-term goal, adapt its strategy on the fly, and interact with a messy, real-world environment like a CRM system or a supply chain dashboard. That requires sophisticated reasoning over time, which is where reinforcement learning (RL) comes in.
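To make that distinction concrete, here’s a minimal sketch in Python. Everything in it is illustrative, not any vendor’s actual API; the point is the structural gap between a one-shot triggered script and a loop that acts, observes, and adapts.

```python
# Illustrative only: the structural difference between a "macro" and an agent.

def macro(prompt: str) -> str:
    """One trigger, one canned action, done. No feedback, no adaptation."""
    return f"[generated document for: {prompt}]"

def agent_loop(goal: int, max_steps: int = 20) -> int:
    """Toy agent: act, observe, check progress, adapt its strategy.
    The 'environment' here is just a counter it must drive to `goal`."""
    state, stride = 0, 8
    for _ in range(max_steps):
        if state == goal:
            return state                     # goal satisfied
        state += stride if state < goal else -stride
        if abs(goal - state) < stride:
            stride = max(1, stride // 2)     # current strategy overshoots: adapt
    raise TimeoutError("goal not reached within budget")

print(macro("Q3 summary"))  # fires once, like auto-generating a Word doc
print(agent_loop(13))       # iterates toward a goal, adjusting as it goes
```

What ships today looks like the first function. What’s being promised is the second.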
RL is the tech behind systems like AlphaZero, which taught itself chess and Go from scratch. The idea is to use it to teach LLMs to predict rewards and devise action policies over extended periods. Teams are working on this, like the one behind Agent-R1, which adds an “orchestrator” component to an LLM to monitor tool use and update its model of the environment. But even they admit that applying RL to LLM agents is “nascent” and faces “considerable challenges.” Another prototype, Sophia, is just a proof-of-concept wrapper that lets an LLM run longer sessions against a web browser. So we’re in the early, clunky demo phase.

The real kicker? Google DeepMind is now asking whether AI can design RL algorithms better than humans can, with a system called DiscoRL. It’s a meta-learning approach that could accelerate progress, but it’s unproven outside controlled environments like video games. Can it handle a complex manufacturing workflow or a customer service escalation? Probably not yet.
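For readers who haven’t touched RL, here’s the textbook mechanism in miniature: a tabular Q-learning loop that learns a policy from reward feedback alone. To be clear, this is not Agent-R1’s or DiscoRL’s code; it’s just the basic reward-driven update those projects are trying to scale up to LLM-driven tool use.

```python
import random

# Tabular Q-learning on a toy 6-state chain. Not Agent-R1 or DiscoRL;
# just the core reward-driven policy update they aim to scale up.

STATES, ACTIONS, GOAL = 6, 2, 5
q = [[0.0] * ACTIONS for _ in range(STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration

def step(state: int, action: int) -> tuple[int, float]:
    """Action 1 moves right, action 0 moves left; reward only at the goal."""
    nxt = min(state + 1, STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0)

for _ in range(200):                    # 200 training episodes
    s = 0
    while s != GOAL:
        a = random.randrange(ACTIONS) if random.random() < epsilon \
            else max(range(ACTIONS), key=lambda x: q[s][x])
        s2, r = step(s, a)
        # Core of RL: nudge the value estimate toward reward + future value.
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
        s = s2

# The learned greedy policy: action 1 ("move right") from every non-terminal state.
print([max(range(ACTIONS), key=lambda a: q[s][a]) for s in range(STATES)])
```

The hard part isn’t this update rule. It’s that an enterprise workflow has no clean reward signal, no resettable episodes, and a state space you can’t enumerate in a table.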
The Memory Mess
The other huge, unsolved issue is memory. For an agent to operate autonomously over hours or days, it needs to reliably remember what it did, what happened, and where it is in its plan. And today’s LLMs are terrible at this. Anyone who’s used a chatbot for a lengthy project has watched it start hallucinating, inserting stale info, or losing the plot. I’ve seen it myself: ChatGPT will confidently reuse a variable from three conversations ago that’s no longer relevant. This isn’t a minor bug; it’s a fundamental architectural limitation.
An agent built with RL needs a persistent, accurate memory of the environment state, its actions, and its policy position. Current models have short, brittle context windows. Research is ongoing, like work summarized in papers on agent memory challenges, but it’s another core computer science problem that needs a breakthrough. You can’t have a “long-lived, decision-making entity” if it has the recall of a goldfish. Until this is fixed, agents will remain reactive tools stuck in narrow, predefined scenarios. They’ll need constant hand-holding, which is exactly what’s happening now as companies send “forward-deployed engineers” to babysit these so-called autonomous workflows.
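What would real agent memory even look like? At minimum, something like the sketch below: a durable record of the goal, the plan, a cursor into it, and an append-only log of actions and observations, checkpointed outside the model’s context window. This is a hypothetical design for illustration, not anyone’s shipping architecture, and it sidesteps the genuinely hard part: deciding what to recall, and when.

```python
import json
import time
from dataclasses import dataclass, asdict, field

# Hypothetical sketch of persistent agent state: goal, plan, plan cursor,
# and an append-only action/observation log, checkpointed to durable storage
# rather than crammed into a context window.

@dataclass
class AgentState:
    goal: str
    plan: list[str]
    cursor: int = 0                            # where we are in the plan
    log: list[dict] = field(default_factory=list)

    def record(self, action: str, observation: str) -> None:
        """Log what we did and what happened, with a timestamp."""
        self.log.append({"t": time.time(), "action": action,
                         "observation": observation})
        self.cursor += 1

    def checkpoint(self, path: str) -> None:
        """Persist durably: recovery can't depend on a context window."""
        with open(path, "w") as f:
            json.dump(asdict(self), f)

state = AgentState(goal="reconcile Q3 invoices",
                   plan=["fetch invoices", "match POs", "flag mismatches"])
state.record("fetch invoices", "412 records retrieved")
state.checkpoint("/tmp/agent_state.json")      # survives restarts, unlike context
```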
What This Means For Business
So what does all this mean if you’re looking at implementing this tech? Basically, temper your expectations. The hype cycle is in overdrive, but the technical reality is lagging years behind. Slick tools like Microsoft’s Foundry IQ let you build thousands of agents, but you’d be building them on a shaky foundation. The shortcomings are fundamental, not cosmetic.
The near-term wins will continue to be in simpler co-pilot-style assistance: automating a discrete task, summarizing a meeting, or drafting an email. These are the applications seeing real growth. Investing in complex, multi-step agentic workflows now is a high-risk gamble. You’re essentially funding R&D for the vendor. And for applications that require reliability and precision over time, like monitoring industrial systems or processing financial claims, the current tech just isn’t ready. In those critical environments, you still need robust, deterministic systems.
The Long Road Ahead
Look, I’m not saying agentic AI won’t happen. The research is fascinating, and the potential is massive. But the timeline being floated by some analysts and vendors is wildly optimistic. We need another generational leap, combining major advances in reinforcement learning *and* memory architecture, before we see anything resembling the autonomous agents we’ve been promised.
It’s going to be a slog of incremental academic papers and fragile prototypes for a while. The research is clear: agents must move beyond human-designed workflows to end-to-end action-feedback cycles. We’re just not there. So when you hear a CEO tout their revolutionary new AI agent, ask the simple question: Can it set its own new goal when the first plan fails? If the answer is no, it’s not an agent. It’s a fancy automation. And we’re probably five years, or more, from that answer being “yes.”
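If you want that litmus test in toy form, here’s a parting sketch, with every name illustrative: a scripted automation halts the moment its plan breaks, while something agent-like revises its own goal and plans again.

```python
# Toy litmus test, purely illustrative: does the system set itself a new
# goal when the first plan fails, or does it just stop?

def run_plan(plan) -> bool:
    """Execute steps in order; succeed only if every step succeeds."""
    return all(step() for step in plan)

def automation(plan) -> str:
    """Fancy automation: one fixed plan, no recovery."""
    return "done" if run_plan(plan) else "stuck: call a human"

def agent(goal: str, make_plan, max_replans: int = 3) -> str:
    """Agent-like: on failure, revises its own goal and plans again."""
    for attempt in range(max_replans):
        if run_plan(make_plan(goal, attempt)):
            return f"done on attempt {attempt + 1}"
        goal = f"fallback for: {goal}"   # the part today's tools can't do
    return "gave up"

flaky = [lambda: True, lambda: False]    # second step always fails
print(automation(flaky))                                          # stuck
print(agent("ship report", lambda g, n: [lambda: True, lambda: n >= 1]))
```

That one line marked “the part today’s tools can’t do” is, more or less, the entire five-year gap.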
