True agentic AI is years away - here's why and how we get there
Today's AI agents are a primitive approximation of what agents are meant to be. True agentic AI requires serious advances in reinforcement learning and complex memory.

ZDNET's key takeaways
- Today's AI agents don't meet the definition of true agents.
- Key missing elements are reinforcement learning and complex memory.
- It will take at least five years to get AI agents where they need to be.
The giants of enterprise technology -- Microsoft, ServiceNow, Salesforce, and others -- have spent the past year and a half unveiling various kinds of artificial intelligence agents, programs that can automate many tasks within their respective software suites.
Also: AI killed the cloud-first strategy: Why hybrid computing is the only way forward now
The vendors hope that these agents will manifest what they consider the true promise of generative AI: to make enterprise work more streamlined and productive.
While they may bring benefits, these agents are not the agents we really want. They are simple automations and don't live up to the true definition of an agent. As a result, enterprise hopes for agents are likely to meet with bitter disappointment in the near term. Key technology is missing from agents, and it may take another generation of AI evolution to bring the expected benefits.
The mess of AI agents today
Here's the key challenge: How do we develop large language models -- such as OpenAI's GPT and Google's Gemini -- to operate over long time spans in which they have broad goals; interact with their environment, including tools; retrieve and store data constantly; and -- the biggest challenge -- set new goals and strategies from scratch?
We're not there yet. We're not even close. Today's bots are limited to chat interactions and often fail outside that narrow operating context. For example, what Microsoft calls an "agent" in the Microsoft 365 productivity suite, probably the best-known instance of an agent, is simply a way to automatically generate a Word document.
Market data shows that agents haven't taken off. A study released this month by venture capital firm Menlo Ventures found that the fastest-growing area of AI applications consists almost entirely of simpler co-pilot programs, such as ChatGPT Enterprise, Claude for Work, and Microsoft Copilot, rather than agentic AI offerings such as Salesforce Agentforce, Writer, and Glean.
Simple automations can certainly bring about benefits, such as assisting a call center operator or rapidly handling numerous invoices. However, a growing body of scholarly and technical reports has highlighted the limitations of today's agents, which have failed to advance beyond these basic automations.
As researchers Gaurav Kumar and Anna Rana of Stanford University and the IESE Business School of the Universidad de Navarra succinctly point out in an overview of agents published this month, "Large Language Models have demonstrated impressive capabilities in reasoning and planning [but] LLM-based agents continue to fail in complex, multi-step planning tasks, frequently exhibiting constraint violations, inconsistent state tracking, and brittle solutions that break under minor changes."
The industry has also noted the problem. As Microsoft's CEO for commercial business, Judson Althoff, remarked this month at a Wall Street tech conference, "there is an extraordinarily high failure rate of AI projects, north of 80%." Although he didn't mention agents specifically, attempts to implement agents are probably among the thornier aspects of AI implementation.
There are numerous agentic tools available today, but they're not the answer. Offerings such as Microsoft's Foundry IQ let a company build thousands of different kinds of agents. That's nice, but the shortcomings of agents are inherent to the technology at a fundamental level, and slick tools won't resolve those shortcomings.
Also: Microsoft's new AI agents won't just help us code, now they'll decide what to code
Microsoft and other giants have plenty of staff helping customers to build "agentic workflows" -- they send on-site teams of "forward-deployed engineers" to do hand-holding. That's good, but hand-holding won't fix fundamental technology shortcomings.
Waiting for reinforcement
Before agents can live up to the "fully autonomous code" hype of Microsoft and others, they must overcome two primary technological shortcomings. Ongoing research across the industry is focused on these two challenges:
- Developing a reinforcement learning approach to designing agents.
- Re-engineering AI's use of memory -- not just memory chips such as DRAM, but the whole phenomenon of storing and retrieving information.
Reinforcement learning, which has been around for decades, has demonstrated striking results in enabling AI to carry out tasks over a very long time horizon.
The most notable example is Google DeepMind's AlphaZero, which taught itself chess and the game of Go from scratch -- given nothing but the rules of each game -- and went on to play entire games at a level equal to or better than that of a human. That was all a result of reinforcement learning.
Also: AI scholars win Turing Prize for technique that made possible AlphaGo's chess triumph
Reinforcement learning involves an AI program predicting the rewards that will result from taking actions in a given situation -- a "state" of its environment -- and then formulating a policy of action to obtain those rewards.
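The core loop is simple enough to sketch in a few lines of Python. Below is a minimal, hypothetical illustration -- a five-cell corridor, not any of the systems discussed here -- using tabular Q-learning, one of the oldest reinforcement learning methods, in which the agent's table of reward predictions gradually becomes its policy:

```python
import random

# Toy illustration of reinforcement learning: tabular Q-learning on a
# five-cell corridor where the agent earns a reward for reaching the end.
N_STATES = 5
ACTIONS = [-1, +1]                 # move left or right along the corridor
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

# The agent's reward predictions (Q-values); the policy emerges from them.
q_table = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """The environment: returns (next_state, reward)."""
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

for episode in range(200):
    state = 0
    while state != N_STATES - 1:
        # Explore occasionally; otherwise act on current reward predictions.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q_table[(state, a)])
        nxt, reward = step(state, action)
        # Nudge the prediction toward reward plus discounted future value.
        best_next = max(q_table[(nxt, a)] for a in ACTIONS)
        q_table[(state, action)] += ALPHA * (
            reward + GAMMA * best_next - q_table[(state, action)])
        state = nxt

# After training, the policy at the start is to head right, toward the reward.
print(max(ACTIONS, key=lambda a: q_table[(0, a)]))
```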
Reinforcement learning has increasingly been employed to improve LLM "reasoning" about a problem, such as the DeepSeek AI models that stunned the world at the beginning of 2025.
Several projects are attempting to extend reinforcement learning beyond reasoning functions to enable sustained activity by agents.
Mingyue Cheng and colleagues at the University of Science and Technology of China in November unveiled what they call Agent-R1, a way to train LLMs with reinforcement learning to predict rewards and devise policies.
(Image: University of Science and Technology of China)
Cheng and team emphasized that agents must move beyond automated workflows and simple prompts to take a more autonomous approach.
"Workflows rely on human-designed routing or planning, while fully autonomous agents remove predefined workflows and interact with the environment proactively through an end-to-end action–feedback cycle," the team wrote.
In order to build something that conducts multiple operations without constantly being prompted, Cheng and team had to add components to LLMs -- such as an orchestrator. The orchestrator monitors what happens when an agent uses a tool, such as calling an outside program via an API. It then updates things such as the model of the environment, the rewards, and the policy.
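Cheng and team's actual code isn't reproduced here, but the pattern they describe -- an orchestrator that runs the model in a loop, executes its tool calls, scores the outcome, and feeds everything back -- might be sketched like this, with all names (run_agent, search_docs, and so on) invented for illustration:

```python
# Hypothetical sketch of an orchestrated agent loop; names are invented
# for illustration and are not Agent-R1's actual code.

def search_docs(query: str) -> str:
    """Stand-in for an outside tool the agent calls via an API."""
    return f"results for: {query}"

TOOLS = {"search_docs": search_docs}

def run_agent(propose_action, update_policy, task: str, max_steps: int = 20):
    env_state = {"task": task, "history": []}     # model of the environment
    for _ in range(max_steps):
        action = propose_action(env_state)        # the LLM picks the next step
        if action["type"] == "finish":
            return action["answer"]
        # The orchestrator executes the tool call and watches what happens...
        result = TOOLS[action["tool"]](**action["args"])
        reward = 1.0 if result else 0.0           # toy reward signal
        # ...then updates the environment model, rewards, and policy.
        env_state["history"].append((action, result, reward))
        update_policy(env_state, reward)          # RL-style policy update
    return None                                   # step budget exhausted

# Toy driver: search once, then finish with whatever came back.
def propose(env):
    if not env["history"]:
        return {"type": "tool", "tool": "search_docs",
                "args": {"query": env["task"]}}
    return {"type": "finish", "answer": env["history"][-1][1]}

print(run_agent(propose, lambda env, reward: None, "multi-hop question"))
```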
Although Agent-R1 does better than a prompted LLM at "multi-hop" tasks -- the kind that involve multiple successive steps -- Cheng and team emphasize that agentic AI is, in their view, "an emerging field."
"The effective application of RL to LLM Agents is still in its nascent stages and faces considerable challenges," they wrote.
Another group, led by Mingyang Sun of Westlake University, unveiled Sophia this month, which they describe as a "wrapper" that enables an LLM to perform tasks over "prolonged durations" when interacting with a web browser.
Also: Even the best AI agents are thwarted by this protocol - what can be done
Sophia is a prototype, Sun and team emphasize -- more a proof of concept for adding reinforcement learning to LLMs than a finished system.
Even in Sun and team's optimistic view, today's LLMs are far from being real agents. "The rapid development of LLMs has elevated AI agents from task-specific tools to long-lived, decision-making entities capable of independent planning and strategic collaboration," they wrote. "However, most existing architectures remain reactive: they rely on manually crafted configurations that remain static after deployment, designed for narrow tasks or fixed scenarios."
How agents learn for themselves
On the horizon looms a significant shift in reinforcement learning itself, which could be a boon or further complicate matters. Can AI do a better job of designing reinforcement learning than humans?
(Image: Google DeepMind)
That's the question posed by Google's DeepMind unit, the creators of AlphaZero, in a study published this month in the journal Nature. An AI program called DiscoRL automatically invents improved reinforcement learning algorithms that, in turn, lead to better agents.
DiscoRL takes a meta-learning approach: it observes the results of multiple agents and then refines the predictions and policies that each agent formulates. As such, it can adapt agents to "radically different environments," unlike hand-crafted reinforcement learning rules, which are often specific to a given problem.
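DeepMind's system meta-learns a full neural-network update rule; the sketch below is a drastically simplified illustration of my own, with a single meta-parameter standing in for the learned rule. An inner loop trains agents on a noisy two-armed bandit with a candidate update rule, and an outer "discovery" loop keeps whichever rule produces the best-performing agents:

```python
import random

# Drastically simplified sketch of meta-learned reinforcement learning.
# Inner loop: an agent learns a noisy two-armed bandit with an update rule
# governed by one meta-parameter (the step size). Outer loop: search over
# that parameter for whichever rule yields the best-performing agents.
# Entirely illustrative; DiscoRL meta-learns a neural-network update rule.

ARM_MEANS = [0.3, 0.7]   # arm 1 pays more on average, but rewards are noisy

def train_agent(step_size, trials=200):
    """Train one agent under the candidate rule; return its mean reward."""
    estimates, total = [0.0, 0.0], 0.0
    for _ in range(trials):
        if random.random() < 0.1:
            arm = random.randrange(2)                        # explore
        else:
            arm = max(range(2), key=lambda a: estimates[a])  # exploit
        reward = random.gauss(ARM_MEANS[arm], 0.3)
        estimates[arm] += step_size * (reward - estimates[arm])  # the "rule"
        total += reward
    return total / trials

# Outer "discovery" loop: evaluate a population of agents per candidate rule.
rule = 0.5
for _ in range(30):
    candidates = [max(0.01, rule * 0.8), rule, min(1.0, rule * 1.2)]
    rule = max(candidates,
               key=lambda s: sum(train_agent(s) for _ in range(20)))
print(f"discovered step size: {rule:.3f}")
```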
Also: Enterprises are not prepared for a world of malicious AI agents
The DeepMind team refers to this approach as letting agents "discover learning algorithms for themselves."
That might accelerate the entire reinforcement learning field by eliminating human-designed reinforcement learning, just as AlphaZero dispensed with human examples of chess and Go, instead mastering the games through self-play from the rules alone.
What's unknown is how well such an approach can generalize. DeepMind describes how DiscoRL agents achieved mastery of Atari video games such as Ms. Pac-Man. But that's an area where previous reinforcement learning has already proven useful. Could such an approach master enterprise customer relationship management or insurance claims processing workflows from scratch? We don't yet know.
Waiting for real memory
The other key technological breakthrough waiting to happen is a complete rethinking of how agents store and retrieve data, broadly referred to as the memory usage of agents.
An AI agent developed through reinforcement learning must maintain a history of the environment, including the actions taken and the agent's current position within an overall policy of action -- functions intimately tied to memory.
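A minimal sketch of that bookkeeping -- my own illustration, not drawn from any of the papers discussed here -- might look like the following:

```python
from dataclasses import dataclass, field

# Illustrative sketch of the bookkeeping an RL-trained agent must carry
# between steps: what it did, what happened, and where it stands in its
# overall plan. Not drawn from any of the papers discussed here.

@dataclass
class Step:
    action: str        # what the agent did, e.g. "call_billing_api"
    observation: str   # what the environment returned
    reward: float      # feedback signal for that step

@dataclass
class AgentMemory:
    goal: str
    plan: list[str]                        # the overall policy of action
    plan_index: int = 0                    # current position within the plan
    history: list[Step] = field(default_factory=list)

    def record(self, action: str, observation: str, reward: float) -> None:
        self.history.append(Step(action, observation, reward))
        self.plan_index += 1               # advance through the plan

mem = AgentMemory(goal="close invoice #123",
                  plan=["fetch invoice", "validate totals", "post payment"])
mem.record("fetch invoice", "invoice found", 1.0)
print(mem.plan[mem.plan_index])            # next step: "validate totals"
```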
Today's LLMs struggle to maintain the thread of conversation over multiple turns.
Anyone who's used a chatbot for a big project will notice that errors become more frequent the longer the session runs, as bots sometimes mistakenly insert information that came up much earlier in the conversation. I described that situation myself when I used ChatGPT over several days to formulate a business plan and it started to insert incorrect variables into the calculations.
Also: I built a business plan with ChatGPT and it turned into a cautionary tale
Researchers are seeing the same kinds of failure in agents across long spans of work.
Stanford's Human-Centered AI group, in its annual State of AI report published in April, noted that agents fall further behind human ability the longer they are asked to work: "In short time-horizon settings (two-hour budget), top AI systems score four times higher than human experts, but as the time budget increases, human performance surpasses AI -- outscoring it two to one at 32 hours."
Also: The AI model race has suddenly gotten a lot closer, say Stanford scholars
In a report published this month, lead author Yuyang Hu of the National University of Singapore, together with collaborators at several institutions, wrote that memory is the key to alleviating such failures.
A typical LLM uses only its most recent data -- what's in its "context window," such as the most recent information you typed into the prompt.
However, to become "adaptive agents capable of continual adaptation through environmental interaction," as they put it, agents require "additional information derived from prior interactions, both within the current task and across previously completed tasks."
A lot of work has gone into approaches to retrieval, such as retrieval-augmented generation (RAG) and vector databases. In fact, Hu and team have assembled a fantastic schematic of all the types of memory to which agents can have access. It's worth a close look:
(Image: National University of Singapore)
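Here's a toy sketch of the underlying idea, again my own illustration rather than Hu and team's system: the agent banks a lesson after each finished task, then pulls the most relevant lessons into its limited context before starting the next one. Plain word overlap stands in for the embedding-based similarity search a real vector database would perform:

```python
# Toy sketch of cross-task agent memory: store lessons from finished
# tasks, then retrieve the most relevant ones into the model's limited
# context window. Word overlap stands in for real similarity search.

long_term_memory: list[str] = []

def remember(lesson: str) -> None:
    long_term_memory.append(lesson)

def retrieve(task: str, k: int = 2) -> list[str]:
    """Return the k stored lessons sharing the most words with the task."""
    words = set(task.lower().split())
    return sorted(long_term_memory,
                  key=lambda m: len(words & set(m.lower().split())),
                  reverse=True)[:k]

remember("invoice totals must be validated before posting payment")
remember("customer calls escalate when hold time exceeds five minutes")

# Before a new task, inject relevant prior experience into the prompt.
task = "post payment for a new invoice"
print("Prior lessons:", retrieve(task), "\nTask:", task)
```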
But having a history in memory is not enough; management of memory itself must evolve, Hu and team argue. Their contention, purely theoretical at the moment, is that the entire control of memory will eventually be reinvented as agents "learn" how to store and retrieve data via reinforcement learning.
You can see that this is a bit of a circular problem: reinforcement learning requires new forms of computer memory storage and retrieval to progress, but developing new forms of memory management itself may depend on reinforcement learning.
AGI won't solve it
Such big steps won't happen overnight. It's not a matter of a single company, such as DeepMind or Microsoft, offering a new LLM or even new LLM tools. What's required is a technological leap.
Nor is it likely these things can be magically solved anytime soon by artificial general intelligence (AGI), the fabled pinnacle of AI where the programs achieve some form of intelligent activity equal, broadly speaking, to human thought.
The greatest example of reinforcement learning we've seen, AlphaZero, wasn't a general intelligence; it was a specific problem solver. It solved chess because the rules of chess can be carefully defined, and because it's a game of "perfect information," where the so-called environment -- the chess board and pieces -- can be explicitly and completely described.
That's not the case for enterprise billing practices, customer service calls, and IT trouble ticket management. Again, we don't know how well the DiscoRL approach will generalize from Atari to these more complicated tasks.
The upshot: Given the complexity of re-engineering reinforcement learning and memory, we have a very long wait. Judging by how long it took to get from Google's breakthrough Transformer architecture -- the foundation of today's LLMs -- in 2017 to its progeny, ChatGPT, in 2022, an optimistic estimate of the time needed for the industry to achieve reliable agents is another five years.