AT&T’s AI platform was quietly burning through roughly 8 billion tokens every day. For most enterprises, that level of usage is a sign that adoption is working. For Andy Markus, AT&T’s chief data officer, it was also a warning: pushing everything through large reasoning models was neither technically sustainable nor economically defensible.
In response, Markus and his team re-architected their orchestration layer around a multi-agent pattern built on LangChain, leaning on small, purpose-built models and a “super agent plus worker agents” design. The result: lower latency, faster responses, and up to 90% savings on model costs, all while scaling to more than 100,000 internal users.
From 8 Billion Tokens a Day to a New Orchestration Strategy
AT&T’s scale problem started with success. Internal demand for its Ask AT&T personal assistant drove token volumes to around 8 billion per day. At that rate, naively routing every request to large, general-purpose models made little sense.
Markus’s team concluded that the orchestration layer—not the models themselves—had to change. They rebuilt it into a multi-agent stack using LangChain as the backbone. At the top sit “super agents,” which act as coordinators. These super agents decide which downstream “worker” agents should handle parts of a task, based on purpose, capability, and required context.
Worker agents are smaller, more specialized language models optimized for concise, domain-specific tasks—such as document processing, natural language–to–SQL translation, or image analysis. This separation of concerns allows AT&T to reserve large language models (LLMs) for when they’re truly needed, offloading routine or constrained work to small language models (SLMs).
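The division of labor described above can be sketched in plain Python. This is an illustrative sketch, not AT&T's actual code or the LangChain API: a "super agent" classifies each request and routes it to a small, domain-specific "worker," falling back to a large general-purpose model only when no specialist matches. The keyword-based classifier and the worker names are assumptions for the example; in practice the router would itself be a model.

```python
# Sketch of super-agent routing: specialists first, large model as fallback.
from dataclasses import dataclass
from typing import Callable

@dataclass
class WorkerAgent:
    name: str
    domains: set[str]                  # task types this specialist handles
    handle: Callable[[str], str]       # stand-in for an SLM call

class SuperAgent:
    def __init__(self, workers: list[WorkerAgent], fallback: Callable[[str], str]):
        self.workers = workers
        self.fallback = fallback       # large general-purpose model

    def classify(self, request: str) -> str:
        # Stand-in for a real intent classifier (often a small model itself).
        text = request.lower()
        if "sql" in text:
            return "text-to-sql"
        if "document" in text or "invoice" in text:
            return "documents"
        return "general"

    def route(self, request: str) -> str:
        task = self.classify(request)
        for worker in self.workers:
            if task in worker.domains:
                return worker.handle(request)
        return self.fallback(request)  # only reach the LLM when needed

sql_worker = WorkerAgent("sql", {"text-to-sql"}, lambda r: f"[SLM:sql] {r}")
doc_worker = WorkerAgent("docs", {"documents"}, lambda r: f"[SLM:docs] {r}")
agent = SuperAgent([sql_worker, doc_worker], fallback=lambda r: f"[LLM] {r}")

routed = agent.route("Translate to SQL: top 5 regions by churn")
```

The cost lever is visible in the structure: every request a worker absorbs is a request that never pays large-model token prices.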
According to Markus, SLMs can be “just about as accurate, if not as accurate, as a large language model on a given domain area.” That domain focus, combined with targeted orchestration, is where most of the 90% cost reduction comes from, along with latency and responsiveness gains that would be difficult to achieve with a monolithic LLM-only approach.
Inside the LangChain-Based Multi-Agent Stack

Technically, AT&T’s stack is built around LangChain as the core framework for agent construction and coordination. LangChain provides the abstractions to define super agents, worker agents, tools, and routing logic, giving Markus’s team a programmable way to express how tasks should be decomposed and executed.
On top of this, the company layers multiple proprietary tools as agent capabilities. These include systems for document processing, text-to-SQL conversion, and image analysis. Agents invoke these tools to interact with AT&T’s own data: as Markus describes it, “it’s AT&T’s data that’s really driving the decisions.” The agents don’t answer general web-scale questions; they interrogate and act over internal datasets.
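Exposing internal systems as callable tools might look like the following sketch. The tool names, registry pattern, and JSON call format are assumptions for illustration; AT&T's actual interfaces are proprietary.

```python
# Hypothetical tool registry: internal capabilities exposed as callable tools
# that agents invoke via structured (JSON) tool calls.
import json
from typing import Callable

TOOLS: dict[str, Callable[[dict], dict]] = {}

def tool(name: str):
    """Decorator registering a function as an agent-invocable tool."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("text_to_sql")
def text_to_sql(args: dict) -> dict:
    # A real implementation would call a fine-tuned model over internal schemas.
    return {"sql": f"SELECT * FROM churn WHERE region = '{args['region']}'"}

@tool("doc_extract")
def doc_extract(args: dict) -> dict:
    return {"fields": {"doc_id": args["doc_id"], "status": "parsed"}}

def invoke(tool_call: str) -> dict:
    """Dispatch a model-emitted tool call, e.g. '{"tool": ..., "args": ...}'."""
    call = json.loads(tool_call)
    return TOOLS[call["tool"]](call["args"])

result = invoke('{"tool": "text_to_sql", "args": {"region": "midwest"}}')
```

The registry keeps the agent layer thin: agents emit structured calls, and the enterprise systems behind each tool stay independently testable and swappable.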
To support this, AT&T uses standard retrieval-augmented generation (RAG) patterns and additional in-house algorithms to fine-tune models and ground them in its enterprise context. Microsoft plays a key role as an infrastructure partner: Ask AT&T Workflows runs on Microsoft Azure, and AT&T leans on Microsoft’s search functionality for its vector store layer.
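A minimal RAG sketch shows the grounding step. The keyword-overlap scoring below is a deliberate toy; a production system would query a managed vector store (in AT&T's case, Microsoft's search functionality on Azure), but the prompt-assembly shape is the same.

```python
# Toy RAG: retrieve the most relevant internal passages, then build a prompt
# that grounds the model in enterprise context only.
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    q = set(query.lower().split())
    # Rank by naive keyword overlap (a vector store would rank by embedding
    # similarity instead).
    ranked = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def grounded_prompt(query: str, corpus: list[str]) -> str:
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Fiber outage procedures require a change-log check before dispatch.",
    "Quarterly churn reports are stored in the curated data product.",
    "Office plants should be watered weekly.",
]
prompt = grounded_prompt("What do fiber outage procedures require?", corpus)
```

The key property is that the model sees only retrieved internal passages, which is what lets AT&T's agents "interrogate and act over internal datasets" rather than answer from general web knowledge.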
All actions taken by agents are logged, with strict data isolation between steps and role-based access controls enforced as workloads are passed between agents. This “chain reaction” of agents is always overseen by a human “on the loop,” providing a final check and balance over autonomous behavior.
Composable Models, Not Rebuilding Commodities
Despite the depth of its platform, AT&T deliberately avoids a “build everything from scratch” philosophy. Markus emphasizes using models and tools that are “interchangeable and selectable,” and “never rebuilding a commodity” where suitable off-the-shelf options exist.
This is primarily a hedge against the pace of change in the AI tooling ecosystem. As Markus puts it, capabilities in this space can change weekly—or even multiple times per week. A rigid, fully custom stack would quickly become a liability. Instead, AT&T’s architecture is designed to pilot, plug in, and swap out components with minimal friction.
That doesn’t mean the company avoids internal innovation. AT&T runs rigorous evaluations on both external and in-house models and tools. For example, its Ask Data with Relational Knowledge Graph has led the Spider 2.0 text-to-SQL accuracy leaderboard, and other internal components have performed strongly on the BIRD text-to-SQL benchmark. But even these successes are treated as candidates in an evolving portfolio, not permanent fixtures.
For enterprise leaders, this pattern suggests a pragmatic blueprint: build differentiating capabilities where proprietary data and domain knowledge matter most, and keep everything else modular and replaceable.
When to Use Agentic AI—and When Not To
AT&T’s stack is agentic by design, but Markus is explicit: not every solution should use agents. He has seen systems “over engineered” simply because agentic AI is available.
Instead, he advises teams to start with three core principles—accuracy, cost, and responsiveness—and then ask targeted design questions:
- Could a simpler, single-turn generative solution achieve the required accuracy?
- Can the problem be decomposed into smaller pieces where each step can be delivered “way more accurately” by a focused model or tool?
- Does adding multi-step agentic behavior materially improve outcomes relative to its orchestration complexity and cost?
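The three questions above can be encoded as a rough decision helper. The thresholds and field names are illustrative; the point is to make the trade-off explicit before reaching for agents.

```python
# Toy decision helper for "should this be agentic?" — encodes the three
# design questions as an explicit, reviewable rule.
def recommend_architecture(single_turn_accuracy: float,
                           required_accuracy: float,
                           decomposable: bool,
                           agentic_gain: float,
                           orchestration_cost: float) -> str:
    if single_turn_accuracy >= required_accuracy:
        return "single-turn generative"   # simpler solution already suffices
    if decomposable and agentic_gain > orchestration_cost:
        return "multi-agent"              # decomposition pays for its complexity
    return "re-scope or fine-tune"        # agents won't close the gap

choice = recommend_architecture(
    single_turn_accuracy=0.92, required_accuracy=0.90,
    decomposable=True, agentic_gain=0.20, orchestration_cost=0.10,
)
```

Treating the choice as a function rather than a default is exactly the discipline Markus is arguing for: agents only when the decomposition measurably beats its overhead.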
Even as AT&T’s internal solutions have become more complex, these basic criteria continue to guide architectural decisions. For technical decision-makers, the message is clear: agentic AI should be a means to specific performance and cost outcomes, not an end in itself.
Ask AT&T Workflows: Agent Building for 100,000+ Employees

The clearest proof point for AT&T’s orchestration strategy is Ask AT&T Workflows, a graphical agent builder deployed to more than 100,000 employees. More than half of those users report using the system daily, and active adopters have reported productivity gains as high as 90%, according to Markus.
The platform is structured around two distinct “journeys”:
- Pro-code: Users can write Python behind the scenes, defining explicit rules for how agents should behave, chain tools, and handle edge cases.
- No-code / low-code: A drag-and-drop visual interface allows users to connect building blocks into workflows for a “pretty light user experience.”
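A pro-code journey might look like the fluent workflow definition below: explicit Python steps, chained in order, with edge-case rules spelled out in code. The `Workflow` class and step names are invented for illustration and are not the actual Ask AT&T Workflows API.

```python
# Hypothetical pro-code workflow: explicit step chaining with an edge-case
# rule ("halt") that short-circuits the pipeline.
from typing import Callable

class Workflow:
    def __init__(self, name: str):
        self.name = name
        self.steps: list[Callable[[dict], dict]] = []

    def step(self, fn: Callable[[dict], dict]) -> "Workflow":
        self.steps.append(fn)
        return self                       # enable fluent chaining

    def run(self, payload: dict) -> dict:
        for fn in self.steps:
            payload = fn(payload)
            if payload.get("halt"):       # explicit edge-case handling
                break
        return payload

wf = (Workflow("invoice-triage")
      .step(lambda p: {**p, "category": "billing" if "invoice" in p["text"] else "other"})
      .step(lambda p: {**p, "halt": p["category"] == "other"})
      .step(lambda p: {**p, "routed_to": "billing-agent"}))

result = wf.run({"text": "invoice dispute for account 42"})
```

The no-code journey would express the same three boxes as drag-and-drop blocks; the underlying graph is identical, which is what makes the two journeys interoperable.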
Interestingly, even highly technical users are gravitating toward the low-code path. At a recent hackathon aimed at a technical audience, more than half of participants chose the drag-and-drop option over pro-code, despite being proficient programmers. This suggests that, at scale, usability and speed of composition often matter more than full programmatic control.
Use cases span a wide range of operational workflows. One example comes from network engineering. A network engineer might design a series of agents to respond when customers lose connectivity:
- One agent correlates telemetry data to identify the issue and its location, and checks change logs and known issues.
- That agent can automatically open a trouble ticket based on its findings.
- A second agent proposes potential remediation steps and can even generate new code to patch the problem.
- A third agent writes a post-incident summary including preventative measures for the future.
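The incident flow above can be sketched as a pipeline of three agent roles. Each function stands in for an agent backed by a small model; the telemetry fields, causes, and remediation steps are invented for illustration, not AT&T's actual network tooling.

```python
# Toy incident pipeline: triage -> remediation -> post-incident summary.
def triage_agent(telemetry: dict) -> dict:
    # Correlate telemetry to identify the issue, then auto-open a ticket.
    issue = {
        "location": telemetry["site"],
        "cause": "fiber cut" if telemetry["loss_pct"] > 50 else "congestion",
    }
    issue["ticket_id"] = f"TT-{telemetry['site']}-001"
    return issue

def remediation_agent(issue: dict) -> dict:
    # Propose remediation steps for the identified cause.
    playbook = {
        "fiber cut": ["dispatch crew", "reroute traffic"],
        "congestion": ["rebalance load"],
    }
    return {**issue, "remediation": playbook[issue["cause"]]}

def summary_agent(issue: dict) -> str:
    # Write the post-incident summary, including prevention.
    return (f"Ticket {issue['ticket_id']}: {issue['cause']} at {issue['location']}. "
            f"Actions: {', '.join(issue['remediation'])}. "
            f"Prevention: add redundancy monitoring.")

report = summary_agent(remediation_agent(triage_agent(
    {"site": "DAL-7", "loss_pct": 80})))
```

In the real system, the supervising engineer would review the output of each stage before the next one acts.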
Throughout this process, the human engineer supervises, validating that agents’ actions are appropriate and that automation doesn’t drift beyond acceptable bounds. For enterprises considering agent builders, AT&T’s experience highlights the importance of a clear oversight model combined with broad accessibility across both technical and non-technical staff.
AI-Fueled Coding: Redefining the Software Lifecycle

The same “break work into smaller, purpose-built pieces” mindset also underpins how AT&T now writes software. Markus refers to this as “AI-fueled coding,” a technique that aims to move far beyond casual code suggestion.
He likens the approach to RAG, but applied to software development. Developers use agile methods within an IDE, guided by “function-specific” build archetypes that dictate how code should interact and be structured. Rather than generating loose or experimental snippets, the output is “very close to production grade,” and in some cases can reach production quality in a single turn.
Markus contrasts this with what he calls “vibe coding,” where developers interact with a conversational code editor in an open-ended, iterative way. AI-fueled coding reduces that back-and-forth by constraining generation through archetypes and patterns, effectively narrowing the solution space and improving the odds that the first answer is shippable.
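One way a build archetype could constrain generation is as a structured prompt template: instead of an open-ended request, the model receives the required module shape up front. The archetype fields below are assumptions for the sketch, not AT&T's actual archetypes.

```python
# Illustrative "build archetype": a template that pins down the structure
# the generated code must follow, narrowing the model's solution space.
ARCHETYPE = """\
You are generating a data-pipeline module. Follow this structure exactly:
- module: {module_name}
- entrypoint: run(config: dict) -> dict
- must validate inputs against: {schema}
- must log via: {logger}
Return only code matching this archetype."""

def archetype_prompt(module_name: str, schema: str, logger: str) -> str:
    return ARCHETYPE.format(module_name=module_name, schema=schema, logger=logger)

prompt = archetype_prompt("churn_ingest", "schemas/churn.json", "std_audit_logger")
```

Because the entrypoint signature, validation schema, and logging contract are fixed before generation, the first answer has far fewer ways to be wrong, which is what pushes output toward production grade in a single turn.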
This shift is “tangibly redefining” AT&T’s software development cycle. It shortens timelines and increases the volume of production-grade code the organization can output. Crucially, it also pulls non-technical teams into the development process. Using plain-language prompts, business users can generate working software prototypes that meet internal architectural expectations.
In one example Markus cites, his team built an internal curated data product in about 20 minutes using AI-fueled coding. By their estimation, the same project would have taken roughly six weeks without these techniques. Today, they use it across software development, modification, data science, analytics, and data engineering—what Markus simply calls “a game changer” for how the company builds and adapts systems.
Lessons for Enterprise AI Leaders
AT&T’s journey from 8 billion tokens a day to a 90% cost reduction offers several concrete takeaways for enterprise AI engineers and data leaders:
- Architect for orchestration first. Massive scale issues often stem less from the models themselves and more from how they’re invoked. A multi-agent architecture with clear roles (super agents vs. worker agents) can balance capability, cost, and latency.
- Exploit small, domain-focused models. In constrained domains with strong proprietary data, SLMs can match or exceed LLM accuracy while being significantly cheaper to run.
- Make models and tools swappable. Design around “interchangeable and selectable” components so that rapid industry advances become an advantage rather than a disruption.
- Use agentic AI selectively. Let accuracy, cost, and responsiveness dictate whether to use agents, and resist the temptation to overcomplicate problems that simpler generative patterns can solve.
- Empower the broad workforce. Provide both pro-code and low-code entry points so that technical teams can go deep while non-technical staff can still compose useful agents and workflows.
- Apply AI to the development process itself. Techniques like AI-fueled coding show that orchestration principles can accelerate not only business workflows but also the software lifecycle that supports them.
For organizations wrestling with escalating AI usage and costs, AT&T’s experience underscores a central point: the path to sustainable, large-scale AI is less about the size of individual models and more about disciplined orchestration, modular design, and human-guided automation.

Hi, I’m Cary Huang — a tech enthusiast based in Canada. I’ve spent years working with complex production systems and open-source software. Through TechBuddies.io, my team and I share practical engineering insights, curate relevant tech news, and recommend useful tools and products to help developers learn and work more effectively.
