8 billion tokens a day forced AT&T to rethink AI orchestration — and cut costs by 90%

When your organization processes 8 billion tokens daily, you’re not just dealing with big data—you’re wrestling with an economic monster that threatens to devour your AI ambitions whole. This was precisely the challenge facing AT&T’s chief data officer Andy Markus and his team, who recognized that pushing everything through large reasoning models wasn’t just impractical—it was financially unsustainable.

The solution? A complete reimagining of their AI architecture that’s now saving AT&T up to 90% in costs while dramatically improving performance across the board.

From Monolithic Models to Multi-Agent Orchestration

The transformation began when Markus’s team rebuilt the orchestration layer for their internal Ask AT&T personal assistant. Instead of relying on massive, expensive models for every query, they created a sophisticated multi-agent stack built on LangChain. This architecture features large language model “super agents” that direct smaller, specialized “worker” agents, each performing concise, purpose-driven tasks.
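The routing pattern this describes, a super agent delegating each query to the cheapest specialized worker that can handle it, can be sketched in plain Python. This is a minimal illustration, not AT&T's actual code: the real stack is built on LangChain, the routing decision is made by an LLM rather than keyword matching, and the worker names here are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical worker agents: each performs one concise, purpose-driven task.
def billing_worker(query: str) -> str:
    return f"[billing-SLM] resolved: {query}"

def network_worker(query: str) -> str:
    return f"[network-SLM] resolved: {query}"

WORKERS: Dict[str, Callable[[str], str]] = {
    "billing": billing_worker,
    "network": network_worker,
}

def super_agent(query: str) -> str:
    """Route the query to a specialized worker when one matches.

    In the real stack an LLM makes this routing decision; a keyword
    match stands in for it here.
    """
    for domain, worker in WORKERS.items():
        if domain in query.lower():
            return worker(query)
    # Only pay for the large reasoning model when no worker fits.
    return "[large-LLM fallback] " + query

print(super_agent("Why did my billing statement change?"))
```

The design point is that the expensive model sits behind the router as a fallback, so most traffic never reaches it.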

“Think of it as an AI command center,” Markus explained. “Rather than having one massive brain try to solve everything, we’ve created an ecosystem where specialized agents work together, each doing what they do best.”

This approach has delivered remarkable results. Latency has plummeted, response times have accelerated, and most impressively, the company has achieved up to 90% cost savings compared to their previous monolithic approach.

The Small Language Model Revolution

Perhaps the most revolutionary insight from AT&T’s journey is Markus’s bold prediction about the future of AI: “I believe the future of agentic AI is many, many, many small language models (SLMs).”

The data backs this up. AT&T has found that small language models can achieve accuracy levels “just about as accurate, if not as accurate, as a large language model on a given domain area.” This discovery has profound implications for the entire AI industry, suggesting that bigger isn’t always better—and certainly isn’t always more economical.
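A back-of-envelope calculation shows why routing most traffic to SLMs moves the needle at this scale. The prices and the 90/10 traffic split below are hypothetical illustrations, not AT&T's figures; only the 8-billion-token daily volume comes from the article.

```python
# Hypothetical per-token pricing; only DAILY_TOKENS is from the article.
DAILY_TOKENS = 8_000_000_000
LARGE_PER_M = 10.00   # $ per million tokens, large reasoning model (assumed)
SMALL_PER_M = 0.20    # $ per million tokens, small language model (assumed)

# Cost if every token goes through the large model.
all_large = DAILY_TOKENS / 1e6 * LARGE_PER_M

# Cost if 90% of traffic is routed to SLMs and 10% still needs the large model.
routed = DAILY_TOKENS / 1e6 * (0.9 * SMALL_PER_M + 0.1 * LARGE_PER_M)

savings = 1 - routed / all_large
print(f"all-large: ${all_large:,.0f}/day, routed: ${routed:,.0f}/day, "
      f"savings {savings:.0%}")
```

Even with a tenth of the traffic still hitting the large model, savings under these assumed prices land near the 90% range the article reports.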

Ask AT&T Workflows: Democratizing AI for 100,000 Employees

Building on this architectural foundation, Markus and his team recently leveraged Microsoft Azure to launch Ask AT&T Workflows, a graphical drag-and-drop agent builder that’s putting AI power directly into the hands of employees. The platform integrates seamlessly with AT&T’s proprietary tools for document processing, natural language-to-SQL conversion, and image analysis.

“What makes this powerful is that it’s AT&T’s data driving the decisions,” Markus emphasized. “We’re not asking general questions—we’re asking questions of our data, and we bring our data to bear to make sure it focuses on our information as it makes decisions.”

The platform offers two distinct paths: a pro-code option for developers who want to write Python and dictate precise rules, and a no-code visual interface for those seeking a “pretty light user experience.” Interestingly, even technically proficient users have shown a strong preference for the simpler route: at a recent hackathon, more than half of participants chose the no-code path despite their programming expertise.

Human Oversight in an Autonomous World

Despite the autonomous nature of the system, AT&T maintains strict human oversight. Every agent action is logged, data remains isolated throughout processing, and role-based access controls govern how agents interact with each other. “Things do happen autonomously,” Markus acknowledged, “but the human on the loop still provides a check and balance of the entire process.”
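The two controls mentioned, logging every agent action and gating actions by role, can be combined in a small governance wrapper. This is a hypothetical sketch of the pattern, not AT&T's implementation; the role and action names are invented.

```python
import logging
from functools import wraps

logging.basicConfig(level=logging.INFO)
AUDIT = logging.getLogger("agent.audit")

# Hypothetical role table: which agent roles may perform which actions.
PERMISSIONS = {
    "ticket_agent": {"open_ticket"},
    "summary_agent": {"write_summary"},
}

def governed(action: str):
    """Log every agent action and enforce role-based access before it runs."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(role: str, *args, **kwargs):
            if action not in PERMISSIONS.get(role, set()):
                AUDIT.warning("DENIED %s -> %s", role, action)
                raise PermissionError(f"{role} may not perform {action}")
            AUDIT.info("ALLOWED %s -> %s args=%r", role, action, args)
            return fn(role, *args, **kwargs)
        return wrapper
    return decorator

@governed("open_ticket")
def open_ticket(role: str, issue: str) -> str:
    return f"ticket opened for {issue}"

print(open_ticket("ticket_agent", "fiber outage"))  # allowed and audit-logged
```

Because every call passes through the wrapper, the human on the loop gets a complete audit trail to review, and a misrouted agent fails closed rather than acting outside its role.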

This balanced approach ensures that while efficiency increases dramatically, accountability and control remain firmly in human hands.

Strategic Flexibility: The Plug-and-Play Philosophy

AT&T’s approach to AI development is characterized by remarkable flexibility. Rather than building everything from scratch, Markus’s team relies on “interchangeable and selectable” models, avoiding the trap of “rebuilding a commodity.”

“Because in this space, things change every week, if we’re lucky, sometimes multiple times a week,” Markus explained. “We need to be able to pilot, plug in and plug out different components.”
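One common way to keep components "interchangeable and selectable" is a model registry: callers address models by name, so swapping one out is a registry update rather than a rebuild. A minimal sketch, with hypothetical model names:

```python
from typing import Callable, Dict

# Registry of model backends, keyed by a stable name callers depend on.
MODEL_REGISTRY: Dict[str, Callable[[str], str]] = {}

def register(name: str, fn: Callable[[str], str]) -> None:
    """Plug a model in (or replace an existing one) under a stable name."""
    MODEL_REGISTRY[name] = fn

def complete(model: str, prompt: str) -> str:
    """Callers never import a model directly; they go through the registry."""
    return MODEL_REGISTRY[model](prompt)

register("domain-slm", lambda p: f"slm-v1: {p}")
print(complete("domain-slm", "hello"))

# A week later, plug in a better backend under the same name; callers unchanged.
register("domain-slm", lambda p: f"slm-v2: {p}")
print(complete("domain-slm", "hello"))
```

The indirection is what makes piloting cheap: a new model can be plugged in, evaluated against the incumbent, and plugged back out without touching any calling code.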

This philosophy extends to their evaluation process. AT&T conducts “really rigorous” assessments of both available options and their own tools. Their Ask Data with Relational Knowledge Graph has topped the Spider 2.0 text-to-SQL accuracy leaderboard, while other tools have scored highly on the BIRD SQL benchmark.

Avoiding the Over-Engineering Trap

One of Markus’s most valuable pieces of advice for AI builders is perhaps the simplest: don’t overcomplicate things. “Sometimes we overcomplicate things,” he warned. “Sometimes I’ve seen a solution over-engineered.”

Instead, builders should critically examine whether a tool actually needs to be agentic. Questions like “What accuracy level could be achieved if it was a simpler, single-turn generative solution?” and “How could we break it down into smaller pieces where each piece could be delivered ‘way more accurately’?” should guide development decisions.

Real-World Impact: 90% Productivity Gains

The numbers tell a compelling story. With over 100,000 employees using Ask AT&T Workflows, more than half report daily usage. Active adopters are reporting productivity gains as high as 90%, and the platform’s “stickiness”—measured by repeated usage—serves as a key indicator of success.

Employees are finding creative applications across various functions. Network engineers, for instance, can build chains of agents to handle customer connectivity issues. One agent correlates telemetry to identify network problems and their locations, pulls change logs, checks for known issues, and opens trouble tickets. Another devises solutions and even writes code to patch the problem. A third agent then generates comprehensive summaries with preventative measures for the future.
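The three-stage chain described above maps naturally onto a pipeline of functions, each standing in for one agent. The functions and data below are hypothetical placeholders for the real diagnostic, remediation, and summary agents:

```python
# Sketch of the three-agent chain; each function stands in for one agent.
def diagnose(telemetry: dict) -> dict:
    """Agent 1: correlate telemetry, locate the problem, open a trouble ticket."""
    problem = {"site": telemetry["site"], "issue": "packet loss"}
    problem["ticket"] = f"TKT-{telemetry['site']}"
    return problem

def remediate(problem: dict) -> dict:
    """Agent 2: devise a fix (in the real system, possibly generated code)."""
    problem["fix"] = f"restart optical line card at {problem['site']}"
    return problem

def summarize(problem: dict) -> str:
    """Agent 3: produce a summary with preventative measures for the future."""
    return (f"{problem['ticket']}: {problem['issue']} at {problem['site']}; "
            f"fix applied: {problem['fix']}; prevention: schedule firmware audit")

# The human engineer reviews each stage's output before the next proceeds.
report = summarize(remediate(diagnose({"site": "DAL-07"})))
print(report)
```

Breaking the workflow into stages like this is also what makes human oversight tractable: each hand-off is a natural checkpoint for the engineer watching the chain.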

“The [human] engineer would watch over all of it, making sure the agents are performing as expected and taking the right actions,” Markus explained.

AI-Fueled Coding: The Next Frontier

Perhaps the most exciting development is what Markus calls “AI-fueled coding”—a revolutionary approach that’s fundamentally changing how AT&T writes code. This technique draws parallels to RAG, with developers using agile coding methods in integrated development environments (IDEs) alongside “function-specific” build archetypes that dictate how code should interact.
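One plausible reading of a "function-specific build archetype" is a reusable spec, prepended to the coding prompt alongside retrieved code context (the RAG-like part), that dictates how generated components must interact. The archetype text and helper below are entirely hypothetical illustrations of that idea, not AT&T's templates:

```python
# Hypothetical archetype: rules the generated component must follow.
API_ARCHETYPE = """\
You are generating a service module.
Rules: expose a single `handle(request: dict) -> dict` function,
validate all inputs, never raise past the module boundary, log every call.
"""

def build_prompt(archetype: str, context_snippets: list[str], task: str) -> str:
    """Combine the archetype, retrieved code context, and the task
    into a single prompt for the coding model."""
    context = "\n\n".join(context_snippets)
    return f"{archetype}\n# Relevant code context:\n{context}\n# Task:\n{task}"

prompt = build_prompt(
    API_ARCHETYPE,
    ["def handle(request): ..."],
    "Add a field validator for customer_id.",
)
print(prompt.splitlines()[0])
```

Because the interaction rules travel with every prompt, the model's first draft already conforms to the expected interface, which is one way to cut the back-and-forth iterations of ordinary vibe coding.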

The results are stunning. The output is “very close to production grade” and can reach that quality in a single iteration. “We’ve all worked with vibe coding, where we have an agentic kind of code editor,” Markus noted. “But AI-fueled coding eliminates a lot of the back and forth iterations that you might see in vibe coding.”

This approach is “tangibly redefining” the software development cycle, shortening development timelines and increasing output of production-grade code. Even non-technical teams can participate, using plain language prompts to build software prototypes.

To illustrate the power of this approach, Markus shared how his team built an internal curated data product in just 20 minutes using AI-fueled coding—a task that would have taken six weeks without AI assistance. “We develop software with it, modify software with it, do data science with it, do data analytics with it, do data engineering with it,” he said. “So it’s a game changer.”

The Future of Enterprise AI

AT&T’s journey offers a blueprint for enterprise AI adoption that balances innovation with practicality, autonomy with oversight, and cutting-edge technology with proven principles. The three core principles Markus emphasizes—accuracy, cost, and tool responsiveness—continue to guide their efforts even as solutions become more complex.

As enterprises worldwide grapple with similar challenges of scale, cost, and complexity, AT&T’s multi-agent approach, emphasis on small language models, and commitment to practical implementation provide valuable lessons. The future of enterprise AI isn’t about building bigger models—it’s about building smarter systems that leverage the right tool for the right job, orchestrated by intelligent agents working in harmony.

In an industry where change happens “multiple times a week,” AT&T’s flexible, pragmatic approach ensures they’re not just keeping pace—they’re setting the pace for what enterprise AI can and should be.


