Google PM open-sources Always On Memory Agent, ditching vector databases for LLM-driven persistent memory
Google’s Always-On Memory Agent: The Future of Persistent AI Agents
By [Your Name], Technology Correspondent
In a groundbreaking move that could reshape how AI agents operate in enterprise environments, Google senior AI product manager Shubham Saboo has open-sourced an innovative “Always On Memory Agent” that tackles one of the most persistent challenges in agent design: continuous, reliable memory.
The timing matters: as businesses increasingly deploy AI agents for customer support, research assistance, and workflow automation, maintaining context across sessions has become a critical bottleneck. Saboo’s solution, released on the official Google Cloud Platform GitHub repository under a permissive MIT License, offers a compelling blueprint for the next generation of autonomous agents.
The Architecture: Simple, Bold, and Provocative
At its core, the Always On Memory Agent represents a deliberate departure from conventional wisdom. As Saboo himself states in the repository documentation, the system operates on a provocative premise: “No vector database. No embeddings. Just an LLM that reads, thinks, and writes structured memory.”
This minimalist approach is built using Google’s Agent Development Kit (ADK)—introduced in Spring 2025—and powered by Gemini 3.1 Flash-Lite, a model Google unveiled on March 3, 2026, as its fastest and most cost-efficient offering in the Gemini 3 series.
The agent runs continuously, ingesting files or API inputs, storing structured memories in SQLite, and performing scheduled memory consolidation every 30 minutes by default. The system supports text, image, audio, video, and PDF ingestion through a local HTTP API and Streamlit dashboard.
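In sketch form, that loop — structured memories as plain SQLite rows plus a timed consolidation pass — might look like the following Python. The schema, class names, and consolidation hook here are illustrative assumptions, not the repository’s actual code:

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    topic      TEXT NOT NULL,
    content    TEXT NOT NULL,
    source     TEXT,
    created_at REAL DEFAULT (strftime('%s', 'now'))
);
"""

class MemoryStore:
    """Structured memory as plain SQLite rows -- no embeddings involved."""

    def __init__(self, path="memory.db"):
        self.conn = sqlite3.connect(path)
        self.conn.executescript(SCHEMA)

    def write(self, topic, content, source=None):
        # The LLM emits structured records; we persist them verbatim.
        self.conn.execute(
            "INSERT INTO memories (topic, content, source) VALUES (?, ?, ?)",
            (topic, content, source),
        )
        self.conn.commit()

    def read_all(self):
        return self.conn.execute(
            "SELECT topic, content FROM memories ORDER BY created_at, id"
        ).fetchall()

    def replace_all(self, rows):
        # Swap the whole table for the model's consolidated version.
        self.conn.execute("DELETE FROM memories")
        self.conn.executemany(
            "INSERT INTO memories (topic, content) VALUES (?, ?)", rows
        )
        self.conn.commit()

def run_consolidation(store, consolidate, interval_s=30 * 60):
    # Scheduled pass: hand every memory to the model and store back the
    # merged, deduplicated result. `consolidate` would wrap a Gemini call
    # in the real system; here it is just a function argument.
    while True:
        store.replace_all(consolidate(store.read_all()))
        time.sleep(interval_s)
```

The point of the sketch is how little machinery the design needs: the “database” is one table, and consolidation is a full rewrite of that table by the model.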
The Economic Logic Behind Flash-Lite
The choice of Gemini 3.1 Flash-Lite isn’t arbitrary. Google positions this model as built for high-volume developer workloads at scale, priced at $0.25 per million input tokens and $1.50 per million output tokens. The company claims Flash-Lite is 2.5 times faster than Gemini 2.5 Flash in time-to-first-token delivery and delivers a 45% increase in output speed while maintaining similar or better quality.
On Google’s published benchmarks, Flash-Lite posts an Elo score of 1432 on Arena.ai, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro. These characteristics make it particularly well-suited for high-frequency tasks like translation, moderation, UI generation, and—critically—background memory operations.
For a 24/7 service that periodically re-reads, consolidates, and serves memory, predictable latency and low inference costs are essential. Flash-Lite’s economics make the “always on” model viable rather than prohibitively expensive.
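A back-of-envelope check makes the point concrete. Assuming a hypothetical consolidation pass that re-reads 20,000 tokens of memory and writes 2,000 tokens back (workload numbers invented for illustration), the published Flash-Lite prices put a 24/7 schedule well under a dollar a day:

```python
# Flash-Lite list prices cited above, converted to dollars per token.
INPUT_PRICE = 0.25 / 1_000_000
OUTPUT_PRICE = 1.50 / 1_000_000

def daily_cost(runs_per_day, input_tokens, output_tokens):
    """Dollar cost of a fixed-size job repeated `runs_per_day` times."""
    per_run = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
    return runs_per_day * per_run

runs = 24 * 60 // 30  # one consolidation every 30 minutes
cost = daily_cost(runs, 20_000, 2_000)
print(f"{runs} runs/day -> ${cost:.2f}/day")  # 48 runs/day -> $0.38/day
```

At roughly $12 a month for the assumed workload, the background consolidation is a rounding error next to the engineering time it replaces — which is the economic argument for “always on.”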
Beyond the Demo: A Reference Point for Agent Infrastructure
While the memory agent itself is compelling, its significance extends beyond a single implementation. Google’s ADK framework is presented as model-agnostic and deployment-agnostic, supporting workflow agents, multi-agent systems, tools, evaluation, and deployment targets including Cloud Run and Vertex AI Agent Engine.
This combination positions the memory agent as more than a one-off demo—it’s a reference point for a broader agent runtime strategy. Saboo is effectively trying to make agents feel like deployable software systems rather than isolated prompts, with memory becoming part of the runtime layer rather than just an add-on feature.
The Enterprise Reality Check
However, the public reaction reveals why enterprise adoption won’t hinge solely on speed or token pricing. Several responses on X highlighted exactly the concerns enterprise architects are likely to raise.
Franck Abe called Google ADK and 24/7 memory consolidation “brilliant leaps for continuous agent autonomy,” but warned that an agent “dreaming” and cross-pollinating memories in the background without deterministic boundaries becomes “a compliance nightmare.”
ELED made a related point, arguing that the main cost of always-on agents isn’t tokens but “drift and loops”—the unpredictable behavior that can emerge when systems continuously update their own understanding without clear boundaries.
These critiques go directly to the operational burden of persistent systems: who can write memory, what gets merged, how retention works, when memories are deleted, and how teams audit what the agent learned over time.
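None of those controls are described as shipping with the repo. As a sketch of the kind of deterministic boundary enterprise teams would bolt on — a write allowlist, an append-only audit trail, and a fixed retention window (all values and names invented here for illustration) — consider:

```python
from datetime import datetime, timedelta, timezone

# Illustrative policy values -- nothing like this is claimed for the repo.
WRITE_ALLOWLIST = {"ingestion_agent", "consolidation_agent"}
RETENTION = timedelta(days=90)

audit_log = []  # append-only record of every accepted memory write

def guarded_write(store, author, topic, content):
    """Reject writes from unknown authors; log the ones we accept."""
    if author not in WRITE_ALLOWLIST:
        raise PermissionError(f"{author} may not write memory")
    now = datetime.now(timezone.utc)
    store.append((topic, content, now))
    audit_log.append((author, topic, now))

def expire(store, now=None):
    """Deterministic retention: drop memories older than the policy window."""
    now = now or datetime.now(timezone.utc)
    store[:] = [m for m in store if now - m[2] < RETENTION]
```

The contrast with the repo’s design is the point: these rules are enforced in code, not delegated to the model’s judgment during a background “dreaming” pass.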
Iffy challenged the repo’s “no embeddings” framing, arguing that the system still has to chunk, index, and retrieve structured memory, and that it may work well for small-context agents but break down once memory stores become much larger.
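The critique is easy to see in code. Without an index, “retrieval” reduces to stuffing every memory row into the prompt, which only works while the store fits the model’s context budget (the budget and the characters-per-token heuristic below are rough assumptions):

```python
def build_memory_prompt(memories, question, token_budget=100_000):
    """Naive no-embeddings retrieval: put *all* memory in the context window."""
    block = "\n".join(f"[{topic}] {content}" for topic, content in memories)
    est_tokens = len(block) // 4  # crude heuristic: ~4 characters per token
    if est_tokens > token_budget:
        # Past this point you are back to chunking and indexing anyway.
        raise ValueError(f"memory (~{est_tokens} tokens) exceeds context budget")
    return f"Known memories:\n{block}\n\nQuestion: {question}"
```

For a personal assistant with a few thousand memories the approach is fine; for an enterprise store accumulating for months, the budget check starts failing and some form of indexed retrieval creeps back in.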
What Saboo Has Shown—and What He Hasn’t
Critically, the repository and its launch materials don’t include a direct Flash-Lite versus Anthropic Claude Haiku benchmark for agent loops in production use. Nor do they lay out enterprise-grade compliance controls specific to this memory agent, such as deterministic policy boundaries, retention guarantees, segregation rules, or formal audit workflows.
While the repo appears to use multiple specialist agents internally, nothing published so far demonstrates the larger claim of persistent memory shared across multiple independent agents. For now, the repo reads as a compelling engineering template rather than a complete enterprise memory platform.
The Timing Is Everything
The release lands as enterprise AI teams move beyond single-turn assistants toward systems expected to remember preferences, preserve project context, and operate across longer horizons. Saboo’s open-source memory agent offers a concrete starting point for that next layer of infrastructure, and Flash-Lite’s pricing gives the economics some credibility.
But the strongest takeaway from the reaction around the launch is that continuous memory will be judged on governance as much as capability. That’s the real enterprise question behind Saboo’s demo: not whether an agent can remember, but whether it can remember in ways that stay bounded, inspectable, and safe enough to trust in production.
Tags: #Google #AI #MachineLearning #AgentDevelopment #ADK #Gemini #FlashLite #AlwaysOnMemory #EnterpriseAI #OpenSource #SQLite #MemoryManagement #AutonomousAgents #TechInnovation #AIInfrastructure #SoftwareEngineering