Your Coding Agent Remembers Everything, Until It Doesn't
Frontier models can retrieve facts from a nearly full context window with remarkable accuracy, until compaction silently erases the project rules and conventions your team depends on.
Your coding agent has a remarkable memory. Within the bounds of a single conversation, a frontier model like Claude Sonnet 4.5 can retrieve specific facts with 100% accuracy from a context window that's 97.5% full. No degradation, no confusion, even when the facts are deliberately designed to be confusable. The model really is that good at it.
And that's precisely what makes the failure so dangerous.
A recent paper by Oliver Zahn and Simran Chana puts hard numbers on a problem that practitioners have felt but rarely measured. They call it context rot: the silent erosion of stored knowledge through the lifecycle operations that production systems must perform. Context windows are finite. When they fill up, as in any real-world use they eventually do, older content must be compressed to make room for new information. The paper measures what that compression costs: 60% of stored facts become irrecoverable after a single compaction pass.
But the more alarming finding is about goals and constraints. After repeated compaction, the model retained only 46% of non-default project rules: things like "deploy only to EU regions" or "never use Redis for caching." It didn't flag the loss. It kept working with full confidence, quietly violating rules it had forgotten existed.
Map this onto software development: your team's architectural decisions, your hard-won conventions, the deployment rules that exist because someone learned the hard way. These are the kinds of knowledge that compaction destroys.
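To make the failure mode concrete, here is a minimal sketch of what a naive compaction step can look like. The function names, token counting, and threshold are placeholders, not the paper's setup or any particular agent framework's implementation: once the window passes a budget, the oldest turns are replaced with a lossy summary, and any rule the summariser doesn't consider salient is simply gone.

```python
# Minimal, illustrative sketch of naive context compaction (assumed design,
# not any specific agent framework's code).

def count_tokens(messages: list[str]) -> int:
    # Crude stand-in for a real tokenizer.
    return sum(len(m.split()) for m in messages)

def summarize(messages: list[str]) -> str:
    # Placeholder: a real system would call an LLM here. Whatever the summary
    # keeps, everything else (often the non-default rules) is lost.
    return "Summary of earlier conversation: " + " ".join(messages)[:200]

def compact(messages: list[str], budget: int = 8000) -> list[str]:
    """Replace the oldest half of the conversation with a lossy summary
    once the context exceeds its token budget."""
    if count_tokens(messages) <= budget:
        return messages
    cutoff = len(messages) // 2
    return [summarize(messages[:cutoff])] + messages[cutoff:]
```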
Separation of Concerns
The paper states the remedy plainly: "The model remains the system's intelligence; it is simply relieved of the burden of also being the system's memory, a burden for which its architecture is not well suited."
This is a separation of concerns argument, and one I've made in the past. For anyone who has built software systems, it ought to feel familiar: we don't store databases in RAM and hope nobody turns off the machine. Yet the dominant approach to AI agent memory does exactly that: it stores everything in the model's working memory and hopes the context window never fills up. A larger context window is a larger bucket, but it is still a bucket.
The issue is architectural, not a matter of model capability. No amount of model improvement will fix it, because the constraint is systemic: context is finite, sessions are ephemeral, and compression into model weights is lossy.
Why We Built Spark
This separation of storage and processing is the founding architectural principle of Spark. We've been building on this basis for over a year, and it is gratifying to see independent research confirm the reasoning quantitatively.
In Spark, knowledge lives outside the model. It persists in an external memory layer that is independent of any particular context window, session, or agent. When an agent needs to know your team's conventions, it retrieves them from Spark. It doesn't rely on them having survived an arbitrary number of compression passes inside a prompt.
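As a rough illustration of that interaction pattern (the `MemoryClient` class and its methods below are hypothetical stand-ins, not Spark's published API): before acting, the agent queries the external memory layer and injects only the relevant conventions into the prompt for the task at hand, fetched fresh each time rather than carried along inside a long-running context.

```python
# Illustrative pattern only: `MemoryClient` and its methods are hypothetical
# stand-ins for an external memory layer, not Spark's actual interface.

from dataclasses import dataclass

@dataclass
class Knowledge:
    text: str          # e.g. "Deploy only to EU regions."
    trust: float       # quality signal maintained outside any context window

class MemoryClient:
    def __init__(self, store: list[Knowledge]):
        self.store = store

    def search(self, query: str, top_k: int = 5) -> list[Knowledge]:
        # A real system would use embeddings; keyword overlap keeps the sketch runnable.
        scored = sorted(
            self.store,
            key=lambda k: (sum(w in k.text.lower() for w in query.lower().split()), k.trust),
            reverse=True,
        )
        return scored[:top_k]

def build_prompt(task: str, memory: MemoryClient) -> str:
    conventions = memory.search(task)
    rules = "\n".join(f"- {k.text}" for k in conventions)
    # The rules are retrieved for every task, so they never depend on having
    # survived compaction inside a long-running context window.
    return f"Team conventions:\n{rules}\n\nTask:\n{task}"
```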
This design choice has several consequences that map directly onto the failure modes the paper identifies.
No compaction loss. Because knowledge isn't stored in the context, it doesn't need to be compressed when the context fills up. An insight that was contributed six months ago is retrieved with the same fidelity as one contributed yesterday. Your team's constraints don't degrade over time or across sessions. An external memory system can scale with your organization's needs, and it does so far better than a collection of markdown files.
No goal drift. The conventions and rules that define how your team works, the non-default, environment-specific knowledge that the paper shows is most vulnerable to compaction, are stored as first-class objects in Spark, not as prose buried in a prompt that will eventually be summarised away. They survive because they were never at risk in the first place.
Knowledge accumulates rather than decays. In the in-context model, every new session starts from zero, and every long session gradually erodes what came before. In Spark, the opposite happens: knowledge grows over time, curated by a continuous learning loop that refines, deduplicates, and quality-scores the memory. The system gets better the more it is used, rather than worse the longer it runs.
Knowledge is shared. The paper focuses on single-agent memory, and the problems are already severe. But in a real engineering team, the situation is worse: multiple agents, multiple engineers, multiple sessions all need access to the same institutional knowledge. When that knowledge lives in Spark, a lesson learned by one agent is immediately available to every other agent on the team.
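To show what "first-class" might mean in practice, here is a sketch of a rule as a structured, shared record rather than a sentence buried in a prompt. The field names are illustrative assumptions, not Spark's schema; the point is that such an object carries provenance and scope and can be read by every agent and session without ever being summarised.

```python
# Hypothetical shape of a shared knowledge object; field names are illustrative.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class KnowledgeObject:
    statement: str                      # "Never use Redis for caching."
    scope: str                          # e.g. "backend", "deployment", "org-wide"
    source: str                         # who or what contributed it
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    superseded_by: str | None = None    # link to a newer rule, if any
    trust: float = 0.5                  # evolves with feedback over time

# Because the object lives in a shared store rather than inside any one prompt,
# every agent and every session reads the same rule with the same fidelity.
EU_DEPLOY_RULE = KnowledgeObject(
    statement="Deploy only to EU regions.",
    scope="deployment",
    source="post-incident review",
)
```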
Not Just About Storage
It's worth noting that the paper's Knowledge Objects solve the storage problem cleanly but stop there. Real-world institutional knowledge needs more than persistence. It needs curation: knowing which insights are trustworthy, which have been superseded, which are broadly applicable versus narrowly situational.
This is where Spark goes further. Every piece of knowledge in Spark carries a trust score that evolves based on feedback: how often it's been used, whether it led to successful outcomes, whether it's been corrected. High-trust knowledge surfaces first. Knowledge that stops being useful gradually decays in relevance. This active quality management is the difference between a filing cabinet and institutional intelligence.
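A minimal sketch of that feedback loop, under assumed update rules (the weights and decay rate below are illustrative, not Spark's actual scoring): successful uses nudge trust up, corrections push it down, and unused knowledge slowly fades in relevance rather than being deleted outright.

```python
# Illustrative trust-score update; the constants and formula are assumptions,
# not Spark's quality-management logic.

def update_trust(trust: float, *, used: bool, outcome_ok: bool | None = None,
                 corrected: bool = False, decay: float = 0.01) -> float:
    """Nudge a knowledge item's trust score based on one round of feedback."""
    if corrected:
        trust -= 0.2                  # an explicit correction is a strong negative signal
    elif used and outcome_ok is True:
        trust += 0.05                 # successful use slowly builds trust
    elif used and outcome_ok is False:
        trust -= 0.1                  # use that led to a bad outcome counts against it
    else:
        trust -= decay                # unused knowledge gradually fades in relevance
    return min(1.0, max(0.0, trust))  # clamp to [0, 1]

# High-trust items surface first at retrieval time; low-trust items sink in
# ranking instead of being hard-deleted, so corrections don't erase history.
```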
The Takeaway
The Zahn & Chana paper provides strong empirical evidence for something we've believed since we started building Memco: the model is not the right place to store knowledge. Not because the model is bad at retrieval (it's excellent), but because the system around it (finite windows, ephemeral sessions, lossy compression) makes storage unreliable by design.
The path to reliable AI agents isn't about bigger context windows. It's about recognising that memory is a distinct concern, one that deserves its own architecture, its own quality management, and its own persistence guarantees. That's what we're building.