If Your Coding Agents Don't Share Memory, You're Burning Money
Most teams running AI coding agents pay for the same knowledge over and over. We ran 200+ evaluation runs to show how shared memory cuts token usage by up to 87% and lifts pass rates from 70% to 100%.
Most teams running AI coding agents are paying for the same knowledge over and over again. Every session starts from scratch. Every agent re-discovers the same framework patterns, hits the same dead ends, burns through the same tokens your teammate's agent already burned through last week. It's wasteful, and on any non-trivial codebase, it kills reliability too.
You might think a static markdown file in your repo solves this. An agents.md, a .cursorrules, whatever. It doesn't. Those are single-point-in-time snapshots that someone has to write and maintain. They don't learn from agent sessions, they don't accumulate knowledge as your team works, and they definitely don't transfer what one model learned to a completely different model. They also won't scale to adequately cover the thousands of lines of code in your proprietary codebase. Personal memory tools like mem0 get closer, but they're scoped to one user and one model. Your team's agents are still siloed.
Spark is multiplayer, multi-model, and multi-domain. One engineer's agent figures out how your state machine framework works, and every agent after that — different engineer, different model, different problem — starts with that knowledge already in hand. It accumulates automatically as agents work, compounds over time, and transfers across model boundaries. We ran 5 rounds of evaluation to pressure-test these claims. The short version: agents with Spark passed more tests, used up to 87% fewer tokens, and the knowledge generalized to problems no agent had ever seen.
The benchmark
We built TeamForge, a Node.js project with intentionally weird framework patterns. Custom routing, validation, state machines, event handling, authorization. None of it matches what agents see in training data, so they can't coast on memorized patterns. They have to read the code and figure it out.
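To give a flavor of what "doesn't match training data" means, here's an invented illustration (not actual TeamForge code): routing declared as nested tuples and resolved by a hand-rolled matcher, so an agent can't coast on Express or Fastify conventions.

```javascript
// Invented illustration of an "unfamiliar" framework pattern — not real TeamForge code.
// Routes are [pattern, method, handler] tuples; params use :name segments.
const routes = [
  ["teams/:id/members", "POST", (ctx) => ({ added: ctx.params.id })],
];

function dispatch(routes, path, method) {
  for (const [pattern, m, handler] of routes) {
    if (m !== method) continue;
    const parts = pattern.split("/");
    const actual = path.split("/");
    if (parts.length !== actual.length) continue;
    const params = {};
    // A segment either matches literally or binds a :param to the actual value.
    const matched = parts.every((p, i) =>
      p.startsWith(":") ? (params[p.slice(1)] = actual[i], true) : p === actual[i]
    );
    if (matched) return handler({ params });
  }
  return null;
}

console.log(dispatch(routes, "teams/42/members", "POST")); // { added: '42' }
```

An agent that has only seen `app.post("/teams/:id/members", handler)` has to actually read `dispatch` to figure out how this works — which is the point of the benchmark.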
Each round compares two conditions:
- Control: Agent works alone, no shared memory
- Treatment: Agent has access to Spark, seeded with knowledge from previous sessions
We track pass rate (do tests pass?), tokens (compute cost), and wall-clock time.
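A minimal sketch of how a harness might tally those three metrics (names are hypothetical; this is not Spark's or the eval's actual code). The one subtlety is tokens-per-pass: failed attempts still burn tokens, so they inflate the cost of each passing run.

```javascript
// Hypothetical per-condition bookkeeping — illustrative only.
class ConditionStats {
  constructor() { this.runs = []; }
  record(passed, tokens, seconds) {
    this.runs.push({ passed, tokens, seconds });
  }
  get passRate() {
    return this.runs.filter((r) => r.passed).length / this.runs.length;
  }
  get tokensPerPass() {
    // All tokens spent, divided by passing runs only: failures raise the cost.
    const total = this.runs.reduce((sum, r) => sum + r.tokens, 0);
    const passes = this.runs.filter((r) => r.passed).length;
    return passes ? total / passes : Infinity;
  }
}

const control = new ConditionStats();
control.record(true, 1_200_000, 300);
control.record(false, 1_500_000, 420);
console.log(control.passRate);      // 0.5
console.log(control.tokensPerPass); // 2700000
```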
Rounds 1-2: Nothing
Both conditions hit 100% pass rate. Every feature, every time. Sonnet 4.6 just figured out our framework on its own.
Shared memory doesn't help when the agent can solve things from scratch. We needed harder problems.
Round 3: First real signal
We redesigned the eval to simulate sprints, where each one builds on the last and the codebase grows. Three trials.
| Metric | Control | With Spark |
|---|---|---|
| Pass rate | 70% | 100% |
| Tokens per pass | 1.35M | 0.65M |
Spark rescued 7 features that control failed. Zero went the other direction. By Sprint 3, agents with Spark used 65% fewer tokens. The gap widened as complexity grew, which is what you'd expect if the memory is actually doing something useful.
Round 4: More trials, same result
Same setup, five trials instead of three, beefier hardware.
| Metric | Control | With Spark |
|---|---|---|
| Pass rate | 90% | 100% |
| Total cost (50 runs) | $215 | $93 |
Treatment passed every feature across all 5 runs. The compounding was interesting to watch: by run 3, Spark agents had plateaued at ~430K tokens per feature, down from 1M+ in run 1. Each session made the next one cheaper, and then the savings leveled off once the knowledge base was saturated.
Round 5: Does it actually generalize?
Here's the obvious objection to everything above: the Spark namespace contained solutions to the exact features being tested. Maybe agents were just looking up answers, not learning transferable patterns.
So we created two brand-new features that exercise the same hard patterns (state machines, event handling, authorization) with completely different business logic. No agent had ever seen them. Spark had zero direct knowledge.
| Metric | Control | With Spark |
|---|---|---|
| Pass rate | 80% | 90% |
| Avg tokens per pass | 2.7M | 1.8M |
| Avg time (complex feature) | 334s | 170s |
Even on problems it had never seen, Spark lifted the pass rate from 80% to 90%, cut token usage by a third, and halved wall-clock time on the hardest feature. The framework knowledge genuinely generalized.
What held up across 200+ runs
Pass rates went from 70% to 90-100% once the tasks were hard enough to matter. Token usage dropped 33-57% per round, with compounding savings as knowledge accumulated. Cross-model, the savings hit 87%.
And the knowledge transfers. Between models. To problems the agent has never seen. That's the part that matters if you're thinking about this at team scale.
Right now every agent session your team runs throws away everything it learned. Spark captures that and makes it available to every future session, whether that's the same engineer continuing tomorrow, a teammate working on something adjacent, or a cheaper model handling routine work.
Try Spark
Spark works with Claude Code today. Install the CLI or add the MCP server and point it at your codebase — the compound savings start from session one.
Install the CLI:

```sh
npm install -g @memco/spark
```

Or add the MCP server directly to Claude Code:

```sh
claude mcp add --transport http spark https://spark.memco.ai/mcp
```