New research shows memorization and reasoning use separate circuits in AI brains. That could open the door to safer, smarter, and more selective AI tools.
What if we could tell an AI model to forget only the stuff we don’t want it to remember—say, copyrighted text or leaked private information—but keep everything else working perfectly?
A team at Goodfire.ai might’ve just found a way to start doing that.
They’ve been digging into the deep internals of AI language models and discovered something pretty surprising: memorization and logical reasoning aren’t just different behaviors. They actually happen in different parts of the AI’s neural architecture. This separation might be much cleaner than anyone expected.
Let’s unpack what that means, and why it matters.
Memorizing vs Reasoning: Two Separate Paths in the AI Brain
When language models like OpenAI’s GPT-5 or the Allen Institute’s OLMo learn, they don’t just “understand” text the way we do. They digest vast amounts of data and form an intricate web of internal weights (the dials that shape their responses). Along the way, they end up doing two very different things:
- Memorize: Like reciting a quote from a book or remembering a multiplication table.
- Reason: Like figuring out if a sentence is logically true or solving a new problem based on patterns.
The team at Goodfire used a technique called K-FAC (Kronecker-Factored Approximate Curvature) to analyze how these processes show up in the model’s “loss landscape.” Think of that like a topographic map of how well the model’s guesses match the correct answers. Sharp peaks mean it’s reacting strongly to tiny differences—typical of memorization. Smoother hills suggest more flexible thinking, or reasoning.
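To make the “sharp peaks vs. smooth hills” picture concrete, here’s a toy sketch (an illustration of the curvature idea, not Goodfire’s actual code): a two-parameter loss where one direction is sharply curved and the other is gently curved, with the curvature read straight off the Hessian’s eigenvalues. K-FAC is, roughly, a way to approximate this kind of curvature analysis for networks with billions of weights.

```python
# Toy illustration of the curvature idea (not Goodfire's actual method):
# sharp directions of the loss landscape ~ memorization, flat ones ~ reasoning.
import numpy as np

def loss(w):
    # Hypothetical two-parameter landscape: the first parameter sits in a very
    # sharp valley (high curvature), the second in a gentle one (low curvature).
    return 50.0 * w[0] ** 2 + 0.1 * w[1] ** 2

def numerical_hessian(f, w, eps=1e-4):
    """Finite-difference Hessian of f at w (fine for tiny toy problems)."""
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            w_pp = w.copy(); w_pp[i] += eps; w_pp[j] += eps
            w_pm = w.copy(); w_pm[i] += eps; w_pm[j] -= eps
            w_mp = w.copy(); w_mp[i] -= eps; w_mp[j] += eps
            w_mm = w.copy(); w_mm[i] -= eps; w_mm[j] -= eps
            H[i, j] = (f(w_pp) - f(w_pm) - f(w_mp) + f(w_mm)) / (4 * eps ** 2)
    return H

w = np.array([0.3, 0.3])
H = numerical_hessian(loss, w)
eigenvalues, eigenvectors = np.linalg.eigh(H)

# A large eigenvalue marks a "sharp peak" direction (memorization-like);
# a small one marks a "smooth hill" direction (reasoning-like).
print("Curvature per direction:", eigenvalues)  # ~[0.2, 100.0]
```

In the real research, this curvature is estimated over actual transformer weights; the toy version just shows why eigenvalues are a natural way to tell sharp directions from flat ones.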
And here’s the kicker: when they selectively removed the memorization pathways, models lost nearly all their ability to regurgitate training data, with their success at reproducing memorized text dropping to just 3.4%, but kept their reasoning skills nearly untouched, holding at roughly 95% to 106% of their original performance.
That’s like making an AI forget most of its flashcards while still acing the test.
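So what might “removing the memorization pathways” look like in spirit? Here’s a rough sketch using the same kind of toy two-parameter setup as above: project the weights away from the sharpest-curvature directions and leave the flat ones alone. The numbers and threshold are invented for illustration, and the actual method operates on full transformer layers through the K-FAC approximation, so treat this strictly as a cartoon of the idea.

```python
import numpy as np

# Toy two-parameter example: one sharply curved direction (curvature 100),
# one gently curved direction (curvature 0.2). All values are made up.
w = np.array([0.3, 0.3])
H = np.diag([100.0, 0.2])  # stand-in curvature matrix (e.g., from a K-FAC-style estimate)

def ablate_sharp_directions(w, H, curvature_threshold=1.0):
    """Project the weights away from high-curvature ("memorization") directions."""
    eigenvalues, eigenvectors = np.linalg.eigh(H)
    w_edited = w.copy()
    for value, direction in zip(eigenvalues, eigenvectors.T):
        if value > curvature_threshold:
            # Remove the weight component lying along this sharp direction.
            w_edited -= np.dot(w_edited, direction) * direction
    return w_edited

print("Before:", w)                              # [0.3, 0.3]
print("After: ", ablate_sharp_directions(w, H))  # sharp component removed -> [0.0, 0.3]
```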
Arithmetic Is More Memory Than Math
The researchers tested everything from logic puzzles to factual recall and arithmetic.
They found that when memorization pathways were disabled, arithmetic performance fell to about 66% of its original level. So even basic equations like 2+2=4, which seem like reasoning, often behave more like memorized facts inside a model.
This might help explain why AI models have such a hard time with math unless paired with external tools. They’re acting more like a student who memorized formulas for a test, rather than someone who understands how math works.
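One way to picture the difference: a lookup table of memorized answers versus a procedure that actually computes them. This toy contrast (purely illustrative, not from the research) shows why a lookup-style “skill” breaks the moment a problem falls outside the training data.

```python
# Toy contrast (illustrative only): "memorized" arithmetic via lookup vs.
# arithmetic computed by a procedure that generalizes.
memorized_sums = {(a, b): a + b for a in range(10) for b in range(10)}  # the "training set"

def recall_sum(a, b):
    # Memorization-style answer: only works for pairs seen during "training".
    return memorized_sums.get((a, b))

def reason_sum(a, b):
    # Procedure-style answer: works for any pair.
    return a + b

print(recall_sum(2, 2), reason_sum(2, 2))          # 4 4      (both fine on seen data)
print(recall_sum(123, 456), reason_sum(123, 456))  # None 579 (the lookup falls apart)
```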
Not All Memories Are Equal
When the team dug deeper into what kind of information was affected, some patterns emerged:
- Rare facts (like obscure CEO names) vanished almost completely.
- Common facts (like capital cities) often stayed intact.
- Open-book problems (where the model is given context) weren’t affected much at all.
That suggests the model devotes different “mental real estate” based on how often it saw something during training.
Most impressively, when tested on previously unseen quotes, this method beat existing tools like BalancedSubnet, cutting memorized responses to around 16% (versus roughly 60% for the older approach).
And it didn’t need examples of what to forget—it figured out what looked like memorization just by analyzing how the model reacts to changes internally.
So What Can We Do With This?
If you’ve ever wondered whether we can train a clean, smart AI that performs well without leaking sensitive information, this research is an early sign that it might be possible.
Imagine being able to:
- Scrub copyrighted or private content from a model without retraining it completely
- Reduce hallucinated facts by targeting memorized text instead of reprogramming the whole model
- Better understand why models succeed or fail at certain tasks
But here’s the catch: memory removal isn’t perfect.
The researchers admit some “forgotten” content could come back if the model is re-exposed to it later. And in some cases, like arithmetic, it’s still unclear whether the skill truly relies on memorization or simply shares circuitry with it.
Even so, this is a big step toward opening that mysterious black box inside AI models. By mapping out where different types of knowledge live, we’re starting to get better at editing—not just prompting—our digital collaborators.
Why This Matters for the Future of AI
Right now, language models are like vast, powerful machines with a tangled mess of wires inside. Until now, we didn’t really know which wires did what.
But if we can isolate certain capabilities so precisely, we might be able to design safer, more efficient AI systems that can learn and unlearn selectively.
This research gives engineers finer control over how AI models think—and forget. And for fields like data privacy, content licensing, or trust in AI decisions, that kind of control is huge.
We’re still early in understanding how to use this. But here’s what this research signals loud and clear:
The mind of a machine might be different from our own, but it’s not unknowable. We’re learning how to trace the lines of memory and reasoning—and someday, we might be able to edit both with precision.