Building Memory for AI: Implementing Google's Titans Research
Technical Vision · 21 December 2025 · 12 min read


How we gave our AI systems the ability to learn and remember

Allan Waddell
Founder & Co-CEO, Kablamo
The Problem

Every conversation starts from zero.

You ask an AI chatbot a question. It gives you a thoughtful answer. You come back tomorrow with a follow-up, and it has no idea who you are or what you discussed. It's like talking to someone with permanent amnesia.

This is the fundamental limitation of modern AI systems. They're stateless. Every interaction is isolated. They can retrieve information from a knowledge base (RAG), but they can't learn from the conversation itself.

We wanted to change that. Not by fine-tuning models (expensive, slow, brittle). Not by stuffing more context into prompts (limited, doesn't scale). We wanted AI that actually remembers - that builds understanding over time, like humans do.

"The difference between a tool and an assistant is memory. Tools do what you tell them. Assistants remember what you've told them before."

What is Titans/MIRAS?

In December 2024, Google published research on "Titans" - a new architecture for giving AI systems long-term memory. The accompanying framework, MIRAS (Memory, Inference, Retrieval, and Adaptive Storage), describes how to implement test-time memorisation without retraining the underlying model.

The core idea is simple: instead of trying to make models remember everything, let them decide what's worth remembering based on three signals.

  • Surprise: how novel is this input? (1.0 - max_similarity)
  • Momentum: patterns confirmed repeatedly gain weight (confirmation_count++)
  • Decay: unused memories fade over time (weight *= decay_factor)
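The three signals can be sketched as plain update rules. This is a minimal illustration, not the production code; the reinforcement increment and the exact decay constant are assumptions:

```python
DECAY_FACTOR = 0.98  # assumed: roughly 2% decay per maintenance pass


def surprise(max_similarity: float) -> float:
    """Novelty signal: 1.0 means nothing similar has been stored yet."""
    return 1.0 - max_similarity


def reinforce(weight: float, confirmation_count: int) -> tuple[float, int]:
    """Momentum signal: each confirmation nudges the weight toward 1.0."""
    return min(1.0, weight + 0.1), confirmation_count + 1


def decay(weight: float) -> float:
    """Decay signal: untouched memories fade a little each pass."""
    return weight * DECAY_FACTOR
```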

This is fundamentally different from fine-tuning or vector search. Fine-tuning permanently alters the model. It's expensive, slow, and you can't easily undo it. Vector search (RAG) retrieves but doesn't learn; it's static. MIRAS operates at test time, building a living memory that evolves with every interaction.

Traditional RAG
  • Static content retrieval
  • Same response for all users
  • No memory of past conversations
  • Manual content updates required

MIRAS-Enhanced
  • Dynamic learning from interactions
  • Personalised responses per user
  • Learns patterns over 30-60 days
  • Automatic pattern emergence

Why We Built This

We had a chatbot on our website. It answered questions about Kablamo - our work, our services, our insights. It used RAG to pull relevant content from our knowledge base. It worked fine.

But every conversation evaporated. Users would ask the same questions repeatedly. We could see patterns in the queries - topics people cared about, gaps in our content, emerging interests - but the chatbot itself couldn't. It was a retrieval system, not a learning system.

We saw an opportunity: what if the chatbot could learn from every interaction? What if it could remember that you specifically care about Kubernetes, so when you ask "what does Kablamo do?" it emphasises our container orchestration work? What if frequently asked questions automatically surfaced content gaps for our marketing team?

The Strategic Bet

Memory becomes a differentiator for enterprise AI.

Every competitor has access to the same LLMs. The companies that win will be the ones whose AI systems actually understand their users, their domain, their patterns. That understanding comes from memory.

The Architecture

At the heart of our implementation is the MemoryCell - a data structure for storing learned patterns.

# MemoryCell Schema
content: str # The actual text/pattern
embedding: Vector # 768-dim vector embedding
weight: float # 0.0-1.0 importance score
surprise_score: float # How novel when first stored
confirmation_count: int # Times this pattern confirmed
scope: enum # personal | global
status: enum # pending | confirmed | decaying
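One way to realise this schema in Python is a dataclass. The field defaults below are illustrative assumptions; the article's schema only specifies the fields and their types:

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    PERSONAL = "personal"
    GLOBAL = "global"


class Status(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"
    DECAYING = "decaying"


@dataclass
class MemoryCell:
    content: str                    # the actual text/pattern
    embedding: list[float]          # 768-dim vector embedding
    weight: float = 0.5             # 0.0-1.0 importance score (default assumed)
    surprise_score: float = 0.0     # how novel when first stored
    confirmation_count: int = 0     # times this pattern was confirmed
    scope: Scope = Scope.PERSONAL
    status: Status = Status.PENDING
```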

Here's how memories flow through the system:

1. Check Global First: if this exact pattern exists globally, reinforce it. Don't duplicate.
2. Check Personal: if this user has seen this pattern before, reinforce their personal memory.
3. Calculate Surprise: find semantically similar memories. Surprise = 1 - best_similarity.
4. Store or Reinforce: high surprise (>0.8) → store confirmed. Medium (0.3-0.8) → store pending. Low → reinforce existing.

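The four steps above can be sketched as a single decision function. This is a simplified model, not the production write path: the `Memory` class, the similarity helper, and the return labels are all hypothetical, while the 0.8/0.3 thresholds come from the article:

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    content: str
    embedding: list
    confirmation_count: int = 0


def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def store_or_reinforce(content, embedding, global_mems, personal_mems,
                       novel=0.8, tracking=0.3):
    # 1. Check global first: exact pattern already shared? Reinforce, don't duplicate.
    for m in global_mems:
        if m.content == content:
            m.confirmation_count += 1
            return "reinforced_global"
    # 2. Check personal: has this user seen the pattern before?
    for m in personal_mems:
        if m.content == content:
            m.confirmation_count += 1
            return "reinforced_personal"
    # 3. Calculate surprise against the closest semantic neighbour.
    sims = [cosine_sim(embedding, m.embedding)
            for m in global_mems + personal_mems]
    surprise = 1.0 - max(sims) if sims else 1.0
    # 4. Store or reinforce based on the thresholds.
    if surprise > novel:
        return "stored_confirmed"
    if surprise > tracking:
        return "stored_pending"
    return "reinforced_existing"
```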
Privacy by Design

Personal memories are never visible to other users. They're filtered out entirely. Anonymous users only see global memories.

When 5+ unique users confirm the same pattern, it graduates from personal to global - but only the pattern, not who confirmed it.

What It's Doing For Us

MIRAS has been running in production on our website. Here's what we're seeing:

1. Personalisation That Actually Works

Users who frequently ask about AI get AI-focused responses. Users who care about infrastructure see our Kubernetes and cloud work emphasised. Same question, different answers tailored to demonstrated interests.

2. Content Gap Discovery

High-surprise queries with high weight reveal what users want that we haven't written about. Our content team now has a data-driven backlog instead of guessing what to write next.

3. Organic Learning

We didn't define patterns manually. They emerged from usage. Topics we didn't anticipate became confirmed memories simply because users kept asking about them.

4. Live Knowledge Graph

We built a 3D visualisation of the entity relationships MIRAS has learned. You can explore it at /intelligence/architecture/knowledge-graph. It's fun to watch patterns form in real time.

"The best part? We didn't have to teach it anything. It learned from watching users interact with our content."

Technical Implementation

For those who want the implementation details, here's how we built it:

Storage
PostgreSQL + pgvector for vector similarity search. HNSW indexing gives us sub-millisecond recall on 100k+ memories.
Embeddings
Google's embedding API (768 dimensions). Fast, accurate, and we're already using Gemini for generation.
Deduplication
SHA256 content hashing for exact matches. Semantic similarity handles near-duplicates separately.
Maintenance
Daily cron job applies decay, graduates pending→confirmed, promotes personal→global, and forgets low-weight memories.
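The exact-match half of the deduplication step might look like this. The whitespace and case normalisation shown here is an assumption; the article only specifies SHA256 content hashing:

```python
import hashlib


def content_hash(content: str) -> str:
    # Normalise whitespace and case so trivially different copies of the
    # same pattern collapse to one key (normalisation rules are assumed).
    normalised = " ".join(content.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()
```

Semantic near-duplicates would still be caught separately via embedding similarity, as the article notes.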
# Surprise Calculation
def calculate_surprise(content, existing_memories):
    # Embed the new input and compare it against everything stored so far
    embedding = embed(content)
    similarities = [cosine_sim(embedding, m.embedding) for m in existing_memories]
    # No prior memories means maximum surprise
    return 1.0 - max(similarities) if similarities else 1.0

What We Learned

Surprise thresholds matter. We landed on 0.8 for "definitely novel" and 0.3 for "worth tracking." Too low and you store noise. Too high and you miss patterns.

Decay prevents stale memories from dominating. 2% per week means a memory untouched for a year drops from 1.0 to ~0.35. Memories below 0.1 get deleted.
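The ~0.35 figure follows from compounding 2% weekly decay over 52 weeks; the same arithmetic gives roughly how long an untouched memory survives before hitting the 0.1 deletion floor:

```python
import math

DECAY = 0.98   # 2% decay per weekly maintenance pass
FLOOR = 0.1    # memories below this weight are deleted

# A memory untouched for a year: 0.98 ** 52 ≈ 0.35
weight_after_year = 1.0 * DECAY ** 52

# Weeks until an untouched memory falls below the deletion floor
weeks_to_forget = math.ceil(math.log(FLOOR) / math.log(DECAY))
```

So with these numbers, a never-reinforced memory is forgotten entirely after a little over two years.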

Global graduation needs a high bar. 5 unique users confirming the same pattern filters out individual quirks and surfaces real shared interests.

Seeding solves cold start but shouldn't replace organic learning. We bootstrap from our content corpus, but the real value comes from patterns that emerge from usage.

The Bigger Picture

MIRAS is a stepping stone. We're building toward AI systems that don't just answer questions. Systems that remember not just facts, but context, preferences, and patterns of inquiry.

The Titans research points to a future where every AI interaction contributes to a shared intelligence layer. Where your personal assistant actually knows you. Where enterprise AI systems develop institutional memory that survives employee turnover.

We're not there yet. But with MIRAS, we've taken the first step: AI that learns from every conversation, remembers what matters, and forgets what doesn't.

Want to explore AI memory for your enterprise?

We'd love to discuss how MIRAS-style architectures could transform your AI systems. Let's have a conversation about building AI that actually learns.

Connect on LinkedIn