Docs / Key Concepts / Memory & Context

Memory & Context

Your assistant remembers you. Not just within a single conversation, but across days, weeks, and months. Here's how.

Two layers of memory

Your assistant has two ways of remembering things, and they serve different purposes.

1. Workspace files — the baseline

These are plain text files in ~/.vellum/workspace/ that define what stays constant from conversation to conversation:

  • SOUL.md — behavioral rules and personality
  • IDENTITY.md — your assistant's name, nature, and vibe
  • USER.md — facts about you (name, location, preferences, projects)

Your assistant loads these into every conversation. They're the foundation — the context that makes it feel like it knows you before you've said a word. Your assistant also updates them as it learns new things about you, and you can edit them directly at any time.
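The load step is simple enough to sketch. This is an illustrative version, not the real implementation — the file names and path come from this page, but `load_workspace_context` and the section-header format are assumptions:

```python
from pathlib import Path

# Hypothetical sketch of loading workspace files into system context.
# File names and the ~/.vellum/workspace/ path are from the docs; the
# function name and "## filename" section format are assumptions.
WORKSPACE = Path.home() / ".vellum" / "workspace"
WORKSPACE_FILES = ["SOUL.md", "IDENTITY.md", "USER.md"]

def load_workspace_context(root: Path = WORKSPACE) -> str:
    sections = []
    for name in WORKSPACE_FILES:
        path = root / name
        if path.exists():  # missing files are simply skipped
            sections.append(f"## {name}\n{path.read_text().strip()}")
    return "\n\n".join(sections)
```

Missing files are skipped rather than treated as errors, so a fresh install with only SOUL.md still produces usable context.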

2. Long-term memory — the searchable history

Beyond workspace files, your assistant has a memory system that works more like human memory. It extracts facts from your conversations and stores them as searchable, categorized items:

  • Identity — personal facts (“Marina works at Vellum”) — lasts ~6 months
  • Preferences — likes and dislikes (“Marina hates morning meetings”) — lasts ~3 months
  • Constraints — rules and requirements (“Always use TypeScript for new skills”) — lasts ~1 month
  • Projects — repos, tech stacks, current work (“Working on Project Moonshot”) — lasts ~2 weeks
  • Decisions — choices made (“We decided to go with option B”) — lasts ~2 weeks
  • Events — deadlines, meetings, milestones (“Dentist appointment March 15th”) — lasts ~3 days

These lifetimes aren't hard cutoffs. Memories that come up across multiple conversations age more slowly — if you mention Project Moonshot in three separate conversations, it stays fresh longer than something mentioned once. Memories that go stale get demoted in search results before eventually dropping out.
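The lifetimes and decay behavior can be sketched as follows. The base durations come from the table above; the rule that each reinforcement extends the effective lifetime by 50% is an illustrative assumption, not the real decay curve:

```python
from datetime import timedelta

# Base lifetimes per category, from the table above. The decay rule is
# an assumption: each reinforcement (the fact resurfacing in a later
# conversation) extends the effective lifetime by 50%.
BASE_LIFETIME = {
    "identity":   timedelta(days=180),  # ~6 months
    "preference": timedelta(days=90),   # ~3 months
    "constraint": timedelta(days=30),   # ~1 month
    "project":    timedelta(days=14),   # ~2 weeks
    "decision":   timedelta(days=14),   # ~2 weeks
    "event":      timedelta(days=3),    # ~3 days
}

def is_stale(category: str, age: timedelta, reinforcements: int = 0) -> bool:
    lifetime = BASE_LIFETIME[category] * (1 + 0.5 * reinforcements)
    return age > lifetime
```

Under this sketch, an event mentioned once goes stale after three days, but the same event reinforced in two later conversations stays fresh for six.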

How it decides what to remember

Your assistant doesn't save everything. After each message, it runs an extraction step that identifies facts worth keeping — with confidence scores, importance ratings, and fingerprints to prevent duplicates.

It extracts when:

  • You share a personal fact or preference
  • You make a decision worth tracking
  • It learns something non-obvious from a task
  • You correct its behavior
  • Something seems important for future interactions

Low-value messages (“ok,” “thanks,” “got it”) are filtered out before extraction even runs. It's designed to err on the side of remembering too little rather than too much.
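A pre-filter like this might look as follows — the phrase list and the three-word threshold are illustrative assumptions, not the real rules:

```python
# Illustrative pre-filter: skip extraction for trivial acknowledgements.
# The phrase list and length threshold are assumptions.
LOW_VALUE = {"ok", "okay", "thanks", "thank you", "got it", "cool", "sure"}

def worth_extracting(message: str) -> bool:
    text = message.strip().lower().rstrip(".!")
    if text in LOW_VALUE:
        return False
    return len(text.split()) >= 3  # very short messages rarely carry facts
```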

If you want it to remember something specific, just say so:

“Remember that my dentist appointment is on March 15th.”
“Save this: the project deadline is end of Q2.”

When you explicitly ask it to remember something, it saves with high confidence — those memories are less likely to be superseded or go stale.

How it corrects itself

When the assistant extracts a new fact that contradicts an older one — say, you told it you preferred coffee last month but mentioned you've switched to tea — the new memory can supersede the old one. If the correction is explicit (“Actually, I prefer tea now”), the old memory is replaced immediately. If it's inferred, both coexist until the old one ages out.

Duplicate memories are caught by fingerprinting. If the same fact is extracted again, it reinforces the existing memory rather than creating a copy.
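Fingerprint-based deduplication can be sketched like this. The normalization and hash scheme are assumptions; the point is that an identical fact maps to the same fingerprint and reinforces the existing entry instead of duplicating it:

```python
import hashlib

# Sketch of fingerprint deduplication. Normalizing case and whitespace
# before hashing is an assumption; the real scheme may differ.
def fingerprint(fact: str) -> str:
    normalized = " ".join(fact.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]

def store(memories: dict, fact: str) -> None:
    fp = fingerprint(fact)
    if fp in memories:
        memories[fp]["reinforcements"] += 1  # duplicate: reinforce, don't copy
    else:
        memories[fp] = {"fact": fact, "reinforcements": 0}
```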

How context works in a conversation

Every time you send a message, your assistant assembles context from multiple sources:

  1. Workspace files — SOUL.md, IDENTITY.md, USER.md, loaded at the start of the conversation
  2. Conversation history — everything said so far in this session (summarized if it gets long)
  3. Memory recall — a search of long-term memory for anything relevant to your message
  4. Active skill instructions — if a skill is loaded, its instructions are included
  5. Your message — what you just said

All of this gets sent to the AI model together. That's how your assistant responds with awareness of who you are, what you've discussed before, and what's relevant right now.
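Assembled as a model request, the five sources might combine roughly like this — the field names, section headers, and ordering are assumptions for the sketch:

```python
# Illustrative assembly of a single model request from the five sources.
# Workspace files, recalled memories, and skill instructions land in the
# system message; prior turns and the new message follow in order.
def assemble_context(workspace, history, recalled, skill, user_message):
    system = workspace
    if recalled:
        system += "\n\n# Relevant memories\n" + "\n".join(f"- {m}" for m in recalled)
    if skill:
        system += "\n\n# Active skill instructions\n" + skill
    return ([{"role": "system", "content": system}]
            + list(history)  # prior turns, possibly already summarized
            + [{"role": "user", "content": user_message}])
```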

How memory recall works

When you send a message, the assistant doesn't just do a keyword search. It runs a hybrid retrieval pipeline:

  1. Your message is embedded — converted into both a dense vector (capturing meaning) and a sparse vector (capturing keywords)
  2. Both vectors search the memory store — dense search finds semantically similar memories, sparse search finds keyword matches
  3. Results are merged — using Reciprocal Rank Fusion, which combines rankings from both search methods
  4. Scoring — each result gets a final score weighted 70% semantic relevance, 20% recency, 10% extraction confidence
  5. Tiering — high-scoring results (above 0.8) get priority injection; moderate results (between 0.6 and 0.8) are included as “possibly relevant”; lower scores are dropped
  6. Staleness check — old memories past their lifetime get demoted, even if they scored well
  7. Injection — relevant memories are formatted and inserted into the conversation as structured context, grouped by type (identity facts, preferences, and general context)

The budget for memory injection is dynamic — it expands or contracts based on how much room is left in the context window after workspace files, conversation history, and skill instructions.
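Steps 3–5 can be sketched concretely. The 70/20/10 weights and the 0.8/0.6 tier thresholds come from this page; the RRF constant `k=60` (a common default) and the data shapes are assumptions:

```python
# Sketch of merge, scoring, and tiering. Reciprocal Rank Fusion gives
# each document 1/(k + rank) per result list and sums across lists, so
# documents ranked by both searches float to the top.
def rrf_merge(dense_ids, sparse_ids, k=60):
    scores = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def final_score(semantic, recency, confidence):
    # 70% semantic relevance, 20% recency, 10% extraction confidence
    return 0.7 * semantic + 0.2 * recency + 0.1 * confidence

def tier(score):
    if score > 0.8:
        return "priority"
    if score > 0.6:
        return "possibly relevant"
    return "dropped"
```

A memory found by both the dense and sparse searches outranks one found by only either, even if its individual ranks are modest — that is the point of the fusion step.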

What happens when conversations get long

Every AI model has a context window — a limit on how much text it can process at once. Your assistant manages this automatically:

  1. Compaction — when the conversation approaches 80% of the context limit, older messages are summarized into a compact form. The summary preserves goals, decisions, file paths, errors, and open questions while dropping filler and repetition.
  2. Tool-result truncation — if compaction isn't enough, tool results are trimmed to their essentials.
  3. Attachment summarization — if things are still tight, images and file contents are replaced with text descriptions.
  4. Memory scale-back — as a last resort, memory injection is limited to recent items only.

You won't notice this happening. The assistant keeps the conversation going smoothly — it just works with a summarized version of the earlier context rather than the full transcript.
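The escalation ladder might be modeled like this. The 80% trigger comes from this page; the token accounting and the assumption that each step frees roughly 30% of used tokens are purely illustrative:

```python
# Illustrative escalation ladder. Steps are applied in order until usage
# drops back under the 80% trigger. The 0.7 reduction per step is an
# assumption, not a measured figure.
def manage_context(tokens_used: int, context_limit: int) -> list:
    actions = []
    if tokens_used < 0.8 * context_limit:
        return actions  # plenty of room, do nothing
    for step in ("compact_history", "truncate_tool_results",
                 "describe_attachments", "trim_memory_injection"):
        actions.append(step)
        tokens_used = int(tokens_used * 0.7)  # assume each step frees ~30%
        if tokens_used < 0.8 * context_limit:
            break
    return actions
```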

Private conversations

You can start a private conversation that gets its own isolated memory scope. Memories from a private conversation:

  • Can't leak out — they won't surface in other conversations
  • Can read in — the private conversation can still access your shared memory pool

This is useful when you're discussing something sensitive. The assistant learns from the conversation, but those memories stay contained.
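Scoped recall can be sketched as a simple visibility rule — the `scope` field and its labels are assumptions:

```python
# Sketch of scoped recall: a private conversation reads from both its
# own scope and the shared pool, but other conversations never see the
# private scope. The "scope" field and label format are assumptions.
def recall(memories, conversation_scope):
    return [m["fact"] for m in memories
            if m["scope"] in ("shared", conversation_scope)]
```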

Trust and memory

Not everyone who talks to your assistant can shape its memories. Memory extraction only runs on messages from trusted actors — that's you (the guardian). Messages from trusted contacts or unknown parties are indexed for search within that conversation, but they can't create or modify your long-term memories.

This prevents external parties from injecting false facts into your assistant's memory.
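The gate amounts to routing by actor role. The actor labels here are assumptions based on the description above:

```python
# Sketch of the trust gate: every actor's messages are indexed for
# in-conversation search, but only the guardian's messages reach the
# memory-extraction pipeline. Actor labels are assumptions.
def route_message(actor: str) -> set:
    pipelines = {"conversation_index"}      # everyone is searchable in-session
    if actor == "guardian":
        pipelines.add("memory_extraction")  # only the guardian shapes memory
    return pipelines
```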

Managing your memories

You have full control:

  • Ask what it knows: “What do you remember about me?” or “What do you know about Project Moonshot?”
  • Correct mistakes: “Actually, I prefer tea, not coffee.” (It'll supersede the old memory.)
  • Delete memories: “Forget what I told you about my dentist appointment.”
  • Search explicitly: “Search your memory for anything about the Q2 deadline.”
  • Edit files directly: Open USER.md or SOUL.md in any text editor and change whatever you want.

Privacy

Memories are stored locally on your machine — in a SQLite database and a vector store inside ~/.vellum/workspace/data/. They don't get synced to a cloud, shared with other users, or used to train AI models.

However, memories are included in the context sent to the AI model when they're relevant to a conversation. This is how your assistant “thinks” with your context. Local storage, cloud thinking — the same trade-off as everything else in the system.

If you tell your assistant something sensitive, it may extract it as a memory and include it in future AI model calls when relevant. You can ask it to forget specific things, edit your workspace files directly, or use private conversations to keep sensitive context isolated.