Your assistant remembers you. Not just within a single conversation, but across days, weeks, and months. Here's how.
Your assistant has two ways of remembering things, and they serve different purposes.
These are plain text files in ~/.vellum/workspace/ that define the constants:
Your assistant loads these into every conversation. They're the foundation — the context that makes it feel like it knows you before you've said a word. Your assistant also updates them as it learns new things about you, and you can edit them directly at any time.
Beyond workspace files, your assistant has a memory system that works more like human memory. It extracts facts from your conversations and stores them as searchable, categorized items:
These lifetimes aren't hard cutoffs. Memories that come up across multiple conversations age more slowly — if you mention Project Moonshot in three separate conversations, it stays fresh longer than something mentioned once. Memories that go stale get demoted in search results before eventually dropping out.
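Reinforcement-driven aging like this can be sketched as exponential decay whose half-life stretches with each reinforcement. The half-life value and the decay curve below are illustrative assumptions, not the actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    age_days: float
    reinforcements: int = 1  # conversations in which the fact has come up

def freshness(mem: Memory, half_life_days: float = 30.0) -> float:
    # Assumed model: each reinforcement stretches the half-life, so a fact
    # mentioned across several conversations ages more slowly than a one-off.
    effective_half_life = half_life_days * mem.reinforcements
    return 0.5 ** (mem.age_days / effective_half_life)

once = Memory("mentioned in one conversation", age_days=60)
moonshot = Memory("Project Moonshot", age_days=60, reinforcements=3)
print(freshness(once))      # 0.25
print(freshness(moonshot))  # ~0.63 — same age, but fresher
```

A scheduler could then demote anything whose freshness falls below a threshold, and drop it entirely at a lower one.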
Your assistant doesn't save everything. After each message, it runs an extraction step that identifies facts worth keeping — with confidence scores, importance ratings, and fingerprints to prevent duplicates.
It extracts when:
Low-value messages (“ok,” “thanks,” “got it”) are filtered out before extraction even runs. It's designed to err on the side of remembering too little rather than too much.
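A minimal version of that pre-filter might look like the following; the phrase list and the word-count threshold are assumptions for illustration:

```python
LOW_VALUE = {"ok", "okay", "thanks", "got it", "thx"}

def worth_extracting(message: str) -> bool:
    # Normalize casing and trailing punctuation, then drop bare
    # acknowledgements and messages too short to contain a fact.
    text = message.lower().strip().rstrip(".!?")
    return text not in LOW_VALUE and len(text.split()) >= 3

print(worth_extracting("Thanks!"))                                   # False
print(worth_extracting("My dentist appointment is on March 15th."))  # True
```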
If you want it to remember something specific, just say so:
“Remember that my dentist appointment is on March 15th.”
“Save this: the project deadline is end of Q2.”
When you explicitly ask it to remember something, the memory is saved with high confidence — it's less likely to be superseded or to go stale.
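One way to sketch the explicit-versus-inferred distinction; the cue phrases and confidence values here are illustrative assumptions:

```python
EXPLICIT_CUES = ("remember that", "remember:", "save this", "don't forget")

def extraction_confidence(message: str) -> float:
    # Assumed scoring: explicit requests save near the ceiling, making the
    # memory resistant to superseding; inferred facts start lower.
    lowered = message.lower()
    if any(cue in lowered for cue in EXPLICIT_CUES):
        return 0.95
    return 0.6

print(extraction_confidence("Remember that my dentist appointment is on March 15th."))  # 0.95
print(extraction_confidence("I usually drink coffee in the mornings."))                 # 0.6
```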
When the assistant extracts a new fact that contradicts an older one — say, you told it you preferred coffee last month but mentioned you've switched to tea — the new memory can supersede the old one. If the correction is explicit (“Actually, I prefer tea now”), the old memory is replaced immediately. If it's inferred, both coexist until the old one ages out.
Duplicate memories are caught by fingerprinting. If the same fact is extracted again, it reinforces the existing memory rather than creating a copy.
Every time you send a message, your assistant assembles context from multiple sources:
All of this gets sent to the AI model together. That's how your assistant responds with awareness of who you are, what you've discussed before, and what's relevant right now.
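Conceptually, the assembly step concatenates those sources into a single prompt. The ordering and section labels below are assumptions:

```python
def assemble_context(workspace: list[str], memories: list[str],
                     history: list[str], user_message: str) -> str:
    # Assumed ordering: stable identity first, retrieved memories next,
    # recent conversation, then the new message last.
    sections = [
        ("Workspace", workspace),
        ("Relevant memories", memories),
        ("Conversation", history),
    ]
    parts = [f"## {name}\n" + "\n".join(lines) for name, lines in sections if lines]
    parts.append(user_message)
    return "\n\n".join(parts)

prompt = assemble_context(
    workspace=["Name: Alex", "Prefers concise answers"],
    memories=["Dentist appointment on March 15th"],
    history=["user: any appointments this month?"],
    user_message="What about next month?",
)
print(prompt.splitlines()[0])  # ## Workspace
```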
When you send a message, the assistant doesn't just do a keyword search. It runs a hybrid retrieval pipeline:
The budget for memory injection is dynamic — it expands or contracts based on how much room is left in the context window after workspace files, conversation history, and skill instructions.
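A sketch of the two pieces together — blending keyword and vector scores, then filling whatever token budget remains. The 50/50 blend weight and the whitespace token counter are stand-ins for illustration:

```python
def hybrid_rank(candidates, alpha=0.5):
    # candidates: (text, keyword_score, vector_score), both in [0, 1].
    # alpha blends the two retrieval signals; the even split is an assumption.
    return sorted(candidates,
                  key=lambda c: alpha * c[1] + (1 - alpha) * c[2],
                  reverse=True)

def fill_budget(ranked, budget_tokens, count=lambda t: len(t.split())):
    # Greedy fill: take the best-ranked memories until the dynamic budget
    # (whatever context-window room remains) is spent.
    chosen, used = [], 0
    for text, _, _ in ranked:
        cost = count(text)
        if used + cost > budget_tokens:
            continue
        chosen.append(text)
        used += cost
    return chosen

ranked = hybrid_rank([
    ("dentist appointment March 15th", 0.9, 0.4),
    ("prefers tea over coffee", 0.1, 0.8),
    ("Project Moonshot deadline end of Q2", 0.5, 0.9),
])
print(fill_budget(ranked, budget_tokens=10))
```

Shrinking `budget_tokens` simply cuts the list off sooner, which is how a crowded context window translates into fewer injected memories.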
Every AI model has a context window — a limit on how much text it can process at once. Your assistant manages this automatically:
You won't notice this happening. The assistant keeps the conversation going smoothly — it just works with a summarized version of the earlier context rather than the full transcript.
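The compaction step can be sketched as replacing older turns with a summary while keeping recent turns verbatim; the keep count and placeholder summary are assumptions, since the real system would have the model write the summary:

```python
def compact(messages: list[str], keep_recent: int = 4) -> list[str]:
    # Keep the latest turns verbatim; fold everything older into a summary.
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = f"[summary of {len(older)} earlier messages]"  # model-written in practice
    return [summary] + recent

history = [f"turn {i}" for i in range(10)]
print(compact(history))
# ['[summary of 6 earlier messages]', 'turn 6', 'turn 7', 'turn 8', 'turn 9']
```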
You can start a private conversation that gets its own isolated memory scope. Memories from a private conversation:
This is useful when you're discussing something sensitive. The assistant learns from the conversation, but those memories stay contained.
Not everyone who talks to your assistant can shape its memories. Memory extraction only runs on messages from trusted actors — that's you (the guardian). Messages from trusted contacts or unknown parties are indexed for search within that conversation, but they can't create or modify your long-term memories.
This prevents external parties from injecting false facts into your assistant's memory.
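The trust gate reduces to a single check before extraction runs; the role names here are assumptions:

```python
def handle_message(msg: str, actor: str, search_index: list, memories: list):
    # Every message is indexed for search within its conversation...
    search_index.append(msg)
    # ...but only the guardian's messages can reach long-term memory.
    if actor == "guardian":
        memories.append(msg)

index, long_term = [], []
handle_message("The deadline moved to Q3.", actor="unknown",
               search_index=index, memories=long_term)
handle_message("Remember the deadline is Q2.", actor="guardian",
               search_index=index, memories=long_term)
print(len(index), len(long_term))  # 2 1
```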
You have full control:
Memories are stored locally on your machine — in a SQLite database and a vector store inside ~/.vellum/workspace/data/. They don't get synced to the cloud, shared with other users, or used to train AI models.
However, memories are included in the context sent to the AI model when they're relevant to a conversation. This is how your assistant “thinks” with your context. Local storage, cloud thinking — the same trade-off as everything else in the system.
If you tell your assistant something sensitive, it may extract it as a memory and include it in future AI model calls when relevant. You can ask it to forget specific things, edit your workspace files directly, or use private conversations to keep sensitive context isolated.