
How It All Fits Together

Vellum is a personal assistant that lives on your machine. It can read your screen, control your apps, manage your email, make phone calls, browse the web, and remember what matters — all from a single conversation.

Here's how the pieces connect.


The Layers

Think of Vellum as five layers stacked on top of each other:

You (the human)

Channels

Desktop · Voice · Telegram · Slack · WhatsApp · Email · iOS · CLI · Web

Assistant Core

Personality · Memory · Plans

Skills

Gmail · Browser · Phone · Slack · Screen Watch · 25+ more

Tools

file_read · bash · browser_* · computer_use · memory_search · ui_show · host_* · ...

Workspace

SOUL.md · USER.md · IDENTITY.md · skills/

You talk to the assistant through channels — the desktop app, a Telegram bot, a phone call, or any of the other supported channels.

The assistant core decides what to do. It loads your workspace files to remember who it is, who you are, and how you like things done. It pulls relevant memories from past conversations. Then it sends all of that context, plus your message, to an AI model (Claude) to figure out the right response.

Skills are bundles of capability — Gmail, browser automation, phone calls, screen watching, and more, about 30 in total. Each skill teaches the assistant how to do something specific and gives it the tools to do it.

Tools are the individual actions: read a file, run a shell command, click a button on screen, navigate a browser, search memory. Skills combine tools into useful sequences.

The workspace is the assistant's persistent home — a local directory (~/.vellum/workspace/) containing its identity, your preferences, custom skills, and anything else it needs to remember across conversations.
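A rough sketch of what that directory contains — the files named above are real, but the exact layout beyond them is illustrative:

```
~/.vellum/workspace/
├── SOUL.md        # the assistant's identity and values
├── USER.md        # who you are and how you like things done
├── IDENTITY.md    # name and presentation
└── skills/        # custom skills you or the assistant create
```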


What happens when you send a message

  1. Your message enters through a channel — the desktop app, Telegram, voice, or any other connected channel.
  2. The assistant core loads context: workspace files (SOUL.md, USER.md, IDENTITY.md), relevant memories from past conversations, active skill instructions, and the current conversation history.
  3. Context plus your message goes to Claude (Anthropic's AI model) for reasoning.
  4. The response may include actions — “read this file,” “send this email,” “click that button,” “search the web.”
  5. Tools execute the actions. Some run in a sandbox (safe by default), others run on your machine and may ask for permission first.
  6. Skills orchestrate multi-step sequences — for example, the Gmail skill might search your inbox, draft a reply, and wait for your approval before sending.
  7. Results flow back through the same channel you started from.
  8. The assistant saves what it learns — facts, preferences, and context — to its memory system for future conversations.
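The steps above can be sketched as a single loop. This is an illustrative sketch only — class and method names here are hypothetical, not Vellum's actual internal API:

```python
# Hypothetical sketch of the message pipeline; names are illustrative.
class Pipeline:
    def __init__(self, workspace, memory, model, tools):
        self.workspace, self.memory = workspace, memory
        self.model, self.tools = model, tools

    def handle(self, message, history):
        # 2. Assemble context: identity files, relevant memories, history.
        context = {
            "identity": self.workspace,               # SOUL.md, USER.md, ...
            "memories": self.memory.search(message),  # hybrid retrieval
            "history": history,
        }
        # 3-4. The model reasons over the context and may request actions.
        response = self.model(context, message)
        # 5. Execute each requested action (sandboxed or permission-gated).
        results = [self.tools[a["tool"]](a["args"]) for a in response["actions"]]
        # 8. Save extracted facts for future conversations.
        self.memory.save(message)
        # 7. The reply goes back through the originating channel.
        return {"text": response["text"], "tool_results": results}
```

The important structural point is that the model never touches your machine directly: every action it requests is routed through the tool layer, which is where sandboxing and permission prompts live.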

Computer use

On macOS, the assistant can see and control your screen. This is one of its most distinctive capabilities.

Perceive: It reads your screen two ways — by enumerating the accessibility tree (the same API screen readers use) and by taking screenshots. Together, these give it a structured and visual understanding of what's on screen.

Verify: Before acting, every step passes through safety checks — sensitive data detection, destructive action blocking, loop detection, and step limits.

Execute: It injects mouse clicks and keyboard input via macOS system events. Text input uses the clipboard to handle special characters and input methods correctly.

Observe: After each action, it waits for the UI to settle, captures the new state, and decides what to do next.

This loop runs for up to 50 steps per session. You see an overlay showing what the assistant is doing, and you can stop it at any time.
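The perceive → verify → execute → observe cycle reduces to a simple bounded loop. A minimal sketch, assuming hypothetical function names (the real macOS integration is more involved):

```python
# Illustrative sketch of the computer-use loop; function names are invented.
MAX_STEPS = 50  # per-session step limit mentioned above

def computer_use_session(goal, perceive, plan, verify, execute):
    for step in range(MAX_STEPS):
        state = perceive()             # accessibility tree + screenshot
        action = plan(goal, state)     # model decides the next action
        if action is None:             # goal reached, nothing left to do
            return True
        if not verify(action, state):  # safety checks: sensitive data,
            return False               # destructive actions, loop detection
        execute(action)                # inject click / keystroke
        # observe: the next perceive() call captures the settled UI
    return False                       # step limit hit without finishing
```

Note that verification happens on every iteration, not once up front — each individual action is checked against the current screen state before it runs.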


Memory

The assistant doesn't just save notes — it runs a full retrieval system.

When you talk to it, it extracts facts and entities from the conversation and stores them as searchable memories. On the next turn (or the next conversation), it searches those memories using a hybrid approach — combining semantic similarity with keyword matching — to pull in the most relevant context.

Memories have lifetimes and can go stale. Private conversations get isolated memory scopes. Shared conversations contribute to a common pool. The system deduplicates and supersedes old facts automatically.
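Hybrid retrieval means blending two scores per memory. A toy sketch under stated assumptions — real systems use embedding similarity and ranking functions like BM25, not the naive word overlap used here:

```python
# Toy hybrid retrieval: blend a (stubbed) semantic score with keyword overlap.
def keyword_score(query, text):
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(len(q), 1)

def hybrid_search(query, memories, semantic_score, alpha=0.5):
    """Rank memories by a weighted mix of semantic and keyword scores."""
    scored = [
        (alpha * semantic_score(query, m) + (1 - alpha) * keyword_score(query, m), m)
        for m in memories
    ]
    return [m for score, m in sorted(scored, reverse=True)]
```

The weighting matters: semantic similarity catches paraphrases ("coffee" vs. "espresso"), while keyword matching catches exact names and identifiers that embeddings can blur.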

The result: your assistant gets better the more you use it, without you having to explicitly teach it anything.


Trusted contacts

You can grant other people limited access to your assistant through a guardian system.

A trusted contact can be verified across any channel — Telegram, WhatsApp, voice, or others. Once verified, they can send messages to your assistant and, depending on the permissions you've set, approve sensitive actions on your behalf.

This is useful for delegation: your assistant can handle a request from a colleague, check with you before doing anything risky, and report back through whatever channel they're on.
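The permission model boils down to a per-contact check. A minimal sketch, assuming a hypothetical data shape — not Vellum's actual guardian implementation:

```python
# Hypothetical trusted-contact registry; field names are illustrative.
TRUSTED = {
    "alice": {"channels": {"telegram", "whatsapp"}, "can_approve": True},
    "bob":   {"channels": {"slack"}, "can_approve": False},
}

def may_approve(contact, channel):
    """A contact may approve sensitive actions only if they are verified
    on this channel and you have granted them approval permission."""
    entry = TRUSTED.get(contact)
    return bool(entry and channel in entry["channels"] and entry["can_approve"])
```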


What runs where

| Component | Where it runs | What it does |
| --- | --- | --- |
| Desktop app | Your Mac | UI, screen perception, input injection |
| Assistant core | Your machine (local process) | Conversations, memory, tool coordination |
| AI model | Cloud (Anthropic) | Reasoning and response generation |
| Workspace | Local (~/.vellum/) | Persistent identity, preferences, skills |
| Tool execution | Local (sandbox or host) | File operations, shell commands, browser |
| Gateway | Your machine or Vellum cloud | Webhook ingress for external channels |
| Connected services | Cloud (Gmail, Slack, etc.) | External integrations via OAuth |

The trade-off: Your messages and conversation history are sent to the AI model provider for reasoning. Everything else — your files, tools, memory, and workspace — stays on your machine.

If you prefer not to run things locally, Vellum also offers a managed mode: sign in with your Vellum account and the assistant runs on Vellum's platform instead. Same capabilities, hosted for you.


Skills in depth

The assistant ships with about 30 skills across several categories:

| Category | Examples |
| --- | --- |
| Communication | Gmail, Slack, phone calls, messaging |
| Research | Browser automation, document/PDF processing |
| Productivity | Contacts, tasks, followups, notifications, playbooks |
| Monitoring | Screen watch (OCR-based), service watchers (Gmail, Calendar, GitHub, Linear) |
| Development | Code execution, app building, sub-agent orchestration |

Each skill is a self-contained package: a markdown file that teaches the assistant when and how to use it, a tool manifest defining what actions are available, and the implementation code behind those tools.
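As a concrete illustration, a skill's tool manifest might look something like this — the schema and every field name here are invented for illustration, not Vellum's actual format:

```json
{
  "name": "gmail",
  "description": "Read, search, and send email via Gmail",
  "tools": [
    {
      "name": "gmail_search",
      "description": "Search the inbox with a Gmail query string",
      "parameters": { "query": "string" }
    },
    {
      "name": "gmail_send",
      "description": "Send an email; waits for user approval before sending",
      "parameters": { "to": "string", "subject": "string", "body": "string" }
    }
  ]
}
```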

You can also create your own skills. Describe what you want, and the assistant will scaffold, test, and persist a new skill to your workspace.