Pricing

Our pricing philosophy

We believe your AI costs should be transparent and predictable. Vellum passes model provider costs through at cost: we charge no margin or markup on token usage. When you spend $1 worth of credits on LLM tokens, that full dollar goes to the model provider. Our goal is to keep Vellum affordable and aligned with your actual usage, not to profit from the AI calls your assistant makes.

How pricing works

Vellum uses a prepaid credit balance: usage is deducted from your credits as you use the assistant. You can add credits anytime from the Billing page via Stripe Checkout. Applicable taxes may be added during checkout.

In the app, your Billing screen shows a Credit Balance plus a breakdown of settled and pending amounts:

  • Credit Balance: the amount currently available once Pending Usage is taken into account.
  • Settled Balance: charges that have already settled.
  • Pending Usage: estimated in-flight compute that may not be fully settled yet.

Vellum Credits

The purchase and use of Vellum Credits is governed by Section 6 of our Terms of Service.

Vellum makes available certain features and functionalities within the Services, as designated by Vellum from time to time, that are accessible exclusively through the use of prepaid credits (“Vellum Credits”). These designated features and functionalities are referred to as “Credit-Eligible Features.” Credit-Eligible Features currently include inference, web search, image creation, and paid third-party APIs accessed through Vellum's managed OAuth (for example, Twitter). No alternative direct-payment method is available for Credit-Eligible Features.

What happens when credits run out

When credits are exhausted, the app will show a “You've run out of credits” message with an Add Credits action that links you to Billing. Assistant actions that require paid usage will pause until credits are added. Enabling Auto-Reload is the easiest way to avoid hitting this state.

Purchasing Credits

You may fund your Vellum Credit balance (“Vellum Balance”) by purchasing Vellum Credits in ten-dollar ($10.00 USD) increments, up to one hundred dollars ($100.00 USD) per top-up, through the payment methods made available in the Services, or at such other amounts as determined by Vellum from time to time.

How to add credits

You can add credits from the app's Billing settings:

  1. Open Settings and go to the Billing tab.
  2. Select Add Credits. The amount picker offers $10 to $100 in $10 increments.
  3. Complete checkout in your browser via Stripe.
  4. Return to the app; your Credit Balance updates automatically.

One Vellum Credit equals one US dollar, so a $10 purchase at checkout adds 10 credits to your balance.
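
The manual top-up rules above can be summarized in a few lines of Python. This is a minimal sketch, not Vellum's implementation: the function name `credits_for_top_up` and the constants are hypothetical, and the sketch ignores taxes and any other purchase amounts Vellum may make available.

```python
# Hypothetical constants taken from the amount picker described above.
MIN_TOP_UP_USD = 10
MAX_TOP_UP_USD = 100
INCREMENT_USD = 10


def credits_for_top_up(amount_usd: int) -> int:
    """Return the credits a manual top-up adds, or raise ValueError.

    Assumes the documented picker: $10 to $100 in $10 increments,
    with one credit equal to one US dollar (taxes excluded).
    """
    if not MIN_TOP_UP_USD <= amount_usd <= MAX_TOP_UP_USD:
        raise ValueError("top-up must be between $10 and $100")
    if amount_usd % INCREMENT_USD != 0:
        raise ValueError("top-up must be a multiple of $10")
    return amount_usd  # 1 credit == 1 USD


if __name__ == "__main__":
    print(credits_for_top_up(10))
```

Under these assumptions, an amount such as $15 or $110 is rejected, while any value the picker offers maps one-to-one to credits.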

Auto-Reload

If you'd rather not think about manual top-ups, Auto-Reload purchases more credits automatically whenever your balance drops below a threshold you set. Configure it from Settings → Billing.

You set three values:

  • Auto-Reload when balance below ($1 to $100). When your credit balance dips under this amount, an automatic top-up is triggered. Default is $100.
  • Add amount when auto reloading ($10 to $500). How much is charged each time the threshold trips. Default is $10.
  • Monthly spending cap (optional, $25 to $10,000). A safety net that pauses auto top-ups for the rest of the calendar month once total credit purchases reach this amount. Manual purchases count toward the cap too. Must be at least the top-up amount. Leave empty for no limit.

Auto-Reload requires a saved payment method, which you can add in the Payment Methods section right below the toggle. If you're close to the monthly cap when the threshold trips, Auto-Reload only adds the amount remaining before the cap.
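
The interaction between the threshold, the top-up amount, and the monthly cap can be sketched as follows. This is an illustrative Python model under stated assumptions, not Vellum's billing code: `AutoReloadSettings` and `auto_reload_amount` are hypothetical names, "below" is read as strictly less than the threshold, and the caller is assumed to supply the total credit purchases (manual and automatic) for the current calendar month.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AutoReloadSettings:
    """Hypothetical container for the three documented Auto-Reload values."""
    threshold: int                     # reload when balance is below this ($1 to $100)
    top_up_amount: int                 # amount charged per reload ($10 to $500)
    monthly_cap: Optional[int] = None  # optional cap ($25 to $10,000); None means no limit

    def __post_init__(self) -> None:
        if not 1 <= self.threshold <= 100:
            raise ValueError("threshold must be between $1 and $100")
        if not 10 <= self.top_up_amount <= 500:
            raise ValueError("top-up amount must be between $10 and $500")
        if self.monthly_cap is not None:
            if not 25 <= self.monthly_cap <= 10_000:
                raise ValueError("monthly cap must be between $25 and $10,000")
            if self.monthly_cap < self.top_up_amount:
                raise ValueError("monthly cap must be at least the top-up amount")


def auto_reload_amount(settings: AutoReloadSettings,
                       balance: float,
                       purchased_this_month: int) -> int:
    """Return how many dollars an automatic top-up would add (0 if none)."""
    if balance >= settings.threshold:
        return 0  # threshold has not tripped
    if settings.monthly_cap is None:
        return settings.top_up_amount
    # Only the amount remaining before the cap can be added.
    remaining = settings.monthly_cap - purchased_this_month
    return max(0, min(settings.top_up_amount, remaining))


if __name__ == "__main__":
    s = AutoReloadSettings(threshold=5, top_up_amount=20, monthly_cap=50)
    print(auto_reload_amount(s, balance=3.0, purchased_this_month=40))
```

In this sketch, a $20 top-up with a $50 cap and $40 already purchased this month adds only the remaining $10, and adds nothing once the cap is reached.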

You can disable Auto-Reload anytime; your saved card stays on file so you can re-enable it later without re-entering details.

How credits are spent

Four categories of work consume credits today: LLM inference (the biggest line item by far), web search, image generation, and paid third-party APIs you reach through Vellum's managed OAuth (for example, Twitter).

Inference is itself broken into a set of Actions you'll see attributed in your usage dashboard. Here's what each one does:

Conversation with your assistant

  • Main agent. Your assistant's response when you chat with them. For most people, this is the largest share of spend while actively using the app.
  • Inference. One-off model calls from skills or utilities that don't fit a more specific category.

Memory subsystem (mostly background)

  • Memory consolidation. Promoting short-term observations into long-term memory pages.
  • Memory extraction. Pulling concrete facts, preferences, and entities out of a conversation so they can be remembered.
  • Memory retrieval. Looking up relevant memories when you ask a question or start a task.
  • Recall. Targeted, deeper memory lookups across notes, knowledge base, and past conversations.

Conversation polish (background)

  • Conversation summarization. Summarizing finished or long conversations so your assistant can refer back to them efficiently.
  • Conversation title. Auto-generating a short title for each new conversation.
  • Conversation starters. Suggested prompts the app surfaces when idle.
  • Empty-state greeting. The hello your assistant shows when you open the app with no active conversation.
  • Context compactor. Shrinking a long conversation's context so it still fits in the model's window without dropping anything important.

Autonomy (background)

  • Heartbeat agent. Periodic check-ins where your assistant reflects, plans, and decides whether anything needs your attention.
  • Filing agent. Filing notes, decisions, and learnings into your personal knowledge base.
  • Notification decision. Deciding whether to push you a notification or stay quiet.

Other

  • Unknown Task. LLM calls that haven't been tagged with a specific subsystem yet. We're cleaning up the remaining attribution gaps.
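
If you want to total inference spend by the groups listed above, a lookup table is enough. The following Python sketch is illustrative only: the Action names and group names come from this page, but the record shape (an action name paired with a cost in credits) and the function `spend_by_group` are assumptions, not a documented export format or API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Action name -> group, as listed on this page.
ACTION_GROUPS: Dict[str, str] = {
    "Main agent": "Conversation with your assistant",
    "Inference": "Conversation with your assistant",
    "Memory consolidation": "Memory subsystem",
    "Memory extraction": "Memory subsystem",
    "Memory retrieval": "Memory subsystem",
    "Recall": "Memory subsystem",
    "Conversation summarization": "Conversation polish",
    "Conversation title": "Conversation polish",
    "Conversation starters": "Conversation polish",
    "Empty-state greeting": "Conversation polish",
    "Context compactor": "Conversation polish",
    "Heartbeat agent": "Autonomy",
    "Filing agent": "Autonomy",
    "Notification decision": "Autonomy",
    "Unknown Task": "Other",
}


def spend_by_group(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum credits per group; unrecognized Action names fall under "Other".

    `records` is a hypothetical list of (action_name, credits_spent) pairs.
    """
    totals: Dict[str, float] = defaultdict(float)
    for action, credits in records:
        totals[ACTION_GROUPS.get(action, "Other")] += credits
    return dict(totals)


if __name__ == "__main__":
    sample = [("Main agent", 0.5), ("Memory retrieval", 0.25), ("Recall", 0.25)]
    print(spend_by_group(sample))
```

A grouping like this makes it easier to see how much of your spend comes from background work as opposed to direct conversation.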

A lot of background work is configurable. You can ask your assistant to disable or reduce the frequency of “heartbeats” and “memory compaction,” or to use a less expensive model for these actions. We're actively working on making background spend more visible and easier to control.

Need help with billing? Contact support at support@vellum.ai.
