Product Updates
July 27, 2023

Vellum Product Update | July 2023

Noa Flaherty

Following the exciting public announcement of our seed round, we’ve been hard at work doubling down on building out our platform to help companies create production use cases with LLMs.

If you’re already a Vellum customer, you may have seen some of these already, but here’s a quick recap of everything new within the past month!

Model Support

Llama 2 & MPT Instruct

Last month we added support for our first open source LLM (falcon-40b-instruct). This past month, we’ve continued adding native support for open source LLMs – notably, Meta’s Llama 2 series and MosaicML’s MPT Instruct series. We’re already seeing some exciting results from these models and encourage folks to check them out! You can compare them side-by-side against your own benchmarks and other models via Vellum’s Playground.

Claude 2

We also now provide native support for Anthropic’s Claude 2. If you need longer context windows and low latency, definitely give it a try!

Embedding Models for Search

Vellum Search now supports multilingual-e5-large – an awesome new open source embedding model that requires very little configuration to get great results.
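
If you’re curious what this model looks like under the hood, here’s a minimal sketch of running multilingual-e5-large directly with the open source sentence-transformers library (Vellum handles all of this for you behind the API). Note that the e5 family expects "query: " and "passage: " prefixes on its inputs:

```python
from sentence_transformers import SentenceTransformer

# Load the open source model from Hugging Face
model = SentenceTransformer("intfloat/multilingual-e5-large")

# e5 models expect "query: " and "passage: " prefixes on their inputs
query = model.encode(
    "query: how do I reset my password?", normalize_embeddings=True
)
passage = model.encode(
    "passage: To reset your password, open Settings > Security and click Reset.",
    normalize_embeddings=True,
)

# With normalized vectors, the dot product is cosine similarity
print(query @ passage)
```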

Vellum Search is a great way to quickly get started on use cases that require vector search and document retrieval: you don’t have to manage any infrastructure and can simply hit our APIs to upload documents and search across them.
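
As a rough sketch of what that flow can look like in Python – the endpoint URLs, header, and field names below are illustrative assumptions, so check our API docs for the exact contract:

```python
import requests

headers = {"X_API_KEY": "YOUR_VELLUM_API_KEY"}  # auth header assumed for illustration

# 1) Upload a document into an index (endpoint and field names are illustrative)
requests.post(
    "https://documents.vellum.ai/v1/upload-document",
    headers=headers,
    files={"contents": open("employee-handbook.pdf", "rb")},
    data={"index_names": '["support-docs"]', "label": "Employee Handbook"},
)

# 2) Search across the index with a natural-language query
resp = requests.post(
    "https://predict.vellum.ai/v1/search",
    headers=headers,
    json={"index_name": "support-docs", "query": "What is our PTO policy?"},
)
for result in resp.json()["results"]:
    print(result["text"])
```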

Fine-Tuned Models

Vellum has begun partnering with select customers to provide fine-tuned open source models that deliver higher accuracy, lower latency, and lower cost compared to off-the-shelf closed-source models. The early results look very promising!

If you’re interested in piloting this with us, contact us here.

Playground

Vellum’s Playground is the centralized place where technical and non-technical folks alike collaborate on prompts. Some people spend hours at a time in Playground, and so we continue to invest in making it a useful and powerful tool.

Function Calling

One of the biggest additions to Playground is native support for OpenAI’s new function calling. You can now experiment with the entire function-calling lifecycle in Vellum, including defining new functions easily through our UI.

This big update probably warrants its own post – to learn more, check out the demo video here.
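
For reference, function definitions use the JSON Schema format that OpenAI’s API expects. Here’s a minimal example of the lifecycle against OpenAI directly – the get_current_weather function is hypothetical, and in Vellum’s Playground you can build definitions like this through the UI instead of by hand:

```python
import json
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"

# A function definition in the JSON Schema format OpenAI expects
# (get_current_weather is a hypothetical function for illustration)
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather for a given city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "e.g. Boston, MA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    }
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    functions=functions,
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    # The model chose to call our function; its arguments arrive as a JSON string
    args = json.loads(message["function_call"]["arguments"])
    print(message["function_call"]["name"], args)
```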

Latency Tracking

You can now enable latency tracking to see how long it takes for an LLM’s response to start coming back, and how long it takes to finish. These metrics are averaged across all Scenarios so you can get a feel for how fast a given prompt/model combination will be.

Manual Evaluation & Note-Taking

Vellum has had automated output evaluation for a while now, but sometimes, you just want to manually indicate which outputs were good or bad, or leave notes on a given output so that you can keep track of your thoughts. Now you can.

Renaming Prompts & Scenarios

Renaming Prompts and Scenarios helps you keep track of the intent behind each. Now you can edit their names inline.

Previewing Compiled LLM API Payloads

Vellum acts as an abstraction layer between you and the many LLM providers and models out there. However, sometimes it’s helpful to see exactly what Vellum is sending to the LLM via its API. Now you can, via the Prompt fullscreen editor.

Copy/Pasting Chat Messages

When iterating on prompts for AI chat applications, you’ll likely have a number of different Scenarios to test out conversation flows. These flows often build on one another, and it can be useful to start from an existing conversation. You can now copy/paste chat messages from one Scenario to another to help with this process.

Streaming

In our previous Product Update, we announced support for streaming results of OpenAI models back to the Playground UI. Now, we support streaming for Anthropic and Cohere models as well.

Deployments

Streaming

In addition to adding streaming support for Anthropic and Cohere models to Playground, we also now have streaming support for these models in Vellum Deployments. You can learn more about our streaming API in our API docs here.
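
As a sketch of what consuming a streamed completion can look like – the endpoint, payload, and chunk format below are assumptions for illustration, and the API docs linked above have the real contract:

```python
import json
import requests

# Endpoint, payload, and chunk format are illustrative assumptions,
# not the documented contract – see Vellum's API docs
resp = requests.post(
    "https://predict.vellum.ai/v1/generate-stream",
    headers={"X_API_KEY": "YOUR_VELLUM_API_KEY"},
    json={
        "deployment_name": "my-support-bot",
        "requests": [{"input_values": {"question": "How do refunds work?"}}],
    },
    stream=True,
)
resp.raise_for_status()

# Assume each non-empty line of the response body is a JSON-encoded chunk
for line in resp.iter_lines():
    if line:
        print(json.loads(line))
```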

Filtering & Sorting Completions

Vellum provides observability and logging for all LLM requests made in production. Historically, though, it’s been hard to find the specific requests you need to debug. Now, you can filter and sort on most columns in the Completions table. As you apply filters and sorting, the browser’s URL is updated – you can copy this URL to refer back to later, or share it with others, to pick up where you left off.

Quality of Life

Sometimes it’s the little things. In addition to cranking out new features, we hope to make Vellum more of a joy to use in general. This will be a big focus next month, but we’ve already got a head start with:

  • Improved Test Suites Infrastructure – you can now run Test Suites containing hundreds or even thousands of test cases
  • Unified UI Components – more and more of the objects you see in Vellum have been standardized and made responsive to screen size
  • Correctly Formatted Copy/Paste – copy/pasting prompt outputs from Playground into other systems now maintains the original formatting

Sneak Peek

If you’ve made it this far, congrats! You get a sneak peek at something big that we’ve been hard at work on and will announce more formally soon… Vellum Workflows!

Vellum Workflows is our answer to the wild world of experimenting with, versioning, and monitoring chains of LLM calls. More on this soon, but if you want to join our closed beta program for Workflows, you can contact us here.

And that’s a wrap! Thanks for following along and to our customers – thank you as always for your amazing product feedback! Vellum wouldn’t be what it is today without you all pushing us.
