Online Evaluations, Prompt Node timeouts, AutoLayout for Workflows, 25 new models, and more
Author
Noa Flaherty
Nov 1, 2024
Product Updates
November is a month for crisp fall weather, giving thanks, and another round of Vellum product updates! In October, we shipped a ton of new models, improvements to Evals, Prompts, Workflows, and more.
Hold the gravy, let’s dive in and see what’s new 🎃
Online Evaluations for Workflow and Prompt Deployments
Previously, you could only run “Offline Evaluations” or “Inline Evaluations.” You run Offline Evaluations manually when you want to check Prompt/Workflow performance, e.g., when you’re getting ready to cut a new Production Release. Inline Evaluations are useful when you want to check quality during a Workflow’s execution and conditionally act on the result within the Workflow (retry a Prompt, throw an error, fire a Slack alert, escalate to a human, etc.).
But what if you want to monitor how your product performs live in production? Now you can!
Online Evaluations help you see your product’s performance in real time. They run on every production execution of your app, helping you catch and resolve edge cases faster and prevent regressions more thoroughly. The best part: you can use Vellum’s premade Metrics, or Custom Metrics you’ve already configured!
Configure Metrics in the new “Metrics” Tab
Every execution of your Prompt or Workflow now gets evaluated with your Metric
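To make the idea concrete, a Metric is ultimately just a scoring function over an execution’s output. Here’s a minimal sketch of an exact-match Metric in Python; the function name and signature are illustrative only, not Vellum’s exact custom-Metric interface:

```python
# Hypothetical custom Metric: exact-match scoring between the model's
# output and a target value. Illustrative signature only -- consult the
# Vellum docs for the actual custom Metric interface.
def exact_match_metric(output: str, target: str) -> float:
    """Return 1.0 when the output matches the target exactly, else 0.0."""
    return 1.0 if output.strip() == target.strip() else 0.0
```

With Online Evaluations enabled, a score like this is computed for every production execution instead of only during manual eval runs.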
Timeouts for Prompt Nodes in Workflows
Previously, if you wanted to avoid having a single Prompt Node slow down your Workflow, you’d need to set up a few extra nodes and cumbersome logic to time out early.
Now, you can set a maximum timeout on a Prompt Node directly within a Workflow, preventing bottlenecks and keeping execution time predictable.
Configuring Timeouts in Prompt Nodes
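For a sense of what this replaces, here’s roughly the kind of scaffolding you previously needed, sketched in plain Python with a hypothetical call_prompt helper; the new setting collapses all of this into a single field on the node:

```python
import concurrent.futures

def call_prompt(prompt: str) -> str:
    """Stand-in for the actual prompt execution (hypothetical helper)."""
    ...

def call_prompt_with_timeout(prompt: str, timeout_s: float = 30.0) -> str:
    # Run the prompt call in a worker thread and give up after timeout_s
    # seconds, so one slow call can't stall the whole workflow. Note the
    # abandoned thread still runs to completion in the background.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(call_prompt, prompt)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"Prompt call exceeded {timeout_s}s")
    finally:
        pool.shutdown(wait=False)
```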
AutoLayout and AutoConnect for Workflows
As you experiment and your workflows become more complex, keeping them organized will make them easier to iterate on. Now, you can automatically organize and connect nodes in Workflow Sandboxes with just a click.
How to use AutoLayout in Workflows
Datadog and Webhook Logging Beta Integrations
If you want deeper insight into key events happening in Vellum, in the context of the rest of your systems, our Datadog and Webhook Logging integrations (now in beta) give you exactly that. For example, you can set up a Datadog alert that fires after multiple consecutive failures of a Workflow Deployment.
If you’d like to participate in the beta and want help setting up your integration, please contact us!
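As a rough illustration, a Webhook Logging consumer is just an HTTP endpoint you host. The sketch below uses FastAPI; the event type and payload shape here are hypothetical, and the actual schema will be shared during the beta:

```python
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/vellum-events")
async def handle_vellum_event(request: Request):
    # Hypothetical payload shape -- check the beta docs for the real schema.
    event = await request.json()
    if event.get("type") == "workflow.execution.failed":
        # Forward failures into your own alerting/metrics pipeline.
        notify_on_call(event)
    return {"ok": True}

def notify_on_call(event: dict) -> None:
    """Hypothetical hook into your paging or logging system."""
    print("Workflow execution failed:", event)
```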
New Models and Providers!
Model optionality gives builders more flexibility to optimize for accuracy, latency, and cost as their use cases demand. Here’s a quick overview of the 25 (!!) new models we added in October:
All Perplexity models — including Online models for searching the web!
Cerebras — featuring 2,100 tokens/sec. That’s 3x faster than the current state of the art, or nearly 3 books per minute!
13 new OpenRouter models
The newest Claude 3.5 Sonnet
Gemini 1.5 Flash 8B
Other noteworthy mentions:
Vertex AI embedding models: text-embedding-004 and text-multilingual-embedding-002
OpenAI Prompt Caching for GPT-4o and o1 models
Click here to see more details about the new models we’re supporting.
Evaluations
Reorder Test Suite Variables
You can now reorder Input and Evaluation Variables within a Test Suite’s settings page, helping you stay organized & make changes faster by putting related values next to one another.
Test Suite Input & Output Variables Configuration
Reorder Entities in Evaluation Reports
When your Evaluation Reports use many Metrics, you often want related Metrics grouped next to one another. You can now reorder entities in the Evaluation Report table, making it easier to triage your Metric scores and iterate on your Prompts & Workflows accordingly.
Evaluation Report - Metric Types Ordering
Filter and Sort on Metric Scores
You can now filter and sort on a Metric’s score within Evaluation Reports. This makes it easier to find all Test Cases that fall below a given Metric threshold, so you can iterate and improve your product’s robustness faster.
Evaluation Report - Test Case Sorting by Metric Scores
Prompts, Models, and Embeddings
Prompt Caching Support for OpenAI
OpenAI now automatically performs prompt caching to help optimize cost & latency of prompts. In Vellum, we capture the new Cache Tokens when using supported OpenAI models, to help you analyze cache hit rates and optimize LLM spend.
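If you’re curious what this looks like at the API level, supported models report cached prompt tokens in the response’s usage details. A minimal sketch with the OpenAI Python SDK (field names per OpenAI’s current API; the long system prompt is a stand-in, and caching only kicks in for prompts above a minimum length, 1,024 tokens per OpenAI’s docs at the time of writing):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Long shared prefixes (system prompts, few-shot examples) benefit most.
long_shared_system_prompt = "..."  # stand-in for your reusable instructions

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": long_shared_system_prompt},
        {"role": "user", "content": "Summarize the attached report."},
    ],
)

# On supported models, cached prompt tokens show up in the usage details.
details = response.usage.prompt_tokens_details
print("cached prompt tokens:", details.cached_tokens)
```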
Vertex AI Embedding Model Support
We now support Vertex AI Embedding Models: text-embedding-004 and text-multilingual-embedding-002, giving you more options to optimize your RAG pipelines.
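Outside of Vellum, these same models are callable through Google’s Vertex AI SDK. A minimal sketch, assuming a GCP project with Vertex AI enabled and application-default credentials configured:

```python
import vertexai
from vertexai.language_models import TextEmbeddingModel

# Assumes a GCP project with Vertex AI enabled and default credentials.
vertexai.init(project="your-gcp-project", location="us-central1")

model = TextEmbeddingModel.from_pretrained("text-embedding-004")
embeddings = model.get_embeddings(["What is retrieval-augmented generation?"])
print(len(embeddings[0].values))  # embedding dimensionality (768 for this model)
```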
Retrieve Folder Entities via API
You can now programmatically retrieve all entities in a folder via API. The response lists these entities along with high-level metadata about each.
This new API is available in our SDKs beginning with version 0.8.25. For additional details, check out our API Reference here.
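Here’s a minimal sketch of what a call might look like over plain HTTP; the path, header, and parameter names below are our reading of the API Reference, so verify there before relying on them:

```python
# Sketch of listing folder entities via the REST API. Endpoint path,
# auth header, and query parameter are best guesses from the API
# Reference -- confirm them there before use.
import requests

resp = requests.get(
    "https://api.vellum.ai/v1/folder-entities",
    headers={"X_API_KEY": "YOUR_VELLUM_API_KEY"},
    params={"parent_folder_id": "<your-folder-id>"},
)
resp.raise_for_status()
for entity in resp.json()["results"]:
    print(entity["type"], entity.get("label"))
```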
Quality of Life Improvements
Workflow Edge Type Improvements
Edges between Nodes in Workflows could appear jagged or misaligned, making it difficult to visualize connections. With this new improvement, edges now snap into straight-line connectors when they are close to horizontal.
See you in December!
That’s all for now folks. We hope you have a wonderful November, filled with lots of food & fall activities. See ya in December!
PSA: sign up for our newsletter to get these updates right in your inbox!
ABOUT THE AUTHOR
Noa Flaherty
Co-founder & CTO
Noa Flaherty, CTO and co-founder at Vellum (YC W23), helps developers build, deploy, and evaluate LLM-powered apps. His diverse background in mechanical and software engineering, as well as marketing and business operations, gives him the technical know-how and business acumen needed to bring value to nearly any aspect of startup life. Prior to founding Vellum, Noa completed his undergrad at MIT and worked at three tech startups, including roles in MLOps at DataRobot and Product Engineering at Dover.