
Introducing New Execute Prompt APIs

Introducing a new way to invoke your Vellum-stored prompts!


As of today, we are launching two new production-grade APIs for invoking your prompts in Vellum: Execute Prompt and Execute Prompt Stream!

These APIs introduce a new interface that aligns with our "Execute Workflow" endpoint and pack a number of new features around release pinning, future-proofing, and more. Let’s dig in.

Improved, Consistent API

The "Execute Prompt" and "Execute Prompt Stream" endpoints are meant to replace our widely used "Generate" and "Generate Stream" endpoints, and we recommend migrating to them at your earliest convenience. We haven't yet decided when we'll deprecate the legacy endpoints; we'll continue to maintain them, but we won't be adding new functionality to them.

If you use our (now legacy) endpoints, you may be familiar with this code snippet:
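A minimal sketch of that legacy shape is below. The field names (`requests`, `input_values`, `results`, `completions`) follow the description in this post, but treat the exact names as assumptions and verify them against Vellum's API reference:

```python
# Illustrative sketch of the legacy "Generate" request/response shape.
# Field names are assumptions based on this post's description.

def build_generate_request(deployment_name: str, query: str) -> dict:
    # The legacy endpoint was batch-first: even a single invocation
    # had to be wrapped in a `requests` list.
    return {
        "deployment_name": deployment_name,
        "requests": [
            {"input_values": {"query": query}},  # STRING inputs only
        ],
    }

def read_completion(response: dict) -> str:
    # Six data accesses just to reach the model's text output.
    return response["results"][0]["data"]["completions"][0]["text"]
```

Note how the batch-shaped response forces you to index into `results` and `completions` even when you only ever sent one request.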

There are several aspects of this interface we wanted to improve:

  • What does generate do? It's not obvious that this API is the primary one for executing prompts.
  • Less than 0.01% of invocations involved sending multiple requests, which complicated the interface for the other 99.99% of invocations where only a single request was sent.
  • The `input_values` field only accepts STRING inputs, which often left users unsure whether to pass `chat_history` as its own argument or string-ified inside `input_values`.
  • Check out that response - six data accesses just to see what the model responded with!

Here’s what the same action looks like using the new "Execute Prompt" API:
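As a hedged sketch of the new shape, with field names (`prompt_deployment_name`, `inputs`, `release_tag`, `outputs`) taken from this post; double-check the exact names against the API reference:

```python
# Sketch of the new "Execute Prompt" request/response shape described
# in this post; exact field names are assumptions.

def build_execute_prompt_request(
    deployment_name: str, inputs: list, release_tag: str = "LATEST"
) -> dict:
    # Non-batch by default: one request, with a flexible `inputs` list
    # that can carry any Vellum Variable Type, not just STRING.
    return {
        "prompt_deployment_name": deployment_name,
        "release_tag": release_tag,
        "inputs": inputs,
    }

request = build_execute_prompt_request(
    "my-prompt",
    inputs=[
        {"type": "STRING", "name": "query", "value": "What's new in Vellum?"},
        {"type": "CHAT_HISTORY", "name": "chat_history", "value": []},
    ],
)

def read_output(response: dict) -> str:
    # One short access chain instead of six.
    return response["outputs"][0]["value"]
```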


This interface brings the following improvements:

  • A consistent interface with current and future "Execute" APIs for Vellum resources.
  • Non-batch by default.
  • Flexible inputs interface capable of handling all Vellum Variable Types.
  • Simplified response schema for core use cases.

Release Pinning

Every deployed prompt in Vellum comes with an auto-generated release tag that can be used to reference a specific release of the deployment. In the future, users will be able to add their own custom release tags to identify specific releases of a deployment.

The new "Execute Prompt" APIs support a release_tag argument for accessing an exact release of your prompt. This is useful for keeping your production release pinned while letting your staging release float with "LATEST". For example:
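A minimal sketch of that pattern (the pinned tag value here is hypothetical; copy the real one from your deployment's "Releases" tab):

```python
# Hypothetical release tags: production pinned, staging floating.
PROD_RELEASE_TAG = "f9d5a1c"  # made-up tag; copy yours from the Releases tab

def release_tag_for(environment: str) -> str:
    # Pin production to an exact release; let staging track LATEST.
    return PROD_RELEASE_TAG if environment == "production" else "LATEST"

prod_request = {
    "prompt_deployment_name": "my-prompt",
    "release_tag": release_tag_for("production"),
    "inputs": [{"type": "STRING", "name": "query", "value": "..."}],
}
```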

You can find the release tag associated with each prompt deployment release on the "Releases" tab of your prompt deployment:

Future Feature Support By Default

Stop me if you’ve felt this before: OpenAI releases a new feature in their API, but no matter how fast Vellum tries to ship support for it, you’re stuck waiting some time for native support. We all know the feeling, and while we’re constantly making internal improvements to shorten our time to delivery, these "Execute Prompt" APIs now include new parameters that let you use any future feature by default.

At its core, Vellum translates the inputs you provide into the API request that a given model expects. If there’s a parameter you’d like to override at runtime, you can now use the raw_overrides parameter. Similarly, if there’s data from the model provider’s raw API response that you’d like back, you can opt into it via the expand_raw parameter. These raw response fields are returned in the response’s raw field.

Let’s see an example using OpenAI seeds and fingerprinting, which are not yet supported in Vellum but are coming soon:
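The sketch below shows what such a call could look like. `raw_overrides` and `expand_raw` come from this post; `seed` and `system_fingerprint` are OpenAI's parameter names. The payload shape is illustrative, not an exact Vellum request body:

```python
# Sketch: forwarding OpenAI's `seed` parameter before native Vellum
# support exists, and opting into the raw `system_fingerprint` field.
request = {
    "prompt_deployment_name": "my-prompt",
    "inputs": [{"type": "STRING", "name": "query", "value": "Tell me a joke"}],
    # Override any parameter in the underlying provider request at runtime.
    "raw_overrides": {"seed": 42},
    # Opt into raw fields from the provider's response.
    "expand_raw": ["system_fingerprint"],
}

def read_fingerprint(response: dict):
    # Expanded raw response fields come back under the `raw` key.
    return response.get("raw", {}).get("system_fingerprint")
```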

These features are meant for power users who are eager to adopt the latest features from model providers. They give users raw, early access without needing to eject from Vellum’s API altogether and lose the benefits of Vellum prompt deployments, like versioning, monitoring, and use within Workflows.

Storing Metadata With Each Execution

For some time now, Vellum has allowed you to pass in an external_id with your prompt executions so that you can easily map executions stored in Vellum to your own internal systems. The "Execute Prompt" APIs now also support a new metadata field that lets you pass in arbitrary JSON data that’ll be stored with the prompt execution. In the future, we’ll support filtering by these metadata fields in our UIs:
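As a sketch (field names per this post; the identifier and metadata contents are arbitrary examples):

```python
# Sketch: attaching an external_id plus arbitrary JSON metadata to an
# execution so it can be mapped back to your own systems later.
request = {
    "prompt_deployment_name": "my-prompt",
    "inputs": [{"type": "STRING", "name": "query", "value": "..."}],
    "external_id": "ticket-8675309",   # your own system's identifier
    "metadata": {                      # any JSON-serializable data
        "user_id": "u_123",
        "experiment": "support-bot-v2",
    },
}
```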

Multi Function Calling Support

Our old prompt execution endpoints returned function calls with `type="JSON"`. This brought with it a few downsides:

  • Users needed to parse the JSON-ified string twice in order to access the function calling arguments.
  • Users were responsible for adapting to changes made in the underlying model provider's API (OpenAI now uses `tool_calls`!).
  • The return type would become overloaded once the soon-to-be-supported JSON mode lands.

To solve this, each function call is now output to the user as a `FUNCTION_CALL` type:
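A sketch of consuming the new output type (the response shape is assumed from this post's description, with arguments arriving already parsed rather than as a JSON-ified string):

```python
# Sketch: reading FUNCTION_CALL outputs from an "Execute Prompt"
# response. Each function call is its own item in the top-level
# `outputs` array — no double JSON parsing required.

def extract_function_calls(response: dict) -> list:
    return [
        out["value"]
        for out in response["outputs"]
        if out["type"] == "FUNCTION_CALL"
    ]

example_response = {
    "outputs": [
        {"type": "FUNCTION_CALL",
         "value": {"name": "get_weather", "arguments": {"city": "SF"}}},
        {"type": "FUNCTION_CALL",
         "value": {"name": "get_time", "arguments": {"tz": "PST"}}},
    ]
}
```

Because each call is a separate output item, multi function calling falls out naturally: iterate the list instead of unpacking a single overloaded JSON blob.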

Along with solving each of the downsides above, these new endpoints have first-class support for multi function calling, with each function call output as a separate item within the top-level `outputs` array.

Give it a shot!

If you’re a Vellum customer, we’re excited to hear what you think! Interested in trying out Vellum? Book a demo with us today.

ABOUT THE AUTHOR
David Vargas
Full Stack Founding Engineer

A Full-Stack Founding Engineer at Vellum, David Vargas is an MIT graduate (2017) with experience at a Series C startup and as an independent open-source engineer. He built tools for thought through his company, SamePage, and now focuses on shaping the next era of AI-driven tools for thought at Vellum.

Last updated: Jan 4, 2024