
When should I use function calling, structured outputs or JSON mode?

Learn how and when to use JSON mode, structured outputs, and function calling in your AI application.


If you’ve made it to this blog, you likely already know that LLMs predict one token at a time. Each prediction is based on the trillions of tokens the model saw during training, the context provided in the prompt, and all the completions generated so far. The output token is simply the most likely next token in that distribution.

This works great for free-form output like email generation or blog post writing, but the limitations show up quickly when we need reliable, structured outputs.

Here’s a common example of how LLMs fail when they’re not given any additional guardrails or instructions. Consider this prompt:

System:

You are a customer support agent working for Walmart. Your job is to look at incoming messages and determine whether they should be escalated to a human agent for review. Escalate messages where the customer is angry or asks to speak to a manager.

Create a JSON with the following schema:

{
  should_escalate: boolean;  
  reasoning: string;  // rationale for the chosen response
}

Please respond with JSON only, nothing before, nothing after! 🙏

User:
Where’s the closest Walmart to me?

The Assistant could respond with:

```json
{
  "should_escalate": false,
  "reasoning": "the customer is asking for the location of the nearest Walmart; they don't seem angry or to be asking for a manager"
}
```

This response is not valid JSON because of the three backticks before and after the JSON object. During training, the model likely saw plenty of JSON wrapped in markdown code fences, so the backticks are simply the most likely tokens in this context.
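
A strict JSON parser rejects this kind of output immediately. As a minimal illustration with Python’s standard json module (the string below mirrors the fenced response above):

```python
import json

raw = '```json\n{"should_escalate": false, "reasoning": "..."}\n```'

json.loads(raw)  # raises json.JSONDecodeError because of the leading backticks
```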

With invalid JSON and inconsistent schema adherence, developers can’t use these outputs reliably in the rest of their applications. Model providers have seen this happen over the last few quarters and have released a suite of improvements that let developers build more reliable AI systems.

In this blog we will discuss:

  1. How to choose between Function Calling, JSON Mode & Structured Outputs
  2. Which model providers have these options?
  3. When reliable outputs are needed for AI applications

Choosing between Function Calling, JSON Mode and Structured Outputs

JSON Mode was OpenAI’s first foray into reliable outputs. Toggling JSON mode on only guarantees that the output is valid JSON; it does not ensure adherence to any particular schema.
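
For reference, here’s roughly how you toggle JSON mode on with the OpenAI Python SDK (a minimal sketch; note that the API also requires the word "JSON" to appear somewhere in your messages when JSON mode is enabled):

```python
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Classify the message and respond in JSON."},
        {"role": "user", "content": "Where's the closest Walmart to me?"},
    ],
    # JSON mode: guarantees syntactically valid JSON, but not any particular schema
    response_format={"type": "json_object"},
)

print(completion.choices[0].message.content)
```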

Developers wanted more, and OpenAI and Gemini have since released Structured Outputs.

Enabling Structured Outputs allows you to specify a JSON schema through Zod, Pydantic, or Vellum’s UI. When Structured Outputs is enabled, the model will adhere to the specified schema in its response.

Vellum’s UI to define function calls

We don't recommend using JSON mode by itself; you should always use Structured Outputs instead.
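
Here’s a minimal sketch of Structured Outputs using the OpenAI Python SDK with a Pydantic model, reusing the escalation schema from the example above (the model name and prompts are illustrative):

```python
from openai import OpenAI
from pydantic import BaseModel


class EscalationDecision(BaseModel):
    should_escalate: bool
    reasoning: str


client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Decide whether this customer message should be escalated to a human agent."},
        {"role": "user", "content": "Where's the closest Walmart to me?"},
    ],
    # Structured Outputs: the response is constrained to the EscalationDecision schema
    response_format=EscalationDecision,
)

decision = completion.choices[0].message.parsed
print(decision.should_escalate, decision.reasoning)
```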

Function Calling vs. response_format

Now, when we need models to return reliable outputs, Structured Outputs is the way to go. But choosing between Function Calling and having the model respond directly (OpenAI calls this response_format) is an interesting topic of exploration.

First, what is Function Calling?

You can read about it in detail here, but to put it simply: all major model providers make it easier for developers to call external tools or functions from their applications. You specify the schema of a function you’d like the model to call, and the model generates the appropriate parameters for that call (but does not actually make the call).
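
To make that concrete, here’s a rough sketch of defining a single tool with the OpenAI Python SDK. The function name and parameters are illustrative, and setting strict to true applies Structured Outputs to the generated arguments:

```python
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",  # illustrative function in your own application
        "description": "Look up the status of a customer's order.",
        "strict": True,  # arguments are constrained to the schema below
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
            "additionalProperties": False,
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "What's the status of order 8123?"}],
    tools=tools,
)

# The model proposes the call; your application is responsible for executing it.
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)
```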

Use Function Calling with Structured Outputs when:

  1. You want to make requests to an external API
  2. You’ve given the model options of multiple tools/functions and you’d like the model to decide which tool to use (multi-agent systems)
  3. Your use case requires an ongoing interaction between the Assistant and User to collect the parameters needed to make a function call (for chatbot or copilot use cases)

Use response_format with Structured Outputs when:

  1. No interaction is needed between the Assistant and User, and usually this Assistant response is the last step in your pipeline.
  2. When there’s a specific task at hand (e.g., data extraction) and the model is not using its reasoning capabilities to pick a task

Which Model Providers Support these Options?

|  | OpenAI | Anthropic | Gemini | Mistral |
| --- | --- | --- | --- | --- |
| JSON mode | ✅ | ❌ | ✅ | ✅ |
| Function / tool calling | ✅ | ✅ | ✅ | ✅ |
| Structured outputs | ✅ | ❌ | ✅* | ❌ |

*Gemini only supports structured outputs through Function Calling and doesn’t offer a standalone structured output option for final responses, like OpenAI does with its response_format parameter.

Example Use Cases Where Reliable Outputs are Helpful

1. Data extraction

A common AI use case we see is extracting structured data from unstructured documents, such as pulling key fields from a contract. The business value is clear: if an AI system can do the extraction reliably, we save countless human hours of manual data entry.

Say the input is a Master Services Agreement (MSA) between two companies, and the desired outputs are the fields `start_date`, `end_date`, `jurisdiction`, and `force_majeure`. The goal is for the model to reliably extract these values from the MSA.

Solution: Using Structured Outputs with response_format ensures the model consistently responds in the desired JSON schema.
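
Here’s a sketch of what that could look like with the json_schema form of response_format. The field types are an assumption (dates as strings, force_majeure as the extracted clause text), and you could equally define the schema with Pydantic or Zod:

```python
from openai import OpenAI

client = OpenAI()

msa_text = "..."  # the full text of the Master Services Agreement

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "msa_fields",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "start_date": {"type": "string"},
                "end_date": {"type": "string"},
                "jurisdiction": {"type": "string"},
                "force_majeure": {"type": "string"},
            },
            "required": ["start_date", "end_date", "jurisdiction", "force_majeure"],
            "additionalProperties": False,
        },
    },
}

completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the requested fields from the contract."},
        {"role": "user", "content": msa_text},
    ],
    response_format=response_format,
)

print(completion.choices[0].message.content)  # valid JSON matching the schema
```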

2. Data analysis: Text to SQL

Getting LLMs to generate reliable SQL from natural language is tricky because the model doesn’t have full context about the database schema. The initial user message also often doesn’t contain all the information needed to write the query reliably, so additional messages from the user (or additional context) may be needed.

Solution: What we’ve seen work well is using Structured Outputs with Function Calling to make an API call and fetch the relevant context (such as the table schema) needed to answer the user’s question.
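
One way to wire that up is to expose the schema lookup as a tool (a sketch that assumes the model chooses to call the tool; get_table_schema is a hypothetical helper in your application, not a library function):

```python
import json

from openai import OpenAI

client = OpenAI()


def get_table_schema(table_name: str) -> str:
    # Hypothetical helper: look this up from your database's information_schema.
    return "orders(order_id INT, shipped_at DATE, promised_at DATE)"


tools = [{
    "type": "function",
    "function": {
        "name": "get_table_schema",
        "description": "Return the column names and types for a database table.",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {"table_name": {"type": "string"}},
            "required": ["table_name"],
            "additionalProperties": False,
        },
    },
}]

messages = [{"role": "user", "content": "How many orders shipped late last month?"}]

# First pass: the model requests the context it is missing.
first = client.chat.completions.create(model="gpt-4o-2024-08-06", messages=messages, tools=tools)
tool_call = first.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# Execute the tool ourselves, hand the result back, then ask for the final SQL.
messages.append(first.choices[0].message)
messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": get_table_schema(args["table_name"])})
final = client.chat.completions.create(model="gpt-4o-2024-08-06", messages=messages, tools=tools)
print(final.choices[0].message.content)
```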

3. Multi-agent systems

Composability is important when building AI systems. In an advanced system, each agent should perform only one specific task to ensure higher quality and consistency in the final output. There is usually an upstream node or agent that determines which downstream agent to call.

Solution: Use Structured Outputs with Function Calling to consistently provide the right input parameters while calling downstream agents.
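
Sketched with the same pattern, each downstream agent can be exposed to the routing step as a strict tool schema (the agent names and parameters below are illustrative):

```python
from openai import OpenAI

client = OpenAI()

downstream_agents = [
    {
        "type": "function",
        "function": {
            "name": "billing_agent",  # illustrative downstream agent
            "description": "Handles invoices, refunds and payment questions.",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {"customer_id": {"type": "string"}, "question": {"type": "string"}},
                "required": ["customer_id", "question"],
                "additionalProperties": False,
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "shipping_agent",  # illustrative downstream agent
            "description": "Handles order tracking and delivery issues.",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}, "question": {"type": "string"}},
                "required": ["order_id", "question"],
                "additionalProperties": False,
            },
        },
    },
]

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "My order 8123 still hasn't arrived, where is it?"}],
    tools=downstream_agents,
    tool_choice="required",  # the router must pick one of the downstream agents
)

call = response.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)  # schema-valid arguments for the chosen agent
```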

Need Help Getting Started?

As AI systems get more advanced, we’re here to provide the tooling and best practices to help you get the most out of them. Vellum is the AI development platform for product & engineering teams with deadlines.

Take AI products from early-stage idea to production-grade feature with tooling for experimentation, evaluation, deployment, monitoring, and collaboration.

Reach out to me at akash@vellum.ai or book a demo if you’d like to learn more.

ABOUT THE AUTHOR
Akash Sharma
Co-founder & CEO

Akash Sharma, CEO and co-founder of Vellum (YC W23), is enabling developers to easily start, develop and evaluate LLM-powered apps. By talking to over 1,500 people at varying maturities of using LLMs in production, he has acquired a unique understanding of the landscape and is actively sharing his learnings with the broader LLM community. Before starting Vellum, Akash completed his undergrad at the University of California, Berkeley, then spent 5 years at McKinsey's Silicon Valley office.

Last updated: Sep 6, 2024