
Zero-Shot vs Few-Shot prompting: A Guide with Examples

Exploring zero-shot & few-shot prompting: usage, application methods, and limits.


There are various techniques for improving your model's answers, including zero-shot prompting and few-shot prompting.

This guide will cover the basics of these methods, when to use them, and their limitations.

What is Zero-Shot prompting?

Zero-shot prompting provides no examples and lets the model figure the task out on its own. It relies solely on the model's pre-training data and training techniques to generate a response. The response may not be perfect, but it will likely be coherent.

Here’s an example prompt that we ran with GPT-4.

Prompt:

Classify the text into neutral, negative or positive.

Text: I think the food is okay.

Sentiment:

Result
Neutral

Note that the prompt above didn’t give any instructions to the LLM about how to classify a sentiment. This goes to show that the model understands “sentiment” and can answer this question with zero-shot prompting.
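Prompts like this are usually assembled programmatically before being sent to a model API. Below is a minimal sketch that only builds the prompt string; the function name is ours for illustration, not part of any SDK, and the actual model call is omitted so the snippet stays self-contained.

```python
# A minimal sketch of a zero-shot prompt: the task description alone,
# with no labeled examples. In practice you would send this string to
# a chat-model API.

def zero_shot_prompt(text: str) -> str:
    """Build a zero-shot sentiment-classification prompt."""
    return (
        "Classify the text into neutral, negative or positive.\n\n"
        f"Text: {text}\n"
        "Sentiment:"
    )

print(zero_shot_prompt("I think the food is okay."))
```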

With a broad enough knowledge base and understanding of language, LLMs can generate coherent responses for many new tasks using zero-shot prompting.

If zero-shot prompting doesn’t work for your task, try few-shot prompting.

What is Few-Shot prompting?

Few-shot prompting is a method where you use a few examples in your prompt to guide language models (like GPT-4) to learn new tasks quickly. Rather than retraining an entire model from scratch, you use your context window to provide a few examples to improve the model’s performance.

With the latest models and bigger context window sizes, this technique is even more useful.

Here’s a few-shot prompt example.

Prompt
Classify the text into neutral, negative or positive.
Below are some correctly labeled responses.

Text: Yikes! That’s a tricky one
Sentiment: Neutral

Text: Amazing.. That’s just amazing. I can’t believe what he did to you :(
Sentiment: Negative

Text: Horrifying, but story-worthy experience to tell my grandsons about.
Sentiment: Neutral

Text: It could be better, but it’s still better than the rest of them.
Sentiment:

Result
Positive

This is a very simple example, but depending on your task, the inputs can become much harder for the model to classify correctly.
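The few-shot prompt above can be assembled from a list of labeled examples. A minimal sketch, again only building the prompt string (the helper name is illustrative, and the model call is omitted):

```python
# Sketch: assemble a few-shot prompt by prepending labeled
# (text, sentiment) pairs before the new input.

def few_shot_prompt(examples: list[tuple[str, str]], text: str) -> str:
    header = (
        "Classify the text into neutral, negative or positive.\n"
        "Below are some correctly labeled responses.\n\n"
    )
    shots = "\n\n".join(f"Text: {t}\nSentiment: {s}" for t, s in examples)
    return f"{header}{shots}\n\nText: {text}\nSentiment:"

examples = [
    ("Yikes! That's a tricky one", "Neutral"),
    ("Horrifying, but story-worthy experience to tell my grandsons about.", "Neutral"),
]
print(few_shot_prompt(
    examples,
    "It could be better, but it's still better than the rest of them.",
))
```

Because the examples are just data, you can swap in a different set per task without changing the template.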

In the next section, we look at two examples that are easy for humans, but more challenging for a language model to categorize.

Zero-Shot vs Few-Shot prompting (with examples)

Below we showcase two complex sentiment analysis examples that are often misclassified with zero-shot prompting. If similar examples are provided in a few-shot prompt, however, the model learns the pattern and correctly classifies new, similar inputs.

Phrase with negation

Prompt
Classify the text into neutral, negative or positive.
Text: I do not dislike horror movies.

Sentiment:

Result
Neutral

This one is tricky because the phrase contains a negation, which leads the model to assume a neutral sentiment when the sentiment is actually positive: the speaker likes horror movies.

Negative term used in a positive way

Prompt
Classify the text into neutral, negative or positive.
Text: The final episode was surprising with a terrible twist at the end

Sentiment:

Result
Negative

Again, the model is confused: it assumed the "terrible twist" was perceived as negative, when in fact the viewer found it entertaining and meant it positively.

By providing similar examples in a few-shot prompt, you’ll help the model understand these edge cases. This way, the model can respond with the correct sentiment the next time it sees a similar example.
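As a sketch, the two edge cases above can be folded into the prompt as correctly labeled shots. The helper name is illustrative, and only the prompt string is built here:

```python
# Fold the tricky cases (negation, negative word used positively) into
# a few-shot prompt as correctly labeled examples.

EDGE_CASES = [
    ("I do not dislike horror movies.", "Positive"),
    ("The final episode was surprising with a terrible twist at the end", "Positive"),
]

def edge_case_prompt(text: str) -> str:
    shots = "\n\n".join(f"Text: {t}\nSentiment: {s}" for t, s in EDGE_CASES)
    return (
        "Classify the text into neutral, negative or positive.\n\n"
        f"{shots}\n\nText: {text}\nSentiment:"
    )

print(edge_case_prompt("I don't hate the new season."))
```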

However, this prompting technique doesn’t come without its limits.

Limits of Few-Shot prompting

There are cases where few-shot prompting won’t be a good fit.

Here are some examples:

  • When you’re dealing with a complex reasoning task and want the model to think step by step. In this case, Chain of Thought prompting is recommended for better results.
  • When you want to classify data with high variability and nuance. You might need to fine-tune a model, because the context window might not fit all of the unique examples you’d like the model to consider.
  • When you don’t want to fine-tune, you can use RAG-based few-shot prompting: dynamically retrieve the pre-labeled examples most relevant to the question at hand from your proprietary data stored in a vector database.
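The RAG-based approach in the last bullet can be sketched as follows. A real implementation would embed the query and search a vector database; here a toy word-overlap score stands in for the embedding search so the example stays self-contained, and all names are illustrative.

```python
# Toy RAG-style few-shot prompting: retrieve the labeled examples most
# similar to the query, then build the prompt from them. Jaccard word
# overlap stands in for a real embedding-similarity search.

LABELED_POOL = [
    ("I do not dislike horror movies.", "Positive"),
    ("The food was cold and bland.", "Negative"),
    ("It was fine, nothing special.", "Neutral"),
]

def similarity(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def retrieve_examples(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k labeled examples most similar to the query."""
    ranked = sorted(LABELED_POOL, key=lambda ex: similarity(query, ex[0]), reverse=True)
    return ranked[:k]

def rag_few_shot_prompt(query: str) -> str:
    shots = "\n\n".join(f"Text: {t}\nSentiment: {s}" for t, s in retrieve_examples(query))
    return (
        "Classify the text into neutral, negative or positive.\n\n"
        f"{shots}\n\nText: {query}\nSentiment:"
    )

print(rag_few_shot_prompt("I do not dislike scary movies either."))
```

Because retrieval happens per query, the prompt only ever carries the handful of examples that matter, instead of every edge case you have collected.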

Why Few-Shot Prompting Isn’t Always the Best for Reasoning Models

While few-shot prompting can help models handle tricky cases like negations or sarcastic language, it’s not always the right tool — especially with the latest reasoning models.

Modern reasoning models (such as OpenAI’s o-series) already incorporate internal step-by-step reasoning. Studies and community reports show that few-shot examples can sometimes hurt performance by biasing the model toward surface patterns rather than letting it fully reason through the problem. (Anthropic research, OpenAI reasoning models overview)

For example:

  • Adding examples for math or logic puzzles may actually confuse the model into copying flawed steps, instead of leveraging its built-in chain-of-thought capability.
  • Research shows that zero-shot CoT ("Let’s think step by step") often outperforms few-shot for reasoning-heavy tasks because the model can directly generate a logical path without being constrained by a handful of potentially unrepresentative examples.
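The zero-shot CoT cue is literally a one-line addition to the prompt. A minimal sketch (the Q/A template around the cue is our own illustration):

```python
# Zero-shot chain-of-thought: append a reasoning cue instead of worked
# examples ("Let's think step by step", per Kojima et al., 2022).

def zero_shot_cot(question: str) -> str:
    return f"Q: {question}\nA: Let's think step by step."

print(zero_shot_cot(
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?"
))
```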

In short: few-shot prompting is great for classification or formatting tasks, but for reasoning, it’s often better to let the model think for itself with structured instructions.

Try and Test Prompts in Vellum

The best way to know if zero-shot, few-shot, or chain-of-thought prompting works for your task is to test them side by side.

With Vellum Prompts, you can:

  • Compare zero-shot, few-shot, and reasoning-mode prompts across different models.
  • Track accuracy on your own dataset.
  • Log and share results with your team so you can make data-driven choices.

👉 Start experimenting with Vellum Prompts and see which approach works best for your use case.

Practical FAQ

1. Should I always avoid few-shot with reasoning models?

Not always. Few-shot can still be useful if you want to enforce a very specific format or bias toward a narrow interpretation. But for reasoning (math, multi-step logic, structured problem-solving), zero-shot or explicit chain-of-thought usually performs better.

2. What’s the difference between “zero-shot CoT” and “few-shot CoT”?

  • Zero-shot CoT: Add a phrase like “Let’s think step by step” without examples.
  • Few-shot CoT: Provide worked-out reasoning examples before the new question.

Most reasoning models today are strong enough that zero-shot CoT alone is often sufficient.
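To make the contrast concrete, a few-shot CoT prompt prepends a fully worked reasoning trace. A sketch, with a made-up arithmetic example and an illustrative helper name:

```python
# Few-shot CoT: show a worked reasoning example before the new
# question, so the model imitates the step-by-step format.

def few_shot_cot(question: str) -> str:
    worked = (
        "Q: A pen costs $2 and a notebook costs $3. "
        "What do 2 pens and 1 notebook cost?\n"
        "A: 2 pens cost 2 * $2 = $4. The notebook costs $3. "
        "$4 + $3 = $7. The answer is $7.\n\n"
    )
    return f"{worked}Q: {question}\nA:"

print(few_shot_cot("What is 5 + 7?"))
```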

3. How do I know if my task is “reasoning-heavy”?

Ask yourself: does the answer require intermediate steps (calculations, logical deductions, multi-part instructions)? If yes, it’s reasoning-heavy. Sentiment classification or text formatting usually aren’t — legal contract review or risk scoring usually are.

4. When should I fine-tune instead of prompting?

  • When you have lots of domain-specific edge cases that can’t fit in a context window.
  • When you need consistent outputs at scale (e.g., compliance flags, structured extractions).
  • If retraining once saves more effort than managing increasingly complex prompts.

5. Can I combine few-shot and retrieval (RAG)?

Yes. RAG-based prompting lets you dynamically pull relevant labeled examples from your own dataset instead of hardcoding them into the prompt. This scales better and avoids wasting context space.

6. How does Vellum fit into this workflow?

With Vellum, you can:

  • Test few-shot vs. reasoning prompts in one place.
  • Run evals to see which approach works best on your real data.
  • Share results across your product, ops, and engineering teams to avoid duplicate effort.
ABOUT THE AUTHOR
Anita Kirkovska
Founding Growth Lead

An AI expert with a strong ML background, specializing in GenAI and LLM education. A former Fulbright scholar, she leads Growth and Education at Vellum, helping companies build and scale AI products. She conducts LLM evaluations and writes extensively on AI best practices, empowering business leaders to drive effective AI adoption.

ABOUT THE REVIEWER
Akash Sharma
Co-founder & CEO

Akash Sharma, CEO and co-founder at Vellum (YC W23), is enabling developers to easily start, develop, and evaluate LLM-powered apps. By talking to over 1,500 people at varying stages of LLM maturity in production, he has acquired a unique understanding of the landscape, and is actively distilling his learnings with the broader LLM community. Before starting Vellum, Akash completed his undergrad at the University of California, Berkeley, then spent 5 years at McKinsey's Silicon Valley Office.

LAST UPDATED
Sep 23, 2025
