There are various techniques for improving your model's answers, including zero-shot prompting and few-shot prompting.
This guide will cover the basics of these methods, when to use them, and their limitations.
What is Zero-Shot prompting?
Zero-shot prompting provides no examples and lets the model figure things out on its own. It relies solely on the model's pre-training data and training techniques to generate a response. The response may not be completely perfect but will likely be coherent.
Here’s an example prompt that we ran with GPT-4.
Prompt:
Classify the text into neutral, negative or positive.
Text: I think the food is okay.
Sentiment:
Result
Neutral
Note that the prompt above didn’t give any instructions to the LLM about how to classify a sentiment. This goes to show that the model understands “sentiment” and can answer this question with zero-shot prompting.
With a broad enough knowledge base and understanding of language, LLMs can generate coherent responses for a number of new tasks using zero shot prompting.
If zero-shot doesn’t work for your example, it’s recommended to use few-shot prompting.
What is Few-Shot prompting?
Few-shot prompting is a method where you use a few examples in your prompt to guide language models (like GPT-4) to learn new tasks quickly. Rather than retraining an entire model from scratch, you use your context window to provide a few examples to improve the model’s performance.
With the latest models and bigger context window sizes, this technique is even more useful.
Here’s a few-shot prompt example.
Prompt
Classify the text into neutral, negative or positive.
Below are some correctly labeled responses.
Text: Yikes! That’s a tricky one
Sentiment: Neutral
Text: Amazing.. That’s just amazing. I can’t believe what he did to you :(
Sentiment: Negative
Text: Horrifying, but story-worthy experience to tell my grandsons about.
Sentiment: Neutral
Text: It could be better, but it’s still better than the rest of them.
Sentiment:
Result
Positive
This is a very simple example, but depending on your task these can get more complex for the model to understand.
In the next section, we look at two examples that are easy for humans, but more challenging for a language model to categorize.
Zero-Shot vs Few-Shot prompting (with examples)
Below we showcase two complex sentiment analysis examples that might be wrongly classified with zero-shot prompting. But, if similar examples are provided in a few-shot prompt, the model will learn and will correctly classify new similar ones.
Phrase with negation
Prompt
Classify the text into neutral, negative or positive.
Text: I do not dislike horror movies.
Sentiment:
Result
Neutral
This one is tricky because we used a phrase with negation and it confuses the model to assume that this statement has a neutral sentiment, where in reality the sentiment is positive.
Negative term used in a positive way
Prompt
Classify the text into neutral, negative or positive.
Text: The final episode was surprising with a terrible twist at the end
Sentiment:
Result
Negative
Again, the model is confused because it assumed that the terrible ending of the movie was perceived as negative, when in fact it was entertaining for the user and it was perceived as positive.
By providing similar examples in a few-shot prompt, you’ll help the model understand these edge cases. This way, the model can respond with the correct sentiment the next time it sees a similar example.
However, this prompting technique doesn’t come without its limits.
Limits to Few-Shot prompting?
There are cases where few-shot prompting won’t be a good fit.
Here are some examples:
- When you’re dealing with a more complex reasoning task and want the model to think step by step; in this case it’s recommended that you use Chain of Thought prompting to get better results.
- If you want to classify some data that has high variability and nuance; you might need to fine-tune a model, as the context window of the model might not fit all unique examples that you’d like the model to consider
- In cases where you don’t want to use fine-tuning, you can use RAG-based few shot prompting. With this technique you can dynamically retrieve pre-labelled examples that are most relevant to the question at hand by referencing your proprietary data stored in a vector database.
Why Few-Shot Prompting Isn’t Always the Best for Reasoning Models
While few-shot prompting can help models handle tricky cases like negations or sarcastic language, it’s not always the right tool — especially with the latest reasoning models.
Modern reasoning models (like GPT-4o with reasoning mode, Claude 3.5 Sonnet, or GPT-5 Reasoner) already incorporate internal step-by-step reasoning. Studies and community reports show that few-shot examples can sometimes hurt performance by biasing the model toward surface patterns rather than allowing it to fully reason through the problem. (Anthropic research, OpenAI reasoning models overview)
For example:
- Adding examples for math or logic puzzles may actually confuse the model into copying flawed steps, instead of leveraging its built-in chain-of-thought capability.
- Research shows that zero-shot CoT ("Let’s think step by step") often outperforms few-shot for reasoning-heavy tasks because the model can directly generate a logical path without being constrained by a handful of potentially unrepresentative examples.
In short: few-shot prompting is great for classification or formatting tasks, but for reasoning, it’s often better to let the model think for itself with structured instructions.
Try and Test Prompts in Vellum
The best way to know if zero-shot, few-shot, or chain-of-thought prompting works for your task is to test them side by side.
With Vellum Prompts, you can:
- Compare zero-shot, few-shot, and reasoning-mode prompts across different models.
- Track accuracy on your own dataset.
- Log and share results with your team so you can make data-driven choices.
👉 Start experimenting with Vellum Prompts and see which approach works best for your use case.
Practical FAQ
1. Should I always avoid few-shot with reasoning models?
Not always. Few-shot can still be useful if you want to enforce a very specific format or bias toward a narrow interpretation. But for reasoning (math, multi-step logic, structured problem-solving), zero-shot or explicit chain-of-thought usually performs better.
2. What’s the difference between “zero-shot CoT” and “few-shot CoT”?
- Zero-shot CoT: Add a phrase like “Let’s think step by step” without examples.
- Few-shot CoT: Provide worked-out reasoning examples before the new question.
Most reasoning models today are strong enough that zero-shot CoT alone is often sufficient.
3. How do I know if my task is “reasoning-heavy”?
Ask yourself: does the answer require intermediate steps (calculations, logical deductions, multi-part instructions)? If yes, it’s reasoning-heavy. Sentiment classification or text formatting usually aren’t — legal contract review or risk scoring usually are.
4. When should I fine-tune instead of prompting?
- When you have lots of domain-specific edge cases that can’t fit in a context window.
- When you need consistent outputs at scale (e.g., compliance flags, structured extractions).
- If retraining once saves more effort than managing increasingly complex prompts.
5. Can I combine few-shot and retrieval (RAG)?
Yes. RAG-based prompting lets you dynamically pull relevant labeled examples from your own dataset instead of hardcoding them into the prompt. This scales better and avoids wasting context space.
6. How does Vellum fit into this workflow?
With Vellum, you can:
- Test few-shot vs. reasoning prompts in one place.
- Run evals to see which approach works best on your real data.
- Share results across your product, ops, and engineering teams to avoid duplicate effort.

Latest AI news, tips, and techniques
Specific tips for Your AI use cases
No spam
Each issue is packed with valuable resources, tools, and insights that help us stay ahead in AI development. We've discovered strategies and frameworks that boosted our efficiency by 30%, making it a must-read for anyone in the field.

This is just a great newsletter. The content is so helpful, even when I’m busy I read them.

Experiment, Evaluate, Deploy, Repeat.
AI development doesn’t end once you've defined your system. Learn how Vellum helps you manage the entire AI development lifecycle.





.png)

