There are various techniques for improving your model's answers, including zero-shot prompting and few-shot prompting.
This guide will cover the basics of these methods, when to use them, and their limitations.
TABLE OF CONTENTS
Zero-shot prompting provides no examples and lets the model figure things out on its own. It relies solely on the model's pre-training data and training techniques to generate a response. The response may not be completely perfect but will likely be coherent.
Here’s an example prompt that we ran with GPT-4.
Note that the prompt above didn’t give any instructions to the LLM about how to classify a sentiment. This goes to show that the model understands “sentiment” and can answer this question with zero-shot prompting.
With a broad enough knowledge base and understanding of language, LLMs can generate coherent responses for a number of new tasks using zero shot prompting.
If zero-shot doesn’t work for your example, it’s recommended to use few-shot prompting.
Few-shot prompting is a method where you use a few examples in your prompt to guide language models (like GPT-4) to learn new tasks quickly. Rather than retraining an entire model from scratch, you use your context window to provide a few examples to improve the model’s performance.
With the latest models and bigger context window sizes, this technique is even more useful.
Here’s a few-shot prompt example.
This is a very simple example, but depending on your task these can get more complex for the model to understand.
In the next section, we look at two examples that are easy for humans, but more challenging for a language model to categorize.
Below we showcase two complex sentiment analysis examples that might be wrongly classified with zero-shot prompting. But, if similar examples are provided in a few-shot prompt, the model will learn and will correctly classify new similar ones.
Phrase with negation
This one is tricky because we used a phrase with negation and it confuses the model to assume that this statement has a neutral sentiment, where in reality the sentiment is positive.
Negative term used in a positive way
Again, the model is confused because it assumed that the terrible ending of the movie was perceived as negative, when in fact it was entertaining for the user and it was perceived as positive.
By providing similar examples in a few-shot prompt, you’ll help the model understand these edge cases. This way, the model can respond with the correct sentiment the next time it sees a similar example.
However, this prompting technique doesn’t come without its limits.
There are cases where few-shot prompting won’t be a good fit.
Here are some examples:
- When you’re dealing with a more complex reasoning task and want the model to think step by step; in this case it’s recommended that you use Chain of Thought prompting to get better results.
- If you want to classify some data that has high variability and nuance; you might need to fine-tune a model, as the context window of the model might not fit all unique examples that you’d like the model to consider
- In cases where you don’t want to use fine-tuning, you can use RAG-based few shot prompting. With this technique you can dynamically retrieve pre-labelled examples that are most relevant to the question at hand by referencing your proprietary data stored in a vector database.
You now have a solid understanding on zero-shot and few-shot prompting. Both can be very useful for different tasks.
When using few-shot prompting, it’s crucial to recognize the specific challenges in your data. Providing targeted examples can significantly improve the model's accuracy.
However, it's also important to be aware of the limitations. If your data varies a lot, or you're reaching the context window limits, or facing difficulties with complex prompts, think about whether fine-tuning a custom model could work better.
Ultimately, the key lies in experimentation. Try out different prompts, and perhaps even compare different models, to discover the most effective solution for your scenario.
These techniques are your toolbox, but it's your data and your experiments that'll show you what works best. Keep tinkering, and you'll find your sweet spot!