You switch to GPT-5 thinking it’s going to be a significant upgrade. Instead, you get slower responses, prompts that don’t work, and answers that feel verbose.
It’s not just you; plenty of developers have noticed the same thing.
The reality, however, is that GPT-5 can be a great model for your use case, but only if you know how to manage its new parameters.
The model is very adaptable, and it comes with new settings, like reasoning and verbosity controls, that reward specific prompting techniques.
We wrote this guide to help you learn:
Prompting techniques that actually get good results with GPT-5
How to speed things up without losing accuracy
When it makes sense to switch from older models, and when to stick with them
How to use GPT-5 to improve your own prompts
Let’s look at the changes in this model before we get to the prompting tips.
GPT-5’s new controls
GPT-5 is built for a wide range of use cases, and it comes with an exciting new layer of developer controls.
Here’s a quick overview:
Reasoning effort: This parameter controls how much reasoning the model does before answering a question.
Verbosity: With this parameter, you can control the amount of detail in the responses.
Custom tools: These work in much the same way as JSON schema-driven function tools, but rather than constraining the input with an explicit schema, the model can pass an arbitrary string back to your tool as input (e.g., SQL queries or shell scripts).
Tool preambles: These are brief, user-visible explanations that GPT-5 generates before invoking any tool or function, outlining its intent or plan (e.g., “why I'm calling this tool”). Useful for debugging and understanding how the model works.
| Parameter / Tool | What it does | When to use it |
| --- | --- | --- |
| Verbosity | Controls how many output tokens the model generates per response; lower verbosity also lowers latency. | low → concise answers or simple code generation (e.g., SQL queries), or when you need to cut latency. high → thorough explanations of documents or extensive code refactoring. |
| Reasoning effort | Controls how many reasoning tokens the model generates before producing a response. | minimal or low for tasks that require little reasoning; high for reasoning-intensive tasks. |
| Custom tools | Lets the model pass an arbitrary string back to your tool as input. | To avoid unnecessarily wrapping a response in JSON, or to constrain the tool input with a custom grammar. |
| Tool preambles | User-visible explanations that GPT-5 generates before invoking any tool or function. | To understand how the model works and to debug tool use. |
For a complete breakdown of GPT-5’s new parameters and how to make them work for you, check out this OpenAI resource.
How to migrate from older models
When migrating to GPT-5 from an older OpenAI model, start by experimenting with reasoning levels and prompting strategies. Based on best practices, here's how to think about the move:
o3: gpt-5 with medium or high reasoning is a great replacement. Start with medium reasoning and prompt tuning, then increase to high if you aren't getting the results you want.
gpt-4.1: gpt-5 with minimal or low reasoning is a strong alternative. Start with minimal and tune your prompts; increase to low if you need better performance.
o4-mini or gpt-4.1-mini: gpt-5-mini with prompt tuning is a great replacement.
gpt-4.1-nano: gpt-5-nano with prompt tuning is a great replacement.
| Previous Model | Recommended GPT-5 Model | Starting Reasoning Effort |
| --- | --- | --- |
| o3 | gpt-5 | Medium → High |
| gpt-4.1 | gpt-5 | Minimal → Low |
| o4-mini | gpt-5-mini | Default |
| gpt-4.1-mini | gpt-5-mini | Default |
| gpt-4.1-nano | gpt-5-nano | Default |
18 Prompting tips for GPT-5
GPT-5 is built to follow instructions with surgical precision, meaning poorly structured prompts will almost always result in undesired outputs. You’ll need to be as explicit and specific as possible, and very deliberate about how you structure your prompts.
Below we cover the most useful practices we found to help you get better results from GPT-5 for your use case.
1. Get the model to run faster by lowering its reasoning effort
The new model comes with a very powerful parameter: reasoning_effort. It controls how many reasoning tokens the model spends getting to an answer.
If you want to minimize latency, set reasoning_effort to minimal or low. This reduces exploration depth and improves both efficiency and latency.
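As a rough sketch with the Responses API (the SQL task is just an illustrative placeholder):

```python
from openai import OpenAI

client = OpenAI()

# "minimal" spends the fewest reasoning tokens, which cuts latency;
# step up to "low" if answers start missing the mark.
resp = client.responses.create(
    model="gpt-5",
    input="Write a SQL query that returns the 10 most recent orders per customer.",
    reasoning={"effort": "minimal"},
)
print(resp.output_text)
```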
2. Define clear criteria in your prompt
Set clear rules in your prompt for how the model should explore the problem. This keeps it from wandering through too many ideas.
Make sure your prompt does the following:
Sets a clear goal: The model knows exactly what the outcome should be
Provides a step-by-step method: It lays out a logical order: start broad, branch into specifics, run parallel queries, deduplicate, and cache.
Defines stopping rules: The “early stop criteria” tell the model when to move from searching to acting, avoiding endless context gathering.
Handles uncertainty: The “escalate once” step prevents the model from looping endlessly if results conflict.
Controls depth: It limits how far the model should trace details, focusing only on relevant symbols/contracts.
Encourages action over overthinking: The loop structure reinforces moving forward, only searching again if something fails or new unknowns pop up.
Here’s an example of a good prompt that follows the above structure:
<context_gathering>
Goal: Get enough context fast. Parallelize discovery and stop as soon as you can act.
Method:
- Start broad, then fan out to focused subqueries.
- In parallel, launch varied queries; read top hits per query. Deduplicate paths and cache; don’t repeat queries.
- Avoid over searching for context. If needed, run targeted searches in one parallel batch.
Early stop criteria:
- You can name exact content to change.
- Top hits converge (~70%) on one area/path.
Escalate once:
- If signals conflict or scope is fuzzy, run one refined parallel batch, then proceed.
Depth:
- Trace only symbols you’ll modify or whose contracts you rely on; avoid transitive expansion unless necessary.
Loop:
- Batch search → minimal plan → complete task.
- Search again only if validation fails or new unknowns appear. Prefer acting over more searching.
</context_gathering>
3. For fast, high-quality answers, use minimal reasoning with a short explanation
If speed matters, you can run GPT-5 with minimal reasoning while still nudging it to “think.” OpenAI suggests asking the model to start its answer with a short summary of its thought process, like a quick bullet point list. This can improve performance on tasks that need more intelligence without slowing things down too much.
Example:
First, list 2–3 bullet points explaining your reasoning.
Then give the answer in one sentence.
4. Remove contradictory instructions and clearly define exceptions
Because GPT-5 follows instructions so closely, contradictory or vague instructions damage its output more than they did with previous models.
The model can easily get confused when two instructions pull in opposite directions, for example telling it “always wait for approval” but also “go ahead and do it right away.” Instead:
Set a clear instruction hierarchy so the model knows which rule overrides in each scenario
Explicitly state exceptions (e.g., “skip lookup only in emergencies”)
Review prompts for wording that could be interpreted in multiple ways
Here’s a bad prompt:
Always wait for manager approval before sending a report.
If the report is urgent, send it immediately without waiting for approval.
Here’s a prompt that will work better:
Wait for manager approval before sending a report. Exception: If the report is urgent, send it immediately and notify the manager afterward.
5. Prompting for higher reasoning outputs
On the other hand, if you want to give the model more autonomy, you can increase reasoning_effort to high.
Here’s an example prompt that can help with this:
<persistence>
- You are an agent
- Keep going until the user's query is completely resolved, before ending your turn and yielding back to the user.
- Only terminate your turn when you are sure that the problem is solved.
- Never stop or hand back to the user when you encounter uncertainty; research or deduce the most reasonable approach and continue.
- Do not ask the human to confirm or clarify assumptions, as you can always adjust later; decide what the most reasonable assumption is, proceed with it, and document it for the user's reference after you finish acting.
</persistence>
6. Provide an escape hatch
As you give GPT-5 more autonomy, you should instruct the model how to act in cases of uncertainty.
You can provide a context-gathering tag, and give the model explicit permission to proceed even if it’s uncertain. This prevents stalls when GPT-5 can’t be fully confident and ensures it acts on the best available information instead of halting.
Example:
<context_gathering>
- Search depth: very low
- Bias strongly towards providing a correct answer as quickly as possible, **even if it might not be fully correct.**
- Usually, this means an absolute maximum of 2 tool calls.
- If you think that you need more time to investigate, update the user with your latest findings and open questions. You can proceed if the user confirms.
</context_gathering>
7. Use tool preambles to set context for tool calls
In GPT-5’s output you now have access to tool preambles: short explanations from the model about how it’s executing its tools.
The best part of this is that you can steer the frequency, style, and content of tool preambles in your prompt using a brief upfront plan. By controlling the tool preamble, you ensure that every tool call starts with a clear, predictable setup.
Example:
<tool_preambles>
- Always begin by rephrasing the user's goal in a friendly, clear, and concise manner, before calling any tools.
- Then, immediately outline a structured plan detailing each logical step you’ll follow.
- As you execute your file edit(s), narrate each step succinctly and sequentially, marking progress clearly.
- Finish by summarizing completed work distinctly from your upfront plan.
</tool_preambles>
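If you're calling the model through the Responses API, preambles arrive as assistant messages interleaved with tool calls in the output list. Here's a rough sketch of separating them, assuming `resp` is the result of a request that included tools:

```python
# Walk the output items; preambles are the message items the model
# emits before (or between) its function_call items.
for item in resp.output:
    if item.type == "message":
        for part in item.content:
            if part.type == "output_text":
                print("PREAMBLE:", part.text)
    elif item.type == "function_call":
        print("TOOL CALL:", item.name, item.arguments)
```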
8. Use Responses API over Chat Completions
OpenAI recommends using the Responses API over the Chat Completions API because it can access the model’s hidden reasoning tokens, which aren’t exposed in the output of Chat Completions.
The Responses API can send the previous turn's CoT to the model. This leads to fewer generated reasoning tokens, higher cache hit rates, and lower latency. In fact, OpenAI observed an increase in the Tau-Bench Retail score from 73.9% to 78.2% just by switching to the Responses API and including previous_response_id to pass back previous reasoning items into subsequent requests. More info here.
Chat Completions API example
You ask GPT-5 to solve a math problem.
- Output: “The answer is 42.”
- You only see the final answer, not how it got there.
Responses API example
You ask the same question.
- Output: “The answer is 42.”
- You still only see the final answer, but the Responses API carries the chain-of-thought it generated under the hood into the next LLM call, improving accuracy in the following generation.
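A minimal sketch of that chaining with the Responses API; the math question is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

first = client.responses.create(
    model="gpt-5",
    input="A train travels 120 km in 1.5 hours. What is its average speed?",
)
print(first.output_text)

# Passing previous_response_id carries the prior turn's reasoning items
# forward: fewer regenerated reasoning tokens, better cache hits.
followup = client.responses.create(
    model="gpt-5",
    previous_response_id=first.id,
    input="Now express that speed in meters per second.",
)
print(followup.output_text)
```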
9. For higher safety, predictability, and prompt caching, use allowed tools
The parameter allowed_tools lets you give the model a smaller “allowed right now” list from your full tools list. You can also set mode to "auto" (can use any allowed tool or none) or "required" (must use one allowed tool).
Here, the model knows about all three tools, but in this request it can only use get_weather or deepwiki:
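Here's a minimal sketch of what such a request could look like with the Responses API; the send_email definition and the deepwiki MCP server URL are illustrative assumptions:

```python
from openai import OpenAI

client = OpenAI()

# Full tool list the model knows about.
tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    {
        "type": "function",
        "name": "send_email",  # illustrative third tool
        "description": "Send an email to a recipient.",
        "parameters": {
            "type": "object",
            "properties": {
                "to": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "body"],
        },
    },
    {
        "type": "mcp",
        "server_label": "deepwiki",
        "server_url": "https://mcp.deepwiki.com/mcp",  # assumed URL
        "require_approval": "never",
    },
]

resp = client.responses.create(
    model="gpt-5",
    input="What's the weather like in Paris today?",
    tools=tools,
    # Restrict this request to a subset; "auto" lets the model pick one
    # of the allowed tools or answer without one, "required" forces a call.
    tool_choice={
        "type": "allowed_tools",
        "mode": "auto",
        "tools": [
            {"type": "function", "name": "get_weather"},
            {"type": "mcp", "server_label": "deepwiki"},
        ],
    },
)
print(resp.output_text)
```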
10. Ask GPT-5 to plan before it answers
GPT-5 works great if you ask it to plan its execution before actually generating the answer. Here’s an example from community tests:
Before responding, please:
1. Decompose the request into core components
2. Identify any ambiguities that need clarification
3. Create a structured approach to address each component
4. Validate your understanding before proceeding
11. Include validation instructions
To prevent errors, you can include validation instructions in your prompt. Example from the community:
You have two tasks to complete:
Task 1: Summarize the provided report into exactly 5 bullet points.
Task 2: Translate those bullet points into French.
Plan both tasks before starting:
Complete Task 1 first, then pause and present the summary for validation. Ask explicitly, “Does this summary meet the requirements?” before starting Task 2. Once validated, complete Task 2 and present the translation for a final review to ensure both tasks meet the stated objectives.
12. Make instructions ultra-specific to get accurate multi-task results from one prompt
While we suggest keeping instructions in separate prompts whenever possible, Pietro’s GPT-5 prompt guide shows that the model can also handle parallel tasks well, but only if you clearly define each one in the prompt.
Quick tips from his guide:
Instruct the model to first create a detailed plan outlining sub-tasks
Check the results after each major step against your requirements
Confirm that all objectives have been met before concluding.
Example: When building a multi-page financial report, tell GPT-5: “Plan each section and data source before writing, verify figures after each section is drafted, and confirm that the final report matches all stated requirements before sending.”
13. Keep few-shot examples light
In earlier, pre-reasoning models, few-shot prompting was the go-to method for getting better results. With today’s reasoning models, clear instructions and well-defined constraints often work better than adding examples. In fact, research shows that few-shot prompts can reduce performance when the task requires heavy reasoning. That said, they can still be useful in certain cases.
Here’s how to think about this:
Use few-shot prompts for tasks needing strict formats or specialized knowledge.
For more complex reasoning tasks, start with strong instructions and no examples, and iterate from there.
14. Assign GPT-5 a persona & role
A role like “compliance officer” or “financial analyst” shapes vocabulary and reasoning.
Example: When reviewing a policy draft for compliance, start with “You are a compliance officer. Review the text for any GDPR violations” to ensure the response uses the right expertise and focus.
15. Break tasks across multiple agent turns
Split complex prompts into discrete, testable units. You’ll get best performance when distinct, separable tasks are broken up across multiple agent turns, with one turn for each task.
16. Control output length with verbosity
Verbosity adjusts how much detail GPT-5 includes in the answer. Use low for concise answers, high for richer explanations.
Example: Set verbosity: low for a brief board summary; raise to high for a technical onboarding guide with step-by-step detail.
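A minimal sketch; the board-summary input is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

# low verbosity -> terse board summary; switch to "high" for a
# step-by-step onboarding guide from the same source material.
resp = client.responses.create(
    model="gpt-5",
    input="Summarize Q3 results for the board: revenue up 12%, churn down 2%.",
    text={"verbosity": "low"},
)
print(resp.output_text)
```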
17. Ensure markdown output with specific instructions
By default, GPT-5 in the API does not format its final answers in Markdown. However, this prompt works really well to reinforce Markdown output from a GPT-5 model:
- Use Markdown **only where semantically correct** (e.g., `inline code`, ```code fences```, lists, tables).
- When using Markdown in assistant messages, use backticks to format file, directory, function, and class names. Use \( and \) for inline math, \[ and \] for block math.
18. Use GPT-5 to write prompts for itself
Leverage GPT-5 as a meta-prompter to diagnose and fix issues in your existing prompts. It’s very effective at this.
Here’s an example prompt template that’s recommended by OpenAI:
When asked to optimize prompts, give answers from your own perspective - explain what specific phrases could be added to, or deleted from, this prompt to more consistently elicit the desired behavior or prevent the undesired behavior.
Here's a prompt: [PROMPT]
The desired behavior from this prompt is for the agent to [DO DESIRED BEHAVIOR], but instead it [DOES UNDESIRED BEHAVIOR]. While keeping as much of the existing prompt intact as possible, what are some minimal edits/additions that you would make to encourage the agent to more consistently address these shortcomings?
Test-driven prompting with Vellum
Implementing these tips alone will not guarantee accurate GPT-5 outputs in production. Prompts that hold up against edge cases are built through iteration.
Vellum makes this process faster and more reliable by giving you a dedicated workspace for prompt management to design, test, and refine your GPT-5 prompts across varied scenarios.
Purpose-built for prompt evaluation, Vellum lets you log performance, compare outputs, and track improvements over time, ensuring your prompts keep delivering consistent, production-grade outputs.
Optimizing prompts is just the start: Vellum provides an all-encompassing platform for developing production-grade AI for any use case.
Try Vellum for free today and see how quickly you can design, test, and optimize GPT-5 prompts that outperform anything you’ve built before!