You switch to GPT-5 thinking it’s going to be a significant upgrade. Instead, you get slower responses, prompts that don’t work, and answers that feel verbose.
It’s not just you; plenty of developers have noticed the same thing.
The reality, however, is that GPT-5 can be a great model for your use case, but only if you know how to manage its new parameters.
The model is very adaptable, and it comes with new settings, like reasoning and verbosity controls, that reward specific prompting techniques.
We wrote this guide to help you learn:
Prompting techniques that actually get good results with GPT-5
How to speed things up without losing accuracy
When it makes sense to switch from older models, and when to stick with them
How to use GPT-5 to improve your own prompts
Let’s look at the changes in this model before we get to the prompting tips.
GPT-5’s new controls
GPT-5 is built for a wide range of use cases, and it comes with an exciting new layer of developer controls.
Here’s a quick overview:
Reasoning effort: This parameter controls how much reasoning the model does before answering a question.
Verbosity: With this parameter, you can control the amount of detail in the responses.
Custom tools: These work in much the same way as JSON schema-driven function tools, but rather than constraining the input with an explicit schema, the model can pass an arbitrary string back to your tool as input (e.g., SQL queries or shell scripts).
Tool preambles: These are brief, user-visible explanations that GPT-5 generates before invoking any tool or function, outlining its intent or plan (e.g., “why I'm calling this tool”). Useful for debugging and understanding how the model works.
| Parameter / Tool | What it does | When to use it |
| --- | --- | --- |
| Verbosity | Controls how many output tokens the model generates per response; lower verbosity also lowers latency. | low → concise answers or simple code generation (e.g., SQL queries), or when you need to cut latency. high → thorough explanations of documents or extensive code refactoring. |
| Reasoning effort | Controls how many reasoning tokens the model generates before producing a response. | minimal or low for tasks that require little reasoning; high for reasoning-intensive tasks. |
| Custom tools | Lets the model pass an arbitrary string back to your tool as input. | To avoid unnecessarily wrapping a response in JSON, or to constrain the tool input with a custom grammar. |
| Tool preambles | User-visible explanations that GPT-5 generates before invoking any tool or function. | To understand how the model works and to debug tool use. |
For a complete breakdown of GPT-5’s new parameters and how to make them work for you, check out this OpenAI resource.
How to migrate from older models
When migrating to GPT-5 from an older OpenAI model, start by experimenting with reasoning levels and prompting strategies. Based on best practices, here's how to think about the move:
o3: gpt-5 with medium or high reasoning is a great replacement. Start with medium reasoning and prompt tuning, then increase to high if you aren't getting the results you want.
gpt-4.1: gpt-5 with minimal or low reasoning is a strong alternative. Start with minimal and tune your prompts; increase to low if you need better performance.
o4-mini or gpt-4.1-mini: gpt-5-mini with prompt tuning is a great replacement.
gpt-4.1-nano: gpt-5-nano with prompt tuning is a great replacement.
| Previous Model | Recommended GPT-5 Model | Starting Reasoning Effort |
| --- | --- | --- |
| o3 | gpt-5 | Medium → High |
| gpt-4.1 | gpt-5 | Minimal → Low |
| o4-mini | gpt-5-mini | Default |
| gpt-4.1-mini | gpt-5-mini | Default |
| gpt-4.1-nano | gpt-5-nano | Default |
18 Prompting tips for GPT-5
GPT-5 is built to follow instructions with surgical precision, meaning poorly structured prompts will almost always result in undesired outputs. You’ll need to be as explicit and specific as possible, and very deliberate about how you structure your prompts.
Below we cover the most useful practices we found to help you get better results from GPT-5 for your use case.
1. Get the model to run faster by lowering its reasoning effort
The new model comes with a very powerful parameter: reasoning_effort. It controls how many reasoning tokens the model spends getting to an answer.
If you want to minimize latency, set reasoning_effort to minimal or low. This reduces exploration depth and improves both efficiency and latency.
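As a rough sketch with the Responses API (the SQL task is just an illustrative placeholder):

```python
from openai import OpenAI

client = OpenAI()

# "minimal" spends the fewest reasoning tokens, which cuts latency;
# step up to "low" if answers start missing the mark.
resp = client.responses.create(
    model="gpt-5",
    input="Write a SQL query that returns the 10 most recent orders per customer.",
    reasoning={"effort": "minimal"},
)
print(resp.output_text)
```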
2. Define clear criteria in your prompt
Set clear rules in your prompt for how the model should explore the problem. This keeps it from wandering through too many ideas.
Make sure your prompt does the following:
Sets a clear goal: The model knows exactly what the outcome should be
Provides a step-by-step method: It lays out a logical order: start broad, branch into specifics, run parallel queries, deduplicate, and cache.
Defines stopping rules: The “early stop criteria” tell the model when to move from searching to acting, avoiding endless context gathering.
Handles uncertainty: The “escalate once” step prevents the model from looping endlessly if results conflict.
Controls depth: It limits how far the model should trace details, focusing only on relevant symbols/contracts.
Encourages action over overthinking: The loop structure reinforces moving forward, only searching again if something fails or new unknowns pop up.
Here’s an example of a good prompt that follows the above structure:
<context_gathering>
Goal: Get enough context fast. Parallelize discovery and stop as soon as you can act.
Method:
- Start broad, then fan out to focused subqueries.
- In parallel, launch varied queries; read top hits per query. Deduplicate paths and cache; don’t repeat queries.
- Avoid over searching for context. If needed, run targeted searches in one parallel batch.
Early stop criteria:
- You can name exact content to change.
- Top hits converge (~70%) on one area/path.
Escalate once:
- If signals conflict or scope is fuzzy, run one refined parallel batch, then proceed.
Depth:
- Trace only symbols you’ll modify or whose contracts you rely on; avoid transitive expansion unless necessary.
Loop:
- Batch search → minimal plan → complete task.
- Search again only if validation fails or new unknowns appear. Prefer acting over more searching.
</context_gathering>
3. For fast, high-quality answers, use minimal reasoning with a short explanation
If speed matters, you can run GPT-5 with minimal reasoning while still nudging it to “think.” OpenAI suggests asking the model to start its answer with a short summary of its thought process, like a quick bullet point list. This can improve performance on tasks that need more intelligence without slowing things down too much.
Example:
First, list 2–3 bullet points explaining your reasoning.
Then give the answer in one sentence.
4. Remove contradictory instructions and clearly define exceptions
Because GPT-5 follows instructions so closely, contradictory or vague instructions damage its output more than they did with previous models.
The model can easily get confused when two instructions pull in opposite directions, for example telling it “always wait for approval” but also “go ahead and do it right away.” Instead:
Set a clear instruction hierarchy so the model knows which rule overrides in each scenario
Explicitly state exceptions (e.g., “skip lookup only in emergencies”)
Review prompts for wording that could be interpreted in multiple ways
Here’s a bad prompt:
Always wait for manager approval before sending a report.
If the report is urgent, send it immediately without waiting for approval.
Here’s a prompt that will work better:
Wait for manager approval before sending a report. Exception: If the report is urgent, send it immediately and notify the manager afterward.
5. Prompting for higher reasoning outputs
On the other hand, if you want to give the model more autonomy, you can increase reasoning_effort to high.
Here’s an example prompt that can help with this:
<persistence>
- You are an agent
- Keep going until the user's query is completely resolved, before ending your turn and yielding back to the user.
- Only terminate your turn when you are sure that the problem is solved.
- Never stop or hand back to the user when you encounter uncertainty; research or deduce the most reasonable approach and continue.
- Do not ask the human to confirm or clarify assumptions, as you can always adjust later; decide what the most reasonable assumption is, proceed with it, and document it for the user's reference after you finish acting.
</persistence>
6. Provide an escape hatch
As you give GPT-5 more autonomy, you should instruct the model how to act in cases of uncertainty.
You can provide a context-gathering tag, and give the model explicit permission to proceed even if it’s uncertain. This prevents stalls when GPT-5 can’t be fully confident and ensures it acts on the best available information instead of halting.
Example:
<context_gathering>
- Search depth: very low
- Bias strongly towards providing a correct answer as quickly as possible, **even if it might not be fully correct.**
- Usually, this means an absolute maximum of 2 tool calls.
- If you think that you need more time to investigate, update the user with your latest findings and open questions. You can proceed if the user confirms.
</context_gathering>
7. Use tool preambles to set context for tool calls
In GPT-5’s output you now have access to tool preambles: short explanations from the model about how it’s executing its tools.
The best part of this is that you can steer the frequency, style, and content of tool preambles in your prompt using a brief upfront plan. By controlling the tool preamble, you ensure that every tool call starts with a clear, predictable setup.
Example:
<tool_preambles>
- Always begin by rephrasing the user's goal in a friendly, clear, and concise manner, before calling any tools.
- Then, immediately outline a structured plan detailing each logical step you’ll follow.
- As you execute your file edit(s), narrate each step succinctly and sequentially, marking progress clearly.
- Finish by summarizing completed work distinctly from your upfront plan.
</tool_preambles>
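If you're calling the model through the Responses API, preambles arrive as assistant messages interleaved with tool calls in the output list. Here's a rough sketch of separating them, assuming `resp` is the result of a request that included tools:

```python
# Walk the output items; preambles are the message items the model
# emits before (or between) its function_call items.
for item in resp.output:
    if item.type == "message":
        for part in item.content:
            if part.type == "output_text":
                print("PREAMBLE:", part.text)
    elif item.type == "function_call":
        print("TOOL CALL:", item.name, item.arguments)
```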
8. Use Responses API over Chat Completions
OpenAI recommends using the Responses API over the Chat Completions API because it can access the model’s hidden reasoning tokens, which aren’t exposed in the output of Chat Completions.
The Responses API can send the previous turn's CoT to the model. This leads to fewer generated reasoning tokens, higher cache hit rates, and lower latency. In fact, OpenAI observed an increase in the Tau-Bench Retail score from 73.9% to 78.2% just by switching to the Responses API and including previous_response_id to pass back previous reasoning items into subsequent requests. More info here.
Chat Completions API example
You ask GPT-5 to solve a math problem.
- Output: “The answer is 42.”
- You only see the final answer, not how it got there.
Responses API example
You ask the same question.
- Output: “The answer is 42.”
- You still only see the final answer, but the Responses API carries the chain-of-thought it generated under the hood into the next LLM call, improving accuracy in the following generation.
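A minimal sketch of that chaining with the Responses API; the math question is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

first = client.responses.create(
    model="gpt-5",
    input="A train travels 120 km in 1.5 hours. What is its average speed?",
)
print(first.output_text)

# Passing previous_response_id carries the prior turn's reasoning items
# forward: fewer regenerated reasoning tokens, better cache hits.
followup = client.responses.create(
    model="gpt-5",
    previous_response_id=first.id,
    input="Now express that speed in meters per second.",
)
print(followup.output_text)
```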
9. For higher safety, predictability, and prompt caching, use allowed tools
The parameter allowed_tools lets you give the model a smaller “allowed right now” list from your full tools list. You can also set mode to "auto" (can use any allowed tool or none) or "required" (must use one allowed tool).
Here, the model knows about all three tools, but in this request it can only use get_weather or deepwiki:
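Here's a minimal sketch of what such a request could look like with the Responses API; the send_email definition and the deepwiki MCP server URL are illustrative assumptions:

```python
from openai import OpenAI

client = OpenAI()

# Full tool list the model knows about.
tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    {
        "type": "function",
        "name": "send_email",  # illustrative third tool
        "description": "Send an email to a recipient.",
        "parameters": {
            "type": "object",
            "properties": {
                "to": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "body"],
        },
    },
    {
        "type": "mcp",
        "server_label": "deepwiki",
        "server_url": "https://mcp.deepwiki.com/mcp",  # assumed URL
        "require_approval": "never",
    },
]

resp = client.responses.create(
    model="gpt-5",
    input="What's the weather like in Paris today?",
    tools=tools,
    # Restrict this request to a subset; "auto" lets the model pick one
    # of the allowed tools or answer without one, "required" forces a call.
    tool_choice={
        "type": "allowed_tools",
        "mode": "auto",
        "tools": [
            {"type": "function", "name": "get_weather"},
            {"type": "mcp", "server_label": "deepwiki"},
        ],
    },
)
print(resp.output_text)
```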
10. Ask GPT-5 to plan before it answers
GPT-5 works great if you ask it to plan its execution before actually generating the answer. Here’s an example from community tests:
Before responding, please:
1. Decompose the request into core components
2. Identify any ambiguities that need clarification
3. Create a structured approach to address each component
4. Validate your understanding before proceeding
11. Include validation instructions
To prevent errors, you can include validation instructions in your prompt. Example from the community:
You have two tasks to complete:
Task 1: Summarize the provided report into exactly 5 bullet points.
Task 2: Translate those bullet points into French.
Plan both tasks before starting:
Complete Task 1 first, then pause and present the summary for validation. Ask explicitly, “Does this summary meet the requirements?” before starting Task 2. Once validated, complete Task 2 and present the translation for a final review to ensure both tasks meet the stated objectives.
12. Make instructions ultra-specific to get accurate multi-task results from one prompt
While we suggest keeping instructions in separate prompts whenever possible, Pietro’s GPT-5 prompt guide shows that the model can also handle parallel tasks well, but only if you clearly define each one in the prompt.
Quick tips from his guide:
Instruct the model to first create a detailed plan outlining sub-tasks
Check the results after each major step against your requirements
Confirm that all objectives have been met before concluding.
Example: When building a multi-page financial report, tell GPT-5: “Plan each section and data source before writing, verify figures after each section is drafted, and confirm that the final report matches all stated requirements before sending.”
13. Keep few-shot examples light
In earlier, pre-reasoning models, few-shot prompting was the go-to method for getting better results. With today’s reasoning models, clear instructions and well-defined constraints often work better than adding examples. In fact, research shows that few-shot prompts can reduce performance when the task requires heavy reasoning. That said, they can still be useful in certain cases.
Here’s how to think about this:
Use few-shot prompts for tasks needing strict formats or specialized knowledge.
For more complex reasoning tasks, start with strong instructions and no examples, and iterate from there.
14. Assign GPT-5 a persona & role
A role like “compliance officer” or “financial analyst” shapes vocabulary and reasoning.
Example: When reviewing a policy draft for compliance, start with “You are a compliance officer. Review the text for any GDPR violations” to ensure the response uses the right expertise and focus.
15. Break tasks across multiple agent turns
Split complex prompts into discrete, testable units. You’ll get best performance when distinct, separable tasks are broken up across multiple agent turns, with one turn for each task.
16. Control output length with verbosity
Verbosity adjusts how much detail GPT-5 includes in the answer. Use low for concise answers, high for richer explanations.
Example: Set verbosity: low for a brief board summary; raise to high for a technical onboarding guide with step-by-step detail.
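A minimal sketch; the board-summary input is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

# low verbosity -> terse board summary; switch to "high" for a
# step-by-step onboarding guide from the same source material.
resp = client.responses.create(
    model="gpt-5",
    input="Summarize Q3 results for the board: revenue up 12%, churn down 2%.",
    text={"verbosity": "low"},
)
print(resp.output_text)
```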
17. Ensure markdown output with specific instructions
By default, GPT-5 in the API does not format its final answers in Markdown. However, this prompt works really well to reinforce Markdown output from a GPT-5 model:
- Use Markdown **only where semantically correct** (e.g., `inline code`, ```code fences```, lists, tables).
- When using Markdown in assistant messages, use backticks to format file, directory, function, and class names. Use \( and \) for inline math, \[ and \] for block math.
18. Use GPT-5 to write prompts for itself
Leverage GPT-5 as a meta-prompter to diagnose and fix issues in your existing prompts. It’s very effective at this.
Here’s an example prompt template that’s recommended by OpenAI:
When asked to optimize prompts, give answers from your own perspective - explain what specific phrases could be added to, or deleted from, this prompt to more consistently elicit the desired behavior or prevent the undesired behavior.
Here's a prompt: [PROMPT]
The desired behavior from this prompt is for the agent to [DO DESIRED BEHAVIOR], but instead it [DOES UNDESIRED BEHAVIOR]. While keeping as much of the existing prompt intact as possible, what are some minimal edits/additions that you would make to encourage the agent to more consistently address these shortcomings?
Test-driven prompting with Vellum
Implementing these tips alone will not guarantee accurate GPT-5 outputs in production. Prompts that hold up against edge cases are built through iteration.
Vellum makes this process faster and more reliable by giving you a dedicated workspace for prompt management to design, test, and refine your GPT-5 prompts across varied scenarios.
Purpose-built for prompt evaluation, Vellum lets you log performance, compare outputs, and track improvements over time, ensuring your prompts keep delivering consistent, production-grade outputs.
Optimizing prompts is just the start: Vellum provides an all-encompassing platform for developing production-grade AI for any use case.
Try Vellum for free today and see how quickly you can design, test, and optimize GPT-5 prompts that outperform anything you’ve built before!