Vellum is coming to the AI Engineering World's Fair in SF. Come visit our booth and get a live demo!

OpenAI o1: Prompting Tips, Limitations, and Capabilities

Learn how to prompt OpenAI o1 models, understand their limits and the opportunities ahead.

Written by
Reviewed by
No items found.

The latest models from OpenAI—the OpenAI o1 and its "mini" version—operate differently from previous GPT models. These reasoning models are trained to think through their answers step by step before responding to the user. This means that each time you prompt these models, they will take some time to internally "think" before producing the final answer.

The performance gains are huge — for math reasoning specifically, the OpenAI o1 model is 70% more accurate.

These new qualities open up many opportunities, but there are also several limitations—in fact, this is still just version 1. But one thing is sure, these models are behaving differently, and we need to rethink our prompting methods.

Let’s look at what’s changed, how to prompt these models, understand their limits and the opportunities ahead.

A Primer on Chain of Thought

We wrote more on chain of thought here, but let’s cover this technique briefly.

Traditional LLMs (GPT-4o and alike) tend to predict the next word or token without fully working through the reasoning process, especially for multi-step problems like math or logic. They predict the next word (token) in the sentence based on a calculated probability within the context of that sentence.

The more complex the task, the easier it is for the model to lose track. So, Chain of Thought (CoT) works well because it helps break down complex reasoning tasks into smaller, more manageable steps.

With that, the models focus on solving one part of the problem at a time — and their accuracy increases.

For example, instead of just asking "Solve 2x + 3 = 7," we can include the intermediate steps that the model should follow to arrive at the correct answer:

System message: You’re the best mathematician in the world and you’ll help me solve linear equations.

Example:

For the equation 5x - 4 = 16

1. Add 4 to both sides: 5x - 4 + 4 = 16 + 4 → 5x = 20
2. Divide both sides by 5: 5x / 5 = 20 / 5 → x = 4
User message:

Now solve 2x + 3 = 7

This technique was widely used by everyone, and today, OpenAI has integrated this natively in the model — making it more powerful for reasoning tasks.

So basically, these models are smarter and don’t want to be confused with lots of prompting. Let’s see what that means.

How to Prompt OpenAI o1

Now, since these models performs chain-of-thought prompting internally, the best prompts for these “reasoning” models will be different. That means that some things are gonna change.

Here’s what OpenAI recommends:

1) Keep prompts simple

These models are trained to work best if you just write simple, straightforward prompts. They won’t need extensive guidance because they can find the most optimal path themselves.

2) Avoid using CoT

Because the chain of thought technique is part of the model’s reasoning already, using your own reasoning in the prompts won’t work and might hinder the performance.

3) Use Delimiters for Quality

This technique applies for all previous models as well as this one. To clearly indicate parts of your prompt, use delimiters like “###”, XML tags or section titles.

4) Limit Additional RAG

If you want to add more context in your prompt via RAG, make sure you only include the most relevant information. Providing a lot of information at inference time might make the model “overthink” and take more time to get to the answer.

But, OpenAI hides the actual reasoning process, and we don’t know how the model breaks down a given reasoning challenge — so determining what’s the most relevant information here will be tricky to do.

Capabilities

The OpenAI o1 models are powerful because they "think" through problems out of the box, thanks to their previous training using reinforcement learning algorithms.

They surpass PhD-level accuracy in science benchmarks, excels in competitive programming, and significantly improves math problem-solving, scoring 83% on a challenging exam compared to GPT-4o's 13%.

We also found out that OpenAI o1 is significantly better to solve the hardest SAT math equations, and is great at classifying customer tickets. Read more in our report here.

Limits

There are a few limits that we’ve observed:

1) OpenAI hides the actual chain of thought reasoning, and there is no way for you to measure how long a given answer will take, or to understand how the model go to the answer.

2) The model doesn’t come with more features out of the box — things like streaming, temperature setting, use of tools and others aren’t available for this model. This can hinder many frequent use-cases today.

3) Takes too long to reach an answer. Now, more than ever, we should think about our tasks and which models are most suitable to solve them. If your use-case is not sensitive to latency you can use OpenAI o1, but for most of them test GPT-4o models and balance the tradeoffs.

4) Not the best model for all use-cases. Human-experts that evaluated this model said that they don’t prefer it for some natural language tasks like creative writing.

The Potential

We see the o1 models as the GPT-2 models of their time. This is just the first step, and we’ll unlock new opportunities from these models as they’re further developed, and integrated with the tools/features we need.

While GPT-4o and alike are great models to handle various production cases, the o1 technology might power more agentic applications.

Think more “Cursor AI” than Klarna chatbots.

Think more “Devins” than Github Copilot.

In these cases, you won’t mind waiting a bit longer, as completing the task or finding the right solution is more important than getting an immediate responses. We might also mix reasoning models for “planning” tasks, and use much faster models for the execution.

The coming years will set the course for the future.

If you want to learn more about these changes, receive feedback on your use case, or evaluate these models for your task, reach out to our AI experts here.

ABOUT THE AUTHOR
Anita Kirkovska
Founding Growth Lead

An AI expert with a strong ML background, specializing in GenAI and LLM education. A former Fulbright scholar, she leads Growth and Education at Vellum, helping companies build and scale AI products. She conducts LLM evaluations and writes extensively on AI best practices, empowering business leaders to drive effective AI adoption.

ABOUT THE reviewer

No items found.
lAST UPDATED
Sep 13, 2024
share post
Expert verified
Related Posts
Guides
October 21, 2025
15 min
AI transformation playbook
LLM basics
October 20, 2025
8 min
The Top Enterprise AI Automation Platforms (Guide)
LLM basics
October 10, 2025
7 min
The Best AI Workflow Builders for Automating Business Processes
LLM basics
October 7, 2025
8 min
The Complete Guide to No‑Code AI Workflow Automation Tools
All
October 6, 2025
6 min
OpenAI's Agent Builder Explained
Product Updates
October 1, 2025
7
Vellum Product Update | September
The Best AI Tips — Direct To Your Inbox

Latest AI news, tips, and techniques

Specific tips for Your AI use cases

No spam

Oops! Something went wrong while submitting the form.

Each issue is packed with valuable resources, tools, and insights that help us stay ahead in AI development. We've discovered strategies and frameworks that boosted our efficiency by 30%, making it a must-read for anyone in the field.

Marina Trajkovska
Head of Engineering

This is just a great newsletter. The content is so helpful, even when I’m busy I read them.

Jeremy Hicks
Solutions Architect

Experiment, Evaluate, Deploy, Repeat.

AI development doesn’t end once you've defined your system. Learn how Vellum helps you manage the entire AI development lifecycle.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Build AI agents in minutes with Vellum
Build agents that take on the busywork and free up hundreds of hours. No coding needed, just start creating.

General CTA component, Use {{general-cta}}

Build AI agents in minutes with Vellum
Build agents that take on the busywork and free up hundreds of hours. No coding needed, just start creating.

General CTA component  [For enterprise], Use {{general-cta-enterprise}}

The best AI agent platform for enterprises
Production-grade rigor in one platform: prompt builder, agent sandbox, and built-in evals and monitoring so your whole org can go AI native.

[Dynamic] Ebook CTA component using the Ebook CMS filtered by name of ebook.
Use {{ebook-cta}} and add a Ebook reference in the article

Thank you!
Your submission has been received!
Oops! Something went wrong while submitting the form.
Button Text

LLM leaderboard CTA component. Use {{llm-cta}}

Check our LLM leaderboard
Compare all open-source and proprietary model across different tasks like coding, math, reasoning and others.

Case study CTA component (ROI)

40% cost reduction on AI investment
Learn how Drata’s team uses Vellum and moves fast with AI initiatives, without sacrificing accuracy and security.

Case study CTA component (cutting eng overhead) = {{coursemojo-cta}}

6+ months on engineering time saved
Learn how CourseMojo uses Vellum to enable their domain experts to collaborate on AI initiatives, reaching 10x of business growth without expanding the engineering team.

Case study CTA component (Time to value) = {{time-cta}}

100x faster time to deployment for AI agents
See how RelyHealth uses Vellum to deliver hundreds of custom healthcare agents with the speed customers expect and the reliability healthcare demands.

[Dynamic] Guide CTA component using Blog Post CMS, filtering on Guides’ names

100x faster time to deployment for AI agents
See how RelyHealth uses Vellum to deliver hundreds of custom healthcare agents with the speed customers expect and the reliability healthcare demands.
New CTA
Sorts the trigger and email categories

Dynamic template box for healthcare, Use {{healthcare}}

Start with some of these healthcare examples

SOAP Note Generation Agent
Extract subjective and objective info, assess and output a treatment plan.
Prior authorization navigator
Automate the prior authorization process for medical claims.

Dynamic template box for insurance, Use {{insurance}}

Start with some of these insurance examples

Insurance claims automation agent
Collect and analyze claim information, assess risk and verify policy details.
Agent that summarizes lengthy reports (PDF -> Summary)
Summarize all kinds of PDFs into easily digestible summaries.
AI agent for claims review
Review healthcare claims, detect anomalies and benchmark pricing.

Dynamic template box for eCommerce, Use {{ecommerce}}

Start with some of these eCommerce examples

E-commerce shopping agent
Check order status, manage shopping carts and process returns.

Dynamic template box for Marketing, Use {{marketing}}

Start with some of these marketing examples

ReAct agent for web search and page scraping
Gather information from the internet and provide responses with embedded citations.
LinkedIn Content Planning Agent
Create a 30-day Linkedin content plan based on your goals and target audience.

Dynamic template box for Sales, Use {{sales}}

Start with some of these sales examples

Research agent for sales demos
Company research based on Linkedin and public data as a prep for sales demo.

Dynamic template box for Legal, Use {{legal}}

Start with some of these legal examples

Legal document processing agent
Process long and complex legal documents and generate legal research memorandum.
Legal RAG chatbot
Chatbot that provides answers based on user queries and legal documents.

Dynamic template box for Supply Chain/Logistics, Use {{supply}}

Start with some of these supply chain examples

Risk assessment agent for supply chain operations
Comprehensive risk assessment for suppliers based on various data inputs.

Dynamic template box for Edtech, Use {{edtech}}

Start with some of these edtech examples

Turn LinkedIn Posts into Articles and Push to Notion
Convert your best Linkedin posts into long form content.

Dynamic template box for Compliance, Use {{compliance}}

Start with some of these compliance examples

No items found.

Dynamic template box for Customer Support, Use {{customer}}

Start with some of these customer support examples

Q&A RAG Chatbot with Cohere reranking
Trust Center RAG Chatbot
Read from a vector database, and instantly answer questions about your security policies.

Template box, 2 random templates, Use {{templates}}

Start with some of these agents

Legal contract review AI agent
Asses legal contracts and check for required classes, asses risk and generate report.
Financial Statement Review Workflow
Extract and review financial statements and their corresponding footnotes from SEC 10-K filings.

Template box, 6 random templates, Use {{templates-plus}}

Build AI agents in minutes

Insurance claims automation agent
Collect and analyze claim information, assess risk and verify policy details.
Population health insights reporter
Combine healthcare sources and structure data for population health management.
Financial Statement Review Workflow
Extract and review financial statements and their corresponding footnotes from SEC 10-K filings.
AI legal research agent
Comprehensive legal research memo based on research question, jurisdiction and date range.
Agent that summarizes lengthy reports (PDF -> Summary)
Summarize all kinds of PDFs into easily digestible summaries.
Research agent for sales demos
Company research based on Linkedin and public data as a prep for sales demo.

Build AI agents in minutes for

{{industry_name}}

Clinical trial matchmaker
Match patients to relevant clinical trials based on EHR.
Prior authorization navigator
Automate the prior authorization process for medical claims.
Population health insights reporter
Combine healthcare sources and structure data for population health management.
Legal document processing agent
Process long and complex legal documents and generate legal research memorandum.
Legal contract review AI agent
Asses legal contracts and check for required classes, asses risk and generate report.
Legal RAG chatbot
Chatbot that provides answers based on user queries and legal documents.

Case study results overview (usually added at top of case study)

What we did:

1-click

This is some text inside of a div block.

28,000+

Separate vector databases managed per tenant.

100+

Real-world eval tests run before every release.