Vellum is coming to the AI Engineering World's Fair in SF. Come visit our booth and get a live demo!

GPT-5: What should we expect?

Learn more about the expected GPT-5 features on improved reasoning, multimodality and accuracy on math & coding

Written by
Reviewed by

If you work with LLMs, you probably wait for each version of OpenAI’s GPT series with excitement. It’s reminiscent of the early iPhone days, where each subsequent model was touted as a significant upgrade from the predecessor. This time, the excitement is shared by both consumers and enterprises, as these innovations set the foundations of many advanced AI systems.

To date, this year has been dominated by GPT-4. More specifically, the derivative GPT-4 Turbo and GPT-4o/Omni model have been at the forefront. Turbo significantly improved GPT-4’s accuracy, and Omni extended GPT-4’s reasoning and interfacing to voice/audio.

Now, we’re expecting OpenAI to debut its next major installment in the GPT series sometime at the end of this year or early 2025. The timing aligns with their 1-2 year cadence of releasing major models.

Knowing OpenAI.. they’ll probably launch in 2025.

But what are developers expecting from this new model?

Let’s cover the latest, the timelines — and the expectations from developers who build with LLMs.

Double Launch: Project Strawberry and Project Orion

According to two insiders, there are two models associated with OpenAI’s next launch—Project Strawberry and Project Orion (yes, we know, it sounds a little silly). The former is a brand-new type of model, tackling the coveted problem of reasoning. The other is the actual successor to GPT-4.

What is Project Strawberry?

Previously known as Q*, Project Strawberry is OpenAI’s most secretive project. The latest info suggests we might see a distilled version of this model as soon as this fall, and it’s expected to:

  • Solve new math problems it’s never encountered before (but how?)
  • Take time to “think deeply” when planning its answers
  • Offer advanced reasoning capabilities that you can toggle on or off depending on how quickly you need a response

But, there is another project that’s being talked about now: Project Orion!

What is Project Orion?

Project Orion is expected to be the next flagship model by OpenAI. What’s novel, however, is that GPT-5 is not just being trained on direct Internet data, but also synthetic data that’s being generated by Project Strawberry.

To visualize this through an oversimplification, Project Strawberry would download and digest a paper on, say, chemical titration, generate an abundance of data that’s ingestible to an LLM, and then train Project Orion on it so that it could tackle chemistry problems. This was harder before, because those reasoning problems were presented in irregular and sparse ways across the existing Internet.

This sounds too good to be true — We’re definitely excited to try this one!

It’s worth saying that although these are just rumors, as these things go, they become realities in a few months in.

But what is OpenAI really doing behind the curtain to enable these features?

The Engine Driving GPT-5’s Capabilities

We put our thinking cap and talked with some experts to understand how OpenAI might be pushing the boundaries for their new models. Here’re three interesting observations:

Improved Reasoning with Built-in Prompting Techniques

These days, there are tons of studies showing what works well when prompting LLMs. One particularly effective method is Chain of Thought prompting, which helps the model reason more effectively. Also, Anthropic introduced the “Thinking” step where the model first lays out its reasoning in an XML tags, before answering the question in another tag.

So, what’s stopping OpenAI from incorporating these techniques into the model, allowing it to perform these steps behind the scenes before delivering an answer?

People are also expecting the model to rank responses internally, evaluating options before selecting the best one to output.

One downside might be slower responses—but they could include an option to toggle this feature on or off, as some rumors suggest.

Knowing when it’s wrong

When it comes to LLM hallucinations — you can’t really take them out completely, because they’ll hinder the models ability to be “creative”. So finding the right balance between the two has been something that a lot of people have been thinking about.

Logprobs was a great feature that helped with this. Logprobs in LLMs indicate the model’s confidence in each generated token, showing how likely it is to be the correct choice based on the preceding context. Higher log probabilities mean the token is more likely in that context, helping users see the model’s confidence in its output or explore other options the model considered.

So, maybe we’ll eventually see logprobs as a built-in feature, allowing the model to be more “confident” in its answers right from the start.

High Multimodality Improvements

Today, using GPT-4o for data extractions from images/pdfs is very constrained — but that can change very fast. Mostly because we’ve been using GPT-4o for a while now, and they can utilize all of our data (pdfs, images..) to improve the mutimodal capability of the next model a.k.a GPT-5 or alike.

The biggest jump from GPT-3.5 to GPT-4 came from widespread adoption, and now OpenAI has doubled its users since last year—so the training is in full swing!

What do customers need from GPT-5?

We scoured the internet and asked our customers — “What do you actually need from GPT-5 to improve your systems?”. We got a really obvious answer:

The most hoped for capabilities are increased context windows, improved reasoning and multimodal functionality, lower hallucinations, and, of course, accuracy bumps across all benchmarks (especially coding & math) at a lower price.

In short, many things that didn’t work before should suddenly start working. We’re expecting a leap similar to the one from GPT-3.5 to GPT-4 — faster, cheaper and more powerful models.

GPT-5 Release Date?

The rumors suggest that we might get an early version of Project Orion (aka GPT-5) this fall, but knowing OpenAI — plan for 2025.

Thus far, OpenAI has been releasing major models every 2 years, with some intermediary models in-between. The major models (e.g. GPT-2, GPT-3, etc.) have featured sizable leaps. The intermediary GPT models (e.g. GPT-3.5, GPT-4 Turbo, GPT-4o) overcame the most immediate hiccups that held back the respective flagship model.

A Quick Timeline of GPT releases

Let’s review a timeline of the previous GPT AI models. While most users have only learned about GPT recently, it’s been going through iterations for over a half-decade.

GPT Version Launch Date Details
GPT-1 June 2018 GPT-1 was OpenAI's inaugural flagship model, trained on just 40GB of data. It could rephrase and generate text, and do some translation. It could only respond to fairly short sentences.
GPT-2 February 2019 GPT-2 was trained on significantly more text, with over 1.5B parameters. It could maintain coherence and relevance far better than GPT-1.
GPT-3 June 2020 GPT-3 was trained on 570GB of text data with over 175B parameters. This time, GPT-3 was trained on large corpuses of knowledge such as Wikipedia. GPT-3 was a significant leap over GPT-2, attaining major spotlight. It was also criticized by the general public for biases.
GPT-3.5 December 2022 GPT-3.5 was similar to GPT-3, but it featured some additional techniques such as Reinforcement Learning with Human Feedback (RLHF) to make responses more human-friendly, allowing it to parse intent better.
GPT-4 March 2023 Like previous iterations, GPT-4 was trained on more data. This dramatically expanded its knowledge base. It also was better at cracking down on disallowed content. The difference between GPT-4 and GPT-3.5 is minimal for small tasks, but sizable for big ones. It also integrated images as a valid input.
GPT-4 Turbo November 2023 GPT-4 Turbo is a faster version of GPT-4 with some improved accuracy by cracking down on hallucinations.
GPT-4o May 2024 GPT-4o expanded the multimodality of GPT-4, now able to handle not just text and images, but also voice interactions, allowing an end user to speak directly to it.
GPT-5 Rumored late 2024 or early 2025 Release date TBD. Should be the most advanced AI model to date.

Thus far, what has Sam Altman hinted at?

As goes the tune of many Reddit comments, we know to take OpenAI CEO Sam Altman’s comments with a grain of salt. It’s not a matter of dishonesty; hype cycles are just games of exaggeration and headlines, and his opinions are often harvested by online discourse for the spiciest bits. Rumors of previous models have always wavered from existential dread to praising monumental advancements. (Admittedly, both descriptors can coexist, but in our experience, GPT models rarely amount to either extreme.)

Regardless, Sam’s, Open AI’s, and OpenAI’s partners’ comments still matter. So far, there have been two major themes hinted for GPT-5's advantages over previous versions: multimodality and reasoning.

Multimodality

GPT-4o’s hallmark achievement was allowing users to interact with it via speech. It also integrated with DALLE, enabling users to requested generated images related to the conversation.

This trend will only continue with GPT-5, according to Sam Altman. GPT-5’s flagship feature will also be multimodality, with text, images, and videos being valid inputs and available outputs. Unlike GPT-4o, GPT-5 should be able to work with audiovisual data seamlessly, where there is a consistent thread between them, not just one-off generations.

This has been a constant gripe from our customers with the current model, where images aren’t consistent with one another, making the multimodality feel more like an integration than a native feature. GPT-5 should fix that.

Better “Reasoning”

While GPT cannot reason from an anthropomorphic standpoint, it can simulate reasoning through probabilistic inference.

On Bill Gates’s Unconfuse Me, Sam Altman spoke on how GPT-4o featured major reasoning (and accuracy) improvements over GPT-4, and that trend should continue with GPT-5 due to the sizable leaps in training size. Microsoft’s CTO, Kevin Scott, was more forthright with GPT-5’s promise, expecting it could “pass your qualifying exams when you’re a PhD student” and that everybody will be impressed by “reasoning breakthroughs” of the model.

How much will GPT-5 cost?

It is difficult to guess how much GPT-5 will cost. However, OpenAI has had a history of releasing the new flagship model at an expensive price, but then trimming the cost with subsequent models that are more streamlined and limited. We could expect the same pattern for GPT-5, especially if it can tackle niche tasks.

Conclusion

In short, all of this talk around the next GPT model—whether it’s Project Strawberry or Project Orion—is real, and everyone’s feeling it, from developers to businesses.

These new models promise to take things to the next level with smarter reasoning, better handling of different types of media, and overall stronger performance.

But as we look forward to these cool new features, we also need to think about the trade-offs, like how fast it responds and how accurate it is.

Whether OpenAI rolls these out later this year or in 2025, one thing’s clear: the next GPT model is going to shake things up in a big way.

ABOUT THE AUTHOR

ABOUT THE reviewer

Mathew Pregasen
Technical Contributor

Mathew Pregasen is a technical expert with experience with AI, infrastructure, security, and frontend frameworks. He contributes to multiple technical publications and is an alumnus of Columbia University and YCombinator.

Anita Kirkovska
Founding Growth Lead

An AI expert with a strong ML background, specializing in GenAI and LLM education. A former Fulbright scholar, she leads Growth and Education at Vellum, helping companies build and scale AI products. She conducts LLM evaluations and writes extensively on AI best practices, empowering business leaders to drive effective AI adoption.

lAST UPDATED
Aug 30, 2024
share post
Expert verified
Related Posts
Guides
October 21, 2025
15 min
AI transformation playbook
LLM basics
October 20, 2025
8 min
The Top Enterprise AI Automation Platforms (Guide)
LLM basics
October 10, 2025
7 min
The Best AI Workflow Builders for Automating Business Processes
LLM basics
October 7, 2025
8 min
The Complete Guide to No‑Code AI Workflow Automation Tools
All
October 6, 2025
6 min
OpenAI's Agent Builder Explained
Product Updates
October 1, 2025
7
Vellum Product Update | September
The Best AI Tips — Direct To Your Inbox

Latest AI news, tips, and techniques

Specific tips for Your AI use cases

No spam

Oops! Something went wrong while submitting the form.

Each issue is packed with valuable resources, tools, and insights that help us stay ahead in AI development. We've discovered strategies and frameworks that boosted our efficiency by 30%, making it a must-read for anyone in the field.

Marina Trajkovska
Head of Engineering

This is just a great newsletter. The content is so helpful, even when I’m busy I read them.

Jeremy Hicks
Solutions Architect

Experiment, Evaluate, Deploy, Repeat.

AI development doesn’t end once you've defined your system. Learn how Vellum helps you manage the entire AI development lifecycle.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Build AI agents in minutes with Vellum
Build agents that take on the busywork and free up hundreds of hours. No coding needed, just start creating.

General CTA component, Use {{general-cta}}

Build AI agents in minutes with Vellum
Build agents that take on the busywork and free up hundreds of hours. No coding needed, just start creating.

General CTA component  [For enterprise], Use {{general-cta-enterprise}}

The best AI agent platform for enterprises
Production-grade rigor in one platform: prompt builder, agent sandbox, and built-in evals and monitoring so your whole org can go AI native.

[Dynamic] Ebook CTA component using the Ebook CMS filtered by name of ebook.
Use {{ebook-cta}} and add a Ebook reference in the article

Thank you!
Your submission has been received!
Oops! Something went wrong while submitting the form.
Button Text

LLM leaderboard CTA component. Use {{llm-cta}}

Check our LLM leaderboard
Compare all open-source and proprietary model across different tasks like coding, math, reasoning and others.

Case study CTA component (ROI)

40% cost reduction on AI investment
Learn how Drata’s team uses Vellum and moves fast with AI initiatives, without sacrificing accuracy and security.

Case study CTA component (cutting eng overhead) = {{coursemojo-cta}}

6+ months on engineering time saved
Learn how CourseMojo uses Vellum to enable their domain experts to collaborate on AI initiatives, reaching 10x of business growth without expanding the engineering team.

Case study CTA component (Time to value) = {{time-cta}}

100x faster time to deployment for AI agents
See how RelyHealth uses Vellum to deliver hundreds of custom healthcare agents with the speed customers expect and the reliability healthcare demands.

[Dynamic] Guide CTA component using Blog Post CMS, filtering on Guides’ names

100x faster time to deployment for AI agents
See how RelyHealth uses Vellum to deliver hundreds of custom healthcare agents with the speed customers expect and the reliability healthcare demands.
New CTA
Sorts the trigger and email categories

Dynamic template box for healthcare, Use {{healthcare}}

Start with some of these healthcare examples

Prior authorization navigator
Automate the prior authorization process for medical claims.
SOAP Note Generation Agent
Extract subjective and objective info, assess and output a treatment plan.

Dynamic template box for insurance, Use {{insurance}}

Start with some of these insurance examples

Insurance claims automation agent
Collect and analyze claim information, assess risk and verify policy details.
AI agent for claims review
Review healthcare claims, detect anomalies and benchmark pricing.
Agent that summarizes lengthy reports (PDF -> Summary)
Summarize all kinds of PDFs into easily digestible summaries.

Dynamic template box for eCommerce, Use {{ecommerce}}

Start with some of these eCommerce examples

E-commerce shopping agent
Check order status, manage shopping carts and process returns.

Dynamic template box for Marketing, Use {{marketing}}

Start with some of these marketing examples

LinkedIn Content Planning Agent
Create a 30-day Linkedin content plan based on your goals and target audience.
ReAct agent for web search and page scraping
Gather information from the internet and provide responses with embedded citations.

Dynamic template box for Sales, Use {{sales}}

Start with some of these sales examples

Research agent for sales demos
Company research based on Linkedin and public data as a prep for sales demo.

Dynamic template box for Legal, Use {{legal}}

Start with some of these legal examples

Legal document processing agent
Process long and complex legal documents and generate legal research memorandum.
Legal contract review AI agent
Asses legal contracts and check for required classes, asses risk and generate report.

Dynamic template box for Supply Chain/Logistics, Use {{supply}}

Start with some of these supply chain examples

Risk assessment agent for supply chain operations
Comprehensive risk assessment for suppliers based on various data inputs.

Dynamic template box for Edtech, Use {{edtech}}

Start with some of these edtech examples

Turn LinkedIn Posts into Articles and Push to Notion
Convert your best Linkedin posts into long form content.

Dynamic template box for Compliance, Use {{compliance}}

Start with some of these compliance examples

No items found.

Dynamic template box for Customer Support, Use {{customer}}

Start with some of these customer support examples

Trust Center RAG Chatbot
Read from a vector database, and instantly answer questions about your security policies.
Q&A RAG Chatbot with Cohere reranking

Template box, 2 random templates, Use {{templates}}

Start with some of these agents

Legal RAG chatbot
Chatbot that provides answers based on user queries and legal documents.
ReAct agent for web search and page scraping
Gather information from the internet and provide responses with embedded citations.

Template box, 6 random templates, Use {{templates-plus}}

Build AI agents in minutes

Legal RAG chatbot
Chatbot that provides answers based on user queries and legal documents.
Insurance claims automation agent
Collect and analyze claim information, assess risk and verify policy details.
Clinical trial matchmaker
Match patients to relevant clinical trials based on EHR.
Trust Center RAG Chatbot
Read from a vector database, and instantly answer questions about your security policies.
Competitor research agent
Scrape relevant case studies from competitors and extract ICP details.
ReAct agent for web search and page scraping
Gather information from the internet and provide responses with embedded citations.

Build AI agents in minutes for

{{industry_name}}

Clinical trial matchmaker
Match patients to relevant clinical trials based on EHR.
Prior authorization navigator
Automate the prior authorization process for medical claims.
Population health insights reporter
Combine healthcare sources and structure data for population health management.
Legal document processing agent
Process long and complex legal documents and generate legal research memorandum.
Legal contract review AI agent
Asses legal contracts and check for required classes, asses risk and generate report.
Legal RAG chatbot
Chatbot that provides answers based on user queries and legal documents.

Case study results overview (usually added at top of case study)

What we did:

1-click

This is some text inside of a div block.

28,000+

Separate vector databases managed per tenant.

100+

Real-world eval tests run before every release.