
Tree of Thought Prompting: What It Is and How to Use It

Learn how to use Tree of Thought prompting to improve LLM results

Author

Anita Kirkovska

Nov 30, 2023

Different prompting techniques can improve the results from your large language models (LLMs).

One effective technique is Tree of Thought prompting, known for its ability to handle complex reasoning tasks.

In this blog post, we’ll explain the framework, walk through an example, and give you advice for your own use cases.


Why do you need prompting techniques?

You can achieve a lot with simple prompts, but the quality of your results will always depend on the quality of your prompt and the information you provide.

A few advanced prompting techniques can improve your prompts by guiding the LLM toward better answers with fewer hallucinations. They are especially useful when you expect the LLM to solve new, unseen problems that require intermediate reasoning steps.

There are many prompting techniques, such as Few-Shot prompting and Chain of Thought prompting, which we covered in another post.

Today we’ll look at Tree of Thoughts.


Tree of Thoughts (ToT) framework

The Tree of Thoughts (ToT) framework is inspired by how humans solve complex reasoning problems through trial and error. Put simply, this technique guides the LLM to explore different ideas and re-evaluate them when needed in order to arrive at the best solution.

Because it isn’t committed to a single chain of reasoning, ToT can outperform Chain of Thought prompting on tasks that benefit from exploring alternatives. The trade-off is that the full framework requires a fair amount of code and custom search algorithms to traverse the tree and find the best reasoning path.

How does it work?

ToT creates a tree-like structure of ideas, where each idea is a step towards solving a problem. This approach enables the LLM to self-evaluate the intermediate “thoughts” and decide whether to continue with that path or choose another.

To perform this, the authors of the ToT framework augment the LLM with search algorithms like breadth-first search and depth-first search.
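To make the search step concrete, here is a minimal Python sketch of both strategies. It is not the authors’ implementation: `propose` and `score` are hypothetical stand-ins for the two LLM calls ToT relies on (one proposes candidate next thoughts, the other self-evaluates a partial path), and the toy task (pick numbers from {1, 3, 4, 6} until the running total reaches 10) is an assumption that exists only so the code runs without a model.

```python
from typing import List, Optional, Tuple

Path = Tuple[int, ...]  # the sequence of "thoughts" chosen so far

# Hypothetical toy task: reach TARGET by adding numbers, in at most MAX_DEPTH steps.
TARGET = 10
MAX_DEPTH = 3


def propose(path: Path) -> List[int]:
    """Hypothetical stand-in for an LLM call that proposes candidate next thoughts."""
    return [1, 3, 4, 6]


def score(path: Path) -> float:
    """Hypothetical stand-in for the LLM's self-evaluation of a partial path.

    Returns -1.0 for a dead end; otherwise a value in [0, 1] that grows
    as the running total approaches TARGET.
    """
    total = sum(path)
    if total > TARGET:
        return -1.0  # overshot: no continuation can succeed
    return 1.0 - (TARGET - total) / TARGET


def is_solution(path: Path) -> bool:
    return sum(path) == TARGET


def bfs(beam_width: int = 2) -> Optional[Path]:
    """Breadth-first search that keeps the `beam_width` best paths per level."""
    frontier: List[Path] = [()]
    for _ in range(MAX_DEPTH):
        # Expand every path in the frontier by every proposed thought.
        children = [p + (t,) for p in frontier for t in propose(p)]
        # Drop branches the evaluator marks as dead ends.
        children = [c for c in children if score(c) >= 0]
        for c in children:
            if is_solution(c):
                return c
        # Keep only the most promising partial paths for the next level.
        frontier = sorted(children, key=score, reverse=True)[:beam_width]
    return None


def dfs(path: Path = (), threshold: float = 0.0) -> Optional[Path]:
    """Depth-first search that tries the best-scoring thought first and backtracks."""
    if is_solution(path):
        return path
    if len(path) == MAX_DEPTH:
        return None
    ranked = sorted(propose(path), key=lambda t: score(path + (t,)), reverse=True)
    for t in ranked:
        child = path + (t,)
        if score(child) < threshold:
            continue  # self-evaluation says this branch is hopeless: prune it
        found = dfs(child, threshold)
        if found is not None:
            return found
    return None  # every branch failed: backtrack to the caller


if __name__ == "__main__":
    print("BFS:", bfs())
    print("DFS:", dfs())
```

Breadth-first search here keeps only the `beam_width` most promising partial paths at each level, similar in spirit to the BFS variant described in the ToT paper, while depth-first search follows the best-scoring branch first and backtracks whenever the evaluator marks a branch as a dead end. In a real system, each call to `propose` and `score` would be a prompt to the LLM.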

Use-cases

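The authors showed that on tasks such as mathematical reasoning (the Game of 24) and creative writing, Tree of Thoughts significantly outperforms other prompting methods. On the Game of 24, for example, GPT-4 with Chain of Thought prompting solved only 4% of the tasks, while ToT solved 74%.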

This is certainly interesting. But can we use this framework in a simple, single prompt?

Turns out we can.


Tree of Thoughts (ToT) prompting (with examples)

Tree of Thoughts prompting is an adaptation of the ToT framework in which its author tries to fit the whole concept into a single prompt.

The example below showcases the structure of the prompt.

Note that GPT-3.5 answers the example below incorrectly with Chain of Thought prompting but correctly with ToT prompting.

Example

ToT prompt

Imagine three different experts are answering this question.
All experts will write down 1 step of their thinking, then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they're wrong at any point then they leave.
The question is...


The response from GPT-3.5:

Expert 1: The ball is in the living room.
Expert 2: The ball is in the kitchen.
Expert 3: The ball is in the bedroom.
Expert 1: Bob carries the cup to the bedroom, so the ball must be in the cup.
Expert 2: Oh, I see my mistake. Yes, the ball is in the cup.
Expert 3: Agreed, the ball is in the cup in the bedroom.
Expert 1: Next, Bob turns the cup upside down in the bedroom.
Expert 2: After that, Bob walks to the garden and puts the cup down.
Expert 3: Therefore, the ball must have fallen out of the cup when Bob turned it upside down in the bedroom. So, the ball is in the bedroom, not in the cup anymore.
Expert 1: Oh, you're right. I made a mistake. The ball is in the bedroom, not in the cup.
Expert 2: Agreed, the ball is in the bedroom.
Expert 3: Bob then walks to the garage, so the ball remains in the bedroom. It is not in the garden or the garage.
Expert 1: Absolutely, the ball is still in the bedroom.
Expert 2: Yes, the ball hasn't moved from the bedroom.
Expert 3: Therefore, the ball is in the bedroom, not in the garden or the garage.
All three experts agree that the ball is in the bedroom.
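If you want to reuse this prompt in code, one simple approach is to keep the four instruction lines as a constant and append each question to them. The sketch below is a minimal Python example; `build_tot_prompt` is our own hypothetical helper, not part of any library, and you would pass the returned string as the user message to whichever chat API you use.

```python
# The four instruction lines of the ToT prompt, reproduced from the example above.
TOT_INSTRUCTIONS = "\n".join([
    "Imagine three different experts are answering this question.",
    "All experts will write down 1 step of their thinking, then share it with the group.",
    "Then all experts will go on to the next step, etc.",
    "If any expert realises they're wrong at any point then they leave.",
])


def build_tot_prompt(question: str) -> str:
    """Hypothetical helper: append the user's question to the ToT instructions."""
    question = question.strip()
    if not question:
        raise ValueError("question must be non-empty")
    return f"{TOT_INSTRUCTIONS}\nThe question is... {question}"


if __name__ == "__main__":
    print(build_tot_prompt("Where is the ball?"))
```

Keeping the instructions in one constant makes it easy to run the ToT prompt and a plain or Chain of Thought prompt on the same questions and compare the answers.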


Limited data from the prompt’s author suggests that this technique does better than Chain of Thought prompting on GPT-3.5, so it’s worth exploring if you want better results from a less expensive model than GPT-4.

It’s always useful to compare various techniques with different models and evaluate which one will produce the best result for your use-case.
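Comparing techniques can start as a small offline harness: run every prompting technique against every model on a labeled set of questions and score the answers. The sketch below assumes a hypothetical `call_model(model_name, prompt)` function that wraps whatever LLM API you use; the model names, dataset, and stub are placeholders, and the substring grader is deliberately crude.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature for a model call: (model_name, prompt) -> completion text.
ModelFn = Callable[[str, str], str]
# A technique turns a raw question into the prompt sent to the model.
Technique = Callable[[str], str]


def compare(
    techniques: Dict[str, Technique],
    models: List[str],
    dataset: List[Tuple[str, str]],
    call_model: ModelFn,
) -> Dict[Tuple[str, str], float]:
    """Return the accuracy of every (technique, model) pair on `dataset`.

    An answer counts as correct if the expected string appears in the
    completion (case-insensitive).
    """
    results: Dict[Tuple[str, str], float] = {}
    for tech_name, build_prompt in techniques.items():
        for model in models:
            correct = sum(
                expected.lower() in call_model(model, build_prompt(question)).lower()
                for question, expected in dataset
            )
            results[(tech_name, model)] = correct / len(dataset)
    return results


if __name__ == "__main__":
    dataset = [("What is 2 + 2?", "4"), ("What is the capital of France?", "Paris")]
    techniques = {
        "zero-shot": lambda q: q,
        "chain-of-thought": lambda q: f"{q}\nLet's think step by step.",
    }

    def stub_model(model: str, prompt: str) -> str:
        # Offline placeholder for a real API call: always replies "4".
        return "4"

    scores = compare(techniques, ["model-a", "model-b"], dataset, stub_model)
    for (tech, model), acc in scores.items():
        print(f"{tech:>16} | {model} | {acc:.0%}")
```

The stub returns the same answer for every prompt, so its scores mean nothing; replace it with a real API call and a grader suited to your task (exact match, a regex, or an LLM judge) before drawing conclusions.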


Want to compare prompting techniques?

If you want to experiment with different prompting techniques across different models for your use case, we can help!

Vellum has the tooling layer to experiment with prompts and models, evaluate their quality, and make changes with confidence once in production.

You can take a look at our use-cases, or book a call to talk with someone from our team.


ABOUT THE AUTHOR

Anita Kirkovska

Founding Growth Lead

An AI expert with a strong ML background, specializing in GenAI and LLM education. A former Fulbright scholar, she leads Growth and Education at Vellum, helping companies build and scale AI products. She conducts LLM evaluations and writes extensively on AI best practices, empowering business leaders to drive effective AI adoption.
