
How to Count Tokens Before You Send an OpenAI API Request

Learn how to use Tiktoken and Vellum to programmatically count tokens before running OpenAI API requests.


As a developer working with OpenAI's language models, it's essential to understand how tokenization works and how to count tokens programmatically.

For those who prefer to handle tokenization programmatically, there are several libraries tailored for this purpose.

Tiktoken stands out as a fast BPE (Byte Pair Encoding) tokenizer (more on this below) designed specifically for OpenAI's models.

In this article, we'll explain how tokenization works, and how to use the Tiktoken library to count tokens before you send an OpenAI API request with Vellum.

LLM Tokenization

Tokenization is the process of splitting a text string into a list of tokens. OpenAI's models, such as GPT-3 and GPT-4, process text in the form of tokens rather than raw characters. By breaking down text into tokens, the models can better understand and generate human-like text.

To perform this splitting, we use a BPE tokenizer, in this case Tiktoken, because it's the fastest one available.

BPE, or byte pair encoding, converts text into numeric token IDs. Because it breaks text into frequently occurring subword pieces, it helps the model recognize common parts of words, which in turn helps it learn grammar and understand language. For example, an uncommon word like "tokenization" may be split into familiar pieces like "token" and "ization".

So, how can we count these tokens?

Introducing Tiktoken

Tiktoken is an open-source tokenizer developed by OpenAI that's 3-6x faster than comparable open-source tokenizers. It provides a convenient way to tokenize text and count tokens programmatically.

Different OpenAI models use different encodings:

  • cl100k_base: GPT-4, GPT-3.5-turbo, and text-embedding-ada-002.
  • p50k_base: Codex models, text-davinci-003, and text-davinci-002.
  • r50k_base: GPT-3 models.

Using Tiktoken in Python

To get started with Tiktoken in Python, follow these steps (or run this Colab notebook):

1. Install or upgrade Tiktoken:


pip install --upgrade tiktoken

2. Import Tiktoken in your Python script:


import tiktoken

3. Count tokens using one of the following options:

Option 1: Use tiktoken.encoding_for_model() to automatically load the correct encoding for a given OpenAI model


def num_tokens_from_string(string: str, model_name: str) -> int:
    # Look up the encoding that matches the given OpenAI model
    encoding = tiktoken.encoding_for_model(model_name)
    # Encode the string and count the resulting tokens
    num_tokens = len(encoding.encode(string))
    return num_tokens

print(num_tokens_from_string("Hello world, let's test tiktoken.", "gpt-3.5-turbo"))

Option 2: Specify the encoding directly


def num_tokens_from_string(string: str, encoding_name: str) -> int:
    # Load the encoding by name, e.g. "cl100k_base"
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

print(num_tokens_from_string("Hello world, let's test tiktoken.", "cl100k_base"))

Both options will output the number of tokens in the given text string.
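
If you want to see the actual subword pieces behind a count, you can decode each token ID individually. Here's a small sketch (the exact token IDs you see depend on the encoding):

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

# Encode a string into token IDs, then recover each token's bytes
tokens = encoding.encode("Hello world, let's test tiktoken.")
print(tokens)                                                   # integer token IDs
print([encoding.decode_single_token_bytes(t) for t in tokens])  # subword pieces as bytes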

You can run this script to check how OpenAI counts tokens for each model. But what if you want to count tokens programmatically to decide whether or not to send a request to OpenAI at all?

Let’s show you how you can do that with Vellum.

Counting Tokens Programmatically with Vellum

To decide when to send a request to OpenAI, you can use Vellum, an AI product development platform. With Vellum, you can set up a Workflow that checks if your input prompt is too long for a specific model's limit. It will flag any input that's too large before the request is sent to OpenAI.

Here's how to do it:

1. Create a Code Execution Node

In this node, we'll insert the code that runs Tiktoken. We're using Option 2, working with a specific encoding (cl100k_base), since we're using GPT-3.5-turbo for this example.

Here’s how we define this node:
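
The node's code is essentially Option 2 from above, wrapped for Vellum. As a rough sketch (Vellum's exact Code Execution Node interface may differ; we assume an input variable named prompt and an integer output):

import tiktoken

def main(prompt: str) -> int:
    # Count tokens with the cl100k_base encoding used by GPT-3.5-turbo
    encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(prompt))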

2. Create a Conditional Node

To catch prompts whose size exceeds a certain token limit, we'll attach a Conditional Node to the Code Execution Node. This node checks whether the token count is above or below the limit.

In this example, since the context limit for GPT-3.5-turbo is 16k, the node proceeds with the API call if the token count is under that limit. If the count exceeds 16k, the node runs a fallback Prompt Node instead. The fallback prompt in our example uses GPT-4 Turbo because it has a higher token limit (128k), but you can set your fallback prompt to be whatever you want.
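
For readers who want the equivalent logic outside of Vellum, here's a minimal sketch in plain Python (assuming the official openai client; the model names and limit are illustrative):

import tiktoken
from openai import OpenAI

client = OpenAI()
encoding = tiktoken.get_encoding("cl100k_base")

def run_with_fallback(prompt: str, limit: int = 16_000) -> str:
    # Route to the fallback model when the prompt won't fit GPT-3.5-turbo's context window
    model = "gpt-3.5-turbo" if len(encoding.encode(prompt)) < limit else "gpt-4-turbo-preview"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content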

3. Add Final Output

Finally, we attach a Final Output node to our Workflow, which passes through the selected prompt's output.

4. Run the Workflow & Test

Here's what the workflow run looks like when the prompt passes the token check:

And here's how the workflow runs when the prompt size is beyond the token limit (for simplicity, we use a token limit of 400):

If you want to programmatically count the tokens in your prompts, book a call here to get started with Vellum.

Why Count Tokens Programmatically?

Counting tokens is crucial for two main reasons:

  1. Text Length Limit: OpenAI models have a maximum token limit for input text. By counting tokens programmatically, you can determine whether a given text string exceeds the model's capacity before you send the API request.
  2. API Usage Costs: OpenAI's API usage is priced based on the number of tokens processed. Knowing the token count helps you estimate and manage the cost of your API calls, as shown in the sketch below. In the example above, we showed that you can fall back to a more expensive model with a larger context window when your prompt won't fit the context limit of a cheaper model like GPT-3.5 Turbo.
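
For instance, here's a minimal sketch of estimating input cost from a token count (the per-million-token prices are illustrative assumptions, not official figures; always check OpenAI's current pricing page):

import tiktoken

# Illustrative input prices in USD per 1M tokens (assumptions, not official pricing)
PRICE_PER_MILLION = {"gpt-3.5-turbo": 0.50, "gpt-4-turbo": 10.00}

def estimate_input_cost(prompt: str, model: str) -> float:
    # Count the prompt's tokens, then convert to dollars
    encoding = tiktoken.encoding_for_model(model)
    num_tokens = len(encoding.encode(prompt))
    return num_tokens / 1_000_000 * PRICE_PER_MILLION[model]

print(estimate_input_cost("Hello world, let's test tiktoken.", "gpt-3.5-turbo"))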

Tokenization in Other Languages

Tiktoken and other tokenizer libraries are available in various programming languages. For the cl100k_base and p50k_base encodings, community ports exist in several languages, and for the r50k_base (gpt2) encoding, tokenizers are available in many languages.

(Vellum makes no endorsements or guarantees of third-party libraries.)

Tokenization in Chat Mode

Chat models like gpt-3.5-turbo and gpt-4 use tokens in the same way as other models.

However, due to their message-based formatting, counting tokens for a conversation can be more challenging. If a conversation exceeds the model's token limit, you'll need to truncate, omit, or shrink the text to fit within the limit.
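
As a rough sketch modeled on OpenAI's cookbook approach, you can approximate a conversation's token count by adding a few tokens of formatting overhead per message (the exact overhead varies by model version; the numbers below are assumptions for gpt-3.5-turbo):

import tiktoken

def num_tokens_from_messages(messages: list[dict], model: str = "gpt-3.5-turbo") -> int:
    # Approximate count: per-message formatting overhead plus content tokens
    encoding = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # assumed overhead per message for gpt-3.5-turbo
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for value in message.values():
            num_tokens += len(encoding.encode(value))
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens

print(num_tokens_from_messages([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How many tokens is this conversation?"},
]))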

Keep in mind that very long conversations are more likely to receive incomplete replies. For example, with the original gpt-3.5-turbo's 4,096-token context window shared between prompt and completion, a conversation that is 4,090 tokens long leaves room for a reply of just 6 tokens.

Conclusion

Understanding tokenization and how to count tokens is essential for working effectively with OpenAI's language models.

By using the Tiktoken library or other tokenizer libraries in your preferred programming language, you can easily count tokens and ensure that your text input fits within the model's limitations while managing API usage costs.

ABOUT THE AUTHOR
Anita Kirkovska
Founding Growth Lead

An AI expert with a strong ML background, specializing in GenAI and LLM education. A former Fulbright scholar, she leads Growth and Education at Vellum, helping companies build and scale AI products. She conducts LLM evaluations and writes extensively on AI best practices, empowering business leaders to drive effective AI adoption.

Last updated: Mar 27, 2024