Top K

Top K is a setting supported by some LLMs; it determines how many of the most likely tokens are considered when generating a response. Top K is sometimes stylized as `top_k` in the literature.
LLMs, or Large Language Models, are trained on huge corpora of text. Consequently, they feature massive vocabularies. However, some words are significantly more likely to appear (e.g. `the`, `you`, `jump`) than others (e.g. `omnivore`, `innovation`, `matrimony`).
These words are cataloged as tokens, the fundamental unit of LLMs. Tokenizing longer words may involve splitting them into smaller strings (e.g. `omnivore` → `omni` and `vore`).
Top K is an integer that defines how many of the most likely tokens should be considered when determining the next token.
To provide an example, imagine a response that has thus far generated the string: `On burgers, I like to add`. With a Top K of `2`, the LLM would only consider the two most likely tokens, such as `ketchup` (0.2) and `mustard` (0.1). There's a long tail of other candidates, such as `onion` (0.05), `pickles` (0.04), or `butter` (0.02), but those would be cropped from consideration.
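To make the cropping concrete, here is a minimal sketch of top-k sampling in Python. The token strings and probabilities are the toy values from the example above, not real model outputs:

```python
import numpy as np

def top_k_sample(tokens, probs, k, rng=None):
    """Sample the next token from only the k most likely candidates."""
    rng = rng or np.random.default_rng()
    probs = np.asarray(probs, dtype=float)
    top = np.argsort(probs)[::-1][:k]      # indices of the k highest-probability tokens
    kept = probs[top] / probs[top].sum()   # renormalize the surviving probabilities
    return tokens[rng.choice(top, p=kept)]

# The burger example above: with k=2, only "ketchup" and "mustard" survive.
tokens = ["ketchup", "mustard", "onion", "pickles", "butter"]
probs = [0.2, 0.1, 0.05, 0.04, 0.02]
print(top_k_sample(tokens, probs, k=2))  # prints "ketchup" or "mustard", never "butter"
```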
An alternative to Top K is Top P. While Top K specifies an explicit quantity of tokens, Top P instead denotes a cumulative probability that the considered subset must reach, so the size of that subset can vary significantly. Please note that OpenAI's API only supports Top P, not Top K.
You can set Top K on Anthropic's Messages API with the optional `top_k` parameter. `top_k` strictly accepts an integer value.
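As a sketch, using Anthropic's official Python SDK (the model name below is only an example; substitute whichever Claude model you have access to):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # illustrative model name
    max_tokens=100,
    top_k=2,  # only the 2 most likely tokens are considered at each step
    messages=[{"role": "user", "content": "On burgers, I like to add"}],
)
print(message.content[0].text)
```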
You can set Top K on Gemini's API with the optional `topK` value. Here, `topK` also accepts an integer value.
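A similar sketch with Google's `google-generativeai` Python SDK, which spells the parameter `top_k` (the REST API uses `topK` inside `generationConfig`); the model name is again illustrative:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model name
response = model.generate_content(
    "On burgers, I like to add",
    generation_config=genai.types.GenerationConfig(top_k=2),  # consider only the top 2 tokens
)
print(response.text)
```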
By increasing or decreasing Top K, you can see how repetitive or complex responses can get, particularly in their vocabulary and phrasing. With a very low Top K, such as `1`, you'll get more predictable responses. With a very high Top K, you'll get more variance.
Alternatively, you can experiment with Top P, which is similar to Top K but instead specifies the probabilistic sum of the considered tokens. Top P is more popular than Top K because it accounts for fast and slow drop-offs in probabilities. Because they are both limiters, Top P and Top K shouldn't be used simultaneously.
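To see why the subset size varies under Top P, here's a companion sketch applying a Top P (nucleus) cutoff to the same toy burger distribution:

```python
import numpy as np

def top_p_filter(probs, p):
    """Return indices of the smallest set of tokens whose cumulative probability reaches p."""
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()                  # normalize to a proper distribution
    order = np.argsort(probs)[::-1]              # most likely tokens first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # first position where the running sum hits p
    return order[:cutoff]

probs = [0.2, 0.1, 0.05, 0.04, 0.02]
print(top_p_filter(probs, p=0.5))  # steep drop-off: keeps only 2 tokens
print(top_p_filter(probs, p=0.9))  # reaches into the flatter tail: keeps 4 tokens
```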
Top K can be useful in certain scenarios:

- Setting Top K to `1`, because the LLM will only consider the most likely token. This makes the output deterministic and is known as a greedy response.
- Comparing responses generated at low and high Top K values (e.g. `1` and `10`) to create bland and "creative" options.
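A self-contained toy demonstration of both scenarios, using the same illustrative probabilities as before:

```python
import numpy as np

def top_k_sample(tokens, probs, k, rng):
    """Sample the next token from only the k most likely candidates."""
    probs = np.asarray(probs, dtype=float)
    top = np.argsort(probs)[::-1][:k]
    kept = probs[top] / probs[top].sum()
    return tokens[rng.choice(top, p=kept)]

tokens = ["ketchup", "mustard", "onion", "pickles", "butter"]
probs = [0.2, 0.1, 0.05, 0.04, 0.02]
rng = np.random.default_rng(0)

# k=1 is greedy: every draw returns the single most likely token.
print({top_k_sample(tokens, probs, k=1, rng=rng) for _ in range(5)})  # {'ketchup'}
# A high k reintroduces variance across draws.
print({top_k_sample(tokens, probs, k=5, rng=rng) for _ in range(5)})  # e.g. {'ketchup', 'mustard', 'onion'}
```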