How to Control LLM Output Using Temperature, Top‑K, and Top‑P
This article explains how the sampling parameters Temperature, Top-K, and Top-P shape the output of large language models. It compares greedy and beam search, illustrates with concrete examples how each parameter reshapes the probability distribution, and offers practical guidance on tuning these settings for different tasks.
Large language models (LLMs) generate text by predicting the next token from a probability distribution over the vocabulary. For the partial sentence “The cat sat on the…”, a typical distribution might be:
mat = 0.5, couch = 0.3, roof = 0.1, all remaining tokens = 0.1 combined
Greedy Search
Greedy search selects the highest‑probability token at each step. In the example above the model would always choose “mat”. This method is fast and deterministic but can produce repetitive or less diverse text.
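As a minimal sketch, greedy decoding over the toy distribution reduces to an argmax (token names and probabilities follow the article's running example, using the five-token breakdown from the Top-P section so the mass sums to 1):

```python
# Toy next-token distribution for "The cat sat on the ..." (article's example).
probs = {"mat": 0.5, "couch": 0.3, "roof": 0.1, "floor": 0.07, "bed": 0.03}

def greedy_pick(probs):
    """Greedy search: always take the single highest-probability token."""
    return max(probs, key=probs.get)

print(greedy_pick(probs))  # -> mat, every single time
```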
Beam Search
Beam search keeps the top k most likely partial sequences (the beam width) at each step, expands each with possible next tokens, and retains the k sequences with the highest combined scores.
With a beam width of 2 the two beams evolve as follows:
Beam 1: “The cat sat on the mat” (0.5). Next‑word candidates: “and” (0.6), “while” (0.3).
Beam 2: “The cat sat on the couch” (0.3). Next‑word candidates: “while” (0.4), “and” (0.3).
Combined probabilities:
“The cat sat on the mat and…” = 0.5 × 0.6 = 0.30
“The cat sat on the couch while…” = 0.3 × 0.4 = 0.12
Beam search selects the higher‑scoring sequence, often yielding more coherent results than greedy search.
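The two-step walk-through above can be sketched in a few lines, with the candidate probabilities taken from the example (they are hypothetical values, not real model outputs). Note that a full expansion also scores the sequence "mat while" (0.5 × 0.3 = 0.15), which outranks "couch while" for the second beam slot:

```python
import heapq

# Step-1 beams and their probabilities (from the running example).
step1 = {"mat": 0.5, "couch": 0.3}
# Hypothetical next-word candidates for each beam, as in the article.
step2 = {
    "mat": {"and": 0.6, "while": 0.3},
    "couch": {"while": 0.4, "and": 0.3},
}

def beam_search_step(step1, step2, width=2):
    """Expand every beam, score each sequence by the product of its
    token probabilities, and keep the `width` best sequences."""
    beams = heapq.nlargest(width, step1.items(), key=lambda kv: kv[1])
    expanded = [
        ((tok1, tok2), p1 * p2)
        for tok1, p1 in beams
        for tok2, p2 in step2[tok1].items()
    ]
    return heapq.nlargest(width, expanded, key=lambda kv: kv[1])

# Best sequence: ("mat", "and") with score 0.5 * 0.6 = 0.30.
print(beam_search_step(step1, step2))
```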
Sampling
Sampling draws a token according to its probability rather than always picking the top one. Using the same distribution the model might output “couch” or even “roof”, introducing variability.
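A sketch of plain sampling over the same toy distribution; `random.choices` performs the weighted draw:

```python
import random

probs = {"mat": 0.5, "couch": 0.3, "roof": 0.1, "floor": 0.07, "bed": 0.03}

def sample_token(probs, rng=random):
    """Draw one token in proportion to its probability."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

random.seed(0)
draws = [sample_token(probs) for _ in range(1000)]
# "mat" appears most often (around half the draws), but lower-probability
# tokens like "roof" and "bed" show up as well.
```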
Temperature
Temperature (T) rescales the raw logits before the softmax, controlling randomness. T = 1 leaves the distribution unchanged; values below 1 make the output more deterministic, and values above 1 make it more random. Typical settings fall between 0 and 1.
Original probabilities (T = 1): mat = 0.5, couch = 0.3, roof = 0.1
T = 0.2: mat = 0.8, couch = 0.15, roof = 0.05
T = 0.7: mat = 0.55, couch = 0.35, roof = 0.10
Lower temperatures sharpen the distribution (making high‑probability tokens even more likely); higher temperatures flatten it (giving low‑probability tokens a chance).
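Temperature rescaling can be sketched directly from the definition: divide each logit (log-probability) by T, then re-apply the softmax, which is equivalent to raising each probability to the power 1/T and renormalizing. The in-text numbers above are illustrative; this computes exact values for the toy distribution:

```python
import math

probs = {"mat": 0.5, "couch": 0.3, "roof": 0.1, "floor": 0.07, "bed": 0.03}

def apply_temperature(probs, T):
    """Divide the logits (log-probabilities) by T, then re-apply softmax."""
    scaled = {tok: math.exp(math.log(p) / T) for tok, p in probs.items()}
    z = sum(scaled.values())
    return {tok: v / z for tok, v in scaled.items()}

# Low T sharpens: "mat" climbs above 0.9.  High T flattens: it drops below 0.4.
sharp = apply_temperature(probs, 0.2)
flat = apply_temperature(probs, 2.0)
```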
Top‑P (Nucleus) Sampling
Top‑P selects the smallest set of tokens whose cumulative probability reaches or exceeds p, then samples from this subset (the "nucleus").
Example probabilities: mat = 0.5, couch = 0.3, roof = 0.1, floor = 0.07, bed = 0.03.
If p = 0.5, only “mat” is kept.
If p = 0.9, “mat”, “couch”, and “roof” remain.
If p = 1.0, all tokens are eligible (full random sampling).
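A sketch of the nucleus filter: sort tokens by probability, accumulate until the mass reaches p, and renormalize the survivors before sampling:

```python
probs = {"mat": 0.5, "couch": 0.3, "roof": 0.1, "floor": 0.07, "bed": 0.03}

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, renormalized so the kept mass sums to 1."""
    kept, cumulative = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    z = sum(kept.values())
    return {tok: v / z for tok, v in kept.items()}

print(list(top_p_filter(probs, 0.5)))  # ['mat']
print(list(top_p_filter(probs, 0.9)))  # ['mat', 'couch', 'roof']
```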
Top‑K Sampling
Top‑K limits consideration to the K highest‑probability tokens at each step, then samples from them.
K = 2 → keep “mat”(0.5) and “couch”(0.3).
K = 3 → keep “mat”(0.5), “couch”(0.3), “roof”(0.1).
K = vocabulary size → full sampling.
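Top-K is the same filtering idea with a fixed token count instead of a probability mass:

```python
probs = {"mat": 0.5, "couch": 0.3, "roof": 0.1, "floor": 0.07, "bed": 0.03}

def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens, renormalized."""
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    z = sum(kept.values())
    return {tok: v / z for tok, v in kept.items()}

# K = 2 keeps "mat" and "couch", renormalized to 0.5/0.8 = 0.625
# and 0.3/0.8 = 0.375.
k2 = top_k_filter(probs, 2)
```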
These strategies provide different trade‑offs between controllability and creativity.