How Temperature Shapes Output in Large Language Models
The article explains the Temperature hyper‑parameter in large language models, shows how it modifies the softmax distribution, provides a Python visualisation script, and demonstrates through experiments that higher values increase creativity while lower values make outputs more deterministic.
Definition of Temperature
Temperature is a hyper-parameter applied when sampling from large language models such as ChatGPT, GPT-3, GPT-3.5, GPT-4, and LLaMA. It adjusts how strongly the model favours its most likely next tokens over less likely ones.
Principle Explanation
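When a model predicts the next token, it first produces a raw score (logit) z_i for each candidate token i. These scores are turned into probabilities with the softmax function. Introducing a Temperature variable θ modifies the softmax as follows:

$$p_i = \frac{\exp(z_i / \theta)}{\sum_j \exp(z_j / \theta)}$$

Dividing each logit by θ before exponentiating means that a higher Temperature (θ > 1) shrinks the gaps between scores and lifts low-probability tokens, flattening the distribution, while a lower Temperature (θ < 1) widens the gaps and suppresses those tokens, making the distribution sharper. With θ = 1 the formula reduces to the standard softmax.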
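To make the formula concrete, here is a minimal sketch of a temperature-scaled softmax. The function name `softmax_with_temperature` and the three example logits are illustrative assumptions, not taken from any particular model or library.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtracting the maximum leaves the result unchanged but avoids
    # overflow in exp() when the scaled scores are large.
    largest = max(scaled)
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits chosen only for illustration.
logits = [2.0, 1.0, 0.0]
for theta in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, theta)
    print(theta, [round(p, 3) for p in probs])
```

Running it should show the probability of the first token falling, and that of the last token rising, as θ increases.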
Experimental Results
To visualise this effect, the following Python code plots the adjusted probabilities for a set of example token scores.
import math
import matplotlib.pyplot as plt


def plot_with_temperature(name_list, value_list, temperature):
    """Plot the softmax probabilities of value_list at the given temperature."""
    # Scale each score by the temperature, then exponentiate.
    tmp_list = [math.exp(x / temperature) for x in value_list]
    # Normalise so the values sum to 1.
    sum_value = sum(tmp_list)
    out_list = [x / sum_value for x in tmp_list]
    plt.bar(name_list, out_list)
    plt.title(f"Temperature = {temperature}")
    plt.ylabel("Probability")
    plt.show()


if __name__ == "__main__":
    name_list = ["cat", "cheese", "pizza", "cookie", "fondue", "banana", "baguette", "cake"]
    value_list = [3, 70, 40, 65, 55, 10, 15, 12]
    for temperature in (1, 10, 50, 100, 1000):
        plot_with_temperature(name_list, value_list, temperature)

The script is run with temperatures 1, 10, 50, 100 and 1000, producing one bar chart per setting.
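For readers who prefer numbers to charts, the same comparison can be summarised in text. The sketch below reuses the example scores and reports, for each temperature, the most likely token, its probability, and the Shannon entropy of the distribution. Entropy is an added measure of flatness that the original script does not compute, and the helper names are assumptions made for this example.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Temperature-scaled softmax, computed in a numerically stable way."""
    scaled = [s / temperature for s in scores]
    largest = max(scaled)
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs):
    """Shannon entropy in bits: 0 is fully deterministic, log2(n) is uniform."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


name_list = ["cat", "cheese", "pizza", "cookie", "fondue", "banana", "baguette", "cake"]
value_list = [3, 70, 40, 65, 55, 10, 15, 12]

for t in (1, 10, 50, 100, 1000):
    probs = softmax_with_temperature(value_list, t)
    top = max(range(len(probs)), key=probs.__getitem__)
    print(f"T={t:>4}  top={name_list[top]:<8}  p={probs[top]:.3f}  "
          f"entropy={entropy_bits(probs):.3f} bits")
```

With eight tokens the entropy cannot exceed 3 bits, so values approaching 3 indicate a nearly uniform distribution.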
Conclusion
The charts show that higher Temperature values produce flatter distributions, allowing the model to generate more diverse and creative text, which is useful for prose generation. Lower Temperature values concentrate probability on the top token, yielding more deterministic outputs, which suits question-answering scenarios.
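The link between a flatter distribution and more varied output can be seen by actually sampling tokens. The following is a rough sketch rather than how any specific model implements decoding: `sample_token` is a hypothetical helper, and the token names and scores are the example values from the script above.

```python
import math
import random
from collections import Counter


def sample_token(names, scores, temperature, rng):
    """Draw one token from the temperature-scaled softmax distribution."""
    scaled = [s / temperature for s in scores]
    largest = max(scaled)
    weights = [math.exp(s - largest) for s in scaled]
    # random.Random.choices accepts unnormalised weights.
    return rng.choices(names, weights=weights, k=1)[0]


names = ["cat", "cheese", "pizza", "cookie", "fondue", "banana", "baguette", "cake"]
scores = [3, 70, 40, 65, 55, 10, 15, 12]

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
for t in (1, 50, 1000):
    counts = Counter(sample_token(names, scores, t, rng) for _ in range(1000))
    print(f"T={t:>4}  distinct tokens={len(counts)}  most common={counts.most_common(3)}")
```

At a Temperature of 1 almost every draw should be "cheese", while at 1000 the draws spread across all eight tokens.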
AI Algorithm Path
A public account focused on deep learning, computer vision, and autonomous driving perception algorithms, covering visual CV, neural networks, pattern recognition, related hardware and software configurations, and open-source projects.