Tagged articles

softmax

6 articles · Page 1 of 1

Jun 23, 2026 · Artificial Intelligence

Understanding NLP Activation Functions: The Role of Softmax

The article explains how the softmax activation function converts neural network outputs into probability distributions for multi‑class NLP tasks, describes its mathematical form and S‑shaped behavior, and discusses the inductive approach, data quality, training objectives, and interpretability challenges in deep learning language models.

Data Quality · Deep Learning · NLP

0 likes · 4 min read

DeepHub IMBA

Jun 4, 2026 · Artificial Intelligence

Hand‑Writing a Triton Softmax Kernel: Program Instances, Block Size, Masking & Pointer Arithmetic

This article walks through implementing a row‑wise softmax kernel in Triton, explaining program‑instance mapping, block‑size selection, mask handling, pointer arithmetic, resource‑usage analysis, and an RTX 5090 benchmark that reveals performance cliffs compared to PyTorch.

CUDA · GPU kernel · Python

0 likes · 9 min read

AI Large Model Application Practice

Mar 14, 2025 · Artificial Intelligence

Why Softmax Is the Secret Behind LLM Probabilities and Creative Generation

This article explains how the Softmax function converts raw neural‑network scores into a proper probability distribution, why this conversion is essential for training and inference in large language models, and how the temperature parameter shapes the model's creativity and diversity.

LLM · language models · probability

0 likes · 9 min read

AI Algorithm Path

Feb 19, 2025 · Artificial Intelligence

How Temperature Shapes Output in Large Language Models

The article explains the temperature hyper‑parameter in large language models, shows how it modifies the softmax distribution, provides a Python visualisation script, and demonstrates through experiments that higher values increase creativity while lower values make outputs more deterministic.

Python · Sampling · large language models

0 likes · 5 min read

Model Perspective

Sep 10, 2024 · Artificial Intelligence

Why Cross-Entropy Is the Key Loss Function for Classification Models

This article explains how loss functions evaluate model performance, contrasts regression’s mean squared error with classification’s cross‑entropy, describes one‑hot encoding and softmax outputs, and shows why higher predicted probabilities for the correct class yield lower loss, highlighting applications in image, language, and speech tasks.

One-hot encoding · classification · cross entropy

0 likes · 5 min read

dbaplus Community

Nov 10, 2016 · Artificial Intelligence

Demystifying Recurrent Neural Networks: Theory, Training, and Implementation

This article explains the fundamentals of recurrent neural networks (RNNs), their role in language modeling, RNN architectures such as bidirectional and deep RNNs, the back‑propagation through time (BPTT) training algorithm, gradient challenges, and vectorization techniques, and it provides a step‑by‑step code implementation.

BPTT · Deep Learning · Language Model

0 likes · 21 min read
