Tagged articles
3 articles
NewBeeNLP
Jun 12, 2024 · Artificial Intelligence

Beyond Cosine Decay: Fixed LR + Quick Decay Beats Traditional Schedules in LLM Training

The article analyzes why the traditional cosine‑decay learning‑rate schedule hinders continued training of large language models, and shows that constant‑learning‑rate approaches such as Warmup‑Stable‑Decay (WSD), a short final cooldown, stochastic weight averaging (SWA), and the Schedule‑Free Optimizer (SFO) can match or surpass cosine decay while being better suited to continued training and fine‑tuning.

LLM training · SFO · SWA
7 min read
DataFunTalk
Aug 10, 2021 · Artificial Intelligence

Practical Deep Learning Tricks: Cyclic LR, Flooding, Warmup, RAdam, Adversarial Training, Focal Loss, Dropout, Normalization, ReLU, Group Normalization, Label Smoothing, Wasserstein GAN, Skip Connections, Weight Initialization

This article presents a concise collection of practical deep‑learning techniques (cyclic learning rates, flooding, warmup, RAdam, adversarial training, focal loss, dropout, several normalization methods including group normalization, ReLU, label smoothing, Wasserstein GAN, skip connections, and weight initialization), along with code snippets and references for implementation.

Deep Learning · GAN · Regularization
8 min read
Alibaba Cloud Developer
Sep 12, 2019 · Artificial Intelligence

How a Simple Learning‑Rate Trick Detects 90% of Noisy Labels in Image Data

Deep neural networks trained on large‑scale, weakly labeled image data suffer from noisy annotations that degrade performance. A simple algorithm that adjusts the learning rate during training can automatically identify up to 90% of the noisy samples, improving dataset cleanliness and model accuracy without manual intervention.

Deep Learning · Image Classification · data cleaning
15 min read