Practical Deep Learning Training Tricks: Cyclic LR, Flooding, Warmup, RAdam, Adversarial Training, Focal Loss, Dropout, Normalization and More
This article compiles essential deep learning training techniques, including cyclic learning rates, flooding, warmup, the RAdam optimizer, adversarial training, focal loss, dropout, batch/group/weight normalization, label smoothing, Wasserstein GAN, skip connections, and weight initialization, offering a concise explanation and a ready-to-use code snippet for each method.
Cyclic LR : Periodically restart the learning rate to explore multiple local minima within a fixed time budget.
scheduler = lambda x: ((LR_INIT - LR_MIN) / 2) * (np.cos(np.pi * (np.mod(x - 1, CYCLE) / CYCLE)) + 1) + LR_MIN
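The scheduler above can be unpacked into a plain function. A minimal sketch; `LR_INIT`, `LR_MIN`, and `CYCLE` are illustrative values, not taken from the article:

```python
import numpy as np

LR_INIT, LR_MIN, CYCLE = 0.1, 0.001, 10  # illustrative hyperparameters

def cyclic_lr(step):
    # Cosine decay from LR_INIT down to LR_MIN, restarting every CYCLE steps.
    phase = np.mod(step - 1, CYCLE) / CYCLE
    return (LR_INIT - LR_MIN) / 2 * (np.cos(np.pi * phase) + 1) + LR_MIN
```

At the first step of each cycle the learning rate jumps back to `LR_INIT`, then anneals toward `LR_MIN`; each restart gives the optimizer a chance to escape the current basin.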
Flooding : Keep the training loss hovering around a predefined threshold b to encourage a "random walk" into flatter regions of the loss landscape, improving test-loss stability.
flood = (loss - b).abs() + b
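The same trick expressed in NumPy (a sketch; the original snippet is PyTorch-style). When the loss falls below the flood level b, the sign of the gradient flips, so the optimizer ascends until the loss is back above b:

```python
import numpy as np

def flooded_loss(loss, b):
    # If loss > b this is just loss; if loss < b it becomes 2*b - loss,
    # so minimizing it pushes the loss back UP toward the flood level b.
    return np.abs(loss - b) + b
```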
Warmup : Gradually increase the learning rate in the early stage of training to avoid premature over-fitting on the first mini-batches and to stabilize the deep layers.
warmup_steps = int(batches_per_epoch * 5)
warmup_lr = (initial_learning_rate * tf.cast(global_step, tf.float32) / tf.cast(warmup_steps, tf.float32))
return tf.cond(global_step < warmup_steps, lambda: warmup_lr, lambda: lr)
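The TensorFlow snippet above implements linear warmup; the same schedule in plain Python (a framework-free sketch, function name is my own):

```python
def warmup_lr(step, base_lr, warmup_steps):
    # Ramp linearly from 0 to base_lr over warmup_steps, then hold base_lr.
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr
```

In practice this is usually combined with a decay schedule after the warmup phase ends (e.g., the cyclic schedule above).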
RAdam : Rectified Adam. Like Adam, it keeps exponential moving averages of the first- and second-order moments of the gradient, but it additionally estimates the variance of the adaptive learning rate and rectifies it during the early steps, when too few gradient samples make the second moment unreliable.
from radam import *
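The core of the rectification is a multiplier computed from the approximated length of the gradients' simple moving average. A sketch of that term alone (function name mine; the full optimizer also maintains Adam's moment estimates):

```python
import numpy as np

def radam_rectifier(t, beta2=0.999):
    # Approximated SMA length at step t; rho_inf is its limit as t -> infinity.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * t * beta2**t / (1.0 - beta2**t)
    if rho_t <= 4.0:
        # Variance of the adaptive LR is undefined: fall back to an
        # un-adapted, SGD-with-momentum-style step.
        return None
    # Rectification factor (< 1 early on, approaches 1 as t grows).
    return np.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf)
                   / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
```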
Adversarial Training : Generates adversarial examples (e.g., FGSM, I-FGSM, PGD) during training; acting as a regularizer, it effectively imposes a local Lipschitz constraint on the network.
adversarial_training(model, 'Embedding-Token', 0.5)
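The simplest of the attacks listed, FGSM, perturbs the input one epsilon-sized step in the direction that increases the loss. A minimal NumPy sketch (the gradient would normally come from backpropagation; here it is passed in):

```python
import numpy as np

def fgsm_perturb(x, grad, eps=0.1):
    # Fast Gradient Sign Method: move each input dimension by eps in the
    # direction of the loss gradient's sign, maximizing the loss locally.
    return x + eps * np.sign(grad)
```

I-FGSM and PGD iterate this step (PGD additionally projects back into an epsilon-ball); training on the perturbed inputs is what regularizes the model.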
Focal Loss : Mitigates class imbalance by down-weighting easy samples and focusing the loss on hard examples.
loss = -np.log(p)
loss = (1 - p)**G * loss
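Putting the two lines together as one function (a sketch for the binary case; `p` is the predicted probability of the true class, `gamma` the focusing parameter):

```python
import numpy as np

def focal_loss(p, gamma=2.0):
    # (1 - p)**gamma shrinks the loss of well-classified samples (p near 1),
    # so gradients concentrate on hard examples; gamma=0 recovers
    # plain cross-entropy.
    return -(1.0 - p)**gamma * np.log(p)
```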
Dropout : Randomly drops neurons during training to reduce over‑fitting and improve model robustness.
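A sketch of the standard "inverted dropout" formulation in NumPy (function name and the seeded generator are my own; frameworks handle the train/inference switch automatically):

```python
import numpy as np

def dropout(x, rate=0.5, training=True, seed=0):
    if not training:
        return x  # identity at inference time
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) >= rate  # keep each unit with prob 1 - rate
    # Rescale the survivors so the expected activation is unchanged,
    # which is why no scaling is needed at inference.
    return x * mask / (1.0 - rate)
```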
Normalization : Batch Normalization normalizes each neuron using mini-batch statistics; Group Normalization divides the channels into groups and normalizes within each group, making it independent of the batch size.
def GroupNorm(x, gamma, beta, G, eps=1e-5):
    # x: input features with shape [N, C, H, W]
    # gamma, beta: scale and offset, with shape [1, C, 1, 1]
    # G: number of groups for GN
    N, C, H, W = x.shape
    x = tf.reshape(x, [N, G, C // G, H, W])
    mean, var = tf.nn.moments(x, [2, 3, 4], keep_dims=True)
    x = (x - mean) / tf.sqrt(var + eps)
    x = tf.reshape(x, [N, C, H, W])
    return x * gamma + beta
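The same Group Normalization logic translated to NumPy, which makes the per-group statistics easy to inspect (a framework-free sketch of the TF snippet above):

```python
import numpy as np

def group_norm(x, gamma, beta, G, eps=1e-5):
    # x: [N, C, H, W]; gamma, beta: [1, C, 1, 1]; G: number of groups.
    N, C, H, W = x.shape
    x = x.reshape(N, G, C // G, H, W)
    # Statistics are computed per (sample, group), never across the batch.
    mean = x.mean(axis=(2, 3, 4), keepdims=True)
    var = x.var(axis=(2, 3, 4), keepdims=True)
    x = (x - mean) / np.sqrt(var + eps)
    return x.reshape(N, C, H, W) * gamma + beta
```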
ReLU : A simple non-linear activation whose gradient is exactly 1 for positive inputs, which alleviates vanishing gradients.
x = max(x, 0)
Skip Connection : Provides an identity mapping so very deep networks do not degrade; the block only has to learn a residual on top of its input.
y = F(x) + x
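The equation as a tiny sketch: even if the learned transform `F` contributes nothing, the block still passes its input through unchanged, so stacking many blocks cannot make the network worse than the identity:

```python
import numpy as np

def residual_block(x, f):
    # f is the learned transform F; the "+ x" is the identity shortcut.
    return f(x) + x
```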
Weight Initialization : Proper initialization (e.g., non-zero, variance-scaled, or pretrained embeddings) speeds up convergence and improves final model quality.
Embedding(embeddings_initializer=word2vec_emb, input_dim=2009, output_dim=DOTA)
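A sketch of one common variance-scaled scheme, He initialization, which draws weights with variance 2/fan_in so activations keep roughly unit variance through ReLU layers (function name and seed are my own):

```python
import numpy as np

def he_init(fan_in, fan_out, seed=0):
    # He initialization: std = sqrt(2 / fan_in), suited to ReLU networks.
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
```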
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.