Tagged articles
3 articles
Machine Learning Algorithms & Natural Language Processing
May 20, 2026 · Artificial Intelligence

Can 99% Sparse Transformers Run Faster? Insights from the ‘Attention Is All You Need’ Authors

The paper shows that lightweight L1 regularization can drive more than 99% of FFN activations to zero. Combined with a new tile‑wise ELLPACK (TwELL) storage format and a hybrid routing scheme, this sparsity improves inference speed by up to 30%, cuts memory usage by more than 24%, and reduces energy consumption, all with negligible impact on downstream task performance.

CUDA · GPU Optimization · Hybrid Routing
0 likes · 8 min read
Machine Learning Algorithms & Natural Language Processing
May 9, 2026 · Artificial Intelligence

Can 99% Sparse Transformers Run Faster? Insights from the Original Authors

A new ICML 2026 paper by Sakana AI and NVIDIA shows that lightweight L1 regularization can make the Feed‑Forward Network (FFN) activations in Transformers more than 99% sparse. With the TwELL storage format and a hybrid routing scheme, this sparsity translates into up to 20.5% inference speedup and 21.9% training‑step acceleration, along with lower energy consumption and reduced peak memory, across models of 0.5 to 2 billion parameters while preserving downstream performance.

CUDA · GPU Optimization · Hybrid Routing
0 likes · 9 min read
Meituan Technology Team
Oct 12, 2017 · Artificial Intelligence

Machine Learning Q&A: Data Imputation, Feature Selection, Recommendation Systems and More

The article answers ten machine‑learning questions. It explains how to impute missing behavior data and how to extract and select features, describes Meituan‑Dianping’s recommendation pipeline, suggests a learning path for beginners, clarifies L1 sparsity, and recommends TextCNN for text. It also discusses sample bias in search ranking, label generation for wide‑deep models, the shift to deep‑learning video detection, and the use of factorization machines for CTR, with open‑source examples.

Deep Learning · L1 Regularization · Recommendation Systems
0 likes · 7 min read