Tagged articles
6 articles
SuanNi
Mar 20, 2026 · Artificial Intelligence

How SkillCraft Shows AI Agents Can Cut Compute Costs by Up to 80%

SkillCraft, a new benchmark from Oxford and partner institutions, evaluates whether AI agents can autonomously compose basic tools into reusable skills. It finds that stronger models substantially raise task success rates while cutting compute consumption by up to 80%, and it exposes the limits of hierarchical skill nesting and cross‑model skill sharing.

AI Benchmark · Compute Efficiency · SkillCraft
15 min read
ShiZhen AI
Mar 17, 2026 · Artificial Intelligence

Kimi’s Attention Residuals Swap a Decade-Old Residual Trick for 1.25× Faster 48B MoE

The Kimi team introduces Attention Residuals, a softmax‑based replacement for the uniform residual connections that Transformers have used for a decade. It lets each layer selectively aggregate representations from earlier layers, curbs hidden‑state growth, and achieves a 1.25× compute‑efficiency gain on a 48‑billion‑parameter MoE model while adding less than 2% inference latency.

Attention Residuals · Compute Efficiency · Deep Learning
10 min read
Machine Learning Algorithms & Natural Language Processing
Mar 3, 2026 · Artificial Intelligence

Beyond Dense and MoE: JTok Module Cuts Compute by One‑Third as a New Scaling Path

The paper introduces JTok and its dynamic variant JTok‑M, token‑indexed parameter scaling methods that decouple model capacity from compute. They reduce compute by up to 35% while delivering consistent performance gains across a wide range of downstream tasks and model sizes.

Compute Efficiency · JTok · Token-indexed scaling
16 min read
Baobao Algorithm Notes
Nov 21, 2023 · Artificial Intelligence

How Much Data Do You Need for a 10B LLM? Decoding Scaling Laws

This article explains how scaling laws answer common LLM development questions, such as how much data a 10B model needs, how large a model 1 TB of data can support, and what the optimal trade-off among compute, data, and model size is for a fixed GPU budget. It presents the core formulas, practical derivations, and insights from OpenAI, DeepMind, and Google.

Compute Efficiency · Data Requirements · Large Language Models
12 min read
DataFunTalk
Aug 10, 2021 · Artificial Intelligence

A Comprehensive Review of Industrial-Scale Deep Learning for Click-Through Rate Prediction in Online Advertising

This article reviews the evolution of click‑through‑rate (CTR) prediction in online advertising and looks ahead to where it is going. It covers the challenges of the shallow‑learning era, the rise of industrial‑scale deep learning, system‑level innovations across recall, coarse ranking, fine ranking, and bidding, and the emerging co‑design of algorithms, compute, and architecture.

Algorithmic Optimization · CTR prediction · Compute Efficiency
65 min read