Tag: Large-Scale Training


DataFunSummit
Mar 20, 2025 · Artificial Intelligence

Evolution of AI Training Stability and Baidu Baige’s Full-Stack Solutions for Large-Scale Model Training

The article traces the evolution of AI training stability, from early manual operations on small GPU clusters to the fault-tolerant infrastructure required for thousand-GPU and ten-thousand-GPU training clusters, detailing Baidu Baige's stability metrics, monitoring, eBPF-based diagnostics, and checkpoint strategies that reduce wasted training time and accelerate fault recovery.

AI training · Large-Scale Training · checkpointing
22 min read
DataFunTalk
Apr 3, 2023 · Artificial Intelligence

Large‑Scale Recommendation System Training with TorchRec and Dynamic Embedding

This article explains how Tencent’s AI team leverages the PyTorch‑based TorchRec library and a custom dynamic embedding solution to train billion‑scale recommendation models efficiently, detailing the benefits of TorchRec, GPU embedding, optimized kernels, embedding partition strategies, experimental results, and practical deployment guidance.

Dynamic Embedding · GPU Embedding · Large-Scale Training
15 min read
DataFunSummit
Apr 2, 2023 · Artificial Intelligence

Efficient Training of Large Models with the Open‑Source Distributed Framework Easy Parallel Library (EPL)

This article introduces the challenges of scaling deep‑learning model training, explains the design and components of the open‑source Easy Parallel Library (EPL) that unifies data, pipeline, and operator‑split parallelism, and demonstrates its best‑practice results on large‑scale classification, BERT‑large, and massive multimodal models.

EPL · Large-Scale Training · Parallelism
15 min read
Tencent Advertising Technology
Mar 10, 2023 · Artificial Intelligence

Optimizing Large-Scale Model Training with Tencent's AngelPTM and ZeRO-Cache

This article presents Tencent's latest advancements in large‑scale model training, detailing the AngelPTM framework and its ZeRO‑Cache optimization techniques that reduce memory and storage costs, improve hardware utilization, and achieve high‑performance training for trillion‑parameter AI models across various applications.

AI models · AngelPTM · Large-Scale Training
14 min read
DataFunSummit
Sep 9, 2022 · Artificial Intelligence

Wuliang: Tencent's Deep Learning Framework for Real‑Time Large‑Scale Recommendation

The presentation by Tencent expert Yuan Yi details the Wuliang deep learning system for recommendation, covering its background, technical challenges such as massive data and real‑time requirements, the parameter‑server based solutions for training and inference, model compression techniques, and continuous online deployment strategies.

Large-Scale Training · Recommendation systems · deep learning
14 min read
DataFunSummit
Feb 10, 2022 · Artificial Intelligence

Baidu's PGL2.2: A Graph Neural Network Framework, Techniques, and Real‑World Applications

This article introduces Baidu's PGL2.2 graph learning platform, explains graph modeling and message‑passing GNN techniques, details training strategies for small, medium and large graphs, showcases node classification and link‑prediction methods, and describes how the framework is applied in search, recommendation, risk control, and knowledge‑graph competitions.

Graph Neural Networks · Knowledge Graphs · Large-Scale Training
15 min read
Ctrip Technology
Apr 9, 2021 · Artificial Intelligence

Algorithm Optimization for Hotel Recommendation and Large‑Scale Discrete DNN Training at Ctrip

This article describes how Ctrip improved hotel recommendation by iterating from logistic regression to GBDT and deep neural networks, designing continuous and discrete features, adopting multi‑task learning with click and conversion signals, and building a large‑scale distributed DNN training and unified feature‑processing framework to boost model accuracy and engineering efficiency.

Ctrip · DNN · Feature Engineering
15 min read