Tagged articles
2 articles
DataFunSummit
Sep 24, 2024 · Artificial Intelligence

Streaming Data Pipelines and Scaling Laws for Efficient Large‑Model Training

The article discusses the challenges of training ever‑larger AI models on internet‑scale data, critiques traditional batch ETL pipelines, and proposes a streaming data‑flow architecture with dynamic data selection and a shared‑memory/Alluxio middle layer to decouple data processing from model training, improving efficiency and scalability.

AI Infrastructure · Multimodal Data · data pipelines
20 min read
Alibaba Cloud Developer
Dec 12, 2019 · Artificial Intelligence

How Multi‑Layer Multi‑Frequency Streaming Training Boosts Real‑Time CTR/CVR Prediction

This article describes a Multi‑Layer Multi‑Frequency streaming training approach that enables minute‑level updates of massive CTR/CVR models by partitioning model parameters into freezing embeddings, changing embeddings, and changing weights. It reports offline and online AUC gains, especially during high‑traffic events such as Double 11.

CTR prediction · e‑commerce · machine learning
18 min read