Tagged articles
3 articles
Baobao Algorithm Notes
Mar 29, 2024 · Artificial Intelligence

Can Data Mixing Laws Predict LLM Performance? A Deep Dive into Scaling Laws

This article reviews the paper “Data Mixing Laws: Optimizing Data Mixture by Predicting Language Modeling Performance”, explaining how the authors quantify the impact of data mixture ratios on LLM loss, propose a simple predictive model, validate it on RedPajama and multi‑domain mixes, and outline a scaling‑law procedure for continual pre‑training.

Data Mixing · Data Scheduling · LLM
0 likes · 9 min read
DataFunTalk
Mar 5, 2022 · Big Data

Designing Cross‑Period Dependencies in Data Scheduling Systems

This article explains how data scheduling systems manage task execution, ETL processes, and cross‑period dependencies by linking task versions, data partitions, and time parameters, and introduces the offset‑and‑cnt model to express dynamic dependencies in big‑data pipelines.

DAG · Data Scheduling · ETL
0 likes · 14 min read
Tencent Cloud Developer
Sep 24, 2020 · Cloud Computing

Technical Overview of Tencent Cloud CBS Data Scheduling System

The Tencent Cloud CBS data scheduling system has evolved from a simple snapshot service into a highly concurrent, low‑latency platform. It uses COW/ROW (copy‑on‑write/redirect‑on‑write) mechanisms, multi‑version snapshots, rapid rollback, hot‑data caching, horizontal scaling, fault‑tolerant task switching, cross‑region replication, and seamless disk migration to provide reliable, fast storage for backups, image creation, and cloud‑disk migration. AI‑driven scheduling and ultra‑low‑latency features are planned for the future.

Data Scheduling · block storage · cloud storage
0 likes · 25 min read