Tag

performance modeling

Kuaishou Large Model
Nov 22, 2024 · Artificial Intelligence

Boost LLM Training on Massive Clusters with DP/TP Overlap and Context Parallelism

This article details a set of techniques for training large language models on ultra-large GPU clusters, including data- and tensor-parallel overlap, context parallelism, activation rematerialization, and a performance-driven cost model, that improve training efficiency while preserving model quality.

Parallelism · activation recomputation · distributed training
28 min read
Kuaishou Tech
Nov 21, 2024 · Artificial Intelligence

Best Practices for Training Large Language Models on Ultra‑Large Scale Clusters

This article summarizes the challenges of distributed training for massive language models and presents a suite of solutions, including DP/TP/PP overlap, context parallelism, efficient recomputation, and a performance-aware cost model, that together boost training throughput by over 30% on large GPU clusters.

GPU clusters · activation rematerialization · distributed training
27 min read
Kuaishou Large Model
Jul 11, 2024 · Artificial Intelligence

Pipeline-Aware Offloading & Balanced Checkpointing Accelerate LLM Training

Researchers from Kwai's large-model team present a training system that combines pipeline-parallel-aware activation offloading with a compute-memory balanced checkpointing strategy. The system accelerates large language model training losslessly, reaching up to 42.7% Model FLOPs Utilization (MFU) on 256 NVIDIA H800 GPUs while reducing memory usage.

GPU training · Kwai · activation offloading
13 min read
OPPO Kernel Craftsman
Aug 19, 2022 · Fundamentals

Superscalar Processor Architecture and Performance Modeling for Mobile Devices

Modern mobile CPUs are superscalar: they use deep pipelining, branch prediction, register renaming, and out-of-order issue, followed by execution, write-back, and commit stages, to boost instruction-level parallelism. Performance modeling based on CPI and hardware counters helps engineers work around power, memory, and compiler limitations and write efficient code.

CPU · Mobile Processor · Superscalar
13 min read
IT Architects Alliance
Mar 5, 2022 · Operations

High Availability Overview and Design for Business Systems

This article explains the concepts, metrics, planning stages, and architectural components of high availability for business systems, covering reliability, performance, scalability, evaluation phases, performance modeling, and practical implementation guidelines to achieve four‑nine (99.99%) uptime.

High Availability · System Architecture · non-functional requirements
17 min read
IT Architects Alliance
Oct 24, 2021 · Databases

Database Capacity Planning and Scaling with ScyllaDB

This article explains why database capacity planning is challenging and presents a systematic approach, covering workload analysis, performance modeling, consistency considerations, and node scaling decisions, that uses the open-source NoSQL database ScyllaDB to guide capacity estimation.

NoSQL · Scaling · ScyllaDB
14 min read
Architects' Tech Alliance
Feb 17, 2019 · Operations

Modeling SSD Garbage Collection as a Gambler's Ruin Problem: Probabilistic Analysis and Control Strategies

Drawing an analogy between casino gambling and SSD garbage collection, the article uses probability theory, Brownian motion, and stochastic processes to model victim block selection, resource depletion, and I/O bandwidth fluctuations. It then proposes control strategies that balance performance stability against resource safety.

Garbage Collection · Probability Theory · SSD
21 min read