Tagged articles
7 articles
Lao Guo's Learning Space
May 5, 2026 · Artificial Intelligence

Top DIY AI Supercomputer Builds 2026: RTX 5090 & GB300 from $300‑$100k

The article weighs the costs and benefits of building a personal AI supercomputer, comparing cloud GPU rentals with DIY setups at budgets from $300 to $100k. It details component choices such as the RTX 5090, GB300, Mac Studio, and DGX Spark, and covers performance gains, ROI timelines, and common build pitfalls.

AI workstation · DIY supercomputer · GB300
14 min read
Architects' Tech Alliance
Aug 20, 2025 · Artificial Intelligence

Dual ToR and Dual‑Plane Designs: Boosting AI Training Performance in Large‑Scale Data Centers

The article explains how non‑stacked dual‑ToR and dual‑plane network architectures, combined with single‑chip high‑performance switches and multi‑rail host networking, dramatically improve reliability, load balance, and end‑to‑end training speed for massive AI models such as GPT‑3 175B.

AI networking · GPU training · data center
11 min read
Kuaishou Large Model
Jul 11, 2024 · Artificial Intelligence

Pipeline-Aware Offloading & Balanced Checkpointing Accelerate LLM Training

Researchers from Kwai’s large-model team present a training system that combines pipeline-parallel-aware activation offloading with a compute-memory balanced checkpointing strategy. The system accelerates large language model training losslessly, reaching up to 42.7% MFU on 256 NVIDIA H800 GPUs while reducing memory usage.

GPU training · Kwai · Large Language Models
13 min read
Ximalaya Technology Team
Oct 23, 2023 · Artificial Intelligence

HybridBackend Accelerates GPU-Based Recommendation Model Training for Ximalaya AI Cloud

Ximalaya AI Cloud adopted the open‑source HybridBackend framework to overcome sparse‑data bottlenecks. Its columnar Parquet reads and hybrid parallel GPU training raise GPU utilization more than threefold and cut recommendation model training time by more than half, and the framework now powers all TensorFlow and DeepRec production models.

AI cloud · Distributed Training · GPU training
8 min read
IT Architects Alliance
Apr 17, 2023 · Artificial Intelligence

DeepSpeed Chat: An Open‑Source Framework for Scalable RLHF Training of ChatGPT‑Style Models

DeepSpeed Chat provides a fast, affordable, and scalable system for end‑to‑end RLHF training of ChatGPT‑style large language models, offering one‑click scripts, detailed performance benchmarks across GPU configurations, support for many model families, and a flexible API for custom RLHF pipelines.

ChatGPT · DeepSpeed · GPU training
14 min read
Code DAO
May 27, 2022 · Artificial Intelligence

Building an Image Classification Model with CNNs

This article explains how to train a convolutional neural network on a remote GPU for image classification, covering convolution, padding, activation, pooling, dropout, flattening, fully‑connected layers, dataset preparation, model definition, training, and prediction using TensorFlow/Keras.

CNN · Food-101 · GPU training
13 min read
Meituan Technology Team
Mar 24, 2022 · Artificial Intelligence

Booster GPU Training Architecture for Recommendation Systems at Meituan: Design, Optimization, and Deployment

Meituan’s Booster architecture co‑designs algorithm and system to run TensorFlow recommendation training on multi‑GPU A100 servers. It optimizes data fetching, embedding pipelines, custom kernels, and communication fusion, delivering 2–4× better cost‑performance than CPU training, more than threefold GPU throughput, and deployment via a single‑line API.

Booster architecture · GPU training · TensorFlow
36 min read