Tagged articles
2 articles
DataFunSummit
Mar 14, 2025 · Artificial Intelligence

Insights from Zhihu's ZhiLight Large‑Model Inference Framework: Architecture, Parallelism, and Performance Optimizations

This article summarizes a presentation by Wang Xin, lead of Zhihu's machine‑learning platform, on the ZhiLight large‑model inference framework. It covers model execution mechanisms, GPU workload analysis, pipeline and tensor parallelism, the evolution of GPU architectures, a comparison of open‑source inference engines, ZhiLight's compute‑communication overlap and quantization optimizations, benchmark results, supported models, and future directions.

GPU · Inference · LLM
13 min read
Rare Earth Juejin Tech Community
Dec 24, 2023 · Artificial Intelligence

Llama 2: Open Foundation and Fine‑Tuned Chat Models – Overview, Training, and RLHF Details

This article provides a comprehensive English overview of Meta's Llama 2 family. It describes the model sizes, pre‑training data, architectural improvements, supervised fine‑tuning, reinforcement learning from human feedback (RLHF), safety evaluations, reward‑model training, and the iterative optimization techniques used to produce the high‑performing Llama 2‑Chat models.

Llama-2 · Open‑source · RLHF
33 min read