Tagged articles

BF16

3 articles · Page 1 of 1

Jul 25, 2026 · Artificial Intelligence

How Much Does Deploying GLM‑5.2 Locally Cost? A Detailed Cost Breakdown

The article provides a thorough cost analysis for locally deploying the GLM‑5.2 large language model, detailing hardware configurations, FP8 and BF16 precision options, single‑node versus dual‑node setups, memory requirements, and why regulated finance firms are the primary candidates for such an investment.

AI infrastructure · BF16 · FP8

7 min read

Machine Heart

Apr 16, 2026 · Artificial Intelligence

Achieving 4.6× Faster Diffusion Model Training with FP4‑BF16 Dual‑Track Parallelism (Sol‑RL)

Sol‑RL, a framework from NVIDIA, Hong Kong University and MIT, integrates NVFP4 inference for large‑scale rollout exploration and BF16 precision for high‑fidelity regeneration, delivering up to 4.64× faster convergence at equivalent reward levels while preserving BF16 training fidelity across SANA, FLUX.1 and SD3.5‑L models.

BF16 · FP4 · GPU Optimization

9 min read

Code DAO

Jan 15, 2022 · Artificial Intelligence

How Intel BF16 with IPEX and oneDNN Boosts PyTorch Performance

This article explains how BF16 support developed by Intel and Facebook, combined with the Intel Extension for PyTorch (IPEX) and oneDNN, automates type and layout conversions and adds graph‑fusion optimizations, delivering 1.4× to 4.3× inference speedups and up to 2.4× training speedups on Xeon CPUs for models such as DLRM, BERT‑Large, and ResNeXt‑101‑32x4d.

BF16 · CPU acceleration · Deep Learning

13 min read