Tagged articles

low‑precision

2 articles

Oct 30, 2024 · Artificial Intelligence

Why Google’s TPU Beats GPUs: Architecture, Performance, and Future Trends

This article analyzes Google’s Tensor Processing Unit (TPU) as a purpose‑built AI ASIC, tracing its evolution from early GPGPU and FPGA solutions, detailing its MXU systolic‑array design, low‑precision advantages, performance benchmarks, power efficiency, cluster interconnect innovations, and software integration with TensorFlow.

AI hardware · ASIC · Google

15 min read

Baobao Algorithm Notes

Oct 19, 2023 · Artificial Intelligence

Efficient LLM Deployment: Low‑Precision, Flash Attention, and Architecture Tricks

This article reviews the main memory and compute challenges of deploying large language models and presents practical solutions—including low‑precision arithmetic, flash attention, advanced positional embeddings, key‑value caching, and quantization techniques—backed by code examples and performance measurements on models such as OctoCoder.

Flash Attention · LLM · Quantization

35 min read
