Architects' Tech Alliance
Oct 30, 2024 · Artificial Intelligence

Why Google’s TPU Beats GPUs: Architecture, Performance, and Future Trends

This article analyzes Google’s Tensor Processing Unit (TPU) as a purpose‑built AI ASIC, tracing its evolution from early GPGPU and FPGA solutions, detailing its MXU systolic‑array design, low‑precision advantages, performance benchmarks, power efficiency, cluster interconnect innovations, and software integration with TensorFlow.

AI hardware · ASIC · Google
15 min read
Baobao Algorithm Notes
Oct 19, 2023 · Artificial Intelligence

Efficient LLM Deployment: Low‑Precision, Flash Attention, and Architecture Tricks

This article reviews the main memory and compute challenges of deploying large language models. It presents practical solutions, including low‑precision arithmetic, Flash Attention, advanced positional embeddings, key‑value caching, and quantization techniques, and backs them with code examples and performance measurements on models such as OctoCoder.

Flash Attention · LLM · Quantization
35 min read