Why GPUs Power AI and Gaming: A Beginner’s Guide to Their Architecture

This article explains what a GPU is, how it differs from a CPU, its internal architecture, and why its massive parallel processing makes it essential for graphics rendering, scientific computation, and AI inference, illustrated with examples such as NVIDIA RTX 3090.

Open Source Linux

Jul 2, 2024

What is a GPU?

GPU stands for Graphics Processing Unit. It is a specialized chip for graphics rendering, numerical analysis, financial analysis, cryptography, and other mathematical and geometric computations. GPUs run on PCs, workstations, game consoles, phones, tablets, and other smart devices.

The relationship between a GPU and a graphics card is like that between a CPU and a motherboard: the GPU is the heart of the graphics card, while the card also includes video memory, VRM modules, bus interfaces, fans, and other components.

Which is stronger, GPU or CPU?

There is no single answer. A high‑end GPU may contain more transistors than a CPU, but the two are built for different work: CPUs excel at branching, logic‑heavy tasks, while GPUs excel at bulk arithmetic and graphics rendering. That is why services such as ChatGPT run AI inference on large numbers of high‑performance GPUs.

Comparison of structure:

Different composition

Both CPUs and GPUs are built from arithmetic logic units (ALUs), control units, and cache, but the proportion of chip area given to each differs dramatically.

In a CPU the cache occupies about 50%, control about 25%, and compute about 25% of the chip.

In a GPU the cache occupies about 5%, control about 5%, and compute about 90%.

This shows that a CPU balances compute, control, and cache, which makes it versatile but poorly suited to massively parallel tasks, whereas a GPU is designed to run huge numbers of simple calculations in parallel.

A CPU is like an expert that handles complex logic, network communication, and user requests. With far fewer ALUs, however, it can work on only a small number of operations at a time.
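The idea of many simple calculations running in parallel can be sketched in plain Python. This is an illustration of how the work is divided, not GPU code: the helper names (`brighten`, `brighten_chunk`, `parallel_brighten`) and the thread pool standing in for a GPU's many ALUs are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel, amount=10):
    """One simple, independent operation: brighten a single 0-255 pixel value."""
    return min(255, pixel + amount)


def brighten_chunk(chunk):
    """Apply the same operation to every pixel in one chunk."""
    return [brighten(p) for p in chunk]


def parallel_brighten(pixels, workers=4):
    """Split the pixels into chunks and process the chunks concurrently.

    The thread pool only illustrates the decomposition; it is not
    expected to be faster than a plain loop in Python.
    """
    size = max(1, -(-len(pixels) // workers))  # ceiling division
    chunks = [pixels[i:i + size] for i in range(0, len(pixels), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(brighten_chunk, chunks)  # results keep chunk order
        return [p for chunk in results for p in chunk]


if __name__ == "__main__":
    frame = [0, 100, 200, 250, 255, 30, 60, 90]
    print(parallel_brighten(frame))  # [10, 110, 210, 255, 255, 40, 70, 100]
```

No pixel depends on any other, so the chunks can be handed to any number of workers without coordination. That independence is the property a GPU exploits.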

Cache differences

A CPU typically has a multi‑level cache (up to four levels) that takes up about half of the chip, while a GPU usually has only one or two levels of cache.

Floating‑point computation

A CPU focuses on single‑thread performance and offers comparatively modest floating‑point throughput, while a GPU delivers much higher throughput on single‑ and double‑precision floating‑point operations.

Response mode

A CPU is built for low latency: its multi‑level cache lets it respond to individual requests in real time. A GPU is built for throughput: it queues tasks and processes them in batches.

GPU for graphics processing

Rendering a 1080×720 frame at 24 fps requires processing about 18.66 million pixels per second; higher resolutions (2K, 4K, 8K) increase the workload dramatically, making CPU‑only rendering impractical for real‑time graphics.
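The pixel figure above is simple arithmetic, reproduced here as a minimal sketch. The function name `pixels_per_second` is hypothetical, and the 3840×2160 frame size used for 4K is an assumption added for comparison; it does not appear in the article.

```python
def pixels_per_second(width, height, fps):
    """Pixels that must be produced each second at a given resolution and frame rate."""
    return width * height * fps


if __name__ == "__main__":
    # The article's example: 1080 x 720 at 24 fps.
    print(pixels_per_second(1080, 720, 24))   # 18662400
    # Assumed 4K UHD frame size (3840 x 2160) at the same frame rate.
    print(pixels_per_second(3840, 2160, 24))  # 199065600
```

Under these assumptions, moving to 4K at the same frame rate raises the pixel rate by more than a factor of ten.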

During rendering, every 3D object goes through multiple coordinate transformations and lighting calculations such as diffuse reflection, refraction, and scattering.
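To make the per-vertex work concrete, here is a rough sketch of one coordinate transformation and one lighting term. The article does not specify any particular formulas; the 4×4 homogeneous matrices and the Lambertian (cosine) diffuse term are common textbook choices, and all function names are hypothetical.

```python
import math


def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def rotation_z(angle):
    """4x4 matrix that rotates points about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def translation(tx, ty, tz):
    """4x4 matrix that moves points by (tx, ty, tz)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def diffuse(normal, to_light):
    """Lambertian diffuse term for unit vectors: cosine of the angle, clamped at 0."""
    return max(0.0, sum(n * l for n, l in zip(normal, to_light)))


if __name__ == "__main__":
    vertex = [1.0, 0.0, 0.0, 1.0]  # homogeneous coordinates (x, y, z, 1)
    rotated = mat_vec(rotation_z(math.pi / 2), vertex)
    moved = mat_vec(translation(0.0, 0.0, -5.0), rotated)
    print([round(x, 6) for x in moved])
    print(diffuse((0.0, 0.0, 1.0), (0.0, 0.0, 1.0)))
```

Each vertex needs a handful of multiply-add operations like these, and a scene contains a very large number of vertices and pixels, which is why the workload suits hardware with many arithmetic units.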

Example: NVIDIA RTX 3090

The RTX 3090 has 10,496 stream processors (CUDA cores), grouped into 82 streaming multiprocessors (SMs). Each stream processor contains integer and floating‑point units along with queues for operands and results. Each one can loosely be seen as a small task‑processing unit, so the GPU behaves like a chip with thousands of very simple cores.

By dividing the 18.66 million pixels per second among the 10,496 processors, each handles roughly 1,778 pixels per second.
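The division above can be checked with a short sketch. It assumes a perfectly even split of pixels across processors, which is an idealization; the constant and function names are hypothetical.

```python
PIXELS_PER_SECOND = 1080 * 720 * 24  # 18,662,400, from the article's example
STREAM_PROCESSORS = 10_496           # RTX 3090 figure quoted in the article


def pixels_per_processor(total_pixels, processors):
    """Average pixels each processor handles per second under an even split."""
    return total_pixels / processors


if __name__ == "__main__":
    share = pixels_per_processor(PIXELS_PER_SECOND, STREAM_PROCESSORS)
    print(round(share))  # 1778
```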

Performance factors besides CUDA cores include:

Core frequency – higher frequency yields stronger performance but higher power consumption.

Memory bus width – a wider bus moves more data between the GPU and video memory in each transfer.

VRAM capacity – more memory keeps more data (textures, frame buffers, model weights) on the card, close to the GPU.

Memory frequency – higher frequency speeds up graphics data transfer.
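Two of these factors combine into headline numbers that are easy to estimate. The sketch below uses common back-of-the-envelope formulas, not anything stated in the article. The input values (384-bit bus, 19.5 Gbps effective memory data rate, 1.695 GHz boost clock, two floating-point operations per core per cycle) are assumptions taken from the RTX 3090's published specifications, and the function names are hypothetical.

```python
def memory_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps


def peak_fp32_tflops(cores, clock_ghz, flops_per_cycle=2):
    """Theoretical peak FP32 throughput in TFLOPS.

    flops_per_cycle=2 assumes one fused multiply-add per core per cycle.
    """
    return cores * clock_ghz * flops_per_cycle / 1000


if __name__ == "__main__":
    # Assumed RTX 3090 figures (not from the article).
    print(memory_bandwidth_gb_s(384, 19.5))             # 936.0
    print(round(peak_fp32_tflops(10_496, 1.695), 2))    # 35.58
```

These are theoretical peaks; real workloads usually reach only a fraction of them.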

Summary

In short, GPUs handle graphics rendering, numerical analysis, and AI inference by breaking down massive mathematical tasks into many simple parallel operations, similar to a cluster of CPUs.

Through their many streaming processors, GPUs split large workloads into small tasks that run concurrently, making them far faster than CPUs for parallel workloads.

The above provides a brief introduction to GPU concepts and operation; deeper topics such as pixel transformation and triangle rasterization are left for further study.
