How Does an NES Emulator Work? A Deep Dive into CPU, PPU, and Memory
This article explains the core principles of building a Nintendo Entertainment System (NES) emulator, covering the collaboration of CPU, PPU, and APU, memory mapping, ROM loading, instruction decoding, graphics rendering, and interrupt handling, with practical code examples.
The Nintendo Family Computer (FC, or Famicom), sold outside Japan as the NES, is a third‑generation home console first released in 1983; it sold about 61.9 million units worldwide. Many developers are curious about how to recreate its behavior in software.
Emulator Workflow
The emulator first loads a ROM into memory, then runs a loop that alternates CPU work and PPU work until the user quits.
<code>load_rom();
while (!quit) {
    cpu_work();
    ppu_work();
}</code>
The ROM contains the game’s machine code, which the CPU fetches, decodes, and executes. The PPU reads memory to generate the video output shown on the screen.
Memory
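The FC's CPU has a 16‑bit address bus, giving a 64 KB address space; the emulator models it as a flat 64 KB buffer accessed through pointers. Only 2 KB of that space is internal RAM; the rest maps to I/O registers, cartridge memory, and mirrors. Mirroring must be handled for both the CPU and PPU address spaces.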
<code>struct Memory {
    Memory() {
        _data = (uint8_t*)malloc(0x10000);
        memset(_data, 0, 0x10000);
    }
    // Resolve mirrored addresses
    inline uint8_t* _getRealAddr(uint16_t addr) {
        if (addr >= 0x2008 && addr <= 0x3FFF) { // I/O register mirroring
            addr = 0x2000 + (addr % 8);
        } else if (addr >= 0x0800 && addr <= 0x1FFF) { // 2 KB internal RAM mirroring
            addr = addr % 0x0800;
        }
        return _data + addr;
    }
    // ...
};</code>
Loading the ROM
The ROM header is 16 bytes long and defines the size of PRG and CHR ROM, mapper flags, and other metadata. The PRG ROM is mapped into 0x8000‑0xFFFF; if only 16 KB is present it is mirrored at 0xC000.
0‑3: Constant "NES" followed by 0x1A (bytes 0x4E 0x45 0x53 0x1A)
4: PRG ROM size (16 KB units)
5: CHR ROM size (8 KB units)
6‑7: Mapper and mirroring flags
8‑15: Additional flags and padding
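The header layout above can be read with a few byte accesses. The following is a minimal sketch, assuming the common iNES layout; the names `INesHeader`, `parseINesHeader`, and `nromPrgOffset` are hypothetical and do not come from the article's code. The address translation only covers the simplest cartridges (mapper 0), where a single 16 KB PRG bank appears at both 0x8000 and 0xC000.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical struct for this sketch; field names are not from the article.
struct INesHeader {
    uint8_t prgBanks;        // byte 4: PRG ROM size in 16 KB units
    uint8_t chrBanks;        // byte 5: CHR ROM size in 8 KB units
    uint8_t mapper;          // low nibble from byte 6, high nibble from byte 7
    bool verticalMirroring;  // byte 6, bit 0
    bool hasTrainer;         // byte 6, bit 2: 512-byte trainer precedes PRG data
};

// Returns true and fills `out` if the buffer starts with "NES" 0x1A.
inline bool parseINesHeader(const uint8_t* bytes, size_t len, INesHeader& out) {
    static const uint8_t kMagic[4] = {0x4E, 0x45, 0x53, 0x1A};
    if (len < 16 || std::memcmp(bytes, kMagic, 4) != 0) return false;
    out.prgBanks = bytes[4];
    out.chrBanks = bytes[5];
    out.mapper = static_cast<uint8_t>((bytes[6] >> 4) | (bytes[7] & 0xF0));
    out.verticalMirroring = (bytes[6] & 0x01) != 0;
    out.hasTrainer = (bytes[6] & 0x04) != 0;
    return true;
}

// Mapper 0 only: translate a CPU address in 0x8000-0xFFFF to a PRG ROM offset.
// With a single 16 KB bank, 0xC000-0xFFFF mirrors 0x8000-0xBFFF.
inline size_t nromPrgOffset(uint16_t cpuAddr, uint8_t prgBanks) {
    size_t offset = static_cast<size_t>(cpuAddr - 0x8000);
    if (prgBanks == 1) offset &= 0x3FFF;
    return offset;
}
```

A loader would call `parseINesHeader` on the first 16 bytes of the file, skip the trainer if present, and then copy `prgBanks * 16 KB` of PRG data.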
CPU (2A03)
The FC uses an 8‑bit 2A03 CPU based on the 6502 instruction set. Its registers are modeled as a struct.
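<code>struct CPU_2A03 {
    struct __registers {
        uint16_t PC;      // Program Counter
        uint8_t SP;       // Stack Pointer (offset into page 0x0100)
        // Bit positions within the status register P (bit 5 is unused)
        enum __P { C=0, Z, I, D, B, _, V, N };
        uint8_t A, X, Y;  // Accumulator and index registers
        // Processor status register (bit8 is the author's 8-bit flag type, not shown here)
        struct bit8 P;
    } regs;
};</code>
Stack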
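The stack occupies 0x0100‑0x01FF (256 bytes) and grows downward: a push writes to 0x0100 + SP and then decrements SP. This emulator initializes the stack pointer (SP) to 0xFF; real hardware leaves it at 0xFD after reset, and most games set it explicitly during startup.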
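Push and pop can be sketched as follows. This is an illustration only: `StackDemo` is a hypothetical stand-in that uses a flat 64 KB array instead of the article's `Memory` class.

```cpp
#include <cstdint>

// Hypothetical minimal model; a flat array stands in for the Memory class.
struct StackDemo {
    uint8_t mem[0x10000] = {};
    uint8_t SP = 0xFF;

    // Push: write at 0x0100 + SP, then decrement SP (wraps within the page).
    void push8(uint8_t v) { mem[0x0100 + SP] = v; --SP; }

    // Pop: increment SP, then read from 0x0100 + SP.
    uint8_t pop8() { ++SP; return mem[0x0100 + SP]; }

    // 16-bit values such as return addresses go high byte first,
    // so the low byte ends up at the lower address.
    void push16(uint16_t v) {
        push8(static_cast<uint8_t>(v >> 8));
        push8(static_cast<uint8_t>(v & 0xFF));
    }
    uint16_t pop16() {
        uint16_t lo = pop8();
        uint16_t hi = pop8();
        return static_cast<uint16_t>((hi << 8) | lo);
    }
};
```

Because SP is an 8-bit value, it wraps around inside page 0x01 rather than overflowing into neighboring memory.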
<code>regs.SP = 0xFF; // Stack grows downwards</code>
Instruction Decoding
Each instruction consists of an opcode byte followed by zero, one, or two operand bytes.
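To make the operand counts concrete, here is a small sketch that executes four real 6502 opcodes with zero, one, and two operand bytes. `MiniCpu` is a hypothetical type written for this illustration and covers only these opcodes; it is not the article's CPU class.

```cpp
#include <cstdint>

// Hypothetical mini CPU: just enough state to show fetch, decode, execute.
struct MiniCpu {
    uint16_t PC = 0;
    uint8_t A = 0;
    bool Z = false, N = false;
    const uint8_t* mem;  // program bytes, treated as memory starting at address 0

    explicit MiniCpu(const uint8_t* m) : mem(m) {}

    uint8_t fetch() { return mem[PC++]; }
    void setZN(uint8_t v) { Z = (v == 0); N = (v & 0x80) != 0; }

    // Executes one instruction; returns false for opcodes this sketch omits.
    bool step() {
        uint8_t op = fetch();
        switch (op) {
            case 0xEA:                       // NOP: no operand bytes
                return true;
            case 0xA9:                       // LDA #imm: one operand byte
                A = fetch();
                setZN(A);
                return true;
            case 0x29:                       // AND #imm: one operand byte
                A = static_cast<uint8_t>(A & fetch());
                setZN(A);
                return true;
            case 0x4C: {                     // JMP abs: two operand bytes, low first
                uint16_t lo = fetch();
                uint16_t hi = fetch();
                PC = static_cast<uint16_t>((hi << 8) | lo);
                return true;
            }
            default:
                return false;
        }
    }
};
```

A full decoder has an entry for every opcode, and many mnemonics (AND, for example) appear under several opcodes, one per addressing mode.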
<code>cmd = read_8bit();
switch (cmd) {
    case AND: ...
    case ASL: ...
    case BCC: ...
    // many more cases
}</code>
PPU (2C02)
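The Picture Processing Unit renders graphics. It has its own 16 KB address space (0x0000‑0x3FFF) containing the pattern tables, name tables, attribute tables, and palettes. On real hardware only 2 KB of this is VRAM inside the console, and the pattern tables usually live in CHR ROM on the cartridge; this emulator simply allocates the whole 16 KB.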
<code>struct VRAM {
    VRAM() {
        _data = (uint8_t*)malloc(0x4000);
        memset(_data, 0, 0x4000);
    }
    const uint8_t* bkPaletteAddress() const { return _getRealAddr(0x3F00); }
    const uint8_t* sprPaletteAddress() const { return _getRealAddr(0x3F10); }
    const uint8_t* nameTableAddress(int i) const { return _getRealAddr(0x2000 + i*0x0400); }
    // ...
};</code>
Palette
The system palette defines 64 colors. The background palette (0x3F00‑0x3F0F) and the sprite palette (0x3F10‑0x3F1F) each hold 16 entries, and every entry is an index into the system palette.
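The two-step lookup, from palette RAM entry to system color, might look like the sketch below. It assumes the system palette is stored as 64 RGB triples (192 bytes), which matches the size of the article's `DEFAULT_PALETTE`; `Rgb`, `paletteRamIndex`, and `lookupColor` are hypothetical names. The sketch also handles the hardware quirk that entries 0x3F10, 0x3F14, 0x3F18, and 0x3F1C mirror 0x3F00, 0x3F04, 0x3F08, and 0x3F0C.

```cpp
#include <cstdint>

struct Rgb { uint8_t r, g, b; };

// Map a PPU address in 0x3F00-0x3FFF to an index into 32 bytes of palette RAM.
inline uint8_t paletteRamIndex(uint16_t ppuAddr) {
    uint8_t i = static_cast<uint8_t>(ppuAddr & 0x1F);  // repeats every 32 bytes
    // Sprite entries 0x10/0x14/0x18/0x1C mirror background 0x00/0x04/0x08/0x0C.
    if ((i & 0x13) == 0x10) i = static_cast<uint8_t>(i & 0x0F);
    return i;
}

// systemPalette: assumed to hold 64 RGB triples (192 bytes).
inline Rgb lookupColor(const uint8_t* paletteRam,
                       const uint8_t* systemPalette,
                       uint16_t ppuAddr) {
    // Each palette RAM entry is a 6-bit index into the system palette.
    uint8_t colorIndex = paletteRam[paletteRamIndex(ppuAddr)] & 0x3F;
    const uint8_t* p = systemPalette + colorIndex * 3;
    return Rgb{p[0], p[1], p[2]};
}
```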
<code>const static uint8_t DEFAULT_PALETTE[192] = {84,84,84,0,30,116,8,16,144,48,0,136,68,0,100,92,...};</code>
Pattern Table
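Pattern tables store the tile graphics. There are two, each 4 KB (0x0000‑0x0FFF and 0x1000‑0x1FFF in PPU address space) and each holding 256 tiles. Each tile is 8×8 pixels and occupies 16 bytes: two 8‑byte bit‑planes that together give every pixel a 2‑bit value.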
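Combining the two bit-planes can be sketched like this. `decodeTile` is a hypothetical helper, assuming the standard layout in which bytes 0 to 7 hold the low bit of each row and bytes 8 to 15 hold the high bit.

```cpp
#include <array>
#include <cstdint>

// Decode one 16-byte tile into 64 two-bit pixel values (0-3), row by row.
inline std::array<uint8_t, 64> decodeTile(const uint8_t* tile) {
    std::array<uint8_t, 64> px{};
    for (int row = 0; row < 8; ++row) {
        uint8_t lo = tile[row];      // bit-plane 0: bytes 0-7
        uint8_t hi = tile[row + 8];  // bit-plane 1: bytes 8-15
        for (int col = 0; col < 8; ++col) {
            int bit = 7 - col;       // leftmost pixel is the most significant bit
            px[row * 8 + col] = static_cast<uint8_t>(
                ((lo >> bit) & 1) | (((hi >> bit) & 1) << 1));
        }
    }
    return px;
}
```

The resulting values 0 to 3 are not colors yet; they become the low two bits of a palette index.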
Name Table
Four name tables (0x2000‑0x2FFF) each contain 960 bytes of tile indices and 64 bytes of attribute data, defining a 32×30 tile background (256×240 pixels).
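As a worked example of this layout, the sketch below computes which name table entry covers a given background pixel and where the referenced tile starts in a pattern table. Both function names are hypothetical, and scrolling is ignored.

```cpp
#include <cstdint>

// PPU address of the name table entry covering background pixel (x, y)
// in name table `table` (0-3), with x in 0..255 and y in 0..239.
inline uint16_t nameTableEntryAddr(int table, int x, int y) {
    int tileX = x / 8;  // 32 tiles per row
    int tileY = y / 8;  // 30 rows
    return static_cast<uint16_t>(0x2000 + table * 0x0400 + tileY * 32 + tileX);
}

// PPU address of a tile's first byte, given the pattern table base
// (0x0000 or 0x1000) and the tile index read from the name table.
inline uint16_t patternAddr(uint16_t patternBase, uint8_t tileIndex) {
    return static_cast<uint16_t>(patternBase + tileIndex * 16);
}
```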
Attribute Table
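Each attribute byte covers a 4×4‑tile block (32×32 pixels) and holds four 2‑bit fields, one per 2×2‑tile quadrant. The two bits select a palette and become the high bits of the color index; combined with the two bits from the pattern table, each pixel gets a full 4‑bit index.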
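The attribute lookup can be sketched as below. The function names are hypothetical; the bit layout assumed is the standard one, with the top-left quadrant in bits 0-1, top-right in bits 2-3, bottom-left in bits 4-5, and bottom-right in bits 6-7.

```cpp
#include <cstdint>

// Offset of the attribute byte for tile (tileX, tileY) within a name table.
// The 64-byte attribute table starts at offset 0x3C0, 8 bytes per row of blocks.
inline uint16_t attributeOffset(int tileX, int tileY) {
    return static_cast<uint16_t>(0x03C0 + (tileY / 4) * 8 + (tileX / 4));
}

// Extract the 2-bit palette number for the tile from its attribute byte.
inline uint8_t paletteBits(uint8_t attrByte, int tileX, int tileY) {
    // Bit 1 of the tile coordinates picks the quadrant: shift is 0, 2, 4, or 6.
    int shift = ((tileY & 2) << 1) | (tileX & 2);
    return static_cast<uint8_t>((attrByte >> shift) & 0x03);
}

// Combine palette number and 2-bit pattern value into a 4-bit palette index.
// A pattern value of 0 always selects the shared backdrop color at index 0.
inline uint8_t backgroundColorIndex(uint8_t palette, uint8_t pixel) {
    if (pixel == 0) return 0;
    return static_cast<uint8_t>((palette << 2) | pixel);
}
```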
Sprites
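Each sprite is a 4‑byte structure that references a tile and holds its position and attributes (palette, priority, flips). The OAM (Object Attribute Memory) holds 64 sprites (256 bytes). It is internal PPU memory outside the CPU and PPU address spaces; the CPU fills it through the PPU registers at 0x2003 (OAM address) and 0x2004 (OAM data), or in bulk via DMA by writing to 0x4014.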
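The attribute byte of a sprite packs several fields into one value. A minimal sketch of unpacking it follows; `SpriteAttributes`, `decodeSpriteAttributes`, and `spriteTopScanline` are hypothetical names, and bits 2-4 are ignored because they are unused on the hardware.

```cpp
#include <cstdint>

// Hypothetical unpacked view of a sprite's attribute byte.
struct SpriteAttributes {
    uint8_t palette;        // bits 0-1: sprite palette 0-3
    bool behindBackground;  // bit 5: draw behind the background
    bool flipH;             // bit 6: flip horizontally
    bool flipV;             // bit 7: flip vertically
};

inline SpriteAttributes decodeSpriteAttributes(uint8_t info) {
    SpriteAttributes a;
    a.palette = static_cast<uint8_t>(info & 0x03);
    a.behindBackground = (info & 0x20) != 0;
    a.flipH = (info & 0x40) != 0;
    a.flipV = (info & 0x80) != 0;
    return a;
}

// OAM stores the Y position minus one, so the sprite's first visible
// scanline is the stored value plus one.
inline int spriteTopScanline(uint8_t oamY) { return oamY + 1; }
```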
<code>struct Sprite {
    uint8_t y;
    uint8_t tileIndex; // Tile in pattern table
    uint8_t info;      // Attributes, palette, priority, flips
    uint8_t x;
};</code>
Screen Rendering
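The PPU produces pixels in real time, and on NTSC consoles its clock runs three times as fast as the CPU's. The visible area is 256×240 pixels, but each NTSC frame spans 262 scanlines of 341 PPU cycles (dots) each; the extra time forms the horizontal blanking (HBlank) and vertical blanking (VBlank) periods.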
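One way to keep the two chips in step is to advance the PPU by three dots for every CPU cycle an instruction consumed. The sketch below assumes NTSC timing and is deliberately simplified: it ignores details such as the dot that real hardware skips on some frames. `PpuClock` and `catchUp` are hypothetical names.

```cpp
#include <cstdint>

constexpr int kDotsPerScanline = 341;
constexpr int kScanlinesPerFrame = 262;
constexpr int kPpuDotsPerCpuCycle = 3;

// Hypothetical PPU position counter: dot within scanline, scanline within frame.
struct PpuClock {
    int dot = 0;
    int scanline = 0;
    uint64_t frame = 0;

    void tick() {
        if (++dot == kDotsPerScanline) {
            dot = 0;
            if (++scanline == kScanlinesPerFrame) {
                scanline = 0;
                ++frame;
            }
        }
    }

    // Simplified: treats the first 256 dots of the first 240 lines as visible.
    bool visible() const { return scanline < 240 && dot < 256; }
};

// Advance the PPU to match the CPU cycles consumed by the last instruction.
inline void catchUp(PpuClock& ppu, int cpuCycles) {
    for (int i = 0; i < cpuCycles * kPpuDotsPerCpuCycle; ++i) ppu.tick();
}
```

One frame is 341 × 262 = 89,342 dots, which works out to roughly 29,780.67 CPU cycles, so frame boundaries do not fall on a whole number of CPU cycles.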
VBlank
During VBlank the PPU is not drawing visible scanlines. If the program has enabled it (bit 7 of the PPU control register at 0x2000), the PPU raises a Non‑Maskable Interrupt (NMI) at the start of VBlank so the program can update graphics safely.
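The VBlank flag and NMI request could be modeled roughly as follows. This sketch assumes NTSC timing, where the flag is set on scanline 241 and cleared on the pre-render line 261; `VBlankLogic` and its members are hypothetical names, not the article's code.

```cpp
#include <cstdint>

// Hypothetical model of the VBlank flag and NMI request.
struct VBlankLogic {
    uint8_t ppuCtrl = 0;     // register 0x2000; bit 7 enables NMI on VBlank
    uint8_t ppuStatus = 0;   // register 0x2002; bit 7 is the VBlank flag
    bool nmiPending = false; // the CPU polls this between instructions

    // Call once per PPU dot with the current position.
    void onDot(int scanline, int dot) {
        if (scanline == 241 && dot == 1) {
            ppuStatus |= 0x80;                       // VBlank begins
            if (ppuCtrl & 0x80) nmiPending = true;   // request NMI if enabled
        } else if (scanline == 261 && dot == 1) {
            ppuStatus &= 0x7F;                       // pre-render line ends VBlank
        }
    }

    // A CPU read of 0x2002 returns the flag and clears it.
    uint8_t readStatus() {
        uint8_t v = ppuStatus;
        ppuStatus &= 0x7F;
        return v;
    }
};
```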
Interrupts
Four interrupt types exist: NMI, Reset (non‑maskable), Break (triggered by opcode 0x00), and IRQ (maskable). The CPU checks for pending interrupts before executing the next instruction.
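What the CPU does on entering an NMI, IRQ, or Break can be sketched as below: push the return address and status, set the I flag, and jump through the vector. `IrqDemoCpu` is a hypothetical stand-in with a flat memory array, and Reset is left out because it does not write to the stack.

```cpp
#include <cstdint>

// Hypothetical CPU fragment showing interrupt entry only.
struct IrqDemoCpu {
    uint8_t mem[0x10000] = {};
    uint16_t PC = 0;
    uint8_t SP = 0xFF;
    uint8_t P = 0x20;  // status register; bit 5 is always pushed as set

    void push8(uint8_t v) { mem[0x0100 + SP] = v; --SP; }

    uint16_t read16(uint16_t a) const {
        return static_cast<uint16_t>(mem[a] | (mem[(a + 1) & 0xFFFF] << 8));
    }

    // vector: 0xFFFA for NMI, 0xFFFE for IRQ and Break.
    void enterInterrupt(uint16_t vector, bool fromBrk) {
        push8(static_cast<uint8_t>(PC >> 8));    // return address, high byte first
        push8(static_cast<uint8_t>(PC & 0xFF));
        uint8_t pushed = static_cast<uint8_t>(P | 0x20);
        // The B bit (bit 4) exists only in the pushed copy: set for Break,
        // clear for hardware interrupts.
        pushed = static_cast<uint8_t>(fromBrk ? (pushed | 0x10) : (pushed & 0xEF));
        push8(pushed);
        P |= 0x04;                               // set I to mask further IRQs
        PC = read16(vector);                     // jump to the handler
    }

    // Maskable IRQs are serviced only while the I flag is clear.
    bool irqAllowed() const { return (P & 0x04) == 0; }
};
```

The handler later returns with RTI, which pops the status byte and return address in the reverse order.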
<code>// Example: retrieve interrupt handler address
inline uint16_t _getInterruptHandlerAddr(InterruptType type) const {
    static const std::map<int, uint16_t> handler = {
        {InterruptTypeNMI,   0xFFFA},
        {InterruptTypeReset, 0xFFFC},
        {InterruptTypeIRQs,  0xFFFE},
        {InterruptTypeBreak, 0xFFFE}
    };
    uint16_t addr = handler.at(type);
    return _mem->get16bitData(addr);
}</code>
Conclusion
This overview covers the essential concepts needed to start building an NES emulator: loading ROMs, handling memory mirroring, emulating the 2A03 CPU, rendering graphics with the PPU, and managing interrupts. Further details such as mapper support, scrolling, collision detection, and precise timing can be explored on nesdev.com.