How Browsers Turn URLs into Web Pages: Inside Rendering Engines and Parsing

From typing a URL to seeing a page, browsers perform a complex series of steps—including network requests, HTML and CSS parsing, DOM and render tree construction, layout, painting, and script execution—while handling errors and optimizations across components such as the UI, engine, networking, JavaScript interpreter, and storage.

ITFLY8 Architecture Home

Introduction

Browsers are probably the most widely used software. This article explains how a browser works behind the scenes, from the moment you enter google.com in the address bar until the Google homepage appears.

Browsers Covered

When the original article was written, five browsers dominated: Internet Explorer, Firefox, Safari, Chrome, and Opera. The discussion focuses on the open‑source browsers: Firefox, Chrome, and the partially open‑source Safari.

Main Functions of a Browser

A browser retrieves web resources identified by a URI, renders them (typically HTML, but also PDF, images, etc.), and presents them in a window. HTML and CSS specifications define how browsers should interpret documents; the W3C maintains these standards.

Because vendors add proprietary extensions, strict compliance is rare, leading to compatibility challenges for web developers.

High‑Level Structure (Components)

User Interface – address bar, navigation buttons, bookmarks, refresh/stop, home button.

Browser Engine – interface to the rendering engine.

Rendering Engine – parses HTML/CSS and paints the result.

Network – platform‑independent HTTP handling.

UI Backend – draws native widgets (menus, dialogs).

JavaScript Interpreter – executes JS code.

Data Storage – persistent storage (cookies, WebSQL, IndexedDB).

Browser components diagram

Component Communication

Firefox and Chrome implement a special inter‑component communication structure, described in a dedicated chapter.

Rendering Engine

The rendering engine’s job is to display the requested content. By default it can render HTML, XML, and images, and can use plugins for other formats (e.g., PDF).

Engines Used by the Discussed Browsers

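Firefox uses Gecko, an engine developed by Mozilla. Safari uses WebKit, which is open source even though Safari itself is only partially so. Chrome also used WebKit when the original article was written; since 2013 it has used Blink, a fork of WebKit.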

WebKit began as an engine for Linux (it descends from KDE's KHTML) and was later adapted by Apple for macOS and Windows.

Main Flow of a Rendering Engine

1. Network fetches the document (often in 8 KB chunks).

2. Parse HTML → build DOM tree.

3. Parse CSS and combine with DOM to build the render tree.

4. Layout the render tree (compute coordinates).

5. Paint the render tree to the screen.

Rendering engine basic flow

Parsing

Parsing converts a document into a structured tree (parse tree or syntax tree). For example, parsing the expression 2+3-1 yields a binary‑tree representation.
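
As a rough illustration of the expression example, the sketch below turns a string such as 2+3-1 into a binary tree and then evaluates it. The names ExprNode, parseNumber, parseExpression, and evaluate are invented for this example, and the toy grammar (unsigned integers joined by + and -, no whitespace) is an assumption rather than anything a browser uses.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// One node of the parse tree: a number leaf or a binary operator.
struct ExprNode {
    char op = 0;    // '+' or '-' for operator nodes, 0 for a leaf
    int value = 0;  // used only when op == 0
    std::unique_ptr<ExprNode> left;
    std::unique_ptr<ExprNode> right;
};

// Reads one unsigned integer starting at pos and advances pos past it.
std::unique_ptr<ExprNode> parseNumber(const std::string& text, std::size_t& pos) {
    auto leaf = std::make_unique<ExprNode>();
    while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]))) {
        leaf->value = leaf->value * 10 + (text[pos] - '0');
        ++pos;
    }
    return leaf;
}

// expression := number (('+' | '-') number)*
// Each operator becomes the new root, so the tree is left-associative.
std::unique_ptr<ExprNode> parseExpression(const std::string& text) {
    std::size_t pos = 0;
    auto root = parseNumber(text, pos);
    while (pos < text.size() && (text[pos] == '+' || text[pos] == '-')) {
        auto parent = std::make_unique<ExprNode>();
        parent->op = text[pos++];
        parent->left = std::move(root);
        parent->right = parseNumber(text, pos);
        root = std::move(parent);
    }
    return root;
}

// Walks the tree from the leaves up to compute the value.
int evaluate(const ExprNode& node) {
    if (node.op == 0) return node.value;
    const int lhs = evaluate(*node.left);
    const int rhs = evaluate(*node.right);
    return node.op == '+' ? lhs + rhs : lhs - rhs;
}
```

For 2+3-1 the root is the - node, its left child is the + node over 2 and 3, and its right child is the leaf 1.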

Expression tree

Grammars and Parsers

Parsing relies on a grammar (usually a context‑free grammar expressed in BNF). Two main parser types exist:

Top‑down parsers – start from the highest‑level rule and try to match input.

Bottom‑up parsers – build matches from the input upward (shift‑reduce).
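
To make the shift‑reduce idea more concrete, here is a minimal sketch of a bottom‑up parser for a toy grammar (unsigned integers joined by + and -). It shifts input onto a stack and reduces whenever the top three symbols match "expression operator expression". Symbol, tryReduce, and parseShiftReduce are hypothetical names, and the sketch folds values as it reduces instead of building a tree.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// A stack symbol: an expression that already has a value, or an operator.
struct Symbol {
    bool isExpression;
    int value;  // meaningful only when isExpression is true
    char op;    // meaningful only when isExpression is false
};

// Reduce: if the top of the stack reads "expression operator expression",
// replace those three symbols with one expression holding the result.
bool tryReduce(std::vector<Symbol>& stack) {
    const std::size_t n = stack.size();
    if (n < 3) return false;
    const Symbol lhs = stack[n - 3];
    const Symbol oper = stack[n - 2];
    const Symbol rhs = stack[n - 1];
    if (!lhs.isExpression || oper.isExpression || !rhs.isExpression) return false;
    const int result = (oper.op == '+') ? lhs.value + rhs.value : lhs.value - rhs.value;
    stack.erase(stack.end() - 3, stack.end());
    stack.push_back(Symbol{true, result, '\0'});
    return true;
}

// Returns true and stores the value in result if the whole input reduces
// to a single expression; returns false otherwise.
bool parseShiftReduce(const std::string& text, int& result) {
    std::vector<Symbol> stack;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const char c = text[pos];
        if (std::isdigit(static_cast<unsigned char>(c))) {
            int number = 0;
            while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]))) {
                number = number * 10 + (text[pos] - '0');
                ++pos;
            }
            stack.push_back(Symbol{true, number, '\0'});  // shift a number
        } else if (c == '+' || c == '-') {
            stack.push_back(Symbol{false, 0, c});          // shift an operator
            ++pos;
        } else {
            return false;  // character outside the toy grammar
        }
        while (tryReduce(stack)) {}  // reduce for as long as the rule matches
    }
    if (stack.size() != 1 || !stack[0].isExpression) return false;
    result = stack[0].value;
    return true;
}
```

With the input 2+3-1 the stack holds 2, then 2 +, then 2 + 3, which reduces to 5; after shifting - and 1 it reduces again to 4.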

HTML Parsing

HTML cannot be parsed with generic top‑down or bottom‑up parsers because of its tolerant nature. Browsers implement a custom tokenization algorithm that turns the input stream into tokens (start tag, end tag, attribute name/value, character data, etc.) and then a tree‑construction algorithm that builds the DOM.

HTML parsing flow

The tokenizer is a state machine (e.g., Data State, Tag Open State, Tag Name State). It starts in Data State; a '<' switches it to Tag Open State, the first letter of the tag name creates a start tag token and moves it to Tag Name State, and a '>' emits the token and returns it to Data State.
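
The following is a heavily simplified sketch of such a state machine, assuming only three states and no attributes, comments, or error handling. Token, State, and tokenize are illustrative names and not taken from any browser's source; the real algorithm has many more states (for example, a separate state after "</").

```cpp
#include <string>
#include <vector>

enum class TokenType { StartTag, EndTag, Character };

struct Token {
    TokenType type;
    std::string data;  // tag name, or one character of text
};

enum class State { Data, TagOpen, TagName };

std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    State state = State::Data;
    Token current{TokenType::StartTag, ""};

    for (char c : input) {
        switch (state) {
        case State::Data:
            if (c == '<') {
                state = State::TagOpen;  // a tag is starting
            } else {
                tokens.push_back(Token{TokenType::Character, std::string(1, c)});
            }
            break;
        case State::TagOpen:
            if (c == '/') {
                current = Token{TokenType::EndTag, ""};  // "</" begins an end tag
            } else {
                current = Token{TokenType::StartTag, std::string(1, c)};
            }
            state = State::TagName;
            break;
        case State::TagName:
            if (c == '>') {
                tokens.push_back(current);  // emit the finished tag token
                state = State::Data;
            } else {
                current.data += c;          // keep collecting the tag name
            }
            break;
        }
    }
    return tokens;
}
```

Feeding it the string <p>Hi</p> yields a start tag token for p, two character tokens, and an end tag token for p.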

Tokenization example

Tree construction uses a stack of open elements to handle nesting, automatically inserting missing tags (e.g., <head>) and correcting mismatched structures.
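
A minimal sketch of the stack of open elements might look like the code below. It assumes a pre‑tokenized input and only shows two behaviours: nesting via the stack, and closing unclosed children when a parent's end tag arrives. It does not insert implied elements such as <head>, and Token, Node, and buildTree are names made up for the example.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class TokenType { StartTag, EndTag, Text };

struct Token {
    TokenType type;
    std::string data;  // tag name or text content
};

struct Node {
    std::string name;  // tag name, "#text", or "#document"
    std::string text;  // used only by text nodes
    std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> buildTree(const std::vector<Token>& tokens) {
    auto document = std::make_unique<Node>();
    document->name = "#document";
    // The stack of open elements; the document itself is never popped.
    std::vector<Node*> openElements{document.get()};

    for (const Token& token : tokens) {
        Node* parent = openElements.back();
        if (token.type == TokenType::StartTag) {
            auto element = std::make_unique<Node>();
            element->name = token.data;
            openElements.push_back(element.get());
            parent->children.push_back(std::move(element));
        } else if (token.type == TokenType::Text) {
            auto textNode = std::make_unique<Node>();
            textNode->name = "#text";
            textNode->text = token.data;
            parent->children.push_back(std::move(textNode));
        } else {
            // End tag: search the stack from the top for a matching element.
            // If found, pop it together with any unclosed elements above it.
            // A stray end tag with no match is ignored.
            for (std::size_t i = openElements.size(); i-- > 1;) {
                if (openElements[i]->name == token.data) {
                    openElements.resize(i);
                    break;
                }
            }
        }
    }
    return document;
}
```

Given the tokens for <div><p>hi</div><span>, the missing </p> is tolerated: the p element is closed along with the div, and span becomes a sibling of div.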

HTML tree construction example

Error Tolerance

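Browsers silently fix malformed HTML, for example </br> written in place of <br>, a stray <table> nested directly inside another table, or nested forms. The fix‑up code is internal and invisible to the user. WebKit, for instance, treats </br> as <br> in compatibility mode: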

    if (t->isCloseTag(brTag) && m_document->inCompatMode()) {
        reportError(MalformedBRError);
        t->beginTag = true;  // handle the closing tag as if it were <br>
    }

CSS Parsing

CSS has a context‑free grammar, so it can be parsed with standard parser types. Tokens are defined by regular expressions (identifiers, numbers, comments, etc.). The grammar describes rulesets, selectors, and declarations.
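
To show the shape of a ruleset (a selector followed by declarations in braces), here is a small hand‑written sketch that splits one ruleset into its parts. It is not how any browser parses CSS and it ignores comments, strings, and nested blocks; Declaration, Ruleset, trim, and parseRuleset are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Declaration {
    std::string property;
    std::string value;
};

struct Ruleset {
    std::string selector;
    std::vector<Declaration> declarations;
};

// Strips leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* whitespace = " \t\r\n";
    const std::size_t begin = s.find_first_not_of(whitespace);
    if (begin == std::string::npos) return "";
    const std::size_t end = s.find_last_not_of(whitespace);
    return s.substr(begin, end - begin + 1);
}

// ruleset     := selector '{' declaration (';' declaration)* '}'
// declaration := property ':' value
bool parseRuleset(const std::string& css, Ruleset& out) {
    const std::size_t open = css.find('{');
    if (open == std::string::npos) return false;
    const std::size_t close = css.find('}', open);
    if (close == std::string::npos) return false;

    out.selector = trim(css.substr(0, open));
    out.declarations.clear();

    std::size_t pos = open + 1;
    while (pos < close) {
        std::size_t end = css.find(';', pos);
        if (end == std::string::npos || end > close) end = close;
        const std::string text = css.substr(pos, end - pos);
        const std::size_t colon = text.find(':');
        if (colon != std::string::npos) {
            out.declarations.push_back(
                Declaration{trim(text.substr(0, colon)), trim(text.substr(colon + 1))});
        }
        pos = end + 1;
    }
    return !out.selector.empty();
}
```

For the input h1 { color: red; margin: 0 } the sketch produces the selector h1 with two declarations, color: red and margin: 0.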

CSS parsing diagram

Script Parsing and Execution

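JavaScript execution blocks document parsing unless the script is marked defer or async. While a script runs and the main parser is blocked, browsers may speculatively parse the rest of the document on another thread to find and start fetching resources such as scripts, style sheets, and images; the speculative parser does not modify the DOM.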

Render Tree Construction

While the DOM tree is being built, the browser also constructs a render tree made up of the visible elements in the order they will be displayed. Firefox calls these elements frames; WebKit calls them render objects. The render tree is used for layout and painting.

    class RenderObject {
        virtual void layout();
        virtual void paint(PaintInfo);
        RenderStyle* style;  // the computed style
        Node* node;          // the DOM node this object renders
    };
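
The relationship between the two trees can be sketched as follows: walk the DOM and create a render node only for elements that are visible. This is an illustration under simplifying assumptions, not engine code. DomNode, RenderNode, and buildRenderTree are hypothetical names, and the displayNone flag stands in for a resolved "display: none" style.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct DomNode {
    std::string tag;
    bool displayNone = false;  // stand-in for a computed "display: none"
    std::vector<DomNode> children;
};

struct RenderNode {
    const DomNode* node = nullptr;  // the DOM node this render node draws
    std::vector<std::unique_ptr<RenderNode>> children;
};

// Returns nullptr for a hidden node, which also drops its whole subtree.
std::unique_ptr<RenderNode> buildRenderTree(const DomNode& dom) {
    if (dom.displayNone) return nullptr;
    auto render = std::make_unique<RenderNode>();
    render->node = &dom;
    for (const DomNode& child : dom.children) {
        if (auto childRender = buildRenderTree(child)) {
            render->children.push_back(std::move(childRender));
        }
    }
    return render;
}
```

The render tree therefore has the same overall shape as the DOM tree but with hidden subtrees left out.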
Render tree vs DOM tree

Style Computation

Each render object needs computed style values. Styles come from user‑agent defaults, author stylesheets, inline styles, and presentational attributes. To avoid recomputing, browsers share style objects when possible (same tag, class, state, no IDs, etc.).
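
As a sketch of the sharing conditions listed above, the check below compares two elements and reports whether they could reuse one style object. It covers only the conditions named in the text (same tag, same class, same state, no IDs); real engines check more. ElementInfo and canShareStyle are hypothetical names, and the hovered flag is a stand‑in for element state in general.

```cpp
#include <string>

struct ElementInfo {
    std::string tag;
    std::string classAttribute;
    std::string id;
    bool hovered;  // stand-in for element state such as hover or focus
};

// Two elements may share one computed style only if neither has an ID
// and their tag, class, and state all match.
bool canShareStyle(const ElementInfo& a, const ElementInfo& b) {
    if (!a.id.empty() || !b.id.empty()) return false;
    return a.tag == b.tag &&
           a.classAttribute == b.classAttribute &&
           a.hovered == b.hovered;
}
```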

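Firefox builds a rule tree and a style‑context tree. WebKit has no rule tree; it makes several passes over the matched declarations in cascade order (non‑important first, then important), so that higher‑priority declarations override earlier ones.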
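
Cascade order itself can be illustrated with a sort: rank each declaration for a property by origin and importance, then by specificity, then by source order, and let the last one win. This is a generic sketch of the CSS cascade, not the data structure used by Firefox or WebKit. Origin, Declaration, cascadeLevel, and winningValue are names invented here, and inline styles and presentational attributes are not modelled.

```cpp
#include <algorithm>
#include <string>
#include <vector>

enum class Origin { UserAgent, User, Author };

struct Declaration {
    Origin origin;
    bool important;
    int specificity;    // larger means a more specific selector
    std::string value;
};

// Normal declarations rank user agent < user < author;
// important declarations reverse that order and outrank all normal ones.
int cascadeLevel(const Declaration& d) {
    if (!d.important) {
        switch (d.origin) {
        case Origin::UserAgent: return 0;
        case Origin::User:      return 1;
        case Origin::Author:    return 2;
        }
    }
    switch (d.origin) {
    case Origin::Author:    return 3;
    case Origin::User:      return 4;
    case Origin::UserAgent: return 5;
    }
    return 0;  // not reached; keeps compilers quiet
}

// Takes all declarations for one property in source order and returns
// the value that wins the cascade (empty string if there are none).
std::string winningValue(std::vector<Declaration> declarations) {
    // stable_sort keeps source order among equal entries, so a later
    // declaration beats an earlier one with the same level and specificity.
    std::stable_sort(declarations.begin(), declarations.end(),
                     [](const Declaration& a, const Declaration& b) {
                         const int levelA = cascadeLevel(a);
                         const int levelB = cascadeLevel(b);
                         if (levelA != levelB) return levelA < levelB;
                         return a.specificity < b.specificity;
                     });
    return declarations.empty() ? "" : declarations.back().value;
}
```

For example, an author rule with higher specificity beats a less specific author rule, but an important user declaration beats both.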

Firefox rule tree

Further Reading

Source: http://www.kuqin.com/system-analysis/20120205/317831.html

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
