
How Browsers Turn URLs into Web Pages: Inside Rendering Engines and Parsing

From typing a URL to seeing a page, browsers perform a complex series of steps—including network requests, HTML and CSS parsing, DOM and render tree construction, layout, painting, and script execution—while handling errors and optimizations across components such as the UI, engine, networking, JavaScript interpreter, and storage.

ITFLY8 Architecture Home

Jan 20, 2017


Introduction

Web browsers are among the most widely used software applications. This article explains how a browser works, from the moment you enter google.com in the address bar to the moment the Google homepage appears on screen.

Browsers Covered

Five mainstream browsers exist today: IE, Firefox, Safari, Chrome, and Opera. The discussion focuses on open‑source browsers—Firefox, Chrome, and the partially open‑source Safari.

Main Functions of a Browser

A browser retrieves web resources identified by a URI, renders them (typically HTML, but also PDF, images, etc.), and presents them in a window. The HTML and CSS specifications define how browsers should interpret documents. The W3C maintains the CSS specifications and historically maintained HTML as well; since 2019 the HTML standard has been maintained by the WHATWG.

Because vendors add proprietary extensions, strict compliance is rare, leading to compatibility challenges for web developers.

High‑Level Structure (Components)

User Interface – address bar, navigation buttons, bookmarks, refresh/stop, home button.

Browser Engine – interface to the rendering engine.

Rendering Engine – parses HTML/CSS and paints the result.

Network – performs network calls such as HTTP requests behind a platform-independent interface.

UI Backend – draws native widgets (menus, dialogs).

JavaScript Interpreter – executes JS code.

Data Storage – persistent storage (cookies, WebSQL, IndexedDB).

Component Communication

Firefox and Chrome implement a special inter‑component communication structure, described in a dedicated chapter.

Rendering Engine

The rendering engine’s job is to display the requested content. By default it can render HTML, XML, and images, and can use plugins for other formats (e.g., PDF).

Engines Used by the Discussed Browsers

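Firefox uses Gecko, an engine developed by Mozilla. Safari uses WebKit, which is only partially open-source in Safari's build. Chrome used WebKit until 2013, when it switched to Blink, a fork of WebKit, so most of what is said about WebKit below still describes Chrome's overall design.

WebKit began as a fork of KHTML, the KDE project's engine for Linux, and was adapted by Apple for macOS and Windows.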

Main Flow of a Rendering Engine

1. Network fetches the document (often in 8 KB chunks).

2. Parse HTML → build DOM tree.

3. Parse CSS and combine with DOM to build the render tree.

4. Layout the render tree (compute coordinates).

5. Paint the render tree to the screen.

Parsing

Parsing converts a document into a structured tree (parse tree or syntax tree). For example, parsing the expression 2+3-1 yields a binary‑tree representation.
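To make the 2+3-1 example concrete, here is a minimal sketch of a parser that builds that binary tree. The names ExprNode, parseExpression, toPrefix and evaluate are made up for illustration, and the sketch assumes well-formed input containing only non-negative integers, '+' and '-'.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node type: a leaf holds a number, an inner node holds '+' or '-'.
struct ExprNode {
    char op = 0;    // 0 marks a number leaf
    int value = 0;  // used only by leaves
    std::unique_ptr<ExprNode> left, right;
};

// Reads one non-negative integer starting at pos and returns it as a leaf.
std::unique_ptr<ExprNode> parseNumber(const std::string& s, std::size_t& pos) {
    auto node = std::make_unique<ExprNode>();
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
        node->value = node->value * 10 + (s[pos] - '0');
        ++pos;
    }
    return node;
}

// expression := number (('+' | '-') number)*   (left-associative)
std::unique_ptr<ExprNode> parseExpression(const std::string& s) {
    std::size_t pos = 0;
    auto tree = parseNumber(s, pos);
    while (pos < s.size() && (s[pos] == '+' || s[pos] == '-')) {
        auto parent = std::make_unique<ExprNode>();
        parent->op = s[pos++];
        parent->left = std::move(tree);       // everything parsed so far
        parent->right = parseNumber(s, pos);  // the next operand
        tree = std::move(parent);
    }
    return tree;
}

// Renders the tree in prefix form so the nesting is visible.
std::string toPrefix(const ExprNode& n) {
    if (n.op == 0) return std::to_string(n.value);
    return std::string("(") + n.op + " " + toPrefix(*n.left) + " " +
           toPrefix(*n.right) + ")";
}

// Walks the tree to compute the value of the expression.
int evaluate(const ExprNode& n) {
    if (n.op == 0) return n.value;
    int l = evaluate(*n.left);
    int r = evaluate(*n.right);
    return n.op == '+' ? l + r : l - r;
}
```

For the input "2+3-1", toPrefix gives (- (+ 2 3) 1): the subtraction is the root, with the addition as its left child, and evaluate returns 4.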

Grammars and Parsers

Parsing relies on a grammar (usually a context‑free grammar expressed in BNF). Two main parser types exist:

Top‑down parsers – start from the highest‑level rule and try to match input.

Bottom‑up parsers – build matches from the input upward (shift‑reduce).
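The bottom-up approach can be sketched as a small shift-reduce loop. This is only an illustration under assumptions: the grammar is the toy one (term := number, expr := term, expr := expr op term), and the names Sym, StackItem, reduce and shiftReduceParse are hypothetical. Real parser generators drive the same idea from tables.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class Sym { Number, Op, Term, Expr };

struct StackItem {
    Sym sym;
    std::string text;  // prefix-form rendering of the input this item covers
};

// Applies grammar rules to the top of the stack until none matches.
void reduce(std::vector<StackItem>& stack, std::vector<std::string>& log) {
    for (;;) {
        std::size_t n = stack.size();
        if (n >= 1 && stack[n - 1].sym == Sym::Number) {
            stack[n - 1].sym = Sym::Term;
            log.push_back("term := number");
        } else if (n >= 3 && stack[n - 3].sym == Sym::Expr &&
                   stack[n - 2].sym == Sym::Op && stack[n - 1].sym == Sym::Term) {
            std::string text = "(" + stack[n - 2].text + " " + stack[n - 3].text +
                               " " + stack[n - 1].text + ")";
            stack.erase(stack.end() - 3, stack.end());
            stack.push_back({Sym::Expr, text});
            log.push_back("expr := expr op term");
        } else if (n == 1 && stack[0].sym == Sym::Term) {
            stack[0].sym = Sym::Expr;
            log.push_back("expr := term");
        } else {
            return;
        }
    }
}

// Returns the parse in prefix form, or an empty string if the input is rejected.
std::string shiftReduceParse(const std::string& input, std::vector<std::string>& log) {
    std::vector<StackItem> stack;
    std::size_t pos = 0;
    while (pos < input.size()) {
        // Shift: move the next token from the input onto the stack.
        if (std::isdigit(static_cast<unsigned char>(input[pos]))) {
            std::string digits;
            while (pos < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[pos]))) {
                digits += input[pos++];
            }
            stack.push_back({Sym::Number, digits});
        } else {
            stack.push_back({Sym::Op, std::string(1, input[pos++])});
        }
        reduce(stack, log);
    }
    // A successful parse leaves exactly one expr on the stack.
    if (stack.size() == 1 && stack[0].sym == Sym::Expr) return stack[0].text;
    return std::string();
}
```

Parsing "2+3-1" this way records six reductions in log and ends with the single stack entry (- (+ 2 3) 1), while an incomplete input such as "2+" leaves two items on the stack and is rejected.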

HTML Parsing

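HTML cannot be parsed with conventional top-down or bottom-up parsers: its grammar is not context-free, the language is deliberately forgiving of errors, and scripts can add input while parsing is still under way (for example through document.write). Browsers therefore implement a custom parser in two stages: a tokenization algorithm that turns the input stream into tokens (start tag, end tag, attribute name/value, character data, etc.), followed by a tree-construction algorithm that builds the DOM.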

During tokenization, the parser maintains a state machine (e.g., Data State, Tag Open State, Tag Name State). When a '<' is encountered, it switches to Tag Open State, reads the tag name until '>', and creates a token.
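The state machine described above might look roughly like this. It is a heavily reduced sketch, not the algorithm from the HTML specification: it knows only the three states named in the text, ignores attributes, comments and character references, and the names Token, State and tokenize are made up for illustration.

```cpp
#include <string>
#include <vector>

enum class TokenType { StartTag, EndTag, Character };

struct Token {
    TokenType type;
    std::string data;  // tag name, or a run of text
};

enum class State { Data, TagOpen, TagName };

std::vector<Token> tokenize(const std::string& html) {
    std::vector<Token> tokens;
    State state = State::Data;
    Token current{TokenType::Character, ""};

    for (char c : html) {
        switch (state) {
        case State::Data:
            if (c == '<') {
                // Emit any pending text, then start looking at a tag.
                if (!current.data.empty()) tokens.push_back(current);
                state = State::TagOpen;
            } else {
                current.data += c;
            }
            break;
        case State::TagOpen:
            // '/' right after '<' means an end tag; anything else starts a name.
            if (c == '/') {
                current = {TokenType::EndTag, ""};
            } else {
                current = {TokenType::StartTag, std::string(1, c)};
            }
            state = State::TagName;
            break;
        case State::TagName:
            if (c == '>') {
                tokens.push_back(current);  // the tag token is complete
                current = {TokenType::Character, ""};
                state = State::Data;
            } else {
                current.data += c;
            }
            break;
        }
    }
    // Emit trailing text; an unfinished tag at end of input is dropped here.
    if (state == State::Data && !current.data.empty()) tokens.push_back(current);
    return tokens;
}
```

Given <html><body>Hello</body></html>, this produces five tokens: two start tags, one character token holding "Hello", and two end tags.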

Tree construction uses a stack of open elements to handle nesting, automatically inserting missing tags (e.g., <head>) and correcting mismatched structures.
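A simplified sketch of the stack of open elements is shown below. It assumes tokens of the hypothetical Token type defined in the block, implicitly closes unclosed descendants when an ancestor's end tag arrives, and ignores end tags that match nothing. It does not insert missing elements such as <head>; the real algorithm does that through a set of insertion modes that are beyond this example.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class TokenType { StartTag, EndTag, Character };
struct Token { TokenType type; std::string data; };

struct DomNode {
    std::string name;  // tag name, "#text", or "#document"
    std::string text;  // used by text nodes only
    std::vector<std::unique_ptr<DomNode>> children;
};

std::unique_ptr<DomNode> buildTree(const std::vector<Token>& tokens) {
    auto root = std::make_unique<DomNode>();
    root->name = "#document";
    std::vector<DomNode*> open{root.get()};  // the stack of open elements

    for (const Token& tok : tokens) {
        if (tok.type == TokenType::EndTag) {
            // Search from the top of the stack for the matching element and
            // pop it together with everything opened after it. Index 0 (the
            // document) is never popped; unmatched end tags are ignored.
            for (std::size_t i = open.size(); i-- > 1;) {
                if (open[i]->name == tok.data) {
                    open.resize(i);
                    break;
                }
            }
            continue;
        }
        auto node = std::make_unique<DomNode>();
        if (tok.type == TokenType::StartTag) {
            node->name = tok.data;
        } else {
            node->name = "#text";
            node->text = tok.data;
        }
        DomNode* raw = node.get();
        open.back()->children.push_back(std::move(node));  // child of current element
        if (tok.type == TokenType::StartTag) open.push_back(raw);
    }
    return root;
}

// Writes the tree back out as markup so the repaired structure is visible.
std::string serialize(const DomNode& n) {
    if (n.name == "#text") return n.text;
    std::string inner;
    for (const auto& child : n.children) inner += serialize(*child);
    if (n.name == "#document") return inner;
    return "<" + n.name + ">" + inner + "</" + n.name + ">";
}
```

Feeding it the tokens for <div><p>text</div></span> yields a tree that serializes as <div><p>text</p></div>: the unclosed <p> is closed along with its parent, and the stray </span> is dropped.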

Error Tolerance

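Browsers silently repair malformed HTML, such as </br> written in place of <br>, stray <table> elements, and nested forms. The repair happens inside the parser and is never shown to the user. WebKit, for example, treats a stray </br> as <br> when the document is in compatibility (quirks) mode:

if (t->isCloseTag(brTag) && m_document->inCompatMode()) {
    reportError(MalformedBRError);
    t->beginTag = true;  // reinterpret the end tag as a start tag
}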

CSS Parsing

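Unlike HTML, CSS has a context-free grammar and can be parsed with standard parsers. Tokens are defined by regular expressions (identifiers, numbers, comments, etc.), and the grammar describes rulesets, selectors, and declarations. WebKit generates its CSS parser with Flex and Bison, while Firefox uses a hand-written top-down parser.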
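The ruleset structure (a selector followed by a block of declarations) can be illustrated with a small hand-written scanner. This is a sketch under strong assumptions: it uses plain string searches instead of a regular-expression tokenizer, and it ignores comments, strings, at-rules and nested blocks. The names Declaration, Ruleset, trim and parseCss are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Declaration { std::string property, value; };
struct Ruleset {
    std::string selector;
    std::vector<Declaration> declarations;
};

// Strips leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// stylesheet := ruleset*
// ruleset    := selector '{' declaration (';' declaration)* '}'
std::vector<Ruleset> parseCss(const std::string& css) {
    std::vector<Ruleset> rules;
    std::size_t pos = 0;
    for (;;) {
        std::size_t open = css.find('{', pos);
        if (open == std::string::npos) break;
        std::size_t close = css.find('}', open);
        if (close == std::string::npos) break;  // unterminated block: stop

        Ruleset rule;
        rule.selector = trim(css.substr(pos, open - pos));
        std::size_t p = open + 1;
        while (p < close) {
            // A declaration ends at the next ';' or at the closing brace.
            std::size_t semi = css.find(';', p);
            if (semi == std::string::npos || semi > close) semi = close;
            std::string decl = css.substr(p, semi - p);
            std::size_t colon = decl.find(':');
            if (colon != std::string::npos) {
                rule.declarations.push_back(
                    {trim(decl.substr(0, colon)), trim(decl.substr(colon + 1))});
            }
            p = semi + 1;
        }
        rules.push_back(std::move(rule));
        pos = close + 1;
    }
    return rules;
}
```

For the input "h1, h2 { color: red; margin: 0 } p { display: none; }" the result is two rulesets: the first with selector "h1, h2" and two declarations, the second with selector "p" and one.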

Script Parsing and Execution

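Scripts block document parsing: when the parser reaches a <script> tag, it stops until the script has been fetched and executed, unless the script is marked defer or async. To reduce the cost of that pause, browsers perform speculative parsing: while a script is being fetched or executed, a background thread scans the rest of the document and starts loading the external resources it finds (scripts, style sheets, images). The speculative parser does not modify the DOM; that remains the job of the main parser once it resumes.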

Render Tree Construction

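While the DOM is being built, the browser constructs a second tree, the render tree, which contains only the elements that will be displayed, in the order in which they will be painted. Elements with no visual output, such as <head> or anything with display: none, are left out. Firefox calls the nodes of this tree frames; WebKit calls them render objects. The render tree is the input to layout and painting.

class RenderObject {
    virtual void layout();          // computes this object's position and size
    virtual void paint(PaintInfo);  // draws this object
    RenderStyle* style;             // the computed style
    Node* node;                     // the DOM node this object represents
};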
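How a render tree is derived from the DOM can be sketched as a recursive walk that skips anything whose computed display value is none. The types DomNode and RenderNode and the function buildRenderTree are hypothetical stand-ins, and the display field is assumed to hold an already computed value; real engines also create anonymous boxes and handle many more cases.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct DomNode {
    std::string tag;
    std::string display = "block";  // stand-in for the computed 'display' value
    std::vector<DomNode> children;
};

struct RenderNode {
    const DomNode* node = nullptr;  // back-pointer to the DOM node
    std::vector<std::unique_ptr<RenderNode>> children;
};

// Returns nullptr for subtrees that produce no boxes at all.
std::unique_ptr<RenderNode> buildRenderTree(const DomNode& dom) {
    if (dom.display == "none") return nullptr;  // the whole subtree is skipped
    auto render = std::make_unique<RenderNode>();
    render->node = &dom;
    for (const DomNode& child : dom.children) {
        if (auto sub = buildRenderTree(child)) {
            render->children.push_back(std::move(sub));
        }
    }
    return render;
}
```

If <head> and one <div> inside <body> carry display: none, the resulting render tree contains only html, body and the remaining visible children. The DOM tree passed in must outlive the render tree, since render nodes keep raw pointers into it.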

Style Computation

Each render object needs computed style values. Styles come from user‑agent defaults, author stylesheets, inline styles, and presentational attributes. To avoid recomputing, browsers share style objects when possible (same tag, class, state, no IDs, etc.).
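The sharing test can be pictured as a cheap comparison made before any rule matching is done. The sketch below covers only the conditions listed above, plus the assumption that an inline style prevents sharing; the Element record and canShareStyle are hypothetical, and real engines check more conditions than these.

```cpp
#include <string>

// Hypothetical, stripped-down element record; real engines track far more state.
struct Element {
    std::string tag;
    std::string id;
    std::string classAttr;
    std::string inlineStyle;
    bool hovered = false;
    bool focused = false;
};

// Returns true when b may reuse the style object already computed for a.
bool canShareStyle(const Element& a, const Element& b) {
    // An ID can be the target of a rule that applies to one element only.
    if (!a.id.empty() || !b.id.empty()) return false;
    // Assumption: an inline style makes the element's style unique.
    if (!a.inlineStyle.empty() || !b.inlineStyle.empty()) return false;
    // Same tag, same classes, and same interaction state.
    return a.tag == b.tag && a.classAttr == b.classAttr &&
           a.hovered == b.hovered && a.focused == b.focused;
}
```

Two <li class="item"> elements in the same state pass the check; hovering one of them or giving it an id makes them fall back to a full style computation.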

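Firefox builds a rule tree and a style-context tree. WebKit has no rule tree; instead it traverses the matched declarations in several passes so that they are applied in cascade order, with non-important declarations first and important ones after them, so that later passes override earlier ones.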
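Cascade order itself can be sketched as a comparison on three keys: origin and importance, then selector specificity, then source order. This is not how either engine is implemented; it is a minimal model of the ordering, with made-up names (MatchedDeclaration, cascadeWeight, winningValue) and a precomputed specificity number assumed as input.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

enum class Origin { UserAgent, User, Author };

struct MatchedDeclaration {
    std::string property, value;
    Origin origin;
    bool important;
    int specificity;  // assumed to be precomputed from the selector
    int order;        // position in the source, later wins ties
};

// Higher weight wins. Normal declarations: user agent < user < author.
// Important declarations reverse that order and beat all normal ones.
int cascadeWeight(const MatchedDeclaration& d) {
    if (!d.important) {
        if (d.origin == Origin::UserAgent) return 0;
        if (d.origin == Origin::User) return 1;
        return 2;
    }
    if (d.origin == Origin::Author) return 3;
    if (d.origin == Origin::User) return 4;
    return 5;
}

// Returns the value that wins the cascade for one property, or "" if none matched.
std::string winningValue(std::vector<MatchedDeclaration> decls,
                         const std::string& property) {
    decls.erase(std::remove_if(decls.begin(), decls.end(),
                               [&](const MatchedDeclaration& d) {
                                   return d.property != property;
                               }),
                decls.end());
    if (decls.empty()) return "";
    auto key = [](const MatchedDeclaration& d) {
        return std::make_tuple(cascadeWeight(d), d.specificity, d.order);
    };
    auto best = std::max_element(
        decls.begin(), decls.end(),
        [&](const MatchedDeclaration& a, const MatchedDeclaration& b) {
            return key(a) < key(b);
        });
    return best->value;
}
```

With a user-agent default, two normal author rules of different specificity, and one low-specificity author rule marked important, the important rule wins, because importance is compared before specificity.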