How WKWebView Parses HTML: Decoding, Tokenization, and DOM Tree Construction

WKWebView parses HTML by streaming bytes from the network process to the rendering process, decoding them into characters, tokenizing the characters into HTML tokens, and building a DOM tree through node creation and insertion. The finished tree is held in memory as a doubly-linked structure and is then laid out and painted.

Baidu App Technology

Mar 7, 2022

When an app creates a WKWebView object, three processes come into play: the app's main process, a rendering process, and a network process. The main process forwards a request to the rendering process, which forwards it to the network process. The network process fetches the server response. If the response is a web page, the network process streams the HTML byte data to the rendering process.

The rendering process first decodes the byte stream into a character stream, then parses the characters into a DOM tree. After the DOM tree is built, layout and painting are performed, and finally the main process creates a view to display the result.

1. What is a DOM tree

The rendering process converts the HTML character stream into a DOM tree. The left side of the figure shows an HTML file; the right side shows the resulting DOM tree. The root node is an HTMLDocument, representing the whole document. Its descendants correspond one-to-one with the tags in the HTML file (e.g., the <head> tag becomes a head node). Text also becomes nodes: the text "Hello, World!" becomes a Text node that is a child of a div node.

Each node is an object with methods and properties defined by its class. For example, HTMLDocument inherits from Document:

<code>class HTMLDocument : public Document {</code>
<code>    // ...</code>
<code>    WEBCORE_EXPORT int width();</code>
<code>    WEBCORE_EXPORT int height();</code>
<code>    // ...</code>
<code>};</code>

The Document class itself inherits from ContainerNode, which inherits from Node:

<code>class Document : public ContainerNode, public TreeScope, public ScriptExecutionContext,</code>
<code>    public FontSelectorClient, public FrameDestructionObserver, public Supplementable<Document>,</code>
<code>    public Logger::Observer, public CanvasObserver {</code>
<code>    WEBCORE_EXPORT ExceptionOr<Ref<Element>> createElementForBindings(const AtomString& tagName);</code>
<code>    WEBCORE_EXPORT Ref<Text> createTextNode(const String& data);</code>
<code>    // ...</code>
<code>};</code>

These classes implement the DOM (Document Object Model) standard, which defines the interfaces and attributes each node must provide. The IDL (Interface Description Language) for HTMLDocument, Document, and HTMLDivElement can be found in the DOM and HTML specifications, originally published by the W3C and now maintained by WHATWG.

In the DOM tree, every node inherits from Node. Element is a subclass of Node, and element nodes (e.g., div) inherit from Element. Text nodes are not Elements: Text inherits from CharacterData, which in turn inherits from Node. Consequently, the following JavaScript expressions return different results:

<code>document.childNodes; // returns all child Nodes, including DocumentType and HTML</code>
<code>document.children;   // returns only child Elements, excludes DocumentType</code>
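The difference between the two properties can be sketched in C++ with a toy model. The `Node` and `Element` structs, the `kids` member, and the helpers `childNodes`, `children`, and `makeSampleDocument` below are illustrative stand-ins invented for this example, not WebCore's real classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for WebCore's Node and Element (assumed, not real).
struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> kids;
    explicit Node(std::string n) : name(std::move(n)) {}
    virtual ~Node() = default;
    virtual bool isElement() const { return false; }
};

struct Element : Node {
    using Node::Node;
    bool isElement() const override { return true; }
};

// Like document.childNodes: every child, whatever its type.
std::vector<const Node*> childNodes(const Node& parent) {
    std::vector<const Node*> result;
    for (const auto& child : parent.kids)
        result.push_back(child.get());
    return result;
}

// Like document.children: only the children that are Elements.
std::vector<const Node*> children(const Node& parent) {
    std::vector<const Node*> result;
    for (const auto& child : parent.kids)
        if (child->isElement())
            result.push_back(child.get());
    return result;
}

// Builds a document whose children are a doctype node and an <html> element.
std::unique_ptr<Node> makeSampleDocument() {
    auto document = std::make_unique<Node>("#document");
    document->kids.push_back(std::make_unique<Node>("doctype")); // not an Element
    document->kids.push_back(std::make_unique<Element>("html"));
    return document;
}
```

For the sample document, `childNodes` yields two entries (the doctype and html) while `children` yields only html.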

2. DOM Tree Construction

The construction process consists of four steps: decode, tokenize, create node, and add node.

2.1 Decoding

The rendering process receives an HTML byte stream from the network process. Decoding converts the byte stream into a character stream; the encoding (ISO-8859-1, UTF-8, etc.) determines how bytes map to characters. Decoding is driven by HTMLDocumentParser, which uses TextResourceDecoder to decode the bytes and HTMLInputStream to store the resulting character stream.

The class diagram shows that HTMLDocumentParser relies on TextResourceDecoder (the decoder) and HTMLInputStream. The decoder looks for a <meta charset=...> tag in the <head> to determine the correct encoding. If none is found, it falls back to windows-1252, the encoding that HTML uses for the label ISO-8859-1.

<code>// Simplified decoding method</code>
<code>String TextResourceDecoder::decode(const char* data, size_t length) {</code>
<code>    // ... check <head> for <meta charset></code>
<code>    if (!m_codec)</code>
<code>        m_codec = newTextCodec(m_encoding); // create concrete codec</code>
<code>    String result = m_codec->decode(m_buffer.data() + lengthOfBOM, m_buffer.size() - lengthOfBOM, false, ...);</code>
<code>    m_buffer.clear();</code>
<code>    return result;</code>
<code>}</code>

If the charset is found, the decoder uses the appropriate codec; otherwise it decodes the entire buffered byte stream with the default codec after the network finishes.
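The charset lookup described above can be approximated with a short sketch. The helper `sniffCharset`, its `fallback` parameter, and the `scanLimit` of 1024 bytes are assumptions made for illustration. This is a plain substring search, not WebKit's actual meta-charset parser, and it ignores byte order marks and HTTP headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the lower-cased charset named in the first
// scanLimit bytes, or the fallback when no "charset=" is present.
std::string sniffCharset(const std::string& bytes,
                         const std::string& fallback = "windows-1252",
                         std::size_t scanLimit = 1024) {
    std::string head = bytes.substr(0, std::min(scanLimit, bytes.size()));
    std::transform(head.begin(), head.end(), head.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    std::size_t pos = head.find(key);
    if (pos == std::string::npos)
        return fallback;
    pos += key.size();
    if (pos < head.size() && (head[pos] == '"' || head[pos] == '\''))
        ++pos; // skip an opening quote

    // Collect the characters that can appear in an encoding label.
    std::size_t end = pos;
    while (end < head.size() &&
           (std::isalnum(static_cast<unsigned char>(head[end])) ||
            head[end] == '-' || head[end] == '_'))
        ++end;
    return end > pos ? head.substr(pos, end - pos) : fallback;
}
```

Calling `sniffCharset` on a page whose head contains `<meta charset="UTF-8">` returns `utf-8`; a page with no charset declaration yields the `windows-1252` fallback.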

2.2 Tokenization

After decoding, the character stream is tokenized. The tokenizer reads characters one by one, recognizing special characters such as <, /, >, and =. Tokens are represented by the HTMLToken class.

<code>class HTMLToken {</code>
<code>public:</code>
<code>    enum Type { Uninitialized, DOCTYPE, StartTag, EndTag, Comment, Character, EndOfFile };</code>
<code>    struct Attribute { Vector<UChar,32> name; Vector<UChar,64> value; unsigned startOffset; unsigned endOffset; };</code>
<code>    // ... other members ...</code>
<code>private:</code>
<code>    Type m_type;</code>
<code>    DataVector m_data;</code>
<code>    bool m_selfClosing;</code>
<code>    AttributeList m_attributes;</code>
<code>};</code>

The tokenization loop is driven by HTMLDocumentParser::pumpTokenizerLoop. It repeatedly calls HTMLTokenizer::nextToken, processes the token, and may yield to avoid long‑running loops.

<code>bool HTMLDocumentParser::pumpTokenizerLoop(SynchronousMode mode, bool parsingFragment, PumpSession& session) {</code>
<code>    do {</code>
<code>        auto token = m_tokenizer.nextToken(m_input.current());</code>
<code>        if (!token) return false;</code>
<code>        constructTreeFromHTMLToken(token); // build DOM node</code>
<code>    } while (!isStopped());</code>
<code>    return false;</code>
<code>}</code>

When the tokenizer encounters a <!DOCTYPE> declaration, an HTMLToken::DOCTYPE token is produced. The state machine for parsing <!DOCTYPE> is illustrated in the source code (states DataState, TagOpenState, MarkupDeclarationOpenState, etc.). Similar state machines handle start tags, end tags, attributes, and plain text.
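To make the state-machine idea concrete, here is a minimal sketch of a tokenizer for a small HTML subset. The `Token` struct, the `tokenize` function, and the reduced set of states are simplifications invented for this example. Attributes, comments, self-closing tags, and error handling are all left out, so it should not be read as how HTMLTokenizer is actually written.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy token, loosely modeled on HTMLToken::Type (attributes are ignored).
struct Token {
    enum class Type { DOCTYPE, StartTag, EndTag, Character };
    Type type;
    std::string data; // tag name, doctype name, or text
};

// Tokenizes a small HTML subset with an explicit state machine.
std::vector<Token> tokenize(const std::string& input) {
    enum class State { Data, TagOpen, EndTagOpen, TagName, MarkupDeclarationOpen, DoctypeName };
    State state = State::Data;
    std::vector<Token> tokens;
    Token current{Token::Type::Character, ""};

    // Pushes the current token (skipping empty text) and starts a new one.
    auto emit = [&] {
        if (current.type != Token::Type::Character || !current.data.empty())
            tokens.push_back(current);
        current = Token{Token::Type::Character, ""};
    };

    for (std::size_t i = 0; i < input.size(); ++i) {
        char c = input[i];
        switch (state) {
        case State::Data:
            if (c == '<') { emit(); state = State::TagOpen; }
            else current.data += c;
            break;
        case State::TagOpen:
            if (c == '/') state = State::EndTagOpen;
            else if (c == '!') state = State::MarkupDeclarationOpen;
            else { current = Token{Token::Type::StartTag, std::string(1, c)}; state = State::TagName; }
            break;
        case State::EndTagOpen:
            current = Token{Token::Type::EndTag, std::string(1, c)};
            state = State::TagName;
            break;
        case State::TagName:
            if (c == '>') { emit(); state = State::Data; }
            else current.data += c;
            break;
        case State::MarkupDeclarationOpen:
            // Assumes the declaration is "<!DOCTYPE name>"; skips the keyword.
            if (std::isspace(static_cast<unsigned char>(c))) {
                current = Token{Token::Type::DOCTYPE, ""};
                state = State::DoctypeName;
            }
            break;
        case State::DoctypeName:
            if (c == '>') { emit(); state = State::Data; }
            else current.data += c;
            break;
        }
    }
    emit(); // flush trailing text, if any
    return tokens;
}
```

Feeding it `<!DOCTYPE html><div>Hello, World!</div>` produces four tokens in order: a DOCTYPE named html, a StartTag div, a Character token holding the text, and an EndTag div.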

2.3 Node Creation and Insertion

Tokens are turned into DOM nodes by HTMLTreeBuilder::processToken. For example, a DOCTYPE token triggers HTMLConstructionSite::insertDoctype, which creates a DocumentType node and schedules an insertion task.

<code>void HTMLConstructionSite::insertDoctype(AtomHTMLToken&& token) {</code>
<code>    // publicId and systemId are taken from the token (extraction elided here)</code>
<code>    attachLater(m_attachmentRoot, DocumentType::create(m_document, token.name(), publicId, systemId));</code>
<code>}</code>

The attachLater method creates an HTMLConstructionSiteTask (operation = Insert) and pushes it onto a task queue. Later, executeQueuedTasks runs each task, ultimately calling ContainerNode::parserAppendChild to link the new node into the tree.

<code>void ContainerNode::parserAppendChild(Node& newChild) {</code>
<code>    executeNodeInsertionWithScriptAssertion(*this, newChild, ChildChange::Source::Parser, ...);</code>
<code>    newChild.setParentNode(this);</code>
<code>    if (m_lastChild) {</code>
<code>        newChild.setPreviousSibling(m_lastChild);</code>
<code>        m_lastChild->setNextSibling(&newChild);</code>
<code>    } else {</code>
<code>        m_firstChild = &newChild;</code>
<code>    }</code>
<code>    m_lastChild = &newChild;</code>
<code>}</code>
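The queue-then-execute pattern can be sketched as follows. The `Node` struct, `appendChild`, `InsertTask`, and the `ConstructionSite` class are hypothetical simplifications written for this example. They borrow the method names attachLater and executeQueuedTasks from the text but do not reproduce WebKit's signatures or ownership model (nodes here are owned by the caller).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified node with the link fields the parser maintains.
struct Node {
    std::string name;
    Node* parent = nullptr;
    Node* previousSibling = nullptr;
    Node* nextSibling = nullptr;
    Node* firstChild = nullptr;
    Node* lastChild = nullptr;
    explicit Node(std::string n) : name(std::move(n)) {}
};

// Performs the same pointer updates as the parserAppendChild excerpt.
void appendChild(Node& parent, Node& child) {
    child.parent = &parent;
    if (parent.lastChild) {
        child.previousSibling = parent.lastChild;
        parent.lastChild->nextSibling = &child;
    } else {
        parent.firstChild = &child;
    }
    parent.lastChild = &child;
}

// Hypothetical stand-in for HTMLConstructionSiteTask with operation = Insert.
struct InsertTask {
    Node* parent;
    Node* child;
};

class ConstructionSite {
public:
    // Records the insertion without touching the tree yet.
    void attachLater(Node& parent, Node& child) { m_queue.push_back({&parent, &child}); }

    // Runs every queued insertion in FIFO order and leaves the queue empty.
    void executeQueuedTasks() {
        std::vector<InsertTask> tasks = std::move(m_queue);
        m_queue.clear();
        for (const InsertTask& task : tasks)
            appendChild(*task.parent, *task.child);
    }

private:
    std::vector<InsertTask> m_queue;
};
```

Until `executeQueuedTasks` runs, the parent's `firstChild` stays null; afterwards the queued children are linked in the order they were scheduled.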

Text nodes are created by HTMLConstructionSite::insertTextNode. If the text is longer than the internal length limit (65,536 characters), it is split into multiple Text nodes.

<code>void HTMLConstructionSite::insertTextNode(const String& characters, WhitespaceMode whitespaceMode) {</code>
<code>    while (currentPosition < characters.length()) {</code>
<code>        auto textNode = Text::createWithLengthLimit(task.parent->document(), ...);</code>
<code>        task.child = WTFMove(textNode);</code>
<code>        executeTask(task); // insert the text node</code>
<code>    }</code>
<code>}</code>
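The splitting loop can be illustrated with a simplified sketch. The function `splitIntoTextNodes` is a hypothetical helper in which each returned string stands for one Text node. It cuts purely by length, whereas a real implementation would presumably also need to avoid splitting in the middle of a multi-unit character.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits text into chunks of at most lengthLimit characters; each chunk
// stands for one Text node. The default mirrors the 65,536 limit above.
std::vector<std::string> splitIntoTextNodes(const std::string& characters,
                                            std::size_t lengthLimit = 65536) {
    std::vector<std::string> nodes;
    std::size_t currentPosition = 0;
    while (currentPosition < characters.size()) {
        std::size_t count = std::min(lengthLimit, characters.size() - currentPosition);
        nodes.push_back(characters.substr(currentPosition, count));
        currentPosition += count;
    }
    return nodes;
}
```

With the default limit, a run of 65,537 characters becomes two chunks: one of 65,536 characters and one holding the single remaining character.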

After all tokens are processed, the DOM tree resides in memory. Unlike a logical tree where a parent stores an array of child pointers, WebKit's in-memory representation gives each parent only two child pointers (m_firstChild and m_lastChild) and links the children to one another through doubly-linked sibling pointers. This reduces the need for frequent allocations when many children are added, at the cost of extra sibling pointers.
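One consequence of this layout is that the whole tree can be walked in document order using only the parent, first-child, and next-sibling pointers, with no recursion and no auxiliary stack. The sketch below assumes a simplified `Node` struct; the helpers `append`, `nextInDocumentOrder`, and `documentOrder` are names invented for the example.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified node using the pointer layout described above.
struct Node {
    std::string name;
    Node* parent = nullptr;
    Node* previousSibling = nullptr;
    Node* nextSibling = nullptr;
    Node* firstChild = nullptr;
    Node* lastChild = nullptr;
    explicit Node(std::string n) : name(std::move(n)) {}
};

// Appends child as the last child of parent.
void append(Node& parent, Node& child) {
    child.parent = &parent;
    child.previousSibling = parent.lastChild;
    if (parent.lastChild)
        parent.lastChild->nextSibling = &child;
    else
        parent.firstChild = &child;
    parent.lastChild = &child;
}

// Next node in document (pre-order) order, or nullptr after the last one.
// Assumes the walk started at the root of the whole tree.
Node* nextInDocumentOrder(Node* node) {
    if (node->firstChild)
        return node->firstChild;
    while (node) {
        if (node->nextSibling)
            return node->nextSibling;
        node = node->parent; // climb until some ancestor has a next sibling
    }
    return nullptr;
}

// Collects node names in document order, starting from the tree root.
std::vector<std::string> documentOrder(Node& root) {
    std::vector<std::string> names;
    for (Node* n = &root; n; n = nextInDocumentOrder(n))
        names.push_back(n->name);
    return names;
}
```

For a document containing html, head, body, a div, and its text, the walk visits the nodes in exactly the order their tags and text appear in the source.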

When parsing finishes, the HTMLElementStack (m_openElements) still holds the html and body elements, which matches the console output shown in the original article.
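One plausible reading of why html and body remain is that, under the HTML standard's tree-construction rules, the </body> and </html> end tags switch the parser's insertion mode rather than popping the stack of open elements. The toy model below encodes that assumption. `TagToken` and `openElementsAfter` are hypothetical names, and the sketch omits insertion modes, implied tags, and every other rule of the real tree builder.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimal tag token: a start tag or an end tag with a name.
struct TagToken {
    bool isEndTag;
    std::string name;
};

// Toy model of a stack of open elements (not WebKit's HTMLElementStack).
std::vector<std::string> openElementsAfter(const std::vector<TagToken>& tokens) {
    std::vector<std::string> stack;
    for (const TagToken& token : tokens) {
        if (!token.isEndTag) {
            stack.push_back(token.name);
            continue;
        }
        // Assumption: </body> and </html> only change the parser's mode,
        // so they leave their elements on the stack.
        if (token.name == "body" || token.name == "html")
            continue;
        // Pop up to and including the matching open element, if there is one.
        for (std::size_t i = stack.size(); i-- > 0;) {
            if (stack[i] == token.name) {
                stack.erase(stack.begin() + static_cast<std::ptrdiff_t>(i), stack.end());
                break;
            }
        }
    }
    return stack;
}
```

Running it over the tags of a page with a head, a body, and one div leaves html and body on the stack once the last token has been consumed.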
