
Next-Generation Document Reader Architecture and Implementation Using Canvas

The team replaced Baidu Wenku’s HTML‑CSS reader with a Canvas‑based architecture (CReader) that separates logic, data, parsing, rendering and application layers, enabling direct long‑image export, fast text selection, anti‑copy protection, annotation support, and cross‑platform deployment on PC, WAP and mini‑programs.

Baidu Geek Talk

Dec 6, 2021


The article describes the limitations of Baidu Wenku's previous HTML+CSS-based document reader: long-image export, annotations, keyword highlighting, watermarks, content analysis, and anti-copy protection were all difficult to implement on top of it.

To overcome these issues, the team adopted Canvas for a new-generation reader (CReader) supporting PC, WAP, and mini‑programs, enabling direct long‑image export, better text selection performance, and improved development experience.

The architecture is divided into five layers: Logic (data loading, page creation, rendering scheduling, event distribution, core API), Data (loading document content, custom fonts, images), Parsing (converting document data into renderable data like text, font size, position, images), Rendering (Canvas‑based rendering, extensible to HTML/SVG), and Application (business‑side usage with an integrated online reader).
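The layering above can be sketched as a handful of TypeScript types. This is a minimal illustration rather than CReader's actual code: every name here (`RawPage`, `RenderNode`, `parsePage`, `Renderer`, `Reader`, `RecordingRenderer`) and the raw block format are assumptions made for the example. The point it shows is that the parser produces renderer-agnostic nodes, so the rendering backend can be swapped without touching the logic layer.

```typescript
// Hypothetical types: none of these names come from CReader itself.
type RenderNode =
  | { kind: "text"; text: string; x: number; y: number; fontSize: number }
  | { kind: "image"; src: string; x: number; y: number; width: number; height: number };

// Data layer output: raw page content in an assumed wire format.
// p = [x, y, width, height]; s = optional font size for text blocks.
interface RawPage {
  index: number;
  blocks: Array<{ t: "txt" | "img"; c: string; p: [number, number, number, number]; s?: number }>;
}

// Parsing layer: turn raw blocks into renderer-agnostic nodes.
function parsePage(raw: RawPage): RenderNode[] {
  return raw.blocks.map((b): RenderNode => {
    const [x, y, width, height] = b.p;
    return b.t === "txt"
      ? { kind: "text", text: b.c, x, y, fontSize: b.s ?? 16 }
      : { kind: "image", src: b.c, x, y, width, height };
  });
}

// Rendering layer: any backend (Canvas, HTML, SVG) implements this.
interface Renderer {
  drawPage(pageIndex: number, nodes: RenderNode[]): void;
}

// Logic layer: decides which page is parsed and handed to the renderer.
class Reader {
  private pages: RawPage[];
  private renderer: Renderer;

  constructor(pages: RawPage[], renderer: Renderer) {
    this.pages = pages;
    this.renderer = renderer;
  }

  renderPage(index: number): void {
    const raw = this.pages[index];
    if (!raw) throw new RangeError(`no page ${index}`);
    this.renderer.drawPage(index, parsePage(raw));
  }
}

// Stand-in backend that records draw calls instead of touching a canvas.
class RecordingRenderer implements Renderer {
  calls: Array<{ pageIndex: number; nodes: RenderNode[] }> = [];

  drawPage(pageIndex: number, nodes: RenderNode[]): void {
    this.calls.push({ pageIndex, nodes });
  }
}
```

A Canvas backend would implement `drawPage` with calls such as `fillText` and `drawImage`, while the application layer would only ever talk to `Reader`.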

Three technical points are central. First, text and images are drawn directly onto Canvas. Second, only the visible pages are rendered, which works around Safari's canvas size limits and keeps memory usage down. Third, text selection is implemented by hand: a data layer stores the coordinates of every node, mouse coordinates are mapped to character positions through it, and the highlight is drawn on a temporary Canvas.
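The visible-page and text-selection ideas can be sketched without any DOM dependency. The code below is an assumption-laden illustration, not the reader's real implementation: it assumes equal-height pages stacked with a fixed gap, one stored box per character, and hypothetical names (`visiblePages`, `CharBox`, `hitTest`, `selectedBoxes`, `drawHighlight`).

```typescript
// Hypothetical layout: equal-height pages stacked vertically with a fixed gap.
interface PageLayout { pageHeight: number; gap: number; pageCount: number; }

// Inclusive range of page indices worth rendering for the current scroll
// position, widened by `buffer` pages on each side.
function visiblePages(
  scrollTop: number, viewportHeight: number, layout: PageLayout, buffer = 1,
): [number, number] {
  const stride = layout.pageHeight + layout.gap;
  const first = Math.floor(scrollTop / stride);
  const last = Math.floor((scrollTop + viewportHeight - 1) / stride);
  const clamp = (n: number) => Math.min(Math.max(n, 0), layout.pageCount - 1);
  return [clamp(first - buffer), clamp(last + buffer)];
}

// One entry per character, in reading order, with page-relative coordinates.
interface CharBox { ch: string; x: number; y: number; width: number; height: number; }

// Map a pointer position to the index of the character under it, or -1.
function hitTest(boxes: CharBox[], px: number, py: number): number {
  return boxes.findIndex(
    (b) => px >= b.x && px < b.x + b.width && py >= b.y && py < b.y + b.height,
  );
}

// Characters between the drag start (anchor) and the current position (focus),
// regardless of drag direction.
function selectedBoxes(boxes: CharBox[], anchor: number, focus: number): CharBox[] {
  if (anchor < 0 || focus < 0) return [];
  const [from, to] = anchor <= focus ? [anchor, focus] : [focus, anchor];
  return boxes.slice(from, to + 1);
}

// Minimal structural type so the sketch needs no DOM typings;
// a real CanvasRenderingContext2D satisfies it.
interface HighlightContext {
  fillStyle: unknown;
  fillRect(x: number, y: number, w: number, h: number): void;
}

// Paint a translucent rectangle over each selected character.
function drawHighlight(ctx: HighlightContext, selection: CharBox[]): void {
  ctx.fillStyle = "rgba(51, 136, 255, 0.3)";
  for (const b of selection) ctx.fillRect(b.x, b.y, b.width, b.height);
}
```

In a browser, `drawHighlight` would be called with the 2D context of the temporary overlay canvas, which is cleared and redrawn on each mouse move so the page canvas underneath is never repainted. A linear `findIndex` is enough for a sketch; a production reader would more likely index boxes by line to keep hit testing fast on dense pages.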

Canvas enables three business functions. Anti-copy protection: because content is drawn as an image rather than exposed as text, it cannot be extracted directly. Document-to-image conversion: native Canvas export avoids heavy headless-browser processing. Document annotation: libraries such as Fabric.js can draw shapes and export them as JSON for collaborative marking.
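For document-to-image conversion, one practical concern is that browsers cap how large a single canvas can be, so a long export may need to be split into several images. The sketch below plans such a split. It is an illustration under stated assumptions, not the article's method: `planExportTiles` and `Tile` are hypothetical names, all pages are assumed to be the same size, and only a total-pixel budget is modeled (per-dimension limits are ignored).

```typescript
// One output image covering a contiguous run of pages (hypothetical shape).
interface Tile { firstPage: number; lastPage: number; width: number; height: number; }

// Split `pageCount` equal-size pages into tiles whose pixel area stays
// within `maxPixels`. `scale` is the export scale factor (e.g. 2 for 2x).
function planExportTiles(
  pageCount: number, pageWidth: number, pageHeight: number,
  scale: number, maxPixels: number,
): Tile[] {
  const w = Math.ceil(pageWidth * scale);
  const h = Math.ceil(pageHeight * scale);
  const pagesPerTile = Math.floor(maxPixels / (w * h));
  if (pagesPerTile < 1) {
    throw new RangeError("a single page exceeds the pixel budget; lower the scale");
  }
  const tiles: Tile[] = [];
  for (let first = 0; first < pageCount; first += pagesPerTile) {
    const last = Math.min(first + pagesPerTile, pageCount) - 1;
    tiles.push({ firstPage: first, lastPage: last, width: w, height: h * (last - first + 1) });
  }
  return tiles;
}
```

As a usage example, `planExportTiles(10, 800, 1200, 2, 16_777_216)` yields three tiles covering pages 0-3, 4-7, and 8-9. The budget of 16,777,216 pixels is a commonly cited canvas area limit for iOS Safari and is used here only as an assumed value. Each tile would then be drawn onto its own canvas and exported with the standard `canvas.toBlob` or `canvas.toDataURL`.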

The solution also extends to mini‑programs, allowing fixed‑layout documents to be rendered inside the native page via WebView, preserving surrounding UI elements such as recommendations and toolbars.

Overall, the new reader improves both user experience and development efficiency (Vite + TypeScript, hot reload, unit tests). It supports WAP, PC, and mini-programs, and covers 90% of document types, with a path to full coverage.

