How DingTalk Docs Enables Real-Time Collaborative Rich‑Text Editing Without contentEditable
This article explains how DingTalk Docs implements professional layout, custom rendering, and Operational Transformation to support complex rich‑text features and seamless multi‑user collaboration, all without relying on the native contentEditable API.
DingTalk Docs, a core component of Alibaba's DingTalk office suite, has evolved over three years into a highly complex product that supports professional layout (pagination, columns, mixed text‑image) and innovative features such as embedded mind maps and maps, while also handling real‑time collaborative editing.
What You See Is What You Get
The editor provides WYSIWYG capabilities with advanced layout support. For example, line breaking is essential when a paragraph spans multiple pages; the engine measures characters to determine where to split lines.
1. Measurement and Splitting
User‑entered content lacks layout information. The layout engine measures each character, splits paragraphs, and produces a view model that is then rendered into the final DOM.
Character measurement is performed for every glyph to calculate line breaks based on container width.
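The measure-then-split step can be illustrated with a minimal sketch. It is not DingTalk Docs' actual engine: the names breakLines and measure are hypothetical, and the fixed-width stub stands in for a real glyph measurement so the example runs outside a browser. It breaks greedily per character and ignores word boundaries and punctuation rules that a production engine would need.

```javascript
// Hypothetical sketch: greedy line breaking driven by per-character widths.
// `measure` is an assumed callback that returns the width of one character.
function breakLines(text, containerWidth, measure) {
  const lines = [];
  let current = '';
  let width = 0;
  for (const ch of text) { // iterate by code point, not UTF-16 unit
    const w = measure(ch);
    // Start a new line when the next character would overflow the container.
    if (current !== '' && width + w > containerWidth) {
      lines.push(current);
      current = '';
      width = 0;
    }
    current += ch;
    width += w;
  }
  if (current !== '') lines.push(current);
  return lines;
}

// Stub measurer (assumption): every character is 10 units wide.
const fixedWidth = () => 10;
const lines = breakLines('abcdefghij', 35, fixedWidth);
console.log(lines);
```

Passing the measurer in as a parameter keeps the splitting logic independent of how widths are obtained, which also makes it easy to put a cache in front of it.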
2. Measurement Result Caching
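Because measuring every character is costly, DingTalk Docs caches results under a key that combines the character and its style. Chinese characters are all looked up under a single placeholder, "中": CJK glyphs are typically full‑width, so one cached measurement per style can serve the whole character set instead of thousands of separate entries.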
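A cache of this kind might look like the sketch below. The function names, the shape of the style object, and the key format are assumptions made for illustration; only the character+style key and the "中" placeholder come from the description above. The raw measurer is a stub so the example runs in Node.

```javascript
// Hypothetical sketch of a width cache keyed by character + style.
const HAN = /\p{Script=Han}/u;

function createCachedMeasure(rawMeasure) {
  const cache = new Map();
  let misses = 0;
  function measure(ch, style) {
    // Assumption: all Han glyphs share one width per style, so they map to "中".
    const keyChar = HAN.test(ch) ? '中' : ch;
    const key = `${keyChar}|${style.fontFamily}|${style.fontSize}|${style.fontWeight}`;
    let width = cache.get(key);
    if (width === undefined) {
      misses += 1; // only cache misses pay for a real measurement
      width = rawMeasure(keyChar, style);
      cache.set(key, width);
    }
    return width;
  }
  return { measure, stats: () => ({ entries: cache.size, misses }) };
}

// Stub measurer (assumption): Han glyphs are one em wide, others half an em.
const rawMeasure = (ch, style) => (HAN.test(ch) ? style.fontSize : style.fontSize * 0.5);
const style = { fontFamily: 'sans-serif', fontSize: 16, fontWeight: 400 };
const cached = createCachedMeasure(rawMeasure);
for (const ch of '你好世界ab a') cached.measure(ch, style);
console.log(cached.stats());
```

In this run the four Chinese characters collapse into a single cache entry, so eight lookups trigger only four real measurements.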
3. Splitting and Mapping
Each node in the view model receives a unique identifier (e.g., paragraph-1). When a node is split, the identifier becomes originalId-splitIndex, enabling the editor to map user interactions back to the original document model.
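The identifier scheme and the reverse mapping can be sketched as follows. The helper names and node shape are hypothetical; the sketch assumes the fragments passed to toModelPoint all belong to the same original node and that offsets are counted in UTF-16 units.

```javascript
// Hypothetical sketch: derive view-node ids when a node is split,
// then map a position in a fragment back to the original node.
function splitNode(originalId, fragments) {
  return fragments.map((text, splitIndex) => ({
    id: `${originalId}-${splitIndex}`, // e.g. paragraph-1-0, paragraph-1-1
    originalId, // stored explicitly, since ids themselves contain hyphens
    text,
  }));
}

// Convert (view id, offset inside that fragment) into
// (original id, offset inside the unsplit node).
function toModelPoint(viewNodes, viewId, offset) {
  let base = 0;
  for (const node of viewNodes) {
    if (node.id === viewId) return { key: node.originalId, offset: base + offset };
    base += node.text.length; // skip the text of earlier fragments
  }
  throw new Error(`Unknown view node: ${viewId}`);
}

const viewNodes = splitNode('paragraph-1', ['abc', 'def', 'gh']);
const point = toModelPoint(viewNodes, 'paragraph-1-1', 2);
console.log(point);
```

Keeping originalId on each fragment avoids parsing the composite identifier, which would be ambiguous for ids such as paragraph-1 that already contain a hyphen.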
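4. Summary of the Editing Data Flow
In short, the document model is measured, split into a view model, and rendered to the DOM; the identifiers on view nodes then carry user interactions back to the original document model.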
Editor Without contentEditable
Instead of using the native contentEditable API, DingTalk Docs implements its own selection calculation, rendering, and input handling to support layout‑aware editing.
1. Selection Calculation and Rendering
When a user clicks, the editor determines the cursor position by examining the event coordinates and performing a binary search on characters. Example pseudo‑code:
const { target, clientX, clientY } = event;
if (target is a void node like image or video) {
  select the node
} else if (target is a text node) {
  binary search to find character index
} else {
  adjust clientX/clientY, recurse into child nodes
}
The resulting selection is described as:
Value.create({
  selection: Selection.create({
    anchor: Point.create({ key: 'Good', offset: 4 }),
    focus: Point.create({ key: 'Good', offset: 4 })
  })
});
2. Input Composition
A hidden textarea captures user input and IME composition. During composition, the state is represented as:
Value.create({ composing: 'hai' });
After the user confirms the text, the document model updates:
Value.create({
  document: Document.create({
    nodes: [Paragraph.create({ nodes: [Text.create('嗨')] })]
  })
});
3. Final Rendering Structure
The editor renders the combined value:
<editor>
  <content {value.document + value.composing} />
  <selection {value.selection} />
</editor>
Multi‑User Collaborative Editing
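DingTalk Docs supports real‑time collaboration using Operational Transformation (OT). Each edit is expressed as atomic operations, and the engine resolves conflicts much as Git rebase does: a concurrent operation is rewritten against the operations already applied so that it can be replayed on top of them.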
The editor converts user actions into nine atomic operation types, which drive the document model updates.
Operations are sent to the collaborative engine, transformed, and applied to keep all participants in sync.
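The core idea of transformation can be shown with a deliberately small sketch. It covers only concurrent text insertions on a plain string; the operation shape ({ site, offset, text }) and the tie-breaking rule by site id are assumptions for illustration, not the nine operation types or the actual protocol DingTalk Docs uses.

```javascript
// Hypothetical sketch: OT for concurrent text insertions only.
function apply(doc, op) {
  return doc.slice(0, op.offset) + op.text + doc.slice(op.offset);
}

// Rewrite `op` so it can be applied after `applied` has already taken effect.
function transform(op, applied) {
  // Shift right if the other insert landed before this one;
  // on equal offsets the lower site id wins the earlier position.
  const shifts =
    applied.offset < op.offset ||
    (applied.offset === op.offset && applied.site < op.site);
  return shifts ? { ...op, offset: op.offset + applied.text.length } : op;
}

const doc = 'Good';
const a = { site: 1, offset: 4, text: '!' };     // user 1 appends "!"
const b = { site: 2, offset: 0, text: 'Very ' }; // user 2 prepends "Very "

// Each replica applies its own operation first, then the transformed remote one.
const viaA = apply(apply(doc, a), transform(b, a));
const viaB = apply(apply(doc, b), transform(a, b));
console.log(viaA, '|', viaB);
```

Both replicas end with the same text even though they applied the operations in different orders, which is the convergence property the collaborative engine relies on. A full implementation also has to transform deletions and structural operations against each other.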
Open‑Source Plan
The editor SDK was designed for reuse across more than 20 products (e.g., ATA, Aliway) and will be open‑sourced to encourage community contributions.
