Turning Canvas Snapshots into Real Video Cards with ProseMirror
This article traces how Bilibili’s editor moved from faking video cards in Quill with Canvas screenshots to rendering real, interactive cards with ProseMirror and Tiptap. It covers the technical debt of the old approach, the new schema and NodeView architecture, performance optimizations, batch link validation, backward compatibility, and real‑time interactive components.
Background: From Fake Cards to Real Interaction
In the history of Bilibili’s rich‑text editor, the early UEditor stage covered basic text needs, and Quill later introduced the Delta data model. However, Quill lacked proper BlockNode support, so developers faked video cards: they rendered the card off‑screen, captured it to a canvas with html2canvas, exported the canvas as a Base64 image, uploaded the image to a CDN, and inserted a static <img> tag.
This “canvas screenshot” approach caused loss of interactivity, performance bottlenecks, stale data, and wasted storage.
Chapter 1 – The Old World: Canvas‑Based Video Cards
1.1 User‑Facing “Ghost” Experience
Users paste a video link, see a loading spinner, and then a static image appears. Clicking play does nothing, and the title cannot be edited, because the card is just a dead image.
1.2 Technical Trick: Canvas Screenshot
Implicit Rendering: Render a temporary card DOM outside the visible viewport.
Canvas Capture: Call html2canvas with scale: 4 to get a high‑resolution canvas.
Image Generation: Export the canvas as a Base64 image.
Upload & Replace: Upload the image to a CDN and insert an <img> tag.
1.3 Four Pain Points
Loss of Interactivity: The card is a static image, not a live component.
Performance Black Hole: The pipeline (API → render → screenshot → upload) takes >2 seconds, breaking writing flow.
Data Staleness: Playback count and comments freeze at insertion time.
Storage Waste: Every generated image occupies CDN space, scaling poorly with article volume.
Chapter 2 – Diagnosing the Problem: Quill Meets Video Cards
Quill’s Delta model is a linear operation log, similar to a receipt, making it hard to embed complex nested structures like a video player. ProseMirror, by contrast, uses a document‑tree model (like LEGO blocks) that naturally supports block nodes.
// Quill Delta (flat linear record)
[
  { "insert": "Hello " },
  { "insert": { "video": { "id": "BV1xx..." } } } // forced object insert
]
Attempting to insert a full video player into a flat receipt is impractical; the screenshot trick merely draws a TV on the receipt.
// ProseMirror Tree (structured)
{
  "type": "doc",
  "content": [
    { "type": "paragraph", "content": [{ "type": "text", "text": "Hello" }] },
    { "type": "videoCard", "attrs": { "bvid": "BV1xx..." }, "content": [] }
  ]
}
With a proper block node, a video card can contain title, cover, and player sub‑nodes.
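To make the contrast concrete, here is a minimal sketch of converting a flat Delta into a nested tree. The types `DeltaOp` and `DocNode` and the function `deltaToDoc` are hypothetical simplifications for illustration, not Quill's or ProseMirror's real types, and the sketch ignores formatting attributes and empty paragraphs.

```typescript
// Hypothetical, simplified shapes: not the real Quill or ProseMirror types.
type DeltaOp = { insert: string | { video: { id: string } } };

interface DocNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: DocNode[];
  text?: string;
}

// Convert a flat list of Delta ops into a nested document tree.
function deltaToDoc(ops: DeltaOp[]): DocNode {
  const blocks: DocNode[] = [];
  let inline: DocNode[] = []; // text nodes collected for the current paragraph

  // Close the paragraph being built, if it holds any text.
  const flushParagraph = (): void => {
    if (inline.length > 0) {
      blocks.push({ type: "paragraph", content: inline });
      inline = [];
    }
  };

  for (const op of ops) {
    if (typeof op.insert === "string") {
      // Newlines separate paragraphs; empty lines are dropped in this sketch.
      const lines = op.insert.split("\n");
      lines.forEach((line, i) => {
        if (line.length > 0) inline.push({ type: "text", text: line });
        if (i < lines.length - 1) flushParagraph();
      });
    } else {
      // An embed becomes a first-class block node instead of an opaque object.
      flushParagraph();
      blocks.push({ type: "videoCard", attrs: { bvid: op.insert.video.id } });
    }
  }
  flushParagraph();
  return { type: "doc", content: blocks };
}

const doc = deltaToDoc([
  { insert: "Hello " },
  { insert: { video: { id: "BV1xx..." } } },
]);
console.log(JSON.stringify(doc, null, 2));
```

The embed ends up as a sibling of the paragraph in the tree, which is what lets an editor attach its own rendering and behavior to that node.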
Chapter 3 – Core Implementation with ProseMirror
We introduced an “editor‑component separation” architecture, leveraging ProseMirror’s NodeView to bridge the document model and UI components.
3.1 Schema Definition
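// schema/video-card.ts
import { Node } from '@tiptap/core';
// CardStyle is assumed to be an enum defined elsewhere in the codebase.

const VideoCard = Node.create({
  name: 'videoCard',
  group: 'block',
  atom: true,
  draggable: true,
  addAttributes() {
    return {
      bvid: { default: null }, // declared because renderHTML reads it
      card_style: { default: CardStyle.NORMAL },
      info: { default: {} },
      status: { default: 'loading' }
    };
  },
  parseHTML() { return [{ tag: 'div[data-type="video-card"]' }]; },
  // The node declares no content, so the output spec has no content hole (no trailing 0).
  renderHTML({ node }) { return ['div', { 'data-type': 'video-card', 'data-bvid': node.attrs.bvid }]; }
});
Key points: atom: true makes the editor treat the whole card as a single selectable unit, so the cursor can never land inside it.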
3.2 NodeView Bridge
// Abstract base for card NodeViews
// (node, cardComponent, updateNodeAttributes, and deleteFromEditor are base-class members elided here)
abstract class BaseCardNodeView {
  constructor(node) { /* init component */ }
  update(node) { /* sync data */ }
  destroy() { /* cleanup */ }
  abstract createCardComponent();
}
// VideoCard implementation
class VideoCardNodeView extends BaseCardNodeView {
  createCardComponent() {
    return new VideoCard({ data: this.node.attrs, isInEditor: true });
  }
  setupEventListeners() {
    this.cardComponent.on('statusChange', status => this.updateNodeAttributes({ status }));
    this.cardComponent.on('delete', () => this.deleteFromEditor());
  }
}
The NodeView mounts a real React/Vue component, forwards state changes back to the document, and handles deletion.
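The two‑way data flow can be illustrated without an editor at all. The following is a minimal sketch under stated assumptions: `FakeCardComponent` stands in for the real UI component, `CardNodeViewSketch` for the NodeView, and `commitAttrs` for whatever callback the editor supplies to write attributes back into the document. None of these names come from Bilibili's code or from the ProseMirror API.

```typescript
// All names here are hypothetical stand-ins for illustration only.
type Status = "loading" | "ready" | "error";
interface CardAttrs { bvid: string; status: Status }
type Listener<T> = (payload: T) => void;

// Stand-in for the real UI component (React/Vue in the article).
class FakeCardComponent {
  private listeners: Listener<Status>[] = [];
  destroyed = false;
  constructor(public data: CardAttrs) {}
  onStatusChange(fn: Listener<Status>): void { this.listeners.push(fn); }
  // Simulates the component reporting a new status, e.g. after fetching data.
  emitStatus(status: Status): void { this.listeners.forEach(fn => fn(status)); }
  setData(data: CardAttrs): void { this.data = data; }
  destroy(): void { this.destroyed = true; this.listeners = []; }
}

// Bridge between a document node and its component.
class CardNodeViewSketch {
  readonly component: FakeCardComponent;
  constructor(
    attrs: CardAttrs,
    // Assumed callback that writes attribute changes into the document.
    commitAttrs: (patch: Partial<CardAttrs>) => void,
  ) {
    this.component = new FakeCardComponent(attrs);
    // Component -> document: forward status changes as attribute patches.
    this.component.onStatusChange(status => commitAttrs({ status }));
  }
  // Document -> component: called when the node's attributes change.
  update(attrs: CardAttrs): boolean {
    this.component.setData(attrs);
    return true; // signals that this view handled the update itself
  }
  destroy(): void { this.component.destroy(); }
}

let docAttrs: CardAttrs = { bvid: "BV1xx...", status: "loading" };
const view = new CardNodeViewSketch(docAttrs, patch => {
  docAttrs = { ...docAttrs, ...patch }; // stands in for an editor transaction
});
view.component.emitStatus("ready"); // component finished loading
view.update(docAttrs);              // editor pushes the new attrs back down
console.log(docAttrs.status, view.component.data.status); // ready ready
```

Keeping the document as the single source of truth means the component never mutates its own data directly; it only requests a change and waits for `update`.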
3.3 Performance Safeguards
Each video card now contains a full DOM subtree and a player instance. To avoid memory blow‑up, we introduced a global CardPlayer manager that limits concurrent players to three and uses an LRU strategy to recycle instances.
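One way such a pool could work is sketched below, assuming a hypothetical `PlayerHandle` interface and `PlayerPool` class (neither appears in the original code). It relies on the fact that a JavaScript `Map` iterates in insertion order, so re‑inserting a key on every use keeps the least recently used player at the front.

```typescript
// Hypothetical handle; a real one would wrap a video element or player SDK.
interface PlayerHandle { pause(): void; destroy(): void }

class PlayerPool {
  // Map keeps insertion order, so the first key is the least recently used.
  private players = new Map<string, PlayerHandle>();
  constructor(private readonly maxPlayers = 3) {}

  // Mark a player as most recently used, creating it on demand.
  acquire(id: string, create: () => PlayerHandle): PlayerHandle {
    let player = this.players.get(id);
    if (player) {
      this.players.delete(id); // re-inserted below to move it to the newest slot
    } else {
      player = create();
    }
    this.players.set(id, player);
    this.enforceLimit();
    return player;
  }

  // Activate one player and pause every other live player.
  play(id: string, create: () => PlayerHandle): PlayerHandle {
    const active = this.acquire(id, create);
    this.players.forEach((other, otherId) => {
      if (otherId !== id) other.pause();
    });
    return active;
  }

  // Destroy least recently used players until the pool fits the limit.
  private enforceLimit(): void {
    while (this.players.size > this.maxPlayers) {
      const oldestId = this.players.keys().next().value as string;
      this.players.get(oldestId)?.destroy();
      this.players.delete(oldestId);
    }
  }

  // Live player ids, ordered from least to most recently used.
  ids(): string[] { return Array.from(this.players.keys()); }
}

const pool = new PlayerPool(3);
const make = (): PlayerHandle => ({ pause() {}, destroy() {} });
["a", "b", "c"].forEach(id => pool.play(id, make));
pool.play("a", make); // "a" becomes the most recently used
pool.play("d", make); // exceeds the limit, so "b" is destroyed
console.log(pool.ids()); // [ 'c', 'a', 'd' ]
```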
// CardPlayer manager (pauseOthers, startPlay, count, and destroyOldest are helpers elided here)
class CardPlayer {
  static MAX_PLAYERS = 3;
  static play(playerId) { pauseOthers(playerId); startPlay(playerId); }
  static enforceLimit() { if (count > CardPlayer.MAX_PLAYERS) destroyOldest(); }
}
3.4 Batch Link Validation & Caching
When users paste many links, we validate them in batches and cache the results in three layers: validation result, parsed type, and final card data.
const cache = {
  validation: new Map(),
  type: new Map(),
  card: new Map()
};
async function validateLink(url) {
  // Look in the validation layer; the cache object itself has no has() or get().
  if (cache.validation.has(url)) return cache.validation.get(url);
  addToQueue(url);
  await waitBatchProcess();
  return cache.validation.get(url);
}
This reduces API traffic and lets repeated links render straight from the cache without another request.
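The batching step itself could look roughly like the sketch below. Everything here is an assumption made for illustration: the `LinkResult` shape, the `linkCache` object, the default batch size of 20, and the injected `fetchBatch` function that stands in for the real validation endpoint, which the article does not show.

```typescript
// Hypothetical result shape; the real API response is not shown in the article.
interface LinkResult { valid: boolean; type: string }

// Three cache layers keyed by URL. The card layer would be filled later,
// when the final card data is built.
const linkCache = {
  validation: new Map<string, boolean>(),
  type: new Map<string, string>(),
  card: new Map<string, unknown>(),
};

// Split pasted URLs into cache hits and deduplicated, fixed-size batches of misses.
function planBatches(
  urls: string[],
  batchSize: number,
): { hits: string[]; batches: string[][] } {
  const hits: string[] = [];
  const misses: string[] = [];
  const seen = new Set<string>();
  urls.forEach(url => {
    if (seen.has(url)) return; // a link pasted twice is checked once
    seen.add(url);
    (linkCache.validation.has(url) ? hits : misses).push(url);
  });
  const batches: string[][] = [];
  for (let i = 0; i < misses.length; i += batchSize) {
    batches.push(misses.slice(i, i + batchSize));
  }
  return { hits, batches };
}

// Validate all URLs with one request per batch of uncached links.
// `fetchBatch` is an injected stand-in for the real endpoint.
async function validateLinks(
  urls: string[],
  fetchBatch: (batch: string[]) => Promise<Map<string, LinkResult>>,
  batchSize = 20,
): Promise<Map<string, boolean>> {
  const { batches } = planBatches(urls, batchSize);
  const responses = await Promise.all(batches.map(batch => fetchBatch(batch)));
  responses.forEach(response => {
    response.forEach((result, url) => {
      linkCache.validation.set(url, result.valid);
      linkCache.type.set(url, result.type);
    });
  });
  // Links the backend did not report are treated as invalid in this sketch.
  const out = new Map<string, boolean>();
  urls.forEach(url => out.set(url, linkCache.validation.get(url) ?? false));
  return out;
}
```

With this split, pasting the same fifty links a second time produces no batches at all, so no request is sent.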
Chapter 4 – Compatibility with Legacy Content
To preserve millions of existing articles, we built a two‑way conversion based on the internal Opus protocol. Original links are stored in a resource_url attribute, enabling lossless round‑trip between link and card.
interface CardAttrs {
  resource_url: string; // original link
  info: object;
  status: State;
}
function onPaste(url) { insertCard({ resource_url: url, status: 'loading' }); }
function convertToLink(card) { replaceWith(card.resource_url); }
For un‑migratable old HTML (e.g., stray UEditor tags), the new editor falls back to an H5 parser that translates legacy markup into the new block node format on the fly.
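As an illustration of why storing resource_url makes the link/card round trip lossless, here is a small sketch. The `VideoCardAttrs` shape, the `bvid` field, the URL pattern, and the example link are all assumptions made for this example; they are not taken from the Opus protocol, which the article does not describe in detail.

```typescript
// Hypothetical attribute shape, loosely modeled on the CardAttrs interface above.
type CardStatus = "loading" | "ready" | "error";
interface VideoCardAttrs {
  resource_url: string; // original link, stored verbatim
  bvid: string | null;  // parsed video id, if one was found (illustrative field)
  status: CardStatus;
}

// Assumption: video links carry an id starting with "BV" after "/video/".
const BVID_PATTERN = /\/video\/(BV[0-9A-Za-z]+)/;

// Link -> card: keep the raw link and derive whatever can be parsed from it.
function linkToCard(url: string): VideoCardAttrs {
  const match = BVID_PATTERN.exec(url);
  return { resource_url: url, bvid: match ? match[1] : null, status: "loading" };
}

// Card -> link: return the stored link instead of rebuilding one from bvid,
// so query parameters and other details survive the round trip.
function cardToLink(card: VideoCardAttrs): string {
  return card.resource_url;
}

// Made-up example link, used only to exercise the functions.
const original = "https://www.bilibili.com/video/BV1AbCdEfGhI?t=30";
const card = linkToCard(original);
console.log(card.bvid);                     // BV1AbCdEfGhI
console.log(cardToLink(card) === original); // true
```

Rebuilding the link from the parsed id alone would drop the `?t=30` part, which is why the sketch returns the stored string unchanged.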
Chapter 5 – Results and Conclusions
After the migration, inserting a video card feels instantaneous and the component is fully interactive. Benchmarks show insertion latency dropping from >2 seconds to <100 ms, while runtime memory remains stable thanks to the player pool and caching layers.
The shift from a static‑image workaround to a true application‑level document model unlocks future possibilities such as embedded polls, mini‑games, and richer interactive media.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.
