Redesigning the Internationalization Translation Platform Document Parsing SDK: Architecture, Layers, and Implementation
This article details the motivation, benefits, and technical design of the Document Parsing 2.0 SDK for the internationalization translation platform, describing a three‑layer architecture, the use of Adapter, Decorator, and Proxy patterns, TypeScript implementations, and a high‑performance batch update mechanism.
Background
Rapid business growth expanded the platform's supported document formats from Word alone to eight types. The original Node.js SDK (Document Parser 1.0), however, implemented parsing, restoration, translation, and word counting separately for each format, so code was duplicated across formats and adding new formats or features became difficult.
Refactoring Benefits
Core SDK code reduced by more than 70% (from over 6000 lines to about 2000).
Improved maintainability and extensibility. For example, restoration of Lark Document 2.0 files became ten times faster, the related code in Node BFF projects shrank by 80%, and SDK data visualisation was built on TypeScript decorators and FaaS.
Technical Principles
3.1 Architecture Design
The parsing and restoration processes are abstracted into a low-level capability layer that converts any document into a unified DSL (segments) and back. As in the TCP/IP model, where the application layer builds on the layers beneath it, the features above this layer use that capability without dealing with format details. An Adapter layer then normalises the varied parser outputs into one consistent DSL.
Resulting three‑layer architecture:
Parser Layer: handles format-specific parsing and restoration.
Adapter Layer: transforms parser results into the unified DSL.
Feature (Application) Layer: exposes unified capabilities such as segmentation, machine translation, word counting, and document generation.
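The layering can be sketched in a few dozen lines of TypeScript. This is a minimal illustration under stated assumptions, not the SDK's code: the FormatParser and FormatAdapter interfaces, the FeatureLayer class, and the toy line-based txt format are hypothetical; only the DSL field names (blockId, elements, textRun, location) come from the structure shown in section 3.2. What it shows is that features such as word counting see only the DSL, so supporting a new format means adding one parser and one adapter.

```typescript
// DSL types, following the field names of the article's unified DSL structure.
interface TextRun {
  style: Record<string, unknown>;
  content: string;
}

interface BridgeElement {
  type: string; // "text" or another element kind
  textRun?: TextRun;
  location: { start: number; end: number };
}

interface BridgeBlock {
  blockId: string;
  elements: BridgeElement[];
}

// Parser layer (hypothetical interface): format-specific parse and restore.
interface FormatParser<Raw> {
  parse(input: string): Raw;
  restore(raw: Raw): string;
}

// Adapter layer (hypothetical interface): raw parser output <-> unified DSL.
interface FormatAdapter<Raw> {
  toBridge(raw: Raw): BridgeBlock[];
  fromBridge(blocks: BridgeBlock[]): Raw;
}

// A toy plain-text format: each line becomes one raw entry.
const txtParser: FormatParser<string[]> = {
  parse: (input) => input.split("\n"),
  restore: (lines) => lines.join("\n"),
};

const txtAdapter: FormatAdapter<string[]> = {
  toBridge: (lines) =>
    lines.map((line, i) => ({
      blockId: `b${i}`,
      elements: [
        { type: "text", textRun: { style: {}, content: line }, location: { start: 0, end: line.length } },
      ],
    })),
  fromBridge: (blocks) =>
    blocks.map((block) => block.elements.map((el) => el.textRun?.content ?? "").join("")),
};

// Feature layer: only ever touches the DSL, so it is shared by every registered format.
class FeatureLayer {
  private formats: Record<string, { parser: FormatParser<unknown>; adapter: FormatAdapter<unknown> }> = {};

  // Same idea as CAT.apply(type, parser): plug in a format at runtime.
  apply<Raw>(type: string, parser: FormatParser<Raw>, adapter: FormatAdapter<Raw>): void {
    this.formats[type] = { parser, adapter };
  }

  segments(type: string, input: string): BridgeBlock[] {
    const { parser, adapter } = this.lookup(type);
    return adapter.toBridge(parser.parse(input));
  }

  // Counts whitespace-separated words across all text runs.
  countWords(blocks: BridgeBlock[]): number {
    let count = 0;
    for (const block of blocks) {
      for (const el of block.elements) {
        count += (el.textRun?.content ?? "").split(/\s+/).filter(Boolean).length;
      }
    }
    return count;
  }

  genDoc(type: string, blocks: BridgeBlock[]): string {
    const { parser, adapter } = this.lookup(type);
    return parser.restore(adapter.fromBridge(blocks));
  }

  private lookup(type: string) {
    const format = this.formats[type];
    if (!format) throw new Error(`No parser registered for type "${type}"`);
    return format;
  }
}

const cat = new FeatureLayer();
cat.apply("txt", txtParser, txtAdapter);
const blocks = cat.segments("txt", "Hello world\nBonjour");
console.log(cat.countWords(blocks)); // 3
console.log(cat.genDoc("txt", blocks) === "Hello world\nBonjour"); // true
```

Registering a second format would touch only the registry call; segments, countWords, and genDoc stay unchanged.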
3.2 Layer Implementations
Parser example (TypeScript with decorators):
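class SomeTypeParser {
  constructor(config) {}
  // @type2bridge converts the parser output returned by parse() into the unified DSL
  @type2bridge
  parse() {}
  // @bridge2type converts the DSL back into this document type for restore()
  @bridge2type
  restore() {}
}
Feature example (CAT, Computer-Assisted Translation):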
class CAT {
  // document parsing, returns segments
  adaptCAT(type, config) {}
  // machine translation
  adaptCATWithMT() {}
  // word count
  countWords() {}
  // generate document
  genDoc() {}
  // custom parser registration
  apply(type, parser) {}
}
Adapter functions:
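// Both are applied as method decorators (@type2bridge / @bridge2type), so each receives
// the decorated method's descriptor and wraps the method.
export const type2bridge = (target, propertyKey, descriptor) => { /* convert parser output to DSL */ }
export const bridge2type = (target, propertyKey, descriptor) => { /* convert DSL back to document */ }
Unified DSL structure example: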
{
  blockId: string,
  elements: [{
    type: string, // text or other
    textRun?: { style: Object, content: string },
    location: { start: number, end: number }
  }]
}
3.3 Decorator Usage and SDK Data Visualisation
Decorators enhance class methods, enabling additional behaviours such as tracing, proxy handling, and data collection. Example parser with decorators:
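class TxtParser {
  @txt2bridge
  async parse(token) {
    // path2buffer: SDK helper (not shown) that reads the source file into a Buffer
    const buffer = path2buffer(token);
    return [buffer.toString()];
  }
  @bridge2txt
  async restore(blocks, _raw) {
    // join('') rather than join(): the default separator is a comma, which would be inserted between blocks
    return { buffer: Buffer.from(blocks.join('')) };
  }
}
Two proxy helpers, proxyForReturnValue and proxyForParam, implement the Proxy design pattern to decouple decorator logic from business code: as their names indicate, the first post-processes a method's return value and the second pre-processes its parameters.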
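A hedged sketch of how the two helpers could be written, assuming legacy (experimentalDecorators-style) method decorators that receive (target, propertyKey, descriptor). The txt2bridge and bridge2txt built on them here, the simplified Bridge block shape, the DecoratedTxtParser interface, and the decorate helper are all hypothetical; decorate applies a decorator by hand so the example runs without the compiler flag.

```typescript
// Legacy (experimentalDecorators-style) method decorator signature.
type LegacyMethodDecorator = (
  target: object,
  propertyKey: string,
  descriptor: PropertyDescriptor,
) => PropertyDescriptor;

// Simplified DSL block for this sketch (the real DSL has elements/textRun/location).
interface Bridge {
  blockId: string;
  content: string;
}

// Wraps an async method so its resolved return value is passed through `proxy`.
function proxyForReturnValue(proxy: (value: unknown) => unknown): LegacyMethodDecorator {
  return (_target, _key, descriptor) => {
    const original = descriptor.value as (...args: unknown[]) => unknown;
    descriptor.value = async function (this: unknown, ...args: unknown[]) {
      return proxy(await original.apply(this, args));
    };
    return descriptor;
  };
}

// Wraps a method so its argument list is rewritten by `proxy` before the call.
function proxyForParam(proxy: (args: unknown[]) => unknown[]): LegacyMethodDecorator {
  return (_target, _key, descriptor) => {
    const original = descriptor.value as (...args: unknown[]) => unknown;
    descriptor.value = function (this: unknown, ...args: unknown[]) {
      return original.apply(this, proxy(args));
    };
    return descriptor;
  };
}

// Hypothetical txt adapters: string[] (one entry per line) <-> Bridge[].
const txt2bridge = proxyForReturnValue((lines) =>
  (lines as string[]).map((content, i): Bridge => ({ blockId: `b${i}`, content })),
);
const bridge2txt = proxyForParam(([blocks, ...rest]) => [
  (blocks as Bridge[]).map((block) => block.content),
  ...rest,
]);

// Business code: knows nothing about the DSL.
class TxtParser {
  async parse(text: string): Promise<string[]> {
    return text.split("\n");
  }
  async restore(lines: string[]): Promise<string> {
    return lines.join("\n");
  }
}

// Applies a decorator by hand; same effect as writing @txt2bridge above parse().
function decorate(proto: object, key: string, decorator: LegacyMethodDecorator): void {
  const descriptor = Object.getOwnPropertyDescriptor(proto, key);
  if (!descriptor) throw new Error(`No method named ${key}`);
  Object.defineProperty(proto, key, decorator(proto, key, descriptor));
}
decorate(TxtParser.prototype, "parse", txt2bridge);
decorate(TxtParser.prototype, "restore", bridge2txt);

// Legacy decorators cannot change declared types, so the decorated shape is stated explicitly.
interface DecoratedTxtParser {
  parse(text: string): Promise<Bridge[]>;
  restore(blocks: Bridge[]): Promise<string>;
}
const parser = new TxtParser() as unknown as DecoratedTxtParser;
```

The business class stays free of DSL code; the conversion lives entirely in the two proxies. The trade-off, visible in the final cast, is that TypeScript does not see the signature change a legacy decorator makes.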
export function proxyForReturnValue(proxy) { /* ... */ }
export function proxyForParam(proxy) { /* ... */ }
A trace decorator collects execution metrics (time, CPU, memory) and reports them via FaaS for visualisation in an operations dashboard.
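A minimal sketch of such a trace decorator factory, assuming a legacy method decorator applied to an async method. Only wall-clock duration is measured here; in Node, process.cpuUsage() and process.memoryUsage() could supply the CPU and memory figures mentioned above. The TraceRecord fields, the optional report parameter, and the in-memory reportToFaas stand-in for the FaaS upload are hypothetical.

```typescript
// One metrics record per call (field names are hypothetical).
interface TraceRecord {
  docType: string;
  operateType: string;
  method: string;
  durationMs: number;
  ok: boolean;
}

type Reporter = (record: TraceRecord) => void;

// Stand-in for the FaaS upload: records are only collected in memory here.
const traceRecords: TraceRecord[] = [];
const reportToFaas: Reporter = (record) => {
  traceRecords.push(record);
};

// Decorator factory matching trace(docType, operateType), plus an optional,
// hypothetical `report` parameter for injecting the metrics sink.
function trace(docType: string, operateType: string, report: Reporter = reportToFaas) {
  return function (_target: object, propertyName: string, descriptor: PropertyDescriptor): PropertyDescriptor {
    const original = descriptor.value as (...args: unknown[]) => unknown;
    descriptor.value = async function (this: unknown, ...args: unknown[]) {
      const start = performance.now();
      let ok = true;
      try {
        return await original.apply(this, args);
      } catch (err) {
        ok = false;
        throw err; // the caller still sees the original error
      } finally {
        // Runs on success and failure, so failed calls are measured too.
        report({ docType, operateType, method: propertyName, durationMs: performance.now() - start, ok });
      }
    };
    return descriptor;
  };
}

// Example target; equivalent to annotating parse() with @trace("txt", "parse").
class TracedTxtParser {
  async parse(text: string): Promise<string[]> {
    return text.split("\n");
  }
}
const parseDescriptor = Object.getOwnPropertyDescriptor(TracedTxtParser.prototype, "parse");
if (parseDescriptor) {
  Object.defineProperty(
    TracedTxtParser.prototype,
    "parse",
    trace("txt", "parse")(TracedTxtParser.prototype, "parse", parseDescriptor),
  );
}
```

Reporting from `finally` keeps the metric collection independent of the method's outcome, which is what makes failure rates visible on a dashboard.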
export function trace(docType, operateType) {
  return function (target, propertyName, descriptor) { /* ... */ };
}
3.4 Larkdocx Batch Update Key Implementation
To work within the Feishu document API's low QPS limit (at most 3), restoration uses a batch-update mechanism built on a custom concurrent scheduler with retry logic.
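A minimal sketch of a concurrency-limited pool with retry, assuming items arrive as an array and that sendChunk (hypothetical) performs one batch-update request; it does not call the real Feishu API. Capping requests in flight at 3 is not the same as capping them at 3 per second; a strict QPS limit would also need spacing between request starts. The chunk size of 50 and the retry delays are placeholders, not documented limits.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls fn; on failure retries up to `retries` more times, waiting a little longer each time.
async function withRetry<T>(fn: () => Promise<T>, retries: number, delayMs: number): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      await sleep(delayMs * (attempt + 1));
    }
  }
}

// Runs iteratorFn over every item with at most `poolLimit` calls in flight.
// Results keep the input order. (Takes an array rather than an arbitrary iterable.)
async function concurrentFetch<T, R>(
  poolLimit: number,
  items: readonly T[],
  iteratorFn: (item: T, index: number) => Promise<R>,
  retries = 2,
  retryDelayMs = 200,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker claims the next unprocessed index until none are left.
  // JavaScript is single-threaded, so `next++` needs no locking.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await withRetry(() => iteratorFn(items[i], i), retries, retryDelayMs);
    }
  };
  const workerCount = Math.max(1, Math.min(poolLimit, items.length));
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results;
}

// Splits blocks into fixed-size chunks and sends the chunks through the pool.
// `sendChunk` stands in for one batch-update request (hypothetical; not the Feishu SDK).
async function batchUpdateBlock<B>(
  blocks: readonly B[],
  documentId: string,
  sendChunk: (documentId: string, chunk: B[]) => Promise<void>,
  chunkSize = 50, // placeholder, not a documented API limit
  poolLimit = 3,
): Promise<void> {
  const chunks: B[][] = [];
  for (let i = 0; i < blocks.length; i += chunkSize) {
    chunks.push(blocks.slice(i, i + chunkSize));
  }
  await concurrentFetch(poolLimit, chunks, (chunk) => sendChunk(documentId, chunk));
}
```

Each chunk is retried on its own, so a transient failure resends only that chunk rather than the whole document.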
async function concurrentFetch(poolLimit, iterable, iteratorFn) { /* concurrency control with retry */ }
async function batchUpdateBlock(blocks, documentId) { /* chunked batch updates with error handling */ }
Conclusion
The redesign organises the SDK into three layers (parsing and restoration, adaptation to a unified DSL, and shared features), uses the Decorator, Adapter, and Proxy patterns to keep cross-cutting logic out of format-specific code, and adds a robust batch-update scheduler. It shows how solid fundamentals and well-chosen design patterns make an SDK scalable and maintainable.
TikTok Frontend Technology Team
We are the TikTok Frontend Technology Team, serving TikTok and multiple ByteDance product lines, focused on building frontend infrastructure and exploring community technologies.