
Redesigning the Internationalization Translation Platform Document Parsing SDK: Architecture, Layers, and Implementation

This article details the motivation, benefits, and technical design of the Document Parsing 2.0 SDK for the internationalization translation platform, describing a three‑layer architecture, the use of Adapter, Decorator, and Proxy patterns, TypeScript implementations, and a high‑performance batch update mechanism.

TikTok Frontend Technology Team

Background

Rapid business growth expanded the platform’s document support from a single Word format to eight types. The original Node.js SDK (Document Parser 1.0) became hard to maintain: each format required its own parsing, restoration, translation, and word‑count logic, which led to duplicated code and made new formats difficult to add.

Refactoring Benefits

Core SDK code reduced by more than 70% (from over 6000 lines to about 2000).

Improved maintainability and extensibility. For example, Lark Document 2.0 restoration became ten times faster, Node BFF projects cut their related code by 80%, and SDK data visualisation was built on TypeScript decorators and FaaS.

Technical Principles

3.1 Architecture Design

The parsing and restoration processes are abstracted into a low‑level capability layer that converts any document into a unified DSL (segments) and back, analogous to the application layer in the TCP/IP model. An Adapter layer then normalises the varied parser outputs into a consistent DSL.

Resulting three‑layer architecture:

Parser Layer: Handles format‑specific parsing and restoration.

Adapter Layer: Transforms parser results into a unified DSL.

Feature (Application) Layer: Exposes unified capabilities such as segmentation, machine translation, word counting, and document generation.
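
The layering above can be sketched in a few lines of TypeScript. This is only an illustration: `Segment`, `Parser`, `Adapter`, `lineParser`, `lineAdapter`, `countWords`, and `translateDocument` are hypothetical names, and the toy "one segment per line" format stands in for a real document type.

```typescript
// Hypothetical unified DSL unit; the SDK's real segment shape is richer.
interface Segment {
  blockId: string;
  text: string;
}

// Parser layer: format-specific, free to use whatever raw shape suits the format.
interface Parser<Raw> {
  parse(input: string): Raw;
  restore(raw: Raw): string;
}

// Adapter layer: maps one parser's raw shape to and from the unified DSL.
interface Adapter<Raw> {
  toSegments(raw: Raw): Segment[];
  fromSegments(segments: Segment[]): Raw;
}

// Toy plain-text format: one segment per line.
const lineParser: Parser<string[]> = {
  parse: (input) => input.split("\n"),
  restore: (raw) => raw.join("\n"),
};

const lineAdapter: Adapter<string[]> = {
  toSegments: (raw) => raw.map((text, i) => ({ blockId: `line-${i}`, text })),
  fromSegments: (segments) => segments.map((s) => s.text),
};

// Feature layer: sees only segments, so it is independent of the format.
function countWords(segments: Segment[]): number {
  return segments.reduce(
    (total, s) => total + s.text.split(/\s+/).filter((w) => w.length > 0).length,
    0,
  );
}

function translateDocument<Raw>(
  input: string,
  parser: Parser<Raw>,
  adapter: Adapter<Raw>,
  translate: (text: string) => string,
): string {
  const segments = adapter.toSegments(parser.parse(input));
  const translated = segments.map((s) => ({ ...s, text: translate(s.text) }));
  return parser.restore(adapter.fromSegments(translated));
}
```

Under this split, supporting a new format means writing one parser and one adapter; `countWords` and `translateDocument` stay unchanged.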

3.2 Layer Implementations

Parser example (TypeScript with decorators):

class SomeTypeParser {
    constructor(config) {}
    @type2bridge
    parse() {}
    @bridge2type
    restore() {}
}

Feature example (CAT – Computer‑Assisted Translation):

class CAT {
    // document parsing, returns segments
    adaptCAT(type, config) {}
    // machine translation
    adaptCATWithMT() {}
    // word count
    countWords() {}
    // generate document
    genDoc() {}
    // custom parser registration
    apply(type, parser) {}
}
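
A minimal sketch of how custom parser registration (`apply`) and type-based dispatch (`adaptCAT`) might fit together. The `DocParser` interface and `ParserRegistry` class are assumptions made for illustration and are not the SDK's actual types.

```typescript
// Hypothetical minimal parser contract; the real SDK's parsers take more options.
interface DocParser {
  parse(source: string): Promise<string[]>;
}

class ParserRegistry {
  private parsers = new Map<string, DocParser>();

  // Plays the role of CAT.apply: register (or override) a parser for a type.
  apply(type: string, parser: DocParser): void {
    this.parsers.set(type, parser);
  }

  // Plays the role of CAT.adaptCAT: pick the parser by document type.
  async adapt(type: string, source: string): Promise<string[]> {
    const parser = this.parsers.get(type);
    if (!parser) {
      throw new Error(`No parser registered for type "${type}"`);
    }
    return parser.parse(source);
  }
}

const registry = new ParserRegistry();
registry.apply("txt", { parse: async (source) => source.split("\n") });
```

Calling `registry.adapt("txt", text)` resolves with the segments, while an unregistered type rejects with an explicit error instead of failing silently.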

Adapter functions:

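// Method decorator: converts the parser's raw output into the unified DSL.
export const type2bridge = (target, propertyName, descriptor) => { /* convert parser output to DSL */ }
// Method decorator: converts the unified DSL back into the format-specific document.
export const bridge2type = (target, propertyName, descriptor) => { /* convert DSL back to document */ }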

Unified DSL structure example:

{
  blockId: string,
  elements: [{
    type: string, // text or other
    textRun?: { style: Object, content: string },
    location: { start: number, end: number }
  }]
}
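
The structure above could be typed roughly as follows. The interface names (`TextRun`, `BlockElement`, `Block`), the `blockText` helper, and the sample data are hypothetical, and treating `location` as character offsets within the block is an assumption.

```typescript
interface TextRun {
  style: Record<string, unknown>; // assumed: free-form style bag
  content: string;
}

interface BlockElement {
  type: string; // "text" or another element kind
  textRun?: TextRun; // present only on text elements
  location: { start: number; end: number }; // assumed: offsets within the block
}

interface Block {
  blockId: string;
  elements: BlockElement[];
}

// Hypothetical helper: concatenate the text content of one block,
// skipping elements that carry no textRun.
function blockText(block: Block): string {
  return block.elements
    .map((el) => (el.textRun ? el.textRun.content : ""))
    .join("");
}

const sample: Block = {
  blockId: "b1",
  elements: [
    {
      type: "text",
      textRun: { style: { bold: true }, content: "Hello, " },
      location: { start: 0, end: 7 },
    },
    { type: "image", location: { start: 7, end: 8 } },
    {
      type: "text",
      textRun: { style: {}, content: "world" },
      location: { start: 8, end: 13 },
    },
  ],
};
```

Because non-text elements keep their `location` but have no `textRun`, a feature such as word counting can skip them while restoration can still put them back in place.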

3.3 Decorator Usage and SDK Data Visualisation

Decorators enhance class methods, enabling additional behaviours such as tracing, proxy handling, and data collection. Example parser with decorators:

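class TxtParser {
    @txt2bridge
    async parse(token) {
        const buffer = path2buffer(token); // read the file behind `token`
        return [buffer.toString()];        // the whole text becomes one block
    }
    @bridge2txt
    async restore(blocks, _raw) {
        // join('') avoids Array.prototype.join's default ',' separator
        return { buffer: Buffer.from(blocks.join('')) };
    }
}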

Proxy helpers (proxyForReturnValue and proxyForParam) implement the Proxy design pattern to decouple decorator logic from business code.

export function proxyForReturnValue(proxy) { /* ... */ }
export function proxyForParam(proxy) { /* ... */ }
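
One plausible shape for these two helpers is sketched below. The article does not show their bodies, so the legacy (`experimentalDecorators`-style) signature, the awaiting of the return value, and the names `LegacyMethodDecorator`, `decorate`, and `DemoParser` are all assumptions. The decorators are applied by hand through `decorate` so the sketch does not depend on any compiler flag.

```typescript
// Legacy-style method decorator signature (assumed).
type LegacyMethodDecorator = (
  target: object,
  propertyName: string,
  descriptor: PropertyDescriptor,
) => void;

// Wraps a method so `proxy` can transform its awaited return value.
function proxyForReturnValue(proxy: (result: any) => any): LegacyMethodDecorator {
  return (_target, _propertyName, descriptor) => {
    const original = descriptor.value as (...args: any[]) => any;
    descriptor.value = async function (this: unknown, ...args: any[]) {
      return proxy(await original.apply(this, args));
    };
  };
}

// Wraps a method so `proxy` can transform its arguments before the call.
function proxyForParam(proxy: (args: any[]) => any[]): LegacyMethodDecorator {
  return (_target, _propertyName, descriptor) => {
    const original = descriptor.value as (...args: any[]) => any;
    descriptor.value = function (this: unknown, ...args: any[]) {
      return original.apply(this, proxy(args));
    };
  };
}

// Hypothetical helper: does by hand what `@decorator` would do on a method.
function decorate(proto: object, name: string, decorator: LegacyMethodDecorator): void {
  const descriptor = Object.getOwnPropertyDescriptor(proto, name);
  if (!descriptor) throw new Error(`No method named ${name}`);
  decorator(proto, name, descriptor);
  Object.defineProperty(proto, name, descriptor);
}

// Business code knows nothing about the DSL conversion.
class DemoParser {
  async parse(text: string): Promise<unknown> {
    return [text];
  }
  restore(blocks: string[]): string {
    return blocks.join("");
  }
}

// parse: wrap each raw string in a DSL-like object on the way out.
decorate(
  DemoParser.prototype,
  "parse",
  proxyForReturnValue((blocks: string[]) => blocks.map((content) => ({ content }))),
);
// restore: unwrap DSL-like objects back to raw strings on the way in.
decorate(
  DemoParser.prototype,
  "restore",
  proxyForParam(([segments]) => [segments.map((s: { content: string }) => s.content)]),
);
```

The conversion logic lives entirely in the functions passed to the helpers, which is the decoupling the Proxy pattern is meant to provide here.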

A trace decorator collects execution metrics (time, CPU, memory) and reports them via FaaS for visualisation in an operations dashboard.

export function trace(docType, operateType) { return function(target, propertyName, descriptor) { /* ... */ } }
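
A minimal sketch of what the body of such a decorator could look like, measuring wall-clock time only. The `TraceRecord` shape, the in-memory `traceLog`, and the `report` function are hypothetical stand-ins for the FaaS reporting the article describes. In Node.js, `process.cpuUsage()` and `process.memoryUsage()` could supply the CPU and memory figures; they are left out here to keep the example free of runtime-specific types.

```typescript
interface TraceRecord {
  docType: string;
  operateType: string;
  durationMs: number;
  ok: boolean;
}

// Hypothetical sink; the article reports to a FaaS endpoint instead.
const traceLog: TraceRecord[] = [];
function report(record: TraceRecord): void {
  traceLog.push(record);
}

function trace(docType: string, operateType: string) {
  return function (
    _target: object,
    _propertyName: string,
    descriptor: PropertyDescriptor,
  ): void {
    const original = descriptor.value as (...args: any[]) => any;
    descriptor.value = async function (this: unknown, ...args: any[]) {
      const start = Date.now();
      let ok = false;
      try {
        const result = await original.apply(this, args);
        ok = true;
        return result;
      } finally {
        // Report on success and on failure; a reporting error must
        // never break the traced call itself.
        try {
          report({ docType, operateType, durationMs: Date.now() - start, ok });
        } catch (_err) {
          /* ignore reporting failures */
        }
      }
    };
  };
}

class TracedParser {
  async parse(text: string): Promise<string[]> {
    return text.split("\n");
  }
}

// Applied by hand; equivalent to `@trace("txt", "parse")` on the method
// when legacy decorators are enabled.
const parseDescriptor = Object.getOwnPropertyDescriptor(TracedParser.prototype, "parse");
if (parseDescriptor) {
  trace("txt", "parse")(TracedParser.prototype, "parse", parseDescriptor);
  Object.defineProperty(TracedParser.prototype, "parse", parseDescriptor);
}
```

Each call to `parse` now appends one record to `traceLog` without any change to the parser's own code.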

3.4 Larkdocx Batch Update Key Implementation

To overcome the low QPS limit (≤3) of the Feishu document API, a batch‑update mechanism with a custom concurrent scheduler and retry logic was introduced.

async function concurrentFetch(poolLimit, iterable, iteratorFn) { /* concurrency control with retry */ }
async function batchUpdateBlock(blocks, documentId) { /* chunked batch updates with error handling */ }
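
The two signatures above might be filled in along these lines. This is a sketch under stated assumptions: the `retries`, `batchSize`, and `send` parameters and the `chunk` helper are hypothetical, and `send` stands in for the real document API call. The pool caps the number of requests in flight; it is not a strict per-second rate limiter, which a production version would likely need on top.

```typescript
// Run iteratorFn over all items with at most poolLimit calls in flight,
// retrying each failed call up to `retries` more times.
async function concurrentFetch<T, R>(
  poolLimit: number,
  iterable: Iterable<T>,
  iteratorFn: (item: T) => Promise<R>,
  retries = 2, // assumed default, not from the article
): Promise<R[]> {
  const items = Array.from(iterable);
  const results: R[] = new Array(items.length);
  let next = 0;

  async function runWithRetry(item: T): Promise<R> {
    let lastError: unknown;
    for (let attempt = 0; attempt <= retries; attempt++) {
      try {
        return await iteratorFn(item);
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  }

  // Each worker claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await runWithRetry(items[index]);
    }
  }

  const workerCount = Math.max(1, Math.min(poolLimit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Split an array into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Send blocks in chunks, with at most `poolLimit` chunks in flight.
async function batchUpdateBlock<B>(
  blocks: B[],
  documentId: string,
  send: (documentId: string, batch: B[]) => Promise<void>, // hypothetical API call
  batchSize = 50, // assumed value
  poolLimit = 3,
): Promise<void> {
  await concurrentFetch(poolLimit, chunk(blocks, batchSize), (batch) =>
    send(documentId, batch),
  );
}
```

Results come back in input order regardless of completion order. If one chunk exhausts its retries, the returned promise rejects while chunks already in flight may still complete, so a real implementation would also need to decide how to report partial success.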

Conclusion

The redesign abstracts parsing and restoration into a three‑layer architecture built on the Decorator, Adapter, and Proxy patterns, and adds a batch‑update scheduler with concurrency control and retries. It shows how solid fundamentals and well‑chosen design patterns can keep an SDK scalable and maintainable as the number of supported formats grows.

We are the TikTok Frontend Technology Team, serving TikTok and multiple ByteDance product lines, focused on building frontend infrastructure and exploring community technologies.
