Redesign of the Internationalization Translation Platform Document Parsing SDK to a Three‑Layer Architecture
This article details the motivation, benefits, and technical implementation of refactoring the document parsing SDK from a monolithic design to a three‑layer architecture using Adapter, Decorator, and Proxy patterns, reducing code size by over 70% and improving extensibility for multiple document formats.
Background
The original document parsing SDK (referred to as "Parser 1.0") initially supported only Word documents. As the business grew, it was extended to eight document types, each with its own parsing and restoration logic. Maintaining separate logic per format caused code duplication and made new formats hard to add, which prompted a complete architectural overhaul.
Refactoring Benefits
Core SDK logic reduced by more than 70% (from over 6000 lines to about 2000 lines).
Improved maintainability and extensibility, including a 10× increase in Lark Doc 2.0 restoration efficiency, an 80% reduction in Node BFF project parsing code, and the implementation of data visualization using TypeScript decorators and FaaS.
Technical Principles
3.1 Architecture Design
The essence of parsing is converting a document into a target DSL, while restoration converts the DSL back into a file. All document types share this abstract process, which can be modeled as a three‑layer architecture: a low‑level parser layer, an adapter layer that normalizes parsed results into a unified DSL, and a high‑level feature (application) layer that exposes unified capabilities.
The adapter layer solves the problem of divergent data structures by converting each parser’s output into a consistent DSL, enabling the feature layer to handle all document types uniformly.
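The layering can be illustrated with a minimal sketch. It assumes a hypothetical line-based plain-text parser; the names `PlainTextParser`, `linesToBridge`, `bridgeToLines`, `AdaptedParser`, and `FeatureLayer` are illustrative and not part of the SDK, and the whitespace-based word count is a simplification.

```typescript
// Unified DSL, following the block/element shape used in this article.
interface BridgeElement {
  type: string; // "text" or other
  textRun?: { style: Record<string, unknown>; content: string };
  location: { start: number; end: number };
}
interface BridgeBlock {
  blockId: string;
  elements: BridgeElement[];
}

// Parser layer (hypothetical): format-specific, unaware of the DSL.
class PlainTextParser {
  parse(source: string): string[] {
    return source.split('\n');
  }
  restore(lines: string[]): string {
    return lines.join('\n');
  }
}

// Adapter layer: normalizes parser output into the DSL and back.
const linesToBridge = (lines: string[]): BridgeBlock[] =>
  lines.map((content, i) => ({
    blockId: `block-${i}`,
    elements: [
      { type: 'text', textRun: { style: {}, content }, location: { start: 0, end: content.length } },
    ],
  }));

const bridgeToLines = (blocks: BridgeBlock[]): string[] =>
  blocks.map(block =>
    block.elements.map(el => (el.textRun ? el.textRun.content : '')).join('')
  );

// Feature layer: only ever sees BridgeBlock[], whatever the source format.
interface AdaptedParser {
  parse(source: string): BridgeBlock[];
  restore(blocks: BridgeBlock[]): string;
}

class FeatureLayer {
  private parsers = new Map<string, AdaptedParser>();

  register(type: string, parser: AdaptedParser): void {
    this.parsers.set(type, parser);
  }

  // Simplified word count: splits text runs on whitespace.
  countWords(type: string, source: string): number {
    const parser = this.parsers.get(type);
    if (!parser) throw new Error(`No parser registered for "${type}"`);
    let count = 0;
    for (const block of parser.parse(source)) {
      for (const el of block.elements) {
        const text = el.textRun ? el.textRun.content : '';
        count += text.split(/\s+/).filter(Boolean).length;
      }
    }
    return count;
  }
}

const txt = new PlainTextParser();
const features = new FeatureLayer();
features.register('txt', {
  parse: source => linesToBridge(txt.parse(source)),
  restore: blocks => txt.restore(bridgeToLines(blocks)),
});
```

Under these assumptions, supporting a new format means writing one parser and one pair of adapter functions; `FeatureLayer` itself does not change.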
3.2 Layer Implementation
Parser Layer
class SomeTypeParser {
  constructor(config) {}
  @type2bridge
  parse() {}
  @bridge2type
  restore() {}
}
The parser uses TypeScript decorators (covered in section 3.3).
Feature Layer
The feature layer provides a CAT (Computer‑Assisted Translation) interface that works with segments generated by the parser.
class CAT {
  // Document parsing, returns segments
  adaptCAT(type, config) {}
  // Machine translation
  adaptCATWithMT() {}
  // Word count
  countWords() {}
  // Document generation
  genDoc() {}
  // Custom parser registration
  apply(type, parser) {}
}
Adapter Layer
export const type2bridge = () => {
  // Convert parser output to unified DSL
};
export const bridge2type = () => {
  // Convert unified DSL back to document format
};
Unified DSL Example
{
  blockId: string,
  elements: {
    type: string, // text or other
    textRun?: { style: Object, content: string },
    location: { start: number, end: number }
  }[]
}
3.3 Decorator Usage and SDK Data Visualization
Decorators enhance class methods, providing additional capabilities such as tracing and proxy handling.
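To make the mechanism concrete, here is a minimal sketch of a method decorator that wraps a method and records a timing metric. It is synchronous for brevity, and `timed`, `metrics`, and `Greeter` are hypothetical names; the in-memory array stands in for a real reporting backend. The decorator is applied by hand, which is roughly what the compiler emits for `@timed('greet')` when `experimentalDecorators` is enabled, so the sketch runs without that flag.

```typescript
type CallMetric = { label: string; ok: boolean; ms: number };
const metrics: CallMetric[] = []; // in-memory stand-in for a reporting backend

// Legacy-style method decorator factory (experimentalDecorators signature).
function timed(label: string) {
  return function (_target: object, _key: string, descriptor: PropertyDescriptor): void {
    const original = descriptor.value as (...args: unknown[]) => unknown;
    // Replace the method with a wrapper that measures the original call.
    descriptor.value = function (this: unknown, ...args: unknown[]) {
      const start = Date.now();
      try {
        const result = original.apply(this, args);
        metrics.push({ label, ok: true, ms: Date.now() - start });
        return result;
      } catch (error) {
        metrics.push({ label, ok: false, ms: Date.now() - start });
        throw error;
      }
    };
  };
}

class Greeter {
  greet(who: string): string {
    return `hello ${who}`;
  }
}

// Manual application: fetch the descriptor, let the decorator rewrite it, put it back.
const greetDescriptor = Object.getOwnPropertyDescriptor(Greeter.prototype, 'greet')!;
timed('greet')(Greeter.prototype, 'greet', greetDescriptor);
Object.defineProperty(Greeter.prototype, 'greet', greetDescriptor);
```

Callers of `greet` are unaffected: the wrapper forwards `this` and the arguments and returns the original result, while each call leaves one entry in `metrics`.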
Example parser with decorators:
class TxtParser {
  @txt2bridge
  async parse(token: string | Buffer) {
    const buffer = path2buffer(token);
    return [buffer.toString()];
  }
  @bridge2txt
  async restore(blocks: string[], _raw: string) {
    // Join with an empty separator; the default join() would insert commas between blocks.
    return { buffer: Buffer.from(blocks.join('')) };
  }
}
Decorator implementations use proxy functions to intercept method calls:
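// The generic parameter list is an assumption; only the names THIS, T, and C appear in the signature.
export function proxyForReturnValue<T, C = unknown, THIS = unknown>(
  proxy: (this: THIS, data: T, config?: C) => Bridge.Data | Promise<Bridge.Data>
) {
  return function (target: any, propertyName: string, descriptor: TypedPropertyDescriptor<any>) {
    const method = descriptor.value;
    // Run the original method first, then pass its result through the proxy.
    descriptor.value = async function (this: THIS, token: string, config?: C) {
      const data = await method.call(this, token, config);
      return await proxy.call(this, data, config);
    };
  };
}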
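As an illustration of how an adapter such as `txt2bridge` could be built on this kind of return-value proxy, the sketch below uses a simplified stand-in named `wrapReturnValue` and hypothetical `BridgeBlock`/`BridgeData` types, since the real `Bridge` namespace is not shown in this article. The decorator is applied manually so the example does not depend on compiler flags.

```typescript
// Hypothetical, simplified DSL types for this sketch only.
interface BridgeBlock {
  blockId: string;
  text: string;
}
interface BridgeData {
  blocks: BridgeBlock[];
}

// Simplified stand-in for a return-value proxy: post-process the wrapped method's result.
function wrapReturnValue<T>(proxy: (data: T) => BridgeData | Promise<BridgeData>) {
  return function (_target: object, _key: string, descriptor: PropertyDescriptor): void {
    const method = descriptor.value;
    descriptor.value = async function (this: unknown, ...args: unknown[]): Promise<BridgeData> {
      return proxy(await method.apply(this, args));
    };
  };
}

// Adapter for plain text: each string returned by the parser becomes one block.
const txt2bridge = wrapReturnValue<string[]>(lines => ({
  blocks: lines.map((text, i) => ({ blockId: `txt-${i}`, text })),
}));

class TxtParser {
  // Declared to return string[]; after decoration it resolves to BridgeData at runtime.
  async parse(content: string): Promise<string[]> {
    return content.split('\n');
  }
}

// Equivalent in effect to writing `@txt2bridge` above `parse` with legacy decorators.
const parseDescriptor = Object.getOwnPropertyDescriptor(TxtParser.prototype, 'parse')!;
txt2bridge(TxtParser.prototype, 'parse', parseDescriptor);
Object.defineProperty(TxtParser.prototype, 'parse', parseDescriptor);

async function demo(): Promise<BridgeData> {
  // Method decorators do not change the static type, hence the cast.
  return (await new TxtParser().parse('a\nb')) as unknown as BridgeData;
}
```

The parser body stays format-specific and free of DSL concerns; the conversion lives entirely in the adapter passed to the proxy.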
// The generic parameter list is an assumption; only the names THIS, T, and C appear in the signature.
export function proxyForParam<T, C = unknown, THIS = unknown>(
  proxy: (this: THIS, blocks: Bridge.Block[], raw: string | Buffer, config?: C) => T | Promise<T>
) {
  return function (target: any, propertyName: string, descriptor: TypedPropertyDescriptor<any>) {
    const method = descriptor.value;
    // Convert the DSL blocks first, then hand the converted value to the original method.
    descriptor.value = async function (this: THIS, blocks: Bridge.Block[], raw: string | Buffer, config?: C) {
      return await method.call(this, await proxy.call(this, blocks, raw, config), raw, config);
    };
  };
}
A trace decorator collects execution metrics and reports them via FaaS:
export function trace(docType: DocType, operateType: 'parse' | 'restore') {
  return function (target: any, propertyName: string, descriptor: TypedPropertyDescriptor<any>) {
    const method = descriptor.value;
    descriptor.value = async function (...args: any[]) {
      try {
        const res = await method.apply(this, args);
        // Collect and report metrics here
        return res;
      } catch (error) {
        // Report error via FaaS
        throw error;
      }
    };
  };
}
3.4 Larkdocx Batch Update Implementation
To overcome the QPS limit (≤3) of the Feishu document API, a batch‑update mechanism with a custom concurrency limiter and retry logic was introduced.
async function concurrentFetch(poolLimit, iterable, iteratorFn) {
  const result = [];
  const retry = [];
  const executing = new Set();
  for (const item of iterable) {
    const p = Promise.resolve().then(() => iteratorFn(item));
    result.push(p);
    executing.add(p);
    const clean = () => executing.delete(p);
    p.then(r => {
      // Free the slot as soon as the request settles, even when it needs a retry.
      clean();
      // A truthy result is passed back to iteratorFn as a retry.
      if (r) retry.push(Promise.resolve().then(() => iteratorFn(r)));
    }).catch(clean);
    if (executing.size >= poolLimit) {
      await Promise.race(executing);
    }
  }
  // Wait for the first pass, then for any retries it scheduled.
  return Promise.all(result).then(() => Promise.all(retry));
}
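The pooling idea can be checked in isolation. The following is a minimal sketch, without the retry path, of a limiter that keeps at most `limit` calls in flight; `runWithLimit` and `fakeRequest` are hypothetical names, and the timer-based worker only simulates a network call.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function runWithLimit<T, R>(
  limit: number,
  items: T[],
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: Promise<R>[] = [];
  const executing = new Set<Promise<R>>();
  for (const item of items) {
    const p = Promise.resolve().then(() => worker(item));
    results.push(p);
    executing.add(p);
    // Release the slot on success and on failure alike.
    const release = () => {
      executing.delete(p);
    };
    p.then(release, release);
    if (executing.size >= limit) {
      // Pause until any in-flight call settles before starting the next one.
      await Promise.race(executing);
    }
  }
  return Promise.all(results);
}

// Simulated request that records the peak number of concurrent calls.
let inFlight = 0;
let peak = 0;
const fakeRequest = async (n: number): Promise<number> => {
  inFlight += 1;
  peak = Math.max(peak, inFlight);
  await new Promise<void>(resolve => setTimeout(resolve, 10));
  inFlight -= 1;
  return n * 2;
};

const run = runWithLimit(3, [1, 2, 3, 4, 5, 6, 7], fakeRequest);
```

A pool of this kind bounds the number of concurrent requests, not the number of requests per second, so calls that finish quickly could still exceed a strict per-second quota unless a delay is added between them. In this sketch a rejected task rejects the whole run.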
// Batch update: split block updates into chunks and push them through the limiter.
// Excerpt of a class method: `this` supplies getUpdateForBlock, lark, and downgradeUpdatesForBlocks.
async function batchUpdateBlock(blocks, documentId) {
  const blockUpdates = blocks.map(block => this.getUpdateForBlock(block)).filter(Boolean);
  const chunkedBlockUpdates = chunk(blockUpdates, MAX_BATCH_SIZE);
  const batchUpdateWithErrorHandle = async updates => {
    const resp = await this.lark.batchUpdateBlockForDocx(documentId, { requests: updates });
    if (resp?.data?.code === LARK_BLOCK_UPDATE_ERROR_CODE.ForBidden) {
      // Returning the downgraded updates makes concurrentFetch retry them.
      return this.downgradeUpdatesForBlocks(updates);
    }
    return null;
  };
  await concurrentFetch(3, chunkedBlockUpdates, batchUpdateWithErrorHandle);
}
Summary
By analyzing the challenges of the existing document parsing SDK and abstracting the parsing and restoration processes, a three‑layer architecture (Parser → Adapter → Feature) was designed and implemented using Decorator, Adapter, and Proxy patterns, along with a custom concurrency‑controlled batch update mechanism, resulting in a more maintainable and extensible solution.
ByteFE
Cutting‑edge tech, article sharing, and practical insights from the ByteDance frontend team.