Redesign of the Internationalization Translation Platform Document Parsing SDK to a Three‑Layer Architecture

This article details the motivation, benefits, and technical implementation of refactoring the document parsing SDK from a monolithic design to a three‑layer architecture using Adapter, Decorator, and Proxy patterns, reducing code size by over 70% and improving extensibility for multiple document formats.

ByteFE

Background

The original document parsing SDK (referred to as "Parser 1.0") initially supported only Word documents. As the business grew, it expanded to eight document types. Each format kept its own separate logic, which duplicated code and made new formats difficult to add, prompting a complete architectural overhaul.

Refactoring Benefits

Core SDK logic reduced by more than 70% (from over 6000 lines to about 2000 lines).

Improved maintainability and extensibility, including a 10× increase in Lark Doc 2.0 restoration efficiency, an 80% reduction in Node BFF project parsing code, and the implementation of data visualization using TypeScript decorators and FaaS.

Technical Principles

3.1 Architecture Design

The essence of parsing is converting a document into a target DSL, while restoration converts the DSL back into a file. All document types share this abstract process, which can be modeled as a three‑layer architecture: a low‑level parser layer, an adapter layer that normalizes parsed results into a unified DSL, and a high‑level feature (application) layer that exposes unified capabilities.

The adapter layer solves the problem of divergent data structures by converting each parser’s output into a consistent DSL, enabling the feature layer to handle all document types uniformly.
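The three layers can be illustrated with a minimal sketch. Every name here (`TxtLowLevelParser`, `UnifiedBlock`, `linesToUnified`, `countWordsInBlocks`) is hypothetical and chosen only for illustration; the real SDK's DSL is richer, and only the layering itself comes from the description above.

```typescript
// Hypothetical, simplified DSL block; the SDK's real DSL carries more fields.
interface UnifiedBlock { blockId: string; text: string }

// Layer 1 (parser): format-specific code with its own native output shape.
class TxtLowLevelParser {
  parse(raw: string): string[] { return raw.split("\n"); }
  restore(lines: string[]): string { return lines.join("\n"); }
}

// Layer 2 (adapter): converts between the native shape and the unified DSL.
const linesToUnified = (lines: string[]): UnifiedBlock[] =>
  lines.map((text, i) => ({ blockId: `b${i}`, text }));
const unifiedToLines = (blocks: UnifiedBlock[]): string[] =>
  blocks.map(block => block.text);

// Layer 3 (feature): works on the unified DSL only, so it is format-agnostic.
function countWordsInBlocks(blocks: UnifiedBlock[]): number {
  return blocks.reduce(
    (total, block) => total + block.text.split(/\s+/).filter(Boolean).length,
    0
  );
}

const layerInput = "hello world\nthree layer design";
const layerParser = new TxtLowLevelParser();
const layerBlocks = linesToUnified(layerParser.parse(layerInput));
const layerWordCount = countWordsInBlocks(layerBlocks);
const layerRoundTrip = layerParser.restore(unifiedToLines(layerBlocks));
```

Supporting a new format in this sketch means writing one parser and one pair of adapter functions; `countWordsInBlocks` stays unchanged.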

3.2 Layer Implementation

Parser Layer

class SomeTypeParser {
    constructor(config) {}

    @type2bridge
    parse() {}

    @bridge2type
    restore() {}
}

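Each parser exposes only `parse` and `restore`. The `@type2bridge` and `@bridge2type` decorators connect these methods to the adapter layer (TypeScript decorators are covered in section 3.3).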

Feature Layer

The feature layer provides a CAT (Computer‑Assisted Translation) interface that works with segments generated by the parser.

class CAT {
    // Document parsing, returns segments
    adaptCAT(type, config) {}

    // Machine translation
    adaptCATWithMT() {}

    // Word count
    countWords() {}

    // Document generation
    genDoc() {}

    // Custom parser registration
    apply(type, parser) {}
}
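How custom parser registration might work can be sketched with a small registry. This is an assumption about the mechanism, not the SDK's actual code: `CatRegistry`, `Segment`, `DocParser`, and the simplified `adaptCAT(type, input)` signature are hypothetical.

```typescript
// Hypothetical shapes; the real segment and parser types are not shown in the article.
interface Segment { id: string; text: string }
interface DocParser { parse(input: string): Segment[] }

class CatRegistry {
  private parsers = new Map<string, DocParser>();

  // Register (or replace) the parser used for a document type.
  apply(type: string, parser: DocParser): this {
    this.parsers.set(type, parser);
    return this;
  }

  // Look up the parser for the type and return its segments.
  adaptCAT(type: string, input: string): Segment[] {
    const parser = this.parsers.get(type);
    if (!parser) {
      throw new Error(`No parser registered for type "${type}"`);
    }
    return parser.parse(input);
  }
}

const catRegistry = new CatRegistry().apply("txt", {
  parse: input => input.split("\n").map((text, i) => ({ id: String(i), text })),
});
const catSegments = catRegistry.adaptCAT("txt", "first line\nsecond line");
```

Because the feature layer only depends on the `DocParser` shape, callers can plug in a parser for a format the SDK does not ship with.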

Adapter Layer

export const type2bridge = () => {
    // Convert parser output to unified DSL
};

export const bridge2type = () => {
    // Convert unified DSL back to document format
};

Unified DSL Example

{
    blockId: string,
    elements: {
        type: string, // text or other
        textRun?: { style: Object, content: string },
        location: { start: number, end: number }
    }[]
}
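Written as TypeScript interfaces, the DSL shape above might look like the following sketch. The interface names, the `"image"` element type, and the reading of `location` as character offsets are assumptions made for illustration.

```typescript
interface TextRun { style: Record<string, unknown>; content: string }

interface BridgeElement {
  type: string;                              // "text" or another element type
  textRun?: TextRun;                         // present only for text elements
  location: { start: number; end: number };  // assumed: offsets within the block
}

interface BridgeBlock { blockId: string; elements: BridgeElement[] }

// A sample block with two text runs and one non-text element.
const sampleBlock: BridgeBlock = {
  blockId: "block-1",
  elements: [
    { type: "text", textRun: { style: { bold: true }, content: "Hello" }, location: { start: 0, end: 5 } },
    { type: "image", location: { start: 5, end: 6 } },
    { type: "text", textRun: { style: {}, content: " world" }, location: { start: 6, end: 12 } },
  ],
};

// Collect the translatable text of a block, skipping non-text elements.
function blockText(block: BridgeBlock): string {
  return block.elements.map(element => element.textRun?.content ?? "").join("");
}

const sampleText = blockText(sampleBlock);
```

Here `sampleText` is `"Hello world"`: the image element contributes nothing because it has no `textRun`.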

3.3 Decorator Usage and SDK Data Visualization

Decorators enhance class methods, providing additional capabilities such as tracing and proxy handling.

Example parser with decorators:

class TxtParser {
    @txt2bridge
    async parse(token: string | Buffer) {
        const buffer = path2buffer(token);
        return [buffer.toString()];
    }

    @bridge2txt
    async restore(blocks: string[], _raw: string) {
        return { buffer: Buffer.from(blocks.join('')) }; // join('') avoids the default "," separator
    }
}

Decorator implementations use proxy functions to intercept method calls:

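export function proxyForReturnValue<THIS, T, C>(
    proxy: (this: THIS, data: T, config?: C) => Bridge.Data | Promise<Bridge.Data>
) {
    // The descriptor's type argument is not recoverable from the source; `any` is a placeholder.
    return function (target: any, propertyName: string, descriptor: TypedPropertyDescriptor<any>) {
        const method = descriptor.value;
        // Run the original parser method, then convert its result into the unified DSL.
        descriptor.value = async function (this: THIS, token: string, config?: C) {
            const data = await method.call(this, token, config);
            return await proxy.call(this, data, config);
        };
    };
}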

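export function proxyForParam<THIS, T, C>(
    proxy: (this: THIS, blocks: Bridge.Block[], raw: string | Buffer, config?: C) => T | Promise<T>
) {
    // The descriptor's type argument is not recoverable from the source; `any` is a placeholder.
    return function (target: any, propertyName: string, descriptor: TypedPropertyDescriptor<any>) {
        const method = descriptor.value;
        descriptor.value = async function (this: THIS, blocks: Bridge.Block[], raw: string | Buffer, config?: C) {
            // Convert the unified DSL blocks into the parser's native format before calling the original method.
            const converted = await proxy.call(this, blocks, raw, config);
            return await method.call(this, converted, raw, config);
        };
    };
}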
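The interception mechanism can be seen in isolation with a simplified, synchronous sketch. `mapReturnValue`, `LineParser`, and `Block` are hypothetical names, and the decorator is applied by hand through the property descriptor so the example runs without the `experimentalDecorators` compiler flag.

```typescript
type Block = { text: string };

// Simplified, synchronous counterpart of a return-value proxy:
// it replaces the method with a wrapper that converts the original result.
function mapReturnValue<T, R>(convert: (data: T) => R) {
  return function (_target: object, _name: string, descriptor: PropertyDescriptor): void {
    const original = descriptor.value as (...args: unknown[]) => T;
    descriptor.value = function (this: unknown, ...args: unknown[]): R {
      return convert(original.apply(this, args));
    };
  };
}

class LineParser {
  parse(raw: string): string[] {
    return raw.split("\n");
  }
}

// Do by hand what `@mapReturnValue(...)` above `parse` would do.
const parseDescriptor = Object.getOwnPropertyDescriptor(LineParser.prototype, "parse")!;
mapReturnValue<string[], Block[]>(lines => lines.map(text => ({ text })))(
  LineParser.prototype,
  "parse",
  parseDescriptor
);
Object.defineProperty(LineParser.prototype, "parse", parseDescriptor);

// The static type still says string[]; at runtime the wrapper returns Block[].
const parsedBlocks = new LineParser().parse("a\nb") as unknown as Block[];
```

The parser body stays unaware of the DSL: it returns its native lines, and the wrapper installed on the descriptor performs the conversion.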

A trace decorator collects execution metrics and reports them via FaaS:

export function trace(docType: DocType, operateType: 'parse' | 'restore') {
    // The descriptor's type argument is not recoverable from the source; `any` is a placeholder.
    return function (target: any, propertyName: string, descriptor: TypedPropertyDescriptor<any>) {
        const method = descriptor.value;
        descriptor.value = async function (this: any, ...args: any[]) {
            try {
                const res = await method.apply(this, args);
                // Collect and report metrics (tagged with docType and operateType) here
                return res;
            } catch (error) {
                // Report error via FaaS, then rethrow so callers still see the failure
                throw error;
            }
        };
    };
}
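What the metric collection could look like is sketched below as a plain higher-order function rather than a decorator. `withTrace`, `TraceRecord`, and the in-memory `traceLog` are hypothetical; `traceLog` stands in for the FaaS reporting call, whose API is not shown in the article.

```typescript
interface TraceRecord { name: string; ok: boolean; durationMs: number }

// In-memory sink used here in place of a real reporting endpoint.
const traceLog: TraceRecord[] = [];

// Wrap an async function so each call records its duration and outcome.
function withTrace<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => Promise<R>
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const startedAt = Date.now();
    try {
      const result = await fn(...args);
      traceLog.push({ name, ok: true, durationMs: Date.now() - startedAt });
      return result;
    } catch (error) {
      traceLog.push({ name, ok: false, durationMs: Date.now() - startedAt });
      throw error; // tracing must not swallow the failure
    }
  };
}

const tracedParse = withTrace("txt.parse", async (raw: string) => raw.split("\n"));
const tracedRestore = withTrace(
  "txt.restore",
  async (_blocks: string[]): Promise<string> => {
    throw new Error("restore failed");
  }
);
```

Calling `tracedParse` appends a record with `ok: true`, while a rejected `tracedRestore` call appends one with `ok: false` and still rejects for the caller.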

3.4 Larkdocx Batch Update Implementation

The Feishu document API enforces a QPS limit (≤3). To stay within it, restoration uses a batch‑update mechanism with a custom concurrency limiter and retry logic.

async function concurrentFetch(poolLimit, iterable, iteratorFn) {
    const result = [];
    const retry = [];
    const executing = new Set();
    for (const item of iterable) {
        const p = Promise.resolve().then(() => iteratorFn(item));
        result.push(p);
        executing.add(p);
        const clean = () => executing.delete(p);
        p.then(r => {
            // Always free the slot once the request settles, even when a retry is queued.
            clean();
            // A truthy result is a downgraded payload that must be sent again.
            if (r) retry.push(Promise.resolve().then(() => iteratorFn(r)));
        }).catch(clean);
        // Wait for a free slot before starting the next request.
        if (executing.size >= poolLimit) {
            await Promise.race(executing);
        }
    }
    return Promise.all(result).then(() => Promise.all(retry));
}

// Excerpt from a class method: `this` is the owning parser instance.
// `chunk` is assumed to be lodash's `chunk`; MAX_BATCH_SIZE is defined elsewhere.
async function batchUpdateBlock(blocks, documentId) {
    const blockUpdates = blocks.map(block => this.getUpdateForBlock(block)).filter(Boolean);
    const chunkedBlockUpdates = chunk(blockUpdates, MAX_BATCH_SIZE);
    // Returns a downgraded payload to retry when the batch is rejected as forbidden, otherwise null.
    const batchUpdateWithErrorHandle = async requests => {
        const resp = await this.lark.batchUpdateBlockForDocx(documentId, { requests });
        if (resp?.data?.code === LARK_BLOCK_UPDATE_ERROR_CODE.ForBidden) {
            return this.downgradeUpdatesForBlocks(requests);
        }
        return null;
    };
    await concurrentFetch(3, chunkedBlockUpdates, batchUpdateWithErrorHandle);
}
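The effect of a concurrency cap can be checked with simulated requests. This sketch uses a worker-pool style loop instead of the `Promise.race` approach above; `runWithLimit`, `fakeRequest`, and `limitDemo` are hypothetical names, and the 10 ms timer stands in for a network call.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function runWithLimit<T, R>(
  limit: number,
  items: T[],
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, lane));
  return results;
}

// Simulated request that records how many calls overlap.
let inFlight = 0;
let maxInFlight = 0;
async function fakeRequest(id: number): Promise<number> {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise<void>(resolve => setTimeout(resolve, 10));
  inFlight--;
  return id * 2;
}

async function limitDemo(): Promise<{ results: number[]; maxInFlight: number }> {
  const results = await runWithLimit(3, [1, 2, 3, 4, 5, 6, 7], fakeRequest);
  return { results, maxInFlight };
}
```

With a limit of 3, `maxInFlight` never exceeds 3 and results keep their input order. A cap on in-flight requests is not the same as a cap on requests per second: if each request finishes in well under a second, more than three can still start within one second.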

Summary

The refactoring started from the problems of the existing document parsing SDK and from the observation that parsing and restoration follow the same abstract process for every format. This led to a three‑layer architecture (Parser → Adapter → Feature) implemented with the Decorator, Adapter, and Proxy patterns, plus a custom concurrency‑controlled batch update mechanism. The result is a more maintainable and extensible SDK.
