Zuoyebang’s NLP Platforms: Boosting Online Education with AI
In this interview, Zuoyebang’s NLP lead explains how the company built self‑developed platforms like IQC and FTP to automate text quality inspection and intelligent labeling, outlines their architecture, shares practical deep‑learning applications such as translation and grammar correction, and discusses future research directions in large‑scale multi‑label classification, few‑shot learning, and multimodal models.
Zuoyebang’s technical team discusses the rapid development of NLP in online education and how they have built self‑developed platforms such as the Intelligent Quality Control (IQC) platform and the Text Intelligent Annotation Platform (FTP) to improve user experience and operational efficiency.
InfoQ: What business background led Zuoyebang to adopt NLP?
Jiang Hongfei: We generate massive volumes of text every day, and that text contains valuable information. Manual analysis, however, was inconsistent, slow, and unable to cover all of the data. As a result, trending issues were missed and user‑experience improvements were poorly targeted. We adopted NLP to address these problems, even though our supporting infrastructure was limited early on.
InfoQ: What NLP technologies and platforms have you deployed?
Jiang Hongfei: We still use statistical methods where they are cost‑effective. As the pre‑train/fine‑tune paradigm has matured for industrial use, many of our tasks now rely on BERT‑based models. To keep these models efficient, we design pre‑filtering techniques tailored to each scenario.
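The interview does not describe how these pre‑filters work. The following minimal sketch assumes that a cheap keyword screen sits in front of an expensive model, so that only texts matching hypothetical trigger patterns reach the classifier. `TRIGGERS`, `prefilter_then_classify`, and the stand‑in model are illustrative names, not Zuoyebang's components.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Hypothetical trigger patterns for one scenario (e.g. refund complaints).
TRIGGERS = re.compile(r"refund|cancel|complain", re.IGNORECASE)


def prefilter_then_classify(
    texts: Iterable[str],
    classify: Callable[[str], str],
    default_label: str = "irrelevant",
) -> List[Tuple[str, str]]:
    """Run the costly classifier only on texts that pass a cheap regex screen."""
    results: List[Tuple[str, str]] = []
    for text in texts:
        if TRIGGERS.search(text):
            # Candidate text: pay for the expensive model (e.g. a fine-tuned BERT).
            results.append((text, classify(text)))
        else:
            # No trigger word: assign the default label without calling the model.
            results.append((text, default_label))
    return results


if __name__ == "__main__":
    calls: List[str] = []

    def fake_model(text: str) -> str:
        calls.append(text)  # record each (expensive) model invocation
        return "complaint"

    out = prefilter_then_classify(["I want a refund", "Great lesson today"], fake_model)
    print(out)         # [('I want a refund', 'complaint'), ('Great lesson today', 'irrelevant')]
    print(len(calls))  # 1
```

This design trades recall for cost. Any text the screen misses never reaches the model, so trigger lists would need tuning per scenario.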
Two representative platforms have been built:
Intelligent Quality Control Platform (IQC): a text‑matching platform that can be configured quickly and is not limited to quality inspection.
Text Intelligent Annotation Platform (FTP): a human‑machine collaborative labeling platform supporting active learning, self‑learning, multiple sampling strategies, and pre‑trained models.
InfoQ: What are the main modules of IQC?
Jiang Hongfei: IQC consists of five modules:
Data Acquisition: integrates pending‑inspection data from various business lines into a unified representation.
Strategy Configuration: interactive module for quality inspectors to define rules.
Strategy Parsing and Matching: matches data against configured rules.
Testing Module: validates rules using existing test sets and sampled incremental data.
Intelligent Strategy Optimization: recommends synonyms during rule configuration and automatically updates strategies based on human‑verified data.
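The interview names IQC's modules but not its rule language. The following minimal sketch assumes that a rule is a list of required keywords plus a list of exclusion keywords, and uses a synonym table as a stand‑in for the optimization module's recommendations. It shows how strategy parsing, matching, and the testing module's precision/recall check might fit together. `Rule`, `SYNONYMS`, `run_inspection`, and `precision_recall` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical synonym table, standing in for the optimization module's suggestions.
SYNONYMS: Dict[str, Set[str]] = {
    "teacher": {"tutor", "instructor"},
    "late": {"delayed", "tardy"},
}


def expand(term: str) -> Set[str]:
    """Return the term together with its recommended synonyms."""
    return {term} | SYNONYMS.get(term, set())


def _hit(term: str, text: str) -> bool:
    # Naive substring match against the term or any of its synonyms.
    return any(s in text for s in expand(term))


@dataclass
class Rule:
    name: str
    all_of: List[str]                                  # every concept must appear
    none_of: List[str] = field(default_factory=list)   # any hit vetoes the rule

    def matches(self, text: str) -> bool:
        lowered = text.lower()
        return all(_hit(t, lowered) for t in self.all_of) and not any(
            _hit(t, lowered) for t in self.none_of
        )


def run_inspection(text: str, rules: List[Rule]) -> List[str]:
    """Strategy matching: names of every rule the text triggers."""
    return [r.name for r in rules if r.matches(text)]


def precision_recall(rule: Rule, labeled: List[Tuple[str, bool]]) -> Tuple[float, float]:
    """Testing module: score one rule against a labeled test set."""
    tp = sum(1 for text, y in labeled if y and rule.matches(text))
    fp = sum(1 for text, y in labeled if not y and rule.matches(text))
    fn = sum(1 for text, y in labeled if y and not rule.matches(text))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, `Rule("late_teacher", ["teacher", "late"], ["joke"])` fires on "The tutor was delayed again" through the synonym table. Substring matching is deliberately naive here ("late" also matches "translate"); a production matcher would more likely work on tokens.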
FTP’s architecture includes the following modules:
Data Acquisition: fetches and filters candidate labeling data.
Pre‑processing: tokenization, sentence splitting, generalization, denoising, clustering, etc.
Data Sampling: multiple strategies to improve labeling efficiency.
Model Training: trains models on labeled data for predicting unlabeled data.
Data Annotation: human‑in‑the‑loop labeling interface.
Strategy Recommendation: suggests the best strategy for the next labeling iteration.
Data Management: reuses historical data and manages label taxonomies.
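FTP is said to support active learning and multiple sampling strategies, but the interview does not list them. Uncertainty sampling is a common family in active learning. The following sketch assumes that the current model outputs a class‑probability vector for each unlabeled item. It ranks items by one of three standard uncertainty scores and returns the top k for human labeling. The function names and the `STRATEGIES` table are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

# Each score is higher when the model is less certain about an item.


def least_confidence(p: Sequence[float]) -> float:
    return 1.0 - max(p)


def margin(p: Sequence[float]) -> float:
    # Assumes at least two classes; a small gap between the top two is uncertain.
    first, second = sorted(p, reverse=True)[:2]
    return 1.0 - (first - second)


def entropy(p: Sequence[float]) -> float:
    return -sum(x * math.log(x) for x in p if x > 0)


STRATEGIES: Dict[str, Callable[[Sequence[float]], float]] = {
    "least_confidence": least_confidence,
    "margin": margin,
    "entropy": entropy,
}


def select_for_labeling(
    probs: Dict[str, Sequence[float]], k: int, strategy: str = "entropy"
) -> List[str]:
    """Return the ids of the k most uncertain unlabeled items."""
    score = STRATEGIES[strategy]
    ranked = sorted(probs, key=lambda item_id: score(probs[item_id]), reverse=True)
    return ranked[:k]
```

In a full loop, the selected items would go to the annotation interface, be added to the training set, and the model would be retrained before the next round.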
Both platforms are fully designed and developed by Zuoyebang’s product‑research team.
InfoQ: What are the best practices of applying deep‑learning‑based NLP at Zuoyebang?
Jiang Hongfei: Key applications include:
Intelligent translation for word lookup, sentence recommendation, and passage translation.
English grammar error correction for student writing.
Intelligent question analysis that assesses the knowledge points, abilities, and difficulty level of each question.
Text structuring and tagging for user behavior research, learning analytics, and satisfaction surveys.
These applications reinforce one another, steadily widening the range of NLP use cases.
InfoQ: What experiences can the industry learn from Zuoyebang’s NLP exploration?
Jiang Hongfei: The challenges include long service chains, heavy human involvement, and large volumes of unstructured data. In the early stages, data was incomplete and arrived late, so we had to drive the technical design ourselves and collaborate closely with upstream teams.
Education users vary widely by subject, region, curriculum, and learning pace. Our solutions therefore have to be reusable and easy to migrate at every stage: data acquisition, labeling, model training, deployment, and changes to task definitions.
Because NLP is tightly coupled with business, algorithm engineers must deeply understand the domain, maintain close communication with educators, and proactively propose technical improvements.
InfoQ: What NLP directions will Zuoyebang explore in 2021?
Jiang Hongfei: Planned research areas include:
Large‑scale multi‑label classification with thousands of categories and severe class imbalance.
Zero‑/few‑shot learning for rapidly emerging business scenarios.
Domain adaptation of pre‑trained models for diverse tasks.
Fine‑grained multi‑dimensional user preference analysis.
Multimodal learning combining text with speech and image features.
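The first item mentions severe imbalance across thousands of labels. One widely used remedy, not necessarily the one Zuoyebang will adopt, is to up‑weight the positive examples of rare labels in a per‑label binary cross‑entropy. The following sketch assumes a 0/1 label matrix and raw logits. It mirrors the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss` but is written in plain Python, and `pos_weights` and `weighted_bce` are hypothetical names.

```python
import math
from typing import List, Sequence


def pos_weights(labels: List[Sequence[int]]) -> List[float]:
    """Per-label weight = #negatives / #positives, so rare labels count more."""
    n = len(labels)
    weights: List[float] = []
    for j in range(len(labels[0])):
        pos = sum(row[j] for row in labels)
        # Labels with no positive examples are left unweighted (1.0).
        weights.append((n - pos) / pos if pos else 1.0)
    return weights


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce(
    logits: Sequence[float], targets: Sequence[int], weights: Sequence[float]
) -> float:
    """Mean per-label binary cross-entropy with positive targets up-weighted.

    Uses -log(sigmoid(z)) = softplus(-z) and -log(1 - sigmoid(z)) = softplus(z),
    which avoids overflow for large-magnitude logits.
    """
    total = 0.0
    for z, y, w in zip(logits, targets, weights):
        total += w * y * softplus(-z) + (1 - y) * softplus(z)
    return total / len(logits)
```

With thousands of labels, such weights would typically be computed once on the training set and passed to the framework's loss, rather than recomputed for every batch.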
