Evolution and Design of Data Annotation Scheduling Systems at Baidu Intelligent Cloud

This article traces the development of data annotation at Baidu Intelligent Cloud, from its early manual stages to a mature, fully automated scheduling system, and details the key elements, challenges, and architectural solutions that enable scalable, high‑quality AI data pipelines.

Baidu Intelligent Testing

May 5, 2019

Introduction

Data is the foundation of AI, and Baidu Intelligent Cloud’s Data Crowdsourcing Platform, founded in 2012, uses an efficient crowdsourcing model to collect raw data, process it, and deliver standardized, structured datasets for training AI models.

Stages of Data Annotation Development

Stage 1 – Germination : In the early years, the platform mainly supported internal Baidu product evaluations and model training, providing a simple platform for annotators to select and label data manually.

Stage 2 – Growth : As investment in AI grew, demand for data annotation rose, and over roughly three years the platform accumulated its own methodologies and technologies.

Stage 3 – Explosion : In September 2016, Baidu’s CEO announced AI as the core of the company, triggering a massive surge in demand for low‑level annotation data across autonomous driving, vision, and speech.

Stage 4 – Maturity : By 2018, AI financing exceeded a trillion RMB, and the annotation market reached 100‑300 billion RMB, marking a mature phase where data quality became critical.

Key Elements of Data Annotation

Annotators : The primary productivity factor; improving their ability and efficiency is essential.

Data : Effective data ingestion, processing, and quality assurance are core challenges.

Annotation Tools : Provide labeling rules and interaction methods, crucial for empowering annotators.

Scheduling System Evolution

Germination Phase : Simple manual data distribution; no dedicated scheduling system required.

Growth Phase : Rising data volume demanded automated task dispatch; the early scheduling system emerged to automate data allocation.

Explosion Phase : Complex annotation types (e.g., autonomous driving) and higher quality requirements led to challenges in data quality management and large‑scale annotator coordination.

Maturity Phase : As the business scales up, it demands extremely high data quality; solutions include refined data flow control and hierarchical annotator management (e.g., guild mechanisms).

Current Scheduling System Goals and Implementation

Generality : Supports universal scheduling objects—single data items, tasks (aggregations of items), and batches (aggregations of tasks).

Business Model Abstraction : Provides a unified representation of data flow, decision inputs, computation, and outputs.

Flow Strategy Generality : Input can come from real‑time databases or offline warehouses; decisions are computed from that input and applied as configurable routing.
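To make these three goals concrete, here is a minimal Python sketch of how items, tasks, and batches might nest, and how a pluggable strategy could turn a decision input into a routing result. It is an illustration under stated assumptions, not the platform's actual design: the class names, the quality_score field, the threshold rule, and the "deliver" and "review" stage names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List


@dataclass
class Item:
    """A single piece of data to be annotated."""
    item_id: str
    quality_score: float = 0.0  # hypothetical decision input


@dataclass
class Task:
    """A task is an aggregation of items."""
    task_id: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Batch:
    """A batch is an aggregation of tasks."""
    batch_id: str
    tasks: List[Task] = field(default_factory=list)

    def iter_items(self) -> Iterator[Item]:
        # Walk every item in every task of this batch.
        for task in self.tasks:
            yield from task.items


# A strategy maps one item to the name of the stage it should flow to next.
Strategy = Callable[[Item], str]


def threshold_strategy(threshold: float) -> Strategy:
    """Hypothetical rule: deliver items at or above the threshold, review the rest."""
    def decide(item: Item) -> str:
        return "deliver" if item.quality_score >= threshold else "review"
    return decide


def route_batch(batch: Batch, strategy: Strategy) -> Dict[str, List[str]]:
    """Apply the strategy to every item and group item ids by destination stage."""
    routes: Dict[str, List[str]] = {}
    for item in batch.iter_items():
        routes.setdefault(strategy(item), []).append(item.item_id)
    return routes


batch = Batch("b1", [
    Task("t1", [Item("i1", 0.95), Item("i2", 0.40)]),
    Task("t2", [Item("i3", 0.80)]),
])
routes = route_batch(batch, threshold_strategy(0.8))
print(routes)
```

Because the strategy is an ordinary callable, the routing rule can be swapped or reconfigured without touching the data model, which is one plausible reading of "configurable routing".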

High Availability

Modules target 99.9% request correctness and completion of 80% of decisions within 60 seconds. They also support hot‑loading of strategy updates and SLA‑based monitoring with self‑recovery mechanisms.
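The two numeric targets can be expressed as a simple check over a window of recorded decisions. The sketch below is only an assumption about how such monitoring might be computed: the Decision record, the sla_report function, and the sample data are hypothetical, and only the thresholds (99.9% correctness, 80% within 60 seconds) come from the article. Hot‑loading and self‑recovery are not covered here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Decision:
    """Hypothetical record of one scheduling decision."""
    correct: bool
    latency_seconds: float


def sla_report(
    decisions: List[Decision],
    min_correct: float = 0.999,   # 99.9% request correctness
    max_latency: float = 60.0,    # seconds
    min_within: float = 0.80,     # 80% of decisions within max_latency
) -> Dict[str, object]:
    """Compute both SLA ratios for a window and report whether targets are met."""
    if not decisions:
        raise ValueError("need at least one decision to evaluate the SLA")
    total = len(decisions)
    correct_rate = sum(d.correct for d in decisions) / total
    within_rate = sum(d.latency_seconds <= max_latency for d in decisions) / total
    return {
        "correct_rate": correct_rate,
        "within_rate": within_rate,
        "healthy": correct_rate >= min_correct and within_rate >= min_within,
    }


# Made-up sample window: all decisions correct, 85% of them within 60 seconds.
sample = [Decision(True, 10.0)] * 850 + [Decision(True, 90.0)] * 150
report = sla_report(sample)

# Made-up degraded window: 2 incorrect decisions out of 1000 (99.8% correct).
degraded = [Decision(False, 10.0)] * 2 + [Decision(True, 10.0)] * 998
degraded_report = sla_report(degraded)

print(report["healthy"], degraded_report["healthy"])
```

In a monitoring loop, an unhealthy report would be the signal that triggers alerting or a recovery action; how that recovery works is outside what the article describes.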

Conclusion

Amid rapid AI development, annotation scheduling has transitioned from manual to fully automated processes, emphasizing generality and scalability to meet complex business needs; future work will focus on micro‑scheduling optimizations to further improve data delivery efficiency.
