Stella Data Annotation Platform: Design, Architecture, and AI‑Assisted Labeling
The article details the design and implementation of the Stella data annotation SaaS platform at 58.com, covering its background, evolution, modular architecture, annotation capabilities across text, image, audio, and video, AI‑assisted labeling, storage solutions, quality and efficiency management, as well as localization and licensing considerations.
Stella is a self‑developed data‑annotation SaaS platform built by 58.com to support algorithm model incubation and various business line labeling needs. It includes a task center, data management, configuration center, and annotation center, focusing on GUI usability, labeling quality, and efficiency. The platform currently supports 24 annotation methods and has generated over five million labeled samples for the group.
In today's AI‑driven environment, enterprises are racing to build machine‑learning platforms, all of which require massive volumes of high‑quality labeled data. Supervised and semi‑supervised learning depend heavily on both the quantity and the quality of training samples, making efficient labeling a core challenge.
The early labeling workflow relied on disparate tools such as Excel, LabelMe, and labelImg, leading to inconsistent annotation protocols, high labor costs, and poor quality. This motivated the creation of a unified annotation platform.
The platform’s core model treats annotation as a transformation from raw material (素材) to labeled samples (样本). The workflow consists of three pillars: material library, annotation engine, and sample library, all orchestrated by a task workflow engine.
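The material‑to‑sample transformation can be sketched in a few lines. This is an illustrative model only: the class and field names below (`Material`, `Sample`, `annotate`, the status values) are assumptions for exposition, not the platform's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Material:
    """A raw, unlabeled item drawn from the material library."""
    material_id: str
    url: str
    media_type: str  # e.g. "text" / "image" / "audio" / "video"

@dataclass
class Sample:
    """A labeled output destined for the sample library."""
    material: Material
    labels: list = field(default_factory=list)
    status: str = "pending"  # pending -> annotated -> reviewed

def annotate(material: Material, labels: list) -> Sample:
    # In the real platform the task workflow engine routes this through
    # assignment, annotation, and review stages; this sketch captures only
    # the core material -> sample transformation.
    return Sample(material=material, labels=labels, status="annotated")
```

The point of the three‑pillar split is that the material library and sample library are stable stores, while the annotation engine in between is the swappable, per‑task part.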
Key design goals include extensible annotation capabilities, a configurable workbench, low‑cost operation, and high labeling accuracy. Annotation types are categorized by modality (text, image, audio, video, point cloud) and task (classification, detection, segmentation, entity labeling, relationship labeling, tracking, etc.).
To improve usability, the workbench is fully configurable: users can assemble tools (point, line, rectangle, polygon), media controls (zoom, crop, playback), label widgets (entity, relationship, dropdown, checkbox), and auxiliary pre‑labeling tools without system upgrades.
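A workbench assembled from such building blocks might be described by a configuration like the following. The field names and values here are hypothetical, chosen to mirror the tool/media‑control/label‑widget categories above, and are not the platform's real schema.

```python
# Hypothetical workbench configuration for an image-detection task.
workbench_config = {
    "task_type": "image_detection",
    "tools": ["rectangle", "polygon"],        # drawing tools enabled
    "media_controls": ["zoom", "crop"],       # viewer controls
    "label_widgets": [
        {"type": "dropdown", "name": "category",
         "options": ["car", "person", "sign"]},
        {"type": "checkbox", "name": "occluded"},
    ],
    "prelabel": {"enabled": True, "model": "detector-v1"},  # auxiliary pre-labeling
}

def validate_config(cfg: dict) -> bool:
    """Minimal sanity check before loading a workbench."""
    required = {"task_type", "tools", "label_widgets"}
    return required.issubset(cfg)
```

Because a new task type is just a new configuration, teams can stand up a labeling workbench without a code release, which is what "without system upgrades" amounts to in practice.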
AI‑assisted labeling is introduced to reduce manual effort. Two approaches are described: fully automatic labeling using unsupervised models (e.g., clustering) for coarse tasks, and semi‑automatic labeling where a small manually labeled subset trains supervised or semi‑supervised models to label the remaining data. Active learning strategies (low‑confidence, margin, entropy sampling) are highlighted for selecting high‑value samples for manual annotation.
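The three uncertainty‑sampling criteria named above can be sketched directly from model output probabilities. This is a generic sketch of the standard formulas, not code from the platform; the example probabilities are invented.

```python
import numpy as np

def least_confidence(probs: np.ndarray) -> np.ndarray:
    # 1 - max predicted probability; higher score = more uncertain.
    return 1.0 - probs.max(axis=1)

def margin(probs: np.ndarray) -> np.ndarray:
    # Gap between the top two class probabilities; smaller = more uncertain.
    part = np.sort(probs, axis=1)
    return part[:, -1] - part[:, -2]

def entropy(probs: np.ndarray) -> np.ndarray:
    # Shannon entropy of the predicted distribution; higher = more uncertain.
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

probs = np.array([
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # ambiguous prediction
])
# Under all three criteria, the second sample is the higher-value
# candidate to route to a human annotator.
```

Samples that score as most uncertain are sent to annotators first, so each unit of manual effort buys the largest model improvement.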
Business‑driven labeling, such as using user interactions with image captchas to generate labeled data, is also discussed as a way to lower labeling costs.
Data storage is designed for large‑scale, multi‑modal samples. Object storage (WOS) handles media files, column‑family databases (e.g., HBase) store metadata and labels, message queues enable asynchronous processing, and a data warehouse (Hive) supports OLAP analytics. A sample annotation schema (JSON) is provided:
{
  "pattern": "annotation_type",
  "basicinfo": {
    "name": "sample_file_name",
    "url": "sample_download_url",
    "mediatype": "text/image/audio/video/behavior/pointcloud",
    "audittype": "machine/human"
  },
  "metadata": {
    "format": "png",
    "width": 600,
    "height": 450,
    "depth": 24,
    "resolution": "600*450",
    "size": 97616
  },
  "result": [
    {
      "ptype": "rectangle",
      "postion": [[45, 97], [136, 26]],
      "contenttype": 1,
      "audittype": "machine/human",
      "label": "car",
      "rate": 0.89,
      "evidence": {"content": "selected_content"}
    },
    {
      "ptype": "rectangle",
      "postion": [[45, 97], [136, 26]],
      "contenttype": 1,
      "audittype": "machine/human",
      "label": 123,
      "rate": 0.89,
      "evidence": {"content": "selected_content"}
    }
  ],
  "version": "v1"
}

Quality management includes real‑time and batch inspection, multi‑round reviews, and metrics such as accuracy, rejection rate, and reviewer performance. Human‑efficiency management tracks task volume, annotation time, and quality indicators via a data warehouse and BI dashboards.
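As a minimal sketch of how such review metrics could be computed per batch: the record shape and function below are illustrative assumptions, not the platform's implementation, but the accuracy and rejection‑rate definitions follow the article.

```python
def batch_quality(reviews: list) -> dict:
    """Compute review metrics for one inspection round.

    reviews: list of dicts like {"annotator": "a1", "passed": True},
    one entry per reviewed annotation (shape assumed for illustration).
    """
    total = len(reviews)
    if total == 0:
        return {"accuracy": 0.0, "rejection_rate": 0.0}
    passed = sum(1 for r in reviews if r["passed"])
    accuracy = passed / total
    return {"accuracy": accuracy, "rejection_rate": 1.0 - accuracy}

sample_round = [
    {"annotator": "a1", "passed": True},
    {"annotator": "a2", "passed": False},
    {"annotator": "a1", "passed": True},
    {"annotator": "a2", "passed": True},
]
# batch_quality(sample_round) -> accuracy 0.75, rejection rate 0.25
```

Aggregating the same records by annotator rather than by batch yields the reviewer‑performance and human‑efficiency views that feed the BI dashboards.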
For customers requiring on‑premise deployment, the platform offers localization and privatization. The delivery process covers pre‑delivery requirement analysis, contract signing, deployment, testing, and post‑delivery maintenance. License authentication and JAR encryption (using XJar or JVMTI‑based custom solutions) protect intellectual property, while compliance with third‑party open‑source licenses is ensured.
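License authentication for an on‑premise deployment typically boils down to verifying a signed, expiring token at startup. The sketch below uses a symmetric HMAC for brevity; a real vendor would more likely use asymmetric signing so the verification key can ship with the product, and every name here (the secret, claims, and functions) is a hypothetical illustration, not the platform's scheme.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"vendor-signing-key"  # hypothetical; keep out of shipped artifacts

def issue_license(customer: str, expires_at: int) -> str:
    """Vendor side: sign customer + expiry claims into a token."""
    payload = base64.urlsafe_b64encode(
        json.dumps({"customer": customer, "exp": expires_at}).encode())
    sig = base64.urlsafe_b64encode(
        hmac.new(SECRET, payload, hashlib.sha256).digest())
    return (payload + b"." + sig).decode()

def verify_license(token: str, now: int) -> bool:
    """Deployed side: check the signature, then the expiry claim."""
    payload_b64, sig_b64 = token.encode().split(b".")
    expected = base64.urlsafe_b64encode(
        hmac.new(SECRET, payload_b64, hashlib.sha256).digest())
    if not hmac.compare_digest(expected, sig_b64):
        return False  # tampered or forged token
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return now < claims["exp"]
```

JAR encryption via XJar or a JVMTI agent then protects the verification logic itself from being patched out of the delivered binaries.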
The article concludes with a summary emphasizing that platform success depends on user adoption, iterative feedback, and continuous optimization rather than merely technical features. It also outlines future directions such as crowdsourcing and integration with industry‑standard datasets.
Authors: Hou Zhiming, Li Gangqiang, Xing Erkang – Senior Backend Engineers at 58.com; Hu Beichen – Product Lead of the Stella platform.
Platform Access: https://stella.58.com
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.