Tagged articles
14 articles
Machine Heart
Apr 19, 2026 · Artificial Intelligence

How Google Turns Your CAPTCHA Clicks into Training Data for the Next Generation of AI

The article explains how YouTube’s AI‑video rating and Google’s reCAPTCHA system covertly collect billions of user interactions each day, converting them into labeled data that fuels Google’s AI systems, including Veo, Maps and Waymo, effectively turning routine security checks into a massive, unpaid AI training workforce.

AI training · Computer Vision · Google
0 likes · 7 min read
AI Engineering
Mar 16, 2026 · Artificial Intelligence

Does Synthetic Data Have a Future? Evidence‑Based Conclusions

A detailed investigation of two public programming‑training datasets shows that AI‑only synthetic data suffers from severe quality issues, and even AI‑plus‑expert review yields only about ten percent usable examples, proving that high‑quality training data still requires domain experts and rigorous quality‑control processes.

AI training · Model Evaluation · data labeling
0 likes · 16 min read
PMTalk Product Manager Community
Dec 15, 2025 · Product Management

How an AI Product Director Turns an Idea into a Market‑Ready AI Product

The article walks through a six‑step framework—defining the product, setting value‑based metrics, acquiring and labeling data, choosing and evaluating models, building an MVP, and creating a growth loop—to guide AI product managers from concept to launch while emphasizing practical trade‑offs and real‑world examples.

AI product · MVP · Model Selection
0 likes · 10 min read
Big Data Tech Team
Sep 17, 2025 · Big Data

How to Build a Scalable Tag System for Recommendation Engines

This article explains why a robust tag system is essential for recommendation and mining strategies, outlines the hierarchy of entity, concept, and theme tags, and provides practical principles, architecture, and step‑by‑step methods for constructing and managing tags in large‑scale data platforms.

Big Data · Data Architecture · data labeling
0 likes · 14 min read
DataFunTalk
Jul 15, 2025 · Artificial Intelligence

Inside Scale AI: How a Data‑Labeling Startup Became a $29 B AI Powerhouse

This investigative article traces Scale AI’s evolution from an MIT dropout’s data‑annotation startup to a $29 billion AI infrastructure leader, profiling founder Alexandr Wang and detailing the company’s core products, government contracts, competitive advantages, and strategic shift toward defense‑focused AI solutions.

AI Infrastructure · Artificial Intelligence · Scale AI
0 likes · 15 min read
Data Thinking Notes
Jun 11, 2025 · Artificial Intelligence

How RAG‑Powered AI Boosted Government Data Labeling Efficiency by 5×

This case study details how a government‑focused AI system using retrieval‑augmented generation (RAG) and advanced preprocessing algorithms increased data labeling speed by up to five times, raised accuracy above 95%, and produced high‑quality enterprise, spatial, and economic datasets.

Government · RAG · ai
0 likes · 5 min read
Data Thinking Notes
Mar 6, 2024 · Product Management

Mastering Tag Systems: Design, Build, and Optimize Your Enterprise Data Labels

This article explains how enterprises can construct, design, and manage comprehensive tag systems—from foundational concepts and core design principles to construction workflows, evaluation methods, industry case studies, and practical Q&A—enabling precise customer segmentation and data‑driven marketing.

CDP · Customer Segmentation · Tagging
0 likes · 19 min read
Baidu Geek Talk
Oct 11, 2023 · Artificial Intelligence

How Baidu’s Qianfan 2.0 Supercharges Large‑Model Development and Deployment

The article reviews Baidu Cloud’s Qianfan 2.0 platform, detailing its expanded model catalog, dataset library, Chinese‑language enhancements, compression and speed gains, robust AI infrastructure, application templates, and end‑to‑end data‑labeling pipeline that together lower cost and accelerate large‑model adoption across industries.

AI Platform · Cloud AI · Large Language Models
0 likes · 14 min read
Baobao Algorithm Notes
Mar 15, 2022 · Artificial Intelligence

Boost Model Performance with Only 5 Lines of Pseudo‑Label Code

This article explains how semi‑supervised pseudo‑label learning can dramatically improve model accuracy by using a tiny five‑line code snippet that generates pseudo‑labels for unlabeled data, retrains a second model, and avoids data leakage with a proper validation set.

Semi-supervised Learning · ai · data labeling
0 likes · 4 min read
Python Crawling & Data Mining
Sep 16, 2021 · Frontend Development

Boost Your Captcha Labeling Speed with a Vue‑Spring Annotation Tool

This article walks through building a web‑based, high‑efficiency image captcha labeling system using a Vue admin template for the frontend and Spring Boot for the backend, detailing functional modules, technology stack, deployment steps, and demo results to streamline data annotation before AI model training.

Spring Boot · Vue · data labeling
0 likes · 6 min read
Baobao Algorithm Notes
Aug 28, 2020 · Artificial Intelligence

Avoid Common Pitfalls in Industrial Text Classification: A Practical Guide

This comprehensive guide examines real‑world text classification projects, covering label taxonomy design, data scarcity solutions, efficient annotation, new‑class discovery, algorithm selection, evaluation metrics, OOV handling, model evolution, rule‑model integration, performance‑boosting tricks, and inference under resource constraints.

Few‑Shot Learning · Model Evaluation · NLP
0 likes · 15 min read
58 Tech
Aug 7, 2020 · Artificial Intelligence

Technical Overview of 58.com Intelligent Voice Analysis Platform

The article presents a comprehensive technical overview of 58.com’s intelligent voice analysis platform, detailing its business background, system architecture, speech and NLP technologies, speaker diarization methods, model performance, data labeling workflow, and practical applications in call‑center quality inspection and user profiling.

AI Platform · data labeling · natural language processing
0 likes · 11 min read
iQIYI Technical Product Team
Apr 26, 2019 · Artificial Intelligence

Design and Architecture of the ANNO AI Data Annotation Platform

The ANNO platform unifies iQIYI’s AI data annotation by defining an abstract model—media records, HITs, partitions, and hitsets—driving a modular Vue.js front‑end that flexibly handles diverse media, annotation modes, and workflows, while AI‑assisted pre‑labeling cuts labeling time and supports scalable, secure, collaborative development.

AI annotation · annotation workflow · data labeling
0 likes · 14 min read