Tagged articles

data augmentation

73 articles · Page 1 of 1

Jun 28, 2026 · Artificial Intelligence

Which Training Data Shapes Large‑Model Abilities? Introducing Mechanistic Data Attribution (MDA)

The paper presents Mechanistic Data Attribution, a framework that traces specific internal mechanisms, such as induction heads, back to the training samples that produced them. It finds that repetitive "garbage" data, not high-quality text, drives the emergence of these mechanisms, validates this causal link through deletion and augmentation experiments, and shows how the attribution enables scalable, data-driven model improvement.

Causal Intervention · Induction Heads · Large Language Models

0 likes · 12 min read

Baidu Maps Tech Team

Jun 11, 2026 · Artificial Intelligence

DuIVRS-2: End-to-End Large-Scale Interactive POI Update System

The article analyzes Baidu's DuIVRS-2, an end-to-end, large-scale interactive voice-response system for POI data collection. It details the system's architectural innovations, data augmentation, low-latency LLM management, dual-model iterative learning, and engineering optimizations, along with extensive offline and online experiments that demonstrate superior accuracy, speed, and cost efficiency over prior solutions.

IVR · LLM · POI

0 likes · 18 min read

Data Party THU

May 26, 2026 · Artificial Intelligence

Time-Series Forecasting Augmentation: Frequency, Decomposition, and Patch Methods Compared

The article examines the challenges of data augmentation for time-series forecasting, reviews mainstream techniques (frequency-domain, decomposition, and patch-based methods), and reports extensive experiments in which Temporal Patch Shuffle (TPS) consistently achieves superior performance across long-term forecasting, short-term forecasting, and classification tasks.

Temporal Patch Shuffle · Time Series Forecasting · data augmentation

0 likes · 20 min read
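
The TPS summary above only names the method, so what follows is a deliberately simplified, hypothetical patch-shuffle sketch, not the published Temporal Patch Shuffle algorithm. It illustrates the data-label consistency idea: the look-back window `x` and the forecast target `y` are concatenated, cut into non-overlapping patches, a random subset of patches is permuted, and the result is split back. The function name and the `patch_len` and `shuffle_frac` parameters are our own assumptions.

```python
import numpy as np


def patch_shuffle(x, y, patch_len=4, shuffle_frac=0.5, rng=None):
    """Hypothetical, simplified patch shuffle (not the published TPS algorithm).

    Concatenates look-back window x and target y, splits the series into
    non-overlapping patches, permutes a random subset of patches, and splits
    the result back into (x_aug, y_aug) with the original lengths.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_x = len(x)
    series = np.concatenate([np.asarray(x, dtype=float), np.asarray(y, dtype=float)])
    n_patches = len(series) // patch_len
    usable = n_patches * patch_len  # a tail shorter than one patch is left untouched
    patches = series[:usable].reshape(n_patches, patch_len).copy()

    k = int(round(shuffle_frac * n_patches))
    if k >= 2:  # permuting fewer than two patches would be a no-op
        chosen = rng.choice(n_patches, size=k, replace=False)
        patches[chosen] = patches[rng.permutation(chosen)]

    out = np.concatenate([patches.ravel(), series[usable:]])
    return out[:n_x], out[n_x:]


x = np.arange(12.0)        # look-back window
y = np.arange(12.0, 16.0)  # forecast horizon
x_aug, y_aug = patch_shuffle(x, y, rng=np.random.default_rng(0))
```

Shuffling the concatenated series, rather than `x` alone, keeps the augmented input and target derived from the same transformed signal.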

HyperAI Super Neural

May 19, 2026 · Artificial Intelligence

Generative AI Slashes Preclinical Animal Use by Up to 50% in Small‑Sample Research

A German‑French team introduced genESOM, a generative AI model that decouples structure learning from data synthesis, restores lost lipid signals in reduced‑sample multiple sclerosis studies, controls false‑positive inflation, and cuts required preclinical animal numbers by 30‑50% while outperforming GMM and CT‑GAN.

Generative AI · animal reduction · biomedical research

0 likes · 12 min read

Data Party THU

Apr 30, 2026 · Artificial Intelligence

Time Series Forecasting Augmentation: Frequency, Decomposition, and Patch Techniques

This article reviews why classic classification augmentations fail for forecasting, introduces the essential data‑label consistency requirement, and systematically categorizes effective time‑series augmentation methods—including frequency‑domain (RobustTAD, FreqMask, FreqMix), decomposition (STAug), and patch‑based approaches (WaveMask, WaveMix, Dominant Shuffle, Temporal Patch Shuffle)—backed by extensive experiments on long‑term, short‑term, and classification tasks.

Temporal Patch Shuffle · Time Series Forecasting · data augmentation

0 likes · 20 min read
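
FreqMask, one of the frequency-domain methods listed above, can be sketched roughly as follows, assuming a FrAug-style recipe: random frequency bins of the concatenated look-back window and horizon are zeroed, so input and label stay consistent. The function name, the `mask_rate` parameter, and the choice to always keep the DC (mean) component are our assumptions, not details taken from the article.

```python
import numpy as np


def freq_mask(x, y, mask_rate=0.2, rng=None):
    """FreqMask-style augmentation sketch (assumed FrAug-like recipe).

    Masks random rFFT bins of concat(x, y) so the input window and the
    forecast target receive the same spectral edit, then transforms back.
    """
    rng = np.random.default_rng() if rng is None else rng
    series = np.concatenate([np.asarray(x, dtype=float), np.asarray(y, dtype=float)])
    spectrum = np.fft.rfft(series)
    mask = rng.random(spectrum.shape) < mask_rate
    mask[0] = False  # our assumption: keep the DC term so the series mean is preserved
    spectrum[mask] = 0.0
    out = np.fft.irfft(spectrum, n=len(series))
    n_x = len(x)
    return out[:n_x], out[n_x:]


t = np.linspace(0, 4 * np.pi, 96)
signal = np.sin(t) + 0.3 * np.sin(5 * t)
x_aug, y_aug = freq_mask(signal[:72], signal[72:], rng=np.random.default_rng(1))
```

FreqMix, listed next to it, would instead swap masked bins with those of another sample's spectrum rather than zeroing them.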

DeepHub IMBA

Apr 22, 2026 · Artificial Intelligence

A Survey of Time Series Forecasting Augmentation: Frequency Domain, Decomposition, and Patch Methods

The article reviews why classic classification augmentations fail for forecasting, outlines a taxonomy of effective time‑series augmentation techniques—including frequency‑domain, decomposition, and patch‑based methods—details the Temporal Patch Shuffle (TPS) pipeline, and presents extensive experiments showing TPS achieves state‑of‑the‑art improvements across long‑term, short‑term, and classification tasks.

Temporal Patch Shuffle · data augmentation · forecasting

0 likes · 17 min read

Alibaba Cloud Big Data AI Platform

Apr 16, 2026 · Artificial Intelligence

Build a Full End‑to‑End Embodied AI Workflow with Isaac Lab Arena

This notebook walks through a complete pipeline—from configuring Isaac Lab Arena environments and downloading datasets, to using Mimic for large‑scale data augmentation, fine‑tuning a GR00T‑N1.5 policy, and performing closed‑loop evaluation—demonstrating how to develop and validate embodied AI tasks on PAI‑DSW.

GR00T · Isaac Lab · Mimic

0 likes · 14 min read

Alibaba Cloud Big Data AI Platform

Apr 9, 2026 · Artificial Intelligence

How Data Flywheels Accelerate Small Agentic Model Training

This article details a data‑flywheel framework for training compact agentic language models, describing synthetic task generation, mock environment simulation, rubric‑based reward design, iterative hard‑sample augmentation, and experimental results that show consistent performance gains across benchmarks.

Synthetic Environments · agentic models · data augmentation

0 likes · 17 min read

Data Party THU

Apr 5, 2026 · Artificial Intelligence

How to Beat Shortcut Learning for Better OOD Generalization in Vision Models

Visual and vision-language models excel under IID benchmarks but often fail on out-of-distribution data due to shortcut learning; this article examines the problem, explains its causes, and proposes data-level and model-level interventions—including StillMix, FLASH, and SPARCL—to improve OOD robustness.

AI research · Model Design · OOD generalization

0 likes · 7 min read

Data Party THU

Feb 25, 2026 · Artificial Intelligence

Why Multimodal LLMs Miss Tiny Objects—and How to Fix It

This article analyzes why multimodal large language models often fail to detect small objects, identifies three core bottlenecks, and presents a four‑tiered optimization roadmap—from zero‑cost inference tricks to data augmentation, model fine‑tuning, and engineering safeguards—backed by three real‑world case studies and actionable guidelines.

Inference Optimization · data augmentation · model fine-tuning

0 likes · 20 min read

Bighead's Algorithm Notes

Feb 1, 2026 · Artificial Intelligence

Beyond Historical Data: Adaptive Synthesis for Financial Time Series

This article reviews a recent paper that proposes a drift‑aware data‑stream system integrating machine‑learning‑based adaptive control into financial data management, introducing a parametric data‑operation module, a gradient‑based bi‑level optimizer, and a curriculum planner to improve model robustness and risk‑adjusted returns in non‑stationary markets.

adaptive data synthesis · concept drift · curriculum-learning

0 likes · 18 min read

AsiaInfo Technology: New Tech Exploration

Nov 28, 2025 · Artificial Intelligence

Boosting 5G Complaint Intent Detection with Large-Model-Enhanced Few-Shot Learning

This paper presents a collaborative framework where a large language model generates high‑quality synthetic samples to augment a lightweight model, dramatically improving few‑shot user‑complaint intent recognition in 5G networks, achieving a 21% boost for rare categories and a 9% overall accuracy gain.

Knowledge Distillation · complaint intent detection · data augmentation

0 likes · 27 min read

Alibaba Cloud Big Data AI Platform

Nov 24, 2025 · Artificial Intelligence

Fine‑Tuning GR00T‑N1.5: From Human Demonstrations to Distributed Imitation Learning

This tutorial walks through fine‑tuning the complex VLA model GR00T‑N1.5 by collecting human demonstrations, annotating and massively augmenting data with DLC, performing distributed imitation learning, and validating the model through a server‑client DSW setup, complete with code snippets, resource specs, and visual examples.

DSW · Distributed Imitation Learning · GR00T

0 likes · 18 min read

Data Party THU

Nov 22, 2025 · Artificial Intelligence

How Frequency‑Refined Augmentation Boosts Contrastive Learning for Time‑Series Classification

FreRA introduces a lightweight, plug‑in frequency‑refined augmentation that adaptively refines spectral components to preserve global semantics while injecting variance, dramatically improving contrastive learning performance on time‑series classification, anomaly detection, and transfer learning across multiple benchmark datasets.

contrastive learning · data augmentation · frequency domain

0 likes · 13 min read

Alibaba Cloud Big Data AI Platform

Nov 17, 2025 · Artificial Intelligence

End-to-End Navigation Model Training with Isaac Sim, MobilityGen, and Cosmos Augmentation

This tutorial walks through a complete workflow for building a navigation model using Isaac Sim and MobilityGen to generate synthetic data, applying Cosmos‑Transfer1‑7B for visual data augmentation, training the X‑Mobility model via imitation learning, converting it for ROS2 deployment, and performing software‑in‑the‑loop validation.

AI training · Isaac Sim · Navigation

0 likes · 19 min read

Alibaba Cloud Big Data AI Platform

Nov 10, 2025 · Artificial Intelligence

How to Boost Robot Imitation Learning with Cosmos World Model Data Augmentation

This guide demonstrates an end-to-end workflow on Alibaba Cloud PAI that uses the Cosmos world model in place of Isaac simulation for robot action data augmentation, covering minimal human demonstrations, prompt-driven data expansion, rejection sampling, action extraction with an inverse dynamics model (IDM), imitation-learning fine-tuning, and model evaluation.

AI · Cosmos · data augmentation

0 likes · 17 min read

Alibaba Cloud Big Data AI Platform

Nov 3, 2025 · Artificial Intelligence

Build Physical AI with Isaac Lab: Data Augmentation, Imitation Learning & Evaluation

This article walks through an end‑to‑end Physical AI workflow on Alibaba Cloud PAI, covering robot teleoperation data collection, Isaac Lab‑based data augmentation and enhancement, imitation‑learning model training, distributed DLC execution, and systematic evaluation across varied visual conditions.

Simulation · data augmentation · imitation learning

0 likes · 17 min read

JavaEdge

Sep 14, 2025 · Artificial Intelligence

Exploring Hugging Face AI Sheets: No‑Code LLM‑Powered Data Manipulation

Hugging Face AI Sheets lets users employ large language models through a spreadsheet‑like interface to clean, transform, enrich, and generate datasets without writing code, offering both zero‑shot dataset creation and import‑based bulk processing, with optional self‑hosting via Docker for privacy‑sensitive workflows.

AI Sheets · Docker deployment · Hugging Face

0 likes · 5 min read

AI Frontier Lectures

Sep 8, 2025 · Artificial Intelligence

Why Data Augmentation Triggers OOD Fluctuations and How PEER Solves It

Data augmentation, while popular for single-source domain generalization, often induces severe out-of-distribution performance swings during training; the PEER framework combats this by employing dual-model collaboration, entropy regularization, periodic parameter averaging, and dynamic augmentation, achieving state-of-the-art robustness across multiple benchmark datasets.

OOD robustness · data augmentation · domain generalization

0 likes · 7 min read

Data Party THU

Sep 7, 2025 · Artificial Intelligence

Tackling Imbalanced Data: MixUp, CutMix, and Focal Loss Explained

This article examines the challenges of imbalanced datasets in machine learning, especially in fields like medical imaging, and provides a detailed analysis of three key techniques—MixUp data mixing, CutMix region replacement, and the Focal Loss function—along with their implementations, advantages, limitations, and practical integration strategies.

CutMix · Focal Loss · MixUp

0 likes · 11 min read
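
As a rough NumPy sketch of the three techniques named above: MixUp blends whole samples and their one-hot labels, CutMix pastes a rectangle from a partner image and mixes the labels by pasted area, and focal loss down-weights well-classified examples by the factor `(1 - p)^gamma`. The function names and defaults are illustrative assumptions rather than the article's implementation, and the optional per-class `alpha` weight of focal loss is omitted for brevity.

```python
import numpy as np


def mixup(x, y_onehot, alpha=0.2, rng=None):
    """MixUp: convex combination of each sample (and label) with a shuffled partner."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    return lam * x + (1 - lam) * x[perm], lam * y_onehot + (1 - lam) * y_onehot[perm]


def cutmix(images, y_onehot, alpha=1.0, rng=None):
    """CutMix: paste a random box from a shuffled partner; mix labels by pasted area.

    images has shape (N, H, W) or (N, H, W, C).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, h, w = images.shape[:3]
    lam = rng.beta(alpha, alpha)
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = rng.integers(h), rng.integers(w)  # box center
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    perm = rng.permutation(n)
    out = images.copy()
    out[:, y0:y1, x0:x1] = images[perm, y0:y1, x0:x1]
    lam_area = 1 - (y1 - y0) * (x1 - x0) / (h * w)  # fraction of the original image kept
    return out, lam_area * y_onehot + (1 - lam_area) * y_onehot[perm]


def focal_loss(probs, y_onehot, gamma=2.0, eps=1e-12):
    """Multi-class focal loss: mean over samples of -sum_c y_c * (1 - p_c)^gamma * log(p_c).

    With gamma = 0 this reduces to ordinary cross-entropy.
    """
    probs = np.clip(probs, eps, 1.0)
    per_sample = -np.sum(y_onehot * (1.0 - probs) ** gamma * np.log(probs), axis=1)
    return float(per_sample.mean())
```

Because MixUp and CutMix produce soft labels, the focal loss above accepts any label distribution per row, not only one-hot vectors.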

AI Algorithm Path

Jun 20, 2025 · Artificial Intelligence

Beginner’s Guide to Visual Language Models – Day 2: Understanding Contrastive Learning

This article explains contrastive learning for visual language models, covering its definition, four‑step workflow, how to choose positive and negative pairs, the difference between supervised and self‑supervised variants, and why the technique is essential for zero‑shot and cross‑modal capabilities.

contrastive learning · data augmentation · representation learning

0 likes · 6 min read
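
For the image-text setting described above, here is a minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss. It assumes matched pairs sit on the diagonal of the batch similarity matrix and every other item in the batch serves as a negative; the function names and the `temperature` default are assumptions for illustration.

```python
import numpy as np


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_style_infonce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of N matched (image, text) embedding pairs."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) cosine similarities; diagonal = positives
    diag = np.arange(len(img))
    loss_img_to_txt = -log_softmax(logits, axis=1)[diag, diag].mean()  # each image picks its caption
    loss_txt_to_img = -log_softmax(logits, axis=0)[diag, diag].mean()  # each caption picks its image
    return float((loss_img_to_txt + loss_txt_to_img) / 2)


rng = np.random.default_rng(0)
images = rng.normal(size=(8, 16))
aligned = clip_style_infonce(images, images)           # perfectly matched pairs
mismatched = clip_style_infonce(images, images[::-1])  # pairs scrambled
```

Minimizing this loss pulls each positive pair together while pushing apart the in-batch negatives, which is what gives the shared embedding space its zero-shot and cross-modal use.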

Amap Tech

May 27, 2025 · Artificial Intelligence

Gaode Map Custom Voice Pack: End‑to‑End TTS Model Architecture and Deployment

This article explains how Gaode Map leverages lightweight edge TTS models, dual‑autoregressive large‑model data augmentation, and a configurable audio‑processing DAG to enable users to create highly realistic personalized voice packs from just three recorded sentences.

Gaode Maps · TTS · data augmentation

0 likes · 8 min read

Sohu Tech Products

Apr 16, 2025 · Artificial Intelligence

Comprehensive Guide to Building AI Datasets: From Source Collection to Data Augmentation and Validation

This guide walks readers through every stage of building high‑quality AI training datasets—from locating open‑source data and defining goals, through collection, annotation, cleaning, large‑scale processing, optional augmentation, and splitting, to validation—using a medical QA example for fine‑tuning DeepSeek‑R1.

AI fine-tuning · Dataset Construction · Python

0 likes · 18 min read

AIWalker

Jan 10, 2025 · Artificial Intelligence

How a Simplified Transformer Enables Lightweight CLIP Training on a Single RTX3090

This paper presents SiCLIP, a framework that simplifies the Transformer architecture, combines weight‑sharing, multi‑stage knowledge distillation, and a novel pair‑matching loss with synthetic captions to train a competitive CLIP model using only one RTX3090 GPU and 1 TB of storage, achieving state‑of‑the‑art data‑size‑parameter‑accuracy trade‑offs.

CLIP · Knowledge Distillation · Lightweight Training

0 likes · 19 min read

DataFunTalk

Jan 1, 2025 · Artificial Intelligence

Applying Large Language Models to Financial Risk Control at Akulaku

This article details Akulaku’s deployment of large language models across multimodal financial risk‑control scenarios—covering business background, a three‑module intelligent‑agent architecture, concrete tool‑ and planning‑enhancement case studies, and future outlook—demonstrating how LLMs boost efficiency, reduce labeling effort, and enable copilot‑style assistance.

KYC verification · Large Language Models · Multimodal AI

0 likes · 15 min read

JD Tech Talk

Nov 26, 2024 · Artificial Intelligence

Design and Implementation of an Automated Logistics QA Bot Using Retrieval, Rerank, and Data Augmentation Techniques

This article describes a low‑cost, privacy‑preserving chatbot for logistics that combines data cleaning, large‑model‑based data augmentation, BM25 and vector retrieval, a DNN rerank model, and LLM‑driven answer rewriting to deliver accurate, compliant automated responses.

AI · BM25 · QA bot

0 likes · 11 min read
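
To make the BM25 retrieval step concrete, here is a minimal, hypothetical scorer over pre-tokenized documents. It uses the standard Okapi BM25 formula with a smoothed, non-negative IDF; `k1=1.5` and `b=0.75` are common defaults, not values taken from the JD article, and the toy documents are invented.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for one query (pre-tokenized input)."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    doc_freq = Counter()
    for doc in docs_tokens:
        doc_freq.update(set(doc))  # number of documents containing each term

    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF (as in Lucene-style BM25), which is always positive.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            freq = tf[term]
            norm = freq + k1 * (1 - b + b * len(doc) / avgdl)  # document-length normalization
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["where", "is", "my", "package"],
    ["refund", "policy", "for", "returns"],
    ["package", "delivery", "delayed", "package"],
]
scores = bm25_scores(["package", "delayed"], docs)
```

In a hybrid setup like the one summarized, these lexical scores would typically be merged with vector-retrieval candidates before the rerank model orders them.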

JD Cloud Developers

Nov 26, 2024 · Artificial Intelligence

Building a Low‑Cost, Privacy‑Safe Logistics QA Bot with Hybrid Retrieval & LLM

This article describes a privacy‑preserving, low‑cost logistics QA bot that combines data cleaning, augmentation, BM25 and vector retrieval, a DNN rerank model, and LLM‑based answer rewriting, along with evaluation results and deployment considerations.

Hybrid Retrieval · LLM rewriting · Privacy

0 likes · 11 min read

Alibaba Cloud Big Data AI Platform

Aug 30, 2024 · Artificial Intelligence

Boost LLM Performance: Data Augmentation & Distillation with Qwen2

This guide explains how to reduce the computational cost of large language models by preparing instruction data, optionally augmenting or refining it, deploying teacher and student models on PAI, and performing distillation training with detailed hyper‑parameter settings and sample Python scripts.

AI · Deployment · Distillation

0 likes · 21 min read
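
The summary does not spell out the guide's distillation objective. As a generic illustration only, the classic soft-target distillation loss blends a temperature-scaled KL divergence between teacher and student distributions with ordinary cross-entropy on the hard labels. The function names, `T`, and `alpha` are our assumptions; LLM distillation pipelines may instead simply fine-tune the student on teacher-generated text.

```python
import numpy as np


def log_softmax(logits, T=1.0):
    """Temperature-scaled, numerically stable log-softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, labels)."""
    log_p_teacher = log_softmax(teacher_logits, T)
    log_p_student = log_softmax(student_logits, T)
    kl = np.sum(np.exp(log_p_teacher) * (log_p_teacher - log_p_student), axis=-1).mean()
    # The T^2 factor keeps soft-target gradients on a comparable scale across temperatures.
    soft = (T ** 2) * kl
    rows = np.arange(len(labels))
    hard = -log_softmax(student_logits)[rows, labels].mean()
    return float(alpha * soft + (1 - alpha) * hard)


teacher = np.array([[4.0, 1.0, 0.5], [0.2, 3.0, 0.1]])
student = np.array([[2.0, 1.5, 0.5], [0.5, 1.0, 0.8]])
labels = np.array([0, 1])
loss = kd_loss(student, teacher, labels)
```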

Ops Development & AI Practice

Jul 8, 2024 · Artificial Intelligence

Essential Denoising Techniques for Training Large AI Models

This article outlines key denoising methods—including data cleaning, augmentation, regularization, adversarial training, and self‑supervised learning—that improve the performance, generalization, and robustness of large neural network and transformer models.

Denoising · Regularization · adversarial training

0 likes · 5 min read

Rare Earth Juejin Tech Community

Jun 16, 2024 · Artificial Intelligence

HRNet Source Code Walkthrough: Keypoint Dataset Construction, Online Data Augmentation, and Training Pipeline

This article provides a detailed walkthrough of the HRNet source code, covering how the COCO keypoint dataset is built, the online data-augmentation techniques applied during training, and the end-to-end training and inference procedures for human pose estimation.

Deep Learning · HRNet · PyTorch

0 likes · 36 min read

DaTaobao Tech

May 17, 2024 · Artificial Intelligence

Understanding Convolutional Neural Networks: Theory, Architecture, and Practical Techniques

The article explains CNN fundamentals—convolution, pooling, and fully‑connected layers—illustrates their implementation for American Sign Language letter recognition, details parameter calculations, demonstrates data augmentation and transfer learning techniques, and highlights how these methods boost image‑classification accuracy to around 92%.

CNN · data augmentation · image recognition

0 likes · 19 min read
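
The parameter calculations mentioned above follow standard formulas: a convolution layer with `k x k` kernels has `(k*k*C_in + 1) * C_out` weights including biases, a dense layer has `(n_in + 1) * n_out`, and each spatial output size is `floor((H + 2P - K) / S) + 1`. The helpers below are a small sketch of that arithmetic; the layer sizes in the example are hypothetical, not the article's ASL model.

```python
def conv2d_params(kernel, c_in, c_out, bias=True):
    """Learnable parameters of a 2-D convolution with square kernel x kernel filters."""
    return (kernel * kernel * c_in + (1 if bias else 0)) * c_out


def conv2d_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size along one dimension: floor((size + 2P - K) / S) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def dense_params(n_in, n_out, bias=True):
    """Learnable parameters of a fully connected layer."""
    return (n_in + (1 if bias else 0)) * n_out


# Hypothetical small network on a 28x28 grayscale input (sizes chosen for illustration).
conv1 = conv2d_params(3, 1, 32)                     # (3*3*1 + 1) * 32 = 320
side = conv2d_out_size(28, 3, stride=1, padding=1)  # 28 ("same" padding)
side = side // 2                                    # 2x2 max pooling, no parameters -> 14
fc = dense_params(side * side * 32, 24)             # (14*14*32 + 1) * 24 = 150,552
```

Pooling layers add no parameters, which is why, in small networks like this, the dense layer after the last convolution often dominates the total count.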

NetEase Smart Enterprise Tech+

Jan 4, 2024 · Artificial Intelligence

How to Strengthen AIGC Content Safety with Multimodal Data and Model Upgrades

The article examines the security challenges introduced by large‑model AIGC, outlines three technical upgrade paths—richer training data, few‑shot model fine‑tuning, and multimodal fusion—and demonstrates practical implementations that dramatically improve detection efficiency, accuracy, and scalability.

AI security · AIGC · Content Safety

0 likes · 24 min read

php Courses

Oct 13, 2023 · Artificial Intelligence

Top 10 Python Libraries for Data Augmentation in Machine Learning

This article introduces ten popular Python libraries—Augmentor, imgaug, albumentations, nlpaug, textaugment, pytorch‑geometric, audiomentations, nlpaugment, keras‑augment, and OpenCV—that provide powerful image, text, audio, and graph data augmentation techniques to improve model generalization and robustness.

Image processing · Python · audio augmentation

0 likes · 8 min read

DataFunSummit

May 23, 2023 · Artificial Intelligence

Continuous Semantic Enhancement for Neural Machine Translation: Methodology, Experiments, and Community Deployment

This article introduces a continuous semantic enhancement approach for neural machine translation that overcomes the limitations of discrete data-augmentation techniques, details its vicinal risk minimization training objective, presents the benchmark improvements reported in the ACL 2022 paper, and describes practical deployment and fine-tuning workflows in the ModelScope community.

Neural Machine Translation · continuous semantic augmentation · contrastive learning

0 likes · 19 min read

Sohu Tech Products

Mar 16, 2023 · Artificial Intelligence

ChatGPT Data Augmentation Methods for NLP

This article introduces various ChatGPT‑based data‑augmentation techniques for natural language processing, explains how to use prompts for synonym, antonym, homophone, random insertion, deletion, and swapping transformations, and provides concrete example prompts and outputs to illustrate each method.

Artificial Intelligence · ChatGPT · NLP

0 likes · 15 min read
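
The prompt-based transformations above have classic rule-based counterparts in the style of EDA (Easy Data Augmentation). The sketch below implements random deletion, random swap, and synonym replacement without ChatGPT; the tiny synonym table is a hypothetical stand-in for a real thesaurus or an LLM call.

```python
import random

# Hypothetical toy synonym table; a real pipeline would use a thesaurus or an LLM prompt.
TOY_SYNONYMS = {"quick": ["fast", "speedy"], "happy": ["glad", "cheerful"]}


def random_deletion(words, p=0.1, rng=random):
    """Drop each word with probability p, but never return an empty sentence."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def random_swap(words, n_swaps=1, rng=random):
    """Swap the positions of two randomly chosen words, n_swaps times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def synonym_replacement(words, synonyms=TOY_SYNONYMS, n=1, rng=random):
    """Replace up to n words that have an entry in the synonym table."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(synonyms[out[i]])
    return out


rng = random.Random(42)
sentence = "the quick delivery made me happy".split()
augmented = [random_swap(sentence, rng=rng), synonym_replacement(sentence, rng=rng)]
```

The trade-off is that these rules are cheap and deterministic given a seed, while an LLM prompt can produce more fluent paraphrases at higher cost.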

Python Crawling & Data Mining

Mar 11, 2023 · Artificial Intelligence

How to Overcome Data Scarcity in Machine Learning: Strategies and Techniques

This article examines data scarcity in machine learning: it explains why large datasets are essential, distinguishes missing data from label gaps, and presents practical remedies such as dataset reuse, augmentation, multimodal learning, curriculum learning, semi-supervised methods, active learning, and transfer and meta-learning.

Data Scarcity · Meta Learning · data augmentation

0 likes · 19 min read

ELab Team

Dec 6, 2022 · Artificial Intelligence

Mastering CreateML: From Data Prep to Object Detection Models on iOS

This article introduces Apple’s CreateML tool, explains its supported model types, shows how to prepare and augment data, provides a Node.js script for generating synthetic training sets, and walks through training, testing, and integrating an object‑detection model into an iOS app.

CreateML · Swift · data augmentation

0 likes · 17 min read

Meituan Technology Team

Nov 24, 2022 · Artificial Intelligence

Cross‑Lingual Structured Sentiment Analysis with Data Augmentation and Auxiliary Tasks

Meituan's Voice Interaction team tackled the lack of low‑resource language annotations and high optimization costs in SemEval‑2022 Task 10 by leveraging the cross‑lingual XLM‑RoBERTa model together with multi‑task learning and two data‑augmentation strategies, achieving first place in the zero‑shot subtask and second place in the monolingual subtask.

Cross-Lingual Transfer · Multi-Task Learning · Structured Sentiment Analysis

0 likes · 25 min read

AntTech

Nov 6, 2022 · Artificial Intelligence

Advanced Rule Learning, Constraint‑Adaptive Frameworks, and Semi‑Supervised Data Augmentation for Fraud Detection and Imbalanced Ranking

This article surveys recent Ant Group research on explainable fraud detection, including constraint‑adaptive rule‑set learning (CRSL), meta‑path guided rule generation (MetaRule), biased sampling for imbalanced ranking, and a semi‑supervised data‑augmentation framework (SDAT) for tabular data, highlighting their motivations, methodologies, deployments, and experimental results.

Graph Neural Networks · Semi-supervised Learning · constraint adaptive

0 likes · 18 min read

DaTaobao Tech

Oct 17, 2022 · Artificial Intelligence

AI Live Stream: Causal Representation Learning and Real-time Color Enhancement

In this AI Live Stream, two Taobao Technology engineers present how causal representation learning enables unbiased data augmentation and factor‑controllable generation to boost fine‑grained image classification, while also unveiling a real‑time color‑enhancement technique that merges cascaded lookup tables with dynamic neural networks, illustrating modern AI trends and practical deployment strategies.

AI Algorithms · Fine-Grained Classification · Real-time Processing

0 likes · 4 min read

NetEase Smart Enterprise Tech+

Jul 19, 2022 · Artificial Intelligence

How NER Dominated NLPCC 2022: Techniques Behind the Winning Model

This article reviews the recent NLPCC 2022 NER competition, explains the evolution of named entity recognition, details the five major modeling paradigms, and describes the winning team’s relation‑classification approach, data‑augmentation strategy, experimental results, and its practical deployment in NetEase Cloud Commerce services.

Artificial Intelligence · Deep Learning · NLP

0 likes · 13 min read

DataFunSummit

Jun 26, 2022 · Artificial Intelligence

Applying Knowledge Graphs to Recruitment: Construction, Tag Mining, and Recommendation at 58.com

58.com’s NLP senior engineer explains how a recruitment knowledge graph is built—through multi‑dimensional tag systems, tag mining, and relation extraction—and how it enhances bidirectional matching and recommendation efficiency, addressing challenges such as weak expression, cold start, and supply‑demand imbalance.

AI · NLP · data augmentation

0 likes · 17 min read

DataFunSummit

Jun 23, 2022 · Artificial Intelligence

Unlocking Data Potential: Automatic Data Augmentation, Denoising, Active Learning, and Data Splitting

The talk explains how to get the most value from training data, covering background on model generalization, automatic data augmentation, denoising strategies, active learning for choosing which unlabeled samples to annotate, and robust data-splitting methods, with practical guidelines for AI practitioners.

AI · Active Learning · Data Quality

0 likes · 16 min read
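
For the active-learning step, one common selection rule is entropy-based uncertainty sampling: send for annotation the unlabeled samples whose predicted class distribution is most uncertain. This is a minimal sketch under that assumption; the talk may use a different acquisition function, and the probabilities below are illustrative numbers.

```python
import numpy as np


def entropy_sampling(probs, k):
    """Return indices of the k unlabeled samples with the highest predictive entropy."""
    probs = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    entropy = -np.sum(probs * np.log(probs), axis=1)
    return np.argsort(-entropy)[:k]


# Model predictions on a pool of 4 unlabeled samples (illustrative numbers).
pool_probs = np.array([
    [0.98, 0.01, 0.01],  # confident
    [0.34, 0.33, 0.33],  # nearly uniform, so the most uncertain
    [0.60, 0.30, 0.10],
    [0.90, 0.05, 0.05],
])
to_label = entropy_sampling(pool_probs, k=2)
```

In a full loop, the selected samples would be labeled, added to the training set, and the model retrained before the next selection round.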

DaTaobao Tech

Jun 13, 2022 · Artificial Intelligence

Robust Neural Radiance Field Representation for Extrapolating Novel Views (RapNeRF)

RapNeRF enhances Neural Radiance Fields for extreme view extrapolation by introducing Random Ray Casting and a Ray Atlas, which together augment training data and store view‑dependent surface features, enabling robust, high‑quality novel‑view synthesis from sparse images and outperforming prior methods on synthetic and real datasets.

NeRF · View Synthesis · data augmentation

0 likes · 8 min read

Meituan Technology Team

Jun 9, 2022 · Artificial Intelligence

FSL++: A Few-Shot Learning Model for Chinese Language Understanding that Tops the FewCLUE Benchmark

FSL++—a RoBERTa‑large‑based few‑shot model enhanced with domain‑adaptive pre‑training, prompt learning, diverse embedding‑level augmentations, and ensemble self‑training—topped the Chinese FewCLUE benchmark, beating human accuracy on news and scientific classification tasks and delivering measurable gains across multiple Meituan product scenarios.

Chinese language understanding · Ensemble · NLP

0 likes · 23 min read

NetEase LeiHuo Testing Center

Apr 1, 2022 · Artificial Intelligence

Learning OCR for Game Text Recognition: From Data Preparation to CRNN Model Training

This article documents the author’s step‑by‑step journey of building an OCR system for recognizing Chinese characters in a card‑game UI, covering game selection, technical background, data generation, deep‑learning model training with CRNN, real‑image data collection, optimization attempts, and final performance evaluation.

CRNN · Deep Learning · EasyOCR

0 likes · 15 min read

DataFunSummit

Feb 12, 2022 · Artificial Intelligence

Advances and Challenges in Post‑BERT Semantic Matching: Negative Sampling, Data Augmentation, and Applications

This article reviews the limitations of pre-trained language models for semantic matching in the post-BERT era, discusses negative sampling, data augmentation techniques, and contrastive learning methods such as ConSERT and SimCSE, and covers practical deployment considerations in vector-based retrieval systems.

contrastive learning · data augmentation · pretrained language models

0 likes · 20 min read

Code DAO

Jan 15, 2022 · Artificial Intelligence

How Tuun’s Automated Data Augmentation Boosts AI Model Accuracy

The article explains how Tuun, an open‑source Bayesian‑optimization tool, automatically searches data‑augmentation policies for machine‑learning models, details the setup with Microsoft NNI, provides code and configuration examples, and presents experiments on CIFAR‑10/100 and SVHN showing that Tuun‑generated policies match or surpass expert‑tuned strategies and further improve performance when combined.

AutoML · Bayesian Optimization · NNI

0 likes · 14 min read

Code DAO

Dec 30, 2021 · Artificial Intelligence

Revamper: An Intelligent Data Augmentation Engine for Faster DNN Training

The article presents a new data-refurbishing technique and the Revamper loading system, which together cut the CPU-heavy cost of data augmentation while preserving model generalization, showing significant throughput gains for ResNet-50 on ImageNet without sacrificing accuracy.

CPU overhead · DNN training · ResNet-50

0 likes · 10 min read

ITPUB

Dec 13, 2021 · Artificial Intelligence

How Data Augmentation Boosts Machine Learning When Data Is Scarce

This article explains how data augmentation can alleviate overfitting by artificially expanding limited training sets, outlines common transformation techniques for images, text, and audio, and discusses the method's benefits, practical applications, and inherent limitations for machine‑learning practitioners.

Deep Learning · computer vision · data augmentation

0 likes · 6 min read
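
For images, two of the most common transformations are a random horizontal flip and a random crop after padding. The sketch below assumes images are NumPy arrays of shape `(H, W)` or `(H, W, C)`; the function name, the `pad` default, and the use of reflect padding are illustrative choices, not taken from the article.

```python
import numpy as np


def random_flip_and_crop(img, pad=4, rng=None):
    """Randomly flip an (H, W) or (H, W, C) image left-right, then take a random
    crop of the original size from a reflect-padded copy."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    if rng.random() < 0.5:
        img = img[:, ::-1]  # horizontal flip
    pad_width = ((pad, pad), (pad, pad)) + ((0, 0),) * (img.ndim - 2)  # never pad channels
    padded = np.pad(img, pad_width, mode="reflect")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
augmented = [random_flip_and_crop(image, rng=rng) for _ in range(4)]
```

Applied on the fly each epoch, such transformations mean the model rarely sees exactly the same pixels twice, which is how augmentation counters overfitting on a small training set.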

Baidu Geek Talk

Sep 8, 2021 · Artificial Intelligence

How PP‑OCRv2 Boosts OCR Speed and Accuracy with Five Key Innovations

The article provides a comprehensive technical overview of PaddleOCR's PP‑OCRv2, detailing its five major algorithmic enhancements, performance improvements over previous versions, historical milestones, core capabilities, and links to the open‑source repositories for developers interested in state‑of‑the‑art OCR solutions.

Knowledge Distillation · Model Optimization · OCR

0 likes · 10 min read

Meituan Technology Team

Aug 19, 2021 · Artificial Intelligence

Few-Shot Learning Methods and Applications in Meituan NLP

Meituan’s NLP team leverages few‑shot learning—using data‑augmentation, semi‑supervised, ensemble/self‑training, and domain‑adaptation techniques—to cut annotation costs, achieving 1–2 percentage‑point accuracy gains on internal benchmarks and deploying high‑performing models for tasks such as topic classification, fake‑review detection, and sentiment analysis, while planning broader platform and model extensions.

Active Learning · NLP · Semi-supervised Learning

0 likes · 29 min read

58 Tech

Aug 19, 2021 · Artificial Intelligence

Practical NER Techniques for Business Chatbots on the 58.com Service Platform

This article presents a comprehensive case study of applying named‑entity‑recognition (NER) techniques to the smart chat assistant of 58.com’s yellow‑page service, covering business background, model selection (BiLSTM‑CRF, IDCNN‑CRF, BERT), data‑augmentation, focal loss, fusion of rule‑based and neural methods, context modeling, online performance, and future research directions.

BERT · CRF · Dialogue Systems

0 likes · 16 min read

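The 58.com entry lists focal loss among its NER techniques. As a rough illustration (not the article's code), the multi-class form of focal loss multiplies the cross-entropy of the true class by (1 - p_t)^gamma, so tokens the model already labels confidently contribute little. The function name and the gamma = 2 default are assumptions made for this sketch, and the optional class-weighting term alpha is omitted.

```python
import math


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for a single example, given predicted class probabilities.

    probs:  probabilities over classes (e.g. a softmax output for one token).
    target: index of the true class.
    gamma:  focusing parameter; gamma = 0 reduces to plain cross-entropy.
    """
    # Clamp to avoid log(0) when the true class gets zero probability.
    p_t = max(probs[target], 1e-12)
    # (1 - p_t) ** gamma shrinks the loss of easy, well-classified examples.
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = focal_loss([0.1, 0.9], target=1)  # confident, correct prediction
hard = focal_loss([0.4, 0.6], target=1)  # less certain prediction
print(easy, hard)
```

With gamma = 2, the easy example's loss is scaled by 0.01 relative to cross-entropy while the harder one is scaled by 0.16, which is how the loss shifts attention toward rare or ambiguous entity tokens.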

Beike Product & Technology

Jul 1, 2021 · Artificial Intelligence

Semantic Data Augmentation and GigaSpeech: Highlights of Two INTERSPEECH 2021 Papers from the Beike Voice Team

The article summarizes two INTERSPEECH 2021 papers from Beike's voice technology team, detailing a grammar‑based semantic data augmentation method that improves end‑to‑end Chinese speech recognition and introducing GigaSpeech, a 10,000‑hour multi‑domain English speech corpus for robust ASR research.

Chinese · GigaSpeech · Interspeech

0 likes · 7 min read

DataFunTalk

May 9, 2021 · Artificial Intelligence

Few-Shot Learning, Data Augmentation, and Multi‑Task Learning for Safety Modeling in Ride‑Hailing Platforms

This article presents Didi's exploration of few‑shot learning, data‑augmentation, semi‑supervised self‑training and multi‑task learning techniques to address the scarcity of labeled samples in safety and governance scenarios, demonstrating practical solutions and performance gains across various risk‑detection tasks.

AI · Multi-Task Learning · Semi-supervised Learning

0 likes · 15 min read

Didi Tech

Apr 20, 2021 · Artificial Intelligence

Few-Shot Learning, Data Augmentation, and Semi‑Supervised Methods for Improving Safety and Governance Models at Didi

To overcome scarce labeled data for safety and governance, Didi combines few‑shot learning with systematic data augmentation, self‑training semi‑supervised labeling, and multi‑task neural architectures, cutting labeling costs and reducing log‑loss by over 20% while boosting ROC‑AUC and PR‑AUC across harassment detection, expense‑complaint, and route‑intercept use cases.

AI safety · Didi · Multi-Task Learning

0 likes · 15 min read

DataFunTalk

Apr 5, 2021 · Artificial Intelligence

Summary of Methods and Findings from the NLP Chinese Pre‑training Model Generalization Challenge

The article reviews the Chinese NLP pre‑training model generalization competition, detailing data preprocessing, augmentation, external data usage, model scaling and architecture tweaks, loss functions, learning‑rate and adversarial training strategies, regularization techniques, post‑processing optimizations, and methods that proved ineffective, and highlighting their impact on performance metrics.

Loss Functions · Model Optimization · NLP

0 likes · 15 min read

AntTech

Mar 3, 2021 · Artificial Intelligence

Ant Group Intelligent Service Research Overview: NLP, Dialogue, Recommendation, and Anti‑fraud Papers

The article presents a comprehensive overview of Ant Group's intelligent service research, summarizing recent AI‑focused papers on text classification, stance detection, data augmentation, knowledge distillation for ranking, reinforcement‑learning‑based dialogue clarification, behavior‑cloning dialogue systems, anti‑fraud outbound bots, tag‑based service recommendation, and multi‑agent service groups, while also highlighting future directions and recruitment opportunities.

AI research · Anti‑fraud · Dialogue Systems

0 likes · 17 min read

Ctrip Technology

Dec 10, 2020 · Artificial Intelligence

Automatic Extraction of Theme-based Recommendation Reasons: Framework, Model Selection, Data Augmentation, and Optimization

This article presents a comprehensive study on automatically extracting theme‑based recommendation reasons for travel content, detailing a three‑stage retrieval framework, the advantages of interactive matching models over classification models, rule‑based and back‑translation data augmentation techniques, and various model optimization strategies including priors, transfer learning, seed selection, optimizer choice, and layer‑wise learning rates.

AI · Recommendation Systems · data augmentation

0 likes · 19 min read

DeWu Technology

Nov 26, 2020 · Artificial Intelligence

Automated Captcha Recognition Using Machine Learning

The article outlines a machine‑learning pipeline for automated captcha recognition, covering dataset generation, image preprocessing, segmentation via clustering or watershed methods, and classification using classic models and CNNs, achieving roughly 94% accuracy while noting the growing complexity of modern captchas and recommending developer collaboration when feasible.

Python · captcha · data augmentation

0 likes · 23 min read

Suning Technology

Nov 14, 2020 · Artificial Intelligence

Designing Real-Time AI Algorithms for Unmanned Retail Stores

This lecture details the end‑to‑end AI architecture for unmanned stores, covering algorithm module selection, calibration, face recognition, multi‑task detection, tracking, recommendation, data collection, augmentation, model training, and GPU‑accelerated deployment to achieve real‑time performance and high accuracy.

Deep Learning · Model Deployment · Real-time AI

0 likes · 15 min read

360 Quality & Efficiency

Sep 18, 2020 · Artificial Intelligence

Data Augmentation Techniques for Improving Object Detection Model Robustness

To enhance object detection robustness, the article discusses various data augmentation methods—including rotation, flipping, random cropping, scaling, color jitter, blurring, transparency adjustment, and image partitioning—providing code examples and illustrating their impact on model performance with before‑and‑after results.

Python · computer vision · data augmentation

0 likes · 7 min read

Huawei Cloud Developer Alliance

Jul 29, 2020 · Artificial Intelligence

Boosting Small Industrial Image Datasets with ModelArts Augmentation and Evaluation

This article describes a practical workflow for expanding a limited industrial solar‑panel defect dataset using flip augmentation, ModelArts smart labeling, and targeted data‑balancing techniques, then evaluates the impact on a ResNet‑50 classifier with detailed accuracy and recall metrics, demonstrating how thoughtful augmentation can improve defect detection performance.

Deep Learning · ModelArts · data augmentation

0 likes · 10 min read

DataFunTalk

Jun 28, 2020 · Artificial Intelligence

Applying UDA Semi‑Supervised Learning to Financial Text Classification: Experiments and Insights

This article investigates the practical performance of Google’s 2019 Unsupervised Data Augmentation (UDA) framework on real‑world financial text classification tasks, detailing experiments with limited labeled data, out‑of‑domain samples, and noisy labels, as well as comparisons between BERT and lightweight TextCNN models.

BERT · Financial NLP · Semi-supervised Learning

0 likes · 21 min read

Sohu Tech Products

May 27, 2020 · Artificial Intelligence

Geometric Transformations and Data Augmentation with OpenCV: Forward/Backward Mapping, Rotation, Translation, and Affine Operations

This article explains traditional image augmentation techniques focusing on geometric transformations such as translation and rotation, describes forward and backward mapping concepts, coordinate‑system conversion, matrix representations, and provides detailed C++ OpenCV examples for implementing these operations with warpAffine and getRotationMatrix2D.

C++ · Geometric Transformation · data augmentation

0 likes · 11 min read

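The OpenCV entry describes backward mapping and the matrix form of rotation. The sketch below re-creates both in NumPy rather than C++, assuming a single-channel image and nearest-neighbour sampling. `rotation_matrix` follows the formula given in OpenCV's documentation for `getRotationMatrix2D`; `warp_affine_backward` is a hypothetical, simplified stand-in for what `warpAffine` does, not its actual implementation.

```python
import numpy as np


def rotation_matrix(center, angle_deg, scale=1.0):
    """2x3 affine matrix rotating about center = (x, y).

    Same formula as OpenCV's getRotationMatrix2D documentation: a positive
    angle rotates counterclockwise in an image whose y axis points down.
    """
    cx, cy = center
    theta = np.deg2rad(angle_deg)
    a = scale * np.cos(theta)
    b = scale * np.sin(theta)
    return np.array([
        [a, b, (1 - a) * cx - b * cy],
        [-b, a, b * cx + (1 - a) * cy],
    ])


def warp_affine_backward(src, M, fill=0):
    """Backward mapping: for every destination pixel, look up its source.

    src is a 2-D (single-channel) array; pixels that map outside the
    source get `fill`. Nearest-neighbour sampling keeps the sketch short.
    """
    h, w = src.shape
    A, t = M[:, :2], M[:, 2]
    A_inv = np.linalg.inv(A)
    # Destination pixel grid as (x, y) column vectors.
    ys, xs = np.mgrid[0:h, 0:w]
    dst_pts = np.stack([xs.ravel(), ys.ravel()]).astype(float)
    # Invert the forward map dst = A @ src + t, i.e. src = A^-1 (dst - t).
    src_pts = A_inv @ (dst_pts - t[:, None])
    sx = np.rint(src_pts[0]).astype(int)
    sy = np.rint(src_pts[1]).astype(int)
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.full(h * w, fill, dtype=src.dtype)
    out[inside] = src[sy[inside], sx[inside]]
    return out.reshape(h, w)


img = np.arange(1, 10).reshape(3, 3)
rotated = warp_affine_backward(img, rotation_matrix((1, 1), 90))
print(rotated)
```

Iterating over destination pixels is the point of backward mapping: every output pixel receives exactly one value, whereas forward mapping can leave holes wherever rounded source coordinates skip a destination pixel.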

Taobao Frontend Technology

May 25, 2020 · Frontend Development

How to Build Front‑End AI Experiments with Pipcook: From Setup to Real‑World Image Classification

This comprehensive guide walks front‑end developers through preparing hardware and OS, installing Python and Node environments, launching Pipcook's visual board, running handwritten digit and image classification experiments, creating and augmenting training samples, configuring pipelines, training models, and understanding deployment, all using the Pipcook framework.

data augmentation · image classification · machine learning

0 likes · 34 min read

DataFunTalk

Mar 17, 2020 · Artificial Intelligence

A Survey of Text Data Augmentation Techniques in Natural Language Processing

This article systematically reviews recent developments in text data augmentation for natural language processing, covering common scenarios such as low‑resource and imbalanced classification, and detailing five major techniques—including back‑translation, EDA, TF‑IDF‑based replacement, contextual augmentation, and language‑model‑based methods—with experimental results and future directions.

NLP · data augmentation · machine learning

0 likes · 27 min read

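EDA, one of the techniques named in the survey entry, combines four simple token-level edits: synonym replacement, random insertion, random swap, and random deletion. The sketch below shows only the last two, since they need no synonym resource. The function names, the seeded `random.Random` argument, and the keep-at-least-one-word rule are choices made for this illustration and may differ from the survey's setup.

```python
import random


def random_swap(words, n_swaps, rng):
    """EDA-style random swap: exchange two random positions n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """EDA-style random deletion: drop each word independently with probability p."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    # Never return an empty sentence; fall back to a single random word.
    return kept if kept else [rng.choice(words)]


rng = random.Random(42)
sentence = "the hotel room was clean and quiet".split()
print(" ".join(random_swap(sentence, 2, rng)))
print(" ".join(random_deletion(sentence, 0.2, rng)))
```

Both operations keep the label unchanged by assumption, which is why EDA is usually applied with small edit rates: large swap counts or deletion probabilities can alter the meaning of short sentences.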

Xianyu Technology

Dec 11, 2019 · Artificial Intelligence

Improving Small Object Detection for UI2CODE via Data Augmentation and Model Optimization

The study enhances UI2CODE’s ability to detect tiny UI components by augmenting training data with copied small objects, upgrading the detector from Faster R‑CNN to FPN and Cascade FPN, and refining box positions with smoothing and projection, achieving superior small‑object mAP/mAR and enabling broader UI parsing applications.

FPN · Model Optimization · UI2Code

0 likes · 9 min read


ITPUB

Oct 22, 2019 · Artificial Intelligence

Master Real-Time Image Augmentation with Keras ImageDataGenerator

This guide explains how Keras ImageDataGenerator performs on‑the‑fly image augmentation—covering rotation, shifts, brightness, shear, zoom, channel shifts, flips, and fill‑mode options—with concise Python code examples and visual results to help prevent overfitting in deep‑learning models.

ImageDataGenerator · Keras · TensorFlow

0 likes · 7 min read

Xianyu Technology

Aug 7, 2019 · Artificial Intelligence

Weex Page Mocking with Puppeteer for Large‑Scale UI Sample Generation

To solve the shortage of annotated UI data for UI2CODE, the team uses Puppeteer to load Weex pages, traverses the DOM to gather text and image elements, records their styles and positions, screenshots the page, and repeatedly swaps content, automatically generating thousands of realistic, labeled UI samples from a few hundred templates, greatly cutting manual labeling effort and boosting model accuracy.

Puppeteer · Synthetic Samples · UI automation

0 likes · 8 min read

DataFunTalk

Jun 10, 2019 · Artificial Intelligence

BERT Applications Across NLP Domains: Progress, Challenges, and Future Directions

This article surveys the rapid proliferation of BERT-based research over the past six months, analyzing its impact on various NLP tasks such as question answering, information retrieval, dialog systems, summarization, data augmentation, classification, and sequence labeling, while also discussing the model's strengths, limitations, and future research opportunities.

BERT · NLP · data augmentation

0 likes · 52 min read

Qunar Tech Salon

Apr 29, 2019 · Artificial Intelligence

Multi‑Level Deep Model Fusion for Fake News Detection Using BERT – Winning Solution of WSDM Cup 2019

The article details the Travel team's award‑winning solution for the WSDM Cup 2019 fake‑news detection task, describing data analysis, preprocessing, label‑propagation augmentation, a BERT‑based baseline, a three‑stage multi‑level model‑fusion framework, experimental results, and future directions.

BERT · Model Fusion · NLP

0 likes · 12 min read

Suning Technology

Apr 26, 2018 · Artificial Intelligence

Inside Suning’s Scalable Real‑Time Face Recognition Architecture and Algorithms

Suning’s face recognition solution combines front‑end detection, optimal photo selection, alignment, and cloud‑based feature extraction and matching, leveraging deep‑learning models, weight and feature normalization, angular margins, and triplet loss, while optimizing hardware, bandwidth, and data quality for large‑scale 1:N deployments.

Deep Learning · data augmentation · face recognition

0 likes · 18 min read
