Tagged articles
69 articles
Data Party THU
Apr 30, 2026 · Artificial Intelligence

Time Series Forecasting Augmentation: Frequency, Decomposition, and Patch Techniques

This article reviews why classic classification augmentations fail for forecasting, introduces the essential data‑label consistency requirement, and systematically categorizes effective time‑series augmentation methods—including frequency‑domain (RobustTAD, FreqMask, FreqMix), decomposition (STAug), and patch‑based approaches (WaveMask, WaveMix, Dominant Shuffle, Temporal Patch Shuffle)—backed by extensive experiments on long‑term, short‑term, and classification tasks.

data augmentation · frequency domain · temporal patch shuffle
0 likes · 20 min read
DeepHub IMBA
Apr 22, 2026 · Artificial Intelligence

A Survey of Time Series Forecasting Augmentation: Frequency Domain, Decomposition, and Patch Methods

The article reviews why classic classification augmentations fail for forecasting, outlines a taxonomy of effective time‑series augmentation techniques—including frequency‑domain, decomposition, and patch‑based methods—details the Temporal Patch Shuffle (TPS) pipeline, and presents extensive experiments showing TPS achieves state‑of‑the‑art improvements across long‑term, short‑term, and classification tasks.

Time Series · data augmentation · forecasting
0 likes · 17 min read
Alibaba Cloud Big Data AI Platform
Apr 16, 2026 · Artificial Intelligence

Build a Full End‑to‑End Embodied AI Workflow with Isaac Lab Arena

This notebook walks through a complete pipeline—from configuring Isaac Lab Arena environments and downloading datasets, to using Mimic for large‑scale data augmentation, fine‑tuning a GR00T‑N1.5 policy, and performing closed‑loop evaluation—demonstrating how to develop and validate embodied AI tasks on PAI‑DSW.

GR00T · Isaac Lab · Mimic
0 likes · 14 min read
Alibaba Cloud Big Data AI Platform
Apr 9, 2026 · Artificial Intelligence

How Data Flywheels Accelerate Small Agentic Model Training

This article details a data‑flywheel framework for training compact agentic language models, describing synthetic task generation, mock environment simulation, rubric‑based reward design, iterative hard‑sample augmentation, and experimental results that show consistent performance gains across benchmarks.

Model Evaluation · Reinforcement Learning · Synthetic Environments
0 likes · 17 min read
Data Party THU
Apr 5, 2026 · Artificial Intelligence

How to Beat Shortcut Learning for Better OOD Generalization in Vision Models

Visual and vision-language models excel on IID benchmarks but often fail on out-of-distribution data because of shortcut learning; this article examines the problem, explains its causes, and proposes data-level and model-level interventions, including StillMix, FLASH, and SPARCL, to improve OOD robustness.

AI research · Model Design · OOD generalization
0 likes · 7 min read
Data Party THU
Feb 25, 2026 · Artificial Intelligence

Why Multimodal LLMs Miss Tiny Objects—and How to Fix It

This article analyzes why multimodal large language models often fail to detect small objects, identifies three core bottlenecks, and presents a four‑tiered optimization roadmap—from zero‑cost inference tricks to data augmentation, model fine‑tuning, and engineering safeguards—backed by three real‑world case studies and actionable guidelines.

Inference Optimization · Multimodal LLM · data augmentation
0 likes · 20 min read
Bighead's Algorithm Notes
Feb 1, 2026 · Artificial Intelligence

Beyond Historical Data: Adaptive Synthesis for Financial Time Series

This article reviews a recent paper that proposes a drift‑aware data‑stream system integrating machine‑learning‑based adaptive control into financial data management, introducing a parametric data‑operation module, a gradient‑based bi‑level optimizer, and a curriculum planner to improve model robustness and risk‑adjusted returns in non‑stationary markets.

Quantitative Finance · adaptive data synthesis · concept drift
0 likes · 18 min read
AsiaInfo Technology: New Tech Exploration
Nov 28, 2025 · Artificial Intelligence

Boosting 5G Complaint Intent Detection with Large-Model-Enhanced Few-Shot Learning

This paper presents a collaborative framework where a large language model generates high‑quality synthetic samples to augment a lightweight model, dramatically improving few‑shot user‑complaint intent recognition in 5G networks, achieving a 21% boost for rare categories and a 9% overall accuracy gain.

Few‑Shot Learning · complaint intent detection · data augmentation
0 likes · 27 min read
Alibaba Cloud Big Data AI Platform
Nov 24, 2025 · Artificial Intelligence

Fine‑Tuning GR00T‑N1.5: From Human Demonstrations to Distributed Imitation Learning

This tutorial walks through fine‑tuning the complex VLA model GR00T‑N1.5 by collecting human demonstrations, annotating and massively augmenting data with DLC, performing distributed imitation learning, and validating the model through a server‑client DSW setup, complete with code snippets, resource specs, and visual examples.

DSW · Distributed Imitation Learning · GR00T
0 likes · 18 min read
Data Party THU
Nov 22, 2025 · Artificial Intelligence

How Frequency‑Refined Augmentation Boosts Contrastive Learning for Time‑Series Classification

FreRA introduces a lightweight, plug‑in frequency‑refined augmentation that adaptively refines spectral components to preserve global semantics while injecting variance, dramatically improving contrastive learning performance on time‑series classification, anomaly detection, and transfer learning across multiple benchmark datasets.

Time Series · contrastive learning · data augmentation
0 likes · 13 min read
Alibaba Cloud Big Data AI Platform
Nov 17, 2025 · Artificial Intelligence

End-to-End Navigation Model Training with Isaac Sim, MobilityGen, and Cosmos Augmentation

This tutorial walks through a complete workflow for building a navigation model using Isaac Sim and MobilityGen to generate synthetic data, applying Cosmos‑Transfer1‑7B for visual data augmentation, training the X‑Mobility model via imitation learning, converting it for ROS2 deployment, and performing software‑in‑the‑loop validation.

AI training · Isaac Sim · ROS2
0 likes · 19 min read
Alibaba Cloud Big Data AI Platform
Nov 10, 2025 · Artificial Intelligence

How to Boost Robot Imitation Learning with Cosmos World Model Data Augmentation

This guide demonstrates an end‑to‑end workflow on Alibaba Cloud PAI that uses the Cosmos world model to replace Isaac simulation for robot action data augmentation, including minimal human demonstrations, prompt‑driven data expansion, rejection sampling, IDM inverse‑kinematics extraction, imitation‑learning fine‑tuning, and model evaluation.

AI · Cosmos · Model Evaluation
0 likes · 17 min read
Alibaba Cloud Big Data AI Platform
Nov 3, 2025 · Artificial Intelligence

Build Physical AI with Isaac Lab: Data Augmentation, Imitation Learning & Evaluation

This article walks through an end‑to‑end Physical AI workflow on Alibaba Cloud PAI, covering robot teleoperation data collection, Isaac Lab‑based data augmentation and enhancement, imitation‑learning model training, distributed DLC execution, and systematic evaluation across varied visual conditions.

Physical AI · Robotics · data augmentation
0 likes · 17 min read
JavaEdge
Sep 14, 2025 · Artificial Intelligence

Exploring Hugging Face AI Sheets: No‑Code LLM‑Powered Data Manipulation

Hugging Face AI Sheets lets users employ large language models through a spreadsheet‑like interface to clean, transform, enrich, and generate datasets without writing code, offering both zero‑shot dataset creation and import‑based bulk processing, with optional self‑hosting via Docker for privacy‑sensitive workflows.

AI Sheets · Docker deployment · Hugging Face
0 likes · 5 min read
AI Frontier Lectures
Sep 8, 2025 · Artificial Intelligence

Why Data Augmentation Triggers OOD Fluctuations and How PEER Solves It

Data augmentation, while popular for single-source domain generalization, often induces severe out-of-distribution performance swings during training; the PEER framework combats this by employing dual-model collaboration, entropy regularization, periodic parameter averaging, and dynamic augmentation, achieving state-of-the-art robustness across multiple benchmark datasets.

OOD robustness · data augmentation · domain generalization
0 likes · 7 min read
Data Party THU
Sep 7, 2025 · Artificial Intelligence

Tackling Imbalanced Data: MixUp, CutMix, and Focal Loss Explained

This article examines the challenges of imbalanced datasets in machine learning, especially in fields like medical imaging, and provides a detailed analysis of three key techniques—MixUp data mixing, CutMix region replacement, and the Focal Loss function—along with their implementations, advantages, limitations, and practical integration strategies.

CutMix · Focal Loss · MixUp
0 likes · 11 min read
AI Algorithm Path
Jun 20, 2025 · Artificial Intelligence

Beginner’s Guide to Visual Language Models – Day 2: Understanding Contrastive Learning

This article explains contrastive learning for visual language models, covering its definition, four‑step workflow, how to choose positive and negative pairs, the difference between supervised and self‑supervised variants, and why the technique is essential for zero‑shot and cross‑modal capabilities.

Visual-Language Models · contrastive learning · data augmentation
0 likes · 6 min read
Amap Tech
May 27, 2025 · Artificial Intelligence

Gaode Map Custom Voice Pack: End‑to‑End TTS Model Architecture and Deployment

This article explains how Gaode Map leverages lightweight edge TTS models, dual‑autoregressive large‑model data augmentation, and a configurable audio‑processing DAG to enable users to create highly realistic personalized voice packs from just three recorded sentences.

Gaode Maps · TTS · data augmentation
0 likes · 8 min read
Sohu Tech Products
Apr 16, 2025 · Artificial Intelligence

Comprehensive Guide to Building AI Datasets: From Source Collection to Data Augmentation and Validation

This guide walks readers through every stage of building high‑quality AI training datasets—from locating open‑source data and defining goals, through collection, annotation, cleaning, large‑scale processing, optional augmentation, and splitting, to validation—using a medical QA example for fine‑tuning DeepSeek‑R1.

AI fine-tuning · Python · data augmentation
0 likes · 18 min read
AIWalker
Jan 10, 2025 · Artificial Intelligence

How a Simplified Transformer Enables Lightweight CLIP Training on a Single RTX3090

This paper presents SiCLIP, a framework that simplifies the Transformer architecture, combines weight‑sharing, multi‑stage knowledge distillation, and a novel pair‑matching loss with synthetic captions to train a competitive CLIP model using only one RTX3090 GPU and 1 TB of storage, achieving state‑of‑the‑art data‑size‑parameter‑accuracy trade‑offs.

CLIP · Lightweight Training · Synthetic Captions
0 likes · 19 min read
DataFunTalk
Jan 1, 2025 · Artificial Intelligence

Applying Large Language Models to Financial Risk Control at Akulaku

This article details Akulaku’s deployment of large language models across multimodal financial risk‑control scenarios—covering business background, a three‑module intelligent‑agent architecture, concrete tool‑ and planning‑enhancement case studies, and future outlook—demonstrating how LLMs boost efficiency, reduce labeling effort, and enable copilot‑style assistance.

Agent Architecture · KYC verification · Large Language Models
0 likes · 15 min read
Ops Development & AI Practice
Jul 8, 2024 · Artificial Intelligence

Essential Denoising Techniques for Training Large AI Models

This article outlines key denoising methods—including data cleaning, augmentation, regularization, adversarial training, and self‑supervised learning—that improve the performance, generalization, and robustness of large neural network and transformer models.

Denoising · Regularization · adversarial training
0 likes · 5 min read
Rare Earth Juejin Tech Community
Jun 16, 2024 · Artificial Intelligence

HRNet Source Code Walkthrough: Keypoint Dataset Construction, Online Data Augmentation, and Training Pipeline

This article provides a detailed, English-language walkthrough of the HRNet source code, covering how the COCO keypoint dataset is built, the online data‑augmentation techniques applied during training, and the end‑to‑end training and inference procedures for human pose estimation.

Computer Vision · Deep Learning · HRNet
0 likes · 36 min read
DaTaobao Tech
May 17, 2024 · Artificial Intelligence

Understanding Convolutional Neural Networks: Theory, Architecture, and Practical Techniques

The article explains CNN fundamentals—convolution, pooling, and fully‑connected layers—illustrates their implementation for American Sign Language letter recognition, details parameter calculations, demonstrates data augmentation and transfer learning techniques, and highlights how these methods boost image‑classification accuracy to around 92%.

CNN · data augmentation · image recognition
0 likes · 19 min read
NetEase Smart Enterprise Tech+
Jan 4, 2024 · Artificial Intelligence

How to Strengthen AIGC Content Safety with Multimodal Data and Model Upgrades

The article examines the security challenges introduced by large‑model AIGC, outlines three technical upgrade paths—richer training data, few‑shot model fine‑tuning, and multimodal fusion—and demonstrates practical implementations that dramatically improve detection efficiency, accuracy, and scalability.

AI security · AIGC · Content Safety
0 likes · 24 min read
php Courses
Oct 13, 2023 · Artificial Intelligence

Top 10 Python Libraries for Data Augmentation in Machine Learning

This article introduces ten popular Python libraries—Augmentor, imgaug, albumentations, nlpaug, textaugment, pytorch‑geometric, audiomentations, nlpaugment, keras‑augment, and OpenCV—that provide powerful image, text, audio, and graph data augmentation techniques to improve model generalization and robustness.

Image Processing · Python · audio augmentation
0 likes · 8 min read
DataFunSummit
May 23, 2023 · Artificial Intelligence

Continuous Semantic Enhancement for Neural Machine Translation: Methodology, Experiments, and Community Deployment

This article introduces a continuous semantic enhancement approach for neural machine translation that overcomes the limitations of discrete data‑augmentation techniques, details the neighbor risk minimization training objective, presents benchmark improvements on ACL‑2022 datasets, and describes practical deployment and fine‑tuning workflows in the Modu community.

Neural Machine Translation · continuous semantic augmentation · contrastive learning
0 likes · 19 min read
Sohu Tech Products
Mar 16, 2023 · Artificial Intelligence

ChatGPT Data Augmentation Methods for NLP

This article introduces various ChatGPT‑based data‑augmentation techniques for natural language processing, explains how to use prompts for synonym, antonym, homophone, random insertion, deletion, and swapping transformations, and provides concrete example prompts and outputs to illustrate each method.

Artificial Intelligence · ChatGPT · NLP
0 likes · 15 min read
Python Crawling & Data Mining
Mar 11, 2023 · Artificial Intelligence

How to Overcome Data Scarcity in Machine Learning: Strategies and Techniques

This article addresses data scarcity in machine learning: it explains why large datasets are essential, distinguishes missing data from missing labels, and presents practical remedies such as dataset reuse, augmentation, multimodal learning, curriculum learning, semi‑supervised methods, active learning, transfer learning, and meta‑learning.

Meta Learning · data augmentation · data scarcity
0 likes · 19 min read
ELab Team
Dec 6, 2022 · Artificial Intelligence

Mastering CreateML: From Data Prep to Object Detection Models on iOS

This article introduces Apple’s CreateML tool, explains its supported model types, shows how to prepare and augment data, provides a Node.js script for generating synthetic training sets, and walks through training, testing, and integrating an object‑detection model into an iOS app.

CreateML · Swift · data augmentation
0 likes · 17 min read
Meituan Technology Team
Nov 24, 2022 · Artificial Intelligence

Cross‑Lingual Structured Sentiment Analysis with Data Augmentation and Auxiliary Tasks

Meituan's Voice Interaction team tackled the lack of low‑resource language annotations and high optimization costs in SemEval‑2022 Task 10 by leveraging the cross‑lingual XLM‑RoBERTa model together with multi‑task learning and two data‑augmentation strategies, achieving first place in the zero‑shot subtask and second place in the monolingual subtask.

Cross-Lingual Transfer · Structured Sentiment Analysis · XLM-RoBERTa
0 likes · 25 min read
AntTech
Nov 6, 2022 · Artificial Intelligence

Advanced Rule Learning, Constraint‑Adaptive Frameworks, and Semi‑Supervised Data Augmentation for Fraud Detection and Imbalanced Ranking

This article surveys recent Ant Group research on explainable fraud detection, including constraint‑adaptive rule‑set learning (CRSL), meta‑path guided rule generation (MetaRule), biased sampling for imbalanced ranking, and a semi‑supervised data‑augmentation framework (SDAT) for tabular data, highlighting their motivations, methodologies, deployments, and experimental results.

Semi-supervised Learning · constraint adaptive · data augmentation
0 likes · 18 min read
DaTaobao Tech
Oct 17, 2022 · Artificial Intelligence

AI Live Stream: Causal Representation Learning and Real-time Color Enhancement

In this AI Live Stream, two Taobao Technology engineers present how causal representation learning enables unbiased data augmentation and factor‑controllable generation to boost fine‑grained image classification, while also unveiling a real‑time color‑enhancement technique that merges cascaded lookup tables with dynamic neural networks, illustrating modern AI trends and practical deployment strategies.

AI Algorithms · Fine-Grained Classification · Real-time Processing
0 likes · 4 min read
NetEase Smart Enterprise Tech+
Jul 19, 2022 · Artificial Intelligence

How NER Dominated NLPCC 2022: Techniques Behind the Winning Model

This article reviews the recent NLPCC 2022 NER competition, explains the evolution of named entity recognition, details the five major modeling paradigms, and describes the winning team’s relation‑classification approach, data‑augmentation strategy, experimental results, and its practical deployment in NetEase Cloud Commerce services.

Artificial Intelligence · Deep Learning · NLP
0 likes · 13 min read
DataFunSummit
Jun 26, 2022 · Artificial Intelligence

Applying Knowledge Graphs to Recruitment: Construction, Tag Mining, and Recommendation at 58.com

A senior NLP engineer at 58.com explains how a recruitment knowledge graph is built through multi‑dimensional tag systems, tag mining, and relation extraction, and how it improves bidirectional matching and recommendation efficiency while addressing challenges such as weak expression, cold start, and supply‑demand imbalance.

AI · NLP · data augmentation
0 likes · 17 min read
DataFunSummit
Jun 23, 2022 · Artificial Intelligence

Unlocking Data Potential: Automatic Data Augmentation, Denoising, Active Learning, and Data Splitting

The talk explains how to maximize the value of training data by exploring background on model generalization, automatic data augmentation techniques, denoising strategies, active learning for selecting unlabeled samples, and robust data splitting methods, offering practical guidelines for AI practitioners.

AI · Data Quality · active learning
0 likes · 16 min read
DaTaobao Tech
Jun 13, 2022 · Artificial Intelligence

Robust Neural Radiance Field Representation for Extrapolating Novel Views (RapNeRF)

RapNeRF enhances Neural Radiance Fields for extreme view extrapolation by introducing Random Ray Casting and a Ray Atlas, which together augment training data and store view‑dependent surface features, enabling robust, high‑quality novel‑view synthesis from sparse images and outperforming prior methods on synthetic and real datasets.

NeRF · View Synthesis · data augmentation
0 likes · 8 min read
Meituan Technology Team
Jun 9, 2022 · Artificial Intelligence

FSL++: A Few-Shot Learning Model for Chinese Language Understanding that Tops the FewCLUE Benchmark

FSL++—a RoBERTa‑large‑based few‑shot model enhanced with domain‑adaptive pre‑training, prompt learning, diverse embedding‑level augmentations, and ensemble self‑training—topped the Chinese FewCLUE benchmark, beating human accuracy on news and scientific classification tasks and delivering measurable gains across multiple Meituan product scenarios.

Chinese language understanding · Few‑Shot Learning · NLP
0 likes · 23 min read
NetEase LeiHuo Testing Center
Apr 1, 2022 · Artificial Intelligence

Learning OCR for Game Text Recognition: From Data Preparation to CRNN Model Training

This article documents the author’s step‑by‑step journey of building an OCR system for recognizing Chinese characters in a card‑game UI, covering game selection, technical background, data generation, deep‑learning model training with CRNN, real‑image data collection, optimization attempts, and final performance evaluation.

CRNN · Deep Learning · EasyOCR
0 likes · 15 min read
DataFunSummit
Feb 12, 2022 · Artificial Intelligence

Advances and Challenges in Post‑BERT Semantic Matching: Negative Sampling, Data Augmentation, and Applications

After the BERT era, this article reviews the limitations of pre‑trained language models for semantic matching, discusses negative‑sample sampling, data‑augmentation techniques, contrastive learning methods such as ConSERT and SimCSE, and practical deployment considerations in vector‑based retrieval systems.

contrastive learning · data augmentation · pretrained language models
0 likes · 20 min read
Code DAO
Jan 15, 2022 · Artificial Intelligence

How Tuun’s Automated Data Augmentation Boosts AI Model Accuracy

The article explains how Tuun, an open‑source Bayesian‑optimization tool, automatically searches data‑augmentation policies for machine‑learning models, details the setup with Microsoft NNI, provides code and configuration examples, and presents experiments on CIFAR‑10/100 and SVHN showing that Tuun‑generated policies match or surpass expert‑tuned strategies and further improve performance when combined.

AutoML · Bayesian Optimization · Image Classification
0 likes · 14 min read
Code DAO
Dec 30, 2021 · Artificial Intelligence

Revamper: An Intelligent Data Augmentation Engine for Faster DNN Training

The article presents a new data‑refurbishing technique and the Revamper loading system that cut CPU‑heavy data‑augmentation costs while preserving model generalization, showing significant throughput gains for ResNet‑50 on ImageNet without sacrificing accuracy.

CPU overhead · DNN training · ResNet-50
0 likes · 10 min read
ITPUB
Dec 13, 2021 · Artificial Intelligence

How Data Augmentation Boosts Machine Learning When Data Is Scarce

This article explains how data augmentation can alleviate overfitting by artificially expanding limited training sets, outlines common transformation techniques for images, text, and audio, and discusses the method's benefits, practical applications, and inherent limitations for machine‑learning practitioners.

Computer Vision · Deep Learning · data augmentation
0 likes · 6 min read
Baidu Geek Talk
Sep 8, 2021 · Artificial Intelligence

How PP‑OCRv2 Boosts OCR Speed and Accuracy with Five Key Innovations

The article provides a comprehensive technical overview of PaddleOCR's PP‑OCRv2, detailing its five major algorithmic enhancements, performance improvements over previous versions, historical milestones, core capabilities, and links to the open‑source repositories for developers interested in state‑of‑the‑art OCR solutions.

Computer Vision · Model Optimization · OCR
0 likes · 10 min read
Meituan Technology Team
Aug 19, 2021 · Artificial Intelligence

Few-Shot Learning Methods and Applications in Meituan NLP

Meituan’s NLP team leverages few‑shot learning—using data‑augmentation, semi‑supervised, ensemble/self‑training, and domain‑adaptation techniques—to cut annotation costs, achieving 1–2 percentage‑point accuracy gains on internal benchmarks and deploying high‑performing models for tasks such as topic classification, fake‑review detection, and sentiment analysis, while planning broader platform and model extensions.

Few‑Shot Learning · NLP · Semi-supervised Learning
0 likes · 29 min read
58 Tech
Aug 19, 2021 · Artificial Intelligence

Practical NER Techniques for Business Chatbots on the 58.com Service Platform

This article presents a comprehensive case study of applying named‑entity‑recognition (NER) techniques to the smart chat assistant of 58.com’s yellow‑page service, covering business background, model selection (BiLSTM‑CRF, IDCNN‑CRF, BERT), data‑augmentation, focal loss, fusion of rule‑based and neural methods, context modeling, online performance, and future research directions.

BERT · CRF · Dialogue Systems
0 likes · 16 min read
Beike Product & Technology
Jul 1, 2021 · Artificial Intelligence

Semantic Data Augmentation and GigaSpeech: Highlights of Two INTERSPEECH 2021 Papers from the Beike Voice Team

The article summarizes two INTERSPEECH 2021 papers from Beike's voice technology team, detailing a grammar‑based semantic data augmentation method that improves end‑to‑end Chinese speech recognition and introducing GigaSpeech, a 10,000‑hour multi‑domain English speech dataset for robust ASR research.

Chinese · GigaSpeech · Interspeech
0 likes · 7 min read
DataFunTalk
May 9, 2021 · Artificial Intelligence

Few-Shot Learning, Data Augmentation, and Multi‑Task Learning for Safety Modeling in Ride‑Hailing Platforms

This article presents Didi's exploration of few‑shot learning, data‑augmentation, semi‑supervised self‑training and multi‑task learning techniques to address the scarcity of labeled samples in safety and governance scenarios, demonstrating practical solutions and performance gains across various risk‑detection tasks.

AI · Few‑Shot Learning · Semi-supervised Learning
0 likes · 15 min read
Didi Tech
Apr 20, 2021 · Artificial Intelligence

Few-Shot Learning, Data Augmentation, and Semi‑Supervised Methods for Improving Safety and Governance Models at Didi

To overcome scarce labeled data for safety and governance, Didi combines few‑shot learning with systematic data augmentation, self‑training semi‑supervised labeling, and multi‑task neural architectures, cutting labeling costs and reducing log‑loss by over 20% while boosting ROC‑AUC and PR‑AUC across harassment detection, expense‑complaint, and route‑intercept use cases.

AI Safety · Didi · Few‑Shot Learning
0 likes · 15 min read
DataFunTalk
Apr 5, 2021 · Artificial Intelligence

Summary of Methods and Findings from the NLP Chinese Pre‑training Model Generalization Challenge

The article reviews the Chinese NLP pre‑training model generalization competition, detailing data preprocessing, augmentation, external data usage, model scaling and architecture tweaks, loss functions, learning‑rate and adversarial training strategies, regularization techniques, post‑processing optimizations, and ineffective methods, highlighting their impact on performance metrics.

Loss Functions · Model Optimization · NLP
0 likes · 15 min read
AntTech
Mar 3, 2021 · Artificial Intelligence

Ant Group Intelligent Service Research Overview: NLP, Dialogue, Recommendation, and Anti‑fraud Papers

The article presents a comprehensive overview of Ant Group's intelligent service research, summarizing recent AI‑focused papers on text classification, stance detection, data augmentation, knowledge distillation for ranking, reinforcement‑learning‑based dialogue clarification, behavior‑cloning dialogue systems, anti‑fraud outbound bots, tag‑based service recommendation, and multi‑agent service groups, while also highlighting future directions and recruitment opportunities.

AI research · Anti‑fraud · Dialogue Systems
0 likes · 17 min read
Ctrip Technology
Dec 10, 2020 · Artificial Intelligence

Automatic Extraction of Theme-based Recommendation Reasons: Framework, Model Selection, Data Augmentation, and Optimization

This article presents a comprehensive study on automatically extracting theme‑based recommendation reasons for travel content, detailing a three‑stage retrieval framework, the advantages of interactive matching models over classification, rule‑based and back‑translation data augmentation techniques, and various model optimization strategies including priors, transfer learning, seed selection, optimizer choice, and layer‑wise learning rates.

AI · Recommendation Systems · data augmentation
0 likes · 19 min read
DeWu Technology
Nov 26, 2020 · Artificial Intelligence

Automated Captcha Recognition Using Machine Learning

The article outlines a machine‑learning pipeline for automated captcha recognition, covering dataset generation, image preprocessing, segmentation via clustering or watershed methods, and classification using classic models and CNNs, achieving roughly 94% accuracy while noting the growing complexity of modern captchas and recommending developer collaboration when feasible.

Captcha · Neural Networks · Python
0 likes · 23 min read
Suning Technology
Nov 14, 2020 · Artificial Intelligence

Designing Real-Time AI Algorithms for Unmanned Retail Stores

This lecture details the end‑to‑end AI architecture for unmanned stores, covering algorithm module selection, calibration, face recognition, multi‑task detection, tracking, recommendation, data collection, augmentation, model training, and GPU‑accelerated deployment to achieve real‑time performance and high accuracy.

Deep Learning · Model Deployment · data augmentation
0 likes · 15 min read
360 Quality & Efficiency
Sep 18, 2020 · Artificial Intelligence

Data Augmentation Techniques for Improving Object Detection Model Robustness

The article surveys data augmentation methods for making object detection models more robust, including rotation, flipping, random cropping, scaling, color jitter, blurring, transparency adjustment, and image partitioning. It provides code examples and illustrates their impact on model performance with before‑and‑after results.

Computer Vision · Python · data augmentation
0 likes · 7 min read
Huawei Cloud Developer Alliance
Jul 29, 2020 · Artificial Intelligence

Boosting Small Industrial Image Datasets with ModelArts Augmentation and Evaluation

This article describes a practical workflow for expanding a limited industrial solar‑panel defect dataset using flip augmentation, ModelArts smart labeling, and targeted data‑balancing techniques, then evaluates the impact on a ResNet‑50 classifier with detailed accuracy and recall metrics, demonstrating how thoughtful augmentation can improve defect detection performance.

Deep Learning · Image Classification · ModelArts
0 likes · 10 min read
DataFunTalk
Jun 28, 2020 · Artificial Intelligence

Applying UDA Semi‑Supervised Learning to Financial Text Classification: Experiments and Insights

This article investigates how Google's 2019 Unsupervised Data Augmentation (UDA) framework performs on real‑world financial text classification tasks, detailing experiments with limited labeled data, out‑of‑domain samples, and noisy labels, and comparing BERT with lightweight TextCNN models.

BERT · Semi-supervised Learning · TextCNN
0 likes · 21 min read
Sohu Tech Products
May 27, 2020 · Artificial Intelligence

Geometric Transformations and Data Augmentation with OpenCV: Forward/Backward Mapping, Rotation, Translation, and Affine Operations

This article explains traditional image augmentation techniques focusing on geometric transformations such as translation and rotation, describes forward and backward mapping concepts, coordinate‑system conversion, matrix representations, and provides detailed C++ OpenCV examples for implementing these operations with warpAffine and getRotationMatrix2D.

Geometric Transformation · OpenCV · c++
0 likes · 11 min read
Taobao Frontend Technology
May 25, 2020 · Frontend Development

How to Build Front‑End AI Experiments with Pipcook: From Setup to Real‑World Image Classification

This comprehensive guide walks front‑end developers through preparing hardware and OS, installing Python and Node environments, launching Pipcook's visual board, running handwritten digit and image classification experiments, creating and augmenting training samples, configuring pipelines, training models, and understanding deployment, all using the Pipcook framework.

Image Classification · data augmentation · machine learning
0 likes · 34 min read
DataFunTalk
Mar 17, 2020 · Artificial Intelligence

A Survey of Text Data Augmentation Techniques in Natural Language Processing

This article systematically reviews recent developments in text data augmentation for natural language processing, covering common scenarios such as low‑resource and imbalanced classification, and detailing five major techniques—including back‑translation, EDA, TF‑IDF‑based replacement, contextual augmentation, and language‑model‑based methods—with experimental results and future directions.

NLP · data augmentation · machine learning
0 likes · 27 min read
Xianyu Technology
Dec 11, 2019 · Artificial Intelligence

Improving Small Object Detection for UI2CODE via Data Augmentation and Model Optimization

The study enhances UI2CODE’s ability to detect tiny UI components by augmenting training data with copied small objects, upgrading the detector from Faster RCNN to FPN and Cascade FPN, and refining box positions with smoothing and projection, achieving superior small‑object mAP/mAR and enabling broader UI parsing applications.

Computer Vision · FPN · Model Optimization
0 likes · 9 min read
ITPUB
Oct 22, 2019 · Artificial Intelligence

Master Real-Time Image Augmentation with Keras ImageDataGenerator

This guide explains how Keras ImageDataGenerator performs on‑the‑fly image augmentation—covering rotation, shifts, brightness, shear, zoom, channel shifts, flips, and fill‑mode options—with concise Python code examples and visual results to help prevent overfitting in deep‑learning models.

ImageDataGenerator · Keras · TensorFlow
0 likes · 7 min read
Xianyu Technology
Aug 7, 2019 · Artificial Intelligence

Weex Page Mocking with Puppeteer for Large‑Scale UI Sample Generation

To solve the shortage of annotated UI data for UI2CODE, the team uses Puppeteer to load Weex pages, traverses the DOM to gather text and image elements, records their styles and positions, screenshots the page, and repeatedly swaps content, automatically generating thousands of realistic, labeled UI samples from a few hundred templates, greatly cutting manual labeling effort and boosting model accuracy.

Puppeteer · Synthetic Samples · UI automation
0 likes · 8 min read
DataFunTalk
Jun 10, 2019 · Artificial Intelligence

BERT Applications Across NLP Domains: Progress, Challenges, and Future Directions

This article surveys the rapid proliferation of BERT-based research over the past six months, analyzing its impact on various NLP tasks such as question answering, information retrieval, dialog systems, summarization, data augmentation, classification, and sequence labeling, while also discussing the model's strengths, limitations, and future research opportunities.

BERT · NLP · data augmentation
0 likes · 52 min read
Qunar Tech Salon
Apr 29, 2019 · Artificial Intelligence

Multi‑Level Deep Model Fusion for Fake News Detection Using BERT – Winning Solution of WSDM Cup 2019

The article details the Travel team's award‑winning solution for the WSDM Cup 2019 fake‑news detection task, describing data analysis, preprocessing, label‑propagation augmentation, a BERT‑based baseline, a three‑stage multi‑level model‑fusion framework, experimental results, and future directions.

BERT · Model Fusion · NLP
0 likes · 12 min read
Suning Technology
Apr 26, 2018 · Artificial Intelligence

Inside Suning’s Scalable Real‑Time Face Recognition Architecture and Algorithms

Suning’s face recognition solution combines front‑end detection, optimal photo selection, alignment, and cloud‑based feature extraction and matching, leveraging deep‑learning models, weight and feature normalization, angular margins, and triplet loss, while optimizing hardware, bandwidth, and data quality for large‑scale 1:N deployments.

Deep Learning · data augmentation · face recognition
0 likes · 18 min read