Tagged articles
45 articles
Alibaba Cloud Big Data AI Platform
Apr 10, 2026 · Artificial Intelligence

How to Supercharge Small LLM Agents with ReAct Data Construction and EasyDistill

This guide explains how to build high‑quality agent training data using ReAct trajectories, synthesize difficult samples with a data‑flywheel, and distill the knowledge into small LLMs on Alibaba Cloud PAI, covering teacher model deployment, EasyDistill installation, data generation, task solving, rubric filtering, and final model deployment.

Agent · Data Generation · EasyDistill
0 likes · 14 min read
Wu Shixiong's Large Model Academy
Nov 15, 2025 · Artificial Intelligence

How to Build Robust Function Call Training Data for LLM Agents

This article explains why function call capabilities in large language model agents require dedicated training, outlines the four core abilities to teach, describes the structure and sources of effective training data, and compares lightweight LoRA fine‑tuning with full supervised fine‑tuning approaches.

Agent Systems · Data Generation · Fine-tuning
0 likes · 11 min read
Ele.me Technology
Nov 13, 2025 · Artificial Intelligence

How Multi‑Agent AI Architecture Solves Complex Data Generation Challenges

This article details the design and evolution of a multi‑agent AI system for automated data generation in integration testing, covering challenges, single‑ versus multi‑agent approaches, prompt engineering, tool governance, intent recognition, tool filtering, reasoning execution, performance gains, and practical recommendations.

AI · Data Generation · Multi-Agent
0 likes · 25 min read
Data Party THU
Oct 30, 2025 · Artificial Intelligence

How to Generate Realistic Synthetic Data with Histograms and GMMs

This article explains two practical techniques—histogram‑based per‑column synthesis and Gaussian‑Mixture‑Model generation—for creating large, privacy‑preserving synthetic datasets that retain the statistical distributions and inter‑column relationships of the original data, and shows how to evaluate their quality.

Data Generation · Gaussian mixture model · Python
0 likes · 27 min read
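The histogram-based per-column synthesis named in the summary above can be sketched roughly as follows. This is a minimal standard-library illustration, not the article's own code: the function names, the equal-width binning, and the sample `ages` column are all assumptions made for the example.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def fit_histogram(values, bins=10):
    """Return (edges, counts) for equal-width bins over the observed range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    return edges, counts


def sample_histogram(edges, counts, n, rng):
    """Draw n values: pick a bin in proportion to its count, then a uniform point inside it."""
    cumulative = list(accumulate(counts))
    samples = []
    for _ in range(n):
        target = rng.random() * cumulative[-1]
        idx = bisect_right(cumulative, target)  # empty bins are never selected
        samples.append(rng.uniform(edges[idx], edges[idx + 1]))
    return samples


# Hypothetical numeric column used only for illustration.
ages = [23, 25, 31, 35, 35, 40, 41, 52, 58, 64]
edges, counts = fit_histogram(ages, bins=4)
synthetic = sample_histogram(edges, counts, 1000, random.Random(0))
```

Sampling each column independently in this way reproduces the per-column distribution only; on its own it would not preserve relationships between columns, which is presumably where the Gaussian-mixture approach comes in.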
Alibaba Cloud Developer
Oct 17, 2025 · Artificial Intelligence

Unlocking Precise AI Data Generation with Multi‑Agent Architecture

This article explains how a multi-agent system, comprising intent-recognition, tool-engine, and inference agents, addresses the challenges of AI-driven data generation (called "AI 造数" in the original Chinese) by improving accuracy, speed, and scalability through modular design, prompt engineering, and sophisticated tool governance.

AI · Data Generation · Multi-Agent
0 likes · 24 min read
Fighter's World
Sep 12, 2025 · Artificial Intelligence

Why Are Production‑Grade AI Agents So Hard to Build?

The article analyses why production‑grade AI agents remain unreliable, pinpointing the scarcity of high‑quality task‑action data, the limits of static benchmarks, and the need for massive data‑generation engines, simulation sandboxes, sophisticated RL reward design, and efficient context engineering.

AI Agent · Context Engineering · Data Generation
0 likes · 21 min read
Cognitive Technology Team
Mar 17, 2025 · Artificial Intelligence

Leveraging Large Language Models to Optimize Traditional Machine Learning Pipelines

Large language models can assist and enhance each stage of traditional machine learning—including sample generation, data cleaning, feature engineering, model selection, hyper‑parameter tuning, and workflow automation—by generating synthetic data, refining features, selecting models, and orchestrating pipelines, though challenges such as bias, privacy, and noise remain.

Data Generation · LLM · feature engineering
0 likes · 11 min read
FunTester
Oct 15, 2024 · Backend Development

Generate Realistic Test Data in Go with the GoFakeIt Library

This article introduces GoFakeIt, a lightweight Go library for quickly generating diverse fake data—personal, address, financial, network, and more—explains its key features, shows how to install it via go get, and provides practical code examples for each data type.

Backend · Data Generation · Fake Data
0 likes · 13 min read
DevOps
May 29, 2024 · Artificial Intelligence

End-to-End Task-Oriented Dialogue Agent Construction Using Monte Carlo Simulation and LLM Fine-Tuning

This article presents an end‑to‑end approach for building task‑oriented dialogue agents by simulating user behavior with Monte Carlo methods, generating training data via LLMs, and efficiently fine‑tuning multiple large language models using LLaMA Factory, demonstrating significant improvements in intent recognition, slot filling, and contextual understanding.

Data Generation · LLM fine-tuning · Monte Carlo simulation
0 likes · 17 min read
Test Development Learning Exchange
May 20, 2024 · Backend Development

Why Generate Simulated ID Card Numbers for API Automation Testing and a Python GUI Generator

Generating simulated, legally‑formatted ID card numbers for API automation testing improves data realism, protects privacy, expands test coverage, and enables efficient, repeatable, parameterized, and performance testing, while the provided Python GUI script demonstrates how to create such data programmatically.

Data Generation · GUI · ID Card
0 likes · 10 min read
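As a rough idea of what "legally-formatted" can mean for simulated ID numbers, the sketch below assumes the 18-digit mainland China resident ID layout (GB 11643-1999: 6-digit region code, 8-digit birth date, 3-digit sequence, and an ISO 7064 MOD 11-2 check character). The summary does not say which format the article's script targets, so that choice, the function names, and the sample region code are all assumptions, and the GUI part is left out.

```python
import random

# Position weights and check characters defined by ISO 7064 MOD 11-2.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"


def check_char(body17):
    """Compute the 18th character from the first 17 digits."""
    total = sum(int(digit) * weight for digit, weight in zip(body17, WEIGHTS))
    return CHECK_CHARS[total % 11]


def fake_id(rng, region="110105"):
    """Build a simulated ID; the region code here is only an illustrative placeholder."""
    year = rng.randint(1950, 2005)
    month = rng.randint(1, 12)
    day = rng.randint(1, 28)  # capped at 28 so every month yields a valid date
    sequence = rng.randint(0, 999)
    body = f"{region}{year:04d}{month:02d}{day:02d}{sequence:03d}"
    return body + check_char(body)


def is_well_formed(id_number):
    """Check length, digit-only body, and the check character."""
    return (
        len(id_number) == 18
        and id_number[:17].isdigit()
        and check_char(id_number[:17]) == id_number[17]
    )


sample_id = fake_id(random.Random(1))
```

Passing a seeded `random.Random` keeps the generated values repeatable across test runs, which suits the parameterized testing the summary mentions.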
Code Ape Tech Column
Jan 12, 2024 · Databases

MySQL Event Scheduler: Concepts, Operations, and Practical Examples

This article explains MySQL event scheduler fundamentals, including enabling/disabling the scheduler, creating, altering, and dropping events, scheduling syntax, and practical examples such as generating real-time sales data and periodic statistics, providing code snippets and best‑practice guidance for database automation.

Data Generation · Event Scheduler · SQL
0 likes · 9 min read
Test Development Learning Exchange
Nov 9, 2023 · Backend Development

10 Common Python Automation Testing Scripts for API Testing

This article presents ten practical Python automation testing scripts—including batch test execution, data‑driven testing, API monitoring, performance measurement, database checks, screenshot capture, email reporting, data generation, logging, and response validation—to help developers streamline API testing and improve efficiency.

Data Generation · Email · api-testing
0 likes · 7 min read
Python Programming Learning Circle
May 9, 2023 · Fundamentals

Python Automation Scripts: URL Shortener, Fake Data Generator, YouTube Downloader, NATO Encoder, and Selenium Login

This article showcases Python's concise syntax and powerful libraries by comparing a simple web request with JavaScript and providing five practical automation scripts—including a URL shortener, fake data generator, YouTube downloader, NATO alphabet encoder, and Selenium-based social‑media login—demonstrating why Python is ideal for repetitive tasks.

Automation · Data Generation · Faker
0 likes · 7 min read
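Of the five scripts listed, the NATO alphabet encoder is the easiest to picture without third-party libraries. The version below is a plain sketch of the idea, not the article's script; the function name and the handling of digits and spaces are assumptions.

```python
import string

# ICAO/NATO spelling alphabet, in A-Z order.
WORDS = (
    "Alfa Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliett Kilo Lima Mike "
    "November Oscar Papa Quebec Romeo Sierra Tango Uniform Victor Whiskey X-ray Yankee Zulu"
).split()
NATO = dict(zip(string.ascii_uppercase, WORDS))


def nato_encode(text):
    """Spell out letters with NATO code words; other characters pass through, whitespace is dropped."""
    return " ".join(NATO.get(ch.upper(), ch) for ch in text if not ch.isspace())


encoded = nato_encode("Go 2")  # "Golf Oscar 2"
```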
KooFE Frontend Team
May 5, 2023 · Frontend Development

Create Mock APIs in 10 Minutes with ChatGPT and json‑server

This guide shows front‑end developers how to generate realistic mock data with ChatGPT, export it as JSON, and instantly serve a full RESTful mock API using json‑server, covering schema design, routing, filtering, pagination, sorting, and query operators.

ChatGPT · Data Generation · Mock API
0 likes · 8 min read
Top Architect
Apr 2, 2023 · Databases

Optimizing Large-Scale Pagination Queries in MySQL: Data Generation and Index Strategies

This article demonstrates how to generate millions of test rows in MySQL, analyzes the performance impact of deep pagination using LIMIT, explains why non‑clustered index lookups cause costly table scans, and presents two optimization approaches—sub‑query ID filtering and key‑set pagination—to dramatically reduce query latency.

Data Generation · Index Optimization · SQL
0 likes · 8 min read
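The contrast between deep `LIMIT`/offset pagination and key-set pagination can be illustrated with a small self-contained sketch. It uses Python's built-in `sqlite3` as a stand-in so that it runs without a MySQL server; the `orders` table, its columns, and the helper names are hypothetical, and the article's actual queries and timings are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    ((i, i % 100) for i in range(1, 10001)),
)


def page_by_offset(offset, size):
    """Offset pagination: the engine still has to walk past `offset` rows before returning any."""
    return conn.execute(
        "SELECT id, amount FROM orders ORDER BY id LIMIT ? OFFSET ?",
        (size, offset),
    ).fetchall()


def page_by_keyset(last_id, size):
    """Key-set pagination: seek directly past the last id seen on the previous page."""
    return conn.execute(
        "SELECT id, amount FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size),
    ).fetchall()


deep_offset_page = page_by_offset(9000, 10)
deep_keyset_page = page_by_keyset(9000, 10)  # same rows, located via the primary key
```

Both calls return the same ten rows here because the ids are contiguous. Key-set pagination depends on the client remembering the last key it saw, so it suits "next page" navigation better than jumping to an arbitrary page number.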
Python Programming Learning Circle
Mar 7, 2023 · Fundamentals

Python Automation Scripts: Web Requests, URL Shortening, Fake Data Generation, Video Downloading, NATO Encryption, and Selenium Login

This article presents a series of Python automation examples—including HTTP requests, URL shortening, fake data creation, YouTube video downloading, NATO‑style message encryption, and Selenium‑driven social‑media login—showcasing concise code snippets and explanations that highlight Python's simplicity and versatility for repetitive tasks.

Data Generation · Selenium · encryption
0 likes · 7 min read
FunTester
Apr 26, 2022 · Backend Development

Low‑Cost, Rapid Generation of High‑Quality Test Data Using Apifox

This article explains why test data is essential, introduces the Apifox tool as a low‑cost, fast solution for creating both generic and domain‑specific test data, and provides step‑by‑step guidance on using its mock engine, custom rules, batch generation, and automation features to produce reliable testing datasets.

API testing · Apifox · Automation
0 likes · 9 min read
Zhongtong Tech
Jan 10, 2022 · Backend Development

How ZTO Built a Unified Test Tool Platform to Boost Efficiency

This article describes how ZTO's testing team created a centralized test‑tool platform that integrates front‑end and back‑end services, standardizes tool access, tracks usage via AOP, supports data generation, order creation, tracking, MQ messaging, and other platform integrations to dramatically improve testing productivity.

Backend Development · Data Generation · MQ
0 likes · 11 min read
MaGe Linux Operations
Jun 13, 2021 · Fundamentals

7 Fun Python Projects: Web Scraping, Chatbots, Poetry Classification and More

This article presents seven practical Python scripts—from a concise web scraper for Zhihu images and a chatbot conversation loop to a Naive Bayes poem author classifier, a lottery number generator, an automated essay writer, a screen‑capture tool, and a GIF creator—demonstrating how to avoid reinventing the wheel while exploring diverse automation tasks.

Chatbot · Data Generation · NLP
0 likes · 8 min read
FunTester
Feb 23, 2021 · Backend Development

Improving Software Testability: Practical Tips for Captcha Handling, Data Generation, Mocking, and Test Code Deployment

This article shares practical techniques to enhance software testability, covering strategies for bypassing graphical and SMS captchas, efficient test data creation, automated and brute‑force data injection, mocking services, and deploying test‑specific code without affecting production environments.

Backend testing · Captcha · Data Generation
0 likes · 11 min read
Code Ape Tech Column
Feb 8, 2021 · Databases

Evaluating the ‘No Join Over Three Tables’ Rule from Alibaba Java Development Manual with MySQL and Oracle Experiments

This article investigates why the Alibaba Java Development Manual advises against joining more than three tables by designing and executing large‑scale MySQL and Oracle experiments, analyzing query performance, indexing effects, and data‑generation scripts to determine the practical limits of multi‑table joins.

Data Generation · Database Optimization · Join Performance
0 likes · 11 min read
Architecture Digest
Feb 2, 2021 · Databases

Why Alibaba's Java Development Manual Prohibits Joins Over Three Tables – MySQL and Oracle Performance Experiments

The article investigates the Alibaba Java Development Manual's rule against joining more than three tables by designing and executing extensive MySQL and Oracle experiments, generating massive test data, measuring query performance, and concluding that the restriction stems from join scalability limits on large datasets.

Data Generation · Database Optimization · Join Performance
0 likes · 11 min read
MaGe Linux Operations
Aug 27, 2020 · Backend Development

Boost Your Test Data Generation with Python’s Faker Library

This article introduces the Python Faker library, explains why manually creating test data is inefficient, shows how to install Faker, demonstrates basic usage, locale customization, a wide range of built‑in providers for personal, geographic, financial, and network data, and how to create custom providers for reusable mock data in development and testing workflows.

Automation · Data Generation · Faker
0 likes · 14 min read
Selected Java Interview Questions
Aug 24, 2020 · Databases

Performance Evaluation of Multi‑Table Joins in MySQL and Oracle with Large Data Sets

This article investigates the practical limits of joining more than three tables in MySQL by designing experiments with up to 1.5 billion rows, comparing indexed and non‑indexed queries, and contrasting the results with Oracle's performance, while providing full SQL scripts for data generation and analysis.

Data Generation · Database Optimization · Join Performance
0 likes · 11 min read
Top Architect
Jun 16, 2020 · Databases

Performance Evaluation of Multi-Table Joins in MySQL and Oracle with Large Datasets

This article investigates the Alibaba Java Development Manual's recommendation against joining more than three tables by experimentally evaluating multi-table join performance in MySQL and Oracle using massive synthetic datasets, analyzing query execution times, indexing effects, and providing data generation scripts and detailed results.

Data Generation · JOIN optimization · Oracle
0 likes · 13 min read
ITPUB
Jun 15, 2020 · Databases

Why Does Alibaba's Java Manual Ban Joins Over Three Tables? A Deep Performance Dive

This article investigates Alibaba's recommendation against joining more than three MySQL tables by designing experiments with synthetic data, measuring query performance on MySQL and Oracle, analyzing index impact, and providing full DDL and data‑generation scripts to explain the rule's practical limits.

Data Generation · JOIN · Oracle
0 likes · 11 min read
Alibaba Cloud Developer
Mar 25, 2020 · Artificial Intelligence

How 3D Synthetic Data Supercharges AI Vision for Smart Vending Machines

This article explains how Alibaba's Alipay visual vending cabinet leverages 3D synthetic data generation—covering full‑material 3D reconstruction, parametric scene modeling, and photo‑realistic rendering—to rapidly produce high‑quality training images, dramatically cutting cost and accelerating AI model deployment.

3D synthesis · AI training data · Computer Vision
0 likes · 10 min read
21CTO
Dec 12, 2019 · Databases

Why Does Alibaba’s Java Handbook Ban Joins Over Three Tables? A Deep MySQL & Oracle Performance Test

This article investigates the claim from Alibaba's Java Development Manual that joining more than three tables should be avoided, by setting up a MySQL 5.7 environment, generating massive synthetic data, executing multi‑table join queries, analyzing execution times, and comparing the results with Oracle, ultimately revealing the practical limits of MySQL joins on large data sets.

Data Generation · Database Optimization · Join Performance
0 likes · 12 min read
Xianyu Technology
Sep 17, 2019 · Frontend Development

Large-Scale UI Sample Generation for Alibaba 99 Promotion Module Recognition

The article describes a pipeline that automatically extracts a JSON‑like DSL representation of Alibaba’s 99‑promotion UI from rendered pages, cleanses CSS, converts transforms, renders the DSL to images, and combines it with dynamic ViewModel data to generate tens of thousands of high‑quality samples per module, raising recognition accuracy above 98%.

DSL · Data Generation · Front-end
0 likes · 8 min read
FunTester
Sep 15, 2019 · Backend Development

Automate Adding High School Records with Selenium and Java

This tutorial shows how to use Java and Selenium WebDriver to programmatically add a high‑school record with random scores and rankings, streamlining pagination testing by generating the required data automatically instead of entering each entry manually.

Data Generation · Java · Selenium
0 likes · 4 min read
Java Backend Technology
Jul 23, 2019 · Databases

Why Alibaba Bans Joins Over Three Tables: Real‑World MySQL & Oracle Benchmarks

This article investigates Alibaba's rule against joining more than three tables by designing MySQL and Oracle experiments that generate massive student‑teacher‑course data, run multi‑table join queries, compare indexed versus non‑indexed scenarios, and reveal the performance limits that drive the guideline.

Data Generation · Database Performance · Index Optimization
0 likes · 12 min read
Test Development Learning Exchange
Sep 19, 2018 · Backend Development

Common Faker Library Functions – Usage and Code Examples

This article provides a comprehensive list of frequently used Faker library functions for generating mock data, explains how to install Faker, and includes practical Python code examples demonstrating random data generation, custom generators, and handling unique identifiers in test datasets.

Data Generation · Faker · Python
0 likes · 8 min read