Tag

synthetic data


Architects' Tech Alliance
Feb 12, 2025 · Artificial Intelligence

DeepSeek‑V3 Training Efficiency, Knowledge Distillation, and the Risks of Synthetic Data

The article examines DeepSeek‑V3’s low‑cost training using 2048 H800 GPUs, explains how knowledge distillation and high‑quality data improve efficiency, discusses expert concerns about training on AI‑generated content, and outlines the limitations and ceiling effect of distillation techniques.

AI Training Efficiency · AI safety · DeepSeek-V3
0 likes · 7 min read
AntTech
Oct 29, 2024 · Artificial Intelligence

Embodied Intelligence and General‑Purpose Humanoid Robots: Insights from Wang He’s Ant T‑Space Talk

In a detailed presentation, Peking University assistant professor Wang He explained the current state and future direction of embodied intelligence, emphasizing synthetic data, three core intelligences, and the commercial‑grade capabilities of his startup’s general‑purpose humanoid robots across manufacturing, retail, and home applications.

AI research · Industrial Automation · embodied AI
0 likes · 17 min read
AntTech
Sep 21, 2024 · Artificial Intelligence

Insights from the 2024 Inclusion·Bund Conference: From Data for AI to AI for Data

The 2024 Inclusion·Bund conference brought together academia and industry leaders to discuss how data technologies are evolving and aligning with AI, covering trends in large‑model storage, synthetic data generation, AI‑enhanced databases, and Ant Group's emerging AI‑centric data ecosystem.

AI · AI alignment · Data Strategy
0 likes · 7 min read
AntData
Sep 6, 2024 · Artificial Intelligence

Insights from the 2024 Inclusion·Bund Conference: From Data for AI to AI for Data

The 2024 Inclusion·Bund Conference forum brought together leading academics and industry experts to examine how data value is shifting in the AI era, covering large‑model storage challenges, the rise of synthetic data, AI‑enhanced databases, and Ant Group’s next‑generation intelligent data architecture.

AI · Data Strategy · Intelligent Data Systems
0 likes · 6 min read
DataFunTalk
Aug 24, 2024 · Artificial Intelligence

Improving the Mathematical Reasoning Ability of Large Language Models: Overview, Mixed Instructions, Synthetic Data, and Training Optimization

This article presents a comprehensive approach to enhancing large language models' mathematical reasoning by reviewing model architectures, introducing mixed CoT‑PoT instructions, generating and filtering synthetic data, and applying multi‑stage training optimizations such as RFT, PPO, and DPO, with detailed experimental results and Q&A.

AI · Reward Model · Training Optimization
0 likes · 16 min read
IT Services Circle
Jul 9, 2024 · Artificial Intelligence

Comparative Study of Classification Algorithms and Calibration Using Synthetic Data

This article presents a comprehensive case study that explains classification principles, shows the key formulas for logistic regression and SVM, and provides a full Python implementation that generates synthetic data, trains multiple classifiers, calibrates them, and visualizes calibration curves and probability histograms.

Python · calibration · classification
0 likes · 6 min read
DataFunSummit
Nov 29, 2023 · Artificial Intelligence

AIGC and Causal Inference: Mutual Empowerment and Applications with YLearn

This article explores how generative AI (AIGC) can be used to synthesize structured data, how synthetic data supports causal inference, and how agent‑based modeling and the YLearn framework together enable advanced causal discovery, effect estimation, and scenario simulation for enterprise AI applications.

AIGC · Agent-Based Modeling · Artificial Intelligence
0 likes · 16 min read
Model Perspective
Oct 9, 2023 · Fundamentals

Unpacking Gender Wage Gaps: Oaxaca‑Blinder, Regression & Simulated Data

This article reviews Claudia Goldin’s Nobel‑winning research on gender wage disparities, explaining the Oaxaca‑Blinder decomposition, multiple linear regression, and mean‑difference models, and demonstrates their application with a synthetic dataset and Python code to illustrate how education, experience, and gender affect wages.

Oaxaca-Blinder · gender wage gap · labor economics
0 likes · 10 min read
DataFunSummit
Sep 4, 2023 · Artificial Intelligence

AIGC and Causal Inference: Mutual Empowerment and Applications with YLearn

This article explores how generative AI (AIGC) can be used to synthesize structured data, how synthetic data and agent‑based modeling support causal inference, and introduces the YLearn framework for end‑to‑end causal learning, highlighting practical use cases and research directions.

AIGC · Agent-Based Modeling · YLearn
0 likes · 15 min read
DataFunTalk
Nov 22, 2022 · Artificial Intelligence

NVIDIA's Advances in Multi‑Role Generative Dialogue Modeling and Synthetic Data‑Driven QA

This article reviews NVIDIA's recent work on multi‑role generative dialogue modeling using GPT‑2‑based architectures and on enhancing question‑answering systems with synthetic data pipelines, covering model design, data preparation from Reddit, extensive experiments, scaling effects, and practical Q&A insights.

GPT-2 · Generative Dialogue · Model Scaling
0 likes · 17 min read
Kuaishou Large Model
Dec 17, 2020 · Artificial Intelligence

How KAIFX Generates High‑Quality Virtual Data for AI Training

This article explains how KAIFX, a synthetic data platform built on computer graphics and AI techniques, tackles challenges of data scarcity, realism, labeling bias, and management to boost AR and 3D face reconstruction model performance.

3D face reconstruction · AI · AR
0 likes · 12 min read