Tagged articles

Dataset Construction

12 articles · Page 1 of 1

AI Large-Model Wave and Transformation Guide

May 29, 2026 · Big Data

How to Solve Data Governance + AI Agent Pitfalls: Agent Roles, NL2SQL Datasets, and Rule Templates Explained

The article analyzes why data‑governance projects still fail when combined with AI, presents a four‑layer NL2SQL architecture, and details agent responsibilities, metadata‑governance methods, and anomaly‑diagnosis and permission‑control flows. It also outlines dataset‑building stages and evaluation metrics and provides a step‑by‑step rollout roadmap.

AI Agent · Anomaly Detection · Data Governance

0 likes · 21 min read

Baidu Tech Salon

Oct 10, 2025 · Artificial Intelligence

Navigating the 2025 AI Model Boom: Practical Evaluation Strategies

This article examines the rapid surge of large AI models in 2024‑2025, critiques the reliability of public leaderboards, and presents a business‑focused evaluation framework—including dataset construction, metric selection, automation, and LLM‑as‑judge techniques—to help developers choose the right model for real‑world applications.

AI benchmarks · AI performance · Dataset Construction

0 likes · 17 min read

Fun with Large Models

Sep 6, 2025 · Artificial Intelligence

How to Build a High-Quality Domain-Specific Fine-Tuning Dataset for Large Models

This article outlines a systematic engineering workflow for creating domain‑specific fine‑tuning datasets for large models, covering data processing, validation, optimal sample size, industrial‑environment practices, and special considerations for reinforcement‑learning‑based fine‑tuning.

Data Validation · Dataset Construction · data processing

0 likes · 7 min read

Sohu Tech Products

Apr 16, 2025 · Artificial Intelligence

Comprehensive Guide to Building AI Datasets: From Source Collection to Data Augmentation and Validation

This guide walks readers through every stage of building high‑quality AI training datasets—from locating open‑source data and defining goals, through collection, annotation, cleaning, large‑scale processing, optional augmentation, and splitting, to validation—using a medical QA example for fine‑tuning DeepSeek‑R1.

AI fine-tuning · Dataset Construction · Python

0 likes · 18 min read

AI Frontier Lectures

Mar 25, 2025 · Artificial Intelligence

What Drives Alignment in Multimodal Large Language Models? A Comprehensive Review

This article provides an in‑depth review of alignment algorithms for multimodal large language models, covering application scenarios, dataset construction methods, evaluation benchmarks, current challenges, and future research directions, while summarizing contributions from leading academic institutions.

AI research · Dataset Construction · alignment algorithms

0 likes · 22 min read

Architect

Mar 24, 2025 · Artificial Intelligence

How Multimodal Alignment Is Shaping the Future of Large Language Models

This article provides a systematic review of recent advances in multimodal alignment for large language models, covering key contributions, application scenarios, dataset construction, evaluation benchmarks, future challenges, and insights from LLM alignment research to guide both academia and industry.

AI safety · Dataset Construction · MLLM

0 likes · 26 min read

AIWalker

Mar 13, 2025 · Artificial Intelligence

VideoPainter: Plug‑and‑Play Video Inpainting and Editing Achieves 8 SOTA Benchmarks

VideoPainter introduces a plug‑and‑play dual‑branch framework with a lightweight context encoder and ID‑resampling adapter, built on the massive VPData/VPBench dataset, and demonstrates state‑of‑the‑art performance across eight video restoration and editing metrics, while supporting flexible model integration and long‑video consistency.

Dataset Construction · Dual-Branch Architecture · ID Consistency

0 likes · 18 min read

Architect

Feb 22, 2025 · Artificial Intelligence

How Open‑Source Projects Reproduced DeepSeek‑R1 and Pushed LLM Limits

This article reviews the most notable open‑source reproductions of DeepSeek‑R1 (including Open R1, OpenThoughts, LIMO, and DeepScaleR), detailing their data pipelines, training steps, reinforcement‑learning strategies, dataset construction, and benchmark results, which demonstrate how small, high‑quality datasets can rival massive‑scale models.

AI research · Dataset Construction · DeepSeek-R1

0 likes · 26 min read

DaTaobao Tech

Jun 5, 2024 · Artificial Intelligence

Automated Quality Assessment for AIGC Image Generation: Recent Research Advances

The article reviews recent advances in automated quality assessment for AIGC image generation: an aesthetic scoring framework built on the APDD dataset and the AANSPS network, a human‑preference benchmark (HPD v2 and HPS v2) that outperforms IS/FID, and the PickScore model trained on user‑driven Pick‑a‑Pic data. Together these enable faster, unbiased evaluation, cost savings, and more effective model iteration. The article also notes ongoing work in home‑improvement AI.

AIGC · Aesthetic Evaluation · Dataset Construction

0 likes · 15 min read

Sohu Tech Products

Apr 24, 2024 · Artificial Intelligence

Domain-Specific Large Model Construction Guide

The guide explains why generic LLMs struggle with enterprise tasks and outlines two remedies, retrieval‑augmented generation and domain‑specific fine‑tuning, detailing dataset creation, training strategies (full‑parameter, LoRA, QLoRA), validation methods, hardware benchmarks, and practical tips such as supervised fine‑tuning, 30% domain data, and a stepwise tuning pipeline.

AI · Dataset Construction · domain-specific LLM

0 likes · 16 min read

DataFunTalk

Apr 21, 2024 · Artificial Intelligence

Guidelines for Building Domain-Specific Large Models: Dataset Construction, Training Methods, Evaluation, and Hardware Benchmarking

This article presents a comprehensive guide on constructing domain-specific large language models, covering the differences from general models, how to build high‑quality domain datasets, selecting appropriate training methods, designing validation sets, evaluating model capabilities, and benchmarking domestic hardware performance.

AI · Dataset Construction · Large Language Model

0 likes · 20 min read

Tencent Music Tech Team

Jun 1, 2021 · Artificial Intelligence

TDQA: A No-Reference Deep Learning Based Video Quality Assessment Algorithm for Live Streaming

TDQA is a no‑reference, deep‑learning video quality assessment algorithm designed for live‑streaming, built on a large subjectively annotated dataset and an end‑to‑end architecture with fine‑tuned backbones, achieving state‑of‑the‑art accuracy and sub‑second inference for real‑time quality monitoring and pipeline optimization.

Dataset Construction · Deep Learning · Live Streaming

0 likes · 15 min read
