Tag: dataset construction

1 view collected around this technical thread.

Sohu Tech Products
Apr 16, 2025 · Artificial Intelligence

Comprehensive Guide to Building AI Datasets: From Source Collection to Data Augmentation and Validation

This guide walks readers through every stage of building high‑quality AI training datasets—from locating open‑source data and defining goals, through collection, annotation, cleaning, large‑scale processing, optional augmentation, and splitting, to validation—using a medical QA example for fine‑tuning DeepSeek‑R1.

AI fine‑tuning · Python · data augmentation
0 likes · 18 min read
Sohu Tech Products
Mar 19, 2025 · Artificial Intelligence

Easy DataSet: An Open‑Source Tool for Building Domain‑Specific Datasets and Fine‑Tuning Large Language Models

The article introduces Easy DataSet, an open‑source tool that streamlines the creation of domain‑specific datasets by aggregating public data sources, chunking Markdown documents, generating and managing QA pairs with configurable LLM endpoints, and exporting them in common formats, while outlining its architecture and future roadmap.

AI · LLM fine-tuning · data management
0 likes · 30 min read
DaTaobao Tech
Jun 5, 2024 · Artificial Intelligence

Automated Quality Assessment for AIGC Image Generation: Recent Research Advances

The article reviews recent advances in automated quality assessment for AIGC image generation: an aesthetic scoring framework built on the APDD dataset and the AANSPS network, a human‑preference benchmark (HPD v2 and HPS v2) that outperforms IS/FID, and the Pick‑Score model trained on user‑driven Pick‑a‑Pic data. Together these enable faster, unbiased evaluation, cost savings, and more effective model iteration, with ongoing work in home‑improvement AI.

AIGC · Aesthetic Evaluation · Human Preference
0 likes · 15 min read
Sohu Tech Products
Apr 24, 2024 · Artificial Intelligence

Domain-Specific Large Model Construction Guide

The guide explains why generic LLMs struggle with enterprise tasks and outlines two remedies—retrieval‑augmented generation and domain‑specific fine‑tuning—detailing dataset creation, training strategies (full‑parameter, LoRA, Q‑LoRA), validation methods, hardware benchmarks, and practical tips such as supervised fine‑tuning, 30% domain data, and a stepwise tuning pipeline.

AI · dataset construction · domain-specific LLM
0 likes · 16 min read
DataFunTalk
Apr 21, 2024 · Artificial Intelligence

Guidelines for Building Domain-Specific Large Models: Dataset Construction, Training Methods, Evaluation, and Hardware Benchmarking

This article presents a comprehensive guide on constructing domain-specific large language models, covering the differences from general models, how to build high‑quality domain datasets, selecting appropriate training methods, designing validation sets, evaluating model capabilities, and benchmarking domestic hardware performance.

AI · dataset construction · domain model
0 likes · 20 min read
Tencent Music Tech Team
Jun 1, 2021 · Artificial Intelligence

TDQA: A No-Reference Deep Learning Based Video Quality Assessment Algorithm for Live Streaming

TDQA is a no‑reference, deep‑learning video quality assessment algorithm designed for live‑streaming, built on a large subjectively annotated dataset and an end‑to‑end architecture with fine‑tuned backbones, achieving state‑of‑the‑art accuracy and sub‑second inference for real‑time quality monitoring and pipeline optimization.

Live Streaming · No-Reference · TDQA
0 likes · 15 min read