Easy DataSet: An Open‑Source Tool for Building Domain‑Specific Datasets and Fine‑Tuning Large Language Models
This article introduces Easy DataSet, an open-source tool that streamlines the creation of domain-specific datasets. The tool aggregates public data sources, splits Markdown documents into chunks, generates and manages QA pairs through configurable LLM endpoints, and exports those pairs in common formats. The article also outlines the tool's architecture and its future roadmap.
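
To make the chunk-and-export part of this pipeline concrete, here is a minimal Python sketch. It is not Easy DataSet's own code. The function names `chunk_markdown` and `to_alpaca_jsonl` are hypothetical, and both the heading-based splitting rule and the Alpaca-style record layout (`instruction`, `input`, `output`) are assumptions chosen for illustration, since the summary says only "common formats". The LLM call that would turn each chunk into QA pairs is left out.

```python
import json
import re

# ATX-style Markdown heading: one to six '#' characters, then the title text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(text):
    """Split Markdown into one chunk per heading section (assumed strategy)."""
    chunks = []
    title, body = "", []
    in_fence = False

    def flush():
        # Keep a section only if it has a title or some non-blank content.
        if title or any(line.strip() for line in body):
            chunks.append({"title": title, "content": "\n".join(body).strip()})

    for line in text.splitlines():
        # Toggle on fence lines so '#' inside code blocks is not read as a heading.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return chunks


def to_alpaca_jsonl(qa_pairs):
    """Serialize QA pairs as JSON Lines in an Alpaca-style layout (assumed format)."""
    records = (
        {"instruction": pair["question"], "input": "", "output": pair["answer"]}
        for pair in qa_pairs
    )
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
```

In use, each chunk returned by `chunk_markdown` would be sent to whichever LLM endpoint is configured, and the resulting question and answer dictionaries would be passed to `to_alpaca_jsonl` before being written to disk.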