Explore a 1.59 Million Image NSFW Dataset with 159 Fine-Grained Categories
A data scientist at Besedo has open-sourced an NSFW image dataset of 1.589 million pictures, organized into 159 top-level categories with further sub-categories. The release includes GitHub links and a download script, requires about 500 GB of storage, and should not be viewed in the office.
Evgeny Bazarov, a data scientist currently working at content-optimization company Besedo, has released an open-source dataset of 1.589 million NSFW (not safe for work) images. The release follows the earlier 200k-image nsfw_data_scrapper dataset and aims to provide a larger, more finely categorized collection.
Dataset Structure
The dataset is divided into 159 top‑level categories based on scene, appearance, and other attributes. Examples of these categories include appearance_clothing_dresses, locations_nature_beach, and amateur_self‑shots. Each top‑level category is further split into sub‑categories; for instance, appearance_clothing_dresses contains five more specific groups.
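To make the naming scheme concrete, here is a minimal Python sketch that groups category names by their leading token. It assumes the names are underscore-delimited with the broadest attribute first, which is inferred only from the examples quoted above; the helper name group_by_prefix is hypothetical and is not part of the dataset's tooling.

```python
from collections import defaultdict


def group_by_prefix(names):
    """Group underscore-delimited category names by their first token.

    Assumption: the first token (e.g. "appearance") is the broadest
    attribute. This is inferred from the article's examples only.
    """
    groups = defaultdict(list)
    for name in names:
        # partition splits on the first underscore only
        prefix, _, rest = name.partition("_")
        groups[prefix].append(rest)
    return dict(groups)


# Two of the category names quoted in the article
example_names = ["appearance_clothing_dresses", "locations_nature_beach"]
grouped = group_by_prefix(example_names)
for prefix, members in sorted(grouped.items()):
    print(prefix, members)
```

A grouping like this can help when collapsing the 159 fine-grained labels into a smaller set of coarse classes for a first experiment.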
Intended Use
The dataset is designed for training models that can identify NSFW content. Because the images are explicitly “not safe for work” and the categories are highly granular, it can serve as a valuable resource for research in image moderation, content filtering, and related computer‑vision tasks.
Download and Technical Details
The full dataset occupies roughly 500 GB after download and cleaning. The repository provides only URLs to the images; the actual files are hosted elsewhere. To retrieve the images, users can employ the script 2_download_from_urls.sh found in the scripts directory of the nsfw_data_scrapper project.
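For readers who prefer not to use the shell script, the same idea can be sketched in Python. This is an illustration under stated assumptions, not the project's own method: it assumes each URL list is a plain-text file with one URL per line, and the function names, the hashed file naming, and the .jpg fallback extension are all hypothetical choices.

```python
import hashlib
import os
import urllib.request
from urllib.parse import urlparse


def read_urls(list_path):
    """Return the non-empty, stripped lines of a URL list file.

    Assumption: one URL per line, plain text.
    """
    with open(list_path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def target_path(url, out_dir):
    """Build a local file path from a hash of the URL to avoid name clashes."""
    # Fall back to ".jpg" when the URL path has no extension (an assumption)
    ext = os.path.splitext(urlparse(url).path)[1].lower() or ".jpg"
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return os.path.join(out_dir, digest + ext)


def fetch_bytes(url, timeout=10):
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(urls, out_dir, fetch=fetch_bytes):
    """Download each URL into out_dir; skip existing files, collect failures."""
    os.makedirs(out_dir, exist_ok=True)
    saved, failed = [], []
    for url in urls:
        path = target_path(url, out_dir)
        if os.path.exists(path):
            # Already downloaded in an earlier run
            saved.append(path)
            continue
        try:
            data = fetch(url)
        except Exception as exc:  # externally hosted links may be dead
            failed.append((url, str(exc)))
            continue
        with open(path, "wb") as fh:
            fh.write(data)
        saved.append(path)
    return saved, failed
```

A call such as download_all(read_urls("urls.txt"), "images") would then fill the images directory, where urls.txt stands in for whichever list file is being processed. Skipping files that already exist lets an interrupted run resume, which matters for a download of this size.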
Key repository links:
Download script: https://github.com/alexkimxyz/nsfw_data_scrapper
Dataset URL list (1.59 M entries): https://github.com/EBazarov/nsfw_data_source_urls
Safety Note
Because the content is explicit, the author strongly advises against opening the images in a workplace environment.
Overall, this dataset offers a substantial resource for researchers and developers working on NSFW detection and related AI applications, provided they have sufficient storage and handle the material responsibly.
ITPUB
