A 1.59 Million‑Image NSFW Dataset Released for Advanced Content Filtering

Data scientist Evgeny Bazarov has open‑sourced a dataset of about 1.589 million NSFW images organized into 159 fine‑grained categories. The release consists of GitHub‑hosted URL lists that are fetched with existing download scripts, and the full download needs about 500 GB of storage. The aim is to help researchers build more precise adult‑content detection models.

ITPUB

Dataset Overview

An open‑source NSFW image dataset containing 1,589,000 images has been released. It expands the earlier nsfw_data_scrapper collection of 200 k images and is intended for research and development of fine‑grained image‑filtering models.

Category Structure

The images are organized into 159 top‑level categories that reflect scene, appearance, and other attributes (e.g., appearance_clothing_dresses, locations_nature_beach, amateur_self‑shots). Each top‑level category is further split into sub‑categories; for instance, appearance_clothing_dresses contains five sub‑categories.
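
Once the data is on disk, one quick check of this two-level structure is to count the sub-categories under each top-level category. The following is a minimal sketch that assumes a hypothetical directory layout of ROOT/top_level_category/sub_category/; the actual layout produced by the repositories may differ, and count_subcategories is an illustrative name, not part of either project.

```bash
# ASSUMPTION (hypothetical layout): ROOT/<top_level_category>/<sub_category>/...
# Prints "<category><TAB><number of sub-category directories>" for each
# top-level directory under ROOT.
count_subcategories() {
  local root="$1" top sub n
  for top in "$root"/*/; do
    [ -d "$top" ] || continue        # glob matched nothing: skip the literal pattern
    n=0
    for sub in "$top"*/; do
      [ -d "$sub" ] && n=$((n + 1))  # count only directories, not files
    done
    printf '%s\t%d\n' "$(basename "$top")" "$n"
  done
}

# Example: count_subcategories ./raw_data
```

With the example from the text, appearance_clothing_dresses would be reported with a count of 5 if its sub-categories are stored as directories.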

Intended Use

The fine‑grained labels support training models that not only detect NSFW content but also classify it by category, across diverse scenes and visual contexts.
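
Training a classifier on these labels requires mapping each category name to an integer class ID. The following is a minimal sketch, assuming the category names arrive one per line on standard input (for example, from a directory listing). build_label_map is a hypothetical helper. Sorting with LC_ALL=C keeps the ID assignment identical across machines and locales.

```bash
# ASSUMPTION: one category name per line on stdin.
# Emits "<class_id><TAB><category>" with IDs 0..N-1 in byte-wise sorted order;
# duplicate names are collapsed and blank lines are skipped.
build_label_map() {
  LC_ALL=C sort -u |
    awk 'NF { printf "%d\t%s\n", id++, $0 }'
}

# Example: ls raw_data | build_label_map > labels.tsv
```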

Access and Download Procedure

The list of image URLs is stored in the GitHub repository:

https://github.com/EBazarov/nsfw_data_source_urls
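
Before starting the download, it can be useful to check how many distinct URLs each list file contains. The following is a minimal sketch that assumes each list holds one URL per line. count_urls is a hypothetical name, and the file paths it receives depend on how the repository is laid out.

```bash
# ASSUMPTION: each list file contains one URL per line (CRLF endings tolerated).
# Prints "<file><TAB><number of distinct http(s) URLs>" for each file given.
count_urls() {
  local f
  for f in "$@"; do
    printf '%s\t%d\n' "$f" \
      "$(tr -d '\r' < "$f" | grep -E '^https?://' | LC_ALL=C sort -u | grep -c .)"
  done
}

# Example: count_urls path/to/lists/*.txt
```

Blank lines and non-URL lines are dropped, and repeated URLs within one file are counted once.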

To download the images, use the 2_download_from_urls.sh script located in the scripts directory of the nsfw_data_scrapper repository:

https://github.com/alexkimxyz/nsfw_data_scrapper

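```bash
# Clone the repository that holds the URL lists
git clone https://github.com/EBazarov/nsfw_data_source_urls.git
# Clone the scraper repository that provides the download script
git clone https://github.com/alexkimxyz/nsfw_data_scrapper.git
cd nsfw_data_scrapper/scripts
# Run the download script on a URL list file. The path is a placeholder;
# check the script source for how it actually locates its input.
bash 2_download_from_urls.sh /path/to/url_list.txt
```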

The cleaned dataset occupies roughly 500 GB, so a sufficiently large hard drive is required. The repository itself stores only the URLs; the actual image files are fetched during script execution.
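
Assuming decimal gigabytes, the 500 GB figure works out to roughly 315 KB per image on average (500 × 10^9 bytes / 1,589,000 images ≈ 314,663 bytes). Because the download is long-running, it may be worth checking free space before starting it. The helper below is a hypothetical sketch, not part of either repository. It relies on POSIX df -P -k output, in which the fourth column of the second line is the available space in 1024-byte blocks.

```bash
# Returns success if the filesystem holding DIR has at least REQUIRED_GB
# (decimal gigabytes, 10^9 bytes) available; non-zero otherwise, including
# when the df output cannot be read.
# ASSUMPTION: illustrative helper only; the 500 GB target comes from the article.
has_free_space() {
  local dir="$1" required_gb="$2" avail_kb
  # df -P: one line per filesystem (no wrapping); -k: 1024-byte blocks
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $((required_gb * 1000 * 1000 * 1000 / 1024)) ]
}

# Example:
#   has_free_space /data/nsfw 500 || echo "Less than 500 GB free under /data/nsfw" >&2
```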

Caveats

Do not open the downloaded images in a workplace environment.

For reference, an earlier open-source NSFW detection model is available at:

https://github.com/rockyzhengwu/nsfw

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

ITPUB