A 1.59 Million‑Image NSFW Dataset Released for Advanced Content Filtering
Data scientist Evgeny Bazarov has open-sourced an NSFW dataset of about 1.59 million images organized into 159 fine-grained categories. The release consists of GitHub-hosted URL lists plus download scripts, and the full image set needs roughly 500 GB of storage. The goal is to help researchers build more precise adult-content detection models.
Dataset Overview
An open-source NSFW image dataset containing 1,589,000 images has been released. It greatly expands on the roughly 200,000 images in the earlier nsfw_data_scrapper collection and is intended for research on, and development of, fine-grained image-filtering models.
Category Structure
The images are organized into 159 top-level categories describing scene, appearance, and other attributes (e.g., appearance_clothing_dresses, locations_nature_beach, amateur_self-shots). Each top-level category is further divided into sub-categories; for instance, appearance_clothing_dresses has five.
Intended Use
The fine-grained labels make it possible to train models that identify and classify NSFW content at a finer level of detail, across a wide range of scenes and visual contexts.
Access and Download Procedure
The list of image URLs is stored in the GitHub repository:
https://github.com/EBazarov/nsfw_data_source_urls
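Because the repository holds plain-text URL lists rather than images, a quick tally shows how the URLs are spread across categories before anything is downloaded. The sketch below is not part of either repository. It assumes each top-level category is a directory under some root folder, containing one or more `.txt` files (at any depth) with one URL per line; check the actual layout of your clone before relying on it. The function name `count_urls_per_category` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: tally URLs per top-level category directory.
# ASSUMPTION: URL lists live under <root>/<category>/ (any depth) as *.txt
# files with one URL per line. The function name is hypothetical.
count_urls_per_category() {
  local root="${1:-.}"
  local dir category n total=0
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # skip if the glob matched nothing
    category=$(basename "$dir")
    # awk 'NF' prints each non-blank line (a file lacking a trailing newline
    # is still handled correctly); the second awk counts them, 0 if none.
    n=$(find "$dir" -type f -name '*.txt' -exec awk 'NF' {} + | awk 'END { print NR }')
    total=$((total + n))
    printf '%s\t%s\n' "$category" "$n"
  done
  printf 'TOTAL\t%s\n' "$total"
}
```

For example, after `git clone https://github.com/EBazarov/nsfw_data_source_urls.git`, point the function at whichever directory in the clone holds the category folders; the TOTAL row gives a quick sanity check against the headline image count.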
To download the images, use the 2_download_from_urls.sh script located in the scripts directory of the nsfw_data_scrapper repository:
https://github.com/alexkimxyz/nsfw_data_scrapper
```bash
# Clone the scraper repository
git clone https://github.com/alexkimxyz/nsfw_data_scrapper.git
cd nsfw_data_scrapper/scripts
# Run the download script; /path/to/url_list.txt is a placeholder for the URL list
# (check the script itself for the input location it actually expects)
bash 2_download_from_urls.sh /path/to/url_list.txt
```

The cleaned dataset occupies roughly 500 GB, so make sure the target drive has enough free space. The repository itself stores only the URLs; the image files are fetched when the script runs.
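Since the roughly 500 GB has to fit on the drive before the script starts fetching, a pre-flight check can spare you a half-finished download. The sketch below is not part of either repository: it assumes a POSIX `df`, treats the article's 500 GB as decimal gigabytes, and the function name `has_room_for_dataset` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: check that the filesystem holding <target> has enough room.
# ASSUMPTION: 500 GB is read as decimal gigabytes (500 * 10^9 bytes).
# The function name and defaults are hypothetical.
has_room_for_dataset() {
  local target="${1:-.}"
  local required_gb="${2:-500}"
  local avail_kb required_kb
  # -P forces one-line POSIX output, -k reports 1024-byte blocks;
  # column 4 of the data row is the available space.
  avail_kb=$(df -Pk "$target" | awk 'NR == 2 { print $4 }')
  case "$avail_kb" in
    ''|*[!0-9]*) echo "could not read free space for $target" >&2; return 2 ;;
  esac
  # Convert decimal gigabytes to 1024-byte blocks.
  required_kb=$(( required_gb * 1000 * 1000 * 1000 / 1024 ))
  if [ "$avail_kb" -ge "$required_kb" ]; then
    echo "ok: ${avail_kb} KiB free, ${required_kb} KiB needed"
    return 0
  fi
  echo "insufficient space: ${avail_kb} KiB free, ${required_kb} KiB needed" >&2
  return 1
}
```

Because the function returns non-zero when space is short, it can gate the download with `&&`, for example `has_room_for_dataset /mnt/storage && bash 2_download_from_urls.sh ...`, where `/mnt/storage` is a hypothetical mount point.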
Caveats
Do not open the downloaded images in a workplace environment.
For reference, the original NSFW detection model is available at:
https://github.com/rockyzhengwu/nsfw
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
