One-Line Python Scraping with Scrapeasy: Extract Data, Images, PDFs, Videos
This article introduces Scrapeasy, a Python library that lets you scrape websites and retrieve subpage links, images, PDFs, and videos with just a few lines of code. It covers installation, basic usage, and techniques for downloading various file types.
If you are looking for a powerful Python web‑scraping tool, Scrapeasy lets you start extracting data with a single line of code.
What is Scrapeasy?
Scrapeasy is a Python library that simplifies web crawling and data extraction. It can scrape a single page or multiple pages, and also extract data from PDF and HTML tables.
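The article does not show which Scrapeasy call handles HTML tables, so the snippet below does not use Scrapeasy at all. It is a minimal standard-library sketch of what HTML table extraction involves: walking the markup and grouping the text of `<td>`/`<th>` cells into rows. The names `TableExtractor` and `extract_tables` are hypothetical, and the sketch assumes simple tables (no nesting, no `colspan`/`rowspan`).

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped into rows and tables.

    Hypothetical helper for illustration; not part of Scrapeasy.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the cell's text fragments and trim surrounding whitespace.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return every table in html as a list of rows of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

For example, `extract_tables("<table><tr><th>Name</th><th>Qty</th></tr><tr><td>Apple</td><td> 3 </td></tr></table>")` yields one table whose rows are `["Name", "Qty"]` and `["Apple", "3"]`.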
Key Features
One-line website scraping, not limited to a single page.
Common scraping tasks (fetching links, images, or videos) are built‑in.
Retrieve special file types such as .php or .pdf from the crawled sites.
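Scraping "not limited to a single page" means discovering subpages by following links. The article does not describe how Scrapeasy's `Website` class does this internally, so here is only a sketch of the general technique: a breadth-first crawl that follows links staying on the starting host. `crawl_site`, `LinkCollector`, and the `fetch` parameter are hypothetical names; `fetch` is injected so the sketch runs offline, and in practice you might pass something like `lambda u: urllib.request.urlopen(u).read().decode()`.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Gather the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl_site(start_url, fetch, max_pages=50):
    """Breadth-first crawl of pages on the same host as start_url.

    fetch is a caller-supplied function (an assumption of this sketch) that
    returns the HTML for a URL, or None if the page cannot be retrieved.
    Returns the URLs that were fetched successfully, in visiting order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links and drop #fragments so duplicates collapse.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

The `seen` set keeps each subpage from being queued twice, and `max_pages` puts a hard cap on how much of a large site a single call will touch.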
Installation
$ pip install scrapeasy
Basic Usage
Import the main classes:
from scrapeasy import Website, Page
Initialize a website object by passing a URL on the site (here, a blog post):
web = Website("https://tikocash.com/solange/index.php/2022/04/13/how-do-you-control-irrational-fear-and-overthinking/")
Get all sub-page links:
links = web.getSubpagesLinks()
Fetch all image links on the site:
images = web.getImages()
Download all images to a local folder:
web.download("img", "fahrschule/images")
Link Extraction
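Scrapeasy's `getLinks()` selects link categories through its `intern`, `extern`, and `domain` flags. To make those categories concrete, here is a standard-library sketch that sorts a list of hrefs into internal links, external links, and external domains. It is not Scrapeasy's code: `classify_links` is a hypothetical name, and treating "domains" as the host names of external links is an assumption about what the `domain` flag returns.

```python
from urllib.parse import urljoin, urlparse


def classify_links(hrefs, page_url):
    """Split hrefs into internal links, external links, and external domains.

    Hypothetical helper mirroring the idea behind intern/extern/domain;
    the meaning of "domains" here is an assumption, not Scrapeasy's definition.
    """
    home = urlparse(page_url).netloc
    internal, external, domains = [], [], set()
    for href in hrefs:
        url = urljoin(page_url, href)  # make relative links absolute
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, and similar
        if parts.netloc == home:
            internal.append(url)
        else:
            external.append(url)
            domains.add(parts.netloc)
    return internal, external, sorted(domains)
```

Resolving every href against the page URL first is what lets a relative path such as `/about` be recognized as internal.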
Retrieve domain links only:
domains = web.getLinks(intern=False, extern=False, domain=True)
Retrieve all external links (domain filtering turned off):
external_links = web.getLinks(intern=False, extern=True, domain=False)
Page Operations
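A `Page` object works on one URL, and calls such as `getImages()` and `getVideos()` return media URLs found on it. How Scrapeasy locates them is not described here, so the following is a sketch under a stated assumption: images come from `<img src>`, and videos come from `<video src>` or from `<source src>` children of a `<video>` element. `MediaCollector` and `find_media` are hypothetical names, and only Python's standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MediaCollector(HTMLParser):
    """Collect absolute image and video URLs from one page's HTML.

    Hypothetical helper for illustration; not part of Scrapeasy.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []
        self.videos = []
        self._in_video = False  # True while between <video> and </video>

    def _add(self, bucket, src):
        if src:
            # Resolve relative paths such as "movie.mp4" against the page URL.
            bucket.append(urljoin(self.base_url, src))

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if tag == "img":
            self._add(self.images, src)
        elif tag == "video":
            self._in_video = True
            self._add(self.videos, src)
        elif tag == "source" and self._in_video:
            # <source> children carry the actual files of a <video> element;
            # <source> inside <audio> or <picture> is ignored.
            self._add(self.videos, src)

    def handle_endtag(self, tag):
        if tag == "video":
            self._in_video = False


def find_media(html, base_url):
    """Return (image_urls, video_urls) found in html."""
    collector = MediaCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.images, collector.videos
```

Self-closing tags such as `<img src="a.png"/>` are handled too, because `HTMLParser` routes them through `handle_starttag` by default.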
Initialize a page object (example from w3schools):
w3 = Page("https://www.w3schools.com/html/html5_video.asp")
Download all videos from the page:
w3.download("video", "w3/videos")
Or just get the video URLs:
video_links = w3.getVideos()
Downloading Other File Types
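Fetching "other file types" boils down to two steps: keeping the links whose path ends in a given extension, then saving each file into a folder. The sketch below shows both steps with the standard library under stated assumptions. It is not how Scrapeasy's `.get()` or `.download()` are implemented. `links_with_extension`, `download_all`, and the `fetch` callable are hypothetical; in real use `fetch` could be `lambda u: urllib.request.urlopen(u).read()`.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def links_with_extension(urls, ext):
    """Keep URLs whose path ends in .ext, case-insensitively, ignoring any ?query."""
    suffix = "." + ext.lower().lstrip(".")
    return [u for u in urls if urlparse(u).path.lower().endswith(suffix)]


def download_all(urls, folder, fetch):
    """Write each URL's content into folder, named after its last path segment.

    fetch is a hypothetical callable that returns the file's bytes for a URL.
    Files that share a name overwrite each other in this simple version.
    """
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        name = PurePosixPath(urlparse(url).path).name
        if name in ("", ".."):
            name = "index"  # fallback for URLs such as "https://example.com/"
        path = target / name
        path.write_bytes(fetch(url))
        saved.append(path)
    return saved
```

Matching on the parsed path rather than the raw URL means a link such as `Sheet2.PDF?v=3` still counts as a PDF, while `pdf-guide.html` does not.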
Retrieve links to specific file types such as PDFs, PHP files, or icons with the generic .get() method, or download them with the .download() method and a file-type argument:
calendar_links = Page("https://tikocash.com").get("php")
Download all PDFs from a page:
Page("http://mathcourses.ch/mat182.html").download("pdf", "mathcourses/pdf-files")
Conclusion
With Scrapeasy, a single line of Python is often enough to pull links, images, videos, or documents from a website, which makes it a convenient tool for quick web crawling and data mining tasks.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
