Python Project for Simulating Login and Web Scraping Across Multiple Websites
This article introduces a Python project that demonstrates how to log into and scrape data from 18 major websites, including Facebook, Twitter, Zhihu, and Bilibili. It uses Selenium, direct HTTP requests, and cookie management, and it provides code examples along with plans for future improvement.
The project aims to help beginners acquire additional data for machine learning tasks by automating login and web scraping on popular platforms. It covers login techniques ranging from direct HTTP authentication to Selenium WebDriver, and emphasizes reusing the cookies obtained at login so that data can then be collected efficiently with tools such as requests or scrapy.
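The cookie hand-off described above can be sketched as follows. This is a minimal illustration, not the project's own code: it assumes the cookies arrive as a list of dicts with `name` and `value` keys (the shape Selenium's `driver.get_cookies()` returns), and the helper names `cookies_to_dict`, `save_cookies`, `load_cookies`, and `fetch_with_cookies` are hypothetical.

```python
import json
from pathlib import Path


def cookies_to_dict(selenium_cookies):
    """Flatten Selenium-style cookie records into a name -> value mapping."""
    return {c["name"]: c["value"] for c in selenium_cookies}


def save_cookies(cookies, path):
    """Persist the mapping as JSON so a later run can skip the login step."""
    Path(path).write_text(json.dumps(cookies), encoding="utf-8")


def load_cookies(path):
    """Read back a mapping written by save_cookies."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def fetch_with_cookies(url, cookies):
    """Fetch a page as the logged-in user (needs the third-party requests package)."""
    import requests  # imported lazily so the helpers above stay stdlib-only

    session = requests.Session()
    session.cookies.update(cookies)  # attach the saved login cookies
    return session.get(url, timeout=10)
```

A plain name-to-value dict is used because both `requests` (the `cookies=` argument) and Scrapy (`scrapy.Request(url, cookies=...)`) accept that form. It drops domain and expiry information, which is usually acceptable when the cookies are only replayed against the site that issued them.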
The 18 supported sites are Facebook, Twitter (frontend API without authentication), Weibo, Zhihu, QQZone, CSDN, Taobao, Baidu, Guokr, JingDong, 163mail, Lagou, Bilibili, Douban, Baidu2, Liepin, WeChat Web, and GitHub. The project also includes an image-crawling example for TuChong.
The article shows a practical demonstration where, after satisfying dependencies, the code can download images from the TuChong website based on a search term (e.g., "autumn"). Screenshots illustrate the search results and the downloaded images.
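The download step of such an image crawler might look like the sketch below. It deliberately leaves out how the image URLs are obtained, since TuChong's search endpoint is not described here; the function names, the `fetch` parameter, and the fallback file name are all assumptions made for illustration.

```python
import os
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def filename_for(url, index):
    """Pick a local file name from the URL path, falling back to an index."""
    name = os.path.basename(urlparse(url).path)
    return name or f"image_{index}.jpg"


def http_fetch(url):
    """Download the raw bytes behind a URL with a browser-like User-Agent."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return response.read()


def download_images(urls, out_dir, fetch=http_fetch):
    """Save every URL into out_dir and return the list of written paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        path = os.path.join(out_dir, filename_for(url, index))
        with open(path, "wb") as handle:
            handle.write(fetch(url))
        saved.append(path)
    return saved
```

Passing `fetch` as a parameter keeps the file-handling logic separate from the network call, so the same loop works with `requests`, a logged-in session, or a stub during testing.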
For Douban, the article highlights the main login function that handles captcha retrieval, solving, and cookie preservation. It also displays the captcha‑handling function as an image.
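The three steps named above (fetch the captcha, solve it, keep the cookies) can be sketched in a generic form. This is not the repository's Douban function: the URLs and form field names are placeholders supplied by the caller, `login_with_captcha` is a hypothetical name, and `session` is assumed to behave like a `requests.Session`.

```python
def login_with_captcha(session, login_url, captcha_url, credentials, solve_captcha):
    """Log in through `session`, solving a captcha first if one is offered.

    All URLs and form field names are caller-supplied placeholders,
    not Douban's real ones. `solve_captcha` receives the image bytes
    and returns the text to submit (for example by asking the user).
    """
    form = dict(credentials)  # copy so the caller's dict is not modified
    if captcha_url:
        image_bytes = session.get(captcha_url, timeout=10).content
        form["captcha"] = solve_captcha(image_bytes)  # hypothetical field name
    response = session.post(login_url, data=form, timeout=10)
    response.raise_for_status()
    # The session stores any cookies the server set, so returning them
    # lets the caller save the login state for later scraping runs.
    return session.cookies.get_dict()
```

Taking `solve_captcha` as a callable leaves the choice of solver open: it could display the image and read the answer from the keyboard, or hand the bytes to an OCR routine.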
Finally, the author notes that the GitHub repository contains more examples and invites users to report broken login rules through Issues or to fix them through Pull Requests. Planned future work includes refactoring for better code style, extensibility, and readability; community contributions that add support for more sites are also welcome.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.