Using Proxy IPs for Web Scraping with Python: A Practical Guide
This article explains why proxy IPs are essential for reliable web crawling, compares dynamic and static residential proxies, and provides step‑by‑step Python code to scrape product titles, prices and links from Snapdeal while demonstrating how to integrate proxies for improved efficiency and security.
In the digital era, data is a core resource and web crawlers are essential for market analysis and research, but high‑frequency requests often trigger IP blocking.
Proxy IPs distribute requests across multiple addresses, bypassing rate limits, hiding the real IP, and improving crawl efficiency and data security.
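As a sketch of how requests might be distributed across multiple addresses, the following cycles each outgoing request through a pool of proxy endpoints. The hostnames and credentials are placeholders, not a real provider's addresses:

```python
import itertools

# Hypothetical pool of proxy endpoints (placeholder credentials and hosts)
PROXY_POOL = [
    "http://user:[email protected]:8000",
    "http://user:[email protected]:8000",
    "http://user:[email protected]:8000",
]

def proxy_rotator(pool):
    """Yield proxy mappings for requests, cycling through the pool so
    consecutive requests leave from different addresses."""
    for endpoint in itertools.cycle(pool):
        yield {"http": endpoint, "https": endpoint}

# Each next() call returns the proxies mapping for the next request
rotation = proxy_rotator(PROXY_POOL)
first = next(rotation)
second = next(rotation)
```

Passing a fresh mapping from the rotator to each request spreads the crawl's traffic so that no single address accumulates enough requests to trip a rate limit.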
The article describes the advantages of proxy IP services, how to obtain an account, and the difference between dynamic residential proxies and static residential proxies.
A practical example demonstrates scraping product titles, prices and links from the Snapdeal e‑commerce site using Python's requests and BeautifulSoup libraries.
Step‑by‑step code shows importing libraries, setting request headers, optionally configuring a proxy, fetching the page, parsing the HTML, extracting product information with find_all, and printing the results.
```python
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent so the request is less likely to be rejected
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

url = 'https://www.snapdeal.com/search?keyword=iPhone%2016&...'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

def extract_product_info():
    products = []
    # Each product card on the search results page uses this class
    product_elements = soup.find_all('div', class_='product-tuple-listing')
    for product in product_elements:
        title_tag = product.find('p', class_='product-title')
        price_tag = product.find('span', class_='lfloat product-price')
        link_tag = product.find('a', href=True)
        title = title_tag.text.strip() if title_tag else None
        price = price_tag.text.strip() if price_tag else None
        link = link_tag['href'] if link_tag else None
        # Keep only complete records
        if title and price and link:
            products.append({'title': title, 'price': price,
                             'link': f'https://www.snapdeal.com{link}'})
    return products

products = extract_product_info()
for p in products:
    print(f"Title: {p['title']}")
    print(f"Price: {p['price']}")
    print(f"Link: {p['link']}")
    print('-' * 40)
```

To use a proxy, the script can be modified as follows:
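Beyond printing, the extracted records can be persisted. A minimal sketch writing the same title/price/link dictionaries to CSV follows; the sample product row is illustrative, not scraped data:

```python
import csv
import io

# Illustrative record in the same shape extract_product_info() produces
products = [
    {"title": "iPhone 16 128GB", "price": "Rs. 79,999",
     "link": "https://www.snapdeal.com/product/example"},
]

# Write to an in-memory buffer; swap in open('products.csv', 'w', newline='')
# to write an actual file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "price", "link"])
writer.writeheader()
writer.writerows(products)
csv_text = buffer.getvalue()
```

Using `csv.DictWriter` keeps the column order explicit and fails loudly if a record is missing one of the expected keys.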
```python
# Proxy endpoint in the form http://username:password@host:port
proxyip = "http://username:[email protected]:7878"
# Route both HTTP and HTTPS traffic through the proxy
proxies = {'http': proxyip, 'https': proxyip}
response = requests.get(url, headers=headers, proxies=proxies, verify=False)
```

Note that `verify=False` disables TLS certificate verification; some proxy setups require it, but it weakens security and should be omitted when the proxy supports proper certificates. The conclusion emphasizes that proxy IPs are vital for efficient and secure web data collection, especially when crawling e‑commerce platforms like Snapdeal.
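As a final practical note, a single fixed proxy can itself be blocked on a long crawl. The sketch below retries a request through randomly chosen endpoints; the proxy hostnames are placeholders, and `fetch_with_proxy` is an illustrative helper, not part of any library:

```python
import random
import requests

# Placeholder endpoints; substitute real proxy service credentials
PROXY_ENDPOINTS = [
    "http://username:[email protected]:7878",
    "http://username:[email protected]:7878",
]

def build_proxies(endpoint):
    """Build the mapping requests expects, covering both schemes."""
    return {"http": endpoint, "https": endpoint}

def fetch_with_proxy(url, headers, max_retries=3):
    """Try the request through a randomly chosen proxy, falling back to
    another endpoint when a request fails."""
    last_error = None
    for _ in range(max_retries):
        endpoint = random.choice(PROXY_ENDPOINTS)
        try:
            response = requests.get(url, headers=headers,
                                    proxies=build_proxies(endpoint),
                                    timeout=10)
            response.raise_for_status()
            return response
        except requests.RequestException as exc:
            last_error = exc
    raise last_error
```

Setting a `timeout` matters here: a dead proxy would otherwise hang the crawl instead of triggering the fallback to the next endpoint.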
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.