Using Residential Proxy IPs for Unrestricted Web Scraping with Python
This guide explains what residential proxy IPs are and what advantages they offer, shows how to obtain them from services such as IPIDEA, and walks through step-by-step Python code examples for integrating proxies into web-scraping tasks such as extracting Amazon product data.
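Many developers run into IP bans and rate limits when crawling websites. Residential proxy IPs can help get around these restrictions: because the IPs belong to real devices, requests look like ordinary user traffic and can be spread across many addresses. This keeps the crawler's own IP hidden and removes most per-IP restrictions.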
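A residential proxy IP is an address that an ISP has assigned to a real, physical device such as a home router. Requests routed through it reach the target site from that address, which masks the client's true IP and lets users get around geographic blocks. A proxy can also act as a firewall in front of an internal network.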
Key benefits include faster access through cached responses, enhanced privacy protection, higher download speeds by circumventing per‑IP limits, firewall capabilities, increased crawler throughput by rotating dynamic IPs, and better management of network resources.
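Rotating IPs raises crawler throughput because requests are spread across many addresses, so no single IP hits a site's per-IP limit. The following is a minimal round-robin sketch. It assumes you already have a list of `ip:port` strings from your provider. The pool entries (documentation-range IPs) and the helper name `next_proxies` are hypothetical. Only the shape of the returned `proxies` dict is the one `requests` expects.

```python
import itertools

# Hypothetical pool of proxies ("ip:port"), e.g. parsed from a provider's extraction API
PROXY_POOL = ["203.0.113.10:8000", "203.0.113.11:8000", "203.0.113.12:8000"]

# cycle() yields the pool entries in order forever: 1st, 2nd, 3rd, 1st, ...
_rotation = itertools.cycle(PROXY_POOL)


def next_proxies():
    """Return a requests-style proxies dict for the next proxy in the pool."""
    proxy_url = f"http://{next(_rotation)}"
    # The same HTTP proxy carries both plain HTTP and tunnelled HTTPS traffic
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    # Four calls wrap around the three-entry pool back to the first proxy
    for _ in range(4):
        print(next_proxies()["http"])
```

Each request would then pass a fresh dict, for example `requests.get(url, proxies=next_proxies(), timeout=5)`. Round-robin is the simplest policy. A real crawler might also drop proxies that keep failing.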
Proxy providers such as IPIDEA offer large pools of residential IPs. IPIDEA advertises more than 90 million real residential proxies covering 220+ countries, with support for HTTP, HTTPS and SOCKS5 and an emphasis on high availability and security.
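With `requests`, the proxy protocol is selected by the scheme of the proxy URL. The following sketch uses a hypothetical helper, `build_proxies`, and makes two assumptions:

- The provider's HTTP/HTTPS support means a standard HTTP proxy that tunnels HTTPS targets. The article's own scripts rely on this, since they use an `http://` proxy URL for `https` targets.
- Optional username/password authentication is embedded in the URL.

SOCKS URLs only work once the optional SOCKS extra is installed (`pip install requests[socks]`). The `socks5h` scheme makes the proxy, not your machine, resolve hostnames.

```python
from urllib.parse import quote

SUPPORTED_SCHEMES = {"http", "socks5", "socks5h"}


def build_proxies(host, port, scheme="http", username=None, password=None):
    """Build a requests-style proxies dict for one proxy endpoint (hypothetical helper).

    scheme: "http" for an HTTP proxy (also tunnels HTTPS targets),
            "socks5" / "socks5h" for SOCKS5 (requires `pip install requests[socks]`;
            socks5h resolves DNS on the proxy side).
    """
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {scheme}")
    auth = ""
    if username is not None:
        # Percent-encode credentials so characters like '@' or ':' don't break the URL
        auth = f"{quote(username, safe='')}:{quote(password or '', safe='')}@"
    proxy_url = f"{scheme}://{auth}{host}:{port}"
    # The same proxy URL is used for both http:// and https:// target URLs
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    print(build_proxies("203.0.113.10", 1080, scheme="socks5h"))
```

The printed dict is `{'http': 'socks5h://203.0.113.10:1080', 'https': 'socks5h://203.0.113.10:1080'}`. You can pass it directly as the `proxies=` argument of `requests.get`.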
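In practice, you first generate an API (extraction) link in the provider's dashboard, then test the proxies it returns. The script below calls that link and parses the JSON list of proxies it returns. It then starts one thread per proxy, and each thread requests an IP-echo service through its proxy and prints the response status and body. The loop repeats with a fresh batch every second.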
```python
#!/usr/bin/env python
# coding=utf-8
import json
import threading
import time

import requests as rq

# Browser-like request headers sent with each test request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    # "br" omitted: requests only decodes Brotli if the optional brotli package is installed
    "Accept-Encoding": "gzip, deflate",
}

# IP-echo service: the response shows which IP address the target site sees
testUrl = 'https://api.myip.la/en?json'


def testPost(host, port):
    """Send one GET request through the proxy host:port and print the result."""
    proxies = {
        # An HTTP proxy handles plain HTTP and tunnels HTTPS as well
        'http': f'http://{host}:{port}',
        'https': f'http://{host}:{port}',
    }
    try:
        res = rq.get(testUrl, headers=headers, proxies=proxies, timeout=5)
        print(res.status_code, "***", res.text)
    except rq.RequestException as e:
        print(f"{host}:{port} failed: {e}")


class ThreadFactory(threading.Thread):
    """Worker thread that tests a single proxy."""

    def __init__(self, host, port):
        super().__init__()
        self.host = host
        self.port = port

    def run(self):
        testPost(self.host, self.port)


# Extraction API link copied from the IPIDEA dashboard (parameters elided)
tiqu = 'http://api.proxy.ipidea.io/getProxyIp...'

while True:
    try:
        resp = rq.get(url=tiqu, timeout=5)
        resp.raise_for_status()
        dataBean = json.loads(resp.text)
    except (rq.RequestException, ValueError):
        print("Failed to fetch the proxy list")
        time.sleep(1)
        continue

    # The script treats code == 0 as success; "data" holds {"ip": ..., "port": ...} entries
    if dataBean.get("code") == 0:
        threads = [ThreadFactory(p["ip"], p["port"]) for p in dataBean["data"]]
        for t in threads:
            t.start()
            time.sleep(0.01)  # stagger thread start-up slightly
        for t in threads:
            t.join()
    time.sleep(1)
```

A second example configures the requests library with a proxy obtained from IPIDEA and sends a request to httpbin.org to confirm that it works. The article then uses the same proxy to fetch Amazon pages and extract product names and prices via XPath.
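```python
import requests

url = 'https://httpbin.org/get'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0"}

# Extraction API link from the IPIDEA dashboard; num=1 and return_type=json appear to request one proxy as JSON
api_url = 'http://api.proxy.ipidea.io/getProxyIp?num=1&return_type=json&lb=1&sb=0&flow=1'
res = requests.post(api_url, timeout=10)
res.raise_for_status()
ip_port = res.json()
if ip_port.get('code') != 0 or not ip_port.get('data'):
    raise SystemExit(f"Proxy extraction failed: {ip_port}")

# Build the proxy URL from the first (and only) entry in "data"
first = ip_port['data'][0]
proxy_url = f"http://{first['ip']}:{first['port']}"
proxies = {'http': proxy_url, 'https': proxy_url}

# httpbin echoes the request; its "origin" field should show the proxy's IP, not yours
html = requests.get(url=url, headers=headers, proxies=proxies, timeout=10).text
print(html)
```

Finally, the article crawls Amazon China for laptop names and prices. It sends the requests through the proxy, parses the HTML with lxml.etree, and prints the results. The author presents this as evidence that residential proxies make large-scale data extraction reliable.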
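The article's Amazon code itself is not reproduced here. The following sketch shows the parsing step only and assumes the page HTML has already been fetched through the proxy as in the previous example. The XPath expressions, the class names `product`, `product-title` and `price`, and the function name `parse_products` are hypothetical placeholders, because the source does not give Amazon's real markup and that markup changes often. Only the use of `lxml.etree.HTML` with `.xpath()` reflects the approach the article describes.

```python
from lxml import etree

# Hypothetical XPath selectors: inspect the real page and replace these
ITEM_XPATH = '//div[@class="product"]'
NAME_XPATH = './/span[@class="product-title"]/text()'
PRICE_XPATH = './/span[@class="price"]/text()'


def parse_products(html):
    """Return (name, price) pairs from a product listing page."""
    tree = etree.HTML(html)
    results = []
    for item in tree.xpath(ITEM_XPATH):
        # Relative XPath ('.//') searches only inside the current product node,
        # which keeps each name paired with its own price
        names = item.xpath(NAME_XPATH)
        prices = item.xpath(PRICE_XPATH)
        if names and prices:  # skip cards missing either field
            results.append((names[0].strip(), prices[0].strip()))
    return results


if __name__ == "__main__":
    sample = """
    <html><body>
      <div class="product"><span class="product-title"> Laptop A </span><span class="price">4999.00</span></div>
      <div class="product"><span class="product-title">Laptop B</span></div>
    </body></html>
    """
    for name, price in parse_products(sample):
        print(name, price)
```

In a real crawl, the `html` argument would come from `requests.get(page_url, headers=headers, proxies=proxies, timeout=10).text`. Extracting per product node avoids the misalignment that two independent page-wide name and price lists can suffer when some listings lack a price.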
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.