Using Residential Proxy IPs for Unrestricted Web Scraping with Python

This guide explains what residential proxy IPs are, their advantages, how to obtain them from services like IPIDEA, and provides step‑by‑step Python code examples for integrating proxies into web‑scraping tasks such as extracting Amazon product data.

php Courses

Sep 6, 2022

Many developers run into IP bans and rate limits when crawling websites. Residential proxy IPs help work around these restrictions: requests leave from addresses assigned to real devices, so the crawler's own IP stays hidden and a block on one address does not stop the whole job.

A residential proxy IP is an address linked to a physical device. Routing traffic through it masks the client's true IP, which lets users get past geographic blocks; like any proxy, it can also sit between an internal network and the internet and act as a filtering layer.

Key benefits include faster access when the proxy serves cached responses, stronger privacy protection, higher aggregate download speeds because per‑IP limits apply to each proxy address separately, firewall‑style filtering, higher crawler throughput from rotating dynamic IPs, and better management of network resources.
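The rotation idea mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the provider's mechanism: the addresses in `PROXY_POOL` are hypothetical placeholders from a documentation range, and `proxy_rotation` is a name invented here. Each value it yields has the shape that the `proxies=` argument of `requests.get` expects.

```python
from itertools import cycle

# Hypothetical proxy endpoints; real ones would come from the provider's API.
PROXY_POOL = [
    "http://203.0.113.10:8000",
    "http://203.0.113.11:8000",
    "http://203.0.113.12:8000",
]


def proxy_rotation(pool):
    """Yield requests-style proxies dicts, cycling through the pool forever."""
    for proxy_url in cycle(pool):
        # The same HTTP proxy carries both plain HTTP and HTTPS traffic.
        yield {"http": proxy_url, "https": proxy_url}


rotation = proxy_rotation(PROXY_POOL)

# Take four values to show that the fourth wraps around to the first proxy.
first_four = [next(rotation)["http"] for _ in range(4)]
print(first_four)
```

Calling `next(rotation)` before each request spreads traffic evenly over the pool, so no single address absorbs every request.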

Proxy services such as IPIDEA offer large pools of residential IPs. The platform advertises more than 90 million real residential proxies across 220+ countries, supports HTTP, HTTPS, and SOCKS5, and promotes high availability and security.
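How the protocol choice shows up in client code can be sketched as follows. The helper `build_proxies` and the sample address are assumptions made for illustration, and the sketch covers only `http` and `socks5` proxy URLs. With the requests library, SOCKS proxies additionally need the optional PySocks dependency (installed via `pip install "requests[socks]"`).

```python
# Proxy URL schemes this sketch accepts; chosen for illustration only.
SUPPORTED_SCHEMES = {"http", "socks5"}


def build_proxies(host, port, scheme="http"):
    """Return a requests-style proxies dict for one proxy endpoint."""
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {scheme}")
    proxy_url = f"{scheme}://{host}:{port}"
    # Both target protocols are routed through the same proxy endpoint.
    return {"http": proxy_url, "https": proxy_url}


print(build_proxies("203.0.113.10", 8000))
print(build_proxies("203.0.113.10", 1080, scheme="socks5"))
```

Centralizing the dictionary construction keeps the scheme in one place, so switching a crawler from HTTP to SOCKS5 proxies is a one-argument change.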

Practical usage begins with obtaining an API link from the provider, then testing the proxy with a simple Python script that sends a request through the proxy and prints the response status and content.

#!/usr/bin/env python
# coding=utf-8
import json
import threading
import time
import requests as rq

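# Browser-like headers sent with every test request.
# "br" is left out of Accept-Encoding: requests can only decode Brotli
# when an optional Brotli package is installed.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    "Accept-Encoding": "gzip, deflate"
}

# Endpoint that reports the IP address it sees.
testUrl = 'https://api.myip.la/en?json'

def testPost(host, port):
    """Send one GET request through the proxy and print the status and body."""
    proxies = {
        'http': f'http://{host}:{port}',
        'https': f'http://{host}:{port}'
    }
    try:
        res = rq.get(testUrl, headers=headers, proxies=proxies, timeout=5)
        print(res.status_code, "***", res.text)
    except rq.RequestException as e:
        # A dead or slow proxy ends up here; report it and move on.
        print(e)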

class ThreadFactory(threading.Thread):
    def __init__(self, host, port):
        threading.Thread.__init__(self)
        self.host = host
        self.port = port
    def run(self):
        testPost(self.host, self.port)

# "tiqu" is the provider's extraction (API) link.
tiqu = 'http://api.proxy.ipidea.io/getProxyIp...'
while True:
    try:
        # Network errors and invalid JSON are both handled by the except clause.
        resp = rq.get(url=tiqu, timeout=5)
        if resp.status_code != 200:
            print("Failed to fetch proxies")
            time.sleep(1)
            continue
        dataBean = json.loads(resp.text)
    except (rq.RequestException, ValueError):
        print("Failed to fetch proxies")
        time.sleep(1)
        continue
    if dataBean["code"] == 0:
        # Test every returned proxy in its own thread.
        threads = []
        for proxy in dataBean["data"]:
            threads.append(ThreadFactory(proxy["ip"], proxy["port"]))
        for t in threads:
            t.start()
            time.sleep(0.01)
        for t in threads:
            t.join()
    time.sleep(1)
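The script above mixes fetching, parsing, and threading in one loop. As an alternative sketch, the parsing step and the concurrent check can be separated so that each can be exercised without network access. The JSON shape (`code`, `data`, `ip`, `port`) is taken from the script above; `parse_proxy_list`, `check_all`, the sample payload, and the stand-in check are hypothetical names and data introduced here, and `ThreadPoolExecutor` is swapped in for the hand-written thread class.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def parse_proxy_list(payload_text):
    """Return 'http://ip:port' strings from the API's JSON text, or [] on failure."""
    try:
        payload = json.loads(payload_text)
    except ValueError:
        return []
    # The script above treats code == 0 as success.
    if not isinstance(payload, dict) or payload.get("code") != 0:
        return []
    return [f"http://{item['ip']}:{item['port']}" for item in payload.get("data", [])]


def check_all(proxy_urls, check, max_workers=8):
    """Run check(proxy_url) concurrently and return {proxy_url: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(check, proxy_urls))
    return dict(zip(proxy_urls, results))


# Sample payload in the shape the script above reads; the addresses are made up.
sample = '{"code": 0, "data": [{"ip": "203.0.113.10", "port": 8000}, {"ip": "203.0.113.11", "port": 8001}]}'
urls = parse_proxy_list(sample)

# Stand-in check; a real one would send a request through the proxy.
report = check_all(urls, check=lambda url: url.endswith(":8000"))
print(report)
```

In real use, the `check` argument would be a function that sends a request through the proxy, along the lines of `testPost`, and returns whether it succeeded.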

Another example configures the requests library with a proxy obtained from the IPIDEA API and sends a request to httpbin.org through it. The same proxies dictionary can then be used to fetch Amazon pages and extract product names and prices via XPath.

import requests

url = 'https://httpbin.org/get'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0"}
api_url = 'http://api.proxy.ipidea.io/getProxyIp?num=1&return_type=json&lb=1&sb=0&flow=1'

# Ask the provider API for one proxy (GET, as in the first script).
res = requests.get(api_url, timeout=5)
ip_port = res.json()
first = ip_port['data'][0]
proxy_url = f"http://{first['ip']}:{first['port']}"
proxies = {'http': proxy_url, 'https': proxy_url}

# Fetch the test page through the proxy; httpbin echoes the request details.
html = requests.get(url=url, headers=headers, proxies=proxies, timeout=10).text
print(html)
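The httpbin.org response printed above is JSON, and its `origin` field reports the address the server saw. A small sketch of using that field to confirm the proxy is in effect follows. The helper names and the sample bodies are hypothetical, and the check assumes the machine's real public IP is already known.

```python
import json


def origin_ips(httpbin_body):
    """Return the addresses listed in the 'origin' field of an httpbin /get body."""
    origin = json.loads(httpbin_body).get("origin", "")
    # The field can hold several comma-separated addresses.
    return [part.strip() for part in origin.split(",") if part.strip()]


def hides_real_ip(httpbin_body, real_ip):
    """True if the server saw at least one address and none of them is real_ip."""
    ips = origin_ips(httpbin_body)
    return bool(ips) and real_ip not in ips


# Made-up response bodies: one seen through a proxy, one that leaks the client IP.
proxied_body = '{"origin": "203.0.113.10", "url": "https://httpbin.org/get"}'
leaky_body = '{"origin": "198.51.100.7, 203.0.113.10", "url": "https://httpbin.org/get"}'

print(hides_real_ip(proxied_body, real_ip="198.51.100.7"))
print(hides_real_ip(leaky_body, real_ip="198.51.100.7"))
```

Running a check like this once per proxy before a crawl helps catch proxies that forward the client address.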

Finally, the article crawls Amazon China for laptop names and prices by sending requests through the proxy, parsing the HTML with lxml.etree, and printing the results, illustrating how residential proxies support larger‑scale data extraction.
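The parsing step can be sketched with `lxml.etree` and XPath as follows. This is not the article's original crawler: the markup in `SAMPLE_HTML` and the class names `product`, `name`, and `price` are invented for illustration, and real Amazon pages use different, frequently changing markup. In practice the HTML string would come from a `requests.get(..., proxies=proxies)` call.

```python
from lxml import etree

# Invented markup standing in for a product listing page.
SAMPLE_HTML = """
<html><body>
  <div class="product"><h2 class="name">Laptop A</h2><span class="price">4999.00</span></div>
  <div class="product"><h2 class="name">Laptop B</h2><span class="price">6299.00</span></div>
  <div class="product"><h2 class="name">Laptop C</h2></div>
</body></html>
"""


def extract_products(html_text):
    """Return (name, price) tuples; price is None when the listing has none."""
    tree = etree.HTML(html_text)
    products = []
    for node in tree.xpath('//div[@class="product"]'):
        # string(...) yields "" when the element is missing, so no IndexError.
        name = node.xpath('string(.//h2[@class="name"])').strip()
        price = node.xpath('string(.//span[@class="price"])').strip()
        products.append((name, price or None))
    return products


products = extract_products(SAMPLE_HTML)
for name, price in products:
    print(name, price)
```

Selecting each product container first and then querying inside it keeps a name paired with its own price even when some listings lack one.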
