Accessing Login‑Protected Pages with Cookies, urllib, requests, and Selenium in Python
This article demonstrates how to programmatically access web pages that are only visible after a user logs in, using Python. Four approaches are covered: directly reusing a known cookie, simulating the login process with urllib or requests, keeping a logged‑in session, and driving a headless browser with Selenium.
Method 1 – Directly use a known cookie
After logging in with a browser, copy the JSESSIONID and iPlanetDirectoryPro values from the request headers. Include these cookies in subsequent HTTP requests to impersonate the logged‑in user.
Steps:
Log in via browser and open developer tools.
Locate the request headers for the target URL and copy the cookie string.
Insert the cookie string into your Python code.
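Before the string goes into a request, it can help to split it into name/value pairs. The sketch below is one possible way to do that with the standard library's http.cookies.SimpleCookie; the helper name parse_cookie_header and the cookie values are placeholders invented for illustration, not part of the original article.

```python
from http.cookies import SimpleCookie


def parse_cookie_header(cookie_str):
    """Turn 'name1=value1; name2=value2' into a {name: value} dict."""
    jar = SimpleCookie()
    jar.load(cookie_str)
    return {name: morsel.value for name, morsel in jar.items()}


# Placeholder values; real ones come from the browser's request headers.
cookies = parse_cookie_header("JSESSIONID=abc123; iPlanetDirectoryPro=xyz789")
print(cookies)
```

The resulting dict can be passed as the cookies argument of requests.get, while the raw string can still be sent unchanged in a Cookie header with urllib.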
Code example (urllib):
import sys
import io
from urllib import request
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')
url = 'http://ssfw.xmu.edu.cn/cmstar/index.portal'
cookie_str = r'JSESSIONID=xxxxxxxxxxxxxxxxxxxxxx; iPlanetDirectoryPro=xxxxxxxxxxxxxxxxxx'
req = request.Request(url)
req.add_header('cookie', cookie_str)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36')
resp = request.urlopen(req)
print(resp.read().decode('utf-8'))
Code example (requests):
import requests
import sys
import io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')
url = 'http://ssfw.xmu.edu.cn/cmstar/index.portal'
cookie_str = r'JSESSIONID=xxxxxxxxxxxxxxxxxxxxxx; iPlanetDirectoryPro=xxxxxxxxxxxxxxxxxx'
cookies = {}
for line in cookie_str.split(';'):
    key, value = line.split('=', 1)
    cookies[key.strip()] = value.strip()
headers = {'User-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
resp = requests.get(url, headers=headers, cookies=cookies)
print(resp.content.decode('utf-8'))
Method 2 – Simulate login and reuse the returned cookie
Send a POST request with the login form data (username, password, etc.) to obtain a fresh cookie, then use that cookie for subsequent requests.
Key steps:
Identify the form action URL using the browser’s network panel.
Collect all required form fields from the "Form Data" section.
Post the data with urllib or requests and capture the Set‑Cookie header.
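The flow described in these steps can be tried without touching a real site. The following is a minimal sketch, assuming a stand-in server started locally with http.server: the /login and /protected paths, the demo credentials, and the cookie value are all hypothetical. Only the Login.Token1 and Login.Token2 field names are borrowed from the article's form. It shows a cookie-aware urllib opener storing the Set-Cookie value from the login POST and sending it back on the next request.

```python
import http.cookiejar
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials and cookie value used only by this demo server.
VALID_USER, VALID_PASSWORD = "demo_user", "demo_password"
SESSION_COOKIE = "JSESSIONID=demo-session-id"


class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in portal: any POST checks credentials, any GET requires the cookie."""

    def _reply(self, status, body, extra_headers=()):
        payload = body.encode("utf-8")
        self.send_response(status)
        for name, value in extra_headers:
            self.send_header(name, value)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_POST(self):
        # Parse the URL-encoded form body and check the two login fields.
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode("utf-8"))
        ok = (form.get("Login.Token1") == [VALID_USER]
              and form.get("Login.Token2") == [VALID_PASSWORD])
        if ok:
            self._reply(200, "login ok", [("Set-Cookie", SESSION_COOKIE + "; Path=/")])
        else:
            self._reply(403, "login failed")

    def do_GET(self):
        # Serve the page only when the session cookie is sent back.
        if SESSION_COOKIE in self.headers.get("Cookie", ""):
            self._reply(200, "protected content")
        else:
            self._reply(401, "please log in")

    def log_message(self, *args):  # keep the demo output quiet
        pass


def fetch_protected(base_url, username, password):
    """Log in with a cookie-aware opener, then fetch the protected page."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    body = urllib.parse.urlencode(
        {"Login.Token1": username, "Login.Token2": password}
    ).encode("utf-8")
    opener.open(base_url + "/login", data=body).close()  # data= makes this a POST
    with opener.open(base_url + "/protected") as resp:
        page = resp.read().decode("utf-8")
    return page, [cookie.name for cookie in jar]


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = "http://127.0.0.1:%d" % server.server_port

page, cookie_names = fetch_protected(base_url, VALID_USER, VALID_PASSWORD)
print(page)
print(cookie_names)

server.shutdown()
server.server_close()
```

Against a real portal the same opener pattern applies, but the login URL and the full set of form fields have to be taken from the browser's network panel, as the steps above describe.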
Code example (urllib):
import sys
import io
import urllib.parse
import urllib.request
import http.cookiejar
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')
login_url = 'http://ssfw.xmu.edu.cn/cmstar/userPasswordValidate.portal'
# Field names and values as listed in the browser's "Form Data" panel
post_data = urllib.parse.urlencode({
    'Login.Token1': 'your_student_id',
    'Login.Token2': 'your_password',
    'goto': 'http://ssfw.xmu.edu.cn/cmstar/loginSuccess.portal',
    'gotoOnFail': 'http://ssfw.xmu.edu.cn/cmstar/loginFailure.portal'
}).encode('utf-8')
headers = {'User-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
req = urllib.request.Request(login_url, data=post_data, headers=headers)
cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie_jar))
resp = opener.open(req)
# subsequent request using the same opener retains the cookie
url = 'http://ssfw.xmu.edu.cn/cmstar/index.portal'
req = urllib.request.Request(url, headers=headers)
resp = opener.open(req)
print(resp.read().decode('utf-8'))
Code example (requests):
import requests
import sys
import io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')
login_url = 'http://ssfw.xmu.edu.cn/cmstar/userPasswordValidate.portal'
# Field names and values as listed in the browser's "Form Data" panel
data = {
    'Login.Token1': 'your_student_id',
    'Login.Token2': 'your_password',
    'goto': 'http://ssfw.xmu.edu.cn/cmstar/loginSuccess.portal',
    'gotoOnFail': 'http://ssfw.xmu.edu.cn/cmstar/loginFailure.portal'
}
headers = {'User-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
session = requests.Session()
resp = session.post(login_url, data=data, headers=headers)
url = 'http://ssfw.xmu.edu.cn/cmstar/index.portal'
resp = session.get(url, headers=headers)
print(resp.content.decode('utf-8'))
Method 3 – Use a requests Session to keep login state
A Session object automatically stores cookies after the login POST, allowing later GET requests to be authenticated without manually handling cookies.
Code example (requests):
import requests
import sys
import io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')
login_url = 'http://ssfw.xmu.edu.cn/cmstar/userPasswordValidate.portal'
# Field names and values as listed in the browser's "Form Data" panel
data = {
    'Login.Token1': 'your_student_id',
    'Login.Token2': 'your_password',
    'goto': 'http://ssfw.xmu.edu.cn/cmstar/loginSuccess.portal',
    'gotoOnFail': 'http://ssfw.xmu.edu.cn/cmstar/loginFailure.portal'
}
headers = {'User-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
session = requests.Session()
session.post(login_url, data=data, headers=headers)
url = 'http://ssfw.xmu.edu.cn/cmstar/index.portal'
resp = session.get(url, headers=headers)
print(resp.content.decode('utf-8'))
Method 4 – Use a headless browser (Selenium + PhantomJS)
Driving a real browser eliminates the need to reverse‑engineer request parameters; the script can fill the login form, submit it, and then retrieve the protected page.
Steps:
Install selenium and a headless driver such as PhantomJS.
Locate the username and password fields and the login button in the page source.
Use Selenium’s element‑finding methods to interact with those controls.
After login, capture the page source or take a screenshot.
Code example:
import sys
import io
from selenium import webdriver
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')
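# Path to the PhantomJS executable.
# Note: webdriver.PhantomJS and the find_element_by_* helpers were removed in the Selenium 4 series, so this example needs Selenium 3.x.
browser = webdriver.PhantomJS('d:/tool/07-net/phantomjs-windows/phantomjs-2.1.1-windows/bin/phantomjs.exe')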
url = r'http://ssfw.xmu.edu.cn/cmstar/index.portal'
browser.get(url)
browser.implicitly_wait(3)
username = browser.find_element_by_name('user')
username.send_keys('your_student_id')
password = browser.find_element_by_name('pwd')
password.send_keys('your_password')
student = browser.find_element_by_xpath('//input[@value="student"]')
student.click()
login_button = browser.find_element_by_name('btn')
login_button.submit()
# Optional: screenshot and page source
browser.save_screenshot('picture1.png')
print(browser.page_source)
browser.quit()
All four methods achieve the same goal: they retrieve content that is normally hidden behind a login, which lets developers automate data collection from protected sites.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
