Simulating Login to a Web Portal Using Python urllib2 and cookielib
This tutorial demonstrates how to programmatically log into a website by handling cookies, fetching a captcha for manual entry, and posting the required form data with Python's urllib2 and cookielib modules, using a complete example that targets a school academic system.
Web crawlers often run into pages that require authentication. This article shows how to simulate a login with Python 2's urllib2 and cookielib modules (in Python 3 they became urllib.request and http.cookiejar), using a login to a school academic system as the example.
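First, note that websites use cookies to identify users and maintain sessions. We therefore store cookies in a cookielib.CookieJar and attach it to an opener through urllib2.HTTPCookieProcessor. Cookies received in any response are then stored automatically and sent back with every later request to the same site.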
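The following shows what "automatically" means here. HTTPCookieProcessor calls CookieJar.extract_cookies on each response and CookieJar.add_cookie_header on each request. The sketch below is in Python 3, where cookielib is http.cookiejar and urllib2 is urllib.request. It makes those two calls by hand, with no network access. The FakeResponse class and the ASP.NET_SessionId cookie value are hypothetical stand-ins for what the server would actually send.

```python
import email
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; CookieJar only needs .info()."""

    def __init__(self, raw_headers):
        self._headers = email.message_from_string(raw_headers + "\n\n")

    def info(self):
        return self._headers


def cookie_round_trip():
    jar = http.cookiejar.CookieJar()

    # 1. The response to the captcha request sets a session cookie (hypothetical value).
    captcha_req = urllib.request.Request("http://202.115.80.153/CheckCode.aspx")
    response = FakeResponse("Set-Cookie: ASP.NET_SessionId=abc123; path=/")
    jar.extract_cookies(response, captcha_req)  # what HTTPCookieProcessor does on responses

    # 2. A later request to the same host gets the cookie attached.
    login_req = urllib.request.Request("http://202.115.80.153/default2.aspx")
    jar.add_cookie_header(login_req)  # what HTTPCookieProcessor does on requests

    # 3. A request to a different host gets no cookie.
    other_req = urllib.request.Request("http://example.com/")
    jar.add_cookie_header(other_req)

    return ([c.name for c in jar],
            login_req.get_header("Cookie"),
            other_req.get_header("Cookie"))


if __name__ == "__main__":
    print(cookie_round_trip())
    # → (['ASP.NET_SessionId'], 'ASP.NET_SessionId=abc123', None)
```

Because the jar decides per host and path which cookies to send, one opener can be reused safely for the whole login session.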
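The captcha image changes on every request, and the server checks the submitted code against the one it issued to the current session cookie. Automatic recognition of the image is difficult, so the approach here is to:
- fetch the captcha through the same cookie-aware opener;
- save the image locally and type the code in by hand;
- submit the login form with that same opener, so the session cookie accompanies the code.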
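The reason the captcha fetch and the form submission must share one opener can be shown end to end against a local toy server. This Python 3 sketch is hypothetical throughout. ToyPortal stands in for CheckCode.aspx and default2.aspx. It issues a new code on every captcha request and remembers that code per SessionId cookie. It serves the code as plain text instead of an image, so the script can "read" it. Only the client side (fetch_captcha, submit) mirrors the article's steps.

```python
import http.cookiejar
import os
import random
import tempfile
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

CODES = {}  # toy server state: session id -> captcha code most recently issued to it


class ToyPortal(BaseHTTPRequestHandler):
    """Hypothetical stand-in for CheckCode.aspx (GET) and default2.aspx (POST)."""

    def _session_id(self):
        return self.headers.get("Cookie", "").partition("SessionId=")[2] or None

    def _reply(self, body, set_cookie=None):
        self.send_response(200)
        if set_cookie:
            self.send_header("Set-Cookie", set_cookie)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):  # captcha: a fresh code on every request, tied to the session
        sid, new_cookie = self._session_id(), None
        if sid is None:  # first visit: issue a session cookie
            sid = "%08x" % random.getrandbits(32)
            new_cookie = "SessionId=%s; Path=/" % sid
        CODES[sid] = "%04d" % random.randint(0, 9999)
        self._reply(CODES[sid].encode("ascii"), new_cookie)  # plain text, not an image

    def do_POST(self):  # login: the code must match the one issued to this cookie
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode("ascii"))
        ok = CODES.get(self._session_id()) == form.get("txtSecretCode", [""])[0]
        self._reply(b"login ok" if ok else b"wrong captcha")

    def log_message(self, *args):
        pass  # keep the demo quiet


def new_opener():
    # ProxyHandler({}) ignores any http_proxy setting so the local demo connects directly
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({}),
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))


def fetch_captcha(opener, url, save_path):
    """Download the captcha through the cookie-aware opener and save it locally."""
    with open(save_path, "wb") as f:
        f.write(opener.open(url).read())


def submit(opener, url, code):
    body = urllib.parse.urlencode({"txtSecretCode": code}).encode("ascii")
    return opener.open(url, body).read().decode("ascii")


def demo():
    server = HTTPServer(("127.0.0.1", 0), ToyPortal)  # port 0: let the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d/" % server.server_port
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "captcha.txt")
            opener = new_opener()
            fetch_captcha(opener, base + "CheckCode.aspx", path)
            with open(path) as f:  # stand-in for a human reading the saved image
                code = f.read()
            same = submit(opener, base + "default2.aspx", code)         # same cookie
            other = submit(new_opener(), base + "default2.aspx", code)  # no cookie
        return same, other
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())  # ('login ok', 'wrong captcha')
```

In real use, the step that reads the saved file would be replaced by opening the image and typing the code with input() (raw_input() in Python 2).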
Use the browser's developer tools (for example, the Network panel in Chrome or Firefox) to capture the login request and find the POST parameters and headers it sends. The essential fields are txtUserName (username), TextBox2 (password) and txtSecretCode (the captcha code), plus hidden fields such as __VIEWSTATE.
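A captured __VIEWSTATE value only works while the server keeps accepting it. A sturdier option is to scrape the value from the login page before each login. The following is a minimal Python 3 sketch, where urllib.parse.urlencode replaces Python 2's urllib.urlencode. It rests on two assumptions:
- the hidden input is rendered with name before value;
- the server decodes the form as gb2312, as the article's page is gb2312-encoded.

extract_viewstate, build_login_body and SAMPLE_HTML are hypothetical names.

```python
import re
import urllib.parse

# Assumes the hidden input looks like name="__VIEWSTATE" ... value="..." (name before value);
# an HTML parser would be more robust than a regex.
VIEWSTATE_RE = re.compile(r'name="__VIEWSTATE"[^>]*?value="([^"]*)"')


def extract_viewstate(html):
    """Return the __VIEWSTATE value from the login page HTML."""
    match = VIEWSTATE_RE.search(html)
    if match is None:
        raise ValueError("__VIEWSTATE not found on the login page")
    return match.group(1)


def build_login_body(html, username, password, code, encoding="gb2312"):
    """Build the urlencoded POST body from the fields named in the article."""
    fields = {
        "__VIEWSTATE": extract_viewstate(html),
        "txtUserName": username,
        "TextBox2": password,
        "txtSecretCode": code,
        "RadioButtonList1": "学生",  # "Student" role option, value taken from the article
        "Button1": "",
    }
    # urlencode percent-encodes non-ASCII values in `encoding` (assumed gb2312 here)
    # and joins pairs as key1=value1&key2=value2, with no leading "?".
    return urllib.parse.urlencode(fields, encoding=encoding).encode("ascii")


SAMPLE_HTML = ('<input type="hidden" name="__VIEWSTATE" '
               'value="dDwyODE2NTM0OTg7Oz6pH0TWZk5t0lupp/tlA1L+rmL83g==" />')

if __name__ == "__main__":
    print(build_login_body(SAMPLE_HTML, "username", "password123", "1234"))
```

The returned bytes can be passed directly as the data argument of opener.open, which turns the request into a POST.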
Below is the complete Python script that performs the simulated login:
# -*- coding: utf-8 -*-
# Simulated login (Python 2: urllib2, cookielib)
import urllib2
import cookielib
import urllib
import sys

reload(sys)
sys.setdefaultencoding("utf-8")  # avoid UnicodeEncodeError when printing Chinese text

CaptchaUrl = "http://202.115.80.153/CheckCode.aspx"  # captcha image URL
PostUrl = "http://202.115.80.153/default2.aspx"      # login form POST URL

# Bind a CookieJar to an opener; cookielib then stores and resends cookies automatically
cookie = cookielib.CookieJar()
handler = urllib2.HTTPCookieProcessor(cookie)
opener = urllib2.build_opener(handler)

username = 'username'
password = 'password123'  # username and password

# Fetch the captcha through the opener so the session cookie is captured along with it
picture = opener.open(CaptchaUrl).read()
with open('e:/image.jpg', 'wb') as local:  # save the captcha locally (Windows path; adjust)
    local.write(picture)
SecretCode = raw_input('Enter captcha: ')  # open the saved image and type the code

# Form fields, built from the captured login request
postData = {
    '__VIEWSTATE': 'dDwyODE2NTM0OTg7Oz6pH0TWZk5t0lupp/tlA1L+rmL83g==',  # copied from the page; may change
    'txtUserName': username,
    'TextBox2': password,
    'txtSecretCode': SecretCode,
    'RadioButtonList1': '学生',  # "Student" role option (sent as UTF-8 bytes here)
    'Button1': '',
    'lbLanguage': '',
    'hidPdrs': '',
    'hidsc': ''
}
# Request headers, built from the captured login request
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36',
}
data = urllib.urlencode(postData)  # POST body in the form key1=value1&key2=value2
request = urllib2.Request(PostUrl, data, headers)  # build the POST request
try:
    response = opener.open(request)  # log in with the opener that already holds the session cookie
    result = response.read().decode('gb2312')  # the page is gb2312-encoded, so decode it
    print result  # print the page returned after login
except urllib2.HTTPError as e:
    print e.code

After a successful login, the same opener can be used to access other pages that require authentication.
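Reusing the logged-in opener for other pages usually comes down to fetching a page and decoding it correctly. The following is a Python 3 sketch with two hypothetical helpers, read_page and decode_page; the URL of any protected page is left to the reader. Following the article, it assumes gb2312 when the server declares no charset. It decodes such pages as GBK, a superset of GB2312, so that characters outside GB2312 do not raise. errors="replace" is a further assumption that trades strictness for robustness.

```python
def decode_page(raw, charset=None):
    """Decode a response body; pages labelled (or assumed to be) gb2312 are decoded as GBK."""
    charset = (charset or "gb2312").lower()
    if charset == "gb2312":
        charset = "gbk"  # GBK is a superset of GB2312
    return raw.decode(charset, errors="replace")


def read_page(opener, url):
    """Fetch url with the logged-in opener; its CookieJar supplies the session cookie."""
    response = opener.open(url)
    charset = response.headers.get_content_charset()  # None if no charset was declared
    return decode_page(response.read(), charset)
```

For example, decode_page("学生".encode("gb2312")) returns "学生", and a page declared as UTF-8 is decoded as UTF-8.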
Disclaimer: This article is compiled from online sources; copyright belongs to the original author. If any information is incorrect or infringes rights, please contact us for removal or authorization.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.