Simulating Login to a Web Portal Using Python urllib2 and cookielib

This tutorial shows how to log into a website programmatically with Python's urllib2 and cookielib modules: handling cookies, fetching the captcha image for manual entry, and posting the required form data. A complete example targets a school academic system.

Python Programming Learning Circle

Jul 23, 2021

When crawling websites you often encounter pages that require authentication; this article explains how to simulate a login using Python's urllib2 and cookielib libraries, illustrated with a login to a school academic system.

First, understand that cookies are used by websites to identify users and maintain sessions, so we employ the cookielib.CookieJar object to store and manage cookies automatically.
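As a minimal sketch of that idea on Python 3, where cookielib and urllib2 were renamed http.cookiejar and urllib.request, a jar can be attached to an opener as follows. The helper name build_cookie_opener is illustrative and does not come from the original script.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return (opener, jar); the jar collects cookies from every response."""
    jar = http.cookiejar.CookieJar()
    handler = urllib.request.HTTPCookieProcessor(jar)
    opener = urllib.request.build_opener(handler)
    return opener, jar


opener, jar = build_cookie_opener()
print(len(jar))  # 0: nothing is stored until a request has been made
```

Every request sent through this opener reads from and writes to the same jar, which is what keeps the session alive across the captcha fetch and the login POST.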

The captcha image changes on each request and is tied to the cookie, making automated recognition difficult; therefore the approach is to fetch the captcha image, save it locally, manually input the code, and then submit the login form together with the captured cookies.
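The fetch-and-save step might look like the sketch below, written for Python 3. The function name save_captcha and the default location in the system temp directory are assumptions for illustration; the only requirement taken from the article is that the image is fetched through the same cookie-aware opener that later submits the form.

```python
import os
import tempfile


def save_captcha(opener, captcha_url, path=None):
    """Fetch the captcha through `opener` and write the raw bytes to `path`.

    `opener` is any object with an open(url) method returning a readable
    response, such as the opener built with HTTPCookieProcessor.
    """
    if path is None:
        # Hypothetical default; the original script uses a fixed local path.
        path = os.path.join(tempfile.gettempdir(), "captcha.jpg")
    response = opener.open(captcha_url)
    try:
        image_bytes = response.read()
    finally:
        response.close()
    with open(path, "wb") as image_file:
        image_file.write(image_bytes)
    return path
```

After calling save_captcha with the cookie-aware opener and the captcha URL, open the returned file in an image viewer and type the code when prompted.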

Using browser developer tools (e.g., in Chrome or Firefox), we inspect the login request to determine the required POST parameters and headers. The essential fields are txtUserName (username), TextBox2 (password), and txtSecretCode (the captcha code), along with hidden fields such as __VIEWSTATE.
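Hidden fields such as __VIEWSTATE can also be read from the login page's HTML rather than copied by hand, which may help if the value ever changes. The following Python 3 sketch uses the standard html.parser module; the class and function names and the sample HTML are made up for illustration and are not taken from the target site.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Made-up sample markup, standing in for the real login page.
sample = (
    '<form><input type="hidden" name="__VIEWSTATE" value="abc123==">'
    '<input type="text" name="txtUserName"></form>'
)
fields = extract_hidden_fields(sample)
print(fields)
```

The resulting dictionary can be merged into the form data before the username, password, and captcha code are added.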

Below is the complete script that performs the simulated login. It targets Python 2, since urllib2 and cookielib exist only there:

    # -*- coding: utf-8 -*-
    """Simulated login (Python 2)."""
    import urllib2
    import cookielib
    import urllib
    import re
    import sys

    reload(sys)
    sys.setdefaultencoding("utf-8")  # work around encoding errors with Chinese text

    # Captcha image URL and login form POST URL
    CaptchaUrl = "http://202.115.80.153/CheckCode.aspx"
    PostUrl = "http://202.115.80.153/default2.aspx"

    # Bind a CookieJar to an opener; cookielib then manages the cookies automatically
    cookie = cookielib.CookieJar()
    handler = urllib2.HTTPCookieProcessor(cookie)
    opener = urllib2.build_opener(handler)

    # Username and password
    username = 'username'
    password = 'password123'

    # Fetch the captcha through the opener so its cookie is captured
    picture = opener.open(CaptchaUrl).read()

    # Save the captcha image locally
    with open('e:/image.jpg', 'wb') as local:
        local.write(picture)

    # Open the saved image, read the code, and type it in
    SecretCode = raw_input('Enter the captcha: ')

    # Form fields, built from the captured browser request
    postData = {
        '__VIEWSTATE': 'dDwyODE2NTM0OTg7Oz6pH0TWZk5t0lupp/tlA1L+rmL83g==',
        'txtUserName': username,
        'TextBox2': password,
        'txtSecretCode': SecretCode,
        'RadioButtonList1': '学生',  # form value meaning "student"; sent as-is
        'Button1': '',
        'lbLanguage': '',
        'hidPdrs': '',
        'hidsc': ''
    }

    # Headers, built from the captured browser request
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Connection': 'keep-alive',
        'Content-Type': 'application/x-www-form-urlencoded',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36',
    }

    data = urllib.urlencode(postData)  # POST body in the form key1=value1&key2=value2
    request = urllib2.Request(PostUrl, data, headers)  # build the POST request

    # Log in with the opener that already holds the captcha cookie
    try:
        response = opener.open(request)
        result = response.read().decode('gb2312')  # the page is encoded as gb2312
        print result  # print the page returned after login
    except urllib2.HTTPError as e:
        print e.code
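On Python 3 the form-encoding step differs slightly, because the request body must be bytes. A hedged sketch of that one step follows; the helper name build_login_request and the form_encoding parameter are illustrative, and which character encoding the server expects for form values is an assumption the article does not settle.

```python
import urllib.parse
import urllib.request


def build_login_request(post_url, form, headers, form_encoding="utf-8"):
    """Percent-encode `form` and wrap it in a POST request.

    `form_encoding` controls how non-ASCII values are turned into bytes
    before percent-encoding; the right value depends on the server.
    """
    body = urllib.parse.urlencode(form, encoding=form_encoding).encode("ascii")
    return urllib.request.Request(post_url, data=body, headers=headers)


request = build_login_request(
    "http://202.115.80.153/default2.aspx",
    {"txtUserName": "username", "TextBox2": "password123", "RadioButtonList1": "学生"},
    {"Content-Type": "application/x-www-form-urlencoded"},
)
print(request.get_method())  # POST, because a body is attached
print(request.data)
```

Passing the returned request to the cookie-aware opener's open method sends it with the cookies captured earlier.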

After a successful login, the same opener can be used to access other pages that require authentication.
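The reason the same opener stays logged in is that its cookie jar attaches stored cookies to each later request. The Python 3 sketch below shows this offline by placing a cookie in a jar by hand; the host portal.example and the cookie named session are hypothetical stand-ins, not values from the target site.

```python
import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()

# Hand-built cookie standing in for the one a real login would set.
cookie = http.cookiejar.Cookie(
    version=0, name="session", value="demo",
    port=None, port_specified=False,
    domain="portal.example", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
)
jar.set_cookie(cookie)

# Any later request to the same host picks the cookie up from the jar.
req = urllib.request.Request("http://portal.example/protected.aspx")
jar.add_cookie_header(req)
print(req.get_header("Cookie"))
```

An opener built with HTTPCookieProcessor performs the same header step automatically on every call to open, so no manual cookie handling is needed after login.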

Disclaimer: This article is compiled from online sources; copyright belongs to the original author. If any information is incorrect or infringes rights, please contact us for removal or authorization.
