Python Script for Scraping Super Schedule App Topics via HTTP Requests
This tutorial shows how to retrieve JSON-formatted data from the Super Schedule Android app with Python: first capture the app's network packets to obtain the login form parameters (username, password, device info) and the required HTTP headers, then perform the login request with urllib2, and finally fetch and parse topic data page by page, enabling infinite scrolling of user-generated topics.
1. Capture the app's data packets
The intercepted form contains encrypted credentials and device information, which must be POSTed together with the appropriate headers (Content-Type, User-Agent, Host, Connection, Accept-Encoding, Content-Length).
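The captured form body is ordinary URL-encoded key/value data, so its length (needed for the Content-Length header) can be checked before sending. A minimal sketch in Python 3 syntax, using the field names and sample values from the capture below (the account and password values are hashes produced by the app):

```python
from urllib.parse import urlencode

# Form fields observed in the captured login request, in the order the app sends them.
fields = [
    ('phoneBrand', 'Meizu'),
    ('platform', '1'),
    ('deviceCode', '868033014919494'),
    ('account', 'FCF030E1F2F6341C1C93BE5BBC422A3D'),
    ('phoneVersion', '16'),
    ('password', 'A55B48BB75C79200379D82A18C5F47D6'),
    ('channel', 'MXMarket'),
    ('phoneModel', 'M040'),
    ('versionNumber', '7.2.1'),
]
body = urlencode(fields) + '&'  # the app appends a trailing '&'
print(len(body))                # → 207, matching the Content-Length header
```

This explains where the hard-coded `Content-Length: 207` in the headers below comes from.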
2. Login
The following Python code logs into the service using urllib2 and a CookieJar to store session cookies:
<code>import urllib2
from cookielib import CookieJar

loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'Keep-Alive',
    'Accept-Encoding': 'gzip',
    'Content-Length': '207',
}
# Form body captured from the app; account and password are already hashed.
loginData = 'phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
# Keep the session cookie so later requests stay authenticated.
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
print loginResult</code>
Successful login returns a JSON object containing account information.
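Before moving on, the response can be checked for success. A small sketch of the kind of check the full script performs later (the `data` field name matches the real response; the sample JSON values here are invented for illustration):

```python
import json

# Illustrative response shape: the service returns account details under
# 'data' when the credentials are accepted.
loginResult = '{"status": 1, "data": {"studentId": 123456}}'
result = json.loads(loginResult)
if result.get('data'):
    print('login successful!')
else:
    print('login fail')
```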
3. Fetch topic data
Using the same headers and a similar POST request, the script obtains the URL and parameters for topic retrieval, then parses the returned JSON to extract fields such as content, school name, message ID, gender, and timestamp.
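The parsing step can be exercised on a hand-made sample before wiring it to the network. The field names (`messageBOs`, `timestampLong`, `studentBO`, `issueTime`) match the response structure the full script relies on; the sample values are invented:

```python
import json

sample = json.loads('''{
  "status": 1,
  "data": {
    "timestampLong": 1419038001000,
    "messageBOs": [
      {"content": "hello", "schoolName": "Some University",
       "messageId": 42, "issueTime": 1419038000000,
       "studentBO": {"gender": 1}},
      {"messageId": 43, "studentBO": {"gender": 2}}
    ]
  }
}''')

data = sample['data']
topics = [
    {'content': m['content'], 'schoolName': m['schoolName'],
     'messageId': m['messageId'], 'gender': m['studentBO']['gender'],
     'time': m['issueTime']}
    for m in data['messageBOs'] if m.get('content')  # skip entries without content
]
print(len(topics), data['timestampLong'])  # → 1 1419038001000
```

The second message lacks a `content` field and is filtered out, which is exactly what `fetch_data` below does.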
The complete script combines login, data loading, and pagination to continuously load new topics:
<code>#!/usr/local/bin/python2.7
# -*- coding: utf8 -*-
"""Super Schedule topic scraper"""
import urllib2
from cookielib import CookieJar
import json


def fetch_data(json_data):
    """Extract the next-page timestamp and the topic fields from one response."""
    data = json_data['data']
    timestampLong = data['timestampLong']
    messageBO = data['messageBOs']
    topicList = []
    for each in messageBO:
        # Entries without 'content' are placeholders; skip them.
        if each.get('content', False):
            topicDict = {
                'content': each['content'],
                'schoolName': each['schoolName'],
                'messageId': each['messageId'],
                'gender': each['studentBO']['gender'],
                'time': each['issueTime'],
            }
            print each['schoolName'], each['content']
            topicList.append(topicDict)
    return timestampLong, topicList


def load(timestamp, headers, url):
    """Request the next page using the timestamp cursor, then recurse."""
    headers['Content-Length'] = '159'
    loadData = 'timestamp=%s&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&' % timestamp
    req = urllib2.Request(url, loadData, headers)
    loadResult = opener.open(req).read()
    loadStatus = json.loads(loadResult).get('status', False)
    if loadStatus == 1:
        print 'load successful!'
        timestamp, topicList = fetch_data(json.loads(loadResult))
        load(timestamp, headers, url)
    else:
        print 'load fail'
        print loadResult
        return False


loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
topicUrl = 'http://120.55.151.61/V2/Treehole/Message/getMessageByTopicIdV3.action'
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'Keep-Alive',
    'Accept-Encoding': 'gzip',
    'Content-Length': '207',
}

# --- login ---
loginData = 'phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
loginStatus = json.loads(loginResult).get('data', False)
if loginStatus:  # 'data' is only present when the credentials are accepted
    print 'login successful!'
else:
    print 'login fail'
    print loginResult

# --- fetch topics ---
topicData = 'timestamp=0&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
headers['Content-Length'] = '147'
topicRequest = urllib2.Request(topicUrl, topicData, headers)
topicHtml = opener.open(topicRequest).read()
topicJson = json.loads(topicHtml)
if topicJson.get('status') == 1:
    print 'fetch topic success!'
    timestamp, topicList = fetch_data(topicJson)
    load(timestamp, headers, topicUrl)</code>
The script prints each topic's school name and content and keeps loading additional pages by updating the timestamp parameter, effectively achieving infinite scrolling of topics from the Super Schedule app.