
Python Script for Scraping Super Schedule App Topics via HTTP Requests

This tutorial demonstrates how to capture network packets from the Super Schedule Android app, perform a login request with the required headers, and then repeatedly fetch and parse topic data using Python's urllib2 and json modules, reproducing the app's infinite scrolling of user-generated topics.


The article explains how to retrieve JSON‑formatted data from the Super Schedule Android application by first intercepting the app's network packets to obtain the required form parameters (username, password, device info) and necessary HTTP headers.

1. Capture the app's data packets

The intercepted form includes encrypted credentials and device information, which must be posted together with appropriate headers (Content‑Type, User‑Agent, Host, Connection, Accept‑Encoding, Content‑Length).
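Rather than hard-coding the Content-Length as the captured packet does, the body can be built from a dict and the header derived from it. A minimal Python 3 sketch (urllib.parse.urlencode replaces the Python 2 approach used below; the values are the placeholders captured from the app, not working credentials):

```python
from urllib.parse import urlencode

# Captured form fields; account and password are sent pre-hashed by the app.
form = {
    'phoneBrand': 'Meizu',
    'platform': '1',
    'deviceCode': '868033014919494',
    'account': 'FCF030E1F2F6341C1C93BE5BBC422A3D',
    'phoneVersion': '16',
    'password': 'A55B48BB75C79200379D82A18C5F47D6',
    'channel': 'MXMarket',
    'phoneModel': 'M040',
    'versionNumber': '7.2.1',
}
body = urlencode(form)

# Content-Length should match the encoded body rather than being hard-coded.
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Content-Length': str(len(body)),
}
```

Deriving the length from the body keeps the request valid if any field (e.g. the device code) changes.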

2. Login

The following Python code logs into the service using urllib2 and a CookieJar to store session cookies:

<code># -*- coding: utf8 -*-
import urllib2
from cookielib import CookieJar

loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
# Form body captured from the app; account and password are already hashed.
loginData = 'phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'Keep-Alive',
    'Accept-Encoding': 'gzip',
    # Derive the length from the actual body instead of hard-coding it.
    'Content-Length': str(len(loginData)),
}
# A CookieJar-backed opener keeps the session cookie for later requests.
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
print loginResult</code>

Successful login returns a JSON object containing account information.
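Checking that JSON object programmatically looks like this. The `status` and `data` keys appear in the responses the script below inspects; the nested account fields here (`studentId`, `nickName`) are illustrative assumptions, since the source does not show the full payload:

```python
import json

# Illustrative response shape; only 'status' and 'data' are confirmed by
# the script, the nested fields are made-up stand-ins.
sample = '{"status": 1, "data": {"studentId": 123, "nickName": "demo"}}'
result = json.loads(sample)

account = None
if result.get('status') == 1:     # status 1 indicates success
    account = result['data']      # account information lives under 'data'
```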

3. Fetch topic data

Using the same headers and a similar POST request, the script obtains the URL and parameters for topic retrieval, then parses the returned JSON to extract fields such as content, school name, message ID, gender, and timestamp.
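The extraction step can be sketched in isolation. This Python 3 snippet mirrors the `fetch_data` function in the full script, run against a trimmed sample payload (field names match those the script reads; the values are invented for illustration):

```python
import json

# Trimmed sample of the topic payload; structure matches what fetch_data
# expects, values are made up.
sample = json.loads('''{
  "status": 1,
  "data": {
    "timestampLong": 1450000000000,
    "messageBOs": [
      {"content": "hello", "schoolName": "Some University",
       "messageId": 42, "studentBO": {"gender": 1},
       "issueTime": 1450000000000},
      {"messageId": 43}
    ]
  }
}''')

topics = [
    {
        'content': m['content'],
        'schoolName': m['schoolName'],
        'messageId': m['messageId'],
        'gender': m['studentBO']['gender'],
        'time': m['issueTime'],
    }
    for m in sample['data']['messageBOs']
    if m.get('content')  # entries without content are skipped
]
next_timestamp = sample['data']['timestampLong']  # cursor for the next page
```

Note that the second entry, which lacks a `content` field, is filtered out, just as in the script.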

The complete script combines login, data loading, and pagination to continuously load new topics:

<code>#!/usr/local/bin/python2.7
# -*- coding: utf8 -*-
"""Super Schedule topic scraper"""
import urllib2
from cookielib import CookieJar
import json

def fetch_data(json_data):
    """Extract topic entries and the pagination timestamp from a response."""
    data = json_data['data']
    timestampLong = data['timestampLong']  # cursor for the next page
    messageBOs = data['messageBOs']
    topicList = []
    for each in messageBOs:
        # Entries without a 'content' field are skipped.
        if each.get('content', False):
            topicDict = {
                'content': each['content'],
                'schoolName': each['schoolName'],
                'messageId': each['messageId'],
                'gender': each['studentBO']['gender'],
                'time': each['issueTime']
            }
            print each['schoolName'], each['content']
            topicList.append(topicDict)
    return timestampLong, topicList

def load(timestamp, headers, url):
    loadData = 'timestamp=%s&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&' % timestamp
    # Derive the length from the actual body instead of hard-coding it.
    headers['Content-Length'] = str(len(loadData))
    req = urllib2.Request(url, loadData, headers)
    loadResult = opener.open(req).read()
    loadJson = json.loads(loadResult)
    if loadJson.get('status', False) == 1:
        print 'load successful!'
        timestamp, topicList = fetch_data(loadJson)
        load(timestamp, headers, url)
    else:
        print 'load fail'
        print loadResult
        return False

loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
topicUrl = 'http://120.55.151.61/V2/Treehole/Message/getMessageByTopicIdV3.action'
# --- login ---
loginData = 'phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'Keep-Alive',
    'Accept-Encoding': 'gzip',
    # Derive the length from the actual body instead of hard-coding it.
    'Content-Length': str(len(loginData)),
}
# A CookieJar-backed opener keeps the session cookie for later requests.
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
loginStatus = json.loads(loginResult).get('data', False)
if loginStatus:
    print 'login successful!'
else:
    print 'login fail'
    print loginResult
# --- fetch topics ---
topicData = 'timestamp=0&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
headers['Content-Length'] = str(len(topicData))
topicRequest = urllib2.Request(topicUrl, topicData, headers)
topicHtml = opener.open(topicRequest).read()
topicJson = json.loads(topicHtml)
if topicJson.get('status') == 1:
    print 'fetch topic success!'
    timestamp, topicList = fetch_data(topicJson)
    load(timestamp, headers, topicUrl)</code>

The script prints each topic's school name and content, and keeps loading additional pages by advancing the timestamp parameter, effectively reproducing the app's infinite scrolling of topics.
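One caveat: `load` calls itself recursively for each page, so a long run will eventually hit Python's recursion limit. A sketch of an iterative alternative, with a hypothetical `fetch_page` stand-in for the HTTP POST so the control flow can be shown on its own:

```python
def paginate(fetch_page, start_timestamp=0, max_pages=100):
    """Follow the timestamp cursor page by page with a loop, not recursion."""
    timestamp, pages = start_timestamp, []
    for _ in range(max_pages):
        result = fetch_page(timestamp)
        if result.get('status') != 1:   # non-1 status signals end or error
            break
        data = result['data']
        pages.append(data['messageBOs'])
        timestamp = data['timestampLong']  # cursor for the next page
    return pages

# Stub that serves two pages, then reports end-of-data.
def fake_fetch(ts):
    if ts == 0:
        return {'status': 1, 'data': {'timestampLong': 111, 'messageBOs': ['a']}}
    if ts == 111:
        return {'status': 1, 'data': {'timestampLong': 222, 'messageBOs': ['b']}}
    return {'status': 0}

pages = paginate(fake_fetch)
```

The `max_pages` cap also gives the scraper a natural stopping point instead of looping until the server refuses.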

Tags: Backend, Android, JSON, Web Scraping, urllib2
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
