Python Script for Scraping Super Schedule App Topics via HTTP Requests
This tutorial shows how to retrieve JSON-formatted data from the Super Schedule Android app with Python: first capture the app's network packets to obtain the login form parameters (username, password, device info) and the required HTTP headers, then perform the login request with urllib2, and finally fetch and parse topic data page by page, enabling infinite scrolling of user-generated topics.
1. Capture the app's data packets
The intercepted form contains encrypted credentials and device information, which must be POSTed together with the appropriate headers (Content-Type, User-Agent, Host, Connection, Accept-Encoding, Content-Length).
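The captured form body is ordinary URL-encoded key/value data, so its length (needed for the Content-Length header) can be checked before sending. A minimal sketch in Python 3 syntax, using the field names and sample values from the capture below (the account and password values are hashes produced by the app):

```python
from urllib.parse import urlencode

# Form fields observed in the captured login request, in the order the app sends them.
fields = [
    ('phoneBrand', 'Meizu'),
    ('platform', '1'),
    ('deviceCode', '868033014919494'),
    ('account', 'FCF030E1F2F6341C1C93BE5BBC422A3D'),
    ('phoneVersion', '16'),
    ('password', 'A55B48BB75C79200379D82A18C5F47D6'),
    ('channel', 'MXMarket'),
    ('phoneModel', 'M040'),
    ('versionNumber', '7.2.1'),
]
body = urlencode(fields) + '&'  # the app appends a trailing '&'
print(len(body))                # → 207, matching the Content-Length header
```

This explains where the hard-coded `Content-Length: 207` in the headers below comes from.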
2. Login
The following Python code logs into the service using urllib2 and a CookieJar to store session cookies:
<code>import urllib2
from cookielib import CookieJar

loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'Keep-Alive',
    'Accept-Encoding': 'gzip',
    'Content-Length': '207',
}
# Form body captured from the app; account and password are already hashed.
loginData = 'phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
# Keep the session cookie so later requests stay authenticated.
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
print loginResult</code>
Successful login returns a JSON object containing account information.
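Before moving on, the response can be checked for success. A small sketch of the kind of check the full script performs later (the `data` field name matches the real response; the sample JSON values here are invented for illustration):

```python
import json

# Illustrative response shape: the service returns account details under
# 'data' when the credentials are accepted.
loginResult = '{"status": 1, "data": {"studentId": 123456}}'
result = json.loads(loginResult)
if result.get('data'):
    print('login successful!')
else:
    print('login fail')
```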
3. Fetch topic data
Using the same headers and a similar POST request, the script obtains the URL and parameters for topic retrieval, then parses the returned JSON to extract fields such as content, school name, message ID, gender, and timestamp.
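The parsing step can be exercised on a hand-made sample before wiring it to the network. The field names (`messageBOs`, `timestampLong`, `studentBO`, `issueTime`) match the response structure the full script relies on; the sample values are invented:

```python
import json

sample = json.loads('''{
  "status": 1,
  "data": {
    "timestampLong": 1419038001000,
    "messageBOs": [
      {"content": "hello", "schoolName": "Some University",
       "messageId": 42, "issueTime": 1419038000000,
       "studentBO": {"gender": 1}},
      {"messageId": 43, "studentBO": {"gender": 2}}
    ]
  }
}''')

data = sample['data']
topics = [
    {'content': m['content'], 'schoolName': m['schoolName'],
     'messageId': m['messageId'], 'gender': m['studentBO']['gender'],
     'time': m['issueTime']}
    for m in data['messageBOs'] if m.get('content')  # skip entries without content
]
print(len(topics), data['timestampLong'])  # → 1 1419038001000
```

The second message lacks a `content` field and is filtered out, which is exactly what `fetch_data` below does.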
The complete script combines login, data loading, and pagination to continuously load new topics:
<code>#!/usr/local/bin/python2.7
# -*- coding: utf8 -*-
"""Super Schedule topic scraper"""
import urllib2
from cookielib import CookieJar
import json


def fetch_data(json_data):
    """Extract the next-page timestamp and the topic fields from one response."""
    data = json_data['data']
    timestampLong = data['timestampLong']
    messageBO = data['messageBOs']
    topicList = []
    for each in messageBO:
        # Entries without 'content' are placeholders; skip them.
        if each.get('content', False):
            topicDict = {
                'content': each['content'],
                'schoolName': each['schoolName'],
                'messageId': each['messageId'],
                'gender': each['studentBO']['gender'],
                'time': each['issueTime'],
            }
            print each['schoolName'], each['content']
            topicList.append(topicDict)
    return timestampLong, topicList


def load(timestamp, headers, url):
    """Request the next page using the timestamp cursor, then recurse."""
    headers['Content-Length'] = '159'
    loadData = 'timestamp=%s&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&' % timestamp
    req = urllib2.Request(url, loadData, headers)
    loadResult = opener.open(req).read()
    loadStatus = json.loads(loadResult).get('status', False)
    if loadStatus == 1:
        print 'load successful!'
        timestamp, topicList = fetch_data(json.loads(loadResult))
        load(timestamp, headers, url)
    else:
        print 'load fail'
        print loadResult
        return False


loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
topicUrl = 'http://120.55.151.61/V2/Treehole/Message/getMessageByTopicIdV3.action'
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'Keep-Alive',
    'Accept-Encoding': 'gzip',
    'Content-Length': '207',
}

# --- login ---
loginData = 'phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
loginStatus = json.loads(loginResult).get('data', False)
if loginStatus:  # 'data' is only present when the credentials are accepted
    print 'login successful!'
else:
    print 'login fail'
    print loginResult

# --- fetch topics ---
topicData = 'timestamp=0&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
headers['Content-Length'] = '147'
topicRequest = urllib2.Request(topicUrl, topicData, headers)
topicHtml = opener.open(topicRequest).read()
topicJson = json.loads(topicHtml)
if topicJson.get('status') == 1:
    print 'fetch topic success!'
    timestamp, topicList = fetch_data(topicJson)
    load(timestamp, headers, topicUrl)</code>
The script prints each topic's school name and content and keeps loading additional pages by updating the timestamp parameter, effectively achieving infinite scrolling of topics from the Super Schedule app.