How to Export QQ Space Memories with Python Selenium: Posts and Photos Scraper

A step-by-step guide to using Python's Selenium library to log in to QQ Space, scroll through the feed, extract historical posts (shuoshuo), and download album photos. It covers driver setup and code for login, post retrieval, and photo downloading, so you can preserve nostalgic content.

Python Crawling & Data Mining

Install Selenium

Selenium is a browser automation tool that drives a real browser, so a script can simulate user actions and read the rendered page source. Install it with pip:

    pip install selenium

Then download the ChromeDriver build that matches your installed Chrome version from http://npm.taobao.org/mirrors/chromedriver and place it in the same directory as your Python script.
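
A missing or misplaced driver is a common reason the script fails at startup. As a minimal sketch, the check below looks for a ChromeDriver executable in a given directory and then on PATH. The function name find_chromedriver and the executable names it looks for are assumptions for illustration, not part of the original article.

```python
import shutil
import sys
from pathlib import Path


def find_chromedriver(script_dir):
    """Return the path to a ChromeDriver executable, or None if not found.

    Hypothetical helper: looks in script_dir first, then on PATH.
    """
    exe_name = 'chromedriver.exe' if sys.platform.startswith('win') else 'chromedriver'
    local = Path(script_dir) / exe_name
    if local.is_file():
        return str(local)
    # shutil.which returns None when nothing on PATH matches
    return shutil.which('chromedriver')
```

It could be called with the script's own directory, for example find_chromedriver(Path(__file__).resolve().parent), before the browser is started.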

Login

Open the browser's developer tools (F12) and inspect the login page to find the IDs of the account and password fields. The following function uses those IDs to log in to QQ Space:

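import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By


def login(login_qq, password, business_qq):
    '''
    Log in to QQ Space
    :param login_qq: QQ account used to log in
    :param password: QQ password
    :param business_qq: Business QQ ID (used in the space URL)
    :return: driver instance, or None if the space cannot be accessed
    '''
    driver = webdriver.Chrome()
    # Element lookups below wait up to 10 seconds before failing
    driver.implicitly_wait(10)
    driver.get('https://user.qzone.qq.com/{}/311'.format(business_qq))
    # The login form lives inside an iframe
    driver.find_element(By.ID, 'login_div')
    driver.switch_to.frame('login_frame')
    # Switch to account/password login
    driver.find_element(By.ID, 'switcher_plogin').click()
    driver.find_element(By.ID, 'u').clear()
    driver.find_element(By.ID, 'u').send_keys(login_qq)
    driver.find_element(By.ID, 'p').clear()
    driver.find_element(By.ID, 'p').send_keys(password)
    driver.find_element(By.ID, 'login_button').click()
    driver.switch_to.default_content()
    # Give the page time to load after the login click
    time.sleep(5)
    try:
        # Finding this element is treated as proof that the space loaded
        driver.find_element(By.ID, 'QM_OwnerInfo_Icon')
        return driver
    except NoSuchElementException:
        print('Cannot access {}'.format(business_qq))
        return None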
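
The fixed time.sleep(5) after clicking the login button either wastes time or is too short on a slow connection. One alternative, sketched here under the assumption that polling for the QM_OwnerInfo_Icon element is an acceptable login check, is a small polling helper. wait_until is a hypothetical name, not a Selenium API; Selenium's own WebDriverWait offers similar behavior.

```python
import time


def wait_until(predicate, timeout=15.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or None if timeout seconds pass first.
    Hypothetical helper, not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

With a driver in hand, the check could read wait_until(lambda: driver.find_elements(By.ID, 'QM_OwnerInfo_Icon')), since find_elements returns an empty list while the element is absent.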

Posts (Shuoshuo)

After login, the page shows the "shuoshuo" (posts) feed, which loads content lazily as you scroll. Scroll the page with Selenium so the posts load, then parse the page source with BeautifulSoup (installed with pip install beautifulsoup4):

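import time

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By


def get_shuoshuo(driver):
    page = 1
    while True:
        # Scroll down several times so the lazily loaded posts appear
        for _ in range(4):
            driver.execute_script("window.scrollBy(0,5000)")
            time.sleep(2)
        # Switch to the iframe containing the feed
        driver.switch_to.frame('app_canvas_frame')
        # Round-trip through GBK to drop characters that encoding cannot represent
        html = driver.page_source.encode('GBK', 'ignore').decode('gbk')
        bs = BeautifulSoup(html, 'html.parser')
        for pre in bs.find_all('pre', class_='content'):
            shuoshuo = pre.text
            # Find the goDetail link two levels up and read its title
            tx = pre.parent.parent.find('a', class_='c_tx c_tx3 goDetail')['title']
            print(tx + ':' + shuoshuo)
        # Pagination check: the "末页" (last page) link holds the page count
        page += 1
        last_page = bs.find('a', title='末页')
        if last_page is None or int(last_page.text) < page:
            # Leave the iframe before returning
            driver.switch_to.default_content()
            break
        # Click "下一页" (next page)
        driver.find_element(By.LINK_TEXT, '下一页').click()
        driver.switch_to.default_content()
        time.sleep(3)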
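
get_shuoshuo only prints each post, so nothing survives once the console closes. To actually export the posts, the (tx, shuoshuo) pairs could be collected in a list and written to a file. The sketch below assumes such a list already exists; save_posts and the column names are hypothetical, and the utf-8-sig encoding is chosen so that spreadsheet tools such as Excel detect the file as UTF-8.

```python
import csv


def save_posts(posts, path):
    """Write (title, text) pairs to a CSV file and return the row count.

    Hypothetical helper: posts is a list of 2-tuples collected while scraping.
    """
    # newline='' lets the csv module handle line breaks inside multi-line posts
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(['title', 'text'])
        writer.writerows(posts)
    return len(posts)
```

Inside the loop, print(tx + ':' + shuoshuo) would be replaced by appending (tx, shuoshuo) to the list, with a single save_posts call after the last page.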

Album

Downloading photos from an album also requires Selenium to click through the UI. The script clicks the album button, iterates over each album, opens photos, and saves them locally:

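import os
import time
from urllib.request import urlretrieve

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By


def get_photo(driver, qq):
    # qq: QQ number used as the per-account subfolder name
    photo_path = "C:/Users/xxx/Desktop/photo/{}/{}.jpg"
    # urlretrieve does not create folders, so make the target folder first
    os.makedirs(os.path.dirname(photo_path.format(qq, 'x')), exist_ok=True)
    driver.implicitly_wait(10)
    # Index 0 is skipped; start from 0 to include the first album-cover element
    photoIndex = 1
    while True:
        # Back to the main page, then click the album button
        driver.switch_to.default_content()
        driver.find_element(By.XPATH, '//*[@id="menuContainer"]/div/ul/li[3]/a').click()
        time.sleep(3)
        driver.switch_to.frame('app_canvas_frame')
        a = driver.find_elements(By.CLASS_NAME, 'album-cover')
        if len(a) <= photoIndex:
            break
        a[photoIndex].click()
        time.sleep(3)
        # Open the first photo of the album
        driver.find_elements(By.CLASS_NAME, 'item-cover')[0].click()
        time.sleep(3)
        driver.switch_to.parent_frame()
        while True:
            img = driver.find_element(By.ID, 'js-img-disp')
            src = img.get_attribute('src').replace('&t=5', '')
            name = driver.find_element(By.ID, 'js-photo-name').text
            urlretrieve(src, photo_path.format(qq, name))
            # The info bar text has the form "current/total"
            counts = driver.find_element(By.XPATH, '//*[@id="js-ctn-infoBar"]/div/div[1]/span').text.split('/')
            if int(counts[0]) == int(counts[1]):
                # Last photo reached: click the viewer's top link (presumably the close button)
                driver.find_element(By.XPATH, '//*[@id="js-viewer-main"]/div[1]/a').click()
                break
            # Retry until the "next photo" button is present
            for _ in range(10):
                buttons = driver.find_elements(By.ID, 'js-btn-nextPhoto')
                if buttons:
                    ActionChains(driver).click(buttons[0]).perform()
                    break
                time.sleep(5)
        photoIndex += 1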
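
Photo names taken from the page are used directly as file names, so a name containing characters such as / or ?, or two photos sharing a name, can make urlretrieve fail or overwrite an earlier file. As a sketch of one way to guard against that, the helper below replaces characters Windows forbids in file names and appends a counter on collisions. safe_photo_path and its parameters are hypothetical, not part of the original script.

```python
import os
import re


def safe_photo_path(base_dir, qq, name, ext='.jpg'):
    """Build a non-clashing file path under base_dir/qq for a photo name.

    Hypothetical helper: creates the folder if needed and never returns
    a path that already exists.
    """
    # Replace characters that are invalid in Windows file names
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', '_', name).strip() or 'untitled'
    folder = os.path.join(base_dir, str(qq))
    os.makedirs(folder, exist_ok=True)
    candidate = os.path.join(folder, cleaned + ext)
    counter = 1
    # Append _1, _2, ... until the path is unused
    while os.path.exists(candidate):
        candidate = os.path.join(folder, '{}_{}{}'.format(cleaned, counter, ext))
        counter += 1
    return candidate
```

In get_photo, the second argument of urlretrieve could then become safe_photo_path(base_dir, qq, name), where base_dir is whatever download folder you choose.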

Conclusion

Browsing through years‑old posts and photos can feel like uncovering a digital time capsule. With Selenium automation, you can efficiently retrieve and preserve these memories.
