How to Export QQ Space Memories with Python Selenium: Posts and Photos Scraper
Learn step by step how to use Python's Selenium library to log in to QQ Space, scroll through the feed, extract historical posts (shuoshuo), and download album photos. The walkthrough covers driver setup and code for login, post retrieval, and photo downloading, so you can preserve nostalgic content.
Install Selenium
Selenium is a browser automation tool that simulates user actions to obtain page source. Install it via pip:
pip install selenium
Download the ChromeDriver build that matches your installed Chrome version from http://npm.taobao.org/mirrors/chromedriver and place it in the same directory as your Python script.
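Before launching the browser, it can help to confirm that the driver is where the script expects it. The sketch below is one way to check, using only the standard library. `find_chromedriver` is a hypothetical helper name, not part of Selenium, and the fallback to the system PATH is an assumption beyond what the article describes.

```python
import shutil
from pathlib import Path


def find_chromedriver(script_dir="."):
    """Return the path of a ChromeDriver binary, or None if none is found."""
    # Windows builds are named chromedriver.exe; other platforms use chromedriver.
    for name in ("chromedriver.exe", "chromedriver"):
        candidate = Path(script_dir) / name
        if candidate.is_file():
            return str(candidate.resolve())
    # Assumption: fall back to any chromedriver available on the system PATH.
    return shutil.which("chromedriver")


if __name__ == "__main__":
    path = find_chromedriver()
    print(path or "ChromeDriver not found; place it next to this script")
```

Running the file directly prints the driver location, or a reminder if nothing was found.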
Login
Open the browser's developer tools (F12) to find the element IDs of the account and password fields. Use the following function to log in to QQ Space:
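import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By


def login(login_qq, password, business_qq):
    '''
    Log in to QQ Space
    :param login_qq: QQ account used to log in
    :param password: QQ password
    :param business_qq: QQ ID of the space to open
    :return: driver instance, or None if the space cannot be accessed
    '''
    driver = webdriver.Chrome()
    driver.get('https://user.qzone.qq.com/{}/311'.format(business_qq))
    driver.implicitly_wait(10)
    # Confirm the login box is present, then enter its iframe
    driver.find_element(By.ID, 'login_div')
    driver.switch_to.frame('login_frame')
    # Switch to account/password login
    driver.find_element(By.ID, 'switcher_plogin').click()
    driver.find_element(By.ID, 'u').clear()
    driver.find_element(By.ID, 'u').send_keys(login_qq)
    driver.find_element(By.ID, 'p').clear()
    driver.find_element(By.ID, 'p').send_keys(password)
    driver.find_element(By.ID, 'login_button').click()
    # Leave the login iframe and give the page time to load
    driver.switch_to.default_content()
    driver.implicitly_wait(10)
    time.sleep(5)
    try:
        # This element only appears once the space has loaded
        driver.find_element(By.ID, 'QM_OwnerInfo_Icon')
        return driver
    except NoSuchElementException:
        print('Cannot access {}'.format(business_qq))
        return None
Posts (Shuoshuo)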
After logging in, the default page shows the "shuoshuo" feed, which loads content lazily as you scroll. Use Selenium to scroll, then parse the page with BeautifulSoup:
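import time

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By


def get_shuoshuo(driver):
    page = 1
    while True:
        # Scroll down several times so the lazily loaded posts render
        for _ in range(4):
            driver.execute_script("window.scrollBy(0,5000)")
            time.sleep(2)
        # Switch to the iframe containing the feed
        driver.switch_to.frame('app_canvas_frame')
        # Drop characters that GBK cannot encode (avoids print errors on GBK consoles)
        html = driver.page_source.encode('GBK', 'ignore').decode('gbk')
        bs = BeautifulSoup(html, 'html.parser')
        pres = bs.find_all('pre', class_='content')
        for pre in pres:
            shuoshuo = pre.text
            # Read the title attribute of the post's detail link
            tx = pre.parent.parent.find('a', class_='c_tx c_tx3 goDetail')['title']
            print(tx + ':' + shuoshuo)
        # Pagination check: the "last page" link ('末页') holds the page count
        page += 1
        maxPage = bs.find('a', title='末页').text
        if int(maxPage) < page:
            # Leave the iframe before returning
            driver.switch_to.default_content()
            break
        # Click "next page" ('下一页') while still inside the iframe
        driver.find_element(By.LINK_TEXT, '下一页').click()
        driver.switch_to.default_content()
        time.sleep(3)
Album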
Downloading photos from an album also requires Selenium to click through the UI. The script clicks the album button, iterates over each album, opens photos, and saves them locally:
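import os
import time
from urllib.request import urlretrieve

from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By


def get_photo(driver, qq):
    # qq names the download sub-folder; the original snippet used it without defining it
    photo_path = "C:/Users/xxx/Desktop/photo/{}/{}.jpg"
    os.makedirs(os.path.dirname(photo_path.format(qq, '')), exist_ok=True)
    # Starts at 1, so the first 'album-cover' element (index 0) is skipped
    photoIndex = 1
    while True:
        # Return to the top-level page and open the album tab
        driver.switch_to.default_content()
        driver.find_element(By.XPATH, '//*[@id="menuContainer"]/div/ul/li[3]/a').click()
        driver.implicitly_wait(10)
        time.sleep(3)
        driver.switch_to.frame('app_canvas_frame')
        a = driver.find_elements(By.CLASS_NAME, 'album-cover')
        a[photoIndex].click()
        driver.implicitly_wait(10)
        time.sleep(3)
        # Open the first photo of the album in the viewer
        p = driver.find_elements(By.CLASS_NAME, 'item-cover')[0]
        p.click()
        time.sleep(3)
        driver.switch_to.parent_frame()
        while True:
            img = driver.find_element(By.ID, 'js-img-disp')
            src = img.get_attribute('src').replace('&t=5', '')
            name = driver.find_element(By.ID, 'js-photo-name').text
            urlretrieve(src, photo_path.format(qq, name))
            # The info bar shows "current/total"
            counts = driver.find_element(By.XPATH, '//*[@id="js-ctn-infoBar"]/div/div[1]/span').text.split('/')
            if int(counts[0]) == int(counts[1]):
                # Last photo reached: click the viewer link (presumably the close button)
                driver.find_element(By.XPATH, '//*[@id="js-viewer-main"]/div[1]/a').click()
                break
            # Retry up to 10 times until the "next photo" button is available
            for i in range(10):
                buttons = driver.find_elements(By.ID, 'js-btn-nextPhoto')
                if buttons:
                    ActionChains(driver).click(buttons[0]).perform()
                    break
                else:
                    time.sleep(5)
        photoIndex += 1
        if len(a) <= photoIndex:
            break
Conclusion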
Browsing through years‑old posts and photos can feel like uncovering a digital time capsule. With Selenium automation, you can efficiently retrieve and preserve these memories.
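Note that get_shuoshuo only prints each post, so nothing is kept once the console closes. Here is a minimal sketch of writing posts to disk instead, assuming you collect (timestamp, text) pairs in a list inside the loop. The names `save_posts` and `load_posts` and the JSON Lines layout are illustrative choices, not part of the original script.

```python
import json


def save_posts(posts, path):
    """Append (timestamp, text) pairs to a JSON Lines file, one post per line."""
    with open(path, "a", encoding="utf-8") as f:
        for timestamp, text in posts:
            record = {"time": timestamp, "text": text}
            # ensure_ascii=False keeps Chinese text readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_posts(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Opening the file in append mode means each page of posts can be saved as soon as it is parsed, so an interrupted run still keeps what was already collected.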
