Build a Python Web Scraper that Emails Daily Vocabulary Reminders

This tutorial walks you through creating a Python script that crawls an online dictionary to fetch word‑meaning pairs, formats them, and emails the list each day, using requests and lxml for scraping and smtplib for delivery.

Python Crawling & Data Mining

Preface

The author, a Python enthusiast, shares a small project inspired by a code snippet that periodically reminds users to study English vocabulary after a recent CET‑4/6 exam.

Implementation Idea

The solution combines two parts: a Python web crawler that extracts words and their Chinese meanings from a web page, and an email‑sending routine that formats the data and delivers it to a specified recipient.

Implementation Process

The complete source code is shown below; configure your email address, authorization code, and recipient before running.
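Before the listing: typing or hard‑coding credentials each run is awkward. One alternative sketch reads them from environment variables instead (the variable names below are my own invention, not part of the original script):

```python
import os

# Hypothetical environment variable names -- adapt to your own setup.
# For QQ Mail, the "password" is the SMTP authorization code, not the
# account login password.
account = os.environ.get("VOCAB_MAIL_ACCOUNT", "")
password = os.environ.get("VOCAB_MAIL_AUTH_CODE", "")
receiver = os.environ.get("VOCAB_MAIL_RECEIVER", "")
```

This keeps secrets out of the source file, which matters if you ever share or commit the script.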

from lxml import etree
import requests
import random
import smtplib
from email.mime.text import MIMEText
from email.header import Header

# Prompt for credentials at startup. For QQ Mail, the password is the
# SMTP authorization code, not the account login password.
account = input('Enter your email address: ')
password = input('Enter your authorization code: ')
receiver = input("Enter the recipient's email address: ")

def recipe_spider():
    num = 0
    list_all = ''
    # Each tuple is (word-list class id, number of courses in that class);
    # pick a random class, then a random course within it.
    choice = random.choice([(11, 226), (12, 105), (122, 35), (123, 25)])
    url = "http://word.iciba.com/?action=words&class=" + str(choice[0]) + "&course=" + str(
        random.randint(1, choice[1]))
    r = requests.get(url)
    r.encoding = r.apparent_encoding
    if r.status_code == 200:
        doc = etree.HTML(r.text)
        # The two XPath queries walk the same <li> items, so words[i]
        # lines up with meaning[i].
        words = doc.xpath('//*[@class="word_main_list"]/li/div[@class="word_main_list_w"]/span//text()')
        meaning = doc.xpath('//*[@class="word_main_list"]/li/div[@class="word_main_list_s"]/span//text()')
        li = []
        for i in range(len(words)):
            num += 1
            n = '''
%s、 %s     %s
            ''' % (num, words[i].strip(), meaning[i].strip())
            list_all = list_all + n
            li.append({'words': words[i], 'meaning': meaning[i]})
        print(li)
    return list_all  # empty string if the request failed

def send_email(list_all):
    global account, password, receiver
    mailhost = 'smtp.qq.com'
    qqmail = smtplib.SMTP_SSL(mailhost, 465)  # SSL connection on port 465
    qqmail.login(account, password)
    content = "Dear, here are today's words to memorize: " + list_all
    message = MIMEText(content, 'plain', 'utf-8')
    message['Subject'] = Header("Today's vocabulary", 'utf-8')
    try:
        qqmail.sendmail(account, receiver, message.as_string())
        print('Email sent successfully')
    except smtplib.SMTPException:
        print('Failed to send email')
    qqmail.quit()

def job():
    print('Starting a run')
    list_all = recipe_spider()
    send_email(list_all)
    print('Run finished')

if __name__ == '__main__':
    job()
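The heart of the scraper is the pair of XPath queries in recipe_spider(). The following self-contained illustration runs them against trimmed stand-in markup modeled on the iciba word list (the live page's structure may differ), showing how the two queries yield parallel lists that zip into word‑meaning pairs:

```python
from lxml import etree

# Stand-in markup mimicking the word_main_list structure the script targets.
sample = """
<ul class="word_main_list">
  <li>
    <div class="word_main_list_w"><span>abandon</span></div>
    <div class="word_main_list_s"><span>v. 放弃</span></div>
  </li>
  <li>
    <div class="word_main_list_w"><span>benefit</span></div>
    <div class="word_main_list_s"><span>n. 好处</span></div>
  </li>
</ul>
"""

doc = etree.HTML(sample)
words = doc.xpath('//*[@class="word_main_list"]/li/div[@class="word_main_list_w"]/span//text()')
meanings = doc.xpath('//*[@class="word_main_list"]/li/div[@class="word_main_list_s"]/span//text()')
# Both queries walk the same <li> elements in document order,
# so index i of one list matches index i of the other.
pairs = list(zip((w.strip() for w in words), (m.strip() for m in meanings)))
print(pairs)  # → [('abandon', 'v. 放弃'), ('benefit', 'n. 好处')]
```

Because both expressions traverse the same `<li>` nodes in document order, the pairing is positional; if the page ever interleaves extra elements, the two lists can fall out of step, which is worth checking with `len(words) == len(meanings)` in production use.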

After setting the correct SMTP credentials, the script fetches a random set of words, assembles them into a formatted string, and sends the result via email each time it runs.
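As written, `__main__` runs job() once; to get a daily reminder you need a timer around it. With the schedule library that is typically `schedule.every().day.at("07:00").do(job)` plus a run‑pending loop; a standard-library-only sketch (the `seconds_until` helper below is my own addition, not part of the original script) looks like this:

```python
import datetime
import time

def seconds_until(hour, minute=0):
    """Seconds from now until the next local occurrence of hour:minute."""
    now = datetime.datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # That time has already passed today; aim for tomorrow.
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()

# Daily loop (sketch): sleep until 07:00, run the job, repeat.
# while True:
#     time.sleep(seconds_until(7))
#     job()
```

The loop is left commented out so the helper can be tested in isolation; in a real deployment a cron entry or systemd timer invoking the script once per day is an equally valid, and often simpler, choice.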

Conclusion

The article demonstrates a practical Python project that merges web scraping with automated email delivery, offering a convenient way to receive daily vocabulary reminders and illustrating basic backend automation techniques.

Tags: Python, scheduler, web-scraping, email automation, lxml
Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
