Master Python Async/Await: From Coroutines to a Real-World Reddit Scraper

This tutorial explains Python's asynchronous programming model—including coroutines, the yield‑from syntax, async/await keywords, and event‑loop management—while providing a complete, runnable example that fetches Reddit JSON data concurrently using aiohttp.

MaGe Linux Operations

In recent years asynchronous programming has become popular because, although harder than sequential code, it allows a program to handle more work with fewer resources.

In synchronous code, an HTTP request blocks the thread until a response arrives. In async code, the coroutine that issued the request is suspended and the event loop runs other work until the response is ready, which makes better use of a single thread for I/O-bound tasks.
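
To make the difference concrete, here is a minimal sketch that compares two waits run one after another with the same two waits overlapped. The fake_request and timed helpers are hypothetical names for this illustration, and asyncio.sleep stands in for a real network round trip.

```python
import asyncio
import time


async def fake_request(name, delay):
    # asyncio.sleep stands in for a network round trip (an assumption for this sketch).
    await asyncio.sleep(delay)
    return name


async def run_sequentially():
    # Each await finishes before the next request starts.
    return [await fake_request('a', 0.2), await fake_request('b', 0.2)]


async def run_concurrently():
    # gather starts both waits so they overlap on one thread.
    return await asyncio.gather(fake_request('a', 0.2), fake_request('b', 0.2))


def timed(coro_func):
    # Run a coroutine function to completion and measure wall-clock time.
    start = time.perf_counter()
    result = asyncio.run(coro_func())
    return result, time.perf_counter() - start


if __name__ == '__main__':
    for func in (run_sequentially, run_concurrently):
        result, elapsed = timed(func)
        print(func.__name__, result, round(elapsed, 2))
```

The sequential version should take roughly the sum of the two delays, while the concurrent version should take roughly the longer of the two.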

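Python supports asynchronous functions, called coroutines, through the async keyword. Older code used the @asyncio.coroutine decorator on a generator function instead; that decorator was deprecated in Python 3.8 and removed in 3.11, so it only runs on older interpreters. The two forms look like this:

import asyncio

# Native coroutine (Python 3.5+).
async def ping_server(ip):
    pass

# Legacy generator-based coroutine; asyncio.coroutine was removed in Python 3.11.
@asyncio.coroutine
def load_file(path):
    pass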

Calling either function returns a coroutine object without running its body. It is loosely comparable to a JavaScript Promise, except that nothing executes until the object is awaited or scheduled on an event loop. To check whether an object is a coroutine object, use asyncio.iscoroutine(obj).
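
As a small sketch of these checks, the snippet below builds a coroutine object, inspects it, and only then runs it. The return value of ping_server is made up for the illustration, and inspect.iscoroutinefunction is used here as one way to test the function itself rather than the object it returns.

```python
import asyncio
import inspect


async def ping_server(ip):
    # Placeholder body; a real implementation would contact the host.
    return 'pinged ' + ip


# Calling the function only builds a coroutine object; the body has not run yet.
coro = ping_server('192.168.1.1')
is_coroutine_object = asyncio.iscoroutine(coro)
is_coroutine_function = inspect.iscoroutinefunction(ping_server)
function_is_not_object = asyncio.iscoroutine(ping_server)

# The body runs only once the object is awaited or handed to an event loop.
result = asyncio.run(coro)
print(is_coroutine_object, is_coroutine_function, function_is_not_object, result)
# prints: True True False pinged 192.168.1.1
```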

Yield from

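Before async/await, coroutines were driven with the yield from expression, which was added to the language in Python 3.3. asyncio, which arrived in Python 3.4, built its generator-based coroutines on it. yield from works in any generator function; asyncio's legacy coroutines marked such a function with @asyncio.coroutine:

import asyncio

@asyncio.coroutine
def get_json(client, url):
    # Suspends here until the load_file coroutine defined earlier finishes.
    file_content = yield from load_file('/Users/scott/data.txt')

Using yield from outside a function body, or inside an async def function, raises a SyntaxError.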
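
The delegation mechanism itself can be sketched without asyncio at all. In this illustration, read_chunks and relay are hypothetical plain generators; relay forwards every value from read_chunks and then receives its return value, which is the same hand-off that generator-based coroutines relied on.

```python
def read_chunks():
    # A plain generator standing in for a data source (hypothetical values).
    yield 'chunk-1'
    yield 'chunk-2'
    return 'done'


def relay():
    # yield from forwards each value from read_chunks and captures its return value.
    status = yield from read_chunks()
    yield 'inner generator returned: ' + status


values = list(relay())
print(values)
# prints: ['chunk-1', 'chunk-2', 'inner generator returned: done']
```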

Async/await

Python 3.5 introduced the clearer async / await syntax. An async def function defines a coroutine, and await suspends it until an awaitable, such as another coroutine, completes:

async def ping_server(ip):
    # ping code here...
    pass

async def ping_local():
    return await ping_server('192.168.1.1')

Generator-based coroutines using @asyncio.coroutine still worked alongside async/await in Python 3.5, but the decorator was deprecated in Python 3.8 and removed in 3.11, so new code should use async/await.
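
A runnable variant of the ping example might look like the sketch below. The dictionary returned by ping_server and the short asyncio.sleep are assumptions that stand in for a real ping; the point is only that the value returned by the awaited coroutine becomes the value of the await expression.

```python
import asyncio


async def ping_server(ip):
    # asyncio.sleep stands in for the real ping (an assumption for this sketch).
    await asyncio.sleep(0.01)
    return {'ip': ip, 'alive': True}


async def ping_local():
    # await suspends ping_local until ping_server returns its result.
    reply = await ping_server('192.168.1.1')
    return reply['alive']


print(asyncio.run(ping_local()))
# prints: True
```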

Running the event loop

The event loop drives coroutine execution. It provides registration, execution, and cancellation of asynchronous calls, creation of client/server protocols, subprocess handling, and the ability to run functions in a thread pool.
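
Two of these services, running a blocking function in a thread pool and cancelling a scheduled call, can be sketched as follows. The blocking_read helper and the file name are hypothetical; run_in_executor with None as the first argument uses the loop's default thread pool.

```python
import asyncio
import time


def blocking_read(path):
    # Stand-in for a blocking call such as file or legacy library I/O.
    time.sleep(0.05)
    return 'contents of ' + path


async def main():
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    future = loop.run_in_executor(None, blocking_read, 'data.txt')
    # Schedule a long sleep as a task, then cancel it before it completes.
    sleeper = asyncio.ensure_future(asyncio.sleep(60))
    sleeper.cancel()
    text = await future
    try:
        await sleeper
    except asyncio.CancelledError:
        cancelled = True
    else:
        cancelled = False
    return text, cancelled


print(asyncio.run(main()))
# prints: ('contents of data.txt', True)
```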

A minimal example to start the loop and run a coroutine:

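import asyncio

async def speak_async():
    print('OMG asynchronicity!')

# Create the loop explicitly; relying on get_event_loop() to create one
# is deprecated in recent Python versions.
loop = asyncio.new_event_loop()
loop.run_until_complete(speak_async())
loop.close()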

The run_until_complete call blocks until the coroutine finishes. The program still runs in a single thread: whenever a coroutine awaits I/O, the event loop switches to another ready coroutine instead of sitting idle.
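
On Python 3.7 and later, asyncio.run creates the loop, runs a coroutine to completion, and closes the loop in one call. The sketch below uses it to show two coroutines interleaving on one thread; the log list, the labels, and the delays are assumptions added for the illustration.

```python
import asyncio
import threading


async def speak_async(label, delay, log):
    log.append((label, 'start', threading.current_thread().name))
    # Awaiting hands control back to the loop so the other coroutine can run.
    await asyncio.sleep(delay)
    log.append((label, 'end', threading.current_thread().name))


async def main():
    log = []
    await asyncio.gather(
        speak_async('first', 0.05, log),
        speak_async('second', 0.01, log),
    )
    return log


log = asyncio.run(main())
for entry in log:
    print(entry)
```

Both coroutines should record their start before either records its end, and every entry should carry the same thread name.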

An example

The following complete script asynchronously fetches the top posts from three Reddit subreddits, parses the JSON, and prints each article's score, title, and URL. It demonstrates creating multiple coroutines with asyncio.ensure_future and keeping the loop alive until all requests finish.

import asyncio
import json

import aiohttp

async def get_json(client, url):
    async with client.get(url) as response:
        # Raise an exception for any 4xx or 5xx status instead of asserting.
        response.raise_for_status()
        return await response.read()

async def get_reddit_top(subreddit, client):
    data = await get_json(client, 'https://www.reddit.com/r/' + subreddit + '/top.json?sort=top&t=day&limit=5')
    j = json.loads(data.decode('utf-8'))
    for i in j['data']['children']:
        score = i['data']['score']
        title = i['data']['title']
        link = i['data']['url']
        print(f"{score}: {title} ({link})")
    print('DONE:', subreddit)

async def main():
    # Create the session inside a coroutine so it binds to the running loop
    # and is closed cleanly when the block exits.
    async with aiohttp.ClientSession() as client:
        # ensure_future schedules each coroutine as a task right away.
        tasks = [
            asyncio.ensure_future(get_reddit_top(subreddit, client))
            for subreddit in ('python', 'programming', 'compsci')
        ]
        # Keep the loop running until every request has finished.
        await asyncio.gather(*tasks)

try:
    asyncio.run(main())
except KeyboardInterrupt:
    # Ctrl+C cancels the pending tasks; the session is closed by async with.
    pass

Install the required library with:

pip install aiohttp

Running the script on Python 3.7+ (asyncio.run requires 3.7) prints each post's score, title, and URL, with each subreddit's results appearing in the order its HTTP response arrives.
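
The arrival-order behaviour can be reproduced without network access. In this sketch, fake_fetch is a hypothetical stand-in for an HTTP request and the delays are made up; asyncio.as_completed hands back results in the order the simulated responses finish, not the order the requests were created.

```python
import asyncio


async def fake_fetch(subreddit, delay):
    # asyncio.sleep simulates network latency; the delays are made up.
    await asyncio.sleep(delay)
    return subreddit


async def main():
    requests = [
        fake_fetch('python', 0.15),
        fake_fetch('programming', 0.05),
        fake_fetch('compsci', 0.10),
    ]
    finished = []
    # as_completed yields awaitables in the order they finish.
    for next_done in asyncio.as_completed(requests):
        finished.append(await next_done)
    return finished


print(asyncio.run(main()))
# prints: ['programming', 'compsci', 'python']
```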

Conclusion

Although Python's built‑in async features are not as succinct as JavaScript's, they enable more efficient, responsive applications. Spending about half an hour learning the basics of asyncio, coroutines, and the event loop can greatly improve the performance of I/O‑bound projects.
