How to Crawl Real-Time Data with Python WebSocket: A Step‑by‑Step Guide

This article shows crawler engineers how to fetch real‑time data such as sports scores, stock quotes, or cryptocurrency prices. It compares polling with WebSocket, introduces the aiowebsocket library, and provides complete Python code for the handshake, subscription, and continuous data streaming.

Huawei Cloud Developer Alliance

What is WebSocket

WebSocket is a full‑duplex protocol over a single TCP connection that enables the server to push data to the client, providing true real‑time updates.

Polling vs. WebSocket

Polling means the client repeatedly requests data at fixed intervals (e.g., every second), which introduces latency and extra overhead because each request carries HTTP headers.

WebSocket uses a push model where the server actively sends data, achieving minimal latency and true real‑time communication.
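
To make the latency difference concrete, the following is a minimal sketch that computes how long an update waits before a polling client sees it. It assumes polls fire at exact multiples of the interval and ignores network time; the function name polling_delay is hypothetical and not part of any library.

```python
import math


def polling_delay(event_time, interval):
    """Seconds between an update at `event_time` and the next poll.

    Assumption: polls fire at 0, interval, 2 * interval, and so on.
    """
    next_poll = math.ceil(event_time / interval) * interval
    return next_poll - event_time


if __name__ == "__main__":
    # Updates produced at these times (seconds), polled once per second.
    events = [0.25, 1.5, 2.75]
    delays = [polling_delay(t, 1.0) for t in events]
    print(delays)
    print(sum(delays) / len(delays))
```

With a push model the same updates would be sent as soon as they are produced, so the only remaining delay is network transit.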

WebSocket Advantages

Less control overhead: the handshake happens once per connection; afterwards each message travels in a frame whose header is only a few bytes, rather than carrying a full set of HTTP headers.

Stronger real‑time performance: server‑initiated pushes eliminate polling intervals.

Binary support: WebSocket can transmit binary frames, saving bandwidth.
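
The overhead claim can be checked with a little arithmetic. The sketch below computes the size of a WebSocket frame header from the framing rules in RFC 6455 (2 base bytes, an optional 2 or 8 byte extended length, and a 4 byte masking key on client-to-server frames). The function name frame_header_size is hypothetical.

```python
def frame_header_size(payload_len, masked):
    """Return the WebSocket frame header size in bytes (RFC 6455 framing).

    `masked` is True for client-to-server frames, which carry a masking key.
    """
    if payload_len <= 125:
        size = 2          # length fits in the 7-bit field
    elif payload_len <= 0xFFFF:
        size = 2 + 2      # 16-bit extended payload length
    else:
        size = 2 + 8      # 64-bit extended payload length
    if masked:
        size += 4         # masking key
    return size


if __name__ == "__main__":
    for length in (100, 1000, 70000):
        print(length, frame_header_size(length, masked=False),
              frame_header_size(length, masked=True))
```

A header of 2 to 14 bytes per message is small next to the headers of an HTTP request, which are often several hundred bytes.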

Crawlers and WebSocket

Typical HTTP libraries like requests cannot handle WebSocket connections; a dedicated WebSocket client library is required.

Choosing a Python WebSocket Library

Popular options include websocket-client (synchronous), websockets (asynchronous), and aiowebsocket (asynchronous). This guide focuses on aiowebsocket.

Installing aiowebsocket

Run pip install aiowebsocket to install.

Basic aiowebsocket Example

import asyncio
import logging
from datetime import datetime

from aiowebsocket.converses import AioWebSocket


async def startup(uri):
    # Entering the context opens the connection and performs the handshake.
    async with AioWebSocket(uri) as aws:
        converse = aws.manipulator
        message = b'AioWebSocket - Async WebSocket Client'
        while True:
            # Send one message, then wait for the server's reply.
            await converse.send(message)
            print('{time}-Client send: {message}'.format(
                time=datetime.now().strftime('%Y-%m-%d %H:%M:%S'), message=message))
            mes = await converse.receive()
            print('{time}-Client receive: {rec}'.format(
                time=datetime.now().strftime('%Y-%m-%d %H:%M:%S'), rec=mes))


if __name__ == '__main__':
    # Without this, logging.info() is suppressed by the default WARNING level.
    logging.basicConfig(level=logging.INFO)
    # Echo endpoint from the original example; it may no longer be online,
    # so substitute any WebSocket echo server if the connection fails.
    remote = 'ws://echo.websocket.org'
    try:
        asyncio.run(startup(remote))
    except KeyboardInterrupt:
        logging.info('Quit.')

Observing WebSocket Traffic in Chrome

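Open DevTools, switch to the Network panel, and select the WS filter. Locate the request named realTime, then inspect its Headers and Frames tabs (recent Chrome versions label the latter Messages). The handshake request uses the ws or wss scheme, and the server answers with status code 101 (Switching Protocols).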
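
In the Headers tab, the request carries a Sec-WebSocket-Key header and the 101 response carries a Sec-WebSocket-Accept header. As a rough illustration of how the two relate, the sketch below derives the accept value the way RFC 6455 describes: SHA-1 over the key concatenated with a fixed GUID, then Base64. A client library such as aiowebsocket does this for you; the function name accept_key is hypothetical.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value for a Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key taken from the example in RFC 6455.
    print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
```

Comparing the computed value with the one shown in DevTools is a quick way to confirm that a hand-built handshake matches what the browser sent.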

Subscription Message

{"action":"subscribe","args":["QuoteBin5m:14"]}
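
Rather than hard-coding the string, the message can be built from a channel name. This is a small sketch using only the standard json module; the helper name build_subscription is hypothetical, and the compact separators are chosen so the output matches the frame observed in DevTools byte for byte.

```python
import json


def build_subscription(*channels):
    """Build a subscribe message for one or more channel names."""
    payload = {"action": "subscribe", "args": list(channels)}
    # Compact separators reproduce the observed frame without extra spaces.
    return json.dumps(payload, separators=(",", ":"))


if __name__ == "__main__":
    print(build_subscription("QuoteBin5m:14"))
```

Whether the server accepts several channels in one args list is an assumption; the captured traffic only shows a single channel.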

Sample Data Frame

{"group":"QuoteBin5m:14","data":[{"low":"55.42","high":"55.63","open":"55.42","close":"55.59","last_price":"55.59","avg_price":"55.5111587372932781077","volume":"40078","timestamp":1551941701,"rise_fall_rate":"0.0030674846625766871","rise_fall_value":"0.17","base_coin_volume":"400.78","quote_coin_volume":"22247.7621987324"}]}
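
A crawler usually needs these values as numbers and timestamps rather than strings. The sketch below parses a trimmed copy of the sample frame, assuming the timestamp is in Unix seconds and that prices should be kept exact with Decimal. The names SAMPLE and parse_bars are hypothetical.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal

# Trimmed copy of the sample data frame, keeping only a few fields.
SAMPLE = (
    '{"group":"QuoteBin5m:14","data":[{"low":"55.42","high":"55.63",'
    '"open":"55.42","close":"55.59","volume":"40078","timestamp":1551941701}]}'
)


def parse_bars(raw):
    """Convert one data frame (str or bytes) into a list of typed dicts."""
    frame = json.loads(raw)
    bars = []
    for item in frame.get("data", []):
        bars.append({
            "group": frame.get("group"),
            # Assumption: the timestamp is expressed in Unix seconds.
            "time": datetime.fromtimestamp(item["timestamp"], tz=timezone.utc),
            "open": Decimal(item["open"]),
            "high": Decimal(item["high"]),
            "low": Decimal(item["low"]),
            "close": Decimal(item["close"]),
            "volume": Decimal(item["volume"]),
        })
    return bars


if __name__ == "__main__":
    for bar in parse_bars(SAMPLE):
        print(bar["time"].isoformat(), bar["close"], bar["high"] - bar["low"])
```

json.loads accepts bytes as well as str, so the function can take the value returned by a receive call directly.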

Full Crawling Code

import asyncio
import logging
from datetime import datetime

from aiowebsocket.converses import AioWebSocket


async def startup(uri):
    # Entering the context opens the connection and performs the handshake.
    async with AioWebSocket(uri) as aws:
        converse = aws.manipulator
        # Subscribe once; the server then pushes data without further requests.
        await converse.send('{"action":"subscribe","args":["QuoteBin5m:14"]}')
        while True:
            mes = await converse.receive()
            print('{time}-Client receive: {rec}'.format(
                time=datetime.now().strftime('%Y-%m-%d %H:%M:%S'), rec=mes))


if __name__ == '__main__':
    # Without this, logging.info() is suppressed by the default WARNING level.
    logging.basicConfig(level=logging.INFO)
    remote = 'wss://api.bbxapp.vip/v1/ifcontract/realTime'
    try:
        asyncio.run(startup(remote))
    except KeyboardInterrupt:
        logging.info('Quit.')

When this script runs, it completes the handshake, sends the subscription message, and then prints every frame the server pushes, so the crawler captures the real‑time data as it arrives.
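
In a real crawler the receive loop usually hands each frame to processing code instead of printing it. The following is one possible way to factor that loop, shown against an in-memory stand-in so it runs without a network connection. The names consume, FakeConverse, and the limit parameter are hypothetical; the only assumption about the connection object is that it has an awaitable receive() method, like the manipulator used above.

```python
import asyncio
import json


async def consume(converse, on_frame, limit=None):
    """Receive frames and pass each decoded JSON object to `on_frame`.

    `limit` stops the loop after that many JSON frames; None means the
    loop runs until it is cancelled or the connection raises.
    """
    seen = 0
    while limit is None or seen < limit:
        raw = await converse.receive()
        try:
            frame = json.loads(raw)
        except (TypeError, ValueError):
            continue  # skip anything that is not a JSON payload
        on_frame(frame)
        seen += 1
    return seen


class FakeConverse:
    """In-memory stand-in for a connection, used only for this demo."""

    def __init__(self, frames):
        self._frames = list(frames)

    async def receive(self):
        return self._frames.pop(0)


if __name__ == "__main__":
    frames = [b"not json",
              b'{"group":"QuoteBin5m:14","data":[{"close":"55.59"}]}']
    collected = []
    asyncio.run(consume(FakeConverse(frames), collected.append, limit=1))
    print(collected)
```

Inside startup, the same function could be called with the real converse object and a callback that stores or forwards the data.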
