How to Crawl Real-Time Data with Python WebSocket: A Step‑by‑Step Guide
This article explains how crawler engineers can fetch real‑time data such as sports scores, stock quotes, or cryptocurrency prices. It compares polling with WebSocket, introduces the aiowebsocket library, and provides complete Python code that performs the handshake, sends a subscription message, and receives the continuous data stream.
What is WebSocket
WebSocket is a full‑duplex protocol over a single TCP connection that enables the server to push data to the client, providing true real‑time updates.
Polling vs. WebSocket
Polling means the client repeatedly requests data at fixed intervals (e.g., every second), which introduces latency and extra overhead because each request carries HTTP headers.
WebSocket uses a push model where the server actively sends data, achieving minimal latency and true real‑time communication.
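As a rough illustration of the latency point, the sketch below models polling without any networking. The function names and event times are hypothetical. The only assumption is that a polling client sees an update at the first poll at or after the moment the server has it, whereas a pushed update is sent as soon as it exists.

```python
import math


def polling_delays(event_times, interval):
    """Return, for each event, how long it waits until the next poll.

    Polls are assumed to happen at t = 0, interval, 2 * interval, ...
    """
    delays = []
    for t in event_times:
        next_poll = math.ceil(t / interval) * interval
        delays.append(next_poll - t)
    return delays


def polling_request_count(duration, interval):
    """Number of HTTP requests sent in [0, duration], including the poll at t = 0."""
    return int(duration // interval) + 1


if __name__ == "__main__":
    events = [0.25, 1.5, 3.0]  # hypothetical times (seconds) at which data changes
    print("polling delays:", polling_delays(events, interval=1.0))
    print("polling requests:", polling_request_count(3.0, interval=1.0))
    print("push: 1 handshake, then", len(events), "pushed messages")
```

With a one‑second interval, an update produced at t = 0.25 s waits 0.75 s for the next poll, and four requests are sent in three seconds whether or not anything changed.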
WebSocket Advantages
Less control overhead: only one handshake is needed, after which only data frames are exchanged.
Stronger real‑time performance: server‑initiated pushes eliminate polling intervals.
Binary support: WebSocket can transmit binary frames directly, which saves bandwidth compared with wrapping binary data in a text encoding such as Base64.
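To make the "less control overhead" point concrete, the following sketch encodes a client‑to‑server text frame following the framing rules in RFC 6455 and shows how few bytes are added to the payload. It is an illustration of the wire format only, not how aiowebsocket is implemented, and the function name is hypothetical.

```python
import os
import struct


def encode_client_text_frame(text, mask_key=None):
    """Encode one unfragmented, masked text frame as described in RFC 6455."""
    payload = text.encode("utf-8")
    if mask_key is None:
        mask_key = os.urandom(4)  # clients must mask every frame with a random key
    if len(mask_key) != 4:
        raise ValueError("mask_key must be exactly 4 bytes")

    header = bytearray([0x81])  # FIN bit set, opcode 0x1 (text)
    length = len(payload)
    if length < 126:
        header.append(0x80 | length)            # mask bit + 7-bit length
    elif length < 65536:
        header.append(0x80 | 126)               # mask bit + 16-bit length follows
        header += struct.pack("!H", length)
    else:
        header.append(0x80 | 127)               # mask bit + 64-bit length follows
        header += struct.pack("!Q", length)

    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + mask_key + masked


if __name__ == "__main__":
    message = '{"action":"subscribe","args":["QuoteBin5m:14"]}'
    frame = encode_client_text_frame(message)
    overhead = len(frame) - len(message.encode("utf-8"))
    print("payload bytes:", len(message), "framing overhead:", overhead)
```

For a payload shorter than 126 bytes, the framing adds 6 bytes on the client side (2 header bytes plus the 4‑byte mask key), far less than the headers of a typical HTTP request.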
Crawlers and WebSocket
Typical HTTP libraries like requests cannot handle WebSocket connections; a dedicated WebSocket client library is required.
Choosing a Python WebSocket Library
Popular options include websocket-client (synchronous), websockets (asynchronous), and aiowebsocket (asynchronous). This guide focuses on aiowebsocket.
Installing aiowebsocket
Run pip install aiowebsocket to install.
Basic aiowebsocket Example
    import asyncio
    import logging
    from datetime import datetime

    from aiowebsocket.converses import AioWebSocket


    async def startup(uri):
        # Connect to the server and get the object used to send and receive.
        async with AioWebSocket(uri) as aws:
            converse = aws.manipulator
            message = b'AioWebSocket - Async WebSocket Client'
            while True:
                # Send a message, then wait for the server's reply.
                await converse.send(message)
                print('{time}-Client send: {message}'.format(
                    time=datetime.now().strftime('%Y-%m-%d %H:%M:%S'), message=message))
                mes = await converse.receive()
                print('{time}-Client receive: {rec}'.format(
                    time=datetime.now().strftime('%Y-%m-%d %H:%M:%S'), rec=mes))


    if __name__ == '__main__':
        # Without this, logging.info() is hidden by the default WARNING level.
        logging.basicConfig(level=logging.INFO)
        remote = 'ws://echo.websocket.org'
        try:
            asyncio.run(startup(remote))
        except KeyboardInterrupt:
            logging.info('Quit.')

Observing WebSocket Traffic in Chrome
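Open DevTools, switch to the Network panel, select the WS filter, locate the request named realTime, and view its Headers and Frames tabs (the Frames tab is labelled Messages in newer Chrome versions). The request URL uses the ws or wss scheme, and a successful handshake returns status code 101 (Switching Protocols).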
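The values shown in the Headers tab can be checked by hand. Under RFC 6455, the server answers the client's Sec-WebSocket-Key request header with a Sec-WebSocket-Accept response header derived from it. The sketch below computes that value with the standard library; the function name is illustrative, and a WebSocket client library handles this exchange internally, so it is only meant to explain what DevTools displays.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key):
    """Return the Sec-WebSocket-Accept value a server should send for this key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Paste the Sec-WebSocket-Key from the request headers here and compare
    # the result with Sec-WebSocket-Accept in the response headers.
    print(expected_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

The key used in the demo is the example from RFC 6455 itself, for which the specification gives s3pPLMBiTxaQ9kYGzzhZRbK+xOo= as the accept value.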
Subscription Message
{"action":"subscribe","args":["QuoteBin5m:14"]}

Sample Data Frame
{
  "group": "QuoteBin5m:14",
  "data": [
    {
      "low": "55.42",
      "high": "55.63",
      "open": "55.42",
      "close": "55.59",
      "last_price": "55.59",
      "avg_price": "55.5111587372932781077",
      "volume": "40078",
      "timestamp": 1551941701,
      "rise_fall_rate": "0.0030674846625766871",
      "rise_fall_value": "0.17",
      "base_coin_volume": "400.78",
      "quote_coin_volume": "22247.7621987324"
    }
  ]
}

Full Crawling Code
    import asyncio
    import logging
    from datetime import datetime

    from aiowebsocket.converses import AioWebSocket


    async def startup(uri):
        async with AioWebSocket(uri) as aws:
            converse = aws.manipulator
            # Subscribe once, right after the connection is established.
            await converse.send('{"action":"subscribe","args":["QuoteBin5m:14"]}')
            while True:
                # Each receive() returns the next message pushed by the server.
                mes = await converse.receive()
                print('{time}-Client receive: {rec}'.format(
                    time=datetime.now().strftime('%Y-%m-%d %H:%M:%S'), rec=mes))


    if __name__ == '__main__':
        # Without this, logging.info() is hidden by the default WARNING level.
        logging.basicConfig(level=logging.INFO)
        remote = 'wss://api.bbxapp.vip/v1/ifcontract/realTime'
        try:
            asyncio.run(startup(remote))
        except KeyboardInterrupt:
            logging.info('Quit.')

Running this script completes the handshake and sends the subscription message; the server then pushes real‑time data continuously, and the crawler captures each message in the receive loop.
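Printing the raw message is rarely the end goal for a crawler. As a minimal sketch of the next step, the code below builds the subscription message and turns a pushed frame into typed Python values. The helper names are hypothetical, the field names are taken from the sample data frame above, and it is an assumption that every pushed frame has the same shape and arrives as either bytes or str.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal

# Sample data frame from this article, shortened to the fields used below.
SAMPLE_FRAME = (
    b'{"group":"QuoteBin5m:14","data":[{"low":"55.42","high":"55.63",'
    b'"open":"55.42","close":"55.59","volume":"40078","timestamp":1551941701}]}'
)


def build_subscribe(channel):
    """Serialize a subscription request for one channel."""
    return json.dumps({"action": "subscribe", "args": [channel]}, separators=(",", ":"))


def parse_quote_frame(raw):
    """Convert one pushed frame into a list of dicts with typed values."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    message = json.loads(raw)
    rows = []
    for item in message.get("data", []):
        rows.append({
            "group": message.get("group"),
            # Timestamps in the sample are Unix seconds.
            "time": datetime.fromtimestamp(item["timestamp"], tz=timezone.utc),
            # Decimal keeps the price strings exact instead of rounding to float.
            "open": Decimal(item["open"]),
            "high": Decimal(item["high"]),
            "low": Decimal(item["low"]),
            "close": Decimal(item["close"]),
            "volume": Decimal(item["volume"]),
        })
    return rows


if __name__ == "__main__":
    print(build_subscribe("QuoteBin5m:14"))
    for row in parse_quote_frame(SAMPLE_FRAME):
        print(row)
```

Inside the receive loop, the call would look like rows = parse_quote_frame(mes), after which the rows can be written to a database or a file.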
Huawei Cloud Developer Alliance