Build a Simple Node.js Web Crawler in 16 Lines with Request & Cheerio

This guide walks through building a lightweight Node.js web crawler with the request and cheerio modules. It covers preparation, installation, the core code, and testing, so that you can fetch a page's HTML, parse out the data you need, and store the results in about 16 lines of code.

Tencent IMWeb Frontend Team

Since Node.js appeared, developers have used it for tasks traditionally handled by backend languages like PHP or Python, such as writing web crawlers. This tutorial shows how to build a simple crawler in about 16 lines of code.

Crawler Overview

1. Send HTTP requests to obtain page HTML (optionally adding headers like cookies or referer).
2. Parse the HTML using regular expressions or third‑party modules to extract useful data.
3. Persist the extracted data to a database or file.
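To make the optional headers in the first step concrete, here is a minimal sketch of an options object in the shape that request accepts (a `url` plus a `headers` map). The helper name `buildRequestOptions`, the URLs, and the header values are all assumptions made for illustration; they do not come from the original article.

```javascript
// Hypothetical helper: builds the options object passed to request().
// The header values used here are placeholders, not taken from the article.
function buildRequestOptions(url, { cookie, referer } = {}) {
  const headers = {
    // Some sites respond differently when no User-Agent is sent.
    'User-Agent': 'simple-crawler-example',
  };
  if (cookie) headers.Cookie = cookie;     // e.g. a session cookie
  if (referer) headers.Referer = referer;  // the page the request claims to come from
  return { url, headers };
}

const options = buildRequestOptions('https://example.com/articles', {
  referer: 'https://example.com/',
});
console.log(options);

// With the request module installed, the object can be used as:
// request(options, (error, response, body) => { /* parse body here */ });
```

Keeping the options in a plain object means the same headers can be reused for every page the crawler visits.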

Preparation Stage

NPM

Install the two required modules, request and cheerio, by running npm install request cheerio in your project directory.

After installation, your package.json will list both modules under dependencies (run npm init -y first if the project does not have a package.json yet).

crawler.js

Create a file named crawler.js and require the installed modules:

const request = require('request');
const cheerio = require('cheerio');

Learning Stage

REQUEST

The request module is a simplified HTTP client that wraps http.request, making it easy to download resources.
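As a rough illustration of what "simplified" means here, the sketch below fetches a page through request's callback interface, which receives `(error, response, body)`. The function name `fetchHtml` and its `requestFn` parameter are assumptions for this example; the parameter exists only so that a stand-in can replace the real module in tests. Note that request has been deprecated upstream since 2020, although it still installs and runs.

```javascript
// Sketch: fetch a page and hand back its HTML through a Node-style callback.
// `requestFn` defaults to the request module; it is a parameter only so the
// function can be exercised with a stand-in during testing.
function fetchHtml(url, callback, requestFn = require('request')) {
  requestFn(url, (error, response, body) => {
    // Network-level failures (DNS, refused connection, ...) arrive as `error`.
    if (error) return callback(error);
    // HTTP-level failures still produce a response, so check the status code.
    if (response.statusCode !== 200) {
      return callback(new Error(`Unexpected status ${response.statusCode}`));
    }
    callback(null, body); // body is the page HTML as a string
  });
}

// Example call (requires network access and the request module):
// fetchHtml('https://example.com/', (error, html) => {
//   if (error) throw error;
//   console.log(html.length);
// });
```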

CHEERIO

cheerio provides a server‑side implementation of jQuery’s core API, allowing you to manipulate and query the DOM of fetched HTML quickly and flexibly.
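The jQuery-style querying can be sketched as follows: `cheerio.load` turns an HTML string into a `$` function, and selectors then work much as they do in the browser. The function name `extractLinks`, the `ul.articles a` selector, and the sample markup in the comment are assumptions chosen for illustration.

```javascript
// Sketch: collect the text and href of every link inside a list.
// The selector 'ul.articles a' is an assumption; adjust it to the target page.
// `cheerio` is a parameter only so a stand-in can be substituted in tests.
function extractLinks(html, cheerio = require('cheerio')) {
  const $ = cheerio.load(html); // build a queryable document from the HTML string
  const links = [];
  $('ul.articles a').each((index, element) => {
    links.push({
      title: $(element).text().trim(), // visible link text, whitespace trimmed
      url: $(element).attr('href'),    // raw href attribute, may be relative
    });
  });
  return links;
}

// Example call (requires the cheerio module):
// extractLinks('<ul class="articles"><li><a href="/a/1"> First post </a></li></ul>');
// returns [{ title: 'First post', url: '/a/1' }]
```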

Construction Stage

Use request to fetch the target page (e.g., an article list on site A) and then parse the response with cheerio to extract the desired information.
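The fetch-then-parse step might look roughly like the sketch below. It is not the original article's code: the function name `crawl`, the `.article-list a` selector, the example URL, and the `deps` parameter (which lets stand-ins replace the two modules) are all assumptions made for illustration.

```javascript
// Sketch of the fetch-then-parse step. The URL, the '.article-list a' selector,
// and the `deps` parameter are assumptions made for illustration.
function crawl(url, onData, deps = {}) {
  // Fall back to the installed modules unless stand-ins are supplied.
  const request = deps.request || require('request');
  const cheerio = deps.cheerio || require('cheerio');

  request(url, (error, response, body) => {
    if (error || response.statusCode !== 200) {
      return onData(error || new Error(`HTTP ${response.statusCode}`));
    }
    const $ = cheerio.load(body);
    // .map() yields a cheerio collection; .get() converts it to a plain array.
    const data = $('.article-list a')
      .map((index, element) => ({
        title: $(element).text().trim(),
        link: $(element).attr('href'),
      }))
      .get();
    onData(null, data);
  });
}

// Example call (requires network access and both modules):
// crawl('https://example.com/articles', (error, data) => {
//   if (error) throw error;
//   console.log(data);
// });
```

The `data` array passed to the callback is a plain array of objects, so it can be serialized with JSON.stringify as is.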

Finally, write the extracted results to result.json:

const fs = require('fs');
// data is the array of items extracted with cheerio in the previous step
fs.writeFileSync('result.json', JSON.stringify(data, null, 2));

Experiment Stage

Run the crawler with node crawler.js. After execution, a result.json file should appear in your directory containing the scraped data.

Congratulations, you have built a functional web crawler with only about 16 lines of code.
