
Build a Simple Node.js Web Crawler in 16 Lines with Request & Cheerio

This guide walks you through creating a lightweight Node.js web crawler using the request and cheerio modules, covering preparation, installation, core code, and testing steps, so you can fetch page HTML, parse data, and store results with just a few dozen lines of code.

Tencent IMWeb Frontend Team

Since Node.js appeared, developers have used it for tasks traditionally handled by backend languages like PHP or Python, such as writing web crawlers. This tutorial shows how to build a simple crawler with just a few dozen lines of code.

Crawler Overview

1. Send an HTTP request to obtain the page HTML (optionally adding headers such as a cookie or referer).
2. Parse the HTML with regular expressions or third-party modules to extract the useful data.
3. Persist the extracted data to a database or file.

Preparation Stage

NPM

Install the two required modules, request and cheerio, by running the following in your project directory:

<code>npm install request cheerio
</code>

After installation, your package.json will list the two dependencies.

crawler.js

Create a file named crawler.js and require the installed modules:

<code>const request = require('request');
const cheerio = require('cheerio');
</code>

Learning Stage

REQUEST

The request module is a simplified HTTP client that wraps http.request, making it easy to download resources.

CHEERIO

The cheerio module provides a server-side implementation of jQuery's core API, allowing you to query and manipulate the DOM of fetched HTML quickly and flexibly.

Construction Stage

Use request to fetch the target page (for example, an article list on site A) and then parse the response with cheerio to extract the desired information.

Finally, write the extracted results to result.json:

<code>const fs = require('fs');

// "data" is the array of results extracted in the previous step.
fs.writeFileSync('result.json', JSON.stringify(data, null, 2));
</code>

Experiment Stage

Run the crawler:

<code>node crawler.js
</code>

After execution, a result.json file should appear in your directory containing the scraped data.

Congratulations, you have built a functional web crawler with only about 16 lines of code.

Tags: JavaScript, Node.js, cheerio, request, web crawler
Written by Tencent IMWeb Frontend Team

IMWeb Frontend Community, gathering frontend development enthusiasts. Follow us for refined live courses by top experts, cutting-edge technical posts, and to sharpen your frontend skills.