Using Nightmare with Electron for Web Automation and Zhihu Topic Crawling
This article introduces Electron and the Nightmare framework, explains how to install and use Nightmare for web automation and crawling, and provides a complete example of scraping Zhihu topic data with JavaScript, Node.js, and Cheerio, including code snippets and JSON output.
Electron lets JavaScript build desktop applications by pairing Chromium with Node.js and exposing native operating-system APIs; it is better thought of as a Node.js variant aimed at the desktop than as a web server.
Nightmare is a framework built on Electron that provides web automation and crawling capabilities, combining features of Puppeteer-like testing and request-like HTTP fetching.
Installation can be done via npm (optionally using the Taobao mirror), and a simple app.js script demonstrates creating a Nightmare instance, navigating to a page, logging messages, waiting, and closing.
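The install commands might look like the following (the Taobao registry mirror is optional and mainly useful from mainland China):

```shell
npm install nightmare
# or, via the Taobao npm mirror:
npm install nightmare --registry=https://registry.npm.taobao.org
```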
const Nightmare = require('nightmare')
const nightmare = new Nightmare({
  show: true,
  openDevTools: {
    mode: 'detach'
  }
})

nightmare.goto('https://www.hujiang.com')
  .evaluate(function () {
    // can use any window/document objects here and return a promise
    console.log('hello nightmare')
    console.log('window closes in 5 seconds')
  })
  .wait(5000)
  .end()
  .then(() => {
    console.log('close nightmare')
  })

The script prints "hello nightmare", waits five seconds, then prints "close nightmare".
Nightmare works by leveraging Electron's Browser environment together with Node.js I/O, enabling easy implementation of crawlers.
Typical operations include browser navigation (goto, back, forward, refresh), user actions (click, mousedown, mouseup, mouseover, type, insert, select, check, uncheck, scrollTo), script injection, wait, and evaluate.
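As a sketch of how these operations chain together (the URL and selectors below are placeholders, not from the article, and the snippet assumes nightmare is installed):

```javascript
// Sketch: one Nightmare chain touching each category of operation listed above.
// The target page and selectors are hypothetical.
function demoChain() {
  const Nightmare = require('nightmare') // requires `npm install nightmare`
  return new Nightmare({ show: true })
    .goto('https://example.com/search')   // browser navigation
    .type('input#keyword', 'nightmare')   // user action: type into a field
    .click('button#submit')               // user action: click
    .wait('.result-list')                 // wait until the selector appears
    .evaluate(function () {               // run script in the page context
      return document.querySelectorAll('.result-list li').length
    })
    .end()
}
```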
A complete Zhihu topic crawler example is presented, showing how to fetch root topics, traverse child topics via hover events, and extract data such as topic name, image, follower count, question count, and top‑answers.
/**
 * Crawl the topic-page URLs down to the given depth and save them to the specified file.
 * @param {string} rootUrl - the root topic URL
 * @param {number} deep - the crawl depth
 * @param {string} toFile - the output file name
 * @param {Function} cb - callback invoked when done
 */
async function crawlerTopicsFromRoot(rootUrl, deep, toFile, cb) {
  rootUrl = rootUrl || 'https://www.zhihu.com/topic/19776749/hot'
  toFile = toFile || './topicsTree.json'
  console.time()
  const result = await interactive.iAllTopics(rootUrl, deep)
  console.timeEnd()
  util.writeJSONToFile(result['topics'], toFile, cb)
}

crawlerTopicsFromRoot('', 2, '', _ => {
  console.log('crawl finished')
}) // fetch the topics' information
Inside the traversal, each topic is taken from a queue and its sidebar HTML fetched:

const cntObj = queue.shift()
const url = `https://www.zhihu.com/topic/${cntObj['id']}/hot`
const topicOriginalInfo = await nightmare
  .goto(url)
  .wait('.zu-main-sidebar') // wait for this element to appear
  .evaluate(function () {
    return document.querySelector('.zu-main-sidebar').innerHTML
  })
// ...subsequent processing
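The queue-driven traversal above can be sketched as a plain breadth-first loop. Here `fetchChildIds` is a hypothetical stand-in for the Nightmare/Cheerio steps that load a topic page and parse its child-topic ids:

```javascript
// Minimal sketch of the breadth-first traversal implied by `queue.shift()`.
// fetchChildIds(id) is an assumed async function returning child topic ids.
async function crawlTopicTree(rootId, maxDepth, fetchChildIds) {
  const queue = [{ id: rootId, depth: 0 }]
  const topics = []
  while (queue.length > 0) {
    const cnt = queue.shift()           // take the next topic off the queue
    topics.push(cnt)
    if (cnt.depth >= maxDepth) continue // stop descending at the depth limit
    const childIds = await fetchChildIds(cnt.id)
    for (const id of childIds) {
      queue.push({ id, depth: cnt.depth + 1, parent: cnt.id })
    }
  }
  return topics
}
```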
Hovering over each child topic triggers the profile-card popup, which is then parsed:

const hoverElement = `a.zm-item-tag[href$='${childTopics[i]['id']}']`
const waitElement = `.avatar-link[href$='${childTopics[i]['id']}']`
const topicAttached = await nightmare
  .mouseover(hoverElement) // trigger the hover event
  .wait(waitElement)
  .evaluate(function () {
    return document.querySelector('.zh-profile-card').innerHTML
  })
  .then(val => {
    return parseRule.crawlerTopicNumbericalAttr(val)
  })
  .catch(error => {
    console.error(error)
  })

The parsing rules are implemented with Cheerio:

const cheerio = require('cheerio')
/** Extract a topic's question count, top-answer count, and follower count. */
const crawlerTopicNumbericalAttr = function (html) {
  const $ = cheerio.load(html)
  const keys = ['questions', 'top-answers', 'followers']
  const obj = {}
  obj['avatar'] = $('.Avatar.Avatar--xs').attr('src')
  keys.forEach(key => {
    obj[key] = ($(`div.meta a.item[href$='${key}'] .value`).text() || '').trim()
  })
  return obj
}
/** Extract a topic's description and child topics. */
const crawlerTopics = function (html) {
  const $ = cheerio.load(html)
  const obj = {}
  const childTopics = crawlerAttachTopic($, '.child-topic')
  obj['desc'] = $('div.zm-editable-content').text() || ''
  if (childTopics.length > 0) {
    obj['childTopics'] = childTopics
  }
  return obj
}
/** Extract each child topic's id and name. */
const crawlerAttachTopic = function ($, selector) {
  const topicsSet = []
  $(selector).find('.zm-item-tag').each((index, elm) => {
    const self = $(elm)
    const topic = {}
    topic['id'] = self.attr('data-token')
    topic['value'] = self.text().trim()
    topicsSet.push(topic)
  })
  return topicsSet
}

The final JSON output illustrates the hierarchical structure of topics with fields like id, value, avatar, questions, top-answers, followers, and child ids.
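The shape of one node in that output is roughly as follows (the root id comes from the default URL above; all other values are placeholders, not real crawled data):

```json
{
  "id": "19776749",
  "value": "(topic name)",
  "avatar": "(avatar image URL)",
  "questions": "(question count)",
  "top-answers": "(top-answer count)",
  "followers": "(follower count)",
  "childTopics": [
    { "id": "(child topic id)", "value": "(child topic name)" }
  ]
}
```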
In summary, Nightmare’s main advantage for crawling is that it only requires the page URL and can trigger asynchronous data loading through simulated user actions, eliminating the need to manually craft HTTP requests.
Hujiang Technology