Boost Web Scraping Speed with Golang: Frameworks, Code Samples, and a Real‑World Douban Case

This article explains why Golang’s concurrency and low‑resource footprint make it ideal for web crawling, compares major Go crawling frameworks, and walks through a practical Douban book‑list and comment scraper with complete Colly and Rod code samples, plus a selection matrix and best‑practice summary.

Code Wrench

Why Choose Golang for Crawlers

Web crawling is a core skill for developers, and Golang's built-in concurrency brings real efficiency gains to crawler projects. Lightweight goroutines and efficient memory management let a single process keep many requests in flight at once, which shortens total crawl time.

Strong concurrency: goroutines and channels handle high-concurrency fetching with little code.

High performance, low resource usage: suitable for large-scale crawling tasks.

Easy deployment: compiles to a single binary, is container-friendly, and is easy to schedule in distributed environments.

One-line summary: Python is convenient; Go is the high-performance choice for data collection.

Popular Golang Crawling Frameworks

Colly – high‑performance crawler with a simple API, supports concurrency, queue, cache; cannot execute JavaScript natively.

goquery – HTML parsing library with jQuery‑like DOM queries; only parses, does not perform fetching.

chromedp – Chrome DevTools driver capable of scraping dynamic pages; incurs high browser resource consumption.

Rod – high‑level DevTools driver with automatic waiting and a modern API; similar to chromedp but with a smaller community.

Crawlab – distributed crawling management platform offering visual scheduling and multi‑language support; requires deployment and operational effort.

Douban Book & Review Case Study

Target site: Douban Books. The book‑list page is static HTML and can be batch‑scraped with Colly. The comment page requires JavaScript rendering and is handled by Rod.

Colly – Scrape Book List

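package main

import (
    "fmt"
    "log"

    "github.com/gocolly/colly/v2"
)

func main() {
    // Restrict the crawl to Douban Books and run requests asynchronously.
    c := colly.NewCollector(
        colly.AllowedDomains("book.douban.com"),
        colly.Async(true),
    )

    // Allow at most two concurrent requests to matching domains.
    if err := c.Limit(&colly.LimitRule{
        DomainGlob:  "*douban.*",
        Parallelism: 2,
    }); err != nil {
        log.Fatal(err)
    }

    // Each .subject-item element is one book entry on the list page.
    c.OnHTML(".subject-item", func(e *colly.HTMLElement) {
        title := e.ChildText("h2 a")
        href := e.ChildAttr("h2 a", "href")
        author := e.ChildText(".pub")
        fmt.Printf("Title: %s\nLink: %s\nAuthor info: %s\n\n", title, href, author)
    })

    c.OnError(func(r *colly.Response, err error) {
        log.Println("Request error:", r.Request.URL, err)
    })

    if err := c.Visit("https://book.douban.com/tag/小说"); err != nil {
        log.Fatal(err)
    }
    c.Wait() // Block until all asynchronous requests finish.
}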

Rod – Scrape Comments

package main

import (
    "fmt"
    "time"

    "github.com/go-rod/rod"
    "github.com/go-rod/rod/lib/launcher"
)

func main() {
    // Launch a headless browser and connect to it.
    url := launcher.New().Headless(true).MustLaunch()
    browser := rod.New().ControlURL(url).MustConnect()
    defer browser.MustClose()

    page := browser.MustPage("https://book.douban.com/subject/35582072/comments/")
    page.Timeout(10 * time.Second).MustWaitLoad()
    // MustElement retries until the selector matches, so this waits for the
    // first comment to render and panics if the timeout expires first.
    page.Timeout(10 * time.Second).MustElement(".comment-item")

    // Print the text of the first five comments.
    comments := page.MustElements(".comment-item .short")
    for i, c := range comments {
        if i >= 5 {
            break
        }
        fmt.Println("Comment:", c.MustText())
    }
}

Note: The examples fetch only public comments, for learning and analysis. Respect privacy, copyright, and robots.txt when adapting them.

Framework Selection Matrix

Static page batch scraping – Colly: efficient, lightweight.

HTML parsing – goquery: simple and easy to use.

JS-rendered pages – Rod / chromedp: browser simulation with interactive capabilities.

Distributed scheduling – Crawlab: visual management, multi-language support.
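Before picking between the static and JS-rendered rows of this matrix, it can help to check whether the content you want is already present in the HTML the server returns. The sketch below is a rough heuristic under stated assumptions, not a definitive test: `appearsStatic` and `fetchRaw` are hypothetical helpers, the URL is a placeholder, and a plain substring match can be fooled by markers that appear only inside scripts.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// appearsStatic reports whether marker (for example a class name such as
// "subject-item") already occurs in the HTML the server returned.
func appearsStatic(rawHTML, marker string) bool {
	return strings.Contains(rawHTML, marker)
}

// fetchRaw downloads a page without executing any JavaScript.
func fetchRaw(url string) (string, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	html, err := fetchRaw("https://example.com/") // placeholder URL
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	if appearsStatic(html, "subject-item") {
		fmt.Println("marker found in raw HTML: a static scraper such as Colly is likely enough")
	} else {
		fmt.Println("marker missing: the content may be rendered by JavaScript, consider Rod or chromedp")
	}
}
```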

Analogy of Crawling Schools

Colly → Shaolin light‑skill school: fast and steady.

goquery → Scholar school: deep parsing skill.

Rod → Demonic sect weapon: handles JS pages easily but is heavy to wield.

Crawlab → Martial‑arts alliance leader: orchestrates all disciples.

Conclusion

Combine Colly for static list extraction with Rod for dynamic comment fetching to keep the crawl efficient. Choose the framework by page type (static vs. dynamic), consider Crawlab for large-scale, distributed tasks, and stay compliant with copyright and robots.txt rules.
