Using Go’s Standard Library to Crawl with an HTTP Proxy

This guide demonstrates building a simple Go crawler that fetches a webpage using only the standard library, then extends it to route requests through an HTTP proxy, covering proxy parsing, custom client configuration, error handling, and essential Go best practices such as deferring response closure.

Golang Shines

The article first shows how to build a basic web crawler in Go that fetches the HTML of Baidu’s homepage using only the standard library. The code imports fmt, io/ioutil, and net/http, defines targetUrl, sends a GET request with http.Get, checks for errors, defers resp.Body.Close() to avoid leaking the connection, reads the response with ioutil.ReadAll, and prints the body. It notes that in Go 1.16+ ioutil.ReadAll is deprecated and can be replaced by io.ReadAll after importing io.

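package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
)

func main() {
    targetUrl := "https://www.baidu.com"

    // Send a GET request with the default client.
    resp, err := http.Get(targetUrl)
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    // Close the body when main returns so the connection is released.
    defer resp.Body.Close()

    // Read the whole response body into memory.
    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        fmt.Println("reading response failed:", err)
        return
    }
    fmt.Println(string(body))
}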
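
As a sketch of the Go 1.16+ variant mentioned above, the following swaps ioutil.ReadAll for io.ReadAll. The helper names fetchBody and demo, the status-code check, the 5-second timeout, and the local httptest server standing in for the target site are assumptions added for illustration; they are not part of the original article.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchBody is a hypothetical helper: it sends a GET request with the given
// client and returns the response body as a string.
func fetchBody(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", fmt.Errorf("request failed: %w", err)
	}
	// Always close the body so the underlying connection is released.
	defer resp.Body.Close()

	// Assumption: anything other than 200 OK is treated as a failure.
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}

	// io.ReadAll is the Go 1.16+ replacement for ioutil.ReadAll.
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("reading response failed: %w", err)
	}
	return string(body), nil
}

// demo runs fetchBody against a local test server so that the example
// does not depend on network access.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "<html>hello</html>")
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 5 * time.Second}
	return fetchBody(client, srv.URL)
}

func main() {
	body, err := demo()
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println(body)
}
```

To fetch a real page, pass its address to fetchBody in place of srv.URL, for example the targetUrl used in the article.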

Next, the article explains how to route requests through an HTTP proxy to hide the real IP. It defines a proxy string (with optional authentication), parses it with url.Parse, creates an http.Transport with Proxy: http.ProxyURL(proxyURL), builds a custom http.Client using that transport, and sends the request with client.Do. The example includes error handling for proxy parsing and request failures, and again reads and prints the response body. The author emphasizes that the proxy string must be a valid address and that the custom client is the key to enabling proxy support.

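package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
    "net/url"
)

func main() {
    targetUrl := "https://www.baidu.com"
    // Placeholder: substitute real credentials, a real host, and a numeric port.
    // url.Parse rejects a non-numeric port such as the literal "port".
    proxyStr := "http://username:password@proxyserver:port"
    proxyURL, err := url.Parse(proxyStr)
    if err != nil {
        fmt.Println("parsing proxy address failed:", err)
        return
    }

    // Route every request made by this client through the proxy.
    transport := &http.Transport{Proxy: http.ProxyURL(proxyURL)}
    client := &http.Client{Transport: transport}

    req, err := http.NewRequest("GET", targetUrl, nil)
    if err != nil {
        fmt.Println("building request failed:", err)
        return
    }
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println("proxy request failed:", err)
        return
    }
    defer resp.Body.Close()

    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        fmt.Println("reading response failed:", err)
        return
    }
    fmt.Println(string(body))
}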
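
One way to check that a custom client really sends traffic through the proxy, without needing a real proxy server, is to point it at a local stand-in. The sketch below is an illustration only: newProxyClient and demoProxy are hypothetical names, the stand-in "proxy" is an httptest server that reports the URL it was asked for instead of forwarding the request, and http://example.invalid/page is a made-up plain-HTTP target that is never contacted.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// newProxyClient is a hypothetical helper: it parses proxyStr and returns
// a client whose transport sends every request through that proxy.
func newProxyClient(proxyStr string) (*http.Client, error) {
	proxyURL, err := url.Parse(proxyStr)
	if err != nil {
		return nil, fmt.Errorf("parsing proxy address failed: %w", err)
	}
	transport := &http.Transport{Proxy: http.ProxyURL(proxyURL)}
	return &http.Client{Transport: transport}, nil
}

// demoProxy starts a local stand-in proxy and sends one request through it.
func demoProxy() (string, error) {
	// Stand-in proxy: it does not forward anything, it only echoes the
	// method and the absolute URL that the client asked it to fetch.
	proxy := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "proxy saw %s %s", r.Method, r.URL.String())
	}))
	defer proxy.Close()

	client, err := newProxyClient(proxy.URL)
	if err != nil {
		return "", err
	}

	// The target host does not exist; the request only reaches the proxy.
	resp, err := client.Get("http://example.invalid/page")
	if err != nil {
		return "", fmt.Errorf("proxy request failed: %w", err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("reading response failed: %w", err)
	}
	return string(body), nil
}

func main() {
	// The placeholder address from the article is not parseable as written,
	// because "port" is not a number.
	if _, err := newProxyClient("http://username:password@proxyserver:port"); err != nil {
		fmt.Println("placeholder rejected:", err)
	}

	out, err := demoProxy()
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(out)
}
```

Because the target here is plain HTTP, the stand-in receives the full absolute URL. For an HTTPS target such as the one in the article, the client would instead ask the proxy to open a CONNECT tunnel, which this stand-in does not implement.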

Finally, the article provides three practical notes: (1) the program relies solely on Go’s standard library, so no extra packages are required; (2) defer resp.Body.Close() must be used to prevent resource leaks; (3) in Go 1.16 and later, ioutil.ReadAll can be replaced by io.ReadAll after importing the io package.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
