Using Go’s Standard Library to Crawl with an HTTP Proxy
This guide demonstrates building a simple Go crawler that fetches a webpage using only the standard library, then extends it to route requests through an HTTP proxy, covering proxy parsing, custom client configuration, error handling, and essential Go best practices such as deferring response closure.
The article first shows how to build a basic web crawler in Go that fetches the HTML of Baidu’s homepage using only the standard library. The code imports fmt, io/ioutil, and net/http, defines targetUrl, sends a GET request with http.Get, checks for errors, defers resp.Body.Close() to avoid leaking the connection, reads the response with ioutil.ReadAll, and prints the body. In Go 1.16 and later, ioutil.ReadAll is deprecated and can be replaced by io.ReadAll after importing io.
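package main

import (
	"fmt"
	"io/ioutil" // deprecated since Go 1.16; io.ReadAll is the drop-in replacement
	"net/http"
)

func main() {
	targetUrl := "https://www.baidu.com"

	// Send a GET request with the default client.
	resp, err := http.Get(targetUrl)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	// Close the body when main returns so the connection is released.
	defer resp.Body.Close()

	// Read the whole response body into memory.
	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("failed to read body:", err)
		return
	}
	fmt.Println(string(body))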
}
Next, the article explains how to route requests through an HTTP proxy so that the target site sees the proxy's IP address rather than the crawler's. It defines a proxy string (with optional authentication), parses it with url.Parse, creates an http.Transport with Proxy: http.ProxyURL(proxyURL), builds a custom http.Client using that transport, and sends the request with client.Do. The example includes error handling for proxy parsing and request failures, and again reads and prints the response body. The author emphasizes that the proxy string must be a valid address and that the custom client is the key to enabling proxy support. Note that the proxy string in the listing is a placeholder: url.Parse rejects a non-numeric port, so it must be replaced with a real host and port before the program will run.
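Because the proxy string has to be a valid address, it can help to check it before building the client. The following is a minimal sketch, not part of the original article: parseProxy is a hypothetical helper, the address 127.0.0.1:8080 and the credentials are made up, and restricting the scheme to http or https is an assumption chosen to match the HTTP-proxy focus here.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// parseProxy is a hypothetical helper (not from the article). It parses a
// proxy string and rejects values that cannot work as an HTTP proxy address.
func parseProxy(raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("parse proxy: %w", err)
	}
	// Assumption: only http and https proxies are accepted in this sketch.
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, errors.New("proxy scheme must be http or https")
	}
	if u.Host == "" {
		return nil, errors.New("proxy host is empty")
	}
	return u, nil
}

func main() {
	// Made-up address and credentials, for illustration only.
	u, err := parseProxy("http://user:secret@127.0.0.1:8080")
	if err != nil {
		fmt.Println("invalid proxy:", err)
		return
	}
	fmt.Println("host:", u.Host, "user:", u.User.Username())

	// The article's placeholder string fails because "port" is not numeric.
	if _, err := parseProxy("http://username:password@proxyserver:port"); err != nil {
		fmt.Println("placeholder rejected:", err)
	}
}
```

A URL that passes this check can be handed to http.ProxyURL exactly as in the full proxy program that follows.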
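package main

import (
	"fmt"
	"io/ioutil" // deprecated since Go 1.16; io.ReadAll is the drop-in replacement
	"net/http"
	"net/url"
)

func main() {
	targetUrl := "https://www.baidu.com"
	// Placeholder: substitute real credentials, a real host, and a numeric port.
	// As written, url.Parse rejects the non-numeric "port".
	proxyStr := "http://username:password@proxyserver:port"

	proxyURL, err := url.Parse(proxyStr)
	if err != nil {
		fmt.Println("failed to parse proxy:", err)
		return
	}

	// Route every request made by this client through the proxy.
	transport := &http.Transport{Proxy: http.ProxyURL(proxyURL)}
	client := &http.Client{Transport: transport}

	req, err := http.NewRequest("GET", targetUrl, nil)
	if err != nil {
		fmt.Println("failed to build request:", err)
		return
	}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("proxy request failed:", err)
		return
	}
	// Close the body when main returns so the connection is released.
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("failed to read body:", err)
		return
	}
	fmt.Println(string(body))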
}
Finally, the article provides three practical notes: (1) the program relies solely on Go’s standard library, so no extra packages are required; (2) defer resp.Body.Close() must be used to prevent resource leaks; (3) in Go 1.16 and later, ioutil.ReadAll is deprecated and can be replaced by io.ReadAll after importing the io package.
Golang Shines
We share daily the latest Golang technical articles, practical resources, language news, tutorials, and real-world projects to help everyone learn and improve.
