Build an Efficient Incremental Sitemap Generator in Go: Automation, Sharding, and Search Engine Ping

This article presents a Go‑based solution for large‑scale SEO sitemap management that incrementally generates shards, automatically updates the index, cleans up old files, and pings search engines, all without external libraries or storage dependencies.

Code Wrench

Background and Problem

Sitemaps are essential for SEO, but on large dynamic sites they cause several issues:

Full‑generation latency: millions of URLs require a complete database scan each run.

Duplicate notifications: unchanged URLs are resent, wasting bandwidth and reducing search‑engine trust.

Complex index and file management: many shards with inconsistent naming make maintenance difficult.

Lack of automation: manual generation, upload, index update and ping are inefficient.

Design Goals

Incremental generation – process only new or modified URLs.

Automatic sharding and compression – each shard is limited to 50,000 URLs and 50 MB uncompressed, whichever limit is reached first.

Automatic index maintenance – append new shards to sitemap-index.xml and keep only the most recent N shards.

Search‑engine ping – automatically notify Google and Bing of updates.

Timestamped filenames for traceability.

Implementation Details

State Management and Incremental Detection

The tool stores a mapping of URL to its lastmod value in state.json:

{
  "https://example.com/item/1": "2025-11-08",
  "https://example.com/item/2": "2025-11-07"
}

During each run the program:

Loads the previous state.json.

Compares the current URL collection with the stored state.

Selects URLs that are new or whose lastmod has changed.

Generates sitemap shards only for those URLs.

Go example for detecting changes:

type UrlEntry struct {
    Loc     string `xml:"loc"`
    LastMod string `xml:"lastmod,omitempty"`
}

type StateMap map[string]string

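// diffURLs returns the entries that are new or whose lastmod differs from the
// previous state, plus the state map to persist after this run.
func diffURLs(old StateMap, all []UrlEntry) (changed []UrlEntry, newState StateMap) {
    newState = make(StateMap, len(all))
    for _, u := range all {
        newState[u.Loc] = u.LastMod
        // Absent from the old state (new URL) or lastmod changed (modified URL).
        if last, ok := old[u.Loc]; !ok || last != u.LastMod {
            changed = append(changed, u)
        }
    }
    return changed, newState
}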

Sharding and Compression

Each shard must satisfy two limits:

Maximum 50,000 URLs.

Maximum 50 MB uncompressed size.

Example shard filename (timestamped): delta-20251108-153045-001-part001.xml.gz

Core writer that respects both limits (gzip compression is applied):

func writeSitemapGzipWithLimit(baseFilename string, urls <-chan UrlEntry, maxURLs int, maxBytes int64) ([]SitemapMeta, error) {
    // Iterate over URLs, write to the current shard.
    // When maxURLs or maxBytes is exceeded, close the current file and start a new shard.
    // Return metadata for each created shard.
}

Index Maintenance and Old File Cleanup

After creating new shards the tool appends their entries to the main index sitemap-index.xml and retains only the most recent keep entries:

type SitemapMeta struct {
    Loc     string
    LastMod string
}

func appendToSitemapIndex(indexPath string, newEntries []SitemapMeta, baseURL string, keep int) (int, error) {
    // Read existing index, append new entries, keep the latest 'keep' entries, write back.
}

Shards that are no longer referenced in the index are removed:

func cleanupOldFiles(dir string, indexPath string) error {
    // List files referenced in the index and delete any shard file in 'dir' that is not listed.
}

Search‑Engine Ping

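The generator notifies major search engines so that new content is crawled promptly. Note that Google announced the retirement of its sitemap ping endpoint in 2023 and Bing now recommends IndexNow, so these requests may return an error status; the function logs the result instead of treating it as fatal:

// Requires imports: log, net/http, net/url, time.
func pingSearchEngines(sitemapURL string) {
    endpoints := []string{
        "https://www.google.com/ping?sitemap=" + url.QueryEscape(sitemapURL),
        "https://www.bing.com/ping?sitemap=" + url.QueryEscape(sitemapURL),
    }
    // A timeout keeps a slow endpoint from stalling the whole run.
    client := &http.Client{Timeout: 10 * time.Second}
    for _, ep := range endpoints {
        resp, err := client.Get(ep)
        if err != nil {
            // resp is nil on error, so it must not be dereferenced.
            log.Printf("[WARN] ping %s failed: %v", ep, err)
            continue
        }
        resp.Body.Close() // release the connection; the body is not needed
        log.Printf("[INFO] ping %s status: %d", ep, resp.StatusCode)
    }
}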

Overall Execution Flow

Load state.json.

Fetch the current set of URLs.

Identify changed URLs and generate gzip‑compressed shards.

Append new shards to sitemap-index.xml while keeping only the latest N shards.

Delete obsolete shard files.

Write the updated state back to state.json.

Ping Google and Bing.

Example command to run the generator:

./sitemap-gen -n 120000 -workers 6 -base https://mysite.com -keep 50 -ping=true

Source Code and Repository

The complete Go source code is open source and can be cloned for direct execution or further development.

Repository URLs:

GitHub: https://github.com/louis-xie-programmer/sitemap-gen

Gitee: https://gitee.com/louis_xie/sitemap-gen
