
Resolving Chinese Filename Encoding Issues on S3 Website with Nginx Proxy

The article explains how Chinese filenames cause encoding errors when uploading to Amazon S3, why S3 forces UTF‑8 for website access, and presents a practical Nginx‑based proxy solution that transparently converts GBK‑encoded URIs to UTF‑8 to ensure reliable file retrieval.

NetEase Game Operations Platform

The author, a veteran of NetEase Game operations, shares a recent migration of a game streaming site to Amazon S3 and the unexpected problems caused by Chinese characters in file names.

1. Chinese Filename and S3 Upload Encoding

When a file name contains Chinese characters, the encoding used during upload must match the environment; a GBK‑encoded name uploaded from a GBK environment succeeds, but the same name uploaded from a UTF‑8 environment fails. Moreover, when uploading whole directories, S3 silently skips files whose names are incorrectly encoded.
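
To see why the upload environment matters, it helps to look at the bytes behind a name. The following sketch is only an illustration and not part of the original setup; the sample name `中文.mp4` is an assumption. It encodes the same name both ways and checks that the GBK bytes cannot be read back as UTF‑8.

```python
# Illustrative only: the sample file name is an assumption, not from the article.
name = "中文.mp4"

gbk_bytes = name.encode("gbk")      # bytes produced in a GBK environment
utf8_bytes = name.encode("utf-8")   # bytes produced in a UTF-8 environment

print(gbk_bytes.hex())   # d6d0cec42e6d7034
print(utf8_bytes.hex())  # e4b8ade696872e6d7034

# The GBK byte sequence is not valid UTF-8, so a UTF-8 consumer rejects it.
try:
    gbk_bytes.decode("utf-8")
    gbk_is_valid_utf8 = True
except UnicodeDecodeError:
    gbk_is_valid_utf8 = False

print(gbk_is_valid_utf8)  # False
```

The two Chinese characters take four bytes in GBK and six in UTF‑8, so a tool that expects one encoding and receives the other is looking at a different, possibly invalid, name.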

2. S3 Website Chinese Filename Issue

S3 website endpoints only support UTF‑8, so every object name is exposed as UTF‑8 no matter how it was encoded at upload time. A request that carries the GBK bytes of a name is rejected with HTTP 400, while the same name requested in UTF‑8 succeeds because S3 has already converted the stored name.

3. Impact of Forced UTF‑8

Most PC games on Windows use GBK encoding for resources such as battle recordings. When these GBK‑named files are stored on S3, a client that requests them using GBK receives a 400 error because the S3 website expects UTF‑8.
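
On the wire the difference shows up in the percent‑encoded request path. The sketch below uses a hypothetical object key (`replay/中文.mp4`, not taken from the article) to show the two paths that a GBK client and a UTF‑8 client would send for the same file.

```python
from urllib.parse import quote

# Hypothetical object key, chosen only for illustration.
key = "replay/中文.mp4"

# Path as a GBK client would request it.
gbk_path = "/" + quote(key, encoding="gbk")
# Path as a UTF-8 client would request it (quote() defaults to UTF-8).
utf8_path = "/" + quote(key)

print(gbk_path)   # /replay/%D6%D0%CE%C4.mp4
print(utf8_path)  # /replay/%E4%B8%AD%E6%96%87.mp4
```

Only the second form matches what the S3 website expects, which is why the first one has to be rewritten before it reaches the bucket.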

4. Current File Access Path

Clients already reach the buckets through an S3 proxy, which provides virtual hosts for bucket sub‑directories and a single entry point for multiple buckets.

5. How to Fix the Encoding Problem

The recommended approach is server‑side adaptation: before forwarding a client request to the S3 bucket, convert the URI from GBK to UTF‑8. This can be implemented in an Nginx/OpenResty proxy using the resty.iconv library or the native iconv module.

URI Conversion Considerations

Both GBK and UTF‑8 encoded URIs may arrive, so the proxy has to handle each request according to the encoding it was sent in.

If the URI contains only single‑byte (ASCII) characters, no conversion is needed, because GBK and UTF‑8 encode them identically.

For GBK requests, convert the URI to UTF‑8 before proxying to S3.

For UTF‑8 requests, forward unchanged.
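
The four rules above can be sketched as a single function. This is a minimal illustration in Python rather than the proxy's actual Lua code; the function name `to_utf8_path` is hypothetical, and treating any valid UTF‑8 byte sequence as UTF‑8 is an assumed heuristic (a few GBK sequences are also valid UTF‑8, so it can misjudge).

```python
from urllib.parse import quote, unquote_to_bytes


def to_utf8_path(raw_path: str) -> str:
    """Return a percent-encoded UTF-8 path for a percent-encoded GBK or UTF-8 path."""
    raw = unquote_to_bytes(raw_path)

    # Rule: only single-byte (ASCII) characters, nothing to convert.
    if raw.isascii():
        return raw_path

    # Rule: already UTF-8, forward unchanged (heuristic, see the note above).
    try:
        raw.decode("utf-8")
        return raw_path
    except UnicodeDecodeError:
        pass

    # Rule: otherwise assume GBK and convert to UTF-8.
    try:
        text = raw.decode("gbk")
    except UnicodeDecodeError:
        return raw_path  # neither encoding fits; leave the path untouched
    return quote(text.encode("utf-8"), safe="/")


print(to_utf8_path("/a/%D6%D0%CE%C4.mp4"))            # /a/%E4%B8%AD%E6%96%87.mp4
print(to_utf8_path("/a/%E4%B8%AD%E6%96%87.mp4"))      # /a/%E4%B8%AD%E6%96%87.mp4
print(to_utf8_path("/a/plain.mp4"))                   # /a/plain.mp4
```

Because the UTF‑8 check is only a guess, a proxy built this way still benefits from a retry path for the cases where the guess is wrong.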

Example Nginx Configuration (Lua/Iconv)

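# nginx + lua/iconv (openresty)
rewrite_by_lua_block {
    local iconv = require "resty.iconv"
    -- ngx.var.uri is the percent-decoded path, so iconv sees the raw GBK bytes;
    -- ngx.var.request_uri is still percent-encoded and includes the query string.
    local uri = ngx.var.uri
    -- Converter from GBK (source) to UTF-8 (target)
    local handler, errmsg = iconv:new("utf-8", "gbk")
    if not handler then
        ngx.log(ngx.ERR, "iconv init failed: ", errmsg)
        return ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
    end
    local uri_utf8, err = handler:convert(uri)
    if not uri_utf8 then
        -- Not convertible from GBK: log it and forward the URI unchanged
        ngx.log(ngx.WARN, "GBK to UTF-8 conversion failed: ", err)
        return
    end
    ngx.req.set_uri(uri_utf8)
}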

Alternatively, the native iconv module can be used.
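
Whichever converter is used, it only helps if it sees the real GBK bytes, that is, the percent‑decoded path rather than the raw request line. The short Python sketch below illustrates the point with a hypothetical request path; it is not part of the proxy itself.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical GBK request path, percent-encoded as it appears on the wire.
raw_request_path = "/%D6%D0%CE%C4.mp4"

# Converting the still-encoded string changes nothing: it is pure ASCII.
converted_without_decoding = raw_request_path.encode("ascii").decode("gbk").encode("utf-8")
print(converted_without_decoding)  # b'/%D6%D0%CE%C4.mp4'

# Decoding the percent escapes first exposes the GBK bytes, which can then be converted.
decoded = unquote_to_bytes(raw_request_path)
converted = decoded.decode("gbk").encode("utf-8")
print(converted)  # b'/\xe4\xb8\xad\xe6\x96\x87.mp4'
```

In Nginx terms, `$request_uri` holds the raw, still‑encoded form, while `$uri` holds the decoded path.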

Fail‑over Strategy

Because Nginx cannot reliably tell which encoding the client used, the proxy converts blindly, assuming one of the two encodings. If S3 answers the resulting request with HTTP 403, the proxy treats that as a wrong guess and retries the request against a backup upstream that applies the opposite handling, so the file is still served.
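
The retry idea can be sketched independently of Nginx. In the Python illustration below, `fetch_with_fallback` and `fake_s3` are hypothetical names, the order of the attempts is an assumption, and the stand‑in backend answers 403 for every path it does not hold, which simplifies how S3 really responds.

```python
from typing import Callable, Sequence, Tuple


def fetch_with_fallback(candidates: Sequence[str],
                        fetch: Callable[[str], int]) -> Tuple[int, str]:
    """Try each candidate path in order; move on only when the backend answers 403."""
    status, used = 403, ""
    for path in candidates:
        status, used = fetch(path), path
        if status != 403:
            break
    return status, used


# Stand-in for S3 (assumption): only the UTF-8 path exists, anything else gets 403.
stored = {"/%E4%B8%AD%E6%96%87.mp4"}


def fake_s3(path: str) -> int:
    return 200 if path in stored else 403


# First attempt: the path as one handler would send it; second: the other handler's version.
attempts = ["/%D6%D0%CE%C4.mp4", "/%E4%B8%AD%E6%96%87.mp4"]
print(fetch_with_fallback(attempts, fake_s3))  # (200, '/%E4%B8%AD%E6%96%87.mp4')
```

In the Nginx setup the same loop is expressed declaratively: the `backup` upstream server is the second candidate, and `proxy_next_upstream http_403` is the condition for moving on to it.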

Full Proxy Setup

# Two upstream servers (both point to the same S3 bucket)
server {
    listen 8003 default_server;
    server_name wahaha-wahaha.s3.nie.netease.com;
    location / {
        proxy_pass http://wahaha-wahaha.s3.nie.netease.com;
    }
}
server {
    listen 8004 default_server;
    server_name wahaha-wahaha.s3.nie.netease.com;
    location / {
        proxy_pass http://wahaha-wahaha.s3.nie.netease.com;
    }
}
# Upstream definition with backup for GBK handling
upstream v.nie {
    server 127.0.0.1:8003;
    server 127.0.0.1:8004 backup;
}
# S3 Proxy entry
server {
    listen 80;
    server_name wahaha.wahaha.netease.com;
    proxy_next_upstream http_403;
    location / {
        proxy_pass http://v.nie$uri;
    }
}

Testing shows that after URI conversion, both GBK and UTF‑8 requests retrieve the correct files from the S3 website.

Conclusion

The presented solution enables seamless access to S3‑hosted files regardless of whether the client uses GBK or UTF‑8 encoding, eliminating the 400/403 errors caused by mismatched character sets.

Written by

NetEase Game Operations Platform

The NetEase Game Automated Operations Platform delivers stable services for thousands of NetEase titles, focusing on efficient ops workflows, intelligent monitoring, and virtualization.
