A Case Study of Troubleshooting Service Log Garbled Character Issues

This article details a step‑by‑step investigation of Java service log garbled‑character problems caused by incorrect LC_CTYPE and locale settings, describing how environment variable synchronization via SSH led to ASCII encoding defaults and outlining preventive configurations for both client and server.

转转QA

Sep 6, 2022

Problem Emergence

During routine work, a user reported garbled characters in the logs of a Java service in an internal test environment: Chinese characters were not displayed correctly. This prompted an investigation.

Investigation Process

Inspecting the running Java process showed that its default encoding was ASCII (which the JDK reports as ANSI_X3.4-1968).

The service's own files were all correctly encoded in UTF-8, which ruled out the files themselves as the cause.
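The article does not say how the file encoding was verified. One possible way, sketched here under that assumption, is to decode each file strictly as UTF-8 and treat any malformed byte sequence as a failure. The class name Utf8Check is hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reports whether a byte sequence is well-formed UTF-8.
final class Utf8Check {

    static boolean isValidUtf8(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of silently
            // substituting a replacement character.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a path to a file to check.
        for (String path : args) {
            byte[] content = Files.readAllBytes(Paths.get(path));
            System.out.println(path + ": " + (isValidUtf8(content) ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```

A pure-ASCII file also passes this check, since ASCII is a subset of UTF-8.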

The service start command did not specify an encoding, so the JVM derived its default from the system configuration, and that default was wrong. Adding -Dfile.encoding=UTF-8 to the start command immediately fixed the log output. After a restart without the flag, the problem reappeared, and it did so on multiple machines, showing that this was not a one-off.
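To see what default a JVM has actually picked up, a small diagnostic class can print the relevant values. This is a minimal sketch, not the tool used in the investigation; the class name ShowDefaultEncoding is hypothetical, and sun.jnu.encoding is a JDK-internal property that may be absent (printed as null) on some runtimes.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic: prints the encoding-related settings the JVM sees.
final class ShowDefaultEncoding {

    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + "\nsun.jnu.encoding=" + System.getProperty("sun.jnu.encoding")
                + "\ndefaultCharset=" + Charset.defaultCharset().name()
                + "\nLANG=" + System.getenv("LANG")
                + "\nLC_CTYPE=" + System.getenv("LC_CTYPE")
                + "\nLC_ALL=" + System.getenv("LC_ALL");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once as plain `java ShowDefaultEncoding` and once with `-Dfile.encoding=UTF-8` shows the effect of the flag. Note that from JDK 18 onward the default charset is UTF-8 regardless of locale, so the locale-dependent behavior described here applies to older JDKs.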

Deep Dive Investigation

Testing on both KVM virtual machines and Docker containers reproduced the issue only on the KVM machines. A custom agent was then used to collect the environment variables from both problematic and normal machines for comparison.

The LC_CTYPE and LANG settings differed between the two groups. Hard-coding LC_CTYPE to en_US.UTF-8 on a problematic machine restored correct logging, which pointed to an erroneous LC_CTYPE value as the root cause.

Looking into known SSH issues with iTerm2 explained where that value came from: an SSH client can pass its own environment variables to the remote host. Which variables are sent depends on the SendEnv setting in the local ssh_config, and which of them the remote host accepts depends on the AcceptEnv setting in its sshd_config.
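The interplay of the two settings can be modeled in a few lines. The sketch below is a simplified illustration, not OpenSSH's implementation: it assumes both settings are lists of names with `*` and `?` wildcards, ignores other ssh_config features, and uses hypothetical names (SshEnvForwarding, forwarded) and sample values.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Simplified model: a variable reaches the remote session only if it matches
// a client-side SendEnv pattern AND a server-side AcceptEnv pattern.
final class SshEnvForwarding {

    // Translate an ssh-style wildcard pattern ('*' and '?') into a regex match.
    static boolean matches(String pattern, String name) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return name.matches(regex.toString());
    }

    static boolean matchesAny(List<String> patterns, String name) {
        for (String p : patterns) {
            if (matches(p, name)) {
                return true;
            }
        }
        return false;
    }

    static Map<String, String> forwarded(Map<String, String> clientEnv,
                                         List<String> sendEnv,
                                         List<String> acceptEnv) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : clientEnv.entrySet()) {
            if (matchesAny(sendEnv, e.getKey()) && matchesAny(acceptEnv, e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> clientEnv = new LinkedHashMap<>();
        clientEnv.put("LC_CTYPE", "UTF-8");   // sample value set by the terminal
        clientEnv.put("HOME", "/Users/dev");  // sample value, never forwarded here

        List<String> localePatterns = Arrays.asList("LANG", "LC_*");

        // Both sides allow locale variables: the bad value reaches the server.
        System.out.println(forwarded(clientEnv, localePatterns, localePatterns)); // {LC_CTYPE=UTF-8}

        // SendEnv removed on the client: nothing is forwarded.
        System.out.println(forwarded(clientEnv, Collections.<String>emptyList(), localePatterns)); // {}
    }
}
```

Either side can break the chain, which is why the fixes below address the client and the server separately.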

Problem Resolution

The solution focuses on three aspects: prevent the client from sending wrong encoding, prevent the server from accepting it, and add a fallback check.

Client should not send incorrect encoding:

Configure iTerm2 so that it no longer sets locale variables such as LC_CTYPE=UTF-8 automatically.

Edit the local /etc/ssh/ssh_config and comment out or remove the SendEnv line that forwards locale variables.

Server should not accept incorrect encoding:

If you have admin rights on the remote host, edit /etc/ssh/sshd_config and remove the unwanted locale variables from the AcceptEnv line, then reload sshd so the change takes effect.

Fallback check:

Have the agent verify the LC_CTYPE value before it starts a service; if the value is the bare string UTF-8, change it to en_US.UTF-8 so that the JVM picks up a valid locale.
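The article does not show the agent's code. As a rough sketch of such a fallback, assuming the agent is written in Java and launches services through ProcessBuilder, the environment of the child process can be corrected before it starts. The class and method names and the sample command are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical fallback: repair LC_CTYPE in the child environment before launch.
final class LocaleFallback {

    static final String BAD_VALUE = "UTF-8";          // bare codeset, not a locale name
    static final String SAFE_LOCALE = "en_US.UTF-8";  // replacement used in the article

    // Returns true if the map was modified.
    static boolean fixLcCtype(Map<String, String> env) {
        if (BAD_VALUE.equals(env.get("LC_CTYPE"))) {
            env.put("LC_CTYPE", SAFE_LOCALE);
            return true;
        }
        return false;
    }

    // Builds (but does not start) a process whose environment has been checked.
    static ProcessBuilder prepare(String... command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        // environment() is a mutable copy of the current environment
        // that the child process will inherit.
        fixLcCtype(builder.environment());
        return builder;
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("LC_CTYPE", "UTF-8");
        System.out.println("modified=" + fixLcCtype(env) + " LC_CTYPE=" + env.get("LC_CTYPE"));

        // Sample command only; the real start command is not given in the article.
        ProcessBuilder builder = prepare("java", "-jar", "service.jar");
        System.out.println("child LC_CTYPE=" + builder.environment().get("LC_CTYPE"));
    }
}
```

The check only rewrites the one known-bad value and leaves every other setting alone, so a deliberately configured locale is not overridden.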

Underlying Mechanism

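Locale environment variables have no effect by themselves; they take effect only when a program calls setlocale(). At program startup the runtime behaves as if setlocale(LC_ALL, "C") had been called, and a program that wants to adopt the environment's settings must call setlocale with an empty string as the value.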

setlocale is declared in locale.h and takes two parameters: the category (e.g., LC_ALL, LC_CTYPE) and the value (e.g., "en_US.UTF-8"). The value must follow the pattern language[_territory][.codeset][@modifier]. A value such as plain "UTF-8", which is a valid LC_CTYPE on macOS, names no locale on a typical Linux system, so setlocale rejects it and the locale stays at "C", which corresponds to 7-bit ASCII.
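The naming pattern can be turned into a rough shape check. The regular expression below is an approximation written for illustration: it assumes a two- or three-letter lowercase language code and a two-letter uppercase territory, and it additionally accepts the special names C and POSIX. The class name LocaleNameCheck is hypothetical.

```java
import java.util.regex.Pattern;

// Approximate shape check for language[_territory][.codeset][@modifier].
final class LocaleNameCheck {

    private static final Pattern LOCALE_NAME = Pattern.compile(
            "C|POSIX|"
            + "[a-z]{2,3}"             // language, e.g. en, zh
            + "(?:_[A-Z]{2})?"         // optional territory, e.g. _US
            + "(?:\\.[A-Za-z0-9-]+)?"  // optional codeset, e.g. .UTF-8
            + "(?:@[A-Za-z0-9]+)?");   // optional modifier, e.g. @euro

    static boolean looksLikeLocaleName(String value) {
        return value != null && LOCALE_NAME.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"en_US.UTF-8", "zh_CN.UTF-8", "C", "UTF-8"};
        for (String s : samples) {
            System.out.println(s + " -> " + looksLikeLocaleName(s));
        }
    }
}
```

Passing this check does not mean the locale exists on a given machine; a well-formed name still fails in setlocale if that locale is not installed.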

When the remote machine received the incorrect LC_CTYPE value UTF-8, setlocale failed and the locale stayed at "C". The Java service therefore defaulted to ASCII, and non-ASCII characters such as Chinese came out garbled in the logs.
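The last step of that chain is easy to reproduce in isolation. The sketch below, with a hypothetical class name and sample text, encodes a string containing two Chinese characters with US-ASCII and with UTF-8 and decodes it again; characters that ASCII cannot represent are replaced with question marks.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Shows what happens to non-ASCII text when it is written with an ASCII encoder.
final class GarbleDemo {

    // Encode and immediately decode the text with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String message = "\u4e2d\u6587 log"; // two Chinese characters followed by " log"

        // ASCII has no mapping for the two characters, so each becomes '?'.
        System.out.println(roundTrip(message, StandardCharsets.US_ASCII)); // ?? log

        // UTF-8 preserves the text unchanged.
        System.out.println(message.equals(roundTrip(message, StandardCharsets.UTF_8))); // true
    }
}
```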

Personal Takeaways

Technical problems require rigorous investigation rather than quick fixes; summarizing and documenting findings helps prevent recurrence.
