A Case Study of Troubleshooting Service Log Garbled Character Issues
This article walks through a step-by-step investigation of garbled characters in Java service logs. The cause was an invalid LC_CTYPE locale setting: SSH forwarded the client's environment variables to the server, and the JVM fell back to ASCII as its default encoding. The article closes with preventive configuration for both the client and the server.
Problem Emergence
During routine work, a user reported garbled characters in the logs of a Java service in an internal test environment: Chinese characters were not displayed correctly. This prompted an investigation.
Investigation Process
Checking the Java process details revealed that the default encoding was set to ASCII (which the JDK reports as ANSI_X3.4-1968).
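One way to perform this kind of check from inside a JVM is sketched below. The class name DefaultEncodingProbe is made up for illustration; the article does not say which tool was used to inspect the process. Note that on JDK 18 and later the default charset is UTF-8 regardless of the locale unless it is overridden, so this symptom mostly concerns older JDKs.

```java
import java.nio.charset.Charset;

class DefaultEncodingProbe {
    // Collects the encoding-related values this JVM resolved at startup.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name()
                + ", LC_CTYPE=" + System.getenv("LC_CTYPE")
                + ", LANG=" + System.getenv("LANG");
    }

    // Resolves a charset alias to its canonical Java name.
    static String canonicalName(String alias) {
        return Charset.forName(alias).name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
        // ANSI_X3.4-1968 is an alias of US-ASCII.
        System.out.println(canonicalName("ANSI_X3.4-1968"));
    }
}
```

Running it once in a normal shell and once with LC_CTYPE=UTF-8 exported should make the difference visible on an affected JDK.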
All service files themselves were correctly encoded in UTF‑8, indicating the issue was not within the service files.
Because the default encoding of the Java service was abnormal and the start command did not specify one, the JVM had taken its default from the system configuration. Adding -Dfile.encoding=UTF-8 to the start command immediately fixed the log output. After a restart without the flag, the problem reappeared, and it did so on multiple machines, so it was not a one-off.
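Why an ASCII default garbles Chinese text can be shown with a small round trip. This is a minimal sketch, not the service's logging code: the class name GarbledLogDemo and the sample string are assumptions chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GarbledLogDemo {
    // Encodes the text with the given charset and decodes it again, which is
    // roughly what happens when a log line is written and later read back.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Unicode escapes for two Chinese characters, so the source file
        // itself stays pure ASCII.
        String message = "\u4e2d\u6587";
        // US-ASCII cannot represent these characters; each becomes '?'.
        System.out.println(roundTrip(message, StandardCharsets.US_ASCII));
        // UTF-8 preserves them.
        System.out.println(roundTrip(message, StandardCharsets.UTF_8));
    }
}
```

The first line printed is two question marks, which is the kind of damage seen when the JVM default falls back to ASCII.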
Deep Dive Investigation
Tests on KVM virtual machines and Docker containers showed that the issue occurred only on the KVM machines. A custom agent was then used to collect environment variables from both problematic and normal machines.
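The comparison step can be sketched as a simple diff of two environment snapshots. The class EnvDiff and the sample values are hypothetical; the article does not publish the agent's code or the actual values it collected.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.TreeSet;

class EnvDiff {
    // Returns every variable whose value differs between the two snapshots,
    // formatted as "<normal value> | <problem value>" (null means unset).
    static Map<String, String> diff(Map<String, String> normal, Map<String, String> problem) {
        TreeSet<String> names = new TreeSet<>(normal.keySet());
        names.addAll(problem.keySet());
        Map<String, String> changed = new TreeMap<>();
        for (String name : names) {
            String a = normal.get(name);
            String b = problem.get(name);
            if (!Objects.equals(a, b)) {
                changed.put(name, a + " | " + b);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Illustrative snapshots only; these are assumed values.
        Map<String, String> normal = Map.of("LANG", "en_US.UTF-8", "HOME", "/root");
        Map<String, String> problem = Map.of("LC_CTYPE", "UTF-8", "HOME", "/root");
        diff(normal, problem).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Sorting the names keeps the output stable, which makes it easier to compare results across many machines.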
The LC_CTYPE and LANG settings differed. Hard-coding LC_CTYPE to en_US.UTF-8 on a problematic machine restored correct logging, which pointed to an erroneous LC_CTYPE value as the root cause. Looking into known iTerm2 SSH issues showed that an SSH client can forward its own environment variables to the remote host.
Which variables are sent depends on the SendEnv setting in the local ssh_config, and the AcceptEnv setting in the remote sshd_config determines which of them are accepted.
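The two-sided filtering can be modeled as follows. This is a simplified simulation and not OpenSSH's implementation: the class SshEnvForwarding is hypothetical, only the '*' and '?' wildcards are handled, and the pattern list "LANG LC_*" is assumed here because many distributions ship it for both SendEnv and AcceptEnv.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

class SshEnvForwarding {
    // Converts an OpenSSH-style pattern ('*' and '?' wildcards) to a regex.
    static Pattern toRegex(String sshPattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : sshPattern.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    static boolean matchesAny(String name, List<String> patterns) {
        for (String p : patterns) {
            if (toRegex(p).matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    // A variable reaches the remote session only if the client sends it
    // (SendEnv) and the server accepts it (AcceptEnv).
    static Map<String, String> forwarded(Map<String, String> clientEnv,
                                         List<String> sendEnv,
                                         List<String> acceptEnv) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> e : clientEnv.entrySet()) {
            if (matchesAny(e.getKey(), sendEnv) && matchesAny(e.getKey(), acceptEnv)) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> client = Map.of("LC_CTYPE", "UTF-8", "HOME", "/Users/me");
        List<String> patterns = List.of("LANG", "LC_*");
        System.out.println(forwarded(client, patterns, patterns));
    }
}
```

With those assumed patterns on both sides, the client's LC_CTYPE=UTF-8 passes through while HOME does not. Emptying either list blocks the variable.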
Problem Resolution
The solution has three parts: prevent the client from sending the wrong encoding, prevent the server from accepting it, and add a fallback check.
The client should not send an incorrect encoding: configure iTerm2 to stop sending its UTF-8 locale variables. Modify the local /etc/ssh/ssh_config to comment out or remove the SendEnv line.
The server should not accept an incorrect encoding: if you have admin rights on the remote host, edit /etc/ssh/sshd_config and remove the unwanted variables from the AcceptEnv line.
Add a fallback in the agent: before the service starts, verify the LC_CTYPE value; if it is set to UTF-8, change it to en_US.UTF-8 so the JVM picks up the correct locale.
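The fallback check could look roughly like this. It is a sketch under assumptions: the article does not show the agent's code or say what language it is written in, so the class LocaleFallback and its methods are hypothetical.

```java
import java.util.Map;

class LocaleFallback {
    static final String SAFE_LOCALE = "en_US.UTF-8";

    // Rewrites a bare "UTF-8" LC_CTYPE in place.
    // Returns true if the environment was changed.
    static boolean fixLcCtype(Map<String, String> env) {
        if ("UTF-8".equals(env.get("LC_CTYPE"))) {
            env.put("LC_CTYPE", SAFE_LOCALE);
            return true;
        }
        return false;
    }

    // Prepares a child process whose environment has been checked.
    // The command is supplied by the caller; nothing is started here.
    static ProcessBuilder prepare(String... command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        fixLcCtype(pb.environment());
        return pb;
    }
}
```

Only the exact bad value is rewritten, so a deliberately configured locale such as zh_CN.UTF-8 is left alone.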
Underlying Mechanism
The Linux locale documentation explains that the locale environment variables take effect only when a program calls setlocale(). At program startup the "C" locale is in effect, as if setlocale(LC_ALL, "C") had been called; a program adopts the environment's settings by calling setlocale with an empty string as the value.
setlocale is declared in locale.h and takes two parameters: the category (e.g., LC_ALL, LC_CTYPE) and the value (e.g., "en_US.UTF-8"). The value must follow the pattern language[_territory][.codeset][@modifier]. An invalid value such as plain "UTF-8" is rejected, leaving the locale as "C", which corresponds to 7-bit ASCII.
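The naming pattern can be approximated with a regular expression. This is a syntactic sketch only: the class LocaleNameCheck and its regex are assumptions, and the real setlocale also requires the named locale to be installed on the system.

```java
import java.util.regex.Pattern;

class LocaleNameCheck {
    // language[_territory][.codeset][@modifier], approximated loosely.
    private static final Pattern LOCALE_NAME = Pattern.compile(
            "[A-Za-z]+(_[A-Za-z]+)?(\\.[A-Za-z0-9_-]+)?(@[A-Za-z0-9]+)?");

    // Returns true if the value has the shape of a locale name.
    static boolean looksLikeLocaleName(String value) {
        return value != null && LOCALE_NAME.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String v : new String[] {"en_US.UTF-8", "zh_CN.UTF-8", "C", "UTF-8"}) {
            System.out.println(v + " -> " + looksLikeLocaleName(v));
        }
    }
}
```

A bare "UTF-8" fails because the hyphen cannot appear in the language part, and there is no dot introducing a codeset.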
When the remote machine received LC_CTYPE=UTF-8 from the client, setlocale failed and the process stayed in the "C" locale, so the Java service used ASCII as its default encoding and the Chinese text in the logs came out garbled.
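The chain of events can be modeled in a few lines. This sketch mirrors the usual lookup order for the character-type category (LC_ALL, then LC_CTYPE, then LANG) and treats an unknown name as a failure that leaves "C" in place. The class EffectiveLocale is hypothetical, and the set of installed locales is passed in as an assumption with exact-match comparison, which is stricter than what the C library does.

```java
import java.util.Map;
import java.util.Set;

class EffectiveLocale {
    // Picks the first non-empty variable in precedence order.
    static String selectedName(Map<String, String> env) {
        for (String key : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            String value = env.get(key);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return "C";
    }

    // If the selected name is not usable, the process stays in "C".
    // There is no second attempt with a lower-priority variable.
    static String resolve(Map<String, String> env, Set<String> installed) {
        String name = selectedName(env);
        if (name.equals("C") || name.equals("POSIX") || installed.contains(name)) {
            return name;
        }
        return "C";
    }

    public static void main(String[] args) {
        Set<String> installed = Set.of("en_US.UTF-8");
        System.out.println(resolve(
                Map.of("LANG", "en_US.UTF-8", "LC_CTYPE", "UTF-8"), installed));
        System.out.println(resolve(Map.of("LANG", "en_US.UTF-8"), installed));
    }
}
```

This is why a correct LANG on the server is not enough: the forwarded LC_CTYPE takes precedence, and its failure is not retried with LANG.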
Personal Takeaways
Technical problems require rigorous investigation rather than quick fixes; summarizing and documenting findings helps prevent recurrence.
转转QA