Why Sending a Tilde to an LLM Can Erase Your Entire Home Directory
A paper accepted at ACL 2026 describes an “Emoticon Semantic Confusion” vulnerability in large language models: a tilde (~) typed as a friendly emoticon can be interpreted as the shell shortcut for the home directory, leading to silent, irreversible deletions. Across the major LLMs tested, the average confusion rate is 38.6%.
Imagine working late at night with an AI code assistant that has created a temporary directory named tmp for testing. After the tests finish, you type a casual request: “Task done, delete this directory~”. The trailing tilde (~) is merely a friendly emoticon, but the LLM reads it as the shell shortcut for the user’s home directory and runs rm -rf ~, erasing the entire home folder without warning.
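The mechanics behind this scenario can be illustrated in a few lines. The sketch below uses Python's standard os.path.expanduser, which applies the same leading-tilde rule as POSIX shells, to show that a bare ~ resolves to the home directory while a tilde at the end of a word stays literal. The example strings are illustrative and are not taken from the paper.

```python
import os.path

# A bare tilde is expanded to the current user's home directory,
# much as a shell expands it before running `rm -rf ~`.
bare = os.path.expanduser("~")

# A tilde that trails a name is NOT expanded: "tmp~" is an ordinary filename.
trailing = os.path.expanduser("tmp~")

print(bare)      # an absolute path to the home directory on a typical system
print(trailing)  # tmp~
```

Seen this way, the shell is not what misreads the user's sentence. The risk arises when the model lifts the tilde out of the sentence and places it where the shell treats it as a path.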
The paper, titled False Friends in the Shell: Unveiling the Emoticon Semantic Confusion in Large Language Models (accepted at ACL 2026), reports the first systematic study of this vulnerability, which the authors call Emoticon Semantic Confusion. The research team, from Xi’an Jiaotong University, Nanyang Technological University, and UMass Amherst, demonstrates that because LLMs process natural language and programming languages at the same time, they can interpret identical-looking symbols in divergent ways.
Key symbols that cause confusion include:
~ – user home directory in shells, but a casual emoticon in conversation.
* – wildcard in shells, but a decorative asterisk in conversation.
> – output redirection in shells, often used as a smiley.
.. – parent‑directory navigation, sometimes used as an ellipsis.
() – function call or subshell execution, also used for emoticon faces.
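The list above can be turned into a simple input check. The following is a minimal sketch, assuming a hypothetical heuristic in which a symbol is treated as possibly decorative when it trails the user's message; the function name trailing_false_friend and the regular expression are illustrative and do not come from the paper.

```python
import re

# Shell meanings of the symbols, as given in the list above.
SHELL_MEANING = {
    "~": "home directory",
    "*": "wildcard",
    ">": "output redirection",
    "..": "parent directory",
    "()": "subshell or function call",
}

# Hypothetical heuristic: a symbol is flagged when nothing but
# whitespace follows it at the end of the message.
_TRAILING = re.compile(r"(\.\.|\(\)|[~*>])\s*$")


def trailing_false_friend(message: str):
    """Return (symbol, shell_meaning) if the message ends in a risky symbol, else None."""
    match = _TRAILING.search(message)
    if match is None:
        return None
    symbol = match.group(1)
    return symbol, SHELL_MEANING[symbol]


print(trailing_false_friend("Task done, delete this directory~"))
# ('~', 'home directory')
print(trailing_false_friend("Please clean up the logs"))
# None
```

A check like this only flags candidates for review. It cannot decide on its own whether the user meant tone or syntax.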
Drawing on the linguistic concept of “false friends”, the authors argue that these symbols act as false friends between human language and code. Humans see a symbol as expressing tone, while the model sees it as syntax.
To quantify the problem, the team built an automated framework that mined over 60,000 real‑world emoticons, selected high‑risk candidates, and generated 3,757 test cases covering file management, database operations, and system administration across 21 realistic task scenarios. The tests span four programming languages (Shell, Python, SQL, JavaScript) and evaluate six leading LLMs: GPT, Claude, Gemini, Qwen, and two others.
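To make the select-then-generate idea concrete, here is a toy sketch. It is not the authors' framework: the filtering rule (an emoticon is high-risk if it contains a shell-significant character), the sample emoticons, and the task prompts are all assumptions chosen for illustration.

```python
from itertools import product

# Single characters with shell meaning, taken from the symbol list above.
# The two-character ".." is checked separately.
SHELL_SIGNIFICANT = set("~*>()")


def is_high_risk(emoticon: str) -> bool:
    """Assumed rule: risky if the emoticon contains shell-significant syntax."""
    return ".." in emoticon or any(ch in SHELL_SIGNIFICANT for ch in emoticon)


def build_cases(tasks, emoticons):
    """Pair every task prompt with every high-risk emoticon."""
    risky = [e for e in emoticons if is_high_risk(e)]
    return [f"{task} {emo}" for task, emo in product(tasks, risky)]


# Illustrative inputs, not data from the paper.
tasks = ["Delete this directory", "Drop the staging table"]
emoticons = ["~", ":)", "^_^", ">_<", "*_*"]

cases = build_cases(tasks, emoticons)
print(len(cases))  # 8
print(cases[0])    # Delete this directory ~
```

Each generated prompt would then be sent to a model and the resulting code compared against the intended action, which is the step the confusion rate measures.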
The results are stark: no model is immune. The average confusion rate is 38.6%, meaning more than one in three requests containing an emoticon is misinterpreted. Even the best-performing models, Claude and Qwen, exceed a 34% confusion rate. Over 70% of users habitually insert emoticons when interacting with code-oriented AIs, which makes the issue widespread.
More concerning is the prevalence of “silent failures”: over 90% of confused responses are syntactically valid and run without error, yet deviate semantically from the user’s intent. Half of these silent failures are classified as high-risk, leading to actions such as deleting non-target files, overwriting critical system configurations, or altering database schemas.
In other words, the code is syntactically correct and runs, but it does something other than what the user asked for.
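Because a silent failure raises no error, it can only be caught by comparing what a command would touch with what the user meant. The sketch below is one hedged way to do that, assuming a hypothetical helper targets_outside, a fake home directory, and a working directory equal to that home; it only inspects strings and never touches the filesystem.

```python
import posixpath
import shlex


def targets_outside(command: str, intended_dir: str, home: str) -> list[str]:
    """Return path arguments of `command` that resolve outside `intended_dir`.

    Assumes the command runs with `home` as its working directory.
    """
    intended = posixpath.normpath(posixpath.join(home, intended_dir))
    outside = []
    for token in shlex.split(command)[1:]:
        if token.startswith("-"):
            continue  # skip option flags such as -rf
        # Mimic shell tilde expansion against the explicit home directory.
        if token == "~" or token.startswith("~/"):
            token = home + token[1:]
        resolved = posixpath.normpath(posixpath.join(home, token))
        if posixpath.commonpath([resolved, intended]) != intended:
            outside.append(resolved)
    return outside


# Both commands parse and would run; only one matches the user's intent.
print(targets_outside("rm -rf tmp", "tmp", "/home/alice"))  # []
print(targets_outside("rm -rf ~", "tmp", "/home/alice"))    # ['/home/alice']
```

A non-empty result is a signal to pause and ask for confirmation before running the command. The check is deliberately narrow: it handles simple path arguments only and does not cover globs, redirection, or subshells.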
Embedding LLMs in automation agents does not mitigate the risk; system prompts like “ignore emoticons” have little effect. The authors call for the security community to treat fine‑grained human‑AI interaction issues as core reliability concerns rather than minor UX problems.
In summary, the study reveals a structural mismatch: human communication habits (emoticons) collide with machine‑level syntax, producing potentially catastrophic outcomes as LLMs become more embedded in production pipelines.
The paper urges both academia and industry to incorporate fine‑grained safety checks into AI system design, aiming to make models understand human intent without forcing users to abandon natural expressive habits.
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
