Detecting and Removing Emoji Characters in PHP Strings
This article explains how to determine whether a PHP string contains emoji characters, provides functions to detect and strip emojis using multibyte string functions, and discusses storing emoji‑containing strings in MySQL with utf8mb4 or base64 encoding.
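In UTF‑8, most emoji and some other special symbols (code points above U+FFFF) occupy four bytes, while common Chinese characters occupy three bytes. The functions in this article rely on that difference, so symbols that UTF‑8 encodes in three bytes or fewer, such as ☺ (U+263A), are not treated as emoji.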
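The byte counts can be checked directly. The snippet below is a minimal sketch, assuming PHP 7 or later with the mbstring extension enabled; the sample characters are illustrative choices rather than part of the original article.

```php
<?php
// Compare byte length (strlen) with character count (mb_strlen).
$samples = [
    'ASCII letter'      => 'A',
    'Chinese character' => '中',
    'BMP symbol U+263A' => "\u{263A}",   // three bytes, so the four-byte check misses it
    'Emoji U+1F600'     => "\u{1F600}",  // four bytes
];

foreach ($samples as $label => $char) {
    printf("%s: %d byte(s), %d character(s)\n", $label, strlen($char), mb_strlen($char, 'UTF-8'));
}
```

Each sample is a single character according to mb_strlen, while strlen reports 1, 3, 3, and 4 bytes respectively.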
1. Detecting emoji in a string
PHP provides three built‑in functions useful for this task:
mb_strlen – returns the number of characters in a string for a given encoding.
mb_substr – performs a multibyte‑safe substring operation.
strlen – returns the byte length of a string.
<code>mixed mb_strlen(string $str [, string $encoding = mb_internal_encoding()])</code>
<code>string mb_substr(string $str, int $start [, int $length = NULL [, string $encoding = mb_internal_encoding()]])</code>
<code>int strlen(string $string)</code>
Using these functions, the following helper determines if a string contains any emoji:
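<code>function haveEmojiChar($str) {
    // Count characters (not bytes), using the same encoding as mb_substr below.
    $mbLen = mb_strlen($str, 'UTF-8');
    for ($i = 0; $i < $mbLen; $i++) {
        // Extract one character; four or more bytes means a code point above U+FFFF.
        $char = mb_substr($str, $i, 1, 'UTF-8');
        if (strlen($char) >= 4) {
            return true;
        }
    }
    return false;
}</code>
2. Removing emoji from a string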
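Alongside the loop-based function in this section, the same four-byte rule can be sketched with a regular expression, since four-byte UTF‑8 sequences correspond to code points U+10000 through U+10FFFF. The helper name removeFourByteChars is hypothetical and not part of the original article, and returning the input unchanged on failure is an assumption made for this sketch.

```php
<?php
// Hypothetical helper: strips every code point above U+FFFF,
// i.e. every character that UTF-8 encodes in four bytes.
function removeFourByteChars(string $str): string
{
    // The /u modifier makes PCRE treat the pattern and the subject as UTF-8.
    $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $str);

    // preg_replace returns null on failure (for example, invalid UTF-8 input);
    // this sketch falls back to the unmodified input in that case.
    return $result ?? $str;
}

// The emoji is dropped; the ASCII and Chinese characters are kept.
echo removeFourByteChars("Hello \u{1F600} 世界"), "\n";
```

Like the loop-based version, this removes any four-byte character, not only emoji, so rare characters outside the Basic Multilingual Plane are stripped as well.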
<code>function removeEmojiChar($str) {
    // Count characters (not bytes), using the same encoding as mb_substr below.
    $mbLen = mb_strlen($str, 'UTF-8');
    $strArr = [];
    for ($i = 0; $i < $mbLen; $i++) {
        $mbSubstr = mb_substr($str, $i, 1, 'UTF-8');
        // Skip four-byte characters; keep everything else.
        if (strlen($mbSubstr) >= 4) {
            continue;
        }
        $strArr[] = $mbSubstr;
    }
    return implode('', $strArr);
}</code>
3. Storing strings with emoji in MySQL
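Use the utf8mb4 character set for both the column and the connection. MySQL's legacy utf8 character set stores at most three bytes per character, so it cannot hold four-byte emoji.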
Encode the string with base64_encode before storing and decode after retrieval.
Alternatively, simply remove emoji characters before saving.
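The base64 option can be sketched as a simple round trip. This is a minimal illustration with a made-up sample string; the actual database write and read are omitted, and the variable names are assumptions for this example.

```php
<?php
$original = "Nice work \u{1F44D}"; // sample text containing a four-byte emoji

// Encode before storing: the result contains only ASCII characters,
// so it fits in a column whose character set is limited to three-byte UTF-8.
$stored = base64_encode($original);

// Decode after retrieval; strict mode returns false on invalid input.
$restored = base64_decode($stored, true);

var_dump($restored === $original);
```

The trade-off is that base64 output is roughly one third larger than the input, and the stored value can no longer be searched or sorted as readable text in SQL.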