� character from LDAP
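When I search for some users, I receive some strange characters in the info from the LDAP server. If a value contains Turkish characters such as "ç", they are replaced with "�". In that case I convert the string to UTF-8 and then fix it with str_replace. My function looks like this;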
function utf8char($str) {
    // Swap the Latin-1 look-alikes for the Turkish letters they stand for.
    $search  = array('Ý', 'ý', 'þ', 'Þ', 'ð', 'Ð');
    $replace = array('İ', 'ı', 'ş', 'Ş', 'ğ', 'Ğ');
    return str_replace($search, $replace, $str);
}
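But sometimes this causes problems, so I need to detect whether the string contains the '�' character before fixing it. strpos does not seem to work for that. Can anyone shed some light on this? Also, what is this damned '?' character? I'd be glad if someone could explain...
Edit: here is my code snippet;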
$name = $ldapHandler->get_user_info('username')['name'];
echo $name;
echo utf8_decode($name);
echo mb_convert_encoding($name,'utf-8');
echo utf8char(mb_convert_encoding($name,'utf-8'));
Output of this code;
Bilgi ��lem Daire Ba�kanl���
Bilgi ?lem Daire Ba?kanl??
Bilgi Ýþlem Daire Baþkanlýðý
Bilgi İşlem Daire Başkanlığı (this is the correct string)
Use utf8_encode() when storing it in the database, and utf8_decode() when fetching it.
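For what it's worth, the third output line in the question looks like the same bytes read as ISO-8859-1. Here is a minimal sketch, assuming the LDAP server is sending ISO-8859-9 (Latin-5, Turkish) bytes, which the question does not confirm. The sample bytes in `$raw` are made up for illustration.

```php
<?php
// Assumption: "İşlem" as ISO-8859-9 bytes (0xDD = İ, 0xFE = ş).
$raw = "\xDD\xFElem";

// The same bytes decoded as ISO-8859-1: the two positions read as Ý and þ.
$asLatin1 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// The same bytes decoded as ISO-8859-9.
$asLatin5 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-9');

echo $asLatin1, PHP_EOL; // Ýþlem
echo $asLatin5, PHP_EOL; // İşlem
```

ISO-8859-9 differs from ISO-8859-1 in only six positions (Ğ, İ, Ş, ğ, ı, ş), and those are the six pairs that utf8char() swaps by hand.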
It has been a long time, but I decided to share the solution I used for the same problem.
This function works for me:
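function repair($value) {
    // Drop any byte sequences that are not valid UTF-8.
    $res = @iconv("UTF-8", "UTF-8//IGNORE", $value);
    // If anything was dropped, the input was not UTF-8: convert it instead.
    if (strlen($value) != strlen($res)) {
        return w1250_to_utf8($value);
    }
    return $res;
}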
function w1250_to_utf8($text) {
    // map based on:
    // http://konfiguracja.c0.pl/iso02vscp1250en.html
    // http://konfiguracja.c0.pl/webpl/index_en.html#examp
    // http://www.htmlentities.com/html/entities/
    $map = array(
        // Windows-1250 bytes remapped to their ISO-8859-2 positions
        chr(0x8A) => chr(0xA9),
        chr(0x8C) => chr(0xA6),
        chr(0x8D) => chr(0xAB),
        chr(0x8E) => chr(0xAE),
        chr(0x8F) => chr(0xAC),
        chr(0x9C) => chr(0xB6),
        chr(0x9D) => chr(0xBB),
        chr(0xA1) => chr(0xB7),
        chr(0xA5) => chr(0xA1),
        chr(0xBC) => chr(0xA5),
        chr(0x9F) => chr(0xBC),
        chr(0xB9) => chr(0xB1),
        chr(0x9A) => chr(0xB9),
        chr(0xBE) => chr(0xB5),
        chr(0x9E) => chr(0xBE),
        // Punctuation and symbols, written as HTML entities so that
        // html_entity_decode() below turns them into UTF-8
        chr(0x80) => '&euro;',
        chr(0x82) => '&sbquo;',
        chr(0x84) => '&bdquo;',
        chr(0x85) => '&hellip;',
        chr(0x86) => '&dagger;',
        chr(0x87) => '&Dagger;',
        chr(0x89) => '&permil;',
        chr(0x8B) => '&lsaquo;',
        chr(0x91) => '&lsquo;',
        chr(0x92) => '&rsquo;',
        chr(0x93) => '&ldquo;',
        chr(0x94) => '&rdquo;',
        chr(0x95) => '&bull;',
        chr(0x96) => '&ndash;',
        chr(0x97) => '&mdash;',
        chr(0x99) => '&trade;',
        chr(0x9B) => '&rsaquo;', // 0x9B is the closing single angle quote
        chr(0xA6) => '&brvbar;',
        chr(0xA9) => '&copy;',
        chr(0xAB) => '&laquo;',
        chr(0xAE) => '&reg;',
        chr(0xB1) => '&plusmn;',
        chr(0xB5) => '&micro;',
        chr(0xB6) => '&para;',
        chr(0xB7) => '&middot;',
        chr(0xBB) => '&raquo;',
    );
    // Latin-1 readings of the Turkish letters, and the letters they stand for
    $search = array('Ý', 'ý', 'þ', 'Þ', 'ð', 'Ð');
    $replace = array('İ', 'ı', 'ş', 'Ş', 'ğ', 'Ğ');
    // With no source encoding given, mb_convert_encoding() converts from the internal encoding
    mb_internal_encoding("ISO-8859-1");
    return str_replace($search, $replace, html_entity_decode(mb_convert_encoding(strtr($text, $map), 'UTF-8'), ENT_QUOTES, 'UTF-8'));
}
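On the detection part of the question: strpos() on '�' is unlikely to help, because � (U+FFFD) is what the browser or terminal displays for bytes that are not valid UTF-8, and it is usually not in the string itself. Below is a rough sketch that checks validity with mb_check_encoding() instead. It assumes the non-UTF-8 values are ISO-8859-9, which is a guess based on the Turkish letters involved, and `ldap_value_to_utf8` is a hypothetical helper name, not something from the code above.

```php
<?php
// Hypothetical helper: returns the value as UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-9 (Turkish).
function ldap_value_to_utf8(string $value): string
{
    // Valid UTF-8 (including plain ASCII) is returned untouched.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Otherwise decode the bytes as ISO-8859-9 and re-encode as UTF-8.
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-9');
}

// Sample bytes made up to match the first output line in the question.
$raw = "Bilgi \xDD\xFElem Daire Ba\xFEkanl\xFD\xF0\xFD";
echo ldap_value_to_utf8($raw), PHP_EOL; // Bilgi İşlem Daire Başkanlığı
```

This avoids both the hand-written replace list and the change to the global mb_internal_encoding(), but it only holds if the server really sends ISO-8859-9.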