Convert encoded HTML entities to UTF-8

How can I convert this string to UTF-8:

&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041

I also want to convert this one:

&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29

I want to prevent XSS attacks, and I am using this cheat sheet: https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet

My strategy is to convert the strings above to UTF-8 and then check whether the result contains javascript.

I wrote two simple functions to recover the possible HTML so that I can check it:

$decimalHTML = '&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041';
$hexHTML = '&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29';

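// Replace each run of digits with the character for that code (chr),
// then strip the leftover '&#' prefixes.
function getDecimalHTML($str) {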
    return str_replace(
        '&#',
        '',
        preg_replace_callback(
            '/\d+/',
            function($v) {
                return str_replace(';', '', implode(array_map('chr', $v)));
            }, $str
        )
    );
}

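// Replace each hex run that follows an 'x' with its bytes (hex2bin),
// then strip the leftover '&#' and 'x' characters.
function getHexDecimalHTML($str) {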
    return str_replace(
        array('&#', 'x'),
        '',
        preg_replace_callback(
            '/(?<=x)\w+/',
            function($v) {
                return str_replace(';', '', implode(array_map('hex2bin', $v)));
            },
            $str
        )
    );
}

echo getDecimalHTML($decimalHTML) . "\n";
echo getHexDecimalHTML($hexHTML);

This gives me:

javascript:alert('XSS')
javascript:alert('XSS') 

I use chr to get the character from its ASCII code and hex2bin to get the string from the hexadecimal code.

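I would suggest not reinventing the wheel and instead using a library that fits your needs. Such libraries cover all aspects of this problem; AntiXSS is one example.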
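If you still want to decode such input yourself for inspection or logging, here is a minimal sketch. It assumes PHP 7 or later, and `decodeNumericEntities` is a hypothetical helper name, not part of any library. It matches decimal and hexadecimal numeric character references with or without the trailing `;`, normalizes each one, and lets the built-in `html_entity_decode` produce the UTF-8 character. Treat it as an illustration of the decoding step only, not as a complete XSS filter.

```php
<?php
// Hypothetical helper (not from the original post): decodes numeric character
// references such as &#0000106 or &#x6A, with or without a trailing ';'.
function decodeNumericEntities(string $str): string
{
    $result = preg_replace_callback(
        '/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?/',
        function (array $m): string {
            // $m[1] holds the hex digits; it is '' when the decimal branch matched.
            $code = $m[1] !== '' ? hexdec($m[1]) : (int) $m[2];
            if ($code > 0x10FFFF) {
                return $m[0]; // outside the Unicode range: leave the text untouched
            }
            // Re-emit a well-formed entity and let PHP decode it to UTF-8.
            // References that PHP considers invalid are returned unchanged.
            return html_entity_decode('&#' . (int) $code . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $str
    );
    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $str;
}

// Usage with the hex string from the question:
$hexHTML = '&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29';
$decoded = decodeNumericEntities($hexHTML);
var_dump(stripos($decoded, 'javascript:') !== false);
```

Unlike stripping `&#` and `x` afterwards, this only touches text that is actually a numeric reference, so a literal `x` elsewhere in the input is preserved. Searching for `javascript:` is only one of many checks on the cheat sheet, which is why a maintained library remains the safer choice.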