What kind of security loopholes could this creative way of sanitizing input possibly face? (if any)

Question

The standard way of sanitizing input is to use commands such as

$url = preg_replace('|[^a-z0-9-~+_.?#=!&;,/:%@$\|*\'()\x80-\xff]|i', '', $url);

$strip = array('%0d', '%0a', '%0D', '%0A');

preg_replace("/[^A-Za-z0-9 ]/", '', $string);

echo htmlentities($str);

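However, I like it when my users are able to use nice things like parentheses, carets, quotes and so on in their input (comments, usernames, etc.). Since HTML renders codes such as &#40; as symbols such as (, I was hoping to use this alternative approach to sanitize their input.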

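Before I set out to write a function that does this for possibly harmful characters such as ( or ; or < (so that injections like a sneaky eval() or <text/javascript> won't work), I searched for previous attempts by others at this kind of sanitization.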

I found none.

This makes me think I must be overlooking some very obvious security hole in my "creative" sanitization method.

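I will not be using this function as the main way of protecting my MySQL database. For that, I have the new mysqli class. It still seemed like a good idea to add this sanitization on top of mysqli's separation of input and query.
I am using completely different functions to sanitize URLs. Those require a different approach.
This function will be used for user input that is displayed on the page.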

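So... what might I be missing? I know there must be something wrong with this idea since nobody else uses it, right?! Is it possible to "re-render the rendered text", or some other horrible and obvious thing? My little function so far: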

It takes input strings such as meep';) drop table or alert(eval('document.body.inne' + 'rHTML'));

function santitize_data($data)    {
//explode the string
//do a replacement for each character separately. Only do one replacement.
//dont do it with preg_replace because that function searches through a string in multiple passes 
//and replaces already-replaced characters, resulting in horrific mishmash.
//put it back together with + signs iterating through array variables   

$patterns = array();
$patterns[0] = "'";
$patterns[1] = '"';
$patterns[2] = '!';
$patterns[3] = '\\'; // a lone backslash has to be escaped, otherwise the string never terminates
$patterns[4] = '#';
$patterns[5] = '%';
$patterns[6] = '&';
$patterns[7] = '$';
$patterns[8] = '(';
$patterns[9] = ')';
$patterns[10] = '/';
$patterns[11] = ':';
$patterns[12] = ';';
$patterns[13] = '|';
$patterns[14] = '<';
$patterns[15] = '>';
$patterns[16] = '{';
$patterns[17] = '}';

$replacements = array();
$replacements[0] = '&#39;';
$replacements[1] = '&#34;';
$replacements[2] = '&#33;';
$replacements[3] = '&#92;';
$replacements[4] = '&#35;';
$replacements[5] = '&#37;';
$replacements[6] = '&#38;';
$replacements[7] = '&#36;';
$replacements[8] = '&#40;';
$replacements[9] = '&#41;';
$replacements[10] = '&#47;';
$replacements[11] = '&#58;';
$replacements[12] = '&#59;';
$replacements[13] = '&#124;';
$replacements[14] = '&lt;';
$replacements[15] = '&gt;';
$replacements[16] = '&#123;';
$replacements[17] = '&#125;';

$split_data = str_split($data);

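foreach ($split_data as &$value) {
    // count() covers every entry, including the last one ('}' at index 17)
    for ($i = 0; $i < count($patterns); $i++) {
        //testing
        //echo '<br> i='.$i.' value='.$value.' patterns[i]='.$patterns[$i].' replacements[i]='.$replacements[$i].'<br>';
        if ($value === $patterns[$i]) {
            $value = $replacements[$i];
            break; // only one replacement per character
        }
    }
}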
unset($value); // break the reference with the last element

$data = implode($split_data);

//a bit of commented out code .. was using what seemed more logical before ... preg_replace .. but it parses the string in multiple passes ):
//$data = preg_replace($patterns, $replacements, $data);

return $data;

} //---END function definition of santitize_data

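It spits out resulting strings such as meep&#39;&#59;&#41; drop table or alert&#40;eval&#40;&#39;document.body.inne&#39; + &#39;rHTML&#39;&#41;&#41;&#59;
and the users see these rendered in the browser as meep';) drop table and alert(eval('document.body.inne' + 'rHTML'));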
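
For comparison, here is a minimal sketch of the same single-pass idea using PHP's built-in strtr(), which does not re-scan text it has already replaced. The function name entity_encode and the shortened character table are assumptions made for illustration, not part of the original function.

```php
<?php
// Single-pass character mapping with strtr(); the table is abbreviated
// and only illustrates the approach described in the question.
function entity_encode(string $data): string
{
    static $map = [
        "'" => '&#39;', '"' => '&#34;', '&' => '&#38;', '#' => '&#35;',
        ';' => '&#59;', '(' => '&#40;', ')' => '&#41;',
        '<' => '&lt;',  '>' => '&gt;',
    ];
    // strtr() never re-scans text it has already replaced, so the '&', '#'
    // and ';' inside an emitted entity are not encoded a second time.
    return strtr($data, $map);
}

echo entity_encode("meep';) drop table"), PHP_EOL;
```

This only shows that the explode/loop/implode steps can be collapsed into one call; it says nothing about whether the encoding is sufficient for a given output context.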

Answer 1

Without analysing your code, I can tell you that you have most likely overlooked something that an attacker could use to inject their own code.

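The main threat here is XSS - you do not need to "sanitize" data for insertion into your database. You either use parameterised queries, or you correctly encode the characters that the database query language gives special meaning to (e.g. the ' character) at the point of input into your database. XSS is normally dealt with by encoding at the point of output, but if you want to allow rich text then you need to take a different approach, which I believe is what you are trying to achieve here.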
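
As a rough illustration of that split, the sketch below stores the raw value with a parameterised mysqli query and encodes only when writing to HTML. The helper names escape_html and save_comment, and the comments table with its body column, are hypothetical; error handling is omitted.

```php
<?php
// Encode at the point of output: turn HTML metacharacters into entities
// right before the value is written into an HTML text or attribute context.
function escape_html(string $value): string
{
    // ENT_QUOTES encodes both ' and "; ENT_SUBSTITUTE replaces invalid
    // UTF-8 sequences instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Store the raw value with a parameterised query; no "sanitizing" needed.
// $db is assumed to be an open mysqli connection, and `comments` is a
// hypothetical table with a single text column `body`.
function save_comment(mysqli $db, string $comment): bool
{
    $stmt = $db->prepare('INSERT INTO comments (body) VALUES (?)');
    $stmt->bind_param('s', $comment);
    return $stmt->execute();
}

// Output side: the stored text is encoded only when it is displayed.
echo escape_html('<script>alert("x")</script>'), PHP_EOL;
```

The data stays unmodified in the database, so it can later be re-encoded for a different output context if needed.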

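Remember that there is no magic function that sanitizes input in a generic manner - whether a value is safe very much depends on how and where it is used. (I added this bit so that anyone who searches and finds this answer is up to speed - I think you already have a handle on this, though.)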
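
To make the context point concrete, here is a small sketch showing why entity-encoding alone does not help inside a URL attribute: the HTML parser decodes character references in attribute values before the URL is interpreted. html_entity_decode() is used here only as a stand-in for that browser step, and safe_href and js_literal are hypothetical helper names.

```php
<?php
// Hypothetical attacker-supplied value meant for a link target.
$input = 'javascript:alert(1)';

// Entity-encode ':' '(' ')' the way a character-replacement function would.
$encoded = strtr($input, [':' => '&#58;', '(' => '&#40;', ')' => '&#41;']);

// Stand-in for the browser: character references in attribute values are
// decoded before the URL is interpreted.
$seenByBrowser = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo '<a href="' . $encoded . '">profile</a>', PHP_EOL;
var_dump($seenByBrowser === $input); // the original payload is back

// Context-appropriate handling for href: allow only http(s) URLs.
// Relative URLs are rejected too, which keeps the sketch simple.
function safe_href(string $url): string
{
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (!in_array($scheme, ['http', 'https'], true)) {
        return '#';
    }
    return htmlspecialchars($url, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Context-appropriate handling for a value placed inside a <script> block:
// JSON-encode it and hex-escape characters that could end the block.
function js_literal(string $value): string
{
    return json_encode($value, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
}
```

The same input therefore needs different treatment depending on whether it lands in HTML text, an attribute holding a URL, or script code.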

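Complexity is the main enemy of security. If you cannot determine whether your code is secure, then it is too complicated, and an attacker with enough motivation and enough time will find a way round your sanitization method.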

What can you do about this?

If you want to let your users enter rich text, you could either allow BBCode, so that users can insert a limited, safe subset of HTML via your own conversion functions, or you could allow HTML entry and run the content through a tried and tested solution such as HTML Purifier. Now, HTML Purifier won't be perfect, and I'm sure that (another) flaw will be found in it at some point in the future.
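
A minimal sketch of the BBCode route, assuming a whitelist of just [b] and [i]: everything is HTML-encoded first, and only then are the known tags converted. The function name bbcode_to_html and the tag set are illustrative assumptions, not a complete BBCode implementation.

```php
<?php
// Minimal BBCode sketch: escape everything first, then re-enable a small
// whitelist of tags. The tag set ([b], [i]) is an illustrative assumption.
function bbcode_to_html(string $text): string
{
    // After this call no user-supplied '<', '>', '&' or quote survives.
    $html = htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // Only these fixed patterns can introduce real markup.
    $map = [
        '~\[b\](.*?)\[/b\]~is' => '<strong>$1</strong>',
        '~\[i\](.*?)\[/i\]~is' => '<em>$1</em>',
    ];
    return preg_replace(array_keys($map), array_values($map), $html);
}

echo bbcode_to_html('[b]bold[/b] <script>alert(1)</script>'), PHP_EOL;
```

Tags that carry attributes, such as links or images, need their values validated separately (for example by scheme), so this sketch deliberately leaves them out.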

How can you guard against this?

If you implement a Content Security Policy on your site, this will prevent any successfully injected script code from executing in the browser. See here for current browser support for CSP. Do not be tempted to use only one of these approaches - a good security model is layered, so that if one control is circumvented, another can catch it.
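
As a sketch of what sending such a policy from PHP might look like: the directive values below are an example only and would need adjusting to the scripts and other assets a real site loads. csp_header_value is a hypothetical helper name.

```php
<?php
// Build a restrictive policy. The directive values are an example and
// would need adjusting to what a real site actually loads.
function csp_header_value(): string
{
    $directives = [
        "default-src 'self'",
        "script-src 'self'",   // without 'unsafe-inline', inline scripts are blocked
        "object-src 'none'",
    ];
    return implode('; ', $directives);
}

// Headers can only be sent before any output has been written.
if (!headers_sent()) {
    header('Content-Security-Policy: ' . csp_header_value());
}
```

With a policy like this, a script block injected into a comment would not run in a CSP-aware browser, because inline scripts are not allowed by script-src 'self'.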

Google have now implemented CSP in Gmail to make sure that any HTML email received cannot try anything sneaky to launch an XSS attack.

php

security

input