Yahoo HTML5 Context Parser - Cross-site scripting (XSS)
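I am trying out Yahoo's HTML5 context parser (Yahoo context-parser), which is meant to help identify potential XSS vulnerabilities.
As an experiment, running the ./bin/context-dump utility against the text
<form><input name=q value="%(query)s"> </form>
produces: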
HTML-State { statesSize: 51 } +0ms
HTML-State { ch: 0, state: 1, symbol: 0 } +2ms
HTML-State { ch: f [0x66], state: 8, symbol: 11 } +1ms
HTML-State { ch: o [0x6f], state: 10, symbol: 11 } +0ms
HTML-State { ch: r [0x72], state: 10, symbol: 11 } +0ms
HTML-State { ch: m [0x6d], state: 10, symbol: 11 } +0ms
HTML-State { ch: > [0x3e], state: 10, symbol: 9 } +0ms
HTML-State { ch: [0x20], state: 1, symbol: 0 } +0ms
HTML-State { ch: [0x20], state: 1, symbol: 0 } +0ms
HTML-State { ch: [0x20], state: 1, symbol: 0 } +0ms
HTML-State { ch: < [0x3c], state: 1, symbol: 7 } +0ms
HTML-State { ch: i [0x69], state: 8, symbol: 11 } +0ms
HTML-State { ch: n [0x6e], state: 10, symbol: 11 } +0ms
HTML-State { ch: p [0x70], state: 10, symbol: 11 } +1ms
HTML-State { ch: u [0x75], state: 10, symbol: 11 } +0ms
HTML-State { ch: t [0x74], state: 10, symbol: 11 } +0ms
HTML-State { ch: [0x20], state: 10, symbol: 0 } +0ms
HTML-State { ch: n [0x6e], state: 34, symbol: 11 } +0ms
HTML-State { ch: a [0x61], state: 35, symbol: 11 } +0ms
HTML-State { ch: m [0x6d], state: 35, symbol: 11 } +0ms
HTML-State { ch: e [0x65], state: 35, symbol: 11 } +0ms
HTML-State { ch: = [0x3d], state: 35, symbol: 8 } +0ms
HTML-State { ch: q [0x71], state: 37, symbol: 11 } +0ms
HTML-State { ch: [0x20], state: 40, symbol: 0 } +0ms
HTML-State { ch: v [0x76], state: 34, symbol: 11 } +0ms
HTML-State { ch: a [0x61], state: 35, symbol: 11 } +0ms
HTML-State { ch: l [0x6c], state: 35, symbol: 11 } +0ms
HTML-State { ch: u [0x75], state: 35, symbol: 11 } +0ms
HTML-State { ch: e [0x65], state: 35, symbol: 11 } +0ms
HTML-State { ch: = [0x3d], state: 35, symbol: 8 } +0ms
HTML-State { ch: " [0x22], state: 37, symbol: 2 } +1ms
HTML-State { ch: % [0x25], state: 38, symbol: 12 } +0ms
HTML-State { ch: ( [0x28], state: 38, symbol: 12 } +1ms
HTML-State { ch: q [0x71], state: 38, symbol: 11 } +0ms
HTML-State { ch: u [0x75], state: 38, symbol: 11 } +0ms
HTML-State { ch: e [0x65], state: 38, symbol: 11 } +0ms
HTML-State { ch: r [0x72], state: 38, symbol: 11 } +0ms
HTML-State { ch: y [0x79], state: 38, symbol: 11 } +0ms
HTML-State { ch: ) [0x29], state: 38, symbol: 12 } +0ms
HTML-State { ch: s [0x73], state: 38, symbol: 11 } +0ms
HTML-State { ch: " [0x22], state: 38, symbol: 2 } +0ms
HTML-State { ch: > [0x3e], state: 42, symbol: 9 } +0ms
HTML-State { ch: [0x20], state: 1, symbol: 0 } +0ms
HTML-State { ch: < [0x3c], state: 1, symbol: 7 } +0ms
HTML-State { ch: / [0x2f], state: 8, symbol: 6 } +0ms
HTML-State { ch: f [0x66], state: 9, symbol: 11 } +0ms
HTML-State { ch: o [0x6f], state: 10, symbol: 11 } +0ms
HTML-State { ch: r [0x72], state: 10, symbol: 11 } +0ms
HTML-State { ch: m [0x6d], state: 10, symbol: 11 } +0ms
HTML-State { ch: > [0x3e], state: 10, symbol: 9 } +0ms
HTML-State { ch: [0xa], state: 1, symbol: 0 } +0ms
HTML-State { undefined - char in html without state } +0ms
How does the given output help me identify potential XSS problems? In other words, how does the context parser help me?
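It tells you the syntactic context of each character in your HTML page. The state value can be looked up in the constants file. For example, 10 means that a tag name is being parsed, which in your example is the name of the <input /> and <form /> tags.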
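To make the dump easier to read, each line can be turned into a record with a human-readable state name. The following is only a sketch: parseDumpLine and STATE_NAMES are hypothetical helpers written for this answer, not part of context-parser, and the names in the table are an assumption based on the section numbering of the HTML5 tokenizer specification. Check them against the library's constants file before relying on them.

```javascript
// Assumed mapping from state number to HTML5 tokenizer state name.
// Only the states that appear in the dump above are listed.
const STATE_NAMES = {
  1: "data",
  8: "tag open",
  9: "end tag open",
  10: "tag name",
  34: "before attribute name",
  35: "attribute name",
  37: "before attribute value",
  38: "attribute value (double-quoted)",
  40: "attribute value (unquoted)",
  42: "after attribute value (quoted)",
};

// Parse one context-dump line into { ch, state, symbol, name }.
// Returns null for lines that do not carry a character code and a state.
function parseDumpLine(line) {
  const m = /ch:\s*(?:.\s)?\[0x([0-9a-fA-F]+)\],\s*state:\s*(\d+),\s*symbol:\s*(\d+)/.exec(line);
  if (!m) return null;
  const state = Number(m[2]);
  return {
    ch: String.fromCharCode(parseInt(m[1], 16)), // recover the character from its hex code
    state,
    symbol: Number(m[3]),
    name: STATE_NAMES[state] || "unknown",
  };
}

// Example: the first character of the %(query)s placeholder.
const record = parseDumpLine('HTML-State { ch: % [0x25], state: 38, symbol: 12 } +0ms');
console.log(record.state, record.name);
```

Under that assumed mapping, the placeholder sits inside a double-quoted attribute value, which is the piece of information that decides how the user data must be encoded.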
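Knowing the context in which content is output tells the developer which encoding to use.
For example, when outputting user data into HTML, you HTML encode it: certain characters, such as the less-than sign, are replaced by their HTML entities (< becomes &lt;).
In a JavaScript context, you use hex entity encoding, so < becomes \x3c.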
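As a rough illustration of the two encodings, here is a minimal sketch. The function names htmlEncode and jsHexEncode are made up for this example, and a vetted encoding library is preferable in production code.

```javascript
// HTML encoding: replace the characters that are significant to the HTML parser.
function htmlEncode(input) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(input).replace(/[&<>"']/g, (c) => entities[c]);
}

// JavaScript hex encoding: escape every non-alphanumeric code unit as \xHH
// (or \uHHHH when it does not fit in one byte).
function jsHexEncode(input) {
  return String(input).replace(/[^A-Za-z0-9]/g, (c) => {
    const code = c.charCodeAt(0);
    return code < 256
      ? "\\x" + code.toString(16).padStart(2, "0")
      : "\\u" + code.toString(16).padStart(4, "0");
  });
}

console.log(htmlEncode("<"));  // &lt;
console.log(jsHexEncode("<")); // \x3c
```

Escaping every non-alphanumeric character in the JavaScript case is deliberately conservative: it avoids having to reason about which characters can terminate a string literal or a script block.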
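For all practical purposes, I'm not sure how useful the context parser is in day-to-day use. Once you know which encoding types to use, the right choice should be obvious. The main pitfall when learning this on your own is probably when you have a JavaScript context inside HTML: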
<a href="javascript:void();" onclick="//this is parsed by HTML parser and then the JavaScript parser" />
Whereas in a <script> block, only the JavaScript parser is involved:
<script>
// The HTML parser doesn't run past here
</script>
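However, once you are aware of this, the benefit of the context parser is minimal.
And even if it can help with server-side contexts, it won't help with DOM manipulation or with preventing DOM-based XSS: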
<a href="javascript:void()" onclick="document.getElementById('foo').innerHTML = '(whatever is here should be HTML encoded, then hex entity encoded, then HTML encoded again)'" />
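(You're welcome to try it in the context parser.)
(The final HTML encoding should not change anything, because the characters produced by hex entity encoding, namely \, x and the hex digits, do not need HTML encoding. The final context is still HTML, though.)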
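The three-step chain for the onclick example can be sketched as follows. This is an illustration only: htmlEncode, jsHexEncode and encodeForInnerHtmlInOnclick are hypothetical helpers, not part of any library. The steps are applied from the innermost context outwards: HTML for innerHTML, then the JavaScript string literal, then the HTML attribute.

```javascript
function htmlEncode(input) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(input).replace(/[&<>"']/g, (c) => entities[c]);
}

function jsHexEncode(input) {
  return String(input).replace(/[^A-Za-z0-9]/g, (c) => {
    const code = c.charCodeAt(0);
    return code < 256
      ? "\\x" + code.toString(16).padStart(2, "0")
      : "\\u" + code.toString(16).padStart(4, "0");
  });
}

// Encode untrusted text for: onclick="...innerHTML = 'HERE'"
function encodeForInnerHtmlInOnclick(untrusted) {
  const forInnerHtml = htmlEncode(untrusted);      // 1. ends up as HTML via innerHTML
  const forJsString = jsHexEncode(forInnerHtml);   // 2. sits inside a JavaScript string literal
  const forAttribute = htmlEncode(forJsString);    // 3. the attribute value is HTML again
  return { forInnerHtml, forJsString, forAttribute };
}

const steps = encodeForInnerHtmlInOnclick("<b>");
console.log(steps.forInnerHtml); // &lt;b&gt;
console.log(steps.forJsString);  // \x26lt\x3bb\x26gt\x3b
console.log(steps.forAttribute === steps.forJsString); // true
```

The last comparison shows the point made above: after hex encoding only letters, digits and backslashes remain, so the final HTML encoding step leaves the string unchanged.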