Trying to understand the number of ParseErrors in html5lib-tests
I am looking at the following test case in html5lib-tests:
{"description":"<!DOCTYPE\u0008",
"input":"<!DOCTYPE\u0008",
"output":["ParseError", "ParseError", "ParseError",
["DOCTYPE", "\u0008", null, null, false]]},
State                      | Input char | Actions
--------------------------------------------------------------------------------------------
Data State                 | "<"        | -> TagOpenState
TagOpenState               | "!"        | -> MarkupDeclarationOpenState
MarkupDeclarationOpenState | "DOCTYPE"  | -> DOCTYPE state
DOCTYPE state              | "\u0008"   | Parse error; -> before DOCTYPE name state (reconsume)
before DOCTYPE name state  | "\u0008"   | DOCTYPE(name = "\u0008"); -> DOCTYPE name state
DOCTYPE name state         | EOF        | Parse error. Set force quirks on. Emit DOCTYPE -> Data state.
Data state                 | EOF        | Emit EOF.
Where do the three parse errors come from? I can only trace two of them, so I assume I have made a logic error somewhere.
The one you are missing comes from the "Preprocessing the input stream" section:
Any occurrences of any characters in the ranges U+0001 to U+0008, U+000E to U+001F, U+007F to U+009F, U+FDD0 to U+FDEF, and characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are parse errors. These are all control characters or permanently undefined Unicode characters (noncharacters).
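This raises a parse error for the U+0008 character before it ever reaches the tokenizer. Since the tokenizer is defined as reading from the input stream, the tokenizer tests assume that the normal preprocessing has been applied to the input stream.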
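To make the count concrete, here is a minimal sketch of the preprocessing check, written directly from the list of code points quoted above. The function names `is_preprocessing_parse_error` and `count_preprocessing_errors` are made up for this illustration and are not part of html5lib or any other library.

```python
def is_preprocessing_parse_error(ch: str) -> bool:
    """Return True if `ch` is one of the code points the quoted
    "Preprocessing the input stream" paragraph lists as a parse error."""
    cp = ord(ch)
    # Control-character ranges and the U+FDD0..U+FDEF noncharacter block.
    if 0x0001 <= cp <= 0x0008 or 0x000E <= cp <= 0x001F:
        return True
    if 0x007F <= cp <= 0x009F or 0xFDD0 <= cp <= 0xFDEF:
        return True
    if cp == 0x000B:
        return True
    # U+FFFE, U+FFFF and the last two code points of every higher plane
    # (U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF) all end in FFFE or FFFF.
    return (cp & 0xFFFE) == 0xFFFE


def count_preprocessing_errors(text: str) -> int:
    """Count the parse errors raised before the tokenizer sees the input."""
    return sum(1 for ch in text if is_preprocessing_parse_error(ch))


print(count_preprocessing_errors("<!DOCTYPE\u0008"))  # prints 1
```

That single preprocessing error, plus the two tokenizer errors already found in the trace (one in the DOCTYPE state and one at EOF in the DOCTYPE name state), gives the three `ParseError` entries the test expects.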