Spirit X3,如何在非 ascii 输入上解析失败?
Spirit X3, How to fail parse on non-ascii input?
所以 objective 是不允许输入字符串中从 80h 到 FFh 的字符。我的印象是
using ascii::char_;
会处理这个的。但正如您在示例代码中看到的那样,它会愉快地打印 Parsing succeeded.
在下面的 Spirit 邮件列表 post 中,Joel 建议让这些非 ascii 字符的解析失败。但我不确定他是否继续这样做。
[Spirit-general] ascii encoding assert on invalid input ...
这里是我的示例代码:
#include <iostream>
#include <boost/spirit/home/x3.hpp>
namespace client::parser
{
namespace x3 = boost::spirit::x3;
namespace ascii = boost::spirit::x3::ascii;
using ascii::char_;
using ascii::space;
using x3::lexeme;
using x3::skip;
const auto quoted_string = lexeme[char_('"') >> *(char_ - '"') >> char_('"')];
const auto entry_point = skip(space) [ quoted_string ];
}
int main()
{
for(std::string const input : { "\"naughty \x80" "bla bla bla\"" }) {
std::string output;
if (parse(input.begin(), input.end(), client::parser::entry_point, output)) {
std::cout << "Parsing succeeded\n";
std::cout << "input: " << input << "\n";
std::cout << "output: " << output << "\n";
} else {
std::cout << "Parsing failed\n";
}
}
}
如何更改示例以使 Spirit 在此无效输入上失败?
此外,但非常相关,我想知道我应该如何使用定义 char_set 编码的字符解析器。你从 X3 docs: Character Parsers develop branch.
知道 char_(charset)
文档缺乏对基本功能的如此强烈的描述。为什么 boost 高层人员不能强制图书馆作者提供至少 cppreference.com 级别的文档?
您可以通过使用 print
解析器来实现:
#include <iostream>
#include <boost/spirit/home/x3.hpp>
namespace client::parser
{
namespace x3 = boost::spirit::x3;
namespace ascii = boost::spirit::x3::ascii;
using ascii::char_;
using ascii::print;
using ascii::space;
using x3::lexeme;
using x3::skip;
const auto quoted_string = lexeme[char_('"') >> *(print - '"') >> char_('"')];
const auto entry_point = skip(space) [ quoted_string ];
}
int main()
{
for(std::string const input : { "\"naughty \x80\"", "\"bla bla bla\"" }) {
std::string output;
std::cout << "input: " << input << "\n";
if (parse(input.begin(), input.end(), client::parser::entry_point, output)) {
std::cout << "output: " << output << "\n";
std::cout << "Parsing succeeded\n";
} else {
std::cout << "Parsing failed\n";
}
}
}
输出:
input: "naughty �"
Parsing failed
input: "bla bla bla"
output: "bla bla bla"
Parsing succeeded
https://wandbox.org/permlink/HSoB8uqMC3WME5yI
令人惊讶的事实是,出于某种原因,仅当 sizeof(iterator char type) > sizeof(char)
:
时才会对 char_
进行检查
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <string>
#include <boost/core/demangle.hpp>
#include <typeinfo>
namespace x3 = boost::spirit::x3;
template <typename Char>
void test(Char const* str)
{
std::basic_string<Char> s = str;
std::cout << boost::core::demangle(typeid(Char).name()) << ":\t";
Char c;
auto it = s.begin();
if (x3::parse(it, s.end(), x3::ascii::char_, c) && it == s.end())
std::cout << "OK: " << int(c) << "\n";
else
std::cout << "Failed\n";
}
int main()
{
test("\x80");
test(L"\x80");
test(u8"\x80");
test(u"\x80");
test(U"\x80");
}
输出:
char: OK: -128
wchar_t: Failed
char8_t: OK: 128
char16_t: Failed
char32_t: Failed
这里的文档没什么不好的。这只是一个库错误。
any_char
的代码表示:
template <typename Char, typename Context>
bool test(Char ch_, Context const&) const
{
return ((sizeof(Char) <= sizeof(char_type)) || encoding::ischar(ch_));
}
应该说
template <typename Char, typename Context>
bool test(Char ch_, Context const&) const
{
return ((sizeof(Char) <= sizeof(char_type)) && encoding::ischar(ch_));
}
这会使您的程序按预期和要求运行。该行为也符合 Qi 行为:
#include <boost/spirit/include/qi.hpp>
int main() {
namespace qi = boost::spirit::qi;
char const* input = "\x80";
assert(!qi::parse(input, input+1, qi::ascii::char_));
}
所以 objective 是不允许输入字符串中从 80h 到 FFh 的字符。我的印象是
using ascii::char_;
会处理这个的。但正如您在示例代码中看到的那样,它会愉快地打印 Parsing succeeded.
在下面的 Spirit 邮件列表 post 中,Joel 建议让这些非 ascii 字符的解析失败。但我不确定他是否继续这样做。 [Spirit-general] ascii encoding assert on invalid input ...
这里是我的示例代码:
#include <iostream>
#include <boost/spirit/home/x3.hpp>
namespace client::parser
{
namespace x3 = boost::spirit::x3;
namespace ascii = boost::spirit::x3::ascii;
using ascii::char_;
using ascii::space;
using x3::lexeme;
using x3::skip;
const auto quoted_string = lexeme[char_('"') >> *(char_ - '"') >> char_('"')];
const auto entry_point = skip(space) [ quoted_string ];
}
int main()
{
for(std::string const input : { "\"naughty \x80" "bla bla bla\"" }) {
std::string output;
if (parse(input.begin(), input.end(), client::parser::entry_point, output)) {
std::cout << "Parsing succeeded\n";
std::cout << "input: " << input << "\n";
std::cout << "output: " << output << "\n";
} else {
std::cout << "Parsing failed\n";
}
}
}
如何更改示例以使 Spirit 在此无效输入上失败?
此外,但非常相关,我想知道我应该如何使用定义 char_set 编码的字符解析器。你从 X3 docs: Character Parsers develop branch.
知道char_(charset)
文档缺乏对基本功能的如此强烈的描述。为什么 boost 高层人员不能强制图书馆作者提供至少 cppreference.com 级别的文档?
您可以通过使用 print
解析器来实现:
#include <iostream>
#include <boost/spirit/home/x3.hpp>
namespace client::parser
{
namespace x3 = boost::spirit::x3;
namespace ascii = boost::spirit::x3::ascii;
using ascii::char_;
using ascii::print;
using ascii::space;
using x3::lexeme;
using x3::skip;
const auto quoted_string = lexeme[char_('"') >> *(print - '"') >> char_('"')];
const auto entry_point = skip(space) [ quoted_string ];
}
int main()
{
for(std::string const input : { "\"naughty \x80\"", "\"bla bla bla\"" }) {
std::string output;
std::cout << "input: " << input << "\n";
if (parse(input.begin(), input.end(), client::parser::entry_point, output)) {
std::cout << "output: " << output << "\n";
std::cout << "Parsing succeeded\n";
} else {
std::cout << "Parsing failed\n";
}
}
}
输出:
input: "naughty �"
Parsing failed
input: "bla bla bla"
output: "bla bla bla"
Parsing succeeded
https://wandbox.org/permlink/HSoB8uqMC3WME5yI
令人惊讶的事实是,出于某种原因,仅当 sizeof(iterator char type) > sizeof(char)
:
char_
进行检查
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <string>
#include <boost/core/demangle.hpp>
#include <typeinfo>
namespace x3 = boost::spirit::x3;
template <typename Char>
void test(Char const* str)
{
std::basic_string<Char> s = str;
std::cout << boost::core::demangle(typeid(Char).name()) << ":\t";
Char c;
auto it = s.begin();
if (x3::parse(it, s.end(), x3::ascii::char_, c) && it == s.end())
std::cout << "OK: " << int(c) << "\n";
else
std::cout << "Failed\n";
}
int main()
{
test("\x80");
test(L"\x80");
test(u8"\x80");
test(u"\x80");
test(U"\x80");
}
输出:
char: OK: -128
wchar_t: Failed
char8_t: OK: 128
char16_t: Failed
char32_t: Failed
这里的文档没什么不好的。这只是一个库错误。
any_char
的代码表示:
template <typename Char, typename Context>
bool test(Char ch_, Context const&) const
{
return ((sizeof(Char) <= sizeof(char_type)) || encoding::ischar(ch_));
}
应该说
template <typename Char, typename Context>
bool test(Char ch_, Context const&) const
{
return ((sizeof(Char) <= sizeof(char_type)) && encoding::ischar(ch_));
}
这会使您的程序按预期和要求运行。该行为也符合 Qi 行为:
#include <boost/spirit/include/qi.hpp>
int main() {
namespace qi = boost::spirit::qi;
char const* input = "\x80";
assert(!qi::parse(input, input+1, qi::ascii::char_));
}