How do I properly specify anchoring conditions in Spirit X3?
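I am new to writing parsers. I am trying to create a parser that can extract US zip codes from input text. I have created the following parser patterns, which do most of what I want. I am able to match 5-digit zip codes or 9-digit zip codes (90210-1234) as expected.
However, it does not let me avoid matching things like this: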
246764 (returns 46764)
578397 (returns 78397)
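I would like to specify anchoring conditions on the left and right of the pattern above, in the hope of eliminating these examples. More specifically, I want to disallow a match when a digit or dash is adjacent to the start or end of the candidate zip code.
Test data (only 12345 and 15222-1825 should match):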
12345
foo456
ba58r
246764anc
578397
90210-
15206-1
15222-1825
15212-4267-53410-2807
Full code:
using It = std::string::const_iterator;
using ZipCode = boost::fusion::vector<It, It>;

namespace boost { namespace spirit { namespace x3 { namespace traits {
    template <>
    void move_to<It, ZipCode>(It b, It e, ZipCode& z)
    {
        z =
        {
            b,
            e
        };
    }}}}}

void Parse(std::string const& input)
{
    auto start = std::begin(input);
    auto begin = start;
    auto end = std::end(input);

    ZipCode current;
    std::vector<ZipCode> matches;

    auto const fiveDigits = boost::spirit::x3::repeat(5)[boost::spirit::x3::digit];
    auto const fourDigits = boost::spirit::x3::repeat(4)[boost::spirit::x3::digit];
    auto const dash = boost::spirit::x3::char_('-');
    auto const notDashOrDigit = boost::spirit::x3::char_ - (dash | boost::spirit::x3::digit);

    auto const zipCode59 =
        boost::spirit::x3::lexeme
        [
            -(&notDashOrDigit) >>
            boost::spirit::x3::raw[fiveDigits >> -(dash >> fourDigits)] >>
            &notDashOrDigit
        ];

    while (begin != end)
    {
        if (!boost::spirit::x3::phrase_parse(begin, end, zipCode59, boost::spirit::x3::blank, current))
        {
            ++begin;
        }
        else
        {
            auto startOffset = std::distance(start, boost::fusion::at_c<0>(current));
            auto endOffset = std::distance(start, boost::fusion::at_c<1>(current));
            auto length = std::distance(boost::fusion::at_c<0>(current), boost::fusion::at_c<1>(current));
            std::cout << "Matched (\"" << startOffset
                << "\", \""
                << endOffset
                << "\") => \""
                << input.substr(startOffset, length)
                << "\""
                << std::endl;
        }
    }
}
With the test data above, this code produces the following output:
Matched ("0", "5") => "12345"
Matched ("29", "34") => "46764"
Matched ("42", "47") => "78397"
Matched ("68", "78") => "15222-1825"
If I change zipCode59 to the following, I get no matches at all:
auto const zipCode59 =
    boost::spirit::x3::lexeme
    [
        &notDashOrDigit >>
        boost::spirit::x3::raw[fiveDigits >> -(dash >> fourDigits)] >>
        &notDashOrDigit
    ];
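I have already read through a related question. However, that question uses a symbol table. I don't think that would work for me, because I cannot specify hard-coded strings. I also don't understand how the answer to that question manages to disallow leading content.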
Using -(parser) just makes (parser) optional. Using it as -(&parser) has virtually no effect.
Perhaps you want a negative assertion ("lookahead"), which is !(parser) (the opposite of &(parser)).
Note that the confusion may come from the difference between unary minus (which makes a parser optional) and binary minus (the difference operator, used for example to reduce character sets). Neither is a negative assertion; that is unary !.
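Asserting that the start of a zip code is not a dash/digit seems... confused. If you want to positively assert that something other than a dash or digit is there, that would be &~char_("-0-9") (using unary ~ to negate the character set), but it would prevent matching at the very start of the input.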
Being Positive
Stripping away some of the complexity, I would naively start with something like:
using It = std::string::const_iterator;
using ZipCode = boost::iterator_range<It>;

auto Parse(std::string const& input) {
    using namespace boost::spirit::x3;
    auto dig = [](int n) { return repeat(n)[digit]; };
    auto const zip59 = dig(5) >> -('-' >> dig(4));
    auto const valid = zip59 >> !graph;

    std::vector<ZipCode> matches;
    if (!parse(begin(input), end(input), *seek[raw[valid]], matches))
        throw std::runtime_error("parser failure");

    return matches;
}
Which, of course, matches too much:
Matched '12345'
Matched '78397'
Matched '15222-1825'
Matched '53410-2807'
Being Heroic
To restrict it (and still match at the start of the input), you can seek[&('-'|digit)] and then require a valid zip.
I admit I had to fiddle with things a bit before getting it "right". In the process, I created a debug helper:
auto trace_as = [&input](std::string const& caption, auto parser) {
    return raw[parser] [([=,&input](auto& ctx) {
        std::cout << std::setw(12) << (caption+":") << " '";
        auto range = _attr(ctx);
        // print control characters as visible escape sequences
        for (auto ch : range) switch (ch) {
            case '\0': std::cout << "\\0"; break;
            case '\r': std::cout << "\\r"; break;
            case '\n': std::cout << "\\n"; break;
            default: std::cout << ch;
        }
        std::cout << "' at " << std::distance(input.begin(), range.begin()) << "\n";
    })];
};

auto const valid = seek[&trace_as("seek", '-' | digit)] >> raw[zip59] >> !graph;

std::vector<ZipCode> matches;
if (!parse(begin(input), end(input), -valid % trace_as("skip", *graph >> +space), matches))
    throw std::runtime_error("parser failure");
This produces the following additional diagnostic output:
seek: '1' at 0
skip: '\n ' at 5
seek: '4' at 13
skip: 'foo456\n ' at 10
seek: '5' at 23
skip: 'ba58r\n ' at 21
seek: '2' at 31
skip: '246764anc\n ' at 31
seek: '5' at 45
skip: '578397\n ' at 45
seek: '9' at 56
skip: '90210-\n ' at 56
seek: '1' at 67
skip: '15206-1\n ' at 67
seek: '1' at 79
skip: '\n ' at 89
seek: '1' at 94
Matched '12345'
Matched '15222-1825'
Now that the output is what we want, let's cut away the scaffolding again:
Full Listing
#include <boost/spirit/home/x3.hpp>
#include <stdexcept>
#include <string>
#include <vector>

using It = std::string::const_iterator;
using ZipCode = boost::iterator_range<It>;

auto Parse(std::string const& input) {
    using namespace boost::spirit::x3;
    auto dig = [](int n) { return repeat(n)[digit]; };
    auto const zip59 = dig(5) >> -('-' >> dig(4));
    // jump to the first dash or digit, then require a complete zip code there
    auto const valid = seek[&('-' | digit)] >> raw[zip59] >> !graph;

    std::vector<ZipCode> matches;
    // after each attempt, skip the rest of the token and the whitespace after it
    if (!parse(begin(input), end(input), -valid % (*graph >> +space), matches))
        throw std::runtime_error("parser failure");

    return matches;
}

#include <iostream>

int main() {
    std::string const sample = R"(12345
    foo456
    ba58r
    246764anc
    578397
    90210-
    15206-1
    15222-1825
    15212-4267-53410-2807)";

    for (auto zip : Parse(sample))
        std::cout << "Matched '" << zip << "'\n";
}
Prints:
Matched '12345'
Matched '15222-1825'
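For cross-checking the grammar's output, the adjacency rule from the question (no digit or dash next to either end of the zip code) can also be written as a plain scan without Spirit. This is a sketch for comparison only, not the approach used in the answer; `is_zip_char`, `is_zip` and `find_zip_codes` are hypothetical names. It deliberately differs from the grammar above in one respect: the grammar's `!graph` also rejects a zip code followed directly by a letter, while this scan only looks at digits and dashes.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Characters that must not touch a zip code on either side.
inline bool is_zip_char(char c) {
    return c == '-' || std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// True if `run` is exactly "ddddd" or "ddddd-dddd".
inline bool is_zip(std::string const& run) {
    if (run.size() != 5 && run.size() != 10) return false;
    for (std::size_t i = 0; i < run.size(); ++i) {
        char const c = run[i];
        bool const fits = (i == 5)
            ? c == '-'
            : std::isdigit(static_cast<unsigned char>(c)) != 0;
        if (!fits) return false;
    }
    return true;
}

// Splits the text into maximal runs of digits and dashes and keeps the
// runs that are a complete zip code. A run that is longer than a zip code
// (e.g. "578397" or "90210-") is rejected as a whole.
inline std::vector<std::string> find_zip_codes(std::string const& text) {
    std::vector<std::string> found;
    std::size_t i = 0;
    while (i < text.size()) {
        if (!is_zip_char(text[i])) { ++i; continue; }
        std::size_t j = i;
        while (j < text.size() && is_zip_char(text[j])) ++j;
        std::string const run = text.substr(i, j - i);
        if (is_zip(run)) found.push_back(run);
        i = j;
    }
    return found;
}
```

On the sample input from the question this scan also yields "12345" and "15222-1825".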