Boost regex for parsing key/value pairs from a string
I am trying to parse key/value pairs from the following string:
#include <iostream>
#include <string>
#include <map>
#include <boost/regex.hpp>

int main()
{
    std::string deliveryReceipt = "id:pgl01130529155035239084 sub:001 dlvrd:001 submit date:1305291550 done date:1305291550 stat:DELIVRD err:0";
    std::map<std::string, std::string> results;
    boost::regex re("(?:([^:]+):([^,]+)(?:,|$))+"); // key - value pair
    boost::sregex_iterator it(deliveryReceipt.begin(), deliveryReceipt.end(), re), end;
    for ( ; it != end; ++it){
        results[(*it)[1]] = (*it)[2];
    }
    std::map<std::string, std::string>::iterator resultsIter = results.begin();
    while (resultsIter != results.end())
    {
        std::cout << "key:" << resultsIter->first << " value:" << resultsIter->second << std::endl;
        resultsIter++;
    }
}
I get the following output:
key:id value:pgl01130529155035239084 sub:001 dlvrd:001 submit date:1305291550 done date:1305291550 stat:DELIVRD err:0
How can I fix this regex so that it parses the key/value pairs correctly?
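This expression,
(?<=^|\s)([^:]+):(\S*)(?=$|\s)
or
(?<=^|\s)([^:]+):(\S*)
might be an OK place to start; you would then adjust the escaping for your language.
If you wish to simplify/modify/explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.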
I would go with something like this (updated)
"\s*(?<!\S)([^:]+)\s*:(\S+)(?!\S)"
https://regex101.com/r/Sufx5m/1
Explained
\s* # Optional whitespace trim
(?<! \S) # Whitespace boundary delimiter
# (also matches at beginning of string)
( [^:]+ ) # (1), Key - not any ':' colon chars
\s* # Optional whitespace trim
: # Colon
( \S+ ) # (2), Value - not whitespace chars
(?! \S ) # Whitespace boundary delimiter.
# (also matches at end of string)
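If Boost is not available, a similar idea can be tried with std::regex. The following is only a minimal sketch, not the exact pattern above: the default ECMAScript grammar of std::regex has no lookbehind, so the leading (?<!\S) is dropped and the key is instead required to start with a non-whitespace character, while the trailing (?!\S) is kept. The function name parse_pairs_boundary is made up for illustration.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper (not from the answer): std::regex adaptation of the
// whitespace-boundary pattern, without the lookbehind.
std::map<std::string, std::string> parse_pairs_boundary(const std::string& input)
{
    // \s*            optional leading whitespace
    // ([^:\s][^:]*)  (1) key: starts with a non-space char, contains no ':'
    // :              separator
    // (\S+)          (2) value: run of non-whitespace chars
    // (?!\S)         value must end at whitespace or end of string
    static const std::regex re(R"(\s*([^:\s][^:]*):(\S+)(?!\S))");

    std::map<std::string, std::string> out;
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        out[(*it)[1].str()] = (*it)[2].str();
    }
    return out;
}
```

Called with the delivery receipt string from the question, this should give seven entries, with keys such as "submit date" and "done date" kept whole. Because the value is matched with \S+, a value that itself contains a colon (for example time:12:30) stays in one piece.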
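If the separator is : and the keys and values themselves do not contain :, you could use:
\s*([^:]+):([^:\s]+)
In parts
\s*        Match 0+ whitespace chars
(          Capture group 1
[^:]+      Match any char except : using a negated character class
)          Close group
:          Match literally
(          Capture group 2
[^:\s]+    Match any char except : or a whitespace char
)          Close group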
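To see how this pattern behaves in a loop like the one in the question, here is a small sketch. It uses std::regex rather than boost::regex purely so that it compiles without Boost (an assumption of this example; the iterator is used in the same way as in the question), and the function name parse_pairs_simple is hypothetical.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: collect key/value pairs with \s*([^:]+):([^:\s]+).
// Assumes neither keys nor values contain ':'.
std::map<std::string, std::string> parse_pairs_simple(const std::string& input)
{
    static const std::regex re(R"(\s*([^:]+):([^:\s]+))");

    std::map<std::string, std::string> out;
    // Each match is one pair: group 1 is the key, group 2 is the value.
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        out[(*it)[1].str()] = (*it)[2].str();
    }
    return out;
}
```

Keys with spaces such as "submit date" come out whole, because the leading \s* eats the separator space and [^:]+ then runs up to the colon. The precondition matters, though: for an input like time:12:30 x:1 the value is cut at the second colon, giving time -> 12 plus a bogus key "30 x".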
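Modern C++ allows for much better things than rotten regular expressions.
You can use Boost Spirit to write a strongly typed grammar specification in a few lines of code: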
using namespace boost::spirit::x3;
auto key = lexeme [ +(char_ - ':') ];
auto value = lexeme [ +graph ];
auto kvp = lexeme [key >> ':' >> value];
return skip(space) [ *kvp ];
Demo
#include <map>
#include <string>

// for debug output only
#include <iostream>
#include <iomanip>

// for parsing
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/home/x3.hpp>

static inline auto kvp_parser() {
    using namespace boost::spirit::x3;

    auto key   = lexeme [ +(char_ - ':') ];   // everything up to the colon
    auto value = lexeme [ +graph ];           // run of printable non-space chars
    auto kvp   = lexeme [key >> ':' >> value];

    return skip(space) [ *kvp ];
}

int main() {
    std::string const deliveryReceipt = "id:pgl01130529155035239084 sub:001 dlvrd:001 submit date:1305291550 done date:1305291550 stat:DELIVRD err:0";

    std::map<std::string, std::string> results;
    parse(begin(deliveryReceipt), end(deliveryReceipt), kvp_parser(), results);

    for (auto& [k,v]: results) {
        std::cout << "key:" << std::quoted(k) << "\tvalue:" << std::quoted(v) << std::endl;
    }
}
Prints
key:"dlvrd" value:"001"
key:"done date" value:"1305291550"
key:"err" value:"0"
key:"id" value:"pgl01130529155035239084"
key:"stat" value:"DELIVRD"
key:"sub" value:"001"
key:"submit date" value:"1305291550"