Boost regex for parsing key/value pairs from a string

I'm trying to parse the key/value pairs from the following string:

#include <iostream>
#include <string>
#include <map>

#include <boost/regex.hpp>

int main()
{
    std::string deliveryReceipt = "id:pgl01130529155035239084 sub:001 dlvrd:001 submit date:1305291550 done date:1305291550 stat:DELIVRD err:0";

    std::map<std::string, std::string> results;
    boost::regex re("(?:([^:]+):([^,]+)(?:,|$))+"); // key - value pair

    boost::sregex_iterator it(deliveryReceipt.begin(), deliveryReceipt.end(), re), end;
    for ( ; it != end; ++it){
      results[(*it)[1]] = (*it)[2];
    }    

    std::map<std::string, std::string>::iterator resultsIter = results.begin();
    while (resultsIter != results.end())
    {
        std::cout << "key:" << resultsIter->first << " value:" << resultsIter->second << std::endl;
        resultsIter++;
    }
}

I get the following output:

key:id value:pgl01130529155035239084 sub:001 dlvrd:001 submit date:1305291550 done date:1305291550 stat:DELIVRD err:0

How can I fix this regex so that it parses the key/value pairs correctly?

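This expression,

(?<=^|\s)([^:]+):(\S*)(?=$|\s)

or

(?<=^|\s)([^:]+):(\S*)

might be fine to start with; you may need to adjust the escaping for your language (for example, doubling the backslashes inside an ordinary C++ string literal).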
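A minimal sketch of using the second expression from C++, assuming the standard `<regex>` library instead of Boost. `std::regex`'s ECMAScript grammar has no lookbehind, so `(?<=^|\s)` is swapped for a non-capturing group `(?:^|\s)` that consumes the separating whitespace instead; the function name `parse_kv_no_lookbehind` is hypothetical.

```cpp
#include <map>
#include <regex>
#include <string>

// Collects key/value pairs with an adaptation of (?<=^|\s)([^:]+):(\S*).
// std::regex (ECMAScript grammar) has no lookbehind, so the leading assertion
// becomes a non-capturing group that either matches the start of the string
// or consumes one whitespace character.
std::map<std::string, std::string> parse_kv_no_lookbehind(const std::string& text) {
    static const std::regex re(R"((?:^|\s)([^:]+):(\S*))");
    std::map<std::string, std::string> results;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        // Group 1: key (everything up to the next ':', so "submit date" stays whole).
        // Group 2: value (the run of non-whitespace characters after the ':').
        results[(*it)[1].str()] = (*it)[2].str();
    }
    return results;
}
```

Because the consumed space sits outside both capture groups, the keys still come out without leading whitespace, including multi-word keys such as `submit date`.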


If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also watch how it matches some sample inputs.


I would go with something like this (updated):

"\s*(?<!\S)([^:]+)\s*:(\S+)(?!\S)"

https://regex101.com/r/Sufx5m/1

Explained

 \s*              # Optional whitespace trim
 (?<! \S)         # Whitespace boundary delimiter
                  #   (also matches at beginning of string)
 ( [^:]+ )        # (1), Key - not any ':' colon chars
 \s*              # Optional whitespace trim
 :                # Colon 
 ( \S+ )          # (2), Value - not whitespace chars
 (?! \S )         # Whitespace boundary delimiter.
                  #   (also matches at end of string)
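The same approach applied to the updated expression, again as a sketch with `std::regex` and a hypothetical function name `parse_kv_trimmed`. Two assumptions differ from the pattern above: `(?<!\S)` becomes a consuming `(?:\s+|^)`, and the key is made lazy (`[^:]+?`) so that the `\s*` before the colon actually strips trailing spaces from the key; with a greedy `[^:]+`, those spaces would end up in group 1.

```cpp
#include <map>
#include <regex>
#include <string>

// Adaptation of \s*(?<!\S)([^:]+)\s*:(\S+)(?!\S) for std::regex:
//  - (?<!\S) is replaced by (?:\s+|^), which consumes the preceding whitespace;
//  - the key is lazy ([^:]+?) so the \s* before ':' trims trailing spaces;
//  - (?!\S) is kept, since the ECMAScript grammar supports negative lookahead.
std::map<std::string, std::string> parse_kv_trimmed(const std::string& text) {
    static const std::regex re(R"((?:\s+|^)([^:]+?)\s*:(\S+)(?!\S))");
    std::map<std::string, std::string> results;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        // Group 1: trimmed key; group 2: whitespace-delimited value.
        results[(*it)[1].str()] = (*it)[2].str();
    }
    return results;
}
```

With this version, an input such as `a :1 b:2` yields the keys `a` and `b` without the stray space before the colon.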

If the delimiter is : and the keys and values themselves do not contain :, you could use:

\s*([^:]+):([^:\s]+)

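In parts

  • \s* Match 0+ whitespace chars
  • ( Capture group 1
    • [^:]+ Match 1+ times any char except :
  • ) Close group
  • : Match literally
  • ( Capture group 2
    • [^:\s]+ Match 1+ times any char except : or a whitespace char
  • ) Close group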
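This expression uses no lookaround, so it can be passed to `std::regex` (or `boost::regex`) unchanged. A minimal sketch, with a hypothetical function name `parse_kv_simple`, that collects the matches into a `std::map` the same way the question's loop does:

```cpp
#include <map>
#include <regex>
#include <string>

// Collects key/value pairs with \s*([^:]+):([^:\s]+), which uses no
// lookaround and therefore works with std::regex as written.
// Assumes neither keys nor values contain ':'.
std::map<std::string, std::string> parse_kv_simple(const std::string& text) {
    static const std::regex re(R"(\s*([^:]+):([^:\s]+))");
    std::map<std::string, std::string> results;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        // \s* eats the separator, so group 1 starts at the first key character.
        results[(*it)[1].str()] = (*it)[2].str();
    }
    return results;
}
```

As stated above, this relies on values never containing `:`; a value such as a clock time `12:30` would be cut off at the colon.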


Modern C++ allows for nicer things than rotten regexes.

You can use Boost Spirit to write a strongly typed grammar specification in a few lines of code:

using namespace boost::spirit::x3;
auto key   = lexeme [ +(char_ - ':') ];
auto value = lexeme [ +graph ];
auto kvp   = lexeme [key >> ':' >> value];
return skip(space) [ *kvp ];

Demo


#include <map>
#include <string>
// for debug output only
#include <iostream>
#include <iomanip>

// for parsing
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/home/x3.hpp>

static inline auto kvp_parser() {
    using namespace boost::spirit::x3;
    auto key   = lexeme [ +(char_ - ':') ];
    auto value = lexeme [ +graph ];
    auto kvp   = lexeme [key >> ':' >> value];
    return skip(space) [ *kvp ];
}

int main() {
    std::string const deliveryReceipt = "id:pgl01130529155035239084 sub:001 dlvrd:001 submit date:1305291550 done date:1305291550 stat:DELIVRD err:0";

    std::map<std::string, std::string> results;

    parse(begin(deliveryReceipt), end(deliveryReceipt), kvp_parser(), results);

    for (auto& [k,v]: results) {
        std::cout << "key:" << std::quoted(k) << "\tvalue:" << std::quoted(v) << std::endl;
    }
}

Prints

key:"dlvrd" value:"001"
key:"done date" value:"1305291550"
key:"err"   value:"0"
key:"id"    value:"pgl01130529155035239084"
key:"stat"  value:"DELIVRD"
key:"sub"   value:"001"
key:"submit date"   value:"1305291550"