Split a string on commas while ignoring commas inside double quotes, using regex

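I am trying to split a string with a regular expression. I need the regex to break the string into groups in NiFi. Can anyone help me split the string below with a regex?

I have a string like this: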

"abc","-9223371901096288826","/home/test/20170614","abc.com","Hello,Test","7462200","4622012","1296614","1029293","893529","a:ce:o:5:l:p:MMM dd HH:mm:ss","Logs","UTF8","<111>Jun 14 12:43:20 logs: Info: 1497462198.717 13073 1.22.333.44 TCP/200 168 TCP_CONNECT 1.22.33.44:443 ""GO\ABC.COM"" DIRECT/img.abc.com - test_abc_7-DefaultGroup-DefaultGroup-NONE-NONE-NONE-DefaultGroup <IW_adv,3.9,-,""-"",-,-,-,-,""-"",-,-,-,""-"",-,-,""-"",""-"",-,-,IW_adv,-,""-"",""-"",""Unknown"",""Unknown"",""-"",""-"",0.10,0,-,""-"",""-"",-,""-"",-,-,""-"",""-"",-,-,""-""> - -"


I want to split on commas, but commas inside the quoted values must be ignored. This is the result I want:

    group 1 - abc
    group 2 - -9223371901096288826
    group 3 - /home/test/20170614
    group 4 - abc.com
    group 5 - Hello,Test
    group 6 - 7462200
    group 7 - 4622012
    group 8 - 1296614
    group 9 - 1029293
    group 10 - 893529
    group 11 - a:ce:o:5:l:p:MMM dd HH:mm:ss
    group 12 - Logs
    group 13 - UTF8
    group 14 - <111>Jun 14 12:43:20 logs: Info: 1497462198.717 13073 1.22.333.44 TCP/200 168 TCP_CONNECT 1.22.33.44:443 ""GO\ABC.COM"" DIRECT/img.abc.com - test_abc_7-DefaultGroup-DefaultGroup-NONE-NONE-NONE-DefaultGroup <IW_adv,3.9,-,""-"",-,-,-,-,""-"",-,-,-,""-"",-,-,""-"",""-"",-,-,IW_adv,-,""-"",""-"",""Unknown"",""Unknown"",""-"",""-"",0.10,0,-,""-"",""-"",-,""-"",-,-,""-"",""-"",-,-,""-""> - -


I have tried many regular expressions for the split but could not get the correct result.

One regex I found and tried is ,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)

That regex works very well with the split() function in Java, but I do not want to do this in Java.
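For reference, the Java behaviour described above can be reproduced with a minimal sketch like the one below. The class name `QuoteAwareSplit`, the helper `stripOuterQuotes`, and the shortened sample line are assumptions made for illustration; only the separator regex comes from the question. The lookahead accepts a comma only when an even number of double quotes follows it, which is why commas inside a quoted value are left alone.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplit {
    // A comma counts as a separator only when an even number of double
    // quotes follows it, i.e. when the comma is outside any quoted field.
    static final String SEPARATOR = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: removes one pair of enclosing quotes. Doubled
    // quotes inside the value are kept, as in the expected output above.
    static String stripOuterQuotes(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            return field.substring(1, field.length() - 1);
        }
        return field;
    }

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        for (String raw : line.split(SEPARATOR, -1)) {
            fields.add(stripOuterQuotes(raw));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real input line.
        String line = "\"abc\",\"Hello,Test\",\"7462200\",\"a \"\"b,c\"\" d\"";
        List<String> fields = split(line);
        for (int i = 0; i < fields.size(); i++) {
            System.out.println("group " + (i + 1) + " - " + fields.get(i));
        }
    }
}
```

Because the lookahead rescans the rest of the line for every comma, this approach is simple but can become slow on very long lines.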

I also tried the regex (?<=\")([^,]*)(?=\") to break the string into groups on commas, but it also splits inside the double quotes.
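One reason that attempt cannot work is that `[^,]*` can never span a comma, so a value such as Hello,Test is never captured whole. A possible alternative, sketched here under the assumption that the pattern is evaluated by a Java-compatible regex engine, is to match each quoted field instead of splitting on the separators. The pattern and the class name `QuotedFieldMatcher` are illustrative and not taken from the original post; the pattern treats a doubled quote `""` as part of the value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedFieldMatcher {
    // Opening quote, then runs of non-quote characters optionally joined
    // by doubled quotes (""), then the closing quote. Group 1 is the value.
    static final Pattern FIELD = Pattern.compile("\"([^\"]*(?:\"\"[^\"]*)*)\"");

    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            out.add(m.group(1)); // value without the enclosing quotes
        }
        return out;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real input line.
        String line = "\"abc\",\"Hello,Test\",\"x \"\"y,z\"\" w\"";
        for (String field : fields(line)) {
            System.out.println(field);
        }
    }
}
```

This sketch assumes every field is quoted, as in the sample string; unquoted fields would be skipped.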

Can anyone help me? Thanks in advance.

You can get what you need without capture groups, as follows.

Consider your string below.

1. Use an UpdateAttribute processor to store the whole string in an attribute named "InputString":
"abc","-9223371901096288826","/home/test/20170614","abc.com","Hello,Test","7462200","4622012","1296614","1029293","893529","a:ce:o:5:l:p:MMM dd HH:mm:ss","Logs","UTF8","<111>Jun 14 12:43:20 logs: Info: 1497462198.717 13073 1.22.333.44 TCP/200 168 TCP_CONNECT 1.22.33.44:443 ""GO\ABC.COM"" DIRECT/img.abc.com - test_abc_7-DefaultGroup-DefaultGroup-NONE-NONE-NONE-DefaultGroup <IW_adv,3.9,-,""-"",-,-,-,-,""-"",-,-,-,""-"",-,-,""-"",""-"",-,-,IW_adv,-,""-"",""-"",""Unknown"",""Unknown"",""-"",""-"",0.10,0,-,""-"",""-"",-,""-"",-,-,""-"",""-"",-,-,""-""> - -"

2. After that UpdateAttribute, use a second UpdateAttribute to extract the values as follows:

group1:${InputString:getDelimitedField(1)}
group2:${InputString:getDelimitedField(2)}
group3:${InputString:getDelimitedField(3)}
group4:${InputString:getDelimitedField(4)}
group5:${InputString:getDelimitedField(5)}
group6:${InputString:getDelimitedField(6)}
group7:${InputString:getDelimitedField(7)}
group8:${InputString:getDelimitedField(8)}
group9:${InputString:getDelimitedField(9)}
group10:${InputString:getDelimitedField(10)}
group11:${InputString:getDelimitedField(11)}
group12:${InputString:getDelimitedField(12)}
group13:${InputString:getDelimitedField(13)}

The getDelimitedField function is the simplest way to extract these values. See the reference below:

https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#getdelimitedfield

Let me know if you run into any problems with this.