How to count the number of times a value exists in a line?

Question

My data looks like this:

[123:1000,156,132,123,156,123]
[123:1009,392,132,123,156,123]
[234:987,789,132,123,156,123]
[234:8765,789,132,123,156,123]

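I need to count the number of times "123" exists in each line, using Expression Language in NiFi.

I need to do this with Expression Language only. How can I count it?

Any help is appreciated.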

Answer 1

According to the documentation, a count looks like this:

${allMatchingAttributes(".*"):contains("123"):count()}

Answer 2

You should use a SplitContent processor to split the flowfile content into individual flowfiles per line, then use ExtractText with a regular expression such as pattern=(123)?, which will result in an attribute being added to the flowfile for each matching group:

[123:1009,392,132,123,156,123] -> pattern.1, pattern.2, pattern.3
[234:987,789,132,123,156,123] -> pattern.1, pattern.2

Finally, you can use a ScanAttribute processor to detect the attribute with the highest group count in each of the flowfiles and route it to an UpdateAttribute to put that value into a common flowfile attribute (i.e. count). You could also replace some steps with an ExecuteStreamCommand and use various OS-level tools (grep/awk/sed/cut/etc.) to perform the count, return that value, and update the flowfile content.
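As a rough illustration of the ExecuteStreamCommand route, here is a minimal Python sketch of a line-oriented counter that could stand in for a grep/awk pipeline. It assumes the flowfile content arrives as a text stream and that the counts should be written back as text, one per line. The names `write_counts` and `PATTERN` are made up for this example, and the digit-boundary regex is my own choice so that a value like 1234 is not counted.

```python
import io
import re

# Hypothetical pattern: "123" not preceded or followed by another digit,
# so that values such as 1234 or 9123 are not counted.
PATTERN = re.compile(r"(?<!\d)123(?!\d)")


def write_counts(src, dst, pattern=PATTERN):
    """Read lines from src and write one match count per line to dst."""
    for line in src:
        dst.write(str(len(pattern.findall(line))) + "\n")


# Two sample rows from the question, fed through in-memory streams.
sample = io.StringIO(
    "[123:1000,156,132,123,156,123]\n"
    "[234:987,789,132,123,156,123]\n"
)
out = io.StringIO()
write_counts(sample, out)
print(out.getvalue(), end="")  # prints 3, then 2
```

To run this as an external command, the in-memory streams would be swapped for `sys.stdin` and `sys.stdout`.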

It may be simpler to perform this count within an ExecuteScript processor, as it could be done in 1-2 lines of Groovy, Ruby, Python, or Javascript, and would not require multiple processors. Apache NiFi is designed for data routing and simple transformation, not complex event processing, so there are no standard processors developed for these tasks. There is an open Jira for "Add processor to perform simple aggregations", which has a patch available and may be useful to you.
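To give an idea of how small the scripted count is, here is a sketch of the core logic in plain Python. It only shows the counting step. Reading the flowfile and writing the result back through the ExecuteScript API is left out, and the function name `count_value` is made up for this example. It also assumes each row is a bracketed list of fields separated by ":" or ",", as in the sample data.

```python
import re


def count_value(line, value="123"):
    """Count fields in a bracketed row that are exactly equal to value."""
    # Drop surrounding whitespace and brackets, then split on ':' and ','.
    fields = re.split(r"[:,]", line.strip().strip("[]"))
    return sum(1 for field in fields if field == value)


rows = [
    "[123:1000,156,132,123,156,123]",
    "[123:1009,392,132,123,156,123]",
    "[234:987,789,132,123,156,123]",
    "[234:8765,789,132,123,156,123]",
]
counts = [count_value(row) for row in rows]
print(counts)  # [3, 3, 2, 2]
```

Comparing whole fields rather than searching for a substring means a field such as 1234 does not add to the count for 123.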


regex

apache-nifi