Extract attributes from XML in NiFi
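I have these XML files, which I fetch from FTP (using the list and fetch FTP processors). I want to take the values out of each XML file and replace the file content with those values, as if it were a CSV (and then put the files back on FTP with the putFtp processor).
The desired output looks like this: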
{"foodate":"somedate","name":"fooid1_foovalue","value":5.44}
{"foodate":"somedate","name":"fooid1_metrics","value":some-metrics}
.
.
.
{"foodate":"somedate","name":"fooid2_foovalue","value":2.34}
.
.
.
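So for each id, write the foodate attribute first, then id1 with sample attribute 1, id1 with sample attribute 2, and so on.
But I never know in advance what the attribute names will be or how many attributes there will be. I only know that the first attribute of each Sample is foodate. Any idea how to proceed? I tried the executeScript processor with JavaScript, but it does not seem to recognize DOMParser() and the like.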
<?xml version="1.0" encoding="ISO-8859-1"?>
<Document Version="2">
<ExportData lowerBound="2021/11/24 16:58:26" upperBound="2021/11/24 22:58:26">
<Site name="name" f="">
<Kapta fooid1="some-number">
<Infos>
<Info>
<EndPoint foo="value-name" />
</Info>
</Infos>
<Samples ordering="desc">
<Sample foodate="some-date" foovalue="5.44" metrics="some-metrics" metrics2="metrics-again" value="numbers5" te="numbers" />
<Sample foodate="some-date" foovalue="7.45" foom="some-metrics" metrics453="metrics-again" otherattribut="numbers5" att345="numbers" morevalues="numbers" foohdeiurf="numbers" hello="numbers"/>
</Samples>
</Kapta>
<Kapta fooid2="some-number">
<Infos>
<Info>
<EndPoint foo="value-name" />
</Info>
</Infos>
<Samples ordering="desc">
<Sample foodate="some-date" foovalue="2.34" metrics="some-metrics" metrics2="metrics-again" value="numbers" te="numbersagain" />
<Sample foodate="some-date" foo="99.8" metrics="some-metrics" metrics2="metrics-again" value="numbers" te="numbers" />
<Sample foodate="some-date" attr="234.56" someothermetrics="some-metrics" metr="metrics-again" anothervalue="numbers" />
</Samples>
</Kapta>
</Site>
</ExportData>
</Document>
Thanks a lot for your time and effort!
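You can use the Groovy XML parser libraries. There are many options depending on your needs; check this.
Here is some experimental code that reads the XML from the content of the incoming flow file and writes a few extracted values out as a list of JSON objects, one per line. You can build on it to fit your requirements.
Note that this code may not be production grade. For more on Groovy in NiFi, see the ExecuteScript cookbook.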
import org.apache.nifi.processor.io.InputStreamCallback
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilder
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Document
import groovy.xml.dom.DOMCategory
import groovy.json.JsonGenerator

def flowFile = session.get()
// Nothing to do when no flow file is queued; this also keeps the catch block
// from transferring a null flow file.
if (!flowFile) return

try {
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance()
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder()
    Document doc = null
    // Parse the flow file content into a DOM document
    session.read(flowFile, { inputStream ->
        doc = dBuilder.parse(inputStream)
    } as InputStreamCallback)

    def root = doc.documentElement
    def sb = new StringBuilder()
    def jsonGenerator = new JsonGenerator.Options().disableUnicodeEscaping().build()

    // get a specific attribute: "foo" of every child of Info (here, EndPoint)
    use(DOMCategory) {
        root['ExportData']['Site']['Kapta']['Infos']['Info']['*'].each { node ->
            def data = new LinkedHashMap()
            data.NodeName = node.name()
            // value of the "foo" attribute, written under the key "foodate"
            data.foodate = node['@foo']
            sb.append(jsonGenerator.toJson(data))
            sb.append('\n')
        }
    }

    // get all attributes of Sample under Samples, one JSON line per attribute
    use(DOMCategory) {
        root['ExportData']['Site']['Kapta']['Samples']['*'].each { node ->
            def data = new LinkedHashMap()
            data.NodeName = node.name()
            def attributesMap = node.attributes()
            for (int x = 0; x < attributesMap.getLength(); x++) {
                data.AttrName = attributesMap.item(x).getNodeName()
                data.AttrValue = attributesMap.item(x).getNodeValue()
                sb.append(jsonGenerator.toJson(data))
                sb.append('\n')
            }
        }
    }

    // Replace the flow file content with the generated JSON lines
    flowFile = session.write(flowFile, { inputStream, outputStream ->
        outputStream.write(sb.toString().getBytes(StandardCharsets.UTF_8))
    } as StreamCallback)
    session.transfer(flowFile, REL_SUCCESS)
} catch (Exception e) {
    log.error('Failed to extract XML attributes', e)
    session.transfer(flowFile, REL_FAILURE)
}
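The script above dumps attribute names and values, but not in the shape asked for in the question. As a rough sketch of how that shape might be produced, the following standalone Groovy function emits one JSON line per Sample attribute other than foodate. The function name samplesToJsonLines is hypothetical, and it rests on two assumptions that should be checked against real files: each Kapta element carries exactly one attribute whose name is the id, and values that look numeric should be written as JSON numbers. The DOM API does not promise document order for attributes, so foodate is looked up by name instead of by position.

```groovy
import groovy.json.JsonOutput
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element
import org.xml.sax.InputSource

// Hypothetical helper (not part of NiFi): XML text in, JSON lines out.
String samplesToJsonLines(String xml) {
    def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    def doc = builder.parse(new InputSource(new StringReader(xml)))
    def lines = []
    def kaptas = doc.getElementsByTagName('Kapta')
    for (int k = 0; k < kaptas.getLength(); k++) {
        Element kapta = (Element) kaptas.item(k)
        // Assumption: the single attribute name of Kapta (fooid1, fooid2, ...) is the id
        if (kapta.getAttributes().getLength() == 0) continue
        String id = kapta.getAttributes().item(0).getNodeName()
        def samples = kapta.getElementsByTagName('Sample')
        for (int s = 0; s < samples.getLength(); s++) {
            Element sample = (Element) samples.item(s)
            String date = sample.getAttribute('foodate')
            def attrs = sample.getAttributes()
            for (int a = 0; a < attrs.getLength(); a++) {
                def attr = attrs.item(a)
                // foodate is repeated on every line, so it is not emitted as a value
                if (attr.getNodeName() == 'foodate') continue
                String raw = attr.getNodeValue()
                // Assumption: numeric-looking values become JSON numbers
                def value = raw.isBigDecimal() ? raw.toBigDecimal() : raw
                lines << JsonOutput.toJson([
                    foodate: date,
                    name   : id + '_' + attr.getNodeName(),
                    value  : value
                ])
            }
        }
    }
    return lines.join('\n')
}
```

Inside ExecuteScript, the flow file content could be read into a string in the session.read callback, passed to this function, and the result written back in the session.write callback, just as the script above does with its StringBuilder.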