Extract attributes from XML in NiFi

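I have these XML files, which I fetch from FTP (using the ListFTP and FetchFTP processors). I want to extract the values from each XML file and replace the file content with those values, as if it were a CSV, and then put the files back on the FTP server with the PutFTP processor.

The desired output looks like this: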

{"foodate":"somedate","name":"fooid1_foovalue","value":5.44}
{"foodate":"somedate","name":"fooid1_metrics","value":some-metrics}
.
.
.
{"foodate":"somedate","name":"fooid2_foovalue","value":2.34}
.
.
.

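So for each id, the foodate attribute is written first, followed by id1 with Sample attribute 1, id1 with Sample attribute 2, and so on.

However, I never know in advance what the attribute names will be or how many attributes there will be. I only know that the first Sample attribute will be foodate. Any idea how to proceed? I tried the ExecuteScript processor with JavaScript, but it does not seem to recognize DOMParser() and similar browser APIs.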

<?xml version="1.0" encoding="ISO-8859-1"?>
<Document Version="2">
    <ExportData lowerBound="2021/11/24 16:58:26" upperBound="2021/11/24 22:58:26">
        <Site name="name" f="">
            <Kapta fooid1="some-number">
                <Infos>
                    <Info>
                        <EndPoint foo="value-name" />
                    </Info>
                </Infos>
                <Samples ordering="desc">
                    <Sample foodate="some-date" foovalue="5.44" metrics="some-metrics" metrics2="metrics-again" value="numbers5" te="numbers" />
                    <Sample foodate="some-date" foovalue="7.45" foom="some-metrics" metrics453="metrics-again" otherattribut="numbers5" att345="numbers" morevalues="numbers" foohdeiurf="numbers" hello="numbers"/>
                </Samples>
            </Kapta>
            <Kapta fooid2="some-number">
                <Infos>
                    <Info>
                        <EndPoint foo="value-name" />
                    </Info>
                </Infos>
                <Samples ordering="desc">
                    <Sample foodate="some-date" foovalue="2.34" metrics="some-metrics" metrics2="metrics-again" value="numbers" te="numbersagain" />
                    <Sample foodate="some-date" foo="99.8" metrics="some-metrics" metrics2="metrics-again" value="numbers" te="numbers" />
                    <Sample foodate="some-date" attr="234.56" someothermetrics="some-metrics" metr="metrics-again" anothervalue="numbers" />
                </Samples>
            </Kapta>
        </Site>
    </ExportData>
</Document>

Thanks a lot for your time and effort!

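You can use the Groovy XML parser libraries. There are several options depending on your needs; check this

Here is some experimental code that reads the XML from the content of the incoming flowfile and writes some of the extracted values back as a list of JSON lines. You can develop it further to match your requirements.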

Note that this code may not be production-grade. For more information on Groovy in NiFi, see the ExecuteScript cookbook.

import org.apache.nifi.processor.io.InputStreamCallback
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilder
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Document
import groovy.xml.dom.DOMCategory
import groovy.json.JsonGenerator

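def flowFile = session.get()
// session.get() returns null when nothing is queued; stop before touching the flowfile
if (!flowFile) return

try {
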
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = null

    session.read(flowFile, {inputStream ->
        doc =  dBuilder.parse(inputStream)
    } as InputStreamCallback)
    
    def root = doc.documentElement
    def sb = new StringBuilder()
    def jsonGenerator = new JsonGenerator.Options().disableUnicodeEscaping().build()
    
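    // Read one known attribute (foo) from every child of <Info>, i.e. the <EndPoint> elements
    use(DOMCategory) {
        // each, not findAll: the closure is run for its side effect, not as a filter
        root['ExportData']['Site']['Kapta']['Infos']['Info']['*'].each { node ->
            def data = new LinkedHashMap()
            data.NodeName = node.name()
            data.foo = node['@foo']
            sb.append(jsonGenerator.toJson(data))
            sb.append('\n')
        }
    }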
    
    // Write one JSON line per attribute of every <Sample> under <Samples>,
    // without knowing the attribute names or their count in advance
    use(DOMCategory) {
        root['ExportData']['Site']['Kapta']['Samples']['*'].each { node ->
            def data = new LinkedHashMap()
            data.NodeName = node.name()
            def attributesMap = node.attributes()
            for (int x = 0; x < attributesMap.getLength(); x++) {
                data.AttrName = attributesMap.item(x).getNodeName()
                data.AttrValue = attributesMap.item(x).getNodeValue()
                sb.append(jsonGenerator.toJson(data))
                sb.append('\n')
            }
        }
    }
    
    flowFile = session.write(flowFile, {inputStream, outputStream ->
        outputStream.write(sb.toString().getBytes(StandardCharsets.UTF_8))
    } as StreamCallback)
    
    session.transfer(flowFile, REL_SUCCESS)
    
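} catch (Exception e) {
    // Log a meaningful message and route the flowfile to failure
    log.error('Failed to extract attributes from the XML content', e)
    session.transfer(flowFile, REL_FAILURE)
}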
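
The script above emits generic NodeName/AttrName/AttrValue lines. To get closer to the output format asked for in the question, a minimal sketch could look like the following. It is not tested inside NiFi and rests on a few assumptions: the helper name samplesToJsonLines is hypothetical, the id prefix is taken to be the name of the first attribute on each Kapta element (as in the sample, where it is fooid1), and foodate is looked up by name rather than by position, because the DOM API does not guarantee the order of attributes. Values that parse as numbers are written as JSON numbers; everything else is written as a string.

```groovy
import groovy.json.JsonOutput
import javax.xml.parsers.DocumentBuilderFactory

// Hypothetical helper: one JSON line per <Sample> attribute, excluding foodate itself.
List<String> samplesToJsonLines(InputStream xml) {
    def doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml)
    def lines = []
    def kaptas = doc.getElementsByTagName('Kapta')
    for (int k = 0; k < kaptas.getLength(); k++) {
        def kapta = kaptas.item(k)
        if (kapta.getAttributes().getLength() == 0) continue
        // Assumption: the prefix is the NAME of the first Kapta attribute (e.g. fooid1).
        // Use getNodeValue() here instead if the attribute value is the real id.
        def id = kapta.getAttributes().item(0).getNodeName()
        def samples = kapta.getElementsByTagName('Sample')
        for (int s = 0; s < samples.getLength(); s++) {
            def sample = samples.item(s)
            // Look foodate up by name; attribute order is not guaranteed by the DOM
            def foodate = sample.getAttribute('foodate')
            def attrs = sample.getAttributes()
            for (int a = 0; a < attrs.getLength(); a++) {
                def attr = attrs.item(a)
                if (attr.getNodeName() == 'foodate') continue
                def raw = attr.getNodeValue()
                // Numeric-looking values become JSON numbers, the rest stay strings
                def value = raw.isBigDecimal() ? raw.toBigDecimal() : raw
                lines << JsonOutput.toJson([
                    foodate: foodate,
                    name   : id + '_' + attr.getNodeName(),
                    value  : value
                ])
            }
        }
    }
    return lines
}

// Stand-alone usage with a trimmed copy of the sample document
def sampleXml = '''<Document Version="2">
  <ExportData>
    <Site name="name">
      <Kapta fooid1="some-number">
        <Samples ordering="desc">
          <Sample foodate="some-date" foovalue="5.44" metrics="some-metrics"/>
        </Samples>
      </Kapta>
    </Site>
  </ExportData>
</Document>'''
samplesToJsonLines(new ByteArrayInputStream(sampleXml.getBytes('UTF-8'))).each { println it }
```

Inside ExecuteScript, the same helper could be called from the session.write callback shown above, writing samplesToJsonLines(inputStream).join('\n') to the output stream instead of the StringBuilder content.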