NiFi MergeRecord & MergeContent unable to merge Avro flow files having different schemas

I am using the following NiFi flow: ListFile >> FetchFile >> SplitJson >> UpdateAttribute >> FlattenJson >> InferAvroSchema >> ConvertRecord >> MergeRecord >> PutParquet.

JSON input:

[{
    "Id": 1235,
    "Username": "fred1235",
    "Name": "Fred",
    "ShippingAddress": {
        "Address1": "456 Main St.",
        "Address2": "",
        "City": "Durham",
        "State": "NC"
    }
}, {
    "Id": 1236,
    "Username": "larry1234",
    "Name": "Larry",
    "ShippingAddress": {
        "Address1": "789 Main St.",
        "Address2": "",
        "City": "Durham",
        "State": "NC",
        "PostalCode": 277453
    },
    "Orders": [{
        "ItemId": 1111,
        "OrderDate": "11/11/2012"
    }, {
        "ItemId": 2222,
        "OrderDate": "12/12/2012"
    }]
}]

The MergeRecord processor does not include the "Orders" array in the schema of the merged file. The MergeContent processor has the same problem.
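The root cause is that the two records do not share the same set of fields: only the second one has "PostalCode" and "Orders". After SplitJson each record is its own FlowFile, so any schema inferred per FlowFile differs between them. The sketch below makes that visible. The flatten() helper is hypothetical and only approximates FlattenJson (it joins nested keys with "." and writes array indexes as "[i]"); the real processor's separator and array handling depend on its configuration.

```python
import json

# The two records from the JSON input above.
RECORDS = json.loads("""
[{"Id": 1235, "Username": "fred1235", "Name": "Fred",
  "ShippingAddress": {"Address1": "456 Main St.", "Address2": "",
                      "City": "Durham", "State": "NC"}},
 {"Id": 1236, "Username": "larry1234", "Name": "Larry",
  "ShippingAddress": {"Address1": "789 Main St.", "Address2": "",
                      "City": "Durham", "State": "NC", "PostalCode": 277453},
  "Orders": [{"ItemId": 1111, "OrderDate": "11/11/2012"},
             {"ItemId": 2222, "OrderDate": "12/12/2012"}]}]
""")


def flatten(value, prefix=""):
    """Hypothetical flattener: nested keys joined with '.', arrays as '[i]'."""
    out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            out.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            out.update(flatten(child, f"{prefix}[{i}]"))
    else:
        out[prefix] = value
    return out


# After SplitJson, schema inference only sees one record's fields at a time.
field_sets = [frozenset(flatten(r)) for r in RECORDS]

for fields in field_sets:
    print(sorted(fields))

# Different field sets mean different inferred schemas.
print("same fields:", field_sets[0] == field_sets[1])
print("only in record 2:", sorted(field_sets[1] - field_sets[0]))
```

A merge step that writes a single schema cannot represent both records unless that schema is a superset of their fields.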

Instead of using SplitJson and FlattenJson, you can use JoltTransformJSON with the following Chainr spec to flatten the whole thing without splitting it:

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "ShippingAddress": {
          "Address1": "[&2].ShippingAddress_Address1",
          "Address2": "[&2].ShippingAddress_Address2",
          "City": "[&2].ShippingAddress_City",
          "State": "[&2].ShippingAddress_State",
          "PostalCode": "[&2].ShippingAddress_PostalCode"
        },
        "Orders": {
          "*": {
            "ItemId": "[&3].Orders_&1_ItemId",
            "OrderDate": "[&3].Orders_&1_OrderDate"
          }
        },
        "*": "[&1].&"
      }
    }
  }
]
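To show roughly what this shift produces, here is a small Python emulation of the same idea. It is not a Jolt engine, and flatten_like_spec() is a hypothetical helper: nested ShippingAddress keys become ShippingAddress_<key>, each Orders element i becomes Orders_<i>_<key>, and every other top-level key is copied as-is. Unlike the spec, it maps nested keys generically instead of listing them by name, so it keeps every key it finds.

```python
import json

# The two records from the JSON input in the question.
RECORDS = [
    {"Id": 1235, "Username": "fred1235", "Name": "Fred",
     "ShippingAddress": {"Address1": "456 Main St.", "Address2": "",
                         "City": "Durham", "State": "NC"}},
    {"Id": 1236, "Username": "larry1234", "Name": "Larry",
     "ShippingAddress": {"Address1": "789 Main St.", "Address2": "",
                         "City": "Durham", "State": "NC", "PostalCode": 277453},
     "Orders": [{"ItemId": 1111, "OrderDate": "11/11/2012"},
                {"ItemId": 2222, "OrderDate": "12/12/2012"}]},
]


def flatten_like_spec(record):
    """Rough emulation of the shift spec's flattening (not Jolt itself)."""
    out = {}
    for key, value in record.items():
        if key == "ShippingAddress":
            # "[&2].ShippingAddress_<key>" in the spec
            for sub_key, sub_value in value.items():
                out[f"ShippingAddress_{sub_key}"] = sub_value
        elif key == "Orders":
            # "[&3].Orders_&1_<key>": &1 is the index inside Orders
            for i, order in enumerate(value):
                for sub_key, sub_value in order.items():
                    out[f"Orders_{i}_{sub_key}"] = sub_value
        else:
            # "*": "[&1].&" copies the remaining top-level keys unchanged
            out[key] = value
    return out


# The result is still one JSON array, so it stays in a single FlowFile.
flat = [flatten_like_spec(r) for r in RECORDS]
print(json.dumps(flat, indent=2))
```

Note that the first record still has no Orders_* keys; the records are flattened, but their field sets still differ.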

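I'm not sure what ConvertRecord is for, but you should no longer need MergeRecord. If this isn't the output you're looking for, please tell me what you expect (for both records, the one with and the one without the Orders field) and I'll be happy to help.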
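Because all records now stay in one FlowFile, a single schema has to cover both of them, including the fields only one record has. In Avro an optional field is commonly declared as a union with "null" listed first and "default": null. The sketch below builds such a superset schema by hand; superset_schema() and the AVRO_TYPES mapping are hypothetical, and this is not what InferAvroSchema or ConvertRecord actually generate. The input records are abbreviated versions of the flattened output.

```python
import json

# Abbreviated flattened records (the first has no Orders_* fields).
FLAT = [
    {"Id": 1235, "Name": "Fred", "ShippingAddress_City": "Durham"},
    {"Id": 1236, "Name": "Larry", "ShippingAddress_City": "Durham",
     "Orders_0_ItemId": 1111, "Orders_0_OrderDate": "11/11/2012"},
]

# Hypothetical mapping from Python value types to Avro primitive types.
AVRO_TYPES = {bool: "boolean", int: "long", float: "double", str: "string"}


def superset_schema(records, name="FlatRecord"):
    """Build one Avro record schema covering every field seen in `records`.

    Fields missing from at least one record become ["null", <type>] unions
    with a null default, so records without them still fit the schema.
    """
    field_types = {}
    for record in records:
        for key, value in record.items():
            field_types.setdefault(key, AVRO_TYPES[type(value)])
    fields = []
    for key, avro_type in field_types.items():
        if all(key in r for r in records):
            fields.append({"name": key, "type": avro_type})
        else:
            fields.append({"name": key, "type": ["null", avro_type],
                           "default": None})
    return {"type": "record", "name": name, "fields": fields}


print(json.dumps(superset_schema(FLAT), indent=2))
```

Putting "null" first in each union matters because an Avro union's default value must match the union's first branch.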