NiFi MergeRecord & MergeContent unable to merge Avro flow files having different schemas
I am using a NiFi flow of ListFile >> FetchFile >> SplitJson >> UpdateAttribute >> FlattenJson >> InferAvroSchema >> ConvertRecord >> MergeRecord >> PutParquet.
JSON input:
[{
  "Id": 1235,
  "Username": "fred1235",
  "Name": "Fred",
  "ShippingAddress": {
    "Address1": "456 Main St.",
    "Address2": "",
    "City": "Durham",
    "State": "NC"
  }
},{
  "Id": 1236,
  "Username": "larry1234",
  "Name": "Larry",
  "ShippingAddress": {
    "Address1": "789 Main St.",
    "Address2": "",
    "City": "Durham",
    "State": "NC",
    "PostalCode": 277453
  },
  "Orders": [{
    "ItemId": 1111,
    "OrderDate": "11/11/2012"
  }, {
    "ItemId": 2222,
    "OrderDate": "12/12/2012"
  }]
}]
The MergeRecord processor does not include the "Orders" array in the schema of the merged file. The MergeContent processor has the same problem.
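To make the mismatch concrete, here is a small Python sketch, run outside NiFi, that lists the fields present in the second record but missing from the first. The helper `field_paths` is a hypothetical name introduced only for this illustration. It does not reproduce how InferAvroSchema or MergeRecord work internally; it only shows that the two records, once split, carry different sets of fields, which is presumably why their inferred schemas differ.

```python
# Two sample records shaped like the input above (values abbreviated where irrelevant).
records = [
    {
        "Id": 1235, "Username": "fred1235", "Name": "Fred",
        "ShippingAddress": {"Address1": "456 Main St.", "Address2": "",
                            "City": "Durham", "State": "NC"},
    },
    {
        "Id": 1236, "Username": "larry1234", "Name": "Larry",
        "ShippingAddress": {"Address1": "789 Main St.", "Address2": "",
                            "City": "Durham", "State": "NC", "PostalCode": 277453},
        "Orders": [{"ItemId": 1111, "OrderDate": "11/11/2012"},
                   {"ItemId": 2222, "OrderDate": "12/12/2012"}],
    },
]


def field_paths(value, prefix=""):
    """Return the set of leaf field paths in a parsed JSON value.

    Nested objects are joined with '.', and array elements are marked with '[]'.
    """
    if isinstance(value, dict):
        paths = set()
        for key, child in value.items():
            paths |= field_paths(child, f"{prefix}.{key}" if prefix else key)
        return paths
    if isinstance(value, list):
        paths = set()
        for child in value:
            paths |= field_paths(child, prefix + "[]")
        return paths
    return {prefix}


first, second = (field_paths(record) for record in records)
only_in_second = sorted(second - first)
print(only_in_second)
# ['Orders[].ItemId', 'Orders[].OrderDate', 'ShippingAddress.PostalCode']
```

The first record has no fields that the second lacks, so a schema inferred from the first record alone has no place for "Orders" or "PostalCode".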
Instead of using SplitJson and FlattenJson, you can use JoltTransformJSON with the following ChainR spec to flatten the whole thing without splitting:
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "ShippingAddress": {
          "Address1": "[&2].ShippingAddress_Address1",
          "Address2": "[&2].ShippingAddress_Address2",
          "City": "[&2].ShippingAddress_City",
          "State": "[&2].ShippingAddress_State"
        },
        "Orders": {
          "*": {
            "ItemId": "[&3].Orders_&1_ItemId",
            "OrderDate": "[&3].Orders_&1_OrderDate"
          }
        },
        "*": "[&1].&"
      }
    }
  }
]
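I'm not sure what ConvertRecord is being used for, but you shouldn't need MergeRecord anymore. If this isn't the output you're looking for, please let me know what you expect (for both records, the one with and the one without the Orders field) and I'll be happy to help.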
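For reference, the flattened shape this kind of transform produces can be previewed with a short Python sketch, independent of NiFi and Jolt. The function `flatten` is a hypothetical helper written for this illustration. It flattens generically, joining nested keys and array indexes with underscores. This differs from the shift spec, which lists the ShippingAddress keys explicitly and so, as far as I can tell, would drop "PostalCode". The sketch keeps it.

```python
# Same two records as in the question's JSON input.
records = [
    {
        "Id": 1235, "Username": "fred1235", "Name": "Fred",
        "ShippingAddress": {"Address1": "456 Main St.", "Address2": "",
                            "City": "Durham", "State": "NC"},
    },
    {
        "Id": 1236, "Username": "larry1234", "Name": "Larry",
        "ShippingAddress": {"Address1": "789 Main St.", "Address2": "",
                            "City": "Durham", "State": "NC", "PostalCode": 277453},
        "Orders": [{"ItemId": 1111, "OrderDate": "11/11/2012"},
                   {"ItemId": 2222, "OrderDate": "12/12/2012"}],
    },
]


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into a single-level dict.

    Nested keys become 'Parent_Child'; list elements become 'Parent_<index>_Child'.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}_{key}" if prefix else key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}_{index}" if prefix else str(index)))
    else:
        flat[prefix] = value
    return flat


# The input stays a single array, so no split and later merge is needed.
flat_records = [flatten(record) for record in records]
for record in flat_records:
    print(sorted(record))
```

The second record gains keys such as `Orders_0_ItemId` and `Orders_1_OrderDate`, while the first record has no `Orders_*` keys at all, so the two flattened records still have different field sets.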