Passing source file name to destination in an ADFv1 pipeline

Scenario

I am developing an ETL process with Azure Data Factory v1 (unfortunately, I cannot use Azure Data Factory v2).

I want to read all .csv files from a given blob storage container and write the contents of each file to a table in SQL Azure.

Problem

The destination table contains all of the columns from the csv files. It must also contain a new column holding the name of the file the data came from.

This is where I am stuck: I cannot find a way to pass the file name from the source dataset (the .csv file from the blob storage source) to the destination dataset (the SQL Azure sink).

What I have already tried

I have implemented a pipeline that reads a file from blob storage and saves it to a table in SQL Azure.

Here is an excerpt of the JSON that copies a single file to SQL Azure:

{
    "name": "pipelineFileImport",
    "properties": {
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource",
                        "recursive": false
                    },
                    "sink": {
                        "type": "SqlSink",
                        "writeBatchSize": 0,
                        "writeBatchTimeout": "00:00:00"
                    },
                    "translator": {
                        "type": "TabularTranslator",
                        "columnMappings": "TypeOfRecord:TypeOfRecord,TPMType:TPMType,..."
                    }
                },
                "inputs": [
                    {
                        "name": "InputDataset-cn0"
                    }
                ],
                "outputs": [
                    {
                        "name": "OutputDataset-cn0"
                    }
                ],
                "policy": {
                    "timeout": "1.00:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "style": "StartOfInterval",
                    "retry": 3,
                    "longRetry": 0,
                    "longRetryInterval": "00:00:00"
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                },
                "name": "Activity-0-pipelineFileImport_csv->[staging]_[Files]"
            }
        ],
        "start": "2018-07-20T09:50:55.486Z",
        "end": "2018-07-20T09:50:55.486Z",
        "isPaused": false,
        "hubName": "test_hub",
        "pipelineMode": "OneTime",
        "expirationTime": "3.00:00:00",
        "datasets": [
            {
                "name": "InputDataset-cn0",
                "properties": {
                    "structure": [
                        {
                            "name": "TypeOfRecord",
                            "type": "String"
                        },
                        {
                            "name": "TPMType",
                            "type": "String"
                        },
                        ...
                    ],
                    "published": false,
                    "type": "AzureBlob",
                    "linkedServiceName": "Source-TestBlobStorage",
                    "typeProperties": {
                        "fileName": "testFile001.csv",
                        "folderPath": "fileinput",
                        "format": {
                            "type": "TextFormat",
                            "columnDelimiter": ";",
                            "firstRowAsHeader": true
                        }
                    },
                    "availability": {
                        "frequency": "Day",
                        "interval": 1
                    },
                    "external": true,
                    "policy": {}
                }
            },
            {
                "name": "OutputDataset-cn0",
                "properties": {
                    "structure": [
                        {
                            "name": "TypeOfRecord",
                            "type": "String"
                        },
                        {
                            "name": "TPMType",
                            "type": "String"
                        },...
                    ],
                    "published": false,
                    "type": "AzureSqlTable",
                    "linkedServiceName": "Destination-SQLAzure-cn0",
                    "typeProperties": {
                        "tableName": "[staging].[Files]"
                    },
                    "availability": {
                        "frequency": "Day",
                        "interval": 1
                    },
                    "external": false,
                    "policy": {}
                }
            }
        ]
    }
}

What I need

I need a way to pass the name of the source file to the destination dataset, so that it can be written to the SQL Azure database.

There is no native way to handle this. However, I think you could achieve it with a stored procedure.

See the stored procedure properties of the sink in the documentation: https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-azure-sql-connector#copy-activity-properties
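As a sketch of that approach: the copy activity's sink can invoke a stored procedure instead of writing to the table directly, and the file name can be passed as a stored procedure parameter. Note that ADFv1 exposes no runtime variable for the source file name, so the value has to be supplied per activity (matching the `fileName` already hard-coded in the input dataset). The names `spImportFile`, `FileRows`, and `fileName` below are hypothetical, not part of the original pipeline:

```json
"sink": {
    "type": "SqlSink",
    "sqlWriterStoredProcedureName": "spImportFile",
    "sqlWriterTableType": "FileRows",
    "storedProcedureParameters": {
        "fileName": {
            "value": "testFile001.csv",
            "type": "String"
        }
    }
}
```

The stored procedure would then receive the csv rows as a table-valued parameter of type `FileRows` plus the `fileName` parameter, and insert both into `[staging].[Files]`, filling the extra file-name column.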