Cannot transfer a 30 GB SQL table from a client SQL Server machine to my Azure Data Lake Gen2 as a 530 MB Parquet file

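I cannot copy a 30 GB SQL Server table from the client machine to my Azure Data Lake Gen2 storage account as a 530 MB Parquet file using Azure Data Factory (ADF). The compression type is gzip, and the throughput is 11.8 MB/s.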

Copy details:

The error message from the failed ADF copy is:

{ "errorCode": "2200", "message": "Failure happened on 'Sink' side. ErrorCode=UserErrorFailedBlobFSOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=BlobFS operation failed for: A task was canceled.. Account: 'datalake'. FileSystem: 'container-dl'. Path: 'ImportLayer/F61ILBarAcct_Txns.parquet'.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Threading.Tasks.TaskCanceledException,Message=A task was canceled.,Source=mscorlib,'", "failureType": "UserError", "target": "Copy Latest Source Data" }

The integration runtime log on the client machine shows the same error:

DEBUG:
TraceComponentId: TransferClientLibrary
TraceMessageId: BlobFSOperationRetry
@logId: Warning
jobId: c063e070-cc12-4cae-895f-f8ada2bfa3ff
activityId: ecfa652d-8471-4297-be2a-4ecc0ebc89c5
eventId: BlobFSOperationRetry
message: 'Type=System.Threading.Tasks.TaskCanceledException,Message=A task was canceled.,Source=mscorlib,StackTrace=   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Storage.Data.AzureDfsClient.<UpdatePathWithHttpMessagesAsync>d__41.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Storage.Data.AzureDfsClientExtensions.<UpdatePathAsync>d__24.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Storage.Data.AzureDfsClientExtensions.UpdatePath(IAzureDfsClient operations, String action, String filesystem, String path, Nullable`1 position, Nullable`1 retainUncommittedData, String contentLength, String xMsLeaseAction, String xMsLeaseId, String xMsCacheControl, String xMsContentType, String xMsContentDisposition, String xMsContentEncoding, String xMsContentLanguage, String xMsProperties, String ifMatch, String ifNoneMatch, String ifModifiedSince, String ifUnmodifiedSince, Stream requestBody, String xMsClientRequestId, Nullable`1 timeout, String xMsDate)
   at Microsoft.Azure.Storage.Data.BlobFSClient.<>c__DisplayClass37_0.<AppendFile>b__1()
   at Microsoft.Rest.TransientFaultHandling.RetryPolicy.<>c__DisplayClass16_0.<ExecuteAction>b__0()
   at Microsoft.Rest.TransientFaultHandling.RetryPolicy.ExecuteAction[TResult](Func`1 func),'

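The client machine has an Intel Xeon E7-2830 CPU @ 2.13 GHz and a 64-bit OS. It has 16.0 GB of RAM and a 40 GB hard drive with 10 GB of free space. I increased the maximum virtual memory to 10 GB so that it can use the free space. In the Java options, I set -Xmx (the maximum Java heap size) to 26 GB to take advantage of it. I can only use the client machine, which is where the Integration Runtime is installed.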

What could be the problem?

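I managed to solve it by using the snappy compression type instead of gzip. Snappy takes less processing power to compress than gzip, which matters on a machine this constrained.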
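For anyone wanting to see where that setting lives, here is a minimal sketch in Python that builds a Parquet sink dataset definition with `compressionCodec` set to `snappy`. The file system and path are taken from the error message above; the dataset name `SinkParquetDataset` and the linked service name `AdlsGen2LinkedService` are made up for illustration, and my real dataset may differ in other properties.

```python
import json


def parquet_sink_dataset(name, linked_service, file_system, folder_path,
                         file_name, codec="snappy"):
    """Build an ADF-style Parquet dataset definition for ADLS Gen2 as a dict."""
    return {
        "name": name,
        "properties": {
            "type": "Parquet",
            "linkedServiceName": {
                "referenceName": linked_service,  # hypothetical name
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobFSLocation",  # ADLS Gen2 location type
                    "fileSystem": file_system,
                    "folderPath": folder_path,
                    "fileName": file_name,
                },
                # The change that fixed the copy for me: snappy instead of gzip.
                "compressionCodec": codec,
            },
        },
    }


dataset = parquet_sink_dataset(
    name="SinkParquetDataset",               # hypothetical
    linked_service="AdlsGen2LinkedService",  # hypothetical
    file_system="container-dl",
    folder_path="ImportLayer",
    file_name="F61ILBarAcct_Txns.parquet",
)
print(json.dumps(dataset, indent=2))
```

The printed JSON can be compared against the dataset's code view in the ADF authoring UI. Passing `codec="gzip"` reproduces the setting I started with.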

Also, I now run the copies one at a time instead of several at once. It is slower but safer.
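One way to force copies to run one after another is to chain the Copy activities with `dependsOn`, so each waits for the previous one to succeed. The sketch below generates such activity stubs in Python. It is only an illustration of the idea, not necessarily how my pipeline is set up; the table names are hypothetical and the source/sink settings are left out.

```python
import json


def sequential_copy_activities(table_names):
    """Return Copy activity stubs where each one depends on the previous one."""
    activities = []
    previous = None
    for table in table_names:
        name = f"Copy {table}"
        activity = {
            "name": name,
            "type": "Copy",
            "dependsOn": [],
            # Source and sink typeProperties are omitted in this sketch.
        }
        if previous is not None:
            # Start only after the previous copy has succeeded.
            activity["dependsOn"].append(
                {"activity": previous, "dependencyConditions": ["Succeeded"]}
            )
        activities.append(activity)
        previous = name
    return activities


# Hypothetical table names, for illustration only.
activities = sequential_copy_activities(["TableA", "TableB", "TableC"])
print(json.dumps(activities, indent=2))
```

The first activity has no dependency, and every later one lists exactly the activity before it, so only one copy uses the machine's CPU and memory at any time.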