Writing data to Azure Data Lake Store - PowerShell Scripting
I need to write data to Azure Data Lake Storage rather than my local D:\ drive. I am fetching ADF trigger information via PowerShell, and I want to load the data into a directory inside an Azure Data Lake container, not into blob storage.
ADF -> PowerShell -> Azure Data Lake
I want to load the data into the Azure Data Lake directory inside the container, organized as YYYY (folder) -> MM (folder) -> DD (folder) -> data file (.csv).
Here is the code that writes the data to my local machine; I need to convert it so the data is loaded into Data Lake Storage instead. To hide the username and password, I use a mechanism with a password file and an AES key file.
Any help and suggestions would be appreciated.
Code:
# 1- Connect to Azure Account
$username = "xyz@abc.com"
$password = Get-Content D:\Powershell\new\passwords\password.txt | ConvertTo-SecureString -Key (Get-Content D:\Powershell\new\passwords\aes.key)
$credential = New-Object System.Management.Automation.PsCredential($username,$password)
#Connect-AzureRmAccount -Credential $credential | out-null
Connect-AzAccount -Credential $credential | out-null
# 2 - Input Area
$subscriptionName = 'Data Analytics'
$resourceGroupName = 'DataLake-Gen2'
$dataFactoryName = 'dna-production-gen2'
# 3 - (All Triggers Information)
$ErrorActionPreference="SilentlyContinue"
Stop-Transcript | out-null
$ErrorActionPreference = "Continue"
Start-Transcript -path D:\Powershell\new\TriggerInfo.txt -append
Get-AzDataFactoryV2Trigger -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName
Stop-Transcript
# read the file as a single, multiline string using the -Raw switch
$triggers = Get-Content "D:\Powershell\new\TriggerInfo.txt" -Raw
# split the text in 'trigger' text blocks on the empty line
# loop through these blocks (skip any possible empty textblock)
$triggers = ($triggers -split '(\r?\n){2,}' | Where-Object { $_ -match '\S' }) | ForEach-Object {
    # and parse the data into Hashtables
    $today = Get-Date
    $yesterday = $today.AddDays(-1)
    $data = $_ -replace ':', '=' | ConvertFrom-StringData
    $splat = @{
        ResourceGroupName       = $data.ResourceGroupName
        DataFactoryName         = $data.DataFactoryName
        TriggerName             = $data.TriggerName
        TriggerRunStartedAfter  = $yesterday
        TriggerRunStartedBefore = $today
    }
    Get-AzDataFactoryV2TriggerRun @splat
} | Export-Csv -Path 'D:\Powershell\new\Output.csv' -Encoding UTF8 -NoTypeInformation
# 4 - To extract the final output from the Output File.
Import-Csv D:\Powershell\new\Output.csv -Delimiter "," |
Select-Object 'TriggerRunTimestamp', 'ResourceGroupName','DataFactoryName','TriggerName','TriggerRunId','TriggerType','Status' |
Export-Csv -Path 'D:\Powershell\new\Finalresult.csv' -Encoding UTF8 -NoTypeInformation -Force
Code attempting to upload a file from the local system:
$storageAccount = Get-AzStorageAccount -ResourceGroupName "DataLake-Gen2" -AccountName "dna2020gen2"
$ctx = $storageAccount.Context
$filesystemName = "dev"
$dirname = "triggers/"
New-AzDataLakeGen2Item -Context $ctx -FileSystem $filesystemName -Path $dirname -Directory
$localSrcFile = "D:\Powershell\new\passwords\password.txt"
$filesystemName = "dev"
$dirname = "triggers/"
$destPath = $dirname + (Get-Item $localSrcFile).Name
New-AzDataLakeGen2Item -Context $ctx -FileSystem $filesystemName -Path $destPath -Source $localSrcFile -Force
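Since the file upload above already works, one way to get the requested YYYY/MM/DD layout is a minimal sketch along these lines (assumptions: $ctx and $filesystemName are still in scope from above, and the generated Finalresult.csv exists; the nested path is an illustration, not a verified requirement of the service):

```powershell
# Sketch: reuse $ctx / $filesystemName from above (assumption) and build the
# YYYY/MM/DD destination path from the current date.
$now      = Get-Date
$destPath = "triggers/{0:yyyy}/{0:MM}/{0:dd}/Finalresult.csv" -f $now
# Uploading to a nested path should create the intermediate folders as part of
# the upload (assumption based on ADLS Gen2 path semantics).
New-AzDataLakeGen2Item -Context $ctx -FileSystem $filesystemName -Path $destPath -Source "D:\Powershell\new\Finalresult.csv" -Force
```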
I am able to upload the file, but I am not able to write the command output to the data lake.
Regarding the issue, please refer to the following script:
$username = "xyz@abc.com"
$password =ConvertTo-SecureString "" -AsPlainText -Force
$credential = New-Object System.Management.Automation.PsCredential($username,$password)
#Connect-AzureRmAccount -Credential $credential | out-null
Connect-AzAccount -Credential $credential
$dataFactoryName=""
$resourceGroupName=""
# get dataFactory triggers
$triggers=Get-AzDataFactoryV2Trigger -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName
$datas=@()
foreach ($trigger in $triggers) {
    # get the trigger run history
    $today = Get-Date
    $yesterday = $today.AddDays(-1)
    $splat = @{
        ResourceGroupName       = $trigger.ResourceGroupName
        DataFactoryName         = $trigger.DataFactoryName
        TriggerName             = $trigger.Name
        TriggerRunStartedAfter  = $yesterday
        TriggerRunStartedBefore = $today
    }
    $historys = Get-AzDataFactoryV2TriggerRun @splat
    if ($null -ne $historys) {
        # create data objects (note: the property names must not have trailing
        # spaces, or the CSV headers will not match)
        foreach ($history in $historys) {
            $obj = [PsCustomObject]@{
                TriggerRunTimestamp = $history.TriggerRunTimestamp
                ResourceGroupName   = $history.ResourceGroupName
                DataFactoryName     = $history.DataFactoryName
                TriggerName         = $history.TriggerName
                TriggerRunId        = $history.TriggerRunId
                TriggerType         = $history.TriggerType
                Status              = $history.Status
            }
            # add the object to the array
            $datas += $obj
        }
    }
}
# convert data to csv string
$contents =(($datas | ConvertTo-Csv -NoTypeInformation) -join [Environment]::NewLine)
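One caveat to hedge on here: the REST append/flush steps that follow take byte offsets, while $contents.Length counts characters. For ASCII-only CSV the two match; if the data may contain non-ASCII text, computing the UTF-8 byte count is safer:

```powershell
# Assumption: the CSV may contain non-ASCII characters. Use the UTF-8 byte
# count for the Content-Length header and the flush position, instead of the
# character count from $contents.Length.
$byteLength = [System.Text.Encoding]::UTF8.GetByteCount($contents)
```

Then $byteLength would replace $contents.Length in the append and flush calls.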
# upload to Azure Data Lake Store Gen2
#1. Create a sas token
$accountName="testadls05"
$fileSystemName="test"
$filePath="data.csv"
$account = Get-AzStorageAccount -ResourceGroupName andywin7 -Name $accountName
$sas= New-AzStorageAccountSASToken -Service Blob -ResourceType Service,Container,Object `
-Permission "racwdlup" -StartTime (Get-Date).AddMinutes(-10) `
-ExpiryTime (Get-Date).AddHours(2) -Context $account.Context
$baseUrl ="https://{0}.dfs.core.windows.net/{1}/{2}{3}" -f $accountName , $fileSystemName, $filePath, $sas
#2. Create file
$endpoint =$baseUrl +"&resource=file"
Invoke-RestMethod -Method Put -Uri $endpoint -Headers @{"Content-Length" = 0} -UseBasicParsing
#3 append data
$endpoint =$baseUrl +"&action=append&position=0"
Invoke-RestMethod -Method Patch -Uri $endpoint -Headers @{"Content-Length" = $contents.Length} -Body $contents -UseBasicParsing
#4 flush data
$endpoint =$baseUrl + ("&action=flush&position={0}" -f $contents.Length)
Invoke-RestMethod -Method Patch -Uri $endpoint -UseBasicParsing
#Check the result (get data)
Invoke-RestMethod -Method Get -Uri $baseUrl -UseBasicParsing
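To get the YYYY/MM/DD layout from the question with this REST approach, $filePath can be derived from the current date before $baseUrl is built, for example (assumption: the create-file request creates the intermediate folders implicitly):

```powershell
# Sketch: derive a dated path for $filePath before constructing $baseUrl.
# Assumption: intermediate directories are created when the file is created.
$now      = Get-Date
$filePath = "{0:yyyy}/{0:MM}/{0:dd}/data.csv" -f $now
```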