Writing data to Azure Data Lake Store - PowerShell scripting

I need to write data to Azure Data Lake Storage instead of my local D:\ drive. I am fetching ADF trigger information through PowerShell, and I want to load the data into a directory inside an Azure Data Lake container rather than into blob storage.

ADF -> PowerShell -> Azure Data Lake

I want to load the data into a directory inside the Azure Data Lake container, organized as YYYY (folder) -> MM (folder) -> DD (folder) -> .CSV data file.
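
A minimal sketch of building that date-based destination path (the "triggers/" prefix and the file name are assumptions, matching the upload attempt further down):

$now = Get-Date
$destPath = "triggers/{0:yyyy}/{0:MM}/{0:dd}/Finalresult.csv" -f $now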

Here is my code, which writes the data to my local machine; I need to convert it so that the data is loaded into Data Lake Storage instead. To avoid exposing the username and password, I use a mechanism based on a password file encrypted with an AES key file.
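
For reference, a password file and AES key pair of that kind can be generated roughly like this (a sketch; the output paths match the script below):

$key = New-Object byte[] 32
[Security.Cryptography.RandomNumberGenerator]::Create().GetBytes($key)
$key | Set-Content D:\Powershell\new\passwords\aes.key
Read-Host "Password" -AsSecureString |
    ConvertFrom-SecureString -Key $key |
    Set-Content D:\Powershell\new\passwords\password.txt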

Any help and suggestions would be much appreciated.

Code:

# 1- Connect to Azure Account

$username = "xyz@abc.com"
$password = Get-Content D:\Powershell\new\passwords\password.txt | ConvertTo-SecureString -Key (Get-Content D:\Powershell\new\passwords\aes.key)
$credential = New-Object System.Management.Automation.PsCredential($username,$password)


#Connect-AzureRmAccount -Credential $credential | out-null

Connect-AzAccount -Credential $credential | out-null
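
# Note: passing -Credential to Connect-AzAccount only works for accounts without
# MFA; for unattended automation a service principal sign-in is the usual
# alternative (a sketch, assuming an AAD app registration already exists):
#   $spCred = New-Object System.Management.Automation.PSCredential($appId, $appSecret)
#   Connect-AzAccount -ServicePrincipal -Credential $spCred -Tenant $tenantId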

# 2 - Input Area

$subscriptionName = 'Data Analytics'
$resourceGroupName = 'DataLake-Gen2'
$dataFactoryName = 'dna-production-gen2'


# 3 - (All Triggers Information)


$ErrorActionPreference="SilentlyContinue"
Stop-Transcript | out-null
$ErrorActionPreference = "Continue"
Start-Transcript -path D:\Powershell\new\TriggerInfo.txt -append
Get-AzDataFactoryV2Trigger -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName
Stop-Transcript 

# read the file as a single, multiline string using the -Raw switch

$triggers = Get-Content "D:\Powershell\new\TriggerInfo.txt" -Raw

# split the text in 'trigger' text blocks on the empty line

# loop through these blocks (skip any possible empty textblock)

($triggers -split '(\r?\n){2,}' | Where-Object { $_ -match '\S' }) | ForEach-Object {

    # and parse the data into Hashtables
    $today = Get-Date
    $yesterday = $today.AddDays(-1)

    # ConvertFrom-StringData expects 'key=value' pairs, so turn the ':' separators into '='
    $data = $_ -replace ':', '=' | ConvertFrom-StringData

    $splat = @{ 
        ResourceGroupName       = $data.ResourceGroupName
        DataFactoryName         = $data.DataFactoryName
        TriggerName             = $data.TriggerName
        TriggerRunStartedAfter  = $yesterday
        TriggerRunStartedBefore = $today
   }
    
   Get-AzDataFactoryV2TriggerRun @splat 

} | Export-Csv -Path 'D:\Powershell\new\Output.csv' -Encoding UTF8 -NoTypeInformation 

# 4 - To extract the final output from the Output File.

Import-Csv D:\Powershell\new\Output.csv -Delimiter "," | 
Select-Object 'TriggerRunTimestamp', 'ResourceGroupName','DataFactoryName','TriggerName','TriggerRunId','TriggerType','Status' | 
Export-Csv -Path 'D:\Powershell\new\Finalresult.csv' -Encoding UTF8 -NoTypeInformation -Force

Code attempting to upload a file from the local system:

$storageAccount = Get-AzStorageAccount -ResourceGroupName "DataLake-Gen2" -AccountName "dna2020gen2"
$ctx = $storageAccount.Context
$filesystemName = "dev"
$dirname = "triggers/"
New-AzDataLakeGen2Item -Context $ctx -FileSystem $filesystemName -Path $dirname -Directory

$localSrcFile = "D:\Powershell\new\passwords\password.txt"
$filesystemName = "dev"
$dirname = "triggers/"
$destPath = $dirname + (Get-Item $localSrcFile).Name
New-AzDataLakeGen2Item -Context $ctx -FileSystem $filesystemName -Path $destPath -Source $localSrcFile -Force

I am able to upload the file, but I am not able to write the command output to the data lake.
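
One workable approach, since New-AzDataLakeGen2Item uploads from a local -Source file, is to export the command output to a local CSV first (as the script above already does) and then upload that file into the YYYY/MM/DD folder structure. A sketch reusing $ctx and $filesystemName from the commands above:

$localSrcFile = "D:\Powershell\new\Finalresult.csv"
$destPath = ("triggers/{0:yyyy}/{0:MM}/{0:dd}/" -f (Get-Date)) + (Get-Item $localSrcFile).Name
New-AzDataLakeGen2Item -Context $ctx -FileSystem $filesystemName -Path $destPath -Source $localSrcFile -Force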

For this question, please refer to the following script:

$username = "xyz@abc.com"
$password = ConvertTo-SecureString "" -AsPlainText -Force
$credential = New-Object System.Management.Automation.PsCredential($username,$password)


#Connect-AzureRmAccount -Credential $credential | out-null

Connect-AzAccount -Credential $credential
$dataFactoryName=""
$resourceGroupName=""
# get dataFactory triggers
$triggers = Get-AzDataFactoryV2Trigger -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName
$datas = @()
foreach ($trigger in $triggers) {
    # get the trigger run history
    $today = Get-Date
    $yesterday = $today.AddDays(-1)
     $splat = @{ 
        ResourceGroupName       = $trigger.ResourceGroupName
        DataFactoryName         = $trigger.DataFactoryName
        TriggerName             = $trigger.Name
        TriggerRunStartedAfter  = $yesterday
        TriggerRunStartedBefore = $today
   }
    
   $historys = Get-AzDataFactoryV2TriggerRun @splat
   if ($null -ne $historys) {
     # build one flat record per trigger run
     foreach ($history in $historys) {
        $obj = [PsCustomObject]@{
            'TriggerRunTimestamp' = $history.TriggerRunTimestamp
            'ResourceGroupName'   = $history.ResourceGroupName
            'DataFactoryName'     = $history.DataFactoryName
            'TriggerName'         = $history.TriggerName
            'TriggerRunId'        = $history.TriggerRunId
            'TriggerType'         = $history.TriggerType
            'Status'              = $history.Status
        }
        # add the record to the result array
        $datas += $obj
     }
   }
   
  
 }
 #  convert data to csv string
 $contents =(($datas | ConvertTo-Csv -NoTypeInformation) -join [Environment]::NewLine)

 # upload to Azure Data Lake Store Gen2

 #1. Create a sas token
 $accountName="testadls05"
 $fileSystemName="test"
 $filePath="data.csv"
 $account = Get-AzStorageAccount -ResourceGroupName andywin7 -Name $accountName
 $sas= New-AzStorageAccountSASToken -Service Blob  -ResourceType Service,Container,Object `
      -Permission "racwdlup" -StartTime (Get-Date).AddMinutes(-10) `
      -ExpiryTime (Get-Date).AddHours(2) -Context $account.Context
$baseUrl ="https://{0}.dfs.core.windows.net/{1}/{2}{3}" -f $accountName ,  $fileSystemName, $filePath, $sas
#2. Create file
$endpoint =$baseUrl +"&resource=file"

Invoke-RestMethod -Method Put -Uri $endpoint -Headers @{"Content-Length" = 0} -UseBasicParsing

#3 append data
$endpoint =$baseUrl +"&action=append&position=0"
Invoke-RestMethod -Method Patch -Uri $endpoint -Headers @{"Content-Length" = $contents.Length} -Body $contents -UseBasicParsing

#4 flush data
$endpoint =$baseUrl + ("&action=flush&position={0}" -f $contents.Length)
Invoke-RestMethod -Method Patch -Uri $endpoint -UseBasicParsing

#Check the result (get data)

Invoke-RestMethod -Method Get -Uri $baseUrl -UseBasicParsing
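
One caveat with the append and flush calls above: $contents.Length is a character count, which equals the byte count only for pure-ASCII data. If the CSV can contain non-ASCII text, compute the position (and the Content-Length header) from the UTF-8 byte count instead, for example:

$byteLen = [Text.Encoding]::UTF8.GetByteCount($contents)
$endpoint = $baseUrl + ("&action=flush&position={0}" -f $byteLen)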

For more details, please refer to here, here and here.