Extracting key-value pairs in a loop from a text file using a PowerShell script

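I am trying to use PowerShell to capture specific key-value pairs from a text file that contains other data alongside the key:value pattern. Can anyone help me? I am new to PowerShell, so the code I have tried so far was put together with help from the internet. Any help would be much appreciated.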

Sample source text:

ResourceGroupName    : DataLake-Gen2
DataFactoryName      : dna-production-gen2
TriggerName          : TRG_RP_Optimizely_Import
TriggerRunId         : 08586050680855766354964895535CU57
TriggerType          : ScheduleTrigger
TriggerRunTimestamp  : 8/4/2020 10:59:59 AM
Status               : Succeeded
TriggeredPipelines   : {[PL_DATA_OPTIMIZELY_MART, 1f89fc3a-27b5-442e-9685-a444f751f607]}
Message              :
Properties           : {[TriggerTime, 8/4/2020 10:59:59 AM], [ScheduleTime, 8/4/2020 11:00:00 AM], [triggerObject, {
                         "name": "Trigger_421B8CAF-BE66-42CF-83DA-E3028693F304",
                         "startTime": "2020-08-04T10:59:59.8982174Z",
                         "endTime": "2020-08-04T10:59:59.8982174Z",
                         "scheduledTime": "2020-08-04T11:00:00Z",
                         "trackingId": "fdf58bb2-ecd5-4fe9-b2ef-d94fd71729c3",
                         "clientTrackingId": "08586050680855766354964895535CU57",
                         "originHistoryName": "08586050680855766354964895535CU57",
                         "code": "OK",
                         "status": "Succeeded"
                       }]}
AdditionalProperties : {[groupId, 08586050680855766354964895535CU57]}

ResourceGroupName    : DataLake-Gen2
DataFactoryName      : dna-production-gen2
TriggerName          : TRG_RP_Optimizely_Import
TriggerRunId         : 08586049816852049265494275953CU24
TriggerType          : ScheduleTrigger
TriggerRunTimestamp  : 8/5/2020 11:00:00 AM
Status               : Succeeded
TriggeredPipelines   : {[PL_DATA_OPTIMIZELY_MART, dd6b5beb-b7f6-44ef-8903-34c845003dfc]}
Message              :
Properties           : {[TriggerTime, 8/5/2020 11:00:00 AM], [ScheduleTime, 8/5/2020 11:00:00 AM], [triggerObject, {
                         "name": "Trigger_421B8CAF-BE66-42CF-83DA-E3028693F304",
                         "startTime": "2020-08-05T11:00:00.2662252Z",
                         "endTime": "2020-08-05T11:00:00.2662252Z",
                         "scheduledTime": "2020-08-05T11:00:00Z",
                         "trackingId": "ba223bbd-8cb2-40e8-951f-87130dbbbfe8",
                         "clientTrackingId": "08586049816852049265494275953CU24",
                         "originHistoryName": "08586049816852049265494275953CU24",
                         "code": "OK",
                         "status": "Succeeded"
                       }]}
AdditionalProperties : {[groupId, 08586049816852049265494275953CU24]}

Code used so far:

[CmdletBinding()]
Param(
    [Parameter(Mandatory=$true)]
    $path
)

function Format-LogFile {
    [CmdletBinding()]
    param (
        $log
    )

    $targets = 'TriggerRunTimestamp','ResourceGroupName', 'DataFactoryName', 'TriggerName', 'TriggerRunId', 'TriggerType', 'Status'
    [System.Collections.ArrayList]$lines = @()
    $log | ForEach-Object {
        $line = $_
        $targets | ForEach-Object {
            if ($line.Contains($_) -and $line -notin $lines) {
                $lines.Add($line) | Out-Null
            }
        }
    }
#    $lines[0] = $lines[0].TrimStart("JournalSMS  ")
#    return $lines
    
}


function Get-LogFields {
    [CmdletBinding()]
    param (

        $lines
    )
    $targets = 'TriggerRunTimestamp','ResourceGroupName', 'DataFactoryName', 'TriggerName', 'TriggerRunId', 'TriggerType', 'Status'
    $matchs = $lines | Select-String -Pattern "(?<=(\s||\b))[A-Z][\s\[A-Z]/]+?\s*?\:\s+[^\s\b]+" -AllMatches 
    
    $dict = @{}
    $matchs.Matches | ForEach-Object {
        $val = $_.Value
        $arr = $val.Split("")
        if ($arr[0].Trim() -in $targets)  {
            $dict.Add($arr[0].Trim(), $arr[1].Trim())
        } 
    }
    
    return $dict
}


$log = get-content 'D:\output.txt'
$path = "D:\output.txt"
$info = Get-ChildItem -File -Recurse -Path $path | ForEach-Object {
    $log = Get-Content $_.FullName -Encoding Default
    $lines = Format-LogFile $log
    $dict = Get-LogFields $lines
    $values = New-Object -TypeName psobject -Property $dict
    return $values
} 



# $info |
# Select-Object   @{name='TriggerRunTimestamp';expression={$_.'TriggerRunTimestamp'}},
#                 @{name='ResourceGroupName';expression={$_."ResourceGroupName"}},
#                 @{name='DataFactoryName';expression={$_.'DataFactoryName'}},
#                 @{name='TriggerName';expression={$_.'TriggerName'}},
#                 @{name='TriggerRunId';expression={$_.'TriggerRunId'}} 
#                  @{name='TriggerType';expression={$_.'TriggerType'}}
#                 @{name='Status';expression={$_.'Status'}}|
# Export-Csv -Encoding UTF8 -Path .\result.csv -Force


$info |
Select-Object   'TriggerRunTimestamp', "ResourceGroupName", 'DataFactoryName',
                'TriggerName', 'TriggerRunId', 'TriggerType', 'Status' |
ConvertTo-CSV -Delimiter ";" -NoTypeInformation |
% {$_.Replace('"','')} |
Set-Content -Path 'D:\result.csv' -Force
# Export-Csv -Encoding UTF8 -Path .\result.csv -Force

Expected output:

TriggerRunTimestamp | ResourceGroupName | DataFactoryName | TriggerName | TriggerRunId | TriggerType | Status | TriggeredPipeline | Properties_TriggerTime | Properties_ScheduleTime | triggerObject_name | triggerObject_startTime | triggerObject_endTime | triggerObject_scheduledTime
8/4/2020 10:59 | DataLake-Gen2 | dna-production-gen2 | TRG_RP_Optimizely_Import | 08586050680855766354964895535CU57 | ScheduleTrigger | Succeeded | PL_DATA_OPTIMIZELY_MART | 8/4/2020 10:59 | 8/4/2020 11:00 | Trigger_421B8CAF-BE66-42CF-83DA-E3028693F304 | 2020-08-04T10:59:59.8982174Z | 2020-08-04T10:59:59.8982174Z | 2020-08-04T11:00:00Z

Note: the first row holds the column headers; the second row holds the values as plain text.

I need help urgently!

Thanks

The tricky part of this log file is the Properties property, which holds a JSON string. Luckily, you do not want any of that in the output CSV file, so the following should work:

# read the file as a single, multiline string using the -Raw switch
$log = Get-Content -Path 'D:\Test\the_input_log.txt' -Raw
# split the content into several blocks on the empty line, skip blocks that do not contain text
$result = ($log -split '(\r?\n){2,}' | Where-Object {$_ -match '\S'}) | ForEach-Object {
    # split the block to get only the part with the properties you are interested in
    # replace ' : ' with an equals sign (mind the extra spaces around the colon, otherwise
    # you will also replace the colons in the 'TriggerRunTimestamp' property).

    # use ConvertFrom-StringData cmdlet to create a Hashtable from this and convert that to a PsCustomObject
    # finally, use Select-Object to output a new PSObject with only the properties you need in the wanted order.
    [PsCustomObject](($_ -split 'TriggeredPipelines')[0] -replace ' : ', '=' | ConvertFrom-StringData)  |
    Select-Object 'TriggerRunTimestamp', 'ResourceGroupName', 'DataFactoryName', 'TriggerName', 'TriggerRunId', 'TriggerType', 'Status'
}

# output on screen
$result | Format-Table -AutoSize

# write to CSV file
$result | Export-Csv -Path 'D:\Test\result.csv' -Encoding UTF8 -NoTypeInformation -Force

I have added a lot of comments to the code, so hopefully you can follow what is happening.
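To see the central step in isolation, here is a minimal sketch of the ' : ' to '=' replacement followed by ConvertFrom-StringData. The `$block` and `$data` names and the shortened three-line block are made up for illustration; the layout is assumed to match the log above.

```powershell
# A shortened, made-up block in the same "Key : Value" layout as the log
$block = @'
ResourceGroupName    : DataLake-Gen2
TriggerRunTimestamp  : 8/4/2020 10:59:59 AM
Status               : Succeeded
'@

# ' : ' (space, colon, space) only occurs right after the key;
# the colons inside the time value have no spaces around them and are left alone
$data = $block -replace ' : ', '=' | ConvertFrom-StringData

# ConvertFrom-StringData returns a Hashtable; keys and values are trimmed
$data['TriggerRunTimestamp']   # 8/4/2020 10:59:59 AM
$data['Status']                # Succeeded
```

One caveat: ConvertFrom-StringData treats the backslash as an escape character, so values that contain backslashes (Windows paths, for example) would need extra handling first.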

The resulting CSV file contains quotes:

"TriggerRunTimestamp","ResourceGroupName","DataFactoryName","TriggerName","TriggerRunId","TriggerType","Status"
"8/4/2020 10:59:59 AM","DataLake-Gen2","dna-production-gen2","TRG_RP_Optimizely_Import","08586050680855766354964895535CU57","ScheduleTrigger","Succeeded"
"8/5/2020 11:00:00 AM","DataLake-Gen2","dna-production-gen2","TRG_RP_Optimizely_Import","08586049816852049265494275953CU24","ScheduleTrigger","Succeeded"

If you absolutely do not want the quotes and you are using PowerShell 7, you can add -UseQuotes AsNeeded to the Export-Csv cmdlet.
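As a small sketch of what that switch does, the example below uses ConvertTo-Csv (which accepts the same -UseQuotes parameter) so the result can be inspected without writing a file. The `$row` object is a made-up sample, and this assumes PowerShell 7 or later.

```powershell
# Requires PowerShell 7+; Windows PowerShell 5.1 has no -UseQuotes parameter
$row = [PsCustomObject]@{
    TriggerRunTimestamp = '8/4/2020 10:59:59 AM'
    Status              = 'Succeeded'
}

# AsNeeded only quotes fields that contain the delimiter
$csv = $row | ConvertTo-Csv -UseQuotes AsNeeded

$csv[0]   # TriggerRunTimestamp,Status
$csv[1]   # 8/4/2020 10:59:59 AM,Succeeded
```

With Export-Csv the call would look the same, with -Path added.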

For older PowerShell versions, you can use my function ConvertTo-CsvNoQuotes.


Edit

Since your comment says you also need properties from the (apparently JSON) element, a completely different approach is needed.

For the example you gave, you can use:

# read the file as a single, multiline string using the -Raw switch
$log = Get-Content -Path 'D:\Test\the_input_log.txt' -Raw
# split the content into several blocks on the empty line, skip blocks that do not contain text
$result = ($log -split '(\r?\n){2,}' | Where-Object {$_ -match '\S'}) | ForEach-Object {
    # create a Hashtable to store the key/value properties we find looping over each line in the block
    $hash = @{}
    switch -Regex ($_.Trim() -split '\r?\n') {
        '^(\w+)\s+:\s*(.*)' { $key = $matches[1]; $hash[$key] = $matches[2] }     # found a key/value property
        '^\s+(\S.+)'        { if ($key) {$hash[$key] += ("`r`n"+ $matches[1])} }  # add to a multiline property
    }
    # test if the above actually was able to parse 'TriggeredPipelines'
    if (![string]::IsNullOrWhiteSpace($hash['TriggeredPipelines'])) {
        # remove the brackets from TriggeredPipelines
        $hash['TriggeredPipeline'] = ($hash['TriggeredPipelines'].Trim("{[]}") -split ',')[0]
    }

    # test if the above actually was able to parse 'Properties'
    if (![string]::IsNullOrWhiteSpace($hash['Properties'])) {
        # the 'Properties' property needs a bit more work:
        # 1) remove the surrounding brackets, split into the first line and a textblock with the rest of the properties
        $props = $hash['Properties'].Trim("{[ ]}") -split '\r?\n', 2
        # $props[0] is now "[TriggerTime, 8/4/2020 10:59:59 AM], [ScheduleTime, 8/4/2020 11:00:00 AM], [triggerObject, {"
        # parse the TriggerTime and ScheduleTime from that line and add them to the hash
        $temp  = ([regex]'(?i)TriggerTime,\s*([^\]]+)').Matches($props[0]).Groups[1].Value
        if (![string]::IsNullOrWhiteSpace($temp)) { $hash['Properties_TriggerTime'] = $temp }

        $temp = ([regex]'(?i)ScheduleTime,\s*([^\]]+)').Matches($props[0]).Groups[1].Value
        if (![string]::IsNullOrWhiteSpace($temp)) { $hash['Properties_ScheduleTime'] = $temp }

        if ($props.Count -eq 2) {
            # 2) surround $props[1] with curly brackets, so it will become valid JSON and convert from that
            $props = '{{{0}}}' -f $props[1] | ConvertFrom-Json
            # loop through the properties and add these to the hash with "TriggerObject_" prefix
            foreach($p in $props.PSObject.Properties.name) {
                $hash["TriggerObject_$p"] = $props.$p
            }
        }
    }

    # final test to see if we have managed to capture anything
    # more strict but memory consuming would be 
    # if ($hash.Count -and ![string]::IsNullOrWhiteSpace(-join $hash.Values)) {..}

    if ($hash.Count) {
        # convert the completed hash into a PSObject and select the properties you need from it
        [PsCustomObject]$hash | Select-Object 'TriggerRunTimestamp', 'ResourceGroupName', 'DataFactoryName',
                                              'TriggerName', 'TriggerRunId', 'TriggerType', 'Status', 
                                              'TriggeredPipeline', 'Properties_TriggerTime', 'Properties_ScheduleTime',
                                              'TriggerObject_name', 'TriggerObject_startTime', 
                                              'TriggerObject_endTime', 'TriggerObject_scheduledTime'
    }
}

# output on screen (won't fit as Table in the console)
$result

# write to CSV file
$result | Export-Csv -Path 'D:\Test\result.csv' -Encoding UTF8 -NoTypeInformation -Force

The resulting CSV file now looks like this:

"TriggerRunTimestamp","ResourceGroupName","DataFactoryName","TriggerName","TriggerRunId","TriggerType","Status","TriggeredPipeline","Properties_TriggerTime","Properties_ScheduleTime","TriggerObject_name","TriggerObject_startTime","TriggerObject_endTime","TriggerObject_scheduledTime"
"8/4/2020 10:59:59 AM","DataLake-Gen2","dna-production-gen2","TRG_RP_Optimizely_Import","08586050680855766354964895535CU57","ScheduleTrigger","Succeeded","PL_DATA_OPTIMIZELY_MART","8/4/2020 10:59:59 AM","8/4/2020 11:00:00 AM","Trigger_421B8CAF-BE66-42CF-83DA-E3028693F304","2020-08-04T10:59:59.8982174Z","2020-08-04T10:59:59.8982174Z","2020-08-04T11:00:00Z"
"8/5/2020 11:00:00 AM","DataLake-Gen2","dna-production-gen2","TRG_RP_Optimizely_Import","08586049816852049265494275953CU24","ScheduleTrigger","Succeeded","PL_DATA_OPTIMIZELY_MART","8/5/2020 11:00:00 AM","8/5/2020 11:00:00 AM","Trigger_421B8CAF-BE66-42CF-83DA-E3028693F304","2020-08-05T11:00:00.2662252Z","2020-08-05T11:00:00.2662252Z","2020-08-05T11:00:00Z"
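The least obvious step in the script is probably the '{{{0}}}' format string, so here is a minimal sketch of it on its own. The `$inner` text is a shortened, made-up stand-in for `$props[1]` (the JSON members without their outer braces).

```powershell
# Shortened stand-in for $props[1]: JSON members without the surrounding braces
$inner = @'
"name": "Trigger_421B8CAF-BE66-42CF-83DA-E3028693F304",
"code": "OK"
'@

# In a -f format string, '{{' and '}}' are escaped literal braces,
# so '{{{0}}}' produces '{' + <value> + '}'
$json = '{{{0}}}' -f $inner
$obj  = $json | ConvertFrom-Json

# Copy every JSON property into a hashtable with a "TriggerObject_" prefix
$hash = @{}
foreach ($p in $obj.PSObject.Properties.Name) {
    $hash["TriggerObject_$p"] = $obj.$p
}

$hash['TriggerObject_code']   # OK
```

One thing to check on your own machine: in PowerShell 7, ConvertFrom-Json may turn ISO 8601 strings such as startTime into DateTime objects, in which case those columns could be formatted differently in the CSV than shown above.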

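In PowerShell 5 (I don't know about lower versions), we can use the -match comparison operator to break a string that follows a pattern into "key" and "value" pairs. The need for this mostly comes up when processing JSON objects.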


PS C:\Users> $str = '"KeyStr": "ValueString"'
PS C:\Users> $str -match '(?<key>.+):(?<value>.+)'
True
PS C:\Users> # $Matches is a built-in (automatic) variable in PowerShell
PS C:\Users> $Matches

Name                           Value
----                           -----
key                            "KeyStr"
value                           "ValueString"
0                              "KeyStr": "ValueString"


PS C:\Users> $Matches.GetType()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Hashtable                                System.Object


PS C:\Users> $Matches.key
"KeyStr"
PS C:\Users> $Matches.Value
 "ValueString"
PS C:\Users>
------------------------------------------

For more help, see the PowerShell help:

Get-Help about_Comparison_Operators
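Applied to the log from the question, the same named-group idea might look like the sketch below. The pattern is an assumption that is tighter than the one in the transcript above: a greedy `.+` key would split 'TriggerRunTimestamp  : 8/4/2020 10:59:59 AM' at its last colon, so the key is restricted to `\w+` at the start of the line. The `$lines` and `$pairs` names are made up for illustration.

```powershell
# Three sample lines: two key/value lines and one indented continuation line
$lines = 'TriggerType          : ScheduleTrigger',
         'TriggerRunTimestamp  : 8/4/2020 10:59:59 AM',
         '                         "code": "OK",'

$pairs = [ordered]@{}
foreach ($line in $lines) {
    # ^\w+ anchors the key at the start of the line, so indented lines are skipped
    # and the colons inside the time stay in the value
    if ($line -match '^(?<key>\w+)\s*:\s*(?<value>.*)$') {
        $pairs[$Matches['key']] = $Matches['value']
    }
}

$pairs['TriggerType']           # ScheduleTrigger
$pairs['TriggerRunTimestamp']   # 8/4/2020 10:59:59 AM
```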