Extract a multi-line text file to single-line CSV using PowerShell
I have a task with no easy way to parse some data into the correct format. My text file is in the following format:
#N Last Name: Joe
#D First Name: Doe
#P Middle Name: A
Some Data:
#C ID Number: (1) 12345
#S Status: (1) Active

#N Last Name: Jane
#D First Name: Doee
#P Middle Name:
Some Data:
#C ID Number: (1) 11111
#S Status: (1) Active
ID Number: (2) 1231
Status: (2) Active
This is the code I tried to use.
$A = Select-String -Pattern "#N" MYFILE.txt;
$B = Select-String -Pattern "#D" MYFILE.txt;
$C = Select-String -Pattern "#P" MYFILE.txt;
$D = Select-String -Pattern "#C" MYFILE.txt;
$E = Select-String -Pattern "#S" MYFILE.txt;
$wrapper = New-Object PSObject -Property @{ FirstColumn = $A; SecondColumn = $B; ThirdColumn = $C; FourthColumn = $D; FifthColumn = $E }
Export-Csv -InputObject $wrapper -Path .\output.csv -NoTypeInformation
This is the result I get:
"SecondColumn","ThirdColumn","FifthColumn","FourthColumn","FirstColumn"
"System.Object[]","System.Object[]","System.Object[]","System.Object[]","System.Object[]"
The output I am looking for is:
#N,#D,#P,#C,#S
Joe, Doe, A, 12345, Active
Jane, Doee, , 11111, Active
Any help is greatly appreciated.
This will work, but it's ugly as sin.
# Get content from text file as one string
$Txtfile = Get-Content "C:\temp\test.txt" -Raw
# Users are separated by a blank line (works for LF and CRLF line endings)
$Delimiter = '(?:\r?\n){2,}'
$Users = $Txtfile -split $Delimiter
# Captures the value after the colon, skipping an optional "(1) " counter
$ValuePattern = ':[ \t]*(?:\(\d+\)[ \t]*)?([^\r\n]*)'
# Create an array list to add data to so it can be exported later.
$collectionVariable = New-Object System.Collections.ArrayList
ForEach ($Grouping in $Users) {
    $temp = New-Object System.Object
    ForEach ($Tag in '#N', '#D', '#P', '#C', '#S') {
        # An empty value (such as a missing middle name) comes back as an empty string
        $Value = [regex]::Match($Grouping, "(?m)^${Tag}[^:\r\n]*$ValuePattern").Groups[1].Value.Trim()
        $temp | Add-Member -MemberType NoteProperty -Name $Tag -Value $Value
    }
    $collectionVariable.Add($temp) | Out-Null
}
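As a small illustration of the record-splitting step above, the sketch below splits an inline sample on blank lines and pulls one field out of each record. The sample text, the $Sample, $Records and $LastNames names, and the regular expressions are assumptions made for the example, not part of the answer's code.

```powershell
# Hypothetical inline sample standing in for the file content.
$Sample = "#N Last Name: Joe`n#D First Name: Doe`n`n#N Last Name: Jane`n#D First Name: Doee"

# Split wherever two or more line breaks occur in a row (LF or CRLF).
$Records = $Sample -split '(?:\r?\n){2,}'

# Pull the text after "#N ...:" out of each record.
$LastNames = foreach ($Record in $Records) {
    [regex]::Match($Record, '(?m)^#N[^:\r\n]*:[ \t]*([^\r\n]*)').Groups[1].Value.Trim()
}

$LastNames
```

Using a non-capturing group in the -split pattern matters here, because -split would otherwise add the captured line breaks to the result.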
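Here is another way to parse that block of data. I changed the user info to make it clearer what is going on. [grin]
What it does ...
- creates a multi-line string to work with
  When ready to do this for real, replace the entire #region/#endregion block with a call to Get-Content -Raw.
- defines the delimiter between blocks of user data
  In this case it is 2 newlines: one at the end of the last data line, and one for the blank line.
- splits the single multi-line string into multiple such strings
- iterates through the resulting text blocks
- initializes the $Vars used to build the PSCO
- splits the text block into lines of text
- filters out any line that does not start with a #, followed by a letter, and then a space
- iterates through the remaining lines
- runs a switch on the second character of each line
- when it matches one of the code letters, parses the line and sets the value for the equivalent $Var
- finishes iterating through the current set of strings
- builds a [PSCustomObject] to hold the values
- sends that out to the $Result collection
- finishes iterating through the text blocks [the outer foreach]
- displays the collection on screen
- saves the collection to a CSV
If you want to remove the quotes from the CSV, please don't. [grin] If you must risk wrecking the CSV, then you can use Get-Content to load the lines from the file and replace the quotes with nothing.
the code ...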
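If the quotes really must go, a minimal sketch of that Get-Content approach could look like the following. The $CsvPath value and the demo record are placeholders for the example, and the blanket replace assumes no field contains a comma or an embedded quote; if one does, the resulting file will no longer parse as CSV.

```powershell
# Placeholder path; point this at the CSV produced by Export-Csv.
$CsvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'ParsedResult_demo.csv'

# Demo data so the sketch runs on its own.
[PSCustomObject]@{ FirstName = 'AFirst'; LastName = 'ALast' } |
    Export-Csv -LiteralPath $CsvPath -NoTypeInformation

# The parentheses make Get-Content finish reading before Set-Content
# starts writing to the same file.
(Get-Content -LiteralPath $CsvPath) -replace '"', '' |
    Set-Content -LiteralPath $CsvPath

Get-Content -LiteralPath $CsvPath
```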
#region >>> fake reading in a text file as a single multiline string
# in real life, use "Get-Content -Raw"
$InStuff = @'
#N Last Name: ALast
#D First Name: AFirst
#P Middle Name: AMid
Some Data:
#C ID Number: (1) 11111
#S Status: (1) Active

#N Last Name: BLast
#D First Name: BFirst
#P Middle Name:
Some Data:
#C ID Number: (1) 22222
#S Status: (1) Active
ID Number: (2) 1231
Status: (2) Active
'@
#endregion >>> fake reading in a text file as a single multiline string
$BlockDelim = ([System.Environment]::NewLine) * 2
$Result = foreach ($Block in ($InStuff -split $BlockDelim))
{
    # initialize stuff to $Null
    # this handles non-matches [such as a missing middle name]
    $FirstName = $MidName = $LastName = $IdNumber = $Status = $Null
    # the "-match" filters for lines that start with a "#", a single letter, and a space
    foreach ($Line in ($Block -split [System.Environment]::NewLine -match '^#\w '))
    {
        switch ($Line[1])
        {
            'N' {
                $LastName = $Line.Split(':')[-1].Trim()
                break
            }
            'D' {
                $FirstName = $Line.Split(':')[-1].Trim()
                break
            }
            'P' {
                $MidName = $Line.Split(':')[-1].Trim()
                break
            }
            'C' {
                $IdNumber = $Line.Split(':')[-1].Trim().Split(' ')[-1].Trim()
                break
            }
            'S' {
                $Status = $Line.Split(':')[-1].Trim().Split(' ')[-1].Trim()
                break
            }
        } # end >>> switch ($Line[1])
    } # end >>> foreach ($Line in ($Block -split [System.Environment]::NewLine))
    # create a custom object and send it out to the collection
    [PSCustomObject]@{
        FirstName = $FirstName
        LastName = $LastName
        MidName = $MidName
        IdNumber = $IdNumber
        Status = $Status
    }
} # end >>> foreach ($Block in ($InStuff -split $BlockDelim))
# display on screen
$Result
# send to a CSV file
$Result |
    Export-Csv -LiteralPath "$env:TEMP\Veebster_ParsedResult.csv" -NoTypeInformation
on-screen output ...
FirstName : AFirst
LastName : ALast
MidName : AMid
IdNumber : 11111
Status : Active
FirstName : BFirst
LastName : BLast
MidName :
IdNumber : 22222
Status : Active
content of the CSV file ...
"FirstName","LastName","MidName","IdNumber","Status"
"AFirst","ALast","AMid","11111","Active"
"BFirst","BLast","","22222","Active"
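Please note that there is no error detection or error handling. [grin]
Here is one more suggestion. My recommendation is to use ConvertFrom-String.
First we'll make a template. I accounted for the two junk lines in your sample data; I really hope those were a typo/copyo.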
$template = @'
#N Last Name: {[string]Last*:last1}
#D First Name: {[string]First:first1}
#P Middle Name: {[string]Middle:A}
#C ID Number: (1) {[int]ID:11111}
#S Status: (1) {[string]Status:status1}
#N Last Name: {[string]Last*:Jane}
#D First Name: {[string]First:Doee}
#P Middle Name: {[string]Middle: \s}
#C ID Number: (1) {[int]ID:11111}
#S Status: (1) {[string]Status:Active}
{!Last*:ID Number: (2) 1231
Status: (2) Active}
'@
Now we apply that template to your data. First we'll parse a here-string.
@'
#N Last Name: Joe
#D First Name: Doe
#P Middle Name: A
Some Data:
#C ID Number: (1) 12345
#S Status: (1) Active
#N Last Name: Jane
#D First Name: Doee
#P Middle Name:
Some Data:
#C ID Number: (1) 11111
#S Status: (1) Active
ID Number: (2) 1231
Status: (2) Active
'@ | ConvertFrom-String -TemplateContent $template -OutVariable results
Output
Last : Joe
First : Doe
Middle : A
ID : 12345
Status : Active
Last : Jane
First : Doee
ID : 11111
Status : Active
Now we can construct our objects in preparation for export.
$results | foreach {
[pscustomobject]@{
FirstName = $_.first
LastName = $_.last
MidName = $_.middle
IdNumber = $_.id
Status = $_.status
}
} -OutVariable export
Now we can export.
$export | Export-Csv -Path .\output.csv -NoTypeInformation
Here is what's in output.csv
PS C:\> Get-Content .\output.csv
"FirstName","LastName","MidName","IdNumber","Status"
"Doe","Joe","A","12345","Active"
"Doee","Jane",,"11111","Active"
Here is the same thing, reading the data from a file.
$template = @'
#N Last Name: {[string]Last*:last1}
#D First Name: {[string]First:first1}
#P Middle Name: {[string]Middle:A}
#C ID Number: (1) {[int]ID:11111}
#S Status: (1) {[string]Status:status1}
#N Last Name: {[string]Last*:Jane}
#D First Name: {[string]First:Doee}
#P Middle Name: {[string]Middle: \s}
#C ID Number: (1) {[int]ID:11111}
#S Status: (1) {[string]Status:Active}
{!Last*:ID Number: (2) 1231
Status: (2) Active}
'@
get-content .\ndpcs.txt |
ConvertFrom-String -TemplateContent $template | foreach {
[pscustomobject]@{
FirstName = $_.first
LastName = $_.last
MidName = $_.middle
IdNumber = $_.id
Status = $_.status
}
} | Export-Csv -Path .\output.csv -NoTypeInformation
Let's check the contents of the CSV once more, just to be sure.
Get-Content .\output.csv
"FirstName","LastName","MidName","IdNumber","Status"
"Doe","Joe","A","12345","Active"
"Doee","Jane",,"11111","Active"
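A few things to note:
If later data sets have different characteristics, you will need to add more samples to the template.
If the two extra lines (ID and Status) should not be there, just remove that part of the template.
I recommend that everyone use the -OutVariable parameter when working out logic or building scripts, since you can see the output and assign it to a variable at the same time.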
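A tiny sketch of the -OutVariable behaviour described above, using made-up numbers instead of the parsed records. The $Doubled name is an assumption for the example.

```powershell
# -OutVariable takes the variable name without the leading "$".
# The doubled values are written to the pipeline and saved in $Doubled.
1..3 | ForEach-Object { $_ * 2 } -OutVariable Doubled

# The saved copy can be inspected or reused afterwards.
$Doubled.Count
```

The stored value is a collection even when only one object was emitted, which is worth keeping in mind when indexing into it.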