Logical operations on CSV with Powershell

Question

I need to apply some changes to Col01 based on the value in Col03:

Col01,Col02,Col03
empty,empty,6
empty,empty,19
empty,empty,75
empty,empty,87
empty,red,145
empty,empty,625
empty,empty,abc

Set the content of Col01 to:

'small' if the Col03 value is less than or equal to 50
'medium' if the Col03 value is between 51 and 100
'large' if the Col03 value is greater than 100
'text' if the Col03 value is text (not a number)

Result:

Col01,Col02,Col03
small,empty,6
small,empty,19
medium,empty,75
medium,empty,87
large,red,145
large,empty,625
text,empty,abc

Answer 1

Something like this? There are multiple ways to achieve this. For example:

#Sample data
$csv = @"
Col01,Col02,Col03
empty,empty,6
empty,empty,19
empty,empty,75
empty,empty,87
empty,red,145
empty,empty,625
empty,empty,abc
"@ | ConvertFrom-Csv

#Uncomment to read from file
#$csv = Import-CSV -Path C:\MyFile.csv

$csv | ForEach-Object {

    #Get current Col03 value
    $col3 = $_.Col03.Trim()

    #Calculate new Col01 value 
    $val = if($col3 -match '^\d+$') {
        #Col03 is integer
        if([int]$col3 -le 50) { "small" }
        elseif ([int]$col3 -le 100) { "medium" }
        else { "large" }
    } else { "text" }

    #Replace Col01-value
    $_.Col01 = $val

    #Output modified object
    $_
}

Or using a switch. This example also saves the result to a file:

$csv = @"
Col01,Col02,Col03
empty,empty,6
empty,empty,19
empty,empty,75
empty,empty,87
empty,red,145
empty,empty,625
empty,empty,abc
"@ | ConvertFrom-Csv

#Uncomment to read from file
#$csv = Import-CSV -Path C:\MyFile.csv

$csv | ForEach-Object {

    $_.Col01 = switch($_.Col03.Trim()) {
        #Contains non-digit - text
        {$_ -match '\D'} { 'text'; break; }
        #Number - pick category
        {[int]$_ -le 50} { 'small'; break; }
        {[int]$_ -le 100} { 'medium'; break; }
        {[int]$_ -gt 100} { 'large'; break; } 
    }  

    #Output modified object
    $_
} | Export-CSV -Path MyOutput.csv -NoTypeInformation

Output:

Col01  Col02 Col03
-----  ----- -----
small  empty 6    
small  empty 19   
medium empty 75   
medium empty 87   
large  red   145  
large  empty 625  
text   empty abc

Answer 2

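This is a nice practice exercise; it can be written in many ways. The core parts are:

Reading the CSV rows
- You could do it as lines of text, but the 'right' way in PowerShell is Import-Csv.
Testing the text/number part
1. Handle the text condition first (in your code); anything that gets past it will be a number.
2. Try to handle the number case first; anything that gets past it will be text.
3. Assume everything is a number, and if that breaks, it's text. This works, but it uses exceptions for control flow, which is bad: exceptions are meant for exceptional situations, and you expect text as a normal part of the program's operation. It's also somewhat bad because exceptions carry a lot of overhead. But you're the programmer, and you can choose to use them if you want, e.g. if it makes the code particularly readable/clear.
Exporting the CSV rows
- Again, you could do it as lines of text, but Export-Csv pairs with Import-Csv, so that's the 'right' way in PowerShell.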

And in PowerShell terms, it makes sense to arrange it as:

Import-Csv | ForEach-Object { process one CSV row at a time } | Export-Csv

(as opposed to:
$foo = import-csv
$bar = @()
foreach ($line in $foo) {
   #...
   $bar += $line
}
which is workable but ugly and wasteful of memory and CPU and won't scale nicely)
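A small sketch of why the += pattern is wasteful: PowerShell arrays are fixed-size, so each += builds a new array and copies the existing elements into it. The sample rows are in-memory stand-ins for Import-Csv output, and the names $rows, $bar, $collected and $list are just for illustration.

```powershell
# In-memory stand-in for Import-Csv output (assumption: no file on disk)
$rows = @"
Col01,Col02,Col03
empty,empty,6
empty,empty,abc
"@ | ConvertFrom-Csv

# Arrays cannot grow in place, so += allocates a new array and copies every element
$bar = @()
$bar.IsFixedSize                      # True
foreach ($row in $rows) { $bar += $row }

# Idiomatic alternative: let PowerShell collect the loop's output once, by assignment
$collected = foreach ($row in $rows) {
    # ...process the row here...
    $row
}

# If you really need to append inside a loop, use a growable list
# ([type]::new() needs PowerShell 5 or later)
$list = [System.Collections.Generic.List[object]]::new()
foreach ($row in $rows) { $list.Add($row) }
```

List[object].Add returns nothing, so unlike ArrayList.Add it does not leak index numbers into the output.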

OK, that deals with the structure of the read/process/write part. Now you want to assign the numeric values to buckets.

0-10    11-20    21-30    31-40
\__/    \___/    \___/    \___/

or whatever size ranges / bucket conditions you have.

That pattern screams if/else or switch.

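So the remaining parts are: which of approaches 1, 2 and 3 you choose, where and how you separate text from numbers, and how you assign the numbers to buckets.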

Most of these choices come down to readability and your preference.

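Buckets with a start and an end imply a double test like start -le $num -and $num -lt end, and then the two edge buckets need only one test each. With your three buckets, that means one needs a double test and two need a single test: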

if ($foo -gt 100)
elseif (50 -lt $foo -and $foo -le 100)
elseif ($foo -le 50)

Look at that mix of if/elseif and single/double tests. But because your buckets butt up neatly against each other, you can use (or abuse) fall-through tests to get:

if ($foo -gt 100) { big    }
if ($foo -le 100) { medium }
if ($foo -le 50 ) { small  }

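OK, some values get assigned 'medium' and then overwritten with 'small', but this layout is easier to read and makes it easier to see what it's doing, doesn't it?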

In a switch, fall-through happens implicitly unless you stop it, so you'd end up with a mix of switch cases with and without break to stop the fall-through.
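A minimal sketch of what that implicit fall-through means in practice, using 30 as an example value: if every clause emits a value and you assign the switch's result, you get every matching value, not just one. That's why a switch whose result is assigned (like the switch version in Answer 1) needs break, with the clauses ordered so the first match is the right one.

```powershell
$n = 30   # example value that satisfies both the '-le 100' and '-le 50' tests

# No break: every matching clause runs, and each one emits a value
$noBreak = switch ($n) {
    { $_ -gt 100 } { 'large' }
    { $_ -le 100 } { 'medium' }
    { $_ -le 50 }  { 'small' }
}
# $noBreak holds two values: 'medium', 'small'

# break stops at the first match, so order the clauses from narrowest to widest
$withBreak = switch ($n) {
    { $_ -le 50 }  { 'small';  break }
    { $_ -le 100 } { 'medium'; break }
    default        { 'large' }
}
# $withBreak holds the single value 'small'
```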

If you choose to match text first, you'd probably use a regex to identify things that aren't numbers (which is backwards), so I get:

Import-Csv .\t.csv | ForEach-Object { 

    if ($_.Col03 -notmatch '^\d+$')
    {
        $_.Col01 = 'text'
    }
    else 
    {
        if ([int]$_.Col03 -gt 100) { $_.Col01 = 'large'  }
        if ([int]$_.Col03 -le 100) { $_.Col01 = 'medium' }
        if ([int]$_.Col03 -le 50)  { $_.Col01 = 'small'  }
    }

    $_

} # | Export-Csv out.csv -NoTypeInformation

That works, but meh. If you choose to identify numbers first, you could use the same regex pattern, or ask the .NET Framework to TryParse the text into a number. I get:

Import-Csv .\t.csv | ForEach-Object { 

    [int]$n = 0

    if ([int]::TryParse($_.Col03, [ref]$n)) 
    {
        if ($n -gt 100) { $_.Col01 = 'large'  }
        if ($n -le 100) { $_.Col01 = 'medium' }    
        if ($n -le 50)  { $_.Col01 = 'small'  }
    }
    else 
    {
        $_.Col01 = 'text'
    }

    $_

} # | Export-Csv out.csv -NoTypeInformation

That's not very pretty. Going for the regex/switch fall-through combination, I get:

Import-Csv .\t.csv | ForEach-Object { 

    switch ($_)   # the conditions are script blocks, so a -regex flag would have no effect
    {
        {$_.Col03 -notmatch '^\d+$'} { $_.Col01 = 'text'   ; break }
        {[int]$_.Col03 -gt 100}      { $_.Col01 = 'large'          }
        {[int]$_.Col03 -le 100}      { $_.Col01 = 'medium'         }
        {[int]$_.Col03 -le 50 }      { $_.Col01 = 'small'          }
    }

    $_
} # | Export-Csv out.csv -NoTypeInformation

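That's pretty, but the partial ; break / fall-through is just waiting for someone to misread it. And using exception handling as control flow, I get this (because of the scoping of $_ inside the catch block, where $_ is the error record, I've changed it to read the whole file into memory first):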

$Rows = foreach ($Row in Import-Csv .\t.csv)
{ 
    try {
        if ([int]$Row.Col03 -gt 100) { $Row.Col01 = 'large'  }
        if ([int]$Row.Col03 -le 100) { $Row.Col01 = 'medium' }
        if ([int]$Row.Col03 -le 50)  { $Row.Col01 = 'small'  }
    } catch {
        $Row.Col01 = 'text'
    }

    $Row
}

$Rows # | Export-Csv out.csv -NoTypeInformation

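But they're all basically doing the same thing, and I can't think of a better way of bucketing, so all my answers are pretty much the same shape, even though they run through quite different code.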
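On the question's sample data they all agree, but the two number tests used above (the ^\d+$ regex and [int]::TryParse) do differ on messier input. A quick sketch comparing them on a few made-up values that are not in the question's data:

```powershell
# Made-up edge-case inputs, not from the question's sample data
$inputs = '6', ' 6', '-5', '+7', '99999999999', 'abc'

$results = foreach ($s in $inputs) {
    [int]$n = 0
    [pscustomobject]@{
        Input    = $s
        Regex    = $s -match '^\d+$'              # digits only: no sign, no whitespace
        TryParse = [int]::TryParse($s, [ref]$n)   # allows surrounding whitespace and a sign; fails if it doesn't fit in Int32
    }
}
$results | Format-Table
```

' 6', '-5' and '+7' count as text for the regex but as numbers for TryParse, while '99999999999' passes the regex but does not fit in an [int], so the [int] cast in the regex-based versions would throw on it. Answer 1 trims the value first, which covers the whitespace case.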

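Frode F.'s answer works differently again, by assigning the result of an if in PowerShell, but with that approach you can't use the fall-through bucket checks, so he uses if/elseif/else instead of if/if/if. That's quite nice, and can be formatted as: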

$row.Col01 =     if (                $n -le  50) { 'small'  }
             elseif ( 50 -lt $n -and $n -le 100) { 'medium' }
             elseif (100 -lt $n                ) { 'large'  }

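Then it's clearer that they are start/end ranges. Actually, yes, I like Frode's approach best, just not the way he formatted it; I wish I'd read his answer before writing this.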
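If there are many ranges, like the 0-10 / 11-20 / 21-30 / 31-40 buckets in the diagram earlier, the if/elseif chain can also be driven by a table. This relies on the same property as the fall-through trick: the buckets butt up against each other, so checking upper bounds in ascending order is enough and each bucket needs only one test. A minimal sketch; Get-BucketLabel, the $buckets/$sizes tables and the 'over 40' label are hypothetical, and it assumes whole, non-negative numbers that have already been separated from text:

```powershell
# Hypothetical bucket table: inclusive upper bound + label, sorted ascending
$buckets = @(
    @{ Max = 10; Label = '0-10'  }
    @{ Max = 20; Label = '11-20' }
    @{ Max = 30; Label = '21-30' }
    @{ Max = 40; Label = '31-40' }
)

function Get-BucketLabel {
    param(
        [int]$Value,
        [object[]]$Table,
        [string]$Overflow   # label for values above the last upper bound
    )
    foreach ($bucket in $Table) {
        # The first upper bound >= the value wins, so each bucket needs only one test
        if ($Value -le $bucket.Max) { return $bucket.Label }
    }
    return $Overflow
}

Get-BucketLabel -Value 15 -Table $buckets -Overflow 'over 40'   # 11-20

# The question's three buckets expressed the same way
$sizes = @(
    @{ Max = 50;  Label = 'small'  }
    @{ Max = 100; Label = 'medium' }
)
Get-BucketLabel -Value 75 -Table $sizes -Overflow 'large'       # medium
```

Adding or resizing a bucket then means editing the table rather than the if/elseif chain.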
