How can I express "the 3rd field in each line of the pokemon.csv file" with the cut program?

Question

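According to (https://gist.github.com/armgilles/194bcff35001e7eb53a2a8b441e8b2c6#file-pokemon-csv), each Pokemon may have two types: type1 and type2. After creating all the csv files, I checked them and noticed that the script also adds Pokemon whose type2 matches i (for i in Water, Electric, etc.). For example, I create a folder called Grass for all Pokemon whose type 1 is Grass, and then add every row with Type 1 = Grass to the pokemon_grass.csv file. It should only look at type1. I am trying to use the cut program to do this. Is there a way to simplify my for loop condition? I mean, instead of listing all the types (Water, Grass, ...), can I do something like for i in `pokemon.csv | cut -d , -f 3`, which takes only the 3rd field of every line?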

#This is a comment
if [ $# = 0 ]; then
    echo Error\: Missing Filename
    echo USAGE\: sh fileCheck.sh \<pokemon.csv\>
    exit
fi
if [ -f "$1" ]; then
    echo FILE \"$1\" is found
    if [ -r "$1" ]; then
    for i in Water Electric Rock Fire Ground Ghost Dragon Grass Steel Bug Fighting Fairy Dark Ice Normal Poison Psychic Flying
    do
        mkdir $i
        `cat pokemon.csv|grep $i | cut -d , -f 3 >> $i/pokemon_$i.csv`
    done
    fi  
fi

Answer 1

You can use cut -d, -f3 pokemon.csv | sort -u | while read -r i; do ...; done instead of listing all the types by hand. However, this is only a small improvement.
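
As a rough illustration of that pipeline, here is a minimal sketch. It assumes the first line of pokemon.csv is a header row (hence tail -n +2). The pokemon_demo directory and the small sample file are made up for the example and only follow the layout of the gist. The awk filter that compares field 3 exactly is an addition of this sketch, not part of the suggestion above; it is one way to keep a Type 2 match (such as Paras, Bug/Grass) out of the Grass file.

```shell
# Illustration only: a tiny sample file in a scratch directory (pokemon_demo).
mkdir -p pokemon_demo
cat > pokemon_demo/pokemon.csv <<'EOF'
#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
10,Caterpie,Bug,,195,45,30,35,20,20,45,1,False
46,Paras,Bug,Grass,285,35,70,55,45,55,25,1,False
EOF

(
    cd pokemon_demo || exit 1
    # Field 3 of every data row (header skipped), reduced to the distinct types.
    tail -n +2 pokemon.csv | cut -d, -f3 | sort -u |
    while IFS= read -r type1; do
        mkdir -p "$type1"
        # Exact comparison on field 3, so a row is selected by Type 1 only.
        awk -F, -v t="$type1" 'NR > 1 && $3 == t' pokemon.csv > "$type1/pokemon_$type1.csv"
    done
)
```

With this sample, Paras ends up only in pokemon_demo/Bug/pokemon_Bug.csv even though its Type 2 is Grass.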

Right now you read pokemon.csv over and over again (once per type). It is better to read the file only once, like this:

while IFS=, read -r id name type1 otherFields; do
    mkdir -p "$type1"
    echo "$id,$name,$type1,$otherFields" >> "$type1/pokemon_$type1.csv"
done < pokemon.csv
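
Note that the loop above treats every line the same way, so a header line would get its own directory. A possible variant that skips it is sketched below, assuming the first line of pokemon.csv is a header row (as the awk answers here also assume). The pokemon_demo2 directory and the sample rows are hypothetical and exist only to make the sketch self-contained.

```shell
# Illustration only: a tiny sample file in a scratch directory (pokemon_demo2).
mkdir -p pokemon_demo2
cat > pokemon_demo2/pokemon.csv <<'EOF'
#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
10,Caterpie,Bug,,195,45,30,35,20,20,45,1,False
46,Paras,Bug,Grass,285,35,70,55,45,55,25,1,False
EOF

(
    cd pokemon_demo2 || exit 1
    {
        # Consume the header line so the loop below only sees data rows.
        IFS= read -r header
        # Single pass over the file; each row is appended to its Type 1 file.
        # Because of >>, running this twice would duplicate the rows.
        while IFS=, read -r id name type1 otherFields; do
            mkdir -p "$type1"
            printf '%s\n' "$id,$name,$type1,$otherFields" >> "$type1/pokemon_$type1.csv"
        done
    } < pokemon.csv
)
```

The file is opened once, and both read commands share the same input stream, which is why the header has to be consumed inside the same redirected group.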

Answer 2

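If what you want to do is split the pokemon.csv file into separate directories named after the type 1 of each Pokemon (the 3rd field), and write every record with the same type 1 to the file in that directory, then the proper tool for the job is awk. In a single pass, simply build the mkdir -p and touch commands that create each file needed and pipe the commands to the shell. Then redirect each record to the proper directory and file name, e.g.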

awk -F, '
    NR > 1 { 
        if ($3 in a) {  # if 3rd field exists in array a, already created
            print $0 > ($3 "/pokemon_" $3 ".csv")    # print record to file
            next                                     # get next record
        }
        a[$3]++         # increment value in array at index of 3rd field
        # build command line to create directory and empty file, pipe to shell
        printf "mkdir -p \047%s\047 && touch \047%s\047\n", $3, $3 "/pokemon_" $3 ".csv" | "sh"
        close ("sh")    # close shell
        print $0 > ($3 "/pokemon_" $3 ".csv")        # print record to file
    }
' pokemon.csv

Example Use/Output

Simply select-copy the command above and middle-mouse paste it into an xterm whose current working directory contains pokemon.csv, e.g.

$ awk -F, '
>     NR > 1 {
>         if ($3 in a) {  # if 3rd field exists in array a, already created
>             print $0 > ($3 "/pokemon_" $3 ".csv")    # print record to file
>             next                                     # get next record
>         }
>         a[$3]++         # increment value in array at index of 3rd field
>         # build command line to create directory and empty file, pipe to shell
>         printf "mkdir -p \047%s\047 && touch \047%s\047\n", $3, $3 "/pokemon_" $3 ".csv" | "sh"
>         close ("sh")    # close shell
>         print $0 > ($3 "/pokemon_" $3 ".csv")        # print record to file
>     }
> ' pokemon.csv

Result:

$ tree
.
├── Bug
│   └── pokemon_Bug.csv
├── Dark
│   └── pokemon_Dark.csv
├── Dragon
│   └── pokemon_Dragon.csv
├── Electric
│   └── pokemon_Electric.csv
├── Fairy
│   └── pokemon_Fairy.csv
├── Fighting
│   └── pokemon_Fighting.csv
├── Fire
│   └── pokemon_Fire.csv
├── Flying
│   └── pokemon_Flying.csv
├── Ghost
│   └── pokemon_Ghost.csv
├── Grass
│   └── pokemon_Grass.csv
├── Ground
│   └── pokemon_Ground.csv
├── Ice
│   └── pokemon_Ice.csv
├── Normal
│   └── pokemon_Normal.csv
├── Poison
│   └── pokemon_Poison.csv
├── Psychic
│   └── pokemon_Psychic.csv
├── Rock
│   └── pokemon_Rock.csv
├── Steel
│   └── pokemon_Steel.csv
├── Water
│   └── pokemon_Water.csv

where, for example, Bug/pokemon_Bug.csv contains:

$ cat Bug/pokemon_Bug.csv
10,Caterpie,Bug,,195,45,30,35,20,20,45,1,False
11,Metapod,Bug,,205,50,20,55,25,25,30,1,False
12,Butterfree,Bug,Flying,395,60,45,50,90,80,70,1,False
13,Weedle,Bug,Poison,195,40,35,30,20,20,50,1,False
14,Kakuna,Bug,Poison,205,45,25,50,25,25,35,1,False
15,Beedrill,Bug,Poison,395,65,90,40,45,80,75,1,False
15,BeedrillMega Beedrill,Bug,Poison,495,65,150,40,15,80,145,1,False
46,Paras,Bug,Grass,285,35,70,55,45,55,25,1,False
47,Parasect,Bug,Grass,405,60,95,80,60,80,30,1,False
48,Venonat,Bug,Poison,305,60,55,50,40,55,45,1,False
49,Venomoth,Bug,Poison,450,70,65,60,90,75,90,1,False
123,Scyther,Bug,Flying,500,70,110,80,55,80,105,1,False
127,Pinsir,Bug,,500,65,125,100,55,70,85,1,False
127,PinsirMega Pinsir,Bug,Flying,600,65,155,120,65,90,105,1,False
165,Ledyba,Bug,Flying,265,40,20,30,40,80,55,2,False
166,Ledian,Bug,Flying,390,55,35,50,55,110,85,2,False
167,Spinarak,Bug,Poison,250,40,60,40,40,40,30,2,False
168,Ariados,Bug,Poison,390,70,90,70,60,60,40,2,False
193,Yanma,Bug,Flying,390,65,65,45,75,45,95,2,False
204,Pineco,Bug,,290,50,65,90,35,35,15,2,False
205,Forretress,Bug,Steel,465,75,90,140,60,60,40,2,False
<... snip ...>
595,Joltik,Bug,Electric,319,50,47,50,57,50,65,5,False
596,Galvantula,Bug,Electric,472,70,77,60,97,60,108,5,False
616,Shelmet,Bug,,305,50,40,85,40,65,25,5,False
617,Accelgor,Bug,,495,80,70,40,100,60,145,5,False
632,Durant,Bug,Steel,484,58,109,112,48,48,109,5,False
636,Larvesta,Bug,Fire,360,55,85,55,50,55,60,5,False
637,Volcarona,Bug,Fire,550,85,60,65,135,105,100,5,False
649,Genesect,Bug,Steel,600,71,120,95,120,95,99,5,False
664,Scatterbug,Bug,,200,38,35,40,27,25,35,6,False
665,Spewpa,Bug,,213,45,22,60,27,30,29,6,False
666,Vivillon,Bug,Flying,411,80,52,50,90,50,89,6,False
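
One way to check output like this is to ask cut for the 3rd field of each generated file and confirm that only one distinct value comes back. The sketch below is an illustration, not part of the answer: the pokemon_demo3 directory is hypothetical, the three rows are copied from the listing above, and it assumes the per-type files contain no header line.

```shell
# Illustration only: recreate a small Bug file in a scratch directory.
mkdir -p pokemon_demo3/Bug
cat > pokemon_demo3/Bug/pokemon_Bug.csv <<'EOF'
10,Caterpie,Bug,,195,45,30,35,20,20,45,1,False
12,Butterfree,Bug,Flying,395,60,45,50,90,80,70,1,False
46,Paras,Bug,Grass,285,35,70,55,45,55,25,1,False
EOF

# For every <Type>/pokemon_<Type>.csv, list the distinct values of field 3
# and compare them with the name of the directory the file lives in.
for f in pokemon_demo3/*/pokemon_*.csv; do
    dir_type=$(basename "$(dirname "$f")")
    found=$(cut -d, -f3 "$f" | sort -u)
    if [ "$found" = "$dir_type" ]; then
        echo "OK  $f"
    else
        echo "BAD $f: found '$found'"
    fi
done
```

A file that mixes types, or that includes a header line, would be reported as BAD because sort -u would return more than one value.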

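As mentioned, you can do the same thing with a shell loop, but that would be very inefficient because the file is searched again for every type 1 name in the list. Use awk; it handles all of this for you. (awk is the Swiss Army knife of text processing.) See the GNU Awk User's Guide.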

Answer 3

Without sample input/output this is an untested guess, but is this what you are trying to do?

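# keep the header as the first line; sort only the data rows
{ head -n 1 pokemon.csv; tail -n +2 pokemon.csv | sort -t',' -k3,3 -k1,1n; } |
awk -F',' '
    NR == 1 {
        hdr = $0
        next
    }
    $3 != prev  {
        close(out)
        system("mkdir -p \047" $3 "\047")
        out = $3 "/pokemon_" $3 ".csv"
        print hdr > out
        prev = $3
    }
    { print > out }
'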

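The above only has 1 output file open at a time, so no matter how many different $3 values exist in the input, it will not fail with a "too many open files" error in any awk, nor will it slow down in other awks that would have to manage opening/closing files behind the scenes.

It is entirely unclear why you want the output in $3/pokemon_$3.csv rather than $3/pokemon.csv or ./pokemon_$3.csv. Making the name unique in both the directory and the file name seems redundant.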
