How can I express "the 3rd field in each line of the pokemon.csv file" with the cut program?
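According to the dataset (https://gist.github.com/armgilles/194bcff35001e7eb53a2a8b441e8b2c6#file-pokemon-csv), each Pokemon can have two types: type1 and type2.
After creating all the csv files, I checked them and noticed that the script also adds Pokemon whose type2 matches i (for i in Water, Electric, etc.).
For example, I create a folder named Grass for all Pokemon whose type 1 is Grass, and then add every line with Type 1 = Grass to the pokemon_Grass.csv file.
It should only consider type1.
I'm trying to do this with the cut program.
Is there a way to simplify the condition of my for loop? That is, instead of listing every type (Water, Grass, ...), can I do something like
for i in `pokemon.csv | cut -d , -f 3`
which takes only the 3rd field of each line?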
```
#This is a comment
if [ $# = 0 ]; then
echo Error\: Missing Filename
echo USAGE\: sh fileCheck.sh \<pokemon.csv\>
exit
fi
if [ -f "$1" ]; then
echo FILE \"$1\" is found
if [ -r "$1" ]; then
for i in Water Electric Rock Fire Ground Ghost Dragon Grass Steel Bug Fighting Fairy Dark Ice Normal Poison Psychic Flying
do
mkdir $i
`cat pokemon.csv|grep $i | cut -d , -f 3 >> $i/pokemon_$i.csv`
done
fi
fi
```
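To see why that loop also picks up type2 matches: grep $i matches the type name anywhere on the line, including the Type 2 column. Below is a minimal sketch comparing it with an exact match on the 3rd field. The header line and the Bulbasaur row are assumed from the public dataset's layout; the two Bug rows are copied from the sample output in this thread.

```shell
# Work in a scratch directory so no real files are touched
cd "$(mktemp -d)" || exit 1

# Tiny sample in pokemon.csv's layout (assumed header, illustrative rows)
cat > pokemon.csv <<'EOF'
#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
46,Paras,Bug,Grass,285,35,70,55,45,55,25,1,False
10,Caterpie,Bug,,195,45,30,35,20,20,45,1,False
EOF

# grep looks at the whole line, so Paras (Type 2 = Grass) matches too
grep_hits=$(grep -c Grass pokemon.csv)

# awk compares only field 3 (Type 1) and skips the header line
awk_hits=$(awk -F, 'NR > 1 && $3 == "Grass" { n++ } END { print n + 0 }' pokemon.csv)

echo "grep: $grep_hits lines, field 3: $awk_hits line(s)"
```

Here grep reports 2 lines while the field-3 comparison reports only Bulbasaur, which is exactly the type1-only behaviour the question asks for.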
Instead of listing all the types by hand, you could use cut -d, -f3 pokemon.csv | sort -u | while read -r i; do ...; done
However, that is only a minor improvement.
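A minimal sketch of that cut | sort -u | while read loop. The loop body is hypothetical (not from the answer): it copies only rows whose 3rd field equals the current type. The tail -n +2 is an added assumption that the file starts with a header line, so that "Type 1" itself is not treated as a type. The sample rows are illustrative.

```shell
cd "$(mktemp -d)" || exit 1
cat > pokemon.csv <<'EOF'
#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
46,Paras,Bug,Grass,285,35,70,55,45,55,25,1,False
10,Caterpie,Bug,,195,45,30,35,20,20,45,1,False
EOF

# Distinct Type 1 values (field 3), with the header line skipped
tail -n +2 pokemon.csv | cut -d, -f3 | sort -u |
while read -r i; do
    mkdir -p "$i"
    # Hypothetical body: exact match on field 3, so Type 2 never counts.
    # This still rereads pokemon.csv once per type.
    awk -F, -v t="$i" 'NR > 1 && $3 == t' pokemon.csv > "$i/pokemon_$i.csv"
done

ls */pokemon_*.csv
```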
Right now you read pokemon.csv over and over again
(once per type). It is better to read the file only once, like this:
{
    read -r _    # skip the header line (#,Name,Type 1,...)
    while IFS=, read -r id name type1 otherFields
    do
        mkdir -p "$type1"
        echo "$id,$name,$type1,$otherFields" >> "$type1/pokemon_$type1.csv"
    done
} < pokemon.csv
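If what you want to do is split the pokemon.csv file into separate directories named after the type 1
Pokemon (the 3rd field), writing each record to a file for its type 1, then the right tool for the job is awk.
In a single pass, just build the mkdir -p and touch commands that create each file you need and pipe those commands to the shell. Then simply redirect each record to the correct directory and file name, e.g.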
awk -F, '
NR > 1 {
    if ($3 in a) {                              # if 3rd field exists in array a, already created
        print $0 > ($3 "/pokemon_" $3 ".csv")   # print record to file
        next                                    # get next record
    }
    a[$3]++                                     # increment value in array at index of 3rd field
    # build command line to create directory and empty file, pipe to shell
    printf "mkdir -p \047%s\047 && touch \047%s\047\n", $3, $3 "/pokemon_" $3 ".csv" | "sh"
    close("sh")                                 # close shell
    print $0 > ($3 "/pokemon_" $3 ".csv")       # print record to file
}
' pokemon.csv
Example Use/Output
Just select-copy the command above and paste it with the middle mouse button into an xterm whose current working directory contains pokemon.csv, e.g.
$ awk -F, '
> NR > 1 {
>     if ($3 in a) {                              # if 3rd field exists in array a, already created
>         print $0 > ($3 "/pokemon_" $3 ".csv")   # print record to file
>         next                                    # get next record
>     }
>     a[$3]++                                     # increment value in array at index of 3rd field
>     # build command line to create directory and empty file, pipe to shell
>     printf "mkdir -p \047%s\047 && touch \047%s\047\n", $3, $3 "/pokemon_" $3 ".csv" | "sh"
>     close("sh")                                 # close shell
>     print $0 > ($3 "/pokemon_" $3 ".csv")       # print record to file
> }
> ' pokemon.csv
Result:
$ tree
.
├── Bug
│ └── pokemon_Bug.csv
├── Dark
│ └── pokemon_Dark.csv
├── Dragon
│ └── pokemon_Dragon.csv
├── Electric
│ └── pokemon_Electric.csv
├── Fairy
│ └── pokemon_Fairy.csv
├── Fighting
│ └── pokemon_Fighting.csv
├── Fire
│ └── pokemon_Fire.csv
├── Flying
│ └── pokemon_Flying.csv
├── Ghost
│ └── pokemon_Ghost.csv
├── Grass
│ └── pokemon_Grass.csv
├── Ground
│ └── pokemon_Ground.csv
├── Ice
│ └── pokemon_Ice.csv
├── Normal
│ └── pokemon_Normal.csv
├── Poison
│ └── pokemon_Poison.csv
├── Psychic
│ └── pokemon_Psychic.csv
├── Rock
│ └── pokemon_Rock.csv
├── Steel
│ └── pokemon_Steel.csv
├── Water
│ └── pokemon_Water.csv
where, for example, Bug/pokemon_Bug.csv
contains:
$ cat Bug/pokemon_Bug.csv
10,Caterpie,Bug,,195,45,30,35,20,20,45,1,False
11,Metapod,Bug,,205,50,20,55,25,25,30,1,False
12,Butterfree,Bug,Flying,395,60,45,50,90,80,70,1,False
13,Weedle,Bug,Poison,195,40,35,30,20,20,50,1,False
14,Kakuna,Bug,Poison,205,45,25,50,25,25,35,1,False
15,Beedrill,Bug,Poison,395,65,90,40,45,80,75,1,False
15,BeedrillMega Beedrill,Bug,Poison,495,65,150,40,15,80,145,1,False
46,Paras,Bug,Grass,285,35,70,55,45,55,25,1,False
47,Parasect,Bug,Grass,405,60,95,80,60,80,30,1,False
48,Venonat,Bug,Poison,305,60,55,50,40,55,45,1,False
49,Venomoth,Bug,Poison,450,70,65,60,90,75,90,1,False
123,Scyther,Bug,Flying,500,70,110,80,55,80,105,1,False
127,Pinsir,Bug,,500,65,125,100,55,70,85,1,False
127,PinsirMega Pinsir,Bug,Flying,600,65,155,120,65,90,105,1,False
165,Ledyba,Bug,Flying,265,40,20,30,40,80,55,2,False
166,Ledian,Bug,Flying,390,55,35,50,55,110,85,2,False
167,Spinarak,Bug,Poison,250,40,60,40,40,40,30,2,False
168,Ariados,Bug,Poison,390,70,90,70,60,60,40,2,False
193,Yanma,Bug,Flying,390,65,65,45,75,45,95,2,False
204,Pineco,Bug,,290,50,65,90,35,35,15,2,False
205,Forretress,Bug,Steel,465,75,90,140,60,60,40,2,False
<... snip ...>
595,Joltik,Bug,Electric,319,50,47,50,57,50,65,5,False
596,Galvantula,Bug,Electric,472,70,77,60,97,60,108,5,False
616,Shelmet,Bug,,305,50,40,85,40,65,25,5,False
617,Accelgor,Bug,,495,80,70,40,100,60,145,5,False
632,Durant,Bug,Steel,484,58,109,112,48,48,109,5,False
636,Larvesta,Bug,Fire,360,55,85,55,50,55,60,5,False
637,Volcarona,Bug,Fire,550,85,60,65,135,105,100,5,False
649,Genesect,Bug,Steel,600,71,120,95,120,95,99,5,False
664,Scatterbug,Bug,,200,38,35,40,27,25,35,6,False
665,Spewpa,Bug,,213,45,22,60,27,30,29,6,False
666,Vivillon,Bug,Flying,411,80,52,50,90,50,89,6,False
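To check split files like the one above for the original symptom (rows that landed in a directory because of their Type 2), here is a small sketch that reports any data row whose 3rd field differs from its directory name. The two files it creates are hypothetical, mimicking what the grep-based loop would produce for Paras (Type 1 Bug, Type 2 Grass).

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical output: Paras wrongly also lands in the Grass file
mkdir -p Bug Grass
printf '%s\n' '46,Paras,Bug,Grass,285,35,70,55,45,55,25,1,False' > Bug/pokemon_Bug.csv
printf '%s\n' '46,Paras,Bug,Grass,285,35,70,55,45,55,25,1,False' > Grass/pokemon_Grass.csv

bad_files=""
for f in */pokemon_*.csv; do
    dir=${f%%/*}    # directory name, e.g. "Bug"
    # Print rows (other than a "#,Name,..." header) whose Type 1 is not the directory name;
    # awk exits non-zero if it found any
    if ! awk -F, -v t="$dir" '$1 != "#" && $3 != t { print FILENAME ": " $0; bad = 1 } END { exit bad + 0 }' "$f"; then
        bad_files="$bad_files $f"
    fi
done
echo "files with wrong Type 1:${bad_files:- none}"
```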
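As noted, you could do the same thing with a shell loop, but it would be very inefficient, because the file is searched again for every type 1 name in your list. Use awk; it handles everything for you. (awk
is the Swiss Army knife of text processing.) See GNU Awk - User's Guide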
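A minimal single-pass variant of the same idea, swapping the pipe to sh for awk's built-in system() and dropping touch, on the assumption that letting > create each file on its first write is acceptable. The sample rows are illustrative.

```shell
cd "$(mktemp -d)" || exit 1
cat > pokemon.csv <<'EOF'
#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
46,Paras,Bug,Grass,285,35,70,55,45,55,25,1,False
10,Caterpie,Bug,,195,45,30,35,20,20,45,1,False
EOF

awk -F, '
NR > 1 {
    out = $3 "/pokemon_" $3 ".csv"          # e.g. Bug/pokemon_Bug.csv
    if (!($3 in made)) {                    # first record with this Type 1
        system("mkdir -p \047" $3 "\047")   # create its directory once
        made[$3] = 1
    }
    print > out    # ">" truncates on first open, then keeps appending in this run
}
' pokemon.csv
```

Because awk only truncates a file the first time it opens it, rerunning the command starts each output file fresh instead of appending to the previous run's output, unlike the >> in the question's loop.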
Without sample input/output this is an untested guess, but is this what you're trying to do?
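# print the header first, then the data rows sorted by Type 1 and number
{ head -n 1 pokemon.csv; tail -n +2 pokemon.csv | sort -t',' -k3,3 -k1,1n; } |
awk -F',' '
NR == 1 {
    hdr = $0
    next
}
$3 != prev {
    close(out)
    system("mkdir -p \047" $3 "\047")
    out = $3 "/pokemon_" $3 ".csv"
    print hdr > out
    prev = $3
}
{ print > out }
'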
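The above only ever has 1 output file open at a time, so no matter how many distinct $3 values the input contains, it won't fail with a "too many open files" error in any awk, nor will it slow down the way an awk does when it has to manage opening/closing files behind the scenes.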
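One caveat about sorting the input: sort treats the header line like any other line, and its 3rd field "Type 1" sorts among the type names, so an awk rule that expects the header at NR == 1 would instead see a data row. A minimal sketch of the problem and of keeping the header on top while sorting only the data rows (sample rows are illustrative):

```shell
cd "$(mktemp -d)" || exit 1
cat > pokemon.csv <<'EOF'
#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
46,Paras,Bug,Grass,285,35,70,55,45,55,25,1,False
10,Caterpie,Bug,,195,45,30,35,20,20,45,1,False
EOF

# Sorting the whole file: "Bug" < "Grass" < "Type 1", so the header is no longer first
first_after_plain_sort=$(sort -t',' -k3,3 -k1,1n pokemon.csv | head -n 1)

# Print the header as-is, then sort only the data rows by Type 1, then by number
sorted=$( { head -n 1 pokemon.csv; tail -n +2 pokemon.csv | sort -t',' -k3,3 -k1,1n; } )
printf '%s\n' "$sorted"
```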
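It's not at all clear why you want the output in $3/pokemon_$3.csv
rather than $3/pokemon.csv
or ./pokemon_$3.csv;
making the file name unique through both the directory and $3 seems redundant.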
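For comparison, a minimal sketch of the flat ./pokemon_$3.csv layout, which needs no mkdir at all. Copying the header line into each file is an assumption, and the sample rows are illustrative. Unlike the sorted version, this keeps one file open per type, which is fine for the 18 types in this dataset.

```shell
cd "$(mktemp -d)" || exit 1
cat > pokemon.csv <<'EOF'
#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
46,Paras,Bug,Grass,285,35,70,55,45,55,25,1,False
10,Caterpie,Bug,,195,45,30,35,20,20,45,1,False
EOF

awk -F, '
NR == 1 { hdr = $0; next }               # remember the header line
{
    out = "pokemon_" $3 ".csv"           # flat layout: ./pokemon_<Type 1>.csv
    if (!($3 in seen)) {                 # first record of this type:
        seen[$3] = 1
        print hdr > out                  # start the file with the header
    }
    print > out
}
' pokemon.csv
```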