Merge two CSV files using bash/python
I have two CSV files that I need help mapping/merging:
CSV file 1:
"ID","Name","Flavor"
"45fc754d-6a9b-4bde-b7ad-be91ae60f582","test1","m1.medium"
"83dbc739-e436-4c9f-a561-c5b40a3a6da5","test2","m1.tiny"
"ef68fcf3-f624-416d-a59b-bb8f1aa2a769","test3","m1.medium"
CSV file 2:
"Name","RAM","Disk","VCPUs"
"m1.medium",4096,40,2
"m1.xlarge",16384,160,8
"m1.tiny",128,1,1
The desired output is:
"ID","Name","Flavor","RAM","Disk","VCPUs"
"45fc754d-6a9b-4bde-b7ad-be91ae60f582","test1","m1.medium",4096,40,2
"83dbc739-e436-4c9f-a561-c5b40a3a6da5","test2","m1.tiny",128,1,1
"ef68fcf3-f624-416d-a59b-bb8f1aa2a769","test3","m1.medium",4096,40,2
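Note that Flavor in CSV file 1 and Name in CSV file 2 hold the same values. The column names differ because different tools were used to extract the information.
Also note that CSV file 2 has a flavor/name m1.xlarge. As shown above, if the m1.xlarge flavor/name is not found in CSV file 1, it should be dropped from the merged output.
I've been at this all day with mixed results. Any ideas would be appreciated.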
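Something like this, though you'll have to play with the quoting options to get the output you like. Here a.csv is CSV file 1, b.csv is CSV file 2, and the merged result is written to c.csv.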
    #!/usr/bin/env python3
    import csv

    # Index the rows of b.csv (CSV file 2) by flavor name.
    by_name = {}
    with open('b.csv', newline='') as b:
        for row in csv.DictReader(b):
            name = row.pop('Name')
            by_name[name] = row

    with open('c.csv', 'w', newline='') as c:
        w = csv.DictWriter(c, ['ID', 'Name', 'Flavor', 'RAM', 'Disk', 'VCPUs'])
        w.writeheader()

        with open('a.csv', newline='') as a:
            for row in csv.DictReader(a):
                try:
                    match = by_name[row['Flavor']]
                except KeyError:
                    # No matching flavor in b.csv: skip this row.
                    continue

                # Add RAM, Disk and VCPUs to the row from a.csv.
                row.update(match)
                w.writerow(row)
Output:
ID,Name,Flavor,RAM,Disk,VCPUs
45fc754d-6a9b-4bde-b7ad-be91ae60f582,test1,m1.medium,4096,40,2
83dbc739-e436-4c9f-a561-c5b40a3a6da5,test2,m1.tiny,128,1,1
ef68fcf3-f624-416d-a59b-bb8f1aa2a769,test3,m1.medium,4096,40,2
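As one way to "play with the quoting options", here is a minimal sketch that reproduces the quoting from the desired output (strings quoted, numbers bare). It assumes the RAM, Disk and VCPUs values are always integers, and it uses in-memory copies of the sample data so that it runs on its own. The names FILE1, FILE2 and merge are illustrative only and are not part of the script above.

```python
import csv
import io

# In-memory copies of the sample data from the question.
FILE1 = '''"ID","Name","Flavor"
"45fc754d-6a9b-4bde-b7ad-be91ae60f582","test1","m1.medium"
"83dbc739-e436-4c9f-a561-c5b40a3a6da5","test2","m1.tiny"
"ef68fcf3-f624-416d-a59b-bb8f1aa2a769","test3","m1.medium"
'''

FILE2 = '''"Name","RAM","Disk","VCPUs"
"m1.medium",4096,40,2
"m1.xlarge",16384,160,8
"m1.tiny",128,1,1
'''


def merge(file1, file2, out):
    """Join file1 rows to file2 rows on Flavor == Name (hypothetical helper)."""
    by_name = {}
    for row in csv.DictReader(file2):
        name = row.pop('Name')
        # Assumption: the remaining columns are integers. Converting them
        # lets QUOTE_NONNUMERIC write them without quotes.
        by_name[name] = {key: int(value) for key, value in row.items()}

    w = csv.DictWriter(
        out,
        ['ID', 'Name', 'Flavor', 'RAM', 'Disk', 'VCPUs'],
        quoting=csv.QUOTE_NONNUMERIC,  # quote every non-numeric field
        lineterminator='\n',
    )
    w.writeheader()
    for row in csv.DictReader(file1):
        match = by_name.get(row['Flavor'])
        if match is None:
            continue  # flavor not present in file2: drop the row
        w.writerow({**row, **match})


out = io.StringIO()
merge(io.StringIO(FILE1), io.StringIO(FILE2), out)
print(out.getvalue(), end='')
```

To use real files instead, pass file objects opened with newline='' in place of the io.StringIO wrappers.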
You can use this awk:
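    awk -v hdr='"ID","Name","Flavor","RAM","Disk","VCPUs"' 'BEGIN {
       FS=OFS=","
       print hdr
    }
    # First file (file2.csv): store RAM,Disk,VCPUs keyed by flavor name.
    NR == FNR {
       a[$1] = $2 FS $3 FS $4
       next
    }
    # Second file (file1.csv): print only rows whose Flavor was stored.
    $3 in a {
       print $0, a[$3]
    }' file2.csv file1.csv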
"ID","Name","Flavor","RAM","Disk","VCPUs"
"45fc754d-6a9b-4bde-b7ad-be91ae60f582","test1","m1.medium",4096,40,2
"83dbc739-e436-4c9f-a561-c5b40a3a6da5","test2","m1.tiny",128,1,1
"ef68fcf3-f624-416d-a59b-bb8f1aa2a769","test3","m1.medium",4096,40,2
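One thing to keep in mind with the awk command: FS="," splits on every comma, including commas inside quoted fields. The sample data has none, so the command works on it as shown. The short Python sketch below uses a made-up row (not from the question) to show how a plain comma split and a CSV-aware parser can disagree.

```python
import csv
import io

# Hypothetical row whose Name field contains a comma inside the quotes.
line = '"id-1","test, with comma","m1.medium"\n'

# Plain comma splitting, which is what FS="," does in awk.
naive = line.rstrip('\n').split(',')

# CSV-aware parsing with the standard library.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive))   # 4 fields: the quoted comma split the Name in two
print(len(parsed))  # 3 fields: the quoted comma stayed inside Name
```

If your real data can contain such fields, a CSV-aware tool is the safer choice.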
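If I understand your question correctly, and you want to match rows from the first file with rows in the second CSV whose Name column has the same value as the first file's Flavor column, then this is easy to do with xsv (which you'll likely need to install first):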
$ xsv join "Flavor" file1.csv "Name" file2.csv
ID,Name,Flavor,Name,RAM,Disk,VCPUs
45fc754d-6a9b-4bde-b7ad-be91ae60f582,test1,m1.medium,m1.medium,4096,40,2
83dbc739-e436-4c9f-a561-c5b40a3a6da5,test2,m1.tiny,m1.tiny,128,1,1
ef68fcf3-f624-416d-a59b-bb8f1aa2a769,test3,m1.medium,m1.medium,4096,40,2
You'll also have to remove the duplicate Name column, which you can again do with xsv:
$ xsv join "Flavor" file1.csv "Name" file2.csv | xsv select ID,Name,Flavor,RAM,Disk,VCPUs
ID,Name,Flavor,RAM,Disk,VCPUs
45fc754d-6a9b-4bde-b7ad-be91ae60f582,test1,m1.medium,4096,40,2
83dbc739-e436-4c9f-a561-c5b40a3a6da5,test2,m1.tiny,128,1,1
ef68fcf3-f624-416d-a59b-bb8f1aa2a769,test3,m1.medium,4096,40,2