如何将 CSV 列数据拆分为 Python 中的两列
How to split CSV column data into two colums in Python
我有以下代码(如下),它抓取 CSV 文件并将数据合并到一个合并的 CSV 文件中。
我现在需要从其中一列中获取特定信息,将该信息添加到另一列中。
我现在有一个 output.csv 文件,其中包含以下示例数据:
ID,Name,Flavor,RAM,Disk,VCPUs
45fc754d-6a9b-4bde-b7ad-be91ae60f582,customer1-test1-dns,m1.medium,4096,40,2
83dbc739-e436-4c9f-a561-c5b40a3a6da5,customer2-test2,m1.tiny,128,1,1
ef68fcf3-f624-416d-a59b-bb8f1aa2a769,customer3-test3-dns-api,m1.medium,4096,40,2
我需要做的是打开这个 CSV 文件并将名称列中的数据分成两列,如下所示:
ID,Name,Flavor,RAM,Disk,VCPUs,Customer,Misc
45fc754d-6a9b-4bde-b7ad-be91ae60f582,customer1-test1-dns,m1.medium,4096,40,2,customer1,test1-dns
83dbc739-e436-4c9f-a561-c5b40a3a6da5,customer2-test2,m1.tiny,128,1,1,customer2,test2
ef68fcf3-f624-416d-a59b-bb8f1aa2a769,customer3-test3-dns-api,m1.medium,4096,40,2,customer3,test3-dns-api
请注意 Misc 列如何将多个值拆分为一个或多个 -
。
如何通过 Python 完成此操作?下面是我现在的代码:
import csv
import os
import pandas as pd
by_name = {}
with open('flavor.csv') as b:
for row in csv.DictReader(b):
name = row.pop('Name')
by_name[name] = row
with open('output.csv', 'w') as c:
w = csv.DictWriter(c, ['ID', 'Name', 'Flavor', 'RAM', 'Disk', 'VCPUs'])
w.writeheader()
with open('instance.csv') as a:
for row in csv.DictReader(a):
try:
match = by_name[row['Flavor']]
except KeyError:
continue
row.update(match)
w.writerow(row)
试试这个:
import pandas as pd
df = pd.read_csv('flavor.csv')
df[['Customer','Misc']] = df.Name.str.split('-', n=1, expand=True)
df
输出:
ID Name Flavor RAM Disk VCPUs Customer Misc
0 45fc754d-6a9b-4bde-b7ad-be91ae60f582 customer1-test1-dns m1.medium 4096 40 2 customer1 test1-dns
1 83dbc739-e436-4c9f-a561-c5b40a3a6da5 customer2-test2 m1.tiny 128 1 1 customer2 test2
2 ef68fcf3-f624-416d-a59b-bb8f1aa2a769 customer3-test3-dns-api m1.medium 4096 40 2 customer3 test3-dns-api
我建议切换到 pandas
。这里是官方Getting Started documentation.
让我们先读入 csv。
import pandas as pd
df = pd.read_csv('input.csv')
print(df.head(1))
你应该得到类似的东西:
ID Name Flavor RAM Disk VCPUs
0 45fc754d-6a9b-4bde-b7ad-be91ae60f582 customer1-test1-dns m1.medium 4096 40 2
之后,在Pandas系列中使用字符串操作:
df[['Customer','Misc']] = df.Name.str.split('-', n=1, expand=True)
最后,您可以保存 csv。
df.to_csv('output.csv')
如果您使用 pandas,这段代码会更加优雅和简单。
import pandas as pd
df = pd.read_csv('flavor.csv')
df[['Customer','Misc']] = df['Name'].str.split(pat='-',n=1,expand=True)
df.to_csv('output.csv',index=False)
这是我的做法。诀窍在于函数 "split()" :
import pandas as pd
file = pd.read_csv(r"C:\...\yourfile.csv",sep=",")
file['Customer']=None
file['Misc']=None
for x in range(len(file)):
temp=file.Name[x].split('-', maxsplit=1)
file['Customer'].iloc[x] = temp[0]
file['Misc'].iloc[x] = temp[1]
file.to_csv(r"C:\...\yourfile_result.csv")
我有以下代码(如下),它抓取 CSV 文件并将数据合并到一个合并的 CSV 文件中。
我现在需要从其中一列中获取特定信息,将该信息添加到另一列中。
我现在有一个 output.csv 文件,其中包含以下示例数据:
ID,Name,Flavor,RAM,Disk,VCPUs
45fc754d-6a9b-4bde-b7ad-be91ae60f582,customer1-test1-dns,m1.medium,4096,40,2
83dbc739-e436-4c9f-a561-c5b40a3a6da5,customer2-test2,m1.tiny,128,1,1
ef68fcf3-f624-416d-a59b-bb8f1aa2a769,customer3-test3-dns-api,m1.medium,4096,40,2
我需要做的是打开这个 CSV 文件并将名称列中的数据分成两列,如下所示:
ID,Name,Flavor,RAM,Disk,VCPUs,Customer,Misc
45fc754d-6a9b-4bde-b7ad-be91ae60f582,customer1-test1-dns,m1.medium,4096,40,2,customer1,test1-dns
83dbc739-e436-4c9f-a561-c5b40a3a6da5,customer2-test2,m1.tiny,128,1,1,customer2,test2
ef68fcf3-f624-416d-a59b-bb8f1aa2a769,customer3-test3-dns-api,m1.medium,4096,40,2,customer3,test3-dns-api
请注意 Misc 列如何将多个值拆分为一个或多个 -
。
如何通过 Python 完成此操作?下面是我现在的代码:
import csv
import os
import pandas as pd
by_name = {}
with open('flavor.csv') as b:
for row in csv.DictReader(b):
name = row.pop('Name')
by_name[name] = row
with open('output.csv', 'w') as c:
w = csv.DictWriter(c, ['ID', 'Name', 'Flavor', 'RAM', 'Disk', 'VCPUs'])
w.writeheader()
with open('instance.csv') as a:
for row in csv.DictReader(a):
try:
match = by_name[row['Flavor']]
except KeyError:
continue
row.update(match)
w.writerow(row)
试试这个:
import pandas as pd
df = pd.read_csv('flavor.csv')
df[['Customer','Misc']] = df.Name.str.split('-', n=1, expand=True)
df
输出:
ID Name Flavor RAM Disk VCPUs Customer Misc
0 45fc754d-6a9b-4bde-b7ad-be91ae60f582 customer1-test1-dns m1.medium 4096 40 2 customer1 test1-dns
1 83dbc739-e436-4c9f-a561-c5b40a3a6da5 customer2-test2 m1.tiny 128 1 1 customer2 test2
2 ef68fcf3-f624-416d-a59b-bb8f1aa2a769 customer3-test3-dns-api m1.medium 4096 40 2 customer3 test3-dns-api
我建议切换到 pandas
。这里是官方Getting Started documentation.
让我们先读入 csv。
import pandas as pd
df = pd.read_csv('input.csv')
print(df.head(1))
你应该得到类似的东西:
ID Name Flavor RAM Disk VCPUs
0 45fc754d-6a9b-4bde-b7ad-be91ae60f582 customer1-test1-dns m1.medium 4096 40 2
之后,在Pandas系列中使用字符串操作:
df[['Customer','Misc']] = df.Name.str.split('-', n=1, expand=True)
最后,您可以保存 csv。
df.to_csv('output.csv')
如果您使用 pandas,这段代码会更加优雅和简单。
import pandas as pd
df = pd.read_csv('flavor.csv')
df[['Customer','Misc']] = df['Name'].str.split(pat='-',n=1,expand=True)
df.to_csv('output.csv',index=False)
这是我的做法。诀窍在于函数 "split()" :
import pandas as pd
file = pd.read_csv(r"C:\...\yourfile.csv",sep=",")
file['Customer']=None
file['Misc']=None
for x in range(len(file)):
temp=file.Name[x].split('-', maxsplit=1)
file['Customer'].iloc[x] = temp[0]
file['Misc'].iloc[x] = temp[1]
file.to_csv(r"C:\...\yourfile_result.csv")