How to add rows to a DataFrame iteratively, updating it after each for loop, in Python (preferably pandas)
This is my .csv file:
Choco_Type,ID,Cocoa,Milk,Sugar,ID,Cocoa,Milk,Sugar
Dark,Batch_11,80,0,16,Batch_12,78,0,14
Milk,Batch_72,35,25,25,Batch_73,32,27,22
Swiss,Batch_52,30,30,20,Batch_53,28,33,18
This is my code:
for row in reader_in:
    type_chocolate = row[0]
    a = [(type_chocolate, row[1], row[2], row[3], row[4]), (type_chocolate, row[5], row[6], row[7], row[8])]
    df = DataFrame.from_records(a)
This should be my output DataFrame:
Choco_Type ID Cocoa Milk Sugar
Dark Batch_11 80 0 16
Dark Batch_12 78 0 14
Milk Batch_72 35 25 25
Milk Batch_73 32 27 22
Swiss Batch_52 30 30 20
Swiss Batch_53 28 33 18
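I can't figure out how to update the DataFrame 'df' with new rows after each 'for' loop iteration. The new rows are built with the 'from_records' function from each row of reader_in, which is the input.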
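First use read_csv to create a DataFrame from the csv. Then use replace to strip the .1 suffix that read_csv appends to column names to avoid duplicates. Finally, set_index with the first column and use concat, selecting the first 4 and last 4 columns with iloc: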
import pandas as pd
from io import StringIO  # pandas.compat.StringIO is gone in current pandas
temp=u"""Choco_Type,ID,Cocoa,Milk,Sugar,ID,Cocoa,Milk,Sugar
Dark,Batch_11,80,0,16,Batch_12,78,0,14
Milk,Batch_72,35,25,25,Batch_73,32,27,22
Swiss,Batch_52,30,30,20,Batch_53,28,33,18"""
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp))
print (df)
Choco_Type ID Cocoa Milk Sugar ID.1 Cocoa.1 Milk.1 Sugar.1
0 Dark Batch_11 80 0 16 Batch_12 78 0 14
1 Milk Batch_72 35 25 25 Batch_73 32 27 22
2 Swiss Batch_52 30 30 20 Batch_53 28 33 18
# regex=False so '.1' is matched literally, not as "any character followed by 1"
df.columns = df.columns.str.replace('.1', '', regex=False)
df = df.set_index('Choco_Type')
df = pd.concat([df.iloc[:, :4], df.iloc[:, 4:]]).reset_index()
print (df)
Choco_Type ID Cocoa Milk Sugar
0 Dark Batch_11 80 0 16
1 Milk Batch_72 35 25 25
2 Swiss Batch_52 30 30 20
3 Dark Batch_12 78 0 14
4 Milk Batch_73 32 27 22
5 Swiss Batch_53 28 33 18
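If you need the row order to match the desired output:
# start again from the DataFrame returned by read_csv
df.columns = df.columns.str.replace('.1', '', regex=False)
df = df.set_index('Choco_Type')
# the parentheses let the method chain span several lines
df = (pd.concat([df.iloc[:, :4], df.iloc[:, 4:]], keys=(1,2), axis=1)
        .stack(0)
        .reset_index(level=1, drop=True)
        .reset_index())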
print (df)
Choco_Type ID Cocoa Milk Sugar
0 Dark Batch_11 80 0 16
1 Dark Batch_12 78 0 14
2 Milk Batch_72 35 25 25
3 Milk Batch_73 32 27 22
4 Swiss Batch_52 30 30 20
5 Swiss Batch_53 28 33 18
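The interleaved order can also be reached without stack. The following is only a sketch, not part of the original answer: it assumes the default RangeIndex from read_csv is kept, so both halves share row labels 0, 1, 2 and a stable sort on the index puts each first batch directly before its second batch. The names first, second and out are chosen here for illustration.

```python
from io import StringIO

import pandas as pd

temp = """Choco_Type,ID,Cocoa,Milk,Sugar,ID,Cocoa,Milk,Sugar
Dark,Batch_11,80,0,16,Batch_12,78,0,14
Milk,Batch_72,35,25,25,Batch_73,32,27,22
Swiss,Batch_52,30,30,20,Batch_53,28,33,18"""

df = pd.read_csv(StringIO(temp))
# strip the literal '.1' suffix added for the duplicated column names
df.columns = df.columns.str.replace('.1', '', regex=False)

# select both halves by position; each keeps Choco_Type as its first column
first = df.iloc[:, :5]
second = df.iloc[:, [0, 5, 6, 7, 8]]

# the halves share index labels 0..2, so a stable sort interleaves them
out = (pd.concat([first, second])
         .sort_index(kind='mergesort')
         .reset_index(drop=True))
print(out)
```

Sorting on the original row position rather than on Choco_Type keeps the file's row order even when the chocolate types are not in alphabetical order.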
Another solution uses pd.lreshape with a dict created by a dict comprehension. The dict keys are the column names that do not contain .1, and Choco_Type also has to be excluded:
# df here is the DataFrame returned by read_csv, still carrying the '.1' suffixes
cols = df.columns[~((df.columns.str.contains('.1', regex=False)) | (df.columns == 'Choco_Type'))]
print (cols)
Index(['ID', 'Cocoa', 'Milk', 'Sugar'], dtype='object')
d = {x: df.columns[df.columns.str.contains(x)].tolist() for x in cols}
print (d)
{'Milk': ['Milk', 'Milk.1'],
'Sugar': ['Sugar', 'Sugar.1'],
'ID': ['ID', 'ID.1'],
'Cocoa': ['Cocoa', 'Cocoa.1']}
df = pd.lreshape(df, d)
print (df)
Choco_Type Milk Sugar ID Cocoa
0 Dark 0 16 Batch_11 80
1 Milk 25 25 Batch_72 35
2 Swiss 30 20 Batch_52 30
3 Dark 0 14 Batch_12 78
4 Milk 27 22 Batch_73 32
5 Swiss 33 18 Batch_53 28
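If the loop from the question is to be kept, a common pattern is to collect the records in a plain list and build the DataFrame once after the loop, instead of recreating df on every iteration. The following is a minimal sketch under that assumption. It uses csv.reader for reader_in (the question does not show how reader_in is created), and the names records, columns and num_cols are illustrative.

```python
import csv
from io import StringIO

import pandas as pd

# sample data from the question; for a real file use open('filename.csv', newline='')
temp = """Choco_Type,ID,Cocoa,Milk,Sugar,ID,Cocoa,Milk,Sugar
Dark,Batch_11,80,0,16,Batch_12,78,0,14
Milk,Batch_72,35,25,25,Batch_73,32,27,22
Swiss,Batch_52,30,30,20,Batch_53,28,33,18"""

reader_in = csv.reader(StringIO(temp))
header = next(reader_in)   # the first line holds the column names
columns = header[:5]       # Choco_Type, ID, Cocoa, Milk, Sugar

records = []
for row in reader_in:
    type_chocolate = row[0]
    # each input row yields two output rows, one per batch
    records.append((type_chocolate, *row[1:5]))
    records.append((type_chocolate, *row[5:9]))

# build the DataFrame once, after the loop
df = pd.DataFrame.from_records(records, columns=columns)

# csv.reader yields strings, so convert the numeric columns explicitly
num_cols = ['Cocoa', 'Milk', 'Sugar']
df[num_cols] = df[num_cols].apply(pd.to_numeric)
print(df)
```

Appending to a list and constructing the DataFrame at the end avoids copying the whole frame on each iteration, and the rows already come out in the order shown in the desired output.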