How can I iterate over a Pandas pivot table? (A MultiIndex DataFrame?)
I have a pivot table, table, that I want to iterate over so that I can store it in a database.
                                          age  weekly_income
category_weekly_income category_age
High income            Middle aged  45.527721   15015.463667
                       Old          70.041456   14998.104486
                       Young        14.995210   15003.750822
Low income             Middle aged  45.548155    1497.228548
                       Old          70.049987    1505.655319
                       Young        15.013538    1501.718198
Middle income          Middle aged  45.516583    6514.830294
                       Old          69.977657    6494.626962
                       Young        15.020688    6487.661554
I've played around with reshape, melt, various for loops, syntax stabs in the dark, chains of stack and unstack, reset_index, and so on. The closest I've got is this syntax:
crosstab[1:2].age
With this I can pull out individual value cells, but I can't get at the values of the index.
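For the narrower problem of reading the index values while iterating: DataFrame.itertuples() yields the row's index as the first element, and with a MultiIndex that element is itself a tuple of the level values. A minimal sketch, assuming a small hypothetical stand-in for the pivot table built with set_index() from a subset of the rows shown above:

```python
import pandas as pd

# Hypothetical stand-in for the pivot table: a few of its rows, with the two
# category columns moved into a two-level MultiIndex
table = pd.DataFrame({
    "category_weekly_income": ["High income", "High income", "Low income"],
    "category_age": ["Middle aged", "Old", "Young"],
    "age": [45.527721, 70.041456, 15.013538],
    "weekly_income": [15015.463667, 14998.104486, 1501.718198],
}).set_index(["category_weekly_income", "category_age"])

rows = []
# name=None yields plain tuples; element 0 is the (level 0, level 1) index tuple
for (income_cat, age_cat), age, weekly_income in table.itertuples(name=None):
    rows.append((income_cat, age_cat, age, weekly_income))

for row in rows:
    print(row)
```

Calling table.reset_index() first and then itertuples(index=False, name=None) produces the same flat tuples, since reset_index() turns the index levels into ordinary leading columns.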
You don't need to iterate over the DataFrame at all; Pandas already provides a way to write a DataFrame to SQL through DataFrame.to_sql(...).
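A minimal to_sql() sketch, assuming an in-memory SQLite database through the standard-library sqlite3 module (pandas accepts a sqlite3 connection directly; other databases are normally reached through an SQLAlchemy engine) and a hypothetical table name income_by_age. With index=True, each MultiIndex level is written as its own column, named after the level:

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the pivot table (a few of its rows)
table = pd.DataFrame({
    "category_weekly_income": ["High income", "High income", "Low income"],
    "category_age": ["Middle aged", "Old", "Young"],
    "age": [45.527721, 70.041456, 15.013538],
    "weekly_income": [15015.463667, 14998.104486, 1501.718198],
}).set_index(["category_weekly_income", "category_age"])

conn = sqlite3.connect(":memory:")
# index=True writes both index levels as ordinary columns ahead of the data columns
table.to_sql("income_by_age", conn, index=True, if_exists="replace")

# Inspect what was created: column names, then the stored rows in insertion order
columns = [info[1] for info in conn.execute("PRAGMA table_info(income_by_age)")]
rows = conn.execute("SELECT * FROM income_by_age ORDER BY rowid").fetchall()
print(columns)
print(rows)
```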
Alternatively, if you want to insert the data into the database manually, you can use Pandas' to_csv(), for example:
Say I have a df like this:
df
                     A         B
first second
bar   one     0.826425 -1.126757
      two     0.682297  0.875014
baz   one    -1.714757 -0.436622
      two    -0.366858  0.341702
foo   one    -1.068390 -1.074582
      two     0.863934  0.043367
qux   one    -0.510881  0.215230
      two     0.760373  0.274389
# header=False drops the column-name row; index=True keeps the MultiIndex levels as the leading fields
print(df.to_csv(header=False, index=True))
bar,one,0.8264252111679552,-1.1267570930327846
bar,two,0.6822970851678805,0.8750144682657339
baz,one,-1.7147570530422946,-0.43662238320911956
baz,two,-0.3668584476904599,0.341701643567155
foo,one,-1.068390451744478,-1.0745823278191735
foo,two,0.8639343368644695,0.043366628502542914
qux,one,-0.5108806384876237,0.21522973766619563
qux,two,0.7603733646419842,0.2743886250125428
This gives you a clean comma-separated format that is easy to feed into an SQL execute call, for example:
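data = []
# splitlines() also copes with \r\n line endings
for line in df.to_csv(header=False, index=True).splitlines():
    if line:  # skip blank lines
        # every field, including the numbers, ends up as a string
        data.append(tuple(line.split(',')))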
data
[('bar', 'one', '0.8264252111679552', '-1.1267570930327846'),
('bar', 'two', '0.6822970851678805', '0.8750144682657339'),
('baz', 'one', '-1.7147570530422946', '-0.43662238320911956'),
('baz', 'two', '-0.3668584476904599', '0.341701643567155'),
('foo', 'one', '-1.068390451744478', '-1.0745823278191735'),
('foo', 'two', '0.8639343368644695', '0.043366628502542914'),
('qux', 'one', '-0.5108806384876237', '0.21522973766619563'),
('qux', 'two', '0.7603733646419842', '0.2743886250125428')]
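One caveat with splitting on commas: if an index label itself contains a comma, to_csv() quotes that field and a plain split(',') tears the row apart. A sketch of the same conversion using the standard-library csv module instead, with a hypothetical label "bar, inc" to show the difference:

```python
import csv
import io

import pandas as pd

# Hypothetical frame whose first index label contains a comma
index = pd.MultiIndex.from_tuples([("bar, inc", "one"), ("baz", "two")],
                                  names=["first", "second"])
df = pd.DataFrame({"A": [0.5, -1.25], "B": [1.5, 0.25]}, index=index)

text = df.to_csv(header=False, index=True)

# Naive split: the quoted label is broken into two fields
naive = [tuple(line.split(",")) for line in text.splitlines() if line]

# csv.reader understands the quoting that to_csv() applied
parsed = [tuple(row) for row in csv.reader(io.StringIO(text)) if row]

print(naive[0])
print(parsed[0])
```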
Then all you need is an executemany:
...
# %s placeholders suit drivers such as MySQLdb or psycopg2; sqlite3 expects ? instead
stmt = "INSERT INTO table (first, second, A, B) VALUES (%s, %s, %s, %s)"
cursor.executemany(stmt, data)
...
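A self-contained sketch of the whole to_csv() plus executemany() route, assuming the standard-library sqlite3 module (which uses ? placeholders rather than %s), a hypothetical table name pivot_rows, and made-up values in place of the df above:

```python
import sqlite3

import pandas as pd

# Hypothetical two-level frame mirroring the df above (values are made up)
index = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]],
                                   names=["first", "second"])
df = pd.DataFrame({"A": [0.5, -1.25, 2.0, 0.75],
                   "B": [1.5, 0.25, -0.5, 3.0]}, index=index)

# Same to_csv() trick as above: one tuple of strings per row
data = [tuple(line.split(","))
        for line in df.to_csv(header=False, index=True).splitlines()
        if line]

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
# "first" is quoted defensively because FIRST is on SQLite's keyword list;
# the REAL columns make SQLite convert the numeric strings back to floats
cursor.execute('CREATE TABLE pivot_rows ("first" TEXT, "second" TEXT, A REAL, B REAL)')
stmt = 'INSERT INTO pivot_rows ("first", "second", A, B) VALUES (?, ?, ?, ?)'
cursor.executemany(stmt, data)
conn.commit()

rows = cursor.execute("SELECT * FROM pivot_rows ORDER BY rowid").fetchall()
print(rows)
```

Other drivers differ in placeholder style and in how they coerce string values into numeric columns, so converting the numbers explicitly before inserting may be safer outside SQLite.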
Hope this helps.