Store calculated data in a new multi-level column of a pandas DataFrame
I have a pandas DataFrame with MultiIndex columns:
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)
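Now I need to divide the values of df["bar"] by df["baz"] and store the result in the DataFrame under the name "new" (with second-level labels "one" and "two").
df["bar"] / df["baz"] gives me the correct values, but I don't understand how to store them in the DataFrame.
I tried:
df["new"] = df["bar"]/df["baz"]
and
df.loc[:, ("new", ["one", "two"])] = df["bar"]/df["baz"]
but both raise an error. Any ideas how to store the data in the DataFrame under a new name?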
You can add a level with MultiIndex.from_product and then use concat:
a = df["bar"] / df["baz"]
a.columns = pd.MultiIndex.from_product([['new'], a.columns])
print (a)
new
one two
A -1.080108 -0.876062
B 0.171536 0.278908
C 2.045792 0.795082
df1 = pd.concat([df, a], axis=1)
print (df1)
first bar baz foo qux \
second one two one two one two one
A -0.668129 -0.498210 0.618576 0.568692 1.350509 1.629589 0.301966
B -0.345811 -0.315231 -2.015971 -1.130231 -1.111846 0.237851 -0.325130
C 1.915676 0.920348 0.936398 1.157552 -0.106208 -0.088752 -0.971485
first new
second two one two
A 0.449483 -1.080108 -0.876062
B 1.944702 0.171536 0.278908
C -0.384060 2.045792 0.795082
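The concat approach above can be wrapped in a small helper. This is only a sketch: the function name add_ratio_block and its parameters are hypothetical, and it assumes both top-level labels share the same second-level labels. It also passes the original level names to from_product so the new block carries 'first' and 'second' as well.

```python
import numpy as np
import pandas as pd


def add_ratio_block(df, numerator, denominator, new_label):
    """Hypothetical helper: append numerator / denominator under a new top-level label."""
    # Selecting by a top-level label drops that level, so both operands
    # have plain columns ('one', 'two') and divide column by column.
    ratio = df[numerator] / df[denominator]
    # Rebuild two-level columns, reusing the level names of the original frame.
    ratio.columns = pd.MultiIndex.from_product(
        [[new_label], ratio.columns], names=df.columns.names
    )
    return pd.concat([df, ratio], axis=1)


np.random.seed(456)
columns = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second']
)
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=columns)

df1 = add_ratio_block(df, 'bar', 'baz', 'new')
print(df1['new'])
```

The helper returns a new frame and leaves df untouched, which mirrors the df1 = pd.concat(...) step in the answer.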
Another solution: select with xs while keeping the original level (drop_level=False), then rename, and finally join:
a = ((df.xs("bar", axis=1, level=0, drop_level=False) / df["baz"])
     .rename(columns={'bar': 'new'}))
df1 = df.join(a)
print (df1)
first bar baz foo qux \
second one two one two one two one
A -0.668129 -0.498210 0.618576 0.568692 1.350509 1.629589 0.301966
B -0.345811 -0.315231 -2.015971 -1.130231 -1.111846 0.237851 -0.325130
C 1.915676 0.920348 0.936398 1.157552 -0.106208 -0.088752 -0.971485
first new
second two one two
A 0.449483 -1.080108 -0.876062
B 1.944702 0.171536 0.278908
C -0.384060 2.045792 0.795082
And a solution that reshapes with stack and unstack, which should be slower on a large df:
df1 = df.stack()
df1['new'] = df1["bar"] / df1["baz"]
df1 = df1.unstack()
print (df1)
first bar baz foo qux \
second one two one two one two one
A -0.668129 -0.498210 0.618576 0.568692 1.350509 1.629589 0.301966
B -0.345811 -0.315231 -2.015971 -1.130231 -1.111846 0.237851 -0.325130
C 1.915676 0.920348 0.936398 1.157552 -0.106208 -0.088752 -0.971485
first new
second two one two
A 0.449483 -1.080108 -0.876062
B 1.944702 0.171536 0.278908
C -0.384060 2.045792 0.795082
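To check the speed remark on your own data, a rough comparison can be sketched as follows. The frame size, the labels k0, k1, ... and the function names via_concat and via_stack are all assumptions made for illustration; timings depend on the pandas version and the machine, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: 2000 rows, 50 top-level labels with 2 sub-columns each.
rng = np.random.default_rng(0)
top = [f'k{i}' for i in range(50)]
columns = pd.MultiIndex.from_product([top, ['one', 'two']], names=['first', 'second'])
big = pd.DataFrame(rng.standard_normal((2000, 100)), columns=columns)


def via_concat(df):
    # Divide two blocks, relabel the result, and append it as new columns.
    a = df['k0'] / df['k1']
    a.columns = pd.MultiIndex.from_product([['new'], a.columns], names=df.columns.names)
    return pd.concat([df, a], axis=1)


def via_stack(df):
    # Move the second column level into the row index, add the ratio
    # as an ordinary column, then move the level back to the columns.
    s = df.stack()
    s['new'] = s['k0'] / s['k1']
    return s.unstack()


t_concat = timeit.timeit(lambda: via_concat(big), number=5)
t_stack = timeit.timeit(lambda: via_stack(big), number=5)
print(f'concat: {t_concat:.4f}s, stack/unstack: {t_stack:.4f}s')
```

Both functions should yield the same values under 'new'; only the amount of reshaping work differs.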
Solution with loc:
a = (df.loc(axis=1)['bar', :] / df["baz"]).rename(columns={'bar':'new'})
print (a)
first new
second one two
A -1.080108 -0.876062
B 0.171536 0.278908
C 2.045792 0.795082
df1 = df.join(a)
print (df1)
first bar baz foo qux \
second one two one two one two one
A -0.668129 -0.498210 0.618576 0.568692 1.350509 1.629589 0.301966
B -0.345811 -0.315231 -2.015971 -1.130231 -1.111846 0.237851 -0.325130
C 1.915676 0.920348 0.936398 1.157552 -0.106208 -0.088752 -0.971485
first new
second two one two
A 0.449483 -1.080108 -0.876062
B 1.944702 0.171536 0.278908
C -0.384060 2.045792 0.795082
Setup:
np.random.seed(456)
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)
print (df)
first bar baz foo qux \
second one two one two one two one
A -0.668129 -0.498210 0.618576 0.568692 1.350509 1.629589 0.301966
B -0.345811 -0.315231 -2.015971 -1.130231 -1.111846 0.237851 -0.325130
C 1.915676 0.920348 0.936398 1.157552 -0.106208 -0.088752 -0.971485
first
second two
A 0.449483
B 1.944702
C -0.384060
Option 1 (the output below was produced with a different random df than the setup above):
In [200]: df.join((df[['bar']]/df['baz']).rename(columns={'bar':'new'}))
Out[200]:
first bar baz foo qux new
second one two one two one two one two one two
A -1.089798 2.053026 0.470218 1.440740 -0.536965 -0.667857 0.717725 -1.202051 -2.317647 1.424980
B 0.488875 0.428836 1.413451 -0.683677 -1.293274 0.374481 0.074252 -1.195414 0.345873 -0.627250
C -0.243064 -0.069446 -0.911166 0.478370 -0.948390 -0.366823 -1.499948 1.513508 0.266761 -0.145172
Explanation:
In [201]: df[['bar']]/df['baz']
Out[201]:
first bar
second one two
A -2.317647 1.424980
B 0.345873 -0.627250
C 0.266761 -0.145172
In [202]: (df[['bar']]/df['baz']).rename(columns={'bar':'new'})
Out[202]:
first new
second one two
A -2.317647 1.424980
B 0.345873 -0.627250
C 0.266761 -0.145172
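The reason Option 1 uses df[['bar']] rather than df['bar'] is that a list key keeps the first column level, so there is still a 'bar' label left to rename. The sketch below illustrates that difference on a frame built like the one in the question; passing level=0 to rename is an optional addition, not part of the original answer, that limits the renaming to the first level.

```python
import numpy as np
import pandas as pd

np.random.seed(456)
columns = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second']
)
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=columns)

# A scalar key drops the selected level: plain columns 'one', 'two'.
dropped = df['bar']
# A list key keeps both levels: ('bar', 'one'), ('bar', 'two').
kept = df[['bar']]
print(dropped.columns.nlevels, kept.columns.nlevels)

# Rename only in the first level, so a second-level label that happened
# to equal 'bar' would be left alone.
renamed = kept.rename(columns={'bar': 'new'}, level=0)
print(list(renamed.columns))
```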