How to group a pandas.DataFrame's columns based on the index and values of another pandas.Series?
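I am trying to group the columns of a DataFrame together based on the values and index of another pandas.Series. The Series' index refers to the DataFrame's columns, but it may contain more elements. What is the most pythonic way to do this?
To illustrate further, here is the unit test I am trying to get to pass (using pytest):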
    import pandas as pd
    # clb is the module under test, which defines sum_weights_by_classification_labels

    def test_sum_weights_by_classification_labels_default_arguments():
        portfolio_weights = pd.DataFrame([[0.1, 0.3, 0.4, 0.2],
                                          [0.25, 0.3, 0.25, 0.2],
                                          [0.2, 0.3, 0.1, 0.4]],
                                         index=['2001-01-02', '2001-01-03', '2001-01-04'],
                                         columns=['ABC', 'DEF', 'UVW', 'XYZ'])
        security_classification = pd.Series(['Consumer', 'Energy', 'Consumer', 'Materials', 'Financials', 'Energy'],
                                            index=['ABC', 'DEF', 'GHI', 'RST', 'UVW', 'XYZ'],
                                            name='Classification')
        result_sector_weights = pd.DataFrame([[0.1, 0.5, 0.4],
                                              [0.25, 0.5, 0.25],
                                              [0.2, 0.7, 0.1]],
                                             index=['2001-01-02', '2001-01-03', '2001-01-04'],
                                             columns=['Consumer', 'Energy', 'Financials'])
        pd.testing.assert_frame_equal(clb.sum_weights_by_classification_labels(portfolio_weights, security_classification),
                                      result_sector_weights)
Thanks a lot!
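After some more research, I found a solution. Here is what I came up with, mapping the DataFrame's columns through the Series (pandas.Index.map, the Index counterpart of pandas.Series.map):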
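    def sum_weights_by_classification_labels(security_weights, security_classification):
        # Work on a copy so the caller's DataFrame keeps its original column labels
        classification_weights = security_weights.copy()
        # Replace each security name with its classification label
        classification_weights.columns = classification_weights.columns.map(security_classification)
        # Sum the columns that now share a label (axis=1 is deprecated since pandas 2.1)
        classification_weights = classification_weights.groupby(classification_weights.columns, axis=1).sum()
        return classification_weights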
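Since the axis=1 argument of groupby is deprecated in recent pandas releases (2.1 and later, as far as I can tell), the same idea can be sketched by grouping the transposed frame instead. This is only an illustration under that assumption, not the original answer's code; the function name sum_weights_by_label and the variable names below are made up for the example.

```python
import pandas as pd


def sum_weights_by_label(security_weights, security_classification):
    """Sum the columns of security_weights that share a classification label."""
    # Look up each column's label; columns absent from the Series map to NaN
    labels = security_weights.columns.map(security_classification)
    # Group the transposed rows by label, sum, then transpose back.
    # groupby drops NaN keys by default, so unlabeled columns are left out.
    return security_weights.T.groupby(labels).sum().T


# Same data as in the unit test from the question
weights = pd.DataFrame([[0.1, 0.3, 0.4, 0.2],
                        [0.25, 0.3, 0.25, 0.2],
                        [0.2, 0.3, 0.1, 0.4]],
                       index=['2001-01-02', '2001-01-03', '2001-01-04'],
                       columns=['ABC', 'DEF', 'UVW', 'XYZ'])
classification = pd.Series(['Consumer', 'Energy', 'Consumer', 'Materials', 'Financials', 'Energy'],
                           index=['ABC', 'DEF', 'GHI', 'RST', 'UVW', 'XYZ'],
                           name='Classification')
sector_weights = sum_weights_by_label(weights, classification)
print(sector_weights)
```

The extra Series entries (GHI, RST) are simply never looked up, so they do not appear in the result.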
Or, using pandas.DataFrame.merge:
    def sum_weights_by_classification_labels(security_weights, security_classification):
        # Transpose so that each security becomes a row that can be merged on its name
        security_weights_transposed = security_weights.transpose()
        merged_data = security_weights_transposed.merge(security_classification, how='left', left_index=True,
                                                        right_index=True)
        # Group the rows by the merged label column, sum, then transpose back
        classification_weights = merged_data.groupby(security_classification.name).sum().transpose()
        return classification_weights
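For the second solution, the following line has to be added to the unit test. A Series without a name cannot be merged (the added column needs one), and that name ends up as the name of the result's columns index, which assert_frame_equal compares by default:
    result_sector_weights.columns.name = security_classification.name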
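If changing the test is not desirable, one option is to clear the columns name inside the function instead. The sketch below assumes that is acceptable for the caller; the function name sum_weights_by_label_via_merge and the variable names are hypothetical, chosen only for this example.

```python
import pandas as pd


def sum_weights_by_label_via_merge(security_weights, security_classification):
    """Merge-based variant that returns unnamed columns."""
    # merge needs a named Series: the name becomes the label column
    merged = security_weights.T.merge(security_classification, how='left',
                                      left_index=True, right_index=True)
    result = merged.groupby(security_classification.name).sum().T
    # Drop the 'Classification' name carried over from the groupby key
    result.columns.name = None
    return result


# Same data as in the unit test from the question
weights_df = pd.DataFrame([[0.1, 0.3, 0.4, 0.2],
                           [0.25, 0.3, 0.25, 0.2],
                           [0.2, 0.3, 0.1, 0.4]],
                          index=['2001-01-02', '2001-01-03', '2001-01-04'],
                          columns=['ABC', 'DEF', 'UVW', 'XYZ'])
labels_series = pd.Series(['Consumer', 'Energy', 'Consumer', 'Materials', 'Financials', 'Energy'],
                          index=['ABC', 'DEF', 'GHI', 'RST', 'UVW', 'XYZ'],
                          name='Classification')
merged_result = sum_weights_by_label_via_merge(weights_df, labels_series)
print(merged_result)
```

Another possibility is to leave the function as is and pass check_names=False to pd.testing.assert_frame_equal, at the cost of no longer checking index and column names at all.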
I am leaving this post up in the hope that it helps someone in the future.