Creating new columns by adding groups of columns

I have a DataFrame:

import pandas as pd

df = pd.DataFrame({
    'BU': ['Total', 'Total', 'Total', 'CRS', 'CRS', 'CRS'],
    'Line_Item': ['Revenues', 'EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
    'Small Business Loans < 0K 2020 ($000)': [100, 120, 0, 200, 190, 210],
    'Small Business Loans < 0K 2019 ($000)': [100, 0, 130, 200, 190, 210],
    'Small Business Loans < 0K 2018 ($000)': [200, 250, 0, 120, 0, 190],
    'Small Business Loans 0K-0K 2020 ($000)': [100, 120, 0, 200, 190, 210],
    'Small Business Loans 0K-0K 2019 ($000)': [100, 0, 130, 200, 190, 210],
    'Small Business Loans 0K-0K 2018 ($000)': [200, 250, 0, 120, 0, 190],
    'Multi Family Loans 2020 ($000)': [100, 120, 0, 200, 190, 210],
    'Multi Family Loans 2019 ($000)': [100, 0, 130, 200, 190, 210],
    'Multi Family Loans 2018 ($000)': [200, 250, 0, 120, 0, 190]
})

I want to create new columns by adding 'Small Business Loans < 0K 2020 ($000)' to 'Small Business Loans 0K-0K 2020 ($000)', and 'Small Business Loans < 0K 2019 ($000)' to 'Small Business Loans 0K-0K 2019 ($000)'.

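Basically, I want to see total loans by year.

The actual dataset has many more rows and other sets of columns like these.

If the columns had the same names in two DataFrames, I could use: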

df_add = df1.add(df2, fill_value=0)

So, with columns named like 1Q16-1 and 1Q16-2, you need to do:

df['1Q16-sum'] = df['1Q16-1'] + df['1Q16-2']
df['2Q16-sum'] = df['2Q16-1'] + df['2Q16-2']
df['3Q16-sum'] = df['3Q16-1'] + df['3Q16-2']

Or use a loop:

# Python 3.6+ (f-strings)
for i in range(1, 4):
    df[f'{i}Q16-sum'] = df[f'{i}Q16-1'] + df[f'{i}Q16-2']

# Older Python versions
for i in range(1, 4):
    prefix = str(i) + "Q16"  # avoid shadowing the built-in id()
    df[prefix + '-sum'] = df[prefix + '-1'] + df[prefix + '-2']

which gives you:

      BU Line_Item  1Q16-1  2Q16-1  3Q16-1  1Q16-2  2Q16-2  3Q16-2  1Q16-sum  2Q16-sum  3Q16-sum
0  Total  Revenues     100     100     200     100     100     200       200       200       400
1  Total       EBT     120       0     250     120       0     250       240         0       500
2  Total  Expenses       0     130       0       0     130       0         0       260         0
3    CRS  Revenues     200     200     120     200     200     120       400       400       240
4    CRS       EBT     190     190       0     190     190       0       380       380         0
5    CRS  Expenses     210     210     190     210     210     190       420       420       380

Is that what you're after?
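If you want to run the loop end to end, here is a minimal sketch. The frame is rebuilt from the values in the printed table above, and instead of hard-coding range(1, 4) it collects the quarter prefixes from the column names. That prefix-collection step is my own variation, not part of the answer above, and it assumes every column containing a '-' follows the '<prefix>-1' / '<prefix>-2' pattern.

```python
import pandas as pd

# Frame rebuilt from the values shown in the printed table
df = pd.DataFrame({
    'BU': ['Total', 'Total', 'Total', 'CRS', 'CRS', 'CRS'],
    'Line_Item': ['Revenues', 'EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
    '1Q16-1': [100, 120, 0, 200, 190, 210],
    '2Q16-1': [100, 0, 130, 200, 190, 210],
    '3Q16-1': [200, 250, 0, 120, 0, 190],
    '1Q16-2': [100, 120, 0, 200, 190, 210],
    '2Q16-2': [100, 0, 130, 200, 190, 210],
    '3Q16-2': [200, 250, 0, 120, 0, 190],
})

# Assumption: every column with a '-' is named '<prefix>-1' or '<prefix>-2'
prefixes = sorted({c.rsplit('-', 1)[0] for c in df.columns if '-' in c})

# Add one '<prefix>-sum' column per quarter prefix
for p in prefixes:
    df[f'{p}-sum'] = df[f'{p}-1'] + df[f'{p}-2']

print(df.filter(like='-sum'))
```

The prefixes are computed before the loop starts, so the new '-sum' columns are never picked up as inputs.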

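Update: for the Small Business Loans columns, try a regex filter. The suffix contains regex metacharacters ($ and parentheses), so escape it with re.escape() before putting it in the pattern:

import re

s = '($000)'
years = range(2018, 2021)

df.assign(**{
    f'SBL {y} {s}': df.filter(regex=fr'Small Business Loans.*{y}.*{re.escape(s)}').sum(axis=1)
    for y in years
})

To combine MF and SBL, change Small Business Loans to (Multi Family|Small Business Loans):

df.assign(**{
    f'Loans {y} {s}': df.filter(regex=fr'(Multi Family|Small Business Loans).*{y}.*{re.escape(s)}').sum(axis=1)
    for y in years
})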
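If you have many loan categories and don't want to list them in the pattern, another option is to pull the year out of each column name and group on it. This is only a sketch of that idea, not the approach from the answer above. The column names are shortened, hypothetical stand-ins for the real ones, and it assumes every value column contains exactly one four-digit year starting with 20.

```python
import pandas as pd

# Hypothetical, shortened column names standing in for the real ones
df = pd.DataFrame({
    'BU': ['Total', 'CRS'],
    'Line_Item': ['Revenues', 'Revenues'],
    'Small Business Loans A 2020': [100, 200],
    'Small Business Loans B 2020': [100, 200],
    'Multi Family Loans 2020': [100, 200],
    'Small Business Loans A 2019': [100, 200],
    'Small Business Loans B 2019': [50, 60],
    'Multi Family Loans 2019': [10, 20],
})

# Everything except the two label columns is treated as a value column
value_cols = df.columns.drop(['BU', 'Line_Item'])

# Pull the four-digit year out of each value column name
years = value_cols.str.extract(r'\b(20\d{2})\b', expand=False).to_numpy()

# Transpose so columns become rows, group those rows by year, sum, transpose back
totals = df[value_cols].T.groupby(years).sum().T.add_prefix('Loans ')

result = df[['BU', 'Line_Item']].join(totals)
print(result)
```

Grouping on the transposed frame avoids naming each category, at the cost of summing every value column that mentions a given year, so drop any columns you don't want included before grouping.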

You can assign() the new columns using a dict comprehension:

df = df.assign(**{
    f'{i}Q16': df[f'{i}Q16-1'] + df[f'{i}Q16-2'] for i in [1,2,3]
})

You can also sum using a like filter:

df = df.assign(**{
    f'{i}Q16': df.filter(like=f'{i}Q16').sum(1) for i in [1,2,3]
})

Output:

      BU Line_Item  1Q16-1  2Q16-1  3Q16-1  1Q16-2  2Q16-2  3Q16-2  1Q16  2Q16  3Q16
0  Total  Revenues     100     100     200     100     100     200   200   200   400 
1  Total       EBT     120       0     250     120       0     250   240     0   500 
2  Total  Expenses       0     130       0       0     130       0     0   260     0 
3    CRS  Revenues     200     200     120     200     200     120   400   400   240 
4    CRS       EBT     190     190       0     190     190       0   380   380     0 
5    CRS  Expenses     210     210     190     210     210     190   420   420   380
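One thing to watch with filter(like=...): it does plain substring matching on the column labels. If the code is run a second time on a frame that already has a '1Q16' total column, that column matches like='1Q16' too and gets added into the new total. A hedged sketch of one way around this is to anchor a regex on the '-<digits>' suffix; the helper name add_quarter_totals and the tiny two-row frame are hypothetical, for illustration only.

```python
import pandas as pd

# Small hypothetical frame using the same column naming as above
df = pd.DataFrame({
    '1Q16-1': [100, 120],
    '1Q16-2': [100, 120],
    '2Q16-1': [100, 0],
    '2Q16-2': [100, 0],
})

def add_quarter_totals(frame, quarters=(1, 2)):
    # Only columns shaped like '<i>Q16-<digits>' are summed, so an existing
    # '<i>Q16' total column is ignored if this runs a second time
    return frame.assign(**{
        f'{i}Q16': frame.filter(regex=rf'^{i}Q16-\d+$').sum(axis=1)
        for i in quarters
    })

once = add_quarter_totals(df)
twice = add_quarter_totals(once)  # totals stay the same on a second pass
print(twice)
```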