Creating new columns by adding groups of columns
I have a DataFrame:
import pandas as pd

df = pd.DataFrame({
    'BU': ['Total', 'Total', 'Total', 'CRS', 'CRS', 'CRS'],
    'Line_Item': ['Revenues', 'EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
    'Small Business Loans < 0K 2020 ($000)': [100, 120, 0, 200, 190, 210],
    'Small Business Loans < 0K 2019 ($000)': [100, 0, 130, 200, 190, 210],
    'Small Business Loans < 0K 2018 ($000)': [200, 250, 0, 120, 0, 190],
    'Small Business Loans 0K-0K 2020 ($000)': [100, 120, 0, 200, 190, 210],
    'Small Business Loans 0K-0K 2019 ($000)': [100, 0, 130, 200, 190, 210],
    'Small Business Loans 0K-0K 2018 ($000)': [200, 250, 0, 120, 0, 190],
    'Multi Family Loans 2020 ($000)': [100, 120, 0, 200, 190, 210],
    'Multi Family Loans 2019 ($000)': [100, 0, 130, 200, 190, 210],
    'Multi Family Loans 2018 ($000)': [200, 250, 0, 120, 0, 190]
})
I want to create new columns that add 'Small Business Loans < 0K 2020 ($000)' to 'Small Business Loans 0K-0K 2020 ($000)', and 'Small Business Loans < 0K 2019 ($000)' to 'Small Business Loans 0K-0K 2019 ($000)'.
Basically, I want to see total loans by year.
The actual dataset has many more rows and other column sets like these.
If the columns had the same names in two DataFrames, I could use
df_add = df1.add(df2, fill_value=0)
So you need to do:
df['1Q16-sum'] = df['1Q16-1'] + df['1Q16-2']
df['2Q16-sum'] = df['2Q16-1'] + df['2Q16-2']
df['3Q16-sum'] = df['3Q16-1'] + df['3Q16-2']
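or use a loop:
# Python 3.6+ (f-strings)
for i in range(1, 4):
    df[f'{i}Q16-sum'] = df[f'{i}Q16-1'] + df[f'{i}Q16-2']
# or, for older Python versions
for i in range(1, 4):
    prefix = str(i) + "Q16"  # named prefix to avoid shadowing the built-in id()
    df[prefix + '-sum'] = df[prefix + '-1'] + df[prefix + '-2']
which gives you: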
      BU  Line_Item  1Q16-1  2Q16-1  3Q16-1  1Q16-2  2Q16-2  3Q16-2  1Q16-sum  2Q16-sum  3Q16-sum
0  Total   Revenues     100     100     200     100     100     200       200       200       400
1  Total        EBT     120       0     250     120       0     250       240         0       500
2  Total   Expenses       0     130       0       0     130       0         0       260         0
3    CRS   Revenues     200     200     120     200     200     120       400       400       240
4    CRS        EBT     190     190       0     190     190       0       380       380         0
5    CRS   Expenses     210     210     190     210     210     190       420       420       380
Is that right?
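For anyone who wants to run the loop end to end, here is a minimal self-contained sketch. The six input columns are reconstructed from the output table in this answer; nothing else is assumed.

```python
import pandas as pd

# Input columns reconstructed from the output table in this answer.
df = pd.DataFrame({
    'BU': ['Total', 'Total', 'Total', 'CRS', 'CRS', 'CRS'],
    'Line_Item': ['Revenues', 'EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
    '1Q16-1': [100, 120, 0, 200, 190, 210],
    '2Q16-1': [100, 0, 130, 200, 190, 210],
    '3Q16-1': [200, 250, 0, 120, 0, 190],
    '1Q16-2': [100, 120, 0, 200, 190, 210],
    '2Q16-2': [100, 0, 130, 200, 190, 210],
    '3Q16-2': [200, 250, 0, 120, 0, 190],
})

# One new '-sum' column per quarter, built from the matching '-1' and '-2' columns.
for i in range(1, 4):
    df[f'{i}Q16-sum'] = df[f'{i}Q16-1'] + df[f'{i}Q16-2']

print(df[['1Q16-sum', '2Q16-sum', '3Q16-sum']])
```

The loop mutates df in place, so running it twice simply overwrites the same three columns.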
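Update: for the Small Business Loans columns (names ending in ($000)), try filter with a regex. The suffix has to be escaped, because (, ) and $ are regex metacharacters:
import re

s = '($000)'
years = range(2018, 2021)
df.assign(**{
    f'SBL {y} {s}': df.filter(regex=fr'Small Business Loans.*{y}.*{re.escape(s)}').sum(axis=1)
    for y in years
})
To combine MF and SBL, change Small Business Loans to (Multi Family|Small Business Loans):
df.assign(**{
    f'Loans {y} {s}': df.filter(regex=fr'(Multi Family|Small Business Loans).*{y}.*{re.escape(s)}').sum(axis=1)
    for y in years
})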
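A pattern that matches no columns does not raise an error; it silently produces a column of zeros. So it can be worth checking which columns each pattern picks up before trusting the totals. The sketch below does that for the combined MF and SBL case. It assumes the column names are as they appear in the question, with the suffix written as ($000); matched and totals are names introduced here for illustration only.

```python
import re
import pandas as pd

# Column names copied from the question; the ($000) suffix is an assumption.
df = pd.DataFrame({
    'BU': ['Total', 'Total', 'Total', 'CRS', 'CRS', 'CRS'],
    'Line_Item': ['Revenues', 'EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
    'Small Business Loans < 0K 2020 ($000)': [100, 120, 0, 200, 190, 210],
    'Small Business Loans < 0K 2019 ($000)': [100, 0, 130, 200, 190, 210],
    'Small Business Loans < 0K 2018 ($000)': [200, 250, 0, 120, 0, 190],
    'Small Business Loans 0K-0K 2020 ($000)': [100, 120, 0, 200, 190, 210],
    'Small Business Loans 0K-0K 2019 ($000)': [100, 0, 130, 200, 190, 210],
    'Small Business Loans 0K-0K 2018 ($000)': [200, 250, 0, 120, 0, 190],
    'Multi Family Loans 2020 ($000)': [100, 120, 0, 200, 190, 210],
    'Multi Family Loans 2019 ($000)': [100, 0, 130, 200, 190, 210],
    'Multi Family Loans 2018 ($000)': [200, 250, 0, 120, 0, 190],
})

suffix = re.escape('($000)')  # escape the regex metacharacters in the suffix
matched = {}  # year -> list of column names the pattern selected
totals = {}   # new column name -> row-wise sum for that year

for year in range(2018, 2021):
    pattern = rf'(Multi Family|Small Business Loans).*{year}.*{suffix}'
    cols = df.filter(regex=pattern)
    matched[year] = list(cols.columns)
    totals[f'Loans {year} ($000)'] = cols.sum(axis=1)

out = df.assign(**totals)

for year, names in matched.items():
    print(year, names)
print(out[list(totals)])
```

If any year prints an empty list, the pattern (or the assumed suffix) does not fit the real column names and the corresponding total should not be used.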
You can build the new columns with assign() and a dict comprehension:
df = df.assign(**{
    f'{i}Q16': df[f'{i}Q16-1'] + df[f'{i}Q16-2'] for i in [1, 2, 3]
})
You can also sum using a like filter:
df = df.assign(**{
    f'{i}Q16': df.filter(like=f'{i}Q16').sum(axis=1) for i in [1, 2, 3]
})
Output:
      BU  Line_Item  1Q16-1  2Q16-1  3Q16-1  1Q16-2  2Q16-2  3Q16-2  1Q16  2Q16  3Q16
0  Total   Revenues     100     100     200     100     100     200   200   200   400
1  Total        EBT     120       0     250     120       0     250   240     0   500
2  Total   Expenses       0     130       0       0     130       0     0   260     0
3    CRS   Revenues     200     200     120     200     200     120   400   400   240
4    CRS        EBT     190     190       0     190     190       0   380   380     0
5    CRS   Expenses     210     210     190     210     210     190   420   420   380
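One thing to watch with the like filter: it is a plain substring match, so once a total named 1Q16 exists, like='1Q16' matches that total as well as the two inputs, and a second run double counts. The sketch below shows this on a trimmed-down, three-row sample and uses an anchored regex as one possible way around it. The pattern assumes the input columns always end in a dash and a number, which may not hold for other data.

```python
import pandas as pd

# Trimmed-down sample: only the two 1Q16 input columns, three rows.
df = pd.DataFrame({
    '1Q16-1': [100, 120, 0],
    '1Q16-2': [100, 120, 0],
})

# First run: `like` sees only the two input columns.
df = df.assign(**{'1Q16': df.filter(like='1Q16').sum(axis=1)})
first = df['1Q16'].tolist()

# Second run: `like` now also matches the '1Q16' total created above.
rerun = df.filter(like='1Q16').sum(axis=1).tolist()

# Anchored regex: keeps only columns of the form '1Q16-<number>'.
safe = df.filter(regex=r'^1Q16-\d+$').sum(axis=1).tolist()

print(first)
print(rerun)
print(safe)
```

Naming the totals so that they do not contain the input prefix, or filtering on the original column list saved before any assignment, would avoid the problem equally well.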