Pandas DateOffset to duplicate other columns
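I have a dataframe that contains loans (loan_id) granted to accounts (account_id), the loan duration (loan_duration), and the monthly payment (monthly_loan_payment).
Ultimately, I want the sum of monthly payments per customer for each month. To get there, I am trying to build a dataframe with one row per loan for each month of its duration, carrying the account_id, the loan_id, and the monthly payment. For example, if a loan was granted in 07/1993 with a monthly payment of $1,000 and a duration of 12 months, I want to return the account_id, loan_id, and monthly payment for each of the 12 months of the loan, and likewise for every other loan in df.
I tried df.groupby('account_id').apply(lambda x: x['date'] + pd.DateOffset(months = x['loan_duration'], axis=1)['monthly_payment']
but it did not work. How can I apply a date offset on each row while duplicating the contents of the other columns?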
You can create a pd.date_range for each loan and use df.explode to get all the individual payments.
# sample data
# please always provide a callable line of code with your data
# you can get it with `df.head().to_dict('split')`
import pandas as pd

df = pd.DataFrame({
    'account_id': [1, 1, 2, 3, 3],
    'loan_id': [1, 2, 3, 4, 5],
    'date': ['1993-07-01', '1993-08-01', '1993-09-01', '1993-09-01', '1993-09-01'],
    'loan_duration_months': [12, 6, 5, 10, 10],
    'monthly_payment': [1000, 500, 1000, 1000, 1000]
})
df['date'] = pd.to_datetime(df['date'])

# build one DatetimeIndex of month-end dates per loan
# ('M' means month end; pandas 2.2+ prefers the alias 'ME')
df['payment_date'] = [
    pd.date_range(start, periods=duration, freq='M')
    for start, duration in zip(df['date'], df['loan_duration_months'])
]

# one row per payment date; the other columns are repeated
df = df.explode('payment_date', ignore_index=True)
Output
account_id loan_id date loan_duration_months monthly_payment payment_date
0 1 1 1993-07-01 12 1000 1993-07-31
1 1 1 1993-07-01 12 1000 1993-08-31
2 1 1 1993-07-01 12 1000 1993-09-30
3 1 1 1993-07-01 12 1000 1993-10-31
4 1 1 1993-07-01 12 1000 1993-11-30
5 1 1 1993-07-01 12 1000 1993-12-31
6 1 1 1993-07-01 12 1000 1994-01-31
7 1 1 1993-07-01 12 1000 1994-02-28
8 1 1 1993-07-01 12 1000 1994-03-31
9 1 1 1993-07-01 12 1000 1994-04-30
10 1 1 1993-07-01 12 1000 1994-05-31
11 1 1 1993-07-01 12 1000 1994-06-30
12 1 2 1993-08-01 6 500 1993-08-31
13 1 2 1993-08-01 6 500 1993-09-30
14 1 2 1993-08-01 6 500 1993-10-31
15 1 2 1993-08-01 6 500 1993-11-30
16 1 2 1993-08-01 6 500 1993-12-31
17 1 2 1993-08-01 6 500 1994-01-31
18 2 3 1993-09-01 5 1000 1993-09-30
19 2 3 1993-09-01 5 1000 1993-10-31
20 2 3 1993-09-01 5 1000 1993-11-30
21 2 3 1993-09-01 5 1000 1993-12-31
22 2 3 1993-09-01 5 1000 1994-01-31
23 3 4 1993-09-01 10 1000 1993-09-30
24 3 4 1993-09-01 10 1000 1993-10-31
25 3 4 1993-09-01 10 1000 1993-11-30
26 3 4 1993-09-01 10 1000 1993-12-31
27 3 4 1993-09-01 10 1000 1994-01-31
28 3 4 1993-09-01 10 1000 1994-02-28
29 3 4 1993-09-01 10 1000 1994-03-31
30 3 4 1993-09-01 10 1000 1994-04-30
31 3 4 1993-09-01 10 1000 1994-05-31
32 3 4 1993-09-01 10 1000 1994-06-30
33 3 5 1993-09-01 10 1000 1993-09-30
34 3 5 1993-09-01 10 1000 1993-10-31
35 3 5 1993-09-01 10 1000 1993-11-30
36 3 5 1993-09-01 10 1000 1993-12-31
37 3 5 1993-09-01 10 1000 1994-01-31
38 3 5 1993-09-01 10 1000 1994-02-28
39 3 5 1993-09-01 10 1000 1994-03-31
40 3 5 1993-09-01 10 1000 1994-04-30
41 3 5 1993-09-01 10 1000 1994-05-31
42 3 5 1993-09-01 10 1000 1994-06-30
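The question's end goal, the total payment per account per month, is not covered by the answer itself. A minimal sketch of one way to get there is below. It rebuilds the same sample data and swaps pd.date_range for pd.period_range, so that each payment is labeled by its calendar month rather than by a month-end date. The names loans, payments, payment_month, and monthly_totals are made up for this illustration.

```python
import pandas as pd

# Same sample loans as in the answer (variable names here are illustrative).
loans = pd.DataFrame({
    'account_id': [1, 1, 2, 3, 3],
    'loan_id': [1, 2, 3, 4, 5],
    'date': pd.to_datetime(['1993-07-01', '1993-08-01', '1993-09-01',
                            '1993-09-01', '1993-09-01']),
    'loan_duration_months': [12, 6, 5, 10, 10],
    'monthly_payment': [1000, 500, 1000, 1000, 1000],
})

# One list of monthly periods per loan, starting in the month the loan was granted.
loans['payment_month'] = [
    list(pd.period_range(start, periods=duration, freq='M'))
    for start, duration in zip(loans['date'], loans['loan_duration_months'])
]

# One row per (loan, month); the other columns are repeated.
payments = loans.explode('payment_month', ignore_index=True)

# Store the month as a 'YYYY-MM' string so it groups and sorts predictably.
payments['payment_month'] = payments['payment_month'].astype(str)

# Sum the payments of all loans an account is servicing in each month.
monthly_totals = (
    payments
    .groupby(['account_id', 'payment_month'], as_index=False)['monthly_payment']
    .sum()
)

print(monthly_totals.head())
```

With this data, account 1 pays 1000 in 1993-07 and 1500 from 1993-08 through 1994-01, the months in which both of its loans are active. If real timestamps are needed instead of 'YYYY-MM' strings, the string conversion can be replaced by whatever representation suits the downstream code.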