How to sum two columns together based on another column's value using Pandas
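I want to reproduce the "desired_outcome" column using Pandas. Basically, whenever "Acc Type" equals O, I need to compute the sum of Balance and Amount; otherwise the value is just Balance. A missing Amount (NaN) should count as 0.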
+--------+----------+-------+---------+--------+----------+-----------------+
| MainID | Date | SubID | Balance | Amount | Acc Type | desired_outcome |
+--------+----------+-------+---------+--------+----------+-----------------+
| 1 | 1/1/2020 | 1 | 10 | 5 | O | 15 |
| 1 | 1/1/2020 | 1 | 10 | 4 | R | 10 |
| 1 | 1/1/2020 | 2 | 20 | 5 | O | 25 |
| 1 | 1/1/2020 | 2 | 20 | 4 | R | 20 |
| 1 | 1/1/2020 | 3 | 30 | 5 | O | 35 |
| 1 | 1/1/2020 | 3 | 30 | 4 | R | 30 |
| 1 | 2/1/2020 | 1 | 40 | 5 | O | 45 |
| 1 | 2/1/2020 | 1 | 40 | 4 | R | 40 |
| 1 | 2/1/2020 | 2 | 50 | 5 | O | 55 |
| 1 | 2/1/2020 | 2 | 50 | 4 | R | 50 |
| 1 | 2/1/2020 | 3 | 60 | 5 | O | 65 |
| 1 | 2/1/2020 | 3 | 60 | 4 | R | 60 |
| 2 | 1/1/2020 | 7 | 100 | NaN | O | 100 |
| 2 | 1/1/2020 | 7 | 100 | NaN | R | 100 |
+--------+----------+-------+---------+--------+----------+-----------------+
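Also, I know this is not an ideal dataframe, and the better approach is probably to have two dataframes. How would I set it up so that I have a second dataframe like the one below and can still get the desired_outcome column as above (without the extra rows, since Acc Type would no longer be in the first dataframe)?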
+--------+----------+------------+----------+
| MainID | Date | Acc Amount | Acc Type |
+--------+----------+------------+----------+
| 1 | 1/1/2020 | 5 | O |
| 1 | 1/1/2020 | 4 | R |
| 1 | 2/1/2020 | 5 | O |
| 1 | 2/1/2020 | 4 | R |
| 2 | 1/1/2020 | NaN | O |
| 2 | 1/1/2020 | NaN | R |
+--------+----------+------------+----------+
Thanks!
Your dataframe is fine. Here's what I would do:
import numpy as np
# Add Amount (NaN treated as 0) to Balance only where Acc Type is 'O'
df['desired_outcome'] = np.where(df['Acc Type'] == 'O',
                                 df['Balance'] + df['Amount'].fillna(0),
                                 df['Balance'])
Output:
    MainID      Date  SubID  Balance  Amount  Acc Type  desired_outcome
 0       1  1/1/2020      1       10     5.0         O             15.0
 1       1  1/1/2020      1       10     4.0         R             10.0
 2       1  1/1/2020      2       20     5.0         O             25.0
 3       1  1/1/2020      2       20     4.0         R             20.0
 4       1  1/1/2020      3       30     5.0         O             35.0
 5       1  1/1/2020      3       30     4.0         R             30.0
 6       1  2/1/2020      1       40     5.0         O             45.0
 7       1  2/1/2020      1       40     4.0         R             40.0
 8       1  2/1/2020      2       50     5.0         O             55.0
 9       1  2/1/2020      2       50     4.0         R             50.0
10       1  2/1/2020      3       60     5.0         O             65.0
11       1  2/1/2020      3       60     4.0         R             60.0
12       2  1/1/2020      7      100     NaN         O            100.0
13       2  1/1/2020      7      100     NaN         R            100.0
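If you want to run this end to end, here is a minimal self-contained sketch. The code that builds `df` is my own reconstruction of the table in the question, not something from the original post, and the names `is_o` and `alt` are just illustrative. It also shows an equivalent form using `Series.where`, offered as an alternative spelling rather than a better one.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "MainID":   [1] * 12 + [2] * 2,
    "Date":     ["1/1/2020"] * 6 + ["2/1/2020"] * 6 + ["1/1/2020"] * 2,
    "SubID":    [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3, 7, 7],
    "Balance":  [10, 10, 20, 20, 30, 30, 40, 40, 50, 50, 60, 60, 100, 100],
    "Amount":   [5, 4] * 6 + [np.nan, np.nan],
    "Acc Type": ["O", "R"] * 7,
})

# Boolean mask marking the rows where Amount should be added.
is_o = df["Acc Type"] == "O"

# Same logic as the np.where line above.
df["desired_outcome"] = np.where(
    is_o,
    df["Balance"] + df["Amount"].fillna(0),
    df["Balance"],
)

# Equivalent: keep Amount only on "O" rows, use 0 elsewhere, then add.
alt = df["Balance"] + df["Amount"].fillna(0).where(is_o, 0)
```

Both forms rely on `fillna(0)` so that a missing Amount leaves the Balance unchanged instead of turning the result into NaN.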
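As for the two-dataframe layout you asked about, one possible way to do it is a left merge on `MainID` and `Date`. This is only a sketch under assumptions: the `balances` table, the `O Amount` column name, and the variable names are hypothetical, and it assumes there is at most one "O" row per (`MainID`, `Date`) pair in the second table.

```python
import numpy as np
import pandas as pd

# Hypothetical first table: one row per (MainID, Date, SubID), no Acc Type.
balances = pd.DataFrame({
    "MainID":  [1, 1, 1, 1, 1, 1, 2],
    "Date":    ["1/1/2020"] * 3 + ["2/1/2020"] * 3 + ["1/1/2020"],
    "SubID":   [1, 2, 3, 1, 2, 3, 7],
    "Balance": [10, 20, 30, 40, 50, 60, 100],
})

# Second table, laid out as in the question.
acc = pd.DataFrame({
    "MainID":     [1, 1, 1, 1, 2, 2],
    "Date":       ["1/1/2020", "1/1/2020", "2/1/2020",
                   "2/1/2020", "1/1/2020", "1/1/2020"],
    "Acc Amount": [5, 4, 5, 4, np.nan, np.nan],
    "Acc Type":   ["O", "R", "O", "R", "O", "R"],
})

# Keep only the "O" amounts; "O Amount" is a made-up column name.
o_amounts = (
    acc.loc[acc["Acc Type"] == "O", ["MainID", "Date", "Acc Amount"]]
       .rename(columns={"Acc Amount": "O Amount"})
)

# Left merge keeps every balance row; validate raises if the
# one-"O"-row-per-key assumption does not hold.
merged = balances.merge(
    o_amounts, on=["MainID", "Date"], how="left", validate="many_to_one"
)
merged["desired_outcome"] = merged["Balance"] + merged["O Amount"].fillna(0)
```

With this layout there is one row per `SubID` instead of one per `SubID` and `Acc Type`, so the "R" rows from the original table simply do not appear.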