How to add another category to a DataFrame in python/pandas that includes only the missing values?
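I have a DataFrame with two columns, 'TotalCharges' and 'Churn', and 7043 rows. Eleven cells in the 'TotalCharges' column hold a missing value. I want to create 10 categories of TotalCharges plus one extra category named "MissingValues", but I cannot find a way to do it. My DataFrame looks like this: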
TotalCharges Churn
0 29.85 No
1 1889.5 No
2 108.15 Yes
3 1840.75 No
4 151.65 Yes
5 820.5 Yes
6 1949.4 No
7 301.9 No
8 3046.05 Yes
9 3487.95 No
10 587.45 No
11 326.8 No
12 5681.1 No
13 5036.3 Yes
14 2686.05 No
15 7895.15 No
16 missing No
17 7382.25 No
18 528.35 Yes
.... ....
.... ....
I would like to get something like this:
TotalCharges Churn TotalChargesCategories
0 29.85 No (18.799, 84.61]
1 1889.5 No (947.38, 1400.55]
2 108.15 Yes (84.61, 267.37]
3 1840.75 No (1400.55, 2065.52]
4 151.65 Yes (84.61, 267.37]
5 820.5 Yes (552.82, 947.38]
6 1949.4 No (1400.55, 2065.52]
7 301.9 No (267.37, 552.82]
8 3046.05 Yes (2065.52, 3132.75]
9 3487.95 No (3132.75, 4471.44]
10 587.45 No (552.82, 947.38]
11 326.8 No (267.37, 552.82]
12 5681.1 No (4471.44, 5973.69]
13 5036.3 Yes (4471.44, 5973.69]
14 2686.05 No (2065.52, 3132.75]
15 7895.15 No (5973.69, 8684.8]
16 missing No MissingValues
17 7382.25 No (5973.69, 8684.8]
18 528.35 Yes (267.37, 552.82]
.... ....
.... ....
Without the missing values this would be easy with the following code:
width_bin = (pd.qcut(df.TotalCharges,10))
df = df.assign(TotalChargesCat=width_bin)
df
But because of the 11 missing values I have trouble creating the categories, and this code raises the following error:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
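The message suggests the column is stored as strings rather than floats, so the quantile arithmetic fails. A quick way to confirm this and locate the offending rows might look like the sketch below. The tiny Series is a hypothetical stand-in for the real column, and the names `as_number` and `bad_rows` are illustrative only.

```python
import pandas as pd

# Hypothetical sample mirroring the question: numbers stored as text plus a "missing" marker.
s = pd.Series(["29.85", "1889.5", "missing", "7382.25"], name="TotalCharges")

# A text dtype (object, or a string dtype in newer pandas) rather than float64.
print(s.dtype)

# Entries that cannot be parsed as numbers become NaN under errors="coerce".
as_number = pd.to_numeric(s, errors="coerce")

# Rows that were present in the original but failed the conversion.
bad_rows = s[as_number.isna() & s.notna()]
print(bad_rows)
```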
Simply coerce missing to NaN (either by explicit replacement or by converting the column to a numeric dtype), then bin as before, here with cut:
df['TotalChargesCategories'] = pd.cut(pd.to_numeric(df['TotalCharges'], errors='coerce'),10)
>>> df
TotalCharges Churn TotalChargesCategories
0 29.85 No (21.985, 816.38]
1 1889.5 No (1602.91, 2389.44]
2 108.15 Yes (21.985, 816.38]
3 1840.75 No (1602.91, 2389.44]
4 151.65 Yes (21.985, 816.38]
5 820.5 Yes (816.38, 1602.91]
6 1949.4 No (1602.91, 2389.44]
7 301.9 No (21.985, 816.38]
8 3046.05 Yes (2389.44, 3175.97]
9 3487.95 No (3175.97, 3962.5]
10 587.45 No (21.985, 816.38]
11 326.8 No (21.985, 816.38]
12 5681.1 No (5535.56, 6322.09]
13 5036.3 Yes (4749.03, 5535.56]
14 2686.05 No (2389.44, 3175.97]
15 7895.15 No (7108.62, 7895.15]
16 missing No NaN
17 7382.25 No (7108.62, 7895.15]
18 528.35 Yes (21.985, 816.38]
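The result above leaves row 16 as NaN rather than the "MissingValues" label the question asks for. One possible way to close that gap is to register the label as an extra category and fill the NaN rows with it. The following is a minimal sketch under assumptions: the small frame is built from a few rows shown in the question, `q=3` is used only because the sample is tiny (the question uses 10), and `qcut` is used as in the question's own code, though `cut` would work the same way.

```python
import pandas as pd

# Small illustrative frame; values copied from rows shown in the question.
df = pd.DataFrame({
    "TotalCharges": ["29.85", "1889.5", "108.15", "1840.75", "missing", "7382.25"],
    "Churn": ["No", "No", "Yes", "No", "No", "No"],
})

# Coerce non-numeric entries such as "missing" to NaN.
charges = pd.to_numeric(df["TotalCharges"], errors="coerce")

# Bin the numeric values; NaN inputs stay NaN in the resulting categorical.
bins = pd.qcut(charges, q=3)

# Add the extra label to the set of categories, then fill the NaN rows with it.
# fillna on a categorical only accepts values that are already categories,
# which is why add_categories comes first.
df["TotalChargesCategories"] = (
    bins.cat.add_categories("MissingValues").fillna("MissingValues")
)

print(df)
```

With the full data set, replacing `q=3` with `10` should give the ten quantile bins plus the one "MissingValues" category.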