Series divided by scalar results in NaN/0
I have a Series grouped by district -> crime category -> number of crimes:
PdDistrict Category
BAYVIEW ASSAULT 8976
BURGLARY 2891
DISORDERLY CONDUCT 207
DRIVING UNDER THE INFLUENCE 188
DRUG/NARCOTIC 2061
...
TENDERLOIN STOLEN PROPERTY 299
TRESPASS 665
VANDALISM 1710
VEHICLE THEFT 661
WEAPON LAWS 791
Name: IncidntNum, Length: 140, dtype: int64
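My goal is to divide each value by a scalar (the total for its district).
I tried to do this with a loop that iterates over the "PdDistricts" and runs the following line: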
series[district] = series[district] / sum(series[district])
If I run just series[district] / sum(series[district]), the output is as expected:
Category
ASSAULT 0.11434063
BURGLARY 0.09323762
DISORDERLY CONDUCT 0.00427552
DRIVING UNDER THE INFLUENCE 0.00478544
DRUG/NARCOTIC 0.05691535
DRUNKENNESS 0.00596219
LARCENY/THEFT 0.46712952
PROSTITUTION 0.00027457
ROBBERY 0.02753589
STOLEN PROPERTY 0.00917863
TRESPASS 0.01247352
VANDALISM 0.09335530
VEHICLE THEFT 0.09884679
WEAPON LAWS 0.01168902
Name: IncidntNum, dtype: float64
But when I try to update the existing part of the Series by running series[district] = series[district] / sum(series[district]), I get:
Category
ASSAULT 0
BURGLARY 0
DISORDERLY CONDUCT 0
DRIVING UNDER THE INFLUENCE 0
DRUG/NARCOTIC 0
DRUNKENNESS 0
LARCENY/THEFT 0
PROSTITUTION 0
ROBBERY 0
STOLEN PROPERTY 0
TRESPASS 0
VANDALISM 0
VEHICLE THEFT 0
WEAPON LAWS 0
Name: IncidntNum, dtype: int64
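This is not what I expected. If I use .loc, I get only NaN instead of 0.
I really can't figure out what is going wrong, and every solution I have tried has failed. I think the key problem is that I don't fully understand how to work with Series in pandas.
I hope you can help me understand this problem.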
/Mikkel
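I believe you need a sum per first level, PdDistrict, of the MultiIndex. On pandas older than 2.0 this could be written s.sum(level=0); the level argument of Series.sum has since been removed, so group on the level instead:
s1 = s.groupby(level=0).sum()
print(s1)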
PdDistrict
BAYVIEW 14323
TENDERLOIN 4126
Name: IncidntNum, dtype: int64
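To make this step reproducible, here is a minimal sketch that rebuilds a sample Series from the ten rows visible in the question and computes the per-district totals. The sample is an assumption: the real Series has 140 rows, so these totals only match the output printed above, not the full data.

```python
import pandas as pd

# Sample data: only the rows shown in the question (assumption, not the full 140 rows).
idx = pd.MultiIndex.from_tuples(
    [
        ("BAYVIEW", "ASSAULT"),
        ("BAYVIEW", "BURGLARY"),
        ("BAYVIEW", "DISORDERLY CONDUCT"),
        ("BAYVIEW", "DRIVING UNDER THE INFLUENCE"),
        ("BAYVIEW", "DRUG/NARCOTIC"),
        ("TENDERLOIN", "STOLEN PROPERTY"),
        ("TENDERLOIN", "TRESPASS"),
        ("TENDERLOIN", "VANDALISM"),
        ("TENDERLOIN", "VEHICLE THEFT"),
        ("TENDERLOIN", "WEAPON LAWS"),
    ],
    names=["PdDistrict", "Category"],
)
s = pd.Series(
    [8976, 2891, 207, 188, 2061, 299, 665, 1710, 661, 791],
    index=idx,
    name="IncidntNum",
)

# One total per value of the first index level (PdDistrict).
totals = s.groupby(level=0).sum()
print(totals)
```

The result is indexed by PdDistrict only, which is what lets the division in the next step be broadcast back over the first level.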
Then divide with Series.div, matching on the first level, so that each value of s is divided by the total for its PdDistrict:
s2 = s.div(s1, level=0)
print (s2)
PdDistrict Category
BAYVIEW ASSAULT 0.626684
BURGLARY 0.201843
DISORDERLY CONDUCT 0.014452
DRIVING UNDER THE INFLUENCE 0.013126
DRUG/NARCOTIC 0.143894
TENDERLOIN STOLEN PROPERTY 0.072467
TRESPASS 0.161173
VANDALISM 0.414445
VEHICLE THEFT 0.160204
WEAPON LAWS 0.191711
Name: IncidntNum, dtype: float64
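As for why the loop in the question misbehaved, a likely explanation (an assumption, since the pandas version is not given) is twofold. The Series has dtype int64, so fractions written back into it can be truncated to 0. With .loc, the right-hand side is indexed by Category alone, so aligning it against the (PdDistrict, Category) MultiIndex finds no matching labels and yields NaN. The sketch below keeps the loop but casts to float first and assigns plain values through a boolean mask; the sample rows and the names shares, districts, mask and part are illustrative only.

```python
import pandas as pd

# Sample data: only the rows shown in the question (assumption, not the full 140 rows).
idx = pd.MultiIndex.from_tuples(
    [
        ("BAYVIEW", "ASSAULT"),
        ("BAYVIEW", "BURGLARY"),
        ("BAYVIEW", "DISORDERLY CONDUCT"),
        ("BAYVIEW", "DRIVING UNDER THE INFLUENCE"),
        ("BAYVIEW", "DRUG/NARCOTIC"),
        ("TENDERLOIN", "STOLEN PROPERTY"),
        ("TENDERLOIN", "TRESPASS"),
        ("TENDERLOIN", "VANDALISM"),
        ("TENDERLOIN", "VEHICLE THEFT"),
        ("TENDERLOIN", "WEAPON LAWS"),
    ],
    names=["PdDistrict", "Category"],
)
series = pd.Series(
    [8976, 2891, 207, 188, 2061, 299, 665, 1710, 661, 791],
    index=idx,
    name="IncidntNum",
)

# Cast once so the Series can hold fractions instead of integers.
shares = series.astype("float64")

districts = shares.index.get_level_values("PdDistrict")
for district in districts.unique():
    mask = districts == district          # boolean array selecting this district's rows
    part = shares.loc[mask]
    # to_numpy() sidesteps index alignment; the mask already picks the target rows.
    shares.loc[mask] = (part / part.sum()).to_numpy()

# The loop-free version from the answer, for comparison.
vectorized = series.div(series.groupby(level=0).sum(), level=0)
print((shares - vectorized).abs().max())
```

The loop and the vectorized division should agree, and the vectorized form is generally preferable because it avoids both the dtype and the alignment pitfalls.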