Count how often values occur within ranges in a dataframe column
I'm very new to Python and pandas. I have 15000 values in a column of my dataframe, like this:
col1 | col2 |
---|---|
5 | 0.05964 |
19 | 0.00325 |
31 | 0.0225 |
12 | 0.03325 |
14 | 0.00525 |
I would like to output something like this:
0.00 to 0.01 = 55 values,
0.01 to 0.02 = 365 values,
0.02 to 0.03 = 5464 values etc... from 0.00 to 1.00
I'm a bit lost with groupby, count.values, etc.
Thanks for the help!
IIUC, use pd.cut:
import numpy as np
import pandas as pd
out = df.groupby(pd.cut(df['col2'], np.linspace(0, 1, 101)), observed=False)['col1'].sum()
print(out)
# Output
col2
(0.0, 0.01] 33
(0.01, 0.02] 0
(0.02, 0.03] 31
(0.03, 0.04] 12
(0.04, 0.05] 0
..
(0.95, 0.96] 0
(0.96, 0.97] 0
(0.97, 0.98] 0
(0.98, 0.99] 0
(0.99, 1.0] 0
Name: col1, Length: 100, dtype: int64
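Note that `.sum()` adds up the `col1` values inside each bin. If the goal is instead the number of rows whose `col2` falls in each range, which is how the expected output in the question reads, a minimal sketch could swap `.sum()` for `.size()`. The sample frame below is rebuilt from the five rows shown in the question, and the names `edges`, `binned`, and `counts` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (the real frame has ~15000 rows)
df = pd.DataFrame({
    "col1": [5, 19, 31, 12, 14],
    "col2": [0.05964, 0.00325, 0.0225, 0.03325, 0.00525],
})

# 101 evenly spaced edges from 0 to 1 define 100 bins of width 0.01
edges = np.linspace(0, 1, 101)

# Label every row with the interval its col2 value falls into
binned = pd.cut(df["col2"], edges)

# size() counts rows per bin; observed=False keeps empty bins with a count of 0
counts = df.groupby(binned, observed=False).size()

print(counts.head(6))
```

By default `pd.cut` builds right-closed intervals such as `(0.0, 0.01]`, so a value of exactly 0.0 lands in no bin unless `include_lowest=True` is passed.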