Removing rows below first line that meets threshold in pandas dataframe

Question

I have a df that looks like:

import pandas as pd
import numpy as np
d = {'Hours':np.arange(12, 97, 12),
     'Average':np.random.random(8),
     'Count':[500, 250, 125, 75, 60, 25, 5, 15]}
df = pd.DataFrame(d)

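The number of cases in this df generally decreases from row to row. Once the count drops below some threshold, I want to drop that row and everything after it, e.g. once the < 10 cases threshold is reached.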

Starting:

    Average     Count   Hours
0   0.560671    500     12
1   0.743811    250     24
2   0.953704    125     36
3   0.313850    75      48
4   0.640588    60      60
5   0.591149    25      72
6   0.302894    5       84
7   0.418912    15      96

Finished (row 6 and everything after it dropped):

    Average     Count   Hours
0   0.560671    500     12
1   0.743811    250     24
2   0.953704    125     36
3   0.313850    75      48
4   0.640588    60      60
5   0.591149    25      72

Answer 1

We can take the index produced by boolean indexing and use it with iloc to slice the df:

In [58]:

df.iloc[:df[df.Count < 10].index[0]]
Out[58]:
    Average  Count  Hours
0  0.183016    500     12
1  0.046221    250     24
2  0.687945    125     36
3  0.387634     75     48
4  0.167491     60     60
5  0.660325     25     72
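
Note that the one-liner above passes an index label to iloc as if it were a position, which works here because df has the default 0..n-1 index. It also raises an IndexError when no row is below the threshold. A minimal sketch of a position-based variant follows; the helper name drop_from_first_below is hypothetical (not part of pandas), and returning the whole frame when nothing is below the threshold is an assumption about the desired behaviour:

```python
import numpy as np
import pandas as pd


def drop_from_first_below(df, column, threshold):
    """Keep only the rows before the first row where df[column] < threshold.

    Hypothetical helper for illustration; not a pandas API.
    """
    below = (df[column] < threshold).to_numpy()
    if not below.any():
        # Assumption: with no row below the threshold, keep everything.
        return df.copy()
    # argmax on a boolean array gives the position of the first True.
    first_pos = int(np.argmax(below))
    return df.iloc[:first_pos]


df = pd.DataFrame({'Hours': np.arange(12, 97, 12),
                   'Count': [500, 250, 125, 75, 60, 25, 5, 15]},
                  index=list('abcdefgh'))  # non-default index on purpose

trimmed = drop_from_first_below(df, 'Count', 10)
unchanged = drop_from_first_below(df, 'Count', 1)
```

Because the cut point is computed as a position, the result should not depend on what the index labels are.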

Just to break down what is happening here:

In [54]:
# use a boolean mask to index into the df
df[df.Count < 10]
Out[54]:
    Average  Count  Hours
6  0.244839      5     84

In [56]:
# we want the index and can subscript the first element using [0]
df[df.Count < 10].index
Out[56]:
Int64Index([6], dtype='int64')
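
An alternative that avoids looking up the index at all is to build a mask that becomes True at the first row below the threshold and stays True afterwards. This is only a sketch of a different technique from the iloc slice above; the variable names seen_below and trimmed are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Hours': list(range(12, 97, 12)),
                   'Count': [500, 250, 125, 75, 60, 25, 5, 15]})

# The cumulative sum of the boolean mask is 0 until the first row below
# the threshold, and at least 1 from that row onward.
seen_below = (df['Count'] < 10).cumsum() > 0

# Keep only the rows before the threshold was first crossed.
trimmed = df[~seen_below]
```

The last row (Count 15) is dropped as well, even though it is back above the threshold, which matches the "Finished" table in the question. If no row is below the threshold, the mask is all False and the whole df is kept.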

python

python-2.7

pandas