Python: how to subset a large dataframe around the maximum value in a column, including n rows before and n rows after it

Question

Suppose:

df['Column_Name'].max() # is the maximum value in a particular column in a dataframe

Then you want to select the 10 rows before and the 10 rows after the row that holds the maximum value in that column (10 + 1 + 10 = 21 rows in total). How can this be done in Python?

Answer 1

You want to get the index of the row with the maximum value. Assuming you are using Pandas, this can be done with idxmax().

>>> from pandas import DataFrame
>>> data = [{'a':x} for x in range(40)]
>>> from random import shuffle
>>> shuffle(data)
>>> df = DataFrame(data)
>>> index_of_max_value = df['a'].idxmax()
>>> df['a'][max(0,index_of_max_value-10):min(len(df['a']), index_of_max_value+11)]
19    16
20    36
21     8
22    20
23    14
24    31
25     6
26    18
27    17
28    23
29    39
30     5
31    25
32     4
33    12
34    35
35    26
36     0
37    27
38    21
39    30
Name: a, dtype: int64
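
The session above relies on the default integer index, where the label returned by idxmax() happens to equal the row position. As a minimal sketch for frames with other indexes (strings, dates), the helper below finds the position of the maximum and slices with iloc. The name window_around_max and the sample data are hypothetical and not part of pandas, and the sketch assumes the column holds at least one non-NaN value.

```python
import pandas as pd


def window_around_max(df, column, n=10):
    """Return up to n rows before and n rows after the row holding the column maximum.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Resetting the index makes idxmax() return a 0-based row position
    # instead of an index label.
    pos = df[column].reset_index(drop=True).idxmax()
    start = max(0, pos - n)  # clamp so a negative start does not wrap around
    stop = pos + n + 1       # iloc truncates a stop that runs past the end
    return df.iloc[start:stop]


# Made-up data with a string index; the maximum (9) sits at label "f".
df = pd.DataFrame({"a": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]},
                  index=list("abcdefghijk"))
subset = window_around_max(df, "a", n=2)
print(subset)
```

If the maximum occurs more than once, idxmax() reports the first occurrence, so the window is centred on that row.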

Answer 2

This adds to @2rs2ts's solution, to account for the case where the maximum is close to the start or end of the Series or DataFrame.

df['a'][max(0,index_of_max_value-10):min(len(df['a']), index_of_max_value+11)]
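
To see why the lower bound matters, the sketch below places the maximum near the start of a 40-element Series (the data is made up for this example). Without max(0, ...) the start becomes negative, which positional slicing interprets as counting from the end, so the result comes back empty. The upper bound is more forgiving: a stop beyond the length is simply truncated, so min(len(...), ...) is harmless but probably not strictly required.

```python
import pandas as pd

# Illustrative data: 40 values with the maximum placed at position 3.
values = list(range(40))
values[3] = 100
s = pd.Series(values)

pos = s.idxmax()  # 3 here, since the default integer index matches positions

# Unclamped: the start is 3 - 10 = -7, i.e. "7 from the end", so start > stop.
unclamped = s.iloc[pos - 10:pos + 11]

# Clamped: the start is forced to 0, giving rows 0 through 13.
clamped = s.iloc[max(0, pos - 10):pos + 11]

# Near the end, an oversized stop is truncated without any clamping.
values_end = list(range(40))
values_end[38] = 100
s_end = pd.Series(values_end)
pos_end = s_end.idxmax()
near_end = s_end.iloc[max(0, pos_end - 10):pos_end + 11]

print(len(unclamped), len(clamped), len(near_end))
```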
