Select only the nth largest value in a Series, for each day

Question

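I have some noise survey data giving the noise levels measured over several days. I want to find the 5th highest noise level in each night-time period. I have put the data into a Pandas Series and used the groupby and nlargest methods to show the 5 highest noise levels for each night, but now I only want to see the 5th highest value for each period (i.e. 82, 86, 86, 87, etc.). What is the best way to achieve this?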

night_time_lmax.groupby(by=night_time_lmax.index.date).nlargest(5)

            Start date & time  
2021-08-18  2021-08-18 23:00:00     82.0
            2021-08-18 23:15:00     82.0
            2021-08-18 23:30:00     82.0
            2021-08-18 23:45:00     82.0
2021-08-19  2021-08-19 05:45:00    100.0
            2021-08-19 01:15:00     91.0
            2021-08-19 04:45:00     87.0
            2021-08-19 06:15:00     87.0
            2021-08-19 01:45:00     86.0
2021-08-20  2021-08-20 06:30:00     90.0
            2021-08-20 06:00:00     88.0
            2021-08-20 03:15:00     87.0
            2021-08-20 05:30:00     87.0
            2021-08-20 01:15:00     86.0
2021-08-21  2021-08-21 01:30:00     98.0
            2021-08-21 03:00:00     93.0
            2021-08-21 00:45:00     88.0
            2021-08-21 06:00:00     88.0
            2021-08-21 03:30:00     87.0
2021-08-22  2021-08-22 23:45:00    102.0
            2021-08-22 00:30:00     96.0
            2021-08-22 06:30:00     92.0
            2021-08-22 05:00:00     91.0
            2021-08-22 01:30:00     90.0
2021-08-23  2021-08-23 01:15:00     98.0
            2021-08-23 02:15:00     88.0
            2021-08-23 00:45:00     87.0
            2021-08-23 03:00:00     86.0
            2021-08-23 06:00:00     86.0
2021-08-24  2021-08-24 01:00:00     93.0
            2021-08-24 00:30:00     89.0
            2021-08-24 06:30:00     87.0
            2021-08-24 02:45:00     86.0
            2021-08-24 06:00:00     86.0

Answer 1

I see two options here.

Sort the data by value, then take the nth element of each group:

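# night_time_lmax is a Series, so sort_values() takes no `by` argument.
# nth() is zero-based: nth(4) is the 5th largest value of each day.
# Grouping by a function of the index keeps the keys aligned after sorting.
(night_time_lmax.sort_values(ascending=False)
                .groupby(lambda ts: ts.date()).nth(4)
)

## the following gives the same values with an ascending sort:
# (night_time_lmax.sort_values()
#                 .groupby(lambda ts: ts.date()).nth(-5)
# )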
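
To make the sort-then-nth idea concrete, here is a minimal sketch on made-up data. The readings and the name `lmax` are assumptions for illustration, not the asker's dataset. It sorts first, then builds the group keys from the sorted Series so that each key lines up with its row, and uses `nth(4)` because `nth` counts from zero.

```python
import pandas as pd

# Hypothetical sample: two nights with six readings each.
index = pd.to_datetime([
    "2021-08-19 01:15", "2021-08-19 01:45", "2021-08-19 04:45",
    "2021-08-19 05:45", "2021-08-19 06:15", "2021-08-19 06:30",
    "2021-08-20 01:15", "2021-08-20 03:15", "2021-08-20 05:30",
    "2021-08-20 06:00", "2021-08-20 06:30", "2021-08-20 06:45",
])
values = [91.0, 86.0, 87.0, 100.0, 87.0, 80.0,
          85.0, 87.0, 87.0, 88.0, 90.0, 79.0]
night_time_lmax = pd.Series(values, index=index, name="lmax")

# Sort by value, largest first.
ordered = night_time_lmax.sort_values(ascending=False)

# Take the keys from the *sorted* Series; an array of dates is matched
# to rows by position, so it must be in the same order as the data.
fifth = ordered.groupby(ordered.index.date).nth(4)

# The index of the nth() result differs between pandas versions
# (group keys vs. original timestamps); sorting it gives day order in both.
fifth = fifth.sort_index()
print(fifth.tolist())  # [86.0, 85.0]
```

Only the values are printed here because the shape of the index returned by `nth` depends on the installed pandas version.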

Or use a double groupby, once for the top 5 and once for the last of those:

# The nlargest() result is indexed by (date, timestamp), so regroup on level 0.
(night_time_lmax.groupby(by=night_time_lmax.index.date).nlargest(5)
                .groupby(level=0).last()
)
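
The two options differ when a night has fewer than five readings, as 2021-08-18 does in the question's output. The sketch below uses made-up readings (an assumption, not the asker's full data) to show that the double groupby falls back to the last available value, which matches the 82 the asker expects for that night, while `nth(4)` returns nothing for it.

```python
import pandas as pd

# Hypothetical sample: the first night has only four readings.
index = pd.to_datetime([
    "2021-08-18 23:00", "2021-08-18 23:15",
    "2021-08-18 23:30", "2021-08-18 23:45",
    "2021-08-19 01:15", "2021-08-19 01:45", "2021-08-19 04:45",
    "2021-08-19 05:45", "2021-08-19 06:15", "2021-08-19 06:30",
])
values = [82.0, 82.0, 82.0, 82.0,
          91.0, 86.0, 87.0, 100.0, 87.0, 80.0]
night_time_lmax = pd.Series(values, index=index, name="lmax")

# Top 5 per day; the result has a (date, timestamp) MultiIndex.
top5 = night_time_lmax.groupby(night_time_lmax.index.date).nlargest(5)

# Regroup on the date level and keep the smallest of each top 5.
fifth_or_last = top5.groupby(level=0).last()
print(fifth_or_last.tolist())  # [82.0, 86.0]

# For comparison: sort-then-nth gives no row for the short night.
ordered = night_time_lmax.sort_values(ascending=False)
strict_fifth = ordered.groupby(ordered.index.date).nth(4)
print(strict_fifth.tolist())  # [86.0]
```

Which behaviour is preferable depends on whether a night with fewer than five readings should report a value at all.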

python

series

pandas