Subset a dataset to the 99.5th percentile for each level of a categorical variable

Question

I want to subset a data.frame so that, for each level of a categorical variable, only the data up to the 99.5th percentile is kept.

My data has `minutes.used` = minutes and `location` = location.

I want to remove the top 0.5% of the minutes data for each location.

The new subset would contain the data up to the 99.5th percentile for location 1, up to the 99.5th percentile for location 2, and so on.
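To make "one 99.5th percentile per location" concrete, here is a minimal base R sketch on made-up data. The data frame `your.dataset`, the location labels `loc1`/`loc2`, and the random values are hypothetical; the column names `minutes.used` and `location` are assumed from the question. `tapply()` splits `minutes.used` by `location` and calls `quantile()` on each group, giving one cutoff per location.

```r
# Hypothetical example data; column names assumed from the question
set.seed(1)
your.dataset <- data.frame(
  location = rep(c("loc1", "loc2"), each = 200),
  minutes.used = c(runif(200, 0, 100), runif(200, 0, 1000))
)

# One 99.5th-percentile cutoff per location:
# tapply() groups minutes.used by location and passes probs on to quantile()
cutoffs <- tapply(your.dataset$minutes.used, your.dataset$location,
                  quantile, probs = 0.995)
print(cutoffs)
```

Each location gets its own cutoff, so a row is compared only against the threshold of its own group.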

Thanks!

Answer 1

This may solve your problem, but it would help if you could post your data.

library(plyr)

#add a column with information on where the 99.5% cutoff is
new.dataset1 <- ddply(your.dataset, "location", mutate,
                      minutes.99.5.cutoff = quantile(minutes.used, 0.995))

#subset the data to only include the bottom 99.5% of the data, then only 
#select the first two columns
trimmed.dataset <- new.dataset1[which(new.dataset1$minutes.used <= 
                                      new.dataset1$minutes.99.5.cutoff),1:2]
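If you would rather avoid the plyr dependency, a base R sketch of the same idea uses `ave()`, which applies a function within each group and returns a vector aligned row by row with the original data. The example data below (`your.dataset`, three hypothetical locations, exponentially distributed minutes) is made up for illustration, and `minutes.used`/`location` are assumed column names.

```r
# Hypothetical example data; column names assumed from the question
set.seed(42)
your.dataset <- data.frame(
  location = rep(c("loc1", "loc2", "loc3"), each = 1000),
  minutes.used = rexp(3000, rate = 1 / 50)
)

# ave() computes each location's 99.5th percentile and repeats it
# for every row of that location, keeping the original row order
cutoff <- ave(your.dataset$minutes.used, your.dataset$location,
              FUN = function(x) quantile(x, 0.995))

# Keep rows at or below their own location's cutoff
trimmed.dataset <- your.dataset[your.dataset$minutes.used <= cutoff, ]

table(trimmed.dataset$location)
```

With 1000 distinct values per location and R's default quantile type, the cutoff falls between the 995th and 996th sorted values, so about 5 rows per location (the top 0.5%) are dropped.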


Tags: r, subset, quantile, categorical-data