How to randomly select rows from a data frame whose row skewness is larger than a given value in R

Question

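I am trying to select random rows from a data frame with 1000 rows (and six columns), keeping only rows whose skewness is greater than a given value (say Sk > 0.3).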

I generated the following data frame:

df=data.frame(replicate(6,sample(10:100,1000,rep=TRUE)))

I can get the row skewness from the fBasics package:

rowSkewness(df) gives:

   [8] -0.2243295435  0.5306809351  0.0707122386  0.0341447417  0.3339384838 -0.3910593364 -0.6443905090
  [15]  0.5603809206  0.4406091534 -0.3736108832  0.0397860038  0.9970040772 -0.7702547535  0.2065830354

But now I need to select, say, 10 rows of df whose row skewness is greater than, say, 0.1... perhaps

for (a in 1:10) {
  sample.data[a,] = sample(x=df[wich(rowSkewness(df[sample(1:nrow(df),1)>0.1),], size = 1, replace = TRUE)
}

or something like that?

Any ideas would be much appreciated. Thanks in advance.

Answer 1

Got it:

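library(dplyr)    # %>% and filter()
library(fBasics)  # rowSkewness()
samplesize <- 10
x <- df %>% filter(rowSkewness(df) > 0.1)
# sample() on a data frame draws columns, so sample row indices instead
sample.data <- x[sample(nrow(x), samplesize, replace = TRUE), ]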
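
One thing to watch when sampling from a data frame: sample() treats it as a list of columns, so passing the data frame itself as x draws columns rather than rows. A minimal sketch of the difference, using a made-up toy data frame (the name toy and the seed are illustrative assumptions, not part of the original post):

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical toy data: 5 rows, 3 columns
toy <- data.frame(a = 1:5, b = 6:10, c = 11:15)

# sample() sees a data frame as a list of 3 columns, so this draws one column
col_draw <- sample(x = toy, size = 1)
dim(col_draw)

# Sampling a row number and indexing with it draws one row instead
row_draw <- toy[sample(nrow(toy), 1), ]
dim(row_draw)
```

Indexing with sampled row numbers also lets you draw all the rows you need in one call, without a loop.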

Answer 2

You can use the sample_n() or sample_frac() function to make your version a bit shorter:

library(tidyr)
library(fBasics)
df=data.frame(replicate(6,sample(10:100,1000,rep=TRUE)))
x=df %>% dplyr::filter(rowSkewness(df)>0.1)  %>% dplyr::sample_n(10)
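
In current dplyr releases sample_n() and sample_frac() are marked as superseded in favour of slice_sample(). A sketch of the same pipeline with slice_sample(), assuming dplyr 1.0.0 or later and fBasics are installed (the seed and the threshold 0.1 are just illustrative choices):

```r
library(dplyr)    # %>%, filter(), slice_sample()
library(fBasics)  # rowSkewness()

set.seed(123)  # arbitrary seed, only for reproducibility
df <- data.frame(replicate(6, sample(10:100, 1000, replace = TRUE)))

# Keep rows whose skewness exceeds the threshold, then draw 10 of them
# at random without replacement
x <- df %>%
  filter(rowSkewness(df) > 0.1) %>%
  slice_sample(n = 10)

nrow(x)
```

If fewer than 10 rows pass the filter, slice_sample(n = 10) returns only the rows that are available, so it may be worth checking nrow(x) afterwards.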

Answer 3

Just take a subset:

res1 <- DF[fBasics::rowSkewness(DF) > .1, ]

head(res1)
#    X1 X2 X3 X4 X5 X6
# 7  56 28 21 93 74 24
# 8  33 56 23 44 10 12
# 12 29 19 29 38 94 95
# 13 35 51 54 98 66 10
# 14 12 51 24 23 36 68
# 15 50 37 81 22 55 97

Or with e1071::skewness:

res2 <- DF[apply(as.matrix(DF), 1, e1071::skewness) > .1, ]

stopifnot(all.equal(res1, res2))
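
The subset keeps every qualifying row; to get the random draw the question asks for, you can sample from the qualifying row numbers. A base R sketch under stated assumptions: row_skew() is a hypothetical helper written for this example (third central moment divided by sd()^3), meant to resemble the moment-type skewness used above, though exact agreement with either package is not guaranteed here.

```r
set.seed(42)
DF <- data.frame(replicate(6, sample(10:100, 1000, replace = TRUE)))

# Hypothetical helper: per-row skewness as mean((v - mean)^3) / sd(v)^3
row_skew <- function(d) {
  apply(as.matrix(d), 1, function(v) mean((v - mean(v))^3) / sd(v)^3)
}

# Row numbers whose skewness exceeds the threshold
eligible <- which(row_skew(DF) > 0.1)

# Draw 10 distinct eligible rows; indexing via sample.int() avoids the
# surprise of sample(eligible, ...) when eligible has length 1
picked <- DF[eligible[sample.int(length(eligible), 10)], ]
picked
```

Sampling positions with sample.int() and then indexing into eligible keeps the code safe even if only one row qualifies, at which point you would also need to lower the requested size.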

Data

set.seed(42); DF <- data.frame(replicate(6, sample(10:100, 1000, rep=TRUE)))
