Using rle() for indexing data.frame - how to show zeros in the function to maintain the same vector length?

Question

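In the example below, my aim is to find the years in which the values from df (reshaped into df_new) stay below a threshold of -1.2 for 5 consecutive instances. I then want to return the corresponding unique values from the column df_new$year as the result. My problem with using the rle() result is that its length does not match the length of df_new$year, so I cannot index with it properly. rle() does not return zeros: k holds one entry per run, and every run has a length of at least 1, so there is no entry for each row. How can I improve this code to get what I need? Is there a way to force rle() to include zeros in k, or should I take a different approach?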

# Example reproducible df:
set.seed(125)
df <- data.frame(V1=rnorm(10,-1.5,.5),
                 V2=rnorm(10,-1.5,.5),
                 V3=rnorm(10,-1.5,.5),
                 V4=rnorm(10,-1.5,.5),
                 V5=rnorm(10,-1.5,.5),
                 V6=rnorm(10,-1.5,.5),
                 V7=rnorm(10,-1.5,.5),
                 V8=rnorm(10,-1.5,.5),
                 V9=rnorm(10,-1.5,.5),
                 V10=rnorm(10,-1.5,.5))
# Reshape row by row into one long vector; as.vector() on the transposed
# matrix gives the same order as melt(t(df))$value, with no extra packages
df_t <- t(df)
df_new <- data.frame(value = as.vector(df_t),
                     year = rep(1976:1985, each = nrow(df)))

# Threshold values:
threshold <- -1.2
consecutiveentries <- 5
number <- consecutiveentries - 1
# Start of the problem:
k <- rle(df_new$value < threshold)
years <- unique(df_new$year[k$lengths > number])

Current result:

> years
[1] 1976 1978 1979 1980 1982 1984 1985

What I would like it to be:

> years
[1] 1976 1980 1983 1985

Answer 1

This is ugly, but it works :)

df_new$year[cumsum(k$lengths)[which(k$lengths >= 5)-1]+1]

Piece by piece:

idx <- which(k$lengths >= 5)-1 gives you the indices of the entries in k$lengths that come immediately before each run of length 5 or more.

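With cumsum(k$lengths) we then build the cumulative sum over k$lengths and take the elements at idx. The result is the number of rows that come before the first row of each sequence of length >= 5.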
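To make these first two steps concrete, here is a small sketch on a made-up logical vector. The names below, k_toy and rows_before are illustrative only and do not appear in the original post.

```r
# Toy logical vector (hypothetical data, not the question's df_new)
below <- c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE,
           TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)
k_toy <- rle(below)
k_toy$lengths                          # 1 5 2 6: one entry per run, not per row

# Step 1: positions in k_toy$lengths just before each run of length >= 5
idx <- which(k_toy$lengths >= 5) - 1   # runs 2 and 4 qualify, so idx is 1 3

# Step 2: cumsum() gives the last row of every run; the elements at idx
# are the row counts that precede each long run
rows_before <- cumsum(k_toy$lengths)[idx]   # 1 8
```

Note that the 14 rows of below collapse to only 4 run lengths, which is why k_toy$lengths cannot be used directly as a row index.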

Adding 1 to this result gives the index of the row at which each sequence starts.
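As a possible alternative that speaks to the "same vector length" part of the question, the run lengths can be expanded back to one entry per row with rep(). The sketch below is not the method from the answer above; the helper name years_with_long_runs and the toy data are assumptions made up for illustration.

```r
# Hypothetical helper (not from the original post): years in which a run of
# at least `min_run` consecutive values below `threshold` starts.
years_with_long_runs <- function(value, year, threshold, min_run) {
  k <- rle(value < threshold)
  # Repeat each run's length and value once per row of that run, so both
  # vectors have the same length as `value`
  run_len  <- rep(k$lengths, k$lengths)
  run_true <- rep(k$values,  k$lengths)
  # Row index at which each run starts, also repeated once per row
  run_start <- rep(cumsum(k$lengths) - k$lengths + 1, k$lengths)
  # Rows that belong to a sufficiently long run of below-threshold values
  in_long_run <- run_true & run_len >= min_run
  unique(year[run_start[in_long_run]])
}

# Toy data (made up for illustration): 4 values per year
value <- c(-2, -2, -2, -2, -2, -1, -1, -2, -2, -1, -2, -2, -2, -2, -2, -2)
year  <- rep(2001:2004, each = 4)
res <- years_with_long_runs(value, year, threshold = -1.2, min_run = 5)
res   # 2001 2003
```

Unlike the one-liner, this sketch also checks k$values, so only runs below the threshold count, and it picks up a run that begins at the very first row (in the one-liner, which(...)-1 becomes 0 there and that index is silently dropped). To list every year that a long run touches rather than only the year in which it starts, year[in_long_run] could be used inside the helper instead.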

r

run-length-encoding

dataframe