Nearest non-NA value
I have a data frame test:
> test
   foo  bar  baz  timestamp
1    1 <NA>    a 1552157998
2    1 <NA> <NA> 1552161596
3    1 stop <NA> 1552165194
4    1 <NA>    b 1552168795
5    1 <NA>    a 1552170839
6    1 <NA> <NA> 1552157998
7    1 stop <NA> 1552161596
8    1 <NA>    a 1552165194
9    1 <NA>    b 1552168795
10   1 <NA> <NA> 1552170839
My goal is to find, for each instance of stop, the nearest non-NA value of baz in each direction (based on timestamp), which would produce a table like this:
> output
  rownum pre post
1      3   a    b
2      7   a    a
Is there a known way to do this with zoo and na.locf()?
Any suggestions would be much appreciated.
dput(test)
structure(list(foo = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), bar = c(NA,
NA, "stop", NA, NA, NA, "stop", NA, NA, NA), baz = c("a", NA,
NA, "b", "a", NA, NA, "a", "b", NA), timestamp = c(1552157998.427,
1552161596.004, 1552165194.255, 1552168794.918, 1552170839.363,
1552157998.427, 1552161596.004, 1552165194.255, 1552168794.918,
1552170839.363)), row.names = c(NA, -10L), class = "data.frame")
I'll use magrittr only to organize the code. It can be converted to plain base R (no magrittr), dplyr, or data.table with very little effort.
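library(magrittr)
test %>%
  # sort by timestamp (tied timestamps keep their original order)
  .[ order(.$timestamp), ] %>%
  # fill baz forward (pre) and backward (post) over the sorted rows
  transform(.,
            rownum = seq_len(nrow(.)),
            pre = zoo::na.locf0(baz),
            post = zoo::na.locf0(baz, fromLast = TRUE)) %>%
  # keep only the stop rows and the columns of interest
  subset(., bar == "stop") %>%
  .[, c("rownum", "pre", "post")]
#   rownum pre post
# 7      4   a    a
# 3      5   a    a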
(This differs from your expected output, possibly because of a mistake in it?)
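For anyone who prefers not to load magrittr, the same steps can be spelled out in base R. This is only a sketch: the intermediate names sorted and result are my own, and test is rebuilt from the dput() output in the question so the block runs by itself.

```r
# Rebuild the question's data frame from its dput() output.
test <- structure(list(
  foo = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  bar = c(NA, NA, "stop", NA, NA, NA, "stop", NA, NA, NA),
  baz = c("a", NA, NA, "b", "a", NA, NA, "a", "b", NA),
  timestamp = c(1552157998.427, 1552161596.004, 1552165194.255,
                1552168794.918, 1552170839.363, 1552157998.427,
                1552161596.004, 1552165194.255, 1552168794.918,
                1552170839.363)),
  row.names = c(NA, -10L), class = "data.frame")

# Sort by timestamp; order() leaves tied timestamps in their original order.
sorted <- test[order(test$timestamp), ]

# Position within the sorted data, plus forward and backward fills of baz.
sorted$rownum <- seq_len(nrow(sorted))
sorted$pre    <- zoo::na.locf0(sorted$baz)
sorted$post   <- zoo::na.locf0(sorted$baz, fromLast = TRUE)

# which() drops the NA results that bar == "stop" gives for NA rows.
result <- sorted[which(sorted$bar == "stop"), c("rownum", "pre", "post")]
result
#   rownum pre post
# 7      4   a    a
# 3      5   a    a
```

The only non-base call is zoo::na.locf0, which returns a vector of the same length as its input, so it can be assigned straight into a column.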
Looking at the timestamp-sorted result before the subset step gives a better picture of what it's doing:
test %>%
  .[ order(.$timestamp), ] %>%
  transform(.,
            rownum = seq_len(nrow(.)),
            pre = zoo::na.locf0(baz),
            post = zoo::na.locf0(baz, fromLast = TRUE))
#    foo  bar  baz  timestamp rownum pre post
# 1    1 <NA>    a 1552157998      1   a    a
# 6    1 <NA> <NA> 1552157998      2   a    a
# 2    1 <NA> <NA> 1552161596      3   a    a
# 7    1 stop <NA> 1552161596      4   a    a
# 3    1 stop <NA> 1552165194      5   a    a
# 8    1 <NA>    a 1552165194      6   a    a
# 4    1 <NA>    b 1552168795      7   b    b
# 9    1 <NA>    b 1552168795      8   b    b
# 5    1 <NA>    a 1552170839      9   a    a
# 10   1 <NA> <NA> 1552170839     10   a <NA>
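One possible reading of the mismatch, which is a guess and not something the question confirms: the expected table lines up with the original row order rather than with timestamp order (rownum 3 and 7 are the original row numbers of the two stop rows). Under that assumption, dropping the sort step reproduces it. The names by_row and output below are illustrative.

```r
# Rebuild the question's data frame from its dput() output.
test <- structure(list(
  foo = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  bar = c(NA, NA, "stop", NA, NA, NA, "stop", NA, NA, NA),
  baz = c("a", NA, NA, "b", "a", NA, NA, "a", "b", NA),
  timestamp = c(1552157998.427, 1552161596.004, 1552165194.255,
                1552168794.918, 1552170839.363, 1552157998.427,
                1552161596.004, 1552165194.255, 1552168794.918,
                1552170839.363)),
  row.names = c(NA, -10L), class = "data.frame")

# Assumption: "nearest" is measured in original row order, so no sorting.
by_row <- test
by_row$rownum <- seq_len(nrow(by_row))
by_row$pre    <- zoo::na.locf0(by_row$baz)                   # last non-NA at or before each row
by_row$post   <- zoo::na.locf0(by_row$baz, fromLast = TRUE)  # next non-NA at or after each row

# Keep the stop rows; which() ignores the NA comparisons.
output <- by_row[which(by_row$bar == "stop"), c("rownum", "pre", "post")]
rownames(output) <- NULL
output
#   rownum pre post
# 1      3   a    b
# 2      7   a    a
```

Because several timestamps in the sample are tied, sorting by timestamp interleaves the two halves of the data, which is why the sorted version reports a for both directions.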