Create index from group to select value from original data.frame to use in result

Question

I have a data.frame, df.l. I want to use the output of summarise as an index to create a new variable that retrieves values from a column in the original data.frame.

df.l has the following columns: trial, location, posi, date, and value.

For each group (trial, location, date), I want to use the sum of value == 1 as an index to select a value from posi and store it as a new variable.

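value in df.l can be 1 or 0. Once it becomes 0 it stays 0, provided the rows are in the correct order (posi running from 0 to 1). The grouped sum therefore gives the position within the group where value changes from 1 to 0.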
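As a small illustration, assuming the rows are already sorted by posi, the first group of the sample data further down (trial 1, location 1, date 1) works out like this:

```r
# First group of the sample data (trial 1, location 1, date 1)
posi  <- c(0, 0.28, 0.65, 1)
value <- c(1L, 1L, 1L, 0L)

# Because all 1s come before all 0s, the count of 1s is also
# the index of the last row where value is still 1
k <- sum(value == 1)

k        # 3
posi[k]  # 0.65
```

This only holds when the 1s form an unbroken run at the start of the group; with unsorted rows the count would no longer be a meaningful position.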

To determine the index position, I use the following code:

test <- df.l %>% 
  group_by(trial, location, date) %>%
  summarise(n= sum(value==1))

Of course, posi is no longer present in the result.

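I hoped something like the code below would work, but it doesn't. It starts out with the correct results, but the indexing goes wrong somewhere. I'm not sure whether it makes sense to refer to a column the way I do here.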

test <- df.l %>% 
  group_by(trial, location, date) %>%
  summarise(n= sum(value==1)) %>%
  mutate(ANS = nth(df.l$posi,n))

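Can I use dplyr to create an "index" from a group, use it to select a value from the original data.frame, and then add that variable to the new data.frame? Or is there another way to achieve the same result with dplyr?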

# truncated data.frame
df.l <- structure(list(trial = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    location = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
    3L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), posi = c(0, 
    0.28, 0.65, 1, 0, 0.33, 0.67, 1, 0, 0.2, 0.5, 1, 0, 0.28, 
    0.65, 1, 0, 0.33, 0.67, 1, 0, 0.2, 0.5, 1), date = c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), value = c(1L, 1L, 1L, 0L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 
    1L, 1L, 1L, 0L, 0L)), .Names = c("trial", "location", "posi", "date", "value"), row.names = c(NA, 24L), class = "data.frame")

# desired result
    result <- structure(list(trial = c(1L, 1L, 1L, 2L, 2L, 2L), location = c(1L, 
2L, 3L, 1L, 2L, 3L), date = c(1L, 1L, 1L, 1L, 1L, 1L), n = c(3L, 
4L, 4L, 1L, 4L, 2L), posi = c(0.65, 1, 1, 0, 1, 0.2)), class = "data.frame", .Names = c("trial", 
"location", "date", "n", "posi"), row.names = c(NA, -6L))

Answer 1

You can do this inside summarise, where posi refers to the rows of the current group:

df.l %>% 
    group_by(trial, location, date) %>%
    summarise(n= sum(value==1), ANS = nth(posi,n))
#Source: local data frame [6 x 5]
#Groups: trial, location
#
#  trial location date n  ANS
#1     1        1    1 3 0.65
#2     1        2    1 4 1.00
#3     1        3    1 4 1.00
#4     2        1    1 1 0.00
#5     2        2    1 4 1.00
#6     2        3    1 2 0.20

Or, if you don't actually need n in the result, you can do

df.l %>% 
    group_by(trial, location, date) %>%
    summarise(ANS = nth(posi, sum(value == 1)))

or

df.l %>% 
    group_by(trial, location, date) %>%
    summarise(ANS = posi[sum(value == 1)])
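The difference from the attempt in the question is which vector gets indexed: df.l$posi is the whole 24-row column, while a bare posi inside summarise is only the current group's slice. The base R sketch below is one plausible reading of why the original attempt "starts with the correct result"; it is not taken from the answer, and the exact behaviour of nth() with a vector n may differ between dplyr versions. The names whole_column, per_group and group_id are made up for the illustration.

```r
# posi column of the sample data (24 rows: 6 groups of 4 rows each)
posi <- c(0, 0.28, 0.65, 1,  0, 0.33, 0.67, 1,  0, 0.2, 0.5, 1,
          0, 0.28, 0.65, 1,  0, 0.33, 0.67, 1,  0, 0.2, 0.5, 1)

# Per-group counts of value == 1, as produced by the first summarise()
n <- c(3L, 4L, 4L, 1L, 4L, 2L)

# Indexing the whole column: every n is counted from row 1 of df.l
whole_column <- posi[n]

# Indexing within each group: n is counted from the group's own first row
group_id  <- rep(seq_along(n), each = 4)
per_group <- mapply(function(p, k) p[k], split(posi, group_id), n)

whole_column       # 0.65 1.00 1.00 0.00 1.00 0.28
unname(per_group)  # 0.65 1.00 1.00 0.00 1.00 0.20
```

The first five values agree only because every n points into rows that happen to match the first group's posi values; the last group exposes the mismatch (0.28 instead of the desired 0.2).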

Answer 2

slice seems like the most natural choice here:

df.l %>% 
    group_by(trial, location, date) %>%
    mutate(n = row_number()) %>%
    slice(sum(value))

which gives

  trial location posi date value n
1     1        1 0.65    1     1 3
2     1        2 1.00    1     1 4
3     1        3 1.00    1     1 4
4     2        1 0.00    1     1 1
5     2        2 1.00    1     1 4
6     2        3 0.20    1     1 2

The slice function selects one or more rows by index (within each group, if the data is grouped), which is exactly what the OP describes.
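One edge case worth keeping in mind is a group with no 1s at all, where the index becomes 0. The sketch below uses made-up data and a hypothetical helper, pick_row, to show the base R analogue of picking one row per group by index. In base R, indexing with 0 selects nothing, so such a group silently drops out of the result. Whether slice behaves identically should be checked against the installed dplyr version.

```r
# Made-up data: group "b" has no value == 1 at all
d <- data.frame(g     = c("a", "a", "a", "b", "b"),
                posi  = c(0, 0.5, 1, 0, 1),
                value = c(1L, 1L, 0L, 0L, 0L),
                stringsAsFactors = FALSE)

# Hypothetical helper: keep the row at position sum(value == 1) of one group
pick_row <- function(grp) grp[sum(grp$value == 1), , drop = FALSE]

# Apply the helper to each group and stack the pieces back together
picked <- do.call(rbind, lapply(split(d, d$g), pick_row))

picked$g     # "a" only: index 0 selects no row for group "b"
picked$posi  # 0.5
```

If every group must appear in the output, the index needs an explicit fallback (for example returning NA when the count is 0) rather than relying on row selection alone.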


r

dplyr