Including missing values in summarise output

Question

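I am trying to keep all rows in the summarise output even when the value for one of the columns does not exist. I have a data frame that looks like this: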

dat <- data.frame(id=c(1,1,2,2,2,3),
                  seq_num=c(0:1,0:2,0:0),
                  time=c(4,5,6,7,8,9))

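I then need to summarise by id so that each id is one row, with one column for the first seq_num and one for the second. Even if the second does not exist, I would still like that row to be kept, with NA in that slot. I have tried the answers in this answer, but they do not work.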

dat %>% 
  group_by(id, .drop=FALSE) %>% 
  summarise(seq_0_time = time[seq_num==0],
            seq_1_time = time[seq_num==1])

Output

     id seq_0_time seq_1_time
  <dbl>      <dbl>      <dbl>
1     1          4          5
2     2          6          7

However, I would still like a third row, with seq_0_time=9 and seq_1_time=NA, since that value does not exist.

How can I do this?

Answer 1

My understanding is that you have to use complete() on both the seq_num and id variables to get the result you want:

library(tidyverse)
dat <- data.frame(id=c(1,1,2,2,2,3),
                  seq_num=c(0:1,0:2,0:0),
                  time=c(4,5,6,7,8,9)) %>%
  complete(seq_num = seq_num,
           id = id)
dat %>% 
  group_by(id, .drop=FALSE) %>% 
  summarise(seq_0_time = time[seq_num==0],
            seq_1_time = time[seq_num==1])
#> # A tibble: 3 x 3
#>      id seq_0_time seq_1_time
#>   <dbl>      <dbl>      <dbl>
#> 1     1          4          5
#> 2     2          6          7
#> 3     3          9         NA

Created on 2022-04-20 by the reprex package (v2.0.1)
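To see why this works, it can help to look at the completed data before summarising. The sketch below assumes the tidyr package is installed and uses the bare-column form complete(dat, id, seq_num) rather than the exact call from the answer; the name `completed` is only illustrative.

```r
library(tidyr)

dat <- data.frame(id = c(1, 1, 2, 2, 2, 3),
                  seq_num = c(0:1, 0:2, 0:0),
                  time = c(4, 5, 6, 7, 8, 9))

# Add a row for every id/seq_num combination that is absent from dat;
# the time value of each added row is filled with NA.
completed <- complete(dat, id, seq_num)

# 3 ids x 3 seq_num values = 9 rows, 6 of them from the original data
nrow(completed)

# The slot that was missing for id 3 now exists and holds NA
completed$time[completed$id == 3 & completed$seq_num == 1]
```

After completion, every id has exactly one row with seq_num == 1, so the subsetting inside summarise() should return one value per group instead of a zero-length vector, and id 3 is kept.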

Answer 2

Actually, this can be solved quite easily with reshape.

> reshape(dat, timevar='seq_num', idvar = 'id', direction = 'wide')
  id time.0 time.1 time.2
1  1      4      5     NA
3  2      6      7      8
6  3      9     NA     NA
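If only the first two sequence numbers are wanted, with the column names from the question, one possible sketch is to subset before reshaping and rename afterwards. The subsetting step, the renaming, and the name `wide` are assumptions added here and are not part of the answer.

```r
dat <- data.frame(id = c(1, 1, 2, 2, 2, 3),
                  seq_num = c(0:1, 0:2, 0:0),
                  time = c(4, 5, 6, 7, 8, 9))

# Keep only seq_num 0 and 1, then spread time into one column per seq_num.
# An id without a seq_num == 1 row gets NA in time.1.
wide <- reshape(dat[dat$seq_num %in% 0:1, ],
                timevar = "seq_num", idvar = "id", direction = "wide")

# Rename to match the column names used in the question
names(wide) <- c("id", "seq_0_time", "seq_1_time")
rownames(wide) <- NULL
wide
```

This stays in base R, so no extra packages are needed.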

Answer 3

If there is at most one observation per 'seq_num' for each 'id', then indexing with [1] coerces the result to NA where there are no matching cases

library(dplyr)
dat %>% 
  group_by(id) %>% 
  summarise(seq_0_time = time[seq_num ==0][1],
            seq_1_time = time[seq_num == 1][1], .groups = 'drop')

-output

# A tibble: 3 × 3
     id seq_0_time seq_1_time
  <dbl>      <dbl>      <dbl>
1     1          4          5
2     2          6          7
3     3          9         NA

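This works because indexing a vector at a position that does not exist returns NA, so a length-0 result becomes length 1 when indexed with [1]. In the same way, indexing with 1:2, 1:3, etc. fills the result with NA up to length 2, 3, and so on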

> with(dat, time[seq_num==1 & id == 3])
numeric(0)
> with(dat, time[seq_num==1 & id == 3][1])
[1] NA
> numeric(0)
numeric(0)
> numeric(0)[1]
[1] NA
> numeric(0)[1:2]
[1] NA NA

Or use length<-

> `length<-`(numeric(0), 3)
[1] NA NA NA
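The same idea can be wrapped in a small helper so the intent is explicit. The function names `first_or_na` and `time_at` are hypothetical, and the sketch uses base R's split() and sapply() instead of dplyr so that it runs without extra packages.

```r
dat <- data.frame(id = c(1, 1, 2, 2, 2, 3),
                  seq_num = c(0:1, 0:2, 0:0),
                  time = c(4, 5, 6, 7, 8, 9))

# Hypothetical helper: first element of x, or NA when x has length 0
first_or_na <- function(x) x[1]

# One data frame per id, named by the id value
groups <- split(dat, dat$id)

# For each id, pick the time recorded at seq_num == n (NA if there is none)
time_at <- function(n) {
  unname(sapply(groups, function(g) first_or_na(g$time[g$seq_num == n])))
}

res <- data.frame(id = as.numeric(names(groups)),
                  seq_0_time = time_at(0),
                  seq_1_time = time_at(1))
res
```

As in the answer, this relies on there being at most one observation per 'seq_num' for each 'id'; with duplicates, only the first match would be kept.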
