How to use summarise() on long data in R?

I have a long-format data set in which the SBP/DBP measurements are spread across rows.

Data

df <- read.table(text = "
ID ITEM_NAME ITEM_VALUE DATE
1  SBP       154        20210102
1  DBP       66         20210111
2  SBP       115        20210513
2  SBP       113        20210513
2  DBP       62         20210413", header = TRUE)

I want to convert it to wide form, with summary statistics computed via dplyr::group_by and dplyr::summarise, so that the result looks like this:

Desired data

df_new <- read.table(text = "
ID    DBP_no DBP_no_atleast2 DBP_mean DBP_latest SBP_no SBP_no_atleast2 SBP_mean SBP_latest
1     1      0               66       66         1      0               154      154       
2     1      0               62       62         2      1               114      113", header = TRUE)

Approach

I tried the following code, which gives me aggregated data where each row is a unique combination of ID and ITEM_NAME.

df %>% 
  group_by(ID, ITEM_NAME) %>% 
  summarise(meas_no = n(),
            meas_no_atleast2 = +(meas_no >= 2),
            meas_mean = mean(ITEM_VALUE),
            meas_latest = dplyr::last(ITEM_VALUE, DATE)) %>% 
  ungroup()
# ID ITEM_NAME meas_no meas_no_atleast2 meas_mean meas_latest
# <int> <chr>       <int>            <int>     <dbl>       <int>
#     1 DBP             1                0        66          66
#     1 SBP             1                0       154         154
#     2 DBP             1                0        62          62
#     2 SBP             2                1       114         113

So I had to tidy it further with tidyr::pivot_longer and tidyr::pivot_wider to get the desired result.

df %>% 
  group_by(ID, ITEM_NAME) %>% 
  summarise(meas_no = n(),
            meas_no_atleast2 = +(meas_no >= 2),
            meas_mean = mean(ITEM_VALUE),
            meas_latest = dplyr::last(ITEM_VALUE, DATE)) %>% 
  ungroup() %>% 
  mutate(across(everything(), as.character)) %>% 
  pivot_longer(-c(ID, ITEM_NAME)) %>% 
  mutate(name = str_remove(name, "meas_")) %>% 
  unite(ITEM_NAME, c(ITEM_NAME, name)) %>% 
  pivot_wider(names_from = ITEM_NAME, values_from = value)
# ID    DBP_no DBP_no_atleast2 DBP_mean DBP_latest SBP_no SBP_no_atleast2 SBP_mean SBP_latest
# <chr> <chr>  <chr>           <chr>    <chr>      <chr>  <chr>           <chr>    <chr>     
# 1     1      0               66       66         1      0               154      154       
# 2     1      0               62       62         2      1               114      113

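This meets the objective, but it is long. I am wondering whether there is a shorter way, so that other R users can read the code easily if I share it.

Alternative approach (did not work)

I also tried the following code, but it did not give me what I wanted either: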

df %>% 
  group_by(ID) %>% 
  summarise(DBP_no = n()[ITEM_NAME == "DBP"],
            DBP_no_atleast2 = +(DBP_no >= 2),
            DBP_mean = mean(ITEM_VALUE[ITEM_NAME == "DBP"]),
            DBP_latest = dplyr::last(ITEM_VALUE[ITEM_NAME == "DBP"], DATE[ITEM_NAME == "DBP"]),
            SBP_no = n()[ITEM_NAME == "SBP"],
            SBP_no_atleast2 = +(SBP_no >= 2),
            SBP_mean = mean(ITEM_VALUE[ITEM_NAME == "SBP"]),
            SBP_latest = dplyr::last(ITEM_VALUE[ITEM_NAME == "SBP"], DATE[ITEM_NAME == "SBP"])) %>% 
  ungroup()
# ID DBP_no DBP_no_atleast2 DBP_mean DBP_latest SBP_no SBP_no_atleast2 SBP_mean SBP_latest
# <int>  <int>           <int>    <dbl>      <int>  <int>           <int>    <dbl>      <int>
#     1     NA              NA       66         66      2               1      154        154
#     2     NA              NA       62         62      3               1      114        113
#     2     NA              NA       62         62     NA              NA      114        113
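
A likely reason for the NA values and the extra row: n() returns a single number per group, so n()[ITEM_NAME == "DBP"] subsets a length-one vector with a longer logical vector. Depending on where the TRUE values fall, that yields NA, the group's total row count, or several values at once. A minimal sketch of the counting part, assuming the goal is a per-ID count of matching rows, sums the logical condition instead (df_counts is a name made up for illustration):

```r
library(dplyr)

df <- read.table(text = "
ID ITEM_NAME ITEM_VALUE DATE
1  SBP       154        20210102
1  DBP       66         20210111
2  SBP       115        20210513
2  SBP       113        20210513
2  DBP       62         20210413", header = TRUE)

df_counts <- df %>%
  group_by(ID) %>%
  # sum() over a logical vector counts the TRUE values within each ID
  summarise(DBP_no = sum(ITEM_NAME == "DBP"),
            DBP_no_atleast2 = +(DBP_no >= 2),
            SBP_no = sum(ITEM_NAME == "SBP"),
            SBP_no_atleast2 = +(SBP_no >= 2),
            .groups = "drop")
```

This still needs one block of lines per ITEM_NAME, so it does not scale well to many item types.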

You can go straight to the pivot_wider step after summarise:

library(dplyr)
library(tidyr)

df %>% 
  group_by(ID, ITEM_NAME) %>% 
  summarise(meas_no = n(),
            meas_no_atleast2 = +(meas_no >= 2),
            meas_mean = mean(ITEM_VALUE),
            meas_latest = dplyr::last(ITEM_VALUE, DATE)) %>% 
  ungroup() %>%
  pivot_wider(names_from = ITEM_NAME, values_from = meas_no:meas_latest)
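
With the default naming, the columns come out as meas_no_DBP, meas_no_SBP and so on, rather than the DBP_no, DBP_no_atleast2 layout in the desired data. The sketch below is one possible way to match that layout. It assumes tidyr 1.2.0 or later (for names_vary) and dplyr 1.0.0 or later (for .groups and rename_with). The name df_wide and the rename_with step that strips the meas_ prefix are additions for illustration, not part of the answer above.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "
ID ITEM_NAME ITEM_VALUE DATE
1  SBP       154        20210102
1  DBP       66         20210111
2  SBP       115        20210513
2  SBP       113        20210513
2  DBP       62         20210413", header = TRUE)

df_wide <- df %>%
  group_by(ID, ITEM_NAME) %>%
  summarise(meas_no = n(),
            meas_no_atleast2 = +(meas_no >= 2),
            meas_mean = mean(ITEM_VALUE),
            meas_latest = dplyr::last(ITEM_VALUE, order_by = DATE),
            .groups = "drop") %>%              # replaces the separate ungroup() call
  pivot_wider(names_from = ITEM_NAME,
              values_from = meas_no:meas_latest,
              names_glue = "{ITEM_NAME}_{.value}",  # item name first, e.g. DBP_meas_no
              names_vary = "slowest") %>%           # keep each item's columns together
  # drop the "meas_" prefix from every column except ID
  rename_with(~ sub("meas_", "", .x, fixed = TRUE), .cols = -ID)

names(df_wide)
```

Unlike the pivot_longer route in the question, this keeps the numeric column types, because nothing is converted to character along the way.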