Collapse into a single row without concatenation
My data frame is as follows:
df <- structure(list(RID = c(1L, 1L, 2L, 2L, 3L, 3L),
Sex = c("FEMALE", "FEMALE", "MALE", "MALE", "FEMALE", "FEMALE"),
Race = c("White","White", "Hispanic", "Hispanic", "Black", "Black"),
TIME = c("Break Fast", "Break Fast", "Lunch", "Lunch", "Dinner", "Dinner"),
Sugar = c("Normal", "Normal", "Abnormal", "Abnormal", "Satisfactory",
"Satisfactory"),
Test_A = c(90L,"","" , 157L,"" , 129L),
Test_B = c("",90L , 157L,"", 129L,"" )),
class = "data.frame", row.names = c(NA, -6L))
The required output is:
Requd_df <- structure(list(RID = c(1L, 2L,3L),
Sex = c("FEMALE", "MALE", "FEMALE"),
Race = c("White", "Hispanic","Black"),
TIME = c("Break Fast", "Lunch", "Dinner"),
Sugar = c("Normal", "Abnormal", "Satisfactory"),
Test_A = c(90L, 157L, 129L),
Test_B = c(90L , 157L, 129L)),
class = "data.frame", row.names = c(NA, -3L))
My code is as follows:
setDT(df)
df1 <- df[, lapply(.SD, paste0, collapse=""), by= RID]
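My code also concatenates the elements of the other columns (Sex, Race, TIME, Sugar), so each value is repeated within a group, e.g. "FEMALEFEMALE".
I need to collapse each RID into a single row without concatenating those columns.
Please help.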
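Include the other variables in by, so that they are used as grouping columns and only Test_A and Test_B are pasted together: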
library(data.table)
setDT(df)
df[, lapply(.SD, paste0, collapse=""), .(RID, Sex, Race, TIME, Sugar)]
# RID Sex Race TIME Sugar Test_A Test_B
#1: 1 FEMALE White Break Fast Normal 90 90
#2: 2 MALE Hispanic Lunch Abnormal 157 157
#3: 3 FEMALE Black Dinner Satisfactory 129 129
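Note that Test_A and Test_B come out as character columns, because mixing integers with "" in c() coerces everything to character, whereas the required output has integers. As a rough sketch of one way to handle that, assuming each group has at most one non-empty value per test column, the first non-empty value can be kept and converted instead of pasting. The helper name first_filled is made up for this illustration and is not part of data.table.

```r
library(data.table)

# Same data as in the question, with the test columns written as the
# character vectors that c(90L, "", ...) actually produces.
df <- data.frame(
  RID    = c(1L, 1L, 2L, 2L, 3L, 3L),
  Sex    = c("FEMALE", "FEMALE", "MALE", "MALE", "FEMALE", "FEMALE"),
  Race   = c("White", "White", "Hispanic", "Hispanic", "Black", "Black"),
  TIME   = c("Break Fast", "Break Fast", "Lunch", "Lunch", "Dinner", "Dinner"),
  Sugar  = c("Normal", "Normal", "Abnormal", "Abnormal",
             "Satisfactory", "Satisfactory"),
  Test_A = c("90", "", "", "157", "", "129"),
  Test_B = c("", "90", "157", "", "129", ""),
  stringsAsFactors = FALSE
)
setDT(df)

# Hypothetical helper: keep the first non-empty string and convert it to
# integer; a group with no non-empty value yields NA.
first_filled <- function(x) as.integer(x[x != ""][1])

# The grouping columns go in `by`, so .SD holds only Test_A and Test_B.
res <- df[, lapply(.SD, first_filled), by = .(RID, Sex, Race, TIME, Sugar)]
```

Here res has one row per RID and integer Test_A and Test_B columns. If a group could hold several non-empty values, this sketch silently keeps only the first one, so the paste0 version may be the safer choice in that case.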
We can also do this in the tidyverse:
library(dplyr)
library(stringr)
df %>%
group_by(across(RID:Sugar)) %>%
summarise(across(everything(), ~ str_c(.x, collapse = "")), .groups = 'drop')
# A tibble: 3 × 7
RID Sex Race TIME Sugar Test_A Test_B
<int> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 FEMALE White Break Fast Normal 90 90
2 2 MALE Hispanic Lunch Abnormal 157 157
3 3 FEMALE Black Dinner Satisfactory 129 129