R: formatting long data to wide data... but with linked results
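Can anyone help me reshape long data into wide data? It is complicated by linked results: each row of the wide table should be identified by study number (SN), with that patient's repeated results listed across the row after SN. (I have shown an abbreviated table; each patient has more results further down, repeating across the LabTest, LabDate, Result, Lower and Upper columns.) I have tried melting and recasting, and binding columns, but can't seem to get it to work. There are over 1000 results to reformat, so entering them by hand is not an option. I need to reformat the long data from an Excel document into wide format in R. Thanks.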
Original data looks like this
SN LabTest LabDate Result Lower Upper
TD62 Creat 05/12/2004 22 30 90
TD62 AST 06/12/2004 652 6 45
TD58 Creat 26/05/2007 72 30 90
TD58 Albumin 26/05/2005 22 25 35
TD14 AST 28/02/2007 234 6 45
TD14 Albumin 26/02/2007 15 25 35
Formatted data should look like this
SN LabTCode LabDate Result Lower Upper LabCode LabDate Result Lower Upper
TD62 Creat 05/12/04 22 30 90 AST 06/12/04 652 6 45
TD58 Creat 26/05/05 72 30 90 Alb 26/05/05 22 25 35
TD14 AST 28/02/07 92 30 90 Alb 26/02/07 15 25 35
So far I have tried:
data_wide2 <- dcast(tdl, SN + LabDate ~ LabCode, value.var="Result")
and
melt(tdl, id = c("SN", "LabDate"), measured= c("Result", "Upper", "Lower"))
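Your problem is that R doesn't like the final table because it has duplicate column names. Maybe you do need the data in that format, but it is a poor way to store data, because it is hard to get the columns back into rows without a lot of manual work.
That said, if you want to do it, you will need a new column to help you transpose the data.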
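To illustrate the helper column on its own, here is a minimal base R sketch. It assumes the rows are already in the order you want the tests to appear for each patient, and it uses only the SN and LabTest columns from the question. The column name `id_number` is just an illustrative choice.

```r
# Cut-down copy of the question's data (only two columns, for brevity)
df <- data.frame(
  SN = c("TD62", "TD62", "TD58", "TD58", "TD14", "TD14"),
  LabTest = c("Creat", "AST", "Creat", "Albumin", "AST", "Albumin"),
  stringsAsFactors = FALSE
)

# Number the rows within each SN: 1 for a patient's first test, 2 for the second, ...
# ave() applies seq_along() separately to the rows of each SN group.
df$id_number <- ave(seq_along(df$SN), df$SN, FUN = seq_along)

print(df)
```

Each SN/id_number pair is now unique, which is what lets the repeated columns be told apart once they are spread across a single row.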
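I have used dplyr and tidyr below, which are worth a look instead of reshape. They are by the same author, but are more modern and are designed to fit together as part of the 'tidyverse'.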
library(dplyr)
library(tidyr)
#Recreate your data (not doing this bit in your question is what got you downvoted)
df <- data.frame(
  SN = c("TD62","TD62","TD58","TD58","TD14","TD14"),
  LabTest = c("Creat","AST","Creat","Albumin","AST","Albumin"),
  LabDate = c("05/12/2004","06/12/2004","26/05/2007","26/05/2005","28/02/2007","26/02/2007"),
  Result = c(22,652,72,22,234,15),
  Lower = c(30,6,30,25,6,25),
  Upper = c(90,45,90,35,45,35),
  stringsAsFactors = FALSE
)
output <- df %>%
  group_by(SN) %>%
  mutate(id_number = row_number()) %>% #create an id number to help with tracking the data as it's transposed
  gather("key", "value", -SN, -id_number) %>% #flatten the data so that we can rename all the column headers
  mutate(key = paste0("t",id_number, key)) %>% #add id_number to the column names. 't' for 'test' to start name with a letter.
  select(-id_number) %>% #don't need id_number anymore
  spread(key, value)
SN t1LabDate t1LabTest t1Lower t1Result t1Upper t2LabDate t2LabTest t2Lower t2Result t2Upper
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 TD14 28/02/2007 AST 6 234 45 26/02/2007 Albumin 25 15 35
2 TD58 26/05/2007 Creat 30 72 90 26/05/2005 Albumin 25 22 35
3 TD62 05/12/2004 Creat 30 22 90 06/12/2004 AST 6 652 45
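Note the `<chr>` under every column: flattening everything into a single `value` column forces the numbers to share a type with the test names and dates, so they come back as text. A minimal sketch of converting them back follows. It assumes dplyr 1.0 or later (for `across()`) and uses a small hand-built stand-in for the wide table rather than the real `output`; the names `wide_chr` and `wide_num` are illustrative.

```r
library(dplyr)

# Hand-built stand-in for a few columns of the all-character wide table
wide_chr <- data.frame(
  SN = c("TD14", "TD58"),
  t1LabTest = c("AST", "Creat"),
  t1Result = c("234", "72"),
  t1Lower = c("6", "30"),
  t1Upper = c("45", "90"),
  stringsAsFactors = FALSE
)

# Convert every column whose name ends in Result, Lower or Upper back to numeric
wide_num <- wide_chr %>%
  mutate(across(matches("(Result|Lower|Upper)$"), as.numeric))

str(wide_num)
```

The same `mutate()` call could be appended to the end of the pipeline; the date columns would still be text and would need separate parsing if you want real dates.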
If you need the columns in a specific order, there may still be some sorting to work out.
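One possible way to deal with the ordering, offered as a sketch rather than a drop-in replacement: newer versions of tidyr provide `pivot_wider()`, which can pivot several value columns at once and control how the output columns are grouped. This assumes a tidyr recent enough to have the `names_vary` argument (added in 1.2.0, as far as I know); the `t` prefix and `id_number` follow the naming used above, and `wide` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(
  SN = c("TD62", "TD62", "TD58", "TD58", "TD14", "TD14"),
  LabTest = c("Creat", "AST", "Creat", "Albumin", "AST", "Albumin"),
  LabDate = c("05/12/2004", "06/12/2004", "26/05/2007", "26/05/2005", "28/02/2007", "26/02/2007"),
  Result = c(22, 652, 72, 22, 234, 15),
  Lower = c(30, 6, 30, 25, 6, 25),
  Upper = c(90, 45, 90, 35, 45, 35),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(SN) %>%
  mutate(id_number = row_number()) %>%   # per-patient test counter
  ungroup() %>%
  pivot_wider(
    id_cols = SN,
    names_from = id_number,
    values_from = c(LabTest, LabDate, Result, Lower, Upper),
    names_glue = "t{id_number}{.value}", # e.g. t1LabTest, t2Result
    names_vary = "slowest"               # keep all columns for test 1 together, then test 2
  )

print(wide)
```

Because the value columns are never stacked into one, Result, Lower and Upper stay numeric, and the columns come out in the order given to `values_from` within each test.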