R: formatting long data to wide data... but with linked results

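Can anyone help with arranging long data into wide data? It is complicated by the linked results: each patient, identified by study number (SN), should be listed once in wide format, with that patient's repeated results listed in wide format after SN. (I have shown an abbreviated table; each patient has more results, which would repeat in the LabTest, LabDate, Result, Lower and Upper columns.) I have tried melt and cast, as well as binding columns, but can't seem to get it to work. There are over 1000 results to reformat, so entering them manually isn't an option. The long data is in an Excel document and needs to be reformatted into wide format in R. Thanks.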

Original data looks like this

SN     LabTest     LabDate    Result Lower Upper
TD62   Creat       05/12/2004  22     30    90
TD62   AST         06/12/2004  652    6     45
TD58   Creat       26/05/2007  72     30    90
TD58   Albumin     26/05/2005  22     25    35  
TD14   AST         28/02/2007  234    6     45
TD14   Albumin     26/02/2007  15     25    35

Formatted data should look like this

SN   LabCode LabDate  Result Lower Upper LabCode LabDate  Result Lower Upper
TD62 Creat   05/12/04 22     30    90    AST     06/12/04 652    6     45
TD58 Creat   26/05/07 72     30    90    Alb     26/05/05 22     25    35
TD14 AST     28/02/07 234    6     45    Alb     26/02/07 15     25    35


So far I have tried:

data_wide2 <- dcast(tdl, SN + LabDate ~ LabCode, value.var="Result")

melt(tdl, id = c("SN", "LabDate"), measured = c("Result", "Upper", "Lower"))

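Your problem is that R doesn't like your final table because it has duplicate column names. Maybe you need the data in that format, but it's a poor way to store data, because it is hard to put the columns back into rows without a lot of manual work.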
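To illustrate the duplicate-name point, here is a minimal sketch using only base R. The object names `dup` and `raw` are made up for the example. By default `data.frame()` renames a repeated column, and if you switch that off with `check.names = FALSE`, looking a column up by name only reaches the first match.

```r
# Default behaviour: the second "Result" is renamed to keep names unique
dup <- data.frame(Result = 22, Result = 652)
names(dup)
#> [1] "Result"   "Result.1"

# Forcing duplicate names is possible, but name lookup only finds the first one
raw <- data.frame(Result = 22, Result = 652, check.names = FALSE)
names(raw)
#> [1] "Result" "Result"
raw[["Result"]]
#> [1] 22
```

That is why the approach below gives every repeated column its own distinct name.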

That said, if you do want to do this, you will need a new column to help you transpose the data.

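I've used dplyr and tidyr below, which are worth a look instead of reshape. They are by the same author, but are more modern and designed to fit together as part of the 'tidyverse'.
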
library(dplyr)
library(tidyr)

#Recreate your data (not doing this bit in your question is what got you downvoted)
df <- data.frame(
  SN = c("TD62","TD62","TD58","TD58","TD14","TD14"),
  LabTest = c("Creat","AST","Creat","Albumin","AST","Albumin"),
  LabDate = c("05/12/2004","06/12/2004","26/05/2007","26/05/2005","28/02/2007","26/02/2007"),
  Result = c(22,652,72,22,234,15),
  Lower = c(30,6,30,25,6,25),
  Upper = c(90,45,90,35,45,35),
  stringsAsFactors = FALSE
)

output <- df %>% 
  group_by(SN) %>% 
  mutate(id_number = row_number()) %>% #create an id number to help with tracking the data as it's transposed
  gather("key", "value", -SN, -id_number) %>% #flatten the data so that we can rename all the column headers
  mutate(key = paste0("t",id_number, key)) %>% #add id_number to the column names. 't' for 'test' to start name with a letter.
  select(-id_number) %>% #don't need id_number anymore
  spread(key, value)

  SN    t1LabDate  t1LabTest t1Lower t1Result t1Upper t2LabDate  t2LabTest t2Lower t2Result t2Upper
  <chr> <chr>      <chr>     <chr>   <chr>    <chr>   <chr>      <chr>     <chr>   <chr>    <chr>  
1 TD14  28/02/2007 AST       6       234      45      26/02/2007 Albumin   25      15       35     
2 TD58  26/05/2007 Creat     30      72       90      26/05/2005 Albumin   25      22       35     
3 TD62  05/12/2004 Creat     30      22       90      06/12/2004 AST       6       652      45 

There might be some sorting to do if you need the columns in a specific order.
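One possible way to handle the column order, sketched here as an alternative rather than as part of the answer above, is tidyr's newer `pivot_wider()`. This assumes tidyr 1.2.0 or later (for the `names_vary` argument); the helper column name `test` and the object name `wide` are made up for the example. Listing the value columns in the order you want and setting `names_vary = "slowest"` keeps each test's columns together, and because nothing is flattened into a single `value` column, the numeric columns stay numeric.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  SN = c("TD62","TD62","TD58","TD58","TD14","TD14"),
  LabTest = c("Creat","AST","Creat","Albumin","AST","Albumin"),
  LabDate = c("05/12/2004","06/12/2004","26/05/2007","26/05/2005","28/02/2007","26/02/2007"),
  Result = c(22,652,72,22,234,15),
  Lower = c(30,6,30,25,6,25),
  Upper = c(90,45,90,35,45,35),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(SN) %>%
  mutate(test = row_number()) %>%  # 1, 2, ... within each patient
  ungroup() %>%
  pivot_wider(
    names_from = test,
    values_from = c(LabTest, LabDate, Result, Lower, Upper),
    names_vary = "slowest"         # all "_1" columns first, then all "_2" columns
  )

names(wide)
```

The resulting columns are named `LabTest_1`, `LabDate_1`, `Result_1`, `Lower_1`, `Upper_1`, then the same set with `_2`, and so on for patients with more tests.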