R: using tidyr or reshape2 to convert multiple sets of columns to rows

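I've seen many other examples that use gather, melt, or reshape to convert a wide table to long format, but those examples are usually very simple. I need to use gather/melt/reshape (or some other function) to take this data: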

library(data.table)

dt <- data.table(Id = 1:4,
                 TopicA_Percent = runif(4, 0.0, 1.0), TopicB_Percent = runif(4, 0.0, 1.0), TopicC_Percent = runif(4, 0.0, 1.0),
                 TopicA_Attempted = rnorm(4), TopicB_Attempted = rnorm(4), TopicC_Attempted = rnorm(4),
                 TopicA_TimeSpent = rnorm(4), TopicB_TimeSpent = rnorm(4), TopicC_TimeSpent = rnorm(4))

Id TopicA_Percent TopicB_Percent TopicC_Percent TopicA_Attempted TopicB_Attempted TopicC_Attempted TopicA_TimeSpent TopicB_TimeSpent TopicC_TimeSpent
1      0.6639903      0.4219777      0.4099906      -0.09964646       1.05460385       -1.3331776      -1.55929389      -0.83446808      -1.53410657
2      0.7517089      0.2375559      0.8479702       0.25357552      -0.50835127       -0.2126446       0.31249508      -1.33036583      -0.07090781
3      0.1593582      0.5654915      0.1409356      -0.14667119       0.53910258       -0.5661078      -0.02883193       0.60079330      -1.00326670
4      0.6815283      0.1458051      0.8079253      -0.00262729      -0.08975263        0.8448300       1.39846994      -0.03548673      -1.09306706

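and turn it into this (only the shape matters here; the values below come from a different random draw, so they don't match the table above):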

Id  Topic   Percent  Attempted  TimeSpent
1 TopicA 0.3871205  0.3460178  0.1834476
2 TopicA 0.6431426 -0.6779898 -1.3497432
3 TopicA 0.5538110 -1.4967361  0.2576378
4 TopicA 0.8621070 -1.4911159  1.7140344
1 TopicB 0.4513063  1.2083898  1.4198672
2 TopicB 0.2045888 -1.2631067 -0.4347670
3 TopicB 0.6605945  0.3486036 -0.6111504
4 TopicB 0.5353699 -0.4743263 -0.4719514
1 TopicC 0.7887296  0.3327606  2.2776418
2 TopicC 0.7280900  0.5818754 -0.3294534
3 TopicC 0.7140528 -1.1317054 -1.3284694
4 TopicC 0.1647406  0.5157608 -1.4876869

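I'd like the solution to involve tidyr or reshape2, but of course I'm happy to get help with any approach; maybe I can work out from it how to do the same thing with one of those libraries. Thanks!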

We can use the data.table approach: its melt accepts several measure patterns at once and produces one value column per pattern.

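library(data.table)
# one value column per pattern; Topic comes back as factor codes 1, 2, 3, which are mapped back to the topic names
dM <- melt(dt, measure.vars = patterns("Percent$", "Attempted$", "TimeSpent$"),
           value.name = c("Percent", "Attempted", "TimeSpent"), variable.name = "Topic")[,
           Topic := unique(sub("_.*", "", names(dt)[-1]))[Topic]][]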
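For comparison, here is a sketch of the same reshape in base R with reshape(), which the question also mentions. Its varying argument plays the role of patterns() above: one list element per output column, each naming the wide columns (in topic order) that feed it. The data is rebuilt as a plain data.frame with set.seed(1) so the snippet runs on its own; the seed and the names df, topics, measures and long are illustrative assumptions, not part of the original answer.

```r
# Base R only. Rebuild the question's data as a data.frame (seed is an assumption, for reproducibility)
set.seed(1)
df <- data.frame(Id = 1:4,
                 TopicA_Percent = runif(4), TopicB_Percent = runif(4), TopicC_Percent = runif(4),
                 TopicA_Attempted = rnorm(4), TopicB_Attempted = rnorm(4), TopicC_Attempted = rnorm(4),
                 TopicA_TimeSpent = rnorm(4), TopicB_TimeSpent = rnorm(4), TopicC_TimeSpent = rnorm(4))

topics   <- c("TopicA", "TopicB", "TopicC")
measures <- c("Percent", "Attempted", "TimeSpent")

# One element per long-format column, listing its wide source columns in topic order,
# e.g. c("TopicA_Percent", "TopicB_Percent", "TopicC_Percent") for "Percent"
varying <- lapply(measures, function(m) paste(topics, m, sep = "_"))

long <- reshape(df, direction = "long", idvar = "Id",
                varying = varying, v.names = measures,
                timevar = "Topic", times = topics)
rownames(long) <- NULL  # drop the "1.TopicA"-style row names reshape() generates
long
```

Because reshape() stacks one block of rows per time value, the rows come out grouped by topic (Id 1 to 4 for TopicA, then TopicB, then TopicC), which is the order shown in the question.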

Or, as @DavidArenburg pointed out, a more compact option is

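# melt everything to long, then split each column name into topic (before "_") and measure (after "_")
dcast(melt(dt, id.vars = "Id"), Id + sub("_.*", "", variable) ~ sub(".*_", "", variable))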
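The formula works because the two sub() calls cut each original column name into its two halves. A small base R illustration on a few of the question's column names (the vector cols is just for demonstration):

```r
cols <- c("TopicA_Percent", "TopicB_Attempted", "TopicC_TimeSpent")

topic   <- sub("_.*", "", cols)  # remove "_" and everything after it
measure <- sub(".*_", "", cols)  # remove everything up to and including the (last) "_"

topic    # "TopicA" "TopicB" "TopicC"
measure  # "Percent" "Attempted" "TimeSpent"
```

dcast then uses the topic half (together with Id) to identify rows and the measure half to create the new columns.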

If you want to use tidyr, you would use

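library(tidyr)  # tidyr also re-exports the %>% pipe

dt %>% gather(key, value, -Id) %>%            # one row per Id and original column
  separate(key, c("topic", "category")) %>%  # "TopicA_Percent" -> topic = "TopicA", category = "Percent"
  spread(category, value)                    # one column per category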
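In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same result with pivot_longer() (tidyr >= 1.0.0), where the special ".value" name tells tidyr that the second half of each column name becomes an output column; the data is rebuilt as a seeded data.frame so the snippet is self-contained, and df and long are illustrative names:

```r
library(tidyr)  # pivot_longer() requires tidyr >= 1.0.0

# Rebuild the question's data (the seed is an assumption, for reproducibility)
set.seed(1)
df <- data.frame(Id = 1:4,
                 TopicA_Percent = runif(4), TopicB_Percent = runif(4), TopicC_Percent = runif(4),
                 TopicA_Attempted = rnorm(4), TopicB_Attempted = rnorm(4), TopicC_Attempted = rnorm(4),
                 TopicA_TimeSpent = rnorm(4), TopicB_TimeSpent = rnorm(4), TopicC_TimeSpent = rnorm(4))

long <- pivot_longer(df, cols = -Id,
                     names_to = c("Topic", ".value"),  # "TopicA_Percent" -> Topic = "TopicA", column "Percent"
                     names_sep = "_")

# pivot_longer() keeps the rows of each Id together; reorder to group by topic as in the question
long <- long[order(long$Topic, long$Id), ]
long
```

This does in one call what gather() + separate() + spread() do in three steps, without the intermediate key/value columns.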