Using tidyr or reshape2 in R to convert multiple column sets to rows
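I have seen many other examples that use gather, melt, or reshape to convert a wide table to long format, but they are usually very simple. I need to use gather/melt/reshape (or some other function) to take this data: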
library(data.table)
dt <- data.table(Id=1:4, TopicA_Percent=runif(4,0.0,1.0), TopicB_Percent=runif(4,0.0,1.0), TopicC_Percent=runif(4,0.0,1.0),
TopicA_Attempted=rnorm(4), TopicB_Attempted=rnorm(4), TopicC_Attempted=rnorm(4),
TopicA_TimeSpent=rnorm(4), TopicB_TimeSpent=rnorm(4), TopicC_TimeSpent=rnorm(4))
Id TopicA_Percent TopicB_Percent TopicC_Percent TopicA_Attempted TopicB_Attempted TopicC_Attempted TopicA_TimeSpent TopicB_TimeSpent TopicC_TimeSpent
1 0.6639903 0.4219777 0.4099906 -0.09964646 1.05460385 -1.3331776 -1.55929389 -0.83446808 -1.53410657
2 0.7517089 0.2375559 0.8479702 0.25357552 -0.50835127 -0.2126446 0.31249508 -1.33036583 -0.07090781
3 0.1593582 0.5654915 0.1409356 -0.14667119 0.53910258 -0.5661078 -0.02883193 0.60079330 -1.00326670
4 0.6815283 0.1458051 0.8079253 -0.00262729 -0.08975263 0.8448300 1.39846994 -0.03548673 -1.09306706
and turn it into this:
Id Topic Percent Attempted TimeSpent
1 TopicA 0.3871205 0.3460178 0.1834476
2 TopicA 0.6431426 -0.6779898 -1.3497432
3 TopicA 0.5538110 -1.4967361 0.2576378
4 TopicA 0.8621070 -1.4911159 1.7140344
1 TopicB 0.4513063 1.2083898 1.4198672
2 TopicB 0.2045888 -1.2631067 -0.4347670
3 TopicB 0.6605945 0.3486036 -0.6111504
4 TopicB 0.5353699 -0.4743263 -0.4719514
1 TopicC 0.7887296 0.3327606 2.2776418
2 TopicC 0.7280900 0.5818754 -0.3294534
3 TopicC 0.7140528 -1.1317054 -1.3284694
4 TopicC 0.1647406 0.5157608 -1.4876869
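I would prefer a solution involving tidyr or reshape2, but I am happy to get help with any approach; perhaps I can work out from it how to do the same with one of those libraries. Thanks!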
We can use the data.table approach: melt accepts multiple measure patterns, one per output value column.
dM <- melt(dt, measure = patterns("Percent$", "Attempted$", "TimeSpent$"),
value.name = c("Percent", "Attempted", "TimeSpent"), variable.name = "Topic")[,
Topic := unique(sub("_.*", "", names(dt)[-1]))[Topic]][]
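To make the one-liner above easier to follow, here is a minimal sketch that splits it into two steps. It assumes a smaller table with only two topics and two measures, and the fixed seed and the helper name `topics` are illustrative choices, not part of the original answer.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
dt <- data.table(Id = 1:4,
                 TopicA_Percent = runif(4), TopicB_Percent = runif(4),
                 TopicA_Attempted = rnorm(4), TopicB_Attempted = rnorm(4))

# Step 1: each pattern selects one group of columns, and each group
# is stacked into its own value column.
long <- melt(dt, id.vars = "Id",
             measure.vars = patterns("Percent$", "Attempted$"),
             value.name = c("Percent", "Attempted"),
             variable.name = "Topic")

# Step 2: after step 1, Topic is a factor with levels "1", "2"
# (the position of the column within its group), so map it back
# to the topic prefix taken from the original column names.
topics <- unique(sub("_.*", "", names(dt)[-1]))
long[, Topic := topics[as.integer(Topic)]]

print(long)
```

The mapping in step 2 relies on every measure group listing the topics in the same order, as the columns in the question do.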
Or, as @DavidArenburg mentioned, a compact option is
dcast(melt(dt, id = "Id"), Id + sub("_.*", "", variable) ~ sub(".*_", "", variable))
If you want to use tidyr, you would use
library(tidyr)
dt %>% gather(key, value, -Id) %>%
separate(key, c("topic", "category")) %>%
spread(category, value)
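As a possible alternative to the gather/separate/spread chain, tidyr 1.0.0 and later provide `pivot_longer()`, which can do the same reshape in one call. The sketch below is not from the original answer; it assumes a reduced two-topic data frame and a fixed seed for reproducibility. The special name `".value"` tells `pivot_longer()` that the part of each column name after the underscore names the output column.

```r
library(tidyr)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
df <- data.frame(Id = 1:4,
                 TopicA_Percent = runif(4), TopicB_Percent = runif(4),
                 TopicA_Attempted = rnorm(4), TopicB_Attempted = rnorm(4))

# Split each column name at "_": the first piece goes into the Topic
# column, the second piece (".value") becomes the output column name.
long <- pivot_longer(df, cols = -Id,
                     names_to = c("Topic", ".value"),
                     names_sep = "_")

print(long)
```

This avoids the intermediate key/value table, so the measure columns keep their original types instead of being combined into a single value column first.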