Using dcast to cast a wide format when the variables per time period are not uniquely specified
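I want to convert this csv file to wide format. At the moment each ID is listed once per year, 7 times in total.
What I would like is one row per ID, with the variable columns logwks1 + logwks2 + .. + logwks6 + logwks7.
I started by simply melting with wagem <- melt(wage, id = "ID")
but I don't understand how to cast it to get the desired format.
I tried wagec <- dcast(wagem, ID ~ variable), but then it counts the observations by default (I assume because it has no other way of knowing how to cast them).
How can I fix this?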
spread from tidyr can do this for you. Just change the values of the Year column first so that they match the column names you want to end up with.
library(tidyverse)
data <- tibble::tribble(
  ~Year, ~LOGWKS, ~ID,
  1, "0,862124465", 1,
  2, "0,433704181", 1,
  3, "0,409959143", 1,
  4, "0,763847693", 1,
  5, "0,847479032", 1,
  6, "0,855926486", 1,
  7, "0,809774126", 1
)
data %>%
  mutate(
    Year = paste0("LOGWKS", Year)
  ) %>%
  spread(
    Year, LOGWKS
  )
#> # A tibble: 1 x 8
#> ID LOGWKS1 LOGWKS2 LOGWKS3 LOGWKS4 LOGWKS5 LOGWKS6 LOGWKS7
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 0,862124~ 0,433704~ 0,409959~ 0,763847~ 0,847479~ 0,85592~ 0,80977~
Created on 2019-08-09 by the reprex package (v0.3.0)
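As for why the original dcast(wagem, ID ~ variable) returned counts: with id = "ID" alone, melt treats the year column as just another measured variable, so every ID/variable cell holds several values and dcast falls back to length as the aggregation function. A minimal sketch with reshape2 only, assuming the year column is called Year and using a small made-up wage data frame (both are assumptions, since the original file is not shown):

```r
library(reshape2)

# Made-up stand-in for the real data: 2 IDs, 3 years each (assumption)
wage <- data.frame(
  ID     = rep(1:2, each = 3),
  Year   = rep(1:3, times = 2),
  LOGWKS = c(0.86, 0.43, 0.41, 0.76, 0.85, 0.81)
)

# Original attempt: Year is melted as well, so the cells are not unique
# and dcast() falls back to counting the values with length()
counted <- dcast(melt(wage, id = "ID"), ID ~ variable)

# Keep Year as an id variable and move it to the column side of the formula
wagem <- melt(wage, id = c("ID", "Year"))
wide  <- dcast(wagem, ID ~ variable + Year)
```

Here counted contains the number of rows per ID, while wide has one row per ID and one column per variable/year combination.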
Edit: if you have several variables to spread, you can use gather first and then cast:
library(tidyverse)
data_semi_long <- tibble::tribble(
  ~Year, ~LOGWKS, ~whatever, ~ID,
  1, "0,402711636", "0,182708713", 1,
  2, "0,094020099", "0,776126975", 1,
  3, "0,948184845", "0,083343821", 1,
  4, "0,529592883", "0,462755147", 1,
  5, "0,612587798", "0,613195331", 1,
  6, "0,108845887", "0,032397081", 1,
  7, "0,585433903", "0,788149493", 1
)
data_semi_long %>%
  gather(key, value, -ID, -Year) %>%
  mutate(
    Year = paste0(key, Year)
  ) %>%
  reshape2::dcast(
    ID ~ Year
  )
#> ID LOGWKS1 LOGWKS2 LOGWKS3 LOGWKS4 LOGWKS5
#> 1 1 0,402711636 0,094020099 0,948184845 0,529592883 0,612587798
#> LOGWKS6 LOGWKS7 whatever1 whatever2 whatever3 whatever4
#> 1 0,108845887 0,585433903 0,182708713 0,776126975 0,083343821 0,462755147
#> whatever5 whatever6 whatever7
#> 1 0,613195331 0,032397081 0,788149493
Created on 2019-08-09 by the reprex package (v0.3.0)
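In tidyr 1.0.0 and later, spread and gather are superseded by pivot_wider and pivot_longer, and pivot_wider can spread several value columns in one step. A sketch of the same reshaping under that assumption (tidyr >= 1.0.0 installed), shortened to the first three years of the data above. It also converts the decimal-comma strings to numbers, which the answer above does not do and which is optional:

```r
library(dplyr)
library(tidyr)

# First three years of the example data from the answer
data_semi_long <- tibble::tribble(
  ~Year, ~LOGWKS,       ~whatever,     ~ID,
  1,     "0,402711636", "0,182708713", 1,
  2,     "0,094020099", "0,776126975", 1,
  3,     "0,948184845", "0,083343821", 1
)

wide <- data_semi_long %>%
  # Optional: replace the decimal comma with a point and convert to numeric
  mutate(
    LOGWKS   = as.numeric(sub(",", ".", LOGWKS, fixed = TRUE)),
    whatever = as.numeric(sub(",", ".", whatever, fixed = TRUE))
  ) %>%
  # Column names are built as <value column><Year>, e.g. LOGWKS1, whatever3
  pivot_wider(
    names_from  = Year,
    values_from = c(LOGWKS, whatever),
    names_sep   = ""
  )
```

With names_sep = "" the resulting column names follow the LOGWKS1 ... whatever3 pattern used above, so no paste0 step is needed.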