Using dcast to cast to wide format when the variables per time period are not uniquely specified

Question

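I would like to convert this csv file to wide format. At the moment it is in long format: each ID is listed 7 times, once per year.

What I want is one row per ID, with the variable columns logwks1 + logwks2 + .. + logwks6 + logwks7.

I started by simply melting the data:

wagem <- melt(wage, id = "ID")

But I don't understand how to cast it to get the desired format.

I tried wagec <- dcast(wagem, ID ~ variable), but then it defaults to counting the observations (I assume because it has no way of knowing how else to cast them).

How can I fix this?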

Answer 1

spread from tidyr can do this for you. Just change the values of the Year column first so that they match the column names you want to end up with.

library(tidyverse)
data <- tibble::tribble(
          ~Year,       ~LOGWKS, ~ID,
              1, "0,862124465",   1,
              2, "0,433704181",   1,
              3, "0,409959143",   1,
              4, "0,763847693",   1,
              5, "0,847479032",   1,
              6, "0,855926486",   1,
              7, "0,809774126",   1
          )
data %>% 
  mutate(
    Year = paste0("LOGWKS", Year)
  ) %>% 
  spread(
    Year, LOGWKS
  )
#> # A tibble: 1 x 8
#>      ID LOGWKS1   LOGWKS2   LOGWKS3   LOGWKS4   LOGWKS5   LOGWKS6  LOGWKS7 
#>   <dbl> <chr>     <chr>     <chr>     <chr>     <chr>     <chr>    <chr>   
#> 1     1 0,862124~ 0,433704~ 0,409959~ 0,763847~ 0,847479~ 0,85592~ 0,80977~

Created on 2019-08-09 by the reprex package (v0.3.0)
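In tidyr 1.0.0 and later, spread is superseded by pivot_wider. As a minimal sketch of the same reshape (the wage_long data frame and its numeric values are made up for illustration and are not the asker's data), the prefix can be attached through names_prefix instead of a separate mutate step:

```r
library(tidyr)

# Hypothetical toy data in the same long layout: one ID, seven years.
wage_long <- data.frame(
  ID     = rep(1, 7),
  Year   = 1:7,
  LOGWKS = c(0.86, 0.43, 0.41, 0.76, 0.85, 0.86, 0.81)
)

# names_from supplies the new column names, values_from the cell values;
# names_prefix prepends "LOGWKS" to each Year value.
wage_wide <- pivot_wider(
  wage_long,
  id_cols      = ID,
  names_from   = Year,
  values_from  = LOGWKS,
  names_prefix = "LOGWKS"
)

print(wage_wide)
```

Whether to prefer this over spread mostly depends on the installed tidyr version; spread still works as shown in the answer.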

Edit: If you have several variables to spread, you can use gather first and then cast:

library(tidyverse)
data_semi_long <- tibble::tribble(
  ~Year,       ~LOGWKS,     ~whatever, ~ID,
      1, "0,402711636", "0,182708713",   1,
      2, "0,094020099", "0,776126975",   1,
      3, "0,948184845", "0,083343821",   1,
      4, "0,529592883", "0,462755147",   1,
      5, "0,612587798", "0,613195331",   1,
      6, "0,108845887", "0,032397081",   1,
      7, "0,585433903", "0,788149493",   1
  )
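data_semi_long %>% 
  # stack every measured column into key/value pairs, keeping ID and Year
  gather(key, value, -ID, -Year) %>% 
  mutate(
    # build the target column names, e.g. "LOGWKS1", "whatever1"
    Year = paste0(key, Year)
  ) %>% 
  reshape2::dcast(
    ID ~ Year, value.var = "value"
  )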
#>   ID     LOGWKS1     LOGWKS2     LOGWKS3     LOGWKS4     LOGWKS5
#> 1  1 0,402711636 0,094020099 0,948184845 0,529592883 0,612587798
#>       LOGWKS6     LOGWKS7   whatever1   whatever2   whatever3   whatever4
#> 1 0,108845887 0,585433903 0,182708713 0,776126975 0,083343821 0,462755147
#>     whatever5   whatever6   whatever7
#> 1 0,613195331 0,032397081 0,788149493

Created on 2019-08-09 by the reprex package (v0.3.0)
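The melt/dcast attempt in the question probably falls back to counting because melt(wage, id = "ID") also melts Year into the value column, so every ID/variable pair holds seven rows and dcast has to aggregate them. A sketch of staying entirely within reshape2, assuming the columns are named ID, Year and LOGWKS as in the example data above (the wage data frame below is invented for illustration):

```r
library(reshape2)

# Hypothetical toy data: two IDs, seven years each.
wage <- data.frame(
  ID     = rep(1:2, each = 7),
  Year   = rep(1:7, times = 2),
  LOGWKS = c(0.86, 0.43, 0.41, 0.76, 0.85, 0.86, 0.81,
             0.40, 0.09, 0.95, 0.53, 0.61, 0.11, 0.59)
)

# Keep Year as an id variable so that only LOGWKS is melted.
wagem <- melt(wage, id.vars = c("ID", "Year"))

# Each ID/variable/Year combination now occurs exactly once,
# so dcast fills the cells directly instead of aggregating.
wagec <- dcast(wagem, ID ~ variable + Year, value.var = "value")

print(wagec)
```

The resulting columns combine the variable name and the year (along the lines of LOGWKS_1), so they may need renaming if the exact names logwks1 to logwks7 are required.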

