Reshape long dataframe to wide and rename new columns by using one column as prefix

Given a data frame df as follows:

df <- structure(list(code = c("M0000273", "M0000357", "M0000545", "M0000273", 
"M0000357", "M0000545"), name = c("industry", "agriculture", 
"service", "industry", "agriculture", "service"), act_value = c(16.78, 
9.26, 49.38, 35.74, 88.42, 68.26), pred_value = c(17.78, 10.26, 
50.38, 36.74, 89.42, 69.26), year = c(2019L, 2019L, 2019L, 2020L, 
2020L, 2020L)), class = "data.frame", row.names = c(NA, -6L))

df:

      code        name act_value pred_value year
1 M0000273    industry     16.78      17.78 2019
2 M0000357 agriculture      9.26      10.26 2019
3 M0000545     service     49.38      50.38 2019
4 M0000273    industry     35.74      36.74 2020
5 M0000357 agriculture     88.42      89.42 2020
6 M0000545     service     68.26      69.26 2020

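I want to use code and name as the index columns, reshape act_value and pred_value from long to wide, and finally rename the new columns by pasting the year column on as a prefix.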

The expected result looks like this:

      code        name  2019_act_value  2019_pred_value  2020_act_value  2020_pred_value
1 M0000273    industry           16.78            17.78           35.74            36.74
2 M0000357 agriculture            9.26            10.26           88.42            89.42
3 M0000545     service           49.38            50.38           68.26            69.26

My attempt:

reshape(df, idvar = c('code', 'name'), timevar = 'year', direction = 'wide')

How can I achieve this correctly in R? Thanks.

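We can use tidyr::pivot_wider to do this. I wouldn't recommend your naming convention, though: if you drop names_glue, you get the same result but with the tidier year-as-suffix format (for example act_value_2019).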

library(tidyr)

pivot_wider(df,
            names_from = year,
            names_glue = "{year}_{.value}",
            values_from = ends_with("value"))
#> # A tibble: 3 × 6
#>   code     name        `2019_act_value` `2020_act_value` `2019_pred_value`
#>   <chr>    <chr>                  <dbl>            <dbl>             <dbl>
#> 1 M0000273 industry               16.8              35.7              17.8
#> 2 M0000357 agriculture             9.26             88.4              10.3
#> 3 M0000545 service                49.4              68.3              50.4
#> # … with 1 more variable: 2020_pred_value <dbl>
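
Note that the columns above come out grouped by value column (both act_value columns first), whereas the expected result in the question groups them by year. A minimal sketch of one way to get that order, assuming tidyr 1.2.0 or later, where pivot_wider accepts a names_vary argument (the name wide is just illustrative; the data frame is the one from the question, rebuilt so the snippet runs on its own):

```r
library(tidyr)

# Same data as in the question, rebuilt here so the snippet is self-contained
df <- data.frame(
  code = rep(c("M0000273", "M0000357", "M0000545"), 2),
  name = rep(c("industry", "agriculture", "service"), 2),
  act_value = c(16.78, 9.26, 49.38, 35.74, 88.42, 68.26),
  pred_value = c(17.78, 10.26, 50.38, 36.74, 89.42, 69.26),
  year = rep(c(2019L, 2020L), each = 3)
)

# names_vary = "slowest" makes the names_from values (year) vary slowest,
# so all columns for 2019 come before all columns for 2020.
# Assumption: requires tidyr >= 1.2.0.
wide <- pivot_wider(df,
                    names_from = year,
                    names_glue = "{year}_{.value}",
                    values_from = ends_with("value"),
                    names_vary = "slowest")

names(wide)
```

Because names that start with a digit are not syntactic in R, these columns have to be referenced with backticks afterwards, e.g. wide$`2019_act_value`.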