不使用 t() 转置数据帧,使用 spread()、cast() 或 melt()

Transpose dataframe without t(), and using spread(), cast(), or melt()

我需要在不使用 t() 的情况下转置数据帧,因为我想避免将其转换为矩阵。所以,我正在使用 :

的解决方案
mydata <- data.table(col0=c("row1","row2","row3"),
                     col1=c(11,21,31),
                     col2=c(12,22,32),
                     col3=c(13,23,33))

mydata
# col0 col1 col2 col3
# row1   11   12   13
# row2   21   22   23
# row3   31   32   33

dcast(melt(mydata, id.vars = "col0"), variable ~ col0)
#    variable row1 row2 row3
# 1:     col1   11   21   31
# 2:     col2   12   22   32
# 3:     col3   13   23   33

我对正在使用的数据使用相同的逻辑:

x <- merge(as.data.frame(table(mtcars$mpg)), as.data.frame(round(prop.table(table(mtcars$mpg)),2)), by="Var1", all.x=TRUE)
data.table::dcast(data.table::melt(x, id.vars = "Var1"), variable ~ Var1)

有效!但它给了我一个警告和一个“未来的错误”:

Warning message in data.table::melt(x, id.vars = "Var1"): “The melt
generic in data.table has been passed a data.frame and will attempt to
redirect to the relevant reshape2 method; please note that reshape2 is
deprecated, and this redirection is now deprecated as well. To
continue using melt methods from reshape2 while both libraries are
attached, e.g. melt.list, you can prepend the namespace like
reshape2::melt(x). In the next version, this warning will become an
error.” Warning message in data.table::dcast(data.table::melt(x,
id.vars = "Var1"), variable ~ : “The dcast generic in data.table has
been passed a data.frame and will attempt to redirect to the
reshape2::dcast; please note that reshape2 is deprecated, and this
redirection is now deprecated as well. Please do this redirection
yourself like reshape2::dcast(data.table::melt(x, id.vars = "Var1")).
In the next version, this warning will become an error.”

此外,我一直在尝试使用 here 使用 dplyr::spread() 的解决方案转置数据帧,但它似乎比 data.table 包的解决方案复杂得多(当值列多于 1,如本例所示)。我更习惯 dplyr()tidyverse()data.table 解决方案要简单得多,只需忽略它即可。

附加信息。

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GGally_1.4.0       forcats_0.4.0      stringr_1.4.0      dplyr_0.8.3       
 [5] purrr_0.3.2        readr_1.3.1        tidyr_1.0.0        tibble_2.1.3      
 [9] ggplot2_3.2.1.9000 tidyverse_1.2.1    bigrquery_1.2.0    httr_1.4.1        

loaded via a namespace (and not attached):
 [1] bit64_0.9-7          jsonlite_1.6         splines_3.6.0       
 [4] modelr_0.1.4         Formula_1.2-3        assertthat_0.2.1    
 [7] getPass_0.2-2        latticeExtra_0.6-28  cellranger_1.1.0    
[10] pillar_1.4.2         backports_1.1.5      lattice_0.20-38     
[13] glue_1.3.1           uuid_0.1-2           digest_0.6.21       
[16] checkmate_1.9.4      RColorBrewer_1.1-2   rvest_0.3.4         
[19] colorspace_1.4-1     htmltools_0.4.0      Matrix_1.2-17       
[22] plyr_1.8.4           psych_1.8.12         pkgconfig_2.0.3     
[25] broom_0.5.2          haven_2.1.1          scales_1.0.0        
[28] htmlTable_1.13.2     generics_0.0.2       withr_2.1.2         
[31] repr_1.0.1.9000      skimr_1.0.7          nnet_7.3-12         
[34] cli_1.1.0            mnormt_1.5-5         survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         readxl_1.3.1        
[40] evaluate_0.14        fs_1.3.1             nlme_3.1-141        
[43] xml2_1.2.2           foreign_0.8-72       data.table_1.12.4   
[46] tools_3.6.0          hms_0.5.1            gargle_0.4.0        
[49] lifecycle_0.1.0      munsell_0.5.0        cluster_2.1.0       
[52] compiler_3.6.0       rlang_0.4.0          grid_3.6.0          
[55] pbdZMQ_0.3-3         IRkernel_1.0.2.9000  rstudioapi_0.10     
[58] htmlwidgets_1.5.1    base64enc_0.1-3      gtable_0.3.0        
[61] DBI_1.0.0            reshape_0.8.8        reshape2_1.4.3      
[64] R6_2.4.0             gridExtra_2.3        lubridate_1.7.4     
[67] knitr_1.25           bit_1.1-14           zeallot_0.1.0       
[70] Hmisc_4.2-0          stringi_1.4.3        parallel_3.6.0      
[73] IRdisplay_0.7.0.9000 Rcpp_1.0.2           vctrs_0.2.0         
[76] rpart_4.1-15         acepack_1.4.1        xfun_0.10           
[79] tidyselect_0.2.5    

新的 tidyr 1.0 功能使这变得容易得多:

library(tidyverse)
library(magrittr)
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#> 
#>     set_names
#> The following object is masked from 'package:tidyr':
#> 
#>     extract

mydata <- tibble(col0=c("row1","row2","row3"),
                 col1=c(11,21,31),
                 col2=c(12,22,32),
                 col3=c(13,23,33))

# First collect all the values in the one column
(new_data <- mydata %>% pivot_longer(col1:col3))
#> # A tibble: 9 x 3
#>   col0  name  value
#>   <chr> <chr> <dbl>
#> 1 row1  col1     11
#> 2 row1  col2     12
#> 3 row1  col3     13
#> 4 row2  col1     21
#> 5 row2  col2     22
#> 6 row2  col3     23
#> 7 row3  col1     31
#> 8 row3  col2     32
#> 9 row3  col3     33

# Col0 is what we want the new column names to come from, so:
(new_data %<>% pivot_wider(names_from = col0))
#> # A tibble: 3 x 4
#>   name   row1  row2  row3
#>   <chr> <dbl> <dbl> <dbl>
#> 1 col1     11    21    31
#> 2 col2     12    22    32
#> 3 col3     13    23    33

因此对于您的 mtcars 用例:

library(tidyverse)

(x <- 
    mtcars %>% 
    group_by(mpg) %>% 
    summarize(Freq.x = n(), 
              Freq.y = Freq.x/nrow(.)) %>% 
    pivot_longer(-mpg) %>% 
    pivot_wider(names_from = mpg))
#> # A tibble: 2 x 26
#>   name  `10.4` `13.3` `14.3` `14.7`   `15` `15.2` `15.5` `15.8` `16.4`
#>   <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1 Freq~ 2      1      1      1      1      2      1      1      1     
#> 2 Freq~ 0.0625 0.0312 0.0312 0.0312 0.0312 0.0625 0.0312 0.0312 0.0312
#> # ... with 16 more variables: `17.3` <dbl>, `17.8` <dbl>, `18.1` <dbl>,
#> #   `18.7` <dbl>, `19.2` <dbl>, `19.7` <dbl>, `21` <dbl>, `21.4` <dbl>,
#> #   `21.5` <dbl>, `22.8` <dbl>, `24.4` <dbl>, `26` <dbl>, `27.3` <dbl>,
#> #   `30.4` <dbl>, `32.4` <dbl>, `33.9` <dbl>

我对data.table基本一无所知,但是这让我不红

现在,如果您的值不是同一类型,这仍然会给您带来问题 - 因为它仍然在某个时候将所有值堆叠到一列中 - 所以我打算建议 nest() 方法。但后来我意识到......如果你想转置这个东西,并且行不是所有相同的值类型,那么你最终会尝试将不同类型的值放入一列,不是吗?所以一些同质化的转换将是不可避免的。

reprex package (v0.3.0)

于 2019-10-22 创建

您需要确保将 data.table 对象传递给 data.table::meltdata.table::dcast

x<-merge(as.data.frame(table(mtcars$mpg)),
        as.data.frame(round(prop.table(table(mtcars$mpg)),2)), 
        by="Var1", all.x=TRUE)

data.table::dcast(data.table::melt(data.table::setDT(x), id.vars = "Var1"), 
                  variable ~ Var1)

警告:

您看到,通过使用 data.table::setDT“未来错误” 已解决。

#> Warning in melt.data.table(data.table::setDT(x), id.vars = "Var1"):
#> 'measure.vars' [Freq.x, Freq.y] are not all of the same type. By order
#> of hierarchy, the molten data value column will be of type 'double'. All
#> measure variables not of type 'double' will be coerced too. Check DETAILS
#> in ?melt.data.table for more on coercion.

输出:

#>    variable 10.4 13.3 14.3 14.7   15 15.2 15.5 15.8 16.4 17.3 17.8 18.1
#> 1:   Freq.x 2.00 1.00 1.00 1.00 1.00 2.00 1.00 1.00 1.00 1.00 1.00 1.00
#> 2:   Freq.y 0.06 0.03 0.03 0.03 0.03 0.06 0.03 0.03 0.03 0.03 0.03 0.03
#>    18.7 19.2 19.7   21 21.4 21.5 22.8 24.4   26 27.3 30.4 32.4 33.9
#> 1: 1.00 2.00 1.00 2.00 2.00 1.00 2.00 1.00 1.00 1.00 2.00 1.00 1.00
#> 2: 0.03 0.06 0.03 0.06 0.06 0.03 0.06 0.03 0.03 0.03 0.06 0.03 0.03

P.S. 我无法在 data.table_1.12.2 中重现错误,不得不更新到 data.table_1.12.6.

I need to transpose a dataframe without using t() as I want to avoid transforming it into a matrix.

如果您唯一的要求是避免将数据框强制转换为矩阵,您可以使用 data.table::transpose,这需要版本 >= 1.12.4

data.table::transpose(
  mydata, 
  keep.names = 'variable', 
  make.names = names(mydata)[1])

#    variable row1 row2 row3
# 1:     col1   11   21   31
# 2:     col2   12   22   32
# 3:     col3   13   23   33