不使用 t() 转置数据帧,使用 spread()、cast() 或 melt()
Transpose dataframe without t(), and using spread(), cast(), or melt()
我需要在不使用 t()
的情况下转置数据帧,因为我想避免将其转换为矩阵。所以,我正在使用 :
的解决方案
mydata <- data.table(col0=c("row1","row2","row3"),
col1=c(11,21,31),
col2=c(12,22,32),
col3=c(13,23,33))
mydata
# col0 col1 col2 col3
# row1 11 12 13
# row2 21 22 23
# row3 31 32 33
dcast(melt(mydata, id.vars = "col0"), variable ~ col0)
# variable row1 row2 row3
# 1: col1 11 21 31
# 2: col2 12 22 32
# 3: col3 13 23 33
我对正在使用的数据使用相同的逻辑:
x <- merge(as.data.frame(table(mtcars$mpg)), as.data.frame(round(prop.table(table(mtcars$mpg)),2)), by="Var1", all.x=TRUE)
data.table::dcast(data.table::melt(x, id.vars = "Var1"), variable ~ Var1)
有效!但它给了我一个警告和一个“未来的错误”:
Warning message in data.table::melt(x, id.vars = "Var1"): “The melt
generic in data.table has been passed a data.frame and will attempt to
redirect to the relevant reshape2 method; please note that reshape2 is
deprecated, and this redirection is now deprecated as well. To
continue using melt methods from reshape2 while both libraries are
attached, e.g. melt.list, you can prepend the namespace like
reshape2::melt(x). In the next version, this warning will become an
error.” Warning message in data.table::dcast(data.table::melt(x,
id.vars = "Var1"), variable ~ : “The dcast generic in data.table has
been passed a data.frame and will attempt to redirect to the
reshape2::dcast; please note that reshape2 is deprecated, and this
redirection is now deprecated as well. Please do this redirection
yourself like reshape2::dcast(data.table::melt(x, id.vars = "Var1")).
In the next version, this warning will become an error.”
此外,我一直在尝试使用 here 使用 dplyr::spread()
的解决方案转置数据帧,但它似乎比 data.table
包的解决方案复杂得多(当值列多于 1,如本例所示)。我更习惯 dplyr()
和 tidyverse()
但 data.table
解决方案要简单得多,只需忽略它即可。
附加信息。
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GGally_1.4.0 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3
[5] purrr_0.3.2 readr_1.3.1 tidyr_1.0.0 tibble_2.1.3
[9] ggplot2_3.2.1.9000 tidyverse_1.2.1 bigrquery_1.2.0 httr_1.4.1
loaded via a namespace (and not attached):
[1] bit64_0.9-7 jsonlite_1.6 splines_3.6.0
[4] modelr_0.1.4 Formula_1.2-3 assertthat_0.2.1
[7] getPass_0.2-2 latticeExtra_0.6-28 cellranger_1.1.0
[10] pillar_1.4.2 backports_1.1.5 lattice_0.20-38
[13] glue_1.3.1 uuid_0.1-2 digest_0.6.21
[16] checkmate_1.9.4 RColorBrewer_1.1-2 rvest_0.3.4
[19] colorspace_1.4-1 htmltools_0.4.0 Matrix_1.2-17
[22] plyr_1.8.4 psych_1.8.12 pkgconfig_2.0.3
[25] broom_0.5.2 haven_2.1.1 scales_1.0.0
[28] htmlTable_1.13.2 generics_0.0.2 withr_2.1.2
[31] repr_1.0.1.9000 skimr_1.0.7 nnet_7.3-12
[34] cli_1.1.0 mnormt_1.5-5 survival_2.44-1.1
[37] magrittr_1.5 crayon_1.3.4 readxl_1.3.1
[40] evaluate_0.14 fs_1.3.1 nlme_3.1-141
[43] xml2_1.2.2 foreign_0.8-72 data.table_1.12.4
[46] tools_3.6.0 hms_0.5.1 gargle_0.4.0
[49] lifecycle_0.1.0 munsell_0.5.0 cluster_2.1.0
[52] compiler_3.6.0 rlang_0.4.0 grid_3.6.0
[55] pbdZMQ_0.3-3 IRkernel_1.0.2.9000 rstudioapi_0.10
[58] htmlwidgets_1.5.1 base64enc_0.1-3 gtable_0.3.0
[61] DBI_1.0.0 reshape_0.8.8 reshape2_1.4.3
[64] R6_2.4.0 gridExtra_2.3 lubridate_1.7.4
[67] knitr_1.25 bit_1.1-14 zeallot_0.1.0
[70] Hmisc_4.2-0 stringi_1.4.3 parallel_3.6.0
[73] IRdisplay_0.7.0.9000 Rcpp_1.0.2 vctrs_0.2.0
[76] rpart_4.1-15 acepack_1.4.1 xfun_0.10
[79] tidyselect_0.2.5
新的 tidyr 1.0 功能使这变得容易得多:
library(tidyverse)
library(magrittr)
#>
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#>
#> set_names
#> The following object is masked from 'package:tidyr':
#>
#> extract
mydata <- tibble(col0=c("row1","row2","row3"),
col1=c(11,21,31),
col2=c(12,22,32),
col3=c(13,23,33))
# First collect all the values in the one column
(new_data <- mydata %>% pivot_longer(col1:col3))
#> # A tibble: 9 x 3
#> col0 name value
#> <chr> <chr> <dbl>
#> 1 row1 col1 11
#> 2 row1 col2 12
#> 3 row1 col3 13
#> 4 row2 col1 21
#> 5 row2 col2 22
#> 6 row2 col3 23
#> 7 row3 col1 31
#> 8 row3 col2 32
#> 9 row3 col3 33
# Col0 is what we want the new column names to come from, so:
(new_data %<>% pivot_wider(names_from = col0))
#> # A tibble: 3 x 4
#> name row1 row2 row3
#> <chr> <dbl> <dbl> <dbl>
#> 1 col1 11 21 31
#> 2 col2 12 22 32
#> 3 col3 13 23 33
因此对于您的 mtcars
用例:
library(tidyverse)
(x <-
mtcars %>%
group_by(mpg) %>%
summarize(Freq.x = n(),
Freq.y = Freq.x/nrow(.)) %>%
pivot_longer(-mpg) %>%
pivot_wider(names_from = mpg))
#> # A tibble: 2 x 26
#> name `10.4` `13.3` `14.3` `14.7` `15` `15.2` `15.5` `15.8` `16.4`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Freq~ 2 1 1 1 1 2 1 1 1
#> 2 Freq~ 0.0625 0.0312 0.0312 0.0312 0.0312 0.0625 0.0312 0.0312 0.0312
#> # ... with 16 more variables: `17.3` <dbl>, `17.8` <dbl>, `18.1` <dbl>,
#> # `18.7` <dbl>, `19.2` <dbl>, `19.7` <dbl>, `21` <dbl>, `21.4` <dbl>,
#> # `21.5` <dbl>, `22.8` <dbl>, `24.4` <dbl>, `26` <dbl>, `27.3` <dbl>,
#> # `30.4` <dbl>, `32.4` <dbl>, `33.9` <dbl>
我对data.table
基本一无所知,但是这让我不红
现在,如果您的值不是同一类型,这仍然会给您带来问题 - 因为它仍然在某个时候将所有值堆叠到一列中 - 所以我打算建议 nest()
方法。但后来我意识到......如果你想转置这个东西,并且行不是所有相同的值类型,那么你最终会尝试将不同类型的值放入一列,不是吗?所以一些同质化的转换将是不可避免的。
由 reprex package (v0.3.0)
于 2019-10-22 创建
您需要确保将 data.table
对象传递给 data.table::melt
和 data.table::dcast
。
x<-merge(as.data.frame(table(mtcars$mpg)),
as.data.frame(round(prop.table(table(mtcars$mpg)),2)),
by="Var1", all.x=TRUE)
data.table::dcast(data.table::melt(data.table::setDT(x), id.vars = "Var1"),
variable ~ Var1)
警告:
您看到,通过使用 data.table::setDT
,“未来错误” 已解决。
#> Warning in melt.data.table(data.table::setDT(x), id.vars = "Var1"):
#> 'measure.vars' [Freq.x, Freq.y] are not all of the same type. By order
#> of hierarchy, the molten data value column will be of type 'double'. All
#> measure variables not of type 'double' will be coerced too. Check DETAILS
#> in ?melt.data.table for more on coercion.
输出:
#> variable 10.4 13.3 14.3 14.7 15 15.2 15.5 15.8 16.4 17.3 17.8 18.1
#> 1: Freq.x 2.00 1.00 1.00 1.00 1.00 2.00 1.00 1.00 1.00 1.00 1.00 1.00
#> 2: Freq.y 0.06 0.03 0.03 0.03 0.03 0.06 0.03 0.03 0.03 0.03 0.03 0.03
#> 18.7 19.2 19.7 21 21.4 21.5 22.8 24.4 26 27.3 30.4 32.4 33.9
#> 1: 1.00 2.00 1.00 2.00 2.00 1.00 2.00 1.00 1.00 1.00 2.00 1.00 1.00
#> 2: 0.03 0.06 0.03 0.06 0.06 0.03 0.06 0.03 0.03 0.03 0.06 0.03 0.03
P.S. 我无法在 data.table_1.12.2
中重现错误,不得不更新到 data.table_1.12.6
.
I need to transpose a dataframe without using t() as I want to avoid transforming it into a matrix.
如果您唯一的要求是避免将数据框强制转换为矩阵,您可以使用 data.table::transpose
,这需要版本 >= 1.12.4
data.table::transpose(
mydata,
keep.names = 'variable',
make.names = names(mydata)[1])
# variable row1 row2 row3
# 1: col1 11 21 31
# 2: col2 12 22 32
# 3: col3 13 23 33
我需要在不使用 t()
的情况下转置数据帧,因为我想避免将其转换为矩阵。所以,我正在使用
mydata <- data.table(col0=c("row1","row2","row3"),
col1=c(11,21,31),
col2=c(12,22,32),
col3=c(13,23,33))
mydata
# col0 col1 col2 col3
# row1 11 12 13
# row2 21 22 23
# row3 31 32 33
dcast(melt(mydata, id.vars = "col0"), variable ~ col0)
# variable row1 row2 row3
# 1: col1 11 21 31
# 2: col2 12 22 32
# 3: col3 13 23 33
我对正在使用的数据使用相同的逻辑:
x <- merge(as.data.frame(table(mtcars$mpg)), as.data.frame(round(prop.table(table(mtcars$mpg)),2)), by="Var1", all.x=TRUE)
data.table::dcast(data.table::melt(x, id.vars = "Var1"), variable ~ Var1)
有效!但它给了我一个警告和一个“未来的错误”:
Warning message in data.table::melt(x, id.vars = "Var1"): “The melt generic in data.table has been passed a data.frame and will attempt to redirect to the relevant reshape2 method; please note that reshape2 is deprecated, and this redirection is now deprecated as well. To continue using melt methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the namespace like reshape2::melt(x). In the next version, this warning will become an error.” Warning message in data.table::dcast(data.table::melt(x, id.vars = "Var1"), variable ~ : “The dcast generic in data.table has been passed a data.frame and will attempt to redirect to the reshape2::dcast; please note that reshape2 is deprecated, and this redirection is now deprecated as well. Please do this redirection yourself like reshape2::dcast(data.table::melt(x, id.vars = "Var1")). In the next version, this warning will become an error.”
此外,我一直在尝试使用 here 使用 dplyr::spread()
的解决方案转置数据帧,但它似乎比 data.table
包的解决方案复杂得多(当值列多于 1,如本例所示)。我更习惯 dplyr()
和 tidyverse()
但 data.table
解决方案要简单得多,只需忽略它即可。
附加信息。
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GGally_1.4.0 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3
[5] purrr_0.3.2 readr_1.3.1 tidyr_1.0.0 tibble_2.1.3
[9] ggplot2_3.2.1.9000 tidyverse_1.2.1 bigrquery_1.2.0 httr_1.4.1
loaded via a namespace (and not attached):
[1] bit64_0.9-7 jsonlite_1.6 splines_3.6.0
[4] modelr_0.1.4 Formula_1.2-3 assertthat_0.2.1
[7] getPass_0.2-2 latticeExtra_0.6-28 cellranger_1.1.0
[10] pillar_1.4.2 backports_1.1.5 lattice_0.20-38
[13] glue_1.3.1 uuid_0.1-2 digest_0.6.21
[16] checkmate_1.9.4 RColorBrewer_1.1-2 rvest_0.3.4
[19] colorspace_1.4-1 htmltools_0.4.0 Matrix_1.2-17
[22] plyr_1.8.4 psych_1.8.12 pkgconfig_2.0.3
[25] broom_0.5.2 haven_2.1.1 scales_1.0.0
[28] htmlTable_1.13.2 generics_0.0.2 withr_2.1.2
[31] repr_1.0.1.9000 skimr_1.0.7 nnet_7.3-12
[34] cli_1.1.0 mnormt_1.5-5 survival_2.44-1.1
[37] magrittr_1.5 crayon_1.3.4 readxl_1.3.1
[40] evaluate_0.14 fs_1.3.1 nlme_3.1-141
[43] xml2_1.2.2 foreign_0.8-72 data.table_1.12.4
[46] tools_3.6.0 hms_0.5.1 gargle_0.4.0
[49] lifecycle_0.1.0 munsell_0.5.0 cluster_2.1.0
[52] compiler_3.6.0 rlang_0.4.0 grid_3.6.0
[55] pbdZMQ_0.3-3 IRkernel_1.0.2.9000 rstudioapi_0.10
[58] htmlwidgets_1.5.1 base64enc_0.1-3 gtable_0.3.0
[61] DBI_1.0.0 reshape_0.8.8 reshape2_1.4.3
[64] R6_2.4.0 gridExtra_2.3 lubridate_1.7.4
[67] knitr_1.25 bit_1.1-14 zeallot_0.1.0
[70] Hmisc_4.2-0 stringi_1.4.3 parallel_3.6.0
[73] IRdisplay_0.7.0.9000 Rcpp_1.0.2 vctrs_0.2.0
[76] rpart_4.1-15 acepack_1.4.1 xfun_0.10
[79] tidyselect_0.2.5
新的 tidyr 1.0 功能使这变得容易得多:
library(tidyverse)
library(magrittr)
#>
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#>
#> set_names
#> The following object is masked from 'package:tidyr':
#>
#> extract
mydata <- tibble(col0=c("row1","row2","row3"),
col1=c(11,21,31),
col2=c(12,22,32),
col3=c(13,23,33))
# First collect all the values in the one column
(new_data <- mydata %>% pivot_longer(col1:col3))
#> # A tibble: 9 x 3
#> col0 name value
#> <chr> <chr> <dbl>
#> 1 row1 col1 11
#> 2 row1 col2 12
#> 3 row1 col3 13
#> 4 row2 col1 21
#> 5 row2 col2 22
#> 6 row2 col3 23
#> 7 row3 col1 31
#> 8 row3 col2 32
#> 9 row3 col3 33
# Col0 is what we want the new column names to come from, so:
(new_data %<>% pivot_wider(names_from = col0))
#> # A tibble: 3 x 4
#> name row1 row2 row3
#> <chr> <dbl> <dbl> <dbl>
#> 1 col1 11 21 31
#> 2 col2 12 22 32
#> 3 col3 13 23 33
因此对于您的 mtcars
用例:
library(tidyverse)
(x <-
mtcars %>%
group_by(mpg) %>%
summarize(Freq.x = n(),
Freq.y = Freq.x/nrow(.)) %>%
pivot_longer(-mpg) %>%
pivot_wider(names_from = mpg))
#> # A tibble: 2 x 26
#> name `10.4` `13.3` `14.3` `14.7` `15` `15.2` `15.5` `15.8` `16.4`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Freq~ 2 1 1 1 1 2 1 1 1
#> 2 Freq~ 0.0625 0.0312 0.0312 0.0312 0.0312 0.0625 0.0312 0.0312 0.0312
#> # ... with 16 more variables: `17.3` <dbl>, `17.8` <dbl>, `18.1` <dbl>,
#> # `18.7` <dbl>, `19.2` <dbl>, `19.7` <dbl>, `21` <dbl>, `21.4` <dbl>,
#> # `21.5` <dbl>, `22.8` <dbl>, `24.4` <dbl>, `26` <dbl>, `27.3` <dbl>,
#> # `30.4` <dbl>, `32.4` <dbl>, `33.9` <dbl>
我对data.table
基本一无所知,但是这让我不红
现在,如果您的值不是同一类型,这仍然会给您带来问题 - 因为它仍然在某个时候将所有值堆叠到一列中 - 所以我打算建议 nest()
方法。但后来我意识到......如果你想转置这个东西,并且行不是所有相同的值类型,那么你最终会尝试将不同类型的值放入一列,不是吗?所以一些同质化的转换将是不可避免的。
由 reprex package (v0.3.0)
于 2019-10-22 创建您需要确保将 data.table
对象传递给 data.table::melt
和 data.table::dcast
。
x<-merge(as.data.frame(table(mtcars$mpg)),
as.data.frame(round(prop.table(table(mtcars$mpg)),2)),
by="Var1", all.x=TRUE)
data.table::dcast(data.table::melt(data.table::setDT(x), id.vars = "Var1"),
variable ~ Var1)
警告:
您看到,通过使用 data.table::setDT
,“未来错误” 已解决。
#> Warning in melt.data.table(data.table::setDT(x), id.vars = "Var1"):
#> 'measure.vars' [Freq.x, Freq.y] are not all of the same type. By order
#> of hierarchy, the molten data value column will be of type 'double'. All
#> measure variables not of type 'double' will be coerced too. Check DETAILS
#> in ?melt.data.table for more on coercion.
输出:
#> variable 10.4 13.3 14.3 14.7 15 15.2 15.5 15.8 16.4 17.3 17.8 18.1
#> 1: Freq.x 2.00 1.00 1.00 1.00 1.00 2.00 1.00 1.00 1.00 1.00 1.00 1.00
#> 2: Freq.y 0.06 0.03 0.03 0.03 0.03 0.06 0.03 0.03 0.03 0.03 0.03 0.03
#> 18.7 19.2 19.7 21 21.4 21.5 22.8 24.4 26 27.3 30.4 32.4 33.9
#> 1: 1.00 2.00 1.00 2.00 2.00 1.00 2.00 1.00 1.00 1.00 2.00 1.00 1.00
#> 2: 0.03 0.06 0.03 0.06 0.06 0.03 0.06 0.03 0.03 0.03 0.06 0.03 0.03
P.S. 我无法在 data.table_1.12.2
中重现错误,不得不更新到 data.table_1.12.6
.
I need to transpose a dataframe without using t() as I want to avoid transforming it into a matrix.
如果您唯一的要求是避免将数据框强制转换为矩阵,您可以使用 data.table::transpose
,这需要版本 >= 1.12.4
data.table::transpose(
mydata,
keep.names = 'variable',
make.names = names(mydata)[1])
# variable row1 row2 row3
# 1: col1 11 21 31
# 2: col2 12 22 32
# 3: col3 13 23 33