R: Scale a subset of multiple columns (with similar names) with dplyr

Question

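I recently switched from ordinary data frame manipulation in R to the tidyverse, but I have run into a problem scaling columns with the scale() function. My data consists of several columns, some of them numeric features and some categorical features. The last column is the y value of the data. So I want to scale all numeric columns except the last one. With the select() function I can write a very short line of code that selects all the numeric columns I need to scale by adding the ends_with("...") argument. But I can't really take advantage of that for scaling. There I have to use transmute(feature1=scale(feature1),feature2=scale(feature2)...) and name each feature individually. This works fine but bloats the code. So my question is: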

Is there a smart solution to manipulate column by column without the need to address every single column name with transmute?

I imagine something like this:

transmute(ends_with("...")=scale(ends_with("..."),featureX,featureZ)

(I am well aware that this does not work.)

Thanks a lot.

Answer 1

library(tidyverse)
data("economics")

# set the seed first so that both the sampled letters and y are reproducible
set.seed(1)

# add variables that are not numeric
economics[7:9] <- sample(LETTERS[1:10], size = nrow(economics), replace = TRUE)

# add a 'y' column (for illustration)
economics$y <- rnorm(n = nrow(economics))

# drop y, scale the numeric columns (non-numeric ones are dropped), then re-attach y
economics_modified <- economics %>%
  select(-y) %>%
  transmute_if(is.numeric, scale) %>%
  add_column(y = economics$y)

If you want to keep the non-numeric columns, replace transmute_if with modify_if (from purrr). (There is probably a smarter way to keep column y out of the scaling.)
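One possible way to keep y out of the scaling without removing and re-adding it is across() from dplyr 1.0.0 or later, which accepts the same selection helpers as select(). The following is only a sketch on made-up data: the column names, the "_feat" suffix, and the helper scale_vec are hypothetical stand-ins, since the question does not show the real names.

```r
library(dplyr)

# Hypothetical toy data: two numeric features sharing a suffix, one
# categorical feature, and a numeric response y that must stay unscaled.
df <- tibble(
  height_feat = c(1, 2, 3, 4, 5),
  weight_feat = c(10, 20, 30, 40, 50),
  group       = c("a", "b", "a", "b", "a"),
  y           = c(0.5, 1.5, 2.5, 3.5, 4.5)
)

# scale() returns a one-column matrix; as.numeric() turns the result
# back into a plain numeric vector. (scale_vec is an illustrative helper.)
scale_vec <- function(x) as.numeric(scale(x))

# Option 1: scale every numeric column except y, keep all other columns.
scaled_numeric <- df %>%
  mutate(across(where(is.numeric) & !y, scale_vec))

# Option 2: scale only the columns whose names end with a given suffix.
scaled_by_name <- df %>%
  mutate(across(ends_with("_feat"), scale_vec))
```

Because mutate() is used instead of transmute(), the categorical column and y are carried through unchanged, so no add_column() step is needed.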
