Renaming a Summarised Column inside Redshift dplyr operations
I am using dplyr to perform some operations inside Redshift, so that I do not load the data into memory.
data <- tbl(conn, "customers") %>%
  filter(age >= 18)
subset <- data %>%
  filter(eye_color != "brown") %>%
  group_by(gender, method, age, region) %>%
  summarise(sum(purchases)) %>% # will create a column called sum(purchases)
  full_join(data, by=c("region", "age", "method"))
Now, when I look at the resulting data frame, I see a column called sum(purchases). I want to rename it to purchases, so that the merge creates the columns purchases.x and purchases.y.
All the questions I have read so far deal with data frames that are in memory rather than data frames that are lazily evaluated with dbplyr. I have tried rename, rename_ and rename_at, as well as different variations of select. I have also tried strategies laid out elsewhere, but with no luck.
Is there a way to rename sum(purchases)? My only other option is to load the data frame into memory at a particular step:
data <- tbl(conn, "customers") %>%
  filter(age >= 18)
subset <- data %>%
  filter(eye_color != "brown") %>%
  group_by(gender, method, age, region) %>%
  summarise(sum(purchases))
loaded <- as.data.frame(subset)
# do some join here but in memory and not in Redshift
# full_join(data, by=c("region", "age", "method"))
You can specify the name inside summarise. I don't have your data, so I can't triple-check, but I have used this in my own code when calling summarise(n()). Something like:
summarise(your_column_name = sum(purchases))
You can also pass it a column name that contains spaces; you just need to wrap it in backticks:
summarise(`your column name` = sum(purchases))
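To check this without a live Redshift connection, here is a minimal sketch that rebuilds the pipeline from the question on a simulated backend. It assumes a reasonably recent dbplyr: lazy_frame() and simulate_redshift() stand in for tbl(conn, "customers"), and the sample row values are made up. The na.rm = TRUE argument and the ungroup() call are my additions, not part of the original code.

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for tbl(conn, "customers"): a lazy table with no real
# connection, translated with dbplyr's simulated Redshift backend.
customers <- lazy_frame(
  gender = "f", method = "web", age = 30L, region = "north",
  eye_color = "blue", purchases = 1,
  con = simulate_redshift()
)

data <- customers %>%
  filter(age >= 18)

subset <- data %>%
  filter(eye_color != "brown") %>%
  group_by(gender, method, age, region) %>%
  # Naming the aggregate here sets the column alias in the generated SQL.
  # na.rm = TRUE avoids dbplyr's warning about missing values in SQL aggregates.
  summarise(purchases = sum(purchases, na.rm = TRUE)) %>%
  # Drop the grouping that summarise() leaves behind before joining.
  ungroup() %>%
  full_join(data, by = c("region", "age", "method"))

# Inspect the lazy result; nothing is collected into memory.
result_columns <- colnames(subset)
generated_sql <- as.character(sql_render(subset))
print(result_columns)
cat(generated_sql, "\n")
```

Because the summarised column and the original column are now both called purchases, the join should apply its default .x and .y suffixes to them. With a real connection, calling show_query() on the same pipeline is a quick way to confirm the alias before running anything on the cluster.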