Renaming a Summarised Column inside Redshift dplyr operations

Question

I am using dplyr to perform some operations in Redshift, so that I don't load the data into memory.

data <- tbl(conn, "customers") %>%
  filter(age >= 18)
subset <- data %>% 
  filter(eye_color != "brown") %>%
  group_by(gender, method, age, region) %>% 
  summarise(sum(purchases)) %>%  # will create a column called sum(purchases)
  full_join(data, by=c("region", "age", "method"))

Now, when I look at the resulting data frame, I see a column called sum(purchases). I want to rename it to purchases, so that the join produces the columns purchases.x and purchases.y.

The answers I have read so far deal with data frames that are in memory rather than tables that are lazily evaluated with dbplyr. I have tried rename, rename_ and rename_at, as well as different variations of select. I have also tried the strategies laid out in other answers, but with no luck.

Is there a way to rename sum(purchases)? My only other option is to load the data frame into memory at a certain step:

data <- tbl(conn, "customers") %>%
  filter(age >= 18)
subset <- data %>% 
  filter(eye_color != "brown") %>%
  group_by(gender, method, age, region) %>% 
  summarise(sum(purchases))
loaded <- as.data.frame(subset)
# do some join here but in memory and not in Redshift
# full_join(data, by=c("region", "age", "method"))

Answer 1

You can specify the name within summarise. I don't have your data, so I can't triple-check, but I have used this in my own code when calling summarise(n()). Something like...

summarise(your_column_name = sum(purchases))

You can also give it a column name that contains spaces; you just need to wrap it in backticks:

summarise(`your column name` = sum(purchases))
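To check this without a Redshift connection, here is a minimal sketch of the pipeline from the question with the summary named in place. It assumes dbplyr's lazy_frame() and simulate_redshift() as a stand-in for tbl(conn, "customers"); the column types are made up for illustration, and na.rm = TRUE is only there to silence dbplyr's warning about missing values in SQL aggregates.

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for tbl(conn, "customers"): an empty lazy table that
# only generates Redshift-flavoured SQL and never touches a database.
customers <- lazy_frame(
  gender = character(), method = character(), age = integer(),
  region = character(), eye_color = character(), purchases = numeric(),
  con = simulate_redshift()
)

data <- customers %>%
  filter(age >= 18)

subset <- data %>%
  filter(eye_color != "brown") %>%
  group_by(gender, method, age, region) %>%
  summarise(purchases = sum(purchases, na.rm = TRUE))  # named in place

joined <- subset %>%
  full_join(data, by = c("region", "age", "method"))

colnames(joined)    # column names of the lazy result; nothing is collected
show_query(joined)  # prints the SQL that would be sent to Redshift
```

Because the name is set inside summarise, no separate rename step is needed, and the join should disambiguate the two purchases columns with the default .x and .y suffixes. With a real connection, replacing customers with tbl(conn, "customers") should be the only change.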

r

amazon-redshift

dplyr

dbplyr