How can I summarize multiple string values by a column value in R?
OK, so this question isn't quite as simple as the title makes it sound. I have a table structured like this:
| Brand | First Name | Last Name | Amount | e-mail |
|-------|------------|-----------|---------|---------------------|
| A | John | Smith | 920 USD | johnsmith@email.com |
| A | Mary | Smith | 650 USD | johnsmith@email.com |
| A | Margaret | Smith | 400 USD | johnsmith@email.com |
| B | Eric | Davis | 120 USD | jdavis@email.com |
| B | Wanda | Davis | 500 USD | jdavis@email.com |
| B | Jean | Davis | 300 USD | jdavis@email.com |
| A | Daniel | Barnes | 400 USD | dbarnes@email.com |
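What I ultimately want to do is generate e-mails that notify customers of their credit balance. In the example above, I'd want to send one e-mail to johnsmith@email.com saying something like "You have credits with Brand A. John Smith has 920 USD, Mary Smith has 650 USD, Margaret Smith has 400 USD."

For this question I don't need to get all the way there. What I'd like is one row per e-mail that somehow contains the information from every row for that e-mail. Maybe some kind of generated concatenated field? This seems simple in theory, but in practice I'm having a hard time working out how to do it in R. Any help would be greatly appreciated!

Bonus: I'm also fairly experienced with MySQL, so if there's a better way to do this in SQL, that would be great too!

Edit: dput output (names and e-mails edited)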
```r
structure(list(BRAND = c("R", "C", "C", "C", "C", "R", "R", "C",
"C", "C"), GUEST_S_LAST_NAME = c("Stockman", "Ericson", "Ericson",
"Alcin", "Andrews", "Smith", "Smith", "Brown", "Brown", "Brown"
), GUEST_S_FIRST_NAME = c("Margaret", "Abraham", "Naomi", "Dina",
"Arthur", "Laura", "Alan", "Gregory", "Marina", "Viktoria"),
    COMPENSATIONAMOUNT_OR_PERCENT = c("920 USD", "1363 USD",
    "1363 USD", "452 USD", "452 USD", "250 USD", "250 USD", "1019 USD",
    "1019 USD", "323 USD"), EXPIRATION_DATE = c("04/30/2022 12:00:00 00 am",
    "12/31/2021 12:00:00 00 am", "12/31/2021 12:00:00 00 am",
    "12/31/2021 12:00:00 00 am", "12/31/2021 12:00:00 00 am",
    "04/30/2022 12:00:00 00 am", "04/30/2022 12:00:00 00 am",
    "12/31/2021 12:00:00 00 am", "12/31/2021 12:00:00 00 am",
    "12/31/2021 12:00:00 00 am"), EMAIL = c("email1@email.com",
    "email2@email.com", "email2@email.com", "email3@email.com",
    "email3@email.com", "email4@email.com", "email4@email.com",
    "email5@email.com", "email5@email.com", "email5@email.com"
    )), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
```
Here's my approach using dplyr:

```r
library(dplyr)

your_data %>%
  group_by(BRAND, EMAIL) %>%
  summarize(
    text = paste0(
      # first(BRAND) keeps the opening sentence length 1, so each group yields one row
      sprintf("You have credits with Brand %s. ", first(BRAND)),
      paste(sprintf("%s %s has %s",
                    GUEST_S_FIRST_NAME,
                    GUEST_S_LAST_NAME,
                    COMPENSATIONAMOUNT_OR_PERCENT),
            collapse = ", "),
      "."
    ),
    .groups = "drop"
  )
```

This returns one row per `BRAND`/`EMAIL` pair (5 rows for the sample data):

| BRAND | EMAIL | text |
|-------|-------|------|
| C | email2@email.com | You have credits with Brand C. Abraham Ericson has 1363 USD, Naomi Ericson has 1363 USD. |
| C | email3@email.com | You have credits with Brand C. Dina Alcin has 452 USD, Arthur Andrews has 452 USD. |
| C | email5@email.com | You have credits with Brand C. Gregory Brown has 1019 USD, Marina Brown has 1019 USD, Viktoria Brown has 323 USD. |
| R | email1@email.com | You have credits with Brand R. Margaret Stockman has 920 USD. |
| R | email4@email.com | You have credits with Brand R. Laura Smith has 250 USD, Alan Smith has 250 USD. |
```r
# Data used:
your_data <- structure(list(BRAND = c("R", "C", "C", "C", "C", "R", "R", "C",
"C", "C"), GUEST_S_LAST_NAME = c("Stockman", "Ericson", "Ericson",
"Alcin", "Andrews", "Smith", "Smith", "Brown", "Brown", "Brown"
), GUEST_S_FIRST_NAME = c("Margaret", "Abraham", "Naomi", "Dina",
"Arthur", "Laura", "Alan", "Gregory", "Marina", "Viktoria"),
    COMPENSATIONAMOUNT_OR_PERCENT = c("920 USD", "1363 USD",
    "1363 USD", "452 USD", "452 USD", "250 USD", "250 USD", "1019 USD",
    "1019 USD", "323 USD"), EXPIRATION_DATE = c("04/30/2022 12:00:00 00 am",
    "12/31/2021 12:00:00 00 am", "12/31/2021 12:00:00 00 am",
    "12/31/2021 12:00:00 00 am", "12/31/2021 12:00:00 00 am",
    "04/30/2022 12:00:00 00 am", "04/30/2022 12:00:00 00 am",
    "12/31/2021 12:00:00 00 am", "12/31/2021 12:00:00 00 am",
    "12/31/2021 12:00:00 00 am"), EMAIL = c("email1@email.com",
    "email2@email.com", "email2@email.com", "email3@email.com",
    "email3@email.com", "email4@email.com", "email4@email.com",
    "email5@email.com", "email5@email.com", "email5@email.com"
    )), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
```
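If dplyr isn't available, the same "one concatenated row per group" idea can be sketched in base R with `split()` and `lapply()`. This is only an illustration under a few assumptions: `build_message()` is a hypothetical helper name, not something from the question or from any package, and the data frame below re-types the columns from the `dput()` output that the message actually uses.

```r
# Sample rows copied from the dput() output (only the columns used here).
your_data <- data.frame(
  BRAND = c("R", "C", "C", "C", "C", "R", "R", "C", "C", "C"),
  GUEST_S_LAST_NAME = c("Stockman", "Ericson", "Ericson", "Alcin", "Andrews",
                        "Smith", "Smith", "Brown", "Brown", "Brown"),
  GUEST_S_FIRST_NAME = c("Margaret", "Abraham", "Naomi", "Dina", "Arthur",
                         "Laura", "Alan", "Gregory", "Marina", "Viktoria"),
  COMPENSATIONAMOUNT_OR_PERCENT = c("920 USD", "1363 USD", "1363 USD",
                                    "452 USD", "452 USD", "250 USD", "250 USD",
                                    "1019 USD", "1019 USD", "323 USD"),
  EMAIL = c("email1@email.com", "email2@email.com", "email2@email.com",
            "email3@email.com", "email3@email.com", "email4@email.com",
            "email4@email.com", "email5@email.com", "email5@email.com",
            "email5@email.com"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse all rows of one brand/e-mail group
# into a single-row data frame holding the finished sentence.
build_message <- function(rows) {
  people <- sprintf("%s %s has %s",
                    rows$GUEST_S_FIRST_NAME,
                    rows$GUEST_S_LAST_NAME,
                    rows$COMPENSATIONAMOUNT_OR_PERCENT)
  data.frame(
    BRAND = rows$BRAND[1],
    EMAIL = rows$EMAIL[1],
    text = paste0("You have credits with Brand ", rows$BRAND[1], ". ",
                  paste(people, collapse = ", "), "."),
    stringsAsFactors = FALSE
  )
}

# Split into one data frame per BRAND/EMAIL combination that actually occurs
# (drop = TRUE discards the empty combinations), then stack the results.
groups <- split(your_data, list(your_data$BRAND, your_data$EMAIL), drop = TRUE)
messages <- do.call(rbind, lapply(groups, build_message))
rownames(messages) <- NULL
```

`messages` then holds one row per brand and e-mail pair, with the full sentence in `text`. If a single e-mail address could carry credits for several brands and you want exactly one row per address, you would split on `EMAIL` alone and adjust the sentence template accordingly.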