Is there an option to left_join a list-column?
Suppose I have a dataset containing sales:
library(tidyverse)
initial_data <- tribble(
~ billing_loc, ~ type, ~ billing_comment,
"New York", "RE", "aaaa ssss 003tt",
"London", "ZO", "BO",
"Paris", "ZO", "003sd, 003Wf; 003ghdiscount"
)
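First, I want to extract all project IDs given in the billing_comment column. A project ID always starts with "003" and is 5 characters long. I did this with the following code: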
modified_data <- initial_data %>%
mutate(project_id = str_extract_all(billing_comment, "003.."))
Second, I want to use left_join to look up information about the IDs in a second table and insert it into a list-column:
project_id_database <- tribble(
~ project_id, ~ type, ~ owner,
"003tt", "ZO", "Juan",
"003sd", "ZO", "Mike",
"003aA", "RE", "Brent",
"003Wf", "ZO", "Brent",
"003gh", "RE", "Anna",
"003qQ", "ZO", "Donald"
)
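Is there a way to use left_join on nested data without unnesting it, and get a list-column of tibbles containing all the information about these IDs (not to be confused with …)? The desired result would look something like this: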
  billing_loc type  billing_comment             project_data
  <chr>       <chr> <chr>                       <list>
1 New York    RE    aaaa ssss 003tt             <tibble [1 x 3]>
2 London      ZO    BO                          <lgl [1]>
3 Paris       ZO    003sd, 003Wf; 003ghdiscount <tibble [3 x 3]>
I found a way to do this using unnest() followed by left_join, but I think there should be a more efficient solution.
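Finally, it would be great to have a solution that adds a column with just one ID, the one that matches the condition in the "type" column (if several IDs match, it should return the first one). For this I used map(), but I don't think this approach is as efficient as it could be, because I rely on a "status" column for it: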
test_data_nest %>%
mutate(final_project = map(data, ~ filter(., status == "Match"))) %>%
mutate(final_project = map(final_project, 1))
I think separating the rows, extracting project_id, attaching the details, nesting, and then joining back would be simpler and faster than trying to map:
initial_data %>%
separate_rows(billing_comment) %>%
mutate(project_id = str_extract(billing_comment, "003..")) %>%
inner_join(project_id_database %>% select(-type), by="project_id") %>%
group_by(billing_loc, type) %>%
nest() %>%
right_join(initial_data, by=c("billing_loc", "type"))
# A tibble: 3 x 4
# Groups:   billing_loc, type [3]
#   billing_loc type  data             billing_comment
#   <chr>       <chr> <list>           <chr>
# 1 New York    RE    <tibble [1 x 3]> aaaa ssss 003tt
# 2 Paris       ZO    <tibble [3 x 3]> 003sd, 003Wf; 003ghdiscount
# 3 London      ZO    <NULL>           BO
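The last part of the question (a single ID per row whose type matches the row's type, taking the first if several match) could be handled in the same long-format style. This is only a minimal sketch, not a tested production solution: `row_id`, `keyed`, `first_match` and `result` are names made up for illustration, and it assumes "first" means first in the order the IDs appear in billing_comment.

```r
library(tidyverse)

initial_data <- tribble(
  ~billing_loc, ~type, ~billing_comment,
  "New York", "RE", "aaaa ssss 003tt",
  "London",   "ZO", "BO",
  "Paris",    "ZO", "003sd, 003Wf; 003ghdiscount"
)

project_id_database <- tribble(
  ~project_id, ~type, ~owner,
  "003tt", "ZO", "Juan",
  "003sd", "ZO", "Mike",
  "003aA", "RE", "Brent",
  "003Wf", "ZO", "Brent",
  "003gh", "RE", "Anna",
  "003qQ", "ZO", "Donald"
)

# Hypothetical helper key so rows can be matched back up after unnesting
keyed <- initial_data %>%
  mutate(row_id = row_number())

first_match <- keyed %>%
  mutate(project_id = str_extract_all(billing_comment, "003..")) %>%
  unnest(project_id) %>%                       # one row per extracted ID
  inner_join(project_id_database,              # keep IDs whose database type
             by = c("project_id", "type")) %>% # equals the row's type
  group_by(row_id) %>%
  slice(1) %>%                                 # first matching ID per original row
  ungroup() %>%
  select(row_id, final_project = project_id)

result <- keyed %>%
  left_join(first_match, by = "row_id") %>%
  select(-row_id)

# With the sample data only the Paris row gets a value ("003sd");
# New York and London have no ID of a matching type and stay NA.
result
```

Joining on both project_id and type does the filtering that the "status" column was used for in the question, so no intermediate status flag is needed.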
If you want to do this without unnesting the data into long format, you can use:
library(tidyverse)
initial_data %>%
mutate(project_id = str_extract_all(billing_comment, "003.."),
data = map(project_id,
~project_id_database[match(.x, project_id_database$project_id), ]))
# A tibble: 3 x 5
#   billing_loc type  billing_comment             project_id data
#   <chr>       <chr> <chr>                       <list>     <list>
# 1 New York    RE    aaaa ssss 003tt             <chr [1]>  <tibble [1 × 3]>
# 2 London      ZO    BO                          <chr [0]>  <tibble [0 × 3]>
# 3 Paris       ZO    003sd, 003Wf; 003ghdiscount <chr [3]>  <tibble [3 × 3]>
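If you specifically want left_join semantics inside the list-column, one possible variant is to call left_join once per element with map(). This is a sketch under assumptions rather than a recommended idiom: the column name `project_data` is borrowed from the desired output in the question and `result` is just an illustrative name. The practical difference from the match() indexing is that an extracted ID missing from the lookup table would keep its project_id, with NA in the other columns.

```r
library(tidyverse)

initial_data <- tribble(
  ~billing_loc, ~type, ~billing_comment,
  "New York", "RE", "aaaa ssss 003tt",
  "London",   "ZO", "BO",
  "Paris",    "ZO", "003sd, 003Wf; 003ghdiscount"
)

project_id_database <- tribble(
  ~project_id, ~type, ~owner,
  "003tt", "ZO", "Juan",
  "003sd", "ZO", "Mike",
  "003aA", "RE", "Brent",
  "003Wf", "ZO", "Brent",
  "003gh", "RE", "Anna",
  "003qQ", "ZO", "Donald"
)

result <- initial_data %>%
  mutate(
    # list-column of character vectors, one vector of IDs per row
    project_id = str_extract_all(billing_comment, "003.."),
    # wrap each vector in a one-column tibble and left_join it to the lookup table
    project_data = map(
      project_id,
      ~ left_join(tibble(project_id = .x), project_id_database, by = "project_id")
    )
  )

result
```

Rows without any ID (London here) end up with a zero-row tibble rather than NA, the same as with the match() version.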