How to do a dplyr inner_join with col1 > col2
I'm having trouble getting dplyr joins to work when I'm not using a standard "col1" = "col2" join. Here are the two examples I ran into.
First:
library(dplyr)
tableA <- data.frame(col1 = c("a", "b", "c", "d"),
                     col2 = c(1, 2, 3, 4))
inner_join(tableA, tableA, by = c("col1" != "col1")) %>%
  select(col1, col2.x) %>%
  arrange(col1, col2.x)
Error: `by` must be a (named) character vector, list, or NULL for natural joins (not recommended in production code), not logical
When I reproduce this with SQL instead, I get the following:
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, tableA)
tbl(con, sql("select a.col1, b.col2
              from tableA as a
              inner join tableA as b
              on a.col1 <> b.col1")) %>%
  arrange(col1, col2)
Result of the SQL query:
# Source: SQL [?? x 2]
# Database: sqlite 3.19.3 [:memory:]
# Ordered by: col1, col2
col1 col2
<chr> <dbl>
1 a 2
2 a 3
3 a 4
4 b 1
5 b 3
6 b 4
7 c 1
8 c 2
9 c 4
10 d 1
# ... with more rows
The second case is similar to the first:
inner_join(tableA, tableA, by = c("col1" > "col1")) %>%
  select(col1, col2.x) %>%
  arrange(col1, col2.x)
Error: `by` must be a (named) character vector, list, or NULL for natural joins (not recommended in production code), not logical
The SQL equivalent:
tbl(con, sql("select a.col1, b.col2
              from tableA as a
              inner join tableA as b
              on a.col1 > b.col1")) %>%
  arrange(col1, col2)
Result of the second SQL query:
# Source: SQL [?? x 2]
# Database: sqlite 3.19.3 [:memory:]
# Ordered by: col1, col2
col1 col2
<chr> <dbl>
1 b 1
2 c 1
3 c 2
4 d 1
5 d 2
6 d 3
Does anyone know how to reproduce these SQL examples using dplyr code?
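The error in both attempts comes from how R evaluates the by argument: the comparison inside c() is computed on the two string literals before inner_join() ever sees it, so by receives a single logical value rather than column names. A minimal base R check (the names by_neq and by_gt are only illustrative):

```r
# The comparison runs on the string literals themselves,
# not on the columns they name.
by_neq <- c("col1" != "col1")
by_gt  <- c("col1" > "col1")

by_neq         # FALSE: the two strings are identical
by_gt          # FALSE: a string is not greater than itself
class(by_neq)  # "logical", which is what the error message complains about
```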
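A solution using dplyr and tidyr. The idea is to expand the data frame to every combination of col1 and col2, then join it back to the original data frame. After that, use fill() from tidyr to replace the NA values with a neighboring record within each col1 group (here upward, via .direction = "up"). Finally, filter out the records where the values are equal and the records that are still NA.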
library(dplyr)
library(tidyr)
tableB <- tableA %>%
  complete(col1, col2) %>%
  left_join(tableA %>% mutate(col3 = col2), by = c("col1", "col2")) %>%
  group_by(col1) %>%
  fill(col3, .direction = "up") %>%
  filter(col2 != col3, !is.na(col3)) %>%
  select(-col3) %>%
  ungroup()
tableB
# # A tibble: 6 x 2
# col1 col2
# <chr> <dbl>
# 1 b 1
# 2 c 1
# 3 c 2
# 4 d 1
# 5 d 2
# 6 d 3
Data
tableA <- data.frame(col1 = c("a", "b", "c", "d"),
                     col2 = c(1, 2, 3, 4), stringsAsFactors = FALSE)
For your first case:
library(dplyr)
library(tidyr)
expand(tableA, col1, col2) %>%
  left_join(tableA, by = 'col1') %>%
  filter(col2.x != col2.y) %>%
  select(col1, col2 = col2.x)
Result:
# A tibble: 12 x 2
col1 col2
<fctr> <dbl>
1 a 2
2 a 3
3 a 4
4 b 1
5 b 3
6 b 4
7 c 1
8 c 2
9 c 4
10 d 1
11 d 2
12 d 3
For your second case:
expand(tableA, col1, col2) %>%
  left_join(tableA, by = 'col1') %>%
  filter(col2.x < col2.y) %>%
  select(col1, col2 = col2.x)
Result:
# A tibble: 6 x 2
col1 col2
<fctr> <dbl>
1 b 1
2 c 1
3 c 2
4 d 1
5 d 2
6 d 3
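As a possible alternative to the approaches above, here is a sketch that compares col1 directly, the way the SQL ON clauses do. It assumes dplyr 1.1.0 or later, where cross_join() and join_by() are available; that version requirement is an assumption and not something stated in the thread. The names case1 and case2 are only illustrative. join_by() accepts inequality conditions such as > but not !=, so the first case is written as a cross join followed by a filter.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for cross_join() and join_by()

tableA <- data.frame(col1 = c("a", "b", "c", "d"),
                     col2 = c(1, 2, 3, 4),
                     stringsAsFactors = FALSE)

# Case 1: ON a.col1 <> b.col1
# Pair every row with every row, then keep the pairs whose keys differ.
case1 <- cross_join(tableA, tableA) %>%
  filter(col1.x != col1.y) %>%
  select(col1 = col1.x, col2 = col2.y) %>%
  arrange(col1, col2)

# Case 2: ON a.col1 > b.col1
# In join_by(), the left-hand side refers to x and the right-hand side to y.
# keep = TRUE retains both key columns as col1.x and col1.y.
case2 <- inner_join(tableA, tableA,
                    by = join_by(col1 > col1), keep = TRUE) %>%
  select(col1 = col1.x, col2 = col2.y) %>%
  arrange(col1, col2)

case1
case2
```

On this data, case1 should have 12 rows and case2 should have 6, matching the two SQL results in the question. A cross join materializes every pair of rows before filtering, so for large tables the join_by() form is likely the better fit wherever it applies.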