How to identify rows present in only one of two datasets by comparing two of the common columns?
I have two data frames with the same column names, as in the example below.
>dataframe1
Company_name Transaction_Code Sum
1: First 2000 234
2: First 3000 562
3: First 4000 105
4: Second 8888 740
5: Third 9000 325
6: Third 4000 145
7: BBB 1000 28
8: BBB 3535 100
>dataframe2
Company_name Transaction_Code Sum
1: First 2000 340
2: First 3000 620
3: First 4000 050
4: Second 8888 400
5: Third 9000 250
6: Third 4000 450
7: BBB 1000 27
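I am trying to check the entries by the values in the first two columns, to see which entries in dataframe1 are missing from dataframe2. As shown, dataframe1 has entry #8, which is missing from dataframe2. I have seen solutions that use dplyr::anti_join for this kind of task with a single condition/column, but it doesn't seem to work when I need to judge entries by the values in two columns.
P.S. I have not included a reproducible example because I didn't see the point. I am far from an expert in R or coding in general, so the question may be lacking in some way; sorry.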
setdiff() might be what you are looking for:
library(dplyr)  # the row-wise data frame method of setdiff() comes from dplyr
df1 <- data.frame(company = c("first","first","first","second","third","third","BBB","BBB"),
                  transac = c(2000,3000,4000,8888,9000,4000,1000,3535),
                  sum = c(234,562,105,740,325,145,28,100))
df2 <- data.frame(company = c("first","first","first","second","third","third","BBB"),
                  transac = c(2000,3000,4000,8888,9000,4000,1000),
                  sum = c(340,620,50,400,250,450,27))
# Compare only the first two columns (company, transac)
setdiff(df1[, 1:2], df2[, 1:2])
returns
company transac
1 BBB 3535
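If you would rather not load a package, the same comparison can be sketched in base R by building one composite key per row from the two columns. This is only an illustration: the names key_cols, key1, key2 and only_in_df1 are made up for the example, and the "\r" separator assumes that character never occurs in the data.

```r
df1 <- data.frame(company = c("first","first","first","second","third","third","BBB","BBB"),
                  transac = c(2000,3000,4000,8888,9000,4000,1000,3535),
                  sum = c(234,562,105,740,325,145,28,100))
df2 <- data.frame(company = c("first","first","first","second","third","third","BBB"),
                  transac = c(2000,3000,4000,8888,9000,4000,1000),
                  sum = c(340,620,50,400,250,450,27))

# Columns that together identify an entry (hypothetical helper name)
key_cols <- c("company", "transac")

# Paste the key columns into one string per row; "\r" is assumed not to occur in the data
key1 <- do.call(paste, c(df1[key_cols], sep = "\r"))
key2 <- do.call(paste, c(df2[key_cols], sep = "\r"))

# Keep the rows of df1 whose key has no match in df2
only_in_df1 <- df1[!key1 %in% key2, ]
only_in_df1
```

Unlike the setdiff() call, this keeps the sum column of the unmatched row as well.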
If you specify the columns to join on, you can do this with anti_join().
library(dplyr)
library(tibble)
dataframe1 = tribble(
~Company_name, ~Transaction_Code, ~Sum,
"First", 2000, 234,
"First", 3000, 562,
"First", 4000, 105,
"Second", 8888, 740,
"Third", 9000, 325,
"Third", 4000, 145,
"BBB", 1000, 28,
"BBB", 3535, 100
)
dataframe2 = tribble(
~Company_name, ~Transaction_Code, ~Sum,
"First", 2000, 340,
"First", 3000, 620,
"First", 4000, 050,
"Second", 8888, 400,
"Third", 9000, 250,
"Third", 4000, 450,
"BBB", 1000, 27
)
anti_join(dataframe1, dataframe2, by = c("Company_name", "Transaction_Code"))
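The title asks about rows present in only one of the two datasets, so it may also be worth running the anti-join in both directions. Below is a minimal sketch under that assumption; the names keys, only_in_1, only_in_2 and only_in_one, as well as the source label column, are introduced here for illustration only.

```r
library(dplyr)

dataframe1 <- data.frame(
  Company_name = c("First","First","First","Second","Third","Third","BBB","BBB"),
  Transaction_Code = c(2000,3000,4000,8888,9000,4000,1000,3535),
  Sum = c(234,562,105,740,325,145,28,100)
)
dataframe2 <- data.frame(
  Company_name = c("First","First","First","Second","Third","Third","BBB"),
  Transaction_Code = c(2000,3000,4000,8888,9000,4000,1000),
  Sum = c(340,620,50,400,250,450,27)
)

# The two columns that decide whether an entry counts as "the same"
keys <- c("Company_name", "Transaction_Code")

# Rows of dataframe1 with no match in dataframe2, and the reverse
only_in_1 <- anti_join(dataframe1, dataframe2, by = keys)
only_in_2 <- anti_join(dataframe2, dataframe1, by = keys)

# Stack both results and record which data frame each row came from
only_in_one <- bind_rows(dataframe1 = only_in_1, dataframe2 = only_in_2, .id = "source")
only_in_one
```

With the example data, only the BBB / 3535 row from dataframe1 comes back, because every key pair in dataframe2 also appears in dataframe1.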