R - Performing a CountIF for a multiple rows data frame
I've googled lots of examples of how to perform a CountIF in R, but I still haven't found the solution I'm looking for.
I basically have 2 data frames:
df1: customer_id | date_of_export - here, we have only 1 date of export per customer
df2: customer_id | date_of_delivery - here, a customer can have several delivery dates (which means the same customer can appear more than once in the list)
For each customer_id in df1, I need to count how many deliveries they received after the date of export. So, I need to count if df1$customer_id = df2$customer_id AND df1$date_of_export <= df2$date_of_delivery
To make it clearer:
customer_id | date_of_export
1 | 2018-01-12
2 | 2018-01-12
3 | 2018-01-12
customer_id | date_of_delivery
1 | 2018-01-10
1 | 2018-01-17
2 | 2018-01-13
2 | 2018-01-20
3 | 2018-01-04
My output should be:
customer_id | date_of_export | deliveries_after_export
1 | 2018-01-12 | 1 (one delivery after the export date)
2 | 2018-01-12 | 2 (two deliveries after the export date)
3 | 2018-01-12 | 0 (no delivery after the export date)
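It doesn't seem that complicated, but I haven't found a good way to do it. I've been struggling for 2 days with nothing to show for it.
I hope I made myself clear here. Thank you!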
I would suggest merging the two data.frames together; after that it is a simple sum():
library(data.table)
df3 <- merge(df1, df2)
setDT(df3)[, .(deliveries_after_export = sum(date_of_delivery > date_of_export)), by = .(customer_id, date_of_export)]
#    customer_id date_of_export deliveries_after_export
# 1:           1     2018-01-12                       1
# 2:           2     2018-01-12                       2
# 3:           3     2018-01-12                       0
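For comparison, here is a minimal base R sketch of the same count without data.table, written as a direct translation of the CountIF condition from the question. It rebuilds the example data itself, and it assumes both date columns are stored as Date (the as.Date() calls are my assumption, since the question does not say how the columns are typed). Note two differences from the merge approach: this sketch uses >= to match the "date_of_export <= date_of_delivery" condition as the question states it, whereas the code above uses a strict >; and because merge() performs an inner join by default, a customer in df1 with no rows at all in df2 would be dropped from the merged result, while the loop below keeps that customer with a count of 0. On the example data both variants give the same counts.

```r
# Example data from the question, rebuilt as data frames
# (Date columns are an assumption about how the data is stored)
df1 <- data.frame(
  customer_id    = c(1, 2, 3),
  date_of_export = as.Date(c("2018-01-12", "2018-01-12", "2018-01-12"))
)
df2 <- data.frame(
  customer_id      = c(1, 1, 2, 2, 3),
  date_of_delivery = as.Date(c("2018-01-10", "2018-01-17", "2018-01-13",
                               "2018-01-20", "2018-01-04"))
)

# For each row of df1, count the df2 rows with the same customer_id
# and a delivery date on or after that customer's export date
df1$deliveries_after_export <- vapply(
  seq_len(nrow(df1)),
  function(i) {
    sum(df2$customer_id == df1$customer_id[i] &
          df2$date_of_delivery >= df1$date_of_export[i])
  },
  integer(1)
)

print(df1)
```

This compares every row of df1 against every row of df2, so it is fine for small tables, but the merge-and-aggregate approach is likely the better fit for large ones.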