How to count the number of unique values efficiently in a repeated computation?

Here is my transaction data:

from_id   to_id   date_trx     week   amount
<fctr>    <fctr>  <date>       <dbl>  <dbl>
6644      6934    2005-01-01   1      700
6753      8456    2005-01-01   1      600
9242      9333    2005-01-01   1      1000
9843      9115    2005-01-01   1      900
7075      6510    2005-01-02   1      400
8685      7207    2005-01-02   1      1100
...       ...     ...          ...    ...
9866      6697    2010-12-31   313    95.8
9866      5992    2010-12-31   313    139.1
9866      5797    2010-12-31   313    72.1
9866      9736    2010-12-31   313    278.9
9868      8644    2010-12-31   313    242.8
9869      8399    2010-12-31   313    372.2

I want to count the number of unique to_id values for each from_id within each week, i.e.:

data <- data %>% 
  group_by(week,from_id) %>% 
  mutate(weekly_distinct_accounts=n_distinct(to_id))

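However, the computation never seems to finish. What is an efficient way to do this? I also tried the other functions mentioned here, but they did not help either.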

If you want to store the result in data, you can use ave:

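# to_id is a factor: pass its integer codes, otherwise ave() tries to write the counts back into a factor
data$weekly_distinct_accounts <- ave(as.integer(data$to_id), data$from_id, data$week
  , FUN=function(x) length(unique(x)))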

Or use duplicated:

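# as.integer() turns the factor into its integer codes so ave() can return integer counts
data$weekly_distinct_accounts <- ave(as.integer(data$to_id), data$from_id, data$week
  , FUN=function(x) sum(!duplicated(x)))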
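To see what the ave approach produces, here is a minimal sketch on a small made-up data frame. The name toy and its values are hypothetical and are not taken from the question's data. Because to_id is a factor in the question, the sketch passes as.integer(to_id) to ave so that the counts come back as plain integers.

```r
# Toy data (hypothetical values, same column types as in the question)
toy <- data.frame(
  from_id = factor(c("6644", "6644", "6644", "6753", "6753", "6644")),
  to_id   = factor(c("6934", "6934", "8456", "8456", "8456", "9115")),
  week    = c(1, 1, 1, 1, 1, 2)
)

# Variant 1: count the unique values in each (from_id, week) group
toy$weekly_distinct_accounts <- ave(
  as.integer(toy$to_id), toy$from_id, toy$week,
  FUN = function(x) length(unique(x))
)

# Variant 2: count the first occurrences in each group
via_duplicated <- ave(
  as.integer(toy$to_id), toy$from_id, toy$week,
  FUN = function(x) sum(!duplicated(x))
)

print(toy$weekly_distinct_accounts)  # 2 2 2 1 1 1
print(all(toy$weekly_distinct_accounts == via_duplicated))
```

Every row keeps its position, and each row carries the count for its own group, which is the same shape the mutate call in the question would give.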

If you only need one count per group (rather than a value on every row), you can use aggregate:

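# count unique to_id values per (from_id, week)
aggregate(to_id ~ from_id + week, data, function(x) length(unique(x)))

# same count, using duplicated
aggregate(to_id ~ from_id + week, data, function(x) sum(!duplicated(x)))

# drop duplicate (to_id, from_id, week) rows first, then count the rows left in each group
aggregate(to_id ~ ., unique(data[c("to_id", "from_id", "week")]), length)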
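As a quick check that the three aggregate calls agree, the sketch below runs them on a small made-up data frame. The names toy, a1, a2 and a3 are hypothetical, and so are the values. The third call works differently from the first two: it removes duplicate (to_id, from_id, week) rows up front, so counting rows per group is then enough.

```r
# Toy data (hypothetical values, same column types as in the question)
toy <- data.frame(
  from_id = factor(c("6644", "6644", "6644", "6753", "6753", "6644")),
  to_id   = factor(c("6934", "6934", "8456", "8456", "8456", "9115")),
  week    = c(1, 1, 1, 1, 1, 2)
)

# One row per (from_id, week) group; the to_id column holds the distinct count
a1 <- aggregate(to_id ~ from_id + week, toy, function(x) length(unique(x)))
a2 <- aggregate(to_id ~ from_id + week, toy, function(x) sum(!duplicated(x)))

# Deduplicate the three columns first, then count rows per group
a3 <- aggregate(to_id ~ ., unique(toy[c("to_id", "from_id", "week")]), length)

print(a1)
print(all(a1$to_id == a2$to_id) && all(a1$to_id == a3$to_id))
```

Unlike ave, the result has one row per group, so it would need to be merged back onto the original rows if a per-row column is wanted.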