How to create groups and count variables to find relationships between variables?

Question

I'm having trouble computing counts and other aggregates conditioned on the values of a field.

Example:

library(dplyr)
# tbl_df() is deprecated; tibble() builds the same table with character columns
df <- tibble(
    users=c("1", "1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4", "4"), 
    projects=c("100", "101", "102", "103", "104", "105", "106", "107", "108", "109", "110", "111", "112"), 
    from=c("0", "0", "111", "106", "111", "101", "0", "101", "0", "100", "106", "108", "0"))

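The table contains the users (users), the projects each user owns (projects), and, for each project, the other user's project it was derived from (from), where 0 means the project does not come from another project.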
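To make the user-to-user relationship in this table explicit, each non-zero `from` value can be matched back to the row that owns that project. The sketch below does this in base R with `match()`. It rebuilds the sample data as a plain `data.frame` (an assumption, instead of the question's `tbl_df`), assumes `"0"` marks a project with no source, and the names `derived`, `source_owner` and `edges` are illustrative only.

```r
# Sample data from the question, kept as character columns
users    <- c("1", "1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4", "4")
projects <- c("100", "101", "102", "103", "104", "105", "106", "107", "108",
              "109", "110", "111", "112")
from     <- c("0", "0", "111", "106", "111", "101", "0", "101", "0",
              "100", "106", "108", "0")
df <- data.frame(users, projects, from, stringsAsFactors = FALSE)

# Keep only projects derived from another project (assumption: "0" = no source)
derived <- df[df$from != "0", ]

# Look up who owns each source project by matching `from` against `projects`
derived$source_owner <- df$users[match(derived$from, df$projects)]

# One row per relationship: `users` built on a project owned by `source_owner`
edges <- derived[, c("source_owner", "users", "from", "projects")]
edges
```

Each row of `edges` reads as "`users` built a project on one owned by `source_owner`"; for example, user 1's project 103 comes from project 106, which user 2 owns.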

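I want to find out which users have built the most relationships with other users through projects. As the table shows, a user's projects can be used by other users (from), and a user can also own projects of their own (projects).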

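To measure these relationships, I considered two counts per user: the number of the user's projects used by other users, and the number of projects the user uses but does not own.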
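The two counts described above can be computed without a loop. This is a minimal base-R sketch, assuming `"0"` means "no source" and that a user building on one of their own projects should not count as a relationship (a choice that makes no difference for this sample). Because `match()` and `table()` operate on whole vectors, it avoids one pass per user; the names `source_owner`, `keep` and `result` are illustrative.

```r
# Sample data from the question, as plain character vectors
users    <- c("1", "1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4", "4")
projects <- c("100", "101", "102", "103", "104", "105", "106", "107", "108",
              "109", "110", "111", "112")
from     <- c("0", "0", "111", "106", "111", "101", "0", "101", "0",
              "100", "106", "108", "0")

# Owner of each row's source project; NA where from == "0" (not a project id)
source_owner <- users[match(from, projects)]

# A row is a relationship if its source exists and belongs to a different user
keep <- !is.na(source_owner) & source_owner != users

user_ids <- sort(unique(users))
result <- data.frame(
  users = user_ids,
  # times this user's projects serve as the source of another user's project
  used_by_others = as.vector(table(factor(source_owner[keep], levels = user_ids))),
  # this user's projects that come from another user's project
  from_others    = as.vector(table(factor(users[keep], levels = user_ids))),
  stringsAsFactors = FALSE
)
result
```

For the sample data, `used_by_others` is 3, 2, 1, 2 and `from_others` is 2, 2, 1, 3 for users 1 to 4.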

Can anyone give me a hint on how to do this with ddply or other functions such as summarize or group_by?

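I managed to write a function with a for loop, but I know that is not the most appropriate solution, especially since I am dealing with millions of users.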

Thanks in advance!

Answer 1

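out <- data.frame(summarize(group_by(df, users),
                     # df$from is the whole column (all users), so this counts how
                     # often any of this user's projects is listed as a source
                     number_of_user_owned_projects = sum(df$from %in% projects),
                     # from is the group's own column: distinct sources, 0 = none
                     number_of_projects_from_others = length(unique(from[from != 0]))))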
out
  users number_of_user_owned_projects number_of_projects_from_others
1     1                             3                              2
2     2                             2                              2
3     3                             1                              1
4     4                             2                              3

Answer 2

# How many times each project is used as a source; 0 (no source) is dropped
temp = df %>% group_by(from) %>% summarise(cntr = n()) %>% filter(from != 0)

#temp

#    from  cntr
#1    100     1
#2    101     2
#3    106     2
#4    108     1
#5    111     2


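# Attach those counts to the rows that own the projects, then per user total
# them (user_owned) and count the rows that have a source (other_owned)
output = left_join(df, temp, by = c("projects" = "from")) %>% 
             group_by(users) %>% 
             summarize(user_owned = sum(cntr, na.rm = TRUE), other_owned = sum(from != 0))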

#output

#   users user_owned other_owned
#1      1          3           2
#2      2          2           2
#3      3          1           1
#4      4          2           3
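If "more relationships" is read instead as the number of distinct other users a user is connected to (in either direction), the same ownership lookup can be reduced to unique user pairs. This is a hedged alternative interpretation, not what either answer computes; the base-R sketch rebuilds the sample data, assumes `"0"` means "no source", and ignores users building on their own projects. The names `pairs` and `partners` are illustrative.

```r
# Sample data from the question, as plain character vectors
users    <- c("1", "1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4", "4")
projects <- c("100", "101", "102", "103", "104", "105", "106", "107", "108",
              "109", "110", "111", "112")
from     <- c("0", "0", "111", "106", "111", "101", "0", "101", "0",
              "100", "106", "108", "0")

# Owner of each row's source project; NA where from == "0"
source_owner <- users[match(from, projects)]
keep <- !is.na(source_owner) & source_owner != users

# Record each relationship in both directions, then drop duplicate pairs
pairs <- unique(data.frame(
  user    = c(users[keep], source_owner[keep]),
  partner = c(source_owner[keep], users[keep]),
  stringsAsFactors = FALSE
))

# Distinct partners per user; users with no partners would still appear as 0
user_ids <- sort(unique(users))
partners <- table(factor(pairs$user, levels = user_ids))
partners
```

For the sample data, users 1 and 4 are each connected to three other users, and users 2 and 3 to two.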

Tags: group-by, r, relationship, plyr, rstudio