Sum for unique combinations of variables in a data table

I have a data table in the following format, representing the strength of several kinds of relations between countries:

Country1    Country2     Value     Category
A           A            4         1
A           B            2         1
A           C            9         1
B           A            3         2
B           D            4         1
C           A            2         2
D           C            7         2
...

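I would now like to sum all mutual bipartite relations per category (e.g. A-B and B-A; D-C and C-D; etc.). In other words, A-B and B-A need to be "merged".

What would be a neat and "very R" solution? Is there an existing function that does this?

So far I have set a key on the "Country1" and "Country2" columns, but I have not figured out what to do next to match the corresponding rows.

Thanks for any hints.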

# x = your data as a data.table
# Build a directed relation label such as "A-B" from the two country columns
x[, rel := paste(Country1, Country2, sep = "-")]
x[, .(sums = sum(Value)), by = rel]

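# If Country1 - Country2 is considered bidirectional, make order-independent sets.
# Kmisc::str_sort, as used here, is assumed to sort the characters within each string,
# so "A-B" and "B-A" get the same label. This relies on single-character country codes.

library(Kmisc)
x[, sets := Kmisc::str_sort(rel)]
x[, .(sum = sum(Value)), by = sets]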
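
If Kmisc is not installed, or the country names are longer than one character, the same order-independent label can probably be built by sorting the two names rather than the characters. The following is only a sketch under those assumptions: `sort_pair` is a hypothetical helper (not part of any package), the sample data is retyped from the question, and the names are assumed not to contain the "-" separator.

```r
library(data.table)

# Sample data retyped from the question
x <- data.table(
  Country1 = c("A", "A", "A", "B", "B", "C", "D"),
  Country2 = c("A", "B", "C", "A", "D", "A", "C"),
  Value    = c(4, 2, 9, 3, 4, 2, 7),
  Category = c(1, 1, 1, 2, 1, 2, 2)
)

# Hypothetical helper: split each label on the separator, sort the parts,
# and glue them back together, so "B-A" becomes "A-B"
sort_pair <- function(rel, sep = "-") {
  vapply(strsplit(rel, sep, fixed = TRUE),
         function(p) paste(sort(p), collapse = sep),
         character(1))
}

x[, rel := paste(Country1, Country2, sep = "-")]
x[, sets := sort_pair(rel)]
res_sets <- x[, .(sum = sum(Value)), by = sets]
res_sets
# sets: A-A, A-B, A-C, B-D, C-D
# sum:    4,   5,  11,   4,   7
```

Adding Category to `by` would give the per-category totals the question asks for.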

How you "sum all mutual bipartite relations" depends on what you want to do.

To count the number of relations in each category:

x[, .N, by=Category]

To sum the values of all relations in each category:

x[, sum(Value), by=Category]

Or, for nicer output:

x[, list(TotalValue = sum(Value)), by=Category]
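
The count and the sum can also be computed in one call, which may be handy when both are wanted side by side. A minimal sketch, assuming `x` holds the sample rows from the question (retyped here so the snippet runs on its own); the column names `N` and `TotalValue` are just illustrative:

```r
library(data.table)

# Sample data retyped from the question
x <- data.table(
  Country1 = c("A", "A", "A", "B", "B", "C", "D"),
  Country2 = c("A", "B", "C", "A", "D", "A", "C"),
  Value    = c(4, 2, 9, 3, 4, 2, 7),
  Category = c(1, 1, 1, 2, 1, 2, 2)
)

# One row per category: number of relations and their total value
res_cat <- x[, .(N = .N, TotalValue = sum(Value)), by = Category]
res_cat
# Category 1: N = 4, TotalValue = 19
# Category 2: N = 3, TotalValue = 12
```

Note that this treats A-B and B-A as separate rows; it summarises per category only and does not merge the two directions.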

Use pmin and pmax:

require(data.table) # v1.9.6
dt = fread("Country1    Country2     Value     Category
A           A            4         1
A           B            2         1
A           C            9         1
B           A            3         2
B           D            4         1
C           A            2         2
D           C            7         2")
dt[, .(total = sum(Value)), 
     by=.(Country1 = pmin(Country1, Country2), 
          Country2 = pmax(Country1, Country2))]
#    Country1 Country2 total
# 1:        A        A     4
# 2:        A        B     5
# 3:        A        C    11
# 4:        B        D     4
# 5:        C        D     7

If you want this within each Category, just add Category to by as well.
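
Spelled out, the per-category version might look like the sketch below. The sample data is retyped from the question with `data.table()` instead of `fread()` so that the snippet is self-contained; everything else follows the call above.

```r
library(data.table)

# Sample data retyped from the question
dt <- data.table(
  Country1 = c("A", "A", "A", "B", "B", "C", "D"),
  Country2 = c("A", "B", "C", "A", "D", "A", "C"),
  Value    = c(4, 2, 9, 3, 4, 2, 7),
  Category = c(1, 1, 1, 2, 1, 2, 2)
)

# pmin/pmax put each pair in alphabetical order, so A-B and B-A share a group;
# Category is added as a third grouping column
res <- dt[, .(total = sum(Value)),
          by = .(Country1 = pmin(Country1, Country2),
                 Country2 = pmax(Country1, Country2),
                 Category)]
res
```

With these seven sample rows, every reversed pair happens to sit in a different category (A-B is in category 1, B-A in category 2), so nothing is merged and the result still has seven rows. On the full data, pairs that occur in both directions within the same category would be summed together.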