Sum for unique combinations of variables in a data table

Question

I have a data.table in the following format, representing the strength of several kinds of relationships between countries:

Country1    Country2     Value     Category
A           A            4         1
A           B            2         1
A           C            9         1
B           A            3         2
B           D            4         1
C           A            2         2
D           C            7         2
...

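Now I want to sum all mutual relations per category (e.g. A-B and B-A; D-C and C-D, etc.). In other words, A-B and B-A need to be "merged".

What would be a concise and "very R" solution? Is there an existing function that does this?

So far I have set a key on the "Country1" and "Country2" columns, but I have not found what to do next to match the corresponding rows.

Thanks for any hints.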

Answer 1

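# x = your data as data.table (columns Country1, Country2, Value, Category)
x[, rel := paste(Country1, Country2, sep = "-")]
x[, .(sums = sum(Value)), by = rel]

# if Country1 - Country2 is considered to be bidirectional, then make sets:

library(Kmisc)
# note: str_sort() sorts the characters within each string, so "B-A" and "A-B"
# get the same key; this is only reliable for single-character country codes
x[, sets := Kmisc::str_sort(rel)]
x[, .(sum = sum(Value)), by = sets]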
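
If the country names are longer than one character, or Kmisc is not available, a similar order-independent key can probably be built with base R's pmin and pmax. The following is only a minimal sketch: the sample data and the column name pair are made up for illustration and are not part of the question.

```r
library(data.table)

# hypothetical sample data with multi-character country codes
x <- data.table(
  Country1 = c("FR", "DE", "FR", "US"),
  Country2 = c("DE", "FR", "US", "US"),
  Value    = c(2, 3, 9, 4)
)

# build a key that is identical for "FR-DE" and "DE-FR":
# the alphabetically smaller name always comes first
x[, pair := paste(pmin(Country1, Country2), pmax(Country1, Country2), sep = "-")]

# sum Value over each unordered pair
res <- x[, .(sum = sum(Value)), by = pair]
print(res)
```

Here the FR-DE and DE-FR rows end up in one group, while the self-relation US-US stays on its own.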

Answer 2

How you "sum all mutual bipartite relations" depends on what you want to do.

To count the number of relations in each category:

x[, .N, by=Category]

To sum the Value of all relations in each category:

x[, sum(Value), by=Category]

Or, for prettier output:

x[, list(TotalValue = sum(Value)), by=Category]

Answer 3

Using pmin and pmax:

require(data.table) # v1.9.6
dt = fread("Country1    Country2     Value     Category
A           A            4         1
A           B            2         1
A           C            9         1
B           A            3         2
B           D            4         1
C           A            2         2
D           C            7         2")
dt[, .(total = sum(Value)), 
     by=.(Country1 = pmin(Country1, Country2), 
          Country2 = pmax(Country1, Country2))]
#    Country1 Country2 total
# 1:        A        A     4
# 2:        A        B     5
# 3:        A        C    11
# 4:        B        D     4
# 5:        C        D     7

If you want this within each Category, just add Category to by as well.
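
As a rough sketch of the per-category variant, using the same sample data as above (the result name res is just an illustrative choice):

```r
library(data.table)

dt <- fread("Country1    Country2     Value     Category
A           A            4         1
A           B            2         1
A           C            9         1
B           A            3         2
B           D            4         1
C           A            2         2
D           C            7         2")

# group by Category first, then by the unordered country pair
res <- dt[, .(total = sum(Value)),
          by = .(Category,
                 Country1 = pmin(Country1, Country2),
                 Country2 = pmax(Country1, Country2))]
print(res)
```

Note that in this small sample each mutual pair (A-B/B-A, A-C/C-A) falls into different categories, so nothing is actually merged once Category is part of by; with the full data, pairs sharing a category would be summed together.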

r

data.table