Convert a list of individuals to occurrences of pairs in R

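I need a data.frame in a specific format for a social-structure analysis. How can I convert a data.frame containing a list of individuals that appear together in multiple events: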

my.df <- data.frame(individual = c("A","B","C","B","C","D"),
                    time = rep(c("event_01","event_02"), each = 3))

  individual     time
1          A event_01
2          B event_01
3          C event_01
4          B event_02
5          C event_02
6          D event_02

into a data.frame containing the number of occurrences of each pair (including pairs such as [A,A], [B,B], etc.):

ind_1    ind_2   times
  A        A       0
  A        B       1
  A        C       1
  A        D       0
  B        A       1
  B        B       0
  B        C       2
  B        D       1
  C        A       1
  C        B       2
  C        C       0
  C        D       1
  D        A       0
  D        B       1
  D        C       1
  D        D       0

You can use data.table:

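library(data.table)
my.dt <- as.data.table(my.df)  # data.table copy of the example data

# All pairs of individuals within each event
dt_combs <- my.dt[,
                  list(ind_1 = combn(individual, 2)[1, ],
                       ind_2 = combn(individual, 2)[2, ]),
                  by = time]
# Number of events in which each pair occurs
dt_ncombs <- dt_combs[, .N, by = c("ind_1", "ind_2")]
# Mirror every pair so that both (A, B) and (B, A) are present
dt_ncombs_inverted <- copy(dt_ncombs)
dt_ncombs_inverted[, temp := ind_1]
dt_ncombs_inverted[, ind_1 := ind_2]
dt_ncombs_inverted[, ind_2 := temp]
dt_ncombs_inverted[, temp := NULL]
dt_ncombs <- rbind(dt_ncombs, dt_ncombs_inverted)
# Every possible pair, including (A, A), (B, B), ...
dt_allcombs <- data.table(expand.grid(
  ind_1 = my.dt[, unique(individual)],
  ind_2 = my.dt[, unique(individual)],
  stringsAsFactors = FALSE
))
# Pairs that were never observed together get a count of 0
dt_final <- merge(dt_allcombs,
                  dt_ncombs,
                  all.x = TRUE,
                  by = c("ind_1", "ind_2"))
dt_final[is.na(N), N := 0L]
dt_final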
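The per-event step above relies on combn(), so it may help to see what it returns for a single event. This is only a small sketch on a made-up vector `ev` (not part of the original answer). It also shows a caveat I would watch for: the pair is keyed in row order, so if individuals can be listed in a different order from one event to the next, sorting within each event first keeps (B, C) and (C, B) from being counted as different pairs.

```r
# Individuals seen in one event (illustrative vector)
ev <- c("A", "B", "C")

# combn() returns a 2 x choose(3, 2) matrix, one column per pair
pairs <- combn(ev, 2)
first  <- pairs[1, ]   # "A" "A" "B"
second <- pairs[2, ]   # "B" "C" "C"

# The order of the input decides which member of a pair comes first
unsorted_pair <- combn(c("C", "B"), 2)[, 1]        # "C" "B"
sorted_pair   <- combn(sort(c("C", "B")), 2)[, 1]  # "B" "C"
```

Note that combn(x, 2) stops with an error when an event contains fewer than two individuals, so such events would need to be filtered out beforehand.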

In base R, you can do the following:

data.frame(as.table(`diag<-`(tcrossprod(table(my.df)), 0)))
#    individual individual.1 Freq
# 1           A            A    0
# 2           B            A    1
# 3           C            A    1
# 4           D            A    0
# 5           A            B    1
# 6           B            B    0
# 7           C            B    2
# 8           D            B    1
# 9           A            C    1
# 10          B            C    2
# 11          C            C    0
# 12          D            C    1
# 13          A            D    0
# 14          B            D    1
# 15          C            D    1
# 16          D            D    0

tcrossprod gives you the following:

> tcrossprod(table(my.df))
          individual
individual A B C D
         A 1 1 1 0
         B 1 2 2 1
         C 1 2 2 1
         D 0 1 1 1

This is basically all the information you are looking for, but you want it in a slightly different form, without the diagonal values.
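If it is not obvious why tcrossprod produces co-occurrence counts, here is a minimal sketch of the same computation written out as an explicit matrix product. The names `inc` and `co` are just illustrative. It assumes each individual is listed at most once per event, so the incidence matrix only contains 0s and 1s.

```r
my.df <- data.frame(individual = c("A", "B", "C", "B", "C", "D"),
                    time = rep(c("event_01", "event_02"), each = 3))

# Incidence matrix: one row per individual, one column per event,
# 1 if the individual was seen in that event, 0 otherwise
inc <- unclass(table(my.df$individual, my.df$time))

# Entry [i, j] of inc %*% t(inc) sums, over all events, the product of the
# two presence indicators, i.e. the number of events shared by i and j
co <- inc %*% t(inc)

co["B", "C"]  # 2: B and C were together in both events
diag(co)      # number of events each individual appears in
```

The diagonal is therefore the number of events per individual, which is why it has to be zeroed out to match the requested format.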

We can set the diagonal to zero:

`diag<-`(theOutputFromAbove, 0)

Then, to get the long form, trick R into thinking the resulting matrix is a table by using as.table, and take advantage of the data.frame method for tables.
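For readers who prefer it step by step, the one-liner can be unrolled roughly as follows. This is a sketch rather than a different method. The renaming and reordering at the end are my additions so that the result matches the column names and row order asked for in the question.

```r
my.df <- data.frame(individual = c("A", "B", "C", "B", "C", "D"),
                    time = rep(c("event_01", "event_02"), each = 3))

# individuals x individuals matrix of shared-event counts
co <- tcrossprod(table(my.df))

# a pair made of the same individual twice should count as 0
diag(co) <- 0

# long format: one row per (individual, individual) pair
res <- as.data.frame(as.table(co), stringsAsFactors = FALSE)
names(res) <- c("ind_1", "ind_2", "times")

# as.data.frame() on a table varies the first column fastest;
# sort so that ind_1 varies slowest, as in the question
res <- res[order(res$ind_1, res$ind_2), ]
rownames(res) <- NULL
res
```

Because the count matrix is symmetric, swapping which column is called ind_1 does not change any of the values.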

You can do it like this:

Create the first two variables of the new data.frame:

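# levels() returns NULL for character columns (the data.frame() default since R 4.0), so use sort(unique(...))
df2 <- expand.grid(ind_2 = sort(unique(as.character(my.df$individual))), ind_1 = sort(unique(as.character(my.df$individual))))[, 2:1]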

Set the value for pairs made of the same individual to 0:

df2$times[df2[, 1]==df2[, 2]] <- 0

Get the other unique combinations:

comb_diff <- combn(sort(unique(as.character(my.df$individual))), 2)

Count the number of times each unique combination is found:

times_uni <- apply(comb_diff, 2, function(inds) {
  sum(table(my.df$time[my.df$individual %in% inds]) == 2)
})
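To make the counting step concrete, here is the same expression applied to single pairs. The helper name `shared_events` is hypothetical and only used for illustration. Like the original expression, it assumes each individual is listed at most once per event. Otherwise a repeated sighting of one individual would be mistaken for a pair.

```r
my.df <- data.frame(individual = c("A", "B", "C", "B", "C", "D"),
                    time = rep(c("event_01", "event_02"), each = 3))

# Hypothetical helper wrapping the expression used inside apply()
shared_events <- function(inds) {
  # events attended by either member of the pair, one entry per sighting
  ev <- my.df$time[my.df$individual %in% inds]
  # an event that shows up twice was attended by both members
  sum(table(ev) == 2)
}

bc <- shared_events(c("B", "C"))  # 2: together in both events
ab <- shared_events(c("A", "B"))  # 1: together in event_01 only
ad <- shared_events(c("A", "D"))  # 0: never in the same event
```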

Finally, fill in the new data.frame:

df2$times[match(c(paste0(comb_diff[1, ], comb_diff[2, ]),
                  paste0(comb_diff[2, ], comb_diff[1, ])),
                paste0(df2[, 1], df2[, 2]))] <- rep(times_uni, 2)

df2
#   ind_1 ind_2 times
#1      A     A     0
#2      A     B     1
#3      A     C     1
#4      A     D     0
#5      B     A     1
#6      B     B     0
#7      B     C     2
#8      B     D     1
#9      C     A     1
#10     C     B     2
#11     C     C     0
#12     C     D     1
#13     D     A     0
#14     D     B     1
#15     D     C     1
#16     D     D     0
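One thing to keep in mind with the paste0() keys used in the fill step: they are unambiguous here because every individual has a one-character name. With longer names, two different pairs could produce the same key. A small sketch of the problem and one possible workaround (the "|" separator is my assumption and must not occur in any name):

```r
# Keys built with paste0() can collide once names are longer than one character
collide <- paste0("AB", "C") == paste0("A", "BC")                     # TRUE

# A separator that cannot occur in the names keeps the keys distinct
distinct <- paste("AB", "C", sep = "|") == paste("A", "BC", sep = "|")  # FALSE
```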