Create a unique ID for dyads (non-directional)
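I have a data frame containing imports and exports by country/year to other countries. As in the example dataset, the dyadic import and export data do not overlap perfectly.
For example: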
library(tidyverse)
df <- data.frame("Reporter" = c("USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "Africa","Africa", "Africa","Africa", "Africa","Africa", "Africa","Africa", "EU", "EU","EU", "EU", "EU", "EU","EU", "EU"),
"Partner" = c("Africa","Africa", "Africa","Africa","EU", "EU","EU", "EU", "USA", "USA", "USA", "USA", "EU", "EU","EU", "EU","USA", "USA", "USA", "USA","Africa","Africa", "Africa","Africa"),
"Year" = c(1970, 1970, 1980, 1980, 1970, 1970, 1980, 1980, 1970, 1970, 1980, 1980, 1970, 1970, 1980, 1980, 1970, 1970, 1980, 1980, 1970, 1970, 1980, 1980),
"Flow" = c("Import", "Export","Import", "Export","Import", "Export","Import", "Export","Import", "Export","Import", "Export","Import", "Export","Import", "Export","Import", "Export","Import", "Export","Import", "Export","Import", "Export"),
"Val" = runif(24, min=0, max=100), stringsAsFactors = FALSE)
# Reporter Partner Year Flow Val
# 1 USA Africa 1970 Import 13.169790
# 2 USA Africa 1970 Export 28.531263
# 3 USA Africa 1980 Import 66.811160
# 4 USA Africa 1980 Export 47.556102
# 5 USA EU 1970 Import 59.166556
# 6 USA EU 1970 Export 71.032895
# 7 USA EU 1980 Import 89.688642
# 8 USA EU 1980 Export 36.563593
# 9 Africa USA 1970 Import 33.088294
# 10 Africa USA 1970 Export 10.692528
# 11 Africa USA 1980 Import 69.296384
# 12 Africa USA 1980 Export 54.697131
# 13 Africa EU 1970 Import 64.327314
# 14 Africa EU 1970 Export 64.659566
# 15 Africa EU 1980 Import 6.139465
# 16 Africa EU 1980 Export 97.317815
# 17 EU USA 1970 Import 7.245794
# 18 EU USA 1970 Export 72.291265
# 19 EU USA 1980 Import 14.134386
# 20 EU USA 1980 Export 60.288242
# 21 EU Africa 1970 Import 29.648374
# 22 EU Africa 1970 Export 81.916536
# 23 EU Africa 1980 Import 47.665834
# 24 EU Africa 1980 Export 64.307639
I created a wide version of this data.
wide_df <- df %>% spread ("Flow", "Val")
I can create directional IDs for the dyads.
wide_df$ReporterID <- as.numeric(factor(wide_df$Reporter, levels=unique(wide_df$Reporter)))
However, the resulting data treats, for example, USA-Africa and Africa-USA as different dyads.
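To make the problem concrete, here is a minimal base R sketch. The toy data frame `pairs` and the column name `directed_id` are made up for illustration and are not part of the original data. An ID built from the ordered pair yields six codes, although there are only three dyads:

```r
# Hypothetical toy data: every ordered Reporter/Partner combination once
pairs <- data.frame(
  Reporter = c("USA", "USA", "Africa", "Africa", "EU", "EU"),
  Partner  = c("Africa", "EU", "USA", "EU", "USA", "Africa"),
  stringsAsFactors = FALSE
)

# Directional ID: the key keeps the Reporter -> Partner order,
# so "USA_Africa" and "Africa_USA" become different factor levels
pairs$directed_id <- as.numeric(
  factor(paste(pairs$Reporter, pairs$Partner, sep = "_"))
)

# Count the distinct directional IDs (six here, for three unordered dyads)
n_directed <- length(unique(pairs$directed_id))
```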
Question: how can I create a single unique ID for each dyad?
Can anyone think of a way to collapse these dyads into one ID code?
library(tidyverse)
# vectorised function to order and combine values
f = function(x,y) paste(sort(c(x, y)), collapse="_")
f = Vectorize(f)
df %>%
spread ("Flow", "Val") %>%
mutate(ID1 = f(Reporter, Partner),
ID2 = as.numeric(as.factor(ID1)))
# Reporter Partner Year Export Import ID1 ID2
# 1 Africa EU 1970 56.6 98.9 Africa_EU 1
# 2 Africa EU 1980 95.3 2.25 Africa_EU 1
# 3 Africa USA 1970 50.4 10.3 Africa_USA 2
# 4 Africa USA 1980 29.4 3.08 Africa_USA 2
# 5 EU Africa 1970 88.8 56.3 Africa_EU 1
# 6 EU Africa 1980 53.6 48.0 Africa_EU 1
# 7 EU USA 1970 4.50 83.8 EU_USA 3
# 8 EU USA 1980 79.1 0.473 EU_USA 3
# 9 USA Africa 1970 61.9 37.2 Africa_USA 2
#10 USA Africa 1980 9.88 39.6 Africa_USA 2
#11 USA EU 1970 10.4 29.3 EU_USA 3
#12 USA EU 1980 21.1 35.3 EU_USA 3
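The Vectorize() step in the code above matters because paste(..., collapse = "_") on its own would merge every value into one string. A small sketch of the difference, using made-up vectors `x` and `y` and the illustrative names `f_plain` and `f_vec`:

```r
# Hypothetical inputs: two Reporter/Partner pairs
x <- c("USA", "EU")
y <- c("Africa", "USA")

# Without Vectorize(): all four values are sorted and collapsed together
f_plain <- function(x, y) paste(sort(c(x, y)), collapse = "_")
one_string <- f_plain(x, y)

# With Vectorize(): the function is applied to each (x[i], y[i]) pair
f_vec <- Vectorize(f_plain)
per_row <- unname(f_vec(x, y))  # drop the names that Vectorize() attaches
```

Here `one_string` has length 1, while `per_row` holds one sorted key per pair.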
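One option is ID1, which combines the actual values.
Another option is ID2, which creates a number based on ID1.
The logic behind the ID2 numbers is the order of the levels of the factor variable ID1 (alphabetical order in this case).
If you don't need the original columns Reporter and Partner, you can exclude them at the end of the process using unite(ID1, Reporter, Partner, remove = TRUE) or select(-Reporter, -Partner). Bear in mind that unite() pastes the columns in the order given, so the value it produces is directional.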
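The numbering rule can be seen in isolation with a short base R sketch. The vector `id1` below is a made-up stand-in for the ID1 column, and `id_sorted` and `id_appearance` are illustrative names. By default factor() sorts its levels, so the numbers follow sorted order. Passing levels = unique(...) numbers the dyads by first appearance instead:

```r
# Hypothetical stand-in for the ID1 column
id1 <- c("EU_USA", "Africa_USA", "Africa_EU", "EU_USA")

# Default: levels are sorted, so "Africa_EU" is 1, "Africa_USA" is 2, "EU_USA" is 3
id_sorted <- as.numeric(factor(id1))

# Alternative: number dyads in order of first appearance
id_appearance <- as.numeric(factor(id1, levels = unique(id1)))
```

Either way, equal strings always map to the same number. Only the assignment of numbers to dyads changes.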
We create a unique 'id' by pasting together the element-wise minimum and maximum (pmin, pmax) of 'Reporter' and 'Partner' for each row, converting the result to a factor and coercing it to numeric.
Using tidyverse:
library(tidyverse)
wide_df %>%
mutate(newid = as.numeric(factor(paste(pmin(Reporter, Partner),
pmax(Reporter, Partner), sep="_"))))
# Reporter Partner Year Export Import newid
#1 Africa EU 1970 23.494073 62.50156 1
#2 Africa EU 1980 18.808975 52.17495 1
#3 Africa USA 1970 23.679063 37.02527 2
#4 Africa USA 1980 2.346382 21.69631 2
#5 EU Africa 1970 73.075570 78.00496 1
#6 EU Africa 1980 69.620370 60.24295 1
#7 EU USA 1970 89.163190 80.78952 3
#8 EU USA 1980 77.462146 48.51146 3
#9 USA Africa 1970 18.285198 99.99596 2
#10 USA Africa 1980 26.119664 40.51762 2
#11 USA EU 1970 78.307579 70.91757 3
#12 USA EU 1980 41.067151 84.06877 3
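As a quick check of the pmin()/pmax() idea without any packages, the following base R sketch uses made-up vectors `reporter` and `partner` (not the original data) to show that both directions of a pair end up with the same key and therefore the same number:

```r
# Hypothetical inputs: two dyads, each listed in both directions
reporter <- c("USA", "Africa", "EU", "USA")
partner  <- c("Africa", "USA", "USA", "EU")

# pmin()/pmax() compare element-wise, so each key always starts
# with the name that sorts first, whatever the original direction
key <- paste(pmin(reporter, partner), pmax(reporter, partner), sep = "_")

# Convert the order-independent key into a numeric ID
newid <- as.numeric(factor(key))
```

Because pmin() and pmax() are already vectorised, this approach needs no Vectorize() wrapper.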