How to include missing countries in a df in R
This question is a spin-off of an earlier question of mine.
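I have a large data frame (900k rows) about mergers and acquisitions (M&As).
The df has four columns: date (the year the M&A was completed), target_nation (the country of the company that was merged/acquired), acquiror_nation (the country of the acquiring company), and big_corp_TF (whether the acquiror was a big corporation, where TRUE means it was).
Here is a sample of my df: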
df <- structure(list(date = c(2000L, 2000L, 2001L, 2001L, 2001L, 2003L,
2003L, 1999L, 2001L, 2002L, 2002L, 2002L), target_nation = c("Uganda",
"Uganda", "Uganda", "Uganda", "Uganda", "Uganda", "Mozambique",
"Mozambique", "Mozambique", "Mozambique", "Mozambique", "Mozambique"
), acquiror_nation = c("France", "Germany", "France", "France",
"Germany", "Germany", "Germany", "Germany", "France", "France",
"Germany", "Japan"), big_corp_TF = c(TRUE, FALSE, TRUE, FALSE,
FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)), row.names = c(NA,
-12L), class = c("data.table", "data.frame"))
> df
date target_nation acquiror_nation big_corp_TF
1: 2000 Uganda France TRUE
2: 2000 Uganda Germany FALSE
3: 2001 Uganda France TRUE
4: 2001 Uganda France FALSE
5: 2001 Uganda Germany FALSE
6: 2003 Uganda Germany TRUE
7: 2003 Mozambique Germany FALSE
8: 1999 Mozambique Germany FALSE
9: 2001 Mozambique France TRUE
10: 2002 Mozambique France FALSE
11: 2002 Mozambique Germany TRUE
12: 2002 Mozambique Japan TRUE
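From these data, I want to create a new column that gives the share of M&As in a given target nation that were carried out by big corporations from a given acquiror nation, computed as an average over 2 years. (For my actual exercise I will average over 5 years, but I am keeping things simpler here.)
There is a set of acquiror nations I am particularly interested in (say France, Germany and Japan). I would like to have a column giving the above share for these countries.
@AnilGoyal previously helped me with the code for this. Here it is: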
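library(dplyr)
library(tidyr)
library(runner)  # provides sum_run()
df_calc <- df %>%
  mutate(d = 1) %>%                      # d = 1 marks a real deal
  group_by(target_nation) %>%
  # add a filler row (d = 0) for every year x acquiror seen for this target
  complete(date = seq(min(date), max(date), 1), nesting(acquiror_nation),
           fill = list(d = 0, big_corp_TF = FALSE)) %>%
  group_by(date, target_nation) %>%
  mutate(total_MAs = sum(d)) %>%         # all deals in this target and year
  group_by(date, target_nation, acquiror_nation) %>%
  summarise(total_MAs = mean(total_MAs),
            total_MAs_bigcorp = sum(big_corp_TF), .groups = 'drop') %>%
  group_by(target_nation, acquiror_nation) %>%
  # rolling 2-year sums of big-corp deals over rolling 2-year sums of all deals
  mutate(share = sum_run(total_MAs_bigcorp, k = 2) / sum_run(total_MAs, k = 2))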
This is the output:
date targ_nat acq_nat tot_MA big_MA share
1 1999 Mozambique France 1 0 0.0000000
2 1999 Mozambique Germany 1 0 0.0000000
3 1999 Mozambique Japan 1 0 0.0000000
4 2000 Mozambique France 0 0 0.0000000
5 2000 Mozambique Germany 0 0 0.0000000
6 2000 Mozambique Japan 0 0 0.0000000
7 2001 Mozambique France 1 1 1.0000000
8 2001 Mozambique Germany 1 0 0.0000000
9 2001 Mozambique Japan 1 0 0.0000000
10 2002 Mozambique France 3 0 0.2500000
11 2002 Mozambique Germany 3 1 0.2500000
12 2002 Mozambique Japan 3 1 0.2500000
13 2003 Mozambique France 1 0 0.0000000
14 2003 Mozambique Germany 1 0 0.2500000
15 2003 Mozambique Japan 1 0 0.2500000
16 2000 Uganda France 2 1 0.5000000
17 2000 Uganda Germany 2 0 0.0000000
18 2001 Uganda France 3 1 0.4000000
19 2001 Uganda Germany 3 0 0.0000000
20 2002 Uganda France 0 0 0.3333333
21 2002 Uganda Germany 0 0 0.0000000
22 2003 Uganda France 1 0 0.0000000
23 2003 Uganda Germany 1 1 1.0000000
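To make the share column concrete, the Uganda/France rows in the output above can be reproduced by hand. This is only a minimal base R sketch: `roll_sum` is a hypothetical helper written for this illustration (it is not the `sum_run` used in the code above), and the two input vectors are copied from the Uganda rows of the output.

```r
# Hypothetical helper: sum of the current value and the k - 1 values
# before it, using a shorter window at the start of the series.
roll_sum <- function(x, k) {
  vapply(seq_along(x),
         function(i) sum(x[max(1, i - k + 1):i]),
         numeric(1))
}

# Uganda, 2000 to 2003, taken from the output above
total_MAs         <- c(2, 3, 0, 1)  # all deals targeting Uganda, per year
total_MAs_bigcorp <- c(1, 1, 0, 0)  # deals by big French corporations, per year

# 2-year rolling share: big-corp deals over all deals in the window
share <- roll_sum(total_MAs_bigcorp, k = 2) / roll_sum(total_MAs, k = 2)
print(round(share, 7))
```

The result matches the share column for France in Uganda: 0.5, 0.4, 0.333 and 0. Note that the denominator is the total for the target nation, not for the acquiror alone.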
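All the numbers are as required. However, I would also like to have results for Japan's investment in Uganda, and I cannot get them. How can I do this? I understand that there are no results for Japan in Uganda because Japan did not make any investment in Uganda in any year (as shown in the data sample above); but this lack of investment is a meaningful result to me, and I would like to have rows for Japan as an acquiror nation as well. Like so (for space reasons, I have left out Mozambique as a targ_nat):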
date targ_nat acq_nat tot_MA big_MA share
16 2000 Uganda France 2 1 0.5000000
17 2000 Uganda Germany 2 0 0.0000000
18 2000 Uganda Japan 2 0 0.0000000
19 2001 Uganda France 3 1 0.4000000
20 2001 Uganda Germany 3 0 0.0000000
21 2001 Uganda Japan 3 0 0.0000000
22 2002 Uganda France 0 0 0.3333333
23 2002 Uganda Germany 0 0 0.0000000
24 2002 Uganda Japan 0 0 0.0000000
25 2003 Uganda France 1 0 0.0000000
26 2003 Uganda Germany 1 1 1.0000000
27 2003 Uganda Japan 1 0 0.0000000
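Any ideas on how to achieve this? For my actual purposes, I have a set of 13 countries that I would like to see results for as acquiror nations (so not only France, Germany and Japan). These countries all appear in the dataset as acquiror nations, but not for every target_nation (!), just like Japan and Uganda in the example here.
Any help is much appreciated.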
We need complete() from tidyr:
library(dplyr)
library(tidyr)
out <- df_calc %>%
  group_by(target_nation, date, total_MAs) %>%
  complete(acquiror_nation = unique(.$acquiror_nation),
           fill = list(total_MAs_bigcorp = 0, share = 0)) %>%
  ungroup()
Checking the output for 'Uganda':
out %>%
filter(target_nation == 'Uganda')
# A tibble: 12 x 6
# target_nation date total_MAs acquiror_nation total_MAs_bigcorp share
# <chr> <dbl> <dbl> <chr> <dbl> <dbl>
# 1 Uganda 2000 2 France 1 0.5
# 2 Uganda 2000 2 Germany 0 0
# 3 Uganda 2000 2 Japan 0 0
# 4 Uganda 2001 3 France 1 0.4
# 5 Uganda 2001 3 Germany 0 0
# 6 Uganda 2001 3 Japan 0 0
# 7 Uganda 2002 0 France 0 0.333
# 8 Uganda 2002 0 Germany 0 0
# 9 Uganda 2002 0 Japan 0 0
#10 Uganda 2003 1 France 0 0
#11 Uganda 2003 1 Germany 1 1
#12 Uganda 2003 1 Japan 0 0
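For the real data with 13 acquiror nations of interest, the same call could take an explicit vector instead of unique(.$acquiror_nation). The following is only a sketch under assumptions: `acquirors_of_interest` is a hypothetical name, and `df_calc` is rebuilt by hand here from the Uganda rows of the question's output so that the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Stand-in for df_calc: the eight Uganda rows from the question's output
df_calc <- data.frame(
  date              = rep(2000:2003, each = 2),
  target_nation     = "Uganda",
  acquiror_nation   = rep(c("France", "Germany"), times = 4),
  total_MAs         = rep(c(2, 3, 0, 1), each = 2),
  total_MAs_bigcorp = c(1, 0, 1, 0, 0, 0, 0, 1),
  share             = c(0.5, 0, 0.4, 0, 1/3, 0, 0, 1),
  stringsAsFactors  = FALSE
)

# Hypothetical list of acquiror nations that should always get a row
acquirors_of_interest <- c("France", "Germany", "Japan")

out <- df_calc %>%
  group_by(target_nation, date, total_MAs) %>%
  # within each target/year, add a row for every listed acquiror that is missing
  complete(acquiror_nation = acquirors_of_interest,
           fill = list(total_MAs_bigcorp = 0, share = 0)) %>%
  ungroup()

print(out)
```

With an explicit vector, an acquiror that never appears anywhere in the data still gets its rows, and rows for acquirors outside the vector are kept as they are. Filling share with 0 is only right for an acquiror with no deals at all in that target nation, which is the case described in the question.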