How do I calculate the right mean?
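I have a dataset showing bilateral exports for multiple countries. Because the data fluctuates, I need to calculate averages over groups of years. Not all countries cover exactly the same years: some start later and some have gaps in between, meaning that some years are missing (but there are no NA entries). With the help of an amazing community member, I have already managed to split the data into groups: year_group.
Below I list two further issues together with my code, the wrong output, and, at the bottom, some sample input data from the dataset total_trade.
Issue 1
The problem I am facing is that the code does not calculate the correct means. When I calculate the result manually, I get a different result than my code does (see below).
Here is my code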
# create vectors for coding 4 years average
year_group_break <- c(1999, 2003, 2007, 2011, 2015, 2019)
year_group_labels <- c("1999-2002", "2003-2006", "2007-2010", "2011-2014", "2015-2018")
years <- c(1999, 2000, 2001, 2002,2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019)
FourY_av <- total_trade %>%
  # create year_group variable for average values with above predefined labels and cuts,
  # chose right = FALSE to take cut before year_group_break
  mutate(year_group = cut(Year, breaks = year_group_break,
                          labels = year_group_labels,
                          include.lowest = TRUE, right = FALSE)) %>%
  # add column with mean of total trade per four year period: "avg_year_group_total"
  group_by(ReporterName, year_group) %>%
  mutate(total_year_group = mean(Total_Year)) %>%
  arrange(ReporterName, PartnerName, desc(Year))
View(FourY_av)
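Below is the wrong output
This output is wrong because total_year_group (the mean for Angola's year group "2015-2018") should be 34746013.5 (when calculated manually) and not 34907582 (as in the output). Where is my mistake?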
> head(FourY_av)
# A tibble: 6 x 9
# Groups: ReporterName, year_group [1]
Year ReporterName PartnerName PartnerISO3 `TradeValue in 1000 USD` Total_Year pct_by_partner_year year_group total_year_group
<int> <chr> <chr> <chr> <dbl> <dbl> <dbl> <fct> <dbl>
1 2018 Angola Afghanistan AFG 19.4 42096736. 0.0000460 2015-2018 34907582.
2 2017 Angola Afghanistan AFG 2.25 34904881. 0.00000644 2015-2018 34907582.
3 2016 Angola Afghanistan AFG 0.775 28057500. 0.00000276 2015-2018 34907582.
4 2015 Angola Afghanistan AFG 39.6 33924937. 0.000117 2015-2018 34907582.
5 2018 Angola Albania ALB 2.38 42096736. 0.00000565 2015-2018 34907582.
6 2017 Angola Albania ALB 39.7 34904881. 0.000114 2015-2018 34907582.
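Issue 2
Another issue is that not all countries show data for all years. Some start later and some have gaps.
I still need the means for the same year groups to ensure comparability. The dataset has no NAs; the data is simply missing.
For example, Angola does not cover 2008. The dataset contains no NA for it; there is simply no row and no value for Angola in 2008. Other countries do show data for 2008. I still need the mean over Angola's available years in the total_year_group column (that is, the mean of 2007, 2009 and 2010). This should not be a problem for the mean function, right? Or is there something special I need to consider in this case?
Here is some sample input data from total_trade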
dput(head(total_trade, n = 100))
structure(list(Year = c(2015L, 2018L, 2017L, 2016L, 2017L, 2015L,
2018L, 2016L, 2015L, 2017L, 2018L, 2018L, 2017L, 2018L, 2018L,
2015L, 2016L, 2017L, 2016L, 2015L, 2017L, 2018L, 2018L, 2017L,
2016L, 2015L, 2018L, 2014L, 2015L, 2016L, 2017L, 2017L, 2018L,
2016L, 2015L, 2016L, 2018L, 2017L, 2015L, 2010L, 2009L, 2016L,
2013L, 2014L, 2018L, 2017L, 2015L, 2016L, 2017L, 2018L, 2017L,
2018L, 2016L, 2016L, 2018L, 2007L, 2013L, 2009L, 2018L, 2015L,
2016L, 2014L, 2010L, 2017L, 2012L, 2011L, 2018L, 2016L, 2015L,
2016L, 2011L, 2018L, 2017L, 2015L, 2015L, 2016L, 2018L, 2017L,
2015L, 2015L, 2016L, 2018L, 2017L, 2007L, 2014L, 2010L, 2013L,
2011L, 2009L, 2012L, 2017L, 2018L, 2016L, 2015L, 2015L, 2015L,
2017L, 2016L, 2018L, 2015L), ReporterName = c("Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola"
), PartnerName = c("Afghanistan", "Afghanistan", "Afghanistan",
"Afghanistan", "Albania", "Albania", "Albania", "Algeria", "Algeria",
"Algeria", "Algeria", "American Samoa", "Andorra", "Andorra",
"Antigua and Barbuda", "Antigua and Barbuda", "Antigua and Barbuda",
"Antigua and Barbuda", "Argentina", "Argentina", "Argentina",
"Argentina", "Armenia", "Armenia", "Armenia", "Armenia", "Australia",
"Australia", "Australia", "Australia", "Australia", "Austria",
"Austria", "Austria", "Austria", "Azerbaijan", "Azerbaijan",
"Azerbaijan", "Azerbaijan", "Bahamas, The", "Bahamas, The", "Bahamas, The",
"Bahamas, The", "Bahamas, The", "Bahamas, The", "Bahamas, The",
"Bahamas, The", "Bahrain", "Bahrain", "Bahrain", "Bangladesh",
"Bangladesh", "Bangladesh", "Barbados", "Belarus", "Belgium",
"Belgium", "Belgium", "Belgium", "Belgium", "Belgium", "Belgium",
"Belgium", "Belgium", "Belgium", "Belgium", "Belize", "Belize",
"Belize", "Benin", "Benin", "Benin", "Benin", "Benin", "Bhutan",
"Bolivia", "Bolivia", "Bolivia", "Bolivia", "Botswana", "Botswana",
"Botswana", "Botswana", "Brazil", "Brazil", "Brazil", "Brazil",
"Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil",
"British Virgin Islands", "Brunei", "Bulgaria", "Bulgaria", "Bulgaria",
"Bulgaria"), PartnerISO3 = c("AFG", "AFG", "AFG", "AFG", "ALB",
"ALB", "ALB", "DZA", "DZA", "DZA", "DZA", "ASM", "AND", "AND",
"ATG", "ATG", "ATG", "ATG", "ARG", "ARG", "ARG", "ARG", "ARM",
"ARM", "ARM", "ARM", "AUS", "AUS", "AUS", "AUS", "AUS", "AUT",
"AUT", "AUT", "AUT", "AZE", "AZE", "AZE", "AZE", "BHS", "BHS",
"BHS", "BHS", "BHS", "BHS", "BHS", "BHS", "BHR", "BHR", "BHR",
"BGD", "BGD", "BGD", "BRB", "BLR", "BEL", "BEL", "BEL", "BEL",
"BEL", "BEL", "BEL", "BEL", "BEL", "BEL", "BEL", "BLZ", "BLZ",
"BLZ", "BEN", "BEN", "BEN", "BEN", "BEN", "BTN", "BOL", "BOL",
"BOL", "BOL", "BWA", "BWA", "BWA", "BWA", "BRA", "BRA", "BRA",
"BRA", "BRA", "BRA", "BRA", "BRA", "BRA", "BRA", "BRA", "VGB",
"BRN", "BGR", "BGR", "BGR", "BGR"), `TradeValue in 1000 USD` = c(39.586,
19.353, 2.248, 0.775, 39.723, 2.259, 2.38, 2169.123, 2322.463,
2241.599, 245.226, 12.007, 5.975, 0.326, 422.006, 155.467, 47.018,
54.774, 483.147, 142.23, 98.7, 61.362, 60.105, 30.494, 0.99,
0.731, 40220.092, 45435.804, 16096.404, 8546.882, 1904.301, 627.179,
433.699, 23.118, 5.124, 985.67, 600.371, 143.356, 9.926, 140139.415,
108214.936, 64444.203, 100210.999, 52974.059, 7322.893, 145.791,
26.995, 4.847, 5.187, 1.958, 125.722, 55.22, 2.75, 3.366, 54.31,
107976.895, 123610.469, 66757.2, 67763.201, 50046.64, 40199.706,
52383.95, 45614.873, 28690.458, 52907.343, 39328.574, 452.078,
5.82, 0.32, 970.324, 1700.981, 804.478, 332.216, 69.342, 1.632,
1530.58, 308.752, 62.569, 19.822, 55.241, 37.029, 16.917, 0.198,
874217.786, 1032751.313, 509259.955, 428750.075, 333280.441,
192964.08, 315316.932, 119947.132, 141486.749, 66556.728, 1273.093,
5.064, 22.324, 158.252, 33.583, 8.435, 0.077), Total_Year = c(33924937.48,
42096736.31, 34904881.111, 28057499.527, 34904881.111, 33924937.48,
42096736.31, 28057499.527, 33924937.48, 34904881.111, 42096736.31,
42096736.31, 34904881.111, 42096736.31, 42096736.31, 33924937.48,
28057499.527, 34904881.111, 28057499.527, 33924937.48, 34904881.111,
42096736.31, 42096736.31, 34904881.111, 28057499.527, 33924937.48,
42096736.31, 58672369.19, 33924937.48, 28057499.527, 34904881.111,
34904881.111, 42096736.31, 28057499.527, 33924937.48, 28057499.527,
42096736.31, 34904881.111, 33924937.48, 52612114.76, 40639411.73,
28057499.527, 67712526.544, 58672369.19, 42096736.31, 34904881.111,
33924937.48, 28057499.527, 34904881.111, 42096736.31, 34904881.111,
42096736.31, 28057499.527, 28057499.527, 42096736.31, 44177783.072,
67712526.544, 40639411.73, 42096736.31, 33924937.48, 28057499.527,
58672369.19, 52612114.76, 34904881.111, 70863076.416, 66427390.221,
42096736.31, 28057499.527, 33924937.48, 28057499.527, 66427390.221,
42096736.31, 34904881.111, 33924937.48, 33924937.48, 28057499.527,
42096736.31, 34904881.111, 33924937.48, 33924937.48, 28057499.527,
42096736.31, 34904881.111, 44177783.072, 58672369.19, 52612114.76,
67712526.544, 66427390.221, 40639411.73, 70863076.416, 34904881.111,
42096736.31, 28057499.527, 33924937.48, 33924937.48, 33924937.48,
34904881.111, 28057499.527, 42096736.31, 33924937.48), pct_by_partner_year = c(0.000116687024179005,
4.59726850497024e-05, 6.44035999679013e-06, 2.7621848456389e-06,
0.000113803567683494, 6.65881846158674e-06, 5.65364493454718e-06,
0.0077309918437765, 0.00684588733986371, 0.00642202158738646,
0.000582529719629944, 2.8522401146684e-05, 1.71179497245645e-05,
7.74406827169068e-07, 0.00100246726228929, 0.000458267609458834,
0.000167577299448064, 0.000156923611416451, 0.00172198880208503,
0.000419249114560196, 0.000282768474948037, 0.00014576426910659,
0.000142778289407966, 8.73631395649993e-05, 3.5284683834613e-06,
2.15475710288619e-06, 0.0955420669759755, 0.0774398658640565,
0.0474471147057807, 0.0304620231456308, 0.00545568682484317,
0.00179682319502973, 0.00103024376238159, 8.23950829180388e-05,
1.51039335091503e-05, 0.00351303578942051, 0.00142616994243657,
0.000410704736521284, 2.92587127267419e-05, 0.2663633948935,
0.266280763902189, 0.229686194730164, 0.147994771595005, 0.0902879153020956,
0.0173953936620509, 0.000417680838208199, 7.95727332317548e-05,
1.72752386410474e-05, 1.48603858110989e-05, 4.65119192514428e-06,
0.000360184581635431, 0.000131174064405754, 9.80130106517029e-06,
1.19967925037684e-05, 0.000129012376636663, 0.244414471464133,
0.18255184868885, 0.164267141570654, 0.160970200874938, 0.147521686751831,
0.143276153177212, 0.0892821454514031, 0.0867003221749986, 0.0821961201035531,
0.0746613690455784, 0.0592053577133712, 0.00107390272887404,
2.07431171633786e-05, 9.43258923288073e-07, 0.00345834096536738,
0.0025606620918584, 0.00191102225615742, 0.000951775194258733,
0.000204398313308255, 4.81062050876917e-06, 0.00545515468521031,
0.000733434529761055, 0.000179255731601051, 5.84289949294256e-05,
0.000162833019316739, 0.000131975409869888, 4.01860131755188e-05,
5.6725590719059e-07, 1.97886296054109, 1.76020046106476, 0.967951882039117,
0.633191666126054, 0.50172141324715, 0.474820062066878, 0.444966473299775,
0.34363999584631, 0.336099093188823, 0.237215465105691, 0.00375267603882995,
1.49270724610338e-05, 6.58041006358842e-05, 0.000453380716286491,
0.00011969348860786, 2.00371827827334e-05, 2.26971678416193e-07
)), row.names = c(NA, -100L), groups = structure(list(Year = c(2007L,
2007L, 2009L, 2009L, 2009L, 2010L, 2010L, 2010L, 2011L, 2011L,
2011L, 2012L, 2012L, 2013L, 2013L, 2013L, 2014L, 2014L, 2014L,
2014L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L,
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L,
2016L, 2016L, 2016L, 2016L, 2017L, 2017L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L, 2017L, 2017L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L
), ReporterName = c("Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola"), PartnerName = c("Belgium",
"Brazil", "Bahamas, The", "Belgium", "Brazil", "Bahamas, The",
"Belgium", "Brazil", "Belgium", "Benin", "Brazil", "Belgium",
"Brazil", "Bahamas, The", "Belgium", "Brazil", "Australia", "Bahamas, The",
"Belgium", "Brazil", "Afghanistan", "Albania", "Algeria", "Antigua and Barbuda",
"Argentina", "Armenia", "Australia", "Austria", "Azerbaijan",
"Bahamas, The", "Belgium", "Belize", "Benin", "Bhutan", "Bolivia",
"Botswana", "Brazil", "British Virgin Islands", "Brunei", "Bulgaria",
"Afghanistan", "Algeria", "Antigua and Barbuda", "Argentina",
"Armenia", "Australia", "Austria", "Azerbaijan", "Bahamas, The",
"Bahrain", "Bangladesh", "Barbados", "Belgium", "Belize", "Benin",
"Bolivia", "Botswana", "Brazil", "Bulgaria", "Afghanistan", "Albania",
"Algeria", "Andorra", "Antigua and Barbuda", "Argentina", "Armenia",
"Australia", "Austria", "Azerbaijan", "Bahamas, The", "Bahrain",
"Bangladesh", "Belgium", "Benin", "Bolivia", "Botswana", "Brazil",
"Bulgaria", "Afghanistan", "Albania", "Algeria", "American Samoa",
"Andorra", "Antigua and Barbuda", "Argentina", "Armenia", "Australia",
"Austria", "Azerbaijan", "Bahamas, The", "Bahrain", "Bangladesh",
"Belarus", "Belgium", "Belize", "Benin", "Bolivia", "Botswana",
"Brazil", "Bulgaria"), .rows = structure(list(56L, 84L, 41L,
58L, 89L, 40L, 63L, 86L, 66L, 71L, 88L, 65L, 90L, 43L, 57L,
87L, 28L, 44L, 62L, 85L, 1L, 6L, 9L, 16L, 20L, 26L, 29L,
35L, 39L, 47L, 60L, 69L, 74L, 75L, 79L, 80L, 94L, 95L, 96L,
100L, 4L, 8L, 17L, 19L, 25L, 30L, 34L, 36L, 42L, 48L, 53L,
54L, 61L, 68L, 70L, 76L, 81L, 93L, 98L, 3L, 5L, 10L, 13L,
18L, 21L, 24L, 31L, 32L, 38L, 46L, 49L, 51L, 64L, 73L, 78L,
83L, 91L, 97L, 2L, 7L, 11L, 12L, 14L, 15L, 22L, 23L, 27L,
33L, 37L, 45L, 50L, 52L, 55L, 59L, 67L, 72L, 77L, 82L, 92L,
99L), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr",
"list"))), row.names = c(NA, 100L), class = c("tbl_df", "tbl",
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"))
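The problem with mean is the repeated rows in the data for any ReporterName. Total_Year is repeated on every partner row of a given year, so mean(Total_Year) taken over all rows of a group weights each year by the number of partner rows it has.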
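To see the effect in isolation, here is a minimal sketch on made-up numbers (the `toy` data frame and its values are hypothetical and not taken from total_trade). 2015 has three partner rows and 2016 has only one, so the row-level mean is pulled toward the 2015 total, while the mean over one row per year is not.

```r
library(dplyr)

# Hypothetical toy data: the yearly total is repeated once per partner row
toy <- data.frame(
  Year       = c(2015, 2015, 2015, 2016),
  Total_Year = c(10, 10, 10, 20)
)

# Mean over all rows: 2015 counts three times, 2016 once
row_mean <- mean(toy$Total_Year)

# Mean over one row per year: every year counts once
year_mean <- toy %>%
  distinct(Year, Total_Year) %>%
  pull(Total_Year) %>%
  mean()

print(c(row_mean = row_mean, year_mean = year_mean))
```

Here `row_mean` is 12.5 and `year_mean` is 15, which is the same kind of gap as between 34907582 and the manually calculated value in the question.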
Issue 1
library(dplyr)

total_trade %>%
  # create year_group variable for average values with above predefined labels and cuts,
  # chose right = FALSE to take cut before year_group_break
  mutate(year_group = cut(Year, breaks = year_group_break,
                          labels = year_group_labels,
                          include.lowest = TRUE, right = FALSE)) %>%
  # add column with mean of total trade per four year period: "total_year_group"
  group_by(ReporterName, year_group) %>%
  # dup is TRUE only for the first row carrying each distinct Total_Year in the group,
  # so every yearly total enters the mean exactly once
  mutate(dup = !duplicated(paste0(ReporterName, year_group, Total_Year)),
         total_year_group = sum(Total_Year * dup) / sum(dup)) %>%
  arrange(ReporterName, PartnerName, desc(Year))
# A tibble: 100 x 10
# Groups: ReporterName, year_group [3]
Year ReporterName PartnerName PartnerISO3 `TradeValue in 1000 USD` Total_Year pct_by_partner_year year_group dup total_year_group
<int> <chr> <chr> <chr> <dbl> <dbl> <dbl> <fct> <lgl> <dbl>
1 2018 Angola Afghanistan AFG 19.4 42096736. 0.0000460 2015-2018 TRUE 34746014.
2 2017 Angola Afghanistan AFG 2.25 34904881. 0.00000644 2015-2018 TRUE 34746014.
3 2016 Angola Afghanistan AFG 0.775 28057500. 0.00000276 2015-2018 TRUE 34746014.
4 2015 Angola Afghanistan AFG 39.6 33924937. 0.000117 2015-2018 TRUE 34746014.
5 2018 Angola Albania ALB 2.38 42096736. 0.00000565 2015-2018 FALSE 34746014.
6 2017 Angola Albania ALB 39.7 34904881. 0.000114 2015-2018 FALSE 34746014.
7 2015 Angola Albania ALB 2.26 33924937. 0.00000666 2015-2018 FALSE 34746014.
8 2018 Angola Algeria DZA 245. 42096736. 0.000583 2015-2018 FALSE 34746014.
9 2017 Angola Algeria DZA 2242. 34904881. 0.00642 2015-2018 FALSE 34746014.
10 2016 Angola Algeria DZA 2169. 28057500. 0.00773 2015-2018 FALSE 34746014.
# ... with 90 more rows
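The de-duplication above keys on the value of Total_Year, so two different years that happened to have exactly the same total would be collapsed into one. If that is a concern, a possible alternative is to reduce the data to one row per reporter and year first, average that, and join the result back. The following is only a sketch under assumptions: `toy_trade` and its numbers are made up, and the `ungroup()` is there because the dput output shows that total_trade is a grouped_df.

```r
library(dplyr)

# Hypothetical miniature of total_trade: the yearly total is repeated
# on every partner row of that year
toy_trade <- data.frame(
  Year         = c(2015L, 2015L, 2015L, 2016L, 2017L, 2017L, 2018L),
  ReporterName = "Angola",
  PartnerName  = c("A", "B", "C", "A", "A", "B", "A"),
  Total_Year   = c(100, 100, 100, 200, 300, 300, 400)
)

year_group_break  <- c(1999, 2003, 2007, 2011, 2015, 2019)
year_group_labels <- c("1999-2002", "2003-2006", "2007-2010", "2011-2014", "2015-2018")

with_groups <- toy_trade %>%
  ungroup() %>%  # drop any grouping inherited from earlier steps
  mutate(year_group = cut(Year, breaks = year_group_break,
                          labels = year_group_labels,
                          include.lowest = TRUE, right = FALSE))

# One row per reporter and year, then the mean per year group
group_means <- with_groups %>%
  distinct(ReporterName, year_group, Year, Total_Year) %>%
  group_by(ReporterName, year_group) %>%
  summarise(total_year_group = mean(Total_Year), .groups = "drop")

# Attach the group mean to every original row
result <- with_groups %>%
  left_join(group_means, by = c("ReporterName", "year_group"))

print(result)
```

With these toy numbers every row gets total_year_group = 250, the plain average of the four yearly totals, whereas a row-level mean would be weighted by the partner counts.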
Issue 2
Use complete from tidyr. If you can show the desired output, I can show you how to do it.
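Until the desired output is known, here is a rough sketch of what that could look like. It assumes a year-level table (the name `yearly` is hypothetical) holding Angola's totals for 2007, 2009 and 2010 as they appear in the sample data. Note that mean() only sees the rows that exist, so it already averages the available years; complete() merely makes the gap explicit as an NA row, after which na.rm = TRUE is needed.

```r
library(dplyr)
library(tidyr)

# Hypothetical year-level table: Angola has no row for 2008
yearly <- data.frame(
  ReporterName = "Angola",
  Year         = c(2007L, 2009L, 2010L),
  Total_Year   = c(44177783.072, 40639411.73, 52612114.76)
)

# mean() over the existing rows averages the three available years
mean_available <- mean(yearly$Total_Year)

# complete() adds the missing reporter-year combination with NA values
completed <- yearly %>%
  complete(ReporterName, Year = 2007:2010)

# With the explicit NA row, na.rm = TRUE gives the same mean
mean_completed <- mean(completed$Total_Year, na.rm = TRUE)

print(completed)
```

Whether the explicit NA rows are useful depends on the output you need; for the group mean alone they make no difference.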