R:每组时间序列的平均年数
R: Average years in time series per group
亲爱的社区,
我正在使用 R 并寻找 20 年来双边出口时间序列数据的趋势。由于数据在不同年份之间波动很大(而且不是 100% 可靠),我更愿意使用 four-years-average 数据(而不是单独查看每一年)来分析主要出口合作伙伴随着时间的推移而改变。
我有以下 数据集 ,称为 GrossExp3,涵盖了 15 个报告国家在(1998 年至 2019 年)之间所有年份的双边出口(以 1000 美元为单位) ) 到所有可用的伙伴国家。
它涵盖以下四个 变量:
Year, ReporterName (= exporter) , PartnerName (= export destination), 'TradeValue in 1000 USD' (= export value to the destination)
PartnerName 列还包括一个名为“All”的条目,这是报告者每年所有出口的总和。
这是我的数据摘要
> summary(GrossExp3)
Year ReporterName PartnerName TradeValue in 1000 USD
Min. :1998 Length:35961 Length:35961 Min. : 0
1st Qu.:2004 Class :character Class :character 1st Qu.: 39
Median :2009 Mode :character Mode :character Median : 597
Mean :2009 Mean : 134370
3rd Qu.:2014 3rd Qu.: 10090
Max. :2018 Max. :47471515
我的目标是return一个table,它显示了每个出口商到出口目的地的总贸易额占总出口额的百分比那个时期。我希望获得以下时期的平均数据,而不是每一年:2000-2003、2004-2007、2008-2011、2012-2015、2016-2019。
我试过的
我当前的代码(在这个令人惊叹的社区的支持下创建的代码如下:(目前,它分别显示每年的数据,但我需要标题中的平均数据)
# install packages
library(data.table)
library(dplyr)
library(tidyr)
library(stringr)
library(plyr)
library(visdat)
# set working directory
setwd("C:/R/R_09.2020/Other Indicators/Bilateral Trade Shift of Partners")
# load data
# create a file path SITC 3
path1 <- file.path("SITC Rev 3_Data from 1998.csv")
# load cvs data table, call "SITC3"
SITC3 <- fread(path1, drop = c(1,9,11,13))
# prepare data (SITC3) for analysis
# Filter for GROSS EXPORTS SITC3 (Gross exports = Exports that include intermediate products)
GrossExp3 <- SITC3 %>%
filter(TradeFlowName == "Gross Exp.", PartnerISO3 != "All", Year != 2019) %>% # filter for gross exports, remove "All", remove 2019
select(Year, ReporterName, PartnerName, `TradeValue in 1000 USD`) %>%
arrange(ReporterName, desc(Year))
# compare with old subset
summary(GrossExp3)
summary(SITC3)
# calculate percentage of total
GrossExp3Main <- GrossExp3 %>%
group_by(Year, ReporterName) %>%
add_tally(wt = `TradeValue in 1000 USD`, name = "TotalValue") %>%
mutate(Percentage = 100 * (`TradeValue in 1000 USD` / TotalValue)) %>%
arrange(ReporterName, desc(Year), desc(Percentage))
head(GrossExp3Main, n = 20)
# print tables in separate sheets to get an overview about hierarchy of export partners and development over time
SpreadExpMain <- GrossExp3Main %>%
select(Year, ReporterName, PartnerName, Percentage) %>%
spread(key = Year, value = Percentage) %>%
arrange(ReporterName, desc(`2018`))
View(SpreadExpMain) # shows whole table
这是我的数据头
> head(GrossExp3Main, n = 20)
# A tibble: 20 x 6
# Groups: Year, ReporterName [7]
Year ReporterName PartnerName `TradeValue in 100~ TotalValue Percentage
<int> <chr> <chr> <dbl> <dbl> <dbl>
1 2018 Angola China 24517058. 42096736. 58.2
2 2018 Angola India 3768940. 42096736. 8.95
3 2017 Angola China 19487067. 34904881. 55.8
4 2017 Angola India 2890061. 34904881. 8.28
5 2016 Angola China 13923092. 28057500. 49.6
6 2016 Angola India 1948845. 28057500. 6.95
7 2016 Angola United States 1525650. 28057500. 5.44
8 2015 Angola China 14320566. 33924937. 42.2
9 2015 Angola India 2676340. 33924937. 7.89
10 2015 Angola Spain 2245976. 33924937. 6.62
11 2014 Angola China 27527111. 58672369. 46.9
12 2014 Angola India 4507416. 58672369. 7.68
13 2014 Angola Spain 3726455. 58672369. 6.35
14 2013 Angola China 31947235. 67712527. 47.2
15 2013 Angola India 6764233. 67712527. 9.99
16 2013 Angola United States 5018391. 67712527. 7.41
17 2013 Angola Other Asia, ~ 4007020. 67712527. 5.92
18 2012 Angola China 33710030. 70863076. 47.6
19 2012 Angola India 6932061. 70863076. 9.78
20 2012 Angola United States 6594526. 70863076. 9.31
我不确定到目前为止我得到的结果是否正确?
另外,我还有以下问题:
- 关于如何使用 R 打印漂亮的 tables,您有什么建议吗?
- 我怎样才能更好地将百分比数据四舍五入到逗号后面的一个数字?
由于我在一周内一直被这些问题所困扰,如果您能提供有关如何解决该问题的任何建议,我将不胜感激!
祝你周末愉快,万事如意,
梅丽克
** 编辑**
这是一些样本数据
dput(head(GrossExp3Main, n = 20))
structure(list(Year = c(2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L), ReporterName = c("Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola"), PartnerName = c("China",
"India", "United States", "Spain", "South Africa", "Portugal",
"United Arab Emirates", "France", "Thailand", "Canada", "Indonesia",
"Singapore", "Italy", "Israel", "United Kingdom", "Unspecified",
"Namibia", "Uruguay", "Congo, Rep.", "Japan"), `TradeValue in 1000 USD` = c(24517058.342,
3768940.47, 1470132.736, 1250554.873, 1161852.097, 1074137.369,
884725.078, 734551.345, 649626.328, 647164.297, 575477.283, 513982.584,
468914.918, 452453.482, 425616.975, 423008.886, 327921.516, 320586.229,
299119.102, 264671.779), TotalValue = c(42096736.31, 42096736.31,
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31,
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31,
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31,
42096736.31, 42096736.31, 42096736.31), Percentage = c(58.2398078593471,
8.9530467213552, 3.49227247731025, 2.97066942147468, 2.75995765667944,
2.55159298119945, 2.10164767046284, 1.74491281127062, 1.54317504144777,
1.53732653342598, 1.3670353890672, 1.22095589599877, 1.11389850877492,
1.07479467925527, 1.01104506502775, 1.00484959899258, 0.778971352043039,
0.761546516668669, 0.710551762961598, 0.62872279943737)), row.names = c(NA,
-20L), groups = structure(list(Year = 2018L, ReporterName = "Angola",
.rows = structure(list(1:20), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = 1L, class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
>
要执行您想要的操作,需要一个额外的变量来将年份组合在一起。我用 cut
来做到这一点。
library(dplyr)
# Define the cut breaks and labels for each group
# The cut define by the starting of each group and when using cut function
# I would use param right = FALSE to have the desire cut that I want here.
year_group_break <- c(2000, 2004, 2008, 2012, 2016, 2020)
year_group_labels <- c("2000-2003", "2004-2007", "2008-2011", "2012-2015", "2016-2019")
data %>%
# create the year group variable
mutate(year_group = cut(Year, breaks = year_group_break,
labels = year_group_labels,
include.lowest = TRUE, right = FALSE)) %>%
# calculte the total value for each Reporter + Partner in each year group
group_by(year_group, ReporterName, PartnerName) %>%
summarize(`TradeValue in 1000 USD` = sum(`TradeValue in 1000 USD`),
.groups = "drop") %>%
# calculate the percentage value for Partner of each Reporter/Year group
group_by(year_group, ReporterName) %>%
mutate(Percentage = `TradeValue in 1000 USD` / sum(`TradeValue in 1000 USD`)) %>%
ungroup()
示例输出
year_group ReporterName PartnerName `TradeValue in 1000 USD` Percentage
<fct> <chr> <chr> <dbl> <dbl>
1 2016-2019 Angola Canada 647164. 0.0161
2 2016-2019 Angola China 24517058. 0.609
3 2016-2019 Angola Congo, Rep. 299119. 0.00744
4 2016-2019 Angola France 734551. 0.0183
5 2016-2019 Angola India 3768940. 0.0937
6 2016-2019 Angola Indonesia 575477. 0.0143
7 2016-2019 Angola Israel 452453. 0.0112
8 2016-2019 Angola Italy 468915. 0.0117
9 2016-2019 Angola Japan 264672. 0.00658
10 2016-2019 Angola Namibia 327922. 0.00815
11 2016-2019 Angola Portugal 1074137. 0.0267
12 2016-2019 Angola Singapore 513983. 0.0128
13 2016-2019 Angola South Africa 1161852. 0.0289
14 2016-2019 Angola Spain 1250555. 0.0311
15 2016-2019 Angola Thailand 649626. 0.0161
16 2016-2019 Angola United Arab Emirates 884725. 0.0220
17 2016-2019 Angola United Kingdom 425617. 0.0106
18 2016-2019 Angola United States 1470133. 0.0365
19 2016-2019 Angola Unspecified 423009. 0.0105
20 2016-2019 Angola Uruguay 320586. 0.00797
亲爱的社区,
我正在使用 R 并寻找 20 年来双边出口时间序列数据的趋势。由于数据在不同年份之间波动很大(而且不是 100% 可靠),我更愿意使用 four-years-average 数据(而不是单独查看每一年)来分析主要出口合作伙伴随着时间的推移而改变。 我有以下 数据集 ,称为 GrossExp3,涵盖了 15 个报告国家在(1998 年至 2019 年)之间所有年份的双边出口(以 1000 美元为单位) ) 到所有可用的伙伴国家。 它涵盖以下四个 变量: Year, ReporterName (= exporter) , PartnerName (= export destination), 'TradeValue in 1000 USD' (= export value to the destination) PartnerName 列还包括一个名为“All”的条目,这是报告者每年所有出口的总和。
这是我的数据摘要
> summary(GrossExp3)
Year ReporterName PartnerName TradeValue in 1000 USD
Min. :1998 Length:35961 Length:35961 Min. : 0
1st Qu.:2004 Class :character Class :character 1st Qu.: 39
Median :2009 Mode :character Mode :character Median : 597
Mean :2009 Mean : 134370
3rd Qu.:2014 3rd Qu.: 10090
Max. :2018 Max. :47471515
我的目标是return一个table,它显示了每个出口商到出口目的地的总贸易额占总出口额的百分比那个时期。我希望获得以下时期的平均数据,而不是每一年:2000-2003、2004-2007、2008-2011、2012-2015、2016-2019。
我试过的 我当前的代码(在这个令人惊叹的社区的支持下创建的代码如下:(目前,它分别显示每年的数据,但我需要标题中的平均数据)
# install packages
library(data.table)
library(dplyr)
library(tidyr)
library(stringr)
library(plyr)
library(visdat)
# set working directory
setwd("C:/R/R_09.2020/Other Indicators/Bilateral Trade Shift of Partners")
# load data
# create a file path SITC 3
path1 <- file.path("SITC Rev 3_Data from 1998.csv")
# load cvs data table, call "SITC3"
SITC3 <- fread(path1, drop = c(1,9,11,13))
# prepare data (SITC3) for analysis
# Filter for GROSS EXPORTS SITC3 (Gross exports = Exports that include intermediate products)
GrossExp3 <- SITC3 %>%
filter(TradeFlowName == "Gross Exp.", PartnerISO3 != "All", Year != 2019) %>% # filter for gross exports, remove "All", remove 2019
select(Year, ReporterName, PartnerName, `TradeValue in 1000 USD`) %>%
arrange(ReporterName, desc(Year))
# compare with old subset
summary(GrossExp3)
summary(SITC3)
# calculate percentage of total
GrossExp3Main <- GrossExp3 %>%
group_by(Year, ReporterName) %>%
add_tally(wt = `TradeValue in 1000 USD`, name = "TotalValue") %>%
mutate(Percentage = 100 * (`TradeValue in 1000 USD` / TotalValue)) %>%
arrange(ReporterName, desc(Year), desc(Percentage))
head(GrossExp3Main, n = 20)
# print tables in separate sheets to get an overview about hierarchy of export partners and development over time
SpreadExpMain <- GrossExp3Main %>%
select(Year, ReporterName, PartnerName, Percentage) %>%
spread(key = Year, value = Percentage) %>%
arrange(ReporterName, desc(`2018`))
View(SpreadExpMain) # shows whole table
这是我的数据头
> head(GrossExp3Main, n = 20)
# A tibble: 20 x 6
# Groups: Year, ReporterName [7]
Year ReporterName PartnerName `TradeValue in 100~ TotalValue Percentage
<int> <chr> <chr> <dbl> <dbl> <dbl>
1 2018 Angola China 24517058. 42096736. 58.2
2 2018 Angola India 3768940. 42096736. 8.95
3 2017 Angola China 19487067. 34904881. 55.8
4 2017 Angola India 2890061. 34904881. 8.28
5 2016 Angola China 13923092. 28057500. 49.6
6 2016 Angola India 1948845. 28057500. 6.95
7 2016 Angola United States 1525650. 28057500. 5.44
8 2015 Angola China 14320566. 33924937. 42.2
9 2015 Angola India 2676340. 33924937. 7.89
10 2015 Angola Spain 2245976. 33924937. 6.62
11 2014 Angola China 27527111. 58672369. 46.9
12 2014 Angola India 4507416. 58672369. 7.68
13 2014 Angola Spain 3726455. 58672369. 6.35
14 2013 Angola China 31947235. 67712527. 47.2
15 2013 Angola India 6764233. 67712527. 9.99
16 2013 Angola United States 5018391. 67712527. 7.41
17 2013 Angola Other Asia, ~ 4007020. 67712527. 5.92
18 2012 Angola China 33710030. 70863076. 47.6
19 2012 Angola India 6932061. 70863076. 9.78
20 2012 Angola United States 6594526. 70863076. 9.31
我不确定到目前为止我得到的结果是否正确? 另外,我还有以下问题:
- 关于如何使用 R 打印漂亮的 tables,您有什么建议吗?
- 我怎样才能更好地将百分比数据四舍五入到逗号后面的一个数字?
由于我在一周内一直被这些问题所困扰,如果您能提供有关如何解决该问题的任何建议,我将不胜感激!
祝你周末愉快,万事如意,
梅丽克
** 编辑** 这是一些样本数据
dput(head(GrossExp3Main, n = 20))
structure(list(Year = c(2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L), ReporterName = c("Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola"), PartnerName = c("China",
"India", "United States", "Spain", "South Africa", "Portugal",
"United Arab Emirates", "France", "Thailand", "Canada", "Indonesia",
"Singapore", "Italy", "Israel", "United Kingdom", "Unspecified",
"Namibia", "Uruguay", "Congo, Rep.", "Japan"), `TradeValue in 1000 USD` = c(24517058.342,
3768940.47, 1470132.736, 1250554.873, 1161852.097, 1074137.369,
884725.078, 734551.345, 649626.328, 647164.297, 575477.283, 513982.584,
468914.918, 452453.482, 425616.975, 423008.886, 327921.516, 320586.229,
299119.102, 264671.779), TotalValue = c(42096736.31, 42096736.31,
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31,
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31,
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31,
42096736.31, 42096736.31, 42096736.31), Percentage = c(58.2398078593471,
8.9530467213552, 3.49227247731025, 2.97066942147468, 2.75995765667944,
2.55159298119945, 2.10164767046284, 1.74491281127062, 1.54317504144777,
1.53732653342598, 1.3670353890672, 1.22095589599877, 1.11389850877492,
1.07479467925527, 1.01104506502775, 1.00484959899258, 0.778971352043039,
0.761546516668669, 0.710551762961598, 0.62872279943737)), row.names = c(NA,
-20L), groups = structure(list(Year = 2018L, ReporterName = "Angola",
.rows = structure(list(1:20), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = 1L, class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
>
要执行您想要的操作,需要一个额外的变量来将年份组合在一起。我用 cut
来做到这一点。
library(dplyr)
# Define the cut breaks and labels for each group
# The cut define by the starting of each group and when using cut function
# I would use param right = FALSE to have the desire cut that I want here.
year_group_break <- c(2000, 2004, 2008, 2012, 2016, 2020)
year_group_labels <- c("2000-2003", "2004-2007", "2008-2011", "2012-2015", "2016-2019")
data %>%
# create the year group variable
mutate(year_group = cut(Year, breaks = year_group_break,
labels = year_group_labels,
include.lowest = TRUE, right = FALSE)) %>%
# calculte the total value for each Reporter + Partner in each year group
group_by(year_group, ReporterName, PartnerName) %>%
summarize(`TradeValue in 1000 USD` = sum(`TradeValue in 1000 USD`),
.groups = "drop") %>%
# calculate the percentage value for Partner of each Reporter/Year group
group_by(year_group, ReporterName) %>%
mutate(Percentage = `TradeValue in 1000 USD` / sum(`TradeValue in 1000 USD`)) %>%
ungroup()
示例输出
year_group ReporterName PartnerName `TradeValue in 1000 USD` Percentage
<fct> <chr> <chr> <dbl> <dbl>
1 2016-2019 Angola Canada 647164. 0.0161
2 2016-2019 Angola China 24517058. 0.609
3 2016-2019 Angola Congo, Rep. 299119. 0.00744
4 2016-2019 Angola France 734551. 0.0183
5 2016-2019 Angola India 3768940. 0.0937
6 2016-2019 Angola Indonesia 575477. 0.0143
7 2016-2019 Angola Israel 452453. 0.0112
8 2016-2019 Angola Italy 468915. 0.0117
9 2016-2019 Angola Japan 264672. 0.00658
10 2016-2019 Angola Namibia 327922. 0.00815
11 2016-2019 Angola Portugal 1074137. 0.0267
12 2016-2019 Angola Singapore 513983. 0.0128
13 2016-2019 Angola South Africa 1161852. 0.0289
14 2016-2019 Angola Spain 1250555. 0.0311
15 2016-2019 Angola Thailand 649626. 0.0161
16 2016-2019 Angola United Arab Emirates 884725. 0.0220
17 2016-2019 Angola United Kingdom 425617. 0.0106
18 2016-2019 Angola United States 1470133. 0.0365
19 2016-2019 Angola Unspecified 423009. 0.0105
20 2016-2019 Angola Uruguay 320586. 0.00797