R：每组时间序列的平均年数

Question

亲爱的社区，

我正在使用 R 并寻找 20 年来双边出口时间序列数据的趋势。由于数据在不同年份之间波动很大（而且不是 100% 可靠），我更愿意使用 four-years-average 数据（而不是单独查看每一年）来分析主要出口合作伙伴随着时间的推移而改变。我有以下 数据集 ，称为 GrossExp3，涵盖了 15 个报告国家在（1998 年至 2019 年）之间所有年份的双边出口（以 1000 美元为单位） ) 到所有可用的伙伴国家。它涵盖以下四个变量： Year, ReporterName (= exporter) , PartnerName (= export destination), 'TradeValue in 1000 USD' （= export value to the destination） PartnerName 列还包括一个名为“All”的条目，这是报告者每年所有出口的总和。

这是我的数据摘要

> summary(GrossExp3)
      Year      ReporterName       PartnerName        TradeValue in 1000 USD
 Min.   :1998   Length:35961       Length:35961       Min.   :       0      
 1st Qu.:2004   Class :character   Class :character   1st Qu.:      39      
 Median :2009   Mode  :character   Mode  :character   Median :     597      
 Mean   :2009                                         Mean   :  134370      
 3rd Qu.:2014                                         3rd Qu.:   10090      
 Max.   :2018                                         Max.   :47471515

我的目标是return一个table，它显示了每个出口商到出口目的地的总贸易额占总出口额的百分比那个时期。我希望获得以下时期的平均数据，而不是每一年：2000-2003、2004-2007、2008-2011、2012-2015、2016-2019。

我试过的 我当前的代码（在这个令人惊叹的社区的支持下创建的代码如下：（目前，它分别显示每年的数据，但我需要标题中的平均数据）

# install packages
library(data.table)
library(dplyr)
library(tidyr)
library(stringr)
library(plyr)
library(visdat)

# set working directory
setwd("C:/R/R_09.2020/Other Indicators/Bilateral Trade Shift of Partners")

# load data

# create a file path SITC 3 
path1 <- file.path("SITC Rev 3_Data from 1998.csv")

# load cvs data table, call "SITC3" 
SITC3 <- fread(path1, drop = c(1,9,11,13))

# prepare data (SITC3) for analysis
# Filter for GROSS EXPORTS SITC3 (Gross exports = Exports that include intermediate products)
GrossExp3 <- SITC3 %>%
  filter(TradeFlowName == "Gross Exp.", PartnerISO3 != "All", Year != 2019) %>%  # filter for gross exports, remove "All", remove 2019
  select(Year, ReporterName, PartnerName, `TradeValue in 1000 USD`) %>%
  arrange(ReporterName, desc(Year))
# compare with old subset
summary(GrossExp3)
summary(SITC3)

# calculate percentage of total
GrossExp3Main <- GrossExp3 %>%
  group_by(Year, ReporterName) %>%
  add_tally(wt = `TradeValue in 1000 USD`, name = "TotalValue") %>%
  mutate(Percentage = 100 * (`TradeValue in 1000 USD` / TotalValue)) %>%
  arrange(ReporterName, desc(Year), desc(Percentage))
head(GrossExp3Main, n = 20)

# print tables in separate sheets to get an overview about hierarchy of export partners and development over time
SpreadExpMain <- GrossExp3Main %>%
  select(Year, ReporterName, PartnerName, Percentage) %>%
  spread(key = Year, value = Percentage) %>%
  arrange(ReporterName, desc(`2018`))
View(SpreadExpMain) # shows whole table

这是我的数据头

> head(GrossExp3Main, n = 20)
# A tibble: 20 x 6
# Groups:   Year, ReporterName [7]
    Year ReporterName PartnerName   `TradeValue in 100~ TotalValue Percentage
   <int> <chr>        <chr>                       <dbl>      <dbl>      <dbl>
 1  2018 Angola       China                   24517058.  42096736.      58.2 
 2  2018 Angola       India                    3768940.  42096736.       8.95
 3  2017 Angola       China                   19487067.  34904881.      55.8 
 4  2017 Angola       India                    2890061.  34904881.       8.28
 5  2016 Angola       China                   13923092.  28057500.      49.6 
 6  2016 Angola       India                    1948845.  28057500.       6.95
 7  2016 Angola       United States            1525650.  28057500.       5.44
 8  2015 Angola       China                   14320566.  33924937.      42.2 
 9  2015 Angola       India                    2676340.  33924937.       7.89
10  2015 Angola       Spain                    2245976.  33924937.       6.62
11  2014 Angola       China                   27527111.  58672369.      46.9 
12  2014 Angola       India                    4507416.  58672369.       7.68
13  2014 Angola       Spain                    3726455.  58672369.       6.35
14  2013 Angola       China                   31947235.  67712527.      47.2 
15  2013 Angola       India                    6764233.  67712527.       9.99
16  2013 Angola       United States            5018391.  67712527.       7.41
17  2013 Angola       Other Asia, ~            4007020.  67712527.       5.92
18  2012 Angola       China                   33710030.  70863076.      47.6 
19  2012 Angola       India                    6932061.  70863076.       9.78
20  2012 Angola       United States            6594526.  70863076.       9.31

我不确定到目前为止我得到的结果是否正确？另外，我还有以下问题：

关于如何使用 R 打印漂亮的 tables，您有什么建议吗？
我怎样才能更好地将百分比数据四舍五入到逗号后面的一个数字？

由于我在一周内一直被这些问题所困扰，如果您能提供有关如何解决该问题的任何建议，我将不胜感激！

祝你周末愉快，万事如意，

梅丽克

** 编辑** 这是一些样本数据

dput(head(GrossExp3Main, n = 20))
structure(list(Year = c(2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
2018L, 2018L, 2018L, 2018L, 2018L), ReporterName = c("Angola", 
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola", 
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola", 
"Angola", "Angola", "Angola", "Angola", "Angola"), PartnerName = c("China", 
"India", "United States", "Spain", "South Africa", "Portugal", 
"United Arab Emirates", "France", "Thailand", "Canada", "Indonesia", 
"Singapore", "Italy", "Israel", "United Kingdom", "Unspecified", 
"Namibia", "Uruguay", "Congo, Rep.", "Japan"), `TradeValue in 1000 USD` = c(24517058.342, 
3768940.47, 1470132.736, 1250554.873, 1161852.097, 1074137.369, 
884725.078, 734551.345, 649626.328, 647164.297, 575477.283, 513982.584, 
468914.918, 452453.482, 425616.975, 423008.886, 327921.516, 320586.229, 
299119.102, 264671.779), TotalValue = c(42096736.31, 42096736.31, 
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31, 
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31, 
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31, 
42096736.31, 42096736.31, 42096736.31), Percentage = c(58.2398078593471, 
8.9530467213552, 3.49227247731025, 2.97066942147468, 2.75995765667944, 
2.55159298119945, 2.10164767046284, 1.74491281127062, 1.54317504144777, 
1.53732653342598, 1.3670353890672, 1.22095589599877, 1.11389850877492, 
1.07479467925527, 1.01104506502775, 1.00484959899258, 0.778971352043039, 
0.761546516668669, 0.710551762961598, 0.62872279943737)), row.names = c(NA, 
-20L), groups = structure(list(Year = 2018L, ReporterName = "Angola", 
    .rows = structure(list(1:20), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = 1L, class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))
>

Answer 1

要执行您想要的操作，需要一个额外的变量来将年份组合在一起。我用 cut 来做到这一点。

library(dplyr)
# Define the cut breaks and labels for each group
# The cut define by the starting of each group and when using cut function
# I would use param right = FALSE to have the desire cut that I want here.
year_group_break <- c(2000, 2004, 2008, 2012, 2016, 2020)
year_group_labels <- c("2000-2003", "2004-2007", "2008-2011", "2012-2015", "2016-2019")

data %>%
  # create the year group variable
  mutate(year_group = cut(Year, breaks = year_group_break,
    labels  = year_group_labels,
    include.lowest = TRUE, right = FALSE)) %>%
  # calculte the total value for each Reporter + Partner in each year group
  group_by(year_group, ReporterName, PartnerName) %>%
  summarize(`TradeValue in 1000 USD` = sum(`TradeValue in 1000 USD`),
    .groups = "drop") %>%
  # calculate the percentage value for Partner of each Reporter/Year group
  group_by(year_group, ReporterName) %>%
  mutate(Percentage = `TradeValue in 1000 USD` / sum(`TradeValue in 1000 USD`)) %>%
  
  ungroup()

示例输出

   year_group ReporterName PartnerName          `TradeValue in 1000 USD` Percentage
   <fct>      <chr>        <chr>                                   <dbl>      <dbl>
 1 2016-2019  Angola       Canada                                647164.    0.0161 
 2 2016-2019  Angola       China                               24517058.    0.609  
 3 2016-2019  Angola       Congo, Rep.                           299119.    0.00744
 4 2016-2019  Angola       France                                734551.    0.0183 
 5 2016-2019  Angola       India                                3768940.    0.0937 
 6 2016-2019  Angola       Indonesia                             575477.    0.0143 
 7 2016-2019  Angola       Israel                                452453.    0.0112 
 8 2016-2019  Angola       Italy                                 468915.    0.0117 
 9 2016-2019  Angola       Japan                                 264672.    0.00658
10 2016-2019  Angola       Namibia                               327922.    0.00815
11 2016-2019  Angola       Portugal                             1074137.    0.0267 
12 2016-2019  Angola       Singapore                             513983.    0.0128 
13 2016-2019  Angola       South Africa                         1161852.    0.0289 
14 2016-2019  Angola       Spain                                1250555.    0.0311 
15 2016-2019  Angola       Thailand                              649626.    0.0161 
16 2016-2019  Angola       United Arab Emirates                  884725.    0.0220 
17 2016-2019  Angola       United Kingdom                        425617.    0.0106 
18 2016-2019  Angola       United States                        1470133.    0.0365 
19 2016-2019  Angola       Unspecified                           423009.    0.0105 
20 2016-2019  Angola       Uruguay                               320586.    0.00797

R：每组时间序列的平均年数

R: Average years in time series per group

average

r

time-series