After using the WDI API, how can I easily remove the rows for aggregated countries by string?

Well, I have this:

library(WDI)
wdi <- WDI(indicator=c("SI.POV.NAHC", "SI.POV.GINI", "SL.UEM.TOTL.ZS",
                       "SP.POP.TOTL"), start=1991, end=2018)
wdi <- as_tibble(wdi)
wdi <- wdi %>% select(-iso2c)
wdi <- wdi %>% rename(`Poverty Headcount`=SI.POV.NAHC, 
                      `Gini Index`=SI.POV.GINI,
                      `Total Unemployment`= SL.UEM.TOTL.ZS, 
                      `Total Population`=SP.POP.TOTL)
##Remove rows command goes here
kable(head(wdi))

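As you can see, the first column is the country, and each row is one country's observation for one year. I want to remove all the "groups" (Arab World, World, South Asia, ...) from this dataset; I only want countries.

This is World Bank data.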


The non-base packages I am using are:

library(tidyverse)
library(WDI)
library(psych)
library(pastecs)
library(xlsx)
library(stargazer)
library(xtable)
library(markdown)
library(knitr)
library(haven)
library(readr)
library(ggplot2)
library(dplyr)
library(sf)
library(tmap)
library(spData)
library(RColorBrewer)
library(stringr)

You can check the validity of the "iso2c" column. A simple way is to use countrycode(<values>, <from>, <to>) from the countrycode package.

wdi <- WDI::WDI(indicator=c("SI.POV.NAHC", "SI.POV.GINI", "SL.UEM.TOTL.ZS", 
                            "SP.POP.TOTL"), start=1991, end=2018)

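When you use "iso2c" for both <from> and <to>, every code that cannot be matched comes back as NA, which you can easily identify with is.na. Check the result before applying it, though, otherwise this method will also remove disputed countries such as Kosovo.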

library(countrycode)
rm.groups <- unique(wdi$country[is.na(countrycode(wdi$iso2c, "iso2c", "iso2c"))])

rm.groups
#  [1] "Arab World"                                          
#  [2] "World"                                               
#  [3] "East Asia & Pacific (excluding high income)"         
#  [4] "Europe & Central Asia (excluding high income)"       
#  [5] "South Asia"                                          
#  [6] "Central Europe and the Baltics"                      
#  [7] "European Union"                                      
#  [8] "Fragile and conflict affected situations"            
#  [9] "Channel Islands"                                     
# [10] "OECD members"                                        
# [11] "Small states"                                        
# [12] "Pacific island small states"                         
# [13] "Caribbean small states"                              
# [14] "Other small states"                                  
# [15] "Latin America & the Caribbean (IDA & IBRD countries)"
# [16] "Middle East & North Africa (IDA & IBRD countries)"   
# [17] "East Asia & Pacific (IDA & IBRD countries)"          
# [18] "South Asia (IDA & IBRD)"                             
# [19] "Sub-Saharan Africa (IDA & IBRD countries)"           
# [20] "Europe & Central Asia (IDA & IBRD countries)"        
# [21] "Pre-demographic dividend"                            
# [22] "Early-demographic dividend"                          
# [23] "Late-demographic dividend"                           
# [24] "Post-demographic dividend"                           
# [25] "Euro area"                                           
# [26] "High income"                                         
# [27] "Heavily indebted poor countries (HIPC)"              
# [28] "IBRD only"                                           
# [29] "IDA total"                                           
# [30] "IDA blend"                                           
# [31] "IDA only"                                            
# [32] "Latin America & Caribbean (excluding high income)"   
# [33] "Kosovo"                                              
# [34] "Least developed countries: UN classification"        
# [35] "Low income"                                          
# [36] "Lower middle income"                                 
# [37] "Low & middle income"                                 
# [38] "Middle income"                                       
# [39] "Middle East & North Africa (excluding high income)"  
# [40] "Upper middle income"                                 
# [41] "North America"                                       
# [42] "Not classified"                                      
# [43] "East Asia & Pacific"                                 
# [44] "Europe & Central Asia"                               
# [45] "Sub-Saharan Africa (excluding high income)"          
# [46] "Sub-Saharan Africa"                                  
# [47] "Latin America & Caribbean"                           
# [48] "Middle East & North Africa"                          
# [49] "IDA & IBRD total" 

That is easy to handle, though. After inspecting the rm.groups vector, you will probably want to keep these two:

wdi$iso2c[wdi$country == "Kosovo"][1]
# [1] "XK"
wdi$iso2c[wdi$country == "Channel Islands"][1]
# [1] "JG"

Just remove them from rm.groups using %in%:

rm.groups <- rm.groups[!rm.groups %in% c("Kosovo", "Channel Islands")]
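
As a side note on dropping names from a vector, here is a minimal sketch of why logical negation with ! is the safer choice compared with -which(...). The vectors groups and keep are made up for illustration; the point is only what happens when none of the names to keep actually occurs in the vector.

```r
# Hypothetical vectors, for illustration only
groups <- c("Arab World", "World", "South Asia")
keep   <- c("Kosovo", "Channel Islands")   # none of these occur in `groups`

# which() returns integer(0) here, and x[-integer(0)] selects nothing,
# so every element is silently dropped
by_which <- groups[-which(groups %in% keep)]

# Logical negation keeps every element that is not in `keep`
by_not <- groups[!groups %in% keep]

length(by_which)
length(by_not)
```

With real WDI data both names are usually present, so the two forms give the same result; the difference only shows up when the match is empty.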

Finally, you can remove the groups from the wdi data frame by keeping only the rows whose country is not (!) %in% rm.groups:
wdi.nogroups <- wdi[!wdi$country %in% rm.groups, ]
head(wdi.nogroups)
#     iso2c country year SI.POV.NAHC SI.POV.GINI SL.UEM.TOTL.ZS SP.POP.TOTL
# 141    AD Andorra 1991          NA          NA             NA       56671
# 142    AD Andorra 1992          NA          NA             NA       58888
# 143    AD Andorra 1993          NA          NA             NA       60971
# 144    AD Andorra 1994          NA          NA             NA       62677
# 145    AD Andorra 1995          NA          NA             NA       63850
# 146    AD Andorra 1996          NA          NA             NA       64360
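
One thing to watch when fitting this into the pipeline from the question: that pipeline drops iso2c before the "remove rows" step, while the check above needs iso2c. A rough sketch of the ordering on a toy data frame follows. The data frame toy, its values, and the unmatched vector are all assumptions made for illustration; in the real workflow unmatched would come from is.na(countrycode(wdi$iso2c, "iso2c", "iso2c")).

```r
# Toy stand-in for the WDI() result; the values are made up for illustration
toy <- data.frame(
  iso2c   = c("1A", "1A", "AD", "AD", "XK", "XK"),
  country = c("Arab World", "Arab World", "Andorra", "Andorra",
              "Kosovo", "Kosovo"),
  year    = c(1991, 1992, 1991, 1992, 1991, 1992),
  stringsAsFactors = FALSE
)

# Hypothetical lookup result: TRUE where the code was not recognised.
# Here only "AD" is treated as a valid ISO code.
unmatched <- !toy$iso2c %in% c("AD")

# 1. Build the list of names to remove while iso2c is still available
rm_groups <- unique(toy$country[unmatched])

# 2. Take back the entries that should be kept after manual inspection
rm_groups <- setdiff(rm_groups, "Kosovo")

# 3. Filter the rows, and only then drop the iso2c column
toy_nogroups <- toy[!toy$country %in% rm_groups, ]
toy_nogroups$iso2c <- NULL

toy_nogroups
```

The same order applies to the original code: compute rm.groups and filter first, then run select(-iso2c) and rename().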