After using the WDI API, how can I easily remove the rows of aggregated countries by string?
Well, I have this:
library(WDI)
wdi <- WDI(indicator=c("SI.POV.NAHC", "SI.POV.GINI", "SL.UEM.TOTL.ZS",
                       "SP.POP.TOTL"), start=1991, end=2018)
wdi <- as_tibble(wdi)
wdi <- wdi %>% select(-iso2c)
wdi <- wdi %>% rename(`Poverty Headcount`=SI.POV.NAHC,
                      `Gini Index`=SI.POV.GINI,
                      `Total Unemployment`=SL.UEM.TOTL.ZS,
                      `Total Population`=SP.POP.TOTL)
##Remove rows command goes here
kable(head(wdi))
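As you can see, the first column is the country, and each row is one country-year observation. I want to remove all the "groups" (Arab World, World, South Asia, ...) from this data set; I only want countries.
This is World Bank data.
The non-base packages I use are: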
library(tidyverse)
library(WDI)
library(psych)
library(pastecs)
library(xlsx)
library(stargazer)
library(xtable)
library(markdown)
library(knitr)
library(haven)
library(readr)
library(ggplot2)
library(dplyr)
library(sf)
library(tmap)
library(spData)
library(RColorBrewer)
library(stringr)
You could check the "iso2c" column for validity. A simple way is to use countrycode(<values>, <from>, <to>) from the countrycode package.
wdi <- WDI::WDI(indicator=c("SI.POV.NAHC", "SI.POV.GINI", "SL.UEM.TOTL.ZS",
"SP.POP.TOTL"), start=1991, end=2018)
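When you use "iso2c" for both <from> and <to>, every code that cannot be matched comes back as NA, which you can easily identify with is.na. Check the result before applying it, though, because this method also drops disputed territories such as Kosovo.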
library(countrycode)
rm.groups <- unique(wdi$country[is.na(countrycode(wdi$iso2c, "iso2c", "iso2c"))])
rm.groups
# [1] "Arab World"
# [2] "World"
# [3] "East Asia & Pacific (excluding high income)"
# [4] "Europe & Central Asia (excluding high income)"
# [5] "South Asia"
# [6] "Central Europe and the Baltics"
# [7] "European Union"
# [8] "Fragile and conflict affected situations"
# [9] "Channel Islands"
# [10] "OECD members"
# [11] "Small states"
# [12] "Pacific island small states"
# [13] "Caribbean small states"
# [14] "Other small states"
# [15] "Latin America & the Caribbean (IDA & IBRD countries)"
# [16] "Middle East & North Africa (IDA & IBRD countries)"
# [17] "East Asia & Pacific (IDA & IBRD countries)"
# [18] "South Asia (IDA & IBRD)"
# [19] "Sub-Saharan Africa (IDA & IBRD countries)"
# [20] "Europe & Central Asia (IDA & IBRD countries)"
# [21] "Pre-demographic dividend"
# [22] "Early-demographic dividend"
# [23] "Late-demographic dividend"
# [24] "Post-demographic dividend"
# [25] "Euro area"
# [26] "High income"
# [27] "Heavily indebted poor countries (HIPC)"
# [28] "IBRD only"
# [29] "IDA total"
# [30] "IDA blend"
# [31] "IDA only"
# [32] "Latin America & Caribbean (excluding high income)"
# [33] "Kosovo"
# [34] "Least developed countries: UN classification"
# [35] "Low income"
# [36] "Lower middle income"
# [37] "Low & middle income"
# [38] "Middle income"
# [39] "Middle East & North Africa (excluding high income)"
# [40] "Upper middle income"
# [41] "North America"
# [42] "Not classified"
# [43] "East Asia & Pacific"
# [44] "Europe & Central Asia"
# [45] "Sub-Saharan Africa (excluding high income)"
# [46] "Sub-Saharan Africa"
# [47] "Latin America & Caribbean"
# [48] "Middle East & North Africa"
# [49] "IDA & IBRD total"
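To make that inspection easier, it can help to look at the flagged codes next to the names. The following is only a sketch on a made-up four-row data frame standing in for the WDI() result; the `warn = FALSE` argument just silences countrycode's message about unmatched values.

```r
library(countrycode)

# Toy stand-in for the WDI() result (rows chosen for illustration only)
wdi.toy <- data.frame(
  iso2c   = c("1A", "AD", "DE", "ZJ"),
  country = c("Arab World", "Andorra", "Germany", "Latin America & Caribbean"),
  stringsAsFactors = FALSE
)

# Codes that are not valid ISO 3166-1 alpha-2 codes come back as NA
is.group <- is.na(countrycode(wdi.toy$iso2c, "iso2c", "iso2c", warn = FALSE))

# Code and name side by side, so false positives are easy to spot
flagged <- unique(wdi.toy[is.group, c("iso2c", "country")])
flagged
```

Anything in `flagged` that is actually a country or territory can then be taken out of the removal list before filtering.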
Excluding such false positives is easy. After inspecting the rm.groups vector, you will probably want to keep these two:
wdi$iso2c[wdi$country == "Kosovo"][1]
# [1] "XK"
wdi$iso2c[wdi$country == "Channel Islands"][1]
# [1] "JG"
Just remove them from rm.groups using %in%:
rm.groups <- rm.groups[!rm.groups %in% c("Kosovo", "Channel Islands")]
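A note on this step: negating the %in% test and the `-which()` idiom give the same result as long as at least one name matches, but they differ when nothing matches. A small base R sketch with a shortened, hypothetical rm.groups vector illustrates the difference:

```r
# Shortened, hypothetical stand-in for the rm.groups vector
rm.groups.toy <- c("Arab World", "World", "Kosovo", "Channel Islands")
keep <- c("Kosovo", "Channel Islands")

# Logical negation drops only the names listed in keep
trimmed <- rm.groups.toy[!rm.groups.toy %in% keep]

# If nothing matches, which() returns integer(0), and indexing with
# -integer(0) selects nothing, so the whole vector is lost
no.match <- "Atlantis"
lost <- rm.groups.toy[-which(rm.groups.toy %in% no.match)]
safe <- rm.groups.toy[!rm.groups.toy %in% no.match]

# setdiff() is a compact alternative when the vector has no duplicates
trimmed2 <- setdiff(rm.groups.toy, keep)
```

Here `lost` is empty while `safe` still holds all four names, which is why the logical form is the safer habit when the exclusion list might not match anything.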
Finally, remove the groups from the wdi data frame by keeping only the rows whose country is not (!) %in% rm.groups.
wdi.nogroups <- wdi[!wdi$country %in% rm.groups, ]
head(wdi.nogroups)
#     iso2c country year SI.POV.NAHC SI.POV.GINI SL.UEM.TOTL.ZS SP.POP.TOTL
# 141    AD Andorra 1991          NA          NA             NA       56671
# 142    AD Andorra 1992          NA          NA             NA       58888
# 143    AD Andorra 1993          NA          NA             NA       60971
# 144    AD Andorra 1994          NA          NA             NA       62677
# 145    AD Andorra 1995          NA          NA             NA       63850
# 146    AD Andorra 1996          NA          NA             NA       64360
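Since the question works with a tibble and pipes, the same filter might be slotted into that pipeline roughly as follows. This is a sketch on a made-up toy tibble, not the real download; the one thing to watch is that rm.groups has to be built while iso2c is still present, so the filter comes before select(-iso2c).

```r
library(dplyr)

# Toy stand-in for the WDI() result; the population values are invented
wdi.toy <- tibble(
  iso2c       = c("1A", "1A", "AD", "AD", "XK"),
  country     = c("Arab World", "Arab World", "Andorra", "Andorra", "Kosovo"),
  year        = c(1991, 1992, 1991, 1992, 1991),
  SP.POP.TOTL = c(100, 110, 10, 11, 20)
)

# Hypothetical group list, as it would come out of the countrycode check
rm.groups.toy <- c("Arab World", "World")

wdi.countries <- wdi.toy %>%
  filter(!country %in% rm.groups.toy) %>%    # drop the aggregate rows first
  select(-iso2c) %>%                         # then drop the code column
  rename(`Total Population` = SP.POP.TOTL)

wdi.countries
```

In the question's code this would go where the "Remove rows command goes here" comment sits, provided the select(-iso2c) line is moved after the countrycode check.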