Unable to create India choropleth in R

I want to create a choropleth of India in R.

My first step was to download a shapefile from https://github.com/datameet/maps/tree/master/States and read it into R:

library(ggplot2)  # provides fortify() and ggplot()
shape <- rgdal::readOGR(dsn="/Data/Admin2.shp")
states <- fortify(shape, region = "ST_NM")

Next, I have a dataset of states and their populations, states_data:

structure(list(Name = c("JAMMU & KASHMIR", "HIMACHAL PRADESH", 
"UTTARAKHAND", "RAJASTHAN", "UTTAR PRADESH", "BIHAR", "SIKKIM", 
"ARUNACHAL PRADESH", "NAGALAND", "MANIPUR", "MIZORAM", "TRIPURA", 
"MEGHALAYA", "ASSAM", "WEST BENGAL", "JHARKHAND", "ODISHA", "CHHATTISGARH", 
"MADHYA PRADESH", "GUJARAT", "DAMAN & DIU", "DADRA & NAGAR HAVELI", 
"MAHARASHTRA", "ANDHRA PRADESH", "KARNATAKA", "GOA", "LAKSHADWEEP", 
"KERALA", "TAMIL NADU", "ANDAMAN & NICOBAR ISLANDS"), TOT_P = c(1493299, 
392126, 291903, 9238534, 1134273, 1336573, 206360, 951821, 1710973, 
1167422, 1036115, 1166813, 2555861, 3884371, 5296953, 8645042, 
9590756, 7822902, 15316784, 8917174, 15363, 178564, 10510213, 
5918073, 4248987, 149275, 61120, 484839, 794697, 28530)), row.names = c(NA, 
-30L), class = c("tbl_df", "tbl", "data.frame"))

I merge the two datasets on the state name:

final_data <- merge(states,states_data, by.y="Name", by.x="id")

Finally, I plot the result with ggplot:

ggplot()+
  geom_polygon(data=final_data,
               aes(x= long, y=lat, group=id, fill=TOT_P), color='black',size=0.25)+
  coord_map()

I get the following plot:

Can anyone tell me where I went wrong? I appreciate your help.

Thanks!

The state name strings in your two datasets are not identical.

If you look at the unique values, you will see that the shapefile uses title case:

> unique(states$id)

[1] "Andaman & Nicobar Island" "Andhra Pradesh"           "Arunanchal Pradesh"       "Assam"                   
[5] "Bihar"                    "Chandigarh"               "Chhattisgarh"             "Dadara & Nagar Havelli"  
[9] "Daman & Diu"              "Goa"                      "Gujarat"                  "Haryana"                 
[13] "Himachal Pradesh"         "Jammu & Kashmir"          "Jharkhand"                "Karnataka"               
[17] "Kerala"                   "Lakshadweep"              "Madhya Pradesh"           "Maharashtra"             
[21] "Manipur"                  "Meghalaya"                "Mizoram"                  "Nagaland"                
[25] "NCT of Delhi"             "Odisha"                   "Puducherry"               "Punjab"                  
[29] "Rajasthan"                "Sikkim"                   "Tamil Nadu"               "Telangana"               
[33] "Tripura"                  "Uttar Pradesh"            "Uttarakhand"              "West Bengal"

whereas your population data frame uses all caps:

> unique(states_data$Name)
[1] "JAMMU & KASHMIR"           "HIMACHAL PRADESH"          "UTTARAKHAND"               "RAJASTHAN"                
[5] "UTTAR PRADESH"             "BIHAR"                     "SIKKIM"                    "ARUNACHAL PRADESH"        
[9] "NAGALAND"                  "MANIPUR"                   "MIZORAM"                   "TRIPURA"                  
[13] "MEGHALAYA"                 "ASSAM"                     "WEST BENGAL"               "JHARKHAND"                
[17] "ODISHA"                    "CHHATTISGARH"              "MADHYA PRADESH"            "GUJARAT"                  
[21] "DAMAN & DIU"               "DADRA & NAGAR HAVELI"      "MAHARASHTRA"               "ANDHRA PRADESH"           
[25] "KARNATAKA"                 "GOA"                       "LAKSHADWEEP"               "KERALA"                   
[29] "TAMIL NADU"                "ANDAMAN & NICOBAR ISLANDS"

That is why your merged dataset final_data is empty.
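To make the mismatch concrete, here is a minimal sketch using two toy data frames. The three rows are a small subset chosen for illustration, and names such as shape_ids and pop are made up for this example. It only relies on base R's merge(), which performs an inner join with exact string comparison by default.

```r
# Toy stand-ins for the fortified shapefile and the population table
# (an illustrative subset, not the full data).
shape_ids <- data.frame(id = c("Bihar", "Goa", "Kerala"),
                        stringsAsFactors = FALSE)
pop <- data.frame(Name = c("BIHAR", "GOA", "KERALA"),
                  TOT_P = c(1336573, 149275, 484839),
                  stringsAsFactors = FALSE)

# merge() keeps only rows whose keys match exactly, so "Bihar" and
# "BIHAR" are treated as different keys and nothing survives the join.
merged_raw <- merge(shape_ids, pop, by.x = "id", by.y = "Name")
n_raw <- nrow(merged_raw)

# Putting both key columns in the same case lets the rows line up.
shape_ids$id <- tolower(shape_ids$id)
pop$Name <- tolower(pop$Name)
merged_fixed <- merge(shape_ids, pop, by.x = "id", by.y = "Name")
n_fixed <- nrow(merged_fixed)
```

Comparing n_raw with n_fixed shows the effect of the case difference alone, before any spelling differences come into play.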

One possible fix is to convert the names in both datasets to lower case before merging:

states$id <- stringr::str_to_lower(states$id)
states_data$Name <- stringr::str_to_lower(states_data$Name)

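However, a few rows still do not match, either because of typos or different spellings, or simply because the data is missing. You can inspect them with

setdiff(unique(states$id), unique(states_data$Name))

and adjust the spellings where possible.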
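One way to adjust the spellings is a small lookup vector, sketched below. The three pairs come from comparing the two unique() listings above after lower-casing. The helper fix_spelling() is a hypothetical name introduced here, not part of any package.

```r
# Lower-cased shapefile spellings (left) mapped to the lower-cased
# spellings used in states_data (right).
spelling_fix <- c(
  "andaman & nicobar island" = "andaman & nicobar islands",
  "arunanchal pradesh"       = "arunachal pradesh",
  "dadara & nagar havelli"   = "dadra & nagar haveli"
)

# Hypothetical helper: replace known variants, leave other names untouched.
fix_spelling <- function(x, lookup) {
  hit <- x %in% names(lookup)
  x[hit] <- unname(lookup[x[hit]])
  x
}

# Small demonstration on three names.
ids <- c("andaman & nicobar island", "assam", "arunanchal pradesh")
fixed_ids <- fix_spelling(ids, spelling_fix)
```

Applied to the real data, this would be something like states$id <- fix_spelling(states$id, spelling_fix) after the lower-casing step. Names that appear only in the shapefile listing (Chandigarh, Haryana, NCT of Delhi, Puducherry, Punjab, Telangana) have no population row in states_data, so no respelling will match them.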

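Finally, in my quick test the fortified polygons did not plot well, although that may be entirely specific to my combination of rgeos, rgdal and ggplot2. If you plan to work with spatial data more broadly, though, I would recommend the sf package. It makes handling spatial data very convenient (see its comprehensive documentation) and lets you plot with ggplot2 simply by using geom_sf():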
library(tidyverse)
library(sf)
# read shape and convert state names to lower case 
states <- st_read("./Data/Admin2.shp") %>%
  mutate(Name = str_to_lower(ST_NM))
# merge spatial data with population data, also convert state names to lower case in the latter
states_population <- states %>%
  left_join(states_data %>% mutate(Name = str_to_lower(Name)), by = "Name")
# grey states are the result of unmatched states outlined above
ggplot(states_population, aes(fill = TOT_P)) +
  geom_sf() +
  scale_fill_viridis_c() +
  ggthemes::theme_map()
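
The grey states follow from how a left join treats unmatched keys: every shapefile row is kept, and rows without a partner get NA in TOT_P, which ggplot2's continuous fill scales draw in grey by default. Below is a minimal base R sketch of that behaviour, using merge(all.x = TRUE) as a stand-in for left_join() and a toy subset of names (shape_names and pop are illustrative objects, not the real data).

```r
# Toy stand-ins: three shapefile names, only two of which have a
# population row (Punjab is in the shapefile but not in states_data).
shape_names <- data.frame(Name = c("goa", "punjab", "kerala"),
                          stringsAsFactors = FALSE)
pop <- data.frame(Name = c("goa", "kerala"),
                  TOT_P = c(149275, 484839),
                  stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of the left table, like left_join().
joined <- merge(shape_names, pop, by = "Name", all.x = TRUE)

# Names without a population row end up with NA in TOT_P.
unmatched <- joined$Name[is.na(joined$TOT_P)]
```

Running the same is.na() check on states_population$TOT_P should list the states that will be drawn in grey.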