some sf polygons missing when plotted in ggplot2

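I'm trying to plot US county-level data, but I can't figure out why some counties don't show up. In this toy example I focus only on California counties, and I keep all of the daily data until I filter it in the call to ggplot() (my actual use case involves gganimate, so I need the daily data).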

library(tidyverse)
library(sf)
library(viridis)
library("rio")

# get county geometry
  url <- "https://gist.githubusercontent.com/ericpgreen/717596c37478ef894c14b250477fae92/raw/c2cf4b273a2c7f0677f22a37b5e9f7e893204e3b/cali.R"
  cali <- rio::import(url)

# get covid data
  covid <- read.csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv", 
                    stringsAsFactors = FALSE)

# prep covid data
  covidPrepped <-
  covid %>%
    filter(state=="California") %>%
    select(date, fips, cases, deaths) %>%
    mutate(date = lubridate::ymd(date)) %>%
    mutate(fips = stringr::str_pad(fips, width=5, pad="0")) %>%
    mutate(month = lubridate::month(date, 
                                    label=TRUE, 
                                    abbr=TRUE),
           day = lubridate::day(date),
           monthDay = paste(month, day, sep=" "))

# make sure every county has a row for every day
  complete <- 
  cali %>%
    left_join(covidPrepped, by = c("GEOID" = "fips")) %>%
    complete(date, GEOID, fill = list(cases = 0)) %>%
    select(date, GEOID, cases, monthDay)

# join back to geometry and construct casesPop
  pData <- 
  complete %>%
    left_join(select(cali, GEOID, NAME, estimate, geometry),
              by = "GEOID") %>%
    st_as_sf() %>%
    mutate(casesPop = (cases/estimate)*100000) %>%
    mutate(casesPop = ifelse(is.na(casesPop), 0, casesPop)) %>%
    mutate(group = cut(casesPop, 
                       breaks = c(0, 1, 3, 10, 30, 100, 
                                  300, 1000, 3000, 10000, 
                                  Inf),
                       labels = c(0, 1, 3, 10, 30, 100, 
                                  300, 1000, 3000, 10000),
                       include.lowest = TRUE)
    ) %>%
    select(GEOID, geometry, group, monthDay) 

# plot
  ggplot(pData %>% filter(monthDay=="May 5")) +
    geom_sf(aes(fill = group), color = "white", size=.1) +
    scale_fill_viridis_d(option = "magma", drop=FALSE) +
    coord_sf(crs = 102003) +
    theme_minimal() + 
    theme(legend.position = "top",
          legend.box = "horizontal",
          legend.title = element_blank(),
          legend.justification='left') +
    guides(fill = guide_legend(nrow = 1))

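Missing counties:

# rows that survive the May 5 filter and therefore get plotted
plotted <- pData %>% filter(monthDay=="May 5")
# counties present in the geometry but absent from the plotted rows
cali$GEOID[!(cali$GEOID %in% plotted$GEOID)]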
#[1] "06035" "06049" "06091" "06105"

These counties have no COVID data for May 5, but I thought the call to complete() would take care of that:

complete(date, GEOID, fill = list(cases = 0))

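I realized that complete() left holes in monthDay, which I use to filter in a later step. The rows that complete() adds have NA in monthDay, so they fail the monthDay == "May 5" test in filter() and never reach the plot.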

complete(date, GEOID, fill = list(cases = 0)) %>%
select(date, GEOID, cases, monthDay)
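
To see the behavior in isolation, here is a minimal sketch on made-up data. The `toy` tibble and its values are hypothetical and only stand in for the joined county data; county "B" plays the role of a county with no row for May 5.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: county "B" has no row for May 5
toy <- tibble(
  GEOID    = c("A", "A", "B"),
  date     = as.Date(c("2020-05-04", "2020-05-05", "2020-05-04")),
  cases    = c(1, 2, 3),
  monthDay = c("May 4", "May 5", "May 4")
)

# complete() adds the missing B / May 5 row, but only `cases` has a fill value
completed <- toy %>%
  complete(date, GEOID, fill = list(cases = 0))

# The added row gets cases = 0 and monthDay = NA
new_row <- completed %>%
  filter(GEOID == "B", date == as.Date("2020-05-05"))

# filter() keeps only rows where the condition is TRUE, so the NA row is dropped
kept <- completed %>%
  filter(monthDay == "May 5")
```

In this sketch `kept` contains only county "A", which mirrors the four counties disappearing from the map.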

So I restructured things a bit so that monthDay is created after the completed data is joined back to the geometry:

library(tidyverse)
library(sf)
library(viridis)
library("rio")

# get county geometry
url <- "https://gist.githubusercontent.com/ericpgreen/717596c37478ef894c14b250477fae92/raw/c2cf4b273a2c7f0677f22a37b5e9f7e893204e3b/cali.R"
cali <- rio::import(url)

# get covid data
covid <- read.csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv", 
                  stringsAsFactors = FALSE)

# prep covid data
covidPrepped <-
  covid %>%
  filter(state=="California") %>%
  select(date, fips, cases, deaths) %>%
  mutate(date = lubridate::ymd(date)) %>%
  mutate(fips = stringr::str_pad(fips, width=5, pad="0")) 

# make sure every county has a row for every day
complete <- 
  cali %>%
  left_join(covidPrepped, by = c("GEOID" = "fips")) %>%
  complete(GEOID, date, fill = list(cases = 0)) %>%
  select(date, GEOID, cases)

# join back to geometry and construct casesPop
pData <- 
  complete %>%
  left_join(select(cali, GEOID, NAME, estimate, geometry),
            by = "GEOID") %>%
  st_as_sf() %>%
  mutate(casesPop = (cases/estimate)*100000) %>%
  mutate(casesPop = ifelse(is.na(casesPop), 0, casesPop)) %>%
  mutate(group = cut(casesPop, 
                     breaks = c(0, 1, 3, 10, 30, 100, 
                                300, 1000, 3000, 10000, 
                                Inf),
                     labels = c(0, 1, 3, 10, 30, 100, 
                                300, 1000, 3000, 10000),
                     include.lowest = TRUE)
  ) %>%
  mutate(month = lubridate::month(date, 
                                  label=TRUE, 
                                  abbr=TRUE),
         day = lubridate::day(date),
         monthDay = paste(month, day, sep=" ")) %>%
  select(GEOID, geometry, group, monthDay) 

# plot
ggplot(pData %>% filter(monthDay=="May 5")) +
  geom_sf(aes(fill = group), color = "white", size=.1) +
  scale_fill_viridis_d(option = "magma", drop=FALSE) +
  coord_sf(crs = 102003) +
  theme_minimal() + 
  theme(legend.position = "top",
        legend.box = "horizontal",
        legend.title = element_blank(),
        legend.justification='left') +
  guides(fill = guide_legend(nrow = 1))
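
As a quick sanity check of the same idea, the sketch below uses a hypothetical `toy` tibble (not the real county data) to confirm that every ID is present for the chosen day once the filter runs on a column that complete() itself expands. Filtering directly on `date` is shown here only as one possible variation. It assumes the year is known, which the monthDay label does not require.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: county "B" has no row for May 5
toy <- tibble(
  GEOID = c("A", "A", "B"),
  date  = as.Date(c("2020-05-04", "2020-05-05", "2020-05-04")),
  cases = c(1, 2, 3)
)

# Expand to every GEOID / date combination; new rows get cases = 0
completed <- toy %>%
  complete(GEOID, date, fill = list(cases = 0))

# `date` is one of the expanded columns, so it is never NA in the result
may5 <- completed %>%
  filter(date == as.Date("2020-05-05"))

# IDs that would be missing from the plot; empty when the fix works
not_plotted <- setdiff(unique(completed$GEOID), may5$GEOID)
```

An empty `not_plotted` is the toy equivalent of the earlier missing-county check returning no GEOIDs.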