How to group or nest to a subset of the data in R?

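  1. Problem: how can I keep the size variation in geom_point() limited to each respective month, based on a subset of the data?

  2. I am new to R and have created a facet plot, faceted by month, showing the top 5 countries by number of COVID cases in each month. Right now the point size varies according to Cases_count across all months, but I want it to be based on the respective month only.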

  1. Below is the code for the plot:
df_stack %>% 
  mutate(month = lubridate::month(Date, label = TRUE, abbr = TRUE)) %>%
  filter(Cases_type == "Confirmed") %>% 
  group_by(month, Country.Region) %>% 
  summarise(Cases_count = sum(Cases_count, na.rm = TRUE)) %>% 
  top_n(n = 5, wt = Cases_count) %>% 
  ungroup() %>% 
  
  # Adding continents to data
  left_join(y = df_stack %>% 
              select(Country.Region, continent) %>% 
              unique(),
            by = "Country.Region") %>% 
  
  ggplot(aes(x = continent, y = Country.Region)) +
  geom_point(shape = 21, aes(size = Cases_count, color = as.factor(continent)), fill = "white", stroke = 3) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90),
        legend.position = "none") +
  facet_wrap(~month) +
  labs(title = "Top 5 confirmed cases Countries across the year")
  1. Code to view the data:
df_stack %>% 
  mutate(month = lubridate::month(Date, label = TRUE, abbr = TRUE)) %>%
  filter(Cases_type == "Confirmed") %>% 
  group_by(month, Country.Region) %>% 
  summarise(Cases_count = sum(Cases_count, na.rm = TRUE)) %>% 
  top_n(n = 5, wt = Cases_count) %>% 
  ungroup() %>% 
  
  # Adding continents to data
  left_join(y = df_stack %>% 
              select(Country.Region, continent) %>% 
              unique(),
            by = "Country.Region")

######## output ########

# A tibble: 60 x 4
   month Country.Region   Cases_count continent
   <ord> <chr>                  <dbl> <fct>    
 1 Jan   China                  38008 Asia     
 2 Jan   Japan                     56 Asia     
 3 Jan   Singapore                 53 Asia     
 4 Jan   Taiwan*                   52 Asia     
 5 Jan   Thailand                  94 Asia     
 6 Feb   China                1633361 Asia     
 7 Feb   Diamond Princess       10076 Unknown  
 8 Feb   Italy                   3966 Europe   
 9 Feb   Japan                   2418 Asia     
10 Feb   Korea, South           12128 Asia     
# ... with 50 more rows
  1. Desired output / issue: I want the geom_point() size variation to be based on Cases_count grouped within each month only, not on Cases_count across the whole data frame.
  2. What I tried:

I tried using group_by(month), as shown in the code below, but that did not help either.

df_stack %>% 
  mutate(month = lubridate::month(Date, label = TRUE, abbr = TRUE)) %>%
  filter(Cases_type == "Confirmed") %>% 
  group_by(month, Country.Region) %>% 
  summarise(Cases_count = sum(Cases_count, na.rm = TRUE)) %>% 
  top_n(n = 5, wt = Cases_count) %>% 
  ungroup() %>%

  # Adding continents to data  
  left_join(y = df_stack %>% 
              select(Country.Region, continent) %>% 
              unique(),
            by = "Country.Region") %>% 
  
  # Grouping by month to keep geom_point size variation to month only
  group_by(as.factor(month)) %>% 
  
  ggplot(aes(x = continent, y = Country.Region)) +
  geom_point(shape = 21, aes(size = Cases_count, color = as.factor(continent)), fill = "white", stroke = 3) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90),
        legend.position = "none") +
  facet_wrap(~month) +
  labs(title = "Top 5 confirmed cases Countries across the year")

Update: link to the code for recreating the df_stack data frame from scratch:

https://github.com/johnsnow09/covid19-df_stack-code/blob/main/df_stack_for_Whosebug.txt

The data was sourced from the various COVID R libraries used in that code.

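I think the simplest approach is to regroup by month and then use mutate to create a new variable that exists specifically to determine the size. Then set the ggplot size = aesthetic to that new variable.

Also note that you can simply keep the continent in the call to summarise, with no need for left_join: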

library(tidyverse)
df_stack <- read.csv(url("https://raw.githubusercontent.com/ianmcampbell/RandomWhosebug/main/df_stack.csv"))
df_stack %>% 
  mutate(month = lubridate::month(Date, label = TRUE, abbr = TRUE)) %>%
  filter(Cases_type == "Confirmed") %>% 
  group_by(month, Country.Region) %>% 
  dplyr::summarise(Cases_count = sum(Cases_count, na.rm = TRUE),
            continent = first(continent)) %>% 
  top_n(n = 5, wt = Cases_count) %>%
  group_by(month) %>%
  mutate(Cases_size = Cases_count / sum(Cases_count)) %>%
  ggplot(aes(x = continent, y = Country.Region)) +
  geom_point(shape = 21, aes(size = Cases_size, color = as.factor(continent)), fill = "white", stroke = 3) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90),
        legend.position = "none") +
  facet_wrap(~month) +
  coord_cartesian(clip = "off") +
  labs(title = "Countries with the greatest number of confirmed cases by month")
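
To see what the regrouping step does on its own, here is a minimal sketch that applies the same group_by(month) and mutate() to the ten rows printed in the question. The object names top5 and sized are made up for illustration, and the data is copied by hand from that printed output rather than loaded from df_stack.

```r
library(dplyr)

# Toy data: the first ten rows of the output shown in the question,
# typed in by hand (top5 is a hypothetical name, not part of df_stack).
top5 <- tibble::tribble(
  ~month, ~Country.Region,    ~Cases_count,
  "Jan",  "China",              38008,
  "Jan",  "Japan",                 56,
  "Jan",  "Singapore",             53,
  "Jan",  "Taiwan*",               52,
  "Jan",  "Thailand",              94,
  "Feb",  "China",            1633361,
  "Feb",  "Diamond Princess",   10076,
  "Feb",  "Italy",               3966,
  "Feb",  "Japan",               2418,
  "Feb",  "Korea, South",       12128
)

# Same step as in the answer: each country's share of its own month's total.
sized <- top5 %>%
  group_by(month) %>%
  mutate(Cases_size = Cases_count / sum(Cases_count)) %>%
  ungroup()

# Within every month the shares add up to 1, so January and February
# now live on the same 0 to 1 range despite very different raw counts.
sized %>%
  group_by(month) %>%
  summarise(total_share = sum(Cases_size),
            max_share   = max(Cases_size))
```

The size scale in ggplot2 is still shared across all facets. The approach works because Cases_size is already normalised within each month before it reaches the plot, not because each facet gets a scale of its own. Dividing by max(Cases_count) instead of sum(Cases_count) would be an equally reasonable choice if the largest point in every month should have the same size.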