How to group or nest by a subset of the data in R?
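Problem: How can I keep the variation in sizes in geom_point() limited to each respective month, based on a subset of the data?
I am new to R and have created a facet plot, by month, of the top 5 countries with the highest COVID case counts in each month. Currently the size varies with Cases_count across all months, but I want it to vary relative to each respective month only.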
- Below is the code for this:
df_stack %>%
  mutate(month = lubridate::month(Date, label = TRUE, abbr = TRUE)) %>%
  filter(Cases_type == "Confirmed") %>%
  group_by(month, Country.Region) %>%
  summarise(Cases_count = sum(Cases_count, na.rm = TRUE)) %>%
  top_n(n = 5, wt = Cases_count) %>%
  ungroup() %>%
  # Adding continents to data
  left_join(y = df_stack %>%
              select(Country.Region, continent) %>%
              unique(),
            by = "Country.Region") %>%
  ggplot(aes(x = continent, y = Country.Region)) +
  geom_point(shape = 21, aes(size = Cases_count, color = as.factor(continent)), fill = "white", stroke = 3) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90),
        legend.position = "none") +
  facet_wrap(~month) +
  labs(title = "Top 5 confirmed cases Countries across the year")
- Code to view the data:
df_stack %>%
  mutate(month = lubridate::month(Date, label = TRUE, abbr = TRUE)) %>%
  filter(Cases_type == "Confirmed") %>%
  group_by(month, Country.Region) %>%
  summarise(Cases_count = sum(Cases_count, na.rm = TRUE)) %>%
  top_n(n = 5, wt = Cases_count) %>%
  ungroup() %>%
  # Adding continents to data
  left_join(y = df_stack %>%
              select(Country.Region, continent) %>%
              unique(),
            by = "Country.Region")
######## output ########
# A tibble: 60 x 4
   month Country.Region   Cases_count continent
   <ord> <chr>                  <dbl> <fct>
 1 Jan   China                  38008 Asia
 2 Jan   Japan                     56 Asia
 3 Jan   Singapore                 53 Asia
 4 Jan   Taiwan*                   52 Asia
 5 Jan   Thailand                  94 Asia
 6 Feb   China                1633361 Asia
 7 Feb   Diamond Princess       10076 Unknown
 8 Feb   Italy                   3966 Europe
 9 Feb   Japan                   2418 Asia
10 Feb   Korea, South           12128 Asia
# ... with 50 more rows
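Desired output / problem: I want the size variation in geom_point() to be based on Cases_count within each month only, and not on Cases_count across the whole data frame.
Attempt
I have tried using group_by(month), as shown in the code below, but that did not help either.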
df_stack %>%
  mutate(month = lubridate::month(Date, label = TRUE, abbr = TRUE)) %>%
  filter(Cases_type == "Confirmed") %>%
  group_by(month, Country.Region) %>%
  summarise(Cases_count = sum(Cases_count, na.rm = TRUE)) %>%
  top_n(n = 5, wt = Cases_count) %>%
  ungroup() %>%
  # Adding continents to data
  left_join(y = df_stack %>%
              select(Country.Region, continent) %>%
              unique(),
            by = "Country.Region") %>%
  # Grouping by month to keep geom_point size variation to month only
  group_by(as.factor(month)) %>%
  ggplot(aes(x = continent, y = Country.Region)) +
  geom_point(shape = 21, aes(size = Cases_count, color = as.factor(continent)), fill = "white", stroke = 3) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90),
        legend.position = "none") +
  facet_wrap(~month) +
  labs(title = "Top 5 confirmed cases Countries across the year")
Update: link to the code for recreating the df_stack data frame from scratch:
https://github.com/johnsnow09/covid19-df_stack-code/blob/main/df_stack_for_Whosebug.txt
The data is taken from the various COVID R libraries used in that code.
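I think the easiest approach is to regroup by month and then use mutate to create a new variable specifically for determining size. Then set the ggplot size = aesthetic to that new variable.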
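As a minimal sketch of that regrouping step, the per-month scaling can be checked on a small hand-made data frame before plotting. The toy data frame and its name are made up for illustration (the counts are copied from the output shown in the question), and dividing by the monthly sum is only one possible choice of scaling:

```r
library(dplyr)

# Hypothetical toy data standing in for the summarised top-5 table
toy <- data.frame(
  month          = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb"),
  Country.Region = c("China", "Thailand", "Japan",
                     "China", "Korea, South", "Diamond Princess"),
  Cases_count    = c(38008, 94, 56, 1633361, 12128, 10076)
)

toy_sized <- toy %>%
  group_by(month) %>%                                      # one group per facet
  mutate(Cases_size = Cases_count / sum(Cases_count)) %>%  # share within the month
  ungroup()

# Check: the shares within each month add up to 1
toy_sized %>%
  group_by(month) %>%
  summarise(total_share = sum(Cases_size))
```

With this scaling, a point's size reflects the country's share of that month's total among the listed countries rather than its absolute count.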
Also note that you can simply keep the continent in the call to summarise, with no need for a left_join:
library(tidyverse)
df_stack <- read.csv(url("https://raw.githubusercontent.com/ianmcampbell/RandomWhosebug/main/df_stack.csv"))
df_stack %>%
  mutate(month = lubridate::month(Date, label = TRUE, abbr = TRUE)) %>%
  filter(Cases_type == "Confirmed") %>%
  group_by(month, Country.Region) %>%
  dplyr::summarise(Cases_count = sum(Cases_count, na.rm = TRUE),
                   continent = first(continent)) %>%
  top_n(n = 5, wt = Cases_count) %>%
  group_by(month) %>%
  mutate(Cases_size = Cases_count / sum(Cases_count)) %>%
  ggplot(aes(x = continent, y = Country.Region)) +
  geom_point(shape = 21, aes(size = Cases_size, color = as.factor(continent)), fill = "white", stroke = 3) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90),
        legend.position = "none") +
  facet_wrap(~month) +
  coord_cartesian(clip = "off") +
  labs(title = "Countries with the greatest number of confirmed cases by month")
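As for why the group_by() attempt in the question changes nothing: as far as I can tell, ggplot2 trains a single size scale on all the rows it receives and pays no attention to dplyr grouping, while the scales argument of facet_wrap() only frees the position scales. A small sketch with made-up data (the toy data frame, the column country, and the plot names are hypothetical) that compares the computed point sizes with and without grouping:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: two months with very different magnitudes
toy <- data.frame(
  month       = c("Jan", "Jan", "Feb", "Feb"),
  country     = c("A", "B", "A", "B"),
  Cases_count = c(50, 100, 5000, 10000)
)

# Plot built from the ungrouped data
p_plain <- ggplot(toy, aes(x = country, y = month)) +
  geom_point(aes(size = Cases_count))

# Same plot, but the data is grouped by month first
p_grouped <- toy %>%
  group_by(month) %>%
  ggplot(aes(x = country, y = month)) +
  geom_point(aes(size = Cases_count))

# layer_data() returns the layer's data after the scales have been applied;
# comparing the size columns shows whether grouping had any effect
identical(layer_data(p_plain)$size, layer_data(p_grouped)$size)
```

This is why the size variable itself has to be rescaled within each month, as in the mutate step above, rather than relying on grouping alone.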