How to do division of groups in dataframe in R?
I have the following data frame:
df <- tibble(year = c("2020","2020","2020","2021","2021","2021"),
             website = c("google","facebook","twitter","google","facebook","twitter"),
             category = c("big","big","small","big","big","small"),
             value = c(10,20,30,40,50,60))
How can I calculate the change between years?
For example, I want to compare 2021 with 2020. How can I do this in R?
The output should look like this:
year | website | category | comparison |
---|---|---|---|
2021 vs 2020 | google | big | 4 |
2021 vs 2020 | facebook | big | 2.5 |
2021 vs 2020 | twitter | small | 2 |
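where the comparison column is the current year's value divided by the previous year's value.
I am not sure how to do this in dplyr.
I would pivot the year column into two separate columns and then use mutate() to calculate comparison.
Here is an example: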
library(dplyr)
df %>%
  tidyr::pivot_wider(names_from = year) %>%
  mutate(
    comparison = `2021`/`2020`
  )
#> # A tibble: 3 × 5
#> website category `2020` `2021` comparison
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 google big 10 40 4
#> 2 facebook big 20 50 2.5
#> 3 twitter small 30 60 2
Created on 2022-04-04 by the reprex package (v2.0.1)
Afterwards you can use select() to drop the extra columns.
Note: if your column is not called value, then you need to add this argument to pivot_wider(): values_from = <your column name with values>.
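To tie the two notes above together, here is a minimal sketch that passes values_from explicitly, adds the "2021 vs 2020" label, and then uses select() to keep only the columns from the requested output. The name result and the hard-coded label string are illustrative assumptions, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- tibble(
  year     = c("2020", "2020", "2020", "2021", "2021", "2021"),
  website  = c("google", "facebook", "twitter", "google", "facebook", "twitter"),
  category = c("big", "big", "small", "big", "big", "small"),
  value    = c(10, 20, 30, 40, 50, 60)
)

result <- df %>%
  # values_from is spelled out here; it is required if the column is not named "value"
  pivot_wider(names_from = year, values_from = value) %>%
  mutate(
    year       = "2021 vs 2020",   # label is hard-coded for this two-year case (assumption)
    comparison = `2021` / `2020`
  ) %>%
  # keep only the columns shown in the desired output
  select(year, website, category, comparison)

print(result)
```

Hard-coding the label is fine for two known years; with more years, the long-format approach with lag() is likely easier to generalise.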
Update: if you want to keep your data in long format, just use group_by(website) and arrange(year), then call lag() inside mutate():
library(dplyr)
df %>%
  group_by(website) %>%
  arrange(year) %>%
  mutate(
    comparison = value / lag(value)
  )
#> # A tibble: 6 × 5
#> # Groups: website [3]
#> year website category value comparison
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 2020 google big 10 NA
#> 2 2020 facebook big 20 NA
#> 3 2020 twitter small 30 NA
#> 4 2021 google big 40 4
#> 5 2021 facebook big 50 2.5
#> 6 2021 twitter small 60 2
Created on 2022-04-04 by the reprex package (v2.0.1)
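The long-format result keeps an NA row for the first year of each website. As a rough sketch of how it could be turned into the layout asked for in the question, the rows without a previous year can be filtered out and the label built from year and lag(year). The name labelled is hypothetical, and the label format simply mirrors the question's example.

```r
library(dplyr)
library(tibble)

df <- tibble(
  year     = c("2020", "2020", "2020", "2021", "2021", "2021"),
  website  = c("google", "facebook", "twitter", "google", "facebook", "twitter"),
  category = c("big", "big", "small", "big", "big", "small"),
  value    = c(10, 20, 30, 40, 50, 60)
)

labelled <- df %>%
  group_by(website) %>%
  arrange(year) %>%
  mutate(
    comparison = value / lag(value),
    # build "current vs previous"; year still refers to the original column here
    year = paste(year, lag(year), sep = " vs ")
  ) %>%
  ungroup() %>%
  # the first year of each website has no previous value, so comparison is NA
  filter(!is.na(comparison)) %>%
  select(year, website, category, comparison)

print(labelled)
```

Because the label is derived from the data, the same pipeline should also work when more than two years are present, giving one row per consecutive pair of years for each website.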
You could do:
library(tidyverse)
df %>%
  group_by(website) %>%
  arrange(year) %>%
  summarize(year = paste(year[2], year[1], sep = ' vs '),
            category = category[1],
            comparison = value[2] / value[1])
#> A tibble: 3 x 4
#> website year category comparison
#> <chr> <chr> <chr> <dbl>
#> 1 facebook 2021 vs 2020 big 2.5
#> 2 google 2021 vs 2020 big 4
#> 3 twitter 2021 vs 2020 small 2