How to do division of groups in dataframe in R?

I have the following data frame:

library(tibble)

df <- tibble(year = c("2020","2020","2020","2021","2021","2021"),
             website = c("google","facebook","twitter","google","facebook","twitter"), 
             category = c("big","big","small","big","big","small"), 
             value = c(10,20,30,40,50,60))

How can I calculate the change between years?

For example, I want to compare 2021 with 2020. How can I do this in R?

The output should look like this:

year          website   category  comparison
2021 vs 2020  google    big       4
2021 vs 2020  facebook  big       2.5
2021 vs 2020  twitter   small     2

where the comparison column is the current year's value divided by the previous year's value.

I'm not sure how to do this in dplyr.

I would pivot the year column into two separate columns and then use mutate() to calculate comparison.

Here is an example:

library(dplyr)

df %>% 
  tidyr::pivot_wider(names_from = year) %>% 
  mutate(
    comparison = `2021`/`2020`
  )
#> # A tibble: 3 × 5
#>   website  category `2020` `2021` comparison
#>   <chr>    <chr>     <dbl>  <dbl>      <dbl>
#> 1 google   big          10     40        4  
#> 2 facebook big          20     50        2.5
#> 3 twitter  small        30     60        2

Created on 2022-04-04 by the reprex package (v2.0.1)

Afterwards you can use select() to drop the extra columns.
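A minimal sketch of that cleanup step, assuming you want exactly the layout shown in the question. The "2021 vs 2020" label is written by hand here (pivot_wider() does not produce it), and values_from is spelled out only for clarity:

```r
library(dplyr)
library(tidyr)

df <- tibble(
  year = c("2020", "2020", "2020", "2021", "2021", "2021"),
  website = c("google", "facebook", "twitter", "google", "facebook", "twitter"),
  category = c("big", "big", "small", "big", "big", "small"),
  value = c(10, 20, 30, 40, 50, 60)
)

result <- df %>%
  pivot_wider(names_from = year, values_from = value) %>%
  mutate(
    year = "2021 vs 2020",          # hand-written label, as in the question
    comparison = `2021` / `2020`
  ) %>%
  # keep only the requested columns; this drops `2020` and `2021`
  select(year, website, category, comparison)

result
```

Because the label is hard-coded, this only fits the two-year case from the question.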

Note: if your column is not called value, then you need to add this argument to pivot_wider(): values_from = <your column name with values>.
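To illustrate the note, here is a small sketch with a made-up data frame whose value column is called visits. Both df2 and the column name visits are hypothetical and only serve the example:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the numbers live in `visits` instead of `value`
df2 <- tibble(
  year = c("2020", "2021"),
  website = c("google", "google"),
  visits = c(10, 40)
)

wide <- df2 %>%
  # values_from tells pivot_wider() which column fills the new year columns
  pivot_wider(names_from = year, values_from = visits) %>%
  mutate(comparison = `2021` / `2020`)

wide
```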

Update: if you want to keep your data in long format

Just use group_by(website) and arrange(year), then lag() inside mutate():

library(dplyr)

df %>% 
  group_by(website) %>% 
  arrange(year) %>% 
  mutate(
    comparison = value / lag(value)
  )
#> # A tibble: 6 × 5
#> # Groups:   website [3]
#>   year  website  category value comparison
#>   <chr> <chr>    <chr>    <dbl>      <dbl>
#> 1 2020  google   big         10       NA  
#> 2 2020  facebook big         20       NA  
#> 3 2020  twitter  small       30       NA  
#> 4 2021  google   big         40        4  
#> 5 2021  facebook big         50        2.5
#> 6 2021  twitter  small       60        2

Created on 2022-04-04 by the reprex package (v2.0.1)
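If you would rather end up with the table from the question while staying in long format, one possible follow-up is to build the label from lag(year) and drop the first-year rows, where comparison is NA. This is only a sketch: the label construction and the final filter() and select() are my additions, not part of the pipeline above.

```r
library(dplyr)

df <- tibble(
  year = c("2020", "2020", "2020", "2021", "2021", "2021"),
  website = c("google", "facebook", "twitter", "google", "facebook", "twitter"),
  category = c("big", "big", "small", "big", "big", "small"),
  value = c(10, 20, 30, 40, 50, 60)
)

long_result <- df %>%
  group_by(website) %>%
  arrange(year, .by_group = TRUE) %>%   # sort years within each website
  mutate(
    comparison = value / lag(value),
    # label each row with "<this year> vs <previous year>"
    year = paste(year, lag(year), sep = " vs ")
  ) %>%
  ungroup() %>%
  filter(!is.na(comparison)) %>%        # first year of each group has no predecessor
  select(year, website, category, comparison)

long_result
```

Unlike the wide approach, this also works when a website has more than two years, since every row is compared with the one directly before it.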

You could do:

library(tidyverse)

df %>% 
  group_by(website) %>%
  arrange(year) %>%
  summarize(year = paste(year[2], year[1], sep = ' vs '),
            category = category[1],
            comparison = value[2] / value[1]) 
#>  A tibble: 3 x 4
#>   website  year         category comparison
#>   <chr>    <chr>        <chr>         <dbl>
#> 1 facebook 2021 vs 2020 big             2.5
#> 2 google   2021 vs 2020 big             4  
#> 3 twitter  2021 vs 2020 small           2