Counting how often x occurs per y and visualizing it in R

Question

I want to count something in my dataset. I have panel data and would ideally like to count the number of activities per person.

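people <- c(1,1,1,2,2,3,3,4,4,5,5)
activity <- c(1,1,1,2,2,3,4,5,5,6,6)
completion <- c(0,0,1,0,1,1,1,0,0,0,1)
# combine the vectors into the data frame that the answers refer to as df
df <- data.frame(people, activity, completion)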

So my output would tell me that person 4 has 2 tasks.

people 1
frequency activity 2

Do I need to group? Ideally, I would also like to visualize this as a histogram.

I tried this:

## activity per person
cllw %>%
  ## group observations by people
  group_by(id_user) %>%
  ## count activities per person; I am not sure how to create frequencies at all

Answer 1

Like this?

library(dplyr)
df %>% 
  group_by(people) %>% 
  summarise("frequency activity" = n())

# A tibble: 5 x 2
  people `frequency activity`
   <dbl>                <int>
1      1                    3
2      2                    2
3      3                    2
4      4                    2
5      5                    2
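
As a shorter variant of the same idea, dplyr's count() does the grouping and tallying in one step. This is only a sketch: it rebuilds df from the vectors in the question, and the name argument is used to reproduce the column name from the output above.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  people     = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  activity   = c(1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 6),
  completion = c(0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1)
)

# count() is shorthand for group_by(people) followed by summarise(n())
per_person <- df %>%
  count(people, name = "frequency activity")

per_person
```

Both forms count rows per person, so repeated rows for the same activity are counted each time.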

If you only want the "active" tasks (rows where completion is not 1), you can do it like this:

df %>% 
  filter(completion != 1) %>% 
  group_by(people) %>% 
  summarise("frequency activity" = n())

# A tibble: 4 x 2
  people `frequency activity`
   <dbl>                <int>
1      1                    2
2      2                    1
3      4                    2
4      5                    1

Edit: to count the unique active tasks per person:

df %>% 
  filter(completion != 1) %>% 
  distinct(people, activity) %>% 
  group_by(people) %>%
  summarise("frequency activity" = n())

# A tibble: 4 x 2
  people `frequency activity`
   <dbl>                <int>
1      1                    1
2      2                    1
3      4                    1
4      5                    1
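
The question also asks for a plot. One possible sketch, assuming ggplot2 is installed (it is not used anywhere else in this thread), and using a hypothetical column name n_activities for the per-person count: a bar chart shows the count for each person, while a histogram shows how many people have a given number of activities.

```r
library(dplyr)
library(ggplot2)

# Rebuild the example data from the question
df <- data.frame(
  people     = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  activity   = c(1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 6),
  completion = c(0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1)
)

# Number of rows (activities) per person; n_activities is an illustrative name
per_person <- df %>%
  count(people, name = "n_activities")

# Bar chart: one bar per person, height = that person's activity count
p_bar <- ggplot(per_person, aes(x = factor(people), y = n_activities)) +
  geom_col() +
  labs(x = "person", y = "number of activities")

# Histogram: distribution of activity counts across people
p_hist <- ggplot(per_person, aes(x = n_activities)) +
  geom_histogram(binwidth = 1) +
  labs(x = "activities per person", y = "number of people")

print(p_bar)
print(p_hist)
```

With only five people the histogram is not very informative, but on a real panel it is probably the plot that matches "histogram" in the question, since it summarizes the distribution rather than listing every person.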

Answer 2

@Dominik.S.Meier

I have a similar question: what should I do if I want to remove from my df everyone who never completed any task?

I tried this code:

never_completed <- df %>% 
  filter(completion != 0) %>% 
  group_by(people) %>% 
  summarise("frequency activity" = n())

df <- -c(df$nevercompleted)
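
One way this could be approached, as a sketch rather than a definitive answer: group by person and keep a group only if any of its rows has completion == 1. The result name df_completed is made up for illustration, and df is rebuilt from the vectors in the original question.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  people     = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  activity   = c(1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 6),
  completion = c(0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1)
)

# Keep all rows of every person who has at least one completed task;
# people whose completion values are all 0 are dropped entirely
df_completed <- df %>%
  group_by(people) %>%
  filter(any(completion == 1)) %>%
  ungroup()

df_completed
```

In the example data only person 4 has no completed task, so only that person's two rows are dropped and the other nine rows remain.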

