Find the number of occurrences of a specific value where it exceeds a specific frequency in R

Question

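I am trying to get the frequency of a value per column, keeping only the columns where that frequency exceeds a certain number. My data has multiple columns, and I want code that reports how often "0" occurs in each column, but only for columns where "0" occurs more than 3 times.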

My dataset looks like this:

```
a   b   c   d   e   f   g   h 
0   1   0   1   1   1   1   1
2   0   0   0   0   0   0   0
0   1   2   2   2   1   0   1
0   0   0   0   1   0   0   0
1   0   2   1   1   0   0   0
1   1   0   0   1   0   0   0
0   1   2   2   2   2   2   2
```

The output I need is:
```
Variable     Frequency
a            4 
c            4 
f            4
g            5
h            4
```

So this shows the number of "0" values in each column of the data frame, listing only the columns where that number is greater than 3.

Thank you.

Answer 1

You can use `colSums` to count the number of 0s in each column, then subset the counts that are greater than 3.

```
subset(stack(colSums(df == 0, na.rm = TRUE)), values > 3)
```
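To see what that one-liner does, it can be unpacked into its intermediate steps. This is only an illustrative sketch: the `df` below rebuilds the question's data so the example runs on its own, and the names `is_zero`, `zero_counts`, and `result` are made up here rather than taken from the answer.

```r
# The question's data, rebuilt so the example is self-contained
df <- data.frame(
  a = c(0, 2, 0, 0, 1, 1, 0),
  b = c(1, 0, 1, 0, 0, 1, 1),
  c = c(0, 0, 2, 0, 2, 0, 2),
  d = c(1, 0, 2, 0, 1, 0, 2),
  e = c(1, 0, 2, 1, 1, 1, 2),
  f = c(1, 0, 1, 0, 0, 0, 2),
  g = c(1, 0, 0, 0, 0, 0, 2),
  h = c(1, 0, 1, 0, 0, 0, 2)
)

# Step 1: logical matrix, TRUE wherever a cell equals 0
is_zero <- df == 0

# Step 2: TRUE counts as 1, so the column sums are the zeros per column
zero_counts <- colSums(is_zero, na.rm = TRUE)

# Step 3: stack() turns the named vector into a data frame
#         with columns `values` (the counts) and `ind` (the column names)
# Step 4: subset() keeps only the rows whose count exceeds 3
result <- subset(stack(zero_counts), values > 3)
result
```

With the question's data, `result` keeps the rows for `a`, `c`, `f`, `g`, and `h`, matching the requested output.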

A tidyverse approach would be:

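```
library(dplyr)

df %>%
  # count the zeros in every column; naming everything() explicitly
  # avoids the deprecated implicit `.cols` default in newer dplyr
  summarise(across(everything(), ~sum(. == 0, na.rm = TRUE))) %>%
  # reshape to one row per column: `name` and `value`
  tidyr::pivot_longer(cols = everything()) %>%
  # keep only the columns with more than 3 zeros
  filter(value > 3)

#  name  value
#  <chr> <int>
#1 a         4
#2 c         4
#3 f         4
#4 g         5
#5 h         4
```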

Data

```
df <- structure(list(a = c(0L, 2L, 0L, 0L, 1L, 1L, 0L), b = c(1L, 0L, 
1L, 0L, 0L, 1L, 1L), c = c(0L, 0L, 2L, 0L, 2L, 0L, 2L), d = c(1L, 
0L, 2L, 0L, 1L, 0L, 2L), e = c(1L, 0L, 2L, 1L, 1L, 1L, 2L), f = c(1L, 
0L, 1L, 0L, 0L, 0L, 2L), g = c(1L, 0L, 0L, 0L, 0L, 0L, 2L), h = c(1L, 
0L, 1L, 0L, 0L, 0L, 2L)), class = "data.frame", row.names = c(NA, -7L))
```
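Since the question is phrased in terms of "a specific value" and "a specific frequency", the same `colSums` idea could be wrapped in a small function. The sketch below is an assumption, not part of the original answer: the function name `count_value_above` and its parameters `value` and `min_count` are hypothetical, and the output columns are named `Variable` and `Frequency` only to mirror the table requested in the question.

```r
# The question's data, rebuilt so the example is self-contained
df <- data.frame(
  a = c(0, 2, 0, 0, 1, 1, 0),
  b = c(1, 0, 1, 0, 0, 1, 1),
  c = c(0, 0, 2, 0, 2, 0, 2),
  d = c(1, 0, 2, 0, 1, 0, 2),
  e = c(1, 0, 2, 1, 1, 1, 2),
  f = c(1, 0, 1, 0, 0, 0, 2),
  g = c(1, 0, 0, 0, 0, 0, 2),
  h = c(1, 0, 1, 0, 0, 0, 2)
)

# Hypothetical helper: count how often `value` occurs in each column and
# keep only the columns whose count is strictly greater than `min_count`.
count_value_above <- function(data, value = 0, min_count = 3) {
  counts <- colSums(data == value, na.rm = TRUE)  # occurrences per column
  keep <- counts > min_count                      # columns above the threshold
  data.frame(
    Variable = names(counts)[keep],
    Frequency = unname(counts[keep]),
    row.names = NULL
  )
}

zeros <- count_value_above(df)                            # zeros, more than 3 times
twos <- count_value_above(df, value = 2, min_count = 2)   # twos, more than 2 times
zeros
twos
```

With the defaults this gives the same five columns as the answer above; changing `value` or `min_count` reuses the logic for other values and thresholds.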

Tags: r, dplyr, data.table, data-cleaning