How do I summarize levels from different columns in R?
I have a table (df) in which the categorical variables are stored as factors with different levels:
A_ID | B_ID | C_ID |
---|---|---|
valid number | valid number | invalid number |
valid number | valid number | invalid number |
invalid number | invalid number | too short |
too short | too short | too short |
valid number | too long | too short |
too long | too long | valid number |
invalid number | valid number | too long |
too long | invalid number | too long |
too short | too short | valid number |
too short | valid number | too long |
too long | invalid number | too long |
valid number | invalid number | valid number |
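I want to summarize by level within each column, that is, count how many times each level occurs in each column. The result should look like the table below: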
Variable | Count_valid | Count_Invalid | Count_Short | Count_Long |
---|---|---|---|---|
A_ID | 4 | 2 | 3 | 3 |
B_ID | 4 | 4 | 2 | 2 |
C_ID | 3 | 2 | 3 | 4 |
I tried using the apply family of functions:
t(sapply(names(df), function(x)
c(count_Valid=count(df[x])== "valid value",
count_Invalid=count(df[x]) == "invalid value",
count_Short=count(df[x] == "too short",
count_Long=count(df[x] == "too long")))))
Does this work for you:
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(cols = everything()) %>%  # long format: one row per (column name, value)
  count(name, value) %>%                 # occurrences of each level within each column
  pivot_wider(id_cols = name, names_from = value, values_from = n) %>%  # one column per level
  select('Variable' = name, 'Count_valid' = `valid number`, 'Count_Invalid' = `invalid number`,
         'Count_Short' = `too short`, 'Count_long' = `too long`)
# A tibble: 3 x 5
Variable Count_valid Count_Invalid Count_Short Count_long
<chr> <int> <int> <int> <int>
1 A_ID 4 2 3 3
2 B_ID 4 4 2 2
3 C_ID 3 2 3 4
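One caveat, offered as a hedged aside rather than as part of the answer itself: if some level never appears in one of the columns, `pivot_wider()` has no count for that combination and leaves `NA` there. A minimal sketch, assuming a small made-up tibble `toy` (hypothetical, not the asker's data), shows the `values_fill` argument of `pivot_wider()` turning those gaps into zeros:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: "too long" never occurs in column B_ID.
toy <- tibble(
  A_ID = c("valid number", "too long", "too short"),
  B_ID = c("valid number", "valid number", "too short")
)

toy_counts <- toy %>%
  pivot_longer(cols = everything()) %>%
  count(name, value) %>%
  # values_fill = 0 replaces missing name/value combinations with 0 instead of NA
  pivot_wider(id_cols = name, names_from = value, values_from = n, values_fill = 0)

print(toy_counts)
```

Without `values_fill`, the `too long` entry for `B_ID` would be `NA`, which matters if the counts are summed or compared later.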
Data used:
df
# A tibble: 12 x 3
A_ID B_ID C_ID
<chr> <chr> <chr>
1 valid number valid number invalid number
2 valid number valid number invalid number
3 invalid number invalid number too short
4 too short too short too short
5 valid number too long too short
6 too long too long valid number
7 invalid number valid number too long
8 too long invalid number too long
9 too short too short valid number
10 too short valid number too long
11 too long invalid number too long
12 valid number invalid number valid number
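For comparison, the `sapply` idea from the question can probably be made to work in base R without extra packages. The sketch below retypes the data printed above into a `data.frame` with factor columns (an assumption, based on the question saying the columns are factors). The names `lv`, `counts`, and `result` are illustrative only:

```r
# Rebuild the 12-row data shown above (typed in by hand from the printout).
df <- data.frame(
  A_ID = c("valid number", "valid number", "invalid number", "too short",
           "valid number", "too long", "invalid number", "too long",
           "too short", "too short", "too long", "valid number"),
  B_ID = c("valid number", "valid number", "invalid number", "too short",
           "too long", "too long", "valid number", "invalid number",
           "too short", "valid number", "invalid number", "invalid number"),
  C_ID = c("invalid number", "invalid number", "too short", "too short",
           "too short", "valid number", "too long", "too long",
           "valid number", "too long", "too long", "valid number"),
  stringsAsFactors = TRUE
)

# A fixed level order, so every column reports the same four counts
# (levels absent from a column are counted as 0).
lv <- c("valid number", "invalid number", "too short", "too long")

# For each column: coerce to the common levels, then tabulate.
counts <- t(sapply(df, function(x) table(factor(x, levels = lv))))
colnames(counts) <- c("Count_valid", "Count_Invalid", "Count_Short", "Count_Long")

result <- data.frame(Variable = rownames(counts), counts, row.names = NULL)
print(result)
```

The original attempt most likely failed because `count()` from dplyr expects a data frame rather than a logical comparison, and because it compared against "valid value" while the data contains "valid number". Tabulating each column with `table()` avoids both issues.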