Get the first and second highest frequencies from multiple columns
I have a data frame with 52 rows and 161 columns. Its structure is shown below.
>str(CEPH)
'data.frame': 52 obs. of 161 variables:
$ id : chr "85" "86" "94" "00" ...
$ subgroup : chr "AAA" "AAA" "AAA" "AAA" ...
$ A1_A : chr "3:01" "3:01" "2:01" "2:01" ...
$ A1_B : chr "" "" "" "" ...
$ A2_A : chr "2:01" "32:01:01" "32:01:01" "68:01:02" ...
$ A2_B : chr "" "32:01:02" "32:01:02" "" ...
$ A2_C : chr "" "" "" "" ...
$ B1_A : chr "7:02:01" "44:03:01" "40:02:00" "44:02:00" ...
...
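Some columns contain many NA values, so I need to find the first and second highest frequencies in each column. I tried the code below, but there are more than 50 columns, so going through them one by one is not practical. Is there a way to do this with sapply?
Input data: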
id subgroup A1_A A1_B A1_C A1_D A1_E A1_F A1_G
1 85 AAA 3:01 "" "" "" "" ""
2 86 AAA 3:01 05:01 "" 07:08 "" ""
3 94 AAA 2:01 05:01 "" "" "" ""
4 000 AAA 2:01 06:07 "" "" "" ""
5 37 AAA 30:01:00 07:08 "" "" "" ""
6 48 AAA 2:01 01:01 "" "" "" ""
fre <- function(CEPH, col) {
  # Tabulate one column and keep its two most frequent values
  q <- sort(table(CEPH[, col]), decreasing = TRUE)[1:2]
  return(q)
}
fre(AAA, 4)
The output I get does not include the column name:
NA 32:01:02
49 2
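One way to keep the column name attached is to loop over the columns as a named list rather than by position, so that each result is labelled by the column it came from. The sketch below is only an illustration: the toy data frame `demo` and the helper name `top2` are made up, and it assumes that every column except `id` and `subgroup` should be tabulated.

```r
# Toy data (hypothetical), mimicking the layout in the question
demo <- data.frame(
  id       = c("85", "86", "94", "000", "37", "48"),
  subgroup = "AAA",
  A1_A     = c("3:01", "3:01", "2:01", "2:01", "30:01:00", "2:01"),
  A1_B     = c("", "05:01", "05:01", "06:07", "07:08", "01:01"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the two most frequent values of one vector.
# useNA = "ifany" counts real NA values too, if the column has any.
top2 <- function(x) {
  sort(table(x, useNA = "ifany"), decreasing = TRUE)[1:2]
}

# Assumption: everything except id and subgroup is a typing column
cols <- setdiff(names(demo), c("id", "subgroup"))

# simplify = FALSE returns a list whose names are the column names
res <- sapply(demo[cols], top2, simplify = FALSE)
res$A1_A
```

Each element of `res` can then be looked up by column name. If a column has only one distinct value, the second slot of its result will be NA.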
Desired output:
Types Frequent_Type Highest_Frequency
A1_A 2:01 20
A1_A NA 5
A1_B NA 49
A1_B 3:01:01 5
A1_C 2:01 20
A1_C 05:02 2
This may not be the exact solution, but I managed to get the two frequencies separately and then combine them.
first_highest <- t(sapply(CEPH, function(x) {t_x <- sort(table(x), decreasing = TRUE); list(value = names(t_x)[1], freq = t_x[1])}))
second_highest <- t(sapply(CEPH, function(x) {t_x <- sort(table(x), decreasing = TRUE); list(value = names(t_x)[2], freq = t_x[2])}))
frequeny <- cbind(first_highest, second_highest)
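A possible variation does both lookups in a single pass and stacks the results in the long layout of the desired output, with one row per column and rank. This is a sketch under assumptions, not a definitive solution: the helper name `top_n_long` and the toy data frame `demo` are hypothetical, and it assumes that empty strings should be counted as NA.

```r
# Hypothetical helper: top-n values per column, stacked in long format
top_n_long <- function(df, n = 2) {
  pieces <- lapply(names(df), function(col) {
    x <- df[[col]]
    # Assumption: empty strings stand for missing values
    x[!is.na(x) & x == ""] <- NA
    t_x <- sort(table(x, useNA = "ifany"), decreasing = TRUE)
    # A column may have fewer than n distinct values
    k <- min(n, length(t_x))
    data.frame(
      Types             = rep(col, k),
      Frequent_Type     = names(t_x)[seq_len(k)],
      Highest_Frequency = as.vector(t_x)[seq_len(k)],
      stringsAsFactors  = FALSE
    )
  })
  do.call(rbind, pieces)
}

# Toy data (hypothetical)
demo <- data.frame(
  A1_A = c("3:01", "3:01", "2:01", "2:01", "30:01:00", "2:01"),
  A1_B = c("", "05:01", "05:01", "", "", "01:01"),
  stringsAsFactors = FALSE
)
out <- top_n_long(demo)
print(out)
```

On the real data, the identifier columns would presumably be dropped first, for example `top_n_long(CEPH[setdiff(names(CEPH), c("id", "subgroup"))])`.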