Convert plyr::ddply to dplyr
I have a data frame like this:
tmp <- read.table(header = TRUE, text = "gene_id gene_symbol ensembl_id keep val1 val2 val3
x a Multiple Yes 1 2 3
x1 a Multiple No 2 3 4
x2 a Multiple No 1 4 3
y b Multiple Yes 22 20 12
y1 b Multiple No 98 7 97
y2 b Multiple No 8 76 6")
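I am trying to group by the gene_symbol variable and, within each group, compute the correlation between the row with keep == "Yes" and each of the other rows (keep == "No"), then return the mean correlation along with gene_symbol and gene_id. Here is the function: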
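library(dplyr)  # provides %>%, filter() and select()

# function to calculate avg. correlation within one gene_symbol group
calc.mean.corr <- function(x){
  # id of the kept row (assumes exactly one keep == "Yes" row per group)
  gene.id <- x[which(x$keep == "Yes"), "gene_id"]
  # values of the kept row as a numeric vector
  x1 <- x %>%
    filter(keep == "Yes") %>%
    select(-c(gene_id, gene_symbol, ensembl_id, keep)) %>%
    as.numeric()
  # values of the discarded rows, one row per id
  x2 <- x %>%
    filter(keep == "No") %>%
    select(-c(gene_id, gene_symbol, ensembl_id, keep))
  # correlation of kept id with discarded ids, averaged and rounded
  avg.cor <- mean(apply(x2, 1, FUN = function(y) cor(x1, y)))
  avg.cor <- round(avg.cor, digits = 2)
  df <- data.frame(avg.cor = avg.cor, gene_id = gene.id)
  return(df)
}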
# call using ddply
for.corr <- plyr::ddply(tmp, .variables = "gene_symbol", .fun = function(x) calc.mean.corr(x))
The final output looks like this:
> for.corr
gene_symbol avg.cor gene_id
1 a 0.83 x
2 b 0.02 y
I am using plyr::ddply for this but would like to switch to dplyr. However, I am not sure how to convert it to dplyr. Any help would be much appreciated.
If we don't want to change the function, one option is to use group_split and apply the function to each piece:
library(dplyr)
library(purrr)
tmp %>%
group_split(gene_symbol) %>%
map_dfr(calc.mean.corr)
To include gene_symbol in the result, split on it so the list is named, and pass those names through .id:
tmp %>%
split(.$gene_symbol) %>%
map_dfr(~ calc.mean.corr(.), .id = 'gene_symbol')
# gene_symbol avg.cor gene_id
#1 a 0.83 x
#2 b 0.02 y
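If changing the function is acceptable, the grouping can also stay inside dplyr with group_by() plus group_modify(). The sketch below is one possible rewrite, not the approach from the answer above: mean_corr is a hypothetical helper name, and it assumes the value columns all start with "val" and that each group has exactly one keep == "Yes" row.

```r
library(dplyr)

tmp <- read.table(header = TRUE, text = "gene_id gene_symbol ensembl_id keep val1 val2 val3
x a Multiple Yes 1 2 3
x1 a Multiple No 2 3 4
x2 a Multiple No 1 4 3
y b Multiple Yes 22 20 12
y1 b Multiple No 98 7 97
y2 b Multiple No 8 76 6")

# Hypothetical helper (not from the original post): same calculation as
# calc.mean.corr, but it picks the value columns by name prefix instead of
# dropping the id columns, so it does not need gene_symbol to be present.
mean_corr <- function(d) {
  vals   <- as.matrix(select(d, starts_with("val")))
  kept   <- vals[d$keep == "Yes", ]                # single kept row, as a vector
  others <- vals[d$keep == "No", , drop = FALSE]   # discarded rows, as a matrix
  data.frame(
    avg.cor = round(mean(apply(others, 1, function(y) cor(kept, y))), 2),
    gene_id = d$gene_id[d$keep == "Yes"]
  )
}

for.corr <- tmp %>%
  group_by(gene_symbol) %>%
  group_modify(~ mean_corr(.x)) %>%   # one result row per gene_symbol
  ungroup()

for.corr
```

group_modify() adds the grouping column back to the result, so gene_symbol appears without any .id handling. The original calc.mean.corr is not passed in directly here because, by default, the data handed to the function by group_modify() leaves out the grouping column, and select(-c(..., gene_symbol, ...)) would then fail on the missing column.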