R: subset multiple data frames at once in a for loop
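I have a large number of data frames in R on which I want to perform the same operations in a for loop.
The data frames contain gene expression data. For each gene there is information about up- or downregulation and the associated P value. Ultimately, I want a new data frame that contains, for each data frame, the number of significantly (P < 0.05) upregulated and downregulated genes.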
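I'll do this in two steps:
- Subset each data frame into one subset with only the upregulated genes and one with only the downregulated genes
- Count the number of significant genes in each subset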
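The two steps can be sketched on a single data frame as follows. This is a minimal sketch that keeps only the `direction` and `Pvalue` columns of the first dummy data frame; it also shows that summing a logical vector counts its `TRUE` values, which lets both steps collapse into one expression.

```r
# Only the columns needed for counting, taken from the first dummy data frame
df <- data.frame(direction = c('up', 'up', 'down', 'down', 'down', 'up'),
                 Pvalue = c(0.05, 0.06, 0.001, 0.075, 0.11, 0.12))

# Step 1: subset by direction
df_down <- df[df$direction == 'down', ]
df_up   <- df[df$direction == 'up', ]

# Step 2: count genes with P < 0.05 (strictly less, so 0.05 itself is not counted)
n_down <- sum(df_down$Pvalue < 0.05)
n_up   <- sum(df_up$Pvalue < 0.05)

# Both steps at once: when summed, TRUE counts as 1 and FALSE as 0
n_down_2 <- sum(df$direction == 'down' & df$Pvalue < 0.05)
n_up_2   <- sum(df$direction == 'up' & df$Pvalue < 0.05)

print(c(down = n_down, up = n_up))
```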
First, let's make two dummy data frames:
#data frame 1
gene = c('gene1','gene2','gene3','gene4','gene5','gene6')
direction = c('up','up','down','down','down','up')
Pvalue = c(0.05,0.06,0.001,0.075,0.11,0.12)
# data.frame() keeps Pvalue numeric; as.data.frame(cbind(...)) would store every column as text
df1 = data.frame(gene, direction, Pvalue)
> df1
gene direction Pvalue
1 gene1 up 0.050
2 gene2 up 0.060
3 gene3 down 0.001
4 gene4 down 0.075
5 gene5 down 0.110
6 gene6 up 0.120
#data frame 2
gene = c('gene1','gene2','gene3','gene4','gene5','gene6')
direction = c('down','up','down','down','up','up')
Pvalue = c(0.043,0.001,0.34,0.96,0.001,0.04)
df2 = data.frame(gene, direction, Pvalue)
> df2
gene direction Pvalue
1 gene1 down 0.043
2 gene2 up 0.001
3 gene3 down 0.340
4 gene4 down 0.960
5 gene5 up 0.001
6 gene6 up 0.040
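A note on building dummy data with `as.data.frame(cbind(gene, direction, Pvalue))`: `cbind()` applied to vectors returns a matrix, and a matrix holds a single type, so the P values end up as text. `Pvalue < 0.05` then becomes a string comparison. It happens to give the right counts for the values above, but not in general. A minimal sketch with made-up three-gene vectors:

```r
gene <- c('gene1', 'gene2', 'gene3')
direction <- c('up', 'down', 'up')
Pvalue <- c(0.05, 0.001, 0.12)

# cbind() on vectors gives a matrix; numeric P values are coerced to character
m <- cbind(gene, direction, Pvalue)
print(class(m[, "Pvalue"]))
print(class(as.data.frame(m)$Pvalue))  # not numeric either

# data.frame() keeps each column's own type
df_ok <- data.frame(gene, direction, Pvalue)
print(class(df_ok$Pvalue))

# Why it matters: a small P value stored as text compares as a string
p_chr <- as.character(1e-04)  # "1e-04"
print(p_chr < 0.05)           # string comparison "1e-04" < "0.05": FALSE
print(1e-04 < 0.05)           # numeric comparison: TRUE
```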
Then I made a vector with the names of all the data frames:
df_summary = c('df1','df2')
Then I use a for loop over this vector to perform steps 1 and 2 outlined above:
df3 = data.frame()
for (df in df_summary){
  df_down = df[df$direction == 'down',]
  df_up = df[df$direction == 'up',]
  df_down_sign = length(which(df_down$Pvalue < 0.05))
  df_up_sign = length(which(df_up$Pvalue < 0.05))
  df3 = rbind.data.frame(df3, c(df_down_sign,df_up_sign))
}
This code works fine on a single data frame outside the loop, but when I run the loop it throws the following error:
Error: $ operator is invalid for atomic vectors
The output I'm looking for should look like this:
dataframe number
1 df1 1
2 df1 0
3 df2 1
4 df2 3
So my question is: why do I get this error in the for loop, and how can I fix it?
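Why the error occurs: `for (df in df_summary)` iterates over the elements of `df_summary`, which are the strings `'df1'` and `'df2'`, not the data frames themselves. A character string is an atomic vector, so `df$direction` fails. A minimal reproduction (with a made-up two-row `df1`), plus `get()`, which returns the object that has a given name:

```r
df1 <- data.frame(direction = c('up', 'down'), Pvalue = c(0.01, 0.2))
df_summary <- c('df1')

for (df in df_summary) {
  # df is the character string "df1", not the data frame
  print(class(df))
  # `$` on a character vector raises the error from the question
  res <- tryCatch({ df$direction; "no error" },
                  error = function(e) conditionMessage(e))
  print(res)
  # get() looks up the object with this name, so `$` works on the result
  print(get(df)$direction)
}
```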
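As it turns out, after posting my question I came across something that looks like a solution.
Simply running
df_summary = list(df1,df2)
instead of
df_summary = c('df1','df2')
seems to solve my problem, because the loop now iterates over the data frames themselves instead of over their names.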
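Looping over `list(df1, df2)` runs because each element is a data frame. The list is unnamed, though, so the loop cannot record which data frame a count belongs to, and `rbind.data.frame(df3, c(df_down_sign, df_up_sign))` adds the two counts as one row per data frame rather than the long layout the question asks for. A sketch using a named list and one small data frame per input; the `direction` column is an addition, not part of the requested output, to make each row self-describing:

```r
df1 <- data.frame(gene = paste0('gene', 1:6),
                  direction = c('up', 'up', 'down', 'down', 'down', 'up'),
                  Pvalue = c(0.05, 0.06, 0.001, 0.075, 0.11, 0.12))
df2 <- data.frame(gene = paste0('gene', 1:6),
                  direction = c('down', 'up', 'down', 'down', 'up', 'up'),
                  Pvalue = c(0.043, 0.001, 0.34, 0.96, 0.001, 0.04))

# A named list: the loop gets data frames, and the names label the result
df_summary <- list(df1 = df1, df2 = df2)

rows <- list()
for (nm in names(df_summary)) {
  df <- df_summary[[nm]]
  # count significant genes per direction (logical TRUEs summed)
  df_down_sign <- sum(df$direction == 'down' & df$Pvalue < 0.05)
  df_up_sign   <- sum(df$direction == 'up' & df$Pvalue < 0.05)
  rows[[nm]] <- data.frame(dataframe = nm,
                           direction = c('down', 'up'),
                           number = c(df_down_sign, df_up_sign))
}

# Stack the per-data-frame pieces and reset the row names to 1, 2, ...
df3 <- do.call(rbind, rows)
rownames(df3) <- NULL
print(df3)
```

With the names kept in a character vector as in the question, `mget(c('df1', 'df2'))` builds the same named list directly from those names.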
The following solves the problem:
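# Gather df1, df2, ... into a named list. "^df[0-9]+$" skips objects such as
# df_summary that merely start with "df"; a leftover df3 would still match
df_list <- mget(ls(pattern = "^df[0-9]+$"))
df3 <- lapply(seq_along(df_list), function(i){
  dftmp <- df_list[[i]]
  dfname <- names(df_list)[i]
  # count P < 0.05 within each direction (down, up)
  agg <- aggregate(Pvalue ~ direction, dftmp, function(x) sum(x < 0.05))
  # prepend the data frame's name as a column
  cbind.data.frame(dataframe = dfname, agg)
})
# stack the per-data-frame results into one data frame
df3 <- do.call(rbind, df3)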
df3
# dataframe direction Pvalue
#1 df1 down 1
#2 df1 up 0
#3 df2 down 1
#4 df2 up 3
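One caveat: `ls(pattern = "^df")` matches every object whose name starts with `df`, including `df_summary` from the question and, after the question's loop has run, `df_down`, `df_up`, `df_down_sign`, `df_up_sign` and `df3`. `mget()` would then hand objects such as the character vector `df_summary` to `aggregate()`, which expects a data frame. The sketch below fills a separate environment so the result does not depend on the rest of the session; the tighter pattern `"^df[0-9]+$"` assumes the data frames are named `df` followed by digits.

```r
# A separate environment, filled with objects named like the ones in the question
e <- new.env()
local({
  df1 <- data.frame(direction = 'up', Pvalue = 0.01)
  df2 <- data.frame(direction = 'down', Pvalue = 0.2)
  df_summary <- c('df1', 'df2')
  df_down_sign <- 1L
}, envir = e)

broad  <- ls(envir = e, pattern = "^df")
strict <- ls(envir = e, pattern = "^df[0-9]+$")
print(broad)    # also lists df_down_sign and df_summary
print(strict)   # only df1 and df2

# mget() returns a named list of the matching objects
df_list <- mget(strict, envir = e)
print(names(df_list))
```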