Subtracting controls from multiple datasets
Here is a simplified data frame of some qPCR data:
sample exprFile reaction_conc
1 A 140701_2014-07-03-15-49 59
2 A 140701_2014-07-03-15-49 70
3 NC_1 140701_2014-07-03-15-49 2
4 NC_1 140701_2014-07-03-15-49 3
5 NC_1 140701_2014-07-03-15-49 2
6 A 140701_2_2014-07-01-19-07 200
7 A 140701_2_2014-07-01-19-07 202
8 B 140701_2_2014-07-01-19-07 300
9 B 140701_2_2014-07-01-19-07 322
10 B 140701_2_2014-07-01-19-07 333
11 NC_1 140701_2_2014-07-01-19-07 8
12 NC_1 140701_2_2014-07-01-19-07 8
13 NC_2 140701_2_2014-07-01-19-07 4
14 D 140701_2014-07-02-20-53 44
15 NC_2 140701_2014-07-02-20-53 0
16 NC_2 140701_2014-07-02-20-53 2
17 NC_2 140701_2014-07-02-20-53 1
18 A 140708_2014-07-08-19-20 100
19 A 140708_2014-07-08-19-20 108
20 A 140708_2014-07-08-19-20 111
21 D 140708_2014-07-08-19-20 88
22 D 140708_2014-07-08-19-20 80
23 E 140708_2014-07-08-19-20 645
24 NC_3 140708_2014-07-08-19-20 8
25 NC_3 140708_2014-07-08-19-20 12
26 NC_1 140708_2014-07-08-19-20 4
27 NC_2 140708_2014-07-08-19-20 0
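Each exprFile is one experiment. Within each experiment, I want to subtract the controls (the samples labelled NC*) from every sample, using the mean of the control values (reaction_conc). Some experiments contain more than one type of control. I want to create a new column of subtracted values for each control type. Finally, I want a column that determines which control type has the highest mean and subtracts that mean from the values.
I may have muddled this description (sorry!), so here is the expected output: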
sample exprFile reaction_conc minusNC_1 minusNC_2 minusNC_3 minusNC_highest
1 A 140701_2014-07-03-15-49 59 56.67 NA NA 56.67
2 A 140701_2014-07-03-15-49 70 67.67 NA NA 67.67
3 NC_1 140701_2014-07-03-15-49 2 -0.33 NA NA -0.33
4 NC_1 140701_2014-07-03-15-49 3 0.67 NA NA 0.67
5 NC_1 140701_2014-07-03-15-49 2 -0.33 NA NA -0.33
6 A 140701_2_2014-07-01-19-07 200 192.00 196 NA 192.00
7 A 140701_2_2014-07-01-19-07 202 194.00 198 NA 194.00
8 B 140701_2_2014-07-01-19-07 300 292.00 296 NA 292.00
9 B 140701_2_2014-07-01-19-07 322 314.00 318 NA 314.00
10 B 140701_2_2014-07-01-19-07 333 325.00 329 NA 325.00
11 NC_1 140701_2_2014-07-01-19-07 8 0.00 4 NA 0.00
12 NC_1 140701_2_2014-07-01-19-07 8 0.00 4 NA 0.00
13 NC_2 140701_2_2014-07-01-19-07 4 -4.00 0 NA -4.00
14 D 140701_2014-07-02-20-53 44 NA 43 NA 43.00
15 NC_2 140701_2014-07-02-20-53 0 NA -1 NA -1.00
16 NC_2 140701_2014-07-02-20-53 2 NA 1 NA 1.00
17 NC_2 140701_2014-07-02-20-53 1 NA 0 NA 0.00
18 A 140708_2014-07-08-19-20 100 96.00 100 90 90.00
19 A 140708_2014-07-08-19-20 108 104.00 108 98 98.00
20 A 140708_2014-07-08-19-20 111 107.00 111 101 101.00
21 D 140708_2014-07-08-19-20 88 84.00 88 78 78.00
22 D 140708_2014-07-08-19-20 80 76.00 80 70 70.00
23 E 140708_2014-07-08-19-20 645 641.00 645 635 635.00
24 NC_3 140708_2014-07-08-19-20 8 4.00 8 -2 -2.00
25 NC_3 140708_2014-07-08-19-20 12 8.00 12 2 2.00
26 NC_1 140708_2014-07-08-19-20 4 0.00 4 -6 -6.00
27 NC_2 140708_2014-07-08-19-20 0 -4.00 0 -10 -10.00
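I'm not quite sure what your description means. At first it seems you want to subtract the mean of all NC_1 samples from the other samples, but your expected output contradicts that.
If that is what you want, I would suggest: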
library(dplyr)
## calculate the mean of each control type, pooled across all experiments
## (the sample labels contain an underscore: "NC_1", not "NC1"):
mean1 <- mean(df$reaction_conc[grepl("NC_1", df$sample)])
mean2 <- mean(df$reaction_conc[grepl("NC_2", df$sample)])
mean3 <- mean(df$reaction_conc[grepl("NC_3", df$sample)])
## subtract them from every row:
df <- mutate(df,
  minusNC_1 = reaction_conc - mean1,
  minusNC_2 = reaction_conc - mean2,
  minusNC_3 = reaction_conc - mean3,
  minusNC_highest = reaction_conc - max(c(mean1, mean2, mean3)))
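The mismatch with the expected output can be checked by hand on row 1 (sample A, reaction_conc 59). The short sketch below uses only values copied from the question's table; the variable names are made up for illustration.

```r
# reaction_conc of every NC_1 row in the question's data, across all experiments
nc1_all <- c(2, 3, 2, 8, 8, 4)
# NC_1 rows of the first experiment only (140701_2014-07-03-15-49)
nc1_first <- c(2, 3, 2)

pooled   <- 59 - mean(nc1_all)    # subtracts the pooled mean, 4.5
per_expt <- 59 - mean(nc1_first)  # subtracts the per-experiment mean, 7/3

round(c(pooled = pooled, per_experiment = per_expt), 2)
```

Only the per-experiment value rounds to the 56.67 shown in the expected output, so the control means apparently have to be computed within each exprFile.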
Apologies for the poorly worded question. After discussing it with my lab partner, I came up with this:
# load dplyr
library(dplyr)
# `data` is the data frame shown in the question
# subset the negative controls
neg <- subset(data, sample %in% c("NC_1", "NC_2", "NC_3"))
# find the mean of each control type in each experiment
# (reaction_conc is the value column in the simplified data; the original post used a column named DF_correction here)
neg_sub_mean <- neg %>% group_by(sample, exprFile) %>% summarise(mean = mean(reaction_conc, na.rm = TRUE))
# find the highest control mean in each experiment
neg_sub_mean_max <- neg_sub_mean %>% group_by(exprFile) %>% summarise(max_neg = max(mean))
# merge the original data frame with the per-experiment maximum control means
df_merge <- full_join(data, neg_sub_mean_max, by = "exprFile")
# subtract the highest control mean from the sample values
df_merge <- df_merge %>% mutate(minusNeg = reaction_conc - max_neg)
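The expected output also asks for one column per control type, which the code above does not produce. Here is a minimal base R sketch of that step. It assumes the simplified data frame from the question (rebuilt here as `df`) and that every experiment contains at least one NC_* sample. The names `is_nc`, `ctrl_means`, and `highest` are hypothetical, chosen only for this example.

```r
# Simplified data frame from the question, rebuilt so the sketch runs on its own.
df <- data.frame(
  sample = c("A", "A", "NC_1", "NC_1", "NC_1",
             "A", "A", "B", "B", "B", "NC_1", "NC_1", "NC_2",
             "D", "NC_2", "NC_2", "NC_2",
             "A", "A", "A", "D", "D", "E", "NC_3", "NC_3", "NC_1", "NC_2"),
  exprFile = rep(c("140701_2014-07-03-15-49", "140701_2_2014-07-01-19-07",
                   "140701_2014-07-02-20-53", "140708_2014-07-08-19-20"),
                 times = c(5, 8, 4, 10)),
  reaction_conc = c(59, 70, 2, 3, 2,
                    200, 202, 300, 322, 333, 8, 8, 4,
                    44, 0, 2, 1,
                    100, 108, 111, 88, 80, 645, 8, 12, 4, 0),
  stringsAsFactors = FALSE
)

# Mean of each control type within each experiment:
# rows are experiments, columns are control types, NA where a type is absent.
is_nc <- grepl("^NC_", df$sample)
ctrl_means <- tapply(df$reaction_conc[is_nc],
                     list(df$exprFile[is_nc], df$sample[is_nc]),
                     mean)

# One minusNC_* column per control type, looked up by each row's experiment.
for (nc in colnames(ctrl_means)) {
  df[[paste0("minus", nc)]] <- df$reaction_conc - unname(ctrl_means[df$exprFile, nc])
}

# Highest control mean per experiment, ignoring control types that are absent.
highest <- apply(ctrl_means, 1, max, na.rm = TRUE)
df$minusNC_highest <- df$reaction_conc - unname(highest[df$exprFile])
```

Indexing the matrix by `df$exprFile` relies on that column being character rather than factor, which is why the sketch sets `stringsAsFactors = FALSE`. An experiment with no controls at all would need separate handling, since it would have no row in `ctrl_means`.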