Subtracting controls from multiple datasets

Question

Here is a simplified data frame of some qPCR data:

   sample                  exprFile reaction_conc
1       A   140701_2014-07-03-15-49            59
2       A   140701_2014-07-03-15-49            70
3    NC_1   140701_2014-07-03-15-49             2
4    NC_1   140701_2014-07-03-15-49             3
5    NC_1   140701_2014-07-03-15-49             2
6       A 140701_2_2014-07-01-19-07           200
7       A 140701_2_2014-07-01-19-07           202
8       B 140701_2_2014-07-01-19-07           300
9       B 140701_2_2014-07-01-19-07           322
10      B 140701_2_2014-07-01-19-07           333
11   NC_1 140701_2_2014-07-01-19-07             8
12   NC_1 140701_2_2014-07-01-19-07             8
13   NC_2 140701_2_2014-07-01-19-07             4
14      D   140701_2014-07-02-20-53            44
15   NC_2   140701_2014-07-02-20-53             0
16   NC_2   140701_2014-07-02-20-53             2
17   NC_2   140701_2014-07-02-20-53             1
18      A   140708_2014-07-08-19-20           100
19      A   140708_2014-07-08-19-20           108
20      A   140708_2014-07-08-19-20           111
21      D   140708_2014-07-08-19-20            88
22      D   140708_2014-07-08-19-20            80
23      E   140708_2014-07-08-19-20           645
24   NC_3   140708_2014-07-08-19-20             8
25   NC_3   140708_2014-07-08-19-20            12
26   NC_1   140708_2014-07-08-19-20             4
27   NC_2   140708_2014-07-08-19-20             0

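Each exprFile is one experiment. Within each experiment I want to subtract the controls (the samples labelled NC*) from every sample, using the mean of the control values (reaction_conc). Some experiments contain more than one type of control. I would like to create a new column for each control type containing the subtracted values. Finally, I would like a column that determines which control type has the highest mean in that experiment and subtracts that mean from the values.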

I have probably made this description confusing (sorry!), so here is the expected output:

   sample                  exprFile reaction_conc minusNC_1 minusNC_2 minusNC_3   minusNC_highest
1       A   140701_2014-07-03-15-49            59     56.67        NA        NA             56.67
2       A   140701_2014-07-03-15-49            70     67.67        NA        NA             67.67
3    NC_1   140701_2014-07-03-15-49             2     -0.33        NA        NA             -0.33
4    NC_1   140701_2014-07-03-15-49             3      0.67        NA        NA              0.67
5    NC_1   140701_2014-07-03-15-49             2     -0.33        NA        NA             -0.33
6       A 140701_2_2014-07-01-19-07           200    192.00       196        NA            192.00
7       A 140701_2_2014-07-01-19-07           202    194.00       198        NA            194.00       
8       B 140701_2_2014-07-01-19-07           300    292.00       296        NA            292.00
9       B 140701_2_2014-07-01-19-07           322    314.00       318        NA            314.00
10      B 140701_2_2014-07-01-19-07           333    325.00       329        NA            325.00
11   NC_1 140701_2_2014-07-01-19-07             8      0.00         4        NA              0.00
12   NC_1 140701_2_2014-07-01-19-07             8      0.00         4        NA              0.00
13   NC_2 140701_2_2014-07-01-19-07             4     -4.00         0        NA             -4.00
14      D   140701_2014-07-02-20-53            44        NA        43        NA             43.00          
15   NC_2   140701_2014-07-02-20-53             0        NA        -1        NA             -1.00
16   NC_2   140701_2014-07-02-20-53             2        NA         1        NA              1.00
17   NC_2   140701_2014-07-02-20-53             1        NA         0        NA              0.00
18      A   140708_2014-07-08-19-20           100     96.00       100        90             90.00
19      A   140708_2014-07-08-19-20           108    104.00       108        98             98.00
20      A   140708_2014-07-08-19-20           111    107.00       111       101            101.00
21      D   140708_2014-07-08-19-20            88     84.00        88        78             78.00
22      D   140708_2014-07-08-19-20            80     76.00        80        70             70.00
23      E   140708_2014-07-08-19-20           645    641.00       645       635            635.00
24   NC_3   140708_2014-07-08-19-20             8      4.00         8        -2             -2.00
25   NC_3   140708_2014-07-08-19-20            12      8.00        12         2              2.00
26   NC_1   140708_2014-07-08-19-20             4      0.00         4        -6             -6.00
27   NC_2   140708_2014-07-08-19-20             0     -4.00         0       -10            -10.00
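
As a quick check of how the table above is derived, the arithmetic for two of the experiments can be worked through by hand. This is only a sketch in base R using values copied from the data frame; the variable names (nc1, nc_means and so on) are made up for illustration.

```r
# NC_1 readings in experiment 140701_2014-07-03-15-49 (rows 3-5)
nc1 <- c(2, 3, 2)
nc1_mean <- mean(nc1)                    # 7/3

# Rows 1 and 2 of that experiment, column minusNC_1
row1 <- round(59 - nc1_mean, 2)          # 56.67
row2 <- round(70 - nc1_mean, 2)          # 67.67

# Control means in experiment 140708_2014-07-08-19-20
nc_means <- c(NC_1 = mean(4), NC_2 = mean(0), NC_3 = mean(c(8, 12)))
highest <- names(which.max(nc_means))    # "NC_3"

# Row 18 of that experiment, column minusNC_highest
row18 <- 100 - max(nc_means)             # 90
```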

Answer 1

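I'm not quite sure what your description means. At first it sounds as if you want to subtract the mean of all NC_1 samples from the other samples, but your expected output contradicts that. If that is what you want, I would suggest: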

## load dplyr for mutate()
library(dplyr)

## calculate the mean of each control type (across all experiments):
mean1 <- mean(df$reaction_conc[grepl("NC_1", df$sample)])
mean2 <- mean(df$reaction_conc[grepl("NC_2", df$sample)])
mean3 <- mean(df$reaction_conc[grepl("NC_3", df$sample)])
## subtract them from the data:
df <- mutate(df,
  minusNC_1 = reaction_conc - mean1,
  minusNC_2 = reaction_conc - mean2,
  minusNC_3 = reaction_conc - mean3,
  minusNC_highest = reaction_conc - max(c(mean1, mean2, mean3)))
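
The code above pools each control type over all experiments. If the means should instead be taken within each exprFile, as the expected output in the question suggests, one possible variant is to group by exprFile first. The following is a minimal sketch, not a tested solution for the full data: ctrl_mean is a hypothetical helper, and df here is a small subset of the question's rows.

```r
library(dplyr)

# Small subset of the question's data (two experiments)
df <- data.frame(
  sample = c("A", "A", "NC_1", "NC_1", "NC_1", "D", "NC_2", "NC_2", "NC_2"),
  exprFile = c(rep("140701_2014-07-03-15-49", 5),
               rep("140701_2014-07-02-20-53", 4)),
  reaction_conc = c(59, 70, 2, 3, 2, 44, 0, 2, 1)
)

# Hypothetical helper: mean of one control type within the current group,
# or NA when that control type is absent from the experiment
ctrl_mean <- function(conc, sample, ctrl) {
  if (any(sample == ctrl)) mean(conc[sample == ctrl]) else NA_real_
}

res <- df %>%
  group_by(exprFile) %>%
  mutate(
    minusNC_1 = reaction_conc - ctrl_mean(reaction_conc, sample, "NC_1"),
    minusNC_2 = reaction_conc - ctrl_mean(reaction_conc, sample, "NC_2"),
    minusNC_3 = reaction_conc - ctrl_mean(reaction_conc, sample, "NC_3"),
    # Subtracting the highest control mean gives the smallest difference
    minusNC_highest = pmin(minusNC_1, minusNC_2, minusNC_3, na.rm = TRUE)
  ) %>%
  ungroup()
```

Control types that do not occur in an experiment produce NA in the corresponding column, which matches the NA entries in the expected output.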

Answer 2

Apologies for the poorly worded question. After discussing it with my lab mates, this is what I came up with:

# load dplyr
library(dplyr)

# subset the negative controls
neg <- subset(data, sample %in% c("NC_1", "NC_2", "NC_3"))

# find the mean of each control type in each experiment
neg_sub_mean <- neg %>% group_by(sample, exprFile) %>% summarise(mean = mean(reaction_conc, na.rm = TRUE))

# find the control type with the highest mean in each experiment
neg_sub_mean_max <- neg_sub_mean %>% group_by(exprFile) %>% summarise(max_neg = max(mean))

# merge the original data frame with the per-experiment maximum control means
df_merge <- full_join(data, neg_sub_mean_max, by = "exprFile")

# subtract the maximum negative control mean from the sample values
df_merge <- df_merge %>% mutate(minusNeg = reaction_conc - max_neg)
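
The steps above only produce the column for the highest control. If the per-control-type columns from the question (minusNC_1, minusNC_2, ...) are wanted as well, the same per-experiment means could be spread into one column per control type before joining. This is a rough sketch under a few assumptions: it uses tidyr's pivot_wider, which is not part of the code above, the names ctrl_means, ctrl_cols and res are made up for illustration, and data here is a small subset of the question's rows.

```r
library(dplyr)
library(tidyr)

# Small subset of the question's data (two experiments)
data <- data.frame(
  sample = c("A", "B", "NC_1", "NC_1", "NC_2", "D", "NC_2", "NC_2"),
  exprFile = c(rep("140701_2_2014-07-01-19-07", 5),
               rep("140701_2014-07-02-20-53", 3)),
  reaction_conc = c(200, 300, 8, 8, 4, 44, 0, 2)
)

# Mean of each control type per experiment, one column per control type;
# control types missing from an experiment become NA
ctrl_means <- data %>%
  filter(grepl("^NC_", sample)) %>%
  group_by(exprFile, sample) %>%
  summarise(mean = mean(reaction_conc, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = sample, values_from = mean)

ctrl_cols <- setdiff(names(ctrl_means), "exprFile")
res <- left_join(data, ctrl_means, by = "exprFile")

# One minus<control> column per control type found in the data
for (ctrl in ctrl_cols) {
  res[[paste0("minus", ctrl)]] <- res$reaction_conc - res[[ctrl]]
}

# Highest control mean per row, ignoring control types that are absent
highest <- do.call(pmax, c(unname(as.list(res[ctrl_cols])), na.rm = TRUE))
res$minusNC_highest <- res$reaction_conc - highest
```

Because the control columns are discovered from the data, an experiment set with an additional control type (say NC_4) would get its own column without changing the code.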

split

r

reshape2

dplyr