Perform Spearman correlation for all pairs of data contained in two columns (x, y)?
My data, in .csv format, looks like this:
sampleid blue red otuid
AB1 0.001020366 0.000262013 K00001
AB1 7.24E-05 0.00000307 K00002
AB1 0.000500854 0.000635104 K00003
AB1 3.50E-05 0.000000555 K00004
AB1 0.000196537 0.0000346 K00005
AB1 2.56E-05 2.92E-08 K00006
AB1 0.00027525 0.0000392 K00007
AB1 0.000177602 0.000000994 K00008
AB1 0.000128098 0.000151901 K00009
AB1 1.46E-06 0.000000468 K00010
AB1 0.000348187 0.000571836 K00011
AB1 0.000448518 0.000435364 K00012
AB1 0.000490293 0.000729903 K00013
AB1 0.000263668 0.00000567 K00014
AB1 0.00054961 0.000406697 K00015
AB2 0.001020366 0.000262013 K00001
AB2 7.24E-05 0.00000307 K00002
AB2 0.000500854 0.000635104 K00003
AB2 3.50E-05 0.000000555 K00004
AB2 0.000196537 0.0000346 K00005
AB2 2.56E-05 2.92E-08 K00006
AB2 0.00027525 0.0000392 K00007
AB2 0.000177602 0.000000994 K00008
AB2 0.000128098 0.000151901 K00009
AB2 1.46E-06 0.000000468 K00010
AB2 0.000348187 0.000571836 K00011
AB2 0.000448518 0.000435364 K00012
AB2 0.000490293 0.000729903 K00013
AB2 0.000263668 0.00000567 K00014
AB2 0.00054961 0.000406697 K00015
When I run cor() like this:
d <- read.csv("name.csv")
cor(rank(d[,3]), rank(d[,4]))
[1] 0.777888
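I assume this is the average R over all the correlation tests, but I would rather get a separate R for each sample/OTU in each test (X vs. Y), so that I can write a table that looks like this: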
otuid sampleid Spearman's R
k00001 Sample1 0.001
k00002 Sample1 0.012
k00003 Sample1 0.013
k00004 Sample1 0.015 ......
k00001 Sample2 0.001
k00002 Sample2 0.012
k00003 Sample2 0.013
k00004 Sample2 0.015
Thanks for your help!
A data.frame to speed things up:
sampleid = c("AB1","AB1","AB1","AB1","AB1","AB1","AB1","AB1","AB1",
"AB1","AB1","AB1","AB1","AB1","AB2","AB2","AB2","AB2","AB2","AB2","AB2",
"AB2","AB2","AB2","AB2","AB2","AB2","AB2","AB2","AB2")
red = c(runif(30,0,100))
blue = c(runif(30,0,100))
otuid =c("K00001","K00002","K00003","K00004","K00005","K00006",
"K00007","K00008","K00009","K00010","K00011","K00012",
"K00013","K00014","K00015","K00001","K00002","K00003","K00004",
"K00005","K00006","K00007","K00008","K00009","K00010",
"K00011","K00012","K00013","K00014","K00015")
df = data.frame(sampleid, red, blue, otuid)
df
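Based on your comment, and using the data frame you provided, you can compute the correlation within each sample with the purrr package, like this: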
library(purrr)
df %>%
split(.$sampleid) %>%
map_dbl(~ cor(.$blue, .$red))
#> AB1 AB2
#> 0.07714403 0.38077482
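Note that cor() computes Pearson's coefficient by default. If the rank-based (Spearman) coefficient from the question is what is wanted, a minimal base R sketch might look like the following. The set.seed() call and the name spearman_by_sample are assumptions added here for illustration, and the data are random stand-ins shaped like the question's data frame, so no particular values should be expected.

```r
# Hypothetical stand-in data, same shape as the question's data frame
set.seed(1)  # assumption: added only so the sketch is reproducible
df <- data.frame(
  sampleid = rep(c("AB1", "AB2"), each = 15),
  red      = runif(30, 0, 100),
  blue     = runif(30, 0, 100),
  otuid    = rep(sprintf("K%05d", 1:15), times = 2)
)

# One Spearman coefficient per sample: split the rows by sampleid,
# then correlate blue with red inside each piece
spearman_by_sample <- sapply(
  split(df, df$sampleid),
  function(x) cor(x$blue, x$red, method = "spearman")
)
spearman_by_sample

# Spearman's rho is Pearson's r computed on ranks, so this agrees with
# the cor(rank(...), rank(...)) form used in the question
ab1 <- df[df$sampleid == "AB1", ]
all.equal(unname(spearman_by_sample["AB1"]),
          cor(rank(ab1$blue), rank(ab1$red)))
```

The same method = "spearman" argument can be passed to the cor() call inside map_dbl() above. Either way the result is a single coefficient per group, not an average of several tests.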
Here is a base R way to get something similar:
by(df, df$sampleid, function(x) cor(x$blue, x$red))
#> df$sampleid: AB1
#> [1] 0.205726
#> --------------------------------------------------------
#> df$sampleid: AB2
#> [1] 0.3237938
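To get the results into a table that can be written to a file, one option is to collect the per-sample coefficients into a data frame. This is only a sketch: the column names spearman_r and n_pairs and the output file name are made up for illustration, and the data are random stand-ins. Keep in mind that a correlation needs several (blue, red) pairs. Each sample/OTU row contributes only one pair, so a separate coefficient for every OTU within a sample is not defined with this layout; the grouping below is by sample.

```r
# Hypothetical stand-in data, same shape as the question's data frame
set.seed(1)  # assumption: added only so the sketch is reproducible
df <- data.frame(
  sampleid = rep(c("AB1", "AB2"), each = 15),
  red      = runif(30, 0, 100),
  blue     = runif(30, 0, 100),
  otuid    = rep(sprintf("K%05d", 1:15), times = 2)
)

# Split the rows by sample, then build one result row per sample
groups <- split(df, df$sampleid)
result <- data.frame(
  sampleid   = names(groups),
  spearman_r = vapply(groups,
                      function(x) cor(x$blue, x$red, method = "spearman"),
                      numeric(1)),
  n_pairs    = vapply(groups, nrow, integer(1)),  # pairs used per coefficient
  row.names  = NULL
)
result

# Write the table out; the file name is an arbitrary example
out_file <- file.path(tempdir(), "spearman_by_sample.csv")
write.csv(result, out_file, row.names = FALSE)
```

Using vapply() instead of sapply() here is a design choice: it fails loudly if any group does not return exactly one number.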