Run a pop-gen simulation several times and store the results of each in a new column on a data frame
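I have a basic Wright-Fisher simulation for two alleles that works perfectly well and produces a nice-looking plot showing alleles either fixing or dying off by chance, as expected. It exports every generation it calculates to a data frame d, so I have the values for each generation at hand. What I want to do is run the whole thing, say, 20 times and automatically store each complete simulation in a new column, so that I can plot them all on one ggplot graph with colours and all that nice stuff. I am more interested in getting a tidy data frame for making good-looking plots for a project than in amazing efficiency.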
#Wright Fisher model Mk1
#Simulation Parameters
# n = pop.size
# f = frequency of focal allele
# x = number of focal allele, do not set by hand
# y = number of the other allele, do not set by hand
# g = generations desired
n = 200
f = 0.6
x = (n*f)
y = (n-x)
g = 200
#This creates a data frame of the correct size to store each generation
d = data.frame(f = rep(0,g))
#Creates the graph.
plot(1,0, type = "n", xlim = c(1,200), ylim = c(0,n),
xlab = "Generation", ylab = "Frequency A")
#Creates the population, this model is limited to only two alleles, and can only plot one
alleles<- c(rep("A",x), rep("a",y))
#this is the loop that actually simulates the population
#It has code for plotting each generation on the graph as a point
#Exports the number of focal allele A to the data frame
for (i in 1:g){
  alleles <- sample(alleles, n, replace = TRUE)
  points(i, length(alleles[alleles=="A"]), pch = 19, col= "red")
  F = sum(alleles == "A")
  d[i, ] = c(F)
}
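So I want to run that last bit several times and somehow store each complete iteration. I know I can loop the simulation by nesting it in another loop, quick and dirty as that is, but doing so only stores the values from the last iteration of the outer loop.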
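There are lots of opportunities for improvement here, but this should get you going. I'm only showing five simulations, but you should be able to extend it. Essentially, put most of your code in a function; then you can use the map function from the purrr package, or you could also do something with replicate: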
library(tidyverse)

n = 200
f = 0.6
x = (n*f)
y = (n-x)
g = 200
d = data.frame(f = rep(0,g))

# Runs one complete simulation and returns a one-column data frame.
# The assignment to d inside the function changes a local copy only,
# so the global d is left untouched between runs.
run_sim <- function() {
  alleles <- c(rep("A", x), rep("a", y))
  for (i in 1:g) {
    alleles <- sample(alleles, n, replace = TRUE)
    cnt_A = sum(alleles == "A")
    d[i, ] = c(cnt_A)
  }
  return(d)
}

sims <- paste0("sim_", 1:5)
set.seed(4) # for reproducibility

# One column per simulation, then reshape to long format for ggplot
sims %>%
  map_dfc(~ run_sim()) %>%
  set_names(sims) %>%
  gather(simulation, results) %>%
  group_by(simulation) %>%
  mutate(period = row_number()) %>%
  ggplot(., aes(x = period, y = results, group = simulation, color = simulation)) +
  geom_line()
Created on 2019-03-21 by the reprex package (v0.2.1)
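The replicate route mentioned above is not shown in the answer, so here is a minimal base R sketch of what it might look like. The helper name run_sim_vec, its default arguments, and the use of round() on the starting count are assumptions made for illustration. The helper returns a plain numeric vector per run, which lets replicate bind the runs into a matrix with one column per simulation.

```r
# Hypothetical helper (not from the original answer): the same Wright-Fisher
# resampling step, but returning a numeric vector of A counts per generation.
run_sim_vec <- function(n = 200, f = 0.6, g = 200) {
  x <- round(n * f)                          # starting copies of allele A
  alleles <- c(rep("A", x), rep("a", n - x))
  counts <- numeric(g)
  for (i in seq_len(g)) {
    alleles <- sample(alleles, n, replace = TRUE)
    counts[i] <- sum(alleles == "A")
  }
  counts
}

set.seed(4)   # for reproducibility
n_sims <- 20

# replicate() evaluates the expression n_sims times; because every run has
# the same length, the results are simplified to a g x n_sims matrix.
sim_mat <- replicate(n_sims, run_sim_vec())
colnames(sim_mat) <- paste0("sim_", seq_len(n_sims))
sim_df <- as.data.frame(sim_mat)   # one column per simulation

# Quick base-graphics check: one line per simulation.
matplot(sim_mat, type = "l", lty = 1,
        xlab = "Generation", ylab = "Count of A")
```

From here, sim_df has the same wide shape as the map_dfc result, so the same reshaping and ggplot steps should apply to it.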
Note: you can also add arguments to the run_sim function, such as x and y (i.e. run_sim <- function(x, y) { ... }), so that you can explore other possibilities.
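As a rough sketch of that parameterised version, assuming x and y are the starting counts of A and a: the extra argument g, the starts vector, and the column names below are hypothetical choices for illustration, and base R sapply is used here in place of map only to keep the example dependency-free.

```r
# Sketch of run_sim with arguments; x and y are the starting allele counts.
# g (number of generations) is an added, hypothetical third argument.
run_sim <- function(x, y, g = 200) {
  n <- x + y                                 # population size stays constant
  alleles <- c(rep("A", x), rep("a", y))
  cnt_A <- numeric(g)
  for (i in seq_len(g)) {
    alleles <- sample(alleles, n, replace = TRUE)
    cnt_A[i] <- sum(alleles == "A")
  }
  cnt_A
}

set.seed(4)   # for reproducibility
n <- 200

# Hypothetical starting counts of A to compare (20%, 50% and 80% of n).
starts <- c(start_40 = 40, start_100 = 100, start_160 = 160)

# sapply() keeps the names of starts as column names of the result matrix.
by_start <- as.data.frame(
  sapply(starts, function(x0) run_sim(x = x0, y = n - x0))
)
```

Each column of by_start then holds one run from a different starting frequency, ready for the same long-format reshaping used for the plot.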