JAGS 检测模拟 Error/Ecologists Bungle 高级统计
JAGS Detection Simulation Error/Ecologists Bungle Advanced Statistics
我是贝叶斯世界的新手,很高兴能将它应用到我的研究中。作为第一步,我尝试建立一个模拟模型,该模型估计声学探测器 10 公里半径内的平均动物数量和作为噪音水平函数的探测概率。基本上,我问的是两个位置是否有 1k 只动物但噪声特征不同,模型能否告诉我 "true" 表示(1k 只动物)在两个研究区域之间是否相同。希望是的。
这是一个三步过程,它将检测到的呼叫数量 (kk1) 与检测概率 (Pdet)、信噪比(SNR,在对数标度上)最终与动物的平均数量联系起来区域。
我模拟了我的动物位置、源电平、传输损耗(取决于动物位置)、SNR 和 Pdet 函数如下。
##################################################################
library(runjags)
#library(plotrix)
library(rjags)
# Initializations, thanks Matt
### Gamma distribution specification function, thanks Matt! ###
gammaPr<-function(mu, sd)
{
shape<-mu^2/sd^2
rate<-mu/sd^2
x<-seq(max(0, mu-4*sd),mu+4*sd, length.out=100)
#plot(x, dgamma(x, shape=shape, rate=rate), main=paste("Prior distribution with mean ",mu," and sd ", sd), type="l")
#print(paste("The two parameters of the Gamma distribution are shape=",shape," and rate=",rate))
return(c(shape,rate))
}
##################
# Simulated Data #
##################
# Number of animals within the detection radius (5km)
N=1000
#x=seq(0,5,by=1/N)
# Idiot checking:
# Since this is point sampling and (for the moment) we assume that the number
# of animals available to be detected increases linearly with radius
#x1 = rbeta(N, shape1 = 2, shape2 = 1)
#theta = runif(N, min = 0, max=2*pi)
#radial.plot(lengths = x1, radial.pos = theta, rp.type = 's')
## We know the parameters for Source Levels values from the literature
nlpr=gammaPr(120, 30)
# Time series
tt=seq(1, 300, by=1)
# Number of animals at each site available to be sampled
N1=numeric(length=length(tt))
N2=N1
NL1=N1 # Noise Level Distribution
NL2=N1
TL1=N1 # Estimated Transmission loss
TL2=N1
kk1=N1 # Number of animals detected by the sensor
kk2=N1
for (ii in 1:length(tt)){
#N1[ii]=round(rnorm(1, mean = 700, sd=100)) # Screw it-there are enough problems to deal with
#N2[ii]=round(rnorm(1, mean = 700, sd=100))
N1[ii]=1000
N2[ii]=1000
# Random whale distances
r1=rbeta(N1[ii], shape1 = 2, shape2 = 1)*5 # assumed (known)
r2=rbeta(N2[ii], shape1 = 2, shape2 = 1)*5
# Random source levels for each whale
slpr=gammaPr(131, 20) # assumed (known)
SL1=rgamma(N1[ii], slpr[1], slpr[2])
SL2=rgamma(N2[ii], slpr[1], slpr[2])
# Noise Levels for each sensor
nlpr=gammaPr(120, 30)
NL1[ii]=rgamma(1, nlpr[1], nlpr[2]) # measured (known)
nlpr=gammaPr(90, 30)
NL2[ii]=rgamma(1, nlpr[1], nlpr[2])
# TL Transmission loss from each animal to sensor
TL1=20*log10(r1*1000) # this one is tricky, we can assume TL but it's based on multiple captures so how this make it into the probability density function escapes me
TL2=20*log10(r2*1000)
# Signal to noise ratio for each animal call
SNR1=SL1-(1/(1000*r1)^2)-NL1[ii] # calculated
SNR2=SL2-(1/(1000*r2)^2)-NL2[ii]
# Detection probability for a given SNR threshold
Pdet1=pbeta(SNR1, shape1 = 1.5, shape2 = 2.5) # unknown, we will need to estimate shape1 and shape from the JAGS model
Pdet2=pbeta(SNR2, shape1 = 1.2, shape2 = 2.5)
## Idiot checking again
#yy=pbeta(seq(-40,40, by=.01), 1.5, 2.5)
#plot(seq(-40,40, by=.01),yy, xlim=c(-2,2))
# Number of animals detected is binomial based on the detection function
kk1[ii]=sum(rbinom(N1[ii], size=1, prob = Pdet1))
kk2[ii]=sum(rbinom(N2[ii], size=1, prob = Pdet2))
}
data=data.frame(cbind(tt, N1,N2, NL1, NL2,kk1, kk2))
##################################################################
效果很好,但我未能成功构建一个 JAGS 模型
可以预测不同时间动物的平均数量,更不用说在两个不同的研究区域了。该模型的最新版本(对我而言)在逻辑上是有意义的,但会抛出有关重新定义节点 Pdet1 的错误。但由于这些是我试图估计的值,因此相当混乱。如有任何帮助,我们将不胜感激。
也许是定义临时变量的问题? E.g.?但这似乎与我的问题有些不同,因为我只使用一个循环。
提前感谢您的宝贵时间。
#############################################################################
# Model Fitting#
#############################################################################
# Build the JAGS model to determine if the true number
# of animals at each location is different
# We are trying to estimate the mean population size (mu)
# Number of trials (detection periods)
N=nrow(data)
# We guess that the median range of animals within the detection
# area is 2500 m (will be updated later with propationg modelling)
# This is stupid. There needs to be stochasticity in both values.
MedTL=20*log10(2500)
MedSL=rgamma(1, slpr[1], slpr[2])
# We are trying to estimate the population mean and the
# detection function parameters
modeltext <- '
model{
#### liklihood ###
for(ii in 1:N){
# Number of animals detected is distributed by a normal function the total available (mu)
# and the probability that they are detected(Pdet)
kk1[ii] ~ dbin(Pdet, mu)
# Pdet is a beta distribution with parameters estimated from above
Pdet<-pbeta(SNR[ii], betaPar1, betaPar2)
# SNR (signal to noise ratio) is the Source Level-Noise Levl-Transmission Loss
SNR[ii]<-MedSL-MedTL-NL1[ii]
# SL and RL need stochasticity.....
}
## Priors ###
# Mu is dependent on how the animals are distributed in space. In this case, 1000. Because that's what I told it to do.
mu~dpois(lambda)
# We know mu must be greater than 0 and poison
lambda~dgamma(11, .01) # approximately what they should be based on mu 1000, sd 300
# We will assume we know a bit about the detection function, we will be helpful
betaPar1 ~ dnorm(1,.3)
betaPar2 ~ dnorm(2,.3)
}
#data# kk1, NL1, N, slpr, MedTL, MedSL
#monitor# betaPar1, betaPar2, mu, pdet
'
# We could just run the model like this:
results1 <- run.jags(modeltext, n.chains=2)
在您的 JAGS 模型中,Pdet 是
中的标量概率
kk1[ii] ~ dbin(Pdet, mu)
但随后您将 Pdet 指定为 beta 概率分布函数:
Pdet<-pbeta(SNR[ii], betaPar1, betaPar2)
大概您的意思是 Pdet 是从 beta 发行版中提取的。这将是
Pdet ~ pbeta(SNR[ii], betaPar1, betaPar2)
或者(与您模拟的数据更一致),您的意思是 kk1[ii] 是 1000 次具有个体特定检测概率 Pdet 的伯努利试验的总和。所以你需要做的是让 kk1[ii] 成为一堆单独的特定伯努利试验的总和。在 JAGS 中,您观察到的数据必须是某种分布的随机实现,因此您 不能 写入,例如
for(ii in 1:300){
for(jj in 1:mu){
kk11[jj] ~ dbern(p[jj])
}
kk1[ii] <- sum(kk11[jj])
}
幸运的是,JAGS 已经预料到了您的困惑,并且已经为您的目的实现了特殊的可观察函数 dsum()。查看 JAGS 手册。
总而言之,除非我误读了你的模拟,否则你真正需要的是使用 kk1[ii] 遍历时间步长,但在该循环中你需要遍历个体,每个个体都需要在每个时间步长中有自己的模型化检测概率。此时,您可能会想问,"but how is their possibly enough data to model the individual detection probabilities across time-steps?" 答案是,您真正需要建模的只是多年来个体检测概率的 分布 。这将与传感器的个体距离分布、系统的(可能随时间变化的)噪声特性等有关。
好消息是(初步近似),只要你能在R中模拟数据,你就可以在JAGS中制定相应的模型。您现在的挑战是确保模拟生成您想要的数据,然后 JAGS 模型实际对应于模拟。
后记
为此,我想指出您可能没有想到的模拟和模型之间的一个奇怪差异。在模拟中,您固定 N=1000。在模型中,mu 的先验是伽马-泊松混合。你这样做有理由吗?一个原因是,如果真正感兴趣的量是泊松参数 lambda,它对应于系统中动物的密度,N 就是从中得出的。 在那种情况下,您也应该将此作为数据模拟的一部分:即
N <- rpois(1000)
并且您应该将 lambda(而不是 mu)作为 JAGS 输出中的主要关注量进行监控。 [这里还需要考虑 N 是绘制一次然后固定,还是在不同的时间步长重复绘制]。 否则,您可能会考虑更直观的 mu.
我是贝叶斯世界的新手,很高兴能将它应用到我的研究中。作为第一步,我尝试建立一个模拟模型,该模型估计声学探测器 10 公里半径内的平均动物数量和作为噪音水平函数的探测概率。基本上,我问的是两个位置是否有 1k 只动物但噪声特征不同,模型能否告诉我 "true" 表示(1k 只动物)在两个研究区域之间是否相同。希望是的。
这是一个三步过程,它将检测到的呼叫数量 (kk1) 与检测概率 (Pdet)、信噪比(SNR,在对数标度上)最终与动物的平均数量联系起来区域。
我模拟了我的动物位置、源电平、传输损耗(取决于动物位置)、SNR 和 Pdet 函数如下。
##################################################################
library(runjags)
#library(plotrix)
library(rjags)
# Initializations, thanks Matt
### Gamma distribution specification function, thanks Matt! ###
gammaPr<-function(mu, sd)
{
shape<-mu^2/sd^2
rate<-mu/sd^2
x<-seq(max(0, mu-4*sd),mu+4*sd, length.out=100)
#plot(x, dgamma(x, shape=shape, rate=rate), main=paste("Prior distribution with mean ",mu," and sd ", sd), type="l")
#print(paste("The two parameters of the Gamma distribution are shape=",shape," and rate=",rate))
return(c(shape,rate))
}
##################
# Simulated Data #
##################
# Number of animals within the detection radius (5km)
N=1000
#x=seq(0,5,by=1/N)
# Idiot checking:
# Since this is point sampling and (for the moment) we assume that the number
# of animals available to be detected increases linearly with radius
#x1 = rbeta(N, shape1 = 2, shape2 = 1)
#theta = runif(N, min = 0, max=2*pi)
#radial.plot(lengths = x1, radial.pos = theta, rp.type = 's')
## We know the parameters for Source Levels values from the literature
nlpr=gammaPr(120, 30)
# Time series
tt=seq(1, 300, by=1)
# Number of animals at each site available to be sampled
N1=numeric(length=length(tt))
N2=N1
NL1=N1 # Noise Level Distribution
NL2=N1
TL1=N1 # Estimated Transmission loss
TL2=N1
kk1=N1 # Number of animals detected by the sensor
kk2=N1
for (ii in 1:length(tt)){
#N1[ii]=round(rnorm(1, mean = 700, sd=100)) # Screw it-there are enough problems to deal with
#N2[ii]=round(rnorm(1, mean = 700, sd=100))
N1[ii]=1000
N2[ii]=1000
# Random whale distances
r1=rbeta(N1[ii], shape1 = 2, shape2 = 1)*5 # assumed (known)
r2=rbeta(N2[ii], shape1 = 2, shape2 = 1)*5
# Random source levels for each whale
slpr=gammaPr(131, 20) # assumed (known)
SL1=rgamma(N1[ii], slpr[1], slpr[2])
SL2=rgamma(N2[ii], slpr[1], slpr[2])
# Noise Levels for each sensor
nlpr=gammaPr(120, 30)
NL1[ii]=rgamma(1, nlpr[1], nlpr[2]) # measured (known)
nlpr=gammaPr(90, 30)
NL2[ii]=rgamma(1, nlpr[1], nlpr[2])
# TL Transmission loss from each animal to sensor
TL1=20*log10(r1*1000) # this one is tricky, we can assume TL but it's based on multiple captures so how this make it into the probability density function escapes me
TL2=20*log10(r2*1000)
# Signal to noise ratio for each animal call
SNR1=SL1-(1/(1000*r1)^2)-NL1[ii] # calculated
SNR2=SL2-(1/(1000*r2)^2)-NL2[ii]
# Detection probability for a given SNR threshold
Pdet1=pbeta(SNR1, shape1 = 1.5, shape2 = 2.5) # unknown, we will need to estimate shape1 and shape from the JAGS model
Pdet2=pbeta(SNR2, shape1 = 1.2, shape2 = 2.5)
## Idiot checking again
#yy=pbeta(seq(-40,40, by=.01), 1.5, 2.5)
#plot(seq(-40,40, by=.01),yy, xlim=c(-2,2))
# Number of animals detected is binomial based on the detection function
kk1[ii]=sum(rbinom(N1[ii], size=1, prob = Pdet1))
kk2[ii]=sum(rbinom(N2[ii], size=1, prob = Pdet2))
}
data=data.frame(cbind(tt, N1,N2, NL1, NL2,kk1, kk2))
##################################################################
效果很好,但我未能成功构建一个 JAGS 模型 可以预测不同时间动物的平均数量,更不用说在两个不同的研究区域了。该模型的最新版本(对我而言)在逻辑上是有意义的,但会抛出有关重新定义节点 Pdet1 的错误。但由于这些是我试图估计的值,因此相当混乱。如有任何帮助,我们将不胜感激。
也许是定义临时变量的问题? E.g.?但这似乎与我的问题有些不同,因为我只使用一个循环。
提前感谢您的宝贵时间。
#############################################################################
# Model Fitting#
#############################################################################
# Build the JAGS model to determine if the true number
# of animals at each location is different
# We are trying to estimate the mean population size (mu)
# Number of trials (detection periods)
N=nrow(data)
# We guess that the median range of animals within the detection
# area is 2500 m (will be updated later with propationg modelling)
# This is stupid. There needs to be stochasticity in both values.
MedTL=20*log10(2500)
MedSL=rgamma(1, slpr[1], slpr[2])
# We are trying to estimate the population mean and the
# detection function parameters
modeltext <- '
model{
#### liklihood ###
for(ii in 1:N){
# Number of animals detected is distributed by a normal function the total available (mu)
# and the probability that they are detected(Pdet)
kk1[ii] ~ dbin(Pdet, mu)
# Pdet is a beta distribution with parameters estimated from above
Pdet<-pbeta(SNR[ii], betaPar1, betaPar2)
# SNR (signal to noise ratio) is the Source Level-Noise Levl-Transmission Loss
SNR[ii]<-MedSL-MedTL-NL1[ii]
# SL and RL need stochasticity.....
}
## Priors ###
# Mu is dependent on how the animals are distributed in space. In this case, 1000. Because that's what I told it to do.
mu~dpois(lambda)
# We know mu must be greater than 0 and poison
lambda~dgamma(11, .01) # approximately what they should be based on mu 1000, sd 300
# We will assume we know a bit about the detection function, we will be helpful
betaPar1 ~ dnorm(1,.3)
betaPar2 ~ dnorm(2,.3)
}
#data# kk1, NL1, N, slpr, MedTL, MedSL
#monitor# betaPar1, betaPar2, mu, pdet
'
# We could just run the model like this:
results1 <- run.jags(modeltext, n.chains=2)
在您的 JAGS 模型中,Pdet 是
中的标量概率kk1[ii] ~ dbin(Pdet, mu)
但随后您将 Pdet 指定为 beta 概率分布函数:
Pdet<-pbeta(SNR[ii], betaPar1, betaPar2)
大概您的意思是 Pdet 是从 beta 发行版中提取的。这将是
Pdet ~ pbeta(SNR[ii], betaPar1, betaPar2)
或者(与您模拟的数据更一致),您的意思是 kk1[ii] 是 1000 次具有个体特定检测概率 Pdet 的伯努利试验的总和。所以你需要做的是让 kk1[ii] 成为一堆单独的特定伯努利试验的总和。在 JAGS 中,您观察到的数据必须是某种分布的随机实现,因此您 不能 写入,例如
for(ii in 1:300){
for(jj in 1:mu){
kk11[jj] ~ dbern(p[jj])
}
kk1[ii] <- sum(kk11[jj])
}
幸运的是,JAGS 已经预料到了您的困惑,并且已经为您的目的实现了特殊的可观察函数 dsum()。查看 JAGS 手册。
总而言之,除非我误读了你的模拟,否则你真正需要的是使用 kk1[ii] 遍历时间步长,但在该循环中你需要遍历个体,每个个体都需要在每个时间步长中有自己的模型化检测概率。此时,您可能会想问,"but how is their possibly enough data to model the individual detection probabilities across time-steps?" 答案是,您真正需要建模的只是多年来个体检测概率的 分布 。这将与传感器的个体距离分布、系统的(可能随时间变化的)噪声特性等有关。
好消息是(初步近似),只要你能在R中模拟数据,你就可以在JAGS中制定相应的模型。您现在的挑战是确保模拟生成您想要的数据,然后 JAGS 模型实际对应于模拟。
后记
为此,我想指出您可能没有想到的模拟和模型之间的一个奇怪差异。在模拟中,您固定 N=1000。在模型中,mu 的先验是伽马-泊松混合。你这样做有理由吗?一个原因是,如果真正感兴趣的量是泊松参数 lambda,它对应于系统中动物的密度,N 就是从中得出的。 在那种情况下,您也应该将此作为数据模拟的一部分:即
N <- rpois(1000)
并且您应该将 lambda(而不是 mu)作为 JAGS 输出中的主要关注量进行监控。 [这里还需要考虑 N 是绘制一次然后固定,还是在不同的时间步长重复绘制]。 否则,您可能会考虑更直观的 mu.