Can someone explain what these lines of code mean?
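I've been struggling to find a way to make a scatter plot whose color intensity indicates the density of points plotted in that region (it's a large dataset with a lot of overlap). I found these lines of code that let me do this, but I want to make sure I really understand what each line is actually doing.
Thanks in advance :)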
get_density <- function(x, y, ...) {
  dens <- MASS::kde2d(x, y, ...)
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  ii <- cbind(ix, iy)
  return(dens$z[ii])
}
set.seed(1)
dat <- data.frame(x = subset2$conservation.phyloP, y = subset2$gene.expression.RPKM)
dat$density <- get_density(dat$x, dat$y, n = 100)
Here is the function with some explanatory comments; let me know if anything is still unclear:
# "get_density" takes two arguments, x and y (numeric vectors of equal length).
# The "..." lets you pass further arguments through to kde2d (e.g. n = 100).
get_density <- function(x, y, ...) {
  # "MASS::" calls kde2d from the MASS package without attaching the whole
  # package via library(MASS).
  # kde2d estimates a 2D kernel density on a grid and returns a list: the grid
  # positions dens$x and dens$y, and the matrix of density estimates dens$z.
  # Anything passed as "..." above is forwarded to kde2d here.
  dens <- MASS::kde2d(x, y, ...)
  # findInterval() does not return density values. For each point it returns the
  # index of the grid position at or just below it, along x (ix) and along y (iy).
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  # cbind() puts the two index vectors side by side as a two-column matrix,
  # one row per point.
  ii <- cbind(ix, iy)
  # Indexing dens$z with a two-column matrix picks one element per row, i.e.
  # dens$z[ix[k], iy[k]]: the estimated density at the grid cell of point k.
  return(dens$z[ii])
}
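# set.seed() fixes the state of the random number generator, so code that uses randomness gives the same result when re-run with the same seed. Nothing in get_density or kde2d draws random numbers, so it is not strictly needed here, but it does no harm.
set.seed(1)
# Collect the two variables of interest into a data frame with columns x and y
dat <- data.frame(x = subset2$conservation.phyloP, y = subset2$gene.expression.RPKM)
# Add a column with the estimated density at each point; n = 100 is forwarded to kde2d and requests a 100 x 100 grid
dat$density <- get_density(dat$x, dat$y, n = 100)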
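The indexing step inside get_density is probably the least obvious part, so here is a small sketch of it on a toy example. The grid values, the matrix `z`, and the points `px`/`py` are made up for illustration; they stand in for `dens$x`, `dens$y`, `dens$z`, and your data.

```r
# Toy grid standing in for dens$x and dens$y (hypothetical values)
grid_x <- c(0, 1, 2, 3)
grid_y <- c(0, 10, 20)

# Toy "density" matrix: rows follow grid_x, columns follow grid_y
z <- matrix(1:12, nrow = length(grid_x), ncol = length(grid_y))

# Three example points
px <- c(0.5, 2.2, 3.0)
py <- c(12, 3, 20)

# findInterval() returns, for each value, the index of the last grid
# position that is <= that value
ix <- findInterval(px, grid_x)   # 1 3 4
iy <- findInterval(py, grid_y)   # 2 1 3

# Indexing a matrix with a two-column matrix picks one element per row:
# z[ix[1], iy[1]], z[ix[2], iy[2]], z[ix[3], iy[3]]
ii <- cbind(ix, iy)
point_values <- z[ii]            # 5 3 12
```

Note that `z[ix, iy]` (two separate vectors) would instead return a 3 x 3 sub-matrix, which is why the code binds the indices into one matrix first.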
If your question is about the MASS::kde2d function itself, it would be better to rewrite this Stack Overflow question to reflect that!
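In case it helps in the meantime, this is a quick way to inspect what MASS::kde2d returns. It is only a sketch on simulated data (the `rnorm` values and `n = 25` are arbitrary choices for illustration, not your dataset).

```r
set.seed(42)                 # only so the simulated data below is repeatable
x <- rnorm(200)
y <- rnorm(200)

# Estimate the 2D density on a 25 x 25 grid
dens <- MASS::kde2d(x, y, n = 25)

names(dens)       # "x" "y" "z"
length(dens$x)    # 25 grid positions along x
dim(dens$z)       # 25 25: one density estimate per grid position pair
range(dens$x)     # with default limits, the grid spans range(x)
```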
It looks like the same function is wrapped into the ggplot2 approach described here, so if you switch to ggplot2 for your plot, you could give that a try.
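As a rough sketch of what that could look like, the per-point density from get_density can be mapped to point color in ggplot2. This assumes the ggplot2 package is installed and uses simulated data in place of `subset2`; the point size and the viridis color scale are arbitrary choices, not necessarily what the linked approach uses.

```r
library(ggplot2)

# Same function as in the question
get_density <- function(x, y, ...) {
  dens <- MASS::kde2d(x, y, ...)
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  dens$z[cbind(ix, iy)]
}

# Simulated stand-in for the real data (hypothetical values)
set.seed(1)
dat <- data.frame(x = rnorm(2000), y = rnorm(2000))
dat$density <- get_density(dat$x, dat$y, n = 100)

# Color each point by the estimated density of its grid cell
p <- ggplot(dat, aes(x = x, y = y, color = density)) +
  geom_point(size = 0.8) +
  scale_color_viridis_c()

print(p)
```

If contour lines are enough, `geom_density_2d()` computes the kernel density with MASS::kde2d internally, so no helper function is needed for that variant.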