R: Loess regression produces a staircase-like graph, rather than a smooth curve, after the value 10
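What are the possible reasons for this? It always happens after the value 10.
A subset of the dataset around the region of interest, before and after applying the regression:
This is the ggplot2 call I use to generate the plot. The smoothing span used is 0.05.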
library(ggplot2)  # ggplot(), geom_line(), ...
library(reshape2) # melt(); assumed source of melt(), the post does not say

dat <- read.csv("before_loess.csv", stringsAsFactors = FALSE)
smoothed.data <- applyLoessSmooth(dat, 0.05) # dat is the dataset before being smoothed
scan.plot.data <- melt(smoothed.data, id.vars = "sample.diameters", variable.name = 'series')
scan.plot <- ggplot(data = scan.plot.data, aes(sample.diameters, value)) +
  geom_line(aes(colour = series)) +
  xlab("Diameters (nm)") +
  ylab("Concentration (dN#/cm^2)") +
  theme(plot.title = element_text(hjust = 0.5))
The function used to apply the loess filter:
applyLoessSmooth <- function(raw.data, smoothing.span) {
  raw.data <- raw.data[complete.cases(raw.data), ]
  ## response
  vars <- colnames(raw.data)
  ## covariate
  id <- 1:nrow(raw.data)
  ## define a loess filter function (fitting loess regression line)
  loess.filter <- function(x, given.data, span) loess(formula = as.formula(paste(x, "id", sep = "~")),
                                                      data = given.data,
                                                      degree = 1,
                                                      span = span)$fitted
  ## apply filter column-by-column
  loess.graph.data <- as.data.frame(lapply(vars, loess.filter, given.data = raw.data, span = smoothing.span),
                                    col.names = colnames(raw.data))
  sample.rows <- length(loess.graph.data[1])
  loess.graph.data <- loess.graph.data %>% mutate("sample.diameters" = raw.data$sample.diameters[1:nrow(raw.data)])
}
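The first issue is simply that your data are rounded to three significant figures. Below 10, the x-axis values scan.plot.data$sample.diameters increase in increments of 0.01, which produces a smooth-looking curve on the graph, but from 10 onwards they increase in increments of 0.1, which shows up as visible steps.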
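A minimal sketch of this rounding effect, using a hypothetical fine grid of diameters rather than your data: after signif(x, 3), the distinct values just below 10 are 0.01 apart, while those from 10 upward are 0.1 apart. The vector x and its 0.004 spacing are assumptions chosen only for illustration.

```r
# Hypothetical diameters on a fine grid straddling 10 (not the asker's data)
x <- seq(9.9, 10.3, by = 0.004)

# Round to three significant figures, as the data in the question appear to be
x3 <- signif(x, 3)

# Distinct rounded values on each side of 10
below <- unique(x3[x3 < 10])
above <- unique(x3[x3 >= 10])

# Smallest gap between neighbouring distinct values on each side
step.below <- min(diff(below))
step.above <- min(diff(above))

print(c(below = step.below, above = step.above))
```

The tenfold jump in grid spacing at 10 is what makes the steps visible only in that part of the plot.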
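The second issue is that you should be regressing against the values of sample.diameters, not against the row number id. I think this is leading to multiple smoothed values for each distinct value of x, hence the steps. Here are some suggested minor modifications to your function...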
applyLoessSmooth <- function(raw.data, smoothing.span) {
  raw.data <- raw.data[complete.cases(raw.data), ]
  vars <- colnames(raw.data)
  vars <- vars[vars != "sample.diameters"] # you are regressing against this, so exclude it from vars
  loess.filter <- function(x, given.data, span) loess(
    formula = as.formula(paste(x, "sample.diameters", sep = "~")), # not 'id'
    data = given.data,
    degree = 1,
    span = span)$fitted
  loess.graph.data <- as.data.frame(lapply(vars, loess.filter, given.data = raw.data,
                                           span = smoothing.span),
                                    col.names = vars) # final argument edited
  loess.graph.data$sample.diameters <- raw.data$sample.diameters # simplified
  return(loess.graph.data)
}
This seems to solve the problem...
Of course, you could also just do this...
dat.melt <- melt(dat, id.vars = "sample.diameters", variable.name = 'series')
ggplot(data = dat.melt, aes(sample.diameters, value, colour = series)) +
  geom_smooth(method = "loess", span = 0.05, se = FALSE)
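To check the second point (regressing on the row number rather than on sample.diameters), here is a rough sketch on synthetic data. Everything in it is an assumption made for illustration: the sine-shaped signal, the noise level, the span of 0.2 and the column name y are not taken from your file. It counts how many distinct fitted values each distinct x receives under the two formulas.

```r
# Synthetic data (hypothetical): x is rounded to 3 significant figures,
# so several consecutive rows share the same sample.diameters value
set.seed(1)
x.true <- seq(10, 20, length.out = 400)
d <- data.frame(sample.diameters = signif(x.true, 3),
                y = sin(x.true) + rnorm(400, sd = 0.05))
d$id <- seq_len(nrow(d))

# Fit once against the row number and once against the actual x values
fit.id <- loess(y ~ id, data = d, degree = 1, span = 0.2)$fitted
fit.x  <- loess(y ~ sample.diameters, data = d, degree = 1, span = 0.2)$fitted

# Number of distinct fitted values per distinct x (rounded to absorb float noise)
count.distinct <- function(v) length(unique(round(v, 8)))
n.per.x.id <- tapply(fit.id, d$sample.diameters, count.distinct)
n.per.x.x  <- tapply(fit.x,  d$sample.diameters, count.distinct)

print(c(max.by.id = max(n.per.x.id), max.by.x = max(n.per.x.x)))
```

When the fit is made against id, rows that share an x value get different fitted values, and geom_line draws them as a vertical jump at that x. Fitting against sample.diameters gives one fitted value per distinct x.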