How to avoid gaps due to missing values in matplot in R?

Question

I have a function that plots some data using matplot. The data structure looks like this:

test = data.frame(x = 1:10, a = 1:10, b = 11:20)
matplot(test[,-1])
matlines(test[,1], test[,-1])

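So far so good. However, if there are missing values in the data set, the resulting plot has gaps, and I would like to avoid them by connecting the edges of each gap.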

test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])

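In the real situation this sits inside a function, the matrix is larger, and the number of rows, the number of columns, and the positions of the non-overlapping missing values may change between calls, so I would like a solution that handles this flexibly. I also need to use matlines.

I was thinking of filling the gaps by interpolating the data, but perhaps there is a better solution.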

Answer 1

You can use the na.interpolation function from the imputeTS package (spelled na_interpolation since imputeTS 3.0):

test = data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])

library('imputeTS')

# na.interpolation() was renamed na_interpolation() in imputeTS 3.0
test <- na_interpolation(test, option = "linear")
matplot(test[,-1])
matlines(test[,1], test[,-1])
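
If adding a package is not an option, roughly the same linear fill can be sketched in base R with approx(). The helper name fill_na_linear is hypothetical and not part of imputeTS or of the answer above; it assumes x has no missing values and leaves leading or trailing NAs untouched.

```r
# Hypothetical helper (not from imputeTS): fill NAs in every column of y
# by linear interpolation over x, using only base R.
fill_na_linear <- function(x, y) {
  y <- as.matrix(y)
  apply(y, 2, function(col_i) {
    ok <- !is.na(col_i)
    # approx() needs at least two known points; otherwise leave the column as is
    if (sum(ok) < 2) return(col_i)
    # rule = 1: positions outside the observed x range stay NA
    approx(x[ok], col_i[ok], xout = x, rule = 1)$y
  })
}

test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA

filled <- fill_na_linear(test$x, test[, -1])
matplot(test$x, filled, type = "n")  # set up the axes only
matlines(test$x, filled)             # draw the gap-free lines
```

Because the helper takes x and y separately, it should adapt to matrices of any size, as long as x has one entry per row of y.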

Answer 2

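I actually ran into this situation today, but I didn't want to interpolate values; I just wanted the lines to "span the gap", so to speak. I came up with a solution that, in my opinion, is more elegant than interpolation, so I thought I'd post it even though the question is old.

The gaps are caused by NAs sitting between consecutive values. So my solution is to 'shift' the column values so that there are no NA gaps. For example, a column consisting of c(1,2,NA,NA,5) becomes c(1,2,5,NA,NA). I do this with a function called shift_vec_na() inside an apply() loop. The x values need to be adjusted too, so we turn the x values into a matrix using the same principle, but use the columns of the y matrix to decide which values to shift.

The function code is as follows: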

# x -> vector
# bool -> boolean vector; must be same length as x. The values of x where bool 
#   is TRUE will be 'shifted' to the front of the vector, and the back of the
#   vector will be all NA (i.e. the number of NAs in the resulting vector is
#   sum(!bool))
# returns the 'shifted' vector (will be the same length as x)
shift_vec_na <- function(x, bool){
  n <- sum(bool)
  if(n < length(x)){
    # seq_len(n) stays empty when n == 0, whereas 1:n would give c(1, 0)
    x[seq_len(n)] <- x[bool]
    x[(n + 1):length(x)] <- NA
  }
  return(x)
}

# x -> vector
# y -> matrix, where nrow(y) == length(x)
# returns a list of two elements ('x' and 'y') that contain the 'adjusted'
# values that can be used with 'matplot()'
adj_data_matplot <- function(x, y){
  y2 <- apply(y, 2, function(col_i){
    return(shift_vec_na(col_i, !is.na(col_i)))
  })
  
  x2 <- apply(y, 2, function(col_i){
    return(shift_vec_na(x, !is.na(col_i)))
  })
  return(list(x = x2, y = y2))
}

Then, using the sample data:

test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA
lst <- adj_data_matplot(test[,1], test[,-1])

matplot(lst$x, lst$y, type = "b")
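
Since the question asks for matlines, here is a self-contained sketch of the same shifting idea passed to matlines() instead. The helper compact_xy is a hypothetical loop-based variant written for this illustration, not the answer's own code; it assumes x is a numeric vector with one entry per row of y.

```r
# Hypothetical helper: move the non-NA (x, y) pairs of each column to the
# top of the column and pad the rest with NA, so no NA sits between points.
compact_xy <- function(x, y) {
  y <- as.matrix(y)
  x2 <- y2 <- matrix(NA_real_, nrow(y), ncol(y), dimnames = dimnames(y))
  for (j in seq_len(ncol(y))) {
    ok <- !is.na(y[, j])
    k <- sum(ok)
    x2[seq_len(k), j] <- x[ok]
    y2[seq_len(k), j] <- y[ok, j]
  }
  list(x = x2, y = y2)
}

test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA

adj <- compact_xy(test$x, test[, -1])
matplot(adj$x, adj$y, type = "n", xlab = "x", ylab = "value")  # axes only
matlines(adj$x, adj$y, type = "l")   # lines span the former gaps
matpoints(adj$x, adj$y, pch = 1)     # mark the observed points
```

The trailing NAs are harmless here because they come after the last real point of each column, so they do not break any line segment.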

Answer 3

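I had the same problem today. In my context I am not allowed to interpolate. Here is a minimal but sufficiently general working example of what I did. I hope it helps someone: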

# data: numeric matrix; each row is drawn as one line against the column index
mymatplot <- function(data, main=NULL, xlab=NULL, ylab=NULL,...){
    #graphical set up of the window
    plot.new()
    plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
    mtext(text = xlab,side = 1, line = 3)
    mtext(text = ylab,side = 2, line = 3)
    mtext(text = main,side = 3, line = 0)
    axis(1L)
    axis(2L)
    #plot the data
    for(i in 1:nrow(data)){
        nin.na <- !is.na(data[i,])
        lines(x=which(nin.na), y=data[i,nin.na], col = i,...)
    }
}

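The core trick is x = which(nin.na): it keeps each line's data points aligned with their indices on the x axis while skipping the NAs, so lines() connects the remaining points directly.

The lines

plot.new()
plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
mtext(text = xlab,side = 1, line = 3)
mtext(text = ylab,side = 2, line = 3)
mtext(text = main,side = 3, line = 0)
axis(1L)
axis(2L)

set up the graphical part of the window. range(data, na.rm=TRUE) sizes the plot region so that it contains all data points. mtext(...) labels the axes and provides the main title. The axes themselves are drawn by the axis(...) calls.
The for loop that follows plots the data.
The function header of mymatplot provides a ... argument as an optional pass-through for typical plot parameters such as lty, lwd, cex, etc.; these are passed on to lines.
A final word on the choice of colors: that is up to your taste.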
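
The same which()/logical-index trick can also be applied to data laid out as in the question, with one series per column and an explicit x vector. The following is a rough sketch under that assumption; the helper name lines_skip_na and its return value are hypothetical and only serve the illustration.

```r
# Hypothetical helper: for each column of y, draw a line through the
# non-NA points only, so missing values leave no gap.
lines_skip_na <- function(x, y, col = seq_len(ncol(y)), ...) {
  y <- as.matrix(y)
  col <- rep_len(col, ncol(y))
  for (j in seq_len(ncol(y))) {
    ok <- !is.na(y[, j])
    lines(x[ok], y[ok, j], col = col[j], ...)
  }
  # number of points actually drawn per column, returned invisibly
  invisible(colSums(!is.na(y)))
}

test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA

matplot(test$x, test[, -1], type = "p", pch = 1)  # points only, NAs skipped
lines_skip_na(test$x, test[, -1], lty = 1)        # connect across the gaps
```

As in mymatplot, extra graphical parameters such as lty or lwd are passed through ... to lines().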
