R Matrix package: Demean sparse matrix
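Is there a simple way to demean a sparse matrix by column while treating zero values as missing (using the Matrix package)?
I seem to be stuck on two issues:
Finding the appropriate column means
Empty cells are treated as zeros rather than as missing: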
M0 <- matrix(rep(1:5,4),nrow = 4)
M0[2,2] <- M0[2,3] <- 0
M <- as(M0, "sparseMatrix")
M
#[1,] 1 5 4 3 2
#[2,] 2 . . 4 3
#[3,] 3 2 1 5 4
#[4,] 4 3 2 1 5
colMeans(M)
#[1] 2.50 2.50 1.75 3.25 3.50
The correct result should be:
colMeans_correct <- colSums(M) / c(4,3,3,4,4)
colMeans_correct
#[1] 2.500000 3.333333 2.333333 3.250000 3.500000
Subtracting the column means
The subtraction is also performed on the missing cells:
sweep(M, 2, colMeans_correct)
#4 x 5 Matrix of class "dgeMatrix"
# [,1] [,2] [,3] [,4] [,5]
#[1,] -1.5 1.6666667 1.6666667 -0.25 -1.5
#[2,] -0.5 -3.3333333 -2.3333333 0.75 -0.5
#[3,] 0.5 -1.3333333 -1.3333333 1.75 0.5
#[4,] 1.5 -0.3333333 -0.3333333 -2.25 1.5
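P.S. I hope it is not a problem to post a question made up of two issues. They are tied to the same task and seem to reflect the same underlying problem: distinguishing missing values from actual zeros.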
One option is to divide the colSums by the colSums of the logical matrix that marks the non-zero entries:
colSums(M)/colSums(M!=0)
#[1] 2.500000 3.333333 2.333333 3.250000 3.500000
Another option is to replace the 0s with NA and use the na.rm = TRUE argument of colMeans:
colMeans(M*NA^!M, na.rm = TRUE)
#[1] 2.500000 3.333333 2.333333 3.250000 3.500000
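To unpack the NA^!M trick, here is a minimal sketch on the dense matrix M0 from the question (base R only, so it does not depend on how the Matrix classes dispatch these operators). The names zero, mask, masked and means_na are just illustrative.

```r
# Dense matrix from the question, with two zeros standing in for "missing".
M0 <- matrix(rep(1:5, 4), nrow = 4)
M0[2, 2] <- M0[2, 3] <- 0

zero   <- !M0        # logical matrix: TRUE where the entry is 0
mask   <- NA^zero    # NA^TRUE is NA, NA^FALSE is NA^0, which R defines as 1
masked <- M0 * mask  # zeros become NA, all other entries are unchanged

# Column means over the non-missing entries only.
means_na <- colMeans(masked, na.rm = TRUE)
means_na
```

Note that this builds a full mask with one entry per cell, so on a large sparse matrix it is likely to be much more memory-hungry than the colSums-based options.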
Or, as @user20650 commented:
colSums(M) / diff(M@p)
#[1] 2.500000 3.333333 2.333333 3.250000 3.500000
where 'p' is the pointer slot described in ?sparseMatrix:
In typical usage, p is missing, i and j are vectors of positive
integers and x is a numeric vector. These three vectors, which must
have the same length, form the triplet representation of the sparse
matrix.
If i or j is missing then p must be a non-decreasing integer vector
whose first element is zero. It provides the compressed, or “pointer”
representation of the row or column indices, whichever is missing. The
expanded form of p, rep(seq_along(dp),dp) where dp <- diff(p), is used
as the (1-based) row or column indices.
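For the second part of the question (subtracting the means without touching the empty cells), one possible approach is to work directly on the stored values in the x slot. This is a sketch, not something from the answer above: it assumes M is column-compressed (a dgCMatrix, hence the explicit coercion to "CsparseMatrix"), and the names dp, col_means, col_of_x and M_demeaned are made up for illustration. It uses the expanded form of p quoted above to find the column of each stored value.

```r
library(Matrix)

M0 <- matrix(rep(1:5, 4), nrow = 4)
M0[2, 2] <- M0[2, 3] <- 0
M <- as(M0, "CsparseMatrix")  # column-compressed, so M@p and M@x are available

dp        <- diff(M@p)                 # number of stored entries per column
col_means <- colSums(M) / dp           # means over the stored entries only
col_of_x  <- rep(seq_along(dp), dp)    # column index of each element of M@x

# Subtract each column's mean from its stored entries; empty cells stay empty.
M_demeaned <- M
M_demeaned@x <- M@x - col_means[col_of_x]
M_demeaned
```

The result stays sparse instead of turning into a dense dgeMatrix as with sweep. Two caveats: diff(M@p) counts stored entries, so any explicitly stored zeros would be counted as observed values, and a column with no stored entries gets a NaN mean (which is harmless here because there is nothing to subtract it from).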