Computing the null space of a bigmatrix in R
I can't find any function or package to compute the null space or the QR decomposition of a big.matrix (from library(bigmemory)) in R. For example:
library(bigmemory)
a <- big.matrix(1000000, 1000, type='double', init=0)
I tried the following, but got the errors shown. How can I find the null space of a bigmemory object?
a.qr <- Matrix::qr(a)
# Error in as.vector(data) :
# no method for coercing this S4 class to a vector
q.null <- MASS::Null(a)
# Error in as.vector(data) :
# no method for coercing this S4 class to a vector
@Mahon @user20650 @F.Privė For clarity, I contacted the bigmemory team and asked:
Essentially, is there an implementation of the QR function (QR Decomposition) that works with big memory matrixes?
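I thought it would be useful to clarify the question as originally asked. @F.Privė - great answer. Hopefully your answer and their response will help guide people in the future. Their reply follows: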
Thanks for the note. There is not currently an implementation of the qr decomposition. Ideally, you would implement this using Householder reflections (if the matrix is dense) or Givens rotations (if it is sparse).
The irlba package is compatible with bigmemory. It provides a truncated singular value decomposition. So, if your matrix is relatively sparse, you could truncate at the rank of the matrix. This is probably your best option. If you don't know the rank then you can use the package to update the truncation iteratively.
Please note that if your matrix is (tall and skinny or short and fat) then the SO solution is OK. However, anytime you resort to calculating the cross-product you lose some numerical stability. This can be an issue if you are planning on inverting the matrix.
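To make the bigmemory team's remark about the cross-product concrete, here is a small base-R sketch (the matrix and its size are made up for illustration). Forming t(A) %*% A roughly squares the condition number, which is where the loss of numerical stability comes from:

```r
# Small dense stand-in for a tall-and-skinny matrix (hypothetical data).
set.seed(1)
A <- matrix(rnorm(200 * 5), 200, 5)
# Make the last column nearly a copy of the first so A is ill-conditioned.
A[, 5] <- A[, 1] + 1e-6 * rnorm(200)

# 2-norm condition numbers computed from the singular values.
sv_A <- svd(A)$d
sv_K <- svd(crossprod(A))$d   # crossprod(A) is t(A) %*% A
cond_A <- max(sv_A) / min(sv_A)
cond_K <- max(sv_K) / min(sv_K)

# cond_K should be close to cond_A^2, so the ratio should be close to 1.
c(cond_A = cond_A, cond_K = cond_K, ratio = cond_K / cond_A^2)
```

In practice this means singular values smaller than about sqrt(.Machine$double.eps) times the largest one cannot be resolved reliably once you go through the cross-product.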
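If you want to compute the full SVD of the matrix, you can use the package bigstatsr to perform the computations by blocks. An FBM stands for Filebacked Big Matrix and is an object similar to the filebacked big.matrix object of package bigmemory.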
library(bigstatsr)
options(bigstatsr.block.sizeGB = 0.5)
# Initialize FBM with random numbers
a <- FBM(1e6, 1e3)
big_apply(a, a.FUN = function(X, ind) {
X[, ind] <- rnorm(nrow(X) * length(ind))
NULL
}, a.combine = 'c')
# Compute the cross-product t(a) %*% a
K <- big_crossprodSelf(a, big_scale(center = FALSE, scale = FALSE))
# Get v and d, where a = u %*% diag(d) %*% t(v) is the SVD of a
eig <- eigen(K[])
v <- eig$vectors
d <- sqrt(eig$values)
# Get u if you need it. It will be of the same size as a,
# so I store it as an FBM.
u <- FBM(nrow(a), ncol(a))
big_apply(u, a.FUN = function(X, ind, a, v, d) {
X[ind, ] <- sweep(a[ind, ] %*% v, 2, d, "/")
NULL
}, a.combine = 'c', block.size = 50e3, ind = rows_along(u),
a = a, v = v, d = d)
# Verification
ind <- sample(nrow(a), 1000)
all.equal(a[ind, ], tcrossprod(sweep(u[ind, ], 2, d, "*"), v))
This takes about 10 minutes on my computer.
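To get from these SVD factors back to the null space asked about in the question, one option is to keep the columns of v whose singular values are numerically zero. A minimal base-R sketch on a small hypothetical matrix follows; crossprod() stands in for big_crossprodSelf(), and the tolerance is my own assumption (not something bigstatsr provides), so tune it for real data:

```r
set.seed(42)
# Hypothetical small example: a 50 x 4 matrix whose 4th column is the sum
# of the first two, so the null space has dimension 1.
A <- matrix(rnorm(50 * 3), 50, 3)
A <- cbind(A, A[, 1] + A[, 2])

# Same route as the answer: eigen-decomposition of the cross-product.
K <- crossprod(A)
eig <- eigen(K, symmetric = TRUE)
# Clamp tiny negative eigenvalues caused by round-off before taking sqrt.
d <- sqrt(pmax(eig$values, 0))

# Assumed tolerance. It uses sqrt(eps) rather than eps because the
# cross-product squares the singular values.
tol <- max(dim(A)) * max(d) * sqrt(.Machine$double.eps)
null_basis <- eig$vectors[, d <= tol, drop = FALSE]

ncol(null_basis)             # estimated dimension of the null space
max(abs(A %*% null_basis))   # should be close to zero
```

With an FBM you would use K[] from big_crossprodSelf() in place of crossprod(A). Note that the random matrix in the answer has full column rank, so its null space is trivial and no singular value would fall below the tolerance.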