Title: | Parallel Low-Rank Approximation with Nonnegativity Constraints |
---|---|
Description: | 'Rcpp' bindings for 'PLANC', a highly parallel and extensible NMF/NTF (Non-negative Matrix/Tensor Factorization) library. Wraps algorithms described in Kannan et. al (2018) <doi:10.1109/TKDE.2017.2767592> and Eswar et. al (2021) <doi:10.1145/3432185>. Implements algorithms described in Welch et al. (2019) <doi:10.1016/j.cell.2019.05.006>, Gao et al. (2021) <doi:10.1038/s41587-021-00867-x>, and Kriebel & Welch (2022) <doi:10.1038/s41467-022-28431-4>. |
Authors: | Andrew Robbins [aut, cre], Yichen Wang [aut], Joshua Welch [cph], Ramakrishnan Kannan [cph] (The original PLANC code), UT-Batelle [cph] (The original PLANC code) |
Maintainer: | Andrew Robbins <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.0 |
Built: | 2024-11-28 05:41:56 UTC |
Source: | https://github.com/welch-lab/RcppPlanc |
Use the BPP algorithm to get the nonnegative least squares solution. Regular
NNLS problem is described as optimizing
where
and
are given and
is to be solved.
bppnnls
takes and
as input.
bppnnls_prod
takes
and
as
input to directly go for the intermediate step of BPP algorithm. This can be
useful when the dimensionality of
and
is large while
pre-calculating
and
is cheap.
bppnnls(C, B, nCores = 2L) bppnnls_prod(CtC, CtB, nCores = 2L)
bppnnls(C, B, nCores = 2L) bppnnls_prod(CtC, CtB, nCores = 2L)
C |
Input dense |
B |
Input |
nCores |
The number of parallel tasks that will be spawned.
Default |
CtC |
The |
CtB |
The |
The calculated solution matrix in dense form.
set.seed(1) C <- matrix(rnorm(250), nrow = 25) B <- matrix(rnorm(375), nrow = 25) res1 <- bppnnls(C, B) dim(res1) res2 <- bppnnls_prod(t(C) %*% C, t(C) %*% B) all.equal(res1, res2)
set.seed(1) C <- matrix(rnorm(250), nrow = 25) B <- matrix(rnorm(375), nrow = 25) res1 <- bppnnls(C, B) dim(res1) res2 <- bppnnls_prod(t(C) %*% C, t(C) %*% B) all.equal(res1, res2)
The two datasets, namingly ctrl.sparse
and stim.sparse
, are
single-cell transcriptomic data preprocessed and subsampled from the study
of Hyun Min Kang and et al., Nat Biotech., 2018. The raw datasets were two
sparse matrices of integer values indicating the counts of genes (rows) per
cell (columns). We normalized each column of both matrices by its library
size (sum), and selected common variable genes across the datasets. Finally,
we scaled the genes without centering them, in order to keep the
non-negativity. The processed datasets were then randomly subsampled for a
minimal example.
ctrl.sparse stim.sparse
ctrl.sparse stim.sparse
An object of class dgCMatrix
with 173 rows and 300 columns.
An object of class dgCMatrix
with 173 rows and 300 columns.
https://www.nature.com/articles/nbt.4042
Retrieve the dimension of H5SpMat argument list
## S3 method for class 'H5SpMat' dim(x) ## S3 replacement method for class 'H5SpMat' dim(x) <- value
## S3 method for class 'H5SpMat' dim(x) ## S3 replacement method for class 'H5SpMat' dim(x) <- value
x |
H5SpMat argument list object |
value |
Numeric vector of two, for number of rows and number of columns. |
Retriever returns a vector of two (nrow and ncol), setter sets the value of that in the argument list.
h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"), "data", "indices", "indptr", 173, 300) dim(h) nrow(h) ncol(h) dim(h) <- c(200, 200) h
h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"), "data", "indices", "indptr", 173, 300) dim(h) nrow(h) ncol(h) dim(h) <- c(200, 200) h
Prepare character information of a H5Mat object
## S3 method for class 'H5Mat' format(x, ...)
## S3 method for class 'H5Mat' format(x, ...)
x |
H5Mat argument list object |
... |
Not used. |
A character scalar of the displayed message
h <- H5Mat(system.file("extdata/ctrl_dense.h5", package = "RcppPlanc"), "data") format(h)
h <- H5Mat(system.file("extdata/ctrl_dense.h5", package = "RcppPlanc"), "data") format(h)
prepare character information of a H5SpMat object
## S3 method for class 'H5SpMat' format(x, ...)
## S3 method for class 'H5SpMat' format(x, ...)
x |
H5SpMat argument list object |
... |
Not used. |
A character scalar of the displayed message
h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"), "data", "indices", "indptr", 173, 300) format(h)
h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"), "data", "indices", "indptr", 173, 300) format(h)
For running inmf
, onlineINMF
or
uinmf
with dense matrix stored in HDF5 file, users will need
to construct an argument list for the filename of the HDF5 file as well as
the path in the file storing the matrix. H5Mat
is provided as an
instructed constructor. Meanwhile, since the INMF functions require that
all datasets should be of the same type, as.H5Mat
is provided for
writing in-memory data into a new HDF5 file on disk and returning the
constructed argument list.
H5Mat(filename, dataPath) as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...) ## S3 method for class 'matrix' as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...) ## S3 method for class 'dgCMatrix' as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...) ## Default S3 method: as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...)
H5Mat(filename, dataPath) as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...) ## S3 method for class 'matrix' as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...) ## S3 method for class 'dgCMatrix' as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...) ## Default S3 method: as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...)
filename |
Filename of the HDF5 file |
dataPath |
Path in the HDF5 file that points to a 2D dense matrix.
Default |
x |
For |
overwrite |
Logical, whether to overwrite the file if already exists at
the given path. Default |
... |
not used |
H5Mat object, indeed a list object.
if (require("withr")) { H5MatEx <- function(){ withr::local_dir(withr::local_tempdir()) h <- H5Mat(system.file("extdata/ctrl_dense.h5", package = "RcppPlanc"), "data") print(h) library(Matrix) ctrl.dense <- as.matrix(ctrl.sparse) h1 <- as.H5Mat(ctrl.dense, "ctrl_from_dense_to_dense.h5", dataPath = "data") h1 h2 <- as.H5Mat(ctrl.sparse, "ctrl_from_sparse_to_dense.h5", dataPath = "data") } H5MatEx() }
if (require("withr")) { H5MatEx <- function(){ withr::local_dir(withr::local_tempdir()) h <- H5Mat(system.file("extdata/ctrl_dense.h5", package = "RcppPlanc"), "data") print(h) library(Matrix) ctrl.dense <- as.matrix(ctrl.sparse) h1 <- as.H5Mat(ctrl.dense, "ctrl_from_dense_to_dense.h5", dataPath = "data") h1 h2 <- as.H5Mat(ctrl.sparse, "ctrl_from_sparse_to_dense.h5", dataPath = "data") } H5MatEx() }
For running inmf
, onlineINMF
or
uinmf
with sparse matrix stored in HDF5 file, users will need
to construct an argument list for the filename of the HDF5 file as well as
the paths in the file storing the arrays that construct the CSC (compressed
sparse column) matrix. H5SpMat
is provided as an instructed
constructor. Meanwhile, since the INMF functions require that all datasets
should be of the same type, as.H5SpMat
is provided for writing
in-memory data into a new HDF5 file on disk and returning the
constructed argument list.
H5SpMat(filename, valuePath, rowindPath, colptrPath, nrow, ncol) as.H5SpMat( x, filename, valuePath = "data", rowindPath = "indices", colptrPath = "indptr", overwrite = FALSE, ... ) ## S3 method for class 'matrix' as.H5SpMat( x, filename, valuePath = "data", rowindPath = "indices", colptrPath = "indptr", overwrite = FALSE, ... ) ## S3 method for class 'dgCMatrix' as.H5SpMat( x, filename, valuePath = "data", rowindPath = "indices", colptrPath = "indptr", overwrite = FALSE, ... ) ## Default S3 method: as.H5SpMat( x, filename, valuePath = "data", rowindPath = "indices", colptrPath = "indptr", overwrite = FALSE, ... )
H5SpMat(filename, valuePath, rowindPath, colptrPath, nrow, ncol) as.H5SpMat( x, filename, valuePath = "data", rowindPath = "indices", colptrPath = "indptr", overwrite = FALSE, ... ) ## S3 method for class 'matrix' as.H5SpMat( x, filename, valuePath = "data", rowindPath = "indices", colptrPath = "indptr", overwrite = FALSE, ... ) ## S3 method for class 'dgCMatrix' as.H5SpMat( x, filename, valuePath = "data", rowindPath = "indices", colptrPath = "indptr", overwrite = FALSE, ... ) ## Default S3 method: as.H5SpMat( x, filename, valuePath = "data", rowindPath = "indices", colptrPath = "indptr", overwrite = FALSE, ... )
filename |
Filename of the HDF5 file |
valuePath |
Path in the HDF5 file that points to a 1D array storing the
non-zero values of the sparse matrix. Default |
rowindPath |
Path in the HDF5 file that points to a 1D integer array
storing the row indices of non-zero values in each column of the sparse
matrix. Default |
colptrPath |
Path in the HDF5 file that points to a 1D integer array
storing the number of non-zero values in each column of the sparse matrix.
Default |
nrow , ncol
|
Integer, the true dimensionality of the sparse matrix. |
x |
For |
overwrite |
Logical, whether to overwrite the file if already exists at
the given path. Default |
... |
not used |
H5SpMat object, indeed a list object.
if (require("withr")) { H5SpMatEx <- function(){ withr::local_dir(withr::local_tempdir()) h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"), "data", "indices", "indptr", 173, 300) dim(h) library(Matrix) ctrl.dense <- as.matrix(ctrl.sparse) h1 <- as.H5SpMat(ctrl.sparse, "ctrl_from_sparse_to_sparse.h5", valuePath = "data", rowindPath = "indices", colptrPath = "indptr", nrow = nrow(ctrl.sparse), ncol = ncol(ctrl.sparse)) h1 h2 <- as.H5SpMat(ctrl.dense, "ctrl_from_dense_to_sparse.h5", valuePath = "data", rowindPath = "indices", colptrPath = "indptr", nrow = nrow(ctrl.sparse), ncol = ncol(ctrl.sparse)) h2 } H5SpMatEx() }
if (require("withr")) { H5SpMatEx <- function(){ withr::local_dir(withr::local_tempdir()) h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"), "data", "indices", "indptr", 173, 300) dim(h) library(Matrix) ctrl.dense <- as.matrix(ctrl.sparse) h1 <- as.H5SpMat(ctrl.sparse, "ctrl_from_sparse_to_sparse.h5", valuePath = "data", rowindPath = "indices", colptrPath = "indptr", nrow = nrow(ctrl.sparse), ncol = ncol(ctrl.sparse)) h1 h2 <- as.H5SpMat(ctrl.dense, "ctrl_from_dense_to_sparse.h5", valuePath = "data", rowindPath = "indices", colptrPath = "indptr", nrow = nrow(ctrl.sparse), ncol = ncol(ctrl.sparse)) h2 } H5SpMatEx() }
Performs integrative non-negative matrix factorization (iNMF) (J.D. Welch,
2019) to return factorized ,
, and
matrices. The
objective function is stated as
where is the input non-negative matrix of the
'th dataset,
is the total number of datasets.
is of size
for
features and
sample points,
is of size
,
is of size
, and
is of size
.
inmf
optimizes the objective with ANLS strategy, while
onlineINMF
optimizes the same objective with an online learning
strategy.
inmf( objectList, k = 20, lambda = 5, niter = 30, nCores = 2, Hinit = NULL, Vinit = NULL, Winit = NULL, verbose = FALSE )
inmf( objectList, k = 20, lambda = 5, niter = 30, nCores = 2, Hinit = NULL, Vinit = NULL, Winit = NULL, verbose = FALSE )
objectList |
list of input datasets. List elements should all be of the same class. Viable classes include: matrix, dgCMatrix, H5Mat, H5SpMat. |
k |
Integer. Inner dimensionality to factorize the datasets into.
Default |
lambda |
Regularization parameter. Larger values penalize
dataset-specific effects more strongly (i.e. alignment should increase as
|
niter |
Integer. Total number of block coordinate descent iterations to
perform. Default |
nCores |
The number of parallel tasks that will be spawned.
Default |
Hinit |
Initial values to use for |
Vinit |
Similar to |
Winit |
Initial values to use for |
verbose |
Logical scalar. Whether to show information and progress.
Default |
A list of the following elements:
H
- a list of result matrices of size
V
- a list of result matrices
W
- the result matrix
objErr
- the final objective error value.
Yichen Wang
Joshua D. Welch and et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019
library(Matrix) set.seed(1) result <- inmf(list(ctrl.sparse, stim.sparse), k = 10, niter = 10)
library(Matrix) set.seed(1) result <- inmf(list(ctrl.sparse, stim.sparse), k = 10, niter = 10)
Regularly, Non-negative Matrix Factorization (NMF) is factorizes input
matrix into low rank matrices
and
, so that
. The objective function can be stated as
. In practice,
is usually
regarded as a matrix of
features by
sample points. And the
result matrix
should have the dimensionality of
and
H with
(transposed).
This function wraps the algorithms implemented in PLANC library to solve
NMF problems. Algorithms includes Alternating Non-negative Least Squares
with Block Principal Pivoting (ANLS-BPP), Alternating Direction Method of
Multipliers (ADMM), Hierarchical Alternating Least Squares (HALS), and
Multiplicative Update (MU).
nmf( x, k, niter = 30L, algo = "anlsbpp", nCores = 2L, Winit = NULL, Hinit = NULL )
nmf( x, k, niter = 30L, algo = "anlsbpp", nCores = 2L, Winit = NULL, Hinit = NULL )
x |
Input matrix for factorization. Can be either dense or sparse. |
k |
Integer. Factor matrix rank. |
niter |
Integer. Maximum number of NMF interations. |
algo |
Algorithm to perform the factorization, choose from "anlsbpp", "admm", "hals" or "mu". See detailed sections. |
nCores |
The number of parallel tasks that will be spawned. Only applies to anlsbpp.
Default |
Winit |
Initial left-hand factor matrix, must be of size m x k. |
Hinit |
Initial right-hand factor matrix, must be of size n x k. |
A list with the following elements:
W
- the result left-hand factor matrix
H
- the result right hand matrix.
objErr
- the objective error of the factorization.
Ramakrishnan Kannan and et al., A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization, PPoPP '16, 2016, 10.1145/2851141.2851152
Performs integrative non-negative matrix factorization (iNMF) (J.D. Welch,
2019, C. Gao, 2021) using online learning approach to return factorized
,
, and
matrices. The objective function is stated as
where is the input non-negative matrix of the
'th dataset,
is the total number of datasets.
is of size
for
features and
sample points,
is of size
,
is of size
, and
is of size
.
Different from inmf
which optimizes the objective with ANLS
approach, onlineINMF
optimizes the same objective with online learning
strategy, where it updates mini-batches of solving the NNLS
problem, and updates
and
with HALS multiplicative method.
This function allows online learning in 3 scenarios:
Fully observed datasets;
Iterative refinement using continually arriving datasets;
Projection of new datasets without updating the existing factorization
onlineINMF( objectList, newDatasets = NULL, project = FALSE, k = 20, lambda = 5, maxEpoch = 5, minibatchSize = 5000, maxHALSIter = 1, nCores = 2, Vinit = NULL, Winit = NULL, Ainit = NULL, Binit = NULL, verbose = FALSE )
onlineINMF( objectList, newDatasets = NULL, project = FALSE, k = 20, lambda = 5, maxEpoch = 5, minibatchSize = 5000, maxHALSIter = 1, nCores = 2, Vinit = NULL, Winit = NULL, Ainit = NULL, Binit = NULL, verbose = FALSE )
objectList |
list of input datasets. List elements should all be of the same class. Viable classes include: matrix, dgCMatrix, H5Mat, H5SpMat. |
newDatasets |
Same requirements as for new arriving datasets. Default
|
project |
Logical scalar, whether to run scenario 3. See description.
Default |
k |
Integer. Inner dimensionality to factorize the datasets into.
Default |
lambda |
Regularization parameter. Larger values penalize
dataset-specific effects more strongly (i.e. alignment should increase as
|
maxEpoch |
The number of epochs to iterate through. Default |
minibatchSize |
Total number of cells in each mini-batch. Default
|
maxHALSIter |
Maximum number of block coordinate descent (HALS
algorithm) iterations to perform for each update of |
nCores |
The number of parallel tasks that will be spawned.
Default |
Vinit , Winit , Ainit , Binit
|
Pass the previous factorization result for
datasets existing in |
verbose |
Logical scalar. Whether to show information and progress.
Default |
A list of the following elements:
H
- a list of result matrices of size
V
- a list of result matrices
W
- the result matrix
A
- a list of result matrices,
B
- a list of result matrices,
objErr
- the final objective error value.
Yichen Wang
Joshua D. Welch and et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019
Chao Gao and et al., Iterative single-cell multi-omic integration using online learning, Nat Biotechnol., 2021
library(Matrix) # Scenario 1 with sparse matrices set.seed(1) res1 <- onlineINMF(list(ctrl.sparse, stim.sparse), minibatchSize = 50, k = 10) # Scenario 2 with H5 dense matrices h5dense1 <- H5Mat(filename = system.file("extdata", "ctrl_dense.h5", package = "RcppPlanc", mustWork = TRUE), dataPath = "scaleData") h5dense2 <- H5Mat(filename = system.file("extdata", "stim_dense.h5", package = "RcppPlanc", mustWork = TRUE), dataPath = "scaleData") res2 <- onlineINMF(list(ctrl = h5dense1), minibatchSize = 50, k = 10) res3 <- onlineINMF(list(ctrl = h5dense1), newDatasets = list(stim = h5dense2), Vinit = res2$V, Winit = res2$W, Ainit = res2$A, Binit = res2$B, minibatchSize = 50, k = 10) # Scenario 3 with H5 sparse matrices h5sparse1 <- H5SpMat(filename = system.file("extdata", "ctrl_sparse.h5", package = "RcppPlanc", mustWork = TRUE), valuePath = "scaleDataSparse/data", rowindPath = "scaleDataSparse/indices", colptrPath = "scaleDataSparse/indptr", nrow = nrow(ctrl.sparse), ncol = ncol(ctrl.sparse)) h5sparse2 <- H5SpMat(filename = system.file("extdata", "stim_sparse.h5", package = "RcppPlanc", mustWork = TRUE), valuePath = "scaleDataSparse/data", rowindPath = "scaleDataSparse/indices", colptrPath = "scaleDataSparse/indptr", nrow = nrow(stim.sparse), ncol = nrow(stim.sparse)) res4 <- onlineINMF(list(ctrl = h5sparse1), minibatchSize = 50, k = 10) res5 <- onlineINMF(list(ctrl = h5sparse1), newDatasets = list(stim = h5sparse2), project = TRUE, Vinit = res4$V, Winit = res4$W, Ainit = res4$A, Binit = res4$B, minibatchSize = 50, k = 10)
library(Matrix) # Scenario 1 with sparse matrices set.seed(1) res1 <- onlineINMF(list(ctrl.sparse, stim.sparse), minibatchSize = 50, k = 10) # Scenario 2 with H5 dense matrices h5dense1 <- H5Mat(filename = system.file("extdata", "ctrl_dense.h5", package = "RcppPlanc", mustWork = TRUE), dataPath = "scaleData") h5dense2 <- H5Mat(filename = system.file("extdata", "stim_dense.h5", package = "RcppPlanc", mustWork = TRUE), dataPath = "scaleData") res2 <- onlineINMF(list(ctrl = h5dense1), minibatchSize = 50, k = 10) res3 <- onlineINMF(list(ctrl = h5dense1), newDatasets = list(stim = h5dense2), Vinit = res2$V, Winit = res2$W, Ainit = res2$A, Binit = res2$B, minibatchSize = 50, k = 10) # Scenario 3 with H5 sparse matrices h5sparse1 <- H5SpMat(filename = system.file("extdata", "ctrl_sparse.h5", package = "RcppPlanc", mustWork = TRUE), valuePath = "scaleDataSparse/data", rowindPath = "scaleDataSparse/indices", colptrPath = "scaleDataSparse/indptr", nrow = nrow(ctrl.sparse), ncol = ncol(ctrl.sparse)) h5sparse2 <- H5SpMat(filename = system.file("extdata", "stim_sparse.h5", package = "RcppPlanc", mustWork = TRUE), valuePath = "scaleDataSparse/data", rowindPath = "scaleDataSparse/indices", colptrPath = "scaleDataSparse/indptr", nrow = nrow(stim.sparse), ncol = nrow(stim.sparse)) res4 <- onlineINMF(list(ctrl = h5sparse1), minibatchSize = 50, k = 10) res5 <- onlineINMF(list(ctrl = h5sparse1), newDatasets = list(stim = h5sparse2), project = TRUE, Vinit = res4$V, Winit = res4$W, Ainit = res4$A, Binit = res4$B, minibatchSize = 50, k = 10)
Show information of a H5Mat object
## S3 method for class 'H5Mat' print(x, ...)
## S3 method for class 'H5Mat' print(x, ...)
x |
H5Mat argument list object |
... |
Not used. |
NULL. Information displayed.
h <- H5Mat(system.file("extdata/ctrl_dense.h5", package = "RcppPlanc"), "data") print(h)
h <- H5Mat(system.file("extdata/ctrl_dense.h5", package = "RcppPlanc"), "data") print(h)
Show information of a H5SpMat object
## S3 method for class 'H5SpMat' print(x, ...)
## S3 method for class 'H5SpMat' print(x, ...)
x |
H5SpMat argument list object |
... |
Not used. |
NULL. Information displayed.
h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"), "data", "indices", "indptr", 173, 300) print(h)
h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"), "data", "indices", "indptr", 173, 300) print(h)
Symmetric input matrix of size
is required. Two
approaches are provided. Alternating Non-negative Least Squares Block
Principal Pivoting algorithm (ANLSBPP) with symmetric regularization, where
the objective function is set to be
, can be run with
algo = "anlsbpp"
.
Gaussian-Newton algorithm, where the objective function is set to be
, can be run with
algo =
"gnsym"
. In the objectives, is of size
and
is of size
. The returned results will all be
.
symNMF( x, k, niter = 30L, lambda = 0, algo = "gnsym", nCores = 2L, Hinit = NULL )
symNMF( x, k, niter = 30L, lambda = 0, algo = "gnsym", nCores = 2L, Hinit = NULL )
x |
Input matrix for factorization. Must be symmetric. Can be either dense or sparse. |
k |
Integer. Factor matrix rank. |
niter |
Integer. Maximum number of symNMF interations.
Default |
lambda |
Symmetric regularization parameter. Must be
non-negative. Default |
algo |
Algorithm to perform the factorization, choose from "gnsym" or
"anlsbpp". Default |
nCores |
The number of parallel tasks that will be spawned. Only applies to anlsbpp.
Default |
Hinit |
Initial right-hand factor matrix, must be of size n x k.
Default |
A list with the following elements:
W
- the result left-hand factor matrix, non-empty when using
"anlsbpp"
H
- the result right hand matrix.
objErr
- the objective error of the factorization.
Srinivas Eswar and et al., Distributed-Memory Parallel Symmetric Nonnegative Matrix Factorization, SC '20, 2020, 10.5555/3433701.3433799
Performs mosaic integrative non-negative matrix factorization (UINMF) (A.R.
Kriebel, 2022) to return factorized ,
,
and
matrices. The objective function is stated as
where is the input non-negative matrix of the
'th dataset,
is the input non-negative matrix for the unshared features,
is the total number of datasets.
is of size
for
shared features and
sample points,
is of size
for
unshared feaetures,
is of size
,
is of size
,
is of size
and
is of
size
.
Similar to inmf
, uinmf
also optimizes the objective with
ANLS algorithm.
uinmf( objectList, unsharedList, k = 20, lambda = 5, niter = 30, nCores = 2, verbose = FALSE )
uinmf( objectList, unsharedList, k = 20, lambda = 5, niter = 30, nCores = 2, verbose = FALSE )
objectList |
list of input datasets. List elements should all be of the same class. Viable classes include: matrix, dgCMatrix, H5Mat, H5SpMat. |
unsharedList |
List of input unshared feature matrices, with the same
requirement as |
k |
Integer. Inner dimensionality to factorize the datasets into.
Default |
lambda |
Regularization parameter. Use one number for all datasets or a
vector to specify for each dataset. Larger values penalize dataset-specific
effects more strongly (i.e. alignment should increase as |
niter |
Integer. Total number of block coordinate descent iterations to
perform. Default |
nCores |
The number of parallel tasks that will be spawned.
Default |
verbose |
Logical scalar. Whether to show information and progress.
Default |
A list of the following elements:
H
- a list of result matrices of size
V
- a list of result matrices
W
- the result matrix
U
- a list of result matrices
objErr
- the final objective error value.
Yichen Wang
April R. Kriebel and Joshua D. Welch, UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization, Nat. Comm., 2022
# Fake matrices representing unshared features of the given datasets # Real-life use should have features that are not presented in the # intersection of features of all datasets involved. ctrl.unshared <- ctrl.sparse[1:10,] stim.unshared <- stim.sparse[11:30,] set.seed(1) result <- uinmf(list(ctrl.sparse, stim.sparse), list(ctrl.unshared, stim.unshared))
# Fake matrices representing unshared features of the given datasets # Real-life use should have features that are not presented in the # intersection of features of all datasets involved. ctrl.unshared <- ctrl.sparse[1:10,] stim.unshared <- stim.sparse[11:30,] set.seed(1) result <- uinmf(list(ctrl.sparse, stim.sparse), list(ctrl.unshared, stim.unshared))