Package 'RcppPlanc' reference manual

Title:	Parallel Low-Rank Approximation with Nonnegativity Constraints
Description:	'Rcpp' bindings for 'PLANC', a highly parallel and extensible NMF/NTF (Non-negative Matrix/Tensor Factorization) library. Wraps algorithms described in Kannan et. al (2018) <doi:10.1109/TKDE.2017.2767592> and Eswar et. al (2021) <doi:10.1145/3432185>. Implements algorithms described in Welch et al. (2019) <doi:10.1016/j.cell.2019.05.006>, Gao et al. (2021) <doi:10.1038/s41587-021-00867-x>, and Kriebel & Welch (2022) <doi:10.1038/s41467-022-28431-4>.
Authors:	Andrew Robbins [aut, cre], Yichen Wang [aut], Joshua Welch [cph], Ramakrishnan Kannan [cph] (The original PLANC code), UT-Batelle [cph] (The original PLANC code)
Maintainer:	Andrew Robbins <[email protected]>
License:	GPL (>= 2)
Version:	1.0.0
Built:	2025-04-02 05:53:24 UTC
Source:	https://github.com/welch-lab/RcppPlanc

Block Principal Pivoted Non-Negative Least Squares

Description

Use the BPP algorithm to get the nonnegative least squares solution. Regular NNLS problem is described as optimizing $\min_{x\ge0}||CX - B||_F^2$ where $C$ and $B$ are given and $X$ is to be solved. bppnnls takes $C$ and $B$ as input. bppnnls_prod takes $C^\mathsf{T}C$ and $C^\mathsf{T}B$ as input to directly go for the intermediate step of BPP algorithm. This can be useful when the dimensionality of $C$ and $B$ is large while pre-calculating $C^\mathsf{T}C$ and $C^\mathsf{T}B$ is cheap.

Usage

bppnnls(C, B, nCores = 2L)

bppnnls_prod(CtC, CtB, nCores = 2L)
bppnnls(C, B, nCores = 2L)

bppnnls_prod(CtC, CtB, nCores = 2L)

Arguments

`C`	Input dense $C$ matrix
`B`	Input $B$ matrix of either dense or sparse form
`nCores`	The number of parallel tasks that will be spawned. Default `2`
`CtC`	The $C^\mathsf{T}C$ matrix, see description.
`CtB`	The $C^\mathsf{T}B$ matrix, see description.

Value

The calculated solution matrix in dense form.

Examples

set.seed(1)
C <- matrix(rnorm(250), nrow = 25)
B <- matrix(rnorm(375), nrow = 25)
res1 <- bppnnls(C, B)
dim(res1)
res2 <- bppnnls_prod(t(C) %*% C, t(C) %*% B)
all.equal(res1, res2)
set.seed(1)
C <- matrix(rnorm(250), nrow = 25)
B <- matrix(rnorm(375), nrow = 25)
res1 <- bppnnls(C, B)
dim(res1)
res2 <- bppnnls_prod(t(C) %*% C, t(C) %*% B)
all.equal(res1, res2)

Example single-cell transcriptomic data in sparse form

Description

The two datasets, namingly ctrl.sparse and stim.sparse, are single-cell transcriptomic data preprocessed and subsampled from the study of Hyun Min Kang and et al., Nat Biotech., 2018. The raw datasets were two sparse matrices of integer values indicating the counts of genes (rows) per cell (columns). We normalized each column of both matrices by its library size (sum), and selected common variable genes across the datasets. Finally, we scaled the genes without centering them, in order to keep the non-negativity. The processed datasets were then randomly subsampled for a minimal example.

Usage

ctrl.sparse

stim.sparse
ctrl.sparse

stim.sparse

Format

An object of class dgCMatrix with 173 rows and 300 columns.

Source

https://www.nature.com/articles/nbt.4042

Retrieve the dimension of H5SpMat argument list

Description

Retrieve the dimension of H5SpMat argument list

Usage

## S3 method for class 'H5SpMat'
dim(x)

## S3 replacement method for class 'H5SpMat'
dim(x) <- value
## S3 method for class 'H5SpMat'
dim(x)

## S3 replacement method for class 'H5SpMat'
dim(x) <- value

Arguments

`x`	H5SpMat argument list object
`value`	Numeric vector of two, for number of rows and number of columns.

Value

Retriever returns a vector of two (nrow and ncol), setter sets the value of that in the argument list.

Examples

h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"),
             "data", "indices", "indptr", 173, 300)
dim(h)
nrow(h)
ncol(h)
dim(h) <- c(200, 200)
h
h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"),
             "data", "indices", "indptr", 173, 300)
dim(h)
nrow(h)
ncol(h)
dim(h) <- c(200, 200)
h

Prepare character information of a H5Mat object

Description

Prepare character information of a H5Mat object

Usage

## S3 method for class 'H5Mat'
format(x, ...)
## S3 method for class 'H5Mat'
format(x, ...)

Arguments

`x`	H5Mat argument list object
`...`	Not used.

Value

A character scalar of the displayed message

Examples

h <- H5Mat(system.file("extdata/ctrl_dense.h5", package = "RcppPlanc"),
           "data")
format(h)
h <- H5Mat(system.file("extdata/ctrl_dense.h5", package = "RcppPlanc"),
           "data")
format(h)

prepare character information of a H5SpMat object

Description

prepare character information of a H5SpMat object

Usage

## S3 method for class 'H5SpMat'
format(x, ...)
## S3 method for class 'H5SpMat'
format(x, ...)

Arguments

`x`	H5SpMat argument list object
`...`	Not used.

Value

A character scalar of the displayed message

Examples

h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"),
             "data", "indices", "indptr", 173, 300)
format(h)
h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"),
             "data", "indices", "indptr", 173, 300)
format(h)

Argument list object for using a dense matrix stored in HDF5 file

Description

For running inmf, onlineINMF or uinmf with dense matrix stored in HDF5 file, users will need to construct an argument list for the filename of the HDF5 file as well as the path in the file storing the matrix. H5Mat is provided as an instructed constructor. Meanwhile, since the INMF functions require that all datasets should be of the same type, as.H5Mat is provided for writing in-memory data into a new HDF5 file on disk and returning the constructed argument list.

Usage

H5Mat(filename, dataPath)

as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...)

## S3 method for class 'matrix'
as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...)

## S3 method for class 'dgCMatrix'
as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...)

## Default S3 method:
as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...)
H5Mat(filename, dataPath)

as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...)

## S3 method for class 'matrix'
as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...)

## S3 method for class 'dgCMatrix'
as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...)

## Default S3 method:
as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...)

Arguments

`filename`	Filename of the HDF5 file
`dataPath`	Path in the HDF5 file that points to a 2D dense matrix. Default `"data"` when using `as.H5Mat`.
`x`	For `as.H5Mat`, matrix of either dense or sparse type to be written; for `print`, a `H5Mat` argument list object
`overwrite`	Logical, whether to overwrite the file if already exists at the given path. Default `FALSE`.
`...`	not used

Value

H5Mat object, indeed a list object.

Examples

if (require("withr")) {
H5MatEx <- function(){
withr::local_dir(withr::local_tempdir())
h <- H5Mat(system.file("extdata/ctrl_dense.h5", package = "RcppPlanc"),
           "data")
print(h)

library(Matrix)
ctrl.dense <- as.matrix(ctrl.sparse)
h1 <- as.H5Mat(ctrl.dense, "ctrl_from_dense_to_dense.h5",
               dataPath = "data")
h1
h2 <- as.H5Mat(ctrl.sparse, "ctrl_from_sparse_to_dense.h5",
               dataPath = "data")
}
H5MatEx()
}
if (require("withr")) {
H5MatEx <- function(){
withr::local_dir(withr::local_tempdir())
h <- H5Mat(system.file("extdata/ctrl_dense.h5", package = "RcppPlanc"),
           "data")
print(h)

library(Matrix)
ctrl.dense <- as.matrix(ctrl.sparse)
h1 <- as.H5Mat(ctrl.dense, "ctrl_from_dense_to_dense.h5",
               dataPath = "data")
h1
h2 <- as.H5Mat(ctrl.sparse, "ctrl_from_sparse_to_dense.h5",
               dataPath = "data")
}
H5MatEx()
}

Argument list object for using a sparse matrix stored in HDF5 file

Description

For running inmf, onlineINMF or uinmf with sparse matrix stored in HDF5 file, users will need to construct an argument list for the filename of the HDF5 file as well as the paths in the file storing the arrays that construct the CSC (compressed sparse column) matrix. H5SpMat is provided as an instructed constructor. Meanwhile, since the INMF functions require that all datasets should be of the same type, as.H5SpMat is provided for writing in-memory data into a new HDF5 file on disk and returning the constructed argument list.

Usage

H5SpMat(filename, valuePath, rowindPath, colptrPath, nrow, ncol)

as.H5SpMat(
  x,
  filename,
  valuePath = "data",
  rowindPath = "indices",
  colptrPath = "indptr",
  overwrite = FALSE,
  ...
)

## S3 method for class 'matrix'
as.H5SpMat(
  x,
  filename,
  valuePath = "data",
  rowindPath = "indices",
  colptrPath = "indptr",
  overwrite = FALSE,
  ...
)

## S3 method for class 'dgCMatrix'
as.H5SpMat(
  x,
  filename,
  valuePath = "data",
  rowindPath = "indices",
  colptrPath = "indptr",
  overwrite = FALSE,
  ...
)

## Default S3 method:
as.H5SpMat(
  x,
  filename,
  valuePath = "data",
  rowindPath = "indices",
  colptrPath = "indptr",
  overwrite = FALSE,
  ...
)
H5SpMat(filename, valuePath, rowindPath, colptrPath, nrow, ncol)

as.H5SpMat(
  x,
  filename,
  valuePath = "data",
  rowindPath = "indices",
  colptrPath = "indptr",
  overwrite = FALSE,
  ...
)

## S3 method for class 'matrix'
as.H5SpMat(
  x,
  filename,
  valuePath = "data",
  rowindPath = "indices",
  colptrPath = "indptr",
  overwrite = FALSE,
  ...
)

## S3 method for class 'dgCMatrix'
as.H5SpMat(
  x,
  filename,
  valuePath = "data",
  rowindPath = "indices",
  colptrPath = "indptr",
  overwrite = FALSE,
  ...
)

## Default S3 method:
as.H5SpMat(
  x,
  filename,
  valuePath = "data",
  rowindPath = "indices",
  colptrPath = "indptr",
  overwrite = FALSE,
  ...
)

Arguments

`filename`	Filename of the HDF5 file
`valuePath`	Path in the HDF5 file that points to a 1D array storing the non-zero values of the sparse matrix. Default `"data"` when using `as.H5SpMat`.
`rowindPath`	Path in the HDF5 file that points to a 1D integer array storing the row indices of non-zero values in each column of the sparse matrix. Default `"indices"` when using `as.H5SpMat`.
`colptrPath`	Path in the HDF5 file that points to a 1D integer array storing the number of non-zero values in each column of the sparse matrix. Default `"indptr"` when using `as.H5SpMat`.
`nrow`, `ncol`	Integer, the true dimensionality of the sparse matrix.
`x`	For `as.H5SpMat`, matrix of either dense or sparse type to be written; for `print`, a `H5SpMat` argument list object.
`overwrite`	Logical, whether to overwrite the file if already exists at the given path. Default `FALSE`.
`...`	not used

Value

H5SpMat object, indeed a list object.

Examples

if (require("withr")) {
H5SpMatEx <- function(){
withr::local_dir(withr::local_tempdir())
h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"),
             "data", "indices", "indptr", 173, 300)
dim(h)

library(Matrix)
ctrl.dense <- as.matrix(ctrl.sparse)
h1 <- as.H5SpMat(ctrl.sparse, "ctrl_from_sparse_to_sparse.h5",
                 valuePath = "data", rowindPath = "indices",
                 colptrPath = "indptr", nrow = nrow(ctrl.sparse),
                 ncol = ncol(ctrl.sparse))
h1
h2 <- as.H5SpMat(ctrl.dense, "ctrl_from_dense_to_sparse.h5",
                 valuePath = "data", rowindPath = "indices",
                 colptrPath = "indptr", nrow = nrow(ctrl.sparse),
                 ncol = ncol(ctrl.sparse))
h2
}
H5SpMatEx()
}
if (require("withr")) {
H5SpMatEx <- function(){
withr::local_dir(withr::local_tempdir())
h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"),
             "data", "indices", "indptr", 173, 300)
dim(h)

library(Matrix)
ctrl.dense <- as.matrix(ctrl.sparse)
h1 <- as.H5SpMat(ctrl.sparse, "ctrl_from_sparse_to_sparse.h5",
                 valuePath = "data", rowindPath = "indices",
                 colptrPath = "indptr", nrow = nrow(ctrl.sparse),
                 ncol = ncol(ctrl.sparse))
h1
h2 <- as.H5SpMat(ctrl.dense, "ctrl_from_dense_to_sparse.h5",
                 valuePath = "data", rowindPath = "indices",
                 colptrPath = "indptr", nrow = nrow(ctrl.sparse),
                 ncol = ncol(ctrl.sparse))
h2
}
H5SpMatEx()
}

Perform Integrative Non-negative Matrix Factorization

Description

Performs integrative non-negative matrix factorization (iNMF) (J.D. Welch, 2019) to return factorized $H$ , $W$ , and $V$ matrices. The objective function is stated as

$\arg\min_{H\ge0,W\ge0,V\ge0}\sum_{i}^{d}||E_i-(W+V_i)Hi||^2_F+ \lambda\sum_{i}^{d}||V_iH_i||_F^2$

where $E_i$ is the input non-negative matrix of the $i$ 'th dataset, $d$ is the total number of datasets. $E_i$ is of size $m \times n_i$ for $m$ features and $n_i$ sample points, $H_i$ is of size $k \times n_i$ , $V_i$ is of size $m \times k$ , and $W$ is of size $m \times k$ .

inmf optimizes the objective with ANLS strategy, while onlineINMF optimizes the same objective with an online learning strategy.

Usage

inmf(
  objectList,
  k = 20,
  lambda = 5,
  niter = 30,
  nCores = 2,
  Hinit = NULL,
  Vinit = NULL,
  Winit = NULL,
  verbose = FALSE
)
inmf(
  objectList,
  k = 20,
  lambda = 5,
  niter = 30,
  nCores = 2,
  Hinit = NULL,
  Vinit = NULL,
  Winit = NULL,
  verbose = FALSE
)

Arguments

`objectList`	list of input datasets. List elements should all be of the same class. Viable classes include: matrix, dgCMatrix, H5Mat, H5SpMat.
`k`	Integer. Inner dimensionality to factorize the datasets into. Default `20`.
`lambda`	Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as `lambda` increases). Default `5`.
`niter`	Integer. Total number of block coordinate descent iterations to perform. Default `30`.
`nCores`	The number of parallel tasks that will be spawned. Default `2`
`Hinit`	Initial values to use for $H$ matrices. A list object where each element is the initial $H$ matrix of each dataset. Each should be dense matrix of size $n_i \times k$ . Default `NULL`.
`Vinit`	Similar to `Hinit`, but each should be of size $m \times k$ .
`Winit`	Initial values to use for $W$ matrix. A matrix object of size $m \times k$ . Default `NULL`.
`verbose`	Logical scalar. Whether to show information and progress. Default `FALSE`.

Value

A list of the following elements:

H - a list of result $H_i$ matrices of size $n_i \times k$
V - a list of result $V_i$ matrices
W - the result $W$ matrix
objErr - the final objective error value.

Author(s)

Yichen Wang

References

Joshua D. Welch and et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019

Examples

library(Matrix)
set.seed(1)
result <- inmf(list(ctrl.sparse, stim.sparse), k = 10, niter = 10)
library(Matrix)
set.seed(1)
result <- inmf(list(ctrl.sparse, stim.sparse), k = 10, niter = 10)

Perform Non-negative Matrix Factorization

Description

Regularly, Non-negative Matrix Factorization (NMF) is factorizes input matrix $X$ into low rank matrices $W$ and $H$ , so that $X \approx WH$ . The objective function can be stated as $\arg\min_{W\ge0,H\ge0}||X-WH||_F^2$ . In practice, $X$ is usually regarded as a matrix of $m$ features by $n$ sample points. And the result matrix $W$ should have the dimensionality of $m \times k$ and H with $n \times k$ (transposed). This function wraps the algorithms implemented in PLANC library to solve NMF problems. Algorithms includes Alternating Non-negative Least Squares with Block Principal Pivoting (ANLS-BPP), Alternating Direction Method of Multipliers (ADMM), Hierarchical Alternating Least Squares (HALS), and Multiplicative Update (MU).

Usage

nmf(
  x,
  k,
  niter = 30L,
  algo = "anlsbpp",
  nCores = 2L,
  Winit = NULL,
  Hinit = NULL
)
nmf(
  x,
  k,
  niter = 30L,
  algo = "anlsbpp",
  nCores = 2L,
  Winit = NULL,
  Hinit = NULL
)

Arguments

`x`	Input matrix for factorization. Can be either dense or sparse.
`k`	Integer. Factor matrix rank.
`niter`	Integer. Maximum number of NMF interations.
`algo`	Algorithm to perform the factorization, choose from "anlsbpp", "admm", "hals" or "mu". See detailed sections.
`nCores`	The number of parallel tasks that will be spawned. Only applies to anlsbpp. Default `2`
`Winit`	Initial left-hand factor matrix, must be of size m x k.
`Hinit`	Initial right-hand factor matrix, must be of size n x k.

Value

A list with the following elements:

W - the result left-hand factor matrix
H - the result right hand matrix.
objErr - the objective error of the factorization.

References

Ramakrishnan Kannan and et al., A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization, PPoPP '16, 2016, 10.1145/2851141.2851152

Perform Integrative Non-negative Matrix Factorization Using Online Learning

Description

Performs integrative non-negative matrix factorization (iNMF) (J.D. Welch, 2019, C. Gao, 2021) using online learning approach to return factorized $H$ , $W$ , and $V$ matrices. The objective function is stated as

$\arg\min_{H\ge0,W\ge0,V\ge0}\sum_{i}^{d}||E_i-(W+V_i)Hi||^2_F+ \lambda\sum_{i}^{d}||V_iH_i||_F^2$

Different from inmf which optimizes the objective with ANLS approach, onlineINMF optimizes the same objective with online learning strategy, where it updates mini-batches of $H_i$ solving the NNLS problem, and updates $V_i$ and $W$ with HALS multiplicative method.

This function allows online learning in 3 scenarios:

Fully observed datasets;
Iterative refinement using continually arriving datasets;
Projection of new datasets without updating the existing factorization

Usage

onlineINMF(
  objectList,
  newDatasets = NULL,
  project = FALSE,
  k = 20,
  lambda = 5,
  maxEpoch = 5,
  minibatchSize = 5000,
  maxHALSIter = 1,
  nCores = 2,
  Vinit = NULL,
  Winit = NULL,
  Ainit = NULL,
  Binit = NULL,
  verbose = FALSE
)
onlineINMF(
  objectList,
  newDatasets = NULL,
  project = FALSE,
  k = 20,
  lambda = 5,
  maxEpoch = 5,
  minibatchSize = 5000,
  maxHALSIter = 1,
  nCores = 2,
  Vinit = NULL,
  Winit = NULL,
  Ainit = NULL,
  Binit = NULL,
  verbose = FALSE
)

Arguments

`objectList`	list of input datasets. List elements should all be of the same class. Viable classes include: matrix, dgCMatrix, H5Mat, H5SpMat.
`newDatasets`	Same requirements as for new arriving datasets. Default `NULL` for scenario 1, specify for scenario 2 or 3.
`project`	Logical scalar, whether to run scenario 3. See description. Default `FALSE`.
`k`	Integer. Inner dimensionality to factorize the datasets into. Default `20`.
`lambda`	Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as `lambda` increases). Default `5`.
`maxEpoch`	The number of epochs to iterate through. Default `5`.
`minibatchSize`	Total number of cells in each mini-batch. Default `5000`.
`maxHALSIter`	Maximum number of block coordinate descent (HALS algorithm) iterations to perform for each update of $W$ and $V$ . Default `1`. Changing this parameter is not recommended.
`nCores`	The number of parallel tasks that will be spawned. Default `2`
`Vinit`, `Winit`, `Ainit`, `Binit`	Pass the previous factorization result for datasets existing in `objectList`, in order to run scenario 2 or 3. All should have `length(objectList)` matrices inside. See description for dimensionality of $V_i$ and $W_i$ . $A_i$ should be of size $k \times k$ and $B_i$ should be of size $m \times k$
`verbose`	Logical scalar. Whether to show information and progress. Default `FALSE`.

Value

A list of the following elements:

H - a list of result $H_i$ matrices of size $n_i \times k$
V - a list of result $V_i$ matrices
W - the result $W$ matrix
A - a list of result $A_i$ matrices, $k \times k$
B - a list of result $B_i$ matrices, $m \times k$
objErr - the final objective error value.

Author(s)

Yichen Wang

References

Joshua D. Welch and et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019

Chao Gao and et al., Iterative single-cell multi-omic integration using online learning, Nat Biotechnol., 2021

Examples

library(Matrix)

# Scenario 1 with sparse matrices
set.seed(1)
res1 <- onlineINMF(list(ctrl.sparse, stim.sparse), 
                   minibatchSize = 50, k = 10)

# Scenario 2 with H5 dense matrices
h5dense1 <- H5Mat(filename = system.file("extdata", "ctrl_dense.h5",
                             package = "RcppPlanc", mustWork = TRUE),
                                          dataPath = "scaleData")
h5dense2 <- H5Mat(filename = system.file("extdata", "stim_dense.h5",
                             package = "RcppPlanc", mustWork = TRUE),
                                          dataPath = "scaleData")
res2 <- onlineINMF(list(ctrl = h5dense1), minibatchSize = 50, k = 10)
res3 <- onlineINMF(list(ctrl = h5dense1),
                   newDatasets = list(stim = h5dense2),
                   Vinit = res2$V, Winit = res2$W,
                   Ainit = res2$A, Binit = res2$B, 
                   minibatchSize = 50, k = 10)

# Scenario 3 with H5 sparse matrices
h5sparse1 <- H5SpMat(filename = system.file("extdata", "ctrl_sparse.h5",
                                package = "RcppPlanc", mustWork = TRUE),
                                valuePath = "scaleDataSparse/data",
                                rowindPath = "scaleDataSparse/indices",
                                colptrPath = "scaleDataSparse/indptr",
                                nrow = nrow(ctrl.sparse),
                                ncol = ncol(ctrl.sparse))
h5sparse2 <- H5SpMat(filename = system.file("extdata", "stim_sparse.h5",
                                package = "RcppPlanc", mustWork = TRUE),
                                valuePath = "scaleDataSparse/data",
                                rowindPath = "scaleDataSparse/indices",
                                colptrPath = "scaleDataSparse/indptr",
                                nrow = nrow(stim.sparse),
                                ncol = nrow(stim.sparse))
res4 <- onlineINMF(list(ctrl = h5sparse1), minibatchSize = 50, k = 10)
res5 <- onlineINMF(list(ctrl = h5sparse1),
                   newDatasets = list(stim = h5sparse2), project = TRUE,
                   Vinit = res4$V, Winit = res4$W,
                   Ainit = res4$A, Binit = res4$B, minibatchSize = 50, k = 10)

library(Matrix)

# Scenario 1 with sparse matrices
set.seed(1)
res1 <- onlineINMF(list(ctrl.sparse, stim.sparse), 
                   minibatchSize = 50, k = 10)

# Scenario 2 with H5 dense matrices
h5dense1 <- H5Mat(filename = system.file("extdata", "ctrl_dense.h5",
                             package = "RcppPlanc", mustWork = TRUE),
                                          dataPath = "scaleData")
h5dense2 <- H5Mat(filename = system.file("extdata", "stim_dense.h5",
                             package = "RcppPlanc", mustWork = TRUE),
                                          dataPath = "scaleData")
res2 <- onlineINMF(list(ctrl = h5dense1), minibatchSize = 50, k = 10)
res3 <- onlineINMF(list(ctrl = h5dense1),
                   newDatasets = list(stim = h5dense2),
                   Vinit = res2$V, Winit = res2$W,
                   Ainit = res2$A, Binit = res2$B, 
                   minibatchSize = 50, k = 10)

# Scenario 3 with H5 sparse matrices
h5sparse1 <- H5SpMat(filename = system.file("extdata", "ctrl_sparse.h5",
                                package = "RcppPlanc", mustWork = TRUE),
                                valuePath = "scaleDataSparse/data",
                                rowindPath = "scaleDataSparse/indices",
                                colptrPath = "scaleDataSparse/indptr",
                                nrow = nrow(ctrl.sparse),
                                ncol = ncol(ctrl.sparse))
h5sparse2 <- H5SpMat(filename = system.file("extdata", "stim_sparse.h5",
                                package = "RcppPlanc", mustWork = TRUE),
                                valuePath = "scaleDataSparse/data",
                                rowindPath = "scaleDataSparse/indices",
                                colptrPath = "scaleDataSparse/indptr",
                                nrow = nrow(stim.sparse),
                                ncol = nrow(stim.sparse))
res4 <- onlineINMF(list(ctrl = h5sparse1), minibatchSize = 50, k = 10)
res5 <- onlineINMF(list(ctrl = h5sparse1),
                   newDatasets = list(stim = h5sparse2), project = TRUE,
                   Vinit = res4$V, Winit = res4$W,
                   Ainit = res4$A, Binit = res4$B, minibatchSize = 50, k = 10)

Show information of a H5Mat object

Description

Show information of a H5Mat object

Usage

## S3 method for class 'H5Mat'
print(x, ...)
## S3 method for class 'H5Mat'
print(x, ...)

Arguments

`x`	H5Mat argument list object
`...`	Not used.

Value

NULL. Information displayed.

Examples

h <- H5Mat(system.file("extdata/ctrl_dense.h5", package = "RcppPlanc"),
           "data")
print(h)
h <- H5Mat(system.file("extdata/ctrl_dense.h5", package = "RcppPlanc"),
           "data")
print(h)

Show information of a H5SpMat object

Description

Show information of a H5SpMat object

Usage

## S3 method for class 'H5SpMat'
print(x, ...)
## S3 method for class 'H5SpMat'
print(x, ...)

Arguments

`x`	H5SpMat argument list object
`...`	Not used.

Value

NULL. Information displayed.

Examples

h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"),
             "data", "indices", "indptr", 173, 300)
print(h)
h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"),
             "data", "indices", "indptr", 173, 300)
print(h)

Perform Symmetric Non-negative Matrix Factorization

Description

Symmetric input matrix $X$ of size $n \times n$ is required. Two approaches are provided. Alternating Non-negative Least Squares Block Principal Pivoting algorithm (ANLSBPP) with symmetric regularization, where the objective function is set to be $\arg\min_{H\ge0,W\ge0}||X-WH||_F^2+ \lambda||W-H||_F^2$ , can be run with algo = "anlsbpp". Gaussian-Newton algorithm, where the objective function is set to be $\arg\min_{H\ge0}||X-H^\mathsf{T}H||_F^2$ , can be run with algo = "gnsym". In the objectives, $W$ is of size $n \times k$ and $H$ is of size $k \times n$ . The returned results will all be $n \times k$ .

Usage

symNMF(
  x,
  k,
  niter = 30L,
  lambda = 0,
  algo = "gnsym",
  nCores = 2L,
  Hinit = NULL
)
symNMF(
  x,
  k,
  niter = 30L,
  lambda = 0,
  algo = "gnsym",
  nCores = 2L,
  Hinit = NULL
)

Arguments

`x`	Input matrix for factorization. Must be symmetric. Can be either dense or sparse.
`k`	Integer. Factor matrix rank.
`niter`	Integer. Maximum number of symNMF interations. Default `30`
`lambda`	Symmetric regularization parameter. Must be non-negative. Default `0.0` uses the square of the maximum value in `x`.
`algo`	Algorithm to perform the factorization, choose from "gnsym" or "anlsbpp". Default `"gnsym"`
`nCores`	The number of parallel tasks that will be spawned. Only applies to anlsbpp. Default `2`
`Hinit`	Initial right-hand factor matrix, must be of size n x k. Default `NULL`.

Value

A list with the following elements:

W - the result left-hand factor matrix, non-empty when using "anlsbpp"
H - the result right hand matrix.
objErr - the objective error of the factorization.

References

Srinivas Eswar and et al., Distributed-Memory Parallel Symmetric Nonnegative Matrix Factorization, SC '20, 2020, 10.5555/3433701.3433799

Perform Mosaic Integrative Non-negative Matrix Factorization with Unshared Features

Description

Performs mosaic integrative non-negative matrix factorization (UINMF) (A.R. Kriebel, 2022) to return factorized $H$ , $W$ , $V$ and $U$ matrices. The objective function is stated as

$\arg\min_{H\ge0,W\ge0,V\ge0,U\ge0}\sum_{i}^{d} ||\begin{bmatrix}E_i \\ P_i \end{bmatrix} - (\begin{bmatrix}W \\ 0 \end{bmatrix}+ \begin{bmatrix}V_i \\ U_i \end{bmatrix})Hi||^2_F+ \lambda_i\sum_{i}^{d}||\begin{bmatrix}V_i \\ U_i \end{bmatrix}H_i||_F^2$

where $E_i$ is the input non-negative matrix of the $i$ 'th dataset, $P_i$ is the input non-negative matrix for the unshared features, $d$ is the total number of datasets. $E_i$ is of size $m \times n_i$ for $m$ shared features and $n_i$ sample points, $P_i$ is of size $u_i \times n_i$ for $u_i$ unshared feaetures, $H_i$ is of size $k \times n_i$ , $V_i$ is of size $m \times k$ , $W$ is of size $m \times k$ and $U_i$ is of size $u_i \times k$ .

Similar to inmf, uinmf also optimizes the objective with ANLS algorithm.

Usage

uinmf(
  objectList,
  unsharedList,
  k = 20,
  lambda = 5,
  niter = 30,
  nCores = 2,
  verbose = FALSE
)
uinmf(
  objectList,
  unsharedList,
  k = 20,
  lambda = 5,
  niter = 30,
  nCores = 2,
  verbose = FALSE
)

Arguments

`objectList`	list of input datasets. List elements should all be of the same class. Viable classes include: matrix, dgCMatrix, H5Mat, H5SpMat.
`unsharedList`	List of input unshared feature matrices, with the same requirement as `objectList`.
`k`	Integer. Inner dimensionality to factorize the datasets into. Default `20`.
`lambda`	Regularization parameter. Use one number for all datasets or a vector to specify for each dataset. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as `lambda` increases). Default `5`.
`niter`	Integer. Total number of block coordinate descent iterations to perform. Default `30`.
`nCores`	The number of parallel tasks that will be spawned. Default `2`.
`verbose`	Logical scalar. Whether to show information and progress. Default `FALSE`.

Value

A list of the following elements:

H - a list of result $H_i$ matrices of size $n_i \times k$
V - a list of result $V_i$ matrices
W - the result $W$ matrix
U - a list of result $A_i$ matrices
objErr - the final objective error value.

Author(s)

Yichen Wang

References

April R. Kriebel and Joshua D. Welch, UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization, Nat. Comm., 2022

Examples

# Fake matrices representing unshared features of the given datasets
# Real-life use should have features that are not presented in the 
# intersection of features of all datasets involved.
ctrl.unshared <- ctrl.sparse[1:10,]
stim.unshared <- stim.sparse[11:30,]
set.seed(1)
result <- uinmf(list(ctrl.sparse, stim.sparse), 
                list(ctrl.unshared, stim.unshared))
# Fake matrices representing unshared features of the given datasets
# Real-life use should have features that are not presented in the 
# intersection of features of all datasets involved.
ctrl.unshared <- ctrl.sparse[1:10,]
stim.unshared <- stim.sparse[11:30,]
set.seed(1)
result <- uinmf(list(ctrl.sparse, stim.sparse), 
                list(ctrl.unshared, stim.unshared))

Package 'RcppPlanc'

Help Index

Block Principal Pivoted Non-Negative Least Squares

Description

Usage

Arguments

Value

Examples

Example single-cell transcriptomic data in sparse form

Description

Usage

Format

Source

Retrieve the dimension of H5SpMat argument list

Description

Usage

Arguments

Value

Examples

Prepare character information of a H5Mat object

Description

Usage

Arguments

Value

Examples

prepare character information of a H5SpMat object

Description

Usage

Arguments

Value

Examples

Argument list object for using a dense matrix stored in HDF5 file

Description

Usage

Arguments

Value

Examples

Argument list object for using a sparse matrix stored in HDF5 file

Description

Usage

Arguments

Value

Examples

Perform Integrative Non-negative Matrix Factorization

Description

Usage

Arguments

Value

Author(s)

References

Examples

Perform Non-negative Matrix Factorization

Description

Usage

Arguments

Value

References

Perform Integrative Non-negative Matrix Factorization Using Online Learning

Description

Usage

Arguments

Value

Author(s)

References

Examples

Show information of a H5Mat object

Description

Usage

Arguments

Value

Examples

Show information of a H5SpMat object

Description

Usage

Arguments

Value

Examples

Perform Symmetric Non-negative Matrix Factorization

Description

Usage