Title: | Linked Inference of Genomic Experimental Relationships |
---|---|
Description: | Uses an extension of nonnegative matrix factorization to identify shared and dataset-specific factors. See Welch J, Kozareva V, et al (2019) <doi:10.1016/j.cell.2019.05.006>, and Liu J, Gao C, Sodicoff J, et al (2020) <doi:10.1038/s41596-020-0391-8> for more details. |
Authors: | Joshua Welch [aut], Yichen Wang [aut, cre], Chao Gao [aut], Jialin Liu [aut], Joshua Sodicoff [aut, ctb], Velina Kozareva [aut, ctb], Evan Macosko [aut, ctb], Paul Hoffman [ctb], Ilya Korsunsky [ctb], Robert Lee [ctb], Andrew Robbins [ctb] |
Maintainer: | Yichen Wang <[email protected]> |
License: | GPL-3 |
Version: | 2.1.0.9003 |
Built: | 2024-12-05 01:37:24 UTC |
Source: | https://github.com/welch-lab/liger |
Generate dot plot from input matrix with ComplexHeatmap
.complexHeatmapDotPlot( colorMat, sizeMat, featureAnnDF = NULL, cellSplitVar = NULL, cellLabels = NULL, maxDotSize = 4, clusterFeature = FALSE, clusterCell = FALSE, legendColorTitle = "Matrix Value", legendSizeTitle = "Fraction Value", transpose = FALSE, baseSize = 8, cellTextSize = NULL, featureTextSize = NULL, cellTitleSize = NULL, featureTitleSize = NULL, legendTextSize = NULL, legendTitleSize = NULL, featureGrpRot = 0, viridisOption = "C", viridisDirection = -1, ... )
colorMat , sizeMat
|
Matrices of the same size. Values in `colorMat` set the dot color and values in `sizeMat` set the dot size. |
featureAnnDF |
Data frame of features containing feature names and grouping labels. |
cellSplitVar |
Split the cell orientation (default columns) by this variable. |
cellLabels |
Label to be shown on cell orientation. |
maxDotSize |
The maximum dot size. Default `4`. |
clusterFeature , clusterCell
|
Whether the feature/cell orientation (default rows/columns, respectively) should be clustered. Default `FALSE`. |
legendColorTitle , legendSizeTitle
|
The titles for the color bar and dot size legends, respectively. Defaults `"Matrix Value"` and `"Fraction Value"`. |
transpose |
Logical, whether to rotate the dot plot orientation, i.e. rows as cell aggregation and columns as features. Default `FALSE`. |
baseSize |
One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this. Default `8`. |
cellTextSize , featureTextSize , legendTextSize
|
Size of cell labels, feature labels and legend text. Default `NULL`. |
cellTitleSize , featureTitleSize , legendTitleSize
|
Size of titles on cell and feature orientation and legend title. Default `NULL`. |
featureGrpRot |
Number of degrees to rotate the feature grouping label. Default `0`. |
viridisOption , viridisDirection
|
See arguments `option` and `direction` of `viridis`. Defaults `"C"` and `-1`. |
... |
Additional arguments passed to `Heatmap`. |
A `HeatmapList` object.
Produce single violin plot with data frame passed from upstream
.ggCellViolin( plotDF, y, groupBy = NULL, colorBy = NULL, violin = TRUE, violinAlpha = 0.8, violinWidth = 0.9, box = FALSE, boxAlpha = 0.6, boxWidth = 0.4, dot = FALSE, dotColor = "black", dotSize = getOption("ligerDotSize"), xlabAngle = 45, raster = NULL, seed = 1, ... )
plotDF |
Data frame like object (fortifiable) that contains all necessary information to make the plot. |
y , groupBy , colorBy
|
See |
violin , box , dot
|
Logical, whether to add violin plot, box plot or dot (scatter) plot, respectively. Layers are added in the order of dot, violin, and box on the top surface. By default, only violin plot is generated. |
violinAlpha , boxAlpha
|
Numeric, controls the transparency of layers. Defaults `0.8` and `0.6`. |
violinWidth , boxWidth
|
Numeric, controls the width of violin/box bounding box. Defaults `0.9` and `0.4`. |
dotColor , dotSize
|
Color and numeric size that globally control the appearance of all dots. Defaults `"black"` and `getOption("ligerDotSize")`. |
xlabAngle |
Numeric, counter-clockwise rotation angle of X axis label text. Default `45`. |
raster |
Logical, whether to rasterize the dot plot. Default `NULL`. |
seed |
Random seed for reproducibility. Default `1`. |
... |
More theme setting arguments passed to `.ggplotLigerTheme`. |
ggplot object by default. When `plotly = TRUE`, returns plotly (htmlwidget) object.
Controls content and size of all peripheral texts.
.ggplotLigerTheme( plot, title = NULL, subtitle = NULL, xlab = TRUE, ylab = TRUE, xlabAngle = 0, legendColorTitle = NULL, legendFillTitle = NULL, legendShapeTitle = NULL, legendSizeTitle = NULL, showLegend = TRUE, legendPosition = "right", baseSize = getOption("ligerBaseSize"), titleSize = NULL, subtitleSize = NULL, xTextSize = NULL, xFacetSize = NULL, xTitleSize = NULL, yTextSize = NULL, yFacetSize = NULL, yTitleSize = NULL, legendTextSize = NULL, legendTitleSize = NULL, legendDotSize = 4, panelBorder = FALSE, legendNRow = NULL, legendNCol = NULL, colorLabels = NULL, colorValues = NULL, colorPalette = "magma", colorDirection = -1, naColor = "#DEDEDE", colorLow = NULL, colorMid = NULL, colorHigh = NULL, colorMidPoint = NULL, plotly = FALSE )
plot |
ggplot object passed from wrapper plotting functions |
title , subtitle , xlab , ylab
|
Main title, subtitle or X/Y axis title text.
By default, no main title or subtitle will be set, and X/Y axis title will be
the names of variables used for plotting. Use |
xlabAngle |
Numeric, counter-clockwise rotation angle of X axis label text. Default `0`. |
legendColorTitle |
Legend title text for color aesthetics, often used for categorical or continuous coloring of dots. Default `NULL`. |
legendFillTitle |
Legend title text for fill aesthetics, often used for violin, box, bar plots. Default `NULL`. |
legendShapeTitle |
Legend title text for shape aesthetics, often used for shaping dots by categorical variable. Default `NULL`. |
legendSizeTitle |
Legend title text for size aesthetics, often used for sizing dots by continuous variable. Default `NULL`. |
showLegend |
Whether to show the legend. Default `TRUE`. |
legendPosition |
Text indicating where to place the legend, such as `"top"`, `"bottom"`, `"left"` or `"right"`. Default `"right"`. |
baseSize |
One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this. Default `getOption("ligerBaseSize")`. |
titleSize , xTitleSize , yTitleSize , legendTitleSize
|
Size of main title, axis titles and legend title. Default `NULL`. |
subtitleSize , xTextSize , yTextSize , legendTextSize
|
Size of subtitle text, axis texts and legend text. Default `NULL`. |
xFacetSize |
Size of facet strip label text on x-axis. Default `NULL`. |
yFacetSize |
Size of facet strip label text on y-axis. Default `NULL`. |
legendDotSize |
Allow dots in legend region to be large enough to see the colors/shapes clearly. Default `4`. |
panelBorder |
Whether to show rectangle border of the panel instead of using ggplot classic bottom and left axis lines. Default `FALSE`. |
legendNRow , legendNCol
|
Integer, arranges the legend into this number of rows or columns when one variable has too many categories. Default `NULL`. |
colorLabels |
Character vector for modifying category names in a color legend. Passed to the `labels` argument of `ggplot2::scale_color_manual()`. Default `NULL`. |
colorValues |
Character vector of colors for modifying category colors in a color legend. Passed to the `values` argument of `ggplot2::scale_color_manual()`. Default `NULL`. |
colorPalette |
For continuous coloring, an index or a palette name to select from available options from ggplot. Default `"magma"`. |
colorDirection |
Choose `1` or `-1` to set the direction of the continuous color palette. Default `-1`. |
naColor |
The color code for `NA` values. Default `"#DEDEDE"`. |
colorLow , colorMid , colorHigh , colorMidPoint
|
All four of these must be specified to customize palette with `ggplot2::scale_color_gradient2()`. Default `NULL`. |
plotly |
Whether to use plotly to enable web based interactive browsing for the plot. Requires installation of package "plotly". Default `FALSE`. |
Updated ggplot object by default. When `plotly = TRUE`, returns plotly (htmlwidget) object.
Produce single scatter plot with data frame passed from upstream
.ggScatter( plotDF, x, y, colorBy = NULL, shapeBy = NULL, dotOrder = c("shuffle", "ascending", "descending"), dotSize = getOption("ligerDotSize"), dotAlpha = 0.9, trimHigh = NULL, trimLow = NULL, zeroAsNA = TRUE, raster = NULL, labelBy = colorBy, labelText = TRUE, labelTextSize = 4, ggrepelLabelTick = FALSE, seed = 1, ... )
plotDF |
Data frame like object (fortifiable) that contains all necessary information to make the plot. |
x , y
|
Available variable name in |
colorBy , shapeBy
|
See |
dotOrder |
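Controls the order that each dot is added to the plot. Choose from `"shuffle"`, `"ascending"` or `"descending"`. Default `"shuffle"`. |
dotSize , dotAlpha
|
Numeric, controls the size or transparency of all dots. Defaults `getOption("ligerDotSize")` and `0.9`. |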
trimHigh , trimLow
|
Numeric, limit the largest or smallest value of
continuous |
zeroAsNA |
Logical, whether to set zero values in continuous
|
raster |
Logical, whether to rasterize the plot. Default `NULL`. |
labelBy |
A variable name available in |
labelText |
Logical, whether to show text label at the median position of each categorical group specified by `labelBy`. Default `TRUE`. |
labelTextSize |
Numeric, controls the size of label text when `labelText = TRUE`. Default `4`. |
ggrepelLabelTick |
Logical, whether to force showing the tick between label texts and the position they point to. Useful when a lot of text labels are required. Default `FALSE`. |
seed |
Random seed for reproducibility. Default `1`. |
... |
More theme setting arguments passed to `.ggplotLigerTheme`. |
Having package "ggrepel" installed helps add tidier text labels to the scatter plot.
ggplot object by default. When `plotly = TRUE`, returns plotly (htmlwidget) object.
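The effect of `dotOrder` and `seed` can be illustrated with a minimal base R sketch. `orderDots` is a hypothetical helper, not a function of this package, and it assumes that "ascending" and "descending" refer to the value of the coloring variable, so that rows drawn last end up on top.

```r
# Hypothetical helper (not part of the package): reorder rows of a plotting
# data frame so that rows appearing later are drawn on top of earlier ones.
orderDots <- function(plotDF, colorBy,
                      dotOrder = c("shuffle", "ascending", "descending"),
                      seed = 1) {
  dotOrder <- match.arg(dotOrder)
  if (dotOrder == "shuffle") {
    set.seed(seed)  # the same seed gives the same permutation
    idx <- sample(nrow(plotDF))
  } else {
    # Assumption: sort by the value of the coloring variable
    idx <- order(plotDF[[colorBy]], decreasing = dotOrder == "descending")
  }
  plotDF[idx, , drop = FALSE]
}

df <- data.frame(x = 1:5, y = 5:1, expr = c(0.2, 3.1, 0, 1.5, 0.7))
asc <- orderDots(df, "expr", "ascending")
asc$expr

# Shuffling is reproducible when the seed is fixed
shuf1 <- orderDots(df, "expr", "shuffle", seed = 1)
shuf2 <- orderDots(df, "expr", "shuffle", seed = 1)
identical(shuf1, shuf2)
```

Shuffling avoids one group systematically hiding another in dense regions, while a sorted order keeps the highest (or lowest) values visible.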
This is not an exported function. This documentation just serves as a manual of extra arguments that users can use when generating heatmaps with `plotGeneHeatmap` or `plotFactorHeatmap`.
Note that the following arguments are pre-occupied by upstream wrappers so users should not include them in a function call: `dataMatrix`, `dataName`, `cellDF`, `featureDF`, `cellSplitVar`, `featureSplitVar`.
The following arguments of `Heatmap` are occupied by this function, so users should not include them in a function call either: `matrix`, `name`, `col`, `heatmap_legend_param`, `top_annotation`, `column_title_gp`, `column_names_gp`, `show_column_names`, `column_split`, `column_gap`, `left_annotation`, `row_title_gp`, `row_names_gp`, `show_row_names`, `row_split`, `row_gap`.
.plotHeatmap( dataMatrix, dataName = "Value", cellDF = NULL, featureDF = NULL, transpose = FALSE, cellSplitVar = NULL, featureSplitVar = NULL, dataScaleFunc = NULL, showCellLabel = FALSE, showCellLegend = TRUE, showFeatureLabel = TRUE, showFeatureLegend = TRUE, cellAnnColList = NULL, featureAnnColList = NULL, scale = FALSE, trim = c(-2, 2), baseSize = 8, cellTextSize = NULL, featureTextSize = NULL, cellTitleSize = NULL, featureTitleSize = NULL, legendTextSize = NULL, legendTitleSize = NULL, viridisOption = "A", viridisDirection = -1, RColorBrewerOption = "RdBu", ... )
dataMatrix |
Matrix object with features/factors as rows and cells as columns. |
dataName |
Text for heatmap color bar title. Default `"Value"`. |
cellDF |
data.frame object. Number of rows must match the number of columns of `dataMatrix`. Default `NULL`. |
featureDF |
data.frame object. Number of rows must match the number of rows of `dataMatrix`. Default `NULL`. |
transpose |
Logical, whether to "rotate" the heatmap by 90 degrees so that cell information is displayed by row. Default `FALSE`. |
cellSplitVar , featureSplitVar
|
Subset columns of |
dataScaleFunc |
A function object, applied to |
showCellLabel , showFeatureLabel
|
Logical, whether to show cell barcodes, gene symbols or factor names. Defaults `FALSE` and `TRUE`. |
showCellLegend , showFeatureLegend
|
Logical, whether to show cell or feature legends. Default `TRUE`. |
cellAnnColList , featureAnnColList
|
List object, with each element a named vector of R-interpretable color codes. The names of the list elements are used for matching the annotation variable names. The names of the colors in the vectors are used for matching the levels of a variable (factor object, categorical). Default `NULL`. |
scale |
Logical, whether to take z-score to scale and center gene expression. Applied after `dataScaleFunc`. Default `FALSE`. |
trim |
Numeric vector of two values. Limit the z-score value into this range when `scale = TRUE`. Default `c(-2, 2)`. |
baseSize |
One-parameter control of all text sizes. Individual text element sizes can be controlled by other size arguments. "Title" sizes are 2 points larger than "text" sizes when being controlled by this. Default `8`. |
cellTextSize , featureTextSize , legendTextSize
|
Size of cell barcode labels, gene/factor labels, or legend values. Default `NULL`. |
cellTitleSize , featureTitleSize , legendTitleSize
|
Size of titles of the cell slices, gene/factor slices, or the legends. Default `NULL`. |
viridisOption , viridisDirection
|
See arguments `option` and `direction` of `viridis`. Defaults `"A"` and `-1`. |
RColorBrewerOption |
When |
... |
Additional arguments to be passed to `Heatmap`. |
`HeatmapList-class` object.
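What `scale` and `trim` do to the matrix can be sketched in base R. This is an illustration only, assuming the z-score is taken per feature (row) across cells; the toy matrix and variable names are made up for the example and the package's internal implementation may differ.

```r
# Toy matrix: features as rows, cells as columns (the layout of `dataMatrix`)
set.seed(1)
mat <- matrix(rexp(5 * 20, rate = 0.5), nrow = 5,
              dimnames = list(paste0("gene", 1:5), paste0("cell", 1:20)))

# Row-wise z-score: center and scale each feature across cells.
# base::scale() works on columns, so transpose before and after.
z <- t(scale(t(mat)))

# Clamp the z-scores into the `trim` range so outliers do not dominate
# the color scale
trim <- c(-2, 2)
zTrim <- pmin(pmax(z, trim[1]), trim[2])
range(zTrim)
```

Trimming only changes how extreme values are colored; values already inside the range are left untouched.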
This function is a wrapper to switch between the alternative factor loading alignment methods that LIGER provides. Alignment is a required step for producing the final integrated result. Two methods are provided:
- `method = "quantileNorm"`: Previously published quantile normalization method. (default)
- `method = "centroidAlign"`: Newly developed centroid alignment method.
alignFactors(object, method = c("quantileNorm", "centroidAlign"), ...)

## S3 method for class 'liger'
alignFactors(object, method = c("quantileNorm", "centroidAlign"), ...)

## S3 method for class 'Seurat'
alignFactors(object, method = c("quantileNorm", "centroidAlign"), ...)
object |
A liger or Seurat object with valid factorization
result available (i.e. |
method |
Character, method to align factors. Default `"quantileNorm"`. |
... |
Additional arguments passed to selected methods.
For
|
This function converts data stored in a SingleCellExperiment (SCE) object, a Seurat object or a merged sparse matrix (dgCMatrix) into a liger object. It is designed for a container object or matrix that already contains multiple datasets to be integrated with LIGER. For individual datasets, please use `createLiger` instead.
## S3 method for class 'dgCMatrix'
as.liger(object, datasetVar = NULL, modal = NULL, ...)

## S3 method for class 'SingleCellExperiment'
as.liger(object, datasetVar = NULL, modal = NULL, ...)

## S3 method for class 'Seurat'
as.liger(object, datasetVar = NULL, modal = NULL, assay = NULL, ...)

seuratToLiger(object, datasetVar = NULL, modal = NULL, assay = NULL, ...)

as.liger(object, ...)
object |
Object. |
datasetVar |
Specify the dataset belonging by: 1. Selecting a variable from existing metadata in the object (e.g. a colData column); 2. Specifying a vector/factor that assigns the dataset belonging; 3. Giving a single character string, which means that all data is from one dataset (must not be a metadata variable, otherwise it is understood as 1.). Default `NULL`. |
modal |
Modality setting for each dataset. See
|
... |
Additional arguments passed to |
assay |
Name of assay to use. Default `NULL`. |
For Seurat V5 structure, it is highly recommended that users make use of its split layer feature, where data such as "counts", "data", and "scale.data" can be held for each dataset in the same Seurat object, e.g. with "count.ctrl", "count.stim", not merged. If a Seurat object with split layers is given, `datasetVar` will be ignored and the layers will be directly used.
A liger object.
# dgCMatrix (common sparse matrix class), usually obtained from other
# container object, and contains multiple samples merged in one.
matList <- rawData(pbmc)
multiSampleMatrix <- mergeSparseAll(matList)
# The `datasetVar` argument expects the variable assigning the sample source
pbmc2 <- as.liger(multiSampleMatrix, datasetVar = pbmc$dataset)
pbmc2

if (requireNamespace("SingleCellExperiment", quietly = TRUE)) {
    sce <- SingleCellExperiment::SingleCellExperiment(
        assays = list(counts = multiSampleMatrix)
    )
    sce$sample <- pbmc$dataset
    pbmc3 <- as.liger(sce, datasetVar = "sample")
    pbmc3
}

if (requireNamespace("Seurat", quietly = TRUE)) {
    seu <- SeuratObject::CreateSeuratObject(multiSampleMatrix)
    # Seurat creates variable "orig.ident" by identifying the cell barcode
    # prefixes, which is indeed what we need in this case. Users might need
    # to be careful and have it confirmed first.
    pbmc4 <- as.liger(seu, datasetVar = "orig.ident")
    pbmc4

    # As per Seurat V5 updates with layered data, specifically helpful under
    # the scenario of dataset integration. "counts" etc. for each dataset can
    # be split into layers.
    seu5 <- seu
    seu5[["RNA"]] <- split(seu5[["RNA"]], pbmc$dataset)
    print(SeuratObject::Layers(seu5))
    pbmc5 <- as.liger(seu5)
    pbmc5
}
Works for converting a matrix or container object to a single ligerDataset, and can also convert the modality preset of a ligerDataset. When used with a dense matrix object, it automatically converts the matrix to sparse form (`dgCMatrix-class`). When used with container objects such as Seurat or SingleCellExperiment, it is highly recommended that the object contains only one dataset/sample which is going to be integrated with LIGER. For multi-sample objects, please use `as.liger` with the dataset source variable specified.
## S3 method for class 'ligerDataset'
as.ligerDataset(
  object,
  modal = c("default", "rna", "atac", "spatial", "meth"),
  ...
)

## Default S3 method:
as.ligerDataset(
  object,
  modal = c("default", "rna", "atac", "spatial", "meth"),
  ...
)

## S3 method for class 'matrix'
as.ligerDataset(
  object,
  modal = c("default", "rna", "atac", "spatial", "meth"),
  ...
)

## S3 method for class 'Seurat'
as.ligerDataset(
  object,
  modal = c("default", "rna", "atac", "spatial", "meth"),
  assay = NULL,
  ...
)

## S3 method for class 'SingleCellExperiment'
as.ligerDataset(
  object,
  modal = c("default", "rna", "atac", "spatial", "meth"),
  ...
)

as.ligerDataset(object, ...)
object |
Object. |
modal |
Modality setting for each dataset. Choose from `"default"`, `"rna"`, `"atac"`, `"spatial"` or `"meth"`. |
... |
Additional arguments passed to |
assay |
Name of assay to use. Default `NULL`. |
A ligerDataset object.
ctrl <- dataset(pbmc, "ctrl")
ctrl
# Convert the modality preset
as.ligerDataset(ctrl, modal = "atac")
rawCounts <- rawData(ctrl)
class(rawCounts)
as.ligerDataset(rawCounts)
liger object of bone marrow subsample data with RNA and ATAC modality
bmmc
liger object with two datasets named "rna" and "atac"
https://www.nature.com/articles/s41587-019-0332-7
Jeffrey M. Granja et al., Nature Biotechnology, 2019
This metric quantifies how much the factorization and alignment distort the geometry of the original datasets. The greater the agreement, the less distortion of geometry there is. This is calculated by performing dimensionality reduction on the original and integrated (factorized, or factorized plus aligned) datasets, and measuring similarity between the k nearest neighbors for each cell in the original and integrated datasets. The Jaccard index is used to quantify similarity, and the final metric is the average across all cells.
Note that for most datasets, the greater the chosen `nNeighbors`, the greater the agreement in general. Although agreement can theoretically approach 1, in practice it is usually no higher than 0.2-0.3.
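The neighbor-set comparison behind this metric can be sketched in base R. This is a toy illustration, not the package's implementation: `knnIndex` and `meanJaccard` are hypothetical helpers, the k nearest neighbors are found by brute force, and the two embeddings are random matrices standing in for the reduced original data and the integrated factors.

```r
# Indices of the k nearest neighbors of every row in an embedding
# (brute-force distances, suitable for a toy example only)
knnIndex <- function(emb, k) {
  d <- as.matrix(dist(emb))
  diag(d) <- Inf  # a cell is not its own neighbor
  t(apply(d, 1, function(row) order(row)[seq_len(k)]))
}

# Mean Jaccard index between the neighbor sets found in two embeddings
meanJaccard <- function(embA, embB, k) {
  nnA <- knnIndex(embA, k)
  nnB <- knnIndex(embB, k)
  jac <- vapply(seq_len(nrow(nnA)), function(i) {
    length(intersect(nnA[i, ], nnB[i, ])) / length(union(nnA[i, ], nnB[i, ]))
  }, numeric(1))
  mean(jac)
}

set.seed(1)
original <- matrix(rnorm(100 * 10), nrow = 100)
# Mild distortion of the original geometry
integrated <- original + matrix(rnorm(100 * 10, sd = 0.5), nrow = 100)
# An embedding that shares nothing with the original
unrelated <- matrix(rnorm(100 * 10), nrow = 100)

same <- meanJaccard(original, original, k = 15)
noisy <- meanJaccard(original, integrated, k = 15)
random <- meanJaccard(original, unrelated, k = 15)
c(same = same, noisy = noisy, random = random)
```

Identical embeddings give an agreement of exactly 1, and the score drops as the neighborhood structure is distorted.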
calcAgreement( object, ndims = 40, nNeighbors = 15, useRaw = FALSE, byDataset = FALSE, seed = 1, dr.method = NULL, k = nNeighbors, use.aligned = NULL, rand.seed = seed, by.dataset = byDataset )
object |
|
ndims |
Number of factors to produce in NMF. Default `40`. |
nNeighbors |
Number of nearest neighbors to use in calculating Jaccard index. Default `15`. |
useRaw |
Whether to evaluate just factorized |
byDataset |
Whether to return agreement calculated for each dataset instead of the average for all datasets. Default `FALSE`. |
seed |
Random seed to allow reproducible results. Default `1`. |
dr.method |
|
k , rand.seed , by.dataset
|
|
use.aligned |
A numeric vector of agreement metric. A single value if `byDataset = FALSE`, or one value per dataset otherwise.
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- pbmc %>%
        normalize %>%
        selectGenes %>%
        scaleNotCenter %>%
        runINMF %>%
        alignFactors
    calcAgreement(pbmc)
}
This metric quantifies how well-aligned two or more datasets are. We randomly downsample all datasets to have as many cells as the smallest one. We construct a nearest-neighbor graph and calculate for each cell how many of its neighbors are from the same dataset. We average across all cells and compare to the expected value for perfectly mixed datasets, and scale the value from 0 to 1. Note that in practice, alignment can be greater than 1 occasionally.
calcAlignment( object, clustersUse = NULL, clusterVar = NULL, nNeighbors = NULL, cellIdx = NULL, cellComp = NULL, resultBy = c("all", "dataset", "cell"), seed = 1, k = nNeighbors, rand.seed = seed, cells.use = cellIdx, cells.comp = cellComp, clusters.use = clustersUse, by.cell = NULL, by.dataset = NULL )
object |
A liger object, with |
clustersUse |
The clusters to consider for calculating the alignment. Should be a vector of existing levels in `clusterVar`. Default `NULL`. |
clusterVar |
The name of one variable in |
nNeighbors |
Number of neighbors to use in calculating alignment. Default `NULL`. |
cellIdx , cellComp
|
Character, logical or numeric index that can subset cells. Default `NULL`. |
resultBy |
Select from `"all"`, `"dataset"` or `"cell"`. |
seed |
Random seed to allow reproducible results. Default `1`. |
k , rand.seed , cells.use , cells.comp , clusters.use
|
|
by.cell , by.dataset
|
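The alignment metric is computed as

$$1 - \frac{\bar{x} - \frac{k}{N}}{k - \frac{k}{N}}$$

where $\bar{x}$ is the average number of neighbors belonging to any cell's same dataset, $N$ is the number of datasets, and $k$ is the number of neighbors in the KNN graph.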
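As a worked illustration of the scaling described above, the sketch below computes the score from per-cell counts of same-dataset neighbors. It assumes equally sized datasets, so that a perfectly mixed neighborhood contains `k / N` same-dataset cells on average; `alignmentScore` is a hypothetical helper, not a package function.

```r
# Alignment score from same-dataset neighbor counts.
# nSame: for each cell, how many of its k neighbors share its dataset
alignmentScore <- function(nSame, k, nDatasets) {
  xBar <- mean(nSame)
  expected <- k / nDatasets  # perfectly mixed, equally sized datasets
  1 - (xBar - expected) / (k - expected)
}

k <- 20
separated <- alignmentScore(rep(k, 100), k, nDatasets = 2)      # 0
mixed     <- alignmentScore(rep(k / 2, 100), k, nDatasets = 2)  # 1
overMixed <- alignmentScore(rep(8, 100), k, nDatasets = 2)      # 1.2
c(separated = separated, mixed = mixed, overMixed = overMixed)
```

The last case shows why the value can occasionally exceed 1: cells that have fewer same-dataset neighbors than expected under perfect mixing push the score above the nominal upper bound.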
The selection of cells to be measured can be done in various ways, representing different scenarios:
- By default, all cells are considered and the alignment across all datasets will be calculated.
- Select `clustersUse` from `clusterVar` to use cells from the clusters of interest. This measures the alignment across all covered datasets within the specified clusters.
- Specify only `cellIdx` for flexible selection. This measures the alignment across all covered datasets within the specified cells. A non-NULL `cellIdx` takes precedence over `clustersUse`.
- Specify `cellIdx` and `cellComp` at the same time, so that the original dataset source will be ignored and the cells specified by each argument will be regarded as coming from one dataset each. This measures the alignment between cells specified by the two arguments. `cellComp` can contain cells already specified in `cellIdx`.
The alignment metric.
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- pbmc %>%
        normalize %>%
        selectGenes %>%
        scaleNotCenter %>%
        runINMF %>%
        alignFactors
    calcAlignment(pbmc)
}
This function calculates the adjusted Rand index (ARI) for the clustering result obtained with LIGER against an external clustering (existing "true" annotation). An ARI of 1 indicates perfect agreement between clusterings, while a score around 0 indicates agreement no better than chance. The score can be slightly negative when agreement is worse than chance.
The true clustering annotation must be specified as the base line. We suggest setting it to the object cellMeta so that it can be easily used for many other visualization and evaluation functions.
The ARI can be calculated for only specified datasets, since true annotation might not be available for all datasets. Evaluation for only one or a few datasets can be done by specifying `useDatasets`. If `useDatasets` is specified, the argument checking for `trueCluster` and `useCluster` will be enforced to match the cells in the specified datasets.
calcARI( object, trueCluster, useCluster = NULL, useDatasets = NULL, verbose = getOption("ligerVerbose", TRUE), classes.compare = trueCluster )
object |
A liger object, with the clustering result present in cellMeta. |
trueCluster |
Either the name of one variable in |
useCluster |
The name of one variable in |
useDatasets |
A character vector of the names, a numeric or logical vector of the index of the datasets to be considered for the ARI calculation. Default `NULL`. |
verbose |
Logical. Whether to show information of the progress. Default `getOption("ligerVerbose")`, or `TRUE` if that option is not set. |
classes.compare |
A numeric scalar, the ARI of the clustering result indicated by `useCluster` compared to `trueCluster`.
L. Hubert and P. Arabie (1985) Comparing Partitions, Journal of Classification, 2, pp. 193-218.
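For readers who want to see what the index measures, the following base R sketch computes the Hubert and Arabie adjusted Rand index from a contingency table of two labelings. `adjustedRandIndex` here is an illustrative helper written for this example; it is not claimed to be the code path used by `calcARI`.

```r
# Adjusted Rand index from the contingency table of two labelings
adjustedRandIndex <- function(x, y) {
  tab <- table(x, y)
  n <- sum(tab)
  sumIJ <- sum(choose(tab, 2))          # pairs placed together in both
  sumA <- sum(choose(rowSums(tab), 2))  # pairs placed together in x
  sumB <- sum(choose(colSums(tab), 2))  # pairs placed together in y
  expected <- sumA * sumB / choose(n, 2)
  (sumIJ - expected) / ((sumA + sumB) / 2 - expected)
}

truth <- rep(c("a", "b", "c"), each = 4)
relabeled <- rep(c("z", "x", "y"), each = 4)

# Same partition under different label names: perfect agreement
ariSame <- adjustedRandIndex(truth, relabeled)  # 1

# A labeling that cuts across the other one: worse than chance
ariCross <- adjustedRandIndex(c(1, 1, 2, 2), c(1, 2, 1, 2))  # -0.5
c(ariSame = ariSame, ariCross = ariCross)
```

The index depends only on how cells are grouped, not on the label names, and the second case shows that it is not bounded below by 0.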
# Assume the true cluster in `pbmcPlot` is "leiden_cluster"
# generate fake new labeling
fake <- sample(1:7, ncol(pbmcPlot), replace = TRUE)
# Insert into cellMeta
pbmcPlot$new <- factor(fake)
calcARI(pbmcPlot, trueCluster = "leiden_cluster", useCluster = "new")

# Now assume we got existing base line annotation only for "stim" dataset
nStim <- ncol(dataset(pbmcPlot, "stim"))
stimTrueLabel <- factor(fake[1:nStim])
# Insert into cellMeta
cellMeta(pbmcPlot, "stim_true_label", useDatasets = "stim") <- stimTrueLabel
# Assume "leiden_cluster" is the clustering result we got and need to be
# evaluated
calcARI(pbmcPlot, trueCluster = "stim_true_label",
        useCluster = "leiden_cluster", useDatasets = "stim")

# Comparison of the same labeling should always yield 1.
calcARI(pbmcPlot, trueCluster = "leiden_cluster", useCluster = "leiden_cluster")
This score represents the relative magnitude of the dataset-specific components of each factor's gene loadings compared to the shared components for two datasets. First, for each dataset we calculate the norm of the sum of each factor's shared loadings ($W$) and dataset-specific loadings ($V$). We then determine the ratio of these two values and subtract from 1... TODO: finish description.
calcDatasetSpecificity( object, dataset1, dataset2, doPlot = FALSE, do.plot = doPlot )
object |
liger object with factorization results. |
dataset1 |
Name of first dataset. Required. |
dataset2 |
Name of second dataset. Required. |
doPlot |
Logical. Whether to display a barplot of dataset specificity scores (by factor). Default `FALSE`. |
do.plot |
Deprecated. Use `doPlot` instead. |
List containing three elements.
pct1 |
Vector of the norm of each metagene factor for dataset1. |
pct2 |
Vector of the norm of each metagene factor for dataset2. |
pctSpec |
Vector of dataset specificity scores. |
This function calculates the Normalized Mutual Information (NMI) between the clustering result obtained with LIGER and an external clustering (existing "true" annotation). NMI ranges from 0 to 1, with a score of 0 indicating no agreement between clusterings and 1 indicating perfect agreement. The mathematical definition of NMI is as follows:
$$H(X) = -\sum_{x \in X} P(X=x) \log_2 P(X=x)$$
$$H(X|Y) = -\sum_{y \in Y} P(Y=y) \sum_{x \in X} P(X=x|Y=y) \log_2 P(X=x|Y=y)$$
$$I(X;Y) = H(X) - H(X|Y)$$
$$NMI(X;Y) = \frac{I(X;Y)}{\sqrt{H(X)H(Y)}}$$
Where $X$ is the cluster variable to be evaluated and $Y$ is the true
cluster variable. $x$ and $y$ are the cluster labels in $X$ and $Y$
respectively. $H$ is the entropy and $I$ is the mutual information.
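As a rough illustration of the quantities named above, the following base-R sketch computes entropy, mutual information and NMI directly from two label vectors. It assumes the geometric-mean normalization $I / \sqrt{H(X)H(Y)}$ and base-2 logarithms; the function names `entropy`, `mutualInfo` and `nmiSketch` are hypothetical and are not part of rliger, so this should not be read as the package's internal implementation.

```r
# Entropy (in bits) of a vector of cluster labels
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Mutual information I(X;Y) from the joint distribution of two labelings
mutualInfo <- function(x, y) {
  joint <- table(x, y) / length(x)
  expected <- outer(rowSums(joint), colSums(joint))
  nz <- joint > 0  # skip empty cells, where p * log(p) is taken as 0
  sum(joint[nz] * log2(joint[nz] / expected[nz]))
}

# NMI with geometric-mean normalization (an assumption of this sketch)
nmiSketch <- function(x, y) {
  mutualInfo(x, y) / sqrt(entropy(x) * entropy(y))
}

truth <- c("a", "a", "b", "b", "c", "c")
relabeled <- c(2, 2, 3, 3, 1, 1)    # same partition, different label names
unrelated <- c(1, 2, 1, 2, 1, 2)    # statistically independent of `truth`

nmiSketch(truth, relabeled)
nmiSketch(truth, unrelated)
```

Identical partitions score 1 regardless of how the labels are named, while a labeling that is independent of the truth scores 0. Note that the sketch returns `NaN` when either labeling contains a single cluster, because its entropy is 0.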
The true clustering annotation must be specified as the base line. We suggest setting it to the object cellMeta so that it can be easily used for many other visualization and evaluation functions.
The NMI can be calculated for only specified datasets, since true annotation
might not be available for all datasets. Evaluation for only one or a few
datasets can be done by specifying `useDatasets`. If `useDatasets` is
specified, the argument checking for `trueCluster` and `useCluster` will be
enforced to match the cells in the specified datasets.
calcNMI( object, trueCluster, useCluster = NULL, useDatasets = NULL, verbose = getOption("ligerVerbose", TRUE) )
object |
A liger object, with the clustering result present in cellMeta. |
trueCluster |
Either the name of one variable in |
useCluster |
The name of one variable in |
useDatasets |
A character vector of the names, a numeric or logical
vector of the index of the datasets to be considered for the NMI
calculation. Default `NULL`. |
verbose |
Logical. Whether to show information of the progress. Default
`getOption("ligerVerbose", TRUE)`. |
A numeric scalar of the NMI value
# Assume the true cluster in `pbmcPlot` is "leiden_cluster"
# generate fake new labeling
fake <- sample(1:7, ncol(pbmcPlot), replace = TRUE)
# Insert into cellMeta
pbmcPlot$new <- factor(fake)
calcNMI(pbmcPlot, trueCluster = "leiden_cluster", useCluster = "new")

# Now assume we got existing base line annotation only for "stim" dataset
nStim <- ncol(dataset(pbmcPlot, "stim"))
stimTrueLabel <- factor(fake[1:nStim])
# Insert into cellMeta
cellMeta(pbmcPlot, "stim_true_label", useDatasets = "stim") <- stimTrueLabel
# Assume "leiden_cluster" is the clustering result we got and need to be
# evaluated
calcNMI(pbmcPlot, trueCluster = "stim_true_label",
        useCluster = "leiden_cluster", useDatasets = "stim")

# Comparison of the same labeling should always yield 1.
calcNMI(pbmcPlot, trueCluster = "leiden_cluster", useCluster = "leiden_cluster")
This function aims at calculating the purity for the clustering result obtained with LIGER and the external clustering (existing "true" annotation). Purity can sometimes be a more useful metric when the clustering to be tested contains more subgroups or clusters than the true clusters. Purity ranges from 0 to 1, with a score of 1 representing a pure, accurate clustering.
The true clustering annotation must be specified as the base line. We suggest setting it to the object cellMeta so that it can be easily used for many other visualization and evaluation functions.
The purity can be calculated for only specified datasets, since true
annotation might not be available for all datasets. Evaluation for only one
or a few datasets can be done by specifying `useDatasets`. If `useDatasets`
is specified, the argument checking for `trueCluster` and `useCluster` will
be enforced to match the cells in the specified datasets.
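To make the metric concrete, here is a minimal base-R sketch of the common textbook definition of purity: each evaluated cluster is credited with the count of its most frequent true class, and the credits are summed and divided by the number of cells. The helper name `puritySketch` is hypothetical, and this is an assumption about the definition rather than a copy of what `calcPurity` does internally.

```r
# Hypothetical helper illustrating the textbook purity definition
puritySketch <- function(trueCluster, useCluster) {
  stopifnot(length(trueCluster) == length(useCluster))
  # Rows: clusters under evaluation; columns: true classes
  overlap <- table(useCluster, trueCluster)
  # Credit each evaluated cluster with its most frequent true class
  sum(apply(overlap, 1, max)) / length(trueCluster)
}

truth <- c("T", "T", "T", "B", "B", "B")
finer <- c(1, 1, 2, 3, 3, 4)   # splits the true classes into subclusters
mixed <- c(1, 1, 1, 1, 2, 2)   # cluster 1 contains three "T" and one "B"

puritySketch(truth, finer)
puritySketch(truth, mixed)
```

The `finer` labeling never mixes true classes, so it scores 1 even though it has more clusters than the truth, which is the situation described above. The `mixed` labeling scores 5/6 because one of its six cells sits in a cluster dominated by the other class.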
calcPurity( object, trueCluster, useCluster = NULL, useDatasets = NULL, verbose = getOption("ligerVerbose", TRUE), classes.compare = trueCluster )
object |
A liger object, with the clustering result present in cellMeta. |
trueCluster |
Either the name of one variable in |
useCluster |
The name of one variable in |
useDatasets |
A character vector of the names, a numeric or logical
vector of the index of the datasets to be considered for the purity
calculation. Default `NULL`. |
verbose |
Logical. Whether to show information of the progress. Default
`getOption("ligerVerbose", TRUE)`. |
classes.compare |
A numeric scalar, the purity of the clustering result indicated by
`useCluster` compared to `trueCluster`.
# Assume the true cluster in `pbmcPlot` is "leiden_cluster"
# generate fake new labeling
fake <- sample(1:7, ncol(pbmcPlot), replace = TRUE)
# Insert into cellMeta
pbmcPlot$new <- factor(fake)
calcPurity(pbmcPlot, trueCluster = "leiden_cluster", useCluster = "new")

# Now assume we got existing base line annotation only for "stim" dataset
nStim <- ncol(dataset(pbmcPlot, "stim"))
stimTrueLabel <- factor(fake[1:nStim])
# Insert into cellMeta
cellMeta(pbmcPlot, "stim_true_label", useDatasets = "stim") <- stimTrueLabel
# Assume "leiden_cluster" is the clustering result we got and need to be
# evaluated
calcPurity(pbmcPlot, trueCluster = "stim_true_label",
           useCluster = "leiden_cluster", useDatasets = "stim")
This process treats the factor loading of each dataset as the low dimensional embedding as well as the cluster assignment probability, i.e. the soft clustering result. Then the method aligns the embedding by linearly moving the centroids of the same cluster but within each dataset towards each other.
ATTENTION: This method is still under development, although it has shown encouraging results in benchmarking tests. The arguments and their default values reflect the best-scoring parameters in those tests, and some of them may be subject to change in the future.
centroidAlign(object, ...)

## S3 method for class 'liger'
centroidAlign(
  object,
  lambda = 1,
  useDims = NULL,
  scaleEmb = TRUE,
  centerEmb = TRUE,
  scaleCluster = FALSE,
  centerCluster = FALSE,
  shift = FALSE,
  diagnosis = FALSE,
  ...
)

## S3 method for class 'Seurat'
centroidAlign(
  object,
  reduction = "inmf",
  lambda = 1,
  useDims = NULL,
  scaleEmb = TRUE,
  centerEmb = TRUE,
  scaleCluster = FALSE,
  centerCluster = FALSE,
  shift = FALSE,
  diagnosis = FALSE,
  ...
)
object |
A liger or Seurat object with valid factorization
result available (i.e. |
... |
Arguments passed to other S3 methods of this function. |
lambda |
Ridge regression penalty applied to each dataset. Can be one
number that applies to all datasets, or a numeric vector with length equal to
the number of datasets. Default `1`. |
useDims |
Indices of factors to be considered for the alignment.
Default `NULL`. |
scaleEmb |
Logical, whether to scale the factor loading being considered
as the embedding. Default `TRUE`. |
centerEmb |
Logical, whether to center the factor loading being
considered as the embedding before scaling it. Default `TRUE`. |
scaleCluster |
Logical, whether to scale the factor loading being
considered as the cluster assignment probability. Default `FALSE`. |
centerCluster |
Logical, whether to center the factor loading being
considered as the cluster assignment probability before scaling it. Default
`FALSE`. |
shift |
Logical, whether to shift the factor loading being considered as
the cluster assignment probability after centered scaling. Default
`FALSE`. |
diagnosis |
Logical, whether to return cell metadata variables with
diagnostic information. See Details. Default `FALSE`. |
reduction |
Name of the reduction where LIGER integration result is
stored. Default `"inmf"`. |
Diagnostic information includes:
object$raw_which.max: The index of the factor with the maximum value in the raw factor loading.
object$R_which.max: The index of the factor with the maximum value in the soft clustering probability matrix used for correction.
object$Z_which.max: The index of the factor with the maximum value in the aligned factor loading.
Returns the updated input object.
liger method
Update the `H.norm` slot with the aligned cell factor loading, ready for running graph based community detection clustering or dimensionality reduction for visualization.
Update the `cellMeta` slot with diagnostic information if `diagnosis = TRUE`.
Seurat method
Update the `reductions` slot with a new `DimReduc` object containing the aligned cell factor loading.
Update the metadata with diagnostic information if `diagnosis = TRUE`.
pbmc <- centroidAlign(pbmcPlot)
When the data stored in HDF5 files needs to be accessed from outside the current R session, the HDF5 files have to be closed so that they become available to other processes.
closeAllH5(object)

## S3 method for class 'liger'
closeAllH5(object)

## S3 method for class 'ligerDataset'
closeAllH5(object)
object |
liger object. |
Nothing is returned.
Check the difference between two liger commands
commandDiff(object, cmd1, cmd2)
object |
liger object |
cmd1 , cmd2
|
Exact string of command labels. Available options can be
viewed by running `commands(object)`. |
If any difference is found, a character vector summarizing all differences.
pbmc <- normalize(pbmc)
pbmc <- normalize(pbmc, log = TRUE, scaleFactor = 1e4)
cmds <- commands(pbmc)
commandDiff(pbmc, cmds[1], cmds[2])
Convert old liger object to latest version
convertOldLiger( object, dimredName, clusterName = "clusters", h5FilePath = NULL )
object |
|
dimredName |
The name of variable in |
clusterName |
The name of variable in |
h5FilePath |
Named list, to specify the path to the H5 file of each
dataset if location has been changed. Default `NULL`. |
## Not run:
# Suppose you have a liger object of old version (<1.99.0)
newLig <- convertOldLiger(oldLig)

## End(Not run)
Similar to how default ligerDataset data is accessed.
coordinate(x, dataset)

coordinate(x, dataset, check = TRUE) <- value

## S4 method for signature 'liger,character'
coordinate(x, dataset)

## S4 replacement method for signature 'liger,character'
coordinate(x, dataset, check = TRUE) <- value

## S4 method for signature 'ligerSpatialDataset,missing'
coordinate(x, dataset = NULL)

## S4 replacement method for signature 'ligerSpatialDataset,missing'
coordinate(x, dataset = NULL, check = TRUE) <- value
x |
ligerSpatialDataset object or a liger object. |
dataset |
Name or numeric index of a spatial dataset. |
check |
Logical, whether to perform object validity check on setting new value. |
value |
The retrieved coordinate matrix or the updated `x` object.
For convenience, the default `formatType = "10x"` directly fits the
structure of cellranger output. `formatType = "anndata"` works for the
current AnnData H5AD file specification (see Details). If a customized H5
file structure is presented, any of `rawData`, `indicesName`, `indptrName`,
`genesName`, `barcodesName` should be specified accordingly to override the
`formatType` preset.
DO make a copy of the H5AD files because rliger functions write to the files and they will not be able to be read back to Python. This will be fixed in the future.
createH5LigerDataset( h5file, formatType = "10x", rawData = NULL, normData = NULL, scaleData = NULL, barcodesName = NULL, genesName = NULL, indicesName = NULL, indptrName = NULL, anndataX = "X", modal = c("default", "rna", "atac", "spatial", "meth"), featureMeta = NULL, ... )
h5file |
Filename of an H5 file |
formatType |
Select preset of H5 file structure. Default `"10x"`. |
rawData , indicesName , indptrName
|
The path in a H5 file for the raw
sparse matrix data. These three types of data stands for the |
normData |
The path in a H5 file for the "x" vector of the normalized
sparse matrix. Default `NULL`. |
scaleData |
The path in a H5 file for the Group that contains the sparse
matrix constructing information for the scaled data. Default `NULL`. |
genesName , barcodesName
|
The path in a H5 file for the gene names and
cell barcodes. Default `NULL`. |
anndataX |
The HDF5 path to the raw count data in an H5AD file. See
Details. Default `"X"`. |
modal |
Name of modality for this dataset. Currently options of
`"default"`, `"rna"`, `"atac"`, `"spatial"` and `"meth"` are supported. Default `"default"`. |
featureMeta |
Data frame for feature metadata. Default `NULL`. |
... |
Additional slot data. See ligerDataset for detail. Given values will be directly placed at corresponding slots. |
For an H5AD file written from an AnnData object, we allow using
`formatType = "anndata"` for the function to infer the proper structure.
However, a typical AnnData-based analysis tends to update the `adata.X`
attribute in place, and there is no standard or enforced convention for where
the raw count data, as needed by LIGER, is stored. Therefore, we expose the
argument `anndataX` for specifying this information. The default value `"X"`
looks for `adata.X`. If the raw data is stored in a layer, e.g.
`adata.layers['count']`, then use `anndataX = "layers/count"`. If it is
stored in `adata.raw.X`, then use `anndataX = "raw/X"`. If your AnnData
object does not retain the raw counts, you will have to go back to the Python
workflow to insert them at the desired location and re-write the H5AD file,
or start from the upstream source files with which the AnnData was
originally created.
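The mapping from AnnData attribute locations to `anndataX` values described above can be written down as a small lookup. The helper `anndataXPath` below is hypothetical (it is not an rliger function) and only covers the three locations mentioned in the text.

```r
# Hypothetical helper: translate where the raw counts live in the AnnData
# object into the HDF5 path expected by the `anndataX` argument.
anndataXPath <- function(location = c("X", "raw", "layer"), layer = NULL) {
  location <- match.arg(location)
  switch(location,
    X = "X",            # adata.X
    raw = "raw/X",      # adata.raw.X
    layer = {           # adata.layers['<layer>']
      if (is.null(layer)) {
        stop("`layer` must be given when location = \"layer\"")
      }
      paste0("layers/", layer)
    }
  )
}

anndataXPath("X")
anndataXPath("layer", layer = "count")
anndataXPath("raw")
```

The returned string would then be supplied as the `anndataX` argument together with `formatType = "anndata"` when reading a copy of the H5AD file.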
H5-based ligerDataset object
h5Path <- system.file("extdata/ctrl.h5", package = "rliger")
tempPath <- tempfile(fileext = ".h5")
file.copy(from = h5Path, to = tempPath)
ld <- createH5LigerDataset(tempPath)
This function allows creating a liger object from
multiple datasets of various forms (see `rawData`).
DO make a copy of the H5AD files because rliger functions write to the files and they will not be able to be read back to Python. This will be fixed in the future.
createLiger( rawData, modal = NULL, organism = "human", cellMeta = NULL, removeMissing = TRUE, addPrefix = "auto", formatType = "10X", anndataX = "X", dataName = NULL, indicesName = NULL, indptrName = NULL, genesName = NULL, barcodesName = NULL, newH5 = TRUE, verbose = getOption("ligerVerbose", TRUE), ..., raw.data = rawData, take.gene.union = NULL, remove.missing = removeMissing, format.type = formatType, data.name = dataName, indices.name = indicesName, indptr.name = indptrName, genes.name = genesName, barcodes.name = barcodesName )
rawData |
Named list of datasets. Required. Elements allowed include a
matrix, a |
modal |
Character vector for modality setting. Use one string for all
datasets, or the same number of strings as the number of datasets. Currently
options of |
organism |
Character vector for setting organism for identifying mito,
ribo and hemo genes for expression percentage calculation. Use one string for
all datasets, or the same number of strings as the number of datasets.
Currently options of |
cellMeta |
data.frame of metadata at single-cell level. Default
`NULL`. |
removeMissing |
Logical. Whether to remove cells that do not have any
counts from each dataset. Default `TRUE`. |
addPrefix |
Logical. Whether to add "datasetName_" as a prefix of
cell identifiers (e.g. barcodes) to avoid duplicates in multiple libraries
(common with 10X data). Default `"auto"`. |
formatType |
Select preset of H5 file structure. Current available
options are |
anndataX |
The HDF5 path to the raw count data in an H5AD file. See
|
dataName , indicesName , indptrName
|
The path in a H5 file for the raw
sparse matrix data. These three types of data stands for the |
genesName , barcodesName
|
The path in a H5 file for the gene names and
cell barcodes. Default |
newH5 |
When using HDF5 based data and subsets created after removing
missing cells/features, whether to create new HDF5 files for the subset.
Default `TRUE`. |
verbose |
Logical. Whether to show information of the progress. Default
`getOption("ligerVerbose", TRUE)`. |
... |
Additional slot values that should be directly placed in object. |
raw.data , remove.missing , format.type , data.name , indices.name , indptr.name , genes.name , barcodes.name
|
|
take.gene.union |
`createLigerDataset`, `createH5LigerDataset`
# Create from raw count matrices
ctrl.raw <- rawData(pbmc, "ctrl")
stim.raw <- rawData(pbmc, "stim")
pbmc1 <- createLiger(list(ctrl = ctrl.raw, stim = stim.raw))

# Create from H5 files
h5Path <- system.file("extdata/ctrl.h5", package = "rliger")
tempPath <- tempfile(fileext = ".h5")
file.copy(from = h5Path, to = tempPath)
lig <- createLiger(list(ctrl = tempPath))

# Create from other container object
if (requireNamespace("SeuratObject", quietly = TRUE)) {
    ctrl.seu <- SeuratObject::CreateSeuratObject(ctrl.raw)
    stim.seu <- SeuratObject::CreateSeuratObject(stim.raw)
    pbmc2 <- createLiger(list(ctrl = ctrl.seu, stim = stim.seu))
}
Create in-memory ligerDataset object
createLigerDataset( rawData = NULL, modal = c("default", "rna", "atac", "spatial", "meth"), normData = NULL, scaleData = NULL, featureMeta = NULL, ... )
rawData , normData , scaleData
|
A |
modal |
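Name of modality for this dataset. Currently options of
`"default"`, `"rna"`, `"atac"`, `"spatial"` and `"meth"` are supported. Default `"default"`. |
featureMeta |
Data frame of feature metadata. Default `NULL`. |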
... |
Additional slot data. See ligerDataset for detail. Given values will be directly placed at corresponding slots. |
ligerDataset, ligerATACDataset, ligerSpatialDataset, ligerMethDataset
ctrl.raw <- rawData(pbmc, "ctrl")
ctrl.ld <- createLigerDataset(ctrl.raw)
The data frame is the direct output of the marker detection DEG test applied
to the example dataset, which can be loaded with `data("pbmc")`. The DEG test
was done with:
defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
deg.marker <- runMarkerDEG(
    pbmc,
    minCellPerRep = 5
)
The result is for the marker detection test for 8 clusters in the dataset by comparing each cluster against all other clusters.
deg.marker
data.frame object of 1992 rows with columns:
feature: gene names, 249 unique genes repeated 8 times for the tests done for 8 clusters.
group: cluster names, 8 unique cluster names, dividing the tests.
logFC: log fold change of the gene expression between the cluster of interest against all other clusters.
pval: p-value of the DEG test.
padj: adjusted p-value of the DEG test.
pct_in: percentage of cells in the cluster of interest expressing the gene.
pct_out: percentage of cells in all other clusters expressing the gene.
The data frame is the direct output of the pairwise DEG test applied to the
example dataset, which can be loaded with `data("pbmc")`. The DEG test was
done with:
defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
degTest <- runPairwiseDEG(
    pbmc,
    groupTest = "stim",
    groupCtrl = "ctrl",
    variable1 = "dataset",
    splitBy = "defaultCluster"
)
The result is for the DEG test split for each cluster in the dataset, and within each cluster, compare the cells from "stim" against the cells from "ctrl".
deg.pw
data.frame object of 1743 rows with columns:
feature: gene names, 249 unique genes repeated 7 times for the tests
done for 7 clusters. (One fewer cluster than in `deg.marker` because the
sample size of the smallest cluster is too small.)
group: cluster names, 7 unique cluster names, dividing the tests.
logFC: log fold change of the gene expression between the condition of interest against the control condition.
pval: p-value of the DEG test.
padj: adjusted p-value of the DEG test.
pct_in: percentage of cells in the condition of interest expressing the gene.
pct_out: percentage of cells in the control condition expressing the gene.
This function mainly aims at downsampling datasets to a size suitable for plotting or expensive in-memory calculation.
Users can balance the sample size of categories of interest with `balance`.
Multi-variable specification to `balance` is supported, so that at most
`maxCells` cells will be sampled from each combination of categories from
the variables. For example, when two datasets are presented and three
clusters are labeled across them, there would then be at most
$2 \times 3 \times maxCells$ cells being selected. Note that `"dataset"`
will automatically be added as one variable when balancing the downsampling.
However, if users want to balance the downsampling solely based on dataset
origin, they have to explicitly set `balance = "dataset"`.
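The balancing rule described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the helper `balancedSampleIdx` and the toy metadata are hypothetical, and the sketch simply caps every combination of the balancing variables at `maxCells` cells.

```r
# Hypothetical helper: sample at most `maxCells` cell indices from every
# combination of the grouping variables given as columns of `groups`.
balancedSampleIdx <- function(groups, maxCells = 1000, seed = 1) {
  set.seed(seed)  # note: this resets the global RNG state
  # One combined factor, e.g. "ctrl.c1", "stim.c3"
  combo <- interaction(groups, drop = TRUE)
  idxByGroup <- split(seq_along(combo), combo)
  picked <- lapply(idxByGroup, function(idx) {
    if (length(idx) <= maxCells) idx else idx[sample.int(length(idx), maxCells)]
  })
  sort(unlist(picked, use.names = FALSE))
}

# Toy metadata: two datasets, three clusters, 100 cells per combination
meta <- data.frame(
  dataset = rep(c("ctrl", "stim"), each = 300),
  cluster = rep(c("c1", "c2", "c3"), times = 200)
)
idx <- balancedSampleIdx(meta[c("dataset", "cluster")], maxCells = 20)
length(idx)
table(meta$dataset[idx], meta$cluster[idx])
```

With two datasets, three clusters and `maxCells = 20`, the sketch selects 2 x 3 x 20 = 120 cells, 20 from each dataset-by-cluster combination. Groups smaller than `maxCells` would be kept whole, which is why the total is an upper bound in general.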
downsample( object, balance = NULL, maxCells = 1000, useDatasets = NULL, seed = 1, returnIndex = FALSE, ... )
object |
liger object |
balance |
Character vector of categorical variable names in
|
maxCells |
Max number of cells to sample from the grouping based on
|
useDatasets |
Index selection of datasets to include. Default
`NULL`. |
seed |
Random seed for reproducibility. Default `1`. |
returnIndex |
Logical, whether to only return the numeric index that can
subset the original object instead of a subset object. Default `FALSE`. |
... |
Arguments passed to |
By default, a subset of the liger `object`.
Alternatively, when `returnIndex = TRUE`, a numeric vector to be used
with the original object.
# Subsetting an object
pbmc <- downsample(pbmc)
# Creating a subsetting index
sampleIdx <- downsample(pbmcPlot, balance = "leiden_cluster", maxCells = 10,
                        returnIndex = TRUE)
plotClusterDimRed(pbmcPlot, cellIdx = sampleIdx)
Export the predicted gene-pair interactions calculated by the
upstream function `linkGenesAndPeaks` into an Interact Track file
which is compatible with the UCSC Genome Browser.
exportInteractTrack( corrMat, pathToCoords, useGenes = NULL, outputPath = getwd() )
corrMat |
A sparse matrix of correlation with peak names as rows and gene names as columns. |
pathToCoords |
Path to the gene coordinates file. |
useGenes |
Character vector of gene names to be exported. Default
`NULL`. |
outputPath |
Path of filename where the output file will be stored. If
a folder, a file named |
No return value. A file located at `outputPath` will be created.
bmmc <- normalize(bmmc)
bmmc <- selectGenes(bmmc)
bmmc <- scaleNotCenter(bmmc)
if (requireNamespace("RcppPlanc", quietly = TRUE) &&
    requireNamespace("GenomicRanges", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE) &&
    requireNamespace("psych", quietly = TRUE)) {
    bmmc <- runINMF(bmmc)
    bmmc <- alignFactors(bmmc)
    bmmc <- normalizePeak(bmmc)
    bmmc <- imputeKNN(bmmc, reference = "atac", queries = "rna")
    corr <- linkGenesAndPeaks(
        bmmc, useDataset = "rna",
        pathToCoords = system.file("extdata/hg19_genes.bed", package = "rliger")
    )
    resultPath <- tempfile()
    exportInteractTrack(
        corrMat = corr,
        pathToCoords = system.file("extdata/hg19_genes.bed", package = "rliger"),
        outputPath = resultPath
    )
    head(read.table(resultPath, skip = 1))
}
Applies various filters to genes on the shared ($W$) and
dataset-specific ($V$) components of the factorization, before selecting
those which load most significantly on each factor (in a shared or
dataset-specific way).
getFactorMarkers( object, dataset1, dataset2, factorShareThresh = 10, datasetSpecificity = NULL, logFCThresh = 1, pvalThresh = 0.05, nGenes = 30, printGenes = FALSE, verbose = getOption("ligerVerbose", TRUE), factor.share.thresh = factorShareThresh, dataset.specificity = datasetSpecificity, log.fc.thresh = logFCThresh, pval.thresh = pvalThresh, num.genes = nGenes, print.genes = printGenes )
object |
liger object with factorization results. |
dataset1 |
Name of first dataset. Required. |
dataset2 |
Name of second dataset. Required. |
factorShareThresh |
Numeric. Only factors with a dataset specificity
less than or equal to this threshold will be used. Default `10`. |
datasetSpecificity |
Numeric vector. Pre-calculated dataset specificity
if available. Length should match number of all factors available. Default
`NULL`. |
logFCThresh |
Numeric. Lower log-fold change threshold for differential
expression in markers. Default `1`. |
pvalThresh |
Numeric. Upper p-value threshold for Wilcoxon rank test for
gene expression. Default `0.05`. |
nGenes |
Integer. Max number of genes to report for each dataset.
Default `30`. |
printGenes |
Logical. Whether to print ordered markers passing logFC,
UMI and frac thresholds, when |
verbose |
Logical. Whether to show information of the progress. Default
|
factor.share.thresh , dataset.specificity , log.fc.thresh , pval.thresh , num.genes , print.genes
|
Deprecated. See Usage section for replacement. |
A list object consisting of the following entries:
value of dataset1 |
data.frame of dataset1-specific markers |
shared |
data.frame of shared markers |
value of dataset2 |
data.frame of dataset2-specific markers |
num_factors_V1 |
A frequency table indicating the number of factors each marker appears in, for dataset1 |
num_factors_V2 |
A frequency table indicating the number of factors each marker appears in, for dataset2 |
library(dplyr)
result <- getFactorMarkers(pbmcPlot, dataset1 = "ctrl", dataset2 = "stim")
print(class(result))
print(names(result))
result$shared %>% group_by(factor_num) %>% top_n(2, logFC)
Calculates proportion of mitochondrial contribution based on raw or normalized data.
getProportionMito(object, use.norm = FALSE, pattern = "^mt-")
object |
|
use.norm |
Deprecated. Whether to use cell normalized data in
calculating contribution. Default `FALSE`. |
pattern |
Regex pattern for identifying mitochondrial genes. Default
`"^mt-"`. |
Named vector containing proportion of mitochondrial contribution for each cell.
`getProportionMito` will be deprecated because `runGeneralQC` generally
covers and expands its use case.
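For orientation, the quantity returned here can be sketched in base R on a toy count matrix: the counts of features whose names match `pattern` are summed per cell and divided by the total counts of that cell. The helper `propMitoSketch` and the toy gene names are hypothetical, and the case-sensitive matching is an assumption of this sketch rather than documented behavior.

```r
# Hypothetical helper: per-cell fraction of counts from features that
# match `pattern` (features in rows, cells in columns).
propMitoSketch <- function(counts, pattern = "^mt-") {
  isMito <- grepl(pattern, rownames(counts))
  if (!any(isMito)) message("No feature matches pattern: ", pattern)
  colSums(counts[isMito, , drop = FALSE]) / colSums(counts)
}

counts <- matrix(
  c(5, 0,
    3, 2,
    2, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("mt-Nd1", "Actb", "Gapdh"), c("cell1", "cell2"))
)
propMitoSketch(counts)
```

In this toy input, `cell1` has 5 of its 10 counts on the single matching gene and `cell2` has none, so the named vector holds 0.5 and 0. A human dataset would typically need a different pattern, since the default only matches names starting with lowercase `mt-`.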
# Example dataset does not contain MT genes, expected to see a message
pbmc$mito <- getProportionMito(pbmc)
H5 calculation wrapper that runs a specified calculation on an on-disk matrix in chunks
H5Apply( object, FUN, init = NULL, useData = c("rawData", "normData"), chunkSize = 1000, verbose = getOption("ligerVerbose"), ... )
object |
A ligerDataset object. |
FUN |
A function that is applied to each chunk. See detail for restrictions. |
init |
Initialized result if it needs to be updated iteratively. Default
`NULL`. |
useData |
The slot name of the data to be processed. Choose from
`"rawData"` or `"normData"`. Default `"rawData"`. |
chunkSize |
Number of columns to be included in each chunk.
Default `1000`. |
verbose |
Logical. Whether to show information of the progress. Default
`getOption("ligerVerbose")`. |
... |
Other arguments to be passed to |
The FUN function must take the following four arguments first, in this order:
1. chunk data: A sparse matrix (dgCMatrix-class) containing at most chunkSize columns.
2. x-vector index: The index that subsets the x slot vector of a dgCMatrix, pointing to the values in each chunk. Mostly used when a new sparse matrix needs to be written to an H5 file.
3. cell index: The column index of each chunk within the whole original matrix.
4. Initialized result: A customized object. The value passed to the init argument of H5Apply will be passed here in the first iteration, and the value returned by FUN will be passed here in each subsequent chunk iteration. It is therefore important to keep the object structure of the returned value consistent with init.
No default values should be pre-defined for these four arguments because H5Apply will automatically generate the input.
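As an illustration of the four-argument order required of FUN, the following is a minimal sketch of a function that accumulates row sums across chunks. The names rowSumFun and chunkApply, and all argument names, are hypothetical and not part of rliger. chunkApply is only an in-memory stand-in for the chunk loop, written on a dense base R matrix, so the x-vector index is passed as NULL here.

```r
# FUN following the documented argument order:
# chunk data, x-vector index, cell index, initialized result.
# Argument names are illustrative (an assumption of this sketch).
rowSumFun <- function(chunk, xIdx, cellIdx, values) {
  # `values` holds `init` on the first chunk, then the previous return value
  if (is.null(values)) values <- numeric(nrow(chunk))
  # Return an object with the same structure as `init`
  values + rowSums(chunk)
}

# Hypothetical in-memory stand-in for the chunk loop; NOT rliger code.
chunkApply <- function(mat, FUN, init = NULL, chunkSize = 1000) {
  result <- init
  starts <- seq(1, ncol(mat), by = chunkSize)
  for (s in starts) {
    cellIdx <- s:min(s + chunkSize - 1, ncol(mat))
    chunk <- mat[, cellIdx, drop = FALSE]
    # The x-vector index only makes sense for a sparse matrix; omitted here
    result <- FUN(chunk, NULL, cellIdx, result)
  }
  result
}

mat <- matrix(1:12, nrow = 3)
total <- chunkApply(mat, rowSumFun, init = numeric(3), chunkSize = 2)
print(total)
```

Assuming a ligerDataset backed by an H5 file, the same rowSumFun would presumably be passed as the FUN argument of H5Apply together with a suitable init.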
These are functions to download example datasets that are subset from public data.
PBMC - Downsampled from GSE96583, Kang et al, Nature Biotechnology, 2018. Contains two scRNAseq datasets.
BMMC - Downsampled from GSE139369, Granja et al, Nature Biotechnology, 2019. Contains two scRNAseq datasets and one scATAC dataset.
CGE - Downsampled from GSE97179, Luo et al, Science, 2017. Contains one scRNAseq dataset and one DNA methylation dataset.
importPBMC( dir = getwd(), overwrite = FALSE, method = "libcurl", verbose = getOption("ligerVerbose", TRUE), ... ) importBMMC( dir = getwd(), overwrite = FALSE, method = "libcurl", verbose = getOption("ligerVerbose", TRUE), ... ) importCGE( dir = getwd(), overwrite = FALSE, method = "libcurl", verbose = getOption("ligerVerbose", TRUE), ... )
dir |
Path to download datasets. Default current working directory getwd(). |
overwrite |
Logical, if a file exists at corresponding download location, whether to re-download or directly use this file. Default FALSE. |
method |
|
verbose |
Logical. Whether to show information of the progress. Default getOption("ligerVerbose", TRUE). |
... |
Additional arguments passed to |
Constructed liger object with QC performed and missing data removed.
pbmc <- importPBMC()
bmmc <- importBMMC()
cge <- importCGE()
This function creates peak data for a dataset that has only gene expression. It uses aligned cell factor loadings to find nearest neighbors between cells from the queried dataset (without peaks) and cells from the reference dataset (with peaks), and then imputes the peaks for the former based on the weight. Therefore, the selected reference dataset must be of "atac" modality.
imputeKNN( object, reference, queries = NULL, nNeighbors = 20, weight = TRUE, norm = TRUE, scale = FALSE, verbose = getOption("ligerVerbose", TRUE), ..., knn_k = nNeighbors )
object |
liger object with aligned factor loading computed in advance. |
reference |
Name of a dataset containing peak data to impute into query dataset(s). |
queries |
Names of datasets to be augmented by imputation. Should not
include |
nNeighbors |
The maximum number of nearest neighbors to search. Default 20. |
weight |
Logical. Whether to use KNN distances as weight matrix. Default TRUE. |
norm |
Logical. Whether to normalize the imputed data. Default TRUE. |
scale |
Logical. Whether to scale but not center the imputed data. Default FALSE. |
verbose |
Logical. Whether to show information of the progress. Default getOption("ligerVerbose", TRUE). |
... |
Optional arguments to be passed to |
knn_k |
Deprecated. See Usage section for replacement. |
The input object where queried ligerDataset objects in the datasets slot are replaced. These datasets will all be converted to ligerATACDataset class with an additional slot rawPeak to store the imputed peak counts, and normPeak for normalized imputed peak counts if norm = TRUE.
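The neighbor-based imputation described for imputeKNN can be sketched in base R as follows. This is a simplified illustration, not the rliger implementation: the function imputeSketch and its arguments are hypothetical, and both the Euclidean distance and the exp(-distance) weighting are assumptions made for the example.

```r
# Hypothetical sketch of KNN-based peak imputation; NOT rliger code.
# Href:    reference cells x factors (aligned factor loadings)
# Hquery:  query cells x factors (aligned factor loadings)
# peakRef: peaks x reference cells
imputeSketch <- function(Href, Hquery, peakRef, nNeighbors = 2, weight = TRUE) {
  imputed <- matrix(0, nrow = nrow(peakRef), ncol = nrow(Hquery),
                    dimnames = list(rownames(peakRef), rownames(Hquery)))
  for (j in seq_len(nrow(Hquery))) {
    # Euclidean distance from query cell j to every reference cell
    d <- sqrt(rowSums(sweep(Href, 2, Hquery[j, ])^2))
    # Indices of the nearest reference cells
    nn <- order(d)[seq_len(nNeighbors)]
    # Assumed weighting: closer neighbors contribute more; otherwise uniform
    w <- if (weight) exp(-d[nn]) else rep(1, nNeighbors)
    w <- w / sum(w)
    # Weighted average of the neighbors' peak profiles
    imputed[, j] <- peakRef[, nn, drop = FALSE] %*% w
  }
  imputed
}

Href <- rbind(r1 = c(0, 0), r2 = c(1, 1), r3 = c(10, 10))
Hquery <- rbind(q1 = c(0, 0))
peakRef <- matrix(c(2, 0, 4, 0, 0, 8), nrow = 2,
                  dimnames = list(c("peak1", "peak2"), rownames(Href)))
imputed <- imputeSketch(Href, Hquery, peakRef, nNeighbors = 2, weight = FALSE)
print(imputed)
```

With weight = FALSE the two nearest reference cells contribute equally, so the imputed profile is their plain average.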
bmmc <- normalize(bmmc)
bmmc <- selectGenes(bmmc, datasets.use = "rna")
bmmc <- scaleNotCenter(bmmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    bmmc <- runINMF(bmmc, k = 20)
    bmmc <- alignFactors(bmmc)
    bmmc <- normalizePeak(bmmc)
    bmmc <- imputeKNN(bmmc, reference = "atac", queries = "rna")
}
Check if a given liger object is under the new implementation
is.newLiger(object)
object |
A liger object |
TRUE if the version of object is later than or equal to 1.99.0. Otherwise FALSE. It raises an error if the input object is not of liger class.
is.newLiger(pbmc) # TRUE
Check if a liger or ligerDataset object is based on an HDF5 file
isH5Liger(object, dataset = NULL)
object |
A liger or ligerDataset object. |
dataset |
If |
TRUE or FALSE for the specified check.
isH5Liger(pbmc)
isH5Liger(pbmc, "ctrl")
ctrl <- dataset(pbmc, "ctrl")
isH5Liger(ctrl)
The liger object is the main data container for LIGER analysis in R. The slot datasets is a list where each element should be a ligerDataset object containing dataset-specific information, such as the expression matrices. The other parts of the liger object store information that can be shared across the analysis, such as the cell metadata.
This manual explains the liger object structure as well as the usage of class-specific methods. Please see the detail sections for more information.
For liger objects created with older versions of the rliger package, please try updating the objects individually with convertOldLiger.
datasets(x, check = NULL) datasets(x, check = TRUE) <- value dataset(x, dataset = NULL) dataset(x, dataset, type = NULL, qc = TRUE) <- value cellMeta( x, columns = NULL, useDatasets = NULL, cellIdx = NULL, as.data.frame = FALSE, ... ) cellMeta( x, columns = NULL, useDatasets = NULL, cellIdx = NULL, inplace = FALSE, check = FALSE ) <- value defaultCluster(x, useDatasets = NULL, ...) defaultCluster(x, name = NULL, useDatasets = NULL, ...) <- value dimReds(x) dimReds(x) <- value dimRed(x, name = NULL, useDatasets = NULL, cellIdx = NULL, ...) dimRed(x, name = NULL, useDatasets = NULL, cellIdx = NULL, ...) <- value defaultDimRed(x, useDatasets = NULL, cellIdx = NULL) defaultDimRed(x) <- value varFeatures(x) varFeatures(x, check = TRUE) <- value varUnsharedFeatures(x, dataset = NULL) varUnsharedFeatures(x, dataset, check = TRUE) <- value commands(x, funcName = NULL, arg = NULL) ## S4 method for signature 'liger' show(object) ## S4 method for signature 'liger' dim(x) ## S4 method for signature 'liger' dimnames(x) ## S4 replacement method for signature 'liger,list' dimnames(x) <- value ## S4 method for signature 'liger' datasets(x, check = NULL) ## S4 replacement method for signature 'liger,logical' datasets(x, check = TRUE) <- value ## S4 replacement method for signature 'liger,missing' datasets(x, check = TRUE) <- value ## S4 method for signature 'liger,character_OR_NULL' dataset(x, dataset = NULL) ## S4 method for signature 'liger,missing' dataset(x, dataset = NULL) ## S4 method for signature 'liger,numeric' dataset(x, dataset = NULL) ## S4 replacement method for signature 'liger,character,missing,ANY,ligerDataset' dataset(x, dataset, type = NULL, qc = TRUE) <- value ## S4 replacement method for signature 'liger,character,ANY,ANY,matrixLike' dataset(x, dataset, type = c("rawData", "normData"), qc = FALSE) <- value ## S4 replacement method for signature 'liger,character,missing,ANY,NULL' dataset(x, dataset, type = NULL, qc = TRUE) <- value ## S3 method for class 'liger' 
names(x) ## S3 replacement method for class 'liger' names(x) <- value ## S3 method for class 'liger' length(x) ## S3 method for class 'liger' lengths(x, use.names = TRUE) ## S4 method for signature 'liger,NULL' cellMeta( x, columns = NULL, useDatasets = NULL, cellIdx = NULL, as.data.frame = FALSE, ... ) ## S4 method for signature 'liger,character' cellMeta( x, columns = NULL, useDatasets = NULL, cellIdx = NULL, as.data.frame = FALSE, ... ) ## S4 method for signature 'liger,missing' cellMeta( x, columns = NULL, useDatasets = NULL, cellIdx = NULL, as.data.frame = FALSE, ... ) ## S4 replacement method for signature 'liger,missing' cellMeta(x, columns = NULL, useDatasets = NULL, cellIdx = NULL, check = FALSE) <- value ## S4 replacement method for signature 'liger,character' cellMeta( x, columns = NULL, useDatasets = NULL, cellIdx = NULL, inplace = TRUE, check = FALSE ) <- value ## S4 method for signature 'liger' rawData(x, dataset = NULL) ## S4 replacement method for signature 'liger,ANY,ANY,matrixLike_OR_NULL' rawData(x, dataset = NULL, check = TRUE) <- value ## S4 replacement method for signature 'liger,ANY,ANY,H5D' rawData(x, dataset = NULL, check = TRUE) <- value ## S4 method for signature 'liger' normData(x, dataset = NULL) ## S4 replacement method for signature 'liger,ANY,ANY,matrixLike_OR_NULL' normData(x, dataset = NULL, check = TRUE) <- value ## S4 replacement method for signature 'liger,ANY,ANY,H5D' normData(x, dataset = NULL, check = TRUE) <- value ## S4 method for signature 'liger,ANY' scaleData(x, dataset = NULL) ## S4 replacement method for signature 'liger,ANY,ANY,matrixLike_OR_NULL' scaleData(x, dataset = NULL, check = TRUE) <- value ## S4 replacement method for signature 'liger,ANY,ANY,H5D' scaleData(x, dataset = NULL, check = TRUE) <- value ## S4 replacement method for signature 'liger,ANY,ANY,H5Group' scaleData(x, dataset = NULL, check = TRUE) <- value ## S4 method for signature 'liger,character' scaleUnsharedData(x, dataset = NULL) ## S4 method for 
signature 'liger,numeric' scaleUnsharedData(x, dataset = NULL) ## S4 replacement method for signature 'liger,ANY,ANY,matrixLike_OR_NULL' scaleUnsharedData(x, dataset = NULL, check = TRUE) <- value ## S4 replacement method for signature 'liger,ANY,ANY,H5D' scaleUnsharedData(x, dataset = NULL, check = TRUE) <- value ## S4 replacement method for signature 'liger,ANY,ANY,H5Group' scaleUnsharedData(x, dataset = NULL, check = TRUE) <- value ## S4 method for signature 'liger,ANY,ANY,ANY' getMatrix( x, slot = c("rawData", "normData", "scaleData", "scaleUnsharedData", "H", "V", "U", "A", "B", "W", "H.norm", "rawPeak", "normPeak"), dataset = NULL, returnList = FALSE ) ## S4 method for signature 'liger,ANY' getH5File(x, dataset = NULL) ## S3 replacement method for class 'liger' x[[i]] <- value ## S3 method for class 'liger' x$name ## S3 replacement method for class 'liger' x$name <- value ## S4 method for signature 'liger' defaultCluster(x, useDatasets = NULL, droplevels = FALSE, ...) ## S4 replacement method for signature 'liger,ANY,ANY,character' defaultCluster(x, name = NULL, useDatasets = NULL, ...) <- value ## S4 replacement method for signature 'liger,ANY,ANY,factor' defaultCluster(x, name = NULL, useDatasets = NULL, droplevels = TRUE, ...) <- value ## S4 replacement method for signature 'liger,ANY,ANY,NULL' defaultCluster(x, name = NULL, useDatasets = NULL, ...) <- value ## S4 method for signature 'liger' dimReds(x) ## S4 replacement method for signature 'liger,list' dimReds(x) <- value ## S4 method for signature 'liger,missing_OR_NULL' dimRed(x, name = NULL, useDatasets = NULL, cellIdx = NULL, ...) ## S4 method for signature 'liger,index' dimRed(x, name = NULL, useDatasets = NULL, cellIdx = NULL, ...) ## S4 replacement method for signature 'liger,index,ANY,ANY,NULL' dimRed(x, name = NULL, useDatasets = NULL, cellIdx = NULL, ...) 
<- value ## S4 replacement method for signature 'liger,character,ANY,ANY,matrixLike' dimRed( x, name = NULL, useDatasets = NULL, cellIdx = NULL, asDefault = NULL, inplace = FALSE, ... ) <- value ## S4 method for signature 'liger' defaultDimRed(x, useDatasets = NULL, cellIdx = NULL) ## S4 replacement method for signature 'liger,character' defaultDimRed(x) <- value ## S4 method for signature 'liger' varFeatures(x) ## S4 replacement method for signature 'liger,ANY,character' varFeatures(x, check = TRUE) <- value ## S4 method for signature 'liger,ANY' varUnsharedFeatures(x, dataset = NULL) ## S4 replacement method for signature 'liger,ANY,ANY,character' varUnsharedFeatures(x, dataset, check = TRUE) <- value ## S3 method for class 'liger' fortify(model, data, ...) ## S3 method for class 'liger' c(...) ## S4 method for signature 'liger' commands(x, funcName = NULL, arg = NULL) ## S4 method for signature 'ligerDataset,missing' varUnsharedFeatures(x, dataset = NULL) ## S4 replacement method for signature 'ligerDataset,missing,ANY,character' varUnsharedFeatures(x, dataset = NULL, check = TRUE) <- value
x , object , model
|
A liger object |
check |
Logical, whether to perform object validity check on setting new
value. Users are not supposed to set |
value |
Metadata value to be inserted |
dataset |
Name or numeric index of a dataset |
type |
When using |
qc |
Logical, whether to perform general qc on added new dataset. |
columns |
The names of available variables in |
useDatasets |
Setter or getter method should only apply to cells in
specified datasets. Any valid character, numeric or logical subscript is
acceptable. Default |
cellIdx |
Valid cell subscription to subset retrieved variables. Default
|
as.data.frame |
Logical, whether to apply
|
... |
See detailed sections for explanation. |
inplace |
For |
name |
The name of available variables in |
funcName , arg
|
See Command records section. |
use.names |
Whether returned vector should be named with dataset names. |
slot |
Name of slot to retrieve matrix from. Options shown in Usage. |
returnList |
Logical, whether to force return a list even when only one
dataset-specific matrix (i.e. expression matrices, H, V or U) is requested.
Default |
i |
Name or numeric index of cell meta variable to be replaced |
droplevels |
Whether to remove unused cluster levels from the factor
object fetched by |
asDefault |
Whether to set the inserted dimension reduction matrix as
default for visualization methods. Default |
data |
fortify method required argument. Not used. |
See detailed sections for explanation.
Input liger object updated with replaced/new variable in cellMeta(x).
datasets
list of ligerDataset objects. Use generic dataset, dataset<-, datasets or datasets<- to interact with. See detailed section accordingly.
cellMeta
DFrame object for cell metadata. Pre-existing metadata, QC metrics, cluster labeling, etc. are all stored here. Use generic cellMeta, cellMeta<-, $, [[]] or [[]]<- to interact with. See detailed section accordingly.
varFeatures
Character vector of names of variable features. Use generic varFeatures or varFeatures<- to interact with. See detailed section accordingly.
W
iNMF output matrix of shared gene loadings for each factor. See runIntegration.
H.norm
Matrix of aligned factor loading for each cell. See alignFactors and runIntegration.
commands
List of ligerCommand objects. Record of analysis. Use commands to retrieve information. See detailed section accordingly.
uns
List for unstructured meta-info of analyses or presets.
version
Record of version of rliger package
The datasets() method only accesses the datasets slot, the list of ligerDataset objects. The dataset() method accesses a single dataset, and the associated cell metadata updates and checks are performed whenever a dataset is added or modified. Therefore, when users want to modify something inside a ligerDataset without changing any cell metadata, it is recommended to use datasets(x)[[name]] <- ligerD for efficiency, though the result would be the same as dataset(x, name) <- ligerD.
The length() and names() methods are implemented to access the number and names of datasets. The names<- method is supported for modifying dataset names, and also takes care of the "dataset" variable in cell metadata.
For a liger object, the rawData(), normData(), scaleData() and scaleUnsharedData() methods are exported for users to access the corresponding feature expression matrix of one specified dataset. To retrieve a type of matrix from multiple datasets, please use the getMatrix() method.
When only one matrix is expected to be retrieved by getMatrix(), the matrix itself will be returned. A list will be returned if multiple matrices are requested (by querying multiple datasets) or returnList is set to TRUE.
Three approaches are provided for accessing cell metadata. The generic function cellMeta is implemented with plenty of options and supports access to multiple variables. In addition, users can use double brackets (e.g. ligerObj[[varName]]) or the dollar sign (e.g. ligerObj$nUMI) to access or modify single variables.
To make it convenient to generate a customized ggplot from the available cell metadata, the S3 method fortify.liger is implemented. With this under the hood, users can create simple ggplots by starting directly with ggplot(ligerObj, aes(...)), where cell metadata variables can be used directly in aes().
Special partial metadata insertion is implemented specifically for mapping categorical annotation from a sub-population (subset object) back to the original experiment (full-size object). For example, when sub-clustering and annotation are done for cells of a specific cell type (stored in subobj) subset from an experiment (stored as obj), users can do cellMeta(obj, "sub_ann", cellIdx = colnames(subobj)) <- subobj$sub_ann to map the values back, leaving other cells non-annotated with NAs. Plotting with this variable will then also show NA cells in the default grey color. Furthermore, sub-clustering labels for other cell types can also be mapped to the same variable. For example, cellMeta(obj, "sub_ann", cellIdx = colnames(subobj2)) <- subobj2$sub_ann. As long as the labeling variables are stored as factor class (categorical), the levels (category names) will be properly handled and merged. Other situations follow the R default behavior (e.g. categories might be converted to integer numbers if mapped to a numerical variable in the original object). Note that this feature is only available when using the generic function cellMeta, but not with the `[[` or `$` accessing methods, due to syntax reasons.
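The level-merging behavior of partial insertion can be illustrated with a small base R sketch on a plain data frame. The helper insertPartial, the cell names, and the labels below are hypothetical and only mimic the idea; this is not how rliger implements cellMeta<-.

```r
# Full-size metadata: one row per cell; "sub_ann" does not exist yet
meta <- data.frame(row.names = paste0("cell", 1:6))

# Hypothetical helper mimicking partial insertion of a categorical variable;
# NOT the rliger implementation.
insertPartial <- function(meta, column, cellIdx, value) {
  value <- factor(value)
  old <- meta[[column]]
  if (is.null(old)) {
    # Start from an all-NA factor when the variable is new
    old <- factor(rep(NA, nrow(meta)), levels = levels(value))
  }
  # Merge category names so that earlier annotations are preserved
  merged <- factor(as.character(old),
                   levels = union(levels(old), levels(value)))
  merged[match(cellIdx, rownames(meta))] <- value
  meta[[column]] <- merged
  meta
}

# Map labels from a first subset, then from a second subset
meta <- insertPartial(meta, "sub_ann", c("cell1", "cell2"), c("CD4 T", "CD8 T"))
meta <- insertPartial(meta, "sub_ann", "cell5", "Naive B")
print(meta)
print(levels(meta$sub_ann))
```

Cells that were never annotated stay NA, and the levels from both insertions end up in one factor, which is the outcome the paragraph above describes for factor-class labels.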
The generic defaultCluster works as both getter and setter. As a setter, users can do defaultCluster(obj) <- "existingVariableName" to set a categorical variable as the default cluster used for visualization or downstream analysis. Users can also do defaultCluster(obj, "newVarName") <- factorOfLabels to push new labeling into the object and set it as default. As a getter, the function returns a factor object of the default cluster labeling. The argument useDatasets can be used to require that the given or retrieved labeling match the cells in the specified datasets. We generally don't recommend setting "dataset" as a default cluster because it is a preserved (always existing) field in metadata and can lead to meaningless results when running analyses that use both the clustering information and the dataset source information.
Currently, low-dimensional representations of cells, presented as dense matrices, are all stored in the dimReds slot, and can be accessed with the generics dimRed and dimRed<-. Adding a dimRed to the object is as simple as dimRed(obj, "name") <- matrixLike. It can be retrieved with dimRed(obj, "name"). Similar to the default cluster labeling, a default dimRed is also supported. It can be set with defaultDimRed(obj) <- "existingMatLikeVar" and the matrix can be retrieved with defaultDimRed(obj).
The varFeatures slot holds a character vector of gene names. varFeatures(x) returns this vector, and value for the varFeatures<- method has to be a character vector or NULL. The replacement method, when check = TRUE, checks gene name consistency across the scaleData, H, V slots of the inner ligerDataset objects as well as the W and H.norm slots of the input liger object.
rliger functions that perform calculation and update the liger object will be recorded in a ligerCommand object and stored in the commands slot, a list, of the liger object. The method commands() is implemented to retrieve or show the log history. Running with funcName = NULL (default) returns all command labels. Specifying funcName allows partial matching to all command labels and returns a subset list (of ligerCommand objects) of matches (or the ligerCommand object if only one match is found). If arg is further specified, a subset list of parameters from the matches will be returned. For example, to request a list of the resolution values used in all Louvain clustering attempts: commands(ligerObj, "louvainCluster", "resolution")
For a liger object, the column orientation is assigned for cells. Due to the data structure, it is hard to define a row index for the liger object, which might contain datasets that vary in number of genes.
Therefore, for liger objects, dim and dimnames return NA/NULL for rows and total cell counts/barcodes for the columns.
For a direct call of the dimnames<- method, value should be a list with NULL as the first element and valid cell identifiers as the second element. For the colnames<- method, value should be the character vector of cell identifiers. The rownames<- method is not applicable.
For more details on subsetting a liger object or a ligerDataset object, please check out subsetLiger and subsetLigerDataset. Here, we set the S4 method "single-bracket" [ as a quick wrapper to subset a liger object. Note that j serves as the cell subscript, which can be any valid index referring to the collection of all cells (i.e. rownames(cellMeta(obj))). In contrast, i, the feature subscript, can only be a character vector because the features for each dataset can vary. ... arguments are passed to subsetLiger so that advanced options are allowed.
The list of datasets
slot,
the rows of cellMeta
slot and the list of commands
slot will
be simply concatenated. Variable features in varFeatures slot will be
merged by taking their union. The and
matrices are not taken into
account for now.
# Methods for base generics
pbmcPlot
print(pbmcPlot)
dim(pbmcPlot)
ncol(pbmcPlot)
colnames(pbmcPlot)[1:5]
pbmcPlot[varFeatures(pbmcPlot)[1:10], 1:10]
names(pbmcPlot)
length(pbmcPlot)

# rliger generics
## Retrieving dataset(s), replacement methods available
datasets(pbmcPlot)
dataset(pbmcPlot, "ctrl")
dataset(pbmcPlot, 2)

## Retrieving cell metadata, replacement methods available
cellMeta(pbmcPlot)
head(pbmcPlot[["nUMI"]])

## Retrieving dimension reduction matrix
head(dimRed(pbmcPlot, "UMAP"))

## Retrieving variable features, replacement methods available
varFeatures(pbmcPlot)

## Command record/history
pbmcPlot <- scaleNotCenter(pbmcPlot)
commands(pbmcPlot)
commands(pbmcPlot, funcName = "scaleNotCenter")

# S3 methods
pbmcPlot2 <- pbmcPlot
names(pbmcPlot2) <- paste0(names(pbmcPlot), 2)
c(pbmcPlot, pbmcPlot2)

library(ggplot2)
ggplot(pbmcPlot, aes(x = UMAP_1, y = UMAP_2)) + geom_point()
cellMeta(pbmc)
# Add new variable
pbmc[["newVar"]] <- 1
cellMeta(pbmc)
# Change existing variable
pbmc[["newVar"]][1:3] <- 1:3
cellMeta(pbmc)
Inherits from ligerDataset class. Contained slots can be referred to via the link.
rawPeak
sparse matrix
normPeak
sparse matrix
ligerCommand object: Record the input and time of a LIGER function call
## S4 method for signature 'ligerCommand' show(object)
object |
A |
funcName
Name of the function
time
A time stamp object
call
A character string converted from system call
parameters
List of all arguments except the liger object. Large objects are summarized to a short string.
objSummary
List of attributes of the liger object, recorded as a snapshot when the command is run.
ligerVersion
Character string converted from packageVersion("rliger").
dependencyVersion
Named character vector of version numbers of any dependency library that has a chance to be used by the function. A dependency might only be invoked under certain conditions, such as when an alternative algorithm is selected; it is still recorded for the call even if the call does not actually reach it.
pbmc <- normalize(pbmc)
cmd <- commands(pbmc, "normalize")
cmd
Object for storing dataset-specific information. It is embedded within a higher-level liger object.
rawData(x, dataset = NULL)
rawData(x, dataset = NULL, check = TRUE) <- value
normData(x, dataset = NULL)
normData(x, dataset = NULL, check = TRUE) <- value
scaleData(x, dataset = NULL)
scaleData(x, dataset = NULL, check = TRUE) <- value
scaleUnsharedData(x, dataset = NULL)
scaleUnsharedData(x, dataset = NULL, check = TRUE) <- value
getMatrix(x, slot = "rawData", dataset = NULL, returnList = FALSE)
h5fileInfo(x, info = NULL)
h5fileInfo(x, info = NULL, check = TRUE) <- value
getH5File(x, dataset = NULL)

## S4 method for signature 'ligerDataset,missing'
getH5File(x, dataset = NULL)

featureMeta(x, check = NULL)
featureMeta(x, check = TRUE) <- value

## S4 method for signature 'ligerDataset'
show(object)

## S4 method for signature 'ligerDataset'
dim(x)

## S4 method for signature 'ligerDataset'
dimnames(x)

## S4 replacement method for signature 'ligerDataset,list'
dimnames(x) <- value

## S4 method for signature 'ligerDataset'
rawData(x, dataset = NULL)

## S4 replacement method for signature 'ligerDataset,ANY,ANY,matrixLike_OR_NULL'
rawData(x, dataset = NULL, check = TRUE) <- value

## S4 replacement method for signature 'ligerDataset,ANY,ANY,H5D'
rawData(x, dataset = NULL, check = TRUE) <- value

## S4 method for signature 'ligerDataset'
normData(x, dataset = NULL)

## S4 replacement method for signature 'ligerDataset,ANY,ANY,matrixLike_OR_NULL'
normData(x, dataset = NULL, check = TRUE) <- value

## S4 replacement method for signature 'ligerDataset,ANY,ANY,H5D'
normData(x, dataset = NULL, check = TRUE) <- value

## S4 method for signature 'ligerDataset,missing'
scaleData(x, dataset = NULL)

## S4 replacement method for signature 'ligerDataset,ANY,ANY,matrixLike_OR_NULL'
scaleData(x, dataset = NULL, check = TRUE) <- value

## S4 replacement method for signature 'ligerDataset,ANY,ANY,H5D'
scaleData(x, dataset = NULL, check = TRUE) <- value

## S4 replacement method for signature 'ligerDataset,ANY,ANY,H5Group'
scaleData(x, dataset = NULL, check = TRUE) <- value

## S4 method for signature 'ligerDataset,missing'
scaleUnsharedData(x, dataset = NULL)

## S4 replacement method for signature 'ligerDataset,missing,ANY,matrixLike_OR_NULL'
scaleUnsharedData(x, check = TRUE) <- value

## S4 replacement method for signature 'ligerDataset,missing,ANY,H5D'
scaleUnsharedData(x, check = TRUE) <- value

## S4 replacement method for signature 'ligerDataset,missing,ANY,H5Group'
scaleUnsharedData(x, check = TRUE) <- value

## S4 method for signature 'ligerDataset,ANY,missing,missing'
getMatrix(
  x,
  slot = c("rawData", "normData", "scaleData", "scaleUnsharedData", "H", "V", "U", "A", "B"),
  dataset = NULL
)

## S4 method for signature 'ligerDataset'
h5fileInfo(x, info = NULL)

## S4 replacement method for signature 'ligerDataset'
h5fileInfo(x, info = NULL, check = TRUE) <- value

## S4 method for signature 'ligerDataset'
featureMeta(x, check = NULL)

## S4 replacement method for signature 'ligerDataset'
featureMeta(x, check = TRUE) <- value

## S3 method for class 'ligerDataset'
cbind(x, ..., deparse.level = 1)

## S4 method for signature 'ligerATACDataset,ANY,missing,missing'
getMatrix(
  x,
  slot = c("rawData", "normData", "scaleData", "scaleUnsharedData", "H", "V", "U", "A", "B", "rawPeak", "normPeak"),
  dataset = NULL
)
x , object
|
A |
dataset |
Not applicable for |
check |
Whether to perform object validity check on setting new value. |
value |
See detail sections for requirements |
slot |
The slot name when using |
returnList |
Not applicable for |
info |
Name of the entry in |
... |
See detailed sections for explanation. |
deparse.level |
Not used here. |
rawData
Raw data. Feature by cell matrix. Most of the time, sparse matrix of integer numbers for RNA and ATAC data.
normData
Normalized data. Feature by cell matrix. Sparse if the rawData it is normalized from is sparse.
scaleData
Scaled data, usually restricted to the shared variable features, by cells. Most of the time a sparse matrix of floating-point numbers. This is the data used for iNMF factorization.
scaleUnsharedData
Scaled data of variable features not shared with other datasets. This is the data used for UINMF factorization.
varUnsharedFeatures
Variable features not shared with other datasets.
V
iNMF output matrix holding the dataset specific gene loading of each factor. Feature by factor matrix.
A
Online iNMF intermediate product matrix.
B
Online iNMF intermediate product matrix.
H
iNMF output matrix holding the factor loading of each cell. Factor by cell matrix.
U
UINMF output matrix holding the unshared variable gene loading of each factor. Feature by factor matrix.
h5fileInfo
list of meta information of HDF5 file used for constructing the object.
featureMeta
Feature metadata, DataFrame object.
colnames
Character vector of unique cell identifiers.
rownames
Character vector of unique feature names.
For a ligerDataset object, the rawData(), normData(), scaleData() and scaleUnsharedData() methods are exported for users to access the corresponding feature expression matrix. Replacement methods are also available to modify the slots. For other dataset-specific matrices, such as H and V, please use the getMatrix() method and specify the slot name. Directly accessing a slot with @ is generally not recommended.
A ligerDataset object has a slot called h5fileInfo, which is a list object. The first element, $H5File, is an H5File class object and is the connection to the input file. The second element, $filename, stores the absolute path of the H5 file on the current machine. The third element, $formatType, stores the name of the preset being used, if applicable. The remaining keys pair with paths in the H5 file that point to the specific data needed for constructing a feature expression matrix.
The h5fileInfo() method accesses the list described above and simply retrieves the corresponding value. When info = NULL, it returns the whole list. When length(info) == 1, it returns the requested list value. When more than one entry is requested, it returns a subset list.
The replacement method modifies the list elements and the corresponding slot value (if applicable) at the same time. For example, running h5fileInfo(obj, "rawData") <- newPath not only updates the list, but also updates the rawData slot with the H5D class data at "newPath" in the H5File object.
getH5File() is a wrapper and is equivalent to h5fileInfo(obj, "H5File").
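The return rules for the info argument described above can be mimicked with a plain list. This is a minimal sketch in base R, not the rliger implementation: the helper lookupInfo and the contents of info_list are hypothetical stand-ins, and no HDF5 file is opened.

```r
# Hypothetical stand-in for the list held in the h5fileInfo slot.
info_list <- list(
  filename   = "/path/to/data.h5",
  formatType = "10X",
  rawData    = "matrix/data"
)

# Hypothetical helper that mirrors the documented return rules.
lookupInfo <- function(x, info = NULL) {
  if (is.null(info)) return(x)               # info = NULL: whole list
  if (length(info) == 1) return(x[[info]])   # one name: the value itself
  x[info]                                    # several names: subset list
}

lookupInfo(info_list)
lookupInfo(info_list, "filename")
lookupInfo(info_list, c("filename", "formatType"))
```

Note that a single name yields the bare value while several names yield a list, so code that receives a user-supplied info vector may need to handle both shapes.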
A featureMeta slot is included in each ligerDataset object. This slot requires a DataFrame-class object, the same class as the cellMeta slot of a liger object. However, the associated S4 methods only provide access to the whole table for now. Individual columns and entries are accessed the same way as with a data.frame, for example featureMeta(ligerD)$nCell or featureMeta(ligerD)[varFeatures(ligerObj), "gene_var"].
For a ligerDataset object, columns correspond to cells and rows to features. Therefore, for ligerDataset objects, dim() returns a numeric vector of two numbers, the number of features and the number of cells. dimnames() returns a list of two character vectors, the feature names and the cell barcodes.
For a direct call of the dimnames<- method, value should be a list with a character vector of feature names as the first element and cell identifiers as the second element. For the colnames<- method, value should be the character vector of cell identifiers. For the rownames<- method, value should be the character vector of feature names.
For more detail on subsetting a liger object or a ligerDataset object, please check out subsetLiger and subsetLigerDataset. Here, we set the S3 method "single-bracket" [ as a quick wrapper to subset a ligerDataset object. i and j serve as the feature and cell subscripts, respectively, and can be any valid index referring to the available features and cells in a dataset. ... arguments are passed to subsetLigerDataset so that advanced options are allowed.
The cbind() method is implemented for concatenating ligerDataset objects by cells. When it is applied, all feature expression matrices are merged, taking the union of all features for the rows.
ctrl <- dataset(pbmc, "ctrl")

# Methods for base generics
ctrl
print(ctrl)
dim(ctrl)
ncol(ctrl)
nrow(ctrl)
colnames(ctrl)[1:5]
rownames(ctrl)[1:5]
ctrl[1:5, 1:5]

# rliger generics
## raw data
m <- rawData(ctrl)
class(m)
dim(m)

## normalized data
pbmc <- normalize(pbmc)
ctrl <- dataset(pbmc, "ctrl")
m <- normData(ctrl)
class(m)
dim(m)

## scaled data
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
ctrl <- dataset(pbmc, "ctrl")
m <- scaleData(ctrl)
class(m)
dim(m)
n <- scaleData(pbmc, "ctrl")
identical(m, n)

## Any other matrices
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runOnlineINMF(pbmc, k = 20, minibatchSize = 100)
    ctrl <- dataset(pbmc, "ctrl")
    V <- getMatrix(ctrl, "V")
    V[1:5, 1:5]
    Vs <- getMatrix(pbmc, "V")
    length(Vs)
    names(Vs)
    identical(Vs$ctrl, V)
}
Inherits from ligerDataset class. Contained slots can be referred with the link. When scaleNotCenter is applied to datasets of this class, the scaled data is automatically obtained by reversing the normalized data instead of scaling the variable features.
Inherits from ligerDataset class. Contained slots can be referred with the link. This subclass does not differ from the default ligerDataset class except for the class name.
Inherits from ligerDataset class. Contained slots can be referred with the link.
coordinate
dense matrix
For converting a liger object to a Seurat object, the rawData, normData, and scaleData from each dataset, and the cellMeta, H.norm and varFeatures slots will be included. Compatible with V4 and V5. It is not recommended to use this conversion if your liger object contains datasets from various modalities.
ligerToSeurat( object, assay = NULL, identByDataset = FALSE, merge = FALSE, nms = NULL, renormalize = NULL, use.liger.genes = NULL, by.dataset = identByDataset )
object |
A liger object to be converted |
assay |
Name of assay to store the data. Default |
identByDataset |
Logical, whether to combine dataset variable and
default cluster labeling to set the Idents. Default |
merge |
Logical, whether to merge layers of different datasets into one.
Not recommended. Default |
nms |
Will be ignored because new object structure does not have related problem. |
renormalize |
Will be ignored because since Seurat V5, layers of data can exist at the same time and it is better to leave it for users to do it by themselves. |
use.liger.genes |
Will be ignored and will always set LIGER variable features to the place. |
by.dataset |
Always returns Seurat object(s) of the latest version. By default, a Seurat object with split layers, e.g. with layers like "counts.ctrl" and "counts.stim". If merge = TRUE, returns a single Seurat object with layers for all datasets merged.
if (requireNamespace("SeuratObject", quietly = TRUE) && requireNamespace("Seurat", quietly = TRUE)) { seu <- ligerToSeurat(pbmc) }
Evaluate the relationships between pairs of genes and peaks based on a specified distance metric. Usually used for inferring the correlation between gene expression and imputed peak counts for datasets without the modality originally (i.e. applied to the imputeKNN result).
linkGenesAndPeaks( object, useDataset, pathToCoords, useGenes = NULL, method = c("spearman", "pearson", "kendall"), alpha = 0.05, verbose = getOption("ligerVerbose", TRUE), path_to_coords = pathToCoords, genes.list = useGenes, dist = method )
object |
A liger object, with datasets that are of
ligerATACDataset class in the |
useDataset |
Name of one dataset, with both normalized gene expression and normalized peak counts available. |
pathToCoords |
Path to the gene coordinates file, usually a BED file. |
useGenes |
Character vector of gene names to be tested. Default
|
method |
Choose the type of correlation to calculate, from
|
alpha |
Numeric, significance threshold for correlation p-value.
Peak-gene correlations with p-values below this threshold are considered
significant. Default |
verbose |
Logical. Whether to show information of the progress. Default
|
path_to_coords , genes.list , dist
|
Deprecated. See Usage section for replacement. |
A sparse matrix with peak names as rows and gene names as columns, with each element indicating the correlation between peak i and gene j, 0 if the gene and peak are not significantly linked.
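The rule behind the returned matrix, a correlation that is kept only when its p-value falls below alpha and set to 0 otherwise, can be illustrated on toy data. This is a minimal sketch using stats::cor.test from base R. It is not the rliger implementation, which works on peak and gene matrices and uses the gene coordinates file; the toy vectors and names below are made up for illustration.

```r
set.seed(1)
n_cells <- 50

# Toy data: one gene and three peaks measured on the same cells.
gene <- rnorm(n_cells)
peaks <- cbind(
  linked   = gene + rnorm(n_cells, sd = 0.3),   # strongly related to the gene
  unlinked = rnorm(n_cells),                     # independent noise
  inverse  = -gene + rnorm(n_cells, sd = 0.3)    # negatively related
)

alpha <- 0.05
link <- apply(peaks, 2, function(p) {
  res <- cor.test(p, gene, method = "spearman")
  # Keep the correlation only when its p-value is below alpha, else 0.
  if (res$p.value < alpha) unname(res$estimate) else 0
})
link
```

Storing 0 for non-significant pairs is what makes a sparse matrix a natural container for the result, since most peak-gene pairs are expected to be unlinked.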
if (requireNamespace("RcppPlanc", quietly = TRUE) &&
    requireNamespace("GenomicRanges", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE) &&
    requireNamespace("psych", quietly = TRUE)) {
    bmmc <- normalize(bmmc)
    bmmc <- selectGenes(bmmc)
    bmmc <- scaleNotCenter(bmmc)
    bmmc <- runINMF(bmmc, miniBatchSize = 100)
    bmmc <- alignFactors(bmmc)
    bmmc <- normalizePeak(bmmc)
    bmmc <- imputeKNN(bmmc, reference = "atac", queries = "rna")
    corr <- linkGenesAndPeaks(
        bmmc, useDataset = "rna",
        pathToCoords = system.file("extdata/hg19_genes.bed", package = "rliger")
    )
}
After quantile normalization, users can additionally run the Louvain algorithm for community detection, which is widely used in single-cell analysis and excels at merging small clusters into broad cell classes.
object |
|
k |
The maximum number of nearest neighbours to compute. (default 20) |
resolution |
Value of the resolution parameter, use a value above (below) 1.0 if you want to obtain a larger (smaller) number of communities. (default 1.0) |
prune |
Sets the cutoff for acceptable Jaccard index when computing the neighborhood overlap for the SNN construction. Any edges with values less than or equal to this will be set to 0 and removed from the SNN graph. Essentially sets the stringency of pruning (0: no pruning, 1: prune everything). (default 1/15) |
eps |
The error bound of the nearest neighbor search. (default 0.1) |
nRandomStarts |
Number of random starts. (default 10) |
nIterations |
Maximal number of iterations per random start. (default 100) |
random.seed |
Seed of the random number generator. (default 1) |
verbose |
Print messages (TRUE by default) |
dims.use |
Indices of factors to use for clustering. Default |
object with refined cluster assignment updated in the "louvain_cluster" variable in the cellMeta slot. Can be fetched with object$louvain_cluster.
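The effect of the prune argument can be seen on a toy neighbor graph. The sketch below is an illustration in base R under stated assumptions, not the code used by the package: the neighbor lists are made up, and the Jaccard index is computed directly on the neighbor sets without adding each cell to its own neighborhood.

```r
# Toy neighbor lists: each element holds the nearest neighbors of one cell.
# These values are made up for illustration.
knn <- list(
  c1 = c("c2", "c3", "c4"),
  c2 = c("c1", "c3", "c4"),
  c3 = c("c1", "c2", "c5"),
  c4 = c("c6", "c7", "c8")
)

# Jaccard index: size of the overlap divided by size of the union.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

prune <- 1 / 15
cells <- names(knn)
snn <- outer(seq_along(cells), seq_along(cells), Vectorize(function(i, j) {
  w <- jaccard(knn[[i]], knn[[j]])
  if (w <= prune) 0 else w   # edges at or below the cutoff are removed
}))
dimnames(snn) <- list(cells, cells)
snn
```

Raising prune toward 1 removes progressively more edges, which tends to fragment the graph before community detection is run.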
Fast calculation of feature count matrix
makeFeatureMatrix(bedmat, barcodes)
bedmat |
A feature count list generated by bedmap |
barcodes |
A list of barcodes |
A feature count matrix with features as rows and barcodes as columns
## Not run:
gene.counts <- makeFeatureMatrix(genes.bc, barcodes)
promoter.counts <- makeFeatureMatrix(promoters.bc, barcodes)
sample <- gene.counts + promoter.counts
## End(Not run)
Export the predicted gene-pair interactions calculated by the upstream function linkGenesAndPeaks into an Interact Track file, which is compatible with the UCSC Genome Browser.
corr.mat |
A sparse matrix of correlation with peak names as rows and gene names as columns. |
path_to_coords |
Path to the gene coordinates file. |
genes.list |
Character vector of gene names to be exported. Default
|
output_path |
Path of filename where the output file will be stored. If
a folder, a file named |
No return value. A file located at output_path will be created.
rliger-deprecated, exportInteractTrack
Creates a riverplot to show how separate cluster assignments from two datasets map onto a joint clustering. The joint clustering is by default the object clustering, but an external one can also be passed in. Uses the riverplot package to construct riverplot object and then plot.
object |
|
cluster1 |
Cluster assignments for dataset 1. Note that cluster names should be distinct across datasets. |
cluster2 |
Cluster assignments for dataset 2. Note that cluster names should be distinct across datasets. |
cluster_consensus |
Optional external consensus clustering (to use instead of object clusters) |
min.frac |
Minimum fraction of cluster for edge to be shown (default 0.05). |
min.cells |
Minimum number of cells for edge to be shown (default 10). |
river.yscale |
y-scale to pass to riverplot – scales the edge with values by this factor, can be used to squeeze vertically (default 1). |
river.lty |
Line style to pass to riverplot (default 0). |
river.node_margin |
Node_margin to pass to riverplot – how much vertical space to keep between the nodes (default 0.1). |
label.cex |
Size of text labels (default 1). |
label.col |
Color of text labels (default "black"). |
lab.srt |
Angle of text labels (default 0). |
river.usr |
Coordinates at which to draw the plot in form (x0, x1, y0, y1). |
node.order |
Order of clusters in each set (list with three vectors of ordinal numbers). By default will try to automatically order them appropriately. |
object with refined cluster assignment updated in the "louvain_cluster" variable in the cellMeta slot. Can be fetched with object$louvain_cluster.
Designed for fast variable creation when a new variable is derived from an existing variable. For example, multiple samples can be mapped to the same study design condition, or clusters can be mapped to cell types.
mapCellMeta(object, from, newTo = NULL, ...)
object |
A liger object. |
from |
The name of the original variable to be mapped from. |
newTo |
The name of the new variable to store the mapped result. Default
|
... |
Mapping criteria, argument names are original existing categories
in the |
When newTo = NULL, a factor object of the new variable. Otherwise, the input object with variable newTo updated in cellMeta(object).
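The mapping idea, where existing categories are the names and new categories are the values, can be sketched on a plain factor. This is an illustration in base R and not the rliger implementation: mapLevels is a hypothetical helper, and leaving unmapped categories unchanged is an assumption of this sketch.

```r
# Toy metadata column; the values are made up for illustration.
dataset <- factor(c("ctrl", "ctrl", "stim", "ctrl", "stim"))

# Mapping from existing categories (names) to new categories (values).
mapping <- c(ctrl = "rna", stim = "rna")

# Hypothetical helper, not the rliger implementation.
mapLevels <- function(x, mapping) {
  out <- as.character(x)
  hit <- out %in% names(mapping)
  out[hit] <- mapping[out[hit]]   # unmapped categories are left unchanged
  factor(out)
}

modal <- mapLevels(dataset, mapping)
table(dataset, modal)
```

Because several old categories may share one new value, the result can have fewer levels than the input, as with the two datasets collapsing into a single "rna" level here.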
pbmc <- mapCellMeta(pbmc, from = "dataset", newTo = "modal", ctrl = "rna", stim = "rna")
This function merges HDF5 files generated from different libraries (Cell Ranger by default) before they are preprocessed through the LIGER pipeline.
mergeH5( file.list, library.names, new.filename, format.type = "10X", data.name = NULL, indices.name = NULL, indptr.name = NULL, genes.name = NULL, barcodes.name = NULL )
file.list |
List of paths to HDF5 files. |
library.names |
Vector of library names (corresponding to file.list) |
new.filename |
String of new hdf5 file name after merging (default new.h5). |
format.type |
string of HDF5 format (10X CellRanger by default). |
data.name |
Path to the data values stored in HDF5 file. |
indices.name |
Path to the indices of data points stored in HDF5 file. |
indptr.name |
Path to the pointers stored in HDF5 file. |
genes.name |
Path to the gene names stored in HDF5 file. |
barcodes.name |
Path to the barcodes stored in HDF5 file. |
Directly generates newly merged hdf5 file.
## Not run:
# For instance, we want to merge two datasets saved in HDF5 files (10X
# CellRanger) paths to datasets: "library1.h5","library2.h5"
# dataset names: "lib1", "lib2"
# name for output HDF5 file: "merged.h5"
mergeH5(list("library1.h5","library2.h5"), c("lib1","lib2"), "merged.h5")
## End(Not run)
mergeSparseAll takes in a list of DGEs (digital gene expression matrices), with genes as rows and cells as columns, and merges them into a single DGE. It also adds libraryNames to the colnames from each DGE if the barcodes are expected to overlap (common with 10X barcodes). Values in the rawData or normData slot of a ligerDataset object can be processed with this.
For a list of dense matrices, usually the values in the scaleData slot of a ligerDataset object, please use mergeDenseAll, which works in the same way.
mergeSparseAll( datalist, libraryNames = NULL, mode = c("union", "intersection") ) mergeDenseAll(datalist, libraryNames = NULL)
datalist |
List of dgCMatrix for |
libraryNames |
Character vector to be added as the prefix for the
barcodes in each matrix in |
mode |
Whether to take the |
dgCMatrix or matrix with all barcodes in datalist as columns and the union of genes in datalist as rows.
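The merging behavior described above can be sketched with small dense matrices. This is a base R illustration under stated assumptions, not the rliger implementation: mergeToy is a hypothetical helper, dense matrices stand in for dgCMatrix input, and the "_" separator between library name and barcode is an assumption.

```r
# Two toy gene-by-cell matrices with an overlapping barcode ("AAA") and
# partially overlapping genes.
m1 <- matrix(1:6, nrow = 3,
             dimnames = list(c("g1", "g2", "g3"), c("AAA", "CCC")))
m2 <- matrix(7:10, nrow = 2,
             dimnames = list(c("g2", "g4"), c("AAA", "GGG")))

# Hypothetical helper, not the rliger implementation.
mergeToy <- function(datalist, libraryNames = NULL,
                     mode = c("union", "intersection")) {
  mode <- match.arg(mode)
  if (!is.null(libraryNames)) {
    # Prefix barcodes so that identical barcodes from different
    # libraries stay distinct after merging.
    for (i in seq_along(datalist)) {
      colnames(datalist[[i]]) <- paste0(libraryNames[i], "_",
                                        colnames(datalist[[i]]))
    }
  }
  genes <- Reduce(if (mode == "union") union else intersect,
                  lapply(datalist, rownames))
  padded <- lapply(datalist, function(m) {
    out <- matrix(0, nrow = length(genes), ncol = ncol(m),
                  dimnames = list(genes, colnames(m)))
    shared <- intersect(genes, rownames(m))
    out[shared, ] <- m[shared, , drop = FALSE]   # absent genes stay 0
    out
  })
  do.call(cbind, padded)
}

mergeToy(list(m1, m2), libraryNames = c("lib1", "lib2"))
mergeToy(list(m1, m2), libraryNames = c("lib1", "lib2"), mode = "intersection")
```

With mode = "union", genes missing from one input are filled with zeros for its cells; with mode = "intersection", only genes present in every input are kept.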
rawDataList <- getMatrix(pbmc, "rawData")
merged <- mergeSparseAll(rawDataList, libraryNames = names(pbmc))
Return preset modality of a ligerDataset object or that of all datasets in a liger object
modalOf(object)
object |
a ligerDataset object or a liger object |
A single character string of the modality setting for a ligerDataset object, or a named vector for a liger object, where the names are dataset names.
modalOf(pbmc)
ctrl <- dataset(pbmc, "ctrl")
modalOf(ctrl)
ctrl.atac <- as.ligerDataset(ctrl, modal = "atac")
modalOf(ctrl.atac)
Perform library size normalization on raw counts input. As for the preprocessing step of iNMF integration, by default we don't multiply the normalized values with a scale factor, nor do we take the log transformation. Applicable S3 methods can be found in Usage section.
normalizePeak is designed for datasets of "atac" modality, i.e. stored in ligerATACDataset. S3 methods for various container objects are not supported yet due to differences in architecture design.
normalize(object, ...)

## S3 method for class 'matrix'
normalize(object, log = FALSE, scaleFactor = NULL, ...)

## S3 method for class 'dgCMatrix'
normalize(object, log = FALSE, scaleFactor = NULL, ...)

## S3 method for class 'ligerDataset'
normalize(object, chunk = 1000, verbose = getOption("ligerVerbose", TRUE), ...)

## S3 method for class 'liger'
normalize(
  object,
  useDatasets = NULL,
  verbose = getOption("ligerVerbose", TRUE),
  format.type = NULL,
  remove.missing = NULL,
  ...
)

## S3 method for class 'Seurat'
normalize(object, assay = NULL, layer = "counts", save = "ligerNormData", ...)

normalizePeak(
  object,
  useDatasets = NULL,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
object |
liger object |
... |
Arguments to be passed to S3 methods. The "liger" method calls
the "ligerDataset" method, which then calls "dgCMatrix" method.
|
log |
Logical. Whether to do a |
scaleFactor |
Numeric. Scale the normalized expression value by this
factor before transformation. |
chunk |
Integer. Maximum number of cells in each chunk when
working on HDF5 file based ligerDataset. Default |
verbose |
Logical. Whether to show information of the progress. Default
|
useDatasets |
A character vector of the names, a numeric or logical
vector of the index of the datasets to be normalized. Should specify ATACseq
datasets when using |
format.type , remove.missing
|
Deprecated. The functionality of these is covered through other parts of the whole workflow and is no longer needed. Will be ignored if specified. |
assay |
Name of assay to use. Default |
layer |
Where the input raw counts should be from. Default
|
save |
For Seurat>=4.9.9, the name of layer to store normalized data.
Default |
Updated object.
dgCMatrix method - Returns processed dgCMatrix object
ligerDataset method - Updates the normData slot of the object
liger method - Updates the normData slot of chosen datasets
Seurat method - Adds a named layer in chosen assay (V5), or updates the data slot of the chosen assay (<=V4)
normalizePeak - Updates the normPeak slot of chosen datasets.
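Library size normalization as described for this function divides each cell's counts by that cell's total. The sketch below shows the idea on a dense toy matrix in base R. It is not the rliger implementation: normalizeToy is a hypothetical helper, and treating the log option as log(x + 1) is an assumption of this sketch.

```r
# Toy raw count matrix: genes as rows, cells as columns.
counts <- matrix(c(1, 3, 0, 4,
                   2, 2, 5, 1),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("cell1", "cell2")))

# Hypothetical helper, not the rliger implementation.
normalizeToy <- function(x, log = FALSE, scaleFactor = NULL) {
  # Divide every cell (column) by its total count.
  out <- sweep(x, 2, colSums(x), "/")
  # Optional scaling is applied before the optional transformation.
  if (!is.null(scaleFactor)) out <- out * scaleFactor
  # Assumption: the log option means log(x + 1).
  if (log) out <- log1p(out)
  out
}

norm <- normalizeToy(counts)
colSums(norm)
```

With the defaults (no scale factor, no log), every column of the result sums to 1, which matches the default behavior described for the iNMF preprocessing step.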
pbmc <- normalize(pbmc)
Please turn to runOnlineINMF or runIntegration.
Perform online integrative non-negative matrix factorization to represent multiple single-cell datasets in terms of H, W, and V matrices. It optimizes the iNMF objective function using online learning (non-negative least squares for H matrix, hierarchical alternating least squares for W and V matrices), where the number of factors is set by k. The function allows online learning in 3 scenarios: (1) fully observed datasets; (2) iterative refinement using continually arriving datasets; and (3) projection of new datasets without updating the existing factorization. All three scenarios require fixed memory independent of the number of cells.
For each dataset, this factorization produces an H matrix (cells by k), a V matrix (k by genes), and a shared W matrix (k by genes). The H matrices represent the cell factor loadings. W is identical among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes.
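The dimensions described above can be checked with random placeholder matrices. The sketch below performs no factorization; it only shows, in base R and with made-up sizes, how a per-dataset H (cells by k), a per-dataset V (k by genes) and a shared W (k by genes) combine into a cells by genes approximation.

```r
set.seed(123)
k <- 3
n_genes <- 6
n_cells <- c(ctrl = 5, stim = 4)   # toy dataset sizes

# Shared metagenes (k by genes), identical for all datasets.
W <- matrix(runif(k * n_genes), nrow = k)
# Dataset-specific metagenes (k by genes), one matrix per dataset.
V <- lapply(n_cells, function(n) matrix(runif(k * n_genes), nrow = k))
# Cell factor loadings (cells by k), one matrix per dataset.
H <- lapply(n_cells, function(n) matrix(runif(n * k), nrow = n))

# Each dataset is approximated by H_i %*% (W + V_i), a cells-by-genes matrix.
recon <- Map(function(h, v) h %*% (W + v), H, V)
sapply(recon, dim)
```

Only H and V differ between datasets, so the number of cells may vary freely while the gene dimension and k must agree across all of them.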
object |
|
X_new |
List of new datasets for scenario 2 or scenario 3. Each list element should be the name of an HDF5 file. |
projection |
Perform data integration by shared metagene (W) projection (scenario 3). (default FALSE) |
W.init |
Optional initialization for W. (default NULL) |
V.init |
Optional initialization for V (default NULL) |
H.init |
Optional initialization for H (default NULL) |
A.init |
Optional initialization for A (default NULL) |
B.init |
Optional initialization for B (default NULL) |
k |
Inner dimension of factorization, i.e. the number of metagenes (default 20). A value in the range 20-50 works well for most analyses. |
lambda |
Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). We recommend always using the default value, except possibly for analyses with relatively small differences (biological replicates, male/female comparisons, etc.), in which case a lower value such as 1.0 may improve reconstruction quality. (default 5.0) |
max.epochs |
Maximum number of epochs (complete passes through the data). (default 5) |
miniBatch_max_iters |
Maximum number of block coordinate descent (HALS algorithm) iterations to perform for each update of W and V (default 1). Changing this parameter is not recommended. |
miniBatch_size |
Total number of cells in each minibatch (default 5000). This is a reasonable default, but a smaller value such as 1000 may be necessary for analyzing very small datasets. In general, minibatch size should be no larger than the number of cells in the smallest dataset. |
h5_chunk_size |
Chunk size of input hdf5 files (default 1000). The chunk size should be no larger than the batch size. |
seed |
Random seed to allow reproducible results (default 123). |
verbose |
Print progress bar/messages (TRUE by default) |
liger object with H, W, V, A and B slots set.
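The size constraints stated for miniBatch_size and h5_chunk_size can be written as a small pre-flight check. The helper below is hypothetical and not part of rliger; it only encodes the two rules given in the argument descriptions.

```r
# Hypothetical helper (not an rliger function): returns a character vector of
# problems found, or an empty vector when both documented rules are satisfied.
checkBatchSizes <- function(nCellsPerDataset, miniBatchSize = 5000,
                            h5ChunkSize = 1000) {
  problems <- character(0)
  # Rule 1: minibatch no larger than the smallest dataset
  if (miniBatchSize > min(nCellsPerDataset)) {
    problems <- c(problems, sprintf(
      "miniBatch_size (%d) exceeds the smallest dataset (%d cells)",
      as.integer(miniBatchSize), as.integer(min(nCellsPerDataset))))
  }
  # Rule 2: HDF5 chunk no larger than the minibatch
  if (h5ChunkSize > miniBatchSize) {
    problems <- c(problems, sprintf(
      "h5_chunk_size (%d) exceeds miniBatch_size (%d)",
      as.integer(h5ChunkSize), as.integer(miniBatchSize)))
  }
  problems
}

# The smallest dataset has 3000 cells, so the default minibatch is flagged
print(checkBatchSizes(c(a = 12000, b = 3000)))
```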
Please turn to runINMF or runIntegration.
Perform integrative non-negative matrix factorization to return factorized H, W, and V matrices. It optimizes the iNMF objective function using block coordinate descent (alternating non-negative least squares), where the number of factors is set by k.
For each dataset, this factorization produces an H matrix (cells by k), a V matrix (k by genes), and a shared W matrix (k by genes). The H matrices represent the cell factor loadings. W is held consistent among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes.
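For reference, the iNMF objective as given in Welch et al. (2019) can be written in the shape convention used above. This rendering assumes E_i denotes the scaled expression matrix of dataset i (cells by genes) and d the number of datasets; both symbols are introduced here for illustration.

```latex
\min_{H_i \ge 0,\; W \ge 0,\; V_i \ge 0}\;
\sum_{i=1}^{d} \left\lVert E_i - H_i \left( W + V_i \right) \right\rVert_F^2
\;+\; \lambda \sum_{i=1}^{d} \left\lVert H_i V_i \right\rVert_F^2
```

The second term is what lambda scales: it penalizes the dataset-specific part of each reconstruction, so larger values push more of the signal into the shared W.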
object |
|
k |
Inner dimension of factorization (number of factors). Run suggestK to determine appropriate value; a general rule of thumb is that a higher k will be needed for datasets with more sub-structure. |
lambda |
Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Run suggestLambda to determine the most appropriate value for balancing dataset alignment and agreement (default 5.0). |
thresh |
Convergence threshold. Convergence occurs when |obj0-obj|/(mean(obj0,obj)) < thresh. (default 1e-6) |
max.iters |
Maximum number of block coordinate descent iterations to perform (default 30). |
nrep |
Number of restarts to perform (iNMF objective function is non-convex, so taking the best objective from multiple successive initializations is recommended). For easier reproducibility, this increments the random seed by 1 for each consecutive restart, so future factorizations of the same dataset can be run with one rep if necessary. (default 1) |
H.init |
Initial values to use for H matrices. (default NULL) |
W.init |
Initial values to use for W matrix (default NULL) |
V.init |
Initial values to use for V matrices (default NULL) |
rand.seed |
Random seed to allow reproducible results (default 1). |
print.obj |
Print objective function values after convergence (default FALSE). |
verbose |
Print progress bar/messages (TRUE by default) |
... |
Arguments passed to other methods |
liger object with H, W, and V slots set.
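The convergence rule given for thresh can be sketched in base R as follows. The function name is hypothetical; obj0 and obj stand for the objective values of two consecutive iterations, and the mean is taken over both values.

```r
# Hypothetical helper illustrating the documented stopping rule:
# |obj0 - obj| / mean(obj0, obj) < thresh
hasConverged <- function(obj0, obj, thresh = 1e-6) {
  abs(obj0 - obj) / mean(c(obj0, obj)) < thresh
}

hasConverged(100, 99)          # relative change is about 1e-2
hasConverged(100, 100 - 1e-5)  # relative change is about 1e-7
```

The first call is far from the default threshold, while the second falls below it.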
Uses an efficient strategy for updating that takes advantage of the information in the existing factorization. Assumes that the variable features are present in the new datasets. Two modes are supported (controlled by merge):
merge = TRUE: Append new data to existing datasets specified by useDatasets. Here the existing V matrices for the target datasets will directly be used as initialization, and new H matrices for the merged matrices will be initialized accordingly.
merge = FALSE: Set new data as new datasets. Initial V matrices for them will be copied from datasets specified by useDatasets, and new H matrices will be initialized accordingly.
optimizeNewData( object, dataNew, useDatasets, merge = TRUE, lambda = NULL, nIteration = 30, seed = 1, verbose = getOption("ligerVerbose"), new.data = dataNew, which.datasets = useDatasets, add.to.existing = merge, max.iters = nIteration, thresh = NULL )
object |
A liger object. Should have integrative
factorization performed e.g. ( |
dataNew |
Named list of raw count matrices, genes by cells. |
useDatasets |
Selection of datasets to append new data to if
|
merge |
Logical, whether to add the new data to existing
datasets or treat as totally new datasets (i.e. calculate new |
lambda |
Numeric regularization parameter. By default |
nIteration |
Number of block coordinate descent iterations to perform. Default 30. |
seed |
Random seed to allow reproducible results. Default 1. |
verbose |
Logical. Whether to show information of the progress. Default
|
new.data , which.datasets , add.to.existing , max.iters
|
These arguments are now replaced by others and will be removed in the future. Please see usage for replacement. |
thresh |
Deprecated. New implementation of iNMF does not require
a threshold for convergence detection. Setting a large enough
|
object with the W slot updated with the new W matrix, and the H and V slots of each ligerDataset object in the datasets slot updated with the new dataset-specific H and V matrices, respectively.
runINMF, optimizeNewK, optimizeNewLambda
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
# Only running a few iterations for fast examples
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
    # Create fake new data by increasing all non-zero count in "ctrl" by 1,
    # and make unique cell identifiers
    ctrl2 <- rawData(dataset(pbmc, "ctrl"))
    ctrl2@x <- ctrl2@x + 1
    colnames(ctrl2) <- paste0(colnames(ctrl2), 2)
    pbmcNew <- optimizeNewData(pbmc, dataNew = list(ctrl2 = ctrl2),
                               useDatasets = "ctrl", nIteration = 2)
}
This uses an efficient strategy for updating that takes advantage of the information in the existing factorization. It is most recommended for values of kNew smaller than the current value (k, which is set when running runINMF), where it is more likely to speed up the factorization.
optimizeNewK( object, kNew, lambda = NULL, nIteration = 30, seed = 1, verbose = getOption("ligerVerbose"), k.new = kNew, max.iters = nIteration, rand.seed = seed, thresh = NULL )
object |
A liger object. Should have integrative
factorization performed e.g. ( |
kNew |
Number of factors of factorization. |
lambda |
Numeric regularization parameter. By default |
nIteration |
Number of block coordinate descent iterations to perform. Default 30. |
seed |
Random seed to allow reproducible results. Default 1. |
verbose |
Logical. Whether to show information of the progress. Default
|
k.new , max.iters , rand.seed
|
These arguments are now replaced by others and will be removed in the future. Please see usage for replacement. |
thresh |
Deprecated. New implementation of iNMF does not require
a threshold for convergence detection. Setting a large enough
|
object with the W slot updated with the new W matrix, and the H and V slots of each ligerDataset object in the datasets slot updated with the new dataset-specific H and V matrices, respectively.
runINMF, optimizeNewLambda, optimizeNewData
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
# Only running a few iterations for fast examples
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
    pbmc <- optimizeNewK(pbmc, kNew = 25, nIteration = 2)
}
Uses an efficient strategy for updating that takes advantage of the information in the existing factorization; always uses previous k. Recommended mainly when re-optimizing for higher lambda and when new lambda value is significantly different; otherwise may not return optimal results.
optimizeNewLambda( object, lambdaNew, nIteration = 30, seed = 1, verbose = getOption("ligerVerbose"), new.lambda = lambdaNew, max.iters = nIteration, rand.seed = seed, thresh = NULL )
object |
liger object. Should have integrative
factorization (e.g. |
lambdaNew |
Numeric regularization parameter. Larger values penalize dataset-specific effects more strongly. |
nIteration |
Number of block coordinate descent iterations to perform. Default 30. |
seed |
Random seed to allow reproducible results. Default 1. |
verbose |
Logical. Whether to show information of the progress. Default
|
new.lambda , max.iters , rand.seed
|
These arguments are now replaced by others and will be removed in the future. Please see usage for replacement. |
thresh |
Deprecated. New implementation of iNMF does not require
a threshold for convergence detection. Setting a large enough
|
Input object with optimized factorization values updated, including the W matrix in the liger object, and the H and V matrices in each ligerDataset object in the datasets slot.
runINMF, optimizeNewK, optimizeNewData
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    # Only running a few iterations for fast examples
    pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
    pbmc <- optimizeNewLambda(pbmc, lambdaNew = 5.5, nIteration = 2)
}
Uses an efficient strategy for updating that takes advantage of the information in the existing factorization.
optimizeSubset( object, clusterVar = NULL, useClusters = NULL, lambda = NULL, nIteration = 30, cellIdx = NULL, scaleDatasets = NULL, seed = 1, verbose = getOption("ligerVerbose"), cell.subset = cellIdx, cluster.subset = useClusters, max.iters = nIteration, datasets.scale = scaleDatasets, thresh = NULL )
object |
liger object. Should have integrative
factorization (e.g. |
clusterVar , useClusters
|
Together select the clusters to subset the
object conveniently. |
lambda |
Numeric regularization parameter. By default |
nIteration |
Maximum number of block coordinate descent iterations to perform. Default 30. |
cellIdx |
Valid index vector that applies to the whole object. See
|
scaleDatasets |
Names of datasets to re-scale after subsetting.
Default |
seed |
Random seed to allow reproducible results. Default 1. |
verbose |
Logical. Whether to show information of the progress. Default
|
cell.subset , cluster.subset , max.iters , datasets.scale
|
These arguments are now replaced by others and will be removed in the future. Please see usage for replacement. |
thresh |
Deprecated. New implementation of iNMF does not require
a threshold for convergence detection. Setting a large enough
|
Subset object with factorization matrices optimized, including the W matrix in the liger object, and the H and V matrices in each ligerDataset object in the datasets slot. scaleData in the ligerDataset objects of datasets specified by scaleDatasets will also be updated to reflect the subset.
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    # Only running a few iterations for fast examples
    pbmc <- runINMF(pbmc, k = 20, nIteration = 2)
    pbmc <- optimizeSubset(pbmc, cellIdx = sort(sample(ncol(pbmc), 200)),
                           nIteration = 2)
}
liger object of PBMC subsample data with Control and Stimulated datasets
pbmc
liger object with two datasets named "ctrl" and "stim".
https://www.nature.com/articles/nbt.4042
Hyun Min Kang et al., Nature Biotechnology, 2018
This dataset was generated from the "pbmc" data using the integration
pipeline with default parameters: normalize, selectGenes, scaleNotCenter,
runINMF, runCluster, runUMAP. To minimize the object size distributed with
the package, rawData and scaleData were removed. Genes are downsampled to
the top 50 variable genes, for smaller normData and matrix.
pbmcPlot
liger object with two datasets named "ctrl" and "stim".
https://www.nature.com/articles/nbt.4042
Hyun Min Kang et al., Nature Biotechnology, 2018
This function ranks the total count of each cell within each dataset and makes a line plot. It is simply for examining the input raw count data and does not infer any recommended cutoff for removing non-cell barcodes.
plotBarcodeRank(object, ...)
object |
A liger object. |
... |
Arguments passed on to
|
A list object of ggplot for each dataset
plotBarcodeRank(pbmc)
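What a barcode rank plot shows can be sketched in base R on toy data. This is not how plotBarcodeRank is implemented; the count matrix is simulated and the variable names are hypothetical.

```r
set.seed(1)
# Hypothetical toy raw counts: 20 genes by 100 cells
counts <- matrix(rpois(20 * 100, lambda = 2), nrow = 20)

# Total count per cell, ranked from the largest to the smallest
totals <- colSums(counts)
ranked <- sort(totals, decreasing = TRUE)
rankDF <- data.frame(rank = seq_along(ranked), nUMI = ranked)

# Line plot of total count against rank, both axes on log scale
plot(rankDF$rank, rankDF$nUMI, log = "xy", type = "l",
     xlab = "Barcode rank", ylab = "Total counts per cell")
```

For a liger object with several datasets, the same ranking would be repeated within each dataset.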
This function allows for using available cell metadata, feature expression or factor loading to generate violin plots, and grouping the data with available categorical cell metadata. Available categorical cell metadata can be used to form the color annotation. When it is different from the grouping, it forms a nested grouping. Multiple y-axis variables are allowed from the same specification of slot, and this returns a list of violin plots, one for each. Users can further split the plot(s) by grouping on cells (e.g. datasets).
plotCellViolin( object, y, groupBy = NULL, slot = c("cellMeta", "rawData", "normData", "scaleData", "H.norm", "H"), yFunc = NULL, cellIdx = NULL, colorBy = NULL, splitBy = NULL, titles = NULL, ... )
object |
liger object |
y |
Available variable name in |
groupBy , colorBy
|
Available variable name in |
slot |
Choose the slot to find the |
yFunc |
A function object that expects a vector/factor/data.frame
retrieved by |
cellIdx |
Character, logical or numeric index that can subscribe cells.
Missing or |
splitBy |
Character vector of categorical variable names in
|
titles |
Title text. A character scalar or a character vector with as
many elements as multiple plots are supposed to be generated. Default
|
... |
Arguments passed on to
|
Available options for slot include: "cellMeta", "rawData", "normData", "scaleData", "H.norm" and "H". When "rawData", "normData" or "scaleData", y has to be a character vector of feature names. When "H.norm" or "H", y can be any valid index to select one factor of interest. Note that a character index follows the "Factor_[k]" format, replacing [k] with an integer.
When "cellMeta", y has to be an available column name in the table. Note that, for y as well as groupBy, colorBy and splitBy, since a matrix object is feasible in the cellMeta table, using a column (e.g. named "column1") in a certain matrix (e.g. named "matrixVar") should follow the syntax of "matrixVar.column1". When the matrix does not have a "colname" attribute, the subscription goes with "matrixVar.V1", "matrixVar.V2", etc. These are based on the nature of the as.data.frame method on a DataFrame object.
groupBy is basically passed to ggplot2::aes(x), while colorBy is for the "colour" aesthetics. Specifying colorBy without groupBy visually creates grouping but there will not be varying values on the x-axis, so boxWidth will be forced to the same value as violinWidth in this situation.
A ggplot object when a single plot is intended. A list of ggplot objects, when multiple y variables and/or splitBy are set. When plotly = TRUE, all ggplot objects become plotly (htmlwidget) objects.
plotCellViolin(pbmcPlot, y = "nUMI", groupBy = "dataset", slot = "cellMeta")
plotCellViolin(pbmcPlot, y = "nUMI", groupBy = "leiden_cluster",
               slot = "cellMeta", splitBy = "dataset",
               colorBy = "leiden_cluster",
               box = TRUE, dot = TRUE,
               ylab = "Total counts per cell",
               colorValues = RColorBrewer::brewer.pal(8, "Set1"))
plotCellViolin(pbmcPlot, y = "S100A8", slot = "normData",
               yFunc = function(x) log2(10000*x + 1),
               groupBy = "dataset", colorBy = "leiden_cluster",
               box = TRUE, ylab = "S100A8 Expression")
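The yFunc used in the last example can be tried on a toy matrix to see what it does to the values before plotting. This sketch assumes that normalized data means each cell's counts divided by that cell's total; the matrix and cell names are made up.

```r
# Hypothetical raw counts: 2 genes by 3 cells
counts <- matrix(c(0, 5, 10, 3, 0, 7), nrow = 2,
                 dimnames = list(c("S100A8", "CD3D"), paste0("cell", 1:3)))

# Assumed normalization: divide each column by its total count
normed <- sweep(counts, 2, colSums(counts), "/")

# Same transformation as passed to yFunc in the example above
yFunc <- function(x) log2(10000 * x + 1)
y <- yFunc(normed["S100A8", ])
print(y)
```

Cells without any S100A8 count stay at zero after the transformation, since log2(1) is 0.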
This function produces dot plots. Each column represents a group of cells specified by groupBy, and each row is a factor specified by useDims. The color of the dots reflects the mean factor loading of the specified factors in each cell group, and the size reflects the percentage of cells that have loadings of a factor in a group. We utilize ComplexHeatmap for simplified management of adding annotation and slicing subplots. This was inspired by the implementation in scCustomize.
plotClusterFactorDot( object, groupBy = NULL, useDims = NULL, useRaw = FALSE, splitBy = NULL, factorScaleFunc = NULL, cellIdx = NULL, legendColorTitle = "Mean Factor\nLoading", legendSizeTitle = "Percent\nLoaded", viridisOption = "viridis", verbose = FALSE, ... )
object |
A liger object |
groupBy |
The names of the columns in |
useDims |
A Numeric vector to specify exact factors of interests.
Default |
useRaw |
Whether to use un-aligned cell factor loadings ( |
splitBy |
The names of the columns in |
factorScaleFunc |
A function object applied to factor loading matrix for
scaling the value for better visualization. Default |
cellIdx |
Valid cell subscription. See |
legendColorTitle |
Title for colorbar legend. Default
|
legendSizeTitle |
Title for size legend. Default
|
viridisOption |
Name of available viridis palette. See
|
verbose |
Logical. Whether to show progress information. Mainly when
subsetting data. Default |
... |
Additional theme setting arguments passed to
|
For ..., please notice that arguments colorMat, sizeMat, featureAnnDF, cellSplitVar, cellLabels and viridisOption from .complexHeatmapDotPlot are already occupied by this function internally. A lot of arguments from Heatmap have also been occupied: matrix, name, heatmap_legend_param, rect_gp, col, layer_fun, km, border, border_gp, column_gap, row_gap, cluster_row_slices, cluster_rows, row_title_gp, row_names_gp, row_split, row_labels, cluster_column_slices, cluster_columns, column_split, column_title_gp, column_title, column_labels, column_names_gp, top_annotation.
HeatmapList object.
plotClusterFactorDot(pbmcPlot)
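The two summaries encoded by dot color and dot size can be sketched in base R. This is an illustration on random data, not the package's code, and it assumes that a cell counts as "loaded" on a factor when its loading is greater than zero.

```r
set.seed(1)
nCells <- 12
k <- 3
# Hypothetical cell factor loadings: cells by factors
H <- matrix(rexp(nCells * k), nrow = nCells,
            dimnames = list(NULL, paste0("Factor_", 1:k)))
H[sample(length(H), 10)] <- 0  # introduce some zero loadings
group <- factor(rep(c("c0", "c1", "c2"), each = 4))
groupSize <- as.vector(table(group))

# Dot color: mean loading of each factor within each group (factors by groups)
colorMat <- t(rowsum(H, group) / groupSize)
# Dot size: percentage of cells in each group with a non-zero loading
sizeMat <- t(rowsum((H > 0) * 1, group) / groupSize) * 100

print(round(colorMat, 2))
print(sizeMat)
```

Both matrices have one row per factor and one column per cell group, matching the layout described above.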
This function produces dot plots. Each column represents a group of cells specified by groupBy, and each row is a gene specified by features. The color of the dots reflects the mean normalized expression of the specified genes in each cell group, and the size reflects the percentage of cells expressing each gene in a group. We utilize ComplexHeatmap for simplified management of adding annotation and slicing subplots. This was inspired by the implementation in scCustomize.
plotClusterGeneDot( object, features, groupBy = NULL, splitBy = NULL, featureScaleFunc = function(x) log2(10000 * x + 1), cellIdx = NULL, legendColorTitle = "Mean\nExpression", legendSizeTitle = "Percent\nExpressed", viridisOption = "magma", verbose = FALSE, ... )
object |
A liger object |
features |
Use a character vector of gene names to make plain dot plot
like a heatmap. Use a data.frame where the first column is gene names and
second column is a grouping variable (e.g. subset |
groupBy |
The names of the columns in |
splitBy |
The names of the columns in |
featureScaleFunc |
A function object applied to normalized data for
scaling the value for better visualization. Default |
cellIdx |
Valid cell subscription. See |
legendColorTitle |
Title for colorbar legend. Default
|
legendSizeTitle |
Title for size legend. Default
|
viridisOption |
Name of available viridis palette. See
|
verbose |
Logical. Whether to show progress information. Mainly when
subsetting data. Default |
... |
Additional theme setting arguments passed to
|
For ..., please notice that arguments colorMat, sizeMat, featureAnnDF, cellSplitVar, cellLabels and viridisOption from .complexHeatmapDotPlot are already occupied by this function internally. A lot of arguments from Heatmap have also been occupied: matrix, name, heatmap_legend_param, rect_gp, col, layer_fun, km, border, border_gp, column_gap, row_gap, cluster_row_slices, cluster_rows, row_title_gp, row_names_gp, row_split, row_labels, cluster_column_slices, cluster_columns, column_split, column_title_gp, column_title, column_labels, column_names_gp, top_annotation.
HeatmapList object.
# Use character vector of genes
features <- varFeatures(pbmcPlot)[1:10]
plotClusterGeneDot(pbmcPlot, features = features)
# Use data.frame with grouping information, with more tweak on plot
features <- data.frame(features, rep(letters[1:5], 2))
plotClusterGeneDot(pbmcPlot, features = features,
                   clusterFeature = TRUE, clusterCell = TRUE, maxDotSize = 6)
Make violin plots for each given gene grouped by cluster variable and stack along y axis.
plotClusterGeneViolin( object, gene, groupBy = NULL, colorBy = NULL, box = FALSE, boxAlpha = 0.1, yFunc = function(x) log1p(x * 10000), showLegend = !is.null(colorBy), xlabAngle = 40, ... )
object |
A liger object. |
gene |
Character vector of gene names. |
groupBy |
The name of an available categorical variable in
|
colorBy |
The name of another categorical variable in |
box |
Logical, whether to add boxplot. Default |
boxAlpha |
Numeric, transparency of boxplot. Default |
yFunc |
Function to transform the y-axis. Default is
|
showLegend |
Whether to show the legend. Default |
xlabAngle |
Numeric, counter-clockwise rotation angle in degrees of X
axis label text. Default |
... |
Arguments passed on to
|
If xlab needs to be set, set xlabAngle at the same time. This is because the argument-matching mechanism will partially match xlab to the main function argument xlabAngle before passing it on to the ... arguments.
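The partial matching behavior behind this note can be reproduced with a toy function in base R. The function below is hypothetical and only mimics the relevant part of the signature: a named argument xlabAngle placed before the ... argument.

```r
# Hypothetical stand-in with the same argument layout: xlabAngle precedes ...
toyViolin <- function(x, xlabAngle = 40, ...) {
  list(xlabAngle = xlabAngle, dots = list(...))
}

# Only xlab supplied: it is partially matched to xlabAngle and never
# reaches the ... arguments
r1 <- toyViolin(1, xlab = "Cluster")

# xlabAngle supplied as well: it is matched exactly first, so xlab is left
# over and goes into ...
r2 <- toyViolin(1, xlab = "Cluster", xlabAngle = 40)

str(r1)
str(r2)
```

Supplying xlabAngle explicitly, even at its default value, is therefore enough to let xlab pass through.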
A ggplot object.
plotClusterGeneViolin(pbmcPlot, varFeatures(pbmcPlot)[1:10])
This function shows the cell density in 2D dimensionality reduction coordinates. Density is shown with coloring and contour lines. A scatter plot of the dimensionality reduction is added as well. The density plot can be split by categorical variables (e.g. "dataset"), while the scatter plot will always be shown for all cells in subplots as a reference of the global structure.
plotDensityDimRed( object, useDimRed = NULL, splitBy = NULL, combinePlot = TRUE, minDensity = 8, contour = TRUE, contourLineWidth = 0.3, contourBins = 5, dot = TRUE, dotColor = "grey", dotSize = 0.6, dotAlpha = 0.3, dotRaster = NULL, title = NULL, legendFillTitle = "Density", colorPalette = "magma", colorDirection = -1, ... )
object |
A liger object |
useDimRed |
Name of the variable storing dimensionality reduction result
in the |
splitBy |
Character vector of categorical variable names in
|
combinePlot |
Logical, whether to utilize
|
minDensity |
A positive number to filter out low density region colored
on plot. Default |
contour |
Logical, whether to draw the contour line. Default
|
contourLineWidth |
Numeric, the width of the contour line. Default
|
contourBins |
Number of contour bins. Higher value generates more
contour lines. Default |
dot |
Logical, whether to add scatter plot of all cells, even when
density plot is split with |
dotColor , dotSize , dotAlpha
|
Numeric, controls the appearance of all
dots. Default |
dotRaster |
Logical, whether to rasterize the scatter plot. Default
|
title |
Text of main title of the plots. Default |
legendFillTitle |
Text of legend title. Default |
colorPalette |
Name of the option for
|
colorDirection |
Color gradient direction for
|
... |
Arguments passed on to
|
A ggplot object when only one plot is generated. A ggplot object combined with plot_grid when multiple plots and combinePlot = TRUE. A list of ggplot when multiple plots and combinePlot = FALSE.
# Example dataset has small number of cells, thus cutoff adjusted.
plotDensityDimRed(pbmcPlot, minDensity = 1)
This function allows for using available cell metadata to build the x-/y-axis. Available per-cell data can be used to form the color/shape annotation, including cell metadata, raw or processed gene expression, and unnormalized or aligned factor loading. Multiple coloring variables are allowed from the same specification of slot, and this returns a list of plots with different coloring values. Users can further split the plot(s) by grouping on cells (e.g. datasets).
plotDimRed( object, colorBy = NULL, useDimRed = NULL, slot = c("cellMeta", "rawData", "normData", "scaleData", "H.norm", "H", "normPeak", "rawPeak"), colorByFunc = NULL, cellIdx = NULL, splitBy = NULL, shapeBy = NULL, titles = NULL, ... ) plotClusterDimRed(object, useCluster = NULL, useDimRed = NULL, ...) plotDatasetDimRed(object, useDimRed = NULL, ...) plotByDatasetAndCluster( object, useDimRed = NULL, useCluster = NULL, combinePlots = TRUE, ... ) plotGeneDimRed( object, features, useDimRed = NULL, log = TRUE, scaleFactor = 10000, zeroAsNA = TRUE, colorPalette = "C", ... ) plotPeakDimRed( object, features, useDimRed = NULL, log = TRUE, scaleFactor = 10000, zeroAsNA = TRUE, colorPalette = "C", ... ) plotFactorDimRed( object, factors, useDimRed = NULL, trimHigh = 0.03, zeroAsNA = TRUE, colorPalette = "D", ... )
object |
A liger object. |
colorBy |
Available variable name in specified |
useDimRed |
Name of the variable storing dimensionality reduction result
in the |
slot |
Choose the slot to find the |
colorByFunc |
Default |
cellIdx |
Character, logical or numeric index that can subscribe cells.
Missing or |
splitBy |
Character vector of categorical variable names in
|
shapeBy |
Available variable name in |
titles |
Title text. A character scalar or a character vector with as
many elements as multiple plots are supposed to be generated. Default
|
... |
Arguments passed on to
|
useCluster |
Name of variable in |
combinePlots |
Logical, whether to utilize
|
features , factors
|
Name of genes or index of factors that need to be visualized. |
log |
Logical. Whether to log transform the normalized expression of
genes. Default |
scaleFactor |
Number to be multiplied with the normalized expression of
genes before log transformation. Default |
zeroAsNA |
Logical, whether to swap all zero values to |
colorPalette |
Name of viridis palette. See
|
trimHigh |
Number for highest cut-off to limit the outliers. Factor
loading above this value will all be trimmed to this value. Default
|
Available options for slot include: "cellMeta", "rawData", "normData", "scaleData", "H.norm" and "H". When "rawData", "normData" or "scaleData", colorBy has to be a character vector of feature names. When "H.norm" or "H", colorBy can be any valid index to select one factor of interest. Note that a character index follows the "Factor_[k]" format, with [k] replaced by an integer.
When "cellMeta", colorBy has to be an available column name in the table. Note that, for colorBy as well as x, y, shapeBy and splitBy, since a matrix object is allowed in the cellMeta table, using a column (e.g. named "column1") in a certain matrix (e.g. named "matrixVar") should follow the syntax "matrixVar.column1". When the matrix does not have a "colname" attribute, the subscript goes "matrixVar.V1", "matrixVar.V2", etc. Use "UMAP.1", "UMAP.2", "TSNE.1" or "TSNE.2" for the 2D embeddings generated with the rliger package. These conventions follow from how the as.data.frame method behaves on a DataFrame object.
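The "matrixVar.column1" naming convention can be reproduced with a small base R sketch. The helper flattenMeta below is hypothetical and only mimics the naming behavior described above using plain data.frame objects; it is not how rliger or the DataFrame class is implemented.

```r
# Hypothetical helper for illustration; not part of rliger.
# Flattens a named list of vectors and matrices into one data.frame,
# prefixing the columns of each matrix with the name of the variable.
flattenMeta <- function(cols) {
    parts <- lapply(names(cols), function(nm) {
        value <- cols[[nm]]
        if (is.matrix(value)) {
            # A matrix without column names gets V1, V2, ... here
            sub <- as.data.frame(value)
            names(sub) <- paste(nm, names(sub), sep = ".")
            sub
        } else {
            stats::setNames(data.frame(value), nm)
        }
    })
    do.call(cbind, parts)
}

meta <- list(
    dataset = c("ctrl", "ctrl", "stim"),
    matrixVar = matrix(1:6, nrow = 3,
                       dimnames = list(NULL, c("column1", "column2"))),
    unnamed = matrix(seq(0.1, 0.6, by = 0.1), nrow = 3)
)
names(flattenMeta(meta))
```

The resulting column names are the strings that would be passed to arguments such as colorBy or splitBy.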
A ggplot object when a single plot is intended. A list of ggplot objects when multiple colorBy variables and/or splitBy are set. When plotly = TRUE, all ggplot objects become plotly (htmlwidget) objects.
ggplot object when only one feature (e.g. cluster variable, gene, factor) is set. List object when multiple of those are specified.
plotDimRed(pbmcPlot, colorBy = "dataset", slot = "cellMeta", labelText = FALSE)
plotDimRed(pbmcPlot, colorBy = "S100A8", slot = "normData", dotOrder = "ascending", dotSize = 2)
plotDimRed(pbmcPlot, colorBy = 2, slot = "H.norm", dotOrder = "ascending", dotSize = 2, colorPalette = "viridis")
plotClusterDimRed(pbmcPlot)
plotDatasetDimRed(pbmcPlot)
plotByDatasetAndCluster(pbmcPlot)
plotGeneDimRed(pbmcPlot, varFeatures(pbmcPlot)[1])
plotFactorDimRed(pbmcPlot, 2)
Create volcano plot with EnhancedVolcano
plotEnhancedVolcano(result, group, ...)
result |
Data frame table returned by |
group |
Selection of one group available from |
... |
Arguments passed to EnhancedVolcano::EnhancedVolcano(), except
that |
ggplot
if (requireNamespace("EnhancedVolcano", quietly = TRUE)) {
    defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
    # Test the DEG between "stim" and "ctrl", within each cluster
    result <- runPairwiseDEG(
        pbmc,
        groupTest = "stim",
        groupCtrl = "ctrl",
        variable1 = "dataset",
        splitBy = "defaultCluster"
    )
    plotEnhancedVolcano(result, "0.stim")
}
Plot Heatmap of Gene Expression or Factor Loading
plotGeneHeatmap(object, features, cellIdx = NULL, slot = c("normData", "rawData", "scaleData", "scaleUnsharedData"), useCellMeta = NULL, cellAnnotation = NULL, featureAnnotation = NULL, cellSplitBy = NULL, featureSplitBy = NULL, viridisOption = "C", ...)
plotFactorHeatmap(object, factors = NULL, cellIdx = NULL, slot = c("H.norm", "H"), useCellMeta = NULL, cellAnnotation = NULL, factorAnnotation = NULL, cellSplitBy = NULL, factorSplitBy = NULL, trim = c(0, 0.03), viridisOption = "D", ...)
object |
A liger object, with data to be plot available. |
features , factors
|
Character vector of genes of interest or numeric
index of factors to be included. |
cellIdx |
Valid index to subscript cells to be included. See
|
slot |
Use the chosen matrix for heatmap. For |
useCellMeta |
Character vector of available variable names in
|
cellAnnotation |
data.frame object for using external annotation, with
each column a variable and each row is a cell. Row names of this data.frame
will be used for matching cells involved in heatmap. For cells not found in
this data.frame, |
featureAnnotation , factorAnnotation
|
Similar as |
cellSplitBy |
Character vector of variable names available in annotation
given by |
featureSplitBy , factorSplitBy
|
Similar as |
viridisOption |
See |
... |
Arguments passed on to
|
trim |
Numeric vector of two numbers. Higher value limits the maximum
value and lower value limits the minimum value. Default |
HeatmapList-class object
plotGeneHeatmap(pbmcPlot, varFeatures(pbmcPlot))
plotGeneHeatmap(pbmcPlot, varFeatures(pbmcPlot), useCellMeta = c("leiden_cluster", "dataset"), cellSplitBy = "leiden_cluster")
plotFactorHeatmap(pbmcPlot)
plotFactorHeatmap(pbmcPlot, cellIdx = pbmcPlot$leiden_cluster %in% 1:3, useCellMeta = c("leiden_cluster", "dataset"), cellSplitBy = "leiden_cluster")
Visualize factor expression and gene loading
plotGeneLoadings(object, markerTable, useFactor, useDimRed = NULL, nLabel = 15, nPlot = 30, ...)
plotGeneLoadingRank(object, markerTable, useFactor, nLabel = 15, nPlot = 30, ...)
object |
A liger object with valid factorization result. |
markerTable |
Returned result of |
useFactor |
Integer index for which factor to visualize. |
useDimRed |
Name of the variable storing dimensionality reduction result
in the |
nLabel |
Integer, number of top genes to be shown with text labels.
Default |
nPlot |
Integer, number of top genes to be shown in the loading rank
plot. Default |
... |
Arguments passed on to
|
result <- getFactorMarkers(pbmcPlot, "ctrl", "stim")
plotGeneLoadings(pbmcPlot, result, useFactor = 2)
Visualize gene expression or cell metadata with violin plot
plotGeneViolin(object, gene, byDataset = TRUE, groupBy = NULL, ...)
plotTotalCountViolin(object, groupBy = "dataset", ...)
plotGeneDetectedViolin(object, groupBy = "dataset", ...)
object |
A liger object. |
gene |
Character gene names. |
byDataset |
Logical, whether the violin plot should be split by
dataset. Default |
groupBy |
Names of available categorical variable in |
... |
Arguments passed on to
|
ggplot if using a single gene and not splitting by dataset. Otherwise, list of ggplot.
plotGeneViolin(pbmcPlot, varFeatures(pbmcPlot)[1], groupBy = "leiden_cluster")
plotTotalCountViolin(pbmc)
plotGeneDetectedViolin(pbmc, dot = TRUE, box = TRUE, colorBy = "dataset")
Visualize GO enrichment test result in dot plot
plotGODot( result, group = NULL, query = c("Up", "Down"), pvalThresh = 0.05, n = 20, termIDMatch = "^GO", colorPalette = "E", colorDirection = 1, xlab = "-log10(P-value)", ylab = "Term name", ... )
result |
Returned list object from |
group |
Character vector of group names, must be available in
|
query |
A single string selecting from which query to show the result.
Choose from |
pvalThresh |
Numeric scalar, cutoff for p-value where smaller values are
considered as significant. Default |
n |
Number of top terms to be shown, ranked by p-value. Default
|
termIDMatch |
Regular expression pattern to match the term ID. Default
|
colorPalette , colorDirection
|
Viridis palette options. Default
|
xlab , ylab
|
Axis title for x and y axis. Default
|
... |
Arguments passed on to
|
A ggplot object if only one group or a list of ggplot objects.
defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
# Test the DEG between "stim" and "ctrl", within each cluster
result <- runPairwiseDEG(
    pbmc,
    groupTest = "stim",
    groupCtrl = "ctrl",
    variable1 = "dataset",
    splitBy = "defaultCluster"
)
# Setting `significant = FALSE` because it's hard for a gene list obtained
# from small test dataset to represent real-life biology.
if (requireNamespace("gprofiler2", quietly = TRUE)) {
    go <- runGOEnrich(result, group = "0.stim", splitReg = TRUE, significant = FALSE)
    # The toy example won't have significant result.
    plotGODot(go)
}
This function produces a combined plot at the group level (e.g. dataset, or another metadata variable such as biological condition). A scatter plot of the dimension reduction with clusters labeled is generated per group. In addition, a stacked barplot of cluster proportions within each group is combined with the subplot of each group.
plotGroupClusterDimRed( object, useGroup = "dataset", useCluster = NULL, useDimRed = NULL, combinePlot = TRUE, droplevels = TRUE, relHeightMainLegend = c(5, 1), relHeightDRBar = c(10, 1), mainNRow = NULL, mainNCol = NULL, legendNRow = 1, ... )
object |
A liger object with dimension reduction, grouping
variable and cluster assignment in |
useGroup |
Variable name of the group division in metadata. Default
|
useCluster |
Name of variable in |
useDimRed |
Name of the variable storing dimensionality reduction result
in |
combinePlot |
Whether to return combined plot. Default |
droplevels |
Logical, whether to perform |
relHeightMainLegend |
Relative heights of the main combination panel and
the legend at the bottom. Must be a numeric vector of 2 numbers. Default
|
relHeightDRBar |
Relative heights of the scatter plot and the barplot
within each subpanel. Must be a numeric vector of 2 numbers. Default
|
mainNRow , mainNCol
|
Arrangement of the main plotting region, for number
of rows and columns. Default |
legendNRow |
Arrangement of the legend, number of rows. Default
|
... |
Arguments passed on to
|
ggplot object when only one feature (e.g. cluster variable, gene, factor) is set. List object when multiple of those are specified.
plotGroupClusterDimRed(pbmcPlot)
Create heatmap for showing top marker expression in conditions
plotMarkerHeatmap( object, result, topN = 5, lfcThresh = 1, padjThresh = 0.05, pctInThresh = 50, pctOutThresh = 50, dedupBy = c("logFC", "padj"), groupBy = NULL, groupSize = 50, column_title = NULL, ... )
object |
A liger object, with normalized data and metadata to annotate available. |
result |
The data.frame returned by |
topN |
Number of top features to be plotted for each group. Default
|
lfcThresh |
Hard threshold on logFC value. Default |
padjThresh |
Hard threshold on adjusted P-value. Default |
pctInThresh , pctOutThresh
|
Threshold on expression percentage. These
mean that a feature will only pass the filter if it is expressed in more than
|
dedupBy |
When ranking by padj and logFC and a feature is ranked as top
for multiple clusters, assign this feature as the marker of a cluster when
it has the largest |
groupBy |
Cell metadata variable names for cell grouping. Downsample
balancing will also be aware of this. Default |
groupSize |
Maximum number of cells in each group to be downsampled for
plotting. Default |
column_title |
Title on the column. Default |
... |
Arguments passed on to
|
A HeatmapList-class object.
defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
pbmc <- normalize(pbmc)
plotMarkerHeatmap(pbmc, deg.marker)
Create heatmap for pairwise DEG analysis result
plotPairwiseDEGHeatmap( object, result, group = NULL, topN = 20, absLFCThresh = 1, padjThresh = 0.05, pctInThresh = 50, pctOutThresh = 50, downsampleSize = 200, useCellMeta = NULL, column_title = NULL, seed = 1, ... )
object |
A liger object, with normalized data and metadata to annotate available. |
result |
The data.frame returned by |
group |
The test group name among the result to be shown. Must specify
only one if multiple tests are available (i.e. split test). Default
|
topN |
Maximum number of top significant features to be plotted for up- and
down-regulated genes. Default |
absLFCThresh |
Hard threshold on absolute logFC value. Default |
padjThresh |
Hard threshold on adjusted P-value. Default |
pctInThresh , pctOutThresh
|
Threshold on expression percentage. These
mean that a feature will only pass the filter if it is expressed in more than
|
downsampleSize |
Maximum number of downsampled cells to be shown in the
heatmap. The downsampling is balanced on the cells involved in the test
specified. Default |
useCellMeta |
Cell metadata variable names for cell grouping. Default
|
column_title |
Title on the column. Default |
seed |
Random seed for reproducibility. Default |
... |
Arguments passed on to
|
A HeatmapList-class object.
defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
pbmc <- normalize(pbmc)
plotPairwiseDEGHeatmap(pbmc, deg.pw, '4.stim')
plotProportionBar creates bar plots comparing the cross-category proportion. plotProportionDot creates dot plots. plotClusterProportions has variable pre-specified and calls the dot plot. plotProportion produces a combination of both bar plots and dot plot.
Having package "ggrepel" installed can help add tidier percentage annotations to the pie chart. Run options(ggrepel.max.overlaps = n) before plotting to set the allowed label overlaps.
plotProportion(object, class1 = NULL, class2 = "dataset", method = c("stack", "group", "pie"), ...)
plotProportionDot(object, class1 = NULL, class2 = "dataset", showLegend = FALSE, panelBorder = TRUE, ...)
plotProportionBar(object, class1 = NULL, class2 = "dataset", method = c("stack", "group"), inclRev = FALSE, panelBorder = TRUE, combinePlot = TRUE, ...)
plotClusterProportions(object, useCluster = NULL, return.plot = FALSE, ...)
plotProportionPie(object, class1 = NULL, class2 = "dataset", labelSize = 4, labelColor = "black", circleColors = NULL, ...)
object |
A liger object. |
class1 , class2
|
Each should be a single name of a categorical variable
available in |
method |
For bar plot, choose whether to draw |
... |
Arguments passed on to
|
showLegend |
Whether to show the legend. Default |
panelBorder |
Whether to show rectangle border of the panel instead of
using ggplot classic bottom and left axis lines. Default |
inclRev |
Logical, for barplot, whether to reverse the specification for
|
combinePlot |
Logical, whether to combine the two plots with
|
useCluster |
For |
return.plot |
|
labelSize , labelColor
|
Settings on pie chart percentage label. Default
|
circleColors |
Character vector of colors. |
ggplot or list of ggplot
plotProportion(pbmcPlot)
plotProportionBar(pbmcPlot, method = "group")
plotProportionPie(pbmcPlot)
This function calculates the proportion of each category (e.g. cluster, cell type) within each dataset, and then makes a box plot grouped by condition. The proportions of all categories within one dataset sum up to 1. The condition variable must be a dataset-level variable, i.e. each dataset must belong to only one condition.
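The per-dataset proportion calculation can be sketched in base R as follows. This is a minimal illustration under assumed toy data: the vectors dataset, cluster and condition are made up, and the code is not the rliger implementation.

```r
# Toy per-cell annotation (assumed data for illustration)
dataset <- c("s1", "s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3")
cluster <- c("A", "A", "B", "C", "A", "B", "B", "B", "C")
# Each dataset belongs to exactly one condition
condition <- c(s1 = "ctrl", s2 = "ctrl", s3 = "stim")

# Count cells per dataset and cluster, then normalize within each dataset
counts <- table(dataset, cluster)
props <- prop.table(counts, margin = 1)

# Long format: one row per dataset-cluster pair, annotated with condition
long <- as.data.frame(props, responseName = "proportion")
long$condition <- unname(condition[as.character(long$dataset)])
long
```

Each row of the long table is one point that a box plot grouped by condition and cluster would summarize.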
plotProportionBox( object, useCluster = NULL, conditionBy = NULL, sampleBy = "dataset", splitByCluster = FALSE, dot = FALSE, dotSize = getOption("ligerDotSize", 1), dotJitter = FALSE, ... )
object |
A liger object. |
useCluster |
Name of variable in |
conditionBy |
Name of the variable in |
sampleBy |
Name of the variable in |
splitByCluster |
Logical, whether to split the wide grouped box plot by
cluster, into a list of boxplots for each cluster. Default |
dot |
Logical, whether to add dot plot on top of the box plot. Default
|
dotSize |
Size of the dot. Default uses user option "ligerDotSize", or
|
dotJitter |
Logical, whether to jitter the dot to avoid overlapping
within a box when many dots are presented. Default |
... |
Arguments passed on to
|
A ggplot object or a list of ggplot objects if splitByCluster = TRUE.
# "boxes" are expected to appear as horizontal lines, because there's no
# "condition" variable that groups the datasets in the example object, and
# thus only one value exists for each "box".
plotProportionBox(pbmcPlot, conditionBy = "dataset")
Creates a riverplot/Sankey diagram to show how independent cluster assignments from two datasets map onto a joint clustering. Prior knowledge of cell annotation for the given datasets is required to make sense of the visualization. Dataset original annotation can be added with the syntax shown in example code in this manual. The joint clustering could be generated with runCluster or set by any other metadata annotation.
Dataset original annotation can be inserted before running this function using the cellMeta<- method. Please see example below.
This function depends on the CRAN package "sankey", which has to be installed for this function to work.
plotSankey( object, cluster1, cluster2, clusterConsensus = NULL, minFrac = 0.01, minCell = 10, titles = NULL, prefixes = NULL, labelCex = 1, titleCex = 1.1, colorValues = scPalette, mar = c(2, 2, 4, 2) )
object |
A liger object with all three clustering variables available. |
cluster1 , cluster2
|
Name of the variables in |
clusterConsensus |
Name of the joint cluster variable to use. Default
uses the default clustering of the object. Can select a variable name in
|
minFrac |
Numeric. Minimum fraction of cluster for an edge to be shown.
Default |
minCell |
Numeric. Minimum number of cells for an edge to be shown.
Default |
titles |
Character vector of three. Customizes the column title text
shown. Default uses the variable names |
prefixes |
Character vector of three. Cluster names have to be unique
across all three variables, so this is provided to deduplicate the clusters
by adding |
labelCex |
Numeric. Amount by which node label text should be magnified
relative to the default. Default |
titleCex |
Numeric. Amount by which title text should be magnified
relative to the default. Default |
colorValues |
Character vector of color codes to set color for each
level in the consensus clustering. Default |
mar |
Numeric vector of the form |
No returned value. The sankey diagram will be displayed instead.
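The effect of minFrac and minCell on the edges can be sketched with a cross-tabulation in base R. This is an illustration only. It assumes that an edge must pass both thresholds and that the fraction is measured relative to the size of the source cluster; neither detail is stated above, and the toy vectors are made up.

```r
# Toy labels (assumed data): one dataset-specific clustering and the
# joint (consensus) clustering for the same 100 cells
cluster1 <- c(rep("c1", 60), rep("c2", 40))
consensus <- c(rep("J1", 55), rep("J2", 5), rep("J1", 12), rep("J2", 28))

# One row per candidate edge with the number of cells it carries
edges <- as.data.frame(
    table(from = cluster1, to = consensus),
    responseName = "nCell",
    stringsAsFactors = FALSE
)
# Assumption: fraction is relative to the size of the source cluster
edges$frac <- edges$nCell / as.vector(table(cluster1)[edges$from])

minFrac <- 0.01
minCell <- 10
# Assumption: an edge is shown only when it passes both thresholds
kept <- edges[edges$nCell >= minCell & edges$frac >= minFrac, ]
kept
```

In this toy case the edge from "c1" to "J2" carries only 5 cells and is dropped by minCell, although it would pass minFrac.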
This function works as a replacement for the function makeRiverplot in rliger <1.99. We decided to make a new function because the dependency adopted by the older version is archived on CRAN and will no longer be available.
# Make fake dataset specific labels from joint clustering result
cellMeta(pbmcPlot, "ctrl_cluster", "ctrl") <- cellMeta(pbmcPlot, "leiden_cluster", "ctrl")
cellMeta(pbmcPlot, "stim_cluster", "stim") <- cellMeta(pbmcPlot, "leiden_cluster", "stim")
if (requireNamespace("sankey", quietly = TRUE)) {
    plotSankey(pbmcPlot, "ctrl_cluster", "stim_cluster", titles = c("control", "LIGER", "stim"), prefixes = c("c", NA, "s"))
}
Simple visualization of spatial coordinates. See example code for how to have information preset in the object. Arguments to the liger object method are passed down to ligerDataset method.
plotSpatial2D(object, ...)
## S3 method for class 'liger'
plotSpatial2D(object, dataset, useCluster = NULL, legendColorTitle = NULL, ...)
## S3 method for class 'ligerSpatialDataset'
plotSpatial2D(object, useCluster = NULL, legendColorTitle = NULL, useDims = c(1, 2), xlab = NULL, ylab = NULL, labelText = FALSE, panelBorder = TRUE, ...)
object |
Either a liger object containing a spatial dataset or a ligerSpatialDataset object. |
... |
Arguments passed on to
|
dataset |
Name of one spatial dataset. |
useCluster |
Either the name of one variable in |
legendColorTitle |
Alternative title text in the legend. Default
|
useDims |
Numeric vector of two, choosing the coordinates to be drawn
on 2D space. (STARmap data could have 3 dimensions.) Default |
xlab , ylab
|
Text label on x-/y-axis. Default |
labelText |
Logical, whether to label annotation onto the scatter plot.
Default |
panelBorder |
Whether to show rectangle border of the panel instead of
using ggplot classic bottom and left axis lines. Default |
A ggplot object
ctrl.fake.spatial <- as.ligerDataset(dataset(pbmc, "ctrl"), modal = "spatial")
fake.coords <- matrix(rnorm(2 * ncol(ctrl.fake.spatial)), ncol = 2)
coordinate(ctrl.fake.spatial) <- fake.coords
dataset(pbmc, "ctrl") <- ctrl.fake.spatial
defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
plotSpatial2D(pbmc, dataset = "ctrl")
For each dataset where the feature variability is calculated, a plot of log10 feature expression variance against log10 mean will be produced. Features that are considered variable are highlighted in red.
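The quantities on the two axes can be sketched in base R as below. This is a minimal illustration on simulated counts. The rule used to flag variable features (variance greater than twice the mean) is a placeholder assumption and is not the selection criterion that selectGenes applies.

```r
set.seed(1)
# Toy matrix (assumed data): 5 genes in rows, 200 cells in columns.
# The first three genes are Poisson, the last two are overdispersed.
expr <- rbind(
    rpois(200, lambda = 0.5),
    rpois(200, lambda = 2),
    rpois(200, lambda = 5),
    rnbinom(200, size = 0.5, mu = 2),
    rnbinom(200, size = 0.5, mu = 5)
)

geneMean <- rowMeans(expr)
geneVar <- apply(expr, 1, stats::var)

plotDF <- data.frame(
    log10Mean = log10(geneMean),
    log10Var = log10(geneVar),
    # Placeholder rule for illustration only
    variable = geneVar > 2 * geneMean
)
plotDF$color <- ifelse(plotDF$variable, "red", "grey50")

if (interactive()) {
    plot(plotDF$log10Mean, plotDF$log10Var, col = plotDF$color, pch = 16,
         xlab = "log10 mean", ylab = "log10 variance")
}
```

For Poisson-like genes the variance tracks the mean, so points that sit well above that trend are the ones a variability criterion would tend to pick.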
plotVarFeatures(object, combinePlot = TRUE, dotSize = 1, ...)
object |
liger object. |
combinePlot |
Logical. If |
dotSize |
Controls the size of dots in the main plot. Default
|
... |
More theme setting parameters passed to
|
ggplot object when combinePlot = TRUE, a list of ggplot objects when combinePlot = FALSE
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
plotVarFeatures(pbmc)
plotVolcano is a simple implementation and shares most of its arguments with other rliger plotting functions. plotEnhancedVolcano is a wrapper function of EnhancedVolcano::EnhancedVolcano(), which provides a substantial number of arguments for graphical control. However, that requires the installation of package "EnhancedVolcano".
highlight and labelTopN both control the feature name labeling, with highlight considered first. If both are left as default (NULL), all significant features will be labeled.
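The thresholding and labeling rules can be sketched in base R as follows. This is an illustration under stated assumptions: the column names feature, logFC and padj, the helper names classifyDEG and chooseLabels, and the choice that highlight fully overrides labelTopN are all hypothetical and may differ from the rliger implementation.

```r
# Toy DEG table (assumed column names and values)
deg <- data.frame(
    feature = c("g1", "g2", "g3", "g4", "g5"),
    logFC = c(2.5, -1.8, 0.3, 1.2, -3.0),
    padj = c(1e-10, 1e-4, 1e-6, 0.2, 1e-8),
    stringsAsFactors = FALSE
)

# Hypothetical helper: a feature is significant when it passes both the
# absolute logFC threshold and the adjusted p-value threshold
classifyDEG <- function(deg, logFCThresh = 1, padjThresh = 0.01) {
    sig <- abs(deg$logFC) > logFCThresh & deg$padj < padjThresh
    ifelse(!sig, "Not significant", ifelse(deg$logFC > 0, "Up", "Down"))
}

# Hypothetical helper: highlight is considered first; otherwise significant
# features are ranked by padj, then by absolute logFC, and optionally cut
chooseLabels <- function(deg, status, highlight = NULL, labelTopN = NULL) {
    if (!is.null(highlight)) return(intersect(deg$feature, highlight))
    sigDF <- deg[status != "Not significant", ]
    sigDF <- sigDF[order(sigDF$padj, -abs(sigDF$logFC)), ]
    if (!is.null(labelTopN)) sigDF <- utils::head(sigDF, labelTopN)
    sigDF$feature
}

status <- classifyDEG(deg)
chooseLabels(deg, status)
chooseLabels(deg, status, labelTopN = 2)
chooseLabels(deg, status, highlight = c("g3", "g1"))
```

With both arguments left as NULL, every significant feature is returned, which matches the default behavior described above.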
plotVolcano( result, group = NULL, logFCThresh = 1, padjThresh = 0.01, highlight = NULL, labelTopN = NULL, dotSize = 2, dotAlpha = 0.8, legendPosition = "top", labelSize = 4, ... )
result |
Data frame table returned by |
group |
Selection of one group available from |
logFCThresh |
Number for the threshold on the absolute value of the log2
fold change statistics. Default |
padjThresh |
Number for the threshold on the adjusted p-value
statistics. Default |
highlight |
A character vector of feature names to be highlighted.
Default |
labelTopN |
Number of top differential expressed features to be labeled
on the top of the dots. Ranked by adjusted p-value first and absolute value
of logFC next. Default |
dotSize , dotAlpha
|
Numbers for universal aesthetics control of dots.
Default |
legendPosition |
Text indicating where to place the legend. Choose from
|
labelSize |
Size of labeled top features and line annotations. Default
|
... |
Arguments passed on to
|
ggplot
plotVolcano(deg.pw, "0.stim")
Please turn to quantileNorm.
This process builds a shared factor neighborhood graph to jointly cluster cells, then quantile normalizes corresponding clusters.
The first step, building the shared factor neighborhood graph, is performed in SNF(), and produces a graph representation where edge weights between cells (across all datasets) correspond to their similarity in the shared factor neighborhood space. An important parameter here is knn_k, the number of neighbors used to build the shared factor space.
Next we perform quantile alignment for each dataset, factor, and cluster (by stretching/compressing datasets' quantiles to better match those of the reference dataset). These aligned factor loadings are combined into a single matrix and returned as H.norm.
object |
|
knn_k |
Number of nearest neighbors for within-dataset knn graph (default 20). |
ref_dataset |
Name of dataset to use as a "reference" for normalization. By default, the dataset with the largest number of cells is used. |
min_cells |
Minimum number of cells to consider a cluster shared across datasets (default 20) |
quantiles |
Number of quantiles to use for quantile normalization (default 50). |
eps |
The error bound of the nearest neighbor search (default 0.9). Lower values give more accurate nearest neighbor graphs but take much longer to compute. |
dims.use |
Indices of factors to use for shared nearest factor
determination (default |
do.center |
Centers the data when scaling factors (useful for less sparse modalities like methylation data). (default FALSE) |
max_sample |
Maximum number of cells used for quantile normalization of each cluster and factor. (default 1000) |
refine.knn |
Whether to increase robustness of cluster assignments using KNN graph. (default TRUE) |
rand.seed |
Random seed to allow reproducible results (default 1) |
liger object with 'H.norm' and 'clusters' slot set.
This is a deprecated function. Call 'quantileNorm' instead.
quantileAlignSNF( object, knn_k = 20, k2 = 500, prune.thresh = 0.2, ref_dataset = NULL, min_cells = 20, quantiles = 50, nstart = 10, resolution = 1, dims.use = 1:ncol(x = object@H[[1]]), dist.use = "CR", center = FALSE, small.clust.thresh = 0, id.number = NULL, print.mod = FALSE, print.align.summary = FALSE )
object |
|
knn_k |
Number of nearest neighbors for within-dataset knn graph (default 20). |
k2 |
Horizon parameter for shared nearest factor graph. Distances to all but the k2 nearest neighbors are set to 0 (cuts down on memory usage for very large graphs). (default 500) |
prune.thresh |
Minimum allowed edge weight. Any edges below this are removed (given weight 0) (default 0.2) |
ref_dataset |
Name of dataset to use as a "reference" for normalization. By default, the dataset with the largest number of cells is used. |
min_cells |
Minimum number of cells to consider a cluster shared across datasets (default 20) |
quantiles |
Number of quantiles to use for quantile normalization (default 50). |
nstart |
Number of times to perform Louvain community detection with different random starts (default 10). |
resolution |
Controls the number of communities detected. Higher resolution -> more communities. (default 1) |
dims.use |
Indices of factors to use for shared nearest factor determination (default
|
dist.use |
Distance metric to use in calculating nearest neighbors (default "CR"). |
center |
Centers the data when scaling factors (useful for less sparse modalities like methylation data). (default FALSE) |
small.clust.thresh |
Extracts small clusters loading highly on single factor with fewer cells than this before regular alignment (default 0 – no small cluster extraction). |
id.number |
Number to use for identifying edge file (when running in parallel) (generates random value by default). |
print.mod |
Print modularity output from clustering algorithm (default FALSE). |
print.align.summary |
Print summary of clusters which did not align normally (default FALSE). |
This process builds a shared factor neighborhood graph to jointly cluster cells, then quantile normalizes corresponding clusters.
The first step, building the shared factor neighborhood graph, is performed in SNF(), and produces a graph representation where edge weights between cells (across all datasets) correspond to their similarity in the shared factor neighborhood space. An important parameter here is knn_k, the number of neighbors used to build the shared factor space (see SNF()). Afterwards, modularity-based community detection is performed on this graph (Louvain clustering) in order to identify shared clusters across datasets. The method was first developed by Waltman and van Eck (2013) and source code is available at http://www.ludowaltman.nl/slm/. The most important parameter here is resolution, which controls the number of communities detected: a higher resolution yields more communities.
Next we perform quantile alignment for each dataset, factor, and cluster (by stretching/compressing datasets' quantiles to better match those of the reference dataset). These aligned factor loadings are combined into a single matrix and returned as H.norm.
liger object with H.norm and cluster slots set.
## Not run:
# liger object, factorization complete
ligerex
# do basic quantile alignment
ligerex <- quantileAlignSNF(ligerex)
# higher resolution for more clusters (note that SNF is conserved)
ligerex <- quantileAlignSNF(ligerex, resolution = 1.2)
# change knn_k for more fine-grained local clustering
ligerex <- quantileAlignSNF(ligerex, knn_k = 15, resolution = 1.2)
## End(Not run)
This process builds a shared factor neighborhood graph to jointly cluster cells, then quantile normalizes corresponding clusters.
The first step, building the shared factor neighborhood graph, is performed in SNF(), and produces a graph representation where edge weights between cells (across all datasets) correspond to their similarity in the shared factor neighborhood space. An important parameter here is nNeighbors, the number of neighbors used to build the shared factor space.
Next we perform quantile alignment for each dataset, factor, and cluster (by stretching/compressing datasets' quantiles to better match those of the reference dataset).
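The quantile alignment step can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy loadings, the variable names and the use of a piecewise-linear map between quantiles are all assumptions made for illustration, applied to a single factor within a single joint cluster.

```r
# Toy loadings for one factor within one joint cluster (hypothetical data).
set.seed(1)
refLoadings <- rexp(200, rate = 1)     # reference dataset
qryLoadings <- rexp(150, rate = 0.25)  # dataset to align, on a different scale

# Evaluate both distributions on the same grid of quantiles.
nQuantiles <- 50
probs <- seq(0, 1, length.out = nQuantiles)
refQ <- quantile(refLoadings, probs)
qryQ <- quantile(qryLoadings, probs)

# Piecewise-linear map that sends each query quantile onto the matching
# reference quantile, stretching or compressing the values in between.
warp <- approxfun(x = qryQ, y = refQ, rule = 2)
aligned <- warp(qryLoadings)

summary(refLoadings)
summary(aligned)
```

After the mapping, the aligned values share the range and approximate shape of the reference distribution while keeping the rank order of the original cells.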
quantileNorm(object, ...)

## S3 method for class 'liger'
quantileNorm(
  object, quantiles = 50, reference = NULL, minCells = 20, nNeighbors = 20,
  useDims = NULL, center = FALSE, maxSample = 1000, eps = 0.9,
  refineKNN = TRUE, clusterName = "quantileNorm_cluster", seed = 1,
  verbose = getOption("ligerVerbose", TRUE), ...
)

## S3 method for class 'Seurat'
quantileNorm(
  object, reduction = "inmf", quantiles = 50, reference = NULL, minCells = 20,
  nNeighbors = 20, useDims = NULL, center = FALSE, maxSample = 1000, eps = 0.9,
  refineKNN = TRUE, clusterName = "quantileNorm_cluster", seed = 1,
  verbose = getOption("ligerVerbose", TRUE), ...
)
object |
A liger or Seurat object with valid factorization
result available (i.e. |
... |
Arguments passed to other S3 methods of this function. |
quantiles |
Number of quantiles to use for quantile normalization.
Default |
reference |
Character, numeric or logical selection of one dataset, out
of all available datasets in |
minCells |
Minimum number of cells to consider a cluster shared across
datasets. Default |
nNeighbors |
Number of nearest neighbors for within-dataset knn graph.
Default |
useDims |
Indices of factors to use for shared nearest factor
determination. Default |
center |
Whether to center the data when scaling factors. Could be
useful for less sparse modalities like methylation data. Default
|
maxSample |
Maximum number of cells used for quantile normalization of
each cluster and factor. Default |
eps |
The error bound of the nearest neighbor search. Lower values give
more accurate nearest neighbor graphs but take much longer to compute.
Default |
refineKNN |
whether to increase robustness of cluster assignments using
KNN graph. Default |
clusterName |
Variable name that will store the clustering result
in metadata of a liger object or a |
seed |
Random seed to allow reproducible results. Default |
verbose |
Logical. Whether to show information of the progress. Default
|
reduction |
Name of the reduction where LIGER integration result is
stored. Default |
Updated input object.
liger method:
Updates the H.norm slot with the aligned cell factor loadings, ready for graph-based community detection clustering or dimensionality reduction for visualization.
Updates the cellMeta slot with a cluster assignment based on the cell factor loadings.
Seurat method:
Updates the reductions slot with a new DimReduc object containing the aligned cell factor loadings.
Updates the metadata with a cluster assignment based on the cell factor loadings.
pbmc <- quantileNorm(pbmcPlot)
Similar to how the default ligerDataset data is accessed.
rawPeak(x, dataset)
rawPeak(x, dataset, check = TRUE) <- value
normPeak(x, dataset)
normPeak(x, dataset, check = TRUE) <- value

## S4 method for signature 'liger,character'
rawPeak(x, dataset)
## S4 replacement method for signature 'liger,character'
rawPeak(x, dataset, check = TRUE) <- value
## S4 method for signature 'liger,character'
normPeak(x, dataset)
## S4 replacement method for signature 'liger,character'
normPeak(x, dataset, check = TRUE) <- value

## S4 method for signature 'ligerATACDataset,missing'
rawPeak(x, dataset = NULL)
## S4 replacement method for signature 'ligerATACDataset,missing'
rawPeak(x, dataset = NULL, check = TRUE) <- value
## S4 method for signature 'ligerATACDataset,missing'
normPeak(x, dataset = NULL)
## S4 replacement method for signature 'ligerATACDataset,missing'
normPeak(x, dataset = NULL, check = TRUE) <- value
x |
ligerATACDataset object or a liger object. |
dataset |
Name or numeric index of an ATAC dataset. |
check |
Logical, whether to perform object validity check on setting new value. |
value |
|
The retrieved peak count matrix or the updated x object.
Enables easy loading of sparse data matrices provided by 10X Genomics.
read10X works generally for 10X CellRanger pipelines, including CellRanger < 3.0, CellRanger >= 3.0 and CellRanger-ARC.
read10XRNA invokes read10X and extracts the "Gene Expression" feature type, so that the result can be used directly to construct a liger object. See Examples for a demonstration.
read10XATAC works for both the CellRanger-ARC and CellRanger-ATAC pipelines but needs user arguments for correct recognition. Similarly, the returned value can be used directly to construct a liger object.
read10X( path, sampleNames = NULL, addPrefix = FALSE, useFiltered = NULL, reference = NULL, geneCol = 2, cellCol = 1, returnList = FALSE, verbose = getOption("ligerVerbose", TRUE), sample.dirs = path, sample.names = sampleNames, use.filtered = useFiltered, data.type = NULL, merge = NULL, num.cells = NULL, min.umis = NULL ) read10XRNA( path, sampleNames = NULL, addPrefix = FALSE, useFiltered = NULL, reference = NULL, returnList = FALSE, ... ) read10XATAC( path, sampleNames = NULL, addPrefix = FALSE, useFiltered = NULL, pipeline = c("atac", "arc"), arcFeatureType = "Peaks", returnList = FALSE, geneCol = 2, cellCol = 1, verbose = getOption("ligerVerbose", TRUE) )
path |
(A.) A Directory containing the matrix.mtx, genes.tsv (or features.tsv), and barcodes.tsv files provided by 10X. A vector, a named vector, a list or a named list can be given in order to load several data directories. (B.) The 10X root directory where subdirectories of per-sample output folders can be found. Sample names will by default take the name of the vector, list or subfolders. |
sampleNames |
A vector of names to override the detected or set sample
names for what is given to |
addPrefix |
Logical, whether to add sample names as a prefix to the
barcodes. Default |
useFiltered |
Logical, if |
reference |
In case of specifying a CellRanger<3 root folder to
|
geneCol |
Specify which column of genes.tsv or features.tsv to use for
gene names. Default |
cellCol |
Specify which column of barcodes.tsv to use for cell names.
Default |
returnList |
Logical, whether to still return a structured list instead
of a single matrix object, in the case where only one sample and only one
feature type can be found. Otherwise will always return a list. Default
|
verbose |
Logical. Whether to show information of the progress. Default
|
sample.dirs , sample.names , use.filtered
|
These arguments are renamed and will be deprecated in the future. Please see usage for corresponding arguments. |
data.type , merge , num.cells , min.umis
|
These arguments are defunct because the functionality can/should be fulfilled with other functions. |
... |
Arguments passed to |
pipeline |
Which cellRanger pipeline type to find the ATAC data. Choose
|
arcFeatureType |
When |
When only one sample is given or detected, and only one feature type is detected or using CellRanger < 3.0, and returnList = FALSE, a sparse matrix object (dgCMatrix class) will be returned.
When using read10XRNA or read10XATAC, which are modality specific, returns a list named by samples, and each element is the corresponding sparse matrix object (dgCMatrix class).
read10X generally returns a list named by samples. Each sample element will be another list named by feature types even if only one feature type is detected (or using CellRanger < 3.0) for data structure consistency. The feature type "Gene Expression" always comes as the first type if available.
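The nested return structure described above can be explored without any 10X files. The following is a small sketch on mock data: the helper `mockSample`, the sample names and the "Peaks" feature type are assumptions, and dense base R matrices stand in for the dgCMatrix objects that the reader functions return.

```r
# Mock of the nested list layout: samples at the top level,
# feature types inside each sample (hypothetical data).
mockSample <- function(nCells) {
  list(
    "Gene Expression" = matrix(rpois(5 * nCells, 2), nrow = 5,
                               dimnames = list(paste0("gene", 1:5), NULL)),
    "Peaks" = matrix(rpois(3 * nCells, 1), nrow = 3,
                     dimnames = list(paste0("peak", 1:3), NULL))
  )
}
set.seed(1)
matList <- list(rep1 = mockSample(4), rep2 = mockSample(6))

names(matList)       # sample names
names(matList$rep1)  # feature types, "Gene Expression" first

# Keep only the "Gene Expression" matrix of every sample, giving a flat
# list named by samples (the shape described for the modality-specific readers).
rnaList <- lapply(matList, function(sample) sample[["Gene Expression"]])
sapply(rnaList, dim)
```

A flat, sample-named list of matrices like `rnaList` is the shape that can be handed on to object construction.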
## Not run:
# For output from CellRanger < 3.0
dir <- 'path/to/data/directory'
list.files(dir) # Should show barcodes.tsv, genes.tsv, and matrix.mtx
mat <- read10X(dir)
class(mat) # Should show dgCMatrix

# For root directory from CellRanger < 3.0
dir <- 'path/to/root'
list.dirs(dir) # Should show sample names
matList <- read10X(dir)
names(matList) # Should show the sample names
class(matList[[1]][["Gene Expression"]]) # Should show dgCMatrix

# For output from CellRanger >= 3.0 with multiple data types
dir <- 'path/to/data/directory'
list.files(dir) # Should show barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz
matList <- read10X(dir, sampleNames = "tissue1")
names(matList) # Should show "tissue1"
names(matList$tissue1) # Should show feature types, e.g. "Gene Expression" etc.

# For root directory from CellRanger >= 3.0 with multiple data types
dir <- 'path/to/root'
list.dirs(dir) # Should show sample names, e.g. "rep1", "rep2", "rep3"
matList <- read10X(dir)
names(matList) # Should show the sample names: "rep1", "rep2", "rep3"
names(matList$rep1) # Should show the available feature types for rep1
## End(Not run)

## Not run:
# For creating LIGER object from root directory of CellRanger >= 3.0
dir <- 'path/to/root'
list.dirs(dir) # Should show sample names, e.g. "rep1", "rep2", "rep3"
matList <- read10XRNA(dir)
names(matList) # Should show the sample names: "rep1", "rep2", "rep3"
sapply(matList, class) # Should show that all matrix classes are "dgCMatrix"
lig <- createLigerObject(matList)
## End(Not run)
This function loads a single sample given the paths to the matrix.mtx, barcodes.tsv, and features.tsv files. It is used internally by the read10X functions for loading individual samples from a CellRanger output directory, and it is also convenient when non-standard files are provided (e.g. data downloaded from GEO).
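To make the three-file layout concrete, here is a hedged sketch that writes a tiny matrix, features and barcodes file to a temporary folder and reads them back with the Matrix package. The helper `readTriplet` is hypothetical and is not the package's reader; it only mirrors the role of the `geneCol` and `cellCol` arguments.

```r
library(Matrix)

# Write a tiny three-file sample to a temporary folder (hypothetical data).
tmp <- tempfile()
dir.create(tmp)
counts <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(5, 1, 2),
                       dims = c(3, 2))
writeMM(counts, file.path(tmp, "matrix.mtx"))
write.table(
  data.frame(id = c("ENSG1", "ENSG2", "ENSG3"),
             name = c("GeneA", "GeneB", "GeneC"),
             type = "Gene Expression"),
  file.path(tmp, "features.tsv"),
  sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE
)
writeLines(c("AAAC-1", "AAAG-1"), file.path(tmp, "barcodes.tsv"))

# Hypothetical helper: read the triplet and attach row and column names.
readTriplet <- function(matrixPath, barcodesPath, featuresPath,
                        geneCol = 2, cellCol = 1) {
  mat <- as(readMM(matrixPath), "CsparseMatrix")
  features <- read.delim(featuresPath, header = FALSE, stringsAsFactors = FALSE)
  barcodes <- read.delim(barcodesPath, header = FALSE, stringsAsFactors = FALSE)
  rownames(mat) <- make.unique(features[[geneCol]])
  colnames(mat) <- barcodes[[cellCol]]
  mat
}

mat <- readTriplet(
  matrixPath = file.path(tmp, "matrix.mtx"),
  barcodesPath = file.path(tmp, "barcodes.tsv"),
  featuresPath = file.path(tmp, "features.tsv")
)
mat
```

With `geneCol = 2` the second column of the features file (gene symbols) becomes the row names; `geneCol = 1` would use the identifiers in the first column instead.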
read10XFiles( matrixPath, barcodesPath, featuresPath, sampleName = NULL, geneCol = 2, cellCol = 1, isATAC = FALSE, returnList = FALSE )
matrixPath |
Character string, path to the matrix MTX file. Can be gzipped. |
barcodesPath |
Character string, path to the barcodes TSV file. Can be gzipped. |
featuresPath |
Character string, path to the features TSV file. Can be gzipped. |
sampleName |
Character string attached as a prefix to the cell barcodes
loaded from the barcodes file. Default |
geneCol |
An integer indicating which column in the features file to
extract as the feature identifiers. Default |
cellCol |
An integer indicating which column in the barcodes file to
extract as the cell identifiers. Default |
isATAC |
Logical, whether the data is for ATAC-seq. Default
|
returnList |
Logical, used internally by wrapper functions. Whether to
force putting the loaded matrix in a list even if there's only one matrix.
Default |
For a single-modal sample, a dgCMatrix object, or a list of one dgCMatrix when returnList = TRUE. A list of multiple dgCMatrix objects when multiple feature types are detected.
## Not run:
matrix <- read10XFiles(
  matrixPath = "path/to/matrix.mtx.gz",
  barcodesPath = "path/to/barcodes.tsv.gz",
  featuresPath = "path/to/features.tsv.gz"
)
## End(Not run)
This function reads a liger object stored in an RDS file. The following kinds of objects are supported:
A liger object with in-memory data created with package version 1.99 or later.
A liger object with on-disk H5 data associated, where the link to H5 files will be automatically restored.
A liger object created with an older package version, which is updated to the latest data structure by default.
readLiger( filename, dimredName, clusterName = "clusters", h5FilePath = NULL, update = TRUE )
filename |
Path to an RDS file of a |
dimredName |
The name of variable in |
clusterName |
The name of variable in |
h5FilePath |
Named character vector for all H5 file paths. Not required for object run with in-memory analysis. For object containing H5-based analysis (e.g. online iNMF), this must be supplied if the H5 file location is different from that at creation time. |
update |
Logical, whether to update an old (<=1.99.0) |
New version of liger object
# Save and read regular current-version liger object
tempPath <- tempfile(fileext = ".rds")
saveRDS(pbmc, tempPath)
pbmc <- readLiger(tempPath, dimredName = NULL)

# Save and read H5-based liger object
h5Path <- system.file("extdata/ctrl.h5", package = "rliger")
h5tempPath <- tempfile(fileext = ".h5")
file.copy(from = h5Path, to = h5tempPath)
lig <- createLiger(list(ctrl = h5tempPath))
tempPath <- tempfile(fileext = ".rds")
saveRDS(lig, tempPath)
lig <- readLiger(tempPath, h5FilePath = list(ctrl = h5tempPath))

## Not run:
# Read an old liger object <= 1.0.1
# Assume the dimensionality reduction method applied was UMAP
# Assume the clustering was derived with Louvain method
lig <- readLiger(
  filename = "path/to/oldLiger.rds",
  dimredName = "UMAP",
  clusterName = "louvain"
)
## End(Not run)
downsample
This function mainly aims at downsampling datasets to a size suitable for plotting.
readSubset( object, slot.use = "normData", balance = NULL, max.cells = 1000, chunk = 1000, datasets.use = NULL, genes.use = NULL, rand.seed = 1, verbose = getOption("ligerVerbose", TRUE) )
object |
liger object |
slot.use |
Only create subset from one or more of |
balance |
|
max.cells |
Max number of cells to sample from the grouping based on
|
chunk |
Integer. Number of maximum number of cells in each chunk,
Default |
datasets.use |
Index selection of datasets to consider. Default
|
genes.use |
Character vector. Subset features to this specified range.
Default |
rand.seed |
Random seed for reproducibility. Default |
verbose |
Logical. Whether to show information of the progress. Default
|
Subset of liger object.
downsample, subsetLiger, subsetLigerDataset
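As a rough illustration of what downsampling with a per-group cap amounts to, the following base R sketch samples at most a fixed number of cells from each group. The metadata table, the grouping by dataset and the cap of 8 are assumptions for illustration; this is not the package's sampling code.

```r
set.seed(1)
# Hypothetical cell metadata: 20 cells from "ctrl" and 10 from "stim".
cellMeta <- data.frame(
  barcode = paste0("cell", 1:30),
  dataset = rep(c("ctrl", "stim"), times = c(20, 10))
)
maxCells <- 8

# For each group, keep all cells if the group is small enough,
# otherwise draw maxCells cells at random.
keepIdx <- unlist(
  lapply(split(seq_len(nrow(cellMeta)), cellMeta$dataset), function(idx) {
    if (length(idx) <= maxCells) idx else sort(sample(idx, maxCells))
  }),
  use.names = FALSE
)
subsetMeta <- cellMeta[keepIdx, ]
table(subsetMeta$dataset)
```

Capping each group separately keeps small groups represented in a plot instead of letting the largest dataset dominate the sample.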
Remove missing cells or features from liger object
removeMissing( object, orient = c("both", "feature", "cell"), minCells = NULL, minFeatures = NULL, useDatasets = NULL, newH5 = TRUE, filenameSuffix = "removeMissing", verbose = getOption("ligerVerbose", TRUE), ... ) removeMissingObs( object, slot.use = NULL, use.cols = TRUE, verbose = getOption("ligerVerbose", TRUE) )
object |
liger object |
orient |
Choose to remove non-expressing features ( |
minCells |
Keep features that are expressed in at least this number of
cells, calculated on a per-dataset base. A single value for all datasets or
a vector for each dataset. Default |
minFeatures |
Keep cells that express at least this number of features,
calculated on a per-dataset base. A single value for all datasets or a vector
for each dataset. Default |
useDatasets |
A character vector of the names, a numeric or logical
vector of the index of the datasets to be processed. Default
|
newH5 |
Logical, whether to create a new H5 file on disk for each
H5-based dataset on subset. Default |
filenameSuffix |
When subsetting H5-based datasets to new H5 files, this
suffix will be added to all the filenames. Default |
verbose |
Logical. Whether to show information of the progress. Default
|
... |
Arguments passed to |
slot.use |
Deprecated. Always look at |
use.cols |
Deprecated. Previously means "treating each column as
a cell" when |
Updated (subset) object.
removeMissingObs will be deprecated. removeMissing covers and expands the use case and should be easier to understand.
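The filtering idea behind removing non-expressing features and empty cells can be sketched on a small count matrix in base R. The thresholds of 1 used here are hypothetical, and the sketch applies both filters to the original matrix in one step, which may differ from the order the package uses.

```r
# Toy count matrix: gene2 is not expressed anywhere, cell2 is an empty barcode.
counts <- matrix(
  c(3, 0, 0, 1,
    0, 0, 0, 0,
    2, 0, 5, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

minCells <- 1     # hypothetical: keep features expressed in >= 1 cell
minFeatures <- 1  # hypothetical: keep cells expressing >= 1 feature

keepFeature <- rowSums(counts > 0) >= minCells
keepCell <- colSums(counts > 0) >= minFeatures
filtered <- counts[keepFeature, keepCell, drop = FALSE]
filtered
```

Here `gene2` and `cell2` are dropped, leaving a 2 by 3 matrix.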
# The example dataset does not contain non-expressing genes or empty barcodes
pbmc <- removeMissing(pbmc)
When a saved liger object with HDF5 data is loaded in a new R session, the links to the HDF5 files are closed. This function restores those links so that new analyses can be carried out.
restoreH5Liger(object, filePath = NULL)
restoreOnlineLiger(object, file.path = NULL)
object |
liger or ligerDataset object. |
filePath |
Paths to HDF5 files. A single character path for
ligerDataset input or a list of paths named by the datasets for
liger object input. Default |
file.path |
Will be deprecated with |
object with restored links.
restoreOnlineLiger will be deprecated in order to clarify the terms used for the data structure.
h5Path <- system.file("extdata/ctrl.h5", package = "rliger")
tempPath <- tempfile(fileext = ".h5")
file.copy(from = h5Path, to = tempPath)
lig <- createLiger(list(ctrl = tempPath))
# After closing all H5 links the object is invalid, which is equivalent to what
# users get with `saveRDS(lig, "object.rds"); lig <- readRDS("object.rds")`
closeAllH5(lig)
lig <- restoreH5Liger(lig)
Retrieves data only from a specific slot, which uses less memory than creating a whole liger object for the subset. Useful for plotting. Internally used by plotDimRed and plotCellViolin.
retrieveCellFeature( object, feature, slot = c("rawData", "normData", "scaleData", "H", "H.norm", "cellMeta", "rawPeak", "normPeak"), cellIdx = NULL, ... )
object |
liger object |
feature |
Gene names, factor index or cell metadata variable names.
Should be available in specified |
slot |
Exactly choose from |
cellIdx |
Any valid type of index that subset from all cells. Default
|
... |
Additional arguments passed to |
A matrix object where rows are cells and columns are specified features.
S100A8Exp <- retrieveCellFeature(pbmc, "S100A8")
qcMetrics <- retrieveCellFeature(pbmc, c("nUMI", "nGene", "mito"),
                                 slot = "cellMeta")
Because gene body mCH proportions are negatively correlated with gene expression level in neurons, we need to reverse the direction of the methylation data. We do this by simply subtracting all values from the maximum methylation value. The resulting values are positively correlated with gene expression. This is applied only to the variable genes detected beforehand.
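The reversal itself is a one-line transformation. The sketch below applies it to a toy matrix in base R; the values are made up, and it assumes a single maximum taken over the whole matrix, which is one reading of "the maximum methylation value".

```r
# Toy methylation proportions for two variable genes in three cells.
meth <- matrix(
  c(0.1, 0.8, 0.4,
    0.9, 0.2, 0.6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), paste0("cell", 1:3))
)

# Subtract every value from the overall maximum: high methylation becomes
# a low value and vice versa, and the result stays non-negative.
reversed <- max(meth) - meth
reversed
```

Because the transformation is linear with slope -1, the ordering of the values is flipped exactly while no negative values are introduced, which matters for a non-negative factorization.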
reverseMethData(object, useDatasets, verbose = getOption("ligerVerbose", TRUE))
object |
A liger object, with variable genes identified. |
useDatasets |
Required. A character vector of the names, a numeric or logical vector of the index of the datasets that should be identified as methylation data where the reversed data will be created. |
verbose |
Logical. Whether to show information of the progress. Default
|
The input liger object, where the scaleData slot of the specified datasets will be updated with values as described in Description.
# Assuming the second dataset in example data "pbmc" is methylation data
pbmc <- normalize(pbmc, useDatasets = 1)
pbmc <- selectGenes(pbmc, datasets.use = 1)
pbmc <- scaleNotCenter(pbmc, useDatasets = 1)
pbmc <- reverseMethData(pbmc, useDatasets = 2)
This is an experimental function and is subject to change.
Performs consensus integrative non-negative matrix factorization (c-iNMF) to return factorized H, W, and V matrices. In order to address the non-convex nature of NMF, we built on the cNMF method proposed by D. Kotliar et al. (2019). We run the regular iNMF multiple times with different random starts, cluster the pool of all the factors in W and the Vs, and take the consensus of the clusters of the largest population. The cell factor loading H matrices are eventually solved with the consensus W and V matrices.
Please see runINMF for a detailed introduction to the regular iNMF algorithm, which is run multiple times in this function.
The consensus iNMF algorithm is based on the consensus NMF (cNMF) method (D. Kotliar et al., 2019).
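The consensus idea of pooling factors from replicate runs, clustering them and summarizing each cluster can be sketched in base R. This is a toy illustration under strong assumptions: the replicate "runs" are simulated as noisy, column-permuted copies of a known loading matrix instead of real factorizations, k-means stands in for the clustering step, the per-cluster median stands in for the consensus, and the neighbor-based filtering controlled by rho is left out.

```r
set.seed(1)
nGenes <- 20
k <- 3
nRuns <- 5

# Three well-separated "true" gene loadings (hypothetical).
truth <- matrix(0, nGenes, k)
truth[1:6, 1] <- 1
truth[7:13, 2] <- 1
truth[14:20, 3] <- 1

# Simulated replicate runs: noise added, factor order shuffled per run.
runs <- lapply(seq_len(nRuns), function(i) {
  noisy <- truth + matrix(runif(nGenes * k, 0, 0.1), nGenes, k)
  noisy[, sample(k)]
})

# Pool all factors as rows ((nRuns * k) x nGenes) and scale to unit length.
pool <- t(do.call(cbind, runs))
pool <- pool / sqrt(rowSums(pool^2))

# Group matching factors across runs, then take the per-gene median
# within each group as the consensus factor.
cl <- kmeans(pool, centers = k, nstart = 50)$cluster
consensus <- t(sapply(seq_len(k), function(j) {
  apply(pool[cl == j, , drop = FALSE], 2, median)
}))
dim(consensus)  # k x nGenes
```

Clustering is what resolves the arbitrary factor order between runs: factors that describe the same program end up in the same group regardless of the column they occupied in their own run.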
runCINMF(object, k = 20, lambda = 5, rho = 0.3, ...)

## S3 method for class 'liger'
runCINMF(
  object, k = 20, lambda = 5, rho = 0.3, nIteration = 30, nRandomStarts = 10,
  HInit = NULL, WInit = NULL, VInit = NULL, seed = 1, nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE), ...
)

## S3 method for class 'Seurat'
runCINMF(
  object, k = 20, lambda = 5, rho = 0.3, datasetVar = "orig.ident",
  layer = "ligerScaleData", assay = NULL, reduction = "cinmf",
  nIteration = 30, nRandomStarts = 10, HInit = NULL, WInit = NULL,
  VInit = NULL, seed = 1, nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE), ...
)
object |
A liger object or a Seurat object with
non-negative scaled data of variable features (Done with
|
k |
Inner dimension of factorization (number of factors). Generally, a
higher |
lambda |
Regularization parameter. Larger values penalize
dataset-specific effects more strongly (i.e. alignment should increase as
|
rho |
Numeric number between 0 and 1. Fraction for determining the
number of nearest neighbors to look at for consensus (by
|
... |
Arguments passed to methods. |
nIteration |
Total number of block coordinate descent iterations to
perform. Default |
nRandomStarts |
Number of replicate runs for creating the pool of
factorization results. Default |
HInit |
Initial values to use for |
WInit |
Initial values to use for |
VInit |
Initial values to use for |
seed |
Random seed to allow reproducible results. Default |
nCores |
The number of parallel tasks to speed up the computation.
Default |
verbose |
Logical. Whether to show information of the progress. Default
|
datasetVar |
Metadata variable name that stores the dataset source
annotation. Default |
layer |
For Seurat>=4.9.9, the name of layer to retrieve input
non-negative scaled data. Default |
assay |
Name of assay to use. Default |
reduction |
Name of the reduction to store result. Also used as the
feature key. Default |
liger method - Returns updated input liger object.
A list of all H matrices can be accessed with getMatrix(object, "H").
A list of all V matrices can be accessed with getMatrix(object, "V").
The W matrix can be accessed with getMatrix(object, "W").
Seurat method - Returns updated input Seurat object.
H matrices for all datasets will be concatenated and transposed (all cells by k), and form a DimReduc object in the reductions slot named by argument reduction.
The W matrix will be presented as feature.loadings in the same DimReduc object.
V matrices, an objective error value and the dataset variable used for the factorization are currently stored in the misc slot of the same DimReduc object.
Joshua D. Welch et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019
Dylan Kotliar et al., Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq, eLife, 2019
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
  pbmc <- runCINMF(pbmc)
}
After aligning cell factor loadings, users can additionally run the Leiden or Louvain algorithm for community detection, which is widely used in single-cell analysis and excels at merging small clusters into broad cell classes.
While using aligned factor loadings (result from alignFactors) is recommended, this function looks for unaligned factor loadings (raw result from runIntegration) when the former is not available.
runCluster( object, resolution = 1, nNeighbors = 20, prune = 1/15, eps = 0.1, nRandomStarts = 10, nIterations = 5, method = c("leiden", "louvain"), useRaw = NULL, useDims = NULL, groupSingletons = TRUE, saveSNN = FALSE, clusterName = paste0(method, "_cluster"), seed = 1, verbose = getOption("ligerVerbose", TRUE) )
object |
A liger object. Should have valid factorization result available. |
resolution |
Numeric, value of the resolution parameter. A larger value results in a larger number of communities with smaller sizes. Default 1. |
nNeighbors |
Integer, the maximum number of nearest neighbors to compute. Default 20. |
prune |
Numeric. Sets the cutoff for acceptable Jaccard index when computing the neighborhood overlap for the SNN construction. Any edges with values less than or equal to this will be set to 0 and removed from the SNN graph. Essentially sets the stringency of pruning. |
eps |
Numeric, the error bound of the nearest neighbor search. Default 0.1. |
nRandomStarts |
Integer number of random starts. Will pick the membership with highest quality to return. Default 10. |
nIterations |
Integer, maximal number of iterations per random start. Default 5. |
method |
Community detection algorithm to use. Choose from "leiden" or "louvain". Default "leiden". |
useRaw |
Whether to use un-aligned cell factor loadings (H matrices). Default NULL. |
useDims |
Indices of factors to use for clustering. Default NULL. |
groupSingletons |
Whether to group single cells that make up their own cluster in with the cluster they are most connected to. Default TRUE. |
saveSNN |
Logical, whether to store the SNN graph, as a dgCMatrix object, in the object. Default FALSE. |
clusterName |
Name of the variable that will store the clustering result in cellMeta. Default paste0(method, "_cluster"). |
seed |
Seed of the random number generator. Default 1. |
verbose |
Logical. Whether to show information of the progress. Default getOption("ligerVerbose", TRUE). |
object with cluster assignment updated in the clusterName variable in the cellMeta slot. Can be fetched with object[[clusterName]]. If saveSNN = TRUE, the SNN graph will be stored at object@uns$snn.
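To make the role of the prune cutoff more concrete, the following base-R sketch builds a small shared-nearest-neighbor (SNN) matrix from toy factor loadings and removes edges whose Jaccard index is less than or equal to the cutoff. It only illustrates the general idea described for the prune argument; it is not the package's internal implementation, and the toy matrix and all variable names are assumptions made for this example.

```r
# Toy illustration of Jaccard-based SNN pruning (not rliger's internal code)
set.seed(1)
H <- matrix(rnorm(40 * 3), nrow = 40)   # 40 cells x 3 factors (toy loadings)
nNeighbors <- 10
prune <- 1 / 15

# k nearest neighbours of each cell (the cell itself is included)
d <- as.matrix(dist(H))
knn <- t(apply(d, 1, function(x) order(x)[seq_len(nNeighbors)]))

# Jaccard index between the neighbour sets of every pair of cells
nCells <- nrow(H)
snn <- matrix(0, nCells, nCells)
for (i in seq_len(nCells)) {
  for (j in seq_len(nCells)) {
    shared <- length(intersect(knn[i, ], knn[j, ]))
    # union size is 2 * nNeighbors - shared because each set has nNeighbors members
    snn[i, j] <- shared / (2 * nNeighbors - shared)
  }
}

# Edges with a Jaccard index <= prune are set to 0 and thereby removed
snn[snn <= prune] <- 0
```

With 10 neighbors, a pair of cells sharing a single neighbor has a Jaccard index of 1/19, which falls below 1/15 and is pruned, while pairs sharing two or more neighbors are kept.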
pbmcPlot <- runCluster(pbmcPlot) head(pbmcPlot$leiden_cluster) pbmcPlot <- runCluster(pbmcPlot, method = "louvain") head(pbmcPlot$louvain_cluster)
Detect doublets with DoubletFinder. Packages "Seurat" and "DoubletFinder" are required to run this function.
This wrapper runs the Seurat PCA workflow (NormalizeData, FindVariableFeatures, ScaleData, RunPCA) with all default settings on each dataset, and then calls DoubletFinder::doubletFinder. Users who prefer more control over the preprocessing might consider creating a single-sample Seurat object with CreateSeuratObject(rawData(object, "datasetName")).
runDoubletFinder( object, useDatasets = NULL, PCs = 1:10, nNeighbors = 20, nExp = NULL, verbose = getOption("ligerVerbose", TRUE), ... )
object |
A liger object. |
useDatasets |
A character vector of the names, a numeric or logical vector of the index of the datasets to run DoubletFinder on. Default NULL. |
PCs |
Specific principal components to use. Default 1:10. |
nNeighbors |
Number of the PC neighborhood size used to compute pANN. See "See Also". Scalar for all used datasets or vector for each. Default 20. |
nExp |
The total number of doublet predictions produced. Scalar for all used datasets or vector for each. Default NULL. |
verbose |
Logical. Whether to show information of the progress. Default getOption("ligerVerbose", TRUE). |
... |
Additional arguments passed to DoubletFinder::doubletFinder. |
Updated object with variables DoubletFinder_pANN and DoubletFinder_classification updated in the cellMeta slot.
if (requireNamespace("DoubletFinder", quietly = TRUE)) { pbmc <- runDoubletFinder(pbmc) print(cellMeta(pbmc)) }
Calculate number of UMIs, number of detected features and percentage of feature subset (e.g. mito, ribo and hemo) expression per cell.
runGeneralQC( object, organism, features = NULL, pattern = NULL, overwrite = FALSE, useDatasets = NULL, chunkSize = 1000, verbose = getOption("ligerVerbose", TRUE), mito = NULL, ribo = NULL, hemo = NULL )
object |
liger object with |
organism |
Specify the organism of the dataset to identify the
mitochondrial, ribosomal and hemoglobin genes. Available options are
|
features |
Feature names matching the feature subsets that users want to calculate the expression percentage with. A vector for a single subset, or a named list for multiple subsets. Default NULL. |
pattern |
Regex patterns for matching the feature subsets that users want to calculate the expression percentage with. A vector for a single subset, or a named list for multiple subsets. Default NULL. |
overwrite |
Whether to overwrite existing QC metric variables. Default FALSE. |
useDatasets |
A character vector of the names, a numeric or logical vector of the index of the datasets to be included for QC. Default NULL. |
chunkSize |
Integer number of cells to include in a chunk when working on HDF5 based dataset. Default 1000. |
verbose |
Logical. Whether to show information of the progress. Default getOption("ligerVerbose", TRUE). |
mito , ribo , hemo
|
Now will always compute the percentages of mitochondrial, ribosomal and hemoglobin gene counts. These arguments will be ignored. |
This function by default calculates:
nUMI - The column sum of the raw data matrix per cell. Represents the total number of UMIs per cell if given raw counts.
nGene - Number of detected features per cell
mito - Percentage of mitochondrial gene expression per cell
ribo - Percentage of ribosomal gene expression per cell
hemo - Percentage of hemoglobin gene expression per cell
Users can also specify their own feature subsets with argument features, or regular expression patterns that match genes of interest with argument pattern, to calculate the expression percentage. If a character vector is given to features, a QC metric variable named "featureSubset_name" will be computed. If a named list of multiple subsets is given, the names will be used as the variable names. If a single pattern is given to pattern, a QC metric variable named "featureSubset_pattern" will be computed. If a named list of multiple patterns is given, the names will be used as the variable names. Duplicated QC metric names between these two arguments and the default five listed above should be avoided.
This function is run automatically when each liger object is created, to capture the raw status. Argument overwrite is set to FALSE by default to avoid mistakenly updating existing metrics after filtering the object. Users can still opt to update all newly calculated metrics (including the default five) by setting overwrite = TRUE, or only some of the newly calculated ones by providing a character vector of the names of the metrics to update. Intended overwriting only happens to datasets selected with useDatasets.
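The default metrics described above can be sketched in base R on a small dense count matrix. This is only an illustration of what the metrics mean, not the package's implementation; the toy gene names, the helper pctOf and the "^MT-" pattern for mitochondrial genes are assumptions made for this example.

```r
# Toy raw count matrix: genes in rows, cells in columns
counts <- matrix(
  c(5, 0, 2,
    0, 3, 1,
    4, 1, 0,
    1, 0, 7),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "RPS6", "HBB", "CD3E"),
                  c("cell1", "cell2", "cell3"))
)

nUMI  <- colSums(counts)       # column sum of the raw data per cell
nGene <- colSums(counts > 0)   # number of detected features per cell

# Percentage of counts coming from a feature subset matched by a regex.
# The pattern passed below is an assumed naming convention for the toy data.
pctOf <- function(pattern) {
  idx <- grepl(pattern, rownames(counts))
  100 * colSums(counts[idx, , drop = FALSE]) / nUMI
}

qc <- data.frame(nUMI, nGene, mito = pctOf("^MT-"))
```

A user-supplied pattern passed to the pattern argument plays the same role as the regular expression given to pctOf here.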
Updated object with cellMeta(object) updated as intended by users. See Details for more information.
pbmc <- runGeneralQC(pbmc, "human", overwrite = TRUE)
This function forms gene sets based on the differential expression result, and calls the gene ontology (GO) analysis method provided by gprofiler2.
runGOEnrich( result, group = NULL, useBg = TRUE, orderBy = "padj", logFCThresh = 1, padjThresh = 0.05, splitReg = FALSE, ... )
result |
Data frame of unfiltered output from |
group |
Selection of one group available from |
useBg |
Logical, whether to set all genes involved in DE analysis (before threshold filtering) as a domain background of GO analysis. Default TRUE. |
orderBy |
Name of DE statistics metric to order the gene list for each
group. Choose from |
logFCThresh |
The log2FC threshold above which the genes will be used. Default 1. |
padjThresh |
The adjusted p-value threshold less than which the genes will be used. Default 0.05. |
splitReg |
Whether to have queries of both up-regulated and down-regulated genes for each group. Default FALSE. |
... |
Additional arguments passed to gprofiler2::gost(). |
A list object where each element is a result list for a group. Each result list contains two elements:
result |
data.frame of main GO analysis result. |
meta |
Meta information for the query. |
See gprofiler2::gost() for detailed explanation.
Kolberg, L. et al, 2020 and Raudvere, U. et al, 2019
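The thresholding described for logFCThresh and padjThresh can be pictured with a small base-R sketch that selects the up-regulated genes of one group from a toy DE table and orders them by adjusted p-value. The table content is made up, and the exact filtering and ordering performed inside the function are assumed here rather than taken from its source code.

```r
# Toy DE table using the column names of the DEG result data frame
deg <- data.frame(
  feature = c("G1", "G2", "G3", "G4", "G5"),
  group   = "0.stim",
  logFC   = c(2.5, 0.4, 1.8, -2.0, 3.1),
  padj    = c(0.001, 0.0001, 0.03, 0.002, 0.2),
  stringsAsFactors = FALSE
)
logFCThresh <- 1
padjThresh  <- 0.05

# Keep genes above the log2FC threshold and below the adjusted p-value threshold
up <- deg[deg$logFC > logFCThresh & deg$padj < padjThresh, ]

# Order the remaining genes by the chosen metric (here "padj", ascending)
up <- up[order(up$padj), ]
query <- up$feature
```

Only G1 and G3 pass both thresholds in this toy table; G2 fails the fold-change cutoff, G4 is down-regulated and G5 is not significant.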
# Setting `significant = FALSE` because it's hard for a gene list obtained
# from small test dataset to represent real-life biology.
if (requireNamespace("gprofiler2", quietly = TRUE)) {
    go <- runGOEnrich(deg.pw, group = "0.stim", significant = FALSE)
}
Identify the biological pathways (gene sets from Reactome) that each metagene (factor) might belong to.
runGSEA( object, genesets = NULL, useW = TRUE, useV = NULL, customGenesets = NULL, gene_sets = genesets, mat_w = useW, mat_v = useV, custom_gene_sets = customGenesets )
object |
A liger object with valid factorization result. |
genesets |
Character vector of the Reactome gene sets names to be tested. Default NULL. |
useW |
Logical, whether to use the shared factor loadings (W). Default TRUE. |
useV |
A character vector of the names, a numeric or logical vector of the index of the datasets where the V matrices are used. Default NULL. |
customGenesets |
A named list of character vectors of Entrez gene IDs. Default NULL. |
gene_sets , mat_w , mat_v , custom_gene_sets
|
Deprecated. See Usage section for replacement. |
A list of matrices with GSEA analysis for each factor
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) && requireNamespace("reactome.db", quietly = TRUE) && requireNamespace("fgsea", quietly = TRUE) && requireNamespace("AnnotationDbi", quietly = TRUE)) { runGSEA(pbmcPlot) }
Performs integrative non-negative matrix factorization (iNMF) (J.D. Welch, 2019) using block coordinate descent (alternating non-negative least squares, ANLS) to return factorized $H$, $W$, and $V$ matrices. The objective function is stated as

$$\arg\min_{H\ge0,\,W\ge0,\,V\ge0}\sum_{i}^{d}\left\|E_i-(W+V_i)H_i\right\|_F^2+\lambda\sum_{i}^{d}\left\|V_iH_i\right\|_F^2$$

where $E_i$ is the input non-negative matrix of the i'th dataset, $d$ is the total number of datasets. $E_i$ is of size $m \times n_i$ for $m$ variable genes and $n_i$ cells, $H_i$ is of size $k \times n_i$, $V_i$ is of size $m \times k$, and $W$ is of size $m \times k$.
The factorization produces a shared $W$ matrix (genes by k), and for each dataset, an $H$ matrix (k by cells) and a $V$ matrix (genes by k). The $H$ matrices represent the cell factor loadings. $W$ is held consistent among all datasets, as it represents the shared components of the metagenes across datasets. The $V$ matrices represent the dataset-specific components of the metagenes.
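As a rough illustration of the quantity being minimized, the base-R sketch below evaluates the iNMF objective of Welch et al. (2019), that is, the squared Frobenius reconstruction error of each dataset with (W + V_i) H_i plus lambda times the squared Frobenius norm of V_i H_i, summed over datasets. The matrices are random placeholders and the helper inmfObjective is a hypothetical name for this example; it is not a function exported by the package.

```r
# Toy evaluation of the iNMF objective
#   sum_i ||E_i - (W + V_i) H_i||_F^2 + lambda * sum_i ||V_i H_i||_F^2
# All matrices are random placeholders, not a fitted model.
set.seed(1)
m <- 6; k <- 2; nCells <- c(5, 4)   # m genes, k factors, cells per dataset
lambda <- 5

W <- matrix(runif(m * k), m, k)                               # shared, genes x k
V <- lapply(nCells, function(n) matrix(runif(m * k), m, k))   # dataset-specific, genes x k
H <- lapply(nCells, function(n) matrix(runif(k * n), k, n))   # cell loadings, k x cells
E <- lapply(nCells, function(n) matrix(runif(m * n), m, n))   # input data, genes x cells

inmfObjective <- function(E, W, V, H, lambda) {
  perDataset <- mapply(function(Ei, Vi, Hi) {
    recon   <- sum((Ei - (W + Vi) %*% Hi)^2)   # squared Frobenius reconstruction error
    penalty <- sum((Vi %*% Hi)^2)              # dataset-specific penalty term
    recon + lambda * penalty
  }, E, V, H)
  sum(perDataset)
}

obj <- inmfObjective(E, W, V, H, lambda)
```

Increasing lambda increases the cost of any non-zero dataset-specific term, which is why larger values push more of the signal into the shared W matrix.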
This function adopts a highly optimized, fast and memory-efficient implementation extended from Planc (Kannan, 2016). Pre-installation of the extension package RcppPlanc is required. The underlying algorithm adopts the same ANLS strategy as optimizeALS in the old version of LIGER.
runINMF(object, k = 20, lambda = 5, ...)

## S3 method for class 'liger'
runINMF(
  object,
  k = 20,
  lambda = 5,
  nIteration = 30,
  nRandomStarts = 1,
  HInit = NULL,
  WInit = NULL,
  VInit = NULL,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'Seurat'
runINMF(
  object,
  k = 20,
  lambda = 5,
  datasetVar = "orig.ident",
  layer = "ligerScaleData",
  assay = NULL,
  reduction = "inmf",
  nIteration = 30,
  nRandomStarts = 1,
  HInit = NULL,
  WInit = NULL,
  VInit = NULL,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
object |
A liger object or a Seurat object with non-negative scaled data of variable features (done with scaleNotCenter). |
k |
Inner dimension of factorization (number of factors). Generally, a higher k will be needed for datasets with more sub-structure. Default 20. |
lambda |
Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Default 5. |
... |
Arguments passed to methods. |
nIteration |
Total number of block coordinate descent iterations to perform. Default 30. |
nRandomStarts |
Number of restarts to perform (the iNMF objective function is non-convex, so taking the best objective from multiple successive initializations is recommended). For easier reproducibility, this increments the random seed by 1 for each consecutive restart, so future factorization of the same dataset can be run with one rep if necessary. Default 1. |
HInit |
Initial values to use for H matrices. Default NULL. |
WInit |
Initial values to use for the W matrix. Default NULL. |
VInit |
Initial values to use for V matrices. Default NULL. |
seed |
Random seed to allow reproducible results. Default 1. |
nCores |
The number of parallel tasks to speed up the computation. Default 2L. |
verbose |
Logical. Whether to show information of the progress. Default getOption("ligerVerbose", TRUE). |
datasetVar |
Metadata variable name that stores the dataset source annotation. Default "orig.ident". |
layer |
For Seurat>=4.9.9, the name of layer to retrieve input non-negative scaled data. Default "ligerScaleData". |
assay |
Name of assay to use. Default NULL. |
reduction |
Name of the reduction to store result. Also used as the feature key. Default "inmf". |
liger method - Returns updated input liger object
A list of all $H$ matrices can be accessed with getMatrix(object, "H")
A list of all $V$ matrices can be accessed with getMatrix(object, "V")
The $W$ matrix can be accessed with getMatrix(object, "W")
Seurat method - Returns updated input Seurat object
$H$ matrices for all datasets will be concatenated and transposed (all cells by k), and form a DimReduc object in the reductions slot named by argument reduction.
$W$ matrix will be presented as feature.loadings in the same DimReduc object.
$V$ matrices, an objective error value and the dataset variable used for the factorization are currently stored in the misc slot of the same DimReduc object.
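The reshaping described for the Seurat method can be pictured with plain matrices. In the sketch below, two toy per-dataset cell loading matrices (k by cells) are concatenated along the cell dimension and transposed into a single cells-by-k embedding. The matrices and cell names are invented for illustration, and no Seurat object is involved.

```r
k <- 3
# Toy per-dataset cell loading matrices, each k x cells
H <- list(
  ctrl = matrix(1:12, nrow = k, dimnames = list(NULL, paste0("ctrl_", 1:4))),
  stim = matrix(1:6,  nrow = k, dimnames = list(NULL, paste0("stim_", 1:2)))
)

# Concatenate along cells, then transpose to (all cells) x k
embedding <- t(do.call(cbind, H))
```

The resulting matrix has one row per cell across all datasets, which is the orientation expected for cell embeddings in a dimensionality reduction.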
In the old implementation, we computed the objective error at the end of each iteration and then checked whether the algorithm was reaching convergence, using an argument thresh. Since computing the objective error is expensive, we removed this feature and now directly run a default of 30 (nIteration) iterations, which empirically leads to convergence most of the time. Given that the new version is highly optimized, running this many iterations should be acceptable.
Joshua D. Welch et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019
pbmc <- normalize(pbmc) pbmc <- selectGenes(pbmc) pbmc <- scaleNotCenter(pbmc) if (requireNamespace("RcppPlanc", quietly = TRUE)) { pbmc <- runINMF(pbmc) }
LIGER provides dataset integration methods based on iNMF (integrative Non-negative Matrix Factorization [1]) and its variants (online iNMF [2] and UINMF [3]). This function wraps runINMF, runOnlineINMF and runUINMF, whose help pages have more detailed descriptions.
runIntegration(
  object,
  k = 20,
  lambda = 5,
  method = c("iNMF", "onlineINMF", "UINMF"),
  ...
)

## S3 method for class 'liger'
runIntegration(
  object,
  k = 20,
  lambda = 5,
  method = c("iNMF", "onlineINMF", "UINMF"),
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'Seurat'
runIntegration(
  object,
  k = 20,
  lambda = 5,
  method = c("iNMF", "onlineINMF"),
  datasetVar = "orig.ident",
  useLayer = "ligerScaleData",
  assay = NULL,
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
object |
A liger object or a Seurat object with non-negative scaled data of variable features (done with scaleNotCenter). |
k |
Inner dimension of factorization (number of factors). Generally, a higher k will be needed for datasets with more sub-structure. Default 20. |
lambda |
Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Default 5. |
method |
iNMF variant algorithm to use for integration. Choose from "iNMF", "onlineINMF" or "UINMF". Default "iNMF". |
... |
Arguments passed to other methods and wrapped functions. |
seed |
Random seed to allow reproducible results. Default 1. |
verbose |
Logical. Whether to show information of the progress. Default getOption("ligerVerbose", TRUE). |
datasetVar |
Metadata variable name that stores the dataset source annotation. Default "orig.ident". |
useLayer |
For Seurat>=4.9.9, the name of layer to retrieve input non-negative scaled data. Default "ligerScaleData". |
assay |
Name of assay to use. Default NULL. |
Updated input object. For details, please refer to the methods linked in the Description.
[1] Joshua D. Welch et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019
[2] Chao Gao et al., Iterative single-cell multi-omic integration using online learning, Nat Biotechnol., 2021
[3] April R. Kriebel and Joshua D. Welch, UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization, Nat. Comm., 2022
pbmc <- normalize(pbmc) pbmc <- selectGenes(pbmc) pbmc <- scaleNotCenter(pbmc) if (requireNamespace("RcppPlanc", quietly = TRUE)) { pbmc <- runIntegration(pbmc) }
Perform online integrative non-negative matrix factorization to represent multiple single-cell datasets in terms of $H$, $W$, and $V$ matrices. It optimizes the iNMF objective function (see runINMF) using online learning (non-negative least squares for $H$ matrices, and hierarchical alternating least squares (HALS) for $V$ matrices and $W$), where the number of factors is set by k. The function allows online learning in 3 scenarios:
1. Fully observed datasets;
2. Iterative refinement using continually arriving datasets;
3. Projection of new datasets without updating the existing factorization.
All three scenarios require fixed memory independent of the number of cells.
For each dataset, this factorization produces an $H$ matrix (k by cell), a $V$ matrix (genes by k), and a shared $W$ matrix (genes by k). The $H$ matrices represent the cell factor loadings. $W$ is identical among all datasets, as it represents the shared components of the metagenes across datasets. The $V$ matrices represent the dataset-specific components of the metagenes.
runOnlineINMF(object, k = 20, lambda = 5, ...)

## S3 method for class 'liger'
runOnlineINMF(
  object,
  k = 20,
  lambda = 5,
  newDatasets = NULL,
  projection = FALSE,
  maxEpochs = 5,
  HALSiter = 1,
  minibatchSize = 5000,
  WInit = NULL,
  VInit = NULL,
  AInit = NULL,
  BInit = NULL,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'Seurat'
runOnlineINMF(
  object,
  k = 20,
  lambda = 5,
  datasetVar = "orig.ident",
  layer = "ligerScaleData",
  assay = NULL,
  reduction = "onlineINMF",
  maxEpochs = 5,
  HALSiter = 1,
  minibatchSize = 5000,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
object |
liger object. Scaled data required. |
k |
Inner dimension of factorization, i.e. the number of metagenes. A value in the range 20-50 works well for most analyses. Default 20. |
lambda |
Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). We recommend always using the default value except possibly for analyses with relatively small differences (biological replicates, male/female comparisons, etc.) in which case a lower value such as 1.0 may improve reconstruction quality. Default 5. |
... |
Arguments passed to other S3 methods of this function. |
newDatasets |
Named list of dgCMatrix-class objects. New datasets for scenario 2 or scenario 3. Default NULL. |
projection |
Whether to perform data integration with scenario 3 when newDatasets is specified. Default FALSE. |
maxEpochs |
The number of epochs to iterate through. See Details. Default 5. |
HALSiter |
Maximum number of block coordinate descent (HALS algorithm) iterations to perform for each update of W and V. Default 1. |
minibatchSize |
Total number of cells in each minibatch. See Details. Default 5000. |
WInit , VInit , AInit , BInit
|
Optional initialization for the W, V, A, and B matrices, respectively. See Details. Default NULL. |
seed |
Random seed to allow reproducible results. Default 1. |
nCores |
The number of parallel tasks to speed up the computation. Default 2L. |
verbose |
Logical. Whether to show information of the progress. Default getOption("ligerVerbose", TRUE). |
datasetVar |
Metadata variable name that stores the dataset source annotation. Default "orig.ident". |
layer |
For Seurat>=4.9.9, the name of layer to retrieve input non-negative scaled data. Default "ligerScaleData". |
assay |
Name of assay to use. Default NULL. |
reduction |
Name of the reduction to store result. Also used as the feature key. Default "onlineINMF". |
To perform scenario 2 or 3, a complete factorization result from a run of scenario 1 is required. Given the structure of a liger object, all of the required information can be retrieved automatically. For cases where users need to supply customized information for an existing factorization, arguments WInit, VInit, AInit and BInit are exposed. The requirements for these arguments are as follows:
WInit - A matrix object of size $m \times k$ (see runINMF for notation).
VInit - A list object of matrices each of size $m \times k$. Number of matrices should match with newDatasets.
AInit - A list object of matrices each of size $k \times k$. Number of matrices should match with newDatasets.
BInit - A list object of matrices each of size $m \times k$. Number of matrices should match with newDatasets.
Minibatch iterations are performed on small subsets of cells. The exact minibatch size applied to each dataset is minibatchSize multiplied by the proportion of cells in this dataset out of all cells. In general, minibatchSize should be no larger than the number of cells in the smallest dataset (considering both object and newDatasets). Therefore, a smaller value may be necessary for analyzing very small datasets.
An epoch is one complete pass over all cells, made up of a number of minibatch iterations. Therefore, the total number of iterations is determined by the setting of maxEpochs, the total number of cells, and minibatchSize.
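The relationship between minibatchSize, dataset sizes and maxEpochs can be worked through with simple arithmetic. The dataset sizes below are hypothetical, and the sketch ignores any rounding that the actual implementation may apply, so the numbers should be read as approximate.

```r
nCellsPerDataset <- c(ctrl = 6000, stim = 14000)   # hypothetical dataset sizes
minibatchSize <- 5000
maxEpochs <- 5

# Per-dataset minibatch size: minibatchSize times the dataset's share of cells
propCells <- nCellsPerDataset / sum(nCellsPerDataset)
perDatasetBatch <- minibatchSize * propCells

# One epoch covers all cells once, so it takes about this many iterations
iterPerEpoch <- sum(nCellsPerDataset) / minibatchSize
totalIter <- maxEpochs * iterPerEpoch
```

Here each minibatch draws about 1500 cells from "ctrl" and 3500 from "stim", one epoch takes about 4 iterations, and the 5 epochs add up to about 20 iterations.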
Currently, Seurat S3 method does not support working on Scenario 2 and 3, because there is no simple solution for organizing a number of miscellaneous matrices with a single Seurat object. We strongly recommend that users create a liger object which has the specific structure.
liger method - Returns updated input liger object.
A list of all $H$ matrices can be accessed with getMatrix(object, "H")
A list of all $V$ matrices can be accessed with getMatrix(object, "V")
The $W$ matrix can be accessed with getMatrix(object, "W")
Meanwhile, intermediate matrices $A$ and $B$ produced in HALS update can also be accessed similarly.
Seurat method - Returns updated input Seurat object.
$H$ matrices for all datasets will be concatenated and transposed (all cells by k), and form a DimReduc object in the reductions slot named by argument reduction.
$W$ matrix will be presented as feature.loadings in the same DimReduc object.
$V$ matrices, $A$ matrices, $B$ matrices, an objective error value and the dataset variable used for the factorization are currently stored in the misc slot of the same DimReduc object.
Chao Gao et al., Iterative single-cell multi-omic integration using online learning, Nat Biotechnol., 2021
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    # Scenario 1
    pbmc <- runOnlineINMF(pbmc, minibatchSize = 200)
    # Scenario 2
    # Fake new dataset by increasing all non-zero value in "ctrl" by 1
    ctrl2 <- rawData(dataset(pbmc, "ctrl"))
    ctrl2@x <- ctrl2@x + 1
    colnames(ctrl2) <- paste0(colnames(ctrl2), 2)
    pbmc2 <- runOnlineINMF(pbmc, k = 20, newDatasets = list(ctrl2 = ctrl2),
                           minibatchSize = 100)
    # Scenario 3
    pbmc3 <- runOnlineINMF(pbmc, k = 20, newDatasets = list(ctrl2 = ctrl2),
                           projection = TRUE)
}
Two methods are supported: "pseudoBulk" and "wilcoxon". The pseudo-bulk method aggregates cells based on biological replicates and calls a bulk RNA-seq DE method, the DESeq2 Wald test, while the Wilcoxon rank-sum test is performed at the single-cell level. runPairwiseDEG() is generally used for flexibly comparing two specific groups of cells, while runMarkerDEG() is used for a one-vs-rest marker test strategy.
When using the pseudo-bulk method, it is generally recommended that you have these variables available in your object:
The cell type or cluster labeling. This can be obtained from a prior study or computed with runCluster.
The biological replicate labeling, most of the time the "dataset" variable automatically generated when the liger object is created. Users may use other variables if a "dataset" is merged from multiple replicates.
The condition labeling that reflects the study design, such as the treatment or disease status for each sample/dataset.
Please see below for detailed scenarios.
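The aggregation step behind the pseudo-bulk method can be sketched in base R: counts of all cells that belong to the same replicate are summed into one pseudo-bulk column per replicate. The toy matrix and the bioRep labels are made up for this example, and the sketch stops before any DESeq2 call, so it only shows the aggregation idea rather than the package's full procedure.

```r
set.seed(1)
# Toy counts: 4 genes x 12 cells
counts <- matrix(rpois(4 * 12, lambda = 3), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:12)))

# Hypothetical biological replicate label for each cell
bioRep <- rep(c("rep1", "rep2", "rep3"), each = 4)

# rowsum() sums rows by group, so transpose to cells x genes first,
# then transpose back to obtain a genes x replicates matrix
pseudoBulk <- t(rowsum(t(counts), group = bioRep))
```

Each column of pseudoBulk then plays the role of one bulk sample in a bulk RNA-seq DE test.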
runPairwiseDEG( object, groupTest, groupCtrl, variable1 = NULL, variable2 = NULL, splitBy = NULL, method = c("pseudoBulk", "wilcoxon"), usePeak = FALSE, useReplicate = "dataset", nPsdRep = NULL, minCellPerRep = 3, printDiagnostic = FALSE, chunk = NULL, seed = 1, verbose = getOption("ligerVerbose", TRUE) ) runMarkerDEG( object, conditionBy = NULL, splitBy = NULL, method = c("pseudoBulk", "wilcoxon"), useDatasets = NULL, usePeak = FALSE, useReplicate = "dataset", nPsdRep = NULL, minCellPerRep = 3, printDiagnostic = FALSE, chunk = NULL, seed = 1, verbose = getOption("ligerVerbose", TRUE) ) runWilcoxon( object, data.use = NULL, compare.method = c("clusters", "datasets") )
object |
A liger object, with normalized data available |
groupTest , groupCtrl , variable1 , variable2
|
Condition specification. See
|
splitBy |
Name(s) of the variable(s) in |
method |
DEG test method to use. Choose from "pseudoBulk" or "wilcoxon". Default "pseudoBulk". |
usePeak |
Logical. Whether to use peak count instead of gene count. Only supported when ATAC datasets are involved. Default FALSE. |
useReplicate |
|
nPsdRep |
Number of pseudo-replicates to create. Only used when
|
minCellPerRep |
Numeric, will not make pseudo-bulk for replicates with fewer than this number of cells. Default 3. |
printDiagnostic |
Logical. Whether to show more detail when
|
chunk |
Number of features to process at a time during Wilcoxon test. Useful when memory is limited. Default NULL. |
seed |
Random seed to use for pseudo-replicate generation. Default 1. |
verbose |
Logical. Whether to show information of the progress. Default getOption("ligerVerbose", TRUE). |
conditionBy |
|
useDatasets |
Datasets to perform marker detection within. Default NULL. |
data.use |
Same as |
compare.method |
Choose from "clusters" or "datasets". |
A data.frame with DEG information with all or some of the following fields:
feature |
Gene names |
group |
Test group name. Multiple tests might be present for each function call. This is the main variable to distinguish the tests. For a pairwise test, a row with a certain group name represents the test result of this group against the control group. When split by a variable, it is presented in "split.group" format, meaning the stats come from comparing the group in the split level against the control group in the same split level. When running marker detection without splitting, a row with group "a" represents the stats of the gene in group "a" against all other cells. When running split marker detection, the group name is in "split.group" format, meaning the stats come from comparing the group in the split level against all other cells in the same split level. |
logFC |
Log fold change |
pval |
P-value |
padj |
Adjusted p-value |
avgExpr |
Mean expression in the test group indicated by the "group" field. Only available for wilcoxon tests. |
statistic |
Wilcoxon rank-sum test statistic. Only available for wilcoxon tests. |
auc |
Area under the ROC curve. Only available for wilcoxon tests. |
pct_in |
Percentage of cells in the test group, indicated by the "group" field, that express the feature. Only available for wilcoxon tests. |
pct_out |
Percentage of cells in the control group or other cells, as explained for the "group" field, that express the feature. Only available for wilcoxon tests. |
The Wilcoxon rank-sum test works on each gene and is based on the rank of the expression in each cell. LIGER provides dataset integration but does not "correct" the expression values. Projects with strong batch effects, or those integrating drastically different modalities, should be cautious when using this method.
Most of the time, people want to know which cell type each cluster represents after clustering. This can be done with a marker detection method that tests each cluster against all the other cells, with a command like runMarkerDEG(object, conditionBy = "cluster_var"). When using the default pseudo-bulk method, users should additionally determine the pseudo-bulk setup parameters. If the real biological replicate variable is available, it should be supplied to argument useReplicate; otherwise, pseudo-replicates should be created. See "Pseudo-Replicate" section for more.
It is frequently necessary to identify the differences between conditions. Users can simply set conditionBy = "condition_var". However, most of the time, such comparisons should ideally be done in a per-cluster manner. This can be done by setting splitBy = "cluster_var". This will run a loop over the clusters and, within each cluster, compare each condition against all other cells in the cluster.
In the scenario where users only need to compare two conditions for each cluster, running runPairwiseDEG(object, groupTest = "condition1", groupCtrl = "condition2", variable1 = "condition_var", splitBy = "cluster_var") would address the need.
For both use cases, if the pseudo-bulk (default) method is used, users should determine the pseudo-bulk setup parameters as mentioned in the previous section.
runMarkerDEG usage
Marker detection is performed in a one-vs-rest manner. The grouping for this
comparison is specified by conditionBy, which should be a column name in
cellMeta. When splitBy is specified as another variable name in cellMeta,
marker detection is run iteratively within each level of the splitBy
variable.
For example, when conditionBy = "celltype" and splitBy = NULL, marker
detection is performed by comparing all cells of "celltype_i" against all
other cells, and so on for each cell type. This is analogous to running
runWilcoxon(method = "cluster") in the old version.
When conditionBy = "gender" and splitBy = "leiden_cluster", marker detection
is performed by comparing "gender_i" cells from "cluster_j" against other
cells from "cluster_j", and so on. This is analogous to running
runWilcoxon(method = "dataset") in the old version.
runPairwiseDEG usage
Users can select classes of cells from a variable in cellMeta. variable1 and
variable2 are used to specify a column in cellMeta, and groupTest and
groupCtrl are used to specify existing classes from variable1 and variable2,
respectively. When variable2 is missing, groupCtrl is taken from variable1.
For example, when variable1 = "celltype" and variable2 = NULL, groupTest and
groupCtrl should be valid cell types in object$celltype.
When variable1 is "celltype" and variable2 is "gender", groupTest should be a
valid cell type from object$celltype and groupCtrl should be a valid class
from object$gender.
When both variable1 and variable2 are missing, groupTest and groupCtrl should
be valid cell indices in object.
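The three ways of specifying comparison groups can be pictured with a small base R sketch. The data frame cellMeta below is a hypothetical stand-in for the object's cell metadata, and resolve_group is an illustrative helper, not a function from the package.

```r
# Hypothetical cell metadata standing in for cellMeta
cellMeta <- data.frame(
  celltype = c("T", "B", "T", "NK", "B"),
  gender   = c("F", "M", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# Resolve one comparison group to cell indices.
# variable = NULL means `group` already holds cell indices.
resolve_group <- function(meta, group, variable = NULL) {
  if (is.null(variable)) return(group)
  which(meta[[variable]] %in% group)
}

# variable1 = "celltype", variable2 missing: both groups come from celltype
testIdx <- resolve_group(cellMeta, "T", "celltype")
ctrlIdx <- resolve_group(cellMeta, "B", "celltype")

# variable1 = "celltype", variable2 = "gender"
testIdx2 <- resolve_group(cellMeta, "NK", "celltype")
ctrlIdx2 <- resolve_group(cellMeta, "M", "gender")

# Both variables missing: the groups are given directly as cell indices
testIdx3 <- resolve_group(cellMeta, 1:2)
```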
Pseudo-replicate assignment is a technique to compensate for the lack of real
biological replicates when using pseudo-bulk DE methods. LIGER's pseudo-bulk
method generally requires that each comparison group has at least 3
replicates, each composed of at least 3 cells, in order to ensure
statistical power. When fewer than 3 real replicates are found for a
comparison, the default setting (nPsdRep = NULL) splits each of them into 3
pseudo-replicates; otherwise no pseudo-replicates are automatically
generated. When nPsdRep is given a number, LIGER will always go through each
comparison group and split each real replicate into the given number of
pseudo-replicates.
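The idea of splitting real replicates into pseudo-replicates can be sketched in base R as follows. This is only an assumption about the general technique (random, near-equal splits within each real replicate); the helper assign_pseudo_reps and the ".psd" label suffix are hypothetical, and LIGER's own assignment may differ in detail.

```r
# Randomly split the cells of each real replicate into nPsdRep groups
assign_pseudo_reps <- function(replicate, nPsdRep = 3, seed = 1) {
  set.seed(seed)
  replicate <- as.character(replicate)
  out <- character(length(replicate))
  for (r in unique(replicate)) {
    idx <- which(replicate == r)
    # Near-equal group labels, shuffled so that assignment is random
    labels <- rep_len(seq_len(nPsdRep), length(idx))
    labels <- labels[sample.int(length(labels))]
    out[idx] <- paste0(r, ".psd", labels)
  }
  out
}

# Two real replicates with 12 and 9 cells
realRep <- rep(c("donor1", "donor2"), times = c(12, 9))
psdRep <- assign_pseudo_reps(realRep, nPsdRep = 3)
table(psdRep)
```

With 12 and 9 cells, every pseudo-replicate ends up with at least 3 cells, which matches the minimum stated above.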
pbmc$leiden_cluster <- pbmcPlot$leiden_cluster
# Identify cluster markers
degStats1 <- runMarkerDEG(pbmc, conditionBy = "leiden_cluster")
# Compare "stim" data against "ctrl" data within each cluster
degStats3 <- runPairwiseDEG(pbmc, groupTest = "stim", groupCtrl = "ctrl",
                            variable1 = "dataset",
                            splitBy = "leiden_cluster",
                            minCellPerRep = 4)
Runs t-SNE on the aligned cell factors (result from alignFactors), or
unaligned cell factors (result from runIntegration), to generate a 2D
embedding for visualization. By default, the Rtsne (Barnes-Hut
implementation of t-SNE) method is invoked, while the alternative "fftRtsne"
method (FFT-accelerated Interpolation-based t-SNE, using the Kluger Lab
implementation) is also supported. For very large datasets, it is
recommended to use method = "fftRtsne" due to its efficiency and scalability.
Extra external installation steps are required for using the "fftRtsne" method. Please consult the detailed guide.
runTSNE( object, useRaw = NULL, useDims = NULL, nDims = 2, usePCA = FALSE, perplexity = 30, theta = 0.5, method = c("Rtsne", "fftRtsne"), dimredName = "TSNE", asDefault = NULL, fitsnePath = NULL, seed = 42, verbose = getOption("ligerVerbose", TRUE), k = nDims, use.raw = useRaw, dims.use = useDims, use.pca = usePCA, fitsne.path = fitsnePath, rand.seed = seed )
object |
liger object with factorization results. |
useRaw |
Whether to use un-aligned cell factor loadings ( |
useDims |
Index of factors to use for computing the embedding. Default
|
nDims |
Number of dimensions to reduce to. Default |
usePCA |
Whether to perform initial PCA step for Rtsne. Default
|
perplexity |
Numeric parameter to pass to Rtsne (expected number of
neighbors). Default |
theta |
Speed/accuracy trade-off (increase for less accuracy), set to
|
method |
Choose from |
dimredName |
Name of the variable in |
asDefault |
Logical, whether to set the resulting dimRed as default for
visualization. Default |
fitsnePath |
Path to the cloned FIt-SNE directory (i.e.
|
seed |
Random seed for reproducibility. Default |
verbose |
Logical. Whether to show information of the progress. Default
|
use.raw , dims.use , k , use.pca , fitsne.path , rand.seed
|
Deprecated. See Usage section for replacement. |
The object where a "TSNE" variable is updated in the cellMeta slot with the
whole 2D embedding matrix.
pbmc <- runTSNE(pbmcPlot)
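Performs mosaic integrative non-negative matrix factorization (UINMF) (A.R.
Kriebel, 2022) using block coordinate descent (alternating non-negative
least squares, ANLS) to return factorized $H$, $W$, $V$ and $U$ matrices. The
objective function is stated as

$$\arg\min_{H\ge0,W\ge0,V\ge0,U\ge0}\sum_{i}^{d}\left\|\begin{bmatrix}E_i\\P_i\end{bmatrix}-\left(\begin{bmatrix}W\\0\end{bmatrix}+\begin{bmatrix}V_i\\U_i\end{bmatrix}\right)H_i\right\|_F^2+\lambda\sum_{i}^{d}\left\|\begin{bmatrix}V_i\\U_i\end{bmatrix}H_i\right\|_F^2$$

where $E_i$ is the input non-negative matrix of the $i$'th dataset, $P_i$ is
the input non-negative matrix for the unshared features, and $d$ is the total
number of datasets. $E_i$ is of size $m \times n_i$ for $m$ shared features
and $n_i$ cells, $P_i$ is of size $u_i \times n_i$ for $u_i$ unshared
features, $H_i$ is of size $k \times n_i$, $V_i$ is of size $m \times k$, $W$
is of size $m \times k$ and $U_i$ is of size $u_i \times k$.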
The factorization produces a shared $W$ matrix (genes by k) and, for each
dataset, an $H$ matrix (k by cells), a $V$ matrix (genes by k) and a $U$
matrix (unshared genes by k). The $H$ matrices represent the cell factor
loadings. $W$ is held consistent among all datasets, as it represents the
shared components of the metagenes across datasets. The $V$ matrices
represent the dataset-specific components of the metagenes, while the $U$
matrices are similar to the $V$ matrices but represent the loadings
contributed by unshared features.
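To make the roles of the matrices concrete, the base R sketch below evaluates a UINMF-style objective for given factors. It assumes the objective is the squared Frobenius reconstruction error of the stacked shared and unshared features, with the shared factor W padded by zero rows for the unshared part, plus lambda times the squared Frobenius norm of the dataset-specific reconstruction. The helper uinmf_objective and the toy matrices are hypothetical; this is not the RcppPlanc solver, it only scores a candidate solution.

```r
# E, P, V, U, H are lists with one matrix per dataset; W is a single matrix
uinmf_objective <- function(E, P, W, V, U, H, lambda) {
  total <- 0
  for (i in seq_along(E)) {
    X <- rbind(E[[i]], P[[i]])                  # shared features stacked on unshared
    shared <- rbind(W, matrix(0, nrow(U[[i]]), ncol(W)))  # W padded with zero rows
    specific <- rbind(V[[i]], U[[i]])           # dataset-specific metagenes
    recon <- (shared + specific) %*% H[[i]]
    total <- total +
      sum((X - recon)^2) +                      # reconstruction error
      lambda * sum((specific %*% H[[i]])^2)     # penalty on dataset-specific part
  }
  total
}

# One toy dataset: 2 shared features, 1 unshared feature, k = 1, 2 cells
E <- list(matrix(2, nrow = 2, ncol = 2))
P <- list(matrix(1, nrow = 1, ncol = 2))
W <- matrix(1, nrow = 2, ncol = 1)
V <- list(matrix(1, nrow = 2, ncol = 1))
U <- list(matrix(1, nrow = 1, ncol = 1))
H <- list(matrix(c(1, 2), nrow = 1))

obj <- uinmf_objective(E, P, W, V, U, H, lambda = 5)
```

In this toy case the first cell is reconstructed exactly and the second is not, so the value is the residual of the second cell (9) plus the penalty term (5 times 15).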
This function adopts a highly optimized, fast and memory-efficient
implementation extended from Planc (Kannan, 2016). The extension package
RcppPlanc must be installed beforehand. The underlying algorithm adopts the
same ANLS strategy as optimizeALS(unshared = TRUE) in the old version of
LIGER.
runUINMF(object, k = 20, lambda = 5, ...)

## S3 method for class 'liger'
runUINMF(
  object,
  k = 20,
  lambda = 5,
  nIteration = 30,
  nRandomStarts = 1,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
object |
liger object. Should run
|
k |
Inner dimension of factorization (number of factors). Generally, a
higher |
lambda |
Regularization parameter. Larger values penalize
dataset-specific effects more strongly (i.e. alignment should increase as
|
... |
Arguments passed to other methods and wrapped functions. |
nIteration |
Total number of block coordinate descent iterations to
perform. Default |
nRandomStarts |
Number of restarts to perform (the iNMF objective function
is non-convex, so taking the best objective from multiple successive
initializations is recommended). For easier reproducibility, this increments
the random seed by 1 for each consecutive restart, so future factorization
of the same dataset can be run with one rep if necessary. Default |
seed |
Random seed to allow reproducible results. Default |
nCores |
The number of parallel tasks to speed up the computation.
Default |
verbose |
Logical. Whether to show information of the progress. Default
|
liger method - Returns updated input liger object.
A list of all $H$ matrices can be accessed with getMatrix(object, "H")
A list of all $V$ matrices can be accessed with getMatrix(object, "V")
The $W$ matrix can be accessed with getMatrix(object, "W")
A list of all $U$ matrices can be accessed with getMatrix(object, "U")
Currently, Seurat S3 method is not supported for UINMF because there is no simple solution for organizing a number of miscellaneous matrices with a single Seurat object. We strongly recommend that users create a liger object which has the specific structure.
April R. Kriebel and Joshua D. Welch, UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization, Nat. Comm., 2022
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc, useUnsharedDatasets = c("ctrl", "stim"))
pbmc <- scaleNotCenter(pbmc)
if (!is.null(getMatrix(pbmc, "scaleUnsharedData", "ctrl")) &&
    !is.null(getMatrix(pbmc, "scaleUnsharedData", "stim"))) {
    # TODO: unshared variable features cannot be detected from this example
    pbmc <- runUINMF(pbmc)
}
Run UMAP on the aligned cell factors (result from alignFactors), or unaligned
cell factors (raw result from runIntegration), to generate a 2D embedding for
visualization (or general dimensionality reduction). Has an option to run on
a subset of factors. It is generally recommended to use this method for
dimensionality reduction with extremely large datasets. The underlying UMAP
calculation relies on umap from the uwot package.
runUMAP( object, useRaw = NULL, useDims = NULL, nDims = 2, distance = c("cosine", "euclidean", "manhattan", "hamming"), nNeighbors = 20, minDist = 0.1, dimredName = "UMAP", asDefault = NULL, seed = 42, verbose = getOption("ligerVerbose", TRUE), k = nDims, use.raw = useRaw, dims.use = useDims, n_neighbors = nNeighbors, min_dist = minDist, rand.seed = seed )
object |
liger object with factorization results. |
useRaw |
Whether to use un-aligned cell factor loadings ( |
useDims |
Index of factors to use for computing the embedding. Default
|
nDims |
Number of dimensions to reduce to. Default |
distance |
Character. Metric used to measure distance in the input
space. Default |
nNeighbors |
Number of neighboring points used in local approximations
of manifold structure. Default |
minDist |
Numeric. Controls how tightly the embedding is allowed to
compress points together. Default |
dimredName |
Name of the variable in |
asDefault |
Logical, whether to set the resulting dimRed as default for
visualization. Default |
seed |
Random seed for reproducibility. Default |
verbose |
Logical. Whether to show information of the progress. Default
|
k , use.raw , dims.use , n_neighbors , min_dist , rand.seed
|
Deprecated. See Usage section for replacement. |
For nNeighbors, larger values will result in more global structure being
preserved at the loss of detailed local structure. In general this parameter
should be in the range 5 to 50, with a choice of 10 to 15 being a sensible
default.
For minDist, larger values ensure embedded points are more evenly
distributed, while smaller values allow the algorithm to optimize more
accurately with regard to local structure. Sensible values are in the range
0.001 to 0.5, with 0.1 being a reasonable default.
The object where a "UMAP" variable is updated in the cellMeta slot with the
whole 2D embedding matrix.
pbmc <- runUMAP(pbmcPlot)
This function scales normalized gene expression data after variable genes have been selected. We do not mean-center the data before scaling in order to address the non-negativity constraint of NMF. The computation applied to each normalized dataset matrix can be written as the following equation:

$$S_{i,j}=\frac{N_{i,j}}{\sqrt{\sum_{p}^{n}\frac{N_{i,p}^2}{n-1}}}$$

where $N$ denotes the normalized matrix for an individual dataset, $S$ is the
output scaled matrix for this dataset, and $n$ is the number of cells in this
dataset. $i, j$ denote the specific gene and cell index, and $p$ is the cell
iterator.
Please see detailed section below for explanation on methylation dataset.
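A minimal base R sketch of scaling without centering is given below. It assumes a dense genes-by-cells matrix and assumes that each gene is divided by the square root of its summed squared values over n - 1, where n is the number of cells. The helper scale_not_center is hypothetical and ignores the sparse and HDF5 handling that the package methods provide.

```r
# N: normalized expression, genes in rows and cells in columns
scale_not_center <- function(N) {
  n <- ncol(N)
  # Uncentered standard deviation of each gene
  rms <- sqrt(rowSums(N^2) / (n - 1))
  # A length-nrow vector recycles down columns, so row i is divided by rms[i]
  S <- N / rms
  # Genes with all-zero expression would give 0/0; keep them at zero
  S[rms == 0, ] <- 0
  S
}

N <- matrix(c(3, 4,
              0, 0), nrow = 2, byrow = TRUE)
S <- scale_not_center(N)
```

Because nothing is subtracted, every scaled value stays non-negative, which is what the factorization requires.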
scaleNotCenter(object, ...)

## S3 method for class 'dgCMatrix'
scaleNotCenter(object, ...)

## S3 method for class 'ligerDataset'
scaleNotCenter(
  object,
  features = NULL,
  chunk = 1000,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'ligerMethDataset'
scaleNotCenter(
  object,
  features = NULL,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'liger'
scaleNotCenter(
  object,
  useDatasets = NULL,
  features = varFeatures(object),
  verbose = getOption("ligerVerbose", TRUE),
  remove.missing = NULL,
  ...
)

## S3 method for class 'Seurat'
scaleNotCenter(
  object,
  assay = NULL,
  layer = "ligerNormData",
  save = "ligerScaleData",
  datasetVar = "orig.ident",
  features = NULL,
  ...
)
object |
liger object, ligerDataset object, dgCMatrix-class object, or a Seurat object. |
... |
Arguments passed to other methods. The order goes by: "liger" method calls "ligerDataset" method, which then calls "dgCMatrix" method. "Seurat" method directly calls "dgCMatrix" method. |
features |
Character, numeric or logical index that choose the variable
feature to be scaled. "liger" method by default uses
|
chunk |
Integer. Maximum number of cells in each chunk, when
scaling is applied to any HDF5 based dataset. Default |
verbose |
Logical. Whether to show information of the progress. Default
|
useDatasets |
A character vector of the names, a numeric or logical
vector of the index of the datasets to be scaled but not centered. Default
|
remove.missing |
Deprecated. The functionality of this is covered through other parts of the whole workflow and is no longer needed. Will be ignored if specified. |
assay |
Name of assay to use. Default |
layer |
For Seurat>=4.9.9, the name of layer to retrieve normalized
data. Default |
save |
For Seurat>=4.9.9, the name of layer to store normalized data.
Default |
datasetVar |
Metadata variable name that stores the dataset source
annotation. Default |
Updated object
dgCMatrix method - Returns scaled dgCMatrix object
ligerDataset method - Updates the scaleData and scaleUnsharedData (if
unshared variable features are available) slots of the object
liger method - Updates the scaleData and scaleUnsharedData (if unshared
variable features are available) slots of the chosen datasets
Seurat method - Adds a named layer in the chosen assay (V5), or updates the
scale.data slot of the chosen assay (<=V4)
Because gene body mCH proportions are negatively correlated with gene
expression level in neurons, we need to reverse the direction of the
methylation data before performing the integration. We do this by simply
subtracting all values from the maximum methylation value. The resulting
values are positively correlated with gene expression. This is only applied
to variable genes detected beforehand. Please make sure that the argument
modal is set accordingly when running createLiger, so that this function can
automatically detect it and take the proper action. If it is not set, users
can still manually obtain equivalent processing by running
scaleNotCenter(lig, useDatasets = c("other", "datasets")), and then
reverseMethData(lig, useDataset = c("meth", "datasets")).
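The reversal step can be sketched in base R as below, assuming that "the maximum methylation value" means the maximum over the whole matrix being processed. The helper reverse_meth and the toy values are hypothetical.

```r
# Flip the direction of methylation values while keeping them non-negative
reverse_meth <- function(X) max(X) - X

# Toy gene-by-cell mCH proportions (hypothetical values)
meth <- matrix(c(0.1, 0.8,
                 0.4, 0.0), nrow = 2, byrow = TRUE)
reversed <- reverse_meth(meth)
```

The largest original value maps to 0 and the smallest maps to the original maximum, so the ordering is inverted without introducing negative entries.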
Since genes are scaled on a per-dataset basis, other scaling methods that
apply to a whole concatenated matrix of multiple datasets might not be
considered equivalent alternatives, even if options like center are set to
FALSE. Hence we implemented an efficient solution that works under such
circumstances, provided through the Seurat S3 method.
pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc)
pbmc <- scaleNotCenter(pbmc)
Method to select HVGs based on the mean dispersions of genes that are highly variable in all batches. It uses the top target genes per batch, ranked by average normalized dispersion. If the target number of genes has not been reached, HVGs found in all but one batch are used to fill up. This continues until HVGs found in only a single batch are considered.
This is an rliger implementation of the method originally published in SCIB. We found that it can potentially improve integration under some circumstances, and we are currently testing it.
This function currently only works for shared features across all datasets.
For selection from only part of the datasets and selection for
dataset-specific unshared features, please use selectGenes().
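The tiered fill-up described above can be sketched in base R as a single ordering: genes that are highly variable in more batches come first, and genes within the same tier are ranked by mean normalized dispersion. The helper select_batch_hvg and its inputs are hypothetical, and averaging the dispersion over all batches is an assumption made for this sketch rather than a statement about the package.

```r
# isHVG: genes x batches logical matrix; normDisp: matching normalized dispersions
select_batch_hvg <- function(isHVG, normDisp, nGenes) {
  nBatch   <- rowSums(isHVG)      # in how many batches each gene is an HVG
  meanDisp <- rowMeans(normDisp)  # average normalized dispersion
  # Genes variable in more batches come first; ties broken by dispersion
  ord  <- order(-nBatch, -meanDisp)
  keep <- ord[nBatch[ord] > 0]    # never pick genes that are HVG in no batch
  rownames(isHVG)[head(keep, nGenes)]
}

isHVG <- matrix(c(TRUE, TRUE,  TRUE, FALSE,   # batch b1, genes g1..g4
                  TRUE, FALSE, TRUE, FALSE),  # batch b2, genes g1..g4
                ncol = 2, dimnames = list(paste0("g", 1:4), c("b1", "b2")))
normDisp <- matrix(c(1, 5, 2, 9,
                     1, 5, 2, 9),
                   ncol = 2, dimnames = dimnames(isHVG))

selected <- select_batch_hvg(isHVG, normDisp, nGenes = 3)
```

Here g3 and g1 are variable in both batches and are taken first, g2 fills the remaining slot from the single-batch tier, and g4 is never chosen despite its high dispersion.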
selectBatchHVG(object, ...)

## S3 method for class 'liger'
selectBatchHVG(
  object,
  nGenes = 2000,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'ligerDataset'
selectBatchHVG(
  object,
  nGenes = 2000,
  features = NULL,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

## S3 method for class 'dgCMatrix'
selectBatchHVG(
  object,
  nGenes = 2000,
  returnStats = FALSE,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
object |
A |
... |
Arguments passed to S3 methods. |
nGenes |
Integer number of target genes to select. Default |
verbose |
Logical. Whether to show a progress bar. Default
|
features |
For ligerDataset method, the feature subset to limit the
selection to, because downstream liger steps require non-zero features to be
considered. Default |
returnStats |
Logical, for dgCMatrix-method, whether to return a data
frame of statistics for all features, or by default |
liger-method: Returns the input liger object with the selected genes updated
in the varFeatures slot, which can be accessed with varFeatures(object).
Additionally, the statistics are updated in the featureMeta slot of each
ligerDataset object within the datasets slot of the object.
ligerDataset-method: Returns the input ligerDataset object with the
statistics updated in the featureMeta slot.
dgCMatrix-method: By default returns a character vector of selected variable
features. If returnStats = TRUE, returns a data.frame of the statistics.
Luecken, M.D., Büttner, M., Chaichoompu, K. et al. (2022), Benchmarking atlas-level data integration in single-cell genomics. Nat Methods, 19, 41–50. https://doi.org/10.1038/s41592-021-01336-8.
pbmc <- selectBatchHVG(pbmc, nGenes = 10)
varFeatures(pbmc)
This function identifies highly variable genes from each dataset and combines these gene sets (either by union or intersection) for use in downstream analysis. Assuming that gene expression approximately follows a Poisson distribution, this function identifies genes with gene expression variance above a given variance threshold (relative to mean gene expression). Alternatively, a desired number of genes can be selected for each dataset by ranking the relative variance, after which the per-dataset selections are combined.
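The difference between the two ways of combining per-dataset selections can be shown with plain base R set operations. The gene names and the perDataset list are made up for illustration.

```r
# Hypothetical per-dataset variable gene selections
perDataset <- list(
  ctrl = c("GeneA", "GeneB", "GeneC"),
  stim = c("GeneB", "GeneC", "GeneD")
)

# "union": a gene is kept if it is variable in any dataset
unionSet <- Reduce(union, perDataset)
# "intersection": a gene is kept only if it is variable in every dataset
interSet <- Reduce(intersect, perDataset)
```

The union keeps more genes and tolerates dataset-specific variability, while the intersection is stricter and shrinks as more datasets are added.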
selectGenes(object, thresh = 0.1, nGenes = NULL, alpha = 0.99, ...)

## S3 method for class 'liger'
selectGenes(
  object,
  thresh = 0.1,
  nGenes = NULL,
  alpha = 0.99,
  useDatasets = NULL,
  useUnsharedDatasets = NULL,
  unsharedThresh = 0.1,
  combine = c("union", "intersection"),
  chunk = 1000,
  verbose = getOption("ligerVerbose", TRUE),
  var.thresh = thresh,
  alpha.thresh = alpha,
  num.genes = nGenes,
  datasets.use = useDatasets,
  unshared.datasets = useUnsharedDatasets,
  unshared.thresh = unsharedThresh,
  tol = NULL,
  do.plot = NULL,
  cex.use = NULL,
  unshared = NULL,
  ...
)

## S3 method for class 'Seurat'
selectGenes(
  object,
  thresh = 0.1,
  nGenes = NULL,
  alpha = 0.99,
  useDatasets = NULL,
  layer = "ligerNormData",
  assay = NULL,
  datasetVar = "orig.ident",
  combine = c("union", "intersection"),
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
object |
A liger, ligerDataset or
|
thresh |
Variance threshold used to identify variable genes. Higher
threshold results in fewer selected genes. Liger and Seurat S3 methods accept
a single value or a vector with specific threshold for each dataset in
|
nGenes |
Number of genes to find for each dataset. By setting this,
we optimize the threshold used for each dataset so that we get |
alpha |
Alpha threshold. Controls upper bound for expected mean gene
expression. Lower threshold means higher upper bound. Default |
... |
Arguments passed to other methods. |
useDatasets |
A character vector of the names, a numeric or logical
vector of the index of the datasets to use for shared variable feature
selection. Default |
useUnsharedDatasets |
A character vector of the names, a numeric or
logical vector of the index of the datasets to use for finding unshared
variable features. Default |
unsharedThresh |
The same thing as |
combine |
How to combine variable genes selected from all datasets.
Choose from |
chunk |
Integer. Maximum number of cells in each chunk, when
gene selection is applied to any HDF5 based dataset. Default |
verbose |
Logical. Whether to show information of the progress. Default
|
var.thresh , alpha.thresh , num.genes , datasets.use , unshared.datasets , unshared.thresh
|
Deprecated. These arguments are renamed and will be removed in the future. Please see function usage for replacement. |
tol , do.plot , cex.use , unshared
|
Deprecated. Gene variability
metric is now visualized with a separate function
|
layer |
Where the input normalized counts should be from. Default
|
assay |
Name of assay to use. Default |
datasetVar |
Metadata variable name that stores the dataset source
annotation. Default |
Updated object
liger method - Each involved dataset stored in ligerDataset is updated with
its featureMeta slot and varUnsharedFeatures slot (if requested with
useUnsharedDatasets), while varFeatures(object) will be updated with the
final combined gene set.
Seurat method - Final selection will be updated at
Seurat::VariableFeatures(object). Per-dataset information is stored in the
meta.features slot of the chosen Assay.
pbmc <- normalize(pbmc)
# Select by thresholding the relative variance
pbmc <- selectGenes(pbmc, thresh = .1)
# Select specified number for each dataset
pbmc <- selectGenes(pbmc, nGenes = c(60, 60))
Seurat FindVariableFeatures VST method. This allows the selection of a fixed number of variable features, but only applies to one dataset. No normalization is needed in advance.
selectGenesVST( object, useDataset, n = 2000, loessSpan = 0.3, clipMax = "auto", useShared = TRUE, verbose = getOption("ligerVerbose", TRUE) )
object |
A liger object. |
useDataset |
The names, a numeric or logical index of the dataset to be considered for selection. |
n |
Number of variable features needed. Default |
loessSpan |
Loess span parameter used when fitting the variance-mean
relationship. Default |
clipMax |
After standardization values larger than |
useShared |
Logical. Whether to only select from genes shared by all
datasets. Default |
verbose |
Logical. Whether to show information of the progress. Default
|
Seurat::FindVariableFeatures.default(selection.method = "vst")
pbmc <- selectGenesVST(pbmc, "ctrl", n = 50)
Subset liger with brackets
## S3 method for class 'liger'
x[i, j, ...]
x |
A liger object |
i |
Feature subscriptor, passed to |
j |
Cell subscriptor, passed to |
... |
Additional arguments passed to |
Subset of x with specified features and cells.
pbmcPlot[varFeatures(pbmcPlot)[1:10], 1:10]
Subset ligerDataset object
## S3 method for class 'ligerDataset'
x[i, j, ...]
x |
A ligerDataset object |
i |
Numeric, logical index or character vector of feature names to subset by. Leave missing for all features. |
j |
Numeric, logical index or character vector of cell IDs to subset by. Leave missing for all cells. |
... |
Additional arguments passed to |
If i
is given, the selected metadata will be returned; if it
is missing, the whole cell metadata table in
S4Vectors::DataFrame
class will be returned.
ctrl <- dataset(pbmc, "ctrl")
ctrl[1:5, 1:5]
Get cell metadata variable
## S3 method for class 'liger'
x[[i, ...]]
x |
A liger object |
i |
Name or numeric index of cell meta data to fetch |
... |
Anything that |
If i is given, the selected metadata will be returned; if it is missing, the
whole cell metadata table in S4Vectors::DataFrame class will be returned.
# Retrieve whole cellMeta
pbmc[[]]
# Retrieve a variable
pbmc[["dataset"]]
This function subsets a liger object with a character feature index and any
valid cell index. For datasets based on HDF5, the filenames of subset H5
files can only be automatically generated for now. Feature subsetting is
based on the intersection of available features from the datasets involved
by cellIdx, while featureIdx = NULL does not take the intersection (i.e.
nothing is done on the feature axis).
A ligerDataset object is also accepted for now, in which case setting
filename is supported.
subsetLiger( object, featureIdx = NULL, cellIdx = NULL, useSlot = NULL, chunkSize = 1000, verbose = getOption("ligerVerbose", TRUE), newH5 = TRUE, returnObject = TRUE, ... )
object |
A liger or ligerDataset object. |
featureIdx |
Character vector. Missing or |
cellIdx |
Character, logical or numeric index that can subscribe cells.
Missing or |
useSlot |
The slot(s) to only consider. Choose one or more from
|
chunkSize |
Integer. Maximum number of cells in each chunk.
Default |
verbose |
Logical. Whether to show information of the progress. Default
|
newH5 |
Whether to create new H5 files on disk for the subset datasets
if involved datasets in the |
returnObject |
Logical, whether to return a liger object
for result. Default |
... |
Arguments passed to |
Subset object
pbmc.small <- subsetLiger(pbmc, cellIdx = pbmc$nUMI > 200)
pbmc.small <- pbmc[, pbmc$nGene > 50]
This function subsets a ligerDataset object with valid feature and cell indices. For an HDF5 based object, options are available for subsetting data into memory or into a new on-disk H5 file. Feature and cell indexing is always based on the dimensions of rawData. Therefore, feature subsetting on scaled data, which usually already contains only a subset of features, selects the intersection between the requested features and the set available in the scaled data.
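The intersection behaviour on scaled data can be pictured with a few character vectors in base R. The feature names are hypothetical and the snippet only mimics the selection logic, not the function itself.

```r
rawFeatures    <- paste0("gene", 1:6)           # features present in rawData
scaledFeatures <- c("gene2", "gene4", "gene5")  # e.g. only variable features were scaled
wanted         <- c("gene1", "gene2", "gene5")  # features requested for the subset

# Raw data keeps every requested feature that exists
rawKeep <- rawFeatures[rawFeatures %in% wanted]
# Scaled data can only keep the requested features it actually contains
scaledKeep <- intersect(scaledFeatures, wanted)
```

In this toy case gene1 survives in the raw data but is absent from the scaled subset because it was never scaled.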
subsetLigerDataset(
  object,
  featureIdx = NULL,
  cellIdx = NULL,
  useSlot = NULL,
  newH5 = TRUE,
  filename = NULL,
  filenameSuffix = NULL,
  chunkSize = 1000,
  verbose = getOption("ligerVerbose", TRUE),
  returnObject = TRUE,
  ...
)

subsetH5LigerDataset(
  object,
  featureIdx = NULL,
  cellIdx = NULL,
  useSlot = NULL,
  newH5 = TRUE,
  filename = NULL,
  filenameSuffix = NULL,
  chunkSize = 1000,
  verbose = getOption("ligerVerbose", TRUE),
  returnObject = TRUE
)

subsetMemLigerDataset(
  object,
  featureIdx = NULL,
  cellIdx = NULL,
  useSlot = NULL,
  returnObject = TRUE
)
object |
ligerDataset object. HDF5 based object if using
|
featureIdx |
Character, logical or numeric index that can subscribe
features. Missing or |
cellIdx |
Character, logical or numeric index that can subscribe cells.
Missing or |
useSlot |
The slot(s) to only consider. Choose one or more from
|
newH5 |
Whether to create a new H5 file on disk for the subset dataset
if |
filename |
Filename of the new H5 file if being created. Default
|
filenameSuffix |
Instead of specifying the exact filename, set a suffix
for the new files so the new filename looks like
|
chunkSize |
Integer. Maximum number of cells in each chunk.
Default |
verbose |
Logical. Whether to show information of the progress. Default
|
returnObject |
Logical, whether to return a ligerDataset
object for result. Default |
... |
Arguments passed to |
Subset object
ctrl <- dataset(pbmc, "ctrl")
ctrl.small <- subsetLigerDataset(ctrl, cellIdx = 1:5)
ctrl.tiny <- ctrl[1:5, 1:5]
Due to massive updates since rliger 2.0, old liger object structures are no longer compatible with the current package. This function will update the object to the latest structure.
updateLigerObject( object, dimredName, clusterName = "clusters", h5FilePath = NULL )
object |
An object of any version of rliger |
dimredName |
Name of the dimension reduction embedding to be stored. Please see Details section. |
clusterName |
Name of the clustering assignment variable to be stored. Please see Details section. |
h5FilePath |
Named character vector for all H5 file paths. Not required for objects analyzed entirely in memory. For objects containing H5-based analysis (e.g. online iNMF), this must be supplied if the H5 file location differs from that at creation time. |
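A minimal sketch of preparing the h5FilePath argument is shown below. It assumes that the vector is named by dataset; the names ctrl and stim and the file locations are hypothetical placeholders.

```r
# Hypothetical dataset names and file locations; replace with your own.
h5Paths <- c(
  ctrl = file.path(tempdir(), "ctrl.h5"),
  stim = file.path(tempdir(), "stim.h5")
)

# Report any path that does not exist before attempting the update.
missingPaths <- h5Paths[!file.exists(h5Paths)]
if (length(missingPaths) > 0) {
  message("H5 files not found for: ",
          paste(names(missingPaths), collapse = ", "))
}

# The named vector would then be supplied as (not run here):
# newLig <- updateLigerObject(oldLig, dimredName = "UMAP",
#                             clusterName = "louvain", h5FilePath = h5Paths)
```

Checking the paths first gives a clearer message than letting the update fail partway through.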
Old liger objects (<1.99.0) store only one embedding, in the slot tsne.coords. For these, dimredName must be specified as a single character string. Pre-release versions (1.99.0) store multiple embeddings in cellMeta. For these, dimredName must exactly match existing variable names in the cellMeta slot.
Old liger objects store the clustering assignment in the slot clusters. For these, clusterName must be specified as a single character string. Pre-release versions do not require this.
Updated liger object.
## Not run:
# Suppose you have a liger object of old version (<1.99.0)
newLig <- updateLigerObject(oldLig, dimredName = "UMAP", clusterName = "louvain")
## End(Not run)
This function writes in-memory data into an H5 file, by default in the 10x cellranger HDF5 output format. The main goal of this function is to allow users to integrate a large H5-based dataset, which cannot be fully loaded into memory, with other data already loaded in memory using runOnlineINMF. In this case, users can write the smaller in-memory data to an H5 file instead of loading a subset of the large H5-based dataset into memory, where information might be lost.
Given the goal of the whole workflow, the data is always written in CSC matrix format, and colnames/rownames are always required.
The default method coerces the input to a dgCMatrix-class object. Methods for other container classes try to extract the proper data and call the default method.
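To make the CSC layout concrete, the sketch below builds a small dgCMatrix with the Matrix package and inspects the components that a CSC writer needs. The toy matrix and its names are made up. The mapping of the i, p and x slots to indicesPath, indptrPath and dataPath is an assumption based on the matching names (indices, indptr, data) and is not confirmed by this page.

```r
library(Matrix)

# Toy 3 x 4 count matrix, filled column by column, with the required dimnames
m <- matrix(
  c(1, 0, 2,
    0, 0, 3,
    4, 0, 0,
    0, 5, 0),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("cell1", "cell2", "cell3", "cell4"))
)

# Coerce to compressed sparse column storage (a dgCMatrix for numeric input)
csc <- as(m, "CsparseMatrix")

csc@i         # 0-based row indices of the non-zero entries
csc@p         # column pointers, length ncol + 1
csc@x         # non-zero values, stored column by column
csc@Dim       # matrix shape: number of rows, number of columns
colnames(csc) # barcodes
rownames(csc) # feature names
```

Because the values are stored column by column, all entries of one cell are contiguous, which suits reading the file in chunks of cells.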
writeH5(x, file, ...)

## Default S3 method:
writeH5(x, file, ...)

## S3 method for class 'dgCMatrix'
writeH5(
  x, file, overwrite = FALSE,
  indicesPath = "matrix/indices", indptrPath = "matrix/indptr",
  dataPath = "matrix/data", shapePath = "matrix/shape",
  barcodesPath = "matrix/barcodes", featuresPath = "matrix/features/name",
  ...
)

## S3 method for class 'ligerDataset'
writeH5(x, file, ...)

## S3 method for class 'liger'
writeH5(x, file, useDatasets, ...)
x |
An object with in-memory data to be written into H5 file. |
file |
A character string of the file path to be written. |
... |
Arguments passed to other S3 methods. |
overwrite |
Logical, whether to overwrite the file if it already exists. Default FALSE. |
indicesPath , indptrPath , dataPath
|
The paths inside the H5 file where
the dgCMatrix-class constructor |
shapePath |
The path inside the H5 file where the shape of the matrix will be written to. Default "matrix/shape". |
barcodesPath |
The path inside the H5 file where the barcodes/colnames will be written to. Default "matrix/barcodes". |
featuresPath |
The path inside the H5 file where the features/rownames will be written to. Default "matrix/features/name". |
useDatasets |
For liger method. Names or indices of datasets to be written to H5 files. Required. |
Nothing is returned. H5 file will be created on disk.
10X cellranger H5 matrix detail
raw <- rawData(pbmc, "ctrl")
writeH5(raw, tempfile(pattern = "ctrl_", fileext = ".h5"))