Title: | Supervised iNMF informed Deconvolution |
---|---|
Description: | A package for cell type deconvolution of bulk spatial transcriptomic data using multiple reference scRNA-seq datasets. |
Authors: | Joshua Sodicoff [aut, cre], Yichen Wang [ctb] |
Maintainer: | Joshua Sodicoff <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.0.9000 |
Built: | 2024-12-11 05:43:24 UTC |
Source: | https://github.com/welch-lab/SiNMFiD |
Calculate relationships between cell types
analyze_gene_signatures( filepath, analysis.name, spatial.data.name, rand.seed = 123, cell.types.use = NULL, return.objs = F )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
rand.seed |
Integer random seed |
cell.types.use |
A character vector of cell type labels to include, by default all cell types present |
return.objs |
Logical, whether to return a list of matrices of derived data |
If return.objs = TRUE, a named list containing the cosine similarity matrix
and the hierarchical clustering
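As a minimal base-R sketch of the two derived objects described above (not the package's implementation), cosine similarity between cell type gene signatures can be computed and clustered as follows. The toy signatures, the average-linkage choice, and the list element names are assumptions.

```r
# Toy gene signatures: rows are genes, columns are cell types (hypothetical names)
set.seed(123)
sigs <- matrix(runif(50 * 4), nrow = 50,
               dimnames = list(NULL, c("typeA", "typeB", "typeC", "typeD")))

# Cosine similarity between every pair of columns
cosine_similarity <- function(m) {
  norms <- sqrt(colSums(m^2))            # Euclidean norm of each signature
  crossprod(m) / outer(norms, norms)     # dot products divided by products of norms
}

sim <- cosine_similarity(sigs)

# Hierarchical clustering on cosine distance (1 - similarity); linkage is an assumption
clust <- hclust(as.dist(1 - sim), method = "average")

# Named list mirroring the documented return value (element names are hypothetical)
res <- list(cosine.sim = sim, clustering = clust)
```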
Calculate relationships between cell type distributions
analyze_spatial_correlation( filepath, analysis.name, spatial.data.name, rand.seed = 123, mat.use = "proportions", cell.types.use = NULL, return.objs = F )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
rand.seed |
Integer random seed |
mat.use |
A string, either "raw" or "proportions", specifying which version of the results to summarize |
cell.types.use |
A character vector of cell type labels to include, by default all cell types present |
return.objs |
Logical, whether to return a list of matrices of derived data |
If return.objs = TRUE, a named list containing the Pearson correlation matrix
and the hierarchical clustering
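A base-R sketch of the same idea for spatial distributions, assuming a samples-by-cell-types loading matrix; the toy data and the linkage method are illustrative assumptions, not the package's code.

```r
set.seed(123)
# Toy loadings: rows are spatial samples, columns are cell types (hypothetical)
loadings <- matrix(runif(200 * 3), ncol = 3,
                   dimnames = list(NULL, c("typeA", "typeB", "typeC")))
loadings[, "typeB"] <- loadings[, "typeA"] + rnorm(200, sd = 0.05)  # co-localized pair

# Pearson correlation between the spatial distributions of each pair of cell types
corr <- cor(loadings, method = "pearson")

# Cluster cell types on correlation distance (1 - r); linkage is an assumption
clust <- hclust(as.dist(1 - corr), method = "average")
```

Cell types that occupy the same samples, like typeA and typeB here, end up with a correlation near 1 and merge first in the dendrogram.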
Calculate cell sizes with all reference data
calculate_cell_sizes( data.list, annotations, filepath, analysis.name, datasets.remove = NULL, plot.hist = FALSE, chunk = 1000 )
data.list |
Various formats are allowed: 1. a liger object; 2. a character vector of RDS/H5 file names; 3. a named list of liger objects, RDS/H5 file names, and/or matrix/dgCMatrix objects (element types may be mixed). A liger object must come from rliger < 1.99. RDS files must contain a base dense matrix or a dgCMatrix from package "Matrix". H5 files must contain a dataset processed by rliger < 1.99. |
annotations |
Named factor of cell type assignments for all cells, concatenated across all datasets. |
filepath |
Path to analysis directory where the output will be stored. |
analysis.name |
String identifying the analysis, also used as the name of a sub-folder. |
datasets.remove |
Character vector of datasets to be excluded from
sampling if |
plot.hist |
Logical, whether to display and save histograms of nUMIs by cell type |
chunk |
Integer chunk size for processing sparse data stored in H5.
Number of cells to load into memory per iteration. Default |
Nothing is returned, but the following files are saved locally:
- "<filepath>/<analysis.name>/cell_size_histogram.pdf": a PDF file of histograms showing the nUMI-per-cell distribution for each cell type (saved when plot.hist = TRUE)
- "<filepath>/<analysis.name>/cell_size.RDS": an RDS file of a named numeric vector giving the total number of counts per cell type across all datasets.
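The per-cell-type totals stored in cell_size.RDS, and the role of `chunk`, can be illustrated with a small base-R sketch. It assumes a genes-by-cells count matrix held in memory; the helper `numi_chunked` and the toy data are hypothetical, and real H5 input would be read from disk one chunk at a time instead.

```r
set.seed(123)
# Toy counts: rows are genes, columns are cells (hypothetical data)
counts <- matrix(rpois(100 * 2500, lambda = 2), nrow = 100,
                 dimnames = list(NULL, paste0("cell", 1:2500)))
annotations <- factor(sample(c("typeA", "typeB"), 2500, replace = TRUE))
names(annotations) <- colnames(counts)

# nUMI per cell, computed `chunk` cells at a time so only one block is processed at once
numi_chunked <- function(mat, chunk = 1000) {
  out <- numeric(ncol(mat))
  for (s in seq(1, ncol(mat), by = chunk)) {
    idx <- s:min(s + chunk - 1, ncol(mat))
    out[idx] <- colSums(mat[, idx, drop = FALSE])
  }
  names(out) <- colnames(mat)
  out
}

numi <- numi_chunked(counts, chunk = 1000)
# Total counts per cell type, as described for cell_size.RDS
cell_size <- vapply(split(numi, annotations[names(numi)]), sum, numeric(1))
```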
Calculate pairwise Wasserstein distances between cell types and genes
calculate_wasserstein( filepath, analysis.name, spatial.data.name, rand.seed = 123, mat.use = "proportions", cell.types.use = NULL, genes.use = NULL, p = 2, min.samples = 1, return.objs = F )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
rand.seed |
Integer random seed |
mat.use |
A string, either "raw" or "proportions", specifying which version of the results to summarize |
cell.types.use |
A character vector of cell type labels to include, by default all cell types present |
genes.use |
A character vector of genes to include, by default none |
p |
The |
min.samples |
Integer, the minimum number of samples a cell type must load on to be included in the analysis |
return.objs |
Logical, whether to return a list of matrices of derived data |
If return.objs = TRUE, a matrix of pairwise Wasserstein distances
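For intuition about the metric, a base-R sketch of the p-Wasserstein distance in the simplest case: two one-dimensional samples of equal size with equal weights, where the optimal transport plan matches sorted values. The package may instead compare distributions over 3-D coordinates, which needs a general optimal-transport solver; the helper names and toy columns here are hypothetical.

```r
# p-Wasserstein distance between two 1-D empirical distributions with the same
# number of equally weighted points: the optimal coupling matches sorted values
wasserstein_1d <- function(x, y, p = 2) {
  stopifnot(length(x) == length(y))
  mean(abs(sort(x) - sort(y))^p)^(1 / p)
}

# Pairwise distance matrix between the columns of m (cell types and/or genes)
pairwise_wasserstein <- function(m, p = 2) {
  k <- ncol(m)
  d <- matrix(0, k, k, dimnames = list(colnames(m), colnames(m)))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) d[i, j] <- wasserstein_1d(m[, i], m[, j], p)
  }
  d
}

set.seed(123)
m <- cbind(typeA = rnorm(500), typeB = rnorm(500, mean = 3), geneX = rexp(500))
d <- pairwise_wasserstein(m, p = 2)
```

Shifting a distribution by a constant moves it exactly that far, so `wasserstein_1d(x, x + 3)` equals 3 for any `p`.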
Generate histograms of loading by cell type
cell_type_loading_histogram( filepath, analysis.name, spatial.data.name, rand.seed = 123, mat.use = "proportions", cell.types.plot = NULL, print.plots = TRUE, bin.num = 30 )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
rand.seed |
Integer random seed |
mat.use |
A string, either "raw" or "proportions", specifying which version of the results to summarize |
print.plots |
Logical, whether to display results in the plots panel |
bin.num |
Integer number of bins to use in histogram |
cell.types.plot |
A character vector of cell type labels to plot, by default all cell types present |
nothing
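The difference between `mat.use = "raw"` and `"proportions"`, and the role of `bin.num`, can be sketched in base R. Treating proportions as loadings rescaled so that each sample sums to 1 is an assumption, as are the toy data.

```r
set.seed(123)
# Toy raw loadings: rows are spatial samples, columns are cell types (hypothetical)
raw <- matrix(rexp(300 * 3), ncol = 3,
              dimnames = list(NULL, c("typeA", "typeB", "typeC")))

# Assumed meaning of mat.use = "proportions": each sample's loadings rescaled to sum to 1
props <- raw / rowSums(raw)

# Per-cell-type counts in bin.num equal-width bins spanning [0, 1]
bin.num <- 30
breaks <- seq(0, 1, length.out = bin.num + 1)
bin_counts <- apply(props, 2, function(v) hist(v, breaks = breaks, plot = FALSE)$counts)
```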
Deconvolve spatial data using learned gene signatures
deconvolve_spatial( filepath, analysis.name, spatial.data.name, rand.seed = 123, cell.size = T )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
rand.seed |
Integer random seed |
cell.size |
Logical, whether to scale gene signatures by cell sizes |
nothing
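Deconvolution of one spatial sample can be sketched as a non-negative least-squares fit of the sample's expression against fixed gene signatures, here solved with multiplicative updates in base R. This is an illustrative stand-in, not necessarily the solver used by `deconvolve_spatial`; `nnls_mu` and the toy matrices are hypothetical, and the `cell.size` rescaling of the signatures is not modelled.

```r
# Multiplicative updates for non-negative least squares: for one spatial sample x,
# find loadings h >= 0 that minimize ||x - W %*% h||^2 given fixed signatures W
nnls_mu <- function(W, x, iters = 500, eps = 1e-12) {
  h <- rep(1, ncol(W))                  # strictly positive starting point
  Wtx <- as.vector(crossprod(W, x))     # W^T x, constant across iterations
  WtW <- crossprod(W)                   # W^T W
  for (i in seq_len(iters)) {
    h <- h * Wtx / (as.vector(WtW %*% h) + eps)
  }
  setNames(h, colnames(W))
}

# Toy signatures (genes x cell types) and a sample mixed from known loadings 2 and 3
W <- matrix(c(1, 0.1, 0.9, 0,
              0, 1,   0.2, 0.8), nrow = 4,
            dimnames = list(paste0("gene", 1:4), c("typeA", "typeB")))
x <- W %*% c(2, 3)
h <- nnls_mu(W, x)
h / sum(h)   # loadings rescaled to proportions
```

Because each update multiplies by a non-negative ratio, the loadings never become negative, which is the property deconvolution needs.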
Generate gifs of labels in space
generate_label_gifs( filepath, analysis.name, spatial.data.name, labels.plot, dims = c(500, 500) )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
labels.plot |
A named vector or matrix of labels to plot for the provided coordinates |
dims |
Integer vector of length 2 corresponding to the width and height of the RGL window |
nothing
Generate gifs of cell type distributions derived from deconvolution in space
generate_loading_gifs( filepath, analysis.name, spatial.data.name, rand.seed = 123, mat.use = "proportions", cell.types.plot = NULL, filter = NULL, dims = c(500, 500) )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
rand.seed |
Integer random seed |
mat.use |
A string, either "raw" or "proportions", specifying which version of the results to summarize |
cell.types.plot |
A character vector of cell types to plot |
dims |
Integer vector of length 2 corresponding to the width and height of the RGL window |
nothing
Learn cell type gene signatures
learn_gene_signatures( filepath, analysis.name, spatial.data.name, rand.seed = 123, lambda = 1, thresh = 1e-08, max.iters = 100, nrep = 1, print.obj = FALSE, verbose = FALSE )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
rand.seed |
Integer random seed |
lambda |
Double, regularization parameter; larger values penalize dataset-specific effects more strongly |
thresh |
Double, the minimum fractional change in the objective function required to continue iterating |
max.iters |
Integer, the maximum number of iterations to complete before stopping |
nrep |
Number of random starts to complete |
print.obj |
Logical, whether to print the current value of the objective |
verbose |
Logical, whether to print the final objective and the best random seed |
nothing
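For reference, a base-R sketch of the standard iNMF objective that `lambda` weights, together with one common form of the `thresh` stopping rule (relative change in the objective). The supervised variant in this package may add terms; the orientation (genes as rows), the helper names, and the toy matrices are assumptions.

```r
# iNMF objective with genes as rows and cells as columns:
# sum_i ||E_i - (W + V_i) H_i||_F^2 + lambda * sum_i ||V_i H_i||_F^2
inmf_objective <- function(E, W, V, H, lambda = 1) {
  total <- 0
  for (i in seq_along(E)) {
    fit <- (W + V[[i]]) %*% H[[i]]                # shared + dataset-specific reconstruction
    total <- total + sum((E[[i]] - fit)^2) +      # squared reconstruction error
      lambda * sum((V[[i]] %*% H[[i]])^2)         # penalty on the dataset-specific part
  }
  total
}

# One common stopping rule: relative change of the objective below thresh
has_converged <- function(obj_old, obj_new, thresh = 1e-8) {
  abs(obj_old - obj_new) / mean(c(obj_old, obj_new)) < thresh
}

set.seed(123)
k <- 2
W <- matrix(runif(10 * k), 10, k)                  # shared factors (genes x k)
H <- list(matrix(runif(k * 5), k, 5), matrix(runif(k * 7), k, 7))
V <- list(matrix(0, 10, k), matrix(0.1, 10, k))   # dataset-specific factors
E <- lapply(H, function(h) W %*% h)               # data generated from W alone
inmf_objective(E, W, V, H, lambda = 1)
```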
Load data from one of multiple formats
load_objs(objs, datasets.remove)
objs |
A named list of matrices (dgCMatrix), RDS file paths to matrices, or H5 file paths to LIGER-analyzed datasets. |
A list; the type of each element depends on the input.
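A hypothetical sketch of how loading can dispatch on input type, covering in-memory matrices and RDS paths only; H5 files and liger objects need rliger and HDF5 tooling and are left out. `load_one` and `load_objs_sketch` are illustrative names, not package functions.

```r
# Hypothetical loader illustrating dispatch on input type (not the package's code)
load_one <- function(obj) {
  if (is.matrix(obj) || inherits(obj, "dgCMatrix")) return(obj)  # already in memory
  if (is.character(obj) && length(obj) == 1) {
    if (grepl("\\.rds$", obj, ignore.case = TRUE)) return(readRDS(obj))
    if (grepl("\\.h5$", obj, ignore.case = TRUE))
      stop("H5 input needs an HDF5 reader, which this sketch does not cover")
  }
  stop("Unsupported input: ", class(obj)[1])
}
load_objs_sketch <- function(objs) lapply(objs, load_one)

# Usage: one matrix kept in memory, one read back from an RDS file
m <- matrix(1:6, nrow = 2)
path <- tempfile(fileext = ".rds")
saveRDS(m, path)
loaded <- load_objs_sketch(list(inMemory = m, fromDisk = path))
```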
Flip axes in spatial data
mirror_spatial_coords( filepath, analysis.name, spatial.data.name, axes.flip = c(FALSE, FALSE, FALSE), overwrite = T )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
axes.flip |
A logical vector of length three indicating which axes to invert |
overwrite |
Logical, whether the original data should be overwritten,
otherwise " |
nothing
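A minimal sketch of axis mirroring, assuming coordinates are stored as a samples-by-3 matrix and that flipping reflects values within their own range; the package's exact convention is not documented here, and `mirror_coords` is a hypothetical helper.

```r
# Mirror the selected axes of a samples x 3 coordinate matrix within their own range,
# assuming the convention x' = min(x) + max(x) - x so the bounding box is unchanged
mirror_coords <- function(coords, axes.flip = c(FALSE, FALSE, FALSE)) {
  stopifnot(ncol(coords) == length(axes.flip))
  for (i in which(axes.flip)) {
    rng <- range(coords[, i])
    coords[, i] <- rng[1] + rng[2] - coords[, i]
  }
  coords
}

coords <- cbind(x = c(0, 1, 4), y = c(2, 2, 5), z = c(1, 3, 9))
flipped <- mirror_coords(coords, axes.flip = c(TRUE, FALSE, FALSE))
flipped[, "x"]   # 4 3 0; y and z are unchanged
```

Under this convention, flipping the same axis twice returns the original coordinates.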
Generate gifs of cell type distributions overlaid with subregions
overlay_subregion_gifs( filepath, analysis.name, spatial.data.name, rand.seed = 123, mat.use = "proportions", cell.types.plot = NULL, subregions.plot = NULL, filter = NULL, dims = c(500, 500) )
filepath |
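Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
rand.seed |
Integer random seed |
mat.use |
A string, either "raw" or "proportions", specifying which version of the results to summarize |
cell.types.plot |
A character vector of cell types to plot |
subregions.plot |
subregions.plot |
filter |
filter |
dims |
Integer vector of length 2 corresponding to the width and height of the RGL window |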
nothing
analyze_gene_signatures
Plot results of analyze_gene_signatures
plot_analyze_gene_signatures( filepath, analysis.name, spatial.data.name, rand.seed = 123, print.plots = T )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
rand.seed |
Integer random seed |
print.plots |
Logical, whether to display results in the plots panel |
nothing
analyze_spatial_correlation
Plot results of analyze_spatial_correlation
plot_analyze_spatial_correlation( filepath, analysis.name, spatial.data.name, rand.seed = 123, print.plots = TRUE )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
rand.seed |
Integer random seed |
print.plots |
Logical, whether to display results in the plots panel |
nothing
calculate_wasserstein
Plot results of calculate_wasserstein
plot_calculate_wasserstein( filepath, analysis.name, spatial.data.name, rand.seed = 123, print.plots = T )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
rand.seed |
Integer random seed |
print.plots |
Logical, whether to display results in the plots panel |
nothing
summarize_by_layer
Plot results of summarize_by_layer
plot_summarize_by_layer( filepath, analysis.name, spatial.data.name, rand.seed = 123, print.plots = T )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
rand.seed |
Integer random seed |
print.plots |
Logical, whether to display results in the plots panel |
nothing
Quality-control spatial data
qc_spatial_data( filepath, analysis.name, spatial.data.name, count.data = FALSE, z = 1, n.umi.thresh = 150, rand.seed = 123 )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
count.data |
Logical, TRUE if the spatial data come from a count-based modality, FALSE if from an intensity-based one |
z |
Double, for intensity data: a gene is removed if its number of NAs exceeds the mean across genes by more than z standard deviations |
n.umi.thresh |
Integer, for count-based data: samples with fewer total counts than this are removed |
rand.seed |
Integer random seed |
nothing
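The two filtering rules described by `z` and `n.umi.thresh` can be sketched in base R, assuming genes as rows and spatial samples as columns; the helpers and toy data are hypothetical, not the package's code.

```r
# Count data: drop samples (columns) whose total counts fall below n.umi.thresh
qc_counts <- function(mat, n.umi.thresh = 150) {
  mat[, colSums(mat) >= n.umi.thresh, drop = FALSE]
}

# Intensity data: drop genes (rows) whose NA count exceeds mean + z * sd across genes
qc_intensity <- function(mat, z = 1) {
  n_na <- rowSums(is.na(mat))
  mat[n_na <= mean(n_na) + z * sd(n_na), , drop = FALSE]
}

counts <- matrix(c(100, 60, 10, 5, 200, 0), nrow = 2)  # column totals 160, 15, 200
kept <- qc_counts(counts, n.umi.thresh = 150)            # keeps columns 1 and 3

inten <- matrix(1, nrow = 5, ncol = 4)
inten[1, 1:3] <- NA                                      # gene 1 is mostly missing
filtered <- qc_intensity(inten, z = 1)                   # drops gene 1
```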
Generate silhouettes of the data along all three axes
reference_3d_coordinates( filepath, analysis.name, spatial.data.name, save.plots = FALSE )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
save.plots |
Logical, whether to save the requested plots upon generation |
nothing
Transfer labels from coarse-grained sampled
register_voxel_to_label( filepath, analysis.name, spatial.data.name, labels.use, label.name )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
labels.use |
Named vector of labels for the prevoxelized data |
label.name |
String identifying the label set |
nothing
Sample from single cell reference datasets
sample_single_cell( data.list, annotations, filepath, analysis.name, datasets.remove = NULL, n.cells = 500, rand.seed = 123, chunk = 1000 )
data.list |
Various formats are allowed: 1. a liger object; 2. a character vector of RDS/H5 file names; 3. a named list of liger objects, RDS/H5 file names, and/or matrix/dgCMatrix objects (element types may be mixed). A liger object must come from rliger < 1.99. RDS files must contain a base dense matrix or a dgCMatrix from package "Matrix". H5 files must contain a dataset processed by rliger < 1.99. |
annotations |
Named factor of cell type assignments. |
filepath |
Path to analysis directory where the sampling output will be stored. |
analysis.name |
String identifying the analysis, also used as the name of a sub-folder. |
datasets.remove |
Character vector of datasets to be excluded from
sampling if |
n.cells |
Integer, the maximum number of cells to sample per
cell type. Default |
rand.seed |
Integer random seed for reproducible sampling. |
chunk |
Integer chunk size for processing sparse data stored in H5.
Number of cells to load into memory per iteration. Default |
Nothing is returned, but the following files are saved locally:
- "<filepath>/<analysis.name>/<rand.seed>/norm_data.RDS": a list of downsampled, scaled (not centered) data matrices.
- "<filepath>/<analysis.name>/<rand.seed>/sampled_cells.RDS": a vector of the barcodes of the sampled cells.
- "<filepath>/<analysis.name>/source_annotations.RDS": the input annotations.
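## Not run:
# How `datasets.remove` works, by example:
names(lig@raw.data)
# The line above should show "data1", "data2", "data3", ...
# When sampling from `lig`, the first two datasets can then be excluded with
# (annotations, filepath and analysis.name are placeholders for your own inputs):
sample_single_cell(data.list = lig, annotations = annotations,
                   filepath = filepath, analysis.name = analysis.name,
                   datasets.remove = c("data1", "data2"))
# If data.list is a named list of liger objects, datasets.remove is a
# named list giving the datasets to exclude from each object:
sample_single_cell(data.list = list(human = lig1, mouse = lig2),
                   annotations = annotations,
                   filepath = filepath, analysis.name = analysis.name,
                   datasets.remove = list(human = c("data1", "data2"),
                                          mouse = c("10x1")))
## End(Not run)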
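The per-cell-type cap set by `n.cells` can be sketched in base R, assuming `annotations` is a factor named by barcode; `sample_per_type` is a hypothetical helper, and because it calls `set.seed` it resets the global random stream.

```r
# Sample at most n.cells barcodes per cell type; set.seed makes the draw reproducible
sample_per_type <- function(annotations, n.cells = 500, rand.seed = 123) {
  set.seed(rand.seed)
  by_type <- split(names(annotations), annotations)
  picked <- lapply(by_type, function(b) b[sample.int(length(b), min(n.cells, length(b)))])
  unlist(picked, use.names = FALSE)
}

annotations <- factor(rep(c("typeA", "typeB"), times = c(800, 120)))
names(annotations) <- paste0("cell", seq_along(annotations))
barcodes <- sample_per_type(annotations, n.cells = 500)
table(annotations[barcodes])   # typeA 500, typeB 120
```

Cell types with fewer cells than `n.cells` are kept whole, so rare types are not dropped by the cap.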
Add a new spatial dataset to the analysis directory
save_spatial_data( filepath, analysis.name, spatial.data.file, coords.file, spatial.data.name )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.file |
Path to an RDS file containing desired expression data |
coords.file |
Path to an RDS file containing desired coordinate data |
spatial.data.name |
String identifying the spatial sample |
nothing
Select variable genes with the Kruskal-Wallis test
select_defining_genes( filepath, analysis.name, deconv.gene.num = 2000, gene.num.tol = 50, rand.seed = 123 )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
deconv.gene.num |
Integer, the number of genes to select |
gene.num.tol |
Integer, the maximum difference between the number of
genes selected and |
rand.seed |
Integer random seed |
nothing
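A base-R sketch of Kruskal-Wallis ranking of genes across cell types, assuming a genes-by-cells matrix; it simply keeps the top `n.genes` genes by H statistic and does not reproduce the tolerance search implied by `gene.num.tol`. The helper and data are hypothetical.

```r
# Rank genes by the Kruskal-Wallis H statistic across cell types; keep the top n.genes
select_genes_kw <- function(expr, cell.types, n.genes = 2000) {
  h <- vapply(seq_len(nrow(expr)),
              function(i) unname(kruskal.test(expr[i, ], cell.types)$statistic),
              numeric(1))
  names(h) <- rownames(expr)
  names(sort(h, decreasing = TRUE))[seq_len(min(n.genes, length(h)))]
}

set.seed(123)
cell.types <- factor(rep(c("typeA", "typeB", "typeC"), each = 30))
expr <- matrix(rnorm(5 * 90), nrow = 5, dimnames = list(paste0("gene", 1:5), NULL))
expr["gene3", ] <- expr["gene3", ] + rep(c(0, 5, 10), each = 30)  # type-specific gene
top <- select_genes_kw(expr, cell.types, n.genes = 2)
```

The test is rank-based, so it picks out genes whose expression differs between cell types without assuming normally distributed values.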
Set up new analysis directory
start_analysis(filepath, analysis.name)
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
nothing
Subset a spatial dataset by coordinates for analysis
subset_spatial_data( filepath, analysis.name, spatial.data.name, subset.specs = list(c(NaN, NaN), c(NaN, NaN), c(NaN, NaN)), new.spatial.data.name = NULL, out.filepath = NULL )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
subset.specs |
A list with one entry per axis; each entry is a length-two vector giving the minimum and maximum values to include, where NaN leaves that bound unset |
new.spatial.data.name |
String, optional name for new analysis, otherwise
the default " |
out.filepath |
Path to a directory in which to save the subset data, if not within the analysis directory |
nothing
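A minimal sketch of bound-based subsetting, assuming a samples-by-3 coordinate matrix and reading NaN as "no bound on this side" (consistent with the all-NaN default); `subset_by_coords` is a hypothetical helper.

```r
# Keep rows of a samples x 3 coordinate matrix that fall inside per-axis bounds;
# NaN means "no bound" on that side (assumed from the all-NaN default)
subset_by_coords <- function(coords,
                             subset.specs = list(c(NaN, NaN), c(NaN, NaN), c(NaN, NaN))) {
  keep <- rep(TRUE, nrow(coords))
  for (i in seq_along(subset.specs)) {
    lower <- subset.specs[[i]][1]
    upper <- subset.specs[[i]][2]
    if (!is.nan(lower)) keep <- keep & coords[, i] >= lower
    if (!is.nan(upper)) keep <- keep & coords[, i] <= upper
  }
  coords[keep, , drop = FALSE]
}

coords <- cbind(x = 1:5, y = c(10, 20, 30, 40, 50), z = 0)
sub <- subset_by_coords(coords, list(c(2, NaN), c(NaN, 40), c(NaN, NaN)))  # rows 2 to 4
```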
Summarize cell type and gene expression data by layer
summarize_by_layer( filepath, analysis.name, spatial.data.name, rand.seed = 123, layer.list, type = "mean", mat.use = "proportions", cell.types.use = NULL, genes.use = NULL, return.objs = FALSE )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
rand.seed |
Integer random seed |
layer.list |
A named list of spatial samples by layer of interest |
type |
A string, either "mean" or "sum", how results should be combined for summary |
mat.use |
A string, either "raw", "proportions", or "assignments", specifying which version of the results to summarize |
cell.types.use |
A character vector of cell type labels to include, by default all cell types present |
genes.use |
A character vector of genes to include, by default none |
return.objs |
Logical, whether to return a list of matrices of derived data |
If return.objs = TRUE, a named list of cell type and gene expression data
summarized by layer
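A base-R sketch of the `type = "mean"` and `type = "sum"` summaries, assuming `layer.list` maps each layer name to the row names of a samples-by-cell-types matrix; `summarize_layers` and the data are hypothetical.

```r
# Combine rows (spatial samples) of a results matrix within each layer
summarize_layers <- function(mat, layer.list, type = c("mean", "sum")) {
  type <- match.arg(type)
  f <- if (type == "mean") colMeans else colSums
  # sapply gives cell types x layers; transpose to layers x cell types
  t(sapply(layer.list, function(samples) f(mat[samples, , drop = FALSE])))
}

props <- matrix(c(0.2, 0.8,
                  0.4, 0.6,
                  0.9, 0.1), ncol = 2, byrow = TRUE,
                dimnames = list(c("s1", "s2", "s3"), c("typeA", "typeB")))
layer.list <- list(L1 = c("s1", "s2"), L2 = "s3")
res <- summarize_layers(props, layer.list, type = "mean")
```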
Summarize cell types present in the source annotations
summarize_clusters(filepath, analysis.name, return.objs = F)
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
return.objs |
Logical, whether to return a vector of the names of clusters |
A vector of unique clusters in the source annotations,
if return.objs = TRUE
Summarize subregions of a vector of regions of interest
summarize_subregions( regions, ontology.file = "Downloads/allen_structure_ontology.csv", return.objs = F )
regions |
A vector of region names |
ontology.file |
A CSV file describing the Allen structure ontology |
return.objs |
Logical, whether to return acronyms for all subregions found |
A vector of unique subregions within the provided regions,
if return.objs = TRUE
Use predefined transformations to match some modalities to the Allen CCF
transform_coords_to_ccf(filepath, analysis.name, spatial.data.name, ish = T)
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
ish |
Logical, if the data comes from the Allen Institute quantified ISH dataset |
nothing
View deconvolution results for one cell type in an RGL window
view_in_rgl( filepath, analysis.name, spatial.data.name, rand.seed = 123, cell.type, mat.use = "proportions", filter.samples = NULL, dims = c(500, 500) )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
rand.seed |
Integer random seed |
cell.type |
A string corresponding to one cell type found in the deconvolution results |
mat.use |
A string, either "raw" or "proportions", specifying which version of the results to summarize |
filter.samples |
Numeric threshold for binarizing results: values above it count as present and values below it as absent |
dims |
Integer vector of length 2 corresponding to the width and height of the RGL window |
nothing
Coarse-grain spatial data to a predetermined resolution
voxelize_single_cells( filepath, analysis.name, spatial.data.name, voxel.size, out.filepath = NULL, verbose = TRUE )
filepath |
Path to analysis directory |
analysis.name |
String identifying the analysis |
spatial.data.name |
String identifying the spatial sample |
voxel.size |
Integer, side length of one voxel |
out.filepath |
Path to a directory in which to save the output, if not within the analysis directory |
verbose |
Logical, whether to print several lines of metadata on the results |
nothing
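A sketch of voxelization in base R, assuming cells-by-3 coordinates, cells-by-genes expression, and summing expression within each voxel (the package may aggregate differently); `voxelize` is a hypothetical helper and each voxel is labelled by its lower corner.

```r
# Assign cells to cubic voxels of side voxel.size and sum their expression per voxel
voxelize <- function(coords, expr, voxel.size) {
  # coords: cells x 3 matrix; expr: cells x genes matrix (assumed orientation)
  vox <- floor(coords / voxel.size)                 # integer voxel index per axis
  key <- apply(vox, 1, paste, collapse = "_")       # one ID string per voxel
  summed <- rowsum(expr, group = key)               # sum of cells in each voxel
  corner <- vox[match(rownames(summed), key), , drop = FALSE] * voxel.size
  rownames(corner) <- rownames(summed)
  list(expr = summed, coords = corner)              # voxel lower corner as its coordinate
}

coords <- cbind(x = c(0.5, 1.5, 9.9, 12), y = c(0.2, 0.8, 3, 3), z = 0)
expr <- matrix(1, nrow = 4, ncol = 2, dimnames = list(NULL, c("g1", "g2")))
res <- voxelize(coords, expr, voxel.size = 10)
res$expr   # "0_0_0": 3 cells summed; "1_0_0": 1 cell
```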