Title: | Single-Cell Interpretable Tensor Decomposition |
---|---|
Description: | Single-cell Interpretable Tensor Decomposition (scITD) employs the Tucker tensor decomposition to extract multicell-type gene expression patterns that vary across donors/individuals. This tool is geared for use with single-cell RNA-sequencing datasets consisting of many source donors. The method has a wide range of potential applications, including the study of inter-individual variation at the population-level, patient sub-grouping/stratification, and the analysis of sample-level batch effects. Each "multicellular process" that is extracted consists of (A) a multi cell type gene loadings matrix and (B) a corresponding donor scores vector indicating the level at which the corresponding loadings matrix is expressed in each donor. Additional methods are implemented to aid in selecting an appropriate number of factors and to evaluate stability of the decomposition. Additional tools are provided for downstream analysis, including integration of gene set enrichment analysis and ligand-receptor analysis. Tucker, L.R. (1966) <doi:10.1007/BF02289464>. Unkel, S., Hannachi, A., Trendafilov, N. T., & Jolliffe, I. T. (2011) <doi:10.1007/s13253-011-0055-9>. Zhou, G., & Cichocki, A. (2012) <doi:10.2478/v10175-012-0051-4>. |
Authors: | Jonathan Mitchel [cre, aut], Evan Biederstedt [aut], Peter Kharchenko [aut] |
Maintainer: | Jonathan Mitchel <[email protected]> |
License: | GPL-3 |
Version: | 1.0.4 |
Built: | 2025-02-11 05:32:31 UTC |
Source: | https://github.com/cran/scITD |
Apply ComBat batch correction to pseudobulk matrices. Generally, this should be done through calling the form_tensor() wrapper function.
apply_combat(container, batch_var)
apply_combat(container, batch_var)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
batch_var |
character A batch variable from metadata to remove |
The project container with the batc corrected pseudobulked matrices.
Calculate F-Statistics for the association between donor scores for each factor donor values of shuffled gene_ctype fibers
calculate_fiber_fstats(tensor_data, tucker_results, s_fibers)
calculate_fiber_fstats(tensor_data, tucker_results, s_fibers)
tensor_data |
list The tensor data including donor, gene, and cell type labels as well as the tensor array itself |
tucker_results |
list The results from Tucker decomposition. Includes a scores matrix as the first element and the loadings tensor unfolded as the second element. |
s_fibers |
list Gene and cell type indices for the randomly selected fibers |
A numeric vector of F-statistics for associations between all shuffled fibers and donor scores.
Helper function to check whether receptor is present in target cell type
check_rec_pres( container, lig_ct_exp, rec_elements, target_ct, percentile_exp_rec )
check_rec_pres( container, lig_ct_exp, rec_elements, target_ct, percentile_exp_rec )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
lig_ct_exp |
numeric Scaled expression for a ligand in the source cell type |
rec_elements |
character One or more components of a receptor complex |
target_ct |
character The name of the target cell type |
percentile_exp_rec |
numeric The percentile of ligand expression above which all donors need to have at least 5 cells expressing the receptor. |
A logical indicating whether receptor is present or not.
Clean data to remove genes only expressed in a few cells and donors with very few cells. Generally, this should be done through calling the form_tensor() wrapper function.
clean_data(container, donor_min_cells = 5)
clean_data(container, donor_min_cells = 5)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
donor_min_cells |
numeric Minimum threshold for number of cells per donor (default=5) |
The project container with cleaned counts matrices in each container$scMinimal_ctype$<ctype>$count_data.
Calculates column mean and variance. Adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/src/misc2.cpp
colMeanVars(sY, rowSel, ncores = 1L)
colMeanVars(sY, rowSel, ncores = 1L)
sY |
sparse matrix Gene by cell matrix of counts |
rowSel |
numeric The selected rows (genes) |
ncores |
numeric The number of cores |
data.frame with columns of mean, variance, and number of observeatios for each gene across samples
library(Matrix) donor_by_gene <- rbind(c(9,2,1,5), c(3,3,1,2)) donor_by_gene <- Matrix(donor_by_gene, sparse = TRUE) result <- colMeanVars(donor_by_gene, rowSel = NULL, ncores=1)
library(Matrix) donor_by_gene <- rbind(c(9,2,1,5), c(3,3,1,2)) donor_by_gene <- Matrix(donor_by_gene, sparse = TRUE) result <- colMeanVars(donor_by_gene, rowSel = NULL, ncores=1)
Plot a pairwise comparison of factors from two separate decompositions
compare_decompositions( tucker_res1, tucker_res2, decomp_names, meta_anno1 = NULL, meta_anno2 = NULL, use_text = TRUE )
compare_decompositions( tucker_res1, tucker_res2, decomp_names, meta_anno1 = NULL, meta_anno2 = NULL, use_text = TRUE )
tucker_res1 |
list The container$tucker_res from first decomposition |
tucker_res2 |
list The container$tucker_res from first decomposition |
decomp_names |
character Names of the two decompositions that will go on the axes of the heatmap |
meta_anno1 |
matrix The result of calling get_meta_associations() corresponding to the first decomposition, which is stored in container$meta_associations (default=NULL) |
meta_anno2 |
matrix The result of calling get_meta_associations() corresponding to the second decomposition, which is stored in container$meta_associations (default=NULL) |
use_text |
logical If TRUE, then displays correlation coefficients in cells (default=TRUE) |
No return value, as the resulting plots are drawn.
test_container <- run_tucker_ica(test_container, ranks=c(2,4), tucker_type='regular', rotation_type='hybrid') tucker_res1 <- test_container$tucker_results test_container <- run_tucker_ica(test_container, ranks=c(2,4), tucker_type='regular', rotation_type='ica_dsc') tucker_res2 <- test_container$tucker_results compare_decompositions(tucker_res1,tucker_res2,c('hybrid_method','ica_method'))
test_container <- run_tucker_ica(test_container, ranks=c(2,4), tucker_type='regular', rotation_type='hybrid') tucker_res1 <- test_container$tucker_results test_container <- run_tucker_ica(test_container, ranks=c(2,4), tucker_type='regular', rotation_type='ica_dsc') tucker_res2 <- test_container$tucker_results compare_decompositions(tucker_res1,tucker_res2,c('hybrid_method','ica_method'))
Compute associations between donor proportions and factor scores
compute_associations(donor_balances, donor_scores, stat_type)
compute_associations(donor_balances, donor_scores, stat_type)
donor_balances |
matrx The balances computed from donor cell type proportions |
donor_scores |
data.frame The donor scores matrix from tucker results |
stat_type |
character Either "fstat" to get F-Statistics, "adj_rsq" to get adjusted R-squared values, or "adj_pval" to get adjusted pvalues. |
A numeric vector of association statistics (one for each factor)
Get donor proportions of each cell type or subtype
compute_donor_props(clusts, metadata)
compute_donor_props(clusts, metadata)
clusts |
integer Cluster assignments for each cell with names as cell barcodes |
metadata |
data.frame The $metadata field for the given scMinimal |
A data.frame of cluster proportions for each donor.
Compute and plot the LR interactions for one factor
compute_LR_interact( container, lr_pairs, sig_thresh = 0.05, percentile_exp_rec = 0.75, add_ld_fact_sig = TRUE, ncores = container$experiment_params$ncores )
compute_LR_interact( container, lr_pairs, sig_thresh = 0.05, percentile_exp_rec = 0.75, add_ld_fact_sig = TRUE, ncores = container$experiment_params$ncores )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
lr_pairs |
data.frame Data of ligand-receptor pairs. First column should be ligands and second column should be one or more receptors separated by an underscore such as receptor1_receptor2 in the case that multiple receptors are required for signaling. |
sig_thresh |
numeric The p-value significance threshold to use for module- factor associations and ligand-factor associations (default=0.05) |
percentile_exp_rec |
numeric The percentile above which the top donors expressing the ligand all must be expressing the receptor (default=0.75) |
add_ld_fact_sig |
logical Set to TRUE to append a heatmap showing significance of associations between each ligand hit and each factor (default=TRUE) |
ncores |
numeric The number of cores to use (default=container$experiment_params$ncores) |
The LR analysis results heatmap as ComplexHeatmap object. Adjusted p-values for all results are placed in container$lr_res.
Convert gene identifiers to gene symbols
convert_gn(container, genes)
convert_gn(container, genes)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
genes |
character Vector of the gene identifiers to be converted to gene symbols |
A character vector of gene symbols.
count_word. From older version of simplifyEnrichment package.
count_word(term, exclude_words = NULL)
count_word(term, exclude_words = NULL)
term |
A vector of description texts. |
exclude_words |
The words that should be excluded. |
A data frame with words and frequencies.
Run rank determination by svd on the tensor unfolded along each mode
determine_ranks_tucker( container, max_ranks_test, shuffle_level = "cells", shuffle_within = NULL, num_iter = 100, batch_var = NULL, norm_method = "trim", scale_factor = 10000, scale_var = TRUE, var_scale_power = 0.5, seed = container$experiment_params$rand_seed )
determine_ranks_tucker( container, max_ranks_test, shuffle_level = "cells", shuffle_within = NULL, num_iter = 100, batch_var = NULL, norm_method = "trim", scale_factor = 10000, scale_var = TRUE, var_scale_power = 0.5, seed = container$experiment_params$rand_seed )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
max_ranks_test |
numeric Vector of length 2 specifying the maximum number of donor and gene ranks to test |
shuffle_level |
character Either "cells" to shuffle cell-donor linkages or "tensor" to shuffle values within the tensor (default="cells") |
shuffle_within |
character A metadata variable to shuffle cell-donor linkages within (default=NULL) |
num_iter |
numeric Number of null iterations (default=100) |
batch_var |
character A batch variable from metadata to remove. No batch correction applied if NULL. (default=NULL) |
norm_method |
character The normalization method to use on the pseudobulked count data. Set to 'regular' to do standard normalization of dividing by library size. Set to 'trim' to use edgeR trim-mean normalization, whereby counts are divided by library size times a normalization factor. (default='trim') |
scale_factor |
numeric The number that gets multiplied by fractional counts during normalization of the pseudobulked data (default=10000) |
scale_var |
logical TRUE to scale the gene expression variance across donors for each cell type. If FALSE then all genes are scaled to unit variance across donors for each cell type. (default=TRUE) |
var_scale_power |
numeric Exponent of normalized variance that is used for variance scaling. Variance for each gene is initially set to unit variance across donors (for a given cell type). Variance for each gene is then scaled by multiplying the unit scaled values by each gene's normalized variance (where the effect of the mean-variance dependence is taken into account) to the exponent specified here. If NULL, uses var_scale_power from container$experiment_params. (default=.5) |
seed |
numeric Seed passed to set.seed() (default=container$experiment_params$rand_seed) |
The project container with a cowplot figure of rank determination plots in container$plots$rank_determination_plot.
test_container <- determine_ranks_tucker(test_container, max_ranks_test=c(3,5), shuffle_level='tensor', num_iter=4, norm_method='trim', scale_factor=10000, scale_var=TRUE, var_scale_power=.5)
test_container <- determine_ranks_tucker(test_container, max_ranks_test=c(3,5), shuffle_level='tensor', num_iter=4, norm_method='trim', scale_factor=10000, scale_var=TRUE, var_scale_power=.5)
Form the pseudobulk tensor as preparation for running the tensor decomposition.
form_tensor( container, donor_min_cells = 5, norm_method = "trim", scale_factor = 10000, vargenes_method = "norm_var", vargenes_thresh = 500, batch_var = NULL, scale_var = TRUE, var_scale_power = 0.5, custom_genes = NULL, verbose = TRUE )
form_tensor( container, donor_min_cells = 5, norm_method = "trim", scale_factor = 10000, vargenes_method = "norm_var", vargenes_thresh = 500, batch_var = NULL, scale_var = TRUE, var_scale_power = 0.5, custom_genes = NULL, verbose = TRUE )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
donor_min_cells |
numeric Minimum threshold for number of cells per donor (default=5) |
norm_method |
character The normalization method to use on the pseudobulked count data. Set to 'regular' to do standard normalization of dividing by library size. Set to 'trim' to use edgeR trim-mean normalization, whereby counts are divided by library size times a normalization factor. (default='trim') |
scale_factor |
numeric The number that gets multiplied by fractional counts during normalization of the pseudobulked data (default=10000) |
vargenes_method |
character The method by which to select highly variable genes from each cell type. Set to 'anova' to select genes by anova. Set to 'norm_var' to select the top genes by normalized variance or 'norm_var_pvals' to select genes by significance of their overdispersion (default='norm_var') |
vargenes_thresh |
numeric The threshold to use in variable gene selection. For 'anova' and 'norm_var_pvals' this should be a p-value threshold. For 'norm_var' this should be the number of most variably expressed genes to select from each cell type (default=500) |
batch_var |
character A batch variable from metadata to remove (default=NULL) |
scale_var |
logical TRUE to scale the gene expression variance across donors for each cell type. If FALSE then all genes are scaled to unit variance across donors for each cell type. (default=TRUE) |
var_scale_power |
numeric Exponent of normalized variance that is used for variance scaling. Variance for each gene is initially set to unit variance across donors (for a given cell type). Variance for each gene is then scaled by multiplying the unit scaled values by each gene's normalized variance (where the effect of the mean-variance dependence is taken into account) to the exponent specified here. If NULL, uses var_scale_power from container$experiment_params. (default=.5) |
custom_genes |
character A vector of genes to include in the tensor. Overrides the default gene selection if not NULL. (default=NULL) |
verbose |
logical Set to TRUE to print out progress (default=TRUE) |
The project container with a list of tensor data added in the container$tensor_data slot.
test_container <- form_tensor(test_container, donor_min_cells=0, norm_method='trim', scale_factor=10000, vargenes_method='norm_var', vargenes_thresh=500, scale_var = TRUE, var_scale_power = 1.5)
test_container <- form_tensor(test_container, donor_min_cells=0, norm_method='trim', scale_factor=10000, vargenes_method='norm_var', vargenes_thresh=500, scale_var = TRUE, var_scale_power = 1.5)
Generate loadings heatmaps for all factors
get_all_lds_factor_plots( container, use_sig_only = FALSE, nonsig_to_zero = FALSE, annot = "none", pathways_list = NULL, sim_de_donor_group = NULL, sig_thresh = 0.05, display_genes = FALSE, gene_callouts = FALSE, callout_n_gene_per_ctype = 5, callout_ctypes = NULL, show_var_explained = TRUE, reset_other_factor_plots = TRUE )
get_all_lds_factor_plots( container, use_sig_only = FALSE, nonsig_to_zero = FALSE, annot = "none", pathways_list = NULL, sim_de_donor_group = NULL, sig_thresh = 0.05, display_genes = FALSE, gene_callouts = FALSE, callout_n_gene_per_ctype = 5, callout_ctypes = NULL, show_var_explained = TRUE, reset_other_factor_plots = TRUE )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
use_sig_only |
logical If TRUE, includes only significant genes from jackstraw in the heatmap. If FALSE, includes all the variable genes. (default = FALSE) |
nonsig_to_zero |
logical If TRUE, makes the loadings of all nonsignificant genes 0 (default=FALSE) |
annot |
character If set to "pathways" then creates an adjacent heatmap showing which genes are in which pathways. If set to "sig_genes" then creates an adjacent heatmap showing which genes were significant from jackstraw. If set to "none" no adjacent heatmap is plotted. (default="none") |
pathways_list |
list A list of sets of pathways for each factor. List index should be the number corresponding to the factor. (default=NULL) |
sim_de_donor_group |
numeric To plot the ground truth significant genes from a simulation next to the heatmap, put the number of the donor group that corresponds to the factor being plotted. Here it should be a vector corresponding to the factors. (default=NULL) |
sig_thresh |
numeric Pvalue significance threshold to use. If use_sig_only is TRUE the threshold is used as a cutoff for genes to include. If annot is "sig_genes" this value is used in the gene significance colormap as a minimum threshold. (default=0.05) |
display_genes |
logical If TRUE, displays the names of gene names (default=FALSE) |
gene_callouts |
logical If TRUE, then adds gene callout annotations to the heatmap (default=FALSE) |
callout_n_gene_per_ctype |
numeric To use if gene_callouts is TRUE. Sets the number of largest magnitude significant genes from each cell type to include in gene callouts. (default=5) |
callout_ctypes |
list To use if gene_callouts is TRUE. Specifies which cell types to get gene callouts for. Each entry of the list should be a character vector of ctypes for the respective factor. If NULL, then gets gene callouts for largest magnitude significant genes for all cell types. (default=NULL) |
show_var_explained |
logical If TRUE then shows an anottation with the explained variance for each cell type (default=TRUE) |
reset_other_factor_plots |
logical If TRUE then removes any existing loadings plots (default=TRUE) |
The project container with the list of all loadings heatmap plots placed in container$plots$all_lds_plots.
test_container <- get_all_lds_factor_plots(test_container)
test_container <- get_all_lds_factor_plots(test_container)
Get gene callout annotations for a loadings heatmap
get_callouts_annot( container, tmp_casted_num, factor_select, sig_thresh, top_n_per_ctype = 5, ctypes = NULL )
get_callouts_annot( container, tmp_casted_num, factor_select, sig_thresh, top_n_per_ctype = 5, ctypes = NULL )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
tmp_casted_num |
matrix The gene by cell type loadings matrix |
factor_select |
numeric The factor to investigate |
sig_thresh |
numeric Pvalue cutoff for significant genes |
top_n_per_ctype |
numeric The number of significant, largest magnitude genes from each cell type to generate callouts for (default=5) |
ctypes |
character The cell types for which to get the top genes to make callouts for. If NULL then uses all cell types. (default=NULL) |
A HeatmapAnnotation object for the gene callouts.
Get explained variance of the reconstructed data using one cell type from one factor
get_ctype_exp_var(container, factor_use, ctype)
get_ctype_exp_var(container, factor_use, ctype)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_use |
numeric The factor to get variance explained for |
ctype |
character The cell type to get variance explained for |
The explained variance numeric value for one cell type of one factor.
Compute and plot associations between donor factor scores and donor proportions of major cell types
get_ctype_prop_associations(container, stat_type, n_col = 2)
get_ctype_prop_associations(container, stat_type, n_col = 2)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
stat_type |
character Either "fstat" to get F-Statistics, "adj_rsq" to get adjusted R-squared values, or "adj_pval" to get adjusted pvalues. |
n_col |
numeric The number of columns to organize the plots into (default=2) |
The project container with a cowplot figure of results plots in container$plots$ctype_prop_factor_associations.
Compute and plot associations between donor factor scores and donor proportions of cell subtypes
get_ctype_subc_prop_associations( container, ctype, res, n_col = 2, alt_name = NULL )
get_ctype_subc_prop_associations( container, ctype, res, n_col = 2, alt_name = NULL )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ctype |
character The cell type to get results for |
res |
numeric The clustering resolution to retrieve |
n_col |
numeric The number of columns to organize the plots into (default=2) |
alt_name |
character Alternate name for the cell type used in clustering (default=NULL) |
The project container with a cowplot figure of results plots in container$plots$ctype_prop_factor_associations.
Partition main gene by cell matrix into per cell type matrices with significantly variable genes only. Generally, this should be done through calling the form_tensor() wrapper function.
get_ctype_vargenes( container, method, thresh, ncores = container$experiment_params$ncores, seed = container$experiment_params$rand_seed )
get_ctype_vargenes( container, method, thresh, ncores = container$experiment_params$ncores, seed = container$experiment_params$rand_seed )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
method |
character The method used to select significantly variable genes across donors within a cell type. Can be either "anova" to use basic anova with cells grouped by donor or "norm_var" to get the top overdispersed genes by normalized variance. Set to "norm_var_pvals" to use normalized variance p-values as calculated in pagoda2. |
thresh |
numeric A pvalue threshold to use for gene significance when method is set to "anova" or "empir". For the method "norm_var" thresh is the number of top overdispersed genes from each cell type to include. |
ncores |
numeric The number of cores to use (default=container$experiment_params$ncores) |
seed |
numeric Seed passed to set.seed() (default=container$experiment_params$rand_seed) |
The project container with pseudobulk matrices limted to the selected most variable genes.
Get metadata matrix of dimensions donors by variables (not per cell)
get_donor_meta(container, additional_meta = NULL, only_analyzed = TRUE)
get_donor_meta(container, additional_meta = NULL, only_analyzed = TRUE)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
additional_meta |
character A vector of other variables to include (default=NULL) |
only_analyzed |
logical Set to TRUE to only include donors that were included in the formed tensor, otherwise set to FALSE (default=TRUE) |
The project container with metadata per donor (not per cell) in container$donor_metadata.
test_container <- get_donor_meta(test_container, additional_meta='lanes')
test_container <- get_donor_meta(test_container, additional_meta='lanes')
Get the explained variance of the reconstructed data using one factor
get_factor_exp_var(container, factor_use)
get_factor_exp_var(container, factor_use)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_use |
numeric The factor to investigate |
The explained variance numeric value for one factor.
Calculate adjusted p-values for gene_celltype fiber-donor score associations
get_fstats_pvals(fstats_real, fstats_shuffled)
get_fstats_pvals(fstats_real, fstats_shuffled)
fstats_real |
numeric A vector of F-Statistics for gene-cell type-factor combinations |
fstats_shuffled |
numeric A vector of null F-Statistics |
A vector of adjusted p-values for associations of the unshuffled fibers with factor donor scores.
Compute WGCNA gene modules for each cell type
get_gene_modules(container, sft_thresh)
get_gene_modules(container, sft_thresh)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
sft_thresh |
numeric A vector indicating the soft threshold to use for each cell type. Length should be the same as container$experiment_params$ctypes_use |
The project container with WGCNA gene co-expression modules added. The module eigengenes for each cell type are in container$module_eigengenes, and the module genes for each cell type are in container$module_genes.
Get logical vectors indicating which genes are in which pathways
get_gene_set_vectors(container, gene_sets, tmp_casted_num)
get_gene_set_vectors(container, gene_sets, tmp_casted_num)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
gene_sets |
character Vector of gene sets to extract genes for |
tmp_casted_num |
matrix The gene by cell type loadings matrix |
A list of the logical vectors for each pathway.
Compute subtype proportion-factor association p-values for all subclusters of a given major cell type
get_indv_subtype_associations(container, donor_props, factor_select)
get_indv_subtype_associations(container, donor_props, factor_select)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
donor_props |
matrix Donor proportions of subtypes |
factor_select |
numeric The factor to get associations for |
A vector of association statistics each cell subtype against a selected factor.
Extract the intersection of gene sets which are enriched in two or more cell types for a factor
get_intersecting_pathways( container, factor_select, these_ctypes_only, up_down, thresh = 0.05 )
get_intersecting_pathways( container, factor_select, these_ctypes_only, up_down, thresh = 0.05 )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor to investigate |
these_ctypes_only |
character A vector of cell types for which to get gene sets that are enriched in all of these and not in any other cell types |
up_down |
character Set to "up" to get the gene sets for the positive loading genes. Set to "down" to get the gene sets for the negative loadings genes. |
thresh |
numeric Pvalue significance threshold for selecting enriched sets (default=0.05) |
A vector of the intersection of pathways that are significantly enriched in two or more cell types for a factor.
Get the leading edge genes from GSEA results
get_leading_edge_genes(container, factor_select, gsets, num_genes_per = 5)
get_leading_edge_genes(container, factor_select, gsets, num_genes_per = 5)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor to get results for |
gsets |
character A vector of gene set names to get leading edge genes for. |
num_genes_per |
numeric The maximum number of leading edge genes to get for each gene set (default=5) |
A named character vector of gene sets, with leading edge genes as the names.
Compute gene-factor associations using univariate linear models
get_lm_pvals(container, n.cores = container$experiment_params$ncores)
get_lm_pvals(container, n.cores = container$experiment_params$ncores)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
n.cores |
Number of cores to use (default = container$experiment_params$ncores) |
The project container with a vector of adjusted p-values for the gene-factor associations in container$gene_score_associations.
test_container <- get_lm_pvals(test_container, n.cores=1)
test_container <- get_lm_pvals(test_container, n.cores=1)
Computes the max correlation between each factor of the decomposition done using the whole dataset to each factor computed using the subsampled/bootstrapped dataset
get_max_correlations(res_full, res_sub, res_use)
get_max_correlations(res_full, res_sub, res_use)
res_full |
matrix Either the donor scores or loadings matrix from the original decomposition |
res_sub |
matrix Either the donor scores or loadings matrix from the new decomposition |
res_use |
character Can either be 'loadings' or 'dscores' and should correspond with the data matrix used |
a vector of the max correlations for each original factor
Get metadata associations with factor donor scores
get_meta_associations(container, vars_test, stat_use = "rsq")
get_meta_associations(container, vars_test, stat_use = "rsq")
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
vars_test |
character The names of meta variables to get associations for |
stat_use |
character Set to either 'rsq' to get r-squared values or 'pval' to get adjusted pvalues (default='rsq) |
The project container with a matrix of metadata associations with each factor in container$meta_associations.
test_container <- get_meta_associations(test_container, vars_test='lanes', stat_use='pval')
test_container <- get_meta_associations(test_container, vars_test='lanes', stat_use='pval')
Evaluate the minimum number for significant genes in any factor for a given number of factors extracted by the decomposition
get_min_sig_genes( container, donor_rank_range, gene_ranks, use_lm = TRUE, tucker_type = "regular", rotation_type = "hybrid", n_fibers = 100, n_iter = 500, n.cores = container$experiment_params$ncores, thresh = 0.05 )
get_min_sig_genes( container, donor_rank_range, gene_ranks, use_lm = TRUE, tucker_type = "regular", rotation_type = "hybrid", n_fibers = 100, n_iter = 500, n.cores = container$experiment_params$ncores, thresh = 0.05 )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses. Should have |
donor_rank_range |
numeric Range of possible number of donor factors to use. |
gene_ranks |
numeric The number of gene ranks to use in the decomposition |
use_lm |
logical Set to true to use get_lm_pvals otherwise uses jackstraw (default=TRUE) |
tucker_type |
character Set to 'regular' to run regular tucker or to 'sparse' to run tucker with sparsity constraints (default='regular') |
rotation_type |
character Set to 'hybrid' to perform hybrid rotation on resulting donor factor matrix and loadings. Otherwise set to 'ica_lds' to perform ica rotation on loadings or ica_dsc to perform ica on donor scores. (default='hybrid') |
n_fibers |
numeric The number of fibers the randomly shuffle in each jackstraw iteration (default=100) |
n_iter |
numeric The number of jackstraw shuffling iterations to complete (default=500) |
n.cores |
Number of cores to use in get_lm_pvals() (default = container$experiment_params$ncores) |
thresh |
numeric Pvalue threshold for significant genes in calculating the number of significant genes identified per factor. (default=0.05) |
The project container with a plot of the minimum significant genes for each decomposition with varying number of donor factors located in container$plots$min_sig_genes.
test_container <- get_min_sig_genes(test_container, donor_rank_range=c(2:4), gene_ranks=4, tucker_type='regular', rotation_type='hybrid', n.cores=1)
test_container <- get_min_sig_genes(test_container, donor_rank_range=c(2:4), gene_ranks=4, tucker_type='regular', rotation_type='hybrid', n.cores=1)
Identify gene sets that are enriched within specified gene co-regulatory modules. Uses a hypergeometric test for over-representation. Used in plot_multi_module_enr().
get_module_enr(container, ctype, mod_select, db_use = "GO", adjust_pval = TRUE)
get_module_enr(container, ctype, mod_select, db_use = "GO", adjust_pval = TRUE)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ctype |
character The name of cell type for the cell type module to test |
mod_select |
numeric The module number for the cell type module to test |
db_use |
character The database of gene sets to use. Database options include "GO", "Reactome", "KEGG", "BioCarta", "Hallmark", "TF", and "immuno". More than one database can be used. (default="GO") |
adjust_pval |
logical Set to TRUE to apply FDR correction (default=TRUE) |
A vector of p-values for the tested gene sets.
Get normalized variance for each gene, taking into account mean-variance trend
get_normalized_variance(container)
get_normalized_variance(container)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
The project container with vectors of normalized variances values in scMinimal objects for each cell type. Generally, this should be done through calling the form_tensor() wrapper function.
Plot factor-batch associations for increasing number of donor factors
get_num_batch_ranks( container, donor_ranks_test, gene_ranks, batch_var, thresh = 0.5, tucker_type = "regular", rotation_type = "hybrid" )
get_num_batch_ranks( container, donor_ranks_test, gene_ranks, batch_var, thresh = 0.5, tucker_type = "regular", rotation_type = "hybrid" )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
donor_ranks_test |
numeric The number of donor rank values to test |
gene_ranks |
numeric The number of gene ranks to use throughout |
batch_var |
character The name of the batch meta variable |
thresh |
numeric The threshold r-squared cutoff for considering a factor to be a batch factor. Can be a vector of multiple values to get plots at varying thresholds. (default=0.5) |
tucker_type |
character Set to 'regular' to run regular tucker or to 'sparse' to run tucker with sparsity constraints (default='regular') |
rotation_type |
character Set to 'hybrid' to optimize loadings via our hybrid method (see paper for details). Set to 'ica_dsc' to perform ICA rotation on resulting donor factor matrix. Set to 'ica_lds' to optimize loadings by the ICA rotation. (default='hybrid') |
A ggpubr figure of ggplot objects showing batch-factor associations and placed in container$plots$num_batch_factors slot
test_container <- get_num_batch_ranks(test_container, donor_ranks_test=c(2:4), gene_ranks=10, batch_var='lanes', thresh=0.5, tucker_type='regular', rotation_type='hybrid')
test_container <- get_num_batch_ranks(test_container, donor_ranks_test=c(2:4), gene_ranks=10, batch_var='lanes', thresh=0.5, tucker_type='regular', rotation_type='hybrid')
Get the donor scores and loadings matrix for a single-factor
get_one_factor(container, factor_select)
get_one_factor(container, factor_select)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The number corresponding to the factor to extract |
A list with the first element as the donor scores and the second element as the corresponding loadings matrix for one factor.
f1_res <- get_one_factor(test_container, factor_select=1)
f1_res <- get_one_factor(test_container, factor_select=1)
Get significant genes for a factor
get_one_factor_gene_pvals(container, factor_select)
get_one_factor_gene_pvals(container, factor_select)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The number corresponding to the factor to extract |
A gene by cell type matrix of gene significance p-values for a factor
Collapse data from cell-level to donor-level via summing counts. Generally, this should be done through calling the form_tensor() wrapper function.
get_pseudobulk(container, shuffle = FALSE, shuffle_within = NULL)
get_pseudobulk(container, shuffle = FALSE, shuffle_within = NULL)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
shuffle |
logical Set to TRUE to shuffle cell-donor linkages (default=FALSE) |
shuffle_within |
character A metadata variable to shuffle cell-donor linkages within (default=NULL) |
The project container with pseudobulked count matrices in container$scMinimal_ctype$<ctype>$pseudobulk slots for each cell type.
Get F-Statistics for the real (non-shuffled) gene_ctype fibers
get_real_fstats(container, ncores)
get_real_fstats(container, ncores)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ncores |
numeric The number of cores to use |
A vector F-statistics for each gene_celltype-factor association of the unshuffled data.
Calculate reconstruction errors using svd approach
get_reconstruct_errors_svd(tnsr, max_ranks_test, shuffle_tensor)
get_reconstruct_errors_svd(tnsr, max_ranks_test, shuffle_tensor)
tnsr |
array A 3-dimensional array with dimensions of donors, genes, and cell types in that order |
max_ranks_test |
numeric Vector of length 3 with maximum number of ranks to test for donor, gene, and cell type modes in that order |
shuffle_tensor |
logical Set to TRUE to shuffle values within the tensor |
A list of reconstruction errors for each mode of the tensor.
Get vectors indicating which genes are significant in which cell types for a factor of interest
get_significance_vectors(container, factor_select, ctypes)
get_significance_vectors(container, factor_select, ctypes)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor to query |
ctypes |
character The cell types used in all the analysis ordered as they appear in the loadings matrix |
A list of the adjusted p-values for expression of each gene in each cell type in association with a factor of interest.
Get list of cell subtype differential expression heatmaps
get_subclust_de_hmaps(container, all_ctypes, all_res)
get_subclust_de_hmaps(container, all_ctypes, all_res)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
all_ctypes |
character A vector of the cell types to include |
all_res |
numeric A vector of resolutions matching the all_ctypes parameter |
A list of cell subcluster DE marker gene heatmaps as grob objects.
Get scatter plot for association of a cell subtype proportion with scores for a factor
get_subclust_enr_dotplot( container, ctype, res, subtype, factor_use, ctype_cur = ctype )
get_subclust_enr_dotplot( container, ctype, res, subtype, factor_use, ctype_cur = ctype )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ctype |
character The cell type to plot |
res |
numeric The subcluster resolution to use |
subtype |
numeric The number corresponding with the subtype of the major cell type to plot |
factor_use |
numeric The factor to plot |
ctype_cur |
character The name of the major cell type used in the main analysis |
A ggplot object of each donor's cell subcluster proportions against donor scores for a selected factor.
Get a figure showing cell subtype proportion associations with each factor. Combines this plot with subtype UMAPs and differential expression heatmaps. Note that this function runs better if the number of cores in the conos object in container$embedding has n.cores set to a relatively small value < 10.
get_subclust_enr_fig(container, all_ctypes, all_res)
get_subclust_enr_fig(container, all_ctypes, all_res)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
all_ctypes |
character A vector of the cell types to include |
all_res |
numeric A vector of resolutions matching the all_ctypes parameter |
A cowplot figure placed in the slot container$plots$subc_fig.
Get heatmap of subtype proportion associations for each celltype/subtype and each factor
get_subclust_enr_hmap(container, all_ctypes, all_res, all_factors)
get_subclust_enr_hmap(container, all_ctypes, all_res, all_factors)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
all_ctypes |
character A vector of the cell types to include |
all_res |
numeric A vector of resolutions matching the all_ctypes parameter |
all_factors |
numerc A vector of the factors to compute associations for |
A ComplexHeatmap object in container$plots$subc_enr_hmap showing the univariate associations between cell subcluster proportions and each factor.
Get a figure to display subclusterings at multiple resolutions
get_subclust_umap(container, all_ctypes, all_res, n_col = 3)
get_subclust_umap(container, all_ctypes, all_res, n_col = 3)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
all_ctypes |
character A vector of the cell types to include |
all_res |
numeric A vector of resolutions matching the all_ctypes parameter |
n_col |
numeric The number of columns to organize the figure into (default=3) |
The project container with a cowplot figure of all UMAP plots in container$plots$subc_umap_fig and the individual umap plots in container$plots$subc_umaps
Perform leiden subclustering to get cell subtypes
get_subclusters( container, ctype, resolution, min_cells_group = 50, small_clust_action = "merge" )
get_subclusters( container, ctype, resolution, min_cells_group = 50, small_clust_action = "merge" )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ctype |
character The cell type to do subclustering for |
resolution |
numeric The leiden resolution to use |
min_cells_group |
numeric The minimum allowable cluster size (default=50) |
small_clust_action |
character Either 'remove' to remove subclusters or 'merge' to merge clusters below min_cells_group threshold to the nearest cluster above the size threshold (default='merge') |
A vector of cell subclusters.
Compute and plot associations between factor scores and cell subtype composition for various clustering resolution parameters
get_subtype_prop_associations( container, max_res, stat_type, integration_var = NULL, min_cells_group = 50, use_existing_subc = FALSE, alt_ct_names = NULL, n_col = 2 )
get_subtype_prop_associations( container, max_res, stat_type, integration_var = NULL, min_cells_group = 50, use_existing_subc = FALSE, alt_ct_names = NULL, n_col = 2 )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
max_res |
numeric The maximum clustering resolution to use. Minimum is 0.5. |
stat_type |
character Either "fstat" to get F-Statistics, "adj_rsq" to get adjusted R-squared values, or "adj_pval" to get adjusted pvalues. |
integration_var |
character The meta data variable to use for creating the joint embedding with Conos if not already provided in container$embedding (default=NULL) |
min_cells_group |
numeric The minimum allowable size for cell subpopulations (default=50) |
use_existing_subc |
logical Set to TRUE to use existing subcluster annotations (default=FALSE) |
alt_ct_names |
character Cell type names used in clustering if different from those used in the main analysis. Should match the order of container$experiment_params$ctypes_use. (default=NULL) |
n_col |
numeric The number of columns to organize the plots into (default=2) |
The project container with a cowplot figure of cell subtype proportion-factor association results plots in container$plots$subtype_prop_factor_associations.
Calculates factor-stratified sums for each column. Adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/src/misc2.cpp
get_sums(sY, rowSel)
get_sums(sY, rowSel)
sY |
sparse matrix Gene by cell matrix of counts |
rowSel |
factor The donor that each cell is from |
matrix of summed counts per gene per sample
Visualize the similarity matrix and the clustering. Adapted from simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/ht_clusters.R
ht_clusters( mat, cl, dend = NULL, col = c("white", "red"), draw_word_cloud = is_GO_id(rownames(mat)[1]) || !is.null(term), term = NULL, min_term = 5, order_by_size = FALSE, exclude_words = character(0), max_words = 10, word_cloud_grob_param = list(), fontsize_range = c(4, 16), column_title = NULL, ht_list = NULL, use_raster = TRUE, ... )
ht_clusters( mat, cl, dend = NULL, col = c("white", "red"), draw_word_cloud = is_GO_id(rownames(mat)[1]) || !is.null(term), term = NULL, min_term = 5, order_by_size = FALSE, exclude_words = character(0), max_words = 10, word_cloud_grob_param = list(), fontsize_range = c(4, 16), column_title = NULL, ht_list = NULL, use_raster = TRUE, ... )
mat |
A similarity matrix. |
cl |
Cluster labels inferred from the similarity matrix, e.g. from 'cluster_terms' or 'binary_cut'. |
dend |
Used internally. |
col |
A vector of colors that map from 0 to the 95^th percentile of the similarity values. |
draw_word_cloud |
Whether to draw the word clouds. |
term |
The full name or the description of the corresponding GO IDs. |
min_term |
Minimal number of functional terms in a cluster. All the clusters with size less than “min_term“ are all merged into one separated cluster in the heatmap. |
order_by_size |
Whether to reorder clusters by their sizes. The cluster that is merged from small clusters (size < “min_term“) is always put to the bottom of the heatmap. |
exclude_words |
Words that are excluded in the word cloud. |
max_words |
Maximal number of words visualized in the word cloud. |
word_cloud_grob_param |
A list of graphic parameters passed to 'word_cloud_grob'. |
fontsize_range |
The range of the font size. The value should be a numeric vector with length two. The minimal font size is mapped to word frequency value of 1 and the maximal font size is mapped to the maximal word frequency. The font size interlopation is linear. |
column_title |
Column title for the heatmap. |
ht_list |
A list of additional heatmaps added to the left of the similarity heatmap. |
use_raster |
Whether to write the heatmap as a raster image. |
... |
other parameters |
A list containing a 'ComplexHeatmap::HeatmapList-class' object and GO term ordering.
Extract metadata for sex information if not provided already
identify_sex_metadata(container, y_gene = "RPS4Y1", x_gene = "XIST")
identify_sex_metadata(container, y_gene = "RPS4Y1", x_gene = "XIST")
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
y_gene |
character Gene name to use for identifying male donors (default='RPS4Y1') |
x_gene |
character Gene name to use for identifying female donors (default='XIST') |
The project container with sex metadata added to the metadata.
Initialize parameters to be used throughout scITD in various functions
initialize_params(ctypes_use, ncores = 4, rand_seed = 10)
initialize_params(ctypes_use, ncores = 4, rand_seed = 10)
ctypes_use |
character Names of the cell types to use for the analysis (default=NULL) |
ncores |
numeric Number of cores to use (default=4) |
rand_seed |
numeric Random seed to use (default=10) |
A list of the experiment parameters to use.
param_list <- initialize_params(ctypes_use = c("CD4+ T", "CD8+ T"), ncores = 1, rand_seed = 10)
param_list <- initialize_params(ctypes_use = c("CD4+ T", "CD8+ T"), ncores = 1, rand_seed = 10)
Create an scMinimal object. Generally, this should be done through calling the make_new_container() wrapper function.
instantiate_scMinimal( count_data, meta_data, metadata_cols = NULL, metadata_col_nm = NULL )
instantiate_scMinimal( count_data, meta_data, metadata_cols = NULL, metadata_col_nm = NULL )
count_data |
sparseMatrix Matrix of raw counts with genes as rows and cells as columns |
meta_data |
data.frame Metadata with cells as rows and variables as columns. Number of rows in metadata should equal number of columns in count matrix. |
metadata_cols |
character The names of the metadata columns to use (default=NULL) |
metadata_col_nm |
character New names for the selected metadata columns if wish to change their names. If NULL, then the preexisting column names are used. (default=NULL) |
An scMinimal object holding counts and metadata for a project.
scMinimal <- instantiate_scMinimal(count_data=test_container$scMinimal_full$count_data, meta_data=test_container$scMinimal_full$metadata)
scMinimal <- instantiate_scMinimal(count_data=test_container$scMinimal_full$count_data, meta_data=test_container$scMinimal_full$metadata)
Check if a character is a go ID
is_GO_id(x)
is_GO_id(x)
x |
A character |
A logical
Create a container to store all data and results for the project. You must provide a params list as generated by initialize_params(). You also need to provide either a Seurat object or both a count_data matrix and a meta_data matrix.
make_new_container( params, count_data = NULL, meta_data = NULL, seurat_obj = NULL, scMinimal = NULL, gn_convert = NULL, metadata_cols = NULL, metadata_col_nm = NULL, label_donor_sex = FALSE )
make_new_container( params, count_data = NULL, meta_data = NULL, seurat_obj = NULL, scMinimal = NULL, gn_convert = NULL, metadata_cols = NULL, metadata_col_nm = NULL, label_donor_sex = FALSE )
params |
list A list of the experiment params to use as generated by initialize_params() |
count_data |
dgCMatrix Matrix of raw counts with genes as rows and cells as columns (default=NULL) |
meta_data |
data.frame Metadata with cells as rows and variables as columns. Number of rows in metadata should equal number of columns in count matrix (default=NULL) |
seurat_obj |
Seurat object that has been cleaned and includes the normalized, log-transformed counts. The meta.data should include a column with the header 'sex' and values of 'M' or 'F' if available. The metadata should also have a column with the header 'ctypes' with the corresponding names of the cell types as well as a column with header 'donors' that contains identifiers for each donor. (default=NULL) |
scMinimal |
environment A sub-container for the project typically consisting of gene expression data in its raw and processed forms as well as metadata (default=NULL) |
gn_convert |
data.frame Gene identifier -> gene name conversions table. Gene identifiers used in counts matrices should appear in the first column and the corresponding gene symbols should appear in the second column. Can remain NULL if the identifiers are already gene symbols. (default=NULL) |
metadata_cols |
character The names of the metadata columns to use (default=NULL) |
metadata_col_nm |
character New names for the selected metadata columns if wish to change their names. If NULL, then the preexisting column names are used. (default=NULL) |
label_donor_sex |
logical Set to TRUE to label donor sex in the meta data by using expressing of sex-associated genes (default=FALSE) |
A project container of class environment that stores sub-containers for each cell type as well as results and plots from all analyses.
Merge small subclusters into larger ones
merge_small_clusts(con, clusts, min_cells_group)
merge_small_clusts(con, clusts, min_cells_group)
con |
conos Object for the dataset with umap projection and groups as cell types |
clusts |
character The initially assigned subclusters by leiden clustering |
min_cells_group |
numeric The minimum allowable cluster size |
The subcluster labels with small clusters below the size threshold merged into the nearest larger cluster.
Computes non-negative matrix factorization on the tensor unfolded along the donor dimension
nmf_unfolded(container, ranks)
nmf_unfolded(container, ranks)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ranks |
numeric The number of factors to extract. Unlike with the Tucker decomposition, this should be a single number. |
The project container with results of the decomposition in container$tucker_results. The results object is a list with the donor scores matrix in the first element and the unfolded loadings matrix in the second element.
test_container <- nmf_unfolded(test_container, 2)
test_container <- nmf_unfolded(test_container, 2)
Calculates the normalized variance for each gene. This is adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/R/Pagoda2.R Generally, this should be done through calling the form_tensor() wrapper function.
norm_var_helper(scMinimal)
norm_var_helper(scMinimal)
scMinimal |
environment A sub-container for the project typically consisting of gene expression data in its raw and processed forms as well as metadata |
A list with the first element containing a vector of the normalized variance for each gene and the second element containing log-transformed adjusted p-values for the overdispersion of each gene.
Helper function to normalize and log-transform count data
normalize_counts(count_data, scale_factor = 10000)
normalize_counts(count_data, scale_factor = 10000)
count_data |
matrix or sparse matrix Gene by cell matrix of counts |
scale_factor |
numeric The number that gets multiplied by fractional counts during normalization of the pseudobulked data (default=10000) |
The normalized, log-transformed matrix.
Normalize the pseudobulked counts matrices. Generally, this should be done through calling the form_tensor() wrapper function.
normalize_pseudobulk(container, method = "trim", scale_factor = 10000)
normalize_pseudobulk(container, method = "trim", scale_factor = 10000)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
method |
character The normalization method to use on the pseudobulked count data. Set to 'regular' to do standard normalization of dividing by library size. Set to 'trim' to use edgeR trim-mean normalization, whereby counts are divided by library size times a normalization factor. (default='trim') |
scale_factor |
numeric The number that gets multiplied by fractional counts during normalization of the pseudobulked data (default=10000) |
The project container with normalized pseudobulk matrices in container$scMinimal_ctype$<ctype>$pseudobulk slots.
Parse main counts matrix into per-celltype-matrices. Generally, this should be done through calling the form_tensor() wrapper function.
parse_data_by_ctypes(container)
parse_data_by_ctypes(container)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
The project container with separate scMinimal objects per cell type in the container$scMinimal_ctype slot
Computes singular-value decomposition on the tensor unfolded along the donor dimension
pca_unfolded(container, ranks)
pca_unfolded(container, ranks)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ranks |
numeric The number of factors to extract. Unlike with the Tucker decomposition, this should be a single number. |
The project container with results of the decomposition in container$tucker_results. The results object is a list with the donor scores matrix in the first element and the unfolded loadings matrix in the second element.
test_container <- pca_unfolded(test_container, 2)
test_container <- pca_unfolded(test_container, 2)
Plot matrix of donor scores extracted from Tucker decomposition
plot_donor_matrix( container, meta_vars = NULL, cluster_by_meta = NULL, show_donor_ids = FALSE, add_meta_associations = NULL, show_var_explained = TRUE, donors_sel = NULL, h_w = NULL )
plot_donor_matrix( container, meta_vars = NULL, cluster_by_meta = NULL, show_donor_ids = FALSE, add_meta_associations = NULL, show_var_explained = TRUE, donors_sel = NULL, h_w = NULL )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
meta_vars |
character Names of metadata variables to plot alongside the donor scores. Can include more than one variable. (default=NULL) |
cluster_by_meta |
character One metadata variable to cluster the heatmap by. If NULL, donor clustering is done using donor scores. (default=NULL) |
show_donor_ids |
logical Set to TRUE to show donor id as row name on the heamap (default=FALSE) |
add_meta_associations |
character Adds meta data associations with each factor as top annotation. These should be generated first with plot_meta_associations(). Set to 'pval' if used 'pval' in plot_meta_associations(), otherwise set to 'rsq'. If NULL, no annotation is added. (default=NULL) |
show_var_explained |
logical Set to TRUE to display the explained variance for each factor (default=TRUE) |
donors_sel |
character A vector of a subset of donors to include in the plot (default=NULL) |
h_w |
numeric Vector specifying height and width (defualt=NULL) |
The project container with a heatmap plot of donor scores in container$plots$donor_matrix.
test_container <- plot_donor_matrix(test_container, show_donor_ids = TRUE)
test_container <- plot_donor_matrix(test_container, show_donor_ids = TRUE)
Plot donor celltype/subtype proportions against each factor
plot_donor_props( donor_props, donor_scores, significance, ctype_mapping = NULL, stat_type = "adj_pval", n_col = 2 )
plot_donor_props( donor_props, donor_scores, significance, ctype_mapping = NULL, stat_type = "adj_pval", n_col = 2 )
donor_props |
data.frame Donor proportions as output from compute_donor_props() |
donor_scores |
data.frame Donor scores from tucker results |
significance |
numeric F-Statistics as output from compute_associations() |
ctype_mapping |
character The cell types corresponding with columns of donor_props (default=NULL) |
stat_type |
character Either "fstat" to get F-Statistics, "adj_rsq" to get adjusted R-squared values, or "adj_pval" to get adjusted pvalues (default='adj_pval') |
n_col |
numeric The number of columns to organize the plots into (default=2) |
A cowplot figure of ggplot objects for proportions of each cell type against donor factor scores for each factor.
Generate a gene by donor heatmap showing scaled expression of top loading genes for a given factor
plot_donor_sig_genes( container, factor_select, top_n_per_ctype, ctypes_use = NULL, show_donor_labels = FALSE, additional_meta = NULL, add_genes = NULL )
plot_donor_sig_genes( container, factor_select, top_n_per_ctype, ctypes_use = NULL, show_donor_labels = FALSE, additional_meta = NULL, add_genes = NULL )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor to query |
top_n_per_ctype |
numeric Vector of the number of top genes from each cell type to plot |
ctypes_use |
character The cell types for which to get the top genes to make callouts for. If NULL then uses all cell types. (default=NULL) |
show_donor_labels |
logical Set to TRUE to display donor labels (default=FALSE) |
additional_meta |
character Another meta variable to plot (default=NULL) |
add_genes |
character Additional genes to plot for all ctypes (default=NULL) |
The project container with a heatmap plot in the slot container$plots$donor_sig_genes$<Factor#>. This heatmap shows scaled expression of top loading genes in each cell type for a selected factor.
test_container <- plot_donor_sig_genes(test_container, factor_select=1, top_n_per_ctype=2)
test_container <- plot_donor_sig_genes(test_container, factor_select=1, top_n_per_ctype=2)
Compute enrichment of donor metadata categorical variables at high/low factor scores
plot_dscore_enr(container, factor_use, meta_var)
plot_dscore_enr(container, factor_use, meta_var)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_use |
numeric The factor to test |
meta_var |
character The name of the metadata variable to test |
A cowplot figure of enrichment plots.
fig <- plot_dscore_enr(test_container, factor_use=1, meta_var='lanes')
fig <- plot_dscore_enr(test_container, factor_use=1, meta_var='lanes')
Plot enriched gene sets from all cell types in a heatmap
plot_gsea_hmap(container, factor_select, thresh)
plot_gsea_hmap(container, factor_select, thresh)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor to plot |
thresh |
numeric Pvalue threshold to use for including gene sets in the heatmap |
A stacked heatmap object from ComplexHeatmap.
Plot already computed enriched gene sets to show semantic similarity between sets
plot_gsea_hmap_w_similarity( container, factor_select, direc, thresh, exclude_words = character(0) )
plot_gsea_hmap_w_similarity( container, factor_select, direc, thresh, exclude_words = character(0) )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor to plot |
direc |
character Set to either 'up' or 'down' to use the appropriate sets |
thresh |
numeric Pvalue threshold to use for including gene sets in the heatmap |
exclude_words |
character Vector of words to exclude from word cloud (default=character(0)) |
No value is returned. A heatmap showing enriched gene sets clustered by semantic similarity is drawn.
Look at enriched gene sets from a cluster of semantically similar gene sets. Uses the results from previous run of plot_gsea_hmap_w_similarity()
plot_gsea_sub(container, clust_select, thresh = 0.05)
plot_gsea_sub(container, clust_select, thresh = 0.05)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
clust_select |
numeric The cluster to plot gene sets from. On the previous semantic similarity plot, cluster numbering starts from the top as 1. |
thresh |
numeric Color threshold to use for showing significance (default=0.05) |
A heatmap plot from ComplexHeatmap showing one semantic similarity cluster of enriched gene sets with adjusted p-values for each cell type.
Plot the gene by celltype loadings for a factor
plot_loadings_annot( container, factor_select, use_sig_only = FALSE, nonsig_to_zero = FALSE, annot = "none", pathways = NULL, sim_de_donor_group = NULL, sig_thresh = 0.05, display_genes = FALSE, gene_callouts = FALSE, callout_n_gene_per_ctype = 5, callout_ctypes = NULL, specific_callouts = NULL, le_set_callouts = NULL, le_set_colormap = NULL, le_set_num_per = 5, show_le_legend = FALSE, show_xlab = TRUE, show_var_explained = TRUE, clust_method = "median", h_w = NULL, reset_other_factor_plots = FALSE, draw_plot = TRUE )
plot_loadings_annot( container, factor_select, use_sig_only = FALSE, nonsig_to_zero = FALSE, annot = "none", pathways = NULL, sim_de_donor_group = NULL, sig_thresh = 0.05, display_genes = FALSE, gene_callouts = FALSE, callout_n_gene_per_ctype = 5, callout_ctypes = NULL, specific_callouts = NULL, le_set_callouts = NULL, le_set_colormap = NULL, le_set_num_per = 5, show_le_legend = FALSE, show_xlab = TRUE, show_var_explained = TRUE, clust_method = "median", h_w = NULL, reset_other_factor_plots = FALSE, draw_plot = TRUE )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor to plot |
use_sig_only |
logical If TRUE, includes only significant genes from jackstraw in the heatmap. If FALSE, includes all the variable genes. (default = FALSE) |
nonsig_to_zero |
logical If TRUE, makes the loadings of all nonsignificant genes 0 (default=FALSE) |
annot |
character If set to "pathways" then creates an adjacent heatmap showing which genes are in which pathways. If set to "sig_genes" then creates an adjacent heatmap showing which genes were significant from jackstraw. If set to "none" no adjacent heatmap is plotted. (default="none") |
pathways |
character Gene sets to plot if annot is set to "pathways" (default=NULL) |
sim_de_donor_group |
numeric To plot the ground truth significant genes from a simulation next to the heatmap, put the number of the donor group that corresponds to the factor being plotted (default=NULL) |
sig_thresh |
numeric Pvalue significance threshold to use. If use_sig_only is TRUE the threshold is used as a cutoff for genes to include. If annot is "sig_genes" this value is used in the gene significance colormap as a minimum threshold. (default=0.05) |
display_genes |
logical If TRUE, displays the names of gene names (default=FALSE) |
gene_callouts |
logical If TRUE, then adds gene callout annotations to the heatmap (default=FALSE) |
callout_n_gene_per_ctype |
numeric To use if gene_callouts is TRUE. Sets the number of largest magnitude significant genes from each cell type to include in gene callouts. (default=5) |
callout_ctypes |
character To use if gene_callouts is TRUE. Specifies which cell types to get gene callouts for. If NULL, then gets gene callouts for largest magnitude significant genes for all cell types. (default=NULL) |
specific_callouts |
character A vector of gene names to show callouts for (default=NULL) |
le_set_callouts |
character Pass a vector of gene set names to show leading edge genes for a select set of gene sets (default=NULL) |
le_set_colormap |
character A named vector with names as gene sets and values as colors. If NULL, then selects first n colors of Set3 color palette. (default=NULL) |
le_set_num_per |
numeric The number of leading edge genes to show for each gene set (default=5) |
show_le_legend |
logical Set to TRUE to show the color map legend for leading edge genes (default=FALSE) |
show_xlab |
logical If TRUE, displays the xlabel 'genes' (default=TRUE) |
show_var_explained |
logical If TRUE then shows an anotation with the explained variance for each cell type (default=TRUE) |
clust_method |
character The hclust method to use for clustering rows (default='median') |
h_w |
numeric Vector specifying height and width (defualt=NULL) |
reset_other_factor_plots |
logical Set to TRUE to set all other loadings plots to NULL. Useful if run get_all_lds_factor_plots but then only want to show one or two plots. (default=FALSE) |
draw_plot |
logical Set to TRUE to show the plot. Plot is stored regardless. (default=TRUE) |
The project container with a heatmap of loadings for one factor put in container$plots$all_lds_plots. The legend for the heatmap is put in container$plots$all_legends. Use draw(<hmap obj>,annotation_legend_list = <hmap legend obj>) to re-render the plot with legend.
test_container <- plot_loadings_annot(test_container, 1, display_genes=FALSE, show_var_explained = TRUE)
test_container <- plot_loadings_annot(test_container, 1, display_genes=FALSE, show_var_explained = TRUE)
Plot trio of associations between ligand expression, module eigengenes, and factor scores
plot_mod_and_lig(container, factor_select, mod_ct, mod, lig_ct, lig)
plot_mod_and_lig(container, factor_select, mod_ct, mod, lig_ct, lig)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor to use |
mod_ct |
character The name of the cell type for the corresponding module |
mod |
numeric The number of the corresponding module |
lig_ct |
character The name of the cell type where the ligand is expressed |
lig |
character The name of the ligand to use |
A cowplot figure of ggplot objects for the three associations scatter plots.
Generate gene set x ct_module heatmap showing co-expression module gene set enrichment results
plot_multi_module_enr( container, ctypes, modules, sig_thresh = 0.05, db_use = "TF", max_plt_pval = 0.1, h_w = NULL )
plot_multi_module_enr( container, ctypes, modules, sig_thresh = 0.05, db_use = "TF", max_plt_pval = 0.1, h_w = NULL )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ctypes |
character A vector of cell type names corresponding to the module numbers in mod_select, specifying the modules to compute enrichment for |
modules |
numeric A vector of module numbers corresponding to the cell types in ctype, specifying the modules to compute enrichment for |
sig_thresh |
numeric P-value threshold for results to include. Only shows a given gene set if at least one module has a result lower than the threshold. (default=0.05) |
db_use |
character The database of gene sets to use. Database options include "GO", "Reactome", "KEGG", "BioCarta", "Hallmark", "TF", and "immuno". More than one database can be used. (default="GO") |
max_plt_pval |
max pvalue shown on plot, but not used to remove rows like sig_thresh (default=.1) |
h_w |
numeric Vector specifying height and width (defualt=NULL) |
A ComplexHeatmap object of enrichment results.
Plot reconstruction errors as bar plot for svd method
plot_rec_errors_bar_svd(real, shuffled, mode_to_show)
plot_rec_errors_bar_svd(real, shuffled, mode_to_show)
real |
list The real reconstruction errors |
shuffled |
list The reconstruction errors under null model |
mode_to_show |
numeric The mode to plot the results for |
A ggplot object showing the difference in reconstruction errors for successive factors.
Plot reconstruction errors as line plot for svd method
plot_rec_errors_line_svd(real, shuffled, mode_to_show)
plot_rec_errors_line_svd(real, shuffled, mode_to_show)
real |
list The real reconstruction errors |
shuffled |
list The reconstruction errors under null model |
mode_to_show |
numeric The mode to plot the results for |
A ggplot object showing relative reconstruction errors.
Plot dotplots for each factor to compare donor scores between metadata groups
plot_scores_by_meta(container, meta_var)
plot_scores_by_meta(container, meta_var)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
meta_var |
character The meta data variable to compare groups for |
The project container with a figure of comparison plots (one for each factor) placed in container$plots$indv_meta_scores_associations.
Plot enrichment results for hand picked gene sets
plot_select_sets( container, factors_all, sets_plot, color_sets = NULL, cl_rows = FALSE, h_w = NULL, myfontsize = 8 )
plot_select_sets( container, factors_all, sets_plot, color_sets = NULL, cl_rows = FALSE, h_w = NULL, myfontsize = 8 )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factors_all |
numeric Vector of one or more factor numbers to get plots for |
sets_plot |
character Vector of gene set names to show enrichment values for |
color_sets |
named character Values are colors corresponding to each set, with names as the gene set names (default=NULL) |
cl_rows |
logical Set to TRUE to cluster gene set results (default=FALSE) |
h_w |
numeric Vector specifying height and width (defualt=NULL) |
myfontsize |
numeric Gene set label fontsize (default=8) |
A list with a ComplexHeatmap object of select enriched gene sets as the first element and with a legend object as the second element.
Generate a plot for either the donor scores or loadings stability test
plot_stability_results(container, plt_data)
plot_stability_results(container, plt_data)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
plt_data |
character Either 'lds' or 'dsc' and indicates which plot to make |
the plot
Plot association significances for varying clustering resolutions
plot_subclust_associations(res, n_col = 2)
plot_subclust_associations(res, n_col = 2)
res |
data.frame Regression statistics for each subcluster analysis |
n_col |
numeric The number of columns to organize the plots into (default=2) |
A cowplot of ggplot objects showing statistics for regressions of proportions of each cell subtype (at varying clustering resolutions) against each factor.
Plot a heatmap of differential genes. Code is adapted from Conos package. https://github.com/kharchenkolab/conos/blob/master/R/plot.R
plotDEheatmap_conos( con, groups, container, de = NULL, min.auc = NULL, min.specificity = NULL, min.precision = NULL, n.genes.per.cluster = 10, additional.genes = NULL, exclude.genes = NULL, labeled.gene.subset = NULL, expression.quantile = 0.99, pal = (grDevices::colorRampPalette(c("dodgerblue1", "grey95", "indianred1")))(1024), ordering = "-AUC", column.metadata = NULL, show.gene.clusters = TRUE, remove.duplicates = TRUE, column.metadata.colors = NULL, show.cluster.legend = TRUE, show_heatmap_legend = FALSE, border = TRUE, return.details = FALSE, row.label.font.size = 10, order.clusters = FALSE, split = FALSE, split.gap = 0, cell.order = NULL, averaging.window = 0, ... )
plotDEheatmap_conos( con, groups, container, de = NULL, min.auc = NULL, min.specificity = NULL, min.precision = NULL, n.genes.per.cluster = 10, additional.genes = NULL, exclude.genes = NULL, labeled.gene.subset = NULL, expression.quantile = 0.99, pal = (grDevices::colorRampPalette(c("dodgerblue1", "grey95", "indianred1")))(1024), ordering = "-AUC", column.metadata = NULL, show.gene.clusters = TRUE, remove.duplicates = TRUE, column.metadata.colors = NULL, show.cluster.legend = TRUE, show_heatmap_legend = FALSE, border = TRUE, return.details = FALSE, row.label.font.size = 10, order.clusters = FALSE, split = FALSE, split.gap = 0, cell.order = NULL, averaging.window = 0, ... )
con |
conos (or p2) object |
groups |
groups in which the DE genes were determined (so that the cells can be ordered correctly) |
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
de |
differential expression result (list of data frames) |
min.auc |
optional minimum AUC threshold |
min.specificity |
optional minimum specificity threshold |
min.precision |
optional minimum precision threshold |
n.genes.per.cluster |
number of genes to show for each cluster |
additional.genes |
optional additional genes to include (the genes will be assigned to the closest cluster) |
exclude.genes |
an optional list of genes to exclude from the heatmap |
labeled.gene.subset |
a subset of gene names to show (instead of all genes). Can be a vector of gene names, or a number of top genes (in each cluster) to show the names for. |
expression.quantile |
expression quantile to show (0.98 by default) |
pal |
palette to use for the main heatmap |
ordering |
order by which the top DE genes (to be shown) are determined (default "-AUC") |
column.metadata |
additional column metadata, passed either as a data.frame with rows named as cells, or as a list of named cell factors. |
show.gene.clusters |
whether to show gene cluster color codes |
remove.duplicates |
remove duplicated genes (leaving them in just one of the clusters) |
column.metadata.colors |
a list of color specifications for additional column metadata, specified according to the HeatmapMetadata format. Use "clusters" slot to specify cluster colors. |
show.cluster.legend |
whether to show the cluster legend |
show_heatmap_legend |
whether to show the expression heatmap legend |
border |
show borders around the heatmap and annotations |
return.details |
if TRUE will return a list containing the heatmap (ha), but also raw matrix (x), expression list (expl) and other info to produce the heatmap on your own. |
row.label.font.size |
font size for the row labels |
order.clusters |
whether to re-order the clusters according to the similarity of the expression patterns (of the genes being shown) |
split |
logical If TRUE splits the heatmap by cell type (default=FALSE) |
split.gap |
numeric The distance to put in the gaps between split parts of the heatmap if split=TRUE (default=0) |
cell.order |
explicitly supply cell order |
averaging.window |
optional window averaging between neighboring cells within each group (turned off by default) - useful when very large number of cells shown (requires zoo package) |
... |
extra parameters are passed to pheatmap |
ComplexHeatmap::Heatmap object (see return.details param for other output)
Prepare data for LR analysis and get soft thresholds to use for gene modules
prep_LR_interact( container, lr_pairs, norm_method = "trim", scale_factor = 10000, var_scale_power = 0.5, batch_var = NULL )
prep_LR_interact( container, lr_pairs, norm_method = "trim", scale_factor = 10000, var_scale_power = 0.5, batch_var = NULL )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
lr_pairs |
data.frame Data of ligand-receptor pairs. First column should be ligands and second column should be one or more receptors separated by an underscore such as receptor1_receptor2 in the case that multiple receptors are required for signaling. |
norm_method |
character The normalization method to use on the pseudobulked count data. Set to 'regular' to do standard normalization of dividing by library size. Set to 'trim' to use edgeR trim-mean normalization, whereby counts are divided by library size times a normalization factor. (default='trim') |
scale_factor |
numeric The number that gets multiplied by fractional counts during normalization of the pseudobulked data (default=10000) |
var_scale_power |
numeric Exponent of normalized variance that is used for variance scaling. Variance for each gene is initially set to unit variance across donors (for a given cell type). Variance for each gene is then scaled by multiplying the unit scaled values by each gene's normalized variance (where the effect of the mean-variance dependence is taken into account) to the exponent specified here. If NULL, uses var_scale_power from container$experiment_params. (default=.5) |
batch_var |
character A batch variable from metadata to remove (default=NULL) |
The project container with added container$scale_pb_extra slot that contains the tensor with additional ligands and receptors. Also has container$no_scale_pb_extra slot with pseudobulked, normalized data that is not scaled.
Project multicellular patterns to get scores on new data
project_new_data(new_container, old_container)
project_new_data(new_container, old_container)
new_container |
environment A project container with new data to project scores for. The form_tensor() function should be run. |
old_container |
environment The original project container that has the multicellular gene expression patterns already extracted. These patterns will be projected onto the new data. |
The new container environment object with projected scores in new_container$projected_scores. The factors will be ordered the same as the factors in old_container.
Gets a conos object of the data, aligning datasets across a specified variable such as batch or donors. This can be run independently or through get_subtype_prop_associations().
reduce_dimensions( container, integration_var, ncores = container$experiment_params$ncores )
reduce_dimensions( container, integration_var, ncores = container$experiment_params$ncores )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
integration_var |
character The meta data variable to use for creating the joint embedding with Conos. |
ncores |
numeric The number of cores to use (default=container$experiment_params$ncores) |
The project container with a conos object in container$embedding.
Reduce each cell type's expression matrix to just the significantly variable genes. Generally, this should be done through calling the form_tensor() wrapper function.
reduce_to_vargenes(container)
reduce_to_vargenes(container)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
The project container with pseudobulked matrices reduced to only the most variable genes.
Create a figure of all loadings plots arranged
render_multi_plots(container, data_type, max_cols = 3)
render_multi_plots(container, data_type, max_cols = 3)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
data_type |
character Can be either "loadings", "gsea", or "dgenes". This determines which list of heatmaps to organize into the figure. |
max_cols |
numeric The max number of columns to plot. Can only either be 2 or 3 since these are large plots. (default=3) |
The multi-plot figure.
test_container <- get_all_lds_factor_plots(test_container) fig <- render_multi_plots(test_container, data_type='loadings')
test_container <- get_all_lds_factor_plots(test_container) fig <- render_multi_plots(test_container, data_type='loadings')
Reshape loadings for a factor from linearized to matrix form
reshape_loadings(ldngs_row, genes, ctypes)
reshape_loadings(ldngs_row, genes, ctypes)
ldngs_row |
numeric A vector of loadings values for one factor |
genes |
character The gene identifiers corresponding to each loading |
ctypes |
character The cell type corresponding to each loading |
A loadings matrix with dimensions of genes by cell types.
Run fgsea for one cell type of one factor
run_fgsea( container, factor_select, ctype, db_use = "GO", signed = TRUE, min_gs_size = 15, max_gs_size = 500, ncores = container$experiment_params$ncores )
run_fgsea( container, factor_select, ctype, db_use = "GO", signed = TRUE, min_gs_size = 15, max_gs_size = 500, ncores = container$experiment_params$ncores )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor of interest |
ctype |
character The cell type of interest |
db_use |
character The database of gene sets to use. Database options include "GO", "Reactome", "KEGG", "BioCarta", and "Hallmark". More than one database can be used. (default="GO") |
signed |
logical If TRUE, uses signed gsea. If FALSE, uses unsigned gsea. Currently only works with fgsea method. (default=TRUE) |
min_gs_size |
numeric Minimum gene set size (default=15) |
max_gs_size |
numeric Maximum gene set size (default=500) |
ncores |
numeric The number of cores to use (default=container$experiment_params$ncores) |
A data.frame of the fgsea results for enrichment of gene sets in a given cell type for a given factor. The results contain adjusted p-values, normalized enrichment scores, leading edge genes, and other information output by fgsea.
Run gsea separately for all cell types of one specified factor and plot results
run_gsea_one_factor( container, factor_select, method = "fgsea", thresh = 0.05, db_use = "GO", signed = TRUE, min_gs_size = 15, max_gs_size = 500, reset_other_factor_plots = FALSE, draw_plot = TRUE, ncores = container$experiment_params$ncores )
run_gsea_one_factor( container, factor_select, method = "fgsea", thresh = 0.05, db_use = "GO", signed = TRUE, min_gs_size = 15, max_gs_size = 500, reset_other_factor_plots = FALSE, draw_plot = TRUE, ncores = container$experiment_params$ncores )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor of interest |
method |
character The method of gsea to use. Can either be "fgsea", "fgsea_special or "hypergeometric". (default="fgsea") |
thresh |
numeric Pvalue significance threshold to use. Will include gene sets in resulting heatmap if pvalue is below this threshold for at least one cell type. (default=0.05) |
db_use |
character The database of gene sets to use. Database options include "GO", "Reactome", "KEGG", and "BioCarta". More than one database can be used. (default="GO") |
signed |
logical If TRUE, uses signed gsea. If FALSE, uses unsigned gsea. Currently only works with fgsea method (default=TRUE) |
min_gs_size |
numeric Minimum gene set size (default=15) |
max_gs_size |
numeric Maximum gene set size (default=500) |
reset_other_factor_plots |
logical Set to TRUE to set all other gsea plots to NULL (default=FALSE) |
draw_plot |
logical Set to TRUE to show the plot. Plot is stored regardless. (default=TRUE) |
ncores |
numeric The number of cores to use (default=container$experiment_params$ncores) |
A stacked heatmap plot of the gsea results in the slot container$plots$gsea$<Factor#>. The heatmaps show adjusted p-values for the enrichment of each gene set in each cell type for the selected factor. The top heatmap shows enriched gene sets among the positive loading genes and the bottom heatmap shows enriched gene sets among the negative loading genes for the factor.
test_container <- run_gsea_one_factor(test_container, factor_select=1, method="fgsea", thresh=0.05, db_use="Hallmark", signed=TRUE)
test_container <- run_gsea_one_factor(test_container, factor_select=1, method="fgsea", thresh=0.05, db_use="Hallmark", signed=TRUE)
Compute enriched gene sets among significant genes in a cell type for a factor using hypergeometric test
run_hypergeometric_gsea( container, factor_select, ctype, up_down, thresh = 0.05, min_gs_size = 15, max_gs_size = 500, db_use = "GO" )
run_hypergeometric_gsea( container, factor_select, ctype, up_down, thresh = 0.05, min_gs_size = 15, max_gs_size = 500, db_use = "GO" )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor of interest |
ctype |
character The cell type of interest |
up_down |
character Either "up" to compute enrichment among the significant positive loading genes or "down" to compute enrichment among the significant negative loading genes. |
thresh |
numeric Pvalue significance threshold. Used as cutoff for calling genes as significant to use for enrichment tests. (default=0.05) |
min_gs_size |
numeric Minimum gene set size (default=15) |
max_gs_size |
numeric Maximum gene set size (default=500) |
db_use |
character The database of gene sets to use. Database options include "GO", "Reactome", "KEGG", and "BioCarta". More than one database can be used. (default="GO") |
A vector of adjusted p-values for enrichment of gene sets in the significant genes of a given cell type in a given factor.
Run jackstraw to get genes that are significantly associated with donor scores for factors extracted by Tucker decomposition
run_jackstraw( container, ranks, n_fibers = 100, n_iter = 500, tucker_type = "regular", rotation_type = "hybrid", seed = container$experiment_params$rand_seed, ncores = container$experiment_params$ncores )
run_jackstraw( container, ranks, n_fibers = 100, n_iter = 500, tucker_type = "regular", rotation_type = "hybrid", seed = container$experiment_params$rand_seed, ncores = container$experiment_params$ncores )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ranks |
numeric The number of donor ranks and gene ranks to decompose to using Tucker decomposition |
n_fibers |
numeric The number of fibers the randomly shuffle in each iteration (default=100) |
n_iter |
numeric The number of shuffling iterations to complete (default=500) |
tucker_type |
character Set to 'regular' to run regular tucker or to 'sparse' to run tucker with sparsity constraints (default='regular') |
rotation_type |
character Set to 'hybrid' to perform hybrid rotation on resulting donor factor matrix and loadings. Otherwise set to 'ica_lds' to perform ica rotation on loadings or ica_dsc to perform ica on donor scores. (default='hybrid') |
seed |
numeric Seed passed to set.seed() (default=container$experiment_params$rand_seed) |
ncores |
numeric The number of cores to use (default=container$experiment_params$ncores) |
The project container with a vector of adjusted pvalues in container$gene_score_associations.
test_container <- run_jackstraw(test_container, ranks=c(2,4), n_fibers=2, n_iter=10, tucker_type='regular', rotation_type='hybrid', ncores=1)
test_container <- run_jackstraw(test_container, ranks=c(2,4), n_fibers=2, n_iter=10, tucker_type='regular', rotation_type='hybrid', ncores=1)
Test stability of a decomposition by subsampling or bootstrapping donors. Note that running this function will replace the decomposition in the project container with one resulting from the tucker parameters entered here.
run_stability_analysis( container, ranks, tucker_type = "regular", rotation_type = "hybrid", subset_type = "subset", sub_prop = 0.75, n_iterations = 100, ncores = container$experiment_params$ncores )
run_stability_analysis( container, ranks, tucker_type = "regular", rotation_type = "hybrid", subset_type = "subset", sub_prop = 0.75, n_iterations = 100, ncores = container$experiment_params$ncores )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ranks |
numeric The number of donor, gene, and cell type ranks, respectively, to decompose to using Tucker decomposition. |
tucker_type |
character The 'regular' type is the only one implemented with sparsity constraints (default='regular') |
rotation_type |
character Set to 'hybrid' to optimize loadings via our hybrid method (see paper for details). Set to 'ica_dsc' to perform ICA rotation on resulting donor factor matrix. Set to 'ica_lds' to optimize loadings by the ICA rotation. (default='hybrid') |
subset_type |
character Set to either 'subset' or 'bootstrap' (default='subset') |
sub_prop |
numeric The proportion of donors to keep when using subset_type='subset' (default=.75) |
n_iterations |
numeric The number of iterations to perform (default=100) |
ncores |
numeric The number of cores to use (default=container$experiment_params$ncores) |
The project container with the donor scores stability plot in container$plots$stability_plot_dsc and the loadings stability plot in container$plots$stability_plot_lds
test_container <- run_stability_analysis(test_container, ranks=c(2,4), tucker_type='regular', rotation_type='hybrid', subset_type='subset', sub_prop=0.75, n_iterations=5, ncores=1)
test_container <- run_stability_analysis(test_container, ranks=c(2,4), tucker_type='regular', rotation_type='hybrid', subset_type='subset', sub_prop=0.75, n_iterations=5, ncores=1)
Run the Tucker decomposition and rotate the factors
run_tucker_ica( container, ranks, tucker_type = "regular", rotation_type = "hybrid" )
run_tucker_ica( container, ranks, tucker_type = "regular", rotation_type = "hybrid" )
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ranks |
numeric The number of donor factors and gene factors, respectively, to decompose the data into. Since we rearrange the standard output of the Tucker decomposition to be 'donor centric', the number of donor factors will also be the total number of main factors that can be used for downstream analysis. The number of gene factors will only impact the quality of the decomposition. |
tucker_type |
character The 'regular' type is the only one currently implemented |
rotation_type |
character Set to 'hybrid' to optimize loadings via our hybrid method (see paper for details). Set to 'ica_dsc' to perform ICA rotation on resulting donor factor matrix. Set to 'ica_lds' to optimize loadings by the ICA rotation. (default='hybrid') |
The project container with results of the decomposition in container$tucker_results. The results object is a list with the donor scores matrix in the first element and the unfolded loadings matrix in the second element.
test_container <- run_tucker_ica(test_container,ranks=c(2,4))
test_container <- run_tucker_ica(test_container,ranks=c(2,4))
Get a list of tensor fibers to shuffle
sample_fibers(tensor_data, n_fibers)
sample_fibers(tensor_data, n_fibers)
tensor_data |
list The tensor data including donor, gene, and cell type labels as well as the tensor array itself |
n_fibers |
numeric The number of fibers to get |
A list of gene and cell type indices for the randomly selected fibers
Scale font size. From simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/ht_clusters.R
scale_fontsize(x, rg = c(1, 30), fs = c(4, 16))
scale_fontsize(x, rg = c(1, 30), fs = c(4, 16))
x |
A numeric vector. |
rg |
The range. |
fs |
Range of the font size. |
A numeric vector.
Scale variance across donors for each gene within each cell type. Generally, this should be done through calling the form_tensor() wrapper function.
scale_variance(container, var_scale_power)
scale_variance(container, var_scale_power)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
var_scale_power |
numeric Exponent of normalized variance that is used for variance scaling. Variance for each gene is initially set to unit variance across donors (for a given cell type). Variance for each gene is then scaled by multiplying the unit scaled values by each gene's normalized variance (where the effect of the mean-variance dependence is taken into account) to the exponent specified here. If NULL, uses var_scale_power from container$experiment_params. |
The project container with the variance altered for each gene within the pseudobulked matrices for each cell type.
Convert Seurat object to scMinimal object. Generally, this should be done through calling the make_new_container() wrapper function.
seurat_to_scMinimal(seurat_obj, metadata_cols = NULL, metadata_col_nm = NULL)
seurat_to_scMinimal(seurat_obj, metadata_cols = NULL, metadata_col_nm = NULL)
seurat_obj |
Seurat object that has been cleaned and includes the normalized, log-transformed counts. The meta.data should include a column with the header 'sex' and values of 'M' or 'F' if available. The metadata should also have a column with the header 'ctypes' with the corresponding names of the cell types as well as a column with header 'donors' that contains identifiers for each donor. |
metadata_cols |
character The names of the metadata columns to use (default=NULL) |
metadata_col_nm |
character New names for the selected metadata columns if wish to change their names. If NULL, then the preexisting column names are used. (default=NULL) |
An scMinimal object holding counts and metadata for a project.
Shuffle elements within the selected fibers
shuffle_fibers(tensor_data, s_fibers)
shuffle_fibers(tensor_data, s_fibers)
tensor_data |
list The tensor data including donor, gene, and cell type labels as well as the tensor array itself |
s_fibers |
list Gene and cell type indices for the randomly selected fibers |
The tensor_data object with the values for the selected fibers shuffled.
Create the tensor object by stacking each pseudobulk cell type matrix. Generally, this should be done through calling the form_tensor() wrapper function.
stack_tensor(container)
stack_tensor(container)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
The project container with the list of tensor data in container$tensor_data.
Helper function from simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/utils.R
stop_wrap(...)
stop_wrap(...)
... |
other parameters |
No value is returned.
Subset an scMinimal object by specified genes, donors, cells, or cell types
subset_scMinimal( scMinimal, ctypes_use = NULL, cells_use = NULL, donors_use = NULL, genes_use = NULL, in_place = TRUE )
subset_scMinimal( scMinimal, ctypes_use = NULL, cells_use = NULL, donors_use = NULL, genes_use = NULL, in_place = TRUE )
scMinimal |
environment A sub-container for the project typically consisting of gene expression data in its raw and processed forms as well as metadata |
ctypes_use |
character The cell types to keep (default=NULL) |
cells_use |
character Cell barcodes for the cells to keep (default=NULL) |
donors_use |
character The donors to keep (default=NULL) |
genes_use |
character The genes to keep (default=NULL) |
in_place |
logical If set to TRUE then replaces the input object with the new subsetted object (default=TRUE) |
A subsetted scMinimal object.
cell_names <- colnames(test_container$scMinimal_full$count_data) cells_sub <- sample(cell_names,40) scMinimal <- subset_scMinimal(test_container$scMinimal_full, cells_use=cells_sub)
cell_names <- colnames(test_container$scMinimal_full$count_data) cells_sub <- sample(cell_names,40) scMinimal <- subset_scMinimal(test_container$scMinimal_full, cells_use=cells_sub)
Data container for testing tensor formation steps
test_container
test_container
An object of class environment
of length 10.
Helper function for running the decomposition. Use the run_tucker_ica() wrapper function instead.
tucker_ica_helper( tensor_data, ranks, tucker_type, rotation_type, projection_container = NULL )
tucker_ica_helper( tensor_data, ranks, tucker_type, rotation_type, projection_container = NULL )
tensor_data |
list The tensor data including donor, gene, and cell type labels as well as the tensor array itself |
ranks |
numeric The number of donor and gene factors respectively, to decompose to using Tucker decomposition. |
tucker_type |
character The 'regular' type is the only one currently implemented |
rotation_type |
character Set to 'hybrid' to optimize loadings via our hybrid method (see paper for details). Set to 'ica_dsc' to perform ICA rotation on resulting donor factor matrix. Set to 'ica_lds' to optimize loadings by the ICA rotation. |
projection_container |
environment A project container to store projection data in. Currently only implemented for 'hybrid' and 'ica_dsc' rotations. (default=NULL) |
The list of results for tucker decomposition with donor scores matrix in first element and loadings matrix in second element.
Update any of the experiment-wide parameters
update_params(container, ctypes_use = NULL, ncores = NULL, rand_seed = NULL)
update_params(container, ctypes_use = NULL, ncores = NULL, rand_seed = NULL)
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ctypes_use |
character Names of the cell types to use for the analysis (default=NULL) |
ncores |
numeric Number of cores to use (default=NULL) |
rand_seed |
numeric Random seed to use (default=NULL) |
The project container with updated experiment parameters in container$experiment_params.
test_container <- update_params(test_container, ncores=1)
test_container <- update_params(test_container, ncores=1)
Compute significantly variable genes via anova. Generally, this should be done through calling the form_tensor() wrapper function.
vargenes_anova(scMinimal, ncores)
vargenes_anova(scMinimal, ncores)
scMinimal |
environment A sub-container for the project typically consisting of gene expression data in its raw and processed forms |
ncores |
numeric Number of cores to use |
A list of raw p-values for each gene.