Package 'scITD' reference manual

Title:	Single-Cell Interpretable Tensor Decomposition
Description:	Single-cell Interpretable Tensor Decomposition (scITD) employs the Tucker tensor decomposition to extract multicell-type gene expression patterns that vary across donors/individuals. This tool is geared for use with single-cell RNA-sequencing datasets consisting of many source donors. The method has a wide range of potential applications, including the study of inter-individual variation at the population-level, patient sub-grouping/stratification, and the analysis of sample-level batch effects. Each "multicellular process" that is extracted consists of (A) a multi cell type gene loadings matrix and (B) a corresponding donor scores vector indicating the level at which the corresponding loadings matrix is expressed in each donor. Additional methods are implemented to aid in selecting an appropriate number of factors and to evaluate stability of the decomposition. Additional tools are provided for downstream analysis, including integration of gene set enrichment analysis and ligand-receptor analysis. Tucker, L.R. (1966) <doi:10.1007/BF02289464>. Unkel, S., Hannachi, A., Trendafilov, N. T., & Jolliffe, I. T. (2011) <doi:10.1007/s13253-011-0055-9>. Zhou, G., & Cichocki, A. (2012) <doi:10.2478/v10175-012-0051-4>.
Authors:	Jonathan Mitchel [cre, aut], Evan Biederstedt [aut], Peter Kharchenko [aut]
Maintainer:	Jonathan Mitchel <[email protected]>
License:	GPL-3
Version:	1.0.4
Built:	2025-03-13 05:36:55 UTC
Source:	https://github.com/cran/scITD

Apply ComBat batch correction to pseudobulk matrices. Generally, this should be done through calling the form_tensor() wrapper function.

Description

Apply ComBat batch correction to pseudobulk matrices. Generally, this should be done through calling the form_tensor() wrapper function.

Usage

apply_combat(container, batch_var)
apply_combat(container, batch_var)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`batch_var`	character A batch variable from metadata to remove

Value

The project container with the batc corrected pseudobulked matrices.

Calculate F-Statistics for the association between donor scores for each factor donor values of shuffled gene_ctype fibers

Description

Calculate F-Statistics for the association between donor scores for each factor donor values of shuffled gene_ctype fibers

Usage

calculate_fiber_fstats(tensor_data, tucker_results, s_fibers)
calculate_fiber_fstats(tensor_data, tucker_results, s_fibers)

Arguments

`tensor_data`	list The tensor data including donor, gene, and cell type labels as well as the tensor array itself
`tucker_results`	list The results from Tucker decomposition. Includes a scores matrix as the first element and the loadings tensor unfolded as the second element.
`s_fibers`	list Gene and cell type indices for the randomly selected fibers

Value

A numeric vector of F-statistics for associations between all shuffled fibers and donor scores.

Helper function to check whether receptor is present in target cell type

Description

Helper function to check whether receptor is present in target cell type

Usage

check_rec_pres(
  container,
  lig_ct_exp,
  rec_elements,
  target_ct,
  percentile_exp_rec
)
check_rec_pres(
  container,
  lig_ct_exp,
  rec_elements,
  target_ct,
  percentile_exp_rec
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`lig_ct_exp`	numeric Scaled expression for a ligand in the source cell type
`rec_elements`	character One or more components of a receptor complex
`target_ct`	character The name of the target cell type
`percentile_exp_rec`	numeric The percentile of ligand expression above which all donors need to have at least 5 cells expressing the receptor.

Value

A logical indicating whether receptor is present or not.

Clean data to remove genes only expressed in a few cells and donors with very few cells. Generally, this should be done through calling the form_tensor() wrapper function.

Description

Clean data to remove genes only expressed in a few cells and donors with very few cells. Generally, this should be done through calling the form_tensor() wrapper function.

Usage

clean_data(container, donor_min_cells = 5)
clean_data(container, donor_min_cells = 5)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`donor_min_cells`	numeric Minimum threshold for number of cells per donor (default=5)

Value

The project container with cleaned counts matrices in each container$scMinimal_ctype$<ctype>$count_data.

Calculates column mean and variance. Adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/src/misc2.cpp

Description

Calculates column mean and variance. Adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/src/misc2.cpp

Usage

colMeanVars(sY, rowSel, ncores = 1L)
colMeanVars(sY, rowSel, ncores = 1L)

Arguments

`sY`	sparse matrix Gene by cell matrix of counts
`rowSel`	numeric The selected rows (genes)
`ncores`	numeric The number of cores

Value

data.frame with columns of mean, variance, and number of observeatios for each gene across samples

Examples

library(Matrix)
donor_by_gene <- rbind(c(9,2,1,5), c(3,3,1,2))
donor_by_gene <- Matrix(donor_by_gene, sparse = TRUE)
result <- colMeanVars(donor_by_gene, rowSel = NULL, ncores=1)
library(Matrix)
donor_by_gene <- rbind(c(9,2,1,5), c(3,3,1,2))
donor_by_gene <- Matrix(donor_by_gene, sparse = TRUE)
result <- colMeanVars(donor_by_gene, rowSel = NULL, ncores=1)

Plot a pairwise comparison of factors from two separate decompositions

Description

Plot a pairwise comparison of factors from two separate decompositions

Usage

compare_decompositions(
  tucker_res1,
  tucker_res2,
  decomp_names,
  meta_anno1 = NULL,
  meta_anno2 = NULL,
  use_text = TRUE
)
compare_decompositions(
  tucker_res1,
  tucker_res2,
  decomp_names,
  meta_anno1 = NULL,
  meta_anno2 = NULL,
  use_text = TRUE
)

Arguments

`tucker_res1`	list The container$tucker_res from first decomposition
`tucker_res2`	list The container$tucker_res from first decomposition
`decomp_names`	character Names of the two decompositions that will go on the axes of the heatmap
`meta_anno1`	matrix The result of calling get_meta_associations() corresponding to the first decomposition, which is stored in container$meta_associations (default=NULL)
`meta_anno2`	matrix The result of calling get_meta_associations() corresponding to the second decomposition, which is stored in container$meta_associations (default=NULL)
`use_text`	logical If TRUE, then displays correlation coefficients in cells (default=TRUE)

Value

No return value, as the resulting plots are drawn.

Examples

test_container <- run_tucker_ica(test_container, ranks=c(2,4),
tucker_type='regular', rotation_type='hybrid')
tucker_res1 <- test_container$tucker_results
test_container <- run_tucker_ica(test_container, ranks=c(2,4),
tucker_type='regular', rotation_type='ica_dsc')
tucker_res2 <- test_container$tucker_results
compare_decompositions(tucker_res1,tucker_res2,c('hybrid_method','ica_method'))
test_container <- run_tucker_ica(test_container, ranks=c(2,4),
tucker_type='regular', rotation_type='hybrid')
tucker_res1 <- test_container$tucker_results
test_container <- run_tucker_ica(test_container, ranks=c(2,4),
tucker_type='regular', rotation_type='ica_dsc')
tucker_res2 <- test_container$tucker_results
compare_decompositions(tucker_res1,tucker_res2,c('hybrid_method','ica_method'))

Compute associations between donor proportions and factor scores

Description

Compute associations between donor proportions and factor scores

Usage

compute_associations(donor_balances, donor_scores, stat_type)
compute_associations(donor_balances, donor_scores, stat_type)

Arguments

`donor_balances`	matrx The balances computed from donor cell type proportions
`donor_scores`	data.frame The donor scores matrix from tucker results
`stat_type`	character Either "fstat" to get F-Statistics, "adj_rsq" to get adjusted R-squared values, or "adj_pval" to get adjusted pvalues.

Value

A numeric vector of association statistics (one for each factor)

Get donor proportions of each cell type or subtype

Description

Get donor proportions of each cell type or subtype

Usage

compute_donor_props(clusts, metadata)
compute_donor_props(clusts, metadata)

Arguments

`clusts`	integer Cluster assignments for each cell with names as cell barcodes
`metadata`	data.frame The $metadata field for the given scMinimal

Value

A data.frame of cluster proportions for each donor.

Compute and plot the LR interactions for one factor

Description

Compute and plot the LR interactions for one factor

Usage

compute_LR_interact(
  container,
  lr_pairs,
  sig_thresh = 0.05,
  percentile_exp_rec = 0.75,
  add_ld_fact_sig = TRUE,
  ncores = container$experiment_params$ncores
)
compute_LR_interact(
  container,
  lr_pairs,
  sig_thresh = 0.05,
  percentile_exp_rec = 0.75,
  add_ld_fact_sig = TRUE,
  ncores = container$experiment_params$ncores
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`lr_pairs`	data.frame Data of ligand-receptor pairs. First column should be ligands and second column should be one or more receptors separated by an underscore such as receptor1_receptor2 in the case that multiple receptors are required for signaling.
`sig_thresh`	numeric The p-value significance threshold to use for module- factor associations and ligand-factor associations (default=0.05)
`percentile_exp_rec`	numeric The percentile above which the top donors expressing the ligand all must be expressing the receptor (default=0.75)
`add_ld_fact_sig`	logical Set to TRUE to append a heatmap showing significance of associations between each ligand hit and each factor (default=TRUE)
`ncores`	numeric The number of cores to use (default=container$experiment_params$ncores)

Value

The LR analysis results heatmap as ComplexHeatmap object. Adjusted p-values for all results are placed in container$lr_res.

Convert gene identifiers to gene symbols

Description

Convert gene identifiers to gene symbols

Usage

convert_gn(container, genes)
convert_gn(container, genes)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`genes`	character Vector of the gene identifiers to be converted to gene symbols

Value

A character vector of gene symbols.

count_word. From older version of simplifyEnrichment package.

Description

count_word. From older version of simplifyEnrichment package.

Usage

count_word(term, exclude_words = NULL)
count_word(term, exclude_words = NULL)

Arguments

`term`	A vector of description texts.
`exclude_words`	The words that should be excluded.

Value

A data frame with words and frequencies.

Run rank determination by svd on the tensor unfolded along each mode

Description

Run rank determination by svd on the tensor unfolded along each mode

Usage

determine_ranks_tucker(
  container,
  max_ranks_test,
  shuffle_level = "cells",
  shuffle_within = NULL,
  num_iter = 100,
  batch_var = NULL,
  norm_method = "trim",
  scale_factor = 10000,
  scale_var = TRUE,
  var_scale_power = 0.5,
  seed = container$experiment_params$rand_seed
)
determine_ranks_tucker(
  container,
  max_ranks_test,
  shuffle_level = "cells",
  shuffle_within = NULL,
  num_iter = 100,
  batch_var = NULL,
  norm_method = "trim",
  scale_factor = 10000,
  scale_var = TRUE,
  var_scale_power = 0.5,
  seed = container$experiment_params$rand_seed
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`max_ranks_test`	numeric Vector of length 2 specifying the maximum number of donor and gene ranks to test
`shuffle_level`	character Either "cells" to shuffle cell-donor linkages or "tensor" to shuffle values within the tensor (default="cells")
`shuffle_within`	character A metadata variable to shuffle cell-donor linkages within (default=NULL)
`num_iter`	numeric Number of null iterations (default=100)
`batch_var`	character A batch variable from metadata to remove. No batch correction applied if NULL. (default=NULL)
`norm_method`	character The normalization method to use on the pseudobulked count data. Set to 'regular' to do standard normalization of dividing by library size. Set to 'trim' to use edgeR trim-mean normalization, whereby counts are divided by library size times a normalization factor. (default='trim')
`scale_factor`	numeric The number that gets multiplied by fractional counts during normalization of the pseudobulked data (default=10000)
`scale_var`	logical TRUE to scale the gene expression variance across donors for each cell type. If FALSE then all genes are scaled to unit variance across donors for each cell type. (default=TRUE)
`var_scale_power`	numeric Exponent of normalized variance that is used for variance scaling. Variance for each gene is initially set to unit variance across donors (for a given cell type). Variance for each gene is then scaled by multiplying the unit scaled values by each gene's normalized variance (where the effect of the mean-variance dependence is taken into account) to the exponent specified here. If NULL, uses var_scale_power from container$experiment_params. (default=.5)
`seed`	numeric Seed passed to set.seed() (default=container$experiment_params$rand_seed)

Value

The project container with a cowplot figure of rank determination plots in container$plots$rank_determination_plot.

Examples

test_container <- determine_ranks_tucker(test_container, max_ranks_test=c(3,5),
shuffle_level='tensor', num_iter=4, norm_method='trim', scale_factor=10000,
scale_var=TRUE, var_scale_power=.5)
test_container <- determine_ranks_tucker(test_container, max_ranks_test=c(3,5),
shuffle_level='tensor', num_iter=4, norm_method='trim', scale_factor=10000,
scale_var=TRUE, var_scale_power=.5)

Form the pseudobulk tensor as preparation for running the tensor decomposition.

Description

Form the pseudobulk tensor as preparation for running the tensor decomposition.

Usage

form_tensor(
  container,
  donor_min_cells = 5,
  norm_method = "trim",
  scale_factor = 10000,
  vargenes_method = "norm_var",
  vargenes_thresh = 500,
  batch_var = NULL,
  scale_var = TRUE,
  var_scale_power = 0.5,
  custom_genes = NULL,
  verbose = TRUE
)
form_tensor(
  container,
  donor_min_cells = 5,
  norm_method = "trim",
  scale_factor = 10000,
  vargenes_method = "norm_var",
  vargenes_thresh = 500,
  batch_var = NULL,
  scale_var = TRUE,
  var_scale_power = 0.5,
  custom_genes = NULL,
  verbose = TRUE
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`donor_min_cells`	numeric Minimum threshold for number of cells per donor (default=5)
`norm_method`	character The normalization method to use on the pseudobulked count data. Set to 'regular' to do standard normalization of dividing by library size. Set to 'trim' to use edgeR trim-mean normalization, whereby counts are divided by library size times a normalization factor. (default='trim')
`scale_factor`	numeric The number that gets multiplied by fractional counts during normalization of the pseudobulked data (default=10000)
`vargenes_method`	character The method by which to select highly variable genes from each cell type. Set to 'anova' to select genes by anova. Set to 'norm_var' to select the top genes by normalized variance or 'norm_var_pvals' to select genes by significance of their overdispersion (default='norm_var')
`vargenes_thresh`	numeric The threshold to use in variable gene selection. For 'anova' and 'norm_var_pvals' this should be a p-value threshold. For 'norm_var' this should be the number of most variably expressed genes to select from each cell type (default=500)
`batch_var`	character A batch variable from metadata to remove (default=NULL)
`scale_var`	logical TRUE to scale the gene expression variance across donors for each cell type. If FALSE then all genes are scaled to unit variance across donors for each cell type. (default=TRUE)
`var_scale_power`	numeric Exponent of normalized variance that is used for variance scaling. Variance for each gene is initially set to unit variance across donors (for a given cell type). Variance for each gene is then scaled by multiplying the unit scaled values by each gene's normalized variance (where the effect of the mean-variance dependence is taken into account) to the exponent specified here. If NULL, uses var_scale_power from container$experiment_params. (default=.5)
`custom_genes`	character A vector of genes to include in the tensor. Overrides the default gene selection if not NULL. (default=NULL)
`verbose`	logical Set to TRUE to print out progress (default=TRUE)

Value

The project container with a list of tensor data added in the container$tensor_data slot.

Examples

test_container <- form_tensor(test_container, donor_min_cells=0,
norm_method='trim', scale_factor=10000, vargenes_method='norm_var', vargenes_thresh=500,
scale_var = TRUE, var_scale_power = 1.5)
test_container <- form_tensor(test_container, donor_min_cells=0,
norm_method='trim', scale_factor=10000, vargenes_method='norm_var', vargenes_thresh=500,
scale_var = TRUE, var_scale_power = 1.5)

Generate loadings heatmaps for all factors

Description

Generate loadings heatmaps for all factors

Usage

get_all_lds_factor_plots(
  container,
  use_sig_only = FALSE,
  nonsig_to_zero = FALSE,
  annot = "none",
  pathways_list = NULL,
  sim_de_donor_group = NULL,
  sig_thresh = 0.05,
  display_genes = FALSE,
  gene_callouts = FALSE,
  callout_n_gene_per_ctype = 5,
  callout_ctypes = NULL,
  show_var_explained = TRUE,
  reset_other_factor_plots = TRUE
)
get_all_lds_factor_plots(
  container,
  use_sig_only = FALSE,
  nonsig_to_zero = FALSE,
  annot = "none",
  pathways_list = NULL,
  sim_de_donor_group = NULL,
  sig_thresh = 0.05,
  display_genes = FALSE,
  gene_callouts = FALSE,
  callout_n_gene_per_ctype = 5,
  callout_ctypes = NULL,
  show_var_explained = TRUE,
  reset_other_factor_plots = TRUE
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`use_sig_only`	logical If TRUE, includes only significant genes from jackstraw in the heatmap. If FALSE, includes all the variable genes. (default = FALSE)
`nonsig_to_zero`	logical If TRUE, makes the loadings of all nonsignificant genes 0 (default=FALSE)
`annot`	character If set to "pathways" then creates an adjacent heatmap showing which genes are in which pathways. If set to "sig_genes" then creates an adjacent heatmap showing which genes were significant from jackstraw. If set to "none" no adjacent heatmap is plotted. (default="none")
`pathways_list`	list A list of sets of pathways for each factor. List index should be the number corresponding to the factor. (default=NULL)
`sim_de_donor_group`	numeric To plot the ground truth significant genes from a simulation next to the heatmap, put the number of the donor group that corresponds to the factor being plotted. Here it should be a vector corresponding to the factors. (default=NULL)
`sig_thresh`	numeric Pvalue significance threshold to use. If use_sig_only is TRUE the threshold is used as a cutoff for genes to include. If annot is "sig_genes" this value is used in the gene significance colormap as a minimum threshold. (default=0.05)
`display_genes`	logical If TRUE, displays the names of gene names (default=FALSE)
`gene_callouts`	logical If TRUE, then adds gene callout annotations to the heatmap (default=FALSE)
`callout_n_gene_per_ctype`	numeric To use if gene_callouts is TRUE. Sets the number of largest magnitude significant genes from each cell type to include in gene callouts. (default=5)
`callout_ctypes`	list To use if gene_callouts is TRUE. Specifies which cell types to get gene callouts for. Each entry of the list should be a character vector of ctypes for the respective factor. If NULL, then gets gene callouts for largest magnitude significant genes for all cell types. (default=NULL)
`show_var_explained`	logical If TRUE then shows an anottation with the explained variance for each cell type (default=TRUE)
`reset_other_factor_plots`	logical If TRUE then removes any existing loadings plots (default=TRUE)

Value

The project container with the list of all loadings heatmap plots placed in container$plots$all_lds_plots.

Examples

test_container <- get_all_lds_factor_plots(test_container)
test_container <- get_all_lds_factor_plots(test_container)

Get gene callout annotations for a loadings heatmap

Description

Get gene callout annotations for a loadings heatmap

Usage

get_callouts_annot(
  container,
  tmp_casted_num,
  factor_select,
  sig_thresh,
  top_n_per_ctype = 5,
  ctypes = NULL
)
get_callouts_annot(
  container,
  tmp_casted_num,
  factor_select,
  sig_thresh,
  top_n_per_ctype = 5,
  ctypes = NULL
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`tmp_casted_num`	matrix The gene by cell type loadings matrix
`factor_select`	numeric The factor to investigate
`sig_thresh`	numeric Pvalue cutoff for significant genes
`top_n_per_ctype`	numeric The number of significant, largest magnitude genes from each cell type to generate callouts for (default=5)
`ctypes`	character The cell types for which to get the top genes to make callouts for. If NULL then uses all cell types. (default=NULL)

Value

A HeatmapAnnotation object for the gene callouts.

Get explained variance of the reconstructed data using one cell type from one factor

Description

Get explained variance of the reconstructed data using one cell type from one factor

Usage

get_ctype_exp_var(container, factor_use, ctype)
get_ctype_exp_var(container, factor_use, ctype)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factor_use`	numeric The factor to get variance explained for
`ctype`	character The cell type to get variance explained for

Value

The explained variance numeric value for one cell type of one factor.

Compute and plot associations between donor factor scores and donor proportions of major cell types

Description

Compute and plot associations between donor factor scores and donor proportions of major cell types

Usage

get_ctype_prop_associations(container, stat_type, n_col = 2)
get_ctype_prop_associations(container, stat_type, n_col = 2)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`stat_type`	character Either "fstat" to get F-Statistics, "adj_rsq" to get adjusted R-squared values, or "adj_pval" to get adjusted pvalues.
`n_col`	numeric The number of columns to organize the plots into (default=2)

Value

The project container with a cowplot figure of results plots in container$plots$ctype_prop_factor_associations.

Compute and plot associations between donor factor scores and donor proportions of cell subtypes

Description

Compute and plot associations between donor factor scores and donor proportions of cell subtypes

Usage

get_ctype_subc_prop_associations(
  container,
  ctype,
  res,
  n_col = 2,
  alt_name = NULL
)
get_ctype_subc_prop_associations(
  container,
  ctype,
  res,
  n_col = 2,
  alt_name = NULL
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`ctype`	character The cell type to get results for
`res`	numeric The clustering resolution to retrieve
`n_col`	numeric The number of columns to organize the plots into (default=2)
`alt_name`	character Alternate name for the cell type used in clustering (default=NULL)

Value

The project container with a cowplot figure of results plots in container$plots$ctype_prop_factor_associations.

Partition main gene by cell matrix into per cell type matrices with significantly variable genes only. Generally, this should be done through calling the form_tensor() wrapper function.

Description

Partition main gene by cell matrix into per cell type matrices with significantly variable genes only. Generally, this should be done through calling the form_tensor() wrapper function.

Usage

get_ctype_vargenes(
  container,
  method,
  thresh,
  ncores = container$experiment_params$ncores,
  seed = container$experiment_params$rand_seed
)
get_ctype_vargenes(
  container,
  method,
  thresh,
  ncores = container$experiment_params$ncores,
  seed = container$experiment_params$rand_seed
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`method`	character The method used to select significantly variable genes across donors within a cell type. Can be either "anova" to use basic anova with cells grouped by donor or "norm_var" to get the top overdispersed genes by normalized variance. Set to "norm_var_pvals" to use normalized variance p-values as calculated in pagoda2.
`thresh`	numeric A pvalue threshold to use for gene significance when method is set to "anova" or "empir". For the method "norm_var" thresh is the number of top overdispersed genes from each cell type to include.
`ncores`	numeric The number of cores to use (default=container$experiment_params$ncores)
`seed`	numeric Seed passed to set.seed() (default=container$experiment_params$rand_seed)

Value

The project container with pseudobulk matrices limted to the selected most variable genes.

Get metadata matrix of dimensions donors by variables (not per cell)

Description

Get metadata matrix of dimensions donors by variables (not per cell)

Usage

get_donor_meta(container, additional_meta = NULL, only_analyzed = TRUE)
get_donor_meta(container, additional_meta = NULL, only_analyzed = TRUE)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`additional_meta`	character A vector of other variables to include (default=NULL)
`only_analyzed`	logical Set to TRUE to only include donors that were included in the formed tensor, otherwise set to FALSE (default=TRUE)

Value

The project container with metadata per donor (not per cell) in container$donor_metadata.

Examples

test_container <- get_donor_meta(test_container, additional_meta='lanes')
test_container <- get_donor_meta(test_container, additional_meta='lanes')

Get the explained variance of the reconstructed data using one factor

Description

Get the explained variance of the reconstructed data using one factor

Usage

get_factor_exp_var(container, factor_use)
get_factor_exp_var(container, factor_use)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factor_use`	numeric The factor to investigate

Value

The explained variance numeric value for one factor.

Calculate adjusted p-values for gene_celltype fiber-donor score associations

Description

Calculate adjusted p-values for gene_celltype fiber-donor score associations

Usage

get_fstats_pvals(fstats_real, fstats_shuffled)
get_fstats_pvals(fstats_real, fstats_shuffled)

Arguments

`fstats_real`	numeric A vector of F-Statistics for gene-cell type-factor combinations
`fstats_shuffled`	numeric A vector of null F-Statistics

Value

A vector of adjusted p-values for associations of the unshuffled fibers with factor donor scores.

Compute WGCNA gene modules for each cell type

Description

Compute WGCNA gene modules for each cell type

Usage

get_gene_modules(container, sft_thresh)
get_gene_modules(container, sft_thresh)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`sft_thresh`	numeric A vector indicating the soft threshold to use for each cell type. Length should be the same as container$experiment_params$ctypes_use

Value

The project container with WGCNA gene co-expression modules added. The module eigengenes for each cell type are in container$module_eigengenes, and the module genes for each cell type are in container$module_genes.

Get logical vectors indicating which genes are in which pathways

Description

Get logical vectors indicating which genes are in which pathways

Usage

get_gene_set_vectors(container, gene_sets, tmp_casted_num)
get_gene_set_vectors(container, gene_sets, tmp_casted_num)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`gene_sets`	character Vector of gene sets to extract genes for
`tmp_casted_num`	matrix The gene by cell type loadings matrix

Value

A list of the logical vectors for each pathway.

Compute subtype proportion-factor association p-values for all subclusters of a given major cell type

Description

Compute subtype proportion-factor association p-values for all subclusters of a given major cell type

Usage

get_indv_subtype_associations(container, donor_props, factor_select)
get_indv_subtype_associations(container, donor_props, factor_select)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`donor_props`	matrix Donor proportions of subtypes
`factor_select`	numeric The factor to get associations for

Value

A vector of association statistics each cell subtype against a selected factor.

Extract the intersection of gene sets which are enriched in two or more cell types for a factor

Description

Extract the intersection of gene sets which are enriched in two or more cell types for a factor

Usage

get_intersecting_pathways(
  container,
  factor_select,
  these_ctypes_only,
  up_down,
  thresh = 0.05
)
get_intersecting_pathways(
  container,
  factor_select,
  these_ctypes_only,
  up_down,
  thresh = 0.05
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factor_select`	numeric The factor to investigate
`these_ctypes_only`	character A vector of cell types for which to get gene sets that are enriched in all of these and not in any other cell types
`up_down`	character Set to "up" to get the gene sets for the positive loading genes. Set to "down" to get the gene sets for the negative loadings genes.
`thresh`	numeric Pvalue significance threshold for selecting enriched sets (default=0.05)

Value

A vector of the intersection of pathways that are significantly enriched in two or more cell types for a factor.

Get the leading edge genes from GSEA results

Description

Get the leading edge genes from GSEA results

Usage

get_leading_edge_genes(container, factor_select, gsets, num_genes_per = 5)
get_leading_edge_genes(container, factor_select, gsets, num_genes_per = 5)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factor_select`	numeric The factor to get results for
`gsets`	character A vector of gene set names to get leading edge genes for.
`num_genes_per`	numeric The maximum number of leading edge genes to get for each gene set (default=5)

Value

A named character vector of gene sets, with leading edge genes as the names.

Compute gene-factor associations using univariate linear models

Description

Compute gene-factor associations using univariate linear models

Usage

get_lm_pvals(container, n.cores = container$experiment_params$ncores)
get_lm_pvals(container, n.cores = container$experiment_params$ncores)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`n.cores`	Number of cores to use (default = container$experiment_params$ncores)

Value

The project container with a vector of adjusted p-values for the gene-factor associations in container$gene_score_associations.

Examples

test_container <- get_lm_pvals(test_container, n.cores=1)
test_container <- get_lm_pvals(test_container, n.cores=1)

Computes the max correlation between each factor of the decomposition done using the whole dataset to each factor computed using the subsampled/bootstrapped dataset

Description

Computes the max correlation between each factor of the decomposition done using the whole dataset to each factor computed using the subsampled/bootstrapped dataset

Usage

get_max_correlations(res_full, res_sub, res_use)
get_max_correlations(res_full, res_sub, res_use)

Arguments

`res_full`	matrix Either the donor scores or loadings matrix from the original decomposition
`res_sub`	matrix Either the donor scores or loadings matrix from the new decomposition
`res_use`	character Can either be 'loadings' or 'dscores' and should correspond with the data matrix used

Value

a vector of the max correlations for each original factor

Get metadata associations with factor donor scores

Description

Get metadata associations with factor donor scores

Usage

get_meta_associations(container, vars_test, stat_use = "rsq")
get_meta_associations(container, vars_test, stat_use = "rsq")

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`vars_test`	character The names of meta variables to get associations for
`stat_use`	character Set to either 'rsq' to get r-squared values or 'pval' to get adjusted pvalues (default='rsq)

Value

The project container with a matrix of metadata associations with each factor in container$meta_associations.

Examples

test_container <- get_meta_associations(test_container, vars_test='lanes', stat_use='pval')
test_container <- get_meta_associations(test_container, vars_test='lanes', stat_use='pval')

Evaluate the minimum number for significant genes in any factor for a given number of factors extracted by the decomposition

Description

Evaluate the minimum number for significant genes in any factor for a given number of factors extracted by the decomposition

Usage

get_min_sig_genes(
  container,
  donor_rank_range,
  gene_ranks,
  use_lm = TRUE,
  tucker_type = "regular",
  rotation_type = "hybrid",
  n_fibers = 100,
  n_iter = 500,
  n.cores = container$experiment_params$ncores,
  thresh = 0.05
)
get_min_sig_genes(
  container,
  donor_rank_range,
  gene_ranks,
  use_lm = TRUE,
  tucker_type = "regular",
  rotation_type = "hybrid",
  n_fibers = 100,
  n_iter = 500,
  n.cores = container$experiment_params$ncores,
  thresh = 0.05
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses. Should have
`donor_rank_range`	numeric Range of possible number of donor factors to use.
`gene_ranks`	numeric The number of gene ranks to use in the decomposition
`use_lm`	logical Set to true to use get_lm_pvals otherwise uses jackstraw (default=TRUE)
`tucker_type`	character Set to 'regular' to run regular tucker or to 'sparse' to run tucker with sparsity constraints (default='regular')
`rotation_type`	character Set to 'hybrid' to perform hybrid rotation on resulting donor factor matrix and loadings. Otherwise set to 'ica_lds' to perform ica rotation on loadings or ica_dsc to perform ica on donor scores. (default='hybrid')
`n_fibers`	numeric The number of fibers the randomly shuffle in each jackstraw iteration (default=100)
`n_iter`	numeric The number of jackstraw shuffling iterations to complete (default=500)
`n.cores`	Number of cores to use in get_lm_pvals() (default = container$experiment_params$ncores)
`thresh`	numeric Pvalue threshold for significant genes in calculating the number of significant genes identified per factor. (default=0.05)

Value

The project container with a plot of the minimum significant genes for each decomposition with varying number of donor factors located in container$plots$min_sig_genes.

Examples

test_container <- get_min_sig_genes(test_container, donor_rank_range=c(2:4),
gene_ranks=4, tucker_type='regular', rotation_type='hybrid', n.cores=1)
test_container <- get_min_sig_genes(test_container, donor_rank_range=c(2:4),
gene_ranks=4, tucker_type='regular', rotation_type='hybrid', n.cores=1)

Identify gene sets that are enriched within specified gene co-regulatory modules. Uses a hypergeometric test for over-representation. Used in plot_multi_module_enr().

Description

Identify gene sets that are enriched within specified gene co-regulatory modules. Uses a hypergeometric test for over-representation. Used in plot_multi_module_enr().

Usage

get_module_enr(container, ctype, mod_select, db_use = "GO", adjust_pval = TRUE)
get_module_enr(container, ctype, mod_select, db_use = "GO", adjust_pval = TRUE)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`ctype`	character The name of cell type for the cell type module to test
`mod_select`	numeric The module number for the cell type module to test
`db_use`	character The database of gene sets to use. Database options include "GO", "Reactome", "KEGG", "BioCarta", "Hallmark", "TF", and "immuno". More than one database can be used. (default="GO")
`adjust_pval`	logical Set to TRUE to apply FDR correction (default=TRUE)

Value

A vector of p-values for the tested gene sets.

Get normalized variance for each gene, taking into account mean-variance trend

Description

Get normalized variance for each gene, taking into account mean-variance trend

Usage

get_normalized_variance(container)
get_normalized_variance(container)

Arguments

container

environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses

Value

The project container with vectors of normalized variances values in scMinimal objects for each cell type. Generally, this should be done through calling the form_tensor() wrapper function.

Plot factor-batch associations for increasing number of donor factors

Description

Plot factor-batch associations for increasing number of donor factors

Usage

get_num_batch_ranks(
  container,
  donor_ranks_test,
  gene_ranks,
  batch_var,
  thresh = 0.5,
  tucker_type = "regular",
  rotation_type = "hybrid"
)
get_num_batch_ranks(
  container,
  donor_ranks_test,
  gene_ranks,
  batch_var,
  thresh = 0.5,
  tucker_type = "regular",
  rotation_type = "hybrid"
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`donor_ranks_test`	numeric The number of donor rank values to test
`gene_ranks`	numeric The number of gene ranks to use throughout
`batch_var`	character The name of the batch meta variable
`thresh`	numeric The threshold r-squared cutoff for considering a factor to be a batch factor. Can be a vector of multiple values to get plots at varying thresholds. (default=0.5)
`tucker_type`	character Set to 'regular' to run regular tucker or to 'sparse' to run tucker with sparsity constraints (default='regular')
`rotation_type`	character Set to 'hybrid' to optimize loadings via our hybrid method (see paper for details). Set to 'ica_dsc' to perform ICA rotation on resulting donor factor matrix. Set to 'ica_lds' to optimize loadings by the ICA rotation. (default='hybrid')

Value

A ggpubr figure of ggplot objects showing batch-factor associations and placed in container$plots$num_batch_factors slot

Examples

test_container <- get_num_batch_ranks(test_container, donor_ranks_test=c(2:4), 
gene_ranks=10, batch_var='lanes', thresh=0.5, tucker_type='regular', rotation_type='hybrid')
test_container <- get_num_batch_ranks(test_container, donor_ranks_test=c(2:4), 
gene_ranks=10, batch_var='lanes', thresh=0.5, tucker_type='regular', rotation_type='hybrid')

Get the donor scores and loadings matrix for a single-factor

Description

Get the donor scores and loadings matrix for a single-factor

Usage

get_one_factor(container, factor_select)
get_one_factor(container, factor_select)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factor_select`	numeric The number corresponding to the factor to extract

Value

A list with the first element as the donor scores and the second element as the corresponding loadings matrix for one factor.

Examples

f1_res <- get_one_factor(test_container, factor_select=1)
f1_res <- get_one_factor(test_container, factor_select=1)

Get significant genes for a factor

Description

Get significant genes for a factor

Usage

get_one_factor_gene_pvals(container, factor_select)
get_one_factor_gene_pvals(container, factor_select)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factor_select`	numeric The number corresponding to the factor to extract

Value

A gene by cell type matrix of gene significance p-values for a factor

Collapse data from cell-level to donor-level via summing counts. Generally, this should be done through calling the form_tensor() wrapper function.

Description

Collapse data from cell-level to donor-level via summing counts. Generally, this should be done through calling the form_tensor() wrapper function.

Usage

get_pseudobulk(container, shuffle = FALSE, shuffle_within = NULL)
get_pseudobulk(container, shuffle = FALSE, shuffle_within = NULL)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`shuffle`	logical Set to TRUE to shuffle cell-donor linkages (default=FALSE)
`shuffle_within`	character A metadata variable to shuffle cell-donor linkages within (default=NULL)

Value

The project container with pseudobulked count matrices in container$scMinimal_ctype$<ctype>$pseudobulk slots for each cell type.

Get F-Statistics for the real (non-shuffled) gene_ctype fibers

Description

Get F-Statistics for the real (non-shuffled) gene_ctype fibers

Usage

get_real_fstats(container, ncores)
get_real_fstats(container, ncores)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`ncores`	numeric The number of cores to use

Value

A vector F-statistics for each gene_celltype-factor association of the unshuffled data.

Calculate reconstruction errors using svd approach

Description

Calculate reconstruction errors using svd approach

Usage

get_reconstruct_errors_svd(tnsr, max_ranks_test, shuffle_tensor)
get_reconstruct_errors_svd(tnsr, max_ranks_test, shuffle_tensor)

Arguments

`tnsr`	array A 3-dimensional array with dimensions of donors, genes, and cell types in that order
`max_ranks_test`	numeric Vector of length 3 with maximum number of ranks to test for donor, gene, and cell type modes in that order
`shuffle_tensor`	logical Set to TRUE to shuffle values within the tensor

Value

A list of reconstruction errors for each mode of the tensor.

Get vectors indicating which genes are significant in which cell types for a factor of interest

Description

Get vectors indicating which genes are significant in which cell types for a factor of interest

Usage

get_significance_vectors(container, factor_select, ctypes)
get_significance_vectors(container, factor_select, ctypes)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factor_select`	numeric The factor to query
`ctypes`	character The cell types used in all the analysis ordered as they appear in the loadings matrix

Value

A list of the adjusted p-values for expression of each gene in each cell type in association with a factor of interest.

Get list of cell subtype differential expression heatmaps

Description

Get list of cell subtype differential expression heatmaps

Usage

get_subclust_de_hmaps(container, all_ctypes, all_res)
get_subclust_de_hmaps(container, all_ctypes, all_res)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`all_ctypes`	character A vector of the cell types to include
`all_res`	numeric A vector of resolutions matching the all_ctypes parameter

Value

A list of cell subcluster DE marker gene heatmaps as grob objects.

Get scatter plot for association of a cell subtype proportion with scores for a factor

Description

Get scatter plot for association of a cell subtype proportion with scores for a factor

Usage

get_subclust_enr_dotplot(
  container,
  ctype,
  res,
  subtype,
  factor_use,
  ctype_cur = ctype
)
get_subclust_enr_dotplot(
  container,
  ctype,
  res,
  subtype,
  factor_use,
  ctype_cur = ctype
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`ctype`	character The cell type to plot
`res`	numeric The subcluster resolution to use
`subtype`	numeric The number corresponding with the subtype of the major cell type to plot
`factor_use`	numeric The factor to plot
`ctype_cur`	character The name of the major cell type used in the main analysis

Value

A ggplot object of each donor's cell subcluster proportions against donor scores for a selected factor.

Get a figure showing cell subtype proportion associations with each factor. Combines this plot with subtype UMAPs and differential expression heatmaps. Note that this function runs better if the number of cores in the conos object in container$embedding has n.cores set to a relatively small value < 10.

Description

Get a figure showing cell subtype proportion associations with each factor. Combines this plot with subtype UMAPs and differential expression heatmaps. Note that this function runs better if the number of cores in the conos object in container$embedding has n.cores set to a relatively small value < 10.

Usage

get_subclust_enr_fig(container, all_ctypes, all_res)
get_subclust_enr_fig(container, all_ctypes, all_res)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`all_ctypes`	character A vector of the cell types to include
`all_res`	numeric A vector of resolutions matching the all_ctypes parameter

Value

A cowplot figure placed in the slot container$plots$subc_fig.

Get heatmap of subtype proportion associations for each celltype/subtype and each factor

Description

Get heatmap of subtype proportion associations for each celltype/subtype and each factor

Usage

get_subclust_enr_hmap(container, all_ctypes, all_res, all_factors)
get_subclust_enr_hmap(container, all_ctypes, all_res, all_factors)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`all_ctypes`	character A vector of the cell types to include
`all_res`	numeric A vector of resolutions matching the all_ctypes parameter
`all_factors`	numerc A vector of the factors to compute associations for

Value

A ComplexHeatmap object in container$plots$subc_enr_hmap showing the univariate associations between cell subcluster proportions and each factor.

Get a figure to display subclusterings at multiple resolutions

Description

Get a figure to display subclusterings at multiple resolutions

Usage

get_subclust_umap(container, all_ctypes, all_res, n_col = 3)
get_subclust_umap(container, all_ctypes, all_res, n_col = 3)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`all_ctypes`	character A vector of the cell types to include
`all_res`	numeric A vector of resolutions matching the all_ctypes parameter
`n_col`	numeric The number of columns to organize the figure into (default=3)

Value

The project container with a cowplot figure of all UMAP plots in container$plots$subc_umap_fig and the individual umap plots in container$plots$subc_umaps

Perform leiden subclustering to get cell subtypes

Description

Perform leiden subclustering to get cell subtypes

Usage

get_subclusters(
  container,
  ctype,
  resolution,
  min_cells_group = 50,
  small_clust_action = "merge"
)
get_subclusters(
  container,
  ctype,
  resolution,
  min_cells_group = 50,
  small_clust_action = "merge"
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`ctype`	character The cell type to do subclustering for
`resolution`	numeric The leiden resolution to use
`min_cells_group`	numeric The minimum allowable cluster size (default=50)
`small_clust_action`	character Either 'remove' to remove subclusters or 'merge' to merge clusters below min_cells_group threshold to the nearest cluster above the size threshold (default='merge')

Value

A vector of cell subclusters.

Compute and plot associations between factor scores and cell subtype composition for various clustering resolution parameters

Description

Compute and plot associations between factor scores and cell subtype composition for various clustering resolution parameters

Usage

get_subtype_prop_associations(
  container,
  max_res,
  stat_type,
  integration_var = NULL,
  min_cells_group = 50,
  use_existing_subc = FALSE,
  alt_ct_names = NULL,
  n_col = 2
)
get_subtype_prop_associations(
  container,
  max_res,
  stat_type,
  integration_var = NULL,
  min_cells_group = 50,
  use_existing_subc = FALSE,
  alt_ct_names = NULL,
  n_col = 2
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`max_res`	numeric The maximum clustering resolution to use. Minimum is 0.5.
`stat_type`	character Either "fstat" to get F-Statistics, "adj_rsq" to get adjusted R-squared values, or "adj_pval" to get adjusted pvalues.
`integration_var`	character The meta data variable to use for creating the joint embedding with Conos if not already provided in container$embedding (default=NULL)
`min_cells_group`	numeric The minimum allowable size for cell subpopulations (default=50)
`use_existing_subc`	logical Set to TRUE to use existing subcluster annotations (default=FALSE)
`alt_ct_names`	character Cell type names used in clustering if different from those used in the main analysis. Should match the order of container$experiment_params$ctypes_use. (default=NULL)
`n_col`	numeric The number of columns to organize the plots into (default=2)

Value

The project container with a cowplot figure of cell subtype proportion-factor association results plots in container$plots$subtype_prop_factor_associations.

Calculates factor-stratified sums for each column. Adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/src/misc2.cpp

Description

Calculates factor-stratified sums for each column. Adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/src/misc2.cpp

Usage

get_sums(sY, rowSel)
get_sums(sY, rowSel)

Arguments

`sY`	sparse matrix Gene by cell matrix of counts
`rowSel`	factor The donor that each cell is from

Value

matrix of summed counts per gene per sample

Visualize the similarity matrix and the clustering. Adapted from simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/ht_clusters.R

Description

Visualize the similarity matrix and the clustering. Adapted from simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/ht_clusters.R

Usage

ht_clusters(
  mat,
  cl,
  dend = NULL,
  col = c("white", "red"),
  draw_word_cloud = is_GO_id(rownames(mat)[1]) || !is.null(term),
  term = NULL,
  min_term = 5,
  order_by_size = FALSE,
  exclude_words = character(0),
  max_words = 10,
  word_cloud_grob_param = list(),
  fontsize_range = c(4, 16),
  column_title = NULL,
  ht_list = NULL,
  use_raster = TRUE,
  ...
)
ht_clusters(
  mat,
  cl,
  dend = NULL,
  col = c("white", "red"),
  draw_word_cloud = is_GO_id(rownames(mat)[1]) || !is.null(term),
  term = NULL,
  min_term = 5,
  order_by_size = FALSE,
  exclude_words = character(0),
  max_words = 10,
  word_cloud_grob_param = list(),
  fontsize_range = c(4, 16),
  column_title = NULL,
  ht_list = NULL,
  use_raster = TRUE,
  ...
)

Arguments

`mat`	A similarity matrix.
`cl`	Cluster labels inferred from the similarity matrix, e.g. from 'cluster_terms' or 'binary_cut'.
`dend`	Used internally.
`col`	A vector of colors that map from 0 to the 95^th percentile of the similarity values.
`draw_word_cloud`	Whether to draw the word clouds.
`term`	The full name or the description of the corresponding GO IDs.
`min_term`	Minimal number of functional terms in a cluster. All the clusters with size less than “min_term“ are all merged into one separated cluster in the heatmap.
`order_by_size`	Whether to reorder clusters by their sizes. The cluster that is merged from small clusters (size < “min_term“) is always put to the bottom of the heatmap.
`exclude_words`	Words that are excluded in the word cloud.
`max_words`	Maximal number of words visualized in the word cloud.
`word_cloud_grob_param`	A list of graphic parameters passed to 'word_cloud_grob'.
`fontsize_range`	The range of the font size. The value should be a numeric vector with length two. The minimal font size is mapped to word frequency value of 1 and the maximal font size is mapped to the maximal word frequency. The font size interlopation is linear.
`column_title`	Column title for the heatmap.
`ht_list`	A list of additional heatmaps added to the left of the similarity heatmap.
`use_raster`	Whether to write the heatmap as a raster image.
`...`	other parameters

Value

A list containing a 'ComplexHeatmap::HeatmapList-class' object and GO term ordering.

Extract metadata for sex information if not provided already

Description

Extract metadata for sex information if not provided already

Usage

identify_sex_metadata(container, y_gene = "RPS4Y1", x_gene = "XIST")
identify_sex_metadata(container, y_gene = "RPS4Y1", x_gene = "XIST")

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`y_gene`	character Gene name to use for identifying male donors (default='RPS4Y1')
`x_gene`	character Gene name to use for identifying female donors (default='XIST')

Value

The project container with sex metadata added to the metadata.

Initialize parameters to be used throughout scITD in various functions

Description

Initialize parameters to be used throughout scITD in various functions

Usage

initialize_params(ctypes_use, ncores = 4, rand_seed = 10)
initialize_params(ctypes_use, ncores = 4, rand_seed = 10)

Arguments

`ctypes_use`	character Names of the cell types to use for the analysis (default=NULL)
`ncores`	numeric Number of cores to use (default=4)
`rand_seed`	numeric Random seed to use (default=10)

Value

A list of the experiment parameters to use.

Examples

param_list <- initialize_params(ctypes_use = c("CD4+ T", "CD8+ T"),
ncores = 1, rand_seed = 10)
param_list <- initialize_params(ctypes_use = c("CD4+ T", "CD8+ T"),
ncores = 1, rand_seed = 10)

Create an scMinimal object. Generally, this should be done through calling the make_new_container() wrapper function.

Description

Create an scMinimal object. Generally, this should be done through calling the make_new_container() wrapper function.

Usage

instantiate_scMinimal(
  count_data,
  meta_data,
  metadata_cols = NULL,
  metadata_col_nm = NULL
)
instantiate_scMinimal(
  count_data,
  meta_data,
  metadata_cols = NULL,
  metadata_col_nm = NULL
)

Arguments

`count_data`	sparseMatrix Matrix of raw counts with genes as rows and cells as columns
`meta_data`	data.frame Metadata with cells as rows and variables as columns. Number of rows in metadata should equal number of columns in count matrix.
`metadata_cols`	character The names of the metadata columns to use (default=NULL)
`metadata_col_nm`	character New names for the selected metadata columns if wish to change their names. If NULL, then the preexisting column names are used. (default=NULL)

Value

An scMinimal object holding counts and metadata for a project.

Examples

scMinimal <- instantiate_scMinimal(count_data=test_container$scMinimal_full$count_data,
meta_data=test_container$scMinimal_full$metadata)
scMinimal <- instantiate_scMinimal(count_data=test_container$scMinimal_full$count_data,
meta_data=test_container$scMinimal_full$metadata)

Check if a character is a go ID

Description

Check if a character is a go ID

Usage

is_GO_id(x)
is_GO_id(x)

Arguments

`x`	A character

Value

A logical

Create a container to store all data and results for the project. You must provide a params list as generated by initialize_params(). You also need to provide either a Seurat object or both a count_data matrix and a meta_data matrix.

Description

Create a container to store all data and results for the project. You must provide a params list as generated by initialize_params(). You also need to provide either a Seurat object or both a count_data matrix and a meta_data matrix.

Usage

make_new_container(
  params,
  count_data = NULL,
  meta_data = NULL,
  seurat_obj = NULL,
  scMinimal = NULL,
  gn_convert = NULL,
  metadata_cols = NULL,
  metadata_col_nm = NULL,
  label_donor_sex = FALSE
)
make_new_container(
  params,
  count_data = NULL,
  meta_data = NULL,
  seurat_obj = NULL,
  scMinimal = NULL,
  gn_convert = NULL,
  metadata_cols = NULL,
  metadata_col_nm = NULL,
  label_donor_sex = FALSE
)

Arguments

`params`	list A list of the experiment params to use as generated by initialize_params()
`count_data`	dgCMatrix Matrix of raw counts with genes as rows and cells as columns (default=NULL)
`meta_data`	data.frame Metadata with cells as rows and variables as columns. Number of rows in metadata should equal number of columns in count matrix (default=NULL)
`seurat_obj`	Seurat object that has been cleaned and includes the normalized, log-transformed counts. The meta.data should include a column with the header 'sex' and values of 'M' or 'F' if available. The metadata should also have a column with the header 'ctypes' with the corresponding names of the cell types as well as a column with header 'donors' that contains identifiers for each donor. (default=NULL)
`scMinimal`	environment A sub-container for the project typically consisting of gene expression data in its raw and processed forms as well as metadata (default=NULL)
`gn_convert`	data.frame Gene identifier -> gene name conversions table. Gene identifiers used in counts matrices should appear in the first column and the corresponding gene symbols should appear in the second column. Can remain NULL if the identifiers are already gene symbols. (default=NULL)
`metadata_cols`	character The names of the metadata columns to use (default=NULL)
`metadata_col_nm`	character New names for the selected metadata columns if wish to change their names. If NULL, then the preexisting column names are used. (default=NULL)
`label_donor_sex`	logical Set to TRUE to label donor sex in the meta data by using expressing of sex-associated genes (default=FALSE)

Value

A project container of class environment that stores sub-containers for each cell type as well as results and plots from all analyses.

Merge small subclusters into larger ones

Description

Merge small subclusters into larger ones

Usage

merge_small_clusts(con, clusts, min_cells_group)
merge_small_clusts(con, clusts, min_cells_group)

Arguments

`con`	conos Object for the dataset with umap projection and groups as cell types
`clusts`	character The initially assigned subclusters by leiden clustering
`min_cells_group`	numeric The minimum allowable cluster size

Value

The subcluster labels with small clusters below the size threshold merged into the nearest larger cluster.

Computes non-negative matrix factorization on the tensor unfolded along the donor dimension

Description

Computes non-negative matrix factorization on the tensor unfolded along the donor dimension

Usage

nmf_unfolded(container, ranks)
nmf_unfolded(container, ranks)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`ranks`	numeric The number of factors to extract. Unlike with the Tucker decomposition, this should be a single number.

Value

The project container with results of the decomposition in container$tucker_results. The results object is a list with the donor scores matrix in the first element and the unfolded loadings matrix in the second element.

Examples

test_container <- nmf_unfolded(test_container, 2)
test_container <- nmf_unfolded(test_container, 2)

Calculates the normalized variance for each gene. This is adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/R/Pagoda2.R Generally, this should be done through calling the form_tensor() wrapper function.

Description

Calculates the normalized variance for each gene. This is adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/R/Pagoda2.R Generally, this should be done through calling the form_tensor() wrapper function.

Usage

norm_var_helper(scMinimal)
norm_var_helper(scMinimal)

Arguments

scMinimal

environment A sub-container for the project typically consisting of gene expression data in its raw and processed forms as well as metadata

Value

A list with the first element containing a vector of the normalized variance for each gene and the second element containing log-transformed adjusted p-values for the overdispersion of each gene.

Helper function to normalize and log-transform count data

Description

Helper function to normalize and log-transform count data

Usage

normalize_counts(count_data, scale_factor = 10000)
normalize_counts(count_data, scale_factor = 10000)

Arguments

`count_data`	matrix or sparse matrix Gene by cell matrix of counts
`scale_factor`	numeric The number that gets multiplied by fractional counts during normalization of the pseudobulked data (default=10000)

Value

The normalized, log-transformed matrix.

Normalize the pseudobulked counts matrices. Generally, this should be done through calling the form_tensor() wrapper function.

Description

Normalize the pseudobulked counts matrices. Generally, this should be done through calling the form_tensor() wrapper function.

Usage

normalize_pseudobulk(container, method = "trim", scale_factor = 10000)
normalize_pseudobulk(container, method = "trim", scale_factor = 10000)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`method`	character The normalization method to use on the pseudobulked count data. Set to 'regular' to do standard normalization of dividing by library size. Set to 'trim' to use edgeR trim-mean normalization, whereby counts are divided by library size times a normalization factor. (default='trim')
`scale_factor`	numeric The number that gets multiplied by fractional counts during normalization of the pseudobulked data (default=10000)

Value

The project container with normalized pseudobulk matrices in container$scMinimal_ctype$<ctype>$pseudobulk slots.

Parse main counts matrix into per-celltype-matrices. Generally, this should be done through calling the form_tensor() wrapper function.

Description

Parse main counts matrix into per-celltype-matrices. Generally, this should be done through calling the form_tensor() wrapper function.

Usage

parse_data_by_ctypes(container)
parse_data_by_ctypes(container)

Arguments

container

environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses

Value

The project container with separate scMinimal objects per cell type in the container$scMinimal_ctype slot

Computes singular-value decomposition on the tensor unfolded along the donor dimension

Description

Computes singular-value decomposition on the tensor unfolded along the donor dimension

Usage

pca_unfolded(container, ranks)
pca_unfolded(container, ranks)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`ranks`	numeric The number of factors to extract. Unlike with the Tucker decomposition, this should be a single number.

Value

Examples

test_container <- pca_unfolded(test_container, 2)
test_container <- pca_unfolded(test_container, 2)

Plot matrix of donor scores extracted from Tucker decomposition

Description

Plot matrix of donor scores extracted from Tucker decomposition

Usage

plot_donor_matrix(
  container,
  meta_vars = NULL,
  cluster_by_meta = NULL,
  show_donor_ids = FALSE,
  add_meta_associations = NULL,
  show_var_explained = TRUE,
  donors_sel = NULL,
  h_w = NULL
)
plot_donor_matrix(
  container,
  meta_vars = NULL,
  cluster_by_meta = NULL,
  show_donor_ids = FALSE,
  add_meta_associations = NULL,
  show_var_explained = TRUE,
  donors_sel = NULL,
  h_w = NULL
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`meta_vars`	character Names of metadata variables to plot alongside the donor scores. Can include more than one variable. (default=NULL)
`cluster_by_meta`	character One metadata variable to cluster the heatmap by. If NULL, donor clustering is done using donor scores. (default=NULL)
`show_donor_ids`	logical Set to TRUE to show donor id as row name on the heamap (default=FALSE)
`add_meta_associations`	character Adds meta data associations with each factor as top annotation. These should be generated first with plot_meta_associations(). Set to 'pval' if used 'pval' in plot_meta_associations(), otherwise set to 'rsq'. If NULL, no annotation is added. (default=NULL)
`show_var_explained`	logical Set to TRUE to display the explained variance for each factor (default=TRUE)
`donors_sel`	character A vector of a subset of donors to include in the plot (default=NULL)
`h_w`	numeric Vector specifying height and width (defualt=NULL)

Value

The project container with a heatmap plot of donor scores in container$plots$donor_matrix.

Examples

test_container <- plot_donor_matrix(test_container, show_donor_ids = TRUE)
test_container <- plot_donor_matrix(test_container, show_donor_ids = TRUE)

Plot donor celltype/subtype proportions against each factor

Description

Plot donor celltype/subtype proportions against each factor

Usage

plot_donor_props(
  donor_props,
  donor_scores,
  significance,
  ctype_mapping = NULL,
  stat_type = "adj_pval",
  n_col = 2
)
plot_donor_props(
  donor_props,
  donor_scores,
  significance,
  ctype_mapping = NULL,
  stat_type = "adj_pval",
  n_col = 2
)

Arguments

`donor_props`	data.frame Donor proportions as output from compute_donor_props()
`donor_scores`	data.frame Donor scores from tucker results
`significance`	numeric F-Statistics as output from compute_associations()
`ctype_mapping`	character The cell types corresponding with columns of donor_props (default=NULL)
`stat_type`	character Either "fstat" to get F-Statistics, "adj_rsq" to get adjusted R-squared values, or "adj_pval" to get adjusted pvalues (default='adj_pval')
`n_col`	numeric The number of columns to organize the plots into (default=2)

Value

A cowplot figure of ggplot objects for proportions of each cell type against donor factor scores for each factor.

Generate a gene by donor heatmap showing scaled expression of top loading genes for a given factor

Description

Generate a gene by donor heatmap showing scaled expression of top loading genes for a given factor

Usage

plot_donor_sig_genes(
  container,
  factor_select,
  top_n_per_ctype,
  ctypes_use = NULL,
  show_donor_labels = FALSE,
  additional_meta = NULL,
  add_genes = NULL
)
plot_donor_sig_genes(
  container,
  factor_select,
  top_n_per_ctype,
  ctypes_use = NULL,
  show_donor_labels = FALSE,
  additional_meta = NULL,
  add_genes = NULL
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factor_select`	numeric The factor to query
`top_n_per_ctype`	numeric Vector of the number of top genes from each cell type to plot
`ctypes_use`	character The cell types for which to get the top genes to make callouts for. If NULL then uses all cell types. (default=NULL)
`show_donor_labels`	logical Set to TRUE to display donor labels (default=FALSE)
`additional_meta`	character Another meta variable to plot (default=NULL)
`add_genes`	character Additional genes to plot for all ctypes (default=NULL)

Value

The project container with a heatmap plot in the slot container$plots$donor_sig_genes$<Factor#>. This heatmap shows scaled expression of top loading genes in each cell type for a selected factor.

Examples

test_container <- plot_donor_sig_genes(test_container, factor_select=1,
top_n_per_ctype=2)
test_container <- plot_donor_sig_genes(test_container, factor_select=1,
top_n_per_ctype=2)

Compute enrichment of donor metadata categorical variables at high/low factor scores

Description

Compute enrichment of donor metadata categorical variables at high/low factor scores

Usage

plot_dscore_enr(container, factor_use, meta_var)
plot_dscore_enr(container, factor_use, meta_var)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factor_use`	numeric The factor to test
`meta_var`	character The name of the metadata variable to test

Value

A cowplot figure of enrichment plots.

Examples

fig <- plot_dscore_enr(test_container, factor_use=1, meta_var='lanes')
fig <- plot_dscore_enr(test_container, factor_use=1, meta_var='lanes')

Plot enriched gene sets from all cell types in a heatmap

Description

Plot enriched gene sets from all cell types in a heatmap

Usage

plot_gsea_hmap(container, factor_select, thresh)
plot_gsea_hmap(container, factor_select, thresh)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factor_select`	numeric The factor to plot
`thresh`	numeric Pvalue threshold to use for including gene sets in the heatmap

Value

A stacked heatmap object from ComplexHeatmap.

Plot already computed enriched gene sets to show semantic similarity between sets

Description

Plot already computed enriched gene sets to show semantic similarity between sets

Usage

plot_gsea_hmap_w_similarity(
  container,
  factor_select,
  direc,
  thresh,
  exclude_words = character(0)
)
plot_gsea_hmap_w_similarity(
  container,
  factor_select,
  direc,
  thresh,
  exclude_words = character(0)
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factor_select`	numeric The factor to plot
`direc`	character Set to either 'up' or 'down' to use the appropriate sets
`thresh`	numeric Pvalue threshold to use for including gene sets in the heatmap
`exclude_words`	character Vector of words to exclude from word cloud (default=character(0))

Value

No value is returned. A heatmap showing enriched gene sets clustered by semantic similarity is drawn.

Look at enriched gene sets from a cluster of semantically similar gene sets. Uses the results from previous run of plot_gsea_hmap_w_similarity()

Description

Look at enriched gene sets from a cluster of semantically similar gene sets. Uses the results from previous run of plot_gsea_hmap_w_similarity()

Usage

plot_gsea_sub(container, clust_select, thresh = 0.05)
plot_gsea_sub(container, clust_select, thresh = 0.05)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`clust_select`	numeric The cluster to plot gene sets from. On the previous semantic similarity plot, cluster numbering starts from the top as 1.
`thresh`	numeric Color threshold to use for showing significance (default=0.05)

Value

A heatmap plot from ComplexHeatmap showing one semantic similarity cluster of enriched gene sets with adjusted p-values for each cell type.

Plot the gene by celltype loadings for a factor

Description

Plot the gene by celltype loadings for a factor

Usage

plot_loadings_annot(
  container,
  factor_select,
  use_sig_only = FALSE,
  nonsig_to_zero = FALSE,
  annot = "none",
  pathways = NULL,
  sim_de_donor_group = NULL,
  sig_thresh = 0.05,
  display_genes = FALSE,
  gene_callouts = FALSE,
  callout_n_gene_per_ctype = 5,
  callout_ctypes = NULL,
  specific_callouts = NULL,
  le_set_callouts = NULL,
  le_set_colormap = NULL,
  le_set_num_per = 5,
  show_le_legend = FALSE,
  show_xlab = TRUE,
  show_var_explained = TRUE,
  clust_method = "median",
  h_w = NULL,
  reset_other_factor_plots = FALSE,
  draw_plot = TRUE
)
plot_loadings_annot(
  container,
  factor_select,
  use_sig_only = FALSE,
  nonsig_to_zero = FALSE,
  annot = "none",
  pathways = NULL,
  sim_de_donor_group = NULL,
  sig_thresh = 0.05,
  display_genes = FALSE,
  gene_callouts = FALSE,
  callout_n_gene_per_ctype = 5,
  callout_ctypes = NULL,
  specific_callouts = NULL,
  le_set_callouts = NULL,
  le_set_colormap = NULL,
  le_set_num_per = 5,
  show_le_legend = FALSE,
  show_xlab = TRUE,
  show_var_explained = TRUE,
  clust_method = "median",
  h_w = NULL,
  reset_other_factor_plots = FALSE,
  draw_plot = TRUE
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factor_select`	numeric The factor to plot
`use_sig_only`	logical If TRUE, includes only significant genes from jackstraw in the heatmap. If FALSE, includes all the variable genes. (default = FALSE)
`nonsig_to_zero`	logical If TRUE, makes the loadings of all nonsignificant genes 0 (default=FALSE)
`annot`	character If set to "pathways" then creates an adjacent heatmap showing which genes are in which pathways. If set to "sig_genes" then creates an adjacent heatmap showing which genes were significant from jackstraw. If set to "none" no adjacent heatmap is plotted. (default="none")
`pathways`	character Gene sets to plot if annot is set to "pathways" (default=NULL)
`sim_de_donor_group`	numeric To plot the ground truth significant genes from a simulation next to the heatmap, put the number of the donor group that corresponds to the factor being plotted (default=NULL)
`sig_thresh`	numeric Pvalue significance threshold to use. If use_sig_only is TRUE the threshold is used as a cutoff for genes to include. If annot is "sig_genes" this value is used in the gene significance colormap as a minimum threshold. (default=0.05)
`display_genes`	logical If TRUE, displays the names of gene names (default=FALSE)
`gene_callouts`	logical If TRUE, then adds gene callout annotations to the heatmap (default=FALSE)
`callout_n_gene_per_ctype`	numeric To use if gene_callouts is TRUE. Sets the number of largest magnitude significant genes from each cell type to include in gene callouts. (default=5)
`callout_ctypes`	character To use if gene_callouts is TRUE. Specifies which cell types to get gene callouts for. If NULL, then gets gene callouts for largest magnitude significant genes for all cell types. (default=NULL)
`specific_callouts`	character A vector of gene names to show callouts for (default=NULL)
`le_set_callouts`	character Pass a vector of gene set names to show leading edge genes for a select set of gene sets (default=NULL)
`le_set_colormap`	character A named vector with names as gene sets and values as colors. If NULL, then selects first n colors of Set3 color palette. (default=NULL)
`le_set_num_per`	numeric The number of leading edge genes to show for each gene set (default=5)
`show_le_legend`	logical Set to TRUE to show the color map legend for leading edge genes (default=FALSE)
`show_xlab`	logical If TRUE, displays the xlabel 'genes' (default=TRUE)
`show_var_explained`	logical If TRUE then shows an anotation with the explained variance for each cell type (default=TRUE)
`clust_method`	character The hclust method to use for clustering rows (default='median')
`h_w`	numeric Vector specifying height and width (defualt=NULL)
`reset_other_factor_plots`	logical Set to TRUE to set all other loadings plots to NULL. Useful if run get_all_lds_factor_plots but then only want to show one or two plots. (default=FALSE)
`draw_plot`	logical Set to TRUE to show the plot. Plot is stored regardless. (default=TRUE)

Value

The project container with a heatmap of loadings for one factor put in container$plots$all_lds_plots. The legend for the heatmap is put in container$plots$all_legends. Use draw(<hmap obj>,annotation_legend_list = <hmap legend obj>) to re-render the plot with legend.

Examples

test_container <- plot_loadings_annot(test_container, 1, display_genes=FALSE,
show_var_explained = TRUE)
test_container <- plot_loadings_annot(test_container, 1, display_genes=FALSE,
show_var_explained = TRUE)

Plot trio of associations between ligand expression, module eigengenes, and factor scores

Description

Plot trio of associations between ligand expression, module eigengenes, and factor scores

Usage

plot_mod_and_lig(container, factor_select, mod_ct, mod, lig_ct, lig)
plot_mod_and_lig(container, factor_select, mod_ct, mod, lig_ct, lig)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factor_select`	numeric The factor to use
`mod_ct`	character The name of the cell type for the corresponding module
`mod`	numeric The number of the corresponding module
`lig_ct`	character The name of the cell type where the ligand is expressed
`lig`	character The name of the ligand to use

Value

A cowplot figure of ggplot objects for the three associations scatter plots.

Generate gene set x ct_module heatmap showing co-expression module gene set enrichment results

Description

Generate gene set x ct_module heatmap showing co-expression module gene set enrichment results

Usage

plot_multi_module_enr(
  container,
  ctypes,
  modules,
  sig_thresh = 0.05,
  db_use = "TF",
  max_plt_pval = 0.1,
  h_w = NULL
)
plot_multi_module_enr(
  container,
  ctypes,
  modules,
  sig_thresh = 0.05,
  db_use = "TF",
  max_plt_pval = 0.1,
  h_w = NULL
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`ctypes`	character A vector of cell type names corresponding to the module numbers in mod_select, specifying the modules to compute enrichment for
`modules`	numeric A vector of module numbers corresponding to the cell types in ctype, specifying the modules to compute enrichment for
`sig_thresh`	numeric P-value threshold for results to include. Only shows a given gene set if at least one module has a result lower than the threshold. (default=0.05)
`db_use`	character The database of gene sets to use. Database options include "GO", "Reactome", "KEGG", "BioCarta", "Hallmark", "TF", and "immuno". More than one database can be used. (default="GO")
`max_plt_pval`	max pvalue shown on plot, but not used to remove rows like sig_thresh (default=.1)
`h_w`	numeric Vector specifying height and width (defualt=NULL)

Value

A ComplexHeatmap object of enrichment results.

Plot reconstruction errors as bar plot for svd method

Description

Plot reconstruction errors as bar plot for svd method

Usage

plot_rec_errors_bar_svd(real, shuffled, mode_to_show)
plot_rec_errors_bar_svd(real, shuffled, mode_to_show)

Arguments

`real`	list The real reconstruction errors
`shuffled`	list The reconstruction errors under null model
`mode_to_show`	numeric The mode to plot the results for

Value

A ggplot object showing the difference in reconstruction errors for successive factors.

Plot reconstruction errors as line plot for svd method

Description

Plot reconstruction errors as line plot for svd method

Usage

plot_rec_errors_line_svd(real, shuffled, mode_to_show)
plot_rec_errors_line_svd(real, shuffled, mode_to_show)

Arguments

`real`	list The real reconstruction errors
`shuffled`	list The reconstruction errors under null model
`mode_to_show`	numeric The mode to plot the results for

Value

A ggplot object showing relative reconstruction errors.

Plot dotplots for each factor to compare donor scores between metadata groups

Description

Plot dotplots for each factor to compare donor scores between metadata groups

Usage

plot_scores_by_meta(container, meta_var)
plot_scores_by_meta(container, meta_var)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`meta_var`	character The meta data variable to compare groups for

Value

The project container with a figure of comparison plots (one for each factor) placed in container$plots$indv_meta_scores_associations.

Plot enrichment results for hand picked gene sets

Description

Plot enrichment results for hand picked gene sets

Usage

plot_select_sets(
  container,
  factors_all,
  sets_plot,
  color_sets = NULL,
  cl_rows = FALSE,
  h_w = NULL,
  myfontsize = 8
)
plot_select_sets(
  container,
  factors_all,
  sets_plot,
  color_sets = NULL,
  cl_rows = FALSE,
  h_w = NULL,
  myfontsize = 8
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factors_all`	numeric Vector of one or more factor numbers to get plots for
`sets_plot`	character Vector of gene set names to show enrichment values for
`color_sets`	named character Values are colors corresponding to each set, with names as the gene set names (default=NULL)
`cl_rows`	logical Set to TRUE to cluster gene set results (default=FALSE)
`h_w`	numeric Vector specifying height and width (defualt=NULL)
`myfontsize`	numeric Gene set label fontsize (default=8)

Value

A list with a ComplexHeatmap object of select enriched gene sets as the first element and with a legend object as the second element.

Generate a plot for either the donor scores or loadings stability test

Description

Generate a plot for either the donor scores or loadings stability test

Usage

plot_stability_results(container, plt_data)
plot_stability_results(container, plt_data)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`plt_data`	character Either 'lds' or 'dsc' and indicates which plot to make

Value

the plot

Plot association significances for varying clustering resolutions

Description

Plot association significances for varying clustering resolutions

Usage

plot_subclust_associations(res, n_col = 2)
plot_subclust_associations(res, n_col = 2)

Arguments

`res`	data.frame Regression statistics for each subcluster analysis
`n_col`	numeric The number of columns to organize the plots into (default=2)

Value

A cowplot of ggplot objects showing statistics for regressions of proportions of each cell subtype (at varying clustering resolutions) against each factor.

Plot a heatmap of differential genes. Code is adapted from Conos package. https://github.com/kharchenkolab/conos/blob/master/R/plot.R

Description

Plot a heatmap of differential genes. Code is adapted from Conos package. https://github.com/kharchenkolab/conos/blob/master/R/plot.R

Usage

plotDEheatmap_conos(
  con,
  groups,
  container,
  de = NULL,
  min.auc = NULL,
  min.specificity = NULL,
  min.precision = NULL,
  n.genes.per.cluster = 10,
  additional.genes = NULL,
  exclude.genes = NULL,
  labeled.gene.subset = NULL,
  expression.quantile = 0.99,
  pal = (grDevices::colorRampPalette(c("dodgerblue1", "grey95", "indianred1")))(1024),
  ordering = "-AUC",
  column.metadata = NULL,
  show.gene.clusters = TRUE,
  remove.duplicates = TRUE,
  column.metadata.colors = NULL,
  show.cluster.legend = TRUE,
  show_heatmap_legend = FALSE,
  border = TRUE,
  return.details = FALSE,
  row.label.font.size = 10,
  order.clusters = FALSE,
  split = FALSE,
  split.gap = 0,
  cell.order = NULL,
  averaging.window = 0,
  ...
)
plotDEheatmap_conos(
  con,
  groups,
  container,
  de = NULL,
  min.auc = NULL,
  min.specificity = NULL,
  min.precision = NULL,
  n.genes.per.cluster = 10,
  additional.genes = NULL,
  exclude.genes = NULL,
  labeled.gene.subset = NULL,
  expression.quantile = 0.99,
  pal = (grDevices::colorRampPalette(c("dodgerblue1", "grey95", "indianred1")))(1024),
  ordering = "-AUC",
  column.metadata = NULL,
  show.gene.clusters = TRUE,
  remove.duplicates = TRUE,
  column.metadata.colors = NULL,
  show.cluster.legend = TRUE,
  show_heatmap_legend = FALSE,
  border = TRUE,
  return.details = FALSE,
  row.label.font.size = 10,
  order.clusters = FALSE,
  split = FALSE,
  split.gap = 0,
  cell.order = NULL,
  averaging.window = 0,
  ...
)

Arguments

`con`	conos (or p2) object
`groups`	groups in which the DE genes were determined (so that the cells can be ordered correctly)
`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`de`	differential expression result (list of data frames)
`min.auc`	optional minimum AUC threshold
`min.specificity`	optional minimum specificity threshold
`min.precision`	optional minimum precision threshold
`n.genes.per.cluster`	number of genes to show for each cluster
`additional.genes`	optional additional genes to include (the genes will be assigned to the closest cluster)
`exclude.genes`	an optional list of genes to exclude from the heatmap
`labeled.gene.subset`	a subset of gene names to show (instead of all genes). Can be a vector of gene names, or a number of top genes (in each cluster) to show the names for.
`expression.quantile`	expression quantile to show (0.98 by default)
`pal`	palette to use for the main heatmap
`ordering`	order by which the top DE genes (to be shown) are determined (default "-AUC")
`column.metadata`	additional column metadata, passed either as a data.frame with rows named as cells, or as a list of named cell factors.
`show.gene.clusters`	whether to show gene cluster color codes
`remove.duplicates`	remove duplicated genes (leaving them in just one of the clusters)
`column.metadata.colors`	a list of color specifications for additional column metadata, specified according to the HeatmapMetadata format. Use "clusters" slot to specify cluster colors.
`show.cluster.legend`	whether to show the cluster legend
`show_heatmap_legend`	whether to show the expression heatmap legend
`border`	show borders around the heatmap and annotations
`return.details`	if TRUE will return a list containing the heatmap (ha), but also raw matrix (x), expression list (expl) and other info to produce the heatmap on your own.
`row.label.font.size`	font size for the row labels
`order.clusters`	whether to re-order the clusters according to the similarity of the expression patterns (of the genes being shown)
`split`	logical If TRUE splits the heatmap by cell type (default=FALSE)
`split.gap`	numeric The distance to put in the gaps between split parts of the heatmap if split=TRUE (default=0)
`cell.order`	explicitly supply cell order
`averaging.window`	optional window averaging between neighboring cells within each group (turned off by default) - useful when very large number of cells shown (requires zoo package)
`...`	extra parameters are passed to pheatmap

Value

ComplexHeatmap::Heatmap object (see return.details param for other output)

Prepare data for LR analysis and get soft thresholds to use for gene modules

Description

Prepare data for LR analysis and get soft thresholds to use for gene modules

Usage

prep_LR_interact(
  container,
  lr_pairs,
  norm_method = "trim",
  scale_factor = 10000,
  var_scale_power = 0.5,
  batch_var = NULL
)
prep_LR_interact(
  container,
  lr_pairs,
  norm_method = "trim",
  scale_factor = 10000,
  var_scale_power = 0.5,
  batch_var = NULL
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`lr_pairs`	data.frame Data of ligand-receptor pairs. First column should be ligands and second column should be one or more receptors separated by an underscore such as receptor1_receptor2 in the case that multiple receptors are required for signaling.
`norm_method`	character The normalization method to use on the pseudobulked count data. Set to 'regular' to do standard normalization of dividing by library size. Set to 'trim' to use edgeR trim-mean normalization, whereby counts are divided by library size times a normalization factor. (default='trim')
`scale_factor`	numeric The number that gets multiplied by fractional counts during normalization of the pseudobulked data (default=10000)
`var_scale_power`	numeric Exponent of normalized variance that is used for variance scaling. Variance for each gene is initially set to unit variance across donors (for a given cell type). Variance for each gene is then scaled by multiplying the unit scaled values by each gene's normalized variance (where the effect of the mean-variance dependence is taken into account) to the exponent specified here. If NULL, uses var_scale_power from container$experiment_params. (default=.5)
`batch_var`	character A batch variable from metadata to remove (default=NULL)

Value

The project container with added container$scale_pb_extra slot that contains the tensor with additional ligands and receptors. Also has container$no_scale_pb_extra slot with pseudobulked, normalized data that is not scaled.

Project multicellular patterns to get scores on new data

Description

Project multicellular patterns to get scores on new data

Usage

project_new_data(new_container, old_container)
project_new_data(new_container, old_container)

Arguments

`new_container`	environment A project container with new data to project scores for. The form_tensor() function should be run.
`old_container`	environment The original project container that has the multicellular gene expression patterns already extracted. These patterns will be projected onto the new data.

Value

The new container environment object with projected scores in new_container$projected_scores. The factors will be ordered the same as the factors in old_container.

Gets a conos object of the data, aligning datasets across a specified variable such as batch or donors. This can be run independently or through get_subtype_prop_associations().

Description

Gets a conos object of the data, aligning datasets across a specified variable such as batch or donors. This can be run independently or through get_subtype_prop_associations().

Usage

reduce_dimensions(
  container,
  integration_var,
  ncores = container$experiment_params$ncores
)
reduce_dimensions(
  container,
  integration_var,
  ncores = container$experiment_params$ncores
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`integration_var`	character The meta data variable to use for creating the joint embedding with Conos.
`ncores`	numeric The number of cores to use (default=container$experiment_params$ncores)

Value

The project container with a conos object in container$embedding.

Reduce each cell type's expression matrix to just the significantly variable genes. Generally, this should be done through calling the form_tensor() wrapper function.

Description

Reduce each cell type's expression matrix to just the significantly variable genes. Generally, this should be done through calling the form_tensor() wrapper function.

Usage

reduce_to_vargenes(container)
reduce_to_vargenes(container)

Arguments

container

environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses

Value

The project container with pseudobulked matrices reduced to only the most variable genes.

Create a figure of all loadings plots arranged

Description

Create a figure of all loadings plots arranged

Usage

render_multi_plots(container, data_type, max_cols = 3)
render_multi_plots(container, data_type, max_cols = 3)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`data_type`	character Can be either "loadings", "gsea", or "dgenes". This determines which list of heatmaps to organize into the figure.
`max_cols`	numeric The max number of columns to plot. Can only either be 2 or 3 since these are large plots. (default=3)

Value

The multi-plot figure.

Examples

test_container <- get_all_lds_factor_plots(test_container)
fig <- render_multi_plots(test_container, data_type='loadings')
test_container <- get_all_lds_factor_plots(test_container)
fig <- render_multi_plots(test_container, data_type='loadings')

Reshape loadings for a factor from linearized to matrix form

Description

Reshape loadings for a factor from linearized to matrix form

Usage

reshape_loadings(ldngs_row, genes, ctypes)
reshape_loadings(ldngs_row, genes, ctypes)

Arguments

`ldngs_row`	numeric A vector of loadings values for one factor
`genes`	character The gene identifiers corresponding to each loading
`ctypes`	character The cell type corresponding to each loading

Value

A loadings matrix with dimensions of genes by cell types.

Run fgsea for one cell type of one factor

Description

Run fgsea for one cell type of one factor

Usage

run_fgsea(
  container,
  factor_select,
  ctype,
  db_use = "GO",
  signed = TRUE,
  min_gs_size = 15,
  max_gs_size = 500,
  ncores = container$experiment_params$ncores
)
run_fgsea(
  container,
  factor_select,
  ctype,
  db_use = "GO",
  signed = TRUE,
  min_gs_size = 15,
  max_gs_size = 500,
  ncores = container$experiment_params$ncores
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factor_select`	numeric The factor of interest
`ctype`	character The cell type of interest
`db_use`	character The database of gene sets to use. Database options include "GO", "Reactome", "KEGG", "BioCarta", and "Hallmark". More than one database can be used. (default="GO")
`signed`	logical If TRUE, uses signed gsea. If FALSE, uses unsigned gsea. Currently only works with fgsea method. (default=TRUE)
`min_gs_size`	numeric Minimum gene set size (default=15)
`max_gs_size`	numeric Maximum gene set size (default=500)
`ncores`	numeric The number of cores to use (default=container$experiment_params$ncores)

Value

A data.frame of the fgsea results for enrichment of gene sets in a given cell type for a given factor. The results contain adjusted p-values, normalized enrichment scores, leading edge genes, and other information output by fgsea.

Run gsea separately for all cell types of one specified factor and plot results

Description

Run gsea separately for all cell types of one specified factor and plot results

Usage

run_gsea_one_factor(
  container,
  factor_select,
  method = "fgsea",
  thresh = 0.05,
  db_use = "GO",
  signed = TRUE,
  min_gs_size = 15,
  max_gs_size = 500,
  reset_other_factor_plots = FALSE,
  draw_plot = TRUE,
  ncores = container$experiment_params$ncores
)
run_gsea_one_factor(
  container,
  factor_select,
  method = "fgsea",
  thresh = 0.05,
  db_use = "GO",
  signed = TRUE,
  min_gs_size = 15,
  max_gs_size = 500,
  reset_other_factor_plots = FALSE,
  draw_plot = TRUE,
  ncores = container$experiment_params$ncores
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factor_select`	numeric The factor of interest
`method`	character The method of gsea to use. Can either be "fgsea", "fgsea_special or "hypergeometric". (default="fgsea")
`thresh`	numeric Pvalue significance threshold to use. Will include gene sets in resulting heatmap if pvalue is below this threshold for at least one cell type. (default=0.05)
`db_use`	character The database of gene sets to use. Database options include "GO", "Reactome", "KEGG", and "BioCarta". More than one database can be used. (default="GO")
`signed`	logical If TRUE, uses signed gsea. If FALSE, uses unsigned gsea. Currently only works with fgsea method (default=TRUE)
`min_gs_size`	numeric Minimum gene set size (default=15)
`max_gs_size`	numeric Maximum gene set size (default=500)
`reset_other_factor_plots`	logical Set to TRUE to set all other gsea plots to NULL (default=FALSE)
`draw_plot`	logical Set to TRUE to show the plot. Plot is stored regardless. (default=TRUE)
`ncores`	numeric The number of cores to use (default=container$experiment_params$ncores)

Value

A stacked heatmap plot of the gsea results in the slot container$plots$gsea$<Factor#>. The heatmaps show adjusted p-values for the enrichment of each gene set in each cell type for the selected factor. The top heatmap shows enriched gene sets among the positive loading genes and the bottom heatmap shows enriched gene sets among the negative loading genes for the factor.

Examples

test_container <- run_gsea_one_factor(test_container, factor_select=1,
method="fgsea", thresh=0.05, db_use="Hallmark", signed=TRUE)
test_container <- run_gsea_one_factor(test_container, factor_select=1,
method="fgsea", thresh=0.05, db_use="Hallmark", signed=TRUE)

Compute enriched gene sets among significant genes in a cell type for a factor using hypergeometric test

Description

Compute enriched gene sets among significant genes in a cell type for a factor using hypergeometric test

Usage

run_hypergeometric_gsea(
  container,
  factor_select,
  ctype,
  up_down,
  thresh = 0.05,
  min_gs_size = 15,
  max_gs_size = 500,
  db_use = "GO"
)
run_hypergeometric_gsea(
  container,
  factor_select,
  ctype,
  up_down,
  thresh = 0.05,
  min_gs_size = 15,
  max_gs_size = 500,
  db_use = "GO"
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`factor_select`	numeric The factor of interest
`ctype`	character The cell type of interest
`up_down`	character Either "up" to compute enrichment among the significant positive loading genes or "down" to compute enrichment among the significant negative loading genes.
`thresh`	numeric Pvalue significance threshold. Used as cutoff for calling genes as significant to use for enrichment tests. (default=0.05)
`min_gs_size`	numeric Minimum gene set size (default=15)
`max_gs_size`	numeric Maximum gene set size (default=500)
`db_use`	character The database of gene sets to use. Database options include "GO", "Reactome", "KEGG", and "BioCarta". More than one database can be used. (default="GO")

Value

A vector of adjusted p-values for enrichment of gene sets in the significant genes of a given cell type in a given factor.

Run jackstraw to get genes that are significantly associated with donor scores for factors extracted by Tucker decomposition

Description

Run jackstraw to get genes that are significantly associated with donor scores for factors extracted by Tucker decomposition

Usage

run_jackstraw(
  container,
  ranks,
  n_fibers = 100,
  n_iter = 500,
  tucker_type = "regular",
  rotation_type = "hybrid",
  seed = container$experiment_params$rand_seed,
  ncores = container$experiment_params$ncores
)
run_jackstraw(
  container,
  ranks,
  n_fibers = 100,
  n_iter = 500,
  tucker_type = "regular",
  rotation_type = "hybrid",
  seed = container$experiment_params$rand_seed,
  ncores = container$experiment_params$ncores
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`ranks`	numeric The number of donor ranks and gene ranks to decompose to using Tucker decomposition
`n_fibers`	numeric The number of fibers the randomly shuffle in each iteration (default=100)
`n_iter`	numeric The number of shuffling iterations to complete (default=500)
`tucker_type`	character Set to 'regular' to run regular tucker or to 'sparse' to run tucker with sparsity constraints (default='regular')
`rotation_type`	character Set to 'hybrid' to perform hybrid rotation on resulting donor factor matrix and loadings. Otherwise set to 'ica_lds' to perform ica rotation on loadings or ica_dsc to perform ica on donor scores. (default='hybrid')
`seed`	numeric Seed passed to set.seed() (default=container$experiment_params$rand_seed)
`ncores`	numeric The number of cores to use (default=container$experiment_params$ncores)

Value

The project container with a vector of adjusted pvalues in container$gene_score_associations.

Examples

test_container <- run_jackstraw(test_container, ranks=c(2,4), n_fibers=2, n_iter=10,
tucker_type='regular', rotation_type='hybrid', ncores=1)
test_container <- run_jackstraw(test_container, ranks=c(2,4), n_fibers=2, n_iter=10,
tucker_type='regular', rotation_type='hybrid', ncores=1)

Test stability of a decomposition by subsampling or bootstrapping donors. Note that running this function will replace the decomposition in the project container with one resulting from the tucker parameters entered here.

Description

Test stability of a decomposition by subsampling or bootstrapping donors. Note that running this function will replace the decomposition in the project container with one resulting from the tucker parameters entered here.

Usage

run_stability_analysis(
  container,
  ranks,
  tucker_type = "regular",
  rotation_type = "hybrid",
  subset_type = "subset",
  sub_prop = 0.75,
  n_iterations = 100,
  ncores = container$experiment_params$ncores
)
run_stability_analysis(
  container,
  ranks,
  tucker_type = "regular",
  rotation_type = "hybrid",
  subset_type = "subset",
  sub_prop = 0.75,
  n_iterations = 100,
  ncores = container$experiment_params$ncores
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`ranks`	numeric The number of donor, gene, and cell type ranks, respectively, to decompose to using Tucker decomposition.
`tucker_type`	character The 'regular' type is the only one implemented with sparsity constraints (default='regular')
`rotation_type`	character Set to 'hybrid' to optimize loadings via our hybrid method (see paper for details). Set to 'ica_dsc' to perform ICA rotation on resulting donor factor matrix. Set to 'ica_lds' to optimize loadings by the ICA rotation. (default='hybrid')
`subset_type`	character Set to either 'subset' or 'bootstrap' (default='subset')
`sub_prop`	numeric The proportion of donors to keep when using subset_type='subset' (default=.75)
`n_iterations`	numeric The number of iterations to perform (default=100)
`ncores`	numeric The number of cores to use (default=container$experiment_params$ncores)

Value

The project container with the donor scores stability plot in container$plots$stability_plot_dsc and the loadings stability plot in container$plots$stability_plot_lds

Examples

test_container <- run_stability_analysis(test_container, ranks=c(2,4),
tucker_type='regular', rotation_type='hybrid', subset_type='subset', 
sub_prop=0.75, n_iterations=5, ncores=1)
test_container <- run_stability_analysis(test_container, ranks=c(2,4),
tucker_type='regular', rotation_type='hybrid', subset_type='subset', 
sub_prop=0.75, n_iterations=5, ncores=1)

Run the Tucker decomposition and rotate the factors

Description

Run the Tucker decomposition and rotate the factors

Usage

run_tucker_ica(
  container,
  ranks,
  tucker_type = "regular",
  rotation_type = "hybrid"
)
run_tucker_ica(
  container,
  ranks,
  tucker_type = "regular",
  rotation_type = "hybrid"
)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`ranks`	numeric The number of donor factors and gene factors, respectively, to decompose the data into. Since we rearrange the standard output of the Tucker decomposition to be 'donor centric', the number of donor factors will also be the total number of main factors that can be used for downstream analysis. The number of gene factors will only impact the quality of the decomposition.
`tucker_type`	character The 'regular' type is the only one currently implemented
`rotation_type`	character Set to 'hybrid' to optimize loadings via our hybrid method (see paper for details). Set to 'ica_dsc' to perform ICA rotation on resulting donor factor matrix. Set to 'ica_lds' to optimize loadings by the ICA rotation. (default='hybrid')

Value

Examples

test_container <- run_tucker_ica(test_container,ranks=c(2,4))
test_container <- run_tucker_ica(test_container,ranks=c(2,4))

Get a list of tensor fibers to shuffle

Description

Get a list of tensor fibers to shuffle

Usage

sample_fibers(tensor_data, n_fibers)
sample_fibers(tensor_data, n_fibers)

Arguments

`tensor_data`	list The tensor data including donor, gene, and cell type labels as well as the tensor array itself
`n_fibers`	numeric The number of fibers to get

Value

A list of gene and cell type indices for the randomly selected fibers

Scale font size. From simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/ht_clusters.R

Description

Scale font size. From simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/ht_clusters.R

Usage

scale_fontsize(x, rg = c(1, 30), fs = c(4, 16))
scale_fontsize(x, rg = c(1, 30), fs = c(4, 16))

Arguments

`x`	A numeric vector.
`rg`	The range.
`fs`	Range of the font size.

Value

A numeric vector.

Scale variance across donors for each gene within each cell type. Generally, this should be done through calling the form_tensor() wrapper function.

Description

Scale variance across donors for each gene within each cell type. Generally, this should be done through calling the form_tensor() wrapper function.

Usage

scale_variance(container, var_scale_power)
scale_variance(container, var_scale_power)

Arguments

container

environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses

var_scale_power

numeric Exponent of normalized variance that is used for variance scaling. Variance for each gene is initially set to unit variance across donors (for a given cell type). Variance for each gene is then scaled by multiplying the unit scaled values by each gene's normalized variance (where the effect of the mean-variance dependence is taken into account) to the exponent specified here. If NULL, uses var_scale_power from container$experiment_params.

Value

The project container with the variance altered for each gene within the pseudobulked matrices for each cell type.

Convert Seurat object to scMinimal object. Generally, this should be done through calling the make_new_container() wrapper function.

Description

Convert Seurat object to scMinimal object. Generally, this should be done through calling the make_new_container() wrapper function.

Usage

seurat_to_scMinimal(seurat_obj, metadata_cols = NULL, metadata_col_nm = NULL)
seurat_to_scMinimal(seurat_obj, metadata_cols = NULL, metadata_col_nm = NULL)

Arguments

`seurat_obj`	Seurat object that has been cleaned and includes the normalized, log-transformed counts. The meta.data should include a column with the header 'sex' and values of 'M' or 'F' if available. The metadata should also have a column with the header 'ctypes' with the corresponding names of the cell types as well as a column with header 'donors' that contains identifiers for each donor.
`metadata_cols`	character The names of the metadata columns to use (default=NULL)
`metadata_col_nm`	character New names for the selected metadata columns if wish to change their names. If NULL, then the preexisting column names are used. (default=NULL)

Value

An scMinimal object holding counts and metadata for a project.

Shuffle elements within the selected fibers

Description

Shuffle elements within the selected fibers

Usage

shuffle_fibers(tensor_data, s_fibers)
shuffle_fibers(tensor_data, s_fibers)

Arguments

`tensor_data`	list The tensor data including donor, gene, and cell type labels as well as the tensor array itself
`s_fibers`	list Gene and cell type indices for the randomly selected fibers

Value

The tensor_data object with the values for the selected fibers shuffled.

Create the tensor object by stacking each pseudobulk cell type matrix. Generally, this should be done through calling the form_tensor() wrapper function.

Description

Create the tensor object by stacking each pseudobulk cell type matrix. Generally, this should be done through calling the form_tensor() wrapper function.

Usage

stack_tensor(container)
stack_tensor(container)

Arguments

container

environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses

Value

The project container with the list of tensor data in container$tensor_data.

Helper function from simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/utils.R

Description

Helper function from simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/utils.R

Usage

stop_wrap(...)
stop_wrap(...)

Arguments

...

other parameters

Value

No value is returned.

Subset an scMinimal object by specified genes, donors, cells, or cell types

Description

Subset an scMinimal object by specified genes, donors, cells, or cell types

Usage

subset_scMinimal(
  scMinimal,
  ctypes_use = NULL,
  cells_use = NULL,
  donors_use = NULL,
  genes_use = NULL,
  in_place = TRUE
)
subset_scMinimal(
  scMinimal,
  ctypes_use = NULL,
  cells_use = NULL,
  donors_use = NULL,
  genes_use = NULL,
  in_place = TRUE
)

Arguments

`scMinimal`	environment A sub-container for the project typically consisting of gene expression data in its raw and processed forms as well as metadata
`ctypes_use`	character The cell types to keep (default=NULL)
`cells_use`	character Cell barcodes for the cells to keep (default=NULL)
`donors_use`	character The donors to keep (default=NULL)
`genes_use`	character The genes to keep (default=NULL)
`in_place`	logical If set to TRUE then replaces the input object with the new subsetted object (default=TRUE)

Value

A subsetted scMinimal object.

Examples

cell_names <- colnames(test_container$scMinimal_full$count_data)
cells_sub <- sample(cell_names,40)
scMinimal <- subset_scMinimal(test_container$scMinimal_full,
cells_use=cells_sub)
cell_names <- colnames(test_container$scMinimal_full$count_data)
cells_sub <- sample(cell_names,40)
scMinimal <- subset_scMinimal(test_container$scMinimal_full,
cells_use=cells_sub)

Data container for testing tensor formation steps

Description

Data container for testing tensor formation steps

Usage

test_container
test_container

Format

An object of class environment of length 10.

Helper function for running the decomposition. Use the run_tucker_ica() wrapper function instead.

Description

Helper function for running the decomposition. Use the run_tucker_ica() wrapper function instead.

Usage

tucker_ica_helper(
  tensor_data,
  ranks,
  tucker_type,
  rotation_type,
  projection_container = NULL
)
tucker_ica_helper(
  tensor_data,
  ranks,
  tucker_type,
  rotation_type,
  projection_container = NULL
)

Arguments

`tensor_data`	list The tensor data including donor, gene, and cell type labels as well as the tensor array itself
`ranks`	numeric The number of donor and gene factors respectively, to decompose to using Tucker decomposition.
`tucker_type`	character The 'regular' type is the only one currently implemented
`rotation_type`	character Set to 'hybrid' to optimize loadings via our hybrid method (see paper for details). Set to 'ica_dsc' to perform ICA rotation on resulting donor factor matrix. Set to 'ica_lds' to optimize loadings by the ICA rotation.
`projection_container`	environment A project container to store projection data in. Currently only implemented for 'hybrid' and 'ica_dsc' rotations. (default=NULL)

Value

The list of results for tucker decomposition with donor scores matrix in first element and loadings matrix in second element.

Update any of the experiment-wide parameters

Description

Update any of the experiment-wide parameters

Usage

update_params(container, ctypes_use = NULL, ncores = NULL, rand_seed = NULL)
update_params(container, ctypes_use = NULL, ncores = NULL, rand_seed = NULL)

Arguments

`container`	environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses
`ctypes_use`	character Names of the cell types to use for the analysis (default=NULL)
`ncores`	numeric Number of cores to use (default=NULL)
`rand_seed`	numeric Random seed to use (default=NULL)

Value

The project container with updated experiment parameters in container$experiment_params.

Examples

test_container <- update_params(test_container, ncores=1)
test_container <- update_params(test_container, ncores=1)

Compute significantly variable genes via anova. Generally, this should be done through calling the form_tensor() wrapper function.

Description

Compute significantly variable genes via anova. Generally, this should be done through calling the form_tensor() wrapper function.

Usage

vargenes_anova(scMinimal, ncores)
vargenes_anova(scMinimal, ncores)

Arguments

`scMinimal`	environment A sub-container for the project typically consisting of gene expression data in its raw and processed forms
`ncores`	numeric Number of cores to use

Value

A list of raw p-values for each gene.

Package 'scITD'

Help Index

Apply ComBat batch correction to pseudobulk matrices. Generally, this should be done through calling the form_tensor() wrapper function.

Description

Usage

Arguments

Value

Calculate F-Statistics for the association between donor scores for each factor donor values of shuffled gene_ctype fibers

Description

Usage

Arguments

Value

Helper function to check whether receptor is present in target cell type

Description

Usage

Arguments

Value

Clean data to remove genes only expressed in a few cells and donors with very few cells. Generally, this should be done through calling the form_tensor() wrapper function.

Description

Usage

Arguments

Value

Calculates column mean and variance. Adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/src/misc2.cpp

Description

Usage

Arguments

Value

Examples

Plot a pairwise comparison of factors from two separate decompositions

Description

Usage

Arguments

Value

Examples

Compute associations between donor proportions and factor scores

Description

Usage

Arguments

Value

Get donor proportions of each cell type or subtype

Description

Usage

Arguments

Value

Compute and plot the LR interactions for one factor

Description

Usage

Arguments

Value

Convert gene identifiers to gene symbols

Description

Usage

Arguments

Value

count_word. From older version of simplifyEnrichment package.

Description

Usage

Arguments

Value

Run rank determination by svd on the tensor unfolded along each mode

Description

Usage

Arguments

Value

Examples

Form the pseudobulk tensor as preparation for running the tensor decomposition.

Description

Usage

Arguments

Value

Examples

Generate loadings heatmaps for all factors

Description

Usage

Arguments

Value

Examples

Get gene callout annotations for a loadings heatmap

Description

Usage