t-SNE analysis

Reduce dimensionality and visualize non-linear relationships in your data

Use the t-SNE (t-distributed Stochastic Neighbor Embedding) tab to visualize high-dimensional data in a low-dimensional map, typically 2D.

t-SNE is particularly good at revealing local structure and clusters within your data. It converts the pairwise similarities between high-dimensional data points into probabilities, then searches for a low-dimensional embedding whose pairwise probabilities match them as closely as possible.
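The idea of matching probabilities can be sketched in a few lines of plain Python. This is a simplified illustration and not PANDORA's implementation: it assumes a single fixed Gaussian bandwidth `sigma` for every point (real t-SNE tunes one per point from the perplexity), and the function names and toy data are made up for the example. It shows only that an embedding which keeps neighbors together scores a lower Kullback-Leibler divergence than one that mixes them.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def conditional_probs(points, i, sigma):
    """p(j|i): Gaussian similarity of every point j to point i, normalized to 1."""
    weights = [0.0 if j == i else math.exp(-sq_dist(points[i], p) / (2 * sigma ** 2))
               for j, p in enumerate(points)]
    total = sum(weights)
    return [w / total for w in weights]

def joint_probs(points, sigma):
    """Symmetrized high-dimensional similarities; all entries sum to 1."""
    n = len(points)
    cond = [conditional_probs(points, i, sigma) for i in range(n)]
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]

def embedding_probs(embedding):
    """Low-dimensional similarities from a Student-t kernel; all entries sum to 1."""
    n = len(embedding)
    w = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(embedding[i], embedding[j]))
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]

def kl_divergence(p, q):
    """Mismatch between the two sets of probabilities (the quantity t-SNE minimizes)."""
    return sum(pij * math.log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row) if pij > 0)

# Toy data (hypothetical): two tight pairs of points in 3D.
data = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
P = joint_probs(data, sigma=1.0)

# One 2D layout keeps each pair together, the other splits the pairs apart.
faithful = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
scrambled = [(0.0, 0.0), (10.0, 10.0), (0.1, 0.0), (10.1, 10.0)]

kl_faithful = kl_divergence(P, embedding_probs(faithful))
kl_scrambled = kl_divergence(P, embedding_probs(scrambled))
print(kl_faithful < kl_scrambled)
```

In an actual run, the optimizer moves the low-dimensional points step by step to reduce this divergence instead of comparing two fixed layouts.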

Unlike PCA, which is a linear projection, t-SNE is non-linear. This often makes it better suited for visualizing complex datasets where relationships aren't linear, such as identifying distinct cell populations in single-cell RNA sequencing (scRNA-seq) data.

Keep in mind:

  • t-SNE is primarily a visualization tool and does not reliably preserve global distances. The distances between clusters in a t-SNE plot are often not meaningful, so avoid interpreting them quantitatively.

  • The resulting plot can depend heavily on the chosen parameters (like perplexity).

Configure the t-SNE calculation and visualization using the options in the side panel. For general setup like initial column selection and standard preprocessing, refer to the main documentation sections.

1. t-SNE Hyperparameter Setup

These parameters control the core t-SNE algorithm. Finding optimal values often requires experimentation, but PANDORA may provide automatic optimization or reasonable defaults.

  • Perplexity:

    • Roughly the effective number of nearest neighbors considered for each point. It balances attention to local vs. global aspects of the data.

    • Typical values range from 5 to 50. Lower values emphasize local structure; higher values consider more neighbors and capture broader structure. The perplexity must be smaller than the number of data points.

  • Exaggeration Factor:

    • Multiplies the high-dimensional similarities during the initial optimization phase, which pulls similar points tightly together and controls how much the natural clusters in the data are separated from each other. Higher values can create more space between clusters.

    • Typical values might range from 4 to 30.

  • Theta:

    • Controls the trade-off between speed and accuracy for the Barnes-Hut approximation used in t-SNE.

    • Lower values are more accurate but slower; a value of 0 disables the approximation and runs exact t-SNE. Higher values (e.g., 0.5 to 1) are faster but less accurate.

  • Max Iterations:

    • The maximum number of optimization steps the algorithm will run.

    • Should be high enough for the embedding to stabilize (often 1000 or more). PANDORA allows up to 50,000.

  • Learning Rate (Eta):

    • Controls the step size during the optimization process.

    • Typical values might be around 200. If the learning rate is too high, the embedding might diverge; if too low, it might take many iterations to converge.
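Two of the parameters above can be made concrete with a short sketch. It is illustrative only and does not reproduce PANDORA's code; all function names are hypothetical. The first part shows why perplexity acts like an effective neighbor count: it is 2 raised to the entropy of a point's neighbor distribution, and the Gaussian bandwidth is searched until the target is met. The second part shows the general Barnes-Hut rule that Theta controls, using the textbook form of the criterion (individual implementations may measure cell size or distance slightly differently).

```python
import math

def perplexity(probs):
    """Perplexity of a distribution: 2 ** (Shannon entropy in bits)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy

def neighbor_probs(sq_dists, sigma):
    """Gaussian neighbor probabilities for one point, given squared distances."""
    nearest = min(sq_dists)  # subtract the minimum for numerical stability
    weights = [math.exp(-(d - nearest) / (2 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def sigma_for_perplexity(sq_dists, target, steps=60):
    """Bisection (on a log scale) for the bandwidth that yields the target perplexity."""
    lo, hi = 1e-3, 1e3  # assumed search range, wide enough for this toy example
    for _ in range(steps):
        mid = math.sqrt(lo * hi)
        if perplexity(neighbor_probs(sq_dists, mid)) < target:
            lo = mid  # too few effective neighbors: widen the Gaussian
        else:
            hi = mid  # too many effective neighbors: narrow the Gaussian
    return math.sqrt(lo * hi)

def can_summarize(cell_width, distance_to_cell, theta):
    """Barnes-Hut rule: treat a far-away cell of points as one summary point."""
    return cell_width / distance_to_cell < theta

# A point whose neighbors are spread uniformly over 8 points has perplexity 8.
uniform_perplexity = perplexity([1 / 8] * 8)

# Hypothetical squared distances from one point to 20 others.
sq_dists = [float(d) for d in range(1, 21)]
sigma = sigma_for_perplexity(sq_dists, target=5.0)
achieved = perplexity(neighbor_probs(sq_dists, sigma))
print(uniform_perplexity, round(achieved, 3))
```

With `theta` set to 0, `can_summarize` is never true, so every pair of points is computed exactly, which matches the slow-but-accurate end of the trade-off.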

2. Clustered t-SNE Settings

These settings apply specifically when generating the Clustered t-SNE Plot, which runs a clustering algorithm on the 2D t-SNE results.

  • Clustering Algorithm: Choose the method used to identify clusters in the 2D t-SNE map:

    • Louvain: A community detection algorithm that runs on a k-nearest-neighbor (KNN) graph built from the points.

      • K (for KNN graph): The number of nearest neighbors used to build the graph for Louvain clustering.

    • Hierarchical Clustering: Builds a hierarchy of clusters.

      • Clustering Method (Linkage): Select the linkage method (e.g., ward, complete, average).

    • Mclust: Model-based clustering assuming Gaussian mixture models.

      • epsQuantile: Parameter related to density or neighborhood size (shared with Density-based).

    • Density-based clustering (e.g., DBSCAN): Groups points based on density.

      • epsQuantile: Parameter controlling the density threshold or neighborhood size. Higher values enlarge the neighborhood considered around each point, which tends to merge points into fewer, larger clusters.
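To illustrate what the K setting for Louvain refers to, the sketch below builds a KNN graph from 2D coordinates in plain Python. It is an assumption-laden illustration, not PANDORA's code: the function name and the toy coordinates are invented, and the Louvain step itself (which would run on the resulting edges) is left out.

```python
import math

def knn_graph(points, k):
    """Undirected edges (i, j) with i < j linking each point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        # Rank all other points by Euclidean distance to point i.
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

# Hypothetical 2D t-SNE coordinates: two well-separated groups of three points.
embedding = [(0.0, 0.0), (0.5, 0.1), (0.2, 0.6),
             (20.0, 20.0), (20.4, 20.2), (20.1, 20.7)]

edges = knn_graph(embedding, k=2)
print(sorted(edges))
```

With a small K, no edge crosses between the two groups, so a community detection method has an easy job. A larger K adds edges between more distant points, which generally pushes the result toward fewer, broader clusters.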

3. Dataset Analysis Settings (Post-Clustering Analysis)

Use these settings to analyze the clusters identified in the Clustered t-SNE Plot.

  • Dataset Analysis Type: Select how to visualize the characteristics of the identified clusters using the original high-dimensional data:

    • Heatmap: Shows the mean expression/value of original variables within each cluster.

    • Hierarchical Clustering: Performs clustering on the cluster means or representative profiles.

  • Grouped Display: (Typically used with Heatmap) Display the mean values of the original variables for each identified t-SNE cluster.
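The per-cluster means that a heatmap of this kind displays can be computed as in the following sketch. It assumes the original data is available as a list of numeric rows and that each row has a cluster label from the Clustered t-SNE step. The function name and values are hypothetical and only illustrate the calculation.

```python
def cluster_means(rows, labels):
    """Mean of every original variable within each cluster label."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        for idx, value in enumerate(row):
            sums[label][idx] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in totals]
            for label, totals in sums.items()}

# Hypothetical original data: four samples, two variables each.
rows = [[1.0, 10.0], [3.0, 14.0], [8.0, 0.0], [10.0, 2.0]]
# Cluster labels assigned on the 2D t-SNE map.
labels = [0, 0, 1, 1]

means = cluster_means(rows, labels)
for label, profile in sorted(means.items()):
    print(label, profile)
```

Each resulting profile corresponds to one row (or column) of the heatmap. Running hierarchical clustering on these profiles instead groups clusters with similar mean values.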

4. Optional Visualization Settings

Control how points are colored in the main t-SNE plots:

  • Grouping Variable:

    • Select a categorical variable from your metadata (e.g., 'cell_type', 'treatment').

    • Points in the t-SNE plot will be colored according to this variable.

    • Important: This variable is excluded from the t-SNE calculation itself and used only for visualization.

  • Color Variable:

    • Select a continuous variable from your dataset (e.g., expression level of a specific gene, a clinical score).

    • Points in the t-SNE plot will be colored based on the value of this variable (using a continuous color scale).

    • Important: This variable is included in the t-SNE calculation along with other selected features.
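The difference between the two options comes down to which columns reach the t-SNE calculation. The sketch below illustrates that split with a hypothetical table and helper function. It is not PANDORA's code, and all column names except 'cell_type' are invented.

```python
def split_columns(table, grouping_variable=None):
    """Separate t-SNE input features from the grouping column used only for coloring."""
    features = {name: values for name, values in table.items()
                if name != grouping_variable}
    groups = table.get(grouping_variable) if grouping_variable else None
    return features, groups

# Hypothetical dataset with two numeric columns and one categorical column.
table = {
    "gene_a": [0.2, 1.5, 3.1],
    "gene_b": [4.0, 2.2, 0.7],
    "cell_type": ["T cell", "B cell", "T cell"],
}

# Grouping Variable: removed from the features, kept only for coloring.
features, groups = split_columns(table, grouping_variable="cell_type")

# Color Variable: stays among the features and also drives the color scale.
color_values = features["gene_a"]

print(sorted(features), groups, color_values)
```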
