t-SNE analysis
Reduce dimensionality and visualize non-linear relationships in your data
Use the t-SNE (t-distributed Stochastic Neighbor Embedding) tab to visualize high-dimensional data in a low-dimensional map, typically 2D.
t-SNE is particularly good at revealing local structure and clusters within your data. It works by modeling similarities between high-dimensional data points and representing them as probabilities, then finding a low-dimensional embedding that preserves these similarities.
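As a rough illustration of the "similarities as probabilities" idea, the sketch below converts distances from one point into a neighbor probability distribution using a Gaussian kernel. The function name, the fixed sigma, and the toy points are assumptions made for this example; it is not PANDORA's implementation.

```python
import math

def conditional_probabilities(points, i, sigma):
    """Return p(j|i): how likely point i would pick each point j as its neighbor."""
    weights = []
    for j, other in enumerate(points):
        if j == i:
            weights.append(0.0)  # a point is never its own neighbor
            continue
        sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], other))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    return [w / total for w in weights]

# Toy data: two points close together and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
probs = conditional_probabilities(points, 0, sigma=1.0)
print(probs)  # nearly all probability mass goes to the nearby point
```

The low-dimensional embedding is then arranged so that a similar distribution, computed on the 2D coordinates, matches these probabilities as closely as possible.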
Unlike PCA, t-SNE uses a non-linear algorithm. This often makes it better suited for visualizing complex datasets where relationships aren't linear, such as identifying distinct cell populations in single-cell RNA sequencing (scRNA-seq) data.
Keep in mind:
t-SNE is primarily a visualization tool and does not preserve global distances accurately. The distances between clusters in a t-SNE plot, and the apparent sizes of clusters, are generally not meaningful.
The resulting plot can depend heavily on the chosen parameters (like perplexity).

Configure the t-SNE calculation and visualization using the options in the side panel. For general setup like initial column selection and standard preprocessing, refer to the main documentation sections.
1. t-SNE Hyperparameter Setup
These parameters control the core t-SNE algorithm. Finding optimal values often requires experimentation, but PANDORA may provide automatic optimization or reasonable defaults.
Perplexity:
Roughly the effective number of nearest neighbors considered for each point. It balances attention to local vs. global aspects of the data.
Typical values range from 5 to 50, and the value should be smaller than the number of data points. Lower values emphasize local structure; higher values consider more neighbors.
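To make "effective number of neighbors" concrete, here is a minimal sketch of how perplexity is commonly defined (2 raised to the Shannon entropy of the neighbor distribution) and how a Gaussian bandwidth can be tuned by binary search to reach a target perplexity. All function names and the toy distances are assumptions for illustration only.

```python
import math

def perplexity(probs):
    """Perplexity = 2 ** (Shannon entropy in bits) of a probability distribution."""
    entropy_bits = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy_bits

def neighbor_probs(sq_dists, sigma):
    """Gaussian neighbor probabilities for one point, given squared distances."""
    d_min = min(sq_dists)  # shift by the minimum for numerical stability
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def sigma_for_perplexity(sq_dists, target, tol=1e-6, max_steps=200):
    """Binary-search the bandwidth so the distribution reaches the target perplexity."""
    lo, hi = 1e-8, 1e8
    sigma = 1.0
    for _ in range(max_steps):
        sigma = math.sqrt(lo * hi)  # geometric midpoint of the search interval
        current = perplexity(neighbor_probs(sq_dists, sigma))
        if abs(current - target) < tol:
            break
        if current > target:
            hi = sigma  # too many effective neighbors: narrow the kernel
        else:
            lo = sigma  # too few effective neighbors: widen the kernel
    return sigma

# Squared distances from one point to its five neighbors (toy values).
sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0]
sigma_small = sigma_for_perplexity(sq_dists, target=2.0)
sigma_large = sigma_for_perplexity(sq_dists, target=4.0)
print(sigma_small, sigma_large)
```

A higher target perplexity yields a wider kernel, so more distant points count as neighbors.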
Exaggeration Factor:
Controls how strongly similarities between nearby points are amplified during the initial optimization phase, which pulls natural clusters into tight groups. Higher values can create more space between clusters.
Typical values might range from 4 to 30.
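In common t-SNE implementations, exaggeration works by multiplying the high-dimensional similarities by the factor for an initial number of iterations, then switching back to the unscaled values. The sketch below shows that idea only; the function name and the cut-off of 250 iterations are assumptions, not PANDORA settings.

```python
def exaggerated_similarities(p_matrix, factor, iteration, stop_exaggeration_iter=250):
    """Scale similarities by `factor` during the early phase, leave them unchanged afterwards.

    stop_exaggeration_iter=250 is an assumed cut-off used only for this example.
    """
    if iteration < stop_exaggeration_iter:
        return [[factor * p for p in row] for row in p_matrix]
    return [row[:] for row in p_matrix]

# Toy 2x2 similarity matrix.
P = [[0.0, 0.25], [0.25, 0.0]]
early = exaggerated_similarities(P, factor=12.0, iteration=10)
late = exaggerated_similarities(P, factor=12.0, iteration=500)
print(early, late)
```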
Theta:
Controls the trade-off between speed and accuracy for the Barnes-Hut approximation used in t-SNE.
Lower values are more accurate but slower; a value of 0 disables the approximation and runs exact t-SNE. Higher values (e.g., 0.5 to 1) are faster but less accurate.
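The speed and accuracy trade-off comes from a test that decides when a whole group of distant points may be treated as a single summary point. The sketch below shows a generic form of the Barnes-Hut acceptance test (cell size divided by distance, compared with theta); the function and argument names are assumptions, and actual implementations may differ in detail.

```python
import math

def can_summarize(cell_width, point, cell_center_of_mass, theta):
    """Return True if a tree cell is far enough away to be treated as one point."""
    distance = math.dist(point, cell_center_of_mass)
    if distance == 0.0:
        return False  # the point sits on the cell's center of mass: do not summarize
    return cell_width / distance < theta

# A cell of width 1.0 seen from 10 units away.
print(can_summarize(1.0, (0.0, 0.0), (10.0, 0.0), theta=0.5))  # far cell, summarized
print(can_summarize(1.0, (0.0, 0.0), (10.0, 0.0), theta=0.0))  # theta 0: never summarized
```

With theta set to 0 the test never passes, so every pairwise interaction is computed individually.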
Max Iterations:
The maximum number of optimization steps the algorithm will run.
Should be high enough for the embedding to stabilize (often 1000 or more). PANDORA allows up to 50,000.
Learning Rate (Eta):
Controls the step size during the optimization process.
A typical value is around 200. If the learning rate is too high, the embedding might diverge; if it is too low, the optimization might need many iterations to converge.
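The effect of the step size can be seen on a toy problem. This sketch runs plain gradient descent on f(y) = y^2 rather than on the t-SNE cost, so the numeric learning rates are not comparable to the value of 200 mentioned above; only the qualitative behavior carries over.

```python
def descend(learning_rate, start=1.0, steps=50):
    """Minimize f(y) = y**2 with plain gradient descent and return the final y."""
    y = start
    for _ in range(steps):
        gradient = 2.0 * y
        y -= learning_rate * gradient
    return y

too_low = descend(0.01)    # creeps toward 0, still far away after 50 steps
reasonable = descend(0.4)  # converges quickly
too_high = descend(1.1)    # overshoots more on every step and diverges
print(too_low, reasonable, too_high)
```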
2. Clustered t-SNE Settings
These settings apply specifically when generating the Clustered t-SNE Plot, which runs a clustering algorithm on the 2D t-SNE results.
Clustering Algorithm: Choose the method used to identify clusters in the 2D t-SNE map:
Louvain: Community detection algorithm often used with KNN graphs.
K (for KNN graph): The number of nearest neighbors used to build the graph for Louvain clustering.
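As an illustration of what K controls, the sketch below builds an undirected k-nearest-neighbor graph from 2D coordinates. The function name and toy embedding are assumptions, and the community detection step itself (Louvain) is not shown.

```python
import math

def knn_graph(points, k):
    """Return the undirected edges linking each point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))  # store each edge once
    return edges

# Toy 2D embedding with two well-separated groups of three points.
embedding = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
             (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
edges = knn_graph(embedding, k=2)
print(sorted(edges))
```

With a small K the two groups stay disconnected; a larger K adds edges between them, which tends to merge communities.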
Hierarchical Clustering: Builds a hierarchy of clusters.
Clustering Method (Linkage): Select the linkage method (e.g., ward, complete, average).
Mclust: Model-based clustering assuming Gaussian mixture models.
epsQuantile: Parameter related to density or neighborhood size (shared with Density-based).
Density-based clustering (e.g., DBSCAN): Groups points based on density.
epsQuantile: Parameter controlling the density threshold or neighborhood size. Higher values increase the considered neighborhood.
3. Dataset Analysis Settings (Post-Clustering Analysis)
Perform further analysis on the identified clusters from the Clustered t-SNE.
Dataset Analysis Type: Select how to visualize the characteristics of the identified clusters using the original high-dimensional data:
Heatmap: Shows the mean expression/value of original variables within each cluster.
Hierarchical Clustering: Performs clustering on the cluster means or representative profiles.
Grouped Display: (Typically used with Heatmap) Displays the mean values of the original variables for each identified t-SNE cluster.
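The per-cluster summary behind such a heatmap can be sketched as a grouped mean over the original variables. The function name and the toy rows and labels are assumptions for illustration.

```python
from collections import defaultdict

def cluster_means(rows, labels):
    """Average each original variable over the samples assigned to each cluster."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: [sum(column) / len(column) for column in zip(*members)]
        for label, members in grouped.items()
    }

# Three samples with two original variables each, plus their cluster labels.
rows = [[1.0, 10.0], [3.0, 14.0], [8.0, 2.0]]
labels = [0, 0, 1]
means = cluster_means(rows, labels)
print(means)
```

Each entry of the result corresponds to one row (or column) of the heatmap.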
4. Optional Visualization Settings
Control how points are colored in the main t-SNE plots:
Grouping Variable:
Select a categorical variable from your metadata (e.g., 'cell_type', 'treatment').
Points in the t-SNE plot will be colored according to this variable.
Important: This variable is excluded from the t-SNE calculation itself and used only for visualization.
Color Variable:
Select a continuous variable from your dataset (e.g., expression level of a specific gene, a clinical score).
Points in the t-SNE plot will be colored based on the value of this variable (using a continuous color scale).
Important: This variable is included in the t-SNE calculation along with other selected features.
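The difference between the two options can be summarized as a column-selection rule. The sketch below is a hypothetical helper, not PANDORA code: it drops the grouping variable from the t-SNE input and keeps the color variable. Appending a color variable that was not already selected is an assumption of this example.

```python
def tsne_input_columns(selected_columns, grouping_variable=None, color_variable=None):
    """Return the columns that would feed the t-SNE calculation (hypothetical helper)."""
    # The grouping variable is used only for coloring, so it is removed.
    features = [c for c in selected_columns if c != grouping_variable]
    # The color variable stays part of the calculation (appended here if missing: an assumption).
    if color_variable is not None and color_variable not in features:
        features.append(color_variable)
    return features

columns = ["geneA", "geneB", "cell_type"]
print(tsne_input_columns(columns, grouping_variable="cell_type", color_variable="geneA"))
```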