PCA analysis
Reduce dimensionality and visualize relationships in your data.
Use the PCA Analysis tab to perform Principal Component Analysis (PCA), a dimensionality-reduction technique. PCA simplifies a complex dataset by finding new, uncorrelated axes (principal components) along which the samples vary most: the first component captures the largest share of the variance, the second captures the next largest, and so on.
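The tab's internal implementation is not described here. As an illustration only, the following is a minimal NumPy sketch of the standard computation. It assumes a numeric samples-by-variables matrix with no missing values and no constant columns, and it assumes each variable is standardized (whether the tab scales variables is not stated). The function `pca` and its arguments are hypothetical.

```python
import numpy as np

def pca(X, n_components=2, scale=True):
    """Hypothetical minimal PCA: center (and optionally scale) columns, then SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each variable
    if scale:
        Xc = Xc / Xc.std(axis=0, ddof=1)         # unit variance (assumes no constant columns)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates on the PCs
    components = Vt[:n_components]                     # one row of variable weights per PC
    explained = s**2 / np.sum(s**2)                    # share of total variance per PC
    return scores, components, explained[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X[:, 1] = 2 * X[:, 0] + rng.normal(scale=0.1, size=20)  # two strongly correlated columns
scores, components, explained = pca(X)
print(scores.shape)  # (20, 2)
print(explained.round(3))
```

Standardizing keeps variables measured on large scales (e.g., raw counts) from dominating the first components; passing `scale=False` keeps the original units, which only makes sense when all variables share them.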
Why use PCA?
Exploratory Data Analysis: Visualize high-dimensional biological data (like gene expression, proteomics, or flow cytometry data) in 2D or 3D plots to spot patterns, clusters, outliers, or batch effects among your samples.
Machine Learning Preparation: Replace a large number of features (e.g., genes, proteins) with a smaller set of principal components before training machine learning models. This can:
Improve model performance by focusing on meaningful variation.
Reduce computational complexity and training time.
Help reduce overfitting by discarding low-variance components, which often carry mostly noise, and by merging redundant (correlated) variables.
Understanding Variance: Identify which original variables contribute most strongly to the differences observed between samples or experimental conditions, based on their loadings on each principal component.
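For the Machine Learning Preparation use case, a common pattern is to learn the centering, scaling, and components from the training samples only and then reuse them on new samples, so test data do not influence the components. The sketch below assumes standardized numeric data and a hypothetical 90% cumulative-variance cutoff for choosing how many PCs to keep; `fit_pca` and `transform` are illustrative names, not part of this tab.

```python
import numpy as np

def fit_pca(X_train, var_threshold=0.90):
    """Hypothetical helper: learn mean, scale, and components from training samples only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd((X_train - mean) / std, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)               # cumulative share of variance
    k = int(np.searchsorted(cumulative, var_threshold)) + 1   # fewest PCs reaching the threshold
    return mean, std, Vt[:k]

def transform(X, mean, std, components):
    """Project samples using the training statistics (no refitting on new data)."""
    return ((X - mean) / std) @ components.T

rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 3))                            # 3 hidden factors
X = latent @ rng.normal(size=(3, 30)) + 0.05 * rng.normal(size=(100, 30))
X_train, X_test = X[:80], X[80:]
mean, std, components = fit_pca(X_train)
Z_test = transform(X_test, mean, std, components)
print(components.shape[0], "PCs kept;", Z_test.shape)
```

The reduced matrix `Z_test` (and the equivalent for the training set) is what would be fed to a model. In scikit-learn, `StandardScaler` followed by `PCA` in a `Pipeline` handles the same bookkeeping.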
This tab calculates the principal components and visualizes the results with scree plots (the variance explained by each component), variable contribution plots, and sample scatter plots.
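The quantities behind the scree and variable contribution plots can be computed from the same decomposition. The sketch below assumes standardized variables and uses one common set of definitions (those used, for example, by the R package FactoMineR): eigenvalues are PC variances, loadings are correlations between each variable and each PC, and a variable's contribution to a PC is its squared weight in that PC expressed as a percentage. Whether this tab uses exactly these definitions is an assumption, and `pca_summary` is a hypothetical name.

```python
import numpy as np

def pca_summary(X):
    """Hypothetical helper returning scree and contribution quantities for standardized PCA."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigenvalues = s**2 / (n - 1)                 # variance of each PC (scree plot heights)
    pct_variance = 100 * eigenvalues / eigenvalues.sum()
    loadings = Vt.T * np.sqrt(eigenvalues)       # correlation of each variable with each PC
    contrib = 100 * Vt.T**2                      # % contribution per variable; each column sums to 100
    return eigenvalues, pct_variance, loadings, contrib

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=30)   # gene_a and gene_c move together
names = ["gene_a", "gene_b", "gene_c", "gene_d"]
eig, pct, loadings, contrib = pca_summary(X)
order = np.argsort(contrib[:, 0])[::-1]         # variables ranked by contribution to PC1
for j in order:
    print(f"{names[j]}: {contrib[j, 0]:.1f}% of PC1")
```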

Setup Options
Grouping Variable:
Select a categorical variable from your dataset (e.g., 'treatment', 'cell_type', 'batch').
Important: This variable is only used for coloring or grouping points in the output plots (like the Individuals Plot). It does not influence the PCA calculation itself.
Use this to visually check if samples with the same label cluster together in the principal component space.
X and Y Axes:
Choose which principal components (PCs) to display on the X and Y axes of the scatter plots (Individuals and Variables plots).
The usual default is PC1 (the component that explains the most variance) on the X-axis and PC2 (the second most) on the Y-axis. Change these to explore other dimensions (e.g., PC2 vs. PC3).
KMO/Bartlett Column Limit:
Set a maximum number of columns (variables) for performing the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity.
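These tests help assess whether your data is suitable for PCA. Bartlett's test checks whether the variables are correlated at all (if the correlation matrix were the identity, PCA would have nothing to summarize), and KMO compares the simple correlations between variables with their partial correlations to gauge how much of the correlation is shared across variables. If your dataset has more columns than this limit, both tests are skipped to save computation time.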
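To show what the two tests compute, and why a column limit helps, here is a NumPy sketch of the textbook formulas. Bartlett's statistic is -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom, where R is the p x p correlation matrix; the overall KMO compares squared correlations with squared partial correlations derived from the inverse of R. Both need the determinant or inverse of R, which costs on the order of p^3 operations and breaks down when R is singular (for example, when there are at least as many columns as samples). The function names are hypothetical, and the exact formulas the tab uses are not documented here.

```python
import numpy as np

def bartlett_sphericity(X):
    """Hypothetical helper: Bartlett's statistic and degrees of freedom.
    Null hypothesis: the correlation matrix is the identity."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)             # ln|R|; -inf if R is singular
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return statistic, df

def kmo(X):
    """Hypothetical helper: overall Kaiser-Meyer-Olkin measure (0 to 1)."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    R_inv = np.linalg.inv(R)                     # O(p^3); fails or is unstable if R is (near-)singular
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)            # partial correlations (off-diagonal entries)
    off = ~np.eye(R.shape[0], dtype=bool)        # mask selecting off-diagonal entries
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)

rng = np.random.default_rng(3)
base = rng.normal(size=(200, 1))
X = base + 0.5 * rng.normal(size=(200, 5))       # five variables sharing one factor
statistic, df = bartlett_sphericity(X)
print(statistic, df, kmo(X))
```

A p-value for Bartlett's statistic can be obtained from the chi-square survival function, e.g. `scipy.stats.chi2.sf(statistic, df)`. Kaiser's conventional guideline treats overall KMO values below 0.5 as unacceptable.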
Analysis Method:
Choose the appropriate method based on your data type:
PCA (Principal Component Analysis): Use for numerical variables.
MCA (Multiple Correspondence Analysis): Use for categorical variables.
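MCA can be viewed as correspondence analysis applied to the one-hot (indicator) table of the categorical variables. The sketch below shows that computation with NumPy, assuming every variable is categorical with no missing values. `mca` is a hypothetical name, and the tab may use a different implementation or normalization (for example, adjusted inertias).

```python
import numpy as np

def mca(rows, n_components=2):
    """Hypothetical minimal MCA: correspondence analysis of the indicator matrix.
    rows: list of tuples, one tuple of category labels per sample."""
    data = np.asarray(rows, dtype=object)
    n, q = data.shape
    # Build the one-hot (complete disjunctive) table, one block of columns per variable
    blocks = []
    for j in range(q):
        levels = sorted(set(data[:, j]))
        blocks.append(np.array([[value == lvl for lvl in levels] for value in data[:, j]], dtype=float))
    Z = np.hstack(blocks)
    P = Z / Z.sum()                              # correspondence matrix (Z.sum() == n * q)
    r = P.sum(axis=1)                            # row masses (all equal to 1/n)
    c = P.sum(axis=0)                            # column (category) masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    coords = U[:, :n_components] * s[:n_components] / np.sqrt(r)[:, None]  # sample principal coordinates
    inertia = s**2                               # principal inertias (MCA eigenvalues)
    return coords, inertia[:n_components]

samples = [("treated", "high"), ("treated", "high"), ("control", "low"),
           ("control", "high"), ("control", "low"), ("treated", "low")]
coords, inertia = mca(samples)
print(coords.round(3))
print(inertia.round(3))
```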
Display Loadings:
Turn this option ON to overlay variable loadings (arrows showing how each original variable relates to the displayed components) on the Individuals Plot, producing a biplot. This helps relate sample positions to the original variables. Note that the arrows may clutter the plot when there are many variables.
Ellipse Options (for Grouping Variable):
Remove Ellipse: When this toggle is ON, no ellipses are drawn. Turn it OFF to draw concentration or confidence ellipses around the groups defined by your Grouping Variable on the Individuals Plot.
Ellipse Alpha: Adjust the transparency level (0 = fully transparent, 1 = fully opaque) of the group ellipses when they are displayed. Lower values make the ellipses fainter.
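The documentation does not specify how the ellipses are computed. One common construction, sketched below under that assumption, is a normal-theory ellipse: a concentration ellipse covers roughly the chosen share (here 95%) of a group's points, while a confidence ellipse for the group mean shrinks the covariance by the group size. The radius uses the chi-square quantile with 2 degrees of freedom, which has the closed form -2 * ln(1 - level); small-sample tools often use an F-based quantile instead. `group_ellipse` is a hypothetical helper. The group labels are applied only to already-computed scores, mirroring how the Grouping Variable affects the plots but not the PCA itself.

```python
import numpy as np

def group_ellipse(points, level=0.95, kind="concentration", n_vertices=100):
    """Hypothetical helper: (n_vertices, 2) outline of a normal-theory ellipse
    for one group's 2-D PC scores."""
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)
    if kind == "confidence":
        cov = cov / len(pts)                     # ellipse for the group mean, not the points
    radius = np.sqrt(-2 * np.log(1 - level))     # sqrt of the chi-square (2 df) quantile
    eigvals, eigvecs = np.linalg.eigh(cov)
    t = np.linspace(0, 2 * np.pi, n_vertices)
    circle = np.stack([np.cos(t), np.sin(t)])    # unit circle, shape (2, n_vertices)
    outline = eigvecs @ (np.sqrt(eigvals)[:, None] * circle) * radius
    return (outline + center[:, None]).T

rng = np.random.default_rng(4)
scores = np.vstack([rng.normal(0, 1, (15, 2)), rng.normal(3, 1, (15, 2))])  # PC1/PC2 scores
labels = np.array(["control"] * 15 + ["treated"] * 15)                       # grouping variable
ellipses = {g: group_ellipse(scores[labels == g]) for g in np.unique(labels)}
```

To mimic the Ellipse Alpha setting, each outline can be drawn as a filled polygon with a transparency value, for example with matplotlib's `Axes.fill(x, y, alpha=0.2)`.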