PCA can be used to reduce the dimensionality of the complex immune data and to visualize the features that contribute most to the variation in the dataset across all timepoints. We will also use PCA to investigate how individuals cluster based on their overall immune profile and whether this clustering relates to features such as disease severity, changes over time, or responder status.
Action
Perform PCA
Navigate to PCA analysis by going to Discovery -> Start -> PCA analysis
Select all relevant columns on which to perform PCA. This can be achieved in two ways:
Selecting the desired columns in the Columns tab. For this example, we will choose all numerical immunological assay columns (e.g., pseudoNA Abs, ADCD, ADMP, ADNKA, B cells elispot, S-IgA, S-IgG1…, N-IgG, Proliferation assays, T cell ELISpots, MSD assays, etc.)
Removing undesired columns in the Exclude Columns tab. For this example, since we want to keep all numerical immunological assays, we will remove Donor ID, Timepoint, Days pso, Responder, demographics (Age, Sex), and the clinical symptom columns
Categorical variables cannot be used to perform PCA, so only numerical columns should remain selected
Perform preprocessing of the features. This is essential for PCA
Choose center and scale to perform z-score normalization on the data
Choose a method for handling missing values. There are two options: a) medianimpute (replaces each NA with the median of that feature; this may be acceptable for visualization) and b) the Remove NA toggle (use this if imputation is undesirable, but note that it can reduce the amount of data considerably)
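For readers who want to see what these preprocessing options do to the numbers, the following is a minimal Python sketch using NumPy. It is not the tool's own implementation: the function names (median_impute, drop_incomplete_rows, center_and_scale) and the toy matrix are hypothetical, and the use of the sample standard deviation for scaling is an assumption.

```python
import numpy as np

def median_impute(X):
    """Replace each NaN with the median of its own column (feature)."""
    X = np.asarray(X, dtype=float).copy()
    medians = np.nanmedian(X, axis=0)      # per-feature median, ignoring NaN
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X

def drop_incomplete_rows(X):
    """Alternative to imputation: keep only rows with no missing values."""
    X = np.asarray(X, dtype=float)
    return X[~np.isnan(X).any(axis=1)]

def center_and_scale(X):
    """Z-score each column: subtract its mean, divide by its standard deviation.
    Assumption: sample standard deviation (ddof=1) is used for scaling."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                    # leave constant columns at zero
    return (X - mean) / std

# Hypothetical toy data: 4 samples x 2 assay features, with two missing values.
X = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [3.0, 30.0],
    [np.nan, 40.0],
])

imputed = median_impute(X)          # keeps all 4 samples
complete = drop_incomplete_rows(X)  # keeps only the 2 complete samples
Z = center_and_scale(imputed)       # each column now has mean 0 and SD 1
```

In this toy example, removing rows with NA discards half of the samples, which illustrates why imputation may be preferred when the goal is visualization.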
Choose a grouping variable. This will determine how to color the PCA plot and clusters, and is vital for interpreting immune trajectories
To choose a grouping variable, go to PCA Settings (below Preprocessing Options)
For this dataset, we will group the samples by Disease severity, Timepoint and, optionally, Responder. The plots and analyses using these grouping variables can be seen below.
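The idea behind the PCA step and the grouping variable can be sketched outside the tool as follows. This is an illustrative Python example using NumPy, not the tool's own code: the functions pca and group_centroids, the simulated data, and the "mild"/"severe" labels are all hypothetical. It computes PCA through a singular value decomposition of an already centered and scaled matrix, then summarizes where each group sits in the space of the first two components.

```python
import numpy as np

def pca(Z, n_components=2):
    """PCA via SVD of a matrix Z that is already centered and scaled.
    Returns sample scores, feature loadings, and explained variance ratios."""
    Z = np.asarray(Z, dtype=float)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # feature contributions
    explained = (S ** 2) / np.sum(S ** 2)             # variance share per PC
    return scores, loadings, explained[:n_components]

def group_centroids(scores, labels):
    """Mean PC coordinates of the samples in each group."""
    labels = np.asarray(labels)
    groups = sorted(set(labels.tolist()))
    return {g: scores[labels == g].mean(axis=0) for g in groups}

# Simulated data (hypothetical): 6 samples x 4 features, two severity groups.
rng = np.random.default_rng(0)
raw = rng.normal(size=(6, 4))
raw[3:] += 3.0                                        # shift the second group
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
severity = ["mild"] * 3 + ["severe"] * 3              # grouping variable

scores, loadings, explained = pca(Z)
centroids = group_centroids(scores, severity)
for group, center in centroids.items():
    print(group, np.round(center, 2))
```

The grouping variable does not enter the PCA computation itself; it is only used afterwards to label or color the samples, which is why the same PCA can be inspected by Disease severity, Timepoint, or Responder in turn.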