In this phase, you will run t-SNE analysis to check for confounding variables within your dataset.
Assess whether confounding variables, such as age, sex, or batch year, are evenly distributed across responder classifications. Identifying confounding ensures that any patterns in your predictive models are biologically meaningful rather than products of biased group composition.
Select year, sex, and the responder column for Grouping variable
Select z_score_continuous for Color variable
2. Check for confounding variables
Compare all t-SNE plots generated to the responder t-SNE plot
Is there an approximately equal distribution of confounding variable values in each responder class? If not, there may be confounding in your predictive model.
For example, is there an equal distribution of males and females in each responder class?
An example confounding check with the manual HAI Responder group
Z-score vs HAI Responder
Here we see no confounding effect from z-score
Year vs HAI Responder
Confounding is unclear
Sex vs HAI Responder
Confounding is unclear
After generating the t-SNE plots for the confounder check, save them so you can include them when you report your findings.
3. Additional analysis
In some cases, the t-SNE plots from the confounding check are inconclusive, as in the year and sex examples above, and warrant further analysis. In these cases, manually check how each confounding variable is distributed within each responder class.
Open the dataset with responder columns in Excel
Filter by responder class, and manually check the distribution for any confounder variables warranting further analysis
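If you prefer a scripted check over filtering in Excel, the same per-class tally can be sketched in Python. This is only an illustration under stated assumptions: the column names `responder`, `sex`, and `year`, the helper `distribution_by_class`, and the toy rows are all hypothetical stand-ins for your own exported dataset, not part of the tool described here.

```python
import csv
import io
from collections import Counter, defaultdict


def distribution_by_class(rows, class_col, confounder_col):
    """Return {responder_class: {confounder_value: proportion}}.

    Hypothetical helper: counts each confounder value within each
    responder class and converts the counts to within-class proportions.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[class_col]][row[confounder_col]] += 1
    return {
        cls: {value: n / sum(counter.values()) for value, n in counter.items()}
        for cls, counter in counts.items()
    }


# Toy data standing in for the exported dataset (assumed column names).
toy_csv = """responder,sex,year
high,F,2019
high,M,2019
high,F,2020
high,M,2020
low,F,2019
low,F,2019
low,F,2020
low,M,2020
"""

rows = list(csv.DictReader(io.StringIO(toy_csv)))
sex_by_class = distribution_by_class(rows, "responder", "sex")

# Print the share of each sex within each responder class.
for cls, shares in sorted(sex_by_class.items()):
    print(cls, {value: round(share, 2) for value, share in sorted(shares.items())})
```

In this toy data the "high" class is split evenly between F and M, while the "low" class is 75% F, which is the kind of imbalance that would warrant a closer look. To run the sketch on a real export, you would replace the `io.StringIO(toy_csv)` source with an opened CSV file and substitute your own column names.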
You’ve examined the distribution of key demographic variables across responder classes to detect possible confounding. If distributions appear balanced, you can proceed confidently; if not, consider addressing the imbalance before continuing with predictive modeling.