Phase 3: Confounding check

In this phase, you will run t-SNE analysis to check for confounding variables within your dataset.

Assess whether potential confounding variables, such as age, sex, or batch year, are evenly distributed across responder classifications. Checking for confounding helps ensure that the patterns your predictive models pick up are biologically meaningful rather than artifacts of biased group composition.

1. Confounding analysis (Example)
  1. Configure Column Selection

    1. Select all *fold_change columns

    2. Select year, sex, and the responder column for Grouping variable

    3. Select z_score_continuous for Color variable

2. Check for confounding variables
  1. Compare all t-SNE plots generated to the responder t-SNE plot

    • Is there an approximately equal distribution of confounding variable values in each responder class? If not, there may be confounding in your predictive model.

      • For example: is there an approximately equal distribution of males and females in each responder class?

  2. An example confounding check with the manual HAI Responder group

    • Z-score vs HAI Responder

      • Here we see no confounding effect from z-score

    • Year vs HAI Responder

      • Confounding is unclear

    • Sex vs HAI Responder

      • Confounding is unclear
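For a continuous variable such as the z-score, the visual comparison can be backed up with per-class summary statistics. The following is a minimal sketch rather than part of the tool: the `responder` column name, the class labels, and the toy values are assumptions made for illustration, and only `z_score_continuous` comes from the configuration above.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_class(rows, class_col, value_col):
    """Group a continuous variable by responder class and summarize it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_col]].append(float(row[value_col]))
    return {
        cls: {"n": len(values), "mean": mean(values), "median": median(values)}
        for cls, values in groups.items()
    }


# Toy rows for illustration only; real rows would come from your dataset.
# "responder" is a hypothetical column name.
rows = [
    {"responder": "high", "z_score_continuous": "0.5"},
    {"responder": "high", "z_score_continuous": "-0.5"},
    {"responder": "high", "z_score_continuous": "0.0"},
    {"responder": "low", "z_score_continuous": "1.0"},
    {"responder": "low", "z_score_continuous": "2.0"},
    {"responder": "low", "z_score_continuous": "3.0"},
]

summary = summarize_by_class(rows, "responder", "z_score_continuous")
for cls in sorted(summary):
    print(cls, summary[cls])
```

Similar means and medians across classes are consistent with a balanced distribution; a large gap, as in the toy data here, suggests the variable deserves a closer look.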

3. Additional analysis

In some cases, the t-SNE plots from the confounding analysis are inconclusive, as with year and sex in the example above. In these cases, it can help to manually check how each potential confounding variable is distributed within each responder class.

  1. Open the dataset with responder columns in Excel

  2. Filter by responder class, and manually check the distribution of any confounding variable that warrants further analysis
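The same filter-and-count check can also be scripted instead of done by hand in Excel. The sketch below is one possible approach under stated assumptions: the `responder` and `sex` column names, the class labels, and the toy rows are all hypothetical and stand in for your own dataset.

```python
from collections import Counter, defaultdict


def class_distribution(rows, class_col, confounder_col):
    """Return {responder_class: {confounder_value: proportion}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[class_col]][row[confounder_col]] += 1
    result = {}
    for cls, counter in counts.items():
        total = sum(counter.values())
        result[cls] = {value: n / total for value, n in counter.items()}
    return result


# Toy rows for illustration only; column names and labels are assumptions.
rows = [
    {"responder": "high", "sex": "F"},
    {"responder": "high", "sex": "M"},
    {"responder": "high", "sex": "F"},
    {"responder": "high", "sex": "M"},
    {"responder": "low", "sex": "F"},
    {"responder": "low", "sex": "F"},
    {"responder": "low", "sex": "F"},
    {"responder": "low", "sex": "M"},
]

dist = class_distribution(rows, "responder", "sex")
for cls in sorted(dist):
    print(cls, {value: round(p, 2) for value, p in sorted(dist[cls].items())})
```

With real data, the rows could be read from a CSV export of the dataset using `csv.DictReader`. Proportions that differ noticeably between responder classes, as for the toy "low" class here, point to an imbalance worth addressing.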

You’ve examined the distribution of key demographic variables across responder classes to detect possible confounding. If distributions appear balanced, you can proceed confidently; if not, consider addressing the imbalance before continuing with predictive modeling.
