# Phase 3: Confounding check

Assess whether potential confounding variables, such as age, sex, or batch year, are evenly distributed across responder classes. Checking for confounding helps ensure that patterns in your predictive models are biologically meaningful rather than artifacts of biased group composition.

<details>

<summary>1. Confounding analysis (Example)</summary>

1. Navigate to [**Discovery** -> **Start** -> **t-SNE analysis**](https://app.gitbook.com/s/9LdC62ZpkxqvCBTPwVZU/data-analysis/discovery/t-sne-analysis)
2. Configure **Column Selection**
   1. Select all `*fold_change` columns
   2. Select `year`, `sex`, and the responder column for **Grouping variable**
   3. Select `z_score_continuous` for **Color variable**

<figure><img src="https://1845146574-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZMrkCA3Bqd62Gp0kAk79%2Fuploads%2FhZpKlq9U8vScJv6NMuFj%2FFF_Phase%203_Confounding%20Setup%20tSNE_annotated.png?alt=media&#x26;token=fb61a2b1-79ec-4feb-a204-93c6e130a150" alt=""><figcaption></figcaption></figure>

</details>

<details>

<summary>2. Check for confounding variables</summary>

1. Compare each generated t-SNE plot to the responder t-SNE plot
   * Are the values of each potential confounder distributed approximately equally across the responder classes? If not, your predictive model may be confounded.
     * For example, does each responder class contain a similar proportion of males and females?
2. An example confounding check with the manual HAI Responder group

   * Z-score vs HAI Responder
     * Here we see no confounding effect from z-score

   <figure><img src="https://1845146574-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZMrkCA3Bqd62Gp0kAk79%2Fuploads%2FVfe466qRnbnya7aqVapv%2FFF_Phase%20%203_Z-score%20vs%20HAI%20Responder.png?alt=media&#x26;token=db0a0684-6a74-49bd-9bc6-a4807c03b627" alt="" width="563"><figcaption></figcaption></figure>

   * Year vs HAI Responder
     * Confounding is unclear

   <figure><img src="https://1845146574-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZMrkCA3Bqd62Gp0kAk79%2Fuploads%2FH7jFNgK1tifbOZoo9vFr%2FFF_Phase%20%203_Batch%20Year%20vs%20HAI%20Responder.png?alt=media&#x26;token=2d1cbd34-4c35-413e-8959-b393191ff026" alt="" width="563"><figcaption></figcaption></figure>

   * Sex vs HAI Responder
     * Confounding is unclear

   <figure><img src="https://1845146574-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZMrkCA3Bqd62Gp0kAk79%2Fuploads%2Ff9b27zCfZy32LChywDbY%2FFF_Phase%20%203_Sex%20vs%20HAI%20Responder.png?alt=media&#x26;token=9457e31d-b70b-495f-a3ea-f3b965eb079d" alt="" width="563"><figcaption></figcaption></figure>

{% hint style="success" %}
After generating the t-SNE plots for the confounder check, consider saving them so you can include them when you report your findings.
{% endhint %}

</details>

<details>

<summary>3. Additional analysis</summary>

In some cases, the t-SNE plots from the confounding analysis are inconclusive, as with year and sex in the example above. When that happens, manually check how each confounding variable is distributed within each responder class.

1. Open the dataset with responder columns in Excel
2. Filter by responder class and check the distribution of each confounding variable that needs a closer look

<figure><img src="https://1845146574-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZMrkCA3Bqd62Gp0kAk79%2Fuploads%2FISyu7PAvlBicPB2pZv1o%2FHAI%20Responder_Sex%20and%20year%20confound.png?alt=media&#x26;token=21bd81c8-c156-45e9-8fbd-9e17284f2311" alt=""><figcaption></figcaption></figure>
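
If you prefer a script to filtering in Excel, the same manual check can be sketched in Python. This is a minimal illustration, not part of the platform: the inline data is made up, `hai_responder` is an assumed name for your responder column, and `confounder_proportions` is a hypothetical helper. Only `sex` and `year` correspond to columns used in the steps above.

```python
import csv
import io
from collections import Counter, defaultdict

# Made-up toy data for illustration only; replace with your exported dataset.
# `hai_responder` is an assumed column name for the responder class.
SAMPLE_CSV = """hai_responder,sex,year
high,F,2019
high,M,2019
high,F,2020
low,M,2019
low,M,2020
low,M,2020
"""


def confounder_proportions(rows, responder_col, confounder_col):
    """Return {responder_class: {confounder_value: proportion}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[responder_col]][row[confounder_col]] += 1
    return {
        cls: {value: n / sum(counter.values()) for value, n in counter.items()}
        for cls, counter in counts.items()
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Print the share of each confounder value within each responder class.
for confounder in ("sex", "year"):
    table = confounder_proportions(rows, "hai_responder", confounder)
    for cls, proportions in sorted(table.items()):
        summary = ", ".join(
            f"{value}: {share:.0%}" for value, share in sorted(proportions.items())
        )
        print(f"{confounder} | {cls} -> {summary}")
```

To run this on your own data, replace `io.StringIO(SAMPLE_CSV)` with an opened CSV export of your dataset. Proportions that differ markedly between responder classes suggest the variable may be a confounder.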

</details>

You’ve examined how potential confounding variables are distributed across responder classes. If the distributions appear balanced, proceed to predictive modeling; if not, consider addressing the imbalance first.
