Phase 4: Predictive modeling

In this phase, you will create models to predict response classification from baseline immune measurements.

Prepare your dataset for predictive analysis by removing outcome variables that could bias results, ensuring that only baseline predictor variables remain. Then, configure and run predictive models in PANDORA using the cleaned dataset.

1. Process dataset

To keep predictions unbiased, remove every outcome variable other than the designated responder column. Any tool that can edit columns works for this step; the example below uses Excel.

  1. Open your Flu Fighter dataset with responder columns in Excel

  2. Search for and remove all undesired outcome variables. A few examples are listed below:

    1. ch6_titer_v21, h3_v2_shed, h1_hai_gmt_fold_change

    2. Helpful search terms

      1. fold

      2. v2

      3. v7

  3. Select and delete every column whose name contains one of these terms.

  4. Save the result as a new .csv file for predictive analysis

  5. Upload the new file to PANDORA
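
If you prefer a script to Excel, the column removal above can be sketched in Python using only the standard library. This is a minimal sketch, not part of PANDORA: the function names are hypothetical, the search terms are the ones listed above, and `ResponderStatus` is assumed to be the response column you want to keep. Because the match is a plain substring test, review which columns were dropped before uploading, since a baseline column whose name happens to contain `v2` or `v7` would also be removed.

```python
import csv
import io

# Search terms from the tutorial; extend the tuple if your dataset needs more.
OUTCOME_TERMS = ("fold", "v2", "v7")


def select_baseline_columns(columns, terms=OUTCOME_TERMS, keep=("ResponderStatus",)):
    """Return the column names to retain, in their original order.

    A column is retained if it is listed in `keep` (assumed response column)
    or if its lowercased name contains none of the search terms.
    """
    kept = []
    for name in columns:
        lowered = name.lower()
        if name in keep or not any(term in lowered for term in terms):
            kept.append(name)
    return kept


def filter_csv(text, terms=OUTCOME_TERMS, keep=("ResponderStatus",)):
    """Drop outcome columns from CSV text and return the cleaned CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    kept = select_baseline_columns(reader.fieldnames, terms, keep)
    out = io.StringIO()
    # extrasaction="ignore" silently skips the columns that are not in `kept`.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

To use it, read your exported dataset into a string, pass it to `filter_csv`, and write the returned text to a new .csv file. Comparing the original header with the output of `select_baseline_columns` shows exactly which columns were removed.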

2. Set up prediction task
  1. Navigate to Workspace

  2. Select the processed Flu Fighters dataset with added ResponderStatus or Cluster column

  3. Navigate to Predictive -> Start

  4. Configure Analysis Properties

    1. Select all columns as Predictor variables

    2. Use PANDORA's Exclude predictors option to exclude any columns containing fold_change, v2, v7, or v21, along with any other outcome variables that were left in by accident. If the processing in step 1 was completed correctly, there should be none.

    3. Select the ResponderStatus or pandora_cluster column for Response

    4. Select the Preprocessing options center, scale, medianimpute, corr, zv, and nzv

    5. Set the Training/Testing dataset partition to 75% training and 25% testing

  5. Select packages for your predictive models

    1. For this example, select rf, nb, glm, mlp, and C5.0
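
To give a feel for what some of the preprocessing options selected above do, here is a small Python sketch. It assumes the option names carry their usual meanings (they resemble the method names accepted by `preProcess` in the R caret package): medianimpute fills missing values with the column median, zv drops columns with zero variance, and center and scale standardize each column. The function names are hypothetical, the order of operations is chosen for simplicity, and corr and nzv are not shown, so treat this as an illustration rather than a description of PANDORA's internals.

```python
import statistics


def median_impute(values):
    """Replace None entries with the median of the observed values.

    Raises statistics.StatisticsError if the column has no observed values.
    """
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def is_zero_variance(values):
    """True when all values are identical, so the column carries no signal."""
    return len(set(values)) <= 1


def center_and_scale(values):
    """Subtract the mean, then divide by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def preprocess(columns):
    """Impute, drop zero-variance columns, then center and scale.

    `columns` maps a column name to a list of numbers (None = missing).
    """
    result = {}
    for name, values in columns.items():
        filled = median_impute(values)
        if is_zero_variance(filled):
            continue  # dropped, as a zero-variance filter would do
        result[name] = center_and_scale(filled)
    return result
```

After this kind of transformation each retained column has mean 0 and standard deviation 1, which keeps predictors measured on very different scales from dominating scale-sensitive models.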

Experimental Options

When creating your own predictive models, you can experiment with the following:

  • Training/Testing dataset partition: Model performance can vary with how the data is split, so trying different training/testing proportions can help you find the best-performing model.

  • Packages: PANDORA has 200+ packages for predictive models, and you can even select a whole family of models with similar features.
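
To see how the partition setting changes the size of the training and testing sets, the following Python sketch splits samples by class so that each response class keeps roughly the same proportion in both sets. It assumes a stratified split; whether PANDORA stratifies its partitions is not stated here. The function name, the default seed, and the labels in the usage note are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=42):
    """Return (train_indices, test_indices) for a list of class labels.

    Each class is shuffled and cut separately, so about `train_fraction`
    of every class lands in the training set.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)
```

Calling `stratified_split(labels, train_fraction=f)` for several values of `f`, such as 0.6, 0.75, and 0.8, shows how few samples of the smaller class remain for testing as the training share grows, which is worth checking on small datasets before changing the partition.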

3. Run analysis
  1. Click the Validate data button

  2. Click the Process button on the pop-up that appears

  3. Monitor Progress on your PANDORA Dashboard

You’ve successfully processed your dataset to remove bias-inducing outcome variables and configured predictive models using PANDORA. Once your models have completed processing, you're ready to interpret the results and evaluate model performance in the next phase.
