Phase 4: Predictive modeling

In this phase, you will create models to predict response classification from baseline immune measurements.

Prepare your dataset for predictive analysis by removing outcome variables that could bias results, ensuring that only baseline predictor variables remain. Then, configure and run predictive models in PANDORA using the cleaned dataset.

1. Process dataset

To ensure unbiased predictions, remove every outcome variable other than the designated response column. Outcome variables left in as predictors leak information about the response into the model. Various tools can be used for this step; Excel is used in the example below.

Open your Flu Fighters dataset with responder columns in Excel

Search for and remove all undesired outcome variables.
1. Example outcome columns: ch6_titer_v21, h3_v2_shed, h1_hai_gmt_fold_change
2. Helpful search terms
  1. fold
  2. v2
  3. v7

Select and delete every column containing these terms.

Save as a new predictive processed .csv file
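If you would rather script this step than use Excel, the search-and-delete can be sketched in Python with only the standard library. This is a minimal sketch, not part of PANDORA: the function names are hypothetical, the search terms are the ones listed above, and it assumes the response column is named ResponderStatus and should be protected from removal. The column names subject_id and age in the usage example are made up for illustration.

```python
import csv
import io

# Search terms from the step above. Matching is by substring, so "v2" also
# catches names such as ch6_titer_v21.
OUTCOME_TERMS = ("fold", "v2", "v7")
# Assumption: the response column has this name and must never be dropped.
KEEP_COLUMNS = {"ResponderStatus"}


def columns_to_drop(header, terms=OUTCOME_TERMS, keep=KEEP_COLUMNS):
    """Return column names containing any search term (case-insensitive)."""
    return [
        name for name in header
        if name not in keep and any(term in name.lower() for term in terms)
    ]


def drop_outcome_columns(src, dst, terms=OUTCOME_TERMS, keep=KEEP_COLUMNS):
    """Copy CSV rows from src to dst without the flagged columns.

    Returns the sorted list of dropped column names so they can be reviewed.
    """
    reader = csv.DictReader(src)
    header = reader.fieldnames or []
    dropped = set(columns_to_drop(header, terms, keep))
    kept = [name for name in header if name not in dropped]
    writer = csv.DictWriter(
        dst, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # extra (dropped) keys are ignored
    return sorted(dropped)


# Usage with an in-memory file; replace with open(...) calls for real data.
example = io.StringIO(
    "subject_id,age,h1_hai_gmt_fold_change,h3_v2_shed,ch6_titer_v21,ResponderStatus\n"
    "S1,30,2.0,1,40,high\n"
)
cleaned = io.StringIO()
removed = drop_outcome_columns(example, cleaned)
```

Because the match is a plain substring test, it may also flag baseline columns whose names happen to contain one of the terms. Review the returned list before uploading the cleaned file.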

Upload the new file to PANDORA

2. Setup prediction task

Navigate to Workspace

Select the processed Flu Fighters dataset with added ResponderStatus or Cluster column

Navigate to Predictive -> Start

Configure Analysis Properties
1. Select all columns as Predictor variables
2. Use PANDORA's Exclude predictors option to exclude any columns matching *fold_change, v2, v7, or v21, along with any other outcome variables that were left in by accident. If the processing in step 1 was completed correctly, there should be none.
3. Select ResponderStatus or pandora_cluster column for Response
4. Select Preprocessing options center, scale, medianimpute, corr, zv, and nzv
5. Set Training/Testing dataset partition to 75% training and 25% testing
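To make the partition and preprocessing settings more concrete, here is a minimal Python sketch of what a 75%/25% split followed by median imputation, centering, and scaling does to one numeric column. It illustrates the general idea only and is not PANDORA's implementation: the function names are hypothetical, the split is a simple seeded shuffle, and the order in which PANDORA applies its preprocessing options may differ. The corr, zv, and nzv filters are not shown.

```python
import random
import statistics


def split_rows(rows, train_fraction=0.75, seed=0):
    """Shuffle rows reproducibly and split them into training and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_scaler(values):
    """Learn median, mean, and standard deviation from one training column.

    Missing values are represented as None and filled with the median
    before the mean and standard deviation are computed.
    """
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    filled = [median if v is None else v for v in values]
    return median, statistics.mean(filled), statistics.stdev(filled)


def apply_scaler(values, median, mean, sd):
    """Impute with the training median, then center and scale."""
    return [((median if v is None else v) - mean) / sd for v in values]


# Usage: 20 hypothetical subjects give 15 training rows and 5 testing rows.
train_ids, test_ids = split_rows(range(20))

# The statistics are learned on training values and reused on testing values.
median, mean, sd = fit_scaler([1.0, None, 3.0, 5.0])
scaled_test = apply_scaler([2.0, None], median, mean, sd)
```

Learning the imputation and scaling values from the training rows alone, then reusing them on the testing rows, keeps information about the held-out data from influencing the model.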

Select packages for your predictive models
1. For this example, select rf, nb, glm, mlp, and C5.0

Experimental Options

When creating your own predictive models, you can experiment with the following:

Training/Testing dataset partition: Different models perform better with different partitions, so experimenting with this parameter can help you find the best-performing model.
Packages: PANDORA has 200+ packages for predictive models, and you can even select a whole family of models with similar features.
Multi-set Intersection
Feature Selection

Caution: Running too many models simultaneously on a personal computer may significantly increase processing time, and computationally intensive models may fail due to Timeout.

3. Run analysis

Click the Validate data button

Click the Process button on the pop-up that appears

Monitor Progress on your PANDORA Dashboard

You’ve successfully processed your dataset to remove bias-inducing outcome variables and configured predictive models using PANDORA. Once your models have completed processing, you're ready to interpret the results and evaluate model performance in the next phase.
