In this phase, you will create models to predict response classification from baseline immune measurements.
Prepare your dataset for predictive analysis by removing outcome variables that could bias results, ensuring that only baseline predictor variables remain. Then, configure and run predictive models in PANDORA using the cleaned dataset.
1. Process dataset
To ensure unbiased predictions, remove every outcome variable other than the designated response column. Outcome variables are measured after baseline, so leaving them in would let the models learn from information they should be predicting. Various tools can be used for this step; Excel is used in the example below.
Open your Flu Fighters dataset with responder columns in Excel
Search for and remove all undesired outcome variables. A few examples:
ch6_titer_v21, h3_v2_shed, h1_hai_gmt_fold_change
Helpful search terms
fold
v2
v7
Select and delete every column containing these terms.
Save the result as a new .csv file (the processed predictive dataset)
Upload the new file to PANDORA
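The column removal in step 1 can also be scripted instead of done by hand in Excel. The following is a minimal sketch, assuming the dataset is a .csv file with one header row. The function names, the protected column list, and the example column `subject_id` are hypothetical and not part of PANDORA or the dataset documentation. Matching is by substring, so the term `v2` also catches names such as `ch6_titer_v21`; review the returned list of removed columns before uploading.

```python
import csv

# Search terms from step 1; a column is flagged if its name contains any of them.
OUTCOME_TERMS = ("fold", "v2", "v7")
# Response columns that must never be dropped (assumed names, adjust as needed).
PROTECTED = ("ResponderStatus", "pandora_cluster")


def is_outcome_column(name, terms=OUTCOME_TERMS, protected=PROTECTED):
    """Return True if the column name contains one of the search terms."""
    if name in protected:
        return False
    lowered = name.lower()
    return any(term in lowered for term in terms)


def split_header(header):
    """Split column names into (kept, removed) lists, preserving order."""
    kept = [name for name in header if not is_outcome_column(name)]
    removed = [name for name in header if is_outcome_column(name)]
    return kept, removed


def write_processed_csv(in_path, out_path):
    """Copy a CSV without the flagged columns and return the removed names."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        kept, removed = split_header(reader.fieldnames)
        # extrasaction="ignore" silently skips the columns that are not kept.
        writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
    return removed
```

Calling `write_processed_csv` with your own input and output paths writes the reduced file and returns the dropped column names, which gives a quick check that no baseline predictor was removed by accident.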
2. Setup prediction task
Navigate to Workspace
Select the processed Flu Fighters dataset with added ResponderStatus or Cluster column
Navigate to Predictive -> Start
Configure Analysis Properties
Select all columns as Predictor variables
Use PANDORA's Exclude predictors option to drop any outcome variables that remain, such as columns matching *fold_change, v2, v7, or v21. If the processing in step 1 was completed correctly, there should be none.
Select the ResponderStatus or pandora_cluster column as the Response
Select Preprocessing options center, scale, medianimpute, corr, zv, and nzv
Set Training/Testing dataset partition to 75% training and 25% testing
Select packages for your predictive models
For this example, select rf, nb, glm, mlp, and C5.0
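To give a feel for what some of the preprocessing options do, here is a small stand-alone sketch of median imputation, zero-variance filtering, centering, and scaling. It is an illustration only: how PANDORA implements these options internally, and in which order it applies them, is not stated here, so the order below is chosen for simplicity. The function names and the toy columns are hypothetical, and each column is assumed to have at least two observed values.

```python
import statistics


def median_impute(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [value for value in column if value is not None]
    fill = statistics.median(observed)
    return [fill if value is None else value for value in column]


def has_zero_variance(column):
    """True when every value in the column is identical."""
    return len(set(column)) <= 1


def center_and_scale(column):
    """Subtract the mean, then divide by the sample standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(value - mean) / sd for value in column]


def preprocess(columns):
    """Impute each column, drop constant columns, standardize the rest."""
    result = {}
    for name, values in columns.items():
        filled = median_impute(values)
        if has_zero_variance(filled):
            continue  # a constant column carries no information for a model
        result[name] = center_and_scale(filled)
    return result


# Toy data: one column with a missing value, one constant column.
toy = {"marker_a": [1.0, None, 3.0], "marker_b": [5.0, 5.0, 5.0]}
processed = preprocess(toy)
```

In this toy input, `marker_b` is dropped because it is constant, while `marker_a` is imputed and then standardized to mean 0 and standard deviation 1.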
Experimental Options
When creating your own predictive models, you can experiment with the following:
Training/Testing dataset partition: Different models perform better with different partitions, so experimenting with this parameter can help you find the best-performing model.
Packages: PANDORA has 200+ packages for predictive models, and you can even select a whole family of models with similar features.
Caution: Running too many models simultaneously on a personal computer may significantly increase processing time, and computationally intensive models may fail due to Timeout.
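When experimenting with the partition, it helps to see how many samples of each class end up in the training and testing sets. The sketch below is a stand-alone illustration, not PANDORA's own partitioning code. It assumes a stratified split (each response class is sampled separately), which is an assumption about good practice rather than a documented PANDORA behavior; the function name, the seed, and the example labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Hypothetical response column: 8 high responders and 12 low responders.
labels = ["high"] * 8 + ["low"] * 12

for fraction in (0.6, 0.75, 0.9):
    train, test = stratified_split(labels, train_fraction=fraction)
    print(fraction, len(train), len(test))
```

With small datasets, a large training fraction can leave only one or two samples of a class in the testing set, which makes the reported performance unstable; checking the counts this way before a long run can save time.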
3. Run analysis
Click the Validate data button
Click the Process button on the pop-up that appears
Monitor Progress on your PANDORA Dashboard
You’ve successfully processed your dataset to remove bias-inducing outcome variables and configured predictive models using PANDORA. Once your models have completed processing, you're ready to interpret the results and evaluate model performance in the next phase.