In this phase, you will gather and report findings for your top predictive model.
Report the best model and its test set performance (e.g., AUC). List the top predictors identified via Variable Importance. Describe insights from confounder analysis (Phase 3) and Model Interpretation (if applicable). Discuss the biological relevance of the top predictors.
1. Combine Findings
Identify the top model from Phase 4 by considering:
Model performance metrics
ROC Curves
Biological relevance of top predictors in Variable Importance
Confounder check (Phase 3)
Pull together all your findings, including:
Clustered t-SNE plots for responder classification, if applicable (Phase 2)
t-SNE plots and analysis from Confounder check (Phase 3)
Model performance metrics (Phase 4)
Training and Testing ROC Curves
Model Interpretation plots, if applicable
Variable Importance bar plot
Features across dataset dot plots for top predictive features
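When comparing candidate models outside PANDORA, the test set AUC can be recomputed from exported predictions. The following is a minimal sketch, assuming you have the true test labels and each model's predicted scores as plain lists; the model names and values are hypothetical, and the rank-sum calculation shown here is a standard way to obtain AUC, not necessarily how PANDORA computes it.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for the positive class (e.g. responder), anything else is negative.
    scores: predicted score or probability for the positive class.
    Tied scores receive their average rank.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")

    # Assign 1-based ranks by ascending score, averaging ranks within ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def rank_models(
    results: Dict[str, Tuple[List[int], List[float]]]
) -> List[Tuple[str, float]]:
    """Return (model name, test AUC) pairs sorted from best to worst AUC."""
    scored = [(name, roc_auc(y, s)) for name, (y, s) in results.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical test labels and predictions, for illustration only.
    y_test = [0, 0, 1, 1, 1, 0]
    hypothetical_results = {
        "model_a": (y_test, [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]),
        "model_b": (y_test, [0.3, 0.6, 0.5, 0.4, 0.9, 0.2]),
    }
    for name, auc in rank_models(hypothetical_results):
        print(f"{name}: test AUC = {auc:.3f}")
```

AUC alone should not decide the top model; weigh it against the ROC curves, the confounder check, and the biological relevance of the top predictors listed above.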
2. Analyze GO Terms & Biological Themes
The GO terms in your dataset come from pathway enrichment analysis, a technique performed outside PANDORA that identifies biological themes from gene expression data. You can look up these terms in GO term databases to uncover the overall biological themes behind your responder groups and model predictions.
Pathway Enrichment Analysis Tools:
clusterProfiler in R
DAVID
Metascape
Enrichr
GO term and pathway databases:
GO
KEGG
Reactome
GO terms, alongside predictive variables, can be used to identify biological themes using the following workflow:
Search for each of your top predictive GO terms by its ID, in the form GO:#
e.g., GO:0070206, GO:1903214
Click term history to see the ancestor chart, child terms, and co-occurring terms
Create a list of all biological processes and themes related to your GO Terms
Check the expression levels of baseline terms in responder groups
Select your predictive processed dataset from the Workspace (this dataset should contain only baseline features and your responder columns)
Navigate to Discovery -> Start -> Hierarchical Clustering
Configure Clustering Column Selection
Select your Responder column for the Columns
Set First (n) rows to a value larger than the total number of baseline features
Configure Clustering Display Options
Enable Grouped display
Select the responder column for Grouped column
Click Plot image
Analyze the resultant heatmap
Take note of how the expression of top predictive variables varies among the responder classes.
Keeping in mind the biological themes from your predictive variables and top GO terms, consider which themes distinguish the responder classes.
Make plots reflecting biological themes (optional)
Outside of PANDORA, you may create additional plots, such as radar plots, reflecting the different immune profiles of responder classes based on the baseline or fold change expression levels of features in each class.
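Two parts of this workflow can also be scripted outside PANDORA: collecting the GO IDs to search for, and summarizing feature expression per responder class as a numeric companion to the grouped heatmap. The sketch below assumes an exported dataset held as a list of row dictionaries; the column names (`responder`, `GO:0070206_baseline`, and so on), the separator variants in the pattern, and all values are hypothetical and should be adapted to your own export.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# A GO ID is "GO:" followed by seven digits. Accepting "_" or "." as the
# separator is an assumption, for column names altered during export.
GO_ID = re.compile(r"GO[:_.]?(\d{7})(?!\d)")


def extract_go_ids(feature_names: List[str]) -> List[str]:
    """Return unique GO IDs in 'GO:0000000' form, in first-seen order."""
    seen: List[str] = []
    for name in feature_names:
        for digits in GO_ID.findall(name):
            go_id = f"GO:{digits}"
            if go_id not in seen:
                seen.append(go_id)
    return seen


def class_means(
    rows: List[Dict[str, object]], class_col: str, features: List[str]
) -> Dict[str, Dict[str, float]]:
    """Mean value of each feature within each responder class."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for feat in features:
            grouped[row[class_col]][feat].append(float(row[feat]))
    return {
        cls: {feat: mean(vals) for feat, vals in feats.items()}
        for cls, feats in grouped.items()
    }


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    rows = [
        {"responder": "high", "GO:0070206_baseline": 1.2, "GO:1903214_baseline": 0.4},
        {"responder": "high", "GO:0070206_baseline": 1.6, "GO:1903214_baseline": 0.6},
        {"responder": "low", "GO:0070206_baseline": 0.3, "GO:1903214_baseline": 0.9},
    ]
    features = [col for col in rows[0] if col != "responder"]
    print("GO IDs to search:", extract_go_ids(features))
    for cls, feats in class_means(rows, "responder", features).items():
        print(cls, feats)
```

The per-class means can feed directly into the optional plots described above, for example as the axes of a radar plot comparing responder classes.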
You've now identified and analyzed your strongest model by considering model performance, biological interpretation, and confounder analysis. By pulling all your analysis together, you have built a comprehensive picture of your model from which to draw biologically relevant insights.