Discovery

Exploratory Data Analysis

The Discovery section in PANDORA provides tools for getting to know your biomedical datasets. It is designed for initial data exploration, visualization, and pattern finding with unsupervised machine learning methods. Starting here helps you understand your data's characteristics, spot potential relationships, and form hypotheses before you move on to more complex modeling.

Key goals when using the Discovery tools:

  • Get familiar with your data: Use the Data Overview to check the structure of your dataset, how your variables are distributed, and the overall data quality.

  • Assess relationships: The Correlation tool helps you measure and visualize how different variables in your dataset relate to each other.

  • Find patterns: Identify natural groupings and structures within your data using unsupervised methods:

    • Hierarchical Clustering: Groups similar samples or features based on their values.

    • PCA Analysis (Principal Component Analysis): Reduces complex data to a few components that capture the main sources of variation, so you can visualize the data in fewer dimensions.

    • t-SNE Analysis and UMAP: Visualize high-dimensional data (such as genomics or proteomics data) in 2D or 3D plots to reveal clusters and non-linear relationships that might not be obvious otherwise.

Each tab within this section is dedicated to a specific analytical approach, allowing for a systematic exploration of your data.
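To make the idea of measuring relationships concrete, here is a minimal sketch of a pairwise Pearson correlation matrix in plain Python. It is an illustration only: the page does not say which correlation method PANDORA uses, and the column names and values below are hypothetical toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy dataset (assumption, not PANDORA data): one list per variable.
data = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.1, 3.9, 6.2, 8.1, 9.8],  # rises with gene_a
    "gene_c": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as gene_a rises
}

corr = correlation_matrix(data)
for name, row in corr.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship. Variables that are strongly correlated with each other are natural candidates for closer inspection in the clustering and dimensionality-reduction tabs.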

Data Overview

Use the Data Overview tab to get a quick summary of your dataset and explore initial data distributions.

Available Plots

  • Table Plot: Visualize distribution patterns for multiple variables together in a single figure. This helps you spot broader trends across your data.

  • Distribution Plot: Examine the frequency and spread for individual variables. Use this to check data ranges and identify potential outliers.
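The kind of check a distribution plot supports (data range and potential outliers) can also be sketched numerically. The example below is a rough illustration using the common 1.5 × IQR rule, which is an assumption on our part and not necessarily how PANDORA flags outliers. The variable name and values are hypothetical.

```python
from statistics import mean, median, quantiles


def summarize(values):
    """Summarize one numeric variable and flag outliers with the 1.5 * IQR rule."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "outliers": [v for v in values if v < low or v > high],
    }


# Hypothetical measurements for a single variable (assumption, not PANDORA data).
marker_level = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 9.7]

summary = summarize(marker_level)
print("range:", summary["min"], "to", summary["max"])
print("median:", summary["median"])
print("possible outliers:", summary["outliers"])
```

A value flagged this way is only a candidate for review. Whether it is a measurement error or a real biological signal is a judgment the plot and your domain knowledge should inform.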

Settings

Customize your overview using these options:

  • Column Selection: Choose which variables (columns) you want to include or exclude from the plots. This lets you focus on specific parts of your data.

  • Preprocessing: Apply simple preprocessing steps directly within the tab, such as normalization or handling missing values, before generating visualizations.

  • Theme Settings: Change the visual appearance (like colors and styles) of your plots to make patterns easier to see.
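As a rough picture of what the preprocessing options do, the sketch below fills missing values with the column median and then applies z-score normalization. Both choices are assumptions made for illustration; this page does not state which imputation or normalization methods the tab applies, and the data is hypothetical.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def z_score(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical column with one missing measurement (assumption, not PANDORA data).
raw = [2.0, None, 4.0, 6.0]

filled = impute_median(raw)   # the missing entry becomes the median of 2, 4, 6
scaled = z_score(filled)      # centered on 0 with unit spread
print(filled)
print([round(v, 3) for v in scaled])
```

Normalizing before plotting puts variables measured on different scales onto a comparable footing, which makes multi-variable views easier to read.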
