Correlation

Enables users to analyze correlations between variables within a dataset.

Use the Correlation tab to explore relationships between variables in your dataset. PANDORA creates correlograms (correlation plots) that help you visualize these connections.

You can see the strength and direction (positive or negative) of the correlation between each pair of variables. You can also check the statistical significance (p-value) of each relationship to judge whether it could plausibly have arisen by chance.

This section details the specific settings available in the Correlation tab. For general setup steps like column selection and standard preprocessing, refer to the main documentation sections on Side Panel Options and Preprocessing.

1. Correlation Method

  • In the Column Selection area (or a dedicated "Method" section), choose the statistical method used to calculate correlations:

    • Pearson: Standard correlation coefficient, measures linear relationships. Sensitive to outliers; its significance test (p-value) assumes approximately normally distributed data.

    • Kendall: Rank-based correlation, measures ordinal association. Less sensitive to outliers than Pearson.

    • Spearman: Rank-based correlation, measures monotonic relationships (how well the relationship can be described using a monotonic function). Also robust to outliers.
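
The difference between the three methods can be illustrated with a minimal sketch. It is written in plain Python purely for illustration and is not PANDORA's implementation; the function names (pearson, ranks, spearman, kendall) and the sample data are hypothetical, and the Kendall variant shown is the simple tau-a, which assumes there are no ties.

```python
from math import sqrt


def pearson(x, y):
    """Pearson r: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / all pairs. No ties assumed."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical data: y rises with x, but along a curve rather than a line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print("pearson ", pearson(x, y))
print("spearman", spearman(x, y))
print("kendall ", kendall(x, y))
```

Because y always increases with x, the two rank-based coefficients reach their maximum of 1, while Pearson stays below 1 because the relationship is not a straight line.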

2. Correlation Settings

  • NA Action: Select how to handle missing values (NAs) during correlation calculation.

    • The default (everything) uses all observations as they are, so missing values propagate: the correlation for a pair of variables is NA whenever either variable contains an NA.

    • Other options (like pairwise.complete.obs or complete.obs) might be available. pairwise.complete.obs computes each correlation from the rows where both variables of that pair are present; complete.obs first drops every row that has an NA in any selected variable. Refer to the standard R cor() function documentation for detailed behavior of these options if needed.

  • Plot Method: Choose how the correlation values are visualized within the plot matrix:

    • Options often include circle, square, ellipse, number, shade, color, pie. These determine the shape or shading used to represent the correlation strength and direction.

  • Plot Type: Select which part of the correlation matrix to display:

    • full: Show the entire square matrix.

    • upper: Show only the upper triangle (excluding the diagonal).

    • lower: Show only the lower triangle (excluding the diagonal).

  • Reorder Correlation: Choose how variables are ordered along the axes:

    • Options might include alphabetical order, ordering derived from the eigenvectors of the correlation matrix (Angular Order of Eigenvectors, AOE, or First Principal Component, FPC), or hierarchical clustering.

  • Hierarchical Clustering: If you choose a reordering method based on hierarchical clustering (or enable a specific clustering option):

    • Clustering Method: Select the linkage algorithm (e.g., ward, complete, average) used to build the hierarchy.

    • Number of Rectangles (Clusters): Optionally specify the number of clusters (k) to highlight with rectangles drawn on the plot. The clusters are obtained by cutting the dendrogram into k groups.
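
The effect of the NA Action options listed above can be sketched as follows. This is a plain-Python illustration of the behavior documented for the use argument of R's cor(), not PANDORA's own code; it is an assumption that PANDORA passes these option names through unchanged. The helper names (pearson, correlations), the NA stand-in, and the sample data are hypothetical.

```python
from itertools import combinations
from math import sqrt

NA = None  # stand-in for R's NA in this sketch


def pearson(x, y):
    """Pearson r for two equally long lists without missing values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlations(columns, use="everything"):
    """Return {(a, b): r} for each pair of columns under the given NA rule."""
    names = list(columns)
    n_rows = len(columns[names[0]])
    # Rows with no NA in any selected column (used by complete.obs).
    complete_rows = [i for i in range(n_rows)
                     if all(columns[c][i] is not NA for c in names)]
    result = {}
    for a, b in combinations(names, 2):
        xa, xb = columns[a], columns[b]
        if use == "everything":
            # NAs propagate: any NA in either column makes the result NA.
            if any(v is NA for v in xa) or any(v is NA for v in xb):
                result[a, b] = NA
                continue
            rows = range(n_rows)
        elif use == "complete.obs":
            rows = complete_rows
        elif use == "pairwise.complete.obs":
            # Keep the rows where both variables of this pair are present.
            rows = [i for i in range(n_rows)
                    if xa[i] is not NA and xb[i] is not NA]
        else:
            raise ValueError(f"unknown NA action: {use}")
        result[a, b] = pearson([xa[i] for i in rows], [xb[i] for i in rows])
    return result


# Hypothetical data with missing values in different rows.
data = {
    "a": [1, 2, 3, 4, 5],
    "b": [1, 2, 3, 4, NA],
    "c": [NA, 1, 2, 3, 10],
}

everything = correlations(data, "everything")
complete = correlations(data, "complete.obs")
pairwise = correlations(data, "pairwise.complete.obs")

for pair in everything:
    print(pair, everything[pair], complete[pair], pairwise[pair])
```

With this data, everything returns NA for every pair because columns b and c each contain a missing value. The other two options disagree on the pair (a, c): pairwise.complete.obs keeps the last row, where only b is missing, whereas complete.obs discards it.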

3. Text Size

  • Axis Text Size: Adjust the font size for the variable names displayed on the plot axes using +/- buttons or by entering a numeric value.
