Understanding table plots
Look into the importance of table plots and how to analyze the plots produced by PANDORA, particularly with a mix of categorical and numerical variables
Why are table plots important?
Table plots provide vital information about the structure of a dataset, which is especially important when working with large datasets in biomedical research. Unlike distribution plots, which provide summary statistics, table plots preserve individual-level or close to individual-level data (depending on the number of objects per bin). Hence, table plots allow the user to:
Observe heterogeneity in variables that can be hidden in averages
Visualize and examine aggregated distribution patterns across multiple variables
Identify variables with missing data and outliers
What do PANDORA's table plots show?
Generally the table plot will consist of the following features:
Sorting variable: The first plot will be the sorting variable that will determine the position of the samples.
Y-axis: The position of a sample in the sorted list (0-100% quantiles). Hence, samples earlier in the sorted order appear at the top of the graph while later samples appear at the bottom.
Table bins information: Shown at the bottom left of the plot. Provides information about:
Number of row bins
Number of objects
Rounded number of objects per bin
Plot title: Shows the variable being plotted, with samples following the order of the sorting variable. Some variables are log-transformed to normalize the distribution for better visualization.
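The features listed above can be reproduced on toy data to see where the bin information comes from. The following is a minimal Python sketch, not PANDORA's actual implementation: the function names, the toy values, and the equal-size binning rule are all assumptions made for illustration.

```python
from statistics import mean


def bin_sorted_rows(rows, sort_key, n_bins):
    """Sort rows by sort_key and split them into n_bins row bins of near-equal size."""
    ordered = sorted(rows, key=lambda r: r[sort_key])
    n = len(ordered)
    # Bin i covers positions [i*n/n_bins, (i+1)*n/n_bins) of the sorted list.
    edges = [round(i * n / n_bins) for i in range(n_bins + 1)]
    return [ordered[edges[i]:edges[i + 1]] for i in range(n_bins)]


def bin_means(bins, value_key):
    """Mean of value_key in each bin, ignoring missing (None) values."""
    means = []
    for b in bins:
        values = [r[value_key] for r in b if r[value_key] is not None]
        means.append(mean(values) if values else None)
    return means


# Hypothetical toy data: 10 samples with a sorting variable and one numerical variable.
rows = [{"Timepoint": t, "S_IgG": v} for t, v in
        [(30, 5.0), (10, 1.0), (20, 3.0), (10, 2.0), (40, 8.0),
         (30, 6.0), (20, 4.0), (50, 9.0), (40, 7.0), (50, 10.0)]]

bins = bin_sorted_rows(rows, "Timepoint", n_bins=5)

# The three values reported in the table bins information.
summary = {
    "row_bins": len(bins),
    "objects": len(rows),
    "objects_per_bin": round(len(rows) / len(bins)),
}

# One mean per bin, listed from the top of the plot to the bottom.
means = bin_means(bins, "S_IgG")
```

With few objects per bin, each bar stays close to individual-level data; with many objects per bin, each bar is closer to a summary of its bin.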

How to obtain information from a table plot?
As an example, let us look at a table plot that uses the number of days after a positive SARS-CoV-2 test at which a sample was taken (Timepoint) as the sorting variable, with the following variables of interest:
Disease severity: the severity of disease symptoms
S-IgG: immunoglobulin G antibodies specific to the SARS-CoV-2 spike protein
Total pos T cell elispot: total activated T cell count
Responder: outcome of immune response durability at 6 months

Here are the key features to analyze in the plot for each type of variable:
Numerical variables
Log transformation: The title of the graph can appear as the variable name or as log(variable name), as seen with log(S.IgG) in the example plot (third graph). Log transformation makes data, especially skewed data, more symmetrical, which allows for better visualization.
Length of bars: Spread/magnitude of the values for the objects within the bin.
Shorter bars correspond to lower values while longer bars correspond to larger values.
Mean of bin: Shown as the black vertical line in the middle of the bar
These help visualize trends such as increasing or decreasing values as you go down the plot, consistency of values, and so on.
Lack of bins: A lack of bins can indicate several possibilities:
The objects consist of discrete values (as in the first graph, Timepoint), so the variable is likely categorical rather than numerical.
There are missing values in the variable, as seen in the fourth graph, log(Total.pos.T.cells.elispot), where the gaps at 20-33%, 60-67% and 86-100% on the y-axis indicate that no T cell ELISpot values are available at certain timepoints.
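The two numerical-variable effects described above, log transformation and gaps caused by missing values, can be illustrated with a short Python sketch. It rests on assumptions: the source does not state which logarithm base PANDORA uses or how it treats non-positive values, so the natural log and the "treat as missing" rule here are illustrative choices, and the function names and data are hypothetical.

```python
import math


def log_transform(values):
    """Natural log of each positive value; missing (None) or non-positive values stay None."""
    return [math.log(v) if v is not None and v > 0 else None for v in values]


def missing_fraction_per_bin(values, n_bins):
    """Fraction of missing (None) values in each of n_bins slices of an already sorted column."""
    n = len(values)
    edges = [round(i * n / n_bins) for i in range(n_bins + 1)]
    fractions = []
    for i in range(n_bins):
        chunk = values[edges[i]:edges[i + 1]]
        fractions.append(sum(v is None for v in chunk) / len(chunk) if chunk else 0.0)
    return fractions


# Hypothetical skewed column, already ordered by the sorting variable.
column = [1.0, 10.0, None, None, 100.0, 1000.0, None, None]

logged = log_transform(column)                       # values become evenly spaced on the log scale
gaps = missing_fraction_per_bin(logged, n_bins=4)    # 1.0 marks a bin with no data at all
```

A bin whose missing fraction is 1.0 has no bar to draw, which is what appears as a gap in the plot.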
Categorical variables
Bar colors: Instead of having values indicated by the x-axis, the bars are color-coded by category.
These bar colors give a general indication of the distribution of data within each category. They can also indicate missing values when present.
For example, in the second graph, Disease.severity, there are clearly more blue bars than orange and yellow ones.
Legend: Provides name of the category corresponding to each color and includes missing values
For example, in the Responder graph, we can see that there are about the same number of high and low responders and a few missing values.
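For categorical variables, the colored segments of a bar can be thought of as the share of each category among the objects in that bin. The sketch below shows this idea in Python under stated assumptions: the function name, the "missing" label, and the sample values are hypothetical and are not taken from PANDORA.

```python
from collections import Counter


def category_shares(bin_values, missing_label="missing"):
    """Share of each category within one bin, counting None as its own 'missing' category."""
    if not bin_values:
        return {}
    counts = Counter(missing_label if v is None else v for v in bin_values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Hypothetical Responder values for the objects in one bin.
responder_bin = ["high", "low", "high", None, "low", "high", "low", None]

shares = category_shares(responder_bin)
```

Here high and low responders each make up 37.5% of the bin and missing values make up 25%, which would show as two equally wide colored segments plus a narrower segment for missing data.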