More on how to compare box plots

Overlapping boxes and medians

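It gets tricky when the boxes overlap and both median lines fall inside the overlap range. As always, math comes to the rescue. Calculate the Distance Between Medians as a percentage of the Overall Visible Spread. The two groups are likely to differ if this percentage is: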

  • Over 33% for a sample size of 30.
  • Over 20% for a sample size of 100.
  • Over 10% for a sample size of 1000.
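The rule above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names are made up for illustration, the Overall Visible Spread is assumed to run from the lowest box edge to the highest box edge of the two plots, and a sample size that falls between the listed ones is assumed to use the threshold of the nearest smaller listed size.

```python
from statistics import median, quantiles

# Thresholds from the list above: (sample size, minimum percentage).
THRESHOLDS = [(1000, 10.0), (100, 20.0), (30, 33.0)]


def dbm_over_ovs(a, b):
    """Distance Between Medians as a percentage of the Overall Visible Spread."""
    q1_a, _, q3_a = quantiles(a, n=4, method="inclusive")
    q1_b, _, q3_b = quantiles(b, n=4, method="inclusive")
    dbm = abs(median(a) - median(b))
    # Assumption: the spread runs from the lowest box edge to the highest box edge.
    ovs = max(q3_a, q3_b) - min(q1_a, q1_b)
    if ovs == 0:
        raise ValueError("both boxes have zero width")
    return 100.0 * dbm / ovs


def likely_different(a, b):
    """Apply the thresholds; return None when the sample is too small."""
    n = min(len(a), len(b))
    # Assumption: use the threshold of the largest listed size not above n.
    for size, pct in THRESHOLDS:
        if n >= size:
            return dbm_over_ovs(a, b) > pct
    return None  # fewer than 30 points: the rule gives no threshold


group_a = list(range(1, 31))   # 30 values, median 15.5
group_b = list(range(6, 36))   # 30 values, median 20.5
print(round(dbm_over_ovs(group_a, group_b), 1))
print(likely_different(group_a, group_b))
```

Here both medians sit inside the overlap of the two boxes, and the ratio comes out at roughly 26%, which is under the 33% bar for a sample size of 30, so the sketch does not flag a difference.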

Box plots are about ranges, not actual counts.

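At first glance, it is easy to think that a longer section of a box plot represents a higher count. That is not the case: each of the four sections holds roughly a quarter of the data points, so a longer section only means those points are more spread out. Take a look at this box plot: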
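A quick numeric check makes the same point. The sketch below uses made-up, right-skewed numbers and a hypothetical helper, `section_summary`. As a simplification, it treats the whiskers as running all the way to the minimum and maximum, ignoring the outlier conventions that plotting tools usually apply.

```python
from statistics import quantiles


def section_summary(data):
    """Return (width, count) for each of the four sections of a box plot."""
    values = sorted(data)
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    edges = [values[0], q1, q2, q3, values[-1]]
    summary = []
    for i in range(4):
        low, high = edges[i], edges[i + 1]
        if i == 0:
            # The first section includes its lower edge so the minimum is counted.
            count = sum(1 for v in values if low <= v <= high)
        else:
            count = sum(1 for v in values if low < v <= high)
        summary.append((high - low, count))
    return summary


# Made-up, right-skewed data for illustration only.
skewed = [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 15, 20, 30, 45, 70, 100]
for width, count in section_summary(skewed):
    print(f"width={width:6.2f}  points={count}")
```

Every section holds four points, yet the widths run from 3.75 for the lower whisker to 77.5 for the upper one.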

Box plots skewed to the right? To the left?

Limitations of box plots

  • No indication of sample size: A box plot does not show how many data points it summarizes. Though you can use box plots on non-parametric data, it is best to have a sample size of at least 20 (some might even say 30). For a smaller sample size, consider using individual value plots.
  • The illusion of bar graphs: Box plots resemble bar graphs in appearance, yet they present completely different information. Bar graphs compare groups by their absolute counts, while box plots show their distributional ranges. Remember: the size of each section in a box plot shows how widely spread a data range is; it says nothing about the quantity of the group.
  • The troubles are in the whiskers: Box plots’ whiskers are mistaken for error bars more often than you’d think, especially when asterisks marking outliers sit on top of them. Whiskers are not error bars. They show the lowest and highest quarters of the values, so together they contain about half of the data points; the other half are in the box.
  • The secret box: Box plots sometimes hide important information. When data “morph” but manage to maintain their ranges and medians, their box plots stay the same.
  • Violin plot is a better alternative: Violin plots present the same information as box plots, and more. They have a built-in density plot, and therefore show “the shape of data” more clearly. All data points are contained inside the violin. And they look nothing like bar graphs.
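The "secret box" point in the list above is easy to reproduce. Here is a minimal sketch with two made-up data sets and a hypothetical helper, `five_number_summary`. One set is evenly spread and the other is clustered at the box edges, yet both yield the same minimum, quartiles, median and maximum, so their box plots would look identical.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the values a box plot is drawn from."""
    values = sorted(data)
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (values[0], q1, q2, q3, values[-1])


# Made-up data for illustration only.
evenly_spread = [0, 1, 2, 3, 4, 5, 6, 7, 8]
clustered = [0, 2, 2, 2, 4, 6, 6, 6, 8]

print(five_number_summary(evenly_spread))
print(five_number_summary(clustered))
print(five_number_summary(evenly_spread) == five_number_summary(clustered))
```

A density-based view such as a violin plot would show the clusters that the five-number summary hides.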

To sum up:

Box-and-whiskers plots are an excellent way to visualize differences among groups. To compare two box plots with overlapping boxes and medians, calculate the Distance Between Medians as a percentage of the Overall Visible Spread.

BioTuring Team

At BioTuring, we dream, we think, we code, and we deliver important algorithms and software — to tackle biomedical challenges.