How does our approach to testing for COVID-19 impact our understanding of what’s really happening?


Suppose we’re looking at a crowd of people.

We want to find out something about them, but we don’t have enough time or resources to ask them all. What do we do? We choose a few people, and only ask this sample of the population.

How does the way we sample a population impact what we find out?


What if our crowd had just been exposed to a virus? Let’s say it’s the virus that causes COVID-19. Some of the people in our population are showing symptoms. Some are infected, but not showing any symptoms yet. Some have even recovered.

This is a new disease, and any predictions we make now are just models based on the limited information we have, which isn’t a lot.

We need a clearer picture of who in our population is infected, so that we can be ready to care for those who will become most ill.

How we conduct testing for the virus will impact how well we can tell what’s going on in our population.

Let’s try an experiment with our simulated population. If we select people randomly, does the sample reflect the rest of the crowd? Sometimes, but usually not.
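One way to try this kind of experiment yourself is a few lines of Python. This is only a rough sketch, not the code behind the charts in this article: the population size, the 10% infected share, the sample size of 20, and the function names are all made-up assumptions for illustration.

```python
import random


def make_population(size, infected_share, seed=0):
    """Build a pretend crowd: True means the person is currently infected.

    The size and infected_share are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)
    return [rng.random() < infected_share for _ in range(size)]


def sample_estimate(population, sample_size, rng):
    """Estimate the infected share from one simple random sample."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size


# A hypothetical crowd of 1,000 people, roughly 10% of them infected.
population = make_population(size=1000, infected_share=0.10)
true_share = sum(population) / len(population)

# Draw ten small random samples of 20 people each and compare the estimates.
rng = random.Random(1)
estimates = [sample_estimate(population, 20, rng) for _ in range(10)]

print(f"true infected share: {true_share:.2f}")
for i, estimate in enumerate(estimates, start=1):
    print(f"sample {i:2d}: {estimate:.2f}")
```

Running it prints the true share next to ten separate estimates. With samples this small, the estimates tend to scatter around the true value rather than land on it, which is the effect the experiment is meant to show.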

Example of a random sampling from our population.

What happens in our model if we take another random sample of our population? Or if we sample only people who are noticeably sick?

Example of testing only the most ill people in our population.

Both strategies, testing purely at random and testing only people with visible symptoms, can lead to an inaccurate understanding of the actual population. With random testing, we might miss infections altogether if we don’t have a large enough sample. And yet, restricting testing to only those who are ill enough to go to a hospital and meet ‘testing criteria’ will continue to leave us with incomplete, likely biased, data.
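The contrast between the two strategies can be sketched in Python as well. Again, this is a toy model under stated assumptions, not the simulation used for the charts: the `Person` class, the 5% infected share, the chance of showing symptoms, and the sample sizes are all hypothetical numbers chosen only to make the bias visible.

```python
import random
from dataclasses import dataclass


@dataclass
class Person:
    infected: bool
    symptomatic: bool


def make_population(size, infected_share, symptomatic_if_infected,
                    symptomatic_if_healthy, seed=0):
    """Build a pretend crowd. Every probability here is an assumption."""
    rng = random.Random(seed)
    people = []
    for _ in range(size):
        infected = rng.random() < infected_share
        p_symptoms = symptomatic_if_infected if infected else symptomatic_if_healthy
        people.append(Person(infected, rng.random() < p_symptoms))
    return people


def positive_share(tested):
    """Share of the tested group that is infected (0.0 if nobody was tested)."""
    return sum(p.infected for p in tested) / len(tested) if tested else 0.0


# Hypothetical crowd: 5% infected, half of the infected show symptoms,
# and 2% of healthy people have similar symptoms from something else.
population = make_population(2000, 0.05, 0.5, 0.02)
rng = random.Random(1)

true_share = positive_share(population)
random_small = positive_share(rng.sample(population, 10))    # tiny random sample
random_large = positive_share(rng.sample(population, 500))   # larger random sample
symptomatic_only = positive_share([p for p in population if p.symptomatic])
missed = sum(p.infected and not p.symptomatic for p in population)

print(f"true infected share:        {true_share:.3f}")
print(f"random sample of 10:        {random_small:.3f}")
print(f"random sample of 500:       {random_large:.3f}")
print(f"symptomatic-only testing:   {symptomatic_only:.3f}")
print(f"infected but never tested:  {missed}")
```

In this toy model, testing only symptomatic people gives a positive share far above the true infected share, while never testing the infected people who show no symptoms. A random sample of 10 can easily find no infections at all, and the larger random sample usually lands closer to the truth.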

In the case of COVID-19, we need both a large enough sample and an understanding of the biases in our sampling strategies.

Originally published at http://kristinhenry.github.io. An interactive version is available at http://kristinhenry.github.io/sampling.html

Technical Note: the charts in this article were created with d3.js and ProPublica’s weepeople font.

Written by

Data Visualization Consultant. Generative and Data Artist. Creative Coder. Founder of GalaxyGoo. http://kristinhenry.github.io/