Test results are positive. What are the odds you have the virus? Surprise, intuition is wrong!

Let’s say there’s a virus pandemic sweeping through the population, and 1% of people have the virus. Let’s say there’s a test for this condition, and the test is 99% reliable, meaning — out of 100 tested cases, the test will be correct in 99 cases, and will be wrong in 1 case. The reliability is the same (99%) for both positive and negative results.

You take the test, and the result comes back positive — the test says you have the virus. And that’s all the information you have. What’s the probability you actually do have the virus? Intuition says 99%, but that’s wrong: the actual probability is only about half that.

Keep in mind, the analysis in the first part of this article only applies to an ideal case (the key here is the phrase “that’s all the information you have”). It does NOT mean testing is useless. It does NOT apply to most real-world testing. I will explain real-world results in the second part of the article.

Bayes’s Theorem

This is pretty much a direct application of Bayes’s Theorem:

P(A|B) = P(A) * P(B|A) / P(B)

P(A) is the probability of event A (probability of having the disease, which is 0.01)

P(B) is the probability of event B (probability of testing positive, whether or not you actually have the disease)

P(A|B) is the probability of observing A when B is true (probability of having the disease when test results are positive)

P(B|A) is the probability of observing B when A is true (probability of testing positive when you actually have the disease, which is 0.99)

Let’s assume the total population is 10000 people, and we’ll calculate some of these partial probabilities:

            Positive   Negative    Total
Sick              99          1      100
Healthy           99       9801     9900
Total            198       9802    10000

The population is 10k, so the sick total is 100, and the healthy total is 9900 (only 1% of the population has the virus).

Of the 100 sick, 99 will test positive and 1 will test negative (the test is only 99% reliable).

Of the 9900 healthy, 99 will test positive and 9801 will test negative (the test is only 99% reliable).

The number of total positive tests (correct and incorrect) will be 99 + 99 = 198 (and we have the answer already, but let’s pretend we don’t just yet).
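The counts above are easy to reproduce in code. Here is a quick Python sketch of the same arithmetic (the article’s own code, further down, is in R; the variable names here are mine):

```python
# Count-based view of a 10,000-person population:
# 1% infection rate, 99% test reliability
total = 10_000
sick = round(total * 0.01)                     # 100 sick
healthy = total - sick                         # 9900 healthy
pos_sick = round(sick * 0.99)                  # 99 true positives
pos_healthy = healthy - round(healthy * 0.99)  # 99 false positives
pos_total = pos_sick + pos_healthy             # 198 positive tests in total
print(pos_sick, pos_healthy, pos_total)        # 99 99 198
```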

What is P(B) again? It’s the total probability of testing positive. From the table, P(B) = 198 / 10000 = 0.0198

So then the formula gives:

P(A|B) = P(A) * P(B|A) / P(B) = 0.01 * 0.99 / 0.0198 = 0.5
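The same number falls out of the formula directly, without counting heads. A Python sketch (variable names are mine):

```python
p_a = 0.01          # P(A): prior probability of being sick
p_b_given_a = 0.99  # P(B|A): probability of testing positive when sick
# P(B): total probability of a positive test = true positives + false positives
p_b = p_a * p_b_given_a + (1 - p_a) * (1 - p_b_given_a)
p_a_given_b = p_a * p_b_given_a / p_b
print(round(p_a_given_b, 4))  # 0.5 (up to floating-point rounding)
```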

If you test positive, the probability of actually having the virus is 50%! That’s very counter-intuitive. You would expect it to be around 99%. What happened?

A key thing here is that the infection rate is only 1%. There are far, FAR more healthy people out there (9900), compared to sick people (100). But the test is equally wrong about both (1% wrong). So the number of healthy people wrongly labeled positive will be very bloated (99), to the point where it becomes equal to the number of sick people correctly labeled positive (99).

But what if the infection rate is not 1%? What if it’s higher than that? Say, 10% of people have the virus, or 50%. Is the probability of being sick, given a positive test, still the same? Let’s do a plot.

[Plot: chance of being sick, given a positive test, vs. infection rate]

50% chance of being sick only holds true at a very low infection rate (1%). As the infection rate goes up, the positive result very quickly becomes more reliable. At an infection rate of 10%, the positive result is over 90% reliable.
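A few points off that curve can be checked with a short Python sketch mirroring the R function at the end of the article (function and variable names here are mine):

```python
def sick_chance(inf_rate, test_rel=0.99):
    """P(sick | positive test), via Bayes: true positives / all positives."""
    true_pos = inf_rate * test_rel               # sick, correctly flagged
    false_pos = (1 - inf_rate) * (1 - test_rel)  # healthy, wrongly flagged
    return true_pos / (true_pos + false_pos)

# 1% infected -> ~50% reliable, 10% -> ~92%, 50% -> 99%
for rate in (0.01, 0.10, 0.50):
    print(f"{rate:.0%} infected -> positive result is {sick_chance(rate):.1%} reliable")
```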

How about real-world testing?

Does that mean testing is useless when infection rates are low? Actually, no.

If you’re testing because you have symptoms, you’re no longer in the general population with its 1% infection rate. You’re in the cohort of people with symptoms, which has a much higher infection rate — and then the positive result is much more reliable.

If you’re testing because one of your close contacts is sick, you’re in the cohort of people with infected close contacts, and again the infection rate there is much higher, and the positive result is much more reliable.

The only time the positive result is not very reliable is when:

  • the overall infection rate is very low (1%), and
  • you’re picking test subjects completely at random (picking completely random people out of the general population, like literally pulling first name / last name out of a hat)

Then yes, the positive result is only 50% reliable (assuming 1% infection rate and 99% test accuracy).

Do not naively extend these results to the real world. It’s very easy to lose track of the confounding factors, which often boost the reliability of the positive result. If you lose track of them, you will assume the positive result is unreliable, and you’ll be wrong.

How about negative test results?

I wrote a follow-up article which deals with negative test results.

Credits and code

The images are free to use, via Unsplash. Credit links:

The R code that generates the plot is my own:

# Chance of being sick given a positive test, as a function of infection rate
sick.chance <- function(inf.rate, test.rel = 0.99, tot.pop = 10000) {
  sick.pop <- tot.pop * inf.rate            # number of infected people
  healthy.pop <- tot.pop - sick.pop         # number of healthy people
  pos.sick <- sick.pop * test.rel           # true positives
  neg.healthy <- healthy.pop * test.rel     # true negatives
  pos.healthy <- healthy.pop - neg.healthy  # false positives
  pos.total <- pos.sick + pos.healthy       # all positive tests
  chance <- pos.sick / pos.total            # P(sick | positive)
  return(chance)
}
infection.rate <- 1:100 / 100
# sapply() returns a numeric vector, which plot() expects (lapply() returns a list)
chances <- sapply(infection.rate, sick.chance)
plot(infection.rate, chances, col = 'blue'); grid()

Thanks for reading!

Graduated Physics. Engineer in the computer industry. Working on my Master’s degree in Data Science.