You’re working with timeseries data, and you’re at the initial discovery stage. You’re trying to figure out all the different ways that the data varies in time. There may be steady trends, where the data keeps increasing or decreasing over time. There may be completely random variations. And there may be cyclic components that rise and fall with fixed periods.
If there is only one periodic component, that may be easy to see in the data, and you could even get a visual estimate of its period. But what if there are multiple periodic components…
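One common way to separate several periodic components is to look at the frequency spectrum. The sketch below is only an illustration under stated assumptions: the series is synthetic (two sine waves with periods of 50 and 8 samples, plus noise), it is evenly sampled, and both periods divide the series length so each one lands on a single frequency bin. With real data, peaks usually spread over neighboring bins and need more careful handling.

```python
import numpy as np

# Synthetic, evenly sampled series (an assumption, not data from the article):
# two sine components with periods of 50 and 8 samples, plus random noise.
rng = np.random.default_rng(0)
n = 400
t = np.arange(n)
y = (2.0 * np.sin(2 * np.pi * t / 50)
     + 1.0 * np.sin(2 * np.pi * t / 8)
     + rng.normal(0, 0.5, n))

# Subtract the mean so the zero-frequency bin does not dominate the spectrum.
spectrum = np.abs(np.fft.rfft(y - y.mean()))
freqs = np.fft.rfftfreq(n, d=1.0)  # frequencies in cycles per sample

# Pick the two strongest bins, skipping bin 0 (frequency zero).
top = np.argsort(spectrum[1:])[-2:] + 1
periods = sorted(1.0 / freqs[top])
print("estimated periods (in samples):", periods)
```

Taking the two largest bins works here only because the number of components is known in advance; in exploratory work you would normally plot the spectrum and inspect the peaks.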
How do neural networks really work? I will show you a complete example, written from scratch in Python, with all the math you need to completely understand the process.
I will explain everything in plain English as well. You could just follow along, read just the text and still get the general idea. But to re-implement everything from scratch on your own, you will have to understand the math and the code.
I’ve seen several articles such as this, but in most cases they were incomplete. …
You have a data sample. From it, you want to calculate a confidence interval for the population mean value. What’s the first thing you think about? It’s usually a t-test.
But the t-test has several requirements, one of which is that the sampling distribution of the mean is nearly normal (either the population is normal, or the sample is reasonably large). In practice, that’s not always true, and so the t-test may not always deliver optimal results.
To work around that kind of limitation, use the bootstrap method. It has only one important requirement: that the sample approximates the population…
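As a rough illustration of the idea, here is a minimal sketch of a percentile bootstrap for the mean. The sample is synthetic (drawn from a skewed exponential distribution), and the number of resamples and the 95% level are assumptions chosen for the example, not values taken from the article.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical skewed sample, used only for illustration.
sample = rng.exponential(scale=2.0, size=200)

n_resamples = 5000  # assumed value; more resamples give a smoother estimate

# Draw resamples with replacement, each the same size as the original sample,
# and compute the mean of every resample.
idx = rng.integers(0, sample.size, size=(n_resamples, sample.size))
boot_means = sample[idx].mean(axis=1)

# Percentile method: the middle 95% of the bootstrap means.
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean: {sample.mean():.3f}")
print(f"95% bootstrap CI: ({low:.3f}, {high:.3f})")
```

The percentile interval is the simplest bootstrap variant; other variants exist and may behave better for small or heavily skewed samples.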
But how was that number calculated? Turns out, the basic value is pretty easy. I’ll show you how to do that, and then I will estimate how confident we are that the value is right.
To test a vaccine, you need to do a randomized blind trial. Gather tens of thousands of people. Divide them into two nearly equal groups. One group will receive the vaccine. The other group (the control) will receive an injection that looks exactly like the vaccine, but doesn’t actually do anything.
The control group shows what happens when there is no vaccine…
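For orientation, the textbook definition of vaccine efficacy compares the infection rate in the vaccine group with the rate in the control group. The sketch below assumes that definition, which may or may not match every detail of the calculation in the full article. The function name and all counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def vaccine_efficacy(cases_vaccine, n_vaccine, cases_control, n_control):
    """Efficacy as 1 minus the ratio of infection rates (vaccine / control)."""
    rate_vaccine = cases_vaccine / n_vaccine
    rate_control = cases_control / n_control
    return 1.0 - rate_vaccine / rate_control

# Hypothetical trial counts (not real data).
efficacy = vaccine_efficacy(
    cases_vaccine=10, n_vaccine=20000,
    cases_control=100, n_control=20000,
)
print(f"efficacy: {efficacy:.1%}")
```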
This is part 2 of this article:
To recap: there’s a pandemic going on, and 1% of people have the virus. There’s a test that can detect the virus, and the test is 99% reliable (for both positive and negative results).
But this time, when you take the test, the result is negative. How much can you trust that result?
If you know nothing else besides the test result, then it’s very reliable: 99.9898%, which is basically 100%.
I will not repeat the analysis; please refer to Part 1. But, again, this is the ideal-case scenario. What happens in reality?
Let’s say there’s a virus pandemic sweeping through the population, and 1% of people have the virus. Let’s say there’s a test for this condition, and the test is 99% reliable, meaning that out of 100 tested cases, the test will be correct in 99 and wrong in 1. The reliability is the same (99%) for both positive and negative results.
You take the test, and the result comes back positive — the test says you have the virus. And that’s all the information you have. What’s the probability you actually do have the virus? …
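The question can be worked out with Bayes' theorem. The sketch below assumes exactly the numbers stated above (1% prevalence, 99% reliability for both positive and negative results) and nothing else; the function names are hypothetical. Applied to a negative result, the same arithmetic reproduces the 99.9898% figure quoted for that case.

```python
def prob_infected_given_positive(prevalence, reliability):
    """P(virus | positive test), assuming equal reliability for both outcomes."""
    true_pos = reliability * prevalence                 # infected, test says yes
    false_pos = (1 - reliability) * (1 - prevalence)    # healthy, test says yes
    return true_pos / (true_pos + false_pos)

def prob_healthy_given_negative(prevalence, reliability):
    """P(no virus | negative test), under the same assumption."""
    true_neg = reliability * (1 - prevalence)           # healthy, test says no
    false_neg = (1 - reliability) * prevalence          # infected, test says no
    return true_neg / (true_neg + false_neg)

p_pos = prob_infected_given_positive(prevalence=0.01, reliability=0.99)
p_neg = prob_healthy_given_negative(prevalence=0.01, reliability=0.99)
print(f"P(virus | positive)    = {p_pos:.4%}")
print(f"P(no virus | negative) = {p_neg:.4%}")
```

The positive result is far less trustworthy than the negative one because, at 1% prevalence, the false positives among the many healthy people are about as numerous as the true positives among the few infected people.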
Sometimes trends need to be removed from timeseries data, in preparation for the next steps or as part of the data cleaning process. If you can identify a trend, then simply subtract it from the data, and the result is detrended data.
If the trend is linear, you can find it via linear regression. But what if the trend is not linear? We’ll see what we can do about that in a few moments.
But first, the simple case…
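A minimal sketch of the linear case, under stated assumptions: the series below is synthetic (a straight-line trend plus a cycle and some noise), and the fit uses `numpy.polyfit` with degree 1 as one convenient way to do linear regression. The article itself may use a different library for the fit.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200)

# Synthetic series (an assumption): linear trend + periodic component + noise.
y = 0.05 * t + 3.0 + np.sin(2 * np.pi * t / 25) + rng.normal(0, 0.2, t.size)

# Fit a straight line by least squares; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(t, y, deg=1)
trend = slope * t + intercept

# Subtract the fitted trend to get the detrended series.
detrended = y - trend
print(f"fitted slope: {slope:.4f}, intercept: {intercept:.4f}")
```

Raising `deg` fits a polynomial instead of a line, which is one possible approach when the trend is visibly curved.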
Here’s timeseries data with a trend:
Let’s load it up and see what it looks like:
import pandas as pd
Math is hard, let’s go shopping — for tutorials, that is. I definitely wish I had read this tutorial before trying some things in Python that involve extremely large numbers (binomial probability for large values of n) and my code started to crash.
But wait, I hear you saying, Python can handle arbitrarily large numbers, limited only by the amount of RAM. Sure, as long as those are all integers. Now try to mix some float values in, for good measure, and the snake starts barfing. Arbitrarily large numbers mixed with arbitrary precision floats are not fun in vanilla Python.
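To make the problem concrete, here is a small sketch of a binomial probability that breaks in plain Python, next to one common workaround (doing the arithmetic in log space). The function names and the values of `n`, `k` and `p` are illustrative assumptions, and the log-space approach is just one option, not necessarily the one the tutorial recommends.

```python
import math

def binom_pmf_naive(n, k, p):
    # Exact (huge) integer coefficient multiplied by float powers.
    # Fails once the integer is too large to be converted to a float.
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binom_pmf_log(n, k, p):
    # Same quantity computed in log space, so no intermediate value
    # has to fit in a float before the final exp().
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_coeff + k * math.log(p) + (n - k) * math.log(1 - p))

n, k, p = 5000, 2500, 0.5  # illustrative values

try:
    naive = binom_pmf_naive(n, k, p)
except OverflowError as err:
    naive = None
    print("naive version failed:", err)

print("log-space version:", binom_pmf_log(n, k, p))
```

The integer coefficient on its own is fine; the failure appears only when Python tries to convert it to a float for the multiplication.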