A Data Science Central Community
Many time series charts seem to exhibit a pattern: an up-trend, apparent periodicity, a stochastic process that seems not to be memory-less, and so on. Look at the following picture, representing stock price simulations. Do you think there is an up-trend? Actually, in the long term, there isn’t: it’s a realization of a pure random walk. At any time, a move up by a given amount is exactly as likely as a move down by the same amount (50% each). Yet over short periods, it is perfectly normal to observe ups and downs. This does not make the time series predictable: you could try to design a predictive model trained on the first 1,000 observations, then test it on the remaining 1,000 observations. You would notice that your model is no better (for predicting up and down movements) than tossing a coin.
Figure 7.4: This is a random walk, indeed!
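The train-and-test experiment described above can be sketched in a few lines. This is only an illustration under assumptions that are not in the article: the steps are uniform deviates on [-1, +1], the "model" is a deliberately naive rule (bet that the next move repeats the last one, or the opposite, whichever did better on the first 1,000 observations), and the function names and the seed are made up for the example.

```python
import random

def simulate_random_walk(n_steps, seed=None):
    """Pure random walk: each step is an independent uniform deviate on [-1, +1]."""
    rng = random.Random(seed)
    walk = [0.0]
    for _ in range(n_steps):
        walk.append(walk[-1] + rng.uniform(-1.0, 1.0))
    return walk

def momentum_hit_rate(walk, first, last):
    """Share of moves ending at t = first..last that go the same way as the move before."""
    hits = 0
    for t in range(first, last + 1):
        previous_up = walk[t - 1] > walk[t - 2]
        current_up = walk[t] > walk[t - 1]
        hits += previous_up == current_up
    return hits / (last - first + 1)

walk = simulate_random_walk(2000, seed=42)  # hypothetical seed, any value works

# "Training": decide on the first 1,000 observations whether to bet on
# momentum (next move repeats the last one) or on reversal.
train_rate = momentum_hit_rate(walk, 2, 1000)
use_momentum = train_rate >= 0.5

# "Testing": apply the chosen rule to the remaining 1,000 observations.
test_rate = momentum_hit_rate(walk, 1001, 2000)
test_accuracy = test_rate if use_momentum else 1.0 - test_rate

print(f"rule chosen on training data: {'momentum' if use_momentum else 'reversal'}")
print(f"accuracy on test data: {test_accuracy:.3f}")
```

Whatever rule looks best on the training half, its accuracy on the test half should hover around 0.5, because successive moves are independent.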
Another exercise to familiarize yourself with what randomness and lack of randomness look like is to simulate auto-regressive time series. One of the most basic processes is X(t) = a * X(t-1) + b * X(t-2) + E + c, where t is time, c is a constant (-0.25 < c < +0.25) and E is an error term (white noise) simulated as a uniform deviate on [-1, +1]. For instance, X(t) represents a stock price at time t. You can start with X(0) = X(1) = 0. If a + b = 1, then the process is stable, in equilibrium (why?). If c < 0, there is a downward trend. If c > 0, there is an upward trend. If c = 0, it does not go up or down (a stationary process, in the lay sense of having no built-in trend). If a = b = 0, the process is memory-free: there is no stochastic pattern embedded into it, it is just pure noise. Try producing some of these charts with various values of a, b and c, and see if you can visually notice the pattern and correctly interpret it. In the above figure, c = 0, b = 0, a = 1. Of course, with some values of a, b and c, the patterns are visually obvious. But if you keep both c and a + b close to zero, it is visually a much more difficult exercise, and you might have to look at a long time frame for your brain to recognize the correct pattern. Try detecting the patterns with a predictive algorithm, and see when and if your brain can beat your predictive model!
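A minimal sketch of that simulation, for readers who want to produce the charts themselves. The recursion, the starting values and the uniform noise follow the description above; the function name, the seed and the particular (a, b, c) settings are illustrative assumptions.

```python
import random

def simulate_ar2(a, b, c, n_steps, seed=None):
    """Simulate X(t) = a*X(t-1) + b*X(t-2) + E + c with X(0) = X(1) = 0.

    E is white noise drawn as a uniform deviate on [-1, +1].
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for t in range(2, n_steps):
        noise = rng.uniform(-1.0, 1.0)
        x.append(a * x[t - 1] + b * x[t - 2] + noise + c)
    return x

# A few settings to compare; the labels are descriptive only.
settings = {
    "pure random walk (a=1, b=0, c=0)": (1.0, 0.0, 0.0),
    "walk with upward drift (a=1, b=0, c=0.2)": (1.0, 0.0, 0.2),
    "walk with downward drift (a=1, b=0, c=-0.2)": (1.0, 0.0, -0.2),
    "pure noise (a=0, b=0, c=0)": (0.0, 0.0, 0.0),
}

for label, (a, b, c) in settings.items():
    series = simulate_ar2(a, b, c, n_steps=2000, seed=1)
    print(f"{label}: last value = {series[-1]:.1f}")
```

Plotting each returned list against its index (with any charting tool) gives the kind of picture discussed here; changing the seed shows how different two realizations of the same process can look.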
Comment
What is missing in all this is *context* - a fatal exclusion in any data analysis effort. Initially we are only told that this is a "picture, representing stock price simulations." Later we are told that the picture is "an extract from a simulation that is about 3 times longer. This is the middle part. The last part shows a big drop. The first part is more neutral, with ups and downs." So the total series is not only right-censored, it is also left-censored - hiding even more pertinent information.
Perhaps most importantly, initially we don't know the *scale* of the x-axis. Only later are we told "if you remove just a few days from this time series (when the unusually long runs occur), this chart would look like going downward." Now knowing that the data are "stock prices" fluctuating over just days, who would claim they are predictable?
Most people who *assume* stock prices will continue to increase are basing that on their *hope* - not any model they may have created. :-)
[And speaking generally of models, before jumping into traditional ARIMA types, one might want to think parsimoniously and consider linear trends and level shifts...pulses and seasonal pulses...]
To add to that image, I have added a triangle chart pattern.
http://www.flickr.com/photos/[email protected]/11315285494/ .
A few years ago I wrote a program to add a number of common chart patterns to price series. What amazed me was how many patterns could be overlaid on a series, all conforming to strict rules of construction. Not only do our eyes play tricks, but we can easily write software to enhance those pattern-finding illusions.
Another interesting fact: a random walk spends much more time on the same side of the X-axis than it spends switching between positive and negative. See also the arc-sine law, another example of a counter-intuitive result associated with random walks.
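A quick simulation makes the arc-sine effect visible. This is a sketch under assumptions of my own choosing (uniform steps on [-1, +1], 1,000 walks of 500 steps, arbitrary seed and cut-offs): for each walk it records the fraction of time spent above zero, then compares how often that fraction is very lopsided with how often it is close to one half.

```python
import random

def fraction_of_time_positive(n_steps, rng):
    """Fraction of steps at which a zero-started random walk is above zero."""
    position = 0.0
    positive_steps = 0
    for _ in range(n_steps):
        position += rng.uniform(-1.0, 1.0)
        positive_steps += position > 0
    return positive_steps / n_steps

rng = random.Random(7)  # hypothetical seed
fractions = [fraction_of_time_positive(500, rng) for _ in range(1000)]

# Walks that sit on one side for more than 90% of the time...
lopsided = sum(f < 0.1 or f > 0.9 for f in fractions) / len(fractions)
# ...versus walks that split their time roughly evenly.
balanced = sum(0.45 <= f <= 0.55 for f in fractions) / len(fractions)

print(f"share of walks on one side more than 90% of the time: {lopsided:.2f}")
print(f"share of walks with a 45%-55% split: {balanced:.2f}")
```

Intuition suggests the even split should be the common case; the simulation shows the lopsided walks are the more frequent ones.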
@Wayne: You can call it a non-sequitur, but that's what caused millions of investors to lose trillions of dollars: Thinking the process had a built-in growth mechanism that beats inflation, based on past data (because of just a few rare big spikes - in short, because of extreme events). Also real data is right-censored: the future is hidden, as in my chart, which makes my chart a realistic example.
And, I meant X(t) = a * X(t-1) + b * X(t-2) + E + c. If you replace c by ct, there's quadratic growth built in (on top of the first three terms), not linear growth. The thing that I'm a bit unsure about is what conditions to put on a and b to make the process stable: a + b = 1? Don't remember exactly. Maybe I should do some simulations to figure that out.
Also, I used the word stationary (as well as stable) in layman's terms, rather than in the statistical (stochastic process) sense, as this article is also aimed at people with no statistical background.
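On the open question about a and b: the standard textbook criterion (not something taken from this article) is that the process is stationary in the statistical sense when both roots of z^2 - a*z - b = 0 have modulus below 1. A small sketch of that check follows; the function names and the tolerance are illustrative assumptions.

```python
import cmath

def ar2_root_moduli(a, b):
    """Moduli of the roots of z^2 - a*z - b = 0, largest first."""
    disc = cmath.sqrt(a * a + 4 * b)
    roots = ((a + disc) / 2, (a - disc) / 2)
    return sorted((abs(r) for r in roots), reverse=True)

def classify_ar2(a, b, tol=1e-9):
    """Label the process by the size of its largest characteristic root."""
    largest = ar2_root_moduli(a, b)[0]
    if largest < 1 - tol:
        return "stationary"
    if largest <= 1 + tol:
        return "unit root (random-walk-like)"
    return "explosive"

for a, b in [(1.0, 0.0), (0.5, 0.5), (0.5, 0.3), (0.8, 0.5), (0.0, 0.0)]:
    print(f"a={a}, b={b}: {classify_ar2(a, b)}")
```

Under this criterion, a + b = 1 always places one root exactly at 1. That is the boundary case containing the pure random walk (a = 1, b = 0): the process has no pull back toward a fixed level, and with c = 0 it has no built-in trend either.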
{...along the lines of Peter Lane's comment below...}
Vincent, you've tried to "trick" your readers with a type of non sequitur: You ask if the reader thinks there's an up-trend...and yes indeed the data you present do trend up. Then you (albeit indirectly) ask if the data are predictable - a quite different question.
One might call this a case of "mis-direction."
That's an extract from a simulation that is about 3 times longer. This is the middle part. The last part shows a big drop. The first part is more neutral, with ups and downs. The point here is to illustrate that even a long up-trend can occur in processes that are totally neutral. It can lead someone to erroneously think that this stuff will always go up, which has caused financial ruin for many people.
In this picture, the entire growth is actually caused by just two steep climbs, a run-of-the-mill trick typically occurring in such processes. You need more than two steep climbs to determine if this is going up or down: two is not enough for statistical significance. In short, if you remove just a few days from this time series (when the unusually long runs occur), this chart would look like going downward.
I don't believe it! I think the so-called "random walk" in the diagram is not random at all, but has been selected to be extreme (in the sense of having a large apparent trend). Just generating a set of data in a particular way is not enough to give it a name like "random walk" if a selection process has been imposed to decide which example of the generation to show.
Assuming that there are 2,000 points in the diagram (as implied by the first paragraph), the variance of the last point in the series is easy to compute (667), and its distribution is effectively Normal by the central limit theorem. Compared to the starting point, the upper 2.5% point of the distribution is 51, and the upper 0.05% (one in a thousand as large, two-sided) is 85. But the difference between first and last in the graph is about 245-155 = 90.
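The arithmetic in this comment can be reproduced with the standard library. The sketch assumes, as the figures quoted suggest, 2,000 independent steps that are uniform on [-1, +1] (variance 1/3 each) and a Normal approximation for the end point; the variable names are illustrative.

```python
from statistics import NormalDist

n_steps = 2000
step_variance = (1 - (-1)) ** 2 / 12      # variance of a uniform deviate on [-1, +1]
total_variance = n_steps * step_variance  # variance of the last point of the walk

# Normal approximation for the end point, relative to the starting point.
end_point = NormalDist(mu=0.0, sigma=total_variance ** 0.5)

upper_2_5_pct = end_point.inv_cdf(0.975)    # upper 2.5% point
upper_0_05_pct = end_point.inv_cdf(0.9995)  # upper 0.05% point
tail_beyond_90 = 1 - end_point.cdf(90)      # chance of a rise of 90 or more

print(f"variance of last point: {total_variance:.0f}")
print(f"upper 2.5% point: {upper_2_5_pct:.0f}")
print(f"upper 0.05% point: {upper_0_05_pct:.0f}")
print(f"P(rise >= 90): {tail_beyond_90:.6f}")
```

The first three printed values round to the 667, 51 and 85 quoted in the comment.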
The point is that it is not possible to distinguish between patterns from different generating mechanisms if you don't know what the possible class of mechanisms is (including any selection process). So the illustration in this article is one that could quite reasonably have been generated by a process with a consistent upwards trend.