A Data Science Central Community

This is a mathematical challenge, though it is related to statistical parameter estimation in the context of time series and auto-regressive processes, such as ARMA. No prior advanced calculus knowledge is necessary: smart high school kids can find the solution, though it's not trivial!

Let's say that we have the model X(t) = **a** X(t-1) + **b** X(t-2) + e, where e is a white, independent noise (random variable) with zero mean, and t is the time. In short, a basic auto-regressive process or time series. More complex models are considered below.

The questions are as follows:

- What constraints should we put on **a** and **b** to guarantee that the model is sound?
- What statistical inference techniques offer solutions satisfying the above conditions?

Example: Let's assume that X(0) = 1, X(1) = 1, and for the sake of simplicity, let's assume that e = 0. Clearly if **a**=0.5 and **b**=0.5, then X(t) is constant, always equal to 1 no matter the value of t. If **a**=1 and **b**=1, then X(t) is the Fibonacci sequence, which grows exponentially and without bound as t grows.
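
The two cases in this example can be checked with a minimal sketch of the noise-free recursion. The function name `simulate_ar2` and its `steps` parameter are illustrative choices, not part of the original challenge.

```python
def simulate_ar2(a, b, x0=1.0, x1=1.0, steps=10):
    """Iterate X(t) = a*X(t-1) + b*X(t-2) with the noise term e set to 0.

    Returns the list [X(0), X(1), ..., X(steps-1)].
    """
    xs = [x0, x1]
    for _ in range(2, steps):
        xs.append(a * xs[-1] + b * xs[-2])
    return xs


# a = b = 0.5 with X(0) = X(1) = 1: every term stays equal to 1.
constant = simulate_ar2(0.5, 0.5)

# a = b = 1 with X(0) = X(1) = 1: the Fibonacci sequence.
fibonacci = simulate_ar2(1.0, 1.0)

print(constant)
print(fibonacci)
```

Changing `x0` and `x1` while keeping **a** and **b** fixed is a quick way to see how much the initial conditions matter.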

We have the following potential cases for X(t), depending on **a** and **b**:

- Polynomial growth (including linear or constant)
- Exponential growth (with or without wild oscillations)
- Converging to 0
- Stable and non-periodic
- Stable and periodic

Question: what are the parameter sets driving stability?

The model X(t) = **a** X(t-1) + **b** X(t-2) + e has the following characteristic equation:

x^2 - a*x - b = 0.

The solutions to this equation, together with the initial conditions X(0) and X(1), entirely determine whether X(t) is stable or not. Let's denote as r and s the two (possibly complex) solutions of this characteristic equation:

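- If r = s and |r| = 1 (for example **a** = 2, **b** = -1, which gives r = s = 1), we get linear or no growth for X(t).
- If |r| < 1 and |s| < 1, then X(t) converges to 0 as t grows (in the noise-free case e = 0).
- If |r| > 1 or |s| > 1, we might experience exponential growth.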
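
As a rough illustration of this root-based criterion, the sketch below computes r and s numerically and labels a parameter pair by the larger modulus. The helper names, the three labels, and the tolerance `tol` are assumptions made for the example; the boundary case (a root on the unit circle) is deliberately left unresolved, since it covers the constant, periodic and linear-growth behaviors.

```python
import numpy as np


def characteristic_roots(a, b):
    """Roots r, s of x^2 - a*x - b = 0 (they may be complex)."""
    return np.roots([1.0, -a, -b])


def classify(a, b, tol=1e-6):
    """Label (a, b) by the largest root modulus.

    tol is an arbitrary numerical tolerance for deciding whether a
    root lies on the unit circle.
    """
    rho = max(abs(root) for root in characteristic_roots(a, b))
    if rho < 1 - tol:
        return "converges to 0"
    if rho > 1 + tol:
        return "exponential growth"
    return "boundary case"


print(classify(0.5, 0.3))   # both roots inside the unit circle
print(classify(0.5, 0.5))   # roots 1 and -0.5: the constant example
print(classify(1.0, 1.0))   # largest root is the golden ratio
```

Note that the largest modulus alone does not say which boundary behavior occurs; that also depends on whether the roots are repeated or complex, and on X(0) and X(1).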

**Challenge**

- Formalize conditions to be satisfied by **a** and **b**, to guarantee long-term stability.
- Identify statistical techniques (regression, Box-Jenkins) producing estimates that meet the previous conditions. Show that most traditional statistical (econometrics) inference techniques actually fail to meet the condition, and are thus only good for very short-term predictions.
- Generalize to X(t) = **a** X(t-1) + **b** X(t-2) + **c** X(t-3) + noise.
- Generalize to spatial processes, for instance an image with pixel interactions with neighbor pixels: X(t, u) = **a** X(t-1, u) + **b** X(t+1, u) + **c** X(t, u-1) + **d** X(t, u+1) + noise.
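
One possible starting point for the third-order generalization, offered as an assumption rather than as the intended solution, is to apply the same idea to the cubic x^3 - a*x^2 - b*x - c = 0 and look at the largest root modulus. The function name `max_root_modulus` is hypothetical, and the sketch does not address the spatial case.

```python
import numpy as np


def max_root_modulus(coeffs):
    """Largest modulus among the roots of the characteristic polynomial.

    coeffs = [a, b, c, ...] are the weights on X(t-1), X(t-2), X(t-3), ...
    The polynomial is x^p - a*x^(p-1) - b*x^(p-2) - ..., with p = len(coeffs).
    """
    poly = [1.0] + [-float(w) for w in coeffs]
    return max(abs(root) for root in np.roots(poly))


# Small weights: all three roots fall inside the unit circle.
print(max_root_modulus([0.5, 0.3, 0.1]))

# a = b = c = 1 is the "tribonacci" recursion, which grows exponentially.
print(max_root_modulus([1.0, 1.0, 1.0]))
```

The same function accepts two coefficients, so it can be compared directly against the second-order case.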

Perform Monte Carlo simulations with various values of **a**, **b**, X(0) and X(1) to simulate these auto-regressive time series (this can be done in Excel, R, Perl, Matlab or Python), to confirm your findings.
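
A minimal Python sketch of such a simulation follows. Gaussian noise, the run counts, the seed, and the choice of the median of |X(t)| at the final step as a summary statistic are all assumptions made for illustration; other noise distributions and summaries are equally valid.

```python
import numpy as np


def simulate_noisy_ar2(a, b, x0, x1, steps, sigma, rng):
    """One path of X(t) = a*X(t-1) + b*X(t-2) + e, with e ~ Normal(0, sigma^2)."""
    x = np.empty(steps)
    x[0], x[1] = x0, x1
    noise = rng.normal(0.0, sigma, size=steps)
    for t in range(2, steps):
        x[t] = a * x[t - 1] + b * x[t - 2] + noise[t]
    return x


def median_final_magnitude(a, b, runs=200, steps=200, sigma=1.0, seed=0):
    """Median of |X(steps-1)| over independent runs, all started at X(0) = X(1) = 1."""
    rng = np.random.default_rng(seed)
    finals = [
        abs(simulate_noisy_ar2(a, b, 1.0, 1.0, steps, sigma, rng)[-1])
        for _ in range(runs)
    ]
    return float(np.median(finals))


stable = median_final_magnitude(0.5, 0.3)     # roots inside the unit circle
explosive = median_final_magnitude(1.0, 1.0)  # one root larger than 1

print(stable, explosive)
```

Sweeping **a** and **b** over a grid and recording this statistic gives an empirical picture of the stable region to compare with the conditions derived analytically.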
