I am building a churn prediction model and plan to use a discrete-time survival model fitted to a person-period training dataset (one row for each customer and each discrete period they were at risk, with an event indicator equal to 1 if the churn happened in that period, else 0).
- I am fitting the model with ordinary logistic regression, using the technique from Singer and Willett.
- The churn of a customer can happen anywhere during a month, but it is only at the end of the month that we know about it (i.e. sometime during that month they left). 24 months is being used for training.
- Time is measured from the origin of the sample: all customers active as of 12/31/2008. One covariate is the tenure of the customer at that point in time.
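To make the data layout concrete, here is a minimal sketch of how one customer could be expanded into person-period rows. The function name `person_period_rows`, the field names, and the two example customers are illustrative assumptions, not my actual data.

```python
def person_period_rows(customer_id, tenure_at_origin, months_observed, churned):
    """Expand one customer into one row per month at risk.

    months_observed: number of months the customer was followed (1 to 24).
    churned: True if the customer churned in the last observed month,
             False if still active (censored) at the end of the window.
    All names here are hypothetical, chosen for illustration.
    """
    rows = []
    for period in range(1, months_observed + 1):
        is_last = period == months_observed
        rows.append({
            "customer_id": customer_id,
            "period": period,                      # months since the 12/31/2008 origin
            "tenure_at_origin": tenure_at_origin,  # time-invariant covariate
            "event": 1 if (churned and is_last) else 0,
        })
    return rows


# Customer A churns in month 3; customer B survives all 24 months.
rows = person_period_rows("A", tenure_at_origin=14, months_observed=3, churned=True)
rows += person_period_rows("B", tenure_at_origin=40, months_observed=24, churned=False)
```

A logistic regression of `event` on the period indicators and covariates, fitted to rows like these, would then estimate the discrete-time hazard.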
There are a series of covariates that were constructed – some that do not change across the rows of the dataset (for a given customer) and some that do.
These time-variant covariates are the issue, and they are what is causing me to question a survival model for churn prediction (compared with a regular classifier that predicts churn in the next x months based on current snapshot data). The time-variant ones describe activity in the prior month and are expected to be important triggers.
My current plan for implementing this model is to score the customer base at the end of each month, calculating the probability of churn sometime during the next month, then sometime within the next 3 months, then sometime within the next 6 months. For the 3- and 6-month churn probabilities, I would use the estimated survival curve.
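For reference, the multi-month probabilities would come from the per-period hazards: survival through month k is the product of (1 - hazard) over months 1 to k, and the churn probability within k months is one minus that. A minimal sketch, where the function name and the hazard values are made up for illustration:

```python
def cumulative_churn_probability(hazards):
    """Turn per-period hazards h_1..h_k into P(churn within 1..k periods).

    Uses S(k) = prod_{t<=k} (1 - h_t) and returns 1 - S(k) for each k.
    """
    survival = 1.0
    cumulative = []
    for h in hazards:
        survival *= (1.0 - h)          # probability of surviving through this period
        cumulative.append(1.0 - survival)
    return cumulative


# Hypothetical predicted hazards for the next six months.
example_hazards = [0.10, 0.10, 0.10, 0.05, 0.05, 0.05]
churn_within = cumulative_churn_probability(example_hazards)
churn_1m, churn_3m, churn_6m = churn_within[0], churn_within[2], churn_within[5]
```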
When it comes to scoring, how can I incorporate time-varying predictors? It seems like I can only score with time-invariant predictors, or, to include those that are time-variant, I have to make them time-invariant by setting them to their value “right now”.
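To illustrate the "freeze at the current value" option, here is a rough sketch of scoring future periods from a fitted logistic hazard model while holding every covariate at its value on the scoring date. The coefficients, the covariate name `calls_last_month`, and the per-period intercepts are all hypothetical placeholders, not fitted values.

```python
import math


def hazard(period_intercepts, coefs, period, covariates):
    """Discrete-time hazard from a logistic model: inverse logit of the linear predictor."""
    logit = period_intercepts[period] + sum(
        coefs[name] * value for name, value in covariates.items()
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical fitted parameters (assumptions for illustration only).
period_intercepts = {1: -3.0, 2: -3.1, 3: -3.2}
coefs = {"tenure_at_origin": -0.01, "calls_last_month": 0.2}

# Snapshot taken on the scoring date; the time-variant covariate
# calls_last_month is frozen at this value for every future period.
current = {"tenure_at_origin": 14, "calls_last_month": 3}

hazards = [hazard(period_intercepts, coefs, t, current) for t in (1, 2, 3)]
```

The limitation is visible here: the hazards for months 2 and 3 differ from month 1 only through the period intercepts, because nothing is known yet about how `calls_last_month` will change.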
Does anyone have experience or thoughts on this use of a survival model?