
Hi,

I have been working on a churn prediction model (in the telco market) for the last 2-3 months. Using logistic regression (NN and DT were also tried, but logistic regression gave the best results), I built a model with very high predictive accuracy. All seemed to go well until I observed that most of the predicted churners had churned even before the prediction.

The structure of my model is as follows:

1) churn is defined as inactivity of 20 consecutive days since this is not a subscription account
2) high revenue customers (>=75USD revenue per month)
3) 3 prior months of historical usage data for each customer (usage aggregates for each of the 3 months)


The model is trained on a sample of 100k subs with 50k churners and 50k non-churners. The total high-revenue base is 2.7 million. The lift of the model is very good, but the inaccuracy is very high because of 'in-actionable' churners. Nothing seems to predict these people 'before' they churn.

Please advise if anyone has faced a similar problem in churn. Thanks


BR/Talha


Replies to This Discussion

Very high accuracy is always suspicious :-)

How did you observe that most of the churners had churned before the prediction?
... why do you build a model on already churned customers?
Remove them from the training set and build a new model.

Btw. what's the highest absolute coefficient in your regression formula, and which attribute does it belong to?
Hi Jozo,

Thanks for your reply.

In the top bin [0.9-1.0] I got an accuracy of 65%.

I did not build my model on already churned customers, BUT when I scored my model on the entire base of 2.7 million, I predicted 97,000 churners. Of these, 63,000 actually churned. Of these 63,000 churners, 39,000 had become dormant just a few days before I scored the model. This meant that I lost a big chunk of actionable churners.
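The counts in this post can be turned into a quick back-of-the-envelope check. The sketch below is only arithmetic on the figures quoted above (97,000 predicted, 63,000 actual churners, 39,000 already dormant); the function name and the term "actionable precision" are introduced here for illustration and do not come from the thread.

```python
def precision_breakdown(predicted, churned, already_dormant):
    """Return (raw precision, actionable precision) for a scored list.

    'Actionable' is an illustrative label for churners who were still
    active at scoring time and could therefore have been contacted.
    """
    actionable = churned - already_dormant
    return churned / predicted, actionable / predicted


raw, actionable = precision_breakdown(97_000, 63_000, 39_000)
print(f"raw precision:        {raw:.1%}")
print(f"actionable precision: {actionable:.1%}")
```

The raw figure comes out at roughly 65%, in line with the top-bin accuracy mentioned above, while only about a quarter of the predicted list is both correct and still reachable.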

I have tried shifting the scoring dates, but that doesn't make any impact. My feeling is that the model captures a user's 'decreasing trend' only once he has already decreased his revenue (I feel this way because the 2 most important predictors are both days-since-last-activity variables).

Since the user is high revenue, he consistently generates large amounts of revenue, but just before churning he suddenly decreases it and leaves, giving us no response time. Now I need to come up with a method of capturing some aspect of the behaviour that indicates in 'advance' that the customer is going to churn. Or maybe there is another solution to it?

The highest absolute coefficients and their attributes are as follows:
Days Since Last OutBound Activity coefficient=0.8091825
Days Since Last InBound Activity coefficient=0.27037


Thanks Again :)

BR/Talha
My guess is, the definition of churners should be modified.

I have a few questions/suggestions:
1. Inactivity of 20 consecutive days: If customer is inactive between 61-80 days, he's a churner. Are you capturing the transaction info from the 1st day till the 60th day? If you do this, your model should predict the probability of churn in the next 20 days. So any action has to be done from day 1 onwards.

2. What about breaking up that 20 day period into something like - inactivity for 7 consecutive days = Low, for 14 days = medium, and for 20 days = high, and then build a decision tree? This way, the contact/retention strategy could be customized for each group.

Regards,
Datalligence
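Suggestion 2 above can be sketched as a small labelling step. This is a minimal illustration, assuming each subscriber's activity is available as a set of day numbers; the function names, the "none" tier for streaks shorter than 7 days, and the example window are assumptions made here, not part of the original suggestion.

```python
def longest_inactive_run(active_days, window_start, window_end):
    """Longest streak of consecutive inactive days in [window_start, window_end]."""
    longest = current = 0
    for day in range(window_start, window_end + 1):
        current = 0 if day in active_days else current + 1
        longest = max(longest, current)
    return longest


def risk_tier(inactive_run):
    """Map a streak length to the 7 / 14 / 20 day tiers suggested above."""
    if inactive_run >= 20:
        return "high"
    if inactive_run >= 14:
        return "medium"
    if inactive_run >= 7:
        return "low"
    return "none"  # assumed label for streaks under 7 days


# Hypothetical subscriber: active on days 61-63 and again on day 80.
run = longest_inactive_run({61, 62, 63, 80}, 61, 80)
print(run, risk_tier(run))
```

The resulting tier could then serve as the target of a decision tree, or simply as a segment for choosing a contact strategy.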
Score only active customers in the latest data. There's no reason for scoring already churned customers.

And if you still have the problem that you lose actionable customers, create another target variable: churned in some period AFTER 15 days from today...

Btw. the coefficients seem to be fine, and there shouldn't be a trivial prediction in your model.
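Both ideas in this reply can be sketched in a few lines. This is one possible reading, not a confirmed implementation: the `max_gap_days` cutoff for "still active", the 20-day target window, and both function names are assumptions introduced for illustration.

```python
def scoring_base(last_active_day, scoring_day, max_gap_days=3):
    """Keep only customers whose last activity is recent enough to count as active.

    last_active_day: dict of customer id -> day number of last activity.
    max_gap_days is a hypothetical cutoff, not a value from the thread.
    """
    return {cid for cid, last in last_active_day.items()
            if scoring_day - last <= max_gap_days}


def delayed_target(active_days, scoring_day, delay=15, window=20):
    """1 if there is no activity in the `window` days starting `delay` days after scoring."""
    start = scoring_day + delay + 1
    return int(not any(start <= d < start + window for d in active_days))


last_active = {"a": 60, "b": 55, "c": 58}
print(sorted(scoring_base(last_active, scoring_day=60)))
print(delayed_target({61, 70}, scoring_day=60))
```

Shifting the target window forward trades some predictive power for lead time, since the model then has to anticipate dormancy that has not yet started.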
@ DataLLigence
My guess is, the definition of churners should be modified.
Well what we are trying to predict is 'soft' churn i.e. our analysis showed that 20 days of inactivity eventually leads to hard churn. So we actually are trying to model the 20 day dormant behavior and predict subscribers with an inactivity of 20 days. Could you suggest any alternative churn definitions from your experiences?

I have a few questions/suggestions:
1. Inactivity of 20 consecutive days: If customer is inactive between 61-80 days, he's a churner. Are you capturing the transaction info from the 1st day till the 60th day?


Yes, we are capturing the transaction info from the 1st till the 60th day. Then the 61st to 70th day is a period we call the marketing gap, and the dormancy is between the 71st and 90th day. During training we take as churners those subscribers who were active in the marketing gap and then inactive in the churn period. Non-churners are those who are active in the marketing gap as well as the churn period. So in essence only subscribers active in the marketing gap are taken in for training the model.
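The window layout described above can be written down as a labelling rule. This is a minimal sketch, assuming each subscriber's activity is available as a set of day numbers; the constant and function names are hypothetical, and returning `None` for subscribers excluded from training is a choice made here.

```python
OBSERVATION = range(1, 61)     # days 1-60: usage history used for features
MARKETING_GAP = range(61, 71)  # days 61-70: time reserved for a retention offer
CHURN_WINDOW = range(71, 91)   # days 71-90: dormancy here defines churn


def training_label(active_days):
    """1 = churner, 0 = non-churner, None = excluded from training."""
    in_gap = any(day in active_days for day in MARKETING_GAP)
    in_churn_window = any(day in active_days for day in CHURN_WINDOW)
    if not in_gap:
        return None  # not active in the marketing gap, so not in the training set
    return 0 if in_churn_window else 1


print(training_label({5, 65}))      # active in gap, silent afterwards
print(training_label({5, 65, 80}))  # active in gap and in churn window
print(training_label({5, 58}))      # went silent before the gap
```

The third case is the one this thread keeps returning to: a subscriber whose last activity falls on day 58 never enters training, yet still sits in the base that gets scored.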

If you do this, your model should predict the probability of churn in the next 20 days. So any action has to be done from day 1 onwards.

Under our definition, we expect our model to predict churn between the 71st and 90th day AND the subscriber is expected to be active in the marketing gap so that a retention offer can be made to him. Unfortunately, our problem is that a subscriber who churns in the 20 day churn period is NOT active in the marketing gap, i.e. he has churned before the 10 day marketing gap. Only very very few subscribers who churn are active in the Marketing. I hope I make sense :)

2. What about breaking up that 20 day period into something like - inactivity for 7 consecutive days = Low, for 14 days = medium, and for 20 days = high, and then build a decision tree? This way, the contact/retention strategy could be customized for each group.

Yes, this can be experimented with, BUT in light of the current problems, do you see any improvement from the split churn period that you suggested?


@ Jozo
Score only active customers in the latest data. There's no reason for scoring already churned customers.
Well, since our definition of churn is inactivity of 20 days, many of the people start their inactivity 4-5 days before we score them. And then they continue their inactivity without us ever being able to contact them :( Hence we do not know who has churned before the time of scoring, so we cannot exclude them.
Hi Talha,

So, your churners are customers who are active in the 10 days marketing period, and become inactive in the next 20 days. And according to you,

"Only very very few subscribers who churn are active in the Marketing."

That means the churners have already left before your churn window definition of inactivity for 20 days. That also means, you don't have transaction data for these churners for the 10 days. Therein lies the problem :-) Why don't you try defining churn as customers who are inactive for 30 days? Any churn model should predict when customers are about to leave. And from your explanation, my understanding is that most of your customers have already decided to leave about 10 days before you define them as churners. Try changing the definition.

Regards,
Datalligence
Hi Talha,

So, your churners are customers who are active in the 10 days marketing period, and become inactive in the next 20 days.

Yes that is correct :)
And according to you,

"Only very very few subscribers who churn are active in the Marketing."

That means the churners have already left before your churn window definition of inactivity for 20 days. That also means, you don't have transaction data for these churners for the 10 days.

Yes, we do not have the transaction data available for these 10 days, neither in the training nor, of course, in the scoring data sets.

Therein lies the problem :-) Why don't you try defining churn as customers who are inactive for 30 days? Any churn model should predict when customers are about to leave. And from your explanation, my understanding is that most of your customers have already decided to leave about 10 days before you define them as churners. Try changing the definition.

So do you mean that I should remove the marketing gap and include it in the churn period, meaning the churn period is from the 60th to the 90th day? OR should I leave the marketing gap as is and define my churn window from the 70th to the 100th day? I did do the former a few days back, BUT the problem remains the same: a lot of mis-hits (around 60%), and among the 40% correctly predicted churners, (34%) were already dormant :(
You need real-time scoring that predicts: will this customer visit us again in the next 20 days?

Or just send an email to (all) inactive customers after 15 days of inactivity.

Simple solution may be the most effective.
Are you not predicting two things here, who and also when? Since the dependent variable is in part time to event (churn), could you not run survival analysis on this?

Time-to-event modelling is very common in telecoms churn.

If you have open source R available, here is a starting point (it easily accommodates a sample of 100,000 via MLE estimation):

http://gking.harvard.edu/zelig/docs/index.html

"Models for Continuous Bounded Dependent Variables" is what you need.

hth paul d
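For readers who want to see what a time-to-event view looks like before reaching for a package, here is a minimal Kaplan-Meier sketch. It is written in plain Python rather than R, it is not Zelig's API, and the input layout (one duration and one observed flag per subscriber) is an assumption made for illustration.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time t where at least one event occurred.

    durations: days until dormancy began, or until observation ended
    observed:  True if dormancy was seen, False if the subscriber was censored
    """
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    exits = Counter(durations)  # everyone leaves the risk set at their own duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        if events[t]:
            survival *= 1 - events[t] / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: five subscribers, two of them censored.
curve = kaplan_meier([5, 10, 10, 20, 30], [True, True, False, True, False])
for day, s in curve:
    print(day, round(s, 3))
```

Unlike the 0/1 churn label, this keeps subscribers who have not yet gone dormant in the analysis as censored observations instead of treating them as permanent non-churners.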
@Jozo
You need a real-time scoring predicting - will this customers visit us again in the next 20 days?
Well maybe :)
Or just send an email to (all) inactive customers after 15 days of inactivity.
We cannot do that, because most of the people who are 15 days dormant never return. After 15 days it is very likely that the prepaid customer has thrown away his SIM card and switched to another user... so we need to predict this 15 day inactivity in advance :)
Simple solution may be the most effective.
I agree but so far no simple solution could be found

@Paul
Thanks Paul for your feedback and for referring me to this website :) I'll look into it, BUT actually I did use survival analysis. Unfortunately, initial results from the rough-cut model were not very encouraging :( How about multinomial regression? Maybe I can make 4 categories of the target variable: 1) dormant in marketing gap, dormant in churn period 2) active in marketing gap, active in churn period 3) active in marketing gap, dormant in churn period 4) dormant in marketing gap, active in churn period... What do you think about that? Thanks :)
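The four-category target proposed above is a straightforward mapping from two activity flags. The sketch below only encodes that mapping, using the category numbers from the post; the function and dictionary names are hypothetical, and fitting the multinomial model itself is left out.

```python
# Keys are (active in marketing gap, active in churn period);
# values are the category numbers listed in the post.
OUTCOME_CLASSES = {
    (False, False): 1,  # dormant in marketing gap, dormant in churn period
    (True, True): 2,    # active in marketing gap, active in churn period
    (True, False): 3,   # active in marketing gap, dormant in churn period
    (False, True): 4,   # dormant in marketing gap, active in churn period
}


def outcome_class(active_in_gap, active_in_churn_period):
    """Return the category (1-4) for one subscriber."""
    return OUTCOME_CLASSES[(bool(active_in_gap), bool(active_in_churn_period))]


print(outcome_class(True, False))   # the 'actionable' churner
print(outcome_class(False, False))  # already dormant before the gap
```

Category 1 is the group that the two-class setup drops from training, so this encoding at least lets the model see the subscribers who go dormant before the marketing gap.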
Talha,
I'm really interested in what you do to retain your customers.
... if they throw the SIM card away, what's your plan to prevent them from doing so?
Talha,

I guess in theory that a multinomial logit model would be ok, because what we are talking about here is states of potential (probability of outcome) for churn stemming from a pre-condition of dormancy.

Hidden Markov models spring to mind, but are not that effective in practice.

Since you cannot predict dormancy from the variables you have, to some degree this variable can be thought of as stochastic, i.e. random (it is essentially a state of mind before any action is undertaken). That does not mean you cannot apply some basic analysis to dormancies, such as modelling them via a simple Weibull curve, examining the distribution, etc.
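One way to read the Weibull suggestion is as a model for the length of a dormancy spell. The sketch below assumes the two-parameter Weibull survival function S(t) = exp(-(t/scale)^shape) and uses it to ask how likely a spell is to reach the 20-day threshold given that it has already lasted some days. The shape and scale values are placeholders, not estimates from any data, and the function names are hypothetical.

```python
import math


def weibull_survival(t, shape, scale):
    """P(dormancy spell lasts longer than t days) under a Weibull model."""
    return math.exp(-((t / scale) ** shape))


def prob_reaches(threshold, days_dormant, shape, scale):
    """P(spell exceeds `threshold` days | it has already lasted `days_dormant` days)."""
    return (weibull_survival(threshold, shape, scale)
            / weibull_survival(days_dormant, shape, scale))


# Placeholder parameters purely for illustration; fit them to real spell lengths.
SHAPE, SCALE = 0.5, 10.0
for d in (0, 5, 10, 15):
    print(d, round(prob_reaches(20, d, SHAPE, SCALE), 3))
```

If the fitted curve shows this conditional probability climbing steeply within the first few dormant days, that would be evidence for contacting subscribers much earlier than the 15 or 20 day mark.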

Possible experiment
Could you instead undertake some experimentation on the efficacy of your existing churn strategy? Each time a customer enters the dormant stage based on your definition, assign them to one of two conditions (retention or no retention strategy), deploy your retention strategy on 50% of the sample, and examine the impact on churn rates.
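The experiment described above has two parts: a random 50/50 split, then a comparison of churn rates. A minimal sketch follows, assuming a pooled two-proportion z statistic is an acceptable way to compare the groups (the post does not name a test). The function names and the fixed seed are hypothetical.

```python
import math
import random


def assign_groups(customer_ids, seed=0):
    """Randomly split newly dormant customers 50/50 into (treatment, control)."""
    ids = list(customer_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def two_proportion_z(churned_a, n_a, churned_b, n_b):
    """z statistic for the difference between two churn rates (pooled variance)."""
    p_a, p_b = churned_a / n_a, churned_b / n_b
    pooled = (churned_a + churned_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


treatment, control = assign_groups(range(1000))
print(len(treatment), len(control))
```

Once outcomes are known, `two_proportion_z` takes the churn counts and group sizes of the two arms; an absolute value above roughly 1.96 corresponds to the usual 5% two-sided significance level.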

HTH Paul

Jozo,

Just out of curiosity, why not use churners in the modelling process? Given that churn is the outcome you are trying to predict, and each of the variables in logistic, random forest, SVM models can be developed to attempt to predict this outcome .... What other outcome variable would you train your model on if not churn?

cheers Paul


© 2019 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC
