
How to develop a churn prediction model for a telecom company?

Hi everyone,

I work at a telecom company that is interested in developing a churn prediction model. I want to know which steps I should follow to develop this kind of model. Any help with the problem is highly appreciated. Thanks in advance.


Replies to This Discussion

We did this for a financial company a while ago. It depends highly on the context you want to bring along.

I would say that starting with the current CRM database is a solid base. Most likely, the number of customer care calls, the number of complaint e-mails, etc. give a good indication of churn. Just counting will most likely not be sufficient, though: you will need to analyze the content of the e-mails, audio from the conversations with customer care, web behavior, and perhaps even do social network analysis.

Thanks Erik,

You are right, the most important place to dig is the customer care system, or rather the CRM database. What I want to know is the steps, in order, to design the prediction model and, of course, which model best suits telecom data.

After a lot of research in whitepapers and articles about churn prediction in telecom, I came to the conclusions below. I would like you gurus to confirm what I have concluded and, if you think I am wrong, to comment and guide me to the correct solution. Here are my findings:

Step 1: Find as many attributes in the telecom data as you can, and build a dataset from them. The data are:

Customer Demographic data such as:

  • Zipcode
  • Income
  • Occupation
  • Age
  • Gender
  • Living Address
  • Occupation Address

Purchase History:

  • Number of service purchases
  • Value of purchases
  • Date of last purchase
  • Payment type
  • Product/Service/Campaign type
  • Product diversity

Customer Relation Data:

  • Number of Questions about the services from e.g. IVR
  • Number of visits to retail shops or the online website (e.g. self-care website)
  • Number of Complaints solved
  • Number of total complaints

Service Usage:

  • Number of calls
  • Number of Outgoing calls
  • Number of Incoming calls
  • Number of Roaming Calls
  • Number of International Calls
  • Number of SMS
  • Total minutes of calls
  • Number of VAS activations
  • Number of VAS deactivations
  • Number of campaigns joined

Billing Data:

  • Total amount of bill
  • Total amount of voice call
  • Total amount of VAS services (MMS/GPRS/etc.)
  • Total amount of SMS
  • Total number of times barred (one-way barred)
  • Total number of times fully barred (two-way barred)
  • Average number of days after the bill due date that payment is made

and more (I would be glad to hear of any other attributes you think could help).

Once we have these attributes, we should extract them over a fixed period. For example, we take a training set covering three months (Jan, Feb and Mar 2013) and flag the customers who left the company during that period (am I right?). With this dataset of churned and non-churned customers for Jan to Mar 2013, we can go on to Step 2 and eventually build a model that predicts customer churn in April 2013. (Am I right? I want to know whether I am on the right track.)
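A minimal sketch of one common way to set up such windows, offered as a variant rather than a confirmation of the exact scheme above: features are taken from an observation window (Jan to Mar 2013) and the label is whether the customer leaves in the following outcome window (April 2013). The record layout, the customer ids and the `label_customers` helper are all hypothetical.

```python
from datetime import date

# Hypothetical records: customer id -> churn date (None if still active).
churn_dates = {
    "c1": None,
    "c2": date(2013, 4, 12),   # leaves during the outcome window
    "c3": date(2013, 2, 3),    # already gone during the observation window
    "c4": date(2013, 6, 30),   # leaves later, outside the outcome window
}

OBS_END = date(2013, 3, 31)                                # observation window: Jan to Mar 2013
OUT_START, OUT_END = date(2013, 4, 1), date(2013, 4, 30)   # outcome window: April 2013


def label_customers(churn_dates):
    """Return {customer_id: 1 if churned in the outcome window, else 0}.

    Customers who churned on or before OBS_END are left out, because there is
    nothing left to predict for them.
    """
    labels = {}
    for cid, churned_on in churn_dates.items():
        if churned_on is not None and churned_on <= OBS_END:
            continue
        in_outcome = churned_on is not None and OUT_START <= churned_on <= OUT_END
        labels[cid] = int(in_outcome)
    return labels


labels = label_customers(churn_dates)
print(labels)
```

Keeping the label strictly after the feature window is a design choice: it stops the model from learning from behavior that only shows up once the customer has already left.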

Step 2: Finding churned customers' behavioral patterns in the above dataset

I think in this step I should form as many hypotheses as I can from the dataset about why customers churned. For example, I might find that "out of all churned customers in this dataset, 80% had filled in the online complaint form before leaving", and then check the same percentage among the non-churned customers in the same period (Jan, Feb and Mar) to see whether it also holds for them. (Am I right?)
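The comparison described here can be sketched as a two-proportion z-test, which is one standard way (not the only one) to check whether the complaint rate really differs between the two groups. The counts below are made up for illustration, and `two_proportion_z` is a hypothetical helper name.

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """z statistic for H0: both groups share the same underlying proportion."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


# Made-up counts: 80 of 100 churned customers filed a complaint,
# versus 300 of 1000 customers who stayed.
z = two_proportion_z(80, 100, 300, 1000)

# Two-sided p-value under the normal approximation.
p_value = math.erfc(abs(z) / math.sqrt(2))
print(f"z = {z:.2f}, significant at 5%: {p_value < 0.05}")
```

A large gap between the two groups only shows association; whether complaints cause churn is a separate question.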

After finding some assumptions, hypotheses, or rules (is this the right word?), we are ready to build our prediction model. (Am I right?)

Step 3: Building the model

This is where I am stuck. I don't know which data mining prediction algorithm I should use (e.g. logistic regression, Naive Bayes, or a decision tree, and if so with which technique: ID3? information gain?).

I really need help, thanks for reading.


Since you have a whole gamut of data available, it is just a matter of extracting information from it. You first need to prioritize what information you want. You can extract the basics first and then derive more models later.

a) Churn propensity of customers based on their AON (age on network) and ARPU (average revenue per user): trace the churn pattern over a historical dataset, plot it as a line graph, and mark the grey areas.

b) The mode by which customers are churning out of the network: involuntary or voluntary. In the grey areas identified above, you need to define the mode for further drill-down.

The above hypothesis is correct, but use a structured analysis by asking yourself questions.
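As a rough illustration of point a), churn rates can be tabulated per AON band and per ARPU band before plotting. The rows, the band cut points and the helper names below are all hypothetical; real bands would come from the historical data.

```python
from collections import defaultdict

# Hypothetical rows: (age on network in months, ARPU, churned flag).
customers = [
    (2, 10.0, 1), (4, 12.0, 1), (5, 30.0, 0),
    (14, 25.0, 0), (18, 8.0, 1), (20, 40.0, 0),
    (30, 35.0, 0), (36, 50.0, 0),
]


def aon_band(row):
    """Bucket tenure into coarse bands; the cut points are arbitrary."""
    months = row[0]
    if months < 6:
        return "0-5"
    if months < 24:
        return "6-23"
    return "24+"


def arpu_band(row):
    """Split ARPU at an arbitrary threshold of 20."""
    return "low" if row[1] < 20 else "high"


def churn_rate_by(rows, band_of):
    """Return {band: share of churned customers in that band}."""
    totals, churned = defaultdict(int), defaultdict(int)
    for row in rows:
        band = band_of(row)
        totals[band] += 1
        churned[band] += row[2]
    return {band: churned[band] / totals[band] for band in totals}


rates_by_aon = churn_rate_by(customers, aon_band)
rates_by_arpu = churn_rate_by(customers, arpu_band)
print(rates_by_aon)
print(rates_by_arpu)
```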

Great start!

When you have this dataset, open your favorite DM tool (for example Orange), import the data, and train, for example, a decision tree.

You can then simply predict which customers will churn in the next 3 months.

Decision trees do a great job. They are accurate enough and, more importantly, they reveal why people churn. Just read the tree and you'll find out.

Gini, ID3, or ChiSq? Try them all and compare. It's easy to train 20 trees in 60 minutes.
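To make the difference between two of these criteria concrete, here is a small sketch that scores the same candidate split with Gini impurity (as used by CART-style trees) and with entropy-based information gain (as used by ID3). The toy labels are invented, and a DM tool computes this internally; the code only shows what is being compared.

```python
import math


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def entropy(labels):
    """Shannon entropy in bits of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def split_gain(parent, left, right, impurity):
    """Impurity reduction obtained by splitting parent into left and right."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted


# Toy split: customers who filed a complaint (left) versus those who did not
# (right); 1 = churned, 0 = stayed.
left = [1, 1, 1, 0]
right = [0, 0, 0, 1]
parent = left + right

gini_gain = split_gain(parent, left, right, gini)
info_gain = split_gain(parent, left, right, entropy)
print(f"Gini gain: {gini_gain:.3f}, information gain: {info_gain:.3f}")
```

The two criteria often rank candidate splits similarly, which is one reason trying several and comparing the resulting trees is cheap.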

hi m n...

I'm a fifth-year student currently doing a thesis entitled "Applying data mining technique among broadband subscribers". Right now I'm having difficulty acquiring a dataset from a company for my study, so as an alternative I have decided to run a survey instead. My question is:

Is a survey enough to construct a predictive model for customer churn management?

One of my study problems is to find the factors that affect customer churn. Will I be able to answer it using only survey data?

Or do you have any suggestions or advice for my study?

You can email me at: [email protected]

thank you

The set of fields for the analysis seems reasonable. However, in our experience with churn analysis in the telecom industry, and with customer retention in general, you have to capture not only total or average values but also use a temporal abstraction approach, in which you look at service usage and billing over the last N months before churn, or before the current date if there is no churn. The steps are well described in, e.g., Handbook of Statistical Analysis and Data Mining Applications by Robert Nisbet, John Elder IV and Gary Miner.
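One possible reading of the temporal abstraction idea, sketched under assumptions: for each customer, summarize the last N monthly values of a usage or billing measure by its level, its trend, and how the most recent month compares with the window average. The feature names and the `trend_features` helper are illustrative, not taken from the handbook.

```python
def trend_features(monthly_values, n=3):
    """Summarize the last n monthly values (oldest first).

    Returns the window mean, the least-squares slope per month, and the ratio
    of the most recent month to the window mean.
    """
    window = monthly_values[-n:]
    if not window:
        raise ValueError("need at least one monthly value")
    k = len(window)
    mean = sum(window) / k
    x_mean = (k - 1) / 2
    denom = sum((x - x_mean) ** 2 for x in range(k))
    if denom:
        slope = sum((x - x_mean) * (y - mean) for x, y in enumerate(window)) / denom
    else:
        slope = 0.0  # a single month has no trend
    ratio = window[-1] / mean if mean else 0.0
    return {"mean": mean, "slope": slope, "last_to_mean": ratio}


# Hypothetical minutes of calls per month, oldest first; usage falls before churn.
minutes = [420, 410, 300, 180, 60]
features = trend_features(minutes, n=3)
print(features)
```

For a churned customer the window would end at the churn date; for an active customer it would end at the current date, as described above.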

With respect to method selection, I would recommend trying the Stochastic Gradient Boosting approach, which usually gives robust and accurate results in such applications. If you are looking for better interpretability, then classification trees and logistic regression might help, although in the latter case you will have to check the initial assumptions (e.g. multicollinearity, heteroscedasticity, etc.).


Can someone send me a dataset for working on a churn prediction model? I am a student at the University of Illinois, and this dataset would be a great help for our project.

Thanks in advance. Mail me at [email protected] or send me a download link.




I am developing a project on the same topic, and I need datasets for it.

Hi, can anybody tell me how big a problem churn is, in dollar terms, for wireless carriers? Is any market data available for the US market in 2011 or 2012? Please advise. Thanks.


We are working on churn prediction tools with proprietary algorithms. If your company is interested in developing specific tools, we can talk.

We have a team of researchers, statisticians, churn managers, and consultants on analytics and churn, who master this topic at a very advanced level.

If you are interested, contact me.

Best regards

Remi Mollicone

Innovation, Alliances/Partnerships, Business Development
[email protected]
Skype: remimollicone7
Tél: + 33 630 729 013

I appreciate this answer. I was once considered for such an effort. The client's vendor/contractor had a ton of analyses of ad effectiveness (time of day, etc.) summarized from major telecoms, and the idea was to develop a predictive model relating those factors to churn, that is, which telecom is losing customers to which. From what the recruiter told me, I am not sure where they ended up; they paid a hefty sum for somebody to develop the churn prediction models. The developed product would be used to market the methodology.

Good luck,

Hi all,
What I did is as follows:
At the telecom where I work, I had access to 3 cycles of billing info. Each cycle is 2 months: cycle 1, cycle 2 and cycle 3, with cycle 1 being the oldest.
First, I extracted all active customers from cycle 1 by querying for customers with feature 1: call duration > 0 OR feature 2: SMS amount > 0 OR feature 3: VAS amount > 0.
Then, in cycle 2 and cycle 3, I labeled customers with all three features greater than zero (AND, not OR) as Active and the others as Churned. My question is: based on which feature should I label the records?
Any help is deeply appreciated.
Thanks for reading.
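The labeling rule in this post can be sketched as follows, mainly to show how the AND and OR variants differ. The field names and sample customers are hypothetical, and the sketch assumes a customer must pass the check in every later cycle to count as Active, which the post does not state explicitly.

```python
def is_active(usage, require_all=False):
    """Check one billing cycle; usage holds call_duration, sms_amount, vas_amount."""
    checks = [
        usage["call_duration"] > 0,
        usage["sms_amount"] > 0,
        usage["vas_amount"] > 0,
    ]
    return all(checks) if require_all else any(checks)


def label(cycle1, later_cycles, require_all=True):
    """Label customers who were active in cycle 1 as 'Active' or 'Churned'."""
    labels = {}
    for cid, usage in cycle1.items():
        if not is_active(usage):  # base population: any usage in cycle 1 (OR)
            continue
        still_active = all(
            cid in cycle and is_active(cycle[cid], require_all)
            for cycle in later_cycles
        )
        labels[cid] = "Active" if still_active else "Churned"
    return labels


def usage(calls, sms, vas):
    return {"call_duration": calls, "sms_amount": sms, "vas_amount": vas}


cycle1 = {"a": usage(100, 5, 0), "b": usage(50, 0, 0), "c": usage(0, 0, 0)}
cycle2 = {"a": usage(80, 3, 1), "b": usage(40, 0, 0)}
cycle3 = {"a": usage(60, 2, 2), "b": usage(30, 0, 0)}

strict = label(cycle1, [cycle2, cycle3], require_all=True)   # AND rule
loose = label(cycle1, [cycle2, cycle3], require_all=False)   # OR rule
print(strict)
print(loose)
```

In this toy data, customer "b" only ever makes voice calls: the AND rule labels them Churned even though they are still using the service, while the OR rule keeps them Active. That difference is worth checking against how the business itself defines churn.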

