I am working on a project to forecast customer lifetime value (CLTV) and would like suggestions on possible approaches. Below is a description of the project:
- Our client is an e-retailer specializing in flash sales of designer apparel. A person must register on the website to make a purchase, or even to browse the sales taking place.
- We have defined two types of registrants:
  - Member: someone who has registered on the site but has not yet made a purchase.
  - Customer: someone who has registered and made at least one purchase.
- Our client wants us to forecast CLTV for both customers and members.
- We have data from 2008 onwards.
- For customers, we have demographic, purchase, and website-visit data.
- For members, we have only demographic and website-visit data.
Here is an approach I have in mind:
- Define CLTV as the net present value of Probability(alive at each point in time) * forecast revenue at that point in time, discounted at a rate of 15%.
- The first part is to forecast Probability(alive), i.e. the probability that the customer is still in a profitable relationship with the company. For this I am considering survival analysis with a Cox proportional hazards model, fitted with PROC PHREG in SAS.
- To fit the model in PROC PHREG, we have defined churn as follows: a customer has churned if he makes no purchase within 12 months of his last purchase.
- My modeling dataset will contain data from 2008 through Feb 2011.
- For example, suppose customer A registers on the website in Feb 2008, makes his first purchase in Apr 2008, a second in Sep 2008, and a third in Feb 2009, and never purchases again. For A, the dependent variable t will be (last purchase date - first purchase date), which in essence describes the time during which he was profitable to the company.
- Some customers have placed only one order. For example, suppose customer B registers in Jan 2009, makes a purchase in Feb 2009, and never purchases again. For such customers, t will be set to a very small value, say 1 day. The problem is that for this type of customer I cannot define an "Average Order Gap" variable, which I believe will be an important independent variable.
- If a customer has not churned as of Feb 2011, he will be treated as censored in my model.
- This is a rough outline of my approach for the first part of the model. I have read many papers that use the Pareto/NBD model and its modifications, but these are based only on a customer's purchase history (number of purchases and time of last purchase), and my client specifically wants a traditional survival-analysis approach.
- For the second part of CLTV, forecasting future revenue, I was thinking of using average revenue computed for each customer cohort. Again, however, my client wants us to build a model that predicts future revenue at each point in time, and I am somewhat lost as to how to accomplish this.
- Another major problem: how do I assign a CLTV value to a member, who has not yet made any purchase? The models above can be built only on customers, who have purchase information; members have only demographic and visitation information. I am considering a rough segmentation or look-alike approach that maps members to customer groups using demographic and visitation data alone.
- I would appreciate suggestions from the community on the approach above, as well as any alternatives. Thanks, Hari
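Written out, the CLTV definition in the approach is a discounted sum over future periods. As a sketch, assuming a finite horizon of $K$ periods and that the 15% rate is annual (the post does not say which), with $t_k$ the end of period $k$ measured in years, $S_i(t_k)$ customer $i$'s probability of being alive at $t_k$ (from the survival model), and $\hat{R}_i(t_k)$ the forecast revenue in period $k$:

$$
\mathrm{CLTV}_i = \sum_{k=1}^{K} \frac{S_i(t_k)\,\hat{R}_i(t_k)}{(1+d)^{t_k}}, \qquad d = 0.15
$$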
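The CLTV definition in the approach reduces to a short loop once the survival and revenue forecasts exist. This is a minimal sketch; the function name, the monthly default, and the assumption that 15% is an annual rate are all hypothetical choices, not part of the post.

```python
def discounted_cltv(survival, revenue, annual_rate=0.15, periods_per_year=12):
    """Discounted CLTV for one customer (hypothetical helper).

    survival[k]: forecast probability the customer is alive in period k+1
    revenue[k]:  forecast revenue in period k+1, given the customer is alive
    annual_rate: discount rate, ASSUMED to be annual (15% by default)
    periods_per_year: 12 for monthly periods, 1 for yearly periods
    """
    if len(survival) != len(revenue):
        raise ValueError("survival and revenue must have the same length")
    total = 0.0
    for k, (p_alive, rev) in enumerate(zip(survival, revenue), start=1):
        years = k / periods_per_year  # end of period k, in years
        total += p_alive * rev / (1.0 + annual_rate) ** years
    return total
```

For example, `discounted_cltv([0.9, 0.8, 0.7], [40.0, 40.0, 40.0])` values three monthly periods; the survival list would come from the survival model and the revenue list from the revenue model.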
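The churn and censoring rules for building the PHREG input (one row per customer with t and a churn flag) can be sketched as below. Assumptions, all hypothetical: "Feb 2011" is taken as 28 Feb 2011, purchase months are given day 1, censored customers are followed up to the cutoff date (the post does not define t for them), and only the last purchase is checked against the 12-month rule. The `average_order_gap` helper shows concretely why that variable is undefined for single-purchase customers such as B.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    m = d.month - 1 + months
    year, month = d.year + m // 12, m % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def survival_record(purchase_dates, cutoff, churn_months=12, min_days=1):
    """Return (t_days, event) for one customer.

    event = 1 (churned) if no purchase within churn_months of the last
    purchase, as observed by the cutoff; t = last - first purchase, floored
    at min_days for single-purchase customers.
    event = 0 (censored) otherwise; ASSUMPTION: t = cutoff - first purchase.
    Simplification: a long gap followed by a later purchase is not treated
    as churn, because only the last purchase is checked.
    """
    dates = sorted(purchase_dates)
    first, last = dates[0], dates[-1]
    if add_months(last, churn_months) <= cutoff:
        return max((last - first).days, min_days), 1
    return (cutoff - first).days, 0


def average_order_gap(purchase_dates):
    """Mean days between consecutive purchases; None for a single purchase."""
    dates = sorted(purchase_dates)
    if len(dates) < 2:
        return None
    return (dates[-1] - dates[0]).days / (len(dates) - 1)
```

With a cutoff of `date(2011, 2, 28)`, customer A (purchases on 1 Apr 2008, 1 Sep 2008, 1 Feb 2009) gets `(306, 1)`, customer B (one purchase on 1 Feb 2009) gets `(1, 1)`, and `average_order_gap` returns `None` for B.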
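PROC PHREG fits the Cox model by maximizing the partial likelihood, and the fitted model yields Probability(alive) through a baseline survival curve. To make these mechanics concrete without SAS, here is a minimal Python sketch for a single covariate using the Breslow handling of ties (PROC PHREG's default `TIES=` setting) and the Breslow baseline cumulative hazard. It is a teaching sketch under these assumptions, not a substitute for PHREG: no multiple covariates, standard errors, or time-varying covariates.

```python
import math


def fit_cox_one_covariate(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson on the Cox partial likelihood with one covariate.

    Breslow ties: every subject with time >= t_i is in the risk set of an
    event at t_i, including other events tied at t_i.
    """
    n = len(times)
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored subjects enter only through risk sets
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            xbar = s1 / s0
            score += x[i] - xbar            # first derivative of log PL
            info += s2 / s0 - xbar ** 2     # minus second derivative
        if info <= 0:
            raise ValueError("zero information: beta is not identifiable")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def survival_probability(times, events, x, beta, x_new, t):
    """S(t | x_new) = exp(-H0(t) * exp(beta * x_new)), where H0 is the
    Breslow estimate of the baseline cumulative hazard."""
    n = len(times)
    h0 = 0.0
    for ti in sorted({times[i] for i in range(n) if events[i]}):
        if ti > t:
            break
        d = sum(1 for i in range(n) if events[i] and times[i] == ti)
        denom = sum(math.exp(beta * x[j]) for j in range(n) if times[j] >= ti)
        h0 += d / denom
    return math.exp(-h0 * math.exp(beta * x_new))
```

Evaluating `survival_probability` on the grid of future period ends gives the survival term of the CLTV sum for a customer with covariate value `x_new`.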
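The cohort-average baseline for the revenue part can be sketched as a revenue curve per cohort, indexed by periods since first purchase. The cohort labels (e.g. first-purchase year) and the input layout are hypothetical. Because the denominator here is the full cohort, churned customers included, these averages already reflect attrition; to combine them with Probability(alive) in the CLTV formula you would instead want averages over customers still active in each period, so that attrition is not counted twice.

```python
from collections import defaultdict


def cohort_revenue_curves(first_cohort, transactions, n_periods):
    """Average revenue per customer by cohort and period since first purchase.

    first_cohort: {customer_id: cohort label}, e.g. first-purchase year
    transactions: iterable of (customer_id, period_index, revenue), where
        period_index counts periods since the customer's first purchase
    Returns {cohort label: [average revenue in period 0, ..., n_periods-1]}.
    The denominator is the full cohort size, churned customers included.
    """
    cohort_size = defaultdict(int)
    for label in first_cohort.values():
        cohort_size[label] += 1
    totals = defaultdict(lambda: [0.0] * n_periods)
    for cid, period, revenue in transactions:
        if 0 <= period < n_periods:
            totals[first_cohort[cid]][period] += revenue
    return {label: [total / size for total in totals[label]]
            for label, size in cohort_size.items()}
```

A model-based alternative would regress each customer's period revenue on covariates, but the averages above are the baseline such a model would need to beat.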
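The look-alike idea for members can be sketched as nearest-centroid matching: standardize the demographic and visitation features that both groups share, compute each customer segment's centroid, and give a member the average CLTV of the closest segment. Nearest-centroid is one simple choice we substitute here; the segments, features, and function names are hypothetical. As written, it gives a member the CLTV of comparable customers and does not account for the chance that the member never makes a purchase, which the post leaves open.

```python
import math
from statistics import mean, pstdev


def fit_lookalike(customer_features, customer_segment, segment_cltv):
    """Build a nearest-centroid scorer for members (hypothetical helper).

    customer_features: {customer_id: [feature values]}, only columns that
        members also have (demographic and visitation features)
    customer_segment:  {customer_id: segment label}
    segment_cltv:      {segment label: average CLTV of that segment}
    """
    columns = list(zip(*customer_features.values()))
    mus = [mean(col) for col in columns]
    # Standardize so no single feature dominates the distance; constant
    # columns get a scale of 1 to avoid division by zero.
    sds = [pstdev(col) or 1.0 for col in columns]

    def z(values):
        return [(v - m) / s for v, m, s in zip(values, mus, sds)]

    sums, counts = {}, {}
    for cid, values in customer_features.items():
        seg = customer_segment[cid]
        zv = z(values)
        if seg not in sums:
            sums[seg], counts[seg] = [0.0] * len(zv), 0
        sums[seg] = [a + b for a, b in zip(sums[seg], zv)]
        counts[seg] += 1
    centroids = {seg: [a / counts[seg] for a in s] for seg, s in sums.items()}

    def score_member(values):
        """Return (nearest segment, that segment's average CLTV)."""
        zv = z(values)
        seg = min(centroids, key=lambda s: math.dist(zv, centroids[s]))
        return seg, segment_cltv[seg]

    return score_member
```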