

I need to build a response model for a marketing campaign, but I do not have a dataset of customers labeled as responders and non-responders.

All I have is one set of customers who responded and another set of untested customers.

Can we make a response model using these two datasets? What are the techniques if any?


Replies to This Discussion

Yes, one approach is to score the untested dataset based on a "model" from the responded dataset. A first pass could be using a SOM to model the responded dataset and then using an RBFN net to score the untested dataset.
Thanks for the response. Can you elaborate on the method? How do I score the untested set when I do not have a model yet? How do I build a model?

Can you specify what techniques SOM and RBFN are?
Yes Rahul, SOM is the Kohonen Self-Organizing Map. The SOM is an unsupervised technique that identifies the underlying features in a dataset. Thus, using the SOM on the responded dataset will establish the features of customers who responded. The RBFN is the Radial Basis Function Network. Using the RBFN with a scaling function on the untested file will reveal "how similar" each record is to the responded profile. A simple descending sort will produce a "lift list" with the earlier records being more likely to be responders than later records. While this is not a "canned" approach, it is helpful to use "creative predictive analytics" to address "non-traditional" problems. From DLvK of
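To make the idea concrete, here is a minimal sketch in plain Python, not the poster's actual implementation. It trains a tiny 1-D SOM on the responders, then scores each untested record by its strongest Gaussian RBF activation against the SOM prototypes. That activation only stands in for the RBFN step, since a full RBFN would also learn output weights. The function names, the toy data, `width`, and the training parameters are all illustrative assumptions, and the features are assumed to be already scaled to comparable ranges.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train_som(data, n_units=4, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a tiny 1-D SOM on the responders; returns the unit prototypes."""
    rng = random.Random(seed)
    units = [list(rng.choice(data)) for _ in range(n_units)]  # init from data
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighborhood
        for x in rng.sample(data, len(data)):      # one shuffled pass
            # Best-matching unit: the prototype closest to x.
            bmu = min(range(n_units), key=lambda j: sq_dist(x, units[j]))
            for j in range(n_units):
                # Units near the BMU on the 1-D grid move more toward x.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                units[j] = [w + lr * h * (xi - w) for w, xi in zip(units[j], x)]
    return units

def rbf_similarity(x, units, width=1.0):
    """Strongest Gaussian RBF activation of x over the prototypes (0 to 1)."""
    return max(math.exp(-sq_dist(x, u) / (2 * width ** 2)) for u in units)

# Hypothetical toy data: two groups of responders, three untested customers.
responders = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
              (5.0, 5.1), (4.9, 5.2), (5.2, 4.8)]
untested = {"u1": (1.0, 1.0), "u2": (9.0, 0.0), "u3": (5.1, 5.0)}

units = train_som(responders)
scores = {name: rbf_similarity(x, units) for name, x in untested.items()}

# Descending sort gives the "lift list": most responder-like records first.
lift_list = sorted(scores, key=scores.get, reverse=True)
print(lift_list)
```

In this toy setup, `u2` sits far from both responder groups, so it should land at the bottom of the list.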
I like this approach. It's creative and worth trying.
Thanks Jozo, as noted this is "creative predictive analytics" applied to a "non-traditional" problem.
DLvK of
Just a remark:
For the sake of Occam's razor I would try a k-nearest-neighbor approach (ignoring the labels, of course) first. This approach has been ennobled by collaborative filtering. Of course it will probably perform worse than the SOM and be slow (if your set of responders is big), but it is much less complicated.
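A rough sketch of the nearest-neighbor idea, under the assumption that "score" means the average distance from an untested record to its k closest responders (smaller means more responder-like). The function name, the value of `k`, and the toy data are illustrative, and the features are assumed to be on comparable scales.

```python
import math

def knn_score(x, responders, k=3):
    """Mean Euclidean distance from x to its k nearest responders."""
    dists = sorted(math.dist(x, r) for r in responders)
    k = min(k, len(dists))  # guard against k larger than the responder set
    return sum(dists[:k]) / k

# Hypothetical toy data.
responders = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
untested = {"a": (1.5, 1.5), "b": (6.0, 6.0)}

knn_scores = {name: knn_score(x, responders) for name, x in untested.items()}

# Ascending sort: the records closest to known responders come first.
knn_ranking = sorted(knn_scores, key=knn_scores.get)
print(knn_ranking)
```

Each score needs a pass over all responders, which is where the slowness mentioned above comes from when the responder set is large.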
... What response rate do you expect in the untested set?

If the expected response rate is small (say less than 20%; the smaller the better), then:
1. Join these sets together.
2. Mark all observations from the first set as positive and all observations from the second set as negative
(or mark X% of records as positive, where X is the expected response rate).
3. Split the data: 50% train set, 50% test set.
4. Train your model using your favorite method (e.g. some tree / logistic regression) and validate on the test set.
5. If the lift is good and stable, check the rules in your model to see whether your "expert's expectations" agree with them, at least generally.
6. Score new data, pray & execute the campaign (try a smaller pilot first), pray again & evaluate responses.

This approach is not statistically correct, but it should still improve your lift/ROI if executed properly.
Then write here what you have tried and with what results :)
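Steps 1 to 4 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic (a hidden 10% of the untested set behaves like responders), the model is a hand-rolled logistic regression trained by gradient descent so that the example needs no external libraries, and every name and parameter here is made up for the sketch.

```python
import math
import random

rng = random.Random(42)

def make_row(is_responder):
    """Synthetic customer: feature 0 is informative, feature 1 is noise."""
    return [rng.gauss(2.0 if is_responder else 0.0, 1.0), rng.gauss(0.0, 1.0)]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Plain batch gradient descent on the logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
    return w, b

def top_fraction_lift(scores, labels, fraction=0.2):
    """Positive rate in the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    top = ranked[:max(1, int(len(ranked) * fraction))]
    top_rate = sum(label for _, label in top) / len(top)
    return top_rate / (sum(labels) / len(labels))

# Steps 1-2: join the sets; known responders = 1, untested = 0.
responders = [make_row(True) for _ in range(200)]
untested = [make_row(rng.random() < 0.10) for _ in range(800)]
X = responders + untested
y = [1] * len(responders) + [0] * len(untested)

# Step 3: shuffle and split 50% train, 50% test.
idx = list(range(len(X)))
rng.shuffle(idx)
half = len(idx) // 2
train_idx, test_idx = idx[:half], idx[half:]

# Step 4: train on one half, check the lift on the other half.
w, b = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
test_scores = [predict(w, b, X[i]) for i in test_idx]
lift = top_fraction_lift(test_scores, [y[i] for i in test_idx])
print(f"lift in top 20% of test set: {lift:.2f}")
```

Note that the lift here is measured against the "known responder" flag, not against true response, so it only says how well the model separates the two sets. Steps 5 and 6 (sanity-checking the rules and running a pilot) still have to happen outside the code.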
Thanks. I do not have a response rate, but I have been trying this approach. I will let you know when I have the results.
I agree with Jozo. If historically your response rate is relatively low, you can get away with creating your dataset using the known responders (they are obviously the 'ones' in terms of the Y) and using everyone else as the zeros. The reason that you can do this is that when you look at everyone else, only a very small percentage of them would take on the attributes of the responders, and the majority of them will take on the attributes of the non-responders. I've used this method many times and it typically works well.

I have to admit that my favorite part of Jozo's post was that you should 'pray and execute campaign, and then pray again and evaluate responses'. That's good stuff!!!
I agree, too. But one thing keeps bothering me: At first glance it is correct that the response rate should be low so that the noise caused by treating response as no-response is small. But:

Suppose we are given an arbitrary response rate, a set of responders, and a set that is a valid sample from the base population. If the classes response (= class 1) and no-response (= class 0) are distinguishable, should it not be possible to distinguish response from the base population, too? In the latter case the base population would be treated as class 0, as suggested by Jozo.

Of course you need a model that is robust to unbalanced class distributions, and the resulting probabilities are wrong, etc., but the ranking should be correct. What do you think?
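One way to see why the ranking could survive is a short derivation. It is a sketch under two assumptions: the responder set is a random sample of all responders, and the second set is a random sample of the base population. The symbols are introduced here for illustration: $s = 1$ marks membership in the responder set, $y = 1$ is true response, $\pi = P(y = 1)$ is the unknown response rate, and $n_1$, $n_0$ are the sizes of the two sets.

```latex
\[
\frac{P(s = 1 \mid x)}{P(s = 0 \mid x)}
  = \frac{n_1 \, p(x \mid y = 1)}{n_0 \, p(x)}
  = \frac{n_1}{n_0} \cdot \frac{P(y = 1 \mid x)}{P(y = 1)}
  = \frac{n_1}{n_0 \, \pi} \, P(y = 1 \mid x)
\]
```

The second step is Bayes' rule, $p(x \mid y = 1) / p(x) = P(y = 1 \mid x) / P(y = 1)$. Under these assumptions the odds of "responder set versus population" are a constant multiple of the true response probability, so sorting by the model's score sorts by $P(y = 1 \mid x)$ for any $\pi$, while the score itself is not a calibrated response probability.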

I wouldn't use this approach in the health area, though ;)

kind regards,


