I came across some speculation that the responder to non-responder (R:NR) ratio should decide which modeling technique to use. I haven't found any documentation or proof yet, so I thought I'd ask for feedback and comments.
Take three modeling scenarios:
We have three populations of 100K customers, each targeted by a different program.
Situation A - 5% have responded to a program of ours.
Situation B - Nearly 50% have responded.
Situation C - Greater than 70-80% have responded.
In each of the three scenarios, we can exploit the data to yield insights into what kind of customers our responders are. But the question is, does the response rate define what techniques we need to use?
For example, does only Situation A call for logistic regression, while B and C are unsuitable for it? Would CHAID IDTs be more suitable where the R:NR ratio is nearly equal, i.e. 50:50?
As far as I know, more data on responders should help logistic regression produce a more robust model with better probability scores. So, given good predictor variables, I would expect a logistic regression model to work in any of these scenarios, and to work better at 50:50 than at 5:95.
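To put rough numbers on that intuition, here is a minimal sketch in Python. It assumes each population has exactly 100K customers and uses 0.75 as a stand-in for "greater than 70-80%"; both are my assumptions for illustration. It counts the rarer class in each situation and computes the standard error of the estimated log-odds for an intercept-only logistic model, which is sqrt(1/R + 1/NR). It says nothing about which technique is best, only how much information each split carries.

```python
import math

N = 100_000  # assumed size of each population

# Assumed response rates; 0.75 is a stand-in for "greater than 70-80%".
scenarios = {"A": 0.05, "B": 0.50, "C": 0.75}

def minority_count(n, p):
    """Number of customers in the rarer class (responders or non-responders)."""
    return round(n * min(p, 1 - p))

def log_odds_se(n, p):
    """Standard error of the log-odds in an intercept-only logistic model.

    Equals sqrt(1/R + 1/NR) with R = n*p responders and NR = n*(1-p)
    non-responders, which simplifies to 1 / sqrt(n*p*(1-p)).
    """
    return 1 / math.sqrt(n * p * (1 - p))

summary = {
    name: (minority_count(N, p), log_odds_se(N, p))
    for name, p in scenarios.items()
}

for name, (rare, se) in summary.items():
    print(f"Situation {name}: rarer class = {rare}, log-odds SE = {se:.5f}")
```

Under these assumptions the 50:50 split gives the smallest standard error and the 5:95 split the largest, with the 75:25 split in between. With 5,000 responders out of 100K, though, even Situation A is far from a small-sample problem.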
Please share your thoughts & experiences.