Test for Difference in Proportions - T Test? Proc GLM? - AnalyticBridge<br />
https://www.analyticbridge.datasciencecentral.com/forum/topics/test-for-difference-in<br />
<br />
Jaap Vink (2009-09-13 12:03 UTC):<br />
There are several reasons:<br />
1. You've intervened in the process. For example, when you build a model for a retention campaign, you select the customers most likely to leave and make them an offer to stay. You have therefore created your own 'false positives'. You will need to look at the response to refine your model for next time and improve both your churn prediction and offer acceptance models.<br />
2. The world has changed. New competitors or competitive offers may have influenced the responses, and you want to make sure you identify these changes as soon as possible.<br />
3. In each model you run the risk of having 'false negatives'. You want to make sure that next time you keep monitoring the performance of your model and either refresh the model or use a champion-challenger approach. You may even include a random sample of non-selected targets to keep an eye on possible new opportunities.<br />
4. More and more companies first send out pilot campaigns to test for different factors that influence response, like message, offer, creative etc. These test campaigns are specifically for building models afterwards.<br />
<br />
Paul Wilson (2009-09-11 02:08 UTC):<br />
OK, now you're talking about propensity scores, and again I agree 100% that the algorithms you mention are useful for that purpose.<br />
<br />
However, I don't see why one would need to build a new predictive model in order to evaluate campaign results (i.e. response rate). Usually people build and apply models before launching campaigns.<br />
<br />
I can understand why one would use the results of an already developed and deployed model after the campaign is finished, to compare actual results with model output, for example response rates by propensity decile or cluster.<br />
I'm not sure why someone would need to build a new predictive model after the campaign is over in order to evaluate that campaign.<br />
<br />
Maybe I misunderstood something.<br />
<br />
Paul Wilson (2009-09-11 01:52 UTC):<br />
"Of course the strict assumptions behind the tests are often violated but this does not necessarily mean that the results are worthless"<br />
<br />
Sure, but the 95% confidence that one provides, hoping to give the results a little more backbone, loses its charm once it has to be communicated with such caution.<br />
<br />
:)<br />
<br />
<br />
"Statistical methods are simply powerful tools to aid understanding and decision-making. They are not an excuse to turn off the brain"<br />
<br />
Couldn't agree more with that statement.<br />
<br />
Matt Coates (2009-09-10 18:00 UTC):<br />
It is certainly true that a blind application and reliance on statistical methods without a good understanding of how to interpret the results can be dangerous - but this is not a failing of the methods. In my experience it is much more common for people to neglect the use of proper statistical techniques than to have too much reliance on them.<br />
<br />
Regarding your specific points:<br />
<br />
1. Yes, it is true that with enough observations, even the smallest of effects will become statistically significant, but that is not a reason to dismiss the use of statistical methods in these cases. If you have a very large data set (lucky you) and you find a significant effect that is too small to be of any business value, the interpretation of the result is still the same - that it is unlikely to have come about purely by chance. You have simply found a very small, but still apparently real, effect - no harm done because you (the intelligent practitioner) know that it is too small to be important (though it might still be of interest).<br />
<br />
At the other end of the scale, there are those effects that are so large as to be obviously both significant and important without the use of any statistical tests, regardless of sample size - again, lucky you!<br />
<br />
The real value of statistical methods comes in providing an objective criterion to assess all of those in-between cases which, due either to small relative sample size or large background variability, are difficult to assess and where lack of knowledge or faulty human intuition can mislead you into making bad decisions.<br />
<br />
2. Of course the strict assumptions behind the tests are often violated but this does not necessarily mean that the results are worthless - just that they have to be interpreted with caution. Many methods are quite robust to these violations (within limits) and there are usually other alternatives (e.g. models based on non-Normal distributions, non-parametric methods, etc.) that can be used to verify or refine the results before using them to support mission-critical decisions.<br />
<br />
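Point 1 above can be illustrated numerically. The following is a minimal sketch, assuming two hypothetical response rates (1.00% and 1.05%, chosen only for illustration) and equal group sizes. It shows how the z statistic for the same tiny difference grows with the square root of the sample size, so the difference eventually crosses the usual 1.96 cutoff even though its business value has not changed.

```python
import math

def z_for_fixed_rates(p1, p2, n):
    """z statistic for a two-proportion test with n observations per group.

    p1 and p2 are treated as the observed response rates (hypothetical values).
    """
    pooled = (p1 + p2) / 2                      # equal group sizes assumed
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    return (p2 - p1) / se

# Same 0.05 percentage point difference, increasingly large samples
results = {}
for n in (10_000, 100_000, 1_000_000, 10_000_000):
    results[n] = z_for_fixed_rates(0.0100, 0.0105, n)
    print(f"n per group = {n:>10,}  z = {results[n]:.2f}")
```

Whether a difference of this size matters is a business question that the z statistic cannot answer.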
The bottom line...<br />
<br />
Statistical methods are simply powerful tools to aid understanding and decision-making. They are not an excuse to turn off the brain. Just like any other power tools, they require training, skill, experience and care to get the best results. Without them, however, you are left with the old 'hand tools' of guesswork, intuition, and trial and error - I know which I prefer.<br />
<br />
Jaap Vink (2009-09-10 17:05 UTC):<br />
I agree with both points, especially with your remarks above about 'millions of cases' in mind. But even in that case it might make sense to use 'statistical techniques' such as modelling on different samples, because the response to a mailing can have a random factor, and if you model on all the data you are likely to miss some key segments. Combining several models built on different samples (and maybe even different types of models) might give a more balanced propensity score. Using multiple samples also removes exactly the problem described in 1), which arises with millions of cases in techniques like CHAID, C&RT, logistic regression and other statistics-based models.<br />
<br />
Mike Laracy (2009-09-09 19:38 UTC):<br />
Good points, Matt. If a difference in a statistic between two groups is found to be "statistically significant", it simply means that, based on sample size, variation, and the value of the measured statistic, the difference you have seen is not likely to have occurred by chance. What is actually being tested in this case is the hypothesis that p1-p2=0 (where p1 and p2 are the response rates for the two groups). To be even more useful in making business decisions, the hypothesis can be changed to test whether the difference between the two groups is greater than a certain threshold.<br />
<br />
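The threshold version of the test can be sketched as a one-sided z test of H0: p2-p1 <= delta against H1: p2-p1 > delta. This is a minimal sketch under stated assumptions: the function name, the retention counts and the threshold of 0.1 are hypothetical, and the standard error is the simple unpooled (Wald) form, which is only one of several reasonable choices.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_test_margin(x1, n1, x2, n2, delta=0.0):
    """One-sided z test of H0: p2 - p1 <= delta vs H1: p2 - p1 > delta.

    x1/n1 and x2/n2 are successes/observations in groups 1 and 2.
    Uses the unpooled (Wald) standard error, an assumption of this sketch.
    """
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p2 - p1 - delta) / se
    return z, 1 - normal_cdf(z)

# Hypothetical data: 700 of 1000 retained before, 830 of 1000 after
z_zero, p_zero = z_test_margin(700, 1000, 830, 1000, delta=0.0)
z_margin, p_margin = z_test_margin(700, 1000, 830, 1000, delta=0.1)
print(f"difference > 0:   z = {z_zero:.2f}, p = {p_zero:.4g}")
print(f"difference > 0.1: z = {z_margin:.2f}, p = {p_margin:.4g}")
```

With these made-up counts the evidence that the difference is above zero is much stronger than the evidence that it exceeds the threshold, which is the distinction the business case depends on.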
For example, perhaps you have lowered a monthly rate from $21.99 to $19.99 for a group of customers, and you know that the price cut only makes business sense if customer retention rates increase by 10% (in real terms, i.e. 10 percentage points). In that case, instead of testing the hypothesis that p1-p2=0, you would want to test the hypothesis that p2-p1>0.1. I make this point just to show that you are not limited to testing whether the difference in a statistic between two groups is zero.<br />
<br />
Paul Wilson (2009-09-09 17:20 UTC):<br />
I certainly appreciate your perspective and agree with many things said.<br />
<br />
I'll also add that in my opinion there is a danger of relying on the statistical tests too much for a number of reasons.<br />
Here are a couple I could think of off the top of my head:<br />
<br />
1) As the sample size increases, the likelihood of a statistical test being significant increases as well (law of large numbers).<br />
In DM you're often dealing with huge samples, which will tend to produce statistically significant tests more often than they really should.<br />
<br />
2) Statistical tests have quite a few assumptions, and in most "real world application" cases they are violated.<br />
<br />
Paul Wilson (2009-09-09 17:10 UTC):<br />
I suppose we have a misunderstanding here.<br />
My example above pertains to a "main" mailing of millions of pieces of mail, not a test of 5000 or so.<br />
A 0.3% response increase in those circumstances can indeed have a monetary impact on the bottom line, future customer retention efforts, ROI etc.<br />
I can see the logic you describe being applied in a test/control group setting, though.<br />
<br />
Matt Coates (2009-09-09 12:49 UTC):<br />
"Statistically significant" simply means that the difference you have seen is unlikely to have occurred purely by chance and is therefore likely to be a "real" effect. If the result is not statistically significant then, however attractive the potential profit might look from the pilot study, you simply don't know whether you will get a similar result when you make your investment. Unless you have calculated statistical confidence limits for the expected improvement (giving best and worst case results) you could just as easily see a drop in profit once you go live in a larger market.<br />
<br />
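The confidence limits mentioned above can be sketched as follows. This is a minimal sketch, assuming a simple Wald interval for the difference of two proportions and using the figures quoted elsewhere in this thread (response rates of 1.2% and 0.9% in segments of 5000, i.e. 60 and 45 responders). The function name is hypothetical, and other interval methods exist that behave better for small counts.

```python
import math
from statistics import NormalDist

def diff_confidence_interval(x1, n1, x2, n2, level=0.95):
    """Wald confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for 95%
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

# 1.2% vs 0.9% response in two segments of 5000
low, high = diff_confidence_interval(60, 5000, 45, 5000)
print(f"95% interval for the difference: {low:.4%} to {high:.4%}")
```

If the lower limit is below zero, as it is here, the data are compatible with no improvement or even a small drop, which is the worst case the decision has to allow for.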
In an ideal world, you would ensure via proper statistical sample size calculation that a difference that is important from a business perspective coincides with one that is statistically significant.<br />
<br />
Jaap Vink (2009-09-09 12:21 UTC):<br />
The difference in response you mention (1.2 vs 0.9) could be a fluke of random events, and you'd be gambling your company's money. Statistics will give you more confidence in making your decision. If you have two segments of 5000 consumers with these response rates and you perform a Z test, you'll find that the confidence level of the difference is 91.5%. This at least tells you that you've decided to treat the difference as significant while a difference this large would still turn up about 8.5% of the time through randomness alone.
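
The Z test described above can be sketched in a few lines of Python. This is a minimal sketch, assuming a pooled two-proportion z test and converting the rates to counts (60 and 45 responders out of 5000 each). The function name is hypothetical. The exact figure depends on the variant of the test (one- or two-sided, pooled or unpooled, continuity correction), so the output need not match the 91.5% quoted above.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z test for H0: p1 == p2.

    Returns the z statistic and the one- and two-sided p-values.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF evaluated at |z| via the error function
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_one_sided = 1 - cdf
    p_two_sided = 2 * p_one_sided
    return z, p_one_sided, p_two_sided

# 1.2% vs 0.9% response in two segments of 5000 consumers
z, p_one, p_two = two_proportion_z_test(60, 5000, 45, 5000)
print(f"z = {z:.3f}")
print(f"one-sided p = {p_one:.3f} (confidence {1 - p_one:.1%})")
print(f"two-sided p = {p_two:.3f} (confidence {1 - p_two:.1%})")
```

Reporting which variant was used alongside the confidence level makes the result easier for others to reproduce.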