<p>The standard approach is to have at least a 2% response rate in your data. You can boost the response rate here, e.g. by oversampling: you repeat the rows of responders until the response rate in the modelling data set reaches a workable level.</p>
<p>or</p>
<p>You can also use decision tree techniques to identify a segment of customers that consists mostly of non-responders and leave it out of the model, which will likewise increase your response rate.</p>
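<p>To make the oversampling idea concrete, here is a minimal sketch in plain Python. The function names, the <code>is_responder</code> callback and the 2% target are assumptions chosen for illustration, not part of any particular package. Responder rows are repeated (sampled with replacement) until they reach the target share; non-responders are left untouched.</p>

```python
import math
import random


def response_rate(rows, is_responder):
    """Share of rows that are responders."""
    return sum(1 for r in rows if is_responder(r)) / len(rows)


def oversample_responders(rows, is_responder, target_rate, seed=0):
    """Repeat responder rows until they make up at least target_rate
    of the returned data set. Non-responders are kept as they are."""
    if not 0 < target_rate < 1:
        raise ValueError("target_rate must be strictly between 0 and 1")
    responders = [r for r in rows if is_responder(r)]
    non_responders = [r for r in rows if not is_responder(r)]
    if not responders:
        raise ValueError("no responders to oversample")

    # Solve k / (k + n_non) >= target_rate for the responder count k.
    needed = math.ceil(target_rate * len(non_responders) / (1 - target_rate))
    extra = max(0, needed - len(responders))

    rng = random.Random(seed)  # fixed seed so the result is reproducible
    repeated = [rng.choice(responders) for _ in range(extra)]
    return non_responders + responders + repeated


if __name__ == "__main__":
    # Hypothetical data: 1 = responder, 0 = non-responder, 1% response rate.
    data = [1] * 10 + [0] * 990
    boosted = oversample_responders(data, lambda r: r == 1, target_rate=0.02)
    print(response_rate(data, lambda r: r == 1))
    print(response_rate(boosted, lambda r: r == 1))
```

<p>Keep in mind that a logit model fitted on oversampled data predicts probabilities for the oversampled mix, not for the original population, so the predicted probabilities (mainly through the intercept) need to be adjusted back before you use them as real response rates.</p>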
<p>Hi,</p>
<p>I am currently studying a binary economic dependent variable using logit regression. My data set is large (N = 340,000), but the "yes" cases are only about 1% of the data, so the goodness of fit of my model is very low, mainly because of that. Could you please help me understand whether there are other binary regression models I should use to obtain better results? Or do you think I should transform my data, or do you have any other idea for this type of situation?…</p>
Hi -<br />
<br />
Probit assumes a normal distribution for the underlying error term, while logit assumes a logistic distribution. The reason the results are similar is that the sample size you use is "large" - do the same thing with a smaller data set and you'll see a distinct difference.
Hi Matt,<br />
<br />
Thanks for your reply. It took me a while to answer because I was trying to understand it. I won't say I've understood everything, but it helped me clear up a lot of points. Thank you for all your help.<br />
<br />
Regards.<br />
Arijit
Arijit<br />
<br />
In their raw form, all of your observations are either 0 or 1, which are discrete groups, so my statement concerning "data in the tails" may not make immediate sense. However, what the probit/logit models actually do is to model a continuous probability of group membership, using one of those two sigmoid curves. Hence, for an individual observation, the model will return a value somewhere between 0 and 1, which lies somewhere on that curve.<br />
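As a small illustration of those two sigmoid curves, here is a sketch using only the Python standard library. The scaling constant 1.702 is a commonly used approximation for lining the logistic curve up with the normal one; it is an assumption for this comparison, not something estimated from your data, and the function names are made up for the example.<br />

```python
import math


def logistic_cdf(x):
    """Response curve used by the logit model."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x):
    """Response curve used by the probit model (standard normal CDF)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Assumed rescaling so both curves are on a comparable scale:
# logistic_cdf(SCALE * x) is close to normal_cdf(x).
SCALE = 1.702


def max_gap(xs):
    """Largest absolute difference between the two curves on the grid xs."""
    return max(abs(logistic_cdf(SCALE * x) - normal_cdf(x)) for x in xs)


def tail_ratio(x):
    """How many times larger the logit probability is than the probit one."""
    return logistic_cdf(SCALE * x) / normal_cdf(x)


if __name__ == "__main__":
    grid = [i / 100.0 for i in range(-600, 601)]
    print("largest absolute gap:", max_gap(grid))
    for x in (-1.0, -2.0, -3.0, -4.0):
        print(x, logistic_cdf(SCALE * x), normal_cdf(x), tail_ratio(x))
```

Across the whole range the two curves differ by less than one percentage point in absolute terms, but far out on either side the logistic curve approaches 0 and 1 more slowly than the normal one, so the relative difference between the two probabilities grows.<br />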
<br />
By "tails" I mean the part of the probability…
Hi Matt,<br />
<br />
Thank you for your help. Let me give you the details of the model I was trying to build. I work as a web analyst, and for one of my clients I wanted to know which factors drive "higher" sales of a product. So, first I split the products sold into 2 segments based on the $ Index value calculated by Google Analytics. A higher-than-average $ Index value is termed a success and marked as 1, and a lower-than-average value is termed a failure and marked as 0. So when I was…
Arijit<br />
<br />
I'm not sure what you mean by "my dependent variable is dummy". Could you add some clarification?<br />
<br />
In my experience, the logit and probit models tend to produce extremely similar results and you usually need a lot of data in the tails to notice a difference in fit (if you superimpose the response curves from the two models you will see that they are almost identical). The difference in application of the two approaches is mostly down to which has historically been used in the particular…
