<p>All Discussions Tagged 'logistic' - AnalyticBridge (2020-06-04)</p>
<p><strong>Preparing categorical data with a large amount of categories</strong><br>Posted 2015-02-12 by <a href="https://www.analyticbridge.datasciencecentral.com/profile/Alex995">Alex</a></p>
<p>Hi all,</p>
<p>I have data containing a few categorical columns, each with a very large number of categories (more than 1,000 distinct values per column). I have to build a predictive model on this data using logistic regression (I cannot use a model that handles categorical data as is, such as Random Forest or Naïve Bayes).</p>
<p>Applying the standard 1-to-N encoding, which turns each categorical value into a 0-1 vector, produces a very high-dimensional feature space and makes the algorithm run very slowly, so I cannot use that encoding here.</p>
<p>Does anybody know a way to transform categorical data with a large number of categories so that distance-based methods can handle it properly?</p>
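<p>To make the question concrete, here is a minimal sketch of one possible direction, the hashing trick, which maps every "column=value" pair into a fixed number of buckets and stores only the non-zero entries. This is an illustration and not something from the original post: the names <code>hash_bucket</code> and <code>hash_encode</code>, the bucket count of 1024, and the sample row are all assumptions, and different categories can collide in the same bucket.</p>

```python
import hashlib


def hash_bucket(token: str, n_buckets: int) -> int:
    """Map a string to a stable bucket index in [0, n_buckets)."""
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def hash_encode(row: dict, n_buckets: int = 1024) -> dict:
    """Encode a row of categorical columns as a sparse {index: value} dict.

    Each "column=value" pair is hashed into one of n_buckets positions,
    so the feature dimension stays fixed no matter how many distinct
    categories exist. Collisions are possible and simply add up.
    """
    features = {}
    for column, value in row.items():
        index = hash_bucket(f"{column}={value}", n_buckets)
        features[index] = features.get(index, 0.0) + 1.0
    return features


# Hypothetical row with two high-cardinality columns.
row = {"city": "Springfield", "product": "SKU-00917"}
encoded = hash_encode(row, n_buckets=1024)
```

<p>The sparse dictionary holds at most one entry per categorical column, so the cost per row depends on the number of columns, not on the number of distinct categories.</p>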
<p>Thanks in advance!</p>
<p><strong>Payment projection scorecard</strong><br>Posted 2012-05-03 by <a href="https://www.analyticbridge.datasciencecentral.com/profile/ManishaSadhwani">Janvi</a></p>
<p>Hi Dear fellow members,</p>
<p>I am working on a payment projection scorecard for the Collections team. I want to model a continuous outcome by splitting the observed % payment received for each account into one event record and one non-event record with suitable weights: the proportion recovered would be the weight for the event, and the proportion not recovered (1 minus the proportion recovered) would be the weight for the non-event.</p>
<p>Would PROC LOGISTIC with the WEIGHT statement be a good option, or should I consider PROC SURVEYLOGISTIC? I am not using any complex survey data. The standard errors and chi-square values from the two procedures are remarkably different, and I am not sure how to proceed.</p>
<p>Please advise if any of you have any insights into this. </p>
<p>Many thanks in advance!!</p>
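<p>As a rough illustration of the event/non-event split described in this post, the sketch below expands each account into two weighted rows and evaluates the weighted log-likelihood that a weighted logistic fit would maximize. It is a sketch under stated assumptions, not the poster's actual setup: the function names, the <code>balance</code> feature, and the constant predictor are hypothetical.</p>

```python
import math


def expand_to_weighted_rows(accounts):
    """Turn (features, proportion_recovered) pairs into weighted binary rows.

    Each account yields an event row (outcome 1, weight = proportion) and a
    non-event row (outcome 0, weight = 1 - proportion).
    """
    rows = []
    for features, proportion in accounts:
        if not 0.0 <= proportion <= 1.0:
            raise ValueError("proportion recovered must lie in [0, 1]")
        rows.append((features, 1, proportion))
        rows.append((features, 0, 1.0 - proportion))
    return rows


def weighted_log_likelihood(rows, predict):
    """Sum of weight * log-likelihood over the expanded rows.

    `predict` is any function returning P(event) strictly between 0 and 1.
    """
    total = 0.0
    for features, outcome, weight in rows:
        p = predict(features)
        total += weight * math.log(p if outcome == 1 else 1.0 - p)
    return total


# Hypothetical account that repaid 25% of the amount due.
accounts = [({"balance": 1000.0}, 0.25)]
rows = expand_to_weighted_rows(accounts)
log_lik = weighted_log_likelihood(rows, predict=lambda features: 0.25)
```

<p>For a single account with proportion r, the two rows contribute r·log(p) + (1 − r)·log(1 − p), and because the two weights sum to 1, the total weight equals the number of accounts.</p>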
<p><strong>How to reduce high concordance (more than 85) in logistic regression model?</strong><br>Posted 2010-08-03 by <a href="https://www.analyticbridge.datasciencecentral.com/profile/BiswajitPal">Biswajit Pal</a></p>
<p>Hi</p>
<p>I am getting a very high concordance in one of my logistic regression models.</p>
<p>Can anybody explain what effect this has on the model, or why a very high concordance is not recommended, and what steps I should follow to bring it back down to 65-70?</p>
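<p>For readers unfamiliar with the measure, here is a minimal sketch of how percent concordant is commonly defined: over all (event, non-event) pairs, the share in which the event receives the higher predicted probability. The function name and the toy data are assumptions for illustration, and statistical packages may differ in how they handle ties.</p>

```python
def concordance(y_true, scores):
    """Percent concordant, discordant and tied over all event/non-event pairs."""
    events = [s for y, s in zip(y_true, scores) if y == 1]
    non_events = [s for y, s in zip(y_true, scores) if y == 0]
    total = len(events) * len(non_events)
    if total == 0:
        raise ValueError("need at least one event and one non-event")

    concordant = discordant = tied = 0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1   # event scored higher: pair is concordant
            elif e < n:
                discordant += 1   # non-event scored higher: pair is discordant
            else:
                tied += 1
    return {
        "percent_concordant": 100.0 * concordant / total,
        "percent_discordant": 100.0 * discordant / total,
        "percent_tied": 100.0 * tied / total,
    }


# Toy example: two events, two non-events.
result = concordance([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

<p>In the toy data, three of the four pairs are ordered correctly, so the percent concordant is 75.</p>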
<p>Thanks a lot in advance!</p>
<p>Biswajit</p>
<p><strong>Oversampling/Undersampling in Logistic Regression</strong><br>Posted 2010-06-23 by <a href="https://www.analyticbridge.datasciencecentral.com/profile/datalligence">DataLLigence</a></p>
<p>Most people use logistic regression for modeling response, attrition, risk, etc. And in the world of business, these are usually rare occurrences.</p>
<p> </p>
<p>One widely accepted practice is oversampling or undersampling to model these rare events. Some time back, I was working on a campaign response model using logistic regression. After getting frustrated with the model performance/accuracy, I used weights to oversample the responders. I remember clearly that I got the same or a very similar model.</p>
<p> </p>
<p>According to Gordon Linoff and Michael Berry's <strong><a href="http://blog.data-miners.com/">blog</a></strong>:</p>
<p> </p>
<p>"Standard statistical techniques are insensitive to the original density of the data. So, a logistic regression run on oversampled data should produce essentially the same model as on the original data. It turns out that the confidence intervals on the coefficients do vary, but the model remains basically the same."</p>
<p> </p>
<p>But everyone seems to extol or recommend oversampling/undersampling for modeling rare events using logistic regression. What are your experiences and opinions on this?</p>
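<p>One way to see why the model stays "basically the same": when sampling depends only on the outcome, the slope coefficients are essentially unaffected and the main change is a shift in the intercept, which can be undone on the logit scale. The sketch below applies that standard prior correction to a predicted probability. It is an illustration under that assumption, and the function name and the example rates are hypothetical.</p>

```python
import math


def correct_probability(p_sample: float, sample_rate: float, population_rate: float) -> float:
    """Map a probability from a model fit on resampled data back to the population scale.

    sample_rate is the event share in the modeling sample and population_rate
    is the event share in the original population. The correction subtracts
    the log of the ratio of sample odds to population odds from the logit.
    """
    logit = math.log(p_sample / (1.0 - p_sample))
    offset = math.log(
        (sample_rate / (1.0 - sample_rate))
        * ((1.0 - population_rate) / population_rate)
    )
    return 1.0 / (1.0 + math.exp(-(logit - offset)))


# Hypothetical case: responders oversampled to 50% from a 1% population rate.
adjusted = correct_probability(0.5, sample_rate=0.5, population_rate=0.01)
```

<p>When the sample rate equals the population rate, the offset is zero and the probability is returned unchanged.</p>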
<p> </p>
<p>Regards,</p>
<p><a href="http://datalligence.blogspot.com/">Datalligence</a></p>