Creating a big data set from a small data set for logistic regression - AnalyticBridge (updated 2020-04-03T21:18:51Z)
https://www.analyticbridge.datasciencecentral.com/forum/topics/creating-a-big-data-set-from-a-small-data-set-for-logistic?feed=yes&xn_auth=no
Chandrasekhara S. "C.S." Ganti, 2013-07-26 15:09:41 UTC (https://www.analyticbridge.datasciencecentral.com/profile/ChandrasekharaSCGanti)
<p>Ratheen, you have raised an interesting topic.</p>
<p>Tejamoy, is this proposed solution usable across many problems, not necessarily predictive maintenance ones, e.g. in insurance? Especially for rare events (for example, in natural hazard modeling the frequency may be 1-5 storms during a certain period) and for the severity of losses (given a distribution of loss sizes from history).</p>
<p>Similarly, in health care risk assessment for certain diseases, e.g. the "Framingham Heart Study": a logistic regression application estimating odds ratios for coronary problems based on various risk factors (age, family history, etc.). Thanks.</p>
ratheen chaturvedi, 2013-07-26 09:18:27 UTC (https://www.analyticbridge.datasciencecentral.com/profile/ratheenchaturvedi)
<p>It does! Thanks a ton.</p>
Tejamoy Ghosh, 2013-07-25 16:46:13 UTC (https://www.analyticbridge.datasciencecentral.com/profile/TejamoyGhosh)
<p>In that case, here is what you can do.</p>
<p>Create a linear combination of the variables (predictors), say LC = a + b1*x1 + ... + bN*xN,</p>
<p>where a, b1, ..., bN are known numbers that you choose (as opposed to parameters to be estimated). For example:</p>
<p>LC = 12.64 + 0.32*x1 + ... - 0.987*xN</p>
<p>Create ELC = exp(LC)/[1 + exp(LC)].</p>
<p>Then create the binary dependent variable as follows:</p>
<p>If ELC < 0.4, then Y = 0.</p>
<p>Else if ELC > 0.6, then Y = 1.</p>
<p>Else if a random variable drawn from the Uniform(0, 1) distribution is > 0.05, then Y = 1.</p>
<p>Else Y = 0.</p>
<p>(Use the appropriate uniform random-number generator for whichever software you are using.)</p>
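Because ELC is the logistic transform of LC, and that transform is increasing, the two probability cut-offs in the rule correspond to cut-offs on LC itself. This may help when choosing a and b1, ..., bN so that a reasonable share of observations lands on each side:

```latex
\mathrm{ELC} = \frac{e^{\mathrm{LC}}}{1 + e^{\mathrm{LC}}} = \frac{1}{1 + e^{-\mathrm{LC}}},
\qquad
\mathrm{LC} = \ln\frac{\mathrm{ELC}}{1 - \mathrm{ELC}}

\mathrm{ELC} < 0.4 \iff \mathrm{LC} < \ln\tfrac{0.4}{0.6} \approx -0.405,
\qquad
\mathrm{ELC} > 0.6 \iff \mathrm{LC} > \ln\tfrac{0.6}{0.4} \approx 0.405
```

So only observations with LC between about -0.405 and 0.405 reach the uniform-random step.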
<p>Now, with simulated values of x1, ..., xN, you can create a dataset of any size.</p>
<p>Does this make sense?</p>
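A minimal standard-library Python sketch of the recipe above. The coefficients (A, B) and predictor ranges (RANGES) are hypothetical placeholders, and drawing each predictor uniformly within its range is an assumption (ratheen only mentioned knowing the ranges). The labelling function follows the post's rule literally, including the 0.05 cut-off for the middle band, which sends roughly 95% of observations with 0.4 <= ELC <= 0.6 to Y = 1; the mid_cut parameter makes that easy to change.

```python
import math
import random

# Hypothetical "known numbers" a, b1, b2 (illustration only, two predictors).
A = 0.5
B = [0.32, -0.987]

# Hypothetical (low, high) range for each predictor.
RANGES = [(0.0, 10.0), (0.0, 5.0)]


def logistic(lc):
    """ELC = exp(LC) / (1 + exp(LC)), computed without overflow."""
    if lc >= 0:
        return 1.0 / (1.0 + math.exp(-lc))
    z = math.exp(lc)
    return z / (1.0 + z)


def label(elc, u, low=0.4, high=0.6, mid_cut=0.05):
    """Thresholding rule from the post; u is a Uniform(0, 1) draw."""
    if elc < low:
        return 0
    if elc > high:
        return 1
    return 1 if u > mid_cut else 0


def simulate(n, rng):
    """Return n (predictors, Y) rows."""
    rows = []
    for _ in range(n):
        # Assumption: each predictor is uniform within its known range.
        x = [rng.uniform(lo, hi) for lo, hi in RANGES]
        lc = A + sum(b * xi for b, xi in zip(B, x))
        rows.append((x, label(logistic(lc), rng.random())))
    return rows


if __name__ == "__main__":
    data = simulate(10_000, random.Random(42))
    print(len(data), "rows,", sum(y for _, y in data), "with Y = 1")
```

Since every row is generated independently, n can be as large as needed. A common alternative for the middle band, not what the post describes, is to set Y = 1 whenever a uniform draw falls below ELC, which makes Y a Bernoulli(ELC) variable everywhere.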
ratheen chaturvedi, 2013-07-25 07:31:24 UTC (https://www.analyticbridge.datasciencecentral.com/profile/ratheenchaturvedi)
<p><a href="http://www.analyticbridge.com/forum/topic/listForContributor?user=2xccj21rvzo48" class="fn url">Tejamoy</a>, sorry for the misleading opening statement. I do not have a small data set, only the ranges of the predictors. I need to assign 0/1 labels and simulate a predictive maintenance problem so that I can use a classification method such as logistic regression or a decision tree. Any ideas on how to generate the fault (0/1) column of my data set?</p>
Tejamoy Ghosh, 2013-07-25 07:20:59 UTC (https://www.analyticbridge.datasciencecentral.com/profile/TejamoyGhosh)
<p>One possible way to simulate values for the dependent variable is to use a conditional distribution estimated from the small data set you have. This somewhat extends Ralph's recommended method of using a suitable joint distribution to simulate values for the predictors.</p>
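The conditional-distribution idea could be sketched as below for a binary target, assuming a logistic regression has already been fitted to the small data set. INTERCEPT and COEFS are hypothetical stand-ins for those estimates, and the standard-normal predictors are placeholders for whatever simulated values you have (e.g. from a joint distribution, as Ralph suggested). Each label is drawn from Bernoulli(p(x)), the model's estimated distribution of Y given x; the prediction-plus-error-term variant in the next paragraph is a different way to add noise.

```python
import math
import random

# Hypothetical coefficients, as if estimated by a logistic regression fitted
# on the small data set (illustrative values only).
INTERCEPT = -1.0
COEFS = [0.8, -0.5]


def predicted_probability(x):
    """Estimated P(Y = 1 | x) under the fitted logistic model."""
    eta = INTERCEPT + sum(b * xi for b, xi in zip(COEFS, x))
    return 1.0 / (1.0 + math.exp(-eta))


def draw_label(x, rng):
    """Draw Y from the estimated conditional distribution Bernoulli(p(x))."""
    return 1 if rng.random() < predicted_probability(x) else 0


rng = random.Random(0)
# Placeholder simulated predictors (standard normal, an assumption).
xs = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(5000)]
ys = [draw_label(x, rng) for x in xs]
```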
<p>Once you have a model built on the small data set and a set of simulated values for the independent variables, predict the values/probabilities of the dependent variable and add an error term (which may be drawn from an iid Normal(0, 0.1) distribution). I have used this method to create datasets for POC/R&D/training projects involving many different types of generalized linear models.</p>
ratheen chaturvedi, 2013-07-25 04:57:08 UTC (https://www.analyticbridge.datasciencecentral.com/profile/ratheenchaturvedi)
<p>Thanks for your reply, Ralph. Just a correction: I know the ranges but do not have a small data set. By the way, this is a predictive maintenance problem. I can use <span>mvrnorm for the predictors (sensors), as you suggested. But</span> how do I assign the target variable (1/0) once I have generated normally distributed predictors? Any ideas? Or should I take the higher ranges of the sensor values, randomly generate 1s and 0s there, and assign 0 to the rest?</p>
Ralph Winters, 2013-07-25 02:16:33 UTC (https://www.analyticbridge.datasciencecentral.com/profile/RalphWinters)
<p>Yes, through simulation. You first have to compute the means of all of the variables, as well as the correlation or covariance matrix. What you do next depends on your software. For example, if you assume a multivariate normal distribution, you can use the R <span>mvrnorm function (MASS package) to generate as many samples as you like.</span></p>
<p><a rel="nofollow" href="http://stat.ethz.ch/R-manual/R-devel/library/MASS/html/mvrnorm.html" style="font-size: 13px;">http://stat.ethz.ch/R-manual/R-devel/library/MASS/html/mvrnorm.html</a></p>
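Ralph's suggestion uses R's MASS::mvrnorm(n, mu, Sigma). As a rough standard-library Python analogue (not the MASS implementation), the sketch below estimates the mean vector and covariance matrix from a small sample and then draws new vectors as mu + L z, where L is a Cholesky factor of the covariance and z holds independent standard normals. The small_sample values are hypothetical.

```python
import random
import statistics


def mean_and_cov(rows):
    """Sample mean vector and covariance matrix (n - 1 denominator)."""
    n, k = len(rows), len(rows[0])
    mu = [statistics.fmean(r[j] for r in rows) for j in range(k)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(k)] for i in range(k)]
    return mu, cov


def cholesky(a):
    """Lower-triangular L with L L^T = a; a must be positive definite."""
    k = len(a)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][i] = d ** 0.5
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_mvn(mu, cov, n, rng):
    """Draw n vectors from N(mu, cov) as mu + L z with z ~ N(0, I)."""
    L = cholesky(cov)
    k = len(mu)
    draws = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        draws.append([mu[i] + sum(L[i][m] * z[m] for m in range(i + 1))
                      for i in range(k)])
    return draws


if __name__ == "__main__":
    # Hypothetical small sample: three sensor readings per observation.
    small_sample = [
        [20.1, 0.51, 101.0], [22.4, 0.58, 99.5], [19.8, 0.47, 102.3],
        [23.0, 0.62, 98.7], [21.2, 0.55, 100.4], [20.7, 0.49, 101.8],
    ]
    mu, cov = mean_and_cov(small_sample)
    big = sample_mvn(mu, cov, 100_000, random.Random(7))
    print(len(big), "simulated rows")
```

The covariance must have more observations than variables (and no exact collinearity) to be positive definite; otherwise cholesky raises a ValueError.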
<p>Another way to do it would be to add a weighting variable that represents the number of occurrences of each sample observation. Most stat packages can handle this.</p>
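A tiny illustration of the weighting idea, with hypothetical numbers: for frequency weights, the weighted log-likelihood of a logistic model equals the log-likelihood of the data set in which each row is physically repeated w times, so the fitted coefficients are the same. The sample and the coefficients b0, b1 below are placeholders, not fitted values.

```python
import math

# Hypothetical small sample: (x, y, frequency weight).
sample = [(0.5, 0, 3), (1.5, 1, 1), (2.5, 1, 2)]
b0, b1 = -1.0, 0.8  # hypothetical coefficients at which to evaluate the fit


def bernoulli_loglik(x, y):
    """Log-likelihood of one observation under a logistic model."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return math.log(p) if y == 1 else math.log(1.0 - p)


# Weighted version: each row's contribution multiplied by its weight.
weighted = sum(w * bernoulli_loglik(x, y) for x, y, w in sample)

# Expanded version: each row repeated w times, then summed unweighted.
expanded = [(x, y) for x, y, w in sample for _ in range(w)]
unweighted = sum(bernoulli_loglik(x, y) for x, y in expanded)
```

Because the two log-likelihoods agree for every (b0, b1), maximising either one gives the same estimates.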
<p>But since you are framing this as a "big data" problem, it sounds like using simulation to generate the actual raw data may be the better way to go.</p>