All Discussions Tagged 'Regression' - AnalyticBridge2019-09-18T11:28:05Zhttps://www.analyticbridge.datasciencecentral.com/forum/topic/listForTag?tag=Regression&feed=yes&xn_auth=noEasy to use tool for estimating probability of generating a saletag:www.analyticbridge.datasciencecentral.com,2016-03-07:2004291:Topic:3429992016-03-07T00:36:31.581ZDavid Collinshttps://www.analyticbridge.datasciencecentral.com/profile/DavidCollins
<p>Hi - I am trying to find a good tool (one that requires minimal additional effort) that will help me generate a probability of a sale for a list of 300,000 products. I have attached a sample of the data, with 20,000 records.</p>
<p>Basically, I have a table of historical sales data (about 300,000 records) that contains around 8 continuous independent variables along with a dependent variable that has a yes/no (i.e., binary) value indicating whether each product in the list has had a sale in the past 12 months.</p>
<p>The historical data essentially looks like this.</p>
<p>Product 1, 2, 3, etc.<br/>Variable 1<br/>Variable 2<br/>Variable 3<br/>Variable 4<br/>Variable 5<br/>Variable 6<br/>Variable 7<br/>Variable 8<br/><b>Sold in past 12 months</b> (Yes or No)</p>
<p>The last variable in the list is of course the dependent variable.</p>
<p>All I want is a tool that is the easiest to use for assigning a probability to each product in the list, essentially letting me condense my list to the products with the highest likelihood of generating a sale, so that I can list those products instead of the ones with a lower probability of generating a sale.</p>
<p>Ideally, the tool could do a quick logistic regression, or some other probability calculation based on the available variables, and thereby give me an (RVU-like) number (perhaps a probability ranging from 0 to 1) for each product, allowing me to quickly select the top 50,000 products to list on a website, since they have the highest probability of generating a sale according to the available variables.</p>
<p>I am of course assuming that the variables are somehow correlated to the outcome, but perhaps the tool will help me determine that.</p>
<p>Does anyone have any suggestions of a good tool to accomplish this? I would presume that there is a simple way to set this up in Microsoft Excel, but if not, then a piece of software that does this would of course be great too.</p>
<p>Or, feel free to review the actual sample data set, to help me understand how best to approach analyzing the data, and whether I should eliminate certain variables from the results. </p>
<p>See attached file. </p>
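For what it's worth, the underlying calculation is simple enough that almost any stats tool can do it. Here is a minimal, self-contained Python sketch of the idea (the single made-up variable and toy data stand in for the real 8-variable file): fit a logistic regression, score every product, and keep the top-scoring ones.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Plain gradient-descent logistic regression (no regularization)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * d
        gb = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def score(X, w, b):
    """Predicted probability of a sale for each product row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

# Toy data: one informative variable; label is 1 when it is large.
random.seed(0)
X = [[random.uniform(0, 1)] for _ in range(200)]
y = [1 if xi[0] > 0.5 else 0 for xi in X]

w, b = fit_logistic(X, y)
probs = score(X, w, b)

# Rank products by probability and keep the top ones to list.
top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:50]
```

In practice Excel's Solver, R's glm(), or any point-and-click stats package fits the same model; the "condense the list" step is then just a sort on the predicted probabilities.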
<p>Thanks for any suggestions.</p> Queries in modelingtag:www.analyticbridge.datasciencecentral.com,2015-12-29:2004291:Topic:3385312015-12-29T14:24:55.267ZRaghu Chittarihttps://www.analyticbridge.datasciencecentral.com/profile/RaghuChittari
<p>Hi all,</p>
<p>I am from an engineering background and need your help with certain modeling concepts. Your help would be greatly appreciated!</p>
<p>Here are my questions:</p>
<ol>
<li>If a variable that is important from a business standpoint has a p-value of 0.5, should it be considered in the model? If yes, wouldn't it make the model coefficients unstable?</li>
<li>Should I standardize the variables before building a logistic regression model? If yes, is there a commonly followed approach?</li>
<li>I am planning to develop a logistic regression to rate employees as good or bad. The model includes variables such as innovation score, number of papers published, salary, training cost, etc. The first two are assets to the company and the next two are liabilities. Should I explicitly make the model understand this by entering the liabilities as negative values?</li>
<li>I have two independent variables in my LR model. Var1 has levels 'A' and 'B'. Var2 has levels 'X' and 'Y'. Of the entire dataset, 30% of observations have Var1 as 'A' and Var2 as 'X', 35% have Var1 as 'A' and Var2 as 'Y', 30% have Var1 as 'B' and Var2 as 'X', and 5% have Var1 as 'B' and Var2 as 'Y'. The number of observations with Var1 as 'B' and Var2 as 'Y' is far lower than for the other combinations. Is this skewness in the data going to affect my results? If so, how should I rectify it?</li>
</ol> Random Forests vs MARS vs Linear regressiontag:www.analyticbridge.datasciencecentral.com,2014-03-13:2004291:Topic:2907282014-03-13T11:53:05.800ZOrestis Chrysafishttps://www.analyticbridge.datasciencecentral.com/profile/OrestisChrysafis
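On the standardization question (item 2) in the list above: z-scoring each continuous predictor (subtract its mean, divide by its standard deviation) is the commonly followed approach. It does not change the predicted probabilities of a logistic model, but it makes coefficient magnitudes comparable and optimization better behaved. A minimal sketch:

```python
import math

def zscore_columns(X):
    """Standardize each column to mean 0 and standard deviation 1 (z-score)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]

# Two predictors on very different scales become comparable after z-scoring.
X = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
Z = zscore_columns(X)
```

On item 3, note that there is usually no need to enter liabilities as negative values: the fitted coefficient signs capture whether a variable pushes the outcome up or down.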
<p>Hi all, I would like to get the group's view on the advantages and disadvantages of Random Forests and MARS modelling vs Linear regression. It would be interesting to compare them both at a statistical principles level, but also in their usefulness to econometrics.</p> Techniques to address very low event rate for Logistic Regression Modeltag:www.analyticbridge.datasciencecentral.com,2013-10-25:2004291:Topic:2777712013-10-25T09:52:58.645ZHimanshu Sinhahttps://www.analyticbridge.datasciencecentral.com/profile/HSINHA
<p>Hi folks,</p>
<p>I am looking at data from a telecom company and developing a model to predict an event (read: churn).</p>
<p>I am planning to develop a GLM using a logit link function.</p>
<p>The real problem I am facing in the data is a very low volume (1.6%) of churners.</p>
<p>So I am seeking advice on the following:</p>
<p>- What are the possible (bad) outcomes if I take a randomized training sample consisting of just 1.6% churners?</p>
<p>- Should I weight the training sample to have an event rate >25%?</p>
<p>- Are there any other techniques to address such a low event rate?</p>
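One common answer to the weighting question is to oversample churners (or weight them) up to something like a 25-50% event rate for fitting, and then shift the fitted intercept back toward the true 1.6% base rate so the predicted probabilities stay calibrated (prior correction). A sketch of that correction step; the numbers here are purely illustrative:

```python
import math

def corrected_intercept(b0_sample, rate_sample, rate_true):
    """Adjust a logistic intercept fitted on an oversampled training set
    back to the population event rate (prior/offset correction).
    Slope coefficients are unaffected by case-control style sampling."""
    return b0_sample - math.log(
        (rate_sample / (1 - rate_sample)) / (rate_true / (1 - rate_true))
    )

# Model fitted on a 25% event-rate sample; the population rate is 1.6%.
b0 = corrected_intercept(-1.0, 0.25, 0.016)
```

Without this correction, a model trained on an enriched sample will systematically overstate every customer's churn probability, even though the ranking of customers is unchanged.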
<p></p>
<p>Regards,</p>
<p>HV</p>
 Understanding the Kalman Filter Application in Economic Time Series Datatag:www.analyticbridge.datasciencecentral.com,2013-04-25:2004291:Topic:2430742013-04-25T13:37:53.300ZArunhttps://www.analyticbridge.datasciencecentral.com/profile/Arun
<div class="discussion"><div class="description"><div class="xg_user_generated"><p>The Kalman filter has been used extensively in science for many applications, from tracking missile targets to almost any changing system whose state can be learned.</p>
<p>I'm trying to understand how the Kalman filter can be applied to time series data with exogenous variables - in a nutshell, trying to replicate PROC UCM in Excel.</p>
<p>State-space equations:</p>
<p><img alt="Kalman - equation 1" src="http://bilgin.esme.org/Portals/0/images/kalman/equation1.gif" border="0" height="23" width="215"/></p>
<p><img alt="Kalman - equation 2" src="http://bilgin.esme.org/Portals/0/images/kalman/equation2.gif" border="0" height="30" width="115"/></p>
<p>To those familiar with the Kalman filter, it essentially consists of the following two steps,</p>
<p>Predict:</p>
<p><img alt="Kalman Filter - Time Update Equations" src="http://bilgin.esme.org/Portals/0/images/kalman/time_update_equations.gif" style="width: 177px; height: 86px; border-width: 0px; border-style: solid;"/></p>
<p>Update:</p>
<p><img alt="Kalman Filter - Measurement Update Equations" src="http://bilgin.esme.org/Portals/0/images/kalman/measurement_update_equations.gif" style="width: 233px; height: 142px; border-width: 0px; border-style: solid;"/></p>
<p>Most texts on the Kalman filter only introduce univariate analysis, with no exogenous variables, and most applications in control engineering seem to suit that as well.</p>
<p>What I'm stuck figuring out is:</p>
<p>1. How can I update the H matrix with every observation? MMSE or ML could help me do this, but I'm unable to do it with only one observation - the problem of recursive estimation from a single observation, if I could put it that way.</p>
<p>2. How can I bring in the estimation of the betas of the exogenous variables that also affect the Y variable? Otherwise I'm going to be understating the latent state variable as just a constant base or linear trend.</p>
<p>Any help would be greatly appreciated, and if you have some good docs/sites that explain this better for the econometrician, please do pass it on.</p>
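On question 2, one standard trick is to augment the state vector with the regression betas themselves: the observation row becomes H_t = [1, x_t], so the filter estimates the level and the beta recursively from each single observation. This also answers question 1: H changes every step, but it is known from the data, not re-estimated. A toy scalar sketch (one exogenous variable; all noise variances are made up):

```python
# State s = [level, beta]; observation y_t = level_t + beta * x_t + noise.
# The transition is identity (local level, constant beta), so a single new
# observation updates both components through the Kalman gain.

def kalman_step(s, P, y, x, q_level=0.01, r=1.0):
    # Predict: identity transition; process noise enters the level only.
    P = [[P[0][0] + q_level, P[0][1]],
         [P[1][0], P[1][1]]]
    # Update: observation row H = [1, x_t] changes each step but is known.
    H = [1.0, x]
    y_hat = s[0] * H[0] + s[1] * H[1]
    PHt = [P[0][0] * H[0] + P[0][1] * H[1],
           P[1][0] * H[0] + P[1][1] * H[1]]
    S = H[0] * PHt[0] + H[1] * PHt[1] + r   # innovation variance
    K = [PHt[0] / S, PHt[1] / S]            # Kalman gain
    innov = y - y_hat
    s = [s[0] + K[0] * innov, s[1] + K[1] * innov]
    # Covariance update: P = (I - K H) P.
    P = [[(1 - K[0] * H[0]) * P[0][0] - K[0] * H[1] * P[1][0],
          (1 - K[0] * H[0]) * P[0][1] - K[0] * H[1] * P[1][1]],
         [-K[1] * H[0] * P[0][0] + (1 - K[1] * H[1]) * P[1][0],
          -K[1] * H[0] * P[0][1] + (1 - K[1] * H[1]) * P[1][1]]]
    return s, P

# Feed it exact data y = 5 + 2*x; the filter should recover level 5, beta 2.
s, P = [0.0, 0.0], [[10.0, 0.0], [0.0, 10.0]]
for t in range(200):
    x = (t % 10) / 10.0
    y = 5.0 + 2.0 * x
    s, P = kalman_step(s, P, y, x)
```

With several exogenous variables the state just grows by one beta per variable; this is essentially what PROC UCM does when you add regression effects to an unobserved-components model.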
<p>Thanks,</p>
<p>Arun</p>
</div>
</div>
</div> Does R:NR ratio matter in deciding what technique we use for modeling?tag:www.analyticbridge.datasciencecentral.com,2010-09-21:2004291:Topic:790332010-09-21T18:42:39.579ZArunhttps://www.analyticbridge.datasciencecentral.com/profile/Arun
I came across some speculation that the R:NR (responder:non-responder) ratio should decide the technique to be employed. I haven't found any documentation or proof as yet, so I thought I'd get some feedback/comments on the same.<br/><br/>Taking 3 modeling scenarios:<br/>We have 3 populations of 100K customers, targeted by 3 different programs.<br/><br/>Situation A - 5% have responded to a program of ours.<br/>Situation B - Nearly 50% have responded.<br/>Situation C - Greater than 70-80% have responded.<br/><br/>In each of the three scenarios, we can exploit the data to yield insights into what kind of customers our responders are. But the question is, does the response rate define which techniques we need to use?<br/><br/>For example, does only Situation A call for logistic regression, while B & C are not suitable for it? Would CHAID decision trees be more suitable where the R:NR ratio is near equal, i.e., 50:50?<br/><br/>As far as my knowledge goes, more balanced data should benefit a logistic regression, making a more robust model with better probability scores. So a logistic regression model should work in any scenario, given good predictor variables, and would definitely do better on 50:50 than on 5:95.<br/><br/>Please share your thoughts & experiences.<br/><br/>Thanks,<br/>Arun<br/> Assessing robustness of a Logistic Modeltag:www.analyticbridge.datasciencecentral.com,2010-08-03:2004291:Topic:754922010-08-03T18:26:59.099ZArunhttps://www.analyticbridge.datasciencecentral.com/profile/Arun
Hi,<br/><br/>I've got a logistic model built for a particular response/non-response event.<br/><br/>The model statistics don't suggest a robust model. I'm sharing them for more clarity:<br/><br/>No. of variables - around 5-8<br/>c = 0.9<br/>concordance = 0.93<br/>H-L Chi-square (goodness of fit) = 700 (P <<0.0001) (rejects the null - bad model characteristics)<br/><br/>Also, a univariate distribution of P(Y=1|X1..Xn) shows that 95% of the probabilities fall within 0.4, which suggests that the model does poorer than random!<br/><br/>What are the ways to improve my model? I know of one or two methods that I surfed through recently, but none hands-on. I would like to hear any advice on this!<br/><br/>Thanks in advance.<br/><br/>Arun<br/> Discriminant Analysis on Categorical Variablestag:www.analyticbridge.datasciencecentral.com,2009-10-26:2004291:Topic:562332009-10-26T10:27:40.888ZArunhttps://www.analyticbridge.datasciencecentral.com/profile/Arun
I have a set of independent variables - both categorical and continuous. There is a target variable with certain classes, say C1 to Cn. The aim is to predict the category membership!<br />
<br />
I'm facing two issues. First, any discriminant procedure requires only continuous variables for predicting. And second, logistic regression, which could be used, produces probability values of category membership, which do not equivalently specify the inter-class variance using distance measures the way a Canonical Discriminant Analysis does with the %plotit macro.<br />
<br />
Hence, I've got two questions.<br />
1. If I've got mixed variables - both continuous & categorical - can I still predict the category membership of the target variable? If yes, how?<br />
2. If the answer to the above is to use logistic regression or Genmod/Catmod, can I still obtain a plot of the observations, grouped by category, in a distance-measure plot, to find the between-category variance/distance and hence understand visually what the scenario of the categories is?<br />
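On question 1: the common workaround is to dummy-code (one-hot) each categorical variable into 0/1 indicator columns, after which logistic regression (or, with some caution, a discriminant procedure) can treat everything as numeric. A small illustrative sketch; the column names and levels are made up:

```python
def one_hot(rows, col, drop_first=True):
    """Replace categorical column `col` with 0/1 dummy columns,
    dropping the first level to avoid perfect collinearity."""
    levels = sorted({r[col] for r in rows})
    keep = levels[1:] if drop_first else levels
    out = []
    for r in rows:
        dummies = [1.0 if r[col] == lv else 0.0 for lv in keep]
        out.append(r[:col] + dummies + r[col + 1:])
    return out, keep

# Mixed predictors: [age, region]; region becomes dummy columns.
rows = [[34.0, "east"], [51.0, "west"], [29.0, "north"]]
encoded, kept = one_hot(rows, 1)
```

In SAS the same effect comes from a CLASS statement (PROC LOGISTIC handles the dummy coding for you); the hand-rolled version above just makes explicit what that coding does.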
<br />
Also, I'm not able to plot using %plotit due to the high number of observations I've got (1.5 million). Do I need to downsample to a smaller number? Or can I plot a contour to get an idea of the area coverage? PROC LOGISTIC and Data Visualization Topics of Free Nov 7 Online VirtualSUG Sessionstag:www.analyticbridge.datasciencecentral.com,2008-11-01:2004291:Topic:277602008-11-01T13:18:17.270ZAndrew Karphttps://www.analyticbridge.datasciencecentral.com/profile/AndrewKarp
The Virtual SAS Users Group (VirtualSUG) will present two free online sessions on November 7, 2008. Complete details, including information on how to register for these events, are available at <a href="http://www.virtualsug.org">http://www.virtualsug.org</a><br />
<br />
The first session on November 7 will be offered by Joshua Drukenborg of the US Environmental Protection Agency from 0830-0930 Pacific/1130-1230 Eastern Time. His presentation, “Using SAS® and Google Earth™ to Access and Display Air Pollution Data,” demonstrates how SAS can be used to create files written in Google Earth’s Keyhole Markup Language (KML) as well as how these KML files can utilize SAS/IntrNet® to display data dynamically in an easy, user-friendly fashion.<br />
<br />
The second presentation on Nov. 7 is by Peter Flom, Ph.D., an independent SAS and statistical consultant in New York City. “PROC LOGISTIC: Traps for the Unwary” identifies situations where this popular SAS/STAT® procedure runs without errors but the model it generates is problematic. The talk then describes ways to address these problems in order to obtain useful results.<br />
<br />
Complete details on how to participate in VirtualSUG’s online sessions are available at <a href="http://www.virtualsug.org">http://www.virtualsug.org</a> . Please take the time to read ALL of the information on our site’s homepage to understand how VirtualSUG “works” and what you will need to do to take advantage of this free online resource for the SAS Software user community.<br />
<br />
Thank you!<br />
<br />
Andrew Karp<br />
Virtual SAS Users Group<br />
<a href="Http://www.VirtualSUG.org">Http://www.VirtualSUG.org</a> Free Predictive Analytics decision engine with iGoogletag:www.analyticbridge.datasciencecentral.com,2008-04-24:2004291:Topic:121482008-04-24T22:30:49.010ZAlex Guazzellihttps://www.analyticbridge.datasciencecentral.com/profile/AlexGuazzelli
Recently, the company I work for, Zementis, released the first Predictive Analytics Engine that can be accessed anywhere at any time. It can be downloaded as an iGoogle gadget for now. Coming up in May, Zementis will start offering the full decision engine through the Amazon Elastic Compute Cloud (EC2), making it the first SaaS (Software as a Service) decision engine available.<br />
<br />
Here are some useful links:<br />
<br />
Info on our Amazon EC2 offering - coming soon: <a href="http://www.zementis.com/howtobuy.htm">http://www.zementis.com/howtobuy.htm</a><br />
<br />
The iGoogle ADAPA Predictive Analytics gadget … install the engine as an iGoogle gadget and start processing your data right away!<br />
<a href="http://www.google.com/ig/adde?hl=en&moduleurl=hosting.gmodules.com/ig/gadgets/file/115640297026242314759/adapawidget.xml">http://www.google.com/ig/adde?hl=en&moduleurl=hosting.gmodules.com/ig/gadgets/file/115640297026242314759/adapawidget.xml</a><br />
<br />
The iGoogle ADAPA PMML 3.2 Converter gadget … converts PMML models generated for example in SPSS PMML 3.1 to PMML 3.2 for scoring in ADAPA.<br />
<a href="http://www.google.com/ig/adde?hl=en&moduleurl=hosting.gmodules.com/ig/gadgets/file/115640297026242314759/converterwidget.xml">http://www.google.com/ig/adde?hl=en&moduleurl=hosting.gmodules.com/ig/gadgets/file/115640297026242314759/converterwidget.xml</a><br />
<br />
Both ADAPA and converter gadgets can also be accessed directly from our website: <a href="http://www.zementis.com">http://www.zementis.com</a><br />
<br />
FAQ on the predictive analytics engine, PMML, and the iGoogle gadgets:<br />
<a href="http://adapasupport.zementis.com">http://adapasupport.zementis.com</a><br />
<br />
To learn more about PMML (Predictive Model Markup Language) … the standard way to represent data mining models, check:<br />
<a href="http://www.dmg.org">www.dmg.org</a><br />
<br />
If you need examples of SVM or Neural Network models to start playing with, visit our examples page:<br />
<a href="http://www.zementis.com/pmml_examples.htm">http://www.zementis.com/pmml_examples.htm</a><br />
<br />
We have also been working with a company in Australia to offer PMML Exporters to the R community. You can basically build your models for free in R and use ADAPA web services functionality to score them in real-time from inside your customer platform. For more visit: <a href="http://www.zementis.com/pmml_exporters.htm">http://www.zementis.com/pmml_exporters.htm</a><br />
<br />
I am truly excited about this new product offering. For the first time, people everywhere will be able to unleash the power of predictive analytics. You can find out more about this offering by downloading the PDF flyer: <a href="http://www.zementis.com/docs/Zementis_ADAPA_Predictive_Analytics_Edition.pdf">http://www.zementis.com/docs/Zementis_ADAPA_Predictive_Analytics_Edition.pdf</a><br />
<br />
We also offer the full ADAPA Enterprise solution with rules and reporting for on-site deployment through our regular sales channels. To learn more about ADAPA Enterprise Edition, check it out on wikipedia: <a href="http://en.wikipedia.org/wiki/ADAPA">http://en.wikipedia.org/wiki/ADAPA</a> or download the PDF flyer: <a href="http://www.zementis.com/docs/Zementis_ADAPA_Enterprise_Edition.pdf">http://www.zementis.com/docs/Zementis_ADAPA_Enterprise_Edition.pdf</a>