All Discussions Tagged 'Logistic' - AnalyticBridge2019-09-17T08:20:51Zhttps://www.analyticbridge.datasciencecentral.com/forum/topic/listForTag?tag=Logistic&feed=yes&xn_auth=noEasy to use tool for estimating probability of generating a saletag:www.analyticbridge.datasciencecentral.com,2016-03-07:2004291:Topic:3429992016-03-07T00:36:31.581ZDavid Collinshttps://www.analyticbridge.datasciencecentral.com/profile/DavidCollins
<p>Hi - I am trying to find a good tool (one that requires minimal additional effort) that will help me generate a probability of a sale for a list of 300,000 products. I have attached a sample of the data, with 20,000 records.</p>
<p>Basically, I have a table of historical sales data (with about 300,000 records) that contains around 8 continuous independent variables along with a dependent variable that has a yes/no (i.e., binary outcome) value indicating whether each product in the list has had a sale in the past 12 months.</p>
<p>The historical data essentially looks like this.</p>
<p>Product1,2,3 etc<br/>Variable 1<br/>Variable 2<br/>Variable 3<br/>Variable 4<br/>Variable 5<br/>Variable 6<br/>Variable 7<br/>Variable 8<br/><b>Sold in past 12 months</b> (Yes or No)</p>
<p>The last variable in the list is of course the dependent variable.</p>
<p>All I want is a tool that is easy to use, so that I can assign a probability to each product in the list. That would let me condense the list to the products most likely to generate a sale and list those instead of the ones with a lower probability.</p>
<p>Ideally, the tool could run a quick logistic regression, or some other probability calculation based on the available variables, and give me a (RVU-like) number (perhaps a probability ranging from 0 to 1) for each product. I could then quickly select the top 50,000 products to list on a website, since they have the highest probability of generating a sale according to the available variables.</p>
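<p>The workflow described above (fit a logistic regression, score every product, keep the top N) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic, the two predictors stand in for the 8 variables, and the helper names <code>fit_logistic</code>, <code>predict_proba</code> and <code>top_n</code> are made up for this example. In practice a statistics package would do the fitting.</p>

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights (intercept first) by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))
            err = p - target
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, row):
    """Predicted probability of a sale for one product."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))

def top_n(products, X, w, n):
    """Return the n (probability, product) pairs with the highest scores."""
    scored = [(predict_proba(w, row), name) for name, row in zip(products, X)]
    scored.sort(reverse=True)
    return scored[:n]

# Synthetic stand-in for the historical table (assumption: 2 predictors, 400 rows).
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(400)]
y = [1 if random.random() < sigmoid(2.0 * a - 1.0 * b) else 0 for a, b in X]
products = [f"Product{i + 1}" for i in range(len(X))]

w = fit_logistic(X, y)
best = top_n(products, X, w, 50)  # analogous to picking the top 50,000 of 300,000
```

<p>With real data, the fit would normally be checked on held-out records before the ranking is trusted.</p>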
<p>I am of course assuming that the variables are somehow correlated to the outcome, but perhaps the tool will help me determine that.</p>
<p>Does anyone have any suggestions of a good tool to accomplish this? I would presume that there is a simple way to set this up in Microsoft Excel, but if not, then a piece of software that does this would of course be great too.</p>
<p>Or, feel free to review the actual sample data set, to help me understand how best to approach analyzing the data, and whether I should eliminate certain variables from the results. </p>
<p>See attached file. </p>
<p>Thanks for any suggestions.</p> Queries in modelingtag:www.analyticbridge.datasciencecentral.com,2015-12-29:2004291:Topic:3385312015-12-29T14:24:55.267ZRaghu Chittarihttps://www.analyticbridge.datasciencecentral.com/profile/RaghuChittari
<p>Hi all,</p>
<p>I come from an engineering background and would like your help with some modeling concepts. Your help would be greatly appreciated!</p>
<p>Here are my questions:</p>
<ol>
<li>If a variable that is important from a business standpoint has a p-value of 0.5, should it be kept in the model? If yes, wouldn't it make the model coefficients unstable?</li>
<li>Should I standardize the variables before building a logistic regression model? If yes, is there a commonly followed approach?</li>
<li>I am planning to develop a logistic regression model to rate employees as good or bad. The model includes variables such as innovation score, number of papers published, salary, training cost, etc. The first two are assets to the company and the last two are liabilities. Should I make this explicit to the model by entering the liabilities as negative values?</li>
<li>I have two independent variables in my LR model. Var1 has levels 'A' and 'B'. Var2 has levels 'X' and 'Y'. Of the entire dataset, 30% of observations have Var1 as 'A' and Var2 as 'X', 35% have Var1 as 'A' and Var2 as 'Y', 30% have Var1 as 'B' and Var2 as 'X', and 5% have Var1 as 'B' and Var2 as 'Y'. The number of observations with Var1 as 'B' and Var2 as 'Y' is far smaller than for the other combinations. Will this imbalance in the data affect my results? If so, how should I correct for it?</li>
</ol> Techniques to address very low event rate for Logistic Regression Modeltag:www.analyticbridge.datasciencecentral.com,2013-10-25:2004291:Topic:2777712013-10-25T09:52:58.645ZHimanshu Sinhahttps://www.analyticbridge.datasciencecentral.com/profile/HSINHA
<p> Hi Folks,</p>
<p></p>
<p>I am looking at data from a telecom company and developing a model to predict an event (read: churn).</p>
<p></p>
<p>I am planning to develop a GLM using a logit link function.</p>
<p>The real problem I am facing in the data is the very low proportion (1.6%) of churners.</p>
<p>So I am seeking advice on the following:</p>
<p>- What are the possible (bad) outcomes if I take a randomised training sample containing just 1.6% churners?</p>
<p>- Should I weight the training sample to have an event rate above 25%?</p>
<p>- Is there any other technique to address the problem of such a small event rate?</p>
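<p>On the weighting question: if the training sample is oversampled to a higher event rate, the fitted intercept no longer reflects the population rate. One commonly cited adjustment is prior correction, which subtracts ln[((1 - τ)/τ) · (ȳ/(1 - ȳ))] from the intercept, where τ is the population event rate and ȳ is the event rate in the training sample. The sketch below assumes that only the event rate was changed by sampling; the function names are made up for this illustration, and the 25% figure is simply the rate mentioned in the question.</p>

```python
import math

def intercept_offset(population_rate, sample_rate):
    """Amount to subtract from the intercept fitted on the oversampled data."""
    return math.log(((1 - population_rate) / population_rate)
                    * (sample_rate / (1 - sample_rate)))

def corrected_probability(p_sample, population_rate, sample_rate):
    """Map a probability from the oversampled model back to the population scale."""
    logit = math.log(p_sample / (1 - p_sample))
    logit -= intercept_offset(population_rate, sample_rate)
    return 1 / (1 + math.exp(-logit))

# Population churn rate 1.6%, training sample rebalanced to 25% (assumed figures).
offset = intercept_offset(0.016, 0.25)
# A score equal to the sample base rate maps back to the population base rate.
p_adj = corrected_probability(0.25, 0.016, 0.25)
```

<p>The correction shifts every score by the same amount on the logit scale, so the ranking of customers is unchanged; only the calibration of the probabilities is affected.</p>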
<p></p>
<p>Regards,</p>
<p>HV</p>
<p></p>
<p></p>
<p></p> Does R:NR ratio matter in deciding what technique we use for modeling?tag:www.analyticbridge.datasciencecentral.com,2010-09-21:2004291:Topic:790332010-09-21T18:42:39.579ZArunhttps://www.analyticbridge.datasciencecentral.com/profile/Arun
I came across some speculation that the R:NR (responder to non-responder) ratio should decide which modeling technique is used. I haven't found any documentation or proof as yet, so I thought I'd get some feedback/comments on it.<br/><br/>Take 3 modeling scenarios:<br/>We have 3 populations of 100K customers, targeted by 3 different programs.<br/><br/>Situation A - 5% have responded to a program of ours.<br/>Situation B - Nearly 50% have responded.<br/>Situation C - More than 70-80% have responded.<br/><br/>In each of the three scenarios, we can exploit the data to yield insights into what kind of customers our responders are. But the question is, does the response rate determine which techniques we need to use?<br/><br/>For example, does only Situation A call for Logistic Regression, while B & C are not suitable for Logistic Regression? Would CHAID IDTs be more suitable where the R:NR ratio is nearly equal, i.e. 50:50?<br/><br/>As far as I know, with more responder data a logistic regression should yield a more robust model with better probability scores. So a logistic regression model should work in any of these scenarios, given good predictor variables, and should work better at 50:50 than at 5:95.<br/><br/>Please share your thoughts & experiences.<br/><br/>Thanks,<br/>Arun<br/> Accessing robustness of a Logistic Modeltag:www.analyticbridge.datasciencecentral.com,2010-08-03:2004291:Topic:754922010-08-03T18:26:59.099ZArunhttps://www.analyticbridge.datasciencecentral.com/profile/Arun
Hi,<br/><br/>I've built a Logistic model for a particular response/non-response event.<br/><br/>The statistics it produces don't look like those of a robust model. I'm sharing them for clarification.<br/><br/>No. of variables - around 5-8<br/>c = 0.9<br/>concordance = 0.93<br/>H-L Chi square (Goodness of Fit)= 700 (P <<0.0001) (rejects Null - bad model characteristics)<br/><br/>Also, the univariate distribution of P(Y=1|X1..Xn) shows that 95% of the predicted probabilities fall within 0.4, which suggests that the model does worse than random!<br/><br/>What are the ways to improve my model? I know of one or two methods that I read about recently, but I have no hands-on experience with them. I would like to hear any advice on this!<br/><br/>Thanks in advance.<br/><br/>Arun<br/> Discriminant Analysis on Categorical Variablestag:www.analyticbridge.datasciencecentral.com,2009-10-26:2004291:Topic:562332009-10-26T10:27:40.888ZArunhttps://www.analyticbridge.datasciencecentral.com/profile/Arun
I have a set of Independent Variables - both Categorical Variables and Continuous Variables. The variable to be predicted has certain classes, say C1 to Cn. The aim is to predict the category membership!<br />
<br />
I'm facing two issues. First, any discriminant procedure requires only continuous variables for predicting. Second, logistic regression, which can be used, produces probability values of category membership, but these do not express the inter-class variance through distance measures the way a Canonical Discriminant Analysis does using the %plotit macro.<br />
<br />
Hence, I've got two questions.<br />
1. If I've got mixed variables - both Continuous & Categorical, can I still predict category membership for the target variable? If yes, how?<br />
2. If the answer to the above is to use Logistic Regression or Genmod/Catmod, can I still obtain a distance-measure plot of the observations grouped by category, so that I can see the between-category variance/distance and understand visually how the categories are separated?<br />
<br />
Also, I'm not able to plot using %plotit due to the high number of observations I've got (1.5 million). Do I need to downsample to bring it down to a smaller number? Or can I plot a contour to get an idea of the area coverage? PROC LOGISTIC and Data Visualization Topics of Free Nov 7 Online VirtualSUG Sessionstag:www.analyticbridge.datasciencecentral.com,2008-11-01:2004291:Topic:277602008-11-01T13:18:17.270ZAndrew Karphttps://www.analyticbridge.datasciencecentral.com/profile/AndrewKarp
The Virtual SAS Users Group (VirtualSUG) will present two free online sessions on November 7, 2008. Complete details, including information on how to register for these events, are available at <a href="http://www.virtualsug.org">http://www.virtualsug.org</a><br />
<br />
The first session on November 7 will be offered by Joshua Drukenborg of the US Environmental Protection Agency from 0830-0930 Pacific/1130-1230 Eastern Time. His presentation, “Using SAS® and Google Earth™ to Access and Display Air Pollution Data,” demonstrates how SAS can be used to create files written in Google Earth’s Keyhole Markup Language (KML) as well as how these KML files can utilize SAS/IntrNet® to display data dynamically in an easy, user-friendly fashion.<br />
<br />
The second presentation on Nov. 7 is by Peter Flom, Ph.D., an independent SAS and statistical consultant in New York City. “PROC LOGISTIC: Traps for the Unwary” identifies situations where this popular SAS/STAT® procedure runs without errors, but the model it generates is problematic. The talk then describes ways to address these problems in order to obtain useful results.<br />
<br />
Complete details on how to participate in VirtualSUG’s online sessions are available at <a href="http://www.virtualsug.org">http://www.virtualsug.org</a> . Please take the time to read ALL of the information on our site’s homepage to understand how VirtualSUG “works” and what you will need to do to take advantage of this free online resource for the SAS Software user community.<br />
<br />
Thank you!<br />
<br />
Andrew Karp<br />
Virtual SAS Users Group<br />
<a href="Http://www.VirtualSUG.org">Http://www.VirtualSUG.org</a>