<p>Hi, I am trying to find a good tool (one that requires minimal additional effort) that will help me estimate the probability of a sale for each product in a list of 300,000 products. I have attached a sample of the data with 20,000 records.</p>
<p>Basically, I have a table of historical sales data (about 300,000 records) that contains around 8 continuous independent variables, along with a binary (yes/no) dependent variable indicating whether each product in the list has had a sale in the past 12 months.</p>
<p>Each record in the historical data looks essentially like this:</p>
<p>Product (1, 2, 3, etc.)<br/>Variable 1<br/>Variable 2<br/>Variable 3<br/>Variable 4<br/>Variable 5<br/>Variable 6<br/>Variable 7<br/>Variable 8<br/><b>Sold in past 12 months</b> (Yes or No)</p>
<p>The last variable in the list is of course the dependent variable.</p>
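<p>A minimal sketch of reading that layout into model-ready form, assuming the data is exported as CSV with a header row whose columns are named <code>Product</code>, <code>Variable 1</code> through <code>Variable 8</code>, and <code>Sold in past 12 months</code>. These header names are assumptions; adjust them to the real file. The Yes/No outcome is converted to 1/0, which is what logistic regression tools expect.</p>

```python
import csv
import io

# Hypothetical header names; adjust to match the real export.
ID_COLUMN = "Product"
FEATURES = [f"Variable {i}" for i in range(1, 9)]
TARGET = "Sold in past 12 months"

def load_rows(f):
    """Read a CSV file object; return product ids, feature rows, and 0/1 labels."""
    ids, X, y = [], [], []
    for rec in csv.DictReader(f):
        ids.append(rec[ID_COLUMN])
        X.append([float(rec[c]) for c in FEATURES])
        label = rec[TARGET].strip().lower()
        if label not in ("yes", "no"):
            raise ValueError(f"unexpected outcome value: {rec[TARGET]!r}")
        y.append(1 if label == "yes" else 0)  # Yes -> 1 (sold), No -> 0
    return ids, X, y

# Tiny in-memory sample in the assumed layout
sample = ",".join([ID_COLUMN, *FEATURES, TARGET]) + "\n"
sample += "P1,1,2,3,4,5,6,7,8,Yes\nP2,0,0,0,0,0,0,0,0,No\n"
ids, X, y = load_rows(io.StringIO(sample))
```

<p>For a real file, open it with <code>newline=""</code> (as the <code>csv</code> module recommends) and pass the file object to <code>load_rows</code>; any file name used there would be your own.</p>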
<p>All I want is to find the tool that is best or easiest to use for assigning a probability to each product in the list. That would let me condense the list to the products most likely to generate a sale and list those instead of the ones with a lower probability.</p>
<p>Ideally, the tool could run a quick logistic regression (or some other probability calculation based on the available variables) and give me an RVU-like score, perhaps a probability from 0 to 1, for each product. I could then quickly select the top 50,000 products to list on a website, since according to the available variables they have the highest probability of generating a sale.</p>
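<p>To make the logistic regression idea concrete, here is a minimal, dependency-free sketch: standardize the eight variables, fit the coefficients by batch gradient descent on the log-loss, score every product with a 0 to 1 probability, and keep the top N. The function names, learning rate, epoch count, and the randomly generated demo data are all illustrative assumptions, not taken from the attached file.</p>

```python
import math
import random

def scale_with(rows, means, stds):
    """Standardize rows using precomputed column means and standard deviations."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def standardize(rows):
    """Scale each column to mean 0 and std 1 so gradient descent behaves evenly."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(k)]  # "or 1.0" guards against constant columns
    return scale_with(rows, means, stds), means, stds

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(X, b, w):
    """Predicted probability of a sale for each (already scaled) row."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) for x in X]

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit bias b and weights w by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    b, w = 0.0, [0.0] * k
    for _ in range(epochs):
        errs = [p - yi for p, yi in zip(predict_proba(X, b, w), y)]
        b -= lr * sum(errs) / n
        w = [w[j] - lr * sum(e * x[j] for e, x in zip(errs, X)) / n
             for j in range(k)]
    return b, w

def top_n(ids, probs, n):
    """Ids of the n products with the highest predicted probability."""
    ranked = sorted(zip(ids, probs), key=lambda t: t[1], reverse=True)
    return [pid for pid, _ in ranked[:n]]

# Hypothetical demo data: 500 products, 8 random variables, and an outcome
# that (by construction) depends only on Variable 1 and Variable 2.
rng = random.Random(0)
ids = [f"P{i}" for i in range(500)]
rows = [[rng.gauss(0, 1) for _ in range(8)] for _ in ids]
sold = [1 if rng.random() < sigmoid(1.5 * r[0] - 1.0 * r[1]) else 0 for r in rows]

X, means, stds = standardize(rows)
b, w = fit_logistic(X, sold)
probs = predict_proba(X, b, w)
shortlist = top_n(ids, probs, 50)  # e.g. top 50,000 on the real list
```

<p>Products scored later should be scaled with the same <code>means</code> and <code>stds</code> via <code>scale_with</code> before calling <code>predict_proba</code>. Pure Python is slow at 300,000 rows; libraries such as scikit-learn's <code>LogisticRegression</code> (with <code>predict_proba</code>) or statsmodels' <code>Logit</code> fit the same kind of model much faster, and statsmodels also reports a p-value per coefficient. Note that scikit-learn applies L2 regularization by default.</p>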
<p>I am of course assuming that the variables are somehow correlated with the outcome, but perhaps the tool will help me determine that.</p>
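<p>One quick way to check whether each variable carries any signal, before fitting a model, is its single-variable AUC: the probability that a randomly chosen sold product has a higher value of that variable than a randomly chosen unsold one. An AUC near 0.5 suggests the variable on its own does not separate the two groups; values well above or below 0.5 suggest it does (below 0.5 means higher values go with fewer sales). The sketch computes it from ranks (the Mann-Whitney formulation) so it stays fast on 300,000 rows; the function names are illustrative. A single-variable screen can miss variables that only matter in combination, so treat it as a first look rather than a verdict.</p>

```python
def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; tied scores share their average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both sold (1) and unsold (0) products")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied scores
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the run
        i = j + 1
    rank_sum_pos = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def screen_variables(X, y, names):
    """Single-variable AUC for each column, strongest signal (farthest from 0.5) first."""
    results = [(name, auc([row[j] for row in X], y)) for j, name in enumerate(names)]
    return sorted(results, key=lambda t: abs(t[1] - 0.5), reverse=True)
```

<p>Called as <code>screen_variables(X, y, names)</code> with the feature rows, 0/1 labels, and variable names from your data, it returns one <code>(name, auc)</code> pair per variable.</p>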
<p>Does anyone have suggestions for a good tool to accomplish this? I would presume there is a simple way to set this up in Microsoft Excel, but if not, a dedicated piece of software would be great too.</p>
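<p>For the Excel route, a logistic regression comes down to two formulas: the predicted probability for product <i>i</i>, and the log-likelihood that the fit maximizes. Here <i>y<sub>i</sub></i> is 1 if product <i>i</i> sold and 0 if not, <i>x<sub>i1</sub></i> through <i>x<sub>i8</sub></i> are its eight variables, and the β's are the coefficients being estimated. As far as I know, Excel's built-in regression tools (<code>LINEST</code> and the Analysis ToolPak) fit linear rather than logistic models, so one commonly described workaround is to put the nine coefficients in cells, compute <i>p<sub>i</sub></i> and each row's log-likelihood term in helper columns, and have the Solver add-in maximize their sum by changing the coefficient cells.</p>

```latex
p_i = \frac{1}{1 + \exp\!\left(-(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_8 x_{i8})\right)}

\ell(\beta_0, \ldots, \beta_8) = \sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right]
```

<p>Once Solver has found the coefficients, the <i>p<sub>i</sub></i> column is the 0 to 1 score, and sorting on it gives the top 50,000.</p>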
<p>Alternatively, feel free to review the sample data set itself to help me understand how best to approach analyzing the data, and whether I should eliminate certain variables from the analysis.</p>
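<p>On whether to eliminate variables: besides dropping variables that show no signal, it can help to check whether any two of the eight are nearly redundant, since strongly correlated predictors make individual logistic regression coefficients unstable and hard to interpret (the predicted probabilities are usually less affected). A minimal sketch that flags predictor pairs whose absolute Pearson correlation exceeds a threshold; the 0.9 cutoff and the function names are arbitrary choices for illustration.</p>

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # constant column: correlation undefined, treat as unrelated
    return cov / math.sqrt(va * vb)

def redundant_pairs(X, names, threshold=0.9):
    """Predictor pairs whose |correlation| exceeds threshold, as (a, b, r)."""
    cols = [[row[j] for row in X] for j in range(len(names))]
    flagged = []
    for i, j in combinations(range(len(names)), 2):
        r = pearson(cols[i], cols[j])
        if abs(r) > threshold:
            flagged.append((names[i], names[j], round(r, 3)))
    return flagged
```

<p>If a pair is flagged, keeping just one of the two usually loses little. A pairwise check will not catch a variable that is a combination of several others; a variance inflation factor (VIF) check covers that case.</p>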
<p>See attached file. </p>
<p>Thanks for any suggestions.</p>