A Data Science Central Community
Hi - I am trying to determine a good tool (that requires minimal additional effort) that will help me generate a probability of a sale for a list of 300,000 products. I have attached a sample of the data, with 20,000 records.
Basically, I have a table of historical sales data (with about 300,000 records) that contains around 8 continuous independent variables, along with a dependent variable that has a yes/no (i.e., binary outcome) value indicating whether each product in the list has had a sale in the past 12 months.
The historical data essentially looks like this.
Sold in past 12 months (Yes or No)
The last variable in the list is of course the dependent variable.
All I want is the tool that is best or easiest to use for assigning a probability to each product in the list. That would let me condense my list to the products most likely to generate a sale, so that I can list those instead of the ones with a lower probability of generating a sale.
Ideally, the tool could run a quick logistic regression, or some other probability calculation based on the available variables, and give me a (RVU-like) number for each product, perhaps a probability ranging from 0 to 1. I could then quickly select the top 50,000 products to list on a website, since they have the highest probability of generating a sale according to the available variables.
I am of course assuming that the variables are somehow correlated to the outcome, but perhaps the tool will help me determine that.
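For readers following along, the workflow described above (fit a logistic regression, score every product, keep the top N) can be sketched roughly as below. This is only a minimal illustration under stated assumptions: the data is synthetic with two made-up predictors, the function names are hypothetical, and the fit uses plain gradient descent from the standard library rather than any particular tool. In practice a statistics package or a library such as scikit-learn would usually do the fitting.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function, maps any real number into (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def standardize(rows):
    # Rescale each column to mean 0 and standard deviation 1.
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def fit_logistic(X, y, lr=0.5, epochs=500):
    # Batch gradient descent on the average log-loss.
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(k):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
    return w, b

def predict_proba(X, w, b):
    # Predicted probability of a sale for each row.
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def top_n(product_ids, probs, n):
    # Sort products by predicted probability, highest first, and keep n of them.
    ranked = sorted(zip(product_ids, probs), key=lambda t: t[1], reverse=True)
    return ranked[:n]

# Synthetic stand-in for the real table (assumption: 2 predictors, 200 products).
random.seed(0)
raw = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
sold = [1 if random.random() < sigmoid(2.0 * a - 1.0 * c) else 0 for a, c in raw]

X = standardize(raw)
w, b = fit_logistic(X, sold)
probs = predict_proba(X, w, b)
shortlist = top_n(range(len(X)), probs, 50)  # analogous to keeping the top 50,000
```

The sign and size of each fitted weight in `w` give a first hint of whether a variable is related to the outcome, which speaks to the assumption about correlation mentioned above. A weight near zero on a standardized variable suggests it contributes little.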
Does anyone have any suggestions of a good tool to accomplish this? I would presume that there is a simple way to set this up in Microsoft Excel, but if not, then a piece of software that does this would of course be great too.
Or, feel free to review the actual sample data set, to help me understand how best to approach analyzing the data, and whether I should eliminate certain variables from the results.
See attached file.
Thanks for any suggestions.
I applied a profit model to it so you can determine which sales line items could have potential. You can change the sales price and unit price, and those changes are reflected immediately. The parameters of the utility function require Solver.
Thanks very much Martin.
Can you let me know which column in the sample results I could use to serve as a value that indicates the probability of a sale? Or is there perhaps a column in the results that will essentially give me the ability to sort my inventory by probability to achieve a sale?
The "q(pi)" and the "Li" columns look like potential candidates.
My pleasure. Yes, q(pi) is the probability of a sale, and Li is the likelihood of sale or no sale.
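To make the difference between the two columns concrete, here is a small sketch. It assumes the usual logistic-regression convention, which the attached workbook may or may not follow exactly: the per-row likelihood equals the predicted probability when the product sold and one minus that probability when it did not. The product IDs and numbers are invented for illustration.

```python
import math

def row_likelihood(q, sold):
    # Likelihood of the observed outcome: q if the product sold, 1 - q otherwise.
    return q if sold else 1.0 - q

# Hypothetical rows: (product id, predicted probability of sale, sold in past 12 months)
rows = [("A", 0.9, 1), ("B", 0.2, 0), ("C", 0.6, 0)]

likelihoods = {pid: row_likelihood(q, y) for pid, q, y in rows}

# The sum of log-likelihoods is the quantity a solver would typically maximize.
log_likelihood = sum(math.log(v) for v in likelihoods.values())

# Ranking for listing purposes uses the probability column, not the likelihood column.
ranked = [pid for pid, q, _ in sorted(rows, key=lambda r: r[1], reverse=True)]
```

Under that assumption, the probability column is the one to sort the inventory by. The likelihood column measures how well the model fits each historical row, so a product that did not sell and was given a low probability still gets a high likelihood.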