A Data Science Central Community
I am curretly working on a point of sale data for a retail store which sells from garments to grocery. The data i have is a point of sale(say a customer invoice). One challenge i see is there is no loyalty program for the store and only 20% of the customers use credit card for the transactions so unable to track a customer.
I have tried Market basket analysis(Association rules), cannabalization analysis and Next best offer analysis.
I would like to know the set of predictive analytics that are possible on this type of data.
Any help is greatly appreciated.
Thanks in advance.
Thank you Illingworth. This helps to look in few more dimensions in my data. We have considered predicting churn, however the loyalty programs are not to the mark and unable to know which customer turns back to the store. Any views on this?
Maybe you can have a try for esProc. It's good for data handling. http://www.raqsoft.com/download.html
Sorry for the late posting. I think that I can give more ideas.
If you have the following data:
With this information, you have the means to do the following(broken down by hourly, daily, weekly, monthly, yearly):
By count distinctly the tran# into time buckets you will be able to see volume and any seasonality, short term trends, long term trends, profitability per sku, cost per sku, revenue per sku, revenue at the store, profit at the store, volume of each of the items at the store or broken down by SKU, do a promotional effectiveness for any promotions that occur(you just check the price and compare to quantity to see which price increased the quantity in each of the time bucket. marketing does not need to be involved), sku momentum or rate at which the sku sells over time and the list goes on for an eternity.
By counting distinctly trans# within a Registerid into time buckets you will be able to see workload on a per register, peak times, low time, determine if a new register is needed, with profit(Price minus cost) justify new registers, be able to determine flow through the store -- lots of queueing numbers can be spewed out.
You can change the dimensions, rather than tran# -- do things by SKU or by Registerid or by cost or by price or by quantity.
The above is basically a time series analysis of your data with different dimensions.
Now some more complex analysis, in your market basket analysis you can create another model to discover associations. So for example you can use a form of Schellings segregation model as a simple ranking mechanism to help in searching for association rules. For example: you have 30 skus in the store, all that you are concerned with are the skus that have been purchased together and nothing else. So on a 2 sku model, you are only concerned with the transactions that contain those 2 skus(contain both or contain one of the SKUs) over a bucket of time, with the simple construct of abs(( sku1 / countoftransactions(sku1) )- ( sku2 / countoftransactions(sku2) ))/2 ... if the result ever is zero then there is a perfect mix.
Then you start performing an analysis this over a daily( for each day would have the result ), a weekly( for each week would have the result), monthly -- you will start to see trends of when products are bought together then start to be bought separately. This gives a relationship among the SKUs and there by producing seasonality on the SKU mixture. With a 30 SKU model, you simply expand the subtractions and divide the number SKUs under consideration. You can extend the segregation to a time series analysis with some differencing on your time series to get some more meaningful numbers.
There are some caveats to this segregation methods, I will let you discover what they are and how to overcome them in your analysis.
Thank you Lance, your thoughts are very insightful. We have implemented this association rule mining on this data. It really works.
Though i cant reveal the result but on a high level we used this to find related product/sku purchase on time dimension drill down.
Thank you once again for your thoughts.
Wow...I need to explain things better when I post. I left out a few explanations.
The simple equation as stated above is
abs(( sku1 / countoftransactions(sku1) )- ( sku2 / countoftransactions(sku2) ))/2
The part of the equation that states ( sku1 / countoftransactions(sku1) ) should be defined as:
countoftransactions(sku1) is the count of ALL transactions/orders that sku1 are found. The numerator sku1 is the count of all transactions/orders that sku1 and sku2 are found in the same order.
The part of the equation that states ( sku2 / countoftransactions(sku2) )should be defined as:
countoftransactions(sku2) is the count of ALL transactions/orders that sku2 are found. The numerator sku2 is the count of all transactions/orders that sku2 and sku1 are found in the same order.