I have a data set with 67,330 customer orders and 22,529 distinct products in these orders. I was trying to perform association rule analysis in R, but it hit the memory limit. My machine has 2 GB of RAM, and I don't want to reduce the size of my data. Is there an alternative to my approach?

Replies to This Discussion

By all means use the product hierarchy. The 22,529 products are classified into different product segments, and I would climb the product segment tree up to the level at which you can analyze it in R. For example, say you are working in a sporting goods company and one of the products is the 993 Heritage Shoe, which can be classified according to the following hierarchy:

"993 Heritage Shoe"/Running Shoe/Shoes/Apparel etc.

In your case there must also be a product segment tree, either implied or already provided, that you could make use of...
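
To make the roll-up concrete, here is a minimal Python sketch, not a definitive implementation. Only the "993 Heritage Shoe" path comes from the example above; the other products, the `HIERARCHY` table, and the function names are hypothetical and stand in for whatever your product master file provides.

```python
# Hypothetical hierarchy table: product -> ancestors, nearest parent first.
# Only the "993 Heritage Shoe" path is taken from the example in this thread.
HIERARCHY = {
    "993 Heritage Shoe": ["Running Shoe", "Shoes", "Apparel"],
    "Trail Runner X": ["Running Shoe", "Shoes", "Apparel"],
    "Court Classic": ["Tennis Shoe", "Shoes", "Apparel"],
    "Rain Jacket": ["Jacket", "Outerwear", "Apparel"],
}


def roll_up(order, level):
    """Replace each product by its ancestor `level` steps up (1 = direct parent)."""
    return {HIERARCHY[product][level - 1] for product in order}


def distinct_items(orders, level):
    """Count the distinct items left after rolling every order up to `level`."""
    return len(set().union(*(roll_up(order, level) for order in orders)))


# Toy orders, made up purely for illustration.
orders = [
    {"993 Heritage Shoe", "Rain Jacket"},
    {"Trail Runner X", "Court Classic"},
    {"993 Heritage Shoe", "Court Classic"},
]

for level in (1, 2, 3):
    print(level, distinct_items(orders, level))
```

The idea is to raise `level` until the number of distinct items is small enough for the association analysis to fit in memory, then run the analysis on the rolled-up orders.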
Hi, Mike:
I thought of that too. The problem is that people in our Customer Service department want me to work at the product level. We do have a product hierarchy as you mentioned, but when a customer calls in to place an order, the CSR would like to be able to tell them what other products people tend to buy together with the product the customer just purchased, to increase sales. I know that R loads all data into RAM, unlike SAS, which works from the hard drive. That may be the cause of the problem. But SAS EM is so expensive that my company won't even consider leasing it, so I am stuck with non-commercial options. Do you think that open source packages other than R can handle a large data set? Thanks.

Product level would result in too granular a solution. As a thought, why not use the product segmentation to run your association analysis at the product group level (say, one level up from product)? Then, when the time comes to make a recommendation, pick the most popular product from the product group that is associated with the product group the customer just purchased from.
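
A rough Python sketch of this two-step idea, under stated assumptions: the `PRODUCT_GROUP` mapping, the product names, and the toy orders are all hypothetical, and plain co-occurrence counts stand in for a full association rule analysis with support, confidence, and lift.

```python
from collections import Counter
from itertools import combinations

# Hypothetical product -> product group mapping (one level up from product).
PRODUCT_GROUP = {
    "guitar_a": "Guitars",
    "guitar_b": "Guitars",
    "amp_a": "Amps",
    "amp_b": "Amps",
    "strap_a": "Straps",
}


def build_tables(orders):
    """Count group-level co-occurrences and product-level popularity."""
    pair_counts = Counter()  # (group, group) -> number of orders with both
    popularity = Counter()   # product -> number of orders containing it
    for order in orders:
        popularity.update(set(order))
        groups = sorted({PRODUCT_GROUP[p] for p in order})
        pair_counts.update(combinations(groups, 2))
    return pair_counts, popularity


def recommend(product, pair_counts, popularity):
    """Most popular product in the group most often bought with this product's group."""
    group = PRODUCT_GROUP[product]
    partners = Counter()
    for (a, b), n in pair_counts.items():
        if a == group:
            partners[b] += n
        elif b == group:
            partners[a] += n
    if not partners:
        return None  # this group never co-occurred with another group
    best_group = partners.most_common(1)[0][0]
    candidates = [p for p in popularity if PRODUCT_GROUP[p] == best_group]
    return max(candidates, key=lambda p: popularity[p])


# Toy orders, made up purely for illustration.
orders = [
    ["guitar_a", "amp_a"],
    ["guitar_b", "amp_a", "strap_a"],
    ["guitar_a", "amp_b"],
    ["amp_a"],
]
pair_counts, popularity = build_tables(orders)
suggestion = recommend("guitar_a", pair_counts, popularity)
print(suggestion)
```

Because pairs are counted between groups rather than between products, the pair table grows with the number of groups, not with the 22,529 products.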
Hi, Mike:
That sounds like a good idea. What I have in mind is the way Amazon does it. When I search for a book on Amazon, it tells me the other products that customers tend to buy together with that book. It seems like Amazon is looking at the product level, not a level up. Wouldn't looking at the product level give customers better recommendations?

Maybe. An example might be an electric guitar where one of the SKUs is black. An association analysis would recommend a black guitar, while the color really doesn't matter but the product group of guitar does. No matter what, I think you need to reduce the dimensionality of the project. Another way to do this is to segment the customers and perform an association analysis of the products at the customer segment level. That may lead to a better recommendation, since it would take into account the product preferences of a smaller group, thereby reducing the problem's dimensionality.
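
A minimal Python sketch of the segment-level analysis, assuming the customer segments already exist. The `segment_of` mapping, the toy orders, and the 0.5 confidence threshold are all hypothetical, and only pairwise rules A -> B are computed, with confidence defined as (orders containing A and B) / (orders containing A) within a segment.

```python
from collections import Counter, defaultdict
from itertools import permutations


def rules_by_segment(orders, segment_of, min_confidence=0.5):
    """Compute pairwise rules A -> B separately for each customer segment."""
    item_counts = defaultdict(Counter)  # segment -> item -> order count
    pair_counts = defaultdict(Counter)  # segment -> (A, B) -> order count
    for customer, items in orders:
        segment = segment_of[customer]
        unique = set(items)
        item_counts[segment].update(unique)
        # Ordered pairs, so both A -> B and B -> A are counted.
        pair_counts[segment].update(permutations(sorted(unique), 2))

    rules = defaultdict(dict)
    for segment, pairs in pair_counts.items():
        for (a, b), n in pairs.items():
            confidence = n / item_counts[segment][a]
            if confidence >= min_confidence:
                rules[segment][(a, b)] = confidence
    return rules


# Hypothetical segments and toy orders, made up purely for illustration.
segment_of = {"c1": "beginners", "c2": "beginners", "c3": "pros"}
orders = [
    ("c1", ["guitar", "tuner"]),
    ("c2", ["guitar", "tuner", "strap"]),
    ("c3", ["guitar", "amp"]),
    ("c3", ["guitar"]),
]
rules = rules_by_segment(orders, segment_of)
for segment, segment_rules in rules.items():
    print(segment, segment_rules)
```

Each segment only sees the products its own customers bought, so the item and pair tables per segment are smaller than a single table over all orders.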
Hi, Mike:
That was exactly what I did already. Just like your example, I already ignored the colors and/or the sizes of basically the same products by giving them the same product number, but I still ended up with this many distinct "PRODUCTS".
Your idea of segmenting customers first and performing association analysis within each group is pretty interesting. Let me try it and report back to you. Thanks a lot!

