Novel itemset representations based on prefix-tree nodes for frequent itemset mining

In recent years, I have proposed three new data structures for representing itemsets: Node-list [1], N-list [2], and Nodeset [3]. They represent an itemset by a set of prefix-tree nodes instead of a set of transaction ids.
Since a prefix tree is usually highly compressed, Node-lists, N-lists, and Nodesets are much shorter than Tidsets or diffsets, the two classical vertical representations of itemsets. Therefore, for frequent itemset mining, algorithms based on Node-list, N-list, and Nodeset are much more efficient than algorithms based on Tidset or diffset.
Our extensive experiments show that algorithms based on Node-list, N-list, and Nodeset are even more efficient than the FP-growth algorithm.
I think these structures can also be used to mine other patterns efficiently. See [1, 2, 3] for more details.
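To make the compression argument concrete, here is a minimal Python sketch. It is not the Node-list, N-list, or Nodeset structure from the papers, which attach further encoding information to each node; it only assumes a plain prefix tree with items inserted in descending frequency order. The names `Node`, `build_prefix_tree`, and `build_tidsets`, as well as the toy transactions, are made up for illustration. For each item it compares the Tidset with the list of prefix-tree nodes labelled with that item.

```python
from collections import defaultdict


class Node:
    """One prefix-tree node: an item plus the number of transactions passing through it."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}


def build_prefix_tree(transactions):
    """Insert every transaction into a prefix tree, most frequent items first.

    Returns the root and, for each item, the list of tree nodes labelled with it.
    """
    freq = defaultdict(int)
    for t in transactions:
        for item in set(t):
            freq[item] += 1

    root = Node(None)
    nodes_by_item = defaultdict(list)
    for t in transactions:
        node = root
        # Descending frequency order lets transactions share long prefixes.
        for item in sorted(set(t), key=lambda i: (-freq[i], i)):
            child = node.children.get(item)
            if child is None:
                child = Node(item)
                node.children[item] = child
                nodes_by_item[item].append(child)
            child.count += 1
            node = child
    return root, nodes_by_item


def build_tidsets(transactions):
    """Classical vertical representation: item -> set of transaction ids."""
    tidsets = defaultdict(set)
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets[item].add(tid)
    return tidsets


# Toy database, invented for this illustration.
transactions = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "b", "d"],
    ["a", "c"],
    ["b", "c"],
]

root, nodes_by_item = build_prefix_tree(transactions)
tidsets = build_tidsets(transactions)

for item in sorted(tidsets):
    counts = [n.count for n in nodes_by_item[item]]
    print(item, "tidset size:", len(tidsets[item]), "node counts:", counts)
```

In this toy database, item a occurs in five transactions, so its Tidset has five entries, yet all five transactions pass through a single tree node with count 5. For every item the node counts sum to the item's support, so no support information is lost, while the list of nodes is never longer than the Tidset.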
 
[1] Deng, Z. & Wang, Z. A New Fast Vertical Method for Mining Frequent Patterns. International Journal of Computational Intelligence Systems, 3(6): 733–744, 2010.
download website: http://www.tandfonline.com/doi/abs/10.1080/18756891.2010.9727736
 
[2] Deng, Z.; Wang, Z. & Jiang, J. A New Algorithm for Fast Mining Frequent Itemsets Using N-Lists. SCIENCE CHINA Information Sciences, 55(9): 2008–2030, 2012.
download website: http://info.scichina.com:8084/sciFe/EN/abstract/abstract508369.shtml
 
[3] Deng, Z. & Lv, S. Fast Mining Frequent Itemsets Using Nodesets. Expert Systems with Applications, 41(10): 4505–4512, 2014.
download website: http://www.sciencedirect.com/science/article/pii/S0957417414000463
