
Hidden Decision Trees vs. Decision Trees or Logistic Regression

Hidden decision trees are a statistical and data mining methodology (like logistic regression, SVM, neural networks, or decision trees) for handling problems with large amounts of data, non-linearities, and strongly correlated independent variables.

The technique is easy to implement in any programming language. It is more robust than decision trees or logistic regression, and it helps detect natural final nodes. Implementations typically rely heavily on large, granular hash tables.

No decision tree is actually built (hence the name hidden decision trees), but the final output of a hidden decision tree procedure consists of a few hundred nodes from multiple non-overlapping small decision trees. Each of these (invisible) parent decision trees corresponds, in a fraud detection model for example, to a particular type of fraud. Interpretation is straightforward, in contrast with traditional decision trees.
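Since the actual implementation is proprietary, the core idea described above can only be sketched. Below is a minimal, hypothetical illustration in Python: each "node" is simply a combination of binned feature values used as a hash-table key (no tree is ever grown), and scoring a transaction is a lookup. The feature names, binning, and node statistics here are assumptions for illustration, not the author's code.

```python
from collections import defaultdict

def node_key(transaction, features):
    # A "node" is just a combination of binned feature values,
    # stored as a hash-table key -- no tree is ever built.
    return tuple((f, transaction[f]) for f in features)

def train_nodes(transactions, features, label="fraud"):
    # One pass over the data: accumulate (count, fraud_count) per node.
    stats = defaultdict(lambda: [0, 0])
    for t in transactions:
        k = node_key(t, features)
        stats[k][0] += 1
        stats[k][1] += t[label]
    return stats

def score(transaction, stats, features):
    # Score = observed fraud rate in the transaction's node,
    # or None when the node was never seen in training.
    k = node_key(transaction, features)
    if k not in stats:
        return None
    n, frauds = stats[k]
    return frauds / n

# Toy data: hypothetical binned amount and country-risk flag.
train = [
    {"amount_bin": "high", "risky_country": 1, "fraud": 1},
    {"amount_bin": "high", "risky_country": 1, "fraud": 1},
    {"amount_bin": "high", "risky_country": 1, "fraud": 0},
    {"amount_bin": "low",  "risky_country": 0, "fraud": 0},
]
feats = ["amount_bin", "risky_country"]
stats = train_nodes(train, feats)
print(score({"amount_bin": "high", "risky_country": 1}, stats, feats))  # 0.666...
```

In this sketch, the few hundred well-populated keys of the hash table play the role of the final nodes; each distinct key combination corresponds to a leaf of some small, invisible decision tree.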

The methodology was first invented in the context of credit card fraud detection, back in 2003. It is not implemented in any statistical package at this time. Frequently, hidden decision trees are combined with logistic regression in a hybrid scoring algorithm, where 80% of the transactions are scored via hidden decision trees, while the remaining 20% are scored using a compatible logistic regression type of scoring.
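The hybrid routing described above can be sketched as follows: transactions that fall in a sufficiently populated node get that node's score, and the rest fall back to a logistic-regression-style score. This is a hypothetical sketch; the significance threshold, node statistics, and fallback scores are illustrative assumptions, not details from the source.

```python
MIN_NODE_SIZE = 50  # assumed significance threshold, not from the source

def hybrid_score(transaction, node_stats, key_fn, fallback):
    # Route: score by the hidden-tree node when it is well populated,
    # otherwise fall back to the compatible logistic-regression score.
    key = key_fn(transaction)
    if key in node_stats and node_stats[key][0] >= MIN_NODE_SIZE:
        n, positives = node_stats[key]
        return positives / n           # scored by the node
    return fallback(transaction)       # scored by logistic regression

# Toy usage: one well-populated node, one sparse node.
node_stats = {("high", 1): (200, 30), ("low", 0): (5, 0)}
key_fn = lambda t: (t["amount_bin"], t["risky_country"])
fallback = lambda t: 0.02  # stand-in for a calibrated logistic model

print(hybrid_score({"amount_bin": "high", "risky_country": 1},
                   node_stats, key_fn, fallback))  # 0.15  (node score)
print(hybrid_score({"amount_bin": "low", "risky_country": 0},
                   node_stats, key_fn, fallback))  # 0.02  (fallback)
```

The 80/20 split mentioned in the text then emerges naturally from the data: it is simply the fraction of transactions that land in statistically significant nodes versus those that do not.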

Hidden decision trees take advantage of the structure of large multivariate features typically observed when scoring a large number of transactions, e.g. for fraud detection. The technique is not connected with hidden Markov fields.


Replies to This Discussion

Does this apply to large data sets? I am currently using logistic regression to build a response model on about half a million customers with over 300 variables.
Yes, it was initially designed to handle data sets with 60,000,000 observations. It took SAS EM two days to analyze the lift from one rule set using decision trees, while hidden decision trees could process hundreds of rules in less than three hours when written in Perl, and in less than one hour when written in C.
This is quite interesting! Vincent, can you provide references to white papers/articles where I can study the details? Also, which software offers this algorithm?
Currently, this is proprietary. I'm not in academia, so I cannot publish papers on the subject. But I believe that this technology will bring tremendous business value, and become as popular as logistic regression or SVM.
Hi Vincent,

Is it available in SAS? If it is, kindly let me know how to do it.
It is not available in SAS or in other statistical packages. In SAS, you would have to call a few procedures from SAS Base and possibly write some macros to implement it. It's a new methodology.
Thanks Vincent. Could you please forward the code if you have it?
Vincent. The general idea sounds quite similar to Random Forests. Could you briefly explain how this differs?
  • It does not involve comparing / averaging / computing a mode across multiple decision trees with (potentially) overlapping nodes
  • No decision tree is actually built, so there's no splitting and no splitting criterion ever used (no pruning either)
  • Final nodes are not necessarily the "deepest" nodes; they usually are not very deep
  • Emphasis is not on producing maximum predictive power, but instead on maximum robustness to avoid over-fitting
Hi Matt, two more comments:
  • Hidden decision trees are a hybrid method. In the case I am working on, 75% of the transactions are scored via hidden decision tree nodes, and 25% are scored with another methodology. The reason is that only 75% of the transactions belong to statistically significant nodes. The remaining 25% cannot be handled by neighboring parent nodes because of bias: in a fraud detection system, these transactions tend to be more fraudulent than average.
  • Eventually, all methods are equivalent. A logistic regression on dummy variables ("logic" logistic regression) with 2nd-, 3rd- and 4th-order interactions, fitted on an observation matrix with a very large number of variables (mostly cross products of the initial variables) that is at the same time extremely sparse, using sophisticated numerical analysis techniques to handle the sparsity, is equivalent to decision trees.
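The equivalence claim above can be illustrated: each cross product of dummy variables corresponds to one leaf (node) of some decision tree, and any given row activates only a handful of them, which is why the design matrix is huge but extremely sparse. A hypothetical sketch (function and feature names are mine, for illustration only):

```python
from itertools import combinations

def interaction_dummies(row, max_order=3):
    # Each key is a cross product of elementary dummies. A row activates
    # exactly one dummy per feature subset, so the design matrix is
    # extremely sparse even though the number of columns is enormous.
    base = sorted(row.items())
    dummies = {}
    for order in range(1, max_order + 1):
        for combo in combinations(base, order):
            dummies[combo] = 1  # active column; all other columns are 0
    return dummies

row = {"amount_bin": "high", "risky_country": 1, "channel": "web"}
d = interaction_dummies(row, max_order=3)
# 3 single dummies + 3 pairwise + 1 triple = 7 active columns
print(len(d))  # 7
```

Each active key here, e.g. the pair (amount_bin=high, risky_country=1), is exactly the kind of node a small decision tree would produce, which is the sense in which the two representations coincide.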
"Random forests" are to "decision trees" what "hidden forests" are to "hidden decision trees". More on this later.
At this stage, I am not allowed to post source code or details on how to make this HDT technique effective. If I had the time and were in academia, I would write a seminal paper on the subject. Eventually, I will write more about it. Please be patient.



© 2021 TechTarget, Inc.
