
What are the advantages and drawbacks of writing decision rules (e.g. for fraud detection) in XML or PMML, vs. SQL?


Replies to This Discussion

For rule-based fraud detection I would probably investigate Drools further.
Good question.
Is a markup language functionally better?
I found the RuleML website made this case because of "permitting both forward (bottom-up) and backward (top-down) rules in XML for deduction, rewriting, and further inferential-transformational tasks".
Of course the fraud detection engine must be designed to exploit these technologies.

What I'd like to see is a strategic rules visualizer: one that shows the interdependencies and overlapping behaviors of the rules (including value-based accentuation).
...and, after further review of an impressive list of PMML examples, I wondered why I ever bothered writing SQL queries. Anomalous data is often, but not always, the first clue.
PMML supports:
-Association Rules
-Clustering Models
-Decision Trees
-Neural Networks
-Regression & General Regression Models
-Rule Set Models
-Sequence Models
-Support Vector Machines
-Naive Bayes
...and how long would it take to successfully deploy SQL versions of these models?
It takes me five minutes at most to deploy my multilayer backpropagation neural network models or decision tree models (CART, C5, etc.) in SQL and score them on several million rows.

I always use SQL scoring because I have no need to save the models in their deployed form (although I do retain the entire data prep and scoring process in a proprietary PMML-like format, the SPSS Clementine stream file), and I always work with the same data warehouse.

You could use either SQL or PMML for deployment; it just depends on what you will be deploying it on :) I'd stick with the SQL route for high-scale scoring on massively parallel data warehouses. If you want flexibility, storage and versioning, and might deploy the same models on numerous different systems, then PMML will probably be the more successful route. PMML is also much easier to work with if you want to visualise or graph the model in any way.

There are likely many models you can build with PMML that you can't in SQL, but the common ones we use (NN, Apriori, k-means, Kohonen, CART, C5) can be represented as SQL.
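
To make the "can be represented as SQL" point concrete, here is a minimal sketch in Python of how a small decision tree might be rendered as a nested SQL CASE expression and scored in the database. The tree, the field names (amount, account_age_days), the thresholds, and the labels are all invented for illustration, and an in-memory SQLite table stands in for the data warehouse. It is not the SQL that Clementine or any other tool actually generates.

```python
import sqlite3

# Hypothetical two-level decision tree written as nested dicts.
# Field names, thresholds, and labels are invented for illustration.
TREE = {
    "field": "amount", "threshold": 1000.0,
    "left": {"label": "ok"},                  # amount <= 1000
    "right": {                                # amount > 1000
        "field": "account_age_days", "threshold": 30.0,
        "left": {"label": "fraud"},           # large amount on a new account
        "right": {"label": "review"},         # large amount on an older account
    },
}


def tree_to_sql(node):
    """Render a tree node as a (possibly nested) SQL CASE expression."""
    if "label" in node:
        return "'{}'".format(node["label"])
    return "CASE WHEN {f} <= {t} THEN {l} ELSE {r} END".format(
        f=node["field"], t=node["threshold"],
        l=tree_to_sql(node["left"]), r=tree_to_sql(node["right"]))


def score_transactions(rows):
    """Load rows into an in-memory table and score them with one SELECT."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE txn (id INTEGER, amount REAL, account_age_days REAL)")
    conn.executemany("INSERT INTO txn VALUES (?, ?, ?)", rows)
    sql = "SELECT id, {} AS prediction FROM txn ORDER BY id".format(
        tree_to_sql(TREE))
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [(1, 50.0, 400.0), (2, 2500.0, 5.0), (3, 2500.0, 900.0)]
    for row in score_transactions(sample):
        print(row)
```

The whole model becomes a single expression inside the SELECT, so the database scores every row in one pass. That is the appeal of the SQL route on a large warehouse, and also why the result is tied to that one schema.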

Gordon Linoff recently wrote a good post on displaying neural networks.

Not sure what format he was using though... I presume PMML.


-Tim Manns
This is a great question, Vincent, and a great discussion as well.

In general, the rule of thumb for fraud is that the earlier you catch it, the more you save (money, hassle, ...). So, executing your predictive models in real time becomes very important.

In my view, PMML is definitely the way to go. Granted, SQL can make things easy if you are working on top of your database, but that is really the same as writing your own code to represent your model (or using SQL generated by a proprietary solution): there is no portability, and models cannot be shared between applications. Many statistical packages such as R, SPSS/Clementine, SAS, KNIME, etc. export models in PMML. With our ADAPA scoring engine (which reads in models represented in PMML) we have been able to score thousands of records per second.
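
As a rough illustration of the portability argument, the sketch below keeps a set of decision rules as XML data and evaluates them with a few lines of Python. The XML is a simplified fragment modeled on a PMML RuleSet (SimpleRule, SimplePredicate, CompoundPredicate, first-hit selection). It omits the namespace, Header, DataDictionary, and MiningSchema that a real PMML document carries and has not been validated against the PMML schema. The rules and field names are invented, and the evaluator covers only numeric comparisons combined with and/or. It says nothing about how ADAPA or any other engine works internally.

```python
import operator
import xml.etree.ElementTree as ET

# Simplified, hypothetical rule set modeled on a PMML RuleSet element.
RULESET_XML = """
<RuleSet defaultScore="ok">
  <RuleSelectionMethod criterion="firstHit"/>
  <SimpleRule id="large_new_account" score="fraud">
    <CompoundPredicate booleanOperator="and">
      <SimplePredicate field="amount" operator="greaterThan" value="1000"/>
      <SimplePredicate field="account_age_days" operator="lessOrEqual" value="30"/>
    </CompoundPredicate>
  </SimpleRule>
  <SimpleRule id="large_amount" score="review">
    <SimplePredicate field="amount" operator="greaterThan" value="1000"/>
  </SimpleRule>
</RuleSet>
"""

# Comparison operators handled by this sketch (numeric fields only).
OPERATORS = {
    "equal": operator.eq, "notEqual": operator.ne,
    "lessThan": operator.lt, "lessOrEqual": operator.le,
    "greaterThan": operator.gt, "greaterOrEqual": operator.ge,
}


def evaluate(pred, record):
    """Evaluate one predicate element against a dict of field values."""
    if pred.tag == "SimplePredicate":
        compare = OPERATORS[pred.get("operator")]
        return compare(float(record[pred.get("field")]),
                       float(pred.get("value")))
    if pred.tag == "CompoundPredicate":
        results = [evaluate(child, record) for child in pred]
        combiner = pred.get("booleanOperator")
        if combiner == "and":
            return all(results)
        if combiner == "or":
            return any(results)
        raise ValueError("unsupported booleanOperator: " + str(combiner))
    raise ValueError("unsupported predicate: " + pred.tag)


def score(ruleset, record):
    """Return the score of the first rule that fires, else the default."""
    for rule in ruleset.findall("SimpleRule"):
        if evaluate(rule[0], record):  # rule[0] is the rule's predicate
            return rule.get("score")
    return ruleset.get("defaultScore")


if __name__ == "__main__":
    ruleset = ET.fromstring(RULESET_XML.strip())
    print(score(ruleset, {"amount": 2500, "account_age_days": 5}))
```

Because the rules live in a document instead of in a query against one particular schema, the same file can be versioned, inspected, or handed to a different scoring engine. The cost is that something has to interpret it at scoring time.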

So, we believe PMML is extremely powerful when it comes to representing predictive models (and rules). In ADAPA, however, we have taken a different route when it comes to rules. Basically, as Jens suggested, we have been using Drools as our rules engine. ADAPA actually integrates PMML (predictive models) and rules seamle.... In this way, one can use rules to implement logic around PMML models, such as segmentation, or use the score from one model to decide, for example, which model to execute next. By having Drools as part of ADAPA, we benefit from a robust and fast rules engine.
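
The idea of wrapping rules around models can be sketched in a few lines of plain Python. Everything here is hypothetical: the three "models" are stand-in functions with made-up logic, and the field names and the 0.2 cutoff are assumptions chosen for the example. It is not Drools rule syntax and does not reflect how ADAPA is implemented.

```python
def screening_model(txn):
    """Hypothetical cheap first-pass model: risk score in [0, 1]."""
    return min(1.0, txn["amount"] / 5000.0)


def card_model(txn):
    """Hypothetical model for the card segment."""
    return 0.9 if txn["country"] != txn["home_country"] else 0.2


def wire_model(txn):
    """Hypothetical model for the wire-transfer segment."""
    return 0.8 if txn["amount"] > 3000 else 0.1


def route(txn):
    """Rules layer: pick the next model from the first score and the segment."""
    first = screening_model(txn)
    # Rule 1: a low screening score ends the evaluation early.
    if first < 0.2:
        return ("screening", first)
    # Rule 2: segmentation by channel decides which model runs next.
    if txn["channel"] == "card":
        return ("card", card_model(txn))
    return ("wire", wire_model(txn))


if __name__ == "__main__":
    txn = {"amount": 4000, "channel": "card",
           "country": "FR", "home_country": "US"}
    print(route(txn))
```

In a rules engine the two if statements would be declarative rules maintained separately from the models, so the routing logic can change without retraining or re-exporting anything.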

