
Simple example of wrapping an open source learner in PMML?

Does anyone know of examples of wrapping an existing piece of supervised learning software so that it outputs models in PMML format? Of particular interest are learners that just take in labeled vectors of numbers as training data and output models that are pretty much just coefficient vectors (liblinear, SVMlight, BXRtrain, BOW, etc.). That is, they have no smarts about data types, ranges of legal feature values, etc.: something else is assumed to deal with that and present the learner with appropriate numeric vectors.
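To make the "model is just a coefficient vector" point concrete, here is a minimal Python sketch that reads the weights out of a two-class liblinear model file. It assumes the plain-text layout liblinear's `save_model` writes (`key value` header lines, a lone `w` line, then one weight per line) and handles only the binary, single-weight-column case; `parse_liblinear_binary_model` is a hypothetical helper, not part of liblinear. Nothing in the file says what any feature means.

```python
def parse_liblinear_binary_model(text):
    """Return (header, weights, intercept) from a two-class liblinear model file.

    Assumes liblinear's plain-text layout: 'key value' header lines, a line
    holding only 'w', then one weight per feature (plus one for the bias
    feature when bias >= 0). Hypothetical helper, not part of liblinear.
    """
    lines = text.splitlines()
    header = {}
    i = 0
    while i < len(lines) and lines[i].strip() != "w":
        key, _, value = lines[i].strip().partition(" ")
        header[key] = value
        i += 1
    if i == len(lines):
        raise ValueError("no 'w' line found")
    if header.get("nr_class") != "2":
        raise ValueError("only two-class models are handled here")

    rows = [line.split() for line in lines[i + 1:] if line.strip()]
    if any(len(row) != 1 for row in rows):
        raise ValueError("expected a single weight column")
    weights = [float(row[0]) for row in rows]

    nr_feature = int(header["nr_feature"])
    bias = float(header.get("bias", "-1"))
    expected = nr_feature + (1 if bias >= 0 else 0)
    if len(weights) != expected:
        raise ValueError(f"expected {expected} weights, got {len(weights)}")

    # With bias >= 0, liblinear appends a constant feature whose value is
    # `bias`; its weight times that value acts as an intercept.
    intercept = weights[nr_feature] * bias if bias >= 0 else 0.0
    return header, weights[:nr_feature], intercept
```

A positive score corresponds to the first class on the `label` line, which is one more piece of metadata a wrapper would have to carry into the PMML output.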

For such software, all the interesting data dictionary information would need to be supplied alongside the input data if it's going to show up in the PMML model that's output. There's nothing conceptually difficult about this; what I'm curious to see is whether any conventions, design patterns, etc. have grown up in the PMML community for doing it.
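A minimal sketch of the wrapping itself, assuming the data dictionary information arrives as a list of per-feature dicts alongside the training data (that input format and the name `linear_model_to_pmml` are hypothetical, not an established PMML convention). It uses Python's standard `xml.etree.ElementTree` to emit a PMML 4.4 document with a `DataDictionary` and a `RegressionModel` whose `RegressionTable` carries the intercept and one `NumericPredictor` per coefficient. The output is the raw linear score; for a binary classifier, thresholding at zero is left to the consumer here.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_4"
ET.register_namespace("", PMML_NS)  # serialize with a default xmlns


def _q(tag):
    """Qualify a tag name with the PMML namespace."""
    return f"{{{PMML_NS}}}{tag}"


def linear_model_to_pmml(fields, target, coefficients, intercept):
    """Wrap a bare coefficient vector in a PMML 4.4 RegressionModel.

    fields: per-feature metadata supplied alongside the training data, in the
        same order as the learner's feature indices, e.g.
        {"name": "age", "dataType": "double", "interval": (0, 120)}.
    """
    if len(fields) != len(coefficients):
        raise ValueError("need exactly one coefficient per field")

    pmml = ET.Element(_q("PMML"), version="4.4")
    ET.SubElement(pmml, _q("Header"), description="wrapped linear learner")

    # DataDictionary: everything the learner itself never knew about.
    dd = ET.SubElement(pmml, _q("DataDictionary"),
                       numberOfFields=str(len(fields) + 1))
    for f in fields:
        df = ET.SubElement(dd, _q("DataField"), name=f["name"],
                           optype="continuous",
                           dataType=f.get("dataType", "double"))
        if "interval" in f:
            lo, hi = f["interval"]
            ET.SubElement(df, _q("Interval"), closure="closedClosed",
                          leftMargin=str(lo), rightMargin=str(hi))
    ET.SubElement(dd, _q("DataField"), name=target, optype="continuous",
                  dataType="double")

    # The model proper: a single linear regression table.
    model = ET.SubElement(pmml, _q("RegressionModel"), functionName="regression")
    schema = ET.SubElement(model, _q("MiningSchema"))
    for f in fields:
        # usageType defaults to "active" when omitted
        ET.SubElement(schema, _q("MiningField"), name=f["name"])
    ET.SubElement(schema, _q("MiningField"), name=target, usageType="target")
    table = ET.SubElement(model, _q("RegressionTable"),
                          intercept=str(float(intercept)))
    for f, c in zip(fields, coefficients):
        ET.SubElement(table, _q("NumericPredictor"), name=f["name"],
                      coefficient=str(float(c)))

    return ET.tostring(pmml, encoding="unicode")
```

The design choice is that the learner's output (coefficients, intercept) and the caller's metadata (names, types, ranges) stay separate until this one function joins them, which keeps the learner itself ignorant of PMML.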

The PMML website has a <a href="http://www.dmg.org/products.html">list of software that either consumes or produces models in PMML</a>, but this is mostly commercial closed-source software. The programs with open source versions listed there (RapidMiner and WEKA, as far as I can spot) are rather complex data mining suites. What I'd like to see is an example of a minimalist wrapping of a simple, one-trick-pony kind of learner.

Tags: design-patterns, open-source, software


Replies to This Discussion

KNIME (also open source) recently added PMML preprocessing support, so you can now add preprocessing to the PMML model using the graphical workflow editor (see also our KDD-PMML Workshop paper). Internally, KNIME wraps the data dictionary around a model that doesn't know much about it, so maybe a look at the KNIME code would help? But it sounds as if you are looking for something even simpler...

 

Michael

Yes, I was hoping to find an example of something simple like SVMlight being wrapped for PMML. But it looks like people only take on PMML when the effort can be amortized over some huge system. Well, fools rush in... we're doing it on a project I'm on, and I'll post about it when we're done.


© 2020 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC
