A Data Science Central Community
I'm working on a stand-alone software application right now that needs a couple of relatively simple analytic routines, PCA and agglomerative hierarchical clustering, for a dataset with just a few thousand records. The application itself is in VB.NET, and I was looking for a suitable math/stats/data-mining library that I could call from .NET to do the number crunching. So far, I've not had much success.
These routines are simple enough that I can code them if necessary, but I would rather save the development/maintenance time and hook into something that's already built and heavily tested, for a reasonable fee.
I have to be able to distribute the library as part of a commercial application, so I need access to appropriate commercial licensing.
What has worked well for you?
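For anyone wondering what the "code it myself" fallback would involve, here is a minimal sketch of naive agglomerative clustering. It is written in Python rather than VB.NET purely for brevity, and it assumes single linkage and Euclidean distance, neither of which is specified in the question. The function name `agglomerative` and the sample points are made up for illustration.

```python
import math

def agglomerative(points, k):
    """Naive single-linkage agglomerative clustering down to k clusters.

    Returns a list of clusters, each a list of indices into `points`.
    """
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage (an assumption): distance between the closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        # Find the closest pair of clusters.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        # Merge the closest pair into one cluster.
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Illustrative data only: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
result = agglomerative(points, 3)
```

This brute-force version recomputes every pairwise linkage on each merge, so it would likely be slow on a few thousand records without caching the distance matrix. That is part of the case for using a tested library instead.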
Have you looked at R and RdotNet? http://rdotnet.codeplex.com/
Step van Schalkwyk
Only briefly as I was (at least originally) looking for something more specific (PCA and clustering).
Have you had success using this combination? Any tips or watchouts for me?
Update: I got back to this today and, after dealing with a mismatch between 32-bit R and VB.NET defaulting to a 64-bit build :-), it does seem to work very well indeed.
Data moves between .NET and R seamlessly (at least for the relatively small vectors I have to deal with). I'm already running PCA and cluster analysis appears to be only slightly harder.
Great tip - thanks Step.
Always welcome. Very pleased it works for you. Keep us posted about large vector performance if you would.
Have you looked at this?
Thanks for the suggestion Jim.
I checked out the API reference (http://api.mathdotnet.com/). It seems to have a sound set of math/stats functions that are more advanced than what I get natively from .NET, but I don't see much on the stats side beyond summary statistics (average, median, min, max, standard deviation, etc.).
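As a point of comparison, when a library offers general numerics but no ready-made PCA routine, the first principal component can be assembled from basic pieces: center the data, form the covariance matrix, and find its dominant eigenvector. The sketch below does this in plain Python with power iteration. It is only an illustration under those assumptions, not the API of any library mentioned in this thread, and the function name `first_principal_component` and the sample rows are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (unit direction, variance along it) for the first principal component."""
    n = len(rows)
    dims = len(rows[0])

    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]

    # Sample covariance matrix (divides by n - 1).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1)
            for b in range(dims)] for a in range(dims)]

    # Power iteration: repeatedly multiply by cov and renormalize.
    # Assumes the data is not constant, so the norm never hits zero.
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dims)) for a in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient: the variance captured along direction v.
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(dims))
                   for a in range(dims))
    return v, variance

# Illustrative data only: points lying close to the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]
direction, variance = first_principal_component(rows)
```

For this sample the direction comes out close to the slope-2 line. Getting further components would need deflation or a full eigendecomposition, which is where a dedicated stats package earns its keep.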