Comment Wall (4 comments)
Thanks for the positive feedback.
Bruce
First, you are absolutely right that it would make sense to release now. As they say, release early, release often. Perfectionism is a developer's worst enemy. At the same time, you also can't succeed without some amount of perfectionism. So I guess you have to be a bit schizophrenic about that. The temptation to delay release in favor of some additional features is just too strong to resist right now.
How long did it take you to create the tool?
About 10 months.
What languages/tools did you use?
C# for the backend, Silverlight for the UI. Java would have been a better choice; it's more portable. There are ways to run C# code on Linux systems, but they aren't very robust.
Can I connect to my MS SQL database?
Right now, it is necessary to import the data (e.g., export the database content to CSV, then import that file). Perhaps later this will become easier. When running from the "cloud", direct access to enterprise SQL databases can be a bit tricky. On the other hand, direct access to rich online sources of business data is very feasible (e.g., connect to SalesForce.com over the Internet and download the business data). Once the data has been imported, it is stored in SQL along with the analysis results.
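To make that CSV round-trip concrete, here is a minimal C# sketch of the import step. The tool's actual import API isn't shown in this thread, so ImportCsv and the orders.csv file name are assumptions, and a real importer would also need to handle quoted fields and embedded commas.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Minimal sketch of the "export to CSV -> import" path described above.
// ImportCsv is illustrative; the tool's actual import API is not public.
class CsvImportSketch
{
    // Reads a CSV file and returns each row as an array of field values.
    // (Quoted fields and embedded commas are ignored for brevity.)
    static List<string[]> ImportCsv(string path)
    {
        var rows = new List<string[]>();
        foreach (var line in File.ReadAllLines(path))
        {
            if (line.Length == 0) continue;  // skip blank lines
            rows.Add(line.Split(','));
        }
        return rows;
    }

    static void Main()
    {
        // e.g. a file exported from the enterprise SQL database
        var rows = ImportCsv("orders.csv");
        Console.WriteLine("Imported {0} rows.", rows.Count);
    }
}
```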
How did you come up with the very robust visualizations?
I think that's probably more art than science - inspiration is available to all of us! Technologies such as Flash or Silverlight help too.
What is the largest data set that you have analyzed using your tool?
I routinely analyze 100K+ record data sets. So not terabytes of data, but enough for many small-to-medium business scenarios, perhaps stretching to large marketing campaigns, individual web logs, or individual product orders.
And OK, underneath it all (can you discuss?), are you relying on open source (and very powerful) tools like Weka or R, on proprietary algorithms, or both?
Just robust implementations of efficient data mining algorithms found in the literature, with some tweaks to increase robustness (e.g., handling a mix of discrete / numeric / missing values) and performance (e.g., replacing discrete values with hash values). There are just too many issues with using open source software when it comes to commercialization.
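The comment mentions replacing discrete values with hash values; the hypothetical C# sketch below uses dense dictionary codes instead of raw hashes, which serves the same purpose (cheap integer comparisons during the mining passes) while avoiding collisions, and also shows how missing values can be folded into a reserved code. All names and details here are assumptions, since the actual implementation is proprietary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not the tool's actual code. Discrete (string) values
// are replaced by small integer codes so that later mining passes compare
// ints instead of strings; missing values are folded into a reserved code.
// Numeric columns would simply be parsed and kept as doubles.
class DiscreteEncoder
{
    public const int Missing = -1;              // reserved code for missing values
    readonly Dictionary<string, int> codes = new Dictionary<string, int>();

    public int Encode(string raw)
    {
        if (string.IsNullOrEmpty(raw))
            return Missing;

        int code;
        if (!codes.TryGetValue(raw, out code))  // first occurrence: assign the next code
        {
            code = codes.Count;
            codes.Add(raw, code);
        }
        return code;
    }
}

class Program
{
    static void Main()
    {
        var color = new DiscreteEncoder();        // one encoder per discrete column
        Console.WriteLine(color.Encode("red"));   // 0
        Console.WriteLine(color.Encode("blue"));  // 1
        Console.WriteLine(color.Encode("red"));   // 0 again: same value, same code
        Console.WriteLine(color.Encode(""));      // -1 (missing)
    }
}
```

Dense codes have a side benefit over raw hashes: they double as array indices, so frequency counts over a discrete column can be kept in a flat int[] rather than a hash table.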
Thanks again!