
Bruno M
  • Male
  • New York, NY
  • United States

Bruno M's Friends

  • Marija Blagojević
  • Visual Mining, Inc.
  • Dominic Pouzin
  • Bansi Patel
  • Manish
  • Bruce Ratner
  • Vincent Granville

Bruno M's Page

Profile Information

Intelligent Mining, Inc.
Manager, Director
Job Function:
Business Analytics, Marketing Databases, Web Analytics
Short Bio:
LinkedIn Profile:

Leo Breiman's paper Statistical Modeling: The Two Cultures

I really enjoyed this paper when I first read it - it helped clear some things up for me.

This paper is pretty basic compared to some of the stuff on the site. Nevertheless, as I am always interested in the non-formal education of an analyst in today's world, I wanted to share something I found helpful.

Bruno M's Blog

4 open source data mining tools (with GUI)

Posted on April 21, 2009 at 9:30am 1 Comment

I was at the Semantic Web Meetup @ the Hearst Building in NYC (amazing venue, the first green building completed in NYC) yesterday, and someone asked about open source tools available for data mining, specifically for clustering. Unfortunately I had to run out after the meetup and couldn’t provide these to him. The one mentioned by the presenter was Weka, which is also the first free open source tool I came across.

Anyway, here are the ones I have found that are worth checking out and…

Comment Wall (4 comments)

At 8:32am on November 23, 2009, John F. Elder IV said…
Bruno, just noticed your comment; thanks! The book seems to be doing well. Me too. How are you? -John
At 4:24am on June 12, 2009, Bruce Ratner said…
Thanks for positive feedback.
At 12:41pm on May 13, 2009, Dominic Pouzin said…
Absolutely, feel free to re-post my comments. Would you be interested in playing with the tool when I have a beta? I'd love to get feedback from you!
At 12:10pm on May 13, 2009, Dominic Pouzin said…
Hi Bruno, I stumbled on your blog page with some really good questions - thanks for the plug too! Somehow, I was unable to log in to leave a comment, so I'm answering right here.

First, you are absolutely right that it would make sense to release now. As they say, release early, release often. Perfectionism is a developer's worst enemy. At the same time, you also can't succeed without some amount of perfectionism. So I guess you have to be a bit schizophrenic about that. The temptation to delay release in favor of some additional features is just too strong to resist right now.

How long did it take you to create the tool?
About 10 months.

What languages/tools did you use?
C# for the backend, Silverlight for the UI. Java would have been a better choice because it's more portable. There are ways to run C# code on Linux systems, but they're not very robust.

Can I connect to my MS SQL database?
Right now, it is necessary to import the data (ex: export database content to CSV -> import that). Perhaps later, this will become easier. When running from the "cloud", direct access to enterprise SQL databases can be a bit tricky. On the other hand, direct access to rich online sources of business data is very feasible (ex: connect to over the Internet -> download business data). Once the data has been imported, it (along with analysis results) is stored in SQL.
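The "export to CSV, then import into SQL" workflow described above could look roughly like the following. This is only a minimal sketch, not the tool's actual code: it assumes Python with the standard-library SQLite driver standing in for whatever SQL store the tool uses, and the names `quote_ident` and `import_csv` are made up for illustration.

```python
import csv
import io
import sqlite3


def quote_ident(name):
    """Quote a table or column name so spaces and quotes are safe in SQL."""
    return '"' + name.replace('"', '""') + '"'


def import_csv(conn, table, csv_file):
    """Load a CSV file object into a new table; return the number of rows stored."""
    reader = csv.reader(csv_file)
    header = next(reader)  # first line is assumed to hold the column names
    columns = ", ".join(quote_ident(name) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute("CREATE TABLE {} ({})".format(quote_ident(table), columns))
    # Skip malformed lines whose field count does not match the header.
    rows = [row for row in reader if len(row) == len(header)]
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quote_ident(table), placeholders),
        rows,
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    exported = io.StringIO("id,name\n1,alice\n2,bob\n")  # stand-in for an exported file
    print(import_csv(conn, "customers", exported))
```

Values are stored as text here; a real importer would also need to infer column types.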

How did you come up with the very robust visualizations?
I think that's probably more art than science - inspiration is available to all of us! Technologies such as Flash or Silverlight help too.

What is the largest data set that you have analyzed using your tool?
I routinely analyze 100K+ record data sets. So not terabytes of data, but enough for many small to medium business scenarios, perhaps stretching to large marketing campaigns / individual web logs / individual product orders.

And ok, underneath it all (can you discuss?), are you relying on open source (and very powerful) algorithms (like Weka, R), are you using proprietary algorithms, both?
Just robust implementations of efficient data mining algorithms found in the literature, with some tweaks to increase robustness (ex: handle a mix of discrete / numeric / missing values), and performance (ex: replace discrete values by hash values). There just are too many issues with using open source software in terms of commercialization.
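To make the two tweaks mentioned above concrete, here is one way a mix of discrete, numeric and missing values might be handled, with discrete values replaced by hash values. This is a sketch under assumptions, not the tool's implementation: the function names, the bucket count of 1024 and the choice of CRC32 as the hash are all illustrative.

```python
import zlib


def encode_value(value, n_buckets=1024):
    """Encode one raw cell: None if missing, float if numeric, int hash bucket if discrete."""
    if value is None or (isinstance(value, str) and value.strip() == ""):
        return None  # missing value, left for the algorithm to handle
    try:
        return float(value)  # numeric values pass through unchanged
    except (TypeError, ValueError):
        # CRC32 is stable across runs, unlike Python's built-in hash() for strings.
        return zlib.crc32(str(value).encode("utf-8")) % n_buckets


def encode_record(record, n_buckets=1024):
    """Encode every cell of one record (a list of raw values)."""
    return [encode_value(cell, n_buckets) for cell in record]


if __name__ == "__main__":
    print(encode_record(["42", "New York", ""]))
```

Comparing small integers is cheaper than comparing strings, which is presumably where the performance gain comes from. Two distinct values can land in the same bucket, so a real implementation would need to decide whether such collisions are acceptable.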

Thanks again!
