I've been exploring free DM software solutions, but I must confess I'm quite lost by now. With this in mind, I would like your opinion on this:
"What do you reckon is the best open source DM software available, weighing ease of work and the power of the algorithms?"
Please state your opinions.

Disclaimer: I have nothing to disclose. I have no financial interest in the solutions I'm suggesting, and no other interest in them that might allow me to benefit if they are used.

For a specific, extensible Data Mining solution, consider Pentaho. Pentaho has a well-defined API, which allows for the development of a larger body of plugins than would otherwise be possible. Mondrian, for example, provides OLAP.

I don't believe it is possible to fault anyone who begins with Pentaho, unless all they need is a solution to something as simple as a homework assignment; something that is solvable using Microsoft Excel and its web querying capability. This, by the way, may be what you are implying when you refer to "ease of work". In my opinion there isn't such a thing as "light" Data Mining. No matter how you slice it, either you are looking at plain-old SQL queries or you are looking at N-dimensional arrays. Once you hit three dimensions, adding more should require less and less effort if the tool is meant for use on "N-dimensions".

To clarify my use of "SQL": this is a reference to Structured Query Language, not to the Microsoft database product whose name borrows the term "SQL". The standardized SQL syntax is an ANSI standard.
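To make the contrast between plain SQL queries and N-dimensional arrays a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `sales` table, its columns, the sample rows, and the helpers `build_cube` and `roll_up` are hypothetical and are not part of Pentaho, Mondrian, or any other tool mentioned here. It only shows that a flat GROUP BY answers one question at a time, while a cube keyed by N dimensions can be collapsed onto any subset of them with the same code.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, product, year, amount).
ROWS = [
    ("north", "widget", 2007, 10.0),
    ("north", "gadget", 2007, 5.0),
    ("south", "widget", 2007, 7.0),
    ("south", "widget", 2008, 3.0),
]


def sql_totals_by_region(rows):
    """Flat view: a single GROUP BY query over a relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return result


def build_cube(rows, n_dims):
    """N-dimensional view: totals keyed by the first n_dims columns."""
    cube = defaultdict(float)
    for row in rows:
        cube[tuple(row[:n_dims])] += row[-1]
    return dict(cube)


def roll_up(cube, keep):
    """Collapse a cube onto the dimension positions listed in keep."""
    out = defaultdict(float)
    for key, value in cube.items():
        out[tuple(key[i] for i in keep)] += value
    return dict(out)


if __name__ == "__main__":
    print(sql_totals_by_region(ROWS))
    cube = build_cube(ROWS, 3)
    print(roll_up(cube, [0]))     # totals by region
    print(roll_up(cube, [1, 2]))  # totals by product and year
```

Adding a fourth or fifth dimension to this sketch only changes the `n_dims` argument, which is the sense in which extra dimensions should cost little once a tool is built for N of them.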

For more Open Source info that pertains to the subject matter, you will find over 900 hits for "Data Mining" alone, on SourceForge. Another place to look is the website for the Open Source Foundation but I bet it points you back to SourceForge.

If you can detail the things I'm referring to below it would help me understand what you are after.

1) "free" = "Open Source" as you are using the term, correct? And

2) I'm assuming that by reference to "DM" you are referring to Data Mining. In common usage, the term can refer to a scope so large that a meaningful answer isn't possible without verbal discussion too, and it can be so narrow that the Data Mining aspect is useful only if you already have software to do the other things (e.g. OLAP, data integration/aggregation, reporting) but this means an API must be defined.

3) What are one or two of the packages you have already looked at and how close does each come to what you seek, and

4) What are one or two aspects whose purpose escapes your grasp at this moment, and

5) What do you mean by "best"? Off-the-Shelf (OTS) software typically gives the user 80% of the functionality at less than 20% of the cost (of custom software). This makes a universal "best" virtually impossible.

I hope this helps!

--John Crout
Many thanks for your reply. It helped a lot.

As my first language isn't English, I think my original message wasn't always very clear. I will try to clear up the points that were misunderstood.

1) You are right. When I wrote "open source", I meant free.
2) I wouldn’t call myself a “light” Data Mining user. I would call myself more of a “mid-weight” user. I also don’t think that “light” Data Mining is possible. When I spoke of ease of use, I was referring mostly to the difference between the learning curves of SAS Enterprise Miner or Clementine and that of R. I would call SAS and Clementine “easier”. I don’t know if the expressions I’m using are the best…

So, my main problem is to find a free Data Mining package, preferably “similar” to the packages I’m used to (SAS EM, or SPSS Clementine).

Thanks for your advice about Pentaho. I think its data mining solution (based on Weka, right?) is perhaps the best fit for my challenges. I think I will investigate it, and perhaps R as well (the Rattle interface might ease my learning curve).

Best regards,
Here is a blog on 3 awesome free Math programs, and the comments provide lots more.

I'll look into your advice (I already intended to investigate R).

Many thanks,

I would suggest reviewing the following tools:

- R Project with the Rattle GUI from Togaware
- RapidMiner
I didn't mean to come across as though I was criticizing in any way, and I apologize because I did not see this post before now. (This is a reply to this: dated Dec 5.)

Regarding SAS and SPSS: I'm not aware of anything that approaches either one in terms of the analyses they can do. You may wish to consider starting a project on SourceForge, whether or not you program, and challenging the community of programmers to develop something. You could serve as someone who manages the project, develops the requirements for it, and functions as a tester (and you may be more successful at recruiting others if you do this). You could begin by establishing what you need as the initial requirements to be developed. Just be sure to make available whatever relevant information you can, so that you do not come across as someone simply looking to get a lot without writing code.

In my opinion, a tool comparable to the SAS base needs to be available to Open Source folks.

P.S. You may also wish to add, in quotes, "Open Source" to the tags for this post.
Please refer to the article "Open-Source Tools for Data Mining", written by Blaz Zupan and Janez Demsar; the full text can be found on

I think this article can answer your question well. There are several popular open source data mining packages, such as Weka, RapidMiner, KNIME, and R. Weka is very powerful, but it seems to be harder to use than the commercial software (SAS EM or Clementine). In my opinion, you should try KNIME.
Please realize, when you use the term "open source", that it has a strict definition, which is not the same as "free" in the zero-cost sense.

The most widely accepted definition of the term "open source" is by the Open Source Initiative (

Some software claims to be open source (e.g. KNIME), but it is not. Even its own software license explicitly says it is not open source.
What do you call software with open source code that isn't "open source software"?
That depends on what you mean by "open", i.e. what your rights as a user are with the source code.

With open source, what you can expect is clearly defined. Any other license that lets you see the source code can restrict you in any way possible, e.g. you can see the code, but you may not use it for anything.

Therefore it is not fair to call it open source. By calling it open source anyway, you unfairly benefit from the goodwill of true open source and devalue its true meaning.

What you will call it then depends on what rights you want to grant to your users, but if you do not want to use a true open source license, you are on your own.


