Hello all-

I am coming from years of SPSS. I am looking at some different data mining packages to learn and use (to supplement or maybe supplant SPSS). Clementine/Data Modeler will be out of my price range.

Looking at the polls, it seems RapidMiner is the most frequently used tool in data mining (other than the commercial SPSS and SAS, or the basic R console). At a glance, RapidMiner looks promising.

My question is: what are the benefits of RapidMiner? What has made it so popular and so widely used? Recommendations for it or against it? What are its pros and limitations?

Thanks

Myles

PS: I saw similar posts from back in 2009, but assumed that much of that information may have changed since then.

Tags: data, mining, rapidminer, review, software


Replies to This Discussion

Hi Myles,

First of all, I am biased since I work for Rapid-I, the company behind RapidMiner. However, I wanted to share some thoughts with you and hope that they are helpful...

There are a couple of reasons why so many people (of course not all) love using RapidMiner:

  • license model: it's your choice. You can start with the already powerful Community Edition for free, or you can switch to the more feature-rich Enterprise Editions, which include support and other services. Even then, license costs are much lower since licensing is not done per end user.

  • ease of use & analyst support: of course there is a nice and intuitive GUI. But RapidMiner is particularly amazing when it comes to supporting beginners as well as experienced analysts. Thanks to the continuous propagation of metadata through the process, you can always see what to expect without actually executing the analysis workflow (which saves a lot of time, especially when working with large data sets). If RapidMiner detects a potential problem, it informs the user and offers so-called quick fixes. RapidMiner can also analyze the analytical process / workflow itself and recommend useful next steps. There are also assistants like Parent, which uses meta-learning to suggest algorithms and parameters that are likely to work well, or the Intelligent Discovery Assistant, where you only define your analytical goal and the data, and the assistant automatically plans a complete workflow for you. And there is more, such as lots of wizards, template processes for frequent tasks, etc.

  • server: like the old and big players, RapidMiner offers a full server (called RapidAnalytics) for remote execution, resource sharing, scheduling, and user and rights management, but also for the easy integration of analytical processes via web services and web-based reporting, among others.

  • community & innovation: RapidMiner is used in more than 35,000 production environments worldwide and has several hundred thousand frequent users. Of course there is also a very active community helping each other in support forums or creating new modules for RapidMiner. Many of those developers are strongly connected to science and develop the latest algorithms much more quickly than the big players. For example, RapidMiner offered Support Vector Machines in different flavours as early as 2001, while the big players needed almost 10 more years to add their own implementations. The community also ensures very high product quality, since errors are detected very early and, thanks to the open source nature of RapidMiner, they are fixed much faster as well.

  • features: RapidMiner offers many more operations than the traditional players. This is especially true for the modeling schemes, where, as stated above, many more innovative methods are available. But also in the fields of text mining and web mining, when dealing with less common data formats, and for data transformations in general, RapidMiner is probably the most feature-rich platform available on the market. See the fact sheet for an overview: http://rapid-i.com/downloads/brochures/RapidMiner_Fact_Sheet.pdf

  • modular concept: for example, SPSS does not really allow you to evaluate the influence of data preprocessing. This means that prediction quality is most often estimated over-optimistically. In RapidMiner, you can combine all the evaluation methods with all learners and all types of preprocessing. Preprocessing steps deliver preprocessing models that are built from the training data only (hence the realistic estimate) and can later be applied to test or application data.

  • scaling up: many people have already agreed that the in-memory calculations of 32-bit RapidMiner are more memory-efficient than, for example, those of R or SPSS (SAS has always delivered impressive scalability in that respect, kudos!). Things have changed a lot since the wide adoption of 64-bit systems and larger amounts of main memory. For really large data sets, the Enterprise Editions of RapidMiner at least offer much more: for example, in-database mining, where modeling takes place directly in the database itself; connectors to column-oriented database systems like VectorWise; and Radoop, one of the very first products for creating analytical workflows on top of Hadoop clusters: http://www.radoop.eu/ or http://siliconangle.com/blog/2011/08/11/radoop-its-like-yahoo-pipes...

  • augmentable: things are simple in RapidMiner. Virtually no programming takes place; processes are defined merely by putting components together, which also improves maintainability. However, in the rare case that you miss some functionality, there are multiple ways to augment the system:
    • Creating and invoking R scripts
    • Extending RapidMiner by creating new operators against the RapidMiner Java API
    • Adding on-the-fly operators written in Groovy
    • Invoking command line calls of external programs
    • Last but not least, invoking external web services

  • unified environment: not so interesting for data mining itself, but still a very important point. There is no break between ETL, data mining, and reporting. Most other tools would require switching systems for those different tasks. In the RapidMiner world, you use the same environment and concepts for all tasks around data analysis. Everything is a process; even the creation of elements of web-based reports is a process. This also allows direct integration with business processes and the creation of feedback (like writing back into a database or triggering actions), something which is usually not possible with other solutions.

  • company & services: and of course there is Rapid-I, the company behind all this. In contrast to many other open source suites, RapidMiner has been designed from day one to be useful to both communities: scientific users as well as enterprise users solving real-world problems. To help those users achieve the greatest possible success, Rapid-I and our very engaged support team do their best to support our customers as well as possible. In fact, these are some of the things we hear most often from customers shortly after they switch from SPSS or SAS to RapidMiner: Rapid-I's support response times are much shorter; help was, well, more helpful; and even fixes or small enhancements are delivered as well and as fast as possible. Compare this: http://www.decisionstats.com/using-rapid-miner-and-r-for-sports-ana...
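The "modular concept" point (a preprocessing step produces a model fitted on the training data only, which is then applied unchanged to held-out data) is the general recipe for avoiding an over-optimistic validation estimate. The following is a minimal, hypothetical sketch in plain Python, not RapidMiner code and not how RapidMiner implements it: a z-score normalization is re-fitted inside each cross-validation fold, and a toy 1-D nearest-centroid learner stands in for a real model. All function names and the sample data are illustrative assumptions.

```python
from statistics import mean, pstdev

# Hypothetical illustration only: not the RapidMiner API.


def fit_zscore(values):
    """Fit a z-score 'preprocessing model' (mean, std) on training values."""
    sigma = pstdev(values)
    # Fall back to 1.0 so a constant feature does not cause division by zero.
    return mean(values), (sigma if sigma > 0 else 1.0)


def apply_zscore(model, values):
    """Apply an already fitted z-score model to new values."""
    mu, sigma = model
    return [(v - mu) / sigma for v in values]


def fit_centroids(xs, ys):
    """Toy learner: the mean (normalized) feature value of each class."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label) for label in set(ys)}


def predict(centroids, xs):
    """Assign each value to the class whose centroid is nearest."""
    return [min(centroids, key=lambda c: abs(x - centroids[c])) for x in xs]


def cross_validate(xs, ys, k=3):
    """k-fold CV that re-fits the normalization on each training fold only.

    Assumes len(xs) >= k and that every training fold contains each class.
    """
    n = len(xs)
    accuracies = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        test_x = [xs[i] for i in test_idx]
        test_y = [ys[i] for i in test_idx]
        # The preprocessing model sees the training fold only...
        pre = fit_zscore(train_x)
        model = fit_centroids(apply_zscore(pre, train_x), train_y)
        # ...and is then applied unchanged to the held-out fold.
        preds = predict(model, apply_zscore(pre, test_x))
        correct = sum(p == t for p, t in zip(preds, test_y))
        accuracies.append(correct / len(test_y))
    return mean(accuracies)


xs = [1.0, 1.2, 0.9, 1.1, 5.0, 5.3, 4.8, 5.1, 1.05]
ys = ["a", "a", "a", "a", "b", "b", "b", "b", "a"]
print(cross_validate(xs, ys, k=3))
```

Fitting the normalization once on all nine values before splitting would let statistics of each test fold leak into training. For this affine step and toy learner the effect is harmless, but for data-dependent steps such as feature selection or imputation, that kind of leak is what produces the over-optimistic estimates described above.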

I tried to take the viewpoint of our community members and users here and to compile what we often hear from them. I hope this helps, and I would be happy to welcome you to the community of new RapidMiner users soon!

Cheers,

Ingo

Ingo, 

Thanks for your reply. As I said, I have played with it some, and it looks good. I have two questions for you. First, are there any good training materials out there? Second, will you have a US-based conference or training session soon (Budapest is a little far)?

Thanks

MPG

Hi Myles,

sure, I am glad to help. About the materials, Rapid-I offers

There is also a whole bunch of upcoming things:

  • a new book, "RapidMiner Use Cases": http://rapidminerbook.com/ (hopefully released by the end of this year at the latest)
  • a new book, "Data Mining for the Masses" by Matthew North, which uses RapidMiner and will be presented at RCOMM 2012: http://www.rcomm2012.org
  • our new operator reference, which will be released right before or directly after RCOMM 2012 and explains the 200 most important operators on 1000+ pages, including sample processes for all operators

I am sure that the documents above will explain everything necessary for getting started. The only thing you need is time to work through it ;-)

Still not enough? Well, time is money. Get in contact with Rapid-I, ask for an offer for our Support Subscriptions. Or our training courses. Or our webinars. Or... ;-)


Ok, back from marketing (sorry, could not resist...). About the US-based conference / training sessions: we recently had a training in Atlanta, and we are currently planning the next US-based trainings. If you are interested in details, I could put you in contact with our US representatives in order to align plans. In any case, I am sure that the next US trainings will come soon, and the same holds for a US user conference or at least a user meeting. We also have certified US-based partners who could offer a training to you. In case of any interest, just contact us by email.

Hope that helps,

Ingo


© 2019 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC