Comparison of RapidMiner, SAS Enterprise Miner, R and Orange

Hi. I'm currently evaluating the data mining tools above and would like to ask some questions.

1) How good are the tools at accessing and managing data? Which of the four is best?

2) Is Enterprise Miner a machine learning tool?

3) Do Orange, R and Enterprise Miner support multiple cores?

4) Is Orange a white-box or a black-box tool?

5) Does Enterprise Miner provide scripting for data mining?

Any other information that helps me make a clear comparison between these four data mining tools would be welcome.

Thanks.

Tags: Data, mining, tools

Replies to This Discussion

Also, in terms of power and flexibility, as well as scalability, which software is better?

I can help with training materials and videos for SAS Enterprise Miner. You can reach me at [email protected]

I know R and RapidMiner so that's what I will answer about.

1)

R has a huge array of possibilities for connecting to databases, Big Data solutions, and processing all kinds of files and documents, including save files from other statistics packages. But you need to learn the scripting/programming language in order to process your data.

RapidMiner also has the most important connectors: databases, Excel, CSV, etc. In RapidMiner, you can use wizards for setting up your data sources and a graphical environment for processing data flows.

3) R supports multiple cores (and even computing clusters) through the foreach package and its parallel backends. RapidMiner also has a parallel execution plugin.
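The foreach idea (apply the same function to independent pieces of data, then combine the results) can be sketched in Python as follows. This is only an illustration of the pattern, not R or RapidMiner code; `column_mean`, `parallel_apply` and the toy data are hypothetical names made up for the example. Threads are used here to keep the sketch simple; for CPU-bound work in Python one would normally pass a `ProcessPoolExecutor`, which offers the same `map` interface, to use several cores.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def column_mean(values):
    """Work done independently for each task: the mean of one column."""
    return sum(values) / len(values)


def parallel_apply(func, tasks, executor: Executor):
    """Apply func to every task on the executor; results keep the input order."""
    return list(executor.map(func, tasks))


# Toy data (assumption): two numeric columns of a small table.
columns = {
    "age": [23, 35, 41, 58],
    "income": [1800.0, 2400.0, 3100.0, 2700.0],
}

with ThreadPoolExecutor(max_workers=2) as pool:
    means = parallel_apply(column_mean, list(columns.values()), pool)

# Combine step: attach each result to its column name.
result = dict(zip(columns, means))
```

The tasks must not depend on each other, which is the same restriction a foreach loop has when it runs in parallel.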

Both R and RapidMiner are available for free and with source, but also as professionally supported, commercially licensed packages from the authors (RapidMiner) or from Revolution Analytics (R).

RapidMiner can also execute R scripts for data input, transformation and graphing so you can easily connect the two.

Which is better depends on your background and your needs. There are lots of good books for R; RapidMiner is more intuitive and you can find ready-to-use examples on myexperiment.com. 

R is where cutting-edge statistical and data mining research happens. Of course, for that, you need to look at and try new, sometimes experimental packages. Everything in the canon of established methods and algorithms is there, too. RapidMiner has a smaller but still huge range of data mining methods, and it can use the Weka library, which adds many more.

Thanks for the information. Can I know more about R and RapidMiner in terms of data manipulation? Do they extract samples, access the database directly, or both? So does R have more data connectivity (ODBC, gateways) than RapidMiner?

Can R and RapidMiner pass rules directly to OLAP tools and receive data for mining from OLAP tools, and can they access a data warehouse directly?

Which software is better in terms of size constraints (the maximum number of rows or records it can handle)?

Just to double-check, both of them support mining very large databases, right? But which software is better at this?

Both R and RapidMiner have direct access to most relational databases. R supports ODBC and many database systems directly. RapidMiner is written in Java so it uses JDBC; most relational database systems have JDBC drivers. There should be some JDBC to ODBC bridge, too.
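To make "direct access" versus "extract sampling" concrete, here is a minimal Python sketch. It uses the standard-library `sqlite3` module with an in-memory database purely as a stand-in for an ODBC or JDBC connection; the `customers` table and its columns are made up for the example. It shows the two options: let the database do the work on the full table, or pull only a random sample into memory.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real ODBC/JDBC source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, age INTEGER, churned INTEGER)"
)
rows = [(i, 20 + i % 50, i % 2) for i in range(1, 1001)]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

# Direct access: the database aggregates the full table, only the result is transferred.
total, churned = conn.execute(
    "SELECT COUNT(*), SUM(churned) FROM customers"
).fetchone()

# Sampled extract: the database picks 100 random rows, only those are loaded into memory.
sample = conn.execute(
    "SELECT id, age, churned FROM customers ORDER BY RANDOM() LIMIT ?", (100,)
).fetchall()

conn.close()
```

`ORDER BY RANDOM()` is SQLite syntax; other database systems have their own way of drawing a random sample.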

The difference lies in support for the file formats of obscure statistics packages you have probably never heard of. You shouldn't have problems reading your relational database data and files in RapidMiner or R.

R is also a programming language, and RapidMiner can be extended with Groovy scripts or Java modules, so in the end you can write any data access methods, including OLAP tools if those have a defined API.

Both R and RapidMiner are memory-based systems, so they can analyze your data as long as it fits into the RAM of your computer. On a 64-bit operating system you can easily have 24 GB of RAM to analyze more than 20 GB of data.
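A rough way to check whether a table fits into RAM is to multiply rows by columns by bytes per value. The sketch below assumes purely numeric data stored as 8-byte doubles; the function names and the `headroom` factor are hypothetical. The headroom factor is an assumption you can set above 1 if the algorithm you plan to run makes working copies of the data.

```python
def estimated_gib(n_rows, n_numeric_cols, bytes_per_value=8):
    """Approximate in-memory size in GiB, assuming 8-byte doubles by default."""
    return n_rows * n_numeric_cols * bytes_per_value / 2**30


def fits_in_ram(n_rows, n_numeric_cols, ram_gib, headroom=1.0):
    """True if the estimated size times the headroom factor fits into ram_gib."""
    return estimated_gib(n_rows, n_numeric_cols) * headroom <= ram_gib


# Example: about 134 million rows (2**27) with 8 numeric columns.
size = estimated_gib(2**27, 8)
ok_plain = fits_in_ram(2**27, 8, ram_gib=24)
ok_with_copies = fits_in_ram(2**27, 8, ram_gib=24, headroom=4.0)
```

Text columns, factors and object overhead are not covered by this estimate, so treat the result as a lower bound.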

Revolution Analytics, a provider of commercially enhanced R versions, also has extensions for processing larger datasets.

RapidMiner has Radoop (beta version), which uses the Hadoop environment for processing large datasets.
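The general idea behind processing data that does not fit into memory is to work through it in pieces and keep only small running summaries. The Python sketch below illustrates that idea with a chunked mean over CSV input. It is not how Revolution Analytics or Radoop are implemented, and `chunked_mean` is a hypothetical helper written for this example.

```python
import csv
import io


def chunked_mean(lines, column, chunk_size=1000):
    """Mean of one CSV column, holding at most chunk_size values in memory."""
    reader = csv.DictReader(lines)
    count, total = 0, 0.0
    chunk = []
    for row in reader:
        chunk.append(float(row[column]))
        if len(chunk) == chunk_size:
            # Fold the finished chunk into the running summary and drop it.
            count += len(chunk)
            total += sum(chunk)
            chunk = []
    # Fold in the last, possibly partial chunk.
    count += len(chunk)
    total += sum(chunk)
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


# Toy input (assumption): a file-like object standing in for a large CSV file.
data = io.StringIO("x\n1\n2\n3\n4\n")
mean_x = chunked_mean(data, "x", chunk_size=3)
```

Means, counts and sums combine easily across chunks; algorithms that need all rows at once are harder to split up this way.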

Could you help me grade these, in your opinion?

Rate from 1 to 5 (1 = very bad, 5 = excellent)

Criterion                                                                    RapidMiner   R

Product architecture
  Data manipulation (extract sampling, direct access to database, or both?)
  Warehouse/OLAP integration
  Connectivity to other tools

Performance
  Support for multiple user access
  Support for mining very large databases

Function
  Mining approaches
  Mining techniques

Presentation
  Data visualisation

Environment
  Platform independence
  Size constraints (in handling maximum number of rows or records)

It's hard to assign numbers because it depends on your environment, your programming ability and the technology you are using.

With R, you have every possibility, but you need to learn the R language and install packages.

RapidMiner has a graphical modelling interface for ease of use, but if your needs are special, you must do some scripting or develop extensions, too.

RapidMiner has good and easy to use graphical capabilities. R is the champion in visualisation but it is a bit harder to create pretty graphs. (Recently, some graphical interfaces for graphs have been developed, search for "Deducer ggplot2".)

The size constraints of the standard packages depend on your memory size; with the commercial Big Data tools, both can support almost unlimited data sets.

No. R has both established and experimental algorithms. R is probably the overall leader in mining techniques because most researchers use R for their first publication.

For example, there are not only classical decision trees (in package rpart) but also an innovative approach called conditional inference trees (in package party).

 

So what other techniques does R have besides decision trees and conditional trees?

Everything. SVM, neural nets, regression, whatever you want. As I wrote, R is the favourite tool of scientists, so both well-established as well as experimental research algorithms are available.

But are they only supported in the enterprise version of R, not in the open source version?

Hi. Can you give me more information on what R and RapidMiner provide in their open source tools, not the commercial ones? Thanks.


© 2019 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC
