
Hi Everyone,

I am a newbie in R, but given the available documentation, forums, etc., I can sense that R is one of the most state-of-the-art and easy-to-use tools in data mining. However, I work on a Big Data team (in the Hadoop ecosystem). Can you please tell me:

1) How large a data set can R handle through the ff package and similar packages?

2) Can R connect to Hadoop through the foreach package or any other package?

3) If the answer to 2) is yes, are the algorithms in R parallelized, or is only a connector available?

4) Are Rev R (Revolution R) or RapidMiner the answer to big data analytics?
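Regarding question 1), the ff package keeps data memory-mapped on disk so a data set larger than RAM can still be processed from R. A minimal sketch, assuming a hypothetical file "sales.csv" (larger than available memory) with a numeric column "amount":

```r
# Sketch only: "sales.csv" and its "amount" column are hypothetical.
library(ff)
library(ffbase)  # adds summary operations (sum, mean, ...) for ff vectors

# read.csv.ffdf streams the file in chunks, storing columns on disk
# rather than holding the whole table in RAM
sales <- read.csv.ffdf(file = "sales.csv", header = TRUE,
                       next.rows = 500000)  # rows read per chunk

nrow(sales)         # dimensions work much like a regular data.frame
mean(sales$amount)  # computed chunk-wise over the on-disk vector
```

The practical limit is disk space and access patterns rather than RAM, though random row-wise operations on ff objects are much slower than in-memory ones.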

Your response will be deeply appreciated.





Replies to This Discussion

We have used R extensions on HDFS data via "rmr", whose jobs are parallelized across the Hadoop nodes. However, we got much better performance with "rhive", especially on time-series data using the RCFile format and compression. It scaled well enough on a limited-capacity cluster (8 nodes) to handle up to 50 GB without problems.
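For readers new to rmr: it lets you express MapReduce jobs in plain R, and the map/reduce functions then run on the cluster nodes. A minimal sketch (assuming rmr2 is installed and Hadoop streaming is configured; the bucketing logic here is just an illustration):

```r
# Sketch only: counts how many values fall into each bucket of 100.
library(rmr2)

# Push a small sample vector into HDFS
ints <- to.dfs(1:1000)

# mapreduce() ships the map and reduce functions to the Hadoop nodes
out <- mapreduce(
  input  = ints,
  map    = function(k, v) keyval(v %/% 100, 1),       # emit (bucket, 1)
  reduce = function(k, counts) keyval(k, sum(counts))  # sum per bucket
)

from.dfs(out)  # pull the (bucket, count) pairs back into the R session
```

Note that only code written against this map/reduce model is parallelized; ordinary R functions and most CRAN algorithms do not automatically distribute across the cluster, which is the distinction question 3) is asking about.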


On Data Science Central

© 2020   TechTarget, Inc.
