A Data Science Central Community
I am a newbie in R, but given the documentation, forums, etc. available, I can sense that R is a state-of-the-art as well as easy-to-use tool for data mining. However, I work on a Big Data team (in the Hadoop ecosystem). Can you please tell me:
1) How large a data set can R handle through the ff package, etc.?
2) Can R connect to Hadoop through the foreach package or any other packages?
3) If the answer to 2) is yes, are the algorithms in R parallelized, or is only a connector available?
4) Are Revolution R / RapidMiner the answer to big data analytics?
Your response will be deeply appreciated.
We have run R jobs on HDFS data using the "rmr" package, whose map and reduce steps are parallelized across the Hadoop nodes. However, we got much better performance with "rhive", especially on time-series data stored in RCFile format with compression. It scaled well enough on a limited-capacity cluster (8 nodes) up to 50 GB without problems.
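To illustrate, a minimal sketch of an rmr-style job (using the RHadoop "rmr2" package) might look like the following. This assumes a working Hadoop cluster with rmr2 installed; the HDFS path is hypothetical, and the job simply counts occurrences of each input value to show where the map and reduce functions run in parallel across the nodes.

```r
# Sketch only: requires a configured Hadoop cluster and the RHadoop
# "rmr2" package; the input path below is a hypothetical example.
library(rmr2)

# A word-count-style job. The map and reduce functions are shipped to
# the Hadoop nodes and executed in parallel, which is what lets
# rmr-based R code scale beyond a single machine's memory.
result <- mapreduce(
  input  = "/hdfs/path/to/input",                          # hypothetical path
  map    = function(k, v) keyval(v, 1),                    # emit (value, 1)
  reduce = function(k, counts) keyval(k, sum(unlist(counts)))  # sum per key
)

# Pull the (small) aggregated result back into the local R session.
from.dfs(result)
```

The key point for question 3) is that rmr is more than a connector: the R functions you pass as `map` and `reduce` actually execute on the cluster, so the parallelism applies to your own R code, not just to data transfer.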