Hello everyone,

I am currently starting some research on human behavior modeling and prediction. While searching for the best statistics and data mining software, I came across a very big $$ issue :) As I am doing this as part of my PhD and the institute/university is currently unable to provide a license, I decided to go for R.

I am particularly enthusiastic because it can be plugged into Java and can therefore handle things in real time. Based on your expertise, do you think this software will limit my results? What would be the major drawbacks of R?

Best Regards,
Jose Simoes

Hi Jose,
The only drawback of R is that it cannot handle data larger than 1.5 GB using the R console. R can be connected to databases through all types of database connections. If you are interested in Java-based software, I would recommend the open-source KNIME software, which can easily be integrated with R and Weka using plugins.
It has good database connectivity and can handle large volumes of data. I hope this helps.
Please mail me if you need more help.
Nikesh Srivastava
I do not doubt your words, but could you please point to a reference regarding the memory limitation (especially the 1.5 GB figure)?
Apparently the limit is 3 GB:

That said, you can definitely use distributed computing with R. I believe there are open-source solutions, and there is also a commercial one: REvolution Computing.
Thank you!
I am not sure about the 3 GB limit. The limitation is mostly due to the available RAM. I don't know how big the dataset Jose is going to work on is, but I have seen R perform very well on 64-bit Linux.
Oh, it seems you are correct. I did a quick search and found this:

So there is no memory limit for R under Linux? Any remarks, Mr Winters? :)
Hi Steffen,
The 1.5 GB figure comes from my own experiments with R: I wanted to measure how R scales in terms of the amount of data it can take. If you want to know R's memory limit, just type memory.limit() in the R console; it will give you the initial memory figure. If you want to increase the memory, type memory.size(4000); it will increase the memory to 4 GB, but you definitely cannot stretch it beyond that. In fact, when I was running an SVM classification on data with 42k records, I got an error message saying R cannot handle a vector of size 1.5 GB. So these are some of my observations from working with R. Still, R is very rich in algorithms.
Quote: If you want to increase the memory, type memory.size(4000); it will increase the memory to 4 GB, but you definitely cannot stretch it beyond that.
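To put the sizes in this exchange into perspective, here is a rough back-of-the-envelope sketch, written in Java since that is the integration language discussed in this thread. It assumes every value is stored as an 8-byte double and ignores object overhead and temporary copies. The 500-column feature count and the names `MemoryEstimate` and `denseDoubleBytes` are hypothetical. It also compares each estimate against 2^32 bytes (4 GiB), the virtual address space of a 32-bit process, which is one plausible reason hard limits of a few GB show up on 32-bit systems while 64-bit builds are mostly bounded by RAM.

```java
// Rough memory estimate for dense numeric matrices stored as 8-byte doubles,
// compared against the 4 GiB virtual address space of a 32-bit process.
public class MemoryEstimate {

    // 2^32 bytes = 4 GiB: the theoretical address-space ceiling of a 32-bit process.
    static final long ADDRESS_SPACE_32BIT = 1L << 32;

    // Bytes needed for rows x cols double-precision values (8 bytes each),
    // ignoring headers and any copies the software makes while computing.
    static long denseDoubleBytes(long rows, long cols) {
        return Math.multiplyExact(Math.multiplyExact(rows, cols), 8L);
    }

    // Convert bytes to GiB (2^30 bytes) for display.
    static double toGiB(long bytes) {
        return bytes / (double) (1L << 30);
    }

    public static void main(String[] args) {
        long rows = 42_000; // record count mentioned in the thread
        long cols = 500;    // hypothetical number of features

        long data = denseDoubleBytes(rows, cols);
        // A full rows x rows matrix (e.g. pairwise distances or kernel values),
        // shown only to illustrate how quickly quadratic structures grow.
        long pairwise = denseDoubleBytes(rows, rows);

        System.out.printf("data matrix:     %.2f GiB%n", toGiB(data));
        System.out.printf("pairwise matrix: %.2f GiB%n", toGiB(pairwise));
        System.out.printf("fit in 32-bit address space? data=%b, pairwise=%b%n",
                data < ADDRESS_SPACE_32BIT, pairwise < ADDRESS_SPACE_32BIT);
    }
}
```

The point of the design is the contrast: the raw 42,000 × 500 table needs only 168,000,000 bytes, whereas any structure that is quadratic in the number of records needs 14,112,000,000 bytes (roughly 13 GiB), far beyond what a 32-bit process can address regardless of the RAM installed.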

Under Windows or Linux? More references and fewer opinions, please :). No offense!
memory.size() only applies on Windows.
Some tips for diagnosis and treatment of memory problems:
Thanks, Robin!
Thank you all for your answers.

However, there is another issue that I think I did not make clear. I intend to use this Java code in real time (and online); in other words, the code is supposed to be deployed in a Java application server (servlet container) such as JBoss or SailFin. So my question is: if I develop something in KNIME or another Java environment, will it still run in these environments?

Best Regards,
Jose Simoes
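On the deployment question, one container-agnostic way to call R from Java is to start R's command-line front end, `Rscript`, as a child process using only standard JDK classes (`ProcessBuilder`), so nothing beyond the JDK has to be deployed into JBoss or SailFin. This is a minimal sketch under stated assumptions: R is installed on the server, `Rscript` is on the PATH of the account running the application server, and the container's security policy allows spawning processes. The names `RscriptRunner`, `buildCommand`, and `run` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: run an R expression from plain Java by starting Rscript as a
// child process. Assumes R is installed and "Rscript" is on the PATH.
public class RscriptRunner {

    // Build the command line: Rscript -e "<expression>"
    static List<String> buildCommand(String rExpression) {
        List<String> cmd = new ArrayList<>();
        cmd.add("Rscript");
        cmd.add("-e");
        cmd.add(rExpression);
        return cmd;
    }

    // Run the expression and return everything R printed (stdout and stderr merged).
    static String run(String rExpression) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(rExpression));
        pb.redirectErrorStream(true); // merge stderr into stdout so nothing is lost
        Process p = pb.start();

        // Read all output before waiting, so a full pipe buffer cannot block R.
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }

        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("Rscript exited with code " + exit + ":\n" + out);
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Example: ask R for the mean of a small vector and print the result.
        System.out.print(run("cat(mean(c(1, 2, 3)))"));
    }
}
```

Starting a fresh R process for every request pays R's startup cost on each call, so for genuinely real-time use a persistent connection to a running R session (for example Rserve, or JRI from the rJava package) is probably the better design; the process-based version is simply the option that needs no extra libraries in the container.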


