At the useR conference this year, we are organizing a panel discussion (by vendors and end users) on the challenges that R users face bringing R into their organization--things like lack of IT knowledge/acceptance, issues of technical support, validation/compliance, versioning/rapidly changing code base, etc.

Have these been issues for you? What would you like to see discussed by this panel?

Replies to This Discussion

I think the main problems for non-statisticians in business analytics are the lack of a GUI for R and the fact that R cannot produce SQL logic from a developed model, so it's hard to apply modeling results to large data sets for activities like scoring.

However, for me R is very flexible for solving specific analytic problems.

Another issue is that most analytic tools implemented nowadays are wrapped around large enterprise BI applications. R doesn't address a lot of that need. And, to a lot of companies, that is essentially what analytics are about.

-Ralph Winters

- Memory limitations: more than 500K observations will typically make your Windows machine crash unless you spend time moving the work to the cloud, and then there are other costs, such as the number of hours you spend optimizing your process for a distributed architecture.

- I've found that installing packages is not trivial. If installing an R decision tree package takes significantly more time than writing advanced decision tree algorithms in C or Perl, the benefit of using R is lost.

- Finally, I've tried to install R on a Unix shared server to start running my own analytic server (with API, SaaS and on-demand services), but I did not succeed after spending 24 hours trying to get it to work. I ended up using Perl/CGI (PHP would be even easier) and the standard graphic libraries available on UNIX.

Memory limitations: the most frequent questions I have faced boil down to how well R handles operations on large data sets (most often over a few million records).

Great comments, and issues which sound very familiar from previous conversations with R & S+ users.

1. Regarding friendly UIs/BI applications on R--this is a significant focus for TIBCO Spotfire. Our latest release of Spotfire features integration with R & S+ on the back end, to make it easy to develop interactive applications which leverage R analytics, and deploy those out to a community of business analysts. If you'd like to learn more, check out this link, or watch the webcast I did yesterday on this topic.

2. On producing SQL: while I don't know of any packages that produce SQL, the pmml package allows you to export certain models as PMML, which can then be imported and used in many environments. The DMG has a great list of the models supported by the pmml package, as well as other products.

3. On handling large data sets: while there is no unified solution in R, there are some packages which implement out-of-memory methods, and others which implement parallel methods. The High Performance and Parallel Computing Task View on CRAN gives a great overview of these packages.
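
To make "out-of-memory" in point 3 concrete, here is a minimal sketch of the general idea: read the data one record at a time and keep only running totals, so memory use stays flat no matter how many rows there are. It is written in Python purely for illustration and does not show how any particular R package or S+ library works internally. The function name `streaming_mean` and the column name `amount` are made up for this example.

```python
import csv
import io


def streaming_mean(lines, column):
    """Compute the mean of one CSV column without loading the whole file.

    `lines` is any iterable of text lines (an open file works), so only
    one row is held in memory at a time.
    """
    reader = csv.DictReader(lines)
    count = 0
    total = 0.0
    for row in reader:
        total += float(row[column])
        count += 1
    if count == 0:
        raise ValueError("no data rows found")
    return total / count, count


# Small in-memory stand-in for a large file (hypothetical data).
sample = io.StringIO("id,amount\n1,10\n2,20\n3,60\n")
mean, n = streaming_mean(sample, "amount")
print(mean, n)
```

For a real file you would pass `open(path, newline="")` instead of the `StringIO` object. Summaries that can be updated row by row (counts, sums, means) fit this pattern directly; model fitting usually needs more specialized algorithms.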

Another alternative many of our customers use is the bigdata library in S+, which implements several out-of-memory methods for data preparation, modeling, and scoring. If you'd like more info, the full documentation on the big data library is available on the Spotfire Technology Network.
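
Coming back to point 2 on producing SQL: for simple models the scoring logic can be written out as a SQL expression by hand. The sketch below assumes a logistic regression whose coefficients have already been exported from the modeling tool, and builds a query that computes the log-odds for every row of a table. It is in Python for illustration only; the functions `linear_predictor_sql` and `scoring_query`, the table `customers`, and the coefficient values are all hypothetical and not part of R or the pmml package.

```python
import sqlite3


def linear_predictor_sql(intercept, coefs):
    """Return a SQL expression for intercept + sum(coef * column).

    Column names are inserted as-is, so they must come from a trusted
    source (for example the model's own variable list).
    """
    terms = [repr(float(intercept))]
    for column, coef in coefs.items():
        terms.append(f"({float(coef)!r} * {column})")
    return " + ".join(terms)


def scoring_query(table, id_column, intercept, coefs):
    """Build a SELECT that scores every row of `table` with its log-odds."""
    expr = linear_predictor_sql(intercept, coefs)
    return f"SELECT {id_column}, {expr} AS log_odds FROM {table}"


# Hypothetical coefficients, as if exported from a fitted model.
coefs = {"age": 0.02, "tenure": -0.5}
query = scoring_query("customers", "id", -1.0, coefs)
print(query)

# Run the generated SQL against a tiny in-memory table to check it is valid.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age REAL, tenure REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 50.0, 2.0), (2, 30.0, 0.0)],
)
rows = conn.execute(query + " ORDER BY id").fetchall()
print(rows)
```

Ranking by log-odds gives the same ordering as ranking by probability, so the query can be used for scoring as it stands. Converting to a probability needs an exponential function, and whether one is available depends on the database.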

FYI, I have also cross posted this discussion on the R group on LinkedIn.
