The question has arisen in this discussion (http://www.analyticbridge.com/forum/topics/using-r-for-research): can anyone provide an example where apply (or tapply, sapply, ...) is a significant timesaver?

 

From "S Programming" (by Venables and Ripley):

It was once true that functions such as sapply concealed explicit loops, but this is no longer the case in the S engines.

 

Is this also true for R? :)


EDIT: I mean "timesaver" in terms of execution time. Sorry for the ambiguity.
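One way to make "timesaver in terms of execution time" concrete is a small benchmark; the sketch below (variable names are illustrative, not from the thread) times an explicit loop, sapply, and a fully vectorized call on the same task:

```r
# Square a million numbers three ways and time each.
# Absolute timings vary by machine; the relative order is the point.
x <- 1:1e6

t_loop <- system.time({
  res1 <- numeric(length(x))
  for (i in seq_along(x)) res1[i] <- x[i]^2
})

t_sapply <- system.time(res2 <- sapply(x, function(v) v^2))

t_vec <- system.time(res3 <- x^2)

# All three produce identical results. sapply hides the loop but still
# makes one R-level function call per element, so only the vectorized
# form avoids that per-element cost.
```

On a typical machine the loop and sapply land in the same ballpark, while x^2 is far faster, consistent with the Venables and Ripley remark that sapply no longer conceals a fast internal loop.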

 

Replies to This Discussion

> N=1:1000000; st= system.time
> xx=data.frame(i=N, x=N)/100
> st({A=sum(xx$x); B=sum(xx$i); print(c(A,B))})
[1] 5000005000 5000005000
[1] 0.02
> st({A=B=0
+ for(i in N){A=A+xx[i,1]; B=B+xx[i,2]}
+ print(c(A,B))
+ })
[1] 5000005000 5000005000
[1] 146.1
> st({print(apply(xx,2,sum))})
i x
5000005000 5000005000
[1] 0.28
@Jan st=system.time ;)

@Alex:
Interesting example. However, the speedup comes from "sum", not from "apply", doesn't it? Still, it is interesting to see how much execution-time overhead "apply" adds (compare the first and the last runs).

> st({res<-rep(0,2)
+ for(i in (1:2)){res[i]<-sum(xx[,i])}
+ print(res)
+ })
Correct!
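As a side note not raised above: for plain column sums, base R also ships the compiled colSums, which is typically faster again than apply(xx, 2, sum) because it avoids the per-column R-level function calls; a sketch on the same data:

```r
# Same totals as apply(xx, 2, sum), computed by the compiled colSums.
N  <- 1:1000000
xx <- data.frame(i = N, x = N) / 100

system.time(print(colSums(xx)))
```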
The variants of apply are a waste of time, figuratively and literally. If you're looking for optimization, 'by' is the way to go, or Revolution's parallelized foreach.
Contradiction?

The R documentation states "Function 'by' is an object-oriented wrapper for 'tapply' applied to data frames".

So: code snippet, or I don't believe that it's faster ;)
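A quick way to settle it yourself is to time both on the same grouped sum; the sketch below uses made-up random groups, so only the relative timing matters:

```r
# Grouped sum with tapply vs. by on one million rows of synthetic data.
set.seed(1)
d <- data.frame(g = sample(letters, 1e6, replace = TRUE),
                v = runif(1e6))

t_tapply <- system.time(r1 <- tapply(d$v, d$g, sum))
t_by     <- system.time(r2 <- by(d, d$g, function(s) sum(s$v)))

# Per the R documentation, by() is a wrapper around tapply(), so the
# group totals agree; any time difference is wrapper overhead (splitting
# the data frame row-wise), not a different algorithm.
```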

© 2017 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC