
A question has arisen in this discussion: can anyone provide an example where apply (or tapply, sapply, ...) is a significant timesaver?


From "S Programming" (by Venables and Ripley):

It was once true that functions such as sapply concealed explicit loops, but this is no longer the case with the current S engines.


Is this also true for R? :)

EDIT: I mean "timesaver" in terms of execution time. Sorry for the ambiguity.
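A minimal sketch of the question (the variable names are illustrative): time an explicit for loop, sapply, and a vectorized call doing the same work. In current R, sapply also loops at the R level, so its timing is usually close to the for loop; the vectorized call is the real timesaver.

```r
# Same computation three ways; compare the timings printed by system.time().
x <- as.numeric(1:1e6)

t_loop <- system.time({
  out1 <- numeric(length(x))            # preallocate the result
  for (i in seq_along(x)) out1[i] <- sqrt(x[i])
})

t_sapply <- system.time(out2 <- sapply(x, sqrt))  # R-level loop in disguise

t_vec <- system.time(out3 <- sqrt(x))   # vectorized: one C-level call

stopifnot(all.equal(out1, out2), all.equal(out1, out3))
```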



Replies to This Discussion

> N=1:1000000; st= system.time
> xx=data.frame(i=N, x=N)/100
> st({A=sum(xx$x); B=sum(xx$i); print(c(A,B))})
[1] 5000005000 5000005000
[1] 0.02
> st({A=B=0
+ for(i in N){A=A+xx[i,1]; B=B+xx[i,2]}
+ print(c(A,B))
+ })
[1] 5000005000 5000005000
[1] 146.1
> st({print(apply(xx,2,sum))})
i x
5000005000 5000005000
[1] 0.28

@Jan: st = system.time ;)

Interesting example. However, the speedup comes from sum, not from apply, doesn't it? Still, it is interesting to see how much execution-time overhead apply adds (compare the first and the last timings).
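The apply overhead seen above can be avoided entirely: colSums is a compiled alternative to apply(xx, 2, sum) and gives the same named result without looping over columns in R. A small sketch (data sized down for illustration):

```r
# colSums() sums each column in C code; apply() dispatches FUN per column.
xx <- data.frame(i = 1:1000, x = 1:1000) / 100

a <- apply(xx, 2, sum)   # coerces the data frame to a matrix first
b <- colSums(xx)         # compiled column sums, same names "i" and "x"

stopifnot(all.equal(a, b))
```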

> st({res<-rep(0,2)
+ for(i in (1:2)){res[i]<-sum(xx[,i])}
+ print(res)
+ })
Correct!
The variants of apply are a waste of time, figuratively and literally. If you're looking for optimization, 'by' is the way to go, or Revolution's parallelized foreach.
Contradiction?

The R documentation states: "Function 'by' is an object-oriented wrapper for 'tapply' applied to data frames."

So: code snippet, or I don't believe that it's faster ;)
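A hedged benchmark sketch for that claim (column names g and v are illustrative): since by is documented as a wrapper around tapply, comparable timings, not a speedup, are what one would expect for a simple grouped sum.

```r
# Grouped sum via tapply vs. by; compare the two system.time() results.
set.seed(1)
df <- data.frame(g = rep(letters[1:10], each = 1e4),
                 v = runif(1e5))

t_tapply <- system.time(r1 <- tapply(df$v, df$g, sum))
t_by     <- system.time(r2 <- by(df, df$g, function(d) sum(d$v)))

# Same answers either way; only the wrapper differs.
stopifnot(all.equal(as.vector(r1), as.vector(unlist(r2))))
```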


On Data Science Central

© 2020 TechTarget, Inc.
