A Data Science Central Community
A very efficient approach to random sampling in SAS® runs orders of magnitude faster than the relevant "built-in" SAS® procedures. For sampling with replacement as applied to bootstraps, seven algorithms are compared; the fastest ("OPDY"), based on the new approach, runs over 220x faster than Proc SurveySelect. OPDY also handles datasets many times larger than those on which two hashing algorithms crash. For sampling without replacement as applied to permutation tests, six algorithms are compared; the fastest ("OPDN"), also based on the new approach, runs over 215x faster than Proc SurveySelect, over 350x faster than Proc NPAR1WAY (which crashes on datasets less than a tenth the size of those OPDN can handle), and over 720x faster than Proc Multtest. OPDN uses a simple draw-by-draw procedure that can repeatedly generate many without-replacement permutation samples without requiring any additional storage or memory. Based on these results, there appear to be no faster or more scalable algorithms in SAS® for bootstraps, permutation tests, or sampling with or without replacement.
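To make the idea of a draw-by-draw, without-replacement procedure concrete, here is a minimal Python sketch of one well-known sequential technique (often called selection sampling). It is an illustration only: the abstract does not spell out OPDN's internals, so this should not be read as the OPDN algorithm itself, and the function names `sequential_sample_indices` and `permuted_group_sums` are hypothetical. Each record is visited once and selected with probability (draws still needed) / (records still remaining), so a permutation-test statistic can be accumulated without ever storing the sample.

```python
import random


def sequential_sample_indices(n_total, n_sample, rng):
    """Pick n_sample of n_total indices without replacement in one pass.

    Record i is selected with probability
    (draws still needed) / (records still remaining),
    which yields exactly n_sample indices, in ascending order.
    """
    if not 0 <= n_sample <= n_total:
        raise ValueError("n_sample must be between 0 and n_total")
    chosen = []
    needed = n_sample
    for i in range(n_total):
        remaining = n_total - i
        # rng.random() is in [0, 1), so selection is certain
        # once needed == remaining.
        if rng.random() * remaining < needed:
            chosen.append(i)
            needed -= 1
            if needed == 0:
                break
    return chosen


def permuted_group_sums(values, n_a, rng):
    """Randomly assign n_a values to group A and the rest to group B.

    Only two running sums are kept; the permutation sample itself
    is never materialized.
    """
    n = len(values)
    if not 0 <= n_a <= n:
        raise ValueError("n_a must be between 0 and len(values)")
    needed = n_a
    sum_a = 0.0
    sum_b = 0.0
    for i, v in enumerate(values):
        if rng.random() * (n - i) < needed:
            sum_a += v
            needed -= 1
        else:
            sum_b += v
    return sum_a, sum_b


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed for a repeatable demo
    print(sequential_sample_indices(10, 4, rng))
    # Repeated calls give fresh permutation samples with no extra storage.
    for _ in range(3):
        print(permuted_group_sums([1.0, 2.0, 3.0, 4.0, 5.0], 2, rng))
```

Calling `permuted_group_sums` in a loop produces one permutation sample per call, and the memory footprint stays constant regardless of how many samples are drawn. That is the property the abstract attributes to a draw-by-draw approach.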