A growing number of individuals and companies now deliver analytics solutions on modern web-based platforms; Data-Applied is one example.
The concept is at least 10 years old. What has changed is that inexpensive web servers can now handle high bandwidth and process megabytes of data in a few seconds (even without the cloud), and Internet users have much faster broadband connections. It is therefore possible to build analytics applications that process millions of observations online, on demand, in real time, and deliver the results via an API, or "on the fly". In some cases the result is a processed data set, sometimes fairly large, in which one column has been added to the input file: for instance, the new column (the output) is a score attached to each observation.

This is a solution that we are working on. It uses ad-hoc statistical techniques, hidden decision trees in particular, together with efficient data structures and very little memory, so that users can process online, "on the fly", large data sets that R or other statistical packages could not handle even on a desktop. These traditional packages (R, S-PLUS, Salford Systems) require that all your data be held in memory, and will typically crash if your input file has more than 500,000 observations. Web 3.0 analytics can easily handle much larger data sets, online!
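To make the "one column added to the input file" idea concrete, here is a minimal Python sketch of streaming a CSV file row by row and appending a score to each observation. It is only an illustration of the low-memory pattern: the `score_row` function, its weights, and the column names `x1` and `x2` are made-up placeholders, not the hidden decision tree method mentioned above.

```python
import csv
import io


def score_row(row):
    """Placeholder score (hypothetical weights), NOT hidden decision trees."""
    return round(0.5 * float(row["x1"]) + 0.25 * float(row["x2"]), 4)


def add_score_column(src, dst, scorer=score_row):
    """Copy rows from src to dst, appending a 'score' column.

    Rows are read and written one at a time, so memory use does not
    grow with the number of observations in the input.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + ["score"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in reader:
        row["score"] = scorer(row)
        writer.writerow(row)
        count += 1
    return count  # number of observations scored


# Small in-memory demo; real use would pass open file handles instead.
sample = io.StringIO("id,x1,x2\n1,2.0,4.0\n2,1.0,0.0\n")
result = io.StringIO()
add_score_column(sample, result)
print(result.getvalue())
```

Because nothing but the current row is kept in memory, the same loop works whether the input has a thousand observations or several million; only the running time changes.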
Interestingly, this new type of analytics service can rely on popular statistical packages (SAS, etc.) or on ad-hoc algorithms written in Perl (including chart production with the GD library), Python, C, C# or Java. A version based on SAS would be called a SAS web server (extranet or intranet) and would work as follows:
- An API call is made to an external web site where SAS is installed; parameters in the API call describe the type of analysis requested (logistic regression, etc.)
- A Perl/CGI script processes the HTTP request, extracts the parameters and automatically writes a SAS program corresponding to the user's request.
- The SAS code is run from the Perl script in command-line mode, and produces an output file such as a chart, an XML file or a data file.
- The Perl script reads the chart and displays it in the browser (if the user is a human being using a web browser), or provides a URL where the chart can be fetched (if the user is a web robot executing an API call).

Once our application (analytics 3.0) is live, we will make a public announcement, probably in January. Stay tuned!
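The SAS web server steps listed above could be sketched roughly as follows. The text describes a Perl/CGI script; this sketch uses Python instead, purely for illustration. The parameter names (`analysis`, `target`, `inputs`), the data set name `work.observations`, the file names and the mapping of analysis types to SAS procedures are all assumptions. The `-sysin`, `-log` and `-print` options are the usual way to run SAS in batch mode, but the exact command depends on the installation.

```python
import re
import subprocess
from pathlib import Path
from urllib.parse import parse_qs

# Hypothetical mapping from the API's "analysis" parameter to a SAS procedure.
PROCEDURES = {"logistic": "logistic", "regression": "reg"}
# SAS variable names: letter or underscore first, at most 32 characters.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def parse_request(query_string):
    """Extract and validate the parameters of the API call."""
    raw = parse_qs(query_string)
    analysis = raw.get("analysis", [""])[0]
    target = raw.get("target", [""])[0]
    inputs = raw.get("inputs", [""])[0].split(",")
    if analysis not in PROCEDURES:
        raise ValueError(f"unsupported analysis: {analysis!r}")
    # Reject anything that is not a plain variable name, so that request
    # parameters cannot inject arbitrary SAS code into the program.
    for name in [target, *inputs]:
        if not NAME.fullmatch(name):
            raise ValueError(f"invalid variable name: {name!r}")
    return {"analysis": analysis, "target": target, "inputs": inputs}


def write_sas_program(params, dataset="work.observations"):
    """Generate the SAS program text that corresponds to the request."""
    proc = PROCEDURES[params["analysis"]]
    predictors = " ".join(params["inputs"])
    return (
        f"proc {proc} data={dataset};\n"
        f"  model {params['target']} = {predictors};\n"
        "run;\n"
    )


def sas_command(program_path, log_path, output_path):
    """Build the command line that runs SAS in batch mode."""
    return ["sas", "-sysin", str(program_path),
            "-log", str(log_path), "-print", str(output_path)]


def handle_request(query_string, workdir):
    """Run one request end to end and return the path of the output file."""
    workdir = Path(workdir)
    program = workdir / "job.sas"
    output = workdir / "job.lst"
    program.write_text(write_sas_program(parse_request(query_string)))
    subprocess.run(sas_command(program, workdir / "job.log", output), check=True)
    return output  # the web layer would display this or expose it via a URL
```

For example, `parse_request("analysis=logistic&target=y&inputs=x1,x2")` followed by `write_sas_program(...)` yields a three-line `proc logistic` program. Calling `handle_request` requires a working SAS installation on the server. A real generated program would also need to load the submitted data into the data set before the procedure runs.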