A Data Science Central Community
From the OLAP concept of earlier years to the agile BI of the last few years, BI vendors have never stopped advertising self-service capability, claiming that business users will be able to perform analytics by themselves. Since users have strong self-service needs, the pitch lands well and a deal is often closed quickly. The question is: does a BI product’s self-service functionality really enable flexible data analytics by business users?
There is no standard definition of “data analytics” in the industry, so no one can say for sure whether the claim is accurate or exaggerated. But for users with little BI experience, the fact is that most of their self-service needs cannot be met by so-called self-service technology. In industry experience, the best products solve about 30% of these problems; most lag far behind, lingering around 10%.
We’ll look at the phenomenon from three aspects.
Multidimensional analysis performs interactive operations over a pre-created data set (or data cube). Most BI products today provide this type of analytic capability. Although a new generation of BI products has greatly improved interface design and operational smoothness, their computational capability has not fundamentally improved.
The key to multidimensional analysis is model creation, that is, preparing the data sets in advance. If the data to be analyzed is all held in a single data set, and the operations to be performed are among those the BI product provides (rotation, drill-down, slicing, and so on), the analysis is well within the product’s capability. But in most real-life scenarios, analytic needs go beyond these built-in functions, such as adding an ad hoc data item or performing a join with another data set, and the model must be re-created. The problem is that model creation requires technical professionals, so the tool is no longer self-service.
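The operations named above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `SALES` data set, its column names, and the helper functions `slice_rows` and `aggregate` are all hypothetical and do not come from any particular BI product.

```python
from collections import defaultdict

# Hypothetical pre-built data set: one row per (region, product, quarter).
SALES = [
    {"region": "East", "product": "Laptop", "quarter": "Q1", "amount": 120},
    {"region": "East", "product": "Phone", "quarter": "Q1", "amount": 80},
    {"region": "East", "product": "Laptop", "quarter": "Q2", "amount": 150},
    {"region": "West", "product": "Laptop", "quarter": "Q1", "amount": 90},
    {"region": "West", "product": "Phone", "quarter": "Q2", "amount": 60},
]


def slice_rows(rows, **fixed):
    """Slicing: keep only the rows whose dimensions match the fixed values."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]


def aggregate(rows, dims, measure="amount"):
    """Sum a measure over the chosen dimensions, in the chosen order."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r[measure]
    return dict(totals)


by_region = aggregate(SALES, ["region"])
# Drill-down: add a finer dimension under region.
by_region_product = aggregate(SALES, ["region", "product"])
# Rotation: same cells, dimensions swapped.
by_product_region = aggregate(SALES, ["product", "region"])
# Slice: fix region to "East", then summarize by quarter.
east_by_quarter = aggregate(slice_rows(SALES, region="East"), ["quarter"])
```

Every operation here works only on columns that already exist in `SALES`. A question that needs, say, a cost column or data from another table cannot be answered without rebuilding the data set, which is the limitation described above.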
Multidimensional analysis can meet only 10% of the self-service needs, which reflects the average self-service ability of today’s BI products.
Some BI products provide associative query capability to make up for the limitations of multidimensional analysis. The strategy is to create a new data set by joining multiple data sets before performing the multidimensional analysis, or to implement certain joins between multiple data cubes during the multidimensional analysis. This means business users are to some degree allowed to create models.
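The first strategy, joining data sets into a new one before analysis, might look like the following sketch. The `ORDERS` and `CUSTOMERS` data sets, their columns, and the `join` and `total_by` helpers are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical data sets that were prepared separately.
ORDERS = [
    {"order_id": 1, "customer_id": "C1", "amount": 200},
    {"order_id": 2, "customer_id": "C2", "amount": 50},
    {"order_id": 3, "customer_id": "C1", "amount": 100},
    {"order_id": 4, "customer_id": "C3", "amount": 75},
]
CUSTOMERS = [
    {"customer_id": "C1", "segment": "Enterprise"},
    {"customer_id": "C2", "segment": "Retail"},
    {"customer_id": "C3", "segment": "Retail"},
]


def join(left, right, key):
    """Inner join on one key; the right side is assumed unique per key."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


def total_by(rows, dim, measure="amount"):
    """Sum a measure for each value of a single dimension."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[dim]] += r[measure]
    return dict(totals)


# Step 1: build a new, wider data set by joining the two sources.
joined = join(ORDERS, CUSTOMERS, "customer_id")
# Step 2: run an ordinary aggregation over a dimension from the other table.
sales_by_segment = total_by(joined, "segment")
```

The aggregation in step 2 is ordinary multidimensional analysis. What changes is that the join in step 1 is now something the user, rather than a technical specialist, is expected to set up.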
Implementing associative queries well isn’t easy. Relational databases define the JOIN operation too simply, which makes the associations between data sets too complicated for many business users to understand. The issue can be partly addressed through a well-designed product interface, and a good BI product lets business users handle non-circular joins properly. But to solve the issue fundamentally, we need to change the data organization scheme at the database level. In reality, almost no BI product redefines the database model, so the improvement in associative query ability is limited. We’ll discuss the related technologies in later articles.
Here’s a typical test of a BI product’s associative query ability: find the male employees who work under female managers. This seemingly simple query involves multiple self-joins, and most BI products cannot handle it without first creating a model.
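To make the test concrete, here is one way the query could be written in SQL, run through Python's built-in `sqlite3` module. The schema is an assumption: an `employee` table with a `dept_id` column and a `department` table whose `manager_id` points back to `employee`. The sample rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY, name TEXT, gender TEXT, dept_id INTEGER);
    CREATE TABLE department (
        id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employee VALUES
        (1, 'Alice', 'F', 10), (2, 'Bob', 'M', 10), (3, 'Carol', 'F', 10),
        (4, 'Dave', 'M', 20), (5, 'Erin', 'F', 20), (6, 'Frank', 'M', 20);
    INSERT INTO department VALUES (10, 'Sales', 1), (20, 'R&D', 4);
""")

# The employee table appears twice: once as the employee (e) and once as
# the manager (m), linked through the department table.
QUERY = """
    SELECT e.name
    FROM employee AS e
    JOIN department AS d ON e.dept_id = d.id
    JOIN employee AS m ON d.manager_id = m.id
    WHERE e.gender = 'M' AND m.gender = 'F'
    ORDER BY e.name
"""
male_under_female = [row[0] for row in conn.execute(QUERY)]
conn.close()
```

With this sample data only Sales has a female manager (Alice), so the result contains Bob alone. The SQL is short, but the user has to realize that the same table plays two roles, which is the conceptual hurdle the paragraph above describes.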
BI products’ associative query capability can meet 20%-30% of the self-service needs, though the specific number depends on the different capabilities provided by different products.
About 70% or more of self-service demands involve multi-step procedural computations. These are completely beyond the design target of BI products, and can even be considered beyond data analytics, yet they are a pressing problem for users, who hope that frontline employees can get data as flexibly as possible within their authority.
A simple solution is to export data with the BI product and let frontline workers process it with desktop tools like Excel. Unfortunately, Excel is not good at handling multilevel joins (an issue to be discussed later) or large amounts of data, which makes it unsuitable for many computing scenarios.
Until more advanced interactive computing technology appears, technical specialists will remain responsible for tackling those problems. In this context, instead of pursuing self-service procedural computing, BI products should focus on making it easier for business users to access technical resources and on streamlining the development process for developers.
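As a hypothetical instance of a multi-step procedural computation, consider finding the salespeople whose sales rose for at least three consecutive months. The scenario, the `ROWS` data, and the function names are illustrative assumptions, not taken from the article.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical export from a BI tool: (salesperson, month, sales).
ROWS = [
    ("Ann", 1, 10), ("Ann", 2, 12), ("Ann", 3, 15), ("Ann", 4, 14), ("Ann", 5, 16),
    ("Ben", 1, 20), ("Ben", 2, 18), ("Ben", 3, 19), ("Ben", 4, 21), ("Ben", 5, 25),
]


def longest_rising_streak(values):
    """Length of the longest run of consecutive month-over-month increases."""
    best = current = 0
    for prev, cur in zip(values, values[1:]):
        current = current + 1 if cur > prev else 0
        best = max(best, current)
    return best


def rising_streaks(rows):
    # Step 1: order by salesperson, then by month.
    ordered = sorted(rows, key=itemgetter(0, 1))
    # Step 2: group by salesperson and scan each monthly series in order.
    return {
        person: longest_rising_streak([sales for _, _, sales in group])
        for person, group in groupby(ordered, key=itemgetter(0))
    }


streaks = rising_streaks(ROWS)
# Step 3: filter on the intermediate result.
qualified = sorted(p for p, n in streaks.items() if n >= 3)
```

Each step depends on the ordered result of the previous one, so the task does not reduce to rotating, drilling, or slicing a cube, however polished the interface.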
There are two things we can do. The first is to establish an algorithm library that stores the algorithms for scenarios already handled. Business users can call up an algorithm and change its parameters for use in a computing scenario of the same type. They can also pick an algorithm from the library as a reference for the technical specialists handling a new scenario. This reduces the chance that business users and the development team understand a computing scenario differently, which is a major cause of delay in handling requests.
The second is to provide efficient and manageable programming technology that makes code easy to write and modify, and that supports storing an algorithm in the library and reusing it. Such technologies are rare in the industry. SQL has good manageability, but SQL code for procedural computations is too tedious to write. Stored procedures need recompilation, which is inconvenient for reuse. Java code needs recompilation too, and is nearly unmanageable. Other scripting languages are hard to integrate, so their scripts are difficult to store and manage in a database for reuse.
At present, BI products are barely able to meet the most common self-service needs. Vendors are usually talking about multidimensional analysis while users are thinking of problems that require procedural computing. This mismatch leads to high expectations followed by big disappointments. In view of this, it’s critical that users have a good understanding of their self-service needs: Is multidimensional analysis sufficient for dealing with the problems? How many associative queries will be needed? Will the frontline employees have many problems that require procedural computing? Answering these questions is necessary for setting a reasonable expectation of a BI product and for knowing what it can actually do, so that users are not misled by a flashy interface and smooth operation into making the wrong purchase decision.
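The algorithm library idea can be sketched as a small registry of named, parameterized routines. This is a minimal illustration under assumptions: `ALGORITHM_LIBRARY`, `register`, `run_algorithm`, and the `top_n_by_group` routine are hypothetical names, and an in-memory dictionary stands in for whatever persistent store a real product would use.

```python
from collections import defaultdict

# Hypothetical in-memory library; a real one would be stored persistently.
ALGORITHM_LIBRARY = {}


def register(name, description):
    """Store a function in the library under a searchable name."""
    def decorator(func):
        ALGORITHM_LIBRARY[name] = {"description": description, "run": func}
        return func
    return decorator


@register("top_n_by_group", "Top N rows per group, ranked by a measure")
def top_n_by_group(rows, group_key, measure, n=3):
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        key: sorted(members, key=lambda r: r[measure], reverse=True)[:n]
        for key, members in groups.items()
    }


def run_algorithm(name, rows, **params):
    """Look up a stored algorithm and run it with user-supplied parameters."""
    return ALGORITHM_LIBRARY[name]["run"](rows, **params)


SALES = [
    {"region": "East", "rep": "Ann", "sales": 300},
    {"region": "East", "rep": "Ben", "sales": 500},
    {"region": "West", "rep": "Cy", "sales": 200},
    {"region": "West", "rep": "Di", "sales": 100},
]

# A business user reuses the stored algorithm and changes only the parameters.
top_rep = run_algorithm("top_n_by_group", SALES,
                        group_key="region", measure="sales", n=1)
top_two = run_algorithm("top_n_by_group", SALES,
                        group_key="region", measure="sales", n=2)
```

The business user never edits the routine, only the parameters. The stored description also gives users and developers a shared reference when a new scenario resembles an old one.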