A Data Science Central Community
To this day, randomized experiments remain the gold standard for generating models that permit causal inference. In many settings, such as drug trials, they are in fact the conditio sine qua non. No new drug can win approval unless its treatment effect (and any associated side effects) has first been established and quantified. A drug must therefore be proven in terms of its causal effect, and the underlying study must support causal inference.
However, in many other domains, such controlled experiments are not feasible, be it for ethical, economic or practical reasons. For instance, the federal government obviously could not create two different tax regimes in order to evaluate their respective impact on economic growth. For lack of such experiments, economists have traditionally been constrained to studying strictly observational data and, although much desired, causal inference is much more difficult to carry out on that basis. Causal inference from observational studies typically requires an extensive range of assumptions, which may or may not be justifiable depending on one’s viewpoint. Because these assumptions are subject to individual judgement, it should not surprise us that there is widespread disagreement among economic experts and government leaders regarding the effect of economic policies.
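To make the contrast between randomized and observational data concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration only: the confounder `z`, the assumed `TRUE_EFFECT` of 1.0, the assignment probabilities and the noise levels are all hypothetical and do not come from any real study. It compares a naive difference in means when treatment uptake depends on the confounder with the same estimate when treatment is assigned by a coin flip.

```python
import random
import statistics

TRUE_EFFECT = 1.0  # assumed causal effect of the treatment (illustrative only)


def estimate_effect(n, randomized, seed=0):
    """Return the naive difference in mean outcome, treated minus control."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # hypothetical confounder, unobserved by the analyst
        if randomized:
            # Experiment: a coin flip decides treatment, independent of z.
            t = rng.random() < 0.5
        else:
            # Observational data: units with high z are more likely to be treated.
            t = rng.random() < (0.8 if z > 0 else 0.2)
        # The outcome depends on the treatment AND on the confounder.
        y = TRUE_EFFECT * t + 2.0 * z + rng.gauss(0.0, 1.0)
        (treated if t else control).append(y)
    return statistics.mean(treated) - statistics.mean(control)


observational = estimate_effect(20000, randomized=False)
experimental = estimate_effect(20000, randomized=True)
print(f"assumed true effect:    {TRUE_EFFECT:.2f}")
print(f"observational estimate: {observational:.2f}")
print(f"randomized estimate:    {experimental:.2f}")
```

Under these assumptions, the randomized estimate should land close to the assumed effect, while the observational estimate should come out well above it, because the confounder drives both treatment uptake and the outcome. Recovering the effect from the observational data would require an explicit assumption about `z`, which is exactly the kind of judgement described above.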
While economists and social scientists have been using observational data for policy development for over a century, the business world has only recently begun to discover the potential of “big data” and “competing on analytics.” As these terms become buzzwords, and are rightly expected to hold great promise, the strictly observational nature of most “big data” sources is often overlooked. The wide availability of new, easy-to-use analytics tools may turn out to be counterproductive, as these tools do not explicitly distinguish observational inference from causal inference. While the mantra “correlation does not imply causation” remains frequently quoted as a general warning, many business analysts would not know under what specific conditions it is acceptable to derive a causal interpretation from correlation in observational data. Consequently, causal assumptions are often made informally and implicitly, and thus they typically remain undocumented. The line between association and causation often becomes further blurred in the eyes of the end users of such research. Given that the concept of causality remains poorly understood in many practical applications, we seriously question today’s real-world business capabilities for deriving rational policies from the newly found “big data.”