A Data Science Central Community
‘Conflating causation with correlation’ is a central but often overlooked danger in business analysis and data science initiatives. Especially with ‘big data’ sets, analysis will often reveal patterns that suggest a causal link where there is only a co-occurring phenomenon or, worse, a ‘phantom phenomenon’ (i.e. a coincidence or a happenstance of a limited dataset).
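To make the ‘phantom phenomenon’ point concrete, here is a toy simulation, a sketch under stated assumptions rather than anything from the original article. It generates a "wide" dataset of purely independent Gaussian noise columns (the sizes of 20 rows, 200 columns and the 0.6 cutoff are arbitrary illustrative choices) and reports every pair whose sample correlation looks "strong". Because no column has any relationship to any other, every hit is coincidence, and the number of hits grows with the number of pairs tested.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def phantom_correlations(n_rows=20, n_vars=200, threshold=0.6, seed=0):
    """Return (i, j, r) for every pair of independent noise columns whose
    sample correlation is at least `threshold` in absolute value.

    All columns are simulated pure noise (an illustrative assumption),
    so every pair returned here is a coincidence, not a real relationship.
    """
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_vars)]
    hits = []
    # Test every pair of columns, as an unguided "pattern search" would.
    for i, j in itertools.combinations(range(n_vars), 2):
        r = pearson(columns[i], columns[j])
        if abs(r) >= threshold:
            hits.append((i, j, r))
    return hits


if __name__ == "__main__":
    hits = phantom_correlations()
    print(f"{len(hits)} 'strong' correlations found among pure-noise columns")
```

The design choice is deliberate: few rows and many columns mimic a limited sample mined across many candidate variables, which is exactly the setting in which chance correlations become plentiful.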
A recent letter to the editor of the INFORMS society magazine by Dr. John Crocker, entitled “Numbers don’t lie and other myths”, raised two excellent practical examples of mistaking correlation for causation (Crocker, June 2013). In one case, he noted that a recent article claimed that suffering hair loss predisposed one to migraine headaches. This is an example of correlation, not causation. While there may be a statistical correlation between the two phenomena (baldness and migraines), this is not a license to conclude that one ‘causes’ the other, merely that they have a propensity to co-occur. Such an observation suggests that a more fundamental phenomenon is likely at play (e.g. a genetic predisposition to higher testosterone levels, which leads both to hair loss and to greater stress; the stress raises blood pressure, which in turn predisposes one to migraines).
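The "more fundamental phenomenon" argument can be illustrated with a small simulation of a common cause. This is a hypothetical toy model, not data from Crocker's letter or the article he cites: a single latent variable `z` (a stand-in for something like the genetic factor above) drives both a `hair_loss` score and a `migraine` score, each with independent noise, and neither score depends on the other. The linear, Gaussian setup and all effect sizes are assumptions chosen for clarity. The raw correlation between the two outcomes comes out clearly positive, yet the partial correlation after controlling for `z` is close to zero.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after linearly removing the influence of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def simulate_common_cause(n=5000, seed=1):
    """Simulate two outcomes that share a hidden cause but not each other."""
    rng = random.Random(seed)
    # Hypothetical latent driver; purely simulated, not a real measurement.
    z = [rng.gauss(0, 1) for _ in range(n)]
    # Each outcome depends on z plus its own noise, never on the other outcome.
    hair_loss = [zi + rng.gauss(0, 1) for zi in z]
    migraine = [zi + rng.gauss(0, 1) for zi in z]
    return z, hair_loss, migraine


if __name__ == "__main__":
    z, hair_loss, migraine = simulate_common_cause()
    print(f"raw correlation:          {pearson(hair_loss, migraine):.3f}")
    print(f"controlling for latent z: {partial_correlation(hair_loss, migraine, z):.3f}")
```

In this toy model the raw correlation is driven entirely by the shared cause, so conditioning on `z` removes it. In real analyses the confounder is often unmeasured, which is precisely why a raw correlation alone cannot license a causal conclusion.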