A Data Science Central Community
Graphs are gimmicks, substituting fancy displays for careful analysis and rigorous reasoning. It's basically a tradeoff: the snazzier your display, the more you can get away with a crappy underlying analysis. Conversely, a good analysis doesn't need a fancy graph to sell itself. The best quantitative research has an underlying clarity and a substantive importance whose results are best presented in a sober, serious tabular display. And the best quantitative researchers trust their peers enough to present their estimates and standard errors directly, with no tricks, for all to see and evaluate. Let's leave the dot plots, pie charts, moving zip charts, and all the rest to the folks in the marketing department and the art directors of Newsweek and USA Today. Here at this blog we're doing actual research and we want to see, and present, the hard numbers.
To get a sense of what's at stake here, consider two sorts of analyses. At one extreme are controlled experiments with clean estimates and p-values, and well-specified regressions with robust standard errors, where the p-values really mean something. At the other extreme are descriptive data summaries--often augmented with models such as multilevel regressions chock full of probability distributions that aren't actually justified by any randomization, either in treatment assignment or data collection--with displays of all sorts of cross-classified model estimates. The problem with this latter analysis is not really the modeling--if you state your assumptions carefully, models are fine--but the display of all sorts of numbers and comparisons that are in no way statistically significant.
For example, consider a research article with a graph showing three lines with different slopes. It's natural for the reader to assume, if such a graph is featured prominently in the article, that the three slopes are statistically significantly different from each other. But what if no p-value is given? Worse, what if there are no point estimates or standard errors to be found? Let alone the sort of multiple comparisons correction that might be needed, considering all the graphs that might have been displayed? Now, I'm not implying any scientific misconduct here--and, to keep personalities out of this, I've refrained from linking to the article that I'm thinking about here--but it's sloppy at best and statistical malpractice at worst to foreground a comparison that has been presented with no rigorous--or even approximately rigorous--measure of uncertainty. And, no, it's not an excuse that the researchers actually "believe" their claim. Sincerity is no defense. There's a reason our forefathers developed p-values and all the rest, and let's remember those reasons.
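To make the three-slopes complaint concrete, here is a minimal sketch of the numbers such an article could report alongside its graph. Everything in it is an assumption for illustration: the data are made up, the helper names `fit_slope` and `compare_slopes` are hypothetical, the three groups are treated as independent, the p-values use a normal approximation rather than a t distribution, and Bonferroni is just one of several possible multiple comparisons corrections.

```python
import math
from itertools import combinations


def fit_slope(x, y):
    """Least-squares slope and its standard error for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # residual variance / Sxx
    return slope, se


def compare_slopes(fits):
    """Pairwise differences in slope with Bonferroni-adjusted p-values.

    fits maps a group name to (slope, standard error). Assumes the groups
    are independent and uses a two-sided normal approximation.
    """
    pairs = list(combinations(sorted(fits), 2))
    rows = []
    for a, b in pairs:
        diff = fits[a][0] - fits[b][0]
        se_diff = math.sqrt(fits[a][1] ** 2 + fits[b][1] ** 2)
        z = diff / se_diff
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        rows.append((a, b, diff, se_diff, min(1.0, p * len(pairs))))
    return rows


# Made-up data for three lines; not taken from any real article.
x = [1, 2, 3, 4, 5, 6]
groups = {
    "A": [1.1, 1.9, 3.2, 3.8, 5.1, 6.0],
    "B": [0.8, 2.3, 3.1, 4.6, 5.4, 7.2],
    "C": [1.4, 1.8, 2.9, 3.1, 4.2, 4.4],
}
fits = {name: fit_slope(x, y) for name, y in groups.items()}

for name, (slope, se) in sorted(fits.items()):
    print(f"{name}: slope={slope:.3f}  se={se:.3f}")
for a, b, diff, se_diff, p_adj in compare_slopes(fits):
    print(f"{a} - {b}: diff={diff:.3f}  se={se_diff:.3f}  adj_p={p_adj:.3f}")
```

The output is a small table of estimates, standard errors, and corrected comparisons, which is the sort of thing a reader would need in order to judge whether the slopes in the picture really differ.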
The positive case for tables
So far I've explained my aversion to graphs as an adornment to, or really a substitute for, scientific research. I've been bothered for a while by the trend of graphical displays in journal articles, but only in writing this piece right here have I realized the real problem, which is not so much that graphs are imprecise, or hard to read, or even that they encourage us to evaluate research by its "production values" (as embodied in fancy models and graphs) rather than its fundamental contributions, but rather that graphs are inherently a way of implying results that are often not statistically significant. (And all but the simplest graphs allow so many different visual comparisons that even if certain main effects actually do pass the p-value test, many, many more inevitably won't. Some techniques have been developed to display multiple-comparisons-corrected uncertainty bounds, but these are rarely included in graphs for the understandable reason that they magnify visual clutter.)
But enough about graphs. Now I'd like to talk a bit about why tables are not merely a necessary evil but are actually a positive good.
Read comments and full article at http://www.stat.columbia.edu/~cook/movabletype/archives/2009/04/why...
This is the first time I have ever heard graphs referred to as "gimmicks".
I can appreciate the author's point of view; however, the fact that there are no p-values on a graph is an error of omission rather than a deficiency of graphs. Visualizations definitely aid data mining projects, especially as the number of dimensions and measures grows exponentially. There is no reason why graphs and data tables cannot coexist: one can misrepresent or omit data just as easily in one medium as in another.