A Data Science Central Community
An occasional series in which a review of recent posts on SmartData Collective reveals the following nuggets:
Since the beginning of…
Believe it or not, the concept of data quality has been touted as important since the beginning of the relational database. The original concept of a relational database came from Dr. Edgar Codd, who worked for IBM in the 1960s and 70s. Dr. Codd’s ideas about relational databases, storing data in cross-referenced tables, were groundbreaking, but largely ignored at IBM where he worked. It was only when Larry Ellison grabbed onto the idea and began to have success with a little company named Oracle that IBM did finally pay attention. Today, relational databases are everywhere.
—Steve Sarsfield: “A Brief History of Data Quality”
Empowering the business
The other aspect of today’s data visualization leaps is that it disassociates the business from the information. If only a small group of geeky mathematicians and programmers understand the data, it creates a mysticism that can lead to distrust of information. If people don’t understand it, they don’t learn, and they don’t improve. What I hope we can do as really smart statisticians, data analysts, and programmers is make the connection between information and visualization so that it further democratizes insight and empowers our business.
—Michele Goetz: “Cool Data Visualization – What is That?”
Building momentum the simple way
Too many projects fail because of lofty expectations, unmanaged scope creep, and the unrealistic perspective that data quality problems can be permanently “fixed” as opposed to needing eternal vigilance. In order to be successful, projects must always be understood as an iterative process. Return on investment (ROI) will be achieved by targeting well-defined objectives that can deliver small incremental returns that will build momentum to larger success over time.
—Jim Harris: “The Data Quality Goldilocks Zone”
It’s not sexy…
...it’s not business alignment, and it doesn’t require a lot of meetings. It’s not data governance. Instead, it’s the day-to-day management of detailed data, including the dirty work of establishing standards. Standardizing terms, values, and definitions means that as we move data around and between systems it’s consistent and meaningful. This is Information Technology 101. You can’t go to IT 301—jeez, you can’t graduate!—without data management. It’s just one of those fundamentals.
—Evan Levy: “Not MDM, Not Data Governance: Data Management”
Bridging the cultural divide
I think we need to find a place for the artist within the experimental process. The current constraints of science make it clear that the breach between our two cultures is not merely an academic problem that stifles conversation at cocktail parties. Rather, it is a practical problem, and it holds back science’s theories. If we want answers to our most essential questions, such as “Where does consciousness come from?”, then we will need to bridge our cultural divide. By heeding the wisdom of the arts, science can gain the kinds of new insights and perspectives that are the seeds of scientific progress. If nothing else, artists can teach scientists to ask better questions.
—Jonah Lehrer in interview with Tom H.C. Anderson: “Market Researchers are Neuroscientists Too”
Into the past
I tend to agree with the statement that business analytics is part of business intelligence, but it’s not an opinion that I hold religiously. If the reader feels that they are separate disciplines, I’m unlikely to argue vociferously with them. However, if someone makes a wholly inane statement such as BI “can only provide historical information that can’t drive organizations forward,” then I may be a little more forthcoming.
—Peter Thomas: “Business Analytics vs. Business Intelligence”
Is it fit for purpose?
If you are saying that you shouldn’t wait for your data to be perfect before using it in BI, then I agree; but to completely ignore the quality of the information you’re using to inform your decisions would be like playing roulette - Russian style. I’d also suggest that having too much data or data that is out of date are very much data quality issues.
—Steve Tuck: “What use is BI without fit-for-purpose data?”