An occasional series in which a review of recent posts on SmartData Collective reveals the following nuggets:
The eternal quest: better data
Data governance and data quality are often the domain of data quality vendors, but any technology that can help your quest to achieve better data is worth exploring. Rather than fixing up data after it has been corrupted, it’s a good idea to use preventative technologies to stop poor data quality in the first place.
—Steve Sarsfield: “Guiding Call Center Workers to Data Quality”
The devil in the details
The long-term problem of understanding metadata remains challenging, however – especially within organizations. Indeed, most of the effort of implementing business intelligence projects often goes into trying to determine what people are trying to measure – i.e. which data sources need to be connected to each other, and how common business terms should be calculated. It’s one of those areas that exasperate business users: “How hard can it be to give me sales revenue by product?!” But the IT department understands that the devil is in the details.
—Timo Elliott: “The Inevitable Wolfram|Alpha Problem: Semantics”
Software alone won’t cut it
And where software is purchased, the cost of training and consulting to help understand how to use it is usually many times the cost of the software itself… But even with software, unless there is clear thinking about the problems that need to be solved, and which ones can be solved realistically (or impacted) with analytics, the software will just sit, doing nothing useful. This is surely a factor in the divide between potential capabilities in analytics (i.e., software on the shelf) and benefits attained by analytics.
—Dean Abbott: “Is analytics a winner in a recession?”
For you SQL jockeys, most of the heavy lifting in database processing is in the where clause. Columnar databases are faster because their processing isn’t inhibited by unnecessary row content. Because many database tables can have upwards of 100 columns, and because most business questions only request a handful of them, this just makes business sense. And in these days of multi-billion row tables and petabyte-sized systems, columnar databases make more sense than ever.
—Evan Levy: “The Rise of the Columnar Database”
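To make the row-versus-column point concrete, here is a minimal sketch in Python. It is a toy illustration, not how any particular columnar database is implemented: the table, its column names, and the helper functions are all hypothetical, and plain in-memory lists stand in for on-disk storage.

```python
# Hypothetical toy table: 3 rows, 4 columns (real tables may have 100+ columns).
rows = [
    {"id": 1, "product": "widget", "region": "east", "revenue": 120.0},
    {"id": 2, "product": "gadget", "region": "west", "revenue": 80.0},
    {"id": 3, "product": "widget", "region": "west", "revenue": 50.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def revenue_for_product_rowwise(rows, product):
    """Row layout: each full row is visited, including columns the query ignores."""
    return sum(r["revenue"] for r in rows if r["product"] == product)


def revenue_for_product_columnar(columns, product):
    """Column layout: only the two columns named by the query are read."""
    return sum(
        rev
        for prod, rev in zip(columns["product"], columns["revenue"])
        if prod == product
    )


columns = to_columns(rows)
row_total = revenue_for_product_rowwise(rows, "widget")
col_total = revenue_for_product_columnar(columns, "widget")
```

Both functions return the same total; the difference is that the columnar version never touches `id` or `region`. With a handful of columns that saving is trivial, but it grows with the width of the table, which is the argument the excerpt makes.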
Driving the transformation
Many firms have used the recession as an opportunity to focus much harder internally on eliminating wastage and streamlining poor process flows, which has effectively put them in a much healthier position to move into outsourcing environments that can be underpinned by robust ERP and standardized processes. Other firms have not been so diligent, and are looking for providers to take on their back-office baggage and grant them cost-savings. In these situations, the onus on the service provider to help its client refine their processes is very strong. If the service provider fails to help drive the transformation in tandem with the client's governance leadership, the engagement is unlikely to reap many rewards for either party.
—Phil Fersht: “Globalizing the business is the key to outsourcing today”
Come out, come out, whoever you are
Just imagine how easy it would be for someone who didn’t like you to start posting embarrassing comments and signing them with your name. Or perhaps someone might pursue a more subtle strategy, such as posting reasonable-sounding comments in order to advance an agenda. Less speculatively, we’ve seen how anonymity can be troublesome for the integrity of Wikipedia editing. Given the growing role of social media, we’re going to have to cross this information accountability bridge sooner or later. I hope it’s sooner. Wouldn’t it be nice if we developed a cultural norm that people stood proudly behind their online words?
—Daniel Tunkelang: “Approach and Identify”
Think about it
It appears that the datasets available now are heavy on the earth sciences areas, but according to the FAQ, more datasets will be available. There’s even a place to request new datasets. Most surprising, to me, is the fact that the site offers the ability to rate the utility, usefulness, and ease of access for the data. I wonder how many of us are providing that feature to our users?
—Karen Lopez: “Data.gov is Live: Access US Federal Data”