A Data Science Central Community
Good morning, and welcome to this edition of the Morning Analytic Coffee Blog. Today, I'm focusing on the science in Data Science. In 7th grade, we were taught the scientific method, as described by Bacon: ask a question, gather initial knowledge, form a hypothesis, design a test of that hypothesis, perform the experiment, record all of the results, analyze those results, report them, and finally verify them through a duplicate experiment, hopefully run by someone else.
I think we often forget the science part of data science because many of us seek to be viewed as smart and, hopefully, useful. And by "forgetting the science part", I mean that we're afraid to make mistakes. The high-pressure world in which we live often makes it seem as if we can't ever mess up. But I think, in the words of Miss Frizzle of "Magic School Bus" fame, it's time to "Get Messy, Make Mistakes and do Data Science!"
I don't mean make a critical mistake, or erroneously report on work that matters, or foul up a huge report. I mean it's time to re-learn the art of when to experiment, to develop methods and find, as Edison put it regarding his light bulb, "9999 things that don't work."
We are so often focused on bottom lines and end results that we've forgotten the love of discovery, and that comes through trial and error. Maybe this is my ADHD talking, but I think when entering a new market, it is important to understand the drive to accumulate knowledge and to make it useful. Too often we take the words "the customer is always right" to mean "Our customer tells us not just what they need but how to do it". That's not what they hire you for - they hire you for a result. And so often, analysts and others work so hard to achieve that result that they ignore the path taken. Good analysis, however, in my opinion, shows not only the possible correct paths, but also the reasons for not going down other paths and why those conclusions were drawn. In larger-scale projects, such as environmental determinations or civil engineering projects, the methodologies are discussed, and rejected ideas are usually given a reason. However, if the only reason is cost, was the avenue really explored? Was the long-term cost over the life of the project, as opposed to the capital cost of the build, explored? What constraints were placed that changed the project?
It's here that I think sometimes we fail as analysts: in trying to match certain criteria, we reject some of the data we might want to dig into. I know there are only so many billable hours in a day, and that it's difficult to match those with exploration and ideas, but better projects come from finding fertile grounds for choice. I do not mean finding completely impractical paths, such as a tourist gondola in a crowded downtown with no attractions for tourists (I have seen this proposed elsewhere), but multiple realistic scenarios (fixed guideways, tunnels, etc.) that solve a problem. I think it's time analysis was focused not just on "the customer" or "the need" but on "the system that forms because of the project": a look at inflow, outflow, and systemic impacts.
Science, to me, is the study of the systemic changes that result from one alteration at one point. Our projects should matter, and because of this, our science should take a view that allows for a few mistakes, to prove out not just that the experiment was good, but that the initial hypothesis was broad enough to create a good experiment in the first place.