A Data Science Central Community
The job of a data analyst has become very extensive: it now has to cover a number of different and ever-changing tasks.
A data analyst must query a variety of internal and external data sources, each with a different access protocol and format; integrate these data with results from REST and web services queried over the Internet, such as Google API or any social media channel; exchange information with business analysts, who, while lacking the deep mathematical background, are still the owners of the domain knowledge; rely on old data preparation scripts programmed by now-retired employees and running on obsolete data management platforms; include the results of third-party software running on some cloud; and then, finally, after putting together this whole variety of data sources and integrating the whole bonanza of new and old data management tools, he/she is ready to apply the selected data analytics algorithm.
Given this situation, it is obvious that we need to rethink the modern data analytics landscape and add a few requirements. While in the past a powerful platform was all that was needed, today we need more features to match current data analytics needs.
In particular, we need our platform to be integrative, collaborative, agile, transparent, and, yes, powerful as well!
Indeed, we need the flexibility to interact with data from different sources, with different formats and different protocols, and to interact with other tools. The time of the walled garden, when an analytics platform purely supported itself, is over, and this paves the way to more open solutions.
As data analysts, we also need to exchange knowledge with other data analysts and business analysts. Fostering collaboration across teams and analysts empowers the collective smarts (http://en.wikipedia.org/wiki/Collective_intelligence).
We definitely need agility, to quickly build and test proof-of-concept prototypes before deciding on their final destination, whether production or the trash bin (http://www.ibm.com/developerworks/library/d-agile-data-analysis/).
Transparency, in the form of intra-tool documentation, is also a nice feature to have, saving us tons of documentation e-pages written to explain to others what our data analytics script is actually doing.
Finally, a data analytics platform must still be as powerful as ever, to deal with all sorts of data: big and small.
All of that defines an open architecture (http://www.knime.org/open-for-innovation), and we, today’s data analysts, cannot settle for anything less.