A Data Science Central Community
Once you have established that integrating a big data platform into the corporate analytics ecosystem would be beneficial, the real work has only just begun.
First, you need to choose a platform from the many available on the market. Then, you need to configure your analytics environment so that any script or workflow can connect to, and run on, the big data platform of your choice.
Note, too, that we are just at the beginning of the big data era: today's winners might no longer look like the best choice in a few years (or even months)! You need the freedom to change platforms quickly, whenever necessary.
KNIME has developed a flexible, easy-to-implement strategy for accessing any big data platform.
1. Start with a Connector node: either a generic one, which requires a JDBC driver, or a dedicated one, such as those for Cloudera Impala, Hadoop Hive, Hortonworks, HP Vertica, ParStream, and more.
2. Define the appropriate SQL query using the Database nodes, even if you are not a SQL wizard. These nodes cater to different levels of SQL knowledge. The Database Query node lets you write your own SQL query, while the Database Manipulation nodes, such as Database GroupBy or Database Joiner, expose a simple GUI that hides the SQL implementation details. If you already know KNIME, these SQL-free nodes offer the same GUI as the corresponding nodes that handle data tables. No additional learning curve required!
3. Execute and retrieve the resulting data with the "Database Connection – Table Reader" node. Execution runs on the big data platform and the resulting data table is only then imported into the KNIME workflow for further analysis.
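For readers who think in code, the three steps above can be sketched outside KNIME. This is only an illustration of the pattern, not KNIME's implementation: Python's built-in sqlite3 module stands in for the big data platform, and the function names (connect, group_by_query, read_table) and the sales table are hypothetical.

```python
import sqlite3


def connect():
    """Step 1 (Connector): open a connection to the platform.

    Assumption: an in-memory SQLite database with a tiny sample table
    stands in for a real big data platform reached via JDBC.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (region TEXT, amount REAL);
        INSERT INTO sales VALUES ('north', 10.0), ('north', 5.0), ('south', 7.5);
        """
    )
    return conn


def group_by_query(table, group_col, agg_col, agg="SUM"):
    """Step 2 (Database GroupBy): build the SQL from a few settings.

    The caller never writes SQL by hand. The names are interpolated
    directly, so they must come from trusted configuration, not user input.
    """
    return (
        f"SELECT {group_col}, {agg}({agg_col}) AS {agg.lower()}_{agg_col} "
        f"FROM {table} GROUP BY {group_col} ORDER BY {group_col}"
    )


def read_table(conn, query):
    """Step 3 (Table Reader): run the query on the platform, fetch the result."""
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


conn = connect()
query = group_by_query("sales", "region", "amount")
columns, rows = read_table(conn, query)
conn.close()

print(query)
print(columns, rows)
```

The aggregation happens inside the database; only the two aggregated rows are pulled back into the client, which mirrors the point made in step 3.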
This simple approach opens the door to big data in KNIME, without the headache of configuring every tiny detail to connect to the platform of your choice.
It also preserves the freedom to change. If at some point you are no longer satisfied with your big data platform and need to replace it, you only need to swap the initial Connector node in your workflow. Everything else stays as it is!
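The same idea can be sketched in code: if the rest of the workflow depends only on "something that returns a connection", replacing the platform means replacing that one piece. The sketch below is a hypothetical illustration, not KNIME code. Two in-memory SQLite databases stand in for two different platforms, and make_connector, platform_a, platform_b, and workflow are made-up names.

```python
import sqlite3


def make_connector(rows):
    """Return a connector function for a stand-in platform.

    Assumption: each "platform" is an in-memory SQLite database seeded
    with its own rows; a real connector would wrap a JDBC/ODBC driver.
    """
    def connector():
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE events (kind TEXT)")
        conn.executemany("INSERT INTO events VALUES (?)", rows)
        return conn
    return connector


# Two interchangeable connectors (hypothetical platforms A and B).
platform_a = make_connector([("click",), ("click",), ("view",)])
platform_b = make_connector([("view",)])


def workflow(connector):
    """Everything downstream of the connector; it never names a platform."""
    conn = connector()
    try:
        return conn.execute(
            "SELECT kind, COUNT(*) FROM events GROUP BY kind ORDER BY kind"
        ).fetchall()
    finally:
        conn.close()


# Switching platforms changes only the argument passed to the workflow.
result_a = workflow(platform_a)
result_b = workflow(platform_b)
print(result_a)
print(result_b)
```

In practice, platforms can differ in SQL dialect, so a hand-written query may still need adjusting after a switch; the sketch assumes both backends accept the same statement.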