A Data Science Central Community
This link will take you to a great YouTube video on the five classes of databases and the differences among the major players in the NoSQL market.
Here are some conclusions I took away from the presentation.
Mike Bowers, Principal Engineer at the Church of Jesus Christ of Latter-day Saints, is an expert in database technology. In the presentation he accomplishes two major objectives. First, he reviews the strengths and weaknesses of the five major classes of databases in use today (relational, dimensional, object, graph and document). He then dissects the major NoSQL databases on the market, including MarkLogic, MongoDB, Riak, Cloudant/CouchDB and Cassandra. How do they stack up? Are they enterprise ready? If developer productivity, application performance and enterprise readiness are concerns at your company, this video is a “must see”. Here are some sound bites I took away from the presentation. Please note that these comments only begin to scratch the surface of Mike’s message.
Since over 80% of the data being created today is unstructured (probably better termed poly-structured), organizations need to store, search and analyze hundreds of different data formats at light speed. The ability to handle data variability, data variety and data relevance has jumped to the top of the agenda for both business and IT. But how can organizations discern meaning from this data? How do they create context around unstructured data that exists in so many forms while also making it discoverable? Relational models are not well suited to the problem, since they were designed to organize data in rows, columns and tables. The variety and complexity of unstructured data, coupled with the overriding need to scale out on commodity hardware, prevent relational databases from leveraging over 80% of today’s data. And there’s no end in sight.
Mike shows a great example of how a document database (a NoSQL database) takes unstructured data in the form of a story, identifies the data elements in the story (topic, location, author), semantically links these elements to show the relationships between them, and then identifies the hierarchy within the story (title, subtitle, body, etc.). Armed with all of this, the unstructured data lives with context. The original document persists, but now all of its elements are discoverable in a variety of ways.
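To make the story example a little more concrete, here is a minimal Python sketch of what a document "with context" might look like. The field names (`entities`, `links`), the sample story and the three helper functions are all illustrative assumptions of mine; they are not taken from the presentation and do not reflect the schema or API of any particular database.

```python
# Hypothetical story document. Every field name and value here is made up
# for illustration; real document databases define their own formats.
story = {
    # Hierarchy within the story
    "title": "Flood Closes Main Street Bridge",
    "subtitle": "Repairs expected to take two weeks",
    "body": "Heavy rain on Tuesday forced the city to close the bridge.",
    # Data elements identified in the story
    "entities": [
        {"type": "topic", "value": "flooding"},
        {"type": "location", "value": "Main Street Bridge"},
        {"type": "author", "value": "J. Doe"},
    ],
    # Semantic links between elements: (subject, relationship, object)
    "links": [
        ("J. Doe", "wrote_about", "flooding"),
        ("flooding", "occurred_at", "Main Street Bridge"),
    ],
}


def find_entities(doc, entity_type):
    """Return the values of all identified elements of one type."""
    return [e["value"] for e in doc["entities"] if e["type"] == entity_type]


def related_to(doc, subject):
    """Return (relationship, object) pairs linked from a given element."""
    return [(rel, obj) for subj, rel, obj in doc["links"] if subj == subject]


def outline(doc):
    """Return the hierarchy parts present in the document, in reading order."""
    return [part for part in ("title", "subtitle", "body") if part in doc]
```

With this shape, `find_entities(story, "location")` finds the story by place, `related_to(story, "flooding")` follows a link from one element to another, and `outline(story)` recovers the hierarchy, while the original text in `body` stays untouched.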
Given the reality that unstructured data is growing exponentially and needs to be integrated and analyzed alongside structured data to complete the picture, what does an application need from a NoSQL database? Basically what every database needs: five core capabilities. While this oversimplifies the requirements, they are 1) inserts, updates and deletes; 2) the ability to query the data; 3) the ability to search the data; 4) the ability to bulk process the data; and 5) the ability to do all of this consistently. With extraordinary data volumes, this has to be done at scale and in an affordable way. The only enterprise NoSQL database that handles all of this today is MarkLogic.
Mike evaluates search relevance, advanced search using facets, geospatial search, entity enrichment, data consistency, developer productivity using Java, the ability to retrieve multiple documents, integration with the BI stack using SQL, real-time data ingestion, indexing and much more. Imagine if you had to ask your programmers to develop an application that handles data locks, threading bugs, serialization, deadlocks and race conditions. Imagine if you had to write the code to ensure that all parts of your data transactions succeeded. How would you ensure that all of the data has been committed consistently? Do the commits meet all of your data rules? How do you ensure that your data survives system failures and can be recovered after an inadvertent deletion? The vast majority of NoSQL document model databases lack these capabilities. If you are evaluating database technology today, I would highly recommend watching this video – at least twice.
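For readers who want to see the five core capabilities side by side, the following is a toy in-memory sketch in Python. The class `TinyDocStore` and all of its method names are hypothetical and exist only for illustration. It is not how MarkLogic or any other product works, and it ignores concurrency, durability and scale, which are exactly the hard parts a real database has to solve for you.

```python
import copy


class TinyDocStore:
    """Toy in-memory document store (hypothetical, illustration only)."""

    def __init__(self):
        self.docs = {}

    # 1) Inserts, updates and deletes
    def insert(self, doc_id, doc):
        if doc_id in self.docs:
            raise KeyError(f"duplicate id: {doc_id}")
        self.docs[doc_id] = doc

    def update(self, doc_id, **fields):
        self.docs[doc_id].update(fields)

    def delete(self, doc_id):
        del self.docs[doc_id]

    # 2) Query: exact match on one field
    def query(self, field, value):
        return sorted(i for i, d in self.docs.items() if d.get(field) == value)

    # 3) Search: case-insensitive keyword match across all string values
    def search(self, keyword):
        kw = keyword.lower()
        return sorted(
            i for i, d in self.docs.items()
            if any(isinstance(v, str) and kw in v.lower() for v in d.values())
        )

    # 4) Bulk processing: apply a function to every document
    def bulk_apply(self, func):
        for doc in self.docs.values():
            func(doc)

    # 5) Consistency: either every operation succeeds or none is kept
    def transaction(self, operations):
        snapshot = copy.deepcopy(self.docs)
        try:
            for op in operations:
                op(self)
        except Exception:
            self.docs = snapshot  # roll back to the state before the transaction
            raise


store = TinyDocStore()
store.insert("a", {"author": "Ann", "body": "Flood closes bridge"})
store.insert("b", {"author": "Bob", "body": "Council approves budget"})

# The second operation fails, so the first one is rolled back as well.
try:
    store.transaction([
        lambda s: s.update("a", status="reviewed"),
        lambda s: s.insert("b", {"author": "Eve"}),  # duplicate id
    ])
except KeyError:
    pass

# Bulk step: add a word count to every document.
store.bulk_apply(lambda d: d.update(words=len(d["body"].split())))
```

After the failed transaction, document "a" has no `status` field, because the store restored its snapshot. Writing and testing even this much by hand hints at why built-in transactional guarantees matter when evaluating a database.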