A Data Science Central Community
By Dan Kellett, Director of Data Science, Capital One UK
Disclaimer: This is my attempt to explain some of the ‘Big Data’ concepts using basic analogies. There are inevitably nuances my analogy misses.
What is HDFS?
When people talk about ‘Hadoop’, they are usually referring to either the efficient storage or the efficient processing of large amounts of data. MapReduce is a framework for efficient processing using a parallel, distributed algorithm…
Added by Dan Kellett on July 21, 2016 at 2:00am
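To make the MapReduce idea in the excerpt above a little more concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using word counting as the task. It is an illustration only, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample sentences are made up for this example, and a real cluster would run the map and reduce tasks in parallel on different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key (done by the framework in a real cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped independently, which is what makes
    # the map step easy to spread across many workers.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input, chosen only for illustration.
counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each document is mapped on its own and each key is reduced on its own, both steps can in principle be split across many workers; this sketch simply runs them one after another.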
This is an excerpt from my blog post Working With Large Data Sets...
Over the past 18 months I’ve moved from working on the SMTP proxy to working on our other systems, all of which make use of the data we collect from each connection. It’s a fair amount of data: up to 2Kb for each connection. Our servers receive approximately 1,000 of these pieces of data per second, a rate that is fairly sustained due to our global…
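As a rough back-of-the-envelope check on the figures quoted in the excerpt above, the sketch below works out an upper bound on the data volume. It rests on assumptions the excerpt does not confirm: that "2Kb" means 2 kilobytes (2,048 bytes), that every record is at the maximum size, and that the rate holds at 1,000 records per second around the clock.

```python
# Assumption: "2Kb" is read as 2 kilobytes, and every record is at that maximum.
RECORD_BYTES = 2 * 1024
# Assumption: a constant 1,000 records per second, all day.
RECORDS_PER_SECOND = 1000
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_second = RECORD_BYTES * RECORDS_PER_SECOND
bytes_per_day = bytes_per_second * SECONDS_PER_DAY

# Report in decimal units (1 MB = 10**6 bytes, 1 GB = 10**9 bytes).
print(f"{bytes_per_second / 10**6:.2f} MB per second")
print(f"{bytes_per_day / 10**9:.1f} GB per day")
```

Under those assumptions the ceiling is about 2 MB per second, or roughly 177 GB per day. The real volume is likely lower, since 2Kb is stated as a maximum rather than an average.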