A Data Science Central Community
Data scientists use a range of tools in their work and some of these eventually require programming. This book, titled The Art and Craft of Computer Programming, is a guide to computer programming. It does not focus on a specific programming language, but instead contains the essential material from a first year Computer Science course. The book is available from Amazon.com.…Continue
One of my favorite things over the past year was starting a personal blog. (You can find my website here if you are curious.) How did it happen? Well, I was reading an article and one quote in particular really struck me: "It's not what you know, it's who you know."
That quote really resonates with me. Throughout my life I’ve learned a lot, and…Continue
Added by Olga on November 2, 2016 at 9:03pm — No Comments
Most people think data science is smart people doing very smart stuff. Well, that’s not it. Data science is just another subject with its own subtle complexities, which have to be handled with knowledge and an innovative approach. JUST LIKE COOKING.
Cooking is art and science. So is Analytics. Both start from getting the right ingredients. No matter how many spices and cooking techniques you apply, the dish won’t…Continue
Added by Vivek Kalyanarangan on November 1, 2016 at 10:00am — No Comments
"Information is the oil of the 21st century, and analytics is the combustion engine" Peter Sondergaard, SVP, Gartner Research
In analytics, we retrieve information from various data sources, which can be structured or unstructured. The biggest challenge is retrieving information from unstructured data, mainly text. This is where machine learning comes into the picture. Different algorithms have been designed in different platforms…Continue
Added by Vivek Kalyanarangan on September 9, 2016 at 8:30am — No Comments
By Dan Kellett, Director of Data Science, Capital One UK
Disclaimer: This is my attempt to explain some of the ‘Big Data’ concepts using basic analogies. There are inevitably nuances my analogy misses.
What is HDFS?
When people talk about ‘Hadoop’ they are usually referring to either the efficient storing or processing of large amounts of data. MapReduce is a framework for efficient processing using a parallel, distributed algorithm…Continue
Added by Dan Kellett on July 21, 2016 at 2:00am — No Comments
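The MapReduce idea mentioned in the excerpt above can be illustrated with a word count. This is only a single-process sketch of the map, shuffle, and reduce steps, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster the map and reduce calls would run in parallel on
    # different machines; here they run one after another in one process.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "Big ideas"]))
```

Because each document is mapped independently and each key is reduced independently, both steps could be spread across many workers, which is the property a distributed framework exploits.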
You’re working on the MAIN MODEL. The one that leverages half the company’s assets, and on which your paycheck and that of many others depends. You’ve already run through a stepwise, forward, and backward search of the variables, their interactions, and possible curvatures. What are the most productive things to do next?
Here are a couple of ideas…Continue
Added by David G. Young on April 27, 2016 at 8:07am — No Comments
As the R programming language becomes more and more popular among data scientists, and as industries, researchers, and companies embrace R, I will be writing posts on learning data science using R. The tutorial course will include topics on R data types, handling data using R, probability theory, machine learning, supervised and unsupervised learning, data visualization using R, etc. Before going further, let’s just see some stats and tidbits on data science and…Continue
Recently I came across the term CRISP-DM, a data mining standard. Though this process is not new, I feel every analyst should know about this commonly used, industry-wide process. In this post I will explain the different phases involved in creating a data mining solution.
CRISP-DM, an acronym for Cross Industry Standard Process for Data Mining, is a data mining process model that includes commonly used approaches that data…
The job of a data analyst nowadays has become very extensive, in its need to cover a number of different and ever-changing tasks.
A data analyst must query a variety of internal and external data sources, each with a different access protocol and format; integrate these data with results from REST and web services queried over the Internet, such as Google API or any social media channel; exchange information with business analysts, who, while lacking the deep mathematical background,…Continue
Added by Rosaria Silipo on July 14, 2015 at 2:27am — No Comments
Random Forest is a machine learning algorithm used for classification, regression, and feature selection. It's an ensemble technique, meaning it combines the outputs of many instances of a weaker technique in order to get a stronger result.
The weaker technique in this case is a decision tree. Decision trees work by splitting and re-splitting the data by…Continue
Added by Alex Woods on July 4, 2015 at 8:30am — No Comments
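The ensemble idea in the Random Forest excerpt above can be sketched in a few lines. This is a deliberately simplified illustration, not the full algorithm: each "tree" is a one-split decision stump, each stump is trained on a bootstrap sample with one randomly chosen feature, and the forest predicts by majority vote. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))  # random feature for this stump
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Combine the weak learners by majority vote."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two well-separated groups.
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 9), (9, 8), (8, 8), (9, 9)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
forest = fit_forest(rows, labels)
print(predict_forest(forest, (1, 1)), predict_forest(forest, (9, 9)))
```

A real random forest grows deeper trees and re-draws the candidate features at every split; the stump version only keeps the two ingredients that make it an ensemble, namely randomized training and voting.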
We all know that time is money, especially when you're paying a data scientist. But the New York Times reports that...
"Data scientists, according to interviews and expert estimates, spend 50 percent to 80 percent of their time mired in [the] mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets."…Continue
Added by Rosaria Silipo on May 6, 2015 at 12:30am — No Comments
Added by Vikas Kamra on December 1, 2014 at 4:30am — No Comments
In our day-to-day lives, we come across many recommendation engines, such as Facebook's engine for friend suggestions and suggestions of similar pages to like, and YouTube's engine, which suggests videos similar to our previous searches and preferences. In today’s blog post I will explain how to build a basic recommender system.…
Added by suresh kumar Gorakala on June 5, 2014 at 10:55pm — No Comments
All of us at some point in the process of examining…Continue
Added by aatash shah on February 27, 2014 at 5:36am — No Comments
Practicing Data science…
Added by Manish Bhoge on October 18, 2013 at 12:22pm — No Comments
Here is a crisp infographic that communicates the top 5 data products that can be $ denting in the airline industry:
1. Property recommender
2. Word of mouth modeler
3. Funnel friction spotter
4. Traveller churn scorer
5. Sentiment Analyzer