Vincent Granville's Blog (722)

The Death of the Data Scientist

In 2018, Fast Company declared Data Scientist the best job for the third year in a row, which I wholeheartedly agree with (apart from the Director of Fun at the York National Railway Museum). However, the role of data scientist as we know it will soon share the fate of bowling pinsetters, chariot racers, and human alarm clocks.

In 2000-2010 data science was dominated by masters of herculean subjects, with PhDs in linear…

Continue

Added by Vincent Granville on July 12, 2018 at 6:30pm — No Comments

Great Saturday Reading

Here is our selection of featured articles posted in the last few days:

Featured Resources and Technical Contributions

Continue

Added by Vincent Granville on July 7, 2018 at 10:16am — No Comments

Great Saturday Reading

Here is our selection of featured articles and resources posted in the last few days:

Featured Resources and Technical Contributions

Continue

Added by Vincent Granville on June 30, 2018 at 3:26pm — No Comments

Simple Solution to Feature Selection Problems

We discuss a new approach for selecting features from a large set of features, in an unsupervised machine learning framework. In supervised learning such as linear regression or supervised clustering, it is possible to test the predicting power of a set of features (also called independent variables by statisticians, or predictors) using metrics such as goodness of fit with the response (the dependent variable), for instance using the R-squared coefficient. This makes the process of feature…
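As a rough illustration of the supervised case mentioned above, the sketch below scores two candidate features by the R-squared of a one-variable least-squares fit against the response. The data, the feature names, and the helper functions `fit_line` and `r_squared` are hypothetical and are not taken from the article; this is only the standard goodness-of-fit check, not the unsupervised approach the article goes on to discuss.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(y_true, y_pred):
    """R-squared = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical response and two candidate features (illustrative values only).
response = [2.1, 3.9, 6.2, 7.8, 10.1]
features = {
    "informative": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
}

scores = {}
for name, xs in features.items():
    intercept, slope = fit_line(xs, response)
    predictions = [intercept + slope * x for x in xs]
    scores[name] = r_squared(response, predictions)

# Rank features by predictive power, best first.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

In an unsupervised setting there is no response to score against, which is why a metric like this is not directly available.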

Continue

Added by Vincent Granville on June 20, 2018 at 8:36am — No Comments

Great Saturday Reading

Here is our selection of featured contributions posted in the last few days:

Featured Resources and Technical Contributions

Continue

Added by Vincent Granville on June 16, 2018 at 10:00am — No Comments

Top 20 Python libraries for data science in 2018

Python continues to hold a leading position in solving data science tasks and challenges. Last year we published a blog post giving an overview of the Python libraries that had proved most helpful at the time. This year, we expanded our list with new libraries and took a fresh look at the ones we had already covered, focusing on the updates made during the year.…

Continue

Added by Vincent Granville on June 15, 2018 at 11:57am — No Comments

Scale-Invariant Clustering and Regression

The impact of a change of scale, for instance using years instead of days as the unit of measurement for one variable in a clustering problem, can be dramatic. It can result in a totally different cluster structure. Frequently, this is not a desirable property, yet it is rarely mentioned in textbooks. I think all clustering software should state in its user guide that the algorithm is sensitive to scale.

We illustrate the problem here, and propose a scale-invariant methodology for…
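The scale sensitivity described above can be seen in a minimal sketch. The two-feature data set below (account age and number of purchases) is hypothetical, and the nearest-point rule under Euclidean distance stands in for the assignment step of a distance-based clustering algorithm; it is not the scale-invariant methodology the article proposes.

```python
import math


def nearest(query, points):
    """Return the index of the point closest to query (Euclidean distance)."""
    return min(range(len(points)), key=lambda i: math.dist(query, points[i]))


def to_years(point):
    """Convert the first coordinate from days to years; leave the second as-is."""
    return (point[0] / 365.0, point[1])


# Hypothetical observations: (account age in days, number of purchases).
points_days = [(400.0, 2.0), (30.0, 9.0)]
query_days = (60.0, 3.0)

# Same observations with the first variable expressed in years.
points_years = [to_years(p) for p in points_days]
query_years = to_years(query_days)

idx_days = nearest(query_days, points_days)
idx_years = nearest(query_years, points_years)

print("nearest point with days as unit: ", idx_days)
print("nearest point with years as unit:", idx_years)
```

With days as the unit, the first variable dominates the distance and the query is matched to the second point; with years, the purchase count dominates and the query is matched to the first point. Nothing about the data changed except the unit of measurement.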

Continue

Added by Vincent Granville on June 9, 2018 at 8:02pm — No Comments

Is Model Bias a Threat to Equal and Fair Treatment? Maybe, Maybe Not.

New article by Bill Vorhies.

Summary: There is a great hue and cry about the danger of bias in our predictive models when applied to high-significance events like who gets a loan, insurance, a good school assignment, or bail. It’s not as simple as it seems, and here we try to take a more nuanced look. The result is not as threatening as many headlines make it seem.…

Continue

Added by Vincent Granville on June 7, 2018 at 5:33pm — No Comments

Great Sunday Reading

Here is our selection of featured articles and resources posted in the last few days.

Featured Resources and Technical Contributions

Continue

Added by Vincent Granville on June 3, 2018 at 4:30pm — No Comments

12 Interesting Reads for Math Geeks

Many data scientists have a passion for mathematics, and many modern math problems can be explored using data science. Below is a selection of interesting articles, many about challenging, deep mathematical problems, by a data scientist who developed math-free algorithms. Some of these articles cover statistical theory and thus belong to…

Continue

Added by Vincent Granville on May 28, 2018 at 11:30am — No Comments

Mathematical Olympiads for Undergrad Students

Mathematical Olympiads are popular among high school students. However, there is nothing similar for college students, except maybe IMC. Even IMC is not popular. It focuses mostly on the same kind of problems as high school Olympiads, and you cannot participate if you are over 23 years old. In addition, it is organized by country, as opposed to globally, thus favoring countries with a large population. Topics such as…

Continue

Added by Vincent Granville on May 25, 2018 at 9:00am — No Comments

The First Things you Should Learn as a Data Scientist - Not what you Think

The list below is a (non-comprehensive) selection of what I believe should be taught first in data science classes, based on 30 years of business experience. This is a follow-up to my article Why logistic regression should be taught last.

I am not sure whether these topics below are even discussed in data camps or college…

Continue

Added by Vincent Granville on May 24, 2018 at 9:15pm — No Comments

Great Saturday Reading

Here is our selection of featured articles and resources posted in the last few days.

Continue

Added by Vincent Granville on May 19, 2018 at 3:22pm — No Comments

5 Tips on How to Write a Data Analysis Plan

With a data analysis plan, you know what you’re going to do when you actually sit down to do the analysis of the data you’ve gathered. It’s a vitally important thing for you to have, as it will guide how you’re going to collect your data. After all, it’s very difficult to add in new variables afterward.

For that reason, you want to make sure you’ve created your plan beforehand so that you can be sure that you’re asking all the questions you need to and you know what you’re going to…

Continue

Added by Vincent Granville on May 13, 2018 at 5:30pm — No Comments

Great Sunday Reading

Here is our selection of featured articles and resources posted in the last few days:

Continue

Added by Vincent Granville on May 13, 2018 at 3:00pm — No Comments

Selection of Great Data Science Articles still Worth Reading

These articles are between 3 and 5 years old, but are still valuable today. The methodology used in these articles is modern and still state-of-the-art. Some discuss immense data sets that are still available to the public and that led to the design of new machine learning techniques to handle them.

I am in the process of organizing these articles (written by myself) to eventually self-publish data science tutorials, in a few separate booklets, that are easy to understand for the…

Continue

Added by Vincent Granville on May 12, 2018 at 8:30pm — No Comments

Deep Dive into Polynomial Regression and Overfitting

In this article, we show that the issue with polynomial regression is not over-fitting, but numerical precision. Even when the regression is done right, numerical precision remains an insurmountable challenge. We focus here on step-wise polynomial regression, which is supposed to be more stable than the traditional model. In step-wise regression, we estimate one coefficient at a time, using the classic least squares technique. …
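One possible reading of "one coefficient at a time" is sketched below: each coefficient is obtained by a one-variable least squares fit of the current residuals against a single power of x. This is an assumption made for illustration, not necessarily the article's exact procedure; the function name `stepwise_poly_fit` and the test data are hypothetical.

```python
def stepwise_poly_fit(xs, ys, degree):
    """Estimate a_0..a_degree one at a time on the running residuals.

    Assumed procedure (not confirmed by the article): at step k, fit the
    residuals against x**k alone by least squares, then subtract the fit.
    """
    residuals = list(ys)
    coeffs = []
    sse_history = [sum(r * r for r in residuals)]
    for k in range(degree + 1):
        basis = [x ** k for x in xs]
        denom = sum(b * b for b in basis)
        # One-variable least squares: a_k = <residuals, basis> / <basis, basis>
        a_k = sum(r * b for r, b in zip(residuals, basis)) / denom if denom else 0.0
        residuals = [r - a_k * b for r, b in zip(residuals, basis)]
        coeffs.append(a_k)
        sse_history.append(sum(r * r for r in residuals))
    return coeffs, sse_history


# Hypothetical data generated from a cubic on [-1, 1].
xs = [i / 10 for i in range(-10, 11)]
ys = [1.0 + 2.0 * x - 0.5 * x ** 3 for x in xs]

coeffs, sse_history = stepwise_poly_fit(xs, ys, degree=3)
print("coefficients:", coeffs)
print("sum of squared residuals after each step:", sse_history)
```

Each step can only lower (or leave unchanged) the sum of squared residuals, but because the powers of x are not orthogonal to each other, a single pass does not in general reproduce the full least squares solution. The sums of high powers in `denom` are also where floating-point precision starts to matter as the degree grows.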

Continue

Added by Vincent Granville on May 9, 2018 at 9:30pm — No Comments

AI with Pyramids of Self Programmable Gates

Guest blog post by David Enríquez Arriano. For more information, or to get higher-resolution pictures, contact the author (see the contact information at the bottom of this article).

Introduction

This is a different approach to solving the AI problem. It is a cognitive mathematics based on pyramids built from logic gates that program themselves through learning.

A Boolean polynomial associated with a given truth table can be implemented with electronic…

Continue

Added by Vincent Granville on May 8, 2018 at 4:07pm — No Comments

Great Saturday Reading

Here is our selection of featured resources and articles published in the last few days. Enjoy the reading!

Resources

Continue

Added by Vincent Granville on May 5, 2018 at 9:57am — No Comments

Temporal Convolutional Nets Take Over from RNNs for NLP Predictions

Summary: Our starting assumption that sequence problems (language, speech, and others) are the natural domain of RNNs is being challenged. Temporal Convolutional Nets (TCNs), which are our workhorse CNNs with a few new features, are outperforming RNNs on major applications today. It looks like RNNs may well be history.

 …

Continue

Added by Vincent Granville on May 2, 2018 at 7:59am — No Comments

© 2018 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC