# January 2017 Blog Posts (9)

### How to Handle Outliers in Regression Problems

Data Science in Python: Pandas Cheat Sheet -- This cheat sheet, along with explanations, was first published on DataCamp. Click on the picture to zoom in. To view other cheat sheets (Python, R, Machine Learning, Probability, Visualizations, Deep Learning, Data Science, and so on) click here. To read the article,…

January 31, 2017

### Tutorial: Neutralizing Outliers in Any Dimension

The main focus of this article is on computing the point that minimizes the sum of the "distances" to n points in a d-dimensional space, called centroid or center, in the presence of outliers.
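To make the idea concrete, here is a minimal sketch of one way to approximate such a point. It is not the article's own algorithm, which this excerpt does not show. It assumes plain Euclidean distance and uses Weiszfeld's iteration, a standard method for the geometric median; the function name `geometric_median` and its parameters `tol` and `max_iter` are hypothetical.

```python
import math

def geometric_median(points, tol=1e-9, max_iter=1000):
    """Approximate the point minimizing the sum of Euclidean distances to `points`.

    Assumption: Euclidean distance; the article's "distances" may be more general.
    """
    d = len(points[0])
    # Start from the coordinate-wise mean, which outliers can pull far away.
    current = [sum(p[k] for p in points) / len(points) for k in range(d)]
    for _ in range(max_iter):
        weight_sum = 0.0
        weighted = [0.0] * d
        for p in points:
            dist = math.dist(p, current)
            if dist < 1e-12:
                # Simplification: skip a point that coincides with the estimate
                # to avoid dividing by zero.
                continue
            w = 1.0 / dist  # nearby points weigh more than distant ones
            weight_sum += w
            for k in range(d):
                weighted[k] += w * p[k]
        if weight_sum == 0.0:
            return current
        updated = [c / weight_sum for c in weighted]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current

def total_distance(center, points):
    """Sum of Euclidean distances from `center` to every point."""
    return sum(math.dist(center, p) for p in points)

# Four points on a unit square plus one far outlier.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
mean = [sum(p[k] for p in pts) / len(pts) for k in range(2)]
center = geometric_median(pts)
print("mean:", mean)
print("geometric median:", center)
```

In this toy data set the mean is dragged toward the outlier, while the estimate returned by the sketch stays inside the unit square.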

This long article has several sections.

Content

1. A related physics problem

2. Algorithm to find the centroid

• Source code to generate points and compute centroid, using Monte…

January 30, 2017

### 46 SQL Job Interview Questions for Data Scientists

Here is our updated selection of featured articles and resources posted over the weekend:

January 15, 2017

### In Japan, "Artificial Intelligence" comes to be a super star while "Data Scientist" is fading away

I published a post on the current status of "Data Scientist" in Japan, a periodic follow-up to an analysis I began two years ago. The trend continues, but it has gone beyond what I anticipated at the time.

Indeed, the growth of "Artificial Intelligence" in Japan is steeper than in English, and "Data Scientist" is now getting to be…

Takashi J. OZAKI on January 13, 2017

### Ten Simple Rules for Effective Statistical Practice

This article, written by Kass RE, Caffo BS, Davidian M, Meng X-L, Yu B, and Reid N, contains the following rules:

• Statistical Methods Should Enable Data to Answer Scientific Questions
• Signals Always Come with Noise
• Statistical Analysis Is More Than a Set of Computations
• Keep it Simple
• Provide Assessments of Variability
• When Possible,…

January 10, 2017

### How to build a search engine: Part 4

This post is the fourth part of the multi-part series on how to build a search engine.

Vivek Kalyanarangan on January 10, 2017

### 7 Traps to Avoid Being Fooled by Statistical Randomness

Randomness is all around us. Its existence sends fear into the hearts of predictive analytics specialists everywhere -- if a process is truly random, then it is not predictable, in the analytic sense of that term. Randomness refers to the absence of patterns, order, coherence, and predictability in a system.

Unfortunately, we…

Kirk Borne on January 9, 2017

### 12 Statistical and Machine Learning Methods that Every Data Scientist Should Know

Below is my personal list of statistical and machine learning methods that every data scientist should know in 2016.

1. Statistical Hypothesis Testing (t-test, chi-squared test & ANOVA)
2. Multiple Regression (Linear Models)
3. Generalized Linear Models (GLM: Logistic Regression, Poisson Regression)
4. Random Forest
5. XGBoost (eXtreme Gradient Boosted Trees)
6. Deep Learning
7. Bayesian Modeling with…

Takashi J. OZAKI on January 8, 2017

### Data science jobs not requiring human interactions

Mirko Krivanek on January 6, 2017

