January 2017 Blog Posts (9)

How to Handle Outliers in Regression Problems

New featured content for data scientists:

Data Science in Python: Pandas Cheat Sheet -- This cheat sheet, along with explanations, was first published on DataCamp. Click on the picture to zoom in. To view other cheat sheets (Python, R, Machine Learning, Probability, Visualizations, Deep Learning, Data Science, and so on) click here. To read the article,…


Added by Vincent Granville on January 31, 2017 at 10:30pm — No Comments

Tutorial: Neutralizing Outliers in Any Dimension

The main focus of this article is on computing the point that minimizes the sum of the "distances" to n points in a d-dimensional space, called the centroid or center, in the presence of outliers.
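To make the idea of a point minimizing the sum of distances concrete, here is a minimal sketch. It assumes plain Euclidean distance and uses Weiszfeld's iteration, a standard fixed-point method for this problem; it is not necessarily the approach taken in the article. The names `geometric_median` and `mean_point` are hypothetical helpers chosen for illustration.

```python
import math

def mean_point(points):
    """Coordinate-wise average, which a single outlier can drag far away."""
    d = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(d)]

def geometric_median(points, iterations=200, eps=1e-9):
    """Approximate the point minimizing the sum of Euclidean distances
    to `points`, using Weiszfeld's iteration (an assumption of this sketch)."""
    d = len(points[0])
    current = mean_point(points)  # starting guess
    for _ in range(iterations):
        # Each point is weighted by the inverse of its distance to the
        # current estimate, so far-away outliers get little influence.
        weights = [1.0 / max(math.dist(p, current), eps) for p in points]
        total = sum(weights)
        updated = [
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(d)
        ]
        if math.dist(updated, current) < eps:
            return updated
        current = updated
    return current

# Four corners of the unit square plus one distant outlier.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
mean = mean_point(pts)
median = geometric_median(pts)
print("mean:", mean)
print("sum-of-distances minimizer:", median)
```

In this toy data set the outlier pulls the mean to (20.4, 20.4), while the sum-of-distances minimizer stays inside the unit square, which is the robustness property the paragraph above refers to.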

This long article has several sections.


1. A related physics problem

2. Algorithm to find the centroid

  • Source code to generate points and compute centroid, using Monte…

Added by Vincent Granville on January 30, 2017 at 2:30pm — No Comments

46 SQL Job Interview Questions for Data Scientists

Here is our updated selection of featured articles and resources posted over the weekend:


Added by Vincent Granville on January 15, 2017 at 7:49pm — No Comments

In Japan, "Artificial Intelligence" comes to be a super star while "Data Scientist" is fading away

I published a post about the current status of "Data Scientist" in Japan, as a periodic follow-up to an analysis I started two years ago. The trend continues, but it has gone beyond what I anticipated at the time.

Indeed, the growth of "Artificial Intelligence" in Japan is steeper than in English, and "Data Scientist" is now getting to be…


Added by Takashi J. OZAKI on January 13, 2017 at 6:30am — 1 Comment

Ten Simple Rules for Effective Statistical Practice

This article, written by Kass RE, Caffo BS, Davidian M, Meng X-L, Yu B, and Reid N, contains the following rules:

  • Statistical Methods Should Enable Data to Answer Scientific Questions
  • Signals Always Come with Noise
  • Plan Ahead, Really Ahead
  • Worry about Data Quality
  • Statistical Analysis Is More Than a Set of Computations
  • Keep it Simple
  • Provide Assessments of Variability
  • Check Your Assumptions
  • When Possible,…

Added by Vincent Granville on January 10, 2017 at 11:16am — No Comments

How to build a search engine: Part 4

This post is the fourth part of the multi-part series on how to build a search engine –


Added by Vivek Kalyanarangan on January 10, 2017 at 1:00am — No Comments

7 Traps to Avoid Being Fooled by Statistical Randomness

Randomness is all around us. Its existence sends fear into the hearts of predictive analytics specialists everywhere -- if a process is truly random, then it is not predictable, in the analytic sense of that term.  Randomness refers to the absence of patterns, order, coherence, and predictability in a system. 

Unfortunately, we…


Added by Kirk Borne on January 9, 2017 at 6:00pm — 5 Comments

12 Statistical and Machine Learning Methods that Every Data Scientist Should Know

Below is my personal list of statistical and machine learning methods that every data scientist should know in 2016.

  1. Statistical Hypothesis Testing (t-test, chi-squared test & ANOVA)
  2. Multiple Regression (Linear Models)
  3. Generalized Linear Models (GLM: Logistic Regression, Poisson Regression)
  4. Random Forest
  5. Xgboost (eXtreme Gradient Boosted Trees)
  6. Deep Learning
  7. Bayesian Modeling with…

Added by Takashi J. OZAKI on January 8, 2017 at 6:30am — 1 Comment
