
12 Statistical and Machine Learning Methods that Every Data Scientist Should Know

Below is my personal list of statistical and machine learning methods that every data scientist should know in 2016.

  1. Statistical Hypothesis Testing (t-test, chi-squared test & ANOVA)
  2. Multiple Regression (Linear Models)
  3. Generalized Linear Models (GLM: Logistic Regression, Poisson Regression)
  4. Random Forest
  5. XGBoost (eXtreme Gradient Boosting)
  6. Deep Learning
  7. Bayesian Modeling with MCMC
  8. word2vec
  9. K-means Clustering
  10. Graph Theory & Network Analysis
  • (A1) Latent Dirichlet Allocation & Topic Modeling
  • (A2) Factorization (SVD, NMF)
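As a small illustration of item 1, here is a minimal sketch that computes Pearson's chi-squared statistic for a contingency table in plain Python. The function name `chi_squared_statistic` and the 2x2 click counts are hypothetical, and converting the statistic into a p-value is left to a statistics library.

```python
# Minimal sketch: Pearson's chi-squared test of independence (statistic only).
# The function name and the toy counts below are hypothetical illustrations.

def chi_squared_statistic(table):
    """Return (chi2, dof) for a 2-D contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    # Degrees of freedom: (rows - 1) * (columns - 1)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Hypothetical toy counts: rows = variant A / B, columns = clicked / did not click
table = [[20, 30],
         [30, 20]]
chi2, dof = chi_squared_statistic(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}")  # chi2 = 4.00, dof = 1
```

With this toy table the statistic is 4.0 with one degree of freedom, just above the 5% critical value of about 3.84, so independence would be rejected at that level. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default (`correction=True`), so its statistic for the same table will be smaller.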

Based on my four years of experience in the data science industry, I think these 12 methods are currently the most popular and useful, and they suit a wide range of problems that call for data science.

As far as I know, quite a few lists of "representative methods in data science" have been published before. However, I feel that some of them are already out of date because they seem to overlook recent advances in data science in industry. I therefore compiled this list from the perspective of a business practitioner who knows the practical problems in industry and how data science, including statistics and machine learning, can solve them.

In addition to the list itself, I wrote an R or Python script for each method that runs an experiment on a sample dataset, so that readers can easily try it themselves.

The original post is here, including R or Python scripts and experiments on sample datasets.
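As a rough idea of what such an experiment can look like, the following is a minimal sketch of item 9, k-means clustering (Lloyd's algorithm), written in plain Python so it runs without extra packages. It is not the author's original script: the `kmeans` and `squared_distance` names, the first-k initialization, and the six-point toy dataset are hypothetical choices for illustration.

```python
# Minimal sketch of Lloyd's algorithm for k-means clustering in plain Python.
# Assumption: centroids are initialized with the first k points (for
# reproducibility); names and toy data are hypothetical.

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Cluster `points` (list of tuples) into k groups; return (centroids, labels)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy dataset with two obvious groups
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (0.5, 1.5), (9.0, 8.5)]
centroids, labels = kmeans(points, k=2)
print(centroids)  # [(1.0, 1.5), (8.5, 8.5)]
print(labels)     # [0, 1, 0, 1, 0, 1]
```

Lloyd's algorithm only converges to a local optimum that depends on the starting centroids, which is why library implementations such as scikit-learn's `KMeans` default to k-means++ initialization and expose an `n_init` parameter for multiple restarts.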


