A Data Science Central Community

Below is my personal list of statistical and machine learning methods that every data scientist should know in 2016.

1. **Statistical Hypothesis Testing (t-test, chi-squared test & ANOVA)**
2. **Multiple Regression (Linear Models)**
3. **Generalized Linear Models (GLM: Logistic Regression, Poisson Regression)**
4. **Random Forest**
5. **XGBoost (eXtreme Gradient Boosting)**
6. **Deep Learning**
7. **Bayesian Modeling with MCMC**
8. **word2vec**
9. **K-means Clustering**
10. **Graph Theory & Network Analysis**
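To show the kind of small, hands-on experiment that makes these methods easy to try, here is a minimal sketch of K-means clustering (Lloyd's algorithm) in plain Python. This is not the author's script. The function name `kmeans`, the choice to initialize centroids from randomly sampled data points, and the six-point toy dataset are all illustrative assumptions. For real work you would normally use a library implementation such as scikit-learn's `KMeans`.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Hypothetical helper: Lloyd's algorithm for k-means on a list of tuples."""
    rng = random.Random(seed)
    # Assumption: initialize centroids with k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: centroids (and hence assignments) no longer change
        centroids = new_centroids
    # Final labels with respect to the converged centroids.
    labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
    return centroids, labels


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Because Lloyd's algorithm only finds a local optimum, library implementations typically run several random restarts and keep the best result. This sketch runs only once, which is enough for clearly separated toy data.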

- **(A1) Latent Dirichlet Allocation & Topic Modeling**
- **(A2) Factorization (SVD, NMF)**
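For the factorization item (A2), here is a minimal sketch of the idea behind truncated SVD. It finds the leading singular value and the matching singular vectors of a small matrix by power iteration on A^T A, using plain Python. This is an illustrative assumption and not the author's script. `top_singular_triplet` is a hypothetical helper. In practice you would call a library routine such as `numpy.linalg.svd`, or scikit-learn's `TruncatedSVD` or `NMF`.

```python
import math


def top_singular_triplet(A, n_iter=200):
    """Hypothetical helper: leading singular value/vectors of A (list of rows).

    Uses power iteration on A^T A; assumes A is not the zero matrix.
    """
    m, n = len(A), len(A[0])
    v = [1.0] * n  # arbitrary non-zero starting vector
    for _ in range(n_iter):
        # u = A v
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        # v = A^T u, then renormalize to unit length
        v = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # sigma = ||A v|| and u = A v / sigma for the converged unit vector v
    u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v


# Hypothetical rank-1 example: A = [1, 2, 3]^T times [1, 2].
A = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
sigma, u, v = top_singular_triplet(A)
# sigma * u[i] * v[j] is the best rank-1 approximation of A[i][j].
print(sigma)
```

Power iteration converges only when the largest singular value is clearly separated from the second one. To get further components, you would deflate A (subtract sigma * u v^T) and repeat, which is why library routines are preferred beyond toy examples.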

Based on my four years of experience in the data science industry, I think these 12 methods are currently the most popular and useful ones, and they suit a wide variety of problems that call for data science.

As far as I know, quite a few lists of "representative methods in data science" have been published. However, I feel some of them are already out of date, because they appear to neglect recent advances in data science as practiced in industry. So I compiled this list as a business practitioner who knows the practical problems in industry and how to solve them with data science, including statistics and machine learning.

In addition to the list itself, I wrote an R or Python script for each method that runs an experiment on a sample dataset, so that readers can easily try it themselves.

The original post is **here**, including R or Python scripts and experiments on sample datasets.
