### 10 Machine Learning Methods that Every Data Scientist Should Know

Machine learning is a hot topic in research and industry, with new methodologies developed all the time. The speed and complexity of the field make keeping up with new techniques difficult even for experts, and potentially overwhelming for beginners.

To demystify machine learning and to offer a learning path for those who are new to the core…

Added by Vincent Granville on August 30, 2019 at 11:08am

### A Strange Family of Statistical Distributions

I introduce here a family of very peculiar statistical distributions governed by two parameters: p, a real number in [0, 1], and b, an integer > 1.

Potential applications are found in cryptography, Fintech (stock market modeling), Bitcoin, number theory, random number…

Added by Vincent Granville on August 30, 2019 at 10:11am

### Extreme Events Modeling Using Continued Fractions

Continued fractions are usually considered a beautiful, curious mathematical topic, with applications that are mostly theoretical and limited to math and number theory. Here we show how they can be used in applied business and economics contexts, leveraging the mathematical theory developed for continued fractions, to model and explain natural phenomena. …

Added by Vincent Granville on August 30, 2019 at 9:42am

### Comparing Model Evaluation Techniques

In my previous posts, I compared model evaluation techniques using Statistical Tools & Tests and commonly used Classification and Clustering evaluation techniques.

In this post, I'll take a look at how you can compare regression models. Comparing regression models is perhaps one of the trickiest tasks in the "comparing models" arena. The reason is that there are dozens of statistics you can calculate to compare regression models, including:

1.…

Added by Vincent Granville on August 8, 2019 at 10:37am

### Elegant Representation of Forward and Back Propagation in Neural Networks

Sometimes you see a diagram and it gives you an 'aha' moment. Here is one representing forward propagation and back propagation in a neural network:

A brief explanation is:

• Using the input variables x and y, the forward pass (left half of the figure) calculates the output z as a function of x and y, i.e. f(x,y)
• The right side…

Added by Vincent Granville on August 8, 2019 at 10:29am
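
As a rough illustration of the forward pass described in this excerpt, here is a minimal Python sketch for a single node. The excerpt does not say what f is, so f(x, y) = x * y is an assumption made only for this example. The excerpt is also cut off before it describes the right half of the figure, so the `backward` function below is the usual chain-rule step and not necessarily what the diagram shows. The names `forward`, `backward` and `dL_dz` are hypothetical.

```python
# Assumption: f(x, y) = x * y. The excerpt does not specify f.

def forward(x, y):
    """Forward pass: compute z = f(x, y) and keep the inputs for later."""
    z = x * y
    cache = (x, y)
    return z, cache

def backward(dL_dz, cache):
    """Backward pass: use the chain rule to turn dL/dz into dL/dx and dL/dy."""
    x, y = cache
    dL_dx = dL_dz * y  # because dz/dx = y when z = x * y
    dL_dy = dL_dz * x  # because dz/dy = x when z = x * y
    return dL_dx, dL_dy

z, cache = forward(3.0, -2.0)
dL_dx, dL_dy = backward(1.0, cache)  # upstream gradient dL/dz = 1
print(z, dL_dx, dL_dy)  # prints: -6.0 -2.0 3.0
```

Caching the inputs during the forward pass is what lets the backward pass compute local derivatives without re-evaluating f.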

### Decision Tree vs Random Forest vs Gradient Boosting Machines

Decision Trees, Random Forests and Boosting are among the top 16 data science and machine learning tools used by data scientists. The three methods are similar, with a significant amount of overlap. In a nutshell:

• A decision tree is a simple decision-making diagram.
• Random forests are a large number of trees, combined (using averages or "majority rules") at the end of the process.
• Gradient boosting machines also combine decision trees, but start the combining…

Added by Vincent Granville on August 8, 2019 at 10:25am
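
To make the contrast in this excerpt concrete, here is a small pure-Python sketch on a toy regression problem. Everything in it is an assumption for illustration: the "tree" is a one-split regression stump, the forest averages stumps fitted on bootstrap samples (the regression counterpart of "majority rules"), and the boosting loop is the usual residual-fitting formulation for squared error, since the excerpt is cut off before it describes boosting. The data and the names `fit_stump`, `fit_forest` and `fit_boosting` are invented for this example.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate split thresholds
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # only one distinct x value: predict the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_forest(xs, ys, n_trees=25, seed=0):
    """Fit stumps independently on bootstrap samples, then average at the end."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(t(x) for t in trees) / len(trees)

def fit_boosting(xs, ys, n_trees=25, lr=0.5):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + lr * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + lr * sum(t(x) for t in trees)

# Toy data, invented for illustration: y = x^2 on ten points.
xs = list(range(10))
ys = [float(x * x) for x in xs]

def mse(model):
    """Mean squared error of a model on the toy training data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

tree = fit_stump(xs, ys)
forest = fit_forest(xs, ys)
boosted = fit_boosting(xs, ys)
for name, model in [("tree", tree), ("forest", forest), ("boosting", boosted)]:
    print(f"{name:9s} training MSE = {mse(model):.2f}")
```

The structural difference is in the loops: the forest's trees never see each other's output, while each boosted tree is fitted to what the previous ones left unexplained. Training error on ten points says nothing about generalization, so treat the printed numbers as a sanity check only.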

