Vincent Granville's Blog (780)

40+ Modern Tutorials Covering All Aspects of Machine Learning

This list of lists contains books, notebooks, presentations, cheat sheets, and tutorials covering all aspects of data science, machine learning, deep learning, statistics, math, and more, with most documents featuring Python or R code and numerous illustrations or case studies. All this material is available for free, and consists of content mostly created in 2019 and 2018, by various top experts in their respective fields. A few of these documents are available on LinkedIn: see last…

Added by Vincent Granville on October 13, 2019 at 11:00am — No Comments

Surprising Uses of Synthetic Random Data Sets

I have used synthetic data sets many times for simulation purposes, most recently in my articles Six Degrees of Separation Between Any Two Data Sets and How to Lie with p-values. Many…

Added by Vincent Granville on October 2, 2019 at 5:00pm — No Comments

Six Degrees of Separation Between Any Two Data Sets

This is an interesting data science conjecture, inspired by the well-known six degrees of separation problem, which states that there is a link involving no more than 6 connections between any two people on Earth, for instance between you and anyone living in North Korea.

Here the link is between any two univariate data sets…

Added by Vincent Granville on September 9, 2019 at 10:30am — No Comments

Two New Deep Conjectures in Probabilistic Number Theory

The material discussed here is also of interest to machine learning, AI, big data, and data science practitioners, as much of the work is based on heavy data processing, algorithms, efficient coding, testing, and experimentation. Also, it's not just two new conjectures, but paths and suggestions to solve these problems. The last section contains a few new, original exercises, some with solutions, and may be useful to students, researchers, and instructors offering math and statistics classes…

Added by Vincent Granville on September 8, 2019 at 4:09am — No Comments

10 Machine Learning Methods that Every Data Scientist Should Know

Machine learning is a hot topic in research and industry, with new methodologies developed all the time. The speed and complexity of the field make keeping up with new techniques difficult even for experts, and potentially overwhelming for beginners.

To demystify machine learning and to offer a learning path for those who are new to the core…

Added by Vincent Granville on August 30, 2019 at 11:08am — No Comments

A Strange Family of Statistical Distributions

I introduce here a family of very peculiar statistical distributions governed by two parameters: p, a real number in [0, 1], and b, an integer > 1. 

Potential applications are found in cryptography, Fintech (stock market modeling), Bitcoin, number theory, random number…

Added by Vincent Granville on August 30, 2019 at 10:11am — No Comments

Extreme Events Modeling Using Continued Fractions

Continued fractions are usually considered a beautiful, curious mathematical topic, with applications that are mostly theoretical and limited to math and number theory. Here we show how they can be used in applied business and economics contexts, leveraging the mathematical theory developed for continued fractions, to model and explain natural phenomena. …

Added by Vincent Granville on August 30, 2019 at 9:42am — No Comments

Comparing Model Evaluation Techniques

In my previous posts, I compared model evaluation techniques using Statistical Tools & Tests and commonly used Classification and Clustering evaluation techniques.

In this post, I'll take a look at how you can compare regression models. Comparing regression models is perhaps one of the trickiest tasks to complete in the "comparing models" arena. The reason is that there are literally dozens of statistics you can calculate to compare regression models, including:

1.…

Added by Vincent Granville on August 8, 2019 at 10:37am — No Comments

Elegant Representation of Forward and Back Propagation in Neural Networks

Sometimes you see a diagram and it gives you an ‘aha’ moment. Here is one representing forward propagation and back propagation in a neural network:

A brief explanation is:

  • Using the input variables x and y, the forward pass (left half of the figure) calculates the output z as a function of x and y, i.e. z = f(x, y)
  • The right side…
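
As a rough illustration of the two passes described above, here is a minimal sketch for a single node. The choice f(x, y) = x * y and the names `forward`, `backward`, and `dL_dz` are assumptions made for this example; the original diagram does not specify them.

```python
# Hypothetical single node z = f(x, y) = x * y.
# The function f and all names below are illustrative assumptions.

def forward(x, y):
    """Forward pass: compute the output z from the inputs x and y."""
    return x * y


def backward(x, y, dL_dz):
    """Backward pass: use the chain rule to turn the upstream gradient
    dL/dz into gradients with respect to each input."""
    dz_dx = y  # local gradient of z = x * y with respect to x
    dz_dy = x  # local gradient of z = x * y with respect to y
    return dL_dz * dz_dx, dL_dz * dz_dy


x, y = 3.0, -2.0
z = forward(x, y)
dL_dx, dL_dy = backward(x, y, dL_dz=1.0)
print(z, dL_dx, dL_dy)
```

With a different f, only the two local gradients change; the chain-rule multiplication by the upstream gradient stays the same.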

Added by Vincent Granville on August 8, 2019 at 10:29am — No Comments

Decision Tree vs Random Forest vs Gradient Boosting Machines

Decision Trees, Random Forests and Boosting are among the top 16 data science and machine learning tools used by data scientists. The three methods are similar, with a significant amount of overlap. In a nutshell:

  • A decision tree is a simple decision-making diagram.
  • Random forests are a large number of trees, combined (using averages or "majority rules") at the end of the process.
  • Gradient boosting machines also combine decision trees, but start the combining…
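
The "combined at the end" step for random forests can be sketched in a few lines. This is only an illustration of majority voting and averaging: the per-tree outputs are made-up values, the function names are hypothetical, and no trees are actually trained here.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Classification: return the label predicted by the most trees."""
    return Counter(labels).most_common(1)[0][0]


def average(predictions):
    """Regression: return the mean of the per-tree predictions."""
    return mean(predictions)


# Made-up outputs of individual trees, for illustration only.
tree_votes = ["spam", "ham", "spam", "spam", "ham"]
tree_estimates = [2.0, 3.0, 4.0]

print(majority_vote(tree_votes))  # spam
print(average(tree_estimates))    # 3.0
```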

Added by Vincent Granville on August 8, 2019 at 10:25am — No Comments

How the Mathematics of Fractals Can Help Predict Stock Markets Shifts

In financial markets, two of the most common trading strategies used by investors are the momentum and mean reversion strategies. If a stock exhibits momentum (or trending behavior as shown in the figure below), its price in the current period is more likely to increase (decrease) if it has already increased (decreased) in the previous period.

When the return of a stock at time t depends in some way on the return at the previous time t-1, the returns are said to be autocorrelated. In…
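
One common way to quantify this dependence is the lag-1 sample autocorrelation of the return series. The sketch below is a plain-Python illustration under that assumption; the function name and the two toy return series are made up for the example and are not taken from the article.

```python
def lag1_autocorrelation(returns):
    """Sample autocorrelation between the return at time t and at time t-1."""
    n = len(returns)
    m = sum(returns) / n
    # Covariance-like sum over consecutive pairs (t-1, t).
    num = sum((returns[t] - m) * (returns[t - 1] - m) for t in range(1, n))
    # Sum of squared deviations over the whole series.
    den = sum((r - m) ** 2 for r in returns)
    return num / den


# Toy series (illustrative only): runs of same-sign returns vs. alternating signs.
trending = [1, 1, 1, -1, -1, -1]
alternating = [1, -1, 1, -1, 1, -1]

print(lag1_autocorrelation(trending))     # 0.5
print(lag1_autocorrelation(alternating))  # negative
```

A positive value is what trending (momentum) behavior looks like in this measure, while a negative value is what one would expect from returns that tend to reverse from one period to the next.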

Added by Vincent Granville on July 8, 2019 at 10:25am — No Comments

Where’s the Love – Trends in Data Science Career Opportunities

Summary: The annual Burtch Works salary survey tells us a lot about which industries are using the most data scientists and the difference between higher and lower skilled data scientists. Salary increases show us whether demand is increasing, and finally we take a shot at determining which skills are most in demand.

What a difference a few years can make. We used to say that everyone loves a data scientist – and wants to be one. …

Added by Vincent Granville on July 8, 2019 at 10:18am — No Comments

How to learn the maths of Data Science using your high school maths knowledge

By Ajit Jaokar. This post is a part of my forthcoming book on Mathematical foundations of Data Science. In this post, we use the Perceptron algorithm to bridge the gap between high school maths and deep learning. 

Background

As part of my role as course director of the Artificial Intelligence: Cloud and Edge Computing course at the University of Oxford, I see more students who are familiar with programming than with mathematics.

They have last learnt maths…

Added by Vincent Granville on June 27, 2019 at 12:22pm — No Comments

Machine Learning and Data Science Cheat Sheet

Originally published in 2014 and viewed more than 200,000 times, this is the oldest data science cheat sheet - the mother of all the numerous cheat sheets that are so popular nowadays. I decided to update it in June 2019. While the first half, dealing with installing components on your laptop and learning UNIX, regular expressions, and file management hasn't changed much, the second half, dealing with machine learning, was rewritten entirely from scratch. It is amazing how things changed in…

Added by Vincent Granville on June 6, 2019 at 8:27pm — No Comments

7 Simple Tricks to Handle Complex Machine Learning Issues

We propose simple solutions to important problems that all data scientists face almost every day. In short, a toolbox for the handyman, useful to busy professionals in any field.

1. Eliminating sample size effects

Many statistics, such as correlations or R-squared, depend on the sample size, making it difficult to…

Added by Vincent Granville on June 4, 2019 at 12:00pm — No Comments

Gentle Approach to Linear Algebra, with Machine Learning Applications

This simple introduction to matrix theory offers a refreshing perspective on the subject. Using a basic concept that leads to a simple formula for the power of a matrix, we see how it can solve time series, Markov chains, linear regression, data reduction, principal components analysis (PCA) and other machine learning problems. These problems are usually solved with more advanced matrix calculus, including eigenvalues, diagonalization, generalized inverse matrices, and other types of…

Added by Vincent Granville on May 28, 2019 at 9:00pm — No Comments

New Book: Classification and Regression In a Weekend (in Python)

We have added a new free book in our selection exclusively for DSC members. See the first entry below, to get started with machine learning with Python.

1. Book: Classification and Regression In a Weekend

This tutorial began as a series of weekend workshops created by Ajit Jaokar and Dan Howarth. The idea was to work with a specific (longish) program such that we explore as much of it as possible in one weekend. This book is an attempt to take this idea online.…

Added by Vincent Granville on May 16, 2019 at 6:24pm — No Comments

Confidence Intervals Without Pain, with Excel

We propose a simple model-free solution to compute any confidence interval and to extrapolate these intervals beyond the observations available in your data set. In addition, we propose a mechanism to sharpen the confidence intervals, to reduce their width by an order of magnitude. The methodology works with any estimator (mean, median, variance, quantile, correlation and so on) even when the data set violates the classical requirements necessary to make traditional statistical techniques…

Added by Vincent Granville on May 9, 2019 at 11:30am — No Comments

Re-sampling: Amazing Results and Applications

This crash course features a new fundamental statistics theorem -- even more important than the central limit theorem -- and a new set of statistical rules and recipes. We discuss concepts related to determining the optimum sample size, the optimum k in k-fold cross-validation, bootstrapping, new re-sampling techniques, simulations, tests of hypotheses, confidence intervals, and statistical inference using a unified, robust, simple…

Added by Vincent Granville on May 4, 2019 at 12:30pm — No Comments

Some Fun with Gentle Chaos, the Golden Ratio, and Stochastic Number Theory

So many fascinating and deep results have been written about the number (1 + SQRT(5)) / 2 and its related sequence - the Fibonacci numbers - that it would take years to read all of them. This number has been studied both for its applications (population growth, architecture) and its mathematical properties, for over 2,000 years. It is still a topic of active research.…
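
The link between this number and the Fibonacci sequence can be checked numerically: the ratio of consecutive Fibonacci numbers approaches (1 + SQRT(5)) / 2. The short sketch below shows this; the names `PHI` and `fibonacci_ratios` are chosen for the example only.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio


def fibonacci_ratios(n):
    """Return the first n ratios F(k+1) / F(k) of consecutive Fibonacci numbers."""
    a, b = 1, 1
    ratios = []
    for _ in range(n):
        a, b = b, a + b
        ratios.append(b / a)
    return ratios


ratios = fibonacci_ratios(30)
print(ratios[:5])              # [2.0, 1.5, 1.6666666666666667, 1.6, 1.625]
print(abs(ratios[-1] - PHI))   # very small: the ratios converge to PHI
```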

Added by Vincent Granville on April 25, 2019 at 7:30am — No Comments

© 2019 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC