A Data Science Central Community

In this article, we explore a new class of generalized univariate normal distributions that satisfy useful statistical properties. The class is defined by its characteristic function, and applications are discussed in the last section. These distributions are semi-stable (we define what this means below). In short, this is a much wider class than the stable distributions (the only stable distribution with a finite variance…

Added by Vincent Granville on November 27, 2019 at 11:14pm

Machine learning is a hot topic in research and industry, with new methodologies developed all the time. The speed and complexity of the field make keeping up with new techniques difficult even for experts, and potentially overwhelming for beginners.

To demystify machine learning and to offer a learning path for those who are new to the core…

Added by Vincent Granville on November 27, 2019 at 10:58am

*This article is by Jorge Castañón, Ph.D., Senior Data Scientist at the IBM Machine Learning Hub.*

Data visualization plays two key roles:

1. *Communicating results clearly to a general audience.*

2. …

Added by Vincent Granville on November 12, 2019 at 10:00am

Some original and very interesting material is presented here, with possible applications in Fintech. No need for a PhD in math to understand this article: I tried to make the presentation as simple as possible, focusing on high-level results rather than technicalities. Yet, professional statisticians and mathematicians, even academic researchers, will find some deep and fascinating results worth further exploring.…

Added by Vincent Granville on October 26, 2019 at 6:00pm

By Bill Vorhies.

***Summary:** Here's a proposal for real ‘zero touch’, ‘set-em-and-forget-em’ machine learning from the researchers at Amazon. If you have an environment as fast changing as e-retail and a huge number of models matching buyers and products, you could achieve real cost savings and revenue increases by making the refresh cycle faster and more accurate with automation. This capability likely will be coming soon to your favorite AML…*

Added by Vincent Granville on October 22, 2019 at 2:30pm

This list of lists contains books, notebooks, presentations, cheat sheets, and tutorials covering all aspects of data science, machine learning, deep learning, statistics, math, and more, with most documents featuring Python or R code and numerous illustrations or case studies. All this material is available for free, and consists of content mostly created in 2019 and 2018, by various top experts in their respective fields. A few of these documents are available on LinkedIn: see last…

Added by Vincent Granville on October 13, 2019 at 11:00am

I have used synthetic data sets many times for simulation purposes, most recently in my articles Six degrees of Separations between any two Datasets and How to Lie with p-values. Many…

Added by Vincent Granville on October 2, 2019 at 5:00pm

This is an interesting data science conjecture, inspired by the well-known six degrees of separation problem, which states that there is a link involving no more than 6 connections between any two people on Earth, for instance between you and anyone living in North Korea.

Here the link is between any two univariate data sets…

Added by Vincent Granville on September 9, 2019 at 10:30am

The material discussed here is also of interest to machine learning, AI, big data, and data science practitioners, as much of the work is based on heavy data processing, algorithms, efficient coding, testing, and experimentation. Also, it's not just two new conjectures, but paths and suggestions to solve these problems. The last section contains a few new, original exercises, some with solutions, and may be useful to students, researchers, and instructors offering math and statistics classes…

Added by Vincent Granville on September 8, 2019 at 4:09am

Machine learning is a hot topic in research and industry, with new methodologies developed all the time. The speed and complexity of the field make keeping up with new techniques difficult even for experts, and potentially overwhelming for beginners.

To demystify machine learning and to offer a learning path for those who are new to the core…

Added by Vincent Granville on August 30, 2019 at 11:08am

I introduce here a family of very peculiar statistical distributions governed by two parameters: *p*, a real number in [0, 1], and *b*, an integer > 1.

Potential applications are found in cryptography, Fintech (stock market modeling), Bitcoin, number theory, random number…

Added by Vincent Granville on August 30, 2019 at 10:11am

Continued fractions are usually considered a beautiful, curious mathematical topic, but one whose applications are mostly theoretical and limited to math and number theory. Here we show how they can be used in applied business and economics contexts, leveraging the mathematical theory developed for continued fractions, to model and explain natural phenomena. …

Added by Vincent Granville on August 30, 2019 at 9:42am

In my previous posts, I compared model evaluation techniques using Statistical Tools & Tests and commonly used Classification and Clustering evaluation techniques.

In this post, I'll take a look at how you can compare regression models. Comparing regression models is perhaps one of the trickiest tasks to complete in the "comparing models" arena. The reason is that there are literally dozens of statistics you can calculate to compare regression models, including:

**1.…**

Added by Vincent Granville on August 8, 2019 at 10:37am

Sometimes, you see a diagram and it gives you an ‘aha’ moment. Here is one representing forward propagation and back propagation in a neural network:

A brief explanation is:

- Using the input variables x and y, the forward pass (left half of the figure) calculates the output z as a function of x and y, i.e. z = f(x, y)
- The right side…
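The two passes can be sketched in a few lines of Python. This is only an illustration: the function f(x, y) = x * y and the names `forward`, `backward`, and `dL_dz` are assumptions chosen for the example, not taken from the diagram. The backward pass shown here is the standard chain-rule step, in which a gradient arriving from later layers is multiplied by the local derivatives of z.

```python
# Illustrative sketch only: f(x, y) = x * y is an assumed example function,
# not necessarily the one shown in the original diagram.

def forward(x, y):
    """Forward pass: compute the output z = f(x, y)."""
    return x * y


def backward(x, y, dL_dz):
    """Backward pass: apply the chain rule to turn the upstream
    gradient dL/dz into gradients with respect to the inputs."""
    dL_dx = dL_dz * y  # local derivative dz/dx = y
    dL_dy = dL_dz * x  # local derivative dz/dy = x
    return dL_dx, dL_dy


z = forward(3.0, -2.0)
grad_x, grad_y = backward(3.0, -2.0, dL_dz=1.0)
print(z, grad_x, grad_y)
```

With an upstream gradient of 1.0, the returned values are simply the partial derivatives of z with respect to x and y.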

Added by Vincent Granville on August 8, 2019 at 10:29am

Decision Trees, Random Forests and Boosting are among the top 16 data science and machine learning tools used by data scientists. The three methods are similar, with a significant amount of overlap. In a nutshell:

- A decision tree is a simple decision-making diagram.
- Random forests are a large number of trees, combined (using averages or "majority rules") at the end of the process.
- Gradient boosting machines also combine decision trees, but start the combining…
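As a rough illustration of the "combined at the end" step for random forests, the sketch below aggregates the outputs of several already-trained trees. The helper names `majority_vote` and `average_prediction` and the sample predictions are hypothetical, and no tree training is shown.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Classification: return the label predicted by the most trees.
    Ties are resolved in favor of the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def average_prediction(values):
    """Regression: return the mean of the per-tree predictions."""
    return mean(values)


# Hypothetical outputs from three trees for a single observation.
tree_labels = ["spam", "ham", "spam"]
tree_values = [2.0, 3.0, 4.0]

print(majority_vote(tree_labels))       # spam
print(average_prediction(tree_values))  # 3.0
```

Each tree contributes one prediction, and the forest only aggregates them once all trees have been evaluated.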

Added by Vincent Granville on August 8, 2019 at 10:25am

In financial markets, two of the most common trading strategies used by investors are the momentum and mean reversion strategies. If a stock exhibits momentum (or trending behavior as shown in the figure below), its price in the current period is more likely to increase (decrease) if it has already increased (decreased) in the previous period.

When the return of a stock at time t depends in some way on the return at the previous time t-1, the returns are said to be autocorrelated. In…
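One way to quantify this dependence is the lag-1 sample autocorrelation of the return series. The sketch below is a minimal plain-Python version; the function name `lag1_autocorrelation` and the two toy return series are assumptions made for illustration, not data from the article. Under the usual reading, a clearly positive value is consistent with momentum and a clearly negative value with mean reversion.

```python
def lag1_autocorrelation(returns):
    """Sample autocorrelation between returns at time t and time t-1."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two returns")
    mu = sum(returns) / n
    denom = sum((r - mu) ** 2 for r in returns)
    if denom == 0:
        raise ValueError("returns have zero variance")
    # Sum of products of consecutive deviations from the mean.
    num = sum((returns[t] - mu) * (returns[t - 1] - mu) for t in range(1, n))
    return num / denom


# Toy series (hypothetical): steadily rising returns versus alternating returns.
rising = [0.01, 0.02, 0.03, 0.04]
alternating = [0.01, -0.01, 0.01, -0.01]

print(lag1_autocorrelation(rising))
print(lag1_autocorrelation(alternating))
```

The rising series gives a positive value and the alternating series a negative one. Real return series are much noisier, so an estimate like this would normally be paired with a significance test.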

Added by Vincent Granville on July 8, 2019 at 10:25am

***Summary:** The annual Burtch Works salary survey tells us a lot about which industries are using the most data scientists and the difference between higher- and lower-skilled data scientists. Salary increases show us whether demand is increasing, and finally we take a shot at determining which skills are most in demand.*

What a difference a few years can make. We used to say that everyone loves a data scientist – and wants to be one. …

Added by Vincent Granville on July 8, 2019 at 10:18am

By Ajit Jaokar. This post is a part of my forthcoming book on Mathematical foundations of Data Science. In this post, we use the Perceptron algorithm to bridge the gap between high school maths and deep learning.

**Background**

As part of my role as course director of the Artificial Intelligence: Cloud and Edge Computing course at the University of Oxford, I see more students who are familiar with programming than with mathematics.

They have last learnt maths…

Added by Vincent Granville on June 27, 2019 at 12:22pm

Originally published in 2014 and viewed more than 200,000 times, this is the oldest data science cheat sheet - the mother of all the numerous cheat sheets that are so popular nowadays. I decided to update it in June 2019. While the first half, dealing with installing components on your laptop and learning UNIX, regular expressions, and file management hasn't changed much, the second half, dealing with machine learning, was rewritten entirely from scratch. It is amazing how things changed in…

Added by Vincent Granville on June 6, 2019 at 8:27pm

We propose simple solutions to important problems that all data scientists face almost every day. In short, a toolbox for the handyman, useful to busy professionals in any field.

**1. Eliminating sample size effects**. Many statistics, such as correlations or R-squared, depend on the sample size, making it difficult to…

Added by Vincent Granville on June 4, 2019 at 12:00pm

