A Data Science Central Community
This crash course features a new fundamental statistics theorem -- even more important than the central limit theorem -- and a new set of statistical rules and recipes. We discuss concepts related to determining the optimum sample size, the optimum k in k-fold cross-validation, bootstrapping, new re-sampling techniques, simulations, tests of hypotheses, confidence intervals, and statistical inference using a unified, robust, simple…Continue
Added by Vincent Granville on May 4, 2019 at 12:30pm — No Comments
So many fascinating and deep results have been written about the number (1 + SQRT(5)) / 2 and its related sequence - the Fibonacci numbers - that it would take years to read all of them. This number has been studied both for its applications (population growth, architecture) and its mathematical properties, for over 2,000 years. It is still a topic of active research.…Continue
Added by Vincent Granville on April 25, 2019 at 7:30am — No Comments
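As a small illustration of the link between the number (1 + SQRT(5)) / 2 and the Fibonacci numbers mentioned in the teaser above, the sketch below checks numerically that the ratio of consecutive Fibonacci numbers approaches that constant. The function name `fibonacci` and the choice of 30 terms are assumptions made for this example; they do not come from the article.

```python
import math

def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 1 (illustrative helper)."""
    seq = []
    a, b = 1, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq

# The constant from the teaser: (1 + SQRT(5)) / 2
PHI = (1 + math.sqrt(5)) / 2

fib = fibonacci(30)  # 30 terms is an arbitrary choice for this sketch
# Ratios of consecutive terms: F(k+1) / F(k)
ratios = [fib[i + 1] / fib[i] for i in range(len(fib) - 1)]

# Gap between the last ratio and PHI; it shrinks as more terms are used
gap = abs(ratios[-1] - PHI)
print(f"last ratio = {ratios[-1]:.12f}, PHI = {PHI:.12f}, gap = {gap:.2e}")
```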
Summary: Finally there are tools that let us transcend ‘correlation is not causation’ and identify true causal factors and their relative strengths in our models. This is what prescriptive analytics was meant to be.
Just when I thought we’d figured it all out,…Continue
Added by Vincent Granville on April 24, 2019 at 7:30pm — No Comments
I describe here the ultimate number guessing game, played with real money. It is a new trading and gaming system, based on state-of-the-art mathematical engineering, robust architecture, and patent-pending technology. It offers an alternative to the stock market and traditional gaming. This system is also far more transparent than the stock market, and cannot be manipulated, as formulas to win the biggest returns (with real money) are made public. Also, it simulates a neutral,…Continue
Added by Vincent Granville on April 15, 2019 at 10:00am — No Comments
Graphs are meant to be seen
The third layer of graph technology that we discuss in this article is the front-end layer, the graph visualization one. The visualization of information has been the support of many types of analysis, including Social Network Analysis. For decades, visual representations have helped researchers,…
Added by Elise Devaux on April 9, 2019 at 4:00am — No Comments
Summary: A new business model strategy based around intermediary platforms powered by AI/ML is promising the most direct path to fastest growth, profitability, and competitive success. Adopting this new approach requires a deep change in mindset and is quite different from just adopting AI/ML to optimize your current operations.…Continue
Added by Vincent Granville on April 8, 2019 at 11:00pm — No Comments
We investigate a large class of auto-correlated, stationary time series, proposing a new statistical test to measure departure from the base model, known as Brownian motion. We also discuss a methodology to deconstruct these time series, in order to identify the root mechanism that generates the observations. The time series studied here can be discrete or continuous in time; they can have various degrees of smoothness (typically measured using the Hurst exponent) as well as long-range or…Continue
Added by Vincent Granville on April 1, 2019 at 1:00pm — No Comments
The emergence of alternative data as a key enabler in expanding credit delivery and financial inclusion is unmistakable.
The saying that the only constant is change is attributed to Heraclitus, the Greek philosopher. It is very relevant today to the way lenders use technology and scoring solutions to understand the creditworthiness of applicants. Credit risk management has come a long way from the days when banks used just one credit score cutoff to…Continue
Added by Naagesh Padmanaban on March 25, 2019 at 11:15pm — No Comments
I present here some innovative results from my most recent research on stochastic processes, chaos modeling, and dynamical systems, with applications to Fintech, cryptography, number theory, and random number generators. While covering advanced topics, this article is accessible to professionals with limited knowledge in statistical or mathematical theory. It introduces new material not covered in my recent book (available …Continue
Added by Vincent Granville on March 21, 2019 at 7:30am — No Comments
Determining the number of clusters when performing unsupervised clustering is a tricky problem. Many data sets don't exhibit well-separated clusters, and two people asked to tell the number of clusters by looking at a chart are likely to provide two different answers. Sometimes clusters overlap with each other, and large clusters contain sub-clusters, which makes the decision difficult.
For instance, how many clusters do you see in the picture below? What is the optimum number…Continue
Added by Vincent Granville on March 13, 2019 at 6:00pm — No Comments
Many times, complex models are not enough (or too heavy), or not necessary, to get great, robust, sustainable insights out of data. Deep analytical thinking may prove more useful, and can be done by people not necessarily trained in data science, even by people with limited coding experience. Here we explore what we mean by deep analytical thinking, using a case study, and how it works: combining craftsmanship, business acumen, the use and creation of tricks and rules of thumb, to provide…Continue
Added by Vincent Granville on March 7, 2019 at 1:46pm — No Comments
Graph analytics frameworks consist of a set of tools and methods developed to extract knowledge…Continue
Added by Elise Devaux on February 27, 2019 at 5:00am — No Comments
In this data science article, emphasis is placed on science, not just on data. State-of-the-art material is presented in simple English, from multiple perspectives: applications, theoretical research asking more questions than it answers, scientific computing, machine learning, and algorithms. I attempt here to lay the foundations of a new statistical technology, hoping that it will plant the seeds for further research on a topic with a broad range of potential…Continue
Added by Vincent Granville on February 23, 2019 at 11:00am — No Comments
Many of the following statistical tests are rarely discussed in textbooks or in college classes, much less in data camps. Yet they help answer a lot of different and interesting questions. I used most of them without even computing the underlying distribution under the null hypothesis, instead using simulations to check whether my assumptions were plausible or not. In short, my approach to statistical testing is model-free and data-driven. Some are easy to implement even in Excel. Some…Continue
Added by Vincent Granville on February 13, 2019 at 7:00pm — No Comments
For background to this post, please see Learn Machine Learning Coding Basics in a weekend. Here, we present the glossary that we use for the coding and the mindmap attached to these classes and upcoming book. About 80 terms are included in the glossary, covering Ensembles, Regression, Classification,…Continue
Added by Vincent Granville on February 12, 2019 at 12:31pm — No Comments
Logistic regression (LR) models estimate the probability of a binary response, based on one or more predictor variables. Unlike in linear regression models, the dependent variable is categorical. LR has become very popular, perhaps because of the wide availability of the procedure in software. Although LR is a good choice for many situations, it doesn't work well for all situations. For example:
Added by Vincent Granville on February 7, 2019 at 3:23pm — No Comments
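To make concrete how a logistic regression model turns predictor values into the probability of a binary response, as described in the teaser above, here is a minimal sketch. The function name `predict_probability` and the intercept and coefficient values are hypothetical, chosen only for illustration; in practice they would be estimated from data, and nothing here is taken from the article itself.

```python
import math

def predict_probability(x, intercept, coefficients):
    """Probability of the positive class under a logistic regression model.

    x            : list of predictor values
    intercept    : model intercept (hypothetical value in this sketch)
    coefficients : one weight per predictor (hypothetical values)
    """
    # Linear predictor: intercept + sum of coefficient * predictor
    z = intercept + sum(c * xi for c, xi in zip(coefficients, x))
    # Logistic (sigmoid) function maps the linear predictor into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical model with two predictors
intercept = -1.0
coefficients = [0.8, 0.5]

p = predict_probability([2.0, 1.0], intercept, coefficients)
label = 1 if p >= 0.5 else 0  # 0.5 threshold is a common but arbitrary choice
print(f"probability = {p:.3f}, predicted class = {label}")
```

The output is a probability rather than a class; a separate threshold step is what makes the response categorical.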
This is another interesting, off-the-beaten-path problem. It ends with a formula to compute the integral of a function, based solely on its derivatives.
For simplicity, I'll start with some notation used in the context of matrix theory, familiar to everyone: T(f) = g, where f and g are vectors, and T is a square matrix. The notation T(f) represents the product of the matrix T and the vector f. Now, imagine that the…Continue
The first days after the New Year celebration are a time to look back, analyze our actions and promises, and see whether our predictions and expectations came true. Now that 2018 has come to an end, it is the perfect time to review it and to identify trends for the next year. The amount of data generated every minute is enormous. Therefore, new approaches, techniques, and solutions have been developed.…Continue
Added by Vincent Granville on January 29, 2019 at 11:43am — No Comments
Extract from the upcoming Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, …Continue
Added by Vincent Granville on January 27, 2019 at 3:20pm — No Comments
A passionate customer always provides feedback about a favorite product if it strikes an emotional chord.
Product reviews contain a wealth of information. Analyzing the review texts can unearth many hidden data points about the customer and the product. Such insights can help grow the business and increase revenue.
Let's look into a specific example. …Continue
Added by Kaniska Mandal on January 24, 2019 at 3:30pm — No Comments