New Perspectives on Statistical Distributions and Deep Learning

In this data science article, emphasis is placed on science, not just on data. State-of-the-art material is presented in simple English, from multiple perspectives: applications, theoretical research asking more questions than it answers, scientific computing, machine learning, and algorithms. I attempt here to lay the foundations of a new statistical technology, hoping that it will plant the seeds for further research on a topic with a broad range of potential applications. It is based on mixture models. Mixtures have been studied and used in applications for a long time, and they remain a subject of active research. Yet you will find here plenty of new material.

Introduction and Context

In a previous article (see here) I attempted to approximate a random variable representing real data by a weighted sum of simple kernels, such as independent, identically distributed uniform random variables. The purpose was to build Taylor-like series approximations to more complex models (each term in the series being a random variable), in order to

  • avoid over-fitting,
  • approximate any empirical distribution (the inverse of the percentiles function) attached to real data,
  • easily compute data-driven confidence intervals regardless of the underlying distribution,
  • derive simple tests of hypothesis,
  • perform model reduction,
  • optimize data binning to facilitate feature selection and improve histogram visualizations,
  • create perfect histograms,
  • build simple density estimators,
  • perform interpolations, extrapolations, or predictive analytics,
  • perform clustering and detect the number of clusters,
  • create deep learning Bayesian systems.
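As a concrete illustration of the second goal above, the empirical distribution attached to a data set can be viewed through its quantile function, the inverse of the percentiles (empirical CDF) function. This is my own minimal sketch, not the article's code, using a hypothetical exponential data set:

```python
import numpy as np

# Sketch: the empirical quantile function (inverse of the percentiles
# function) of a data set -- the curve the mixture approximation targets.
rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=10_000)  # hypothetical "real" data

p = np.linspace(0.01, 0.99, 99)        # percentile levels
quantiles = np.quantile(data, p)       # empirical inverse CDF at those levels

# A quantile function is non-decreasing by construction.
print(bool(np.all(np.diff(quantiles) >= 0)))
```

Approximating this non-decreasing curve well, for arbitrary data, is what the weighted-sum approach discussed next fails to do in general.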

While I found very interesting properties of stable distributions during this research project, I could not come up with a solution to all of these problems. The fact is that these weighted sums usually converge (in distribution) to a normal distribution if the weights do not decay too fast -- a consequence of the central limit theorem. Even with uniform kernels (as opposed to Gaussian ones) and fast-decaying weights, the sum converges to an almost symmetrical, Gaussian-like distribution. In short, very few real-life data sets can be approximated by this type of model.
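This Gaussian-like limit is easy to check by simulation. The sketch below (my own illustration, with hypothetical weights w_k = 1/sqrt(k), not the article's code) forms a weighted sum of i.i.d. uniform kernels and measures its skewness and excess kurtosis, both of which are 0 for a normal distribution:

```python
import numpy as np

# Weighted sum of i.i.d. uniform kernels on [-1, 1] with slowly decaying
# weights w_k = 1/sqrt(k): the CLT pushes the result toward a Gaussian.
rng = np.random.default_rng(42)
n_terms, n_samples = 50, 100_000
weights = 1.0 / np.sqrt(np.arange(1, n_terms + 1))
kernels = rng.uniform(-1, 1, size=(n_samples, n_terms))
samples = kernels @ weights            # one weighted sum per sample

# Standardize, then estimate skewness and excess kurtosis
# (both equal 0 for a normal distribution).
z = (samples - samples.mean()) / samples.std()
skewness = np.mean(z ** 3)
excess_kurtosis = np.mean(z ** 4) - 3
print(round(skewness, 2), round(excess_kurtosis, 2))  # both near 0
```

Whatever weights are chosen, the resulting histogram stays nearly symmetrical, which is exactly why this family of models cannot capture skewed or multimodal real-life data.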

Now, in this article, I offer a full solution, using mixtures rather than sums. The possibilities are endless. 
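To see why mixtures escape this limitation, note that a mixture draws each observation from one of several component distributions, chosen at random with fixed proportions, rather than summing them. The sketch below is my own minimal illustration of that idea, with hypothetical parameters, not the algorithm developed in the article:

```python
import numpy as np

# Two-component Gaussian mixture: pick a component with probability w_i,
# then sample from it. Unlike weighted sums, the result can be strongly
# skewed or bimodal.
rng = np.random.default_rng(0)

weights = np.array([0.7, 0.3])   # mixing proportions (sum to 1)
means = np.array([0.0, 4.0])     # hypothetical component means
sigmas = np.array([1.0, 0.5])    # hypothetical component std devs

n = 100_000
component = rng.choice(2, size=n, p=weights)   # latent component labels
samples = rng.normal(means[component], sigmas[component])

def mixture_pdf(x):
    # Mixture density = weighted sum of the component densities.
    comps = (np.exp(-0.5 * ((x[:, None] - means) / sigmas) ** 2)
             / (sigmas * np.sqrt(2 * np.pi)))
    return comps @ weights

# Skewness well away from 0 shows the result is not Gaussian-like.
z = (samples - samples.mean()) / samples.std()
print(round(float(np.mean(z ** 3)), 2))
```

Here the smaller, tighter component at mean 4 produces a clearly right-skewed distribution, something no weighted sum of symmetric kernels can reproduce.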

Content of this article

1. Introduction and Context

2. Approximations Using Mixture Models

  • The error term
  • Kernels and model parameters
  • Algorithms to find the optimum parameters
  • Convergence and uniqueness of solution
  • Find near-optimum with fast, black-box step-wise algorithm

3. Example

  • Data and source code
  • Results

4. Applications

  • Optimal binning
  • Predictive analytics
  • Test of hypothesis and confidence intervals
  • Deep learning: Bayesian decision trees
  • Clustering

5. Interesting problems

  • Gaussian mixtures uniquely characterize a broad class of distributions
  • Weighted sums fail to achieve what mixture models do
  • Stable mixtures
  • Nested mixtures and Hierarchical Bayesian Systems
  • Correlations

Read full article here
