

I present here some innovative results from my most recent research on stochastic processes, chaos modeling, and dynamical systems, with applications to Fintech, cryptography, number theory, and random number generators. While covering advanced topics, this article is accessible to professionals with limited knowledge of statistical or mathematical theory. It introduces new material not covered in my recent book (available here) on applied stochastic processes. You don't need to read my book to understand this article, but the book is a nice complement and introduction to the concepts discussed here.

None of the material presented here is covered in standard textbooks on stochastic processes or dynamical systems. In particular, it has nothing to do with the classical logistic map or Brownian motions, though the systems investigated here exhibit very similar behaviors and are related to those classical models. This cross-disciplinary article is targeted at professionals with interests in statistics, probability, mathematics, machine learning, simulations, signal processing, operations research, computer science, pattern recognition, and physics. Because of its tutorial style, it should also appeal to beginners learning about Markov processes, time series, and data science techniques in general, offering fresh, off-the-beaten-path content not found anywhere else, in contrast with the material covered again and again in countless, nearly identical books, websites, and classes catering to students and researchers alike. Some problems discussed here could be used by college professors in the classroom or as original exam questions, while others are extremely challenging questions that could be the subject of a PhD thesis, or even well beyond that level. This article constitutes (along with my book) a stepping stone in my endeavor to solve one of the biggest mysteries in the universe: are the digits of mathematical constants such as Pi evenly distributed?
To this day, no one knows whether these digits even have a distribution to start with, let alone whether that distribution is uniform or not. Part of the discussion is about statistical properties of numeration systems in a non-integer base (such as the golden ratio base) and their applications. All systems investigated here, whether deterministic or not, are treated as stochastic processes, including the digits in question. They all exhibit strong chaos, albeit easily manageable thanks to their ergodicity. Interesting connections with the golden ratio, special polynomials, and other special mathematical constants are discussed in section 2. Finally, all the analyses performed during this work were done in Excel. I share my spreadsheets in this article, as well as many illustrations, and all the results are replicable.

Read the full article here.

Content of this article

1. General framework, notations and terminology
   - Finding the equilibrium distribution
   - Auto-correlation and spectral analysis
   - Ergodicity, convergence, and attractors
   - State space, time state, and Markov chain approximations
   - Examples
2. Case study
   - First fundamental theorem
   - Second fundamental theorem
   - Convergence to equilibrium: illustration
3. Applications
   - Potential application domains
   - Example: the golden ratio process
   - Finding other useful b-processes
4. Additional research topics
   - Perfect stochastic processes
   - Characterization of equilibrium distributions (the attractors)
   - Probabilistic calculus and number theory, special integrals
5. Appendix
   - Computing the auto-correlation at equilibrium
   - Proof of the first fundamental theorem
   - How to find the exact equilibrium distribution
6. Additional resources
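To make the idea of digits in a non-integer base concrete, here is a minimal sketch of the standard greedy (shift-map) expansion: repeatedly multiply by the base b and read off the integer part as the next digit. The function name and the choice of seed are my own illustration, not taken from the article; the article's b-processes are studied in much more depth than this toy example.

```python
def digits_in_base(x, b, n=20):
    """Return the first n digits of x (0 <= x < 1) in base b,
    where b may be non-integer, via the greedy shift-map expansion:
    digit = floor(b * x), then replace x by the fractional part of b * x."""
    digits = []
    for _ in range(n):
        x *= b
        d = int(x)          # next digit, in {0, ..., ceil(b) - 1}
        digits.append(d)
        x -= d              # keep only the fractional part
    return digits

phi = (1 + 5 ** 0.5) / 2    # the golden ratio, an example non-integer base

print(digits_in_base(0.5, 2, 8))    # -> [1, 0, 0, 0, 0, 0, 0, 0] (binary check)
print(digits_in_base(0.3, phi, 12))
```

With b = 2 this recovers ordinary binary digits. With b equal to the golden ratio, every digit is 0 or 1, and the greedy expansion never produces two consecutive 1s, a well-known property of base-phi numeration.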





Determining the number of clusters when performing unsupervised clustering is a tricky problem. Many data sets don't exhibit well-separated clusters, and two human beings asked to visually tell the number of clusters by looking at a chart are likely to provide two different answers. Sometimes clusters overlap with each other, and large clusters contain sub-clusters, making the decision difficult.

For instance, how many clusters do you see in the picture below? What is the optimum number of clusters? No one can tell with certainty: not AI, not a human being, not an algorithm.

How many clusters here? (source: see here)

In the above picture, the underlying data suggests that there are three main clusters. But an answer such as 6 or 7 seems equally valid. A number of empirical approaches have been used to determine the number of clusters in a data set. They usually fit into two categories:

- Model fitting techniques: an example is fitting a mixture model to your data and determining the optimum number of components, or using density estimation techniques and testing for the number of modes (see here.) Sometimes the fit is compared with that of a model where observations are uniformly distributed on the entire support domain, thus with no cluster; you may have to estimate the support domain in question, and assume that it is not made of disjoint sub-domains; in many cases, the convex hull of your data set, as an estimate of the support domain, is good enough.
- Visual techniques: for instance, the silhouette or elbow rule (very popular.)

In both cases, you need a criterion to determine the optimum number of clusters. In the case of the elbow rule, one typically uses the percentage of unexplained variance. This number is 100% with zero clusters, and it decreases (initially sharply, then more modestly) as you increase the number of clusters in your model. When each point constitutes its own cluster, this number drops to 0%.
Somewhere in between, the curve that displays your criterion exhibits an elbow (see picture below), and that elbow determines the number of clusters. For instance, in the chart below, the optimum number of clusters is 4.

The elbow rule tells you that here, your data set has 4 clusters (elbow strength in red)

Good references on the topic are available. Some R functions are available too, for instance fviz_nbclust. However, I could not find in the literature how the elbow point is explicitly computed. Most references mention that it is mostly hand-picked by visual inspection, or based on some predetermined but arbitrary threshold. In the next section, we solve this problem.

Read full article here.
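To illustrate what an explicit elbow criterion can look like, here is a sketch based on the discrete second difference: score each candidate number of clusters by how sharply the unexplained-variance curve bends there, and pick the sharpest bend. This is a common heuristic offered only as an illustration; it is not necessarily the "elbow strength" criterion derived in the full article, and the sample curve values below are made up.

```python
def elbow_point(variance):
    """variance[k] = percentage of unexplained variance with k+1 clusters
    (a decreasing sequence). Returns the number of clusters at the
    sharpest bend, measured by the discrete second difference."""
    best_k, best_strength = None, float("-inf")
    for i in range(1, len(variance) - 1):
        # How much the drop slows down at point i: large value = sharp bend.
        strength = (variance[i - 1] - variance[i]) - (variance[i] - variance[i + 1])
        if strength > best_strength:
            best_k, best_strength = i + 1, strength
    return best_k

# Hypothetical unexplained variance (%) for k = 1..8 clusters, bending at k = 4
curve = [100, 75, 55, 40, 37, 35, 34, 33]
print(elbow_point(curve))  # -> 4
```

In practice you would feed in the within-cluster (unexplained) variance produced by your clustering algorithm for each candidate k; the heuristic then replaces the hand-picked visual inspection mentioned above with a reproducible rule.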


Many times, complex models are not enough (or are too heavy), or not necessary, to get great, robust, sustainable insights out of data. Deep analytical thinking may prove more useful, and can be done by people not necessarily trained in data science, even by people with limited coding experience. Here we explore what we mean by deep analytical thinking, using a case study, and how it works: combining craftsmanship, business acumen, and the use and creation of tricks and rules of thumb to provide sound answers to business problems. These skills are usually acquired by experience more than by training, and data science generalists (see here how to become one) usually possess them.

This article is targeted at data science managers and decision makers, as well as junior professionals who want to become one at some point in their career. Deep thinking, unlike deep learning, is also more difficult to automate, so it provides better job security. Those automating deep learning are actually the new data science wizards, who can think out of the box. Much of what is described in this article is also data science wizardry, not taught in standard textbooks or in the classroom. By reading this tutorial, you will learn and be able to use these data science secrets, and possibly change your perspective on data science. Data science is like an iceberg: everyone knows and can see the tip of the iceberg (regression models, neural nets, cross-validation, clustering, Python, and so on, as presented in textbooks.) Here I focus on the unseen bottom, using a statistical level almost accessible to the layman, avoiding jargon and complicated math formulas, yet discussing a few advanced concepts.

Read full article here.

Content

1. Case Study: The Problem
2. Deep Analytical Thinking
   - Answering hidden questions
   - Business questions
   - Data questions
   - Metrics questions
3. Data Science Wizardry
   - Generic algorithm
   - Illustration with three different models
   - Results
4. A few data science hacks