All Blog Posts Tagged 'bayes' - AnalyticBridge2020-04-03T18:01:20Zhttps://www.analyticbridge.datasciencecentral.com/profiles/blog/feed?tag=bayes&xn_auth=noState-of-the-Art Statistical Science to Tackle Famous Number Theory Conjecturestag:www.analyticbridge.datasciencecentral.com,2020-03-01:2004291:BlogPost:3971512020-03-01T06:00:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>The methodology described here has broad applications, leading to new statistical tests, a new type of ANOVA (analysis of variance), improved design of experiments, interesting fractional factorial designs, a better understanding of irrational numbers with applications to cryptography, gaming and Fintech, and high-quality random number generators (for when you really need them). It also features exact arithmetic, high-performance computing and distributed algorithms to compute millions of binary digits for an infinite family of real numbers, including detection of auto- and cross-correlations (or lack thereof) in the digit distributions.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3972061349?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3972061349?profile=RESIZE_710x" class="align-center"/></a></p>
<p>The data processed in my experiment, consisting of raw irrational numbers (described by a new class of elementary recurrences), led to the discovery of unexpected apparent patterns in their digit distribution: in particular, the fact that a few of these numbers, contrary to popular belief, do not have 50% of their binary digits equal to 1. It turned out that perfectly random digits simulated in large numbers, with a good enough pseudo-random generator, exhibit the same strange behavior, pointing to the fact that pure randomness may not be as random as we imagine it is. Ironically, failure to exhibit these patterns would be an indicator that there really is a departure from pure randomness in the digits in question.</p>
<p>In addition to new statistical / mathematical methods, discoveries and interesting applications, you will learn in my article how to avoid the statistical traps that lead to erroneous conclusions when performing a large number of statistical tests, and how not to be misled by false appearances. I call them<span> </span><em>statistical hallucinations</em> and<span> </span><em>false outliers</em>.</p>
<p>This article has two main sections: section 1, with deep research in number theory, and section 2, with deep research in statistics, with applications. You may skip one of the two sections depending on your interests and how much time you have. Both sections, despite being state-of-the-art in their respective fields, are written in simple English. It is my wish that this article gets data scientists interested in math, and the other way around: the topics in both cases have been chosen to be exciting and modern. I also hope that this article will give you powerful new tools to add to your arsenal of tricks and techniques. Both topics are related, the statistical analysis being based on the numbers discussed in the math section. </p>
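<p>As a rough illustration of how such digit counts can be obtained with exact arithmetic, here is a minimal sketch. It uses the square root of 2 as a stand-in (the recurrences used in the article are not reproduced here), and the helper name <code>sqrt_binary_digits</code> is hypothetical.</p>

```python
from math import isqrt

def sqrt_binary_digits(n, k):
    """Return the first k binary digits after the point of sqrt(n), as a string.

    Uses integer square roots only, so no floating-point rounding is involved.
    """
    scaled = isqrt(n << (2 * k))           # floor(sqrt(n) * 2**k), computed exactly
    fractional = scaled - (isqrt(n) << k)  # drop the integer part of sqrt(n)
    return format(fractional, f"0{k}b")    # left-pad with zeros to exactly k digits

# Assumption: sqrt(2) is only an example; any non-square integer n can be used.
digits = sqrt_binary_digits(2, 10_000)
proportion_of_ones = digits.count("1") / len(digits)
print(digits[:16], proportion_of_ones)
```

<p>With only 10,000 digits, a proportion that differs slightly from 50% is expected from chance alone, which is the kind of apparent pattern the article warns about.</p>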
<p>One of the interesting new topics discussed here for the first time is the cross-correlation between the digits of two irrational numbers. These digit sequences are treated as multivariate time series. I believe this is the first time ever that this subject is not only investigated in detail, but in addition comes with a deep, spectacular probabilistic number theory result about the distributions in question, with important implications in security and cryptography systems. Another related topic discussed here is a generalized version of the Collatz conjecture, with some insights on how to potentially solve it.</p>
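<p>To make the idea of treating digit sequences as time series concrete, the sketch below computes an empirical cross-correlation between the binary digits of two numbers. The choice of the square roots of 2 and 3, and the function names, are assumptions made for illustration; the result proved in the article is not reproduced here.</p>

```python
from math import isqrt, sqrt

def sqrt_bits(n, k):
    """First k binary digits after the point of sqrt(n), as a list of 0/1 integers."""
    scaled = isqrt(n << (2 * k))
    fractional = scaled - (isqrt(n) << k)
    return [int(ch) for ch in format(fractional, f"0{k}b")]

def cross_correlation(x, y, lag=0):
    """Pearson correlation between x[t] and y[t + lag] for a non-negative lag."""
    pairs = list(zip(x, y[lag:]))  # zip truncates to the overlapping part
    m = len(pairs)
    mean_x = sum(a for a, _ in pairs) / m
    mean_y = sum(b for _, b in pairs) / m
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / sqrt(var_x * var_y)

# Assumption: sqrt(2) and sqrt(3) are example inputs only.
bits_a = sqrt_bits(2, 10_000)
bits_b = sqrt_bits(3, 10_000)
corr_lag0 = cross_correlation(bits_a, bits_b, lag=0)
corr_lag1 = cross_correlation(bits_a, bits_b, lag=1)
print(corr_lag0, corr_lag1)
```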
<p><a href="https://www.datasciencecentral.com/profiles/blogs/state-of-the-art-statistical-science-to-address-famous-number-the" target="_blank" rel="noopener">Read the full article here</a>. </p>
<p><strong>Content</strong></p>
<p>1. On the Digits Distribution of Quadratic Irrational Numbers</p>
<ul>
<li>Properties of the recursion</li>
<li>Reverse recursion</li>
<li>Properties of the reverse recursion</li>
<li>Connection to Collatz conjecture</li>
<li>Source code</li>
<li>New deep probabilistic number theory results</li>
<li>Spectacular new result about cross-correlations</li>
<li>Applications</li>
</ul>
<p>2. New Statistical Techniques Used in Our Analysis</p>
<ul>
<li>Data, features, and preliminary analysis</li>
<li>Doing it the right way</li>
<li>Are the patterns found a statistical illusion, or caused by errors, or real?</li>
<li>Pattern #1: Non-Gaussian behavior</li>
<li>Pattern #2: Illusionary outliers</li>
<li>Pattern #3: Weird distribution for block counts</li>
<li>Related articles and books</li>
</ul>
<p>Appendix</p>Advanced Analytic Platforms – Changes in the Leaderboard 2020tag:www.analyticbridge.datasciencecentral.com,2020-02-21:2004291:BlogPost:3969732020-02-21T16:25:05.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><strong><em>Summary:</em></strong><span> </span><em>The Gartner Magic Quadrant for Data Science and Machine Learning Platforms is just out, and the big news is how much more capable all the platforms have become. Of course there are also some interesting winner and loser stories.</em></p>
<p>The Gartner Magic Quadrant for Data Science and Machine Learning Platforms is just out for 2020. The really big news is how many excellent choices are now available. In a remarkable move, the whole field of competitors has moved strongly up and to the right, offering more Leaders or near-leader Visionaries than ever before.</p>
<p>It’s a mark of maturity in our industry that so many platforms offer fully capable model development, operationalizing, and management features. That list of requirements as defined by Gartner grows longer every year and earning a better rating requires increasing capability and increasing customer satisfaction.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3886789662?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3886789662?profile=RESIZE_710x" class="align-center"/></a></p>
<p><span><strong>What Are the Major Changes?</strong></span></p>
<p>As in previous years we’ve charted the major changes in position using green arrows for improvement and red arrows to indicate a reduced rating. The blue dots are current ratings and the gray dots are from a year ago.</p>
<p><em>Read the full article <a href="https://www.datasciencecentral.com/profiles/blogs/advanced-analytic-platforms-changes-in-the-leaderboard-2020" target="_blank" rel="noopener">here</a> with the 2020 version of the above chart, with comments.</em></p>Sentiment Analysis with Naive Bayes and LSTMtag:www.analyticbridge.datasciencecentral.com,2020-02-20:2004291:BlogPost:3969662020-02-20T03:42:19.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p class="justifyfull" dir="ltr"><span>In this notebook, we try to predict the positive (label 1) or negative (label 0) sentiment of the sentence. We use the UCI Sentiment Labelled Sentences Data Set</span><span>.</span></p>
<p class="justifyfull" dir="ltr"><span>Sentiment analysis is very useful in many areas. For example, it can be used to moderate internet conversations. It can also predict the ratings that users would assign to a product (food, household appliances, hotels, films, etc.) based on their reviews.</span></p>
<p class="justifyfull" dir="ltr"><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/3873882138?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3873882138?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p class="justifyfull" dir="ltr"><span>In this notebook we use two families of machine learning algorithms</span><span>: Naive Bayes (NB) and</span><span> </span><span>long short-term memory (LSTM) neural networks</span><span>.</span></p>
<ul>
<li dir="ltr"><p dir="ltr">AYLIEN</p>
</li>
<li dir="ltr"><p dir="ltr">Deeplearning4j</p>
</li>
<li dir="ltr"><p dir="ltr">Understanding LSTM Networks</p>
</li>
<li dir="ltr"><p dir="ltr">Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling<span> </span><span> </span></p>
</li>
<li><p dir="ltr">The Unreasonable Effectiveness of Recurrent Neural Networks</p>
</li>
</ul>
<p class="justifyfull" dir="ltr"><span>We will use pandas and numpy for data manipulation, nltk for natural language processing, matplotlib, seaborn and plotly for data visualization, and sklearn and keras for training the models.</span></p>
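<p>To make the Naive Bayes part concrete, here is a minimal from-scratch sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is not the notebook's code, which relies on sklearn; the four training sentences are invented for illustration and are not taken from the UCI data set.</p>

```python
import math
from collections import Counter

# Hypothetical toy data: (sentence, label) with 1 = positive, 0 = negative.
train = [
    ("good great food", 1),
    ("loved the great service", 1),
    ("bad terrible food", 0),
    ("awful slow service", 0),
]

def fit(samples):
    """Count words per class and sentences per class."""
    word_counts = {}
    class_counts = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, class_counts, vocab

def predict(text, model):
    """Return the label with the highest log posterior (up to a constant)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total)        # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:                                # skip unseen words
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

model = fit(train)
print(predict("great food", model), predict("terrible service", model))
```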
<p class="justifyfull" dir="ltr"><em>Read the full article with source code and illustrations, <a href="https://www.datasciencecentral.com/profiles/blogs/sentiment-analysis-with-naive-bayes-and-lstm" target="_blank" rel="noopener">here</a>. </em></p>Common Errors in Machine Learning due to Poor Statistics Knowledgetag:www.analyticbridge.datasciencecentral.com,2020-02-07:2004291:BlogPost:3967692020-02-07T16:48:30.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>Probably the worst error is thinking there is a correlation when that correlation is purely artificial. Take a data set with 100,000 variables, say with 10 observations. Compute all the (99,999 * 100,000) / 2 cross-correlations. You are almost guaranteed to find one above 0.999. This is best illustrated in my article<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-lie-with-p-values" target="_blank" rel="noopener">How to Lie with P-values</a> (which also discusses how to handle and fix it).</p>
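<p>The effect is easy to reproduce at a much smaller scale. The sketch below is an illustration under assumed settings (200 independent Gaussian variables, 5 observations each, fixed seed), not the experiment from the article: it computes all pairwise correlations of pure noise and reports the largest one in absolute value.</p>

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def max_abs_correlation(n_vars, n_obs, seed=0):
    """Largest |correlation| among all pairs of independent noise variables."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    best = 0.0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            best = max(best, abs(pearson(data[i], data[j])))
    return best

# Assumed, scaled-down settings so the loop finishes quickly.
n_vars, n_obs = 200, 5
n_pairs = n_vars * (n_vars - 1) // 2
strongest = max_abs_correlation(n_vars, n_obs)
print(n_pairs, "pairs tested; strongest |correlation| =", strongest)
```

<p>Because the variables are independent by construction, any strong correlation found this way is an artifact of testing many pairs on very few observations.</p>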
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3852501387?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3852501387?profile=RESIZE_710x" class="align-center"/></a></p>
<p>This is being done on such a large scale, I think it is probably the main cause of fake news, and the impact is disastrous on people who take for granted what they read in the news or what they hear from the government. Some people are sent to jail based on evidence tainted with major statistical flaws. Government money is spent, propaganda is generated, wars are started, and laws are created based on false evidence. Sometimes the data scientist has no choice but to knowingly cook the numbers to keep her job. Usually, these “bad stats” end up being featured in beautiful but faulty visualizations: axes are truncated, charts are distorted, observations and variables are carefully chosen just to make a (wrong) point.</p>
<p><a href="https://www.datasciencecentral.com/profiles/blogs/common-errors-in-machine-learning-due-to-poor-statistics-knowledg" target="_blank" rel="noopener">Read the full article here</a>. </p>
<p><strong>Related articles</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-lie-with-p-values" target="_blank" rel="noopener">How to Lie with P-values</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/four-types-of-data-scientist" target="_blank" rel="noopener">Four Types of Data Scientist</a></li>
<li><a href="https://www.bigdatanews.datasciencecentral.com/profiles/blogs/debunking-forbes-article-about-the-death-of-the-data-scientist" target="_blank" rel="noopener">Debunking Forbes Article about the Death of the Data Scientist</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-you-should-be-a-data-science-generalist" target="_blank" rel="noopener">Why You Should be a Data Science Generalist - and How to Become One</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/becoming-a-billionaire-data-scientist-vs-struggling-to-get-a-100k">Becoming a Billionaire Data Scientist vs Struggling to Get a $100k Job<span> </span></a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/opinion-is-a-phd-helpful-for-a-data-science-career" target="_blank" rel="noopener">Is a PhD helpful for a data science career?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/if-data-science-is-in-demand-why-is-it-so-hard-to-get-a-job" target="_blank" rel="noopener">If data science is in demand, why is it so hard to get a job?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-do-people-with-no-experience-want-to-become-data-scientists" target="_blank" rel="noopener">Why do people with no experience want to become data scientists?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-is-becoming-a-data-scientist-so-difficult" target="_blank" rel="noopener">Why is Becoming a Data Scientist so Difficult?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/full-stack-data-scientist-the-elusive-unicorn-and-data-hacker" target="_blank" rel="noopener">Full Stack Data Scientist: The Elusive Unicorn and Data Hacker</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/statistical-significance-and-p-values-take-another-blow" target="_blank" rel="noopener">Statistical Significance and p-Values Take Another Blow</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/are-data-science-or-stats-curricula-in-us-too-specialized" target="_blank" rel="noopener">Are data science or stats curricula in US too specialized?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-do-you-identify-an-actual-data-scientist" target="_blank" rel="noopener">How do you identify an actual data scientist?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/is-it-still-possible-today-to-become-a-self-taught-data-scientist" target="_blank" rel="noopener">Is it still possible today to become a self-taught data scientist?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/will-the-job-outlook-for-data-scientists-severely-decline-after-2" target="_blank" rel="noopener">Will the job outlook for data scientists severely decline after 2020?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-logistic-regression-should-be-the-last-thing-you-learn-when-b" target="_blank" rel="noopener">Why Logistic Regression should be the last thing you learn</a></li>
</ul>
<p><em>Source for picture: <a href="https://storage.ning.com/topology/rest/1.0/file/get/3852503404?profile=original" target="_blank" rel="noopener">here</a> </em></p>New Perspective on Fermat's Last Theoremtag:www.analyticbridge.datasciencecentral.com,2020-01-30:2004291:BlogPost:3964002020-01-30T08:09:04.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>Fermat's last conjecture puzzled mathematicians for over 300 years, and was proved only in the 1990s. In this note, I propose a generalization that could actually lead to a much simpler proof and a more powerful result with broader applications, including solving numerous similar equations. As usual, my research involves a significant amount of computation and experimental math, as an exploratory step before stating new conjectures and eventually trying to prove them. The methodology is very similar to that used in data science, involving the following steps:</p>
<ol>
<li>Identify and process the data. Here the data set consists of all real numbers; it is infinite, which brings its own challenges. On the plus side, the data is public and accessible to everyone, though very powerful computation techniques are required, usually involving a distributed architecture. </li>
<li>Data cleaning: in this case, inaccuracies are caused by not using enough precision; the solution consists of finding better / faster algorithms for your computations, and sometimes having to work with exact arithmetic, using<span> </span><a href="https://www.datasciencecentral.com/forum/topics/question-how-precision-computing-in-python" target="_blank" rel="noopener">Bignum libraries</a>.</li>
<li>Sample data and perform exploratory analysis to identify patterns. Formulate hypotheses. Perform statistical tests to validate (or not) these hypotheses. Then formulate conjectures based on this analysis. </li>
<li>Build models (about how your numbers seem to behave) and focus on models offering the best fit. Perform simulations based on your model, see if your numbers agree with your simulations, by testing on a much larger set of numbers. Discard conjectures that do not pass these tests.</li>
<li>Formally prove or disprove retained conjectures, when possible. Then write a conclusion if possible: in this case, a new, major mathematical theorem, showing potential applications. This last step is similar to data scientists presenting the main insights of their analysis, to a layman audience.</li>
</ol>
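<p>The precision issue raised in the data cleaning step can be illustrated with a well-known near-miss of Fermat's equation. This is a sketch using an example that does not come from the article: with ordinary floating-point tolerance the two sides look equal, while Python's built-in arbitrary-precision integers show that they are not.</p>

```python
import math

# Well-known near-miss (an assumed example, not from the article):
# 3987**12 + 4365**12 is very close to, but not equal to, 4472**12.
a, b, c, n = 3987, 4365, 4472, 12

lhs = a**n + b**n   # exact: Python integers have arbitrary precision
rhs = c**n

# Comparison in floating point with a typical relative tolerance
looks_equal_in_floats = math.isclose(float(lhs), float(rhs), rel_tol=1e-9)

# Comparison in exact integer arithmetic
equal_exactly = lhs == rhs

# Relative size of the gap between the two sides
relative_gap = abs(lhs - rhs) / rhs

print(looks_equal_in_floats, equal_exactly, relative_gap)
```

<p>A numerical search that relies on the floating-point comparison would report this triple as a solution, which is why exact arithmetic matters when exploring such equations.</p>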
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3840031367?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3840031367?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><em>See <a href="https://www.datasciencecentral.com/profiles/blogs/new-perspective-on-fermat-s-last-theorem?xg_source=activity" target="_blank" rel="noopener">full article</a> for explanations about this table (representing the number of solutions)</em></p>
<p>The motivation in this article is two-fold:</p>
<ul>
<li>Presenting a new path that can lead to new interesting results and theoretical research in mathematics (yet my writing style and content are accessible to the layman).</li>
<li>Offering data scientists and machine learning / AI practitioners (including newbies) an interesting framework to test their programming, discovery and analysis skills, using a huge (infinite) data set that has been available to everyone since the beginning of time, and applied to a fascinating problem. </li>
</ul>
<p><em>Read full article <a href="https://www.datasciencecentral.com/profiles/blogs/new-perspective-on-fermat-s-last-theorem?xg_source=activity" target="_blank" rel="noopener">here</a>. For more math-oriented articles, visit <a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">this page</a> (check the math section), or download my books, available <a href="https://www.datasciencecentral.com/profiles/blogs/new-books-and-resources-for-dsc-members" target="_blank" rel="noopener">here</a>.</em></p>Best Languages for Data Science and Statistics in One Picturetag:www.analyticbridge.datasciencecentral.com,2020-01-29:2004291:BlogPost:3963922020-01-29T03:41:02.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><span>Hundreds of programming languages are used in the data science and statistics market, but Python, R, SAS and SQL are standouts. If you're looking to branch out and add a new programming language to your skill set, which one should you learn? This one picture breaks down the differences between the four languages.</span></p>
<p></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/3838310782?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3838310782?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p></p>
<p>View the full picture (with pluses and minuses) as well as related articles, <a href="https://www.datasciencecentral.com/profiles/blogs/best-languages-for-data-science-and-statistics-in-one-picture" target="_blank" rel="noopener">here</a>. </p>
<p>Below are more resources for specific languages, including comparisons between languages, and the same algorithms illustrated in different languages.</p>
<ul>
<li><a href="https://www.datasciencecentral.com/page/search?q=python" target="_blank" rel="noopener">Python</a></li>
<li><a href="https://www.datasciencecentral.com/page/search?q=python+vs+R" target="_blank" rel="noopener">Python vs R</a></li>
<li><a href="https://www.datasciencecentral.com/page/search?q=R" target="_blank" rel="noopener">R</a></li>
<li><a href="https://www.datasciencecentral.com/page/search?q=sql" target="_blank" rel="noopener">SQL</a></li>
<li><a href="https://www.datasciencecentral.com/page/search?q=sas" target="_blank" rel="noopener">SAS</a></li>
<li><a href="https://www.datasciencecentral.com/page/search?q=julia" target="_blank" rel="noopener">Julia</a></li>
<li><a href="https://www.datasciencecentral.com/page/search?q=scala" target="_blank" rel="noopener">Scala</a></li>
<li><a href="https://www.datasciencecentral.com/page/search?q=java" target="_blank" rel="noopener">Java</a></li>
<li><a href="https://www.datasciencecentral.com/page/search?q=c" target="_blank" rel="noopener">C</a></li>
<li><a href="https://www.datasciencecentral.com/page/search?q=matlab" target="_blank" rel="noopener">Matlab</a></li>
</ul>
<p>To quickly learn these languages or refresh your skills, check out our <a href="https://www.datasciencecentral.com/page/search?q=cheat+sheets" target="_blank" rel="noopener">cheat sheets</a>.</p>
<p></p>Quick Primer On Graph Data Structuretag:www.analyticbridge.datasciencecentral.com,2020-01-21:2004291:BlogPost:3967312020-01-21T17:12:58.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><span>While many programming libraries encapsulate the inner workings of graph and other algorithms, as a data scientist it helps a lot to be reasonably familiar with such details. A solid understanding of the intuition behind these algorithms not only helps in appreciating their logic but also helps in making conscious decisions about their applicability in real-life cases. There are several graph-based algorithms, and the most notable are the shortest path algorithms. Algorithms such as Dijkstra’s, Bellman-Ford, A*, Floyd-Warshall and Johnson’s are commonly encountered. While these algorithms are discussed in many textbooks and informative resources online, I felt that not many provided visual examples that illustrate the processing steps at a granularity fine enough to make the working details easy to understand. As such, I had to use simple enough graphs to visualize the algorithmic flow for my own understanding, and I wanted to share my examples along with the explanations through this article. Since there are many algorithms to illustrate, I decided to divide the article into several parts. In part 1, I have illustrated Dijkstra’s and Bellman-Ford algorithms. Before diving into the algorithms, I also wanted to highlight salient points on the graph data structure.</span></p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3829903896?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3829903896?profile=RESIZE_710x" class="align-center"/></a></p>
<p><strong>Content of this article</strong>:</p>
<ul>
<li>Quick Primer On Graph Data Structure</li>
<li>Dijkstra’s Algorithm</li>
<li>Bellman-Ford Algorithm</li>
<li>More Algorithms To Cover</li>
</ul>
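<p>As a small companion to the topics listed above, here is a minimal sketch of a weighted directed graph stored as an adjacency list, together with Dijkstra’s algorithm using a binary heap. The graph, its weights and the function name are made up for illustration and are not the examples from the full article.</p>

```python
import heapq

# Hypothetical graph as an adjacency list: node -> list of (neighbor, weight).
# Dijkstra's algorithm assumes all edge weights are non-negative.
graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 5)],
    "D": [],
}

def dijkstra(adjacency, source):
    """Return the shortest distance from source to every reachable node."""
    distances = {source: 0}
    heap = [(0, source)]                 # (distance so far, node)
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > distances.get(node, float("inf")):
            continue                     # stale entry: a shorter path was already found
        for neighbor, weight in adjacency[node]:
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return distances

shortest = dijkstra(graph, "A")
print(shortest)
```

<p>Graphs with negative edge weights need a different method, such as the Bellman-Ford algorithm covered in the article.</p>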
<p>Read the full article <a href="https://www.datasciencecentral.com/profiles/blogs/illustration-of-key-graph-based-shortest-path-algorithms" target="_blank" rel="noopener">here</a>. </p>
<p><em>Written by Murali Kashaboina, Tech. Executive, PhD Researcher AI/ML/DS, Data Scientist, Industry Speaker, Entrepreneur.</em></p>TensorFlow 1.x vs 2.x. – summary of changestag:www.analyticbridge.datasciencecentral.com,2020-01-09:2004291:BlogPost:3962502020-01-09T16:49:08.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>In 2019, Google announced TensorFlow 2.0, a major leap from the existing TensorFlow 1.x. The key differences are as follows:</p>
<p><strong>Ease of use:</strong><span> </span>Many old libraries (for example, tf.contrib) were removed, and some were consolidated. For example, in TensorFlow 1.x a model could be built using Contrib, layers, Keras or estimators; so many options for the same task confused many new users. TensorFlow 2.0 promotes TensorFlow Keras for model experimentation and Estimators for scaled serving, and the two APIs are very convenient to use.</p>
<p><strong>Eager Execution</strong>: In TensorFlow 1.x, writing code was divided into two parts: building the computational graph, and later creating a session to execute it. This was quite cumbersome, especially if the large model you had designed contained a small error somewhere near the beginning. In TensorFlow 2.0, eager execution is enabled by default: you no longer need to create a session to run the computational graph, and you can see the result of your code directly.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3810094427?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3810094427?profile=RESIZE_710x" class="align-center"/></a></p>
<p><strong>Model Building and deploying made easy:</strong> With TensorFlow 2.0 providing the high-level TensorFlow Keras API, the user has greater flexibility in creating a model. One can define a model using the Keras functional or sequential API. The TensorFlow Estimator API allows one to run a model on a local host or in a distributed multi-server environment without changing the model. Computational graphs are powerful in terms of performance; in TensorFlow 2.0 you can use the decorator<span> </span><strong>tf.function</strong><span> </span>so that the function block that follows is run as a single graph. This is done via the powerful AutoGraph feature of TensorFlow 2.0. This allows users to optimize the function and increase portability. And the best part is that you can write the function using natural Python syntax.</p>
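<p>The two execution styles described above can be sketched as follows, assuming TensorFlow 2.x is installed. The tensors and the function name <code>scaled_sum</code> are made up for illustration.</p>

```python
import tensorflow as tf

# Eager execution (the TensorFlow 2.x default): operations run immediately,
# with no session, and the result can be inspected right away.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0], [1.0]])
eager_result = tf.matmul(a, b)
print(eager_result.numpy())

# tf.function traces the Python function into a graph on first call.
@tf.function
def scaled_sum(x, y):
    """Hypothetical example function: twice the sum of the matrix product."""
    return tf.reduce_sum(tf.matmul(x, y)) * 2.0

graph_result = scaled_sum(a, b)
print(float(graph_result))
```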
<p><em>Read the full article <a href="https://www.datasciencecentral.com/profiles/blogs/tensorflow-1-x-vs-2-x-summary-of-changes" target="_blank" rel="noopener">here</a>. To access the author's books covering machine learning, Azure, Tensorflow, deep learning and related topics (free for DSC members), <a href="https://www.datasciencecentral.com/profiles/blogs/new-books-and-resources-for-dsc-members" target="_blank" rel="noopener">follow this link</a>. </em></p>The Next Big Thing in AI/ML is…tag:www.analyticbridge.datasciencecentral.com,2020-01-07:2004291:BlogPost:3961592020-01-07T14:41:30.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><strong><em>Summary:</em></strong><em> AI/ML itself is the next big thing for many fields if you’re on the outside looking in. But if you’re a data scientist it’s possible to see those advancements that will propel AI/ML to its next phase of utility.</em></p>
<p> </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3673028562?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3673028562?profile=RESIZE_710x" width="350" class="align-right"/></a>“The Next Big Thing in AI/ML is…” as the lead to an article is probably the most overused trope since “once upon a time”. Seriously, just how many ‘next big things’ can there be? Is your incredulity not stretched every time you read that?</p>
<p>It’s tempting to say that writers starting an article in this way should be flogged …except that yours truly did recently start one with “<a href="https://www.datasciencecentral.com/profiles/blogs/causality-the-next-most-important-thing-in-ai-ml"><em><u>the next most IMPORTANT thing in AI/ML</u></em></a>…” Well that’s clearly different isn’t it – almost.</p>
<p>If you label something ‘next big thing’ it’s evident you have a strong opinion – or your marketing department has no imagination. </p>
<p>First of all, if you’re on the outside of AI/ML looking in, AI/ML clearly is the next big thing. Most next-big-thing articles are actually in this category, explaining how AI/ML can enhance everything from your dating life to your investment portfolio.</p>
<p>But if you’re fortunate enough to be on the inside, as our readers are, then you know that the future of AI/ML is developing along many different paths, and some of those should be more important than others. Some are technical, some are applications, and some are even social or philosophical. So how do you tell what the next big thing is, or at least what the rankings should be?</p>
<p><em>Read the full article <a href="https://www.datasciencecentral.com/profiles/blogs/the-next-big-thing-in-ai-ml-is" target="_blank" rel="noopener">here</a>. For more recent articles about AI, <a href="https://www.datasciencecentral.com/page/search?q=ai" target="_blank" rel="noopener">follow this link</a>. </em></p>How exactly do you determine causation?tag:www.analyticbridge.datasciencecentral.com,2019-12-17:2004291:BlogPost:3958132019-12-17T21:30:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><em>Another good article by Ajit Jaokar. </em></p>
<p><strong><em>Correlation does not equal causation</em></strong><span> </span>– is a mantra drilled into every data scientist from an early age.</p>
<p>That’s fine. But very few talk about the follow-on question:</p>
<p><strong><em>How exactly do you determine causation?</em></strong></p>
<p>This problem is further compounded because most books and examples are based on standard datasets (e.g., Boston, Iris). These examples do not discuss causation because the features chosen are already assumed to be causal (e.g., the factors affecting house prices are chosen to be causal). So, if we start from the beginning (without simplified examples), how do you know whether a particular variable is a causal variable?</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3774867428?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3774867428?profile=RESIZE_710x" class="align-center"/></a></p>
<p>Firstly, causality cannot be determined from data alone. Data gives correlation, but data alone cannot determine causation. To determine causation, we need to perform an<span> </span><strong>experiment or a controlled study</strong>.</p>
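<p>A small simulation can make this point concrete. The sketch below is illustrative only and is not taken from the article: the hidden confounder <code>z</code>, the noise levels, and the function names are all assumptions. In the observational setting, <code>x</code> and <code>y</code> are strongly correlated even though <code>x</code> has no effect on <code>y</code>; once <code>x</code> is assigned at random, as in a controlled experiment, the correlation essentially vanishes.</p>

```python
import random


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def observational(n, rng):
    """Hypothetical setting: a hidden confounder z drives both x and y.
    x has NO causal effect on y."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        xs.append(z + rng.gauss(0, 0.5))
        ys.append(z + rng.gauss(0, 0.5))
    return xs, ys


def randomized(n, rng):
    """Same world, but the experimenter assigns x at random,
    which breaks the link between x and the confounder z."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        xs.append(rng.gauss(0, 1))        # x set independently of z
        ys.append(z + rng.gauss(0, 0.5))  # y still depends only on z
    return xs, ys


rng = random.Random(0)
obs_r = correlation(*observational(5000, rng))
exp_r = correlation(*randomized(5000, rng))
print(f"observational correlation: {obs_r:.2f}")
print(f"randomized correlation:    {exp_r:.2f}")
```

<p>The data alone cannot distinguish "x causes y" from "z causes both"; only the randomized assignment does.</p>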
<p>Read the full article <a href="https://www.datasciencecentral.com/profiles/blogs/correlation-does-not-equal-causation-but-how-exactly-do-you/" target="_blank" rel="noopener">here</a>. For other articles on this topic, <a href="https://www.datasciencecentral.com/page/search?q=causation" target="_blank" rel="noopener">follow this link</a>. Other relevant articles include:</p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-lie-with-p-values" target="_blank" rel="noopener">How to Lie with P-values</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/six-degrees-of-separation-between-any-two-data-sets" target="_blank" rel="noopener">Six Degrees of Separation Between Any Two Data Sets</a></li>
<li><a href="https://www.analyticbridge.datasciencecentral.com/profiles/blogs/the-curse-of-big-data" target="_blank" rel="noopener">The curse of Big Data</a></li>
<li>Chapter 27 (about strong correlation) <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning" target="_blank" rel="noopener">in this book</a></li>
<li>Pages 165-166 <a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-book" target="_blank" rel="noopener">in this book</a></li>
</ul>
<p></p>Rule of thumb: Which AI / ML algorithms to applytag:www.analyticbridge.datasciencecentral.com,2019-12-17:2004291:BlogPost:3958102019-12-17T16:00:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><em>Written by Ajit Jaokar.</em></p>
<p>Firstly, there are three broad categories of algorithms:</p>
<ul>
<li><strong>Supervised learning:</strong><span> </span>You know how to classify the input data and the type of behavior you want to predict, but you need the algorithm to calculate it for you on new data</li>
<li><strong>Unsupervised learning:</strong><span> </span>You do not know how to classify the data, and you want the algorithm to find patterns and classify the data for you</li>
<li><strong>Reinforcement learning:</strong><span> </span>An algorithm which learns by trial and error by interacting with the environment. You use it when you don’t have a lot of training data; you cannot clearly define the ideal end state; or the only way to learn about the environment is to interact with it</li>
</ul>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3774522423?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3774522423?profile=RESIZE_710x" class="align-center"/></a></p>
<p>So, let us consider which algorithms can apply to business problems.</p>
<p><strong>1. Customer services and supply chain</strong></p>
<ul>
<li>Understand product-sales drivers such as competition prices, distribution, advertisement, etc.:<span> </span><strong>linear regression</strong></li>
<li>Optimize price points and estimate product-price elasticities:<span> </span><strong>linear regression</strong></li>
<li>Classify customers based on how likely they are to repay a loan:<span> </span><strong>logistic regression</strong></li>
<li>Predict client churn:<span> </span><strong>linear/quadratic discriminant analysis</strong></li>
<li>Predict a sales lead’s likelihood of closing:<span> </span><strong>linear/quadratic discriminant analysis</strong></li>
<li>Detect a company logo in social media to better understand joint marketing opportunities (e.g., pairing of brands in one product):<span> </span><strong>convolutional neural networks</strong></li>
<li>Understand customer brand perception and usage through images:<span> </span><strong>convolutional neural networks</strong></li>
</ul>
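<p>As an illustration of one pairing from the list above (loan repayment with logistic regression), here is a minimal sketch in plain Python. The feature (a debt-to-income ratio scaled to 0–1), the ten data points, and the learning-rate settings are all made up for the example; a real project would more likely use an established library and many more features.</p>

```python
import math


def sigmoid(t):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent
    on the average log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical training data: debt-to-income ratio and whether the loan was repaid (1) or not (0).
ratios = [0.1, 0.2, 0.25, 0.3, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9]
repaid = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]

w, b = fit_logistic(ratios, repaid)


def repay_probability(ratio):
    """Estimated probability that a customer with this ratio repays."""
    return sigmoid(w * ratio + b)


for r in (0.1, 0.5, 0.9):
    print(f"ratio={r:.1f} -> estimated repayment probability {repay_probability(r):.2f}")
```

<p>Customers can then be classified by thresholding the estimated probability, for example at 0.5.</p>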
<p><em>To read the full article featuring other applications, including in healthcare and trading, <a href="https://www.datasciencecentral.com/profiles/blogs/rule-of-thumb-which-ai-ml-algorithms-to-apply-to-business-1" target="_blank" rel="noopener">follow this link</a>. For other articles by Ajit Jaokar, <a href="https://www.datasciencecentral.com/profiles/blog/list?user=32ac9fc41n4f4" target="_blank" rel="noopener">visit this webpage</a>. Details about these algorithms can be found <a href="https://www.datasciencecentral.com/page/search?q=algorithm" target="_blank" rel="noopener">here</a>. </em></p>Statistics for Data Science in One Picturetag:www.analyticbridge.datasciencecentral.com,2019-12-13:2004291:BlogPost:3957052019-12-13T01:30:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>There's no doubt about it, probability and statistics is an enormous field, encompassing topics from the familiar (like the average) to the complex (regression analysis, correlation coefficients and hypothesis testing to name but a few). If you want to be a great data scientist, you have to know some basic statistics. The following picture shows which statistics topics you must know if you're going to excel in data science.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3767565869?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3767565869?profile=RESIZE_710x" class="align-center"/></a></p>
<p>Read the full article <a href="https://www.datasciencecentral.com/profiles/blogs/statistics-for-data-science-in-one-picture" target="_blank" rel="noopener">here</a>. For more concepts explained in one picture, follow <a href="https://www.datasciencecentral.com/page/search?q=in+one+picture" target="_blank" rel="noopener">this link</a>. For articles about statistical and machine learning concepts explained in simple English, from the same author, follow <a href="https://www.datasciencecentral.com/page/search?q=in+simple+english" target="_blank" rel="noopener">this link</a>. Or to download a book featuring many of these resources, click <a href="https://www.datasciencecentral.com/profiles/blogs/online-encyclopedia-of-statistical-science-free-1" target="_blank" rel="noopener">here</a> (free, but available to DSC members exclusively.)</p>
<p><strong>From our Sponsors</strong></p>
<ul>
<li><a href="https://dsc.news/34h27EX" target="_blank" rel="noopener">Future-proof your path to Enterprise AI</a> - Dataiku 6 Webinar Recording</li>
</ul>
<p></p>On Being a 50 Year Old Data Scientisttag:www.analyticbridge.datasciencecentral.com,2019-12-10:2004291:BlogPost:3955862019-12-10T18:51:04.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>At the time of writing, I'm a 52 year-old working in the fields of mathematics and data science. In mathematics, that makes me well-seasoned (and probably well-tenured, if I had chosen to continue in academia). In data science, some would consider me a dinosaur. In fact, many older people considering a career in data science might be put off by the thought that data science is tough to break into at a later age. But is that statement true? Should the over 50 crowd put down their textbooks and pick up their gardening tools?</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3764064994?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3764064994?profile=RESIZE_710x" class="align-center"/></a></p>
<p><strong>Is Math a Young Person's Game? Maybe</strong></p>
<p>As for the mathematics portion of my career, I didn't become a mathematician until I was in my mid-thirties. Before that I dabbled in whatever venture brought in a few bob to feed the kids: computer operator, eBay entrepreneur, aviation electrician. I was 36 when I decided to go back to school to get my master's. If Alfred Adler<span> </span>is to be believed, my "mathematical life" had already long passed by the time I graduated:</p>
<p>“Work rarely improves after the age of twenty-five or thirty. If little has been accomplished by then, little will ever be accomplished.” </p>
<p>Read the full article by Stephanie Glen, <a href="https://www.datasciencecentral.com/profiles/blogs/on-being-a-50-year-old-data-scientist" target="_blank" rel="noopener">here</a>. For other articles by Stephanie Glen, <a href="https://www.datasciencecentral.com/profiles/blog/list?user=0lahn4b4odglr" target="_blank" rel="noopener">follow this link</a>. </p>
<p><strong>Sponsored Announcement</strong></p>
<ul>
<li><span>Be Indispensable With a Master’s in Data Analytics. As technology and the marketplace change constantly, you want the skills to thrive. The UCLA Anderson Master of Science in Business Analytics is a 13-month program that will give you the tools to become a leader in this rapidly evolving field. Read more <a href="https://dsc.news/2KTpz3V" target="_blank" rel="noopener">here</a>. </span></li>
</ul>How are Deep Neural Networks Adding to Advantage in Climate Change Studies?tag:www.analyticbridge.datasciencecentral.com,2019-12-03:2004291:BlogPost:3955742019-12-03T06:24:32.000ZDivyesh Aegishttps://www.analyticbridge.datasciencecentral.com/profile/DivyeshAegis
<p align="justify">Imagine yourself relocating to a more industrial place for living because your beach house was washed away in the tide. In another scenario imagine yourself wearing masks throughout the year. What if I say that, all this that you just imagined could be a reality very soon?</p>
<p> </p>
<p align="justify"><span>Whether you choose to believe it or not, climate change is real. Even though you might not be able to spot much of its impact around you, the world is set to change in the coming years, whether through rising oceans that submerge low-lying areas or through the global rise in temperature. The world as we know it today is going to change drastically as a consequence of our actions. </span></p>
<p> </p>
<p align="justify"><span>Look at any newspaper around you. Not a single day goes by without news of new issues arising on account of climate change. The fact is that it has shaken the world to its core. <a href="https://climate.nasa.gov/causes/">Research at NASA</a> suggests that the leading cause of rapid climate change is the emission of greenhouse gases. This phenomenon is termed global warming, and statistics suggest that there is a 95 percent probability that human actions in the past 50 years are responsible for warming up the planet.</span></p>
<p> </p>
<p><strong>Why did Researchers consider AI’s deep neural networks in climate change studies?</strong></p>
<p> </p>
<p align="justify"><span>Climate change has, therefore, become more than just an impending issue. It is an existential crisis. But all is not lost. Researchers all across the world are coming up with new ideas and concepts that harness the best of technology to battle climate change. One of the most advanced and resourceful technologies being used in climate change studies is Artificial Intelligence. AI, along with subfields like machine learning, is helping scientists and practitioners come up with insightful information and predictions about climate change. These are then studied further by experts who can collaborate with governments to take action and form plans to preserve our home planet.</span></p>
<p> </p>
<p align="justify"><span>Before scientists realized the potential of machine learning for climate change, predictions relied largely on physics-based models. Most of these followed bottom-up approaches and made predictions based only on physical boundary conditions. On one hand, there were General Circulation Models (GCMs), <a href="https://www.nexsoftsys.com/technologies/python-development-services.html" target="_blank" rel="noopener">software developed</a> from numerical representations of physical conditions in the atmosphere. On the other hand, there were Earth System Models, which also considered features like biogeochemical cycling and atmospheric chemistry. Even though some of the advanced GCMs were used for climate change studies, they suffered from significant prediction errors. </span></p>
<p> </p>
<p align="justify"><span>That’s when researchers started considering the potential of deep neural networks in climate change studies. To understand what a deep neural network means, let’s take a look at a simple neural network. </span></p>
<p> </p>
<p align="justify"><span>A simple neural network can have two inputs, just like any other model. The differentiating factor is the hidden layer, which can have, let’s say, two neurons. These neurons loosely imitate the functioning of a human brain, and each carries a set of weights used for computation. Each input to a neuron is multiplied by its corresponding weight. The products are then summed and passed through an activation function, and the result of that activation is the neuron’s output. That’s how you design a simple neural network. </span></p>
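<p align="justify">The computation described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the article: the weight values are arbitrary, the sigmoid is just one common choice of activation function, and the bias terms are an assumption added for completeness.</p>

```python
import math


def sigmoid(t):
    """One common activation function; squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def neuron(inputs, weights, bias):
    """Multiply each input by its weight, sum the products, then activate."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)


def forward(inputs, hidden_params, output_params):
    """Two inputs -> hidden layer -> single output neuron."""
    hidden = [neuron(inputs, weights, bias) for weights, bias in hidden_params]
    out_weights, out_bias = output_params
    return neuron(hidden, out_weights, out_bias)


# Arbitrary illustrative parameters: two hidden neurons, each with two weights and a bias.
hidden_params = [([0.5, -0.3], 0.1), ([0.8, 0.2], -0.4)]
output_params = ([1.0, -1.5], 0.2)

prediction = forward([1.0, 2.0], hidden_params, output_params)
print(f"network output: {prediction:.3f}")
```

<p align="justify">Training consists of adjusting the weights and biases so that the output matches known examples; the forward pass itself stays the same.</p>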
<p> </p>
<p align="justify"><span>A deep neural network works the same way; the only difference is the number of hidden layers and the weights attached to them. The more hidden layers you add, the more complex, or deep, your model becomes. At this point, it is fair to wonder how any of this helps model the data and aids climate change studies. </span></p>
<p> </p>
<p align="justify"><span>Understand it this way. Hidden layers are the magic of deep neural networks and provide the discriminative power needed to separate your training data. Take a simple XOR function as an example. A single-layer neural network cannot produce the two disjoint decision boundaries that an XOR function requires. Adding a hidden layer, however, supplies the needed complexity. You can increase the number of neurons in a particular layer or increase the number of layers. Either way, you increase the complexity of the system. While adding neurons tends to decrease the training error, it can also reduce how well the model generalizes, which is a crucial property. Similarly, the more hidden units and layers you add to your model, the more complex the decision surfaces it can learn.</span></p>
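<p align="justify">The XOR example can be checked directly. The sketch below uses hand-picked weights and a simple step activation (both are assumptions chosen for clarity; a trained network would learn its own weights). It first shows that a network with two hidden neurons reproduces XOR, then searches a grid of weights for a single neuron and finds none that does.</p>

```python
from itertools import product

XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def step(t):
    """Step activation: the neuron fires (1) only if its weighted sum is positive."""
    return 1 if t > 0 else 0


def xor_network(x1, x2):
    """Two hidden neurons with hand-picked weights, then one output neuron."""
    h_or = step(x1 + x2 - 0.5)         # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)        # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)    # "OR but not AND" is XOR


def count_single_neuron_solutions():
    """Try every (w1, w2, b) on a grid from -4.0 to 4.0 for a single neuron
    with no hidden layer, and count how many reproduce the XOR table."""
    grid = [i * 0.5 for i in range(-8, 9)]
    solutions = 0
    for w1, w2, b in product(grid, repeat=3):
        if all(step(w1 * x1 + w2 * x2 + b) == y
               for (x1, x2), y in XOR_TABLE.items()):
            solutions += 1
    return solutions


for (x1, x2), expected in XOR_TABLE.items():
    print(f"xor_network({x1}, {x2}) = {xor_network(x1, x2)} (expected {expected})")
print("single-neuron solutions found on the grid:", count_single_neuron_solutions())
```

<p align="justify">The grid search is only a finite check, but the result holds in general: XOR is not linearly separable, so no single neuron of this kind can represent it.</p>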
<p> </p>
<p align="justify"><span>For climate change studies this can be a miracle. Researchers have multiple factors that they’d like to add as inputs. After all, you can’t just make a decision about global warming based on an increase in the number of vehicles on the road. The decision is far more complex. When <a href="https://www.nytimes.com/interactive/2019/10/29/climate/coastal-cities-underwater.html">researchers say</a> that rising seas will erase some of the greatest cities by 2050, they don’t just have one or two input parameters. There are billions of complex calculations involved behind these predictions. Deep neural networks help researchers take a wide number of parameters into account and produce a decision boundary that sufficiently encompasses these inputs.</span></p>
<p> </p>
<p><strong>Neural Network Examples in Climate Change Studies</strong></p>
<p> </p>
<p align="justify"><span>Discovery of sustainable materials: One of the greatest crises in the world is dealing with the large amounts of material on the planet that are neither degradable nor recyclable. All of this amounts to garbage. But with deep neural networks in the picture, researchers have an edge in coming up with highly optimized molecular structures that are more sustainable and energy-efficient. Most people think that cotton bags are the ultimate solution when it comes to climate change. But they seldom realize that producing a cotton bag consumes more resources than producing one from other materials. Generative networks, when used with predictive or recurrent neural network models, can propose molecular designs with targeted properties.</span></p>
<p> </p>
<p align="justify"><span>Precision Monitoring of Regions: Another field where neural networks can assist through precision modeling is the monitoring of crops and forest regions. Convolutional neural networks, along with computer vision, can help segment satellite imagery and make predictions about droughts, fires, destruction or negative crop outcomes. They can also help identify the type of pests present, so that a suitable pest control solution can be chosen.</span></p>
<p> </p>
<p><strong>Conclusion</strong></p>
<p> </p>
<p align="justify"><span>The impact of neural networks on climate change studies is huge. With historic and existing data, scientists are finally able to take a peek into the future. Even if the future doesn’t look very bright, AI and ML models can be used to make world leaders aware of what lies ahead. That will help them form the right strategies and take control measures in whatever time we have left. </span></p>
<p> </p>Variance, Attractors and Behavior of Chaotic Statistical Systemstag:www.analyticbridge.datasciencecentral.com,2019-11-29:2004291:BlogPost:3957632019-11-29T09:30:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><span>We study the properties of a typical chaotic system to derive general insights that apply to a large class of unusual statistical distributions. The purpose is to create a unified theory of these systems. These systems can be deterministic or random, yet due to their gentle chaotic nature, they exhibit the same behavior in both cases. They lead to new models with numerous applications in Fintech, cryptography, simulation and benchmarking tests of statistical hypotheses. They are also related to numeration systems. One of the highlights in this article is the discovery of a simple variance formula for an infinite sum of highly correlated random variables. We also try to find and characterize attractor distributions: these are the limiting distributions for the systems in question, just like the Gaussian attractor is the universal attractor with finite variance in the central limit theorem framework. Each of these systems is governed by a specific functional equation, typically a stochastic integral equation whose solutions are the attractors. This equation helps establish many of their properties. The material discussed here is state-of-the-art and original, yet presented in a format accessible to professionals with limited exposure to statistical science. Physicists, statisticians, data scientists and people interested in signal processing, chaos modeling, or dynamical systems will find this article particularly interesting. Connection to other similar chaotic systems is also discussed. </span></p>
<p>Read the full article <a href="https://www.datasciencecentral.com/profiles/blogs/chaos-attractors-in-machine-learning-systems" target="_blank" rel="noopener">here</a>. </p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746624910?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746624910?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p><span><strong>Content of this article</strong></span></p>
<p>1. The Geometric System: Definition and Properties</p>
<ul>
<li>A test for independence</li>
<li>Connection to the Fixed-Point Theorem</li>
</ul>
<p>2. Geometric and Uniform Attractors</p>
<ul>
<li>General formula</li>
<li>The geometric attractor</li>
<li>Not any distribution can be an attractor</li>
<li>The uniform attractor</li>
</ul>
<p>3. Discrete <em>X</em> Resulting in a Gaussian-looking Attractor</p>
<ul>
<li>Towards a numerical solution</li>
</ul>
<p>4. Special Cases with Continuous Distribution for <em>X</em></p>
<ul>
<li>An almost perfect equality</li>
<li>Is the log-normal distribution an attractor?</li>
</ul>
<p>5. Connection to Binary Digits and Singular Distributions</p>
<ul>
<li>Numbers made up of random digits</li>
<li>Singular distributions</li>
<li>Connection to Infinite Random Products</li>
</ul>
<p>6. A General Classification of Chaotic Statistical Distributions</p>
<p><em>Read the full article <a href="https://www.datasciencecentral.com/profiles/blogs/chaos-attractors-in-machine-learning-systems" target="_blank" rel="noopener">here</a>. </em></p>A Lesson in Using NLP for Hidden Feature Extractiontag:www.analyticbridge.datasciencecentral.com,2019-11-29:2004291:BlogPost:3956562019-11-29T05:00:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><strong><em>Summary:</em></strong><em> 99% of our application of NLP has to do with chatbots or translation. This is a very interesting story about expanding the bounds of NLP and feature creation to predict bestselling novels. The authors created over 20,000 NLP features, about 2,700 of which proved to be predictive with a 90% accuracy rate in predicting NYT bestsellers.</em></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3515945869?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3515945869?profile=RESIZE_710x" width="300" class="align-right"/></a>It’s a pretty rare individual who hasn’t had a personal experience with NLP (Natural Language Processing). About 99% of those experiences are in the form of chatbots or translators, either text or speech in, and text or speech out.</p>
<p>This has proved to be one of the hottest and most economically valuable applications of deep learning but it’s not the whole story.</p>
<p>I recently picked up a copy of a 2016 book entitled<span> </span><em>“The Bestseller Code – Anatomy of the Blockbuster Novel”</em><span> </span>which promised a story about using NLP and machine learning to predict which US fiction novels would make the New York Times Best Sellers list and which would not.</p>
<p>There are about 55,000 new works of fiction published each year (and that doesn’t count self-published titles). Fewer than 0.5%, or about 200 to 220, make the NYT Bestseller list in a year. Only 3 or 4 of those will sell more than a million copies.</p>
<p>The authors, Jodie Archer (background in publishing), and Matt Jockers (cofounder of the Stanford Literary Lab) write about their model which has an astounding 90% success rate in predicting which books will make the NYT list using a corpus of 5,000 novels from the last 30 years which included 500 NYT Bestsellers.</p>
<p>The book, which I heartily recommend, is not a data science book, nor is it a how-to-write-a-bestseller. And while it has elements of both it’s mostly reporting about the most interesting finds among the 20,000 extracted features they developed, about 2,800 of which proved to be predictive. More on that later.</p>
<p>What struck me was the potential this field of ‘stylometrics’ has for extracting hidden features for almost any problem which has a large amount of text as one of its data sources. Could be CSR logs of customer interaction, could be doctor’s notes, blogs, or warranty repair descriptions where we’re really only scratching the surface with word clouds and sentiment analysis.</p>
<p></p>
<p><em>Read full article <a href="https://www.datasciencecentral.com/profiles/blogs/nlp-picks-bestsellers-a-lesson-in-using-nlp-for-hidden-feature-ex" target="_blank" rel="noopener">here</a>.</em></p>New Family of Generalized Gaussian Distributionstag:www.analyticbridge.datasciencecentral.com,2019-11-28:2004291:BlogPost:3957602019-11-28T06:14:46.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><span>In this article, we explore a new type of generalized univariate normal distributions that satisfies useful statistical properties, with interesting applications. This new class of distributions is defined by its characteristic function, and applications are discussed in the last section. These distributions are semi-stable (we define what this means below). In short it is a much wider class than the stable distributions</span><span> (the only stable distribution with a finite variance being the Gaussian one) and it encompasses all stable distributions as a subset. It is a sub-class of the divisible distributions. </span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/3744926698?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3744926698?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p><strong>Content of this article:</strong></p>
<ul>
<li>New two-parameter distribution <em>G</em>(<em>a</em>, <em>b</em>): introduction, properties</li>
<li>Generalized central limit theorem</li>
<li>Characteristic function</li>
<li>Density: special cases, moments, mathematical conjecture</li>
<li>Simulations</li>
<li>Weakly semi-stable distributions</li>
<li>Counter-example</li>
<li>Applications and conclusions</li>
</ul>
<p><em>Read the full article <a href="https://www.datasciencecentral.com/profiles/blogs/new-family-of-generalized-gaussian-distributions" target="_blank" rel="noopener">here</a>. </em></p>
<p></p>10 Machine Learning Methods that Every Data Scientist Should Knowtag:www.analyticbridge.datasciencecentral.com,2019-11-27:2004291:BlogPost:3957572019-11-27T17:58:33.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p id="a572" class="nj nk eo ao nl b nm nn no np nq nr ns nt nu nv nw">Machine learning is a hot topic in research and industry, with new methodologies developed all the time. The speed and complexity of the field makes keeping up with new techniques difficult even for experts — and potentially overwhelming for beginners.</p>
<p id="0d4d" class="nj nk eo ao nl b nm nn no np nq nr ns nt nu nv nw">To demystify machine learning and to offer a learning path for those who are new to the core concepts, let’s look at ten different methods, including simple descriptions, visualizations, and examples for each one.</p>
<p id="64a5" class="nj nk eo ao nl b nm nn no np nq nr ns nt nu nv nw">A machine learning algorithm, also called a model, is a mathematical expression that represents data in the context of a problem, often a business problem. The aim is to go from data to insight. For example, if an online retailer wants to anticipate sales for the next quarter, they might use a machine learning algorithm that predicts those sales based on past sales and other relevant data. Similarly, a windmill manufacturer might visually monitor important equipment and feed the video data through algorithms trained to identify dangerous cracks.</p>
<p class="nj nk eo ao nl b nm nn no np nq nr ns nt nu nv nw"><a href="https://storage.ning.com/topology/rest/1.0/file/get/3744174486?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3744174486?profile=RESIZE_710x" class="align-center"/></a></p>
<p id="00c2" class="nj nk eo ao nl b nm nn no np nq nr ns nt nu nv nw">The ten methods described offer an overview — and a foundation you can build on as you hone your machine learning knowledge and skill:</p>
<ol class="">
<li id="b886" class="nj nk eo ao nl b nm nn no np nq nr ns nt nu nv nw nx ny nz">Regression</li>
<li id="2763" class="nj nk eo ao nl b nm ob no oc nq od ns oe nu of nw nx ny nz">Classification</li>
<li id="54dd" class="nj nk eo ao nl b nm ob no oc nq od ns oe nu of nw nx ny nz">Clustering</li>
<li id="c007" class="nj nk eo ao nl b nm ob no oc nq od ns oe nu of nw nx ny nz">Dimensionality Reduction</li>
<li id="1af1" class="nj nk eo ao nl b nm ob no oc nq od ns oe nu of nw nx ny nz">Ensemble Methods</li>
</ol>
<p><em>Read the rest of the list, with description for all the 10 algorithms, <a href="https://www.datasciencecentral.com/profiles/blogs/10-machine-learning-methods-that-every-data-scientist-should-know" target="_blank" rel="noopener">here</a>. </em></p>10 Visualizations Every Data Scientist Should Knowtag:www.analyticbridge.datasciencecentral.com,2019-11-12:2004291:BlogPost:3954782019-11-12T17:00:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><em>This article is by Jorge Castañón, Ph.D., Senior Data Scientist at the IBM Machine Learning Hub.</em></p>
<p id="5920" class="ni nj en ao nk b nl nm nn no np nq nr ns nt nu nv">Data visualization plays two key roles:</p>
<p id="085d" class="ni nj en ao nk b nl nm nn no np nq nr ns nt nu nv">1.<span> </span><em class="op">Communicating results clearly to a general audience.</em></p>
<p id="c440" class="ni nj en ao nk b nl nm nn no np nq nr ns nt nu nv">2.<span> </span><em class="op">Organizing a view of data that suggests a new hypothesis or a next step in a project.</em></p>
<p id="f14e" class="ni nj en ao nk b nl nm nn no np nq nr ns nt nu nv">It’s no surprise that most people prefer visuals to large tables of numbers. That’s why clearly labeled plots with meaningful interpretation always make it to the front of academic papers.</p>
<p class="ni nj en ao nk b nl nm nn no np nq nr ns nt nu nv"><a href="https://storage.ning.com/topology/rest/1.0/file/get/3709852824?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3709852824?profile=RESIZE_710x" class="align-center"/></a></p>
<p id="6028" class="ni nj en ao nk b nl nm nn no np nq nr ns nt nu nv">This post looks at the 10 visualizations you can bring to bear on your data — whether you want to convince the wider world of your theories or crack open your own project and take the next step:</p>
<ol class="">
<li id="53c6" class="ni nj en ao nk b nl nm nn no np nq nr ns nt nu nv oq or os">Histograms</li>
<li id="ddc7" class="ni nj en ao nk b nl ot nn ou np ov nr ow nt ox nv oq or os">Bar/Pie charts</li>
<li id="6fcc" class="ni nj en ao nk b nl ot nn ou np ov nr ow nt ox nv oq or os">Scatter/Line plots</li>
<li id="3613" class="ni nj en ao nk b nl ot nn ou np ov nr ow nt ox nv oq or os">Time series</li>
<li id="6263" class="ni nj en ao nk b nl ot nn ou np ov nr ow nt ox nv oq or os">Relationship maps</li>
<li id="c7df" class="ni nj en ao nk b nl ot nn ou np ov nr ow nt ox nv oq or os">Heat maps</li>
<li id="d07c" class="ni nj en ao nk b nl ot nn ou np ov nr ow nt ox nv oq or os">Geo Maps</li>
<li id="8f76" class="ni nj en ao nk b nl ot nn ou np ov nr ow nt ox nv oq or os">3-D Plots</li>
<li id="3965" class="ni nj en ao nk b nl ot nn ou np ov nr ow nt ox nv oq or os">Higher-Dimensional Plots</li>
<li id="ec17" class="ni nj en ao nk b nl ot nn ou np ov nr ow nt ox nv oq or os">Word clouds</li>
</ol>
<p>Read the full article, with descriptions and illustrations for these visualizations, <a href="https://www.datasciencecentral.com/profiles/blogs/10-visualizations-every-data-scientist-should-know" target="_blank" rel="noopener">here</a>.</p>Python for Automating Your Quality Analysistag:www.analyticbridge.datasciencecentral.com,2019-11-08:2004291:BlogPost:3951932019-11-08T06:00:00.000ZDivyesh Aegishttps://www.analyticbridge.datasciencecentral.com/profile/DivyeshAegis
<p align="justify">Analyzing the quality of your software is crucial to any business. The process comes towards the end of the software development lifecycle, yet it decides the fate of the product. In quality analysis, the actual output of the software is compared with its expected output. A variety of test inputs are used so that the results show where the product really stands. </p>
<p> </p>
<p align="justify">Quality testing is fundamental to business growth, even if it doesn’t appear that way. When you build a product, several developers are usually involved, each with their own coding style. As multiple modules are combined, there is a strong possibility that bugs will appear in the result. If you hand the product over to your client or sell it to your customers in this state, chances are they will ask for their money back. </p>
<p> </p>
<p align="justify">On the other hand, quality analysis ensures that the software product delivered to your customer matches their expectations. It not only shows diligence in your work but also portrays your brand as authentic. There are many ways to accomplish quality testing for your software. However, <a href="https://www.nexsoftsys.com/technologies/python-development-services.html" target="_blank" rel="noopener">using Python development for the task</a> is one of the best practices and can deliver superior results. </p>
<p align="justify"></p>
<p align="justify"><a href="https://storage.ning.com/topology/rest/1.0/file/get/3706679078?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3706679078?profile=RESIZE_710x" class="align-center"/></a></p>
<p> </p>
<p><strong>The Need for Automating Quality Analysis</strong></p>
<p> </p>
<p align="justify"><a href="https://www.macadamian.com/learn/creating-a-strategy-for-software-quality-testing/">Quality analysis</a> of your software product can be done in two ways: manually or through automation. While organizations have long preferred the manual method, things are beginning to change. Manual quality analysis has several disadvantages that its automated version can overcome. But whichever method companies use, it is crucial to perform a quality analysis, no matter how confident they are about their product. </p>
<p> </p>
<p align="justify">When software is created, it is prone to a lot of bugs. However, that is not a disaster if you are willing to test it and fix the bugs that appear. Whether a bug comes from a developer’s coding style, missing parameters, or the integration of different modules, quality analysis helps to find it. Once found, it can be sent back to the developers for fixing. The point is that the entire QA process assures the delivery of quality products to clients or customers. </p>
<p> </p>
<p>Similarly, quality analysis ensures that your business is on the right path to growth, and you do not end up with a bunch of unhappy customers. When you deliver flawless quality products to your customers, they are bound to come back to you with more requirements. This way, you get more work opportunities and develop a reputation in the market. </p>
<p> </p>
<p align="justify">Even though you might not see a direct relationship, quality analysis helps to spread positive word of mouth among your customers. The point is that no matter what you deliver, customers are going to talk about it. So, it’s better to offer them the highest quality that you can, so that what spreads by word of mouth is nothing but success stories. </p>
<p><strong>Difference between Automation and Manual Quality Analysis</strong></p>
<p> </p>
<p align="justify">While some companies say that manual testing is the best way to ensure high product quality, others rely on automation for the job. Both have their <a href="https://www.macadamian.com/learn/7-ways-learning-python-will-improve-software-testing/">pros and cons</a>. Automating all test cases might sound like a perfect solution, but it isn’t. You can fire a thousand inputs and test an entire application within seconds, yet software applications are built for humans, not machines. As humans, we interact with apps in many different ways, and this must be considered during quality analysis. The most important bugs are often found when interacting with the application manually.</p>
<p> </p>
<p align="justify">Running a script over and over again does not reveal the bugs that arise from the usability of an application. But, with advancements in technology, the performance of a software application can be easily tested, and quality can be assured. Many organizations believe that testing is only manual, but most of the technical and repetitive work can be automated using Python scripts. One of the most efficient ways to ensure quality is to automate some scenarios at the unit test level, some at the API level, and some at the UI level, while testing other scenarios manually.</p>
<p></p>
<p>When it comes to automating quality analysis, Python is a strong choice. It is not just a robust programming language, but also one of the most flexible and easy to use, which helps to ensure high-quality software applications when it is used in QA.</p>
<p></p>
<p><strong>Using ‘Pytest’ </strong></p>
<p align="justify"></p>
<p align="justify">One of the advantages of using Python for quality assurance is that it offers a plethora of relevant frameworks for the task. ‘Pytest’ is one of the most popular Python testing frameworks. It helps in testing almost anything, from basic scripts to databases and APIs. It also lets you test UIs, lending a helping hand to manual testing. </p>
<p align="justify"></p>
<p align="justify">Pytest can be easily installed from PyPI using the command ‘pip install pytest’. Once installed, it can be run from the project directory with the ‘pytest’ command (older versions used ‘py.test’). By default, Pytest searches the current directory and its subdirectories recursively. Any file named ‘test_*.py’ or ‘*_test.py’ is treated as a test file. </p>
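<p align="justify">As a minimal sketch of what such a test file might look like, consider the example below. The file name ‘test_pricing.py’ and the function ‘apply_discount’ are hypothetical and used only for illustration; they do not come from any real project.</p>

```python
# test_pricing.py: hypothetical file name; the "test_" prefix lets pytest discover it


def apply_discount(price, percent):
    """Hypothetical function under test: reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def test_apply_discount_reduces_price():
    # Functions whose names start with "test" are collected as test cases
    assert apply_discount(200.0, 25) == 150.0


def test_zero_discount_keeps_price():
    assert apply_discount(80.0, 0) == 80.0
```

<p align="justify">Running ‘pytest’ in the directory containing this file should collect and run both test functions; ‘apply_discount’ itself is not collected because its name does not start with ‘test’.</p>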
<p align="justify"></p>
<p align="justify">Pytest provides a much simpler syntax to analyze the quality of your application. For example, it relies on Python’s built-in ‘assert’ statement, which is handier than the special assertion methods of other frameworks. Along with this, the Pytest framework comes with plenty of other customizations. One of them is sub-string matching (the ‘-k’ option), which runs only the tests whose names contain a given string, such as a single method of a particular class. This helps to verify even the smallest element of an application. Similarly, marking (the ‘-m’ option) is another way to run a specific set of tests. </p>
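<p align="justify">The sub-string matching described above can be sketched with a small test class. The ‘Cart’ class and the file name ‘test_cart.py’ are assumptions made up for this illustration.</p>

```python
# test_cart.py: hypothetical example; Cart is not part of any real library


class Cart:
    """Tiny stand-in for application code under test."""

    def __init__(self):
        self.items = {}

    def add(self, name, qty=1):
        self.items[name] = self.items.get(name, 0) + qty

    def remove(self, name):
        self.items.pop(name, None)

    def total_items(self):
        return sum(self.items.values())


class TestCart:
    # pytest collects classes prefixed with "Test" that define no __init__

    def test_add_accumulates_quantity(self):
        cart = Cart()
        cart.add("pen", 2)
        cart.add("pen")
        assert cart.total_items() == 3

    def test_remove_deletes_item(self):
        cart = Cart()
        cart.add("pen")
        cart.remove("pen")
        assert cart.total_items() == 0
```

<p align="justify">With this file in place, ‘pytest -k add’ would run only ‘test_add_accumulates_quantity’, because ‘-k’ selects tests whose names match the given expression. Marking works in a similar spirit: a test decorated with ‘@pytest.mark.&lt;name&gt;’ can be selected with ‘pytest -m &lt;name&gt;’, and custom marker names are usually registered in the pytest configuration to avoid warnings.</p>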
<p></p>
<p><strong>Conclusion</strong></p>
<p align="justify"></p>
<p align="justify">Using Python for quality assurance can support the growth of your organization. Tests can also be run in parallel, for example with the pytest-xdist plugin, which shortens long test runs. The abundance of frameworks and libraries in the language makes it a boon for both static and dynamic testing. </p>
<p> </p>More Weird Statistical Distributionstag:www.analyticbridge.datasciencecentral.com,2019-10-27:2004291:BlogPost:3951392019-10-27T00:00:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>Some original and very interesting material is presented here, with possible applications in Fintech. No need for a PhD in math to understand this article: I tried to make the presentation as simple as possible, focusing on high-level results rather than technicalities. Yet, professional statisticians and mathematicians, even academic researchers, will find some deep and fascinating results worth further exploring.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3681849077?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3681849077?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><em>Can you identify patterns in this chart? (see section 2.2. in the article for an answer)</em></p>
<p>Let's start with </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3681308901?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3681308901?profile=RESIZE_710x" class="align-center"/></a></p>
<p>Here the<span> </span><em>X</em>(<em>k</em>)'s are independent and identically distributed random variables, with their common distribution referred to as <em>X</em>. We are trying to find the distribution of<span> </span><em>Z</em>.</p>
<p><strong>Contents</strong></p>
<p>1. Using a Simple Discrete Distribution for <em>X</em></p>
<p>2. Towards a Better Model</p>
<ul>
<li>Approximate Solution</li>
<li>The Fractal, Brownian-like Error Term</li>
</ul>
<p>3. Finding <em>X</em> and <em>Z</em> Using Characteristic Functions</p>
<ul>
<li>Test with Log-normal Distribution for <em>X</em></li>
<li>Playing with the Characteristic Functions</li>
<li>Generalization to Continued Fractions and Nested Cubic Roots</li>
</ul>
<p>4. Exercises</p>
<p><em>Read this article <a href="https://www.datasciencecentral.com/profiles/blogs/math-fun-infinite-nested-radicals-of-random-variables" target="_blank" rel="noopener">here</a>. </em></p>
<p></p>Complete Hands-Off Automated Machine Learningtag:www.analyticbridge.datasciencecentral.com,2019-10-22:2004291:BlogPost:3948882019-10-22T20:30:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>By Bill Vorhies. </p>
<p><strong><em>Summary:</em></strong><em> Here’s a proposal for real ‘zero touch’, ‘set-em-and-forget-em’ machine learning from the researchers at Amazon. If you have an environment as fast-changing as e-retail and a huge number of models matching buyers and products, you could achieve real cost savings and revenue increases by making the refresh cycle faster and more accurate with automation. This capability will likely be coming soon to your favorite AML platform.</em></p>
<p><em><a href="https://storage.ning.com/topology/rest/1.0/file/get/3674974988?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3674974988?profile=RESIZE_710x" class="align-center"/></a></em></p>
<p>Is there a future in which we can really ‘set-em-and-forget-em’ machine learning? So far, Automated Machine Learning (AML) has vastly simplified the creation of models, but maintenance, refresh, and update still require manual intervention.</p>
<p>Not that we’re trying to talk ourselves out of a job. But after all, once the model is built and implemented it’s more fun to move on to the next opportunity. If the maintenance and refresh cycle could be truly automated that would be a good thing.</p>
<p>Much of the effort so far has been put into simplifying getting the model out of its AML environment and into its production environment. Facebook’s FBLearner is an example of this. A number of platforms claim to ease this process for the rest of us. At least, once we manually refresh the model, it is easier to update it in production.</p>
<p><em>Read full article <a href="https://www.datasciencecentral.com/profiles/blogs/complete-hands-off-automated-machine-learning" target="_blank" rel="noopener">here</a>. </em></p>40+ Modern Tutorials Covering All Aspects of Machine Learningtag:www.analyticbridge.datasciencecentral.com,2019-10-13:2004291:BlogPost:3947202019-10-13T17:00:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><span>This list of lists contains books, notebooks, presentations, cheat sheets, and tutorials covering all aspects of data science, machine learning, deep learning, statistics, math, and more, with most documents featuring Python or R code and numerous illustrations or case studies. All this material is available for free, and consists of content mostly created in 2019 and 2018, by various top experts in their respective fields. A few of these documents are available on LinkedIn: see last section on how to download them. </span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/3660371847?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3660371847?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p><span>Below are the first two sections.</span></p>
<p><strong>General References</strong></p>
<ul>
<li>Free Deep Learning Book (639 pages) by Prof. Gilles Louppe</li>
<li>Python Crash Course (562 pages) by Eric Matthes</li>
<li>Free Book: Applied Data Science (141 pages) - Columbia University</li>
<li>Data Science in Practice</li>
<li>Machine Learning 101 - By Jason Mayes, Google</li>
<li>The Ultimate guide to AI, Data Science & Machine Learning</li>
<li>Free Handbooks for Data Science Professionals</li>
<li>Free Book: Natural Language Processing with Python</li>
<li>Data Visualization Resources</li>
<li>Textbook: Probability Course - Harvard University</li>
<li>Textbook: The Math of Machine Learning - Berkeley University</li>
<li>Comprehensive Guide to Machine Learning - Berkeley University</li>
<li>Free Book: Foundations of Data Science - by Microsoft Research</li>
<li>Comprehensive Guide on Machine Learning - by J.P. Morgan</li>
<li>Gentle Approach to Linear Algebra - by Vincent Granville</li>
</ul>
<p><strong>Data Science Central Books, Booklets and References</strong></p>
<ul>
<li>Statistics: New Foundations, Toolbox, and Machine Learning Recipes</li>
<li>Deep Learning and Computer Vision with CNNs</li>
<li>Getting Started with TensorFlow 2.0</li>
<li>Classification and Regression in a Weekend</li>
<li>Online Encyclopedia of Statistical Science</li>
<li>Azure Machine Learning in a Weekend</li>
<li>Enterprise AI - An Application Perspective</li>
<li>Applied Stochastic Processes</li>
<li>Comprehensive Repository of Data Science and ML Resources</li>
<li>Foundations of ML and Data Science for Developers</li>
<li>Elegant Representation of Forward/Back Propagation in Neural Networks</li>
<li>Learning the Math of Data Science</li>
</ul>
<p>To access all these documents and more, <a href="https://www.datasciencecentral.com/profiles/blogs/40-tutorials-covering-all-aspects-of-machine-learning" target="_blank" rel="noopener">follow this link</a>.</p>Surprising Uses of Synthetic Random Data Setstag:www.analyticbridge.datasciencecentral.com,2019-10-02:2004291:BlogPost:3947462019-10-02T23:00:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>I have used synthetic data sets many times for simulation purposes, most recently in my articles<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/six-degrees-of-separation-between-any-two-data-sets" target="_blank" rel="noopener">Six degrees of Separations between any two Datasets</a><span> </span>and<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-lie-with-p-values" target="_blank" rel="noopener">How to Lie with p-values</a>. Many applications (including the data sets themselves) can be found in my books<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes" target="_blank" rel="noopener">Applied Stochastic Processes</a><span> </span>and<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning" target="_blank" rel="noopener">New Foundations of Statistical Science</a>. For instance, these data sets can be used to benchmark some statistical tests of hypothesis (with the null hypothesis known in advance to be true or false) and to assess the power of such tests or confidence intervals. In other cases, they are used to simulate clusters and test cluster detection / pattern detection algorithms, see<span> </span><a href="https://www.analyticbridge.datasciencecentral.com/profiles/blogs/how-to-detect-a-pattern-problem-and-solution" target="_blank" rel="noopener">here</a>. 
I also used such data sets to discover two new deep conjectures in number theory (see<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/two-new-deep-conjectures-in-probabilistic-number-theory" target="_blank" rel="noopener">here</a>), to design new Fintech models such as<span> </span><em>bounded Brownian motions</em>, and to find new families of statistical distributions (see<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/a-strange-family-of-statistical-distributions" target="_blank" rel="noopener">here</a>).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3641314354?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3641314354?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><em>Goldbach's comet </em></p>
<p>In this article, I focus on peculiar random data sets to prove -- heuristically -- two of the most famous math conjectures in number theory, related to prime numbers: the Twin Prime conjecture, and the Goldbach conjecture. The methodology is at the intersection of probability theory, experimental math, and probabilistic number theory. It involves working with infinite data sets, dwarfing any data set found in any business context.</p>
<p>Read full article <a href="https://www.datasciencecentral.com/profiles/blogs/surprising-uses-of-synthetic-random-data-sets?xg_source=activity" target="_blank" rel="noopener">here</a>. </p>Six Degrees of Separation Between Any Two Data Setstag:www.analyticbridge.datasciencecentral.com,2019-09-09:2004291:BlogPost:3943772019-09-09T16:30:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>This is an interesting data science conjecture, inspired by the well known<span> </span><a href="https://www.bigdatanews.datasciencecentral.com/profiles/blogs/graph-theory-six-degrees-of-separation-problem" target="_blank" rel="noopener">six degrees of separation problem</a>, stating that there is a link involving no more than 6 connections between any two people on Earth, say between you and anyone living in North Korea. </p>
<p>Here the link is between any two univariate data sets of the same size, say Data A and Data B. The claim is that there is a chain involving no more than 6 intermediary data sets, each highly correlated to the previous one (with a correlation above 0.8), between Data A and Data B. The concept is illustrated in the example below, where only 4 intermediary data sets (labeled Degree 1, Degree 2, Degree 3, and Degree 4) are actually needed. </p>
<p><img src="https://storage.ning.com/topology/rest/1.0/file/get/3547469050?profile=RESIZE_710x" class="align-center"/></p>
<p style="text-align: center;"><em>Correlation table for the 6 data sets</em></p>
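<p>One way to see why such a chain is plausible is the sketch below. It is not the construction used in the linked article: it simply assumes two independent random data sets and builds intermediaries by rotating from Data A towards Data B in equal angular steps. The function names and the choice of Gaussian data are assumptions made for illustration, and the approach would not work as is when the two data sets are strongly negatively correlated.</p>

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(x):
    """Rescale a sequence to mean 0 and standard deviation 1."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((a - m) ** 2 for a in x) / n)
    return [(a - m) / s for a in x]


def build_chain(data_a, data_b, intermediaries=4):
    """Return [A, Degree 1, ..., Degree k, B] built by angular interpolation."""
    a, b = standardize(data_a), standardize(data_b)
    steps = intermediaries + 1
    chain = []
    for k in range(steps + 1):
        theta = (math.pi / 2) * k / steps  # 0 gives A, pi/2 gives B
        chain.append([math.cos(theta) * u + math.sin(theta) * v
                      for u, v in zip(a, b)])
    return chain


rng = random.Random(42)  # fixed seed so the run is reproducible
data_a = [rng.gauss(0, 1) for _ in range(1000)]
data_b = [rng.gauss(0, 1) for _ in range(1000)]

chain = build_chain(data_a, data_b)
links = [pearson(chain[i], chain[i + 1]) for i in range(len(chain) - 1)]
print([round(r, 3) for r in links])
```

<p>For nearly uncorrelated A and B, the correlation between two consecutive data sets in this chain is close to the cosine of the angular step (here 18 degrees, about 0.95), so every link stays above the 0.8 threshold with only 4 intermediaries.</p>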
<p>To view the (random) data sets, understand how the chain of intermediary data sets was built, and access the spreadsheets to reproduce the results or test on different data, <a href="https://www.datasciencecentral.com/profiles/blogs/six-degrees-of-separation-between-any-two-data-sets" target="_blank" rel="noopener">follow this link</a>. <span>It makes for an interesting theoretical data science research project, for people with too much free time on their hands. </span></p>
<p>The material discussed here is also of interest to machine learning, AI, big data, and data science practitioners, as much of the work is based on heavy data processing, algorithms, efficient coding, testing, and experimentation. The article offers not just two new conjectures, but also paths and suggestions for solving these problems. The last section contains a few new, original exercises, some with solutions, and may be useful to students, researchers, and instructors offering math and statistics classes at the college level: they range from easy to very difficult. Some great probability theorems are also discussed, in layman's terms: see section 1.2. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3546311327?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3546311327?profile=RESIZE_710x" class="align-center"/></a></p>
<p>The two deep conjectures highlighted in this article (conjectures B and C) are related to the digit distribution of well known math constants such as Pi or log 2, with an emphasis on binary digits of SQRT(2). This is an old problem, one of the most famous ones in mathematics, still unsolved today.</p>
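<p>To make the notion of digit distribution concrete, here is a minimal sketch that computes binary digits of SQRT(2) with exact integer arithmetic and measures the proportion of digits equal to 1. The function names are made up for this illustration, and the code is not taken from the full article; it only relies on the identity floor(SQRT(2) * 2^n) = isqrt(2 * 4^n).</p>

```python
from math import isqrt


def sqrt2_binary_digits(n):
    """Bit string of floor(sqrt(2) * 2**n): the integer part, then n fractional bits."""
    # 2 << (2 * n) equals 2 * 4**n, so isqrt returns floor(sqrt(2) * 2**n) exactly
    return bin(isqrt(2 << (2 * n)))[2:]


def proportion_of_ones(bits):
    """Fraction of characters in the bit string that are '1'."""
    return bits.count("1") / len(bits)


bits = sqrt2_binary_digits(10_000)
print(bits[:9])                  # integer part followed by the first 8 fractional bits
print(proportion_of_ones(bits))  # empirical frequency of the digit 1
```

<p>Observing a frequency close to 50% in such an experiment is only empirical evidence; whether the proportion converges to exactly 50% is precisely the kind of question that remains unproved.</p>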
<p><strong>Content of this article</strong></p>
<p>A Strange Recursive Formula</p>
<ul>
<li>Conjecture A</li>
<li>A deeper result</li>
<li>Conjecture B</li>
<li>Connection to the Berry-Esseen theorem</li>
<li>Potential path to solving this problem</li>
</ul>
<p>Potential Solution Based on Special Rational Number Sequences</p>
<ul>
<li>Interesting statistical result</li>
<li>Conjecture C</li>
<li>Another curious statistical result</li>
</ul>
<p>Exercises</p>
<p><em>Read the full article <a href="https://www.datasciencecentral.com/profiles/blogs/two-new-deep-conjectures-in-probabilistic-number-theory" target="_blank" rel="noopener">here</a>. </em></p>Python as a tool benefiting data scientists in many waystag:www.analyticbridge.datasciencecentral.com,2019-09-05:2004291:BlogPost:3945402019-09-05T06:00:00.000ZDivyesh Aegishttps://www.analyticbridge.datasciencecentral.com/profile/DivyeshAegis
<p align="justify">As an extremely versatile, general-purpose, professional programming language, Python has plenty of applications. The language is user-friendly and simple to grasp, which has made it popular throughout the world. Python also plays a critical role in helping data scientists find lucrative job opportunities. </p>
<p align="justify"></p>
<p align="justify">Today, Python has become one of the most in-demand programming languages in the data science world. It offers an extensive range of applications, <strong><a href="https://www.nexsoftsys.com/technologies/python-development-services.html" target="_blank" rel="noopener">like Python software development</a></strong>, web development, data set analysis, mobile app development, numeric and scientific computing, and the development of machine learning algorithms. </p>
<p align="justify"></p>
<p align="justify">You can enhance your skill set at any time by taking Big Data training to qualify for the best-paying data science jobs. Data science involves dealing with large volumes of data that are usually complex to work with. Compared with other high-level programming languages, Python is simple to use for quantitative and analytical computing. This makes Python one of the most preferred programming languages in the data science world. </p>
<p align="justify"></p>
<p align="justify">If you are a beginner, you can consider Python training as a route into data science jobs. The reason is that Python is used across many industry verticals, such as signal processing, marketing, technology, finance, business, oil and gas, and medicine. Python opens up many career options, but the data scientist role is among the highest paying. </p>
<p align="justify"></p>
<p align="justify"><a href="https://storage.ning.com/topology/rest/1.0/file/get/3526440174?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3526440174?profile=RESIZE_710x" class="align-full"/></a></p>
<p align="justify"></p>
<p align="justify">Data scientists use Python more than other data science tools because of the following benefits:</p>
<p align="justify"></p>
<p><strong>Powerful and User-friendly Programming Language </strong></p>
<p></p>
<p align="justify">Python is an open-source tool that offers high flexibility. You can call it a beginner’s programming language, since any employee or student can learn it with only basic knowledge. Python’s gentle learning curve lets each individual learn at their own pace.</p>
<p align="justify"></p>
<p align="justify">It differs from other programming languages in that it offers smooth learning with a gradual increase in difficulty. As a result, learners can adapt to the gradual change of pace and complete their training with confidence.</p>
<p align="justify"></p>
<p>Languages like C, C#, and Java typically take more time to implement the same code than Python does. Python implementations tend to be shorter and need less time spent on debugging, which helps data scientists work more efficiently.</p>
<p></p>
<p><strong>More Libraries</strong></p>
<p></p>
<p align="justify">Python supports newly evolved technologies like AI and ML and offers a massive collection of libraries. Every library includes a variety of useful modules, which you can import and use in your routine coding at any time.</p>
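<p>As a small illustration of importing modules and using them in routine code, the sketch below relies only on two modules from Python's standard library, <code>statistics</code> and <code>collections</code>. The helper name <code>summarize</code> and the sample values are hypothetical, chosen purely for the example.</p>

```python
import statistics
from collections import Counter


def summarize(values):
    """Return a few descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),      # arithmetic average
        "median": statistics.median(values),  # middle value
        # most_common(1) returns a list holding one (value, count) pair
        "most_common": Counter(values).most_common(1)[0][0],
    }


if __name__ == "__main__":
    print(summarize([1, 2, 2, 3, 7]))
```

<p>Third-party libraries are used the same way once installed: import the module, then call its functions.</p>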
<p align="justify"></p>
<p>Popular Python libraries include Seaborn, Matplotlib, PyTorch, TensorFlow, scikit-learn, and more. Many websites offer easy Python courses for data science.</p>
<p></p>
<p><strong>Scalability</strong></p>
<p></p>
<p align="justify">When it comes to scalability, Python is hard to beat. It scales well and is often more efficient to work with than other programming languages like Java and R. Since new technology arrives constantly and consumer demand is on the rise, companies are under market pressure to deliver faster and better results.</p>
<p align="justify"></p>
<p align="justify">There is a reason why many organizations prefer Python for their data science algorithms: it offers ease of scalability and quick turnaround time. With Python, it is easy for them to build apps rapidly and develop tools of all kinds at affordable cost. They also use it to solve complex problems involving large data sets, which can be far harder with other available programming languages.</p>
<p></p>
<p><strong>Graphics and visualization </strong></p>
<p></p>
<p align="justify">If you own a company, you should know the significance of visualization and graphics. Both act as key drivers of growth in the industry. Dealing with big data and large datasets can be a time-consuming and stressful process. Python supports a range of visualization options, which makes it one of the most efficient programming languages in the industry.</p>
<p align="justify"></p>
<p align="justify">Matplotlib is a Python library that provides the foundation on which other plotting libraries, such as Seaborn, ggplot, and pandas plotting, are built. These packages help in building charts, web-ready plots, graphical information, and graphical layouts.</p>
<p align="justify"></p>
<p align="justify">Python has gone through a lot of development and evolution in recent years. Courses such as big data training, combined with Python, are the ones most often chosen by individuals interested in a career in data science.</p>
<p align="justify"></p>
<p align="justify">Companies that are transitioning to the data science world are offering relevant training and courses to their staff. Python is a great tool that benefits data scientists in several ways, most of which have been discussed in this article. If you know of more benefits provided by the Python programming language, you can share them with other readers in the comments.</p>
<p> </p>10 Machine Learning Methods that Every Data Scientist Should Knowtag:www.analyticbridge.datasciencecentral.com,2019-08-30:2004291:BlogPost:3944382019-08-30T17:08:12.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p id="a572" class="nj nk eo ao nl b nm nn no np nq nr ns nt nu nv nw">Machine learning is a hot topic in research and industry, with new methodologies developed all the time. The speed and complexity of the field make keeping up with new techniques difficult even for experts, and potentially overwhelming for beginners.</p>
<p id="0d4d" class="nj nk eo ao nl b nm nn no np nq nr ns nt nu nv nw">To demystify machine learning and to offer a learning path for those who are new to the core concepts, let’s look at ten different methods, including simple descriptions, visualizations, and examples for each one.</p>
<p class="nj nk eo ao nl b nm nn no np nq nr ns nt nu nv nw"><a href="https://storage.ning.com/topology/rest/1.0/file/get/3487793979?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3487793979?profile=RESIZE_710x" class="align-center"/></a></p>
<p id="64a5" class="nj nk eo ao nl b nm nn no np nq nr ns nt nu nv nw">A machine learning algorithm, also called a model, is a mathematical expression that represents data in the context of a problem, often a business problem. The aim is to go from data to insight. For example, if an online retailer wants to anticipate sales for the next quarter, they might use a machine learning algorithm that predicts those sales based on past sales and other relevant data. Similarly, a windmill manufacturer might visually monitor important equipment and feed the video data through algorithms trained to identify dangerous cracks.</p>
<p id="00c2" class="nj nk eo ao nl b nm nn no np nq nr ns nt nu nv nw">The ten methods described offer an overview — and a foundation you can build on as you hone your machine learning knowledge and skill:</p>
<ol class="">
<li id="b886" class="nj nk eo ao nl b nm nn no np nq nr ns nt nu nv nw nx ny nz">Regression</li>
<li id="2763" class="nj nk eo ao nl b nm ob no oc nq od ns oe nu of nw nx ny nz">Classification</li>
<li id="54dd" class="nj nk eo ao nl b nm ob no oc nq od ns oe nu of nw nx ny nz">Clustering</li>
<li id="c007" class="nj nk eo ao nl b nm ob no oc nq od ns oe nu of nw nx ny nz">Dimensionality Reduction</li>
<li id="1af1" class="nj nk eo ao nl b nm ob no oc nq od ns oe nu of nw nx ny nz">Ensemble Methods</li>
<li id="91ed" class="nj nk eo ao nl b nm ob no oc nq od ns oe nu of nw nx ny nz">Neural Nets and Deep Learning</li>
<li id="5128" class="nj nk eo ao nl b nm ob no oc nq od ns oe nu of nw nx ny nz">Transfer Learning</li>
<li id="2251" class="nj nk eo ao nl b nm ob no oc nq od ns oe nu of nw nx ny nz">Reinforcement Learning</li>
<li id="6975" class="nj nk eo ao nl b nm ob no oc nq od ns oe nu of nw nx ny nz">Natural Language Processing</li>
<li id="429f" class="nj nk eo ao nl b nm ob no oc nq od ns oe nu of nw nx ny nz">Word Embeddings</li>
</ol>
<p><em>Read the full article, with detailed description for each method, <a href="https://www.datasciencecentral.com/profiles/blogs/10-machine-learning-methods-that-every-data-scientist-should-know" target="_blank" rel="noopener">here</a>. </em></p>A Strange Family of Statistical Distributionstag:www.analyticbridge.datasciencecentral.com,2019-08-30:2004291:BlogPost:3943402019-08-30T16:11:16.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p><span>I introduce here a family of very peculiar statistical distributions governed by two parameters: </span><em>p</em><span>, a real number in [0, 1], and </span><em>b</em><span>, an integer > 1. </span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3487729021?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3487729021?profile=RESIZE_710x" class="align-center"/></a></p>
<p><span>Potential applications are found in cryptography, Fintech (stock market modeling), Bitcoin, number theory, random number generation, benchmarking statistical tests (see </span><a href="https://www.datasciencecentral.com/profiles/blogs/fascinating-new-results-in-the-theory-of-randomness" target="_blank" rel="noopener">here</a><span>) and even gaming (see </span><a href="https://www.datasciencecentral.com/profiles/blogs/data-science-foundations-for-a-new-stock-market" target="_blank" rel="noopener">here</a><span>). However, the most interesting application is probably to gain insights into what non-normal numbers look like, especially their chaotic nature. It is a fundamental tool to help solve one of the most intriguing mathematical conjectures of all time (yet unsolved): are the digits of standard constants such as Pi or SQRT(2) uniformly distributed or not? For instance, when </span><em>b</em><span> = 2, any departure from </span><em>p</em><span> = 0.5 (a normal seed) results in a strong discontinuity for </span><em>f</em><span>(</span><em>x</em><span>) at </span><em>x</em><span> = 0.5. If you look at the above chart, </span><em>f</em><span>(0) = </span><em>f</em><span>(1/2) = </span><em>f</em><span>(1) regardless of </span><em>p</em><span>, but discontinuities are masking this fact. </span></p>
<p><span><a href="https://www.datasciencecentral.com/profiles/blogs/a-strange-family-of-statistical-distributions" target="_blank" rel="noopener">Read full article here</a>. </span></p>Extreme Events Modeling Using Continued Fractionstag:www.analyticbridge.datasciencecentral.com,2019-08-30:2004291:BlogPost:3943242019-08-30T15:42:00.000ZVincent Granvillehttps://www.analyticbridge.datasciencecentral.com/profile/VincentGranville
<p>Continued fractions are usually considered a beautiful, curious mathematical topic, but with applications that are mostly theoretical and limited to math and number theory. Here we show how they can be used in applied business and economics contexts, leveraging the mathematical theory developed for continued fractions, to model and explain natural phenomena. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3487696331?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3487696331?profile=RESIZE_710x" class="align-center"/></a></p>
<p>The interest in this project started when analyzing sequences such as<span> </span><em>x</em>(<em>n</em>) = {<span> </span><em>nq</em><span> </span>} =<span> </span><em>nq</em><span> </span>- INT(<em>nq</em>) where<span> </span><em>n</em> = 1, 2, and so on, and<span> </span><em>q</em><span> </span>is an irrational number in [0, 1] called the<span> </span><em>seed</em>. The brackets denote the fractional part function. The values<span> </span><em>x</em>(<em>n</em>) are also in [0, 1] and get arbitrarily close to 0 and 1 infinitely often, and indeed arbitrarily close to any number in [0, 1] infinitely often. I became interested in what happens when<span> </span><em>x</em>(<em>n</em>) gets very close to 1 and, more precisely, in the distribution of the arrival times<span> </span><em>t</em>(<em>n</em>) of successive records. I was curious to compare these arrival times with those from truly random numbers, or from real-life time series such as temperature, stock market or gaming/sports data. Such arrival times are known to have an infinite expectation under stable conditions, though their medians always exist: after all, any record could be the final one, never to be surpassed again in the future. This always happens at some point with the sequence<span> </span><em>x</em>(<em>n</em>) if<span> </span><em>q</em><span> </span>is a rational number, hence our focus on irrational seeds: they yield successive records that keep growing without end, although the gaps between successive records eventually grow very large, in a chaotic, unpredictable way, just like records in traditional time series.</p>
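<p>The record arrival times described above can be explored with a short Python sketch. The function name <code>record_times</code> and the choice of seed, (SQRT(5) - 1) / 2, are assumptions made for illustration only. The sketch uses ordinary floating-point arithmetic, so it is only reliable for moderate values of <em>n</em>; a serious experiment would need higher-precision arithmetic.</p>

```python
import math


def record_times(q: float, n_max: int) -> list:
    """Return the indices n <= n_max at which x(n) = {n*q} sets a new maximum."""
    times = []
    best = -1.0  # below any possible fractional part
    for n in range(1, n_max + 1):
        x = (n * q) % 1.0  # fractional part of n*q
        if x > best:
            best = x
            times.append(n)
    return times


if __name__ == "__main__":
    seed = (math.sqrt(5) - 1) / 2  # illustrative irrational seed in [0, 1]
    times = record_times(seed, 100_000)
    print("record arrival times:", times)
    # gaps between successive records
    print("gaps:", [b - a for a, b in zip(times, times[1:])])
```

<p>Trying other seeds, or replacing <code>x</code> with values drawn from <code>random.random()</code>, gives a quick way to compare the arrival times with those of random numbers.</p>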
<p><a href="https://www.datasciencecentral.com/profiles/blogs/extreme-events-modeling-using-continued-fractions" target="_blank" rel="noopener">Read the full article here</a>.</p>
<p><strong>Content</strong>:</p>
<ul>
<li>Theoretical background (simplified)</li>
<li>Generalization and potential applications to real life problems</li>
<li>Original applications in music and probabilistic number theory</li>
</ul>