Given n observations x1, ..., xn, the generalized mean (also called power mean) is defined as

$$M_p = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^{p}\right)^{1/p}.$$

The case p = 1 corresponds to the traditional arithmetic mean, while p = 0 (taken as a limit) yields the geometric mean, and p = -1 yields the harmonic mean. See here for details. This metric is favored by statisticians. It is a particular case of the quasi-arithmetic mean.

Here I introduce another kind of mean called the exponential mean, also based on a parameter p, that may appeal to data scientists and machine learning professionals. It is also a special case of the quasi-arithmetic mean. Though the concept is basic, there is very little if any literature about it. It is related to LogSumExp and the Log semiring. It is defined as follows:

$$m_p = \log_p\left(\frac{1}{n}\sum_{i=1}^{n} p^{x_i}\right).$$

Here the logarithm is in base p, with p positive. When p tends to 0, m_p tends to the minimum of the observations. When p tends to 1, it yields the classic arithmetic mean, and as p tends to infinity, it yields the maximum of the observations.

Content of this article: Advantages of the exponential mean; Illustration on a test data set; Important inequality; Doubly exponential mean.

Read the full article here.
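The limiting behavior described above is easy to check numerically. The sketch below assumes the exponential mean is the base-p logarithm of the average of p raised to each observation, which matches the three limits stated in the text; the function name `exponential_mean` and the sample observations are made up for illustration. It uses the log-sum-exp shift so that very small or very large p does not overflow.

```python
import math

def exponential_mean(xs, p):
    """Base-p logarithm of the average of p**x (assumed definition)."""
    if p <= 0:
        raise ValueError("p must be positive")
    if p == 1:
        return sum(xs) / len(xs)  # limiting case: arithmetic mean
    log_p = math.log(p)
    # p**x = exp(x * log p); shift by the largest exponent before
    # exponentiating (the usual LogSumExp trick) to avoid overflow.
    scaled = [x * log_p for x in xs]
    shift = max(scaled)
    log_sum = shift + math.log(sum(math.exp(s - shift) for s in scaled))
    return (log_sum - math.log(len(xs))) / log_p

observations = [1.0, 2.0, 3.0, 10.0]  # illustrative data only
for p in (1e-12, 0.5, 1.0001, 2.0, 1e12):
    print(p, exponential_mean(observations, p))
```

With these observations the printed values should move from near the minimum (1) through the arithmetic mean (4) toward the maximum (10) as p grows, though the approach to the extremes is slow.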


Products of two large primes are at the core of many encryption algorithms, as factoring the product is very hard for numbers with a few hundred digits. The two prime factors are associated with the encryption keys (public and private keys). Here we describe a new approach to factoring a big number that is the product of two primes of roughly the same size. It is designed especially to handle this problem and identify flaws in encryption algorithms.

[Figure: Riemann zeta function in the complex plane]

While at first glance the new algorithm appears to substantially reduce the computational complexity of traditional factoring, at this stage a lot of progress is still needed to make it efficient. An interesting feature is that success depends on the probability that two numbers are co-prime, given that they don't share any of the first few primes (say 2, 3, 5, 7, 11, 13) as common divisors. This probability can be computed explicitly and is about 99%. The methodology relies heavily on solving systems of congruences, the Chinese Remainder Theorem, and the modular multiplicative inverse of some carefully chosen integers. We also discuss computational complexity issues. Finally, the off-the-beaten-path material presented here leads to many original exercises or exam questions for students learning probability, computer science, or number theory: proving the various simple statements made in my article.

Content

Some Number Theory Explained in Simple English: Co-primes and pairwise co-primes; Probability of being co-prime; Modular multiplicative inverse; Chinese remainder theorem, version A; Chinese remainder theorem, version B.
The New Factoring Algorithm: Improving computational complexity; Five-step algorithm; Probabilistic optimization.
Compact Formulation of the Problem.

Read the full article here.
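Two of the ingredients named above can be illustrated in a few lines. This is a sketch, not the article's algorithm: it uses the standard fact that two random integers are co-prime with probability 6/π² (the product of 1 - 1/q² over all primes q), and the helper names `coprime_probability_excluding` and `crt_pair` are hypothetical. The first function removes the factors for the primes already excluded; the second solves a pair of congruences with the built-in modular inverse (`pow(a, -1, m)`, available from Python 3.8).

```python
from math import pi

def coprime_probability_excluding(small_primes):
    """P(gcd(a, b) = 1) given that no prime in small_primes divides both.

    Heuristic density argument: 6 / pi**2 is the product of (1 - 1/q**2)
    over all primes q, so dividing out the excluded primes leaves the
    product over the remaining ones.
    """
    probability = 6 / pi ** 2
    for q in small_primes:
        probability /= 1 - 1 / q ** 2
    return probability

def crt_pair(r1, m1, r2, m2):
    """Solve x = r1 (mod m1), x = r2 (mod m2) for co-prime m1 and m2."""
    inverse = pow(m1, -1, m2)          # modular multiplicative inverse
    k = ((r2 - r1) * inverse) % m2     # choose k so r1 + m1*k = r2 (mod m2)
    return (r1 + m1 * k) % (m1 * m2)

print(coprime_probability_excluding([2, 3, 5, 7, 11, 13]))
print(crt_pair(2, 3, 3, 5))  # x with x % 3 == 2 and x % 5 == 3
```

Under this heuristic the conditional probability works out to roughly 0.98 for the six primes listed, in the same range as the figure quoted above.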
Other Math Articles by Same Author

Here is a selection of articles pertaining to experimental math and probabilistic number theory:

Statistics: New Foundations, Toolbox, and Machine Learning Recipes
Applied Stochastic Processes
Variance, Attractors and Behavior of Chaotic Statistical Systems
New Family of Generalized Gaussian Distributions
A Beautiful Result in Probability Theory
Two New Deep Conjectures in Probabilistic Number Theory
Extreme Events Modeling Using Continued Fractions
A Strange Family of Statistical Distributions
Some Fun with Gentle Chaos, the Golden Ratio, and Stochastic Number...
Fascinating New Results in the Theory of Randomness
Two Beautiful Mathematical Results - Part 2
Two Beautiful Mathematical Results
Number Theory: Nice Generalization of the Waring Conjecture
Fascinating Chaotic Sequences with Cool Applications
Simple Proof of the Prime Number Theorem
Factoring Massive Numbers: Machine Learning Approach

We discuss a simple trick to significantly accelerate the convergence of an algorithm when the error term decreases in absolute value over successive iterations, with the error term oscillating (not necessarily periodically) between positive and negative values. We first illustrate the technique on a well-known and simple case: the computation of log 2 using its well-known, slowly converging series. We then discuss a very interesting and more complex case, before finally focusing on a more challenging example in the context of probabilistic number theory and experimental math.

The technique must be tested for each specific case to assess the improvement in convergence speed. There is no general, theoretical rule to measure the gain, and if the error term does not oscillate in a balanced way between positive and negative values, this technique does not produce any gain. However, in the examples below, the gain was dramatic.

Let's say you run an algorithm, for instance gradient descent. The input (model parameters) is x, the output is f(x), for instance a local optimum. We consider f(x) to be univariate, but it easily generalizes to the multivariate case, by applying the technique separately for each component. At iteration k, you obtain an approximation f(k, x) of f(x), and the error is E(k, x) = f(x) - f(k, x). The total number of iterations is N, starting with the first iteration k = 1. The idea is to first run the algorithm as is, and then compute the "smoothed" approximations, using the following m steps.

Read the full article here.

Content: General framework and simple illustration; A strange function; Even stranger functions.
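The m steps themselves are not reproduced in this summary, so the sketch below is a stand-in rather than the article's exact procedure: it assumes that each smoothing step replaces the list of approximations by the averages of consecutive pairs, a standard way to damp an error that alternates in sign. It is applied to the log 2 series 1 - 1/2 + 1/3 - ... mentioned above; the function names are hypothetical.

```python
import math

def partial_sums_log2(num_terms):
    """Partial sums of 1 - 1/2 + 1/3 - ..., which converge slowly to log 2."""
    sums, total = [], 0.0
    for k in range(1, num_terms + 1):
        total += (-1) ** (k + 1) / k
        sums.append(total)
    return sums

def smooth(approximations, m):
    """Apply m rounds of averaging consecutive approximations (assumed step)."""
    current = list(approximations)
    for _ in range(m):
        # Each round shortens the list by one element.
        current = [(a + b) / 2 for a, b in zip(current, current[1:])]
    return current

raw = partial_sums_log2(20)
smoothed = smooth(raw, 5)
print("raw error:     ", abs(raw[-1] - math.log(2)))
print("smoothed error:", abs(smoothed[-1] - math.log(2)))
```

Because the raw errors alternate in sign and shrink slowly, each averaging round cancels most of the leading error term. If the errors did not alternate, the same averaging would bring little or no gain, consistent with the caveat above.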


The methodology described here has broad applications, leading to new statistical tests, a new type of ANOVA (analysis of variance), improved design of experiments, interesting fractional factorial designs, a better understanding of irrational numbers leading to cryptography, gaming and Fintech applications, and high-quality random number generators (and when you really need them). It also features exact arithmetic / high performance computing and distributed algorithms to compute millions of binary digits for an infinite family of real numbers, including detection of auto- and cross-correlations (or lack thereof) in the digit distributions.

The data processed in my experiment, consisting of raw irrational numbers (described by a new class of elementary recurrences), led to the discovery of unexpected apparent patterns in their digit distribution: in particular, the fact that a few of these numbers, contrary to popular belief, do not have 50% of their binary digits equal to 1. It turned out that perfectly random digits simulated in large numbers, with a good enough pseudo-random generator, also exhibit the same strange behavior, pointing to the fact that pure randomness may not be as random as we imagine it is. Ironically, failure to exhibit these patterns would be an indicator that there really is a departure from pure randomness in the digits in question.

In addition to new statistical / mathematical methods and discoveries and interesting applications, you will learn in my article how to avoid the statistical traps that lead to erroneous conclusions when performing a large number of statistical tests, and how not to be misled by false appearances. I call them statistical hallucinations and false outliers.

This article has two main sections: section 1, with deep research in number theory, and section 2, with deep research in statistics, with applications. You may skip one of the two sections depending on your interests and how much time you have. Both sections, despite being state of the art in their respective fields, are written in simple English. It is my wish that with this article, I can get data scientists to be interested in math, and the other way around: the topics in both cases have been chosen to be exciting and modern. I also hope that this article will give you new powerful tools to add to your arsenal of tricks and techniques. Both topics are related, the statistical analysis being based on the numbers discussed in the math section.

One of the interesting new topics discussed here for the first time is the cross-correlation between the digits of two irrational numbers. These digit sequences are treated as multivariate time series. I believe this is the first time that this subject is not only investigated in detail, but also comes with a deep, spectacular probabilistic number theory result about the distributions in question, with important implications for security and cryptography systems. Another related topic discussed here is a generalized version of the Collatz conjecture, with some insights on how to potentially solve it.

Read the full article here.

Content

1. On the Digits Distribution of Quadratic Irrational Numbers: Properties of the recursion; Reverse recursion; Properties of the reverse recursion; Connection to Collatz conjecture; Source code; New deep probabilistic number theory results; Spectacular new result about cross-correlations; Applications.
2. New Statistical Techniques Used in Our Analysis: Data, features, and preliminary analysis; Doing it the right way; Are the patterns found a statistical illusion, or caused by errors, or real?; Pattern #1: Non-Gaussian behavior; Pattern #2: Illusionary outliers; Pattern #3: Weird distribution for block counts.
Related articles and books.
Appendix.
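The kind of measurement discussed above (proportion of binary digits equal to 1, and cross-correlation between the digits of two numbers) can be sketched with exact integer arithmetic. The recurrences used in the article are not given in this summary, so the sketch substitutes Python's exact integer square root (`math.isqrt`) to obtain the binary digits of square roots; the function names and the choice of √2 and √3 are assumptions made for illustration only.

```python
from math import isqrt, sqrt

def sqrt_binary_digits(n, num_digits):
    """First num_digits binary digits of the fractional part of sqrt(n)."""
    # floor(sqrt(n) * 2**num_digits), computed exactly with integers
    scaled = isqrt(n << (2 * num_digits))
    fraction = scaled & ((1 << num_digits) - 1)  # drop the integer part
    return [(fraction >> (num_digits - 1 - i)) & 1 for i in range(num_digits)]

def correlation(a, b):
    """Pearson correlation between two equal-length digit sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    return cov / sqrt(var_a * var_b)

digits_2 = sqrt_binary_digits(2, 10_000)
digits_3 = sqrt_binary_digits(3, 10_000)
print("proportion of 1s in sqrt(2):", sum(digits_2) / len(digits_2))
print("cross-correlation sqrt(2) vs sqrt(3):", correlation(digits_2, digits_3))
```

A single run like this says little on its own. The point made in the text is that, across many such tests, some apparent departures from 50% are expected even for perfectly random digits.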

Summary: The Gartner Magic Quadrant for Data Science and Machine Learning Platforms is just out, and the big news is how much more capable all the platforms have become. Of course there are also some interesting winner and loser stories.

The Gartner Magic Quadrant for Data Science and Machine Learning Platforms is just out for 2020. The really big news is how many excellent choices are now available. In a remarkable move, the whole field of competitors has moved strongly up and to the right, offering more Leaders or near-leader Visionaries than ever before.

It’s a mark of maturity in our industry that so many platforms offer fully capable model development, operationalizing, and management features. That list of requirements as defined by Gartner grows longer every year, and earning a better rating requires increasing capability and increasing customer satisfaction.

What Are the Major Changes?

As in previous years, we’ve charted the major changes in position using green arrows for improvement and red arrows to indicate a reduced rating. The blue dots are current ratings and the gray dots are from a year ago.

Read the full article here with the 2020 version of the above chart, with comments.

In this notebook, we try to predict the positive (label 1) or negative (label 0) sentiment of a sentence. We use the UCI Sentiment Labelled Sentences Data Set.

Sentiment analysis is very useful in many areas. For example, it can be used for moderation of internet conversations. It can also be used to predict the ratings that users would assign to a certain product (food, household appliances, hotels, films, etc.) based on their reviews.

In this notebook we use two families of machine learning algorithms: Naive Bayes (NB) and long short-term memory (LSTM) neural networks.

AYLIEN
Deeplearning4j
Understanding LSTM Networks
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
The Unreasonable Effectiveness of Recurrent Neural Networks

We will use pandas and numpy for data manipulation, nltk for natural language processing, matplotlib, seaborn and plotly for data visualization, and sklearn and keras for learning the models.

Read the full article with source code and illustrations, here.
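To give a feel for the Naive Bayes side of the notebook, here is a minimal standard-library sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the notebook's code (which relies on sklearn), and the four training sentences are invented for illustration; they do not come from the UCI data set.

```python
import math
from collections import Counter

def train_naive_bayes(sentences, labels):
    """Count words per class (label 0 = negative, 1 = positive)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(sentences, labels):
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log-posterior score."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        score = math.log(class_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocabulary:  # ignore words never seen in training
                # add-one (Laplace) smoothing avoids log(0)
                score += math.log((word_counts[label][word] + 1)
                                  / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up toy examples, for illustration only
train_sentences = ["great phone works well", "terrible battery very bad",
                   "really good food", "bad service awful place"]
train_labels = [1, 0, 1, 0]
model = train_naive_bayes(train_sentences, train_labels)
print(predict(model, "good phone"), predict(model, "awful battery"))
```

On real data one would add tokenization, stop-word handling and a held-out test set, which is where libraries such as nltk and sklearn come in.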

Probably the worst error is thinking there is a correlation when that correlation is purely artificial. Take a data set with 100,000 variables and, say, 10 observations. Compute all the (99,999 * 100,000) / 2 cross-correlations. You are almost guaranteed to find one above 0.999. This is best illustrated in my article How to Lie with P-values (which also discusses how to handle and fix it).

This is being done on such a large scale that I think it is probably the main cause of fake news, and the impact is disastrous on people who take for granted what they read in the news or what they hear from the government. Some people are sent to jail based on evidence tainted with major statistical flaws. Government money is spent, propaganda is generated, wars are started, and laws are created based on false evidence. Sometimes the data scientist has no choice but to knowingly cook the numbers to keep her job. Usually, these “bad stats” end up being featured in beautiful but faulty visualizations: axes are truncated, charts are distorted, observations and variables are carefully chosen just to make a (wrong) point.

Read the full article here.
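The effect is easy to reproduce on a much smaller scale. The sketch below is a scaled-down illustration, not the experiment described above: it assumes 200 independent Gaussian noise variables with 5 observations each (so about 20,000 pairs) and reports the largest absolute correlation found. The function names, sizes and seed are arbitrary choices.

```python
import itertools
import math
import random

def normalize(values):
    """Center a vector and scale it to unit length."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]

def max_abs_correlation(num_variables, num_observations, seed=0):
    """Largest |correlation| among all pairs of independent noise variables."""
    rng = random.Random(seed)
    variables = [
        normalize([rng.gauss(0, 1) for _ in range(num_observations)])
        for _ in range(num_variables)
    ]
    best = 0.0
    for a, b in itertools.combinations(variables, 2):
        # For normalized vectors the dot product is the Pearson correlation.
        r = sum(x * y for x, y in zip(a, b))
        best = max(best, abs(r))
    return best

print(max_abs_correlation(200, 5))
```

Even though every variable here is pure noise, the best pair typically looks almost perfectly correlated, simply because so many pairs were tried on so few observations.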
Related articles

How to Lie with P-values
Four Types of Data Scientist
Debunking Forbes Article about the Death of the Data Scientist
Why You Should be a Data Science Generalist - and How to Become One
Becoming a Billionaire Data Scientist vs Struggling to Get a $100k Job
Is a PhD helpful for a data science career?
If data science is in demand, why is it so hard to get a job?
Why do people with no experience want to become data scientists?
Why is Becoming a Data Scientist so Difficult?
Full Stack Data Scientist: The Elusive Unicorn and Data Hacker
Statistical Significance and p-Values Take Another Blow
Are data science or stats curricula in US too specialized?
How do you identify an actual data scientist?
Is it still possible today to become a self-taught data scientist?
Will the job outlook for data scientists severely decline after 2020?
Why Logistic Regression should be the last thing you learn